Another Example with Dummy Variables
Suppose we want to predict a restaurant’s sales from the traffic flow past the
restaurant and the city in which the restaurant is located.
First, we plot sales vs. traffic flow, labeling each point with its city. On the
plot, a `1` marks the sales for a restaurant in city 1 at the specified traffic
flow.

Does it appear that a different line is needed to describe sales in each
city? If so, does each line need a different slope and intercept?
Begin with a model that only uses traffic flow to predict sales:
Regression Analysis: Sales versus Flow
# Some output deleted
The regression equation is
Sales = 0.018 + 0.108 Flow
S = 0.5957   R-Sq = 93.4%   R-Sq(adj) = 93.1%
Analysis of Variance
Source          DF      SS      MS       F      P
Regression       1  111.34  111.34  313.75  0.000
Residual Error  22    7.81    0.35
Total           23  119.15
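The summary statistics in this output can be recomputed from the Analysis of Variance table. The sketch below uses only the sums of squares and degrees of freedom printed above; because those are rounded, the recomputed S and F agree with the output only approximately.

```python
# Values copied from the Analysis of Variance table for Sales versus Flow.
ss_regression, df_regression = 111.34, 1
ss_error, df_error = 7.81, 22
ss_total, df_total = 119.15, 23

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# R-Sq is the share of total variation explained by the regression.
r_sq = ss_regression / ss_total
# R-Sq(adj) compares the error mean square with the total mean square.
r_sq_adj = 1 - ms_error / (ss_total / df_total)
# S is the square root of the error mean square.
s = ms_error ** 0.5
# F is the ratio of the regression mean square to the error mean square.
f_stat = ms_regression / ms_error

print(f"S = {s:.3f}")
print(f"R-Sq = {100 * r_sq:.1f}%  R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"F = {f_stat:.1f}")
```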
Now try a model that adds the city indicator variables, with no flow*city
interaction. This gives parallel lines that have different y-intercepts.
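As an illustration of how the city variable enters such a model, here is a minimal sketch of dummy (indicator) coding. It assumes there are four cities and that city 4 is the baseline with no indicator column, which is what a model with an intercept plus City1, City2, and City3 suggests; the function name `city_dummies` is made up for this example.

```python
def city_dummies(city):
    """Return the indicator values [City1, City2, City3] for one restaurant.

    Assumption: cities are numbered 1 to 4 and city 4 is the baseline,
    so a restaurant in city 4 gets zeros in all three columns.
    """
    return [1 if city == level else 0 for level in (1, 2, 3)]


for city in (1, 2, 3, 4):
    print(f"city {city}: City1, City2, City3 = {city_dummies(city)}")
```

With this coding, the coefficient on each indicator is the shift in intercept for that city relative to the baseline city.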
Regression Analysis: Sales versus Flow, City1, City2, City3
# Some output deleted
The regression equation is
Sales = 1.08 + 0.104 Flow - 1.22 City1 - 0.531 City2 - 1.08 City3
S = 0.3623   R-Sq = 97.9%   R-Sq(adj) = 97.5%
Analysis of Variance
Source          DF       SS      MS       F      P
Regression       4  116.656  29.164  222.17  0.000
Residual Error  19    2.494   0.131
Total           23  119.150
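To read this output, it can help to write out the line for each city and to compare this model with the Flow-only model using a partial F-test. The sketch below uses only numbers printed in the two outputs; the label "Baseline" for the city without an indicator column (presumably a fourth city) is an assumption.

```python
# Coefficients copied from the fitted equation with city indicators.
intercept = 1.08
flow_slope = 0.104
city_shift = {"City1": -1.22, "City2": -0.531, "City3": -1.08}

# Every city shares the Flow slope; only the intercept moves.
# "Baseline" is the city with no indicator column (assumed: a fourth city).
city_intercepts = {"Baseline": intercept}
for name, shift in city_shift.items():
    city_intercepts[name] = intercept + shift

for name, b0 in city_intercepts.items():
    print(f"{name}: Sales = {b0:.3f} + {flow_slope} Flow")

# Partial F-test: do the three city indicators improve on the Flow-only model?
sse_reduced, df_reduced = 7.81, 22   # residual error, Flow only
sse_full, df_full = 2.494, 19        # residual error, Flow + City1..City3
extra_terms = df_reduced - df_full
partial_f = ((sse_reduced - sse_full) / extra_terms) / (sse_full / df_full)
print(f"Partial F({extra_terms}, {df_full}) = {partial_f:.2f}")
```

The statistic comes out at about 13.5 on (3, 19) degrees of freedom, well above the usual 5% cutoff of roughly 3.13, which suggests the cities do need separate intercepts.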
Finally, set up a model with interaction terms, so the lines can differ in both
slope and intercept.
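A minimal sketch of how the interaction columns could be built: each one is the product of Flow and a city indicator. It assumes four cities with city 4 as the baseline; the function name `interaction_row` and the flow value 50 are made up for illustration and are not taken from the data.

```python
def interaction_row(flow, city):
    """Return [Flow, City1, City2, City3, FlowCity1, FlowCity2, FlowCity3].

    Assumption: cities are numbered 1 to 4 and city 4 is the baseline.
    """
    dummies = [1 if city == level else 0 for level in (1, 2, 3)]
    # Each interaction column equals Flow for restaurants in that city, else 0.
    products = [flow * d for d in dummies]
    return [flow] + dummies + products


# Hypothetical restaurants with traffic flow 50 in city 2 and in city 4.
print(interaction_row(50, 2))
print(interaction_row(50, 4))
```

Because an interaction column is zero outside its own city, its coefficient changes the slope on Flow for that city only.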
Regression Analysis: Sales versus Flow, City1, ...
# Some output deleted
The regression equation is
Sales = 0.709 + 0.109 Flow - 0.252 City1 - 0.618 City2 - 1.20 City3
        - 0.0156 FlowCity1 + 0.0055 FlowCity2 + 0.0049 FlowCity3
S = 0.3418   R-Sq = 98.4%   R-Sq(adj) = 97.7%
Analysis of Variance
Source          DF       SS      MS       F      P
Regression       7  117.280  16.754  143.39  0.000
Residual Error  16    1.870   0.117
Total           23  119.150
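The interaction model implies a separate intercept and slope for each city, and it can be compared with the parallel-lines model using a partial F-test. The sketch below uses only the coefficients and residual sums of squares printed in the outputs; "Baseline" again stands for the city without an indicator column, which is assumed to be a fourth city.

```python
# Coefficients copied from the fitted equation with interaction terms.
intercept, flow_slope = 0.709, 0.109
# For each city: (change in intercept, change in Flow slope).
city_terms = {
    "City1": (-0.252, -0.0156),
    "City2": (-0.618, 0.0055),
    "City3": (-1.20, 0.0049),
}

# "Baseline" is the city with no indicator column (assumed: a fourth city).
city_lines = {"Baseline": (intercept, flow_slope)}
for name, (d_intercept, d_slope) in city_terms.items():
    city_lines[name] = (intercept + d_intercept, flow_slope + d_slope)

for name, (b0, b1) in city_lines.items():
    print(f"{name}: Sales = {b0:.3f} + {b1:.4f} Flow")

# Partial F-test: do the three interaction terms improve on parallel lines?
sse_reduced, df_reduced = 2.494, 19  # residual error, no interaction
sse_full, df_full = 1.870, 16        # residual error, with interaction
extra_terms = df_reduced - df_full
partial_f = ((sse_reduced - sse_full) / extra_terms) / (sse_full / df_full)
print(f"Partial F({extra_terms}, {df_full}) = {partial_f:.2f}")
```

The statistic is about 1.78 on (3, 16) degrees of freedom, below the usual 5% cutoff of roughly 3.24. Together with the small change in R-Sq(adj), from 97.5% to 97.7%, this suggests the separate slopes add little beyond the parallel-lines model.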