Another Example with Dummy Variables

 

 

Suppose we want to predict a restaurant’s sales from the traffic flow past the restaurant and the city in which the restaurant is located.

 

First, we plot sales against traffic flow, using the city number as the plotting symbol for each point. A `1` on the plot, for example, marks the sales and traffic flow of a restaurant in city 1.

Does it appear that a different line is needed to describe sales in each city?

 

If so, does each line need a different slope and intercept?
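A labeled plot of this kind can be sketched in a few lines of Python. This is only an illustration: the function name labeled_scatter and the data points are hypothetical (the numbers are made up, not the restaurant data behind the output in this example), and the plot is drawn as plain text so that no plotting library is assumed.

```python
def labeled_scatter(points, width=40, height=12):
    """Return a text scatter plot in which each point is drawn as its city label.

    points: list of (flow, sales, city) tuples.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1  # avoid dividing by zero if all x are equal
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for flow, sales, city in points:
        col = round((flow - x_min) / x_span * (width - 1))
        row = round((sales - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = str(city)  # row 0 is the top of the plot
    return "\n".join("".join(r).rstrip() for r in grid)


# Hypothetical (flow, sales, city) values, for illustration only.
points = [
    (10, 1.0, 1), (30, 3.2, 1),
    (12, 1.9, 2), (35, 4.4, 2),
    (15, 1.8, 3), (40, 4.3, 3),
    (20, 3.3, 4), (45, 5.8, 4),
]
plot = labeled_scatter(points)
print(plot)
```

If the labels for one city sit consistently above or below those of another, separate lines are worth considering; if the bands also fan out, separate slopes may be needed as well.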

 

 

Begin with a model that only uses traffic flow to predict sales:

 

Regression Analysis: Sales versus Flow

 

# Some output deleted

 

The regression equation is

Sales = 0.018 + 0.108 Flow

 

S = 0.5957      R-Sq = 93.4%     R-Sq(adj) = 93.1%

 

Analysis of Variance

Source            DF          SS          MS         F        P
Regression         1      111.34      111.34    313.75    0.000
Residual Error    22        7.81        0.35
Total             23      119.15
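The summary line (S, R-Sq, R-Sq(adj)) and the F statistic can all be recovered from the sums of squares and degrees of freedom in the Analysis of Variance table. The sketch below uses the standard formulas; the function name anova_summary is just a label chosen for this illustration.

```python
import math


def anova_summary(ss_reg, ss_err, df_reg, df_err):
    """Recompute S, R-Sq, R-Sq(adj), and F from ANOVA table entries."""
    ss_total = ss_reg + ss_err
    df_total = df_reg + df_err
    ms_reg = ss_reg / df_reg
    ms_err = ss_err / df_err
    return {
        "S": math.sqrt(ms_err),                           # residual standard error
        "R-Sq": ss_reg / ss_total,                        # share of variation explained
        "R-Sq(adj)": 1 - ms_err / (ss_total / df_total),  # penalized for model size
        "F": ms_reg / ms_err,
    }


# Entries from the Sales versus Flow table above.
flow_only = anova_summary(ss_reg=111.34, ss_err=7.81, df_reg=1, df_err=22)
for name, value in flow_only.items():
    print(f"{name}: {value:.4f}")
```

The recomputed F differs slightly from the printed 313.75 because the table entries are rounded.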

 

 

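Next, add the city to the model through the dummy variables City1, City2, and City3, with no Flow*City interaction. The city without a dummy variable serves as the baseline. This model gives parallel lines: one common slope for Flow and a different y-intercept for each city.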
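As a rough sketch of how the city enters the model, the function below turns a city label into three 0/1 dummy variables. It assumes the cities are labeled 1 through 4 and that city 4 is the one left without a dummy; neither the labels nor the choice of baseline is stated in the output, so treat both as assumptions. The name dummy_code is hypothetical.

```python
def dummy_code(city, levels=(1, 2, 3)):
    """Return [City1, City2, City3] for one restaurant.

    A city that is not in `levels` (assumed here to be city 4) gets all
    zeros, which makes it the baseline that the other cities are compared to.
    """
    return [1 if city == level else 0 for level in levels]


for city in (1, 2, 3, 4):
    print(city, dummy_code(city))
```

Three dummies are enough for four cities; a fourth dummy would duplicate the information already carried by the intercept.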

 

 

Regression Analysis: Sales versus Flow, City1, City2, City3

 

# Some output deleted

 

The regression equation is

Sales = 1.08 + 0.104 Flow - 1.22 City1 - 0.531 City2 - 1.08 City3

 

S = 0.3623      R-Sq = 97.9%     R-Sq(adj) = 97.5%

 

Analysis of Variance

Source            DF          SS          MS         F        P
Regression         4     116.656      29.164    222.17    0.000
Residual Error    19       2.494       0.131
Total             23     119.150
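To see the parallel lines in the fitted equation, set each dummy to 1 in turn (and the others to 0). The short sketch below does that arithmetic with the printed coefficients; the variable names are chosen for this illustration, and "baseline" stands for the city that has no dummy variable.

```python
# Coefficients copied from the regression equation above.
intercept = 1.08
slope = 0.104
city_shift = {"City1": -1.22, "City2": -0.531, "City3": -1.08}

# Each city keeps the common Flow slope; only the intercept moves.
parallel_lines = {"baseline": (intercept, slope)}
for name, shift in city_shift.items():
    parallel_lines[name] = (round(intercept + shift, 3), slope)

for name, (b0, b1) in parallel_lines.items():
    print(f"{name}: Sales = {b0:.3f} + {b1:.3f} Flow")
```

Each dummy coefficient is the vertical gap between that city's line and the baseline city's line at any level of traffic flow.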

 

 

 

 

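Finally, fit a model that adds the interaction terms FlowCity1, FlowCity2, and FlowCity3 (Flow multiplied by each city dummy), so that each city's line can have its own slope as well as its own intercept.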
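The interaction columns are simply Flow multiplied by each dummy. The sketch below builds one row of predictors for a single restaurant. It assumes cities labeled 1 through 4 with city 4 as the baseline, which the output does not state, and design_row is a hypothetical name.

```python
def design_row(flow, city, levels=(1, 2, 3)):
    """Return [1, Flow, City1, City2, City3, FlowCity1, FlowCity2, FlowCity3]."""
    dummies = [1 if city == level else 0 for level in levels]
    interactions = [flow * d for d in dummies]  # Flow is switched on only for that city
    return [1, flow] + dummies + interactions


print(design_row(20, 2))  # a restaurant in city 2 with a traffic flow of 20
print(design_row(20, 4))  # the assumed baseline city: all dummies and interactions are 0
```

Because a product column is nonzero only for restaurants in its own city, its coefficient adjusts the Flow slope for that city alone.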

 

Regression Analysis: Sales versus Flow, City1, ...

 

# Some output deleted

 

The regression equation is

Sales = 0.709 + 0.109 Flow - 0.252 City1 - 0.618 City2 - 1.20 City3
           - 0.0156 FlowCity1 + 0.0055 FlowCity2 + 0.0049 FlowCity3

 

S = 0.3418      R-Sq = 98.4%     R-Sq(adj) = 97.7%

 

Analysis of Variance

Source            DF          SS          MS         F        P
Regression         7     117.280      16.754    143.39    0.000
Residual Error    16       1.870       0.117
Total             23     119.150
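Two pieces of arithmetic help in reading this last fit. First, the equation implies a separate intercept and slope for each city. Second, the three models are nested, so their residual sums of squares can be compared with a partial F statistic. The partial F-test is a standard tool but is not part of the output shown here, so the sketch below is an addition for illustration; the function and variable names are hypothetical.

```python
# Coefficients copied from the interaction model's regression equation.
b0, b_flow = 0.709, 0.109
shifts = {  # (intercept shift, slope shift) for each city dummy
    "City1": (-0.252, -0.0156),
    "City2": (-0.618, 0.0055),
    "City3": (-1.20, 0.0049),
}
city_lines = {"baseline": (b0, b_flow)}
for name, (d_int, d_slope) in shifts.items():
    city_lines[name] = (b0 + d_int, b_flow + d_slope)

for name, (a, b) in city_lines.items():
    print(f"{name}: Sales = {a:.3f} + {b:.4f} Flow")


def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """F statistic for the extra terms in the fuller of two nested models."""
    extra_ms = (sse_reduced - sse_full) / (df_reduced - df_full)
    return extra_ms / (sse_full / df_full)


# Residual Error SS and DF from the three Analysis of Variance tables.
f_city = partial_f(7.81, 22, 2.494, 19)          # adding the city dummies
f_interact = partial_f(2.494, 19, 1.870, 16)     # adding the interactions
print(f"F for city dummies: {f_city:.2f} on (3, 19) df")
print(f"F for interactions: {f_interact:.2f} on (3, 16) df")
```

Each statistic would be compared with an F table at the listed degrees of freedom. The statistic for the city dummies is much larger than the one for the interaction terms, which matches the small change in S and R-Sq(adj) between the last two models.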