
First and Second Order Models and the Partial F-test (Sect. 11.11 of Text (8th Ed.))



We wish to study a company manager's pay ($y$, in thousands of dollars) as a function of the manager's satisfaction rating ($x_{1}$, which measures how happy he/she is with the job) and the number of employees in the company ($x_{2}$). A random sample of 27 managers from different companies was selected; the data are below:

                                 Number of Employees (x2)

                       50         |         55         |         60
              --------------------|--------------------|--------------------
  Sat.     80   50.8  50.7  49.4  |  93.7  90.9  90.9  |  74.5  73.0  71.2
  Rating   90   63.4  61.6  63.4  |  93.8  92.1  97.4  |  70.9  68.8  71.3
  (x1)    100   46.6  49.1  46.4  |  69.8  72.5  73.2  |  38.7  42.5  41.4

We begin by using a first order model to relate $y$ to $x_{1}$ and $x_{2}$:

\begin{displaymath}y = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + e
\end{displaymath}

The Minitab output is below.
 The regression equation is
 Pay = 106 - 0.916 Satis + 0.788 Emp 

 Predictor        Coef     SE Coef          T        p
 Constant       106.09       55.95       1.90    0.070
 Satis         -0.9161      0.3930      -2.33    0.028
 Emp            0.7878      0.7860       1.00    0.326

 s = 16.67       R-sq = 21.2%     R-sq(adj) = 14.6%

 Analysis of Variance

 SOURCE       DF          SS          MS         F        p
 Regression    2      1789.9       895.0      3.22    0.058
 Error        24      6671.5       278.0
 Total        26      8461.4

 SOURCE       DF      Seq SS
 Satis         1      1510.7
 Emp           1       279.3

The F statistic (F = 3.22, p-value = 0.058) gives only weak evidence that this model is useful in predicting pay.
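The overall F statistic is the ratio of the regression and error mean squares in the ANOVA table. A minimal Python sketch (the function name `overall_f` is our own, not part of Minitab) reproduces the value above from the rounded table entries:

```python
def overall_f(ss_reg, ss_err, df_reg, df_err):
    """Overall F statistic: MS(Regression) / MS(Error)."""
    ms_reg = ss_reg / df_reg  # mean square for regression
    ms_err = ss_err / df_err  # mean square error, s^2
    return ms_reg / ms_err


# Rounded entries from the first order ANOVA table above
f_first = overall_f(1789.9, 6671.5, df_reg=2, df_err=24)
print(round(f_first, 2))  # 3.22
```

Because the table entries are rounded, recomputed values may differ slightly in the last decimal place from the ones Minitab prints.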

Let's now try using a complete second order model, including a term to allow for an interaction between satisfaction rating and number of employees:

\begin{displaymath}y = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2}
+ \beta_{3}x_{1}^{2} + \beta_{4}x_{2}^{2} +
\beta_{5}x_{1}x_{2} + e
\end{displaymath}

We can use Minitab's Calc option to create the columns Satissq $= x_{1}^{2}$, Empsq $= x_{2}^{2}$ and SatisEmp $= x_{1}x_{2}$.
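For readers not using Minitab, the same three derived columns can be built in a few lines of Python. This sketch assumes the 27 rows are entered in the table's reading order (rating down the side, number of employees across the top, three managers per cell); the list names simply mirror the Minitab column names.

```python
# Explanatory variables in table reading order: for each rating,
# each number of employees, three managers per cell.
satis, emp = [], []
for rating in (80, 90, 100):
    for employees in (50, 55, 60):
        for _ in range(3):
            satis.append(rating)
            emp.append(employees)

# Python equivalents of the Minitab Calc columns
satissq = [s * s for s in satis]                 # x1^2
empsq = [e * e for e in emp]                     # x2^2
satisemp = [s * e for s, e in zip(satis, emp)]   # x1 * x2

print(len(satis), satissq[0], empsq[0], satisemp[0])  # 27 6400 2500 4000
```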

 The regression equation is
 Pay = - 5128 + 31.1 Satis + 140 Emp - 0.133 Satissq - 1.14 Empsq - 0.145 SatisEmp

 Predictor        Coef     SE Coef          T        p
 Constant      -5127.9       110.3     -46.49    0.000
 Satis          31.096       1.344      23.13    0.000
 Emp           139.747       3.140      44.50    0.000
 Satissq     -0.133389    0.006853     -19.46    0.000
 Empsq        -1.14422     0.02741     -41.74    0.000
 SatisEmp    -0.145500    0.009692     -15.01    0.000

 s = 1.679       R-sq = 99.3%     R-sq(adj) = 99.1%

 Analysis of Variance

 SOURCE       DF          SS          MS         F        p
 Regression    5      8402.3      1680.5    596.32    0.000
 Error        21        59.2         2.8
 Total        26      8461.4

 SOURCE      DF      Seq SS
 Satis        1      1510.7
 Emp          1       279.3
 Satissq      1      1067.6 #  1067.6 + 4909.6 + 635.1 =
 Empsq        1      4909.6 #  amount by which SSE rises if
 SatisEmp     1       635.1 #  Satissq, Empsq, SatisEmp dropped from model

Note that our F statistic and p-value give very strong evidence that this model is useful in predicting pay. Also note how $R^{2}$ has changed, from 0.212 in the first order model to 0.993 in the second order model. This also suggests that the second order model is a more appropriate choice.
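Both $R^{2} = 1 - SSE/SS_{yy}$ and the adjusted version, which divides each sum of squares by its degrees of freedom, can be recomputed from the ANOVA tables. A small Python sketch, with hypothetical helper names and the rounded SS values shown in the output:

```python
def r_squared(sse, sst):
    """Coefficient of determination, 1 - SSE/SS_yy."""
    return 1 - sse / sst


def adj_r_squared(sse, sst, n, k):
    """Adjusted R^2 for a model with k terms besides the intercept."""
    return 1 - (sse / (n - (k + 1))) / (sst / (n - 1))


SST = 8461.4  # Total SS, identical in every ANOVA table
N = 27
for label, sse, k in [("first order", 6671.5, 2), ("second order", 59.2, 5)]:
    r2 = 100 * r_squared(sse, SST)
    r2_adj = 100 * adj_r_squared(sse, SST, N, k)
    print(label, round(r2, 1), round(r2_adj, 1))
# first order 21.2 14.6
# second order 99.3 99.1
```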

However, we want to formally assess what appears to be happening: are these second order terms needed in the model? To answer this question, we use a partial F-test.

Complete model: $y = \beta_{0} + \beta_{1}x_{1} +
\beta_{2}x_{2} +
\beta_{3}x_{1}^{2} +
\beta_{4}x_{2}^{2} +
\beta_{5}x_{1}x_{2} + e$

Reduced model: $y = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + e$.

Our hypotheses are:

\begin{displaymath}H_{o}: \beta_{3} = \beta_{4}= \beta_{5} = 0 \quad
(\mbox{do not need second order terms}) \end{displaymath}


\begin{displaymath}H_{a}: \mbox{at least one of } \beta_{3}, \beta_{4}, \beta_{5} \neq 0 \end{displaymath}

We have two ways of finding $SSE_{R} - SSE_{C}$. One way is to use Minitab to fit both models, which we have done above: for the complete model $SSE_{C} = 59.2$, while for the reduced model $SSE_{R} = 6671.5$. Then

\begin{displaymath}
SSE_{R} - SSE_{C} = 6671.5 - 59.2 = 6612.3
\end{displaymath}

A second method requires fitting only the complete model. Adding the sequential sums of squares (Seq SS) for the three second order terms in the complete-model output above, we find

\begin{displaymath}
SSE_{R} - SSE_{C} = 1067.6 + 4909.6 + 635.1 = 6612.3
\end{displaymath}

We see that both methods give the same result. This value is then used to calculate our test statistic $F_{obs}$.
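Carrying out that calculation with the usual nested-model form of the partial F statistic (here we assume the notation $k = 5$ for the number of $\beta$'s other than $\beta_{0}$ in the complete model, $g = 2$ for the reduced model, and $n = 27$), we would get

\begin{displaymath}
F_{obs} = \frac{(SSE_{R} - SSE_{C})/(k - g)}{SSE_{C}/[n - (k + 1)]}
= \frac{6612.3/3}{59.2/21} = \frac{2204.1}{2.819} \approx 781.9
\end{displaymath}

Compared with an $F$ distribution on $k - g = 3$ and $n - (k+1) = 21$ degrees of freedom, whose upper 5% point is roughly 3.07, this value is enormous, so we would reject $H_{o}$ and conclude that at least one of the second order terms is needed.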




Warning: Using the Seq SS values to find $SSE_{R} - SSE_{C}$ is only guaranteed to work if the terms we propose to drop from our model are listed last in the box of explanatory variables we supply to Minitab when using regression. For example, suppose we wanted to drop the $x_{2}$, $x_{1}^{2}$ and $x_{2}^{2}$ terms from our model. This doesn't make much practical sense, since we would be keeping the interaction $x_{1}x_{2}$ while dropping the main effect $x_{2}$, but we do it for illustrative purposes.

Using the first approach above, we could fit the reduced model that only includes $x_{1}$ and $x_{1}x_{2}$:

 The regression equation is
 Pay = 149 - 1.33 Satis + 0.00749 SatisEmp

 Analysis of Variance
 
 SOURCE       DF          SS          MS         F        p
 Regression    2      1717.0       858.5      3.06    0.066
 Error        24      6744.4       281.0
 Total        26      8461.4

Then $SSE_{R} - SSE_{C} = 6744.4 - 59.2 = 6685.2$.

However, if we go back to the complete model, and add together the Seq SS values for $x_{2}$, $x_{1}^{2}$ and $x_{2}^{2}$, we get

\begin{displaymath}
\mbox{Seq SS } = 279.3 + 1067.6 + 4909.6 = 6256.5 \neq 6685.2
\end{displaymath}

We have to follow a certain procedure if we want to use the Seq SS values to calculate
$SSE_{R} - SSE_{C}$:



RULE: To use the Seq SS values to calculate $SSE_{R} - SSE_{C}$, the terms we want to consider dropping must be listed last in the box of explanatory variables we supply to Minitab.

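Refitting the complete model with the terms entered in the order Satis, SatisEmp, Emp, Satissq, Empsq gives the following (most output deleted):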

 The regression equation is
 Pay = - 5128 + 31.1 Satis - 0.145 SatisEmp + 140 Emp - 0.133 Satissq - 1.14 Empsq

 SOURCE       DF      Seq SS
 Satis         1      1510.7
 SatisEmp      1       206.4
 Emp           1       708.0
 Satissq       1      1067.5
 Empsq         1      4909.7

Adding the final three Seq SS values, which correspond to the terms $x_{2}$, $x_{1}^{2}$ and $x_{2}^{2}$ that we propose to drop, we find

\begin{displaymath}
708.0 + 1067.5 + 4909.7 = 6685.2 = SSE_{R} - SSE_{C}
\end{displaymath}

as we found previously.
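Minitab's Seq SS column can be reproduced by orthogonalizing the explanatory variables in the order they are entered, which makes it clear why the order matters. The Python sketch below is our own illustration, not a Minitab feature: it applies modified Gram-Schmidt to the raw columns, and the helper names `sequential_ss` and `dot` are hypothetical.

```python
import math

# PAY[i][j] holds the three pay values (in $000's) for satisfaction
# rating RATINGS[i] and EMPLOYEES[j] employees, as in the table above.
RATINGS = (80, 90, 100)
EMPLOYEES = (50, 55, 60)
PAY = [
    [(50.8, 50.7, 49.4), (93.7, 90.9, 90.9), (74.5, 73.0, 71.2)],
    [(63.4, 61.6, 63.4), (93.8, 92.1, 97.4), (70.9, 68.8, 71.3)],
    [(46.6, 49.1, 46.4), (69.8, 72.5, 73.2), (38.7, 42.5, 41.4)],
]

rows = [(r, e, pay)
        for i, r in enumerate(RATINGS)
        for j, e in enumerate(EMPLOYEES)
        for pay in PAY[i][j]]
y = [pay for _, _, pay in rows]

# Raw (uncentered) columns, mirroring the Minitab worksheet
COLUMNS = {
    "Satis": [r for r, _, _ in rows],
    "Emp": [e for _, e, _ in rows],
    "Satissq": [r * r for r, _, _ in rows],
    "Empsq": [e * e for _, e, _ in rows],
    "SatisEmp": [r * e for r, e, _ in rows],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sequential_ss(order):
    """Seq SS of each term when terms enter in `order` after the intercept.

    Modified Gram-Schmidt: each column is made orthogonal to the intercept
    and to every column entered before it. The squared length of the
    projection of y onto that new unit direction is the extra regression
    SS the term contributes, i.e. its Seq SS.
    """
    n = len(y)
    basis = [[1 / math.sqrt(n)] * n]  # unit-length intercept column
    seq = {}
    for name in order:
        v = list(COLUMNS[name])
        for q in basis:
            c = dot(q, v)
            v = [a - c * b for a, b in zip(v, q)]
        length = math.sqrt(dot(v, v))
        q = [a / length for a in v]
        basis.append(q)
        seq[name] = dot(q, y) ** 2
    return seq


original = sequential_ss(["Satis", "Emp", "Satissq", "Empsq", "SatisEmp"])
reordered = sequential_ss(["Satis", "SatisEmp", "Emp", "Satissq", "Empsq"])

# Valid: the dropped terms are the last ones entered
drop_second_order = sum(original[t] for t in ("Satissq", "Empsq", "SatisEmp"))
drop_reordered = sum(reordered[t] for t in ("Emp", "Satissq", "Empsq"))
# Not valid: Emp, Satissq, Empsq are not the last three terms in `original`
wrong_sum = sum(original[t] for t in ("Emp", "Satissq", "Empsq"))

print(round(drop_second_order, 1), round(drop_reordered, 1), round(wrong_sum, 1))
# 6612.3 6685.2 6256.5
```

Entering the terms in a different order leaves the complete model, and hence $SSE_{C}$, unchanged; only the way the regression SS is split among the terms changes.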



Final Note: If you compare the Total SS values given in the ANOVA table for each regression model used, you will notice that they are always 8461.4. This is no accident. If we recall that this number is $SS_{yy}$,

\begin{displaymath}
SS_{yy} = \sum_{i=1}^{n} y^{2}_{i} - n(\bar{y})^{2}
\end{displaymath}

we see that $SS_{yy}$ does not depend on which regression model we use, since it involves only the observed $y$ values and none of the $x$'s or $\beta$'s. That's why, for a given data set, the Total SS value never changes, regardless of how many explanatory variables you use in your regression model.
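As a quick check of this formula, the short Python sketch below computes $SS_{yy}$ from the 27 pay values alone, using both the shortcut form above and the definitional form $\sum (y_{i} - \bar{y})^{2}$; no $x$ values appear anywhere in it. The variable names are our own.

```python
# Pay values (in $000's) listed row by row from the table
pay = [50.8, 50.7, 49.4, 93.7, 90.9, 90.9, 74.5, 73.0, 71.2,
       63.4, 61.6, 63.4, 93.8, 92.1, 97.4, 70.9, 68.8, 71.3,
       46.6, 49.1, 46.4, 69.8, 72.5, 73.2, 38.7, 42.5, 41.4]

n = len(pay)
ybar = sum(pay) / n
ss_yy_shortcut = sum(v * v for v in pay) - n * ybar ** 2  # formula in the text
ss_yy_deviation = sum((v - ybar) ** 2 for v in pay)       # definitional form
print(round(ss_yy_shortcut, 1), round(ss_yy_deviation, 1))  # 8461.4 8461.4
```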




Gary Sneddon 2003-10-08