We wish to study a company manager's pay ($y$, in thousands of dollars) as a function of the satisfaction rating of the manager ($x_1$, which measures how happy he/she is with the job) and the number of employees in the company ($x_2$). A random sample of 27 managers from different companies was selected, and the data are below:
                                  Number of Employees
    Sat. Rating |        50          |        55          |        60
    ------------|--------------------|--------------------|--------------------
         80     |  50.8  50.7  49.4  |  93.7  90.9  90.9  |  74.5  73.0  71.2
         90     |  63.4  61.6  63.4  |  93.8  92.1  97.4  |  70.9  68.8  71.3
        100     |  46.6  49.1  46.4  |  69.8  72.5  73.2  |  38.7  42.5  41.4
We begin by using a first order model, $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, to relate $y$ to $x_1$ and $x_2$:
    The regression equation is
    Pay = 106 - 0.916 Satis + 0.788 Emp

    Predictor      Coef   SE Coef      T      p
    Constant     106.09     55.95   1.90  0.070
    Satis       -0.9161    0.3930  -2.33  0.028
    Emp          0.7878    0.7860   1.00  0.326

    s = 16.67   R-sq = 21.2%   R-sq(adj) = 14.6%

    Analysis of Variance

    SOURCE       DF       SS      MS     F      p
    Regression    2   1789.9   895.0  3.22  0.058
    Error        24   6671.5   278.0
    Total        26   8461.4

    SOURCE       DF   Seq SS
    Satis         1   1510.7
    Emp           1    279.3
The F statistic (3.22) and its p-value (0.058) give only weak evidence that this model is useful for predicting pay.
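The first order fit can be cross-checked outside Minitab. Below is a minimal sketch in Python, assuming NumPy is available. The document itself uses only Minitab, and the names `cells`, `x1`, `x2` and `y` are illustrative. The sketch enters the 27 observations from the table and fits the first order model by least squares with `numpy.linalg.lstsq`. It then recomputes the ANOVA quantities, including the overall statistic $F = MSR/MSE$.

```python
import numpy as np

# Pay ($000's) for the 3 managers in each (satisfaction rating, employees) cell,
# transcribed from the data table.
cells = {
    (80, 50): [50.8, 50.7, 49.4], (80, 55): [93.7, 90.9, 90.9], (80, 60): [74.5, 73.0, 71.2],
    (90, 50): [63.4, 61.6, 63.4], (90, 55): [93.8, 92.1, 97.4], (90, 60): [70.9, 68.8, 71.3],
    (100, 50): [46.6, 49.1, 46.4], (100, 55): [69.8, 72.5, 73.2], (100, 60): [38.7, 42.5, 41.4],
}
# One entry per manager: x1 = satisfaction rating, x2 = employees, y = pay.
x1 = np.array([s for (s, e), ys in cells.items() for _ in ys], dtype=float)
x2 = np.array([e for (s, e), ys in cells.items() for _ in ys], dtype=float)
y = np.array([v for ys in cells.values() for v in ys])

# First order model: columns for the intercept, x1 and x2.
X = np.column_stack([np.ones_like(y), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

n, p = X.shape                                 # p = k + 1 estimated parameters
k = p - 1
sse = float(np.sum((y - X @ coef) ** 2))       # Error SS
ss_total = float(np.sum((y - y.mean()) ** 2))  # Total SS
ssr = ss_total - sse                           # Regression SS
f_stat = (ssr / k) / (sse / (n - p))           # F = MSR / MSE

print("coefficients:", np.round(coef, 4))
print(f"SSR = {ssr:.1f}, SSE = {sse:.1f}, Total = {ss_total:.1f}, F = {f_stat:.2f}")
```

It should reproduce the Minitab output up to rounding: coefficients of about 106.09, -0.9161 and 0.7878, SSR = 1789.9, SSE = 6671.5, Total = 8461.4 and F = 3.22.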
Let's now try using a complete second order model, including a term to
allow for an interaction between satisfaction rating and number of
employees:
    The regression equation is
    Pay = - 5128 + 31.1 Satis + 140 Emp - 0.133 Satissq - 1.14 Empsq - 0.145 SatisEmp

    Predictor        Coef    SE Coef       T      p
    Constant      -5127.9      110.3  -46.49  0.000
    Satis          31.096      1.344   23.13  0.000
    Emp           139.747      3.140   44.50  0.000
    Satissq     -0.133389   0.006853  -19.46  0.000
    Empsq        -1.14422    0.02741  -41.74  0.000
    SatisEmp    -0.145500   0.009692  -15.01  0.000

    s = 1.679   R-sq = 99.3%   R-sq(adj) = 99.1%

    Analysis of Variance

    SOURCE       DF       SS       MS       F      p
    Regression    5   8402.3   1680.5  596.32  0.000
    Error        21     59.2      2.8
    Total        26   8461.4

    SOURCE       DF   Seq SS
    Satis         1   1510.7
    Emp           1    279.3
    Satissq       1   1067.6   # 1067.6 + 4909.6 + 635.1 =
    Empsq         1   4909.6   # amount by which SSE rises if
    SatisEmp      1    635.1   # Satissq, Empsq, SatisEmp dropped from model

Note that our F statistic (596.32) and p-value (0.000) give very strong evidence that this model is useful in predicting pay. Also note how much better it fits than the first order model: R-sq rises from 21.2% to 99.3%, and s falls from 16.67 to 1.679.
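A similar sketch fits the complete second order model. It again assumes Python with NumPy, and the variable names are illustrative. The columns are built in the same order as the Minitab predictors Satis, Emp, Satissq, Empsq and SatisEmp, so the coefficient vector lines up with the Coef column of the output.

```python
import numpy as np

# Pay ($000's) for the 3 managers in each (satisfaction rating, employees) cell.
cells = {
    (80, 50): [50.8, 50.7, 49.4], (80, 55): [93.7, 90.9, 90.9], (80, 60): [74.5, 73.0, 71.2],
    (90, 50): [63.4, 61.6, 63.4], (90, 55): [93.8, 92.1, 97.4], (90, 60): [70.9, 68.8, 71.3],
    (100, 50): [46.6, 49.1, 46.4], (100, 55): [69.8, 72.5, 73.2], (100, 60): [38.7, 42.5, 41.4],
}
x1 = np.array([s for (s, e), ys in cells.items() for _ in ys], dtype=float)  # satisfaction
x2 = np.array([e for (s, e), ys in cells.items() for _ in ys], dtype=float)  # employees
y = np.array([v for ys in cells.values() for v in ys])                       # pay

# Complete second order model: intercept, x1, x2, x1^2, x2^2, x1*x2.
X = np.column_stack([np.ones_like(y), x1, x2, x1**2, x2**2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

n, p = X.shape
sse_c = float(np.sum((y - X @ coef) ** 2))     # Error SS of the complete model
ss_total = float(np.sum((y - y.mean()) ** 2))
r_sq = 1 - sse_c / ss_total
s = np.sqrt(sse_c / (n - p))                   # estimated error standard deviation

print("coefficients:", coef)
print(f"SSE = {sse_c:.1f}, s = {s:.3f}, R-sq = {100 * r_sq:.1f}%")
```

Up to rounding, this should give SSE = 59.2, s = 1.679 and R-sq = 99.3%. `lstsq` is SVD-based, so it handles the raw squared and product columns here. Centering $x_1$ and $x_2$ first would improve numerical conditioning and change the coefficients, but not SSE.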
However, we want to formally assess what appears to be happening: are these second order terms needed in the model? To answer this question, we use a partial F-test.
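Complete model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2$

Reduced model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$.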
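Our hypotheses are: $H_0: \beta_3 = \beta_4 = \beta_5 = 0$ versus $H_a$: at least one of $\beta_3$, $\beta_4$, $\beta_5$ is nonzero.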
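The partial F-test measures how much SSE increases when the second order terms are dropped, relative to the error variance of the complete model. This section does not define symbols for the parameter counts, so the following uses common textbook notation as an assumption. Let $k$ be the number of $\beta$'s (excluding $\beta_0$) in the complete model, $g$ the number in the reduced model, and $n$ the sample size. The test statistic is then

$$
F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}.
$$

Here $k = 5$, $g = 2$ and $n = 27$. Under $H_0$, $F$ therefore has an F distribution with $k - g = 3$ numerator and $n - (k + 1) = 21$ denominator degrees of freedom, and large values of $F$ count against $H_0$.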
We have two ways of finding $SSE_R - SSE_C$.
One way is to use Minitab to fit both models, which we have done above. For the complete model we see that $SSE_C = 59.2$, while $SSE_R = 6671.5$ for the reduced model. Then $SSE_R - SSE_C = 6671.5 - 59.2 = 6612.3$.
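The partial F-test arithmetic can be sketched in a few lines of plain Python, using only the SSE values from the two ANOVA tables. The counts `k` and `g` follow common textbook notation and are assumptions here. The 5% critical value $F_{0.05}(3, 21) \approx 3.07$ is also taken from a standard F table rather than computed.

```python
# Partial F-test for H0: beta3 = beta4 = beta5 = 0, using the SSE values
# from the two Minitab ANOVA tables.
sse_reduced = 6671.5    # first order (reduced) model
sse_complete = 59.2     # complete second order model
n = 27                  # number of managers
k = 5                   # betas besides beta0 in the complete model (assumed notation)
g = 2                   # betas besides beta0 in the reduced model (assumed notation)

drop_in_fit = sse_reduced - sse_complete        # SSE_R - SSE_C
mse_complete = sse_complete / (n - (k + 1))     # MSE of the complete model
f_stat = (drop_in_fit / (k - g)) / mse_complete

# 5% critical value for F with (3, 21) df, read from an F table (not computed here).
f_crit = 3.07

print(f"SSE_R - SSE_C = {drop_in_fit:.1f}")
print(f"F = {f_stat:.1f} on ({k - g}, {n - (k + 1)}) df; reject H0: {f_stat > f_crit}")
```

With these numbers, F is about 781.9 on 3 and 21 degrees of freedom, far above 3.07. We would therefore reject $H_0$ and conclude that at least one second order term is useful for predicting pay.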
A second method is to fit only the complete model, as shown in the Minitab output above. Using the sequential sums of squares (Seq SS) from that output, we find $SSE_R - SSE_C = 1067.6 + 4909.6 + 635.1 = 6612.3$, which agrees with the first method.
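The following sketch shows what the Seq SS column measures. It assumes Python with NumPy, and the helpers `sse` and `sequential_ss` are hypothetical names, not Minitab functions. A term's sequential sum of squares is the drop in SSE when that term is added to a model already containing every term listed before it. So the Seq SS of the terms listed last add up to $SSE_R - SSE_C$ for dropping them.

```python
import numpy as np

# Pay ($000's) for the 3 managers in each (satisfaction rating, employees) cell.
cells = {
    (80, 50): [50.8, 50.7, 49.4], (80, 55): [93.7, 90.9, 90.9], (80, 60): [74.5, 73.0, 71.2],
    (90, 50): [63.4, 61.6, 63.4], (90, 55): [93.8, 92.1, 97.4], (90, 60): [70.9, 68.8, 71.3],
    (100, 50): [46.6, 49.1, 46.4], (100, 55): [69.8, 72.5, 73.2], (100, 60): [38.7, 42.5, 41.4],
}
x1 = np.array([s for (s, e), ys in cells.items() for _ in ys], dtype=float)
x2 = np.array([e for (s, e), ys in cells.items() for _ in ys], dtype=float)
y = np.array([v for ys in cells.values() for v in ys])


def sse(columns):
    """SSE of a least squares fit of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y)] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))


def sequential_ss(terms):
    """Seq SS for each (name, column) pair, in the order given: the drop in
    SSE when the term joins a model holding all earlier terms."""
    out, used = {}, []
    prev = sse(used)                 # intercept-only model: SSE = Total SS
    for name, col in terms:
        used.append(col)
        cur = sse(used)
        out[name] = prev - cur
        prev = cur
    return out


terms = [("Satis", x1), ("Emp", x2), ("Satissq", x1**2), ("Empsq", x2**2), ("SatisEmp", x1 * x2)]
seq = sequential_ss(terms)
for name, ss in seq.items():
    print(f"{name:<9}{ss:8.1f}")

# The terms being tested are listed last, so their Seq SS sum to SSE_R - SSE_C.
print("SSE_R - SSE_C =", round(seq["Satissq"] + seq["Empsq"] + seq["SatisEmp"], 1))
```

The printed values should match the Seq SS column to within rounding in the last digit. For example, Minitab shows 4909.6 for Empsq, while this sketch prints 4909.7. The last three values add to 6612.3.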
Warning: Using the Seq SS values to find $SSE_R - SSE_C$ is only guaranteed to work if the variables we propose to drop from our model are listed last in the box of explanatory variables we supply to Minitab when using regression.
For example, suppose we wanted to drop the $x_2$, $x_1^2$ and $x_2^2$ terms from our model. This doesn't make much sense, since we are keeping the $x_1 x_2$ term, but we do it for illustrative purposes.
Using the first approach above, we could fit the reduced model that includes only $x_1$ and $x_1 x_2$:
    The regression equation is
    Pay = 149 - 1.33 Satis + 0.00749 SatisEmp

    Analysis of Variance

    SOURCE       DF       SS      MS     F      p
    Regression    2   1717.0   858.5  3.06  0.066
    Error        24   6744.4   281.0
    Total        26   8461.4
Then $SSE_R - SSE_C = 6744.4 - 59.2 = 6685.2$.
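This first approach can be sketched the same way, assuming Python with NumPy; `sse` is a hypothetical helper. The sketch fits the complete model and the reduced model that contains only Satis and SatisEmp, then subtracts their SSE values.

```python
import numpy as np

# Pay ($000's) for the 3 managers in each (satisfaction rating, employees) cell.
cells = {
    (80, 50): [50.8, 50.7, 49.4], (80, 55): [93.7, 90.9, 90.9], (80, 60): [74.5, 73.0, 71.2],
    (90, 50): [63.4, 61.6, 63.4], (90, 55): [93.8, 92.1, 97.4], (90, 60): [70.9, 68.8, 71.3],
    (100, 50): [46.6, 49.1, 46.4], (100, 55): [69.8, 72.5, 73.2], (100, 60): [38.7, 42.5, 41.4],
}
x1 = np.array([s for (s, e), ys in cells.items() for _ in ys], dtype=float)
x2 = np.array([e for (s, e), ys in cells.items() for _ in ys], dtype=float)
y = np.array([v for ys in cells.values() for v in ys])


def sse(columns):
    """SSE of a least squares fit of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y)] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))


sse_c = sse([x1, x2, x1**2, x2**2, x1 * x2])   # complete second order model
sse_r = sse([x1, x1 * x2])                      # reduced model: Satis and SatisEmp only
print(f"SSE_R = {sse_r:.1f}, SSE_C = {sse_c:.1f}, SSE_R - SSE_C = {sse_r - sse_c:.1f}")
```

This should print SSE_R = 6744.4, SSE_C = 59.2 and a difference of 6685.2.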
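However, if we go back to the complete model and add together the Seq SS values for $x_2$, $x_1^2$ and $x_2^2$, we get $279.3 + 1067.6 + 4909.6 = 6256.5$, which does not agree with the first approach.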
We have to follow a certain procedure if we want to use the Seq SS values to calculate $SSE_R - SSE_C$:
RULE: To use the Seq SS values to calculate $SSE_R - SSE_C$, the terms we want to consider dropping must be listed last in the box of explanatory variables we supply to Minitab.
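Refitting the complete model with the explanatory variables listed in the order Satis, SatisEmp, Emp, Satissq, Empsq, so that the three terms we want to drop come last, gives:

    # Most output deleted
    The regression equation is
    Pay = - 5128 + 31.1 Satis - 0.145 SatisEmp + 140 Emp - 0.133 Satissq - 1.14 Empsq

    SOURCE       DF   Seq SS
    Satis         1   1510.7
    SatisEmp      1    206.4
    Emp           1    708.0
    Satissq       1   1067.5
    Empsq         1   4909.7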
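Then, by adding the final three Seq SS values, which correspond to dropping $x_2$, $x_1^2$ and $x_2^2$, we find $SSE_R - SSE_C = 708.0 + 1067.5 + 4909.7 = 6685.2$, which agrees with the first approach.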
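The order dependence behind the RULE can be checked directly. The sketch below assumes Python with NumPy, and `sse` and `sequential_ss` are hypothetical helpers. It computes sequential sums of squares for the two orderings used above. For each ordering, it adds up the Seq SS of the terms being dropped ($x_2$, $x_1^2$, $x_2^2$).

```python
import numpy as np

# Pay ($000's) for the 3 managers in each (satisfaction rating, employees) cell.
cells = {
    (80, 50): [50.8, 50.7, 49.4], (80, 55): [93.7, 90.9, 90.9], (80, 60): [74.5, 73.0, 71.2],
    (90, 50): [63.4, 61.6, 63.4], (90, 55): [93.8, 92.1, 97.4], (90, 60): [70.9, 68.8, 71.3],
    (100, 50): [46.6, 49.1, 46.4], (100, 55): [69.8, 72.5, 73.2], (100, 60): [38.7, 42.5, 41.4],
}
x1 = np.array([s for (s, e), ys in cells.items() for _ in ys], dtype=float)
x2 = np.array([e for (s, e), ys in cells.items() for _ in ys], dtype=float)
y = np.array([v for ys in cells.values() for v in ys])


def sse(columns):
    """SSE of a least squares fit of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y)] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))


def sequential_ss(terms):
    """Seq SS for each (name, column) pair: drop in SSE as each term is added in turn."""
    out, used = {}, []
    prev = sse(used)
    for name, col in terms:
        used.append(col)
        cur = sse(used)
        out[name] = prev - cur
        prev = cur
    return out


columns = {"Satis": x1, "Emp": x2, "Satissq": x1**2, "Empsq": x2**2, "SatisEmp": x1 * x2}
dropped = ["Emp", "Satissq", "Empsq"]
orders = [
    ["Satis", "Emp", "Satissq", "Empsq", "SatisEmp"],   # order of the first complete fit
    ["Satis", "SatisEmp", "Emp", "Satissq", "Empsq"],   # terms to drop listed last
]

totals = []
for order in orders:
    seq = sequential_ss([(name, columns[name]) for name in order])
    total = sum(seq[name] for name in dropped)
    totals.append(total)
    label = " -> ".join(order)
    print(f"{label}: Seq SS of dropped terms = {total:.1f}")
```

The first ordering gives 6256.5 and the second gives 6685.2. Only in the second ordering is each dropped term added after every term that stays in the model (Satis and SatisEmp). That is exactly the comparison between the complete and reduced models, so only that sum equals $SSE_R - SSE_C$.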
Final Note: If you compare the Total SS values given in the ANOVA table for each regression model used, you will notice that it is always 8461.4. This is no accident. If you recall that this number is:

$$SS_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$