From the output below, we see the least squares line is
predicted Univ GPA = 0.72 + 0.611 (HS GPA) + 0.00271 (SAT) + 0.0463 (Activities).
This means that, as SAT increases by 1 point (with high school GPA and activities held fixed),
we predict that university GPA will increase by 0.00271 points.

Regression Analysis: Univ GPA versus HS GPA, SAT, Activiti

The regression equation is
Univ GPA = 0.72 + 0.611 HS GPA + 0.00271 SAT + 0.0463 Activiti
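From the Minitab output below, $F = 12.96$ on 3 and 96 df, with p-value = 0.000 (to three decimal places).
So we have very strong evidence against $H_0: \beta_1 = \beta_2 = \beta_3 = 0$.
The model appears useful in predicting university GPA.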
Analysis of Variance

Source           DF       SS      MS      F      P
Regression        3  160.237  53.412  12.96  0.000
Residual Error   96  395.697   4.122
Total            99  555.934
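As a quick check on how the F statistic is built, the sketch below recomputes the mean squares and F from the sums of squares and degrees of freedom printed in the table. The variable names are illustrative only, and the inputs are the rounded values shown in the output, so agreement is only to the printed precision.

```python
# Values copied from the Analysis of Variance table (rounded as printed).
ss_regression, df_regression = 160.237, 3
ss_error, df_error = 395.697, 96

# A mean square is a sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# The overall F statistic is the ratio of the two mean squares.
f_statistic = ms_regression / ms_error

print(f"MS regression = {ms_regression:.3f}")
print(f"MS error      = {ms_error:.3f}")
print(f"F             = {f_statistic:.2f}")
```

The printed values should agree with the MS and F columns of the table.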
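From the output below: $t = 0.72$, p-value = 0.472, using the T-distribution with 96 df.
There is very little evidence against $H_0: \beta_3 = 0$, where $\beta_3$ is the coefficient of activities.
So activities can probably be dropped from the model.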
Predictor      Coef   SE Coef     T      P
Constant      0.721     1.870  0.39  0.701
HS GPA       0.6109    0.1007  6.06  0.000
SAT        0.002708  0.002873  0.94  0.348
Activiti    0.04625   0.06405  0.72  0.472
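Each T value in the table is the coefficient divided by its standard error. The sketch below recomputes the ratios from the printed (rounded) values; the dictionary layout and names are illustrative. Because Minitab computes T from unrounded numbers, the recomputed ratios can differ from the printed ones in the last digit.

```python
# (coefficient, standard error, T as printed) copied from the table.
rows = {
    "Constant": (0.721, 1.870, 0.39),
    "HS GPA": (0.6109, 0.1007, 6.06),
    "SAT": (0.002708, 0.002873, 0.94),
    "Activiti": (0.04625, 0.06405, 0.72),
}

# t ratio for testing H0: coefficient = 0 is Coef / SE Coef.
t_ratios = {name: coef / se for name, (coef, se, _) in rows.items()}

for name, (_, _, printed_t) in rows.items():
    print(f"{name:<9} recomputed T = {t_ratios[name]:.3f}   printed T = {printed_t}")
```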
From the Minitab output below, the 90% PI is (4.669, 11.488).
Predicted Values for New Observations

New Obs    Fit  SE Fit       90.0% CI           90.0% PI
      1  8.079   0.303  ( 7.575,  8.582)  ( 4.669, 11.488)

Values of Predictors for New Observations

New Obs  HS GPA  SAT  Activiti
      1    9.00  550      8.00
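To see where the fit and the two intervals come from, here is a minimal sketch using only numbers printed in the Minitab output (coefficients, SE Fit, and the MS for Residual Error). The critical value 1.661 is an assumption taken from a t-table (upper 5% point with 96 df), not something shown in the output, and the variable names are illustrative.

```python
import math

# Coefficients and new-observation values copied from the Minitab output.
coefficients = {"Constant": 0.721, "HS GPA": 0.6109, "SAT": 0.002708, "Activiti": 0.04625}
new_obs = {"HS GPA": 9.00, "SAT": 550, "Activiti": 8.00}

# Fitted value: intercept plus each coefficient times its predictor value.
fit = coefficients["Constant"] + sum(coefficients[k] * v for k, v in new_obs.items())

se_fit = 0.303     # "SE Fit" from the output
ms_error = 4.122   # MS for Residual Error from the ANOVA table
t_crit = 1.661     # ASSUMPTION: t-table value for 90% confidence, 96 df

# The CI for the mean response uses SE Fit alone; the PI for a single
# new observation also includes the error variance.
ci_half = t_crit * se_fit
pi_half = t_crit * math.sqrt(ms_error + se_fit**2)

ci = (fit - ci_half, fit + ci_half)
pi = (fit - pi_half, fit + pi_half)

print(f"Fit    = {fit:.3f}")
print(f"90% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"90% PI = ({pi[0]:.3f}, {pi[1]:.3f})")
```

The PI is much wider than the CI because it has to allow for the variability of an individual student around the mean, not just the uncertainty in the estimated mean.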
NOTE: The plots can use either the standardized residuals or the residuals.
The plot of the residuals vs. the fitted values does not show any
pattern, nor do there appear to be any outliers. Therefore, most of
our model assumptions seem satisfied.
The QQ-plot appears linear, suggesting that it is safe to assume that the errors are normally distributed.
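From the Minitab output below: $t = 0.33$, p-value = 0.741, using the T-distribution with 95 df.
There is very little evidence against $H_0: \beta_4 = 0$, where $\beta_4$ is the coefficient of the interaction term.
So we don't need the interaction term.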
NOTE: You may have noticed a strange thing in this problem.
Without the interaction term, it appears that one variable (high school
GPA) is useful. However, when a (high school GPA × SAT)
interaction is included, no variables appear useful.
To understand bizarre results like this, you'll definitely want to attend Stats 3521 in the Winter :-)
Predictor      Coef   SE Coef      T      P
Constant      2.738     6.375   0.43  0.669
HS GPA       0.3519    0.7888   0.45  0.657
SAT        -0.00104   0.01167  -0.09  0.929
Activiti    0.04886   0.06483   0.75  0.453
gpasat     0.000480  0.001449   0.33  0.741
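From the output below: $t = -1.36$, p-value = 0.194, using the T-distribution with 14 df.
There is little evidence against $H_0: \beta_3 = 0$, where $\beta_3$ is the coefficient of apartment vacancy rate.
So there is little evidence that the apartment vacancy rate is needed in the model.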
The regression equation is
Drywall = - 138 + 4.97 Permits + 20.7 Mortgage - 10.9 A Vacanc + 0.50 O Vacanc

Predictor     Coef  SE Coef      T      P
Constant    -137.9    163.0  -0.85  0.412
Permits     4.9657   0.4869  10.20  0.000
Mortgage     20.70    19.77   1.05  0.313
A Vacanc   -10.946    8.019  -1.36  0.194
O Vacanc     0.504    3.336   0.15  0.882

S = 43.68   R-Sq = 89.6%   R-Sq(adj) = 86.6%

Analysis of Variance

Source          DF      SS     MS      F      P
Regression       4  229881  57470  30.11  0.000
Residual Error  14   26717   1908
Total           18  256599
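The summary line (S, R-Sq, R-Sq(adj)) and the F statistic can all be recovered from the Analysis of Variance table. The sketch below does this with the rounded sums of squares printed above; the variable names are illustrative, and small discrepancies in the last digit are due to rounding in the output.

```python
import math

# Values copied from the Analysis of Variance table (rounded as printed).
ss_regression, df_regression = 229881, 4
ss_error, df_error = 26717, 14
ss_total, df_total = 256599, 18

ms_error = ss_error / df_error

# S is the square root of the mean square error.
s = math.sqrt(ms_error)

# R-Sq is the share of total variation explained by the regression.
r_sq = ss_regression / ss_total

# R-Sq(adj) compares mean squares instead of sums of squares,
# which penalizes the model for each extra predictor.
r_sq_adj = 1 - ms_error / (ss_total / df_total)

f_statistic = (ss_regression / df_regression) / ms_error

print(f"S         = {s:.2f}")
print(f"R-Sq      = {100 * r_sq:.1f}%")
print(f"R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"F         = {f_statistic:.2f}")
```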
From the output below, the 99% CI for mean sales if 40 permits
were issued, mortgage rates were 8.5%, and apartment and office vacancy rates were
2% and 10% respectively is (126.3, 313.4).
Predicted Values for New Observations

New Obs    Fit  SE Fit       99.0% CI          99.0% PI
      1  219.9    31.4  ( 126.3, 313.4)  (  59.7, 380.0)

Values of Predictors for New Observations

New Obs  Permits  Mortgage  A Vacanc  O Vacanc
      1     40.0      8.50      2.00      10.0
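As a rough check on the 99% CI for mean sales, the sketch below plugs the new-observation values into the fitted equation and forms fit ± t × (SE Fit). The critical value 2.977 is an assumption taken from a t-table (upper 0.5% point with 14 df), and the names are illustrative. Since the inputs are rounded as printed, the results match Minitab only to about one decimal place.

```python
# Coefficients and new-observation values copied from the Minitab output.
coefficients = {
    "Constant": -137.9,
    "Permits": 4.9657,
    "Mortgage": 20.70,
    "A Vacanc": -10.946,
    "O Vacanc": 0.504,
}
new_obs = {"Permits": 40.0, "Mortgage": 8.50, "A Vacanc": 2.00, "O Vacanc": 10.0}

# Fitted mean sales at the new predictor values.
fit = coefficients["Constant"] + sum(coefficients[k] * v for k, v in new_obs.items())

se_fit = 31.4    # "SE Fit" from the output
t_crit = 2.977   # ASSUMPTION: t-table value for 99% confidence, 14 df

# CI for the mean response: fit plus or minus t times the SE of the fit.
half_width = t_crit * se_fit
ci = (fit - half_width, fit + half_width)

print(f"Fit    = {fit:.1f}")
print(f"99% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```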
The plot of the residuals vs. the fitted values does not appear to
have a pattern or any noticeable outliers (even though the Minitab
output flags one potential outlier). Therefore most of our model
assumptions appear satisfied.
The QQ-plot appears linear, so the assumption of normally distributed errors is reasonable.
As an added point (doesn't have to be included in discussion), it looks like the least squares line comes reasonably close to the unusual value on our plot.
This pattern implies that one (or more) of our model assumptions is not satisfied.
Our unusual point also stands out for being far away from the
other points, in terms of its
value, and because it
also has a reasonably large residual.
(b) The plot of the least squares line on the new data is
attached. The
value is 0.129, which is much
smaller than observed with the entire dataset. This is
telling us that the model we have is not doing a very good
job.
NOTE: The explanation below is not required.
What happened? It's difficult to say, to be honest. If we
look at this new plot, it seems like the data fall into two
different groupings on our plot. However, the least squares
line goes right between the groupings, so is a poor
description of the relationship. So the low
is
probably reflecting this.
However, that doesn't explain why we got a more reasonable
value of
with the unusual value included.
I guess this problem is a good illustration of what an influential observation can do, and why we don't always have simple answers to things in statistics.
(c) Residual plot. Although there is no dramatic pattern
like before, there appears to be a pattern. In particular, it
looks like the residuals are getting further from 0 as
increases.
This suggests that the assumption that the errors have constant variance is violated.
The relationship is definitely not linear. The pattern appears
to be curved, with ENE dropping quickly as weight increases at
first, then leveling off, and perhaps increasing slightly.
From output:
,
p-value =
.
Test at
: Since
, reject
.
Equivalently, look up
in T-table, and
reject since
.
In the words of the problem: we need the weight-squared term, or we conclude that there is a quadratic relationship between weight and ENE.