SOURCE       DF   SS   MS      F      p
Regression    6   60   10      13.00  0.000
Error        26   20   0.769
Total        32   80
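The table can be checked with a short Python sketch. Each mean square is its SS divided by its df, F is MS(Regression)/MSE, and the Total row is the sum of the other two rows. The variable names are only illustrative.

```python
# Entries from the ANOVA table above (Regression and Error rows)
df_reg, ss_reg = 6, 60.0
df_err, ss_err = 26, 20.0

ms_reg = ss_reg / df_reg   # mean square for regression
mse = ss_err / df_err      # mean square error
f_stat = ms_reg / mse      # overall F statistic

# The Total row is the sum of the Regression and Error rows
df_tot = df_reg + df_err
ss_tot = ss_reg + ss_err

print(f"MS(Regression) = {ms_reg:.3f}, MSE = {mse:.3f}, F = {f_stat:.2f}")
print(f"Total: DF = {df_tot}, SS = {ss_tot:.0f}")
```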
Reasons: Minitab's cumulative distribution function for the F distribution with 6 and 26 df gives

x       P( X <= x )
13.00   1.000

So $P(F \le 13.00) \approx 1$, and the p-value is $P(F > 13.00) \approx 1 - 1 = 0.000$.
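Minitab's F cumulative distribution function does the work here. For a rough cross-check with no dependencies, the sketch below integrates the F density numerically using Simpson's rule. This numerical approach is my own substitute, not Minitab's routine, and the helper names `f_pdf` and `f_cdf` are hypothetical. It assumes a numerator df of at least 2, so the density is finite at 0.

```python
import math

def f_pdf(x: float, d1: int, d2: int) -> float:
    """Density of the F(d1, d2) distribution (assumes d1 >= 2)."""
    if x < 0:
        return 0.0
    log_const = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
                 + (d1 / 2) * math.log(d1 / d2))
    return math.exp(log_const) * x ** (d1 / 2 - 1) * (1 + d1 * x / d2) ** (-(d1 + d2) / 2)

def f_cdf(x: float, d1: int, d2: int, n: int = 20000) -> float:
    """P(F <= x), approximated by composite Simpson's rule on [0, x]; n must be even."""
    if x <= 0:
        return 0.0
    h = x / n
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3

# Upper-tail probability for the observed F = 13.00 with 6 and 26 df
p_value = 1 - f_cdf(13.00, 6, 26)
print(f"P(F <= 13.00) = {f_cdf(13.00, 6, 26):.6f}")
print(f"p-value = P(F > 13.00) = {p_value:.2e}")
```

Minitab reports this p-value as 0.000 because it rounds to three decimal places.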
Regression Analysis: Salary versus YrsEm, YrsEmsq, Educ, Educsq

The regression equation is
Salary = 24187 + 844 YrsEm - 7.7 YrsEmsq + 1112 Educ + 76.0 Educsq
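From the output, we see that the fitted model is $\widehat{\text{Salary}} = 24187 + 844\,\text{YrsEm} - 7.7\,\text{YrsEmsq} + 1112\,\text{Educ} + 76.0\,\text{Educsq}$.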
Analysis of Variance

Source           DF          SS          MS      F      P
Regression        4  4052707001  1013176750  29.85  0.000
Residual Error   41  1391544402    33940107
Total            45  5444251403
From the output, $F = 29.85$ with p-value $= 0.000$.
Therefore we have very strong evidence against $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$.
So it appears that the model is useful in predicting the salary.
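Predicted Values for New Observations

New Obs    Fit  SE Fit      90.0% CI           90.0% PI
1        36111    1135  (34201, 38021)    (26123, 46100)

Values of Predictors for New Observations

New Obs  YrsEm  yrssq  Educ  educsq
1         8.00   64.0  4.00    16.0

From the output, we see that for YrsEm = 8 and Educ = 4 the fitted salary is 36111. The 90% confidence interval for the mean salary is (34201, 38021), and the 90% prediction interval for an individual salary is (26123, 46100).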
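Here is a minimal sketch of how the two intervals are built from the output. The CI uses SE Fit alone, while the PI also adds the error variance MSE = 33940107 from the ANOVA table. The t critical value $t_{0.05}(41) \approx 1.683$ is a t-table value I have supplied; it does not appear in the Minitab output.

```python
import math

# Fitted value from the rounded regression equation at YrsEm = 8, Educ = 4
fit_from_eq = 24187 + 844 * 8 - 7.7 * 8**2 + 1112 * 4 + 76.0 * 4**2
# about 36110; Minitab's 36111 uses the unrounded coefficients

fit = 36111.0      # Fit
se_fit = 1135.0    # SE Fit
mse = 33940107.0   # MS for Residual Error, 41 df
t_crit = 1.683     # t_{0.05}(41) from a t-table (supplied, not in the output)

ci_half = t_crit * se_fit                       # half-width for the mean response
pi_half = t_crit * math.sqrt(mse + se_fit**2)   # half-width for a new observation

ci = (fit - ci_half, fit + ci_half)
pi = (fit - pi_half, fit + pi_half)
print(f"Fit from rounded equation: {fit_from_eq:.1f}")
print(f"90% CI: ({ci[0]:.0f}, {ci[1]:.0f})")
print(f"90% PI: ({pi[0]:.0f}, {pi[1]:.0f})")
```

Small differences in the last digit come from rounding in the t value and in the reported Fit.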
NOTE: I will use the standardized residuals in my plots. It is perfectly fine to use the regular residuals, since the plots will have essentially the same shape. Therefore, full credit will be given if the regular residuals are used in the plots.
First, the residuals vs. the fitted ($\hat{y}$) values. There does seem to be a pattern in this plot: as the fitted values increase, the residuals get further from 0. This indicates that one of our model assumptions is probably violated. In particular, it suggests that our assumption that the errors all have the same standard deviation ($\sigma$) may not hold.
In terms of potential outliers, there appear to be one or two, as we see one standardized residual around -3, and another around -2.5.
Next, the QQ-plot to assess normality. A linear pattern in this plot may seem reasonable, if we ignore the 2 points to the far left of the plot (the potential outliers). If we do this, then the assumption of normality in the errors seems reasonable.
We are testing whether the two education terms, Educ and Educsq, can be dropped. This is a partial F-test, and it should not be done as two separate t-tests, one on each term. Three (3) points will be deducted if this is done.
There are 2 ways to find the extra sum of squares due to Educ and Educsq, SS(Educ, Educsq | YrsEm, YrsEmsq). FULL CREDIT will be given to either approach.
Method 1: Use the output from the complete model used in (a), where we have included the variables we want to drop last in the regress command. The required portion of the output is below:
Source   DF      Seq SS
YrsEm     1  3187557844
YrsEmsq   1    12205007
Educ      1   828603692
Educsq    1    24340458

Then the extra SS due to Educ and Educsq is 828603692 + 24340458 = 852944150.
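The Method 1 arithmetic is a sum of the last two sequential SS. The sketch below also checks that all four sequential SS add up to the regression SS of the complete model, 4052707001. The dictionary layout is just an illustrative choice.

```python
# Sequential SS from the complete model, in the order the terms were entered
seq_ss = {
    "YrsEm": 3187557844,
    "YrsEmsq": 12205007,
    "Educ": 828603692,
    "Educsq": 24340458,
}

# Educ and Educsq were entered last, so their sequential SS together give the
# extra SS they explain once YrsEm and YrsEmsq are already in the model
extra_ss = seq_ss["Educ"] + seq_ss["Educsq"]

# Sanity check: the sequential SS add up to the regression SS of the full model
ssr_full = sum(seq_ss.values())

print(f"Extra SS for Educ, Educsq: {extra_ss}")
print(f"Sum of sequential SS: {ssr_full}")
```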
Method 2: From the output of the complete model in (a), we see that SSE(complete) = 1391544402. Now we use Minitab to fit the reduced model:
Regression Analysis: Salary versus YrsEm, YrsEmsq

Analysis of Variance

Source           DF          SS          MS      F      P
Regression        2  3199762850  1599881425  30.65  0.000
Residual Error   43  2244488552    52197408
Total            45  5444251403

From this output, SSE(reduced) = 2244488552.
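Then the extra SS due to Educ and Educsq is SSE(reduced) - SSE(complete) = 2244488552 - 1391544402 = 852944150, the same value as in Method 1.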
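The Method 2 arithmetic is a difference of error sums of squares. The difference in error df (43 - 41) is the number of terms dropped, which becomes the numerator df of the partial F-test. The sketch below uses only numbers from the two ANOVA tables.

```python
sse_complete = 1391544402  # Residual Error SS, complete model (41 df)
sse_reduced = 2244488552   # Residual Error SS, reduced model (43 df)

# Dropping terms can only increase SSE; the increase is the extra SS
extra_ss = sse_reduced - sse_complete
df_dropped = 43 - 41       # difference in error df = number of terms dropped

print(f"Extra SS = {extra_ss} on {df_dropped} df")
```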
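The test statistic is
$F = \dfrac{852944150/2}{33940107} = 12.57$,
where 33940107 is the MSE of the complete model (41 df).
From the F-table we need the critical value $F_\alpha(2, 40)$, using 2 and 40 df, since 41 df is not in the table (for example, $F_{0.05}(2, 40) = 3.23$ and $F_{0.01}(2, 40) = 5.18$).
Since $F = 12.57$ is larger than this critical value, we reject $H_0: \beta_3 = \beta_4 = 0$ (the coefficients of Educ and Educsq are both 0).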
Therefore it appears that we cannot drop the education terms.
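A minimal sketch of the partial F computation and the table comparison follows. The critical values for 2 and 40 df are standard F-table values that I have supplied; they are not part of the Minitab output.

```python
extra_ss = 852944150             # extra SS for Educ and Educsq (either method)
df_dropped = 2                   # number of terms tested
mse_complete = 1391544402 / 41   # MSE of the complete model

f_partial = (extra_ss / df_dropped) / mse_complete

# Upper-tail F critical values for 2 and 40 df from a standard F-table
# (supplied here; 40 df is used because 41 df is not tabulated)
critical = {0.05: 3.23, 0.01: 5.18}
for alpha, f_crit in critical.items():
    decision = "reject H0" if f_partial > f_crit else "do not reject H0"
    print(f"alpha = {alpha}: F = {f_partial:.2f} vs F_crit = {f_crit} -> {decision}")
```

Using 40 df instead of 41 gives a slightly larger critical value, so the comparison is conservative.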
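The model is $y = \beta_0 + \beta_1\,\text{Wheat} + \beta_2\,\text{Barley} + \beta_3\,\text{Maize} + \varepsilon$. Here Wheat, Barley and Maize are indicator (dummy) variables equal to 1 for an observation from that grain and 0 otherwise, so oats is the baseline group.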
Regression Analysis: Yield versus Wheat, Barley, Maize

The regression equation is
Yield = 6.98 - 1.27 Wheat - 0.383 Barley - 1.48 Maize
From the output, the least squares equation is $\hat{y} = 6.98 - 1.27\,\text{Wheat} - 0.383\,\text{Barley} - 1.48\,\text{Maize}$.
NOTE: If someone just states $b_0 = \bar{y}_{\text{Oats}}$, $b_1 = \bar{y}_{\text{Wheat}} - \bar{y}_{\text{Oats}}$, etc., without showing any numerical calculations, take off one point for this question.
I've used Minitab to find the 4 averages. However, it's fine if a calculator was used, in which case no Minitab output needs to be shown.
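From the output below, we see that
Wheat: $\bar{y}_W = 5.717$,
Barley: $\bar{y}_B = 6.600$,
Maize: $\bar{y}_M = 5.500$,
Oats: $\bar{y}_O = 6.983$,
and
$b_0 = \bar{y}_O = 6.983$ (Oats)
$b_1 = \bar{y}_W - \bar{y}_O = 5.717 - 6.983 = -1.266$ (Wheat - Oats)
$b_2 = \bar{y}_B - \bar{y}_O = 6.600 - 6.983 = -0.383$ (Barley - Oats)
$b_3 = \bar{y}_M - \bar{y}_O = 5.500 - 6.983 = -1.483$ (Maize - Oats)
These agree with the regression coefficients up to rounding.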
Predictor     Coef  SE Coef      T      P
Constant    6.9833   0.3552  19.66  0.000
Wheat      -1.2667   0.5023  -2.52  0.020
Barley     -0.3833   0.5023  -0.76  0.454
Maize      -1.4833   0.5023  -2.95  0.008
Variable  Type    N   Mean
yield     Wheat   6  5.717
          Barley  6  6.600
          Maize   6  5.500
          Oats    6  6.983
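The calculation above can be checked directly. With oats as the baseline, the intercept is the oats mean, and each dummy coefficient is that grain's mean minus the oats mean. The sketch below compares these values with the Coef column; the dictionary names are only illustrative.

```python
# Sample means from the descriptive statistics output
means = {"Wheat": 5.717, "Barley": 6.600, "Maize": 5.500, "Oats": 6.983}
# Coefficients from the regression output (Oats is the baseline group)
coef = {"Constant": 6.9833, "Wheat": -1.2667, "Barley": -0.3833, "Maize": -1.4833}

b0 = means["Oats"]
b = {g: means[g] - b0 for g in ("Wheat", "Barley", "Maize")}

print(f"b0 = {b0:.3f} vs Constant = {coef['Constant']}")
for g, value in b.items():
    print(f"b({g}) = {value:.3f} vs Coef = {coef[g]}")
```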
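Now we must express the hypotheses for testing model usefulness, $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ vs. $H_a$: at least one $\beta_i \neq 0$, in terms of the mean contents for the four grains. Although I will rewrite both $H_0$ and $H_a$, only the rewriting of $H_0$ will be graded.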
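Suppose we let $\mu_1$ = mean content for wheat, $\mu_2$ = mean content for barley, $\mu_3$ = mean content for maize, and $\mu_4$ = mean content for oats. Since $\beta_0 = \mu_4$ and $\beta_i = \mu_i - \mu_4$ for $i = 1, 2, 3$, all three $\beta_i$ are 0 exactly when the four means are equal. Then we can write
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: at least 2 $\mu_i$'s differ.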
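From the output, $F = 3.96$ with p-value $= 0.023$.
Test at level $\alpha$: since p-value $= 0.023 < \alpha$ (for example, $\alpha = 0.05$), we reject $H_0$.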
So the model is useful (at least 2 of the mean thiamin contents differ).
NOTE: Give full credit if the F-table is used to find the critical value $F_\alpha(3, 20)$ instead of using the p-value.
Analysis of Variance

Source           DF       SS      MS     F      P
Regression        3   8.9833  2.9944  3.96  0.023
Residual Error   20  15.1367  0.7568
Total            23  24.1200
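For a model that uses a full set of group indicators, the regression SS equals the between-group SS $\sum n_i(\bar{y}_i - \bar{y})^2$. With equal group sizes, the overall mean $\bar{y}$ is just the average of the four group means. The sketch below recomputes SSR and F from the rounded sample means, assuming 6 observations per grain as in the descriptive statistics. It takes SSE from the output.

```python
n = 6                                   # observations per grain
means = [5.717, 6.600, 5.500, 6.983]    # Wheat, Barley, Maize, Oats
k = len(means)

grand_mean = sum(means) / k             # valid because group sizes are equal
ssr = n * sum((m - grand_mean) ** 2 for m in means)   # between-group SS

sse = 15.1367                           # Residual Error SS from the output
df_reg, df_err = k - 1, n * k - k       # 3 and 20
f_stat = (ssr / df_reg) / (sse / df_err)

print(f"SSR = {ssr:.4f} (output: 8.9833), F = {f_stat:.2f} (output: 3.96)")
```

The small gap from Minitab's values comes from rounding the means to three decimals.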