
Statistics 2501 (001)
Assignment #4: Oct. 22, 2003
Due in class: Nov. 3, 2003




If a question does not specify whether it should be done by hand or on the computer, you can use whichever method you prefer.

  1. A first-order linear regression model with 6 explanatory variables was fit to a data set of 33 observations. Complete the following ANOVA table:

     SOURCE       DF          SS          MS         F        p
    
     Regression   __       _____          10      ____    _____ 
    
     Error        __          20       _____ 
    
     Total        __      ______
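
    The blanks in a regression ANOVA table are tied together by a few identities: the regression DF equals the number of explanatory variables $k$, the error DF is $n - k - 1$, each MS is SS/DF, and F is MS(Regression)/MS(Error). The sketch below is one way to check a hand calculation. The function name `complete_anova` and its argument names are illustrative choices, not part of the assignment.

```python
def complete_anova(n, k, ms_regression, ss_error):
    """Fill in a regression ANOVA table from the entries that are given.

    n is the number of observations and k the number of explanatory
    variables in a first-order model (hypothetical helper, for checking only).
    """
    df_regression = k
    df_error = n - k - 1
    df_total = n - 1
    ss_regression = ms_regression * df_regression  # SS = MS * DF
    ms_error = ss_error / df_error                 # MS = SS / DF
    ss_total = ss_regression + ss_error            # sums of squares add up
    f_value = ms_regression / ms_error
    return {
        "df_regression": df_regression,
        "df_error": df_error,
        "df_total": df_total,
        "ss_regression": ss_regression,
        "ss_total": ss_total,
        "ms_error": ms_error,
        "f_value": f_value,
    }


# Entries taken from the table above: n = 33, k = 6, MS(Regression) = 10, SS(Error) = 20.
table = complete_anova(n=33, k=6, ms_regression=10, ss_error=20)
for name, value in table.items():
    print(f"{name:>14}: {value:.4g}")
```

    The p-value is not computed here; it still has to come from the F distribution with the regression and error degrees of freedom.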
    
    To get the p-value to place in the ANOVA table, you will do something similar to the example below in Minitab. Suppose we need to find $P(F > 6.4)$, using 6 and 11 d.f.

    First, select Calc - Probability Distributions - F

    Then, choose Cumulative Probability, enter 6 for the numerator degrees of freedom, 11 for the denominator degrees of freedom, and 6.4 for Input constant. When you're ready, click OK.

    On the screen, this gives you an answer for $P(F < 6.4)$. What would you do to this result to find $P(F > 6.4)$?

    Of course, you will replace 6.4 by your F-value, and the degrees of freedom by those you find in this problem.
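
    For readers who want to cross-check the Minitab result elsewhere, the following is a minimal sketch in plain Python. It assumes the standard identity that the F cumulative probability equals a regularized incomplete beta function, and evaluates that function with a continued fraction. This is a swapped-in numerical technique, not a description of what Minitab does internally, and the function names are illustrative.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_cdf(f_value, df_num, df_den):
    """P(F < f_value) for an F distribution with (df_num, df_den) d.f."""
    if f_value <= 0.0:
        return 0.0
    x = df_num * f_value / (df_num * f_value + df_den)
    return regularized_incomplete_beta(df_num / 2.0, df_den / 2.0, x)


def f_upper_tail(f_value, df_num, df_den):
    """P(F > f_value), the complement of the cumulative probability."""
    return 1.0 - f_cdf(f_value, df_num, df_den)


# The example from the text: 6 and 11 d.f., F = 6.4.
print("P(F < 6.4) =", f_cdf(6.4, 6, 11))
print("P(F > 6.4) =", f_upper_tail(6.4, 6, 11))
```

    The first printed value corresponds to what Minitab reports as the cumulative probability; the second is the upper-tail area that belongs in the p column.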

  2. The Temco company has collected salary information on all of its employees in four departments. You can access the worksheet containing this data by selecting the following in Minitab:

    Open Worksheet, then M: Drive, then Mtbwin - Studnt12 - Temco.mtw

    In this dataset, the column C1 contains the employee salaries, C2 is the number of years employed at Temco, and C4 is the years of education completed after high school. Let these be the variables $y$, $x_{1}$ and $x_{2}$ respectively.

    Now answer the following:

    1. Fit the regression equation

      \begin{displaymath}
y = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{1}^{2}
+ \beta_{3}x_{2} + \beta_{4}x_{2}^{2} + e
\end{displaymath}

      and report the equation of your least squares line.

    2. Is this model useful in predicting the salary? Base your conclusion on the p-value of the appropriate test.

    3. Find a 90% prediction interval for the salary of an employee who has worked at the company 8 years and has 4 years of education beyond high school.

    4. Plot the residuals vs. the $\hat{y}$ values and construct a QQ-plot of the residuals. Identify any possible outliers in the data and any assumptions that may be violated.

    5. Can we drop the terms involving education from the model in part 1? Test at $\alpha = 0.1$.
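
    Dropping a group of terms is usually tested with a nested-model (partial) F statistic: fit the complete model and the reduced model without the education terms, then compare their error sums of squares. The sketch below shows only the arithmetic. The function name and every number in the usage line are hypothetical placeholders, since the Temco data are not reproduced here; the real SSE values and sample size come from your two Minitab fits.

```python
def partial_f(sse_reduced, sse_complete, num_dropped, n, k_complete):
    """Nested-model F statistic for dropping `num_dropped` terms.

    sse_reduced / sse_complete: error sums of squares of the two fits.
    n: number of observations; k_complete: number of terms (excluding the
    intercept) in the complete model.
    Returns (F, numerator d.f., denominator d.f.).
    """
    df_error = n - (k_complete + 1)
    numerator = (sse_reduced - sse_complete) / num_dropped
    denominator = sse_complete / df_error
    return numerator / denominator, num_dropped, df_error


# HYPOTHETICAL inputs, for illustration only (not the Temco results).
f_stat, df1, df2 = partial_f(
    sse_reduced=120.0, sse_complete=100.0, num_dropped=2, n=55, k_complete=4
)
print(f"F = {f_stat:.3f} on {df1} and {df2} d.f.")
```

    The resulting F value is compared with the F distribution on the two degrees of freedom returned, using the same upper-tail calculation as for any other F test.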

  3. Six samples of each of four types of cereal grain grown in a certain region were analyzed to determine thiamin content, resulting in the following data (measured in $\mu$g/g):

    Wheat 5.2 4.5 6.0 6.1 6.7 5.8
    Barley 6.5 8.0 6.1 7.5 5.9 5.6
    Maize 5.8 4.7 6.4 4.9 6.0 5.2
    Oats 8.3 6.1 7.8 7.0 5.5 7.2

    Use Minitab to fit the following model to the data:

    \begin{displaymath}
y = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} +
\beta_{3}x_{3} + e
\end{displaymath}

    where

    \begin{displaymath}
x_{1} = \left\{
\begin{array}{ll}
1 & \mbox{for Wheat} \\
0 & \mbox{otherwise}
\end{array} \right.
\quad \ldots \quad
x_{3} = \left\{
\begin{array}{ll}
1 & \mbox{for Maize} \\
0 & \mbox{otherwise}
\end{array} \right.
\end{displaymath}

    Now answer the following:

    1. What is the equation of the least squares line that predicts thiamin content from the type of cereal grain?

    2. Show how the $\hat{\beta}$ values are related to the average thiamin contents observed for each of the 4 grain types.

    3. Is your model useful in predicting thiamin content? Base your answer on the p-value of the appropriate test. In stating your hypotheses, express them both in terms of the $\beta$ values and the mean content for each of the 4 grain types.
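
    As a cross-check on the Minitab output, the indicator-variable model can also be fit by solving the normal equations directly. The sketch below uses only the standard library. It assumes that $x_{1}$, $x_{2}$ and $x_{3}$ indicate Wheat, Barley and Maize, with Oats as the baseline; the definition of $x_{2}$ is not legible in the text above, so treat that coding as an assumption and change `indicator_grains` if your definition differs.

```python
# Thiamin content (micrograms per gram) from the assignment.
data = {
    "Wheat":  [5.2, 4.5, 6.0, 6.1, 6.7, 5.8],
    "Barley": [6.5, 8.0, 6.1, 7.5, 5.9, 5.6],
    "Maize":  [5.8, 4.7, 6.4, 4.9, 6.0, 5.2],
    "Oats":   [8.3, 6.1, 7.8, 7.0, 5.5, 7.2],
}

# ASSUMPTION: x1 = Wheat, x2 = Barley, x3 = Maize; Oats is the baseline.
indicator_grains = ["Wheat", "Barley", "Maize"]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Build the design matrix (intercept plus three indicators) and the response.
rows, y = [], []
for grain, values in data.items():
    for value in values:
        rows.append([1.0] + [1.0 if grain == g else 0.0 for g in indicator_grains])
        y.append(value)

# Normal equations: (X'X) beta = X'y.
p = len(rows[0])
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
beta = solve_linear_system(xtx, xty)

means = {grain: sum(values) / len(values) for grain, values in data.items()}
print("beta-hat:", [round(b, 4) for b in beta])
print("group means:", {g: round(m, 4) for g, m in means.items()})
```

    Comparing the two printed lines is a quick way to see how the fitted coefficients line up with the four sample means under this coding.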




Gary Sneddon 2003-10-23