
Statistics 2501 (001)
Assignment #2: Solutions



  1. We could not lengthen the lives of people in Rwanda by shipping them TV sets. The high correlation does not imply that having a TV causes people to live longer. This is just an illustration that a high correlation does not imply a cause-and-effect relationship.

    The most likely explanation for the high correlation is that countries with a large number of TVs per person tend to be countries with high standards of living, and so have healthier people who live longer.

    However, any explanation that is reasonable will be accepted.

  2. Refer to the data summarized in the problem.

    1. Find the least squares line.

      \begin{displaymath}
\bar{x} = 42.9/10 = 4.29, \quad
\bar{y} = -172.7/10 = -17.27
\end{displaymath}


      \begin{displaymath}
SS_{xy} = \sum x_{i}y_{i} - n\bar{x}\bar{y}
= -895.5 - 10(4.29)(-17.27) = -154.62
\end{displaymath}

      \begin{eqnarray*}
\hat{\beta}_{1} & = & \frac{SS_{xy}}{SS_{xx}}
= -154.62/38.4 = -4.03 \\
\hat{\beta}_{0} & = & \bar{y} - \hat{\beta}_{1} \bar{x}
= -17.27 - (-4.03)(4.29) = 0.019
\end{eqnarray*}



      Therefore the least squares line is: $\hat{y} = 0.019 - 4.03x$.
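
      The arithmetic for the least squares line can be checked with a short script. This is only a sketch using the summary values quoted in this solution; the variable names are illustrative. Note that the hand calculation rounds the slope to $-4.03$ before computing the intercept, so the script does the same; carrying the unrounded slope through gives a noticeably smaller intercept (roughly 0.004).

```python
# Summary statistics quoted in the solution (n = 10 observations).
n = 10
sum_x = 42.9
sum_y = -172.7
sum_xy = -895.5
ss_xx = 38.4  # SS_xx as given in the problem

x_bar = sum_x / n
y_bar = sum_y / n

# SS_xy = sum(x*y) - n * x_bar * y_bar
ss_xy = sum_xy - n * x_bar * y_bar

# Slope of the least squares line.
beta1 = ss_xy / ss_xx

# Intercept as in the hand calculation (slope rounded to 2 decimals first).
beta1_hand = round(beta1, 2)
beta0_hand = y_bar - beta1_hand * x_bar

# Intercept with the unrounded slope, for comparison.
beta0_exact = y_bar - beta1 * x_bar

print(f"SS_xy = {ss_xy:.2f}")
print(f"beta1 = {beta1:.4f} (rounded: {beta1_hand})")
print(f"beta0 (hand calculation) = {beta0_hand:.3f}")
print(f"beta0 (unrounded slope)  = {beta0_exact:.4f}")
```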

    2. In this problem, you can either find $r$ first, then use that to find $R^{2}$, or find $R^{2}$ first, and use that to calculate $r$.

      NOTE: This part has an error that is revealed if someone tries to use both approaches to get $r$: the results do not match. This is because the SSE value given in the question is wrong. It should be about 678, not 33.9. So, give full marks to either approach, regardless of the answer.

      First,

      \begin{displaymath}
SS_{yy} = \sum y_{i}^{2} - n\bar{y}^{2}
= 4283.2 - 10 (-17.27)^2 = 1300.67
\end{displaymath}

      Then

      \begin{displaymath}
R^{2} = 1 - SSE/SS_{yy} = 1 - (33.9/1300.67) = 0.974, \quad
r = - \sqrt{0.974} = -0.987
\end{displaymath}

      or

      \begin{displaymath}
r = \frac{SS_{xy}}{ \sqrt{ SS_{xx}SS_{yy} } }
= \frac{-154.62}{ \sqrt{ (38.4)(1300.67) } } = -0.692, \quad
R^{2} = r^{2} = (-0.692)^2 = 0.479
\end{displaymath}
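
      The mismatch between the two approaches can be reproduced numerically. The sketch below uses only the sums of squares from this solution and the SSE stated in the question; the variable names are illustrative. It also computes the SSE implied by the sums of squares, using the identity $SSE = SS_{yy} - SS_{xy}^{2}/SS_{xx}$, which is where the "about 678" in the note comes from.

```python
import math

ss_xy = -154.62
ss_xx = 38.4
ss_yy = 1300.67
sse_given = 33.9  # SSE as stated in the question (flagged in the note as wrong)

# Approach 1: R^2 from the stated SSE, then r takes the sign of the slope.
r2_from_sse = 1 - sse_given / ss_yy
r_from_sse = -math.sqrt(r2_from_sse)  # the slope is negative, so r is negative

# Approach 2: r directly from the sums of squares, then R^2 = r^2.
r_direct = ss_xy / math.sqrt(ss_xx * ss_yy)
r2_direct = r_direct ** 2

# SSE implied by the sums of squares: SSE = SS_yy - SS_xy^2 / SS_xx.
sse_implied = ss_yy - ss_xy ** 2 / ss_xx

print(f"Approach 1: R^2 = {r2_from_sse:.3f}, r = {r_from_sse:.3f}")
print(f"Approach 2: r = {r_direct:.3f}, R^2 = {r2_direct:.3f}")
print(f"SSE implied by the sums of squares = {sse_implied:.1f}")
```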

    3. Test for a negative relationship between $x$ and $y$. This implies we have:
      $H_{o}: \beta_{1} = 0$
      $H_{a}: \beta_{1} < 0$

      From (a), we know that $\hat{\beta}_{1} = -4.03$. We are also told that SSE = 33.9 and $SS_{xx} = 38.4$. Therefore

      \begin{displaymath}
\hat{\sigma} = s = \sqrt{MSE}
= \sqrt{SSE/(n-2)}
= \sqrt{33.9/8} = 2.06
\end{displaymath}

      Then our test statistic is:

      \begin{displaymath}
t_{obs} = \frac{ \hat{\beta}_{1} }{\hat{\sigma}/\sqrt{SS_{xx}} }
= \frac{-4.03}{2.06/\sqrt{38.4} } = -12.12
\end{displaymath}

      P-value = $P(t \leq -12.12)$ with $(n - 2) = 8$ df.

      Since the T-distribution is symmetric, $P(t \leq -12.12) = P(t \geq 12.12)$.

      From the T-table, we find that $P(t \geq 5.041) = 0.0005$.

      Since $12.12 > 5.041$, $P(t \geq 12.12) < 0.0005$. Therefore p-value $< 0.0005$.

      If you used Minitab, you would have found $P(t \leq -12.12) = P(t \geq 12.12) = 0$ (approximately).

      Therefore we have very strong evidence against $H_{o}$.

      Therefore there is very strong evidence to suggest there is a negative linear relationship between $x$ and $y$.
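
      The test statistic can be checked as follows. This is a sketch with illustrative variable names, using the SSE as stated in the question. It rounds $\hat{\sigma}$ to 2.06 as the hand calculation does; using the unrounded value gives about $-12.13$ instead, which does not change the conclusion. The critical value 5.041 is the T-table entry quoted above for 8 df.

```python
import math

n = 10
sse = 33.9          # SSE as stated in the question
ss_xx = 38.4
beta1_hat = -4.03   # slope from part (a)

# Estimate of sigma: s = sqrt(MSE) = sqrt(SSE / (n - 2)), rounded as in the hand calculation.
s = round(math.sqrt(sse / (n - 2)), 2)

# Standard error of the slope and the observed t statistic.
se_beta1 = s / math.sqrt(ss_xx)
t_obs = beta1_hat / se_beta1

# T-table value quoted in the solution: P(t >= 5.041) = 0.0005 with 8 df.
t_table = 5.041
p_below_0005 = abs(t_obs) > t_table

print(f"s = {s}, t_obs = {t_obs:.2f}")
print(f"p-value < 0.0005: {p_below_0005}")
```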

    4. Find a 95% CI for $y$ when $x_{p} = 3$.

      When $x = 3$, $\hat{y} = 0.019 - 4.03(3) = -12.07$.

      We want a 95% CI, so $\alpha = 0.05$, so
      $t_{\alpha/2} = t_{.025} = 2.306$ using the T-table with $(n - 2) = 8$ df.

      The 95% CI is

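      \begin{eqnarray*}
\hat{y} \pm t_{\alpha/2} \hat{\sigma} \sqrt{
\left( \frac{1}{n} + \frac{(x_{p} - \bar{x})^{2}}{SS_{xx}} \right) }
& = & -12.07 \pm 2.306(2.06) \sqrt{
\left( \frac{1}{10} + \frac{(3 - 4.29)^{2}}{38.4}
\right) } \\
& = & -12.07 \pm 1.80 = (-13.87, -10.27)
\end{eqnarray*}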
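
      The half-width of $1.80$ can be verified with the sketch below, which plugs the values from this solution into the interval formula with $1/n + (x_{p} - \bar{x})^{2}/SS_{xx}$ under the square root (the form used for the mean of $y$ at $x_{p}$). The variable names are illustrative.

```python
import math

n = 10
x_bar = 4.29
ss_xx = 38.4
s = 2.06        # estimate of sigma from part (c)
t_crit = 2.306  # t_{.025} with 8 df, from the T-table
x_p = 3

# Point estimate from the fitted line y_hat = 0.019 - 4.03 x.
y_hat = 0.019 - 4.03 * x_p

# Half-width: t * s * sqrt(1/n + (x_p - x_bar)^2 / SS_xx).
half_width = t_crit * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)

lower = y_hat - half_width
upper = y_hat + half_width

print(f"y_hat = {y_hat:.2f}, half-width = {half_width:.2f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```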



  3. Refer to 10.37, p. 485.

    1. 10.37(a). Plot the data. The plot is attached.

      From the plot, it appears that, as the rank on the wedding day gets higher (which, for tennis players, means they are getting worse), the anniversary ranking gets higher. However, it doesn't seem to be a very strong relationship. It also seems that an assumption of a linear relationship is questionable.

    2. 10.37(b). Find the least squares line. A portion of the Minitab output follows:
      The regression equation is
      Anniv = 15.9 + 0.927 Wedding
      
      Predictor        Coef     SE Coef          T        P
      Constant        15.88       12.85       1.24    0.231
      Wedding        0.9273      0.3782       2.45    0.024
      
      S = 41.60       R-Sq = 23.1%     R-Sq(adj) = 19.3%
      
      From the output, we see the least squares line is

      \begin{displaymath}
\hat{y} = 15.9 + 0.927x
\end{displaymath}

    3. 10.37(c). Test if there is a linear relationship.

      $H_{o}: \beta_{1} = 0$
      $H_{a}: \beta_{1} \neq 0$
      From the Minitab output

      \begin{displaymath}
t_{obs} = 2.45, \quad
\mbox{p-value} = 2P(t > 2.45) = 0.024
\end{displaymath}

      Test at $\alpha = 0.05$: There are two approaches that can be used:

      Since p-value = $0.024 < 0.05$ (using T-distribution with $(n - 2) = 20$ df), reject $H_{o}$.

      Or, $t_{\alpha/2} = 2.086 < 2.45$, so reject $H_{o}$.

      Therefore there does appear to be a linear relationship between wedding and anniversary ranking.
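
      The T value in the Minitab output is just the coefficient divided by its standard error, so the critical-value approach can be checked with a few lines. This sketch uses the Coef and SE Coef values from the output and the T-table value 2.086 quoted above; the variable names are illustrative.

```python
# Values for the Wedding predictor, taken from the Minitab output.
coef = 0.9273
se_coef = 0.3782

# Observed t statistic for H_o: beta_1 = 0.
t_obs = coef / se_coef

# Two-sided test at alpha = 0.05 with 20 df: t_{.025} = 2.086 from the T-table.
t_crit = 2.086
reject_h0 = abs(t_obs) > t_crit

print(f"t_obs = {t_obs:.2f}, reject H_o: {reject_h0}")
```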

    4. 10.37(d). What would $\beta_{0}$, $\beta_{1}$ be if there was no change in rankings after marriage?

      In other words, a ranking of 50 on the wedding day would still be a 50 on the first anniversary. This means $\beta_{0} = 0$ and $\beta_{1} = 1$, so the equation would be $y = x$.

    5. Find a 99% PI for $y$ if $x_{p} = 21$, using Minitab or by hand. From the Minitab output below:
      Predicted Values for New Observations
      
      New Obs     Fit     SE Fit         99.0% CI             99.0% PI
      1         35.35       8.97   (    9.82,   60.88)  (  -85.73,  156.44)   
      
      Values of Predictors for New Observations
      
      New Obs   Wedding
      1            21.0
      
      we see that the 99% PI for $y$ is:

      \begin{displaymath}
\hat{y} \pm t_{\alpha/2} \hat{\sigma} \sqrt{
\left( 1 + \frac{1}{n} + \frac{(x_{p} - \bar{x})^{2}}
{SS_{xx}} \right) }
= (-85.73, 156.44)
\end{displaymath}

      You could also calculate this interval by hand, where $\hat{\sigma} = 41.6$ from the output, and the other terms would have to be calculated by hand.

      Note that the lower value of the PI isn't realistic, since it says the player's ranking could be -85.
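
      One shortcut for the hand calculation, sketched below, avoids computing $\bar{x}$ and $SS_{xx}$: since SE Fit $= \hat{\sigma}\sqrt{1/n + (x_{p} - \bar{x})^{2}/SS_{xx}}$, the standard error for prediction is $\sqrt{\hat{\sigma}^{2} + (\mbox{SE Fit})^{2}}$. The critical value $t_{.005} = 2.845$ with 20 df is an assumption taken from a standard T-table (it is not quoted in the output), and the variable names are illustrative. The result agrees with the Minitab interval up to rounding of the inputs.

```python
import math

# Values from the Minitab output.
fit = 35.35     # predicted anniversary ranking at Wedding = 21
se_fit = 8.97   # SE Fit
s = 41.60       # S, the estimate of sigma

# Assumption: t_{.005} with 20 df, read from a standard T-table.
t_crit = 2.845

# Standard error for predicting a new observation.
se_pred = math.sqrt(s ** 2 + se_fit ** 2)

pi_lower = fit - t_crit * se_pred
pi_upper = fit + t_crit * se_pred

# The 99% CI for the mean uses SE Fit alone.
ci_lower = fit - t_crit * se_fit
ci_upper = fit + t_crit * se_fit

print(f"99% PI = ({pi_lower:.2f}, {pi_upper:.2f})")
print(f"99% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
```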

  4. Problem on drywall sales.

    1. Find the least squares equation that predicts sales from the explanatory variables.

      A portion of the Minitab output is below:

      Regression Analysis: Drywall versus Permits, Mortgage, ...
      
      
      The regression equation is
      Drywall = - 138 + 4.97 Permits + 20.7 Mortgage - 10.9 A Vacanc + 0.50 O Vacanc
      
      Predictor        Coef     SE Coef          T        P
      Constant       -137.9       163.0      -0.85    0.412
      Permits        4.9657      0.4869      10.20    0.000
      Mortgage        20.70       19.77       1.05    0.313
      A Vacanc      -10.946       8.019      -1.36    0.194
      O Vacanc        0.504       3.336       0.15    0.882
      
      S = 43.68       R-Sq = 89.6%     R-Sq(adj) = 86.6%
      

      From the output, the least squares equation is

      \begin{displaymath}
\hat{y} = - 138 + 4.97x_{1} + 20.7 x_{2} - 10.9 x_{3} + 0.50x_{4}
\end{displaymath}

      where $x_{1}$ = number of permits, $x_{2}$ = mortgage rates, $x_{3}$ = apartment vacancy rates and $x_{4}$ = building vacancy rates.

    2. Interpret $\hat{\beta}_{2}$.

      $\hat{\beta}_{2} = 20.7$. This tells us that, if mortgage rates increase by 1 percentage point, and the other explanatory variables remain fixed, we predict that drywall sales will increase by 2070 sheets (since sales are reported in 100's of sheets).

      Question to consider: Do you think this is unusual? We may have assumed that increasing mortgage rates would mean less building, so less demand for drywall.

    3. Interpret $R^{2}$.

      From the output, $R^{2} = 0.896 = 89.6\%$.

      By definition, 89.6% of the variability in drywall sales can be accounted for (or explained) by the regression model relating sales to number of permits, mortgage rates and vacancy rates.

      In a general sense, this value of $R^{2}$ indicates we have a reasonable model for predicting sales.

    4. Predict $y$ if $x_{1} = 30$, $x_{2} = 8$, $x_{3} = 2.5$ and $x_{4} = 11$.

      \begin{displaymath}
\hat{y} = - 138 + 4.97(30) + 20.7 (8) - 10.9 (2.5) + 0.50(11)
= 154.95
\end{displaymath}

      So the sales would be 15495 sheets.
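
      The prediction is a direct substitution into the fitted equation, which can be checked with the sketch below. The dictionary keys are illustrative labels for the four explanatory variables; the coefficients are the rounded values from the regression equation.

```python
# Rounded coefficients from the fitted regression equation.
intercept = -138
coefs = {"permits": 4.97, "mortgage": 20.7, "apt_vacancy": -10.9, "other_vacancy": 0.50}

# Values of the explanatory variables given in the question.
x_new = {"permits": 30, "mortgage": 8, "apt_vacancy": 2.5, "other_vacancy": 11}

# Predicted sales, in hundreds of sheets.
y_hat = intercept + sum(coefs[name] * x_new[name] for name in coefs)

# Convert to sheets (sales are reported in 100's of sheets).
sheets = round(y_hat * 100)

print(f"y_hat = {y_hat:.2f} (hundreds of sheets), i.e. {sheets} sheets")
```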

    5. Test at $\alpha = 0.01$ if the model is useful.

      The needed Minitab output is below:

      Analysis of Variance
      
      Source            DF          SS          MS         F        P
      Regression         4      229881       57470     30.11    0.000
      Residual Error    14       26717        1908
      Total             18      256599
      
      $H_{o}: \beta_{1} = \beta_{2} = \beta_{3} = \beta_{4} = 0$ (not useful)
      $H_{a}: \mbox{at least 1 } \beta_{i} \neq 0$ (model useful)

      From the output

      \begin{displaymath}
F_{obs} = \frac{MS(model)}{MSE} = \frac{57470}{1908} = 30.11
\end{displaymath}

      $\mbox{p-value } = P(F > 30.11) \approx 0$ using F-distribution with 4 and 14 df.

      Given $\alpha = 0.01$: Since p-value $ < 0.01$, reject $H_{o}$.

      Or, $F_{.01} = 5.04 < F_{obs}$, so reject $H_{o}$.

      So the model appears useful: at least one of the explanatory variables helps to predict drywall sales.
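
      Several numbers in the Minitab output can be recovered from the ANOVA table alone. The sketch below, with illustrative variable names, recomputes $F_{obs}$, $S$, $R^{2}$ and adjusted $R^{2}$ from the sums of squares and degrees of freedom, and compares $F_{obs}$ with the critical value $F_{.01} = 5.04$ quoted above. Small differences in the last digit come from rounding in the printed table.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table.
ss_reg, df_reg = 229881, 4
ss_err, df_err = 26717, 14
ss_total, df_total = 256599, 18

# Mean squares and the F statistic.
ms_reg = ss_reg / df_reg
mse = ss_err / df_err
f_obs = ms_reg / mse

# S is the square root of MSE.
s = math.sqrt(mse)

# R^2 and adjusted R^2.
r2 = ss_reg / ss_total
r2_adj = 1 - mse / (ss_total / df_total)

# Critical value quoted in the solution: F_{.01} with 4 and 14 df.
f_crit = 5.04
reject_h0 = f_obs > f_crit

print(f"F_obs = {f_obs:.2f}, S = {s:.2f}")
print(f"R-Sq = {100 * r2:.1f}%, R-Sq(adj) = {100 * r2_adj:.1f}%")
print(f"Reject H_o at alpha = 0.01: {reject_h0}")
```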




Gary Sneddon 2003-10-08