
Statistics 2501-001 Midterm Oct. 17, 2003
Name: I.D.


  1. A local car dealership has been using 30-second TV commercials to try to increase sales. The commercials always air during the evening and advertise the different models and price ranges available that week. The dealer keeps track of the number of TV ads aired each week and the number of cars sold that week.

    Below are the data for 10 weeks, along with some summary statistics:


    Ads ($x$)          6   20    0   14   25   16   28   18   10    8
    Cars Sold ($y$)   15   31   10   16   28   20   40   25   12   15




    \begin{displaymath}
\sum_{i} x_{i}^{2} = 2785, \quad \sum_{i} x_{i} = 145, \quad
SS_{xx} = 682.5
\end{displaymath}


    \begin{displaymath}
\sum_{i} x_{i} y_{i} = 3764, \quad \sum_{i} y_{i} = 212, \quad
\sum_{i} y_{i}^{2} = 5320, \quad
SS_{yy} = 825.6
\end{displaymath}

    1. Assuming a linear relationship is appropriate, determine the least squares regression line.

      SOLUTION:

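      \begin{eqnarray*}
SS_{xy} & = & \sum_{i} x_{i}y_{i} - n \bar{x}\bar{y} \\
& = & 3764 - 10(14.5)(21.2) \\
& = & 690 \\
\hat{\beta}_{1} & = & \frac{SS_{xy}}{SS_{xx}} \\
& = & \frac{690}{682.5} \\
& = & 1.01 \\
\hat{\beta}_{0} & = & \bar{y} - \hat{\beta}_{1}\bar{x} \\
& = & 21.2 - 1.01(14.5) \\
& = & 6.56
\end{eqnarray*}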



      Least squares line: $\hat{y} = 6.56 + 1.01x$.
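
      As a check on the hand calculation, here is a minimal Python sketch that recomputes the fit from the ten weekly observations. The variable names are illustrative only. Note that it carries the slope at full precision, so the intercept comes out near 6.54 rather than 6.56; the solution above rounds the slope to 1.01 before computing the intercept.

```python
# Weekly data from the table in Problem 1.
ads = [6, 20, 0, 14, 25, 16, 28, 18, 10, 8]
cars = [15, 31, 10, 16, 28, 20, 40, 25, 12, 15]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n

# Corrected sum of squares for x and corrected cross-product sum.
ss_xx = sum(x * x for x in ads) - n * x_bar ** 2
ss_xy = sum(x * y for x, y in zip(ads, cars)) - n * x_bar * y_bar

# Least squares estimates of the slope and intercept.
beta1 = ss_xy / ss_xx
beta0 = y_bar - beta1 * x_bar

print(f"SS_xx = {ss_xx:.1f}, SS_xy = {ss_xy:.1f}")
print(f"slope = {beta1:.4f}, intercept = {beta0:.4f}")
```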

    2. Does it appear that there is a linear relationship between number of weekly ads and car sales? State the appropriate hypotheses, find the test statistic and use $\alpha = 0.01$ to draw your conclusion. You may use the fact that $\hat{\sigma}^{2} = 16$.

      SOLUTION

      $H_{o}: \beta_{1} = 0$, $H_{a}: \beta_{1} \neq 0$.

      $\hat{\sigma}^{2} = 16$, so $\hat{\sigma} = 4$.

      \begin{eqnarray*}
t_{obs} & = & \frac{\hat{\beta}_{1}}{s_{\hat{\beta}_{1}}} \\
& = & \frac{\hat{\beta}_{1}}{\hat{\sigma}/\sqrt{SS_{xx}}} \\
& = & \frac{1.01}{4/\sqrt{682.5}} \\
& = & 6.60
\end{eqnarray*}



      Test at $\alpha = 0.01$: Reject $H_{o}$ if $\vert t_{obs}\vert > t_{0.01/2}$, using the t-distribution with $n - 2 = 8$ df.

      Table: $t_{0.01/2} = t_{0.005} = 3.355$.

      Since $6.60 > 3.355$, reject $H_{o}$.

      It appears that there is a linear relationship between ads and sales.
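
      The slope test can also be reproduced numerically. The sketch below recomputes $\hat{\sigma}^{2}$ from the data (it comes out very close to the value 16 given in the question) and forms the t statistic. The critical value 3.355 is simply copied from the table lookup above and is not computed by the code.

```python
import math

# Weekly data from the table in Problem 1.
ads = [6, 20, 0, 14, 25, 16, 28, 18, 10, 8]
cars = [15, 31, 10, 16, 28, 20, 40, 25, 12, 15]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n

ss_xx = sum(x * x for x in ads) - n * x_bar ** 2
ss_yy = sum(y * y for y in cars) - n * y_bar ** 2
ss_xy = sum(x * y for x, y in zip(ads, cars)) - n * x_bar * y_bar

beta1 = ss_xy / ss_xx

# Error sum of squares and the estimate of the error variance (n - 2 df).
sse = ss_yy - beta1 * ss_xy
sigma2_hat = sse / (n - 2)

# Standard error of the slope and the observed t statistic.
se_beta1 = math.sqrt(sigma2_hat / ss_xx)
t_obs = beta1 / se_beta1

T_CRIT = 3.355  # t_{0.005} with 8 df, taken from the table in the solution
reject = abs(t_obs) > T_CRIT

print(f"sigma^2 hat = {sigma2_hat:.2f}, t_obs = {t_obs:.2f}, reject H0: {reject}")
```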

  2. A study shows that there is a high positive correlation between the size of a hospital (measured by the number of beds) and the average number of days that patients remain in the hospital.

    Are the large hospitals padding their bills by keeping patients longer? If no, give a possible explanation for the high correlation.

    SOLUTION:

    Not necessarily. Correlation does not imply causation, so the high correlation alone does not show that large hospitals are padding their bills.

    Most common answer for what explains the relationship: larger hospitals probably have sicker patients, who have to stay in hospital longer.

  3. If we consider the residuals from a simple linear regression model:

    1. If we know that MSE = 40, within what range would we expect about 95% of the residuals to be found?

      SOLUTION

      $\hat{\sigma} = \sqrt{MSE} = \sqrt{40} = 6.32$.

      The residuals are centred at zero, so we expect about 95% of them to fall within two standard deviations of zero, i.e. in the range

      $(-2(6.32), 2(6.32)) = (-12.64, 12.64)$.
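
      The two-standard-deviation rule can be checked with a short Python sketch. It assumes the errors are normally distributed with mean zero, which is the usual regression assumption rather than something stated in the question. Because it does not round $\hat{\sigma}$ to 6.32 first, the limits come out as about $\pm 12.65$.

```python
import math
from statistics import NormalDist

mse = 40.0
sigma_hat = math.sqrt(mse)

# Two-standard-deviation limits around zero.
lower, upper = -2 * sigma_hat, 2 * sigma_hat

# Assumption: errors are N(0, sigma_hat^2). Probability of landing inside the limits.
errors = NormalDist(mu=0.0, sigma=sigma_hat)
coverage = errors.cdf(upper) - errors.cdf(lower)

print(f"range = ({lower:.2f}, {upper:.2f}), normal coverage = {coverage:.4f}")
```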

    2. Why do we construct a QQ plot of the residuals? What pattern do we want to see in a QQ plot of the residuals?

      SOLUTION

      Why?: To investigate if the errors are normally distributed.

      Want a linear pattern, which indicates the normality assumption is reasonable.
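
      To make the idea concrete, the following sketch builds the points of a normal QQ plot by hand. It is only an illustration: it borrows the ads and sales data from Problem 1 (this question does not specify a data set), and it assumes the plotting positions $(i - 0.5)/n$, which is one common convention among several. Plotting the printed pairs and looking for a roughly straight line is the check described above.

```python
from statistics import NormalDist

# Illustrative data: the weekly ads and sales from Problem 1.
ads = [6, 20, 0, 14, 25, 16, 28, 18, 10, 8]
cars = [15, 31, 10, 16, 28, 20, 40, 25, 12, 15]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n
ss_xx = sum((x - x_bar) ** 2 for x in ads)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars))
beta1 = ss_xy / ss_xx
beta0 = y_bar - beta1 * x_bar

# Ordered residuals from the fitted line.
residuals = sorted(y - (beta0 + beta1 * x) for x, y in zip(ads, cars))

# Standard normal quantiles at the assumed plotting positions (i - 0.5) / n.
std_normal = NormalDist()
theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each pair is one point on the QQ plot: (normal quantile, ordered residual).
for q, r in zip(theoretical, residuals):
    print(f"{q:7.3f}  {r:7.3f}")
```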

  4. A motion picture industry analyst is studying movies based on epic novels. Data were obtained for 16 Hollywood movies made in the past 7 years. Each movie was based on an epic novel. The analyst believed that the first year box office receipts of the movie depended on the production costs of the movie ($x_{1}$), promotional costs of the movie ($x_{2}$) and the book sales prior to movie release ($x_{3}$). The units of all variables are measured in millions of dollars.

    The analyst decided to use a multiple regression equation to relate the box office receipts to these explanatory variables. A portion of the Minitab output follows:

     The regression equation is
     Receipts = 7.67 +  3.66 product + 7.6211 promote + 0.8285 book
     
     Predictor       Coef     SE Coef          
     Constant      7.6760     6.7602 
     product       3.6616     1.1178
     promote       7.6211     1.6573
     book          0.8285     0.5394  
     
     Analysis of Variance
     
     SOURCE       DF          SS         
     Regression    3       10714  
     Error                   696
     Total        15
    

    1. Interpret the value of $\hat{\beta}_{3}$ in this model.

      SOLUTION

      $\hat{\beta}_{3} = 0.8285$: As book sales increase by \$1 million, and all other variables remain fixed, we expect box office receipts to increase by

      $0.8285(\$1 \mbox{ million} ) = \$828,500$.

    2. Test at $\alpha = 0.01$ whether the model is useful in predicting box office receipts.

      SOLUTION

      $H_{o}: \beta_{1} = \beta_{2} = \beta_{3} = 0$
      $H_{a}: \mbox{ At least 1 } \beta_{i} \neq 0$

      \begin{eqnarray*}
MS(model) & = & \frac{SS(model)}{k} \\
& = & \frac{10714}{3} \\
& = & 3571.3
\end{eqnarray*}



      \begin{eqnarray*}
MSE & = & \frac{696}{n - k - 1} \\
& = & \frac{696}{12} \\
& = & 58
\end{eqnarray*}



      \begin{eqnarray*}
F_{obs} & = & \frac{MS(model)}{MSE} \\
& = & \frac{3571.3}{58} \\
& = & 61.57
\end{eqnarray*}



      Test at $\alpha = 0.01$: Reject if $F_{obs} > F_{.01}$, using F-distribution with $k = 3$ and $(n - k - 1) = 12$ df.

      $F_{.01} = 5.95 < 61.57$, so reject $H_{o}$.

      So model appears useful.
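
      The F statistic can be reproduced from the ANOVA table with a few lines of Python. This is a sketch of the arithmetic only: the critical value 5.95 is copied from the F-table lookup above rather than computed, and the variable names are illustrative.

```python
# Values read from the Minitab ANOVA table.
ss_model = 10714.0
ss_error = 696.0
n = 16  # number of movies
k = 3   # number of explanatory variables

df_error = n - k - 1  # error df, left blank in the output

ms_model = ss_model / k
mse = ss_error / df_error
f_obs = ms_model / mse

F_CRIT = 5.95  # F_{0.01} with 3 and 12 df, taken from the table in the solution
reject = f_obs > F_CRIT

print(f"MS(model) = {ms_model:.1f}, MSE = {mse:.0f}, F_obs = {f_obs:.2f}")
print(f"reject H0: {reject}")
```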

    3. Should the production costs be kept in the model? Base your conclusion on the p-value of the appropriate test.

      SOLUTION:

      $H_{o}: \beta_{1} = 0$, $H_{a}: \beta_{1} \neq 0$


      \begin{displaymath}
t_{obs} = \frac{\hat{\beta}_{1}}{s_{\hat{\beta}_{1}}}
= \frac{3.6616}{1.1178} = 3.28
\end{displaymath}

      p-value = $2P(t \geq \vert 3.28\vert) = 2P(t \geq 3.28)$ using T-distribution with $(n - k - 1) = 12$ df.

      T-Table: $P(t > 3.055) = 0.005$, $P(t > 3.930) = 0.001$

      Therefore

      $0.005 > P(t > 3.28) > 0.001$

      $2(0.005) > 2P(t > 3.28) > 2(0.001)$

      $ 0.01 > \mbox{ p-value } > 0.002$

      Very strong evidence against $H_{o}$.

      So, keep production costs in model.
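
      The table only brackets the p-value. As a rough cross-check, the sketch below approximates it by numerically integrating the t density with Simpson's rule, using only the Python standard library. The helper names `t_pdf` and `upper_tail` are made up for this illustration, and the result is a numerical approximation, not an exact table value.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t  # the density is symmetric about zero

# Coefficient and standard error for "product" from the Minitab output.
t_obs = 3.6616 / 1.1178
df = 16 - 3 - 1

p_value = 2 * upper_tail(abs(t_obs), df)
print(f"t_obs = {t_obs:.2f}, two-sided p-value is approximately {p_value:.4f}")
```

      The approximate p-value should land inside the interval $(0.002, 0.01)$ obtained from the table.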

  5. Answer EITHER (a) or (b), not both:

    1. I really wish we had been given more than 10 M & M candies to eat in class one day. TRUE FALSE

    2. Prove Lebesgue's dominated convergence theorem: If $\vert f_{n}\vert < g$ almost everywhere, where $g$ is integrable, and if $f_{n} \rightarrow f$ almost everywhere, then $f$ and the $f_{n}$ are integrable and $\int f_{n} d\mu \rightarrow \int f d\mu$.

      Hint: Use Fatou's lemma.

      SOLUTION: You've got to be kidding!




Gary Sneddon 2003-12-01