Multiple Regression: Correction to Data

The following dataset gives information on makes of cars taken from the April 1990 issue of Consumer Reports. In particular, the data frame "fuel.frame" contains data on mileage, weight of car, engine displacement, and size of car (qualitative, 6 possibilities, saved under "Type").

Question: Can we use a regression model to predict mileage from some of these variables?

We should begin with some plots to study the relationships among the variables.

ISSUE: When we plot Mileage vs. Displacement, something strange appears in the plot:

    > plot(Mileage ~ Disp., fuel.frame)

It appears that some Displacement values are near 0, which doesn't make any sense. Let's take a look at part of the data frame:

    > fuel.frame[1:3,]
           row.names Weight   Disp. Mileage     Fuel  Type
    1 Eagle.Summit.4   2560    0.97      33 3.030303 Small
    2  Ford.Escort.4   2345  114.00      33 3.030303 Small
    3 Ford.Festiva.4   1845    0.81      37 2.702703 Small

We see that these small displacements appear to be off by a factor of 100 (the 0.97 should be 97, etc.). So we're going to tell R to multiply all these small displacements by 100 with the following command:

    > Dispnew <- ifelse(fuel.frame$Disp. < 10, fuel.frame$Disp.*100, fuel.frame$Disp.)

This command tells R to take any displacement that is less than 10 and multiply it by 100. If the displacement is greater than or equal to 10, R leaves it alone. Now we will put this new variable into our data frame:

    > fuel.frame <- cbind(fuel.frame, Dispnew)

Now let's redo our model:

    > fuel.fit1new <- lm(Mileage ~ Weight + Dispnew, data = fuel.frame)
    > summary(fuel.fit1new)

    Call:
    lm(formula = Mileage ~ Weight + Dispnew, data = fuel.frame)

    Residuals:
        Min      1Q  Median      3Q     Max
    -4.6633 -1.7553 -0.1638  1.9454  5.5482

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 48.039428   2.263134  21.227  < 2e-16 ***
    Weight      -0.007927   0.001139  -6.963 3.67e-09 ***
    Dispnew     -0.003025   0.010424  -0.290    0.773
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 2.583 on 57 degrees of freedom
    Multiple R-squared: 0.7193,  Adjusted R-squared: 0.7094
    F-statistic: 73.02 on 2 and 57 DF,  p-value: < 2.2e-16

    > anova(fuel.fit1new)   # NOTE: This adds terms sequentially (first to last)
    Analysis of Variance Table

    Response: Mileage
              Df Sum Sq Mean Sq  F value Pr(>F)
    Weight     1 973.75  973.75 145.9588 <2e-16 ***
    Dispnew    1   0.56    0.56   0.0842 0.7727
    Residuals 57 380.27    6.67
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    > par(mfrow=c(3,2))
    > plot(fuel.fit1new, which = 1:6)   # Gives some summary plots of the fitted model
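
The displacement correction can be tried without the full data set. The following is a minimal sketch, not part of the original analysis: it rebuilds only the three rows printed in these notes as a toy data frame called "toy" and wraps the ifelse() rule in a helper called fix_disp(). The names "toy" and fix_disp(), and the "cutoff" and "factor" arguments, are made up for illustration.

```r
# Toy data: only the three rows printed in the notes, NOT the full fuel.frame.
toy <- data.frame(
  Weight    = c(2560, 2345, 1845),
  Disp.     = c(0.97, 114.00, 0.81),
  Mileage   = c(33, 33, 37),
  row.names = c("Eagle.Summit.4", "Ford.Escort.4", "Ford.Festiva.4")
)

# Hypothetical helper (not from the notes): multiply any value below
# `cutoff` by `factor`, and leave every other value unchanged.
fix_disp <- function(x, cutoff = 10, factor = 100) {
  ifelse(x < cutoff, x * factor, x)
}

# Add the corrected column directly, rather than via cbind().
toy$Dispnew <- fix_disp(toy$Disp.)

# Compare the original and corrected displacements side by side.
print(toy[, c("Disp.", "Dispnew")])
```

Assigning with toy$Dispnew <- ... has the same effect as the cbind() step used above. Before applying the rule to the full data, it is worth checking that every value below the cutoff really is a misplaced decimal point, for example by inspecting fuel.frame[fuel.frame$Disp. < 10, ].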