Multiple Regression: Residual Analysis

Recall that we have been using a multiple linear regression model to relate mileage to a car's weight and engine displacement:

> fuel.fit1new <- lm(Mileage ~ Weight + Dispnew, fuel.frame)

One important task is to assess the assumptions we are making. We can do this through residual plots. Recall that the command

> plot(fuel.fit1new, which = 1:6)

gave a series of plots (the "which" option tells R to produce all six of the plots listed below, numbered here as R numbers them). They are:

1. Residuals vs. fitted (predicted) values.
2. Normal Q-Q plot of residuals.
3. Scale-Location plot of sqrt(|residuals|) against fitted values. The square root of the absolute value of the residuals is less skewed than the absolute value of the residuals.
4. Cook's distances vs. observation number. Used to identify influential observations.
5. Residuals vs. leverages, showing contours of equal Cook's distance.
6. Cook's distances vs. leverage/(1-leverage). Contours of standardized residuals that are equal in magnitude are lines through the origin.

We can also create some of these plots with other commands:

> par(mfrow=c(2,2))
> plot(fitted(fuel.fit1new), residuals(fuel.fit1new))   # resid. vs. predicted
> lines(c(0,50), c(0,0))   # To add in axis line at residual = 0
# Now, add in lines that show 2*sqrt(MSE) limits (sqrt(MSE) is 2.58 for this fit).
# This isn't done that often. We'll discuss how to avoid this
# at the end of this handout.
> lines(c(0,50), c(-2*2.58, -2*2.58))
> lines(c(0,50), c(2*2.58, 2*2.58))
> plot(fuel.frame$Weight, residuals(fuel.fit1new))    # resid. vs. Weight
> plot(fuel.frame$Dispnew, residuals(fuel.fit1new))   # resid. vs. Dispnew
> hist(residuals(fuel.fit1new))   # Histogram of residuals
> mtext("Residual Plots for mileage=Weight + Disp", outer=TRUE, line=-1)

Standardized Residuals
-----------------------

Sometimes we don't want to work out by hand where the 2*sqrt(MSE) limits would be on our residual plot.
We can avoid this by using standardized residuals, which rescale each original residual by its estimated standard deviation, so that the standardized residuals have mean approximately 0 and variance approximately 1. Values beyond about +/-2 then stand out without any extra limit lines. We can do this using the rstandard() function in R. Here is how we would plot the standardized residuals vs. the predicted (fitted) values:

> plot(predict(fuel.fit1new), rstandard(fuel.fit1new))

Compare this plot with the one we obtained earlier:

> plot(predict(fuel.fit1new), residuals(fuel.fit1new))

The two plots show essentially the same pattern; the main difference is the scale on the y-axis. They are not exactly identical, because rstandard() divides each residual by its own estimated standard deviation, which depends on that observation's leverage.
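
To make the rescaling concrete, here is a minimal sketch that computes standardized residuals by hand and checks them against rstandard(). It is an illustration only, with two assumptions: it uses the built-in mtcars data (mpg ~ wt + disp) as a stand-in, since fuel.frame may not be loaded in your session, and the object names fit, s, h, e and r_manual are made up for this example.

```r
# Stand-in data: mtcars ships with R. It is NOT the handout's fuel.frame;
# it is used here only so that the example runs on its own.
fit <- lm(mpg ~ wt + disp, data = mtcars)

s <- sigma(fit)        # residual standard error, i.e. sqrt(MSE)
h <- hatvalues(fit)    # leverage of each observation
e <- residuals(fit)    # raw residuals

# Standardized residuals by hand: each raw residual is divided by its
# estimated standard deviation, s * sqrt(1 - leverage).
r_manual <- e / (s * sqrt(1 - h))

# Check agreement with rstandard() up to floating-point error.
same <- isTRUE(all.equal(unname(r_manual), unname(rstandard(fit))))
print(same)

# Side-by-side plots: raw residuals with 2*sqrt(MSE) limits taken from
# the fit, and standardized residuals with fixed limits at -2 and 2.
par(mfrow = c(1, 2))
plot(fitted(fit), e, xlab = "Fitted values", ylab = "Residuals")
abline(h = c(-2, 0, 2) * s, lty = c(2, 1, 2))
plot(fitted(fit), rstandard(fit), xlab = "Fitted values",
     ylab = "Standardized residuals")
abline(h = c(-2, 0, 2), lty = c(2, 1, 2))
```

Taking sqrt(MSE) from sigma(fit) avoids typing a value such as 2.58 by hand, and abline(h = ...) draws the horizontal lines across the whole plot without having to guess an x-range. To try this on the handout's model, replace fit with fuel.fit1new.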