You are required to analyze the dataset that is described at the end of this handout. While you may discuss aspects of this project with others in the class, and of course with me, each of you will submit your own report that contains your own work.
The format of your project should be a report which contains different sections, paragraphs and complete sentences. I will expect to see some computer output included with your report. This will include plots and results given by Minitab. However, I ask that you only submit relevant output. So, if Minitab has done things you haven't wanted at times while working on your dataset, please don't submit that material.
There is no required length for your report. The main thing is that you address the issues raised in this handout. This probably can't be done in 2 pages, but it won't take 50 pages, either. I suspect most reports will be in the 8-13 page range.
Your report may be typed or handwritten. It should contain the following sections. Not all of these sections will be of equal length. In fact, some sections may be very brief.
INTRODUCTION
DATA ANALYSIS
You are also encouraged to refer to values such as $R^2$ when making general comments about the suitability of a particular model. You should also read section 11.5 of the text, which discusses $R^2$, and another measure, called adjusted $R^2$, which you may find helpful. The adjusted $R^2$ value is labelled R-sq (adj) in Minitab. You should also read about the concept of a parsimonious model in section 11.11.
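To see how R-sq and R-sq (adj) relate to the sums of squares in a regression output, here is a minimal sketch in Python (not Minitab). It uses the standard textbook definitions; the function names and the numbers in the example are invented for illustration and do not come from the Boston dataset.

```python
def r_squared(sse, sst):
    """Proportion of total variation explained: 1 - SSE/SST."""
    return 1.0 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R-squared for n observations and k predictors.

    k counts the predictors only (the intercept is not included),
    so the error degrees of freedom are n - k - 1.
    """
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))


# Hypothetical values: SSE = 20, SST = 100, n = 50 observations, k = 4 predictors.
print(r_squared(20.0, 100.0))
print(adjusted_r_squared(20.0, 100.0, 50, 4))
```

Adding a predictor can never lower R-sq, but it lowers R-sq (adj) when the drop in SSE is too small to justify the lost degree of freedom. That is why the adjusted value is the more useful one when comparing models of different sizes.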
For any hypothesis testing that you do, make sure to state your hypotheses and test statistic. Base your conclusions on your p-values whenever possible.
Make sure to compare the final model you select with your original model, using a partial F-test. However, even if this test suggests you keep the original model, give some argument (which may be non-statistical) to support your choice of model.
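As a rough check on the arithmetic of the partial F-test, the statistic can be computed by hand from the error sums of squares and error degrees of freedom reported for the two models. The sketch below is in Python rather than Minitab; the function name and the example numbers are hypothetical, and it assumes the final model uses a subset of the predictors in the original model.

```python
def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """Partial F statistic for comparing nested regression models.

    sse_reduced, df_reduced: error sum of squares and error degrees of
        freedom for the smaller (final) model.
    sse_full, df_full: the same quantities for the larger (original) model.
    """
    extra_df = df_reduced - df_full  # number of predictors dropped
    numerator = (sse_reduced - sse_full) / extra_df
    denominator = sse_full / df_full  # mean squared error of the full model
    return numerator / denominator


# Hypothetical example: dropping 6 predictors raises SSE from 100 to 120.
f_stat = partial_f(sse_reduced=120.0, df_reduced=46, sse_full=100.0, df_full=40)
print(f_stat)
```

The statistic is compared with an F distribution on (df_reduced - df_full, df_full) degrees of freedom. A large p-value suggests the dropped predictors add little, which supports the smaller model.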
MODEL FIT
In addition, if there appear to be any outliers in the dataset, be sure to mention this. If a small number (say 2 or 3) of the residuals appear to be quite far from 0 (i.e. standardized residuals that are greater than 4 in absolute value), remove these points from your dataset and re-fit the final model you proposed in the previous section. Have your estimates changed a great deal?
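The outlier check described above can be sketched as follows. This is an illustration in Python, not Minitab, and it is simplified to a single predictor so that it needs no libraries; your project model will have several predictors. The data are invented toy values with one deliberately planted outlier, and the standardized residual is taken to be the residual divided by s times the square root of (1 - leverage), which is the usual definition.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def standardized_residuals(x, y):
    """Residuals divided by their estimated standard deviations."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = sqrt(sum(e * e for e in resid) / (n - 2))  # residual standard error
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return [e / (s * sqrt(1 - h)) for e, h in zip(resid, leverage)]


# Invented toy data: y is close to 2 + 3x, with one planted outlier.
x = [float(i + 1) for i in range(30)]
y = [2 + 3 * xi + ((7 * i) % 5 - 2) * 0.1 for i, xi in enumerate(x)]
y[14] += 10  # the planted outlier

# Flag points whose standardized residual exceeds 4 in absolute value.
std_res = standardized_residuals(x, y)
flagged = [i for i, r in enumerate(std_res) if abs(r) > 4]

# Remove the flagged points and re-fit the same model.
x_clean = [xi for i, xi in enumerate(x) if i not in flagged]
y_clean = [yi for i, yi in enumerate(y) if i not in flagged]

print("flagged rows:", flagged)
print("before removal (intercept, slope):", fit_line(x, y))
print("after removal  (intercept, slope):", fit_line(x_clean, y_clean))
```

Comparing the two sets of estimates side by side is the point of the exercise: if they barely move, the flagged points were not driving the fit.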
CONCLUSION
Your chief goal is to develop a regression model that predicts the price of homes in the suburbs of Boston based on size and neighborhood information. However, if you can think of other questions of interest that arise from this dataset, please make sure to mention and address those.
The dataset is available both on our course website (www.math.mun.ca/~sneddon/st2501) and directly from Minitab. To access it in Minitab, ask Minitab to open a worksheet, look in Pub on 'CS-thebe', select the directory stat2501, and then select the file boston.MTW.
The dataset contains 13 columns, which contain the variables listed below: