Stat 401B Project 2 (Multiple Regression) Fall 2004

The second project (worth 8% of your final grade) involves using multiple regression to build
the “best” model for a given set of data. For this project, the “best” model is the one that
meets the following criteria:
1. the model is statistically useful at the 5% level.
2. each variable in the model is statistically significant at the 5% level, given the other
variables in the model.
3. among those models that meet 1) and 2), the one with the highest R².
Once you have the “best” model you should analyze the residuals from the “best” model
and investigate potential outliers and influential observations.
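The three criteria above can be sketched in code. The following is a minimal illustration in Python, not part of the assignment (which uses JMP): the data are synthetic, the names `fit_ols` and `best_model` are hypothetical, and the exhaustive search over all subsets of explanatory variables is an assumption about how one might look for the “best” model. It checks the overall F test (criterion 1), each slope's t test given the other variables (criterion 2), and then keeps the qualifying model with the highest R² (criterion 3).

```python
from itertools import combinations

import numpy as np
from scipy import stats


def fit_ols(X, y):
    """Least-squares fit with intercept; returns R^2, overall F p-value, slope p-values."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])            # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)                      # error sum of squares
    sst = float(((y - y.mean()) ** 2).sum())        # total sum of squares
    df_resid = n - k - 1
    r2 = 1.0 - sse / sst
    f_stat = ((sst - sse) / k) / (sse / df_resid)   # overall model utility test
    f_p = stats.f.sf(f_stat, k, df_resid)
    cov = (sse / df_resid) * np.linalg.inv(A.T @ A)
    t_stats = beta / np.sqrt(np.diag(cov))
    t_p = 2.0 * stats.t.sf(np.abs(t_stats), df_resid)
    return r2, f_p, t_p[1:]                         # drop the intercept's p-value


def best_model(X, y, names, alpha=0.05):
    """Among subsets meeting criteria 1 and 2, return (R^2, variable names) with highest R^2."""
    best = None
    for size in range(1, X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), size):
            r2, f_p, t_p = fit_ols(X[:, list(cols)], y)
            if f_p < alpha and np.all(t_p < alpha):  # criteria 1 and 2
                if best is None or r2 > best[0]:     # criterion 3
                    best = (r2, [names[c] for c in cols])
    return best


# Synthetic example: y depends on x1 and x2; x3 is unrelated noise.
rng = np.random.default_rng(0)
n = 40
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
names = ["x1", "x2", "x3"]

r2, chosen = best_model(X, y, names)
print(f"best model uses {chosen} with R^2 = {r2:.3f}")
```

Because R² never decreases when a variable is added, criterion 2 is what keeps the search from simply choosing the largest model: a larger model qualifies only if every variable in it is significant given the others.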
There are four data sets that you can choose from. A brief description of each of the data
sets along with response and explanatory variables can be found in the handout, Data Sets
for Model Building Project. This is an individual project. The result of the project is a
thorough, but concise, professional-quality technical report of no more than 5 typed, double-spaced
pages. The report is to be handed in to your instructor by noon on December 10,
2004. Your report should include at least:
• an executive summary.
• a description of the data set including the response and explanatory variables.
• a description of the “best” and second “best” single variable models.
• a description of the final models fit by JMP using Forward, Backward and Mixed
selection procedures (using the default settings for the Prob to Enter and
Prob to Leave).
• a description of your final fitted model with supporting evidence that it is the “best”
model.
• an analysis of residuals for your final fitted model. This analysis should include appropriate plots and tests for outliers, high leverage points and influential points.
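As an illustration of the quantities behind such a residual analysis, the sketch below computes leverage values, externally studentized residuals, and Cook's distance in Python. It is not part of the assignment (which uses JMP): the data are synthetic with one deliberately unusual observation, the function name `regression_diagnostics` is hypothetical, and the cutoffs used to flag points (leverage above 2p/n, Cook's distance above 1) are common rules of thumb assumed here, not thresholds stated in this handout.

```python
import numpy as np


def regression_diagnostics(X, y):
    """Return leverage, externally studentized residuals, and Cook's distance."""
    n, k = X.shape
    p = k + 1                                        # parameters, including intercept
    A = np.column_stack([np.ones(n), X])
    H = A @ np.linalg.inv(A.T @ A) @ A.T             # hat matrix
    h = np.diag(H)                                   # leverage of each observation
    resid = y - H @ y
    mse = float(resid @ resid) / (n - p)
    r_int = resid / np.sqrt(mse * (1.0 - h))         # internally studentized residuals
    # Externally studentized: each residual scaled by an error estimate that
    # leaves that observation out (useful for outlier tests).
    r_ext = r_int * np.sqrt((n - p - 1) / (n - p - r_int**2))
    cooks = r_int**2 * h / (p * (1.0 - h))           # Cook's distance (influence)
    return h, r_ext, cooks


# Synthetic data with one planted point: unusual x values and a y far off the trend.
rng = np.random.default_rng(1)
n = 30
X = rng.normal(size=(n, 2))
y = 3.0 + 1.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)
X[-1] = [8.0, 8.0]
y[-1] = 12.0                                         # trend value would be 27

h, r_ext, cooks = regression_diagnostics(X, y)
p = X.shape[1] + 1
high_leverage = np.where(h > 2 * p / n)[0]           # assumed rule of thumb
influential = np.where(cooks > 1.0)[0]               # assumed rule of thumb
print("high leverage points:", high_leverage)
print("influential points:", influential)
```

Plotting the studentized residuals against the fitted values, and the leverage and Cook's distance values against observation number, gives the kind of plots the report can include in its main body.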
Simply attaching a ream of computer printout in the appendix and expecting the instructor
to find what is important is not acceptable. The main body of the report should include
only the end products of any statistical calculations. It is appropriate to include plots
and important summary values within the body of the report. If you are going to include
complete computer printouts, they should be included only as appendices but count toward
the limit of 5 pages. Write the report as if a busy executive or manager were going to read
it. Statistical jargon for the sake of statistical jargon will not be well received.
The following due dates will be used to ensure satisfactory progress on the projects. You may
complete each progress step before the date listed.
• November 19: Individuals indicate which of the four data sets they have chosen for their
project.
• December 10: The final reports are due. These are to be typed (or word processed)
on plain white paper and should not exceed 5 pages (10 pt font or larger) in length
(including all JMP output).