Instructions for Regression Assignment

This assignment is due by 11:59 p.m. on Tuesday, 11/23/10.
Email me an Excel file that includes your answers to all parts of the assignment. The file
name should be your last name and first initial (for example, Mark Johnson’s file would
be called johnsonm.xls).
Check your file for viruses! I will mark down your paper by two letter grades if you send
me an infected file!
1. Download the file regressionF10.txt from the website. Import the data into an Excel
workbook, making sure each variable ends up in its own column. Name this worksheet “data.”
2. Create a dummy variable based on the variable “MAJOR.” The dummy should equal
1 for any student in a natural science major (physics, biology, or chemistry) and 0
otherwise. Call this variable “SCIENCE.”
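The assignment itself is done in Excel, but the dummy-variable rule above can be sketched in a few lines of Python for anyone who wants to check their column. This is only an illustration: the major spellings and the sample values are assumptions, since the actual coding of MAJOR in regressionF10.txt is not shown here, and `science_dummy` is a hypothetical helper name.

```python
# Assumed spellings; check how MAJOR is actually coded in regressionF10.txt.
NATURAL_SCIENCE_MAJORS = {"physics", "biology", "chemistry"}


def science_dummy(major):
    """Return 1 for a natural science major, 0 otherwise."""
    # Normalize case and stray spaces before comparing.
    return 1 if major.strip().lower() in NATURAL_SCIENCE_MAJORS else 0


# Made-up sample values, used only to show the mapping.
majors = ["Physics", "economics", " Biology ", "History"]
science = [science_dummy(m) for m in majors]
```

In Excel, one possible equivalent is a formula along the lines of =IF(OR(A2="physics",A2="biology",A2="chemistry"),1,0), assuming MAJOR is stored as text in column A (the cell reference is hypothetical).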
2. Perform a multiple regression using the following model:
GPA = α + β₁·SAT + β₂·STUDY + β₃·WORK + β₄·SCIENCE + ε
Name the worksheet with the regression output “regression 1.” Expand the columns as
needed to make the results look nice.
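For readers who want to see what the regression tool is computing, the following is a minimal sketch of ordinary least squares for a model of this shape, solved through the normal equations. It is not a substitute for the Excel output. The helper name `ols`, the data rows, and the values in `true_coefs` are all invented for illustration; they are not taken from regressionF10.txt and say nothing about the results you should expect.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    rows: list of predictor lists (no intercept column); y: list of outcomes.
    Returns [intercept, slope_1, ..., slope_k].
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


# Synthetic rows in the order SAT, STUDY, WORK, SCIENCE (invented values).
rows = [
    [1000, 2, 10, 0], [1100, 5, 0, 1], [1200, 3, 20, 0], [900, 8, 15, 1],
    [1300, 1, 5, 0], [1050, 6, 12, 1], [1150, 4, 8, 0], [950, 7, 0, 1],
]
# Arbitrary coefficients used only to generate y with no error term.
true_coefs = [1.0, 0.001, 0.1, -0.02, 0.2]
gpa = [true_coefs[0] + sum(b * v for b, v in zip(true_coefs[1:], r)) for r in rows]
estimates = ols(rows, gpa)  # recovers true_coefs up to rounding error
```

Because the synthetic GPA values contain no error term, the fit is exact. With real data the estimates come with standard errors, which this sketch does not compute.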
3. In a text box inserted in the “regression 1” worksheet, interpret the regression results.
Specifically, you should discuss (a) the meaning of each slope coefficient, including an
explanation of the effect of the variable on GPA and whether the coefficient has the sign
you would expect, (b) the statistical significance of each coefficient, and (c) the
explanatory power of the regression as a whole, using both R-squared and F-stat. (Be
sure to put your explanations of slope coefficients in terms of the original units of
measure, as given in the original data file.)
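As a reminder of how the two summary measures in part (c) relate, the overall F-statistic of a regression with an intercept can be written in terms of R-squared, the number of observations, and the number of explanatory variables. The sketch below shows that standard relationship. The function and parameter names are hypothetical, and you should still read both numbers from your Excel output rather than compute them by hand.

```python
def f_statistic(r_squared, n_obs, n_predictors):
    """Overall F = (R^2 / k) / ((1 - R^2) / (n - k - 1)).

    k is the number of explanatory variables (excluding the intercept),
    n is the number of observations.
    """
    k = n_predictors
    explained_per_df = r_squared / k
    unexplained_per_df = (1.0 - r_squared) / (n_obs - k - 1)
    return explained_per_df / unexplained_per_df
```

A higher R-squared or a larger sample raises F, all else equal, which is why the two measures usually point in the same direction.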
4. In your regression output, look at the residual plot for STUDY (be sure to expand the
graph so you can see it clearly). Do you notice a pattern? In another text box inserted in
the regression 1 worksheet, describe the pattern (if any), and what it implies about the
effectiveness of studying. Use economic terminology to describe this phenomenon.
5. Create a new variable that is equal to the hours of study squared. Call it STUDYSQ.
Perform a second regression, using the same explanatory variables as the first regression
and this additional variable. Expand the columns to make the results look nice. Name
the worksheet with the new regression output “regression 2.” (a) In a text box inserted
into this worksheet, interpret the results of this regression (as in question 3 above, parts a,
b, and c). However, do not interpret the STUDY and STUDYSQ coefficients yet. (b) In a second
text box in this worksheet, explain the meaning of STUDY and STUDYSQ. What effect
will studying one more hour per week have for a student who currently only studies 1
hour per week? What effect will it have for a student who already studies 6 hours per
week? (c) In a third text box in this worksheet, compare your results from this regression
to the previous one. Does the second regression do a better job of explaining differences
in GPAs?
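When a model contains both STUDY and STUDYSQ, the effect of one more hour depends on how many hours the student already studies, so the two coefficients have to be read together. A minimal sketch of that arithmetic follows. The function name and arguments are hypothetical, and the coefficients must come from your own regression 2 output.

```python
def gpa_change_from_one_more_hour(b_study, b_studysq, hours):
    """Change in predicted GPA when weekly study rises from hours to hours + 1.

    b_study and b_studysq are the estimated coefficients on STUDY and STUDYSQ.
    All other variables are held fixed, so their terms cancel out.
    """
    before = b_study * hours + b_studysq * hours ** 2
    after = b_study * (hours + 1) + b_studysq * (hours + 1) ** 2
    # Algebraically equal to b_study + b_studysq * (2 * hours + 1).
    return after - before
```

Calling the function with hours=1 and then hours=6, using the two coefficients from regression 2, gives the two effects asked about in part (b).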
6. Think carefully about the possible determinants of a student’s GPA. What other
variables do you think might be relevant? How would you go about including them in
the regression? (You do not need to perform another regression. Just explain in words
the approach you might use.) Create a new worksheet called “analysis,” and put your
answer to this question in a text box in that worksheet.
7. Can you think of any other problems or difficulties with the approach we’ve used?
Put your answer in a second text box in the “analysis” worksheet. (Do not repeat your
answer to #6, and do not expect full credit for pointing out just one potential problem or
difficulty. Refer to the “Regression Difficulties” lecture for further guidance.)