Economics/Management 230
Fall 2010; Prof. Wegge (with Prof. Yagahashi and Prof. Balagyozyan)
Contact info: simone.wegge@csi.cuny.edu
Final Project: Regression Analysis
Directions: The first part of this project involves analyzing a small data set on housing prices and
incomes. The second part involves studying data on foreclosure. Both datasets are available on
Blackboard under course documents. Use the version of Excel (2007) in our computer labs to get
data results. Appendix I in Doane and Seward, page 803, also includes tips on writing reports.
Part I: (35 points) This dataset covers median income and median home prices for 34 selected
Eastern cities for the year 2004.
1. Describe the data. What does each variable measure? Is the data cross-sectional or time
series data? What is the unit of observation (firm, individual, year, etc.)?
2. Construct a graph in Excel showing the relationship between the two variables. Are there any
unusual observations in the data?
3. How might median income be related to median home prices?
4. Regression analysis:
a. Estimate a simple regression, regressing home prices on median income. Write
down what the regression line is, in terms of what you got for b0 and b1.
b. Interpret the regression results. Does the sign of the coefficient (b1) on median
income agree with what you might expect?
c. Look at the city of Hunter Mill, VA. The median home price is $290,000. What
does the regression line predict and how does this compare with $290,000?
d. Are the coefficients of the linear regression statistically significant? Perform a
two tailed t test first for the intercept (b0) and then for the slope (b1).
5. Examine the R² (coefficient of determination) of this regression. What does it tell you?
6. Examine the F statistic for the regression. What does it explain?
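For students who want to cross-check the Excel output for Part I, the quantities asked for in items 4 through 6 (b0, b1, the t statistics, R² and F) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the required Excel work. The function name simple_ols and the income and price figures below are made up for illustration; they are not the Blackboard dataset.

```python
import math

def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x with the usual summary statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares

    b1 = sxy / sxx                                # slope
    b0 = y_bar - b1 * x_bar                       # intercept

    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ssr = sst - sse                               # explained sum of squares
    df = n - 2                                    # error degrees of freedom
    mse = sse / df

    se_b1 = math.sqrt(mse / sxx)
    se_b0 = math.sqrt(mse * (1.0 / n + x_bar ** 2 / sxx))

    return {
        "b0": b0, "b1": b1,
        "t_b0": b0 / se_b0,                       # tests H0: intercept = 0
        "t_b1": b1 / se_b1,                       # tests H0: slope = 0
        "r2": ssr / sst,
        "f": ssr / mse,                           # equals t_b1 squared here
        "df": df,
    }

# Hypothetical figures (thousands of dollars), NOT the course dataset.
income = [42.0, 48.5, 55.0, 61.2, 70.3, 83.9]
price = [120.0, 150.0, 165.0, 210.0, 240.0, 310.0]

fit = simple_ols(income, price)
print("price = %.2f + %.2f * income" % (fit["b0"], fit["b1"]))
print("t(b0) = %.2f, t(b1) = %.2f, df = %d" % (fit["t_b0"], fit["t_b1"], fit["df"]))
print("R^2 = %.3f, F = %.2f" % (fit["r2"], fit["f"]))

# Predicted versus actual price for one (hypothetical) city, as in item 4c.
predicted = fit["b0"] + fit["b1"] * income[3]
print("predicted %.1f vs actual %.1f" % (predicted, price[3]))
```

For the two-tailed test in item 4d, each t statistic is compared in absolute value with the critical value from a t table at n - 2 degrees of freedom. In a simple regression the F statistic is the square of the slope's t statistic, so the two tests agree.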
Part II: (65 points) This dataset covers foreclosure rates by state in the year 2007.
1. In a few paragraphs, describe what has happened in the U.S. housing market over the
past few years. Make sure you cite at least three outside sources from journals and
include them in a reference list. On the homepage of our library, use the tab “Find
Articles” (underneath the pictures), and from there you can search for articles in the
Multiple database Search using the keyword “foreclosure.” Here is a possible
reference: Dunne, Timothy; Venkatu, Guhan. “Foreclosure Metrics.” Economic
Commentary, 6/1/2009, pp. 1-4.
2. Describe the data. What does each variable measure? Is the data cross-sectional or
time series data? What is the unit of observation (firm, individual, year, etc.)?
3. In Part II we are trying to explain foreclosure rates and figure out what other
economic variables might influence them. In the regression you will estimate (below),
the foreclosure rate will be the response variable, and all the other variables are the
predictors. For now, construct at least three graphs, each one showing the relationship
between two variables at a time. You can graph the foreclosure rate versus one of the
predictors, or you can construct a graph with two of the predictors. For each of your
three graphs, explain why you think it is interesting to look at.
4. You are trying to explain foreclosure rates. State your a priori hypotheses about the
sign (positive or negative) of each predictor (variable) and your reasoning about cause
and effect. Basically, state here how you think the different variables may affect
foreclosure rates.
5. Regression analysis, 1st part:
a. Estimate a simple regression, regressing foreclosure rates on one other
variable of the dataset. Write down what the regression line is, in terms of
what you got for b0 and b1.
b. Interpret the regression results. Do the coefficient signs agree with your a
priori expectations that you discussed above?
c. Are the coefficients statistically significant? Perform a two-tailed t test to
examine this.
d. Examine the R² of this regression. What does it tell you?
e. Does the R² differ a lot from the adjusted R² of this regression? If so, what does that
mean?
f. Examine the F Statistic of this regression. What does it tell you?
6. Regression analysis, 2nd part:
a. Estimate a multiple regression, regressing foreclosure rates on other variables.
Make sure you include the predictor variable you chose in the 1st part (above).
Write down what the regression line is, in terms of what you got for b0, b1, b2,
and b3, etc. In this part of the assignment, you may need to try several different
specifications of the regression model before you decide on one to provide
here.
b. Interpret the regression results.
i. Explain why you chose this particular regression model in terms of
the predictors you have chosen. Did you experiment with any other
model?
ii. In terms of the number of predictors in your model, what do Evans'
Rule or at least Doane’s Rule suggest?
iii. Do the signs on the coefficients agree with your a priori
expectations that you discussed above? What seem to be the major
predictors in terms of foreclosure rates?
iv. In particular, compare the coefficient on the predictor you used in the
1st part (#5) with what you have in this part. Did the coefficient change
a lot once you included it in a multiple regression?
c. Are the coefficients statistically significant? Perform a two-tailed t test to
examine this.
d. Examine the R² of this regression. What does it tell you?
e. Does the R² differ a lot from the adjusted R² of this regression? If so, what does that
mean?
f. Examine the F Statistic of this regression. What does it tell you?
g. Look at the state of Nevada. Write down what the predicted foreclosure rate is
given your regression results. How does this differ from the actual foreclosure
rate in the dataset? See the section titled “Predictions from a Fitted
Regression” on page 548 to get a sense as to what this is.
7. In terms of your regression results, how has the data analysis you have done here
informed your opinion about the foreclosure crisis in the U.S.? What have you learned
that is new that you would not have learned by just reading reports from journals and
news outlets that cover business topics? Do you think there are factors driving
foreclosure rates that have been ignored by the press?
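As an optional cross-check on the Excel output for Part II, the multiple regression quantities in item 6 (the coefficients, R², adjusted R², F, the sample-size rules and a prediction for one state) can be sketched in plain Python. This is a rough sketch under stated assumptions, not the required method. The helper names and the eight rows of data are hypothetical and are not the Blackboard dataset. The thresholds for Evans' Rule (n/k of at least 10) and Doane's Rule (n/k of at least 5) are written as recalled from Doane and Seward and should be confirmed against the textbook.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z

def multiple_ols(rows, y):
    """rows: one list of k predictor values per observation; y: responses."""
    n, k = len(y), len(rows[0])
    X = [[1.0] + list(r) for r in rows]           # prepend the intercept column
    p = k + 1
    xtx = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    b = solve(xtx, xty)                           # normal equations: X'X b = X'y

    y_bar = sum(y) / n
    fitted = [sum(bj * xj for bj, xj in zip(b, xi)) for xi in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    f = ((sst - sse) / k) / (sse / (n - k - 1))
    return {"b": b, "r2": r2, "r2_adj": r2_adj, "f": f}

def predict(b, x):
    """Predicted response for predictor values x, given coefficients b."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))

def rule_check(n, k):
    """Sample-size rules of thumb (thresholds as recalled from the textbook)."""
    return {"evans": n / k >= 10, "doane": n / k >= 5}

# Hypothetical rows: [unemployment rate, home price change], NOT the course data.
rows = [[4.5, 2.0], [5.0, -1.5], [3.8, 3.2], [6.2, -4.0],
        [4.1, 1.0], [5.5, -2.5], [4.8, 0.5], [7.0, -6.0]]
rate = [1.0, 1.6, 0.8, 2.4, 1.1, 1.9, 1.3, 3.1]

fit = multiple_ols(rows, rate)
print("coefficients b0, b1, b2:", [round(c, 3) for c in fit["b"]])
print("R^2 = %.3f, adjusted R^2 = %.3f, F = %.2f"
      % (fit["r2"], fit["r2_adj"], fit["f"]))
print("rules:", rule_check(len(rate), len(rows[0])))

# Predicted versus actual rate for the last (hypothetical) state, as in item 6g.
print("predicted %.2f vs actual %.2f" % (predict(fit["b"], rows[-1]), rate[-1]))
```

Adjusted R² penalizes each added predictor, so a large gap between R² and adjusted R² suggests the model carries predictors that add little explanatory power.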