Economics/Management 230
Fall 2010; Prof. Wegge (with Prof. Yagahashi and Prof. Balagyozyan)
Contact info: simone.wegge@csi.cuny.edu

Final Project: Regression Analysis

Directions: The first part of this project involves analyzing a small dataset on housing prices and incomes. The second part involves studying data on foreclosures. Both datasets are available on Blackboard under Course Documents. Use the version of Excel (2007) in our computer labs to produce your results. Appendix I in Doane and Seward, page 803, also includes tips on writing reports.

Part I: (35 points)

This dataset covers median income and median home prices for 34 selected Eastern cities for the year 2004.

1. Describe the data. What does each variable measure? Is the data cross-sectional or time series data? What is the unit of observation (firm, individual, year, etc.)?
2. Construct a graph in Excel showing the relationship between the two variables. Are there any unusual observations in the data?
3. How might median income be related to median home prices?
4. Regression analysis:
   a. Estimate a simple regression, regressing home prices on median income. Write down the regression line in terms of the values you obtained for b0 and b1.
   b. Interpret the regression results. Does the sign of the coefficient (b1) on median income agree with what you might expect?
   c. Look at the city of Hunter Mill, VA. The median home price is $290,000. What does the regression line predict, and how does this compare with $290,000?
   d. Are the coefficients of the linear regression statistically significant? Perform a two-tailed t test, first for the intercept (b0) and then for the slope (b1).
5. Examine the R² (coefficient of determination) of this regression. What does it tell you?
6. Examine the F statistic for the regression. What does it explain?

Part II: (65 points)

This dataset covers foreclosure rates by state in the year 2007.

1. In a few paragraphs, describe what has happened in the U.S. housing market over the past few years. Make sure you cite at least three outside sources, such as journal articles, and include them in a reference list. On the homepage of our library, use the tab “Find Articles” (underneath the pictures), and from there you can search for articles in the Multiple database Search using the keyword “foreclosure.” Here is a possible reference:
   Dunne, Timothy; Venkatu, Guhan. “Foreclosure Metrics.” Economic Commentary, 6/1/2009, pp. 1-4.
2. Describe the data. What does each variable measure? Is the data cross-sectional or time series data? What is the unit of observation (firm, individual, year, etc.)?
3. In Part II we are trying to explain foreclosure rates and figure out what other economic variables might influence them. In the regression you will estimate (below), the foreclosure rate will be the response variable, and all the other variables are the predictors. For now, construct at least three graphs, each one showing the relationship between two variables at a time. You can graph the foreclosure rate versus one of the predictors, or you can construct a graph with two of the predictors. For each of your three graphs, explain why you think it is interesting to look at.
4. You are trying to explain foreclosure rates. State your a priori hypotheses about the sign (positive or negative) of the coefficient on each predictor, and your reasoning about cause and effect. In other words, state how you think the different variables may affect foreclosure rates.
5. Regression analysis, 1st part:
   a. Estimate a simple regression, regressing foreclosure rates on one other variable in the dataset. Write down the regression line in terms of the values you obtained for b0 and b1.
   b. Interpret the regression results. Do the coefficient signs agree with the a priori expectations that you discussed above?
   c. Are the coefficients statistically significant? Perform a two-tailed t test to examine this.
   d. Examine the R² of this regression. What does it tell you?
   e. Does the R² differ a lot from the adjusted R² of this regression? If so, what does that mean?
   f. Examine the F statistic of this regression. What does it tell you?
6. Regression analysis, 2nd part:
   a. Estimate a multiple regression, regressing foreclosure rates on other variables. Make sure you include the predictor variable you chose in the 1st part (above). Write down the regression line in terms of the values you obtained for b0, b1, b2, b3, etc. In this part of the assignment, you may need to try several different specifications of the regression model before you decide on one to report here.
   b. Interpret the regression results.
      i. Explain why you chose this particular regression model in terms of the predictors you have chosen. Did you experiment with any other model?
      ii. In terms of the number of predictors in your model, what do Evans’ Rule or at least Doane’s Rule suggest?
      iii. Do the signs of the coefficients agree with the a priori expectations that you discussed above? What seem to be the major predictors of foreclosure rates?
      iv. In particular, compare the coefficient on the predictor you used in the 1st part (#5) with what you have in this part. Did the coefficient change a lot once you included it in a multiple regression?
   c. Are the coefficients statistically significant? Perform a two-tailed t test to examine this.
   d. Examine the R² of this regression. What does it tell you?
   e. Does the R² differ a lot from the adjusted R² of this regression? If so, what does that mean?
   f. Examine the F statistic of this regression. What does it tell you?
   g. Look at the state of Nevada. Write down the predicted foreclosure rate given your regression results. How does this differ from the actual foreclosure rate in the dataset? See the section titled “Predictions from a Fitted Regression” on page 548 to get a sense of what this is.
7. In terms of your regression results, how has the data analysis you have done here informed your opinion about the foreclosure crisis in the U.S.? What have you learned that is new and that you would not have learned by just reading reports from journals and news outlets that cover business topics? Do you think there are factors driving foreclosure rates that have been ignored by the press?
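
Illustration (not part of the required Excel work): the quantities requested in the regression questions above, namely b0, b1, the t statistics, R², adjusted R², the F statistic, and a prediction with its residual, can be sketched in a few lines of Python. This is only a minimal sketch of the standard least-squares formulas. The function name `simple_regression`, the variable names, and every number in the sample data are made up for illustration and do not come from the course datasets.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x with the usual summary statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx               # slope
    b0 = y_bar - b1 * x_bar      # intercept

    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    ssr = sst - sse                                         # explained variation

    k = 1                        # number of predictors in a simple regression
    df_resid = n - k - 1         # degrees of freedom for the t tests
    mse = sse / df_resid
    r2 = ssr / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / df_resid

    se_b1 = math.sqrt(mse / sxx)
    se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))

    return {
        "b0": b0, "b1": b1,
        "t_b0": b0 / se_b0, "t_b1": b1 / se_b1,
        "r2": r2, "r2_adj": r2_adj,
        "F": (ssr / k) / mse,
        "df_resid": df_resid,
    }


# Hypothetical data, in thousands of dollars (NOT the course dataset).
income = [40, 45, 50, 55, 60, 70, 80, 90]
price = [150, 160, 185, 200, 210, 260, 290, 330]

result = simple_regression(income, price)

# Prediction for a hypothetical city with median income 65 and actual price 250.
predicted = result["b0"] + result["b1"] * 65
residual = 250 - predicted       # actual minus predicted

for name, value in result.items():
    print(f"{name}: {value:.4f}")
print(f"predicted: {predicted:.2f}, residual: {residual:.2f}")
```

Each t statistic is then compared with the two-tailed critical value from a t table at `df_resid` degrees of freedom. In a simple regression the F statistic equals the square of the slope's t statistic, which gives a quick consistency check on the Excel output.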