NAME: ______________________          I.D. #: ______________________

ECONOMICS 2900
Economics and Business Statistics
SPRING SEMESTER, 2004
MIDTERM EXAMINATION
Tuesday, Feb. 24th
Weight 35%

NOTE: You have 75 minutes to complete the exam. Please answer all questions on this exam booklet. Calculators used must not have the ability to program alphabetic characters (whole words or sentences).

GOOD LUCK

Question # 1

At a recent Willie Nelson concert, a survey was conducted that asked a random sample of 20 people their age and how many concerts they have attended since the first of the year. The following data were collected:

Age   Number of Concerts        Age   Number of Concerts
62    6                         44    3
57    5                         48    2
40    4                         55    4
49    3                         60    5
67    5                         59    4
54    5                         63    5
43    2                         69    4
65    6                         40    2
54    3                         38    1
41    1                         52    3

An Excel output follows:

DESCRIPTIVE STATISTICS

                      Age        Concerts
Mean                  53         3.65
Standard Error        2.1849     0.3424
Standard Deviation    9.7711     1.5313
Sample Variance       95.4737    2.3447
Count                 20         20

SPEARMAN RANK CORRELATION COEFFICIENT = 0.8306

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.80203
R Square              0.64326
Adjusted R Square     0.62344
Standard Error        0.93965
Observations          20

ANOVA
              df    SS          MS          F           Significance F
Regression    1     28.65711    28.65711    32.45653    2.1082E-05
Residual      18    15.89289    0.88294
Total         19    44.55

              Coefficients   Standard Error   t Stat      P-value    Lower 95%   Upper 95%
Intercept     -3.01152       1.18802          -2.53491    0.02074    -5.50746    -0.5156
Age           0.12569        0.02206          5.69706     0.00002    0.07934     0.1720

A. What is the regression equation? What does it mean?

B. What is the R squared? What does it tell you?

C. What is the Standard Error? What does the value of this statistic mean?

D. Does Age appear to be important when predicting the number of concerts attended?

E. Is the linear model appropriate? How can you tell?

F. Predict with 95% confidence the number of concerts attended by a 45-year-old individual. (Just show the formula; do not calculate.)

G. Predict with 95% confidence the average number of concerts attended by all 45-year-old individuals. (Just show the formula; do not calculate.)

[Figure: Histogram of the residuals (horizontal axis: Residuals; vertical axis: Frequency)]

[Figure: Residuals versus Predicted (horizontal axis: Predicted; vertical axis: Residuals)]

H. What is heteroskedasticity? Does it appear to be a problem in this model?

I. What does the Histogram tell you? Why is this important?

Question # 2

An economist wanted to develop a multiple regression model to enable him to predict the annual family expenditure on clothes. After some consideration, he developed the multiple regression model

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$

where
$y$ = annual family clothes expenditure (in $1,000)
$x_1$ = annual household income (in $1,000)
$x_2$ = number of family members
$x_3$ = number of children under 10 years of age

The computer output is shown below.

THE REGRESSION EQUATION IS

$y = 1.74 + 0.091 x_1 + 0.93 x_2 + 0.26 x_3$

Predictor    Coef     StDev    T
Constant     1.74     0.630    2.762
x1           0.091    0.025    3.640
x2           0.93     0.290    3.207
x3           0.26     0.180    1.444

S = 2.06     R-Sq = 59.6%

ANALYSIS OF VARIANCE
Source of Variation    df    SS     MS       F
Regression             3     288    96       22.647
Error                  46    195    4.239
Total                  49    483

A. Is this model useful? (Use the 5% significance level.)

B. Test at the 1% significance level to determine whether the number of family members and annual family clothes expenditure are linearly related.

F. What is Multicollinearity? Does it appear to be a problem in this model? How can you tell?

Question # 3

An avid football fan was in the process of examining the factors that determine the success or failure of football teams. He noticed that teams with many rookies and teams with many veterans seem to do quite poorly. To further analyze his beliefs, he took a random sample of 20 teams and proposed a second-order model with one independent variable. The selected model is

$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$

where
$y$ = team's winning percentage
$x$ = average years of professional experience

The computer output is shown below.

THE REGRESSION EQUATION IS

$y = 32.6 + 5.96 x - 0.48 x^2$

Predictor    Coef     StDev    T
Constant     32.6     19.3     1.689
x            5.96     2.41     2.473
x^2          -0.48    0.22     -2.182

S = 16.1     R-Sq = 43.9%

ANALYSIS OF VARIANCE
Source of Variation    df    SS      MS         F
Regression             2     3452    1726       6.663
Error                  17    4404    259.059
Total                  19    7856

A. Suggest a reason why this fan would choose to include a variable like $x^2$. What is the meaning of this variable, and how do you know whether it should be retained as part of the model?

Question # 4

A professor of accounting wanted to develop a multiple regression model to predict the students' grades in her fourth-year accounting course. She decides that the two most important factors are the student's grade point average in the first three years and the student's major. She proposes the model

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$

where
$y$ = fourth-year accounting course mark (out of 100)
$x_1$ = G.P.A. in first three years (range 0 to 12)
$x_2$ = 1 if student's major is accounting, 0 if not
$x_3$ = 1 if student's major is finance, 0 if not

The computer output is shown below.

THE REGRESSION EQUATION IS

$y = 9.14 + 6.73 x_1 + 10.42 x_2 + 5.16 x_3$

Predictor    Coef     StDev    T
Constant     9.14     7.10     1.287
x1           6.73     1.91     3.524
x2           10.42    4.16     2.505
x3           5.16     3.93     1.313

S = 15.0     R-Sq = 44.2%

ANALYSIS OF VARIANCE
Source of Variation    df    SS       MS          F
Regression             3     17098    5699.333    25.386
Error                  96    21553    224.510
Total                  99    38651

Rank the students, according to their major, in order of who tends to score the highest in the accounting course.

Question # 5

A. What is Autocorrelation?

B. The only information you are given about a regression is as follows: $d = 1.75$, $n = 20$, $k = 2$, and $\alpha = 0.05$. Test to see if Autocorrelation is a problem.

C. How else can you tell if Autocorrelation exists?