NAME: ______________________    I.D. #: ______________________

ECONOMICS 2900
Economics and Business Statistics
SUMMER SESSION 1, 2004
MIDTERM EXAMINATION
Tuesday, June 1st 2004
Weight 35%

NOTE: You have 2 hours to complete the exam; budget your time accordingly. Please answer all questions on this exam booklet. Calculators used must not have the ability to program alphabetic characters (whole words or sentences).

GOOD LUCK

** Please do not mark the tables **

Question 1 (25 marks)

It is doubtful that any sport collects more statistics than baseball. This surfeit of statistics allows fans to conduct a great variety of statistical analyses. For example, fans are always interested in determining which factors lead to successful teams. A statistics practitioner determined the team batting average and the team winning percentage for the 14 American League teams at the end of a recent season. We will assume that these data represent a random sample of the relationship between batting average and winning percentage for all time.

Team BA    Winning %    Team BA^2    Win%^2      Team BA * Win%
0.254      0.414        0.064516     0.171396    0.105156
0.269      0.519        0.072361     0.269361    0.139611
0.255      0.500        0.065025     0.25        0.1275
0.262      0.537        0.068644     0.288369    0.140694
0.254      0.352        0.064516     0.123904    0.089408
0.247      0.519        0.061009     0.269361    0.128193
0.264      0.506        0.069696     0.256036    0.133584
0.271      0.512        0.073441     0.262144    0.138752
0.280      0.586        0.0784       0.343396    0.16408
0.256      0.438        0.065536     0.191844    0.112128
0.248      0.519        0.061504     0.269361    0.128712
0.255      0.512        0.065025     0.262144    0.13056
0.270      0.525        0.0729       0.275625    0.14175
0.257      0.562        0.066049     0.315844    0.144434
3.642      7.001        0.949        3.549       1.825       (sum)

a. Develop a regression model that attempts to predict winning percentage as a function of team batting average. Find the sample regression line, and interpret the coefficients.
b. Find the standard error of estimate, and describe what this statistic tells you.
c. Do these data provide sufficient evidence to conclude that higher team batting averages lead to higher winning percentages?
d. Find the coefficient of determination, and interpret its value.
e. Predict with 90% confidence the winning percentage of a team whose batting average is .275.
f. Why would performing an F test to help determine if the model is useful be redundant for this model?

Question 2 (20 marks)

The general manager of the Cleveland Indians baseball team is in the process of determining which minor-league players to draft. He is aware that his team needs home-run hitters and would like to find a way to predict the number of home runs a player will hit. Being an astute statistician, he gathers a random sample of players and records the number of home runs each player hit in his first two full years as a major-league player, the number of home runs he hit in his last full year in the minor leagues, his age, and the number of years of professional baseball. An example of the first few lines of data, along with the initial regression printout, appears below.

Major HR    Minor HR    Age    Years Pro
19          13          19     3
23          15          21     3
6           4           22     5

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.592560205
R Square             0.351127597
Adjusted R Square    0.335171718
Standard Error       6.992104843
Observations         126

ANOVA
              df     SS             MS          F           Significance F
Regression    3      3227.612245    1075.871    22.00616    1.85592E-11
Residual      122    5964.522676    48.88953
Total         125    9192.134921

              Coefficients    Standard Error    t Stat      P-value     Lower 95%
Intercept     -1.969977822    9.547049398       -0.20634    0.836866    -20.86933228
Minor HR      0.665838264     0.087149184       7.640212    5.46E-12    0.493317598
Age           0.135727743     0.524087215       0.258979    0.796088    -0.901756157
Years Pro     1.176370911     0.670625334       1.75414     0.081917    -0.151200086

a. What is the regression equation? Interpret each of the coefficients.
b. How well does the model fit?
c. Test with alpha = .05 if the model is useful. Explain how your test result relates to "Significance F" on the regression printout.
d. Does each of the independent variables belong in the model? How can you tell?
e. Predict with 95% confidence the number of home runs in the first two years of a player who is 25 years old, has played professional baseball for 7 years, and hit 22 home runs in his last year in the minor leagues.

Question 3 (25 marks)

The administrator of a school board in a large county was analyzing the average mathematics test scores in the schools under her control. She noticed that there were dramatic differences in scores among the schools. In an attempt to improve the scores of all the schools, she attempted to determine the factors that account for the differences. Accordingly, she took a random sample of 40 schools across the county and, for each, determined the mean test score last year, the percentage of teachers in each school who have at least one university degree in mathematics, the mean age, and the mean annual income of the mathematics teachers. An example of the first few lines of data, along with the initial regression printout, appears below.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.5975122
R Square             0.8570209
Adjusted R Square    0.8034393
Standard Error       7.724526
Observations         40

ANOVA
              df    SS             MS          F           Significance F
Regression    3     1192.732105    397.5774    6.663125    0.001076925
Residual      36    2148.058895    59.6683
Total         39    3340.791

               Coefficients    Standard Error    t Stat      P-value     Lower 95%
Intercept      35.677618       7.278849159       4.901547    2.03E-05    20.91544713
Math Degree    0.2474816       0.069845662       3.543263    0.001115    0.1058282
Age            0.2448306       0.185213036       1.321886    0.194545    -0.13079835
Income         0.1332967       0.152818937       0.872253    0.388851    -0.17663405

Correlation matrix:
               Test Score    Math Degree    Age         Income
Test Score     1
Math Degree    0.506626      1
Age            0.332495      0.076597       1
Income         0.311981      0.099351       0.869752    1

a. What is the regression model? Do these coefficients make sense?
b. Overall, does this model fit the data well?
[Figure: histogram of the residuals (frequency by bin)]
[Figure: residuals plotted against observation number]
[Figure: residuals plotted against predicted values]

c. What are the required conditions regarding the error variable? Are these conditions satisfied? Explain in detail.
d. What is multicollinearity? Why is it a problem? Is multicollinearity a problem in this model? Should we fix multicollinearity when we do find evidence of it?
e. How would you attempt to make this a better model?

Question 4 (10 marks)

Lotteries have become important sources of revenue for governments. Many people have criticized lotteries, however, referring to them as a tax on the poor and uneducated. In an examination of the issue, a random sample of 100 adults was asked how much they spend on lottery tickets and was interviewed about various socioeconomic variables. The purpose of this study is to test the following beliefs:

1. Relatively uneducated people spend more on lotteries than do relatively educated people.
2. Older people buy more lottery tickets than younger people.
3. People with more children spend more on lotteries than people with fewer children.
4. Relatively poor people spend a greater proportion of their income on lotteries than relatively rich people.

What do these regression results suggest about the 4 beliefs listed above? Be as detailed and thorough as possible in your answer.

Question 5 (20 marks)

You perform a regression analysis with n = 6 and k = 2 which produces the following residuals:

Residual
8
7
-5
-4
0
6

Could autocorrelation be a problem in the model? (As part of your answer, be sure to define what autocorrelation is and suggest a potential fix for autocorrelation if it is detected.)
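
One common way to approach Question 5 is the Durbin-Watson statistic, d = sum of (e_t - e_(t-1))^2 divided by the sum of e_t^2. The sketch below is an illustration only, not an official solution. It assumes the six residuals are listed in time order, and the function name durbin_watson is a hypothetical helper introduced here.

```python
def durbin_watson(residuals):
    """Durbin-Watson d for residuals assumed to be in time order."""
    # Numerator: squared differences between consecutive residuals.
    numerator = sum(
        (current - previous) ** 2
        for previous, current in zip(residuals, residuals[1:])
    )
    # Denominator: sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Residuals from Question 5, in the order given on the exam.
residuals = [8, 7, -5, -4, 0, 6]
d = durbin_watson(residuals)
print(f"Durbin-Watson d = {d:.4f}")
```

Values of d near 2 suggest no first-order autocorrelation, values well below 2 point toward positive autocorrelation, and values well above 2 point toward negative autocorrelation. The formal conclusion depends on the critical values d_L and d_U from a Durbin-Watson table for the given n and k.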