NAME:______________________ I.D. # : ______________________

ECONOMICS 2900
Economics and Business Statistics
SUMMER SESSION 1, 2004
MIDTERM EXAMINATION
Tuesday, June 1st 2004
Weight 35%
NOTE: You have 2 hours to complete the exam; budget your time accordingly. Please
answer all questions on this exam booklet. Calculators used must not have the ability to
program alphabetic characters (whole words or sentences). GOOD LUCK
** Please do not mark the tables **
Question 1 (25 marks)
It is doubtful that any sport collects more statistics than baseball. This surfeit of statistics allows fans to
conduct a great variety of statistical analyses. For example, fans are always interested in determining
which factors lead to successful teams. A statistics practitioner determined the team batting average and
the team winning percentage for the 14 American League teams at the end of a recent season. We will
assume that these data represent a random sample of the relationship between batting average and winning
percentage for all time.
       Team BA   Winning%   Team BA^2   win%^2     Team BA * win%
       0.254     0.414      0.064516    0.171396   0.105156
       0.269     0.519      0.072361    0.269361   0.139611
       0.255     0.500      0.065025    0.25       0.1275
       0.262     0.537      0.068644    0.288369   0.140694
       0.254     0.352      0.064516    0.123904   0.089408
       0.247     0.519      0.061009    0.269361   0.128193
       0.264     0.506      0.069696    0.256036   0.133584
       0.271     0.512      0.073441    0.262144   0.138752
       0.280     0.586      0.0784      0.343396   0.16408
       0.256     0.438      0.065536    0.191844   0.112128
       0.248     0.519      0.061504    0.269361   0.128712
       0.255     0.512      0.065025    0.262144   0.13056
       0.270     0.525      0.0729      0.275625   0.14175
       0.257     0.562      0.066049    0.315844   0.144434
sum    3.642     7.001      0.949       3.549      1.825
a. Develop a regression model that attempts to predict winning % as a
function of team batting average. Find the sample regression line, and
interpret the coefficients.
b. Find the standard error of estimate, and describe what this statistic tells
you.
c. Do these data provide sufficient evidence to conclude that higher team
batting averages lead to higher winning percentages?
d. Find the coefficient of determination, and interpret its value.
e. Predict with 90% confidence the winning percentage of a team whose
batting average is .275.
f. Why would performing an F test to help determine whether the model is useful be
redundant for this model?
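The quantities asked for in parts a to d can be computed directly from the table. The following is a minimal sketch in plain Python, not an official solution; the variable names are illustrative assumptions. It works from the 14 raw data pairs rather than the rounded totals in the sum row, because the corrected sum of squares for batting average is a small difference between two nearly equal numbers.

```python
import math

# Team batting average (x) and winning percentage (y), copied from the table.
team_ba = [0.254, 0.269, 0.255, 0.262, 0.254, 0.247, 0.264,
           0.271, 0.280, 0.256, 0.248, 0.255, 0.270, 0.257]
win_pct = [0.414, 0.519, 0.500, 0.537, 0.352, 0.519, 0.506,
           0.512, 0.586, 0.438, 0.519, 0.512, 0.525, 0.562]

n = len(team_ba)
sum_x = sum(team_ba)
sum_y = sum(win_pct)
sum_xx = sum(x * x for x in team_ba)
sum_yy = sum(y * y for y in win_pct)
sum_xy = sum(x * y for x, y in zip(team_ba, win_pct))

# Corrected sums of squares and cross-products.
s_xx = sum_xx - sum_x ** 2 / n
s_yy = sum_yy - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least-squares estimates for the line y-hat = intercept + slope * x.
slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n

# Standard error of estimate, with n - 2 degrees of freedom.
sse = s_yy - slope * s_xy
std_error = math.sqrt(sse / (n - 2))

# Coefficient of determination.
r_squared = s_xy ** 2 / (s_xx * s_yy)

# t statistic for H0: beta1 = 0, to be compared with a t table (n - 2 d.f.).
t_stat = slope / (std_error / math.sqrt(s_xx))

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"standard error of estimate = {std_error:.4f}")
print(f"R^2 = {r_squared:.4f}, t = {t_stat:.4f}")
```

With a single independent variable, the F statistic equals the square of this t statistic, which is the point behind part f.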
Question 2 (20 marks)
The general manager of the Cleveland Indians baseball team is in the process of determining which
minor-league players to draft. He is aware that his team needs home-run hitters and would like to find a way to
predict the number of home runs a player will hit. Being an astute statistician, he gathers a random sample
of players and records the number of home runs each player hit in his first two full years as a major-league
player, the number of home runs he hit in his last full year in the minor leagues, his age, and the number of
years of professional baseball. An example of the first few lines of data, along with the initial regression
printout, appears below.
Major HR   Minor HR   Age   Years Pro
19         13         19    3
23         15         21    3
6          4          22    5
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.592560205
R Square             0.351127597
Adjusted R Square    0.335171718
Standard Error       6.992104843
Observations         126

ANOVA
             Df    SS            MS         F          Significance F
Regression   3     3227.612245   1075.871   22.00616   1.85592E-11
Residual     122   5964.522676   48.88953
Total        125   9192.134921

             Coefficients   Standard Error   t Stat     P-value    Lower 95%
Intercept    -1.969977822   9.547049398      -0.20634   0.836866   -20.86933228
Minor HR     0.665838264    0.087149184      7.640212   5.46E-12   0.493317598
Age          0.135727743    0.524087215      0.258979   0.796088   -0.901756157
Years Pro    1.176370911    0.670625334      1.75414    0.081917   -0.151200086
a. What is the regression equation? Interpret each of the coefficients.
b. How well does the model fit?
c. Test with alpha = .05 whether the model is useful. Explain how your test result
relates to “significance F” on the regression printout.
d. Does each of the independent variables belong in the model? How can you
tell?
e. Predict with 95% confidence the number of home runs in the first two
years of a player who is 25 years old, has played professional baseball for
7 years, and hit 22 home runs in his last year in the minor leagues.
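The summary figures on the printout are tied together by a few formulas, and checking them is one way to see where the F statistic in part c comes from. The sketch below is illustrative rather than an official solution; the variable names are assumptions, and the numbers are copied from the printout. It also computes only the point prediction for part e, since the printout does not contain what is needed to build the interval around it.

```python
# Values copied from the regression printout.
ss_regression = 3227.612245
ss_residual = 5964.522676
n_obs = 126
k = 3  # independent variables: Minor HR, Age, Years Pro

coefficients = {
    "Intercept": -1.969977822,
    "Minor HR": 0.665838264,
    "Age": 0.135727743,
    "Years Pro": 1.176370911,
}

# ANOVA relationships: SS / d.f. = MS, and F = MSR / MSE.
ss_total = ss_regression + ss_residual
ms_regression = ss_regression / k
ms_residual = ss_residual / (n_obs - k - 1)
f_stat = ms_regression / ms_residual

# Fit measures derived from the same sums of squares.
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k - 1)
std_error = ms_residual ** 0.5

# Point prediction for the player described in part e.
player = {"Minor HR": 22, "Age": 25, "Years Pro": 7}
predicted_hr = coefficients["Intercept"] + sum(
    coefficients[name] * value for name, value in player.items()
)

print(f"F = {f_stat:.5f}")
print(f"R^2 = {r_squared:.6f}, adjusted R^2 = {adj_r_squared:.6f}")
print(f"standard error = {std_error:.6f}")
print(f"predicted major-league HR = {predicted_hr:.3f}")
```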
Question 3 (25 marks)
The administrator of a school board in a large county was analyzing the average mathematics test scores in
the schools under her control. She noticed that there were dramatic differences in scores among the
schools. In an attempt to improve the scores of all the schools, she attempted to determine the factors that
account for the differences. Accordingly, she took a random sample of 40 schools across the county and,
for each, determined the mean test score last year, the percentage of teachers in each school who have at
least one university degree in mathematics, the mean age, and the mean annual income of the mathematics
teachers. An example of the first few lines of data, along with the initial regression printout, appears below.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.5975122
R Square             0.3570209
Adjusted R Square    0.3034393
Standard Error       7.724526
Observations         40

ANOVA
              Df   SS            MS         F          Significance F
Regression    3    1192.732105   397.5774   6.663125   0.001076925
Residual      36   2148.058895   59.6683
Total         39   3340.791

              Coefficients   Standard Error   t Stat     P-value    Lower 95%
Intercept     35.677618      7.278849159      4.901547   2.03E-05   20.91544713
Math Degree   0.2474816      0.069845662      3.543263   0.001115   0.1058282
Age           0.2448306      0.185213036      1.321886   0.194545   -0.13079835
Income        0.1332967      0.152818937      0.872253   0.388851   -0.17663405
Correlation matrix:
              Test Score   Math Degree   Age        Income
Test Score    1
Math Degree   0.506626     1
Age           0.332495     0.076597      1
Income        0.311981     0.099351      0.869752   1
a. What is the regression model? Do these coefficients make sense?
b. Overall, does this model fit the data well?
[Figure: histogram of the residuals (frequency by bin)]
[Figure: residuals plotted against observation number]
[Figure: residuals plotted against predicted values]
c. What are the required conditions regarding the error variable? Are these
conditions satisfied? Explain in detail.
d. What is multicollinearity? Why is it a problem? Is multicollinearity a problem in
this model? Should we fix multicollinearity when we do find evidence of it?
e. How would you attempt to make this a better model?
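For part d, one way to quantify how strongly the independent variables overlap is the variance inflation factor (VIF), which for each predictor equals the matching diagonal entry of the inverse of the predictors' correlation matrix. The sketch below is an illustration under that assumption, not a method the exam prescribes; the function and variable names are hypothetical, and the three correlations are copied from the correlation matrix.

```python
# Pairwise correlations among the independent variables,
# copied from the correlation matrix.
r_degree_age = 0.076597
r_degree_income = 0.099351
r_age_income = 0.869752


def vif_three_predictors(r12, r13, r23):
    """Return the VIFs of three predictors from their pairwise correlations.

    Each VIF is a diagonal entry of the inverse of the 3x3 correlation
    matrix, written out here with cofactors so no matrix library is needed.
    """
    det = 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    vif_1 = (1 - r23 ** 2) / det
    vif_2 = (1 - r13 ** 2) / det
    vif_3 = (1 - r12 ** 2) / det
    return vif_1, vif_2, vif_3


vif_degree, vif_age, vif_income = vif_three_predictors(
    r_degree_age, r_degree_income, r_age_income
)

print(f"VIF Math Degree = {vif_degree:.3f}")
print(f"VIF Age         = {vif_age:.3f}")
print(f"VIF Income      = {vif_income:.3f}")
```

A VIF close to 1 means a predictor is nearly uncorrelated with the others, while larger values mean its coefficient's standard error is inflated by the overlap.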
Question 4 (10 marks)
Lotteries have become important sources of revenue for governments. Many people
have criticized lotteries, however, referring to them as a tax on the poor and
uneducated. In an examination of the issue, a random sample of 100 adults was asked
how much they spend on lottery tickets and was interviewed about various
socioeconomic variables. The purpose of this study is to test the following beliefs:
1. Relatively uneducated people spend more on lotteries than do relatively
educated people.
2. Older people buy more lottery tickets than younger people.
3. People with more children spend more on lotteries than people with
fewer children.
4. Relatively poor people spend a greater proportion of their income on
lotteries than relatively rich people.
What do these regression results suggest about the 4 beliefs listed above? Be as
detailed and thorough as possible in your answer.
Question 5 (20 marks)
You perform a regression analysis with n = 6 and k = 2 which produces the following
residuals:
Residual
8
7
-5
-4
0
6
Could autocorrelation be a problem in the model? (As part of your answer, be sure to
define what autocorrelation is and to suggest a potential fix for autocorrelation if it is
detected.)
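One common check for first-order autocorrelation is the Durbin-Watson statistic, d, which is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below assumes that this is the intended test and that the residuals are listed in time order; the function name is illustrative.

```python
# Residuals in the order given in the question.
residuals = [8, 7, -5, -4, 0, 6]


def durbin_watson(errors):
    """Return d = sum of (e_t - e_(t-1))^2 divided by sum of e_t^2."""
    numerator = sum(
        (current - previous) ** 2
        for previous, current in zip(errors, errors[1:])
    )
    denominator = sum(e ** 2 for e in errors)
    return numerator / denominator


d = durbin_watson(residuals)
print(f"d = {d:.4f}")
```

Values of d near 2 are consistent with no first-order autocorrelation, values toward 0 point to positive autocorrelation, and values toward 4 point to negative autocorrelation. The formal conclusion comes from comparing d with the tabled critical values for the given n and k.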