Exam 3

DS350 – QUANTITATIVE METHODS FOR BUSINESS DECISIONS
FALL SEMESTER 2003
“Big Quiz” #3
Answer the following questions in the space provided. SHOW YOUR WORK when
appropriate. Unless the problem states otherwise, use the traditional confidence level of 95%
and the traditional significance level of $\alpha = .05$. Relative problem weights are given in brackets;
these total 100 points. Writing “Pledged” before your signature is a symbol of your ongoing
commitment to the Honor System. Enjoy!!
Question 1 [4 points]:
Dietrich Buxtehude is conducting a hypothesis test to investigate whether there is a
connection between statistics aptitude and sleep. He computes a positive correlation, and obtains
a p-value of .000042. What conclusion should he draw?
_____ There is enough evidence to believe that statistics aptitude and sleep are related.
_____ There is not enough evidence to believe that statistics aptitude and sleep are related.
_____ There is enough evidence to believe that statistics aptitude and sleep are not related.
_____ There is not enough evidence to believe that statistics aptitude and sleep are not related.
Question 2 [2 points]:
Is the result in Question 1 statistically significant? Explain.
Question 3 [12 points, divided as indicated]:
The Mall-Mart Corporation wants to open a new SuperStore in the thriving metropolis of
Bean Blossom, Indiana. This has created some controversy in the town. A random sample of
400 adult town residents was surveyed on their opinions; 240 of these supported opening of the
SuperStore.
a) [8] Give a confidence interval for the percentage of all adult town residents who favor opening
of the SuperStore.
b) [4] Based upon your confidence interval, may we conclude (with 95% confidence) that a
majority of the town favors opening of the SuperStore? Explain.
Question 4 [18 points, divided as indicated]:
Voters in Orange County recently defeated a referendum to increase taxes to fund
improvements in the county school system. Analysis of the election data revealed some patterns
in the voting. For example, a poll taken a few days before the election asked for respondents’
attitudes toward the referendum. Age of the respondent was included as a demographic question
on the survey. A cross-tabulation of responses on these questions is given below.
                       Age of respondent:
             18-35     35-65     65 and older
Favored       100       180           40
Opposed       100       220          160
a) [4] State the appropriate null and alternative hypotheses (in words and in symbols) for
investigating whether a respondent’s age is related to her/his opinion regarding the tax
referendum.
b) [8] Compute the test statistic for testing your hypotheses from Part A. State the distribution
and (if appropriate) the degrees of freedom. Give the p-value of the test.
c) [6] Draw an appropriate conclusion, both in statistics jargon (reject/don’t reject) and in the
language of the problem.
Question 5 [10 points]:
Dr. Rasp has data on the number of hours of sleep, and the “big quiz” score, for three
students in statistics class. Alphonso got 6 hours of sleep, and got a 78 on the “quiz.” Balph got
only one hour of sleep and had a 42 on the “quiz,” while Clorinda got a 96 after 8 hours of sleep.
Find the (sample) correlation between sleep and quiz score.
Question 6 [30 points, divided as indicated]:
Recall that the security market line is the regression line relating the return and risk (as
measured by the beta coefficient) of stocks. Data for a random sample of ten stocks on the New
York Stock Exchange are given below.
Risk (X)     .4    .7    1     .2    1.3   1.6   .9    1.1   1     1.8
Return (Y)   2.6   2.8   3.5   3.8   5.2   7.4   7.6   8.4   8.5   10.2
Note that, for these data, the (sample) correlation is .721. The slope is 4 and the intercept is 2.
The error variance ($s_e^2$) is 4.06.
a) [6] State (in words and in symbols) the appropriate null and alternative hypotheses for
investigating whether in fact “return is a function of risk.”
b) [8] Compute the test statistic for testing your hypotheses from Part A. State the distribution
and (if appropriate) the degrees of freedom. Give the p-value of the test.
c) [6] Draw an appropriate conclusion, both in statistics jargon (reject/don’t reject) and in the
language of the problem.
d) [10] Predict the return for WorldWide Widget, which has a beta of 1.4. Give a 95%
confidence interval for this value.
Question 7 [14 points, divided as indicated]:
Gracetta Squornshellous has done (single) exponential smoothing on a set of data.
Results are as follows:
Data    Exponential Smooth
18      18
22      18.4
28      19.36
42      21.624
a) [6] What smoothing constant ($\alpha$) was used to do the exponential smoothing?
b) [4] What does this smoothing constant ($\alpha$) indicate?
______ The exponentially smoothed values will be highly sensitive to changes in the data.
______ The exponentially smoothed values will be highly stable to changes in the data.
______ The exponentially smoothed values will reject the null hypothesis.
______ The exponentially smoothed values will not reject the null hypothesis.
c) [4] Gracetta uses the exponentially smoothed values to forecast the following data point.
What is the mean square error (MSE) of the forecasts?
Question 8 [10 points]:
Horatio Wajberlinski has computed a regression model relating a student’s grade on the
“big quiz” in statistics class to the number of hours of sleep s/he got the night before the “quiz.”
Excel output showing results from his analysis is given on the following page.
a) [6] Give the slope and intercept for the model. Interpret these numbers.
b) [4] Horatio spilled coffee on part of the printout, and he can no longer read the top three
numbers. But they can still be computed from information given elsewhere on the printout.
Give the coefficient of determination ($r^2$) for these data. Interpret this number.
REGRESSION OUTPUT for Question #8

Regression Statistics
Multiple R            ??
R Square              ??
Adjusted R Square     ----
Standard Error        25.26
Observations          20

ANOVA
              df     SS         MS         F          Significance F
Regression     1     12339.5    12339.5    19.3369    0.000347
Residual      18     11486.3    638.1
Total         19     23825.8

              Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept     20.30          8.89             2.28     0.03      1.61        38.99
Sleep         10.06          2.29             4.40     0.00      5.25        14.87
DS350 – FALL 2003 – “BIG QUIZ” #3 - SOLUTIONS
1) There is enough evidence to believe that statistics aptitude and sleep are related.
2) Yes. “Significant” means “we rejected the null hypothesis,” and here the p-value (.000042) is well below $\alpha = .05$.
3a) Confidence interval on a proportion:
[best guess] ± [ # ]*[st dev]
We observed 240/400 = .60
$$.6 \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .6 \pm 1.96\sqrt{\frac{(.6)(.4)}{400}} \rightarrow .6 \pm .048$$
3b) YES, we’re pretty sure a majority of the town favors opening the SuperStore. We don’t
know what the population proportion is exactly, but we’re pretty sure it’s between 55.2%
and 64.8%. Every value in that range is above 50%, so it is a majority under any circumstance.
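As a cross-check on 3a and 3b, here is a minimal sketch of the large-sample interval $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$, assuming the normal approximation the solution uses. The helper name `proportion_ci` is hypothetical; `statistics.NormalDist` is from the Python standard library and supplies the 1.96 multiplier.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Hypothetical helper: normal-approximation CI for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)

p_hat, margin, (low, high) = proportion_ci(240, 400)
print(f"{p_hat:.2f} +/- {margin:.3f} -> ({low:.3f}, {high:.3f})")

# Part (b): a majority is supported only if the whole interval lies above 0.5
print("majority" if low > 0.5 else "majority not established")
```

The majority check compares the lower endpoint with 0.5, which is the same reasoning as 3b.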
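4a) $H_0$: age and opinion are not related; $\pi_{18\text{-}35} = \pi_{35\text{-}65} = \pi_{65+}$, where each $\pi$ is the proportion of that age group favoring the referendum.
$H_A$: age and opinion are related; at least one of the $\pi$’s is not equal to the others.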
4b) Since it involves multiple proportions, it’s a chi-square test. The expected cell frequencies
are:

Age:         18-35     35-65     65+
Favored       80*       160       80
Opposed      120        240      120

*EXAMPLE: [row total]*[col total]/[total total] = 320*200/800 = 80

The chi-square statistic is
$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}}$$
Here this is
$$\frac{(100-80)^2}{80} + \frac{(100-120)^2}{120} + \frac{(180-160)^2}{160} + \frac{(220-240)^2}{240} + \frac{(40-80)^2}{80} + \frac{(160-120)^2}{120}$$
$$= 5 + 3.333 + 2.5 + 1.667 + 20 + 13.333 = 45.833$$
This is a chi-square statistic with 2 degrees of freedom. The p-value is <.005.
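The expected counts and the statistic in 4b can be reproduced with a short standard-library sketch. One assumption to flag: the p-value uses the closed form $P(\chi^2_2 > x) = e^{-x/2}$, which is exact only for 2 degrees of freedom, so the code asserts that case rather than handling general tables.

```python
from math import exp

observed = [
    [100, 180, 40],   # Favored: 18-35, 35-65, 65 and older
    [100, 220, 160],  # Opposed
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)

# Closed form valid only for df = 2: P(chi-square > x) = exp(-x / 2)
assert df == 2
p_value = exp(-chi_sq / 2)

print(expected)
print(f"chi-square = {chi_sq:.3f}, df = {df}, p = {p_value:.2e}")
```

For tables with other dimensions, a chi-square survival function from a statistics library would replace the closed form.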
4c) We reject the null hypothesis. There is reason to believe that opinion about the tax
referendum depends upon age.
5) First compute the covariance.

First method:
X     Y     X-Xbar    Y-Ybar    product
6     78      1          6          6
1     42     -4        -30        120
8     96      3         24         72
                        Total:    198

$$\text{Covariance} = \frac{198}{2} = 99$$

Second method:
X     Y     X*Y
6     78    468
1     42     42
8     96    768
Total:
15    216   1278

$$\text{Covariance} = \frac{1278 - \frac{(15)(216)}{3}}{2} = \frac{198}{2} = 99$$
Then Correlation = Covariance/[ SD(x) * SD(y) ] = 99/(3.6056*27.4955) = .9986, where SD(x) = √(26/2) = 3.6056 and SD(y) = √(1512/2) = 27.4955.
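Both covariance methods and the correlation can be checked with a small sketch. It divides by $n - 1$ throughout, matching the sample formulas used above, and uses only the Python standard library.

```python
from math import sqrt

sleep = [6, 1, 8]     # Alphonso, Balph, Clorinda
score = [78, 42, 96]
n = len(sleep)

x_bar = sum(sleep) / n
y_bar = sum(score) / n

# First method: sum of products of deviations, divided by n - 1
cov_dev = sum((x - x_bar) * (y - y_bar) for x, y in zip(sleep, score)) / (n - 1)

# Second method: shortcut formula built from the sum of X*Y
sum_xy = sum(x * y for x, y in zip(sleep, score))
cov_short = (sum_xy - sum(sleep) * sum(score) / n) / (n - 1)

# Sample standard deviations (n - 1 in the denominator)
sd_x = sqrt(sum((x - x_bar) ** 2 for x in sleep) / (n - 1))
sd_y = sqrt(sum((y - y_bar) ** 2 for y in score) / (n - 1))

r = cov_dev / (sd_x * sd_y)
print(cov_dev, cov_short, round(sd_x, 4), round(sd_y, 4), round(r, 4))
```

The printed standard deviation of the scores is about 27.50, which is the value the correlation of .9986 requires.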
6a) $H_0$: return is not related to risk, $\rho = 0$ or $\beta = 0$
$H_A$: return is a function of risk, $\rho > 0$ or $\beta > 0$ (one-tailed)
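6b) Test on correlation:
$$t = \frac{\text{obs} - \text{exp}}{\text{sd}} = \frac{.721 - 0}{\sqrt{\dfrac{1 - .721^2}{8}}} = 2.94$$
or
Test on slope:
$$t = \frac{\text{obs} - \text{exp}}{\text{sd}} = \frac{4 - 0}{\sqrt{\dfrac{4.06}{(9)(.2444)}}} = 2.94$$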
This has a Student’s t distribution with 8 degrees of freedom. The p-value is between
.005 and .01.
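The summary numbers given in Question 6 (slope 4, intercept 2, $r = .721$, $s_e^2 = 4.06$) and both versions of the t statistic can be recomputed from the ten data pairs. This is a sketch, not a general routine: rather than computing an exact p-value, it brackets $t$ between one-tailed critical values for 8 df taken from a standard t table (2.896 at .01 and 3.355 at .005).

```python
from math import sqrt

risk = [0.4, 0.7, 1.0, 0.2, 1.3, 1.6, 0.9, 1.1, 1.0, 1.8]
ret = [2.6, 2.8, 3.5, 3.8, 5.2, 7.4, 7.6, 8.4, 8.5, 10.2]
n = len(risk)

x_bar = sum(risk) / n
y_bar = sum(ret) / n
sxx = sum((x - x_bar) ** 2 for x in risk)   # equals (n - 1) * s_x^2 = (9)(.2444)
syy = sum((y - y_bar) ** 2 for y in ret)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(risk, ret))

slope = sxy / sxx
intercept = y_bar - slope * x_bar
r = sxy / sqrt(sxx * syy)
se2 = (syy - slope * sxy) / (n - 2)         # residual SS over n - 2

# Test on correlation: t = r / sqrt((1 - r^2) / (n - 2))
t_corr = r / sqrt((1 - r ** 2) / (n - 2))
# Test on slope: t = b / sqrt(se^2 / sxx)
t_slope = slope / sqrt(se2 / sxx)

# One-tailed critical values of Student's t with 8 df (t table)
T_01, T_005 = 2.896, 3.355
print(round(slope, 3), round(intercept, 3), round(r, 3), round(se2, 2))
print(round(t_corr, 2), round(t_slope, 2), T_01 < t_corr < T_005)
```

The two t statistics agree exactly, which is why the solution can use either test.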
6c) We reject the null hypothesis. Return is a function of risk.
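6d) This is a confidence interval for a predicted individual.
We expect a return of Y = mX + b = (4)*(1.4)+2 = 7.6%
Our confidence interval is:
$$7.6 \pm t_{\alpha/2,\,8\,df}\sqrt{s_e^2\left(1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{(n-1)s_x^2}\right)} = 7.6 \pm 2.306\sqrt{4.06\left(1 + \frac{1}{10} + \frac{(1.4 - 1)^2}{(9)(.2444)}\right)}$$
$$= 7.6 \pm 5.03$$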
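The prediction interval in 6d can be sketched the same way. The multiplier 2.306 is the table value $t_{.025,\,8\,df}$ used in the solution; the constant name `T_025_8DF` is just an illustrative label.

```python
from math import sqrt

risk = [0.4, 0.7, 1.0, 0.2, 1.3, 1.6, 0.9, 1.1, 1.0, 1.8]
ret = [2.6, 2.8, 3.5, 3.8, 5.2, 7.4, 7.6, 8.4, 8.5, 10.2]
n = len(risk)

x_bar = sum(risk) / n
y_bar = sum(ret) / n
sxx = sum((x - x_bar) ** 2 for x in risk)   # (n - 1) * s_x^2
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(risk, ret)) / sxx
intercept = y_bar - slope * x_bar

# Error variance s_e^2: residual sum of squares over n - 2
se2 = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(risk, ret)) / (n - 2)

x_new = 1.4                      # WorldWide Widget's beta
y_hat = intercept + slope * x_new

T_025_8DF = 2.306                # t table value for alpha/2 = .025, 8 df
# Individual prediction: the leading "1 +" adds the scatter of a single stock
margin = T_025_8DF * sqrt(se2 * (1 + 1 / n + (x_new - x_bar) ** 2 / sxx))
print(f"{y_hat:.1f} +/- {margin:.2f}")
```

Dropping the leading `1 +` inside the square root would instead give the narrower interval for the mean return of all stocks with beta 1.4.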
7a) The first computed smoothed value, 18.4, is a weighted average of the current data value (22)
and the previous smoothed value (18). [NOTE: the same analysis could have been done on
any of the later smoothed values.]
SO:
$$\begin{aligned}
\alpha \cdot 22 + (1-\alpha) \cdot 18 &= 18.4 \\
22\alpha + 18 - 18\alpha &= 18.4 \\
4\alpha &= 18.4 - 18 = .4 \\
\alpha &= .1
\end{aligned}$$
7b) The exponentially smoothed values will be highly stable to changes in the data.
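A short sketch can confirm that $\alpha = .1$ regenerates every smoothed value in the table, seeding the recursion with the first observation as the table does. It also illustrates part (c) under an assumed convention: the forecast for each period is the previous period's smoothed value, and MSE is the plain average of the squared one-step errors. Course conventions for MSE can differ, so treat that number as illustrative. The function name `exponential_smooth` is hypothetical.

```python
def exponential_smooth(data, alpha):
    """Hypothetical helper: S_1 = Y_1, then S_t = alpha * Y_t + (1 - alpha) * S_{t-1}."""
    smoothed = [data[0]]
    for y in data[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

data = [18, 22, 28, 42]
reported = [18, 18.4, 19.36, 21.624]

# Solve alpha * Y_2 + (1 - alpha) * S_1 = S_2 for alpha
alpha = (reported[1] - reported[0]) / (data[1] - reported[0])
smoothed = exponential_smooth(data, alpha)

# Assumption: forecast for period t+1 is the smoothed value at period t,
# and MSE is the mean of the squared one-step forecast errors
errors = [y - s for y, s in zip(data[1:], smoothed[:-1])]
mse = sum(e ** 2 for e in errors) / len(errors)
print(round(alpha, 4), [round(s, 3) for s in smoothed], round(mse, 2))
```

A small $\alpha$ gives the previous smoothed value 90% of the weight, which is why the smoothed series barely reacts to the jump to 42 (the answer to 7b).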
8a) Intercept = 20.3. Folks who get 0 hours of sleep score 20.3, on average.
Slope = 10.06. Each additional hour of sleep increases the grade by 10.06 points, on average.
8b) The coefficient of determination ($r^2$) is the proportion (percentage) of the variation in Y that’s explained
by X.
Here, there’s 23825.8 total variation (total sum of squares) in Grade, of which 12339.5 is explainable by Sleep.
So $r^2$ = 12339.5/23825.8 = .518: about 51.8% of the variation in quiz grades is explained by hours of sleep.
(There are other ways to get this number, but this is by far the easiest.)
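The coffee-stained entries can be recovered from the legible ANOVA numbers. This is a sketch using standard identities for simple regression: Multiple R is $\sqrt{r^2}$ (the absolute correlation when there is one predictor), and adjusted $R^2 = 1 - (1 - r^2)(n-1)/(n-k-1)$ with $k = 1$ predictor. The last two lines cross-check against entries that are still readable.

```python
from math import sqrt

# Legible values from the printout
ss_regression = 12339.5
ss_residual = 11486.3
n = 20           # observations
k = 1            # one predictor (Sleep)

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
multiple_r = sqrt(r_squared)      # |correlation| for a single predictor
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Cross-checks against the readable Standard Error and F entries
ms_residual = ss_residual / (n - k - 1)
std_error = sqrt(ms_residual)
f_stat = (ss_regression / k) / ms_residual

print(round(r_squared, 3), round(multiple_r, 3), round(adj_r_squared, 3))
print(round(std_error, 2), round(f_stat, 2))
```

Small differences in the last decimal of F come from the printout's rounded sums of squares.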