DS350 – QUANTITATIVE METHODS FOR BUSINESS DECISIONS
SPRING SEMESTER 2004
“Knowledge Festival” #3 – Version 3
Answer the following questions in the space provided. SHOW YOUR WORK when
appropriate. Unless the problem indicates otherwise, use the traditional confidence level of 95%
and the traditional significance level of α = .05. Relative problem weights are given in brackets;
these total 100 points.
This “big quiz” is administered under the provisions of the Stetson University Honor System.
You are expected to act with academic integrity while taking this “quiz,” and to facilitate
integrity on the part of your fellow-students (keeping answers covered, not discussing questions
with later sections, etc.). The word “pledged” before your signature is a symbol of your ongoing
commitment to the Honor System.
ENJOY!!!!!
Question 1 [5 points]:
Prunella Mildmungle is investigating whether statistics majors get more sleep than
average. She computes a p-value of .000846. What conclusion should she draw?
____ Reject the null hypothesis.
____ Don’t reject the null hypothesis.
____ Accept the null hypothesis.
____ Reject the alternative hypothesis.
____ Don’t reject the alternative hypothesis.
____ Accept the alternative hypothesis.
Question 2 [5 points]:
Muford P. Frindlegast is investigating whether pepperoni pizza causes cancer. He has
rejected his null hypothesis. What conclusion should he draw?
____ There is enough reason to believe pepperoni pizza does not cause cancer.
____ There is not enough reason to believe pepperoni pizza causes cancer.
____ There is not enough reason to believe pepperoni pizza does not cause cancer.
____ There is enough reason to believe pepperoni pizza does cause cancer.
Question 3 [4 points]:
Hortensia Mae Prindlesnout is testing whether brain damage causes cell phone use. What
would a Type I error be?
Question 4 [6 points]:
Ludwig Merkwingle is fitting a regression model. He notes (from the Microsoft Excel
printout) that for his data, r² = 1. Which of the following statements will be true? (NOTE: There
may be more than one correct answer; check all that apply.)
____ The two sample means (for the “X” and “Y” variables) are equal.
____ The two population means (for the “X” and “Y” variables) are equal.
____ All the data are the same.
____ The error variance (s_e²) is zero.
____ As “X” increases, “Y” tends to increase.
____ All the data lie on a straight line.
Question 5 [20 points, divided as indicated]:
Clorinda Cragdingle owns a small portfolio of three stocks. Data on their returns (in %)
and risks (beta coefficients) are given below. Clorinda recalls from her finance class that “return
is a function of risk,” and that the regression line measuring this phenomenon is called the
security market line.
Company                  Return   Risk
WorldWide Widget          6%       .5
Amalgamated Fratostat    11%      1
Sirius Cybernetics       19%      1.5
a) [4] Compute the standard deviation for the “Risk” variable.
b) [1] Which is the “X” variable in the regression – RETURN or RISK?
c) [9] Compute the slope and intercept of the regression line for these data.
d) [6] Interpret the slope and intercept, in context of the problem.
Question 6 [6 points]:
Anastasia Romanova has data on daily returns of her portfolio, for Monday through
Thursday of this week. The data are
3    5    4    2
Do single exponential smoothing on these data, using α = .8.
Question 7 [22 points, divided as indicated]:
Balph Snerdwell frequently ignores Dr. Rasp’s excellent advice, and does not get a good
night’s sleep before attending a “knowledge festival.” Balph sets about to prove that he is right
and Dr. Rasp is wrong. He surveys a random sample of 42 fellow-students, and obtains data on
their grade point average and the number of times this semester that they have pulled an
all-nighter. He fits a regression model to the data. The Microsoft Excel output is given below.
Also given are the mean and the standard deviation of the two variables.
Regression Statistics
Multiple R           0.546
R Square             0.298
Adjusted R Square    0.281
Standard Error       0.997
Observations         42

               Mean    St. Dev.
AllNighters    4.71    3.24
GPA            1.92    1.18

ANOVA
              df    SS        MS        F         Significance F
Regression     1    16.920    16.920    17.011    0.000
Residual      40    39.788     0.995
Total         41    56.708

               Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept       2.854          0.274             10.417    0.000       2.300        3.408
AllNighters    -0.198          0.048             -4.124    0.000      -0.296       -0.101
a) [4] Balph has 10 all-nighters this semester. Predict his grade point average.
b) [10] Give an 80% confidence interval for your result in Part A.
c) [6] Do Balph’s results tend to support Balph’s or Dr. Rasp’s point of view? Explain. Are the
findings statistically significant?
d) [2] One number in the printout (.048) is shaded. Show how that number was computed.
Question 8 [4 points]:
Clyde Arthur Fazenbaker knows that the Stanford-Binet IQ test is calibrated to give a
mean score of 100. He wants to know whether being left-handed has any effect upon average
intelligence. He recruits 42 left-handed people for his study, and administers IQ tests to each of
them. He decides to test H0: Xbar = 100 vs. HA: Xbar ≠ 100. Are these hypotheses appropriate?
Explain.
Question 9 [4 points]:
When Dr. Rasp was teaching at the University of Alabama, he had 600 students in his
introductory statistics classes. Let’s suppose, just for fun, that one day Dr. Rasp brings a large
box to class, containing millions and millions of random numbers each written on a slip of paper.
(Unknown to the students in the class, the average of all the millions and millions of random
numbers in the box is 42.) Dr. Rasp asks each of the 600 students in the class to draw five pieces
of paper from the box, and to compute the traditional 95% confidence interval for the mean. Of
course, the 600 students will all have different confidence intervals. How many of those
confidence intervals will contain the number 42?
____ Probably about 600 of them.
____ Probably about 42 of them.
____ Probably about 570 of them.
____ Probably about 30 of them.
____ Probably about 300 of them.
____ Probably about 0 of them.
____ We can’t tell from the information given.
Question 10 [4 points]:
The Stetson University Biology Department has kept careful records on the number of
caterpillars observed on campus each year, ever since the university’s founding in 1873. Biology
major Zenobia Fritterling, as part of her senior research project, has done single exponential
smoothing on these data. She observes that the best fit for the data, in terms of predictive
accuracy, is to use a smoothing constant of α = .042. Which of the following may she best
conclude from her result?
____ Her results are statistically significant, since she obtained a number smaller than .05.
____ Her exponentially smoothed values will display a high degree of sensitivity.
____ The number of caterpillars this year doesn’t tell us very much about the number next year.
____ The mean square error of her forecasts will be fairly small.
Question 11 [4 points]:
Repeated studies have shown that physical fitness is bad for people. Gracetta
Squornshellous and Horatio Wajberlinski have collected a data set on the average amount of
time, per week, that people devote to physical activity, and the number of injuries they incur over
the course of the year. Gracetta computes the correlation for the data, and tests whether that
correlation is zero. Horatio computes a slope for the data, and tests whether that slope is zero.
How will their test statistics compare?
____ Gracetta will have a larger test statistic than Horatio.
____ Horatio will have a larger test statistic than Gracetta.
____ The two test statistics will be equal.
____ We can’t tell from the information given.
Question 12 [4 points]:
Ismerelda does a follow-up study to the one conducted by Gracetta and Horatio in the
previous question. She uses a smaller data set (only 35 people), but uses a more detailed survey
to obtain more accurate data on the amount of time that people are engaging in physical
activities. For her data, she computes a correlation of .4. She tests whether this correlation is
significantly different from 0, and obtains a test statistic of 2.5. Which of the following is the
best conclusion from her study?
____ The results are statistically significant. Fitness accounts for a large percentage
(around 95%) of people’s injuries.
____ It appears that people who exercise more do tend to have more injuries. However,
fitness accounts for a relatively small percentage (less than 20%) of injuries.
____ Since the p-value is fairly small (less than one percent, one-tailed test), there’s very
little reason to believe that fitness causes injuries.
____ The results are statistically insignificant. The number of injuries appears totally
unrelated to the amount of physical activity.
Question 13 [7 points, divided as indicated]:
Berengaria Naverre thinks that there might plausibly be a relationship, as Dr. Rasp
claims, between the amount of sleep a student gets and her/his score on a “big quiz.” She
decides to check this out. She surveys 42 of her fellow-students. She asks each student whether
they got “little,” “some,” or “a lot of” sleep before the last knowledge festival. She also asks
about their grade: “good” (A or B), “OK” (C), or “suboptimal” (D or F).
a) [5] Which is the best statistical procedure for Berengaria to use in analyzing the data?
____ one-sample test on means
____ paired data test on means
____ paired data test on proportions
____ chi-square test
____ one-sample test on proportions
____ independent samples test on means
____ independent samples test on proportions
____ t-test on correlation
b) [2] This is an example of which of the following?
____ controlled experiment
____ prospective observational study
____ retrospective observational study
Question 14 [5 points]:
Euterpe Waldfogel wants to know whether the perceived credibility (or lack thereof) of
the accounting firm doing the auditing for a major corporation has any real impact upon the
performance of the corporation’s stock. She identifies forty major corporations that were audited
by Arthur Andersen, during the scandals associated with that firm. For each of these forty
corporations she identified another company in the same industry, and of similar size, which was
audited by another Big Five accounting firm. She then obtains stock market data (annual returns,
in percentage) for each company in her study.
Which is the best statistical procedure for Euterpe to use in analyzing the data?
____ one-sample test on means
____ paired data test on means
____ paired data test on proportions
____ chi-square test
____ one-sample test on proportions
____ independent samples test on means
____ independent samples test on proportions
____ t-test on correlation
DS350 – SPRING 2004 – “BIG QUIZ” #3 – VERSION 3 - KEY
1) Reject the null hypothesis.
2) There is enough reason to believe pepperoni pizza does cause cancer.
3) We say that brain damage causes cell phone use, but in reality it does not.
4) The error variance (s_e²) is zero. AND All the data lie on a straight line.
5a) Variance = [ (.5-1)² + (1-1)² + (1.5-1)² ] / 2 = .25
OR Variance = [ {(.5²) + (1²) + (1.5²)} – (1/3)*(.5+1+1.5)² ] / 2 = .25
So the standard deviation is the square root of .25, or .5.
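A quick numerical check of 5a, sketched in Python. The variable names are ours, not part of the exam; `statistics.variance` and `statistics.stdev` use the same n-1 divisor as the hand calculation.

```python
import statistics

# Beta coefficients ("Risk") for the three stocks in Question 5.
risk = [0.5, 1.0, 1.5]

# Sample variance and standard deviation (divisor n - 1 = 2).
variance = statistics.variance(risk)
std_dev = statistics.stdev(risk)

print(variance, std_dev)
```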
5b) The “X” variable is RISK.
5c) Compute the covariance by one of the following methods:

X      Y      X-Xbar   Y-Ybar   product
.5     6      -.5      -6       3
1      11     0        -1       0
1.5    19     .5       7        3.5
TOTALS:                         6.5
Covar = 6.5/2 = 3.25

X      Y      XY
.5     6      3
1      11     11
1.5    19     28.5
TOTALS:
3      36     42.5
OR Covar = [42.5 – (1/3)*3*36]/2 = 3.25
Slope = Covar/Var(X) = 3.25/.25 = 13
To get the intercept, plug the sample means into Y=mX+b: 12=13*1+b, or intercept = -1
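The same slope and intercept can be reproduced with a short Python sketch. This is only a check on the hand calculation; the variable names are illustrative and not part of the exam.

```python
# Data from Question 5: X = risk (beta), Y = return in percent.
x = [0.5, 1.0, 1.5]
y = [6.0, 11.0, 19.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance and sample variance of X, both with divisor n - 1.
covar = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

slope = covar / var_x
intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)

print(slope, intercept)
```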
5d) Each additional point of risk (beta) increases return by 13 percentage points, on average.
When there is 0 risk, the average return is –1%.
6)
Data   Smoothed value
3      3
5      .8*5 + .2*3 = 4.6
4      .8*4 + .2*4.6 = 4.12
2      .8*2 + .2*4.12 = 2.42
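The recursion used above can be sketched in Python. It assumes, as the key does, that the first smoothed value is simply the first observation; the function name `exponential_smoothing` is ours.

```python
def exponential_smoothing(data, alpha):
    """Single exponential smoothing, starting from the first observation."""
    smoothed = [data[0]]
    for value in data[1:]:
        # New smoothed value = alpha * new data + (1 - alpha) * previous smoothed value.
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


returns = [3, 5, 4, 2]  # Monday through Thursday
smoothed = exponential_smoothing(returns, alpha=0.8)
print([round(s, 3) for s in smoothed])
```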
7a) Y = -.198*10 + 2.854 = .874

7b) $.874 \pm t_{\alpha,\,40\,\mathrm{df}} \sqrt{.995\left[1 + \dfrac{1}{42} + \dfrac{(10 - 4.71)^2}{(41)(3.24^2)}\right]}$, where $t_{\alpha,\,40\,\mathrm{df}} = 1.303$
7c) The results tend to support Dr. Rasp’s findings, because the slope is negative, indicating that
the more all-nighters a student pulls, the lower the g.p.a. tends to be. The result is
statistically significant, because the p-value for the slope is (approx.) 0.
7d) This is the standard deviation of the slope – “Door #2”: $\sqrt{\dfrac{.995}{(41)(3.24^2)}} \approx .048$
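As a check on 7d, the computation can be sketched in Python using the residual mean square and the standard deviation of AllNighters from the printout (variable names are ours).

```python
import math

mse = 0.995   # residual mean square from the ANOVA table
n = 42
sd_x = 3.24   # standard deviation of AllNighters

# Standard error of the slope: sqrt(MSE / sum of squared deviations of X).
se_slope = math.sqrt(mse / ((n - 1) * sd_x ** 2))
print(round(se_slope, 3))
```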
8) No. We should test the population mean (μ), not the sample mean.
9) Probably about 570 of them. [A 95% confidence interval means the interval contains the
correct value 95% of the time.]
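The "about 570 of 600" claim can be illustrated with a small simulation. This is a sketch under assumptions the exam does not state: the numbers in the box are taken to be normally distributed with standard deviation 10, and the seed is arbitrary. The count will vary from run to run but should land near 570.

```python
import math
import random
import statistics

random.seed(2004)  # arbitrary fixed seed so the run is repeatable

TRUE_MEAN = 42
T_CRIT = 2.776      # t critical value for 95% confidence with 4 df
N_STUDENTS = 600
SAMPLE_SIZE = 5

covered = 0
for _ in range(N_STUDENTS):
    # Assumption: slips in the box are normal with mean 42 and sd 10.
    sample = [random.gauss(TRUE_MEAN, 10) for _ in range(SAMPLE_SIZE)]
    mean = statistics.mean(sample)
    margin = T_CRIT * statistics.stdev(sample) / math.sqrt(SAMPLE_SIZE)
    if mean - margin <= TRUE_MEAN <= mean + margin:
        covered += 1

print(covered, "of", N_STUDENTS, "intervals contain", TRUE_MEAN)
```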
10) The number of caterpillars this year doesn’t tell us very much about the number next year.
11) The two test statistics will be equal. [They’re both testing “no relationship.”]
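To see the equality in 11 concretely, here is a Python sketch that computes both t statistics on one small data set. The three-stock data from Question 5 are borrowed purely as an illustration; any paired data would do.

```python
import math

# Illustration only: the three-stock data from Question 5.
x = [0.5, 1.0, 1.5]
y = [6.0, 11.0, 19.0]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Gracetta's approach: t statistic for testing correlation = 0.
r = sxy / math.sqrt(sxx * syy)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Horatio's approach: t statistic for testing slope = 0.
slope = sxy / sxx
mse = (syy - slope * sxy) / (n - 2)  # residual mean square
t_slope = slope / math.sqrt(mse / sxx)

print(t_corr, t_slope)
```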
12) It appears that people who exercise more do tend to have more injuries … . [The result was
significant, so more exercise does mean more injuries. However, correlation = .4, so r-square
is .16 – exercise explains only 16% of injuries.]
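The two numbers in 12 can be checked with a short Python sketch, using the usual t statistic for a correlation with n-2 degrees of freedom (variable names are ours).

```python
import math

r = 0.4   # Ismerelda's sample correlation
n = 35    # sample size

r_squared = r ** 2                                        # share of variation explained
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)  # test of correlation = 0

print(round(r_squared, 2), round(t_stat, 2))
```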
13) Chi-square test, retrospective observational study.
14) Paired data test on means.