DS350 – QUANTITATIVE METHODS FOR BUSINESS DECISIONS
SPRING SEMESTER 2004
“Knowledge Festival” #3 – Version 3

Answer the following questions in the space provided. SHOW YOUR WORK when appropriate. Unless the problem indicates otherwise, use the traditional confidence level of 95% and the traditional significance level of α = .05. Relative problem weights are given in brackets; these total 100 points.

This “big quiz” is administered under the provisions of the Stetson University Honor System. You are expected to act with academic integrity while taking this “quiz,” and to facilitate integrity on the part of your fellow-students (keeping answers covered, not discussing questions with later sections, etc.). The word “pledged” before your signature is a symbol of your ongoing commitment to the Honor System.

ENJOY!!!!!

Question 1 [5 points]: Prunella Mildmungle is investigating whether statistics majors get more sleep than average. She computes a p-value of .000846. What conclusion should she draw?
____ Reject the null hypothesis.
____ Don’t reject the null hypothesis.
____ Accept the null hypothesis.
____ Reject the alternative hypothesis.
____ Don’t reject the alternative hypothesis.
____ Accept the alternative hypothesis.

Question 2 [5 points]: Muford P. Frindlegast is investigating whether pepperoni pizza causes cancer. He has rejected his null hypothesis. What conclusion should he draw?
____ There is enough reason to believe pepperoni pizza does not cause cancer.
____ There is not enough reason to believe pepperoni pizza causes cancer.
____ There is not enough reason to believe pepperoni pizza does not cause cancer.
____ There is enough reason to believe pepperoni pizza does cause cancer.

Question 3 [4 points]: Hortensia Mae Prindlesnout is testing whether brain damage causes cell phone use. What would a Type I error be?

Question 4 [6 points]: Ludwig Merkwingle is fitting a regression model. He notes (from the Microsoft Excel printout) that for his data, r² = 1.
Which of the following statements will be true? (NOTE: There may be more than one correct answer; check all that apply.)
____ The two sample means (for the “X” and “Y” variables) are equal.
____ The two population means (for the “X” and “Y” variables) are equal.
____ All the data are the same.
____ The error variance (se²) is zero.
____ As “X” increases, “Y” tends to increase.
____ All the data lie on a straight line.

Question 5 [20 points, divided as indicated]: Clorinda Cragdingle owns a small portfolio of three stocks. Data on their returns (in %) and risks (beta coefficients) are given below. Clorinda recalls from her finance class that “return is a function of risk,” and that the regression line measuring this phenomenon is called the security market line.

Company                  Return   Risk
WorldWide Widget         6%       .5
Amalgamated Fratostat    11%      1
Sirius Cybernetics       19%      1.5

a) [4] Compute the standard deviation for the “Risk” variable.
b) [1] Which is the “X” variable in the regression – RETURN or RISK?
c) [9] Compute the slope and intercept of the regression line for these data.
d) [6] Interpret the slope and intercept, in context of the problem.

Question 6 [6 points]: Anastasia Romanova has data on daily returns of her portfolio, for Monday through Thursday of this week. The data are
3   5   4   2
Do single exponential smoothing on these data, using α = .8.

Question 7 [22 points, divided as indicated]: Balph Snerdwell frequently ignores Dr. Rasp’s excellent advice, and does not get a good night’s sleep before attending a “knowledge festival.” Balph sets about to prove that he is right and Dr. Rasp is wrong. He surveys a random sample of 42 fellow-students, and obtains data on their grade point average and the number of times this semester which they have pulled an all-nighter. He fits a regression model to the data. The Microsoft Excel output is given below. Also given are the mean and the standard deviation of the two variables.
Regression Statistics
Multiple R           0.546
R Square             0.298
Adjusted R Square    0.281
Standard Error       0.997
Observations         42

             AllNighters   GPA
Mean         4.71          1.92
St. Dev.     3.24          1.18

ANOVA
             df    SS       MS       F        Significance F
Regression   1     16.920   16.920   17.011   0.000
Residual     40    39.788   0.995
Total        41    56.708

              Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept     2.854          0.274            10.417   0.000     2.300       3.408
AllNighters   -0.198         0.048            -4.124   0.000     -0.296      -0.101

a) [4] Balph has 10 all-nighters this semester. Predict his grade point average.
b) [10] Give an 80% confidence interval for your result in Part A.
c) [6] Do Balph’s results tend to support Balph’s or Dr. Rasp’s point of view? Explain. Are the findings statistically significant?
d) [2] One number in the printout (.048) is shaded. Show how that number was computed.

Question 8 [4 points]: Clyde Arthur Fazenbaker knows that the Stanford-Binet IQ test is calibrated to give a mean score of 100. He wants to know whether being left-handed has any effect upon average intelligence. He recruits 42 left-handed people for his study, and administers IQ tests to each of them. He decides to test H0: Xbar = 100 vs. HA: Xbar ≠ 100. Are these hypotheses appropriate? Explain.

Question 9 [4 points]: When Dr. Rasp was teaching at the University of Alabama, he had 600 students in his introductory statistics classes. Let’s suppose, just for fun, that one day Dr. Rasp brings a large box to class, containing millions and millions of random numbers each written on a slip of paper. (Unknown to the students in the class, the average of all the millions and millions of random numbers in the box is 42.) Dr. Rasp asks each of the 600 students in the class to draw five pieces of paper from the box, and to compute the traditional 95% confidence interval for the mean. Of course, the 600 students will all have different confidence intervals. How many of those confidence intervals will contain the number 42?
____ Probably about 600 of them.
____ Probably about 42 of them.
____ Probably about 570 of them.
____ Probably about 30 of them.
____ Probably about 300 of them.
____ Probably about 0 of them.
____ We can’t tell from the information given.

Question 10 [4 points]: The Stetson University Biology Department has kept careful records on the number of caterpillars observed on campus each year, ever since the university’s founding in 1873. Biology major Zenobia Fritterling, as part of her senior research project, has done single exponential smoothing on these data. She observes that the best fit for the data, in terms of predictive accuracy, is to use a smoothing constant of α = .042. Which of the following may she best conclude from her result?
____ Her results are statistically significant, since she obtained a number smaller than .05.
____ Her exponentially smoothed values will display a high degree of sensitivity.
____ The number of caterpillars this year doesn’t tell us very much about the number next year.
____ The mean square error of her forecasts will be fairly small.

Question 11 [4 points]: Repeated studies have shown that physical fitness is bad for people. Gracetta Squornshellous and Horatio Wajberlinski have collected a data set on the average amount of time, per week, that people devote to physical activity, and the number of injuries they incur over the course of the year. Gracetta computes the correlation for the data, and tests whether that correlation is zero. Horatio computes a slope for the data, and tests whether that slope is zero. How will their test statistics compare?
____ Gracetta will have a larger test statistic than Horatio.
____ Horatio will have a larger test statistic than Gracetta.
____ The two test statistics will be equal.
____ We can’t tell from the information given.

Question 12 [4 points]: Ismerelda does a follow-up study to the one conducted by Gracetta and Horatio in the previous question.
She uses a smaller data set (only 35 people), but uses a more detailed survey to obtain more accurate data on the amount of time that people are engaging in physical activities. For her data, she computes a correlation of .4. She tests whether this correlation is significantly different from 0, and obtains a test statistic of 2.5. Which of the following is the best conclusion from her study?
____ The results are statistically significant. Fitness accounts for a large percentage (around 95%) of people’s injuries.
____ It appears that people who exercise more do tend to have more injuries. However, fitness accounts for a relatively small percentage (less than 20%) of injuries.
____ Since the p-value is fairly small (less than one percent, one-tailed test), there’s very little reason to believe that fitness causes injuries.
____ The results are statistically insignificant. The number of injuries appears totally unrelated to the amount of physical activity.

Question 13 [7 points, divided as indicated]: Berengaria Naverre thinks that there might plausibly be a relationship, as Dr. Rasp claims, between the amount of sleep a student gets and her/his score on a “big quiz.” She decides to check this out. She surveys 42 of her fellow-students. She asks each student whether they got “little,” “some,” or “a lot of” sleep before the last knowledge festival. She also asks about their grade: “good” (A or B), “OK” (C), or “suboptimal” (D or F).

a) [5] Which is the best statistical procedure for Berengaria to use in analyzing the data?
____ one-sample test on means
____ paired data test on means
____ paired data test on proportions
____ chi-square test
____ one-sample test on proportions
____ independent samples test on means
____ independent samples test on proportions
____ t-test on correlation

b) [2] This is an example of which of the following?
____ controlled experiment
____ prospective observational study
____ retrospective observational study

Question 14 [5 points]: Euterpe Waldfogel wants to know whether the perceived credibility (or lack thereof) of the accounting firm doing the auditing for a major corporation has any real impact upon the performance of the corporation’s stock. She identifies forty major corporations that were audited by Arthur Andersen, during the scandals associated with that firm. For each of these forty corporations she identifies another company in the same industry, and of similar size, which was audited by another Big Five accounting firm. She then obtains stock market data (annual returns, in percentage) for each company in her study. Which is the best statistical procedure for Euterpe to use in analyzing the data?
____ one-sample test on means
____ paired data test on means
____ paired data test on proportions
____ chi-square test
____ one-sample test on proportions
____ independent samples test on means
____ independent samples test on proportions
____ t-test on correlation


DS350 – SPRING 2004 – “BIG QUIZ” #3 – VERSION 3 - KEY

1) Reject the null hypothesis.
2) There is enough reason to believe pepperoni pizza does cause cancer.
3) We say that brain damage causes cell phone use, but in reality it does not.
4) The error variance (se²) is zero. AND All the data lie on a straight line.
5a) Variance = [ (.5-1)² + (1-1)² + (1.5-1)² ] / 2 = .25
    OR Variance = [ {(.5²) + (1²) + (1.5²)} – (1/3)*(.5+1+1.5)² ] / 2 = .25
    So the standard deviation is the square root of .25, or .5.
5b) The “X” variable is RISK.
5c) Compute the covariance by one of the following methods:

    X      Y      X-Xbar   Y-Ybar   product
    .5     6      -.5      -6       3
    1      11     0        -1       0
    1.5    19     .5       7        3.5
    TOTALS:
    3      36                       6.5

    X      Y      XY
    .5     6      3
    1      11     11
    1.5    19     28.5
    TOTALS:       42.5

    Covar = 6.5/2 = 3.25 OR Covar = [42.5 – (1/3)*3*36]/2 = 3.25
    Slope = Covar/Var(X) = 3.25/.25 = 13
    To get the intercept, plug the sample means into Y = mX + b: 12 = 13*1 + b, or intercept = -1
5d) Each additional point of risk (beta) increases return by 13%, on average. When there is 0 risk, the average return is –1%.
6)  Data   Smoothed
    3      3
    5      .8*5 + .2*3 = 4.6
    4      .8*4 + .2*4.6 = 4.12
    2      .8*2 + .2*4.12 = 2.42
7a) Y = -.198*10 + 2.854 = .874
7b) .874 ± t * sqrt( .995 * [ 1 + 1/42 + (10 – 4.71)² / ( (41)(3.24²) ) ] ), where t = 1.303 (80% confidence, 40 df)
7c) The results tend to support Dr. Rasp’s findings, because the slope is negative, indicating that the more all-nighters a student pulls, the lower the g.p.a. tends to be. The result is statistically significant, because the p-value for the slope is (approx.) 0.
7d) This is the standard deviation of the slope – “Door #2”: sqrt( .995 / ( (41)(3.24²) ) ) = .048
8) No. We should test the population mean (μ), not the sample mean.
9) Probably about 570 of them. [A 95% confidence interval means the interval contains the correct value 95% of the time.]
10) The number of caterpillars this year doesn’t tell us very much about the number next year.
11) The two test statistics will be equal. [They’re both testing “no relationship.”]
12) It appears that people who exercise more do tend to have more injuries … . [The result was significant, so more exercise does mean more injuries. However, correlation = .4, so r-square is .16 – exercise explains only 16% of injuries.]
13) Chi-square test, retrospective observational study.
14) Paired data test on means.
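
CHECKING THE ARITHMETIC: The hand computations in 5a, 5c, 6, 7a, 7b and 7d can be reproduced with a short script. The following is only a sketch, not part of the original key. It assumes that the first smoothed value in Question 6 is set equal to the first observation (as the key does), and that 7b is the interval for an individual prediction (the formula with the leading "1 +"). The function and variable names are made up for illustration.

```python
import math

# Question 5: security market line for the three stocks in the problem.
risk = [0.5, 1.0, 1.5]     # X variable (beta)
ret = [6.0, 11.0, 19.0]    # Y variable (return, in %)


def mean(values):
    return sum(values) / len(values)


def sample_cov(xs, ys):
    """Sample covariance with the n - 1 divisor used in the key."""
    xbar, ybar = mean(xs), mean(ys)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (len(xs) - 1)


var_x = sample_cov(risk, risk)          # variance of Risk (5a)
sd_x = math.sqrt(var_x)
slope = sample_cov(risk, ret) / var_x   # slope = Covar / Var(X) (5c)
intercept = mean(ret) - slope * mean(risk)


# Question 6: single exponential smoothing.
# Assumption: the first smoothed value equals the first observation.
def exp_smooth(data, alpha):
    smoothed = [data[0]]
    for y in data[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


smoothed = exp_smooth([3, 5, 4, 2], alpha=0.8)

# Question 7: numbers copied from the Excel printout in the problem.
b0, b1 = 2.854, -0.198     # intercept and slope
mse, n = 0.995, 42         # residual mean square and sample size
xbar, sx = 4.71, 3.24      # mean and st. dev. of AllNighters
t_crit = 1.303             # t value for 80% confidence, 40 df (from the key)

x0 = 10
y_hat = b0 + b1 * x0                     # 7a: predicted GPA
sxx = (n - 1) * sx ** 2                  # sum of squared deviations of X
half_width = t_crit * math.sqrt(mse * (1 + 1 / n + (x0 - xbar) ** 2 / sxx))  # 7b
se_slope = math.sqrt(mse / sxx)          # 7d: standard error of the slope

print(f"5a) st. dev. of Risk = {sd_x:.2f}")
print(f"5c) slope = {slope:.2f}, intercept = {intercept:.2f}")
print("6)  smoothed =", [round(s, 3) for s in smoothed])
print(f"7a) predicted GPA = {y_hat:.3f}")
print(f"7b) interval = {y_hat - half_width:.3f} to {y_hat + half_width:.3f}")
print(f"7d) standard error of slope = {se_slope:.3f}")
```

Under these assumptions the half-width in 7b works out to roughly 1.36, so the lower end of the interval falls below zero. That is a reminder that 10 all-nighters is well beyond the sample mean of 4.71, where the fitted line is being stretched.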