DS350 – QUANTITATIVE METHODS FOR BUSINESS DECISIONS
FALL SEMESTER 2003
“Big Quiz” #3

Answer the following questions in the space provided. SHOW YOUR WORK when appropriate. Unless the problem states otherwise, use the traditional confidence level of 95% and the traditional significance level of $\alpha = .05$. Relative problem weights are given in brackets; these total 100 points. Writing “Pledged” before your signature is a symbol of your ongoing commitment to the Honor System. Enjoy!!

Question 1 [4 points]: Dietrich Buxtehude is conducting a hypothesis test to investigate whether there is a connection between statistics aptitude and sleep. He computes a positive correlation, and obtains a p-value of .000042. What conclusion should he draw?

_____ There is enough evidence to believe that statistics aptitude and sleep are related.
_____ There is not enough evidence to believe that statistics aptitude and sleep are related.
_____ There is enough evidence to believe that statistics aptitude and sleep are not related.
_____ There is not enough evidence to believe that statistics aptitude and sleep are not related.

Question 2 [2 points]: Is the result in Question 1 statistically significant? Explain.

Question 3 [12 points, divided as indicated]: The Mall-Mart Corporation wants to open a new SuperStore in the thriving metropolis of Bean Blossom, Indiana. This has created some controversy in the town. A random sample of 400 adult town residents was surveyed on their opinions; 240 of these supported opening of the SuperStore.
a) [8] Give a confidence interval for the percentage of all adult town residents who favor opening of the SuperStore.
b) [4] Based upon your confidence interval, may we conclude (with 95% confidence) that a majority of the town favors opening of the SuperStore? Explain.

Question 4 [18 points, divided as indicated]: Voters in Orange County recently defeated a referendum to increase taxes to fund improvements in the county school system.
Analysis of the election data revealed some patterns in the voting. For example, a poll taken a few days before the election asked for respondents’ attitudes toward the referendum. Age of the respondent was included as a demographic question on the survey. A cross-tabulation of responses on these questions is given below.

Age of respondent:    18-35    35-65    65 and older
Favored                100      180          40
Opposed                100      220         160

a) [4] State the appropriate null and alternative hypothesis (in words and in symbols) for investigating whether a respondent’s age is related to her/his opinion regarding the tax referendum.
b) [8] Compute the test statistic for testing your hypotheses from Part A. State the distribution and (if appropriate) the degrees of freedom. Give the p-value of the test.
c) [6] Draw an appropriate conclusion, both in statistics jargon (reject/don’t reject) and in the language of the problem.

Question 5 [10 points]: Dr. Rasp has data on the number of hours of sleep, and the “big quiz” score, for three students in statistics class. Alphonso got 6 hours of sleep, and got a 78 on the “quiz.” Balph got only one hour of sleep and had a 42 on the “quiz,” while Clorinda got a 96 after 8 hours of sleep. Find the (sample) correlation between sleep and quiz score.

Question 6 [30 points, divided as indicated]: Recall that the security market line is the regression line relating the return and risk (as measured by the beta coefficient) of stocks. Data for a random sample of ten stocks on the New York Stock Exchange are given below.

Risk (X)    Return (Y)
   .4          2.6
   .7          2.8
  1            3.5
   .2          3.8
  1.3          5.2
  1.6          7.4
   .9          7.6
  1.1          8.4
  1            8.5
  1.8         10.2

Note that, for these data, the (sample) correlation is .721. The slope is 4 and the intercept is 2. The error variance ($s_e^2$) is 4.06.
a) [6] State (in words and in symbols) the appropriate null and alternative hypotheses for investigating whether in fact “return is a function of risk.”
b) [8] Compute the test statistic for testing your hypotheses from Part A.
State the distribution and (if appropriate) the degrees of freedom. Give the p-value of the test.
c) [6] Draw an appropriate conclusion, both in statistics jargon (reject/don’t reject) and in the language of the problem.
d) [10] Predict the return for WorldWide Widget, which has a beta of 1.4. Give a 95% confidence interval for this value.

Question 7 [14 points, divided as indicated]: Gracetta Squornshellous has done (single) exponential smoothing on a set of data. Results are as follows:

Data                  18    22      28      42
Exponential Smooth    18    18.4    19.36   21.624

a) [6] What smoothing constant ($\alpha$) was used to do the exponential smoothing?
b) [4] What does this smoothing constant ($\alpha$) indicate?

______ The exponentially smoothed values will be highly sensitive to changes in the data.
______ The exponentially smoothed values will be highly stable to changes in the data.
______ The exponentially smoothed values will reject the null hypothesis.
______ The exponentially smoothed values will not reject the null hypothesis.

c) [4] Gracetta uses the exponentially smoothed values to forecast the following data point. What is the mean square error (MSE) of the forecasts?

Question 8 [10 points]: Horatio Wajberlinski has computed a regression model relating a student’s grade on the “big quiz” in statistics class to the number of hours of sleep s/he got the night before the “quiz.” Excel output showing results from his analysis is given on the following page.
a) [6] Give the slope and intercept for the model. Interpret these numbers.
b) [4] Horatio spilled coffee on part of the printout, and he can no longer read the top three numbers. But they can still be computed from information given elsewhere on the printout. Give the coefficient of determination ($r^2$) for these data. Interpret this number.

REGRESSION OUTPUT for Question #8

Regression Statistics
Multiple R           ??
R Square             ??
Adjusted R Square    ----
Standard Error       25.26
Observations         20

ANOVA
              df    SS         MS         F          Significance F
Regression     1    12339.5    12339.5    19.3369    0.000347
Residual      18    11486.3      638.1
Total         19    23825.8

              Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept        20.30            8.89         2.28      0.03       1.61       38.99
Sleep            10.06            2.29         4.40      0.00       5.25       14.87

DS350 – FALL 2003 – “BIG QUIZ” #3 - SOLUTIONS

1) There is enough evidence to believe that statistics aptitude and sleep are related.

2) Yes. “Significant” means “we rejected the null hypothesis.”

3a) Confidence interval on a proportion: [best guess] ± [ # ]*[st dev]
We observed 240/400 = .60, and $z = 1.96$ for 95% confidence:

$$.6 \pm 1.96 \sqrt{\frac{(.6)(.4)}{400}} \rightarrow .6 \pm .048$$

3b) YES, we’re pretty sure a majority of the town favors opening the SuperStore. We don’t know what the population proportion is exactly, but we’re pretty sure it’s between 55.2% and 64.8%, a majority under any circumstance.

4a) H0: age and opinion are not related, $\pi_{18\text{-}35} = \pi_{35\text{-}65} = \pi_{65+}$
HA: age and opinion are related; at least one of the $\pi$’s is not equal to the others.

4b) Since it involves multiple proportions, it’s a chi-square test. The expected cell frequencies are:

Age:        18-35    35-65    65+
Favored      80*      160      80
Opposed     120       240     120

*EXAMPLE: [row total]*[col total]/[total total] = 200*320/800 = 80

The chi-square statistic is

$$\chi^2 = \sum \frac{(obs - exp)^2}{exp}$$

Here this is

$$\frac{(100-80)^2}{80} + \frac{(100-120)^2}{120} + \frac{(180-160)^2}{160} + \frac{(220-240)^2}{240} + \frac{(40-80)^2}{80} + \frac{(160-120)^2}{120}$$

= 5 + 3.333 + 2.5 + 1.667 + 20 + 13.333 = 45.833

This is a chi-square statistic with 2 degrees of freedom. The p-value is <.005.

4c) We reject the null hypothesis. There is reason to believe that opinion about the tax referendum depends upon age.

5) First compute the covariance.
First method:

          X     Y    X-Xbar   Y-Ybar   product
          6    78       1        6         6
          1    42      -4      -30       120
          8    96       3       24        72
Total:                                   198

$$\text{Covariance} = \frac{198}{2} = 99$$

Second method:

          X     Y     X*Y
          6    78     468
          1    42      42
          8    96     768
Total:   15   216    1278

$$\text{Covariance} = \frac{1278 - \frac{1}{3}(15)(216)}{2} = \frac{198}{2} = 99$$

Then Correlation = Covariance/[ SD(x) * SD(y) ] = 99/(3.605*27.495) = .9986

6a) H0: return is not related to risk, $\rho = 0$ or $\beta = 0$
HA: return is a function of risk, $\rho > 0$ or $\beta > 0$ (one-tailed)

6b) Test on correlation:

$$t = \frac{obs - exp}{sd} = \frac{.721 - 0}{\sqrt{\dfrac{1 - .721^2}{8}}} = 2.94$$

or Test on slope:

$$t = \frac{obs - exp}{sd} = \frac{4 - 0}{\sqrt{\dfrac{4.06}{(9)(.2444)}}} = 2.94$$

This has a Student’s t distribution with 8 degrees of freedom. The p-value is between .005 and .01.

6c) We reject the null hypothesis. Return is a function of risk.

6d) This is a confidence interval for a predicted individual. We expect a return of Y = mX + b = (4)*(1.4) + 2 = 7.6%
Our confidence interval, with $t_{\alpha/2,\,8\,df} = 2.306$, is:

$$7.6 \pm 2.306 \sqrt{4.06 \left( 1 + \frac{1}{10} + \frac{(1.4 - 1)^2}{(9)(.2444)} \right)} \rightarrow 7.6 \pm 5.03$$

7a) The first exponential smooth value of 18.4 is a weighted average of the current data (22) and the previous smoothed value (18). [NOTE: the same analysis could have been done on any of the smoothed values.] SO:

$$\alpha \cdot 22 + (1 - \alpha) \cdot 18 = 18.4$$
$$22\alpha + 18 - 18\alpha = 18.4$$
$$4\alpha = 18.4 - 18 = .4$$
$$\alpha = .1$$

7b) The exponentially smoothed values will be highly stable to changes in the data.

8a) Intercept = 20.3. Folk who get 0 sleep score 20.3, on average.
Slope = 10.06. Each additional hour of sleep increases the grade 10.06 points, on average.

8b) The coefficient of determination ($r^2$) is the percentage of the variation in Y that’s explained by X. Here, there’s 23825.8 total variation in Grade, of which 12339.5 is explainable by Sleep. So $r^2$ = 12339.5/23825.8 = .518
(There are other ways to get this number, but this is by far the easiest.)
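
The arithmetic in Solutions 3a, 4b, 5, 6b, 6d, and 7a can be checked numerically. What follows is a minimal sketch using only the Python standard library. The function names are illustrative assumptions and are not part of the original quiz, and the critical values 1.96 and 2.306 are taken from the solutions as given (read from tables) rather than computed.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Large-sample interval for a proportion: p_hat +/- z*sqrt(p_hat*(1-p_hat)/n)."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def chi_square_independence(observed):
    """Chi-square statistic and degrees of freedom for a table of counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count = [row total]*[col total]/[total total]
            exp = row_totals[i] * col_totals[j] / total
            stat += (obs - exp) ** 2 / exp
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


def sample_correlation(xs, ys):
    """Sample correlation = covariance / (SD(x) * SD(y)), all with n-1 divisors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)


def slope_t_statistic(slope, error_variance, sxx):
    """t = (slope - 0) / sqrt(error_variance / Sxx), where Sxx = (n-1)*Var(x)."""
    return slope / math.sqrt(error_variance / sxx)


def prediction_margin(t_crit, error_variance, n, x0, x_bar, sxx):
    """Half-width of the interval for a predicted individual at x0."""
    return t_crit * math.sqrt(
        error_variance * (1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    )


def smoothing_constant(data_value, prev_smooth, new_smooth):
    """Solve alpha*data + (1 - alpha)*prev_smooth = new_smooth for alpha."""
    return (new_smooth - prev_smooth) / (data_value - prev_smooth)


if __name__ == "__main__":
    sxx = 9 * 0.2444  # (n-1) * Var(x), as written in Solution 6b
    print("3a:", proportion_ci(240, 400))
    print("4b:", chi_square_independence([[100, 180, 40], [100, 220, 160]]))
    print("5: ", sample_correlation([6, 1, 8], [78, 42, 96]))
    print("6b:", slope_t_statistic(4, 4.06, sxx))
    print("6d:", prediction_margin(2.306, 4.06, 10, 1.4, 1.0, sxx))
    print("7a:", smoothing_constant(22, 18, 18.4))
```

Each function mirrors one hand calculation, so a mismatch between the printed value and the key points to the specific step that needs rechecking.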