DS350 – QUANTITATIVE METHODS FOR BUSINESS DECISIONS
FALL SEMESTER 2003
“Ultimate Knowledge Festival” (known to the Mundane as the “Final Exam”)

Answer the following questions in the space provided. SHOW YOUR WORK when appropriate. Relative problem weights are given in brackets; these total 100 points. Unless the problem states otherwise, use the traditional confidence level of 95% and the traditional significance level of α = .05.

The word “pledged” in front of your signature on this exam is a sign of your ongoing commitment to academic integrity and Stetson University’s Honor System.

ENJOY!!

Question 1 [2 points]: Alphonso Ferrabosco II is conducting a hypothesis test to determine whether people can distinguish coffee from used motor oil in a blind taste test. He obtains a p-value of .42. What statistical conclusion should he draw?
_____ reject the null hypothesis
_____ reject the alternative hypothesis
_____ not reject the null hypothesis
_____ not reject the alternative hypothesis

Question 2 [2 points]: Continuing the example from Question 1, how should Alphonso phrase his conclusion, in the context of the problem?
_____ There is enough evidence to believe people can tell coffee from used motor oil.
_____ There is not enough evidence to believe people can tell coffee from used motor oil.
_____ There is enough evidence to believe people cannot tell coffee from used motor oil.
_____ There is not enough evidence to believe people cannot tell coffee from used motor oil.

Question 3 [2 points]: Euterpe Waldfogel believes her pet wombat, Muffy, can successfully pick stocks. (Muffy can’t throw darts. Instead, selections are made by performing a biological function on pages of the Wall Street Journal.) Euterpe tests whether Muffy’s selections outperform the S&P 500. She obtains a p-value of .042 for her test. Which of the following is the best interpretation of this number?
_____ Muffy’s picks outperformed the S&P 500 by 4.2%.
_____ There’s only a 4.2% chance that Muffy is actually able to pick stocks successfully.
_____ There’s only a 4.2% chance that the null hypothesis is true.
_____ Muffy selects good stocks 4.2% of the time.
_____ Stock selections as good as the ones Muffy made would happen by chance only 4.2% of the time.

Question 4 [2 points]: Wilhelmine Tempusfugit is investigating the extent to which presidential elections are influenced by the economy. She has developed a regression model relating the percentage of vote obtained by the incumbent party (Y) to the percentage change of the Dow Jones Industrial Average in the six months preceding the election (X). Which of the following is the best quantity for Wilhelmine to examine, to see whether she has a statistically significant result?
_____ Is the slope of the regression line positive?
_____ Is the coefficient of determination (r²) large?
_____ Is the error variance (se²) small?
_____ Is the t-statistic for the slope large?

Question 5 [2 points]: Gracetta Squornshellous wants to use double exponential smoothing on a data set consisting of 846 weeks of sales figures at RandomNumbers-R-Us, the nation’s leading vendor of voting machines. She believes it will be wonderful and joyous (and, of course, fun) to use α = .042 and β = .846. Which of the following statements best describes the results of her double exponential smoothing?
_____ Her results will be incorrect, because her α and her β don’t add to 1.
_____ Because the sample size is so large, her smoothed values will be very close to the actual data.
_____ Location estimates will react very rapidly to shifts in the data, while estimates of the trend will tend to be similar from one data point to the next.
_____ Trend estimates will react very rapidly to shifts in the data, while estimates of the location will tend to be similar from one data point to the next.
Question 6 [2 points]: The court system in the Kingdom of Boravia believes that the defendant in a criminal case should be “presumed innocent until proven guilty beyond a reasonable doubt.” Boravia is unique, however, in having a strict quantitative definition of what constitutes “reasonable” doubt. At present, juries in Boravia are required by law to use statistical methods in assessing the defendant’s guilt, and to reject the null hypothesis of innocence if (and only if) the p-value from their test is less than α = .01. The Attorney General of Boravia has proposed a constitutional amendment that would change this value to α = .05. (“It’s more traditional,” explained the AG in a recent interview.) If the amendment were ratified, what effect would this have on the Boravian court system?
_____ More innocent people would be convicted, while more guilty people would be set free.
_____ More guilty people would be convicted, while more innocent people would be set free.
_____ More guilty people would be convicted, but more innocent people would be convicted.
_____ More innocent people would be set free, but more guilty people would be set free.

Question 7 [2 points]: Balph Snerdwell randomly selects five students from this class, and collects data on the number of hours of sleep they got last night. From these data, he computes a 95% confidence interval as 6.42 ± 2. What may he conclude from this interval?
_____ He’s 95% confident that the five people in his sample averaged less than 8 hours of sleep last night.
_____ Out of the entire class, 95% of the students got between 4.42 and 8.42 hours of sleep last night.
_____ A hypothesis test of H0: μ = 8 would not be rejected.
_____ The population mean is between 4.42 and 8.42 hours.
_____ None of the above.

Question 8 [4 points]: Continuing the example from above: What did Balph obtain as the sample mean and the sample standard deviation for his five data points?
(If we can’t find these numbers from the information given, explain why.)

Question 9 [3 points]: Recall that the security market line is the regression line relating the expected return of a stock (Y) to its risk (X). The intercept of that line may be interpreted as “the return you expect when you incur 0 risk.” Since short-term U.S. Treasury bills are a good proxy for this risk-free rate, we know that the intercept of the security market line should equal the Treasury bill rate. What statistical procedure would we use, to test whether this is so?
_____ test on correlations
_____ test on slopes
_____ test on means
_____ test on predicted averages
_____ test on predicted individuals
_____ test on proportions

Question 10 [3 points]: Do field goal kickers in college football perform well under pressure? (Or does it just seem like things tend to go “wide right” when the game is on the line?) Sports researcher Joe Slabotnik studied 300 randomly selected games of the thousands of Division I college football games played over the last decade. “Clutch” situations were defined as those in the last two minutes of the game, with a difference of three points or less between the two teams. He looked at the field goal conversion rate (number of field goals made, divided by total number of field goals attempted), under clutch and non-clutch situations. What test should he use to investigate his research question?
_____ paired-data t-test
_____ paired-data z-test
_____ test on regression slopes
_____ independent sample t-test
_____ independent sample z-test
_____ none of these

Question 11 [3 points]: Do Stetson graduates make more than Rollins graduates? Throckmorton P. Addlepate IV randomly selects ten students – one student from each school, in each of the five most popular majors (English, history, psychology, biology, and of course statistics). He obtains starting salary data from each of them. What test should he use to investigate his research question?
_____ chi-square test
_____ paired-data t-test
_____ paired-data z-test
_____ test on correlations
_____ independent sample t-test
_____ independent sample z-test

Question 12 [3 points]: Horatio Wajberlinski wants to know whether his “lucky” penny is a fair coin. He tosses it 10,000 times. (Who says there’s never anything exciting to do in DeLand?) He obtains 5042 “heads” and 4958 “tails.” What statistical test should he use?
_____ one-sample z-test
_____ paired data z-test
_____ independent samples z-test
_____ one-sample t-test
_____ paired data t-test
_____ independent samples t-test

Question 13 [3 points]: Dr. Rasp theorizes that physical fitness is bad for people. There are extensive data indicating that people who exercise regularly suffer more injuries. What’s more (says Dr. Rasp), there is reason to believe that people who exercise regularly do less well in school. (“All that physical activity means more blood going to the muscles and less to the brain,” he theorizes.) He obtains data from ten statistics students, on average number of hours of exercise per week and on “big big quiz” grades. What test should he use to investigate his research question?
_____ paired data z-test
_____ independent samples z-test
_____ test on regression slope
_____ paired data t-test
_____ independent samples t-test
_____ test on predicted individuals

Question 14 [3 points]: A recent Gallup Poll surveyed 1600 people nationwide – including 400 from each of four regions of the country (Northeast, South, Midwest, West). One question on the survey was whether the respondent had a generally favorable or generally unfavorable opinion of President Bush. Which of the following would best be used to test whether President Bush’s popularity varies among the four regions of the country?
_____ paired-data t-test
_____ chi-square test
_____ test on predicted averages
_____ independent samples t-test
_____ test on correlations
_____ test on predicted individuals

Question 15 [16 points, divided as indicated]: Are people more likely to skip class the day before Spring Break, or the day after Spring Break? Hortensia Mae Prindlesnout surveyed 200 randomly selected fellow-students at the University of Southern North Dakota at Hoople. She found that 120 students skipped class the day before Spring Break, and 80 skipped class the day after Spring Break. Fifty of these individuals actually skipped class both days. (Given the demanding academic rigor of the basketweaving major at the UofSNDatH, Hortensia found several students who skip class every day. But that’s another story … )
a) [2] State the null and alternative hypotheses, in words and in symbols.
b) [8] Compute an appropriate test statistic and p-value.
c) [4] Draw an appropriate conclusion, both in statistics jargon (reject/not reject) and in the context of the problem.
d) [2] What would a Type I error be, in this situation?

Question 16 [12 points, divided as indicated]: Recent Rollins College computer science graduate Mortimer Byttemapper is employed by Spam-R-Us, a direct marketing firm that uses email solicitation to sell a variety of products, including investments in Nigerian financial markets, mortgage refinancing and re-refinancing, and medical procedures for … uh, “enhancement” of various body parts. The data table below gives, for the past four months, the number of customers the company has and the number of email solicitations (in millions) that the company has sent.

Month    # of customers    million emails sent
Aug.           9                   13
Sep.           6                    9
Oct.           8                   11
Nov.           9                   15

Note that, for these data, the regression equation is Y = .5X + 2.
a) [1] Which is the “X” variable and which is the “Y” variable in this situation?
b) [4] Find the error variance (se²) for these data.
c) [7] Compute a 95% confidence interval for the slope of the line. Interpret this result, in the context of the problem.

Question 17 [6 points; 2 each part]: The last page of this “ultimate knowledge festival” contains a short article from a recent issue of The Economist, describing a recent study on whether being a Cambridge professor is good for you. Answer the following questions, based upon that article.
a) [2] The study was {pick one}:    a controlled experiment    an observational study
b) [2] The story indicates that “[a]cademic success, the choice of arts or science, and the wealth of their college were all statistically insignificant.” What does “insignificant” mean, in this context?
c) [2] The article notes that Cambridge professors had an average life expectancy, at age 60, of 19 years, while that for the average male at age 60 is 15.3 years. Why did the researchers need to do a hypothesis test on the data? (Isn’t it enough just to note that 19 years is more than 15.3 years, and say that professors live 3.7 years longer, on average?)

Question 18 [10 points; 2 each part]: Percival Richkid III, a member of Omicron Omicron Pi Sigma fraternity, believes Dr. Jacques File discriminates against “Greek” students. (Dr. File replies: I’m not out to get Greeks. I’m out to get everybody.) Percival uses his statistical knowledge to develop a multiple regression model. He models a person’s numerical score on the “ultimate knowledge festival” as a function of four variables: the SAT-Math score, the number of hours of sleep the night before the “festival,” the number of times class was skipped, and a dummy variable for whether the student was “Greek” (0 = no, 1 = yes). An Excel printout of his results is given on the last page. (Percival spilled a “beverage” on the printout, obscuring some of the numbers.)
a) Give and interpret the coefficient of determination (r²) for the model.
b) Two of the Sums of Squares on the printout are missing. (They’re indicated by “???”.) Give these missing numbers.
c) Use Percival’s model to predict Percival’s grade on the “big big quiz.” He has an SAT-Math score of 330 (dad made a couple of phone calls to some trustees he knows), and pulled an all-nighter just before the “really big stats party” to make up for the ten classes he skipped.
d) Based upon the printout, does the claim that Dr. File discriminates against Greeks appear warranted? Explain.
e) Suppose the dummy variable had been coded 0 for Greeks and 1 for non-Greeks. How would this change results?

Question 19 [14 points, divided as indicated]: The Sirius Cybernetics Corporation is studying the effectiveness of a proposed new advertising campaign for its new line of solar powered flashlights. Ten medium-sized test markets were selected. Five (chosen at random) received the proposed advertising campaign. The other five served as a control group; these cities received a placebo advertising campaign. Sales figures (thousands of units sold) are given below. To make your lives simpler (something I don’t normally like to do), I’ve provided the relevant means and standard deviations. I’ve also taken differences, and found the mean and standard deviation of those results as well. Of course, you may not need all of these numbers.

Proposed advertising    Placebo advertising    Difference
        50                      42                  8
        35                      42                 -7
        42                      38                  4
        19                      18                  1
        54                      40                 14
     X̄ = 40                  X̄ = 36              X̄ = 4
   sd = 13.84              sd = 10.20           sd = 7.84

a) [2] State an appropriate null and alternative hypothesis, in words and in symbols.
b) [8] Compute an appropriate test statistic. Give the p-value.
c) [4] Draw an appropriate conclusion, both in statistics jargon (reject/don’t reject) and in the context of the problem.

Question 20 [6 points]: Dr. Rasp recently finished re-reading Charles Dickens’ A Christmas Carol, a tragic novel in which a sensible, hard-working, prosperous businessman named Scrooge degenerates into a sentimental softie.
This has prompted Dr. Rasp to develop a model for forecasting sales of lumps of coal. Obviously, there is a seasonal effect here – sales will be higher during the Christmas gift-giving season. The last page of this “Christmas present from me to the class” gives quarterly sales data. Dr. Rasp has already computed the trend line for the deseasonalized data; that line is Y = 49.6 + 10.9X. Find the seasonality coefficient for the fourth quarter of the year, and use it to forecast sales for Fourth Quarter 2003.

ARTICLE FOR QUESTION 17

SUMMARY OUTPUT for Question 18

Regression Statistics
Multiple R            0.822
R Square              0.676
Adjusted R Square     0.641
Standard Error       10.574
Observations             42

ANOVA
              df    SS          MS               F        Significance F
Regression     4    ???         <not legible>    19.29    1.19363E-08
Residual      37    ???         <not legible>
Total         41    12765.71

             Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept       11.295          5.794         1.949     0.059     -0.445      23.035
SAT-Math         0.058          0.009         6.482     0.000      0.040       0.076
Sleep            4.148          0.709         5.849     0.000      2.711       5.585
Skip            -0.570          0.307        -1.853     0.072     -1.192       0.053
Greek            3.646          3.302         1.104     0.277     -3.045      10.337

DATA for Question 20

        Qtr. I    Qtr. II    Qtr. III    Qtr. IV
2001      45        72          35         160
2002      81       120          55         240
2003     117       168          75         ???

SOLUTIONS

1) not reject the null hypothesis
2) There is not enough evidence to believe people can tell coffee from used motor oil.
3) Stock selections as good as the ones Muffy made would happen by chance only 4.2% of the time.
4) Is the t-statistic for the slope large?
5) Trend estimates will react very rapidly to shifts in the data, while estimates of the location will tend to be similar from one data point to the next.
6) More guilty people would be convicted, but more innocent people would be convicted.
7) A hypothesis test of H0: μ = 8 would not be rejected.
8) Sample mean = 6.42 (the “best guess” in the CI). The “± 2” equals [ # ]*[stdev], which here is t*(sd/√n). The t-score has 4 df. So 2 = (2.776)*(sd/√5), and hence sd = (2*√5)/2.776 = 1.611
9) test on predicted averages
10) independent sample z-test
11) paired-data t-test
12) one-sample z-test
13) test on regression slope
14) chi-square test

15a) H0: Folk are just as likely to skip the day before as the day after. π_before = π_after
HA: Folk are not just as likely to skip the day before as the day after. π_before ≠ π_after

15b) This is a test on proportions, with paired data.

                  | Skipped after    Didn’t | Total
Skipped before    |      50            70   |  120
Didn’t            |      30            50   |   80
Total             |      80           120   |  200

The 50 “yes-yes” and 50 “no-no” data points provide no information about the research question. We test to see whether the remaining data are split evenly (H0: π = .5). We observed a proportion of 70/(70+30) = .7 [you could also use 30/(70+30) = .3].
The test statistic is z = (obs - exp)/sd = (.7 - .5)/√((.5)(.5)/100) = 4. The p-value is approximately 0.

15c) Reject the null hypothesis. There is reason to believe that students are more likely to skip class the day before Spring Break than the day after.

15d) A Type I error would conclude there was a difference between the two days in propensity to skip, when in fact there was no real difference.

16a) X = million emails sent
Y = # of customers

16b)
 X     Y     Yhat=.5X+2    error    error-sq
13     9        8.5          .5       .25
 9     6        6.5         -.5       .25
11     8        7.5          .5       .25
15     9        9.5         -.5       .25
Total                                  1

So se² = 1/(n-2) = 1/2

16c) CI: [best guess] ± [ # ]*[st dev]
CI: .5 ± (4.303)*√( error var / ((n - 1)*VarX) ) = .5 ± (4.303)*√( .5 / ((3)*(6.667)) )
CI: .5 ± .68, or -.18 to 1.18
We’re pretty sure that 1 million more spam messages generate somewhere between .18 fewer and 1.18 more customers.

17a) observational study

17b) “Insignificant” means that the null hypothesis was not rejected. There’s no reason to believe these variables have any real effect upon longevity.

17c) The data tell what’s true for a sample of Cambridge professors. We want to generalize to all Cambridge professors.
The hypothesis test tells us whether the apparent difference can be reasonably explained by random chance, or whether there’s a real effect of being a professor.

18a) r² = .676. The four predictor variables (SAT-Math, sleep, skipping, being Greek) together explain 67.6% of the variation in grades.

18b) There’s 12765.71 total variation in grades. Of this, 67.6% is explained by Regression, and the remaining 32.4% is left unexplained (Error). Thus, the two Sums of Squares are .676*12765.71 = 8629.6 and .324*12765.71 = 4136.1, respectively.

18c) Y = 11.295 + .058*330 + 4.148*0 - .570*10 + 3.646*1 = 28.4

18d) No, the claim does not appear warranted. The sample data indicate that Greeks actually score 3.646 points higher on the “big quiz.” However, the effect is not statistically significant (p-value = .277), so there’s no particular reason to believe the professor is out to get either Greeks or non-Greeks.

18e) The slope for the “Greek” variable would now be negative 3.646 – non-Greeks are 3.646 lower than Greeks. Other slopes would stay the same. The intercept would now be 11.295 + 3.646 = 14.941 – that’s the baseline we start computing from, for the “0” (now, Greek) group.

19a) H0: the proposed advertising campaign is ineffective; μ_proposed = μ_placebo
HA: the proposed advertising campaign is effective; μ_proposed > μ_placebo

19b) Independent sample t-test.
Pooled variance: se² = [(4)(13.84²) + (4)(10.20²)]/8 = 147.8
Test statistic: t = (obs - exp)/sd = [(40 - 36) - 0]/√(147.8/5 + 147.8/5) = .52. P-value is large (p > .10, one-tailed)

19c) Do not reject the null hypothesis. There is not enough evidence to believe the proposed ad campaign really increases sales.

20) Trend line for Q4 of 2001: 49.6 + 10.9*4 = 93.2. Actual = 160. Ratio = 160/93.2 = 1.72
Trend line for Q4 of 2002: 49.6 + 10.9*8 = 136.8. Actual = 240. Ratio = 240/136.8 = 1.75
Average seasonality: (1.72+1.75)/2 = 1.735
For Q4 of 2003, trend is 49.6 + 10.9*12 = 180.4.
Now multiply by the seasonality: 180.4*1.735 = 313 is the forecast sales.
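
The arithmetic in solution 20 can be checked with a short script. This is only a sketch of the ratio-to-trend calculation shown above. The function names (trend, period_index, seasonal_coefficient) are made up for illustration, and numbering the quarters 1, 2, 3, ... from Qtr. I of 2001 is an assumption taken from the way the solution plugs in X = 4, 8, and 12.

```python
# Quarterly sales from the data table (2003 Qtr. IV is the value to forecast).
sales = {
    2001: [45, 72, 35, 160],
    2002: [81, 120, 55, 240],
}


def trend(period):
    """Trend line for the deseasonalized data, Y = 49.6 + 10.9X, as given in Question 20."""
    return 49.6 + 10.9 * period


def period_index(year, quarter, first_year=2001):
    """Number the quarters 1, 2, 3, ... starting at Qtr. I of 2001 (assumed from the solution)."""
    return 4 * (year - first_year) + quarter


def seasonal_coefficient(quarter):
    """Average ratio of actual sales to the trend value for one quarter of the year."""
    ratios = [
        values[quarter - 1] / trend(period_index(year, quarter))
        for year, values in sales.items()
    ]
    return sum(ratios) / len(ratios)


q4_coefficient = seasonal_coefficient(4)
forecast_q4_2003 = trend(period_index(2003, 4)) * q4_coefficient

print(f"Q4 seasonality coefficient: {q4_coefficient:.3f}")
print(f"Forecast for Q4 2003: {forecast_q4_2003:.0f}")
```

Because the script does not round the two ratios before averaging, it reports a coefficient of about 1.736 rather than the hand-computed 1.735; the forecast still rounds to 313.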