*** DS280 – INTRODUCTION TO STATISTICS ***
SPRING SEMESTER 2003 “BIG QUIZ” #3 - T

INSTRUCTIONS: Write your name at the top of this page. (It’s worth two points!) Writing “Pledged” before your signature indicates your ongoing commitment to the Honor System. There are 100 points’ worth of questions on this “quiz”; relative problem weights are given in brackets. Answer the questions in the space provided. SHOW YOUR WORK on computational problems. Enjoy!!

Question 1 [3 points]: For a normal distribution, approximately ________% of the data lie within one standard deviation of the mean, approximately ________% lie within two standard deviations of the mean, and approximately ________% lie within three standard deviations of the mean.

Question 2 [6 points]: Find the sample standard deviation for the following data: 4, 9, 2, 5

Question 3 [12 points; 6 each part]: Recall that IQ scores are normally distributed with a mean of 100 and a standard deviation of 16.
a) What percentage of the population has an IQ between 64 and 136?
b) What IQ score marks the cutoff for the bottom 70% of the population?

Question 4 [10 points, divided as indicated]: Clyde Arthur Fazenbaker doesn’t like broccoli. He decides to prove that broccoli is bad for people. He obtains data from three countries on per capita broccoli consumption and the cancer death rate. The data are below.

Country                 Boravia   Kafoonistan   Lower Slobovia
Broccoli consumption    20        40            60
Cancer rate             100       50            30

a) [8] Compute the correlation between broccoli consumption and cancer rate for these data.
b) [2] Is the sign (positive or negative) of this correlation consistent with Clyde Arthur’s theory? Explain.

Question 5 [4 points]: Murgatroyd Applegarth’s portfolio gained 60% in 1999 and 90% the following year. However, she broke even (neither gaining nor losing money) in 2001 and lost 35% of its value in 2002. What has been her average rate of return over the four years?
Question 6 [10 points]: Tickets in the Boravian National Lottery cost one Boravian dollar apiece. Each ticket has a three-digit number on it, 000 through 999. Two tickets are drawn at random. If your ticket is the first one drawn, you win $500 net (you get your dollar back plus $500 more). If your ticket is the second one drawn, you win $250 net. Otherwise you lose your dollar. Find the expected value and variance of the net winnings in the Boravian Lottery.

Question 7 [4 points]: The Orlando Sentinel this week carried a news article about a conflict between restaurant owners and environmentalists over Chilean sea bass. Environmental activists are concerned that the commercial fishing industry is rapidly depleting the sea bass population. Restaurant owners, however, do not wish to see any further restrictions on sales of this popular fish. Underlying the debate is a statistical issue: forecasting the sea bass population and assessing the impact of commercial fishing. Which of the following is most appropriate in this context?
_____ catastrophe model
_____ exponential growth model
_____ least squares regression model
_____ saturation model

Question 8 [4 points]: Next year Stetson University will formally adopt an Honor Code for student academic conduct. The intent is to encourage and reinforce standards of honesty and integrity in academic work. Clearly, we would all be better off if everyone were academically honest. However, some students cheat because they perceive they can obtain a short-term benefit (at the expense of fellow students). Of course, as cheating becomes widespread, everyone suffers. Which of the following best describes this scenario?
_____ regression to the mean
_____ prisoner’s dilemma
_____ spurious correlation
_____ false positive rate

Question 9 [4 points]: A recent study of self-made millionaires (that is, those who made, rather than inherited, their fortunes) showed that most of their children were financially wealthy, or at least well-off, but were generally not as successful in business as their parents. This is an example of which of the following?
_____ substitution bias
_____ correlation is not causality
_____ extrapolation beyond the range
_____ regression to the mean

Question 10 [2 points]: Dietrich Buxtehude has enough money to invest in two stocks. All else being equal, which of the following is the best strategy for him to reduce his risk?
_____ Put all his money in one company, so he’s not exposed to risks incurred by the other.
_____ Invest in two companies that are highly positively correlated, so when good things happen to one they’ll tend to happen to the other.
_____ Invest in two companies that are essentially independent, so there will be approximately zero correlation between their returns.
_____ Invest in two companies with a negative correlation of returns, so that gains in one will tend to be offset by losses in the other.

Question 11 [8 points]: Anastasia Romanova rolls five standard six-sided dice. What is the probability she gets at least one “4”?

Question 12 [7 points, divided as indicated]: Muford Dudelsack reads in the newspaper about a research study that found a negative correlation between intelligence and the amount of time spent watching television. He concludes that the smarter you are, the less you tend to watch TV.
a) [4] Is this a correct interpretation of the study’s findings? Explain.
b) [3] Muford decides to replicate the study. He’s pretty lazy, though, and obtains only two data points. What correlation will he get for his data? (Or can’t we tell from the information given?) Explain.
Question 13 [12 points, divided as indicated]: The year is 2503, and residents of the Moon are worried about sabotage of life support systems by terrorists from Mars. The Division of Moonbase Security believes it has developed a screening system that can identify potential terrorists with 99.9% reliability. They propose using the system on every one of the 10,000 tourists who visit the Moon from Mars annually.
a) [8] Suppose that only ten of the 10,000 are in fact terrorists. What is the false positive rate for this procedure?
b) [4] Are “being a terrorist” and “being labeled a terrorist by the screening system” independent or dependent events? Explain.

Question 14 [6 points]: Wilmot Proviso is deciding whether to invest in a company that sells reliable used cars to low-income individuals or in one that repossesses these cars when owners miss payments. He estimates that in good economic times the sales company will earn him $25,000, while the repossession company will lose him $10,000. In bad times, on the other hand, the sales company will lose him $8,000, while the repossession company will earn him $15,000. Wilmot has no idea how likely it is that economic times will be good next year. What is his break-even point, i.e., the probability of good times at which the two options are balanced? (Any higher chance of good times makes him go with the sales company, while any higher chance of bad times makes him go with the other firm.)

Question 15 [6 points]: Once again it’s time for Dr. Rasp to investigate his favorite regression model: the relationship between the amount of sleep students get the night before the “big quiz” and their score on the “quiz.” Excel output from his data analysis is given below.
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.761
R Square              0.580
Adjusted R Square     0.557
Standard Error        23.474
Observations          20

ANOVA
              df      SS         MS         F       Significance F
Regression     1      13688.5    13688.5    24.8    9.61E-05
Residual      18       9918.7      551.0
Total         19      23607.1

              Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept     20.5           8.26             2.48     0.02      3.11        37.84
Sleep          9.4           1.89             4.98     0.00      5.45        13.39

Give the slope and intercept of the regression model. Interpret these quantities in the context of the problem.

***** DS280 – SPRING 2003 – BIG QUIZ #3-T – SOLUTIONS *****

1) Approximately 68% (about 2/3) within 1 sd; approximately 95% within 2 sd; virtually all (about 99.7%) within 3 sd.

2) The sample variance is

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\sum X^2 - \frac{1}{n}\left(\sum X\right)^2}{n - 1}$$

First method (deviations from the mean, with X̄ = 20/4 = 5):

X    X - X̄    (X - X̄)²
4     -1        1
9      4       16
2     -3        9
5      0        0
Total          26

So Variance = 26/3 = 8.667

Second method (raw sums):

X    X²
4    16
9    81
2     4
5    25
Totals: 20   126

So Variance = [126 - (1/4)(20)²]/3 = 26/3 = 8.667

Hence the standard deviation is √8.667 = 2.94.

3a) z = (136 - 100)/16 = 2.25 and z = (64 - 100)/16 = -2.25. Looking up z = 2.25 in the table gives an area of .4878 between the mean and z. So our probability is .4878 + .4878 = .9756, or about 97.6%.

3b) To score in the bottom 70%, we need the bottom half of the curve plus an additional 20%. Looking up an area of .20 (between the mean and z) gives a z-score of .52. That is, the cutoff point is .52 standard deviations above the mean. Hence: 100 + (.52)(16) = 108.3, or about 108.

4a) The covariance is

$$\text{Cov}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1} = \frac{\sum XY - \frac{1}{n}\sum X \sum Y}{n - 1}$$

First method (X̄ = 40, Ȳ = 60):

X    Y     X - X̄   Y - Ȳ   Product
20   100   -20      40      -800
40    50     0     -10         0
60    30    20     -30      -600
Total                      -1400

Covariance = -1400/2 = -700

Second method:

X    Y     X*Y
20   100   2000
40    50   2000
60    30   1800
Totals: 120   180   5800

Covariance = [5800 - (1/3)(120)(180)]/2 = (5800 - 7200)/2 = -700

With SD(X) = √(800/2) = 20 and SD(Y) = √(2600/2) = 36.06:

$$\text{Correlation} = \frac{\text{Covariance}}{SD(X)\,SD(Y)} = \frac{-700}{(20)(36.06)} = -.97$$

4b) This is NOT consistent with Clyde Arthur’s prediction. He says that higher broccoli consumption should be associated with higher cancer rates, but the negative correlation here indicates that higher broccoli consumption is associated with lower cancer rates.
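As a cross-check on the arithmetic in solutions 2 through 4a, here is a minimal Python sketch; it is an illustration, not part of the original answer key. It assumes Python 3.8 or later (for `statistics.NormalDist`), and all variable names are hypothetical. `statistics.stdev` uses the n - 1 divisor, matching the sample formulas in the solutions, and `NormalDist.inv_cdf` returns the exact z-based cutoff rather than the table's rounded .52.

```python
from statistics import NormalDist, mean, stdev

# Question 2: sample variance and standard deviation, computed both ways
data = [4, 9, 2, 5]
n = len(data)
xbar = mean(data)                                        # 5
ss_dev = sum((x - xbar) ** 2 for x in data)              # first method: 26
ss_raw = sum(x * x for x in data) - sum(data) ** 2 / n   # second method: 126 - 100 = 26
variance = ss_dev / (n - 1)                              # n - 1 divisor (sample)
sd = variance ** 0.5
assert abs(ss_dev - ss_raw) < 1e-9
assert abs(sd - stdev(data)) < 1e-9                      # stdev also divides by n - 1

# Question 3: IQ modeled as Normal(mean 100, sd 16)
iq = NormalDist(mu=100, sigma=16)
p_between = iq.cdf(136) - iq.cdf(64)   # area within 2.25 sd of the mean
cutoff_70 = iq.inv_cdf(0.70)           # exact 70th percentile (table z = .52 gives 108.3)

# Question 4a: covariance and correlation, n - 1 divisor throughout
broccoli = [20, 40, 60]
cancer = [100, 50, 30]
m = len(broccoli)
bx, cy = mean(broccoli), mean(cancer)
cov = sum((x - bx) * (y - cy) for x, y in zip(broccoli, cancer)) / (m - 1)
r = cov / (stdev(broccoli) * stdev(cancer))

print(f"Q2:  variance = {variance:.3f}, sd = {sd:.2f}")
print(f"Q3a: P(64 < IQ < 136) = {p_between:.4f}")
print(f"Q3b: 70th percentile IQ = {cutoff_70:.1f}")
print(f"Q4a: covariance = {cov:.0f}, correlation = {r:.2f}")
```

Run as-is, it should report a variance of 8.667, a standard deviation of 2.94, a probability of 0.9756, a 70th-percentile IQ of 108.4, a covariance of -700, and a correlation of -0.97, in line with the hand computations.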
5) The average rate of return is the geometric mean of the yearly growth factors (1.6, 1.9, 1.0, and .65):

$$\sqrt[4]{(1.6)(1.9)(1.0)(0.65)} = \sqrt[4]{1.976} = 1.186$$

so her average rate of return is 18.6% per year.

6)
Possible outcome    +500     +250     -1
Probability         1/1000   1/1000   998/1000

E(X) = (500)(.001) + (250)(.001) + (-1)(.998) = -.248

$$V(X) = \left[(500)^2(.001) + (250)^2(.001) + (-1)^2(.998)\right] - (-.248)^2 = 313.44$$

7) catastrophe model

8) prisoner’s dilemma

9) regression to the mean

10) Invest in two companies with negative correlation of returns …

11) $\Pr(\text{at least one 4}) = 1 - \Pr(\text{no 4's}) = 1 - (5/6)^5 = .598$

12a) Yes. A negative correlation means that high values on one variable tend to be associated with low values on the other, and vice versa. No causal relationship is implied, but no causal statement is being made here either.

12b) With just two data points, both points lie exactly on the line through them. Hence the correlation will be either +1 or -1 (assuming the two points differ in both X and Y; otherwise it is undefined). We can’t tell which sign from the information given: it depends on whether the line through his two points slopes up or down. (Granted, it won’t be a particularly informative correlation, but that’s not what the question is asking.)

13a) This question is taken almost verbatim from Big Quiz #2 (only one number has changed).

                          Terrorist         Non-terrorist          TOTAL
Labeled a terrorist       .999*10 = 9.99    .001*9990 = 9.99       19.98
Labeled a non-terrorist   .001*10 = .01     .999*9990 = 9980.01    9980.02
TOTAL                     10                9990                   10,000

Hence the false positive rate (the fraction of those labeled terrorists who are not actually terrorists) is 9.99/19.98 = 50%.

13b) These are dependent events: your chance of being labeled a terrorist is much higher if you actually are one than if you aren’t. The two ARE related.

14) The payoff matrix is:

                      Good times   Bad times
Car sales company     $25,000      -$8,000
Car repossession      -$10,000     $15,000

The break-even point occurs at the probability p of good times for which the two expected payoffs balance:

(25000)(p) + (-8000)(1 - p) = (-10000)(p) + (15000)(1 - p)
25000p - 8000 + 8000p = -10000p + 15000 - 15000p
33000p - 8000 = 15000 - 25000p
58000p = 23000
p = 23000/58000 = .397

15) Slope = 9.4: each additional hour of sleep increases the predicted score by 9.4 points, on average. Intercept = 20.5: a student who gets no sleep is predicted to score 20.5, on average.
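The computational answers in solutions 5, 6, 11, 13a, and 14 can be re-checked with a short script. This is a hedged Python sketch, not part of the original answer key; it assumes Python 3.8 or later (for `math.prod`), uses `fractions.Fraction` where exact answers are convenient, and all variable names are illustrative. "False positive rate" follows the definition used in solution 13a: the share of people labeled terrorists who are not.

```python
from fractions import Fraction
from math import prod

# Question 5: geometric mean of the yearly growth factors
factors = [1.60, 1.90, 1.00, 0.65]                 # +60%, +90%, 0%, -35%
avg_factor = prod(factors) ** (1 / len(factors))
avg_return = avg_factor - 1                        # about 0.186, i.e. 18.6% per year

# Question 6: expected value and variance of net winnings (exact fractions)
outcomes = {500: Fraction(1, 1000), 250: Fraction(1, 1000), -1: Fraction(998, 1000)}
ev = sum(x * p for x, p in outcomes.items())
variance_x = sum(x * x * p for x, p in outcomes.items()) - ev ** 2

# Question 11: P(at least one 4 in five dice) via the complement
p_at_least_one_4 = 1 - Fraction(5, 6) ** 5

# Question 13a: share of "labeled terrorist" results that are wrong
population, terrorists, accuracy = 10_000, 10, 0.999
true_pos = accuracy * terrorists                   # 9.99
false_pos = (1 - accuracy) * (population - terrorists)   # 9.99
false_positive_share = false_pos / (true_pos + false_pos)

# Question 14: solve 25000p - 8000(1 - p) = -10000p + 15000(1 - p) for p
# Collecting terms: (25000 + 8000 + 10000 + 15000) p = 15000 + 8000
p_break_even = Fraction(15000 + 8000, 25000 + 8000 + 10000 + 15000)

print(f"Q5:  average return = {avg_return:.1%}")
print(f"Q6:  E(X) = {float(ev):.3f}, V(X) = {float(variance_x):.2f}")
print(f"Q11: P(at least one 4) = {float(p_at_least_one_4):.3f}")
print(f"Q13: false positive share = {false_positive_share:.0%}")
print(f"Q14: break-even p = {float(p_break_even):.3f}")
```

It should print an average return of 18.6%, E(X) = -0.248, V(X) = 313.44, a probability of 0.598, a false positive share of 50%, and a break-even p of 0.397. Using `Fraction` for the lottery and dice problems keeps those answers exact until the final formatting step.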