Exam 3 - Stetson University

*** DS280 – INTRODUCTION TO STATISTICS ***
SPRING SEMESTER 2003
“BIG QUIZ” #3 - T
INSTRUCTIONS: Write your name at the top of this page. (It’s worth two points!) Writing
“Pledged” before your signature indicates your ongoing commitment to the Honor System.
There are 100 points worth of questions on this “quiz” – relative problem weights are given in
brackets. Answer the questions in the space provided. SHOW YOUR WORK on computational
problems. Enjoy!!
Question 1 [3 points]:
For a normal distribution, approximately ________% of the data lie within one standard
deviation of the mean; approximately _________ % lie within two standard deviations of the
mean, and approximately __________% lie within three standard deviations of the mean.
Question 2 [6 points]:
Find the sample standard deviation for the following data:
4    9    2    5
Question 3 [12 points; 6 each part]:
Recall that IQ scores are normally distributed with a mean of 100 and standard deviation of 16.
a) What percentage of the population has IQ between 64 and 136?
b) What IQ do you need to score in the bottom 70% of the population?
Question 4 [10 points, divided as indicated]:
Clyde Arthur Fazenbaker doesn’t like broccoli. He decides to prove that broccoli is bad
for people. He obtains data from three countries on per capita broccoli consumption and the
cancer death rate. The data are below.
Country            Broccoli Consumption    Cancer Rate
Boravia                     20                 100
Kafoonistan                 40                  50
Lower Slobovia              60                  30
a) [8] Compute the correlation between broccoli consumption and cancer rate, for these data.
b) [2] Is the sign (positive or negative) of this correlation consistent with Clyde Arthur’s theory?
Explain.
Question 5 [4 points]:
Murgatroyd Applegarth’s portfolio gained 60% in 1999, and gained 90% the following
year. However, she broke even (neither gaining nor losing money) in 2001 and lost 35% of
value in 2002. What has been her average rate of return over the four years?
Question 6 [10 points]:
Tickets in the Boravian National Lottery cost one Boravian dollar apiece. Each ticket has
a three-digit number on it – 000 through 999. Two tickets are drawn at random. If your ticket is
the first one drawn, you win $500 (net – you get your dollar back plus $500 more). If your ticket
is the second one drawn, you win $250 net. Otherwise you lose your dollar. Find the expected
value and variance of the net winnings in the Boravian Lottery.
Question 7 [4 points]:
The Orlando Sentinel this week carried a news article about a conflict between restaurant
owners and environmentalists over Chilean sea bass. Environmental activists are concerned that
the commercial fishing industry is rapidly depleting the population of sea bass. Restaurant
owners, however, do not wish to see any further restrictions on sales of this popular fish.
Underlying the debate is a statistical issue – that of forecasting the sea bass population, and
assessing the impact that commercial fishing has. Which of the following is most appropriate in
this context?
_____ catastrophe model
_____ exponential growth model
_____ least squares regression model
_____ saturation model
Question 8 [4 points]:
Next year Stetson University will formally adopt an Honor Code for student academic
conduct. The intent is to encourage and reinforce standards for honesty and integrity in
academic work. Clearly, we would all be better off if everyone were academically honest.
However, some students cheat because they perceive they can obtain a short-term benefit (at the
expense of fellow-students). Of course, as cheating becomes widespread, everyone suffers.
Which of the following best describes this scenario?
_____ regression to the mean
_____ prisoner’s dilemma
_____ spurious correlation
_____ false positive rate
Question 9 [4 points]:
A recent study of self-made millionaires (that is, those who made, rather than inherited,
their fortunes) showed that most of the children of these millionaires were financially wealthy, or
at least well-off – but were generally not as successful in business as the parents. This is an
example of which of the following?
_____ substitution bias
_____ correlation is not causality
_____ extrapolation beyond the range
_____ regression to the mean
Question 10 [2 points]:
Dietrich Buxtehude has enough money to invest in two stocks. All else being equal,
which of the following is the best strategy for him, to reduce his risk?
_____ Put all his money in one company, so he’s not exposed to risks incurred by the other.
_____ Invest in two companies that are highly positively correlated, so when good things
happen to one they’ll tend to happen to the other.
_____ Invest in two companies that are essentially independent, so there will be
approximately zero correlation between their returns.
_____ Invest in two companies with a negative correlation of returns, so that gains in one
will tend to be offset by losses in the other.
Question 11 [8 points]:
Anastasia Romanova rolls five standard, six-sided dice. What is the probability she gets
at least one “4”?
Question 12 [7 points, divided as indicated]:
Muford Dudelsack reads in the newspaper about a research study that found a negative
correlation between intelligence and the amount of time spent watching television. He concludes
that the smarter you are, the less you tend to watch TV.
a) [4] Is this a correct interpretation of the study’s findings? Explain.
b) [3] Muford decides to replicate the study. He’s pretty lazy, though – and obtains only two
data points. What correlation will he get for his data? (Or can’t we tell, from the information
given?) Explain.
Question 13 [12 points, divided as indicated]:
The year is 2503, and residents of the Moon are worried about sabotage of life support
systems by terrorists from Mars. The Division of Moonbase Security believes it has developed a
screening system that can identify potential terrorists with 99.9% reliability. They propose using
the system on every one of the 10,000 tourists who visit the Moon from Mars annually.
a) [8] Suppose that only ten of the 10,000 are in fact terrorists. What is the false positive rate for
this procedure?
b) [4] Are “being a terrorist” and “being labeled a terrorist by the screening system” independent
or dependent events? Explain.
Question 14 [6 points]:
Wilmot Proviso is deciding whether to invest in a company that sells reliable used cars to
low-income individuals, or one which repossesses these cars when owners miss payments. He
estimates that in good economic times the sales company will earn him $25,000 while the
repossession company will lose him $10,000. On the other hand, in bad times the first company
will lose $8000 while the second firm will gain him $15,000. Wilmot has no idea about the
likelihood that economic times will be good next year. What is his break-even point – the
probability at which the two options are balanced? (Any higher chance of good times makes him
go with the sales company, while with any higher chance of bad times he goes with the other
firm.)
Question 15 [6 points]:
Once again it’s time for Dr. Rasp to investigate his favorite regression model – the
relationship between the amount of sleep students get the night before the “big quiz” and their
score on the “quiz.” Excel output from his data analysis is given below.
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.761
R Square              0.580
Adjusted R Square     0.557
Standard Error       23.474
Observations             20

ANOVA
               df         SS         MS       F    Significance F
Regression      1    13688.5    13688.5    24.8          9.61E-05
Residual       18     9918.7      551.0
Total          19    23607.1

             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept            20.5             8.26     2.48      0.02        3.11       37.84
Sleep                 9.4             1.89     4.98      0.00        5.45       13.39

Give the slope and intercept of the regression model. Interpret these quantities, in context of the
problem.
***** DS280 – SPRING 2003 – BIG QUIZ #3-T – SOLUTIONS *****
1) Approximately 2/3 within 1 sd; approx. 95% within 2 sd; virtually all within 3 sd
2) Variance = Σ(X - X̄)^2 / (n - 1) = [ΣX^2 - (1/n)(ΣX)^2] / (n - 1)

First method:
    X     X - X̄    (X - X̄)^2
    4      -1          1
    9       4         16
    2      -3          9
    5       0          0
                 Total = 26
X̄ = 5
So Variance = 26/3 = 8.667

OR

Second method:
    X      X^2
    4      16
    9      81
    2       4
    5      25
Totals: 20    126
Variance = [126 - (1/4)(20)^2] / 3 = 8.667

Hence the standard deviation is sqrt(8.667) = 2.94
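Both methods in solution 2 can be checked with a short script. This is a minimal sketch, not part of the original key; the variable names are chosen here for illustration.

```python
import math
import statistics

data = [4, 9, 2, 5]
n = len(data)
mean = sum(data) / n

# First method: sum of squared deviations from the mean, divided by n - 1.
var_first = sum((x - mean) ** 2 for x in data) / (n - 1)

# Second method: shortcut formula using the sum of X and the sum of X^2.
var_second = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

sd = math.sqrt(var_first)

print(round(var_first, 3), round(var_second, 3), round(sd, 2))
# The standard library's sample standard deviation should agree with sd.
print(round(statistics.stdev(data), 2))
```

The two variance formulas are algebraically identical, so any difference between them would point to an arithmetic slip.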
3a) z = (136-100)/16 = 2.25 and z = (64-100)/16 = -2.25.
Looking z=2.25 up in the table gives an area of .4878.
So our probability is .4878 + .4878 = .9756
3b) To score in the bottom 70%, we have the bottom half of the curve plus an additional 20%.
Looking up a probability of 20% gives a z-score of .52.
That is, the cutoff point is .52 standard deviations above the mean.
Hence: 100 + (.52)*(16) = 108
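The table lookups in solution 3 can be cross-checked in software. The sketch below assumes Python's standard-library `statistics.NormalDist`; it uses exact normal areas rather than a rounded table, so the results may differ from the table-based answers in the last digit or two.

```python
from statistics import NormalDist

# IQ model from the question: mean 100, standard deviation 16.
iq = NormalDist(mu=100, sigma=16)

# 3a) Share of the population with IQ between 64 and 136.
share_between = iq.cdf(136) - iq.cdf(64)

# 3b) IQ at the 70th percentile (the top of the bottom 70%).
cutoff_70 = iq.inv_cdf(0.70)

print(round(share_between, 4), round(cutoff_70, 1))
```

The cutoff comes out slightly above 108 because the exact z-score for the 70th percentile is a little larger than the table's .52.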
4a) Covariance = Σ(X - X̄)(Y - Ȳ) / (n - 1) = [ΣXY - (1/n)(ΣX)(ΣY)] / (n - 1)

First method:
    X      Y     X - X̄    Y - Ȳ    product
   20    100      -20       40       -800
   40     50        0      -10          0
   60     30       20      -30       -600
                          Total:    -1400
Covariance = -1400/2 = -700

OR

Second method:
    X      Y      X*Y
   20    100     2000
   40     50     2000
   60     30     1800
Totals: 120    180    5800
Covariance = [5800 - (1/3)(120)(180)] / 2 = -700

Correlation = Covariance / [SD(x) * SD(y)] = -700 / [(20)(36.05)] = -.97

4b) This is NOT consistent with Clyde Arthur’s prediction. He says that higher broccoli
consumption should be associated with higher cancer rates – but the negative correlation
here indicates that higher broccoli consumption is associated with lower cancer rates.
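The covariance and correlation for the three countries can be recomputed as follows. This is an illustrative sketch using the data from Question 4; the variable names are not from the original.

```python
import math

broccoli = [20, 40, 60]      # per capita broccoli consumption
cancer = [100, 50, 30]       # cancer death rate
n = len(broccoli)

mean_x = sum(broccoli) / n
mean_y = sum(cancer) / n

# Sample covariance: sum of products of deviations, divided by n - 1.
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(broccoli, cancer)) / (n - 1)

# Sample standard deviations of each variable.
sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in broccoli) / (n - 1))
sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in cancer) / (n - 1))

r = cov / (sd_x * sd_y)
print(cov, round(sd_x, 2), round(sd_y, 2), round(r, 2))
```

A negative `r` means the higher-consumption countries in this tiny sample have the lower cancer rates.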
5) (1.6 * 1.9 * 1 * .65)^(1/4) = 1.186, so 18.6% average rate of return
6) Possible outcome    Probability
        +500              1/1000
        +250              1/1000
          -1            998/1000
E(X) = (500)*(.001)+(250)*(.001)+(-1)*(.998) = -.248
V(X) = [(500)^2*(.001)+(250)^2*(.001)+(-1)^2*(.998)] – (-.248)^2 = 313.44
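The expected value and variance in solution 6 follow directly from the outcome table. A minimal sketch, with the outcomes and probabilities taken from the solution above:

```python
# Net winnings mapped to their probabilities.
outcomes = {500: 1 / 1000, 250: 1 / 1000, -1: 998 / 1000}

# Expected value: probability-weighted average of the outcomes.
expected = sum(value * prob for value, prob in outcomes.items())

# Variance: E(X^2) minus the square of E(X).
expected_sq = sum(value ** 2 * prob for value, prob in outcomes.items())
variance = expected_sq - expected ** 2

print(round(expected, 3), round(variance, 2))
```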
7) catastrophe model
8) prisoner’s dilemma
9) regression to the mean
10) Invest in two companies with negative correlation of returns …
11) Pr(at least one 4) = 1 – Pr(no 4’s) = 1 – (5/6)^5 = .598
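The complement rule in solution 11 can be confirmed by brute force, since five dice have only 6^5 = 7776 equally likely outcomes. The sketch below computes the probability both ways; it is a check added here, not part of the original key.

```python
from fractions import Fraction
from itertools import product

# Complement rule: 1 minus the chance that none of the five dice shows a 4.
p_at_least_one = 1 - Fraction(5, 6) ** 5

# Brute force: enumerate every roll of five dice and count those containing a 4.
rolls = list(product(range(1, 7), repeat=5))
count_with_four = sum(1 for roll in rolls if 4 in roll)
p_enumerated = Fraction(count_with_four, len(rolls))

print(p_at_least_one, round(float(p_at_least_one), 3))
```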
12a) Yes. A negative correlation means that high values on one variable tend to be associated
with low values on the other, and vice versa. No causal relationship is implied – but no
causal statement is being made here.
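12b) With just two data points, we know they have to lie exactly on the line. Hence the
correlation will be perfect: either +1 or -1, depending on whether the line through the
two points slopes up or down (and undefined if the line is horizontal or vertical). We
can’t tell the sign from the information given. (Granted, it won’t be a particularly
informative correlation – but that’s not what the question is asking.)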
13a) This question is taken verbatim from Big Quiz #2 (changing only one number).
                          |   Terrorist      |   Non-terrorist    |   TOTAL
Labeled a terrorist       | .999*10 = 9.99   | .001*9990 = 9.99   |    19.98
Labeled a non-terrorist   |                  |                    |
TOTAL                     |       10         |       9990         |   10,000
Hence the false positive rate is 9.99/19.98 = 50%
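The first row of the table in solution 13a can be reproduced in a few lines. This sketch follows the solution's own usage, where "false positive rate" means the share of visitors labeled as terrorists who are not terrorists, and it assumes (as the solution does) that the 99.9% reliability applies equally to terrorists and non-terrorists.

```python
visitors = 10_000
terrorists = 10
reliability = 0.999   # assumed to hold for both groups, as in the solution

non_terrorists = visitors - terrorists

# Terrorists correctly labeled, and non-terrorists wrongly labeled.
true_positives = reliability * terrorists
false_positives = (1 - reliability) * non_terrorists

labeled_total = true_positives + false_positives
share_false = false_positives / labeled_total

print(round(true_positives, 2), round(false_positives, 2), round(share_false, 2))
```

Even with a highly reliable test, about half of the flagged visitors are innocent because actual terrorists are so rare among the 10,000.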
13b) These are dependent events – your chance of being labeled a terrorist is much higher if
you actually are one than if you aren’t. The two ARE related.
14) The payoff matrix is:
                       Good times    Bad times
Car sales company        $25,000      -$8,000
Car repossession        -$10,000      $15,000
The break-even point occurs at the probability for which the two payoffs balance:
(25000)(p)+(-8000)(1-p) = (-10000)(p)+(15000)(1-p)
25000p – 8000 + 8000p = -10000p + 15000 – 15000p
33000p – 8000 = 15000 – 25000p
58000p = 23000
p = 23000/58000 = .397
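The break-even algebra in solution 14 generalizes to any two-by-two payoff matrix. The sketch below solves for p exactly with fractions; the function name `expected_payoff` is introduced here for illustration.

```python
from fractions import Fraction

# Payoffs from the matrix above: (good times, bad times).
sales_good, sales_bad = 25_000, -8_000
repo_good, repo_bad = -10_000, 15_000

def expected_payoff(good, bad, p):
    """Expected payoff when good times occur with probability p."""
    return good * p + bad * (1 - p)

# Setting the two expected payoffs equal and solving for p gives:
# p = (repo_bad - sales_bad) / ((sales_good - sales_bad) - (repo_good - repo_bad))
p = Fraction(repo_bad - sales_bad,
             (sales_good - sales_bad) - (repo_good - repo_bad))

print(p, round(float(p), 3))
print(float(expected_payoff(sales_good, sales_bad, p)),
      float(expected_payoff(repo_good, repo_bad, p)))
```

At the break-even probability both companies have the same expected payoff, which is a convenient check on the algebra.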
15) slope = 9.4 – Every additional hour of sleep increases the score 9.4 points, on average.
Intercept = 20.5 – Folk with no sleep score a 20.5, on average.
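To make the interpretation in solution 15 concrete, the fitted line can be written as a small function. This is a sketch using the coefficients from the Excel output; the function name and the 7- and 8-hour inputs are illustrative values, not data from the quiz.

```python
INTERCEPT = 20.5   # predicted score with zero hours of sleep
SLOPE = 9.4        # predicted change in score per additional hour of sleep

def predict_score(hours_of_sleep):
    """Predicted quiz score from the fitted regression line."""
    return INTERCEPT + SLOPE * hours_of_sleep

# Hypothetical inputs chosen only to illustrate the slope.
score_7 = predict_score(7)
score_8 = predict_score(8)

print(predict_score(0), round(score_7, 1), round(score_8, 1), round(score_8 - score_7, 1))
```

The gap between any two predictions one hour apart equals the slope, and the prediction at zero hours equals the intercept.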