LAST NAME (Please Print): KEY
FIRST NAME (Please Print):
HONOR PLEDGE (Please Sign):
Statistics 111
Midterm 3
• This is a closed book exam.
• You may use your calculator and a single page of notes.
• The room is crowded. Please be careful to look only at your own exam. Try to sit
one seat apart; the proctors may ask you to randomize your seating a bit.
• Report all numerical answers to at least two correct decimal places or (when appropriate) write them as a fraction.
• All question parts count for 1 point.
1. For different kinds of murderers, you observe their favorite flavors of ice cream. Your
data are as follows:
              parricide   infanticide   regicide
chocolate         10           20           30
vanilla           10           20            0
strawberry        10           20            0
In words specific to the problem, what is the appropriate null hypothesis?
There is no relationship between type of murderer and preferred flavor of ice cream.
40 What is the value of your test statistic?
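The expected count for each cell is the row total times the column total divided by
the grand total. The observed counts, with their margins, are shown in the following table:
              parricide   infanticide   regicide   Total
chocolate         10           20           30        60
vanilla           10           20            0        30
strawberry        10           20            0        30
Total             30           60           30       120
This gives expected counts of 15, 30, 15 in the chocolate row and 7.5, 15, 7.5 in each
of the vanilla and strawberry rows.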
Then one uses the usual formula:
$$ts = \sum_{\text{all cells}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
9.49 What is the critical value for a 0.05 level test?
It comes from the $\chi^2$ table with df equal to (number of rows - 1) × (number of
columns - 1) = 4.
< 0.001 What is your significance probability (if necessary, give bracketing values).
It is below the smallest value in the table.
In words specific to the problem, what is the conclusion for a 0.05 level test?
Strongly reject the null—there is evidence that ice cream preference is associated with
murder style.
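The expected counts and the test statistic for this problem can be checked with a short script. This is only a sketch using the observed table from the problem; the function name `chi_square_independence` is illustrative, and the critical value still has to be read from a chi-square table.

```python
# Observed counts: rows are flavors, columns are murderer types
# (parricide, infanticide, regicide), as in the problem's table.
observed = {
    "chocolate":  [10, 20, 30],
    "vanilla":    [10, 20, 0],
    "strawberry": [10, 20, 0],
}


def chi_square_independence(table):
    """Return (expected counts, test statistic, degrees of freedom)."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    total = sum(row_totals)
    # Expected count = row total * column total / grand total.
    expected = [[rt * ct / total for ct in col_totals] for rt in row_totals]
    # Sum of (O - E)^2 / E over all cells.
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(rows, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return expected, stat, df


expected, stat, df = chi_square_independence(observed)
print(expected)
print(round(stat, 2), df)
```

The statistic is then compared with the tabled critical value for the returned degrees of freedom.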
2. You suspect that administration of electric shocks encourages telepathy. To test this,
you have undergraduates attempt to guess the cards in a Rhine deck, administering
small shocks when they guess wrong. (This is the experiment Bill Murray was conducting at the beginning of Ghostbusters.) Under random chance, a person would
expect to guess 5 cards correctly, with a standard deviation of 2.
6 Suppose that electric shocks confer telepathy, and people who feel the pain stimulus
guess, on average, 7.5 cards correctly. How many undergraduates do you need to shock
in order for a 0.05 level test to have power 0.9?
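$$\begin{aligned}
0.9 &= \mathbb{P}[ts > 1.645] \\
    &= \mathbb{P}\left[\frac{\bar{X} - 7.5 + 7.5 - 5}{2/\sqrt{n}} > 1.645\right] \\
    &= \mathbb{P}\left[Z > 1.645 - \frac{2.5}{2/\sqrt{n}}\right]
\end{aligned}$$
so $-1.28 = 1.645 - \frac{2.5}{2/\sqrt{n}}$, and solving gives n = 5.476 and one must round up to the
nearest integer.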
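The sample-size calculation can be sketched in a few lines. This assumes a one-sided z test with known standard deviation, as in the derivation; the function name `sample_size_for_power` is made up for illustration. Because it uses unrounded normal quantiles rather than the table values 1.645 and 1.28, the unrounded n differs slightly in the later decimals.

```python
import math
from statistics import NormalDist


def sample_size_for_power(mu0, mu1, sigma, alpha, power):
    """Sample size for a one-sided z test of mu0 against the alternative mu1 > mu0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # cutoff of the level-alpha test
    z_beta = NormalDist().inv_cdf(power)       # quantile matching the target power
    # Solve -z_beta = z_alpha - (mu1 - mu0) / (sigma / sqrt(n)) for n.
    n_exact = ((z_alpha + z_beta) * sigma / (mu1 - mu0)) ** 2
    return n_exact, math.ceil(n_exact)


n_exact, n_needed = sample_size_for_power(mu0=5, mu1=7.5, sigma=2, alpha=0.05, power=0.9)
print(round(n_exact, 2), n_needed)
```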
0.92 You shock 9 students. What is the power of a 0.01 level test?
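$$\begin{aligned}
\text{power} &= \mathbb{P}[ts > 2.33] \\
             &= \mathbb{P}\left[\frac{\bar{X} - 7.5 + 7.5 - 5}{2/\sqrt{n}} > 2.33\right] \\
             &= \mathbb{P}\left[Z > 2.33 - \frac{2.5}{2/\sqrt{9}}\right] \\
             &= \mathbb{P}[Z > -1.42].
\end{aligned}$$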
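As a rough check of the power computation, the same steps can be written as a small function. This is a sketch under the same assumptions as the derivation (one-sided z test, known standard deviation 2); `power_one_sided_z` is an illustrative name, not something from the course materials.

```python
import math
from statistics import NormalDist


def power_one_sided_z(mu0, mu1, sigma, n, alpha):
    """Power of a one-sided level-alpha z test when the true mean is mu1."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # Under the alternative, the test statistic is shifted by this amount.
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))
    return 1 - NormalDist().cdf(z_alpha - shift)


power = power_one_sided_z(mu0=5, mu1=7.5, sigma=2, n=9, alpha=0.01)
print(round(power, 2))
```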
3. A Fox News reporter claims that at least 10% more women than men vote for the more
handsome candidate. You want to prove him wrong. You draw a random sample of
100 men and 150 women, and ask them whether they would vote for Orlando Bloom
if he ran against Newt Gingrich. (Assume Bloom is more handsome than Gingrich.)
You find that 80 men would vote for Bloom, and so would 125 women.
In symbols, what is the null hypothesis? $H_0: p_w - p_m \geq 0.1$
Note to TA: Accept any mathematically equivalent statement of the null hypothesis.
-1.39 What is the value of your test statistic?
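$$\begin{aligned}
ts &= \frac{\hat{p}_w - \hat{p}_m - 0.1}{\sqrt{\hat{p}_w(1 - \hat{p}_w)/n_w + \hat{p}_m(1 - \hat{p}_m)/n_m}} \\
   &= \frac{0.83 - 0.8 - 0.1}{\sqrt{0.83 \cdot 0.17/150 + 0.8 \cdot 0.2/100}} \\
   &= -1.39
\end{aligned}$$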
-1.64, -1.65 What is your critical value when α = 0.05?
0.08 What is your significance probability? (If necessary, give a bracket.)
In words pertinent to the problem, what conclusion do you reach (at the 0.05 level)?
Sadly, we cannot conclude that the Fox reporter is wrong. (To avoid any perception
of sexism, let me assure everyone that in the other version of the exam, the null
hypothesis is rejected.)
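The test statistic and its lower-tail significance probability can be reproduced with the sketch below. The function name `two_proportion_z` and its `digits` argument are illustrative assumptions; `digits=2` mimics the key's rounding of the women's proportion to 0.83. Keeping the unrounded proportion 125/150 gives a statistic nearer -1.33, which leads to the same conclusion at the 0.05 level.

```python
import math
from statistics import NormalDist


def two_proportion_z(x_w, n_w, x_m, n_m, null_diff, digits=None):
    """z statistic and lower-tail p-value for H0: p_w - p_m >= null_diff."""
    p_w, p_m = x_w / n_w, x_m / n_m
    if digits is not None:
        # Optionally round the sample proportions, as in a hand calculation.
        p_w, p_m = round(p_w, digits), round(p_m, digits)
    se = math.sqrt(p_w * (1 - p_w) / n_w + p_m * (1 - p_m) / n_m)
    z = (p_w - p_m - null_diff) / se
    p_value = NormalDist().cdf(z)  # small z values count against H0
    return z, p_value


z_rounded, p_rounded = two_proportion_z(125, 150, 80, 100, 0.1, digits=2)
z_exact, p_exact = two_proportion_z(125, 150, 80, 100, 0.1)
print(round(z_rounded, 2), round(p_rounded, 2))
print(round(z_exact, 2), round(p_exact, 2))
```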
4. Spin magazine published a formula for celebrity perversity (P ), as measured by public
opinion polls, based on 25 famous people. The explanatory variables were the age
difference between them and their primary partner (D), the number of partners in a
year (N ), the number of arrests (A), and whether they were homo/bi/heterosexual
(H), coded as 1, 2, 3. Suppose Spin’s regression equation was
$$P = -7 + 4D^2 + 5N + 5A - 2H$$
where the standard errors on the coefficients are 0.5, 0.4, 1.2, 0.3, and 1.3, respectively.
4711 What is the predicted perversity score for Rolling Stones bassist Bill Wyman,
who, at 52, married 18 year-old Mandy Smith, claims to have had approximately 20
partners per year, has not been arrested, and is heterosexual?
$-7 + 4(52 - 18)^2 + 5(20) + 5(0) - 2(3) = 4711$.
H Which explanatory variables should not be included in the model? Use α = 0.05.
The critical value comes from a $t_{20}$ table, and for α = 0.05 this is 1.725. The test statistic for the
coefficient on $D^2$ is 4/0.4 = 10; for the coefficient on N it is 5/1.2 = 4.17; for
the coefficient on A it is 5/0.3 = 16.67; and for H it is 2/1.3 = 1.54 in absolute value. The only value
less than 1.725 is the one for H.
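The prediction and the coefficient-by-coefficient checks can be sketched as follows. The coefficients and standard errors are taken from the regression equation as printed in the problem, and the cutoff 1.725 is the table value quoted in the answer; the dictionary keys and function names are illustrative only.

```python
# Coefficients and standard errors, in the order given in the problem.
coefficients = {"intercept": -7, "D^2": 4, "N": 5, "A": 5, "H": -2}
std_errors = {"intercept": 0.5, "D^2": 0.4, "N": 1.2, "A": 0.3, "H": 1.3}
T_CRITICAL = 1.725  # t table value with 20 df quoted in the answer


def predict(age_gap, partners, arrests, orientation):
    """Predicted perversity score from the fitted equation."""
    c = coefficients
    return (c["intercept"] + c["D^2"] * age_gap ** 2 + c["N"] * partners
            + c["A"] * arrests + c["H"] * orientation)


def weak_terms():
    """Explanatory variables whose |coefficient / SE| falls below the cutoff."""
    return [name for name in coefficients
            if name != "intercept"
            and abs(coefficients[name] / std_errors[name]) < T_CRITICAL]


wyman = predict(age_gap=52 - 18, partners=20, arrests=0, orientation=3)
print(wyman)
print(weak_terms())
```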
5. Mendel crosses pea plants that are (YgUw) with themselves. (Here Y indicates dominant yellow peas, g indicates recessive green peas, U indicates dominant unwrinkled
peas, and w indicates recessive wrinkled peas.) He obtains 160 offspring, and observes
15 green wrinkled plants, 28 yellow wrinkled plants, 28 green unwrinkled plants, and
the rest are yellow and unwrinkled. (These traits are independent.)
In words pertinent to the problem, what is your null hypothesis?
Mendelian inheritance holds: the proportions should be 9/16, 3/16, 3/16, 1/16 for
unwrinkled yellows, unwrinkled greens, wrinkled yellows, and wrinkled greens, respectively.
2.78 What is the value of your test statistic?
Out of 160 offspring, one expects 10 green wrinkled, 30 yellow wrinkled, 30 green unwrinkled,
and 90 yellow unwrinkled plants. The corresponding observed counts are 15, 28, 28, and 89.
The test statistic is
$$\sum_{\text{all categories}} \frac{(O_i - E_i)^2}{E_i} = 2.777.$$
7.81 What is the critical value for a 0.05 level test?
This is from the $\chi^2_3$ table; the df is the number of categories - 1.
> 0.25 What is the significance probability for the experiment (if necessary, give
bracketing values.)
From the table, the sig. prob. is bigger than 0.25.
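The goodness-of-fit statistic can be verified with a few lines of code. This sketch uses the counts from the problem and the 9:3:3:1 Mendelian ratios; the variable names are illustrative.

```python
from fractions import Fraction

# Observed counts from the problem (the last one is 160 minus the other three).
observed = {
    "green wrinkled": 15,
    "yellow wrinkled": 28,
    "green unwrinkled": 28,
    "yellow unwrinkled": 89,
}
# Mendelian proportions under the null hypothesis.
ratios = {
    "green wrinkled": Fraction(1, 16),
    "yellow wrinkled": Fraction(3, 16),
    "green unwrinkled": Fraction(3, 16),
    "yellow unwrinkled": Fraction(9, 16),
}

n = sum(observed.values())
expected = {k: float(n * ratios[k]) for k in observed}
# Sum of (O - E)^2 / E over the four categories.
stat = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1
print(expected)
print(round(stat, 2), df)
```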
6. I want to argue that my 2013 Focus class is smarter than the average Duke student.
Suppose the average Duke IQ is 120, and a random sample of 10 Focus students out
of 17 had a mean IQ of 125 with a sample sd of 10.
In symbols, what is your alternative hypothesis? $H_A: \mu_F > 120$
2.39 What is the value of your test statistic?
The trick here is to use the finite population correction factor (FPCF). The test statistic is
$$ts = \frac{\bar{X} - \mu_0}{(10/\sqrt{10}) \sqrt{(17 - 10)/(17 - 1)}} = 2.39.$$
1.83 For a 0.05 level test, what is your critical value?
This comes from a $t_9$ table.
0.02 to 0.025 What is my significance probability (if necessary, give a bracket)?
Yes Do I decide my Focus class is smarter?
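The effect of the finite population correction can be sketched as below. The function name `t_with_fpc` is an assumption made for illustration; the inputs are the sample mean 125, null mean 120, sample sd 10, sample size 10, and class size 17 from the problem.

```python
import math


def t_with_fpc(xbar, mu0, sd, n, population_size):
    """One-sample t statistic with the finite population correction applied to the SE."""
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    se = (sd / math.sqrt(n)) * fpc
    return (xbar - mu0) / se


ts = t_with_fpc(xbar=125, mu0=120, sd=10, n=10, population_size=17)
uncorrected = (125 - 120) / (10 / math.sqrt(10))
print(round(ts, 2), round(uncorrected, 2))
```

Without the correction the statistic would be about 1.58, below the critical value of 1.83, so the correction changes the decision here.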
7. You want to argue that Duke students are smarter than students at UNC. To reduce
variance, you control for major. You observe:
         math   history   English   economics   statistics
Duke      130       110       115         120          150
UNC       125       106       110         114          145
In words pertinent to the problem, what is your alternative hypothesis?
The average IQ at Duke is larger than the average IQ at UNC.
TAs: Accept any equivalent wording.
15.81 What is the value of your test statistic?
This is a paired difference test. The differences are 5, 4, 5, 6, and 5, so the standard
deviation of the differences is $sd_D = \sqrt{0.5}$. The mean for Duke is 125 and the mean
for UNC is 120. So the test statistic is
$$ts = \frac{125 - 120}{\sqrt{0.5/5}} = 15.81.$$
2.13 For a 0.05 level test, what is your critical value?
It comes from a $t_4$ table.
In words pertinent to the problem, what conclusion do you reach (at the 0.05 level)?
You reject the null; Duke is smarter.
< 0.0005 What is your significance probability (if necessary, give bracketing values).
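The paired-difference statistic can be recomputed directly from the table. This is a minimal sketch using the ten IQ values given in the problem; the list names are illustrative.

```python
import math
from statistics import mean, stdev

# IQ by major: math, history, English, economics, statistics.
duke = [130, 110, 115, 120, 150]
unc = [125, 106, 110, 114, 145]

# Pairing by major removes the major-to-major variation.
diffs = [d - u for d, u in zip(duke, unc)]
# stdev() is the sample standard deviation (divides by n - 1).
ts = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
print(diffs)
print(round(ts, 2))
```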
8. You use linear regression to predict annual income (in thousands) from the number of
pets someone owns. The fitted regression equation is Y = 80 − 3X. The proportion
of variance explained by knowing the number of pets is 0.36, the standard deviation
of the residuals is 12, and the fit is based on 100 randomly chosen people.
-0.6 What is the correlation coefficient?
This is the square root of the coefficient of determination, with sign matched to the
slope.
$50K What is your predicted income for Ace Ventura, who owns ten pets?
$50K, since 50 = 80 - 3*10.
$34.64K About 90% of the people who own 10 pets will have at least what income?
Previously, I took a shortcut in teaching people how to place a confidence interval
on a regression line and a regression prediction. But this semester I decided to teach
the real formulae, which appear on page 14 of lecture 17. For the Fall 2014 exam, be
prepared to use those formulae.
Under the regression assumptions, people with 10 pets have incomes that are normally
distributed with mean 50 and standard deviation 12. From the z-table, 90% of the
area is above -1.28, so the answer is 50 + (12)(-1.28) = $34.64K.
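The three answers for this problem can be reproduced with the sketch below. It follows the shortcut used in this key, treating incomes at a given number of pets as normal with the residual standard deviation; the variable names are illustrative. Because it uses the unrounded normal quantile instead of -1.28, the cutoff comes out a couple of hundredths below 34.64.

```python
import math
from statistics import NormalDist

# Quantities given in the problem.
intercept, slope = 80, -3
r_squared, resid_sd = 0.36, 12

# Correlation: square root of R^2, with the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

# Predicted income (in thousands) for someone with ten pets.
predicted = intercept + slope * 10

# Income exceeded by about 90% of people with ten pets: the 10th percentile.
z10 = NormalDist().inv_cdf(0.10)
cutoff = predicted + resid_sd * z10
print(round(r, 2), predicted, round(cutoff, 1))
```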
9. Write a nonlinear equation to predict the amount of lumber Y that can be harvested
from a tree whose diameter at 5 feet is X and whose height is W .
Since trees are cylinders, the usable volume has the form $Y = \beta_0 + \beta_1 X^2 W$.
11. Your brother has two coins. One is fair, and the other has 2/3 chance of coming up
heads. He wants to determine who washes the dishes with a coin toss, and assures
you he is using a fair coin. To test this, you toss the coin five times, and get four
heads.
0.19 What is your exact significance probability?
Significance probability is the chance of getting results that support the alternative
as or more strongly than the data seen, when the null is true. So compute the chance
of getting 4 or 5 heads with a fair coin. This is 0.1875.
0.42 You are a Bayesian, and think there is a 60% chance that your brother is telling
the truth. After you make the five tosses, what do you think is the chance that he is
truthful?
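Bayes' rule. Calculate
$$\mathbb{P}[\text{fair} \mid 4] = \frac{\mathbb{P}[4 \mid \text{fair}] \cdot 0.6}{\mathbb{P}[4 \mid \text{fair}] \cdot 0.6 + \mathbb{P}[4 \mid 2/3 \text{ coin}] \cdot 0.4}$$
using the binomial formula.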
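Both the exact significance probability and the Bayesian update can be checked with the binomial formula. This is a small sketch; `binom_pmf` is a helper written here for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k heads in n tosses with heads probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Exact significance probability: 4 or 5 heads in 5 tosses of a fair coin.
p_value = binom_pmf(4, 5, 0.5) + binom_pmf(5, 5, 0.5)

# Bayes' rule with a 60% prior that the coin is fair.
prior_fair = 0.6
like_fair = binom_pmf(4, 5, 0.5)
like_biased = binom_pmf(4, 5, 2 / 3)
posterior_fair = (like_fair * prior_fair) / (
    like_fair * prior_fair + like_biased * (1 - prior_fair)
)
print(p_value, round(posterior_fair, 2))
```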
10. List all, and only, the true statements (10 pts.) D, F, I, J
A. As points cluster more tightly around a line, the correlation increases.
B. An ecological correlation occurs when there is a nonlinear relationship.
C. Galton first proposed euphonics.
D. Including irrelevant explanatory variables reduces predictive accuracy.
E. Correlation implies causation.
F. In regression, the errors are assumed to be independent and normal.
G. A frequentist can tell you the probability that the null hypothesis is correct.
H. A significance probability is the chance of observing data that are as or more
supportive of the null than the data obtained, when the null hypothesis is true.
I. A P-value is the same thing as a significance probability.
J. Descartes was an artillery officer.