Practice Exam (weeks 1 – 7) – Sample Solutions
Attempt all questions. You must support all answers with reasons – correct
answers with incorrect or missing reasons will receive NO CREDIT.
1. Scientists have long wondered what role, if any, fever plays in defending the body against
infection. In order to determine whether fever is a beneficial response to infection,
researchers assigned laboratory mice at random to two groups. Both groups were
infected with a fever-causing virus. Fevers in the first group were brought within normal
limits with carefully monitored doses of aspirin. The second group was given nothing.
The researchers found that the death rate for the first group was significantly higher than
that of the second group.
a. Is this study observational or experimental? experimental; the researchers assign
the treatments to individuals (rather than the individuals choosing the treatment)
b. What is the treatment? aspirin Is there a control group? If so, which group? yes,
mice not receiving aspirin
c. Is there likely to be a placebo effect in this study? Why or why not? no, there is
no placebo (and it seems unlikely that mice would believe they should get better if
receiving something)
d. Can the researchers conclude that the increased death rate in the first group was
due to the absence of fever? Give reasons for your answer. No, aspirin may have
effects other than reducing fever
2. These questions refer to the following output from R for tumor thickness (in mm):
[Figure: Histogram of tumor. x-axis: tumor (ticks at 0, 5, 10, 15); y-axis: Density (0.00 to 0.25).]
> library(boot)
> data(melanoma)
> tumor<-melanoma[,"thickness"]
> length(tumor)
[1] 205
> quantile(tumor)
   0%   25%   50%   75%  100%
 0.10  0.97  1.94  3.56 17.42
a. How many observations are there for the variable tumor? 205
b. What was the largest value of tumor? 17.42mm
c. Approximately how many tumors have thickness between 4 mm and 6 mm?
The density is about .05 over this interval, so the proportion is about
.05*(6-4) = 10% of the 205 tumors, or around 20
d. Which is larger, the mean or median tumor thickness? long right tail, so mean
bigger than median
e. Approximately what proportion of values are larger than 2 mm? The median is
very close to 2 mm, so about half are larger than 2 mm
f. Sketch a boxplot for tumor. (below, left)
[Figure: left, boxplot of tumor (axis ticks at 0, 5, 10, 15); right, Normal Q-Q Plot (x-axis: Theoretical Quantiles, -3 to 3; y-axis: Sample Quantiles, 0 to 15).]
g. Sketch what you think the shape of a QQ normal plot for tumor would look like
(don’t worry about the scales), and explain why it should look like this. (above,
right)
3. A machine produces pins (printing tips) for use in microarray experiments. If the
machine is correctly adjusted, the rate of unacceptable pins is 5%. If it is not adjusted
correctly, the rate of unacceptable pins is 50%. From past company records, the machine
is known to be correctly adjusted 90% of the time. A quality control inspector randomly
selects one pin from those recently produced and discovers that it is defective. What is
the probability that the machine is incorrectly adjusted?
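Let U = machine is incorrectly adjusted and D = the selected item is defective.
Use Bayes’ rule: P(U|D) = P(D|U)P(U)/[P(D|U)P(U) + P(D|not U)P(not U)]
= .5*.1/[.5*.1 + .05*.9] = .05/.095 = 10/19, or about .53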
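The Bayes' rule arithmetic above can be checked numerically. This is a minimal Python sketch, not part of the original solution; the helper name posterior and the variable names are illustrative assumptions.

```python
# Events: U = machine incorrectly adjusted, D = selected item is defective.
p_u = 0.10            # P(U): the machine is correctly adjusted 90% of the time
p_d_given_u = 0.50    # P(D | U): defect rate when incorrectly adjusted
p_d_given_ok = 0.05   # P(D | not U): defect rate when correctly adjusted


def posterior(prior, like_if_true, like_if_false):
    """Bayes' rule for a yes/no hypothesis given one observed event."""
    numerator = like_if_true * prior
    evidence = numerator + like_if_false * (1 - prior)
    return numerator / evidence


p_u_given_d = posterior(p_u, p_d_given_u, p_d_given_ok)
print(round(p_u_given_d, 4))  # 10/19, about 0.5263
```

Even though the machine is misadjusted only 10% of the time, a single defective item raises that probability to a little over one half.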
4. You have two scales for measuring weight in a lab. Both scales give answers that vary a
bit in repeated weighings of the same item. If the true weight of a compound is 2 grams,
the first scale produces readings Z that have mean 2.000 g and SD 0.002 g. The second
scale’s readings Y have mean 2.001 g and SD 0.001 g. Assume that the readings of Z
and Y are independent.
a. Which scale is biased? How much bias is there? second scale; bias = 2.001 –
2.000 = .001
b. Give the MSE (mean square error) for each scale. MSE = var + bias^2
For the first scale, MSE = .002^2 + 0^2 = .000004; for the second scale, MSE =
.001^2 + .001^2 = .000002
c. Which scale is less variable? The second scale has a lower SD (or variance), so
it is less variable
d. What are the mean and SD of the difference between the readings, X = Z – Y?
E(X) = E(Z – Y) = EZ – EY = 2.000 – 2.001 = –.001
SD(X) = sqrt(Var(Z) + Var(Y)) = sqrt(.002^2 + .001^2) = .0022
e. You measure once with each scale and average the readings. Your result is
W=(Z+Y)/2. What are the mean and SD of W? Is the average W more variable
or less variable than the reading Y of the less variable scale?
E(W) = E[(Z+Y)/2] = (EZ + EY)/2 = (2.000 + 2.001)/2 =2.0005
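SD(W) = sqrt( (Var(Z) + Var(Y))/4 ) = sqrt((.002^2 + .001^2)/4) = .0011
Since SD(W) = .0011 is larger than SD(Y) = .001, the average W is slightly more
variable than the reading Y.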
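The calculations in question 4 can be reproduced with a short Python sketch. It assumes, as the question states, that Z and Y are independent; the function name mse and the variable names are illustrative, not from the original.

```python
import math

true_weight = 2.000
mean_z, sd_z = 2.000, 0.002   # first scale
mean_y, sd_y = 2.001, 0.001   # second scale


def mse(mean, sd, target):
    """Mean square error = variance + bias^2."""
    bias = mean - target
    return sd ** 2 + bias ** 2


mse_z = mse(mean_z, sd_z, true_weight)   # about .000004
mse_y = mse(mean_y, sd_y, true_weight)   # about .000002

# Difference X = Z - Y: means subtract, variances add (independence).
mean_x = mean_z - mean_y
sd_x = math.sqrt(sd_z ** 2 + sd_y ** 2)

# Average W = (Z + Y)/2: the variance of the sum is divided by 2^2 = 4.
mean_w = (mean_z + mean_y) / 2
sd_w = math.sqrt((sd_z ** 2 + sd_y ** 2) / 4)

print(round(sd_x, 4), round(sd_w, 4))  # 0.0022 0.0011
print(sd_w > sd_y)                     # True: W is more variable than Y
```

Averaging in the noisier first scale does not help here: the second scale alone has a smaller SD than the average of the two.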
5. You have to take a statistics exam which consists of true/false questions. That is, each
question has 2 possible answers and you have to choose the correct one. You studied
very much for this test, but unfortunately you woke up with ‘statistics amnesia’ and don’t
remember anything at all, so you will need to guess the answer to every question.
Fortunately, you have in your pocket a fair coin, which you will use to help you answer
the questions. You plan to flip the coin once for each question, and answer ‘true’ if the
coin lands heads and ‘false’ if the coin lands tails. You need to answer at least 80% of the
questions correctly to pass the exam.
a. Suppose the exam has 10 questions. What is the distribution of your score on the
test? Be specific, and include the values of any parameters of the distribution.
Binomial (n = 10, p = .5), assuming two possible outcomes (H or T) on each flip,
that each coin flip is independent, flip a fixed number of times (10 here), same
probability of heads on each flip
b. What is your expected score? What is the SD of your score? Let X = number
correct out of 10. EX = np = 10*.5 = 5; SD(X) = sqrt(np(1 – p)) = sqrt(10*.5*.5)
= 1.58
c. What is your chance of passing? P(X ≥ 8) = C(10,8)(.5^10) + C(10,9)(.5^10) + C(10,10)(.5^10)
= (45 + 10 + 1)(.5^10) = 56/1024 = .055
d. Suppose the test has 100 questions. What is the (exact) distribution of your score
on the test? Again, be specific.
Binomial (n = 100, p = .5)
e. What is your expected score? What is the SD of your score? EX = np = 100*.5
= 50; SD(X) = sqrt(np(1 – p)) = sqrt(100*.5*.5) = 5
f. What is your chance of passing?
P(X ≥ 80) ≈ P(Z ≥ (80 – 50)/5) = P(Z ≥ 6) ≈ 0
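The passing probabilities in question 5 can be checked with a small Python sketch. The helpers binom_tail and normal_upper_tail are illustrative names, not from the original; the normal tail is computed from the standard library's complementary error function, with no continuity correction, to match the solution above.

```python
import math


def binom_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p), by summing the pmf."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(k, n + 1))


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# 10 questions: need at least 8 correct.
p10 = binom_tail(10, 0.5, 8)
print(round(p10, 3))  # 0.055

# 100 questions: need at least 80 correct.
p100_exact = binom_tail(100, 0.5, 80)
z = (80 - 50) / 5                     # 6 SDs above the mean
p100_normal = normal_upper_tail(z)
print(p100_exact < 1e-8, p100_normal < 1e-8)  # True True
```

Both the exact binomial tail and the normal approximation for 100 questions are effectively zero, so guessing is a far worse strategy on the longer exam.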
6. The number of bacteria colonies of a certain type in samples of polluted water has a
Poisson distribution with a mean of 1 per cubic cm (cc). What is the chance that a 1 cc
sample will contain two or more bacteria colonies? P(X ≥ 2) = 1 – P(X = 0) – P(X = 1)
= 1 – e^(-1) – e^(-1) = .26
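A minimal Python check of the Poisson calculation, using the pmf P(X = k) = e^(-λ) λ^k / k! with λ = 1. The function name poisson_pmf is an illustrative choice.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = 1.0  # mean number of colonies in a 1 cc sample
p_two_or_more = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)
print(round(p_two_or_more, 2))  # 0.26
```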
7. Suppose that whether or not it rains tomorrow depends on previous weather conditions
only through whether or not it is raining today. Suppose also that, if it is raining today,
then it will rain tomorrow with probability 0.6, and, if it is not raining today, then it will
rain tomorrow with probability 0.3. Say the system is in state 0 when it rains and state
1 when it does not.
a. Why is the sequence of 0s and 1s (for whether it is raining or not over a number
of days) a Markov chain? Because in the problem it says that the weather
tomorrow depends only on the weather today, and not on the sequence of rain/no
rain leading up to today
b. Write down the transition matrix.
(rows: state today; columns: state tomorrow)
State          0 (rain)    1 (no rain)
0              0.6         0.4
1              0.3         0.7
c. Find the stationary distribution.
need to solve the equations: .6*π0 + .3*π1 = π0, .4*π0 + .7*π1 = π1, together with
π0 + π1 = 1; get π0 = 3/7, π1 = 4/7
d. Write down an expression for the probability that it does not rain Monday –
Friday, then rains on the weekend (i.e. the sequence 1, 1, 1, 1, 1, 0, 0).
Taking Monday's state to follow the stationary distribution:
(4/7)*.7*.7*.7*.7*.3*.6
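The stationary distribution and the sequence probability in question 7 can be verified with exact fractions. This is a sketch under the assumption that Monday's state is drawn from the stationary distribution, which is what the expression in part d uses; the function name path_probability is illustrative.

```python
from fractions import Fraction

# Transition matrix: P[i][j] = P(tomorrow is state j | today is state i),
# with state 0 = rain and state 1 = no rain.
P = [[Fraction(6, 10), Fraction(4, 10)],
     [Fraction(3, 10), Fraction(7, 10)]]

# For a two-state chain, the balance equation pi0*P[0][1] = pi1*P[1][0]
# with pi0 + pi1 = 1 gives pi0 = P[1][0] / (P[0][1] + P[1][0]).
pi0 = P[1][0] / (P[0][1] + P[1][0])
pi = [pi0, 1 - pi0]

# Check that pi is stationary: pi P = pi.
for j in range(2):
    assert sum(pi[i] * P[i][j] for i in range(2)) == pi[j]


def path_probability(states, start_dist, matrix):
    """Probability of a specific state sequence, given the start distribution."""
    prob = start_dist[states[0]]
    for today, tomorrow in zip(states, states[1:]):
        prob *= matrix[today][tomorrow]
    return prob


seq = [1, 1, 1, 1, 1, 0, 0]   # dry Monday to Friday, rain on the weekend
p_seq = path_probability(seq, pi, P)
print(pi[0], pi[1])           # 3/7 4/7
print(round(float(p_seq), 4)) # 0.0247
```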
8. In an experiment with rats, a behavioral scientist used an auditory signal to indicate that
food was available through an open door in the cage. The scientist counted the number
of trials needed by each of 25 rats to learn to recognize the signal. The mean number of
trials was 15, and the SD was 2.5.
a. What is the population parameter of interest? mean number of trials
b. What assumptions do you need to make to be able to create a CI (confidence
interval) for the population parameter? We need the value for the population
parameter to be unknown, a random sample from the population, and the sample
size to be large enough that the CLT holds
c. Suppose that the assumptions hold. Give an approximate 95% CI for the
population parameter. 15 +/- 2*2.5/sqrt(25), or 15 +/- 1
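The interval in part c can be computed as follows. This Python sketch uses the multiplier 2 from the solution above as a rough stand-in for the 95% normal quantile; the variable names are illustrative.

```python
import math

n = 25        # number of rats
xbar = 15.0   # sample mean number of trials
s = 2.5       # sample SD

se = s / math.sqrt(n)       # standard error of the mean
margin = 2 * se             # approximate 95% margin of error
ci = (xbar - margin, xbar + margin)
print(se, ci)  # 0.5 (14.0, 16.0)
```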