PracticeExam1WithAnswers

Statistics 515 – Statistical Methods I
Practice Test for Exam 1
E. A. Pena’s Class
(WITH Answers)
______________________________________________________________________
Part I (24 points). For questions 1-8 please refer to the following information:
Petroleum pollution in seas and oceans stimulates the growth of some types of bacteria.
A count of petroleumlytic microorganisms (bacteria per 100 milliliters) in n = 10 portions
of seawater gave the following readings.
Raw Data: 49, 70, 54, 67, 59, 40, 61, 69, 71, 52
The associated ordered/arranged values are given below.
Ordered Values: 40, 49, 52, 54, 59, 61, 67, 69, 70, 71
Furthermore, for this data set,
ΣXi = Sum of the Observations = 592
Σ(Xi)^2 = Sum of the Squared Observations = 36014
1. Construct a stem-and-leaf or dot plot for this data set.
Answer:
4 | 0, 9
5 | 2, 4, 9
6 | 1, 7, 9
7 | 0, 1
2. Compute the sample mean.
Answer: Sample Mean = (592)/10 = 59.2
3. Determine the sample median.
Answer: Sample Median: (59 + 61)/2 = 60
4. Compute the sample variance. [You may use the information given above!]
Answer: Sample Variance = [36014 - (592)^2/10]/(10-1) = 107.51
5. Compute the sample standard deviation.
Answer: Sample Standard Deviation = 10.37
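The arithmetic in answers 2-5 can be checked with a short script. This is only a sketch using the raw data from Part I; the variable names are illustrative, and the variance uses the shortcut formula from the formula sheet.

```python
import math

# Raw data from Part I (bacteria per 100 milliliters)
data = [49, 70, 54, 67, 59, 40, 61, 69, 71, 52]
n = len(data)

total = sum(data)                    # sum of the observations
total_sq = sum(x * x for x in data)  # sum of the squared observations

mean = total / n

# n is even here, so the median is the average of the two middle values
ordered = sorted(data)
median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Shortcut formula: S^2 = [sum(Xi^2) - (sum Xi)^2 / n] / (n - 1)
variance = (total_sq - total ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

print(f"sum = {total}, sum of squares = {total_sq}")
print(f"mean = {mean:.2f}, median = {median:.1f}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```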
6. Determine the first quartile.
Answer: First Quartile = Q1 = 52
7. Determine the third quartile.
Answer: Third Quartile = Q3 = 69
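The quartiles above agree with the convention in which Q1 is the median of the lower half of the ordered data and Q3 is the median of the upper half. That convention is an assumption here (other quartile rules, including some software defaults, can give slightly different values); a minimal sketch of it:

```python
def median(values):
    """Median of a list that is already sorted in increasing order."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2

ordered = [40, 49, 52, 54, 59, 61, 67, 69, 70, 71]
n = len(ordered)

# Split into halves; for odd n the middle value is left out of both halves
lower_half = ordered[: n // 2]
upper_half = ordered[(n + 1) // 2 :]

q1 = median(lower_half)
q3 = median(upper_half)

print(f"Q1 = {q1}, Q3 = {q3}")
```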
8. Draw the boxplot.
Answer: [Figure: boxplot of the data (variable C1); vertical axis running from 40 to 70.]
________________________________________________________________________
Part II (18 points). For questions 9-14 please refer to the following information:
Americium 241 (241Am) is a radioactive material used in the manufacture of smoke
detectors. The article "Retention and Dosimetry of Injected 241Am in Beagles" [a beagle
is a small short-legged smooth-coated hound] published in Radiation Research (1984),
pp. 564-575, described a study in which 55 beagles were injected with a dose of 241Am
(proportional to the animals' weights). Skeletal retention of 241Am (Ci/kg) was recorded
for each of the 55 beagles. The following summary information pertains to these 55
observations.
[Figure: Frequency Histogram for the Amount of Americium Retained in 55 Beagles. Horizontal axis: Amount of Americium Retained, with class boundaries at 0.175, 0.225, ..., 0.625. Vertical axis: Frequency, from 0 to 15.]
Numerical Summary Measures

Type of Summary Measure       Value of Americium Retained
n (# of Observations)         55
Sample Mean                   0.3489
Sample Median                 0.3370
Sample Standard Deviation     0.0800
Minimum                       0.1860
First Quartile (Q1)           0.3030
Third Quartile (Q3)           0.4080
Maximum                       0.5850
[Figure: Boxplot for the Amount of Americium Retained by the 55 Beagles. Vertical axis: Americium Retained, from 0.2 to 0.6.]
9. Describe the shape of the distribution for the Amount of Americium Retained by
these 55 beagles. Provide explanations and/or reasons for your answer.
Answer: The shape of the distribution is somewhat right-skewed. Notice that the
sample mean is a little bit larger than the sample median, which is what we expect
for right-skewed distributions. You will also notice the right skewness from the
boxplot, with the distance from the median to the third quartile being larger than
the distance from the first quartile to the median.
10. Based on the information provided, are there any outliers in the data? If so, what is
the approximate value of the outlier(s)?
Answer: The boxplot indicates one outlier, whose value is .58, the largest
observation.
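The outlier can also be checked numerically from the summary table. The sketch below assumes the usual boxplot convention that flags points more than 1.5 interquartile ranges beyond the quartiles; that rule is not stated in the problem, so treat it as an assumption.

```python
# Summary values from the Numerical Summary Measures table
q1, q3 = 0.3030, 0.4080
minimum, maximum = 0.1860, 0.5850

iqr = q3 - q1

# Assumed convention: fences at 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR = {iqr:.4f}")
print(f"fences = [{lower_fence:.4f}, {upper_fence:.4f}]")
print("maximum is flagged as an outlier:", maximum > upper_fence)
print("minimum is flagged as an outlier:", minimum < lower_fence)
```

Under this rule the largest observation falls above the upper fence while the smallest stays inside the lower fence, which matches the single outlier shown on the boxplot.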
11. Approximately what percentage of the 55 observations are between 0.3030 (the first
quartile) and 0.4080 (the third quartile)?
Answer: By definitions of the first and third quartiles, there will be approximately
50% of all observations between .3030 and .4080.
12. Provide a plausible explanation why the sample mean is larger than the sample
median.
Answer: Two possible explanations are a) the distribution is right-skewed; and b)
the mean is affected by the outlier on the right tail.
13. Based on the histogram, approximately how many observations exceed the value of
0.375?
Answer: 9 + 6 + 2 + 0 + 1 = 18 observations.
14. The interval around the sample mean whose limits are two sample standard
deviations away from the sample mean is [.3489 - 2(.08), .3489 + 2(.08)] = [.1889,
.5089]. What could you say about the percentage of observations that will fall in this
interval? Provide a reason for your answer.
Answer: Since the histogram is not exactly mound-shaped, we could use
Chebyshev's Inequality to conclude that there will be at least 75% of all
observations in the specified interval. As the distribution is not too far away from
being mound-shaped, using the Empirical Rule, we could conclude that the
percentage in this interval should be close to 95%.
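The two bounds in this answer can be put side by side in a few lines. This is a sketch only; the function name is illustrative, and the 95% figure is the Empirical Rule approximation for mound-shaped data, not something computed from the 55 observations.

```python
def chebyshev_lower_bound(k):
    """Chebyshev's Inequality: at least 1 - 1/k^2 of the data lie within k SDs of the mean."""
    return 1 - 1 / k ** 2

mean, sd, k = 0.3489, 0.08, 2

low = mean - k * sd
high = mean + k * sd

print(f"interval = [{low:.4f}, {high:.4f}]")
print(f"Chebyshev: at least {chebyshev_lower_bound(k):.0%} of observations")
print("Empirical Rule (mound-shaped data): about 95% of observations")
```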
________________________________________________________________________
Part III (21 points). For questions 15-21 please refer to the following information.
In a three-year study of cocaine addiction by D. M. Barnes as reported in the article
"Breaking the cycle of addiction" which appeared in Science, 241(1988), pp. 1029-1030,
72 chronic cocaine users were either given the antidepressant desipramine, lithium (the
standard drug to treat cocaine addiction), or a placebo. The 72 subjects were randomly
divided into three equal groups. The purpose of the study was to determine whether
giving a cocaine addict an antidepressant will help in breaking the addiction. The
following table presents the result of the study.
Cocaine Relapse?    Desipramine    Lithium    Placebo    Total
Yes                 10             18         20         48
No                  14             6          4          24
15. Compare the relapse rate for the three groups. Which among desipramine, lithium, or
placebo is most effective in lowering the relapse rate among cocaine addicts?
Answer: Rates: Desipramine = 10/24 = .4167; Lithium = 18/24 = .75; Placebo = 20/24
= .8333. Therefore, the Desipramine group has the lowest relapse rate, so desipramine
appears to be the most effective of the three.
16. Consider the experiment of choosing at random one of the subjects in the above study
and then determining the treatment given (which is either desipramine, lithium, or
placebo) and observing whether the subject has a relapse. The sample space of this
experiment is:
S = {(Desipramine, Yes), (Desipramine, No), (Lithium, Yes), (Lithium, No),
(Placebo, Yes), (Placebo, No)}
What would be the appropriate probabilities to assign to the six outcomes in this
sample space? Note that these probabilities should be based on the number of
individuals in the different cells of the table and the overall total.
Answer:
P((D,Y)) = 10/72 = .1389; P((D,N)) = 14/72 = .1944; P((L,Y)) = 18/72 = .2500; P((L,N))
= 6/72 = .0833; P((P,Y)) = 20/72 = .2778; P((P,N)) = 4/72 = .0556.
17. Define event A to be the event that "Desipramine" was assigned, and B be the event
that the subject had a relapse. What are P(A) and P(B)?
Answer:
P(A) = (10 + 14)/72 = .3333
P(B) = 48/72 = .6667
18. Find P(A or B), that is, the probability that either A or B occurs.
Answer:
P(A or B) = (48+14)/72 = .8611.
19. Find P(B|A), the conditional probability of B given A.
Answer:
P(B|A) = P(A and B)/P(A) = (10/72)/(24/72) = 10/24 = .4167.
20. Are events A and B independent? Provide a reason for your answer.
Answer: Since P(B) does not equal P(B|A), then A and B are dependent.
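The probabilities in questions 16-20 all follow from the cell counts, so they can be reproduced with exact fractions. This is a sketch; the dictionary layout and variable names are illustrative choices.

```python
from fractions import Fraction

# Cell counts from the relapse table: (treatment, relapse?) -> number of subjects
counts = {
    ("Desipramine", "Yes"): 10, ("Desipramine", "No"): 14,
    ("Lithium", "Yes"): 18, ("Lithium", "No"): 6,
    ("Placebo", "Yes"): 20, ("Placebo", "No"): 4,
}
total = sum(counts.values())

# Probability of each outcome = cell count / overall total
prob = {outcome: Fraction(c, total) for outcome, c in counts.items()}

# A = Desipramine was assigned, B = subject had a relapse
p_a = sum(p for (treatment, _), p in prob.items() if treatment == "Desipramine")
p_b = sum(p for (_, relapse), p in prob.items() if relapse == "Yes")
p_a_and_b = prob[("Desipramine", "Yes")]

p_a_or_b = p_a + p_b - p_a_and_b  # addition rule
p_b_given_a = p_a_and_b / p_a     # conditional probability
independent = p_a_and_b == p_a * p_b

print(f"P(A) = {float(p_a):.4f}, P(B) = {float(p_b):.4f}")
print(f"P(A or B) = {float(p_a_or_b):.4f}")
print(f"P(B|A) = {float(p_b_given_a):.4f}")
print("A and B independent:", independent)
```

Using `Fraction` keeps the independence check exact, so the comparison of P(A and B) with P(A)P(B) is not affected by rounding.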
21. Find the probabilities
a) P(Desipramine was assigned | B); and
b) P(Lithium was assigned | B).
Based on these probabilities, if you are given the information that the subject had a
relapse, is it more likely that the subject was assigned desipramine or lithium?
Answer:
a) P(Desipramine | B) = P(A and B)/P(B) = (10/72)/(48/72) = 10/48 = .2083
b) P(Lithium | B) = (18/72)/(48/72) = 18/48 = .3750
Thus, given a relapse, it is more likely that the subject was assigned Lithium.
________________________________________________________________________
Part IV (12 points). For questions 22-24 please refer to the following information.
ELISA tests are used to screen donated blood for the presence of the AIDS virus. The test
actually detects antibodies, substances that the body produces when the virus is present.
If the antibodies are present, ELISA is positive with probability of .997 and negative with
probability of .003. If the blood being tested is not contaminated with AIDS antibodies,
ELISA gives a positive result with probability of .015 and a negative result with
probability of .985. Assume that 1% of a large population carries the AIDS antibody in
their blood. Suppose that one individual is randomly chosen from this population.
22. Draw a tree diagram which depicts the outcomes of this two-step experiment, with
step 1 being the process of choosing the person (outcomes: the person does or does
not carry the antibody) and step 2 being the process of performing the ELISA test on
the person’s blood (outcomes: positive or negative).
Answer: The tree cannot be drawn here, so its four branches are listed with their probabilities:
(Antibody, Positive): Prob = (.01)(.997) = .00997
(Antibody, Negative): Prob = (.01)(.003) = .00003
(No Antibody, Positive): Prob = (.99)(.015) = .01485
(No Antibody, Negative): Prob = (.99)(.985) = .97515
23. What is the probability that the ELISA test for the AIDS virus will show a positive
result?
Answer: P(Positive) = P(Antibody, Positive) + P(No Antibody, Positive) = .00997 +
.01485 = .02482.
24. Given that the ELISA test is positive, what is the probability that the chosen person
has the AIDS antibody?
Answer: P(Antibody | Positive) = P(Antibody, Positive)/P(Positive) = .00997/.02482 =
.40169.
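Questions 22-24 amount to the total probability rule followed by Bayes' rule, both of which are on the formula sheet. A minimal sketch with the numbers from the problem statement (variable names are illustrative):

```python
# Values given in the problem statement
p_antibody = 0.01             # P(person carries the antibody)
p_pos_given_antibody = 0.997  # P(positive | antibody present)
p_pos_given_none = 0.015      # P(positive | antibody absent)

# Joint probabilities along the two "positive" branches of the tree
p_antibody_and_pos = p_antibody * p_pos_given_antibody
p_none_and_pos = (1 - p_antibody) * p_pos_given_none

# Total probability rule: P(B) = P(A)P(B|A) + P(A^c)P(B|A^c)
p_positive = p_antibody_and_pos + p_none_and_pos

# Bayes' rule: P(A|B) = P(A)P(B|A) / P(B)
p_antibody_given_pos = p_antibody_and_pos / p_positive

print(f"P(Positive) = {p_positive:.5f}")
print(f"P(Antibody | Positive) = {p_antibody_given_pos:.5f}")
```

Even with a very accurate test, most positives come from the much larger antibody-free group, which is why the conditional probability is well below one half.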
________________________________________________________________________
Part V (16 points). For questions 25-28 please refer to the following information.
Let X be the random variable denoting the number of revisions (including the original
version) before a manuscript is accepted for publication in a scientific journal. Suppose
that the probability function of X is given by:
x = number of revisions          1       2       3       4       5
p(x) = P{x revisions needed}     0.10    0.30    0.35    0.15    0.10
25. Find P{2 ≤ X ≤ 4} = probability that the manuscript will take between 2 and 4,
inclusive, revisions before getting accepted for publication.
Answer: Prob = .30 + .35 + .15 = .80
26. Determine the mean of X.
Answer: Mean = μ = (1)(.10) + (2)(.30) + (3)(.35) + (4)(.15) + (5)(.10) = 2.85
27. Determine the standard deviation of X.
Answer:
Variance = σ^2 = (1 - 2.85)^2(.10) + (2 - 2.85)^2(.30) + (3 - 2.85)^2(.35) + (4 - 2.85)^2(.15) +
(5 - 2.85)^2(.10) = 1.2275
Standard Deviation = σ = Square Root (1.2275) = 1.1079
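The mean and standard deviation of X can be verified directly from the probability function. The sketch below also evaluates the shortcut form of the variance from the formula sheet as a cross-check; the names are illustrative.

```python
import math

# Probability function of X = number of revisions
pmf = {1: 0.10, 2: 0.30, 3: 0.35, 4: 0.15, 5: 0.10}

# The probabilities must sum to 1 (allowing for floating-point rounding)
assert abs(sum(pmf.values()) - 1.0) < 1e-9

mean = sum(x * p for x, p in pmf.items())

# Definition: sigma^2 = sum of (x - mu)^2 p(x)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Shortcut: sigma^2 = sum of x^2 p(x), minus mu^2
variance_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2

std_dev = math.sqrt(variance)

print(f"mean = {mean:.2f}")
print(f"variance = {variance:.4f} (shortcut: {variance_shortcut:.4f})")
print(f"standard deviation = {std_dev:.4f}")
```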
28. Suppose that we define the variable Y = 2X + 5. By simply using the mean and
standard deviation of X, what will be the mean and standard deviation of Y?
Answer: (Did not actually teach this in our class, so this type will not be included in
exam)
Mean of Y = 2(2.85) + 5 = 10.7
Variance of Y = (2)^2(1.2275) = 4.91
Standard Deviation of Y = (2)(1.1079) = 2.2158
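One way to see why these rules work is to build the distribution of Y = 2X + 5 explicitly and compare it with the values the rules predict. This is a sketch for checking only, with illustrative names; the last decimal of the standard deviation can differ slightly from the hand calculation because the hand calculation starts from a rounded value of 1.1079.

```python
import math

pmf_x = {1: 0.10, 2: 0.30, 3: 0.35, 4: 0.15, 5: 0.10}
a, b = 2, 5  # Y = aX + b

def mean_and_sd(pmf):
    """Mean and standard deviation of a discrete distribution given as {value: probability}."""
    mu = sum(v * p for v, p in pmf.items())
    var = sum((v - mu) ** 2 * p for v, p in pmf.items())
    return mu, math.sqrt(var)

# Each value x maps to y = a*x + b with the same probability
pmf_y = {a * x + b: p for x, p in pmf_x.items()}

mean_x, sd_x = mean_and_sd(pmf_x)
mean_y, sd_y = mean_and_sd(pmf_y)

print(f"direct:   mean of Y = {mean_y:.2f}, SD of Y = {sd_y:.4f}")
print(f"by rules: mean of Y = {a * mean_x + b:.2f}, SD of Y = {abs(a) * sd_x:.4f}")
```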
________________________________________________________________________
Part VI (12 points). For questions 29-32 please refer to the following information.
A psychiatrist believes that 80% of all people [a very large population] who visit doctors
have problems of a psychosomatic nature. She decides to select 25 patients at random to
test her theory. Let X denote the number of patients out of the 25 who have problems of
a psychosomatic nature, so that X has a binomial distribution. Assume that the
psychiatrist's theory is correct.
29. What is the mean of X?
Answer: Mean = (25)(.80) = 20
30. What is the standard deviation of X?
Answer:
Variance = (25)(.80)(.20) = 4
Standard Deviation = 2
31. Find the probability that X = 20. [You may just write this in formula form.]
Answer: P(X=20) = 25C20 (.8)^20 (.2)^(25-20) = .1960
32. By using a table of binomial probabilities or a calculator, we find that P{X ≤ 14} =
.0056 when the psychiatrist’s theory is correct. Suppose that when the sample of 25
patients was actually taken, only 14 had problems of a psychosomatic nature. What
conclusions could you make about the psychiatrist's theory?
Answer: P(X ≤ 14) = .0056, or about .006. If only 14 are obtained, then either we were
extremely unlucky or the theory is wrong. Since we believe that we are not that
unlucky, we would conclude that the theory is the wrong one!
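The binomial quantities in questions 29-32 can be reproduced without a table. The sketch below uses only the standard library (`math.comb` needs Python 3.8 or later); the helper name `binom_pmf` is illustrative, and the cumulative probability is taken as P(X ≤ 14), i.e. 14 or fewer patients.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 25, 0.80

mean = n * p
std_dev = sqrt(n * p * (1 - p))

p_eq_20 = binom_pmf(20, n, p)

# Cumulative probability of 14 or fewer: sum the pmf over k = 0, 1, ..., 14
p_le_14 = sum(binom_pmf(k, n, p) for k in range(15))

print(f"mean = {mean:.1f}, standard deviation = {std_dev:.1f}")
print(f"P(X = 20) = {p_eq_20:.4f}")
print(f"P(X <= 14) = {p_le_14:.4f}")
```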
Some Formulas That May Be Useful
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} X_i^2 - \frac{\left( \sum_{i=1}^{n} X_i \right)^2}{n} \right]$
M = value that divides arranged data into two equal parts
Q1 = Divides arranged data into 25:75 split
Q3 = Divides arranged data into 75:25 split
P(A or B) = P(A) + P(B) - P(A and B)
P(B|A) = P(A and B)/P(A)
P(B) = P(A)P(B|A) + P(A^c)P(B|A^c)
P(A|B) = P(A)P(B|A)/P(B)
P(A and B) = P(A)P(B) if A and B are independent
$\mu = \sum x \, p(x)$
$\sigma^2 = \sum (x - \mu)^2 p(x) = \sum x^2 p(x) - \mu^2$
$\sigma = \sqrt{\sigma^2}$
$p(x) = \binom{n}{x} p^x (1 - p)^{n - x}$
$\mu = np; \quad \sigma^2 = np(1 - p)$
n! = (n)(n-1)(n-2)...(2)(1) with 0! = 1
${}^{n}C_r = \binom{n}{r} = \frac{n!}{r!\,(n - r)!}$