Statistics 108

Fall 2002, (Prof. Rizzardi)
Name:___________
+ If the answer is a single number or interval, circle your final answer.
+ Show all work for full credit. Neatness counts.
+ If you are not certain of an answer, describe your logic for partial credit.
+ For this final, you are allowed to use:
  - tan sheet (or new double-sided copy) of the normal distribution table
  - calculator
  - 1 page of notes (extra notes may be made on the normal distribution table if more space is needed)
  - If you create your own new copy of the tan sheet and it is not double sided, restrict your 1 page of notes to the back side of the two normal distribution table sheets.
(Problem 1) Let $x_1 = 3$, $x_2 = 7$, $x_3 = 2$.
(1a) Calculate $\bar{x}$.
(1b) Calculate $\sum_{i=1}^{3} (x_i - \bar{x})^2$.
(1c) Calculate the sample standard deviation of the data.
(Problem 2) Fill in both blanks. The smallest value a probability can be is ______ and
the largest value is ________, and any probability value stated outside of this range is a
mistake.
(Problem 3) Suppose you were to calculate a 1-standard deviation window by calculating
$\bar{x} \pm s$. Assuming the dot plot of the data is approximately “bell-shaped”, you would
expect there to be roughly _______% of the data within the 1-standard deviation
window.
(Problem 4) A study (fictional!) at a local hospital was performed to see if breast
feeding of infants was associated with lower allergy rates later in the child’s life. The
mothers of young children suffering from severe non-food allergies were asked whether
or not the child had been breast fed regularly for at least 2 months after birth. Mothers of
children who were in the hospital for non-disease injuries (e.g. broken arm) were also
asked the same question. Data were later analyzed to compare whether breast feeding
was more or less prevalent among the allergy children. Was this an experimental or
observational study? Explain why.
(Problem 5) Suppose a fair 6-sided die is rolled once. Each event below occurs if the die
shows one of the values in its set.
A={1,2,3}
B={2,4,6}
C={2,3,5}
D={3,4,5,6}
(5a) Calculate $P(D)$.
(5b) Calculate $P(D^c)$; i.e., the probability of the complement of event D.
(5c) Calculate P ( A  C )
(5d) Calculate $P(A \mid B)$.
(Problem 6) Suppose the following probabilities among an animal population:
P( diseased | male ) = 0.2
P( diseased | female ) = 0.1
P( male ) = 0.30
P( female ) = 0.70
(6a) Are disease and sex independent? Explain.
(6b) Calculate P( diseased )
(Problem 7) The leaf lengths of 582 trillium plants were collected by HSU students. The
boxplot of the lengths is shown below. Units are in centimeters.
[Figure: boxplot of leaf length in centimeters; vertical axis labeled "leaf" with ticks at 5, 10, 15, and 20.]
(7a) Approximately what percent of the leaf lengths are less than 15cm in length?
(7b) Give a rough calculation of the interquartile range. Show your work.
(Problem 8) Suppose the probability of a newborn calf being male is 0.3; i.e.,
P(male)=0.3. If six calves were born, and their sexes are independent, what is the
probability of exactly two males and four females; i.e., calculate P( number of males=2).
(Problem 9) A study was carried out where the weight (pounds) and cholesterol levels
(mg/100 ml) were compared. Of interest was whether cholesterol is associated with
weight. A simple linear regression analysis was performed by a statistician. Below is
some of the Minitab output.
The regression equation is
cholesterol = - 128 + 2.03 wt

Predictor     Coef   SE Coef      T      P
Constant   -127.57     78.90  -1.62  0.130
wt          2.0320    0.4447   4.57  0.001

[Figure: Regression Plot. cholesterol = -127.567 + 2.03199 wt; S = 36.8697, R-Sq = 61.6%, R-Sq(adj) = 58.7%. Vertical axis: cholesterol (ticks at 100, 200, 300); horizontal axis: wt (ticks from 140 to 220).]
(9a) If a randomly sampled man weighed 180 pounds, using the regression
analysis, what would you expect his cholesterol to be?
(9b) For each pound increase in weight, you would expect cholesterol to
(a) Decrease about 128 mg/100ml
(b) Increase about 128 mg/100ml
(c) Increase about 2.0 mg/100ml
(d) Increase about 0.4 mg/100ml
(e) Increase about 4.6 mg/100ml
(9c) The correlation coefficient between weight and cholesterol is about:
(a) –2.0 (b) –0.75 (c) –0.06 (d) 0 (e) +0.06 (f) +0.75 (g) +2.0
(Problem 10) As part of the National Health and Nutrition Examination Survey, iron
levels were checked for a sample of 786 girls aged 12 to 15. Iron deficiency was detected
in 71 of those sampled. Calculate a 95% confidence interval for the proportion of girls
in the general population aged 12 to 15 who are iron deficient. You may use the
conservative method if you wish.
(Problem 11) The contingency table below is chi-square test output from Minitab with
some parts deleted. It uses the student data and compares hair color against gender.
Rows: sex   Columns: hair

          black   blond   brown   lightbro    red     All
female        3      15       9          4      3      34
           2.96   15.28     WWW       VVVV   1.97   UUUUU
male          3      16     XXX          4      1      35
           3.04   15.72     YYY       TTTT   2.03   35.00
All           6      31      20          8      4      69
           6.00   31.00   20.00       8.00   4.00   69.00

Chi-Square = 1.218, DF = 4, P-Value = 0.875
6 cells with expected counts less than 5.0

Cell Contents -- Count
                 Exp Freq
(11a) For brown-hair males, (observed) XXX= __________.
(11b) For brown-hair males, (expected) YYY=___________.
(11c) Which conclusion is most appropriate?
(a) There is statistically significant evidence that the mean hair color of males is
equal to females (P=0.875).
(b) There is not statistically significant evidence that the mean hair color of males
is equal to females (P=0.875).
(c) There is statistically significant evidence that gender and hair color are
dependent (P=0.875).
(d) There is not statistically significant evidence that gender and hair color are
dependent (P=0.875).
(Problem 12) Suppose the random variable X is distributed according to the standard
normal distribution (mean=0, sd=1). Calculate $P(X < 2)$.
(Problem 13) Suppose the random variable X is normally distributed with a mean of 50
and standard deviation of 5. If 16 X’s were sampled and their mean, $\bar{X}$, calculated, find
$P(\bar{X} < 52.5)$.
(Problem 14) Fill in both blanks. Suppose the random variable X has any distribution
(not necessarily normal!) with mean $\mu$ and standard deviation $\sigma$. Then, as the size of
the sample (n) gets large, the distribution of __________ (hint: a symbol) will become
“approximately _________________” with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$.
(Problem 15) Suppose 36 island foxes were captured and weighed. The mean weight
was 10 pounds with a sample standard deviation of 3 pounds.
(15a) Calculate a 95% confidence interval for the mean weight of island foxes. (You
may use 2 as the multiplier.)
(15b) What is meant by “95% confidence interval”?
(15c) True or False: The width of the confidence interval will typically get wider
with a larger sample size. If false, explain why the interval would get
narrower.
(Problem 16) A P-value is a probability. Specifically, what is the probability
describing?
(Problem 17) A motor company claims that the probability of one of their cars needing a
repair during the first 80,000 miles is at most 0.3. A survey of 600 car owners resulted in
34% of the 600 cars needing a repair during the first 80,000 miles. Using $\alpha = 0.05$, test
the hypothesis that the true probability of repair is greater than 0.3.
(17a) State the null and alternative hypotheses.
(17b) Calculate the appropriate test statistic.
(17c) Calculate a p-value. Sketch a graph shading in the area beneath the density
curve which equals the p-value.
(17d) Should you keep or reject your null hypothesis? Explain how you reached
this conclusion.
(17e) Is it possible that you made a Type I error? Explain.