Statistics 108 Final Exam, Fall 2003 Name: _______________

You are allowed one 8x11.5 inch sheet of paper with notes on both sides.
You are allowed a calculator.
Show all work to receive partial credit.
(1) In early April 1998, Humboldt State students recorded data from 582 trillium plants in the forest behind
campus. One variable measured was whether the plant had a white flower (=w), pink flower (=p), or a flower
that had gone to seed (=s). Students also measured the length of stem between the flower and the leaves. The
pie chart and boxplots for the data are shown below.
(1a) Is flower type a categorical or quantitative variable?
(1b) Calculate how many degrees of the circle the white segment occupies.
(1c) For the pink flowers, draw an arrow into the boxplot showing its median. Label this line “c”.
(1d) For the pink flowers, draw an arrow into the boxplot showing its 75th percentile. Label this line “d”.
(1e) What was the flower type (pink, seeded, or white) of the plant with the longest stem?
(2) Joey was told to shade in the region or regions of a Venn diagram which satisfy A  B c . (Notice the
complement!) Assume the events A and B are not disjoint; i.e., events A and B have an intersection. Joey
correctly drew the Venn diagram, but could not figure out which region(s) to shade. Shade in the correct
region(s) for Joey.
(3) Suppose the probability of the British stock market finishing the year higher than it ended is 0.60.
Similarly the probability of the French stock market finishing the year higher than it ended is also 0.60. Why
is it not reasonable that the probability of the French and British markets both ending up higher is
0.60  0.60  0.36 ?
(4)
x           0      1      2      3      4
P(X=x)      0.24   0.41   0.26   ???    0.01
P(X ≤ x)    ??     ??     ??     ??     ??
The above distribution table gives the probabilities for the possible values of the random variable X.
(4a) Find the missing value, ???, for P(X=3).
(4b) Fill in the missing values for the cumulative distribution function (??).
(4c) Is X a discrete or continuous random variable?
(4d) Calculate the expected value of X.
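One way to check work on question (4) is the short sketch below. It relies only on the facts that the probabilities must sum to 1, that the cumulative distribution is a running total, and that the expected value is the sum of x times P(X=x). The variable names are illustrative and are not part of the exam.

```python
# Probabilities copied from the table in question (4); P(X=3) is the unknown.
known = {0: 0.24, 1: 0.41, 2: 0.26, 4: 0.01}

# The probabilities of a distribution must sum to 1.
p3 = 1.0 - sum(known.values())
pmf = dict(sorted({**known, 3: p3}.items()))

# Cumulative distribution function: running total of the probabilities.
cdf = {}
running = 0.0
for x, p in pmf.items():
    running += p
    cdf[x] = running

# Expected value: sum of x * P(X = x) over all possible values.
expected = sum(x * p for x, p in pmf.items())

print("P(X=3) =", round(p3, 2))
print("CDF    =", {x: round(c, 2) for x, c in cdf.items()})
print("E[X]   =", round(expected, 2))
```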
(5) Suppose Z is distributed according to the standard normal distribution. Calculate P(1.0 < Z < 1.5).
(6) Suppose X is distributed according to a normal distribution with mean 100 and standard deviation 10.
(6a) Calculate P ( X  120) .
(6b) Calculate the 75th percentile of X; i.e., P(X<?)=0.75.
(6c) Calculate P( X  98) if X is a sample mean calculated from a sample size of n=16.
(7) True or False: Suppose the diameters of local Dungeness crabs are distributed according to a skewed
distribution which is very different from the normal distribution. Then, according to the central limit
theorem, if you were to randomly sample and measure 1,000 local Dungeness crabs, the distribution of these
individual crab measurements would be approximately normally distributed. Defend your answer.
(8) Five fair dice are rolled. Calculate the probability of getting exactly three 6s.
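Question (8) fits the binomial setting: five independent rolls, each with probability 1/6 of showing a 6. A minimal sketch of the binomial formula, with illustrative variable names:

```python
from math import comb

n = 5      # number of dice rolled
k = 3      # number of 6s wanted
p = 1 / 6  # chance of a 6 on one fair die

# Binomial probability: C(n, k) * p^k * (1 - p)^(n - k)
prob_three_sixes = comb(n, k) * p**k * (1 - p) ** (n - k)

print(round(prob_three_sixes, 4))
```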
(9) A study was carried out to investigate if taking vitamin C affects the chance of getting a cold. 818 people
were enrolled in the study. 407 randomly chosen people were given enough 1000mg vitamin C tablets to last
them through the winter. The remaining 411 were given placebo pills. A physician, who did not know the
group to which the person had been assigned, interviewed each person at the end of the cold season. 335 of
the 411 people assigned the placebo had had a cold, while 302 of the 407 people assigned the vitamin C had
had the cold.
(9a) Was this an observational or experimental study? Explain why.
(9b) For those who had taken the placebo, calculate the risk of getting a cold.
(9c) Using the placebo group as the base group, calculate the relative risk of getting a cold for those who had
taken vitamin C compared to those with the placebo.
The data can be displayed in a 2-by-2 contingency table (see below) and then a chi-square test for
independence executed.
                     Outcome
              Cold    No Cold    Totals
Placebo       335     76         411
Vitamin C     302     105        407
Totals        637     181        818
(9d) Suppose a chi-square test for independence was to be executed. State the null and alternative
hypotheses.
Ho:
Ha:
(9f) The final chi-square statistic for the data is 6.337, giving a P-value = 0.012. State your conclusion in a
complete sentence. Explain how you reached your conclusion.
(9g) Assuming you did your statistical analysis correctly, is it possible that your conclusion is incorrect?
Explain why or why not.
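The risk, relative risk, and chi-square statistic in question (9) can be reproduced from the four cell counts. The sketch below is one way to do so, using expected counts of (row total × column total) / grand total. For a 2-by-2 table the test has 1 degree of freedom, and in that case the P-value can be written with the complementary error function; that shortcut and the variable names are choices made for this sketch, not part of the exam.

```python
from math import erfc, sqrt

# Rows: placebo, vitamin C. Columns: cold, no cold.
observed = [[335, 76],
            [302, 105]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Risk of a cold in each group, and relative risk with placebo as the base.
risk_placebo = observed[0][0] / row_totals[0]
risk_vitamin_c = observed[1][0] / row_totals[1]
relative_risk = risk_vitamin_c / risk_placebo

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (observed[i][j] - expected) ** 2 / expected

# With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi_square / 2))

print(round(risk_placebo, 3), round(relative_risk, 3))
print(round(chi_square, 3), round(p_value, 3))
```

The computed statistic and P-value agree with the 6.337 and 0.012 quoted in (9f).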
(10) In November 2003, the Gallup Poll News Service randomly telephoned and interviewed 1,004 adults in
the United States, aged 18 and older. 612 of those interviewed answered “yes” to the question “Is religion
very important in your own life”.
(10a) What is the population of interest?
(10b) What is the sample?
(10c) Estimate the proportion of American adults who consider “religion very important in their own life.”
(10d) Using the more accurate technique (not the quick conservative method), calculate a 95% confidence
interval.
(10e) Using your answer in part (d) as an example, explain what is being described by a 95% confidence
interval.
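For question (10), the sketch below contrasts the two interval methods that part (d) mentions. It assumes the "more accurate technique" means the usual large-sample interval, p̂ ± z* × sqrt(p̂(1 - p̂)/n), and that the "quick conservative method" means a margin of error of 1/sqrt(n). Both readings are assumptions, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

yes, n = 612, 1004
p_hat = yes / n  # sample proportion answering "yes"

# Multiplier for 95% confidence (about 1.96).
z_star = NormalDist().inv_cdf(0.975)

# Large-sample interval: p_hat +/- z* times the standard error.
std_error = sqrt(p_hat * (1 - p_hat) / n)
margin = z_star * std_error
ci = (p_hat - margin, p_hat + margin)

# Quick conservative margin of error, shown for comparison only.
conservative_margin = 1 / sqrt(n)

print(round(p_hat, 4), tuple(round(b, 4) for b in ci))
print(round(margin, 4), round(conservative_margin, 4))
```

The conservative margin is slightly wider because it uses the largest possible value of p̂(1 - p̂), which occurs at p̂ = 0.5.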
(11) A doctor wondered whether cholesterol level differed, on average, between before-breakfast and
after-dinner. Both measurements were taken on 20,000 different people and the differences for each of the
20,000 calculated. A t test was executed on the differences and a 95% confidence interval calculated. The
before-breakfast level was, on average, -0.043 lower than after-dinner. The P-value was 0.0337 and the 95%
confidence interval was (-0.0850, -0.0016).
(11a) Is there statistically significant evidence that the mean difference is not zero?
Explain two ways of how you reached your conclusion.
1.
2.
(11b) Is there a practical significance (real importance) in the result? Defend your answer.
(12) A 1985 study was carried out in a fine restaurant in Toulouse, France, to investigate the relationship
between the age of vintage armagnacs (a brandy) and the cost per glass in francs. A regression analysis
was carried out, giving the following result:
(12a) Circle which value is most likely to be the correlation for price and age:
(i) –1.30 (ii) –1.00 (iii) –0.96 (iv) –0.23 (v) 0.00
(vi) +1.30 (vii) +1.00 (viii) +0.96 (ix) +0.23
(12b) How much would you expect to pay for a glass of 30-year-old armagnac at this restaurant?
Show your work.
(12c) Suppose they actually charged 120 francs for a glass of 30-year-old armagnac. Calculate the value of
that residual.
(12d) Explain how the regression line was determined.
(13) A company manufactures a drug used to treat a certain disease. They claim that the drug will cure at least
20% of the people with the disease. A medical school professor decided to research this claim as he thought it
cures less than 20% of people with the disease. 300 randomly selected people with the disease were given the
drug, and only 51 people were cured (17%). Using a level of significance of 5%, is there statistically
significant evidence to support the doctor’s suspicion that the cure rate is less than 20%?
That is, test H0: p ≥ 0.20 against HA: p < 0.20 using α = 0.05.
(13a) Calculate the test statistic.
(13b) Calculate the p-value.
(13c) True or False: There is statistically significant evidence that the proportion of cases this drug
would cure is less than 0.20. Explain how you reached your conclusion.
(13d) In this problem, p̂ = 0.17. Is p̂ a statistic or a parameter? Explain the difference between a statistic
and a parameter.
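The calculation behind question (13) can be sketched as a one-sided z test for a proportion. The standard error is computed under the null value p = 0.20, and the P-value is the lower-tail area because the alternative is "less than". Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, cured = 300, 51
p0 = 0.20     # cure rate at the boundary of the null hypothesis
alpha = 0.05  # level of significance

p_hat = cured / n  # sample proportion cured

# Standard error of p_hat when the null value p0 is true.
se_null = sqrt(p0 * (1 - p0) / n)

# Test statistic: number of standard errors p_hat lies from p0.
z = (p_hat - p0) / se_null

# One-sided P-value: area below z under the standard normal curve.
p_value = NormalDist().cdf(z)

# Decision rule: reject the null hypothesis only if the P-value is below alpha.
reject_null = p_value < alpha

print(round(z, 2), round(p_value, 3), reject_null)
```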
(14) A study was carried out where 30 students’ blood pressures were measured at the start of a Friday
statistics lecture on a day when they had no exam. The blood pressures of those same 30 students were also
measured at the start of another Friday statistics lecture, but on that day they were also scheduled to have a
quiz. The question the study was trying to answer was whether or not the stress of a quiz raises the mean
blood pressure of the students. Circle the method that should be used to analyze this data.
(i) Chi-square test for independence (ii) Paired t-test
(iii) Regression analysis (iv) Two-sample t-test
(15) True or False: Increasing the sample size will increase the power of the test. Explain why or why not.
(16) True or False: If the null hypothesis is true, the probability of a type 1 error is α. Explain why or
why not.
(17) True or False: A type 2 error can only occur if we decide to keep the null hypothesis. Explain why or
why not.
(18) The weights of 87 trapped sparrows were measured during January one winter. The researcher was
interested in whether or not adult sparrows weighed more than juvenile sparrows in the depths of winter. The
Minitab output is shown below.
Two-Sample T-Test and CI: wt, age

Two-sample T for wt
age         N    Mean    StDev   SE Mean
adult       28   25.82   1.51    0.29
juvenile    59   25.79   1.39    0.18

Difference = mu (adult) - mu (juvenile)
Estimate for difference: 0.023
95% lower bound for difference: -0.544
T-Test of difference = 0 (vs >): T-Value = 0.07  P-Value = 0.473  DF = 49
(18a) What is the sample?
(18b) What is the population?
(18c) True or False: There was statistically significant evidence that the mean weight of adult sparrows was
greater than the mean weight of juvenile sparrows that January. Explain how you reached your conclusion.
(18d) The p-value is 0.473. What is this probability describing?
(18e) Suppose you wanted to calculate a 95% confidence interval for the difference in mean weights. The
confidence interval is not available from the above Minitab output. What would you have to do differently
in Minitab to get a 95% confidence interval?
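The T-Value and DF in the Minitab output for question (18) can be roughly reproduced from the summary statistics. The sketch below assumes the unpooled (Welch) form of the two-sample t test, which is consistent with the reported DF = 49 but is not stated in the output. It uses the printed estimate 0.023 rather than the rounded means, and the variable names are illustrative.

```python
from math import sqrt

# Summary statistics copied from the Minitab output.
n_adult, sd_adult = 28, 1.51
n_juv, sd_juv = 59, 1.39
diff = 0.023  # reported estimate for mu(adult) - mu(juvenile)

# Squared standard error of each sample mean.
var_adult = sd_adult**2 / n_adult
var_juv = sd_juv**2 / n_juv

# Unpooled standard error of the difference between the two means.
se_diff = sqrt(var_adult + var_juv)

# Test statistic for a hypothesized difference of 0.
t_value = diff / se_diff

# Welch approximation to the degrees of freedom (truncated to an integer).
welch_df = (var_adult + var_juv) ** 2 / (
    var_adult**2 / (n_adult - 1) + var_juv**2 / (n_juv - 1)
)

print(round(se_diff, 3), round(t_value, 2), int(welch_df))
```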