Statistics 108 Final Exam, Fall 2003

Name: _______________

You are allowed one 8.5 x 11 inch sheet of paper with notes on both sides. You are allowed a calculator. Show all work to receive partial credit.

(1) In early April 1998, Humboldt State students recorded data from 582 trillium plants in the forest behind campus. One variable measured was whether the plant had a white flower (= w), a pink flower (= p), or a flower that had gone to seed (= s). Students also measured the length of stem between the flower and the leaves. The pie chart and boxplots for the data are shown below.

(1a) Is flower type a categorical or quantitative variable?
(1b) Calculate how many degrees of the circle the white segment occupies.
(1c) For the pink flowers, draw an arrow into the boxplot pointing to its median. Label this arrow "c".
(1d) For the pink flowers, draw an arrow into the boxplot pointing to its 75th percentile. Label this arrow "d".
(1e) What was the flower type (pink, seeded, or white) of the plant with the longest stem?

(2) Joey was told to shade in the region or regions of a Venn diagram that satisfy A B^c. (Notice the complement!) Assume the events A and B are not disjoint; i.e., events A and B have an intersection. Joey correctly drew the Venn diagram but could not figure out which region(s) to shade. Shade in the correct region(s) for Joey.

(3) Suppose the probability of the British stock market finishing the year higher than it started is 0.60. Similarly, the probability of the French stock market finishing the year higher than it started is also 0.60. Why is it not reasonable to say that the probability of both the French and British markets ending up higher is 0.60 × 0.60 = 0.36?

(4)
x           0     1     2     3     4
P(X = x)    0.24  0.41  0.26  ???   0.01
P(X ≤ x)    ??    ??    ??    ??    ??

The above distribution table gives the probabilities for the possible values of the random variable X.

(4a) Find the missing value, ???, for P(X = 3).
(4b) Fill in the missing values (??) of the cumulative distribution function.
(4c) Is X a discrete or continuous random variable?
(4d) Calculate the expected value of X.

(5) Suppose Z follows the standard normal distribution. Calculate P(1.0 ≤ Z ≤ 1.5).

(6) Suppose X is normally distributed with mean 100 and standard deviation 10.

(6a) Calculate P(X 120).
(6b) Calculate the 75th percentile of X; i.e., find ? such that P(X < ?) = 0.75.
(6c) Calculate P(X̄ 98) if X̄ is a sample mean calculated from a sample of size n = 16.

(7) True or False: Suppose the diameters of local Dungeness crabs follow a skewed distribution that is very different from the normal distribution. Then, according to the central limit theorem, if you were to randomly sample and measure 1,000 local Dungeness crabs, the distribution of these individual crab measurements would be approximately normal. Defend your answer.

(8) Five fair dice are rolled. Calculate the probability of getting exactly three 6s.

(9) A study was carried out to investigate whether taking vitamin C affects the chance of getting a cold. 818 people were enrolled in the study. 407 randomly chosen people were given enough 1000 mg vitamin C tablets to last them through the winter. The remaining 411 were given placebo pills. A physician who did not know the group to which each person had been assigned interviewed each person at the end of the cold season. 335 of the 411 people assigned the placebo had had a cold, while 302 of the 407 people assigned vitamin C had had a cold.

(9a) Was this an observational or experimental study? Explain why.
(9b) For those who had taken the placebo, calculate the risk of getting a cold.
(9c) Using the placebo group as the base group, calculate the relative risk of getting a cold for those who had taken vitamin C compared to those who had taken the placebo.

The data can be displayed in a 2-by-2 contingency table (see below), and a chi-square test for independence can then be carried out.
               Outcome
             Cold   No Cold   Totals
Placebo       335        76      411
Vitamin C     302       105      407
Totals        637       181      818

(9d) Suppose a chi-square test for independence is to be carried out. State the null and alternative hypotheses.
Ho:
Ha:
(9f) The final chi-square statistic for the data is 6.337, giving a P-value of 0.012. State your conclusion in a complete sentence. Explain how you reached your conclusion.
(9g) Assuming you did your statistical analysis correctly, is it possible that your conclusion is incorrect? Explain why or why not.

(10) In November 2003, the Gallup Poll News Service randomly telephoned and interviewed 1,004 adults in the United States, aged 18 and older. 612 of those interviewed answered "yes" to the question "Is religion very important in your own life?"

(10a) What is the population of interest?
(10b) What is the sample?
(10c) Estimate the proportion of American adults who consider "religion very important in their own life."
(10d) Using the more accurate technique (not the quick conservative method), calculate a 95% confidence interval.
(10e) Using your answer in part (10d) as an example, explain what a 95% confidence interval describes.

(11) A doctor wondered whether cholesterol level differed, on average, between before breakfast and after dinner. Both measurements were taken on 20,000 different people, and the difference was calculated for each of the 20,000. A t test was carried out on the differences and a 95% confidence interval was calculated. The mean difference (before-breakfast minus after-dinner) was -0.043; that is, the before-breakfast level was, on average, 0.043 lower than the after-dinner level. The P-value was 0.0337 and the 95% confidence interval was (-0.0850, -0.0016).

(11a) Is there statistically significant evidence that the mean difference is not zero? Explain two ways you could reach your conclusion.
1.
2.
(11b) Is there practical significance (real importance) in the result? Defend your answer.
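The following is a minimal sketch for checking the arithmetic behind (9b), (9c), (9f) and (10d). It uses only the Python standard library and assumes the usual introductory formulas. Risk is the number of cases divided by the group size. Relative risk is the vitamin C risk divided by the placebo (base-group) risk. The chi-square statistic is Pearson's statistic without a continuity correction, which is consistent with the 6.337 quoted in (9f). For (10d), it uses the large-sample interval p̂ ± 1.96·√(p̂(1 − p̂)/n) as the "more accurate technique" and takes 1/√n as the quick conservative margin; both choices are assumptions about this course's conventions. The helper names `risk`, `chi_square_2x2` and `wald_interval` are hypothetical.

```python
from math import erfc, sqrt


def risk(cases: int, total: int) -> float:
    """Hypothetical helper: proportion of a group that experienced the outcome."""
    return cases / total


def chi_square_2x2(table: list[list[int]]) -> tuple[float, float]:
    """Hypothetical helper: Pearson chi-square test for independence on a 2x2 table.

    No continuity (Yates) correction is applied. Returns (statistic, p_value).
    With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Hypothetical helper: large-sample interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Question 9: rows are Placebo, Vitamin C; columns are Cold, No Cold.
cold_table = [[335, 76], [302, 105]]
placebo_risk = risk(335, 411)
vitamin_c_risk = risk(302, 407)
relative_risk = vitamin_c_risk / placebo_risk  # placebo group is the base group
chi2, p = chi_square_2x2(cold_table)

# Question 10: 612 of 1,004 adults answered "yes".
low, high = wald_interval(612, 1004)
conservative_margin = 1 / sqrt(1004)  # assumed form of the quick conservative method

print(f"placebo risk = {placebo_risk:.3f}, relative risk = {relative_risk:.3f}")
print(f"chi-square = {chi2:.3f}, P-value = {p:.3f}")
print(f"95% CI for p: ({low:.3f}, {high:.3f}), conservative margin = {conservative_margin:.3f}")
```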
(12) A 1985 study was carried out in a fine restaurant in Toulouse, France, to investigate the relationship between the age of vintage armagnacs (a brandy) and the cost per glass in francs. A regression analysis was carried out, giving the following result:

(12a) Circle the value that is most likely to be the correlation between price and age:
(i) –1.30 (ii) –1.00 (iii) –0.96 (iv) –0.23 (v) 0.00 (vi) +1.30 (vii) +1.00 (viii) +0.96 (ix) +0.23
(12b) If this restaurant had 30-year-old armagnac, how much would you expect to pay for a glass? Show your work.
(12c) Suppose they actually charged 120 francs for a glass of 30-year-old armagnac. Calculate the value of that residual.
(12d) Explain how the regression line was determined.

(13) A company manufactures a drug used to treat a certain disease. They claim that the drug will cure at least 20% of the people with the disease. A medical school professor decided to research this claim because he thought it cures less than 20% of people with the disease. 300 randomly selected people with the disease were given the drug, and only 51 people were cured (17%). Using a 5% level of significance, is there statistically significant evidence to support the professor's suspicion that the cure rate is less than 20%? That is, test H0: p ≥ 0.20 against HA: p < 0.20 using α = 0.05.

(13a) Calculate the test statistic.
(13b) Calculate the p-value.
(13c) True or False: There is statistically significant evidence that the proportion of cases in which this drug cures the disease is less than 0.20. Explain how you reached your conclusion.
(13d) In this problem, p̂ = 0.17. Is p̂ a statistic or a parameter? Explain the difference between a statistic and a parameter.

(14) A study was carried out in which 30 students' blood pressures were measured at the start of a Friday statistics lecture on a day when they had no exam.
The blood pressures of those same 30 students were also measured at the start of another Friday statistics lecture, but on that day they were also scheduled to have a quiz. The question the study was trying to answer was whether or not the stress of a quiz raises the mean blood pressure of the students. Circle the method that should be used to analyze these data:
(i) Chi-square test for independence
(ii) Paired t-test
(iii) Regression analysis
(iv) Two-sample t-test

(15) True or False: Increasing the sample size will increase the power of the test. Explain why or why not.

(16) True or False: If the null hypothesis is true, the probability of a type 1 error is α. Explain why or why not.

(17) True or False: A type 2 error can only occur if we decide to keep the null hypothesis. Explain why or why not.

(18) The weights of 87 trapped sparrows were measured during January one winter. The researcher was interested in whether or not adult sparrows weighed more than juvenile sparrows in the depths of winter. The Minitab output is shown below.

Two-Sample T-Test and CI: wt, age

Two-sample T for wt

age        N   Mean  StDev  SE Mean
adult     28  25.82   1.51     0.29
juvenile  59  25.79   1.39     0.18

Difference = mu (adult) - mu (juvenile)
Estimate for difference:  0.023
95% lower bound for difference:  -0.544
T-Test of difference = 0 (vs >): T-Value = 0.07  P-Value = 0.473  DF = 49

(18a) What is the sample?
(18b) What is the population?
(18c) True or False: There was statistically significant evidence that the mean weight of adult sparrows was greater than the mean weight of juvenile sparrows that January. Explain how you reached your conclusion.
(18d) The p-value is 0.473. What is this probability describing?
(18e) Suppose you wanted to calculate a 95% confidence interval for the difference in mean weights. The confidence interval is not available from the above Minitab output. What would you have to do differently in Minitab to get a 95% confidence interval?
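The T-Value and DF in the Minitab output for (18) can be recomputed from the summary statistics, which may help when interpreting (18c) and (18d). This Python sketch assumes Minitab used the unpooled (Welch) two-sample t procedure. That assumption fits the reported DF = 49; a pooled test would have 28 + 59 − 2 = 85 degrees of freedom. It also assumes the displayed DF is the Welch–Satterthwaite value rounded down. The sketch uses the printed estimate 0.023 for the difference, because the printed means are themselves rounded. The helper name `welch_t` is hypothetical.

```python
from math import floor, sqrt


def welch_t(diff: float, s1: float, n1: int, s2: float, n2: int) -> tuple[float, int]:
    """Hypothetical helper: unpooled two-sample t statistic and Welch-Satterthwaite df.

    The degrees of freedom are rounded down to an integer, which is assumed
    (not confirmed) to match how Minitab displays DF.
    """
    v1 = s1 ** 2 / n1  # squared standard error of the first (adult) mean
    v2 = s2 ** 2 / n2  # squared standard error of the second (juvenile) mean
    se_diff = sqrt(v1 + v2)
    t_value = diff / se_diff
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_value, floor(df)


# Summary statistics from the Minitab output: adult (n = 28), juvenile (n = 59).
t_value, df = welch_t(diff=0.023, s1=1.51, n1=28, s2=1.39, n2=59)
print(f"T-Value = {t_value:.2f}, DF = {df}")
```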