Statistics 108, Fall 2002 (Prof. Rizzardi)    Name:___________

+ If the answer is a single number or interval, circle your final answer.
+ Show all work for full credit. Neatness counts.
+ If you are not certain of an answer, describe your logic for partial credit.
+ For this final, you are allowed to use:
    + tan sheet (or new double-sided copy) of normal distribution table
    + calculator
    + 1 page of notes - extra notes may be made on normal distribution table if more space is needed

If you create your own new copy of the tan sheet and it is not double-sided, restrict your 1 page of notes to the back side of the two normal distribution table sheets.

(Problem 1) Let $x_1 = 3$, $x_2 = 7$, $x_3 = 2$.

(1a) Calculate $\bar{x}$.

(1b) Calculate $\sum_{i=1}^{3} (x_i - \bar{x})^2$.

(1c) Calculate the sample standard deviation of the data.

(Problem 2) Fill in both blanks. The smallest value a probability can be is ______ and the largest value is ________, and any probability value stated outside of this range is a mistake.

(Problem 3) Suppose you were to calculate a 1-standard deviation window by calculating $\bar{x} \pm s$. Assuming the dot plot of the data is approximately “bell-shaped”, you would expect there to be roughly _______% of the data within the 1-standard deviation window.

(Problem 4) A study (fictional!) at a local hospital was performed to see if breast feeding of infants was associated with lower allergy rates later in the child’s life. The mothers of young children suffering from severe non-food allergies were asked whether or not the child had been breast fed regularly for at least 2 months after birth. Mothers of children who were in the hospital for non-disease injuries (e.g. broken arm) were also asked the same question. Data were later analyzed to compare whether breast feeding was more or less prevalent among the allergy children. Was this an experimental or observational study? Explain why.

(Problem 5) Suppose a fair 6-sided die is rolled once.
A particular event occurs if the die satisfies one of the values inside of the set.

    A={1,2,3}    B={2,4,6}    C={2,3,5}    D={3,4,5,6}

(5a) Calculate $P(D)$.

(5b) Calculate $P(D^c)$; i.e., the probability of the complement of event D.

(5c) Calculate P ( A C )

(5d) Calculate $P(A \mid B)$.

(Problem 6) Suppose the following probabilities among an animal population:

    P( diseased | male ) = 0.2
    P( diseased | female ) = 0.1
    P( male ) = 0.30
    P( female ) = 0.70

(6a) Are disease and sex independent? Explain.

(6b) Calculate P( diseased ).

(Problem 7) The leaf lengths of 582 trillium plants were collected by HSU students. The boxplot of the lengths is shown below. Units are in centimeters.

[Boxplot of leaf lengths; vertical axis "leaf", scale 5 to 20 cm]

(7a) Approximately what percent of the leaf lengths are less than 15 cm in length?

(7b) Give a rough calculation of the interquartile range. Show your work.

(Problem 8) Suppose the probability of a newborn calf being male is 0.3; i.e., P(male)=0.3. If six calves were born, and their sexes are independent, what is the probability of exactly two males and four females; i.e., calculate P( number of males=2 ).

(Problem 9) A study was carried out where the weight (pounds) and cholesterol levels (mg/100 ml) were compared. Of interest was whether cholesterol is associated with weight. A simple linear regression analysis was performed by a statistician. Below is some of the Minitab output.

    The regression equation is
    cholesterol = - 128 + 2.03 wt

    Predictor       Coef   SE Coef       T       P
    Constant     -127.57     78.90   -1.62   0.130
    wt            2.0320    0.4447    4.57   0.001

    [Regression Plot: cholesterol = -127.567 + 2.03199 wt
     S = 36.8697   R-Sq = 61.6 %   R-Sq(adj) = 58.7 %
     Scatterplot with fitted line; vertical axis cholesterol (100 to 300), horizontal axis wt (140 to 220)]

(9a) If a randomly sampled man weighed 180 pounds, using the regression analysis, what would you expect his cholesterol to be?
(9b) For each pound increase in weight, you would expect cholesterol to
    (a) Decrease about 128 mg/100ml
    (b) Increase about 128 mg/100ml
    (c) Increase about 2.0 mg/100ml
    (d) Increase about 0.4 mg/100ml
    (e) Increase about 4.6 mg/100ml

(9c) The correlation coefficient between weight and cholesterol is about:
    (a) -2.0   (b) -0.75   (c) -0.06   (d) 0   (e) +0.06   (f) +0.75   (g) +2.0

(Problem 10) As part of the National Health and Nutrition Examination Survey, iron levels were checked for a sample of 786 girls aged 12 to 15. Iron deficiency was detected in 71 of those sampled. Calculate a 95% confidence interval for the proportion of girls in the general population aged 12 to 15 who are iron deficient. You may use the conservative method if you wish.

(Problem 11) The contingency table below is chi-square test output from Minitab with some parts deleted. It involves the student data and compares hair color against gender.

    Rows: sex    Columns: hair

              black    blond    brown  lightbro      red      All
    female        3       15        9         4        3       34
               2.96    15.28      WWW      VVVV     1.97    UUUUU
    male          3       16      XXX         4        1       35
               3.04    15.72      YYY      TTTT     2.03    35.00
    All           6       31       20         8        4       69
               6.00    31.00    20.00      8.00     4.00    69.00

    Chi-Square = 1.218, DF = 4, P-Value = 0.875
    6 cells with expected counts less than 5.0

    Cell Contents --
        Count
        Exp Freq

(11a) For brown-hair males, (observed) XXX= __________.

(11b) For brown-hair males, (expected) YYY=___________.

(11c) Which conclusion is most appropriate?
    (a) There is statistically significant evidence that the mean hair color of males is equal to females (P=0.875).
    (b) There is not statistically significant evidence that the mean hair color of males is equal to females (P=0.875).
    (c) There is statistically significant evidence that gender and hair color are dependent (P=0.875).
    (d) There is not statistically significant evidence that gender and hair color are dependent (P=0.875).

(Problem 12) Suppose the random variable X is distributed according to the standard normal distribution (mean=0, sd=1). Calculate P(X < 2).
(Problem 13) Suppose the random variable X is normally distributed with a mean of 50 and standard deviation of 5. If 16 X’s were sampled and their mean, $\bar{X}$, calculated, find $P(\bar{X} < 52.5)$.

(Problem 14) Fill in both blanks. Suppose the random variable X has any distribution (not necessarily normal!) with mean $\mu$ and standard deviation $\sigma$. Then, as the size of the sample (n) gets large, the distribution of __________ (hint: a symbol) will become “approximately _________________” with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$.

(Problem 15) Suppose 36 island foxes were captured and weighed. The mean weight was 10 pounds with a sample standard deviation of 3 pounds.

(15a) Calculate a 95% confidence interval for the mean weight of island foxes. (You may use 2 as the multiplier.)

(15b) What is meant by “95% confidence interval”?

(15c) True or False: The width of the confidence interval will typically get wider with a larger sample size. If false, explain why the interval would get narrower.

(Problem 16) A P-value is a probability. Specifically, what is the probability describing?

(Problem 17) A motor company claims that the probability of one of their cars needing a repair during the first 80,000 miles is at most 0.3. A survey of 600 car owners resulted in 34% of the 600 cars needing a repair during the first 80,000 miles. Using $\alpha = 0.05$, test the hypothesis that the true probability of repair is greater than 0.3.

(17a) State the null and alternative hypotheses.

(17b) Calculate the appropriate test statistic.

(17c) Calculate a p-value. Sketch a graph shading in the area beneath the density curve which equals the p-value.

(17d) Should you keep or reject your null hypothesis? Explain how you reached this conclusion.

(17e) Is it possible that you made a Type I error? Explain.