Statistics 515 – Statistical Methods I
Practice Test for Exam 1
E. A. Pena’s Class
(WITH Answers)
______________________________________________________________________
Part I (24 points). For questions 1-8 please refer to the following information:

Petroleum pollution in seas and oceans stimulates the growth of some types of bacteria. A count of petroleumlytic microorganisms (bacteria per 100 milliliters) in n = 10 portions of seawater gave the following readings.

Raw Data: 49, 70, 54, 67, 59, 40, 61, 69, 71, 52

The associated ordered/arranged values are given below.

Ordered Values: 40, 49, 52, 54, 59, 61, 67, 69, 70, 71

Furthermore, for this data set,

$\sum X_i$ = Sum of the Observations = 592
$\sum X_i^2$ = Sum of the Squared Observations = 36014

1. Construct a stem-and-leaf or dot plot for this data set.
Answer:
4 | 0, 9
5 | 2, 4, 9
6 | 1, 7, 9
7 | 0, 1

2. Compute the sample mean.
Answer: Sample Mean = (592)/10 = 59.2

3. Determine the sample median.
Answer: Sample Median = (59 + 61)/2 = 60

4. Compute the sample variance. [You may use the information given above!]
Answer: Sample Variance = $[36014 - (592)^2/10]/(10 - 1) = 107.51$

5. Compute the sample standard deviation.
Answer: Sample Standard Deviation = $\sqrt{107.51} = 10.37$

6. Determine the first quartile.
Answer: First Quartile = Q1 = 52

7. Determine the third quartile.
Answer: Third Quartile = Q3 = 69

8. Draw the boxplot.
[Figure: boxplot of the data (variable C1); vertical axis ticks at 40, 50, 60, 70.]
________________________________________________________________________
Part II (18 points). For questions 9-14 please refer to the following information:

Americium 241 (241Am) is a radioactive material used in the manufacture of smoke detectors. The article "Retention and Dosimetry of Injected 241Am in Beagles" [a beagle is a small short-legged smooth-coated hound] published in Radiation Research (1984), pp. 564-575, described a study in which 55 beagles were injected with a dose of 241Am (proportional to the animals' weights). Skeletal retention of 241Am (Ci/kg) was recorded for each of the 55 beagles.
The following summary information pertains to these 55 observations.

[Figure: Frequency Histogram for the Amount of Americium Retained in 55 Beagles. Horizontal axis: Amount of Americium Retained, ticks from 0.175 to 0.625 in steps of 0.05. Vertical axis: Frequency, 0 to 15.]

Numerical Summary Measures

Type of Summary Measure      Value of Americium Retained
n (# of Observations)        55
Sample Mean                  0.3489
Sample Median                0.3370
Sample Standard Deviation    0.0800
Minimum                      0.1860
First Quartile (Q1)          0.3030
Third Quartile (Q3)          0.4080
Maximum                      0.5850

[Figure: Boxplot for the Amount of Americium Retained by the 55 Beagles. Vertical axis: Americium Retained, 0.2 to 0.6.]

9. Describe the shape of the distribution for the Amount of Americium Retained by these 55 beagles. Provide explanations and/or reasons for your answer.
Answer: The shape of the distribution is somewhat right-skewed. Notice that the sample mean (0.3489) is a little larger than the sample median (0.3370), which is what we expect for right-skewed distributions. The right skewness also shows in the boxplot: the distance from the median to the third quartile (0.4080 - 0.3370 = 0.0710) is larger than the distance from the first quartile to the median (0.3370 - 0.3030 = 0.0340).

10. Based on the information provided, are there any outliers in the data? If so, what is the approximate value of this (these) outlier(s)?
Answer: The boxplot indicates one outlier, the largest observation, whose value is approximately .585. This is consistent with the usual 1.5 x IQR rule: Q3 + 1.5(Q3 - Q1) = .4080 + 1.5(.1050) = .5655, and the maximum, .5850, lies above this value.

11. Approximately what percentage of the 55 observations are between 0.3030 (the first quartile) and 0.4080 (the third quartile)?
Answer: By the definitions of the first and third quartiles, approximately 50% of all observations lie between .3030 and .4080.

12. Provide a plausible explanation why the sample mean is larger than the sample median.
Answer: Two possible explanations are a) the distribution is right-skewed; and b) the mean is affected by the outlier in the right tail.

13. Based on the histogram, approximately how many observations exceed the value of 0.375?
Answer: 9 + 6 + 2 + 0 + 1 = 18 observations.

14. The interval around the sample mean whose limits are two sample standard deviations away from the sample mean is [.3489 - 2(.08), .3489 + 2(.08)] = [.1889, .5089]. What could you say about the percentage of observations that will fall in this interval? Provide a reason for your answer.
Answer: Since the histogram is not exactly mound-shaped, we could use Chebyshev's Inequality to conclude that at least 75% of all observations fall in the specified interval. As the distribution is not too far from being mound-shaped, we could also use the Empirical Rule to conclude that the percentage in this interval should be close to 95%.
________________________________________________________________________
Part III (21 points). For questions 15-21 please refer to the following information.

In a three-year study of cocaine addiction by D. M. Barnes, as reported in the article "Breaking the cycle of addiction" which appeared in Science, 241(1988), pp. 1029-1030, 72 chronic cocaine users were each given either the antidepressant desipramine, lithium (the standard drug to treat cocaine addiction), or a placebo. The 72 subjects were randomly divided into three equal groups. The purpose of the study was to determine whether giving a cocaine addict an antidepressant will help in breaking the addiction. The following table presents the result of the study.

Cocaine Relapse?    Desipramine    Lithium    Placebo    Total
Yes                 10             18         20         48
No                  14             6          4          24

15. Compare the relapse rate for the three groups. Which among desipramine, lithium, or placebo is most effective in lowering the relapse rate among cocaine addicts?
Answer: Rates: Desipramine = 10/24 = .4167; Lithium = 18/24 = .7500; Placebo = 20/24 = .8333. The desipramine group has the lowest relapse rate, so desipramine appears to be the most effective.

16.
Consider the experiment of choosing at random one of the subjects in the above study and then determining the treatment given (which is either desipramine, lithium, or placebo) and observing whether the subject has a relapse. The sample space of this experiment is:

S = {(Desipramine, Yes), (Desipramine, No), (Lithium, Yes), (Lithium, No), (Placebo, Yes), (Placebo, No)}

What would be the appropriate probabilities to assign to the six outcomes in this sample space? Note that these probabilities should be based on the number of individuals in the different cells of the table and the overall total.
Answer:
P((D,Y)) = 10/72 = .1389
P((D,N)) = 14/72 = .1944
P((L,Y)) = 18/72 = .2500
P((L,N)) = 6/72 = .0833
P((P,Y)) = 20/72 = .2778
P((P,N)) = 4/72 = .0556

17. Define event A to be the event that "Desipramine" was assigned, and B to be the event that the subject had a relapse. What are P(A) and P(B)?
Answer: P(A) = (10 + 14)/72 = .3333
P(B) = 48/72 = .6667

18. Find P(A or B), that is, the probability that either A or B occurs.
Answer: P(A or B) = (48 + 14)/72 = .8611.

19. Find P(B|A), the conditional probability of B given A.
Answer: P(B|A) = P(A and B)/P(A) = (10/72)/(24/72) = 10/24 = .4167.

20. Are events A and B independent? Provide a reason for your answer.
Answer: Since P(B) = .6667 does not equal P(B|A) = .4167, A and B are dependent.

21. Find the probabilities a) P(Desipramine was assigned | B); and b) P(Lithium was assigned | B). Based on these probabilities, if you are given the information that the subject had a relapse, is it more likely that the subject was assigned desipramine or lithium?
Answer: a) P(Desipramine | B) = P(A and B)/P(B) = (10/72)/(48/72) = 10/48 = .2083
b) P(Lithium | B) = (18/72)/(48/72) = 18/48 = .3750
Thus, given a relapse, it is more likely that the subject was assigned lithium.
________________________________________________________________________
Part IV (12 points). For questions 22-24 please refer to the following information.

ELISA tests are used to screen donated blood for the presence of the AIDS virus.
The test actually detects antibodies, substances that the body produces when the virus is present. If the antibodies are present, ELISA is positive with probability .997 and negative with probability .003. If the blood being tested is not contaminated with AIDS antibodies, ELISA gives a positive result with probability .015 and a negative result with probability .985. Assume that 1% of a large population carries the AIDS antibody in their blood. Suppose that one individual is randomly chosen from this population.

22. Draw a tree diagram which depicts the outcomes of this two-step experiment, with step 1 being the process of choosing the person (outcomes: the person does or does not carry the antibody) and step 2 being the process of performing the ELISA test on the person’s blood (outcomes: positive or negative).
Answer: The tree cannot be drawn here, so its four end outcomes and their probabilities are listed instead:
(Antibody, Positive): Prob = (.01)(.997) = .00997
(Antibody, Negative): Prob = (.01)(.003) = .00003
(No Antibody, Positive): Prob = (.99)(.015) = .01485
(No Antibody, Negative): Prob = (.99)(.985) = .97515

23. What is the probability that the ELISA test for the AIDS virus will show a positive result?
Answer: P(Positive) = P(Antibody, Positive) + P(No Antibody, Positive) = .00997 + .01485 = .02482.

24. Given that the ELISA test is positive, what is the probability that the chosen person has the AIDS antibody?
Answer: P(Antibody | Positive) = P(Antibody, Positive)/P(Positive) = .00997/.02482 = .40169.
________________________________________________________________________
Part V (16 points). For questions 25-28 please refer to the following information.

Let X be the random variable denoting the number of revisions (including the original version) before a manuscript is accepted for publication in a scientific journal. Suppose that the probability function of X is given by:

x = number of revisions           1       2       3       4       5
p(x) = P{x revisions needed}      0.10    0.30    0.35    0.15    0.10

25.
Find $P\{2 \le X \le 4\}$ = probability that the manuscript will take between 2 and 4 revisions, inclusive, before getting accepted for publication.
Answer: Prob = .30 + .35 + .15 = .80

26. Determine the mean of X.
Answer: Mean = $\mu$ = (1)(.10) + (2)(.30) + (3)(.35) + (4)(.15) + (5)(.10) = 2.85

27. Determine the standard deviation of X.
Answer: Variance = $\sigma^2 = (1 - 2.85)^2(.10) + (2 - 2.85)^2(.30) + (3 - 2.85)^2(.35) + (4 - 2.85)^2(.15) + (5 - 2.85)^2(.10) = 1.2275$
Standard Deviation = $\sigma = \sqrt{1.2275} = 1.1079$

28. Suppose that we define the variable Y = 2X + 5. By simply using the mean and standard deviation of X, what will be the mean and standard deviation of Y?
Answer: (We did not actually cover this in our class, so this type of question will not be included in the exam.)
Mean of Y = 2(2.85) + 5 = 10.7
Variance of Y = $(2)^2(1.2275) = 4.91$
Standard Deviation of Y = (2)(1.1079) = 2.2158
________________________________________________________________________
Part VI (12 points). For questions 29-32 please refer to the following information.

A psychiatrist believes that 80% of all people [a very large population] who visit doctors have problems of a psychosomatic nature. She decides to select 25 patients at random to test her theory. Let X denote the number of patients out of the 25 who have problems of a psychosomatic nature, so that X has a binomial distribution. Assume that the psychiatrist's theory is correct.

29. What is the mean of X?
Answer: Mean = (25)(.80) = 20

30. What is the standard deviation of X?
Answer: Variance = (25)(.80)(.20) = 4
Standard Deviation = $\sqrt{4}$ = 2

31. Find the probability that X = 20. [You may just write this in formula form.]
Answer: $P(X = 20) = \binom{25}{20}(.8)^{20}(.2)^{25-20} = .1960$

32. By using a table of binomial probabilities or a calculator, we find that $P\{X \le 14\} = .0056$ when the psychiatrist’s theory is correct. Suppose that when the sample of 25 patients was actually taken, only 14 had problems of a psychosomatic nature.
What conclusions could you make about the psychiatrist's theory?
Answer: $P(X \le 14) = .006$. If only 14 are obtained, then either we were extremely unlucky or the theory is wrong. Since we do not believe that we were that unlucky, we would conclude that the theory is the one that is wrong!
________________________________________________________________________
Some Formulas That May Be Useful

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

$S^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} X_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} X_i \right)^2 \right] = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

M = value that divides arranged data into two equal parts
Q1 = Divides arranged data into 25:75 split
Q3 = Divides arranged data into 75:25 split

P(A or B) = P(A) + P(B) - P(A and B)
P(B|A) = P(A and B)/P(A)
P(B) = P(A)P(B|A) + P($A^c$)P(B|$A^c$)
P(A|B) = P(A)P(B|A)/P(B)
P(A and B) = P(A)P(B) if A and B are independent

$\mu = \sum x \, p(x)$

$\sigma^2 = \sum (x - \mu)^2 p(x) = \sum x^2 p(x) - \mu^2$

$p(x) = \binom{n}{x} p^x (1 - p)^{n - x}$

$\mu = np$; $\sigma^2 = np(1 - p)$

n! = (n)(n-1)(n-2)...(2)(1) with 0! = 1

${}_nC_r = \binom{n}{r} = \frac{n!}{r!(n - r)!}$
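
The formulas above can be checked numerically against the answers in Parts I, IV, V, and VI. The following is a minimal Python sketch that is not part of the original test. The function names are illustrative assumptions, the data and probabilities are copied from the questions, and `math.comb` assumes Python 3.8 or later.

```python
import math

# Part I data, copied from the test.
data = [49, 70, 54, 67, 59, 40, 61, 69, 71, 52]

# Part V probability function, copied from the test.
revisions_pmf = {1: 0.10, 2: 0.30, 3: 0.35, 4: 0.15, 5: 0.10}


def sample_mean(xs):
    """Sample mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance via the shortcut formula on the formula sheet."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total ** 2 / n) / (n - 1)


def posterior(prior, p_pos_given_a, p_pos_given_not_a):
    """P(A | positive), using the total-probability and Bayes formulas."""
    p_pos = prior * p_pos_given_a + (1 - prior) * p_pos_given_not_a
    return prior * p_pos_given_a / p_pos


def discrete_mean_var(pmf):
    """Mean and variance of a discrete random variable given {x: p(x)}."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var


def binomial_pmf(x, n, p):
    """Binomial probability of exactly x successes in n trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_cdf(k, n, p):
    """Binomial probability of at most k successes in n trials."""
    return sum(binomial_pmf(x, n, p) for x in range(k + 1))


# Print each quantity so it can be compared with the answers in the test.
print(f"Part I   mean     = {sample_mean(data):.2f}")
print(f"Part I   variance = {sample_variance(data):.2f}")
print(f"Part I   std dev  = {math.sqrt(sample_variance(data)):.2f}")
print(f"Part IV  P(Antibody | Positive) = {posterior(0.01, 0.997, 0.015):.4f}")
mu, var = discrete_mean_var(revisions_pmf)
print(f"Part V   mean = {mu:.2f}, variance = {var:.4f}, sd = {math.sqrt(var):.4f}")
print(f"Part VI  P(X = 20)  = {binomial_pmf(20, 25, 0.8):.4f}")
print(f"Part VI  P(X <= 14) = {binomial_cdf(14, 25, 0.8):.4f}")
```

The sample variance uses the shortcut form rather than the sum of squared deviations because the test supplies the two sums directly. Both forms on the formula sheet give the same value.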