Chapter 18 Two-sample Problems

So far we have studied problems that involve one sample. For these we developed techniques to draw inference about the (unknown) mean of the population from which the sample is drawn.

Issue: In real life, we often have two samples from two independent populations, and we want to compare the two population means using the two samples. Two typical situations are:
1) We want to compare responses to two treatments. For example: compare cholesterol levels between a placebo group of subjects and a group receiving a new cholesterol-lowering drug.
2) We want to compare characteristics of two populations. For example: compare scores of males and females on a positive attitude test.
The data (two samples) for the situations above can arise from comparative experiments.

Goal: Develop techniques to draw inference for comparing the two (unknown) population means.

Notation:
$\mu_1$: population mean of the first population
$\mu_2$: population mean of the second population
$\sigma_1$: population standard deviation of the first population
$\sigma_2$: population standard deviation of the second population
$n_1$: size of the SRS drawn from the first population
$n_2$: size of the SRS drawn from the second population

Assumptions for comparing two population means:
1) The two samples are SRSs from two distinct populations.
2) The samples are independent! The responses of subjects in one sample have no influence on the responses of subjects in the other sample.
3) Both populations are normally distributed, i.e., the first population is $N(\mu_1, \sigma_1)$ and the second is $N(\mu_2, \sigma_2)$.
- $\mu_1$, $\mu_2$, $\sigma_1$ and $\sigma_2$ are all unknown (the realistic situation!).

The parameter of interest is $\mu_1 - \mu_2$. We want to develop a CI for $\mu_1 - \mu_2$. An obvious estimate of $\mu_1 - \mu_2$ is $\bar{x}_1 - \bar{x}_2$.

Idea: Large differences between the sample means suggest that the population means are likely to be different. However, large differences can also arise just by chance if the observations vary a great deal. So the variability of $\bar{x}_1 - \bar{x}_2$ needs to be accounted for.
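To see how large differences can arise purely by chance, a small simulation helps. The Python sketch below is an illustration under assumed values, not part of the original notes: it draws many pairs of independent SRSs from two normal populations with the same mean (the settings $\mu_1 = \mu_2 = 10$, $\sigma_1 = \sigma_2 = 6$, $n_1 = 11$, $n_2 = 10$ are hypothetical) and records $\bar{x}_1 - \bar{x}_2$ each time. Even though $\mu_1 - \mu_2 = 0$, the differences spread out, with a standard deviation close to $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. The function name `simulate_mean_differences` is our own.

```python
import math
import random
import statistics

def simulate_mean_differences(mu1, sigma1, n1, mu2, sigma2, n2, reps=20000, seed=1):
    """Draw `reps` pairs of independent normal SRSs; return the list of xbar1 - xbar2."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return diffs

# Hypothetical populations with EQUAL means (mu1 = mu2 = 10) but a large spread.
diffs = simulate_mean_differences(10, 6, 11, 10, 6, 10)
empirical_sd = statistics.stdev(diffs)
theoretical_sd = math.sqrt(6**2 / 11 + 6**2 / 10)   # sqrt(sigma1^2/n1 + sigma2^2/n2)
share_big = sum(abs(d) > 3 for d in diffs) / len(diffs)

print(f"SD of xbar1 - xbar2: simulated {empirical_sd:.3f}, formula {theoretical_sd:.3f}")
print(f"Share of |xbar1 - xbar2| > 3 even though mu1 = mu2: {share_big:.3f}")
```

With these assumed settings, roughly a quarter of the simulated differences exceed 3 in absolute value, so a gap of that size between two sample means is weak evidence on its own unless it is judged against the spread of $\bar{x}_1 - \bar{x}_2$.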
The variability of $\bar{x}_1 - \bar{x}_2$ is measured by its standard deviation:
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$
Since the population standard deviations $\sigma_1, \sigma_2$ are unknown, we replace them with the sample standard deviations $s_1, s_2$, which gives the standard error
$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$

Recall from Chapter 17 the form of the CI for $\mu$ (when $\sigma$ is unknown):
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}.$$
In general terms it has the form: estimate $\pm\ t^* \times$ (standard error of the estimate).
Analogously, using the general form of the t CI, the CI for $\mu_1 - \mu_2$ is given by
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
Here $t^*$ is the upper $\alpha/2$ critical value of the $t(k)$ distribution, where $k = \min(n_1 - 1, n_2 - 1)$ is the degrees of freedom. This CI has confidence level at least $1 - \alpha$.

Ex: Influence of gene aP2
A geneticist at a medical center is studying the influence of gene aP2 on diabetes. She compares the level of insulin in a random sample of 11 normal mice with another random sample of 10 mice whose gene aP2 has been removed. The following results (in ng/ml) are obtained:

Group        Normal   aP2 removed
n            11       10
Mean         5.90     0.75
Std. dev.    2.850    0.632

(a) Compute a 95% CI for the difference in the mean insulin levels of normal mice and aP2-removed mice.
(b) Compute a 90% CI for the difference in the mean insulin levels of normal mice and aP2-removed mice.
(c) Suppose the geneticist wants to test whether there is a significant difference in the mean insulin levels of normal mice and aP2-removed mice at the 5% level. Perform the test using confidence intervals with the four-step process.

Key points to check in solving problems that involve drawing inference about population mean(s):
- Check whether the question is about one sample or two samples.
- If it is about one sample, check whether the population standard deviation $\sigma$ is given.
  - If $\sigma$ is given, use the z CI.
  - If $\sigma$ is not given, use the one-sample t CI.
- If it is about two samples, check whether the samples are independent or matched pairs (not independent).
  - If the samples are matched pairs, apply one-sample t procedures to the differences of the observed responses.
  - If the samples are independent, apply the two-sample t CI.

Ex: The diameter of Jupiter is measured 100 times independently by a new unbiased process.
Using these 100 measurements, a 99% CI for the true diameter is computed to be (88,707 miles, 88,733 miles). Is there evidence at the 1% level that the true diameter of Jupiter is not 88,720 miles? Use the four-step process.

Ex: Suppose a manufacturer of printers for personal computers wishes to estimate the mean number of characters printed before the printhead fails. The manufacturer tests 15 printheads and records the number of characters printed until failure for each. These 15 measurements (in millions of characters) are listed below.
1.13 1.55 1.43 0.92 1.25 1.36 1.32 0.85 1.07 1.48 1.20 1.33 1.18 1.22 1.29
(a) Construct a 99% confidence interval for the mean number of characters printed before the printhead fails.
(b) The store manager is interested in knowing whether the mean number of characters printed before the printhead fails is one million or not. State the appropriate hypotheses. Using the above CI, what do you conclude? Use the four-step process.

Ex: The Chapin Social Insight Test is a psychological test designed to measure how accurately a person appraises other people. The possible scores on the test range from 0 to 41. During the development of the Chapin test, it was given to several different groups of people. Here are the results for male and female college students majoring in the liberal arts:

Group   Sex      n     Mean    Std. dev.
1       Male     133   25.34   5.05
2       Female   162   24.94   5.44

Do these data support the contention that female and male students differ in average social insight at significance level $\alpha = 0.1$? Use the four-step process.
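The interval computations in these examples (the two-sample CI for the aP2 mice, the one-sample CI for the printheads, the two-sample CI for the Chapin scores, and the "test via CI" decisions) can be checked numerically. Below is a minimal Python sketch using only the standard library. Because Python's standard library has no t distribution, the critical value $t^*$ is found by integrating the Student t density with Simpson's rule and then bisecting; this helper is our own stand-in for a t table, not something from the notes. All function names (`t_star`, `one_sample_t_ci`, `two_sample_t_ci`, `rejects`) are hypothetical. The two-sample interval uses the conservative $k = \min(n_1 - 1, n_2 - 1)$ degrees of freedom from these notes, not the Welch degrees of freedom that statistical software usually reports.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_star(conf, df):
    """Upper (1 - conf)/2 critical value of t(df), found by bisection."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def one_sample_t_ci(data, conf):
    """xbar +/- t* s / sqrt(n), with n - 1 degrees of freedom."""
    n = len(data)
    xbar, s = statistics.mean(data), statistics.stdev(data)
    m = t_star(conf, n - 1) * s / math.sqrt(n)
    return (xbar - m, xbar + m)

def two_sample_t_ci(xbar1, s1, n1, xbar2, s2, n2, conf):
    """Conservative two-sample t CI for mu1 - mu2 with k = min(n1 - 1, n2 - 1)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    k = min(n1 - 1, n2 - 1)
    m = t_star(conf, k) * se
    diff = xbar1 - xbar2
    return (diff - m, diff + m)

def rejects(ci, null_value):
    """Two-sided test via CI: reject H0 when the null value lies outside the CI."""
    return not (ci[0] <= null_value <= ci[1])

# Gene aP2: normal mice (group 1) vs. aP2-removed mice (group 2); H0: mu1 - mu2 = 0.
ap2_95 = two_sample_t_ci(5.90, 2.850, 11, 0.75, 0.632, 10, 0.95)
ap2_90 = two_sample_t_ci(5.90, 2.850, 11, 0.75, 0.632, 10, 0.90)

# Jupiter: the 99% CI is given directly; H0: mu = 88,720 miles.
jupiter_99 = (88707, 88733)

# Printheads: 99% one-sample t CI; H0: mu = 1 (million characters).
heads = [1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
         1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29]
heads_99 = one_sample_t_ci(heads, 0.99)

# Chapin: male (group 1) vs. female (group 2); a 90% CI matches alpha = 0.1.
chapin_90 = two_sample_t_ci(25.34, 5.05, 133, 24.94, 5.44, 162, 0.90)

for name, ci, h0 in [("aP2 95%", ap2_95, 0), ("aP2 90%", ap2_90, 0),
                     ("Jupiter 99%", jupiter_99, 88720),
                     ("printhead 99%", heads_99, 1), ("Chapin 90%", chapin_90, 0)]:
    print(f"{name}: ({ci[0]:.3f}, {ci[1]:.3f}); reject H0 = {rejects(ci, h0)}")
```

The CI-based test only matches a two-sided test at level $\alpha$ when the interval has confidence level $1 - \alpha$, which is why part (c) of the aP2 example uses the 95% interval and the Chapin question uses a 90% interval.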