STAT 110: Section 7.1 – Comparing Two Population Means Using Dependent Samples May 2012

In this section, we will investigate methods for comparing two or more population means. Specifically, we will discuss the following:

Comparing two means with dependent samples (Section 7.1)
Comparing two means with independent samples (Section 7.2)

7.1 - COMPARING TWO POPULATION MEANS: DEPENDENT SAMPLES

First, we will consider methods that allow us to make comparisons on numerical variables between two different groups. The hypothesis testing procedures presented in this section should be used when the groups being compared are NOT INDEPENDENT of each other. Whether two groups are independent or dependent is usually determined by how the data are collected. Consider the following example.

Example 7.1: The degree of clinical agreement among different physicians on the presence or absence of generalized lymphadenopathy was assessed in 32 randomly selected participants from a prospective study of male sexual contacts of men with AIDS or an AIDS-related condition. The total number of palpable lymph nodes was assessed by two physicians, and interest lies in comparing the two physicians. The data are given in the file LymphNodes.JMP. A portion of the data is shown below.

Comment: For these data, the first observation listed under Doctor A IS directly related to the first observation listed under Doctor B because the two measurements are being made on the same patient. We have randomly selected patients; therefore, once we have chosen Patient 1 to be assessed by Doctor A, the same patient will be assessed by Doctor B. Thus, these two samples are dependent.

If each observation from population 1 is directly related to one observation from population 2, then we have dependent samples.

Could this experiment be conducted in such a way that the samples are independent? If so, how?
What are the disadvantages of this approach?

Back to Example 7.1

Research Question: Is there statistical evidence that these two physicians disagree in their assessments of generalized lymphadenopathy?

To answer this question, the testing procedure works with the DIFFERENCES and not the actual measurements. We do this to remove the structure of dependence between the two groups. In JMP, we create an additional column (by double-clicking on the empty column next to “Doctor B”) and title it “Difference”. Right-click on the new column header and select Formula. In the edit window, tell JMP to calculate the difference as follows:

Click Apply and then OK, and JMP returns the following:

Now, the parameter of interest is the true population AVERAGE OF THE DIFFERENCES, which we will represent by μ_difference. Our best estimate for this parameter is the sample mean of the observed differences. We’ll call this quantity x̄_difference. The sample standard deviation of the differences will be denoted by s_difference.

Comment: Note that these differences represent a single column of data. Therefore, the testing procedure is the SAME as the procedure for testing a single population mean that we covered in Section 6.1.

The Procedure for Comparing Two Population Means (with Dependent Samples)

Step 0: Check the assumptions behind the test to be sure that the test is valid. For this particular hypothesis test, we must check the following:
a. Either the number of pairs is sufficiently large, OR
b. It is reasonable to assume the DIFFERENCES are normally distributed.

Step 1: Convert the research question into H0 and Ha.

Two-Tailed Test:
H0:
Ha:

Upper-Tailed Test:
H0:
Ha:

Lower-Tailed Test:
H0:
Ha:

Step 2: Determine α, the level of significance.

Step 3: Calculate a test statistic from your data.
For this test, the test statistic is

$$t = \frac{\bar{x}_{\text{difference}} - \mu_{\text{difference}}}{s_{\text{difference}} / \sqrt{n}}$$

where n is the number of pairs and μ_difference is the value specified in H0. (Note: this is the same test statistic used in Section 6.)

Step 4: Determine the p-value and make a decision concerning H0.

Decision Rule: If the p-value is less than α, then the data are said to support the alternative hypothesis. That is, we reject H0 in favor of Ha.

Step 5: Write a conclusion in terms of the original research question. You should state your p-value in your conclusion.

Back to Example 7.1: Carry out the hypothesis test to determine whether there is statistical evidence that these two physicians disagree in their assessment of generalized lymphadenopathy. When setting up hypotheses, let µ1 and µ2 represent the true mean number of palpable lymph nodes found by Doctor A and Doctor B, respectively.

Step 0: Check the assumptions behind the test to be sure that the test is valid. Is the number of pairs sufficiently large? If not, is it reasonable to assume the differences are normally distributed? Select Analyze > Distribution and move Difference to the Y, Columns box.

Step 1: Convert the research question into H0 and Ha.
H0:
Ha:

Step 2: Determine α, the level of significance.

Step 3: Calculate a test statistic from your data.

Verify the test statistic using our formula:

$$t = \frac{\bar{x}_{\text{difference}} - \mu_{\text{difference}}}{s_{\text{difference}} / \sqrt{n}} =$$

Step 4: Determine the p-value and make a decision concerning H0.
p-value =
Decision:

Step 5: Write a conclusion in terms of the original research question.
“We have evidence that these two physicians disagree in their assessments of generalized lymphadenopathy (p-value < .0001).”

Next, obtain the 95% confidence interval for the difference in means:

Questions:
1. Do the doctors agree in the number of palpable lymph nodes found?
2. Which doctor tends to find more palpable lymph nodes?
3. To what degree do the doctors differ?

Example 7.2: The data in the file Captopril.JMP give the systolic and diastolic blood pressures for 15 patients with moderate essential hypertension, immediately before and two hours after taking a drug, captopril. Our interest is in investigating the response to the drug treatment. In particular, researchers would like to determine whether the systolic blood pressure decreases by more than 10 mmHg and whether the diastolic blood pressure decreases by more than 5 mmHg after taking captopril. Let µ1 and µ2 represent the population means of the blood pressures before and after taking the drug, respectively.

Question: Are these samples dependent or independent? Explain.

The research question is whether or not the average blood pressure is lower after taking the drug. Using JMP, carry out the formal hypothesis test to answer this question.

Step 0: Check the assumptions behind the test to be sure that the test is valid. Is the number of pairs sufficiently large? If not, is it reasonable to assume the differences in systolic and diastolic blood pressures are normally distributed?

Systolic Only

Step 1: Convert the research question into H0 and Ha.
H0:
Ha:

Step 2: Determine α, the level of significance.

Step 3: Calculate a test statistic from your data.

Step 4: Determine the p-value and make a decision concerning H0.
p-value =
Decision:

Step 5: Write a conclusion in terms of the original research question.

Finally, construct a 95% confidence interval for the difference in systolic blood pressure means:

Questions:
1. Interpret this confidence interval.
2. Does this interval agree with the results of the hypothesis test? Explain.

Example 7.2 (cont’d): Diastolic blood pressure

For diastolic blood pressure we have the following results.

H0:
Ha:
p-value =
Conclusion:

Interpret the confidence interval for the mean difference in diastolic blood pressure. Is this a contradiction of our conclusion from the hypothesis test? Explain.
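The handout carries out every step in JMP. For readers who want to check the arithmetic by hand, the following is a minimal Python sketch of the same idea: form the differences, then apply the one-sample t formula to that single column. It is not part of the original course materials. The function name `paired_t_summary`, its parameters, and the five before/after readings are all hypothetical and invented for illustration; they are not taken from LymphNodes.JMP or Captopril.JMP. The sketch stops at the test statistic and, if a critical value from a t table is supplied, a confidence interval; the p-value would still come from JMP or a t table.

```python
import math
import statistics


def paired_t_summary(first, second, null_diff=0.0, t_crit=None):
    """Paired (dependent-samples) t summary computed from the differences.

    first, second -- equal-length sequences; position i in each list must
                     refer to the same subject (e.g. the same patient).
    null_diff     -- value of mu_difference stated in H0.
    t_crit        -- optional critical value read from a t table with
                     n - 1 degrees of freedom; if given, a confidence
                     interval for mu_difference is also returned.
    """
    if len(first) != len(second):
        raise ValueError("dependent samples must have the same number of observations")

    # Work with the differences, not the original measurements.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")

    mean_diff = statistics.mean(diffs)   # x-bar_difference
    sd_diff = statistics.stdev(diffs)    # s_difference (n - 1 in the denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    std_err = sd_diff / math.sqrt(n)

    summary = {
        "n": n,
        "df": n - 1,
        "mean_diff": mean_diff,
        "sd_diff": sd_diff,
        "t": (mean_diff - null_diff) / std_err,
    }
    if t_crit is not None:
        margin = t_crit * std_err
        summary["ci"] = (mean_diff - margin, mean_diff + margin)
    return summary


# Hypothetical before/after readings for five subjects (invented for illustration).
before = [150, 160, 145, 155, 170]
after = [140, 148, 139, 147, 155]

# 2.776 is the two-sided 95% critical value of the t distribution with 4 df.
result = paired_t_summary(before, after, null_diff=0.0, t_crit=2.776)
print(result)
```

Setting `null_diff` to a nonzero value (for example 10 for a claimed decrease of more than 10 mmHg) changes only the numerator of t; the differences and their standard error are computed the same way.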