Topic (14) – COMPARING TWO POPULATIONS (OR TREATMENTS)

A) Two Population Proportions Using Independent Samples

EXAMPLE: The article “Foraging Behavior in the Indian False Vampire Bat” reported that 36 of 193 female bats in flight spent more than 5 minutes in the air before locating food. For male bats, 64 of 168 exceeded 5 minutes when locating food. Is there sufficient evidence to indicate that the proportion of flights taking longer than 5 minutes differs for the two sexes?

Note: we have two independent samples, and the interest is in comparing the proportions for the two sexes.

Notation:

Population   Population Proportion   Sample Size   Sample Proportion
1            $\pi_1$                 $n_1$         $p_1$
2            $\pi_2$                 $n_2$         $p_2$

To compare 2 population proportions based on 2 independent samples, we shall consider the size of the difference $\pi_1 - \pi_2$:

$\pi_1 - \pi_2 = 0 \Rightarrow \pi_1 = \pi_2$
$\pi_1 - \pi_2 > 0 \Rightarrow \pi_1 > \pi_2$
$\pi_1 - \pi_2 < 0 \Rightarrow \pi_1 < \pi_2$

Our sampling estimator of this difference is the difference in the sample proportions, $p_1 - p_2$, when the two samples are independent of one another.

Sampling Distribution of $p_1 - p_2$ when the two samples are independently and randomly taken:

1) the mean of the distribution is $\mu_{p_1 - p_2} = \pi_1 - \pi_2$ (that is, $p_1 - p_2$ is unbiased)
2) the standard deviation of the distribution is
\[ \sigma_{p_1 - p_2} = \sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}} \]
3) the shape of the distribution is approximately normal (a bell curve) if both $n_1$ and $n_2$ are large.

The sample sizes are large enough to invoke the CLT if both
1) $n_1 p_1 \ge 10$ and $n_1(1 - p_1) \ge 10$, and
2) $n_2 p_2 \ge 10$ and $n_2(1 - p_2) \ge 10$.

So, if $p_1 - p_2$ is at least approximately normally distributed, we get that
\[ z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\text{estimate of } \sigma_{p_1 - p_2}} \]
has an approximate standard normal distribution, i.e.
z is approximately N(0,1).

As we’ll see, the estimator of
\[ \sigma_{p_1 - p_2} = \sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}} \]
depends on whether we are constructing a confidence interval or performing a test of the difference in the two population proportions. Let’s look at hypothesis testing first.

LARGE SAMPLE HYPOTHESIS TEST OF THE DIFFERENCE OF TWO POPULATION PROPORTIONS BASED ON TWO INDEPENDENT SAMPLES:

Null hypothesis: H0: $\pi_1 - \pi_2 = 0$

Alternative hypothesis is one of three:
a) HA: $\pi_1 - \pi_2 > 0$
b) HA: $\pi_1 - \pi_2 < 0$
c) HA: $\pi_1 - \pi_2 \ne 0$

Test Statistic:
\[ z = \frac{p_1 - p_2}{\sqrt{p_C(1 - p_C)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
where
\[ p_C = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{\text{total number of successes in both samples}}{\text{total sample size}} \]

P-value: depends on the alternative hypothesis:
a) P-value = Pr(Z > z)
b) P-value = Pr(Z < z)
c) P-value = 2 Pr(Z < −|z|)

Decision Rule: reject H0 if P-value ≤ α

Assumptions:
1. $n_1$ and $n_2$ are large enough for the sample proportions to be approximately normally distributed
2. the sampling was random, and each sample is no more than 5% of its population
3. the two samples are independent

EXAMPLE Bats:

Sample Statistics:

Population   Sample Size    # Successes   Sample Proportion
1 = female   $n_1$ = 193    36            $p_1$ = 36/193 = .1865
2 = male     $n_2$ = 168    64            $p_2$ = 64/168 = .3810

Hypotheses: H0: $\pi_1 - \pi_2 = 0$   HA: $\pi_1 - \pi_2 \ne 0$

Significance level: let’s use α = 0.05

Assumptions: $n_1 p_1 \ge 10$, $n_1(1 - p_1) \ge 10$, $n_2 p_2 \ge 10$, $n_2(1 - p_2) \ge 10$ have been met. And we have 2 random samples.
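As a cross-check on the hand calculation, the pooled test statistic and P-value defined above can be computed in a few lines of Python. This is only a sketch: the function names `normal_cdf` and `two_proportion_z_test` and the `alternative` argument are illustrative choices, not part of these notes, and the standard normal CDF is obtained from `math.erfc` rather than from a table.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, Pr(Z < z), via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled large-sample z test of H0: pi1 - pi2 = 0.

    x1, x2 are the numbers of successes; n1, n2 are the sample sizes.
    alternative is "greater", "less", or "two-sided" (hypothetical labels).
    """
    p1, p2 = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)                       # common (pooled) proportion
    se = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    if alternative == "greater":
        p_value = 1 - normal_cdf(z)                   # Pr(Z > z)
    elif alternative == "less":
        p_value = normal_cdf(z)                       # Pr(Z < z)
    else:
        p_value = 2 * normal_cdf(-abs(z))             # 2 Pr(Z < -|z|)
    return z, p_value


# Bat data: 36 of 193 females and 64 of 168 males exceeded 5 minutes.
z, p_value = two_proportion_z_test(36, 193, 64, 168)
print(round(z, 2), p_value < 0.0001)
```

For the bat data this gives z of about −4.12 and a two-sided P-value well below 0.0001.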
Test Statistic: first we need to calculate the common proportion
\[ p_C = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{36 + 64}{193 + 168} = .277 \]
Then,
\[ z = \frac{p_1 - p_2}{\sqrt{p_C(1 - p_C)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.1865 - .3810}{\sqrt{.277(1 - .277)\left(\frac{1}{193} + \frac{1}{168}\right)}} = -4.12 \]

P-value: = 2 Pr(Z < −|z|) = 2 Pr(Z < −4.12) < 0.0001 ≈ 0+

Alternative Decision Rule: Reject H0 if the test statistic meets the condition |z| > z*(1−α/2):
|z| = |−4.12| = 4.12 > z*(1−α/2) = z*(0.975) = 1.96

Conclusions: We reject the null hypothesis since the P-value < 0.0001 is far below α = 0.05. There is strong evidence, based on these samples, that the population proportion of female false vampire bats taking longer than 5 minutes before locating food is different from the proportion for male bats doing the same.

Now, we would like to estimate the size of the difference between the two proportions. That’s done with a confidence interval.

LARGE SAMPLE CONFIDENCE INTERVAL ESTIMATION OF THE DIFFERENCE OF TWO PROPORTIONS BASED ON INDEPENDENT SAMPLES:

Interval Estimator:
\[ (p_1 - p_2) \pm (z \text{ critical value}) \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} \]
where the z critical value is based on the confidence level desired

Assumptions:
1. $n_1$ and $n_2$ are large enough for the sample proportions to be approximately normally distributed
2. the sampling was random, and each sample is no more than 5% of its population
3. the two samples are independently taken

Note that the estimator of the standard deviation of $p_1 - p_2$ is different from the one used in hypothesis testing! The test pools the two samples because H0 says the two proportions are equal; the confidence interval makes no such assumption, so each sample proportion is used on its own.

EXAMPLE: For the bats, let’s use a 90% C.I. to estimate the difference between females and males in the proportion taking more than 5 minutes to locate food. From topic 11, the z critical value for 90% is 1.645. So, a 90% C.I.
is
\[ (p_1 - p_2) \pm 1.645 \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} = (.187 - .381) \pm 1.645 \sqrt{\frac{.187(1 - .187)}{193} + \frac{.381(1 - .381)}{168}} \]
\[ = -.194 \pm 1.645(.0468) = -.194 \pm .077 = (-.271,\ -.117) \]

Hence, with 90% confidence, the population proportion of female false vampire bats that spend more than 5 minutes locating food is between 11.7 and 27.1 percentage points lower than the population proportion of male bats that spend more than 5 minutes locating food.

EXAMPLE Old Faithful, the geyser at Yellowstone National Park, is known to have two distinct types of eruptions: long duration (> 3 minutes) and short duration (< 3 minutes). If the types of eruptions are equally likely at all times of the day, then the proportion of long duration eruptions occurring during the day should be the same as the proportion at night. A geologist hypothesized that the length of duration was affected by solar heating during the day and hence, the proportion of daytime long duration eruptions should be higher than the nighttime proportion. Two samples were taken in August using randomly selected dates. The geologist observed 53% long duration eruptions during the day (out of 35 eruptions) and 49% (out of 41 eruptions) at night. Is there sufficient evidence to support the geologist’s claim? Use a significance level of 0.025.

Let day eruptions be population #1 and night, #2.

Hypotheses: H0: $\pi_1 - \pi_2 = 0$   HA: $\pi_1 - \pi_2 > 0$

Significance level: α = 0.025

Assumptions: $n_1 p_1 \ge 10$, $n_1(1 - p_1) \ge 10$, $n_2 p_2 \ge 10$, $n_2(1 - p_2) \ge 10$ have been met. And we have 2 random samples.
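The claim that the large-sample conditions are met is easy to verify numerically. The short Python sketch below does so for the Old Faithful samples; the helper name `large_sample_ok` and its `threshold` parameter are hypothetical, chosen only for illustration.

```python
def large_sample_ok(n, p, threshold=10):
    """True when both n*p and n*(1 - p) are at least the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


# (sample size, sample proportion of long-duration eruptions)
samples = {"day": (35, 0.53), "night": (41, 0.49)}

for label, (n, p) in samples.items():
    successes = round(n * p, 2)          # expected count of long eruptions
    failures = round(n * (1 - p), 2)     # expected count of short eruptions
    print(label, successes, failures, large_sample_ok(n, p))
```

All four products come out well above 10, so the normal approximation is reasonable for both samples.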
Test Statistic: first we need the common proportion
\[ p_C = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{35(.53) + 41(.49)}{35 + 41} = 0.508 \]
Then,
\[ z = \frac{p_1 - p_2}{\sqrt{p_C(1 - p_C)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.53 - .49}{\sqrt{.51(1 - .51)\left(\frac{1}{35} + \frac{1}{41}\right)}} = 0.35 \]

P-value: = Pr(Z > z) = Pr(Z > 0.35) = 1 − Pr(Z < 0.35) = 1 − 0.6368 = 0.3632

Conclusions: We fail to reject the null hypothesis since the P-value = 0.36 is far above α = 0.025. There is insufficient evidence, based on these samples, to support the geologist’s contention that the proportion of long duration geyser eruptions is higher during the day than the proportion at night.

Do we need a CI here?

B) Two Population Means Using Independent Samples

EXAMPLE A scientist is interested in determining which of two butterfly species has a larger wingspan. Species 1 is found on forest understory plants and tends to feed on its nursery plants. Thus it doesn’t travel far. The other species is found on open field flowers and migrates seasonally. She hypothesizes that the migrating species has larger average wingspans than the forest species and plans to take two samples to test her hypothesis.

Notation:

Population   Population Mean   Population Standard Deviation   Sample Size   Sample Mean   Sample Standard Deviation
1            $\mu_1$           $\sigma_1$                      $n_1$         $\bar{x}_1$   $s_1$
2            $\mu_2$           $\sigma_2$                      $n_2$         $\bar{x}_2$   $s_2$

To compare 2 population means, we shall consider the size of the difference $\mu_1 - \mu_2$:

$\mu_1 - \mu_2 = 0 \Rightarrow \mu_1 = \mu_2$
$\mu_1 - \mu_2 > 0 \Rightarrow \mu_1 > \mu_2$
$\mu_1 - \mu_2 < 0 \Rightarrow \mu_1 < \mu_2$

Our sampling estimator of this population difference is the sample mean difference $\bar{x}_1 - \bar{x}_2$ when the two samples are independent of one another.
Sampling Distribution of $\bar{x}_1 - \bar{x}_2$ when the two samples are independently and randomly taken:

1) the mean of the distribution is $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$ (that is, $\bar{x}_1 - \bar{x}_2$ is unbiased)
2) the standard deviation of the distribution is
\[ \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
3) the shape of the distribution is approximately normal (a bell curve) if
   a) both $n_1$ and $n_2$ are large, or
   b) both of the populations being sampled are approximately normally distributed

The estimator of $\mu_1 - \mu_2$ is $\bar{x}_1 - \bar{x}_2$, and the estimator of
\[ \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
depends on whether $\sigma_1 \ne \sigma_2$ (unequal variance case) or $\sigma_1 = \sigma_2$ (equal variance case).

When $\sigma_1 \ne \sigma_2$, the estimator of $\sigma_{\bar{X}_1 - \bar{X}_2}$ is given by
\[ s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} . \]

What are the degrees of freedom for $s_{\bar{X}_1 - \bar{X}_2}$? Satterthwaite showed that the appropriate degrees of freedom for this estimator are
\[ df = \frac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}} \quad \text{where } V_1 = \frac{s_1^2}{n_1} \text{ and } V_2 = \frac{s_2^2}{n_2} . \]

When $\sigma_1 = \sigma_2$, the estimator of $\sigma_{\bar{X}_1 - \bar{X}_2}$ is given by
\[ s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_c^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]
where the estimator of the common variance is
\[ s_c^2 = \frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2} . \]
The degrees of freedom for this estimator are $n_1 + n_2 - 2$.

So, if $\bar{x}_1 - \bar{x}_2$ is at least approximately normally distributed, we get that
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \quad \text{or} \quad t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_c^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \]
have approximate t-distributions.
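The two standard-error estimators and their degrees of freedom can be sketched as small Python functions. The names `unequal_variance_se_df` and `pooled_se_df` are made up for illustration, and the standard deviations and sample sizes in the usage lines are hypothetical numbers, not data from these notes.

```python
import math


def unequal_variance_se_df(s1, n1, s2, n2):
    """Standard error and Satterthwaite df for the unequal variance case."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df


def pooled_se_df(s1, n1, s2, n2):
    """Standard error and df for the equal variance (pooled) case."""
    sc2 = (s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / (n1 + n2 - 2)
    se = math.sqrt(sc2 * (1 / n1 + 1 / n2))
    return se, n1 + n2 - 2


# Hypothetical case 1: equal sample SDs and equal sample sizes.
print(unequal_variance_se_df(2.0, 10, 2.0, 10))
print(pooled_se_df(2.0, 10, 2.0, 10))

# Hypothetical case 2: very different sample SDs, same sample sizes.
print(unequal_variance_se_df(1.0, 10, 3.0, 10))
print(pooled_se_df(1.0, 10, 3.0, 10))
```

With equal sample sizes the two standard errors coincide; what changes is the degrees of freedom, which drop well below $n_1 + n_2 - 2$ in the unequal variance case when the sample standard deviations are far apart.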
HYPOTHESIS TEST OF THE DIFFERENCE IN TWO POPULATION MEANS BASED ON TWO INDEPENDENT SAMPLES:

Null hypothesis: H0: $\mu_1 - \mu_2 = D_0$, where $D_0$ is the hypothesized difference in the means

Alternative hypothesis is one of three:
a) HA: $\mu_1 - \mu_2 > D_0$
b) HA: $\mu_1 - \mu_2 < D_0$
c) HA: $\mu_1 - \mu_2 \ne D_0$

Test Statistic: either
\[ (1) \quad t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
or
\[ (2) \quad t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_c^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \]

The df are, for (1):
\[ df = \frac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}} \quad \text{where } V_1 = \frac{s_1^2}{n_1} \text{ and } V_2 = \frac{s_2^2}{n_2} \]
and for (2): $n_1 + n_2 - 2$.

P-value: depends on the alternative hypothesis:
a) P-value = Pr(T > t)
b) P-value = Pr(T < t)
c) P-value = 2 Pr(T > |t|)

Decision Rule: reject H0 if P-value ≤ α

Assumptions:
1. $n_1$ and $n_2$ are large enough for the sample means to be approximately normally distributed
2. the sampling was random, and each sample is no more than 5% of its population
3. the two samples are independently taken

EXAMPLE Nitrogen is the most common nutrient applied to soils. In tropical areas with warm temperatures and heavy rainfall, only part of the applied nitrogen is used by crops and the rest is lost. Information about the mean nitrogen loss (N-loss) is important for research on optimal growth of plants. To that end, two nitrogen fertilizer treatments are to be compared for their average N-loss: Urea alone (1) and Urea+N-Serve (2). A sugarcane field was divided into equal size plots, and plots were randomly assigned to one of the two treatments. There were sufficient numbers of plots so that no treated plots were adjacent on any side.
Important Point about Experimental Design: when planning an experiment to compare two or more treatments:

1) experimental units (plants, field plots, people, etc.) should be randomly selected from the larger group from which they could be selected (the population of potential experimental units)
2) treatments should be randomly assigned to the experimental units
3) extraneous or confounding factors should be considered and minimized when assigning and running the experiment (e.g. all units should be the same size, have the same weather conditions, etc.)

The following data represent nitrogen loss (% of total N applied) over a 16 week period:

Fertilizer   Percentage N-loss
UN           10.8, 10.5, 14.0, 13.5, 8.0, 9.5, 11.8, 10.0, 8.7, 9.0, 9.8, 13.8, 14.7, 10.3, 12.8
U            8.0, 7.3, 14.1, 9.8, 7.1, 6.3, 10.0, 7.1, 7.9, 6.1, 6.9, 11.0, 10.0

[Figure: histograms of NLOSS by treatment (U and UN); axes: NLOSS, Count]

Group   N    Mean     SD      S2
U       13   8.585    2.288   5.235
UN      15   11.147   2.140   4.580

Question: Is there sufficient evidence to support the hypothesis that the two treatments differ in their mean percentage N-loss?

Hypotheses: H0: $\mu_1 - \mu_2 = 0$   HA: $\mu_1 - \mu_2 \ne 0$

Significance level: α = 0.05

Test Statistic (assuming unequal variances):
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(8.58 - 11.15) - 0}{\sqrt{\dfrac{(2.29)^2}{13} + \dfrac{(2.14)^2}{15}}} = -3.045 \]

Degrees of Freedom:
\[ V_1 = \frac{s_1^2}{n_1} = \frac{(2.29)^2}{13} = 0.4034, \qquad V_2 = \frac{s_2^2}{n_2} = \frac{(2.14)^2}{15} = 0.3053 \]
\[ df = \frac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}} = \frac{(.4034 + .3053)^2}{\dfrac{(.4034)^2}{13 - 1} + \dfrac{(.3053)^2}{15 - 1}} = 24.8 \]
round down to 24 df.
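The unequal variance calculation above can be reproduced directly from the raw N-loss data. The sketch below uses only Python’s standard library; the variable names are illustrative. Because it works from unrounded means and variances, its results can differ slightly from a hand calculation that starts from rounded summary statistics.

```python
import math
from statistics import mean, variance

# Percentage N-loss: Urea alone (population 1) and Urea+N-Serve (population 2)
u = [8.0, 7.3, 14.1, 9.8, 7.1, 6.3, 10.0, 7.1, 7.9, 6.1, 6.9, 11.0, 10.0]
un = [10.8, 10.5, 14.0, 13.5, 8.0, 9.5, 11.8, 10.0, 8.7, 9.0,
      9.8, 13.8, 14.7, 10.3, 12.8]

n1, n2 = len(u), len(un)
v1 = variance(u) / n1       # V1 = s1^2 / n1 (sample variance, n - 1 divisor)
v2 = variance(un) / n2      # V2 = s2^2 / n2

# Test statistic for H0: mu1 - mu2 = 0, unequal variance case
t = (mean(u) - mean(un)) / math.sqrt(v1 + v2)

# Satterthwaite degrees of freedom
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(round(t, 3), round(df, 2), math.floor(df))
```

The result is t of about −3.045 with roughly 24.85 degrees of freedom, which rounds down to 24.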
P-value: 2 Pr(T > |t|) ≈ 2 Pr(T > 3.0) = 2(.003) = .006

Test Statistic (assuming equal variances):
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_c^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(8.58 - 11.15) - 0}{\sqrt{4.882\left(\dfrac{1}{13} + \dfrac{1}{15}\right)}} = -3.06 \]
where
\[ s_c^2 = \frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2} = \frac{5.235(12) + 4.580(14)}{13 + 15 - 2} = 4.882 \]

Degrees of Freedom: $n_1 + n_2 - 2 = 26$.

P-value: 2 Pr(T > |t|) ≈ 2 Pr(T > 3.0) = 2(.003) = .006

Conclusion: Regardless of the choice of test, the P-value = .006 is well below α = 0.05, so reject the null hypothesis. There is sufficient evidence to indicate that the two nitrogen treatments differ in their average percentage nitrogen loss at α = 0.05.

How do we identify which test statistic is appropriate? Well, we can either use the rule of thumb that the sample variances should be within a factor of 3 of each other, OR do a test of equality of the two variances, OR assume they are unequal.

Software output (Oneway Analysis of N-Loss By Treatment):

[Figure: plot of N-Loss by Treatment (U, UN)]

t Test                  Assuming equal variances   Assuming unequal variances
Difference (Estimate)   -2.5621                    -2.5621
Std Error               0.8373                     0.8414
Lower 95%               -4.2831                    -4.2870
Upper 95%               -0.8410                    -0.8371
t Test                  -3.060                     -3.045
DF                      26                         24.8474
Prob > |t|              0.0051                     0.0054

Since the two treatments do differ in their mean N-losses, I’d like to estimate the size of that difference.

CONFIDENCE INTERVAL ESTIMATION OF THE DIFFERENCE OF TWO MEANS BASED ON INDEPENDENT SAMPLES:

Interval Estimator:
\[ (\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value}) \times \text{estimator of } \sigma_{\bar{X}_1 - \bar{X}_2} \]
where the t critical value is based on the confidence level desired, and the degrees of freedom are calculated according to which estimator you use (equal or unequal variance).

Assumptions:
1. $n_1$ and $n_2$ are large enough for the sample means to be approximately normally distributed
2.
the sampling was random, and each sample is no more than 5% of its population
3. the two samples are independently taken

So, to go back to our example: A 95% confidence interval based on two independent samples is given by either
\[ (\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value}) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
or
\[ (\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value}) \sqrt{s_c^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]

From earlier:

Group   N    Mean     SD
U       13   8.585    2.288
UN      15   11.147   2.140

And the df = 24 for the unequal case and 26 for the equal case.

The t critical value for 95% and 24 df is 2.06. So,
\[ (8.58 - 11.15) \pm 2.06 \sqrt{\frac{(2.29)^2}{13} + \frac{(2.14)^2}{15}} \]
which the software output shown earlier reports as (−4.2870, −0.8371). The software carries more decimal places and the unrounded df, so a hand calculation agrees with it only to about one decimal place.

Similarly, the t critical value for 95% and 26 df is 2.056, so we obtain
\[ (8.58 - 11.15) \pm 2.056 \sqrt{4.882\left(\frac{1}{13} + \frac{1}{15}\right)} \]
which the software reports as (−4.2831, −0.8410).

Thus, with 95% confidence, the mean nitrogen loss from Urea alone is between 0.8 and 4.3 percentage points below the mean loss of the Urea+N-Serve combination.

EXAMPLE Discharge of industrial waste into rivers affects water quality. To assess the effect of a power plant on water quality, 24 samples were taken 16 km upstream of the plant and another 24 were taken 4 km downstream. Alkalinity (mg/l) was measured on each water sample. Do the data suggest that the true mean alkalinity below the plant is more than 50 mg/l higher than the true mean alkalinity upstream of the plant?

Output from a statistical software program:

Group        N    Mean    SD     Pop’ln
upstream     24   75.9    1.83   2
downstream   24   183.6   1.70   1

t-score = 113.2   df = 45   2-sided P-value = 0+

Hypotheses: H0: $\mu_1 - \mu_2 = 50$   HA: $\mu_1 - \mu_2 > 50$

Check the t-score:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(183.6 - 75.9) - 50}{\sqrt{\dfrac{(1.70)^2}{24} + \dfrac{(1.83)^2}{24}}} = 113.17 \]

Assumptions: 1) sample sizes large enough? 2) samples independent and random?
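The software’s t-score and df can be checked from the summary statistics alone. The following Python sketch assumes the unequal variance form of the test with $D_0 = 50$; the variable names are illustrative.

```python
import math

# Summary statistics: downstream is population 1, upstream is population 2
n1, xbar1, s1 = 24, 183.6, 1.70
n2, xbar2, s2 = 24, 75.9, 1.83
d0 = 50.0                                   # hypothesized difference under H0

v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
se = math.sqrt(v1 + v2)                     # estimated SD of xbar1 - xbar2
t = ((xbar1 - xbar2) - d0) / se

# Satterthwaite degrees of freedom, rounded down as in the notes
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(round(se, 4), round(t, 2), math.floor(df))
```

The standard error is about 0.51 mg/l, so an observed difference of 107.7 mg/l lies more than 100 standard errors above the hypothesized 50 mg/l.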
Conclusion: There is strong evidence to suggest that the average alkalinity of the water below the power plant is more than 50 mg/l higher than the mean alkalinity of the water above the power plant.

For a 95% confidence interval estimate of the difference we have: the t critical value for 95% and 45 df ≈ 2.02. So,
\[ (183.6 - 75.9) \pm 2.02 \sqrt{\frac{(1.70)^2}{24} + \frac{(1.83)^2}{24}} = 107.7 \pm 1.03 = (106.67,\ 108.73) \]
We conclude that the mean alkalinity below the power plant is between 106.7 and 108.7 mg/l higher than the mean alkalinity of the water above the power plant!

C) Comparing Two Population Means Using Paired Samples

Consider the following experiments:

1. In order to determine if two IQ tests yield similar results (means and standard deviations), the researcher selected 50 college students at random to take both tests. The order in which any given student took the tests was randomized, and the tests were taken 1 month apart to minimize crossover effects. The hypothesis is that test #1 is biased in that it yields a higher average score than test #2, which has been in use for many years. Note the experimental design here as well as the hypotheses being tested. We can’t use the independent samples test for this case.

Hypotheses: H0: $\mu_1 - \mu_2 = 0$   HA: $\mu_1 - \mu_2 > 0$

2. A swine nutritionist wished to compare a nitrogen poor + enzyme diet (#1) to a nitrogen rich diet (#2) for pigs. Rather than take one piglet from each new litter and assign it a diet at random, he chose instead to take 2 piglets from each litter and randomly assign one pig to one diet and the other to the other diet. The hypothesis is that the nitrogen rich diet results in a lower average weight gain than the nitrogen poor + enzyme diet.

Hypotheses: H0: $\mu_1 - \mu_2 = 0$   HA: $\mu_1 - \mu_2 > 0$

3. A researcher is interested in the effect of oxygen exposure on cell fluidity in pulmonary artery cells in dogs.
She intends to collect cells from ten dogs for the experiment. For each dog, two agar plates of artery cells are prepared, and each plate is randomly assigned either to receive the O2 treatment or not. She wishes to test the hypothesis that the mean fluidity for oxygen treated cells (2) differs from the mean for untreated cells (1).

Hypotheses: H0: $\mu_1 - \mu_2 = 0$   HA: $\mu_1 - \mu_2 \ne 0$

In all three cases, the samples are NOT independent of each other. In fact, they are deliberately dependent. One reason for this is that the estimator of the difference between two means based on 2 independent samples has a large standard deviation (recall that it is the square root of the SUM of two variances). When samples are paired as is done here, the standard deviation of the estimator of $\mu_1 - \mu_2$ used for a paired experiment is often smaller.

Defn: A PAIRED or “BLOCKED” experiment is one in which, for each randomly selected experimental unit in the first sample, there is a deliberately selected unit in the second sample. The units in the second sample are chosen so that they have characteristics similar to the unit in the first sample to which they have been paired. The characteristics used for pairing are usually those that likely have an effect on the response variable being studied in the experiment. It is this last point that often leads to the standard deviation being smaller in paired experiments.

Example #1. Perfect pairing, since each experimental unit in sample 1 is also used in sample 2. Intuitively, comparing how several people react to each test is more informative and accurate than comparing results for independently chosen people for each test.
⇒ Look at the individual differences in scores, one for each person.

Example #2. Genetics has a relatively large influence on adult size and growth in most animals.
Hence it would not be surprising that two pigs from the same litter would respond to each of the two diets similarly, in the sense that one would respond as the other would had it been on the same diet. Hence, the two littermates are paired in this experiment.
⇒ Look at the individual differences in growth between littermates.

Example #3. Although the cells in each of the 2 treatments are not exactly the same, they are as close as possible, being from the same animal. Hence any effect due to animal variability is controlled somewhat by using the same dogs for both treatments.
⇒ Look at the difference in cell fluidity for each dog.

For paired samples, the estimator of the difference $\mu_1 - \mu_2$ is the average of the sample differences, $\bar{D}$. To obtain this average: for each pair, calculate the difference in X under the two treatments. Call this difference D.

EXAMPLE: Cell fluidity

Dog       Without O2 (X1)   With O2 (X2)   Difference D = (X1 − X2)
1         0.308             0.308           0.000
2         0.304             0.309          -0.005
3         0.305             0.305           0.000
4         0.304             0.311          -0.007
5         0.301             0.303          -0.002
6         0.278             0.293          -0.015
7         0.296             0.302          -0.006
8         0.301             0.300           0.001
9         0.302             0.308          -0.006
10        0.237             0.250          -0.013
Mean      0.294             0.299          -0.0053
Std Dev   0.022             0.018           0.00542

Then, the data consist of the n differences. The average of the sample differences is
\[ \bar{D} = \frac{1}{n} \sum_{i=1}^{n} D_i \]
and the standard deviation is
\[ s_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}} . \]

Now, the differences can be regarded as a random sample from a population of differences if the experimental units (e.g. the ten dogs) can be regarded as a random selection from among all experimental units.
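The mean and standard deviation of the differences in the table can be recomputed with Python’s standard library. This is a minimal sketch; the list names are illustrative, and `statistics.stdev` uses the n − 1 divisor, matching the formula for $s_D$.

```python
from statistics import mean, stdev

# Cell fluidity for the ten dogs, in dog order 1..10
without_o2 = [0.308, 0.304, 0.305, 0.304, 0.301,
              0.278, 0.296, 0.301, 0.302, 0.237]   # X1
with_o2 = [0.308, 0.309, 0.305, 0.311, 0.303,
           0.293, 0.302, 0.300, 0.308, 0.250]      # X2

# One difference per dog: D = X1 - X2
d = [x1 - x2 for x1, x2 in zip(without_o2, with_o2)]

d_bar = mean(d)     # average of the sample differences
s_d = stdev(d)      # sample standard deviation of the differences

print(round(d_bar, 4), round(s_d, 5))
```

Note that $s_D$ (about 0.0054) is much smaller than either column’s own standard deviation (0.022 and 0.018), which is the practical benefit of pairing.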
When the differences can be regarded as such a random sample, we have the

SAMPLING DISTRIBUTION of $\bar{D}$:

1) the mean of the distribution is $\mu_{\bar{D}} = \mu_1 - \mu_2$
2) the standard deviation of the distribution is
\[ \sigma_{\bar{D}} = \frac{\sigma_D}{\sqrt{n}} \]
where $\sigma_D$ is the standard deviation of the population of differences from which we sampled n differences.
3) the shape of the distribution is approximately normal (a bell curve) if n is large or the population being sampled is approximately normally distributed.

The estimator of $\mu_D$ is $\bar{D}$, the sample mean difference, and the estimator of $\sigma_D$ is $s_D$, the sample standard deviation of the differences.

The problem then reduces to a test of the mean $\mu_D$ based on a single sample.

HYPOTHESIS TEST OF THE DIFFERENCE IN TWO POPULATION MEANS USING PAIRED SAMPLES:

Null hypothesis: H0: $\mu_1 - \mu_2 = D_0$ ($\mu_D = D_0$), where $D_0$ is the hypothesized difference in the means

Alternative hypothesis is one of three:
a) HA: $\mu_1 - \mu_2 > D_0$ ($\mu_D > D_0$)
b) HA: $\mu_1 - \mu_2 < D_0$ ($\mu_D < D_0$)
c) HA: $\mu_1 - \mu_2 \ne D_0$ ($\mu_D \ne D_0$)

Test Statistic:
\[ t = \frac{\bar{D} - D_0}{s_D / \sqrt{n}} \]

P-value: depends on the alternative hypothesis:
a) P-value = Pr(T > t)
b) P-value = Pr(T < t)
c) P-value = 2 Pr(T > |t|)

Decision Rule: reject H0 if P-value ≤ α

Assumptions:
1. $\bar{D}$ is approximately normally distributed
2. the sampling was random and not more than 5% of the population.

EXAMPLE So, let’s return to the dog fluidity study.

Hypotheses: H0: $\mu_D = 0$   HA: $\mu_D \ne 0$

Significance Level: we’ll choose α = 0.025.

Now, the numbers we need are: $\bar{D} = -0.0053$, $s_D = 0.00542$, n = 10

Test Statistic:
\[ t = \frac{\bar{D} - 0}{s_D / \sqrt{n}} = \frac{-0.0053}{0.00542 / \sqrt{10}} = -3.09 \]
df = n − 1 = 9.

P-value: 2 Pr(T > |t|) = 2 Pr(T > 3.09) ≈ 2(0.0065) = 0.013

Conclusion: P-value = 0.013 < α = 0.025.
Hence we reject H0 and conclude that the data provide sufficient evidence at α = 0.025 to indicate that oxygen treatment changes the mean fluidity of pulmonary artery cells in dogs.

Assumptions: The sample size is small, but it is likely that the population of differences is not too skewed.

CONFIDENCE INTERVAL FOR THE DIFFERENCE OF TWO MEANS BASED ON A PAIRED SAMPLE:
\[ \bar{D} \pm (t \text{ critical value}) \left( \frac{s_D}{\sqrt{n}} \right) \]
where the t critical value is based on n − 1 df and the desired confidence level.

Assumptions: 1) sampling is random and 2) either the sample size is large so we can use the CLT, or the population of differences has a frequency distribution that is bell-curve shaped.

In our dog EXAMPLE: For a 95% confidence interval of the difference of two means we need the t critical value for 95% and 9 df. So, t = 2.26. Hence, the 95% C.I. of the difference between mean fluidity in cells without and with oxygen is
\[ -0.0053 \pm 2.26 \left( \frac{.00542}{\sqrt{10}} \right) = -0.0053 \pm 0.0039 = (-0.0092,\ -0.0014) \]
which implies, with 95% confidence, that the mean fluidity in the cells without oxygen is between 0.0014 and 0.0092 below the mean fluidity for those that receive oxygen.
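The paired test and interval for the dog data can be sketched in Python using only the standard library. Since the standard library has no t-distribution, the two-sided P-value is approximated here by integrating the t density numerically with Simpson’s rule; that choice, and the helper names `t_pdf` and `two_sided_t_p_value`, are assumptions made for this illustration rather than anything prescribed by these notes. The critical value 2.26 is taken from the text.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_t_p_value(t, df, steps=2000):
    """Approximate 2 Pr(T > |t|) via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3            # Pr(0 < T < |t|)
    return 1 - 2 * area


# Summary numbers from the dog fluidity example
d_bar, s_d, n = -0.0053, 0.00542, 10
se = s_d / math.sqrt(n)             # estimated SD of D-bar
t = (d_bar - 0) / se                # test statistic for H0: mu_D = 0
p_value = two_sided_t_p_value(t, n - 1)

t_crit = 2.26                       # 95% critical value for 9 df, from the text
ci = (d_bar - t_crit * se, d_bar + t_crit * se)

print(round(t, 2), round(p_value, 3), tuple(round(x, 4) for x in ci))
```

Under these assumptions the two-sided P-value comes out near 0.013 and the interval near (−0.0092, −0.0014); in practice a statistics package would replace the hand-rolled integration.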