Outline

1 Descriptive Statistics: Box plots
2 Comparing Two Samples
    Paired samples
    Independent samples - equal variances
    Independent samples - unequal variances
3 Nonparametric methods for two samples
    Levene's test
    Mann-Whitney test
4 Comparing two proportions
    With a z-test
    Confidence interval
    With a chi-square test

The sample median

To compute the sample median:
- arrange the data in a list of ascending order
- take the middle value in the list:
    ỹ = the ((n+1)/2)th value                                  if n is odd
    ỹ = the average of the (n/2)th and (n/2 + 1)th values      if n is even

Height of red pine seedlings from the nursery (stem-and-leaf display, in cm):

  2 | 3 6
  3 | 1 4 4
  4 | 2 3 6 7 9
  5 | 4 6
  6 | 1

Here n = 13, so ỹ = the 7th value = 4.3 cm.

If n is even, the sample median is the average of the middle two values.
Ex: the median of 2, 4, 5, 7, 8, 14 is (5 + 7)/2 = 6.

Sample quantiles

The median is a special quantile (or percentile): ỹ = y[.50].
To compute the pth sample quantile y[p] (0 ≤ p ≤ 1):
- arrange the data in a list of ascending order
- compute np
- If np is an integer, then y[p] is the average of the (np)th and (np + 1)th data values in the list.
- If np is not an integer, then round up to ⌈np⌉ and use the ⌈np⌉th data value in the list.

Ex: height of seedlings, n = 13. The 0.20 sample quantile, or 20th percentile, is
y[0.20] = 3.1 (np = 2.6, round up: the 3rd value).
With p = 2/13 = .1538 instead, np = 2 is an integer, so
y[.1538] = average of the 2nd and 3rd values = (2.6 + 3.1)/2 = 2.85.

Box plots

y[0.50] = sample median
y[0.25] = first quartile (Q1)
y[0.75] = third quartile (Q3)

Height of seedlings: y[0.25] = 3.4 and y[0.75] = 4.9.

W&S and other texts provide a different definition: Q1 as the median of the first half of the data (overall median excluded), Q3 as the median of the second half. The two definitions are not equivalent (check!).

A box plot displays several quantiles simultaneously.

Sample range and IQR

Sample range
Sample range = largest obs − smallest obs; a measure of spread/variability of the data set.
Height of seedlings: the sample range is 6.1 − 2.3 = 3.8 cm.

Interquartile range (IQR)
Interquartile range = difference between the third quartile and the first quartile:
    IQR = Q3 − Q1 = y[0.75] − y[0.25]
For the height of seedlings, the IQR is 4.9 − 3.4 = 1.5 cm.

Box plot for production of organic cows

[Box plot of the milk yield data, axis from 20 to 55, marking Min, Q1, Median, Q3 and Max; IQR: 14, Range: 36.]

Box plots are useful to compare samples

Fungus colonization from 3 samples: black mustards producing a large/low amount of sinigrin, and other plant species.

[Side-by-side box plots of % fungus colonization, roughly 2 to 12, for the heterospecific, high-sinigrin and low-sinigrin samples.]

R commands

# enter data:
> hi = c(4.43,2.18,6.64,4.41,3.7,4.79,3.38,8.37,2.94,6.92,7.24)
> lo = c(11.07,7.89,7.89,8.12,8.11,10.79,10.3,7.21,5.77,10.47,8.09)
> het = c(7,7.03,9.17,6.75,3.45,6.63,4.52,8.09,7.55,6.63,8.68,9.4,
+         5.22,9,3.96,5.46,7.75,9.11,9.01,7.46,6.41,5.29,3.65,8.29,9.55,
+         4.3,6.45,6.46,4.83,7.98,5.64,6.85,13.41)

> summary(hi)          # summary stats, high sinigrin sample
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   2.18    3.54    4.43    5.00    6.78    8.37
> IQR(hi)              # IQR
[1] 3.24

> boxplot(hi)          # boxplot, high sinigrin only
> boxplot(hi,lo,het)   # boxplot, all 3 samples side-by-side
# same but group names added:
> boxplot(hi,lo,het, names=c("High sinigrin","Low","Heterospecific"))

# if data is read from a file:
> dat = read.csv("blackmustard.csv", header=TRUE)
> dat
> boxplot(fungus_percent ~ community, data=dat)

Paired vs. Independent samples

Treatments: A and B.
Paired samples: each observation on trt A is naturally paired with an observation on trt B. Related or identical experimental units are used for both treatments.
Independent samples: no direct relationship between an observation on trt A and an observation on trt B.
The choice of paired versus independent samples is an important design issue. Data analysis follows the design.

Examples

Two-sample comparisons are very common. Examples:
1 Compare fungus colonization in low/high sinigrin mustard
2 Compare taste of cheese from cows on two different diets (organic in the open vs. non-organic, hay/pellets)
3 Compare cholesterol level of patients before and after a drug treatment
4 Baby weight at birth among smoking/non-smoking women

When, why should samples be paired?

Cholesterol example:
1 Cholesterol level of 10 patients before and after a drug treatment.
2 Cholesterol level of 10 patients before treatment and of another 10 patients after treatment.
Baby weight example: pairing women according to certain traits. Effective only if it controls variability.
Paired-sample studies are usually preferred, because of increased precision (i.e. reduced variability) in estimating treatment differences.
With 3 or more treatments, blocking replaces pairing.

Paired samples - Blood pressure example

Question of interest: is there any evidence that a particular drug has an effect on blood pressure?
Experiment: on 15 middle-aged male hypertension patients. For each patient, blood pressure is measured at time of enrollment and again after 6 months of the drug treatment.

Blood pressure (mm Hg)

Subject                 1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Before (Y1)            90 100  92  96  96  96  92  98 102  94  94 102  94  88 104
After (Y2)             88  92  82  90  78  86  88  72  84 102  94  70  94  92  94
Difference (D=Y1-Y2)    2   8  10   6  18  10   4  26  18  -8   0  32   0  -4  10

Blood pressure example

µ1 = population mean blood pressure before the drug trt
µ2 = population mean blood pressure after the drug trt
µD = µ1 − µ2 = difference between the two mean blood pressure levels.

Paired samples
Testing µ1 = µ2 vs. µ1 ≠ µ2 is equivalent to testing H0: µD = 0 vs. HA: µD ≠ 0.
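As a numerical check, the ingredients of the paired analysis can be recomputed from the difference row of the table above. This is a minimal illustrative sketch in Python (the course itself uses R):

```python
import math

# differences D = before - after, from the blood pressure table
d = [2, 8, 10, 6, 18, 10, 4, 26, 18, -8, 0, 32, 0, -4, 10]
n = len(d)                                               # 15 patients
dbar = sum(d) / n                                        # sample mean of D
sd = math.sqrt(sum((x - dbar)**2 for x in d) / (n - 1))  # sample sd of D
t = dbar / (sd / math.sqrt(n))                           # paired t statistic
# dbar = 8.8 mm Hg, sd ~ 10.98, t ~ 3.105 on n - 1 = 14 df
```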
A one-sample t-test can be used on the differences. The difference D = Y1 − Y2 is the blood pressure difference. We assume we have a random sample D1, D2, ..., D15 of size n = 15 from N(µD, σD²).

Blood pressure example

Under H0: µD = 0, the test statistic
    T = (D̄ − 0) / (SD/√n)
has a t-distribution on df = n − 1 = 14.
We observed d̄ = 8.80 mm Hg, sd = 10.98. The observed t-value is
    t = 8.80 / (10.98/√15) = 3.10
on df = 14.
The p-value is 2 IP{T ≥ 3.10}, which is between 0.002 and 0.01 from Table C. There is strong evidence against H0: the drug is deemed to have an effect on blood pressure.

Confidence interval from paired samples

A (1 − α) CI for the difference of means µ1 − µ2 = µD is
    d̄ − t_{n−1,α/2} sd/√n  ≤  µD  ≤  d̄ + t_{n−1,α/2} sd/√n
Blood pressure: 8.80 − 2.145 × 10.98/√15 ≤ µD ≤ 8.80 + 2.145 × 10.98/√15, i.e. 2.72 ≤ µD ≤ 14.88.
We are 95% confident that the population mean decrease in blood pressure after 6 months of treatment lies between 2.72 and 14.88 mm Hg (or 8.80 ± 6.08).

Remarks
If a difference of 5 mm Hg is needed for biological significance, we could test H0: µD = 5 vs. HA: µD ≠ 5.
Assumptions: random sample, and D values have a normal distribution. Check with a normal probability plot.
No normality assumption about Y1, or about Y2. They are usually not independent, due to pairing.
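The 95% interval quoted above can be reproduced the same way; a small Python sketch for illustration (2.145 is the t multiplier with 14 df at α/2 = 0.025, as read from Table C):

```python
import math

# differences D = before - after, from the blood pressure table
d = [2, 8, 10, 6, 18, 10, 4, 26, 18, -8, 0, 32, 0, -4, 10]
n = len(d)
dbar = sum(d) / n
sd = math.sqrt(sum((x - dbar)**2 for x in d) / (n - 1))

tmult = 2.145                        # t_{14, 0.025} from Table C
half = tmult * sd / math.sqrt(n)     # half-width of the CI, ~6.08
lo, hi = dbar - half, dbar + half    # ~ (2.72, 14.88) mm Hg
```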
R commands

> # first enter the data
> bpbefore = c(90,100,92,96,96,96,...,102,94,88,104)
> bpafter  = c(88, 92,82,90,78,86,..., 70,94,92, 94)
> # Now do the paired t-test and 95% CI
> t.test( bpbefore - bpafter )

        One Sample t-test

data:  bpbefore - bpafter
t = 3.1054, df = 14, p-value = 0.00775
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
  2.722083 14.877917
sample estimates:
mean of x
      8.8

R commands

> # or better, use the t.test function directly:
> t.test(bpbefore, bpafter, paired=TRUE)

        Paired t-test

data:  bpbefore and bpafter
t = 3.1054, df = 14, p-value = 0.00775
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  2.722083 14.877917
sample estimates:
mean of the differences
                    8.8

Independent samples

Compare mycorrhizal colonization in soil from high-sinigrin black mustard communities (11 rep) and low-sinigrin black mustard communities (11 rep).
Question of interest: is there evidence of an effect of black mustard (high/low sinigrin) on fungus colonization?

Mycorrhizal colonization example

Data: mycorrhizal colonization (% of root section), in high-sinigrin (hi) and low-sinigrin (lo) communities.

community    1     2     3     4     5     6     7     8     9    10    11
hi (Y1)     4.43  2.18  6.64  4.41  3.70  4.79  3.38  8.37  2.94  6.92  7.24
lo (Y2)    11.07  7.89  7.89  8.12  8.11 10.79 10.30  7.21  5.77 10.47  8.09

No pairing: observations can be permuted within each trt group.
ȳ1 = 5, s1 = 2.0 (high sinigrin); ȳ2 = 8.7, s2 = 1.7 (low sinigrin).

[Side-by-side box plots of % fungus colonization, roughly 2 to 11, for the two samples.]

Mycorrhizal colonization

Let µ1 = the population mean fungus colonization in communities assigned to high-sinigrin black mustard, µ2 = the population mean fungus colonization with low-sinigrin.
We want to test H0: µ1 = µ2 versus HA: µ1 ≠ µ2. Note µ1 = µ2 means µ1 − µ2 = 0.
Main idea: use Ȳ1 − Ȳ2, because IE(Ȳ1 − Ȳ2) = µ1 − µ2.
If Ȳ1 − Ȳ2 is close to 0, we will fail to reject H0; if Ȳ1 − Ȳ2 is far from 0, we will reject H0.
Here ȳ1 − ȳ2 = 5 − 8.7 = −3.7 (% root section difference).

Assumptions

1 Two independent random samples Y1 and Y2.
2 The first sample Y11, Y12, ..., Y1n1 is from N(µ1, σ1²), the second sample Y21, Y22, ..., Y2n2 is from N(µ2, σ2²).
3 The variances are the same: σ1² = σ2² = σ².
Essentially, three assumptions: independence (within a trt and between the two trts), normality, and equal variance. No need to have equal sample sizes.

Distribution of Ȳ1 − Ȳ2

Under these assumptions, Ȳ1 ∼ N(µ1, σ²/n1) and Ȳ2 ∼ N(µ2, σ²/n2).
If H0 is true, Ȳ1 − Ȳ2 has a normal distribution with
mean: IE(Ȳ1 − Ȳ2) = µ1 − µ2 = 0
variance: var(Ȳ1 − Ȳ2) = σ²/n1 + σ²/n2 = σ² (1/n1 + 1/n2)
But... how do we know σ²?

S1² estimates σ², and so does S2². A pooled estimate of σ² is

Pooled estimate of σ²
    Sp² = ((n1 − 1)S1² + (n2 − 1)S2²) / (n1 + n2 − 2) = (sum of all squared deviations) / (n1 + n2 − 2)
a weighted average of S1² and S2², weighted by the df's.
Now we can estimate σ_{Ȳ1−Ȳ2} by

Standard error of Ȳ1 − Ȳ2
    S_{Ȳ1−Ȳ2} = Sp √(1/n1 + 1/n2)
With equal sample sizes n1 = n2 = n we get
    Sp² = (S1² + S2²)/2   and   S_{Ȳ1−Ȳ2} = Sp √(2/n)

The two-sample t-test

1 Hypotheses: H0: µ1 = µ2 versus HA: µ1 ≠ µ2
2 Test statistic:
    T = (Ȳ1 − Ȳ2 − 0) / S_{Ȳ1−Ȳ2}
  If H0 is really true then T ∼ t-distribution with df = n1 + n2 − 2.
3 Get the data and p-value: ȳ1 − ȳ2 = 5 − 8.7 % root section, s1 = 2.0 and s2 = 1.7, with n1 = n2 = 11.
  Pooled estimate of σ: sp = √((2.0² + 1.7²)/2) = 1.856
  Standard error of Ȳ1 − Ȳ2: 1.856 √(2/11) = .791
  t-value: t = −3.7/.791 = −4.67 on df = 20; p-value: 2 IP{T20 > 4.67} < .001
4 Finally, we reject H0. There is evidence that fungus colonization is affected by the type of black mustard community.

Confidence intervals

Independent samples
A (1 − α) confidence interval for µ1 − µ2 is
    ȳ1 − ȳ2 ± t* sp √(1/n1 + 1/n2)
where the multiplier t* = t_{n1+n2−2, α/2} is determined by the t-distribution with n1 + n2 − 2 df.
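The pooled calculation above can be checked from the summary statistics alone; an illustrative Python sketch (the course uses R, and R's t.test on the full data gives t = -4.6786):

```python
import math

# summary statistics from the mycorrhizal colonization samples
y1bar, s1, n1 = 5.0, 2.0, 11    # high sinigrin
y2bar, s2, n2 = 8.7, 1.7, 11    # low sinigrin

# pooled variance: df-weighted average of the two sample variances
sp2 = ((n1 - 1)*s1**2 + (n2 - 1)*s2**2) / (n1 + n2 - 2)
sp = math.sqrt(sp2)                 # ~1.856
se = sp * math.sqrt(1/n1 + 1/n2)    # standard error of Y1bar - Y2bar, ~0.791
t = (y1bar - y2bar) / se            # ~ -4.7
df = n1 + n2 - 2                    # 20
```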
Fungus colonization: a 99% CI for µ1 − µ2 is (using t* = 2.845)
    −3.7 ± 2.845 × .791, which is −3.7 ± 2.25, or [−5.95, −1.45] % of root sections.
The test and the CI are consistent: the interval excludes 0, and the test rejects H0 at the 1% level.

R commands

> # enter the data:
> lo = c(11.07,7.89,7.89,8.12,...,7.21,5.77,10.47,8.09)
> hi = c( 4.43,2.18,6.64,4.41,...,8.37,2.94, 6.92,7.24)
> # do the test:
> t.test(hi, lo, var.equal=T, conf.level=.99)

        Two Sample t-test

data:  hi and lo
t = -4.6786, df = 20, p-value = 0.0001444
alternative hypothesis: true difference in means is not equal to 0
99 percent confidence interval:
 -5.951651 -1.450167
sample estimates:
mean of x mean of y
 5.000000  8.700909

Another example

Compare fungus colonization in high-sinigrin and in heterospecific communities (mixed species, no black mustard).
ȳhet = 7.0, shet = 2.1, nhet = 33
ȳhi = 5.0, shi = 2.0, nhi = 11
ȳlo = 8.7, slo = 1.7, nlo = 11
Test µhi = µhet.

T-test with unequal variances

Some soaps are labelled "antibacterial" soaps, but one might expect ordinary soap also to kill bacteria.
Experiment: prepare a solution from regular soap and a solution of sterile water. The solutions were placed onto petri dishes along with E. coli bacteria, then incubated for 24h.
Question: does the regular soap solution prevent growth of bacteria, compared to sterile water?

Data: # bacteria colonies on each dish

Control  30 36 66 21 63 38 35 45
Soap     76 27 16 30 26 46  6

          n   mean    sd
Control   8  41.75  15.6
Soap      7  32.43  22.8

No pairing. We do not assume σ1 = σ2.

ȳ1 − ȳ2 = 9.32, but how big could that be by chance alone?

Test of H0: µ1 = µ2 with unequal variances
Test statistic:
    T = (Ȳ1 − Ȳ2) / S_{Ȳ1−Ȳ2}   with   S_{Ȳ1−Ȳ2} = √(S1²/n1 + S2²/n2).
The p-value is obtained by comparing the value of T with a t-distribution with adjusted degrees of freedom
    df = (v1 + v2)² / ( v1²/(n1−1) + v2²/(n2−1) )
where v1 = S1²/n1 and v2 = S2²/n2.
This df is always at most n1 + n2 − 2 and always at least the minimum of n1 − 1 and n2 − 1.
The df will not necessarily be an integer.

Soap experiment

ȳ1 − ȳ2 = 9.32 more colonies in water than in soap. We get
    v1 = 15.6²/8 = 30.42,   v2 = 22.8²/7 = 74.26,
then s_{ȳ1−ȳ2} = √(30.42 + 74.26) = 10.25 colonies and df = 10.4.
    t = 9.32/10.25 = 0.909
We use Table C with df = 10 and get p-value > 0.20 with a two-sided test.
There is no evidence of an antibiotic effect of soap (p > .2) from this experiment.
By comparison: the t-test assuming equal variances yields s_{ȳ1−ȳ2} = 9.98 colonies, t = 0.933, which would be compared to the t-distribution on df = 13. Same conclusion.

Confidence interval for µ1 − µ2 with unequal variances

A (1 − α) confidence interval for µ1 − µ2 is
    ȳ1 − ȳ2 ± t* s_{ȳ1−ȳ2}
where t* = t_{df, α/2}, df is the adjusted degrees of freedom, and
    s_{ȳ1−ȳ2} = √(s1²/n1 + s2²/n2).
In the soap example, for 90% confidence we get t* = t_{10, 0.05} = 1.812 and the interval 9.32 ± 1.812 × 10.25, i.e. [−9.2, 27.9] more colonies on average in water than in soap.

Which t-test should I use?

Assumptions (t-test not assuming equal variances)
1 Independence, within and among samples,
2 Each sample comes from a Normal distribution or is large enough.
If your software allows, use the t-test not assuming equal variances by default. If the variances turn out to be significantly different (from Levene's test): find the right software!
On exams, use the t-test assuming equal variances, and test this assumption, unless indicated otherwise.

Assessing assumptions

Assumptions of the independent two-sample t-test? The t-test assuming equal variances is
- robust against nonnormality, but sensitive to dependence,
- moderately robust against unequal variances (σ1² ≠ σ2²) if n1 ≈ n2, but much less robust if n1 and n2 are quite different (e.g. differ by a ratio of 3 or more).
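The Welch (unequal-variance) quantities for the soap data can be recomputed from the summary statistics; a Python sketch for illustration (the slides carry a rounded standard error of 10.25, so their t differs slightly in the third decimal):

```python
import math

# soap experiment summary statistics (control = sterile water)
m1, s1, n1 = 41.75, 15.6, 8
m2, s2, n2 = 32.43, 22.8, 7

v1 = s1**2 / n1                     # ~30.42
v2 = s2**2 / n2                     # ~74.26
se = math.sqrt(v1 + v2)             # ~10.2
t = (m1 - m2) / se                  # ~0.91
df = (v1 + v2)**2 / (v1**2/(n1 - 1) + v2**2/(n2 - 1))   # ~10.4

# 90% CI with the Table C multiplier t_{10, 0.05} = 1.812
half = 1.812 * se                   # half-width, ~18.5
```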
How can we determine whether the equal-variance assumption is appropriate?
- A chi-square test can compare σ1² and σ2² using S1² and S2², but avoid it: it is very sensitive to nonnormality.
- Levene's test: a nonparametric test for comparing two variances. It does not assume normality, but still assumes independence.

Levene's test

Consider two independent samples Y1 and Y2:
Sample 1: 4, 8, 10, 23
Sample 2: 1, 2, 4, 4, 7
Test H0: σ1² = σ2² vs. HA: σ1² ≠ σ2².
Sample variances: s1² = 67.58, s2² = 5.30.
Main idea of Levene's test: turn a test for equal variances on the original data into a test for equal means on modified data.
If Levene's test gives a small p-value (< 0.01), we drop the equal-variance assumption and use the approximate test of H0: µ1 = µ2 vs. HA: µ1 ≠ µ2 that does not require equal variances.

Levene's test

1 Find the median of each sample. Here ỹ1 = 9, ỹ2 = 4.
2 Subtract the median from each obs:
  Sample 1: -5, -1, 1, 14
  Sample 2: -3, -2, 0, 0, 3
3 Take absolute values: we get deviations (not the usual ones) with positive signs.
  Sample 1*: 5, 1, 1, 14
  Sample 2*: 3, 2, 0, 0, 3
4 For any sample with odd sample size, remove one zero.
  Sample 1*: 5, 1, 1, 14
  Sample 2*: 3, 2, 0, 3
5 Perform an independent two-sample t-test on the modified samples.

Levene's test

Here ȳ1* = 5.25, ȳ2* = 2, s1²* = 37.58, s2²* = 2.00. We get
    sp² = (3 × 37.58 + 3 × 2.00)/6 = 19.79,   sp = 4.45
and
    t = (5.25 − 2) / (4.45 √(1/4 + 1/4)) = 1.03   on df = 6.
The p-value 2 P(T6 ≥ 1.03) is > 0.20. Do not reject H0.
Going back to the original samples, could we use the t-test that assumes equal variances? (Yes: Levene's test finds no evidence of unequal variances.)

R commands

There is no predefined function to do Levene's test in R, but we can just copy and paste a function available from the course website.
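The five steps above can also be cross-checked directly. A minimal sketch in Python (for illustration only; `levene_transform` is a hypothetical helper name, and the course itself uses R):

```python
import math

def levene_transform(sample):
    # absolute deviations from the sample median;
    # for an odd-sized sample, drop one zero (the median's own deviation)
    ys = sorted(sample)
    n = len(ys)
    med = ys[n // 2] if n % 2 == 1 else (ys[n//2 - 1] + ys[n//2]) / 2
    devs = [abs(y - med) for y in sample]
    if n % 2 == 1:
        devs.remove(0)
    return devs

s1 = levene_transform([4, 8, 10, 23])    # [5, 1, 1, 14]
s2 = levene_transform([1, 2, 4, 4, 7])   # [3, 2, 0, 3]

def mean(xs): return sum(xs) / len(xs)
def var(xs):
    m = mean(xs)
    return sum((x - m)**2 for x in xs) / (len(xs) - 1)

# ordinary pooled two-sample t-test on the modified samples
n1, n2 = len(s1), len(s2)
sp2 = ((n1 - 1)*var(s1) + (n2 - 1)*var(s2)) / (n1 + n2 - 2)   # ~19.79
t = (mean(s1) - mean(s2)) / math.sqrt(sp2 * (1/n1 + 1/n2))    # ~1.03
```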
> y1 = c(4,8,10,23); y2 = c(1,2,4,4,7)
> levene.test(y1, y2)

        Two Sample t-test

data:  levene.trans(data1) and levene.trans(data2)
t = 1.0331, df = 6, p-value = 0.3414
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.447408 10.947408
sample estimates:
mean of x mean of y
     5.25      2.00

Mann-Whitney test (aka Wilcoxon rank sum test)

What if one small sample (or both) is not normally distributed?
Mann-Whitney: a non-parametric test for two independent samples. An analogous test exists for paired samples (the signed-rank test).
No distribution assumption, but independence is still assumed.
Main idea: look at the ranks of the observations.
Consider two independent samples Y1 and Y2:
Sample 1: 11, 22, 14, 21
Sample 2: 20, 9, 12, 10
Test H0: µ1 = µ2 vs. HA: µ1 ≠ µ2.

Mann-Whitney test

1 Rank the observations:

  rank     1   2   3   4   5   6   7   8
  obs      9  10  11  12  14  20  21  22
  sample   2   2   1   2   1   2   1   1

2 Compute the sum of ranks for each sample. Here
  RS1 = 3 + 5 + 7 + 8 = 23
  RS2 = 1 + 2 + 4 + 6 = 13
  Keep the smallest: T = 13.
3 Under H0 the means are equal, so the rank sums should be approximately equal, and the smallest should not be too small. How small, just by chance?
  p-value = IP{T ≤ 13} = 2 IP{RS2 ≤ 13}
To compute it, we list all possible orderings and get the rank sum of each possibility.
IP{RS2 ≤ 13} = 7/70, so p-value = 0.2.

Mann-Whitney test

Rankings with RS2 ≤ 13 (each row gives the sample number at each rank):

  rank:  1  2  3  4  5  6  7  8    RS2
         2  2  2  2  1  1  1  1     10
         2  2  2  1  2  1  1  1     11
         2  2  2  1  1  2  1  1     12
         2  2  2  1  1  1  2  1     13
         2  2  1  2  2  1  1  1     12
         2  2  1  2  1  2  1  1     13
         2  1  2  2  2  1  1  1     13

Total number of rankings: C(8,4) = 70.

Mann-Whitney test

Had we observed T = 10, the p-value would be 2 × 1/70 = 0.0286.
Had we observed T = 11, the p-value would be 2 × 2/70 = 0.0571.
With this sample size, we can only reject at 5% if the observed rank sum is 10, i.e. if all values in one sample are smaller than all values in the other.
A table (see course webpage) gives the cut-off values for different sample sizes. For n1 = n2 = 4 and α = 0.05, we can only reject H0 if the observed rank sum is 10.
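The exhaustive enumeration behind the p-value can be verified by brute force; a short Python sketch (illustration only, the course uses R's wilcox.test):

```python
from itertools import combinations

# all ways to assign 4 of the ranks 1..8 to sample 2
total = 0
n_small = 0
for ranks2 in combinations(range(1, 9), 4):
    total += 1
    if sum(ranks2) <= 13:      # orderings at least as extreme as RS2 = 13
        n_small += 1

p_value = 2 * n_small / total  # two-sided p-value for the observed T = 13
# total = 70 orderings, 7 of them have RS2 <= 13, so p_value = 0.2
```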
R command: wilcox.test

> samp1 = c(11, 22, 14, 21)
> samp2 = c(20, 9, 12, 10)
> wilcox.test(samp2, samp1)

        Wilcoxon rank sum test

data:  samp2 and samp1
W = 3, p-value = 0.2
alternative hypothesis: true mu is not equal to 0

Mann-Whitney test with unequal sample sizes

Recorded below are the longevities of two breeds of dogs (the two 14.3's are tied and share rank 11.5):

Breed A obs   12.4  15.9  11.7  14.3  10.6  8.1  13.2  16.6  19.3  15.1
        rank     9    14     8  11.5     6    2    10    15    16    13
Breed B obs   11.6   9.7   8.8  14.3   9.8  7.7
        rank     7     4     3  11.5     5    1

n2 = 10, rank sum 104.5 (Breed A); n1 = 6, T* = 31.5 (Breed B).

Mann-Whitney test

n1 = smaller sample size, n2 = larger sample size. T* = sum of ranks in the smaller group.
Let T** = n1(n1 + n2 + 1) − T* = 6 × 17 − 31.5 = 70.5 and T = min(T*, T**) = 31.5.
The p-value is 2 IP{T ≤ 31.5 | H0}. Look it up in the table: t = 31.5 is between 27 and 32, so the p-value is between 0.01 and 0.05. Reject H0 at 5%.

Remarks
If there are ties, the table gives an approximation only.
The test does not work well if the variances are very different.

R command: wilcox.test

> breedA = c(12.4, 15.9, 11.7, 14.3, 10.6, 8.1, 13.2,
+            16.6, 19.3, 15.1)
> breedB = c(11.6, 9.7, 8.8, 14.3, 9.8, 7.7)
> wilcox.test(breedB, breedA)

        Wilcoxon rank sum test with continuity correction

data:  breedB and breedA
W = 10.5, p-value = 0.03917
alternative hypothesis: true mu is not equal to 0

Warning message:
Cannot compute exact p-value with ties in: wilcox.test.default(breedB, breedA)

Comparing Two Proportions

Association genotype - phenotype: cross 2 inbred lines of mice, one lean, one naturally obese. Backcross with the lean parent: F2 mice.
Genotype at a given locus: either LO or LL. Phenotype: either lean or obese.
pLL = p1: probability of the obese phenotype among F2 backcrosses with genotype LL at the locus,
pLO = p2: probability of the obese phenotype among F2 backcrosses with genotype LO at the locus.

Test for comparing two proportions

We want to test H0: p1 = p2 versus HA: p1 ≠ p2, and the data will be Y1 ∼ B(n1, p1) and Y2 ∼ B(n2, p2), independent.
Use the difference in sample proportions
    p̂1 − p̂2 = Y1/n1 − Y2/n2.
IE(p̂1 − p̂2) = p1 − p2 and var(p̂1 − p̂2) = p1(1−p1)/n1 + p2(1−p2)/n2.
Under H0: p1 = p2 = p, we have IE(p̂1 − p̂2) = 0 and var(p̂1 − p̂2) = p(1−p)(1/n1 + 1/n2), so that
    Z = (p̂1 − p̂2) / √( p(1−p)(1/n1 + 1/n2) ) ≈ N(0, 1)
But p is still unknown. We estimate it by the pooled proportion p̂ = (Y1 + Y2)/(n1 + n2).

Test for comparing two proportions

Test statistic
    Z = (p̂1 − p̂2 − 0) / √( p̂(1−p̂)(1/n1 + 1/n2) )   using   p̂ = (Y1 + Y2)/(n1 + n2)
Z is approximately N(0, 1) under H0.
Data: nLO = 105, YLO = 71 mice with genotype LO are obese; nLL = 87, YLL = 45 mice with genotype LL are obese.

Phenotype-genotype association

Here p̂LO = 71/105 = 0.676, p̂LL = 45/87 = 0.517, and the pooled estimate of the obesity rate is
    p̂ = 116/192 = 0.604
    SE_{p̂LO−p̂LL} = √( 0.604 × 0.396 × (1/105 + 1/87) ) = √0.0050 = 0.071
The observed test statistic is
    z = (0.676 − 0.517)/0.071 = 2.24
We compare z = 2.24 to N(0, 1): the p-value is 2 IP{Z ≥ 2.24} = 0.025. Reject H0 at the 5% level: evidence against H0.

Confidence interval

Recall var(p̂1 − p̂2) = p1(1−p1)/n1 + p2(1−p2)/n2. Here we don't assume p1 = p2; we just plug in p̂1 and p̂2.

Confidence interval for p1 − p2, with (1 − α) confidence
    p̂1 − p̂2 ± z_{α/2} √( p̂1(1−p̂1)/n1 + p̂2(1−p̂2)/n2 )

Obesity rate: a 95% confidence interval for pLO − pLL is
    0.159 ± 1.96 √( 0.676 × 0.324/105 + 0.517 × 0.483/87 )
i.e. 0.159 ± 0.138, or [0.021, 0.297].
We are 95% confident that genotype LO is associated with an increase of obesity rate (compared to LL) by a value between 2.1 and 29.7 percentage points.

Comparing two proportions - sample size requirement

We need to check that the normal approximation works well:
For the confidence interval for p1 − p2, check that n1 p̂1 ≥ 5, n1 q̂1 ≥ 5, n2 p̂2 ≥ 5 and n2 q̂2 ≥ 5. That's easy...
In testing H0: p1 = p2, check that n1 p̂ ≥ 5, n1 q̂ ≥ 5, n2 p̂ ≥ 5 and n2 q̂ ≥ 5.
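Both the z-test and the confidence interval above can be reproduced from the counts; a Python sketch for illustration (the course uses R's prop.test, shown below):

```python
import math

y1, n1 = 71, 105   # obese mice with genotype LO
y2, n2 = 45, 87    # obese mice with genotype LL

p1, p2 = y1 / n1, y2 / n2            # ~0.676 and ~0.517
p = (y1 + y2) / (n1 + n2)            # pooled estimate, ~0.604

# the z-test uses the pooled standard error
se = math.sqrt(p * (1 - p) * (1/n1 + 1/n2))
z = (p1 - p2) / se                   # ~2.24

# the 95% CI uses the unpooled standard error
se_ci = math.sqrt(p1*(1 - p1)/n1 + p2*(1 - p2)/n2)
lo = p1 - p2 - 1.96 * se_ci          # ~0.021
hi = p1 - p2 + 1.96 * se_ci          # ~0.297
```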
R command: prop.test

> prop.test(c(71, 45), c(105, 87), correct=F)

        2-sample test for equality of proportions
        without continuity correction

data:  c(71, 45) out of c(105, 87)
X-squared = 5.0264, df = 1, p-value = 0.02496
alternative hypothesis: two.sided
95 percent confidence interval:
 0.02097751 0.29692068
sample estimates:
   prop 1    prop 2
0.6761905 0.5172414

R command: prop.test

> prop.test(c(71, 45), c(105, 87))

        2-sample test for equality of proportions
        with continuity correction

data:  c(71, 45) out of c(105, 87)
X-squared = 4.3837, df = 1, p-value = 0.03628
alternative hypothesis: two.sided
95 percent confidence interval:
 0.01046848 0.30742971
sample estimates:
   prop 1    prop 2
0.6761905 0.5172414

Chi-square test of independence/association

Are phenotype and genotype at a given locus independent? associated?

         LO   LL  total
obese    71   45    116
lean     34   42     76
total   105   87    192

We want to test H0: pLO = pLL against HA: pLO ≠ pLL.
Equivalently:
H0: genotype and obesity phenotype are independent.
HA: genotype at the locus and phenotype are not independent: one genotype tends to be associated with one phenotype.

Chi-square test of independence/association

1 Build the table of expected counts under H0. If H0 is true, pLO = pLL, but we don't know this common value. Our best guess:
    p̂ = total # successes / total # trials = 116/192 = .60
somewhere between p̂LO = 71/105 = .68 and p̂LL = 45/87 = .52.
Expected # obese mice with LO genotype:
    105 × p̂ = 105 × 116/192 = 63.44.
In general,
    E = (Row total × Column total) / Grand total

Chi-square test of independence/association

Observed counts, and expected counts when genotype and phenotype are independent:

Observed:  LO   LL  total      Expected:   LO     LL   total
obese      71   45    116      obese     63.44  52.56    116
lean       34   42     76      lean      41.56  34.44     76
total     105   87    192      total    105     87       192

2 Calculate the test statistic X², the distance to independence:
    X² = Σ over all cells (obs − exp)²/exp = 5.026

Test of independence

3 p-value: if there is independence (phenotype does not depend on genotype) then X² has a χ² distribution, with df = 1 here. Using Table B: .02 < p < .05.
4 Conclusion: there is moderate evidence that the genotypes have different obesity rates. Furthermore, in the data we have p̂1 = .68 > p̂2 = .52: there is moderate evidence that LO has a higher obesity rate.
Same conclusion as the z-test of pLL = pLO: we had z = 2.24, and here X² = 5.026 = z², with the exact same p-value.

Using R: chisq.test()

> mice = matrix( c(71,34,45,42), 2,2)
> mice
     [,1] [,2]
[1,]   71   45
[2,]   34   42
> chisq.test(mice)

        Pearson's Chi-squared test with Yates' continuity correction

data:  mice
X-squared = 4.3837, df = 1, p-value = 0.03628

> chisq.test(mice, correct=FALSE)

        Pearson's Chi-squared test

data:  mice
X-squared = 5.0264, df = 1, p-value = 0.02496
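The expected counts and the X² statistic can be recomputed from the 2×2 table by hand; an illustrative Python sketch (the course uses R's chisq.test, shown above):

```python
# observed counts: rows = phenotype (obese, lean), columns = genotype (LO, LL)
obs = [[71, 45],
       [34, 42]]

row = [sum(r) for r in obs]            # row totals [116, 76]
col = [sum(c) for c in zip(*obs)]      # column totals [105, 87]
grand = sum(row)                       # 192

# expected count in each cell: row total * column total / grand total
exp = [[row[i] * col[j] / grand for j in range(2)] for i in range(2)]

# chi-square statistic: sum of (obs - exp)^2 / exp over all cells
x2 = sum((obs[i][j] - exp[i][j])**2 / exp[i][j]
         for i in range(2) for j in range(2))
# exp[0][0] ~ 63.44 and x2 ~ 5.026, matching chisq.test(mice, correct=FALSE)
```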
In general, E= Row total * Column total Grand total 59 Chi-square test of independence/association Observed counts: LO Expected counts when genotype and phenotype are independent: LO LL total LL total 71 45 116 obese 63.44 52.56 116 lean 34 42 76 lean 41.56 34.44 76 total 105 87 192 total 105 192 obese 87 Calculate the test statistic X 2 , distance to independence: X (obs − exp)2 X2 = = exp 2 all cells = 5.026 60 Test of independence 3 p-value: If there is independence (phenotype does not depend on genotype) then X 2 has a χ2 distribution with df= 1 here. Using Table B: .02 < p < .05. 4 Conclusion: There is moderate evidence that the genotypes have different obesity rates. Furthermore, in the data we have p̂1 = .68 > p̂2 = .52. There is moderate evidence that LO has a higher obesity rate. Same conclusion as a z test for testing pLL = pLO . We had z = 2.25. Here X 2 = 5.026 = z 2 and exact same p-value. 61 Using R: chisq.test() > mice = matrix( c(71,34,45,42), 2,2) > mice [,1] [,2] [1,] 71 45 [2,] 34 42 > chisq.test(mice) Pearson’s Chi-squared test with Yates’ continuity correction data: mice X-squared = 4.3837, df = 1, p-value = 0.03628 > chisq.test(mice, correct=FALSE) Pearson’s Chi-squared test data: mice X-squared = 5.0264, df = 1, p-value = 0.02496 62