1.017/1.010 Class 17 Testing Hypotheses about Two Populations Tests of differences between two populations To test if two populations x and y are different we can compare specified distributional properties ax and ay (means, variances, 90 percentiles, etc.). Null hypothesis: H0: ax = ay = a0 or ax - ay = 0 The hypothesis test may be based on "natural" (unbiased and consistent) estimators of ax and ay, derived from the independent random samples x1, x2,..., xNx and y1, y2,...,yNy : aˆ x = aˆ x ( x1 , x 2 ,..., x Nx ) aˆ y = aˆ y ( x1 , x 2 ,..., x Ny ) We can derive a two-sided rejection region Ra0 written in terms of a standardized statistic z , following the same basic procedure as in the single population case (see Class 15): z (aˆ x , aˆ y , a x , a y ) = (aˆ x - aˆ y ) − (a x − a y ) SD[aˆ x - aˆ y ] z (aˆ x , aˆ y , a 0 , a 0 ) = (aˆ x - aˆ y ) SD[aˆ x - aˆ y ] α R z0 : z (aˆ x , aˆ y , a0 , a0 ) ≤ z L = Fz-1 ( ) 2 z (aˆ x , aˆ y , a0 , a0 ) ≥ zU = Fz-1 (1 - α 2 ) For large samples z (aˆ x , aˆ y , a0 , a0 ) has a unit normal distribution if H0 is true (ax = ay = a0). Use norminv to compute zL and zU from α. We can also define a rejection region Ra0 written in terms of the nonstandardized estimates: 1 α Ra 0 : aˆ x − aˆ y ≤ ∆a L = Fz-1 ( ) SD[aˆ x - aˆ y ] 2 aˆ x − aˆ y ≥ ∆aU = Fz-1 (1 - α 2 ) SD[aˆ x - aˆ y ] The two-sided p-value is obtained from: (aˆ x - aˆ y ) 1 − p / 2 = Fz [z ] = Fz aˆ x - aˆ y ≥ 0 SD(aˆ ) (aˆ x - aˆ y ) p / 2 = F z [z ] = F z aˆ x - aˆ y ≤ 0 SD(aˆ ) For large samples use normcdf to compute p from aˆ x - aˆ y . SD[aˆ ] Special Case: Large sample test of the difference between two means If the property of interest is the mean then: H0: ax = E[x] = ay = E[y] or E[x] - E[y] = 0 Natural estimator of E[x] - E[y] is mx- my. In large sample case mx - my is normal with mean and variance: E[mx - my] = E[x] - E[y] (unbiased) Var[(m x - m y ) = Var[m x ] + Var[m y ] = σ x2 Nx + σ 2y Ny (consistent) Construct a large sample test statistic z ~ N(0,1) : z= mx - m y σ x2 Nx + σ 2y Ny ≈ mx - m y s 2y s x2 + Nx Ny Two-sided rejection region written in terms of mx and my: 2 α Ra0 : m x − m y ≤ ∆a L = Fz-1 ( ) SD[m x - m y ] 2 m x − m y ≥ ∆aU = Fz-1 (1 - α 2 ) SD[m x - m y ] The two-sided p-value is obtained from: 2 −1 / 2 s2 s y 1 − p / 2 = Fz ( z ) = Fz (m x − m y ) x + mx ≥ m y N Ny x 2 −1 / 2 s2 s y p / 2 = Fz ( z ) = Fz (m x − m y ) x + mx ≤ m y N Ny x Example: Comparing crop yields with and without fertilizer application Consider two agricultural fields, one that is fertilized and one that is not. Yield samples (kg/ha) from the two fields are as follows: Fertilized (x): 66 41 77 80 52 98 99 74 81 78 Not fertilized (y): 65 88 55 124 66 72 96 71 Test the hypothesis H0: Mean yields are the same with and without fertilizer mx = my = sx = sy = z= p= Nx = Ny = Copyright 2003 Massachusetts Institute of Technology Last modified Oct. 8, 2003 3