# 1.017/1.010 Class 17 Testing Hypotheses about Two Populations

```1.017/1.010 Class 17
Tests of differences between two populations
To test if two populations x and y are different we can compare specified
distributional properties ax and ay (means, variances, 90 percentiles, etc.).
Null hypothesis:
H0: ax = ay = a0 or ax - ay = 0
The hypothesis test may be based on &quot;natural&quot; (unbiased and consistent)
estimators of ax and ay, derived from the independent random samples x1,
x2,..., xNx and y1, y2,...,yNy :
aˆ x = aˆ x ( x1 , x 2 ,..., x Nx )
aˆ y = aˆ y ( x1 , x 2 ,..., x Ny )
We can derive a two-sided rejection region Ra0 written in terms of a
standardized statistic z , following the same basic procedure as in the
single population case (see Class 15):
z (aˆ x , aˆ y , a x , a y ) =
(aˆ x - aˆ y ) − (a x − a y )
SD[aˆ x - aˆ y ]
z (aˆ x , aˆ y , a 0 , a 0 ) =
(aˆ x - aˆ y )
SD[aˆ x - aˆ y ]
α
R z0 : z (aˆ x , aˆ y , a0 , a0 ) ≤ z L = Fz-1 ( )
2
z (aˆ x , aˆ y , a0 , a0 ) ≥ zU = Fz-1 (1 -
α
2
)
For large samples z (aˆ x , aˆ y , a0 , a0 ) has a unit normal distribution if H0 is
true (ax = ay = a0). Use norminv to compute zL and zU from α.
We can also define a rejection region Ra0 written in terms of the
nonstandardized estimates:
1
α
Ra 0 : aˆ x − aˆ y ≤ ∆a L = Fz-1 ( ) SD[aˆ x - aˆ y ]
2
aˆ x − aˆ y ≥ ∆aU = Fz-1 (1 -
α
2
) SD[aˆ x - aˆ y ]
The two-sided p-value is obtained from:
 (aˆ x - aˆ y ) 
1 − p / 2 = Fz [z ] = Fz 
 aˆ x - aˆ y ≥ 0
 SD(aˆ ) 
 (aˆ x - aˆ y ) 
p / 2 = F z [z ] = F z 
 aˆ x - aˆ y ≤ 0
 SD(aˆ ) 
For large samples use normcdf to compute p from
aˆ x - aˆ y
.
SD[aˆ ]
Special Case: Large sample test of the difference between two means
If the property of interest is the mean then:
H0: ax = E[x] = ay = E[y] or E[x] - E[y] = 0
Natural estimator of E[x] - E[y] is mx- my.
In large sample case mx - my is normal with mean and variance:
E[mx - my] = E[x] - E[y]
(unbiased)
Var[(m x - m y ) = Var[m x ] + Var[m y ] =
σ x2
Nx
+
σ 2y
Ny
(consistent)
Construct a large sample test statistic z ~ N(0,1) :
z=
mx - m y
σ x2
Nx
+
σ 2y
Ny
≈
mx - m y
s 2y
s x2
+
Nx Ny
Two-sided rejection region written in terms of mx and my:
2
α
Ra0 : m x − m y ≤ ∆a L = Fz-1 ( ) SD[m x - m y ]
2
m x − m y ≥ ∆aU = Fz-1 (1 -
α
2
) SD[m x - m y ]
The two-sided p-value is obtained from:

2  −1 / 2 
 s2
s
y 


1 − p / 2 = Fz ( z ) = Fz (m x − m y ) x +
 mx ≥ m y
N

Ny
x





2  −1 / 2 
 s2
s
y 


p / 2 = Fz ( z ) = Fz (m x − m y ) x +
 mx ≤ m y
N
Ny 
x




Example: Comparing crop yields with and without fertilizer application
Consider two agricultural fields, one that is fertilized and one that is not. Yield
samples (kg/ha) from the two fields are as follows:
Fertilized (x):
66 41 77 80 52
98 99 74 81 78
Not fertilized (y): 65 88 55 124 66 72 96 71
Test the hypothesis H0: Mean yields are the same with and without fertilizer
mx =
my =
sx =
sy =
z=
p=
Nx =
Ny =
Copyright 2003 Massachusetts Institute of Technology