1.017/1.010 Class 16 Testing Hypotheses about a Single Population

Formulating Hypothesis Testing Problems
Hypotheses about a random variable x are often formulated in terms of its
distributional properties. For example, if the property is a:
Null hypothesis H0: a = a0
The objective of hypothesis testing is to decide whether or not to reject this
hypothesis. The decision is based on an estimator â of a:
Reject H0: If the observed estimate â lies in the rejection region Ra0 (â ∈ Ra0)
Do not reject H0: Otherwise (â ∉ Ra0)
Select the rejection region to obtain the desired error properties:

                     Test result:                     Test result:
True situation       Do not reject H0 (â ∉ Ra0)       Reject H0 (â ∈ Ra0)
H0 true              P(H0|H0) = 1 - α                 P(~H0|H0) = α (Type I error)
H0 false             P(H0|~H0) = β (Type II error)    P(~H0|~H0) = 1 - β

Type I error probability α is called the test significance level.
Deriving Hypothesis Rejection Regions for Large Sample Tests
A hypothesis test is often based on a standardized statistic that depends
on the unknown true property and its estimate. The basic concepts are the same
as those used to derive confidence intervals (see Class 14).
An example is the z statistic:
$$z(\hat{a}, a) = \frac{\hat{a} - a}{\mathrm{SD}[\hat{a}]}$$
If the estimate is unbiased E[z] = 0 and Var[z] = 1.
Define a rejection region Rz0 in terms of z as:
$$R_{z_0}: \quad z(\hat{a}, a_0) \le z_L \quad \text{or} \quad z(\hat{a}, a_0) \ge z_U$$
As the rejection region grows, the Type I error probability increases and the
Type II error probability decreases (the test is more likely to reject the hypothesis).
As the rejection region shrinks, the Type I error probability decreases and the
Type II error probability increases (the test is less likely to reject the hypothesis).
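The trade-off described above can be checked numerically. The following is a minimal sketch, assuming that z is unit normal under H0 (the large-sample case) and that under the alternative it is normal with unit variance and mean `shift`, where `shift` is a hypothetical parameter equal to (true a - a0) / SD[â]. The function name and the numerical limits are illustrative, not part of the notes.

```python
from statistics import NormalDist


def error_probabilities(z_lower, z_upper, shift):
    """Error probabilities for the rejection region z <= z_lower or z >= z_upper.

    shift is a hypothetical parameter: (true a - a0) / SD[a_hat], the distance
    of the true property from a0 in standard-deviation units.
    """
    unit_normal = NormalDist()  # assumed large-sample distribution of z under H0
    # Type I error: probability of landing in the rejection region when H0 is true.
    alpha = unit_normal.cdf(z_lower) + (1.0 - unit_normal.cdf(z_upper))
    # Type II error: probability of landing outside the rejection region when
    # z is centered at shift instead of zero.
    beta = unit_normal.cdf(z_upper - shift) - unit_normal.cdf(z_lower - shift)
    return alpha, beta


# Moving the limits toward zero enlarges the rejection region.
narrow_region = error_probabilities(-2.5, 2.5, shift=2.0)
wide_region = error_probabilities(-1.5, 1.5, shift=2.0)
print("narrow region (alpha, beta):", narrow_region)
print("wide region   (alpha, beta):", wide_region)
```

With these illustrative limits the wider region gives the larger Type I error probability and the smaller Type II error probability, in line with the statement above.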
The usual practice is to select the rejection region to ensure that the Type I
error probability is equal to a specified value α.
For a two-sided test require that Type I error probability is distributed
equally between intervals below zL (probability = α/2) and above zU
(probability = α/2).
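These probabilities are:
$$P[z(\hat{a}, a) \le z_L \mid H_0] = P[z(\hat{a}, a_0) \le z_L] = F_z(z_L) = \frac{\alpha}{2}$$
$$P[z(\hat{a}, a) \ge z_U \mid H_0] = P[z(\hat{a}, a_0) \ge z_U] = 1 - F_z(z_U) = \frac{\alpha}{2}$$
Solving for the limits gives:
$$z_L = F_z^{-1}\left(\frac{\alpha}{2}\right)$$
$$z_U = F_z^{-1}\left(1 - \frac{\alpha}{2}\right)$$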

For large samples z(â, a0) has a unit normal distribution. Use the
MATLAB function norminv to evaluate the inverse CDF $F_z^{-1}$.
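As an illustration of evaluating the inverse unit normal CDF, here is a small sketch in Python rather than MATLAB. It assumes the standard-library `statistics.NormalDist.inv_cdf` as a counterpart to norminv; the function name `critical_values` is hypothetical.

```python
from statistics import NormalDist


def critical_values(alpha):
    """Return (z_L, z_U) for a two-sided test with significance level alpha."""
    unit_normal = NormalDist()  # mean 0, standard deviation 1
    z_lower = unit_normal.inv_cdf(alpha / 2.0)        # F_z^{-1}(alpha/2)
    z_upper = unit_normal.inv_cdf(1.0 - alpha / 2.0)  # F_z^{-1}(1 - alpha/2)
    return z_lower, z_upper


z_lower, z_upper = critical_values(0.05)
print(round(z_lower, 2), round(z_upper, 2))  # prints -1.96 1.96
```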
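If the definition of z is applied, a two-sided rejection region Ra0 can also be
written directly in terms of the estimate â:
$$R_{a_0}: \quad \hat{a} \le a_L = a_0 + F_z^{-1}\left(\frac{\alpha}{2}\right) \mathrm{SD}[\hat{a}]$$
$$\text{or} \quad \hat{a} \ge a_U = a_0 + F_z^{-1}\left(1 - \frac{\alpha}{2}\right) \mathrm{SD}[\hat{a}]$$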
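The rejection limits in terms of the estimate can be sketched as a pair of small functions. This is an illustration only, assuming the large-sample unit normal distribution for z; the function names and the numbers in the usage lines (a0 = 10, SD[â] = 0.5, α = 0.10) are hypothetical and not taken from the notes.

```python
from statistics import NormalDist


def rejection_limits(a0, sd_estimate, alpha):
    """Return (a_L, a_U), the limits of the two-sided rejection region for a_hat."""
    unit_normal = NormalDist()
    a_lower = a0 + unit_normal.inv_cdf(alpha / 2.0) * sd_estimate
    a_upper = a0 + unit_normal.inv_cdf(1.0 - alpha / 2.0) * sd_estimate
    return a_lower, a_upper


def reject_null(estimate, a0, sd_estimate, alpha):
    """True if the estimate lies in the rejection region (H0 is rejected)."""
    a_lower, a_upper = rejection_limits(a0, sd_estimate, alpha)
    return estimate <= a_lower or estimate >= a_upper


# Hypothetical numbers chosen only to exercise the functions.
a_lower, a_upper = rejection_limits(a0=10.0, sd_estimate=0.5, alpha=0.10)
decision_far = reject_null(11.0, a0=10.0, sd_estimate=0.5, alpha=0.10)
decision_near = reject_null(10.3, a0=10.0, sd_estimate=0.5, alpha=0.10)
print(a_lower, a_upper, decision_far, decision_near)
```

Keeping the limits and the decision in separate functions makes it easy to inspect a_L and a_U before applying the test to an observed estimate.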

p Values
The p value is the largest significance level at which H0 is not rejected;
for any α above p the observed estimate falls in the rejection region.
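For a symmetric two-sided rejection region and a large sample:
$$\frac{p}{2} = 1 - F_z\left(\frac{\hat{a} - a_0}{\mathrm{SD}[\hat{a}]}\right) \quad \text{for } \hat{a} \ge a_0$$
$$\frac{p}{2} = F_z\left(\frac{\hat{a} - a_0}{\mathrm{SD}[\hat{a}]}\right) \quad \text{for } \hat{a} \le a_0$$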
For large samples use the MATLAB function normcdf to
compute p from â and SD[â].
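A minimal sketch of the p value computation follows, again in Python with `statistics.NormalDist.cdf` assumed as a counterpart to normcdf. The function name and the numbers in the usage lines are hypothetical. Because the unit normal distribution is symmetric, using |z| covers both cases of the formula above in one expression.

```python
from statistics import NormalDist


def two_sided_p_value(estimate, a0, sd_estimate):
    """Two-sided large-sample p value for H0: a = a0."""
    z = (estimate - a0) / sd_estimate
    # For estimate >= a0: p/2 = 1 - F_z(z).
    # For estimate <= a0: p/2 = F_z(z) = 1 - F_z(-z).
    # Both cases equal 1 - F_z(|z|).
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical estimates equally far above and below a0 = 10.
p_above = two_sided_p_value(10.9, a0=10.0, sd_estimate=0.5)
p_below = two_sided_p_value(9.1, a0=10.0, sd_estimate=0.5)
print(p_above, p_below)
```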
Special Case -- Sample mean
Consider hypothesis about value of population mean a = E[x]:
H0: a = E[x] = a0
Base test on sample mean estimator mx. Obtain SD[mx] from sample
standard deviation:
$$\mathrm{SD}[m_x] = \frac{\mathrm{SD}[x]}{\sqrt{N}} \approx \frac{s_x}{\sqrt{N}}$$
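To make the approximation concrete, the sketch below computes mx and the estimated SD[mx] from a small data set. The data values are hypothetical, and it is an assumption here that sx is the sample standard deviation with the N - 1 divisor, which is what `statistics.stdev` returns.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical observations of x, used only for illustration.
sample = [2.1, 1.4, 3.3, 2.8, 1.9, 2.5]

n = len(sample)
m_x = mean(sample)       # sample mean, the estimator of a = E[x]
s_x = stdev(sample)      # sample standard deviation (N - 1 divisor, assumed)
sd_mean = s_x / sqrt(n)  # approximate SD[m_x]

print(m_x, s_x, sd_mean)
```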
Example: Testing whether mean is significantly different from zero
Suppose a0 = 0, sx = 3, N = 9, mx = 1.2 and α = .05:
 0.05  3
Ra 0 : m x ≤ a L = 0 + Fz-1 
= −1.96

 2  9
 0.05  3
m x ≥ aU = 0 + Fz-1 1 −
= +1.96

2  9

In this case hypothesis is not rejected since mx = 1.2 does not lie in Ra0.
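The example can be reproduced with a few lines of Python. This is a sketch under the same large-sample assumption, with `statistics.NormalDist.inv_cdf` standing in for MATLAB's norminv; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example above.
a0, s_x, n, m_x, alpha = 0.0, 3.0, 9, 1.2, 0.05

sd_mean = s_x / sqrt(n)  # estimated SD[m_x] = 3 / sqrt(9) = 1
unit_normal = NormalDist()
a_lower = a0 + unit_normal.inv_cdf(alpha / 2.0) * sd_mean
a_upper = a0 + unit_normal.inv_cdf(1.0 - alpha / 2.0) * sd_mean

# H0 is rejected only if m_x falls at or beyond one of the limits.
reject = m_x <= a_lower or m_x >= a_upper
print(round(a_lower, 2), round(a_upper, 2), reject)  # prints -1.96 1.96 False
```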
The two-sided p-value is (see plot):
$$1 - \frac{p}{2} = F_z\left[\frac{m_x - a_0}{s_x/\sqrt{N}}\right] = F_z\left[\frac{1.2 - 0}{3/\sqrt{9}}\right] = F_z[1.2] \approx 0.885$$
$$p \approx 0.23$$
[Plot: area 1 - p/2 marked along the z axis]
Copyright 2003 Massachusetts Institute of Technology
Last modified Oct. 8, 2003