1.017/1.010 Class 16 Testing Hypotheses about a Single Population Formulating Hypothesis Testing Problems Hypotheses about a random variable x are often formulated in terms of its distributional properties. Example, if property is a: Null hypothesis H0: a = a0 Objective of hypothesis testing is to decide whether or not to reject this hypothesis. Decision is based on estimator aˆ of a: Reject H0: If observed estimate â lies in rejection region Ra0 ( aˆ ∈ Ra 0 ) Do not reject H0: Otherwise ( aˆ ∉ Ra 0 ) Select rejection region to obtain desired error properties: Test Result Do not reject H0 aˆ ∉ Ra 0 Reject H0 aˆ ∈ Ra 0 H0 true P(H0|H0) =1 - α P(~H0|H0) = α (Type I Error) H0 false P(H0|~H0) = β (Type II Error) P(~H0|~H0) = 1- β True situation Type I error probability α is called the test significance level. Deriving Hypothesis Rejection Regions for Large Sample Tests Hypothesis test is often based on a standardized statistic that depends on unknown true property and its estimate. Basic concepts are the same as used to derive confidence intervals (see Class 14). An example is the z statistic: z (aˆ , a) = aˆ - a SD[aˆ ] If the estimate is unbiased E[z] = 0 and Var[z] = 1. 1 Define a rejection region Rz0 in terms of z as: R z 0 : z (aˆ , a0 ) ≤ z L z (aˆ , a0 ) ≥ zU As rejection region grows Type I error increases and Type II error decreases (test is more likely to reject hypothesis). As rejection region shrinks Type I error decreases and Type II error increases (test is less likely to reject hypothesis) Usual practice is to select rejection region to insure that Type I error probability is equal to a specified value α. For a two-sided test require that Type I error probability is distributed equally between intervals below zL (probability = α/2) and above zU (probability = α/2). These probabilities are: P[ z (aˆ , a) ≤ z L | H0] = P[ z (aˆ , a0 ) ≤ z L ] = Fz ( z L ) = α 2 P[ z (aˆ , a) ≥ zU | H0] = P[ z (aˆ , a0 ) ≥ zU ] = 1 − Fz ( zU ) = α z L = Fz-1 2 α 2 α zU = Fz-1 1 − 2 For large samples z (aˆ , a0 ) has a unit normal distribution. Use the MATLAB function norminv to evaluate Fz-1 . If the definition of z is applied a two-sided rejection region Ra0 can also be written directly in terms of the estimate aˆ : α Ra 0 : aˆ ≤ a L = a0 + Fz-1 SD[aˆ ] 2 α aˆ ≥ aU = a0 + Fz-1 1 − SD[aˆ ] 2 p Values p value is largest significance level resulting in acceptance of H0. For a symmetric two-sided rejection region and a large sample: 2 aˆ − a0 p / 2 = 1 − Fz SD(aˆ ) aˆ − a0 p / 2 = Fz SD(aˆ ) aˆ ≥ a aˆ ≤ a0 0 For large samples use the MATLAB function normcdf to compute p from aˆ and SD[ â ]. Special Case -- Sample mean Consider hypothesis about value of population mean a = E[x]: H0: a = E[x] = a0 Base test on sample mean estimator mx. Obtain SD[mx] from sample standard deviation: SD[m x ] = SD[ x ] N ≈ sx N Example: Testing whether mean is significantly different from zero Suppose a0 = 0, sx = 3, N = 9, mx = 1.2 and α = .05: 0.05 3 Ra 0 : m x ≤ a L = 0 + Fz-1 = −1.96 2 9 0.05 3 m x ≥ aU = 0 + Fz-1 1 − = +1.96 2 9 In this case hypothesis is not rejected since mx = 1.2 does not lie in Ra0. The two-sided p-value is (see plot): m − a0 1 − p / 2 = Fz x = Fz s / N x p = 0.22 3 1.2 − 0 = Fz [1.2] = .89 3 / 9 1-p/2 z Copyright 2003 Massachusetts Institute of Technology Last modified Oct. 8, 2003 4