ELE530: Neyman-Pearson and Composite Hypothesis Testing
Fernando Pérez-Cruz
Princeton University and University Carlos III in Madrid
February 16th, 2009
Theory of Detection and Estimation: Neyman-Pearson

Composite Hypothesis Testing
▸ Simple hypothesis testing:
  • Binary: decide whether $y = 0$ or $y = 1$, given an observation.
  • Multiple: decide whether $y = 1$, $y = 2$, ..., $y = M$.
▸ Composite binary hypothesis testing:
  • Decide whether $y = c$ or $y \neq c$.
  • Decide whether $y \geq c$ or $y < c$.
  • Generally, decide whether $y \in \mathcal{Y}_0$ or $y \notin \mathcal{Y}_0$.
▸ Composite multiple hypothesis testing:
  • Decide whether $y \in \mathcal{Y}_0$, $y \in \mathcal{Y}_1$, $y \in \mathcal{Y}_2$, ..., and $\bigcup_{i=1}^{M} \mathcal{Y}_i = \mathcal{Y}$.

Binary Hypothesis Testing
▸ Null hypothesis: $H_0: y \in \mathcal{Y}_0$. Alternative hypothesis: $H_1: y \notin \mathcal{Y}_0$.
▸ Define the rejection region: $\mathcal{X}_1 = \{x \mid T(x) > c\}$.
▸ We reject the null hypothesis if $x \in \mathcal{X}_1$.
▸ We do not reject the null hypothesis if $x \notin \mathcal{X}_1$.

             Retain Null      Reject Null
  H0 true    ✓                Type I error
  H1 true    Type II error    ✓
In detection terminology, the same decision table reads:

             Retain Null      Reject Null
  H0 true    ✓                False alarm
  H1 true    Misdetection     ✓

Definitions
▸ Power of a test: $\beta(y) = P(x \in \mathcal{X}_1 \mid y)$. It measures the probability of rejecting the null hypothesis for a given $y$ and rejection region.
▸ Size of a test: $\alpha = \sup_{y \in \mathcal{Y}_0} \beta(y)$. It measures the maximum probability of rejecting the null hypothesis when it is true (type I error).
▸ A test has level $\alpha$ if its size is less than or equal to $\alpha$.

Example
▸ We are given $n$ observations from $\mathcal{N}(y, \sigma^2)$ with known $\sigma$.
▸ We want to know whether $y \leq 0$ ($H_0$) or $y > 0$ ($H_1$).
▸ We are given the test statistic $T(x) = \frac{1}{n}\sum_{i=1}^{n} x_i$.
▸ We want to select $c$ to get a 0.05-level test. How do we do it?
  • Rejection region: $\mathcal{X}_1 = \{x \mid T(x) > c\}$.
  • The power function:
    $$\beta(y) = P(T(x) > c) = P\left(\frac{\sqrt{n}\,(T(x) - y)}{\sigma} > \frac{\sqrt{n}\,(c - y)}{\sigma}\right) = P\left(Z > \frac{\sqrt{n}\,(c - y)}{\sigma}\right) = 1 - P\left(Z \leq \frac{\sqrt{n}\,(c - y)}{\sigma}\right) = 1 - \Phi\left(\frac{\sqrt{n}\,(c - y)}{\sigma}\right)$$
    where $Z \sim \mathcal{N}(0, 1)$ and $\Phi(\cdot)$ is the cdf of a standard Gaussian.
  • Since $\beta(y)$ is increasing in $y$, the size is $\alpha = \sup_{y \leq 0} \beta(y) = \beta(0) = 1 - \Phi(\sqrt{n}\,c/\sigma)$, so $c = \sigma\,\Phi^{-1}(1 - \alpha)/\sqrt{n}$, i.e. $c \approx 1.645\,\sigma/\sqrt{n}$ for $\alpha = 0.05$.

Example
▸ The power of our test changes with the level $\alpha$ (through $c$) and with the sample size $n$.
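The threshold selection in the Gaussian-mean example can be checked numerically. The following is a minimal sketch rather than part of the original slides: it assumes the size is attained at $y = 0$, so that $c = \sigma\,\Phi^{-1}(1-\alpha)/\sqrt{n}$, and the function names `threshold` and `power` as well as the values $\sigma = 1$, $n = 10$ are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Z ~ N(0, 1)


def threshold(alpha, n, sigma=1.0):
    """Threshold c for a size-alpha test of H0: y <= 0 using the sample mean.

    Assumes the size is attained at the boundary y = 0.
    """
    return sigma * STD_NORMAL.inv_cdf(1.0 - alpha) / sqrt(n)


def power(y, c, n, sigma=1.0):
    """Power function beta(y) = 1 - Phi(sqrt(n) * (c - y) / sigma)."""
    return 1.0 - STD_NORMAL.cdf(sqrt(n) * (c - y) / sigma)


# Illustrative values (assumptions): sigma = 1, n = 10, alpha = 0.05.
c = threshold(0.05, n=10)
print(round(c, 3))                     # 0.52
print(round(power(0.0, c, n=10), 3))   # 0.05, the size of the test
print(power(1.0, c, n=10) > power(0.5, c, n=10))  # True: power grows with y
```

With these assumed values the computed threshold is about 0.52, and evaluating `power` on a grid of `y` values gives one power curve per choice of `c` or `n`.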
▸ Rejection region: $\mathcal{X}_1 = \left\{x \mid \frac{1}{n}\sum_i x_i > c\right\}$.

[Figure: power function $\beta(y)$ versus $y \in [-1, 2]$. One panel varies the threshold, $c = 0$, $c = 0.52$, $c = 0.815$, with sizes $\alpha = 0.5$, $\alpha = 0.05$, $\alpha = 0.005$. The other panel varies the sample size, $n = 5$, $n = 10$, $n = 25$.]

Most Powerful Test
▸ For $T(x)$, we find the $c$ that achieves the desired level $\alpha$.
▸ $\alpha$ is the maximum probability of false alarm over all $y \in \mathcal{Y}_0$.
▸ The minimum value of $\beta(y)$ for $y \notin \mathcal{Y}_0$ is the minimum probability of detection, i.e. one minus the maximum probability of misdetection.
▸ Most powerful test: the test with the highest power for a given level:
  • Maximize the probability of detection for a given false alarm.
    $$\max_{\delta} P_d(\delta) \quad \text{subject to} \quad P_{fa}(\delta) \leq \alpha$$
    $$\min_{\delta} \sup_{y \notin \mathcal{Y}_0} R(\delta \mid y) \quad \text{subject to} \quad \sup_{y \in \mathcal{Y}_0} R(\delta \mid y) \leq \alpha$$

Some tests
▸ There are many different tests.
▸ It is difficult to prove that a test is the most powerful test.
▸ Typically, a test is proposed and its power is compared with that of other tests.
▸ Some tests of interest:
  • Wald test: tests whether $y = y_0$.
  • Permutation test: tests whether $p(x)$ and $q(x)$ are the same.
  • Likelihood ratio test: tests $y \in \mathcal{Y}_0$ versus $y \notin \mathcal{Y}_0$.
  • Neyman-Pearson test: tests $y = y_0$ versus $y = y_1$.

Neyman-Pearson test
▸ Decide between the hypotheses $y = y_0$ and $y = y_1$ given $x = \{x_1, \ldots, x_n\}$.
▸ Neyman-Pearson test:
  $$T(x) = \frac{L(y = y_1)}{L(y = y_0)} = \frac{\prod_{i=1}^{n} p(x_i \mid y = y_1)}{\prod_{i=1}^{n} p(x_i \mid y = y_0)}$$
▸ Neyman-Pearson rejection region: $\mathcal{X}_1 = \{x \mid T(x) > k\}$, where $k$ is such that $\beta(y_0) = \alpha$.
▸ Neyman-Pearson lemma:
  • This test is the most powerful test of level $\alpha$.

Example I
▸ Null hypothesis: $p(x \mid y = 0) = \dfrac{\exp(-(x+1)^2/2)}{\sqrt{2\pi}}$
▸ Alternative hypothesis: $p(x \mid y = 1) = \dfrac{\exp(-x^2/8)}{\sqrt{8\pi}}$
▸ Rejection region: $\mathcal{X}_1 = \left\{x \,\middle|\, \dfrac{1}{2}\exp\left(-\dfrac{x^2}{8} + \dfrac{(x+1)^2}{2}\right) > k\right\}$

[Figure: left panel, $\log L(x)$ and the threshold $\log \tau$ versus $x \in [-6, 6]$; right panel, the densities $p(x \mid y = 0)$ and $p(x \mid y = 1)$.]

k = 0, k = 1, k = −0.5
[Figures, one slide per value of $k$: $\log L(x)$ with the threshold $\log \tau$; the densities $p(x \mid y = 0)$ and $p(x \mid y = 1)$ with the areas $P_{fa}$ and $P_{miss}$ marked; and the resulting operating point in the plane $R(\delta \mid y = 0)$ versus $R(\delta \mid y = 1)$.]

Receiver Operating Characteristic
▸ Repeating this for every value of $k$ traces out the receiver operating characteristic (ROC) curve.
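The sweep over thresholds for Example I can be sketched in a few lines. This is an illustration under stated assumptions, not the code behind the slides: the null is taken as $\mathcal{N}(-1, 1)$ and the alternative as $\mathcal{N}(0, 2^2)$, read off the two densities, and the threshold `t` is applied to the log-likelihood ratio (treating the slide values $0$, $1$, $-0.5$ as log thresholds is an assumption). The name `operating_point` is hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

P0 = NormalDist(mu=-1.0, sigma=1.0)  # null hypothesis density p(x | y=0)
P1 = NormalDist(mu=0.0, sigma=2.0)   # alternative density p(x | y=1)


def log_lr(x):
    """Log-likelihood ratio log p(x|y=1) - log p(x|y=0)."""
    return log(P1.pdf(x)) - log(P0.pdf(x))


def operating_point(t):
    """Return (P_fa, P_d) for the rule 'reject H0 if log_lr(x) > t'.

    log_lr(x) = (3/8) x^2 + x + 1/2 - log 2 is an upward parabola, so the
    rejection region is the outside of the interval between its two roots.
    """
    a, b, c = 3.0 / 8.0, 1.0, 0.5 - log(2.0) - t
    disc = b * b - 4.0 * a * c
    if disc <= 0.0:
        # Threshold at or below the minimum of log_lr: reject with probability 1.
        return 1.0, 1.0
    r1 = (-b - sqrt(disc)) / (2.0 * a)
    r2 = (-b + sqrt(disc)) / (2.0 * a)
    p_fa = P0.cdf(r1) + (1.0 - P0.cdf(r2))
    p_d = P1.cdf(r1) + (1.0 - P1.cdf(r2))
    return p_fa, p_d


# Sweep the log threshold to trace the ROC curve as (P_fa, P_d) pairs.
roc = [operating_point(-1.0 + 0.1 * i) for i in range(61)]
for t in (-0.5, 0.0, 1.0):
    p_fa, p_d = operating_point(t)
    print(f"t = {t:+.1f}: P_fa = {p_fa:.3f}, P_d = {p_d:.3f}")
```

Raising `t` shrinks the rejection region, so both the false alarm and the detection probability decrease together along the curve.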
[Figure: the resulting curve, shown once as $R(\delta \mid y = 1)$ versus $R(\delta \mid y = 0)$ and once as detection rate versus false alarm, with a red and a green circle marked.]
▸ What does the red circle represent?
▸ And the green?

Example II: Biased Coin
▸ Null hypothesis: fair coin.
▸ Alternative hypothesis: biased coin with $p(H) = 0.8$.
▸ We toss the coin three times.
▸ We want to find the most powerful test with level 0.2.
▸ What do we do?
  • Define a rejection set.
  • Compute the power function $\beta(y)$.
  • Obtain the test level.
  • Obtain the test power.

Example II
▸ Observation set: $x \in \{HHH, \ldots, TTT\}$.
▸ Rejection set: $\mathcal{X}_1 = \{\ \}$
▸ Hard decision rules.
▸ Randomized decision rules.
▸ ROC: Receiver Operating Characteristic.
[Figures: operating points in the plane $R(\delta \mid y = 0)$ versus $R(\delta \mid y = 1)$ for the candidate rejection sets, for hard decision rules and for randomized decision rules, followed by the ROC curve as detection rate versus false alarm.]

Risk in Composite Hypothesis Testing
▸ Simple hypothesis testing:
  $$R(\delta) = \int L(\delta(x), y)\, p(x, y)\, dx\, dy$$
▸ Composite hypothesis testing:
  $$R(\delta) = \int L(\delta(x), h(y))\, p(x, y)\, dx\, dy$$
▸ Binary composite hypothesis testing:
  $$R(\delta) = L_{00} \int_{\mathcal{X}_0, \mathcal{Y}_0} p(x, y)\, dx\, dy + L_{10} \int_{\mathcal{X}_1, \mathcal{Y}_0} p(x, y)\, dx\, dy + L_{01} \int_{\mathcal{X}_0, \mathcal{Y}_1} p(x, y)\, dx\, dy + L_{11} \int_{\mathcal{X}_1, \mathcal{Y}_1} p(x, y)\, dx\, dy$$

Example
▸ We are given a coin and we want to know whether it is biased towards heads:
  • $H_0$: $\pi > 0.5$
  • $H_1$: $\pi \leq 0.5$
  where $\pi = p(H)$.
▸ We are given 5 coin tosses: $\mathcal{X} = \{HHHHH, \ldots, TTTTT\}$.
▸ We know a priori that $\pi$ is Beta distributed:
  $$p(\pi) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \pi^{\alpha - 1} (1 - \pi)^{\beta - 1}$$
  with $\alpha = 2$ and $\beta = 3.5$.

Example
▸ Risk for $\delta(x) = H_0$:
  $$R(\delta(x) = H_0) = L_{00} \int_{\mathcal{Y}_0} p(x, y)\, dy + L_{01} \int_{\mathcal{Y}_1} p(x, y)\, dy$$
▸ Risk for $\delta(x) = H_1$:
  $$R(\delta(x) = H_1) = L_{11} \int_{\mathcal{Y}_1} p(x, y)\, dy + L_{10} \int_{\mathcal{Y}_0} p(x, y)\, dy$$
▸ where, with $y = \pi$ and $N_H$ the number of heads in $x$,
  $$p(x, y) = \binom{5}{N_H} \pi^{N_H} (1 - \pi)^{5 - N_H}\, \frac{\Gamma(5.5)}{\Gamma(2)\Gamma(3.5)}\, \pi (1 - \pi)^{2.5}$$
▸ How would you choose $\mathcal{X}_1$ if $L_{00} = L_{11} = 0$ and $L_{01} = L_{10} = 1$?
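One way to explore the last question numerically: with $L_{00} = L_{11} = 0$ and $L_{01} = L_{10} = 1$, deciding $H_0$ costs $\int_{\mathcal{Y}_1} p(x, y)\,dy$ and deciding $H_1$ costs $\int_{\mathcal{Y}_0} p(x, y)\,dy$, so an outcome belongs in $\mathcal{X}_1$ when the joint mass on $\mathcal{Y}_1$ exceeds the mass on $\mathcal{Y}_0$. The sketch below is only an illustration: it summarizes each outcome by its number of heads $N_H$ (as the binomial coefficient in $p(x, y)$ suggests), and the midpoint-rule integrator and the names `joint`, `integrate`, `masses` are assumptions of this example.

```python
from math import comb, gamma

A, B = 2.0, 3.5   # Beta prior parameters alpha, beta from the example
N = 5             # number of coin tosses
PRIOR_CONST = gamma(A + B) / (gamma(A) * gamma(B))


def joint(n_heads, pi):
    """Joint density p(x, pi) with x summarized by its number of heads."""
    prior = PRIOR_CONST * pi ** (A - 1) * (1 - pi) ** (B - 1)
    likelihood = comb(N, n_heads) * pi ** n_heads * (1 - pi) ** (N - n_heads)
    return likelihood * prior


def integrate(f, lo, hi, steps=2000):
    """Midpoint-rule quadrature (a simple choice made for this sketch)."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


def masses(n_heads):
    """Return (mass on Y0: pi > 0.5, mass on Y1: pi <= 0.5)."""
    mass_y0 = integrate(lambda p: joint(n_heads, p), 0.5, 1.0)
    mass_y1 = integrate(lambda p: joint(n_heads, p), 0.0, 0.5)
    return mass_y0, mass_y1


# Under 0-1 loss, decide H1 (x in X1) when the mass on Y1 is the larger one.
reject = []
for k in range(N + 1):
    mass_y0, mass_y1 = masses(k)
    if mass_y1 > mass_y0:
        reject.append(k)
print("Numbers of heads assigned to X1:", reject)
```

Comparing the two masses is equivalent to asking whether the posterior probability of $\pi \leq 0.5$ given $x$ exceeds one half.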
Uniformly Most Powerful Test
▸ Is there a Neyman-Pearson test for composite hypothesis testing? No.
▸ Uniformly Most Powerful (UMP) test:
  • For $y = y_0$ versus $y \in \mathcal{Y}_1$.
  • It exists if the rejection region $\mathcal{X}_1$ satisfying $R(\delta \mid y = y_0) \leq \alpha$ is independent of $y$.
  • Example: test whether the mean of a Gaussian is $\mu_0$ ($H_0$) or greater ($H_1$).
    $$\mathcal{X}_1 = \left\{x \mid x > \sigma\,\Phi^{-1}(1 - \alpha) + \mu_0\right\}$$
    Independent of $y$. Why?

Locally Most Powerful Test
▸ UMP tests usually do not exist.
▸ For $y = y_0$ versus $y \in \mathcal{Y}_1$.
▸ We can apply the locally most powerful test.
▸ Idea: compare $y_0$ with $y$ tending towards $y_0$.
  • $\beta(y) = \beta(y_0) + (y - y_0) \left.\dfrac{\partial \beta(y)}{\partial y}\right|_{y = y_0} + O((y - y_0)^2)$.
  • $\beta(y) = \alpha + (y - y_0) \left.\dfrac{\partial \beta(y)}{\partial y}\right|_{y = y_0} + O((y - y_0)^2)$.
  • We want a test that maximizes $\left.\dfrac{\partial \beta(y)}{\partial y}\right|_{y = y_0}$.

Generalized Likelihood Ratio Test
▸ Neyman-Pearson test:
  $$T(x) = \frac{L(y = y_1)}{L(y = y_0)} = \frac{\prod_{i=1}^{n} p(x_i \mid y = y_1)}{\prod_{i=1}^{n} p(x_i \mid y = y_0)}$$
▸ GLR test:
  $$T(x) = \frac{\sup_{y \in \mathcal{Y}_1} L(y)}{\sup_{y \in \mathcal{Y}_0} L(y)} = \frac{\sup_{y \in \mathcal{Y}_1} \prod_{i=1}^{n} p(x_i \mid y)}{\sup_{y \in \mathcal{Y}_0} \prod_{i=1}^{n} p(x_i \mid y)}$$
▸ GLR rejection region: $\mathcal{X}_1 = \{x \mid T(x) > k\}$, where $k$ is such that $\alpha = \sup_{y \in \mathcal{Y}_0} \beta(y)$.
▸ This test is not necessarily the most powerful.

Example
▸ Repeat the previous example ($\mathcal{Y}_0$: $\pi > 0.5$, $\mathcal{Y}_1$: $\pi \leq 0.5$, 5 tosses).
▸ For $x = \{HHHHH\}$:
  • $p(x \mid \pi) = \pi^5$
  • $\sup_{y \in \mathcal{Y}_1} L(y) = 0.5^5 = 0.03125$
  • $\sup_{y \in \mathcal{Y}_0} L(y) = 1^5 = 1$
▸ For $x = \{THHHH\}, \ldots, \{HHHHT\}$:
  • $p(x \mid \pi) = 5\pi^4 (1 - \pi)$
  • $\sup_{y \in \mathcal{Y}_1} L(y) = 5 \times 0.5^5 = 0.15625$
  • $\sup_{y \in \mathcal{Y}_0} L(y) = 5 \times 0.8^4 \times 0.2 = 0.4096$
▸ For $x = \{TTHHH\}, \ldots, \{HHHTT\}$:
  • $p(x \mid \pi) = 10\pi^3 (1 - \pi)^2$
  • $\sup_{y \in \mathcal{Y}_1} L(y) = 10 \times 0.5^5 = 0.3125$
  • $\sup_{y \in \mathcal{Y}_0} L(y) = 10 \times 0.6^3 \times 0.4^2 = 0.3456$
▸ For $x = \{TTTHH\}, \ldots, \{HHTTT\}$:
  • $p(x \mid \pi) = 10\pi^2 (1 - \pi)^3$
  • $\sup_{y \in \mathcal{Y}_1} L(y) = 10 \times 0.4^2 \times 0.6^3 = 0.3456$
  • $\sup_{y \in \mathcal{Y}_0} L(y) = 10 \times 0.5^5 = 0.3125$
▸ For $x = \{TTTTH\}, \ldots, \{HTTTT\}$:
  • $p(x \mid \pi) = 5\pi (1 - \pi)^4$
  • $\sup_{y \in \mathcal{Y}_1} L(y) = 5 \times 0.2 \times 0.8^4 = 0.4096$
  • $\sup_{y \in \mathcal{Y}_0} L(y) = 5 \times 0.5^5 = 0.15625$
▸ For $x = \{TTTTT\}$:
  • $p(x \mid \pi) = (1 - \pi)^5$
  • $\sup_{y \in \mathcal{Y}_1} L(y) = 1^5 = 1$
  • $\sup_{y \in \mathcal{Y}_0} L(y) = 0.5^5 = 0.03125$
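The suprema in this example can be reproduced with a short computation. The sketch below is an illustration, not the author's code: it relies on the binomial likelihood being unimodal in $\pi$ with its maximum at $N_H/n$, so the supremum over each region is found by clipping that maximizer to the region (over $\pi > 0.5$ the supremum may only be approached at the boundary). The names `binom_lik` and `glr` are hypothetical.

```python
from math import comb


def binom_lik(n_heads, n, pi):
    """Likelihood of observing n_heads heads in n tosses with p(H) = pi."""
    return comb(n, n_heads) * pi ** n_heads * (1 - pi) ** (n - n_heads)


def glr(n_heads, n=5, boundary=0.5):
    """Return (sup over Y1, sup over Y0, GLR statistic T).

    Y1 is pi <= boundary and Y0 is pi > boundary. The likelihood peaks at
    n_heads / n, so each supremum is reached (or approached) at that point
    clipped to the corresponding region.
    """
    mle = n_heads / n
    sup_y1 = binom_lik(n_heads, n, min(mle, boundary))
    sup_y0 = binom_lik(n_heads, n, max(mle, boundary))
    return sup_y1, sup_y0, sup_y1 / sup_y0


for k in range(5, -1, -1):
    sup_y1, sup_y0, t = glr(k)
    print(f"N_H = {k}: sup_Y1 = {sup_y1:.5f}, sup_Y0 = {sup_y0:.5f}, T = {t:.4f}")
```

The statistic $T$ grows as the number of heads decreases, so a GLR rejection region $\{x \mid T(x) > k\}$ collects the outcomes with the fewest heads first.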