is a maximum a posteriori probability (MAP) hypothesis test. In such a test, A0 contains all outcomes s for which P[H0|s] > P[H1|s], and A1 contains all outcomes for which P[H1|s] > P[H0|s]. If P[H0|s] = P[H1|s], the assignment of s to either set does not affect PERR. In Theorem 8.1, we arbitrarily assign s to A0 when the a posteriori probabilities are equal. We would have the same probability of error if we assign s to A1 for all outcomes that produce equal a posteriori probabilities, or if we assign some outcomes with equal a posteriori probabilities to A0 and others to A1. Equation (8.8) is another statement of the MAP decision rule. It contains the three probability models that are assumed to be known:

• The a priori probabilities of the hypotheses: P[H0] and P[H1],
• The likelihood function of H0: P[s|H0],
• The likelihood function of H1: P[s|H1].

When the outcomes of an experiment yield a random vector X as the decision statistic, we can express the MAP rule in terms of conditional PMFs or PDFs. If X is discrete, we take X = xi to be the outcome of the experiment. If the sample space S of the experiment is continuous, we interpret the conditional probabilities by assuming that each outcome corresponds to the random vector X in the small volume x ≤ X < x + dx with probability fX(x)dx. Section 4.9 demonstrates that the conditional probabilities are ratios of probability densities. Thus in terms of the random vector X, we have the following version of the MAP hypothesis test.

Theorem 8.2
For an experiment that produces a random vector X, the MAP hypothesis test is

Discrete:    x ∈ A0 if P_X|H0(x) / P_X|H1(x) ≥ P[H1] / P[H0];   x ∈ A1 otherwise,

Continuous:  x ∈ A0 if f_X|H0(x) / f_X|H1(x) ≥ P[H1] / P[H0];   x ∈ A1 otherwise.

In these formulas, the ratio of conditional probabilities is referred to as a likelihood ratio. The formulas state that in order to perform a binary hypothesis test, we observe the outcome of an experiment, calculate the likelihood ratio on the left side of the formula, and compare it with a constant on the right side of the formula. We can view the likelihood ratio as the evidence, based on an observation, in favor of H0: if the likelihood ratio is greater than 1, H0 is more likely than H1. The ratio of prior probabilities, on the right side, is the evidence, prior to performing the experiment, in favor of H1. Therefore, Theorem 8.2 states that H0 is the better conclusion if the evidence in favor of H0, based on the experiment, outweighs the prior evidence in favor of H1.

In many practical hypothesis tests, including the following example, it is convenient to compare the logarithms of the two ratios.

Example 8.6
With probability p, a digital communications system transmits a 0. It transmits a 1 with probability 1 − p. The received signal is either X = −v + N volts, if the transmitted bit is 0, or v + N volts, if the transmitted bit is 1. The voltage ±v is the information component of the received signal, and N, a Gaussian (0, σ) random variable, is the noise component.

Performing the same substitutions and simplifications as in Example 8.8 yields

n ∈ A0 if n ≥ n* = 1 + ln( q1 P[H1] C01 / (q0 P[H0] C10) ) / ln( (1 − q0) / (1 − q1) ) = 58.92;   n ∈ A1 otherwise.

Therefore, in the minimum cost hypothesis test, A0 = {n ≥ 59}. We test at most 58 disk drives to reach a conclusion regarding the state of the factory. If all 58 drives pass the test, then N ≥ 59, and the failure rate is declared to be 10^−4. The error probabilities are:

PFA = P[N ≤ 58|H0] = F_N|H0(58) = 1 − (1 − 10^−4)^58 = 0.0058,

PMISS = P[N ≥ 59|H1] = 1 − F_N|H1(58) = (1 − 10^−1)^58 = 0.0022.

The average cost (in dollars) of this rule is

E[C] = P[H0] PFA C10 + P[H1] PMISS C01 = (0.9)(0.0058)(50,000) + (0.1)(0.0022)(200,000) = 305.

By comparison, the MAP test, which minimizes the probability of an error but not the expected cost, has an expected cost

E[CMAP] = (0.9)(0.0046)(50,000) + (0.1)(0.0079)(200,000) = 365.

A savings of $60 may not seem very large. The reason is that both the MAP test and the minimum cost test work very well. By comparison, for a factory that skips testing altogether, each day that the failure rate is q1 = 0.1 results in 1,000 returned drives at an expected cost of $200,000. Since this situation occurs with probability P[H1] = 0.1, the expected cost of a "no test" policy is $20,000 per day.
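The arithmetic behind the minimum cost test of this example is easy to check numerically. The short Python sketch below (an illustrative addition, not part of the text) recomputes the threshold, the error probabilities, and the expected cost under the assumptions of Examples 8.8 and 8.9: a geometric decision statistic N with failure rates q0 = 10^−4 and q1 = 0.1, priors P[H0] = 0.9 and P[H1] = 0.1, and error costs C10 = $50,000 and C01 = $200,000.

```python
import math

# Parameters quoted in the disk drive examples: failure rates under each
# hypothesis, a priori probabilities, and the two error costs.
q0, q1 = 1e-4, 1e-1          # P[drive fails] under H0 and H1
P0, P1 = 0.9, 0.1            # a priori probabilities P[H0], P[H1]
C10, C01 = 50_000, 200_000   # cost of a false alarm, cost of a miss

# Minimum cost threshold: accept H0 when n >= n*, with
# n* = 1 + ln(q1*P1*C01 / (q0*P0*C10)) / ln((1-q0)/(1-q1)).
n_star = 1 + (math.log((q1 * P1 * C01) / (q0 * P0 * C10))
              / math.log((1 - q0) / (1 - q1)))
n0 = math.ceil(n_star)       # smallest integer in A0

# Error probabilities of the threshold rule A0 = {n >= n0} for geometric N.
P_FA = 1 - (1 - q0) ** (n0 - 1)    # P[N <= n0 - 1 | H0]
P_MISS = (1 - q1) ** (n0 - 1)      # P[N >= n0 | H1]

expected_cost = P0 * P_FA * C10 + P1 * P_MISS * C01
print(f"n* = {n_star:.2f}, A0 = {{n >= {n0}}}")
print(f"P_FA = {P_FA:.4f}, P_MISS = {P_MISS:.4f}, E[C] = ${expected_cost:.0f}")
```

Running the sketch reproduces n* = 58.92, A0 = {n ≥ 59}, PFA = 0.0058, PMISS = 0.0022, and E[C] ≈ $305, matching the values above.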
Neyman-Pearson Test

Given an observation, the MAP test minimizes the probability of accepting the wrong hypothesis and the minimum cost test minimizes the cost of errors. However, the MAP test requires that we know the a priori probabilities P[Hi] of the competing hypotheses, and the minimum cost test requires that we know, in addition, the costs of the two types of errors. In many situations, these costs and a priori probabilities are difficult or even impossible to specify. In this case, an alternate approach is to specify a tolerable level for either the false alarm or miss probability. This idea is the basis for the Neyman-Pearson test. The Neyman-Pearson test minimizes PMISS subject to the false alarm probability constraint PFA = α, where α is a constant that indicates our tolerance of false alarms. Because PFA = P[A1|H0] and PMISS = P[A0|H1] are conditional probabilities, the test does not require the a priori probabilities P[H0] and P[H1]. We first describe the Neyman-Pearson test when the decision statistic is a continuous random vector.

Theorem 8.4   Neyman-Pearson Binary Hypothesis Test
Based on the decision statistic X, a continuous random vector, the decision rule that minimizes PMISS, subject to the constraint PFA = α, is

x ∈ A0 if L(x) = f_X|H0(x) / f_X|H1(x) ≥ γ;   x ∈ A1 otherwise,

where γ is chosen so that ∫_{L(x)<γ} f_X|H0(x) dx = α.

Proof  Using the Lagrange multiplier method, we define the Lagrange multiplier λ and the function

G = PMISS + λ(PFA − α)                                                   (8.27)
  = ∫_A0 f_X|H1(x) dx + λ( 1 − ∫_A0 f_X|H0(x) dx − α )                   (8.28)
  = ∫_A0 [ f_X|H1(x) − λ f_X|H0(x) ] dx + λ(1 − α)                       (8.29)

For a given λ and α, we see that G is minimized if A0 includes all x satisfying

f_X|H1(x) − λ f_X|H0(x) ≤ 0.                                             (8.30)

Note that λ is found from the constraint PFA = α. Moreover, we observe that Equation (8.30) implies λ > 0; otherwise, f_X|H1(x) − λ f_X|H0(x) > 0 for all x, and A0 = ∅, the empty set, would minimize G. In this case, PFA = 1, which would violate the constraint that PFA = α. Since λ > 0, we can rewrite the inequality (8.30) as L(x) ≥ 1/λ = γ.

In the radar system of Example 8.4, the decision statistic was a random variable X and the receiver operating curves (ROCs) of Figure 8.2 were generated by adjusting a threshold x0 that specified the sets A0 = {X ≤ x0} and A1 = {X > x0}. Example 8.4 did not question whether this rule finds the best ROC, that is, the best trade-off between PMISS and PFA. The Neyman-Pearson test finds the best ROC. For each specified value of PFA = α, the Neyman-Pearson test identifies the decision rule that minimizes PMISS. In the Neyman-Pearson test, an increase in γ decreases PMISS but increases PFA. When the decision statistic X is a continuous random vector, we can choose γ so that the false alarm probability is exactly α. This may not be possible when X is discrete. In the discrete case, we have the following version of the Neyman-Pearson test.

Theorem 8.5   Discrete Neyman-Pearson Test
Based on the decision statistic X, a discrete random vector, the decision rule that minimizes PMISS, subject to the constraint PFA ≤ α, is

x ∈ A0 if L(x) = P_X|H0(x) / P_X|H1(x) ≥ γ;   x ∈ A1 otherwise,

where γ is the largest possible value such that Σ_{L(x)<γ} P_X|H0(x) ≤ α.
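To make Theorem 8.4 concrete, here is a minimal sketch of the Neyman-Pearson threshold calculation for a scalar Gaussian decision statistic of the kind introduced in Example 8.6. The numerical values of v, σ, and α are hypothetical choices for illustration only. Because the likelihood ratio is monotone decreasing in x, the rule L(x) ≥ γ reduces to a threshold on x itself, and the constraint PFA = α fixes that threshold.

```python
from statistics import NormalDist

# Hypothetical illustration values (not from the text): signal level v,
# noise standard deviation sigma, and false alarm tolerance alpha.
v, sigma, alpha = 1.0, 1.0, 0.05

std = NormalDist()                       # standard Gaussian

# Under H0: X ~ N(-v, sigma); under H1: X ~ N(+v, sigma).
# L(x) = f_{X|H0}(x) / f_{X|H1}(x) = exp(-2*v*x/sigma**2) is decreasing
# in x, so the Neyman-Pearson rule L(x) >= gamma is equivalent to x <= x0.
# The constraint P_FA = P[X > x0 | H0] = alpha fixes the threshold x0:
x0 = -v + sigma * std.inv_cdf(1 - alpha)

# Resulting miss probability and the equivalent likelihood-ratio threshold.
P_MISS = std.cdf((x0 - v) / sigma)       # P[X <= x0 | H1]
gamma = NormalDist(-v, sigma).pdf(x0) / NormalDist(v, sigma).pdf(x0)

print(f"x0 = {x0:.3f}, gamma = {gamma:.3f}, P_MISS = {P_MISS:.4f}")
```

The same pattern applies to any continuous decision statistic with a monotone likelihood ratio: the PFA constraint pins down a threshold on the statistic, and γ is simply the value L takes at that threshold.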
Example 8.10
Continuing the disk drive factory test of Example 8.8, design a Neyman-Pearson test such that the false alarm probability satisfies PFA ≤ α = 0.01. Calculate the resulting miss and false alarm probabilities.

The Neyman-Pearson test is

n ∈ A0 if L(n) = P_N|H0(n) / P_N|H1(n) ≥ γ;   n ∈ A1 otherwise.                    (8.31)

We see from Equation (8.15) that this is the same as the MAP test with P[H1]/P[H0] replaced by γ. Thus, just like the MAP test, the Neyman-Pearson test must be a threshold test of the form

n ∈ A0 if n ≥ n*;   n ∈ A1 otherwise.                                              (8.32)

Some algebra would allow us to find the threshold n* in terms of the parameter γ. However, this is unnecessary. It is simpler to choose n* directly so that the test meets the false alarm probability constraint

PFA = P[N ≤ n* − 1|H0] = F_N|H0(n* − 1) = 1 − (1 − q0)^(n*−1) ≤ α.                 (8.33)

This implies

n* ≤ 1 + ln(1 − α) / ln(1 − q0) = 1 + ln(0.99) / ln(0.9999) = 101.49.              (8.34)

Thus, we can choose n* = 101 and still meet the false alarm probability constraint. The error probabilities are:

PFA = P[N ≤ 100|H0] = 1 − (1 − 10^−4)^100 = 0.00995,                               (8.35)

PMISS = P[N ≥ 101|H1] = (1 − 10^−1)^100 = 2.66 × 10^−5.                            (8.36)

We see that a one percent false alarm probability yields a dramatic reduction in the probability of a miss. Although the Neyman-Pearson test minimizes neither the overall probability of a test error nor the expected cost E[C], it may be preferable to either the MAP test or the minimum cost test. In particular, customers will judge the quality of the disk drives and the reputation of the factory based on the number of defective drives that are shipped. Compared to the other tests, the Neyman-Pearson test results in a much lower miss probability and far fewer defective drives being shipped.
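The numbers in Example 8.10 follow directly from the geometric model; the sketch below, an illustrative addition assuming only the failure rates q0 = 10^−4 and q1 = 0.1 and the tolerance α = 0.01 given in the example, reproduces them.

```python
import math

# Disk drive test of Examples 8.8 and 8.10: geometric decision statistic N.
q0, q1, alpha = 1e-4, 1e-1, 0.01

# Largest threshold n* satisfying P_FA = 1 - (1 - q0)**(n* - 1) <= alpha,
# i.e. n* <= 1 + ln(1 - alpha) / ln(1 - q0).
n_star = math.floor(1 + math.log(1 - alpha) / math.log(1 - q0))

P_FA = 1 - (1 - q0) ** (n_star - 1)    # P[N <= n* - 1 | H0]
P_MISS = (1 - q1) ** (n_star - 1)      # P[N >= n* | H1]

print(f"n* = {n_star}, P_FA = {P_FA:.5f}, P_MISS = {P_MISS:.2e}")
```

The output, n* = 101 with PFA ≈ 0.00995 and PMISS ≈ 2.66 × 10^−5, matches Equations (8.35) and (8.36).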
Maximum Likelihood Test

Similar to the Neyman-Pearson test, the maximum likelihood (ML) test is another method that avoids the need for a priori probabilities. Under the ML approach, we treat the hypothesis as some sort of "unknown" and choose the hypothesis Hi for which P[s|Hi], the conditional probability of the outcome s given the hypothesis Hi, is largest. The idea behind choosing a hypothesis to maximize the probability of the observation is to avoid making assumptions about the a priori probabilities P[Hi]. The resulting decision rule, called the maximum likelihood (ML) rule, can be written mathematically as:

Definition 8.1   Maximum Likelihood Decision Rule
For a binary hypothesis test based on the experimental outcome s ∈ S, the maximum likelihood (ML) decision rule is

s ∈ A0 if P[s|H0] ≥ P[s|H1];   s ∈ A1 otherwise.

Comparing Theorem 8.1 and Definition 8.1, we see that in the absence of information about the a priori probabilities P[Hi], we have adopted a maximum likelihood decision rule that is the same as the MAP rule under the assumption that hypotheses H0 and H1 occur with equal probability. In essence, in the absence of a priori information, the ML rule assumes that all hypotheses are equally likely. By comparing the likelihood ratio to a threshold equal to 1, the ML hypothesis test is neutral about whether H0 has a higher probability than H1 or vice versa.

When the decision statistic of the experiment is a random vector X, we can express the ML rule in terms of conditional PMFs or PDFs, just as we did for the MAP rule.

Theorem 8.6
If an experiment produces a random vector X, the ML decision rule is

Discrete:    x ∈ A0 if P_X|H0(x) / P_X|H1(x) ≥ 1;   x ∈ A1 otherwise,

Continuous:  x ∈ A0 if f_X|H0(x) / f_X|H1(x) ≥ 1;   x ∈ A1 otherwise.

Comparing Theorem 8.6 to Theorem 8.4, when X is continuous, or Theorem 8.5, when X is discrete, we see that the maximum likelihood test is the same as the Neyman-Pearson test with parameter γ = 1. This guarantees that the maximum likelihood test is optimal in the limited sense that no other test can reduce PMISS for the same PFA.

In practice, we use an ML hypothesis test in many applications. It is almost as effective as the MAP hypothesis test when the experiment that produces the outcome s is reliable in the sense that PERR for the ML test is low. To see why this is true, examine the decision rule in Example 8.6. When the signal-to-noise ratio 2v/σ is high, the threshold (of the log-likelihood ratio) is close to 0, which means that the result of the MAP hypothesis test is close to the result of an ML hypothesis test, regardless of the prior probability p.

Example 8.11
Continuing the disk drive test of Example 8.8, design the maximum likelihood test for the factory state based on the decision statistic N, the number of drives tested up to and including the first failure.

The ML hypothesis test corresponds to the MAP test with P[H0] = P[H1] = 0.5. In this case, Equation (8.16) implies n* = 66.62, or A0 = {n ≥ 67}. The conditional error probabilities under the ML rule are

PFA = P[N ≤ 66|H0] = 1 − (1 − 10^−4)^66 = 0.0066,                                  (8.37)

PMISS = P[N ≥ 67|H1] = (1 − 10^−1)^66 = 9.55 × 10^−4.                              (8.38)

For the ML test, PERR = 0.0060. Comparing the MAP rule with the ML rule, we see that the prior information used in the MAP rule makes it more difficult to reject the null hypothesis. We need only 46 good drives in the MAP test to accept H0, while in the ML test the first 66 drives must pass before we accept H0.
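As a rough side-by-side comparison of the tests applied to the disk drive problem, the sketch below evaluates the error probabilities and PERR of the generic threshold rule A0 = {n ≥ n*} at the thresholds quoted in Examples 8.9 through 8.11. The helper function and the tabulation are illustrative additions, not part of the text.

```python
import math

# Disk drive test parameters (Example 8.8): geometric decision statistic N.
q0, q1 = 1e-4, 1e-1          # failure rates under H0 and H1
P0, P1 = 0.9, 0.1            # a priori probabilities, used only for P_ERR

def threshold_test(n0):
    """Error probabilities of the rule A0 = {n >= n0} for geometric N."""
    p_fa = 1 - (1 - q0) ** (n0 - 1)    # P[N <= n0 - 1 | H0]
    p_miss = (1 - q1) ** (n0 - 1)      # P[N >= n0 | H1]
    return p_fa, p_miss, P0 * p_fa + P1 * p_miss

# ML threshold: MAP rule with equal priors,
# n* = 1 + ln(q1/q0) / ln((1-q0)/(1-q1)) ~ 66.6, so A0 = {n >= 67}.
n_ml = math.ceil(1 + math.log(q1 / q0) / math.log((1 - q0) / (1 - q1)))

for name, n0 in [("ML", n_ml), ("Neyman-Pearson", 101), ("minimum cost", 59)]:
    p_fa, p_miss, p_err = threshold_test(n0)
    print(f"{name:>15}: n* = {n0:3d}, P_FA = {p_fa:.4f}, "
          f"P_MISS = {p_miss:.2e}, P_ERR = {p_err:.4f}")
```

For the ML threshold this reproduces PFA = 0.0066, PMISS = 9.55 × 10^−4, and PERR = 0.0060, as in Example 8.11, and it makes the trade-off visible: moving the threshold from 59 to 101 shrinks PMISS by nearly two orders of magnitude at the price of a larger PFA.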