Other Test Constructions: Likelihood Ratio & Bayes Tests

Side-Note: So far we have seen a few approaches for creating tests, such as

• Neyman-Pearson Lemma ("most powerful" tests of H0 : θ = θ0 vs H1 : θ = θ1)
• Two methods for "uniformly most powerful" (UMP) tests
  – Method I: based on the Neyman-Pearson Lemma (may work for R^p-valued parameters θ and tests of H0 : θ ∈ Θ0 ⊂ R^p vs H1 : θ ∉ Θ0)
  – Method II: monotone likelihood ratio (may work for real-valued θ ∈ R and tests "H0 : θ ≤ θ0 vs H1 : θ > θ0" or "H0 : θ ≥ θ0 vs H1 : θ < θ0")

The tests above are often rooted in comparing likelihoods to make a testing decision (e.g., the Neyman-Pearson Lemma). We next consider a very general testing procedure based on comparing the ratio of two likelihoods.

A.) Likelihood Ratio Tests

Definition: Let f(x|θ), θ ∈ Θ ⊂ R^p, be the joint pdf/pmf of X = (X1, . . . , Xn) (the parameter θ can be vector-valued) and let Θ0 be a nonempty proper subset of Θ. Then, the likelihood ratio statistic (LRS) for testing H0 : θ ∈ Θ0 ⊂ R^p vs H1 : θ ∉ Θ0 is defined as

    λ(x) = max_{θ ∈ Θ0} f(x|θ) / max_{θ ∈ Θ} f(x|θ).

Note that if θ̂ ≡ the MLE of θ over the entire Θ and θ̃ ≡ the maximizer of f(x|θ) over θ ∈ Θ0, then we may write

    λ(x) = f(x|θ̃) / f(x|θ̂).

Definition: A size α likelihood ratio test (LRT) for testing H0 : θ ∈ Θ0 ⊂ R^p vs H1 : θ ∉ Θ0 is defined as

    ϕ(x) = 1  if λ(x) < k
           γ  if λ(x) = k
           0  if λ(x) > k

where γ ∈ [0, 1] and 0 ≤ k ≤ 1 are constants determined by max_{θ ∈ Θ0} E_θ ϕ(X) = α.

Example: Let X1, . . . , Xn be iid Gamma(α = 3, θ), θ > 0. Find a size α LRT for H0 : θ = θ0 vs H1 : θ ≠ θ0.

Example: Let X1, . . . , Xn be iid Exponential(θ, ν), θ > 0, ν ∈ R, with common pdf

    f(x|θ, ν) = (1/θ) e^{−(x−ν)/θ}  if x ≥ ν
                0                   otherwise

Find a size α LRT for H0 : ν = ν0 vs H1 : ν ≠ ν0 (where ν0 ∈ R is fixed).

B.) Large Sample Properties of LRT Tests (for calibration)

The following result describes the asymptotic distribution of the likelihood ratio statistic (under appropriate regularity conditions) and may be used to calibrate a LRT in a simple fashion when the sample size is sufficiently large.

Theorem: Let X1, X2, . . . be iid random vectors with common pdf/pmf f(x|θ), θ ∈ Θ ⊂ R^p (the parameter θ can be vector-valued). Let λn(X1, X2, . . . , Xn) denote the likelihood ratio statistic based on X1, X2, . . . , Xn for testing H0 : θ ∈ Θ0 ⊂ R^p vs H1 : θ ∉ Θ0, where Θ0 has the form

    Θ0 = { θ = (θ1, . . . , θp) ∈ Θ : θ1 = θ1^0, . . . , θr = θr^0 }

for some hypothesized values θ1^0, . . . , θr^0 of the first r ≤ p parameters. Then, under the Cramér-Rao type regularity conditions, it holds that if H0 is true,

    −2 log λn(X1, X2, . . . , Xn) →d χ²_r   as n → ∞,

where →d denotes convergence in distribution.

Remark: The above limiting distribution suggests the following testing procedure based on the (1 − α)-quantile of a χ²_r distribution, denoted χ²_{1−α}(r), for which

    P(χ²_r ≤ χ²_{1−α}(r)) = 1 − α,    P(χ²_r > χ²_{1−α}(r)) = α.

Namely, for n large (e.g., say n ≥ 30 observations),

    ϕ(X1, X2, . . . , Xn) = 1  if −2 log λn(X1, X2, . . . , Xn) > χ²_{1−α}(r)
                            0  otherwise

is an approximate size α LRT for testing "H0 : θ1 = θ1^0, . . . , θr = θr^0" vs "H1 : θi ≠ θi^0 for some 1 ≤ i ≤ r."

Example: Let X1, X2, . . . be iid N2(µ, A) random vectors, where µ = (µ1, µ2) ∈ R² and A is a known 2 × 2 positive definite matrix. Find a size α LRT for testing H0 : 2µ1 + 3µ2 = 0 vs H1 : 2µ1 + 3µ2 ≠ 0.
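For the bivariate normal example above, the LRT statistic has a closed form: writing c = (2, 3)′, a Lagrange multiplier argument gives the restricted MLE µ̃ = x̄ − Ac(c′x̄)/(c′Ac) under c′µ = 0, and −2 log λn = n(c′x̄)²/(c′Ac), which is exactly χ²₁-distributed under H0 since c′X̄ ∼ N(0, c′Ac/n). Below is a minimal Python sketch checking this numerically; the covariance A, the true mean, and the sample size are hypothetical values chosen so that H0 holds.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
A = np.array([[2.0, 0.5], [0.5, 1.0]])  # hypothetical known positive definite covariance
c = np.array([2.0, 3.0])                # H0: c'mu = 2*mu1 + 3*mu2 = 0
mu_true = np.array([3.0, -2.0])         # hypothetical mean satisfying H0: 2*3 + 3*(-2) = 0
n, size_alpha = 100, 0.05

x = rng.multivariate_normal(mu_true, A, size=n)
xbar = x.mean(axis=0)

# Restricted MLE under c'mu = 0 (Lagrange multiplier solution)
mu_tilde = xbar - A @ c * (c @ xbar) / (c @ A @ c)

# -2 log lambda_n = n (c'xbar)^2 / (c'Ac); verify it equals the quadratic form
lrs = n * (c @ xbar) ** 2 / (c @ A @ c)
quad = n * (xbar - mu_tilde) @ np.linalg.solve(A, xbar - mu_tilde)
assert np.isclose(lrs, quad)

cutoff = stats.chi2.ppf(1 - size_alpha, df=1)  # chi^2_{1-alpha}(1)
print(f"-2 log lambda_n = {lrs:.3f}, cutoff = {cutoff:.3f}")
```

Here ϕ = 1 when the printed statistic exceeds the cutoff; since the statistic is exactly χ²₁ for every µ satisfying H0, this LRT has exact (not merely approximate) size α.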
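To illustrate the large-sample calibration from Section B more generally, here is a minimal Python sketch (not part of the original notes) for the Gamma example of Section A, testing H0 : θ = θ0 vs H1 : θ ≠ θ0 so that r = 1. It treats θ as the scale parameter, in which case the unrestricted MLE is θ̂ = X̄/3; the values θ0 = 2, n = 50, and test size 0.05 are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
shape_a, theta0, n, size_alpha = 3.0, 2.0, 50, 0.05

x = rng.gamma(shape=shape_a, scale=theta0, size=n)  # data simulated under H0

def loglik(theta, x):
    # Gamma(alpha = 3, theta) log-likelihood, with theta the scale parameter
    return np.sum(stats.gamma.logpdf(x, a=shape_a, scale=theta))

theta_hat = x.mean() / shape_a                 # unrestricted MLE: theta_hat = xbar / 3
lrs = -2.0 * (loglik(theta0, x) - loglik(theta_hat, x))  # -2 log lambda_n >= 0
cutoff = stats.chi2.ppf(1 - size_alpha, df=1)  # chi^2_{1-alpha}(r) with r = 1

print(f"-2 log lambda_n = {lrs:.3f}, cutoff = {cutoff:.3f}")
print("reject H0" if lrs > cutoff else "do not reject H0")
```

Since the data are generated under H0, repeating this simulation many times should reject in roughly 5% of runs, which is one way to check the χ² calibration empirically.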
C.) Bayes Tests

Let X1, . . . , Xn have joint pdf/pmf f(x|θ), θ ∈ Θ ⊂ R^p, and suppose we want to test H0 : θ ∈ Θ0 ⊂ R^p vs H1 : θ ∉ Θ0. Let

• π(θ) be a prior pdf
• P(θ ∈ Θ0 | x) = ∫_{Θ0} f_{θ|x}(θ) dθ ⇐ the posterior probability that θ ∈ Θ0
• P(θ ∉ Θ0 | x) = ∫_{Θ\Θ0} f_{θ|x}(θ) dθ ⇐ the posterior probability that θ ∉ Θ0
• Note that P(θ ∈ Θ0 | x) + P(θ ∉ Θ0 | x) = 1

Then, a Bayes test for testing H0 : θ ∈ Θ0 vs H1 : θ ∉ Θ0 is given by

    ϕ(x) = 1 if P(θ ∉ Θ0 | x) ≥ P(θ ∈ Θ0 | x), 0 otherwise
         = 1 if P(θ ∉ Θ0 | x) ≥ 1/2, 0 otherwise.

Discussion: The Bayes test follows from minimizing the Bayes risk BR_{ϕ1} of a simple test ϕ1(x) (i.e., a test where ϕ1(x) ∈ {0, 1} for any x):

• Consider the loss function L(θ, a) = I{θ ∈ Θ0} I{a = 1} + I{θ ∉ Θ0} I{a = 0}, where "a" may assume two values: a = 1 means "reject H0" and a = 0 means "don't reject H0". So, the loss is L(θ, a) = 0 for a correct decision and L(θ, a) = 1 for an incorrect decision:

    L(θ, a) = 0  if (θ ∈ Θ0 & a = 0) or (θ ∉ Θ0 & a = 1)
              1  otherwise

• The risk function of a simple test ϕ1(x) is

    R_{ϕ1}(θ) = E_θ L(θ, ϕ1(X)) = I{θ ∈ Θ0} P_θ(ϕ1(X) = 1) + I{θ ∉ Θ0} P_θ(ϕ1(X) = 0),

  where the first probability is the probability of a Type I error and the second is the probability of a Type II error.

• The Bayes risk of ϕ1(x) w.r.t. π(θ) is BR_{ϕ1} = E_π R_{ϕ1}(θ) = ∫_Θ R_{ϕ1}(θ) π(θ) dθ, and the Bayes test ϕ(x) minimizes BR_{ϕ1} over all simple tests ϕ1(x).

• Alternatively, we can find the Bayes test by minimizing the posterior risk of a simple test ϕ1(x) for each fixed x, where the posterior risk is

    E_{θ|x} L(θ, ϕ1(x)) = ∫_Θ ( I{θ ∈ Θ0} I{ϕ1(x) = 1} + I{θ ∉ Θ0} I{ϕ1(x) = 0} ) f_{θ|x}(θ) dθ
                        = I{ϕ1(x) = 1} ∫_{Θ0} f_{θ|x}(θ) dθ + I{ϕ1(x) = 0} ∫_{Θ\Θ0} f_{θ|x}(θ) dθ
                        = I{ϕ1(x) = 1} P(θ ∈ Θ0 | x) + I{ϕ1(x) = 0} P(θ ∉ Θ0 | x).

  For each fixed x, we choose the value ϕ1(x) = 1 or 0 to minimize the posterior risk; that is, for each fixed x, we should pick ϕ1(x) = 1 if P(θ ∉ Θ0 | x) ≥ P(θ ∈ Θ0 | x) and pick ϕ1(x) = 0 if P(θ ∉ Θ0 | x) < P(θ ∈ Θ0 | x). Note this is the same decision rule as the Bayes test ϕ(x) above.

Example: Let X1, . . . , Xn be iid N(θ, 1), θ ∈ R. Find the Bayes test for H0 : θ ≤ θ0 vs H1 : θ > θ0 under the N(µ, τ²) prior for θ, where µ, τ², θ0 are fixed.
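To make the final example concrete, here is a minimal Python sketch (not part of the original notes) of the resulting Bayes test. The conjugate normal-normal update gives a normal posterior for θ with mean (n x̄ + µ/τ²)/(n + 1/τ²) and variance 1/(n + 1/τ²), so P(θ ≤ θ0 | x) is a normal cdf evaluation; the constants µ, τ², θ0, n and the true θ used to simulate data are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, tau2, theta0, n = 0.0, 1.0, 0.5, 25     # hypothetical fixed constants
x = rng.normal(loc=0.8, scale=1.0, size=n)  # simulated data (hypothetical true theta = 0.8)

# Conjugate normal-normal update: posterior precision = n + 1/tau^2
post_prec = n + 1.0 / tau2
post_mean = (n * x.mean() + mu / tau2) / post_prec
post_sd = np.sqrt(1.0 / post_prec)

# Posterior probabilities of H0: theta <= theta0 and H1: theta > theta0
p_H0 = stats.norm.cdf(theta0, loc=post_mean, scale=post_sd)
p_H1 = 1.0 - p_H0

# Bayes test: reject H0 (phi = 1) when P(theta not in Theta0 | x) >= 1/2
phi = int(p_H1 >= 0.5)
print(f"P(H0|x) = {p_H0:.4f}, P(H1|x) = {p_H1:.4f}, phi = {phi}")
```

Because the posterior is normal and hence symmetric about its mean, ϕ(x) = 1 exactly when the posterior mean (n x̄ + µ/τ²)/(n + 1/τ²) is at least θ0.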