Efficiency and Relative Efficiency of Tests. Chi

Efficiency and Relative Efficiency of Tests. Chi-Square Tests
Scientific Seminar “Asymptotic Statistics”
Olena Korzhevska

Table of Contents
• Relative Efficiency of Tests – Asymptotic Power Functions. Consistency. Asymptotic Relative Efficiency
• Efficiency of Tests – Asymptotic Representation Theorem. Testing Normal Means. Local Asymptotic Normality. One-Sample Location. Two-Sample Problems
• Chi-Square Tests – Quadratic Forms in Normal Vectors. Pearson Statistic. Testing Independence. Goodness-of-Fit Tests. Asymptotic Efficiency

14/02/2015 Olena Korzhevska. Asymptotic Statistics Seminar

1. Relative Efficiency of Tests

Asymptotic Power Functions
• The relative efficiency of two sequences of tests is the quotient of the numbers of observations needed with the two tests to obtain the same level and power.
• Testing problem: $H_0: \theta \in \Theta_0$ vs. $H_1: \theta \in \Theta_1$.
• The power function of a test that rejects $H_0$ if a test statistic $T_n$ falls into a critical region $K_n$ is $\theta \mapsto \pi_n(\theta) = P_\theta(T_n \in K_n)$.
• The test is of level $\alpha$ if its size $\sup\{\pi_n(\theta): \theta \in \Theta_0\}$ does not exceed $\alpha$.
• A sequence of tests is asymptotically of level $\alpha$ if $\limsup_{n\to\infty} \sup_{\theta \in \Theta_0} \pi_n(\theta) \le \alpha$.
• The test with power function $\pi_n$ is better than the test with power function $\pi_n'$ if both
  $\pi_n(\theta) \le \pi_n'(\theta)$ for $\theta \in \Theta_0$ and $\pi_n(\theta) \ge \pi_n'(\theta)$ for $\theta \in \Theta_1$.
• Aim: to compare tests asymptotically.
• Consider two sequences of tests with power functions $\pi_n$ and $\pi_n'$ (the tests within each sequence are of the same type).
• First idea: compare limiting power functions of the form $\pi(\theta) = \lim_{n\to\infty} \pi_n(\theta)$.
• Example (Sign test). $X_1, X_2, \dots, X_n$ are random variables from a distribution with unique median $\theta$.
- Test: $H_0: \theta = 0$ vs.
$H_1: \theta > 0$.
- Test statistic: $S_n = n^{-1} \sum_{i=1}^{n} 1\{X_i > 0\}$.
- The observations have distribution function $F(x - \theta)$, so $S_n$ has mean $\mu(\theta)$ and variance $\sigma^2(\theta)/n$, where
  $\mu(\theta) = 1 - F(-\theta)$, $\sigma^2(\theta) = (1 - F(-\theta))\,F(-\theta)$, $\mu(0) = 1/2$, $\sigma^2(0) = 1/4$.
- $\sqrt{n}\,(S_n - \mu(\theta)) \rightsquigarrow N(0, \sigma^2(\theta))$ asymptotically.
- Under $H_0$: $\sqrt{n}\,(S_n - 1/2) \rightsquigarrow N(0, 1/4)$.
- The test that rejects $H_0$ if $\sqrt{n}\,(S_n - 1/2) > z_\alpha/2$ has power function
  $\pi_n(\theta) = P_\theta\left(\sqrt{n}\,(S_n - \mu(\theta)) > \frac{z_\alpha}{2} - \sqrt{n}\,(\mu(\theta) - \mu(0))\right) = 1 - \Phi\left(\frac{z_\alpha/2 - \sqrt{n}\,(F(0) - F(-\theta))}{\sigma(\theta)}\right) + o(1)$.
- Since $F(0) - F(-\theta) > 0$ for every $\theta > 0$, it follows that for $\alpha = \alpha_n \to 0$ sufficiently slowly
  $\pi_n(\theta) \to 0$ if $\theta = 0$ and $\pi_n(\theta) \to 1$ if $\theta > 0$.
- In this case the limit power function corresponds to the perfect test with all error probabilities equal to zero.

How do we compare tests?
We need to make the problem of discriminating between the null and the alternative hypotheses more difficult as $n$ increases. It is natural to consider a shrinking alternative that converges to the null. To test:
$H_0: \theta = 0$ vs. $H_1: \theta = \theta_n > 0$, with $\theta_n \to 0$.
Example (Sign test, continued): worked on the board.
In this situation a reasonable method for asymptotic comparison of two sequences of tests is to consider local limiting power functions:
$\pi(h) = \lim_{n\to\infty} \pi_n(h/\sqrt{n})$, $h \ge 0$.

Theorem: Suppose that $T_n$, $\mu$, and $\sigma$ are such that, for all $h$ and $\theta_n = h/\sqrt{n}$,
$\frac{\sqrt{n}\,(T_n - \mu(\theta_n))}{\sigma(\theta_n)} \rightsquigarrow N(0, 1)$ under $\theta_n$,
where $\mu$ is differentiable at 0 and $\sigma$ is continuous at 0. Then the tests that reject $H_0: \theta = 0$ for large values of $T_n$ and are asymptotically of level $\alpha$ satisfy, for all $h$,
$\pi_n(h/\sqrt{n}) \to 1 - \Phi\left(z_\alpha - h\,\frac{\mu'(0)}{\sigma(0)}\right)$.
Proof: Substituting $h = 0$ shows that the asymptotic level of the test is $\alpha$ iff $H_0: \theta = 0$ is rejected for $\sqrt{n}\,(T_n - \mu(0))/\sigma(0) > z_\alpha$. Thus,
$\pi_n(\theta_n) = P_{\theta_n}\left(\sqrt{n}\,(T_n - \mu(0)) > \sigma(0)\,z_\alpha\right) = P_{\theta_n}\left(\frac{\sqrt{n}\,(T_n - \mu(\theta_n))}{\sigma(\theta_n)} > \frac{\sigma(0)\,z_\alpha - \sqrt{n}\,(\mu(\theta_n) - \mu(0))}{\sigma(\theta_n)}\right) \to 1 - \Phi\left(z_\alpha - h\,\frac{\mu'(0)}{\sigma(0)}\right)$,
because $\sqrt{n}\,(\mu(\theta_n) - \mu(0)) \to h\,\mu'(0)$ and $\sigma(\theta_n) \to \sigma(0)$.

• $\mu'(0)/\sigma(0)$ is the slope of the sequence of tests.
• Example (Sign test): The sign test has slope $\mu'(0)/\sigma(0) = 2f(0)$.
• Example (t-test): $T_n = \bar{X}_n/S_n$, with $\sqrt{n}\,(\bar{X}_n - \theta)/S_n \rightsquigarrow N(0, 1)$ under $\theta$. Reject $H_0$ if $\sqrt{n}\,T_n > z_\alpha$. With $\mu(\theta) = \theta/\sigma$ and $\sigma(\theta) = 1$,
  $\sqrt{n}\left(\frac{\bar{X}_n}{S_n} - \mu(h/\sqrt{n})\right) = \frac{\sqrt{n}\,(\bar{X}_n - h/\sqrt{n})}{S_n} + h\left(\frac{1}{S_n} - \frac{1}{\sigma}\right) \rightsquigarrow N(0, 1)$ under $\theta_n = h/\sqrt{n}$,
  so the slope is $\mu'(0)/\sigma(0) = 1/\sigma$.

Example (Sign test vs. t-test):
• $X_1, X_2, \dots, X_n$ is a random sample from a density $f(x - \theta)$, where $f$ is symmetric about 0, has a unique median and a finite second moment.
• Test: $H_0: \theta = 0$, i.e. the observations are symmetrically distributed about 0. Compare the performance of the sign test and the t-test.
• It suffices to compare the slopes of the two tests: $2f(0)$ and $1/\sigma$, respectively.
• For $N(0, 1)$ the slopes are $\sqrt{2/\pi}$ and 1.

Relative efficiency of the sign test versus the t-test for some distributions.
Distribution    Efficiency (sign test / t-test)
Logistic        $\pi^2/12$
Normal          $2/\pi$
Laplace         $2$
Uniform         $1/3$

Consistency
• Definition: A sequence of tests with power functions $\theta \mapsto \pi_n(\theta)$ is asymptotically consistent at level $\alpha$ against the alternative $\theta$ if it is asymptotically of level $\alpha$ and $\pi_n(\theta) \to 1$.
• If a family of sequences of tests contains for every level $\alpha$ a sequence that is consistent against every alternative, then the corresponding tests are simply called consistent.

Lemma 1: Let $T_n$ be a sequence of statistics such that $T_n \to \mu(\theta)$ in $P_\theta$-probability for every $\theta$.
Then the family of tests that reject the null hypothesis $H_0: \theta = 0$ for large values of $T_n$ is consistent against every $\theta$ such that $\mu(\theta) > \mu(0)$.

Lemma 2: Suppose that $T_n$, $\mu$, and $\sigma$ are such that, for all $h$ and $\theta_n = h/\sqrt{n}$,
$\frac{\sqrt{n}\,(T_n - \mu(\theta_n))}{\sigma(\theta_n)} \rightsquigarrow N(0, 1)$ under $\theta_n$,
with $\mu'(0) > 0$, $\sigma$ continuous at 0 and $\sigma(0) > 0$. Suppose that the tests that reject $H_0$ for large values of $T_n$ have nondecreasing power functions $\theta \mapsto \pi_n(\theta)$. Then this family of tests is consistent against every alternative $\theta > 0$. Moreover, if $\pi_n(0) \to \alpha$, then $\pi_n(\theta_n) \to \alpha$ when $\sqrt{n}\,\theta_n \to 0$, and $\pi_n(\theta_n) \to 1$ when $\sqrt{n}\,\theta_n \to \infty$.

• Example (t-test): The two-sample t-statistic $(\bar{Y}_n - \bar{X}_n)/S$ converges in probability to $E(Y - X)/\sigma$, where $\sigma^2 = \lim_{n\to\infty} \mathrm{var}(\bar{Y}_n - \bar{X}_n)$. If the null hypothesis postulates that $EY = EX$, then the test that rejects the null hypothesis for large values of the t-statistic is consistent against every alternative for which $EY > EX$.

Asymptotic Relative Efficiency
• Sequences of tests can be ranked in quality by comparing their asymptotic power functions.
• For the test statistics we have seen so far, this comparison involves only the “slopes” of the tests.
• The concept of relative efficiency yields a method to quantify the interpretation of the slopes.
• Sequence of testing problems: $H_0: \theta = 0$ vs. $H_1: \theta = \theta_\nu$.
• Requirement: the tests need to attain asymptotic level $\alpha$ and power $\gamma \in (\alpha, 1)$.
• $\pi_n$ is the power function of a test when $n$ observations are available; $n_\nu$ is the minimal number of observations such that both $\pi_{n_\nu}(0) \le \alpha$ and $\pi_{n_\nu}(\theta_\nu) \ge \gamma$.
• The limit (if it exists) $\lim_{\nu\to\infty} n_{\nu,2}/n_{\nu,1}$ is called the (asymptotic) relative efficiency or Pitman efficiency of the first sequence of tests with respect to the second one.
• A relative efficiency larger than 1 indicates that fewer observations are needed with the first sequence of tests, which may then be considered the better one.

Theorem: Consider statistical models $(P_{n,\theta}: \theta \ge 0)$ such that $\|P_{n,\theta} - P_{n,0}\| \to 0$ as $\theta \to 0$, for every $n$. Let $T_{n,1}$ and $T_{n,2}$ be sequences of statistics such that
$\frac{\sqrt{n}\,(T_{n,i} - \mu_i(\theta_n))}{\sigma_i(\theta_n)} \rightsquigarrow N(0, 1)$ under $\theta_n$, for every sequence $\theta_n \downarrow 0$,
with functions $\mu_i$ differentiable at 0, $\mu_i'(0) > 0$, and $\sigma_i$ continuous at 0, $\sigma_i(0) > 0$, $i \in \{1, 2\}$. Then the relative efficiency of the tests that reject $H_0: \theta = 0$ for large values of $T_{n,i}$ is equal to
$\left(\frac{\mu_1'(0)/\sigma_1(0)}{\mu_2'(0)/\sigma_2(0)}\right)^2$
for every sequence of alternatives $\theta_\nu \downarrow 0$, independently of $\alpha > 0$ and $\gamma \in (\alpha, 1)$. If the power functions of the tests based on $T_{n,i}$ are nondecreasing for every $n$, then the assumption of asymptotic normality of $T_{n,i}$ can be relaxed to asymptotic normality under every sequence $\theta_n = O(1/\sqrt{n})$ only.

2. Efficiency of Tests

Asymptotic Representation Theorem
• A randomized test (test function) $\phi$ in an experiment $(\mathcal{X}, \mathcal{A}, P_h: h \in H)$ is a measurable map $\phi: \mathcal{X} \to [0, 1]$ on the sample space.
• The power function of a test $\phi$ is the function $h \mapsto \pi(h) = E_h \phi(X)$.
Theorem: Let the sequence of experiments $\mathcal{E}_n = (P_{n,h}: h \in H)$ converge to a dominated experiment $\mathcal{E} = (P_h: h \in H)$. Suppose that a sequence of power functions $\pi_n$ of tests in $\mathcal{E}_n$ converges pointwise, i.e., $\pi_n(h) \to \pi(h)$ for every $h$ and some arbitrary function $\pi$. Then $\pi$ is a power function in the limit experiment, i.e., there exists a test $\phi$ in $\mathcal{E}$ with $\pi(h) = E_h \phi(X)$ for every $h$.

Testing Normal Means
• Suppose $X$ is $N_k(h, \Sigma)$-distributed, with $\Sigma$ known and $h$ unknown.
• Test: $H_0: c^T h = 0$ vs. $H_1: c^T h > 0$, for a known vector $c$ with $c^T \Sigma c > 0$.
Proposition: The test that rejects $H_0$ if $c^T X > z_\alpha \sqrt{c^T \Sigma c}$ is uniformly most powerful at level $\alpha$ for testing $H_0: c^T h = 0$ vs. $H_1: c^T h > 0$, based on $X$.
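
Because $c^T X$ is $N(c^T h, c^T \Sigma c)$-distributed, the test in the Proposition has power $1 - \Phi\left(z_\alpha - c^T h/\sqrt{c^T \Sigma c}\right)$ at the mean vector $h$. The following Python lines are a minimal sketch of that formula, not part of the original slides; the function names `quad_form` and `ump_power` are hypothetical, and vectors and matrices are assumed to be plain nested lists.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def quad_form(c, sigma):
    """Return c^T Sigma c for a vector c and a square matrix Sigma (nested lists)."""
    k = len(c)
    return sum(c[i] * sigma[i][j] * c[j] for i in range(k) for j in range(k))


def ump_power(h, c, sigma, alpha=0.05):
    """Power at mean vector h of the test rejecting when c^T X > z_alpha * sqrt(c^T Sigma c)."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)  # upper alpha-quantile of N(0, 1)
    shift = sum(ci * hi for ci, hi in zip(c, h)) / sqrt(quad_form(c, sigma))
    return 1.0 - STD_NORMAL.cdf(z_alpha - shift)


# Illustrative inputs (assumptions): identity covariance, test on the first coordinate.
identity = [[1.0, 0.0], [0.0, 1.0]]
power_at_null = ump_power([0.0, 0.0], [1.0, 0.0], identity)
power_at_alt = ump_power([2.0, 0.0], [1.0, 0.0], identity)
```

At $h$ with $c^T h = 0$ the power equals the level $\alpha$, and it increases with $c^T h$, which is the behaviour the local power bounds in the next slides rely on.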
Local Asymptotic Normality
• If the model $(P_\theta: \theta \in \Theta)$ is differentiable in quadratic mean, then the local experiments converge to the Gaussian experiment (recall the last talk yesterday):
  $\left(P^n_{\theta_0 + h/\sqrt{n}}: h \in \mathbb{R}^k\right) \to \left(N(h, I_{\theta_0}^{-1}): h \in \mathbb{R}^k\right)$.
• The sequence of power functions $\theta \mapsto \pi_n(\theta)$ in the original experiments induces the sequence of power functions $h \mapsto \pi_n(\theta_0 + h/\sqrt{n})$ in the local experiments. Suppose $\pi_n(\theta_0 + h/\sqrt{n}) \to \pi(h)$ for every $h$ and some function $\pi$. Then, by the asymptotic representation theorem, this limit $\pi$ is a power function in the Gaussian limit experiment.
• Suppose $\theta$ is real and $\pi_n$ is of asymptotic level $\alpha$ for testing $H_0: \theta \le \theta_0$ vs. $H_1: \theta > \theta_0$. Then $\pi(0) = \lim_{n\to\infty} \pi_n(\theta_0) \le \alpha$, and hence $\pi$ corresponds to a level $\alpha$ test for $H_0: h = 0$ vs. $H_1: h > 0$ in the limit experiment.
• By the Proposition for testing normal means (with $c = 1$, $\Sigma = I_{\theta_0}^{-1}$), $\pi$ must be bounded by the power function of the uniformly most powerful level $\alpha$ test in the limit experiment. Thus, for every $h > 0$,
  $\lim_{n\to\infty} \pi_n\left(\theta_0 + \frac{h}{\sqrt{n}}\right) \le 1 - \Phi\left(z_\alpha - h\sqrt{I_{\theta_0}}\right)$.
• As stated earlier, a sequence of power functions with $\pi_n(\theta_0 + h/\sqrt{n}) \to 1 - \Phi(z_\alpha - hs)$ for every $h$ has slope $s$. From the upper bound, $\sqrt{I_{\theta_0}}$ is the largest possible slope.
• The relative efficiency of the best test and the test with slope $s$ is $I_{\theta_0}/s^2$, which can be interpreted as the number of observations needed with the given sequence of tests with slope $s$ divided by the number of observations needed with the best test to obtain the same power.

Theorem 15.4: Let $\Theta \subset \mathbb{R}^k$ be open and let $\psi: \Theta \to \mathbb{R}$ be differentiable at $\theta_0$, with nonzero gradient $\dot\psi_{\theta_0}$ and such that $\psi(\theta_0) = 0$. Let $(P_{n,\theta}: \theta \in \Theta)$ be locally asymptotically normal at $\theta_0$ with nonsingular Fisher information $I_{\theta_0}$, for constants $r_n \to \infty$.
Then the power functions $\theta \mapsto \pi_n(\theta)$ of any sequence of level $\alpha$ tests for testing $H_0: \psi(\theta) \le 0$ vs. $H_1: \psi(\theta) > 0$ satisfy, for every $h$ such that $\dot\psi_{\theta_0} h > 0$,
$\limsup_{n\to\infty} \pi_n\left(\theta_0 + \frac{h}{r_n}\right) \le 1 - \Phi\left(z_\alpha - \frac{\dot\psi_{\theta_0} h}{\sqrt{\dot\psi_{\theta_0} I_{\theta_0}^{-1} \dot\psi_{\theta_0}^T}}\right)$.

Addendum: Let $T_n$ be statistics such that
$T_n = \frac{\dot\psi_{\theta_0} I_{\theta_0}^{-1} \Delta_{n,\theta_0}}{\sqrt{\dot\psi_{\theta_0} I_{\theta_0}^{-1} \dot\psi_{\theta_0}^T}} + o_{P_{n,\theta_0}}(1)$.
Then the sequence of tests that reject $H_0$ for values $T_n > z_\alpha$ is asymptotically optimal in the sense that, for every $h$,
$P_{\theta_0 + r_n^{-1} h}(T_n \ge z_\alpha) \to 1 - \Phi\left(z_\alpha - \frac{\dot\psi_{\theta_0} h}{\sqrt{\dot\psi_{\theta_0} I_{\theta_0}^{-1} \dot\psi_{\theta_0}^T}}\right)$.
*($\Delta_{n,\theta_0}$ is a sequence of statistics that converges in distribution under $\theta_0$ to a normal $N_k(0, I_{\theta_0})$-distribution.)

• The point $\theta_0$ in the theorem is on the boundary of $H_0$ and $H_1$.
• If the dimension $k > 1$, then this boundary is $(k-1)$-dimensional, and there are many possible values for $\theta_0$.
• If the dimension $k = 1$, the boundary point $\theta_0$ is typically unique and hence known, and we could use $T_n = I_{\theta_0}^{-1/2} \Delta_{n,\theta_0}$ to construct an optimal sequence of tests for the problem $H_0: \theta = \theta_0$. These are known as score tests.

One-Sample Location
• $X_1, X_2, \dots, X_n$ is a sample from a density $f(x - \theta)$, where $f$ is symmetric about 0 and has finite Fisher information $I_f$; $f$ may be known or (partially) unknown.
• To test: $H_0: \theta = 0$ vs. $H_1: \theta > 0$.
• For fixed $f$, the sequence of experiments $\left(\prod_{i=1}^{n} f(x_i - \theta): \theta \in \mathbb{R}\right)$ is locally asymptotically normal at $\theta = 0$ with $\Delta_{n,0} = -n^{-1/2} \sum_{i=1}^{n} (f'/f)(X_i)$, norming rate $\sqrt{n}$, and Fisher information $I_f$.
• From the preceding sections, the best asymptotic level $\alpha$ power function for known $f$ is $1 - \Phi\left(z_\alpha - h\sqrt{I_f}\right)$.
• $T_n = -\frac{1}{\sqrt{n}} \frac{1}{\sqrt{I_f}} \sum_{i=1}^{n} \frac{f'}{f}(X_i) + o_{P_0}(1)$.
• Then, according to Theorem 15.4, the sequence of tests that reject $H_0$ if $T_n > z_\alpha$ attains the bound and hence is asymptotically optimal.
Example (t-test): The standard normal density $f_0$ possesses score function $(f_0'/f_0)(x) = -x$ and Fisher information $I_{f_0} = 1$. Consequently, if the underlying distribution is normal, then the optimal test statistics should satisfy
$T_n = \sqrt{n}\,\bar{X}_n/\sigma + o_{P_0}(n^{-1/2})$.
The t-statistics $\bar{X}_n/S_n$ fulfill this requirement. That is the case because for normally distributed observations the t-test is uniformly most powerful for every finite $n$ and hence is certainly asymptotically optimal.

In this example, the t-statistic simply replaces the unknown standard deviation $\sigma$ by an estimate $S_n$. This approach can be followed for most scale families. Under some regularity conditions, the statistics
$T_n = -\frac{1}{\sqrt{n}} \frac{1}{\sqrt{I_{f_0}}} \sum_{i=1}^{n} \frac{f_0'}{f_0}\left(\frac{X_i}{\hat\sigma_n}\right)$
should yield asymptotically optimal tests, given a consistent sequence of scale estimators $\hat\sigma_n$.

3. Chi-Square Tests

Quadratic Forms in Normal Vectors
• $\chi^2_k$ is defined as the distribution of $\sum_{i=1}^{k} Z_i^2$ for i.i.d. $N(0, 1)$-distributed $Z_1, Z_2, \dots, Z_k$.
• $\sum_{i=1}^{k} Z_i^2 = \|Z\|^2$ is the squared norm of the standard normal vector $Z = (Z_1, \dots, Z_k)$.
Lemma: If the vector $X$ is $N_k(0, \Sigma)$-distributed, then $\|X\|^2$ is distributed as $\sum_{i=1}^{k} \lambda_i Z_i^2$ for i.i.d. $N(0, 1)$-distributed $Z_1, \dots, Z_k$ and $\lambda_1, \dots, \lambda_k$ the eigenvalues of $\Sigma$.
Proof: There exists an orthogonal matrix $O$ such that $O \Sigma O^T = \mathrm{diag}(\lambda_i)$. Then the vector $OX$ is $N_k(0, \mathrm{diag}(\lambda_i))$-distributed, which is the same as the distribution of the vector $(\sqrt{\lambda_1} Z_1, \dots, \sqrt{\lambda_k} Z_k)$. Now $\|X\|^2 = \|OX\|^2$ has the same distribution as $\sum_{i=1}^{k} (\sqrt{\lambda_i} Z_i)^2 = \sum_{i=1}^{k} \lambda_i Z_i^2$.

Pearson Statistic
• Suppose we observe $X_n = (X_{n,1}, \dots, X_{n,k})$ with a multinomial distribution corresponding to $n$ trials and $k$ classes having probabilities $p = (p_1, \dots, p_k)$.
• The Pearson statistic for testing $H_0: p = a$ is given by
  $C_n(a) = \sum_{i=1}^{k} \frac{(X_{n,i} - n a_i)^2}{n a_i}$.
Theorem: If the vector $X_n$ is multinomially distributed with parameters $n$ and $a = (a_1, \dots, a_k) > 0$, then the sequence $C_n(a)$ converges in distribution to the $\chi^2_{k-1}$-distribution under $a$.
• The Pearson statistic is oddly asymmetric in the observed and the true frequencies (which is motivated by the form of the asymptotic covariance matrix).
• One method to symmetrize the statistic leads to the Hellinger statistic
  $H_n^2(a) = 4 \sum_{i=1}^{k} \frac{(X_{n,i} - n a_i)^2}{\left(\sqrt{X_{n,i}} + \sqrt{n a_i}\right)^2} = 4 \sum_{i=1}^{k} \left(\sqrt{X_{n,i}} - \sqrt{n a_i}\right)^2$.
• Up to a multiplicative constant, this is the Hellinger distance between the discrete probability distributions on $\{1, \dots, k\}$ with probability vectors $a$ and $X_n/n$, respectively.
• As $X_n/n - a \to 0$ in probability, $H_n^2$ is asymptotically equivalent to $C_n$.

Testing Independence
• Suppose that each element of a population can be classified by two characteristics, having $k$ and $r$ levels, respectively:
  $\begin{array}{ccc|c} N_{11} & \cdots & N_{1r} & N_{1\cdot} \\ \vdots & \ddots & \vdots & \vdots \\ N_{k1} & \cdots & N_{kr} & N_{k\cdot} \\ \hline N_{\cdot 1} & \cdots & N_{\cdot r} & N \end{array}$
• The classification of a random sample of size $n$ from the population gives a matrix $X_{n,ij}$ that is multinomially distributed with parameters $n$ and probabilities $p_{ij} = N_{ij}/N$.
• $H_0: p_{ij} = a_i b_j$ (the categories are independent) for unknown probability vectors $a$ and $b$.
• The ML estimators of $a$ and $b$ under $H_0$: $\hat{a}_i = X_{n,i\cdot}/n$ and $\hat{b}_j = X_{n,\cdot j}/n$.
• Modified Pearson statistic with these estimators:
  $C_n(\hat{a}_n \otimes \hat{b}_n) = \sum_{i=1}^{k} \sum_{j=1}^{r} \frac{(X_{n,ij} - n \hat{a}_i \hat{b}_j)^2}{n \hat{a}_i \hat{b}_j}$.
Corollary: If the $(k \times r)$ matrices $X_n$ are multinomially distributed with parameters $n$ and $p_{ij} = a_i b_j > 0$, then the sequence $C_n(\hat{a}_n \otimes \hat{b}_n)$ converges in distribution to the $\chi^2_{(k-1)(r-1)}$-distribution.

Example: Google wants to test the performance of new search algorithms.
Google might test three algorithms using a sample of 10,000 google.com search queries.

Search algorithm    current    test 1    test 2    Total
No new search          3511      1749      1818     7078
New search             1489       751       682     2922
Total                  5000      2500      2500    10000

• To test: $H_0$: the algorithms each perform equally well. $H_1$: the algorithms do not perform equally well.
• ML estimators for $a$ and $b$: $\hat{a}_i = X_{n,i\cdot}/n$, $\hat{b}_j = X_{n,\cdot j}/n$.
• $n \hat{a}_i \hat{b}_j = X_{n,i\cdot} \cdot X_{n,\cdot j}/n$ is the expected count of cell $(i, j)$, shown in parentheses:

Search algorithm    current        test 1           test 2           Total
No new search       3511 (3539)    1749 (1769.5)    1818 (1769.5)     7078
New search          1489 (1461)     751 (730.5)      682 (730.5)      2922
Total               5000           2500             2500             10000

• $C_n(\hat{a}_n \otimes \hat{b}_n) = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} = 6.120$.
• $df = (k - 1)(r - 1) = (2 - 1)(3 - 1) = 2$.
• p-value $= 0.047$, thus we reject $H_0$ at significance level $\alpha = 0.05$. That is, the data provide convincing evidence that there is some difference in performance among the algorithms.

Goodness-of-Fit Tests
• Given a random sample $X_1, X_2, \dots, X_n$ from a distribution $P$, we want to test $H_0: P \in \mathcal{P}_0$.
• Testing goodness-of-fit typically focuses on no particular alternative, which is why $\chi^2$ statistics are reasonable.
• Partition $\mathcal{X} = \cup_j \mathcal{X}_j$ of the sample space into finitely many sets.
• $\mathbb{P}_n(A) = n^{-1} \#(1 \le i \le n: X_i \in A)$ is the fraction of observations in $A$.
• The vector $n(\mathbb{P}_n(\mathcal{X}_1), \dots, \mathbb{P}_n(\mathcal{X}_k))$ is multinomially distributed, and the modified chi-square statistic is given by
  $\sum_{i=1}^{k} \frac{n\left(\mathbb{P}_n(\mathcal{X}_i) - \hat{P}(\mathcal{X}_i)\right)^2}{\hat{P}(\mathcal{X}_i)}$,
  with $\hat{P}$ an estimate of $P$ under $H_0$.

Asymptotic Efficiency
• The asymptotic null distributions of various versions of the Pearson statistic enable us to set critical values but by themselves do not give information on the asymptotic power of the tests.
• The asymptotic power can be measured in various ways:
– The most important method is to consider local limiting power functions (discussed earlier).
– A second method to evaluate the asymptotic power is by Bahadur efficiencies.

Thank you for your attention.
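
The numbers in the search-algorithm example under Testing Independence can be rechecked with a few lines of Python. This is a minimal sketch rather than part of the original slides: the function name `pearson_independence` is hypothetical, only the standard library is used, and the p-value line relies on the fact that the $\chi^2_2$ survival function is $e^{-x/2}$, so it is valid for 2 degrees of freedom only.

```python
from math import exp

# Observed counts from the search-algorithm example:
# rows = (no new search, new search), columns = (current, test 1, test 2).
observed = [[3511, 1749, 1818],
            [1489, 751, 682]]


def pearson_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for a k x r table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Expected count of cell (i, j) under independence: row total * column total / n.
    expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


stat, df, expected = pearson_independence(observed)
# Closed-form chi-square tail probability, valid only because df == 2 here.
p_value = exp(-stat / 2)
print(round(stat, 3), df, round(p_value, 3))
```

Up to rounding this reproduces the statistic 6.120, the 2 degrees of freedom and the p-value 0.047 quoted in the example. For other degrees of freedom a general chi-square tail function would be needed in place of the closed form.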

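The table of relative efficiencies of the sign test versus the t-test follows from squaring the ratio of the two slopes, $(2f(0))^2/(1/\sigma)^2 = 4 f(0)^2 \sigma^2$. The Python lines below are a small sketch of that arithmetic, not part of the original slides; the parametrization chosen for each family (unit-scale logistic, standard normal, unit-scale Laplace, uniform on $[-1/2, 1/2]$) is an assumption, which is harmless because the efficiency does not change under rescaling.

```python
from math import pi, sqrt

# (density at the median f(0), variance sigma^2) for one standard form of each family.
families = {
    "Logistic": (1 / 4, pi ** 2 / 3),     # f(x) = e^{-x} / (1 + e^{-x})^2
    "Normal": (1 / sqrt(2 * pi), 1.0),    # standard normal N(0, 1)
    "Laplace": (1 / 2, 2.0),              # f(x) = exp(-|x|) / 2
    "Uniform": (1.0, 1 / 12),             # uniform on [-1/2, 1/2]
}


def sign_vs_t_efficiency(f0, variance):
    """Squared ratio of slopes: (2 f(0))^2 / (1 / sigma)^2 = 4 f(0)^2 sigma^2."""
    return 4 * f0 ** 2 * variance


efficiencies = {name: sign_vs_t_efficiency(f0, var)
                for name, (f0, var) in families.items()}
```

The resulting values agree with the table: $\pi^2/12$ for the logistic, $2/\pi$ for the normal, 2 for the Laplace and 1/3 for the uniform distribution.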