Chi-Square Test and Goodness-of-Fit Testing
Ming-Tsung Hsu
OPLab@im.ntu.edu.tw

Outline
  Goal of Hypothesis Test
  Terms & Notation
  Chi-Square Test
  Goodness-of-Fit Testing
  Example

Goal of Hypothesis Test
  To examine statistical evidence and determine whether it supports or contradicts a claim
    The life of lamps is more than 10,000 hours
    The data are from a normal distribution
  To reduce the directly relevant data to a “level of suspicion” based purely on the data

Terms & Notation
  Null Hypothesis (H0) vs. Alternative Hypothesis (H1 or HA)
  Parametric Test vs. Non-Parametric Test
  Significance Level (α) and Critical Region
  “Reject H0” vs. “Do not reject H0”
  Central Limit Theorem
  Type I Error vs. Type II Error
  Sampling distribution of the sample mean
  Test Statistic vs. Table Value
  P-value

Null Hypothesis vs. Alternative Hypothesis
  $H_0: \mu = \mu_0$    $H_1: \mu \neq \mu_0$
  $H_0$: Data are from a normal distribution    $H_1$: Not $H_0$

Type I Error vs. Type II Error
  Type I error
    H0 is true, but H0 is rejected
    Pr(reject H0 | H0) = α
  Type II error
    H1 is true, but H0 is not rejected
    Pr(do not reject H0 | H1) = β

Parametric Test vs. Non-Parametric Test
  Parametric Test
    Tests about parameters of the population
    Mean test, variance test, etc.
  Non-Parametric Test
    Makes no assumptions about the frequency distributions of the variables being assessed
    Independence test, distribution test, etc.

Significance Level (α) and Critical Region
  [Figure: significance level (α) and critical region]

Central Limit Theorem
  Central Limit Theorem (CLT): If $X_1, \ldots, X_n$ is a random sample from a distribution with mean $\mu$ and variance $\sigma^2 < \infty$, then the limiting distribution of
    $Z_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n}\,\sigma}$
  is the standard normal: $Z_n \xrightarrow{d} Z \sim N(0, 1)$ as $n \to \infty$.

Test Statistic vs. Table Value
  T.S.: $Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
  T.V.: $Z_{1-\alpha}$ (one-sided), $Z_{1-\alpha/2}$ (two-sided)
  $Z_{0.95} = 1.645$, $Z_{0.975} = 1.96$, $Z_{0.99} = 2.326$
  Decision rule: $|Z| > \text{T.V.} \Rightarrow$
  Reject H0

P-value
  $p\text{-value} = P(X \ge x \mid \mu)$
  Decision rule: $p\text{-value} < \alpha$ (one-sided) or $p\text{-value} < \alpha/2$ (two-sided) $\Rightarrow$ reject H0

Chi-Square Test
  Non-Parametric Test
  Goodness-of-Fit Test
    T.S. ~ $\chi^2(\nu)$
    Also known as “Pearson's chi-square test”
  Independence Test
  Homogeneity Test

Goodness-of-Fit Testing
  Used to test whether a sample of data came from a population with a specific distribution
  T.S.: $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2(k - 1 - m)$
    $O_i$: observed frequency of the ith group
    $E_i$: expected frequency of the ith group
    $k$: number of groups
    $m$: number of estimated parameters
    $k - 1 - m$: degrees of freedom

Example

Parameter Estimation - λ
  The MLE of λ: $\hat{\lambda} = \frac{1}{\bar{t}} = \frac{1}{4.06} = 0.246$

Observations and Expected Frequencies
  Interval     Obs    t      F(t) = P(T < t)   C.F.       Expected Frequency
  0 ~ < 1      14     1      0.218078          12.86659   12.86659
  1 ~ < 2.5    12     2.5    0.459359          27.10219   14.2356
  2.5 ~ < 5    18     5      0.707707          41.75474   14.65255
  5 ~ < 7.5    5      7.5    0.841975          49.67651   7.921768
  7.5 ~ < 10   5      10     0.914565          53.95934   4.282832
  ≧10          5 ?!   ≧10    1                 59         5.040662

Test Statistic and P-value
  T.S.: $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = 2.4137$
  p-value: $P(\chi^2 \ge 2.4137 \mid \chi^2(6 - 1 - 1)) = 0.66$

Observations and Expected Frequencies
  Paper: 18  12.87  14.24  14.65  7.92  4.28  5.04
  T.S.: $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = 2.043$
  p-value: $P(\chi^2 \ge 2.043 \mid \chi^2(6 - 1 - 1)) = 0.72785$

Re-Grouping
  # of groups = 1 + 3.322 * log10(n), n = 59

  ID   lower   upper   Freq.
  1    0.3     3.3     30
  2    3.3     6.3     17
  3    6.3     9.3     6
  4    9.3     12.3    3
  5    12.3    15.3    1
  6    15.3    18.3    1
  7    18.3    21.3    1

  Obs   t      F(x)    C.F.     E.F.
  30    3.3    0.556   32.812   32.813
  17    6.3    0.788   46.486   13.673
  6     9.3    0.899   53.020   6.534
  6     ≧9.3   1       -        5.980

  T.S.: $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = 1.0942$
  p-value: $P(\chi^2 \ge 1.0942 \mid \chi^2(4 - 1 - 1)) = 0.579$
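
The exponential goodness-of-fit example can be checked numerically. The following is a minimal sketch, not the author's own computation. It assumes the fitted distribution is exponential with the rounded estimate λ̂ = 0.246, that the first interval starts at 0, and that the last interval is open-ended. All function names are hypothetical. The p-value uses the closed-form chi-square tail probability, which is valid only for even degrees of freedom (4 and 2 here), so that only the standard library is needed.

```python
import math


def exp_cdf(t, lam):
    """CDF of the exponential distribution: F(t) = 1 - exp(-lam * t)."""
    return 1.0 - math.exp(-lam * t)


def expected_frequencies(upper_bounds, lam, n):
    """Expected counts for [0, b1), [b1, b2), ..., [b_last, infinity)."""
    cdf = [0.0] + [exp_cdf(b, lam) for b in upper_bounds] + [1.0]
    return [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


def chi_square_statistic(observed, expected):
    """Pearson statistic: sum over groups of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_even_df(x, df):
    """P(chi2 >= x) for EVEN df only: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2.0
    terms = (half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * sum(terms)


def goodness_of_fit(observed, upper_bounds, lam, m=1):
    """Return (statistic, degrees of freedom, p-value); m = estimated parameters."""
    n = sum(observed)
    expected = expected_frequencies(upper_bounds, lam, n)
    stat = chi_square_statistic(observed, expected)
    df = len(observed) - 1 - m
    return stat, df, chi2_sf_even_df(stat, df)


LAM = 0.246  # rounded MLE from the slides (assumption: used as-is)

# Original six groups: 0~<1, 1~<2.5, 2.5~<5, 5~<7.5, 7.5~<10, >=10
stat6, df6, p6 = goodness_of_fit([14, 12, 18, 5, 5, 5], [1, 2.5, 5, 7.5, 10], LAM)

# Re-grouped four groups: <3.3, 3.3~<6.3, 6.3~<9.3, >=9.3
stat4, df4, p4 = goodness_of_fit([30, 17, 6, 6], [3.3, 6.3, 9.3], LAM)

print(f"6 groups: chi2 = {stat6:.4f}, df = {df6}, p-value = {p6:.3f}")
print(f"4 groups: chi2 = {stat4:.4f}, df = {df4}, p-value = {p4:.3f}")
```

With these inputs the six-group statistic comes out at about 2.41 with a p-value of about 0.66, and the re-grouped statistic at about 1.09 with a p-value of about 0.58. Small differences from the slide figures in the re-grouped case are expected if a less heavily rounded λ̂ was used there.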