Chapter 7 Inferences Regarding Population Variances

Introduction
• Population Variance: Measure of the average squared deviation of individual measurements around the mean:
$$\sigma^2 = V(Y) = E[(Y-\mu)^2] = \frac{\sum_{i=1}^{N}(Y_i-\mu)^2}{N}$$
• Sample Variance: Measure of the "average" squared deviation of a sample of measurements around their sample mean. It is an unbiased estimator of $\sigma^2$:
$$s^2 = \frac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n-1}$$

Sampling Distribution of $s^2$ (Normal Data)
• The population variance ($\sigma^2$) is a fixed (unknown) parameter based on the population of measurements.
• The sample variance ($s^2$) varies from sample to sample (just as the sample mean does).
• When $Y \sim N(\mu, \sigma)$, the distribution of (a multiple of) $s^2$ is chi-square with $n-1$ degrees of freedom:
$$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2 \text{ with } df = n-1$$
• Chi-square distributions
– Positively skewed, with positive density over $(0, \infty)$
– Indexed by their degrees of freedom (df)
– Mean = df, Variance = 2(df)
– Critical values given in Table 7, pp. 1095-1096

Chi-Square Distributions
[Figure: chi-square density curves $f(\chi^2)$ for df = 4, 10, 20, 30, 50; horizontal axis $\chi^2$ from 0 to 70]

Chi-Square Distribution Critical Values
[Figure: chi-square density for df = 10, with area .95 between 3.247 and 20.48 and area .025 in each tail]
Upper-tail area $\alpha$:       0.995  0.990  0.975  0.950  0.900  0.100   0.050   0.025   0.010   0.005
$\chi^2_{\alpha}$ (df = 10):     2.156  2.558  3.247  3.940  4.865  15.987  18.307  20.483  23.209  25.188

Chi-Square Critical Values (2-Sided Tests/CIs)
[Figure: chi-square density with area $1-\alpha$ between $\chi^2_L$ and $\chi^2_U$ and area $\alpha/2$ in each tail]

$(1-\alpha)100\%$ Confidence Interval for $\sigma^2$ (or $\sigma$)
• Step 1: Obtain a random sample of n items from the population, and compute $s^2$.
• Step 2: Choose the confidence level $(1-\alpha)$.
• Step 3: Obtain $\chi^2_L = \chi^2_{1-\alpha/2}$ and $\chi^2_U = \chi^2_{\alpha/2}$ from the table of critical values for the chi-square distribution with $n-1$ df.
• Step 4: Compute the confidence interval for $\sigma^2$ from the formula below.
• Step 5: Obtain the confidence interval for the standard deviation $\sigma$ by taking square roots of the bounds for $\sigma^2$.
$$(1-\alpha)100\% \text{ CI for } \sigma^2: \left[\frac{(n-1)s^2}{\chi^2_U}, \; \frac{(n-1)s^2}{\chi^2_L}\right]$$
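Once the two table values are in hand, the five steps above reduce to a few lines of arithmetic. The following is a minimal Python sketch, assuming the critical values are read from Table 7 rather than computed (the Python standard library has no chi-square quantile function). The function name `variance_ci` and the sample values ($n = 11$, $s^2 = 4.0$) are hypothetical.

```python
import math


def variance_ci(s2, n, chi2_lower, chi2_upper):
    """Return the (1 - alpha)100% CIs for sigma^2 and sigma (normal data).

    chi2_lower: chi-square value with upper-tail area 1 - alpha/2, df = n - 1 (chi^2_L)
    chi2_upper: chi-square value with upper-tail area alpha/2, df = n - 1 (chi^2_U)
    Both come from a chi-square table; this function does not compute them.
    """
    if n < 2 or s2 < 0:
        raise ValueError("need n >= 2 and s2 >= 0")
    if not 0 < chi2_lower < chi2_upper:
        raise ValueError("need 0 < chi2_lower < chi2_upper")
    ss = (n - 1) * s2  # (n - 1) s^2
    # Dividing by the larger critical value gives the smaller (lower) bound.
    var_ci = (ss / chi2_upper, ss / chi2_lower)
    # Step 5: square roots of the variance bounds give the CI for sigma.
    sd_ci = (math.sqrt(var_ci[0]), math.sqrt(var_ci[1]))
    return var_ci, sd_ci


# Hypothetical sample: n = 11 (df = 10), s^2 = 4.0, 95% confidence,
# so chi^2_L = 3.247 and chi^2_U = 20.483 from the df = 10 table above.
var_ci, sd_ci = variance_ci(4.0, 11, 3.247, 20.483)
print("CI for sigma^2:", var_ci)  # roughly (1.95, 12.32)
print("CI for sigma:  ", sd_ci)   # roughly (1.40, 3.51)
```

The critical values are passed in explicitly so that the sketch mirrors the table-lookup workflow on the slide. A library routine for chi-square quantiles could replace the table if one is available.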
Statistical Test for $\sigma^2$
• Null and alternative hypotheses
– 1-sided (upper tail): $H_0: \sigma^2 \le \sigma_0^2$, $H_a: \sigma^2 > \sigma_0^2$
– 1-sided (lower tail): $H_0: \sigma^2 \ge \sigma_0^2$, $H_a: \sigma^2 < \sigma_0^2$
– 2-sided: $H_0: \sigma^2 = \sigma_0^2$, $H_a: \sigma^2 \ne \sigma_0^2$
• Test statistic:
$$\chi^2_{obs} = \frac{(n-1)s^2}{\sigma_0^2}$$
• Decision rule based on the chi-square distribution with df = n-1:
– 1-sided (upper tail): Reject $H_0$ if $\chi^2_{obs} > \chi^2_U = \chi^2_{\alpha}$
– 1-sided (lower tail): Reject $H_0$ if $\chi^2_{obs} < \chi^2_L = \chi^2_{1-\alpha}$
– 2-sided: Reject $H_0$ if $\chi^2_{obs} < \chi^2_L = \chi^2_{1-\alpha/2}$ (conclude $\sigma^2 < \sigma_0^2$) or if $\chi^2_{obs} > \chi^2_U = \chi^2_{\alpha/2}$ (conclude $\sigma^2 > \sigma_0^2$)

Inferences Regarding 2 Population Variances
• Goal: Compare the variances of 2 populations
• Parameter: $\sigma_1^2/\sigma_2^2$ (the ratio is 1 when the variances are equal)
• Estimator: $s_1^2/s_2^2$ (ratio of sample variances)
• Distribution of (a multiple of) the estimator (normal data):
$$\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} = \frac{s_1^2}{s_2^2}\cdot\frac{\sigma_2^2}{\sigma_1^2} \sim F \text{ with } df_1 = n_1-1 \text{ and } df_2 = n_2-1$$
that is, the F-distribution with parameters $df_1 = n_1-1$ and $df_2 = n_2-1$.

Properties of F-Distributions
• Take on positive density over the range $(0, \infty)$
• Cannot take on negative values
• Non-symmetric (skewed right)
• Indexed by two degrees of freedom: $df_1$ (numerator df) and $df_2$ (denominator df)
• Critical values given in Table 8, pp. 1097-1108
• Parameters of the F-distribution:
$$\mu = \frac{df_2}{df_2-2} \quad (df_2 > 2) \qquad \sigma^2 = \frac{2\,df_2^2\,(df_1+df_2-2)}{df_1\,(df_2-2)^2\,(df_2-4)} \quad (df_2 > 4)$$

F-Distributions
[Figure: density functions of F(5,5), F(5,10), and F(10,20); horizontal axis F from 0 to 10]

Critical Values of F-Distributions
• Notation: $F_{\alpha, df_1, df_2}$ is the value with upper-tail area $\alpha$ above it for the F-distribution with $df_1$ and $df_2$ degrees of freedom, respectively.
• $F_{1-\alpha, df_1, df_2} = 1/F_{\alpha, df_2, df_1}$ (lower-tail critical values can be obtained from upper-tail critical values with the degrees of freedom "reversed")
• Values are given for various values of $\alpha$, $df_1$, and $df_2$ in Table 8, pp. 1097-1108

Critical Values of F ($df_1 = 5$, $df_2 = 5$)
Upper area   Middle area   Lower area   Upper c.v.   Lower c.v.
0.25         0.5           0.25          1.8947      0.5278
0.1          0.8           0.1           3.4530      0.2896
0.05         0.9           0.05          5.0503      0.1980
0.025        0.95          0.025         7.1464      0.1399
0.01         0.98          0.01         10.9671      0.0912
0.005        0.99          0.005        14.9394      0.0669
0.001        0.998         0.001        29.7514      0.0336
[Figure: density function of F(5,5) with area .90 between the critical values and .05 in each tail; $F_{.05,5,5} = 5.05$ and $F_{.95,5,5} = 1/F_{.05,5,5} = 1/5.05 = .198$]

Test Comparing Two Population Variances
• Assumption: the 2 populations are normally distributed
• 1-sided test: $H_0: \sigma_1^2 \le \sigma_2^2$, $H_a: \sigma_1^2 > \sigma_2^2$
– Test statistic: $F_{obs} = s_1^2/s_2^2$
– Rejection region: $F_{obs} \ge F_{\alpha, n_1-1, n_2-1}$
– P-value: $P(F \ge F_{obs})$
• 2-sided test: $H_0: \sigma_1^2 = \sigma_2^2$, $H_a: \sigma_1^2 \ne \sigma_2^2$
– Test statistic: $F_{obs} = s_1^2/s_2^2$
– Rejection region: $F_{obs} \ge F_{\alpha/2, n_1-1, n_2-1}$ (conclude $\sigma_1^2 > \sigma_2^2$) or $F_{obs} \le F_{1-\alpha/2, n_1-1, n_2-1}$ (conclude $\sigma_1^2 < \sigma_2^2$)
– P-value: $2\min\left(P(F \ge F_{obs}), P(F \le F_{obs})\right)$

$(1-\alpha)100\%$ Confidence Interval for $\sigma_1^2/\sigma_2^2$
• Obtain the ratio of sample variances $s_1^2/s_2^2 = (s_1/s_2)^2$
• Choose $\alpha$, and obtain:
– $F_L = F_{1-\alpha/2, n_2-1, n_1-1} = 1/F_{\alpha/2, n_1-1, n_2-1}$
– $F_U = F_{\alpha/2, n_2-1, n_1-1}$
• Compute the confidence interval:
$$\left[\frac{s_1^2}{s_2^2}F_L, \; \frac{s_1^2}{s_2^2}F_U\right]$$
• Conclude that the population variances are unequal if the interval does not contain 1

Tests Among t > 2 Population Variances
• Hartley's $F_{max}$ test
– Test statistic is very simple to compute
– Requires equal sample sizes ($n_1 = \dots = n_t$)
– Based on the assumption of normally distributed data
– Uses a special table for critical values
• Levene's test
– More difficult to compute by hand
– No requirement of equal sample sizes; less sensitive to non-normal data than Hartley's test
– Uses the F-distribution for the test
– Computed automatically by software packages (SAS, SPSS, Minitab)

Hartley's $F_{max}$ Test
• $H_0: \sigma_1^2 = \dots = \sigma_t^2$ (homogeneous variances)
• $H_a$: The population variances are not all equal
• Data: $s_{max}^2$ is the largest sample variance and $s_{min}^2$ is the smallest
• Test statistic: $F_{max} = s_{max}^2/s_{min}^2$
• Rejection region: $F_{max} \ge F^*$ (values from the class website, indexed by $\alpha$ (.05, .01), t (the number of populations), and $df_2 = n-1$, where n is the common sample size)
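The single-variance test, the two-variance F test and interval, and Hartley's $F_{max}$ all reduce to ratios of sample variances compared against table values. The following is a minimal sketch, assuming the user supplies the critical values from Tables 7 and 8 (or the class $F_{max}$ table). The p-values on the slides would need the F CDF, which the Python standard library lacks, so this sketch stops at comparisons with critical values. All function names and example numbers are hypothetical.

```python
def chi2_stat(s2, n, sigma0_sq):
    """chi^2_obs = (n - 1) s^2 / sigma_0^2 for a test about one variance."""
    return (n - 1) * s2 / sigma0_sq


def f_stat(s1_sq, s2_sq):
    """F_obs = s1^2 / s2^2 for comparing two population variances."""
    return s1_sq / s2_sq


def reject_two_sided(stat, lower_cv, upper_cv):
    """Reject H0 if the statistic lies below the lower or above the upper critical value.

    Strict vs. non-strict inequalities do not matter in practice for
    continuous statistics, since a tie has probability zero.
    """
    return stat < lower_cv or stat > upper_cv


def ratio_ci(s1_sq, s2_sq, f_half_alpha_12, f_half_alpha_21):
    """(1 - alpha)100% CI for sigma1^2 / sigma2^2.

    f_half_alpha_12: F_{alpha/2, n1-1, n2-1}, so F_L = 1 / f_half_alpha_12
    f_half_alpha_21: F_{alpha/2, n2-1, n1-1} = F_U
    """
    ratio = s1_sq / s2_sq
    return ratio / f_half_alpha_12, ratio * f_half_alpha_21


def hartley_fmax(variances):
    """Hartley's F_max = largest sample variance / smallest sample variance."""
    return max(variances) / min(variances)


# One variance (hypothetical): n = 11, s^2 = 4.0, H0: sigma^2 = 2.0, alpha = .05.
chi2_obs = chi2_stat(4.0, 11, 2.0)
print(chi2_obs, reject_two_sided(chi2_obs, 3.247, 20.483))  # 20.0 False

# Two variances (hypothetical): n1 = n2 = 6 (df 5, 5), s1^2 = 10.0, s2^2 = 4.0, alpha = .05.
f_crit = 7.1464  # F_{.025,5,5} from the df1 = df2 = 5 table above
f_obs = f_stat(10.0, 4.0)
print(f_obs, reject_two_sided(f_obs, 1 / f_crit, f_crit))  # 2.5 False
lo, hi = ratio_ci(10.0, 4.0, f_crit, f_crit)
print(lo, hi)  # roughly 0.350 and 17.87; the interval contains 1

# Hartley's F_max for three hypothetical sample variances (equal n):
print(hartley_fmax([2.0, 8.0, 4.0]))  # 4.0
```

Because $df_1 = df_2$ in the two-sample example, the same table value serves for both $F_{\alpha/2, n_1-1, n_2-1}$ and $F_{\alpha/2, n_2-1, n_1-1}$. With unequal sample sizes, the two arguments of `ratio_ci` differ.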
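Levene's Test
• $H_0: \sigma_1^2 = \dots = \sigma_t^2$ (homogeneous variances)
• $H_a$: The population variances are not all equal
• Data: For each group, obtain the following quantities:
– $y_{ij}$ = the jth measurement from group i ($i = 1, \dots, t$; $j = 1, \dots, n_i$)
– $\tilde{y}_i$ = the sample median for group i ($i = 1, \dots, t$)
– $z_{ij} = |y_{ij} - \tilde{y}_i|$ ($i = 1, \dots, t$; $j = 1, \dots, n_i$)
– $\bar{z}_{i.} = \frac{\sum_{j=1}^{n_i} z_{ij}}{n_i}$
– $\bar{z}_{..} = \frac{\sum_{i=1}^{t}\sum_{j=1}^{n_i} z_{ij}}{N}$, where $N = n_1 + \dots + n_t$
• Test statistic:
$$L = \frac{(N-t)\sum_{i=1}^{t} n_i(\bar{z}_{i.}-\bar{z}_{..})^2}{(t-1)\sum_{i=1}^{t}\sum_{j=1}^{n_i}(z_{ij}-\bar{z}_{i.})^2}$$
• Rejection region: $L \ge F_{\alpha, t-1, N-t}$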
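The Levene computation is mechanical enough to script. The following is a minimal standard-library sketch of the statistic $L$ exactly as defined above, using median-centered absolute deviations $z_{ij}$. The function name `levene_stat` and the data are hypothetical, and the critical value $F_{\alpha, t-1, N-t}$ still has to come from Table 8.

```python
from statistics import median


def levene_stat(groups):
    """Levene's statistic L, using absolute deviations from each group median.

    groups: a list of t sequences, where groups[i][j] is y_ij.
    Returns (L, df1, df2) with df1 = t - 1 and df2 = N - t.
    """
    t = len(groups)
    if t < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    # z_ij = |y_ij - median of group i|
    z = []
    for g in groups:
        m = median(g)
        z.append([abs(y - m) for y in g])
    n = [len(zi) for zi in z]
    N = sum(n)
    zbar_i = [sum(zi) / ni for zi, ni in zip(z, n)]  # group means z-bar_i.
    zbar = sum(sum(zi) for zi in z) / N               # overall mean z-bar..
    between = sum(ni * (zb - zbar) ** 2 for ni, zb in zip(n, zbar_i))
    within = sum((zij - zb) ** 2 for zi, zb in zip(z, zbar_i) for zij in zi)
    if within == 0:
        raise ValueError("every z_ij equals its group mean; L is undefined")
    L = (N - t) * between / ((t - 1) * within)
    return L, t - 1, N - t


# Hypothetical data: two groups of three measurements each.
L, df1, df2 = levene_stat([[1, 2, 3], [0, 5, 10]])
print(L, df1, df2)  # L is about 2.46, with df1 = 1 and df2 = 4
```

For this example, $\bar{z}_{1.} = 2/3$, $\bar{z}_{2.} = 10/3$, and $\bar{z}_{..} = 2$. The numerator sum is therefore $32/3$ and the denominator sum is $52/3$, which gives $L = 4(32/3)/(52/3) = 32/13 \approx 2.46$ on (1, 4) df. As a cross-check, SciPy's `scipy.stats.levene` with its default `center='median'` computes the same median-based statistic.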