INFERENCES ABOUT POPULATION VARIANCES

Estimation and Tests for a Single Population Variance

• Recall that the sample variance $s^2 = \sum (y - \bar{y})^2/(n-1)$ is the point estimate of $\sigma^2$.
• For tests and confidence intervals about $\sigma^2$ we use the fact that the sampling random variable
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2}$$
has the chi-square distribution with $n-1$ degrees of freedom (df) when the sample is from a normal population.
• Like the Student's t distribution, there is a different curve for each sample size n, i.e., for each value of df.
• The chi-square distribution is nonsymmetric. The distributions appear more skewed for smaller values of df and become more symmetric as df increases. Plots of the chi-square distribution for df = 5, 15, and 30 are shown in Fig. 7.3.
• Percentiles of the chi-square distribution are given in Table 7. Because the chi-square distribution is nonsymmetric, the percentiles for probabilities at both ends need to be tabulated.

A 100(1 − α)% Confidence Interval for σ²

• The interval has the form
$$\frac{(n-1)s^2}{\chi^2_U} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_L}$$
where
• $\chi^2_U$ is the upper-tail value with area $\alpha/2$ to the right,
• $\chi^2_L$ is the lower-tail value with area $\alpha/2$ to the left.
• Since df = n − 1, look up $\chi^2_{n-1}$ percentiles.
• The confidence interval for the standard deviation σ is found by taking square roots of both endpoints of the interval above.

Example 7.1

• Always plotting sample data as a preliminary procedure, using a boxplot or normal probability plot to look for skewness or outliers, is recommended. The normal probability plot of the data (see textbook) appears to show that the sample is from a normal distribution.
• From the data, n = 30, ȳ = 500.453, and s = 3.433 were calculated.
• A 99% confidence interval for σ² is computed as follows. Since α = .01, α/2 = .005 and 1 − α/2 = .995, we compute
$$\chi^2_U = \chi^2_{0.005,29} = 52.34 \qquad \text{and} \qquad \chi^2_L = \chi^2_{0.995,29} = 13.12$$
• Thus the endpoints for the required C.I. are, respectively,
$$\frac{(n-1)s^2}{\chi^2_U} = \frac{29(3.433^2)}{52.34} = 6.53 \qquad \text{and} \qquad \frac{(n-1)s^2}{\chi^2_L} = \frac{29(3.433^2)}{13.12} = 26.05$$
so (6.53, 26.05) is a 99% C.I. for σ².
• Taking square roots of both endpoints of the interval for σ², we get (2.56, 5.10) as a 99% C.I. for σ. Thus we are 99% confident that the standard deviation of the weights of the coffee containers lies in the interval (2.56, 5.10).
• The filling machine was designed so that σ = 4 grams, and since this value falls in the above interval, the machine satisfies the design specification for the standard deviation.
• A separate confidence interval for μ is computed to check whether the design specification for the mean is satisfied. (See textbook for this calculation.)

Tests of Hypotheses about σ²

Test:
1. $H_0: \sigma^2 \le \sigma_0^2$ vs. $H_a: \sigma^2 > \sigma_0^2$
2. $H_0: \sigma^2 \ge \sigma_0^2$ vs. $H_a: \sigma^2 < \sigma_0^2$
3. $H_0: \sigma^2 = \sigma_0^2$ vs. $H_a: \sigma^2 \ne \sigma_0^2$

Test Statistic:
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$

Rejection Region: for specified α and df = n − 1,
1. Reject $H_0$ if $\chi^2 > \chi^2_U$, where $\chi^2_U \equiv \chi^2_{\alpha,\,n-1}$
2. Reject $H_0$ if $\chi^2 < \chi^2_L$, where $\chi^2_L \equiv \chi^2_{1-\alpha,\,n-1}$
3. Reject $H_0$ if $\chi^2 > \chi^2_U$, where $\chi^2_U \equiv \chi^2_{\alpha/2,\,n-1}$, or if $\chi^2 < \chi^2_L$, where $\chi^2_L \equiv \chi^2_{1-\alpha/2,\,n-1}$

Example 7.2

• Given n = 20 and s² = 6.2. The normal probability plot of the data (see textbook) appears to satisfy the normality assumption needed.
• Test: $H_0: \sigma^2 \le 4$ vs. $H_a: \sigma^2 > 4$
• T.S.: $\chi^2_c = \dfrac{(19)(6.2)}{4} = 29.45$
• R.R.: $\chi^2_U = \chi^2_{0.05,19} = 30.14$, so reject $H_0$ if $\chi^2 > 30.14$.
• Since 29.45 is not in the R.R., we fail to reject $H_0$.
• The p-value for this test is given by $P(\chi^2_{19} > 29.45)$. Using Table 7, it is seen that the p-value lies between 0.05 and 0.10.
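The percentiles and p-value in Examples 7.1 and 7.2 can also be checked with software instead of Table 7. The sketch below is not part of the original notes; it assumes Python with scipy is available. Note that scipy's `ppf` takes cumulative (left-tail) probabilities, whereas the table notation $\chi^2_{a,\,df}$ refers to an upper-tail area a, hence the `1 - alpha/2` and `alpha/2` arguments.

```python
# Sketch (not from the notes): reproducing Examples 7.1 and 7.2 with scipy.
from scipy.stats import chi2

# --- Example 7.1: 99% CI for sigma^2 (n = 30, s = 3.433) ---
n, s, alpha = 30, 3.433, 0.01
df = n - 1
chi2_U = chi2.ppf(1 - alpha / 2, df)   # upper-tail value, area alpha/2 to the right: 52.34
chi2_L = chi2.ppf(alpha / 2, df)       # lower-tail value, area alpha/2 to the left:  13.12
lower = df * s**2 / chi2_U             # (n-1)s^2 / chi2_U -> about 6.53
upper = df * s**2 / chi2_L             # (n-1)s^2 / chi2_L -> about 26.05
print((lower, upper))                  # 99% CI for sigma^2
print((lower**0.5, upper**0.5))        # 99% CI for sigma -> about (2.56, 5.10)

# --- Example 7.2: H0: sigma^2 <= 4 vs. Ha: sigma^2 > 4 (n = 20, s^2 = 6.2) ---
n2, s2, sigma0_sq, alpha2 = 20, 6.2, 4.0, 0.05
stat = (n2 - 1) * s2 / sigma0_sq       # 19 * 6.2 / 4 = 29.45
crit = chi2.ppf(1 - alpha2, n2 - 1)    # chi-square upper 0.05 point with 19 df = 30.14
p_value = chi2.sf(stat, n2 - 1)        # P(chi2_19 > 29.45), between 0.05 and 0.10
print(stat, crit, p_value)
```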
• The above inferences are based on the assumption of sampling from a normal population.
• They are more sensitive to departures from normality than the inferences about the population mean.
• We don't have a CLT-type theorem that applies to S² like we have in the case of Ȳ.

Estimation and Tests for Comparing Two Population Variances

• We will consider the case of independent samples from two populations having variances σ1² and σ2². The main application is testing σ1² = σ2².
• Theory says that when the two populations are normal, and the sample sizes are n1 and n2, then the random variable
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
has the F distribution with degrees of freedom df1 = n1 − 1 and df2 = n2 − 1.
• The F distribution is a two-parameter family indexed by df1 and df2. The F statistic is calculated as $F_c = \dfrac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$, which under $H_0: \sigma_1^2 = \sigma_2^2$ reduces to $s_1^2/s_2^2$.
• df1 is the degrees of freedom associated with S1², and df2 is the degrees of freedom associated with S2². Some use the terms numerator and denominator degrees of freedom for df1 and df2.
• Table 8 gives percentiles of the F distribution only for upper-tail areas. The lower-tail values, when needed, can be obtained from the relationship
$$F_{1-\alpha,\,df_1,\,df_2} = \frac{1}{F_{\alpha,\,df_1^*,\,df_2^*}}$$
where $df_1^* = df_2$ and $df_2^* = df_1$. That is, to calculate a left-tail value, look up the corresponding right-tail value after switching the numerator and denominator degrees of freedom.
• For example, $F_{.95,3,10}$ is calculated using the above relationship as $1/F_{.05,10,3} = 1/8.79 = 0.11$, and $F_{.95,10,3}$ is calculated as $1/F_{.05,3,10} = 1/3.71 = 0.27$.

Tests comparing two variances

Test:
1. $H_0: \sigma_1^2 \le \sigma_2^2$ vs. $H_a: \sigma_1^2 > \sigma_2^2$
2. $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_a: \sigma_1^2 \ne \sigma_2^2$

Test Statistic:
$$F = \frac{s_1^2}{s_2^2}$$

Rejection Region: for given α, df1 = n1 − 1, and df2 = n2 − 1,
1. Reject $H_0$ if $F \ge F_{\alpha,\,df_1,\,df_2}$
2. Reject $H_0$ if $F \le F_{1-\alpha/2,\,df_1,\,df_2}$ or $F \ge F_{\alpha/2,\,df_1,\,df_2}$

Example 7.5 (continuation of Example 6.1)

• In order to test the hypothesis $H_0: \mu_1 - \mu_2 = 0$ using a two-sample t-statistic with a pooled sample variance, we need to assume $\sigma_1^2 = \sigma_2^2$. To check whether this is a valid assumption, let us formally test $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_a: \sigma_1^2 \ne \sigma_2^2$ using α = .05.
• The two independent samples gave sample variances $s_1^2 = 0.105$ and $s_2^2 = 0.058$, with df1 = 9 and df2 = 9.
• Test: $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_a: \sigma_1^2 \ne \sigma_2^2$
• Test Statistic: $F_c = \dfrac{s_1^2}{s_2^2} = \dfrac{0.105}{0.058} = 1.81$
• Rejection Region: $F \le F_{.975,9,9}$ or $F \ge F_{.025,9,9}$. From Table 8, using α = 0.05, df1 = 9, df2 = 9, we have $F_{0.025,9,9} = 4.03$ and $F_{0.975,9,9} = 1/F_{0.025,9,9} = 1/4.03 = 0.25$. Thus we reject if $F \le 0.25$ or $F \ge 4.03$.
• Since $F_c = 1.81$ does not fall in the rejection region, we fail to reject $H_0: \sigma_1^2 = \sigma_2^2$.
• Conclusion: The assumption of equal variances for the two populations appears reasonable. (A software check of these calculations is sketched at the end of this section.)

Test comparing several variances

Hartley's Fmax Test for Homogeneity of Variances

• This test requires that independent random samples of equal sample sizes, n, are drawn from t populations having normal distributions.
• Test: $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_t^2$ vs. $H_a$: not all σ²'s are equal
• Test Statistic: $F_{\max} = \dfrac{s^2_{\max}}{s^2_{\min}}$, where $s^2_{\max}$ is the largest sample variance and $s^2_{\min}$ is the smallest sample variance.
• Rejection Region: for given α, reject $H_0$ if $F_{\max}$ exceeds the tabulated percentile from Table 12 for α. Note: use t, df2 = n − 1, and a = α to read Table 12.
• When the sample sizes are not all equal, an approximate test is obtained if n is replaced with $n_{\max}$, the largest sample size, in the above procedure.
• Hartley's test is quite sensitive to violation of the normality assumption.
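The F percentiles and the rejection-region arithmetic above can also be checked with software. The sketch below is not part of the notes; it assumes Python with scipy, reproduces the quantities in Example 7.5 and the lower-tail relationship, and computes the Fmax statistic for a hypothetical set of sample variances (the Fmax critical value still has to come from Table 12).

```python
# Sketch (not from the notes): checking Example 7.5 and the lower-tail relationship with scipy.
from scipy.stats import f

# --- Example 7.5: H0: sigma1^2 = sigma2^2 vs. Ha: sigma1^2 != sigma2^2, alpha = .05 ---
s1_sq, s2_sq = 0.105, 0.058
df1, df2, alpha = 9, 9, 0.05

Fc = s1_sq / s2_sq                          # 0.105 / 0.058 = 1.81
F_upper = f.ppf(1 - alpha / 2, df1, df2)    # F_{0.025,9,9} = 4.03 (table subscript = upper-tail area)
F_lower = f.ppf(alpha / 2, df1, df2)        # F_{0.975,9,9} = 1/4.03 = 0.25
print(Fc, F_lower, F_upper)                 # 1.81 is between 0.25 and 4.03: fail to reject H0

# --- Lower-tail relationship: F_{.95,3,10} = 1 / F_{.05,10,3} ---
print(1 / f.ppf(0.95, 10, 3), f.ppf(0.05, 3, 10))   # both are about 0.11

# --- Hartley's Fmax statistic for hypothetical sample variances (equal n assumed) ---
sample_variances = [0.105, 0.058, 0.071]    # hypothetical values, for illustration only
Fmax = max(sample_variances) / min(sample_variances)
print(Fmax)                                 # compare with the Table 12 percentile; scipy has no Fmax table
```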
• Because Hartley's test is so sensitive to non-normality, other procedures that do not require the normality assumption, such as Levene's test described in the text, have been proposed.
• Levene's test is too cumbersome to calculate by hand, so software may have to be used.
• Levene's test is less powerful than Hartley's test when the populations have normal distributions.
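Since the notes point to software for Levene's test, here is a minimal sketch using scipy.stats.levene; the groups, sizes, and random seed are all hypothetical and only illustrate the call.

```python
# Sketch (not from the notes): Levene's test via scipy, using hypothetical data.
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(0)
group1 = rng.normal(loc=10, scale=1.0, size=15)   # hypothetical samples; unequal sizes are allowed
group2 = rng.normal(loc=10, scale=1.5, size=20)
group3 = rng.normal(loc=10, scale=1.0, size=12)

# center='mean' gives the original Levene statistic; scipy's default, center='median',
# is the more robust Brown-Forsythe variant.
stat, p_value = levene(group1, group2, group3, center='mean')
print(stat, p_value)   # a small p-value is evidence that the variances are not all equal
```

Unlike Hartley's Fmax test, this does not require equal sample sizes or normal populations.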