Chapter 7 Slides

Chapter 7: Inferences Regarding Population Variances
Introduction
• Population Variance: Measure of average squared deviation of individual measurements around the mean
$$V(Y) = \sigma^2 = E[(Y-\mu)^2] = \frac{\sum_{i=1}^{N} (Y_i-\mu)^2}{N}$$
• Sample Variance: Measure of “average” squared deviation of a sample of measurements around their sample mean. Unbiased estimator of $\sigma^2$
$$s^2 = \frac{\sum_{i=1}^{n} (y_i-\bar{y})^2}{n-1}$$
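The two divisors can be compared directly. The sketch below uses made-up measurements (an assumption for illustration only) and checks the hand computation against Python's `statistics` module.

```python
import statistics

# Hypothetical measurements, used only to illustrate the two formulas.
y = [4.0, 7.0, 6.0, 3.0, 5.0]

n = len(y)
y_bar = sum(y) / n
ss = sum((yi - y_bar) ** 2 for yi in y)  # sum of squared deviations

pop_var = ss / n           # divisor N: treats y as the whole population
sample_var = ss / (n - 1)  # divisor n-1: unbiased estimator of sigma^2

# The standard library implements the same two divisors.
assert abs(pop_var - statistics.pvariance(y)) < 1e-12
assert abs(sample_var - statistics.variance(y)) < 1e-12
print(pop_var, sample_var)
```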
Sampling Distribution of $s^2$ (Normal Data)
• Population variance ($\sigma^2$) is a fixed (unknown) parameter based on the population of measurements
• Sample variance ($s^2$) varies from sample to sample (just as sample mean does)
• When $Y \sim N(\mu, \sigma)$, the distribution of (a multiple of) $s^2$ is Chi-Square with n-1 degrees of freedom.
$(n-1)s^2/\sigma^2 \sim \chi^2$ with df = n-1
• Chi-Square distributions
– Positively skewed with positive density over $(0, \infty)$
– Indexed by its degrees of freedom (df)
– Mean=df, Variance=2(df)
– Critical Values given in Table 7, pp. 1095-1096
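A small simulation can make the sampling distribution concrete. The sketch below assumes hypothetical values n = 11, μ = 50, and σ = 2, draws repeated normal samples, and compares the mean and variance of $(n-1)s^2/\sigma^2$ with the chi-square values df and 2(df).

```python
import random

random.seed(1)                  # fixed seed so the run is repeatable
n, mu, sigma = 11, 50.0, 2.0    # hypothetical sample size and parameters
df = n - 1
reps = 20000

def sample_variance(values):
    """Sample variance with divisor n-1."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

scaled = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    scaled.append(df * sample_variance(sample) / sigma ** 2)  # (n-1) s^2 / sigma^2

sim_mean = sum(scaled) / reps
sim_var = sample_variance(scaled)
print(f"mean = {sim_mean:.2f} (theory {df}), variance = {sim_var:.2f} (theory {2 * df})")
```

The simulated values only approximate the theoretical ones; the gap shrinks as `reps` grows.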
Chi-Square Distributions
[Figure: chi-square density curves $f(\chi^2)$ for df = 4, 10, 20, 30, and 50, plotted over $\chi^2$ from 0 to 70]
Chi-Square Distribution Critical Values
[Figure: chi-square density for df=10; central area .95 lies between 3.247 and 20.48, with area .025 in each tail]

Upper-tail area α    χ²(α), df=10
0.995                2.156
0.990                2.558
0.975                3.247
0.950                3.940
0.900                4.865
0.100                15.987
0.050                18.307
0.025                20.483
0.010                23.209
0.005                25.188
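The df=10 critical values on this slide can be checked without a table. The sketch below relies on a known closed form for the chi-square upper-tail probability that holds only for even df; the function name is a hypothetical helper, not something from the slides or textbook.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); valid only for positive even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    # Identity for even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

# Upper-tail area -> critical value, copied from the slide's df=10 table.
table_df10 = {0.975: 3.247, 0.950: 3.940, 0.050: 18.307, 0.025: 20.483}

for alpha, cv in table_df10.items():
    tail = chi2_upper_tail_even_df(cv, 10)
    print(f"table alpha = {alpha:.3f}, computed tail area = {tail:.4f}")
```

Each computed tail area should agree with the tabled α to about three decimals, since the critical values themselves are rounded.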
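Chi-Square Critical Values (2-Sided Tests/CIs)
[Figure: chi-square density $f(\chi^2)$ with central area $1-\alpha$ between $\chi^2_L$ and $\chi^2_U$, and area $\alpha/2$ in each tail]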
$(1-\alpha)100\%$ Confidence Interval for $\sigma^2$ (or $\sigma$)
• Step 1: Obtain a random sample of n items from the population, and compute $s^2$
• Step 2: Choose confidence level $(1-\alpha)$
• Step 3: Obtain $\chi^2_L$ and $\chi^2_U$ from the table of critical values for the chi-square distribution with n-1 df
• Step 4: Compute the confidence interval for $\sigma^2$ based on the formula below
• Step 5: Obtain confidence interval for standard deviation $\sigma$ by taking square roots of bounds for $\sigma^2$
$$(1-\alpha)100\% \text{ CI for } \sigma^2: \quad \left( \frac{(n-1)s^2}{\chi^2_U},\ \frac{(n-1)s^2}{\chi^2_L} \right)$$
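The five steps can be sketched in a few lines. The sample size and $s^2$ below are hypothetical; the critical values 3.247 and 20.483 are the df=10 entries for α/2 = .025 from the chi-square table. The function name `variance_ci` is an assumption made for this illustration.

```python
import math

def variance_ci(s2, n, chi2_lower, chi2_upper):
    """CI for sigma^2 from s^2, n, and the chi-square critical values (df = n-1)
    that cut off alpha/2 in the lower and upper tails."""
    df = n - 1
    # Dividing by the larger critical value gives the lower bound.
    return df * s2 / chi2_upper, df * s2 / chi2_lower

# Hypothetical sample: n = 11 (df = 10) with s^2 = 4.0, 95% confidence.
lo, hi = variance_ci(s2=4.0, n=11, chi2_lower=3.247, chi2_upper=20.483)
sd_lo, sd_hi = math.sqrt(lo), math.sqrt(hi)   # Step 5: bounds for sigma

print(f"95% CI for variance: ({lo:.3f}, {hi:.3f})")
print(f"95% CI for std dev:  ({sd_lo:.3f}, {sd_hi:.3f})")
```

Because the chi-square distribution is skewed, the interval is not symmetric around $s^2$.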
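Statistical Test for $\sigma^2$
• Null and alternative hypotheses
– 1-sided (upper tail): $H_0: \sigma^2 \le \sigma_0^2$   $H_a: \sigma^2 > \sigma_0^2$
– 1-sided (lower tail): $H_0: \sigma^2 \ge \sigma_0^2$   $H_a: \sigma^2 < \sigma_0^2$
– 2-sided: $H_0: \sigma^2 = \sigma_0^2$   $H_a: \sigma^2 \ne \sigma_0^2$
• Test Statistic
$$\chi^2_{obs} = \frac{(n-1)s^2}{\sigma_0^2}$$
• Decision Rule based on chi-square distribution w/ df=n-1:
– 1-sided (upper tail): Reject $H_0$ if $\chi^2_{obs} > \chi^2_U = \chi^2_{\alpha}$
– 1-sided (lower tail): Reject $H_0$ if $\chi^2_{obs} < \chi^2_L = \chi^2_{1-\alpha}$
– 2-sided: Reject $H_0$ if $\chi^2_{obs} < \chi^2_L = \chi^2_{1-\alpha/2}$ (Conclude $\sigma^2 < \sigma_0^2$) or if $\chi^2_{obs} > \chi^2_U = \chi^2_{\alpha/2}$ (Conclude $\sigma^2 > \sigma_0^2$)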
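The decision rule can be sketched as a small function that takes the critical values as inputs. The sample values and null value below are hypothetical; 18.307 is the df=10 critical value for α = .05, and 3.247 and 20.483 are the values for α/2 = .025. The function name is an assumption for this illustration.

```python
def chi2_variance_test(s2, n, sigma0_sq, crit_lower=None, crit_upper=None):
    """Return (chi2_obs, reject).

    Pass only crit_upper for an upper-tail test, only crit_lower for a
    lower-tail test, and both for a 2-sided test."""
    chi2_obs = (n - 1) * s2 / sigma0_sq
    reject = False
    if crit_upper is not None and chi2_obs > crit_upper:
        reject = True
    if crit_lower is not None and chi2_obs < crit_lower:
        reject = True
    return chi2_obs, reject

# Hypothetical upper-tail test: n = 11, s^2 = 9.0, H0: sigma^2 <= 4, alpha = .05.
upper_stat, upper_reject = chi2_variance_test(9.0, 11, 4.0, crit_upper=18.307)

# Hypothetical 2-sided test: n = 11, s^2 = 4.0, H0: sigma^2 = 4, alpha = .05.
two_stat, two_reject = chi2_variance_test(4.0, 11, 4.0, crit_lower=3.247, crit_upper=20.483)

print(upper_stat, upper_reject)
print(two_stat, two_reject)
```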
Inferences Regarding 2 Population Variances
• Goal: Compare variances between 2 populations
• Parameter: $\sigma_1^2/\sigma_2^2$ (Ratio is 1 when variances are equal)
• Estimator: $s_1^2/s_2^2$ (Ratio of sample variances)
• Distribution of (multiple of) estimator (Normal Data):
$$\frac{s_1^2/s_2^2}{\sigma_1^2/\sigma_2^2} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F \quad \text{with } df_1 = n_1-1 \text{ and } df_2 = n_2-1$$
F-distribution with parameters $df_1 = n_1-1$ and $df_2 = n_2-1$
Properties of F-Distributions
• Take on positive density over the range $(0, \infty)$
• Cannot take on negative values
• Non-symmetric (skewed right)
• Indexed by two degrees of freedom ($df_1$ (numerator df) and $df_2$ (denominator df))
• Critical values given in Table 8, pp 1097-1108
• Parameters of F-distribution:
$$\mu = \frac{df_2}{df_2-2} \quad (df_2 > 2) \qquad \sigma^2 = \frac{2\,df_2^2\,(df_1+df_2-2)}{df_1\,(df_2-2)^2\,(df_2-4)} \quad (df_2 > 4)$$
F-Distributions
[Figure: density functions of F for $(df_1, df_2)$ = (5,5), (5,10), and (10,20), plotted over F from 0 to 10]
Critical Values of F-Distributions
• Notation: $F_{\alpha, df_1, df_2}$ is the value with upper tail area of $\alpha$ above it for the F-distribution with degrees of freedom $df_1$ and $df_2$, respectively
• $F_{1-\alpha, df_1, df_2} = 1/F_{\alpha, df_2, df_1}$ (Lower tail critical values can be obtained from upper tail critical values with “reversed” degrees of freedom)
• Values given for various values of $\alpha$, $df_1$, and $df_2$ in Table 8, pp 1097-1108
Critical Values of F (df1=5, df2=5)
[Figure: density function of F(5,5) with central area .90 and area .05 in each tail; F(.05,5,5)=5.05 and F(.95,5,5)=1/F(.05,5,5)=1/5.05=.198]

upper area   middle area   lower area   upper cv   lower cv
0.25         0.5           0.25         1.8947     0.5278
0.1          0.8           0.1          3.4530     0.2896
0.05         0.9           0.05         5.0503     0.1980
0.025        0.95          0.025        7.1464     0.1399
0.01         0.98          0.01         10.9671    0.0912
0.005        0.99          0.005        14.9394    0.0669
0.001        0.998         0.001        29.7514    0.0336
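The reciprocal relationship between lower and upper critical values can be checked against the tabled numbers for df1 = df2 = 5. In this sketch the rows are copied from the slide; because the two degrees of freedom are equal here, "reversing" them changes nothing, so each lower critical value should be the reciprocal of the upper one.

```python
# (tail area, upper critical value, lower critical value) for df1 = df2 = 5,
# copied from the slide's table.
rows = [
    (0.25, 1.8947, 0.5278),
    (0.10, 3.4530, 0.2896),
    (0.05, 5.0503, 0.1980),
    (0.025, 7.1464, 0.1399),
    (0.01, 10.9671, 0.0912),
    (0.005, 14.9394, 0.0669),
    (0.001, 29.7514, 0.0336),
]

for alpha, upper_cv, lower_cv in rows:
    reciprocal = 1.0 / upper_cv   # F_{1-alpha,5,5} = 1 / F_{alpha,5,5}
    print(f"alpha = {alpha}: 1/upper = {reciprocal:.4f}, tabled lower = {lower_cv:.4f}")

# Largest disagreement between the reciprocal and the tabled lower value.
max_gap = max(abs(1.0 / u - l) for _, u, l in rows)
```

Any remaining difference is rounding in the four-decimal table entries.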
Test Comparing Two Population Variances
• Assumption: the 2 populations are normally distributed
1-Sided Test: $H_0: \sigma_1^2 \le \sigma_2^2$   $H_a: \sigma_1^2 > \sigma_2^2$
Test Statistic: $F_{obs} = s_1^2/s_2^2$
Rejection Region: $F_{obs} \ge F_{\alpha, n_1-1, n_2-1}$
P-value: $P(F \ge F_{obs})$
2-Sided Test: $H_0: \sigma_1^2 = \sigma_2^2$   $H_a: \sigma_1^2 \ne \sigma_2^2$
Test Statistic: $F_{obs} = s_1^2/s_2^2$
Rejection Region: $F_{obs} \ge F_{\alpha/2, n_1-1, n_2-1}$ (conclude $\sigma_1^2 > \sigma_2^2$) or $F_{obs} \le F_{1-\alpha/2, n_1-1, n_2-1}$ (conclude $\sigma_1^2 < \sigma_2^2$)
P-value: $2\min(P(F \ge F_{obs}), P(F \le F_{obs}))$
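The 2-sided rejection region can be sketched with critical values supplied by the caller. The sample variances below are hypothetical, with n1 = n2 = 6 so that the tabled values for df1 = df2 = 5 apply: 7.1464 (upper) and 0.1399 (lower) for α/2 = .025. The function name is an assumption for this illustration, and the P-value is left out because it needs the F cumulative distribution.

```python
def f_test_two_sided(s1_sq, s2_sq, f_lower, f_upper):
    """Return (F_obs, conclusion) for H0: sigma1^2 = sigma2^2."""
    f_obs = s1_sq / s2_sq
    if f_obs >= f_upper:
        return f_obs, "reject H0: sigma1^2 > sigma2^2"
    if f_obs <= f_lower:
        return f_obs, "reject H0: sigma1^2 < sigma2^2"
    return f_obs, "fail to reject H0"

# Hypothetical samples: n1 = n2 = 6, s1^2 = 12.0, s2^2 = 1.5, alpha = .05.
f_obs, conclusion = f_test_two_sided(12.0, 1.5, f_lower=0.1399, f_upper=7.1464)
print(f_obs, conclusion)
```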
$(1-\alpha)100\%$ Confidence Interval for $\sigma_1^2/\sigma_2^2$
• Obtain ratio of sample variances $s_1^2/s_2^2 = (s_1/s_2)^2$
• Choose $\alpha$, and obtain:
– $F_L = F_{1-\alpha/2,\, n_2-1,\, n_1-1} = 1/F_{\alpha/2,\, n_1-1,\, n_2-1}$
– $F_U = F_{\alpha/2,\, n_2-1,\, n_1-1}$
• Compute Confidence Interval:
$$\left( \frac{s_1^2}{s_2^2} F_L,\ \frac{s_1^2}{s_2^2} F_U \right)$$
Conclude population variances unequal if interval does not contain 1
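A minimal sketch of the interval follows, again with the critical values passed in rather than computed. The sample variances are hypothetical, with n1 = n2 = 6 so that the tabled value $F_{.025,5,5}$ = 7.1464 serves for both degree-of-freedom orderings. The function and argument names are assumptions for this illustration.

```python
def variance_ratio_ci(s1_sq, s2_sq, f_upper_df1_df2, f_upper_df2_df1):
    """CI for sigma1^2 / sigma2^2.

    f_upper_df1_df2 is F_{alpha/2, n1-1, n2-1};
    f_upper_df2_df1 is F_{alpha/2, n2-1, n1-1}."""
    ratio = s1_sq / s2_sq
    f_l = 1.0 / f_upper_df1_df2   # F_L = 1 / F_{alpha/2, n1-1, n2-1}
    f_u = f_upper_df2_df1         # F_U = F_{alpha/2, n2-1, n1-1}
    return ratio * f_l, ratio * f_u

# Hypothetical samples: n1 = n2 = 6, s1^2 = 12.0, s2^2 = 1.5, 95% confidence.
lo, hi = variance_ratio_ci(12.0, 1.5, 7.1464, 7.1464)
contains_one = lo <= 1.0 <= hi
print(f"95% CI for variance ratio: ({lo:.3f}, {hi:.3f}); contains 1: {contains_one}")
```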
Tests Among t > 2 Population Variances
• Hartley’s Fmax Test
– Very simple to compute Test Statistic
– Must have equal sample sizes ($n_1 = \ldots = n_t$)
– Test based on assumption of normally distributed data
– Uses special table for critical values
• Levene’s Test
– More difficult to compute by hand
– No assumptions regarding sample sizes/distributions
– Uses F-distribution for the test
– Computed automatically by software packages (SAS, SPSS, Minitab)
Hartley’s Fmax Test
• $H_0: \sigma_1^2 = \ldots = \sigma_t^2$ (homogeneous variances)
• $H_a$: Population Variances are not all equal
• Data: $s_{max}^2$ is largest sample variance, $s_{min}^2$ is smallest
• Test Statistic: $F_{max} = s_{max}^2/s_{min}^2$
• Rejection Region: $F_{max} \ge F^*$ (Values from class website, indexed by $\alpha$ (.05, .01), t (number of populations) and $df_2$ (n-1, where n is the individual sample size))
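The test statistic is simple enough to sketch directly. The groups below are made-up data, and the function name is an assumption; the critical value $F^*$ still has to be looked up in the special table, so the sketch stops at the statistic.

```python
import statistics

def hartley_fmax(groups):
    """Ratio of the largest to the smallest sample variance.

    Raises ValueError if the groups do not all have the same size,
    since the test requires equal sample sizes."""
    if len({len(g) for g in groups}) != 1:
        raise ValueError("Hartley's Fmax test requires equal sample sizes")
    variances = [statistics.variance(g) for g in groups]  # divisor n-1
    return max(variances) / min(variances)

# Hypothetical data: t = 3 groups, n = 5 measurements each.
groups = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [1.0, 3.0, 5.0, 7.0, 9.0],
]
fmax = hartley_fmax(groups)
print(fmax)
```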
Levene’s Test
• $H_0: \sigma_1^2 = \ldots = \sigma_t^2$ (homogeneous variances)
• $H_a$: Population Variances are not all equal
• Data: For each group, obtain the following quantities:
$y_{ij}$ = the $j$th measurement from group $i$ $(i = 1,\ldots,t;\ j = 1,\ldots,n_i)$
$\tilde{y}_i$ = sample median for group $i$ $(i = 1,\ldots,t)$
$z_{ij} = |y_{ij} - \tilde{y}_i|$ $(i = 1,\ldots,t;\ j = 1,\ldots,n_i)$
$$\bar{z}_{i.} = \frac{\sum_{j=1}^{n_i} z_{ij}}{n_i} \qquad \bar{z}_{..} = \frac{\sum_{i=1}^{t}\sum_{j=1}^{n_i} z_{ij}}{N} \qquad N = n_1 + \ldots + n_t$$
$$\text{Test Statistic: } L = \frac{\sum_{i=1}^{t} n_i (\bar{z}_{i.} - \bar{z}_{..})^2 / (t-1)}{\sum_{i=1}^{t}\sum_{j=1}^{n_i} (z_{ij} - \bar{z}_{i.})^2 / (N-t)}$$
Rejection Region: $L \ge F_{\alpha,\, t-1,\, N-t}$
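Since Levene’s statistic is tedious by hand, a short sketch may help. It assumes the median-based version described on this slide, with $z_{ij}$ taken as the absolute deviation from the group median. The data are made up and the function name is an assumption; the critical value $F_{\alpha, t-1, N-t}$ is not computed here.

```python
import statistics

def levene_statistic(groups):
    """Levene's L using absolute deviations from each group's median."""
    t = len(groups)
    n_total = sum(len(g) for g in groups)

    # z_ij = |y_ij - median_i|
    z = []
    for g in groups:
        med = statistics.median(g)
        z.append([abs(y - med) for y in g])

    z_bar_i = [sum(zi) / len(zi) for zi in z]           # group means of z
    z_bar = sum(sum(zi) for zi in z) / n_total          # overall mean of z

    between = sum(len(zi) * (m - z_bar) ** 2 for zi, m in zip(z, z_bar_i)) / (t - 1)
    within = sum((v - m) ** 2 for zi, m in zip(z, z_bar_i) for v in zi) / (n_total - t)
    return between / within

# Hypothetical data: t = 3 groups of 3 measurements each.
groups = [[1.0, 2.0, 3.0], [0.0, 2.0, 4.0], [1.0, 3.0, 5.0]]
L = levene_statistic(groups)
print(round(L, 4))
```

The statistic has the same form as a one-way ANOVA F ratio applied to the $z_{ij}$ values, which is why the F-distribution supplies the rejection region.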