Statistical Inference Based on Two Samples

Basic Assumptions
1. $X_1, X_2, \ldots, X_m$ is a random sample from a population with
mean $\mu_1$ and variance $\sigma_1^2$.
2. $Y_1, Y_2, \ldots, Y_n$ is a random sample from a population with
mean $\mu_2$ and variance $\sigma_2^2$.
3. The two samples are independent of one another.
Proposition
The expected value of $\bar{X} - \bar{Y}$ is $\mu_1 - \mu_2$ and the standard
deviation of $\bar{X} - \bar{Y}$ is
$$\sigma_{\bar{X} - \bar{Y}} = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$$
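The proposition can be checked in a few lines. The sketch below uses only the three basic assumptions, together with the standard facts that a sample mean has expectation equal to the population mean and variance equal to the population variance divided by the sample size:

```latex
\begin{align*}
E(\bar{X} - \bar{Y}) &= E(\bar{X}) - E(\bar{Y}) = \mu_1 - \mu_2, \\
V(\bar{X} - \bar{Y}) &= V(\bar{X}) + V(\bar{Y})
  = \frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}
  \quad \text{(the two samples are independent)}, \\
\sigma_{\bar{X} - \bar{Y}} &= \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}.
\end{align*}
```

The variances add even though the means are subtracted, because $V(-\bar{Y}) = V(\bar{Y})$.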
Samples from Normal Populations with Known Variances
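If the two samples $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ are from
normal populations, then we have
$$\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}\right)$$
Therefore,
$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}}$$
is a standard normal rv.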
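If the population variances are known to be $\sigma_1^2$ and $\sigma_2^2$, then the
two-sided confidence interval for the difference of the population
means $\mu_1 - \mu_2$ with confidence level $1 - \alpha$ is
$$\left( \bar{X} - \bar{Y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}},\ \ \bar{X} - \bar{Y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}} \right)$$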
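The interval above can be computed directly from summary statistics. The following is a minimal Python sketch, assuming the standard library's `statistics.NormalDist` for the critical value; the function name `two_sample_z_interval` and the numbers passed to it are hypothetical and serve only to exercise the formula.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_interval(xbar, ybar, sigma1, sigma2, m, n, alpha=0.05):
    """Two-sided 100(1 - alpha)% CI for mu1 - mu2 with known sigma1, sigma2."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    se = sqrt(sigma1**2 / m + sigma2**2 / n)      # sd of Xbar - Ybar
    diff = xbar - ybar
    return diff - z_crit * se, diff + z_crit * se


# Hypothetical summary statistics (not taken from the slides).
lower, upper = two_sample_z_interval(
    xbar=50.0, ybar=47.0, sigma1=3.0, sigma2=4.0, m=36, n=64
)
print(f"95% CI for mu1 - mu2: ({lower:.3f}, {upper:.3f})")
```

If the resulting interval excludes 0, the data are not consistent with $\mu_1 = \mu_2$ at level $\alpha$ in a two-tailed test.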
When the population variances are known, the procedure for
testing hypotheses about the difference of the population means
$\mu_1 - \mu_2$ is similar to the one-sample test for the population mean:
Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$
Test statistic value:
$$z = \frac{(\bar{X} - \bar{Y}) - \Delta_0}{\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}}$$
Alternative Hypothesis               Rejection Region for Level $\alpha$ Test
$H_a: \mu_1 - \mu_2 > \Delta_0$      $z \ge z_\alpha$ (upper-tailed)
$H_a: \mu_1 - \mu_2 < \Delta_0$      $z \le -z_\alpha$ (lower-tailed)
$H_a: \mu_1 - \mu_2 \ne \Delta_0$    $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$ (two-tailed)
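The rejection regions translate into a short decision routine. The following Python sketch assumes known standard deviations and uses `statistics.NormalDist` for the critical values; the function name, the `tail` labels, and the sample numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(xbar, ybar, sigma1, sigma2, m, n,
                      delta0=0.0, alpha=0.05, tail="two"):
    """Return (z, reject) for H0: mu1 - mu2 = delta0 with known sigmas."""
    z = (xbar - ybar - delta0) / sqrt(sigma1**2 / m + sigma2**2 / n)
    std_normal = NormalDist()
    if tail == "upper":    # Ha: mu1 - mu2 > delta0
        reject = z >= std_normal.inv_cdf(1 - alpha)
    elif tail == "lower":  # Ha: mu1 - mu2 < delta0
        reject = z <= -std_normal.inv_cdf(1 - alpha)
    elif tail == "two":    # Ha: mu1 - mu2 != delta0
        reject = abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'upper', 'lower', or 'two'")
    return z, reject


# Hypothetical summary statistics (not taken from the slides).
z, reject = two_sample_z_test(50.0, 47.0, 3.0, 4.0, 36, 64, tail="two")
print(f"z = {z:.3f}, reject H0: {reject}")
```

The same data can lead to different decisions under different alternatives, so the tail should be fixed by the research question before looking at the data.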
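The type II error probability when $\mu_1 - \mu_2 = \Delta'$ is calculated as in the one-sample case:
Alternative Hypothesis               Type II Error Probability $\beta(\Delta')$ for Level $\alpha$ Test
$H_a: \mu_1 - \mu_2 > \Delta_0$      $\Phi\left(z_\alpha + \frac{\Delta_0 - \Delta'}{\sigma}\right)$
$H_a: \mu_1 - \mu_2 < \Delta_0$      $1 - \Phi\left(-z_\alpha + \frac{\Delta_0 - \Delta'}{\sigma}\right)$
$H_a: \mu_1 - \mu_2 \ne \Delta_0$    $\Phi\left(z_{\alpha/2} + \frac{\Delta_0 - \Delta'}{\sigma}\right) - \Phi\left(-z_{\alpha/2} + \frac{\Delta_0 - \Delta'}{\sigma}\right)$
where $\sigma = \sigma_{\bar{X} - \bar{Y}} = \sqrt{(\sigma_1^2/m) + (\sigma_2^2/n)}$.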
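The type II error probabilities can be evaluated numerically. In the Python sketch below, `delta_prime` stands for the true value of $\mu_1 - \mu_2$ under the alternative (written $\Delta'$) and `delta0` for the null value $\Delta_0$; the function name and the example inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type2_error(delta_prime, delta0, sigma1, sigma2, m, n,
                alpha=0.05, tail="two"):
    """beta(delta_prime) for a level-alpha two-sample z test, known sigmas."""
    std_normal = NormalDist()
    sigma = sqrt(sigma1**2 / m + sigma2**2 / n)  # sd of Xbar - Ybar
    shift = (delta0 - delta_prime) / sigma
    if tail == "upper":
        return std_normal.cdf(std_normal.inv_cdf(1 - alpha) + shift)
    if tail == "lower":
        return 1 - std_normal.cdf(-std_normal.inv_cdf(1 - alpha) + shift)
    if tail == "two":
        z_half = std_normal.inv_cdf(1 - alpha / 2)
        return std_normal.cdf(z_half + shift) - std_normal.cdf(-z_half + shift)
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


# Hypothetical setting: sigma1 = 3, sigma2 = 4, m = 36, n = 64, H0: mu1 - mu2 = 0.
for true_diff in (0.5, 1.0, 2.0):
    beta = type2_error(true_diff, 0.0, 3.0, 4.0, 36, 64, tail="two")
    print(f"true difference {true_diff}: beta = {beta:.4f}")
```

As a sanity check, setting `delta_prime` equal to `delta0` returns $1 - \alpha$, and $\beta$ shrinks as the true difference moves away from $\Delta_0$.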
Example 9.1
Analysis of a random sample consisting of $m = 20$ specimens of
cold-rolled steel to determine yield strengths resulted in a sample
average strength of $\bar{x} = 29.8$ ksi. A second random sample of
$n = 25$ two-sided galvanized steel specimens gave a sample average
strength of $\bar{y} = 34.7$ ksi. Assuming that the two yield-strength
distributions are normal with $\sigma_1 = 4.0$ and $\sigma_2 = 5.0$, do the data
indicate that the corresponding true average yield strengths $\mu_1$
and $\mu_2$ are different?
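One way to work Example 9.1 is a two-tailed z test of $H_0: \mu_1 - \mu_2 = 0$ with the known standard deviations. The example does not state a significance level, so the value $\alpha = 0.01$ in this Python sketch is an assumption.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from Example 9.1.
m, xbar, sigma1 = 20, 29.8, 4.0   # cold-rolled steel
n, ybar, sigma2 = 25, 34.7, 5.0   # two-sided galvanized steel
delta0 = 0.0                      # H0: mu1 - mu2 = 0

alpha = 0.01                      # assumed; not given in the example

se = sqrt(sigma1**2 / m + sigma2**2 / n)
z = (xbar - ybar - delta0) / se
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(z) >= z_crit         # two-tailed rejection region

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

Here $z \approx -3.65$, which lies well beyond $\pm z_{.005} \approx \pm 2.58$, so under these assumptions the data indicate that the two true average yield strengths differ.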
Large Size Samples
When the sample sizes are large, both $\bar{X}$ and $\bar{Y}$ are approximately
normally distributed, i.e. approximately we have
$$\bar{X} \sim N\left(\mu_1,\ S_1^2/m\right), \qquad \bar{Y} \sim N\left(\mu_2,\ S_2^2/n\right)$$
Therefore, $\bar{X} - \bar{Y}$ is approximately normal with mean $\mu_1 - \mu_2$ and
variance $\frac{S_1^2}{m} + \frac{S_2^2}{n}$.
Furthermore,
$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{m} + \frac{S_2^2}{n}}}$$
is approximately a standard normal rv.
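The approximation can be checked by simulation. The Python sketch below draws both samples from an exponential population, which is clearly non-normal, and counts how often $|Z| \le z_{.025}$. The population, the sample sizes, the seed, and the number of replications are all hypothetical choices made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance

rng = random.Random(1)        # fixed seed so the run is repeatable
m, n, reps = 50, 50, 1000     # hypothetical sample sizes and replications
z_crit = NormalDist().inv_cdf(0.975)

inside = 0
for _ in range(reps):
    # Exponential(1) populations: mu1 = mu2 = 1, far from normal.
    x = [rng.expovariate(1.0) for _ in range(m)]
    y = [rng.expovariate(1.0) for _ in range(n)]
    # Z with mu1 - mu2 = 0 and sample variances in place of sigma^2.
    z = (mean(x) - mean(y)) / sqrt(variance(x) / m + variance(y) / n)
    inside += abs(z) <= z_crit

fraction_inside = inside / reps
print(f"fraction of |Z| <= {z_crit:.2f}: {fraction_inside:.3f}")
```

If $Z$ were exactly standard normal, the fraction would be 0.95 up to simulation noise; a value close to that supports using the z procedures for samples of this size.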
When both $m$ and $n$ are large ($m, n > 30$), the procedures for
constructing confidence intervals and testing hypotheses for the
difference of two population means are similar to the one-sample
case.
The two-sided confidence interval for the difference of the
population means $\mu_1 - \mu_2$ with approximate confidence level $1 - \alpha$ is
$$\left( \bar{X} - \bar{Y} - z_{\alpha/2}\sqrt{\frac{S_1^2}{m} + \frac{S_2^2}{n}},\ \ \bar{X} - \bar{Y} + z_{\alpha/2}\sqrt{\frac{S_1^2}{m} + \frac{S_2^2}{n}} \right)$$
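With raw data in hand, the large-sample interval needs only the two sample means and sample variances. The Python sketch below assumes `statistics.variance`, which uses the $n - 1$ divisor, for $S_1^2$ and $S_2^2$; the function name and the generated data are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def large_sample_interval(x, y, alpha=0.05):
    """Approximate two-sided CI for mu1 - mu2 when len(x), len(y) > 30."""
    m, n = len(x), len(y)
    se = sqrt(variance(x) / m + variance(y) / n)  # sqrt(S1^2/m + S2^2/n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = mean(x) - mean(y)
    return diff - z_crit * se, diff + z_crit * se


# Hypothetical data, generated only to exercise the function.
rng = random.Random(0)
x = [rng.gauss(10.0, 2.0) for _ in range(40)]
y = [rng.gauss(9.0, 3.0) for _ in range(50)]

lower, upper = large_sample_interval(x, y)
print(f"approximate 95% CI for mu1 - mu2: ({lower:.3f}, {upper:.3f})")
```

No normality assumption is made here; the justification is the large-sample approximation, so the stated confidence level is only approximate.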
When both $m$ and $n$ are large ($m, n > 30$), the procedure for
testing hypotheses about the difference of the population means
$\mu_1 - \mu_2$ is:
Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$
Test statistic value:
$$z = \frac{(\bar{X} - \bar{Y}) - \Delta_0}{\sqrt{\frac{S_1^2}{m} + \frac{S_2^2}{n}}}$$
Alternative Hypothesis               Rejection Region for Level $\alpha$ Test
$H_a: \mu_1 - \mu_2 > \Delta_0$      $z \ge z_\alpha$ (upper-tailed)
$H_a: \mu_1 - \mu_2 < \Delta_0$      $z \le -z_\alpha$ (lower-tailed)
$H_a: \mu_1 - \mu_2 \ne \Delta_0$    $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$ (two-tailed)
Example Problem 7
Are male college students more easily bored than their female
counterparts? This question was examined in the article “Boredom
in Young Adults – Gender and Cultural Comparisons” (J. of
Cross-Cultural Psych., 1991: 209-223). The authors administered
a scale called the Boredom Proneness Scale to 97 male and 148
female U.S. college students. Do the accompanying data support
the research hypothesis that the mean Boredom Proneness Rating
is higher for men than for women?
Gender    Sample Size    Sample Mean    Sample SD
Male      97             10.40          4.83
Female    148            9.26           4.68
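Both samples are large, so one way to approach Example Problem 7 is an upper-tailed large-sample z test of $H_0: \mu_1 - \mu_2 = 0$ against $H_a: \mu_1 - \mu_2 > 0$, with men as population 1. The Python sketch below takes the sample SDs as 4.83 and 4.68; the problem does not state a significance level, so $\alpha = 0.05$ is an assumption.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the table (population 1 = male, 2 = female).
m, xbar, s1 = 97, 10.40, 4.83
n, ybar, s2 = 148, 9.26, 4.68
delta0 = 0.0                  # H0: mu1 - mu2 = 0

alpha = 0.05                  # assumed; not given in the problem

z = (xbar - ybar - delta0) / sqrt(s1**2 / m + s2**2 / n)
z_crit = NormalDist().inv_cdf(1 - alpha)
reject = z >= z_crit          # upper-tailed rejection region

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

The statistic is $z \approx 1.83$, which exceeds $z_{.05} \approx 1.645$ but not $z_{.01} \approx 2.33$, so the conclusion depends on the level chosen: the research hypothesis is supported at level 0.05 but not at level 0.01.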