Uploaded by 蔡孟君

Hypothesis Testing: Z-Test and T-Test Lecture Notes

Review: One-Sample Test of Hypothesis (handwritten notes)
H0: µ = µ0
Ha: µ ≠ µ0 (two-tailed)
Test statistic: z, or t when σ is unknown.
p-value: the probability, when H0 is true, of observing sample data as extreme as (or more extreme than) the data actually observed.
– Two-tail test: p-value = 2 · P(Z > |z∗|)
– Lower-tail test: p-value = P(Z < z∗) or P(t < t∗)
– Upper-tail test: p-value = P(Z > z∗) or P(t > t∗)
Reject H0 if p-value ≤ α; otherwise do not reject H0.
Conclusion wording: "We have enough statistical evidence to reject H0 (to support Ha) at α."
Chapter 1
Two-Sample Tests of Hypothesis
1.1 Introduction
Previously we looked at techniques to estimate and test parameters for one population:
• Population Mean µ
(H0: µ = µ0 → t test (σ² unknown))
• Population Proportion p
(H0: p = p0 → z test)
We will still consider these parameters when we are looking at two populations. However, our interest will now be:
• Comparison of two means with two independent samples
• Comparison of two means with paired (dependent) samples
• Comparison of two proportions with two independent samples
The two samples, taken separately and independently, are referred to as independent simple random samples.
• Independent samples: samples are drawn at random from two independent populations in order to infer whether the parameters of the two populations differ significantly.
• Dependent (paired, matched) samples: a set of paired observations is drawn at random from two dependent populations in order to infer whether the parameters of the two dependent populations differ significantly.
Independent samples vs. dependent samples (handwritten note): for example, Sample 1 takes a class online and Sample 2 takes it in person; compare the final grades to see whether they differ.
                Population 1 / Sample 1    Population 2 / Sample 2
Parameters      µ1, σ1²                    µ2, σ2²
Statistics      x̄1, s1²                    x̄2, s2²
Sample size     n1                         n2
1.2 Comparing Two Population Means: Independent Samples
• Difference between Two Means:
Because we are comparing two population means, µ1 − µ2, we use the statistic x̄1 − x̄2.
– x̄1 − x̄2 is normally distributed if the original populations are normal, or approximately normal (by the CLT) if the populations are nonnormal and the sample sizes are large (n1, n2 > 30).
– The expected value of x̄1 − x̄2 is µ1 − µ2.
– The variance of x̄1 − x̄2 is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.
Note:
\[ E(\bar X_1 - \bar X_2) = E(\bar X_1) - E(\bar X_2) = \mu_1 - \mu_2 \]
\[ V(\bar X_1 - \bar X_2) = V(\bar X_1) + V(\bar X_2) = V\!\left(\frac{\sum X_{1i}}{n_1}\right) + V\!\left(\frac{\sum X_{2i}}{n_2}\right) = \frac{1}{n_1^2}\, n_1 V(X_1) + \frac{1}{n_2^2}\, n_2 V(X_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \]
\[ \bar X_1 - \bar X_2 \sim N\!\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right) \]
H0: µ1 − µ2 = 0 (i.e., H0: µ1 = µ2)
Two-tail test: H1: µ1 − µ2 ≠ 0 (i.e., µ1 ≠ µ2)
Lower-tail test: H1: µ1 − µ2 < 0 (i.e., µ1 < µ2)
Upper-tail test: H1: µ1 − µ2 > 0 (i.e., µ1 > µ2)
Point estimator of the difference between two population means: x̄1 − x̄2
Standard error of x̄1 − x̄2: the standard deviation of the sampling distribution of x̄1 − x̄2,
\[ \sigma_{\bar x_1 - \bar x_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
The point estimator has a standard error that describes the variation in its sampling distribution.
1.2.1 Equal, Known Population Variances
• No assumptions about the shape of the populations are required when the samples are large.
(Note: when n1 ≤ 30 and n2 ≤ 30, we must assume that the two samples come from two independent normal populations.)
• The samples are from independent populations.
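• The formula for computing the value of z, if σ1 and σ2 are known, is:
\[ z^* = \frac{\bar x_1 - \bar x_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \]
This is the statistic under H0: µ1 = µ2, so the hypothesized difference µ1 − µ2 = 0 drops out of the numerator.
Interval estimator: point estimator ± ME (margin of error)
\[ 100(1-\alpha)\%\ \text{CI}: \quad (\bar X_1 - \bar X_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
The interval follows from
\[ P\!\left(-z_{\alpha/2} < \frac{\bar X_1 - \bar X_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} < z_{\alpha/2}\right) = 1 - \alpha \]
\[ \Rightarrow P\!\left(\bar X_1 - \bar X_2 - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < \bar X_1 - \bar X_2 + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right) = 1 - \alpha \]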
Example 1 A study using two random samples of 35 people each found that the
average amount of time those in the age group of 26-35 years spent per week on
leisure activities was 39.6 hours, and those in the age group of 46-55 years spent
35.4 hours. Assume that the population standard deviation for those in the first age
group found by previous studies is 6.3 hours, and the population standard deviation
of those in the second group found by previous studies was 5.8 hours. At α = 0.05,
can it be concluded that there is a significant difference in the average time each
group spends on leisure activities?
H0: µ1 − µ2 = 0
Ha: µ1 − µ2 ≠ 0
\[ z^* = \frac{39.6 - 35.4}{\sqrt{\frac{6.3^2}{35} + \frac{5.8^2}{35}}} = 2.9 \]
p-value = 2 · P(z > 2.9) = 2 × (0.5 − 0.4981) = 0.0038
Reject H0 if |z∗| > 1.96 (that is, z at α/2 = 0.025).
Reject H0. There is a significant difference between the two population means at α = 0.05.
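The calculation in Example 1 can be checked with a short script. This is a minimal sketch using only the Python standard library; the function names `normal_cdf` and `two_sample_z` are illustrative and do not come from the notes.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, diff0=0.0):
    """z statistic and two-tailed p-value for H0: mu1 - mu2 = diff0,
    assuming both population standard deviations are known."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # standard error of x1bar - x2bar
    z = (mean1 - mean2 - diff0) / se
    p_two_tailed = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_two_tailed

# Example 1: leisure time of the 26-35 and 46-55 age groups
z, p = two_sample_z(39.6, 35.4, 6.3, 5.8, 35, 35)
print(f"z* = {z:.2f}, two-tailed p-value = {p:.4f}")
print("Reject H0" if p <= 0.05 else "Do not reject H0")
```

The p-value printed here comes from the exact normal CDF, so it can differ in the last digit from the table-based value in the example.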
Example 2 A researcher hypothesizes that the average number of sports that colleges offer for males is greater than the average number of sports that colleges offer for females. A sample of the number of sports offered by colleges is shown. At α = 0.10, is there enough evidence to support the claim? Assume σ_male = σ_female = 3.3, with x̄_male = 8.6, x̄_female = 7.9, n_male = 10, n_female = 15.
H0: µM − µF ≤ 0
Ha: µM − µF > 0 (or µM > µF)
\[ z^* = \frac{8.6 - 7.9}{\sqrt{\frac{3.3^2}{10} + \frac{3.3^2}{15}}} = 0.52 \]
p-value = P(z > 0.52) = 0.5 − 0.1985 = 0.3015
Reject H0 if z∗ > 1.28 (z at α = 0.10 is approximately 1.28).
Do not reject H0. We do not have enough statistical evidence to support the claim at α = 0.1.
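1.2.2 Unknown Population Variances
The t distribution is used for the test statistic if one or more of the samples have fewer than 30 observations. The required assumptions are:
• Both populations must follow the normal distribution.
• The samples are from independent populations.
• The populations must have equal standard deviations (for the equal-variances case).
There are two cases to consider:
- The unknown population variances are equal.
- The unknown population variances are not equal.
Whether the variances can be assumed equal can be tested formally (F test, ch. 2). A simple rule of thumb: if the ratio of the larger variance to the smaller variance is ≤ 3, the variances can be assumed equal.
• Equal Variances: σ1² = σ2² = σ²
– Comparing Population Means (The Pooled t-test):
\[ \bar X_1 - \bar X_2 \sim N\!\left(\mu_1 - \mu_2,\ \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right) \]
Finding the value of the test statistic requires two steps:
Step 1. Pool the sample variances.
\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]
♦ Idea: if the variances are the same, we can pool the data from both samples to produce a pooled variance estimator. It is a weighted average of s1² and s2² whose weights account for the different sample sizes.
Step 2. Use the pooled variance to compute the t-statistic.
\[ t^* = \frac{(\bar x_1 - \bar x_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad df = n_1 + n_2 - 2 \]
– CI for µ1 − µ2:
The confidence interval estimator for µ1 − µ2 when the population variances are equal is given by:
\[ (\bar x_1 - \bar x_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad df = n_1 + n_2 - 2 \]
For a two-tail test, the CI can be used to draw the conclusion.
Derivation. We start from
\[ \bar X_1 - \bar X_2 \sim N\!\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right), \qquad z = \frac{\bar X_1 - \bar X_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \]
Now σ is unknown, so we switch to a t test and replace σ with s:
\[ t = \frac{\bar X_1 - \bar X_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
If we also know that σ1² = σ2² = σ², we estimate σ² by s_p², so that
\[ \bar X_1 - \bar X_2 \sim N\!\left(\mu_1 - \mu_2,\ \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right) \]
s_p² is obtained as a weighted average that takes the different sample sizes into account:
\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \quad\Rightarrow\quad t = \frac{\bar X_1 - \bar X_2 - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
100(1 − α)% CI:
\[ P\!\left(-t_{\alpha/2}(n_1 + n_2 - 2) < \frac{\bar X_1 - \bar X_2 - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} < t_{\alpha/2}(n_1 + n_2 - 2)\right) = 1 - \alpha \]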
• Unequal Variances: σ1² ≠ σ2²
– Comparing Population Means:
\[ \bar X_1 - \bar X_2 \sim N\!\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right) \]
Use the following formula for the t-statistic if it is not reasonable to assume that the population standard deviations are equal. (Assumption: the two samples are drawn at random from two independent normal populations.)
\[ t^* = \frac{(\bar x_1 - \bar x_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
The degrees of freedom are adjusted downward by a rather complex approximation formula. The effect is to reduce the number of degrees of freedom in the test, which requires a larger value of the test statistic to reject the null hypothesis.
\[ df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} \]
– CI for µ1 − µ2:
\[ (\bar x_1 - \bar x_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
Example 1 A financial planner wants to compare the yield of income and growth
mutual funds. Fifty thousand dollars is invested in each of a sample of 35 income
and 40 growth funds. The mean increase for a two-year period for the income funds
is $900. For the growth funds, the mean increase is $875. Income funds have a
sample standard deviation of $35; growth funds have a sample standard deviation
of $45. Assume that the population standard deviations are equal. At the 0.05
significance level, is there a difference in the mean yields of the two funds?
H0: µ1 − µ2 = 0
Ha: µ1 − µ2 ≠ 0
Reject H0 if |t∗| > 1.993 (df = 35 + 40 − 2 = 73)
\[ s_p^2 = \frac{(35 - 1)\,35^2 + (40 - 1)\,45^2}{35 + 40 - 2} = 1652.397 \]
\[ t^* = \frac{900 - 875}{\sqrt{s_p^2\left(\frac{1}{35} + \frac{1}{40}\right)}} = 2.66 \]
Reject H0. 0.001 < p-value < 0.01
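The two steps of the pooled t-test in this example can be sketched in code. The function name `pooled_t` is illustrative, and the critical value 1.993 is copied from the example rather than computed, because the standard library has no t distribution.

```python
import math

def pooled_t(mean1, mean2, s1, s2, n1, n2, diff0=0.0):
    """Pooled-variance t statistic for H0: mu1 - mu2 = diff0,
    assuming equal (unknown) population variances."""
    df = n1 + n2 - 2
    # Step 1: pool the sample variances, weighted by their degrees of freedom
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    # Step 2: compute the t statistic with the pooled variance
    se = math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    t = (mean1 - mean2 - diff0) / se
    return sp2, t, df

# Income funds vs. growth funds
sp2, t, df = pooled_t(900, 875, 35, 45, 35, 40)
t_crit = 1.993  # two-tailed critical value at alpha = 0.05, taken from the example
reject = abs(t) > t_crit
print(f"sp^2 = {sp2:.3f}, t* = {t:.2f}, df = {df}, reject H0: {reject}")
```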
Example 2 Two random samples of 40 students were drawn independently from two populations of students. Assume that the scores on their statistics test (total points = 100) are normally distributed. The following statistics regarding their scores were obtained: x̄1 = 76, s1 = 8, x̄2 = 72, s2 = 6.5. Assume the variances are not equal.
a. Test at the 5% significance level to determine whether we can infer that the
two population means differ.
b. Estimate with 95% confidence the difference between the two population
means.
c. Explain how to use the 95% confidence interval to test the hypotheses at
α = .05.
H0: µ1 − µ2 = 0
Ha: µ1 − µ2 ≠ 0
Reject H0 if |t∗| > 1.992
(a)
\[ t^* = \frac{76 - 72}{\sqrt{\frac{8^2}{40} + \frac{6.5^2}{40}}} = 2.45, \qquad df = 74.86 \approx 75 \]
⇒ Reject H0
(b)
\[ (76 - 72) \pm 1.992\sqrt{\frac{8^2}{40} + \frac{6.5^2}{40}} = [0.75, 7.25] \]
(c) 0 is not included in the interval ⇒ Reject H0
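The unequal-variances statistic, its approximate degrees of freedom, and the interval in part (b) can be reproduced as follows. This is a sketch: `welch_t` is an illustrative name, and the critical value 1.992 is taken from the example rather than computed.

```python
import math

def welch_t(mean1, mean2, s1, s2, n1, n2, diff0=0.0):
    """Unequal-variances t statistic with the approximate degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2   # per-sample variance of the mean
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2 - diff0) / se
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df, se

t, df, se = welch_t(76, 72, 8, 6.5, 40, 40)
t_crit = 1.992  # two-tailed critical value at alpha = 0.05, taken from the example
ci = ((76 - 72) - t_crit * se, (76 - 72) + t_crit * se)
print(f"t* = {t:.2f}, df = {df:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
print("Reject H0" if not (ci[0] <= 0 <= ci[1]) else "Do not reject H0")
```

The last line applies part (c): H0 is rejected at α = 0.05 exactly when 0 lies outside the 95% interval.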
! Which test statistic do we use? Equal variance or unequal variance?
Whenever there is insufficient evidence that the variances are unequal, it is preferable to perform the equal-variances t test.
Larger df has the same effect as a larger sample size, so prefer the test with the larger df (equal-variances df ≥ unequal-variances df).
If n1 and n2 are quite different, use the unequal-variances t test.
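1.3 Comparing Two Population Means: Paired/Matched Samples
Dependent samples are samples that are paired, coupled, or related in some fashion. (Assumptions: 1. The paired samples are drawn at random from the paired populations. 2. The differences between the paired populations are normally distributed.)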
For example:
- If you wished to buy a car you would look at the same car at two (or more)
different dealerships and compare the prices.
- Decide on the basis of salaries “before and after” receiving an MBA whether
that degree contributes to financial well-being.
• Comparing Population Means: µd = µ1 − µ2
H0: µd = 0
Two-tail test: Ha: µd ≠ 0
Upper-tail test: Ha: µd > 0
Lower-tail test: Ha: µd < 0
! The idea is simply to look at the pairwise differences of the observed data. (Just like the one-sample t-test.)
\[ t^* = \frac{\bar d}{s_d/\sqrt{n_d}}, \qquad df = n_d - 1 \]
\[ \text{CI}: \quad \bar d \pm t_{\alpha/2}\,\frac{s_d}{\sqrt{n_d}} \]
Example 1 A marketing consultant was in the process of studying the perceptions
of married couples concerning their monthly clothing expenditures. He believed
that the husband’s perception would be higher than the wife’s. To judge his belief,
he takes a random sample of ten married couples and asks each spouse to estimate
the family clothing expenditure (in dollars) during the previous month. The data
are shown below.
a. Can the consultant conclude at the 5% significance level that the husband’s
estimate is higher than the wife’s estimate?
b. Estimate with 95 % confidence the population mean difference. Briefly describe what the interval estimate tells you.
Couple   Husband   Wife     d
1        380       270    110
2        280       300    -20
3        215       185     30
4        350       320     30
5        210       180     30
6        410       390     20
7        250       250      0
8        360       320     40
9        180       170     10
10       400       330     70
(a)
d: sum = 320, mean = 32, s = 36.45, df = 9
H0: µd ≤ 0
Ha: µd > 0
Reject H0 if t∗ > 1.833
\[ t^* = \frac{32}{36.45/\sqrt{10}} = 2.78 \]
⇒ Reject H0. Yes, at α = 0.05 we have enough evidence to conclude that the husband's estimate is higher than the wife's.
(b)
\[ 32 \pm 2.262 \times \frac{36.45}{\sqrt{10}} = [5.92, 58.07] \]
We estimate that the husband's estimate of the expenditure is, on average, between $5.92 and $58.07 higher than the wife's.
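Because the paired test is a one-sample t-test on the differences, the figures above can be recomputed directly from the data table. This is a minimal sketch using the standard library; `paired_t` is an illustrative name, and the critical value 1.833 is the one quoted in part (a).

```python
import math
import statistics

husband = [380, 280, 215, 350, 210, 410, 250, 360, 180, 400]
wife = [270, 300, 185, 320, 180, 390, 250, 320, 170, 330]

def paired_t(x, y):
    """Paired t statistic for H0: mu_d = 0, where d = x - y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)          # sample standard deviation (n - 1 divisor)
    t = d_bar / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1

d_bar, s_d, t, df = paired_t(husband, wife)
t_crit = 1.833  # upper-tail critical value at alpha = 0.05, df = 9, from the example
print(f"d_bar = {d_bar}, s_d = {s_d:.2f}, t* = {t:.2f}, df = {df}")
print("Reject H0" if t > t_crit else "Do not reject H0")
```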
Example 2 A researcher has performed the following experiment. For each of 10 sets of identical twins who were born 30 years ago, he recorded their annual incomes according to which twin was born first. The results (in $1,000s) are shown below. Can he infer at the 5% significance level that there is a difference in income between the twins?
Twin Set   First Born   Second Born
1          32           44
2          36           43
3          21           28
4          30           39
5          49           51
6          27           25
7          39           32
8          38           42
9          56           64
10         44           44
H0: µd = 0
Ha: µd ≠ 0
With d = first born − second born: d̄ = −4, sd = 5.77, df = 9
\[ t^* = \frac{-4}{5.77/\sqrt{10}} = -2.19 \]
Reject H0 if |t∗| > 2.262 ⇒ Do not reject H0.
! How do we differentiate between dependent and independent samples?
• Dependent samples are characterized by a measurement followed by an intervention of some kind and then another measurement. This could be called a “before” and “after” study.
• Dependent samples are characterized by matching or pairing observations.
! Why do we prefer dependent samples to independent samples?
• By using dependent samples, we are able to reduce the variation in the sampling distribution.
1.4 Comparing Two Population Proportions
To draw inferences about the parameter p1 − p2, we take samples from the populations, calculate the sample proportions, and look at their difference. p̂1 − p̂2 is an unbiased estimator for p1 − p2. (An estimator θ̂ is an unbiased estimator of θ if E(θ̂) = θ.)
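\[ E(\hat p_1 - \hat p_2) = p_1 - p_2, \qquad \mathrm{Var}(\hat p_1 - \hat p_2) = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2} \]
[proof]
\[ X_1 \sim \mathrm{Bin}(n_1, p_1), \quad \hat p_1 = \frac{x_1}{n_1}, \quad E(\hat p_1) = p_1, \quad V(\hat p_1) = \frac{p_1(1 - p_1)}{n_1} \]
\[ X_2 \sim \mathrm{Bin}(n_2, p_2), \quad \hat p_2 = \frac{x_2}{n_2}, \quad E(\hat p_2) = p_2, \quad V(\hat p_2) = \frac{p_2(1 - p_2)}{n_2} \]
since, with q1 = 1 − p1,
\[ E(\hat p_1) = E\!\left(\frac{x_1}{n_1}\right) = \frac{1}{n_1}E(x_1) = \frac{n_1 p_1}{n_1} = p_1 \]
\[ V(\hat p_1) = \frac{1}{n_1^2}V(x_1) = \frac{1}{n_1^2}\, n_1 p_1 q_1 = \frac{p_1 q_1}{n_1} \]
Therefore
\[ E(\hat p_1 - \hat p_2) = E(\hat p_1) - E(\hat p_2) = p_1 - p_2 \]
\[ V(\hat p_1 - \hat p_2) = V(\hat p_1) + V(\hat p_2) = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2} \]
\[ \hat p_1 - \hat p_2 \sim N\!\left(p_1 - p_2,\ \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\right) \]
The statistic p̂1 − p̂2 is approximately normally distributed if the sample sizes are large enough that n1p̂1, n2p̂2, n1(1 − p̂1), and n2(1 − p̂2) are all ≥ 5 (large samples).
\[ z^* = \frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}} \]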
However, the standard error of p̂1 − p̂2 is unknown because it depends on p1 and p2. Thus, we have two different estimators for the standard error, depending on the null hypothesis.
• Case 1: H0: p1 − p2 = 0
Under H0, p1 = p2 = p, so we use the pooled estimate of p:
\[ z^* = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
Pooled proportion (a weighted average of p̂1 and p̂2):
\[ \hat p = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1\hat p_1 + n_2\hat p_2}{n_1 + n_2} \]
• Case 2: H0: p1 − p2 = D (D ≠ 0)
\[ z^* = \frac{(\hat p_1 - \hat p_2) - D}{\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}}} \]
Confidence Interval Estimator:
The confidence interval estimator for p1 − p2 is given by:
\[ (\hat p_1 - \hat p_2) \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}} \]
As you may suspect, it is valid when n1p̂1, n2p̂2, n1(1 − p̂1), and n2(1 − p̂2) are all ≥ 5.
Example 1 A statistician wanted to determine if efforts to promote safety have
been successful. By checking the records of 250 workers, he found that 30 of
them suffered either minor or major injuries that year. A random sample of 400
workers last year revealed that 80 suffered some form of injury.
a. Can the statistician infer at the 5% significance level that efforts to promote
safety have been successful?
b. Estimate with 95% confidence the difference in population proportions.
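Let population 1 be this year and population 2 be last year.
(a)
H0: p1 − p2 ≥ 0
Ha: p1 − p2 < 0
p̂1 = 30/250 = 0.12, p̂2 = 80/400 = 0.20, p̂ = (30 + 80)/(250 + 400) = 0.169
n1p̂1, n2p̂2, n1q̂1, n2q̂2 ≥ 5
\[ z^* = \frac{\frac{30}{250} - \frac{80}{400}}{\sqrt{0.169(1 - 0.169)\left(\frac{1}{250} + \frac{1}{400}\right)}} = -2.65 \]
Reject H0 if z∗ < −1.645 ⇒ Reject H0. Yes, the efforts to promote safety have been successful.
(b)
\[ -0.08 \pm 1.96\sqrt{\frac{\frac{30}{250}\left(1 - \frac{30}{250}\right)}{250} + \frac{\frac{80}{400}\left(1 - \frac{80}{400}\right)}{400}} = [-0.1362, -0.0238] \]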
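The two standard errors used in this example (pooled for the test of H0: p1 − p2 = 0, unpooled for the confidence interval) can be sketched as below. The function names are illustrative, and the critical values −1.645 and 1.96 are the ones quoted in the example.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled proportion (Case 1)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se, p_pool

def two_proportion_ci(x1, n1, x2, n2, z_crit):
    """Confidence interval for p1 - p2, using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se

# This year: 30 injuries among 250 workers; last year: 80 among 400
z, p_pool = two_proportion_z(30, 250, 80, 400)
lo, hi = two_proportion_ci(30, 250, 80, 400, z_crit=1.96)
print(f"pooled p = {p_pool:.3f}, z* = {z:.2f}")
print("Reject H0" if z < -1.645 else "Do not reject H0")
print(f"95% CI = [{lo:.4f}, {hi:.4f}]")
```

The script keeps the pooled proportion unrounded, so z∗ can differ slightly in the third decimal from a hand calculation that uses 0.169.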
Example 2 In the nursing home study mentioned in the News Today, the researchers found that 12 out of 34 small nursing homes had a resident vaccination
rate of less than 80%, while 17 out of 24 large nursing homes had a vaccination
rate of less than 80%. At α = 0.05, test the claim that there is no difference in the
proportions of the small and large nursing homes with a resident vaccination rate
of less than 80%.
H0: p1 − p2 = 0
Ha: p1 − p2 ≠ 0
p̂1 = 12/34 = 0.35, p̂2 = 17/24 = 0.71, p̂ = (12 + 17)/(34 + 24) = 0.5
n1p̂1, n2p̂2, n1q̂1, n2q̂2 ≥ 5
\[ z^* = \frac{0.35 - 0.71}{\sqrt{0.5 \times 0.5 \times \left(\frac{1}{34} + \frac{1}{24}\right)}} = -2.7 \]
Reject H0 if |z∗| > 1.96
⇒ Reject H0.