Hypothesis Testing in Mathematical Statistics

MATHEMATICAL STATISTICS 214
CHAPTER 10
Hypothesis Testing
Introduction
Aim
+ Draw inferences about an unknown parameter from an estimate obtained
from a random sample of the population.
Approach
+ Posit a hypothesis about the “behaviour” or “value” of a parameter.
+ Investigate the validity of the hypothesis using the evidence
available in a random sample from the population of interest.
+ Identify and construct an appropriate function of the sample
(a test statistic) that ensures that a valid decision is reached.
+ Quantify the probability of making an incorrect decision.
Elements of Statistical Tests
Question
+ What is the population mean (µ) height of the Khoisan?
Hypothesis
+ On average, the Khoisan are 50cm tall. That is, µ = 50.
Data
+ Collected a random sample of 20 Khoisan.
Evidence
+ Estimate of average height from the sample is 25cm. That is, µ̂ = x̄ = 25.
Statistical Analyses
+ Case I: The difference between the hypothesised value (µ = 50) and
the sample estimate (x̄ = 25) is negligible and/or due only to random
sampling error. In other words, the data do not contain enough
evidence for the claim to be rejected.
+ Case II: The difference between the hypothesised value and the
statistic reflects a true discrepancy between the values. Therefore,
the claim is rejected based on evidence from the sample.
[Figure: sampling density f(X) centred at the hypothesised µ = 50, with the observed x̄ = 25 marked on the X axis.]
Statistical tests of hypotheses comprise the following essential
elements.
(a.) Null Hypothesis (H0):
  - Assumption: associated with an “equal” (=) sign.
(b.) Alternative Hypothesis (H1):
  - Assumption: associated with one of the “not-equal” (≠), “less-than” (<)
    or “greater-than” (>) signs.
(c.) Significance Level (α): the probability of a Type I Error, P(Type I Error).
  - $z_\alpha$: threshold implied by the significance level. Also referred to as
    the “critical value”.
(d.) Test Statistic: measures the magnitude of the evidence present in the data.
(e.) Rejection Region (RR): the set of values beyond the critical value; the
area under the sampling distribution over this region equals α.
(f.) Conclusion: statement about inference from the statistical test.
[Figure: standard normal density with two rejection regions, each of area α/2, beyond $-z_{\alpha/2}$ and $z_{\alpha/2}$.]
+ Ideally, the null hypothesis (H0) and the alternative hypothesis
(H1) should be stated such that they are mutually exclusive and
collectively exhaustive.
  - Note: H0 can only be expressed in terms of one of the “=”, “≤”
    or “≥” signs.
+ The null hypothesis is stated in terms of the “status quo”.
  - Note: NEVER “accept” H0. The only possible decisions are to reject H0
    or to fail to reject H0.
+ The research aim is usually to show support for H1.
+ Rejection Region (RR): comprises the set of values of the test
statistic for which H0 will be rejected.
For any fixed rejection region, two types of errors can be made
when a conclusion is reached.
+ Type I Error: P(reject H0 | H0 is true).
  - Occurs with a probability equal to α.
  - It is the significance level of the test.
+ Type II Error: P(do not reject H0 | H0 is false).
  - Occurs with a probability equal to β.
  - Power of the test is calculated as (1 − β).
Decision           | Actual situation: H0 is True          | Actual situation: H0 is False
Do not reject H0   | Correct decision; Confidence = 1 − α  | Type II Error; P(Type II Error) = β
Reject H0          | Type I Error; P(Type I Error) = α     | Correct decision; Power = 1 − β
+ For a fixed sample size, an inverse relationship exists between α and β:
lowering one raises the other.
+ Increasing the sample size allows both α and β to be reduced; for a
fixed α, a larger sample lowers β.
+ The value of β depends on the true value of the parameter.
  - When the difference between the true and hypothesised values
    of the parameter increases, β decreases, and vice versa.
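The relationships among α, β, sample size and the true parameter value can be checked numerically. The following is a minimal sketch for a lower-tail z-test of a mean; the function name `beta_lower_tail` and all numeric settings (µ0 = 50, µtrue = 48, σ = 5) are hypothetical values chosen only for illustration, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

def beta_lower_tail(mu0, mu_true, sigma, n, alpha):
    """P(Type II Error) for the z-test of H0: mu >= mu0 vs H1: mu < mu0."""
    z = NormalDist()                           # standard normal distribution
    se = sigma / sqrt(n)                       # standard error of the sample mean
    x_crit = mu0 - z.inv_cdf(1 - alpha) * se   # H0 is rejected when x-bar < x_crit
    # beta = P(X-bar >= x_crit | mu = mu_true), i.e. failing to reject a false H0
    return 1 - z.cdf((x_crit - mu_true) / se)

# Hypothetical settings, for illustration only.
for n in (10, 25, 100):
    print(n, round(beta_lower_tail(50, 48, 5.0, n, 0.05), 4))
```

Under these assumptions, β shrinks as n grows, as the true mean moves further from µ0, or as α is allowed to increase, and it grows with σ.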
[Figure: sampling distributions centred at $\mu_{H_0}$ and $\mu_{true}$ with the areas α and β shaded. Panels: Benchmark; as $|\mu_{true} - \mu_{H_0}|$ decreases, β increases; as α decreases, β increases; as σ² increases, β increases.]
Example
A manufacturer of automatic washers offers a model in one of two
colors: A or B. A random sample of 20 customers, each of whom
purchased one of the two washers, was observed. Assume that it
was decided a priori that a washer will be considered inferior if no
more than 5 of the sampled customers bought it.
(a.) Calculate α = P(Type I Error) with respect to the null
hypothesis that both washers perform similarly against the
alternative that Washer A is inferior.
(b.) If the true population proportion of customers that prefer
Washer A is 0.40, calculate β = P(Type II Error).
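Solution
π = population proportion of customers that prefer Washer A.
X = number of sampled customers that purchase Washer A.
H0: π = 0.50
H1: π < 0.50
Washer A is declared inferior (H0 is rejected) when X ≤ 5.
(a.) Under H0, X ∼ Binomial(n = 20, π = 0.50):
$$\alpha = P(X \le 5 \mid n = 20, \pi = 0.50) = \sum_{i=0}^{5} \binom{20}{i}\, 0.50^{i} \times (1 - 0.50)^{20-i} \approx 0.021.$$
(b.) When π = 0.40, X ∼ Binomial(n = 20, π = 0.40):
$$\beta = P(X > 5 \mid n = 20, \pi = 0.40) = 1 - \sum_{j=0}^{5} \binom{20}{j}\, 0.40^{j} \times (1 - 0.40)^{20-j} \approx 1 - 0.126 = 0.874.$$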
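The two binomial probabilities in the washer example can be verified directly. This is a small sketch using only the Python standard library; the helper name `binom_cdf` is an assumption introduced here for illustration.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summing the pmf from 0 to k."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# (a.) Type I error: reject H0 (X <= 5) although pi = 0.50
alpha = binom_cdf(5, 20, 0.50)
# (b.) Type II error: fail to reject H0 (X > 5) although pi = 0.40
beta = 1 - binom_cdf(5, 20, 0.40)

print(round(alpha, 3), round(beta, 3))
```

Note that the sums start at 0, because X = 0 is also part of the rejection region.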
One Sample Tests
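Large Sample Tests: µ
$\bar{X} \sim N(\mu, \sigma_{\bar{x}}^2)$
+ Null Hypothesis H0: $\mu = \mu_0$
+ Alternative Hypothesis H1:
  - $\mu < \mu_0$ (lower-tail)
  - $\mu \neq \mu_0$ (two-tail)
  - $\mu > \mu_0$ (upper-tail)
+ Significance Level: P(Type I Error) = α
+ Test Statistic:
  $$Z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}, \qquad Z \sim N(0, 1)$$
+ Rejection Region (RR):
  - $\{Z < -z_{\alpha}\}$ (lower-tail)
  - $\{|Z| > z_{\alpha/2}\}$ (two-tail)
  - $\{Z > z_{\alpha}\}$ (upper-tail)
Note: the population variance (σ²) is known, or the sample size is large enough (n ≥ 30) that
it may be estimated accurately.
[Figure: standard normal densities showing the rejection regions. Two-Tail: area α/2 beyond each of $-z_{\alpha/2}$ and $z_{\alpha/2}$. Lower-Tail: area α below $-z_{\alpha}$. Upper-Tail: area α above $z_{\alpha}$.]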
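Large Sample Tests: π
$p \sim N(\pi, \sigma_{p}^2)$
+ Null Hypothesis H0: $\pi = \pi_0$
+ Alternative Hypothesis H1:
  - $\pi < \pi_0$ (lower-tail)
  - $\pi \neq \pi_0$ (two-tail)
  - $\pi > \pi_0$ (upper-tail)
+ Significance Level: P(Type I Error) = α
+ Test Statistic:
  $$Z = \frac{p - \pi_0}{\sigma_p} = \frac{p - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}}, \qquad Z \sim N(0, 1)$$
+ Rejection Region (RR):
  - $\{Z < -z_{\alpha}\}$ (lower-tail)
  - $\{|Z| > z_{\alpha/2}\}$ (two-tail)
  - $\{Z > z_{\alpha}\}$ (upper-tail)
Note: the normal approximation requires $n \times \pi > 5$ and $n \times (1 - \pi) > 5$.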
Large Sample Tests: Example
The output voltage of an electric circuit is specified by the
manufacturer as 130. A sample of 25 independent readings on the
voltage for this circuit produced an average of 128.6. Suppose that it
is known that the standard deviation of output voltage is 4.0. If
output voltage is assumed to be normally distributed, at the 5%
significance level,
(a.) verify the claim made by the manufacturer.
(b.) A competing company claims that the output voltage from the
manufacturer’s batteries is greater than what is advertised and that it
damages appliances. Is the competitor’s claim valid?
Solution:
X = output voltage
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) = N\left(\mu, \frac{4^2}{25}\right)$
Significance Level = P(Type I Error) = α = 5% = 0.05
Large Sample Tests: Solution Cont’d
(a.) H0: µ = 130 vs. H1: µ ≠ 130
$$z_{stat} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{128.6 - 130}{4/\sqrt{25}} = -1.75$$
$z_{crit} = z_{\alpha/2} = z_{0.025} = 1.96$
Conclusion: $|z_{stat}| = 1.75 < 1.96 = z_{crit}$.
Therefore, fail to reject the null hypothesis. There is insufficient evidence in the
observed data to be able to claim that the output voltage is different from 130V.
(b.) H0: µ ≥ 130 vs. H1: µ < 130
$$z_{stat} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{128.6 - 130}{4/\sqrt{25}} = -1.75$$
$z_{crit} = -z_{\alpha} = -z_{0.05} = -1.645$
Conclusion: $z_{stat} = -1.75 < -1.645 = z_{crit}$.
Therefore, reject the null hypothesis. The output voltage is statistically significantly
not greater than (in fact, less than) the advertised 130V, so the competitor’s claim is not supported.
[Figure: standard normal densities with $z_{stat} \approx -1.75$ marked. Left panel (two-tail): critical values −1.96 and 1.96. Right panel (lower-tail): critical value −1.645.]
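The two z-tests of the voltage example can be reproduced with the standard library. The sketch below uses `statistics.NormalDist` for the normal quantiles; the variable names are assumptions introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()                                   # standard normal distribution
x_bar, mu0, sigma, n, alpha = 128.6, 130.0, 4.0, 25, 0.05

# Test statistic shared by parts (a.) and (b.)
z_stat = (x_bar - mu0) / (sigma / sqrt(n))

# (a.) two-tail test: reject H0 when |z_stat| > z_{alpha/2}
z_crit_two = z.inv_cdf(1 - alpha / 2)
reject_two_tail = abs(z_stat) > z_crit_two

# (b.) lower-tail test: reject H0 when z_stat < -z_{alpha}
z_crit_lower = -z.inv_cdf(1 - alpha)
reject_lower_tail = z_stat < z_crit_lower

print(round(z_stat, 2), round(z_crit_two, 2), reject_two_tail)
print(round(z_stat, 2), round(z_crit_lower, 3), reject_lower_tail)
```

The same statistic leads to different decisions because the lower-tail test places the whole of α in one tail, which moves the critical value closer to zero.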
Type II Error: Not reject false H0
β = P(Type II Error) = P(Not reject H0 | False H0)
Example:
The output voltage of an electric circuit is specified by the
manufacturer as 130. An officer from the consumer protection unit
intends to verify the claim of the manufacturer against that of some
dissatisfied consumers that the output voltage is less than what was
advertised. A sample of 25 independent readings on the voltage for
this circuit was collected for the assessment. Suppose that it is
public knowledge that output voltage is normally distributed with a
standard deviation of 4.0. What is the value of β if the true mean
output voltage is 128? Use a 5% significance level.
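Type II Error: Solution
X = output voltage
$\bar{X} \sim N\left(\mu, \frac{4^2}{25}\right)$
H0: µ ≥ 130 vs. H1: µ < 130
$$\alpha = 0.05 = P(Z < z_{crit} \mid \mu = 130) = P(Z < -z_{0.05} \mid \mu = 130) = P(Z < -1.645 \mid \mu = 130) = P\left(Z < \frac{\bar{x}_{crit} - 130}{4/\sqrt{25}}\right)$$
Therefore,
$$\frac{\bar{x}_{crit} - 130}{4/\sqrt{25}} = -1.645 \implies \bar{x}_{crit} = 130 - 1.645 \times 0.80 \approx 128.684$$
So that,
$$\beta = P(\bar{X} > \bar{x}_{crit} \mid \mu = 128) = P\left(Z > \frac{128.684 - 128}{4/\sqrt{25}}\right) = P(Z > 0.855) \approx 0.1963$$
[Figure: NULL distribution centred at 130 and TRUE distribution centred at 128, with α shaded below $\bar{x}_{crit}$ under NULL and β shaded above $\bar{x}_{crit}$ under TRUE.]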
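The two-step calculation of β (first locate the critical sample mean under H0, then evaluate the non-rejection probability under the true mean) can be sketched as follows. Variable names are assumptions introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()                      # standard normal distribution
mu0, mu_true, sigma, n, alpha = 130.0, 128.0, 4.0, 25, 0.05
se = sigma / sqrt(n)                  # standard error = 0.80

# Step 1: critical sample mean for the lower-tail test (reject H0 if x-bar < x_crit)
x_crit = mu0 - z.inv_cdf(1 - alpha) * se

# Step 2: beta = P(X-bar > x_crit | mu = mu_true)
beta = 1 - z.cdf((x_crit - mu_true) / se)

print(round(x_crit, 3), round(beta, 4))
```

The power of the test against µ = 128 is then 1 − β, roughly 0.80 under these assumptions.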
Sample Size Estimation: Derivation
Imagine that you are interested in verifying the following
hypotheses.
H0: µ ≤ µ0 vs. H1: µ > µ0.
Assume that a similar assessment has been executed by a different
researcher who had inferred that µ > µT. What sample size is
required if you plan to ensure that P(Type I Error) = α and that
P(Type II Error) = β? Assume further that the variable of
interest X ∼ N(µ, σ²) and that µ0 < µT.
[Figure: NULL distribution centred at µ0 and PROPOSED distribution centred at µT, with β and α shaded on either side of $\bar{x}_{crit}$.]
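H0: µ ≤ µ0 vs. H1: µ > µ0
From the Type I error, evaluated at µ = µ0:
$$\alpha = P(Z > z_{\alpha} \mid \mu = \mu_0) = P\left(Z > \frac{\bar{x}_{crit} - \mu_0}{\sigma/\sqrt{n}}\right) \implies z_{\alpha} = \frac{\bar{x}_{crit} - \mu_0}{\sigma/\sqrt{n}} \implies \bar{x}_{crit} = \mu_0 + z_{\alpha} \frac{\sigma}{\sqrt{n}} \qquad (1)$$
From the Type II error, evaluated at µ = µT:
$$\beta = P(Z < -z_{\beta} \mid \mu = \mu_T) = P\left(Z < \frac{\bar{x}_{crit} - \mu_T}{\sigma/\sqrt{n}}\right) \implies -z_{\beta} = \frac{\bar{x}_{crit} - \mu_T}{\sigma/\sqrt{n}} \implies \bar{x}_{crit} = \mu_T - z_{\beta} \frac{\sigma}{\sqrt{n}} \qquad (2)$$
Equate Equation (1) and Equation (2):
$$\bar{x}_{crit} = \mu_0 + z_{\alpha} \frac{\sigma}{\sqrt{n}} = \mu_T - z_{\beta} \frac{\sigma}{\sqrt{n}}$$
Solve for n:
$$n = \left\lceil \frac{(z_{\alpha} + z_{\beta})^2 \times \sigma^2}{(\mu_T - \mu_0)^2} \right\rceil$$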
Sample Size Estimation: Example
How “clean” is green energy? Cobalt, a natural resource abundant in the
war-torn Democratic Republic of Congo (DRC), is a major component of
the batteries used to power renewable energy. A European battery
manufacturing company that has a cobalt mine in the DRC was recently
accused of over-working its mine workers. The company claims that, on
average, its employees work below 45H00 weekly. A prominent activist
disagrees. If the activist posits that the average work duration per week at
the mine is 45H32, what sample size is required to verify the company’s
claim against the activist’s such that the probabilities of Type I and Type II errors
are restricted to about 5% and 9%, respectively? Assume that a standard
deviation of 2H12 was obtained from a pilot study.
Solution (follows directly from the preceding derivation):
H0: µ ≤ 45 vs. H1: µ > 45
α = 5%, β = 9%
$\mu_T = 45 + \frac{32}{60} = 45.5\dot{3}$
$\hat{\sigma} = 2 + \frac{12}{60} = 2.2$
$$n = \left\lceil \frac{(z_{\alpha} + z_{\beta})^2 \times \hat{\sigma}^2}{(\mu_T - \mu_0)^2} \right\rceil = \left\lceil \frac{(z_{0.05} + z_{0.09})^2 \times 2.2^2}{(45.5\dot{3} - 45)^2} \right\rceil = \left\lceil \frac{(1.645 + 1.341)^2 \times 2.2^2}{0.5\dot{3}^2} \right\rceil \approx \lceil 151.715 \rceil = 152$$
Note:
(1.) Always round up to the nearest integer.
(2.) Calculations are the same when s² is used.
(3.) Keep the recurring decimal in µT − µ0; truncating it to 0.53 inflates the result to ⌈153.629⌉ = 154.
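The sample size formula is easy to wrap in a function. The sketch below is one possible implementation; the function name `sample_size` is an assumption introduced here, and the normal quantiles come from `statistics.NormalDist` rather than from a table.

```python
from math import ceil
from statistics import NormalDist

def sample_size(mu0, mu_t, sigma, alpha, beta):
    """Smallest n giving P(Type I) = alpha and P(Type II) = beta for a one-sided z-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)    # upper-tail critical value for alpha
    z_beta = z.inv_cdf(1 - beta)      # upper-tail critical value for beta
    return ceil((z_alpha + z_beta) ** 2 * sigma ** 2 / (mu_t - mu0) ** 2)

# Cobalt mine example: 45H32 = 45 + 32/60 hours, 2H12 = 2 + 12/60 hours
n_exact = sample_size(45, 45 + 32 / 60, 2 + 12 / 60, 0.05, 0.09)
# Same inputs with the difference truncated to 0.53
n_truncated = sample_size(45, 45.53, 2.2, 0.05, 0.09)
print(n_exact, n_truncated)
```

The result is sensitive to how µT − µ0 is rounded: with the exact difference 32/60 the sketch returns 152, whereas truncating the difference to 0.53 yields 154.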
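Hypothesis Testing vs. Confidence Interval
$\hat{\theta} \sim N(\theta, \sigma_{\hat{\theta}}^2)$
+ Null Hypothesis H0: $\theta = \theta_0$
+ Alternative Hypothesis H1: $\theta \neq \theta_0$ (two-tail)
+ Significance Level: P(Type I Error) = α
+ Confidence Interval:
  $$-z_{\alpha/2} < \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} < z_{\alpha/2} \implies \hat{\theta} - z_{\alpha/2} \times \sigma_{\hat{\theta}} < \theta < \hat{\theta} + z_{\alpha/2} \times \sigma_{\hat{\theta}}$$
+ Decision rule: do not reject H0 in favour of H1 if the value of θ0 lies inside
the (1 − α) × 100% confidence interval for θ.
[Figure: standard normal density with area α/2 in each tail beyond $-z_{\alpha/2}$ and $z_{\alpha/2}$.]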
Hypothesis Testing vs. Confidence Interval
Example:
Bleue-Blanche-Rouge (BBR), a cobalt mining company in the DRC, was recently
accused of paying below the minimum 500 FRANCS hourly wage. The accusation was
based on the salaries of 12 of its employees, which yielded an average of 493.15 FRANCS.
Verify the authenticity of the accusation with an appropriate confidence interval and
an 8% significance level. Assume that wages in the DRC have been shown to be
N(µ, 150.63) distributed.
X = wage at BBR
$\bar{X} \sim N\left(\mu, \frac{150.63}{12}\right)$
H0: µ ≤ 500 vs. H1: µ > 500
α = 0.08
$$P\left(\frac{\bar{x} - \mu}{\sqrt{\sigma^2/n}} < z_{\alpha}\right) = 1 - \alpha$$
$$P\left(\frac{493.15 - \mu}{\sqrt{150.63/12}} < z_{0.08}\right) = 1 - 0.08$$
$$\therefore \text{92\% C.I.:} \quad \mu > 493.15 - 1.4051 \times \sqrt{\frac{150.63}{12}}, \quad \text{i.e.} \quad (488.1719 < \mu < \infty)$$
[Figure: sampling distribution of $\mu_x$ with area α = 0.08 below the lower confidence bound 488.17.]
Conclusion: 500 falls within the interval. Therefore, fail to reject the null hypothesis.
There is insufficient evidence in the observed data to be able to claim that, on
average, BBR does not pay below minimum wage.
p-value
+ Represents a different way to report results of hypothesis tests.
+ It is the smallest value of α for which the null hypothesis can
be rejected.
+ Provides an opportunity to evaluate the extent to which the
evidence in the data disagrees with the null hypothesis (H0).
Example:
Bleue-Blanche-Rouge (BBR), a cobalt mining company in the DRC
was recently accused of paying below the minimum 500 FRANCS
hourly wage. The accusation was based on the salary of 12 of her
employees that yielded an average of 493.15 FRANCS. Estimate the
p-value for the appropriate hypothesis test to verify the authenticity
of the accusation. Adopt an 8% significance level for your inference
and assume that wages at the DRC have been shown to be
N (µ, 150.63) distributed.
p-value: Example
X = wage at BBR
$\bar{X} \sim N\left(\mu, \frac{150.63}{12}\right)$
H0: µ ≤ 500 vs. H1: µ > 500
α = 0.08
$$z_{stat} = \frac{\bar{x} - \mu_0}{\sqrt{\sigma^2/n}} = \frac{493.15 - 500}{\sqrt{150.63/12}} \approx -1.9334$$
$$\text{p-value} = P(Z > z_{stat}) \approx P(Z > -1.9334) \approx 0.9734$$
[Figure: standard normal density with $z_{stat} = -1.9334$ and $z_{0.08}$ marked; the p-value ≈ 0.9734 is the area above $z_{stat}$ and α = 0.08 is the area above $z_{0.08}$.]
Interpretation: If the true mean wage at BBR were 500 FRANCS (the boundary of H0),
the probability of observing 12 of its employees with a mean wage of 493.15 FRANCS
or more would be about 97.34%.
Conclusion: p-value ≈ 0.9734 > 0.08 = α.
Therefore, fail to reject the null hypothesis. The
claim that, on average, BBR pays below the
minimum wage may not be invalidated based on
the salary data collected from the sampled
employees.
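Both the p-value and the one-sided confidence bound for the BBR wage example can be computed from the same standard error. This is a minimal sketch with the standard library; the variable names are assumptions introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()                           # standard normal distribution
x_bar, mu0, var, n, alpha = 493.15, 500.0, 150.63, 12, 0.08
se = sqrt(var / n)                         # standard error of the sample mean

# Upper-tail test of H0: mu <= 500 vs H1: mu > 500
z_stat = (x_bar - mu0) / se
p_value = 1 - z.cdf(z_stat)                # P(Z > z_stat)

# One-sided 92% confidence interval: (lower_bound, infinity)
lower_bound = x_bar - z.inv_cdf(1 - alpha) * se

print(round(z_stat, 4), round(p_value, 4), round(lower_bound, 4))
print(p_value > alpha, mu0 > lower_bound)  # both indicate "fail to reject H0"
```

The p-value approach and the confidence interval approach necessarily agree, since both compare the same statistic with the same critical value.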
Small Sample Test for µ
$\bar{X} \sim N(\mu, \sigma_{\bar{x}}^2)$
+ Null Hypothesis H0: $\mu = \mu_0$
+ Alternative Hypothesis H1:
  - $\mu < \mu_0$ (lower-tail)
  - $\mu \neq \mu_0$ (two-tail)
  - $\mu > \mu_0$ (upper-tail)
+ Significance Level: P(Type I Error) = α
+ Test Statistic:
  $$T = \frac{\bar{x} - \mu_0}{\hat{\sigma}_{\bar{x}}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad T \sim t_{(n-1)}$$
+ Rejection Region (RR):
  - $\{T < -t_{(n-1),\alpha}\}$ (lower-tail)
  - $\{|T| > t_{(n-1),\alpha/2}\}$ (two-tail)
  - $\{T > t_{(n-1),\alpha}\}$ (upper-tail)
Note: σ² is unknown and the sample size is not
large enough (n < 100) for an accurate
estimate to be obtained.
Small Sample Test for µ: Example
Alcohol abuse is prevalent among South African teens. In a recent
broadcast, the ruling party claimed that some expensive and heavily
criticised measures that were implemented were fruitful. It was said that the
average age (in years) of alcohol takers is no longer less than 18. You set
out to validate this statement and decided a priori on a 5% significance level.
A random sample of 25 alcohol users yielded a mean and standard
deviation of 16.3 yr and 4.17 yr respectively.
X = age (in years) of a South African alcohol user
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{25}\right)$
H0: µ ≥ 18 vs. H1: µ < 18
α = 0.05; Deg. of Fr. = 25 − 1 = 24
$$t_{stat} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{16.3 - 18}{4.17/\sqrt{25}} \approx -2.0384$$
Critical value approach:
$T_{crit} = -T_{(24),0.05} \approx -1.7109 > -2.0384 \approx t_{stat}$
p-value approach:
$P\left(T_{(24)} < t_{stat}\right) \approx P\left(T_{(24)} < -2.0384\right) \approx 0.0263 < 0.05 = \alpha$
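Small Sample Test for µ: Example Cont’d
Confidence approach:
$$P\left(\frac{\bar{x} - \mu}{s/\sqrt{n}} > -T_{(24),\alpha}\right) = 1 - \alpha$$
$$P\left(\frac{16.3 - \mu}{4.17/\sqrt{25}} > -T_{(24),0.05}\right) = 1 - 0.05$$
$$\implies \text{95\% C.I.:} \quad \mu < 16.3 + 1.7109 \times \frac{4.17}{\sqrt{25}} \approx 17.7269$$
$18 \notin (-\infty < \mu < 17.7269)$
Decision:
Reject null hypothesis.
Conclusion:
The average age of alcohol users in South
Africa is statistically significantly not greater
than 18 years.
[Figure: left panel, t(24) density with $t_{stat} = -2.0384$, $t_{crit} \approx -1.7109$, α = 0.05 and p-value ≈ 0.0263 marked; right panel, sampling distribution of $\mu_x$ with 1 − α = 0.95 below the upper bound $x_{up} \approx 17.7269$ and the hypothesised value 18 above it.]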
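The three approaches (critical value, p-value, confidence bound) for the alcohol-age example can be checked numerically. The Python standard library has no t distribution, so this sketch takes the critical value 1.7109 from the slides and approximates the p-value by integrating the t density with Simpson's rule; the helper `t_cdf_lower` is an assumption introduced here, not a library function.

```python
from math import sqrt, gamma, pi

def t_cdf_lower(t, df, steps=2000):
    """P(T_df < t) for t <= 0, via Simpson's rule on the t density over [t, 0]."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    pdf = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    h = -t / steps                         # positive step width (steps must be even)
    total = pdf(t) + pdf(0.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(t + i * h)
    return 0.5 - total * h / 3             # P(T < t) = 0.5 - P(t < T < 0)

x_bar, mu0, s, n = 16.3, 18.0, 4.17, 25
t_crit = 1.7109                            # t_{(24),0.05}, tabulated value quoted in the slides
se = s / sqrt(n)

t_stat = (x_bar - mu0) / se                # test statistic
reject = t_stat < -t_crit                  # critical value approach
p_value = t_cdf_lower(t_stat, n - 1)       # p-value approach
upper_bound = x_bar + t_crit * se          # one-sided 95% C.I.: (-inf, upper_bound)

print(round(t_stat, 4), reject, round(p_value, 4), round(upper_bound, 4))
```

The upper confidence bound evaluates to 16.3 + 1.7109 × 0.834 ≈ 17.7269, which lies below 18, so all three approaches reject H0.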
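Hypothesis Testing w.r.t σ²
$X \sim N(\mu, \sigma^2)$
+ Null Hypothesis H0: $\sigma^2 = \sigma_0^2$
+ Alternative Hypothesis H1:
  - $\sigma^2 < \sigma_0^2$ (lower-tail)
  - $\sigma^2 \neq \sigma_0^2$ (two-tail)
  - $\sigma^2 > \sigma_0^2$ (upper-tail)
+ Significance Level: P(Type I Error) = α
+ Test Statistic:
  $$C = \frac{(n - 1)s^2}{\sigma_0^2}, \qquad C \sim \chi^2_{(n-1)}$$
+ Rejection Region (RR):
  - $\{C < \chi^2_{(n-1),\alpha}\}$ (lower-tail)
  - $\{C < \chi^2_{(n-1),\alpha/2}\}$ or $\{C > \chi^2_{(n-1),1-\alpha/2}\}$ (two-tail)
  - $\{C > \chi^2_{(n-1),1-\alpha}\}$ (upper-tail)
+ Notation: on this slide the second subscript is the lower-tail (cumulative) probability, so that
  - $\chi^2_{(n-1),\alpha} = C_{crit} \implies P\left(\chi^2_{(n-1)} > C_{crit}\right) = 1 - \alpha$
  - $\chi^2_{(n-1),1-\alpha} = C_{crit} \implies P\left(\chi^2_{(n-1)} > C_{crit}\right) = \alpha$
Hypothesis Testing w.r.t σ²: Rejection Regions
[Figure: chi-square density starting at 0, with area α below $\chi^2_{(n-1),\alpha}$ (lower-tail rejection region) and area α above $\chi^2_{(n-1),1-\alpha}$ (upper-tail rejection region).]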
Hypothesis Testing w.r.t σ 2 : Example
A fund manager is considering migrating some of her traditionally
stocked portfolio to a crypto-based portfolio. She was told by a
crypto enthusiast that the daily change in value of individual
cryptocurrency has a normal distribution with variance (or volatility)
not more than 100% and an average of 0%. So, she decided to
switch if she can validate the claim. She opts to test the claim with
the sum of daily volatilities. Subsequently, she collected data of the
daily volatility of 12 representative and independent
cryptocurrencies. See the table below for the crypto volatility data
collected by the manager. Conduct an appropriate hypothesis test.
Report the corresponding p-value (as accurately as possible) and
advise based on a 5% statistical significance level whether the
manager should migrate her portfolio.
111.70%   280.80%   314.10%    45.00%   246.40%    81.10%
186.80%    70.30%   177.40%    43.00%   116.70%   226.30%
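Hypothesis Testing w.r.t σ²: Example
Claim:
  - Let $C_i$ = daily change in the value of cryptocurrency i.
  - Further, let $s_i^2$ = observed daily variability of cryptocurrency i.
$C_i \sim N(0, 1)$, where a variance of 100% is written as 1.
$$Y = \sum_{i=1}^{12} C_i \sim N\left(\sum_{i=1}^{12} 0, \; \sum_{i=1}^{12} 1\right) = N(0, 12)$$
Data:
$s_y^2 = 111.70\% + 314.10\% + \ldots + 43.00\% + 226.30\% = 1899.60\% = 18.996$
Analysis:
H0: $\sigma_y^2 \le 12$ vs. H1: $\sigma_y^2 > 12$
α = 0.05
$$X_{stat} = \frac{(n - 1)s_y^2}{\sigma_y^2} = \frac{11 \times 18.996}{12} \approx 17.413$$
p-value:
The upper-tail critical values of $\chi^2_{(11)}$ are 19.6751 (upper-tail area 0.05) and 17.2750 (upper-tail area 0.10).
Since 17.2750 < 17.413 < 19.6751,
$$\therefore \; 0.05 < P\left(\chi^2_{(11)} > X_{stat}\right) < 0.10$$
Conclusion: Fail to reject the null hypothesis.
There is not enough information in the available
data to suggest that, on average, the variability of
a crypto asset exceeds 100%. So, the manager
may migrate her portfolio.
[Figure: $\chi^2_{(11)}$ density starting at 0, with the critical values 17.275 (α = 0.10) and 19.6751 (α = 0.05) marked and the statistic 17.4 lying between them.]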
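The crypto example asks for the p-value “as accurately as possible”, which a table can only bracket. The sketch below computes the statistic from the data and approximates the upper-tail probability by integrating the chi-square density with Simpson's rule; the helper `chi2_sf` is an assumption introduced here (it is not a library function) and is only suitable for more than 2 degrees of freedom.

```python
from math import exp, gamma

def chi2_sf(x, k, steps=2000):
    """P(chi2_k > x) for k > 2, via Simpson's rule on the density over [0, x]."""
    def pdf(t):
        return t ** (k / 2 - 1) * exp(-t / 2) / (2 ** (k / 2) * gamma(k / 2))
    h = x / steps                          # step width (steps must be even)
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1 - total * h / 3               # upper tail = 1 - CDF

# Daily volatilities (in percent) from the table in the example
volatility = [111.70, 280.80, 314.10, 45.00, 246.40, 81.10,
              186.80, 70.30, 177.40, 43.00, 116.70, 226.30]
n = len(volatility)
s_y2 = sum(volatility) / 100               # 1899.60% expressed as 18.996
sigma0_2 = 12                              # hypothesised variance of the sum

x_stat = (n - 1) * s_y2 / sigma0_2         # test statistic
p_value = chi2_sf(x_stat, n - 1)           # upper-tail p-value
print(round(x_stat, 3), round(p_value, 4), p_value > 0.05)
```

The computed p-value falls between 0.05 and 0.10, consistent with the table-based bracketing, so H0 is not rejected at the 5% level.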
Two-Sample Tests
Large Sample Tests: µ1 − µ2
$x_1 = \{x_{1,1}, \ldots, x_{n_1,1}\} \sim N(\mu_1, \sigma_1^2)$ and $x_2 = \{x_{1,2}, \ldots, x_{n_2,2}\} \sim N(\mu_2, \sigma_2^2)$
+ Null Hypothesis H0: $\mu_1 - \mu_2 = D_0$
+ Alternative Hypothesis H1:
  - $\mu_1 - \mu_2 < D_0$ (lower-tail)
  - $\mu_1 - \mu_2 \neq D_0$ (two-tail)
  - $\mu_1 - \mu_2 > D_0$ (upper-tail)
+ Significance Level: P(Type I Error) = α
+ Test Statistic:
  $$Z = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1)$$
+ Rejection Region (RR):
  - $\{Z < -z_{\alpha}\}$ (lower-tail)
  - $\{|Z| > z_{\alpha/2}\}$ (two-tail)
  - $\{Z > z_{\alpha}\}$ (upper-tail)
+ Conditions:
  - $x_1$ and $x_2$ are independent.
  - $\sigma_1^2$ and $\sigma_2^2$ are known, or the sample sizes
    are large enough (i.e., $n_1, n_2 > 100$)
    that they can be accurately estimated.
Tests of Proportions: π1 − π2
$p_1 = (y_1/n_1) \sim N(\pi_1, \pi_1[1 - \pi_1]/n_1)$ and $p_2 = (y_2/n_2) \sim N(\pi_2, \pi_2[1 - \pi_2]/n_2)$
+ Null Hypothesis H0: $\pi_1 - \pi_2 = D_0$
+ Alternative Hypothesis H1:
  - $\pi_1 - \pi_2 < D_0$ (lower-tail)
  - $\pi_1 - \pi_2 \neq D_0$ (two-tail)
  - $\pi_1 - \pi_2 > D_0$ (upper-tail)
+ Significance Level: P(Type I Error) = α
+ Test Statistic:
  $$Z = \frac{p_1 - p_2 - D_0}{\sqrt{\tilde{p}(1 - \tilde{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad Z \sim N(0, 1)$$
+ Rejection Region (RR):
  - $\{Z < -z_{\alpha}\}$ (lower-tail)
  - $\{|Z| > z_{\alpha/2}\}$ (two-tail)
  - $\{Z > z_{\alpha}\}$ (upper-tail)
+ Note: the pooled proportion is
  $$\tilde{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$
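The chapter gives no worked example for the two-proportion test, so the following sketch uses entirely hypothetical counts (60 successes out of 100 in sample 1, 45 out of 100 in sample 2) purely to show the mechanics of the pooled statistic with D0 = 0. All names and numbers are assumptions introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()                           # standard normal distribution

# Hypothetical data, not from the slides
y1, n1 = 60, 100
y2, n2 = 45, 100
d0, alpha = 0.0, 0.05

p1, p2 = y1 / n1, y2 / n2
p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)   # pooled proportion p-tilde
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z_stat = (p1 - p2 - d0) / se
p_value = 2 * (1 - z.cdf(abs(z_stat)))     # two-tail p-value
reject = abs(z_stat) > z.inv_cdf(1 - alpha / 2)

print(round(z_stat, 3), round(p_value, 4), reject)
```

With these made-up counts the two-tail test rejects H0: π1 − π2 = 0 at the 5% level.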
Small Sample Tests: µ1 − µ2
$x_1 = \{x_{1,1}, \ldots, x_{n_1,1}\} \sim N(\mu_1, \sigma_1^2)$ and $x_2 = \{x_{1,2}, \ldots, x_{n_2,2}\} \sim N(\mu_2, \sigma_2^2)$
+ Null Hypothesis H0: $\mu_1 - \mu_2 = D_0$
+ Alternative Hypothesis H1:
  - $\mu_1 - \mu_2 < D_0$ (lower-tail)
  - $\mu_1 - \mu_2 \neq D_0$ (two-tail)
  - $\mu_1 - \mu_2 > D_0$ (upper-tail)
+ Significance Level: P(Type I Error) = α
+ Test Statistic:
  $$T = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim t_{(n_1 + n_2 - 2)}$$
+ Rejection Region (RR):
  - $\{T < -t_{(n_1+n_2-2),\alpha}\}$ (lower-tail)
  - $\{|T| > t_{(n_1+n_2-2),\alpha/2}\}$ (two-tail)
  - $\{T > t_{(n_1+n_2-2),\alpha}\}$ (upper-tail)
+ Note:
$\sigma_1^2$ and $\sigma_2^2$ are unknown and may not be
accurately estimated due to small sample
sizes. It is assumed that $\sigma_1^2 = \sigma_2^2$, so the pooled variance is used:
  $$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Small Sample Tests: µ1 − µ2
$x_1 = \{x_{1,1}, \ldots, x_{n_1,1}\} \sim N(\mu_1, \sigma_1^2)$ and $x_2 = \{x_{1,2}, \ldots, x_{n_2,2}\} \sim N(\mu_2, \sigma_2^2)$
+ Null Hypothesis H0: $\mu_1 - \mu_2 = D_0$
+ Alternative Hypothesis H1:
  - $\mu_1 - \mu_2 < D_0$ (lower-tail)
  - $\mu_1 - \mu_2 \neq D_0$ (two-tail)
  - $\mu_1 - \mu_2 > D_0$ (upper-tail)
+ Significance Level: P(Type I Error) = α
+ Test Statistic:
  $$T = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t_{(\nu)}$$
+ Rejection Region (RR):
  - $\{T < -t_{(\nu),\alpha}\}$ (lower-tail)
  - $\{|T| > t_{(\nu),\alpha/2}\}$ (two-tail)
  - $\{T > t_{(\nu),\alpha}\}$ (upper-tail)
+ Note:
$\sigma_1^2$ and $\sigma_2^2$ are unknown and may not be
accurately estimated due to small sample
sizes. It is assumed that $\sigma_1^2 \neq \sigma_2^2$, with degrees of freedom
  $$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 + 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 + 1}} - 2$$
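To see the unequal-variance statistic and its degrees of freedom in numbers, the sketch below borrows the summary statistics of the two-company voltage example from later in this chapter (A: n = 29, x̄ = 20.3, s = 3.7; B: n = 25, x̄ = 22.5, s = 5.4). It implements the degrees-of-freedom formula as it appears on this slide (with n + 1 in the denominators and 2 subtracted), which is an assumption about the intended variant; the function name `welch_df_slide` is hypothetical.

```python
from math import sqrt

def welch_df_slide(s1, n1, s2, n2):
    """Degrees of freedom using the variant shown on the slide (n_i + 1, minus 2)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2    # estimated variances of the two sample means
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 + 1) + v2 ** 2 / (n2 + 1)) - 2

# Summary statistics from the two-company voltage example
n_a, xbar_a, s_a = 29, 20.3, 3.7
n_b, xbar_b, s_b = 25, 22.5, 5.4

nu = welch_df_slide(s_a, n_a, s_b, n_b)
t_stat = (xbar_a - xbar_b - 0) / sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)

print(round(nu, 2), round(t_stat, 4))
```

The resulting ν is not an integer; in practice it is rounded before a t table is consulted, and the slide does not make the rounding direction recoverable.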
Paired Sample Tests: µd
$x_1 = \{x_{1,1}, \ldots, x_{n,1}\}$ and $x_2 = \{x_{1,2}, \ldots, x_{n,2}\}$ (paired observations, so both samples have the same size n)
$d = \{x_{1,1} - x_{1,2}, \ldots, x_{n,1} - x_{n,2}\} = \{d_1, \ldots, d_n\} \sim N(\mu_d, \sigma_d^2)$
+ Null Hypothesis H0: $\mu_d = \mu_0$
+ Alternative Hypothesis H1:
  - $\mu_d < \mu_0$ (lower-tail)
  - $\mu_d \neq \mu_0$ (two-tail)
  - $\mu_d > \mu_0$ (upper-tail)
+ Significance Level: P(Type I Error) = α
+ Test Statistic:
  $$T = \frac{\bar{x}_d - \mu_0}{s_d/\sqrt{n}} \sim t_{(n-1)}$$
+ Rejection Region (RR):
  - $\{T < -t_{(n-1),\alpha}\}$ (lower-tail)
  - $\{|T| > t_{(n-1),\alpha/2}\}$ (two-tail)
  - $\{T > t_{(n-1),\alpha}\}$ (upper-tail)
+ Note:
$\sigma_d^2$ is unknown and may not
be accurately estimated due to
small sample size.
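Test of Variances: σ1² ÷ σ2²
$x_1 = \{x_{1,1}, \ldots, x_{n_1,1}\} \sim N(\mu_1, \sigma_1^2)$ and $x_2 = \{x_{1,2}, \ldots, x_{n_2,2}\} \sim N(\mu_2, \sigma_2^2)$
+ Null Hypothesis H0: $\sigma_1^2 \div \sigma_2^2 = 1$
+ Alternative Hypothesis H1:
  - $\sigma_1^2 \div \sigma_2^2 \neq 1$ (two-tail)
  - $\sigma_1^2 \div \sigma_2^2 > 1$ (upper-tail)
+ Significance Level: P(Type I Error) = α
+ Test Statistic (under H0, $\sigma_1^2 = \sigma_2^2$):
  $$F = \frac{(n_1 - 1)s_1^2}{(n_1 - 1)\sigma_1^2} \div \frac{(n_2 - 1)s_2^2}{(n_2 - 1)\sigma_2^2} = \frac{s_1^2}{s_2^2} \sim F_{(n_1 - 1, n_2 - 1)}$$
+ Rejection Region (RR):
  - $\{F > F_{(n_1-1,n_2-1),\alpha}\}$ (upper-tail)
  - $\{F > F_{(n_1-1,n_2-1),\alpha/2}\}$ (two-tail)
+ Note:
  - $x_1$ and $x_2$ are independent.
  - $\sigma_1^2 \ge \sigma_2^2$: the samples are labelled so that the larger variance is in the numerator.
Rejection Region for F Test
[Figure: F density starting at 0, with area α above the critical value $F_{(n_1-1,n_2-1),\alpha}$.]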
Two-Sample Test: Example
You intend to compare the quality of products between two electric circuit producing
companies. You collected independent sample products from the two industries. The
corresponding summary statistics with respect to the output voltages from the
samples are presented below. Based on a 5% probability of Type I error, which of the
manufacturers has better (higher output) products? Assume that output voltage is
normally distributed.
        A       B
n       29      25
x̄      20.3    22.5
s       3.7     5.4
Solution:
$X_a, X_b$ = voltage from companies A & B products respectively
$X_a \sim N(\mu_a, \sigma_a^2)$ and $X_b \sim N(\mu_b, \sigma_b^2)$
$\bar{X}_a \sim N(\mu_a, \sigma_a^2/29)$ and $\bar{X}_b \sim N(\mu_b, \sigma_b^2/25)$
α = 0.05
Preamble: verify if variances may be pooled
H0: $\sigma_b^2/\sigma_a^2 = 1$ vs. H1: $\sigma_b^2/\sigma_a^2 \neq 1$
$$F_{stat} = s_b^2/s_a^2 = 5.4^2/3.7^2 = 2.13$$
Critical value:
$F_{stat} \approx 2.13 < 2.17 \approx F_{(24,28),0.025}$
Implication: Fail to reject the null hypothesis. There is not enough
evidence to suggest that output voltage variability differs
between the products of the two companies.
[Figure: F(24, 28) density with tail areas α/2 = 0.025; $F_{stat} \approx 2.13$ lies just below the upper critical value $F_{0.025} \approx 2.17$.]
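H0: $\mu_a - \mu_b = 0$ vs. H1: $\mu_a - \mu_b \neq 0$
$$s_p = \sqrt{\frac{(n_a - 1)s_a^2 + (n_b - 1)s_b^2}{n_a + n_b - 2}} = \sqrt{\frac{(28 \times 3.7^2) + (24 \times 5.4^2)}{29 + 25 - 2}} \approx 4.5640$$
$$t_{stat} = \frac{(\bar{x}_a - \bar{x}_b) - (\mu_a - \mu_b)}{s_p\sqrt{\frac{1}{n_a} + \frac{1}{n_b}}} = \frac{(20.3 - 22.5) - 0}{4.5640\sqrt{\frac{1}{29} + \frac{1}{25}}} \approx -1.7662$$
p-value:
$T_{(52),0.025} \approx 1.960$ and $T_{(52),0.050} \approx 1.645$ (normal approximations; the exact t(52) values, about 2.007 and 1.675, give the same bracketing)
$$0.05 = 2 \times 0.025 < P\left(|T_{(52)}| > |t_{stat}|\right) < 2 \times 0.050 = 0.10$$
Conclusion:
Fail to reject the null hypothesis. There is insufficient evidence in the data to claim
that there is a statistically significant difference between the average voltage output
generated by the circuits from the two industries.
[Figure: t(52) density with the lower-tail critical values −1.96 (tail area 0.025) and −1.645 (tail area 0.050) marked, and $t_{stat} \approx -1.7662$ lying between them.]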
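The arithmetic of the two-company example (variance-ratio check followed by the pooled t statistic) can be reproduced from the summary statistics. In this sketch the critical values 2.17 and 1.960 are the tabulated figures quoted in the slides, since the standard library has no F or t distribution; variable names are assumptions introduced for illustration.

```python
from math import sqrt

# Summary statistics from the example
n_a, xbar_a, s_a = 29, 20.3, 3.7
n_b, xbar_b, s_b = 25, 22.5, 5.4

# Step 1: F test for equal variances (larger sample variance in the numerator)
f_stat = s_b ** 2 / s_a ** 2
f_crit = 2.17                              # F_{(24,28),0.025}, as quoted in the slides
pool_ok = f_stat < f_crit                  # fail to reject equal variances

# Step 2: pooled standard deviation and two-sample t statistic
s_p = sqrt(((n_a - 1) * s_a ** 2 + (n_b - 1) * s_b ** 2) / (n_a + n_b - 2))
t_stat = (xbar_a - xbar_b - 0) / (s_p * sqrt(1 / n_a + 1 / n_b))

t_crit = 1.960                             # two-tail 5% critical value, as quoted in the slides
reject = abs(t_stat) > t_crit

print(round(f_stat, 2), pool_ok)
print(round(s_p, 4), round(t_stat, 4), reject)
```

Because |t_stat| falls short of the two-tail critical value, the data do not separate the two companies at the 5% level.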
End of Chapter 10.