8 The Likelihood Ratio Test

8.1 The likelihood ratio
We often want to test in situations where the adopted probability model involves several unknown parameters. Thus we may denote an element of the parameter space by

θ = (θ1, θ2, . . . , θk).

Some of these parameters may be nuisance parameters (e.g. when testing hypotheses on the unknown mean of a normal distribution with unknown variance, the variance is regarded as a nuisance parameter).
We use the likelihood ratio, λ(x), defined as

λ(x) = sup{L(θ; x) : θ ∈ Θ0} / sup{L(θ; x) : θ ∈ Θ},   x ∈ ℝ^n.
The informal argument for this is as follows.
For a realisation x, determine its best chance of occurrence under H0 and also
its best chance overall. The ratio of these two chances can never exceed unity,
but, if small, would constitute evidence for rejection of the null hypothesis.
A likelihood ratio test for testing H0 : θ ∈ Θ0 against H1 : θ ∈ Θ1 is a test
with critical region of the form
C1 = {x : λ(x) ≤ k},
where k is a real number between 0 and 1.
Clearly the test will be at significance level α if k can be chosen to satisfy
sup{P(λ(X) ≤ k; θ) : θ ∈ Θ0} = α.

If H0 is a simple hypothesis with Θ0 = {θ0}, we have the simpler form

P(λ(X) ≤ k; θ0) = α.
To determine k, we must look at the c.d.f. of the random variable λ(X),
where the random sample X has joint p.d.f. f_X(x; θ0).
Example Exponential distribution
Test H0 : θ = θ0 against H1 : θ > θ0.
Here Θ0 = {θ0} and Θ1 = (θ0, ∞), so that Θ = [θ0, ∞).
The likelihood function is

L(θ; x) = ∏_{i=1}^n f(xi; θ) = θ^n e^{−θ Σ_{i=1}^n xi} = θ^n e^{−nθx̄}.

The numerator of the likelihood ratio is

L(θ0; x) = θ0^n e^{−nθ0x̄}.
We need to find the supremum as θ ranges over the interval [θ0, ∞). Now

l(θ; x) = n log θ − nθx̄,

so that

∂l(θ; x)/∂θ = n/θ − nx̄,

which is zero only when θ = 1/x̄. Since L(θ; x) is an increasing function for θ < 1/x̄ and decreasing for θ > 1/x̄,

sup{L(θ; x) : θ ∈ Θ} = x̄^{−n} e^{−n}     if 1/x̄ ≥ θ0,
                      = θ0^n e^{−nθ0x̄}   if 1/x̄ < θ0.
[Figure: sketches of L(θ; x) against θ for the two cases, showing that sup{L(θ; x) : θ ∈ Θ} is attained at θ = 1/x̄ when 1/x̄ ≥ θ0 and at θ = θ0 when 1/x̄ < θ0.]

Hence

λ(x) = θ0^n e^{−nθ0x̄} / (x̄^{−n} e^{−n}) = θ0^n x̄^n e^{n−nθ0x̄}   if 1/x̄ ≥ θ0,
     = 1                                                         if 1/x̄ < θ0.

Since

d/dx̄ (x̄^n e^{−nθ0x̄}) = nx̄^{n−1} e^{−nθ0x̄} (1 − θ0x̄)

is positive for values of x̄ between 0 and 1/θ0 where θ0 > 0, it follows that λ(x) is a non-decreasing function of x̄. Therefore the critical region of the likelihood ratio test is of the form

C1 = {x : Σ_{i=1}^n xi ≤ c}.
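Under H0 the sum Σ Xi of n independent Exp(θ0) observations has a Gamma(n, θ0) distribution, so the constant c giving a size-α test can be read off from a Gamma quantile. A minimal sketch, assuming scipy is available; n, θ0, α and the sample below are illustrative:

```python
# Critical value for the exponential likelihood ratio test of
# H0: theta = theta0 against H1: theta > theta0.
# Under H0, T = sum(X_i) ~ Gamma(n, rate theta0); the test rejects for
# small T, so c is the lower alpha-quantile of that Gamma distribution.
from scipy import stats

n, theta0, alpha = 20, 2.0, 0.05                 # illustrative values
c = stats.gamma.ppf(alpha, a=n, scale=1/theta0)  # P(T <= c; theta0) = alpha

# Apply the test to a sample (simulated here, with true theta = 3 > theta0).
x = stats.expon.rvs(scale=1/3.0, size=n, random_state=0)
print(f"c = {c:.3f}, sum(x) = {x.sum():.3f}, reject H0: {x.sum() <= c}")
```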
Example The one-sample t-test
The null hypothesis is H0 : θ = θ0 for the mean of a normal distribution with unknown variance σ².
We have

Θ = {(θ, σ²) : θ ∈ ℝ, σ² ∈ ℝ⁺}

and

Θ0 = {(θ, σ²) : θ = θ0, σ² ∈ ℝ⁺}.

The p.d.f. is

f(x; θ, σ²) = (1/√(2πσ²)) exp{−(x − θ)²/(2σ²)},   x ∈ ℝ,

so the likelihood function is

L(θ, σ²; x) = (2πσ²)^{−n/2} exp{−(1/(2σ²)) Σ_{i=1}^n (xi − θ)²}.

Since

l(θ0, σ²; x) = −(n/2) log(2πσ²) − (1/(2σ²)) Σ_{i=1}^n (xi − θ0)²

and

∂l/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^n (xi − θ0)²,

which is zero when

σ² = (1/n) Σ_{i=1}^n (xi − θ0)²,

we conclude that

sup{L(θ0, σ²; x) : σ² ∈ ℝ⁺} = ((2π/n) Σ_{i=1}^n (xi − θ0)²)^{−n/2} e^{−n/2}.
For the denominator, we already know from previous examples that the m.l.e. of θ is x̄, so

sup{L(θ, σ²; x) : (θ, σ²) ∈ Θ} = ((2π/n) Σ_{i=1}^n (xi − x̄)²)^{−n/2} e^{−n/2}

and hence

λ(x) = (Σ_{i=1}^n (xi − x̄)² / Σ_{i=1}^n (xi − θ0)²)^{n/2}.
This may be written in a more convenient form. Note that
Σ_{i=1}^n (xi − θ0)² = Σ_{i=1}^n ((xi − x̄) + (x̄ − θ0))² = Σ_{i=1}^n (xi − x̄)² + n(x̄ − θ0)²

(the cross term vanishes because Σ_{i=1}^n (xi − x̄) = 0),
so that
λ(x) = (1 + n(x̄ − θ0)² / Σ_{i=1}^n (xi − x̄)²)^{−n/2}.

The critical region is

C1 = {x : λ(x) ≤ k},

so it follows that H0 is to be rejected when the value of

|x̄ − θ0| / √(Σ_{i=1}^n (xi − x̄)²)

exceeds some constant.
Now we have already seen that
(X̄ − θ)/(S/√n) ∼ t(n − 1),

where

S² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)².
Therefore it makes sense to write the critical region in the form
C1 = {x : |x̄ − θ0|/(s/√n) ≥ c},

which is the standard form of the two-sided t-test for a single sample.
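As a numerical sanity check, λ(x) can be computed directly and compared with the t statistic via the identity λ = (1 + t²/(n − 1))^{−n/2}, which follows from Σ(xi − x̄)² = (n − 1)s². A minimal sketch with made-up data:

```python
# lambda(x) for the one-sample t-test, computed two ways:
# directly, and via the t statistic using lambda = (1 + t^2/(n-1))^(-n/2).
import numpy as np
from scipy import stats

x = np.array([4.2, 5.1, 3.8, 5.6, 4.9, 4.4])   # illustrative data
theta0, n = 4.0, 6

lam_direct = (1 + n*(x.mean() - theta0)**2 / ((x - x.mean())**2).sum())**(-n/2)
t = (x.mean() - theta0) / (x.std(ddof=1) / np.sqrt(n))
lam_via_t = (1 + t**2 / (n - 1))**(-n/2)

print(lam_direct, lam_via_t)                   # identical values
print(2 * stats.t.sf(abs(t), df=n - 1))        # two-sided p-value
```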
8.2 The likelihood ratio statistic
Since −2 log λ is a decreasing function of λ, it follows that the critical region of the likelihood ratio test can also be expressed in the form

C1 = {x : −2 log λ(x) ≥ c}.
Writing

Λ(x) = −2 log λ(x) = 2{l(θ̂; x) − l(θ̂0; x)},

where θ̂ and θ̂0 maximise the likelihood over Θ and Θ0 respectively, the critical region may be written as

C1 = {x : Λ(x) ≥ c}
and Λ(X) is called the likelihood ratio statistic.
We have been using the idea that values of θ close to θ̂ are well supported by the data so, if θ0 is a possible value of θ, then it turns out that, for large samples,

Λ(X) →ᴰ χ²_p

where p = dim(θ).
Let us see why.
8.2.1 The asymptotic distribution of the likelihood ratio statistic
Write

l(θ0) = l(θ̂) + (θ0 − θ̂) l′(θ̂) + ½(θ0 − θ̂)² l″(θ̂) + · · ·

and, remembering that l′(θ̂) = 0, we have

Λ = 2{l(θ̂) − l(θ0)}
  ≈ (θ̂ − θ0)² (−l″(θ̂))
  = (θ̂ − θ0)² J(θ̂)
  = (θ̂ − θ0)² I(θ0) (J(θ̂)/I(θ0)),

where J(θ̂) = −l″(θ̂) is the observed information and I(θ0) is the Fisher information. But

(θ̂ − θ0) I(θ0)^{1/2} →ᴰ N(0, 1)   and   J(θ̂)/I(θ0) →ᴾ 1,

so

(θ̂ − θ0)² I(θ0) →ᴰ χ²₁

and Slutsky's theorem gives

Λ →ᴰ χ²₁,

provided θ0 is the true value of θ.
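A simulation makes the result concrete. The sketch below uses the exponential model with the unrestricted alternative H1 : θ ≠ θ0 (not the one-sided test of Section 8.1, where θ0 lies on the boundary of Θ), for which Λ = 2n(θ0x̄ − 1 − log θ0x̄); the parameter values are illustrative:

```python
# Simulate Lambda under H0 for an Exp(theta0) sample with unrestricted
# alternative, where Lambda = 2n(theta0*xbar - 1 - log(theta0*xbar)),
# and check agreement with the chi^2_1 limit.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, theta0, reps = 50, 2.0, 10_000

xbar = rng.exponential(scale=1/theta0, size=(reps, n)).mean(axis=1)
lam = 2 * n * (theta0*xbar - 1 - np.log(theta0*xbar))

# Empirical size at the chi^2_1 0.95 quantile should be close to 0.05.
print((lam >= stats.chi2.ppf(0.95, df=1)).mean())
```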
Example Poisson distribution
Let X = (X1, . . . , Xn) be a random sample from a Poisson distribution with
parameter θ, and test H0 : θ = θ0 against H1 : θ ≠ θ0 at significance level
0.05.
The p.m.f. is

p(x; θ) = e^{−θ} θ^x / x!,   x = 0, 1, . . .

so that

l(θ; x) = −nθ + Σ_{i=1}^n xi log θ − log ∏_{i=1}^n xi!

and

∂l(θ; x)/∂θ = −n + (1/θ) Σ_{i=1}^n xi,

giving θ̂ = x̄.
Therefore

Λ = 2n(θ0 − x̄ + x̄ log(x̄/θ0)).
The distribution of Λ under H0 is approximately χ²₁ and χ²₁(0.95) = 3.84, so the critical region of the test is

C1 = {x : 2n(θ0 − x̄ + x̄ log(x̄/θ0)) ≥ 3.84}.
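A minimal sketch of this test in code; the counts and θ0 below are illustrative:

```python
# Likelihood ratio test of H0: theta = theta0 for Poisson counts, using
# Lambda = 2n(theta0 - xbar + xbar*log(xbar/theta0)) ~ chi^2_1 under H0.
import numpy as np
from scipy import stats

x = np.array([3, 5, 2, 4, 6, 3, 4, 5, 2, 4])   # illustrative counts
theta0 = 3.0
n, xbar = len(x), x.mean()

lam = 2 * n * (theta0 - xbar + xbar * np.log(xbar / theta0))
print(f"Lambda = {lam:.3f}, p-value = {stats.chi2.sf(lam, df=1):.3f}, "
      f"reject at 5%: {lam >= 3.84}")
```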
8.3 Testing goodness-of-fit for discrete distributions
The data below were collected by the ecologist E.C. Pielou, who was interested in the pattern of healthy and diseased trees. The subject of her research was Armillaria root rot in a plantation of Douglas firs. She recorded
the lengths of 109 runs of diseased trees and these are given below.
Run length        1   2   3   4   5   6
Number of runs   71  28   5   2   2   1
On biological grounds, Pielou proposed a geometric distribution as a probability model. Is this plausible?
Let’s try to answer this by first looking at the general case.
Suppose we have k groups with ni in the ith group. Thus

Group     1    2    3    4   · · ·   k
Number   n1   n2   n3   n4   · · ·  nk

where Σ_i ni = n.
Suppose further that we have a probability model such that πi(θ), i = 1, 2, . . . , k, is the probability of being in the ith group. Clearly Σ_i πi(θ) = 1.
The likelihood is

L(θ) = n! ∏_{i=1}^k πi(θ)^{ni} / ni!

and the log-likelihood is

l(θ) = Σ_{i=1}^k ni log πi(θ) + log n! − log ∏_{i=1}^k ni!.
Suppose θ̂ maximises l(θ), being the solution of l′(θ) = 0.
The general alternative is to take the πi as unrestricted by the model and subject only to Σ_i πi = 1. Thus we maximise
l(π) = Σ_{i=1}^k ni log πi + log n! − log ∏_{i=1}^k ni!

with

g(π) = Σ_i πi = 1.

Using Lagrange multiplier γ we obtain the set of k equations

∂l/∂πi − γ ∂g/∂πi = 0,   1 ≤ i ≤ k,
or

ni/πi − γ = 0,   1 ≤ i ≤ k.

Writing this as

ni − γπi = 0,   1 ≤ i ≤ k,

and summing over i we find γ = n and

π̂i = ni/n.
The likelihood ratio statistic is

Λ = 2(Σ_{i=1}^k ni log(ni/n) − Σ_{i=1}^k ni log πi(θ̂))
  = 2 Σ_{i=1}^k ni log(ni/(nπi(θ̂))).
General statement of asymptotic result for the likelihood ratio statistic
Testing H0 : θ ∈ Θ0 ⊂ Θ against H1 : θ ∈ Θ, the likelihood ratio statistic

Λ = 2(sup_{θ∈Θ} l(θ) − sup_{θ∈Θ0} l(θ)) →ᴰ χ²_p,

where

p = dim Θ − dim Θ0.
In the general case above where

Λ = 2 Σ_{i=1}^k ni log(ni/(nπi(θ̂))),

the restriction Σ_{i=1}^k πi = 1 means that dim Θ = k − 1. Clearly dim Θ0 = 1 (a single parameter θ), so p = k − 2 and

Λ →ᴰ χ²_{k−2}.
Example Pielou’s data
These are
Run length        1   2   3   4   5   6
Number of runs   71  28   5   2   2   1
and Pielou proposed a geometric model with p.m.f.

p(x) = (1 − θ)^{x−1} θ,   x = 1, 2, . . .

where x is the observed run length. Thus, if xj, 1 ≤ j ≤ n, are the observed run lengths, the log-likelihood for Pielou's model is

l(θ) = Σ_{j=1}^n (xj − 1) log(1 − θ) + n log θ

and, maximising,

∂l(θ)/∂θ = −(Σ_{j=1}^n xj − n)/(1 − θ) + n/θ = 0,

which gives

θ̂ = 1/x̄.

By the invariance property of m.l.e.'s,

πi(θ̂) = (1 − θ̂)^{i−1} θ̂ = (x̄ − 1)^{i−1}/x̄^i.
The data give x̄ = 1.523. We can therefore use the expression for πi(θ̂) to calculate

Λ = 2 Σ_{i=1}^k ni log(ni/(nπi(θ̂))) = 3.547.

There are six groups, so p = 6 − 1 − 1 = 4.
The approximate distribution of Λ is therefore χ²₄ and

P(Λ ≥ 3.547) = 0.471.
There is no evidence against Pielou’s conjecture that a geometric distribution
is an appropriate model.
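The calculation is easy to reproduce. One detail is assumed here: matching Λ = 3.547 requires treating the last cell as run length ≥ 6, so that its probability is the geometric tail (1 − θ̂)⁵ and the six cell probabilities sum to one; the notes do not state this grouping explicitly.

```python
# Reproduce the goodness-of-fit test for Pielou's run-length data.
import numpy as np
from scipy import stats

counts = np.array([71, 28, 5, 2, 2, 1])
n = counts.sum()                                 # 109 runs
xbar = (np.arange(1, 7) * counts).sum() / n      # 1.523
theta = 1 / xbar                                 # m.l.e. of theta

pi = (1 - theta) ** np.arange(5) * theta         # P(X = 1), ..., P(X = 5)
pi = np.append(pi, (1 - theta) ** 5)             # last cell: P(X >= 6), assumed

lam = 2 * (counts * np.log(counts / (n * pi))).sum()
print(f"Lambda = {lam:.3f}, P(chi2_4 >= Lambda) = {stats.chi2.sf(lam, df=4):.3f}")
# Lambda ~ 3.55 and p-value ~ 0.47, as in the text.
```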
Example Two-way contingency table
Data are obtained by cross-classifying a fixed number of individuals according
to two criteria. They are therefore displayed as nij in a table with r rows
and c columns as follows.
n11   · · ·   n1c   n1.
 ⋮            ⋮     ⋮
nr1   · · ·   nrc   nr.
n.1   · · ·   n.c   n
The aim is to investigate the independence of the two classifications.
Suppose the kth individual goes into cell (Xk , Yk ), k = 1, 2, . . . , n, and that
individuals are independent. Let
P ((Xk , Yk ) = (i, j )) = θij , i = 1, 2, . . . , r; j = 1, 2, . . . , c,
where Σ_{i,j} θij = 1. The null hypothesis of independence of classifiers can be written H0 : θij = φi ρj.
This is on Problem Sheet 4 so here are a few hints.
The likelihood function is

L(θ) = n! ∏_{i,j} θij^{nij} / nij!

so the log-likelihood is

l(θ) = Σ_{i,j} nij log θij + log n! − Σ_{i,j} log nij!.

Under H0, put θij = φi ρj and maximise with respect to φi and ρj subject to Σ_i φi = 1 = Σ_j ρj. You will obtain

φ̂i = ni./n,   ρ̂j = n.j/n.

Under H1, maximise with respect to θij subject to Σ_{i,j} θij = 1. You will obtain

θ̂ij = nij/n

and, finally,

Λ = 2 Σ_{i=1}^r Σ_{j=1}^c nij log(nij n/(ni. n.j)).

Example An historic data set - crime and drinking
These are Pearson’s 1909 data on crime and drinking.
Crime      Drinker  Abstainer
Arson         50        43
Rape          88        62
Violence     155       110
Stealing     379       300
Coining       18        14
Fraud         63       144
Is crime related to drinking?
For these data, Λ = 50.52.
Under H0, Λ ∼ χ²_p approximately, where p = dim Θ − dim Θ0. In the notation used earlier, there are apparently 6 values of φi to estimate, but in fact there are only 5 values because Σ_i φi = 1. Similarly there is 2 − 1 = 1 value of ρj. Thus dim Θ0 = 6.
Because Σ_{i,j} θij = 1, dim Θ = 12 − 1 = 11 so, therefore, p = 11 − 6 = 5.
Testing against a χ²-distribution with 5 degrees of freedom, note that the 0.9999 quantile is 25.75, so we can reject at the 0.0001 level of significance. There is overwhelming evidence that crime and drink are related.
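A sketch of the computation for this table; the margins ni., n.j and the total n are formed from the data:

```python
# Likelihood ratio statistic for independence in an r x c table:
# Lambda = 2 * sum_ij n_ij * log(n_ij * n / (n_i. * n_.j)).
import numpy as np
from scipy import stats

table = np.array([[50, 43], [88, 62], [155, 110],
                  [379, 300], [18, 14], [63, 144]])
n = table.sum()
fitted = np.outer(table.sum(axis=1), table.sum(axis=0)) / n   # n_i. n_.j / n

lam = 2 * (table * np.log(table / fitted)).sum()
print(f"Lambda = {lam:.2f}, p-value = {stats.chi2.sf(lam, df=5):.2e}")
# Lambda ~ 50.5 on 5 degrees of freedom: overwhelming evidence of association.
```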
Degrees of freedom
It is clear from the above that, when testing contingency tables, the number of degrees of freedom of the resulting χ²-distribution is given, in general, by

p = rc − 1 − (r − 1) − (c − 1) = rc − r − c + 1 = (r − 1)(c − 1).
8.4 Pearson's statistic
For testing independence in contingency tables, let Oij be the observed number in cell (i, j ), i = 1, 2, . . . , r; j = 1, 2, . . . , c, and Eij be the expected
number in cell (i, j ). Pearson’s statistic is
P = Σ_{i,j} (Oij − Eij)²/Eij ∼ χ²_{(r−1)(c−1)}.
The expected number Eij in cell (i, j ) is calculated under the null hypothesis
of independence.
If ni. is the total for the ith row and the overall total is n, then the probability
of an observation being in the ith row is estimated by
P(ith row) = ni./n.

Similarly

P(jth column) = n.j/n

and

Eij = n × P(ith row) × P(jth column) = ni. n.j / n.
Example Crime and drinking
These are the data on crime and drinking with the row and column totals.
Crime      Drinker  Abstainer  Total
Arson         50        43       93
Rape          88        62      150
Violence     155       110      265
Stealing     379       300      679
Coining       18        14       32
Fraud         63       144      207
Total        753       673     1426
The Eij are easily calculated:

E11 = 93 × 753/1426 = 49.11, and so on.
Pearson’s statistic turns out to be P = 49.73, which is tested against a χ2distribution with (6 − 1) × (2 − 1) = 5 degrees of freedom and the conclusion
is, of course, the same as before.
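scipy's chi2_contingency carries out exactly this computation, returning the statistic, p-value, degrees of freedom and the table of Eij (a sketch; correction=False switches off the Yates continuity correction, which in any case only applies to 2 × 2 tables):

```python
# Pearson's chi-squared test of independence for the crime/drinking data.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[50, 43], [88, 62], [155, 110],
                  [379, 300], [18, 14], [63, 144]])
stat, p_value, dof, expected = chi2_contingency(table, correction=False)

print(f"P = {stat:.2f}, df = {dof}, p-value = {p_value:.2e}")
print(expected[0])   # first row of E_ij: [49.11, 43.89]
```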
8.4.1 Pearson’s statistic and the likelihood ratio statistic
P = Σ_{i,j} (Oij − Eij)²/Eij = Σ_{i,j} (nij − ni.n.j/n)² / (ni.n.j/n).

Consider the Taylor expansion of x log(x/a) about x = a:

x log(x/a) = (x − a) + (x − a)²/(2a) − (x − a)³/(6a²) + · · ·

Now put x = nij and a = ni.n.j/n so that

nij log(nij n/(ni. n.j)) = (nij − ni.n.j/n) + (nij − ni.n.j/n)²/(2ni.n.j/n) + · · ·

Thus, summing over i and j and noting that Σ_{i,j} nij = n = Σ_{i,j} ni.n.j/n,

Σ_{i,j} nij log(nij n/(ni. n.j)) = n − n + ½ Σ_{i,j} (Oij − Eij)²/Eij + · · · = ½ P + · · ·

or

Λ ≈ P.

For the crime and drinking data, for instance, Λ = 50.52 and P = 49.73.