Department of Statistics
Notes
Mathematical Statistics
WST 221
© Copyright reserved
CONTENTS

CHAPTER 7: RANDOM VARIABLES AND THEIR DISTRIBUTIONS
7.1  INTRODUCTION                                              196
7.2  SEQUENCES OF RANDOM VARIABLES                             196
7.3  THE CENTRAL LIMIT THEOREM                                 203
7.4  APPROXIMATIONS FOR THE BINOMIAL DISTRIBUTION              206
7.5  ASYMPTOTIC NORMAL DISTRIBUTIONS                           208
7.6  STOCHASTIC CONVERGENCE                                    208
7.7  ADDITIONAL LIMIT THEOREMS                                 211
7.8  ASYMPTOTIC DISTRIBUTIONS OF EXTREME ORDER STATISTICS      214

CHAPTER 8: STATISTICS AND SAMPLING DISTRIBUTIONS
8.1  INTRODUCTION                                              215
8.2  STATISTICS                                                215
8.3  SAMPLING DISTRIBUTIONS                                    218
8.4  THE t, F AND BETA DISTRIBUTIONS                           226
8.5  LARGE SAMPLE APPROXIMATIONS                               238

CHAPTER 9: POINT ESTIMATION
9.1  INTRODUCTION                                              239
9.2  SOME METHODS OF ESTIMATION                                240
9.3  CRITERIA FOR EVALUATING ESTIMATORS                        253
9.4  LARGE SAMPLE PROPERTIES                                   267
9.5  BAYES AND MINIMAX ESTIMATORS                              272

CHAPTER 10: SUFFICIENCY AND COMPLETENESS
10.1 INTRODUCTION                                              276
10.2 SUFFICIENT STATISTICS                                     277
10.3 FURTHER PROPERTIES OF SUFFICIENT STATISTICS               284
10.4 COMPLETENESS AND THE EXPONENTIAL CLASS                    287

CHAPTER 11: INTERVAL ESTIMATION
11.1 INTRODUCTION                                              293
11.2 CONFIDENCE INTERVALS                                      294
11.3 PIVOTAL QUANTITY METHOD                                   297
11.4 GENERAL METHOD                                            302
11.5 CONFIDENCE INTERVALS FOR SOME STANDARD PROBLEMS           309

CHAPTER 12: TESTS OF HYPOTHESES
12.1 INTRODUCTION                                              317
12.2 P-VALUES AND THE USE OF CONFIDENCE INTERVALS FOR TESTING HYPOTHESES   331
12.3 TESTS OF HYPOTHESES FOR SOME STANDARD PROBLEMS            333
CHAPTER 7 : LIMITING DISTRIBUTIONS

7.1 INTRODUCTION

In many cases the exact distribution of a function of random variables is difficult to derive, or is awkward to work with even when it is known. For large samples there are many situations in which a very good approximate distribution can be found and used with little effort.
7.2 SEQUENCES OF RANDOM VARIABLES

Definition 7.2.1 :
Suppose that $Y_1, Y_2, Y_3, \dots$ is a sequence of random variables with corresponding distribution functions $G_1, G_2, G_3, \dots$, i.e. $P[Y_n \le y] = G_n(y)$ for all values of $n$ and for all values of $y$.
If $Y$ is a random variable with distribution function $G$ and $\lim_{n\to\infty} G_n(y) = G(y)$ for all values of $y$ where $G$ is continuous, we say that $Y_1, Y_2, Y_3, \dots$ converges in distribution to $Y$ and use the notation $Y_n \overset{d}{\to} Y$. The distribution function $G$ is then called the limiting distribution of $Y_n$.
EXAMPLE 7.2.1 :
Suppose that $X_1, X_2, X_3, \dots, X_n$ is a random sample from a population which is uniformly distributed over the interval $(0,1)$.
Let $Y_{n:n} = \max\{X_1, X_2, X_3, \dots, X_n\}$. Then the distribution function of $Y_{n:n}$ is given by
$$G_n(y) = \{F(y)\}^n \quad \text{by th.6.5.3a} \quad = \begin{cases} 0 & \text{for } y \le 0 \\ y^n & \text{for } 0 < y < 1, \text{ since } F(y) = y \text{ for } 0 < y < 1 \\ 1 & \text{for } y \ge 1. \end{cases}$$
Therefore
$$\lim_{n\to\infty} G_n(y) = \begin{cases} 0 & \text{for } y < 1 \\ 1 & \text{for } y \ge 1. \end{cases}$$
Hence $\lim_{n\to\infty} G_n(y)$ is the distribution function of a random variable $Y$ such that $P[Y = 1] = 1$.
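The limiting behaviour in Example 7.2.1 is easy to check by simulation. The sketch below is not part of the original notes; it assumes NumPy is available, and shows that the maximum of $n$ Uniform(0,1) observations piles up near 1 as $n$ grows.

import numpy as np

rng = np.random.default_rng(0)

# Empirical check of Example 7.2.1: the maximum of n Uniform(0,1)
# observations converges in distribution to the constant 1.
for n in [5, 50, 500, 5000]:
    maxima = rng.uniform(0.0, 1.0, size=(10_000, n)).max(axis=1)
    # P[Y_{n:n} <= 0.99] should behave like 0.99**n -> 0
    print(n, maxima.mean().round(4), np.mean(maxima <= 0.99).round(4), round(0.99 ** n, 4))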
Definition 7.2.2 :
The distribution function $G(y)$ of the random variable $Y$ is called degenerate at the point $c$ if
$$G(y) = \begin{cases} 0 & \text{for } y < c \\ 1 & \text{for } y \ge c \end{cases}$$
i.e. if $P[Y = c] = 1$.
EXAMPLES :

Example 7.2.2 :
Suppose that $X_1, X_2, X_3, \dots, X_n$ is a random sample from a population which is exponentially distributed with parameter $\theta$, i.e. the distribution function is
$$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 - e^{-x/\theta} & \text{for } x \ge 0 \end{cases}$$
for all $X_i$.
Let $Y_{1:n} = \min\{X_1, X_2, X_3, \dots, X_n\}$. Then the distribution function of $Y_{1:n}$ is given by
$$G_n(y) = 1 - \{1 - F(y)\}^n \quad \text{by th.6.5.3a} \quad = \begin{cases} 0 & \text{for } y < 0 \\ 1 - \left[1 - \{1 - e^{-y/\theta}\}\right]^n = 1 - e^{-ny/\theta} & \text{for } y \ge 0. \end{cases}$$
Therefore
$$\lim_{n\to\infty} G_n(y) = \begin{cases} 0 & \text{for } y \le 0 \\ 1 & \text{for } y > 0. \end{cases}$$
Note that $\lim_{n\to\infty} G_n(y)$ is not the distribution function of a random variable since it is not continuous from the right at $y = 0$. However, if we make the function continuous from the right at 0, i.e. take
$$G(y) = \begin{cases} 0 & \text{for } y < 0 \\ 1 & \text{for } y \ge 0, \end{cases}$$
then $\lim_{n\to\infty} G_n(y) = G(y)$ at all points $y$ where $G$ is continuous and therefore we do get convergence in distribution to $G$.
Example 7.2.3
Suppose that $Y_1, Y_2, Y_3, \dots$ is a sequence of random variables such that $Y_n$ is normally distributed with expected value $\tfrac{1}{n}$ and variance 1. Then
$$G_n(y) = P[Y_n \le y] = P\left[\frac{Y_n - \tfrac{1}{n}}{1} \le \frac{y - \tfrac{1}{n}}{1}\right] = P\left[Z \le y - \tfrac{1}{n}\right] \quad \text{where } Z \sim N(0,1)$$
$$= \Phi\left(y - \tfrac{1}{n}\right)$$
and therefore
$$\lim_{n\to\infty} G_n(y) = \Phi(y)$$
since $\Phi$ is a continuous function.
Hence $Y_n \overset{d}{\to} Z$ where $Z \sim N(0,1)$.
Example 7.2.4
Suppose that $Y_1, Y_2, Y_3, \dots$ is a sequence of random variables such that $Y_n$ is normally distributed with expected value 0 and variance $\left(\tfrac{1}{n}\right)^2$. Then
$$G_n(y) = P[Y_n \le y] = P\left[\frac{Y_n - 0}{\tfrac{1}{n}} \le \frac{y - 0}{\tfrac{1}{n}}\right] = P[Z \le ny] \quad \text{where } Z \sim N(0,1)$$
$$= \Phi(ny).$$
It then follows that
$$\lim_{n\to\infty} G_n(y) = \begin{cases} \Phi(-\infty) & \text{if } y < 0 \\ \Phi(0) & \text{if } y = 0 \\ \Phi(\infty) & \text{if } y > 0 \end{cases} \quad = \begin{cases} 0 & \text{if } y < 0 \\ 0.5 & \text{if } y = 0 \\ 1 & \text{if } y > 0. \end{cases}$$
In this case we also get that $\lim_{n\to\infty} G_n(y)$ is not a distribution function since it is not continuous from the right at 0. If we make the function continuous from the right at 0, i.e. take
$$G(y) = \begin{cases} 0 & \text{for } y < 0 \\ 1 & \text{for } y \ge 0, \end{cases}$$
then $\lim_{n\to\infty} G_n(y) = G(y)$ at all points $y$ where $G$ is continuous and therefore we do get convergence in distribution to $G$.
Definition 7.2.3 :
A sequence of random variables is said to converge stochastically to a constant $c$ if the limiting distribution is degenerate at $c$.

EXAMPLES :

Example 7.2.5 :
Suppose that $Y_1, Y_2, Y_3, \dots$ is a sequence of normally distributed random variables such that $Y_n \sim N(n, 1)$. Then
$$G_n(y) = P[Y_n \le y] = P\left[\frac{Y_n - n}{1} \le \frac{y - n}{1}\right] = P[Z \le y - n] \quad \text{where } Z \sim N(0,1)$$
$$= \Phi(y - n).$$
It then follows that
$$\lim_{n\to\infty} G_n(y) = \Phi(-\infty) = 0 \quad \text{for all } y.$$
In this case there does not exist a distribution function $G$ such that $\lim_{n\to\infty} G_n(y) = G(y)$ for all $y$ where $G$ is continuous, i.e. the sequence $Y_1, Y_2, Y_3, \dots$ does not converge in distribution.
Example 7.2.6
Suppose that $S_n$ is the sum of all the observations in a random sample from a normal population with expected value $\mu$ and variance $\sigma^2$. Then $S_n$ is normally distributed with expected value $n\mu$ and variance $n\sigma^2$ - from Example 6.4.5. Therefore
$$G_n(y) = P[S_n \le y] = P\left[\frac{S_n - n\mu}{\sqrt{n}\,\sigma} \le \frac{y - n\mu}{\sqrt{n}\,\sigma}\right] = P\left[Z \le \frac{y - n\mu}{\sqrt{n}\,\sigma}\right] \quad \text{where } Z \sim N(0,1)$$
$$= \Phi\left(\frac{y - n\mu}{\sqrt{n}\,\sigma}\right).$$
It then follows that
$$\lim_{n\to\infty} G_n(y) = \begin{cases} \Phi(-\infty) = 0 & \text{if } \mu > 0, \text{ for all } y \\ \Phi(0) = 0.5 & \text{if } \mu = 0, \text{ for all } y \\ \Phi(\infty) = 1 & \text{if } \mu < 0, \text{ for all } y. \end{cases}$$
In none of these cases does there exist a distribution function $G$ such that $\lim_{n\to\infty} G_n(y) = G(y)$, i.e. there does not exist a limiting distribution for $S_n$.
Now let $Y_n = \dfrac{S_n - n\mu}{\sqrt{n}\,\sigma}$. Then
$$G_n(y) = P\left[\frac{S_n - n\mu}{\sqrt{n}\,\sigma} \le y\right] = P[Z \le y] \quad \text{where } Z \sim N(0,1)$$
$$= \Phi(y),$$
i.e. $\lim_{n\to\infty} G_n(y) = \Phi(y)$ for all $y$, i.e. $Y_n \overset{d}{\to} Z$ where $Z \sim N(0,1)$.
Note that although $S_n$ itself does not converge in distribution, a linear function of $S_n$, namely $Y_n$, does converge in distribution. To transform to $Y_n$ we subtracted the expected value and divided by the standard deviation, i.e. we obtained a random variable with expected value 0 and variance 1. It turns out in many cases that if we standardise random variables in this way we do obtain a limiting distribution.
Theorem 7.2.1a :
Suppose that $c$ and $b$ are constants and $d(n)$ a function of $n$ such that $\lim_{n\to\infty} d(n) = 0$. Then
$$\lim_{n\to\infty}\left[1 + \frac{b}{n} + \frac{d(n)}{n}\right]^{cn} = e^{cb}.$$
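As a quick sanity check of Theorem 7.2.1a (not part of the original notes; plain Python, no extra libraries), the expression is evaluated below for growing $n$ with the illustrative choices $b = 2$, $c = 3$ and $d(n) = 1/\sqrt{n}$:

import math

b, c = 2.0, 3.0                      # illustrative constants
d = lambda n: 1 / math.sqrt(n)       # any d(n) -> 0 will do

for n in [10, 100, 10_000, 1_000_000]:
    value = (1 + b / n + d(n) / n) ** (c * n)
    print(n, round(value, 6), "-> limit", round(math.exp(c * b), 6))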
EXAMPLES :

Example 7.2.7
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a population with a Pareto distribution with parameters 1 and 1. Then
$$f_{X_i}(x) = \begin{cases} (1 + x)^{-2} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases}$$
and
$$F_{X_i}(x) = \int_{-\infty}^{x} f_{X_i}(t)\,dt = \begin{cases} 0 & \text{for } x \le 0 \\ \int_0^x (1 + t)^{-2}\,dt = \left[-(1 + t)^{-1}\right]_0^x = 1 - (1 + x)^{-1} & \text{for } x > 0. \end{cases}$$
Let $Y_{1:n} = \min\{X_1, X_2, \dots, X_n\}$. Then
$$F_{Y_{1:n}}(y) = 1 - \left[1 - F_X(y)\right]^n \quad \text{where } F_X \text{ is the distribution function of all the } X_i\text{'s}$$
$$= \begin{cases} 1 - [1 - 0]^n = 0 & \text{for } y \le 0 \\ 1 - \left[1 - \{1 - (1 + y)^{-1}\}\right]^n = 1 - (1 + y)^{-n} & \text{for } y > 0, \end{cases} \tag{7.2.3}$$
i.e.
$$\lim_{n\to\infty} F_{Y_{1:n}}(y) = \begin{cases} 0 & \text{for all } y \le 0 \\ 1 & \text{for all } y > 0, \end{cases}$$
i.e. $Y_{1:n}$ converges in distribution to a degenerate distribution.
Now let $Z_n = nY_{1:n}$. Then
$$G_n(y) = P[Z_n \le y] = P[nY_{1:n} \le y] = P[Y_{1:n} \le y/n] = F_{Y_{1:n}}(y/n)$$
$$= \begin{cases} 0 & \text{for } y \le 0 \\ 1 - \left[1 + y/n\right]^{-n} & \text{for } y > 0 \end{cases} \quad \text{from (7.2.3)},$$
i.e.
$$\lim_{n\to\infty} G_n(y) = \begin{cases} 0 & \text{for } y \le 0 \\ 1 - e^{-y} & \text{for } y > 0, \text{ from th.7.2.1a,} \end{cases}$$
i.e. $Z_n$ converges in distribution to an exponential distribution with parameter 1.
Example 7.2.8
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a population with an exponential distribution with parameter $\theta$.
Let $Y_{n:n} = \max\{X_1, X_2, \dots, X_n\}$. Then
$$F_{Y_{n:n}}(y) = \{F(y)\}^n \quad \text{from th.6.5.3a} \quad = \begin{cases} \{0\}^n & \text{for } y < 0 \\ \{1 - e^{-y/\theta}\}^n & \text{for } y \ge 0. \end{cases} \tag{7.2.4}$$
Let $Z_n = Y_{n:n}/\theta - \ln n$. Then
$$G_n(y) = P[Z_n \le y] = P\left[\frac{Y_{n:n}}{\theta} - \ln n \le y\right] = P[Y_{n:n} \le \theta(y + \ln n)] = F_{Y_{n:n}}\left[\theta(y + \ln n)\right]$$
$$= \left[1 - e^{-\frac{1}{\theta}\theta(y + \ln n)}\right]^n \quad \text{for } n \text{ big enough that } y + \ln(n) > 0$$
$$= \left[1 - \frac{e^{-y}}{n}\right]^n,$$
i.e.
$$\lim_{n\to\infty} G_n(y) = e^{-e^{-y}} \quad \text{for all } y.$$
In the examples above $G_n(y) = P[Y_n \le y]$ is known for all values of $n$. If $\lim_{n\to\infty} G_n(y) = G(y)$, then $G(y)$ is an approximation of $G_n(y)$ for large $n$. How good this approximation is will differ from case to case and will depend on how big $n$ is. In general the approximation will be good enough provided $n$ is big enough. See for instance Bain and Engelhardt p.236.
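A small simulation makes the limit in Example 7.2.8 concrete. The sketch below is not from the notes; it assumes NumPy is available and takes $\theta = 1$, comparing the empirical distribution of $Z_n = Y_{n:n}/\theta - \ln n$ with the limiting value $e^{-e^{-y}}$ at a few points.

import math
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 1.0, 1000, 5_000

# Z_n = max(X_1,...,X_n)/theta - ln n for exponential(theta) samples
samples = rng.exponential(theta, size=(reps, n))
z = samples.max(axis=1) / theta - math.log(n)

for y in (-1.0, 0.0, 1.0, 2.0):
    empirical = np.mean(z <= y)
    limit = math.exp(-math.exp(-y))     # Gumbel limit e^{-e^{-y}}
    print(f"y={y:+.1f}  empirical={empirical:.4f}  limit={limit:.4f}")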
EXERCISES :
Bain and Engelhardt p.259 No. 1 and p.260 No. 3.
7.3 THE CENTRAL LIMIT THEOREM

Theorem 7.3.1
Suppose that $Y_1, Y_2, Y_3, \dots$ is a sequence of random variables with distribution functions $G_1, G_2, G_3, \dots$ and moment generating functions $M_1, M_2, M_3, \dots$. Suppose that $Y$ is a random variable with distribution function $G$ and moment generating function $M$. If $\lim_{n\to\infty} M_n(t) = M(t)$ for all $t$ in an interval $-h < t < h$ where $h > 0$, then
$$\lim_{n\to\infty} G_n(y) = G(y) \text{ for all } y \text{ where } G \text{ is continuous, i.e. } Y_n \overset{d}{\to} Y.$$
Proof : We will not give a proof of this theorem.
EXAMPLES :

Example 7.3.1
Suppose that $Y_1, Y_2, Y_3, \dots$ is a sequence of binomial random variables with parameters $n$ and $\mu/n$. Then $E[Y_n] = \mu$ for all values of $n$. Let $M_n(t)$ be the moment generating function of $Y_n$. Then
$$M_n(t) = \left[\frac{\mu}{n}e^t + 1 - \frac{\mu}{n}\right]^n = \left[1 + \frac{\mu(e^t - 1)}{n}\right]^n,$$
i.e. $\lim_{n\to\infty} M_n(t) = e^{\mu(e^t - 1)}$ for all values of $t$.
Since this limiting moment generating function is that of a Poisson random variable with parameter $\mu$, it follows from th.7.3.1 that $Y_n \overset{d}{\to} Y$ where $Y$ is a Poisson random variable with parameter $\mu$.
Example 7.3.2
Suppose that $Y_n$ is the number of successes in $n$ independent trials where only success or not-success is observed and the probability of success is $p$ for all trials, i.e. $Y_n$ is a binomial random variable with parameters $n$ and $p$.
Let $W_n = Y_n/n$, i.e. $W_n$ is the proportion of successes in $n$ trials. Hence
$$M_{W_n}(t) = E\left[e^{tW_n}\right] = E\left[e^{\frac{t}{n}Y_n}\right] = M_{Y_n}(t/n) = \left[p e^{t/n} + 1 - p\right]^n.$$
Let $f(t) = e^t$. Then $f(t) = e^t$ with $f(0) = 1$, $f'(t) = e^t$ with $f'(0) = 1$, and $f''(t) = e^t$, so that by Taylor's theorem
$$e^t = 1 + t + \frac{t^2}{2}e^{\xi} \quad \text{where } \xi \text{ is between } 0 \text{ and } t,$$
or
$$e^{t/n} = 1 + \frac{t}{n} + \frac{t^2}{2n^2}e^{\xi_n} \quad \text{where } \xi_n \text{ is between } 0 \text{ and } t/n.$$
Hence
$$M_{W_n}(t) = \left[p\left(1 + \frac{t}{n} + \frac{t^2}{2n^2}e^{\xi_n}\right) + 1 - p\right]^n = \left[1 + \frac{pt}{n} + \frac{d(n)}{n}\right]^n$$
where $d(n) = p\,\dfrac{t^2}{2n}e^{\xi_n} \to 0$ as $n \to \infty$ for all values of $t$.
Therefore $\lim_{n\to\infty} M_{W_n}(t) = e^{pt}$ for all values of $t$.
But $e^{pt}$ is the moment generating function of a random variable which is equal to $p$ with probability 1. From th.7.3.1 it then follows that the proportion of successes converges stochastically to $p$.
Theorem 7.3.2 THE CENTRAL LIMIT THEOREM
Suppose that $X_1, X_2, X_3, \dots, X_n$ is a random sample from a population with expected value $\mu$ and variance $\sigma^2$. Let
$$Z_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n}\,\sigma}.$$
Then $Z_n \overset{d}{\to} Z$ where $Z$ is a standard normal random variable.
Proof:
For the proof we also assume that the moment generating function of the $X_i$'s exists. Let $m(t)$ be the moment generating function of $X_i - \mu$. Then $m(0) = 1$, $m'(0) = E[X_i - \mu] = 0$ and $m''(0) = E\left[(X_i - \mu)^2\right] = \sigma^2$. Hence, by Taylor's theorem,
$$m(t) = m(0) + m'(0)t + m''(\xi)\frac{t^2}{2} \quad \text{where } \xi \text{ is between } 0 \text{ and } t$$
$$= 1 + m''(\xi)\frac{t^2}{2} = 1 + \frac{\sigma^2 t^2}{2} + \left\{m''(\xi) - \sigma^2\right\}\frac{t^2}{2}.$$
Note that
$$Z_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n}\,\sigma} = \sum_{i=1}^{n}\frac{X_i - \mu}{\sqrt{n}\,\sigma}$$
and therefore
$$M_{Z_n}(t) = E\left[\exp\left(t\sum_{i=1}^{n}\frac{X_i - \mu}{\sqrt{n}\,\sigma}\right)\right] = E\left[\exp\left(t\frac{X_1 - \mu}{\sqrt{n}\,\sigma}\right)\right]E\left[\exp\left(t\frac{X_2 - \mu}{\sqrt{n}\,\sigma}\right)\right]\cdots E\left[\exp\left(t\frac{X_n - \mu}{\sqrt{n}\,\sigma}\right)\right] \quad \text{since the } X_i\text{'s are independent}$$
$$= \prod_{i=1}^{n} m\left(\frac{t}{\sqrt{n}\,\sigma}\right) = \left[1 + \frac{\sigma^2}{2}\frac{t^2}{n\sigma^2} + \left\{m''(\xi_n) - \sigma^2\right\}\frac{t^2}{2n\sigma^2}\right]^n \quad \text{where } \xi_n \text{ is between } 0 \text{ and } \frac{t}{\sqrt{n}\,\sigma}$$
$$= \left[1 + \frac{t^2/2}{n} + \frac{d(n)}{n}\right]^n$$
where $d(n) = \left\{m''(\xi_n) - \sigma^2\right\}\dfrac{t^2}{2\sigma^2} \to 0$ as $n \to \infty$, since $\xi_n \to 0$ and $m''(\xi_n) \to \sigma^2$.
Therefore $\lim_{n\to\infty} M_{Z_n}(t) = e^{\frac{1}{2}t^2}$, which is the moment generating function of a standard normal distribution.
From th.7.3.1 it then follows that $Z_n \overset{d}{\to} Z$ where $Z$ is a standard normal random variable.
For large values of $n$, $Z_n = \dfrac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n}\,\sigma}$ is therefore approximately distributed like a standard normal random variable. ∎
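The following simulation sketch is not part of the notes; it assumes NumPy is available and illustrates Theorem 7.3.2 for a markedly non-normal population, the exponential distribution with θ = 1, by standardising sample sums and comparing a tail probability with the standard normal value Φ(1.645) ≈ 0.95.

import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 1.0, 1.0          # mean and standard deviation of the exponential(1) population
reps = 50_000

for n in (2, 10, 100):
    sums = rng.exponential(1.0, size=(reps, n)).sum(axis=1)
    z_n = (sums - n * mu) / (np.sqrt(n) * sigma)
    # P[Z_n <= 1.645] should approach Phi(1.645) = 0.95 as n grows
    print(n, np.mean(z_n <= 1.645).round(4))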
EXERCISES :
Bain and Engelhardt p.260 Nos. 9 and 11.
7.4 APPROXIMATIONS FOR THE BINOMIAL DISTRIBUTION

Suppose that $Y_n$ is a binomial random variable with parameters $n$ and $p$. It then follows that $Z_n = \dfrac{Y_n - np}{\sqrt{npq}}$ is approximately distributed like a standard normal random variable. How good this approximation is will depend on $p$ and on how big $n$ is. In general a reasonable approximation will be obtained if $np \ge 5$ and $nq \ge 5$.

EXAMPLES :

Example 7.4.1
Suppose that $Y_n$ is a binomial random variable with parameters $n = 20$ and $p = 0.5$. Then
$$P[Y_{20} > 8] = 1 - P[Y_{20} \le 8] = 1 - \sum_{y=0}^{8}\binom{20}{y}(0.5)^y(1 - 0.5)^{20-y} = 1 - 0.2517 \quad \text{from Table 1, p.601, Bain and Engelhardt}$$
$$= 0.7483.$$
If we use the normal approximation we obtain
$$P[Y_{20} > 8] = 1 - P[Y_{20} \le 8] = 1 - P\left[\frac{Y_{20} - 20 \times 0.5}{\sqrt{20 \times 0.5 \times 0.5}} \le \frac{8 - 20 \times 0.5}{\sqrt{20 \times 0.5 \times 0.5}}\right]$$
$$\approx 1 - P[Z \le -0.89] \quad \text{where } Z \sim N(0,1)$$
$$= 1 - (1 - 0.8133) = 0.8133.$$
A better approximation can be obtained by using the so-called correction for continuity, since we are approximating a binomial distribution, which is discrete, with a normal distribution, which is continuous. Suppose that $Y_n \sim \mathrm{BIN}(n, p)$. Then
$$P[a \le Y_n \le b] \approx P[a - 0.5 \le Y_n \le b + 0.5] = P\left[\frac{a - 0.5 - np}{\sqrt{npq}} \le \frac{Y_n - np}{\sqrt{npq}} \le \frac{b + 0.5 - np}{\sqrt{npq}}\right]$$
$$\approx P\left[\frac{a - 0.5 - np}{\sqrt{npq}} \le Z \le \frac{b + 0.5 - np}{\sqrt{npq}}\right] \quad \text{where } Z \sim N(0,1)$$
$$= \Phi\left(\frac{b + 0.5 - np}{\sqrt{npq}}\right) - \Phi\left(\frac{a - 0.5 - np}{\sqrt{npq}}\right).$$

Example 7.4.2
The same as Example 7.4.1, i.e. $Y_{20} \sim \mathrm{BIN}(20, 0.5)$. With the correction for continuity we get that
$$P[Y_{20} > 8] = 1 - P[Y_{20} \le 8] \approx 1 - P[Y_{20} \le 8.5] = 1 - P\left[\frac{Y_{20} - 20 \times 0.5}{\sqrt{20 \times 0.5 \times 0.5}} \le \frac{8.5 - 20 \times 0.5}{\sqrt{20 \times 0.5 \times 0.5}}\right]$$
$$\approx 1 - P\left[Z \le \frac{8.5 - 20 \times 0.5}{\sqrt{20 \times 0.5 \times 0.5}}\right] \quad \text{where } Z \sim N(0,1)$$
$$= 1 - (1 - 0.7486) = 0.7486,$$
and this approximation is almost precisely the correct probability.
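To see the effect of the continuity correction numerically, the sketch below (not from the notes; it assumes SciPy is available) recomputes Examples 7.4.1 and 7.4.2 with and without the correction.

from scipy.stats import binom, norm

n, p = 20, 0.5
mean, sd = n * p, (n * p * (1 - p)) ** 0.5

exact = 1 - binom.cdf(8, n, p)                       # P[Y > 8], exact
plain = 1 - norm.cdf((8 - mean) / sd)                # normal approximation
corrected = 1 - norm.cdf((8.5 - mean) / sd)          # with continuity correction

print(f"exact={exact:.4f}  plain={plain:.4f}  corrected={corrected:.4f}")
# exact is about 0.748, plain about 0.814, corrected about 0.749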
EXERCISES :
Bain and Engelhardt p.260 No. 12.
7.5 ASYMPTOTIC NORMAL DISTRIBUTIONS

Definition 7.5.1 :
If $Y_1, Y_2, \dots$ is a sequence of random variables and $m$ and $c$ are constants such that
$$Z_n = \frac{Y_n - m}{c/\sqrt{n}} \overset{d}{\to} Z$$
where $Z$ is a standard normal random variable, then $Y_n$ is said to have an asymptotic normal distribution with asymptotic mean $m$ and asymptotic variance $c^2/n$.

EXAMPLE 7.5.1
Suppose that a random sample of size $n = 40$ is selected from a population with an exponential distribution with parameter 100. By the central limit theorem
$$\frac{\sum_{i=1}^{n} X_i - n \times 100}{\sqrt{n} \times 100} = \frac{\bar{X}_n - 100}{100/\sqrt{n}}$$
converges in distribution to a standard normal random variable, i.e. $\bar{X}_{40}$ has an asymptotic normal distribution with asymptotic mean 100 and asymptotic variance $100^2/40 = 250$.
7.6 STOCHASTIC CONVERGENCE

Definition 7.6.1 :
A sequence of random variables $Y_1, Y_2, \dots$ is said to converge stochastically to a constant $c$ if it has a limiting distribution that is degenerate at the point $c$, i.e. if $G_n$ is the distribution function of $Y_n$ then
$$\lim_{n\to\infty} G_n(y) = \begin{cases} 0 & \text{for } y < c \\ 1 & \text{for } y > c. \end{cases}$$
Theorem 7.6.1
A sequence of random variables $Y_1, Y_2, \dots$ converges stochastically to a constant $c$ if and only if for every $\varepsilon > 0$
$$\lim_{n\to\infty} P[|Y_n - c| < \varepsilon] = 1.$$
Proof :
Suppose that we have stochastic convergence to the constant $c$. For any $\varepsilon > 0$ we have that
$$1 \ge P[|Y_n - c| < \varepsilon] = P[Y_n < c + \varepsilon] - P[Y_n \le c - \varepsilon]$$
$$\ge P[Y_n \le c + \varepsilon/2] - G_n(c - \varepsilon) \quad \text{since } c + \varepsilon/2 < c + \varepsilon$$
$$= G_n(c + \varepsilon/2) - G_n(c - \varepsilon).$$
The limit of the right-hand side is 1, i.e. $\lim_{n\to\infty} P[|Y_n - c| < \varepsilon]$ exists and is equal to 1.
Now suppose that $\lim_{n\to\infty} P[|Y_n - c| < \varepsilon] = 1$ for all $\varepsilon > 0$.
Let $y < c$. Then $\varepsilon = (c - y)/2 > 0$.
Also $y + \varepsilon = y + (c - y)/2 = (c + y)/2$ and $c - \varepsilon = c - (c - y)/2 = (c + y)/2$, i.e. $y + \varepsilon = c - \varepsilon$.
We then have that
$$0 \le G_n(y) = P[Y_n \le y] \le P[Y_n \le y + \varepsilon] = P[Y_n \le c - \varepsilon] \le P[|Y_n - c| \ge \varepsilon] = 1 - P[|Y_n - c| < \varepsilon].$$
Since the right-hand side goes to zero it follows that $\lim_{n\to\infty} G_n(y)$ exists and is equal to 0.
The proof for $y > c$ is similar and is left as an exercise. ∎
Definition 7.6.2 :
If $Y_1, Y_2, \dots$ is a sequence of random variables and $c$ a constant, then we say that the sequence of random variables converges to the constant $c$ in probability if
$$\lim_{n\to\infty} P[|Y_n - c| < \varepsilon] = 1 \text{ for all } \varepsilon > 0,$$
and this is denoted by $Y_n \overset{P}{\to} c$.
EXERCISE 7.6.1 :
Let $\hat{p}_n$ be the proportion of successes in $n$ Bernoulli trials with probability of success $p$. Then $E[\hat{p}_n] = p$ and $\mathrm{var}(\hat{p}_n) = p(1 - p)/n$.
By Chebyshev's inequality we then get that for any $\varepsilon > 0$
$$1 \ge \lim_{n\to\infty} P\left[|\hat{p}_n - p| < \varepsilon\right] = \lim_{n\to\infty}\left\{1 - P\left[|\hat{p}_n - p| \ge \varepsilon\right]\right\} = 1 - \lim_{n\to\infty} P\left[|\hat{p}_n - p| \ge \varepsilon\right] \ge 1 - \lim_{n\to\infty}\frac{p(1 - p)/n}{\varepsilon^2}.$$
Since the limit on the right-hand side is 1, it follows that
$$\lim_{n\to\infty} P\left[|\hat{p}_n - p| < \varepsilon\right] = 1 \quad \text{and therefore } \hat{p}_n \overset{P}{\to} p.$$
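The Chebyshev bound above can be watched in action. The sketch below is not part of the notes; it assumes NumPy is available and tracks how often the sample proportion falls within ε = 0.02 of p as n grows, next to the Chebyshev lower bound.

import numpy as np

rng = np.random.default_rng(3)
p, eps, reps = 0.3, 0.02, 20_000

for n in (100, 1_000, 10_000):
    props = rng.binomial(n, p, size=reps) / n          # sample proportions
    coverage = np.mean(np.abs(props - p) < eps)        # estimates P[|p_hat - p| < eps]
    chebyshev = 1 - p * (1 - p) / (n * eps ** 2)       # Chebyshev lower bound
    print(n, round(coverage, 4), ">=", round(max(chebyshev, 0.0), 4))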
Theorem 7.6.2
Let $X_1, X_2, \dots, X_n$ be a random sample from a population with expected value $\mu$ and variance $\sigma^2$, with sample mean $\bar{X}_n$. Then $\bar{X}_n \overset{P}{\to} \mu$.
Proof :
For any $\varepsilon > 0$ we have that
$$1 \ge \lim_{n\to\infty} P[|\bar{X}_n - \mu| < \varepsilon] = \lim_{n\to\infty}\left\{1 - P[|\bar{X}_n - \mu| \ge \varepsilon]\right\} = 1 - \lim_{n\to\infty} P[|\bar{X}_n - \mu| \ge \varepsilon] \ge 1 - \lim_{n\to\infty}\frac{\sigma^2/n}{\varepsilon^2} \quad \text{from Chebyshev's theorem.}$$
Since the limit on the right-hand side is 1, it follows that
$$\lim_{n\to\infty} P[|\bar{X}_n - \mu| < \varepsilon] = 1 \quad \text{and therefore } \bar{X}_n \overset{P}{\to} \mu. \ ∎$$

Theorem 7.6.3
If $Z_n = \dfrac{Y_n - m}{c/\sqrt{n}} \overset{d}{\to} Z$ where $Z$ is a standard normal random variable, then $Y_n \overset{P}{\to} m$.
7.7 ADDITIONAL LIMIT THEOREMS

Definition 7.7.1 :
A sequence of random variables $Y_1, Y_2, \dots$ is said to converge in probability to a random variable $Y$ if
$$\lim_{n\to\infty} P[|Y_n - Y| < \varepsilon] = 1 \text{ for any } \varepsilon > 0,$$
denoted by $Y_n \overset{P}{\to} Y$.
NOTE: In this definition the "limit" is a random variable and not a constant. In case $Y$ is a random variable which is equal to a constant $m$ with probability 1, the two definitions are the same.
Theorem 7.7.1a
If $Y_n \overset{P}{\to} Y$ then $\lim_{n\to\infty} P[|Y_n - Y| \le \varepsilon] = 1$ for any $\varepsilon > 0$.
Proof :
Let $\varepsilon > 0$. Then
$$1 \ge P[|Y_n - Y| \le \varepsilon] \ge P[|Y_n - Y| < \varepsilon]$$
and letting $n$ tend to infinity it follows that
$$1 \ge \lim_{n\to\infty} P[|Y_n - Y| \le \varepsilon] \ge 1, \quad \text{i.e. } \lim_{n\to\infty} P[|Y_n - Y| \le \varepsilon] = 1.$$
From this it then also follows that
$$\lim_{n\to\infty} P[|Y_n - Y| > \varepsilon] = 0 \quad \text{and} \quad \lim_{n\to\infty} P[|Y_n - Y| \ge \varepsilon] = 0. \ ∎$$
Theorem 7.7.1
If $Y_1, Y_2, \dots$ is a sequence of random variables that converges to a random variable $Y$ in probability, then it converges to $Y$ in distribution, i.e. $Y_n \overset{P}{\to} Y \Rightarrow Y_n \overset{d}{\to} Y$.
Proof :
Let $G_n$ be the distribution function of $Y_n$, $G$ the distribution function of $Y$, and let $\varepsilon > 0$. Then
$$G(y - \varepsilon) = P[Y \le y - \varepsilon] = P[Y \le y - \varepsilon, Y_n \le y] + P[Y \le y - \varepsilon, Y_n > y] \le P[Y_n \le y] + P[Y_n - Y > \varepsilon] \le G_n(y) + P[|Y_n - Y| > \varepsilon].$$
Letting $n$ tend to infinity on both sides we obtain that
$$G(y - \varepsilon) \le \lim_{n\to\infty} G_n(y).$$
Similarly
$$G(y + \varepsilon) = P[Y \le y + \varepsilon] \ge P[Y \le y + \varepsilon, Y_n \le y] = P[Y_n \le y] - P[Y_n \le y, Y > y + \varepsilon] \ge P[Y_n \le y] - P[Y - Y_n > \varepsilon] \ge P[Y_n \le y] - P[|Y_n - Y| > \varepsilon] = G_n(y) - P[|Y_n - Y| > \varepsilon].$$
Letting $n$ tend to infinity we obtain that
$$G(y + \varepsilon) \ge \lim_{n\to\infty} G_n(y).$$
Thus $G(y - \varepsilon) \le \lim_{n\to\infty} G_n(y) \le G(y + \varepsilon)$ for any $\varepsilon > 0$.
If $G$ is continuous at the point $y$ and we let $\varepsilon$ tend to 0 in the above, we get that $G(y) \le \lim_{n\to\infty} G_n(y) \le G(y)$, i.e. $\lim_{n\to\infty} G_n(y) = G(y)$.
Therefore $Y_n$ converges to $Y$ in distribution. ∎
NOTE : The converse of Theorem 7.7.1 is not true, i.e. convergence in probability is a stronger form of convergence than convergence in distribution.
Theorem 7.7.2
If $Y_n \overset{P}{\to} c$, then for any function $g$ that is continuous at $c$, $g(Y_n) \overset{P}{\to} g(c)$.
Proof :
If $g$ is continuous at the point $c$, there exists for every $\varepsilon > 0$ a $\delta > 0$ such that $|y - c| < \delta$ implies $|g(y) - g(c)| < \varepsilon$. Therefore
$$1 \ge P[|g(Y_n) - g(c)| < \varepsilon] \ge P[|Y_n - c| < \delta].$$
Letting $n$ go to infinity on both sides we obtain that
$$1 \ge \lim_{n\to\infty} P[|g(Y_n) - g(c)| < \varepsilon] \ge 1 \quad \text{since } Y_n \overset{P}{\to} c,$$
i.e. $\lim_{n\to\infty} P[|g(Y_n) - g(c)| < \varepsilon] = 1$ for any $\varepsilon > 0$, i.e. $g(Y_n) \overset{P}{\to} g(c)$. ∎
We note the following theorems with no proofs given.

Theorem 7.7.3
If $X_1, X_2, \dots$ and $Y_1, Y_2, \dots$ are sequences of random variables such that $X_n \overset{P}{\to} c$ and $Y_n \overset{P}{\to} d$, then
1. $aX_n + bY_n \overset{P}{\to} ac + bd$
2. $X_n Y_n \overset{P}{\to} cd$
3. $X_n/c \overset{P}{\to} 1$ for $c \ne 0$
4. $1/X_n \overset{P}{\to} 1/c$ if $P[X_n \ne 0] = 1$ for all $n$ and $c \ne 0$
5. $\sqrt{X_n} \overset{P}{\to} \sqrt{c}$ if $P[X_n \ge 0] = 1$ for all $n$.

Theorem 7.7.4
If $X_1, X_2, \dots$ and $Y_1, Y_2, \dots$ are sequences of random variables such that $X_n \overset{P}{\to} c$ and $Y_n \overset{d}{\to} Y$, then
1. $X_n + Y_n \overset{d}{\to} c + Y$
2. $X_n Y_n \overset{d}{\to} cY$
3. $Y_n/X_n \overset{d}{\to} Y/c$ for $c \ne 0$.

Theorem 7.7.5
If $Y_n \overset{d}{\to} Y$, then for any continuous function $g$, $g(Y_n) \overset{d}{\to} g(Y)$.

Theorem 7.7.6
If $\dfrac{\sqrt{n}\,(Y_n - m)}{c} \overset{d}{\to} Z$ where $Z$ is a standard normal random variable and if $g$ has a non-zero derivative at $y = m$, then
$$\frac{\sqrt{n}\,\left(g(Y_n) - g(m)\right)}{|c\,g'(m)|} \overset{d}{\to} Z,$$
i.e. $g(Y_n)$ has an asymptotic normal distribution with asymptotic expected value $g(m)$ and asymptotic variance $\dfrac{c^2\left[g'(m)\right]^2}{n}$.
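Theorem 7.7.6 is the delta method. The sketch below is not from the notes; it assumes NumPy is available and checks the theorem for $g(y) = \ln y$ applied to the mean of exponential(θ = 2) samples, where $m = 2$, $c = 2$ and $g'(m) = 1/2$, so the asymptotic variance of $\ln \bar{X}_n$ is $c^2[g'(m)]^2/n = 1/n$.

import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 2.0, 200, 50_000   # exponential mean m = 2, standard deviation c = 2

means = rng.exponential(theta, size=(reps, n)).mean(axis=1)
g = np.log(means)                   # delta method: g(Y_n) with g = ln

print("simulated var of ln(mean):", round(g.var(), 5))
print("delta-method value 1/n   :", round(1 / n, 5))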
7.8 ASYMPTOTIC DISTRIBUTIONS OF EXTREME ORDER STATISTICS

Let $X_{1:n}, X_{2:n}, \dots, X_{n:n}$ be the order statistics for a random sample from a population with distribution function $F$.
The maximum $X_{n:n}$ is said to have a non-degenerate limiting distribution $G$ if there exist sequences of standardizing constants $\{a_n\}$ and $\{b_n\}$ with $a_n > 0$ such that the standardized variable $Y_n = \dfrac{X_{n:n} - b_n}{a_n} \overset{d}{\to} Y$ where $Y$ is a random variable with distribution function $G$. We will then say that $X_{n:n}$ has a limiting distribution of type $G$. Similarly for $X_{1:n}$.
It can be shown that there are only three types of limiting distributions for the extreme order statistic $X_{n:n}$, namely
1. $G(y) = e^{-e^{-y}}$ for all $y$
2. $G(y) = e^{-y^{-\gamma}}$ for $y > 0$
3. $G(y) = e^{-(-y)^{\gamma}}$ for $y < 0$.
Similarly it can be shown that there are only three types of limiting distributions for the extreme order statistic $X_{1:n}$, namely
1. $G(y) = 1 - e^{-e^{y}}$ for all $y$
2. $G(y) = 1 - e^{-(-y)^{-\gamma}}$ for $y < 0$
3. $G(y) = 1 - e^{-y^{\gamma}}$ for $y > 0$.
Note that the type 3 distribution for minimums is a Weibull distribution.
CHAPTER 8 : STATISTICS AND SAMPLING DISTRIBUTIONS

8.1 INTRODUCTION

In this chapter we study the distributions of a number of statistics which are later used for statistical inference.
8.2 STATISTICS

Definition 8.2.1 :
Suppose that $X_1, X_2, X_3, \dots, X_n$ are jointly distributed random variables. Let $T = t(X_1, X_2, X_3, \dots, X_n)$ be a function of $X_1, X_2, X_3, \dots, X_n$ which does not depend on any unknown parameters. Then $T$ is called a statistic.

EXAMPLES :

Example 8.2.1
Suppose that $X_1, X_2, X_3, \dots, X_n$ is a random sample from some population.
Let $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$. Then $\bar{X}$ is a statistic and is called the sample mean.
Let $s^2 = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}$. Then $s^2$ is a statistic and is called the sample variance.

Example 8.2.2
Suppose that $X_1, X_2, X_3, \dots, X_n$ are independent Bernoulli random variables all with parameter $p$. Let $\hat{p} = \dfrac{\sum_{i=1}^{n} X_i}{n}$. Then $\hat{p}$ is a statistic and is called the sample proportion.

Statistics (the plural of statistic as defined above) are therefore functions of random variables. We are interested in the properties of these random variables, such as the expected value, variance and distribution, as for any other random variable.
Theorem 8.2.1 :
Suppose that $X_1, X_2, X_3, \dots, X_n$ is a random sample from a population with expected value $\mu$ and variance $\sigma^2$. Let $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$. Then
$$E[\bar{X}] = \mu \quad \text{and} \quad \mathrm{var}(\bar{X}) = \frac{\sigma^2}{n}.$$
Proof :
$$E[\bar{X}] = E\left[\tfrac{1}{n}\{X_1 + X_2 + X_3 + \dots + X_n\}\right] = \tfrac{1}{n}E[X_1 + X_2 + X_3 + \dots + X_n] = \tfrac{1}{n}\{E[X_1] + E[X_2] + E[X_3] + \dots + E[X_n]\} = \tfrac{1}{n}\{\mu + \mu + \mu + \dots + \mu\} = \mu.$$
Furthermore
$$\mathrm{var}(\bar{X}) = \mathrm{var}\left[\tfrac{1}{n}\{X_1 + X_2 + X_3 + \dots + X_n\}\right] = \tfrac{1}{n^2}\left[\mathrm{var}(X_1) + \mathrm{var}(X_2) + \mathrm{var}(X_3) + \dots + \mathrm{var}(X_n)\right] \quad \text{since the } X_i\text{'s are independent}$$
$$= \tfrac{1}{n^2} \times n\sigma^2 = \frac{\sigma^2}{n}. \ ∎$$
Theorem 8.2.2 :
Suppose that $X_1, X_2, X_3, \dots, X_n$ is a random sample from a population with expected value $\mu$ and variance $\sigma^2$. Let $s^2 = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}$. Then $E[s^2] = \sigma^2$.
Proof :
Note first of all that
$$\sum_{i=1}^{n}(X_i - \bar{X})^2 = \sum_{i=1}^{n}\left[X_i^2 - 2\bar{X}X_i + (\bar{X})^2\right] = \sum_{i=1}^{n} X_i^2 - 2\bar{X}\sum_{i=1}^{n} X_i + n(\bar{X})^2 = \sum_{i=1}^{n} X_i^2 - 2\bar{X}(n\bar{X}) + n(\bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n(\bar{X})^2$$
and therefore
$$E[s^2] = \frac{1}{n-1}E\left[\sum_{i=1}^{n} X_i^2 - n(\bar{X})^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} E(X_i^2) - nE(\bar{X}^2)\right]$$
$$= \frac{1}{n-1}\left[\sum_{i=1}^{n}\left\{\mathrm{var}(X_i) + \left(E(X_i)\right)^2\right\} - n\left\{\mathrm{var}(\bar{X}) + \left(E(\bar{X})\right)^2\right\}\right]$$
$$= \frac{1}{n-1}\left[n\{\sigma^2 + \mu^2\} - n\left\{\frac{\sigma^2}{n} + \mu^2\right\}\right] = \frac{1}{n-1}\left[(n-1)\sigma^2\right] = \sigma^2. \ ∎$$
Theorem 8.2.3a :
Suppose that $X_1, X_2, X_3, \dots, X_n$ are independent Bernoulli random variables all with parameter $p$. Let $\hat{p} = \dfrac{\sum_{i=1}^{n} X_i}{n}$. Then $E[\hat{p}] = p$ and $\mathrm{var}(\hat{p}) = \dfrac{pq}{n}$.
Proof :
The variable $X_i$ is 1 if the $i$-th observation is a success and 0 if the $i$-th observation is not a success, i.e. $\sum_{i=1}^{n} X_i$ is the number of successes in $n$ independent trials where the probability of success is $p$ at each trial, i.e. $\sum_{i=1}^{n} X_i \sim \mathrm{BIN}(n, p)$. Therefore
$$E[\hat{p}] = E\left[\frac{\sum_{i=1}^{n} X_i}{n}\right] = \frac{1}{n}E\left[\sum_{i=1}^{n} X_i\right] = \frac{1}{n}np = p$$
and
$$\mathrm{var}(\hat{p}) = \mathrm{var}\left[\frac{\sum_{i=1}^{n} X_i}{n}\right] = \frac{1}{n^2}\mathrm{var}\left[\sum_{i=1}^{n} X_i\right] = \frac{1}{n^2}npq = \frac{pq}{n}. \ ∎$$
8.3 SAMPLING DISTRIBUTIONS

LINEAR COMBINATIONS OF NORMAL RANDOM VARIABLES

Theorem 8.3.1 :
Suppose that $X_1, X_2, X_3, \dots, X_n$ are independent random variables and that $X_i \sim N(\mu_i, \sigma_i^2)$. Let $Y = \sum_{i=1}^{n} a_i X_i$ where $a_1, a_2, \dots, a_n$ are constants. Then $Y$ is normally distributed with expected value $\sum_{i=1}^{n} a_i\mu_i$ and variance $\sum_{i=1}^{n} a_i^2\sigma_i^2$.
Proof:
$$M_Y(t) = E\left[\exp\left(t\sum_{i=1}^{n} a_i X_i\right)\right] = E\left[\exp(ta_1X_1)\exp(ta_2X_2)\cdots\exp(ta_nX_n)\right]$$
$$= E[\exp(ta_1X_1)]\,E[\exp(ta_2X_2)]\cdots E[\exp(ta_nX_n)] \quad \text{since } X_1, X_2, \dots, X_n \text{ are independent}$$
$$= M_{X_1}(a_1t)M_{X_2}(a_2t)\cdots M_{X_n}(a_nt) = \prod_{i=1}^{n} M_{X_i}(a_it) = \prod_{i=1}^{n}\exp\left(\mu_i(a_it) + \sigma_i^2\frac{(a_it)^2}{2}\right) = \exp\left(\sum_{i=1}^{n} a_i\mu_i\,t + \sum_{i=1}^{n} a_i^2\sigma_i^2\,\frac{t^2}{2}\right).$$
This is the moment generating function of a normal random variable with expected value $\sum_{i=1}^{n} a_i\mu_i$ and variance $\sum_{i=1}^{n} a_i^2\sigma_i^2$, i.e. $Y \sim N\left(\sum_{i=1}^{n} a_i\mu_i,\ \sum_{i=1}^{n} a_i^2\sigma_i^2\right)$. ∎
Theorem 8.3.2a :
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a normal population with expected value $\mu$ and variance $\sigma^2$. Let $\bar{X}$ be the sample mean. Then $\bar{X}$ is normally distributed with expected value $\mu$ and variance $\frac{\sigma^2}{n}$.
Proof :
We have that $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n} = \sum_{i=1}^{n}\tfrac{1}{n}X_i$.
From th.8.3.1 it then follows that $\bar{X}$ is normally distributed with mean $\sum_{i=1}^{n}\tfrac{1}{n}\mu = \mu$ and variance $\sum_{i=1}^{n}\left(\tfrac{1}{n}\right)^2\sigma^2 = \frac{\sigma^2}{n}$. ∎

Theorem 8.3.2b :
Suppose that $X_1, X_2, \dots, X_{n_1}$ is a random sample from a normal population with expected value $\mu_1$ and variance $\sigma_1^2$.
Suppose that $Y_1, Y_2, \dots, Y_{n_2}$ is an independent random sample from a normal population with expected value $\mu_2$ and variance $\sigma_2^2$.
Let $\bar{X}$ be the sample mean of the $X$'s and let $\bar{Y}$ be the sample mean of the $Y$'s. Then $\bar{X} - \bar{Y}$ is normally distributed with expected value $\mu_1 - \mu_2$ and variance $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.
Proof :
From th.8.3.2a it follows that $\bar{X}$ is normally distributed with expected value $\mu_1$ and variance $\frac{\sigma_1^2}{n_1}$, and $\bar{Y}$ is also normally distributed with expected value $\mu_2$ and variance $\frac{\sigma_2^2}{n_2}$. Furthermore $\bar{X}$ and $\bar{Y}$ are independent since the two samples are independent. Hence $\bar{X} - \bar{Y}$ is a linear combination of two independent normal random variables and it then follows from th.8.3.1 that $\bar{X} - \bar{Y}$ is a normal random variable with expected value $\mu_1 - \mu_2$ and variance $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$. ∎

EXERCISES :
Bain and Engelhardt p.283 No.'s 1, 2 and 4.
THE CHI-SQUARE DISTRIBUTION

Definition 8.3.1 :
The random variable $X$ is said to have a chi-square distribution with $\nu$ degrees of freedom if $X$ is distributed like a gamma random variable with parameters 2 and $\nu/2$.

Theorem 8.3.2 :
If $X$ is distributed like a chi-square with $\nu$ degrees of freedom, then
$$M_X(t) = (1 - 2t)^{-\nu/2}, \quad E[X^r] = 2^r\,\frac{\Gamma\!\left(\frac{\nu}{2} + r\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}, \quad E[X] = \nu \quad \text{and} \quad \mathrm{var}(X) = 2\nu.$$
Proof : The results follow from the results for gamma distributions. ∎

Theorem 8.3.3 :
If $X \sim \mathrm{GAM}(\theta, \kappa)$, then $Y = 2X/\theta \sim \chi^2(2\kappa)$, i.e. $Y$ is distributed like chi-square with $2\kappa$ degrees of freedom.
Proof :
$$M_Y(t) = E\left[e^{tY}\right] = E\left[e^{t\frac{2X}{\theta}}\right] = E\left[e^{\frac{2t}{\theta}X}\right] = M_X\!\left(\frac{2t}{\theta}\right) = \left(1 - \theta\,\frac{2t}{\theta}\right)^{-\kappa} = (1 - 2t)^{-2\kappa/2},$$
i.e. it is the moment generating function of a chi-square distribution with $2\kappa$ degrees of freedom. ∎
Theorem 8.3.4 :
Suppose that $X_1, X_2, \dots, X_n$ are independent random variables and that $X_i \sim \chi^2(\nu_i)$. Let $Y = \sum_{i=1}^{n} X_i$. Then $Y$ is distributed like chi-square with $\sum_{i=1}^{n}\nu_i$ degrees of freedom.
Proof :
$$M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t) = \prod_{i=1}^{n}(1 - 2t)^{-\frac{\nu_i}{2}} = (1 - 2t)^{-\frac{\sum_{i=1}^{n}\nu_i}{2}},$$
which is the moment generating function of a chi-squared random variable with $\sum_{i=1}^{n}\nu_i$ degrees of freedom. ∎
Theorem 8.3.5 :
Suppose that $X$ is a standard normal random variable. Then $X^2$ is distributed like a chi-squared random variable with 1 degree of freedom.
Proof :
$$M_{X^2}(t) = E\left[e^{tX^2}\right] = \int_{-\infty}^{\infty} e^{tx^2}\,\frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx = \int_{-\infty}^{\infty}\frac{\exp\left(-\frac{x^2}{2}(1 - 2t)\right)}{\sqrt{2\pi}}\,dx$$
$$= \frac{1}{\sqrt{1 - 2t}}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\frac{1}{\sqrt{1 - 2t}}}\exp\left(-\frac{1}{2}\,\frac{x^2}{\frac{1}{1 - 2t}}\right)dx$$
$$= \frac{1}{\sqrt{1 - 2t}} \quad \text{since the integral is that of a normal density function with expected value 0 and variance } \frac{1}{1 - 2t}.$$
This is the moment generating function of a chi-squared random variable with 1 degree of freedom. ∎
Theorem 8.3.6a :
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a normal population with expected value $\mu$ and variance $\sigma^2$. Let $\bar{X}$ be the sample mean. Then
$$\sum_{i=1}^{n}\frac{(X_i - \mu)^2}{\sigma^2} \sim \chi^2(n) \quad \text{and} \quad \frac{n(\bar{X} - \mu)^2}{\sigma^2} \sim \chi^2(1).$$
Proof :
$$\sum_{i=1}^{n}\frac{(X_i - \mu)^2}{\sigma^2} = \sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 = \sum_{i=1}^{n} Y_i^2,$$
where the $Y_i = \dfrac{X_i - \mu}{\sigma}$ are independent $N(0,1)$ random variables, i.e. the $Y_i^2$ are independent $\chi^2(1)$ random variables, i.e. $\sum_{i=1}^{n} Y_i^2 \sim \chi^2(n)$. Also
$$\frac{n(\bar{X} - \mu)^2}{\sigma^2} = \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 = Y^2 \quad \text{where } Y \sim N(0,1), \text{ i.e. } Y^2 \sim \chi^2(1). \ ∎$$
Theorem 8.3.6 :
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a normal population with expected value $\mu$ and variance $\sigma^2$. Let $\bar{X}$ be the sample mean and let $s^2$ be the sample variance. Then
1. $\bar{X}$ and $X_i - \bar{X}$ are independent for $i = 1, 2, 3, \dots, n$.
2. $\bar{X}$ and $s^2$ are independent.
3. $\dfrac{(n-1)s^2}{\sigma^2} \sim \chi^2(n - 1)$.
Proof :
Part 1
The joint density function of $X_1, X_2, \dots, X_n$ is given by
$$f_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i) \quad \text{since the } X_i\text{'s are independent, i.e.}$$
$$f_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2} = \frac{1}{(2\pi)^{\frac{n}{2}}\sigma^n}e^{-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2}.$$
Let
$$Y_1 = \bar{X} = \sum_{i=1}^{n}\tfrac{1}{n}X_i \quad \text{and} \quad Y_i = X_i - \bar{X} \quad \text{for } i = 2, 3, 4, \dots, n.$$
Consider the equations
$$y_1 = \bar{x} = \sum_{i=1}^{n}\tfrac{1}{n}x_i \quad \text{and} \quad y_i = x_i - \bar{x} \quad \text{for } i = 2, 3, 4, \dots, n.$$
The solution of these equations is given by $x_i = y_i + y_1$ for $i = 2, 3, 4, \dots, n$ and
$$x_1 = ny_1 - x_2 - x_3 - \dots - x_n = ny_1 - \{y_2 + y_1\} - \{y_3 + y_1\} - \dots - \{y_n + y_1\} = y_1 - y_2 - y_3 - \dots - y_n.$$
Then
$$J_{x\to y} = \begin{vmatrix} 1 & -1 & -1 & \cdots & -1 \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & 1 \end{vmatrix} = \begin{vmatrix} n & -1 & -1 & \cdots & -1 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{vmatrix}$$
by subtracting column 2 from the first column, subtracting column 3 from the first column, ..., subtracting the $n$-th column from the first column,
$$= \begin{vmatrix} n & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{vmatrix} = n$$
by adding row 2 to the first row, adding row 3 to the first row, ..., adding the $n$-th row to the first row, and since the determinant of a diagonal matrix is the product of the diagonal elements.
We have that
$$\sum_{i=1}^{n}(x_i - \bar{x}) = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n}\bar{x} = n\bar{x} - n\bar{x} = 0,$$
i.e.
$$x_1 - \bar{x} = -\sum_{i=2}^{n}(x_i - \bar{x}) = -\sum_{i=2}^{n} y_i \tag{8.3.1}$$
and
$$\sum_{i=1}^{n}(x_i - \mu)^2 = \sum_{i=1}^{n}\left[(x_i - \bar{x}) + (\bar{x} - \mu)\right]^2 = \sum_{i=1}^{n}\left[(x_i - \bar{x})^2 + 2(x_i - \bar{x})(\bar{x} - \mu) + (\bar{x} - \mu)^2\right]$$
$$= \sum_{i=1}^{n}(x_i - \bar{x})^2 + 2(\bar{x} - \mu)\sum_{i=1}^{n}(x_i - \bar{x}) + \sum_{i=1}^{n}(\bar{x} - \mu)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \mu)^2,$$
where
$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = (x_1 - \bar{x})^2 + \sum_{i=2}^{n}(x_i - \bar{x})^2 = \left(-\sum_{i=2}^{n} y_i\right)^2 + \sum_{i=2}^{n} y_i^2,$$
so that
$$\sum_{i=1}^{n}(x_i - \mu)^2 = \left(-\sum_{i=2}^{n} y_i\right)^2 + \sum_{i=2}^{n} y_i^2 + n(y_1 - \mu)^2. \tag{8.3.2}$$
Therefore
$$f_{Y_1,Y_2,\dots,Y_n}(y_1, y_2, \dots, y_n) = \frac{|n|}{(2\pi)^{\frac{n}{2}}\sigma^n}\exp\left(-\frac{1}{2\sigma^2}\left[\left(-\sum_{i=2}^{n} y_i\right)^2 + \sum_{i=2}^{n} y_i^2 + n(y_1 - \mu)^2\right]\right)$$
$$= \left[\frac{n}{(2\pi)^{\frac{n}{2}}\sigma^n}\exp\left(-\frac{n(y_1 - \mu)^2}{2\sigma^2}\right)\right]\exp\left(-\frac{1}{2\sigma^2}\left[\left(-\sum_{i=2}^{n} y_i\right)^2 + \sum_{i=2}^{n} y_i^2\right]\right),$$
i.e. the joint density of $Y_1, Y_2, \dots, Y_n$ consists of one factor that depends only on $y_1$ and another factor that depends only on $y_2, y_3, \dots, y_n$, i.e. $Y_1$ is independent of $Y_2, Y_3, \dots, Y_n$, or $\bar{X}$ is independent of $X_2 - \bar{X}, X_3 - \bar{X}, \dots, X_n - \bar{X}$.
From (8.3.1) we have that $X_1 - \bar{X} = -\sum_{i=2}^{n}(X_i - \bar{X})$ and therefore $\bar{X}$ is also independent of $X_1 - \bar{X}$.

Part 2
Since $s^2 = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}$ depends only on $X_1 - \bar{X}, X_2 - \bar{X}, \dots, X_n - \bar{X}$, and since $\bar{X}$ is independent of all of them, it follows that $\bar{X}$ and $s^2$ are independent.

Part 3
From (8.3.2) we get that
$$\sum_{i=1}^{n}(X_i - \mu)^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2 + n(\bar{X} - \mu)^2$$
or
$$\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{(n-1)s^2}{\sigma^2} + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2,$$
say $V_1 = V_2 + V_3$.
The random variables $\dfrac{X_i - \mu}{\sigma}$, $i = 1, 2, \dots, n$, are independent standard normal random variables, i.e. the $\left(\dfrac{X_i - \mu}{\sigma}\right)^2$, $i = 1, 2, \dots, n$, are independent $\chi^2(1)$ random variables, and therefore $V_1 = \sum_{i=1}^{n}\left(\dfrac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(n)$. Furthermore $V_2$ depends only on $s^2$ and $V_3$ only depends on $\bar{X}$, and since these are independent random variables it follows that $V_2$ and $V_3$ are independent random variables. Since $\bar{X} \sim N(\mu, \sigma^2/n)$ it follows that
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \quad \text{and} \quad V_3 = \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 \sim \chi^2(1).$$
Since $V_1$ is the sum of the two independent random variables $V_2$ and $V_3$, we have that
$$M_{V_1}(t) = M_{V_2}(t)\,M_{V_3}(t), \quad \text{where } M_{V_i}(t) \text{ is the m.g.f. of } V_i,$$
or
$$(1 - 2t)^{-\frac{n}{2}} = M_{V_2}(t)\,(1 - 2t)^{-\frac{1}{2}} \quad \text{or} \quad M_{V_2}(t) = (1 - 2t)^{-\frac{n-1}{2}}.$$
This is the moment generating function of a chi-squared random variable with $n - 1$ degrees of freedom, i.e. $V_2 = \dfrac{(n-1)s^2}{\sigma^2} \sim \chi^2(n - 1)$. ∎

Suppose that $Y \sim \chi^2(\nu)$, that $0 < \alpha < 1$ and that $\chi^2_{\alpha}(\nu)$ is that value such that $P[Y \le \chi^2_{\alpha}(\nu)] = \alpha$, i.e. $\chi^2_{\alpha}(\nu)$ is the $100\alpha$-th percentile of a chi-squared distribution with $\nu$ degrees of freedom. Some values of $\chi^2_{\alpha}(\nu)$ may be found in Table 4, pp.604-605, in the book of Bain and Engelhardt.
For instance $\chi^2_{.025}(9) = 2.7$ and $\chi^2_{.975}(9) = 19.02$, from which it then follows that if $Y \sim \chi^2(9)$, then $P[\chi^2_{.025}(9) \le Y \le \chi^2_{.975}(9)] = 0.95$ or $P[2.7 \le Y \le 19.02] = 0.95$.
EXAMPLE 8.3.1 :
Suppose that $X_1, X_2, \dots, X_{10}$ is a random sample from a normal population with expected value $\mu$ and variance $\sigma^2$, where both $\mu$ and $\sigma^2$ are unknown. Then $\dfrac{9s^2}{\sigma^2} \sim \chi^2(9)$.
It then follows that
$$0.95 = P\left[\chi^2_{.025}(9) \le \frac{9s^2}{\sigma^2} \le \chi^2_{.975}(9)\right] = P\left[2.7 \le \frac{9s^2}{\sigma^2} \le 19.02\right]$$
$$= P\left[\frac{1}{19.02} \le \frac{\sigma^2}{9s^2} \le \frac{1}{2.7}\right] = P\left[\frac{9s^2}{19.02} \le \sigma^2 \le \frac{9s^2}{2.7}\right].$$
Note that $\sigma^2$ is a fixed number, but that $s^2$ is a random variable, i.e. for different samples the value of $s^2$ will be different. The intervals $\left(\dfrac{9s^2}{19.02},\ \dfrac{9s^2}{2.7}\right)$ will be different for different samples. Some intervals will include $\sigma^2$ and others will not include $\sigma^2$. The probability that the interval will include $\sigma^2$ is 0.95. Such an interval is called a 95% confidence interval for $\sigma^2$.
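A quick numerical illustration of Example 8.3.1 (not from the notes; it assumes SciPy is available and uses simulated data with a hypothetical true σ² = 4) computes the interval $(9s^2/19.02,\ 9s^2/2.7)$ for one sample:

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
n, sigma2 = 10, 4.0                              # hypothetical true variance

x = rng.normal(loc=50.0, scale=np.sqrt(sigma2), size=n)
s2 = x.var(ddof=1)                               # sample variance with divisor n-1

lower = (n - 1) * s2 / chi2.ppf(0.975, n - 1)    # 9 s^2 / 19.02
upper = (n - 1) * s2 / chi2.ppf(0.025, n - 1)    # 9 s^2 / 2.70
print(f"s^2 = {s2:.3f},  95% CI for sigma^2: ({lower:.3f}, {upper:.3f})")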
EXERCISES :
Bain and Engelhardt p.284 No.'s 5, 8, 9, 11, 12 and 13.
8.4 THE t, F AND BETA DISTRIBUTIONS

STUDENT'S t DISTRIBUTION

Theorem 8.4.1 :
Suppose that $Z$ is a standard normal random variable and that $X$ is an independent chi-squared random variable with $\nu$ degrees of freedom. Let $T = \dfrac{Z}{\sqrt{X/\nu}}$. Then the density function of $T$ is given by
$$f_T(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\Gamma\!\left(\frac{1}{2}\right)\sqrt{\nu}}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}} \quad \text{for } -\infty < t < \infty.$$
This density function is called the t distribution with $\nu$ degrees of freedom.
Proof :
The joint density function of $Z$ and $X$ is given by
$$f_{Z,X}(z, x) = f_Z(z)\,f_X(x) \quad \text{since } Z \text{ and } X \text{ are independent}$$
$$= \begin{cases} \dfrac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^2}\,\dfrac{x^{\frac{\nu}{2}-1}e^{-\frac{x}{2}}}{2^{\frac{\nu}{2}}\Gamma\!\left(\frac{\nu}{2}\right)} & \text{for } -\infty < z < \infty \text{ and } x > 0 \\ 0 & \text{otherwise.} \end{cases}$$
Let $T = \dfrac{Z}{\sqrt{X/\nu}}$ and $W = X$. The solution of the equations $t = \dfrac{z}{\sqrt{x/\nu}}$ and $w = x$ is given by $x = w$ and $z = t\sqrt{w/\nu}$. Therefore
$$J_{(z,x)\to(t,w)} = \begin{vmatrix} \sqrt{\dfrac{w}{\nu}} & \dfrac{t}{2\sqrt{w\nu}} \\ 0 & 1 \end{vmatrix} = \sqrt{\frac{w}{\nu}}.$$
Hence
$$f_{T,W}(t, w) = \begin{cases} \dfrac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}\frac{t^2w}{\nu}}\,\dfrac{w^{\frac{\nu}{2}-1}e^{-\frac{w}{2}}}{2^{\frac{\nu}{2}}\Gamma\!\left(\frac{\nu}{2}\right)}\sqrt{\frac{w}{\nu}} & \text{for } -\infty < t < \infty \text{ and } w > 0 \\ 0 & \text{otherwise.} \end{cases}$$
Then
$$f_T(t) = \int_{-\infty}^{\infty} f_{T,W}(t, w)\,dw = \int_{0}^{\infty}\frac{1}{\sqrt{2\pi}\,2^{\frac{\nu}{2}}\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\nu}}\,w^{\frac{\nu+1}{2}-1}e^{-\frac{w}{2}\left(1 + \frac{t^2}{\nu}\right)}\,dw$$
$$= \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)\left[\dfrac{2}{1 + \frac{t^2}{\nu}}\right]^{\frac{\nu+1}{2}}}{\sqrt{2\pi}\,2^{\frac{\nu}{2}}\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\nu}}\int_{0}^{\infty}\frac{w^{\frac{\nu+1}{2}-1}e^{-\frac{w}{2}\left(1+\frac{t^2}{\nu}\right)}}{\Gamma\!\left(\frac{\nu+1}{2}\right)\left[\dfrac{2}{1+\frac{t^2}{\nu}}\right]^{\frac{\nu+1}{2}}}\,dw$$
$$= \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\Gamma\!\left(\frac{1}{2}\right)\sqrt{\nu}}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}},$$
since the remaining integral is that of a $\mathrm{GAM}\!\left(\dfrac{2}{1 + t^2/\nu},\ \dfrac{\nu+1}{2}\right)$ density function (and therefore equals 1) and $\sqrt{2\pi}\,2^{\frac{\nu}{2}} = \Gamma\!\left(\tfrac{1}{2}\right)2^{\frac{\nu+1}{2}}$. ∎

Theorem 8.4.2 :
See Bain and Engelhardt.
Theorem 8.4.3 :
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a normal population with expected value $\mu$ and variance $\sigma^2$, where both $\mu$ and $\sigma^2$ are unknown. Let $\bar{X}$ be the sample mean and let $s^2$ be the sample variance. Then $\dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ is distributed like t with $n - 1$ degrees of freedom.
Proof :
From th.8.3.2a we have that $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$, and from th.8.3.6 we have that $\dfrac{(n-1)s^2}{\sigma^2} \sim \chi^2(n - 1)$ and that it is independent of $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$. It then follows from th.8.4.1 that
$$\frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)s^2}{\sigma^2}\Big/(n - 1)}} = \frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t(n - 1). \ ∎$$
The percentiles $t_{\alpha}(\nu)$ of the t distribution can be found in Table 6, p.608, in Bain and Engelhardt. Suppose that $T$ is distributed like t with $\nu$ degrees of freedom; then $\alpha = P[T \le t_{\alpha}(\nu)]$.
For example $t_{.025}(9) = -2.262$ and $t_{.975}(9) = 2.262$.
EXAMPLE 8.4.1 :
Suppose that $X_1, X_2, \dots, X_{10}$ is a random sample from a $N(\mu, \sigma^2)$ population. Then $\dfrac{\bar{X} - \mu}{s/\sqrt{10}} \sim t(9)$ and
$$0.95 = P\left[t_{.025}(9) \le \frac{\bar{X} - \mu}{s/\sqrt{10}} \le t_{.975}(9)\right] = P\left[-2.262 \le \frac{\bar{X} - \mu}{s/\sqrt{10}} \le 2.262\right]$$
$$= P\left[-2.262\,s/\sqrt{10} \le \bar{X} - \mu \le 2.262\,s/\sqrt{10}\right]$$
$$= P\left[-\bar{X} - 2.262\,s/\sqrt{10} \le -\mu \le -\bar{X} + 2.262\,s/\sqrt{10}\right]$$
$$= P\left[\bar{X} - 2.262\,s/\sqrt{10} \le \mu \le \bar{X} + 2.262\,s/\sqrt{10}\right].$$
Note that $\mu$ is a constant. For some samples the interval $\left(\bar{X} - 2.262\,s/\sqrt{10},\ \bar{X} + 2.262\,s/\sqrt{10}\right)$ will include $\mu$, while for other samples it will not include $\mu$. The probability that the interval will include $\mu$ is 0.95, i.e. the interval is a 95% confidence interval for $\mu$.
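The t interval of Example 8.4.1 can be computed directly. The sketch below is not part of the notes; it assumes SciPy is available and uses a hypothetical sample of size 10 to reproduce the $\bar{x} \pm 2.262\,s/\sqrt{10}$ interval.

import numpy as np
from scipy.stats import t

x = np.array([9.8, 10.4, 10.1, 9.6, 10.7, 10.2, 9.9, 10.5, 10.0, 9.7])  # hypothetical sample
n = len(x)

xbar, s = x.mean(), x.std(ddof=1)
t975 = t.ppf(0.975, n - 1)              # equals 2.262 for 9 degrees of freedom

half_width = t975 * s / np.sqrt(n)
print(f"95% CI for mu: ({xbar - half_width:.3f}, {xbar + half_width:.3f})")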
EXERCISES :
Bain and Engelhardt p.285 No.16.
THE BETA AND F DISTRIBUTIONS

Definition 8.4.1 :
The random variable $X$ is said to have a beta density function with parameters $\alpha_1$ and $\alpha_2$ if the density function is given by
$$f_X(x) = \begin{cases} \dfrac{x^{\alpha_1 - 1}(1 - x)^{\alpha_2 - 1}}{B(\alpha_1, \alpha_2)} & \text{for } 0 < x < 1 \\ 0 & \text{otherwise,} \end{cases}$$
where $B(\alpha_1, \alpha_2) = \displaystyle\int_0^1 x^{\alpha_1 - 1}(1 - x)^{\alpha_2 - 1}\,dx$.
Note that
$$\int_{-\infty}^{\infty} f_X(x)\,dx = \int_0^1\frac{x^{\alpha_1 - 1}(1 - x)^{\alpha_2 - 1}}{B(\alpha_1, \alpha_2)}\,dx = \frac{1}{B(\alpha_1, \alpha_2)}\int_0^1 x^{\alpha_1 - 1}(1 - x)^{\alpha_2 - 1}\,dx = \frac{B(\alpha_1, \alpha_2)}{B(\alpha_1, \alpha_2)} = 1,$$
i.e. the integral of the density function is 1.
Theorem 8.4.4a :
Suppose that $X_1$ is a gamma random variable with parameters 2 and $\alpha_1$. Suppose that $X_2$ is an independent gamma random variable with parameters 2 and $\alpha_2$.
Let $Y_1 = \dfrac{X_1}{X_1 + X_2}$ and $Y_2 = X_1 + X_2$.
Then $Y_1$ and $Y_2$ are independent random variables, $Y_1$ is a beta random variable with parameters $\alpha_1$ and $\alpha_2$, and $Y_2$ is a gamma random variable with parameters 2 and $(\alpha_1 + \alpha_2)$.
Proof :
The joint density function of $X_1$ and $X_2$ is given by
$$f_{X_1,X_2}(x_1, x_2) = f_{X_1}(x_1)\,f_{X_2}(x_2) \quad \text{since } X_1 \text{ and } X_2 \text{ are independent}$$
$$= \begin{cases} \dfrac{x_1^{\alpha_1 - 1}e^{-\frac{x_1}{2}}}{2^{\alpha_1}\Gamma(\alpha_1)}\,\dfrac{x_2^{\alpha_2 - 1}e^{-\frac{x_2}{2}}}{2^{\alpha_2}\Gamma(\alpha_2)} & \text{for } x_1 > 0 \text{ and } x_2 > 0 \\ 0 & \text{otherwise.} \end{cases}$$
The solution of $y_1 = \dfrac{x_1}{x_1 + x_2}$ and $y_2 = x_1 + x_2$ is given by
$$x_1 = y_1(x_1 + x_2) = y_1y_2 \quad \text{and} \quad x_2 = y_2 - x_1 = y_2 - y_1y_2 = (1 - y_1)y_2,$$
and the Jacobian of the transformation is
$$J_{x\to y} = \begin{vmatrix} y_2 & y_1 \\ -y_2 & 1 - y_1 \end{vmatrix} = y_2(1 - y_1) - \{-y_2y_1\} = y_2.$$
Therefore
$$f_{Y_1,Y_2}(y_1, y_2) = \begin{cases} \dfrac{(y_1y_2)^{\alpha_1 - 1}e^{-\frac{y_1y_2}{2}}}{2^{\alpha_1}\Gamma(\alpha_1)}\,\dfrac{\left[(1 - y_1)y_2\right]^{\alpha_2 - 1}e^{-\frac{(1 - y_1)y_2}{2}}}{2^{\alpha_2}\Gamma(\alpha_2)}\,|y_2| & \text{for } y_1y_2 > 0 \text{ and } (1 - y_1)y_2 > 0 \\ 0 & \text{otherwise} \end{cases}$$
$$= \begin{cases} \dfrac{y_1^{\alpha_1 - 1}(1 - y_1)^{\alpha_2 - 1}}{\Gamma(\alpha_1)\Gamma(\alpha_2)/\Gamma(\alpha_1 + \alpha_2)}\,\dfrac{y_2^{\alpha_1 + \alpha_2 - 1}e^{-\frac{y_2}{2}}}{2^{\alpha_1 + \alpha_2}\Gamma(\alpha_1 + \alpha_2)} & \text{for } 0 < y_1 < 1 \text{ and } y_2 > 0 \\ 0 & \text{otherwise.} \end{cases}$$
Then $f_{Y_1}(y_1) = \displaystyle\int_{-\infty}^{\infty} f_{Y_1,Y_2}(y_1, y_2)\,dy_2$.
Since $f_{Y_1,Y_2}(y_1, y_2) = 0$ for all $y_2$ if $y_1 \le 0$ or if $y_1 \ge 1$, it follows that $f_{Y_1}(y_1) = 0$ for $y_1 \le 0$ or $y_1 \ge 1$.
For $0 < y_1 < 1$ we then get that
$$f_{Y_1}(y_1) = \frac{y_1^{\alpha_1 - 1}(1 - y_1)^{\alpha_2 - 1}}{\Gamma(\alpha_1)\Gamma(\alpha_2)/\Gamma(\alpha_1 + \alpha_2)}\int_0^{\infty}\frac{y_2^{\alpha_1 + \alpha_2 - 1}e^{-\frac{y_2}{2}}}{2^{\alpha_1 + \alpha_2}\Gamma(\alpha_1 + \alpha_2)}\,dy_2 = \frac{y_1^{\alpha_1 - 1}(1 - y_1)^{\alpha_2 - 1}}{\Gamma(\alpha_1)\Gamma(\alpha_2)/\Gamma(\alpha_1 + \alpha_2)} \times 1,$$
since the integral is the integral of a gamma density function with parameters 2 and $(\alpha_1 + \alpha_2)$.
Since
$$1 = \int_{-\infty}^{\infty} f_{Y_1}(y_1)\,dy_1 = \int_0^1\frac{y_1^{\alpha_1 - 1}(1 - y_1)^{\alpha_2 - 1}}{\Gamma(\alpha_1)\Gamma(\alpha_2)/\Gamma(\alpha_1 + \alpha_2)}\,dy_1 = \frac{1}{\Gamma(\alpha_1)\Gamma(\alpha_2)/\Gamma(\alpha_1 + \alpha_2)}\int_0^1 y_1^{\alpha_1 - 1}(1 - y_1)^{\alpha_2 - 1}\,dy_1,$$
it follows, by the definition of $B(\alpha_1, \alpha_2)$, that
$$B(\alpha_1, \alpha_2) = \frac{\Gamma(\alpha_1)\Gamma(\alpha_2)}{\Gamma(\alpha_1 + \alpha_2)}. \tag{8.4.1}$$
Therefore
$$f_{Y_1}(y_1) = \begin{cases} \dfrac{y_1^{\alpha_1 - 1}(1 - y_1)^{\alpha_2 - 1}}{B(\alpha_1, \alpha_2)} & \text{for } 0 < y_1 < 1 \\ 0 & \text{otherwise,} \end{cases}$$
i.e. $Y_1$ has a beta density function with parameters $\alpha_1$ and $\alpha_2$.
We also have that $f_{Y_2}(y_2) = \displaystyle\int_{-\infty}^{\infty} f_{Y_1,Y_2}(y_1, y_2)\,dy_1$.
Since $f_{Y_1,Y_2}(y_1, y_2) = 0$ for all $y_1$ if $y_2 \le 0$, it follows that $f_{Y_2}(y_2) = 0$ for $y_2 \le 0$.
For $y_2 > 0$
$$f_{Y_2}(y_2) = \int_0^1\frac{y_1^{\alpha_1 - 1}(1 - y_1)^{\alpha_2 - 1}}{\Gamma(\alpha_1)\Gamma(\alpha_2)/\Gamma(\alpha_1 + \alpha_2)}\,\frac{y_2^{\alpha_1 + \alpha_2 - 1}e^{-\frac{y_2}{2}}}{2^{\alpha_1 + \alpha_2}\Gamma(\alpha_1 + \alpha_2)}\,dy_1$$
$$= \frac{y_2^{\alpha_1 + \alpha_2 - 1}e^{-\frac{y_2}{2}}}{2^{\alpha_1 + \alpha_2}\Gamma(\alpha_1 + \alpha_2)}\int_0^1\frac{y_1^{\alpha_1 - 1}(1 - y_1)^{\alpha_2 - 1}}{\Gamma(\alpha_1)\Gamma(\alpha_2)/\Gamma(\alpha_1 + \alpha_2)}\,dy_1 = \frac{y_2^{\alpha_1 + \alpha_2 - 1}e^{-\frac{y_2}{2}}}{2^{\alpha_1 + \alpha_2}\Gamma(\alpha_1 + \alpha_2)}\,\frac{B(\alpha_1, \alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)/\Gamma(\alpha_1 + \alpha_2)}$$
$$= \frac{y_2^{\alpha_1 + \alpha_2 - 1}e^{-\frac{y_2}{2}}}{2^{\alpha_1 + \alpha_2}\Gamma(\alpha_1 + \alpha_2)} \quad \text{because of (8.4.1),}$$
i.e. $Y_2$ has a gamma density function with parameters 2 and $\alpha_1 + \alpha_2$. Since the joint density factorises as $f_{Y_1,Y_2}(y_1, y_2) = f_{Y_1}(y_1)\,f_{Y_2}(y_2)$, $Y_1$ and $Y_2$ are independent. ∎

Theorem 8.4.4b :
Suppose that $X_1$ and $X_2$ are two independent chi-squared random variables with $\nu_1$ and $\nu_2$ degrees of freedom respectively.
Then $Y_1 = \dfrac{X_1}{X_1 + X_2}$ and $Y_2 = X_1 + X_2$ are independent random variables, $Y_1$ is a beta random variable with parameters $\frac{\nu_1}{2}$ and $\frac{\nu_2}{2}$, and $Y_2$ is a chi-squared random variable with $\nu_1 + \nu_2$ degrees of freedom.
Proof :
We have that $X_1 \sim \mathrm{GAM}\!\left(2, \frac{\nu_1}{2}\right)$ and $X_2 \sim \mathrm{GAM}\!\left(2, \frac{\nu_2}{2}\right)$. It then follows from th.8.4.4a that $Y_1$ and $Y_2$ are independent, that $Y_1$ has a beta density function with parameters $\frac{\nu_1}{2}$ and $\frac{\nu_2}{2}$, and that $Y_2$ has a $\mathrm{GAM}\!\left(2, \frac{\nu_1}{2} + \frac{\nu_2}{2}\right)$ distribution, i.e. $Y_2 \sim \chi^2(\nu_1 + \nu_2)$. ∎
Theorem 8.4.4c :
Suppose that $X$ is a random variable with a beta density function with parameters $\frac{\nu_1}{2}$ and $\frac{\nu_2}{2}$. Let $Y = \dfrac{\nu_2}{\nu_1}\,\dfrac{X}{1 - X}$. Then the density function of $Y$ is given by
$$f_Y(y) = \begin{cases} \dfrac{\Gamma\!\left(\frac{\nu_1 + \nu_2}{2}\right)}{\Gamma\!\left(\frac{\nu_1}{2}\right)\Gamma\!\left(\frac{\nu_2}{2}\right)}\left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} y^{\frac{\nu_1}{2} - 1}\left(1 + \frac{\nu_1}{\nu_2}y\right)^{-\frac{\nu_1 + \nu_2}{2}} & \text{for } y > 0 \\ 0 & \text{otherwise.} \end{cases}$$
This density function is called an F distribution with $\nu_1$ and $\nu_2$ degrees of freedom.
Proof :
We have that
$$f_X(x) = \begin{cases} \dfrac{x^{\frac{\nu_1}{2} - 1}(1 - x)^{\frac{\nu_2}{2} - 1}}{B\!\left(\frac{\nu_1}{2}, \frac{\nu_2}{2}\right)} & \text{for } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$
Let $y = \dfrac{\nu_2}{\nu_1}\,\dfrac{x}{1 - x}$. Then $\dfrac{\nu_1}{\nu_2}y(1 - x) = x$, or $\dfrac{\nu_1}{\nu_2}y = x + \dfrac{\nu_1}{\nu_2}yx = \left(1 + \dfrac{\nu_1}{\nu_2}y\right)x$, or
$$x = \frac{\frac{\nu_1}{\nu_2}y}{1 + \frac{\nu_1}{\nu_2}y} = u^{-1}(y), \text{ say.}$$
Then
$$\frac{d}{dy}u^{-1}(y) = \frac{\frac{\nu_1}{\nu_2}\left(1 + \frac{\nu_1}{\nu_2}y\right) - \frac{\nu_1}{\nu_2}y\,\frac{\nu_1}{\nu_2}}{\left(1 + \frac{\nu_1}{\nu_2}y\right)^2} = \frac{\frac{\nu_1}{\nu_2}}{\left(1 + \frac{\nu_1}{\nu_2}y\right)^2}.$$
Therefore
$$f_Y(y) = f_X\!\left(u^{-1}(y)\right)\left|\frac{d}{dy}u^{-1}(y)\right| = \begin{cases} \dfrac{\left(\dfrac{\frac{\nu_1}{\nu_2}y}{1 + \frac{\nu_1}{\nu_2}y}\right)^{\frac{\nu_1}{2} - 1}\left(1 - \dfrac{\frac{\nu_1}{\nu_2}y}{1 + \frac{\nu_1}{\nu_2}y}\right)^{\frac{\nu_2}{2} - 1}}{B\!\left(\frac{\nu_1}{2}, \frac{\nu_2}{2}\right)}\,\dfrac{\frac{\nu_1}{\nu_2}}{\left(1 + \frac{\nu_1}{\nu_2}y\right)^2} & \text{for } 0 < \dfrac{\frac{\nu_1}{\nu_2}y}{1 + \frac{\nu_1}{\nu_2}y} < 1 \\ 0 & \text{otherwise} \end{cases}$$
$$= \begin{cases} \dfrac{\Gamma\!\left(\frac{\nu_1 + \nu_2}{2}\right)}{\Gamma\!\left(\frac{\nu_1}{2}\right)\Gamma\!\left(\frac{\nu_2}{2}\right)}\left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} y^{\frac{\nu_1}{2} - 1}\left(1 + \frac{\nu_1}{\nu_2}y\right)^{-\frac{\nu_1 + \nu_2}{2}} & \text{for } y > 0 \\ 0 & \text{otherwise.} \end{cases} \ ∎$$
Theorem 8.4.4 :
Suppose that $X_1$ and $X_2$ are independent chi-squared random variables with $\nu_1$ and $\nu_2$ degrees of freedom respectively. Let
$$F = \frac{X_1/\nu_1}{X_2/\nu_2}.$$
Then $F$ is distributed like F with $\nu_1$ and $\nu_2$ degrees of freedom.
Proof :
From th.8.4.4b we have that $Y = \dfrac{X_1}{X_1 + X_2}$ is a beta random variable with parameters $\frac{\nu_1}{2}$ and $\frac{\nu_2}{2}$. From th.8.4.4c it then follows that
$$\frac{\nu_2}{\nu_1}\,\frac{Y}{1 - Y} = \frac{\nu_2}{\nu_1}\,\frac{\dfrac{X_1}{X_1 + X_2}}{\dfrac{X_2}{X_1 + X_2}} = \frac{\nu_2}{\nu_1}\,\frac{X_1}{X_2} = \frac{X_1/\nu_1}{X_2/\nu_2}$$
is distributed like F with $\nu_1$ and $\nu_2$ degrees of freedom. ∎

Theorem 8.4.5 :
See Bain and Engelhardt p.276.
Theorem 8.4.6 :
If $U$ is distributed like F with $\nu_1$ and $\nu_2$ degrees of freedom, then $1/U$ is distributed like F with $\nu_2$ and $\nu_1$ degrees of freedom.
Proof :
To get an F distribution with $\nu_1$ and $\nu_2$ degrees of freedom, suppose that $X_1$ and $X_2$ are independent chi-squared random variables with $\nu_1$ and $\nu_2$ degrees of freedom respectively.
Then $U = \dfrac{X_1/\nu_1}{X_2/\nu_2}$ is distributed like F with $\nu_1$ and $\nu_2$ degrees of freedom. But $\dfrac{1}{U} = \dfrac{X_2/\nu_2}{X_1/\nu_1}$, and from th.8.4.4 it follows that $1/U$ is distributed like F with $\nu_2$ and $\nu_1$ degrees of freedom. ∎

We will use the notation $F \sim F(\nu_1, \nu_2)$ to indicate that the random variable $F$ has an F distribution with $\nu_1$ and $\nu_2$ degrees of freedom.
Now suppose that $F \sim F(\nu_1, \nu_2)$. Let $f_{\alpha}(\nu_1, \nu_2)$ be that value such that $P[F \le f_{\alpha}(\nu_1, \nu_2)] = \alpha$, i.e. $f_{\alpha}(\nu_1, \nu_2)$ is the $100\alpha$-th percentile of the F distribution with $\nu_1$ and $\nu_2$ degrees of freedom. Values of $f_{\alpha}(\nu_1, \nu_2)$ are given in Table 7, pp.609-611, in the book of Bain and Engelhardt. Note that the table gives values of $f_{\alpha}(\nu_1, \nu_2)$ for various combinations of values of $\nu_1$ and $\nu_2$ and $\alpha = .90, .95, .975, .99$ and $.995$, i.e. only for values of $\alpha$ close to 1. To obtain values of $f_{\alpha}(\nu_1, \nu_2)$ for values of $\alpha$ close to 0 we use th.8.4.6 as follows.
Suppose that $F \sim F(\nu_1, \nu_2)$, i.e. $1/F \sim F(\nu_2, \nu_1)$. Therefore
$$1 - \alpha = P[F \le f_{1-\alpha}(\nu_1, \nu_2)] = P[1/F \ge 1/f_{1-\alpha}(\nu_1, \nu_2)] = 1 - P[1/F < 1/f_{1-\alpha}(\nu_1, \nu_2)]$$
or
$$\alpha = P[1/F < 1/f_{1-\alpha}(\nu_1, \nu_2)],$$
i.e. $f_{\alpha}(\nu_2, \nu_1) = 1/f_{1-\alpha}(\nu_1, \nu_2)$, or $f_{\alpha}(\nu_1, \nu_2) = 1/f_{1-\alpha}(\nu_2, \nu_1)$.
EXAMPLE 8.4.2 :
Suppose that $F \sim F(5, 20)$. From Table 7 we get that $f_{.975}(5, 20) = 3.29$ and $f_{.975}(20, 5) = 6.33$, i.e. $f_{.025}(5, 20) = 1/6.33 = 0.157$.
Therefore
$$0.95 = 0.975 - 0.025 = P\left[f_{.025}(5, 20) \le F \le f_{.975}(5, 20)\right] = P[0.157 \le F \le 3.29].$$
Theorem 8.4.7 :
Suppose that $X_1, X_2, \dots, X_{n_1}$ is a random sample from a normal population with variance $\sigma_1^2$.
Suppose that $Y_1, Y_2, \dots, Y_{n_2}$ is an independent random sample from a normal population with variance $\sigma_2^2$.
Let $s_1^2$ be the sample variance of the $X$'s and let $s_2^2$ be the sample variance of the $Y$'s. Let
$$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}.$$
Then $F$ has an F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom.
Proof :
From th.8.3.6 we have that
$$U_1 = \frac{(n_1 - 1)s_1^2}{\sigma_1^2} \sim \chi^2(n_1 - 1) \quad \text{and} \quad U_2 = \frac{(n_2 - 1)s_2^2}{\sigma_2^2} \sim \chi^2(n_2 - 1),$$
and $U_1$ and $U_2$ are independent since the samples are independent.
It then follows from th.8.4.4 that
$$F = \frac{U_1/(n_1 - 1)}{U_2/(n_2 - 1)} = \frac{\dfrac{(n_1 - 1)s_1^2}{\sigma_1^2}\Big/(n_1 - 1)}{\dfrac{(n_2 - 1)s_2^2}{\sigma_2^2}\Big/(n_2 - 1)} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$
is distributed like F with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. ∎
EXAMPLE 8.4.3 :
Suppose that $X_1, X_2, \dots, X_{11}$ is a random sample from a normal population with variance $\sigma_1^2$.
Suppose that $Y_1, Y_2, \dots, Y_{21}$ is an independent random sample from a normal population with variance $\sigma_2^2$.
Let $s_1^2$ be the sample variance of the $X$'s and let $s_2^2$ be the sample variance of the $Y$'s. Then
$$\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F(10, 20)$$
and
$$0.95 = P\left[f_{.025}(10, 20) \le \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \le f_{.975}(10, 20)\right] = P\left[\frac{1}{f_{.975}(20, 10)} \le \frac{s_1^2}{s_2^2}\,\frac{\sigma_2^2}{\sigma_1^2} \le f_{.975}(10, 20)\right]$$
$$= P\left[\frac{1}{3.42} \le \frac{s_1^2}{s_2^2}\,\frac{\sigma_2^2}{\sigma_1^2} \le 2.77\right] \quad \text{from Table 7}$$
$$= P\left[\frac{1}{2.77}\,\frac{s_1^2}{s_2^2} \le \frac{\sigma_1^2}{\sigma_2^2} \le 3.42\,\frac{s_1^2}{s_2^2}\right],$$
i.e. the interval $\left(\dfrac{1}{2.77}\,\dfrac{s_1^2}{s_2^2},\ 3.42\,\dfrac{s_1^2}{s_2^2}\right)$ is a 95% confidence interval for $\dfrac{\sigma_1^2}{\sigma_2^2}$.

EXERCISES :
Bain and Engelhardt p.286 No.'s 17 and 18.
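For a concrete version of the interval in Example 8.4.3 (not part of the notes; SciPy assumed, with hypothetical simulated samples of sizes 11 and 21), the sketch below computes $\left(\frac{1}{2.77}\,\frac{s_1^2}{s_2^2},\ 3.42\,\frac{s_1^2}{s_2^2}\right)$ directly from the F percentiles:

import numpy as np
from scipy.stats import f

rng = np.random.default_rng(6)
x = rng.normal(0.0, 2.0, size=11)       # hypothetical sample, sigma_1 = 2
y = rng.normal(0.0, 1.5, size=21)       # hypothetical sample, sigma_2 = 1.5

s1, s2 = x.var(ddof=1), y.var(ddof=1)
ratio = s1 / s2

upper_f = f.ppf(0.975, 10, 20)          # = 2.77
lower_f = 1 / f.ppf(0.975, 20, 10)      # = 1/3.42

print(f"95% CI for sigma1^2/sigma2^2: ({ratio / upper_f:.3f}, {ratio / lower_f:.3f})")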
8.5 LARGE-SAMPLE APPROXIMATIONS

Theorem 8.5.1 :
Suppose that $Y_{\nu}$ is a random variable for $\nu = 1, 2, 3, \dots$ and that $Y_{\nu}$ has a chi-squared distribution with $\nu$ degrees of freedom. Let $Z_{\nu} = \dfrac{Y_{\nu} - \nu}{\sqrt{2\nu}}$. Then $Z_{\nu} \overset{d}{\to} Z$ where $Z$ is a standard normal random variable.
Proof :
If $Y_{\nu} \sim \chi^2(\nu)$ then $Y_{\nu}$ is distributed like $\sum_{i=1}^{\nu} X_i$ where the $X_i$'s are independent random variables all with a chi-squared distribution with 1 degree of freedom. Then $E[X_i] = 2 \times \tfrac{1}{2} = 1$ and $\mathrm{var}(X_i) = 2^2 \times \tfrac{1}{2} = 2$.
From the central limit theorem it then follows that
$$Z_{\nu} = \frac{\sum_{i=1}^{\nu} X_i - \nu \times 1}{\sqrt{\nu}\,\sqrt{2}} \overset{d}{\to} Z \quad \text{where } Z \sim N(0,1). \ ∎$$
Suppose that $Y_{\nu} \sim \chi^2(\nu)$ and that $\chi^2_{\alpha}(\nu)$ is the number such that $P[Y_{\nu} \le \chi^2_{\alpha}(\nu)] = \alpha$, i.e. $\chi^2_{\alpha}(\nu)$ is the $100\alpha$-th percentile of a chi-squared distribution with $\nu$ degrees of freedom.
For large values of $\nu$ we then get that
$$\alpha = P\left[\frac{Y_{\nu} - \nu}{\sqrt{2\nu}} \le \frac{\chi^2_{\alpha}(\nu) - \nu}{\sqrt{2\nu}}\right] \approx P\left[Z \le \frac{\chi^2_{\alpha}(\nu) - \nu}{\sqrt{2\nu}}\right] \quad \text{where } Z \sim N(0,1),$$
i.e.
$$\frac{\chi^2_{\alpha}(\nu) - \nu}{\sqrt{2\nu}} \approx z_{\alpha}, \quad \text{where } z_{\alpha} \text{ is the } 100\alpha\text{-th percentile of the } N(0,1) \text{ distribution.}$$
Therefore $\chi^2_{\alpha}(\nu) \approx \sqrt{2\nu}\,z_{\alpha} + \nu$.

EXAMPLE 8.5.1 :
Suppose that $\nu = 30$ and $\alpha = 0.95$. Then $\chi^2_{\alpha}(30) = 43.77$ from Table 4 in Bain and Engelhardt. The approximation derived above gives us
$$\chi^2_{\alpha}(30) \approx \sqrt{2 \times 30}\times 1.645 + 30 = 42.74.$$
It can also be shown that if $T_{\nu}$ is distributed like t with $\nu$ degrees of freedom, then $T_{\nu} \overset{d}{\to} Z$ where $Z \sim N(0,1)$. The percentiles of the t distribution with $\nu$ degrees of freedom are therefore approximately equal to the percentiles of a standard normal distribution for large $\nu$.
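The quality of the percentile approximation $\chi^2_{\alpha}(\nu) \approx \sqrt{2\nu}\,z_{\alpha} + \nu$ is easy to tabulate. The sketch below is not from the notes; it assumes SciPy is available and compares the approximation with the exact percentile for several values of ν.

from scipy.stats import chi2, norm

alpha = 0.95
z = norm.ppf(alpha)                       # 1.645

for nu in (10, 30, 100, 500):
    exact = chi2.ppf(alpha, nu)
    approx = (2 * nu) ** 0.5 * z + nu     # sqrt(2*nu)*z_alpha + nu
    print(f"nu={nu:4d}  exact={exact:8.2f}  approx={approx:8.2f}")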
CHAPTER 9 : POINT ESTIMATION

9.1 INTRODUCTION

Suppose that $X_1, X_2, \dots, X_n$ are jointly distributed random variables and that their joint distribution depends on certain quantities. These quantities are referred to as parameters of the distribution.
For example, if $X_1, X_2, \dots, X_n$ is a random sample from a normal population with expected value $\mu$ and variance $\sigma^2$ then
$$f_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2}.$$
This distribution depends on the two parameters $\mu$ and $\sigma^2$.
Another example is when the joint distribution of $X_1, X_2, \dots, X_k$ is a multinomial distribution, i.e.
$$f_X(x) = \frac{n!}{x_1!x_2!\cdots x_k!\,(n - x_1 - x_2 - \dots - x_k)!}\,p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}(1 - p_1 - p_2 - \dots - p_k)^{n - x_1 - x_2 - \dots - x_k}.$$
In this case the parameters of the distribution are $n$ and $p_1, p_2, \dots, p_k$.
Usually the properties (e.g. the distribution) of a population are not known completely. It may for instance be known (or may reasonably be expected) that the distribution of a population is normal, but the expected value and variance may not be known. In this case the quantities $\mu$ and $\sigma^2$ are unknown parameters.
The purpose of taking a sample from a population is usually to get to know more about the unknown properties of the population. On the basis of a sample the unknown properties, e.g. parameters, cannot be established exactly, and estimates of these parameters are therefore required. We may also be interested in finding intervals which we are fairly confident will include the values of these parameters, i.e. confidence intervals. It may also be of interest to test whether a parameter is equal to some specific value, i.e. we need tests of hypotheses. In Chapter 9 we study the problem of estimating unknown parameters of a distribution.
In the rest of this chapter we will assume that the distribution of a population depends on a number of parameters, say $\theta_1, \theta_2, \dots, \theta_k$. Let $\boldsymbol{\theta} = (\theta_1, \theta_2, \dots, \theta_k)$.
We will assume that $X_1, X_2, \dots, X_n$ is a random sample from a population with distribution $f(x; \theta_1, \theta_2, \dots, \theta_k) = f(x; \boldsymbol{\theta})$, i.e.
$$f_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} f(x_i; \theta_1, \theta_2, \dots, \theta_k).$$
Definition 9.1.1 :
A statistic $T = t(X_1, X_2, \dots, X_n)$ is an estimator of $\tau(\boldsymbol{\theta})$ if $t(x_1, x_2, \dots, x_n)$ is used to estimate $\tau(\boldsymbol{\theta})$ when the observed values of $X_1, X_2, \dots, X_n$ are equal to $x_1, x_2, \dots, x_n$.
For the observed values $x_1, x_2, \dots, x_n$, $t(x_1, x_2, \dots, x_n)$ is an estimate of $\tau(\boldsymbol{\theta})$.
A special case of $\tau(\boldsymbol{\theta})$ is $\tau(\boldsymbol{\theta}) = \boldsymbol{\theta}$, in which case we estimate the parameter $\boldsymbol{\theta}$ itself.

9.2 SOME METHODS OF ESTIMATION

Suppose that the distribution of a random variable $X$ is given by $f(x; \theta_1, \theta_2, \dots, \theta_k)$. Then the moments of $X$ will depend on $\theta_1, \theta_2, \dots, \theta_k$, i.e.
$$\mu'_j = E[X^j] = \mu'_j(\theta_1, \theta_2, \dots, \theta_k), \text{ say.}$$
We then say that $\mu'_1, \mu'_2, \dots, \mu'_k$ are the population moments of $X$, and all of them will be functions of the parameters of the distribution of $X$.

Definition 9.2.1 :
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a population with distribution $f(x; \theta_1, \theta_2, \dots, \theta_k)$. The sample moments are defined by
$$M'_j = \frac{1}{n}\sum_{i=1}^{n} X_i^j.$$
Note that $M'_1 = \bar{X}$, the sample mean.
We will use the notation $\hat{\theta}$ to indicate an estimate of the parameter $\theta$. If the observed values of $X_1, X_2, \dots, X_n$ are $x_1, x_2, \dots, x_n$ and we use $t(x_1, x_2, \dots, x_n)$ as an estimate of $\theta$, we use the notation $\hat{\theta} = t(x_1, x_2, \dots, x_n)$. The corresponding estimator is $\hat{\theta} = t(X_1, X_2, \dots, X_n)$.
Definition 9.2.2a : THE METHOD OF MOMENTS ESTIMATORS
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a population with distribution $f(x; \theta_1, \theta_2, \dots, \theta_k)$. Let $\mu'_j(\theta_1, \theta_2, \dots, \theta_k)$, $j = 1, 2, \dots, k$, be the population moments and let $M'_j$, $j = 1, 2, \dots, k$, be the sample moments.
If $x_1, x_2, \dots, x_n$ are the observed values of $X_1, X_2, \dots, X_n$, the observed values of the $M'_j$ will be $m'_j = \frac{1}{n}\sum_{i=1}^{n} x_i^j$, $j = 1, 2, \dots, k$.
The method of moments estimates of $\theta_1, \theta_2, \dots, \theta_k$, say $\hat{\theta}_1, \hat{\theta}_2, \dots, \hat{\theta}_k$, are those values of $\theta_1, \theta_2, \dots, \theta_k$ such that
$$\mu'_j(\hat{\theta}_1, \hat{\theta}_2, \dots, \hat{\theta}_k) = m'_j \quad \text{for } j = 1, 2, \dots, k. \tag{9.2.1}$$
The solution of equations (9.2.1) will depend on $x_1, x_2, \dots, x_n$, i.e. there will be functions $t_1, t_2, \dots, t_k$ of $x_1, x_2, \dots, x_n$ such that
$$\hat{\theta}_i = t_i(x_1, x_2, \dots, x_n) \quad \text{for } i = 1, 2, \dots, k.$$
The corresponding method of moments estimators of $\theta_1, \theta_2, \dots, \theta_k$ are then given by
$$\hat{\theta}_i = t_i(X_1, X_2, \dots, X_n) \quad \text{for } i = 1, 2, \dots, k.$$
The method of moments estimators of $\theta_1, \theta_2, \dots, \theta_k$, denoted by $\hat{\theta}_1, \hat{\theta}_2, \dots, \hat{\theta}_k$, are therefore those values of $\theta_1, \theta_2, \dots, \theta_k$ such that
$$\mu'_j(\hat{\theta}_1, \hat{\theta}_2, \dots, \hat{\theta}_k) = M'_j \quad \text{for } j = 1, 2, \dots, k.$$
The method of moments estimator of $\tau(\theta_1, \theta_2, \dots, \theta_k)$ is given by $\tau(\hat{\theta}_1, \hat{\theta}_2, \dots, \hat{\theta}_k)$.
EXAMPLES :

Example 9.2.1
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a normal population with parameters $\mu$ and $\sigma^2$. In this case
$$\mu'_1 = \mu \quad \text{and} \quad \mu'_2 = E[X^2] = \mathrm{var}(X) + \{E[X]\}^2 = \sigma^2 + \mu^2.$$
If $x_1, x_2, \dots, x_n$ are the observed values of $X_1, X_2, \dots, X_n$ we have that
$$m'_1 = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$$
and
$$m'_2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 = \frac{1}{n}\sum_{i=1}^{n}\left[(x_i - \bar{x}) + \bar{x}\right]^2 = \frac{1}{n}\sum_{i=1}^{n}\left[(x_i - \bar{x})^2 + 2(x_i - \bar{x})\bar{x} + (\bar{x})^2\right]$$
$$= \frac{1}{n}\left[\sum_{i=1}^{n}(x_i - \bar{x})^2 + 2\bar{x}\sum_{i=1}^{n}(x_i - \bar{x}) + n(\bar{x})^2\right] = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 + (\bar{x})^2, \tag{9.2.2}$$
since $\sum_{i=1}^{n}(x_i - \bar{x}) = n\bar{x} - n\bar{x} = 0$.
The equations to determine $\hat{\mu}$ and $\hat{\sigma}^2$ are therefore given by
$$\hat{\mu} = \bar{x} \quad \text{and} \quad \hat{\sigma}^2 + (\hat{\mu})^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 + (\bar{x})^2.$$
The solution is given by
$$\hat{\mu} = \bar{x} \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{n - 1}{n}s^2,$$
where $s^2$ is the sample variance.
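A quick numerical check of Example 9.2.1 (not part of the notes; NumPy assumed, with a simulated sample) shows the method of moments estimates alongside the usual $\bar{x}$ and $s^2$:

import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(loc=10.0, scale=3.0, size=200)   # hypothetical N(10, 9) sample
n = len(x)

mu_hat = x.mean()                                # method of moments: mu_hat = x-bar
sigma2_hat = np.mean((x - mu_hat) ** 2)          # (1/n) * sum (x_i - x-bar)^2
s2 = x.var(ddof=1)                               # sample variance with divisor n-1

print(f"mu_hat={mu_hat:.3f}  sigma2_hat={sigma2_hat:.3f}  (n-1)/n * s^2={(n - 1) / n * s2:.3f}")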
Example 9.2.2
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a population with an exponential distribution with parameter $\theta$. In this case $\mu'_1 = E[X] = \theta$. The equation to determine the method of moments estimate of $\theta$ is then given by $\hat{\theta} = m'_1 = \bar{x}$.

Example 9.2.3
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a population with an exponential distribution with parameter $\theta$. In this example we want to estimate
$$\tau(\theta) = P[X \ge 1] = e^{-\frac{1}{\theta}}.$$
According to the definition of method of moments estimators, the estimate is $\tau(\hat{\theta}) = \tau(\bar{x}) = e^{-\frac{1}{\bar{x}}}$, from Example 9.2.2.
Another way to estimate $e^{-\frac{1}{\theta}}$ is to write the distribution in terms of $e^{-\frac{1}{\theta}}$.
Let $\eta = e^{-\frac{1}{\theta}}$. Then $\ln(\eta) = -\frac{1}{\theta}$, i.e. $\theta = -\dfrac{1}{\ln(\eta)}$.
In terms of $\eta$ the density function is given by
$$f(x; \eta) = \begin{cases} \left(-\ln(\eta)\right)e^{(\ln(\eta))x} & \text{for } x > 0 \\ 0 & \text{otherwise.} \end{cases}$$
In this case $\mu'_1(\eta) = E[X] = -\dfrac{1}{\ln(\eta)}$ and $m'_1 = \bar{x}$.
The equation to determine the method of moments estimate of $\eta$ is therefore
$$-\frac{1}{\ln(\hat{\eta})} = \bar{x}, \quad \text{i.e. } \ln(\hat{\eta}) = -\frac{1}{\bar{x}} \quad \text{or} \quad \hat{\eta} = e^{-\frac{1}{\bar{x}}},$$
which is the same estimate as obtained above. The fact that we get the same estimate is an example of the so-called invariance property of method of moments estimates.
Example 9.2.4
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a population with a uniform distribution over the interval $(\theta_1, \theta_2)$. Then
$$f(x; \theta_1, \theta_2) = \begin{cases} \dfrac{1}{\theta_2 - \theta_1} & \text{for } \theta_1 < x < \theta_2 \\ 0 & \text{otherwise.} \end{cases}$$
In this case
$$\mu'_1(\theta_1, \theta_2) = \frac{\theta_1 + \theta_2}{2}$$
and
$$\mu'_2(\theta_1, \theta_2) = \mathrm{var}(X) + \{E[X]\}^2 = \frac{(\theta_2 - \theta_1)^2}{12} + \frac{(\theta_1 + \theta_2)^2}{4} = \frac{\theta_1^2 + \theta_1\theta_2 + \theta_2^2}{3}.$$
The equations to determine the method of moments estimates $\hat{\theta}_1$ and $\hat{\theta}_2$ are
$$\frac{\hat{\theta}_1 + \hat{\theta}_2}{2} = m'_1 = \bar{x} \tag{9.2.3}$$
and
$$\frac{\hat{\theta}_1^2 + \hat{\theta}_1\hat{\theta}_2 + \hat{\theta}_2^2}{3} = m'_2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2. \tag{9.2.4}$$
From (9.2.3) we get that $\hat{\theta}_2 = 2\bar{x} - \hat{\theta}_1$, and if we substitute this in (9.2.4) we get that
$$\hat{\theta}_1^2 + \hat{\theta}_1(2\bar{x} - \hat{\theta}_1) + (2\bar{x} - \hat{\theta}_1)^2 = \frac{3}{n}\sum_{i=1}^{n} x_i^2$$
or
$$\hat{\theta}_1^2 + \hat{\theta}_1(2\bar{x} - 4\bar{x}) + 4\bar{x}^2 - \frac{3}{n}\sum_{i=1}^{n} x_i^2 = 0. \tag{9.2.5}$$
Equation (9.2.5) is a quadratic equation for $\hat{\theta}_1$ and will give us two solutions. Since $\theta_1 < \theta_2$ we must use that solution for which $\hat{\theta}_1 < \hat{\theta}_2$.

EXERCISES :
Bain and Engelhardt p.328 No. 1 and 2(b), 2(c) and 2(d).
METHOD OF MAXIMUM LIKELIHOOD

Definition 9.2.2 :
Suppose that the joint distribution of $X_1, X_2, \dots, X_n$ is given by $f_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_n) = f_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_n; \theta)$ for all $(x_1, x_2, \dots, x_n) \in \mathbb{R}^n$. Let $\Omega$ be the set of all possible values of $\theta$.
Given the observed values $x_1, x_2, \dots, x_n$ of $X_1, X_2, \dots, X_n$, the function $L : \Omega \to \mathbb{R}$, where $L(\theta) = f(x_1, x_2, \dots, x_n; \theta)$ for $\theta \in \Omega$, is defined as the likelihood function.
In the case where $X_1, X_2, \dots, X_n$ is a random sample from a population with distribution $f(x; \theta)$, the likelihood function is given by
$$L(\theta) = f(x_1; \theta)\,f(x_2; \theta)\cdots f(x_n; \theta) \quad \text{for all } \theta \in \Omega.$$

Definition 9.2.3 :
Suppose that $L(\theta) = f_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_n; \theta)$, $\theta \in \Omega$, is the likelihood function for the observed values $x_1, x_2, \dots, x_n$ of the random variables $X_1, X_2, \dots, X_n$.
A value $\hat{\theta} \in \Omega$ is a maximum likelihood estimate of $\theta$ if
$$f_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_n; \hat{\theta}) = \max_{\theta \in \Omega} f_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_n; \theta) = \max_{\theta \in \Omega} L(\theta).$$
Given the observed values $x_1, x_2, \dots, x_n$, the value of $\theta$ for which $L(\theta)$ is a maximum is not necessarily unique. If it is unique for all values of $x_1, x_2, \dots, x_n$, then $\hat{\theta}$ is a function of $x_1, x_2, \dots, x_n$, say $\hat{\theta} = t(x_1, x_2, \dots, x_n)$ for all $(x_1, x_2, \dots, x_n) \in \mathbb{R}^n$.
The corresponding maximum likelihood estimator is then given by $\hat{\theta} = t(X_1, X_2, \dots, X_n)$.
NOTE 1 : In case $\boldsymbol{\theta}$ consists of only one parameter $\theta$, and if $\Omega$ consists of an open interval and $L(\theta)$ is a maximum for some $\theta \in \Omega$, then the maximum likelihood estimate $\hat{\theta}$ of $\theta$ is a solution of the equation
$$\frac{d}{d\theta}L(\theta) = 0. \tag{9.2.6}$$
If the solution is not unique, it must be determined which one of the solutions gives the maximum value of $L(\theta)$ - remember a solution of (9.2.6) may be a minimum of $L(\theta)$.
NOTE 2 : If $\hat{\theta}$ gives a maximum of $L(\theta)$, then $\hat{\theta}$ gives a maximum of $\ln L(\theta)$ and vice versa. The maximum likelihood estimate of $\theta$ is therefore a value $\hat{\theta}$ of $\theta$ which is a solution of the equation
$$\frac{d}{d\theta}\ln L(\theta) = 0. \tag{9.2.7}$$
Theorem 9.2.1a :
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a population with a Poisson distribution with parameter $\theta$. Then the maximum likelihood estimator of $\theta$ is given by $\hat{\theta} = \bar{X}$.
Proof :
Suppose that $x_1, x_2, \dots, x_n$ are the observed values of $X_1, X_2, \dots, X_n$. Then
$$L(\theta) = \prod_{i=1}^{n}\frac{e^{-\theta}\theta^{x_i}}{x_i!} = \frac{e^{-n\theta}\,\theta^{\sum_{i=1}^{n}x_i}}{\prod_{i=1}^{n}x_i!}$$
and
$$\ln L(\theta) = -n\theta + \sum_{i=1}^{n}x_i\ln(\theta) - \ln\left(\prod_{i=1}^{n}x_i!\right).$$
Therefore
$$\frac{d}{d\theta}\ln L(\theta) = -n + \sum_{i=1}^{n}x_i\cdot\frac{1}{\theta} = 0 \quad \text{or} \quad \hat{\theta} = \frac{1}{n}\sum_{i=1}^{n}x_i = \bar{x}.$$
To show that $\hat{\theta} = \bar{x}$ actually gives a maximum value of $L(\theta)$, consider
$$\left.\frac{d^2}{d\theta^2}\ln L(\theta)\right|_{\theta = \hat{\theta}} = \left.-\sum_{i=1}^{n}x_i\cdot\frac{1}{\theta^2}\right|_{\theta = \hat{\theta}} = -\{n\bar{x}\}\frac{1}{\bar{x}^2} = -\frac{n}{\bar{x}} < 0 \quad \text{if } \bar{x} > 0, \text{ i.e. if } \sum_{i=1}^{n}x_i > 0,$$
which will be the case unless all $x_i$'s are zero. Assuming not all observations are zero, we can then conclude that $\hat{\theta} = \bar{x}$ does give a maximum value for $L(\theta)$ and therefore $\hat{\theta} = \bar{X}$. ∎
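As a numerical illustration of Theorem 9.2.1a (not part of the notes; NumPy assumed), the sketch below evaluates the Poisson log-likelihood on a grid for a simulated sample and confirms that it peaks at θ = x̄.

import numpy as np
from math import lgamma

rng = np.random.default_rng(8)
x = rng.poisson(lam=3.2, size=100)          # hypothetical Poisson(3.2) sample

def log_lik(theta):
    # ln L(theta) = -n*theta + sum(x)*ln(theta) - sum(ln(x_i!))
    return -len(x) * theta + x.sum() * np.log(theta) - sum(lgamma(xi + 1) for xi in x)

grid = np.linspace(1.0, 6.0, 2001)
best = grid[np.argmax([log_lik(th) for th in grid])]
print("grid maximiser:", round(best, 3), "  sample mean:", round(x.mean(), 3))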
Theorem 9.2.1 : THE INVARIANCE PROPERTY OF ML ESTIMATORS
Suppose that $X_1, X_2, \dots, X_n$ is a random sample from a population with distribution given by $f(x; \theta)$. Let $\hat{\theta}$ be the maximum likelihood estimator (ml estimator) of $\theta$. If $\tau = u(\theta)$ is a one-to-one function of $\theta$, then the ml estimator of $\tau$ is given by $\hat{\tau} = u(\hat{\theta})$, i.e. the ml estimator of a function of $\theta$ is that function evaluated at the ml estimator of $\theta$.
Proof :
Let $L(\theta)$ be the likelihood function in terms of $\theta$, i.e.
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta).$$
Let $L^*(\tau)$ be the likelihood function in terms of $\tau$. To find $L^*(\tau)$ we need to write the distribution of $X$ in terms of $\tau$. Suppose that $u : \mathbb{R} \to \mathbb{R}$ is a one-to-one function, i.e. if $\tau = u(\theta)$, then $\theta = u^{-1}(\tau)$. Therefore $f(x; \theta) = f(x; u^{-1}(\tau))$. Hence
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} f(x_i; u^{-1}(\tau)) = L(u^{-1}(\tau)) = L^*(\tau). \tag{9.2.8}$$
The maximum value of $L$ is obtained when $\theta = \hat{\theta}$, and in terms of $\tau$ when $\tau = \hat{\tau}$. Because of the relationship (9.2.8), the maximums, which must be the same on both sides of (9.2.8), are attained when $\hat{\theta} = u^{-1}(\hat{\tau})$, i.e. $\hat{\tau} = u(\hat{\theta})$. ∎
JTheorem 9.2.2a :
Suppose that X_1, X_2, ..., X_n is a random sample from a population with an exponential distribution with parameter θ. Then the maximum likelihood estimator of θ is given by θ̂ = X̄.
Proof :
In this case
f(x; θ) = (1/θ)e^{-x/θ} for x > 0 and 0 otherwise.
Therefore
L(θ) = Π_{i=1}^n (1/θ)e^{-x_i/θ} if all x_i > 0, and 0 otherwise.
Since P[X_i > 0] = 1 and P[all X_i > 0] = 1, we will assume that all x_i > 0, in which case
L(θ) = (1/θ^n) e^{-(1/θ)Σ_{i=1}^n x_i}
and then
ln L(θ) = -n ln(θ) - (1/θ)Σ_{i=1}^n x_i
and
(d/dθ) ln L(θ) = -n·(1/θ) + (1/θ²)Σ_{i=1}^n x_i
and for θ = θ̂ we have that
-n·(1/θ̂) + (1/θ̂²)Σ_{i=1}^n x_i = 0, i.e. Σ_{i=1}^n x_i = nθ̂,
i.e. θ̂ = (1/n)Σ_{i=1}^n x_i = x̄ gives a maximum value for L(θ) and therefore θ̂ = X̄. ſ
EXAMPLES :
Example 9.2.5
Suppose that X_1, X_2, ..., X_n is a random sample from a population with density function given by
f(x; η) = e^{-(x-η)} for x ≥ η and 0 otherwise.
Then
L(η) = Π_{i=1}^n e^{-(x_i-η)} = e^{-Σ_{i=1}^n (x_i-η)} if all x_i ≥ η, and 0 otherwise.
Let y_1 = min{x_1, x_2, ..., x_n}. Then η ≤ all x_i's if and only if η ≤ y_1, so
L(η) = e^{-Σ_{i=1}^n (x_i-η)} if η ≤ y_1, and 0 if η > y_1.
Note that as η gets bigger, Σ_{i=1}^n (x_i - η) gets smaller, i.e. -Σ_{i=1}^n (x_i - η) gets bigger and therefore e^{-Σ_{i=1}^n (x_i-η)} gets bigger, i.e. L(η) gets bigger, provided η ≤ y_1. As soon as η becomes greater than y_1 the value of L(η) becomes zero and stays zero for all η > y_1. A graphical representation of L(η) is given in Figure 9.1 in Bain and Engelhardt.
Therefore L(η) is a maximum for η̂ = y_1 = min{x_1, x_2, ..., x_n}. This is a case where the maximum of L(η) is not at a point where the derivative of L(η) is zero.
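The fact that the maximum occurs on the boundary η = y_1 can also be seen numerically. The following Python sketch (illustrative only; the simulated data are ours) evaluates ln L(η) on a grid around the sample minimum.

import numpy as np

rng = np.random.default_rng(2)
x = 4.0 + rng.exponential(scale=1.0, size=30)        # sample from f(x; eta) with eta = 4

def log_lik(eta):
    # ln L(eta) = -sum(x_i - eta) if eta <= min(x_i), and -infinity otherwise
    return -np.sum(x - eta) if eta <= x.min() else -np.inf

grid = np.linspace(x.min() - 1.0, x.min() + 1.0, 2001)
vals = np.array([log_lik(e) for e in grid])
print(grid[np.argmax(vals)], x.min())                # the maximiser is (numerically) the sample minimum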
Example 9.2.6
Suppose that X_1, X_2, ..., X_n is a random sample from a normal population with expected value μ and variance σ². Suppose that σ² is known but that μ is unknown. Suppose that x_1, x_2, ..., x_n are the observed values of X_1, X_2, ..., X_n.
In this case
L(μ) = Π_{i=1}^n (1/(√(2π)σ)) e^{-(1/2)((x_i-μ)/σ)²} = (2π)^{-n/2}(σ²)^{-n/2} e^{-(1/(2σ²))Σ_{i=1}^n (x_i-μ)²}
i.e.
ln L(μ) = -(n/2)ln(2π) - (n/2)ln(σ²) - (1/(2σ²))Σ_{i=1}^n (x_i-μ)²
and
(d/dμ) ln L(μ) = -(1/(2σ²))Σ_{i=1}^n 2(x_i-μ)(-1) = (1/σ²)(Σ_{i=1}^n x_i - nμ) = (1/σ²){nx̄ - nμ}
and therefore
(d/dμ) ln L(μ) = 0 for μ = x̄, i.e. μ̂ = x̄ and therefore μ̂ = X̄.
Example 9.2.7
Suppose that X_1, X_2, ..., X_n is a random sample from a normal population with expected value μ and variance σ². Suppose that μ is known but that σ² is unknown. Suppose that x_1, x_2, ..., x_n are the observed values of X_1, X_2, ..., X_n. In this case
L(σ²) = Π_{i=1}^n (1/(√(2π)σ)) e^{-(1/2)((x_i-μ)/σ)²} = (2π)^{-n/2}(σ²)^{-n/2} e^{-(1/(2σ²))Σ_{i=1}^n (x_i-μ)²}
i.e.
ln L(σ²) = -(n/2)ln(2π) - (n/2)ln(σ²) - (1/(2σ²))Σ_{i=1}^n (x_i-μ)²
and
(d/d(σ²)) ln L(σ²) = -(n/2)·(1/σ²) + (1/(2(σ²)²))Σ_{i=1}^n (x_i-μ)² = (1/(2(σ²)²)){-nσ² + Σ_{i=1}^n (x_i-μ)²}
and therefore
(d/d(σ²)) ln L(σ²) = 0 if σ² = (1/n)Σ_{i=1}^n (x_i-μ)², i.e. σ̂² = (1/n)Σ_{i=1}^n (x_i-μ)²
and σ̂² = (1/n)Σ_{i=1}^n (X_i-μ)².
JTheorem 9.2.2b :
Suppose that X_1, X_2, ..., X_n is a random sample from a normal population with expected value μ and variance σ². Suppose that μ and σ² are unknown. Suppose that x_1, x_2, ..., x_n are the observed values of X_1, X_2, ..., X_n.
Then μ̂ = X̄ and σ̂² = (1/n)Σ_{i=1}^n (X_i - X̄)².
Proof :
In this case
L(μ, σ²) = Π_{i=1}^n (1/(√(2π)σ)) e^{-(1/2)((x_i-μ)/σ)²} = (2π)^{-n/2}(σ²)^{-n/2} e^{-(1/(2σ²))Σ_{i=1}^n (x_i-μ)²}
i.e.
ln L(μ, σ²) = -(n/2)ln(2π) - (n/2)ln(σ²) - (1/(2σ²))Σ_{i=1}^n (x_i-μ)²
and
(∂/∂μ) ln L(μ, σ²) = -(1/(2σ²))Σ_{i=1}^n 2(x_i-μ)(-1) = (1/σ²)(Σ_{i=1}^n x_i - nμ) = (1/σ²){nx̄ - nμ}
and therefore
(∂/∂μ) ln L(μ, σ²) = 0 for μ = x̄, i.e. μ̂ = x̄.    (9.2.9)
We also have that
(∂/∂(σ²)) ln L(μ, σ²) = -(n/2)·(1/σ²) + (1/(2(σ²)²))Σ_{i=1}^n (x_i-μ)² = (1/(2(σ²)²)){-nσ² + Σ_{i=1}^n (x_i-μ)²}.    (9.2.10)
Substituting (9.2.9) in (9.2.10) we get that
(∂/∂(σ²)) ln L(μ, σ²) = 0 and (∂/∂μ) ln L(μ, σ²) = 0 if σ² = (1/n)Σ_{i=1}^n (x_i - x̄)² and μ = x̄,
i.e. σ̂² = (1/n)Σ_{i=1}^n (x_i - x̄)² and μ̂ = x̄,
and therefore μ̂ = X̄ and σ̂² = (1/n)Σ_{i=1}^n (X_i - X̄)². ſ
EXAMPLES :
Example 9.2.8
Suppose that X_1, X_2, ..., X_n is a random sample from a population with density function given by
f(x; θ, η) = (1/θ)e^{-(x-η)/θ} for x ≥ η and 0 otherwise,
where θ > 0 and η are unknown parameters.
Let x_1, x_2, ..., x_n be the observed values of X_1, X_2, ..., X_n. Then
L(θ, η) = Π_{i=1}^n (1/θ)e^{-(x_i-η)/θ} = (1/θ^n)e^{-(1/θ)Σ_{i=1}^n (x_i-η)} if all x_i ≥ η, and 0 otherwise,
i.e.
ln L(θ, η) = -n ln(θ) - (1/θ)Σ_{i=1}^n (x_i-η) if all x_i ≥ η, and -∞ otherwise.
Note that for any fixed value of θ the value of -n ln(θ) - (1/θ)Σ_{i=1}^n (x_i-η) gets bigger and bigger as η gets bigger (since θ > 0), provided η is less than all the x_i's, i.e. as long as η is less than y_1 = min{x_1, x_2, ..., x_n}. As soon as η gets bigger than y_1 the value of ln L(θ, η) becomes -∞ and stays -∞. The maximum value of L(θ, η) is therefore obtained at η = y_1 for any value of θ. Given η̂ = y_1 we must then find that value of θ for which L(θ, y_1) is a maximum. We have that
ln L(θ, y_1) = -n ln(θ) - (1/θ)Σ_{i=1}^n (x_i - y_1)
and
(d/dθ) ln L(θ, y_1) = -n·(1/θ) + (1/θ²)Σ_{i=1}^n (x_i - y_1) = 0
if
-nθ + Σ_{i=1}^n (x_i - y_1) = 0
i.e. if
θ̂ = (1/n)Σ_{i=1}^n (x_i - y_1) = x̄ - y_1 and η̂ = y_1.
Example 9.2.9
Suppose that X_1, X_2, ..., X_n is a random sample from a population with a Weibull density function given by
f(x; θ, β) = (β/θ)(x/θ)^{β-1} e^{-(x/θ)^β} for x > 0 and 0 otherwise.
Let x_1, x_2, ..., x_n be the observed values of X_1, X_2, ..., X_n. Then
L(θ, β) = Π_{i=1}^n (β/θ)(x_i/θ)^{β-1} e^{-(x_i/θ)^β} = (β/θ)^n [Π_{i=1}^n (x_i/θ)^{β-1}] e^{-Σ_{i=1}^n (x_i/θ)^β} if all x_i > 0, and 0 otherwise.
For all x_i > 0 we have that
ln L(θ, β) = n ln(β) - n ln(θ) + (β - 1)Σ_{i=1}^n ln(x_i/θ) - Σ_{i=1}^n (x_i/θ)^β.
Hence
(∂/∂θ) ln L(θ, β) = -n/θ - (β - 1)(n/θ) + (β/θ)Σ_{i=1}^n (x_i/θ)^β = -nβ/θ + (β/θ^{β+1})Σ_{i=1}^n x_i^β
and
(∂/∂β) ln L(θ, β) = n/β + Σ_{i=1}^n ln(x_i/θ) - Σ_{i=1}^n (x_i/θ)^β ln(x_i/θ).
To find the maximum likelihood estimates we have to solve the equations
-nβ/θ + (β/θ^{β+1})Σ_{i=1}^n x_i^β = 0    (9.2.11)
n/β + Σ_{i=1}^n ln(x_i/θ) - Σ_{i=1}^n (x_i/θ)^β ln(x_i/θ) = 0    (9.2.12)
From (9.2.11) we get that
-nθ^β + Σ_{i=1}^n x_i^β = 0, i.e. θ = ( Σ_{i=1}^n x_i^β / n )^{1/β}.    (9.2.13)
We can substitute this into (9.2.12) to obtain an equation in β to solve. It is not possible to find an analytic solution to this equation and it therefore becomes necessary to use numerical methods to find a solution. Once the solution for β is known it can be substituted in (9.2.13) to find the value of θ.
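A possible numerical approach (a sketch, not part of the notes; it assumes SciPy is available and uses simulated data) is to eliminate θ with (9.2.13) and solve the resulting profile equation in β with a root finder, as in the following Python fragment.

import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(3)
x = 1.8 * rng.weibull(a=2.5, size=200)        # simulated data: shape beta = 2.5, scale theta = 1.8

def profile_eq(beta):
    # equation (9.2.12) with theta eliminated via (9.2.13):
    # 1/beta + (1/n) sum ln x_i - sum(x_i^beta * ln x_i) / sum(x_i^beta) = 0
    xb = x ** beta
    return 1.0 / beta + np.mean(np.log(x)) - np.sum(xb * np.log(x)) / np.sum(xb)

beta_hat = brentq(profile_eq, 0.1, 20.0)                 # numerical root for beta
theta_hat = np.mean(x ** beta_hat) ** (1.0 / beta_hat)   # back-substitute into (9.2.13)
print(beta_hat, theta_hat)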
OEXERCISES :
Bain and Engelhardt p.328 No. 3 ,4(a) - 4(c) , 6, 7 and 13.
9.3 CRITERIA FOR EVALUATING ESTIMATORS
ƒDefinition 9.3.1 :
The estimator T of τ(θ) is said to be an unbiased estimator of τ(θ) if
E[T] = τ(θ) for all possible values of θ.
If an estimator T is not an unbiased estimator of τ(θ), it is a biased estimator of τ(θ).
EXAMPLES :
Example 9.3.1
Suppose that X_1, X_2, ..., X_n is a random sample from a population with expected value μ. Let X̄ be the sample mean. Then E[X̄] = μ for all possible values of μ, i.e. X̄ is an unbiased estimator of μ.
Example 9.3.2
Suppose that X_1, X_2, ..., X_n is a random sample from a population with variance σ². Let s² be the sample variance. Then E[s²] = σ² for all possible values of σ², i.e. s² is an unbiased estimator of σ².
Example 9.3.3
Suppose that X_1, X_2, ..., X_n are independent Bernoulli random variables with parameter p. Let p̂ = (1/n)Σ_{i=1}^n X_i. Then E[p̂] = p and this is true for all possible values of p, i.e. p̂ is an unbiased estimator of p.
It is often possible to derive several different estimators of a parameter; for example, the method of moments estimator may differ from the maximum likelihood estimator. It is then necessary to compare the properties of the estimators to decide which one is the "best" estimator.
Suppose that T is used as an estimator of τ(θ). The "error" in the estimate is then T - τ(θ). If T is an unbiased estimator of τ(θ), then
E["error"] = E[T - τ(θ)] = τ(θ) - τ(θ) = 0 for all possible values of θ.
An unbiased estimator is a "good" estimator in the sense that the expected or average error is zero. This is, however, not the only desirable property of an estimator and sometimes we may even prefer a biased estimator to an unbiased one. A further problem is that it is possible to obtain more than one unbiased estimator, and we then have to decide which one of the unbiased estimators is best.
Example 9.3.4
Suppose that X_1, X_2, ..., X_n is a random sample from a population with an exponential distribution with parameter θ. Then E[X_i] = θ and therefore E[X̄] = θ for all values of θ, i.e. X̄ is an unbiased estimator of θ.
Let Y_1 = min{X_1, X_2, ..., X_n}. Then
F_{Y_1}(y_1) = 1 - [1 - F(y_1)]^n
 = 1 - [1 - {1 - e^{-y_1/θ}}]^n = 1 - e^{-ny_1/θ} if y_1 > 0, and 1 - [1 - 0]^n = 0 if y_1 ≤ 0,
i.e. Y_1 has an exponential distribution with parameter θ/n. Therefore
E[Y_1] = θ/n and E[nY_1] = θ. Since this is true for all values of θ it means that T = nY_1 is an unbiased estimator of θ.
For the unbiased estimator T of τ(θ) to be a "good" estimator we would like the absolute value of the "error", namely |T - τ(θ)|, to be as small as possible with a probability as close to 1 as possible.
Given ε > 0, we would therefore like the probability P[|T - τ(θ)| > ε] to be as small as possible. From Chebyshev's inequality we have that
P[|T - τ(θ)| ≥ ε] = P[|T - τ(θ)| ≥ (ε/√var(T))·√var(T)] ≤ var(T)/ε²
i.e. to make P[|T - τ(θ)| > ε] as small as possible, we must make var(T) as small as possible. We therefore expect that estimators with smaller variances are better than those with bigger variances.
Example 9.3.5
Continuation of Example 9.3.4.
For the exponential distribution with parameter θ the variance is θ², i.e. var(X̄) = θ²/n. Since Y_1 has an exponential distribution with parameter θ/n we have that
var(Y_1) = θ²/n² and var(nY_1) = n²·θ²/n² = θ².
Hence var(X̄) ≤ var(nY_1) and therefore X̄ is a better unbiased estimator of θ than nY_1.
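This comparison is easy to see in a simulation. The following Python sketch (our own illustration with simulated exponential data) estimates the mean and variance of X̄ and of nY_1.

import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 2.0, 10, 100000
x = rng.exponential(scale=theta, size=(reps, n))
xbar = x.mean(axis=1)                  # unbiased estimator X-bar
nY1 = n * x.min(axis=1)                # unbiased estimator n*Y_1
print(xbar.mean(), nY1.mean())         # both close to theta = 2
print(xbar.var(), nY1.var())           # close to theta^2/n = 0.4 and theta^2 = 4 respectively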
ƒDefinition 9.3.2 : UNIFORMLY MINIMUM VARIANCE UNBIASED ESTIMATORS
An estimator T* of τ(θ) is called a uniformly minimum variance unbiased estimator (UMVUE) of τ(θ) if
1. T* is an unbiased estimator of τ(θ), and
2. for any unbiased estimator T of τ(θ), var(T*) ≤ var(T) for all possible values of θ.
If possible we would like to use uniformly minimum variance unbiased estimators if they can be found.
One problem is to know whether or not a given unbiased estimator has minimum variance.
JTheorem 9.3.2a :
Suppose X_1, X_2, ..., X_n is a random sample from a population with density function f(x; θ). If T is an unbiased estimator of τ(θ), then the Cramer-Rao lower bound (CRLB), based on a random sample, is
var(T) ≥ [τ'(θ)]² / ( n E[((∂/∂θ) ln f(X; θ))²] ) = CRLB.    (9.3.1)
Proof :
Assume the case of sampling from a continuous distribution. Define the random variable
U* = U*(X_1, ..., X_n; θ) = (∂/∂θ) ln f(X_1, ..., X_n; θ) = [(∂/∂θ) f(X_1, ..., X_n; θ)] / f(X_1, ..., X_n; θ);
then
E(U*) = ∫⋯∫ U*(x_1, ..., x_n; θ) f(x_1, ..., x_n; θ) dx_1⋯dx_n
 = ∫⋯∫ (∂/∂θ) f(x_1, ..., x_n; θ) dx_1⋯dx_n
 = (∂/∂θ) ∫⋯∫ f(x_1, ..., x_n; θ) dx_1⋯dx_n   (we assume that we can interchange the order of differentiation and integration)
 = (∂/∂θ)(1) = 0.    (9.3.2)
If T = T(X_1, ..., X_n) is an unbiased estimator of τ(θ), then
τ(θ) = E(T) = ∫⋯∫ T(x_1, ..., x_n) f(x_1, ..., x_n; θ) dx_1⋯dx_n
and if we differentiate with respect to θ, then
τ'(θ) = ∫⋯∫ T(x_1, ..., x_n) (∂/∂θ) f(x_1, ..., x_n; θ) dx_1⋯dx_n
 = ∫⋯∫ T(x_1, ..., x_n) U*(x_1, ..., x_n; θ) f(x_1, ..., x_n; θ) dx_1⋯dx_n
 = E(TU*).
Since E(U*) = 0 it follows that
E(TU*) = Cov(T, U*) + E(T)E(U*) = Cov(T, U*).    (9.3.3)
Because the correlation coefficient is always between ±1:
[Cor(T, U*)]² ≤ 1
⇒ [Cov(T, U*)]² / (Var(T)Var(U*)) ≤ 1
⇒ [Cov(T, U*)]² ≤ Var(T)Var(U*)
⇒ Var(T) ≥ [τ'(θ)]² / Var(U*).    (9.3.4)
When X_1, ..., X_n represents a random sample,
f(X_1, ..., X_n; θ) = f(X_1; θ)⋯f(X_n; θ)
so that
U*(X_1, ..., X_n; θ) = (∂/∂θ) ln Π_{i=1}^n f(X_i; θ) = Σ_{i=1}^n (∂/∂θ) ln f(X_i; θ) = Σ_{i=1}^n U(X_i; θ)
where U(X_i; θ) = (∂/∂θ) ln f(X_i; θ), i = 1, 2, ..., n.
It follows that
Var(U*) = Var(Σ_{i=1}^n U(X_i; θ)) = Σ_{i=1}^n Var[U(X_i; θ)]   (the X_i's are independent)
 = n Var((∂/∂θ) ln f(X; θ))   (the X_i's are identically distributed)
 = n E[((∂/∂θ) ln f(X; θ))²]    (9.3.5)
since E[(∂/∂θ) ln f(X; θ)] = 0 by (9.3.2). Consequently it follows from (9.3.4) and (9.3.5) that
Var(T) ≥ [τ'(θ)]² / Var(U*) = [τ'(θ)]² / ( n E[((∂/∂θ) ln f(X; θ))²] ). ſ
From th.9.3.2a it follows that if T is an unbiased estimator of τ(θ) and if var(T) is equal to the right-hand side of (9.3.1), then T is the unbiased estimator with the smallest possible variance.
EXAMPLE 9.3.6 :
Suppose that X_1, X_2, ..., X_n is a random sample from a population with an exponential distribution with parameter θ. Then
f(x; θ) = (1/θ)e^{-x/θ} for x > 0 and 0 otherwise
and for x > 0 we have that
ln f(x; θ) = -ln(θ) - x/θ
i.e. (∂/∂θ) ln f(x; θ) = -1/θ + x/θ² = (x - θ)/θ².
Then
E[((∂/∂θ) ln f(X; θ))²] = E[(X - θ)²/θ⁴] = var(X)/θ⁴ = θ²/θ⁴ = 1/θ².
For τ(θ) = θ we have that τ'(θ) = 1 and then
CRLB = [τ'(θ)]² / ( n E[((∂/∂θ) ln f(X; θ))²] ) = 1/(n·(1/θ²)) = θ²/n.
Since var(X̄) = θ²/n from Example 9.3.5 it follows that X̄ is the unbiased estimator of θ with the smallest possible variance. Since this is true for all possible values of θ, X̄ is the uniformly minimum variance unbiased estimator of θ.
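The bound can also be checked by simulation. The following Python sketch (illustrative only; the Monte Carlo approximation of the expected squared score is ours) estimates the CRLB and the variance of X̄ for an exponential population.

import numpy as np

rng = np.random.default_rng(5)
theta, n = 2.0, 25
x = rng.exponential(scale=theta, size=500000)
score_sq = ((x - theta) / theta ** 2) ** 2            # ((d/dtheta) ln f(X; theta))^2
crlb = 1.0 / (n * score_sq.mean())                    # approximately theta^2/n = 0.16
xbar = rng.exponential(scale=theta, size=(200000, n)).mean(axis=1)
print(crlb, xbar.var())                               # var(X-bar) attains the bound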
JTheorem 9.3.2b :
Suppose that X_1, X_2, ..., X_n is a random sample from a population with density function f(x; θ). Let U* = Σ_{i=1}^n (∂/∂θ) ln{f(X_i; θ)}. If T is an unbiased estimator of τ(θ) with variance equal to the Cramer-Rao lower bound, then T must be some linear function of U* (or equivalently U* must be a linear function of T) with probability 1.
Proof :
The equality in (9.3.4) can only be true if [cor(T, U*)]² = 1, which implies that each is a linear function of the other with probability one. ſ
NOTE. When U* = aT + b, the constants a and b for which this is true will usually depend on θ, i.e. we should really write it as U* = a(θ)T + b(θ).
EXAMPLE 9.3.6a
Continuation of Example 9.3.6.
In this case
U* = Σ_{i=1}^n (∂/∂θ) ln{f(X_i; θ)} = Σ_{i=1}^n (X_i - θ)/θ² = (n/θ²)X̄ - n/θ = (n/θ²)(X̄ - θ)
and since X̄ is an unbiased estimator of θ, X̄ must be an unbiased estimator with variance equal to the CRLB. Note that X̄ is also the value of θ for which the derivative of the log-likelihood function is 0, i.e. X̄ is the maximum likelihood estimator of θ.
JTheorem 9.3.2c :
Suppose that T is an unbiased estimator of τ(θ) with variance equal to the CRLB. Suppose that T* is an unbiased estimator of τ*(θ), also with variance equal to the CRLB. Then τ*(θ) must be a linear function of τ(θ).
Proof :
From theorem 9.3.2b it follows that
U* = a(θ)T + b(θ) and also U* = c(θ)T* + d(θ)
where a(θ), b(θ), c(θ) and d(θ) depend only on θ and not on the X_i's, and T and T* only depend on the X_i's and in no way depend on θ. It then follows that
a(θ)T + b(θ) = c(θ)T* + d(θ)
i.e. T* = [a(θ)/c(θ)]T + [b(θ) - d(θ)]/c(θ).
Note that a(θ)/c(θ) only depends on θ and not on the X_i's. But since T* can only depend on the X_i's we must have that a(θ)/c(θ) is some constant, say e, that does not depend on θ or the X_i's. Similarly [b(θ) - d(θ)]/c(θ) must be some constant, say f, for which the same is true. Hence
T* = eT + f. But since E[T*] = τ*(θ) for all θ and E[T] = τ(θ) for all θ, and e and f are constants, it must be true that τ*(θ) = eτ(θ) + f for all θ. ſ
EXAMPLE 9.3.6b
Continuation of Example 9.3.6a.
Since X̄ is an unbiased estimator of θ with variance equal to the CRLB, there cannot exist unbiased estimators of 1/θ or e^θ with variance equal to the CRLB. This does not mean, however, that UMVUE estimators do not exist; we simply cannot use the Cramer-Rao lower bound to show that such estimators are UMVUE. We will pursue this problem of how to find UMVUE estimators further in Chapter 10.
NOTE. An estimator T of τ(θ) which is unbiased and has variance equal to the CRLB will be called a CRLB estimator of τ(θ).
JTheorem 9.3.2d :
If T is an unbiased estimator of τ(θ) such that U* = c(θ)T + d(θ), then T is a CRLB estimator of τ(θ) and
var(T) = CRLB = τ'(θ)/c(θ).
Proof :
Since T is an unbiased estimator of τ(θ) and U* is a linear function of T, T must be the CRLB estimator, i.e. var(T) = CRLB. Furthermore
var(U*) = {c(θ)}² var(T) = {c(θ)}² CRLB
i.e. [τ'(θ)]² / CRLB = {c(θ)}² CRLB   [since CRLB = [τ'(θ)]²/var(U*) from th.9.3.2a]
i.e. {CRLB}² = [τ'(θ)/c(θ)]²
i.e. var(T) = CRLB = τ'(θ)/c(θ). ſ
EXAMPLE 9.3.6c
Continuation of Example 9.3.6b.
In this case U* = (n/θ²)X̄ - n/θ and since X̄ is an unbiased estimator of θ it is the CRLB estimator of τ(θ) = θ and therefore
var(T) = τ'(θ)/c(θ) = 1/(n/θ²) = θ²/n
which is the CRLB obtained in Example 9.3.6.
JTheorem 9.3.2e :
Suppose that X_1, X_2, ..., X_n is a random sample from a population with density function f(x; θ). If T is the CRLB estimator of τ(θ), where τ is a one-to-one function, and if the maximum likelihood estimator θ̂ of θ is unique, then T is some function of θ̂.
Proof :
Note that U* is the derivative of the log of the likelihood function for θ, i.e. θ̂ is the value of θ for which U* = 0. If T is the CRLB estimator of τ(θ) then U* = c(θ)T + d(θ). The value of θ for which U* = 0 therefore depends on T, i.e. θ̂ is a function of T, say θ̂ = u(T), where u is some one-to-one function. The maximum likelihood estimator of τ(θ), say τ̂, is then given by τ̂ = τ(u(T)).
Since u and τ are both one-to-one functions, T = u^{-1}(τ^{-1}(τ̂)). ſ
EXAMPLE 9.3.6d
Continuation of Example 9.3.6c.
In this case the CRLB estimator of θ is X̄, which is also the maximum likelihood estimator of θ.
EXAMPLE 9.3.7 :
Suppose that X_1, X_2, ..., X_n are independent random variables with a geometric distribution with parameter θ. Let τ(θ) = 1/θ. Since E[X_i] = 1/θ it follows that X̄ = Σ_{i=1}^n X_i/n is an unbiased estimator of 1/θ, and since var(X_i) = (1 - θ)/θ² we have that var(X̄) = (1 - θ)/(nθ²).
We have that ln f(x; θ) = ln(θ) + (x - 1)ln(1 - θ)
and (∂/∂θ) ln f(x; θ) = 1/θ - (x - 1)/(1 - θ) = (1/θ - x)/(1 - θ)
and therefore
E[((∂/∂θ) ln f(X; θ))²] = E[(1/θ - X)²]/(1 - θ)² = var(X)/(1 - θ)²   (since E[X] = 1/θ)
 = [(1 - θ)/θ²] / (1 - θ)² = 1/[θ²(1 - θ)].
For τ(θ) = 1/θ we have τ'(θ) = -1/θ² and therefore
CRLB = (-1/θ²)² / ( n·1/[θ²(1 - θ)] ) = (1 - θ)/(nθ²)
which is equal to var(X̄), i.e. X̄ is the CRLB estimator of 1/θ.
We have that
U* = Σ_{i=1}^n (∂/∂θ) ln{f(X_i; θ)} = Σ_{i=1}^n [1/θ - (X_i - 1)/(1 - θ)]
 = n/θ - (Σ_{i=1}^n X_i - n)/(1 - θ)
 = n/θ + n/(1 - θ) - nX̄/(1 - θ)
 = n/[θ(1 - θ)] - nX̄/(1 - θ).
Since X̄ is an unbiased estimator of 1/θ and U* is a linear function of X̄, it also follows from this that X̄ is the CRLB estimator of 1/θ and that the variance of X̄ is equal to
τ'(θ)/c(θ) = (-1/θ²)/(-n/(1 - θ)) = (1 - θ)/(nθ²)
which is the value of the CRLB obtained above.
The maximum likelihood estimator of θ is that value for which U* = 0, i.e.
n/[θ(1 - θ)] - nX̄/(1 - θ) = [n/(θ(1 - θ))]{1 - θX̄} = 0
i.e. θ̂ = 1/X̄, and the maximum likelihood estimator of 1/θ is 1/(1/X̄) = X̄, i.e. in this case the maximum likelihood estimator is the same as the CRLB estimator. Note that θ is not a linear function of 1/θ and therefore cannot have a CRLB estimator; not even the maximum likelihood estimator 1/X̄ of θ is a CRLB estimator, and we do not know if it is even an unbiased estimator.
ƒDefinition 9.3.3 :
The relative efficiency of an unbiased estimator T of τ(θ) to another unbiased estimator T* of τ(θ) is defined by
re(T, T*) = Var(T*)/Var(T).
An unbiased estimator T* of τ(θ) is said to be efficient if re(T, T*) ≤ 1 for all unbiased estimators T of τ(θ) and for all possible values of θ.
The efficiency of an unbiased estimator T of τ(θ) is given by
e(T) = re(T, T*) if T* is an efficient estimator of τ(θ).
EXAMPLE 9.3.8 :
Continuation of Examples 9.3.4, 9.3.5 and 9.3.6.
Suppose that X_1, X_2, ..., X_n is a random sample from a population with an exponential distribution with parameter θ. Then X̄ is an unbiased estimator of θ with var(X̄) = θ²/n, and nY_1 is also an unbiased estimator with var(nY_1) = θ².
Therefore re(nY_1, X̄) = Var(X̄)/Var(nY_1) = (θ²/n)/θ² = 1/n
i.e. for large n the relative efficiency of nY_1 with respect to X̄ is very small.
From Example 9.3.6 we know that X̄ is the unbiased estimator of θ with the smallest possible variance for all values of θ, i.e. X̄ is an efficient estimator of θ.
Since X̄ is an efficient estimator of θ, the efficiency of nY_1 is given by
e(nY_1) = re(nY_1, X̄) = 1/n.
So far we have only considered unbiased estimators. It is possible to obtain an estimator T of τ(θ) which has a small bias but for which the difference T - τ(θ) tends to be very small. Such an estimator may be preferred to any possible unbiased estimator of τ(θ).
ƒDefinition 9.3.4 :
If T is an estimator of τ(θ), then the bias is given by
b(T) = E[T] - τ(θ)
and the mean squared error is given by
MSE(T) = E[{T - τ(θ)}²].
NOTE : If we call T - τ(θ) the "error" of the estimator, then MSE(T) = E[{"error"}²].
JTheorem 9.3.2 :
If T is an estimator of τ(θ), then
MSE(T) = var(T) + {b(T)}².
Proof :
MSE(T) = E[{T - τ(θ)}²]
 = E[{T - E[T] + E[T] - τ(θ)}²]
 = E[{T - E[T]}² + 2{E[T] - τ(θ)}{T - E[T]} + {E[T] - τ(θ)}²]
 = var(T) + {b(T)}²   since E[T - E[T]] = E[T] - E[T] = 0. ſ
EXAMPLE 9.3.9 :
Suppose that X_1, X_2, ..., X_n is a random sample from a population with density function given by
f(x; η) = e^{-(x-η)} for x > η and 0 otherwise.
Then
E[X] = ∫_η^∞ x e^{-(x-η)} dx   (let y = x - η, i.e. dy = dx)
 = ∫_0^∞ (y + η) e^{-y} dy
 = ∫_0^∞ y^{2-1} e^{-y} dy + η ∫_0^∞ y^{1-1} e^{-y} dy
 = Γ(2) + ηΓ(1) = 1 + η;
E[X²] = ∫_η^∞ x² e^{-(x-η)} dx   (let y = x - η, i.e. dy = dx)
 = ∫_0^∞ (y + η)² e^{-y} dy
 = ∫_0^∞ (y² + 2ηy + η²) e^{-y} dy
 = ∫_0^∞ y^{3-1} e^{-y} dy + 2η ∫_0^∞ y^{2-1} e^{-y} dy + η² ∫_0^∞ y^{1-1} e^{-y} dy
 = Γ(3) + 2ηΓ(2) + η²Γ(1) = 2 + 2η + η²;
var(X) = 2 + 2η + η² - (1 + η)² = 2 + 2η + η² - {1 + 2η + η²} = 1.
Hence E[X̄ - 1] = (1 + η) - 1 = η
i.e. X̄ - 1 is an unbiased estimator of η and
var(X̄ - 1) = var(X̄) = var(X)/n = 1/n.
Let F(x) be the distribution function for the population, i.e.
F(x) = 0 for x < η and F(x) = ∫_η^x e^{-(t-η)} dt for x ≥ η.
For x > η
F(x) = ∫_η^x e^{-(t-η)} dt   (let y = t - η, i.e. dy = dt)
 = ∫_0^{x-η} e^{-y} dy = -e^{-y} evaluated from 0 to x - η = 1 - e^{-(x-η)}.
Therefore
F(x) = 0 for x < η and F(x) = 1 - e^{-(x-η)} for x ≥ η.
Let Y_1 = min{X_1, X_2, ..., X_n}.
Then F_{Y_1}(y_1) = 1 - [1 - F(y_1)]^n = 1 - [1 - {1 - e^{-(y_1-η)}}]^n = 1 - e^{-n(y_1-η)} for y_1 ≥ η, and 0 for y_1 < η,
and f_{Y_1}(y_1) = n e^{-n(y_1-η)} for y_1 ≥ η, and 0 for y_1 < η.
Therefore
E[Y_1] = ∫_η^∞ y_1 n e^{-n(y_1-η)} dy_1   (let u = n(y_1 - η), i.e. du = n dy_1)
 = ∫_0^∞ (u/n + η) n e^{-u} (1/n) du
 = (1/n) ∫_0^∞ u^{2-1} e^{-u} du + η ∫_0^∞ u^{1-1} e^{-u} du
 = (1/n)Γ(2) + ηΓ(1) = 1/n + η.
If Y_1 were to be used as an estimator of η it would not be an unbiased estimator.
But
MSE(Y_1) = E[{Y_1 - η}²]
 = ∫_η^∞ {y_1 - η}² n e^{-n(y_1-η)} dy_1   (let u = n(y_1 - η), i.e. du = n dy_1)
 = ∫_0^∞ (u/n)² n e^{-u} (1/n) du
 = (1/n²) ∫_0^∞ u^{3-1} e^{-u} du = (1/n²)Γ(3) = 2/n².
Therefore MSE(X̄ - 1) = var(X̄) = 1/n ≥ 2/n² = MSE(Y_1) if n > 1.
Although Y_1 is not an unbiased estimator of η, its MSE is less than that of the unbiased estimator X̄ - 1.
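The comparison of the two mean squared errors can be illustrated by simulation, as in the following Python sketch (not part of the notes; data are simulated from the shifted exponential density with η = 5).

import numpy as np

rng = np.random.default_rng(6)
eta, n, reps = 5.0, 10, 200000
x = eta + rng.exponential(scale=1.0, size=(reps, n))
t_unbiased = x.mean(axis=1) - 1.0            # unbiased estimator X-bar - 1
t_min = x.min(axis=1)                        # biased estimator Y_1
print(np.mean((t_unbiased - eta) ** 2))      # approximately 1/n = 0.1
print(np.mean((t_min - eta) ** 2))           # approximately 2/n^2 = 0.02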
OEXERCISES :
Bain and Engelhardt p.330 No.’s 15, 17, 21, 28(a)-(d) and 33(a)-(d).
9.4 LARGE-SAMPLE PROPERTIES
Properties of estimators discussed so far were for fixed values of n. It turns out that although estimators may have undesirable properties for small n, they may have very good asymptotic properties as n tends to infinity and may therefore be useful estimators for large values of n.
ƒDefinition 9.4.1 :
Let {T_n} be a sequence of estimators of τ(θ), i.e. T_n is an estimator of τ(θ) if the sample size is n. These estimators are said to be consistent estimators of τ(θ) if for every ε > 0
lim_{n→∞} P[|T_n - τ(θ)| < ε] = 1 for every θ ∈ Ω.
For a consistent estimator we know that if the sample size is large there is a probability close to 1 that the value of the estimator will be very close to τ(θ), irrespective of the true value of θ.
NOTE that the sequence {T_n} of estimators of τ(θ) is consistent if T_n → τ(θ) in probability.
ƒDefinition 9.4.2 :
Let {T_n} be a sequence of estimators of τ(θ), i.e. T_n is an estimator of τ(θ) if the sample size is n. These estimators are said to be mean squared error consistent estimators of τ(θ) if
lim_{n→∞} E[{T_n - τ(θ)}²] = lim_{n→∞} MSE[T_n] = 0 for every θ ∈ Ω.
For mean squared error consistent estimators the expected value of the squared error in the estimate is very small for large n.
ƒDefinition 9.4.3 :
Let {T_n} be a sequence of estimators of τ(θ), i.e. T_n is an estimator of τ(θ) if the sample size is n. These estimators are said to be asymptotically unbiased if
lim_{n→∞} E[T_n] = τ(θ) for every θ ∈ Ω.
For asymptotically unbiased estimators the expected value of T_n is, for large n, very close to τ(θ), irrespective of the value of θ.
JTheorem 9.4.1 :
A sequence of estimators of τ(θ) is mean squared error consistent if and only if it is asymptotically unbiased and
lim_{n→∞} var(T_n) = 0.
Proof :
By theorem 9.3.2
MSE(T_n) = var(T_n) + [E(T_n) - τ(θ)]².    (9.4.1)
Therefore if {T_n} is asymptotically unbiased and lim_{n→∞} var(T_n) = 0, then lim_{n→∞} MSE[T_n] = 0 for every θ ∈ Ω, i.e. {T_n} is MSE consistent.
On the other hand, if {T_n} is MSE consistent then lim_{n→∞} MSE[T_n] = 0 and this can only happen if both terms on the right-hand side tend to 0, i.e. lim_{n→∞} var(T_n) = 0 and lim_{n→∞} E[T_n] = τ(θ), i.e. {T_n} is asymptotically unbiased.
EXAMPLE 9.4.1 :
Let X_1, X_2, ..., X_n be a random sample from a population with an exponential distribution with parameter θ. Let T_n = 1/X̄_n be used as an estimator of 1/θ. Note that X̄_n is the maximum likelihood estimator of θ and therefore 1/X̄_n is the maximum likelihood estimator of 1/θ. Let
Y_n = 2nX̄_n/θ = 2Σ_{i=1}^n X_i/θ ~ χ²(2n) by th.8.3.3, since Σ_{i=1}^n X_i ~ GAM(θ, n). Hence
T_n = (2n/θ)Y_n^{-1} where Y_n ~ χ²(2n). Therefore by th.8.3.2
E(T_n) = (2n/θ)·2^{-1}·Γ(n - 1)/Γ(n) = (n/(n - 1))·(1/θ)   (using r = -1 and Γ(n) = (n - 1)Γ(n - 1)).
Therefore T_n = 1/X̄_n is not an unbiased estimator of 1/θ for any finite value of n, but since lim_{n→∞} E(T_n) = 1/θ it is an asymptotically unbiased estimator of 1/θ.
Furthermore
E[T_n²] = (2n/θ)² E[Y_n^{-2}] = (4n²/θ²)·2^{-2}·Γ(n - 2)/Γ(n)   (using r = -2 in th.8.3.2)
 = n²/[θ²(n - 1)(n - 2)]   (since Γ(n) = (n - 1)(n - 2)Γ(n - 2))
and
var[T_n] = n²/[θ²(n - 1)(n - 2)] - [n/((n - 1)θ)]²
 = [n²/(θ²(n - 1))][1/(n - 2) - 1/(n - 1)]
 = n²/[(n - 1)²(n - 2)θ²]
i.e. lim_{n→∞} var[T_n] = 0 for all values of θ. From th.9.4.1 it then follows that T_n is MSE consistent for 1/θ.
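The behaviour of T_n = 1/X̄_n for increasing n can be illustrated as follows (a simulation sketch of our own; it is not part of the notes).

import numpy as np

rng = np.random.default_rng(7)
theta = 2.0                                              # true value of 1/theta is 0.5
for n in (10, 100, 1000):
    t = 1.0 / rng.exponential(scale=theta, size=(10000, n)).mean(axis=1)
    print(n, t.mean(), t.var())                          # bias and variance both shrink towards 0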
JTheorem 9.4.2 :
If a sequence {T_n} is mean squared error consistent for τ(θ), then it is also consistent for τ(θ).
Proof :
We have that
1 ≥ P[|T_n - τ(θ)| < ε] = P[{T_n - τ(θ)}² < ε²]
 ≥ 1 - E[{T_n - τ(θ)}²]/ε²   (by the Markov inequality)
 = 1 - MSE(T_n)/ε²
and since {T_n} is MSE consistent, lim_{n→∞} MSE(T_n) = 0 and therefore
lim_{n→∞} P[|T_n - τ(θ)| < ε] = 1 for any ε > 0, i.e. {T_n} is consistent. ſ
JTheorem 9.4.3 :
If {T_n} is consistent for τ(θ) and if g is everywhere continuous, then {g(T_n)} is consistent for g(τ(θ)).
Proof :
It follows from th.7.7.2 with T_n = Y_n and τ(θ) = c and the fact that being consistent is the same as convergence in probability. ſ
It follows from th.9.4.3 that if θ̂_n is consistent for θ and τ is a continuous function, then τ(θ̂_n) is consistent for τ(θ).
ƒDefinition 9.4.4 :
Let {T_n} and {T*_n} be two asymptotically unbiased sequences of estimators of τ(θ). The asymptotic relative efficiency of T_n relative to T*_n is defined by
are(T_n, T*_n) = lim_{n→∞} var(T*_n)/var(T_n).
The sequence {T*_n} is said to be asymptotically efficient if are(T_n, T*_n) ≤ 1 for all other asymptotically unbiased sequences {T_n}.
The asymptotic efficiency of an asymptotically unbiased sequence {T_n} is given by
ae(T_n) = are(T_n, T*_n) if {T*_n} is asymptotically efficient.
EXAMPLE 9.4.2 :
Suppose that X_1, X_2, ..., X_n is a random sample from a population with an exponential distribution with parameter θ. Then
f(x; θ) = (1/θ)e^{-x/θ}, ln f(x; θ) = -ln(θ) - x/θ
and (∂/∂θ) ln f(x; θ) = -1/θ + x/θ² = (x - θ)/θ²
and E[((∂/∂θ) ln f(X; θ))²] = E[(X - θ)²/θ⁴] = var(X)/θ⁴ = 1/θ².
If we let τ(θ) = 1/θ then τ'(θ) = -1/θ² and the CRLB for an unbiased estimator of 1/θ is given by
(1/θ⁴)/(n/θ²) = 1/(nθ²). From Example 9.4.1 we have that if T_n = 1/X̄_n then
E(T_n) = (n/(n - 1))·(1/θ) and var[T_n] = n²/[(n - 1)²(n - 2)θ²].
Then lim_{n→∞} CRLB/var(T_n) = 1, i.e. although the T_n's are not unbiased estimators, they are in the limit, and in the limit the variance of T_n is as small as the variance of any sequence of unbiased estimators.
ASYMPTOTIC PROPERTIES OF MLEs
Under certain conditions it can be shown that maximum likelihood estimators possess very desirable properties. Let θ̂_n be the solution of the maximum likelihood equations. Then under certain conditions
1. θ̂_n exists and is unique,
2. θ̂_n is consistent,
3. θ̂_n is asymptotically normal with asymptotic mean θ and asymptotic variance 1/{n E[((∂/∂θ) ln f(X; θ))²]},
4. θ̂_n is asymptotically efficient.
Note that the asymptotic variance is equal to the CRLB for an unbiased estimator of θ.
It also follows from th.7.7.6 that if τ is a function with non-zero derivative, then τ̂_n = τ(θ̂_n) is asymptotically normal with asymptotic mean τ(θ) and asymptotic variance [τ'(θ)]²·CRLB, which is the CRLB for an unbiased estimator of τ(θ).
EXAMPLE 9.4.3 :
Suppose that X_1, X_2, ..., X_n is a random sample from a population with an exponential distribution with parameter θ. Then X̄_n is the maximum likelihood estimator of θ. From the central limit theorem we know that asymptotically X̄_n has a normal distribution with expected value θ and asymptotic variance θ²/n. Since the CRLB for an unbiased estimator of θ is θ²/n, it shows that X̄_n is an efficient estimator of θ. From the properties of MLEs it also follows that X̄_n is asymptotically efficient. Now let
R(θ) = P[X > t] = e^{-t/θ} where t is some fixed number > 0. (R(t) is the reliability as defined in chapter 16.)
Then R̂_n = e^{-t/X̄_n} is the maximum likelihood estimator of R(θ) and by th.7.7.6 is asymptotically normally distributed with asymptotic expected value R(θ) and asymptotic variance
var(R̂_n) ≈ [(∂/∂θ)R(θ)]²·{θ²/n} = [e^{-t/θ}·t/θ²]²·{θ²/n}.
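The asymptotic variance given by th.7.7.6 can be compared with a Monte Carlo estimate, as in the following Python sketch (our own illustration; the values of θ, t and n are arbitrary choices).

import numpy as np

rng = np.random.default_rng(8)
theta, t, n, reps = 2.0, 1.0, 200, 50000
xbar = rng.exponential(scale=theta, size=(reps, n)).mean(axis=1)
R_hat = np.exp(-t / xbar)                                      # MLE of R(theta) = exp(-t/theta)
asympt_var = (np.exp(-t / theta) * t / theta ** 2) ** 2 * theta ** 2 / n
print(R_hat.var(), asympt_var)                                 # the two agree for large n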
9.5 BAYES AND MINIMAX ESTIMATORS
Let T be an estimator of τ(θ). The difference between the value of τ(θ) and the value of the estimator T is referred to as the "error". This error may have certain consequences for a person making use of the estimate instead of the value of τ(θ). These consequences, if they can be measured in some way by a non-negative real number, are referred to as the loss suffered by using the estimate.
ƒDefinition 9.5.1 :
If T is an estimator of τ(θ), then a loss function is any real-valued function L(t; θ) such that
L(t; θ) ≥ 0 for all values of t
and
L(t; θ) = 0 if t = τ(θ).
NOTE that L(t; θ) = {t - τ(θ)}² satisfies the conditions to be a loss function.
ƒDefinition 9.5.2 :
The risk function is defined as R_T(θ) = E[L(T; θ)], i.e. the risk function is the expected value of the loss using T as an estimate of τ(θ).
NOTE that the value of the risk function depends on the value of θ.
NOTE that if the loss function is {t - τ(θ)}² then the risk function is the MSE.
If one is faced with estimating a parameter or a function of a parameter, one way to proceed could be to choose an appropriate loss function for the particular problem and then try to make the expected loss as small as possible for all values of the parameter.
ƒDefinition 9.5.3 :
An estimator T_1 is a better estimator than T_2 if and only if
R_{T_1}(θ) ≤ R_{T_2}(θ) for all θ ∈ Ω
and
R_{T_1}(θ) < R_{T_2}(θ) for at least one θ ∈ Ω.
An estimator T is admissible if and only if there is no better estimator.
We will only consider admissible estimators, since if an estimator is not admissible there is some estimator with risk less than or equal to it for all values of θ and definitely better for some values of θ.
Typically an estimator will have smaller risk for some θ and bigger risk for other values of θ than some other estimator.
ƒDefinition 9.5.4 :
An estimator T_1 is a minimax estimator if
max_θ R_{T_1}(θ) ≤ max_θ R_T(θ) for every estimator T, i.e.
max_θ R_{T_1}(θ) = min_T max_θ R_T(θ).
One will use a minimax estimator if one wants to make the worst possible expected loss as small as possible.
It can happen that a minimax estimator has a slightly smaller maximum expected loss than another estimator but has much bigger expected losses for almost all other values of θ than the other estimator. To avoid this pitfall one may consider an average value of the expected loss. In statistics we usually calculate the expected value of a random variable using the probability distribution of the random variable. To calculate such an average loss one will need some "probability distribution" for the values of θ. The fact that we use such a distribution to determine the average expected loss does not mean that the parameter becomes a random variable, the value of which is determined by the outcome of some experiment. Let p be some density function that has high values for values of θ which are considered very likely and small values for values of θ considered very unlikely. If no such prior knowledge of θ exists one may consider using a uniform distribution.
ƒDefinition 9.5.5 :
For a random sample from a population with distribution f(x; θ), the Bayes risk of an estimator T relative to a risk function R_T(θ) and density function p(θ) is the average risk with respect to p(θ), namely
E_θ[R_T(θ)] = ∫_Ω R_T(θ) p(θ) dθ.
ƒDefinition 9.5.6 :
For a random sample from a population with distribution f(x; θ), the Bayes estimator T* relative to the risk function R_T(θ) and distribution p(θ) is the estimator with minimum expected risk, i.e.
E_θ[R_{T*}(θ)] ≤ E_θ[R_T(θ)] for every estimator T.
NOTE that the distribution p(θ) is used to calculate the average risk. This is an additional assumption about the parameter θ and is a tool that can be used to choose between two estimators. Hopefully a researcher will have prior knowledge about the likely values of the parameter θ to help in choosing the function p(θ). The distribution p(θ) is referred to as the prior distribution.
ƒDefinition 9.5.7 :
The conditional density of θ given the sample observations x = (x_1, x_2, ..., x_n) is called the posterior density and is given by
f_{θ|X}(θ) = f(x_1, x_2, ..., x_n|θ) p(θ) / ∫_{-∞}^{∞} f(x_1, x_2, ..., x_n|θ) p(θ) dθ.
NOTE in the above definition, if X_1, X_2, ..., X_n and θ were all ordinary random variables, f(x_1, x_2, ..., x_n|θ)p(θ) would be the joint density function of all the random variables.
The Bayes estimator is the estimator that minimises the average risk over θ, namely
E_θ[R_T(θ)] = E_θ{E_{X|θ}[L(T; θ)]} = E_X{E_{θ|X}[L(T; θ)]}.
If for all values of x we use for T, which is a function of x, that value which minimises E_{θ|X=x}[L(T; θ)] for that particular x, then E_X{E_{θ|X}[L(T; θ)]} will be minimised.
JTheorem 9.5.1 :
If X_1, X_2, ..., X_n is a random sample from a population with distribution f(x; θ), then the Bayes estimator is the estimator that minimises the expected loss relative to the posterior distribution of θ given x, E_{θ|X}[L(T; θ)].
Proof : Follows from the remark before the theorem.
JTheorem 9.5.2 :
The Bayes estimator, T, of τ(θ) under the squared error loss function
L(T; θ) = [T - τ(θ)]²
is the conditional expected value of τ(θ) relative to the posterior distribution,
T = E_{θ|X}[τ(θ)] = ∫_{-∞}^{∞} τ(θ) f_{θ|X}(θ) dθ.
Proof :
For any random variable X the quantity g(c) = E[{X - c}²] is minimised if c = E[X], since
g(c) = E[X² - 2cX + c²] = E[X²] - 2cE[X] + c² and
(d/dc) g(c) = -2E[X] + 2c, which is equal to 0 if c = E[X].
Therefore E_{θ|X}[L(T; θ)] = ∫_{-∞}^{∞} {T(x) - τ(θ)}² f_{θ|X}(θ) dθ is minimised if
T(x) = ∫_{-∞}^{∞} τ(θ) f_{θ|X}(θ) dθ.
JTheorem 9.5.3 :
The Bayes estimator, T, of θ under the absolute error loss function L(T; θ) = |T - θ| is the median of the posterior distribution f_{θ|X}(θ).
Proof :
For any random variable X with density function f(x) the quantity g(c) = E[|X - c|] is minimised if c is the median of X, since
g(c) = ∫_{-∞}^{c} (c - x) f(x) dx + ∫_{c}^{∞} (x - c) f(x) dx
 = cF(c) - ∫_{-∞}^{c} x f(x) dx + ∫_{c}^{∞} x f(x) dx - c{1 - F(c)}
and
(d/dc) g(c) = F(c) + cf(c) - cf(c) - cf(c) - {1 - F(c)} + cf(c) = 2F(c) - 1
which is equal to 0 if F(c) = 1/2, i.e. c is the median of X.
Therefore E_{θ|X}[L(T; θ)] = ∫_{-∞}^{∞} |T(x) - θ| f_{θ|X}(θ) dθ is minimised if T(x) is the median of f_{θ|X}(θ).
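As an illustration of Theorems 9.5.2 and 9.5.3 (a sketch of our own; the Beta(a, b) prior for a Bernoulli parameter and the closed-form posterior mean (a + Σx_i)/(a + b + n) are standard results not derived in these notes), the following Python fragment computes the posterior mean and posterior median numerically.

import numpy as np

rng = np.random.default_rng(9)
a, b, n = 2.0, 3.0, 20                        # assumed Beta(a, b) prior and sample size
x = rng.binomial(1, 0.3, size=n)
s = x.sum()

theta = np.linspace(1e-6, 1 - 1e-6, 20001)
w = theta[1] - theta[0]
post = theta ** (a - 1 + s) * (1 - theta) ** (b - 1 + n - s)    # prior times likelihood
post /= post.sum() * w                                          # normalised posterior density
post_mean = (theta * post).sum() * w                            # Bayes estimate under squared error loss
post_median = theta[np.searchsorted(np.cumsum(post) * w, 0.5)]  # Bayes estimate under absolute error loss
print(post_mean, (a + s) / (a + b + n), post_median)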
CHAPTER 10 : SUFFICIENCY AND COMPLETENESS
10.1 INTRODUCTION
Suppose that X 1 , X 2 , T , X n are jointly distributed random variables and that their joint
distribution depends on certain parameters. The observed values of the random variables
contain information about the distribution of the random variables and therefore contains
information on the values of the parameters which we can use to estimate the values of
the parameters. The likelihood function gives us information on the relation between the
observed values and the parameters since it depends on the observed values and the value
of the parameters.
A question that arises is whether or not it is really necessary to know all the different observed values, or whether it is possible that all the information about the parameters is contained in a few properties (functions) of the observed values.
EXAMPLE 10.1.1
Let X_1, X_2, ..., X_n be n independent Bernoulli random variables with probability of success θ. Let x_1, x_2, ..., x_n be the observed values. Then the likelihood function for θ, given the observed values, is given by
L(θ) = θ^{Σx_i}(1 - θ)^{n-Σx_i}.
The value of this function can be determined without knowing the individual x_i's; we only need to know the sum of the x_i's.
Intuitively we therefore feel that all the information in the sample about the value of θ is contained in the sum of the observations. To formalise this, let us determine the conditional distribution of the x_i's given the sum of the x_i's.
Let S = Σ_{i=1}^n X_i. Then the conditional distribution of the X_i's given S is as follows:
f_{X_1,...,X_n|S}(x_1, ..., x_n|s) = P[X_1 = x_1, ..., X_n = x_n | S = s]
 = P[X_1 = x_1, ..., X_n = x_n, S = s]/P[S = s]
 = P[X_1 = x_1, ..., X_n = x_n]/P[S = s] if Σx_i = s, and 0 if Σx_i ≠ s
 = θ^{Σx_i}(1 - θ)^{n-Σx_i} / [ C(n, s) θ^s(1 - θ)^{n-s} ] if Σx_i = s, and 0 otherwise
 = 1/C(n, s) if Σx_i = s, and 0 if Σx_i ≠ s   (where C(n, s) denotes the binomial coefficient)
i.e. it does not depend on θ and therefore contains no information about θ.
Now let T = t(X_1, X_2, ..., X_n) and let C_t = {(x_1, x_2, ..., x_n) | t(x_1, x_2, ..., x_n) = t}.
Then the conditional distribution of T given S is
f_{T|S}(t | s) = P[T = t | S = s] = Σ_{C_t} f_{X_1,...,X_n|S}(x_1, ..., x_n|s)
which does not depend on θ. Neither the individual x_i's nor any function of them contains any information about θ if Σ_{i=1}^n X_i is known.
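The calculation can be mimicked by simulation, as in the following Python sketch (our own illustration): for two different values of θ the conditional relative frequency of X_1 = 1 given S = s is essentially the same, reflecting that the conditional distribution given S does not involve θ.

import numpy as np

rng = np.random.default_rng(10)
n, s, reps = 5, 2, 400000
for theta in (0.2, 0.7):
    x = rng.binomial(1, theta, size=(reps, n))
    keep = x[x.sum(axis=1) == s]                 # condition on S = s
    print(theta, keep[:, 0].mean())              # P[X_1 = 1 | S = 2] is about s/n = 0.4 for both values of theta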
10.2 SUFFICIENT STATISTICS
Let X ŸX 1 , X 2 , T , X n be a vector of random variables with joint distribution fŸx; 2
where x Ÿx 1 , x 2 , T , x n and 2 Ÿ2 1 , 2 2 , T , 2 r is a vector of parameters. Let S be a
vector of k statistics say
S ŸS 1 , S 2 , . . . , S k Ÿs 1 ŸX 1 , X 2 , T , X n , s 2 ŸX 1 , X 2 , T , X n , . . . , s k ŸX 1 , X 2 , T , X n . We
will assume that the functions s 1 , s 2 , . . . , s k are such that no one of them can be expressed
in terms of the other functions.
278
[Chapter 10]
ƒDefinition 10.2.1 :
Let X be a vector of random variables with joint distribution
fŸx; 2 and let S ŸS 1 , S 2 , . . . , S k be a k-dimensional statistic. Then S 1 , S 2 , . . . , S k is a
set of jointly sufficient statistics for 2 if for any other vector of statistics T, the
conditional distribution of T given S, denoted by f T|S , does not depend on 2 .
In the case of a single parameter 2 and a single statistic S, we simply say that S is
sufficient for 2.
The vector statistic S will be jointly sufficient for 2 if any other vector statistic T
contains no additional information about 2 if we already have the information contained
in S .
When we consider ”other” statistics T we have in mind that they are functions of the
X i ’s, say T i t i ŸX 1 , X 2 , T , X n , i 1, 2, T , u, which cannot be expressed in terms of
each other or in terms of the functions that define S 1 , S 2 , . . . , S k .
One problem with definition 10.2.1 is that we have to consider all possible vector
statistics T and have to determine the conditional distribution of T given S . The latter
may be fairly difficult to do.
In the case of discrete random variables we can first get the conditional distribution of X
given S as follows :
P¡X 1 x 1 , X 2 x 2 , T , X n x n | S 1 s 1 , S 2 s 2 , T , S k s k ¢
P¡X 1 x 1 , X 2 x 2 , T , X n x n , S 1 s 1 , S 2 s 2 , T , S k s k ¢
P¡S 1 s 1 , S 2 s 2 , T , S k s k ¢
P¡X 1 x 1 , X 2 x 2 , T , X n x n ¢
P¡S 1 s 1 , S 2 s 2 , T , S k s k ¢
if s i Ÿx 1 , x 2 , T , x n s i i
0
otherwise
f X Ÿx; 2
f S Ÿs; 2
if s i Ÿx 1 , x 2 , T , x n s i i
0
otherwise
279
[Chapter 10]
Now let C t £Ÿx 1 , x 2 , T , x n |t i Ÿx 1 , x 2 , T , x n t i for i 1, 2, . . . , q¤.
Then
P¡T 1 t 1 , T 2 t 2 , . . . , T q t q |S 1 s 1 , S 2 s 2 , . . . , S k s k ¢
! P¡X 1 x 1 , X 2 x 2 , T , X n x n |S 1 s 1 , S 2 s 2 , T , S k s k ¢.
Ct
f X Ÿx; 2
does not depend on 2 it will follow from the above that the
f S Ÿs; 2
conditional distribution of T given S is independent of 2 for all possible statistics T.
(NOTE: To show a similar result for the continuous case involves some technical
difficulties e.g. the transformation from n variables to n q variables.)
If it turns out that
Consider a random sample X 1 , X 2 , T , X n from a population with density function
fŸx; 2 . Let S 1 , S 2 , . . . , S n be the order statistics. Then
f S Ÿs; 2 n!fŸs 1 ; 2 fŸs 2 ; 2 . . . . fŸs n ; 2
0
And therefore
for s 1 s 2 . . . , s n
otherwise
f X Ÿx; 2 /f S Ÿs; 2 if s i Ÿx 1 , x 2 , T , x n s i i
fŸx 1 ; 2 fŸx 2 ; 2 T fŸx n ; 2
if s i Ÿx 1 , x 2 , T , x n s i i
n!fŸs 1 ; 2 fŸs 2 ; 2 T fŸs n ; 2
1 since the x i ’s is just the s i ’s in some order.
n!
This does not depend on 2 and this then means that the order statistics are jointly
sufficient for 2 .
JTheorem 10.2.1 :
Suppose that X ŸX 1 , X 2 , T , X n is a random vector with joint distribution fŸx; 2 and
that S ŸS 1 , S 2 , . . . , S k , k n, is a vector of statistics. Then S is jointly sufficient for 2
if and only if there exist functions g and h such that
fŸx 1 , x 2 , T , x n ; 2 gŸs; 2 hŸx 1 , x 2 , T , x n
where gŸs; 2 depend on x 1 , x 2 , T , x n only through s and hŸx 1 , x 2 , T , x n in no way
involves 2.
280
[Chapter 10]
Proof :
Let S 1
u 1 ŸX 1 , X 2 , T , X n
S2
u 2 ŸX 1 , X 2 , T , X n
..............................
Sk
u k ŸX 1 , X 2 , T , X n
Let T 1 , T 2 , T , T n"k be other statistics such that
T1
v 1 ŸX 1 , X 2 , T , X n
T2
v 2 ŸX 1 , X 2 , T , X n
..............................
T n"k v n"k ŸX 1 , X 2 , T , X n
S
S
say
qŸX .
T
T
We will assume that R n can be subdivided into regions such that over each one of these
regions the transformation is one-to-one and that on the i-th region we have that
S
X q "1
. We then have that
i
T
s
s
Ji x v
f S,T Ÿs, t; 2 ! f q "1
i
t
t
i
This gives us a transformation from X to
! gŸs; 2 h q "1
i
i
gŸs; 2 ! h q "1
i
i
. .
and f S Ÿs; 2
s
t
Ji x v
s
t
s
t
Ji x v
s
t
.
gŸs; 2 ; ; T ; ! h q "1
i
". ".
". i
s
t
Ji x v
s
t
dt
gŸs; 2 mŸs
where mŸs does not depend on 2 since not one of the functions h, q "1
i or J i depend on
2.
f Ÿs, t; 2
Therefore f T|S Ÿt; s S,T
f S Ÿs; 2
! h q "1
i
i
s
t
Ji x v
s
t
mŸs
which is independent of 2. The marginal distribution of any subset of the T ’s will also
be independent of 2. Thus the conditional distribution of any other statistic T given S is
independent of 2 and therefore S is jointly sufficient for 2. ſ
281
[Chapter 10]
NOTE: If S is jointly sufficient for 2 and if S ' uŸS is a one-to-one transformation
from S to S ' , then S ' will be jointly sufficient. For any given s ' there is a unique
value of s such that s ' uŸs i.e. knowing that S ' s ' is equivalent to knowing that
S s. Therefore f T|S ' Ÿt|s ' f T|S Ÿt|s which is independent of 2.
EXAMPLE 10.2.2 :
Consider a random sample X_1, X_2, ..., X_n where X_i ~ BIN(1, θ). Let
S = Σ_{i=1}^n X_i and s = Σ_{i=1}^n x_i. Then
f(x_1, x_2, ..., x_n; θ) = θ^{Σx_i}(1 - θ)^{n-Σx_i} = θ^s(1 - θ)^{n-s} = g(s; θ)·1
i.e. h(x_1, x_2, ..., x_n) = 1 if all x_i are 0 or 1, and 0 otherwise, which does not depend on θ.
From th.10.2.1 it follows that S = Σ_{i=1}^n X_i is sufficient for θ.
It may happen that a set of statistics S_1, S_2, ..., S_k is jointly sufficient for θ but that all the information is really contained in only some of the S_i's. In Example 10.2.2 above, if we let s_1 = Σ_{i=1}^n x_i and s_2 = x_1, we can write the joint distribution as
f(x_1, x_2, ..., x_n; θ) = θ^{Σx_i}(1 - θ)^{n-Σx_i} = [θ^{s_1}(1 - θ)^{n-s_1}·s_2^0]·1 = g(s_1, s_2; θ)·h(x_1, x_2, ..., x_n)
i.e. S_1 and S_2 are jointly sufficient for θ. However, since S_1 is sufficient, if we know s_1 then s_2 does not contain any extra information about θ.
ƒDefinition 10.2.2 :
A set of statistics is called a minimal sufficient set if the members of the set are jointly sufficient for the parameters and if they are a function of every other set of jointly sufficient statistics.
In the example above s_1 can be determined if you know both s_1 and s_2, but the converse is not true, i.e. both s_1 and s_2 cannot be determined if you only know s_1.
282
[Chapter 10]
EXAMPLE 10.2.3 :
Consider a random sample X 1 , X 2 , T , X n from a population which is uniformly
distributed over the interval Ÿ0, 2 where 2 is unknown. Let X 1:n and X n:n be the
smallest and largest values respectively. Then the joint distribution is given by
fŸx 1 , x 2 , T , x n ; 2
1/2 n
if 0 x i 2 for i 1, 2, . . . , n
0
otherwise
1/2 n
if 0 x 1:n and x n:n 2
0
otherwise
gŸs; 2 hŸx 1 , x 2 , T , x n
where gŸs; 2
and hŸx 1 , x 2 , T , x n
1/2 n
if s x n:n 2
0
otherwise
1
if x 1:n 0
0
otherwise.
It then follows from the factorisation criterion that X_{n:n} is a sufficient statistic for θ.
ƒDefinition 10.2.3 :
If A is a set then the indicator function of A, denoted by I A , is defined as
1
if x A
I A Ÿx 0
if x A.
Using indicator functions we can rewrite Example 10.2.3 as follows:
fŸx; ș £1/ș¤I Ÿ0,ș Ÿx ,
n
fŸx 1 , x 2 , T , x n ; 2 fŸx i ; 2
n
£1/2 n ¤ I Ÿ0,2 Ÿx i
i1
i1
£1/2 n ¤
I Ÿ0,2 Ÿx n:n I Ÿ0,. Ÿx 1:n
¡£1/2 n ¤ I Ÿ0,2 Ÿx n:n ¢ hŸx 1 , x 2 , T , x n .
283
[Chapter 10]
EXAMPLES :
Example 10.2.4
Let X 1 , X 2 , T , X n be a random sample from a population which is normally distributed
with parameters 6 and @ 2 which are both unknown. Then
n
fŸx i ; 6, @ 2
fŸx 1 , x 2 , T , x n ; 6, @ 2
i1
Ÿ2=
Ÿ2=
1n
2
1n
2
@n
@n
e
e
"
1
2@ 2
% ni1 Ÿx i "6
"
1
2@ 2
¡% ni1 x 2i "26% ni1 x i n6 2 ¢
2
.
n
n
i1
i1
This expression only depends on the x i ’s through s 1 ! x i and s 2 ! x 2i i.e.
fŸx 1 , x 2 , T , x n ; 6, @ 2 gŸs 1 , s 2 ; 6, @ 2 hŸx 1 , x 2 , T , x n where
" 1 ¡s "26s "n6 2 ¢
1n
e 2@ 2 2 1
and hŸx 1 , x 2 , T , x n q 1.
gŸs 1 , s 2 ; 6, @ 2 n
Ÿ2= 2 @
n
n
i1
i1
It follows from the factorisation criterion that S 1 ! X i and S 2 ! X 2i are jointly
sufficient statistics for 6 and @ 2 . The maximum likelihood estimators of 6 and @ 2 are
n
§
6 X S 1 /n and @ 2 !ŸX i " X 2 /n S 2 /n " ŸS 1 /n 2 . This is a one-to-one
L
L
i1
transformation of S 1 and S 2 and are therefore also jointly sufficient for 6 and @ 2 .
Example 10.2.5
Consider a random sample X 1 , X 2 , T , X n from a population which is uniformly
distributed over the interval Ÿ2, 2 1 where 2 is unknown. Let X 1:n and X n:n be the
smallest and largest values respectively. Then the joint distribution is given by
n
fŸx 1 , x 2 , T , x n ; 2 fŸx i ; 2
i1
n
I Ÿ2,21 Ÿx i
i1
I Ÿ2,. Ÿx 1:n I Ÿ".,21 Ÿx n:n
which shows that X_{1:n} and X_{n:n} are jointly sufficient for θ. (It can be shown that they are minimal sufficient, i.e. we would lose information if only one of them is known.)
284
[Chapter 10]
10.3 FURTHER PROPERTIES OF SUFFICIENT STATISTICS
JTheorem 10.3.1 :
§
If S ŸS 1 , S 2 , T , S k is sufficient for ș and if ș is a unique maximum likelihood
L
§
estimator of ș , then ș must be a function of S.
L
Proof :
If S is sufficient then by the factorisation criterion
LŸș fŸx 1 , x 2 , T , x n ; ș gŸs; ș hŸx 1 , x 2 , T , x n
which means that to maximise LŸș it is necessary to maximise gŸs; ș i.e. the value of
ș for which the maximum is obtained depends on s. If the maximum likelihood estimator
is unique this defines a function of s. ſ
§ §
§
NOTE: If the maximum likelihood estimators 2 1 , 2 2 , T , 2 k are unique and sufficient
L
L
L
they will be minimal sufficient since the factorisation criterion applies to all sets of
sufficient statistics and therefore also to minimal sufficient statistics.
NOTE: Example 10.3.1 in Bain and Engelhardt shows that if the maximum likelihood
estimates are not unique it is possible to find maximum likelihood estimators which are
not functions of sufficient statistics.
NOTE: If S is a sufficient statistic with a known distribution, the distribution of S can
be used to find the maximum likelihood estimate - gŸs; 2 is proportional to the
distribution function of S.
285
[Chapter 10]
JTheorem 10.3.2 :
If S is sufficient for ș, then any Bayes estimator of ș will be a function of S.
Proof :
The posterior distribution of 2 given x is given by
fŸx 1 , x 2 , T , x n ; 2 pŸ2
gŸs; 2 hŸx 1 , x 2 , T , x n pŸ2
f 2|x Ÿ2 ; fŸx 1 , x 2 , T , x n ; 2 pŸ2 d2
; gŸs; 2 hŸx 1 , x 2 , T , x n pŸ2 d2
!
!
gŸs; 2 pŸ2
; gŸs; 2 pŸ2 d2
!
which depends only on s and 2. Hence E 2|x ¡LŸT; 2 ¢ only depends on T and s and
therefore the value of T for which it is a minimum only depends on s i.e. is a function
of s. By theorem 9.5.1 the Bayes estimate depends only on s. ſ
JTheorem 10.3.3 :
If X 1 , X 2 , T , X n is a random sample from a population with a continuous distribution
with density function fŸx; ș , then the order statistics are sufficient for ș.
Proof :
For fixed x 1:n , x 2:n , . . . , x n:n and associated x 1 , x 2 , T , x n the conditional density function
of x 1 , x 2 , T , x n given x 1:n , x 2:n , T , x n:n is given by
fŸx 1 ; 2 fŸx 2 ; 2 T fŸx n ; 2
n!fŸx 1:n ; 2 fŸx 2:n ; 2 T fŸx n:n ; 2
1
if x 1:n minŸx 1 , x 2 , T , x n , . . , x n:n maxŸx 1 , x 2 , T , x n
n!
0
otherwise
i.e. the conditional distribution does not depend on ș which implies the sufficiency of the
order statistics. ſ
286
[Chapter 10]
JTheorem 10.3.4 : The Rao-Blackwell theorem
Let X 1 , X 2 , T , X n be jointly distributed random variables with joint density function
fŸx 1 , x 2 , T , x n ; 2 and let S be a vector of jointly sufficient statistics for 2. If T is any
unbiased estimator of AŸ2 and if T ' E¡T|S¢, then
1. T ' is a function of S ,
2. T ' is an unbiased estimator of AŸ2 , and
3. VarŸT ' t varŸT for all 2 and varŸT ' varŸT for some 2
unless T ' T with probability 1.
Proof :
Since S is jointly sufficient the conditional distribution of T given S does not depend on
2 . Therefore t ' Ÿs E¡T|s¢ does not depend on 2. Therefore T ' E¡T | S¢ is a statistic
that depends on S and does not depend on 2.
Furthermore
E¡T ' ¢ E S ¡T ' ¢ since T ' is a function of S
E S ¡E¡T|S¢¢
E¡T¢ by theorem 5.4.1
AŸ2 for all values of 2 since T is an unbiased estimator of AŸ2
is an unbiased estimator of IJŸș .
i.e.
From theorem 5.4.3 we have that
varŸT var¡E¡T|S¢¢ E¡varŸT|S ¢
T'
u var¡E¡T|S¢¢ varŸT '
and equality holds if and only if E¡varŸT|S ¢ 0 which will be true if and only if
varŸT|S 0 with probability 1 (since a variance is always u 0) and varŸT|S can only
be zero if T is equal to its expected value i.e. T ' with probability 1 ſ
NOTE: From the Rao-Blackwell theorem it follows that if any unbiased estimator T
exists and if S is jointly sufficient, there will exist a function of S that is an unbiased
estimator namely E¡T | S¢. This unbiased estimator which is a function of S will also
have a smaller variance unless it is already a function of S . To look for an UMVUE
estimator we can therefore restrict our search to sufficient statistics. If there exist a
unique function of sufficient statistics which is unbiased it will be the UMVUE estimator.
The problem of uniqueness will be discussed in the next section.
287
[Chapter 10]
10.4 COMPLETENESS AND THE EXPONENTIAL CLASS
ƒDefinition 10.4.1 : COMPLETENESS
A family ( i.e. set ) of density functions £f T Ÿt; 2 ; 2 (¤ is called complete if
E¡uŸT ¢ 0 for all 2 ( implies uŸT 0 with probability 1 for all 2 (.
NOTE: If the functions u_1(T) and u_2(T) of T are both unbiased estimators of τ(θ),
then E[u_1(T) - u_2(T)] = τ(θ) - τ(θ) = 0 for all values of θ. In the case of completeness this would imply that u_1(T) - u_2(T) = 0 with probability 1, i.e. u_1(T) = u_2(T) with probability 1, i.e. for all practical purposes the two functions are exactly the same, i.e. there is a unique function of T that is an unbiased estimator of τ(θ).
NOTE: The Rao-Blackwell theorem implies that what we are essentially interested in is
whether or not the family of distributions of a set of sufficient statistics are complete or
not.
NOTE: A sufficient statistic, the density of which is a member of a complete family of
density functions, is referred to as a complete sufficient statistic.
JTheorem 10.4.1 LEHMANN-SCHEFFE
Let X 1 , X 2 , T , X n be random variables with joint density function fŸx 1 , x 2 , T , x n ; 2 and
let S be a vector of jointly complete sufficient statistics for 2 . If T ' t ' ŸS is a statistic
that is unbiased for AŸ2 , then T ' is the UMVUE of AŸ2 .
Proof :
From the completeness it follows that any statistic that is a function of S and an unbiased
estimator of AŸ2 , must be equal to T ' with probability 1. If T is any other statistic that
is an unbiased estimator of AŸ2 , then by the Rao-Blackwell theorem E¡T | S¢ is also an
unbiased estimator of AŸ2 and a function of S, so by uniqueness T ' E¡T | S¢ with
probability 1.
Furthermore varŸT ' t varŸT for all 2 . Thus T ' is the UMVUE of AŸ2
ſ
288
[Chapter 10]
EXAMPLE 10.4.1 :
Let X_1, X_2, ..., X_n be a random sample from a population which has a Poisson distribution with parameter μ. Then
f(x_1, x_2, ..., x_n; μ) = e^{-nμ} μ^{Σx_i} / Π(x_i!).
By the factorization criterion S = ΣX_i is a sufficient statistic. The distribution of S is a Poisson distribution with parameter nμ. The expected value of any function u of S is given by
E[u(S)] = Σ_{s=0}^{∞} u(s) e^{-nμ}(nμ)^s/s! = e^{-nμ} Σ_{s=0}^{∞} [u(s)n^s/s!] μ^s.
This will be 0 for all μ > 0 only if Σ_{s=0}^{∞} [u(s)n^s/s!] μ^s = 0 for all μ > 0, since e^{-nμ} ≠ 0 for any μ > 0. This power series in μ will be equal to 0 for all μ > 0 if and only if all the coefficients of μ^s are 0, i.e. u(s)n^s/s! = 0 for all s, i.e. u(s) = 0 for all s, i.e. u(S) is 0 with probability 1. Therefore the family of distributions for S is complete.
Since E[S/n] = nμ/n = μ for all values of μ, S/n is the UMVUE of μ. The variance of S/n is nμ/n² = μ/n.
To determine the Cramer-Rao lower bound for an unbiased estimator of μ we use
f(x; μ) = e^{-μ}μ^x/x!
ln f(x; μ) = -μ + x ln(μ) - ln(x!)
(∂/∂μ) ln f(x; μ) = -1 + x/μ = (x - μ)/μ
E[((∂/∂μ) ln f(X; μ))²] = E[(X - μ)²/μ²] = var(X)/μ² = μ/μ² = 1/μ
and therefore CRLB = 1/(n·(1/μ)) = μ/n.
In this case the variance of the UMVUE of μ is actually equal to the CRLB.
Note that Σ_{i=1}^n (∂/∂μ) ln f(X_i; μ) = (S - nμ)/μ = (n/μ)(S/n - μ)
which is a linear function of S/n, which is an unbiased estimator of μ. From this we can also conclude that var(S/n) is equal to the CRLB for an unbiased estimator of μ.
We also have that E[{S/n}²] = var(S/n) + {E[S/n]}² = μ/n + μ², i.e. {S/n}² is a function of the complete sufficient statistic S and is an unbiased estimator of μ/n + μ². From the Lehmann-Scheffe theorem it then follows that {S/n}² is the UMVUE of μ/n + μ². Since Σ_{i=1}^n (∂/∂μ) ln f(X_i; μ) is not a linear function of {S/n}², the variance of this UMVUE is not equal to the Cramer-Rao lower bound.
289
[Chapter 10]
To find the UMVUE of any function AŸ6 it is only necessary to find some function of S
that is an unbiased estimator. If there is a difficulty finding an unbiased estimator of AŸ6
which is a function of S, first find any function T of X 1 , X 2 , T , X n which is an
unbiased estimator and then determine E¡T | S¢.
In any particular case it may still be a substantial problem whether or not a family of
density functions is complete.
ƒDefinition 10.4.2 : REGULAR EXPONENTIAL CLASS (REC)
A density function is said to be a member of a regular exponential class if it can be
expressed in the form
cŸ2 hŸx e % j1 q j Ÿ2 t j Ÿx
for x A
0
otherwise
k
fŸx; 2 where 2 Ÿș 1 , ș 2 , T , ș k is a vector of k unknown parameters and the parameter space
has the form
ȍ £2 | a i t ș i t b i , i 1, 2, . . . , k¤ ( a i ". and b i . permissible )
and if it satisfies the regularity conditions 1, 2 and 3a or 3b given by :
1. The set A £x|fŸx; 2 0¤ does not depend on 2.
2.
The functions q j Ÿ2 are non-trivial, functionally independent,
continuous functions of the 2 i .
3a. For a continuous random variable the derivatives t Uj Ÿx are linearly
independent continuous functions of x over A.
3b. For a discrete random variable the t j Ÿx are non-trivial functions
of x on A and none is a linear function of the others.
For convenience we will write that fŸx; 2 is a member of RECŸq 1 , q 2 , . . . , q k or simply
REC.
NOTE: The notion of REC can be extended to the case where X is a vector.
JTheorem 10.4.2a
A REC of density functions is complete if t_j(x) = x for all j.
Proof: We will not prove this theorem.
JTheorem 10.4.2b
Suppose that X1, X2, ..., Xn is a random sample from a population with a density function that is a member of a REC of density functions. The statistics
S1 = Σ_{i=1}^n t1(Xi), S2 = Σ_{i=1}^n t2(Xi), ..., Sk = Σ_{i=1}^n tk(Xi)
form a set of sufficient statistics for θ = (θ1, θ2, ..., θk).
Proof :
The joint density function of X1, X2, ..., Xn is given by
f(x1, x2, ..., xn; θ) = Π_{i=1}^n c(θ) h(xi) exp{ Σ_{j=1}^k q_j(θ) t_j(xi) }
                      = { Π_{i=1}^n h(xi) } {c(θ)}^n exp{ Σ_{j=1}^k q_j(θ) [ Σ_{i=1}^n t_j(xi) ] }
so that it follows from the factorization criterion that S1, S2, ..., Sk is a set of sufficient statistics. ſ
JTheorem 10.4.2
Suppose that X1, X2, ..., Xn is a random sample from a population with a density function that is a member of a REC of density functions. The set of sufficient statistics S1, S2, ..., Sk has a joint distribution which is a member of a REC that is complete.
Proof : No proof will be given.
An unbiased estimator of τ(θ), the variance of which is equal to the Cramer-Rao lower bound, will be referred to as a CRLB estimator.
EXAMPLES :
Example 10.4.2
Suppose that X is a random variable with a Bernoulli distribution with parameter p. Then
f(x; p) = p^x (1 − p)^{1−x} = (1 − p){p/(1 − p)}^x = (1 − p) e^{x ln[p/(1−p)]}  for x ∈ A = {0, 1}
and 0 < p < 1, i.e. the Bernoulli density function is a member of the regular exponential class with c(p) = 1 − p, q(p) = ln[p/(1 − p)] and t(x) = x, i.e. it is also a complete class of distributions.
Example 10.4.3
Let X1, X2, ..., Xn be independent random variables all with a Bernoulli distribution with parameter p. From Example 10.4.2 it follows that the distribution of the Xi's belongs to a REC with t1(xi) = xi, i.e. S1 = Σ_{i=1}^n Xi is a single complete sufficient statistic for p (th.10.4.2b). Let X̄ = S1/n. Then E[X̄] = p and from the Lehmann-Scheffe theorem it then follows that X̄ is the UMVUE of p with var(X̄) = p(1 − p)/n.
To determine the CRLB we note that :
f(x; p) = p^x (1 − p)^{1−x}
ln f(x; p) = x ln p + (1 − x) ln(1 − p)
∂/∂p ln f(x; p) = x/p − (1 − x)/(1 − p) = (x − p)/[p(1 − p)]
E[{∂/∂p ln f(X; p)}²] = var(X)/[p(1 − p)]² = p(1 − p)/[p(1 − p)]² = 1/[p(1 − p)]
and therefore CRLB = 1/( n × 1/[p(1 − p)] ) = p(1 − p)/n = var(X̄)
i.e. X̄ is the CRLB estimator of p.
From th.9.3.2c it follows that no CRLB estimator of p(1 − p) exists. But
E[ (n/(n − 1)) X̄(1 − X̄) ] = (n/(n − 1)) ( p − E[X̄²] )
                           = (n/(n − 1)) ( p − var(X̄) − {E[X̄]}² )
                           = (n/(n − 1)) ( p − p(1 − p)/n − p² )
                           = (n/(n − 1)) p(1 − p)(1 − 1/n)
                           = p(1 − p)
i.e. (n/(n − 1)) X̄(1 − X̄) is an unbiased estimator and a function of the complete sufficient statistic S1, i.e. from the Lehmann-Scheffe theorem it follows that it is the UMVUE estimator of p(1 − p).
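As a quick numerical check of this last claim, the following sketch (assuming Python with numpy is available and using the arbitrary values p = 0.3 and n = 12) estimates the expected value of (n/(n − 1)) X̄(1 − X̄) by simulation and compares it with p(1 − p).

import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 0.3, 12, 200_000                      # arbitrary illustrative values
x = rng.binomial(1, p, size=(reps, n))             # reps Bernoulli samples of size n
xbar = x.mean(axis=1)
estimator = n / (n - 1) * xbar * (1 - xbar)        # the UMVUE of p(1-p) derived above
print(estimator.mean(), p * (1 - p))               # empirical mean vs p(1-p)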
Example 10.4.4
Suppose that X is a random variable with a normal density function with parameters μ and σ². Then
f(x; μ, σ²) = (1/(√(2π) σ)) e^{−(x−μ)²/(2σ²)} = (1/(√(2π) σ)) exp{ −x²/(2σ²) + xμ/σ² − μ²/(2σ²) }
which is a member of a REC with t1(x) = x² and t2(x) = x. The statistics
S1 = Σ_{i=1}^n Xi² and S2 = Σ_{i=1}^n Xi
are therefore a set of complete sufficient statistics for μ and σ², i.e. any unbiased estimator of any function of μ and σ² which is a function of S1 and S2 will therefore be a UMVUE.
We state the next two theorems without proof.
JTheorem 10.4.3
If a CRLB estimator T exists for τ(θ), then a single sufficient statistic exists, and T is a function of the sufficient statistic. Conversely, if a single sufficient statistic exists and the CRLB exists, then a CRLB estimator exists for some τ(θ).
JTheorem 10.4.4
If the CRLB exists, then a CRLB estimator will exist for some function τ(θ) if and only if the density function is a member of the REC. Furthermore the CRLB estimator of τ(θ) will be τ(θ̂) where θ̂ is the maximum likelihood estimator of θ.
CHAPTER 11 : INTERVAL ESTIMATION
11.1 INTRODUCTION
Although point estimation is very important, it is necessary to get an indication of the
accuracy of the estimate. The MSE of the estimate gives an indication of the error. In
most cases, however, the MSE will depend on the parameters of the distribution. A
possibility is to estimate the MSE using estimates of the parameter to get some indication
of the possible error in the estimate of the parameter.
A more satisfactory solution is to find a set of values of θ, depending on the observed values of X1, X2, ..., Xn, which is a subset of the set of possible values of θ, say Ω, and which is such that there is a high probability that this set will include the value, referred to as the true value, of θ.
ƒDefinition 11.1.1 :
Suppose that X1, X2, ..., Xn are jointly distributed random variables with joint distribution f(x1, x2, ..., xn; θ). Let Ω be the set of all possible values of θ.
For all possible observed values x1, x2, ..., xn of X1, X2, ..., Xn let R(x1, x2, ..., xn) ⊂ Ω.
Let θ0 be the true value of θ.
If P[θ0 ∈ R(X1, X2, ..., Xn) | θ0] = γ for all θ0 ∈ Ω
then R(x1, x2, ..., xn) is called a 100γ% confidence region for θ.
NOTE : For some observed values x1, x2, ..., xn the region R(x1, x2, ..., xn) will include θ0 and for some observed values R(x1, x2, ..., xn) will not include θ0. The probability that the region will include θ0 must be γ for all possible values of θ0 for the region to be a 100γ% confidence region.
In the rest of this chapter we will consider methods to find such confidence regions. We would like to get confidence regions which are as small as possible. This means that we should use statistics with good properties. Such statistics are discussed in Chapter 10 of Bain and Engelhardt. In general it is true that if such statistics exist, they are functions of the maximum likelihood estimators of the parameters.
Confidence regions based on maximum likelihood estimators will therefore be used whenever possible.
11.2 CONFIDENCE INTERVALS
ƒDefinition 11.2.1 :
Suppose that X1, X2, ..., Xn are jointly distributed random variables with joint distribution f(x1, x2, ..., xn; θ). Let Ω be the set of all possible values of θ.
Let l : Rⁿ → R and u : Rⁿ → R be two functions such that l(x1, x2, ..., xn) < u(x1, x2, ..., xn) for all values of x1, x2, ..., xn.
If θ0 is the true value of θ and
P[ l(X1, X2, ..., Xn) < θ0 < u(X1, X2, ..., Xn) | θ0 ] = γ for all θ0 ∈ Ω
then the interval R(x1, x2, ..., xn) = ( l(x1, x2, ..., xn), u(x1, x2, ..., xn) ) is called a 100γ% confidence interval for θ.
For different sets of observed values x1, x2, ..., xn we will get different intervals. Some will include θ and some will not include θ.
For any observed values x1, x2, ..., xn of X1, X2, ..., Xn the values of l(x1, x2, ..., xn) and u(x1, x2, ..., xn) are called the lower and upper limits of the confidence interval respectively. The quantity γ is called the confidence level.
EXAMPLE 11.2.1 :
Suppose that X1, X2, ..., Xn is a random sample from a population with an exponential distribution with parameter θ.
In this case the Xi's are independent GAM(θ, 1) random variables, i.e. Σ_{i=1}^n Xi ~ GAM(θ, n) by Example 6.4.2. From th.8.3.3 it then follows that
2 Σ_{i=1}^n Xi / θ ~ χ²(2n)
i.e. if X̄ is the sample mean, then 2nX̄/θ ~ χ²(2n). Hence
0.95 = P[ χ²_{.025}(2n) ≤ 2nX̄/θ ≤ χ²_{.975}(2n) | θ ]
     = P[ 1/χ²_{.975}(2n) ≤ θ/(2nX̄) ≤ 1/χ²_{.025}(2n) | θ ]
     = P[ 2nX̄/χ²_{.975}(2n) ≤ θ ≤ 2nX̄/χ²_{.025}(2n) | θ ]
i.e. if x1, x2, ..., xn are the observed values of X1, X2, ..., Xn then the interval from 2nx̄/χ²_{.975}(2n) to 2nx̄/χ²_{.025}(2n) is a 95% confidence interval for θ. Note that this interval is based on x̄ which is the maximum likelihood estimate of θ.
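For concreteness, this 95% interval can be computed with the χ² percentage points. A sketch only (it assumes Python with numpy and scipy is available; the data below are hypothetical):

import numpy as np
from scipy import stats

x = np.array([1.2, 0.4, 2.7, 0.9, 1.6, 3.1, 0.2, 1.1])   # hypothetical observed sample
n = len(x)
total = 2 * x.sum()                                       # 2*sum(x_i) = 2*n*xbar
lower = total / stats.chi2.ppf(0.975, 2 * n)              # 2n xbar / chi2_.975(2n)
upper = total / stats.chi2.ppf(0.025, 2 * n)              # 2n xbar / chi2_.025(2n)
print(lower, upper)                                       # 95% confidence interval for theta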
ƒDefinition 11.2.2 :
Suppose that X1, X2, ..., Xn are jointly distributed random variables with joint distribution f(x1, x2, ..., xn; θ). Let Ω be the set of all possible values of θ.
1. If l : Rⁿ → R is such that
P[ l(X1, X2, ..., Xn) < θ | θ ] = γ for all θ ∈ Ω
then l(x1, x2, ..., xn) is called a one-sided lower 100γ% confidence limit for θ.
2. If u : Rⁿ → R is such that
P[ u(X1, X2, ..., Xn) > θ | θ ] = γ for all θ ∈ Ω
then u(x1, x2, ..., xn) is called a one-sided upper 100γ% confidence limit for θ.
EXAMPLES :
Example 11.2.2
Continuation of Example 11.2.1.
Suppose that X1, X2, ..., Xn is a random sample from a population with an exponential distribution with parameter θ. Then, if X̄ is the sample mean, 2nX̄/θ ~ χ²(2n). Hence
0.95 = P[ 2nX̄/θ ≤ χ²_{.95}(2n) | θ ] = P[ θ ≥ 2nX̄/χ²_{.95}(2n) | θ ]
i.e. if x1, x2, ..., xn are the observed values of X1, X2, ..., Xn then 2nx̄/χ²_{.95}(2n) is a lower 95% confidence limit for θ. Similarly
0.95 = P[ 2nX̄/θ ≥ χ²_{.05}(2n) | θ ] = P[ θ ≤ 2nX̄/χ²_{.05}(2n) | θ ]
i.e. if x1, x2, ..., xn are the observed values of X1, X2, ..., Xn then 2nx̄/χ²_{.05}(2n) is an upper 95% confidence limit for θ.
These examples show that lower and upper confidence limits for θ can usually be found quite easily once we have found a confidence interval for θ. We will therefore concentrate on finding confidence intervals for θ.
The problem still remains how to find such confidence intervals.
Example 11.2.3
Suppose that X1, X2, ..., Xn is a random sample from a population which is normally distributed with expected value μ and variance σ², where μ is unknown and σ² is known. The maximum likelihood estimate of μ is the sample mean X̄ - see Example 9.2.6. Then X̄ is normally distributed with expected value μ and variance σ²/n - th.8.3.2a. Then Y = (X̄ − μ)/(σ/√n) is a standard normal random variable - th.3.3.4.
Hence
1 − α = P[ −z_{1−α/2} < (X̄ − μ)/(σ/√n) < z_{1−α/2} | (μ, σ²) ]
      = P[ −z_{1−α/2} σ/√n < X̄ − μ < z_{1−α/2} σ/√n | (μ, σ²) ]
      = P[ X̄ − z_{1−α/2} σ/√n < μ < X̄ + z_{1−α/2} σ/√n | (μ, σ²) ]
i.e. x̄ − z_{1−α/2} σ/√n to x̄ + z_{1−α/2} σ/√n is a 100(1 − α)% confidence interval for μ.
Note that it is only possible to calculate the limits of the interval if σ² is known.
In the above examples we managed to find confidence limits using the two quantities 2nX̄/θ ~ χ²(2n) and (X̄ − μ)/(σ/√n) ~ N(0, 1). The problem still remains how to find such quantities in general. In the next two sections we study two methods of obtaining confidence intervals.
11.3 PIVOTAL QUANTITY METHOD
ƒDefinition 11.3.1 :
Suppose that Q = q(X1, X2, ..., Xn; θ) is a quantity that is a function of only X1, X2, ..., Xn and θ.
Then Q is called a pivotal quantity if its distribution does not depend on θ or any other unknown parameters.
In Example 11.2.1 the quantity 2nX̄/θ is a pivotal quantity since it is a function only of X̄ = (1/n) Σ_{i=1}^n Xi and θ, and its distribution is χ²(2n) which does not depend on θ. In Example 11.2.3, (X̄ − μ)/(σ/√n) is a pivotal quantity since it is a function only of X̄ = (1/n) Σ_{i=1}^n Xi and μ (remember σ²/n is assumed known) and its distribution is N(0, 1) which does not depend on μ.
NOTE : The distribution of a pivotal quantity q(X1, X2, ..., Xn; θ) is independent of θ only if θ is the true value of the parameter. For example if Xi ~ EXP(5) then 2nX̄/5 ~ χ²(2n) and if Xi ~ EXP(10) then 2nX̄/10 ~ χ²(2n).
Suppose that Q = q(X1, X2, ..., Xn; θ) is a pivotal quantity. Let q_γ be the value for which γ = P[ q(X1, X2, ..., Xn; θ) ≤ q_γ | θ ]. Note that this probability is true if θ is the true value of the parameter. Note also that since the distribution of Q does not depend on θ or any other unknown parameters, the value of q_γ does not depend on θ and therefore it can be determined without knowing the value of θ.
JTheorem 11.3.1a :
Suppose that X1, X2, ..., Xn are jointly distributed random variables with joint distribution f(x1, x2, ..., xn; θ) and that Q = q(X1, X2, ..., Xn; θ) is a pivotal quantity.
Given the observed values x1, x2, ..., xn of X1, X2, ..., Xn, let
R(x1, x2, ..., xn) = { θ | θ ∈ Ω, q_{α/2} ≤ q(x1, x2, ..., xn; θ) ≤ q_{1−α/2} }.
Then R(x1, x2, ..., xn) is a 100(1 − α)% confidence region for θ.
Proof :
Let θ0 be the true value of θ. Then
1 − α = P[ q_{α/2} ≤ q(X1, X2, ..., Xn; θ0) ≤ q_{1−α/2} | θ0 ] = P[ θ0 ∈ R(X1, X2, ..., Xn) | θ0 ], and this is true for all θ0 ∈ Ω
i.e. R(x1, x2, ..., xn) is a 100(1 − α)% confidence region for θ.
ſ
JTheorem 11.3.1b :
Suppose that X1, X2, ..., Xn is a random sample from a population which is normally distributed with expected value μ and variance σ², where μ and σ² are unknown. Then
Q = (X̄ − μ)/(S/√n) ~ t(n − 1), i.e. it is a pivotal quantity for μ, and
x̄ − t_{1−α/2}(n − 1) s/√n to x̄ + t_{1−α/2}(n − 1) s/√n is a 100(1 − α)% confidence interval for μ.
Proof :
1 − α = P[ −t_{1−α/2}(n − 1) < (X̄ − μ)/(S/√n) < t_{1−α/2}(n − 1) | (μ, σ²) ]
      = P[ −t_{1−α/2}(n − 1) S/√n < X̄ − μ < t_{1−α/2}(n − 1) S/√n | (μ, σ²) ]
      = P[ X̄ − t_{1−α/2}(n − 1) S/√n < μ < X̄ + t_{1−α/2}(n − 1) S/√n | (μ, σ²) ]
i.e. x̄ − t_{1−α/2}(n − 1) s/√n to x̄ + t_{1−α/2}(n − 1) s/√n is a 100(1 − α)% confidence interval for μ.
ſ
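A small sketch of this interval (it assumes Python with numpy and scipy is available; the observations below are hypothetical):

import numpy as np
from scipy import stats

x = np.array([4.8, 5.6, 5.1, 4.3, 5.9, 5.0, 4.7])   # hypothetical observed sample
n, alpha = len(x), 0.05
xbar, s = x.mean(), x.std(ddof=1)                    # sample mean and sample standard deviation
tcrit = stats.t.ppf(1 - alpha / 2, n - 1)            # t_{1-alpha/2}(n-1)
print(xbar - tcrit * s / np.sqrt(n), xbar + tcrit * s / np.sqrt(n))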
JTheorem 11.3.1c :
Suppose that X1, X2, ..., Xn is a random sample from a population which is normally distributed with expected value μ and variance σ², where μ and σ² are unknown.
Then Q = (n − 1)S²/σ² ~ χ²(n − 1), i.e. it is a pivotal quantity for σ², and
(n − 1)s²/χ²_{1−α/2}(n − 1) to (n − 1)s²/χ²_{α/2}(n − 1) is a 100(1 − α)% confidence interval for σ².
Proof :
We have that
1 − α = P[ χ²_{α/2}(n − 1) ≤ (n − 1)S²/σ² ≤ χ²_{1−α/2}(n − 1) | (μ, σ²) ]
      = P[ 1/χ²_{1−α/2}(n − 1) ≤ σ²/((n − 1)S²) ≤ 1/χ²_{α/2}(n − 1) | (μ, σ²) ]
      = P[ (n − 1)S²/χ²_{1−α/2}(n − 1) ≤ σ² ≤ (n − 1)S²/χ²_{α/2}(n − 1) | (μ, σ²) ]
i.e. (n − 1)s²/χ²_{1−α/2}(n − 1) to (n − 1)s²/χ²_{α/2}(n − 1) is a 100(1 − α)% confidence interval for σ².
ſ
In general it is true that any pivotal quantity can be used to construct confidence intervals.
To obtain confidence intervals with good properties, it is usually best to find pivotal
quantities based on maximum likelihood estimators of θ.
If it is not possible to find a pivotal quantity for a particular problem, the result in the
following theorem can be used.
JTheorem 11.3.1d :
Suppose that X1, X2, ..., Xn is a random sample from a population with density function f(x; θ) and distribution function F(x; θ). Then
Q = −2 Σ_{i=1}^n ln F(Xi; θ) ~ χ²(2n)    (11.3.1)
i.e. Q is a pivotal quantity for θ.
Proof :
The random variable F(Xi; θ) is uniformly distributed over the interval (0, 1) and therefore −ln F(Xi; θ) is exponentially distributed with parameter 1, i.e. it is a gamma random variable with parameters 1 and 1. Hence
Σ_{i=1}^n { −ln F(Xi; θ) } ~ GAM(1, n)
i.e.
2 Σ_{i=1}^n { −ln F(Xi; θ) } ~ χ²(2n).
ſ
NOTE : If X ~ UNIF(0, 1), then 1 − X ~ UNIF(0, 1), i.e. if F(Xi; θ) ~ UNIF(0, 1) then 1 − F(Xi; θ) ~ UNIF(0, 1) and therefore
−2 Σ_{i=1}^n ln{ 1 − F(Xi; θ) } ~ χ²(2n)    (11.3.2)
EXAMPLE 11.3.1 :
Suppose that X1, X2, ..., Xn is a random sample from a population with a Pareto density function with parameters 1 and κ, i.e.
f(x; κ) = κ(1 + x)^{−κ−1} for x > 0, and 0 otherwise
and therefore
F(x; κ) = ∫_{−∞}^x f(t; κ) dt = 0 for x ≤ 0, and for x > 0
F(x; κ) = ∫_0^x κ(1 + t)^{−κ−1} dt = [ −(1 + t)^{−κ} ]_0^x = 1 − (1 + x)^{−κ}.
Using (11.3.2), and since all Xi > 0, we have that
Q = −2 Σ_{i=1}^n ln{ 1 − F(Xi; κ) } = −2 Σ_{i=1}^n { −κ ln(1 + Xi) } = 2κ Σ_{i=1}^n ln(1 + Xi) ~ χ²(2n).
Now using th.11.3.1a we get, if x1, x2, ..., xn are the observed values of X1, X2, ..., Xn, that
R(x1, x2, ..., xn) = { κ | χ²_{α/2}(2n) ≤ 2κ Σ_{i=1}^n ln(1 + xi) ≤ χ²_{1−α/2}(2n) }
                   = { κ | χ²_{α/2}(2n) / (2 Σ_{i=1}^n ln(1 + xi)) ≤ κ ≤ χ²_{1−α/2}(2n) / (2 Σ_{i=1}^n ln(1 + xi)) }
which gives a 100(1 − α)% confidence interval for κ.
In Example 11.3.1 it was very easy to determine the confidence interval for κ. In most cases, however, it is not that simple to determine the confidence interval for θ using the pivotal quantities (11.3.1) or (11.3.2).
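A sketch of the interval of Example 11.3.1 (it assumes Python with numpy and scipy is available; the sample below is hypothetical):

import numpy as np
from scipy import stats

x = np.array([0.8, 2.4, 0.3, 1.7, 5.2, 0.6, 1.1, 0.9])   # hypothetical Pareto(1, kappa) data
n, alpha = len(x), 0.05
t = 2 * np.log1p(x).sum()                                 # 2*sum(ln(1 + x_i)); kappa*t ~ chi2(2n)
lower = stats.chi2.ppf(alpha / 2, 2 * n) / t
upper = stats.chi2.ppf(1 - alpha / 2, 2 * n) / t
print(lower, upper)                                       # 95% confidence interval for kappa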
11.4 GENERAL METHOD
In many cases a suitable pivotal quantity, i.e. a quantity Q = q(X1, X2, ..., Xn; θ) whose distribution does not depend on θ, is not available. In general it is much more likely that we can find a statistic S = s(X1, X2, ..., Xn) such that the distribution of S depends on θ. In such cases the following theorem can be used to construct confidence regions.
JTheorem 11.4.1a :
Suppose that X1, X2, ..., Xn are jointly distributed random variables with joint distribution f(x1, x2, ..., xn; θ).
Also suppose that S = s(X1, X2, ..., Xn) is a statistic with distribution g(s; θ). Let h1(θ) and h2(θ) be such that P[ h1(θ) ≤ S ≤ h2(θ) | θ ] = 1 − α.
Given any observed value s of S, let R(s) = { θ | h1(θ) ≤ s ≤ h2(θ) }.
Then R(s) is a 100(1 − α)% confidence region for θ.
Proof :
Let θ0 be the true value of θ. Then
1 − α = P[ h1(θ0) ≤ S ≤ h2(θ0) | θ0 ] = P[ θ0 ∈ R(S) | θ0 ] for all possible values of θ0
i.e. R(s) is a 100(1 − α)% confidence region for θ. ſ
NOTE: Suppose that G(s; θ) is the distribution function of S.
Suppose further that h1(θ) is such that P[S ≤ h1(θ)] = α/2, i.e. G(h1(θ); θ) = α/2.
Also suppose that h2(θ) is such that P[S ≤ h2(θ)] = 1 − α/2, i.e. G(h2(θ); θ) = 1 − α/2.
Then
P[ h1(θ) ≤ S ≤ h2(θ) | θ ] = G(h2(θ); θ) − G(h1(θ); θ) = (1 − α/2) − α/2 = 1 − α.
From th.11.4.1a it then follows that
R(s) = { θ | h1(θ) ≤ s ≤ h2(θ) }
is a 100(1 − α)% confidence region for θ.
EXAMPLES :
Example 11.4.1:
Suppose that X1, X2, ..., Xn is a random sample from a population with density function
f(x; θ) = (1/θ²) e^{−(x−θ)/θ²} for x > θ, and 0 otherwise.
Let Ω = { θ | θ > 0 }.
Then F(x; θ) = 0 for x ≤ θ and for x > θ we have that
F(x; θ) = ∫_θ^x (1/θ²) e^{−(t−θ)/θ²} dt = [ −e^{−(t−θ)/θ²} ]_θ^x = 1 − e^{−(x−θ)/θ²}.
Note that F(∞; θ) = 1.
Let Y1 = min{X1, X2, ..., Xn}. Then
F_{Y1}(y1; θ) = 1 − [1 − F(y1; θ)]ⁿ = 0 if y1 ≤ θ, and = 1 − e^{−n(y1−θ)/θ²} if y1 > θ.
Let h1(θ) be such that F_{Y1}(h1(θ); θ) = 0.05, i.e.
0.05 = 1 − e^{−n(h1(θ)−θ)/θ²}
i.e. e^{−n(h1(θ)−θ)/θ²} = 0.95
i.e. −n(h1(θ) − θ)/θ² = ln 0.95
i.e. h1(θ) = θ − (1/n)(ln 0.95)θ² = θ + (1/n)(0.0513)θ².
Let h2(θ) be such that F_{Y1}(h2(θ); θ) = 0.95, i.e.
0.95 = 1 − e^{−n(h2(θ)−θ)/θ²}
i.e. e^{−n(h2(θ)−θ)/θ²} = 0.05
i.e. −n(h2(θ) − θ)/θ² = ln 0.05
i.e. h2(θ) = θ − (1/n)(ln 0.05)θ² = θ + (1/n)(2.996)θ².
[Figure: the curves h1(θ) and h2(θ) plotted against θ, showing how l(y1) and u(y1) are read off for an observed value y1.]
For any value of θ we have that P[ h1(θ) ≤ Y1 ≤ h2(θ) ] = 0.95 − 0.05 = 0.9.
For any observed value y1 of Y1 we have that
R(y1) = { θ | h1(θ) ≤ y1 ≤ h2(θ) } = { θ | l(y1) ≤ θ ≤ u(y1) }.
Now suppose that n = 10 and that y1 = 2.5 is the observed value of Y1.
The value of l(y1) is such that h2(l(y1)) = y1, i.e.
l(y1) + (1/n)(2.996)[l(y1)]² = y1
i.e. l(2.5) + (1/10)(2.996)[l(2.5)]² = 2.5
i.e. 0.2996[l(2.5)]² + l(2.5) − 2.5 = 0
i.e. l(2.5) = ( −1 + √(1² − 4 × 0.2996 × (−2.5)) ) / (2 × 0.2996) = ( −1 + 1.999 ) / 0.5992 = 1.667 since θ > 0.
The value of u(y1) is such that h1(u(y1)) = y1, i.e.
u(y1) + (1/n)(0.0513)[u(y1)]² = y1
i.e. u(2.5) + (1/10)(0.0513)[u(2.5)]² = 2.5
i.e. 0.00513[u(2.5)]² + u(2.5) − 2.5 = 0
i.e. u(2.5) = ( −1 + √(1² − 4 × 0.00513 × (−2.5)) ) / (2 × 0.00513) = ( −1 + 1.0253 ) / 0.01026 = 2.466 since θ > 0
i.e. the interval (1.667, 2.466) is a 90% confidence interval for θ if n = 10 and the observed value of Y1 is 2.5.
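The two quadratic equations above are easy to solve numerically. A sketch (it assumes Python with numpy is available) that reproduces the limits for n = 10 and y1 = 2.5; the coefficients 0.2996 and 0.00513 are simply −ln(0.05)/n and −ln(0.95)/n:

import numpy as np

n, y1 = 10, 2.5
c_lower = -np.log(0.05) / n        # coefficient in h2(theta) = theta + c_lower*theta**2
c_upper = -np.log(0.95) / n        # coefficient in h1(theta) = theta + c_upper*theta**2

def positive_root(c, y):
    # positive solution of c*theta**2 + theta - y = 0
    return (-1 + np.sqrt(1 + 4 * c * y)) / (2 * c)

l = positive_root(c_lower, y1)     # solves h2(l) = y1
u = positive_root(c_upper, y1)     # solves h1(u) = y1
print(l, u)                        # about 1.667 and 2.47 (2.466 above comes from rounding the square root)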
Example 11.4.2:
Suppose that X1, X2, ..., Xn is a random sample from a population with an exponential density function with parameter θ. Then Xi ~ GAM(θ, 1) and S = Σ_{i=1}^n Xi ~ GAM(θ, n) and therefore 2S/θ ~ χ²(2n).
Let h1(θ) and h2(θ) be such that
P[ S ≤ h1(θ) | θ ] = α/2 and P[ S ≤ h2(θ) | θ ] = 1 − α/2.
Then P[ h1(θ) ≤ S ≤ h2(θ) | θ ] = (1 − α/2) − α/2 = 1 − α.
Then
α/2 = P[ S ≤ h1(θ) | θ ] = P[ 2S/θ ≤ 2h1(θ)/θ | θ ] = P[ χ²(2n) ≤ 2h1(θ)/θ ]
i.e. 2h1(θ)/θ = χ²_{α/2}(2n), i.e. h1(θ) = χ²_{α/2}(2n) θ/2.
Similarly
1 − α/2 = P[ S ≤ h2(θ) | θ ] = P[ 2S/θ ≤ 2h2(θ)/θ | θ ] = P[ χ²(2n) ≤ 2h2(θ)/θ ]
i.e. 2h2(θ)/θ = χ²_{1−α/2}(2n), i.e. h2(θ) = χ²_{1−α/2}(2n) θ/2.
Now suppose that s is the observed value of S and let l(s) and u(s) be the values such that h2(l(s)) = s and h1(u(s)) = s.
Then R(s) = { θ | h1(θ) ≤ s ≤ h2(θ) } = { θ | l(s) ≤ θ ≤ u(s) }
as can be seen graphically as follows.
[Figure: the straight lines h1(θ) and h2(θ) plotted against θ, showing l(s) and u(s) for an observed value s.]
Since h2(l(s)) = s we have that χ²_{1−α/2}(2n) l(s)/2 = s, i.e. l(s) = 2s/χ²_{1−α/2}(2n), and since h1(u(s)) = s we have that u(s) = 2s/χ²_{α/2}(2n).
The interval ( 2s/χ²_{1−α/2}(2n), 2s/χ²_{α/2}(2n) ) is therefore a 100(1 − α)% confidence interval for θ if s is the observed value of S.
In the above theorems and examples we assumed that it is possible to find the functions
h 1 Ÿ2 and h 2 Ÿ2 such that P¡h 1 Ÿ2 t S t h 2 Ÿ2 | 2¢ is exactly equal to 1 " ) for all
values of 2. For continuous random variables this is true, but for discrete random
variables it is usually not true that we can find such functions for which the probability is
exactly equal to 1 " ). To cater for discrete random variables we need to consider
so-called conservative confidence intervals.
ƒDefinition 11.4.1 :
Suppose that X1, X2, ..., Xn are jointly distributed random variables with joint distribution f(x1, x2, ..., xn; θ). Let Ω be the set of all possible values of θ.
Let R(x1, x2, ..., xn) ⊂ Ω.
Let θ0 be the true value of θ.
If P[ θ0 ∈ R(X1, X2, ..., Xn) | θ0 ] ≥ 1 − α for all θ0 ∈ Ω
and x1, x2, ..., xn are the observed values of X1, X2, ..., Xn, then R(x1, x2, ..., xn) is called a 100(1 − α)% conservative confidence region for θ.
JTheorem 11.4.3a :
Suppose that X1, X2, ..., Xn are jointly distributed random variables with joint distribution f(x1, x2, ..., xn; θ).
Let S = s(X1, X2, ..., Xn) be a discrete random variable with possible values s1 < s2 < s3 < ...
Let h1(θ) be the biggest value of the si's such that P[ S < h1(θ) | θ ] ≤ α/2.
Let h2(θ) be the smallest value of the si's such that P[ S ≤ h2(θ) | θ ] ≥ 1 − α/2.
Let s be the observed value of S and let R(s) = { θ | h1(θ) ≤ s ≤ h2(θ) }.
Then R(s) is a 100(1 − α)% conservative confidence region for θ.
Proof :
For any value of θ
P[ h1(θ) ≤ S ≤ h2(θ) | θ ] = P[ S ≤ h2(θ) | θ ] − P[ S < h1(θ) | θ ] ≥ (1 − α/2) − α/2 = 1 − α.    (11.4.1)
Suppose that θ0 is the true value of θ. Then
P[ θ0 ∈ R(S) ] = P[ h1(θ0) ≤ S ≤ h2(θ0) | θ0 ] ≥ 1 − α since (11.4.1) holds for all values of θ
i.e. R(s) is a 100(1 − α)% conservative confidence region for θ. ſ
Basically we can therefore use the same method for continuous and discrete random
variables to construct either exact confidence regions or conservative confidence regions.
We will later consider making use of certain tables to find conservative confidence
intervals for the parameters of the binomial and Poisson distributions.
NOTE : Let E denote some event. The notation P[E | θ] is used to denote the probability of E when θ is the value of the parameter in the distribution used to determine the probability of E. To simplify notation we will simply use P[E] instead of P[E | θ] from now on.
11.5 CONFIDENCE INTERVALS FOR SOME STANDARD PROBLEMS
11.5.1 RANDOM SAMPLES FROM NORMAL POPULATIONS : ONE SAMPLE
In this section we will assume that X1, X2, ..., Xn is a random sample from a population with a normal distribution with parameters μ and σ². We will use X̄ to denote the sample mean and S² to denote the sample variance. We will use x̄ and s² to denote observed values of X̄ and S².
(a) Confidence interval for μ
From th.11.3.1b we have that
(X̄ − μ)/(S/√n) ~ t(n − 1) and is a pivotal quantity for μ
and that x̄ − t_{1−α/2}(n − 1) s/√n to x̄ + t_{1−α/2}(n − 1) s/√n is a 100(1 − α)% confidence interval for μ.
(b) Confidence interval for σ²
From th.11.3.1c we have that (n − 1)S²/σ² ~ χ²(n − 1) is a pivotal quantity for σ² and that
(n − 1)s²/χ²_{1−α/2}(n − 1) to (n − 1)s²/χ²_{α/2}(n − 1)
is a 100(1 − α)% confidence interval for σ².
11.5.2 RANDOM SAMPLES FROM NORMAL POPULATIONS : TWO SAMPLES
Suppose that X11, X12, ..., X1n1 is a random sample of size n1 from a normal population with expected value μ1 and variance σ1². Let X̄1 be the sample mean of the X1i's, i.e. X̄1 = Σ_{i=1}^{n1} X1i / n1. Let S1² be the sample variance of the X1i's, i.e.
S1² = Σ_{i=1}^{n1} (X1i − X̄1)² / (n1 − 1).
Suppose that X21, X22, ..., X2n2 is an independent random sample of size n2 from a normal population with expected value μ2 and variance σ2². Let X̄2 be the sample mean of the X2i's, i.e. X̄2 = Σ_{i=1}^{n2} X2i / n2. Let S2² be the sample variance of the X2i's, i.e.
S2² = Σ_{i=1}^{n2} (X2i − X̄2)² / (n2 − 1).
(a) Confidence Interval for μ1 − μ2
(i) σ1² = σ2² = σ², say
In this case X̄1 ~ N(μ1, σ²/n1) and (n1 − 1)S1²/σ² ~ χ²(n1 − 1) are independent, and are also independent of X̄2 ~ N(μ2, σ²/n2) and (n2 − 1)S2²/σ² ~ χ²(n2 − 1), which are independent of each other.
It then follows that
X̄1 − X̄2 ~ N( μ1 − μ2, σ²/n1 + σ²/n2 ) and is independent of
V = (n1 − 1)S1²/σ² + (n2 − 1)S2²/σ² ~ χ²(n1 + n2 − 2).
Therefore Z = [ (X̄1 − X̄2) − (μ1 − μ2) ] / [ σ √(1/n1 + 1/n2) ] ~ N(0, 1) and is independent of V ~ χ²(n1 + n2 − 2).
Hence
T = Z / √( V/(n1 + n2 − 2) )
  = [ (X̄1 − X̄2) − (μ1 − μ2) ] / √( (1/n1 + 1/n2) × [ (n1 − 1)S1² + (n2 − 1)S2² ] / (n1 + n2 − 2) )
is distributed like t with n1 + n2 − 2 degrees of freedom.
Hence
1 − α = P[ −t_{1−α/2}(n1 + n2 − 2) < T < t_{1−α/2}(n1 + n2 − 2) ]
      = P[ (X̄1 − X̄2) − t_{1−α/2}(n1 + n2 − 2) √( (1/n1 + 1/n2)[(n1 − 1)S1² + (n2 − 1)S2²]/(n1 + n2 − 2) )
           < μ1 − μ2 <
           (X̄1 − X̄2) + t_{1−α/2}(n1 + n2 − 2) √( (1/n1 + 1/n2)[(n1 − 1)S1² + (n2 − 1)S2²]/(n1 + n2 − 2) ) ]
i.e. (x̄1 − x̄2) ± t_{1−α/2}(n1 + n2 − 2) √( (1/n1 + 1/n2)[(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) ) is a 100(1 − α)% confidence interval for μ1 − μ2.
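A sketch of this pooled-variance interval (it assumes Python with numpy and scipy is available; the two samples below are hypothetical):

import numpy as np
from scipy import stats

x1 = np.array([12.1, 11.4, 13.0, 12.7, 11.9])           # hypothetical sample 1
x2 = np.array([10.8, 11.2, 10.1, 11.7, 10.9, 11.0])     # hypothetical sample 2
n1, n2, alpha = len(x1), len(x2), 0.05

pooled = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
half = stats.t.ppf(1 - alpha / 2, n1 + n2 - 2) * np.sqrt(pooled * (1 / n1 + 1 / n2))
diff = x1.mean() - x2.mean()
print(diff - half, diff + half)                          # 95% confidence interval for mu1 - mu2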
(ii) σ1² ≠ σ2²
The problem of determining confidence intervals for μ1 − μ2 in case σ1² ≠ σ2² is a well known problem in the statistical literature. Unfortunately no easy solution exists for this problem and we will only give two approximate solutions below.
If n1 → ∞ and n2 → ∞ it can be shown that
[ (X̄1 − X̄2) − (μ1 − μ2) ] / √( S1²/n1 + S2²/n2 ) →d Z where Z ~ N(0, 1).
For large values of n1 and n2 we then have that
1 − α ≈ P[ −z_{1−α/2} ≤ [ (X̄1 − X̄2) − (μ1 − μ2) ] / √( S1²/n1 + S2²/n2 ) ≤ z_{1−α/2} ]
from which we find approximate 100(1 − α)% confidence intervals for μ1 − μ2 of
(x̄1 − x̄2) ± z_{1−α/2} √( s1²/n1 + s2²/n2 ).
NOTE : If n1 and n2 are big, the approximate confidence intervals for μ1 − μ2 apply even if the populations are not normally distributed.
For small values of n1 and n2 of samples from normal populations, reasonably accurate 100(1 − α)% confidence intervals for μ1 − μ2 are obtained by using the fact that
[ (X̄1 − X̄2) − (μ1 − μ2) ] / √( S1²/n1 + S2²/n2 )
is approximately distributed like t with ν degrees of freedom, where
ν = ( s1²/n1 + s2²/n2 )² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ].
Hence
1 − α ≈ P[ −t_{1−α/2}(ν) ≤ [ (X̄1 − X̄2) − (μ1 − μ2) ] / √( S1²/n1 + S2²/n2 ) ≤ t_{1−α/2}(ν) ]
from which we can derive approximate 100(1 − α)% confidence intervals for μ1 − μ2 of
(x̄1 − x̄2) ± t_{1−α/2}(ν) √( s1²/n1 + s2²/n2 ).
(b) Confidence interval for σ1²/σ2²
From th.8.3.6 we have that (n1 − 1)S1²/σ1² ~ χ²(n1 − 1) and is independent of (n2 − 1)S2²/σ2² ~ χ²(n2 − 1). From th.8.4.7 it then follows that
F = [ ((n1 − 1)S1²/σ1²)/(n1 − 1) ] / [ ((n2 − 1)S2²/σ2²)/(n2 − 1) ] = (S1²/σ1²)/(S2²/σ2²) = (S1²/S2²)(σ2²/σ1²) ~ F(n1 − 1, n2 − 1)
i.e. F is a pivotal quantity for σ1²/σ2².
Therefore
1 − α = P[ f_{α/2}(n1 − 1, n2 − 1) ≤ (S1²/S2²)(σ2²/σ1²) ≤ f_{1−α/2}(n1 − 1, n2 − 1) ]
      = P[ 1/f_{1−α/2}(n1 − 1, n2 − 1) ≤ (S2²/S1²)(σ1²/σ2²) ≤ 1/f_{α/2}(n1 − 1, n2 − 1) ]
      = P[ (S1²/S2²)/f_{1−α/2}(n1 − 1, n2 − 1) ≤ σ1²/σ2² ≤ (S1²/S2²)/f_{α/2}(n1 − 1, n2 − 1) ]
i.e.
( (s1²/s2²)/f_{1−α/2}(n1 − 1, n2 − 1) , (s1²/s2²)/f_{α/2}(n1 − 1, n2 − 1) )
is a 100(1 − α)% confidence interval for σ1²/σ2².
11.5.3 RANDOM SAMPLES FROM NORMAL POPULATIONS : PAIRED OBSERVATIONS
Suppose that n experimental units are selected at random from a population of experimental units. Suppose that two observations, say X1i and X2i, are made on the i-th experimental unit. Suppose also that the joint distribution of (X1i, X2i) is bivariate normal with expected values μ1 and μ2, variances σ1² and σ2² and covariance σ12.
Then Yi = X1i − X2i, i = 1, 2, 3, ..., n, are independent normal random variables with expected value μ1 − μ2 and variance σ1² + σ2² − 2σ12.
Let Ȳ = X̄1 − X̄2 be the sample mean of Y1, Y2, ..., Yn and let S² be the sample variance. Then
[ Ȳ − (μ1 − μ2) ] / (S/√n)
is distributed like t with n − 1 degrees of freedom, i.e. it is a pivotal quantity for μ1 − μ2. Hence
( ȳ − t_{1−α/2}(n − 1) s/√n , ȳ + t_{1−α/2}(n − 1) s/√n )
is a 100(1 − α)% confidence interval for μ1 − μ2.
NOTE : The idea of using the differences between X1i and X2i to get confidence intervals for μ1 − μ2 can also be used in the case of independent samples to get exact confidence intervals in case σ1² ≠ σ2². Suppose that X11, X12, ..., X1n1 is a random sample of size n1 from a normal population with expected value μ1 and variance σ1² and that X21, X22, ..., X2n2 is an independent random sample of size n2 from a normal population with expected value μ2 and variance σ2². Suppose that n1 ≤ n2.
Let Yi = X1i − X2i, i = 1, 2, 3, ..., n1. Then Y1, Y2, ..., Yn1 are independent normal random variables with expected value μ1 − μ2 and variance σ1² + σ2².
Let Ȳ be the sample mean of Y1, Y2, ..., Yn1 and let S² be the sample variance. Then
[ Ȳ − (μ1 − μ2) ] / (S/√n1)
is distributed like t with n1 − 1 degrees of freedom, i.e. it is a pivotal quantity for μ1 − μ2. Hence
( ȳ − t_{1−α/2}(n1 − 1) s/√n1 , ȳ + t_{1−α/2}(n1 − 1) s/√n1 )
is a 100(1 − α)% confidence interval for μ1 − μ2.
Although this method gives exact confidence intervals, their expected length will be greater than that of the approximate ones given before since this method does not use all the information available.
11.5.4 CONFIDENCE INTERVALS FOR THE PROBABILITY OF SUCCESS : ONE SAMPLE CASE
Suppose that X1, X2, ..., Xn are independent Bernoulli random variables with p the probability of success for all of them. Then X = Σ_{i=1}^n Xi is the number of successes in the n trials and p̂ = X/n, the proportion of successes, is the maximum likelihood estimator of p. In this case we have that
( p̂ − p ) / √( p̂(1 − p̂)/n ) →d Z where Z ~ N(0, 1).
For large values of n it then follows that
1 − α ≈ P[ −z_{1−α/2} ≤ ( p̂ − p ) / √( p̂(1 − p̂)/n ) ≤ z_{1−α/2} ]
i.e. p̂ ± z_{1−α/2} √( p̂(1 − p̂)/n ) is an approximate 100(1 − α)% confidence interval for p.
Since there does not exist a pivotal quantity for p and X is a discrete random variable, we can only find conservative 100(1 − α)% confidence intervals for p in the case of small values of n.
To find 95% and 99% conservative confidence intervals for p, Table 41 of Biometrika Tables for Statisticians Vol. I by Pearson and Hartley can be used.
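A sketch of the approximate large-sample interval for p (it assumes Python with numpy and scipy is available; the count below, 37 successes in 120 trials, is hypothetical):

import numpy as np
from scipy import stats

x, n, alpha = 37, 120, 0.05                       # hypothetical: 37 successes in 120 trials
phat = x / n
half = stats.norm.ppf(1 - alpha / 2) * np.sqrt(phat * (1 - phat) / n)
print(phat - half, phat + half)                   # approximate interval; only reliable for large n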
11.5.5 CONFIDENCE INTERVALS FOR THE DIFFERENCE BETWEEN TWO PROBABILITIES OF SUCCESS
Suppose that X1 ~ BIN(n1, p1) and that X2 ~ BIN(n2, p2) and that X1 and X2 are independent. Let p̂1 = X1/n1 and p̂2 = X2/n2. If n1 → ∞ and n2 → ∞ we get that
[ ( p̂1 − p̂2 ) − ( p1 − p2 ) ] / √( p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 ) →d Z where Z ~ N(0, 1).
For large values of n1 and n2 we then have that
1 − α ≈ P[ −z_{1−α/2} ≤ [ ( p̂1 − p̂2 ) − ( p1 − p2 ) ] / √( p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 ) ≤ z_{1−α/2} ]
i.e. ( p̂1 − p̂2 ) ± z_{1−α/2} √( p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 ) is an approximate 100(1 − α)% confidence interval for p1 − p2.
11.5.6 CONFIDENCE INTERVALS FOR THE PARAMETER OF A POISSON DISTRIBUTION
Suppose that X1, X2, ..., Xn is a random sample from a population with a Poisson distribution with parameter μ. Let X̄ be the sample mean. As n → ∞ we have that
( X̄ − μ ) / √( X̄/n ) →d Z where Z ~ N(0, 1).
For large n
1 − α ≈ P[ −z_{1−α/2} ≤ ( X̄ − μ ) / √( X̄/n ) ≤ z_{1−α/2} ]
i.e. x̄ ± z_{1−α/2} √( x̄/n ) is an approximate 100(1 − α)% confidence interval for μ.
If X ~ POI(μ) then Table 40 of Biometrika Tables for Statisticians Vol. I by Pearson and Hartley can be used to find conservative confidence intervals for μ.
If X1, X2, ..., Xn is a random sample from a population with a Poisson distribution with parameter μ, then Σ_{i=1}^n Xi ~ POI(nμ). In that case Σ_{i=1}^n Xi can be used in Table 40 to determine a confidence interval for nμ, which can then be used to determine a confidence interval for μ.
CHAPTER 12 : TESTS OF HYPOTHESES
12.1 INTRODUCTION
ƒDefinition 12.1.1 :
Suppose that X is a random variable with distribution f(x; θ). A statistical hypothesis is some statement about the distribution of X. If there is only one possible distribution for which the statement is true, it is called a simple hypothesis. If there is more than one distribution for which the statement is true, it is called a composite hypothesis.
EXAMPLE 12.1.1 :
A statement that the random variable X has a normal distribution with expected value 100 and variance 225 is a simple hypothesis. A statement that X has a normal distribution with expected value 100 is a composite hypothesis since the statement would be true for any N(100, σ²), σ² > 0, distribution.
Suppose that X is a random variable with distribution f(x; θ) where the mathematical form of f is known, e.g. an exponential distribution. In such cases it is usual to formulate the hypothesis only in terms of θ. Suppose that Ω is the set of all possible values of θ. Let Ω0 ⊂ Ω. In this case a statement that the distribution of X is such that θ ∈ Ω0 is a hypothesis that is usually stated as H : θ ∈ Ω0. If the hypothesis is correct then θ ∈ Ω0 and if the hypothesis is not correct then θ ∈ Ω − Ω0. If Ω0 consists of only one element, then the hypothesis is simple. If Ω0 consists of more than one element, then the hypothesis is composite.
To make a decision whether or not a hypothesis is true, it is necessary to plan some experiment which will give us observations of the random variable X. Suppose that X1, X2, ..., Xn is a random sample from a population with distribution f(x; θ). Then the Xi's will be independent random variables all with distribution f(x; θ).
Now suppose that x1, x2, ..., xn are the observed values of X1, X2, ..., Xn. On the basis of these observed values it is then necessary to make a decision whether or not to accept the hypothesis. We will decide to reject the hypothesis if the observed values are very unlikely, maybe impossible, if the hypothesis is true. Even if the observed values are very unlikely when the hypothesis is true, this is never a "proof" that the hypothesis is not true. It is therefore possible to make an incorrect decision about the hypothesis on the basis of the observed values.
Two types of incorrect decisions are possible :
TYPE I ERROR - To reject a hypothesis if the hypothesis is correct
TYPE II ERROR - Not to reject a hypothesis if the hypothesis is not correct.
NOTE : There are two possible decisions we can make namely to reject a hypothesis or
not to reject a hypothesis. A hypothesis is rejected if the observed values are such that it
is very unlikely to observe such values if the hypothesis is correct. This does not mean
that if the hypothesis is not rejected that the observed values strongly support the
hypothesis. It is therefore not correct to say that if a hypothesis is not rejected that we
actually accept the hypothesis as correct.
Depending on the set of observed values for which we will reject the hypothesis we can
determine the probabilities of type I and type II errors.
ƒDefinition 12.1.2 :
The set of all possible observed values for which we would reject the hypothesis is called
the critical region of the test.
Suppose that the critical region for a test is C. If Ÿx 1 , x 2 , T , x n C the hypothesis is
rejected and if Ÿx 1 , x 2 , T , x n R n " C the hypothesis is not rejected.
Normally researchers would like to prove that something, say some treatment, does make a difference or that there is some relationship between variables. In such cases the onus to prove the existence of such differences or relationships is on the researcher, i.e. the researcher must provide the necessary data to prove the statement. In such cases the usual procedure is to formulate the hypothesis that there is no difference or no relationship. This is known as the null hypothesis and is normally indicated by H0. The onus is then on the researcher to show that the observed data is very unlikely on the assumption of no difference or no relationship. It is therefore required that there is only a very small probability that the null hypothesis will be rejected if in fact the null hypothesis is correct, i.e.
P[reject H0 | H0 is true] = P[Type I error]
must be small for all cases where H0 is correct.
ƒDefinition 12.1.3 :
Suppose that H0 is a simple hypothesis. Then
α = P[type I error] = P[reject H0 | H0 is true]
is called the significance level of the test.
If H0 is a composite hypothesis, then the maximum of P[type I error] over all cases for which H0 is true is called the size of the test.
EXAMPLE 12.1.2 :
Suppose that the lifetime of a standard electronic component is exponentially distributed with parameter 1, i.e. the expected lifetime of a standard component is 1. The producer of a new type of component claims that the new type of component is better than the standard component, i.e. that the new component has an expected lifetime greater than 1.
(i) In this case the null hypothesis is that the expected lifetime of the new type of component does not differ from the expected lifetime of the standard component. The onus is on the producer of the new type of component to prove that the expected lifetime of the new type of component is greater than that of the standard type of component.
The null hypothesis is that the lifetime of the new type of component is exponentially distributed with expected lifetime equal to 1. Hence the null hypothesis is H0 : θ = 1 where θ is the parameter of the exponential distribution. The null hypothesis is not true if θ ≠ 1. This is called the alternative hypothesis and is indicated by HA : θ ≠ 1.
Suppose that the lifetimes of 10 of the new type of component are observed and that X1, X2, ..., X10 are independent exponentially distributed random variables with parameter θ.
Suppose the null hypothesis is rejected if 2 Σ_{i=1}^{10} xi > χ²_{.99}(20) = 37.57.
Then
P[reject H0 | H0 correct] = P[ 2 Σ_{i=1}^{10} Xi / 1 > 37.57 | θ = 1 ]
                          = P[ χ²(20) > χ²_{.99}(20) ]   by th.8.3.3
                          = 0.01.
The probability is therefore 0.01 that H0 will be rejected if H0 is correct, i.e. if θ = 1. For the simple hypothesis H0 : θ = 1 the significance level of the test is 0.01.
(ii) Since the producer claims that the new type of component is better than the standard type of component, his claim really is that θ > 1. It would therefore be better to formulate the null hypothesis in such a way that if the null hypothesis is rejected it would mean that θ > 1. The alternative hypothesis should therefore be HA : θ > 1 and since Ω = {θ | θ > 0} we will have to take Ω0 = {θ | 0 < θ ≤ 1}. In this case we have a composite null hypothesis H0 : θ ∈ Ω0.
Suppose that we use the same critical region as above, namely to reject H0 if
2 Σ_{i=1}^{10} xi > χ²_{.99}(20) = 37.57.
For any value of θ such that 0 < θ ≤ 1 we then have that
P[reject H0 | θ] = P[ 2 Σ_{i=1}^{10} Xi > χ²_{.99}(20) | θ ]
                 = P[ 2 Σ_{i=1}^{10} Xi / θ > χ²_{.99}(20)/θ | θ ]
                 ≤ P[ 2 Σ_{i=1}^{10} Xi / θ > χ²_{.99}(20) | θ ]   since θ ≤ 1, i.e. χ²_{.99}(20) ≤ χ²_{.99}(20)/θ
                 = P[ χ²(20) > χ²_{.99}(20) ]   by th.8.3.3
                 = 0.01.
The maximum value of the probability of a type I error for all values of θ such that 0 < θ ≤ 1 is therefore 0.01, i.e. the size of the test is 0.01.
Let
β = P[type II error] = P[do not reject H0 | H0 is not correct].
Obviously we would like the probability of a type II error, i.e. β, to be as small as possible, subject to the condition that the probability of a type I error must be small.
But
1 − β = P[reject H0 | H0 is not correct]
i.e. we would like to make 1 − β as big as possible.
ƒDefinition 12.1.4 :
Let π(θ) = P[reject H0 | θ] for all θ ∈ Ω.
The function π, defined for all possible values of θ, is called the power function of the test.
NOTE that the power function is always the probability of rejecting the null hypothesis.
For θ ∈ Ω0 we have that
π(θ) = P[reject H0 | θ] for θ ∈ Ω0
i.e. it is the probability of a type I error and we want this as small as possible for all values of θ which belong to Ω0.
For θ ∈ Ω − Ω0 we have that
π(θ) = P[reject H0 | θ] for θ ∈ Ω − Ω0
i.e. it is the probability of rejecting H0 when in fact H0 is not true, i.e. it is the probability of a correct decision, i.e. we want this as close to 1 as possible for all values of θ which belong to Ω − Ω0.
An ideal test would therefore be a test for which
π(θ) = 0 for all θ ∈ Ω0 and π(θ) = 1 for all θ ∈ Ω − Ω0
which means that we would always make the correct decision, i.e. never reject H0 if it is true and always reject H0 if it is not true.
One would therefore always try to construct the test of a hypothesis so that the power function of the test is as close as possible to the power function of the ideal test.
EXAMPLE 12.1.3 :
Continuation of Example 12.1.2.
Suppose that X1, X2, ..., X10 are independent exponential random variables with parameter θ. Let Ω0 = {θ | 0 < θ ≤ 1} and H0 : θ ∈ Ω0 and HA : θ ∈ Ω − Ω0.
Suppose that the test is to reject H0 if 2 Σ_{i=1}^{10} xi > χ²_{.99}(20) = 37.57, where x1, x2, ..., x10 are the observed values of X1, X2, ..., X10.
Then
π(θ) = P[reject H0 | θ] = P[ 2 Σ_{i=1}^{10} Xi > χ²_{.99}(20) | θ ] = P[ 2 Σ_{i=1}^{10} Xi / θ > χ²_{.99}(20)/θ | θ ] = P[ χ²(20) > 37.57/θ ]   by th.8.3.3.
For different values of θ this probability is given in the table below.
POWER FUNCTION OF TEST : n = 10, α = 0.01
θ :        0.3      0.6      1.0    1.5      2.0      2.5      3.0
37.57/θ :  125.2    62.6     37.6   25.0     18.8     15.0     12.5
π(θ) :     .00001   .00001   0.01   0.2014   0.5349   0.7764   0.8978
Note that although π(θ) ≤ 0.01 for all θ ∈ Ω0, the probability is only 0.2014 to reject H0 even if θ = 1.5, i.e. if the expected lifetime of the new type of component is 50% bigger than that of the standard component. For θ = 1.5 there is an 80% probability of not rejecting the null hypothesis.
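The entries of this table are straightforward to reproduce. A sketch (it assumes Python with scipy is available):

from scipy import stats

crit = stats.chi2.ppf(0.99, 20)                  # 37.57, the critical value for n = 10, alpha = 0.01
for theta in [0.3, 0.6, 1.0, 1.5, 2.0, 2.5, 3.0]:
    power = stats.chi2.sf(crit / theta, 20)      # P[chi2(20) > 37.57/theta]
    print(theta, round(power, 4))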
The question now is whether we can improve on the test to come closer to the ideal test. One possibility is to increase the size of the test, i.e. to allow a bigger probability of rejecting H0 even if H0 is true.
Suppose that H0 will be rejected if 2 Σ_{i=1}^{10} xi > χ²_{.95}(20) = 31.41, where x1, x2, ..., x10 are the observed values of X1, X2, ..., X10.
Then
π(θ) = P[reject H0 | θ] = P[ 2 Σ_{i=1}^{10} Xi > χ²_{.95}(20) | θ ] = P[ χ²(20) > 31.41/θ ]   by th.8.3.3.
For different values of θ this probability is given in the table below.
POWER FUNCTION OF TEST : n = 10, α = 0.05
θ :        0.3      0.6      1.0     1.5     2.0     2.5    3.0
31.41/θ :  104.7    52.4     31.41   20.9    15.7    12.6   10.5
π(θ) :     .00001   .00001   0.05    0.397   0.735   .898   .958
The size of the test (i.e. the maximum value of the probability of rejecting H0 if in fact H0 is true) is now 0.05 instead of 0.01 as before. For θ = 1.5 the probability of rejecting H0 is now 0.397 compared to 0.2014 before. The "cost" of this improvement in the power of the test when H0 is not true is an increase in the probability of a type I error.
Another way to improve the power of the test is to take more observations. Suppose we make 20 observations, i.e. X1, X2, ..., X20 are independent exponential random variables with parameter θ.
Suppose that H0 will be rejected if 2 Σ_{i=1}^{20} xi > χ²_{.95}(40) = 55.76, where x1, x2, ..., x20 are the observed values of X1, X2, ..., X20.
Then
π(θ) = P[reject H0 | θ] = P[ 2 Σ_{i=1}^{20} Xi > χ²_{.95}(40) | θ ] = P[ χ²(40) > 55.76/θ ]   by th.8.3.3.
For different values of θ this probability is given in the table below.
POWER FUNCTION OF TEST : n = 20, α = 0.05
θ :        0.3      0.6      1.0     1.5     2.0     2.5    3.0
55.76/θ :  186      92.9     55.76   37.2    27.9    22.3   18.6
π(θ) :     .00001   .00001   0.05    0.606   0.926   .991   .999
Note that for θ > 1, i.e. when H0 is not true, the probability of rejecting is much higher than before.
The three power functions above can be represented graphically as follows.
[Figure: the three power functions π(θ) plotted against θ for the tests with n = 10, α = 0.01; n = 10, α = 0.05; and n = 20, α = 0.05.]
Note that for θ > 1, i.e. θ ∈ Ω − Ω0, the power of the test is much higher for n = 20 than for n = 10.
The question arises: given α = 0.05, how many observations should be made such that the power is at least 0.95 when θ = 2, i.e. if θ = 2 we want to be at least 95% certain that H0 will be rejected.
If α = 0.05 and n observations are made, then H0 should be rejected if
2 Σ_{i=1}^n xi > χ²_{.95}(2n).
Then
π(θ) = P[reject H0 | θ] = P[ 2 Σ_{i=1}^n Xi > χ²_{.95}(2n) | θ ] = P[ 2 Σ_{i=1}^n Xi / θ > χ²_{.95}(2n)/θ | θ ] = P[ χ²(2n) > χ²_{.95}(2n)/θ ]   by th.8.3.3.
We then have to determine the smallest value of n such that
π(2) = P[ χ²(2n) > χ²_{.95}(2n)/2 ] ≥ 0.95.
This has to be done on a trial and error basis, i.e. try different values of n until we find the smallest value of n. The table below shows an example of such calculations starting with n = 25.
DETERMINING THE SMALLEST n SUCH THAT π(2) ≥ 0.95
n     χ²_{.95}(2n)   χ²_{.95}(2n)/2   π(2)    Decision
25    67.5           33.8             0.962   ⇒ n ≤ 25
24    65             32.5             0.957   ⇒ n ≤ 24
23    62             31.0             0.955   ⇒ n ≤ 23
22    60             30.0             0.946   ⇒ n = 23
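The same search can be automated. A sketch (it assumes Python with scipy is available) that increases n until the power at θ = 2 first reaches 0.95:

from scipy import stats

def power(n, theta=2.0, alpha=0.05):
    crit = stats.chi2.ppf(1 - alpha, 2 * n)          # critical value chi2_{1-alpha}(2n)
    return stats.chi2.sf(crit / theta, 2 * n)        # P[chi2(2n) > crit/theta]

n = 1
while power(n) < 0.95:
    n += 1
print(n, power(n))                                   # smallest n; agrees with the table (n = 23)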
CRITICAL REGION
The choice of the critical region, i.e. those values of the observed values x1, x2, ..., xn for which the null hypothesis H0 is rejected, determines the value of α, the significance level of the test, and π(θ), the power of the test. In general a decision is first made about the significance level of the test and then the test is chosen in such a way that the power of the test is as big as possible if H0 is not true.
As a general rule the best choice of C, the critical region, consists of those observed values x1, x2, ..., xn which are very unlikely if H0 is true and as likely as possible if H0 is not true. To find such values it is necessary to pay particular attention to the formulation of the null hypothesis and the alternative hypothesis. This usually determines the shape of the test, e.g. whether it is a one-sided or a two-sided test.
As an example let us consider the case where X1, X2, ..., Xn is a random sample from a normal population with expected value μ and variance σ² where σ² is known.
We will consider three possible cases.
(i) H0 : μ ≤ μ0 ; HA : μ > μ0.
In this case Ω0 = {μ | μ ≤ μ0} and Ω − Ω0 = {μ | μ > μ0}.
The maximum likelihood estimator of μ is μ̂ = X̄ where X̄ is the sample mean.
Choose μ1 ∈ Ω0, i.e. μ1 ≤ μ0, and choose μ2 ∈ Ω − Ω0, i.e. μ2 > μ0.
Suppose that x̄ is the observed value of X̄. Let f_{X̄}(· | μ) be the density function of X̄ given the value of μ. Then f_{X̄}(x̄ | μ1) gives us the likelihood of observing x̄ in case μ = μ1, i.e. when H0 is true. Similarly f_{X̄}(x̄ | μ2) gives us the likelihood of observing x̄ if μ = μ2, i.e. when HA is true. The ratio f_{X̄}(x̄ | μ2) / f_{X̄}(x̄ | μ1) gives us the ratio of the likelihood of observing x̄ when μ = μ2 to the likelihood of observing x̄ when μ = μ1.
Hence if this ratio is much greater than 1 it is much more likely to observe x̄ if HA is true than when H0 is true. If the ratio is much less than 1 it is much more likely to observe x̄ if H0 is true than when HA is true. We will therefore reject H0 if the ratio is big and not reject H0 if the ratio is small. Since X̄ ~ N(μ, σ²/n) we get that
f_{X̄}(x̄ | μ2) / f_{X̄}(x̄ | μ1) = exp{ −(x̄ − μ2)²/(2σ²/n) } / exp{ −(x̄ − μ1)²/(2σ²/n) }
                                = exp{ −(n/(2σ²)) [ (x̄ − μ2)² − (x̄ − μ1)² ] }
                                = exp{ −(n/(2σ²)) [ (μ2² − μ1²) − 2x̄(μ2 − μ1) ] }
                                = e^{ −(n/(2σ²))(μ2² − μ1²) } e^{ (n/(2σ²)) 2x̄(μ2 − μ1) }.
Since μ2 − μ1 > 0 the ratio will be big if x̄ is large, i.e. we will reject H0 if x̄ is large. The critical region will therefore be to reject H0 if x̄ is greater than some critical value c, i.e. the form of the test is to reject H0 if x̄ > c. The critical value is determined by the significance level or size of the test. Suppose that we want the probability of a type I error to be ≤ α for all μ ∈ Ω0, i.e. when μ ≤ μ0.
For all μ ≤ μ0
P[reject H0 | μ] = P[ X̄ > c | μ ]
                 = P[ (X̄ − μ)/(σ/√n) > (c − μ)/(σ/√n) | μ ]
                 = P[ Z > (c − μ)/(σ/√n) | μ ]   where Z ~ N(0, 1)
                 ≤ P[ Z > (c − μ0)/(σ/√n) | μ0 ]   since μ ≤ μ0
                 = α
if (c − μ0)/(σ/√n) = z_{1−α}, or c = μ0 + (σ/√n) z_{1−α}.
The test of size α for H0 : μ ≤ μ0 vs. HA : μ > μ0 is therefore to reject H0 if
x̄ > μ0 + (σ/√n) z_{1−α}.    (12.1.1)
This test is called a one-sided test of size α.
The power function of the test is given by
π(μ) = P[ X̄ > c | μ ]
     = P[ X̄ > μ0 + (σ/√n) z_{1−α} | μ ]
     = P[ (X̄ − μ)/(σ/√n) > (μ0 − μ)/(σ/√n) + z_{1−α} | μ ]
     = 1 − Φ( (μ0 − μ)/(σ/√n) + z_{1−α} ).
NOTE : Note that π(μ) gets bigger as μ gets bigger and tends to 1 as n tends to ∞.
The power is α if μ = μ0.
For any value of μ > μ0, π(μ) gets bigger as n gets bigger.
For any value of μ > μ0, π(μ) gets smaller as σ gets bigger.
For any value of μ, π(μ) gets bigger as α gets bigger.
Now suppose that when μ = μ2 > μ0 we want the power of the test to be equal to 1 − β. Then
π(μ2) = 1 − β = 1 − Φ( (μ0 − μ2)/(σ/√n) + z_{1−α} )
i.e. (μ0 − μ2)/(σ/√n) + z_{1−α} = z_β = −z_{1−β}
i.e. √n (μ0 − μ2)/σ = −z_{1−α} − z_{1−β}
i.e. √n = σ(z_{1−α} + z_{1−β})/(μ2 − μ0)
i.e. n = σ²(z_{1−α} + z_{1−β})²/(μ2 − μ0)².    (12.1.2)
NOTE : The bigger σ² is, the bigger n is.
The smaller α is, the bigger n is.
The bigger 1 − β is, the bigger n is.
The closer μ2 is to μ0, the bigger n is.
(ii) H0 : μ ≥ μ0 ; HA : μ < μ0.
In this case the ratio f_{X̄}(x̄ | μ2 ∈ Ω − Ω0) / f_{X̄}(x̄ | μ1 ∈ Ω0) will be big if x̄ is small, i.e. H0 will be rejected if x̄ < c where c is determined in such a way that the size of the test is α.
Note that this is also a one-sided test, but in this case we reject if x̄ is too small.
(iii) H0 : μ = μ0 ; HA : μ ≠ μ0
Suppose that μ1 is a value of μ different from μ0. Then
f_{X̄}(x̄ | μ1) / f_{X̄}(x̄ | μ0) = exp{ −(n/(2σ²)) [ (x̄ − μ1)² − (x̄ − μ0)² ] }
                                = exp{ −(n/(2σ²)) [ (μ1² − μ0²) − 2x̄(μ1 − μ0) ] }
                                = e^{ −(n/(2σ²))(μ1² − μ0²) } e^{ (n/(2σ²)) 2x̄(μ1 − μ0) }.
In this case, since μ1 − μ0 can be either positive or negative, we see that the ratio will be big if x̄ is big when μ1 − μ0 > 0, and the ratio will also be big if x̄ is small when μ1 − μ0 < 0. To provide for both possibilities we will have to reject either if x̄ is too small or if it is too big. This is known as a two-sided test.
If H0 is true then μ = μ0 and X̄ is normally distributed with expected value μ0 and variance σ²/n. This means that if H0 is true the most likely values of X̄ are close to μ0. Therefore we will reject if the difference between x̄ and μ0 is greater than some critical value c, i.e. when |x̄ − μ0| > c. For the significance level of the test to be α we must have that
α = π(μ0) = P[ |X̄ − μ0| > c | μ0 ]
          = P[ |X̄ − μ0|/(σ/√n) > c/(σ/√n) | μ0 ]
          = P[ |Z| > c/(σ/√n) ]   where Z ~ N(0, 1)
i.e. √n c/σ = z_{1−α/2}, i.e. c = (σ/√n) z_{1−α/2}.
The test of significance level α for H0 : μ = μ0 vs. HA : μ ≠ μ0 is therefore to reject H0 if
|x̄ − μ0| > (σ/√n) z_{1−α/2}.    (12.1.3)
This test is called a two-sided test.
The power function of the test is given by
π(μ) = P[ |X̄ − μ0| > (σ/√n) z_{1−α/2} | μ ]
     = P[ X̄ − μ0 < −(σ/√n) z_{1−α/2} | μ ] + P[ X̄ − μ0 > (σ/√n) z_{1−α/2} | μ ]
     = P[ (X̄ − μ)/(σ/√n) < (μ0 − μ)/(σ/√n) − z_{1−α/2} | μ ] + P[ (X̄ − μ)/(σ/√n) > (μ0 − μ)/(σ/√n) + z_{1−α/2} | μ ]
     = P[ Z < δ − z_{1−α/2} ] + P[ Z > δ + z_{1−α/2} ]   where Z ~ N(0, 1) and δ = (μ0 − μ)/(σ/√n).    (12.1.4)
For μ < μ0 we have that δ > 0 and from (12.1.4) it then follows that
π(μ) = P[ Z < δ − z_{1−α/2} ] + P[ Z > δ + z_{1−α/2} ]   where Z ~ N(0, 1)
     ≈ P[ Z < δ − z_{1−α/2} ], since P[ Z > δ + z_{1−α/2} ] < α/2, i.e. very small.
Suppose that for some given value μ1 < μ0 we want the power of the test to be 1 − β. Then
π(μ1) = 1 − β ≈ P[ Z < δ − z_{1−α/2} ] ≈ P[ Z < √n (μ0 − μ1)/σ − z_{1−α/2} ]
i.e. z_{1−β} ≈ √n (μ0 − μ1)/σ − z_{1−α/2}
i.e. √n (μ0 − μ1)/σ ≈ z_{1−β} + z_{1−α/2}
i.e. n ≈ σ²(z_{1−α/2} + z_{1−β})²/(μ0 − μ1)².    (12.1.5)
For μ > μ0 we have that δ < 0. In this case it follows from (12.1.4) that
π(μ) = P[ Z < δ − z_{1−α/2} ] + P[ Z > δ + z_{1−α/2} ]   where Z ~ N(0, 1)
     ≈ P[ Z > δ + z_{1−α/2} ], since P[ Z < δ − z_{1−α/2} ] < α/2, i.e. very small,
     = 1 − P[ Z ≤ δ + z_{1−α/2} ].
Suppose that for some given value μ1 > μ0 we want the power of the test to be 1 − β. Then
π(μ1) = 1 − β ≈ 1 − P[ Z ≤ δ + z_{1−α/2} ] ≈ 1 − P[ Z ≤ √n (μ0 − μ1)/σ + z_{1−α/2} ]
i.e. √n (μ0 − μ1)/σ + z_{1−α/2} ≈ z_β = −z_{1−β}
i.e. √n (μ0 − μ1)/σ ≈ −z_{1−β} − z_{1−α/2}
i.e. n ≈ σ²(z_{1−α/2} + z_{1−β})²/(μ0 − μ1)²
which is the same result as obtained in (12.1.5).
EXAMPLE 12.1.4 :
Suppose that X1, X2, ..., Xn is a random sample from a normal population with expected value μ and variance σ² = 10. To test the hypothesis H0 : μ ≤ 5 vs. the alternative HA : μ > 5 with a test of size α = 0.01, we will reject H0 if
x̄ > μ0 + (σ/√n) z_{1−α}   (see (12.1.1))
i.e. if x̄ > 5 + √(10/n) × 2.326.
If it is required that the probability of rejecting H0 must be 0.99 in case μ = 8, then by (12.1.2) we must have that
n = σ²(z_{1−α} + z_{1−β})²/(μ2 − μ0)² = 10(2.326 + 2.326)²/(8 − 5)² = 24.05
i.e. n = 25 observations are required.
To test the hypothesis H0 : μ = 5 vs. the alternative HA : μ ≠ 5 with significance level α = 0.01, we will reject H0 if
|x̄ − μ0| > (σ/√n) z_{1−α/2}   (see (12.1.3))
i.e. if |x̄ − 5| > √(10/n) × 2.576.
If it is required that the probability of rejecting H0 must be 0.99 in case μ = 8, then by (12.1.5) we must have that
n = σ²(z_{1−α/2} + z_{1−β})²/(μ0 − μ1)² = 10(2.576 + 2.326)²/(8 − 5)² = 26.70
i.e. n = 27 observations are needed.
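Both sample-size calculations are one-line computations. A sketch (it assumes Python with numpy and scipy is available; the parameter values are those of Example 12.1.4):

import numpy as np
from scipy import stats

sigma2, mu0, mu1 = 10.0, 5.0, 8.0                 # values from Example 12.1.4
z = stats.norm.ppf                                # standard normal percentage points

n_one_sided = sigma2 * (z(0.99) + z(0.99))**2 / (mu1 - mu0)**2    # (12.1.2), alpha = 0.01, power 0.99
n_two_sided = sigma2 * (z(0.995) + z(0.99))**2 / (mu1 - mu0)**2   # (12.1.5), alpha = 0.01, power 0.99
print(int(np.ceil(n_one_sided)), int(np.ceil(n_two_sided)))       # 25 and 27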
12.2 P-VALUES AND THE USE OF CONFIDENCE INTERVALS FOR
TESTING HYPOTHESES
P-values
It was established in the previous section that the choice of the significance level of a test
has a very strong and direct influence on the power of the test. Furthermore it must be
noticed that there is no prescription about the choice of the significance level i.e. in many
cases it could be a very personal choice. The common use of 0.05 and 0.01 is merely a
choice for convenience. Strictly speaking a reasonable choice should be based on the
consequences of a type I error versus the consequences of a type II error. This is very
seldom known and not easy to establish. For this reason it has become common practice
to report the P-value of a test and leave it to the user or reader to make his own decision.
ƒDefinition 12.2.1a :
The P-value for any test is the smallest value of the significance level for which the null hypothesis would be rejected based on the observed values of the random variables.
If the P-value of a test is 0.03, then the hypothesis would be rejected for any value of α ≥ 0.03, but would not be rejected for any value of α < 0.03. This tells the reader that the hypothesis would be rejected for α = 0.05 and would not be rejected at α = 0.01. If the P-value is 0.005 then it indicates to the reader that the hypothesis would be rejected for both α = 0.05 and α = 0.01. Since the significance level of a test is the probability of rejecting H0 if in fact H0 is correct, a very small P-value therefore indicates that the observed values of the random variables are extremely unlikely if the hypothesis is true.
EXAMPLE 12.2.1 :
Suppose that X1, X2, ..., X10 is a random sample from a normal population. Let the hypothesis be H0 : μ = 3 vs. HA : μ ≠ 3. If H0 is correct, then (X̄ − 3)/(S/√n) is distributed like t with 10 − 1 = 9 degrees of freedom. The critical region for the test is to reject if |x̄ − 3|/(s/√n) > c, and the test will have significance level α if we reject when |x̄ − 3|/(s/√n) > t_{1−α/2}(9). Also note that as c gets bigger the significance level gets smaller.
Now suppose that the observed values are such that |x̄ − 3|/(s/√n) = 0.883. For this observed value and critical value c we would reject if 0.883 > c. To make the significance level as small as possible, make c as big as possible while still rejecting for that value of c, i.e. we would reject if |x̄ − 3|/(s/√n) > 0.883, because if c > 0.883 we would not reject H0. The P-value is therefore the significance level of the test if we reject when |x̄ − 3|/(s/√n) > 0.883, i.e.
P = P[ |X̄ − 3|/(S/√n) > 0.883 ] = 2 × 0.2 = 0.4
as can be determined from the t table in Bain and Engelhardt p. 608.
The P-value gives an indication of the probability of getting such a big deviation of X̄ from 3 if the hypothesis were true.
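A sketch of this P-value computation (it assumes Python with scipy is available):

from scipy import stats

t_obs, df = 0.883, 9
p_value = 2 * stats.t.sf(t_obs, df)     # two-sided P-value = 2*P[T > 0.883] for T ~ t(9)
print(p_value)                          # roughly 0.40, as read from the t table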
The use of confidence intervals for testing hypotheses.
JTheorem 12.2.1 :
Suppose that x1, x2, ..., xn are the observed values of the random variables X1, X2, ..., Xn and that R(x1, x2, ..., xn) is a 100(1 − α)% confidence region for the parameter θ. A test of significance level α for the hypothesis H0 : θ = θ0 vs. HA : θ ≠ θ0 is to reject H0 if θ0 ∉ R(x1, x2, ..., xn).
Proof :
Since R(x1, x2, ..., xn) is a 100(1 − α)% confidence region for the parameter θ, we have that P[ θ ∈ R(X1, X2, ..., Xn) | θ ] = 1 − α for all θ. The significance level of the test to reject H0 if θ0 ∉ R(x1, x2, ..., xn) is given by
P[reject H0 | H0 is true] = P[ θ0 ∉ R(X1, X2, ..., Xn) | θ0 ] = 1 − P[ θ0 ∈ R(X1, X2, ..., Xn) | θ0 ] = 1 − (1 − α) = α for all θ0 ∈ Ω.
ſ
EXAMPLE 12.2.2 :
Suppose that $X_1, X_2, \ldots, X_{16}$ is a random sample from a normal population. Let the
hypothesis be $H_0: \mu = 3$ vs. $H_A: \mu \neq 3$, and suppose that this hypothesis is to be
tested with significance level $\alpha = 0.05$. Suppose that the observed values are such that
$\bar{x} = 5$ and $s^2 = 9$. The $100(1 - 0.05)\% = 95\%$ confidence interval for $\mu$ is then
given by
$$\bar{x} \pm t_{1-0.05/2}(n-1)\,\frac{s}{\sqrt{n}} = 5 \pm 2.131 \times \frac{3}{\sqrt{16}}$$
i.e. the interval is $(3.4,\ 6.6)$.
Since this does not include the value 3, the hypothesis that $\mu = 3$ is rejected with
significance level $\alpha = 0.05$.
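The same confidence-interval test can be reproduced numerically. A minimal sketch assuming scipy; the values $n = 16$, $\bar{x} = 5$, $s^2 = 9$ and $\alpha = 0.05$ are taken from the example above.

```python
import math
from scipy import stats

n, xbar, s2, alpha = 16, 5.0, 9.0, 0.05
s = math.sqrt(s2)

# 95% confidence interval: xbar +/- t_{1-alpha/2}(n-1) * s / sqrt(n)
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)       # approximately 2.131
half_width = t_crit * s / math.sqrt(n)
ci = (xbar - half_width, xbar + half_width)      # approximately (3.4, 6.6)

mu0 = 3.0
reject = not (ci[0] <= mu0 <= ci[1])             # True: H0 is rejected at level 0.05
print(ci, reject)
```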
12.3 TESTS OF HYPOTHESES FOR SOME STANDARD PROBLEMS.
RANDOM SAMPLES FROM NORMAL POPULATIONS : ONE SAMPLE
Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from a normal population with expected
value $\mu$ and variance $\sigma^2$. Let $\bar{X}$ be the sample mean and $s^2$ the sample variance, and
let $\bar{x}$ and $s^2$ be the observed values of $\bar{X}$ and $s^2$.
(a) Test for $H_0: \mu = \mu_0$ vs. $H_A: \mu \neq \mu_0$ : $\sigma^2$ known
In this case $\bar{X} \sim N(\mu_0, \sigma^2/n)$ if $H_0$ is true and therefore $\dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$ if $H_0$ is
true. If we reject $H_0$ when $\left|\dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right| > z_{1-\alpha/2}$, the significance level of the test is
$$P[\text{reject } H_0 \mid H_0 \text{ is true}] = P\left[\left|\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right| > z_{1-\alpha/2} \,\Big|\, \mu_0\right]
= P\left[|Z| > z_{1-\alpha/2}\right] = \alpha \quad \text{where } Z \sim N(0,1).$$
We have previously derived the result (see (12.1.5)) that if we want the power of
the test to be $1 - \beta$ for $\mu = \mu_1$, then the number of observations must be
$$n^* = \frac{\sigma^2\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{(\mu_0 - \mu_1)^2}.$$
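As a numerical illustration of this test and of the sample-size formula, here is a minimal Python sketch assuming scipy. The values $\mu_0 = 5$, $\mu_1 = 8$, $\sigma^2 = 10$ and $\alpha = \beta = 0.01$ are chosen so that the last line reproduces the $n^* \approx 26.7$ calculation shown at the start of this chapter's worked example; the observed data are hypothetical.

```python
import math
from scipy import stats

# Illustrative values (chosen to match the n* calculation shown earlier in these notes)
mu0, mu1 = 5.0, 8.0        # hypothesised and alternative means
sigma2 = 10.0              # known variance
alpha, beta = 0.01, 0.01   # significance level and 1 - power

# Two-sided z-test: reject H0 if |xbar - mu0| / (sigma/sqrt(n)) > z_{1-alpha/2}
n, xbar = 10, 6.2          # hypothetical sample size and observed mean
z_obs = abs(xbar - mu0) / math.sqrt(sigma2 / n)
z_crit = stats.norm.ppf(1 - alpha / 2)
print(z_obs > z_crit)

# Sample size for power 1 - beta at mu = mu1
n_star = sigma2 * (stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(1 - beta))**2 / (mu0 - mu1)**2
print(n_star, math.ceil(n_star))   # approximately 26.7, so n = 27
```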
NOTE : We will only give the two-sided tests. The necessary adjustment must in each
case be made for one-sided hypotheses and the corresponding one-sided tests.
(b) Test for $H_0: \mu = \mu_0$ vs. $H_A: \mu \neq \mu_0$ : $\sigma^2$ not known
In this case $\dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t(n-1)$ if $H_0$ is correct. If we reject $H_0$ when
$\left|\dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}\right| > t_{1-\alpha/2}(n-1)$, the significance level of the test is
$$P[\text{reject } H_0 \mid H_0 \text{ is true}] = P\left[\left|\frac{\bar{X} - \mu_0}{s/\sqrt{n}}\right| > t_{1-\alpha/2}(n-1) \,\Big|\, \mu_0\right]
= P\left[|T| > t_{1-\alpha/2}(n-1)\right] = \alpha \quad \text{where } T \sim t(n-1).$$
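A minimal numerical sketch of this one-sample t-test, assuming scipy and numpy; the data below are hypothetical and only illustrate the mechanics.

```python
import numpy as np
from scipy import stats

mu0, alpha = 5.0, 0.05
x = np.array([6.1, 4.8, 7.2, 5.9, 6.5, 4.4, 7.0, 5.3, 6.8, 5.6])   # hypothetical data

n = len(x)
t_obs = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)
print(abs(t_obs) > t_crit)          # True means reject H0 at level alpha

# The same test is available directly in scipy:
res = stats.ttest_1samp(x, mu0)
print(res.statistic, res.pvalue)    # reject H0 at level alpha if res.pvalue < alpha
```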
The power function of this test is given by
$$\begin{aligned}
\pi(\mu, \sigma^2) &= P[\text{reject } H_0 \mid \mu, \sigma^2] \\
&= P\left[\left|\frac{\bar{X} - \mu_0}{s/\sqrt{n}}\right| > t_{1-\alpha/2}(n-1) \,\Big|\, \mu, \sigma^2\right] \\
&= P\left[\left|\frac{\bar{X} - \mu + \mu - \mu_0}{s/\sqrt{n}}\right| > t_{1-\alpha/2}(n-1) \,\Big|\, \mu, \sigma^2\right] \\
&= P\left[\left|\frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} + \dfrac{\mu - \mu_0}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)s^2}{\sigma^2}\Big/(n-1)}}\right| > t_{1-\alpha/2}(n-1) \,\Big|\, \mu, \sigma^2\right] \\
&= P\left[\left|\frac{Z + \lambda}{\sqrt{V/(n-1)}}\right| > t_{1-\alpha/2}(n-1)\right]
\end{aligned}$$
where $Z \sim N(0,1)$, $V \sim \chi^2(n-1)$, $Z$ and $V$ are independent and $\lambda = \dfrac{\mu - \mu_0}{\sigma/\sqrt{n}}$.
The distribution of $\dfrac{Z + \lambda}{\sqrt{V/(n-1)}}$ is called a non-central $t$-distribution with $n-1$ degrees of
freedom and non-centrality parameter $\lambda$. This distribution can be used to determine
the power of the $t$-test and to determine the $n$ that will give us a certain power for specific
values of $\mu$ and $\sigma^2$. Table 10 of Biometrika Tables for Statisticians Vol. I by Pearson
and Hartley gives the power of the test for values of $\alpha = 0.05$ and $\alpha = 0.01$, different
degrees of freedom and
$$\phi = \frac{|\mu - \mu_0|}{\sigma/\sqrt{n}} \cdot \frac{1}{\sqrt{2}}.$$
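The power can also be evaluated numerically from the non-central $t$-distribution instead of the Biometrika table. A minimal sketch assuming scipy; the numbers are those of Example 12.3.1 below ($n = 10$, $\mu_0 = 5$, $\mu = 8$, $\sigma^2 = 10$, $\alpha = 0.05$).

```python
import math
from scipy import stats

# Power of the two-sided one-sample t-test via the non-central t distribution
n, mu0, mu, sigma2, alpha = 10, 5.0, 8.0, 10.0, 0.05

df = n - 1
lam = (mu - mu0) / math.sqrt(sigma2 / n)    # non-centrality parameter lambda
t_crit = stats.t.ppf(1 - alpha / 2, df)     # approximately 2.262

# power = P(|T'| > t_crit) where T' ~ non-central t(df, lam)
power = stats.nct.sf(t_crit, df, lam) + stats.nct.cdf(-t_crit, df, lam)
print(round(power, 2))   # roughly 0.76, consistent with the Biometrika table value quoted below
```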
EXAMPLE 12.3.1 :
Suppose that $X_1, X_2, \ldots, X_{10}$ is a random sample from a normal population with expected
value $\mu$ and variance $\sigma^2$. Let the hypothesis be $H_0: \mu = 5$ vs. $H_A: \mu \neq 5$ and
suppose that this hypothesis is to be tested with significance level $\alpha = 0.05$.
In this case $\dfrac{\bar{X} - 5}{s/\sqrt{n}} \sim t(9)$ if $H_0$ is correct. If we reject $H_0$ when
$\left|\dfrac{\bar{x} - 5}{s/\sqrt{n}}\right| > t_{1-0.05/2}(9) = 2.262$, the significance level of the test is 0.05.
If $\mu = 8$ and $\sigma^2 = 10$ then the power of the test can be determined from Table 10 in
Biometrika Tables Vol. I for $\alpha = 0.05$, degrees of freedom 9 and
$$\phi = \frac{8 - 5}{\sqrt{10}/\sqrt{10}} \cdot \frac{1}{\sqrt{2}} = 2.12.$$
From Table 10 we get the power as 0.76.
Now suppose that we want to determine the number of observations required to ensure
that the power is at least 0.95 if $\mu = 8$ and $\sigma^2 = 10$. The calculations have to be done on
a trial and error basis, an example of which is shown in the table below; a sketch of how to
automate this search is given after the table.
DETERMINATION OF n TO ENSURE POWER OF 0.95 FOR $\mu = 8$ AND $\sigma^2 = 10$

n     φ       π(8, 10)    Decision
15    2.60    0.93        ⟹ n > 15
18    2.84    0.96        ⟹ n ≤ 18
17    2.77    0.95        ⟹ n ≤ 17
16    2.68    < 0.95      ⟹ n = 17

Table 8 p 612 in the book of Bain and Engelhardt gives an indication of the values of $n$
that are required. For the example above we have $2\alpha = 0.05$, $d = 3/\sqrt{10} \approx 0.95$, and
from the table we find that $n = 18$ observations are required.
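The trial-and-error search in the table above can be automated. A minimal sketch assuming scipy; it recomputes the power from the non-central $t$-distribution rather than reading it from the Biometrika table, so the boundary value of $n$ may differ slightly from the tabulated values.

```python
import math
from scipy import stats

def power_one_sample_t(n, mu, mu0, sigma2, alpha=0.05):
    """Power of the two-sided one-sample t-test, via the non-central t distribution."""
    df = n - 1
    lam = (mu - mu0) / math.sqrt(sigma2 / n)
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, lam) + stats.nct.cdf(-t_crit, df, lam)

# Smallest n with power at least 0.95 for mu = 8, mu0 = 5, sigma^2 = 10
n = 2
while power_one_sample_t(n, mu=8.0, mu0=5.0, sigma2=10.0) < 0.95:
    n += 1
print(n)   # should give 17, consistent with the trial-and-error table above
```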
(c) Test for $H_0: \sigma^2 = \sigma_0^2$ vs. $H_A: \sigma^2 \neq \sigma_0^2$
If $H_0$ is correct we have that $\dfrac{(n-1)s^2}{\sigma_0^2} \sim \chi^2(n-1)$, and if we reject $H_0$ whenever
$\dfrac{(n-1)s^2}{\sigma_0^2} < \chi^2_{\alpha/2}(n-1)$ or whenever $\dfrac{(n-1)s^2}{\sigma_0^2} > \chi^2_{1-\alpha/2}(n-1)$, the significance level of the
test will be $\alpha$.
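A minimal numerical sketch of this two-sided chi-square test for a variance, assuming scipy and numpy; the data and the hypothesised variance are hypothetical.

```python
import numpy as np
from scipy import stats

sigma2_0, alpha = 4.0, 0.05                               # hypothesised variance and level
x = np.array([2.1, 4.7, 3.3, 5.0, 1.8, 3.9, 4.4, 2.6])    # hypothetical data

n = len(x)
chi2_obs = (n - 1) * x.var(ddof=1) / sigma2_0
lower = stats.chi2.ppf(alpha / 2, n - 1)
upper = stats.chi2.ppf(1 - alpha / 2, n - 1)
reject = chi2_obs < lower or chi2_obs > upper             # two-sided test for sigma^2
print(chi2_obs, (lower, upper), reject)
```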
SAMPLES FROM NORMAL POPULATIONS : TWO INDEPENDENT SAMPLES
Suppose that $X_{11}, X_{12}, \ldots, X_{1n_1}$ is a random sample of size $n_1$ from a normal population
with expected value $\mu_1$ and variance $\sigma_1^2$. Let $\bar{X}_1$ be the sample mean of the $X_{1i}$'s, i.e.
$\bar{X}_1 = \dfrac{\sum_{i=1}^{n_1} X_{1i}}{n_1}$, and let $s_1^2$ be the sample variance of the $X_{1i}$'s, i.e.
$s_1^2 = \dfrac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{n_1 - 1}$.
Suppose that $X_{21}, X_{22}, \ldots, X_{2n_2}$ is an independent random sample of size $n_2$ from a
normal population with expected value $\mu_2$ and variance $\sigma_2^2$. Let $\bar{X}_2$ be the sample mean
of the $X_{2i}$'s, i.e. $\bar{X}_2 = \dfrac{\sum_{i=1}^{n_2} X_{2i}}{n_2}$, and let $s_2^2$ be the sample variance of the $X_{2i}$'s, i.e.
$s_2^2 = \dfrac{\sum_{i=1}^{n_2} (X_{2i} - \bar{X}_2)^2}{n_2 - 1}$.
(a) Test for $H_0: \mu_1 - \mu_2 = d$ vs. $H_A: \mu_1 - \mu_2 \neq d$ : $\sigma_1^2 = \sigma_2^2 = \sigma^2$
In this case $\bar{X}_1 \sim N(\mu_1, \sigma^2/n_1)$ and is independent of $\dfrac{(n_1-1)s_1^2}{\sigma^2} \sim \chi^2(n_1-1)$.
They are independent of $\bar{X}_2 \sim N(\mu_2, \sigma^2/n_2)$ and $\dfrac{(n_2-1)s_2^2}{\sigma^2} \sim \chi^2(n_2-1)$, and
the last two are also independent random variables.
Hence $\bar{X}_1 - \bar{X}_2 \sim N\!\left(\mu_1 - \mu_2,\ \dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{n_2}\right)$ and is independent of
$$\frac{(n_1-1)s_1^2}{\sigma^2} + \frac{(n_2-1)s_2^2}{\sigma^2} \sim \chi^2(n_1 + n_2 - 2).$$
Therefore
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{n_2}}} \sim N(0, 1)$$
and is independent of
$$V = \frac{(n_1-1)s_1^2}{\sigma^2} + \frac{(n_2-1)s_2^2}{\sigma^2} \sim \chi^2(n_1 + n_2 - 2).$$
Hence
$$T = \frac{Z}{\sqrt{\dfrac{V}{n_1 + n_2 - 2}}} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}}$$
is distributed like $t$ with $n_1 + n_2 - 2$ degrees of freedom.
If $H_0$ is true
$$\frac{(\bar{X}_1 - \bar{X}_2) - d}{\sqrt{\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}} \sim t(n_1 + n_2 - 2).$$
A test of significance level $\alpha$ is to reject $H_0$ if
$$\left|\frac{(\bar{x}_1 - \bar{x}_2) - d}{\sqrt{\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}}\right| > t_{1-\alpha/2}(n_1 + n_2 - 2).$$
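A minimal numerical sketch of this pooled two-sample t-test, assuming scipy and numpy; the two samples below are hypothetical, and the scipy shortcut at the end applies only to the case $d = 0$.

```python
import numpy as np
from scipy import stats

x1 = np.array([12.1, 13.4, 11.8, 12.9, 13.7, 12.3])    # hypothetical sample 1
x2 = np.array([10.9, 11.6, 12.2, 10.4, 11.9])          # hypothetical sample 2
d, alpha = 0.0, 0.05

n1, n2 = len(x1), len(x2)
# pooled variance estimate ((n1-1)s1^2 + (n2-1)s2^2)/(n1+n2-2)
sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
t_obs = (x1.mean() - x2.mean() - d) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
t_crit = stats.t.ppf(1 - alpha / 2, n1 + n2 - 2)
print(abs(t_obs) > t_crit)

# For d = 0 the same test is available directly:
res = stats.ttest_ind(x1, x2, equal_var=True)
print(res.statistic, res.pvalue)
```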
(b) Test for $H_0: \mu_1 - \mu_2 = d$ vs. $H_A: \mu_1 - \mu_2 \neq d$ : $\sigma_1^2 \neq \sigma_2^2$ and $n_1$ and $n_2$ large
If $H_0$ is correct and $n_1 \to \infty$ and $n_2 \to \infty$ we have that
$$\frac{(\bar{X}_1 - \bar{X}_2) - d}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \xrightarrow{d} Z \quad \text{where } Z \sim N(0, 1).$$
For large values of $n_1$ and $n_2$ a test of $H_0$ with approximate significance level $\alpha$ is
therefore to reject $H_0$ if
$$\left|\frac{(\bar{x}_1 - \bar{x}_2) - d}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}\right| > z_{1-\alpha/2}.$$
(c) Test for $H_0: \mu_1 - \mu_2 = d$ vs. $H_A: \mu_1 - \mu_2 \neq d$ : $\sigma_1^2 \neq \sigma_2^2$ and $n_1$ or $n_2$ small
If $H_0$ is correct then
$$\frac{(\bar{X}_1 - \bar{X}_2) - d}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
is approximately distributed like $t$ with $\nu$ degrees of freedom, where
$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}.$$
A test with approximate significance level $\alpha$ is to reject $H_0$ if
$$\left|\frac{(\bar{x}_1 - \bar{x}_2) - d}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}\right| > t_{1-\alpha/2}(\nu).$$
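A minimal numerical sketch of this approximate test with the Satterthwaite-type degrees of freedom, assuming scipy and numpy; the samples are hypothetical, and the scipy shortcut applies only to $d = 0$.

```python
import numpy as np
from scipy import stats

x1 = np.array([5.2, 6.1, 4.8, 5.9, 6.4])          # hypothetical sample 1
x2 = np.array([7.0, 8.3, 6.6, 7.8, 8.1, 7.4])     # hypothetical sample 2
d, alpha = 0.0, 0.05

n1, n2 = len(x1), len(x2)
v1, v2 = x1.var(ddof=1) / n1, x2.var(ddof=1) / n2
t_obs = (x1.mean() - x2.mean() - d) / np.sqrt(v1 + v2)
# approximate degrees of freedom nu as in the formula above
nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
t_crit = stats.t.ppf(1 - alpha / 2, nu)
print(abs(t_obs) > t_crit)

# For d = 0 scipy's Welch test gives the same result:
res = stats.ttest_ind(x1, x2, equal_var=False)
print(res.statistic, res.pvalue)
```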
(d) Test for $H_0: \dfrac{\sigma_1^2}{\sigma_2^2} = r$ vs. $H_A: \dfrac{\sigma_1^2}{\sigma_2^2} \neq r$
In this case $\dfrac{(n_1-1)s_1^2}{\sigma_1^2} \sim \chi^2(n_1-1)$ and $\dfrac{(n_2-1)s_2^2}{\sigma_2^2} \sim \chi^2(n_2-1)$, and they
are independent since the samples are independent. Therefore
$$\frac{\dfrac{(n_1-1)s_1^2}{\sigma_1^2}\Big/(n_1-1)}{\dfrac{(n_2-1)s_2^2}{\sigma_2^2}\Big/(n_2-1)} = \frac{s_1^2}{s_2^2} \div \frac{\sigma_1^2}{\sigma_2^2} \sim F(n_1-1,\ n_2-1).$$
If $H_0$ is true then $\dfrac{s_1^2}{s_2^2} \div r \sim F(n_1-1, n_2-1)$. A two-sided test of significance level
$\alpha$ is therefore to reject $H_0$ if
$$\frac{s_1^2}{s_2^2} \div r < f_{\alpha/2}(n_1-1, n_2-1) \quad \text{or if} \quad \frac{s_1^2}{s_2^2} \div r > f_{1-\alpha/2}(n_1-1, n_2-1).$$
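A minimal numerical sketch of this two-sided F test for the variance ratio, assuming scipy and numpy; the data and the value of $r$ are hypothetical.

```python
import numpy as np
from scipy import stats

x1 = np.array([9.8, 11.2, 10.5, 12.1, 9.4, 10.9])    # hypothetical sample 1
x2 = np.array([10.1, 10.4, 9.9, 10.6, 10.2])          # hypothetical sample 2
r, alpha = 1.0, 0.05                                   # H0: sigma1^2 / sigma2^2 = r

n1, n2 = len(x1), len(x2)
f_obs = (x1.var(ddof=1) / x2.var(ddof=1)) / r
lower = stats.f.ppf(alpha / 2, n1 - 1, n2 - 1)
upper = stats.f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)
reject = f_obs < lower or f_obs > upper                # two-sided F test for the variance ratio
print(f_obs, (lower, upper), reject)
```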
SAMPLES FROM NORMAL POPULATIONS: PAIRED OBSERVATIONS
Suppose that a random sample of $n$ pairs of observations $(X_{1i}, X_{2i})$ is drawn from a
population with a bivariate normal distribution with expected values $\mu_1$ and $\mu_2$,
variances $\sigma_1^2$ and $\sigma_2^2$ and covariance $\sigma_{12}$.
Then $Y_i = X_{1i} - X_{2i}$ for $i = 1, 2, \ldots, n$
are independent normal random variables with expected value $\mu_1 - \mu_2$ and variance
$\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}$.
Let $\bar{Y}$ be the sample mean and $s^2$ the sample variance of the $Y_i$'s, and let $\bar{y}$ and $s^2$ be
the observed values of $\bar{Y}$ and $s^2$.
Then
$$\frac{\bar{Y} - (\mu_1 - \mu_2)}{s/\sqrt{n}} \sim t(n-1).$$
Let the hypothesis be $H_0: \mu_1 - \mu_2 = d$ vs. $H_A: \mu_1 - \mu_2 \neq d$.
If $H_0$ is true $\dfrac{\bar{Y} - d}{s/\sqrt{n}} \sim t(n-1)$.
A test of significance level $\alpha$ is to reject $H_0$ if
$$\left|\frac{\bar{y} - d}{s/\sqrt{n}}\right| > t_{1-\alpha/2}(n-1).$$
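A minimal numerical sketch of this paired t-test, assuming scipy and numpy; the paired measurements below are hypothetical, and the scipy shortcut applies only to $d = 0$.

```python
import numpy as np
from scipy import stats

x1 = np.array([78.0, 82.5, 74.1, 80.3, 77.8, 85.0])   # hypothetical first measurement per pair
x2 = np.array([75.2, 80.1, 73.0, 78.9, 74.5, 83.2])   # hypothetical second measurement per pair
d, alpha = 0.0, 0.05

y = x1 - x2                                            # differences Y_i = X_1i - X_2i
n = len(y)
t_obs = (y.mean() - d) / (y.std(ddof=1) / np.sqrt(n))
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)
print(abs(t_obs) > t_crit)

# For d = 0 the paired test is available directly:
res = stats.ttest_rel(x1, x2)
print(res.statistic, res.pvalue)
```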
TEST FOR $H_0: p = p_0$ vs. $H_A: p \neq p_0$ FOR THE PROBABILITY OF SUCCESS:
ONE SAMPLE
Let X be the total number of successes in n independent trials of a Bernoulli experiment
with probability of success p.
(a) n Large
Let $\hat{p} = \dfrac{X}{n}$, the proportion of successes. If $H_0$ is true it then follows that
$$\frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} \xrightarrow{d} Z \quad \text{where } Z \sim N(0, 1).$$
A test with approximate significance level $\alpha$ is to reject $H_0$ when
$$\left|\frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}\right| > z_{1-\alpha/2}.$$
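A minimal numerical sketch of this large-sample test for a proportion, assuming scipy; the number of trials and successes are hypothetical.

```python
import math
from scipy import stats

n, x = 200, 124            # hypothetical number of trials and of successes
p0, alpha = 0.5, 0.05

p_hat = x / n
z_obs = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
z_crit = stats.norm.ppf(1 - alpha / 2)
p_value = 2 * stats.norm.sf(abs(z_obs))                # approximate two-sided P-value
print(abs(z_obs) > z_crit, p_value)
```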
(b) n Small
A test of approximate significance level $\alpha$ is to reject $H_0$ when $p_0$ is not an element of
the $100(1-\alpha)\%$ confidence interval for $p$ as determined with the help of Table 41 in
Biometrika Tables.
TEST FOR $H_0: p_1 - p_2 = d$ vs. $H_A: p_1 - p_2 \neq d$ WHERE $p_1$ AND $p_2$ ARE
PROBABILITIES OF SUCCESS IN TWO SAMPLES
Suppose that $X_1 \sim \mathrm{BIN}(n_1, p_1)$ independently of $X_2 \sim \mathrm{BIN}(n_2, p_2)$.
Let $\hat{p}_1 = \dfrac{X_1}{n_1}$ and $\hat{p}_2 = \dfrac{X_2}{n_2}$. If $n_1 \to \infty$ and $n_2 \to \infty$ we get that if $H_0$ is correct
$$\frac{(\hat{p}_1 - \hat{p}_2) - d}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}} \xrightarrow{d} Z \quad \text{where } Z \sim N(0, 1).$$
For large values of $n_1$ and $n_2$ we then have that a test which has approximate
significance level $\alpha$ is to reject $H_0$ if
$$\left|\frac{(\hat{p}_1 - \hat{p}_2) - d}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}\right| > z_{1-\alpha/2}.$$
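A minimal numerical sketch of this large-sample two-proportion test, assuming scipy; all counts below are hypothetical.

```python
import math
from scipy import stats

n1, x1 = 150, 96           # hypothetical trials and successes, sample 1
n2, x2 = 180, 99           # hypothetical trials and successes, sample 2
d, alpha = 0.0, 0.05

p1_hat, p2_hat = x1 / n1, x2 / n2
se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
z_obs = (p1_hat - p2_hat - d) / se
z_crit = stats.norm.ppf(1 - alpha / 2)
print(abs(z_obs) > z_crit, 2 * stats.norm.sf(abs(z_obs)))   # decision and approximate P-value
```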
For $n_1$ and/or $n_2$ small, the hypothesis $H_0: p_1 - p_2 = 0$ vs. $H_A: p_1 - p_2 \neq 0$ can be
tested using a so-called conditional test as given in Bain and Engelhardt p 427.
TEST FOR $H_0: \mu = \mu_0$ vs. $H_A: \mu \neq \mu_0$ FOR THE PARAMETER OF A
POISSON DISTRIBUTION
Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from a population with a Poisson
distribution with parameter $\mu$.
Let $\bar{X}$ be the sample mean.
(a) n Large
As $n \to \infty$ we have that if $H_0$ is true
$$\frac{\bar{X} - \mu_0}{\sqrt{\dfrac{\mu_0}{n}}} \xrightarrow{d} Z \quad \text{where } Z \sim N(0, 1).$$
For large $n$ a test for which the significance level is approximately $\alpha$ is to reject $H_0$ if
$$\left|\frac{\bar{x} - \mu_0}{\sqrt{\dfrac{\mu_0}{n}}}\right| > z_{1-\alpha/2}.$$
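A minimal numerical sketch of this large-sample test for a Poisson mean, assuming scipy; the sample size and observed mean below are hypothetical.

```python
import math
from scipy import stats

n, xbar = 60, 3.4          # hypothetical sample size and observed sample mean
mu0, alpha = 3.0, 0.05

z_obs = (xbar - mu0) / math.sqrt(mu0 / n)
z_crit = stats.norm.ppf(1 - alpha / 2)
print(abs(z_obs) > z_crit, 2 * stats.norm.sf(abs(z_obs)))   # decision and approximate P-value
```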
(b) n Small
If $n$ is small, a test with significance level approximately $\alpha$ is to reject $H_0$ if $\mu_0$ is not
in the confidence interval given by Table 40 of Biometrika Tables for Statisticians Vol. I
by Pearson and Hartley.