FISHER’S INFORMATION
The major use of Fisher's information is that it provides the asymptotic variance of the
maximum likelihood estimate. If $\hat{\theta}_{ML}$ is the maximum likelihood estimate in a problem
in which $\theta$ is the only unknown parameter, then the limiting (large-sample) distribution
of $\hat{\theta}_{ML}$ is normal, $N\!\left(\theta, \frac{1}{I(\theta)}\right)$, where $I(\theta)$ is Fisher's information for $\theta$.

There is one minor exception to this result. It is required that $\theta$ be an interior
point of a parameter space of finite dimension. The exceptions come about in
situations in which the parameter limits the range of the random variables. For
example, a sample from the uniform law on $[0, \theta]$ is one in which we cannot
make use of Fisher's information for $\hat{\theta}_{ML}$.
Because $\hat{\theta}_{ML}$ is a consistent estimate of $\theta$, it follows that $I(\hat{\theta}_{ML})$ is a consistent estimate of
$I(\theta)$. Thus, we actually use the limiting distribution $\hat{\theta}_{ML} \sim N\!\left(\theta, \frac{1}{I(\hat{\theta}_{ML})}\right)$. This is
generally used to make a $1 - \alpha$ confidence interval for $\theta$ as

$$\hat{\theta}_{ML} \;\pm\; z_{\alpha/2} \sqrt{\frac{1}{I(\hat{\theta}_{ML})}}$$
There is a natural correspondence between confidence intervals and hypothesis tests.
This interval can be used to make a test of $H_0\!: \theta = \theta_0$ versus $H_1\!: \theta \ne \theta_0$; the null
hypothesis $H_0$ is to be accepted if and only if $\theta_0$ is inside the interval. Specifically, $H_0$
is rejected if and only if

$$\left| \hat{\theta}_{ML} - \theta_0 \right| \;\ge\; z_{\alpha/2} \sqrt{\frac{1}{I(\hat{\theta}_{ML})}}$$
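To make the recipe concrete, here is a minimal sketch in Python of the Wald interval and the matching test. The function names and the use of scipy's normal quantile are our choices, not part of the original notes.

```python
import math

from scipy.stats import norm


def wald_interval(theta_hat, info_hat, alpha=0.05):
    """1 - alpha interval: theta_hat +/- z_{alpha/2} * sqrt(1 / I(theta_hat))."""
    z = norm.ppf(1 - alpha / 2)            # z_{alpha/2}; about 1.96 for alpha = 0.05
    half_width = z * math.sqrt(1.0 / info_hat)
    return theta_hat - half_width, theta_hat + half_width


def rejects_h0(theta_hat, info_hat, theta0, alpha=0.05):
    """Reject H0: theta = theta0 iff theta0 lies outside the interval."""
    lo, hi = wald_interval(theta_hat, info_hat, alpha)
    return not (lo <= theta0 <= hi)
```

For a 95% interval, $z_{0.025} \approx 1.96$, which is where the "plus or minus two standard errors" rule of thumb used at the end of these notes comes from.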
It should be noted that I() is in reciprocal square units to that of . For example, if  is
1
in units of hours, then I() is in units of
.
hour 2
How do we find I()? There are a number of ways. Let f(x) be the likelihood for the
whole problem. Note here that x is used as a vector to represent the entire set of data.
Let S be the score random variable, defining this as
S =
III

log f  x  

Page 1
 gs2011
FISHER’S INFORMATION
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

log f  X   . Of course, this f(x | ) is a function of

the data x and also the parameter. We’ve used the partial derivative symbol  but we
could as well have used d. This is not a material confusion.
In random variable form, this is
Though this S is a random variable, but it is not a statistic; its form involves , which is
unknown.
It can be shown that $\mathrm{E}\,S = 0$. Start from

$$\int_X f(x \mid \theta)\, dx \;=\; 1$$

The $X$ is just to remind us that the integral is over the $x$. In this expression, find $\frac{\partial}{\partial \theta}$:

$$\frac{\partial}{\partial \theta} \int_X f(x \mid \theta)\, dx \;=\; \int_X \frac{\partial}{\partial \theta} f(x \mid \theta)\, dx \;=\; \frac{\partial}{\partial \theta}\, 1 \;=\; 0$$

In the second item from the left,

$$\int_X \frac{\partial}{\partial \theta} f(x \mid \theta)\, dx \;=\; \int_X \frac{\frac{\partial}{\partial \theta} f(x \mid \theta)}{f(x \mid \theta)}\, f(x \mid \theta)\, dx \;=\; \int_X \left[ \frac{\partial}{\partial \theta} \log f(x \mid \theta) \right] f(x \mid \theta)\, dx \;=\; \mathrm{E}\,S$$

Thus, the score random variable has expected value zero.
There are three ways to get I():
(1)
I() = E S2
(2)
I() = Var S
(3)
 2

I() = E   2 log f  X  
 

Generally one way will be somewhat easier than the others.
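As an illustration (our addition, anticipating the normal example that follows), a symbolic computation can confirm that routes (1) and (3) agree; route (2) then agrees as well, because $\mathrm{E}\,S = 0$. This sketch assumes sympy and works with a single $N(\mu, \sigma^2)$ observation:

```python
import sympy as sp

x = sp.Symbol('x', real=True)
mu = sp.Symbol('mu', real=True)
sigma = sp.Symbol('sigma', positive=True)

# Density of one N(mu, sigma^2) observation and its score with respect to mu.
f = sp.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sp.sqrt(2 * sp.pi))
S = sp.diff(sp.log(f), mu)

I_route1 = sp.integrate(S**2 * f, (x, -sp.oo, sp.oo))             # E[S^2]
I_route3 = sp.integrate(-sp.diff(S, mu) * f, (x, -sp.oo, sp.oo))  # -E[d2/dmu2 log f]

print(sp.simplify(I_route1))   # 1/sigma**2
print(sp.simplify(I_route3))   # 1/sigma**2
```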
Here's a neat example. Suppose that $X_1, X_2, \ldots, X_n$ are independent random variables,
each $N(\mu, \sigma^2)$. Let's suppose for this example that $\sigma$ is a known value. It's pretty easy to
show that $\hat{\mu}_{ML} = \bar{X}$. The method of moments estimate is also $\hat{\mu}_{MM} = \bar{X}$. Now let's find
$I(\mu)$. First, get the likelihood for the whole sample:
$$L \;=\; \prod_{i=1}^{n} \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x_i - \mu)^2} \;=\; \frac{1}{\sigma^n (2\pi)^{n/2}}\, e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2}$$
Now we need the score random variable:

$$\log L \;=\; -n \log \sigma \;-\; \frac{n}{2} \log(2\pi) \;-\; \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$$

$$S \;=\; \frac{\partial}{\partial \mu} \log L \;=\; -\frac{1}{2\sigma^2} \sum_{i=1}^{n} 2 (x_i - \mu)(-1) \;=\; \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)$$

In random variable form, this is $S = \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \mu)$. This is not a statistic, as it involves
the unknown parameter $\mu$. It's easy to see that $\mathrm{E}\,S = 0$ here.
There are several ways to get I(), all pretty easy.
(1)
(2)
(3)
III
 1 n
  1 n

I() = E S2 = E  2   X i       2   X j     
   j 1

  i 1
n
n
1
n
1
= 4  E  X i     X j    = 4 n 2 = 2 .


 i 1 j 1
1
n
I() = Var S = 4 n 2 = 2 . This is probably the easiest way.


 2

   1 n

I() = E   2 log L  = E  X i    
 2 i

    1
 

   1 n
n
n  
= E X i  2  = 2 .

2 

 
    i 1
Thus, the asymptotic variance of the maximum likelihood estimate is $\frac{1}{I(\mu)} = \frac{\sigma^2}{n}$. This
is also the non-asymptotic variance. It's also something that we suspected right from the
beginning.
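A quick Monte Carlo check (our addition) makes this concrete: the observed variance of $\bar{X}$ should match $\sigma^2/n$. The parameter values and seed here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 3.0, 2.0, 50, 200_000

# Each row is one sample of size n; mu_hat_ML = X_bar for each sample.
xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
print(xbars.var())    # close to ...
print(sigma**2 / n)   # ... 0.08
```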
Here is another example.

Suppose that $x_1, x_2, \ldots, x_n$ are known values, all positive. Suppose that $Y_1, Y_2, \ldots, Y_n$ are
independent, with $Y_i \sim N(\beta x_i,\, x_i^2 \sigma^2)$. We can certainly get a method of moments estimate
for $\beta$. Observe that $\mathrm{E}\,Y_i = \beta x_i$, so that

$$\mathrm{E} \sum_{i=1}^{n} Y_i \;=\; \beta \sum_{i=1}^{n} x_i$$

The method of moments estimate is $\hat{\beta}_{MM} = \dfrac{\sum_{i=1}^{n} Y_i}{\sum_{i=1}^{n} x_i} = \dfrac{\bar{Y}}{\bar{x}}$. As an interesting
observation, $\mathrm{Var}\,Y_i = x_i^2 \sigma^2$, so

$$\mathrm{Var}\,\hat{\beta}_{MM} \;=\; \frac{1}{n^2 \bar{x}^2}\, \mathrm{Var} \sum_{i=1}^{n} Y_i \;=\; \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n^2 \bar{x}^2}$$
In what follows we have to worry about two parameters. For the sake of this example,
let's think of $\sigma$ as known. (It actually won't matter here.)

The likelihood for $Y_i$ is

$$f(y_i \mid x_i) \;=\; \frac{1}{x_i \sigma \sqrt{2\pi}}\, e^{-\frac{1}{2 x_i^2 \sigma^2} (y_i - \beta x_i)^2}$$
Based on this, we can write the likelihood for the whole problem:

$$L \;=\; \prod_{i=1}^{n} \frac{1}{x_i \sigma \sqrt{2\pi}}\, e^{-\frac{1}{2 x_i^2 \sigma^2} (y_i - \beta x_i)^2} \;=\; \frac{1}{\sigma^n (2\pi)^{n/2} \prod_{i=1}^{n} x_i}\, e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} \frac{(y_i - \beta x_i)^2}{x_i^2}}$$
We'll need to take $\log L$:

$$\log L \;=\; -n \log \sigma \;-\; \frac{n}{2} \log(2\pi) \;-\; \sum_{i=1}^{n} \log x_i \;-\; \frac{1}{2\sigma^2} \sum_{i=1}^{n} \frac{(y_i - \beta x_i)^2}{x_i^2}$$
We could get the maximum likelihood estimates for both $\beta$ and $\sigma$. For now, we'll just
worry about $\beta$, as noted above. Clearly we get that estimate by minimizing the sum from
the exponent. Thus we solve

$$\frac{\partial}{\partial \beta} \sum_{i=1}^{n} \frac{(y_i - \beta x_i)^2}{x_i^2} \;=\; \sum_{i=1}^{n} \frac{2 (y_i - \beta x_i)(-x_i)}{x_i^2} \;=\; -2 \sum_{i=1}^{n} \frac{x_i y_i - \beta x_i^2}{x_i^2} \;=\; -2 \sum_{i=1}^{n} \left( \frac{y_i}{x_i} - \beta \right) \;\overset{\text{let}}{=}\; 0$$
1 n y
1 n Y
The solution is  ML   i . In random variable form, this is  ML   i . This is
n i 1 xi
n i 1 xi
a very unusual ratio estimate. This has a parallel concept in finite population sampling.
Suppose we wanted to know its asymptotic variance. We need the score random
variable. (Frequently this score random variable is found as part of the routine of getting
the maximum likelihood estimate, but not here.)
$$S \;=\; \frac{\partial}{\partial \beta} \log L \;=\; -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \frac{2 (y_i - \beta x_i)(-x_i)}{x_i^2} \;=\; \frac{1}{\sigma^2} \sum_{i=1}^{n} \left( \frac{y_i}{x_i} - \beta \right)$$

In random variable form, this is

$$S \;=\; \frac{1}{\sigma^2} \sum_{i=1}^{n} \left( \frac{Y_i}{x_i} - \beta \right)$$
The easiest way to get I() is as Var S:
 Yi

 1 n Yi 
1 n Var Yi 
    = Var  2   = 4 


 i 1 xi2
i 1  xi

  i 1 xi 
 1
I() = Var S = Var  2

1 n x 2 2
n
= 4 i2 = 2

 i 1 xi
n
2
It follows that the limiting variance of  ML is
.
n
non-asymptotic result as well.
You can actually show that this is a
This is a better (smaller) variance than that for $\hat{\beta}_{MM}$, which was $\frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n^2 \bar{x}^2}$. It's
an interesting exercise to show that

$$\frac{\sigma^2}{n} \;\le\; \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n^2 \bar{x}^2}$$

with equality only when all the $x_i$ are equal.
For cases in which we have a sample, meaning $n$ independent values sampled from the
same distribution, we have $I(\theta) = n\, I_1(\theta)$, where $I_1(\theta)$ is the information in one
observation. We can get this from the score random variable based on one observation,
generally identified as $S_1$.
Another example. Suppose that $X_1, X_2, \ldots, X_n$ is a sample from the exponential density

$$f(x) \;=\; \lambda e^{-\lambda x}\, I(x \ge 0)$$

Let's find the maximum likelihood estimate. Begin with the likelihood

$$L \;=\; \prod_{i=1}^{n} \lambda e^{-\lambda x_i} \;=\; \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i}$$
It follows that

$$\log L \;=\; n \log \lambda \;-\; \lambda \sum_{i=1}^{n} x_i$$

$$\frac{\partial}{\partial \lambda} \log L \;=\; \frac{n}{\lambda} \;-\; \sum_{i=1}^{n} x_i \;\overset{\text{let}}{=}\; 0$$
Therefore $\hat{\lambda}_{ML} = \dfrac{n}{\sum_{i=1}^{n} x_i} = \dfrac{1}{\bar{x}}$. In random variable form, this is $\hat{\lambda}_{ML} = \dfrac{1}{\bar{X}}$. It's going
to be very difficult to get its distribution exactly, so let's use the asymptotic results, based
on the fact that this is a maximum likelihood estimate.
Because this situation deals with a sample, we will find $I(\lambda)$ through $I(\lambda) = n\, I_1(\lambda)$.
For observation 1, we have $\log L_1 = \log \lambda - \lambda x_1$. Then

$$S_1 \;=\; \frac{\partial}{\partial \lambda} \log L_1 \;=\; \frac{1}{\lambda} - x_1$$

In random variable form,

$$S_1 \;=\; \frac{1}{\lambda} - X_1$$

Certainly $\mathrm{E}\,S_1 = 0$, since $\mathrm{E}\,X_1 = \frac{1}{\lambda}$.
There are several ways to get I1(). Here’s the easiest:
 2

  
  1


I1() = E   2 log L1  = E  
 log L1  = E  
  X 1 


   
   
 

1
 1 
= E 2  = 2

 
It follows then that I() = n I1() =
n
.
2
We can certainly make an approximate 95% confidence interval based on
$\hat{\lambda}_{ML} \pm 2\, SE(\hat{\lambda}_{ML})$. Specifically, this is

$$\frac{1}{\bar{X}} \;\pm\; 2\, \frac{1}{\bar{X} \sqrt{n}}$$
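As a worked numeric instance (our addition), here is that interval computed from simulated exponential data; the true $\lambda$, sample size, and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, n = 1.5, 400
x = rng.exponential(scale=1 / lam, size=n)   # numpy parameterizes by 1/lambda

lam_hat = 1 / x.mean()                       # lambda_hat_ML = 1 / x_bar
half = 2 / (x.mean() * np.sqrt(n))           # 2 * SE = 2 / (x_bar * sqrt(n))
print(lam_hat - half, lam_hat + half)        # should cover 1.5 about 95% of the time
```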