FISHER’S INFORMATION

The major use of Fisher’s information is that it provides the asymptotic variance of the maximum likelihood estimate. If $\hat{\theta}_{ML}$ is the maximum likelihood estimate in a problem in which $\theta$ is the only unknown parameter, then the limiting (large-sample) distribution of $\hat{\theta}_{ML}$ is normal,
$$ \hat{\theta}_{ML} \sim N\!\left(\theta,\ \frac{1}{I(\theta)}\right), $$
where $I(\theta)$ is Fisher’s information for $\theta$.

There is one minor exception to this result. It is required that $\theta$ be an interior point of a parameter space of finite dimension. The exceptions come about in situations in which the parameter limits the range of the random variables. For example, a sample from the uniform law on $[0, \theta]$ is one in which we cannot make use of Fisher’s information for $\hat{\theta}_{ML}$.

Because $\hat{\theta}_{ML}$ is a consistent estimate of $\theta$, it follows that $I(\hat{\theta}_{ML})$ is a consistent estimate of $I(\theta)$. Thus, we actually use the limiting distribution
$$ \hat{\theta}_{ML} \sim N\!\left(\theta,\ \frac{1}{I(\hat{\theta}_{ML})}\right). $$
This is generally used to make a $1 - \alpha$ confidence interval for $\theta$ as
$$ \hat{\theta}_{ML} \pm z_{\alpha/2}\, \frac{1}{\sqrt{I(\hat{\theta}_{ML})}}. $$

There is a natural correspondence between confidence intervals and hypothesis tests. This interval can be used to make a test of $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$; the null hypothesis $H_0$ is to be accepted if and only if $\theta_0$ is inside the interval. Specifically, $H_0$ is rejected if and only if
$$ \left| \hat{\theta}_{ML} - \theta_0 \right| \;\geq\; z_{\alpha/2}\, \frac{1}{\sqrt{I(\hat{\theta}_{ML})}}. $$

It should be noted that $I(\theta)$ is in reciprocal square units to those of $\theta$. For example, if $\theta$ is in units of hours, then $I(\theta)$ is in units of $\frac{1}{\text{hour}^2}$.

How do we find $I(\theta)$? There are a number of ways. Let $f(x \mid \theta)$ be the likelihood for the whole problem. Note here that $x$ is used as a vector to represent the entire set of data. Let $S$ be the score random variable, defining this as
$$ S = \frac{\partial}{\partial \theta} \log f(X \mid \theta). $$
Of course, this $f(x \mid \theta)$ is a function of the data $x$ and also of the parameter. We’ve used the partial derivative symbol $\partial$, but we could as well have used $d$; the distinction makes no material difference here. Though $S$ is a random variable, it is not a statistic; its form involves $\theta$, which is unknown.

It can be shown that $\mathrm{E}\,S = 0$. Start from
$$ \int_X f(x \mid \theta)\, dx = 1. $$
The $X$ is just to remind us that the integral is over the $x$. In this expression, apply $\frac{\partial}{\partial \theta}$ to both sides:
$$ \frac{\partial}{\partial \theta} \int_X f(x \mid \theta)\, dx = 0. $$
In the second item from the left, pass the derivative inside the integral, then multiply and divide by $f(x \mid \theta)$:
$$ \int_X \frac{\partial}{\partial \theta} f(x \mid \theta)\, dx
= \int_X \frac{\frac{\partial}{\partial \theta} f(x \mid \theta)}{f(x \mid \theta)}\, f(x \mid \theta)\, dx
= \int_X \left[ \frac{\partial}{\partial \theta} \log f(x \mid \theta) \right] f(x \mid \theta)\, dx
= \mathrm{E}\,S. $$
Thus, the score random variable has expected value zero.

There are three ways to get $I(\theta)$:

(1) $I(\theta) = \mathrm{E}\,S^2$

(2) $I(\theta) = \mathrm{Var}\,S$

(3) $I(\theta) = -\mathrm{E}\left[ \dfrac{\partial^2}{\partial \theta^2} \log f(X \mid \theta) \right]$

Generally one way will be somewhat easier than the others.
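Since $\mathrm{E}\,S = 0$ and all three formulas for $I(\theta)$ must agree, a quick simulation can confirm them numerically. Here is a minimal sketch, assuming NumPy is available; it borrows the exponential density $f(x \mid \lambda) = \lambda e^{-\lambda x}$ that is worked analytically at the end of these notes, with $\lambda = 2$, $n = 50$, and the replication count chosen arbitrarily.

```python
import numpy as np

# Simulation check of the score identities for the exponential density
# f(x | lam) = lam * exp(-lam * x).  The whole-sample score is
# S = n/lam - sum(X_i), and d^2/dlam^2 log L = -n/lam^2 (non-random).
rng = np.random.default_rng(7)        # arbitrary seed
lam, n, reps = 2.0, 50, 200_000       # illustrative values

X = rng.exponential(scale=1.0 / lam, size=(reps, n))
S = n / lam - X.sum(axis=1)           # score for each simulated sample

print(S.mean())        # ~ 0        : E S = 0
print((S ** 2).mean()) # ~ n/lam^2  : I = E S^2
print(S.var())         # ~ n/lam^2  : I = Var S
print(n / lam ** 2)    # exact value, also -E d^2/dlam^2 log L
```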
Here’s a neat example. Suppose that $X_1, X_2, \ldots, X_n$ are independent random variables, each $N(\mu, \sigma^2)$. Let’s suppose for this example that $\sigma$ is a known value. It’s pretty easy to show that $\hat{\mu}_{ML} = \bar{X}$. The method of moments estimate is also $\hat{\mu}_{MM} = \bar{X}$. Now let’s find $I(\mu)$. First, get the likelihood for the whole sample:
$$ L = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x_i - \mu)^2}
= \frac{1}{\sigma^n (2\pi)^{n/2}}\, e^{-\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2}. $$
Now we need the score random variable:
$$ \log L = -n \log \sigma - \frac{n}{2} \log 2\pi - \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2 $$
$$ \frac{\partial}{\partial \mu} \log L = \frac{1}{\sigma^2} \sum_{i=1}^n (x_i - \mu). $$
In random variable form, this is
$$ S = \frac{1}{\sigma^2} \sum_{i=1}^n (X_i - \mu). $$
This is not a statistic, as it involves the unknown parameter $\mu$. It’s easy to see that $\mathrm{E}\,S = 0$ here.

There are several ways to get $I(\mu)$, all pretty easy:

(1) $I(\mu) = \mathrm{E}\,S^2 = \mathrm{E}\left[ \dfrac{1}{\sigma^2} \sum_{i=1}^n (X_i - \mu) \cdot \dfrac{1}{\sigma^2} \sum_{j=1}^n (X_j - \mu) \right] = \dfrac{1}{\sigma^4} \sum_{i=1}^n \sum_{j=1}^n \mathrm{E}\big[(X_i - \mu)(X_j - \mu)\big] = \dfrac{1}{\sigma^4}\, n\sigma^2 = \dfrac{n}{\sigma^2}$, since the cross terms vanish by independence.

(2) $I(\mu) = \mathrm{Var}\,S = \dfrac{1}{\sigma^4}\, n\sigma^2 = \dfrac{n}{\sigma^2}$. This is probably the easiest way.

(3) $I(\mu) = -\mathrm{E}\left[ \dfrac{\partial^2}{\partial \mu^2} \log L \right] = -\mathrm{E}\left[ \dfrac{\partial}{\partial \mu} \dfrac{1}{\sigma^2} \sum_{i=1}^n (X_i - \mu) \right] = -\mathrm{E}\left[ -\dfrac{n}{\sigma^2} \right] = \dfrac{n}{\sigma^2}$.

Thus, the asymptotic variance of the maximum likelihood estimate is $\frac{1}{I(\mu)} = \frac{\sigma^2}{n}$. This is also the non-asymptotic variance. It’s also something that we suspected right from the beginning.

Here is another example. Suppose that $x_1, x_2, \ldots, x_n$ are known values, all positive. Suppose that $Y_1, Y_2, \ldots, Y_n$ are independent, with $Y_i \sim N(\theta x_i,\ x_i^2 \sigma^2)$. We can certainly get a method of moments estimate for $\theta$. Observe that $\mathrm{E}\,Y_i = \theta x_i$, so that $\sum_{i=1}^n \mathrm{E}\,Y_i = \theta \sum_{i=1}^n x_i$. The method of moments estimate is
$$ \hat{\theta}_{MM} = \frac{\sum_{i=1}^n Y_i}{\sum_{i=1}^n x_i} = \frac{\bar{Y}}{\bar{x}}. $$
As an interesting observation, $\mathrm{Var}\,Y_i = x_i^2 \sigma^2$, so
$$ \mathrm{Var}\,\hat{\theta}_{MM} = \frac{1}{\left(\sum_{i=1}^n x_i\right)^2} \sum_{i=1}^n \mathrm{Var}\,Y_i = \frac{\sigma^2 \sum_{i=1}^n x_i^2}{n^2 \bar{x}^2}. $$

In what follows we have to worry about two parameters. For the sake of this example, let’s think of $\sigma$ as known. (It actually won’t matter here.) The likelihood for $Y_i$ is
$$ f(y_i \mid x_i) = \frac{1}{x_i \sigma \sqrt{2\pi}}\, e^{-\frac{1}{2 x_i^2 \sigma^2}(y_i - \theta x_i)^2}. $$
Based on this, we can write the likelihood for the whole problem:
$$ L = \prod_{i=1}^n \frac{1}{x_i \sigma \sqrt{2\pi}}\, e^{-\frac{1}{2 x_i^2 \sigma^2}(y_i - \theta x_i)^2}
= \frac{1}{\left(\prod_{i=1}^n x_i\right) \sigma^n (2\pi)^{n/2}}\, e^{-\frac{1}{2\sigma^2} \sum_{i=1}^n \frac{(y_i - \theta x_i)^2}{x_i^2}}. $$

We’ll need to take $\log L$:
$$ \log L = -\sum_{i=1}^n \log x_i - n \log \sigma - \frac{n}{2} \log 2\pi - \frac{1}{2\sigma^2} \sum_{i=1}^n \frac{(y_i - \theta x_i)^2}{x_i^2}. $$

We could get the maximum likelihood estimates for both $\theta$ and $\sigma$. For now, we’ll just worry about $\theta$, as noted above. Clearly we get that estimate by minimizing the sum from the exponent. Thus we solve
$$ \frac{\partial}{\partial \theta} \log L
= -\frac{1}{2\sigma^2} \sum_{i=1}^n \frac{2(y_i - \theta x_i)(-x_i)}{x_i^2}
= \frac{1}{\sigma^2} \sum_{i=1}^n \left( \frac{y_i}{x_i} - \theta \right)
\ \overset{\text{set}}{=}\ 0. $$
The solution is $\hat{\theta}_{ML} = \frac{1}{n} \sum_{i=1}^n \frac{y_i}{x_i}$. In random variable form, this is $\hat{\theta}_{ML} = \frac{1}{n} \sum_{i=1}^n \frac{Y_i}{x_i}$. This is a very unusual ratio estimate. It has a parallel concept in finite population sampling.

Suppose we wanted to know its asymptotic variance. We need the score random variable. (Frequently this score random variable is found as part of the routine of getting the maximum likelihood estimate, but not here.)
$$ S = \frac{\partial}{\partial \theta} \log L = \frac{1}{\sigma^2} \sum_{i=1}^n \frac{y_i - \theta x_i}{x_i} = \frac{1}{\sigma^2} \sum_{i=1}^n \left( \frac{y_i}{x_i} - \theta \right). $$
In random variable form, this is
$$ S = \frac{1}{\sigma^2} \sum_{i=1}^n \left( \frac{Y_i}{x_i} - \theta \right). $$

The easiest way to get $I(\theta)$ is as $\mathrm{Var}\,S$:
$$ I(\theta) = \mathrm{Var}\,S = \mathrm{Var}\left[ \frac{1}{\sigma^2} \sum_{i=1}^n \frac{Y_i}{x_i} \right]
= \frac{1}{\sigma^4} \sum_{i=1}^n \frac{\mathrm{Var}\,Y_i}{x_i^2}
= \frac{1}{\sigma^4} \sum_{i=1}^n \frac{x_i^2 \sigma^2}{x_i^2}
= \frac{n}{\sigma^2}. $$
It follows that the limiting variance of $\hat{\theta}_{ML}$ is $\frac{\sigma^2}{n}$. You can actually show that this is a non-asymptotic result as well.

This is a better (smaller) variance than that for $\hat{\theta}_{MM}$, which was $\frac{\sigma^2 \sum_{i=1}^n x_i^2}{n^2 \bar{x}^2}$. It’s an interesting exercise to show that
$$ \frac{\sigma^2}{n} \;<\; \frac{\sigma^2 \sum_{i=1}^n x_i^2}{n^2 \bar{x}^2}, $$
which amounts to showing that $n \bar{x}^2 < \sum_{i=1}^n x_i^2$ (unless all the $x_i$ are equal). A comparison is sketched below.
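To see the variance comparison concretely, here is a small simulation sketch, assuming NumPy; the values of $x$, $\theta$, and $\sigma$ are illustrative only. It computes both estimates over many replications and compares their empirical variances with $\frac{\sigma^2}{n}$ and $\frac{\sigma^2 \sum x_i^2}{n^2 \bar{x}^2}$.

```python
import numpy as np

# Ratio-estimate example: Y_i ~ N(theta * x_i, x_i^2 * sigma^2), x_i known.
# Compare Var of theta_ML = (1/n) sum(Y_i / x_i) against theta_MM = sum(Y_i) / sum(x_i).
rng = np.random.default_rng(11)
theta, sigma, reps = 3.0, 0.5, 100_000      # illustrative values
x = np.array([0.5, 1.0, 1.5, 2.0, 4.0])     # known positive constants
n = len(x)

Y = rng.normal(loc=theta * x, scale=sigma * x, size=(reps, n))
theta_ml = (Y / x).mean(axis=1)             # MLE: average of the ratios
theta_mm = Y.sum(axis=1) / x.sum()          # method of moments

print(theta_ml.var(), sigma**2 / n)                          # ~ sigma^2 / n
print(theta_mm.var(), sigma**2 * (x**2).sum() / x.sum()**2)  # larger
```

With these particular $x_i$, the method of moments variance works out to about $0.0725$ versus $0.05$ for the MLE, matching the inequality above.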
For cases in which we have a sample, meaning $n$ independent values sampled from the same distribution, we have $I(\theta) = n\, I_1(\theta)$, where $I_1(\theta)$ is the information in one observation. We can get this from the score random variable based on one observation, generally identified as $S_1$.

Another example. Suppose that $X_1, X_2, \ldots, X_n$ is a sample from the exponential density
$$ f(x \mid \lambda) = \lambda e^{-\lambda x}\, I(x \geq 0). $$
Let’s find the maximum likelihood estimate. Begin with the likelihood
$$ L = \prod_{i=1}^n \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum_{i=1}^n x_i}. $$
It follows that
$$ \log L = n \log \lambda - \lambda \sum_{i=1}^n x_i $$
$$ \frac{\partial}{\partial \lambda} \log L = \frac{n}{\lambda} - \sum_{i=1}^n x_i \ \overset{\text{set}}{=}\ 0. $$
Therefore $\hat{\lambda}_{ML} = \frac{n}{\sum_{i=1}^n x_i} = \frac{1}{\bar{x}}$. In random variable form, this is $\hat{\lambda}_{ML} = \frac{1}{\bar{X}}$. It’s going to be very difficult to get a limiting distribution directly. Let’s use the asymptotic results, based on the fact that this is a maximum likelihood estimate.

Because this situation deals with a sample, we will find $I(\lambda)$ through $I(\lambda) = n\, I_1(\lambda)$. For observation 1, we have $\log L_1 = \log \lambda - \lambda x_1$. Then
$$ S_1 = \frac{\partial}{\partial \lambda} \log L_1 = \frac{1}{\lambda} - x_1. $$
In random variable form, $S_1 = \frac{1}{\lambda} - X_1$. Certainly $\mathrm{E}\,S_1 = 0$. There are several ways to get $I_1(\lambda)$. Here’s the easiest:
$$ I_1(\lambda) = -\mathrm{E}\left[ \frac{\partial^2}{\partial \lambda^2} \log L_1 \right]
= -\mathrm{E}\left[ \frac{\partial}{\partial \lambda} \left( \frac{1}{\lambda} - X_1 \right) \right]
= -\mathrm{E}\left[ -\frac{1}{\lambda^2} \right]
= \frac{1}{\lambda^2}. $$
It follows then that $I(\lambda) = n\, I_1(\lambda) = \frac{n}{\lambda^2}$.

We can certainly make an approximate 95% confidence interval based on $\hat{\lambda}_{ML} \pm 2\, \mathrm{SE}\big(\hat{\lambda}_{ML}\big)$. Specifically, this is
$$ \frac{1}{\bar{X}} \pm 2\, \frac{1}{\bar{X}\sqrt{n}}. $$
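As a quick illustration of this interval in practice, here is a sketch assuming NumPy, with $\lambda = 0.25$ and $n = 400$ chosen arbitrarily. Note that $\mathrm{SE}(\hat{\lambda}_{ML}) = 1/\sqrt{I(\hat{\lambda}_{ML})} = \hat{\lambda}_{ML}/\sqrt{n} = 1/(\bar{X}\sqrt{n})$, which is what the code computes.

```python
import numpy as np

# Approximate 95% interval for the exponential rate:
# lam_hat = 1/xbar, I(lam) = n/lam^2, so SE(lam_hat) = 1/(xbar*sqrt(n)).
rng = np.random.default_rng(3)
lam_true, n = 0.25, 400                  # illustrative values
x = rng.exponential(scale=1.0 / lam_true, size=n)

xbar = x.mean()
lam_hat = 1.0 / xbar
se = 1.0 / (xbar * np.sqrt(n))
print(lam_hat, (lam_hat - 2 * se, lam_hat + 2 * se))  # estimate and interval
```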