Statistics 512 Notes 13: Properties of Maximum Likelihood Estimates

The likelihood function is
$$L(\theta) = L(\theta; X_1, \ldots, X_n) = f(X_1, \ldots, X_n; \theta),$$
and the maximum likelihood estimate maximizes it, or equivalently maximizes the log likelihood:
$$\hat{\theta}_{MLE} = \arg\max_{\theta} L(\theta; X_1, \ldots, X_n) = \arg\max_{\theta} l(\theta; X_1, \ldots, X_n).$$

Good properties of maximum likelihood estimates:
(1) Invariance
(2) Consistency
(3) Asymptotic Normality
(4) Efficiency

Invariance (Theorem 6.1.2): Let $X_1, \ldots, X_n$ be iid with pdf $f(x; \theta)$, $\theta \in \Omega$. For a specified function $g$, let $\eta = g(\theta)$ be a parameter of interest. Suppose $\hat{\theta}$ is the MLE of $\theta$. Then $g(\hat{\theta})$ is the MLE of $g(\theta)$.

Proof: For each $\eta$ in the range of $g$, define the set
$$g^{-1}(\eta) = \{\theta : g(\theta) = \eta\}.$$
The maximum of $L(\theta)$ occurs at $\hat{\theta}$, and the domain of $g$ is $\Omega$, which contains $\hat{\theta}$. Hence $\hat{\theta}$ lies in one of these sets and, in fact, can lie in only one of them. Hence, to maximize $L(\theta)$, choose $\hat{\eta}$ so that $g^{-1}(\hat{\eta})$ is the unique set containing $\hat{\theta}$. Then $\hat{\eta} = g(\hat{\theta})$.

Example: $X_1, \ldots, X_n$ iid Bernoulli($p$). The large-sample confidence interval for $p$ is
$$\hat{p} \pm 1.96 \sqrt{\widehat{\mathrm{Var}}(\hat{p})}.$$
The MLE of $p$ is
$$\hat{p}_{MLE} = \frac{\sum_{i=1}^{n} X_i}{n}.$$
The variance of the MLE is
$$\mathrm{Var}(\hat{p}_{MLE}) = \frac{p(1-p)}{n}.$$
By invariance, the maximum likelihood estimate of $\mathrm{Var}(\hat{p}_{MLE})$ is
$$\frac{\hat{p}(1-\hat{p})}{n}.$$

Consistency: Consistency means that the MLE converges in probability to the true value. To proceed, we need a definition. If $f$ and $g$ are pdfs, define the Kullback-Leibler distance between $f$ and $g$ to be
$$D(f, g) = \int f(x) \log \frac{f(x)}{g(x)} \, dx.$$
It can be shown that $D(f, g) \ge 0$ and $D(f, f) = 0$. For any $\theta, \theta^* \in \Omega$, write $D(\theta, \theta^*)$ to mean $D(f(x; \theta), f(x; \theta^*))$. We say that the model is identifiable if $\theta \ne \theta^*$ implies that $D(\theta, \theta^*) > 0$. This means that different values of the parameter correspond to different distributions. We will assume that the model is identifiable.

Let $\theta_0$ denote the true value of $\theta$. Let $l_n(\theta)$ denote the log likelihood of $\theta$ based on an iid sample $X_1, \ldots, X_n$. Maximizing $l_n(\theta)$ is equivalent to maximizing
$$M_n(\theta) = \frac{1}{n} \sum_i \log \frac{f(X_i; \theta)}{f(X_i; \theta_0)}.$$
By the law of large numbers, $M_n(\theta)$ converges to
$$E_{\theta_0}\left[\log \frac{f(X_i; \theta)}{f(X_i; \theta_0)}\right] = \int \log \frac{f(x; \theta)}{f(x; \theta_0)} f(x; \theta_0) \, dx = -\int f(x; \theta_0) \log \frac{f(x; \theta_0)}{f(x; \theta)} \, dx = -D(\theta_0, \theta).$$
Hence, $M_n(\theta) \approx -D(\theta_0, \theta)$, which is maximized at $\theta_0$ since $D(\theta_0, \theta_0) = 0$ and $D(\theta_0, \theta) > 0$ for $\theta \ne \theta_0$. Therefore, we expect that the maximizer will tend to $\theta_0$. To prove this formally, we need more than the pointwise convergence $M_n(\theta) \xrightarrow{P} -D(\theta_0, \theta)$: we need this convergence to be uniform over $\theta$. We also have to make sure that the function $D(\theta_0, \theta)$ is well behaved. Here are the formal details.

Theorem: Let $\theta_0$ denote the true value of $\theta$. Define
$$M_n(\theta) = \frac{1}{n} \sum_i \log \frac{f(X_i; \theta)}{f(X_i; \theta_0)}$$
and $M(\theta) = -D(\theta_0, \theta)$. Suppose that
$$\sup_{\theta \in \Omega} |M_n(\theta) - M(\theta)| \xrightarrow{P} 0$$
and that for every $\epsilon > 0$,
$$\sup_{\theta : |\theta - \theta_0| \ge \epsilon} M(\theta) < M(\theta_0).$$
Let $\hat{\theta}_n$ denote the MLE. Then $\hat{\theta}_n \xrightarrow{P} \theta_0$.

Proof: Since $\hat{\theta}_n$ maximizes $M_n(\theta)$, we have $M_n(\hat{\theta}_n) \ge M_n(\theta_0)$. Hence,
$$0 \le M(\theta_0) - M(\hat{\theta}_n) = \left[M_n(\theta_0) - M(\hat{\theta}_n)\right] + \left[M(\theta_0) - M_n(\theta_0)\right] \le \left[M_n(\hat{\theta}_n) - M(\hat{\theta}_n)\right] + \left[M(\theta_0) - M_n(\theta_0)\right] \le \sup_{\theta} |M_n(\theta) - M(\theta)| + M(\theta_0) - M_n(\theta_0) \xrightarrow{P} 0,$$
where the last step follows from the assumption $\sup_{\theta} |M_n(\theta) - M(\theta)| \xrightarrow{P} 0$.

Pick any $\epsilon > 0$. By the assumption $\sup_{\theta : |\theta - \theta_0| \ge \epsilon} M(\theta) < M(\theta_0)$, there exists $\delta > 0$ such that $|\theta - \theta_0| \ge \epsilon$ implies that $M(\theta) < M(\theta_0) - \delta$. Hence,
$$P(|\hat{\theta}_n - \theta_0| \ge \epsilon) \le P(M(\hat{\theta}_n) < M(\theta_0) - \delta) \to 0,$$
since we showed above that $M(\theta_0) - M(\hat{\theta}_n) \xrightarrow{P} 0$.
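To make the Bernoulli example concrete, here is a minimal numerical sketch (not part of the original notes): it computes the MLE of $p$, the invariance-based plug-in estimate of $\mathrm{Var}(\hat{p})$, and the approximate 95% confidence interval. It assumes numpy is available; the sample size, true $p$, and seed are arbitrary illustrative choices.

```python
# A minimal sketch of the Bernoulli example above (illustrative only).
# Assumptions: numpy is available; n, p_true, and the seed are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
n, p_true = 200, 0.3
x = rng.binomial(1, p_true, size=n)   # iid Bernoulli(p) sample

p_hat = x.mean()                      # MLE: sum(X_i) / n
var_hat = p_hat * (1 - p_hat) / n     # by invariance, MLE of Var(p_hat) = p(1-p)/n
half_width = 1.96 * np.sqrt(var_hat)

print(f"p_hat = {p_hat:.3f}")
print(f"estimated Var(p_hat) = {var_hat:.5f}")
print(f"approximate 95% CI: ({p_hat - half_width:.3f}, {p_hat + half_width:.3f})")
```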
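The consistency argument can also be illustrated by simulation. The sketch below is again an illustrative addition, assuming numpy; the choice of $p_0 = 0.3$, the grid of sample sizes, and the helper names M_n and M are all hypothetical. It shows the Bernoulli MLE concentrating around the true value as $n$ grows, and shows $M_n(p)$ settling near $M(p) = -D(p_0, p)$, which is largest at $p = p_0$.

```python
# A rough simulation sketch of the consistency argument (illustrative only).
# Assumptions: numpy is available; p0, the sample sizes, and the evaluation
# grid are arbitrary choices; M_n and M are hypothetical helper names.
import numpy as np

rng = np.random.default_rng(1)
p0 = 0.3

# (1) The MLE concentrates around the true value p0 as n grows.
for n in (10, 100, 1000, 10000):
    x = rng.binomial(1, p0, size=n)
    print(f"n = {n:6d}: p_hat = {x.mean():.4f}")

# (2) M_n(p) = (1/n) sum_i log[f(X_i; p) / f(X_i; p0)] approaches
#     M(p) = -D(p0, p), which is maximized at p = p0.
def M_n(x, p, p0):
    # average log-likelihood ratio over the sample, for Bernoulli data
    return np.mean(x * np.log(p / p0) + (1 - x) * np.log((1 - p) / (1 - p0)))

def M(p, p0):
    # -D(p0, p) for two Bernoulli distributions
    return -(p0 * np.log(p0 / p) + (1 - p0) * np.log((1 - p0) / (1 - p)))

x = rng.binomial(1, p0, size=100_000)
for p in (0.1, 0.3, 0.5, 0.7):
    print(f"p = {p}: M_n(p) = {M_n(x, p, p0):+.4f}, -D(p0, p) = {M(p, p0):+.4f}")
```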