Statistics 550 Notes 13

Reading: Section 2.3.

Schedule:
1. Take home midterm due Wed. Oct. 25th.
2. No class next Tuesday due to fall break. We will have class on Thursday.
3. The next homework will be assigned next week and due Friday, Nov. 3rd.

I. Asymptotic Relative Efficiency (Clarification from last class)

Consider two estimators $T_n$ and $U_n$ and suppose that
$$\mathcal{L}(\sqrt{n}(T_n - \theta)) \to N(0, t^2) \quad \text{and} \quad \mathcal{L}(\sqrt{n}(U_n - \theta)) \to N(0, u^2).$$
We define the asymptotic relative efficiency of $U$ to $T$ by $\mathrm{ARE}(U, T) = t^2/u^2$.

For $X_1, \ldots, X_n$ iid $N(\mu, 1)$,
$$\mathrm{ARE}(\text{sample median}, \text{sample mean}) = \frac{1}{\pi/2} = \frac{2}{\pi} \approx 0.64.$$
The interpretation is that if person A uses the sample median as her estimator of $\mu$ and person B uses the sample mean as her estimator of $\mu$, person B needs a sample size only about $2/\pi \approx 0.64$ times as large as person A's to obtain the same approximate variance of the estimator.

Theorem: If $\hat{\theta}_n$ is the MLE and $\tilde{\theta}_n$ is any other estimator, then $\mathrm{ARE}(\tilde{\theta}_n, \hat{\theta}_n) \le 1$. Thus, the MLE has the smallest asymptotic variance, and we say that the MLE is asymptotically efficient and asymptotically optimal.

Comments: (1) We will provide an outline of the proof of this theorem when we study the Cramér-Rao (information) inequality in Chapter 3.4. (2) The result is actually more subtle than the stated theorem because it only covers a certain class of well-behaved estimators; more details will be studied in Stat 552.

II. Uniqueness and Existence of the MLE

For a finite sample, when does the MLE exist, when is it unique, and how do we find it?

If $\Theta$ is open, $l_x(\theta)$ is differentiable in $\theta$, and $\hat{\theta}_{MLE}$ exists, then $\hat{\theta}_{MLE}$ must satisfy the estimating equation
$$\frac{\partial}{\partial\theta} l_x(\theta) = 0. \tag{1.1}$$
This is known as the likelihood equation. But solving (1.1) does not necessarily yield the MLE, as there may be solutions of (1.1) that are not maxima, or solutions that are only local maxima.

Anomalies of maximum likelihood estimates: maximum likelihood estimates are not necessarily unique and do not even have to exist.

Nonuniqueness of MLEs example: $X_1, \ldots, X_n$ iid Uniform$(\theta - \tfrac{1}{2}, \theta + \tfrac{1}{2})$.
$$L_x(\theta) = \begin{cases} 1 & \text{if } \max_i X_i - \tfrac{1}{2} \le \theta \le \min_i X_i + \tfrac{1}{2} \\ 0 & \text{otherwise} \end{cases}$$
Thus any estimator $\hat{\theta}$ that satisfies $\max_i X_i - \tfrac{1}{2} \le \hat{\theta} \le \min_i X_i + \tfrac{1}{2}$ is a maximum likelihood estimator.

Nonexistence of a maximum likelihood estimator: The likelihood function can be unbounded. An important example is a mixture of normal distributions, which is frequently used in applications. Let $X_1, \ldots, X_n$ be iid with density
$$f(x) = p\,\frac{1}{\sqrt{2\pi}\,\sigma_1}\exp\left\{-\frac{(x-\mu_1)^2}{2\sigma_1^2}\right\} + (1-p)\,\frac{1}{\sqrt{2\pi}\,\sigma_2}\exp\left\{-\frac{(x-\mu_2)^2}{2\sigma_2^2}\right\}.$$
This is a mixture of two normal distributions; the unknown parameters are $(p, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2)$. Let $\mu_1 = X_1$. Then as $\sigma_1 \to 0$, $f(X_1) \to \infty$, so that the likelihood function is unbounded.

Example where the MLE exists and is unique: Normal distribution. $X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$:
$$f(x_1, \ldots, x_n; \mu, \sigma) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(x_i - \mu)^2}{2\sigma^2}\right\},$$
$$l(\mu, \sigma) = -n\log\sigma - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}\sum_{i=1}^n (X_i - \mu)^2.$$
The partial derivatives with respect to $\mu$ and $\sigma$ are
$$\frac{\partial l}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^n (X_i - \mu), \qquad \frac{\partial l}{\partial\sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^n (X_i - \mu)^2.$$
Setting the first partial derivative equal to zero and solving, we obtain
$$\hat{\mu}_{MLE} = \bar{X}.$$
Setting the second partial derivative equal to zero and substituting $\hat{\mu}_{MLE}$ for $\mu$, we find that the MLE for $\sigma$ is
$$\hat{\sigma}_{MLE} = \sqrt{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2}.$$
To verify that this critical point is a maximum, we need to check the following second-derivative conditions:
(1) The two second-order partial derivatives are negative at the critical point:
$$\left.\frac{\partial^2 l}{\partial\mu^2}\right|_{\hat{\mu}_{MLE}, \hat{\sigma}_{MLE}} < 0 \quad \text{and} \quad \left.\frac{\partial^2 l}{\partial\sigma^2}\right|_{\hat{\mu}_{MLE}, \hat{\sigma}_{MLE}} < 0.$$
(2) The determinant of the matrix of second-order partial derivatives (the Hessian) is positive at the critical point:
$$\left.\left[\frac{\partial^2 l}{\partial\mu^2}\frac{\partial^2 l}{\partial\sigma^2} - \left(\frac{\partial^2 l}{\partial\mu\,\partial\sigma}\right)^2\right]\right|_{\hat{\mu}_{MLE}, \hat{\sigma}_{MLE}} > 0.$$
See attached notes from Casella and Berger for verification of (1) and (2) for the normal distribution.
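As a quick numerical check, not part of the notes, here is a minimal sketch (assuming numpy and scipy are available; the simulated data and starting values are arbitrary) that maximizes the normal log likelihood directly and compares the answer to the closed-form MLEs $\bar{X}$ and $\hat{\sigma}_{MLE}$ derived above:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical simulated sample; any data set would do.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=500)
n = len(x)

def neg_log_lik(params):
    """Negative normal log likelihood; the constant (n/2)*log(2*pi)
    is dropped since it does not affect the maximizer."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf  # keep the optimizer inside the parameter space
    return n * np.log(sigma) + np.sum((x - mu) ** 2) / (2 * sigma ** 2)

# Derivative-free maximization of the log likelihood.
res = minimize(neg_log_lik, x0=[0.0, 1.0], method="Nelder-Mead")

# Closed-form MLEs derived above.
mu_hat = x.mean()
sigma_hat = np.sqrt(np.mean((x - mu_hat) ** 2))

print("numerical:   ", res.x)
print("closed form: ", mu_hat, sigma_hat)
```

The two answers should agree to several decimal places, consistent with the critical point being the unique maximum.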
Conditions for uniqueness and existence of the MLE: We now provide a general condition under which there is a unique maximum likelihood estimator that is the solution to the likelihood equation. The condition applies to many exponential families.

Boundary of a parameter space: Suppose the parameter space $\Theta \subset \mathbb{R}^p$ is an open set. Let $\partial\Theta = \bar{\Theta} \setminus \Theta$ be the boundary of $\Theta$, where $\bar{\Theta}$ denotes the closure of $\Theta$ in $[-\infty, \infty]^p$. That is, $\partial\Theta$ is the set of points outside of $\Theta$ that can be obtained as limits of points in $\Theta$, including all points with $\pm\infty$ as a coordinate. For instance, for $X \sim N(\mu, \sigma^2)$,
$$\Theta = \{(\mu, \sigma^2): -\infty < \mu < \infty,\ 0 < \sigma^2 < \infty\} = (-\infty, \infty) \times (0, \infty),$$
$$\partial\Theta = \{(a, b): a \in \{-\infty, \infty\},\ 0 \le b \le \infty\} \cup \{(a, b): -\infty \le a \le \infty,\ b \in \{0, \infty\}\}.$$

Convergence of points to the boundary: In general, for a sequence $\{\theta_m\}$ of points from $\Theta$ open, we define $\theta_m \to \partial\Theta$ as $m \to \infty$ to mean that for any subsequence $\{\theta_{m_k}\}$, either $\theta_{m_k} \to t$ with $t \in \partial\Theta$, or $\theta_{m_k}$ diverges with $|\theta_{m_k}| \to \infty$ as $k \to \infty$, where $|\cdot|$ denotes the Euclidean norm.

Example: In the $N(\mu, \sigma^2)$ case, $(a, m^{-1})$, $(m, b)$, $(-m, b)$, $(a, m)$, and $(m, m^{-1})$ all tend to $\partial\Theta$ as $m \to \infty$.

Lemma 2.3.1: Suppose we are given a function $l: \Theta \to \mathbb{R}$ where $\Theta \subset \mathbb{R}^p$ is open and $l$ is continuous. Suppose also that $\lim_{\theta \to \partial\Theta} l(\theta) = -\infty$. Then there exists $\hat{\theta} \in \Theta$ such that $l(\hat{\theta}) = \max\{l(\theta): \theta \in \Theta\}$.
Proof: Problem 2.3.5.

Proposition 2.3.1: Suppose our model is that $X$ has pdf or pmf $p(x \mid \theta)$, $\theta \in \Theta$, and that
(i) $l_x(\theta)$ is strictly concave;
(ii) $l_x(\theta) \to -\infty$ as $\theta \to \partial\Theta$.
Then the maximum likelihood estimator exists and is unique.

Proof: $l_x(\theta)$ is continuous because $l_x(\theta)$ is concave (see Appendix B.9). By Lemma 2.3.1, $\hat{\theta}$ exists. To prove uniqueness of the MLE, suppose $\hat{\theta}_1$ and $\hat{\theta}_2$ are distinct maximizers of the likelihood. Then
$$l_x(\hat{\theta}_1) = \frac{1}{2} l_x(\hat{\theta}_1) + \frac{1}{2} l_x(\hat{\theta}_2) < l_x\left(\frac{1}{2}\hat{\theta}_1 + \frac{1}{2}\hat{\theta}_2\right),$$
with the strict inequality following from the strict concavity of $l_x(\theta)$; this contradicts $\hat{\theta}_1$ being a maximizer of the likelihood.

Corollary: If the conditions of Proposition 2.3.1 are satisfied and $l_x(\theta)$ is differentiable in $\theta$, then $\hat{\theta}_{MLE}$ is the unique solution to the estimating equation
$$\frac{\partial}{\partial\theta} l_x(\theta) = 0. \tag{1.2}$$

Application to Exponential Families:

1. Theorem 1.6.4, Corollary 1.6.5: For a full exponential family, the log likelihood is strictly concave. Consider the exponential family
$$p(x \mid \eta) = h(x)\exp\left\{\sum_{i=1}^k \eta_i T_i(x) - A(\eta)\right\}.$$
Note that if $A(\eta)$ is convex, then the log likelihood
$$\log p(x \mid \eta) = \log h(x) + \sum_{i=1}^k \eta_i T_i(x) - A(\eta)$$
is concave in $\eta$.

Proof that $A(\eta)$ is convex: Recall that
$$A(\eta) = \log \int h(x)\exp\left\{\sum_{i=1}^k \eta_i T_i(x)\right\} dx.$$
To show that $A(\eta)$ is convex, we want to show that
$$A(\lambda\eta_1 + (1-\lambda)\eta_2) \le \lambda A(\eta_1) + (1-\lambda)A(\eta_2) \quad \text{for } 0 < \lambda < 1,$$
or equivalently
$$\exp\{A(\lambda\eta_1 + (1-\lambda)\eta_2)\} \le \exp\{\lambda A(\eta_1)\}\exp\{(1-\lambda)A(\eta_2)\}.$$
We use Hölder's inequality to establish this. Hölder's inequality (B.9.4 on page 518 of Bickel and Doksum) states that for any two numbers $r$ and $s$ with $r, s > 1$ and $r^{-1} + s^{-1} = 1$,
$$E|XY| \le \{E|X|^r\}^{1/r}\{E|Y|^s\}^{1/s}.$$
Applying it with $r = 1/\lambda$ and $s = 1/(1-\lambda)$, we have
$$\exp\{A(\lambda\eta_1 + (1-\lambda)\eta_2)\} = \int \exp\left[\sum_{i=1}^k (\lambda\eta_{1i} + (1-\lambda)\eta_{2i})T_i(x)\right] h(x)\,dx$$
$$= \int \exp\left[\lambda\sum_{i=1}^k \eta_{1i}T_i(x)\right]\exp\left[(1-\lambda)\sum_{i=1}^k \eta_{2i}T_i(x)\right] h(x)\,dx$$
$$\le \left(\int \exp\left[\sum_{i=1}^k \eta_{1i}T_i(x)\right] h(x)\,dx\right)^{\lambda}\left(\int \exp\left[\sum_{i=1}^k \eta_{2i}T_i(x)\right] h(x)\,dx\right)^{1-\lambda}$$
$$= \exp\{\lambda A(\eta_1)\}\exp\{(1-\lambda)A(\eta_2)\}.$$
For a full exponential family, the log likelihood is strictly concave. For a curved exponential family, the log likelihood is concave but not strictly concave.

2. Theorem 2.3.1, Corollary 2.3.2 spell out specific conditions under which $l_x(\theta) \to -\infty$ as $\theta \to \partial\Theta$ for exponential families.
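To illustrate the convexity of $A(\eta)$ concretely, here is a minimal randomized check, not part of the notes, for the gamma family used in the next example. The natural parameterization below is worked out for this sketch and is an assumption of it: take $h(x) = 1/x$, $T(x) = (\log x,\ x)$, $\eta_1 = \alpha > 0$, $\eta_2 = -1/\beta < 0$, which gives $A(\eta) = \log\Gamma(\eta_1) - \eta_1\log(-\eta_2)$.

```python
import numpy as np
from scipy.special import gammaln

def A(eta):
    """Log-partition function of the gamma family in natural form:
    A(eta) = log Gamma(eta1) - eta1 * log(-eta2),
    valid for eta1 > 0, eta2 < 0 (parameterization assumed above)."""
    eta1, eta2 = eta
    return gammaln(eta1) - eta1 * np.log(-eta2)

rng = np.random.default_rng(1)
for _ in range(10_000):
    # Random pair of natural parameters and a random mixing weight.
    eta_a = np.array([rng.uniform(0.1, 10.0), -rng.uniform(0.1, 10.0)])
    eta_b = np.array([rng.uniform(0.1, 10.0), -rng.uniform(0.1, 10.0)])
    lam = rng.uniform()
    # Convexity: A at the mixture lies below the mixture of the A values.
    assert A(lam * eta_a + (1 - lam) * eta_b) <= \
        lam * A(eta_a) + (1 - lam) * A(eta_b) + 1e-9
print("convexity inequality held at all sampled parameter pairs")
```

A sampled check of this kind cannot prove convexity, but it is a useful sanity test of the Hölder argument on a concrete family.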
Example 1: Gamma distribution.
$$f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\beta^\alpha}\, x^{\alpha-1} e^{-x/\beta}, & 0 < x < \infty \\ 0, & \text{elsewhere} \end{cases}$$
$$l(\alpha, \beta) = \sum_{i=1}^n \left[-\log\Gamma(\alpha) - \alpha\log\beta + (\alpha - 1)\log X_i - X_i/\beta\right]$$
for the parameter space $\Theta = \{(\alpha, \beta): \alpha > 0,\ \beta > 0\}$. The gamma distribution is a full two-dimensional exponential family, so the likelihood function is strictly concave.

The boundary of the parameter space is
$$\partial\Theta = \{(a, b): a = \infty,\ 0 \le b \le \infty\} \cup \{(a, b): a = 0,\ 0 \le b \le \infty\} \cup \{(a, b): 0 \le a \le \infty,\ b = \infty\} \cup \{(a, b): 0 \le a \le \infty,\ b = 0\}.$$
One can check that $\lim_{\theta \to \partial\Theta} l(\theta) = -\infty$. Thus, by Proposition 2.3.1, the MLE is the unique solution to the likelihood equation.

The partial derivatives of the log likelihood are
$$\frac{\partial l}{\partial\alpha} = -n\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - n\log\beta + \sum_{i=1}^n \log X_i, \qquad \frac{\partial l}{\partial\beta} = -\frac{n\alpha}{\beta} + \frac{\sum_{i=1}^n X_i}{\beta^2}.$$
Setting the second partial derivative equal to zero, we find
$$\hat{\beta} = \frac{\sum_{i=1}^n X_i}{n\hat{\alpha}_{MLE}}.$$
When this solution is substituted into the first partial derivative, we obtain a nonlinear equation for the MLE of $\alpha$:
$$-n\frac{\Gamma'(\hat{\alpha}_{MLE})}{\Gamma(\hat{\alpha}_{MLE})} - n\log\frac{\sum_{i=1}^n X_i}{n} + n\log\hat{\alpha}_{MLE} + \sum_{i=1}^n \log X_i = 0.$$
This equation cannot be solved in closed form.

Next topic: Numerical methods for finding the MLE (Chapter 2.4).
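As a preview of those numerical methods, here is a minimal sketch, not part of the notes, that solves the gamma likelihood equation with scipy (the simulated data are a hypothetical example). Dividing the nonlinear equation above by $n$ and writing $\psi = \Gamma'/\Gamma$ for the digamma function, the equation becomes $\log\hat{\alpha} - \psi(\hat{\alpha}) = \log\bar{X} - \frac{1}{n}\sum_{i=1}^n \log X_i$; the left side is strictly decreasing in $\alpha$, so a bracketing root-finder applies.

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

# Hypothetical simulated gamma sample.
rng = np.random.default_rng(2)
x = rng.gamma(shape=3.0, scale=2.0, size=1000)

# c = log(xbar) - mean(log x), which is > 0 by Jensen's inequality.
c = np.log(x.mean()) - np.log(x).mean()

def profile_equation(alpha):
    """log(alpha) - digamma(alpha) - c, strictly decreasing in alpha."""
    return np.log(alpha) - digamma(alpha) - c

alpha_hat = brentq(profile_equation, 1e-6, 1e6)  # bracketing root-finder
beta_hat = x.mean() / alpha_hat                  # beta-hat = xbar / alpha-hat
print(alpha_hat, beta_hat)
```

With 1000 observations the estimates should land close to the true values $(\alpha, \beta) = (3, 2)$ used to simulate the data.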