Lecture 12: Parameter Estimation

PROBABILITY AND STATISTICS
FOR ENGINEERING
Principles of Parameter Estimation
Hossein Sameti
Department of Computer Engineering
Sharif University of Technology
The Estimation Problem
 We use the various concepts introduced and studied in earlier lectures to
solve practical problems of interest.
 Consider the problem of estimating an unknown parameter of interest
from a few of its noisy observations.
- the daily temperature in a city
- the depth of a river at a particular spot
 Observations (measurements) are made on data that contain the desired nonrandom parameter $\theta$ and undesired noise.
The Estimation Problem
 For example, Observation = signal (desired part) + noise,
 or, the $i$-th observation can be represented as
$X_i = \theta + n_i, \quad i = 1, 2, \ldots, n.$
 $\theta$: the unknown nonrandom desired parameter
 $n_i,\ i = 1, 2, \ldots, n$: random variables that may be dependent or independent from observation to observation.
 The Estimation Problem:
- Given $n$ observations $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, obtain the “best” estimator for the unknown parameter $\theta$ in terms of these observations.
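 As a quick illustration (not part of the original slides), the observation model can be simulated as follows; the Gaussian noise, the numbers, and the variable names are assumptions made for this example.

import numpy as np

# Simulate the observation model X_i = theta + n_i for a fixed but unknown theta.
# The noise is taken to be zero-mean Gaussian purely for illustration.
rng = np.random.default_rng(0)
theta = 25.0                                  # e.g. the true daily temperature
sigma = 2.0                                   # assumed noise standard deviation
n = 10
x = theta + sigma * rng.standard_normal(n)    # noisy observations x_1, ..., x_n
print(x)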
Estimators
 Let us denote by $\hat\theta(X)$ the estimator for $\theta$.
 Obviously $\hat\theta(X)$ is a function of only the observations.
 “Best estimator” in what sense?
 Ideal solution: the estimate $\hat\theta(X)$ coincides with the unknown $\theta$.
 Almost always, any estimate will result in an error given by
$e = \hat\theta(X) - \theta.$
 One strategy would be to select the estimator $\hat\theta(X)$ so as to minimize some function of this error:
- the mean square error (leading to the MMSE estimator; written out below),
- the absolute value of the error,
- etc.
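 For concreteness (this expression is not written out on the slide, but it is the standard form of the criterion), the mean-square-error strategy picks $\hat\theta(X)$ to minimize
$E\big[(\hat\theta(X) - \theta)^2\big],$
while the absolute-error strategy minimizes $E\big[\,|\hat\theta(X) - \theta|\,\big]$.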
A More Fundamental Approach: Principle of Maximum Likelihood
 Underlying Assumption: the available data $X_1, X_2, \ldots, X_n$ have something to do with the unknown parameter $\theta$.
 We assume that the joint p.d.f of $X_1, X_2, \ldots, X_n$, namely $f_X(x_1, x_2, \ldots, x_n; \theta)$, depends on $\theta$.
 This method
- assumes that the given sample data set is representative of the population $f_X(x_1, x_2, \ldots, x_n; \theta)$
- chooses the value for $\theta$ that most likely caused the observed data to occur
Principle of Maximum Likelihood
 In other words, given the observations $x_1, x_2, \ldots, x_n$, $f_X(x_1, x_2, \ldots, x_n; \theta)$ is a function of $\theta$ alone.
 The value of $\theta$ that maximizes this p.d.f is the most likely value for $\theta$, and it is chosen as the ML estimate $\hat\theta_{ML}(X)$ for $\theta$.
[Figure: the likelihood function $f_X(x_1, x_2, \ldots, x_n; \theta)$ plotted as a function of $\theta$, with its peak at $\hat\theta_{ML}(X)$.]
 Given $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, the joint p.d.f $f_X(x_1, x_2, \ldots, x_n; \theta)$ represents the likelihood function.
 The ML estimate can be determined either from
- the likelihood equation
$\hat\theta_{ML} = \arg\sup_{\theta}\, f_X(x_1, x_2, \ldots, x_n; \theta),$
- or using the log-likelihood function
$L(x_1, x_2, \ldots, x_n; \theta) = \log f_X(x_1, x_2, \ldots, x_n; \theta).$
 If $L(x_1, x_2, \ldots, x_n; \theta)$ is differentiable and a supremum $\hat\theta_{ML}$ exists in the above equation, then it must satisfy
$\left.\frac{\partial \log f_X(x_1, x_2, \ldots, x_n; \theta)}{\partial \theta}\right|_{\theta = \hat\theta_{ML}} = 0.$
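 In practice the (log-)likelihood is often maximized numerically rather than in closed form. Below is a minimal sketch, assuming a Gaussian model $X_i \sim N(\theta, \sigma^2)$ with known $\sigma$; the data values are hypothetical.

import numpy as np
from scipy.optimize import minimize_scalar

# Maximize the log-likelihood by minimizing its negative.
# Assumed model: X_i ~ N(theta, sigma^2) with sigma known; data are made up.
x = np.array([24.1, 26.3, 25.2, 24.8, 25.9])
sigma = 1.0

def neg_log_likelihood(theta):
    # -log f_X(x; theta), dropping additive constants that do not depend on theta
    return 0.5 * np.sum((x - theta) ** 2) / sigma**2

res = minimize_scalar(neg_log_likelihood)
print(res.x, x.mean())   # the numerical maximizer matches the sample mean

 For this Gaussian model the next example derives the maximizer analytically.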
Example
 Let $X_i = \theta + w_i,\ i = 1, \ldots, n,$ represent $n$ observations, where $\theta$ is the unknown parameter of interest,
 $w_i,\ i = 1, \ldots, n,$ are zero-mean independent normal r.vs with common variance $\sigma^2$.
 Determine the ML estimate for $\theta$.
Solution
 Since the $w_i$'s are independent r.vs and $\theta$ is an unknown constant, the $X_i$'s are independent normal random variables.
 Thus the likelihood function takes the form
$f_X(x_1, x_2, \ldots, x_n; \theta) = \prod_{i=1}^{n} f_{X_i}(x_i; \theta).$
Example - continued
 Each $X_i$ is Gaussian with mean $\theta$ and variance $\sigma^2$ (why?).
 Thus
$f_{X_i}(x_i; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x_i - \theta)^2 / 2\sigma^2}.$
 Therefore the likelihood function is
$f_X(x_1, x_2, \ldots, x_n; \theta) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\sum_{i=1}^{n}(x_i - \theta)^2 / 2\sigma^2}.$
 It is easier to work with the log-likelihood function $L(X; \theta)$ in this case.
Example - continued
 We obtain
$L(X; \theta) = \ln f_X(x_1, x_2, \ldots, x_n; \theta) = -\frac{n}{2}\ln(2\pi\sigma^2) - \sum_{i=1}^{n}\frac{(x_i - \theta)^2}{2\sigma^2},$
 and taking the derivative with respect to $\theta$, we get
$\left.\frac{\partial \ln f_X(x_1, x_2, \ldots, x_n; \theta)}{\partial \theta}\right|_{\theta = \hat\theta_{ML}} = \left.\sum_{i=1}^{n}\frac{2(x_i - \theta)}{2\sigma^2}\right|_{\theta = \hat\theta_{ML}} = 0,$
 or
$\hat\theta_{ML}(X) = \frac{1}{n}\sum_{i=1}^{n} X_i.$
 This linear estimator represents the ML estimate for $\theta$.
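 As a quick check (a step not shown on the slide), the second derivative of the log-likelihood is
$\frac{\partial^2 L(X; \theta)}{\partial \theta^2} = -\frac{n}{\sigma^2} < 0,$
so the stationary point is indeed a maximum.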
Unbiased Estimators
 Notice that the estimator is a r.v. Taking its expected value, we get
$E[\hat\theta_{ML}(X)] = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \theta,$
 i.e., the expected value of the estimator does not differ from the desired parameter, and hence there is no bias between the two.
 Such estimators are known as unbiased estimators.
 $\hat\theta_{ML}(X) = \frac{1}{n}\sum_{i=1}^{n} X_i$ represents an unbiased estimator for $\theta$.
Consistent Estimators
 Moreover, the variance of the estimator is given by
$\operatorname{Var}(\hat\theta_{ML}) = E[(\hat\theta_{ML} - \theta)^2] = E\left[\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \theta\right)^{2}\right] = \frac{1}{n^2}\,E\left[\left(\sum_{i=1}^{n}(X_i - \theta)\right)^{2}\right]$
$= \frac{1}{n^2}\left[\sum_{i=1}^{n} E\big[(X_i - \theta)^2\big] + \sum_{i=1}^{n}\sum_{\substack{j=1 \\ j \neq i}}^{n} E\big[(X_i - \theta)(X_j - \theta)\big]\right].$
 The latter terms are zero since $X_i$ and $X_j$ are independent r.vs.
 So,
$\operatorname{Var}(\hat\theta_{ML}) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$
 And
$\operatorname{Var}(\hat\theta_{ML}) \to 0 \quad \text{as} \quad n \to \infty,$
 another desired property. We say estimators that satisfy this limit are
consistent estimators.
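 A small Monte Carlo experiment (illustrative only, not from the slides) makes the unbiasedness and the $\sigma^2/n$ variance visible; the chosen $\theta$, $\sigma$, and sample sizes are arbitrary.

import numpy as np

# Monte Carlo check: the sample mean of X_i = theta + w_i should have
# empirical mean close to theta and empirical variance close to sigma^2 / n.
rng = np.random.default_rng(1)
theta, sigma, trials = 5.0, 2.0, 20000
for n in (10, 100, 1000):
    x = theta + sigma * rng.standard_normal((trials, n))
    est = x.mean(axis=1)                      # theta_hat_ML for each trial
    print(n, est.mean(), est.var(), sigma**2 / n)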
Example
 Let $X_1, X_2, \ldots, X_n$ be i.i.d. uniform random variables in $(0, \theta)$ with common p.d.f
$f_{X_i}(x_i; \theta) = \frac{1}{\theta}, \quad 0 < x_i \le \theta,$
 where $\theta$ is an unknown parameter. Find the ML estimate for $\theta$.
Solution
 The likelihood function in this case is given by
$f_X(x_1, x_2, \ldots, x_n; \theta) = \prod_{i=1}^{n}\frac{1}{\theta} = \frac{1}{\theta^{n}}, \quad 0 < x_i \le \theta,$
$= \frac{1}{\theta^{n}}, \quad 0 < \max(x_1, x_2, \ldots, x_n) \le \theta.$
 The likelihood function here is maximized by the minimum value of $\theta$,
Example - continued
 and since $\theta \ge \max(X_1, X_2, \ldots, X_n)$, we get
$\hat\theta_{ML}(X) = \max(X_1, X_2, \ldots, X_n)$
to be the ML estimate for $\theta$,
 a nonlinear function of the observations.
 Is this an unbiased estimate for $\theta$? We need to evaluate its mean.
 It is easier to determine its p.d.f and proceed directly.
 Let $Z = \max(X_1, X_2, \ldots, X_n)$, where $f_{X_i}(x_i; \theta) = \frac{1}{\theta},\ 0 < x_i \le \theta.$
Example - continued
 Then
$F_Z(z) = P[\max(X_1, X_2, \ldots, X_n) \le z] = P(X_1 \le z, X_2 \le z, \ldots, X_n \le z) = \prod_{i=1}^{n} P(X_i \le z) = \prod_{i=1}^{n} F_{X_i}(z) = \left(\frac{z}{\theta}\right)^{n},$
 so that
$f_Z(z) = \begin{cases} \dfrac{n z^{n-1}}{\theta^{n}}, & 0 \le z \le \theta, \\ 0, & \text{otherwise}. \end{cases}$
 Using the above, we get
$E[\hat\theta_{ML}(X)] = E(Z) = \int_{0}^{\theta} z\, f_Z(z)\, dz = \frac{n}{\theta^{n}}\int_{0}^{\theta} z^{n}\, dz = \frac{n}{n+1}\,\theta = \frac{\theta}{1 + 1/n}.$
Example - continued
 In this case $E[\hat\theta_{ML}(X)] \ne \theta$, so the ML estimator is not an unbiased estimator for $\theta$.
 However, note that as $n \to \infty$,
$\lim_{n \to \infty} E[\hat\theta_{ML}(X)] = \lim_{n \to \infty}\frac{\theta}{1 + 1/n} = \theta,$
 i.e., the ML estimator is an asymptotically unbiased estimator.
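 A Monte Carlo check of this bias (again illustrative, with arbitrary $\theta$ and sample sizes) compares the empirical mean of $\max(X_1, \ldots, X_n)$ with $n\theta/(n+1)$:

import numpy as np

# For X_i ~ Uniform(0, theta), E[max(X_1, ..., X_n)] = n*theta/(n+1):
# biased low for finite n, approaching theta as n grows.
rng = np.random.default_rng(2)
theta, trials = 4.0, 50000
for n in (5, 50, 500):
    z = rng.uniform(0.0, theta, size=(trials, n)).max(axis=1)
    print(n, z.mean(), n * theta / (n + 1))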
 Also,
$E(Z^2) = \int_{0}^{\theta} z^{2} f_Z(z)\, dz = \frac{n}{\theta^{n}}\int_{0}^{\theta} z^{n+1}\, dz = \frac{n\,\theta^{2}}{n+2},$
 so that
$\operatorname{Var}[\hat\theta_{ML}(X)] = E(Z^2) - [E(Z)]^2 = \frac{n\,\theta^{2}}{n+2} - \frac{n^{2}\theta^{2}}{(n+1)^{2}} = \frac{n\,\theta^{2}}{(n+1)^{2}(n+2)}.$
 Var[ˆML ( X )]  0 as n  , implying that this estimator is a consistent
estimator.
Example
 Let $X_1, X_2, \ldots, X_n$ be i.i.d. Gamma random variables with unknown parameters $\alpha$ and $\beta$.
 Determine the ML estimator for $\alpha$ and $\beta$.
Solution
 Here $x_i > 0$, and
$f_X(x_1, x_2, \ldots, x_n; \alpha, \beta) = \frac{\beta^{n\alpha}}{(\Gamma(\alpha))^{n}}\prod_{i=1}^{n} x_i^{\alpha - 1}\, e^{-\beta\sum_{i=1}^{n} x_i}.$
 This gives the log-likelihood function to be
$L(x_1, x_2, \ldots, x_n; \alpha, \beta) = \log f_X(x_1, x_2, \ldots, x_n; \alpha, \beta)$
$= n\alpha\log\beta - n\log\Gamma(\alpha) + (\alpha - 1)\sum_{i=1}^{n}\log x_i - \beta\sum_{i=1}^{n} x_i.$
Example - continued
 Differentiating $L$ with respect to $\alpha$ and $\beta$, we get
$\left.\frac{\partial L}{\partial \alpha}\right|_{\alpha = \hat\alpha,\ \beta = \hat\beta} = n\log\hat\beta - n\,\frac{\Gamma'(\hat\alpha)}{\Gamma(\hat\alpha)} + \sum_{i=1}^{n}\log x_i = 0,$
$\left.\frac{\partial L}{\partial \beta}\right|_{\alpha = \hat\alpha,\ \beta = \hat\beta} = \frac{n\hat\alpha}{\hat\beta} - \sum_{i=1}^{n} x_i = 0.$
 Thus,
$\hat\beta_{ML}(X) = \frac{\hat\alpha_{ML}}{\frac{1}{n}\sum_{i=1}^{n} x_i},$
 so,
$\log\hat\alpha_{ML} - \frac{\Gamma'(\hat\alpha_{ML})}{\Gamma(\hat\alpha_{ML})} = \log\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) - \frac{1}{n}\sum_{i=1}^{n}\log x_i.$
 Notice that this equation is highly nonlinear in $\hat\alpha_{ML}$.
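 Since this equation has no closed-form solution, $\hat\alpha_{ML}$ is usually found numerically. A minimal sketch, assuming SciPy is available and using simulated data with arbitrary true parameters (an illustration, not part of the lecture):

import numpy as np
from scipy.special import digamma      # Gamma'(a) / Gamma(a)
from scipy.optimize import brentq

# Solve log(alpha) - digamma(alpha) = log(mean(x)) - mean(log(x)) for alpha_hat,
# then beta_hat = alpha_hat / mean(x). Data are simulated for illustration.
rng = np.random.default_rng(3)
x = rng.gamma(shape=2.5, scale=1.0 / 1.5, size=1000)   # true alpha = 2.5, beta = 1.5

s = np.log(x.mean()) - np.log(x).mean()                # always > 0 by Jensen's inequality
alpha_hat = brentq(lambda a: np.log(a) - digamma(a) - s, 1e-6, 1e6)
beta_hat = alpha_hat / x.mean()
print(alpha_hat, beta_hat)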
Conclusion
 In general the (log)-likelihood function
- can have more than one solution, or no solutions at all.
- may not even be differentiable
- can be extremely complicated to solve explicitly
Best Unbiased Estimator
 We have seen that $\hat\theta_{ML}(X) = \frac{1}{n}\sum_{i=1}^{n} X_i$ represents an unbiased estimator for $\theta$ with variance $\frac{\sigma^2}{n}$.
 It is possible that, for a given $n$, there may be other unbiased estimators for this problem with even lower variance.
 If such is indeed the case, those estimators will naturally be preferable to the previous one.
 Is it possible to determine the lowest possible value for the variance of
any unbiased estimator?
 A theorem by Cramer and Rao gives a complete answer to this problem.
Cramer - Rao Bound
 The variance of any unbiased estimator $\hat\theta$ based on observations $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$ for $\theta$ must satisfy the lower bound
$\operatorname{Var}(\hat\theta) \;\ge\; \frac{1}{E\left[\left(\dfrac{\partial\ln f_X(x_1, x_2, \ldots, x_n; \theta)}{\partial\theta}\right)^{2}\right]} \;=\; \frac{-1}{E\left[\dfrac{\partial^{2}\ln f_X(x_1, x_2, \ldots, x_n; \theta)}{\partial\theta^{2}}\right]}.$
 The right-hand side of the above equation acts as a lower bound on the variance of all unbiased estimators for $\theta$, provided their joint p.d.f satisfies certain regularity restrictions (see (8-79)-(8-81), Text).
Efficient Estimators
 Any unbiased estimator whose variance coincides with the Cramer-Rao bound must be the best.
 Such estimators are known as efficient estimators.
 Let us examine whether $\hat\theta_{ML}(X) = \frac{1}{n}\sum_{i=1}^{n} X_i$ is efficient.
$\left(\frac{\partial\ln f_X(x_1, x_2, \ldots, x_n; \theta)}{\partial\theta}\right)^{2} = \frac{1}{\sigma^{4}}\left(\sum_{i=1}^{n}(X_i - \theta)\right)^{2};$
 and
$E\left[\left(\frac{\partial\ln f_X(x_1, x_2, \ldots, x_n; \theta)}{\partial\theta}\right)^{2}\right] = \frac{1}{\sigma^{4}}\left[\sum_{i=1}^{n} E\big[(X_i - \theta)^{2}\big] + \sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n} E\big[(X_i - \theta)(X_j - \theta)\big]\right] = \frac{1}{\sigma^{4}}\, n\sigma^{2} = \frac{n}{\sigma^{2}}.$
 So the Cramer-Rao lower bound is $\frac{\sigma^{2}}{n}$.
Rao-Blackwell Theorem
 As we obtained before, the variance of this ML estimator equals the specified bound, so it is an efficient estimator.
 If there are no unbiased estimators that are efficient, the best estimator
will be an unbiased estimator with the lowest possible variance.
 How does one find such an unbiased estimator?
 Rao-Blackwell theorem gives a complete answer to this problem.
 The Cramer-Rao bound can be extended to the multiparameter case as well.
Estimating Parameters with a-priori p.d.f
 So far, we discussed nonrandom parameters that are unknown.
 What if the parameter of interest is a r.v with a-priori p.d.f $f_{\theta}(\theta)$?
 How does one obtain a good estimate for $\theta$ based on the observations $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$?
 One technique is to use the observations to compute its a-posteriori p.d.f $f_{\theta|X}(\theta \mid x_1, x_2, \ldots, x_n)$.
 Of course, we can use Bayes' theorem to obtain this a-posteriori p.d.f:
$f_{\theta|X}(\theta \mid x_1, x_2, \ldots, x_n) = \frac{f_{X|\theta}(x_1, x_2, \ldots, x_n \mid \theta)\, f_{\theta}(\theta)}{f_X(x_1, x_2, \ldots, x_n)}.$
 Notice that this is only a function of $\theta$, since $x_1, x_2, \ldots, x_n$ represent given observations.
MAP Estimator
 Once again, we can look for the most probable value of $\theta$ suggested by the above a-posteriori p.d.f.
 Naturally, the most likely value for $\theta$ is the one corresponding to the maximum of the a-posteriori p.d.f (the MAP estimator for $\theta$).
f ( | x1 , x2 , , xn )
ˆ

MAP
 It is possible to use other optimality criteria as well.
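 As a concrete sketch (not from the slides), the MAP estimate can be computed numerically by maximizing the log-prior plus the log-likelihood; the Gaussian prior/likelihood pair and all numbers below are assumptions chosen for illustration.

import numpy as np
from scipy.optimize import minimize_scalar

# MAP sketch: maximize log f(theta) + log f(x | theta), i.e. minimize the negative.
# Assumed model: theta ~ N(mu0, tau^2) a priori, X_i | theta ~ N(theta, sigma^2).
rng = np.random.default_rng(4)
mu0, tau, sigma = 0.0, 1.0, 2.0
x = 1.5 + sigma * rng.standard_normal(20)     # observations from a "true" theta = 1.5

def neg_log_posterior(theta):
    log_prior = -0.5 * (theta - mu0) ** 2 / tau**2
    log_lik = -0.5 * np.sum((x - theta) ** 2) / sigma**2
    return -(log_prior + log_lik)

theta_map = minimize_scalar(neg_log_posterior).x
print(theta_map, x.mean())   # the MAP estimate is pulled from the sample mean toward mu0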