# Outline • Parameter estimation – Maximum likelihood estimation

```Outline
• Parameter estimation
– Maximum likelihood estimation
Bayes Decision Theory
• Assumptions
– Suppose that there are c categories
• {ω1, ω2, ..., ωc}
– The prior probabilities and class-conditional
densities are known
– There are a possible actions
• {α1, α2, ..., αa}
– The loss function λ(αi | ωj) describes the loss incurred
for taking action αi when the state of nature is ωj
5/29/2016
Visual Perception Modeling
Bayes Decision Rule
• To minimize the overall risk, compute the
conditional risk and select the action for
which the conditional risk is minimum
$$ R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j) \, P(\omega_j \mid x) $$
– The resulting minimum overall risk is called the
Bayes risk, which is the best performance that can be achieved
Discriminant Functions for Normal Density
• Minimum error rate classification for normal
density
$$ g_i(x) = -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2} \ln(2\pi) - \frac{1}{2} \ln|\Sigma_i| + \ln P(\omega_i) $$
• Three different cases
Parameter Estimation
• We could design an optimal classifier if we
knew the prior probabilities and the class-conditional densities
– Unfortunately, in pattern recognition applications we
rarely have this kind of complete knowledge about
the probabilistic structure of the problem
• Training data
– Some vague, general knowledge about the problem
– A number of design samples
Parameter Estimation – cont.
• Two approaches
– Parameter estimation
• Estimate the parameters of the unknown probabilities
and probability densities
– Non-parametric procedures
• Multi-layer perceptrons and in general neural
networks
• Fisher linear discriminant function
• Work in the feature space directly
Parameter Estimation – cont.
• Parameter estimation
– Maximum-likelihood approach
• Parameters as quantities whose values are fixed but
unknown
• The best estimate of their value is the one that
maximizes the probability of obtaining the samples
– Bayesian learning
• Parameters are random variables with known prior
distribution
• Observations convert the prior into a posterior distribution
Maximum-Likelihood Estimation
• Assumptions
– We separate a collection of samples according to class
• D1, D2, ....., Dc
– Samples in Dj are drawn independently according to
the density p(x|ωj)
– We assume that p(x|ωj) has a known parametric form
and is uniquely determined by the value of a
parameter vector θj
– To simplify further, we assume that samples in Di give
no information about θj if i ≠ j
Maximum-Likelihood Estimation – cont.
• Suppose that D contains n samples
– x1, ....., xn
– Because the samples were drawn
independently, we have
$$ p(D \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta) $$
– The maximum-likelihood estimate of θ is the
value θ* that maximizes p(D|θ)
Maximum-Likelihood Estimation – cont.
• Log-likelihood
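$$ l(\theta) = \ln p(D \mid \theta) $$
$$ \theta^* = \arg\max_{\theta} \, l(\theta) $$
$$ l(\theta) = \sum_{k=1}^{n} \ln p(x_k \mid \theta) $$
$$ \nabla_\theta \, l(\theta) = \sum_{k=1}^{n} \nabla_\theta \ln p(x_k \mid \theta) $$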
Maximum-Likelihood Estimation – cont.
• A necessary condition for the maximum-likelihood solution is
$$ \nabla_\theta \, l(\theta) = 0 $$
– A solution θ* can be a true global maximum, a
local maximum, a minimum, or an inflection
point of l(θ)
• We need to check each solution individually
• Or calculate the second derivatives to confirm that a solution
is a maximum, then compare the maxima to identify the
global optimum
Maximum-Likelihood Estimation – cont.
• Gaussian case - Unknown μ
$$ \ln p(x_k \mid \mu) = -\frac{1}{2} \ln\left[(2\pi)^d |\Sigma|\right] - \frac{1}{2} (x_k - \mu)^T \Sigma^{-1} (x_k - \mu) $$
$$ \hat{\mu} = \frac{1}{n} \sum_{k=1}^{n} x_k $$
Maximum-Likelihood Estimation – cont.
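• Gaussian case - Unknown μ and Σ
– Univariate case, with θ1 = μ and θ2 = σ²
$$ \ln p(x_k \mid \theta) = -\frac{1}{2} \ln(2\pi\theta_2) - \frac{1}{2\theta_2} (x_k - \theta_1)^2 $$
$$ \nabla_\theta \ln p(x_k \mid \theta) = \begin{bmatrix} \frac{1}{\theta_2} (x_k - \theta_1) \\ -\frac{1}{2\theta_2} + \frac{(x_k - \theta_1)^2}{2\theta_2^2} \end{bmatrix} $$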
Maximum-Likelihood Estimation – cont.
• Gaussian case - Unknown μ and Σ, continued
$$ \hat{\mu} = \frac{1}{n} \sum_{k=1}^{n} x_k $$
$$ \hat{\sigma}^2 = \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{\mu})^2 $$
Maximum-Likelihood Estimation – cont.
• Bias
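– The maximum-likelihood estimate of the variance is biased
$$ E\left[ \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{\mu})^2 \right] = \frac{n-1}{n} \sigma^2 \neq \sigma^2 $$
– For a large number of samples, (n-1)/n approaches 1
and the bias becomes negligible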
```
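
The closed-form estimates for the univariate Gaussian case, and the bias of the variance estimate, can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names `gaussian_mle` and `average_ml_variance`, the sample size, the number of trials, and the seed are all illustrative assumptions. It uses only the Python standard library.

```python
import random


def gaussian_mle(samples):
    """Return the ML estimates (mu_hat, sigma2_hat) for a univariate Gaussian.

    mu_hat is the sample mean; sigma2_hat divides by n (not n - 1),
    which is what maximizing the log-likelihood gives.
    """
    n = len(samples)
    mu_hat = sum(samples) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in samples) / n
    return mu_hat, sigma2_hat


def average_ml_variance(n, trials, mu=0.0, sigma=1.0, seed=0):
    """Average the ML variance estimate over many synthetic data sets.

    Hypothetical helper for illustration: each data set holds n samples
    drawn from N(mu, sigma^2). The average should be close to
    (n - 1) / n * sigma^2 rather than sigma^2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        data = [rng.gauss(mu, sigma) for _ in range(n)]
        total += gaussian_mle(data)[1]
    return total / trials


if __name__ == "__main__":
    # Small worked example: mean 2.5, ML variance 5 / 4 = 1.25.
    print(gaussian_mle([1.0, 2.0, 3.0, 4.0]))

    # With n = 5 and sigma^2 = 1, the expected ML variance is 4/5 = 0.8.
    print(average_ml_variance(n=5, trials=20000))
```

With a small n, the averaged estimate should sit visibly below the true variance. Increasing n in the second call moves it toward σ², which is the large-sample behavior noted on the bias slide.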