Lecture 5: Decision and inference

The decision-theoretic approach to statistical inference
Point estimation of an unknown parameter $\theta$:
• The decision rule is a point estimator (the functional form): $\hat{\theta}(X)$
• The action is a particular point estimate: $\hat{\theta}_{\mathrm{obs}} = \hat{\theta}(x)$
• The state of nature is the true value of $\theta$.
• The loss function is a measure of how far away the estimator is from $\theta$: $L_S = L_S(\theta, \hat{\theta})$
• Prior information is quantified by the prior distribution (pdf/pmf) $p(\theta)$.
• Data is the random sample $x$ from a distribution with pdf/pmf $f(x \mid \theta)$.
Three simple loss functions

Zero-one loss:
$$L_S(\theta, \hat{\theta}) = \begin{cases} 0, & |\hat{\theta} - \theta| \le m \\ k, & |\hat{\theta} - \theta| > m \end{cases} \qquad k, m > 0$$

Absolute error loss:
$$L_S(\theta, \hat{\theta}) = k \cdot |\hat{\theta} - \theta|, \qquad k > 0$$

Quadratic (error) loss:
$$L_S(\theta, \hat{\theta}) = k \cdot (\hat{\theta} - \theta)^2, \qquad k > 0$$
Minimax estimators:
For each estimator $\hat{\theta}(X)$ in the set of estimators under consideration, find the value of $\theta$ that maximizes the expected loss with respect to the sample values, i.e. that maximizes $E_X\left[L_S(\theta, \hat{\theta}(X))\right]$. Then, the particular estimator whose maximal risk is smallest is the minimax estimator.
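In symbols, writing $R(\hat{\theta}; \theta) = E_X\left[L_S(\theta, \hat{\theta}(X))\right]$ for the risk, the description above amounts to:

```latex
\hat{\theta}^{*} = \arg\min_{\hat{\theta}(X)} \; \max_{\theta} \; R\bigl(\hat{\theta};\, \theta\bigr)
```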
Bayes estimators:
A Bayes estimator is the estimator that minimizes the expected posterior loss
$$\int L_S(\theta, \hat{\theta}(x)) \, q(\theta \mid x) \, d\theta$$
Minimization with respect to different loss functions will result in measures of location of the posterior distribution of $\theta$.
Zero-one loss:
$\hat{\theta}_B(x)$ is the posterior mode of $\theta$ given $x$
Absolute error loss:
$\hat{\theta}_B(x)$ is the posterior median of $\theta$ given $x$
Quadratic loss:
$\hat{\theta}_B(x)$ is the posterior mean of $\theta$ given $x$: $E(\theta \mid x)$
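These correspondences can be checked numerically. Below is a minimal Monte Carlo sketch using samples from an assumed posterior, here a Gamma(3, 2) chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
post = rng.gamma(shape=3.0, scale=2.0, size=20_000)  # assumed posterior samples

grid = np.linspace(post.min(), post.max(), 400)      # candidate point estimates

# Monte Carlo estimate of the expected posterior loss for each candidate
quad_risk = np.array([np.mean((g - post) ** 2) for g in grid])
abs_risk = np.array([np.mean(np.abs(g - post)) for g in grid])

print("argmin quadratic loss:", grid[quad_risk.argmin()], "~ posterior mean:", post.mean())
print("argmin absolute loss: ", grid[abs_risk.argmin()], "~ posterior median:", np.median(post))
```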
Example
Assume we have a sample $x = (x_1, \ldots, x_n)$ from $U(0, \theta)$ and that the prior density for $\theta$ is the Pareto density
$$p(\theta) = \frac{\alpha \beta^{\alpha}}{\theta^{\alpha + 1}}, \quad \theta \ge \beta; \qquad \alpha > 1, \; \beta > 0$$
[Figure: the Pareto prior density, drawn with $\alpha = 3$, $\beta = 1$]
What is the Bayes estimator of $\theta$ under quadratic loss?
The posterior distribution is also Pareto:
$$q(\theta \mid x) = \frac{(\alpha + n) \left[\max(\beta, x_{(n)})\right]^{\alpha + n}}{\theta^{\alpha + n + 1}}, \quad \theta \ge \max(\beta, x_{(n)})$$
i.e. Pareto with $\tilde{\alpha} = \alpha + n$ and $\tilde{\beta} = \max(\beta, x_{(n)})$.
Hence
$$\hat{\theta}_B = E(\theta \mid x) = \int_{\max(\beta, x_{(n)})}^{\infty} \theta \cdot \frac{(\alpha + n) \left[\max(\beta, x_{(n)})\right]^{\alpha + n}}{\theta^{\alpha + n + 1}} \, d\theta$$
$$= (\alpha + n) \left[\max(\beta, x_{(n)})\right]^{\alpha + n} \int_{\max(\beta, x_{(n)})}^{\infty} \theta^{-(\alpha + n)} \, d\theta = \frac{\alpha + n}{\alpha + n - 1} \max(\beta, x_{(n)})$$
Compare with $\hat{\theta}_{ML} = x_{(n)}$.
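As a sanity check, here is a short simulation under assumed values ($\alpha = 3$, $\beta = 1$ as in the figure, and an illustrative true $\theta$):

```python
import numpy as np

rng = np.random.default_rng(0)

theta_true = 2.5          # illustrative true parameter value
n = 20
x = rng.uniform(0.0, theta_true, size=n)

alpha, beta = 3.0, 1.0    # Pareto prior parameters
x_max = x.max()           # the sufficient statistic x_(n)

theta_ml = x_max
theta_bayes = (alpha + n) / (alpha + n - 1) * max(beta, x_max)

print(f"ML estimate:    {theta_ml:.4f}")
print(f"Bayes estimate: {theta_bayes:.4f}")  # slightly larger than the ML estimate
```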
Hypothesis testing
Test of $H_0$: $\theta = \theta_0$ vs. $H_1$: $\theta = \theta_1$
Classic test (Neyman-Pearson):
• Decision rule: $\delta_C$ = a test with critical region $C$
• Action: $\delta_C(x)$ = "Reject $H_0$ if $x \in C$, otherwise accept $H_0$"
• Loss function:
  $c_I$ : cost of making a type-I error
  $c_{II}$ : cost of making a type-II error
Risk function:
$$R(\delta_C; \theta) = E_X\left[L_S(\theta, \delta_C(X))\right] = (\text{loss when rejecting } H_0 \text{ for true value } \theta) \cdot \Pr(X \in C \mid \theta) + (\text{loss when accepting } H_0 \text{ for true value } \theta) \cdot \Pr(X \notin C \mid \theta)$$
With the type-I error probability $\alpha = \Pr(X \in C \mid \theta_0)$ and the type-II error probability $\beta = \Pr(X \notin C \mid \theta_1)$:
$$R(\delta_C; \theta_0) = c_I \cdot \alpha + 0 \cdot (1 - \alpha) = c_I \, \alpha$$
$$R(\delta_C; \theta_1) = 0 \cdot (1 - \beta) + c_{II} \cdot \beta = c_{II} \, \beta$$
Assume a prior setting $p_0 = \Pr(H_0 \text{ is true}) = \Pr(\theta = \theta_0)$ and $p_1 = \Pr(H_1 \text{ is true}) = \Pr(\theta = \theta_1)$.
⇒ The prior expected risk becomes
$$E_\theta\left[R(\delta_C; \theta)\right] = c_I \, \alpha \, p_0 + c_{II} \, \beta \, p_1$$
Bayes classic test:
$$\delta_B = \arg\min_C E_\theta\left[R(\delta_C; \theta)\right] = \arg\min_C \left(c_I \, \alpha \, p_0 + c_{II} \, \beta \, p_1\right)$$
Minimax test:
$$\delta^{*} = \arg\min_C \max_\theta R(\delta_C; \theta) = \arg\min_C \max\left(c_I \, \alpha, \; c_{II} \, \beta\right)$$
Lemma: Bayes classic tests and most powerful tests (Neyman-Pearson lemma) are equivalent in that
• every most powerful test is a Bayes classic test for some values of $p_0$ and $p_1$
• every Bayes classic test is a most powerful test, rejecting $H_0$ when
$$\frac{L(\theta_1; x)}{L(\theta_0; x)} > \frac{p_0 \, c_I}{p_1 \, c_{II}}$$
Finite action problems
Bayesian hypothesis testing has previously in the course referred to calculation of the posterior odds:
$$\frac{\Pr(H_0 \mid x)}{\Pr(H_1 \mid x)} = B \cdot \frac{\Pr(H_0)}{\Pr(H_1)}$$
where $B$ is the Bayes factor.
Concluding which of $H_0$ and $H_1$ should be the hypothesis to be retained has so far been a question of whether the posterior probability of one of the hypotheses is "high enough".
Coupling the posterior probabilities with losses (or utilities) will define a decision problem.
The loss function is
  $c_0$ : cost of accepting $H_0$ when $H_1$ is true
  $c_1$ : cost of accepting $H_1$ when $H_0$ is true
The Bayes action is the action that minimises the expected posterior loss:
• Accept $H_0$: $0 \cdot \Pr(H_0 \mid x) + c_0 \cdot \Pr(H_1 \mid x) = c_0 \cdot \Pr(H_1 \mid x)$
• Accept $H_1$: $c_1 \cdot \Pr(H_0 \mid x) + 0 \cdot \Pr(H_1 \mid x) = c_1 \cdot \Pr(H_0 \mid x)$
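A tiny generic helper along these lines (the function and argument names are my own):

```python
def bayes_action(post_H0, c0, c1):
    """Choose the action minimising expected posterior loss.

    post_H0 : posterior probability of H0 given the data
    c0      : cost of accepting H0 when H1 is true
    c1      : cost of accepting H1 when H0 is true
    """
    loss_accept_H0 = c0 * (1.0 - post_H0)  # expected loss of accepting H0
    loss_accept_H1 = c1 * post_H0          # expected loss of accepting H1
    if loss_accept_H0 <= loss_accept_H1:
        return "accept H0", loss_accept_H0
    return "accept H1", loss_accept_H1
```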
Example: Return to the example with dye on banknotes from lecture 1:
Assume a method for detecting a certain kind of dye on banknotes is such that
• it gives a positive result (detection) in 99% of the cases when the dye is present, i.e. the proportion of false negatives is 1%
• it gives a negative result in 98% of the cases when the dye is absent, i.e. the proportion of false positives is 2%
• the presence of dye is rare: the prevalence is about 0.1%
Assume the method has given a positive result for a particular banknote.
What should we conclude about the presence of dye?
Can we formulate this as a decision problem?
Assume that…
• The banknote is a SEK 100 banknote
• If we deem the banknote to have been contaminated with the dye, we will consider it useless and it will be destroyed
• If we deem the banknote not to have been contaminated with the dye, we will use it (in the future) for ordinary purchasing
• Upon using the banknote for purchasing, if it is revealed (by other means than our method) that the banknote is contaminated with the dye, there is a fine of SEK 500
Hence, our loss function can be written

                          Dye present    Dye absent
  Destroy the banknote         0            100
  Use the banknote           500              0
The posterior probabilities were obtained before:
$$\Pr(\text{"Dye is present"} \mid \text{"Positive detection"}) \approx 0.047$$
$$\Pr(\text{"Dye is not present"} \mid \text{"Positive detection"}) \approx 0.953$$
Hence, the expected posterior losses are
• Destroy the banknote: $100 \cdot 0.953 = 95.3$
• Use the banknote: $500 \cdot 0.047 = 23.5$
Minimising the expected posterior loss gives the action "Use the banknote".
How high must the fine be for the action to be changed?
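A short sketch reproducing these numbers and answering the final question numerically; the action switches once fine × Pr(dye | positive) exceeds 100 × Pr(no dye | positive):

```python
# Posterior probabilities for the banknote example and the
# break-even fine at which the Bayes action switches to "destroy".
prevalence = 0.001       # Pr(dye present)
sensitivity = 0.99       # Pr(positive | dye present)
false_pos = 0.02         # Pr(positive | dye absent)

p_positive = sensitivity * prevalence + false_pos * (1 - prevalence)
p_dye = sensitivity * prevalence / p_positive          # ~0.047
p_no_dye = 1 - p_dye                                   # ~0.953

value = 100.0            # SEK, lost if a clean banknote is destroyed
fine = 500.0             # SEK, paid if a contaminated banknote is used

print("Expected loss, destroy:", value * p_no_dye)     # ~95.3
print("Expected loss, use:    ", fine * p_dye)         # ~23.5

# The action changes when fine * p_dye > value * p_no_dye:
print("Break-even fine (SEK): ", value * p_no_dye / p_dye)
```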