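Lecture 5: Decision and inference

The decisive approach to statistical inference

Point estimation of an unknown parameter $\theta$:
• The decision rule is a point estimator (the functional form): $\hat\theta = \hat\theta(X)$
• The action is a particular point estimate: $\hat\theta_{obs} = \hat\theta(x)$
• State of nature is the true value of $\theta$.
• The loss function is a measure of how far away the estimator is from $\theta$: $L_S = L_S(\theta, \hat\theta)$
• Prior information is quantified by the prior distribution (pdf/pmf) $p(\theta)$.
• Data is the random sample $x$ from a distribution with (pdf/pmf) $f(x \mid \theta)$.

Three simple loss functions

Zero-one loss:
$$L_S(\theta, \hat\theta) = \begin{cases} 0 & |\hat\theta - \theta| \le m \\ k & |\hat\theta - \theta| > m \end{cases} \qquad k, m > 0$$

Absolute error loss:
$$L_S(\theta, \hat\theta) = k\,|\hat\theta - \theta|, \qquad k > 0$$

Quadratic (error) loss:
$$L_S(\theta, \hat\theta) = k\,(\hat\theta - \theta)^2, \qquad k > 0$$

Minimax estimators:
Find the value of $\theta$ that maximizes the expected loss with respect to the sample values, i.e. that maximizes $E_X\left[L_S(\theta, \hat\theta(X))\right]$ over the set of estimators $\hat\theta(X)$. Then, the particular estimator that minimizes the risk for that value of $\theta$ is the minimax estimator.

Bayes estimators:
A Bayes estimator is the estimator that minimizes
$$\int_\theta L_S(\theta, \hat\theta(x))\, q(\theta \mid x)\, d\theta$$
Minimization with respect to different loss functions will result in measures of location in the posterior distribution of $\theta$.
• Zero-one loss: $\hat\theta_B(x)$ is the posterior mode for $\theta$ given $x$
• Absolute error loss: $\hat\theta_B(x)$ is the posterior median for $\theta$ given $x$
• Quadratic loss: $\hat\theta_B(x)$ is the posterior mean for $\theta$ given $x$: $E(\theta \mid x)$

Example

Assume we have a sample $x = (x_1, \ldots, x_n)$ from $U(0, \theta)$ and that a prior density for $\theta$ is the Pareto density
$$p(\theta) = (\alpha - 1)\,\beta^{\alpha - 1}\,\theta^{-\alpha}, \qquad \theta \ge \beta;\ \alpha > 1,\ \beta > 0$$

[Figure: Pareto prior density for $\alpha = 3$, $\beta = 1$]

What is the Bayes estimator of $\theta$ under quadratic loss?

The posterior distribution is also Pareto with
$$q(\theta \mid x) = (n + \alpha - 1)\,\left[\max(\beta, x_{(n)})\right]^{n + \alpha - 1}\,\theta^{-(n + \alpha)}, \qquad \theta \ge \max(\beta, x_{(n)})$$
where $x_{(n)}$ is the largest observation in the sample. Hence
$$\hat\theta_B = E(\theta \mid x) = \int_{\max(\beta, x_{(n)})}^{\infty} \theta\,(n + \alpha - 1)\,\left[\max(\beta, x_{(n)})\right]^{n + \alpha - 1}\,\theta^{-(n + \alpha)}\, d\theta$$
$$= (n + \alpha - 1)\,\left[\max(\beta, x_{(n)})\right]^{n + \alpha - 1} \int_{\max(\beta, x_{(n)})}^{\infty} \theta^{-(n + \alpha - 1)}\, d\theta = \frac{n + \alpha - 1}{n + \alpha - 2}\,\max(\beta, x_{(n)})$$

Compare with $\hat\theta_{ML} = x_{(n)}$.

Hypothesis testing

Test of $H_0$: $\theta = \theta_0$ vs.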
$H_1$: $\theta = \theta_1$

Classic test (Neyman-Pearson):

Decision rule: $\delta_C$ = a test with critical region $C$
Action: $\delta_C(x)$ = “Reject $H_0$ if $x \in C$, otherwise accept $H_0$”
Loss function:
• $c_I$: Cost of making a type-I error
• $c_II$: Cost of making a type-II error

Risk function:
$$R(\delta_C; \theta) = E_X\left[L_S(\theta, \delta_C(X))\right]$$
$$= (\text{Loss when rejecting } H_0 \text{ for true value } \theta) \cdot \Pr(X \in C \mid \theta) + (\text{Loss when accepting } H_0 \text{ for true value } \theta) \cdot \Pr(X \notin C \mid \theta)$$
With $\alpha = \Pr(X \in C \mid \theta_0)$ and $\beta = \Pr(X \notin C \mid \theta_1)$:
$$R(\delta_C; \theta_0) = c_I \cdot \alpha + 0 \cdot (1 - \alpha) = c_I\,\alpha$$
$$R(\delta_C; \theta_1) = 0 \cdot (1 - \beta) + c_{II} \cdot \beta = c_{II}\,\beta$$

Assume a prior setting $p_0 = \Pr(H_0 \text{ is true}) = \Pr(\theta = \theta_0)$ and $p_1 = \Pr(H_1 \text{ is true}) = \Pr(\theta = \theta_1)$.

The prior expected risk becomes
$$E_\theta\left[R(\delta_C; \theta)\right] = c_I\,\alpha\,p_0 + c_{II}\,\beta\,p_1$$

Bayes classic test:
$$\delta_B = \arg\min_{\delta_C} E_\theta\left[R(\delta_C; \theta)\right] = \arg\min_{\delta_C} \left(c_I\,\alpha\,p_0 + c_{II}\,\beta\,p_1\right)$$

Minimax test:
$$\delta^* = \arg\min_{\delta_C} \max_\theta R(\delta_C; \theta) = \arg\min_{\delta_C} \max\left(c_I\,\alpha,\ c_{II}\,\beta\right)$$

Lemma: Bayes classic tests and most powerful tests (Neyman-Pearson lemma) are equivalent in that
• every most powerful test is a Bayes classic test for some values of $p_0$ and $p_1$
• every Bayes classic test is a most powerful test with
$$\frac{L(\theta_1; x)}{L(\theta_0; x)} \ge \frac{p_0\,c_I}{p_1\,c_{II}}$$

Finite action problems

Bayesian hypothesis testing has previously in the course referred to calculation of the posterior odds:
$$\frac{\Pr(H_0 \mid x)}{\Pr(H_1 \mid x)} = B \cdot \frac{\Pr(H_0)}{\Pr(H_1)}$$

Concluding which of $H_0$ and $H_1$ should be retained has so far been a question of whether the posterior probability of one of the hypotheses is “high enough”.

Coupling the posterior probabilities with losses (or utilities) will define a decision problem.

The loss function is
• $c_0$: Cost of accepting $H_0$ when $H_1$ is true
• $c_1$: Cost of accepting $H_1$ when $H_0$ is true

The Bayes action is the action that minimises the expected posterior loss:
• Accept $H_0$: $0 \cdot \Pr(H_0 \mid x) + c_0 \cdot \Pr(H_1 \mid x) = c_0\,\Pr(H_1 \mid x)$
• Accept $H_1$: $c_1 \cdot \Pr(H_0 \mid x) + 0 \cdot \Pr(H_1 \mid x) = c_1\,\Pr(H_0 \mid x)$

Example: Return to the example with dye on banknotes from lecture 1:

Assume a method for detecting a certain kind of dye on banknotes is such that
• It gives a positive result (detection) in 99 % of the cases when the dye is present, i.e. the proportion of false negatives is 1 %
• It gives a negative result in 98 % of the cases when the dye is absent, i.e.
the proportion of false positives is 2 %
• The presence of dye is rare: prevalence is about 0.1 %

Assume the method has given a positive result for a particular banknote. What should we conclude about the presence of dye?

Can we formulate this as a decision problem?

Assume that…
• The banknote is a SEK 100 banknote
• If we deem the banknote to have been contaminated with the dye, we will consider it useless and it will be destroyed
• If we deem the banknote not to have been contaminated with the dye, we will use it (in the future) for ordinary purchasing
• Upon using the banknote for purchasing, if it is revealed (by other means than our method) that the banknote is contaminated with the dye, there is a fine of SEK 500

Hence, our loss function can be written

                          Dye is present    Dye is not present
Destroy the banknote            0                  100
Use the banknote              500                    0

The posterior probabilities were obtained before:
$$\Pr(\text{“Dye is present”} \mid \text{“Positive detection”}) = 0.047$$
$$\Pr(\text{“Dye is not present”} \mid \text{“Positive detection”}) = 0.953$$

Hence, the expected posterior losses are
• Destroy the banknote: $100 \cdot 0.953 = 95.3$
• Use the banknote: $500 \cdot 0.047 = 23.5$

Minimising the expected posterior loss gives the action “Use the banknote”.

How high must the fine be for the action to be changed?
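
The banknote calculation can be checked numerically. The following is a minimal Python sketch, not part of the lecture material: the function names (`posterior_dye_given_positive`, `expected_losses`, `break_even_fine`) are illustrative, and the losses for correct decisions are assumed to be zero, as implied by the expected losses above. It uses the unrounded posterior probability, so the expected losses differ slightly from the rounded slide values 95.3 and 23.5.

```python
# Illustrative sketch of the banknote decision problem (all names are hypothetical).

def posterior_dye_given_positive(sensitivity, specificity, prevalence):
    """Bayes' theorem: Pr(dye present | positive detection)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

def expected_losses(p_dye, note_value, fine):
    """Expected posterior loss per action; correct decisions are assumed to cost 0."""
    return {
        "destroy": note_value * (1.0 - p_dye),  # a clean banknote is thrown away
        "use": fine * p_dye,                    # a contaminated banknote is fined
    }

def break_even_fine(p_dye, note_value):
    """Fine at which both actions have the same expected posterior loss."""
    return note_value * (1.0 - p_dye) / p_dye

# Values from the example: 99 % sensitivity, 98 % specificity, 0.1 % prevalence.
p_dye = posterior_dye_given_positive(0.99, 0.98, 0.001)
losses = expected_losses(p_dye, note_value=100.0, fine=500.0)
bayes_action = min(losses, key=losses.get)
threshold = break_even_fine(p_dye, note_value=100.0)

print(f"Pr(dye | positive) = {p_dye:.3f}")
print(f"Expected losses: {losses}")
print(f"Bayes action: {bayes_action}")
print(f"Break-even fine: SEK {threshold:.2f}")
```

Under these assumptions the action changes to “Destroy the banknote” once the fine exceeds $100 \cdot \Pr(\text{not present} \mid \text{positive}) / \Pr(\text{present} \mid \text{positive})$, roughly SEK 2018 with the unrounded posterior.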