Assessing Probabilities in Risk and Decision Analysis
Aron Larsson, SU/DSV and MIUN/ITM
2013-10-15

Probabilities in risk analysis
- "A measure of uncertainty of an observable quantity Y"
- Probabilities are subjective, based on the assessor's knowledge
- There exists no "true" probability assignment

The basic problem
- Given a measurable quantity Y, we want to specify a probability distribution P(Y ≤ y) for y > 0
- This is done given background information K, represented as hard data y1, …, yn and as expert knowledge
- The hard data is more or less relevant

Evaluating probability assignments
- Pragmatic criterion: accordance with observable data
- Semantic criterion: calibration, accordance with future outcomes
- Syntactic criterion: coherence, assigned probabilities should conform to the laws of probability theory

Using classical statistics
- Let Y be a binary quantity (one or zero): P(Y = 1) = (1/n) Σi yi = (y1 + y2 + … + yn)/n
- Let Y be a real-valued quantity: P(Y ≤ y) = (1/n) Σi I(yi ≤ y), where I() is the indicator function
- Needs n observations, and n must be "sufficiently large" (n ≥ 10, provided that not all yi are either 0 or 1)

Maximum likelihood estimation
- Assume that we have data from a known parametric distribution (normal, Poisson, beta, etc.)
- We wish to estimate the parameters θ = (θ1, …, θk) of the distribution
- The MLE is the value of the parameters that makes the observed data most likely

Maximum likelihood estimation (cont'd)
- We have n i.i.d. samples x1, …, xn
- Specify the joint distribution f(x1, …, xn | θ) = f(x1 | θ) f(x2 | θ) … f(xn | θ)
- Now regard x1, …, xn as fixed and let θ vary; a likelihood function for θ can then be formulated as L(θ | x1, …, xn) = Πi f(xi | θ), or on log scale ln L(θ | x1, …, xn) = Σi ln f(xi | θ)
- We estimate θ by finding a value that maximises L
- As it turns out, this is very easy for some parametric distributions (the normal, the exponential, the Poisson)
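To illustrate the last point, a minimal sketch, assuming Python; the exponential model, the sample size, and all variable names are my own choices, not from the slides. For exponential data the maximiser has the closed form 1/mean, and a crude grid search over the log-likelihood finds the same value.

```python
import math
import random

# f(x | lam) = lam * exp(-lam * x), so
# ln L(lam | x1..xn) = n*ln(lam) - lam * sum(xi),
# which is maximised at lam_hat = n / sum(xi) = 1 / mean(xi).

random.seed(1)
true_lam = 2.0
data = [random.expovariate(true_lam) for _ in range(1000)]

def log_likelihood(lam, xs):
    return len(xs) * math.log(lam) - lam * sum(xs)

# Analytic MLE for the exponential distribution
lam_hat = len(data) / sum(data)

# Crude grid search, just to illustrate "pick the parameter value that
# makes the observed data most likely".
grid = [0.1 * k for k in range(1, 100)]
lam_grid = max(grid, key=lambda lam: log_likelihood(lam, data))

print(f"analytic MLE: {lam_hat:.3f}, grid-search MLE: {lam_grid:.3f}")
```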
Bayesian analysis
- Update probabilities when new information becomes available
- We are interested in the probability of Θ = θ, which may be updated by observing x
- Bayes' theorem: P(Ai | B) = P(B | Ai) P(Ai) / Σj=1..n P(B | Aj) P(Aj)

Prior probabilities
- Let Θ = 0 mean "not ill", Θ = 1 mean "moderately ill", and Θ = 2 mean "seriously ill"
- We need a prior probability distribution π(Θ = θ), which we assume can be retrieved from, e.g., health statistics
- The prior distribution gives the probabilities we hold over the outcomes of Θ before any new information is observed

Likelihood principle
- The likelihood principle in Bayesian analysis makes explicit the natural conditional idea that only the actually observed x should be relevant to conclusions or evidence about Θ
- For observed data x, the function L(θ) = f(x | θ) is called the "likelihood function" (note: x given θ here)
- In making inferences or decisions about θ after x is observed, all relevant experimental information is contained in the likelihood function for the observed x

Likelihood function
- Assume we can conduct a test on a patient yielding a positive (1) or negative (0) result
- We need to know about the dependencies, i.e. the conditional probabilities:
  P(X = 1 | Θ = 2) = 0.9
  P(X = 1 | Θ = 1) = 0.6
  P(X = 1 | Θ = 0) = 0.1
- We refer to this as the likelihood function L(x | θ)

Likelihood function and marginal
- Knowing the likelihood function, we can simply obtain P(X = x), i.e. the marginal distribution of X, labelled m(x | π) or m(x)
- For example, m(1) = P(X = 1 | Θ = 2)P(Θ = 2) + P(X = 1 | Θ = 1)P(Θ = 1) + P(X = 1 | Θ = 0)P(Θ = 0) = 0.9 · 0.02 + 0.6 · 0.1 + 0.1 · 0.88 = 0.166
- In general, the marginal density of X is m(x) = Σθ f(x | θ) π(θ) in the discrete case, and m(x) = ∫ f(x | θ) π(θ) dθ in the continuous case

Bayesian updating
- "Knowing" π(θ), m(x), and L(θ), we are now interested in π(θ | x), or P(Θ = θ | X = x)
- That is, we are interested in the probability of the outcomes θ having observed x

The posterior distribution
- Let π(Θ = 2) = 0.02 be a prior probability; we now observe X = 1, then π(2 | 1) = f(1 | 2) π(2) / m(1) = 0.9 · 0.02 / 0.166 = 0.11
- This is now our posterior probability of Θ = 2, or π(2 | 1) = 0.11
- In this discrete case, this is called Bayes' theorem

Adding more information
- So, we now "know" that P(Θ = 2 | X = 1) = 0.11; what if that is not enough?
- Additional information can be sought
- Let X := X1 and do another test X2 (which is conditionally independent of the first test)
- Now we are interested in P(Θ = 2 | X1 = 1, X2 = 1)

Likelihood function again
- From the conditional independence of the tests, the likelihood function is unchanged:
  P(X2 = 1 | Θ = 2, X1 = 1) = 0.9
  P(X2 = 1 | Θ = 1, X1 = 1) = 0.6
  P(X2 = 1 | Θ = 0, X1 = 1) = 0.1
- Replace π(Θ = θ) with π(Θ = θ | X1 = 1) and update again

Posterior as new prior
- Replacing π(θ):
  P(Θ = 2 | X1 = 1) = 0.11
  P(Θ = 1 | X1 = 1) = 0.6 · 0.10 / (0.9 · 0.02 + 0.6 · 0.1 + 0.1 · 0.88) = 0.36
  P(Θ = 0 | X1 = 1) = 0.1 · 0.88 / (0.9 · 0.02 + 0.6 · 0.1 + 0.1 · 0.88) = 0.53

Bayesian updating again
- P(Θ = 2 | X1 = 1, X2 = 1) = 0.9 · 0.11 / (0.9 · 0.11 + 0.6 · 0.36 + 0.1 · 0.53) = 0.27
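The two updates above can be reproduced in a few lines. A minimal sketch, assuming Python; the dictionaries and the helper name `update` are my own, the numbers are those from the slides.

```python
prior = {2: 0.02, 1: 0.10, 0: 0.88}        # pi(theta): seriously / moderately / not ill
likelihood_pos = {2: 0.9, 1: 0.6, 0: 0.1}  # P(X = 1 | theta)

def update(prior, likelihood):
    """Bayes' theorem: posterior(theta) = L(theta) * prior(theta) / m."""
    m = sum(likelihood[t] * prior[t] for t in prior)      # marginal m(x)
    return {t: likelihood[t] * prior[t] / m for t in prior}, m

post1, m1 = update(prior, likelihood_pos)   # after observing X1 = 1
post2, _ = update(post1, likelihood_pos)    # posterior as new prior, then X2 = 1

print(f"m(1) = {m1:.3f}")                            # 0.166
print({t: round(p, 2) for t, p in post1.items()})    # {2: 0.11, 1: 0.36, 0: 0.53}
print(f"P(Theta=2 | X1=1, X2=1) = {post2[2]:.2f}")   # 0.27
```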
Observable parameters
- Assume we make n observations collected in x = (x1, …, xn)
- Each xi is independent of the others and identically distributed, so the joint distribution of the data is p(x1, …, xn | θ) = Πi=1..n p(xi | θ)

Bayesian updating in general
- Let f be a distribution; in general, the conditional distribution of θ given x, called the posterior distribution, can be written as f(θ | x) = L(x | θ) π(θ) / m(x)

Bayesian updating: Example
- Quality engineering – sampling by attributes
- We produce N items in a lot, and we want at most 0.35% of these to be "non-conforming" in terms of quality (the acceptance quality limit a is 0.35%)
- We assume a prior distribution over a
- Then we look at n items from N, the sample size
- The probability of finding zero non-conforming items in our sample, given a certain a, is our likelihood function
- Finding zero non-conforming items will then increase our confidence that the quality is better than a
- [Slide shows a worked example table and chart: lot size 5 000 m² (range 500–10 000 m²), AQL 0.35%, SQL 0.20%, sample size 200, prior confidence 80.22%, posterior confidence 90.14% (difference 9.92%), probability of finding one non-conforming item 33.54%, sampling cost figures, and a plot of the distribution before and after sampling]

When data is missing
- When data is lacking, or existing data is only partially relevant: expert elicitation
- Direct assessment
- Reference games
- Pearson-Tukey

Expert elicitation: Probability wheel
- Two adjustable sectors and a spinner, for visually generating random events of specified probability
- When the expert feels that the probability of ending up in the blue sector is the same as the probability of the event of interest, the probability of the event "equals" the proportion of the blue sector

Expert elicitation: Indifferent bets approach
- Observe how experts behave in gambling situations
- Assume that you are indifferent between the two bets below, and further assume risk neutrality w.r.t. money:
  Bet 1: win €X if Italy wins, lose €Y if France wins
  Bet 2: lose €X if Italy wins, win €Y if France wins
- Since the expected utilities of the two bets are equal, this yields P(Italy wins) = Y/(X + Y)
- Why? Setting the expected values equal, pX − (1 − p)Y = −pX + (1 − p)Y, gives pX = (1 − p)Y, i.e. p = Y/(X + Y), where p = P(Italy wins)
- So, letting X = €10 and Y = €15, P(Italy wins) = 15/25 = 3/5
- Based on this behaviour, the elicited probability that Italy will win is 3/5

Expert elicitation: The reference lottery approach
- Compare two lotteries. Lottery 1: if Italy wins you get 2 weeks paid vacation at a very nice location (Prize A), otherwise you get a glass of beer (Prize B)
- Lottery 2: win Prize A with probability p, win Prize B with probability 1 − p
- Adjust p until you are indifferent between the two lotteries
- When you are indifferent, p is your subjective probability that Italy will win

Continuous probabilities
- In the case of an uncertain but continuous quantity
- For example: "the outcome is a real number between 0 and 1000", as opposed to "the outcome will be either A, B, or C" as is the case for finite quantities
- Continuous quantities often emerge in decision problems, for example in variables such as demand, sales, etc.

Cumulative assessment
- Consider: "the outcome x of random variable (event node) E is a real number between 0 and 1000"
- Cumulative assessments would be
  P(x ≤ 200) = 0.1
  P(x ≤ 400) = 0.3
  P(x ≤ 600) = 0.6
  P(x ≤ 800) = 0.95
  P(x ≤ 1000) = 1

Cumulative assessment graph
- [Figure: cumulative probability plotted against value for the assessments above]
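One way to work with such a cumulative assessment is to interpolate linearly between the assessed points. A minimal sketch, assuming Python; the point list, the added anchor P(x ≤ 0) = 0, and the helper name `cdf` are my own choices.

```python
# The five cumulative assessments above, treated as points on a CDF.
points = [(0, 0.0), (200, 0.1), (400, 0.3), (600, 0.6), (800, 0.95), (1000, 1.0)]

def cdf(v):
    """P(x <= v) by linear interpolation between the assessed points."""
    if v <= points[0][0]:
        return 0.0
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if v <= x1:
            return p0 + (p1 - p0) * (v - x0) / (x1 - x0)
    return 1.0

print(cdf(500))   # 0.45, halfway between the assessments at 400 and 600
```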
Fractiles
- P(x ≤ a0.3) = 0.3; the number a0.3 is the 0.3 fractile of the distribution
- a0.5 is the median of the distribution: ending up with an outcome lower than a0.5 is just as likely as ending up with an outcome greater than a0.5

Quartiles
- P(x ≤ a0.25) = 0.25 (first quartile)
- P(x ≤ a0.5) = 0.5 (second quartile)
- P(x ≤ a0.75) = 0.75 (third quartile)

Extended Pearson-Tukey method
- A simple but useful three-point approximation
- Suitable when the distribution is assumed to be symmetric
- Uses the median and the 0.05 and 0.95 fractiles, and assigns these three points specific probabilities:
  P(a0.05) = 0.185
  P(a0.5) = 0.63
  P(a0.95) = 0.185

Bracket medians
- Another, fairly simple, technique for approximating a continuous distribution with a discrete one
- Not as restricted to symmetric distributions as the Pearson-Tukey method
- Consider P(a ≤ x ≤ b); the bracket median m* of this interval is where P(a ≤ x ≤ m*) = P(m* ≤ x ≤ b)

Using bracket medians
- Break the continuous probability distribution into several equally likely intervals
- Assess the bracket median for each such interval (a small sketch at the end of these notes illustrates the computation)
- [Figure: cumulative probability distribution over demand, 0–1000] What is the bracket median for the interval [100, 500] in this probability distribution?

Scoring rules
- A scoring rule measures the accuracy of probabilistic predictions and judges how well calibrated a probability assessment is
- Notation: x = 1 if the event occurs, x = 0 if it does not; q = the probability of occurrence reported by the forecaster; p = the forecaster's private probability of occurrence
- A proper scoring rule should provide maximum expected score when q = p

Scoring rule (cont'd)
- Let the payoff be xq − q²/2, so that the assessor's expected payoff is pq − q²/2
- Derivative w.r.t. q: p − q; setting it to 0 gives q = p (note that the second derivative is negative)
- The assessor is motivated to tell the truth, so xq − q²/2 is a proper scoring rule

Brier quadratic scoring rule
- 1 − (x − q)²
- Assessor's expected payoff: 1 − p(1 − q)² − (1 − p)q²
- Derivative w.r.t. q: 2p(1 − q) − 2(1 − p)q = 2p − 2q; setting it to 0 gives q = p
- So the quadratic scoring rule is also proper (a numeric check appears at the end of these notes)

Logarithmic scoring rule
- x log q + (1 − x) log (1 − q)
- Assessor's expected payoff: p log q + (1 − p) log (1 − q)
- Derivative w.r.t. q: p/q − (1 − p)/(1 − q); setting it to 0 gives q = p (the second derivative is negative)
- So the logarithmic rule is also proper

Scoring rule example
- Outcomes x over 20 trials: 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1
- [Figure: chart of the resulting values plotted over probabilities 0–1, vertical axis 0–0.3]

Readings
- Aven, Chapter 4
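As referenced in the bracket-median slides above, a minimal sketch, assuming Python; it reuses the assessed points from the earlier cumulative-assessment sketch purely for illustration (the demand distribution in the slide's figure is not reproduced here), and all helper names are my own.

```python
points = [(0, 0.0), (200, 0.1), (400, 0.3), (600, 0.6), (800, 0.95), (1000, 1.0)]

def cdf(v):
    """P(x <= v) by linear interpolation between the assessed points."""
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if v <= x1:
            return p0 + (p1 - p0) * (v - x0) / (x1 - x0)
    return 1.0

def fractile(p):
    """Inverse CDF: the value a_p with P(x <= a_p) = p."""
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p <= p1:
            return x0 + (x1 - x0) * (p - p0) / (p1 - p0)
    return points[-1][0]

def bracket_median(a, b):
    """m* with P(a <= x <= m*) = P(m* <= x <= b), i.e. F(m*) = (F(a) + F(b)) / 2."""
    return fractile((cdf(a) + cdf(b)) / 2)

# Bracket medians of four equally likely intervals (quartile brackets).
quartiles = [fractile(p) for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
print([round(bracket_median(a, b)) for a, b in zip(quartiles, quartiles[1:])])
# [225, 450, 614, 757]
```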
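And, as referenced in the Brier scoring rule slide, a small numeric check, assuming Python; the grid search and the function name are my own, the rule 1 − (x − q)² is the one from the slides. For any private probability p, the expected score is maximised by reporting q = p.

```python
def expected_brier(p, q):
    # E[1 - (x - q)^2] when P(x = 1) = p
    return p * (1 - (1 - q) ** 2) + (1 - p) * (1 - q ** 2)

grid = [k / 100 for k in range(101)]
for p in (0.1, 0.3, 0.7):
    best_q = max(grid, key=lambda q: expected_brier(p, q))
    print(f"p = {p}: expected score maximised at q = {best_q}")
```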