Slides lecture 6

Assessing Probabilities in Risk and Decision Analysis
Aron Larsson, SU/DSV and MIUN/ITM
Probabilities in risk analysis
 ”A measure of uncertainty of an observable quantity Y”
 Probabilities are subjective
 Based on the assessor’s knowledge
 There exists no ”true” probability assignment
The basic problem
 Given a measurable quantity Y, we want to specify a probability distribution P(Y ≤ y) for y > 0
 This is done given background information K, represented as hard data y1, …, yn and as expert knowledge
 The hard data is more or less relevant
Evaluating probability assignments

 Pragmatic criterion: accordance with observable data
 Semantic criterion: calibration, accordance with future outcomes
 Syntactic criterion: coherence, assigned probabilities should conform to the laws of probability theory
Using classical statistics
 Let Y be a binary quantity (one or zero)
P(Y = 1) = (1/n) Σi yi = (y1 + y2 + … + yn)/n
 Let Y be a real-valued quantity
P(Y ≤ y) = (1/n) Σi I(yi ≤ y)
where I() is the indicator function
 Needs n observations, and n must be ”sufficiently large” (n ≥ 10, provided that not all yi are either 0 or 1)
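As a minimal sketch of these relative-frequency estimates (the data lists are hypothetical):

```python
# A minimal sketch of the classical estimates above; the data is hypothetical.

def prob_binary(ys):
    """P(Y = 1) as the relative frequency of ones among binary observations."""
    return sum(ys) / len(ys)

def empirical_cdf(ys, y):
    """P(Y <= y) as (1/n) * sum_i I(yi <= y), the empirical distribution."""
    return sum(1 for yi in ys if yi <= y) / len(ys)

ys_binary = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]     # n = 10 binary outcomes
print(prob_binary(ys_binary))                   # 0.7

ys_real = [2.3, 5.1, 3.8, 7.4, 4.0, 6.2, 1.9, 5.5, 3.1, 4.7]
print(empirical_cdf(ys_real, 4.5))              # 0.5
```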
Maximum likelihood estimation
 Assume that we have data from a known parametric distribution (normal, Poisson, beta, etc.)
 We wish to estimate the parameters θ = (θ1, …, θn) of the distribution
 The MLE is the value of the parameters that makes the observed data most likely
Maximum likelihood estimation
 We have n i.i.d. samples x1, …, xn
 Specify the joint distribution
f(x1, …, xn | θ) = f(x1 | θ) f(x2 | θ) … f(xn | θ)
 Now view x1, …, xn as the parameters and let θ vary; then a likelihood function for θ can be formulated as
L(θ | x1, …, xn) = Πi f(xi | θ)
Maximum likelihood estimation
 L(θ | x1, …, xn) = Πi f(xi | θ)
 ln L(θ | x1, …, xn) = Σi ln f(xi | θ)
 Now we estimate θ by finding a value that maximises L
 As it turns out, this is very easy for some parametric distributions (the normal, the exponential, the Poisson)
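As an illustration, maximising the log-likelihood has a closed-form solution for the normal distribution: the sample mean and the root of the average squared deviation. A minimal sketch, with hypothetical i.i.d. data:

```python
import math

# MLE for the normal distribution: maximising ln L(mu, sigma | x1..xn)
# gives mu_hat = sample mean and sigma_hat^2 = average squared deviation.
def normal_mle(xs):
    n = len(xs)
    mu_hat = sum(xs) / n                              # maximiser w.r.t. mu
    var_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # note: /n, not /(n-1)
    return mu_hat, math.sqrt(var_hat)

xs = [4.8, 5.3, 4.9, 5.6, 5.1, 4.7, 5.4, 5.0]         # hypothetical sample
print(normal_mle(xs))
```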
Bayesian analysis
 Update probabilities when new information becomes available
 We are interested in the probability of Θ = θ, which may be updated by observing x
P(Ai | B) = P(B | Ai) P(Ai) / Σj P(B | Aj) P(Aj),   j = 1, …, n
Prior probabilities
 Let Θ = 0 mean ”not ill”, Θ = 1 mean ”moderately ill”, and Θ = 2 mean ”seriously ill”
 We need a prior probability distribution π(Θ = θ), which we assume can be retrieved from, e.g., health statistics
 The prior probabilities are the probabilities we have over the outcomes of Θ before new information
Likelihood principle
 The likelihood principle in Bayesian analysis makes explicit the natural conditional idea that only the actually observed x should be relevant to conclusions or evidence about Θ
 For observed data x, the function
L(θ) = f(x | θ)
is called the ”likelihood function” (note: x given θ here)
Likelihood principle (cont’d)

 In making inferences or decisions about θ after x is observed, all relevant experimental information is contained in the likelihood function for the observed x.
Likelihood function
 Assume we can conduct a test on a patient, yielding positive (1) or negative (0)
 We need to know about the dependencies, i.e. the conditional probabilities:
P(X = 1 | Θ = 2) = 0.9
P(X = 1 | Θ = 1) = 0.6
P(X = 1 | Θ = 0) = 0.1
 We refer to this as the likelihood function L(x | θ).
Likelihood function and marginal
 Knowing the likelihood function we can simply obtain P(X = x), i.e. the marginal distribution of X, labelled m(x | π) or m(x)
 For example:
m(1) = P(X = 1 | Θ = 2)P(Θ = 2) + P(X = 1 | Θ = 1)P(Θ = 1) + P(X = 1 | Θ = 0)P(Θ = 0)
     = 0.9 · 0.02 + 0.6 · 0.1 + 0.1 · 0.88 = 0.166
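A minimal sketch reproducing this marginal, with the priors and likelihoods taken from the slides:

```python
prior = {0: 0.88, 1: 0.10, 2: 0.02}    # pi(Theta = theta), from the slides
lik_pos = {0: 0.1, 1: 0.6, 2: 0.9}     # P(X = 1 | Theta = theta)

# Marginal m(1) = sum over theta of P(X = 1 | theta) * pi(theta)
m1 = sum(lik_pos[t] * prior[t] for t in prior)
print(m1)                               # 0.166
```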
Likelihood function and marginal

 The marginal density of X is
 in the discrete case: m(x) = Σθ f(x | θ) π(θ)
 in the continuous case: m(x) = ∫ f(x | θ) π(θ) dθ
Bayesian updating
 ”Knowing” π(θ), m(x), and L(θ), we are now interested in π(θ | x), or
P(Θ = θ | X = x)
 That is, we are interested in the probability of the outcomes θ having observed x
The posterior distribution
 Let π(Θ = 2) = 0.02 be a prior probability; we now observe X = 1, then
π(2 | 1) = f(1 | 2) π(2) / m(1) = 0.9 · 0.02 / 0.166 = 0.11
 This is now our posterior probability of Θ = 2, or π(2 | 1) = 0.11
 In this discrete case, this is called Bayes’ theorem
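A minimal sketch of the same posterior computation via Bayes' theorem:

```python
prior = {0: 0.88, 1: 0.10, 2: 0.02}
lik_pos = {0: 0.1, 1: 0.6, 2: 0.9}

m1 = sum(lik_pos[t] * prior[t] for t in prior)              # m(1) = 0.166
posterior = {t: lik_pos[t] * prior[t] / m1 for t in prior}  # pi(theta | 1)
print(round(posterior[2], 2))                               # 0.11
```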
Adding more information
 So, we now ”know” that P(Θ = 2 | X = 1) = 0.11; what if that is not enough?
 Additional information can be sought
 Let X := X1 and do another test X2 (which is conditionally independent of the first test)
 Now we are interested in the following:
P(Θ = 2 | X1 = 1, X2 = 1)
Likelihood function again
 From the conditional independence of the tests, the likelihood function’s properties are not changed:
P(X1 = 1 | Θ = 2, X2 = 1) = 0.9
P(X1 = 1 | Θ = 1, X2 = 1) = 0.6
P(X1 = 1 | Θ = 0, X2 = 1) = 0.1
 Replace π(Θ = θ) with π(Θ = θ | X1 = 1) and update again
Posterior as new prior
 Replacing π(θ):
P(Θ = 2 | X1 = 1) = 0.11
P(Θ = 1 | X1 = 1) = 0.6 · 0.1 / (0.9 · 0.02 + 0.6 · 0.1 + 0.1 · 0.88) = 0.36
P(Θ = 0 | X1 = 1) = 0.1 · 0.88 / (0.9 · 0.02 + 0.6 · 0.1 + 0.1 · 0.88) = 0.53
Bayesian updating again
 P(Θ = 2 | X1 = 1, X2 = 1) =
0.9 · 0.11 / (0.9 · 0.11 + 0.6 · 0.36 + 0.1 · 0.53) = 0.27
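A minimal sketch of the whole chain, where the posterior after the first positive test is fed back in as the prior for the second:

```python
prior = {0: 0.88, 1: 0.10, 2: 0.02}
lik_pos = {0: 0.1, 1: 0.6, 2: 0.9}       # same likelihood for both tests

def update(prior, lik):
    """One step of Bayes' theorem: posterior proportional to lik * prior."""
    m = sum(lik[t] * prior[t] for t in prior)
    return {t: lik[t] * prior[t] / m for t in prior}

after_x1 = update(prior, lik_pos)        # ~ {0: 0.53, 1: 0.36, 2: 0.11}
after_x2 = update(after_x1, lik_pos)     # posterior as new prior
print(round(after_x2[2], 2))             # 0.27
```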
Observable parameters
 Assume we make n observations collected in x = (x1, …, xn)
 Each xi is independent of the others and identically distributed; then we have a joint distribution of the data
p(x1, …, xn | θ) = Πi p(xi | θ),   i = 1, …, n
Bayesian updating in general
 Let f be a distribution; in general, we can write the conditional distribution of θ given x, which is called the posterior distribution, as
f(θ | x) = L(x | θ) π(θ) / m(x)
Bayesian updating: Example

 Quality engineering – sampling by attributes
 We produce N items in a lot; we want at most 0.35% of these to be ”non-conforming” in terms of quality (the acceptance quality limit a is 0.35%)
 We assume a prior distribution over a
 Then we look at n items from N, the sample size
 The probability of finding zero non-conforming items in our sample given a certain a is our likelihood function
 Finding zero non-conforming items will then increase our confidence that the quality is better than a (see the sketch below)
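A minimal sketch of this update, under the additional assumption (not stated on the slide) of a Beta prior over the non-conforming rate; with a Beta(a0, b0) prior and zero non-conforming items among n inspected, the posterior is Beta(a0, b0 + n):

```python
from scipy.stats import beta

aql = 0.0035           # acceptance quality limit, 0.35%
n = 200                # sample size
a0, b0 = 1.0, 400.0    # assumed prior pseudo-counts (hypothetical)

prior_conf = beta.cdf(aql, a0, b0)       # P(rate <= AQL) before sampling
post_conf = beta.cdf(aql, a0, b0 + n)    # ... after zero non-conforming in n

print(round(prior_conf, 4), round(post_conf, 4))   # ~0.75 -> ~0.88 here
```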
Bayesian updating: Example

Lot size (500–10 000 m2): 5 000

Sample size | SQL   | AQL   | Prior conf. | Post. conf. | Diff  | Pr. find one
200         | 0.20% | 0.35% | 80.22%      | 90.14%      | 9.92% | 33.54%

Sampling start cost | Cost/m2 | Cost/% | Tot. cost
$400.0              | $0.00   | $2.00  | $40.32

[Chart: prior (”Before”) and posterior (”After”) distributions over the number of non-conforming items, 0–30]
When data is missing
 When data is lacking, or existing data is only partially relevant
 Expert elicitation
 Direct assessment
 Reference games
 Pearson-Tukey
Expert elicitation: Probability wheel

 Two adjustable sectors and a spinner, for visually generating random events of specified probability.
 When the expert feels that the probability of ending up in the blue sector is the same as the probability of the event of interest, the probability of the event ”equals” the proportion of the blue sector.
Expert elicitation: Indifferent bets approach

 Observe how experts behave in gambling situations
 Assume that you are indifferent between these two bets
 Further assume ”risk neutrality w.r.t. money”

Bet 1: Win €X if Italy win; lose €Y if France win
Bet 2: Lose €X if Italy win; win €Y if France win
Expert elicitation: Indifferent bets approach

 Since the expected utilities of the two bets are equal, this yields:
P(Italy win) = Y/(X + Y)
 Why? (see the derivation below)

Bet 1: Win €X if Italy win; lose €Y if France win
Bet 2: Lose €X if Italy win; win €Y if France win
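A short derivation (a sketch, writing p = P(Italy win) and using the assumed risk neutrality, so that utilities equal money amounts):

```latex
% Indifference means the two bets have equal expected value:
%   E[Bet 1] = pX - (1-p)Y, \quad E[Bet 2] = -pX + (1-p)Y
\begin{align*}
pX - (1-p)Y &= -pX + (1-p)Y \\
2pX &= 2(1-p)Y \\
p &= \frac{Y}{X+Y}
\end{align*}
```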
Expert elicitation: Indifferent bets approach

 So, letting X = €10 and Y = €15, then
P(Italy win) = 15/25 = 3/5
 So, based on this behaviour, the elicited probability that Italy will win is 3/5

Bet 1: Win €X if Italy win; lose €Y if France win
Bet 2: Lose €X if Italy win; win €Y if France win
Expert elicitation: The reference lottery approach

 Compare the two lotteries. Lottery 1:
 If Italy wins you will get 2 weeks paid vacation (Prize A) in a very nice location
 Otherwise you’ll get a glass of beer (Prize B)
 With lottery 2:
 Win Prize A with probability p
 Win Prize B with probability 1 − p
Expert elicitation: The reference lottery approach (cont’d)

 Adjust p until you are indifferent between the two lotteries
 When you are indifferent, p is your subjective probability that Italy will win
Continuous probabilities

 In the case of an uncertain but continuous quantity
 For example: ”The outcome is a real number between 0 and 1000”
 as opposed to ”the outcome will be either A, B, or C”, as is the case for finite quantities
 Continuous quantities often emerge in decision problems, for example in variables such as demand, sales, etc.
Cumulative assessment

 Consider: ”The outcome x of random variable (event node) E is a real number between 0 and 1000”
 Cumulative assessments would be (see the interpolation sketch below)
P(x ≤ 200) = 0.1
P(x ≤ 400) = 0.3
P(x ≤ 600) = 0.6
P(x ≤ 800) = 0.95
P(x ≤ 1000) = 1
 as opposed to ”the outcome will be either A, B, or C”, as is the case for finite quantities
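A minimal sketch (an interpolation assumption, not from the slides): treating the assessed points as knots of a piecewise-linear CDF lets us read off intermediate probabilities:

```python
import numpy as np

values = [0, 200, 400, 600, 800, 1000]
cum_probs = [0.0, 0.1, 0.3, 0.6, 0.95, 1.0]    # P(x <= 0) = 0 assumed

# Linear interpolation between the assessed cumulative points
print(np.interp(500, values, cum_probs))        # P(x <= 500) = 0.45
```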
Cumulative assessment graph
[Graph: cumulative probability (0–1) on the y-axis against value on the x-axis]
Fractiles
 P(x ≤ a0.3) = 0.3
 The number a0.3 is the 0.3 fractile of the distribution
 The fractile a0.5 is the median of the distribution; ending up with an outcome lower than a0.5 is just as likely as ending up with an outcome greater than a0.5
Quartiles
 P(x ≤ a0.25) = 0.25 (first quartile)
 P(x ≤ a0.5) = 0.5 (second quartile)
 P(x ≤ a0.75) = 0.75 (third quartile)
Extended Pearson-Tukey method

 A simple but useful three-point approximation
 Suitable when the distribution is assumed to be symmetric
 Uses the median and the 0.05 and 0.95 fractiles
 Assign these three points specific probabilities (applied in the sketch below):
P(a0.05) = 0.185
P(a0.5) = 0.63
P(a0.95) = 0.185
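A minimal sketch of applying the three-point approximation, e.g. to estimate an expected value; the fractile values are hypothetical:

```python
# Extended Pearson-Tukey: weight the 0.05, 0.5 and 0.95 fractiles
# with probabilities 0.185, 0.63 and 0.185.
a05, a50, a95 = 120.0, 480.0, 820.0    # assumed elicited fractiles

expected_x = 0.185 * a05 + 0.63 * a50 + 0.185 * a95
print(expected_x)                       # 476.3
```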
Bracket medians
 Another, fairly simple, technique for approximating a continuous distribution with a discrete one
 Not as restricted to symmetric distributions as the Pearson-Tukey method
 Consider P(a ≤ x ≤ b); the bracket median m* of this interval is where
P(a ≤ x ≤ m*) = P(m* ≤ x ≤ b)
Using bracket medians
 Break the continuous probability distribution into several equally likely intervals
 Assess the bracket median for each such interval (see the sketch below)
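A minimal sketch of locating a bracket median numerically, reusing the piecewise-linear CDF assumption from the cumulative-assessment example:

```python
import numpy as np

values = [0, 200, 400, 600, 800, 1000]
cum_probs = [0.0, 0.1, 0.3, 0.6, 0.95, 1.0]

def bracket_median(a, b):
    """m* with P(a <= x <= m*) = P(m* <= x <= b), i.e. F(m*) = (F(a)+F(b))/2."""
    fa = np.interp(a, values, cum_probs)
    fb = np.interp(b, values, cum_probs)
    # invert the (increasing) CDF at the midpoint probability
    return np.interp((fa + fb) / 2, cum_probs, values)

print(bracket_median(100, 500))   # 350.0 under this assumed CDF
```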
Using bracket medians (cont’d)

[Graph: cumulative probability (0–1) against demand (0–1000)]

What is the bracket median for the interval [100, 500] in this probability distribution?
Scoring rules
 A scoring rule measures the accuracy of probabilistic predictions
 Judges how well calibrated a probability assessment is
 Notation:
 x = 1 if the event does occur, x = 0 if it does not
 q = probability of occurrence reported by the forecaster
 p = forecaster’s private probability of occurrence
 A proper scoring rule should provide maximum score when q = p
Scoring rule (cont’d)

 Let the score be xq − q²/2, so that the assessor’s expected payoff is pq − q²/2
 Derivative w.r.t. q: p − q
 Setting it to 0 gives q = p
 Note that the second derivative is negative
 The assessor is motivated to tell the truth
 xq − q²/2 is a proper scoring rule
Brier quadratic scoring rule

 1 − (x − q)²
 Assessor’s expected payoff:
1 − p(1 − q)² − (1 − p)q²
 Derivative w.r.t. q:
−2pq + 2p − 2(1 − p)q = 2p − 2q
 Setting it to 0 gives q = p
 So the quadratic scoring rule is also proper
Logarithmic scoring rule

 x log q + (1 − x) log(1 − q)
 Assessor’s expected payoff:
p log q + (1 − p) log(1 − q)
 Derivative w.r.t. q:
p/q − (1 − p)/(1 − q)
 Setting it to 0 gives q = p
 Note that the second derivative is negative
 So the logarithmic rule is also proper (see the numerical check below)
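As a minimal numerical check of properness for all three rules: the expected score under private probability p should peak at reported q = p.

```python
import numpy as np

def expected_score(rule, p, q):
    """Expected payoff when the event has probability p and we report q."""
    return p * rule(1, q) + (1 - p) * rule(0, q)

rules = {
    "xq - q^2/2":  lambda x, q: x * q - q**2 / 2,
    "quadratic":   lambda x, q: 1 - (x - q)**2,
    "logarithmic": lambda x, q: x * np.log(q) + (1 - x) * np.log(1 - q),
}

p = 0.7
qs = np.linspace(0.01, 0.99, 99)
for name, rule in rules.items():
    best_q = qs[np.argmax([expected_score(rule, p, q) for q in qs])]
    print(name, round(float(best_q), 2))    # each rule peaks at q = p = 0.7
```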
Scoring rule example

Trial:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
x:      1  1  1  0  1  1  0  1  1  1  1  1  0  0  1  1  1  1  0  1

[Chart: ”Series1” plotted against values 0–1 on the x-axis, y-axis scale 0–0.3]
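A minimal sketch of one plausible reading of this example (an assumption: scoring a constant forecast q against the 20 outcomes with the rule xq − q²/2): the average score is maximised at the observed frequency.

```python
import numpy as np

x = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1]  # trial outcomes

def avg_score(q):
    """Average payoff of always reporting q under the rule xq - q^2/2."""
    return sum(xi * q - q**2 / 2 for xi in x) / len(x)

qs = np.linspace(0.0, 1.0, 101)
best_q = qs[np.argmax([avg_score(q) for q in qs])]
print(best_q)    # 0.75, the observed frequency 15/20
```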
Readings
 Aven Chapter 4