QUANTIFYING HAZARDS AND RISKS WITH EXPERT JUDGMENT Willy Aspinall

4 – 6 November 2009
Bristol University / Aspinall & Associates
Willy.Aspinall@Bristol.ac.uk
Promises, promises
“.... our work/project/research will
reduce uncertainty”
The Three Horsemen of Risk Apocalypse
(with apologies to Roger Cooke)
UNCERTAINTY
AMBIGUITY
INDECISION
The Three Horsemen example
UNCERTAINTY
PF goes which way,
how far and how fast?
AMBIGUITY
What is understood by
term “pyroclastic flow”?
INDECISION
Do we evacuate?
The Three Horsemen responses
UNCERTAINTY
Do measurements,
quantify uncertainty
AMBIGUITY
Define concepts,
domain of application
INDECISION
Assess utilities,
preferences
The Three Horsemen roles
UNCERTAINTY
Experts’ role to quantify
AMBIGUITY
Analyst/facilitator’s job to clarify
INDECISION
Stakeholder, problem owner’s responsibility
Elicitation of expert judgment
In climate change modelling, for instance, the challenges are
exemplified by:
“…. We explore a high rate of refusal to participate in this expert
survey: many scientists prefer to rely on output from future climate
model simulations.”
Arnell, N. W., E. L. Tompkins, et al. (2005). Eliciting Information from Experts on the
Likelihood of Rapid Climate Change. Risk Analysis 25: 1419-1431.
“…The past performance of such projections has been
systematically overconfident. Analysts have often used scenarios
based on detailed story lines…. for evaluating uncertainty. No
probabilities are typically assigned to such scenarios.”
Morgan, M.G. and D. Keith (2008). Improving the way we think about projecting future energy
use and emissions of carbon dioxide. Climatic Change 90: 189-215.
The Classical Model
A performance-based procedure for quantifying uncertainties from expert
judgments
Qi = Pr(event|data)???
Wj = Cj * Ij
Cooke, R.M. (1991) Experts in Uncertainty. OUP.
Cooke, R.M. and L.L.H.J. Goossens (2008) TU Delft expert judgment data base. Reliability Engineering & System Safety Expert Judgement 93: 657-674.
Synthesised group “Decision-Maker”
DMi = Wj*Qi
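The weighting scheme on this slide can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not EXCALIBUR: the calibration and information scores are made-up numbers (in the Classical Model they are computed from the experts' answers to seed questions, which is not shown here), the calibration cut-off used in the full procedure is omitted, and the names `performance_weights`, `pooled_cdf` and `uniform_cdf` are hypothetical. The sketch forms Wj = Cj * Ij, normalises the weights, and evaluates the Decision-Maker's cumulative probability as the weighted sum of the experts' cumulative probabilities.

```python
def performance_weights(calibration, information):
    """Form W_j = C_j * I_j for each expert, then normalise to sum to 1."""
    raw = [c * i for c, i in zip(calibration, information)]
    total = sum(raw)
    if total == 0:
        raise ValueError("all experts have zero weight")
    return [w / total for w in raw]


def pooled_cdf(weights, expert_cdfs, x):
    """Linear opinion pool: DM(x) = sum over experts of W_j * F_j(x)."""
    return sum(w * cdf(x) for w, cdf in zip(weights, expert_cdfs))


def uniform_cdf(lo, hi):
    """Hypothetical expert distribution: uniform between lo and hi."""
    def cdf(x):
        if x <= lo:
            return 0.0
        if x >= hi:
            return 1.0
        return (x - lo) / (hi - lo)
    return cdf


# Made-up scores for three experts (assumption, for illustration only).
calibration = [0.50, 0.05, 0.001]
information = [1.2, 2.0, 3.0]
weights = performance_weights(calibration, information)

# Made-up expert distributions for one uncertain quantity.
expert_cdfs = [uniform_cdf(0, 10), uniform_cdf(2, 6), uniform_cdf(4, 5)]
equal_weights = [1 / len(expert_cdfs)] * len(expert_cdfs)

print("performance weights:", weights)
print("P(X <= 5), performance-weighted:", pooled_cdf(weights, expert_cdfs, 5.0))
print("P(X <= 5), equal-weighted:", pooled_cdf(equal_weights, expert_cdfs, 5.0))
```

With these invented scores the well-calibrated first expert dominates the pool, which is why performance weighting and equal weighting can give noticeably different answers for the same set of opinions.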
One case history (of several)
DEFRA study
objective: to develop a generic quantitative model for accelerated
internal erosion in Britain’s population of 2,500 ageing dams,
using elicited quantities for key variables
Cowlyd Reservoir inspection party - 1917
Warmwithens Dam failure - 1970
..risk assessment and reservoir safety in the UK
Experts’ spreads of opinion for one parameter
Opinions on the time-to-failure (in days from first detection) for the
10%ile of slowest cases….
….. and outcomes obtained by alternative ways of weighting and
pooling opinions
Note the “two schools of thought” effect… and the strong ‘opinionation’ of many experts
The reservoir engineers: performance-based scores, and
mutual weighting rankings
Calibration weights versus mutual weights
Equal weights, performance-based weights and an expert
census approach
…hypothetical SSHAC-4 expert census
uncertainty spread??
Advanced 3D computational fluid dynamics modelling
Courtesy INGV and EU EXPLORIS Project
Elicitation of ‘realistic’ physical uncertainties on model outputs
Analysing expert elicitations with Cooke’s “Classical Model”
The procedure relies on cornerstones of the scientific method:
• Empirical control - evaluates weights for experts on basis of measures of performance
• Accountability - inputs are traceable in terms of scientific inputs of individuals
• Reproducibility - can replicate and review all calculations used
Advantages:
• Impartiality - experts are treated equally prior to calibration
• Equity - individual experts’ scores are maximised by stating true scientific views
• Diagnostic - procedure can highlight discrepancies in reasoning or inconsistencies in interpretation
……this approach produces a “rational consensus”, and sits
squarely within the Bayesian paradigm for decision-support
Montserrat - 11 October 2009
Probabilistic forecasting for Montserrat volcano using the
structured expert elicitation approach
2. GIVEN current conditions, what is the probability that within the next
year the first significant development will be the resumption of lava
extrusion?
SAC elicitation
Credible interval lower bound    Median estimate    Credible interval upper bound
6.3%                             34.1%              66.1%
Forecast metric - Brier Skill Score
Brier Score
$$BS = \frac{1}{n} \sum_{k=1}^{n} (f_k - o_k)^2$$
• $o_k$ = 1 if the event occurs, = 0 if the event does not occur
• $f_k$ is the probability of occurrence according to the forecast system
• BS can take on values in the range [0,1], a perfect forecast having BS = 0
Brier Skill Score
$$BSS = \frac{BS_{cli} - BS}{BS_{cli}}, \qquad BS_{cli} = \bar{o}\,(1 - \bar{o})$$
• $\bar{o}$ = total frequency of the event (e.g. sample climatology / global data / other reference basis)
The forecast system has predictive skill relative to some reference (e.g. climate record) if BSS is positive, a perfect system having BSS = 1.
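The Brier Score and Brier Skill Score defined on this slide can be computed directly from a list of forecast probabilities and observed outcomes. The following is a minimal sketch, assuming binary events; the function names and the example numbers are illustrative and are not Montserrat data. When no reference frequency is supplied, the sample frequency of the event is used as the climatological base rate.

```python
def brier_score(forecasts, outcomes):
    """BS = (1/n) * sum over k of (f_k - o_k)^2, with o_k equal to 0 or 1."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    n = len(forecasts)
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n


def brier_skill_score(forecasts, outcomes, base_rate=None):
    """BSS = (BS_cli - BS) / BS_cli, with BS_cli = o_bar * (1 - o_bar)."""
    if base_rate is None:
        # Assumption: fall back on the sample frequency of the event.
        base_rate = sum(outcomes) / len(outcomes)
    bs_cli = base_rate * (1.0 - base_rate)
    if bs_cli == 0:
        raise ValueError("reference score is zero; BSS is undefined")
    return (bs_cli - brier_score(forecasts, outcomes)) / bs_cli


# Illustrative numbers only.
forecasts = [0.9, 0.1, 0.8, 0.3]
outcomes = [1, 0, 1, 0]
print("BS  =", brier_score(forecasts, outcomes))
print("BSS =", brier_skill_score(forecasts, outcomes))
```

A forecaster who always states the base rate scores BSS = 0 under this definition, so any positive value indicates skill beyond the reference.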
Forecast skill performance of Montserrat SAC
Probabilistic forecast scorecard
                                    +ve BSS     zero or -ve BSS
All forecasts (110 no.)             84 (76%)    26 (24%)
Life critical forecasts (75 no.)    61 (83%)    14* (17%)
* includes some ‘most threatening’ scenarios
cautious
Communicating forecast skill
Surrogate metrics for forecast skill
[Figure: Cumulative Return on Investment (ROI), 1 € staked per forecast. Y-axis: ROI (-20 € to 40 €); x-axis: Sep-1994 to Sep-2008]
Montserrat case, following Lenny Smith & colleagues……
[Hagedorn, R., Smith, L.A. (2008) Communicating the value of probabilistic forecasts with weather roulette.
Meteorol. Appl. Published online in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/met.9. ]
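One simple way to picture a cumulative return on investment of this kind is a fair-odds betting game against a reference forecast. The sketch below is an assumption about how such a metric could be built, not the calculation behind the Montserrat chart: for each binary forecast a fixed stake is split between "event" and "no event" in proportion to the forecast probability, the odds are set by a reference probability, and the net returns are accumulated. The function name `cumulative_roi` and all numbers are hypothetical.

```python
def cumulative_roi(forecasts, outcomes, reference, stake=1.0):
    """Running net return when `stake` is bet on every forecast.

    Assumed rules: the stake is spread over the two outcomes in proportion
    to the forecast probability, and the payout odds on each outcome are
    1 / (reference probability of that outcome).
    """
    total = 0.0
    history = []
    for f, o, r in zip(forecasts, outcomes, reference):
        if not 0.0 < r < 1.0:
            raise ValueError("reference probabilities must lie strictly in (0, 1)")
        p_forecast = f if o == 1 else 1.0 - f
        p_reference = r if o == 1 else 1.0 - r
        # Net gain: payout on the outcome that happened, minus the stake.
        total += stake * (p_forecast / p_reference - 1.0)
        history.append(total)
    return history


# Illustrative numbers only: two confident, correct forecasts
# scored against an uninformative 50:50 reference.
history = cumulative_roi([0.8, 0.2], [1, 0], [0.5, 0.5])
print(history)
```

Under these assumed rules a forecaster who simply repeats the reference probabilities breaks even, so a rising cumulative curve signals information beyond the reference.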
Forecast performance versus outlook period
[Figure: Brier Score by outlook period. Y-axis: Brier Score rel. uniform probs. (-1 to 1.5); x-axis: Months (0 to 70)]
Challenging elicitations of scientific expert judgment
The Harvard study on Kuwait’s First Gulf War reparations claim
• More than 700 fires
• First fires – Air War ~ 17 January 1991
• Ground War ~ 23 February 1991
• Liberation ~ 28 February 1991
• Last fire - 6 November 1991
• Oil burned ~ 4 x 10^6 barrels per day
• PM emissions ~ 3 x 10^9 kg
• PM10 levels – typical 300 ug/m3, sometimes 2000
Health effects claim based on expert elicitation: ~ 35 deaths
Individual experts’ best mortality estimates:
13, 32, 54, 110, 164, 2874
Equal Weights (82 deaths;
90% conf. range: 18 to 400 )
Performance Weights (35 deaths; 16 to 54)
The judicial decision of the UN Commission
eventually rejected the admissibility of this form
of evidence: “…not actual data…..”
…and we won’t mention Prof Nutt and cannabis!
Estimating dose-response curves for cancer risk from
airborne arsenic using expert inputs
Work with the late Joey Hanzich
(Env. Epid. MPhil 2006-07) and Dr
Peter Baxter at IPH Cambridge
Extracting signal from expert noise
Example self-weighted curves
from one individual expert for one
risk ratio value…..
….and pooled results for group,
combined with EXCALIBUR
weights
[Figure: Weighted Cumulative Probability (0.0 to 1.0) vs Cumulative Exposure in (mg/cubic m)*years, one curve per Estimated Risk Ratio: 1.01, 1.05, 1.10, 1.50, 2.00]
A supplementary approach
The Cooke Classical Model and EXCALIBUR procedure for eliciting quantitative values
and uncertainty distributions from multiple experts.
Two-factor ranking of option items
by Paired Comparison with Probabilistic Inversion
For more qualitative assessments of
uncertain factors, simple paired comparison
analysis using Probabilistic Inversion (PI)
model fitting provides an alternative way of
characterizing relative rankings (“revealed
preferences”) from a group, with quantitative
estimates of associated uncertainties:
[Table and scatter plot: Importance and Performance scores, each with its standard deviation, for Items 1 to 10; the Importance and Performance axes each run from 0 to 1]
In almost all circumstances, and at all
times, we find ourselves in a state of
uncertainty
- Bruno de Finetti
….and scientists will continue to be
perplexed, bemused and uncertain!
Summing up
“.... our work/project/research will
reduce uncertainty……”
…. a laudable goal, but the opposite is
likely to emerge when exhaustive and
formalized investigations of scientific
uncertainty are undertaken – and scientists
will have to think how best to communicate
the implications for hazard and risk
management!
Thank you!