The Moment Method for Combining Expert Judgements Bram Wisse* & Tim Bedford & John Quigley University of Strathclyde *TNO Defence, Security and Safety, The Hague, NL tim.bedford@strath.ac.uk Overview • Expert judgement combination problem • Discussion of the Classical Method • Using expected values to specify uncertainties • Performance based weighting for the combination of assessments of moments • Trading off location and spread accuracy • Comparison of methods • Conclusion Expert judgement combination problem • • • • Each expert specifies distribution for variable of interest “Combined expert” is a weighted mixture Good theoretical justification for taking weighted mixture (linear pool) Problem is to choose weights • • • • Equal weights User determined Analyst determined Performance based 45 40 40 35 35 30 30 25 25 20 20 15 15 10 10 5 5 0 1st Qtr 2nd Qtr Weight 0.5 3rd Qtr 0 4th Qtr 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr Weight 0.5 35 30 25 20 15 10 5 0 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr Classical model of Cooke • Data provided in terms of quantiles – usually 5, 50, 95% • Model enables us to choose weights by means of an “asymptotically proper scoring rule” • Scores calculated using quantities known only to analyst • Score is product of two components – Calibration – Information • Calibration shows whether experts are assessing probabilities accurately – measured by chi-square likelihood • Information shows whether experts give narrow or broad uncertainty bands – measured by relative information to “merged” uncertainty bands Classical Model Exercise Used every year in Strathclyde risk class… Experience of the class test • • • • • • Protests… Chatting… Attempted use of internet… Lack of geographical knowledge Overconfidence Almost every time, someone strategizes…deliberately wide bands Issues with Classical model • Definitely good at selecting experts who are good at assessing uncertainty • Usually considered better than equal weights • But… • Trade-off of calibration and information is not clear • Possible to “game” by giving wide uncertainty bands – its only an asymptotic proper scoring rule 5% 50% 95% 5% 50% 95% Wider uncertainty bands do better when there is a low number of calibration questions Moment based method as alternative • Moments valid theoretical way of representing uncertainties • Can try to elicit means, variances etc or get them implicitly from quantiles • To go from quantiles to moments, use Extended Pearson-Tukey method…. ππ = 0.185 (π₯0.05 )π + 0.63 (π₯0.50 )π + 0.185 (π₯0.95 )π where x0:05; x0:50 and x0:95 are the expert’s assessments of the 5%-, 50%- and 95%-quantile for variable x. Key idea… • Natural quadratic loss score for moments π = π₯ − πΈ(π) 2 • Overall loss for expert π over all variables: ππ = ππ π₯π − πΈπ (ππ ) 2 Expert weighting • To convert to a score….high loss implies low score • Choose cut off value πΌ and give expert score 0 ππ ππ > πΌ π π = πΌ − ππ ππ‘βπππ€ππ π • Expert weight is π€π is normalized score π€π = π π / π π • This is a proper scoring rule – expert who wants to maximize score should say what he/she thinks! • Choose optimal πΌ to minimize loss of combined expert • Parameters ππ show relative importance Geometric interpretation, examples Assessment of uncertain quantities X and Y, by 4 experts Convex hull of experts’ assessments e3 β e2 β α ∞ 2 4 β r 3 β e1=(E1(X), E1(Y)) Linear pool expectations for all Cut-off’s α in (miniφi, ∞) β e4 Geometric interpretation, examples Assessment of uncertain quantities X and Y, by 4 experts Convex hull of experts’ assessments e3 β e2 β 2 α ∞ β 1 3 β e1=(E1(X), E1(Y)) Linear pool expectations for all Cut-off’s α in (miniφi, ∞) e4 β r2 Properties of proposed weighting scheme 1. 2. 3. 4. 5. The unnormalised weight of an expert is a proper scoring rule. The expert with the smallest loss always remains in the pool. The DM loss is always smaller than, or equal to the loss of any individual expert. The DM loss is always smaller than, or equal to the loss obtained when using equal weights for all experts. The weighting scheme defines a continuous mapping from a vector of expert losses to a vector of expert weights. Coherent Expectations (based on the notion of prevision, (DeFinetti, 1974), (Lad, 1996) ) 1. An expectation E(X) for vector X is coherent as long as there is no vector s with allowable scale such that sTβ[X – E(X)] < 0, for all X in R(X), the realm of X. 2. An expectation E(X) for vector X is coherent iff β min R(X) ≤ E(X) ≤ max R(X), and β E(X+Y) = E(X) + E(Y). 3. The assertion E(X)=e is coherent iff e in C(R(X)), where C(A) denotes the convex hull of set A. Example Convex hull for (E(X), E(X2)), X in [0,1] (recall Var(X) = E(X2) – E(X)2 ≥ 0 ) New scoring rule for means and variances • If symmetric, then a proper scoring rule for mean m and variance v, is π = π1 (π₯ − π) 2 +π2 [ π₯ − π Penalty from mis-estimation of mean 2 − π£]2 Penalty from mis-estimation of variance Scoring rule for means and variances • If symmetric, then a proper scoring rule for mean m and variance v, is π = π1 (π₯ − π) 2 +π2 [ π₯ − π Weight on location penalty Weight on spread penalty 2 − π£]2 π1 π2 Ratio represents trade-off between location and spread penalty Classical vs Moment methods • Comparison measures? • No absolute measures of performance external to methods • Classical and moment methods each have “own” performance criteria • General experience from comparing methods is • Optimal combination of experts is different • Overall performance on the “other” performance criteria good, so plausible combinations produced by both methods Conclusions • A performance based weighting scheme developed based on moment methods, with good properties • Cases show comparable performance of the classical model and the moment model on some existing data sets. • Both models provide good alternative to choosing equal weights for all experts. • Moment method has better foundational properties! • Transparent way of choosing weights that trade off location and spread accuracy – so more flexible than classical model