An Overview of Elicitation Methods and Software

Paul Garthwaite
Open University, UK
(Joint work with Fadlalla Elfadaly)

Here, the focus is on elicitation methods that help an expert quantify his/her opinions as a probability distribution. There is elicitation software for many other purposes. For example:
• to aid collaboration via brainstorming;
• to develop a conceptual model via mind maps;
• to evaluate influence diagrams;
• to construct qualitative probabilistic networks (qualitative forms of Bayesian belief networks);
• to aggregate the assessments of individual experts into one combined probability density function [Excalibur: acronym for Expert CALIBRation (Cooke & Solomatine, 1992)].

A useful review of much elicitation software is given in:
Software to support expert elicitation, Devilee & Knol (2011).
http://www.rivm.nl/bibliotheek/rapporten/630003001.pdf
They consider:
- ways software programs may support elicitation;
- existing software programs that could support elicitation;
- the functionalities of these existing software projects.

Why use expert opinion?
Can't opinion convey bias as well as knowledge?
• The expert may want to quantify his/her opinions for his/her own use. e.g. An industrial chemist wanting to design experiments.
• Available data may not be suitable as a statistical input. e.g. Sightings of a rare and endangered species.
• Data may be absent, so that using expert knowledge is essential.
Bayesian statistics provides a mechanism for incorporating expert opinion into a statistical analysis.

Basic strategy for quantifying opinion
• We have a model for the problem in hand.
• Examples: regression models and multinomial models.
• The model has parameters, such as a and b in the regression model y = a + bx.
• We want to quantify opinion about a and b.
• We give a and b a distribution: typically we would assume that opinion about them can be represented by a multivariate normal distribution:
\[ \begin{pmatrix} a \\ b \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \right). \]
The expert answers questions that determine \mu_1, \mu_2, \sigma_1^2, \sigma_2^2 and the covariance \sigma_{12}.

The expert must be asked questions they can understand and answer meaningfully.
Much psychological research has examined people's ability to act as intuitive statisticians.
We are good at giving point estimates (especially medians and modes; means tend to be biased when distributions are skew).
We are not good at making direct assessments of variances and correlations.
In general, experts should not be asked to assess parameter values but should be asked about observable quantities. (e.g. In regression, questions should be phrased in terms of the response (Y), rather than in terms of regression coefficients.)
Visual assessment tasks may be easier for an expert to perform.

To quantify uncertainty, many elicitation methods make extensive use of assessments of lower and upper quartiles.
• The expert is asked to specify her median and her lower and upper quartiles. They can be assessed by the method of bisection: the median M splits the range into two parts the expert considers equally likely, and the quartiles L and U then split the parts below and above M in the same way.
Suppose opinion about a probability is needed.
[Figure: the interval from 0 to 1.0, with 0.3 also marked, divided at the lower quartile L, the median M and the upper quartile U into four segments, each holding 25% of the probability.]

Elicitation of opinion for a multinomial model
In the multinomial sampling model there are a number of categories. Each observation will be in exactly one category and expert opinion must:
• provide an estimate of the probability of each category;
• quantify the accuracy of the estimates.
Model to represent the expert's opinion: the simplest conjugate prior distribution is the Dirichlet distribution.

Example: Misclassification rates of BMI
• A person in Malta gives their height and weight in a questionnaire and their calculated BMI is in the normal range.
• Their true BMI is in one of the four categories: normal; overweight; obese; underweight.
• We want to question an expert to assess the probabilities that the person's true BMI is in each of these categories.
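As a rough illustration of how a Dirichlet prior does both jobs (estimating each category probability and quantifying the accuracy of the estimate), the sketch below summarises the marginal distribution of each probability. It assumes a Dirichlet prior with hyperparameters a_1, ..., a_k and N = a_1 + ... + a_k, so that the i-th probability is marginally Beta(a_i, N - a_i). The hyperparameter values and the function name dirichlet_summary are hypothetical, chosen only for illustration; they are not elicited values from the BMI study.

```python
from math import sqrt


def dirichlet_summary(a):
    """Return {category: (mean, sd)} for the marginals of a Dirichlet prior.

    `a` maps each category to its hyperparameter a_i. Marginally,
    p_i ~ Beta(a_i, N - a_i) with N = sum of all a_i.
    """
    n = sum(a.values())
    summary = {}
    for category, a_i in a.items():
        mean = a_i / n
        var = a_i * (n - a_i) / (n ** 2 * (n + 1))
        summary[category] = (mean, sqrt(var))
    return summary


# Hypothetical hyperparameters for the four BMI categories (assumption).
hyper = {"normal": 6.0, "overweight": 3.0, "obese": 0.6, "underweight": 0.4}
summary = dirichlet_summary(hyper)

for category, (mean, sd) in summary.items():
    print(f"{category:12s} mean = {mean:.3f}   sd = {sd:.3f}")
```

Scaling every a_i by the same factor leaves the means unchanged but shrinks the standard deviations, which is why N can be read as a measure of how sure the expert is.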
Median assessment of p for the first category.

Assessing the probability of overweight, given that the red bar is correct.

Assessing the probability of obese, conditional on the red categories being correct. (The probability for the yellow category follows automatically.)

The short blue lines are the expert's lower and upper quartile assessments for the first category. The insert shows the probability density function for the first probability.

Blue lines are the expert's quartile assessments for the probability of overweight, conditional on 0.60 being the probability of normal.

Quartile assessments for obese (also giving those for underweight).

Marginal distributions are shown to the expert as feedback. The expert can make modifications if (s)he wishes.

When there are k categories, the Dirichlet prior distribution has the form
\[ f(p_1, \ldots, p_k) = \frac{\Gamma(N)}{\Gamma(a_1) \cdots \Gamma(a_k)} \, p_1^{a_1 - 1} \cdots p_k^{a_k - 1}, \qquad N = a_1 + \cdots + a_k. \]
Thus, opinion about k parameters (p_1, ..., p_k) is represented by just k hyperparameters (a_1, ..., a_{k-1} and N).
A more flexible model for the expert's opinion is the Connor-Mosimann distribution:
\[ f(p_1, \ldots, p_k) = \left[ \prod_{i=1}^{k-1} \frac{\Gamma(a_i + b_i)}{\Gamma(a_i)\,\Gamma(b_i)} \, p_i^{a_i - 1} \Bigl( \sum_{j=i}^{k} p_j \Bigr)^{b_{i-1} - (a_i + b_i)} \right] p_k^{b_{k-1} - 1}. \]
(For i = 1 the sum equals 1, so b_0 does not affect the density.)
This has 2(k – 1) hyperparameters. It can be determined from the same assessments that give the Dirichlet distribution.

SHELF (Sheffield Elicitation Framework)
O'Hagan and Oakley have software to carry out elicitation of probability distributions (aimed particularly at quantifying uncertainty from a group of experts).
The univariate distributions it uses to model opinion are: normal, Student t, scaled beta, gamma, log-normal and log Student-t.
An extension quantifies opinion about a multinomial distribution by first eliciting marginal (beta) distributions for the probability of each category, and then reconciling them to form a Dirichlet distribution.
It offers a choice of assessment tasks for quantifying probabilities: quartiles, tertiles, fixed interval, roulette.
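One way to get a feel for the Connor-Mosimann distribution is through its stick-breaking representation: draw independent Z_i ~ Beta(a_i, b_i) for i = 1, ..., k-1, set p_1 = Z_1, let each later p_i be Z_i times whatever probability is still unallocated, and give the remainder to p_k. This mirrors the conditional assessments the expert makes. The sketch below is a minimal illustration using only the Python standard library; the function name and the hyperparameter values are hypothetical. The values are chosen so that b_i equals the sum of the later a's, in which case the distribution reduces to a Dirichlet with hyperparameters (6, 3, 0.6, 0.4).

```python
import random


def sample_connor_mosimann(a, b, rng):
    """Draw (p_1, ..., p_k) by stick breaking, with k = len(a) + 1.

    Z_i ~ Beta(a_i, b_i) independently; p_i is Z_i times the probability
    not yet allocated, and p_k receives whatever remains.
    """
    remaining = 1.0
    p = []
    for a_i, b_i in zip(a, b):
        z = rng.betavariate(a_i, b_i)
        p.append(z * remaining)
        remaining *= 1.0 - z
    p.append(remaining)
    return p


# Hypothetical hyperparameters for k = 4 categories (assumption).
a = [6.0, 3.0, 0.6]
b = [4.0, 1.0, 0.4]

rng = random.Random(0)
n_draws = 20000
draws = [sample_connor_mosimann(a, b, rng) for _ in range(n_draws)]

# Monte Carlo check: the mean of p_1 should be close to a_1 / (a_1 + b_1).
mean_p1 = sum(d[0] for d in draws) / n_draws
print(f"simulated mean of p1 = {mean_p1:.3f}, theory = {a[0] / (a[0] + b[0]):.3f}")
```

Letting the b values depart from the sums of the later a's is what gives the Connor-Mosimann distribution its extra k - 2 degrees of freedom over the Dirichlet.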
Gaussian copulas
The Connor-Mosimann distribution is more flexible than the Dirichlet, but greater flexibility can be obtained by eliciting marginal beta distributions and then using a Gaussian copula to tie the beta distributions together.
Elfadaly and I have developed a method for this and implemented it in free software.

Adding Covariates to the Multinomial Model
Clearly desirable: the probability that an overweight person says their weight is normal may depend on age and gender, for example.
One way to do this is to take the multinomial logistic model as the sampling model.

The multinomial logistic model gives the following probability that a person with covariate values x falls in category i:
\[ p_i(x) = \begin{cases} \dfrac{1}{1 + \sum_{j=2}^{k} \exp(\alpha_j + x'\beta_j)}, & i = 1, \\[2ex] \dfrac{\exp(\alpha_i + x'\beta_i)}{1 + \sum_{j=2}^{k} \exp(\alpha_j + x'\beta_j)}, & i = 2, \ldots, k. \end{cases} \]
The prior distribution gives the α and β coefficients a (singular) multivariate normal distribution. It is singular because, for any given x, the expected values of the probabilities must sum to 1.

Elicit opinion for the subpopulation of men aged 30:

  Man aged 30      Underweight   Normal   Overweight   Obese
                   p1(x)         p2(x)    p3(x)        p4(x)

Then women aged 30:

  Woman aged 30    Underweight   Normal   Overweight   Obese
                   p1(x)         p2(x)    p3(x)        p4(x)

Then men aged 60:

  Man aged 60      Underweight   Normal   Overweight   Obese
                   p1(x)         p2(x)    p3(x)        p4(x)

Eliciting the prior for the separate blocks is tricky; combining the information is quite easy.

Generalised Linear Models
We have developed methods for quantifying opinion about generalised linear models (GLMs).
Can specify: ordinary linear regression, logistic regression, Poisson regression.
Or the sampling distribution and link function may be specified as:
Distribution: normal, Poisson, binomial, gamma, inverse normal, negative binomial, Bernoulli, geometric, exponential.
Link function: canonical, identity, logarithm, logit, reciprocal, square-root, probit, log-log, complementary log-log, power, log-ratio, user specified.
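Returning briefly to the multinomial logistic model: the sketch below shows how category probabilities could be computed for one set of covariate values, with category 1 as the baseline (its probability has numerator 1) and categories 2, ..., k each having an intercept α_j and a coefficient vector β_j. The covariate coding (age in decades, an indicator for male) and all coefficient values are hypothetical, chosen only to make the example run; they are not elicited values.

```python
from math import exp


def category_probabilities(x, alpha, beta):
    """Multinomial logistic probabilities with category 1 as baseline.

    alpha[j] and beta[j] are the intercept and coefficient vector for
    category j + 2; x is the covariate vector. Returns [p_1, ..., p_k].
    """
    scores = [
        exp(a_j + sum(b * v for b, v in zip(b_j, x)))
        for a_j, b_j in zip(alpha, beta)
    ]
    denom = 1.0 + sum(scores)
    return [1.0 / denom] + [s / denom for s in scores]


# Hypothetical covariates: age in decades and an indicator for male.
man_aged_30 = [3.0, 1.0]
man_aged_60 = [6.0, 1.0]

# Hypothetical coefficients for categories 2, 3 and 4 (k = 4).
alpha = [0.5, -1.0, -2.0]
beta = [[0.1, 0.2], [0.2, 0.1], [0.1, -0.3]]

for label, x in (("man aged 30", man_aged_30), ("man aged 60", man_aged_60)):
    probs = category_probabilities(x, alpha, beta)
    print(label, [round(p, 3) for p in probs])
```

Because every probability shares the same denominator, the k probabilities sum to 1 for any x, which is the constraint behind the singular prior mentioned above.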
The prior model gives the regression coefficients a multivariate normal distribution. Opinion is modelled by piecewise-linear models to add flexibility.

This is the type of graph for assessing medians for a continuous variable.

This is the type of bar chart formed for a factor.

Graph for eliciting conditional quartiles.

PEGS (Probability Elicitation Graphical Software)
http://statistics.open.ac.uk/elicitation
Multinomial distribution. Separate programs (and a single combined program) elicit:
• Dirichlet and Connor-Mosimann priors.
• Dirichlet and Gaussian copula priors.
• MVN prior for the multinomial logistic model.
Piecewise-linear GLMs. The program that elicits an MVN prior also quantifies opinion about:
• The error variance in a normal linear model.
• The scale parameter in a gamma GLM.
These are also available in separate stand-alone programs.

SHELF (O'Hagan, Oakley et al., Sheffield). As mentioned earlier, the SHELF software elicits univariate distributions and a Dirichlet prior. The software also includes a web-based tool for eliciting probability distributions: users can log in from different sites and they can all see and interact with the same graphics.
The Elicitator (Cornford, Aston). Problem owners define an elicitation problem and invite experts to participate; the experts then log in to the website and complete a list of questions that make up the elicitation process.
Elicitator (James, Low Choy, Mengersen, Queensland Univ. Tech.). This tool quantifies expert opinion about regression problems in ecology. Opinion at different geographical locations can be elicited (as in Denham and Mengersen, 2007), so as to define covariate values.

Reviews
Cooke, R. M. (1991). Experts in Uncertainty: Opinion and Subjective Probability in Science. (Oxford University Press, New York).
Garthwaite, P. H., Kadane, J. B. and O'Hagan, A. (2005). Statistical methods for eliciting probability distributions. J. Amer. Statist. Ass., 100, 680–701.
Hogarth, R. M. (1975). Cognitive Processes and the Assessment of Subjective Probability Distributions. J. Amer. Statist. Ass., 70, 271–294.
Kynn, M. (2008). The "heuristics and biases" bias in expert elicitation. J. R. Statist. Soc. A, 171, 239–264.
O'Hagan, A., Buck, C., Daneshkhah, A., Eiser, J., Garthwaite, P., Jenkinson, D., Oakley, J. and Rakow, T. (2006). Uncertain Judgements: Eliciting Experts' Probabilities. (Wiley, Chichester).
Peterson, C. R. and Beach, L. R. (1967). Man as an Intuitive Statistician. Psychological Bulletin, 68, 29–46.
Tversky, A. and Kahneman, D. (1974). Judgment under uncertainty: heuristics and biases. Science, 185, 1124–1131.

PEGS References
Al-Awadhi, S. A. and Garthwaite, P. H. (2006). Quantifying expert opinion for modelling fauna habitat distributions. Computational Statist., 21, 121–140.
Garthwaite, P. H., Chilcott, J. B., Jenkinson, D. J. and Tappenden, P. (2008). Use of expert knowledge in evaluating costs and benefits of alternative service provision: A case study. Int. J. Technol. Assess. Health Care, 24, 350–357.
Garthwaite, P. H., Al-Awadhi, S. A., Elfadaly, F. and Jenkinson, D. J. (2013). Quantifying subjective opinion about generalized linear and piecewise-linear models. J. Applied Statist., 40, 59–75.
Elfadaly, F. G. and Garthwaite, P. H. Eliciting Dirichlet and Connor-Mosimann prior distributions for multinomial models. Test, in press.
Elfadaly, F. G. and Garthwaite, P. H. Eliciting Dirichlet and Gaussian copula prior distributions for multinomial models. Submitted.
Elfadaly, F. G. and Garthwaite, P. H. Eliciting prior distributions for extra parameters in some generalised linear models. Submitted.

Other References
Cooke, R. and Solomatine, D. (1990). "EXCALIBR – software package for expert data evaluation and fusion and reliability assessment". Report to the Commission of the European Communities, Delft.
Devilee, J. L. A. and Knol, A. B. (2011). Software to support expert elicitation: An exploratory study of existing software packages. RIVM Letter report 630003001 (National Institute for Public Health and the Environment, Netherlands).