The Lens Method is a sound quantitative method for countering the effects of inconsistency in human responses. By collecting the likely determinants of a judgment and running regressions on expert responses, a system of formulas can be created that simulates an expert's (or a group of experts') judgments. This is most useful for frequently repeated judgments, where it improves performance by removing inconsistency.

Lens Executive Overview
A Proven Method for Improving Expert Judgment
Methods for Improving on Scoring Models Used for Risk Assessment and Prioritization
An Executive Overview developed by Hubbard Decision Research
March 26, 2009

Overview

This is a description of the Applied Information Economics (AIE) approach to developing portfolio assessments. Hubbard Decision Research (HDR) uses the "Lens Method," which has been proven to measurably improve judgments in a variety of fields since the 1950s. HDR combines the Lens Method with training that improves an individual's ability to assess subjective probabilities and risks. Together, these are a substantial improvement over the most popular "scoring" methods for assessing and prioritizing portfolios.

Scoring Method Problems

For several decades, researchers in the decision sciences have measured the effectiveness of various decision-making methods and have quantified numerous biases, errors, and quirky behaviors of decision makers. Many practical methods have been developed that correct for these problems and result in measurable improvements in the track record of decision makers. Yet the typical scoring and ranking methodology seems to be developed without any regard for this research. Not only do scoring and ranking methods rarely correct for the known errors in human decision making, but the scoring method itself appears to add errors of its own. That is, experts would have been better off using their intuition.

Decision Errors in Scoring and Ranking Methods

There are problems scoring methods don't fix and problems they add.

Known Judgment Errors Ignored by Typical Scoring and Ranking Methods: There is a long list of sources of judgment error measured by researchers, and virtually none of them are addressed by popular scoring methods. The following three may be the most significant (and the most avoidable).

o Decision makers are systematically overconfident when assessing chance. When assessing risks and probabilities, decision makers routinely put a much higher chance on being right about forecasts than their actual track record would justify. That is, of all the times they say they are 90% confident that a particular risky event will not occur, they will turn out to be right much less than 90% of the time (1, 2, 3, 4). A brief illustration of how this is measured appears after this list.

o Decision makers are extremely inconsistent. Researchers ask decision makers to assess each item in a series (e.g., the survival probability of cancer patients, the chance of cancellation of software projects, the time to complete a defined task). The researchers later give the decision makers the same set of problems in a different order. Most answers will be different, and some will be dramatically different (5, 6).

o Decision makers are influenced by irrelevant data. Due to an effect called anchoring, decision makers are influenced by previously presented data even if the data is irrelevant to the issue (1). This means that even the order of the items to be scored changes judgments. Due to another effect called framing, even differently worded but logically identical phrases change assessments (1).
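As a purely illustrative sketch of how such a track record is checked (the forecast records below are hypothetical, and the 90% and 70% confidence buckets are just example levels), calibration is measured by comparing each stated confidence with the share of those forecasts that actually came true:

```python
# Compare stated confidence with the observed hit rate in a (hypothetical) track record.
from collections import defaultdict

# Each record: (stated confidence, whether the forecast turned out to be correct).
forecasts = [
    (0.9, True), (0.9, False), (0.9, True), (0.9, False), (0.9, True),
    (0.9, False), (0.9, True), (0.9, True), (0.9, False), (0.9, True),
    (0.7, True), (0.7, False), (0.7, True), (0.7, False), (0.7, True),
    (0.7, True), (0.7, False), (0.7, False), (0.7, True), (0.7, True),
]

hits = defaultdict(int)     # forecasts that came true, per stated confidence level
totals = defaultdict(int)   # all forecasts made at that confidence level

for confidence, correct in forecasts:
    totals[confidence] += 1
    hits[confidence] += int(correct)

for confidence in sorted(totals, reverse=True):
    observed = hits[confidence] / totals[confidence]
    print(f"said {confidence:.0%} confident -> right {observed:.0%} of the time")
```

A well-calibrated expert's observed hit rate matches the stated confidence; an overconfident expert's falls well short of it, as in the hypothetical data above.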
Errors Added by Typical Scoring and Ranking Methods: Not only are the errors above left unaddressed, but scoring methods tend to add unique errors of their own.

o Arbitrary features of the scoring method cause unintended behaviors when scores are evaluated. For example, users of a scoring model based on a series of 5-point scales tend to use just one or two of the values most of the time, yet often express uncertainty about their choice. This "low resolution" means that a large number of total scores will fall within a small range, and a single different choice can have a large effect on the ranks (7, 8).

o The vaguely defined categories of what should be quantitative values (such as defining risk as high/medium/low) cause confused responses even when users feel the methods are well-structured (9).

Solutions

Fortunately, scientifically proven methods do exist to adjust for and remove these sources of error. HDR uses two methods that have been thoroughly tested in both controlled settings and practical applications. Together, these methods can remove all of the previously mentioned sources of error that experts tend to exhibit, without adding the errors of most scoring methods. These methods are "Calibration Training" and "The Lens Method."

In the case of overconfidence, researchers find that training significantly improves the ability of experts to assess odds (10, 11, 12). The chart below shows the combined results of 11 studies of experts assessing subjective probabilities. In these studies, the average expert has only about a 55% to 78% chance of being correct when they say they are 90% confident. But after training, when experts are asked to subjectively assess the odds of a series of events, they are right 90% of the time they say they are 90% confident, right 75% of the time they say they are 75% confident, and so on. Such a person is then said to be able to provide "calibrated probabilities."

[Chart: The Combined Results of 11 Studies in Probability "Calibration" Training. Most experts significantly overstate their confidence in forecasts; calibration training corrects this. The chart plots the percentage actually correct against the assessed chance, showing the ideal calibration line, the average of un-calibrated experts (overconfidence), and the range of results from studies of un-calibrated and calibrated individuals.]

Research shows that various training methods each produce improvements in the calibration of probabilities. HDR has combined several methods and has found that about 72% of all decision makers can be calibrated in about 3 hours of training.

Inconsistency is addressed in a different manner. When experts are asked to assess, say, the risk of failure for a list of projects, and are then shown duplicates in a different order, they give very different answers. But if their answers to a set of judgment problems can be "smoothed out" statistically, the responses will not only be perfectly consistent but will also be better (5, 6, 13). This is done by building a regression model of the expert's judgments after observing a large number of them (even if the judgments are for hypothetical situations); a brief sketch of this idea follows the chart below. The following chart shows the effectiveness of this method in a variety of expert judgment situations.

[Chart: The Lens Method Track Record. These are the measured improvements to the judgments of experts on various topics. *These are studies done by HDR; the other studies are from the published literature.]
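The sketch below illustrates the regression idea described above under simplifying assumptions: the decision factors (TRL, team experience, budget), the scenario values, the expert's risk estimates, and the choice of an ordinary least-squares fit are all hypothetical stand-ins for whatever factors and statistical model a real engagement would use.

```python
# A minimal sketch of the Lens Method regression: fit a linear model to an expert's
# ratings of hypothetical scenarios, then use the fitted model to score consistently.
import numpy as np

# Each scenario is described by factors known at assessment time
# (here: technology readiness level, team experience in years, budget in $M).
scenarios = np.array([
    [3, 2, 1.5],
    [7, 10, 0.8],
    [5, 4, 2.0],
    [2, 1, 3.5],
    [8, 12, 1.0],
    [4, 6, 2.5],
])

# The calibrated expert's subjective risk-of-failure estimate for each scenario.
expert_risk = np.array([0.65, 0.15, 0.40, 0.80, 0.10, 0.35])

# Ordinary least squares: add an intercept column and solve for the weights.
X = np.column_stack([np.ones(len(scenarios)), scenarios])
weights, *_ = np.linalg.lstsq(X, expert_risk, rcond=None)

def lens_score(trl, experience, budget):
    """Reproduce the expert's judgment policy, minus the inconsistency."""
    return float(weights @ np.array([1.0, trl, experience, budget]))

# The model returns identical answers for identical inputs, unlike the expert.
print(lens_score(5, 4, 2.0))
print(lens_score(5, 4, 2.0))
```

Because the fitted formula always returns the same answer for the same inputs, the inconsistency in the expert's raw responses disappears while the expert's overall judgment policy is preserved.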
Combining calibration and the Lens Method, without using arbitrary scales, avoids significant sources of error in expert judgments. These methods are not only well documented in the published literature, but they have also been applied to a large number of practical problems in real-world settings. HDR has calibrated hundreds of experts for a variety of forecasting problems. The contractor has also used the Lens Method in several organizations to replace existing scoring models. Below is an overview of a seven-step process that combines calibration training with the Lens Method.

Seven Steps for Building a Better Scoring Method

1. Identify SMEs: Identify the subject matter experts (SMEs) who will participate.
2. Identify Decision Factors: With the SMEs, identify a list of 10 or fewer factors that are predictive, fairly objective, and known at the time of the assessment (e.g., TRL, mission area, lead organization, etc.).
3. Calibrate SMEs: Train the SMEs to provide calibrated probability assessments.
4. Generate Scenarios: Using the identified factors, generate 30 to 50 scenarios from combinations of values for each factor; they can be based on real examples or be purely hypothetical.
5. Subjectively Assess Scenarios: Ask the experts to provide the relevant estimate for each scenario described (e.g., project risk, relative importance, etc.).
6. Statistically Analyze Assessments: Analyze the responses and develop a statistical model that approximates the relationship between the given factors and the SME assessments.
7. Build a Spreadsheet Tool: Build a spreadsheet template that generates assessments based on the factors provided (a brief sketch of such a formula follows this list).
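To make the last two steps concrete, the hypothetical sketch below shows how the output of the step-6 regression might be turned into the step-7 spreadsheet tool; the factor names, weights, and the formula in the docstring are invented for illustration rather than taken from any actual HDR model.

```python
# A sketch of the step-7 "spreadsheet tool": once step 6 has produced a set of
# regression weights, assessing a new item reduces to a weighted-sum formula.
WEIGHTS = {
    "intercept": 0.92,
    "trl": -0.07,               # higher technology readiness -> lower risk
    "experience_years": -0.02,  # more team experience -> lower risk
    "budget_musd": 0.05,        # larger budget -> somewhat higher risk
}

def assessed_risk(trl, experience_years, budget_musd):
    """Spreadsheet equivalent: = 0.92 - 0.07*TRL - 0.02*Experience + 0.05*Budget"""
    raw = (WEIGHTS["intercept"]
           + WEIGHTS["trl"] * trl
           + WEIGHTS["experience_years"] * experience_years
           + WEIGHTS["budget_musd"] * budget_musd)
    return min(max(raw, 0.0), 1.0)  # keep the result in the [0, 1] probability range

# Scoring a new project by filling in its factor values.
print(assessed_risk(trl=6, experience_years=8, budget_musd=1.2))
```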
Conclusion

Since methods exist that avoid the errors of scoring methods, and since they have been implemented in exactly the same environments, there is no reason to ever build a method based on arbitrary scores. Given the scope and risk of many of the major decisions scoring methods are routinely applied to, it is critical that scoring methods be replaced as soon as possible with methods that have measurable effectiveness.

Bibliography

1. D. Kahneman, P. Slovic, and A. Tversky, Judgment under Uncertainty: Heuristics and Biases, Cambridge: Cambridge University Press, 1982.
2. D. Kahneman and A. Tversky, "Subjective Probability: A Judgment of Representativeness," Cognitive Psychology, 4: pp. 430-454, 1972.
3. D. Kahneman and A. Tversky, "On the Psychology of Prediction," Psychological Review, 80: pp. 237-251, 1973.
4. S. Lichtenstein, B. Fischhoff, and L.D. Phillips, "Calibration of Probabilities: The State of the Art to 1980," in Judgment under Uncertainty: Heuristics and Biases, D. Kahneman, P. Slovic, and A. Tversky, Eds., Cambridge: Cambridge University Press, 1982, pp. 306-334.
5. E. Brunswik, "Representative Design and Probabilistic Theory in a Functional Psychology," Psychological Review, 62: pp. 193-217, 1955.
6. N. Karelaia and R.M. Hogarth, "Determinants of Linear Judgment: A Meta-Analysis of Lens Studies," Psychological Bulletin, 134(3): pp. 404-426, 2008.
7. L.A. Cox Jr., "What's Wrong with Risk Matrices?" Risk Analysis, 28(2): pp. 497-512, 2008.
8. L.A. Cox Jr. et al., "Some Limitations of Aggregate Exposure Metrics," Risk Analysis, 27(2), 2007.
9. D.V. Budescu et al., "Improving Communication of Uncertainty in the Reports of the Intergovernmental Panel on Climate Change," Psychological Science, 20(3): pp. 299-308, 2009.
10. D.W. Hubbard and D. Evans, "Problems with the Use of Scoring Methods and Ordinal Scales in Risk Assessment," IBM Journal of Research and Development: Special Issue on Business Integrity through Integrated Risk Management, Fall 2009 (completed but yet to be published).
11. C.P. Bradley, "Can We Avoid Bias?" British Medical Journal, 330: p. 784, 2005.
12. D. Hubbard, How to Measure Anything: Finding the Value of Intangibles in Business, John Wiley & Sons, 2007.
13. C.F. Camerer, "General Conditions for the Success of Bootstrapping Models," Organizational Behavior and Human Performance, 27: pp. 411-422, 1981.