
The Lens Method is a sound quantitative method for countering the effects of inconsistency in human responses. By collecting possible determinants of judgment and running regressions on expert responses, a system of formulas can be created to simulate the judgments of an expert or group of experts. This is useful for frequently repeated judgments and improves performance by removing inconsistency.
Lens Executive Overview
A proven method for improving expert judgment
Methods for Improving on Scoring Models Used for Risk Assessment and Prioritization
An Executive Overview developed by Hubbard Decision Research
March 26, 2009
Overview
This is a description of the Applied Information Economics (AIE) approach to developing the portfolio assessments. Hubbard Decision Research (HDR) uses the “Lens Method,” which has been proven to measurably improve judgments in a variety of fields since the 1950s. HDR combines the Lens Method with training that improves an individual’s ability to assess subjective probabilities and risks. This is a substantial improvement over the most popular “scoring” methods for assessing and prioritizing portfolios.
Scoring Method Problems
For several decades, researchers in the decision sciences have measured the effectiveness of various
decision making methods and have quantified numerous biases, errors and quirky behaviors of decision
makers. Many practical methods have been developed that correct for these problems and result in
measurable improvements in the track record of decision makers. Yet, the typical scoring and ranking
methodology seems to be developed without any regard for this research. Not only do scoring and ranking methods rarely correct for the known errors in human decision making, but the scoring methods themselves appear to add errors of their own. That is, experts would often have been better off using their unaided intuition.
Decision Errors in Scoring and Ranking Methods
There are problems scoring methods don’t fix and problems they add.


Known Judgment Errors Ignored by Typical Scoring and Ranking Methods: There is a long list of
sources of judgment error measured by researchers and virtually none are addressed by popular scoring
methods. The following three may be the most significant (and most avoidable).
o Decision makers are systematically overconfident when assessing chance. When assessing risks and probabilities, decision makers routinely put a much higher chance on being right about their forecasts than their actual track record would justify. That is, of all the times they say they are 90% confident that a particular risky event will not occur, they will turn out to be right much less than 90% of the time. (1, 2, 3, 4)
o Decision makers are extremely inconsistent. Researchers ask decision makers to assess each item in a series (e.g., the survival probability of cancer patients, the chance of cancellation of software projects, the time to complete a defined task) and then later give them the same set of problems in a different order. Most answers will be different and some will be dramatically different. (5, 6)
o Decision makers are influenced by irrelevant data. Due to an effect called anchoring, decision makers are influenced by previously presented data even if the data is irrelevant to the issue. (1) This means that even the order of the items to be scored changes judgments. Due to another effect called framing, even differently worded but logically identical phrases change assessments. (1)
Errors Added by Typical Scoring and Ranking Methods: Not only are the errors above not addressed at
all, but scoring methods tend to add unique errors of their own.
o Arbitrary features of the scoring method cause unintended behaviors when scores are evaluated. For example, users of a scoring model based on a series of 5-point scales tend to use just one or two of the values most of the time but often express uncertainty about their choice. This “low resolution” means that a large number of total scores will fall within a small range, so a single different choice will have a large effect on the ranks, as the simulation sketched after this list illustrates. (7, 8)
o The vaguely defined categories of what should be quantitative values (like defining risk as high/medium/low) cause confused responses even when users feel the methods are well-structured. (9)
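The “low resolution” problem described in the first bullet above can be made concrete with a quick simulation. The Python sketch below assumes, purely for illustration, that scorers pick mostly two adjacent values on each of six 5-point scales; the scoring behavior is invented and not drawn from any actual scoring model.

# Illustration of "low resolution": when scorers favor one or two values on
# each 5-point scale, total scores crowd into a narrow band and many ties
# appear. The scoring behavior below is hypothetical.
import random
from collections import Counter

random.seed(0)

num_projects = 100
num_factors = 6

totals = []
for _ in range(num_projects):
    # Scorers mostly choose 3 or 4 on each scale, rarely anything else.
    scores = random.choices([1, 2, 3, 4, 5], weights=[1, 3, 46, 46, 4],
                            k=num_factors)
    totals.append(sum(scores))

counts = Counter(totals)
print("Distinct total scores across 100 projects:", len(counts))
print("Most common totals:", counts.most_common(3))
# With totals crowded into a narrow band, many projects tie, and a one-point
# change on a single factor can move a project past many others in the ranking.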
Solutions
Fortunately, scientifically proven methods do exist to adjust and remove these sources of error. HDR uses
two methods which have been thoroughly tested in controlled settings and in practical applications.
Together, these methods can remove all of the previously mentioned sources of error that experts tend to
have while not adding the errors of most scoring methods. These methods are “Calibration Training” and
“The Lens Method”.
In the case of overconfidence, researchers find that training significantly improves the ability of experts to
assess odds. (10, 11, 12) The chart below shows the combined results of 11 studies of experts assessing
subjective probabilities. In these studies, the average expert will only have about a 55% to 78% chance of
being correct when they say they are 90% confident. But after training, when experts are asked to
subjectively assess the odds of a series of events, the experts will be right 90% of the time they say they are
90% confident, they will be right 75% of the time they say they are 75% confident, and so on. Such a
person is then said to be able to provide “calibrated probabilities”.
[Figure: The Combined Results of 11 Studies in Probability “Calibration” Training. Most experts significantly overstate their confidence in forecasts; calibration training corrects this.]
Research shows that various training methods each produce improvements in calibration of probabilities.
HDR has combined several methods together and has found that about 72% of all decision makers can be
calibrated in about 3 hours of training.
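To make the idea of calibration concrete, the short Python sketch below compares stated confidence with the observed hit rate on a set of hypothetical answers. It is a minimal illustration of what a calibration check measures, not HDR’s training or testing procedure, and the response data are invented.

# Minimal calibration check: group answers by stated confidence and compare
# the stated confidence with the observed hit rate. Data are hypothetical.
from collections import defaultdict

responses = [  # (stated confidence, was the answer correct?)
    (0.90, True), (0.90, False), (0.90, True), (0.90, True), (0.90, False),
    (0.75, True), (0.75, True), (0.75, False), (0.75, True),
    (0.50, True), (0.50, False),
]

buckets = defaultdict(list)
for confidence, correct in responses:
    buckets[confidence].append(correct)

for confidence in sorted(buckets, reverse=True):
    outcomes = buckets[confidence]
    hit_rate = sum(outcomes) / len(outcomes)
    # A calibrated expert's hit rate roughly matches the stated confidence.
    print(f"Stated {confidence:.0%} confident: right {hit_rate:.0%} of the time "
          f"({len(outcomes)} answers)")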
Inconsistency is addressed in a different manner. When experts are asked to assess, say, the risk of failure
for a list of projects, and then shown duplicates in a different order, they will give very different answers.
But if their answers to a set of judgment problems can be “smoothed out” statistically, then the responses
will not only be perfectly consistent, but will be better.(5, 6, 13) This is done by building what is called a
regression model of the experts’ judgments after observing a large number of their judgments (even if the
judgments are for hypothetical situations). The following chart shows the effectiveness of this method in a
variety of expert judgment situations.
[Figure: The Lens Method Track Record. These are the measured improvements to judgments of experts on various topics, shown as ranges of results across studies.*]
*These are studies done by HDR. The other studies are published literature.
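As a simplified illustration of the regression step described above, the Python sketch below fits an ordinary least-squares model to a handful of expert judgments and then scores a duplicated scenario. The factor names, values, and judgments are invented for this example, and the sketch is not HDR’s actual modeling procedure.

# A minimal "Lens"-style regression sketch: fit a linear model to an expert's
# judgments, then let the model answer with perfect consistency.
# All factor values and judgments below are hypothetical.
import numpy as np

# Each row holds factor values for one scenario (e.g., complexity, team
# experience, schedule pressure); y holds the expert's judged probability
# of project failure for that scenario.
X = np.array([
    [3, 2, 4],
    [1, 5, 2],
    [4, 1, 5],
    [2, 4, 3],
    [5, 2, 5],
    [3, 3, 1],
], dtype=float)
y = np.array([0.55, 0.20, 0.70, 0.35, 0.80, 0.30])

# Add an intercept column and solve the least-squares problem.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def model_judgment(factors):
    """Return the modeled judgment for a scenario's factor values."""
    return float(coef[0] + np.dot(coef[1:], factors))

# A duplicated scenario always gets the same modeled answer, even though the
# expert might have scored the duplicates differently.
print(model_judgment([3, 2, 4]))
print(model_judgment([3, 2, 4]))  # identical by construction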
Combining calibration and the Lens Method without using arbitrary scales avoids significant sources
of error in expert judgments. The effects of these methods are not only well documented in the published literature but have also been demonstrated on a large number of practical problems in real-world settings. HDR has calibrated hundreds of experts for a variety of forecasting problems and has used the
Lens Method in several organizations to replace existing scoring models. Below is an overview of a
seven-step process that combines calibration training with the Lens Method.
Seven Steps for Building a Better Scoring Method
1. Identify SMEs: Identify the subject matter experts (SMEs) who will participate.
2. Identify Decision Factors: With the SMEs, identify a list of 10 or fewer factors that are predictive, fairly objective, and known at the time of the assessment (e.g., TRL, mission area, lead organization).
3. Calibrate SMEs: Train the SMEs to provide calibrated probability assessments.
4. Generate Scenarios: Using the identified factors, generate 30 to 50 scenarios using a
combination of values for each of the factors just identified—they can be based on real
examples or purely hypothetical.
5. Subjectively Assess Scenarios: Ask the experts to provide the relevant estimate for each
scenario described (e.g. project risk, relative importance, etc.).
6. Statistically Analyze Assessments: Analyze the responses and develop a statistical model that
approximates the relationship between the given factors and SME assessments.
7. Build a Spreadsheet Tool: Build a spreadsheet template that will generate assessments based on the factors provided (a sketch of steps 4, 6, and 7 follows this list).
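To show how steps 4, 6, and 7 can fit together, the Python sketch below generates hypothetical scenarios from invented factor levels, fabricates SME assessments as a stand-in for step 5, fits a linear model, and wraps the result in a scoring function of the kind a spreadsheet template would reproduce. All factor names, levels, and numbers are assumptions made for illustration, not part of the method description itself.

# Sketch of steps 4, 6, and 7: generate scenarios from factor levels, fit a
# linear model to (hypothetical) SME assessments, and expose a scoring
# function of the kind a spreadsheet tool would implement.
import itertools
import random

import numpy as np

random.seed(1)

# Step 4: hypothetical factors and their possible levels; sample 40 scenarios.
factor_levels = {
    "trl": [1, 3, 5, 7, 9],              # technology readiness level
    "team_experience": [1, 2, 3, 4, 5],  # 1 = novice, 5 = veteran
    "schedule_months": [6, 12, 24, 36],
}
all_combinations = list(itertools.product(*factor_levels.values()))
scenarios = random.sample(all_combinations, 40)

# Step 5 (stand-in): in practice, calibrated SMEs assess each scenario; the
# assessments below are fabricated only so the example runs end to end.
assessments = [
    0.9 - 0.05 * trl - 0.04 * exp + 0.005 * months + random.gauss(0, 0.03)
    for trl, exp, months in scenarios
]

# Step 6: fit a linear model relating the factors to the SME assessments.
A = np.column_stack([np.ones(len(scenarios)), np.array(scenarios, dtype=float)])
coef, *_ = np.linalg.lstsq(A, np.array(assessments), rcond=None)

# Step 7: the scoring function a spreadsheet template would reproduce.
def score(trl, team_experience, schedule_months):
    """Modeled assessment for a new project's factor values."""
    return float(coef[0] + coef[1] * trl + coef[2] * team_experience
                 + coef[3] * schedule_months)

print(score(trl=5, team_experience=3, schedule_months=12))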
Conclusion
Since methods exist that avoid the errors of scoring methods and since they have been implemented in
exactly the same environments, there is no reason to ever build a method based on arbitrary scores.
Given the scope and risk of many of the major decisions scoring methods are routinely applied to, it is
critical that scoring methods be replaced as soon as possible with methods that have measurable effectiveness.
Bibliography
1. D. Kahneman, P. Slovic, and A. Tversky, Judgment under Uncertainty: Heuristics and Biases, Cambridge University Press, Cambridge, 1982.
2. D. Kahneman and A. Tversky, "Subjective Probability: A Judgment of Representativeness," Cognitive Psychology, 4: pp. 430-454, 1972.
3. D. Kahneman and A. Tversky, "On the Psychology of Prediction," Psychological Review, 80: pp. 237-251, 1973.
4. S. Lichtenstein, B. Fischhoff, and L.D. Phillips, "Calibration of Probabilities: The State of the Art to 1980," in Judgment under Uncertainty: Heuristics and Biases, D. Kahneman, P. Slovic, and A. Tversky, eds., Cambridge University Press, Cambridge, 1982, pp. 306-334.
5. E. Brunswik, "Representative Design and Probabilistic Theory in a Functional Psychology," Psychological Review, 62: pp. 193-217, 1955.
6. N. Karelaia and R.M. Hogarth, "Determinants of Linear Judgment: A Meta-Analysis of Lens Studies," Psychological Bulletin, 134(3): pp. 404-426, 2008.
7. L.A. Cox Jr., "What's Wrong with Risk Matrices?" Risk Analysis, 28(2): pp. 497-512, 2008.
8. L.A. Cox Jr., et al., "Some Limitations of Aggregate Exposure Metrics," Risk Analysis, 27(2), 2007.
9. D.V. Budescu, et al., "Improving Communication of Uncertainty in the Reports of the Intergovernmental Panel on Climate Change," Psychological Science, 20(3): pp. 299-308, 2009.
10. D.W. Hubbard and D. Evans, "Problems with the Use of Scoring Methods and Ordinal Scales in Risk Assessment," IBM Journal of Research and Development: Special Issue on Business Integrity through Integrated Risk Management, Fall 2009 (completed but not yet published).
11. C.P. Bradley, "Can We Avoid Bias?" British Medical Journal, 330: p. 784, 2005.
12. D. Hubbard, How to Measure Anything: Finding the Value of Intangibles in Business, John Wiley & Sons, 2007.
13. C.F. Camerer, "General Conditions for the Success of Bootstrapping Models," Organizational Behavior and Human Performance, 27: pp. 411-422, 1981.