Decision Analysis, Vol. 10, No. 2, June 2013, pp. 121–134. ISSN 1545-8490 (print), ISSN 1545-8504 (online). http://dx.doi.org/10.1287/deca.2013.0268. © 2013 INFORMS

Toward an Improved Methodology to Construct and Reconcile Decision Analytic Preference Judgments

Richard M. Anderson
Puget Sound Institute, Center for Urban Waters, University of Washington Tacoma, Tacoma, Washington 98421, rmanders@uw.edu

Robert Clemen
Fuqua School of Business, Duke University, Durham, North Carolina 27708, robert.clemen@duke.edu

Psychologists and behavioral economists have documented a variety of judgmental flaws that people make when they face novel decision situations. Similar flaws arise when decision analysts work with decision makers to assess their preferences and trade-offs, because the methods the analyst uses are often unfamiliar to the decision makers. In this paper we describe a process designed to mitigate the occurrence of such biases; it brings together three steps. In training, the decision maker is first given values to apply in judgment tasks unrelated to the decision at hand, providing an introduction to thinking deliberately and quantitatively about preferences. In practice, the learned tasks are then applied to a familiar decision, with the goal of developing the next incremental level of expertise in using the methods. Finally, in application, the more deliberative style of thinking is used to address the problem of interest. In an environmental resource setting with two oyster habitat managers, we test the procedure by attempting to mitigate the prominence effect that has been reported in the behavioral research literature. The resulting preference weights appear to be free of the prominence effect, providing initial steps toward operationalizing the "building code" for preferences introduced by Payne et al. [Payne JW, Bettman JR, Schkade DA (1999) Measuring constructed preferences: Towards a building code. J. Risk Uncertainty 19(1–3):243–270].

Key words: behavioral decision making; biases; decision analysis; environment; multiattribute value theory; practice
History: Received on December 5, 2012. Accepted by former Editor-in-Chief L. Robin Keller on February 11, 2013, after 1 revision.

1. Introduction

Decision analysis approaches decision complexity generally by decomposing a large and difficult decision problem into a representation consisting of alternatives, preferences, and beliefs. The analyst typically guides the decision maker in identifying and assessing these components so that they are consistent with the relevant normative theory and thus can be recombined using the rules of the theory. For example, many problems involve evaluating an alternative x described in terms of a set of attributes x_1, x_2, ..., x_n. The alternative might be a prospective job offer, and the decision problem might be to choose between job offers that differ in terms of salary, vacation days, and location.
In this situation, the components commonly consist of single-attribute value functions v_i(x_i) and weights w_i (where \sum_i w_i = 1), both representing decision maker preferences, which are assessed and combined using an additive multiattribute value function

v(x) = \sum_{i=1}^{n} w_i v_i(x_i).   (1)

For (1) to be a valid representation of aggregate value v(x), a normative requirement is that the attributes are structured in such a way that they may be thought of as separable, in the sense that variations in preference for outcomes in terms of a given attribute are independent of preferences for outcomes in terms of the others (Keeney and Raiffa 1976). A prescriptive recommendation is assessing redundant component judgments using different methods and performing consistency checks to reconcile any discrepancies. The process of reconciliation is thought to improve the quality of the final assessment since it provides opportunity for the decision maker to think harder about her values, and thus arrive at greater confidence that her values have indeed been captured (von Winterfeldt and Edwards 1986, Huber et al. 1993, Hobbs and Horn 1997).

Consider, for example, the weights w_i in (1). These reflect preference judgments about the ranges of attribute outcomes spanning the alternatives under consideration. A basic notion of consistency is as follows. Suppose the available difference in annual salary (over all job prospects) is $20,000 and salary is weighted twice as highly as vacation days, which range from two to four weeks. Suppose further that a reduction in vacation days from four to two weeks is equivalent in value to having a job in a coastal rather than noncoastal location (assuming the coastal location is preferred). To be consistent, the decision maker should confirm that she would be willing to give up the coastal job location in return for an increase in salary of $10,000, assuming that value varies linearly over these ranges.
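As a concrete illustration of value model (1) and the consistency check just described, the following minimal sketch evaluates the job-offer example with the additive form. The attribute ranges, linear value functions, and weight numbers are our own illustrative assumptions chosen only to mirror the text; they are not elicited values.

```python
# Minimal sketch of the additive value model (1) and the salary/vacation/
# location consistency check described above. All numbers are assumptions.

def linear_value(x, worst, best):
    """Single-attribute value function scaled so that worst -> 0 and best -> 1."""
    return (x - worst) / (best - worst)

salary_range = (80_000, 100_000)       # $20,000 swing across the job offers
vacation_range = (2, 4)                # weeks of vacation
location_value = {"noncoastal": 0.0, "coastal": 1.0}

# Salary weighted twice as highly as vacation; the coastal/noncoastal swing
# judged equal in value to the two-week vacation swing.  Weights sum to 1.
weights = {"salary": 0.50, "vacation": 0.25, "location": 0.25}

def overall_value(offer):
    """v(x) = sum_i w_i * v_i(x_i), value model (1)."""
    return (weights["salary"] * linear_value(offer["salary"], *salary_range)
            + weights["vacation"] * linear_value(offer["vacation"], *vacation_range)
            + weights["location"] * location_value[offer["location"]])

# Consistency check: giving up the coastal location should be exactly
# compensated by a $10,000 salary increase (half the $20,000 salary swing).
a = {"salary": 90_000, "vacation": 3, "location": "coastal"}
b = {"salary": 100_000, "vacation": 3, "location": "noncoastal"}
print(overall_value(a), overall_value(b))   # both 0.625 if judgments are consistent
```

Under these assumed weights the two offers score equally, which is exactly the equivalence the consistency check asks the decision maker to confirm.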
The results of a large body of research in behavioral decision making have shown that violations of this elementary notion of consistency often take place, and typically involve more than, say, random errors due to routine lapses of mental attention (e.g., Lichtenstein and Slovic 2006). This research reports that preference judgments display an extraordinary degree of sensitivity, as well as predictable directions of bias, depending on the way in which preference elicitation questions are asked. For example, the response to the final question in the last paragraph has been shown to depend on whether the question was put in the form of a choice between two alternatives or whether the respondent was given a matching task, i.e., asked to give a value that would induce indifference between alternatives. Respondents have, for example, been shown to prefer the alternative that is superior on the prominent attribute more often in choice than in matching, a widely replicated preference bias called the prominence effect (e.g., Tversky et al. 1988, Fischer et al. 1999). Indeed, these observations have been so ubiquitous that one prominent team of researchers in this field suggested that it may be the case that "the foundations of choice theory and decision analysis are called into question" (Tversky et al. 1988, p. 84; see also Fischer et al. 1999, p. 1074 for a similar comment). The foundation of which they speak is in part the plausible notion that if well-defined preferences exist, they should be invariant to different, but logically equivalent, procedures to elicit them.

The reality has proven more complex. What has instead been observed is that preferences depend on details of how the question is asked (e.g., choice versus match, mentioned earlier; Tversky et al. 1988), how the choice set is displayed (e.g., attribute order; Carlson et al. 2006), or on characteristics of other options in the choice set (e.g., whether an asymmetrically dominated option is included; Huber et al. 1982), to give a few examples. Behavioral decision researchers have concluded that in complex and unfamiliar elicitation settings, when preferences are not available for direct recall, subjects are not merely revealing stable preferences, but instead constructing them on the spot. Lacking precise knowledge of how their fundamental, familiar values relate to the decision setting at hand, they are prone to respond based on error-prone heuristics or on intuition readily influenced by the most affectual or the most salient features of the elicitation procedures (Gilovich et al. 2002, Lichtenstein and Slovic 2006). There is active debate on the mechanisms behind such phenomena, but little doubt that the preferences revealed depend on the questions asked (e.g., Fischer et al. 1999).

How should the proponent of decision analysis respond? If sensitivity analysis reveals that these errors matter, one response is to attempt to use what is being learned about how people respond to elicitation procedures to improve the way in which these procedures are conducted. Indeed, this has been called one of the central tasks of behavioral decision theory (Edwards 1990). With respect to value weights w_i, there have been a number of attempts to determine, first, the features of elicitation protocols to which people are sensitive and the magnitudes and directions of bias that tend to result, and then, second, ways of counteracting those biases, including warning and training respondents to guard against them and designing analytical approaches to mitigate their occurrence. For example, one often replicated result is that the particular version of "value tree" used to structure objectives affects value weights, producing a so-called splitting bias whereby presenting an attribute in greater detail increases the weight that it receives (e.g., Borcherding and von Winterfeldt 1988 report on one of the earliest studies of this phenomenon). In two more recent investigations, Hamalainen and Alaja (2008) attempted, with partial success, to avoid this problem by educating respondents about the danger, whereas Jacobi and Hobbs (2007) developed a computational method to debias weights. Another analytical approach to debiasing weights was aimed at standard "matching" or trade-off task approaches to obtaining weights (setting an attribute level so that a pair of consequences is indifferent). In this case, Anderson and Hobbs (2002) used a statistical procedure to adjust for the magnitude and direction of scale compatibility bias, also frequently reported in the literature, in which the attribute that is adjusted in the response appears to be more salient, leading to overweighting (e.g., Tversky et al. 1988, Slovic et al. 1990, Delquié 1993, Fischer et al. 1999).
In contrast to this analytical approach, Delquié (1997) proposed a potential improvement to the matching procedure itself in which both attributes of the pair are adjusted. However, Fischer et al. (1999) found that a similar procedure remained likely to show a similar bias. Yet another approach has been to use weights derived from a less cognitively demanding task, such as simply ranking the attribute ranges under consideration, followed by deriving weights via a specific algorithm (Edwards and Barron 1994, Barron and Barrett 1996). However, this entails a loss of the specific ratio-scale information held by the decision maker and potentially suboptimal decisions (Jia et al. 1998).

This paper returns to the initial recommendation for addressing error and bias in value weights described earlier, that of making redundant assessments using different methods and reconciling any inconsistencies. Respondents will usually find themselves in complex settings lacking everyday familiarity and will thus to some degree be constructing their preferences and therefore be prone to inconsistency. Our central idea is to use behavioral insights to improve the process of weight elicitation, and particularly of weight reconciliation, in such settings (cf. Edwards 1990, Keeney 2002). We distinguish between the approaches described in the previous paragraph, where these insights were used to debias weights "offline," apart from the process of interaction with the decision maker, and approaches in which these insights are used to design a better interactive elicitation or reconciliation procedure. This paper focuses on the latter category. The former category of approaches will be most useful when little time is available to interact with the decision maker, and in such cases it provides a plausible approach given known tendencies toward weighting bias. However, it leaves unaddressed the prescriptive question of how to use specific behavioral insight to interact with the decision maker in a way that moves toward judgments that are both unbiased, providing normative validation, and held with a high degree of confidence by the decision maker herself, providing user acceptance-based "validation." To be clear, our ultimate goal is to develop weights that are normatively valid (that is, weights that accurately reflect the long-term preferences of the elicitee or her agent), but, as we explain later, user confidence should be one of validity's first fruits in an improved process. We propose that behavioral insight can and should be used to explicitly develop prescriptive guidance about how weights should be elicited and then reconciled, seeking to build on previous studies that applied multiple methods but did not report details of the underlying behavioral philosophy (e.g., Hobbs and Horn 1997). Thus we attempt to operationalize the Payne et al. (1999) concept of a "building code" for preference construction, providing a framework for a new way to think about preference elicitation and reconciliation (cf. Gregory 1999).

We illustrate these ideas in an environmental preference assessment exercise conducted with two professional natural resource managers. Although our demonstration shows that our approach can be used with real decision makers, we hasten to add that the demonstration does not constitute experimental evidence one way or the other. Providing such evidence remains for a future study.

We continue in §2 by reviewing the existing literature on reconciliation to see what it suggests about how reconciliation has been and should be done.
We ask whether these previous approaches to eliciting and reconciling value weighting judgments have led to weights that satisfy two key aspects of validity: convergence and user acceptance. We conclude that challenges to doing so remain, motivating the synthesis of the set of procedures described in §3 for eliciting and reconciling the component weight judgments required for the multiattribute value model (1). In §4 we describe how this procedure was applied in elicitations conducted with two North Carolina natural resource managers. In §5 we describe how our approach represents initial steps toward implementation of the Payne et al. (1999) building code for preference construction, and we outline hypotheses that should be tested to translate those guidelines into generalizable and practical guidance for decision analysts.

2. Studies on Reconciliation: Is Convergent Validity the Answer?

Beyond the standard advice to perform redundant elicitations, little has been reported in the literature on what actually happens, or what should happen, in the reconciliation process. For one thing, merely asking respondents to "think harder" about their preferences might not lead to more accurate assessments if the judgment task they have been assigned is of a level of difficulty such that simpler but more error-prone heuristic thought procedures are employed to carry it out (Payne et al. 1993). Thus the goal of merely converging on a final response may produce a result that, despite greater confidence on the part of the respondent, is still confounded by errors of the sort previously described. Are reconciled judgments more bias free? Can we expect them to be so?

One perspective on inconsistencies in attempted measures of preference is that the inconsistent measures are invalid and that decision-analytic attempts to obtain useful expressions of preference are thus subject to major if not fatal problems (e.g., Tversky et al. 1988, Fischer et al. 1999). For example, many studies have used a convergent validity approach that compares values obtained for the same objects by a holistic and a decomposed procedure. The holistic approach is the intuitive, unaided approach to, say, ranking a set of multiattributed alternatives, whereas an example of a decomposed procedure is the typical decision-analytic decomposition into preferences, beliefs, and alternatives as exemplified by value model (1). These studies have typically been motivated by descriptive questions (e.g., When are the judgments different?), as opposed to prescriptive considerations, where decomposition is recommended for its ability to help address human difficulties in making complex comparisons. The convergent validity principle predicts that if these different approaches are measuring the same thing (e.g., preferentially separable, multiattributed objects), they should be highly correlated; this is one approach to evaluating validity when dealing with measures, such as stated utilities, that are subjective and unobservable. The approach is also used to assess multiattribute value or utility weights elicited by different procedures, as we do in this paper, where the idea of avoiding lack of "validity" (in its broader sense; e.g., Cronbach and Meehl 1955) could be more narrowly expressed as minimizing elicitation mode bias.
However, in anticipation of future applications of the broader ideas, we retain use of the term "validity" when we discuss convergence (cf. Gregory 2004).

For the holistic versus decomposed comparisons, in many cases correlations were high, though not perfect, and arguably offered evidence in favor of validity. Von Winterfeldt and Edwards (1986, p. 364) listed 11 such studies, with correlations ranging from 0.7 to 0.9. However, two studies, one experimental (John 1984) and the other a field study looking at alternative future German energy scenarios (Keeney et al. 1987), dealt with more challenging decision tasks in settings where attributes were uncorrelated or negatively correlated. In these studies, ordinal discrepancies suggesting lack of validity were obtained between intuitive rankings and the results of a multiattribute model. There are also many studies that have reported low correlations and ordinal discrepancies in multiattribute utility models between preference weights derived by different procedures (e.g., Schoemaker and Waid 1982, Borcherding et al. 1991, Hobbs et al. 1992, Bell et al. 2001).

Still, counter to the previously stated suggestion that such inconsistencies and biases call into question the validity of these measures of preference or the usefulness of the procedures that attempt to obtain them, the decision-analytic response is typically to note that they fall short of violations of the normative assumptions of utility theory, as in Ellsberg's paradox, for example (e.g., von Winterfeldt and Edwards 1986). Rather, they are unintended though nonrandom inconsistencies that should and can be somehow reconciled. In fact, it has been further suggested that such an error can be an asset rather than a liability: the attempt to reconcile the demands of the decision-analytic models with decision maker intuition can generate a "creative stress" that provides opportunity for deeper insight into the decision problem (von Winterfeldt and Edwards 1986). At a minimum, noticing and reconciling such inconsistencies is an inherent part of any realistic decision analysis. For example, John (1984) presented inconsistencies between multiattribute utility-based and holistic rankings to subjects and asked, "Why are you inconsistent and which numbers would you like to change?" Subjects attributed the disagreement to faulty judgment and were explicit about what they wanted to change. Keeney et al. (1987) obtained a similar result. But these authors did not address whether these reconciled judgments were themselves bias free.

Eppel (1990) studied the effect of reconciling inconsistent multiattribute value weights in a series of laboratory experiments. His studies focused on the weights themselves and did not involve choosing alternatives. His experiments were administered by computer, facilitating tailored wording and sequencing of questions and immediate feedback and reconciliation of weight inconsistencies to respondents. Four widely used weight elicitation methods were compared: Ratio, Swing weighting, Indifference Trade-off, and Pricing out. (See von Winterfeldt and Edwards (1986) for a detailed explanation of each method.) An additive representation with linear single-attribute value functions was assumed throughout.
Respondents were asked to reconcile any ordinal inconsistencies between the weights from the different methods using one of two modes, a Trade-off mode and a Pricing mode. In the Trade-off mode, normalized weights were converted into indifference values between two alternatives that were supposed to be equal in value, where each alternative was described in terms of two attributes. For each of the four weight elicitation methods, the indifference values for a particular attribute were displayed simultaneously on the screen, and respondents were asked to indicate which indifference value they felt most comfortable with, one of the four or a new one. This process was repeated for each attribute, consistent with the normalization that the weights summed to one. The Pricing mode differed only in that results were displayed and reconciled in terms of the amount society should be willing to pay (i.e., the cost attribute) to move each attribute from its worst to its best consequence. Respondents were undergraduates.

In contrast to Eppel (1990), who used a computerized instrument with undergraduates in a laboratory setting, Hobbs and Horn (1997) held one-on-one interviews to reconcile inconsistent decision analysis judgments from citizen stakeholders in a realistic energy resource planning process for a Canadian utility. Recommended alternatives would be added to a portfolio of nonexclusive options to provide the desired energy services. In this study, two different weight elicitation methods were applied, a Ratio-questioning/Swing-weighting hybrid and Indifference Trade-off weighting, each combined with the same additive single-attribute value functions. In a third approach, alternatives were also scored holistically. Interviews were held with stakeholders individually, each lasting about 1.5 hours. The reconciliation proceeded in two stages. In the first stage, weights from the two elicitation approaches were reflected back to the stakeholder, compared, and discussed. Stakeholders were given an opportunity to revise any judgments or choose a preferred set of weights. Discussion then focused on the alternatives recommended by the different sets of weighting judgments. Each alternative could be categorized as either "in" or "out" in this formulation, and reconciliation focused on cases where an alternative was ranked "in" under one weighting method and "out" under the other. Stakeholders were asked to indicate which of the two sets of weights was consistent with their values and why, leading ultimately to a final set of weights and recommended alternatives with which they were most comfortable. In the second stage, the final recommendation for each alternative ("in" or "out") was compared to that made by the holistic assessment. Any differences were discussed. A final recommendation for each alternative was elicited, and when it differed from the recommendation of the preferred weights, a rationale was requested.

As would be expected, both studies found that different weight elicitation methods lead to different weights, suggesting the need for reconciliation: different magnitude orderings of weights in Eppel (1990), and different weights in the sense of recommending different portfolios of alternatives in Hobbs and Horn (1997). Hobbs and Horn (1997) also found (stage two) disagreement in recommended alternatives between final weights and holistic choices.
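To make the Trade-off reconciliation display described above more concrete, the following sketch converts a pair of normalized weights into the indifference value that could be shown back to a respondent. The linear value functions, the particular pair of alternatives, and the numbers are our own assumptions for illustration; this is not Eppel's (1990) implementation.

```python
# Sketch of converting normalized weights into an indifference value for a
# Trade-off-style reconciliation display (in the spirit of Eppel 1990).
# Linear single-attribute value functions and the alternatives are assumed.

def indifference_level(w1, w2, attr2_worst, attr2_best):
    """Return the attribute-2 level x2 that makes B ~ A, where
    A = (attribute 1 at its best, attribute 2 at its worst) and
    B = (attribute 1 at its worst, attribute 2 at x2),
    assuming linear value functions and v(x) = w1*v1 + w2*v2.
    """
    v2_star = w1 / w2            # from w1*1 + w2*0 = w1*0 + w2*v2(x2)
    if v2_star > 1.0:
        return None              # attribute 2 alone cannot compensate
    return attr2_worst + v2_star * (attr2_best - attr2_worst)

# Example: weights for (cost, ski-slope quality) from one elicitation method.
# Quality is scored from 70 (worst) to 90 (best).
print(indifference_level(0.4, 0.6, attr2_worst=70, attr2_best=90))  # about 83.3
```

Displaying the four method-specific indifference values computed this way side by side is what allows the respondent to pick the one (or a new one) with which she is most comfortable.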
Eppel (1990) attempted to understand the strategies used by respondents to reconcile inconsistent weights. To do so, he categorized each final response (separately for the Trade-off and Pricing modes) as being closest (in terms of absolute difference) to one of the four responses (i.e., the feedback response) initially obtained by the four weight elicitation methods. Each respondent was categorized as reconciling toward the method that had the lowest average (across attributes) absolute difference between feedback and final values. The frequencies of these classifications for the two reconciliation formats (Trade-off or Pricing) were then analyzed. In contrast, Hobbs and Horn (1997) focused on the degree of stakeholder confidence in final recommended portfolios of alternatives. Most inconsistencies in this study were finally resolved in favor of weight-recommended portfolios, but there were some exceptions in which stakeholders finally rejected the model-based recommendations. Typically, the reason given was disagreement with the assumptions behind the multiattributed descriptions of certain alternatives. Providing opportunity for such insights had been viewed as one of the goals of the reconciliation process. Based on explicit stakeholder feedback, Hobbs and Horn (1997) concluded that applying different preference elicitation methods plus reconciliation resulted in learning and increased confidence in recommendations.

Although these studies shed light on what does happen in the reconciliation of multiattribute value weights, they also suggest additional things that should happen in reconciliation. Given the abundance of evidence in the literature that preferences are often constructed at the time the valuation question is asked, and thus will reflect features of the construction process, a more informative reconciliation procedure might attempt to indicate whether the reconciled weights are likely to be free of assessment-related biases. If preferences are really being constructed at the time of elicitation, then the reconciliation process, in which initial judgments are reviewed and then adjusted to better reflect values, should overlap considerably with the construction process. User acceptance and confidence in the results of the reconciliation process is certainly of critical importance (e.g., Hobbs and Horn 1997), the goal, of course, being an expression of preference that serves as a guide to action. However, we also wish that this confidence rests on a firm foundation, one that is not essentially determined by features of the procedure by which these results (e.g., weights) were initially elicited and/or reconciled. A more diagnostic look at the processes involved in reconciliation could suggest further avenues toward accomplishing these dual objectives.

For example, Eppel (1990) reported evidence that reconciled weights might not be bias free. Recall that he calculated the frequencies of classifications for the two reconciliation modes, Trade-off and Pricing. So, for example, if a respondent's final, reconciled response via the Trade-off mode was closest to the feedback response from the Ratio method, he was categorized as reconciling toward the Ratio method. Eppel (1990) found that for the Trade-off reconciliation mode, the largest fraction (6/16) of respondents reconciled toward the Trade-off weighting feedback value, and for the Pricing reconciliation mode, the largest fraction (also 6/16) of respondents reconciled toward the Pricing out weighting feedback value.
He interpreted this as evidence of a tendency to reconcile toward the method that is most compatible with the corresponding feedback format. He noted that this could be partly because of a memory effect, since the feedback indifference values for the Trade-off weighting method were identical to the ones the respondent gave in the initial elicitation. However, Eppel (1990) also suggested that the question remains open whether this could be because of a more fundamental "compatibility effect" similar to the one discussed in the literature (e.g., Tversky et al. 1988, Slovic et al. 1990, Delquié 1993, Anderson and Hobbs 2002), where the response (in this case, the reconciliation strategy) is biased toward the most compatible stimulus (in this case, the feedback format). If so, reconciled weights may be biased by the particular process by which they were obtained. A second, follow-up experiment by Eppel (1990) for a different decision context with an expanded set of reconciliation modes produced confirmatory results.

Eppel (1990) also found that, in general, reconciled weights failed to conform to certain other desirable requirements: they displayed the well-known "splitting bias" (e.g., Borcherding and von Winterfeldt 1988, Weber et al. 1988) and failed to respond appropriately to changes in attribute ranges (e.g., Fischer 1995, Keeney 2002). Both of these effects had been present in the initial weights as well.

In contrast to Eppel (1990), Hobbs and Horn's (1997) study reflected a shift away from a focus on satisfying such desirable criteria to a concern with user acceptance and confidence in final weights. Hobbs and Horn (1997) suggested that respondents found the more flexible one-on-one format of personal interviews to be essential to a reconciliation process that increases learning and builds user confidence. But it would seem that this would tend to compromise one's ability to maintain experimental control across interviews, because the style of the experimenter might change from one interview to the next (as pointed out by Eppel 1990). Analyst influence is one of the factors reported to affect preferences in a constructive context (e.g., Fischhoff et al. 1980, Brown 2005). Perhaps more importantly, the behavioral rationale for the elicitation and reconciliation design is not specified, so no basis is provided for having confidence that the final weights are unbiased.

Still, in light of the difficulties encountered in obtaining satisfactory weights, Eppel (1990) concluded that perhaps a broader viewpoint on what constitutes a valid or high-quality preference assessment process is needed, e.g., one that also considers user confidence. In other words, it may be that no single preference assessment process is likely to satisfy all aspects of validity. Is such a compromise really necessary? In contrast to Eppel (1990), we suggest that an improved process, comprising both the initial elicitation and an appropriately designed reconciliation, can lead to weights that are both bias free and satisfying to users. Freedom from bias (here, elicitation mode bias) remains the primary determinant of validity, but we would hope that the implied user confidence that his or her preferences have been accurately assessed would be one of the immediate and most important fruits of the process.
This is what should happen in the reconciliation of value weights, though our observations in this section suggest that this is not necessarily what does happen. In the next section, we turn to a discussion of what a process that does both might look like, linking the proposed improvements to the particular issues they are designed to address.

3. Elicitation and Reconciliation to Learn Both Methods and Preferences

To move toward the design of an elicitation and reconciliation procedure that addresses the dual goals of freedom from bias and user acceptance, we offer the following fundamental principle: because respondents are constructing or learning about their preferences during elicitation, decision-analytic techniques should facilitate this process (Payne et al. 1999, Gregory et al. 2005). Arguably, then, decision analysis should provide a framework so that respondents may adjust their initial responses as they better understand how the relevant values (e.g., their basic values) apply to the decision situation at hand. In short, it should provide an environment in which they can learn about their preferences. That this is done in a behaviorally effective way is the first requirement of an improved elicitation and reconciliation design.

Previous reconciliation procedures may not effectively facilitate the learning process. Some respondents have found the trade-off weighting approach to eliciting value weights (Keeney and Raiffa 1976) too complex, suggesting this approach is sometimes a hindrance rather than a help (e.g., Fischer 1995, Hobbs and Horn 1997). More fundamentally, previous procedures may have failed to adequately instruct respondents in conducting the more deliberate and quantitative introspection about preferences that is required. Thus it becomes unclear whether adjustments during reconciliation actually represent learning about (e.g., constructing) preferences. They may instead signal confusion about how the methods work, how to do the thinking required, or even about what kind of thinking is required. Kahneman and Frederick (2002) discuss a dual-process model of cognitive operations in which more deliberative, reason-driven judgments may endorse, correct, or override judgments that are more intuitive, affective, and error prone (see also Kahneman 2011). Facilitating the more deliberate judgments is, of course, the role of decision analysis, but if respondents maintain what may be a customarily more intuitive style in providing judgments, then decision analysis fails in its aim of facilitating learning. In fact, these two styles of thinking, reason and intuition, are inextricably linked in decision making, so to be more successful an elicitation procedure must bring the two together (Finucane et al. 2003). This suggests that, as a second requirement, an improved elicitation and reconciliation design must seek to ensure that the more deliberative way of thinking about preferences is actually taking place and interfacing with the more intuitive.

These two requirements are addressed by a three-step process: (1) training, designed to build skill and familiarity with the style of thinking required; (2) practice, where the learned elicitation tasks are performed for a decision with which the respondent is familiar; and (3) application, where the elicitation tasks, including any necessary reconciliation of inconsistent judgments, are applied to the actual decision context that we are interested in for the analysis.
We proceed to explain each of these steps in detail, including our rationale for how it satisfies the two requirements just elucidated.

3.1. Training
In an approach we adapt here in our first step, Huber et al. (2002) describe the use of training to prepare respondents in a principal-agent study. The respondents are given a set of part-worths (the overall preference or value associated with each specified level of each attribute; for example, using value model (1), the part-worth associated with option x on attribute i would be w_i v_i(x_i)) that they are to apply in performing a series of choice and matching tasks, and then given feedback on the accuracy of their responses. Because the respondents are able to focus exclusively on how the judgment tasks are used to translate the known part-worths, it is likely that by the end of training they would be highly confident in making the judgments and introduced to a way of thinking deliberately and quantitatively about preferences.

3.2. Practice
In this second step, the respondents first encode single-attribute value functions, and then carry out the choice and matching tasks just learned for a decision with which they have everyday familiarity. Value weights can be derived from responses to the choice and matching tasks. An example of a familiar decision is buying groceries, where there is direct interaction with the market, and where through repeated experience a stable set of preferences is likely to have been formed, in particular, monetary values associated with the experienced consequences of purchasing decisions. This step involves only an incremental level of difficulty compared to the previous step because, using methods with which they are now familiar, respondents need only encode their personal, well-known values.

We hypothesize that, after these first two steps, the respondents' status with respect to understanding preference-elicitation methods and quantitative ways of thinking about preferences will be significantly improved in two specific ways. First, the respondents should possess a significantly greater comfort level and skill with the methods involved in performing the judgment tasks. Second, the respondents may have gained important insights through contrasting the outcomes of familiar decisions made in a customarily more intuitive style with outcomes of the decision analytic approach that seeks to be more deliberate and consistent. We would hypothesize that any variability or inconsistency in preferences observed in the next step (application to the real problem) will be because of a lack of understanding of (or uncertainty about) preferences, and not a lack of understanding of the assessment methods or of the quantitative way of thinking about preferences. With the removal of this confounding factor related to method, an environment would have been created for respondents to effectively learn quantitatively about (e.g., construct) their preferences in the real problem of interest and for unambiguous identification (by the analyst) of any behavior indicating poor preference construction.
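A small sketch of the logic behind the training and practice steps follows: with part-worths supplied to the respondent, both choice and matching tasks have objectively correct answers against which feedback can be given. The part-worth numbers below are illustrative assumptions of our own (the study's training part-worths were presented only graphically), and the interpolated matching answer is hypothetical.

```python
# Sketch of the training-step logic: with part-worths w_i * v_i(x_i) given to
# the respondent, choice and matching tasks have objectively correct answers
# that can be used for feedback. All numbers below are illustrative only.

part_worths = {                        # attribute -> {level: w_i * v_i(level)}
    "total_cost":    {900: 0.00, 600: 0.25, 300: 0.50},   # dollars
    "slope_quality": {70: 0.00, 80: 0.20, 90: 0.30},      # quality score
}

def total_value(option):
    """Overall value of an option = sum of its part-worths (value model (1))."""
    return sum(part_worths[attr][level] for attr, level in option.items())

# Choice training task: the correct answer is the option with the higher total.
A = {"total_cost": 300, "slope_quality": 70}
B = {"total_cost": 900, "slope_quality": 90}
correct_choice = "A" if total_value(A) > total_value(B) else "B"
print(correct_choice)                  # "A": the cost swing outweighs quality here

# Matching training task: what cost, paired with quality 70, makes the option
# equally valuable to B?  The respondent's estimate is scored against this.
needed = total_value(B) - part_worths["slope_quality"][70]      # 0.30
lo_v, hi_v = part_worths["total_cost"][600], part_worths["total_cost"][300]
cost_answer = 600 - (needed - lo_v) / (hi_v - lo_v) * (600 - 300)
print(cost_answer)                     # 540.0 under these assumed part-worths
```

The feedback bands used in the study's training (within 10%, within 20%, beyond 20% of the correct answer) can be computed directly from such an answer, which is what makes the method-learning step objective.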
3.3. Application
Finally, the learned elicitation methods and more deliberative thinking style are carried out to address the decision problem of interest. Elicitation and reconciliation can now be expected to be a more deliberative process and to reflect quantitative learning about one's preferences, integrating this style of thinking with the more intuitive and qualitative. Because the elicitation methods as well as the types of thinking involved in making the judgments are now better understood and integrated, we hypothesize that this learning will also produce a high level of user acceptance of the methods and their results.

Choice and matching tasks are chosen as the different methods to use in carrying out the redundant assessments and consistency checks involved in deriving value weights based on a behavioral rationale. In one of the strongest preference biases documented in the decision-making literature, choice tasks have been shown to lead to steeper (more unequal) weights, whereas matching (direct trade-off) tasks have been shown to lead to flatter (more equal) weights (Tversky et al. 1988, Fischer et al. 1999). This effect has been labeled the prominence effect, for which a recent explanation is that response tasks, like choice, that have as a goal differentiating between alternatives tend to give more weight to the more important (prominent) attribute than tasks such as matching that have equating alternatives as their goal (Fischer et al. 1999). Therefore, all other things being equal, it is plausible that using them together is likely to create offsetting tendencies and result in weights that better reflect user preferences. This idea has been described differently: (i) as using a portfolio of assessment procedures for which the biases cancel out (Kleinmuntz 1990); and (ii) as attempting to overcome inconsistencies among methods by triangulation of responses (Payne et al. 1999). Our point is that using these two particular methods together provides a behavioral basis for expecting weighting judgments to be free of a major, known weighting bias.

4. Demonstration: Attempting to Mitigate the Prominence Effect Through Training and Practice in Elicitation Methods

Two state agency natural resource managers provided decision-analytic preference judgments to construct value model (1) for a decision problem involving setting harvest policy for management of oyster fishery habitat in North Carolina, United States. Decision maker 1 provided judgments from which single-attribute value functions and value weights were derived using choice and matching tasks for a two-attribute problem, whereas decision maker 2 did the same for a three-attribute version of the problem. Before addressing the attributes of interest, both individuals underwent the training and practice steps described earlier. As stated at the outset, although our example demonstrates the feasibility of the approach, it is not meant to provide controlled experimental evidence.

4.1. Training
The respondents were presented with part-worths, which reflect linearity of the value functions, and are assumed to be known (Figure 1). An example of a choice training question, including the feedback that was given if the chosen alternative is wrong, is given in Figure 2(a). For this question, the correct choice is A, because the length of the "Total cost" bar reflected by the poor–good difference is greater than that reflected by the "Ski slope quality" difference. Several training choice tasks of differing complexity were administered. For training in matching tasks, the respondents learned the correct answer ($573 in Figure 2(b)) after generating their estimates.
Additional feedback depended on whether responses were within 10%, within 20%, or more than 20% of the magnitude of the correct response. An answer within 10% elicited a "Very good" response, 10%–20% elicited an "OK," whereas >20% elicited "That's not very accurate." Again, several matching training tasks were administered. Both respondents gradually developed facility in making the judgments, including quite subtle ones.

Figure 1. Known part-worths and instructions provided to respondents in the training stage of elicitation and reconciliation. For each attribute (total cost, $900-$600-$300; ski slope quality, C-70, B-80, A-90; likelihood of excellent snow, 50%, 70%, 90%), a bar shows the relative importance of a shift from poor to fair to good; the length of each bar reflects the value provided by that attribute, the dark segment the value of moving from poor to fair, and the lighter segment the value of moving from fair to good.

Figure 2. (a) Choice and (b) matching training exercises administered to respondents. (a) Choice training exercise: "Which one would you choose, A or B?" with A = ($300 total cost, quality 70) and B = ($900 total cost, quality 90). Feedback: "The correct answer is A. The importance of $300 versus $900 in total cost is greater than the importance of 70 versus 90 in quality." (b) Matching training exercise: "What cost would make these equally valuable?" with A = ($300, 50% likelihood of excellent snow) and B = (cost to be supplied, 90% likelihood). Feedback: correct answer = $573; within $515-$630, "Very good"; within $458-$687, "OK"; otherwise, "That's not very accurate."

4.2. Practice
In this step the respondents chose a familiar, two-attribute problem to provide single-attribute value function, choice, and matching judgments. For one decision maker it was a decision on purchasing a favorite fruit from the grocery store, with the attributes cost per item and distance to the location where the fruit was grown (as a proxy for environmental performance). Attribute value functions were obtained by assigning zero points to the worst outcome and 100 points to the best and having the decision maker assign the midpoint of the attribute scale to a point (between 0 and 100) reflecting its importance relative to the attribute's endpoints (Fishburn 1967). Curvature in these functions was then fit with a one-parameter exponential function (Keeney 1992).

Our main interest was in the choice and matching judgments for which training had been provided. Both respondents carried out several choice and matching judgments on these familiar decision problems, improving both their facility with the methods as well as with the type of thinking involved in learning quantitatively about their preferences. There was informal evidence that the latter process was taking place. The respondent who used the fruit purchase as her familiar decision commented that prior to the exercise her intuitive judgment was that both attributes under consideration were "important," but that this approach forced her to think about her trade-offs more specifically. She expressed surprise at identifying the limits on how much extra she would spend to purchase a fruit that was grown closer to the purchase location. She noted that she could feel the tension between the more analytic, deliberative, and the more intuitive style of thinking about the decision.
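The midpoint-then-exponential encoding just described can be made concrete with a short sketch. The exponential form follows the one-parameter family discussed by Keeney (1992), but the particular parameterization, the bisection solver, and the midpoint number are our own illustration; they are not the functions elicited in the study.

```python
import math

def exp_value(x, worst, best, rho):
    """One-parameter exponential value function scaled so v(worst)=0, v(best)=1.
    Assumes worst < best; rho > 0 gives a concave (risk of diminishing returns)
    shape in the direction of improvement."""
    num = 1.0 - math.exp(-(x - worst) / rho)
    den = 1.0 - math.exp(-(best - worst) / rho)
    return num / den

def fit_rho(worst, best, mid_value, tol=1e-6):
    """Find rho so the attribute midpoint has the assessed value mid_value
    (on a 0-1 scale), by bisection.  This sketch handles the concave case
    (mid_value > 0.5); the convex case would use a negative rho."""
    mid = 0.5 * (worst + best)
    lo, hi = 1e-6 * (best - worst), 1e6 * (best - worst)
    for _ in range(200):
        rho = 0.5 * (lo + hi)
        if exp_value(mid, worst, best, rho) > mid_value:
            lo = rho          # too concave: a larger rho flattens the curve
        else:
            hi = rho
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical assessment: the midpoint of a 0-3 m range is worth 70 of 100 points.
rho = fit_rho(worst=0.0, best=3.0, mid_value=0.70)
print(round(rho, 3), round(exp_value(1.5, 0.0, 3.0, rho), 3))   # about 1.770, 0.70
```

With the fitted curve in hand, the value of any intermediate attribute level can be read off directly, which is what the weight-derivation and scoring steps below rely on.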
4.3. Application
The problem of interest involved three attributes, listed in Table 1. Decision maker 1 provided judgments for the first two attributes, and decision maker 2 for all three. (The process described here does not reflect a comprehensive introspection and structuring of values and objectives; instead, a few objectives and their associated attributes or measures are chosen to illustrate the issues raised in this paper.) The first attribute (E) is constructed (Keeney and Gregory 2005) to consist of levels of factors that are combined to reflect the economic impact of the oyster fishery (Table 2). Decision maker 1 viewed the second attribute (C) as an indicator of habitat quantity for visual foragers, whereas decision maker 2 viewed it as an indicator of the presence of, or potential for, algal blooms. The third attribute (P) is an indication of the enhancement of species abundance or biodiversity because of the habitat effect of the oyster reef (Peterson et al. 2003) and was used to capture the concern of decision maker 2 for ecosystem resilience.

Table 1. Attribute definitions and ranges used to evaluate oyster restoration policy options.
E: Annual statewide economic impact due to the north subtidal commercial oyster fishery; range 1-5 (unitless).
C: Water clarity depth just downstream of a subtidal oyster reef in the lower Neuse River estuary, North Carolina, U.S., given average, July, steady-state hydrographic conditions; range 0-3 m.
P: Enhanced annual production of finfish and crustaceans on subtidal oyster reef in the lower Neuse River estuary, North Carolina, U.S.; range 0-5.2 kg/10 m2.

Table 2. Attribute for annual economic impact of the North Carolina north subtidal commercial fishery, where level 1 is defined as the worst outcome and level 5 as the best outcome.
Level | Ex-vessel value | Fishermen (w/crew) | Total statewide impact | Additional jobs created
1 | $600,000 | 617 | $976,512 | 4.9
2 | $900,000 | 789 | $1,500,346 | 7.8
3 | $1,200,000 | 895 | $2,050,878 | 9.9
4 | $1,500,000 | 993 | $2,653,057 | 11.6
5 | $1,800,000 | 1,013 | $3,193,762 | 13.8
Note. See North Carolina Division of Marine Fisheries (2007, Table 8.4).

First choice and then matching judgments were made; the tasks are displayed in Figures 3 and 4, respectively, for decision maker 1.

Figure 3. Choice tasks for management of North Carolina oyster fishery habitat for decision maker 1. Each task asks, "Which one would you choose, A or B?" over the attributes economic impact of the commercial fishery (1-3-5) and water clarity in the lower NRE (0 m-1.5 m-3 m). The seven pairs were A = (5, 1.5 m) vs. B = (1, 3 m); A = (5, 1.5 m) vs. B = (3, 3 m); A = (5, 1.5 m) vs. B = (4, 3 m); A = (5, 0 m) vs. B = (3, 1.5 m); A = (5, 0 m) vs. B = (2, 1.5 m); A = (5, 2 m) vs. B = (4, 3 m); and A = (5, 2 m) vs. B = (3, 3 m).

Figure 4. Matching tasks for management of North Carolina oyster fishery habitat for decision maker 1. Each task asks, "What level of water clarity would make these equally valuable?" In each pair, alternative A has the lower economic impact level and 3 meters of water clarity, and the respondent supplies the water clarity level for alternative B: A = (1, 3 m) vs. B = (5, ?); A = (1, 3 m) vs. B = (4, ?); A = (1, 3 m) vs. B = (3, ?); A = (1, 3 m) vs. B = (2, ?); A = (4, 3 m) vs. B = (5, ?); and A = (3, 3 m) vs. B = (5, ?).

To produce a set of value weights for the attributes, each of the six matching judgments was combined with the assessed single-attribute value functions using a standard approach (Keeney and Raiffa 1976), yielding six replicate sets of value weights for each round of elicitations. Figure 5 shows the results of a series of elicitations with both decision makers, three rounds for decision maker 1 (the Figures 3 and 4 tasks repeated in each round) and two for decision maker 2, culminating in a final value weight for each assessed attribute, indicated by the arrows. By convention, weights were normalized to sum to 1. Decision maker 1 had two attributes, so all graphed points fall on the diagonal. For example, (w_E, w_C) = (0.5, 0.5) would be in the middle of the diagonal. In contrast, the fact that decision maker 2 included a third attribute explains why the results are pulled inward from the sum-to-1 line, because all three assessed weights were nonzero.

After each round of elicitations, the decision maker was asked to explain how she arrived at her judgments. If her responses reflected any methodological misunderstanding, this was explained, and another round of elicitations was carried out. Thus, variability in weights in the final round of elicitations is expected to reflect not method but preference uncertainty.
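The standard approach referenced above (Keeney and Raiffa 1976) amounts to solving the indifference equation implied by each matching response. The sketch below works through the two-attribute case; the linear value functions and the matching response (0.9 m) are illustrative assumptions, not judgments elicited in the study.

```python
# Sketch of deriving value weights from one matching (indifference) judgment
# in the two-attribute case via the indifference equation implied by model (1).
# The value functions and the matching response below are illustrative only.

def v_E(e):                      # economic impact, levels 1 (worst) to 5 (best)
    return (e - 1) / 4.0         # linear for simplicity of the sketch

def v_C(c):                      # water clarity, 0 m (worst) to 3 m (best)
    return c / 3.0               # linear for simplicity of the sketch

# Matching task: A = (E=1, C=3 m) judged indifferent to B = (E=5, C=c_star).
# Indifference under value model (1):
#   w_E*v_E(1) + w_C*v_C(3) = w_E*v_E(5) + w_C*v_C(c_star)
# With the scaling above this reduces to w_C*(1 - v_C(c_star)) = w_E.
c_star = 0.9                     # hypothetical matching response, in meters

ratio = (v_C(3.0) - v_C(c_star)) / (v_E(5) - v_E(1))   # = w_E / w_C
w_C = 1.0 / (1.0 + ratio)        # normalize so that w_E + w_C = 1
w_E = 1.0 - w_C
print(round(w_E, 3), round(w_C, 3))   # 0.412, 0.588 for c_star = 0.9 m
```

Repeating this computation for each of the six matching judgments is what yields the six replicate weight sets plotted in Figure 5 for each elicitation round.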
The weights derived from the matching judgments, along with the single-attribute value functions, were also used to score the alternatives in the choice tasks using value model (1) so that the degree of consistency between choice and matching tasks could be assessed (Figure 6).

Figure 5. Value weights derived from matching judgments for (a) decision maker 1 (see Figure 4 for the actual tasks) and (b) decision maker 2. Panel (a) plots (w_E, w_C); panel (b) plots (w_P, w_C). Arrows indicate the final matching judgment in the final series of elicitations for each decision maker; squares mark elicitation 1, circles elicitation 2, and triangles elicitation 3.

Figure 6. Example of 1 of 9 choice tasks for decision maker 2 indicating agreement between choice and matching tasks. Alternative A = (economic impact 5, water clarity 1.5 m, enhanced production 2.6 kg/10 m2) received a score of 0.73; alternative B = (economic impact 1, water clarity 3 m, enhanced production 2.6 kg/10 m2) received a score of 0.55. The decision maker chose A; the scores are the evaluations of A and B using value model (1) based on the elicited single-attribute value functions and the final set of value weights (indicated by the arrow in Figure 5(b)).

The elicitation and reconciliation process led to the following observation that relates to the issue of method understanding versus learning about preferences previously discussed. Matching- and choice-based assessments were essentially in agreement by the end of the final round of elicitations for both decision makers, obviating the need for separate reconciliation. This was evaluated by using the computed weights from the matching tasks to compute the overall value of A versus B to see whether the choice recommended by the assessed model matched the choice made by the decision makers. The final matching judgment produced weights (indicated by the arrows in Figure 5) that were in agreement with six of seven choices for decision maker 1 and nine of nine choices for decision maker 2. The only "disagreement" for decision maker 1 was for a choice task in which the alternatives were felt to be equally desirable (a tie), but that was scored by the matching task in favor of one of the alternatives. This process and result provide a basis for regarding these final weights as a better representation of respondent preferences than would be obtained by either type of task alone. It suggests that the well-documented tendency of choice and matching tasks to produce inconsistent preferences is absent, a result that may be because of the training and practice conducted. Specifically, we hypothesize that the training and practice steps enabled the respondents to learn the methods well enough so that, at the application step, distraction from the focus on integrating quantitative and qualitative learning about preferences was minimized. The resulting weights appear to be free of the prominence effect for these two decision makers, increasing validity: the two methods agree, and the weights are approximately equal.
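The structure of the consistency check behind Figure 6 can be summarized in a few lines of code. The weights and value functions below are illustrative stand-ins rather than the quantities elicited in the study, so the printed scores differ from the 0.73 and 0.55 shown in Figure 6; only the logic of the check is being illustrated.

```python
# Sketch of the choice/matching consistency check: score both alternatives in a
# choice task with value model (1), using weights derived from the matching
# tasks, and compare the model's recommendation with the stated choice.
# Weights and value functions here are illustrative assumptions.

weights = {"E": 0.35, "C": 0.35, "P": 0.30}            # assumed final weights

value_fns = {                                          # assumed linear forms
    "E": lambda e: (e - 1) / 4.0,                      # 1 (worst) to 5 (best)
    "C": lambda c: c / 3.0,                            # 0 m to 3 m
    "P": lambda p: p / 5.2,                            # 0 to 5.2 kg/10 m^2
}

def score(alt):
    """Overall value of an alternative under value model (1)."""
    return sum(weights[a] * value_fns[a](alt[a]) for a in weights)

A = {"E": 5, "C": 1.5, "P": 2.6}
B = {"E": 1, "C": 3.0, "P": 2.6}
stated_choice = "A"                                    # the decision maker's choice

model_choice = "A" if score(A) > score(B) else "B"
print(round(score(A), 2), round(score(B), 2), model_choice == stated_choice)
```

Running this check over every choice task is what produced the six-of-seven and nine-of-nine agreement counts reported above.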
5. Next Steps Toward Operationalization of the Payne et al. (1999) "Building Code"

In 2009, the North Carolina Coastal Federation received a $5 million federal economic stimulus grant for habitat conservation with the goal of putting private industry to work rebuilding the state's oyster reefs (North Carolina Coastal Federation 2009). The project moved oyster restoration plans ahead by several years and relieved much of the financial constraint that was making decisions to take immediate action on oyster restoration difficult. The stimulus grant-funded project's emphasis on both environment and economy (e.g., job creation) was quite consistent with the desire of the decision makers, as reflected in our results, to put emphasis on economy as well as environment.

This paper has proposed and demonstrated a procedure to elicit and reconcile multiattribute value weights, resulting in weights that appear to (a) be free of one known weighting bias and (b) elicit a high degree of user acceptance and confidence. Although we do not provide detailed evidence that would allow us to generalize from our observations, we believe that our results provide steps forward in operationalizing the "building code" of Payne et al. (1999). In particular, we recommend that any such operationalization procedure should address the following issues: First, how do we design training and practice to provide greater facility with elicitation? Second, how do we facilitate judgments that are both more bias free and elicit greater user confidence? Third, how can the training and practice stages of a given procedure lead to the reduction of method misunderstanding so that remaining variability in preferences is mainly because of uncertainty about preferences?

Acknowledgments
The authors thank two anonymous reviewers, an associate editor, and former Editor-in-Chief L. Robin Keller for comments on an earlier draft. The opinions, findings, and conclusions expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation. This research was supported by the NSF [SES-0922154].

References
Anderson RM, Hobbs BF (2002) Using a Bayesian approach to quantify scale compatibility bias. Management Sci. 48(12):1555–1568.
Barron FH, Barrett BE (1996) Decision quality using ranked attribute weights. Management Sci. 42(11):1515–1523.
Bell ML, Hobbs BF, Elliott EM, Ellis JH, Robinson Z (2001) An evaluation of multi-criteria methods in integrated assessment of climate policy. J. Multi-Criteria Decision Anal. 10(5):229–256.
Borcherding K, von Winterfeldt D (1988) The effect of varying value trees on multiattribute evaluations. Acta Psychologica 68:153–170.
Borcherding K, Eppel T, von Winterfeldt D (1991) Comparison of weighting judgments in multiattribute utility measurement. Management Sci. 37(12):1603–1619.
Brown R (2005) Rational Choice and Judgment: Decision Analysis for the Decider (Wiley, Hoboken, NJ).
Carlson KA, Meloy MG, Russo JE (2006) Leader-driven primacy: Using attribute order to affect consumer choice. J. Consumer Res. 32(4):513–518.
Cronbach L, Meehl P (1955) Construct validity in psychological tests. Psych. Bull. 52(4):281–302.
Delquié P (1993) Inconsistent trade-offs between attributes: New evidence in preference assessment biases. Management Sci. 39(11):1382–1395.
Delquié P (1997) "Bi-matching": A new preference assessment method to reduce compatibility effects. Management Sci. 43(5):640–658.
Edwards W (1990) Unfinished tasks: A research agenda for behavioral decision theory. Hogarth RM, ed. Insights in Decision Making: A Tribute to Hillel J. Einhorn (University of Chicago Press, Chicago), 44–65.
Edwards W, Barron FH (1994) SMARTS and SMARTER: Improved simple methods for multiattribute utility measurement. Organ. Behav. Human Decision Processes 60(3):306–325.
Eppel T (1990) Eliciting and reconciling multiattribute weights. Unpublished doctoral dissertation, University of Southern California, Los Angeles.
Finucane ML, Peters E, Slovic P (2003) Judgment and decision making: The dance of affect and reason. Schneider SL, Shanteau J, eds. Emerging Perspectives on Judgment and Decision Research (Cambridge University Press, Cambridge, UK), 327–364.
Fischer GW (1995) Range sensitivity of attribute weights in multiattribute value models. Organ. Behav. Human Decision Processes 62(3):252–266.
Fischer GW, Carmon Z, Ariely D, Zauberman G (1999) Goal-based construction of preferences: Task goals and the prominence effect. Management Sci. 45(8):1057–1075.
Fischhoff B, Slovic P, Lichtenstein S (1980) Knowing what you want: Measuring labile values. Wallsten T, ed. Cognitive Processes in Choice and Decision Behavior (Lawrence Erlbaum Associates, Hillsdale, NJ), 117–141.
Fishburn PC (1967) Methods of estimating additive utilities. Management Sci. 13(7):435–453.
Gilovich T, Griffin D, Kahneman D, eds. (2002) Heuristics and Biases: The Psychology of Intuitive Judgment (Cambridge University Press, New York).
Gregory R (1999) Commentary on "Measuring constructed preferences: Towards a building code" by Payne, Bettman, and Schkade. J. Risk Uncertainty 19(1–3):273–275.
Gregory R (2004) Valuing risk management choices. McDaniels T, Small M, eds. Risk Analysis and Society: An Interdisciplinary Characterization of the Field (Cambridge University Press, New York), 213–250.
Gregory R, Fischhoff B, McDaniels T (2005) Acceptable input: Using decision analysis to guide public policy deliberations. Decision Anal. 2(1):4–16.
Hamalainen RP, Alaja S (2008) The threat of weighting biases in environmental decision analysis. Ecological Econom. 68(1–2):556–569.
Hobbs BF, Horn GTF (1997) Building public confidence in energy planning: A multimethod MCDM approach to demand-side planning at BC Gas. Energy Policy 25(3):357–375.
Huber J, Ariely D, Fischer G (2002) Expressing preferences in a principal-agent task: A comparison of choice, rating, and matching. Organ. Behav. Human Decision Processes 87(1):66–90.
Huber J, Payne JW, Puto CP (1982) Adding asymmetrically dominated alternatives: Violations of regularity and the similarity hypothesis. J. Consumer Res. 9(1):90–98.
Huber J, Wittink DR, Fiedler JA, Miller R (1993) The effectiveness of alternative preference elicitation procedures in predicting choice. J. Marketing Res. 30(1):105–114.
Jacobi SK, Hobbs BF (2007) Quantifying and mitigating the splitting bias and other value tree-induced weighting biases. Decision Anal. 4(4):194–210.
Jia J, Fischer GW, Dyer J (1998) Attribute weighting methods and decision quality in the presence of response error: A simulation study. J. Behav. Decision Making 11(2):85–105.
John RS (1984) Value tree analysis of social conflicts about risky technologies. Unpublished doctoral dissertation, University of Southern California, Los Angeles.
Kahneman D (2011) Thinking, Fast and Slow (Farrar, Straus and Giroux, New York).
Kahneman D, Frederick S (2002) Representativeness revisited: Attribute substitution in intuitive judgment. Gilovich T, Griffin D, Kahneman D, eds. Heuristics and Biases: The Psychology of Intuitive Judgment (Cambridge University Press, New York), 49–81.
Keeney RL (1992) Value-Focused Thinking (Harvard University Press, Boston).
Keeney RL (2002) Common mistakes in making value trade-offs. Oper. Res. 50(6):935–945.
Keeney RL, Gregory RS (2005) Selecting attributes to measure the achievement of objectives. Oper. Res. 53(1):1–11.
Keeney RL, Raiffa H (1976) Decisions with Multiple Objectives: Preferences and Value Tradeoffs (John Wiley and Sons, New York).
Keeney RL, Renn O, von Winterfeldt D (1987) Structuring West Germany's energy objectives. Energy Policy 15(4):352–362.
Kleinmuntz DN (1990) Decomposition and the control of error in decision analytic models. Hogarth RM, ed. Insights in Decision Making: A Tribute to Hillel J. Einhorn (University of Chicago Press, Chicago), 107–126.
Lichtenstein S, Slovic P, eds. (2006) The Construction of Preference (Cambridge University Press, New York).
North Carolina Coastal Federation (2009) North Carolina estuary habitat restoration: Coastal restoration at work. Accessed December 4, 2012, http://www.nccoast.org/Northeast/pdfs/NOAA-stimulus-Fact-Sheet.pdf.
North Carolina Division of Marine Fisheries (2007) North Carolina oyster fishery management plan amendment II. Division of Marine Fisheries, North Carolina Department of Environment and Natural Resources, Morehead City, NC.
Payne JW, Bettman JR, Johnson EJ (1993) The Adaptive Decision Maker (Cambridge University Press, Cambridge, UK).
Payne JW, Bettman JR, Schkade DA (1999) Measuring constructed preferences: Towards a building code. J. Risk Uncertainty 19(1–3):243–270.
Peterson CH, Grabowski JH, Powers SP (2003) Estimated enhancement of fish production resulting from restoring oyster reef habitat: Quantitative valuation. Marine Ecology Progress Ser. 264:249–264.
Schoemaker PJH, Waid CC (1982) An experimental comparison of different approaches to determining weights in additive utility models. Management Sci. 28(2):182–196.
Slovic P, Griffin D, Tversky A (1990) Compatibility effects in judgment and choice. Hogarth RM, ed. Insights in Decision Making: A Tribute to Hillel J. Einhorn (University of Chicago Press, Chicago), 5–27.
Tversky A, Sattath S, Slovic P (1988) Contingent weighting in judgment and choice. Psych. Rev. 95(3):371–384.
von Winterfeldt D, Edwards W (1986) Decision Analysis and Behavioral Research (Cambridge University Press, New York).
Weber M, Eisenführ F, von Winterfeldt D (1988) The effects of splitting attributes on weights in multiattribute utility measurement. Management Sci. 34(4):431–445.

Richard M. Anderson is a research scientist at the Puget Sound Institute at the University of Washington Tacoma, where he uses decision analysis to help guide protection and restoration of Puget Sound in Washington. He has broad interests in engineering and decision analysis, having previously worked as a risk analyst at Pacific Northwest National Laboratory and served as an assistant professor in the Nicholas School of the Environment at Duke University. His special areas of application include biases in preference assessment and Bayesian network models in natural resource management. He holds a Ph.D. in environmental systems analysis from the Department of Geography and Environmental Engineering at Johns Hopkins University.

Robert Clemen is professor emeritus of decision sciences at Duke University's Fuqua School of Business. He has broad interests in the use of decision analysis for organizational decision making, with special interests in the psychology of judgment, assessing expert probabilities, the effectiveness of decision-making techniques, and using decision analysis to help organizations become environmentally sustainable. The author of the widely used text Making Hard Decisions, he is also a founding co-Editor-in-Chief of Decision Analysis. He has published more than 40 scholarly articles in journals such as Management Science, Operations Research, International Journal of Forecasting, and Decision Analysis.