Context Effects in Judicial Decision Making
Jeffrey J. Rachlinski
Chris Guthrie
Andrew J. Wistrich
DRAFT 01/06/11
Abstract
In this paper, we present five sets of studies that demonstrate both how lawyers
can use psychological influences to induce judges to make erroneous judgments and how
judges respond to these efforts. In the first, we show lawyers can take advantage of
contrast effects to influence how judges evaluate the credibility of their expert witnesses.
Surprisingly, the study shows that adding a worthless expert witness who lacks credibility
can help one's case. In the second study, we found that judges' assessments of the
probative value of forensic evidence depend upon how the information is presented. But
we also found that judges resist this attempt at misdirection. Similarly, in the
third set of studies, we show that judges are vulnerable to the misleading effects of
conjunctive logic, but that they also seem able to resist this effect in settings that are more
relevant to the kinds of judgments they make. In the fourth set of studies, we identify
how judges avoid the influence of the hindsight bias on judgments of probable cause by
altering the target of the judgment that they make. In this study, the natural setting, rather
than crafty lawyering, provides the potential source of misdirection, but judges seem to
navigate it well. In the final set of studies we show how judges attempt to ignore
inadmissible evidence in ways that make them vulnerable to further undesirable extralegal influences. Taken together, these studies show that misdirection is a two-way street
in the courtroom. Lawyers can try to misdirect judges to induce erroneous judgments,
but judges might well respond by changing their own focus to more stable targets that
make them less vulnerable to untoward influences. These efforts, however, can also
backfire, as judges can make themselves more vulnerable to error.
Introduction
An old expression advises lawyers that "if the facts are against you, argue the law;
if the law is against you, argue the facts."1 The aphorism states the obvious, perhaps.
But it also identifies a particular talent that quality lawyers are supposed to possess--the
ability to change the terms of the debate so as to win a case that cannot otherwise be won.
We suspect that judges hate this expression, as it reifies lawyers' conflicting efforts to
control the agenda in lawsuits and potentially distract the judge from the core issues of a
case. Judges prefer to control the agenda in their own courtroom, and naturally dislike
dilatory, distracting litigation strategies. They also regard such efforts as wasteful in that
they believe that they will see through these efforts. From the judges' perspective,
changing the subject will only waste time and delay the inevitable. But we wonder
whether lawyers' efforts to distract and to control the way a case is framed are really
such a waste of time. Perhaps the real problem is not that these tactics are wasteful, but
that they are effective.
The opening aphorism about facts and law dovetails with a fundamental lesson of
social psychology concerning how to influence people's judgment. That is, social
psychologists report that although changing people's judgment of objects is challenging,
people's choices can be influenced easily by simply shifting the object of judgment. The
often misunderstood study on conformity by Solomon Asch illustrates the point. Asch
gave a room full of eight undergraduates a seemingly innocuous task of identifying which
of three lines of very different length was closest in length to a series of target lines. The
target lines were always identical in length to one of the three options, making the task
seem incredibly easy. At least the task looked easy until six of the supposed participants
apparently chose the wrong line. In reality, participant number seven was the only real
subject in the experiment, with the others being confederates of the experimenter who
were instructed to choose the wrong line in several of the rounds. In many instances, the
real subject went along with the group's choice, even though it seemed erroneous.
Although this study is often described as an experiment on conformity, in reality, the
subjects complied with the group choice only because they felt that they had
misunderstood either the instructions or some other aspect of the task. They were never
confused about the lines, which were too different in length to inspire confusion. In
effect, the confident, seemingly erroneous answers provided by the six confederates
changed the target of judgment. The subjects were not assessing the length of lines, but
were trying to understand the motives of their supposed colleagues.
The notion of misdirection and controlling the object of judgment has been a
fundamental tenet of advertising almost since there has been advertising. Clothing lines
are selling images, not clothing. Decades of cigarette ads sold youth and cool, along with
nicotine. Beer commercials that feature attractive young people are quite obviously
selling sex, not beer. If advertisers try to misdirect the public, surely litigators try to
misdirect judges and juries. And if the public is susceptible to these manipulations,
maybe judges are as well.
1
Variations sometimes add, "if both the law and facts are against you, pound the table."
In this paper, we present five sets of studies that provide examples of why
litigators are apt to attempt the same kind of tricks as Solomon Asch and Madison Avenue
in the courtroom. In the first, we demonstrate that a simple contrast effect can influence
judges by showing that adding a worthless expert witness who lacks credibility can boost
one's case. In the second, we show that judges' assessments of the probative value of
forensic evidence depend upon how the information is presented. But at the same time,
we show that judges resist this attempt at misdirection. Similarly, in the third
set of studies, we show that judges are vulnerable to the misleading effects of
conjunctive logic, but that they also seem able to resist this effect in settings that are more
relevant to the kinds of judgments they make. In the fourth set of studies, we identify
how judges avoid the influence of the hindsight bias on judgments of probable cause by
altering the target of the judgment that they make. In the final set of studies we show
how judges attempt to ignore inadmissible evidence in ways that make them vulnerable
to further undesirable extra-legal influences.
Taken together, these studies show that misdirection is a two-way street in the
courtroom. Lawyers can try to misdirect judges to induce erroneous judgments, but
judges respond by changing their own focus to more stable targets that make them less
vulnerable to untoward influences. These efforts, however, can also backfire, as judges
can make themselves more vulnerable to error.
I. Contrast Effects: Or How to Improve Your Case by Hiring a Lousy Expert
Contrast effects are one of the most pernicious distractions found in the
psychological literature on judgment and choice. This literature demonstrates that the
addition of undesirable items to a choice set can produce wildly inconsistent choices. As
one paper on the subject put it, "someone who prefers chicken over pasta should not
change this preference upon learning that fish is available" (Kelman et al, 1996)--and yet
preferences seem to be that fickle. In a series of studies, Amos Tversky and his students
had subjects choose between consumer products that varied along multiple dimensions
and found that the addition of an inferior product altered the subjects' stated preferences
for these products. (Simonson & Tversky, 1992 and others). For example, in one study,
subjects choosing between accepting (as a gift) either a nice Cross pen or $6 were more
likely to choose the pen when a third option of a vastly inferior pen was added to the
choice set.
This phenomenon has been documented in a wide variety of settings, including
legal settings. Settlement offers, in particular, are amenable to context effects. In
deciding whether to accept a settlement, a litigant is choosing between further litigation
and accepting a certain settlement. Consequently, giving a litigant a choice of two
settlements, one of which is clearly inferior, makes the more attractive settlement more
compelling, relative to litigation (just as the addition of the inferior pen makes the good
pen seem more attractive than the money). (Guthrie, 2001?). Adding inferior settlement
offers can similarly affect the choice between types of settlements. (Kelman et al. 1996).
Similar effects can be found in choices among criminal penalties as well. (Kelman et al.
1996).
These kinds of contrast effects would seem to defy logic. At the very least, they
are not consistent with a widely held assumption that preferences are invariant. A person
who prefers A to B, A to C, and B to C should not choose B when offered A, B, and C together. The addition of
an inferior contrast does not so much change people's judgment of the objects as
alter the object of their judgment. People might not truly know whether they prefer a
Cross pen to $6, but they can be sure that they prefer a Cross pen to a cheap, used Bic.
The contrast between the Cross and the Bic gives them a dimension on which they can
evaluate a choice with some measure of certainty. (Hsee, 1998). They can support and
defend the choice of the Cross over the Bic in a way that they cannot support the choice
of the Cross over the money.
So powerful is the need to support and evaluate one's choice that people can be
induced to favor accepting less of a desirable commodity. (Hsee, 1998). In one study,
for example, subjects indicated that they would be willing to pay an average of $1.66 for
eight ounces of ice cream presented in a ten ounce cup (making the cup seem unfilled),
even though a similar group of subjects indicated that they would pay an average of $2.26
for seven ounces of ice cream, in a five ounce cup (making the cup seem overfilled). The
object of judgment is not the absolute value of the ice cream, but whether the subjects
think they are getting good value for their money.
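The arithmetic behind the ice cream example can be sketched as follows. This is only an illustration using the figures reported above; the function name `summarize` and the two derived measures (fill ratio and price per ounce) are our own labels for the illustration, not quantities reported in the original study.

```python
# Figures as reported in the text:
# (ounces served, cup capacity in ounces, mean willingness to pay in dollars)
underfilled = (8, 10, 1.66)
overfilled = (7, 5, 2.26)


def summarize(serving_oz, cup_oz, wtp_dollars):
    """Return how full the cup looks and the implied price per ounce."""
    return {
        "fill_ratio": serving_oz / cup_oz,          # above 1.0 means the cup looks overfilled
        "price_per_oz": wtp_dollars / serving_oz,   # dollars offered per ounce actually served
    }


under = summarize(*underfilled)
over = summarize(*overfilled)

print(f"underfilled: fill {under['fill_ratio']:.2f}, ${under['price_per_oz']:.3f} per ounce")
print(f"overfilled:  fill {over['fill_ratio']:.2f}, ${over['price_per_oz']:.3f} per ounce")
```

On these figures, subjects offered more per ounce for the smaller serving, which is the pattern one would expect if they were judging how full the cup looked rather than how much ice cream it held.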
All of these studies provide examples of the basic social psychological point that
changing the object of judgment can manipulate people's choices. When people are faced
with a choice between the Cross pen and the money, they are evaluating the Cross pen in
terms of its cash value. When the cheap Bic becomes available, however, they are
judging the Cross in terms of its value as a pen as compared to pens--and it looks good on
that dimension. The nature of the inquiry has changed. Similarly, putting ice cream in a
cup that is too large makes it look like you are being cheated, while putting it in a cup
that is too small makes it look like you are getting a bargain. This kind of misdirection is
much like the driver who looks for his keys under a lamppost across the street from
where he actually lost them because the light is better. No one knows what a Cross pen is
worth to them, but they know it is worth more than a Bic, so they choose it more
frequently. Despite the illogic of these choices, "evaluability" and contrast effects
are potent phenomena.
But can contrast effects influence judges? To assess this, we had a group of trial
judges evaluate a hypothetical legal question designed to elicit contrast effects. The
judges were in attendance at a state-wide annual judicial education conference for Florida
Circuit Court Judges in June of 2006. The judges participated in this research as part of
their annual conference and were part of a plenary session labeled only "Judicial Decision
Making". During this session, the judges completed questionnaires that included a
number of hypothetical scenarios.2
2
This sample is described in our other papers.
We designed one of these to test the contrast effect in evaluation of expert
witnesses. The scenario described a child-custody dispute, with the following text:
"Imagine that you are presiding over a child custody dispute in which the
husband and wife are at odds over the custody of their 11-year-old son, Jeremy.
The husband and wife are both competent parents, but their relationship with each
other is profoundly strained. They have rejected a joint custody relationship and
are each seeking sole custody of Jeremy (though the other parent would retain
visitation rights). Both the husband and the wife have retained experts to testify
as to the custodial arrangement that would serve Jeremy’s “best interests.” Based
solely on the information provided below, which of the following experts would
you deem to be most credible (please select one only):"
The materials then described the expert witnesses. The judges were randomly
assigned to one of two conditions: a control condition in which they chose between
two different experts--one working on behalf of each party--who were designed to be of
roughly comparable quality and a contrast condition, which provided the same two
experts, but also added a third expert for the husband. The third expert had vastly inferior
qualifications. In effect, the third expert is the equivalent of the Bic pen in this variation
on contrast effects.
The materials described the two comparable experts as follows:
Wife’s Expert – Dr. Henry is a licensed psychologist with a B.A. in Psychology
from Stanford and a Ph.D. in Clinical Psychology from the University of
Michigan. Dr. Henry has practiced as a clinical psychologist for 20 years in the
District of Columbia, working primarily with children and families. Dr. Henry
has testified as an expert in 15 child custody cases, seven times for the wife and
eight times for the husband. In this case, Dr. Henry will testify that the wife
should get custody.
Husband’s Expert – Dr. Williams is a licensed psychiatrist with a Bachelor’s
Degree in Biology and an M.D. from Emory University. Following medical
school, Dr. Williams completed a psychiatric residency and has since practiced
psychiatry for 10 years in the Miami area, working primarily with children and
families. Dr. Williams has testified as an expert in ten child custody cases, four
times for the husband and six times for the wife. In this case, Dr. Williams will
testify that the husband should get custody.
The third expert, also identified as the husband's expert, was described as follows:
Husband’s Expert – Dr. Hancock is a psychiatrist with a B.A. in Psychology
from the University of Mississippi and an MD from St. George’s University
School of Medicine in Grenada. Dr. Hancock has never been admitted to practice
medicine in the United States. Dr. Hancock has, however, testified as an expert in
37 prior child custody cases, each time for the husband. In this case, Dr. Hancock
will testify that the husband should get custody.
Of the 144 judges who reviewed this problem,3 six did not respond (four who saw
the 2-option version and two who saw the 3-option version). None of the judges in the 3-option condition
chose the third expert.4 Even though no judges identified the weaker expert as the most
credible, the addition of this expert made the husband's better expert seem more credible.
In the control (2-option) condition, 54.3% (38 out of 70) of the judges chose the husband's
witness as the more credible. In the contrast condition, this rose to 72.1% (49 out of 68).
This difference was statistically significant.5
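For readers who wish to check the comparison, the following is a minimal sketch of a two-sided Fisher's exact test applied to the counts reported above. It assumes the conventional two-sided definition (summing the probabilities of all tables with the same margins that are no more likely than the observed one); the function name `fisher_exact_two_sided` is our own, and the sketch is not necessarily the procedure or software used for the reported test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    row and column totals that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, with margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Counts from the text: (chose the husband's stronger expert, chose the wife's expert).
control = (38, 70 - 38)    # 2-option condition
contrast = (49, 68 - 49)   # 3-option condition

share_control = control[0] / sum(control)
share_contrast = contrast[0] / sum(contrast)
p_value = fisher_exact_two_sided(*control, *contrast)

print(f"control:  {share_control:.1%}")
print(f"contrast: {share_contrast:.1%}")
print(f"two-sided Fisher p = {p_value:.3f}")
```

The sketch uses only the standard library so that it can be run without a statistics package; a library routine for Fisher's exact test would serve equally well.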
Judges, it seems, are no different than ordinary consumers in that they are
vulnerable to contrast effects. It is hard to evaluate the reliability of an expert witness
(particularly since we did not provide the testimony, or any detail other than their
qualifications). But it was easy enough for the judges in our study to tell that one
psychiatrist was better than the other. In effect, the addition of this weak expert changed
the object of judgment; the judges were rightly seeing the good psychiatrist as more
qualified than the weaker psychiatrist, which distracted them from the more critical
inquiry as to whether the psychiatrist was more qualified than a psychologist.
This result suggests an insidious litigation strategy--put forth a weak expert to
make your good expert look better. That can, and should, seem ridiculous, but contrast
effects are powerful in real-world settings, just as they are in the lab. Real estate agents
reportedly show clients houses that are wildly inferior on some dimension that is
important to homebuyers as a way of getting them to see the real target that the agent is
trying to sell as a good buy. CITE. Firms often offer extremely high-end (and high
priced) versions of their product without any real hope of selling many of that version,
but only to make the price of the version they really hope to sell seem more attractive.
(Tversky & Simonson, XXX). If such distractions can affect homebuyers and
consumers, who are making serious decisions, then judges might be just as vulnerable.
3
This was roughly half of the judges in attendance at the conference. We varied our materials slightly so
that half of the judges read this problem and the other half read an unrelated scenario.
4
We originally presented this problem to a group of judges at an educational conference in another
jurisdiction. In the original version, we identified the wife's expert as the psychiatrist and the husband's experts as
psychologists, one of whom had vastly inferior qualifications to the other. In that version, we encountered
a kind of ceiling effect, in that in the control condition, 84% of the judges (26 out of 31) chose the
husband's expert as more credible; that is, there was not much room for more support for this expert in the
contrast condition. The addition of the contrasting inferior expert psychologist increased this to 97% (28
out of 29), but this trend was not significant. (Fisher's exact test, p = .20.) Many of the judges informed us
that they believed psychologists were better witnesses in such cases, and hence we re-wrote the
qualifications and education of the experts so as to make the choice in the control condition a closer call.
5
Fisher’s exact test, p =.035. If we combine the results from the similar, original version of the problem
and this version, the combination also produces a significant contrast effect, with 63% (64 out of 101)
choosing the husband's expert in the control condition as compared to 79% (77 out of 97) in the contrast
condition. Fisher's exact test, p = .01.
Contrast effects might also explain why lawyers often include weak arguments in
their briefs to accompany arguments that stand a real chance of success. The weak
arguments might make the strong ones seem stronger by contrast. Obviously, including a
weak argument might undermine the credibility of the lawyer as well. And it would seem
surprising if judges cannot make more stable judgments of the quality of the arguments
attorneys present. But contrast effects are surprisingly strong and seem to affect
consumers in situations in which they have quite a bit of experience (such as evaluating
the monetary value of ice cream). Our study demonstrates that contrast effects can
influence judges in at least one relevant legal setting, and perhaps the effect is as potent
and general in judges as it is in consumers.
Similarly, lawyers might take advantage of the related phenomenon known as
compromise effects. That is, asking for a more extreme result than one expects to obtain
can move judgments in the direction of that extreme result. In one study demonstrating
this effect, people's judgment as to whether a homicide was an instance of second or first
degree murder was affected by whether they also had available a more extreme choice,
consisting of first degree murder with special circumstances (which could have drawn the
death penalty). (Kelman et al., 1996; Guthrie, Iowa?). Although few subjects thought
that murder with special circumstances was appropriate, the availability of this option
increased the percentage of subjects who determined that the homicide constituted
first-degree murder. The addition of a more extreme option shifts the scale in that
direction, so that the "compromise" of choosing in the middle moves along with that
extreme judgment. While we did not test this variation on the contrast effect in judges, it
is similar to contrast effects.
Even though context effects seem to be a widespread phenomenon that permeates
marketing and the consumer environment, and our study demonstrates its potential
influence on judges, we have conducted another study in which we were unable to elicit
contrast effects in judges. In this study, we presented judges in attendance at an
educational conference sponsored by the Federal Judicial Center for United States
Magistrate Judges with the task of evaluating two settlement offers. The case consisted
of a civil rights claim by an African-American high school honors student who had been
shot in the back by a security guard at a public university where he was taking classes.6
6
The facts of the scenario, labeled "Settlement Problem" consisted of the following:
“You are presiding over a settlement conference in a lawsuit filed by a minor, Henry Johnson,
against Ted Samuelson, a campus police officer employed by the State University. The suit includes a
claim under 42 U.S.C. § 1983 and state law tort claims. The University will indemnify Samuelson and has
assumed the defense of this action.
"Johnson is a 16-year-old African-American from a poor neighborhood. He lives with his four
younger siblings and his mother, who works as a hotel maid. He has not seen his father in many years. In
January of Johnson’s junior year in high school, he began taking some classes at the local University,
through a special program offered to honor students.
"While returning from the University library late one evening, Johnson was stopped by Officer
Samuelson. Samuelson began questioning Johnson aggressively in connection with an armed robbery.
Nervous and frightened, Johnson ran off. Samuelson shouted at him to stop. When Johnson kept running,
Samuelson shot Johnson in the back. The bullet damaged Johnson’s spinal cord leaving him permanently
unable to walk. The incident has left Johnson bitter and angry. He is nonetheless determined to complete
his classes at the University (which is the best public college in the state) and perhaps enroll as a full-time
student after high school.
"Johnson’s mother is his guardian ad litem. Between her job, managing Johnson’s injuries, and
caring for her other four children, however, she is overwhelmed. She is counting largely on others to do
the best for her son, and will approve any decision her son and his attorney make. Johnson’s lawyer seems
competent, but is inexperienced. Neither side has actively pursued discovery and both seem interested in
settling. Criminal charges against Samuelson for the shooting were dropped after a brief investigation."
Because the plaintiff was a minor, the judge had to approve the settlement. The
university, which had assumed the officer's defense, offered either a $3 million settlement
or a $1.5 million settlement together with the firing of the security guard. The materials indicated that the
student wanted to accept the lesser sum so as to ensure that the guard would get fired.
For half of the judges, we created the contrast effect by indicating that the University had
initially only been willing to offer $1.5 million and a suspension of the guard.7 The
materials ultimately asked the judges whether they would allow the plaintiff to accept the
lesser sum.8
The 42 judges at the conference all answered the question, but they did not
exhibit a contrast effect.9 In the control condition, 48% (10 out of 21) indicated that they
would allow the plaintiff to accept the lesser settlement, as compared to 38% (8 out of
21) in the contrast condition. Thus, contrary to our prediction, the addition of the earlier settlement offer,
which contrasted unfavorably with one of the options, made that option slightly less
attractive. The difference in acceptance rates of this option was not significant.10
While we think it important to report this apparent disconfirmation of the effect,
we do not think that it undermines the basic conclusion that judges are vulnerable to
contrast effects. In the study, we were attempting to replicate some of the studies of
contrast effects in settlement as conducted by Kelman and his coauthors (1996) and by
Guthrie (2002). In these studies, the addition of an inferior choice tended to boost
acceptance of the choice that it most closely resembled. Unlike the previous studies,
however, we did not offer the judges a third option. Instead, we had thought that the
judges might consider the previous settlement offer as a contrast. Even though we feel
our version represented a more realistic way that settlement offers might be structured in
such a case, it is possible that the contrast effect depends upon that previous offer being
available as a choice, just as it is in the studies of consumer choice.
7
This was described as follows: "In a previous settlement conference, Johnson insisted that Officer
Samuelson be disciplined and the University initially refused. After protracted discussions, the University
reluctantly offered to pay Johnson $1.5 million and to suspend Samuelson for three months without pay.
Johnson rejected that offer, and you adjourned the conference to give the parties a chance to reconsider
their positions."
8
This was stated as follows:
“The [second] settlement conference also proved to be contentious. The University attorneys
wanted to discuss a cash offer (which would go into a trust because Johnson is a minor), but Johnson
seemed interested only in disciplinary action against Officer Samuelson. After a lengthy bargaining
session, University attorneys gave Johnson two options, and announced that unless he accepted one, it
would withdraw both and actively litigate the claim. The options are:
A) The University fires Officer Samuelson and pays Johnson $1.5 million
B) The University takes no disciplinary action against Officer Samuelson and pays Johnson $3 million
You privately discussed the University’s offer with Johnson and his attorney. Johnson wants to
accept option A. He is furious that Officer Samuelson has not been disciplined and is worried about
studying on the same campus that Samuelson patrols. Johnson also needs cash to pay his medical bills,
however, which have become a concern to his family. Because Johnson is a minor, you must also approve
the settlement. Johnson has told you that he will reluctantly accept option B if you would not approve
option A.
If Johnson accepts settlement option A, will you approve it as a settlement?
___ Yes, I would approve option A as a settlement
___ No, I would not approve option A as a settlement
9
The 42 judges included 41 U.S. Magistrate Judges and one Federal District Judge.
10
Fisher's exact test, p = .75.
It is also possible that the role in which we cast judges in this study
undermined the effect. Judges were not selecting their preferred choice, but were
deciding whether to approve or reject the decision of a litigant. Although we believe that
the contrast effect should have improved the desirability of the option that the litigant
chose, it could be that the judges are focused on a different object of judgment--that of
the maturity of the litigant. The contrast effect was thus somewhat indirect, at best, in
this case, which might have diluted the phenomenon.
II. Imaging the Numerator: Or How to Lie To Judges With Statistics
In other work, we have argued that judges, like most adults, use two distinct
cognitive systems to make judgments: an intuitive system founded largely on affective
processes ("System 1") and a deliberative system founded largely on deductive processes
("System 2") (Guthrie et al., 2007). The intuitive system is surprisingly accurate
(Gladwell, 2004), but can lead to predictable errors in judgment. Because the intuitive
system is faster than the deductive system, good judgment requires the ability to suppress
the intuitive response and substitute a response based on deduction. We have found
that judges sometimes suppress misleading intuitive responses, but they do not do so
consistently. (Guthrie et al., 2007). This gap in judicial ability suggests a vulnerability
that savvy lawyers can exploit. A lawyer who can change the object of judgment to one
that triggers a favorable affective judgment potentially gains an advantage in trying to
persuade a judge.
Presentation of actuarial or statistical information presents a prominent example
of how manipulating the object of judgment can influence judgment by triggering a
misleading intuitive response. For example, it is widely thought that actuarial
information can often seem drab and unpersuasive relative to anecdotes. (Borgida,
1978?). Even Josef Stalin is reported to have articulated this tendency in the quote
(widely attributed to him) that "the death of a single life is a tragedy, but the death of
millions is a mere statistic." (cite). Anecdotes and stories play on the intuitive system,
triggering rapid responses. Statistics require the slower, emotionally disconnected
deliberative system to process. Individual examples and identifiable victims trigger
quick, emotional reactions that people do not easily override with judgments based on
statistical information. (Loewenstein et al, cite).
The ease with which exemplars can trigger people to use their emotional system
to make judgments that are inconsistent with a careful, statistical analysis is illustrated
neatly with the so-called "jellybean study" by Seymour Epstein. (Epstein, 1998?). In this
study, Epstein told subjects that they would receive a prize for drawing a red jellybean
from one of two jars filled with red and white jellybeans. One jar contained 1 red and 9
white jellybeans, while the other contained 10 red and 90 white jellybeans. Even though
the probability of drawing a red jellybean from both jars was identical, subjects preferred
to draw from the jar with 10 red jellybeans. Epstein termed the phenomenon, "imaging
the numerator", arguing that the subjects' intuitive systems reacted to "more chances to
win" in the second urn, thereby ignoring the fact that this urn also provided a proportional
number of chances to lose. Many people did not override the intuitive focus on the larger
set of red jellybeans in the second urn. So powerful is the phenomenon of imaging the
numerator that many subjects preferred the second urn even when it contained as few as 7
red jellybeans (while still containing 90 white jellybeans).
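The probabilities involved can be verified with a short calculation. The following is a minimal sketch using exact fractions and the jar contents described above; the helper name `p_red` is our own.

```python
from fractions import Fraction


def p_red(red, white):
    """Exact probability of drawing a red jellybean from a jar in a single draw."""
    return Fraction(red, red + white)


small_jar = p_red(1, 9)      # 1 red, 9 white
large_jar = p_red(10, 90)    # 10 red, 90 white
skewed_jar = p_red(7, 90)    # 7 red, 90 white

print("small jar: ", small_jar)                  # 1/10
print("large jar: ", large_jar)                  # 1/10 (reduces to the same fraction)
print("skewed jar:", skewed_jar, "or about", round(float(skewed_jar), 3))
print("skewed jar is the worse bet:", skewed_jar < small_jar)
```

The first two jars offer identical odds, while the jar with 7 red and 90 white jellybeans offers a chance of 7 in 97, which is lower than 1 in 10 even though it contains more red jellybeans in absolute terms.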
Work by John Monahan and Eric Silver (2003) suggests that the phenomenon of
imaging the numerator also affects judges. These researchers gave judges hypothetical
scenarios involving a question of whether to commit an individual suffering from mental
illness involuntarily to an institution. They described the individual's condition and
diagnosis and had judges identify how great the risk of violence would have to become
before they would commit the individual. The judges chose from one of five categories
of risk: 1%, 8%, 26%, 56%, or 76%. When the researchers presented the risk of violence
in a probability format (that is, as a percentage chance, as in the previous sentence), the
modal threshold for commitment was 26%. When the researchers presented the risks in a
frequency format (that is 1 in 100, 8 in 100, 26 in 100, 56 in 100 or 76 in 100), the modal
threshold dropped to 8 in 100. Presenting the risks of violence in a "frequentist" format
made it easier for judges to think of the person as violent. The authors liken this effect to
imaging the numerator, arguing the judges focused on instances of violence described in
the frequentist format. In contrast, the percentages feel abstract.
The trend for judges to be more willing to commit mentally disturbed individuals
when the risks of violence were expressed in frequency formats was not significant in
Monahan and Silver's study.11 But the result is similar to results found with forensic
experts in a series of other studies. (Slovic & Monahan, 1998, 2000). In one of these
studies, for example, clinical psychologists were asked to indicate whether they
would be willing to recommend committing a similar individual. Among the experts who
were told that "8%" of people with this individual's condition commit a violent act, 39%
stated that they would recommend committing him. Among the experts who learned that
"8 out of 100" people with this individual's condition commit a violent act, 61% stated
that they would commit him.
11
Although the authors do not report test statistics, they report the raw data on page 4, which enabled us to
conduct an ordered logistic regression, which confirmed that the trend was not significant (p = .38). The
authors had only 26 judges available for the study, a sample that was unlikely to detect the effect that researchers
have found for probability format in other contexts. QUICK POWER ANALYSIS HERE--effect size from
Slovic & Monahan (1998, 2000).
The lesson for an attorney in this context is straightforward. A lawyer who wants
a judge to commit a patient involuntarily should present relevant statistics both to the
judge and courtroom expert in a frequentist format.
In similar work, Koehler has demonstrated how frequentist format can affect
ordinary adults acting as jurors in a criminal case. (Koehler, 2001). Koehler presented
jurors with a one-page description of a criminal case in which a prosecutor presented
probabilistic testimony of a forensic technique known as PCR matching (which is a
simple form of a DNA matching test).12 The materials indicated that blood from the
crime scene matched that of the defendant using this PCR technique. Koehler also
presented the probability that an innocent person randomly drawn from the
community at large would also match the blood sample. He presented this probability either as
0.1% (subjective format) or as 1 in 1,000 (frequency format).13 The frequentist format
made it easier to imagine that a large number of individuals would match the blood type.
In contrast, the subjective format makes it easier to commit the "inverse fallacy"
(Thompson, 19??); that is, people might quickly mistake the 0.1% for the probability that
the defendant is innocent. Hence, while the deductive system will easily see these two
formats as identical, the intuitive system will see the frequentist presentation as pointing
towards innocence and the subjective format as pointing towards guilt. Koehler found
12
The text of Koehler's full scenario was as follows:
"Imagine that you are presiding over People v. Nethers, a criminal case. In People v. Nethers,
Steven Nethers was accused of murdering Richard Oden during an attempted robbery of a hardware store
owned by Mr. Oden. According to reliable eyewitness accounts, the perpetrator entered Mr. Oden's
hardware store at approximately 4:30 p.m. on November 2, 1997 wearing a Halloween-type of mask, and
waving a small caliber handgun. The perpetrator approached Mr. Oden (who was behind the cash register)
and said, "Open it fast or you're a dead man." According to the eyewitnesses, when the perpetrator turned
his head to survey the store, Mr. Oden grabbed a hammer from the counter and smashed the perpetrator on
the head with a single blow. The perpetrator fired a single shot into Mr. Oden's chest and fled the store.
Mr. Oden died shortly thereafter in a local hospital.
"During an investigation of the hardware store crime scene, the police identified and recovered
several moist blood drops from the path that was taken by the perpetrator as he fled the store. These drops
were subjected to a form of DNA analysis called PCR testing. The PCR tests revealed the blood to be of a
type known as "2, 3." Because this blood type was different from Mr. Oden's blood type, police believed
that the recovered blood drops came from the bleeding head of the robber. During routine interviews of
people who live in the neighborhood, the police identified several potential suspects. All of these
individuals agreed to provide blood samples to police for comparison with blood that was recovered from
the crime scene. One of the suspects, Mr. Steven Nethers, matched the 2, 3 blood type and was arrested for
the murder.
"At trial, the prosecution alleged that the blood analysis demonstrated Mr. Nethers was the source
of the wet blood drops, and that he was therefore guilty of attempted robbery and murder. A DNA expert
testified that his tests could not rule out Mr. Nethers as a possible source of the blood drops. He also
testified that the probability that the suspect would match the blood sample if he were not the source is
0.1%. The defense argued that the blood evidence is irrelevant because there was no direct evidence, such
as eyewitness identifications, that linked Mr. Nethers to these crimes."
13
Koehler also varied the specificity of the population of potential innocents. He stated this either in a
general way (e.g., "the probability that the suspect would match the blood sample if he were not the source
is 0.1% ") or by identifying a specific population (e.g., "0.1% of people in Houston would also match the
blood drops"). In our study, reported below, we also varied this parameter, but found it had little effect on
the judges, and is less relevant to our hypothesis than the subjective versus frequentist format. Hence, we
do not discuss this in detail.
exactly these results. Subjects who saw the statistics presented in subjective format were
more likely to conclude that the blood at the crime scene had come from the defendant
than those who saw the frequentist format. Interestingly, as discussed below, they did
not also express a greater willingness to convict the defendant.
These studies produce a clear set of advice for lawyers. Prosecutors should
present the statistics in a subjective format and defense attorneys should present the
statistics in a frequentist format. Each format flags a slightly different aspect of the
statistical evidence. The subjective format induces a sense of the power of forensics and
the near certainty of a 0.1% match. People exposed to statistics in this format are
assessing whether they believe that the person has only a 0.1% chance of being innocent.
By contrast, the frequentist format invites people to imagine a large number of potentially
innocent people who might also match the evidence. People exposed to statistics in this
format have to judge whether the defendant is one of the many innocents or is the guilty
party, based on the other facts. The format changes the target of the subjects' analysis
and changes the outcome.
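The gap between the two readings can be made concrete with Bayes' rule. The sketch below is an illustration only and is not part of Koehler's materials: it assumes a hypothetical pool of equally likely possible sources (the pool sizes are invented), assumes the true source always matches, and treats the 0.1% figure as the probability that a person who is not the source matches.

```python
def posterior_source(random_match_prob: float, pool_size: int) -> float:
    """P(defendant is the source | match).

    Assumes (hypothetically) that each of pool_size people is equally likely
    a priori to be the source and that the true source always matches.
    """
    prior = 1.0 / pool_size
    match_if_source = 1.0
    numerator = prior * match_if_source
    denominator = numerator + (1.0 - prior) * random_match_prob
    return numerator / denominator

RMP = 0.001  # 0.1%, equivalently 1 in 1,000

for pool in (2, 1_001, 100_001):  # invented pool sizes, for illustration only
    print(pool, round(posterior_source(RMP, pool), 3))
```

Under these assumptions, a pool of 1,001 possible sources leaves the defendant at even odds of being the source, because one innocent match is expected among the other 1,000 people. Reading the 0.1% as the chance of innocence ignores the size of the pool altogether.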
To assess whether this same effect could be observed in trial judges, we presented
Koehler's materials to a group of 68 trial judges from an urban Eastern
jurisdiction who were attending their annual educational conference in May 2002.14 Each judge
saw one of four versions of Koehler's scenario, two in which the probability of matching was
presented in subjective (0.1%) format and two in which it was presented in frequentist
(1 out of 1,000) format. This variation was crossed with the degree of specificity of the
population from which the pool of innocent matches might be drawn (matching
in general, or matching people from the District of Columbia). Following Koehler, we
asked the judges three questions about the fact pattern:
"Based on this evidence, what is the probability that the defendant is the source of
the recovered DNA trace? ___%
"Based on this evidence, what is the probability that the defendant is guilty of the
murder of Mr. Oden? ___%
"Based on this evidence, how would you find the defendant (assuming a bench
trial)? (Guilty or Not Guilty)"
As to the first two questions, the judges produced results similar to the subjects in
Koehler's study. Among the 24 judges who saw the frequentist presentation, the average
estimated probability that the blood came from the defendant was 42%, whereas it was 73% among
the 29 judges who saw the subjective format.15 This difference was significant
14
These judges preferred not to have the jurisdiction identified. Three judges declined to allow us to use
their results for any further discussion, and the results from these judges have been omitted from all
analysis.
15
14 of the judges did not respond to this question or responded by indicating "I don't know".
statistically.16 Judges echoed this result when asked about the probability that the
defendant committed the crime. The judges in the frequentist condition gave an average
probability of guilt of 44% (21 judges), as opposed to 68% in the subjective condition (27
judges). This difference was also significant statistically.17
These assessments, however, did not affect the judges' assessments of the verdict.
Among the judges who saw the frequentist presentation, 20% (6 out of 30) indicated they
would find the defendant guilty, as opposed to 25% (9 out of 36) of the judges who saw the
subjective version. This difference was not significant.18 This result also echoes that of
Koehler, who found that 32% of lay adults who read frequency statistics were willing to
convict, as opposed to 36% of those who read subjective statistics. (Koehler, 2002).
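The comparison of verdicts can be rechecked from the counts reported above. The following is a minimal sketch of a two-sided Fisher's exact test written from the standard hypergeometric definition; the function name is ours, and this is not necessarily the software or the exact procedure used for the analysis reported in the footnote.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    lowest = max(0, col1 - row2)
    highest = min(col1, row1)
    observed = prob(a)
    # Small relative tolerance so that ties are not lost to rounding.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= observed * (1 + 1e-7))

# Guilty / not guilty: frequentist 6 of 30, subjective 9 of 36.
p_value = fisher_exact_two_sided(6, 24, 9, 27)
print(round(p_value, 2))
```

Run on these counts, the sketch should yield a p-value close to the one reported in the footnote, far from conventional significance thresholds.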
The results suggest an interesting dichotomy. Judges (and lay adults) were
influenced by the format. They assessed the facts differently and reported that they were
more persuaded by the evidence when it was presented in the subjective format. But the
format did not affect the judges' willingness to convict the defendant. This result
suggests that asking judges to assess guilt changes their perspective. We believe that
when judges render a verdict in a criminal context, they slow down and think a bit
harder. The verdict triggers careful thoughts about due process and the degree of
confidence in the evidence in a different way than when we simply ask for an assessment
of probabilities. (Nesson, 19??). The question of guilt directly flags confidence in the
police and prosecutors, and the judges' own willingness to endure the risk of a wrongful
conviction. For judges, a criminal verdict represents a different question from that of the
probability of a blood match.
To be sure, the fact that we, and Koehler, observed a small trend towards greater
conviction in the subjective format suggests that we might not have enough statistical
power to detect an effect on a binary outcome, like guilt. Koehler, in fact, suggests as
much, and cites a similar study in which a significant effect was observed on guilt.
(Koehler, 2002). But in light of our research on the role of intuition and deliberation, we
suggest that the question of guilt prompts more deliberative judgments. Judges and jurors
have to be confident in guilt to conclude that they will convict. This greater confidence
requires more deliberation and produces a different judgment than assessment of
probability.
16
F(1,49) = 7.86, p = .008. Specificity (general versus specific city) and the interaction between
specificity and format were not significant (F = 0.83 and F = 0.89, respectively).
17
F(1,44) = 4.83, p = .03. Specificity (general versus specific city) and the interaction between specificity
and format were not significant (F = 1.26 and F = 0.00, respectively).
18
Fisher's exact test, p = 0.77.
III. Conjunction Effects: Or, Persuading With Specificity?19
The so-called “extension rule” 20 is “perhaps the simplest and most transparent rule of
probability theory.” 21 This rule states that “if A is a subset of B, then the probability of A
cannot exceed that of B.”22 For example, the probability of a terrorist act in New York
City (A) cannot exceed the probability of a terrorist act in the United States (B) because
the United States includes New York City (as well as many other locations that might be
subjected to such an attack). Implicit in the extension rule is the “conjunction rule.”23
This rule states that “the probability of A&B can exceed the probability of neither A nor
B, since it is contained in both.”24 For example, the probability of a terrorist attack in
New York City carried out by Muslim extremists (A&B) cannot exceed the probability of
a terrorist attack in New York City (A) or the probability of a terrorist act carried out by
Muslim extremists (B).
The extension and conjunction rules are deductively accurate, as only a little
deliberation shows. Psychologists have found repeatedly, however, that people tend to
violate these rules of logic. Rather than engaging in careful deliberation, which leads to
compliance with the rule, people often engage in intuitive, impressionistic thinking and
thereby violate the rules. Apparently, it seems more likely that New York might face a
terrorist act committed by Muslim extremists than that New York might face a terrorist
attack.
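Both rules can be verified on any joint distribution. The sketch below uses an invented distribution over two generic events A and B (the numbers are hypothetical and chosen only for illustration) and checks that the conjunction never exceeds either component.

```python
from fractions import Fraction

# Hypothetical joint distribution over two binary events A and B.
# Keys are (A occurs, B occurs); the probabilities are invented and sum to 1.
joint = {
    (True, True): Fraction(2, 100),
    (True, False): Fraction(3, 100),
    (False, True): Fraction(5, 100),
    (False, False): Fraction(90, 100),
}

def marginal(dist, index):
    """Probability that the event at position `index` occurs."""
    return sum(p for outcome, p in dist.items() if outcome[index])

p_a = marginal(joint, 0)
p_b = marginal(joint, 1)
p_a_and_b = joint[(True, True)]

# Conjunction rule: P(A&B) cannot exceed P(A) or P(B).
print(p_a_and_b <= min(p_a, p_b))
```

Because every outcome counted in A&B is also counted in A and in B, no reassignment of the invented probabilities can make the check fail.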
The most famous problem of this type—the “Linda Problem”25 —is instructive. In
this widely administered problem, Professors Amos Tversky and Daniel Kahneman gave
subjects the following information about Linda:
Linda is 31 years old, single, outspoken, and very bright. She majored in
philosophy. As a student, she was deeply concerned with issues of
19
Note: Much of this section (everything other than the between-subjects conjunction problem) was taken
from our article in the Duke Law Journal ("The 'Hidden Judiciary': An Empirical Investigation of
Executive Branch Justice"), where the data from our extensional fallacy problem was first reported.
20
Maya Bar-Hillel & Efrat Neter, How Alike Is It Versus How Likely Is It: A Disjunction Fallacy in
Probability Judgments, 65 J. PERSONALITY & SOC. PSYCHOL. 1119, 1119 (1993); see also Amos Tversky &
Daniel Kahneman, Extensional Versus Intuitive Reasoning: The Conjunction Fallacy in Probability
Judgment, 90 PSYCHOL. REV. 293, 293–94 (1983) (“[The] probability theory does not determine the
probabilities of uncertain events—it merely imposes constraints on the relations among them. For example,
if A is more probable than B, then the complement of A must be less probable than the complement of B.”).
21
Researchers have described this as “perhaps the simplest and most transparent rule of probability theory,”
Bar-Hillel & Neter, supra note 20, at 1130, which “even untrained and unsophisticated people accept and
endorse.” Id.
22
Id. at 1119.
23
Id.
24
Id.
25
See Tversky & Kahneman, supra note 20, at 297; Amos Tversky & Daniel Kahneman, Judgments of and
by Representativeness, in JUDGMENT UNDER UNCERTAINTY: HEURISTICS AND BIASES 84, 92 (Daniel
Kahneman, Paul Slovic & Amos Tversky eds., 1982).
discrimination and social justice, and also participated in anti-nuclear
demonstrations.26
The researchers asked the subjects to rank-order the likelihood of eight different
statements, including these three: “Linda is active in the feminist movement”; “Linda is a
bank teller”; and “Linda is a bank teller and is active in the feminist movement.”27 The
description made it seem as though Linda was a feminist, but not a bank teller. As a
result, subjects generally reported that it was more likely that Linda was a bank teller
active in the feminist movement than that she was a bank teller.28 Obviously, this is
inaccurate. Under the conjunction rule, it cannot possibly be the case that it is more likely
that Linda was a bank teller and was active in the feminist movement than that she was
simply a bank teller.29
To explore whether judges would comply with, or violate, the conjunction rule,
we gave a group of 102 Administrative Law Judges ("ALJs") who attended the national
conference a problem called the “Employment Case.” We asked them to imagine that they
were presiding in a case involving an employment dispute between Dina El Saba, a
public sector employee, and the agency for which she previously worked. The judges
learned that Dina worked as an administrative assistant for a senior manager named Peter
before the agency fired her. While at the agency, Dina’s employment evaluations were
“average” to “above average,” so she claimed her termination must have been motivated
by unlawful discrimination. The agency contended, instead, that it terminated Dina
because she repeatedly violated workplace rules and norms. Among other things, she
“took too many breaks during the workday and took odd days off as holidays”; “dressed
in ways that made her co-workers and agency visitors feel uncomfortable, covering
herself mostly in black”; acted “odd” and “aloof”; and refused to eat lunch in the
presence of male co-workers.
Based solely on these facts, we asked the ALJs to rank-order the likelihood of the
following four options:
_____ The agency unlawfully discriminated against Dina based on her
Islamic religious beliefs
_____ The agency actively recruited a diverse workforce
_____ The agency adhered to its internal employment policies
26
Tversky & Kahneman, supra note 20, at 297; Tversky & Kahneman, supra note 25, at 92.
27
Tversky & Kahneman, supra note 20, at 297; Tversky & Kahneman, supra note 25, at 92.
28
Tversky & Kahneman, supra note 20, at 297; Tversky & Kahneman, supra note 25, at 93.
29
The Linda problem can be criticized as methodologically flawed in that people might be assuming that
the single feature is actually meant to be the conjunction of “bank teller” and “not active in the feminist
movement.” But Professors Tversky and Kahneman have conducted a version in which they avoid this
problem by changing the second response to “Linda is a bank teller whether or not she is active in the
feminist movement” and obtained similar results. Tversky & Kahneman, supra note 20, at 299.
_____ The agency actively recruited a diverse workforce but also
unlawfully discriminated against Dina based on her Islamic
religious beliefs
Based on the facts we provided, we believed that the ALJs would deem it likely
that the agency unlawfully discriminated, but not that the agency actively recruited a
diverse workforce. Thus, we expected the ALJs to deem the fourth option (“The agency
actively recruited a diverse workforce but also unlawfully discriminated against Dina
based on her Islamic beliefs”) more likely than the second option (“The agency actively
recruited a diverse workforce”), even though, as a matter of deductive logic, the second
option must be more likely than the fourth. Judges who identified the fourth option (“The
agency actively recruited a diverse workforce but also unlawfully discriminated against
Dina based on her Islamic beliefs”) as more likely than the first option (“The agency
unlawfully discriminated against Dina based on her Islamic religious beliefs”) would also be
violating the conjunction rule.
As expected, we found that the ALJs violated the conjunction rule. Rather than
thinking through the problem deliberatively, which would have led them to rank-order
options one and two as more likely than option four, we found the exact opposite. Of the
ninety-nine ALJs who responded to this problem, eighty-four (or 84.8 percent) violated
the conjunction rule in some way. These eighty-four judges committed all of the possible
errors, albeit at different rates: thirty-three rated the fourth option as either equally likely
as, or more likely than, both the first and second options;30 thirty-six ranked the fourth
option as either equally likely as, or more likely than, the first option (but not the second
option);31 fifteen ranked the fourth option as either equally likely as, or more likely than,
the second option (but not the first option).32 Thus, the problem lured most judges into
committing the conjunction error, and they were more likely to commit the error with the
distraction of the more probable of the two components, just as happened in the classic
Linda problem.33
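The coding of responses described above can be expressed as a short rule. The sketch below is a hypothetical reconstruction, not the scoring script used in the study: the function name and the sample response sheets are invented, and ties are counted as errors only because the text above counts "equally likely" rankings that way.

```python
def conjunction_violations(ranks: dict) -> set:
    """Return the components that the conjunction (option 4) was ranked as
    equally or more likely than. Rank 1 means most likely, so a lower number
    is a higher ranking. Ties count as violations, following the coding
    described in the text.
    """
    conjunction = 4
    components = (1, 2)  # option 4 combines options 1 and 2
    return {c for c in components if ranks[conjunction] <= ranks[c]}

# Hypothetical response sheets (not actual data from the study).
consistent = {1: 1, 2: 2, 3: 3, 4: 4}
against_second = {1: 1, 2: 4, 3: 3, 4: 2}
against_both = {1: 2, 2: 3, 3: 4, 4: 1}

for sheet in (consistent, against_second, against_both):
    print(sorted(conjunction_violations(sheet)))

# The three reported error groups add up to the reported total.
violators = 33 + 36 + 15
print(violators, round(100 * violators / 99, 1))
```

An empty result means the response respects the conjunction rule; a result naming both components corresponds to the first of the three error groups reported above.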
30
Of these thirty-three judges, twenty ranked the fourth option as more likely than both options one and
two, eight wrote in that they were all equally likely, three assigned the same ranking to options one and two as to
option four, and two simply put a check mark next to option four.
31
Of these thirty-six judges, one ranked the first and fourth options as equally likely.
32
Of these fifteen judges, seven ranked the first and fourth options as equally likely.
33
Although female judges were somewhat more likely to commit the conjunction error than male judges
(89 percent, or fifty out of fifty-six, versus 79 percent, or thirty-one out of thirty-nine, respectively), this
difference was not significant. Fisher's exact, p = .24. More experienced judges tended to be more likely to
commit the error than their younger counterparts, but this trend was also not significant. Logistic regression
of committing the error on years of experience yielded a negative, but not significant, coefficient of -.048, z
= 1.35, p = 0.18. The error rate was also nearly identical among judges who make recommendations as
opposed to the other judges (86 percent, or thirty-six out of forty-two, versus 84 percent, or forty-two out of
fifty, respectively). This was not a significant difference. Fisher's exact, p = 1.00.
Previous work on the CRT has shown that people who score high on the CRT are less likely to
commit the conjunction error. Oechssler et al., supra note, at 5 (“Of our subjects in the Low CRT group,
62.6% [committed the conjunction fallacy on the ‘Linda’ problem, but t]his percentage is much lower for
the High CRT group at 38.3% . . . .”). We found that among the judges who scored perfectly on the CRT
and answered this question, 72 percent (thirteen out of eighteen) committed the fallacy, whereas 86 percent
(sixty out of seventy) of the judges who got at least one of the CRT questions wrong committed the fallacy.
The percentage who committed the conjunction error broken down by exact CRT score: zero right on CRT,
As with our other findings, we believe that the judges in responding to this
problem were judging a different object of judgment than one might have thought.
Judges were not treating this problem as one of deductive logic and taking advantage of
its logical structure. Rather, they were trying to make sense of the story, which led them
to be distracted by their intuitions. Just as "Linda" feels like a feminist bank teller and
not an ordinary bank teller, Dina feels like a likely target of discrimination, so options
that highlight that fact seem more likely than those that do not. This suggests that
providing more detail can induce judges to believe fact patterns more readily,
even though the added details necessarily (as a matter of logic) make the stories less likely.
But as with the other efforts to misdirect judges using probabilistic
evidence, judges might resist this effect when asked a question with more direct legal
consequences. We crafted another test of extensional reasoning that provided a more
straightforward test of the thesis that adding more detail will persuade judges. We gave
this problem to the same group of 68 judges from an urban Eastern jurisdiction who read
the problem on probabilistic evidence.
In the problem, labeled "Evaluation of Probable Cause", we asked judges to
assess probable cause in the issuance of a search warrant. We did not ask the judges
whether they would grant a warrant or not, but asked them to assess the likelihood that
the defendant had actually committed the crime. The facts ran as follows:
"Imagine that you are trying to decide whether to issue the police a
warrant to search the home of a suspect in a murder case. The suspect is John P.,
a meek man, 42 years old, married with two children. His neighbors describe him
as mild-mannered, but somewhat secretive. He owns an import-export company
based in New York City, and he travels frequently to Europe and the Far East.
Mr. P. was convicted a couple of years ago of smuggling precious stones and
metals (including uranium) and received a suspended sentence of 6 months in jail
and a large fine. The murder victim is one of his employees.
"The police have an eyewitness who knows Mr. P and states that he thinks
that he may have seen Mr. P. at the victim’s apartment shortly before the time that
the police believe that the victim was murdered. The police have no other
meaningful evidence against Mr. P. They have no other suspects at this time.
"In deciding whether there is probable cause for issuing the warrant, it is
important to get a sense of the likelihood that Mr. P may have committed the
crime. . . . "
83 percent (twenty-five out of thirty); one right on CRT, 86 percent (nineteen out of twenty-two); two right
on CRT, 89 percent (sixteen out of eighteen); three right on CRT, 72 percent (thirteen out of eighteen). This
difference was not, however, significant. Fisher's exact, p = 0.29. We also ran a logistic regression of
whether the judges committed the error with CRT score as a predictor, which also showed no
significant effect. z = .72, p = .47. That we only observed a trend might be due to a somewhat small sample
size with which to identify the effect.
All of the judges received these facts, but we varied the question we asked judges
slightly. For half, we asked simply "Given the description above, how likely is it that
Mr. P. committed the murder?" For the other half, we added a motive: "Given the description
above, how likely is it that Mr. P. committed the murder to prevent the victim from
talking to the police about smuggling?"34 Asking whether the defendant committed this
crime as a result of this particular motive excludes the possibility that he committed the
crime out of some other motive. Hence, the probability that the defendant committed the
crime because of that particular motive is necessarily smaller than the probability that the
defendant committed that particular crime. But research on conjunction and extensional
problems suggests that adding details that explain the event makes it seem more likely.
And theorists have suggested that this effect might influence how judges and juries think.
(Kelman et al., 1996).
But the judges in our study were not affected by this manipulation. The 32 judges
in the "no motive" condition stated that, on average, the defendant had a 26.5% chance of
having committed the crime. The 33 judges in the "motive" condition gave a slightly
smaller average of 23.5%. In addition to running in the opposite direction from that
predicted by the previous work on the conjunction fallacy, this difference was not
significant statistically.35
In effect, when judges were properly focused on the relevant issue, they were not
distracted by conjunction effects. This single study cannot rule out the possibility that
conjunction effects can be used to persuade judges. They might simply have imputed
the motive (which was the obvious one) in the "no motive" condition, and under other
circumstances, the effect might emerge. And conjunction effects have proven to be
powerful in other, similar contexts. (Tversky and Kahneman, 1983). But the study supports the
continuing theme of this paper that judgment can be influenced by an adjustment to the
object of judgment. In the "Dina" problem, judges did not realize that a normative
approach would have set them looking for logical, deductive relationships among
the potential stories. But they were quite adept at avoiding such distractions when
focused directly on a question of whether a potential defendant committed a crime. It is
as if judges are rearranging the target of their judgment in an appropriate way.
Some of our studies above show that misdirecting judges might be a profitable
undertaking for lawyers. Adding a weak witness produced notable context effects,
describing probability in a subjective format made forensic evidence more
compelling, and adding conjunctive details distracted judges from the logical structure of
a legal problem. But our studies also show that judges resist these efforts, and can
rearrange the target of their judgments so as to avoid misdirection. Even though judges
found forensic testimony more persuasive in the subjective format, the format did not influence their
verdicts, and judges were not moved by conjunctive details in assessing the likelihood
that a potential defendant had committed a crime. Judges' ability to defend against
34
We also indicated to both groups that they should use a percentage to respond: "(Please state a
percentage between 0% and 100%.)"
35
t(60) = 0.63, p = .53.
potential misdirection suggests that judges manipulate the object of a legal judgment on
their own in ways that can avoid bias. Our final two sets of problems test this thesis with
the potentially misleading influence of the hindsight bias (in this study) and of evidence
that must be suppressed (in the final study).
IV. Hindsight Bias: Probable Cause ≠ Probability
The hindsight bias provides an excellent test for the thesis that judges can
spontaneously alter the object of judgment so as to avoid erroneous judgments.
Psychologists have found that people are prone to the “hindsight bias,” according to
which the past comes to seem more predictable than it actually was.36 Once the outcome
of an event is known, that outcome comes to feel inevitable or at least much more likely
to have occurred than it would have seemed before it actually happened. This bias is
thought to influence a wide range of judgments in legal settings (Rachlinski, 1998). And
we have found that in at least two settings, it influences judges. (Guthrie et al., 2001;
Rachlinski et al., 2009).
Legal scholars have argued that the hindsight bias likely influences assessments
of probable cause. (Bibas, 20??). Judges assess probable cause both in foresight when
they decide whether to grant a search warrant, and in hindsight, when prosecutors attempt
to introduce evidence obtained in a search conducted pursuant to an exception to the
requirement of a warrant. In the latter instance the judge must assess whether the police
had probable cause to conduct the search, even though the judge knows that the search
produced incriminating evidence. Arguably, knowledge of the outcome taints that
judgment. Judges might determine that probable cause existed in hindsight, even in
circumstances in which they would not have granted a warrant in foresight.
In our initial investigation of this issue, we found no evidence that the hindsight
bias influences judgments of probable cause. (Wistrich et al, 2005). In our study, we
asked judges to decide either whether to grant a warrant or whether to admit evidence obtained after a
search. In both cases, we presented identical facts to support the finding of probable
cause, except that in hindsight, we informed the judges that the search had produced
incriminating evidence. To our surprise, the same percentage of judges admitted the
evidence as granted the warrant. Hindsight did not seem to cloud their judgment.
How did judges manage to avoid the influence of the hindsight bias? We
undertook further study of probable cause determinations to find out if they truly avoided
the bias, or whether they simply made judgments that did not really turn on their
assessments of probability.
We recruited a total of 224 sitting judges attending five judicial education
conferences to participate in our study: 80 federal district judges (from three different
36
See Baruch Fischhoff, Hindsight ≠ Foresight: The Effect of Outcome Knowledge on Judgment Under
Uncertainty, 1 J. EXPERIMENTAL PSYCHOL.: HUM. PERCEPTION & PERFORMANCE 288, 288 (1975)
(“Reporting an outcome's occurrence increases its perceived probability of occurrence . . . .”).
educational conferences), 43 federal magistrate judges, and 101 state trial judges from
Florida. The federal judges were all attending educational conferences sponsored by the
Federal Judicial Center. The Florida judges were attending an annual educational
conference organized by the judges themselves. The educational
conferences were all generic, including a wide variety of topics for the judges. In all
cases, the data were collected at a “breakout” session during the conference. The judges
who participated in this research were thus judges who attended our panel, entitled “the
Psychology of Judging”, rather than some other educational session.37
As with our other studies, we did not ask the judges to identify themselves by
name, but we did ask them to identify their gender,38 number of years of experience on
the bench,39 and title (to ensure that they were judges). We also asked whether they had
served as a defense attorney or criminal prosecutor before becoming a judge.40 Finally,
we inquired into their political affiliation. For the Federal District Judges this inquiry
consisted of asking them to identify the political party of the President who appointed
them. For the magistrates and the state judges (both of whom are appointed largely
through a committee recommendation process), we instead asked: “Which of the two
major political parties in the United States most closely matches your own political
beliefs?”41 None of these demographic variables influenced the judges’ decisions and we
therefore omit further discussion of them.
37
The judges in this sample all shared important characteristics. All of the judges participating in this
study were appointed, rather than elected. The Florida judges and the federal magistrate judges are
appointed for fixed terms by merit-based selection panels. They both can be (and often are) re-appointed.
The Federal District Judges are appointed by the President with the Advice and Consent of the U.S. Senate,
and have life tenure. All of the judges function essentially as trial-court judges who manage settlement,
hear motions, and preside over trials, although some variations exist among the three groups. The Federal
District Judges preside over all manner of civil and criminal suits filed in the federal courts in the United
States. Federal cases are individually assigned to a single federal judge, meaning that the judges preside
over all stages of a case, except appeal. The U.S. Magistrate Judges perform a role similar to that of the
Federal District Judges, except that the Magistrates may conduct trials only in civil cases and misdemeanor
criminal cases. By and large, the Magistrates focus more of their attention on pre-trial matters, such as
discovery and settlement, than do the Federal District Judges. The workload of the state trial judges
resembles that of the federal district judges, with one exception. The state judges rotate through various
departments (civil, criminal, family, and probate), such that at any point in time, these judges will only hear
matters arising within that department. Furthermore, some of the state judges specialize in a single
department; notably, the sample includes several judges who hear only family court matters.
38
Overall, only 21.6% of the judges in this part of the study were women (48 out of 222, with 4 judges
who did not answer the question). The percentage of women did not vary much by judge type: 20.3%
among the federal district judges (16 out of 79); 23.3% (10 out of 43) among the Magistrate judges; and
21.4% (21 out of 98) among the state judges.
39
Overall, the judges had an average of 12.4 years of experience in their current positions as judges. As
with gender, the level of experience did not vary much by judge type. The federal district judges served for
an average of 11.4 years, the magistrates an average of 10.2 years, and the state judges for 14.4 years. This
figure understates the experience of the federal judges, however, as many had experience in state court
before serving as a federal judge. Among the 124 federal district and magistrate judges who responded to
the question about their experience, 41 (33.1%) had previous experience as a judge. Adding this
experience to their time on the federal bench shows the district judges and magistrates had an average of
15.1 and 12.1 years of total experience as judges, respectively. Factoring in this prior experience among
the federal judges reveals that the three samples of judges had an average of 14.2 total years of experience
as judges.
40
With regard to their experiences before becoming judges, 48.0% (106 out of 221) had been
prosecutors (46.3%, or 37 out of 80, of the federal district judges; 46.5%, or 20 out of 43, of the federal
magistrate judges; and 50.0%, or 49 out of 98, of the state judges), and 55.0% (121 out of 220) had experience as
criminal defense attorneys (51.9%, or 41 out of 79, of the federal district judges; 51.2%, or 22 out of 43, of
the federal magistrate judges; and 59.2%, or 58 out of 98, of the state judges).
We gave each participating judge a questionnaire that included between four and
seven scenarios, only one of which dealt with probable cause. Some of the data from the
other items have been reported elsewhere (Wistrich et al., 2005). The probable cause
scenario was always the second item in the questionnaire, and the demographic
questions always appeared on the last page.
Each judge received one of four versions of the questionnaire to create a 2x2
between-subjects design. Half of the judges received a questionnaire cast in a
foresightful perspective and half in a hindsightful perspective. In foresight, the critical
inquiry was whether the facts of the scenario constituted probable cause for a search and
thereby justified granting a warrant. In hindsight, the materials indicated that a police
officer had already conducted a search that revealed incriminating information, and the
critical inquiry was whether the circumstances surrounding the search had constituted
probable cause, thereby making the search legal. Furthermore, we varied the severity of
the crime being investigated. The crime consisted either of an assault on or murder of a
police officer. The questionnaires were shuffled thoroughly before our presentation and
were distributed randomly to the judges.
The basic story in all four variations was the same. The one-page fact pattern,
labeled “Fourth Amendment Issue,” began with one of the following two paragraphs:
FORESIGHT:
“You have been asked to issue a telephonic warrant authorizing Officer
George McAllen to search the trunk of a parked car for evidence related to the
battery of a police officer. After you placed McAllen under oath, he relayed the
following information to you:”
HINDSIGHT:
“You have been asked to rule on a motion to suppress evidence obtained
from a warrantless search of the trunk of a parked car made by Officer George
McAllen for evidence related to the battery of a police officer. The parties’ briefs
convey the following information:”
Both versions then presented the following facts (with variations by crime noted):
41
Across all of the judges, 48.6% (101 out of the 208 who answered the question) identified themselves as
more affiliated with the Republican than Democratic Parties (or were appointed by a Republican President).
This varied somewhat by judge type: 57.0% (45 out of 79) of the federal district judges were appointed by
a Republican President; only 15.4% (6 out of 39) of the federal magistrate judges identified more with the
Republican party; and 55.6% (50 out of 90) of the state judges identified with the Republican party.
“Officer McAllen was part of a task force investigating a drug distribution
network operating in a poor urban area. McAllen was driving to meet a potential
informant at 1:47 am on a Saturday morning when he received a call from his
supervisor stating that another police officer had just been attacked while
operating undercover one mile from McAllen’s location. [Battery: The officer had
been struck hard in the head by a blunt instrument. Although the attack had left
him groggy, he was expected to recover fully. Murder: The officer had been
bludgeoned to death by a blunt instrument.]
“The perpetrator remains at large and the only available information about
him is that he is likely a drug dealer who had identified the officer. The officer
also wounded the perpetrator with a knife, which was found at the scene. [NOTE:
in hindsight, we used past tense]
“Officer McAllen remained in place after receiving the call, waiting for
his informant. Fifteen minutes later, McAllen observed a late-model, black BMW
park in front of a small nightclub. The driver got out of the car, opened the back
door, pulled out a long, curved piece of metal from the seat, and placed it into the
trunk of the car. The driver closed the trunk and then entered the club. McAllen
noted that the driver had a bandaged hand.
“Officer McAllen walked over to the car. He observed that the front left
tire was a small, temporary tire of the type used as a spare. This observation
made him realize that the metal object was likely a crowbar. He looked into the
car and observed a car jack on the floor and three envelopes on the back seat, two
of which appeared to be stuffed with cash. McAllen also observed a stain,
possibly from blood, on the steering wheel.”
The materials then closed with one of the two passages below:
FORESIGHT:
“Based on these observations, Officer McAllen believes there is probable
cause to search the trunk of this car and has asked you to issue a telephonic
warrant authorizing the search.
“Would you issue the warrant?
_____ Yes, there is probable cause for the search; I would issue the
warrant
_____ No, there is not probable cause for the search; I would not issue the
warrant”
HINDSIGHT:
“Based on these observations, Officer McAllen believed that there was
probable cause to search the trunk of the car. He opened the trunk and found a
bloodied crowbar and a large quantity of white powder that appeared to be
cocaine. After phoning for backup, McAllen and his colleagues arrested the
driver when he returned to his car.
Subsequent investigative work confirmed that the driver’s fingerprints were on
the crowbar. DNA tests also matched the blood on the crowbar with that of the
officer who had been attacked. The BMW driver is now being prosecuted for
battery and drug violations.
“The driver’s defense attorney has filed a motion to suppress the evidence
obtained from the trunk on the ground that there was no probable cause to conduct
the search.
“Would you allow the evidence to be admitted?
_____ Yes, there was probable cause for the search; I would admit the
evidence
_____ No, there was not probable cause for the search; I would not allow
the evidence to be admitted”
The final question thus provides the first dependent variable. It requests that the
judges give their opinion as to whether the circumstances supported a finding of probable
cause. The call of the question lists both the determination of probable cause and the
consequence.
The materials presented above depict the versions presented to the state judges.
The federal version differed slightly in that the law enforcement agents were identified as
FBI agents, rather than police officers. This was necessary to ensure that the crime was
of a type that would come before a federal judge.
In effect, each of the versions gives an ambiguous set of information about an
individual who may have been a drug dealer who attacked a police officer (or FBI agent).
The story was intended to suggest either that the suspect was the perpetrator, or just
someone out at night who had recently changed his tire. In foresight, the materials
simply ask whether granting a warrant is appropriate. The hindsight version goes a bit
further to indicate that the police officer conducted the search without a warrant. It
describes the results of the search and indicates that the suspect was indeed the
perpetrator, presenting evidence that leaves little doubt about it. None of the evidence
will be admissible, however, unless the police officer had probable cause to conduct the
search.
Because the search involved an automobile, the officer can conduct the search
without a warrant. Law enforcement officials can engage in warrantless searches if
exigent circumstances would justify acting quickly, without the time necessary to obtain
a warrant. Case law firmly establishes that the search of an automobile constitutes exigent
circumstances, owing to the concern that automobiles are apt to be moved while the
officer obtains the warrant. Carroll v. United States, 267 U.S. 132 (1925); United States
v. Ross, 456 U.S. 798 (1982). Officers sometimes seek a warrant, even when exigencies
would excuse its absence, as a means of ensuring the admissibility of any evidence they
might obtain. Thus, both our foresight and hindsight conditions portray realistic
situations. Exigency does not, however, excuse the necessity that probable cause be
present. Absent probable cause, a judge should not issue a warrant in foresight and should
suppress evidence obtained in hindsight.
At the top of the page that followed the probable cause hypothetical, the judges
confronted a second question concerning the scenario. They had been given no notice of
this question, although had they turned ahead before answering, they might have seen it.
This second question was designed to elicit a probability estimate as to the success of the
search. In the foresight conditions, we asked:
“In the problem on the previous page, what is the likelihood that the search, if
conducted, would uncover evidence that would incriminate the driver in the attack
on the police officer?”
In the hindsight conditions, we asked:
“In the problem on the previous page, if Officer McAllen had requested a
telephonic warrant before conducting the search, what would you have said was
the likelihood that the search would have uncovered evidence that would
incriminate the driver in the attack on the undercover police officer?”
Below the question was a blank line ending in a percent symbol.
As to the determination of probable cause, the judges overall displayed no
significant effects, as the table below shows. Across both crimes, 57.1% (68 out of 119)
of the judges in foresight found that the circumstances supported a finding of probable cause,
as compared to 54.3% (57 out of 105) of the judges in hindsight. Among the judges
reviewing the battery case, 55.7% (34 out of 61) in foresight found probable cause, as
compared to 44.0% (22 out of 50) of the judges in hindsight. Among the judges
reviewing the murder case, 58.6% (34 out of 58) in foresight found probable cause, as
compared to 63.6% (35 out of 55) in hindsight. Logistic regression of the ruling on
time (foresight versus hindsight) and crime revealed no significant main effects or interactions.42
Table: Percent of judges finding probable cause, by condition.
             Battery          Murder           Total
Foresight    55.7 (34/61)     58.6 (34/58)     57.1 (68/119)
Hindsight    44.0 (22/50)     63.6 (35/55)     54.3 (57/105)
Total        50.5 (56/111)    61.1 (69/113)    55.8 (125/224)
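The row and column totals in this table follow from the four cell counts by simple addition. The short Python sketch below recomputes them; it is only an arithmetic check, and the variable and function names are illustrative rather than anything used in the study.

```python
# Cell counts as reported: (judges finding probable cause, judges in the cell).
cells = {
    ("foresight", "battery"): (34, 61),
    ("foresight", "murder"): (34, 58),
    ("hindsight", "battery"): (22, 50),
    ("hindsight", "murder"): (35, 55),
}

def pooled(keys):
    """Sum the 'probable cause' counts and the cell sizes over the given cells."""
    yes = sum(cells[k][0] for k in keys)
    n = sum(cells[k][1] for k in keys)
    return yes, n

def percent(yes, n):
    """Percentage rounded to one decimal place, as in the table."""
    return round(100 * yes / n, 1)

def margin(index, level):
    """Pool all cells whose time (index 0) or crime (index 1) equals level."""
    return pooled([k for k in cells if k[index] == level])

foresight = margin(0, "foresight")
hindsight = margin(0, "hindsight")
battery = margin(1, "battery")
murder = margin(1, "murder")
overall = pooled(list(cells))

for label, (yes, n) in [
    ("Foresight", foresight),
    ("Hindsight", hindsight),
    ("Battery", battery),
    ("Murder", murder),
    ("Overall", overall),
]:
    print(f"{label}: {percent(yes, n)}% ({yes}/{n})")
```

Summing the cells gives 68 of 119 judges in foresight and 57 of 105 in hindsight, which matches the figures stated in the text, and 125 of 224 judges overall.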
These results replicated and extended our earlier finding that the hindsight bias
does not appear to influence judgments of probable cause. This scenario replicates the
result that judges make the same judgments in foresight as hindsight with respect to
probable cause in a different fact pattern. And we extend the result by showing that the
severity of the crime does not influence judges either. We had thought that even if judges
could overlook their knowledge that the defendant was actually guilty for minor offenses,
it would be more difficult to do so for more serious crimes. The more serious the crime,
the more costly it is to suppress the evidence. In the literature on the hindsight bias, in
fact, negative outcomes produce a more significant bias than positive ones, and we
reasoned that an even more negative outcome might facilitate the bias. But judges
resisted this manipulation, at least in this hypothetical setting.
42
All p’s > .20.
Our second dependent measure affords us some ability to assess how it is that
judges avoid the influence of the hindsight bias. In fact, our initial question is somewhat
unlike that which hindsight bias studies typically pose. Typically, researchers ask for
assessments of probability, whereas we had asked for a judgment with direct legal effect.
Our follow-up question, however, asked judges to make probability estimates.
The results also showed no overall hindsight bias effects, at least at a superficial
level, as the table below indicates. ANOVA of these results showed no significant main
effects or interaction.43
Table: Mean rating of probability, by condition.
             Battery    Murder    Total
Foresight    53.4       56.9      55.2
Hindsight    56.0       61.1      58.6
Total        54.8       59.0      56.5
But these results are apt to be influenced somewhat by the judges’ rulings on
either granting a warrant or admitting the evidence. We asked the judges these questions
before asking for a probability assessment. When we re-ran the analysis, controlling for
the effect of their ruling, we found that time had a marginally significant effect on the
probability estimate.44 The Table below, which collapses across the type of crime, but
controls for the judges’ rulings, shows a marked hindsight bias when judges determined
that there was no probable cause. It also shows that the judgment of probable cause did
correlate with their probability judgments.45 Hence, the judges were subject to a
hindsight bias, but their rulings masked the effect.
43
All p’s > .20.
44
ANOVA of the probability estimate on time, crime, ruling, and all interactions. Main effect of time was
marginally significant, F(1, 183) = 3.04, p = .08.
45
This was significant as well. F(1, 193) = 27.8, p < .001.
Table: Mean rating of probability, by condition and ruling on probable cause.
             Ruling on probable cause
Condition    No        Yes       Total
Foresight    40.5      65.0      55.2
Hindsight    52.0      64.6      58.6
Total        46.3      64.8      56.5
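One way to see how the rulings can mask the effect is to treat each condition's overall mean as a weighted average of the means for judges who ruled "no" and "yes". The Python sketch below works through that arithmetic using the rounded means in the table. It assumes the totals are exactly such weighted averages, because the cell sizes for this analysis are not reported; the function name is illustrative.

```python
def implied_yes_share(mean_no, mean_yes, mean_total):
    """Solve mean_total = (1 - w) * mean_no + w * mean_yes for the share w."""
    return (mean_total - mean_no) / (mean_yes - mean_no)

# Rounded means from the table (probability estimates, in percent).
foresight_share = implied_yes_share(40.5, 65.0, 55.2)
hindsight_share = implied_yes_share(52.0, 64.6, 58.6)

# Hindsight minus foresight, within each ruling and overall.
gap_no = 52.0 - 40.5
gap_yes = 64.6 - 65.0
gap_total = 58.6 - 55.2

print(f"implied share ruling 'yes': foresight {foresight_share:.2f}, hindsight {hindsight_share:.2f}")
print(f"hindsight gap: no-ruling {gap_no:.1f}, yes-ruling {gap_yes:.1f}, overall {gap_total:.1f}")
```

Under this assumption, a gap of 11.5 points among judges who found no probable cause shrinks to 3.4 points overall, because judges who found probable cause gave nearly identical estimates in both conditions. The implied shares are sensitive to rounding in the table means and should be read as rough.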
Thus, the hindsight bias has a complex relationship to the assessment of probable
cause. Curiously, judges do not seem to base their rulings of probable cause on an
assessment of the likelihood that the search would turn up (or would have turned up)
incriminating evidence. Rather, we suspect that the judges are assessing police conduct
in a more general manner. Probable cause has generated a mountain of case law—much
of which is familiar to the judges we studied and most of which represents a judicial
effort to manage police behavior, not guide criminal investigations. Judges largely
ignored the probability that the search would turn up incriminating evidence and ignored
even the severity of the crime. The object of judgment is the judicial evaluation of what
constitutes reasonable investigation tactics—not the probability of a successful search.
Probability judgments, and the accompanying hindsight bias, lurked in the
background. Controlling for the judges’ ruling, the judges gave higher probability
estimates of the likelihood of a successful search when evaluating in hindsight—at least
among judges who felt that there was no probable cause.
Probability, however, had a relationship to the judges’ judgment. Judges who
determined that probable cause existed gave markedly higher estimates for the likelihood
of a successful search. But it seems likely that the judgment of probable cause drove the
probability estimate, rather than the other way around.
This pattern of results suggests, overall, that judges are not making probability
judgments when assessing probable cause. They focus their attention on different factors
and thereby avoid the hindsight bias that the situation would otherwise seem likely to
produce.
V. Suppressed Confessions: Policing the Police
In our final study reported here, we test another situation in which judges are
under great cognitive pressure to rely on an untoward influence—that of suppressed
evidence. Bench trials can sometimes place judges in the awkward position of ruling on
what evidence may be considered, and then limiting their assessments to only that
evidence. Juries have a luxury in that regard, in that they are commonly (although
certainly not always) shielded from even knowing what evidence was excluded.
Psychology includes a large literature on human ability to ignore known information that
does not bode well for a judge who must ignore highly relevant testimony. The human
brain does not seem well suited to ignoring relevant information any more than it is well
suited to ignoring known outcomes (which produces the hindsight bias).
In a previous study, we demonstrated that judges have difficulty ignoring evidence
that they deem inadmissible (Wistrich et al., 2005). In numerous contexts, we showed
that trial judges rely on evidence even after they rule it to be inadmissible. But in two
instances, judges seemed to resist the influence of inadmissible evidence. The first was
that they resisted the influence of the hindsight bias on their probable cause assessments,
as discussed above. The second was that they seemed able to disregard a reliable
confession that was illegally obtained. In this scenario, we asked judges to determine
whether they would find a criminal defendant guilty or innocent in a bench trial. We
provided a set of somewhat weak evidence as to the guilt of a defendant charged with
robbing a convenience store. Half of the judges also learned that the defendant had
provided a reliable confession to the crime, although the confession was obtained two
hours after the defendant had asked for a lawyer, a request that the police ignored. When
we asked judges to rule on the admissibility of this confession, almost all of the judges
deemed it inadmissible. And they ignored it as well. Judges who learned of the
confession and suppressed it were as likely to convict the defendant as judges who had
never heard the confession.
As with the hindsight bias, we wondered how judges managed this
cognitively challenging task. And as with the hindsight bias, we suspected that judges
were not truly immune from the effects of the inadmissible information. Anecdotal
comments from the judges in the study provided a few clues to what they were doing.
Judges noted that the crime was “only a robbery”; no one was hurt, and not much cash
was stolen. In such a case, the cost of suppressing the evidence, in terms of a lost
conviction, is lower than it would be if the crime were more serious. Also, judges
seemed annoyed at the blatant disregard of constitutional rights practiced by the police in
this scenario. We suspected that if the police conduct were more extreme, judges might
reveal their displeasure with even lower conviction rates than had they not learned of the
confession, and of the police misconduct that produced it.
In effect, we suspect that the judges are not truly ignoring the confession. Rather,
they are balancing different factors—the cost of suppression and the degree of police
misconduct. To test this, we altered our scenarios somewhat to vary the severity of the
crime (just as with the hindsight bias) and the police misconduct. We thereby created a
2x3 design in which we crossed the severity of the crime (armed robbery versus murder)
and the degree of police misconduct (none, ignoring a request for a lawyer, and a severe
interrogation). We predicted that both the severity of the crime and the degree of
misconduct would influence conviction rates.
For the crime scenario, we used a fact pattern, modeled somewhat after our earlier
study. The problem, labeled “Evaluation of a Robbery Trial”, included the following
facts (with the added facts making the robbery into a related murder noted in brackets):
“Mr. Simson is on trial for bank robbery under 18 U.S.C. § 2113
[MURDER: and a related murder]. Mr. Simson has waived his right to a jury
trial. You are thus presiding in a bench trial. The following summarizes the
evidence presented at trial:
“In the early morning, an armed assailant wearing jeans, a white t-shirt, a
ski mask, and black gloves entered a small branch of First Federal Bank (a
federally insured bank), ordered everyone onto the ground, and demanded that a
teller put money in a plastic shopping bag. The teller complied, quickly emptying
$520 that she had just received from a customer into the bag. As she reached for
more money, the perpetrator ran off. The robbery was captured on a surveillance
camera videotape.
[MURDER: On his way out of the bank, the perpetrator stumbled into a young
woman pushing a stroller with her infant son. Startled, the perpetrator fired at the
woman twice, killing her instantly. The child was unharmed.]
“When police arrived, the teller reported that once outside the bank, the
perpetrator pulled off his ski mask, discarding both it and the gun as he climbed
quickly into a white Ford Taurus and sped off. Police retrieved the gun and mask;
neither had usable fingerprints. The gun was unloaded. The gun had been
reported stolen several years earlier by its original owner, who is now deceased.
“Several police officers then began a search of the neighborhood for a
white Ford Taurus. Two hours after the crime, they found one, parked 10 blocks
from the crime scene. The Department of Motor Vehicle records identified the
owner as the defendant. The police knocked on the door to his apartment. The
defendant matched the height, weight and race of the perpetrator in the
surveillance videotape, although he was wearing different clothing. The police
then insisted that the defendant accompany them to the station-house to answer
questions, which he did.
“Upon arrival, the police led him to a room, locked the door, read him his
Miranda rights, and began interrogating him. The defendant reported that he had
been home alone all morning. The police allowed the teller to listen in from the
next room. The teller reportedly said “that sounds like the guy.” The police then
placed the defendant under arrest. They obtained a search warrant and searched
his apartment. They found shopping bags similar to the one used by the
perpetrator of the crime, a pair of black gloves, and clothing matching that of the
perpetrator (white t-shirt and jeans) in the washing machine. The defendant also
had nearly six hundred dollars in cash in his apartment. The police did not find
firearms or ammunition of any kind.”
In the two versions in which there was no confession, the materials ended by
noting: “The police continued questioning the defendant, but he requested a lawyer and
the interrogation ended.” Then the materials ended by asking: “Based solely on the
evidence admitted at trial, would you convict the defendant?”
In the two versions involving mild police misconduct, the story continued instead
as follows:
“The police continued questioning the defendant. Even though the
defendant clearly requested a lawyer, twice, the police refused to call one and
continued the interrogation. Two hours later, the defendant confessed, and agreed
to write out a description of the crime. His written description matched the events
perfectly, including the fact that he discarded the ski mask and gun outside the
store (which the police had not told him). The entire interrogation was recorded
with both video and audio.”
In the two versions involving severe police misconduct, the story continued
instead to state that:
“The police continued to interrogate the defendant. During the entire
interrogation, they had denied the defendant access to the restroom, and he
ultimately soiled his clothing. One officer had threatened the defendant and
“pushed him around.” After nine hours of this treatment, the defendant
confessed and agreed to write out a description of the crime. His written
description matched the events perfectly, including the fact that he discarded the
ski mask and gun outside the store (which the police had not told him). The
entire interrogation was recorded with both video and audio.”
Finally, in the four versions in which the defendant confessed, the materials went
on to indicate that:
“The defendant’s attorney has moved to suppress the confession, arguing
that the interrogation violated the defendant’s rights under Miranda by continuing
after the defendant had requested an attorney. Would you grant the motion and
suppress the evidence?”
After obtaining the ruling, the materials then asked, “Based solely on the evidence
admitted at trial, would you convict the defendant?”
The design is described in the table below:
Table: Experimental Design
Crime      Police Misconduct
Robbery    None (no confession)    Ignored requests for lawyer    9-hour severe interrogation
Murder     None (no confession)    Ignored requests for lawyer    9-hour severe interrogation
We presented this scenario at multiple judicial education conferences in order to
obtain enough data to fill out all six conditions. Much of the data came from judges who
are described above. Notably the 81 Federal District Judges, 44 U.S. Magistrates, and
101 Florida State judges who participated in our extended study of the hindsight bias also
responded to this scenario. Additionally, 88 judges in attendance at a conference of
California State trial judges in May of 2006 (in Palm Springs) assessed this scenario for a
total of 314 judges.
In analyzing the results, we excluded the 7 judges who declined to suppress the
evidence.46 Excluding these judges, the Table below reports the conviction rates by
condition47:
Table: % Conviction by Condition
           No Confession    2-hour           9-hour           Total
Robbery    29.6 (13/44)     43.1 (22/51)     28.3 (15/53)     33.8 (50/148)
Murder     24.0 (12/50)     36.0 (18/50)     44.0 (22/50)     34.7 (52/150)
Total      26.6 (25/94)     39.6 (40/101)    35.9 (37/103)    34.2 (102/298)
As the table shows, the severity of the crime did not affect conviction rate when
the defendant did not offer a confession.48 Unlike our previous study, the confession
seemed to affect judgment, although as predicted, this effect depended both on the crime
and on the police misconduct. When the police engaged in a minor violation of the
defendant’s rights, the judges were influenced by the confession; even though they
suppressed the confession, they were more likely to convict the defendant.49 This
conviction rate did not vary with the type of crime.50 When the police misconduct was
severe, however, the conviction rate dropped slightly for the robbery (albeit not
significantly),51 but did not drop for the murder.52
46
These consisted of 1 District Judge in the murder/9-hour condition who convicted; 1 District Judge in
the robbery/9-hour condition who acquitted; 1 Magistrate in the robbery/2-hour condition who convicted; 1
Florida judge in the murder/2-hour condition who acquitted; 1 Florida judge in the murder/2-hour condition
who convicted; 1 Florida judge in the robbery/2-hour condition who convicted; 1 Palm Springs judge in the
robbery/2-hour condition.
47
Note: 9 judges gave no verdict: 6 Florida judges (1 robbery/no confession; 1 robbery/2-hour; 1
robbery/9-hour; 1 murder/2-hour; 2 murder/9-hour) and 3 Palm Springs judges (1 robbery/2-hour; 1
murder/2-hour; 1 murder/9-hour).
48
Fisher’s exact test, p = .642.
49
Collapsing across crime, the confession in the minor violation produced marginally significantly more
convictions. Fisher’s Exact Test, p = .07.
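The pairwise comparisons reported in the footnotes rely on Fisher's exact test, which can be computed directly from the counts in the conviction table. The Python sketch below is a minimal two-sided implementation using only the standard library. It assumes the common two-sided convention of summing the probabilities of all tables no more likely than the observed one; the paper does not state which convention or software was used, so results need not match the reported values to every digit. The function name is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)  # small tolerance for float ties
    )
    return min(1.0, p_value)

# No-confession conditions: 13 of 44 robbery judges and 12 of 50 murder judges convicted.
p_no_confession = fisher_exact_two_sided(13, 44 - 13, 12, 50 - 12)
print(f"robbery vs. murder, no confession: p = {p_no_confession:.3f}")
```

The same function can be applied to any other pair of cells in the table by passing the convicted and not-convicted counts for each of the two groups.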
Analyzing the results using logistic regression revealed that the crime overall had
no significant effect53; that the presence of a confession overall had a marginally
significant effect54; that the harsh interrogation produced a slightly lower conviction rate
than the mild police misconduct55; that the crime by confession (overall) interaction was
not significant56; but notably, the crime by harshness of the interrogation interaction was
significant.57
These results show that judges are not, in fact, able to ignore the confession.
Rather, they are making a more complex judgment. Just as in the case of the hindsight
bias, they are assessing the degree of police misconduct. The judges reacted to the
greater degree of police misconduct by reducing their willingness to convict the
defendant when the crime was less serious. When the crime was a horrific homicide,
however, judges were unwilling to punish the police misconduct with lower convictions.
Note that throughout, the judges followed the law exceptionally—the confessions were
all clearly illegally obtained and almost all of the judges suppressed them. But the judges
also took two extra-legal steps: they convicted the defendant more often when they had
learned of (and suppressed) the confession, and they punished severe police misconduct
not only by suppressing the confession (which the law obliges them to do), but also by
reducing their willingness to convict the defendant.
The object of judgment in the case of these materials was thus a combination of
the underlying facts, the additional confession, the severity of the crime and the degree of
police misconduct. Had the judges been truly ignoring the confession, only the facts
would have affected their judgment.
50 Fisher’s exact test, p = .54.
51 Fisher’s exact test, p = .15.
52 Fisher’s exact test, p = .54.
53 z = .61, p = .54.
54 z = 1.68, p = .09.
55 z = 1.98, p = .05.
56 z = .20, p = .85.
57 z = 2.03, p = .04.
V. Conclusion
The results of these studies paint an intricate portrait of how judges might be
navigating the risk of cognitive errors in the courtroom. Lawyers have opportunities to
exploit judicial vulnerability to untoward influence by manipulating the setting to alter
the object of judgment from a normatively appropriate one (such as credibility of
witnesses in an absolute sense) to a misleading one (relative credibility of witnesses).
Judges are not always so easily fooled by tricky litigation strategies, however, as they can
refocus their attention when the ultimate judgment must be made. In our studies of
probabilistic testimony and the misleading effects of conjunctive events, judges seemed
to be misled by the illusion in some respects, but ultimately made good judgments on the
most critical question (likelihood of guilt). Our study of the hindsight bias also shows
that judges seem to divert their attention away from potentially misleading judgments
(such as a probability assessment) onto a more stable set of judgments (such as what
constitutes appropriate police conduct). But even this kind of refocusing can lead judges
astray, as we show in our last study, in which judicial attention to police misconduct
leads to a somewhat lawless response.
Overall, we believe these kinds of studies begin to paint a more thorough portrait
of both judicial vulnerability and judicial resilience to errors in judgment. Judicial errors
are not random mistakes, but are products of the limits of human cognition, efforts by
lawyers to exploit those limits, and judicial efforts to cope with them.