English preprint (MS Word document)

advertisement
Online Studies of Probability Learning
Michael H. Birnbaum
Abstract
In the probability learning or probability guessing paradigm, the judge’s task is to
predict, guess, or choose which of two or more events will occur on the next
instance. For example, the person might be asked to predict whether the next card
drawn randomly from a shuffled deck will be red or black. The classic result in this
task is known as probability matching, a tendency by the participant to match the
probability of choosing the responses to the probability that each response has been
reinforced, or shown to be right. This chapter presents data from a Web-based
version of this experiment, controlled by a JavaScript program, and compares them
with data obtained from a sample of undergraduates tested in the lab. Web-based
research allows the investigator to assess how well results obtained with the typical
“subject pool” in the lab generalize across levels of age, education, gender, and other
demographic characteristics that vary in the Web but not the lab sample.
Keyword (#3-5): probability learning, choice, frequencies, rational decision making.
1. Introduction
In some learning tasks, the subject is reinforced for every “correct” response. For
example, one could test rats in a Y-shaped runway with two chambers. The rat might
be reinforced with food for taking a turn to the left at the choice point. It is also
possible to add some probabilistic “noise” to the environment by reinforcing the left
turn on a random subset of 70% of the trials and reinforcing a turn to the right on
30% of the trials.
Studies with probabilistic reinforcement have also been conducted with humans
who are asked to predict which of several events will occur on the next trial. Some of
these “studies” are conducted not by scientists, but by people in business who realize
that money can be made by involving humans in probabilistic prediction. For
example, will the next roll of the roulette wheel be an even number or odd? Will the
next card drawn from a deck be red or black? Will the next roll of the dice be a
seven? Will the home team win its next football match? People are willing to risk
money for the opportunity to make such predictions (i.e., to bet), if their correct
predictions are reinforced by the chance to win money.
One of the interesting findings in this area of research is that people do not follow
the optimal policy in such situations. Instead, they appear to match their choice
probabilities to the probabilities of the events that they are predicting. Although this
2
Michael H. Birnbaum
policy is not optimal, such behavior was consistent with a simple mathematical
model of learning proposed by Estes (1950). Bower (1994) reviews this prediction
(p. 295) and the literature on this topic.
Let us consider an example to illustrate both what would be an optimal solution
and contrast that with the typical finding of what people do in this situation. Suppose
a person is betting on whether or not the next card drawn from a shuffled deck will
be a heart or not a heart. A standard deck of cards has 25% hearts, 25% spades,
25% clubs, and 25% diamonds. If the person’s task is to be correct as much as
possible, the best policy is to always guess that it will NOT be hearts. That guess
should be made on every single trial. This policy will be correct 75% of the time. It
will only be wrong 25% of the time when a heart is drawn from the deck.
What people seem to do instead is to make different guesses on different trials,
with 25% of their guesses being hearts, and the other 75% not hearts. Thus, they
match the probability of guessing hearts to the probability that hearts has occurred.
Now, if people had Extra Sensory Perception (ESP), perhaps they could do better
than chance this way, but people do not have ESP. Instead, their predictions have
been found to be independent of the actual cards drawn. By matching probabilities,
one can correctly predict hearts on (.25)(.25) = .0625 of all trials and correctly
predict not hearts on (.75)(.75) = .5625 of the trials. These two ways of being
correct are mutually exclusive, so the probability of being correct is .0625 + .5625 =
.625. Thus, by matching the probability of choosing hearts to the probability of
hearts, the person will be correct on only 62.5% of the trials. By always guessing not
hearts, the person could have been correct on 75% of the trials.
2. Questions and Hypotheses
The question of why people probability match instead of guess the more probable
alternative has been explored for many years. Bower (1994) presents a good
summary of literature pertaining to Estes’ theory of learning in the special Centennial
issue of Psychological Review. Variations of this classic probability learning
paradigm form a rich area in which one can study learning, decision-making or
choice, human understanding of frequencies and probabilities, beliefs in ESP,
superstition, rationality, gambler’s fallacy, and configural hypothesis testing (Bower
& Heit, 1992; Estes, 1976).
In studies of learning with animals, it is very time consuming to collect a small
amount of data. One has to care for the rat colony. The animals are usually tested
only a few trials per day to ensure that they are uniformly motivated (have been
deprived of food) during the testing. When these studies are conducted with humans,
it is easier to collect data, but it could still be costly to test a large number of people.
The cost meant that each study might test only a few different conditions.
The purpose of this demonstration is to show that a Web version of this classic
result can be reliably replicated in both the lab and on the Web. If the Web version
reliably reproduces the classic result, then this Web version can be employed in
Probability Learning
3
future investigations that study such variables as instructions, payoffs, practice, or
other manipulations.
3. Method
The best way to understand the experiment is to load the experiment in the browser
and try a few “games” with it. "games" with it. Each time that the experiment is run,
a different value of p, the probability that the R2 (right side) event is reinforced is
randomly selected by the program that controls the experiment.
3.1 Subjects
There were 71 undergraduates who served as one option to partially fulfill an
assignment in lower division psychology classes. Each of these lab participants
completed the task twice, using the same version of the experiment as used by the
Web volunteers. These 71 participated with different values of p in each “game,”
forming a between-Ss design for the first replicate, and a within-Ss design comparing
first and second replicates.
The Web volunteers were 856 people who were recruited by links in sites listing online psychology experiments and by passive listings with search engines. These
participants were in a complete between-subjects design, in which each participant
was randomly assigned to a different value of p.
The CGI script recorded the remote IP address, to allow checking for multiple
submissions of data from the same participant. Although some of the Web volunteers
participated as many as four times, only the first set of data was analyzed for each
Web participant. The referring web page was also detected and recorded by the CGI
script, as a security check, to ensure that all of the data were generated from the same
Web page.
3.2 Material
The URL of the experiment is as follows:
http://psych.fullerton.edu/mbirnbaum/web/ProbLearn.htm
The experiment is entirely contained in one file. The JavaScript program, which runs
on the participant’s browser is contained in the file associated with this chapter.
There are comments in the JavaScript to help explain what each variable does.
4
Michael H. Birnbaum
The HTML page contains two forms, expanel , which creates the
experimental panel that is displayed in a table, and DataForm, which contains the
variables that will be passed as data to the CGI script, which saves the data on the
server. The JavaScript program assigns each person’s new game to a different level
of probability and runs the random trials, keeping track of how many times the
person selects the left or right button, and how often the person was right. The
JavaScript program also passes the results of the experiment to the form,
DataForm, which in turn, sends the data to the CGI script.
The probability is selected uniformly by the statement,
p = Math.random()
which selects a value of p uniformly in the 0 to 1 interval.
Pushing the left (R1) or right (R2) buttons on the experimental panel calls the
functions check1() and check2(), respectively. These functions carry out the
business of each trial. These functions update the trial number, n, by one on each
trial; n is initially set to zero and increments by one for each button push by the
participant. Each check function then decides randomly (by choosing another
random number) if R1 or R2 occurred, and it displays this feedback in the
experimental panel as well as feedback indicating whether participant’s guess was
“correct” or “wrong.” These functions also keep track of how many times the person
has clicked R1 (n1) or R2 (n2). After a time delay of 450 msec, the feedback is
erased by the clear3() function, and the experimental panel is ready for the
participant to make his or her next guess.
When the number of trials reaches 100, the check functions call the function,
done(), which computes the actual percentage of trials on which the feedback was
R2. Because there are 100 trials, this number will take on 101 possible levels, from 0
to 100. This function displays an alert box with feedback that summarizes how the
participant did. This function also causes the data to be passed to hidden variables in
the DataForm. The variables passed are pr, the percentage of times that R2 was
reinforced, p2, the percentage of times that the participant pushed the “R2” button,
and pc, the percentage of times that the participant correctly predicted the feedback
on the next trial.
3.3 Procedure
Different procedures employed with lab and Web participants. In the lab
procedure, the subjects were university undergraduates who came to the lab to
participate. The experimenter met them, showed them how to scroll and click, gave
them brief instructions, and then let them work at their own paces to complete the
task. The undergraduates were instructed to complete two games, and they were told
that each game is different.
In the Web procedure, participants found their ways to the site, read the
cursory instructions, and completed the task on their own. Some repeated the task
several times, but most completed just one game.
Probability Learning
5
3.4 Design
The main independent variable is the probability that R2 (as opposed to R1) is the
“correct” response on each trial. Because there were 100 trials, there were 101 levels
of p varying from 0 to 100 out of 100 trials.
The other variables used in the analysis are whether the people were tested with the
lab procedure or Web procedure and demographics. The participants were asked to
indicate their gender, age, education, and nationality before sending the data. The
data are saved by the server as a comma separated values (CSV) file. CSV files are
easy to import to spreadsheets and statistical packages such as Excel and SPSS.
4. Results
Figure 1 shows the relationship between the percentage of R2 predictions and the
percentage of R2 reinforcements. Each lab participant’s data are shown as a separate
point. Filled circles show results for the first “game” and open circles show results
for the second game.
6
Michael H. Birnbaum
In Figure 1, probability matching predicts that the data points should fall on
the identity line. Although there is considerable scatter, the data are approximated by
the probability matching hypothesis.
Figure 2 plots the percentage correct as a function of the percentage of R2
reinforcements. There are two predicted lines shown in the figure. The dashed
curve, consisting of two straight line segments, is the prediction of the optimal
strategy, which is to always respond with the more frequent button. This strategy
sets an upper limit, assuming there is no ESP, because it should take a few trials to
figure out which of the two responses is more frequent, especially for values close to
50%.
The solid curve in Figure 2 is a quadratic function showing the predicted
accuracy, assuming that the subject matches response probability to stimulus
probability. It also assumes that the subject does not have ESP. This curve gives a
better fit to the individual data.
Probability Learning
7
The mean percentage correct on Replicate 1 was 68.7 and for Replicate 2 it was
smaller, 67.3%. This decrease was not significant (t = -.46, p > .05).
Figure 3 shows the results, plotted as in Figure 2, for the Web participants.
Each point is a single game by a different person. The figure shows a very similar
pattern for the Web participants to that of the lab participants. There are a few values
far from the percentage matching result. These outliers appear to be cases in which
the more R2 that were reinforced, the more the person would choose R1, as if
following a rule sometimes called the “gambler’s fallacy.”
In the gambler’s fallacy, a gambler who has been losing comes to believe that
the coin, roulette wheel, cards, or dice remember and now “owe” the gambler more
wins, to make the events average out. It is called a fallacy because the law of large
numbers does not imply that results come out even at the end. The law merely says
that in a large sample, the sample proportion will be closer to the true probability,
which does not require the coin to have a memory.
These data show very little support for ESP, or extrasensory precognition of
the next event. Very few participants in either the lab or Web versions show
performance above the line showing how they could do by always sticking to the
more frequent alternative.
8
Michael H. Birnbaum
5. Further Questions and Discussion
Because one can collect large quantities of data via the Web, and because the
participants are so much more heterogeneous than lab subjects, it is possible to
subdivide the sample and examine whether the results depend on such demographic
characteristics as age or education. For example, among the 538 females who
participated, there were 40 with doctoral degrees, and among the 299 males, there
were 30 with doctorates. One can ask if those with doctorates were more likely to be
correct. Indeed, the mean number correct for females with the doctorate was 70.7,
and for males it was 69.9. These figures are higher than the corresponding means for
those who have some graduate school (66.2% and 67.1%, respectively), which in
turn are higher than the figures for those with only bachelor’s degrees. For the 110
females with bachelor’s degrees, the mean was 64.6% correct and for the 74 males,
the corresponding figure was 66.8. Although the trends are small, it becomes
possible with large, heterogeneous samples to be able to explore such small effects in
the data. For example, the percentage correct for the 70 participants with doctoral
Probability Learning
9
degrees is significantly higher than the percentage correct for the 629 people with 12
to 16 years of education, t(697) = 2.03, p < .05.
One can make the same types of graphs as shown in Figures 1 and 2 for each
subsample of the Web study to examine if there are demographic variables that
correlate with the results.
6. Extensions
This set-up allows one to investigate a number of interesting possibilities. Why do
people do so badly in this paradigm? Perhaps they are trying too hard to find
complex rules that will be correct more often than the simple rule of choosing the
button that has most often been right. Would instructions to avoid such rules or to
encourage such rules affect performance at the task?
One can use the basic engine to run studies that might ask these questions. One
could use an entry page to randomly assign participants to different conditions, using
techniques such as described by Birnbaum (2000) to assign people to conditions. In
one condition, people might be told the optimal strategy, and in another condition,
they might be told to use their ESP. In another condition, they might be instructed
that there is a pattern for them to discover that would enable them to be 100%
correct. In still another condition, people might be instructed that the pattern is
completely random and they should do the best they can to be as right as possible.
Two or more of the above conditions might be explored to see which set of
instructions produces the best or worst performance. As another between-Ss
manipulation, it would be interesting to give no special instructions, but have a
financial payoff for each correct prediction. Would this incentive cause people to
adopt better strategies?
A recent paper by Wolford, Miller, and Gazzaniga (2000) illustrates how one
can use the basic paradigm to investigate issues in neuroscience. They used the
probability learning (or “guessing”) paradigm with people who have had the corpus
callosum sectioned to alleviate symptoms of epilepsy. These people are sometimes
called “split brain” patients because the big body of direct connections between the
left and right hemispheres of the cerebrum have been cut. In these patients, it is
possible to isolate which half-brain receives the information by projecting
information to the right or left visual field. It is also possible to isolate which
hemisphere receives the stimuli or controls the response by putting the stimulus or
response in the left or right hand. Wolford, et al. conclude that when people are put
in this probability learning task, they attempt to form strategies. They believe that
these strategies are controlled by the left hemisphere, because the left hemisphere
performs more poorly at this task than the right. Perhaps the left hemisphere, which
speaks, controls these hypotheses, which produce the probability matching behavior.
Another possibility is that the left hemisphere is too verbal and less mathematical
than the right hemisphere, so perhaps the key is to remove the usual dominance of
the left hemisphere and allow the right hemisphere to solve the math.
10
Michael H. Birnbaum
In the lab version of the present study, each person participated twice in the
game, providing a within-Subjects test of the effect of p. In some situations, within
and between-subjects experiments are known to yield different results (Birnbaum,
1999; 2001). Although the within-Ss and between-Ss comparisons were compatible
in the lab version to the between-Ss research in the Web version (Figures 2 and 3), it
would seem interesting to examine how performance in this task might be affected
by repeated practice in the task. Imagine an experiment in which each participant
plays 100 or 200 games with different values of p. Would people in this situation
learn to be more optimal or would they continue to probability match? One could
plot the probability correct as a function of experience to examine this effect.
The probability learning paradigm has a rich history in psychology and is a
good situation in which students can devise new experiments to extend the science of
psychology.
7. Literature
Birnbaum, M. H. (1999). How to show that 9 > 221: Collect judgments in a
between-subjects design. Psychological Methods, 4(3), 243-249. This
paper shows how one can reach rather silly conclusions in a between-subjects
design. The study was done by the use of Web, but the point of the paper
applies to either Web or lab research.
Birnbaum, M. H. (2001). Introduction to Behavioral Research on the Internet. Upper
Saddle River, NJ: Prentice Hall. This book covers the techniques needed to
understand how to conduct simple experiments via the WWW. It is intended for
use with undergraduates, graduate students, or professionals who are getting
started in this field. It includes explanations of use of Excel and SPSS for data
analysis. Included are presentations of HTML and JavaScript, including
techniques for random assignment of people to conditions by different
techniques. The techniques used in this study of passing variables between
forms are explained as well.
Bower, G., & Heit, E. (1992). Choosing between uncertain options: A reprise
to the Estes scanning model. In A. Healy, S. Kosslyn, & R. M. Shiffrin
(Eds.), From learning theory to connectionist theory (pp. 21-43). Hillsdale,
NJ: Lawrence Erlbaum. This paper compares Estes’ scanning model with an
expected utility model applied to experimental situations that generalize the
probability learning paradigm. For example, the subjects might learn the relative
strengths of horses, political candidates or choices in a gambling context. They
can then be presented with test pairs that have not previously raced, competed,
or been compared. In the test phase, it is of interest to predict the choice
probabilities in these new test pairs as a function of the frequencies and strengths
of reinforcements in the learning phase.
Bower, G. (1994). A turning point in mathematical learning theory. Psychological
Review, 101, 290-300. This article reviews research on mathematical models of
learning that followed from the classic article of Estes, including the probability learning
paradigm. It was included in the special Centennial issue of Psychological Review that
Probability Learning
11
covered the most important topics covered in the journal’s first hundred years. This
article also summarizes more modern developments in learning theory, including recent
works by Estes, which use connectionist adaptive network models for the revision of
behavior as the result of experience.
Estes, W. K. (1950). Toward a statistical theory of learning. Psychological Review,
57, 94-107 (Reprinted in part in Psychological Review, 1994, 101, 282-289). This
is article is a classic. It is regarded as one of the most important papers to appear in the
first century of publication of the Psychological Review.
Estes, W. K. (1976). The cognitive side of probability learning. Psychological
Review, 83, 37-64. This article presents a cognitive theory of frequency learning to
account for variants of the probability learning paradigm and other related experiments on
predictive behavior, such as exhibited in multiple cue probability learning studies, and
relative frequency judgments.
Restle, F., & Greeno, J. G. (1970). Introduction to mathematical psychology.
Reading, MA: Addison-Wesley. This undergraduate textbook contains several useful
chapters on mathematical learning theory. It reviews evidence testing urn models of
learning as replacement or learning as accumulation and whether learning is all-or-none
or gradual. The process of mathematical formulation of clear theory and experiments to
compare theories is well-explained in these chapters.
Wolford, G., Miller, M. B., & Gazzaniga (2000). The left hemisphere’s role in
hypothesis formation. Journal of Neuroscience, 20: RC64:1-4. This paper
shows that patients with split brains show probability matching when the task is
performed in the left hemisphere and give more optimal behavior in the right
hemisphere. It can be read and viewed on line from the following URL:
http://www.jneurosci.org/cgi/search?volume=&firstpage=&author1=wolford&a
uthor2=&titleabstract=&fulltext=&fmonth=Jan&fyear=1981&tmonth=Nov&tyea
r=2000&hits=10&sendit=Search&fdatedef=1+January+1981&tdatedef=1+No
vember+2000
8. Acknowledgments
This research was supported by National Science Foundation Grants, SBR9410572 and SES 99-86436.
(SBR-9410572).
Download