Online Studies of Probability Learning Michael H. Birnbaum Abstract In the probability learning or probability guessing paradigm, the judge’s task is to predict, guess, or choose which of two or more events will occur on the next instance. For example, the person might be asked to predict whether the next card drawn randomly from a shuffled deck will be red or black. The classic result in this task is known as probability matching, a tendency by the participant to match the probability of choosing the responses to the probability that each response has been reinforced, or shown to be right. This chapter presents data from a Web-based version of this experiment, controlled by a JavaScript program, and compares them with data obtained from a sample of undergraduates tested in the lab. Web-based research allows the investigator to assess how well results obtained with the typical “subject pool” in the lab generalize across levels of age, education, gender, and other demographic characteristics that vary in the Web but not the lab sample. Keyword (#3-5): probability learning, choice, frequencies, rational decision making. 1. Introduction In some learning tasks, the subject is reinforced for every “correct” response. For example, one could test rats in a Y-shaped runway with two chambers. The rat might be reinforced with food for taking a turn to the left at the choice point. It is also possible to add some probabilistic “noise” to the environment by reinforcing the left turn on a random subset of 70% of the trials and reinforcing a turn to the right on 30% of the trials. Studies with probabilistic reinforcement have also been conducted with humans who are asked to predict which of several events will occur on the next trial. Some of these “studies” are conducted not by scientists, but by people in business who realize that money can be made by involving humans in probabilistic prediction. For example, will the next roll of the roulette wheel be an even number or odd? Will the next card drawn from a deck be red or black? Will the next roll of the dice be a seven? Will the home team win its next football match? People are willing to risk money for the opportunity to make such predictions (i.e., to bet), if their correct predictions are reinforced by the chance to win money. One of the interesting findings in this area of research is that people do not follow the optimal policy in such situations. Instead, they appear to match their choice probabilities to the probabilities of the events that they are predicting. Although this 2 Michael H. Birnbaum policy is not optimal, such behavior was consistent with a simple mathematical model of learning proposed by Estes (1950). Bower (1994) reviews this prediction (p. 295) and the literature on this topic. Let us consider an example to illustrate both what would be an optimal solution and contrast that with the typical finding of what people do in this situation. Suppose a person is betting on whether or not the next card drawn from a shuffled deck will be a heart or not a heart. A standard deck of cards has 25% hearts, 25% spades, 25% clubs, and 25% diamonds. If the person’s task is to be correct as much as possible, the best policy is to always guess that it will NOT be hearts. That guess should be made on every single trial. This policy will be correct 75% of the time. It will only be wrong 25% of the time when a heart is drawn from the deck. What people seem to do instead is to make different guesses on different trials, with 25% of their guesses being hearts, and the other 75% not hearts. Thus, they match the probability of guessing hearts to the probability that hearts has occurred. Now, if people had Extra Sensory Perception (ESP), perhaps they could do better than chance this way, but people do not have ESP. Instead, their predictions have been found to be independent of the actual cards drawn. By matching probabilities, one can correctly predict hearts on (.25)(.25) = .0625 of all trials and correctly predict not hearts on (.75)(.75) = .5625 of the trials. These two ways of being correct are mutually exclusive, so the probability of being correct is .0625 + .5625 = .625. Thus, by matching the probability of choosing hearts to the probability of hearts, the person will be correct on only 62.5% of the trials. By always guessing not hearts, the person could have been correct on 75% of the trials. 2. Questions and Hypotheses The question of why people probability match instead of guess the more probable alternative has been explored for many years. Bower (1994) presents a good summary of literature pertaining to Estes’ theory of learning in the special Centennial issue of Psychological Review. Variations of this classic probability learning paradigm form a rich area in which one can study learning, decision-making or choice, human understanding of frequencies and probabilities, beliefs in ESP, superstition, rationality, gambler’s fallacy, and configural hypothesis testing (Bower & Heit, 1992; Estes, 1976). In studies of learning with animals, it is very time consuming to collect a small amount of data. One has to care for the rat colony. The animals are usually tested only a few trials per day to ensure that they are uniformly motivated (have been deprived of food) during the testing. When these studies are conducted with humans, it is easier to collect data, but it could still be costly to test a large number of people. The cost meant that each study might test only a few different conditions. The purpose of this demonstration is to show that a Web version of this classic result can be reliably replicated in both the lab and on the Web. If the Web version reliably reproduces the classic result, then this Web version can be employed in Probability Learning 3 future investigations that study such variables as instructions, payoffs, practice, or other manipulations. 3. Method The best way to understand the experiment is to load the experiment in the browser and try a few “games” with it. "games" with it. Each time that the experiment is run, a different value of p, the probability that the R2 (right side) event is reinforced is randomly selected by the program that controls the experiment. 3.1 Subjects There were 71 undergraduates who served as one option to partially fulfill an assignment in lower division psychology classes. Each of these lab participants completed the task twice, using the same version of the experiment as used by the Web volunteers. These 71 participated with different values of p in each “game,” forming a between-Ss design for the first replicate, and a within-Ss design comparing first and second replicates. The Web volunteers were 856 people who were recruited by links in sites listing online psychology experiments and by passive listings with search engines. These participants were in a complete between-subjects design, in which each participant was randomly assigned to a different value of p. The CGI script recorded the remote IP address, to allow checking for multiple submissions of data from the same participant. Although some of the Web volunteers participated as many as four times, only the first set of data was analyzed for each Web participant. The referring web page was also detected and recorded by the CGI script, as a security check, to ensure that all of the data were generated from the same Web page. 3.2 Material The URL of the experiment is as follows: http://psych.fullerton.edu/mbirnbaum/web/ProbLearn.htm The experiment is entirely contained in one file. The JavaScript program, which runs on the participant’s browser is contained in the file associated with this chapter. There are comments in the JavaScript to help explain what each variable does. 4 Michael H. Birnbaum The HTML page contains two forms, expanel , which creates the experimental panel that is displayed in a table, and DataForm, which contains the variables that will be passed as data to the CGI script, which saves the data on the server. The JavaScript program assigns each person’s new game to a different level of probability and runs the random trials, keeping track of how many times the person selects the left or right button, and how often the person was right. The JavaScript program also passes the results of the experiment to the form, DataForm, which in turn, sends the data to the CGI script. The probability is selected uniformly by the statement, p = Math.random() which selects a value of p uniformly in the 0 to 1 interval. Pushing the left (R1) or right (R2) buttons on the experimental panel calls the functions check1() and check2(), respectively. These functions carry out the business of each trial. These functions update the trial number, n, by one on each trial; n is initially set to zero and increments by one for each button push by the participant. Each check function then decides randomly (by choosing another random number) if R1 or R2 occurred, and it displays this feedback in the experimental panel as well as feedback indicating whether participant’s guess was “correct” or “wrong.” These functions also keep track of how many times the person has clicked R1 (n1) or R2 (n2). After a time delay of 450 msec, the feedback is erased by the clear3() function, and the experimental panel is ready for the participant to make his or her next guess. When the number of trials reaches 100, the check functions call the function, done(), which computes the actual percentage of trials on which the feedback was R2. Because there are 100 trials, this number will take on 101 possible levels, from 0 to 100. This function displays an alert box with feedback that summarizes how the participant did. This function also causes the data to be passed to hidden variables in the DataForm. The variables passed are pr, the percentage of times that R2 was reinforced, p2, the percentage of times that the participant pushed the “R2” button, and pc, the percentage of times that the participant correctly predicted the feedback on the next trial. 3.3 Procedure Different procedures employed with lab and Web participants. In the lab procedure, the subjects were university undergraduates who came to the lab to participate. The experimenter met them, showed them how to scroll and click, gave them brief instructions, and then let them work at their own paces to complete the task. The undergraduates were instructed to complete two games, and they were told that each game is different. In the Web procedure, participants found their ways to the site, read the cursory instructions, and completed the task on their own. Some repeated the task several times, but most completed just one game. Probability Learning 5 3.4 Design The main independent variable is the probability that R2 (as opposed to R1) is the “correct” response on each trial. Because there were 100 trials, there were 101 levels of p varying from 0 to 100 out of 100 trials. The other variables used in the analysis are whether the people were tested with the lab procedure or Web procedure and demographics. The participants were asked to indicate their gender, age, education, and nationality before sending the data. The data are saved by the server as a comma separated values (CSV) file. CSV files are easy to import to spreadsheets and statistical packages such as Excel and SPSS. 4. Results Figure 1 shows the relationship between the percentage of R2 predictions and the percentage of R2 reinforcements. Each lab participant’s data are shown as a separate point. Filled circles show results for the first “game” and open circles show results for the second game. 6 Michael H. Birnbaum In Figure 1, probability matching predicts that the data points should fall on the identity line. Although there is considerable scatter, the data are approximated by the probability matching hypothesis. Figure 2 plots the percentage correct as a function of the percentage of R2 reinforcements. There are two predicted lines shown in the figure. The dashed curve, consisting of two straight line segments, is the prediction of the optimal strategy, which is to always respond with the more frequent button. This strategy sets an upper limit, assuming there is no ESP, because it should take a few trials to figure out which of the two responses is more frequent, especially for values close to 50%. The solid curve in Figure 2 is a quadratic function showing the predicted accuracy, assuming that the subject matches response probability to stimulus probability. It also assumes that the subject does not have ESP. This curve gives a better fit to the individual data. Probability Learning 7 The mean percentage correct on Replicate 1 was 68.7 and for Replicate 2 it was smaller, 67.3%. This decrease was not significant (t = -.46, p > .05). Figure 3 shows the results, plotted as in Figure 2, for the Web participants. Each point is a single game by a different person. The figure shows a very similar pattern for the Web participants to that of the lab participants. There are a few values far from the percentage matching result. These outliers appear to be cases in which the more R2 that were reinforced, the more the person would choose R1, as if following a rule sometimes called the “gambler’s fallacy.” In the gambler’s fallacy, a gambler who has been losing comes to believe that the coin, roulette wheel, cards, or dice remember and now “owe” the gambler more wins, to make the events average out. It is called a fallacy because the law of large numbers does not imply that results come out even at the end. The law merely says that in a large sample, the sample proportion will be closer to the true probability, which does not require the coin to have a memory. These data show very little support for ESP, or extrasensory precognition of the next event. Very few participants in either the lab or Web versions show performance above the line showing how they could do by always sticking to the more frequent alternative. 8 Michael H. Birnbaum 5. Further Questions and Discussion Because one can collect large quantities of data via the Web, and because the participants are so much more heterogeneous than lab subjects, it is possible to subdivide the sample and examine whether the results depend on such demographic characteristics as age or education. For example, among the 538 females who participated, there were 40 with doctoral degrees, and among the 299 males, there were 30 with doctorates. One can ask if those with doctorates were more likely to be correct. Indeed, the mean number correct for females with the doctorate was 70.7, and for males it was 69.9. These figures are higher than the corresponding means for those who have some graduate school (66.2% and 67.1%, respectively), which in turn are higher than the figures for those with only bachelor’s degrees. For the 110 females with bachelor’s degrees, the mean was 64.6% correct and for the 74 males, the corresponding figure was 66.8. Although the trends are small, it becomes possible with large, heterogeneous samples to be able to explore such small effects in the data. For example, the percentage correct for the 70 participants with doctoral Probability Learning 9 degrees is significantly higher than the percentage correct for the 629 people with 12 to 16 years of education, t(697) = 2.03, p < .05. One can make the same types of graphs as shown in Figures 1 and 2 for each subsample of the Web study to examine if there are demographic variables that correlate with the results. 6. Extensions This set-up allows one to investigate a number of interesting possibilities. Why do people do so badly in this paradigm? Perhaps they are trying too hard to find complex rules that will be correct more often than the simple rule of choosing the button that has most often been right. Would instructions to avoid such rules or to encourage such rules affect performance at the task? One can use the basic engine to run studies that might ask these questions. One could use an entry page to randomly assign participants to different conditions, using techniques such as described by Birnbaum (2000) to assign people to conditions. In one condition, people might be told the optimal strategy, and in another condition, they might be told to use their ESP. In another condition, they might be instructed that there is a pattern for them to discover that would enable them to be 100% correct. In still another condition, people might be instructed that the pattern is completely random and they should do the best they can to be as right as possible. Two or more of the above conditions might be explored to see which set of instructions produces the best or worst performance. As another between-Ss manipulation, it would be interesting to give no special instructions, but have a financial payoff for each correct prediction. Would this incentive cause people to adopt better strategies? A recent paper by Wolford, Miller, and Gazzaniga (2000) illustrates how one can use the basic paradigm to investigate issues in neuroscience. They used the probability learning (or “guessing”) paradigm with people who have had the corpus callosum sectioned to alleviate symptoms of epilepsy. These people are sometimes called “split brain” patients because the big body of direct connections between the left and right hemispheres of the cerebrum have been cut. In these patients, it is possible to isolate which half-brain receives the information by projecting information to the right or left visual field. It is also possible to isolate which hemisphere receives the stimuli or controls the response by putting the stimulus or response in the left or right hand. Wolford, et al. conclude that when people are put in this probability learning task, they attempt to form strategies. They believe that these strategies are controlled by the left hemisphere, because the left hemisphere performs more poorly at this task than the right. Perhaps the left hemisphere, which speaks, controls these hypotheses, which produce the probability matching behavior. Another possibility is that the left hemisphere is too verbal and less mathematical than the right hemisphere, so perhaps the key is to remove the usual dominance of the left hemisphere and allow the right hemisphere to solve the math. 10 Michael H. Birnbaum In the lab version of the present study, each person participated twice in the game, providing a within-Subjects test of the effect of p. In some situations, within and between-subjects experiments are known to yield different results (Birnbaum, 1999; 2001). Although the within-Ss and between-Ss comparisons were compatible in the lab version to the between-Ss research in the Web version (Figures 2 and 3), it would seem interesting to examine how performance in this task might be affected by repeated practice in the task. Imagine an experiment in which each participant plays 100 or 200 games with different values of p. Would people in this situation learn to be more optimal or would they continue to probability match? One could plot the probability correct as a function of experience to examine this effect. The probability learning paradigm has a rich history in psychology and is a good situation in which students can devise new experiments to extend the science of psychology. 7. Literature Birnbaum, M. H. (1999). How to show that 9 > 221: Collect judgments in a between-subjects design. Psychological Methods, 4(3), 243-249. This paper shows how one can reach rather silly conclusions in a between-subjects design. The study was done by the use of Web, but the point of the paper applies to either Web or lab research. Birnbaum, M. H. (2001). Introduction to Behavioral Research on the Internet. Upper Saddle River, NJ: Prentice Hall. This book covers the techniques needed to understand how to conduct simple experiments via the WWW. It is intended for use with undergraduates, graduate students, or professionals who are getting started in this field. It includes explanations of use of Excel and SPSS for data analysis. Included are presentations of HTML and JavaScript, including techniques for random assignment of people to conditions by different techniques. The techniques used in this study of passing variables between forms are explained as well. Bower, G., & Heit, E. (1992). Choosing between uncertain options: A reprise to the Estes scanning model. In A. Healy, S. Kosslyn, & R. M. Shiffrin (Eds.), From learning theory to connectionist theory (pp. 21-43). Hillsdale, NJ: Lawrence Erlbaum. This paper compares Estes’ scanning model with an expected utility model applied to experimental situations that generalize the probability learning paradigm. For example, the subjects might learn the relative strengths of horses, political candidates or choices in a gambling context. They can then be presented with test pairs that have not previously raced, competed, or been compared. In the test phase, it is of interest to predict the choice probabilities in these new test pairs as a function of the frequencies and strengths of reinforcements in the learning phase. Bower, G. (1994). A turning point in mathematical learning theory. Psychological Review, 101, 290-300. This article reviews research on mathematical models of learning that followed from the classic article of Estes, including the probability learning paradigm. It was included in the special Centennial issue of Psychological Review that Probability Learning 11 covered the most important topics covered in the journal’s first hundred years. This article also summarizes more modern developments in learning theory, including recent works by Estes, which use connectionist adaptive network models for the revision of behavior as the result of experience. Estes, W. K. (1950). Toward a statistical theory of learning. Psychological Review, 57, 94-107 (Reprinted in part in Psychological Review, 1994, 101, 282-289). This is article is a classic. It is regarded as one of the most important papers to appear in the first century of publication of the Psychological Review. Estes, W. K. (1976). The cognitive side of probability learning. Psychological Review, 83, 37-64. This article presents a cognitive theory of frequency learning to account for variants of the probability learning paradigm and other related experiments on predictive behavior, such as exhibited in multiple cue probability learning studies, and relative frequency judgments. Restle, F., & Greeno, J. G. (1970). Introduction to mathematical psychology. Reading, MA: Addison-Wesley. This undergraduate textbook contains several useful chapters on mathematical learning theory. It reviews evidence testing urn models of learning as replacement or learning as accumulation and whether learning is all-or-none or gradual. The process of mathematical formulation of clear theory and experiments to compare theories is well-explained in these chapters. Wolford, G., Miller, M. B., & Gazzaniga (2000). The left hemisphere’s role in hypothesis formation. Journal of Neuroscience, 20: RC64:1-4. This paper shows that patients with split brains show probability matching when the task is performed in the left hemisphere and give more optimal behavior in the right hemisphere. It can be read and viewed on line from the following URL: http://www.jneurosci.org/cgi/search?volume=&firstpage=&author1=wolford&a uthor2=&titleabstract=&fulltext=&fmonth=Jan&fyear=1981&tmonth=Nov&tyea r=2000&hits=10&sendit=Search&fdatedef=1+January+1981&tdatedef=1+No vember+2000 8. Acknowledgments This research was supported by National Science Foundation Grants, SBR9410572 and SES 99-86436. (SBR-9410572).