6. Marilyn vos Savant Meets Rescorla-Wagner: A Crucial Experiment

1. Introduction

The previous chapters described variations of the Baker experiment that have never been performed. But a critical issue is the Rescorla-Wagner model itself. That model has dominated psychological theories of human and animal learning for many years, with vast influence. A variety of empirical objections have been made to it: for example, that it neglects the learning about a "cue" that happens when the subject observes the effect in the absence of the cue; that, to save the phenomena, it requires different parameter settings for similar experimental situations; or that it does not allow backwards blocking. I will add another, at least hypothetically. There is a very simple case in which the Rescorla-Wagner model predicts that a learner will converge toward associating a "cue" with an outcome when in fact the cue has no influence on the outcome and when, further, the data provided to the learner contain that very information, if the Causal Markov Assumption is made. No arguments over parameter settings are required, and no additional terms will reconcile the difference. It is not surprising that a test using this case has never been done, since it requires two things: a general analysis of the equilibria of the Rescorla-Wagner model, and an understanding of a fundamental feature of graphical causal models. With both in hand, we can describe an appropriate experiment. First, consider the fundamental feature.

2. Conditional Dependence and the Monty Hall Game

When two independently distributed variables, say X and Z, both influence a third variable, say Y, then conditional on some value of Y, X and Z are not independent. Judea Pearl gives the following illustration. Suppose the variables are the state of the battery in your car (charged/dead), the state of the fuel tank (not empty/empty), and whether your car starts (starts/does not).
Suppose you regard the states of the battery and of the fuel tank as independent: knowing the state of the battery gives no information about the state of the fuel tank, and vice versa. Now condition on a value of the effect, whether the car starts, by supposing you are given the information that your car does not start. Now the information that your battery is charged does provide information about the state of the fuel tank: the tank is probably empty.

The general principle is that in a causal graph, if edges from two causes that are independent (or conditionally independent given other variables) are both directed into a common third variable, so that the edges collide there, then the two causes are dependent conditional on their effect (and on whatever other variables had to be conditioned on to make the causes independent). The principle holds necessarily in all linear models, and in all other models satisfying the Causal Markov and Faithfulness Assumptions. This principle is elementary to auto mechanics, but there is wonderful anecdotal evidence that people do not recognize it in some contexts, and that many professional statisticians do not recognize it at all. The principle is violated in all regression algorithms, sometimes with strikingly unfortunate results. For example, the estimates of the effect of low-level lead exposure on the intelligence of children were obtained by step-wise regression, in a case that violates this principle, with the result that the published estimates of the malign effect are at least 50% too low (Scheines et al., 1998).

The anecdotal evidence comes from Marilyn vos Savant, who for many years ran one of the few intelligent newspaper features, a puzzle and advice column in a Sunday supplement. Vos Savant described, and gave the correct answer to, the Monty Hall problem. The Monty Hall game goes like this: Before the contestant arrives, the host, Monty Hall, places a thousand dollars behind one door and a (quiet) goat behind each of two other doors.
The contestant enters and is told that if he chooses the correct door he will win the money. The contestant then chooses. Monty Hall then opens one of the doors that the contestant did not choose and that does not contain the money. (If the contestant chose a door that does not contain the money, then Monty Hall has no choice as to which door to open; if the contestant chose the door that does contain the money, then Monty Hall opens one of the other two doors at random.) Now, after a door without the money has been opened, the contestant is given the option of changing his choice of doors. What should the contestant do: stick with his first choice, change, or does it not matter?

The answer is that the contestant should change doors, and by doing so increases his chance of winning from 1/3 to 2/3. The majority of people presented with the game think it does not matter whether the choice is changed or not, and when vos Savant published her correct answer, she received scores of denunciations from professors of statistics.

The Monty Hall problem is an instance of the collider phenomenon: the dependence of independent variables conditional on a common effect. Monty Hall's original choice of where to put the money, a variable with three values, and the contestant's original choice of which door has the money, another variable with three values, both influence which door Monty Hall opens, still another variable with three values. The contestant's original choice is independent of where Monty Hall put the money. But conditional on the information about which door Monty Hall opened, the contestant's original choice provides information about which door Monty Hall put the money behind.

The anecdotal evidence does not decide the question of whether people can and do make causal judgements in accord with conditional association. In the Monty Hall game, it is not obvious how to identify the variables, and contestants have no data from which to learn; an explicit analysis was required.
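The advantage of switching, and the collider structure behind it, can be checked by a short simulation. The sketch below is mine, not part of the original analysis; the door labels, function names, and trial count are arbitrary choices.

```python
import random

def play(switch, rng):
    """One round of the Monty Hall game; returns True if the contestant wins."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)   # host's placement of the money
    guess = rng.choice(doors)   # contestant's first choice, independent of prize
    # The host opens a door that is neither the guess nor the prize,
    # choosing at random when the guess happens to be correct.
    opened = rng.choice([d for d in doors if d != guess and d != prize])
    if switch:
        # Move to the one remaining unopened door.
        guess = next(d for d in doors if d != guess and d != opened)
    return guess == prize

rng = random.Random(0)
n = 100_000
stay_wins = sum(play(False, rng) for _ in range(n)) / n
switch_wins = sum(play(True, rng) for _ in range(n)) / n
print(f"stay: {stay_wins:.3f}, switch: {switch_wins:.3f}")
```

The stay strategy wins about a third of the time and the switch strategy about two thirds: once the opened door is conditioned on, the contestant's first choice carries information about where the money is.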
And statistical eyes have disciplinary blinkers.

3. Testing the Rescorla-Wagner Model

Consider the following structure (figure 1):

A -> B <- C -> E

A has no influence at all on E, and by the Causal Markov Assumption, A and E are independent. But A and E are dependent conditional on some value of B. Now suppose that C is not observed. A, B and E are observed, A and B precede E, and the following associations hold:

A and B are dependent
B and E are dependent
A and E are independent
A and E are dependent conditional on some value of B.

Assuming Faithfulness and the Causal Markov Assumption, these facts and the time order are inconsistent with any causal pathway from A to E. If one made inferences in accord with these principles, one would conclude that A has no influence on E. Indeed, in principle one could further infer that there is some unobserved factor influencing both B and E.

Thanks to David Danks's work, we know the form of the Rescorla-Wagner equilibria for this sort of setup, although the particular values of the equilibria of course depend on the particular probability distribution over cues and outcome. The Rescorla-Wagner equilibrium value of the association between A and E, that is, the value of VA in the pair <VA, VB> for which the expected change of VA, VB on any new data is zero, is just a conditional deltaP. That is, the equilibrium RW value of VA for an appropriate probability distribution p over A, B, E is

deltaP(EA; ~B) = p(E = 1 | A = 1, B = 0) - p(E = 1 | A = 0, B = 0).

For most probability distributions over A, B, E obtained by marginalizing out C in a distribution faithful to the graph in figure 1, deltaP(EA; ~B) is not zero. If subjects follow the RW model, they will ignore the fact that A and E have no association and will focus on the fact that A and E are associated conditional on the absence of B.
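The pattern of dependencies and independencies claimed for the structure A -> B <- C -> E can be verified by direct enumeration. The parameterization below is an assumed one of my own choosing, purely for illustration (it is not the one used in the experiment); any faithful parameterization yields the same qualitative pattern.

```python
# Illustrative (assumed) parameterization of the graph A -> B <- C -> E,
# where C is the unobserved common cause of B and E.
pA = {1: 0.5, 0: 0.5}
pC = {1: 0.5, 0: 0.5}
pB = lambda a, c: 0.1 + 0.5 * a + 0.3 * c  # p(B=1 | A=a, C=c)
pE = lambda c: 0.2 + 0.6 * c               # p(E=1 | C=c)

def p(a, b, e):
    """Marginal p(A=a, B=b, E=e), with the unobserved C summed out."""
    total = 0.0
    for c in (0, 1):
        pb = pB(a, c) if b == 1 else 1 - pB(a, c)
        pe = pE(c) if e == 1 else 1 - pE(c)
        total += pA[a] * pC[c] * pb * pe
    return total

def cond_e(a, b):
    """p(E=1 | A=a, B=b)."""
    return p(a, b, 1) / (p(a, b, 1) + p(a, b, 0))

marg_e = lambda a: sum(p(a, b, 1) for b in (0, 1)) / pA[a]   # p(E=1 | A=a)
marg_b = lambda a: sum(p(a, 1, e) for e in (0, 1)) / pA[a]   # p(B=1 | A=a)

print(marg_b(1), marg_b(0))        # unequal: A and B are dependent
print(marg_e(1), marg_e(0))        # equal: A and E are independent
print(cond_e(1, 1), cond_e(0, 1))  # unequal: A and E dependent given B = 1
```

A and B are dependent, B and E are dependent (both depend on C), A and E are independent, yet A and E become dependent once B is conditioned on; exactly the signature that, under the Causal Markov and Faithfulness Assumptions, rules out any causal pathway from A to E.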
If subjects make judgements in accord with the Causal Markov Assumption and Faithfulness, the fact that A and E are not associated will lead them to judge that there is no causal connection between those variables.

So what should the probabilities be? The same pattern of dependencies and independencies among A, B, E is obtained if we ignore C and use the graph

A -> B <- E

which is easier to parameterize, since the distribution factors simply as p(B | A, E) p(A) p(E), and is equivalent to setting p(U = E) = 1 for the original graph, where U is the unobserved common cause (C in figure 1). David Danks has provided the following parameterization:

Experimental setup #1: A -> B <- U -> E

p(U = E) = 1
p(A = 1) = .1
p(E = 1) = .45
p(B = 1 | A = 1) = .9
p(B = 1 | A = 0) = .2
p(B = 1 | A = 1, E = 1) = .8
p(B = 1 | A = 1, E = 0) = .98
p(B = 1 | A = 0, E = 1) = .02
p(B = 1 | A = 0, E = 0) = .35

The equilibrium value of VA is .35.

One could, as in the Baker experiment, give subjects a substantial number of trials according to this probability distribution and ask for their judgement of "efficacy" or some such thing. But such answers confound subjects' judgements of the strength of the dependence with their confidence in their judgement that there is any dependence at all, and, when subjects may reasonably think there is an indirect influence, leave unclear whether an estimate of direct influence or of total influence is wanted. A better design, it would seem, would also give subjects a second set of trials in which a distinct cue, C, is paired with the same cue B as before and the same effect E, but C actually causes E, and at equilibrium VC < VA. Then subjects may be given a forced choice between A and C to bring about E, with a reward if they succeed in producing E. Assuming they are near equilibrium at the end of each of the two sets of trials, the causal interpretation of Rescorla-Wagner requires that subjects prefer to try to bring about E by bringing about A.
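The two quantities the two kinds of learner attend to can be computed directly from Danks's setup #1 parameterization. This is a sketch of my own (the function names are mine, and the equilibrium value itself comes from Danks's analysis, which is not reproduced here); it shows that the unconditional contingency of E on A is exactly zero, while the contingency conditional on B's absence is clearly positive.

```python
# Danks's parameterization of experimental setup #1, in which A and E
# are marginally independent (p(U = E) = 1 makes E a coin flip w.r.t. A).
pA = 0.1
pE = 0.45
pB = {(1, 1): 0.8, (1, 0): 0.98, (0, 1): 0.02, (0, 0): 0.35}  # p(B=1 | A, E)

def joint(a, b, e):
    """p(A=a, B=b, E=e); A and E are independent by construction."""
    pa = pA if a == 1 else 1 - pA
    pe = pE if e == 1 else 1 - pE
    pb = pB[(a, e)] if b == 1 else 1 - pB[(a, e)]
    return pa * pe * pb

def p_e_given(a, b=None):
    """p(E=1 | A=a), or p(E=1 | A=a, B=b) when b is supplied."""
    bs = (0, 1) if b is None else (b,)
    num = sum(joint(a, bb, 1) for bb in bs)
    den = num + sum(joint(a, bb, 0) for bb in bs)
    return num / den

dP_uncond = p_e_given(1) - p_e_given(0)          # exactly 0: A, E independent
dP_cond = p_e_given(1, b=0) - p_e_given(0, b=0)  # positive: dependent given B = 0
print(dP_uncond, dP_cond)
```

The unconditional contingency is zero, so a Causal-Markov-and-Faithfulness reasoner judges A causally irrelevant to E; the conditional contingency is strongly positive, which is why Rescorla-Wagner assigns A a positive equilibrium strength.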
If they use the Causal Markov Assumption and Faithfulness, they should prefer to try to bring about E by bringing about C. The graph for the second set of trials is simply

B <- C -> E

and Danks provides the parameterization:

Experimental setup #2: B <- C -> E

p(C = 1) = .1
p(B = 1 | C = 1) = .9
p(B = 1 | C = 0) = .2
p(E = 1 | C = 1) = .7
p(E = 1 | C = 0) = .42

This leads to an equilibrium value of VC = .28. Note that these parameters satisfy p(A) = p(C), and p(E) is (very nearly) the same in each experiment. With these parameters, RW predicts that VA > VC, even though p(E = 1 | A = 1) = p(E = 1) = .45, while p(E = 1 | C = 1) = .7, more than one and a half times p(E = 1).

Gregory Cooper and his associates at the University of Pittsburgh tested whether medical residents made causal judgements in accordance with causal Bayes nets, giving them numbers for conditional probabilities rather than data, and including a case structurally like those considered here. The medical residents did not do very well with three variables, but because of a data artifact neither did a Bayes net learning algorithm. For a comparison with Rescorla-Wagner, it is essential to give subjects data rather than numerical probabilities. The principal difficulty in such an experiment is that a large number of trials would be required to approximate the relevant statistics.
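The forced-choice prediction can be checked numerically from the two parameterizations given above. The sketch below is mine (variable and function names are assumptions); it compares the conditional contingency that drives VC in setup #2 with the corresponding quantity for A in setup #1.

```python
# Setup #2: B <- C -> E, with B and E independent given C.
pC = 0.1
pB_C = {1: 0.9, 0: 0.2}    # p(B=1 | C)
pE_C = {1: 0.7, 0: 0.42}   # p(E=1 | C)

# Because E is independent of B given C, p(E=1 | C=c, B=0) = p(E=1 | C=c),
# so the contingency for C conditional on B's absence reduces to:
vC = pE_C[1] - pE_C[0]
print(vC)  # 0.28 (up to float rounding): the equilibrium value of VC

# Setup #1: A and E marginally independent; p(B=1 | A, E) from Danks.
pA, pE = 0.1, 0.45
pB_AE = {(1, 1): 0.8, (1, 0): 0.98, (0, 1): 0.02, (0, 0): 0.35}

def p_e1_given_a_b0(a):
    """p(E=1 | A=a, B=0) under setup #1 (valid because A and E are independent)."""
    num = pE * (1 - pB_AE[(a, 1)])
    den = num + (1 - pE) * (1 - pB_AE[(a, 0)])
    return num / den

vA = p_e1_given_a_b0(1) - p_e1_given_a_b0(0)  # the contingency driving VA

# RW predicts the stronger association for A, even though A is useless for
# producing E while C genuinely raises its probability.
assert vA > vC
```

Under the causal interpretation of Rescorla-Wagner, subjects near equilibrium should therefore choose A in the forced choice, while subjects reasoning with the Causal Markov Assumption and Faithfulness should choose C.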