
Marilyn Vos Savant Meets Rescorla-Wagner: A Crucial Experiment
1. Introduction
The previous chapters described variations of the Baker experiment that have never been
performed. But a critical issue remains: the Rescorla-Wagner model. That model has
dominated psychological theories of human and animal learning for many years, with
vast influence. A variety of empirical objections have been made to it--for example, that
it neglects learning about a “cue” that happens when the subject observes the effect in the
absence of the cue, that to save the phenomena it requires different parameter settings for
similar experimental situations, or that it does not allow backwards blocking. I will add
another, at least hypothetically.
There is a very simple case in which the Rescorla-Wagner model predicts that a learner
will converge toward associating a “cue” with an outcome when in fact the cue has no
influence on the outcome and when, further, the data provided to the learner contain that
very information—if the Causal Markov Assumption is made. No arguments over
parameter settings are required, and no additional terms will reconcile the difference. It is
not surprising that a test using that case has never been done, since it requires two things:
a general analysis of the equilibria of the Rescorla-Wagner model, and an understanding
of a fundamental feature of graphical causal models. With both things in hand, we can
describe an appropriate experiment. First, consider the fundamental feature.
2. Conditional Dependence and the Monty Hall Game
When two independently distributed variables, say X and Z, both influence a third
variable, say Y, then conditional on some value of Y, X and Z are not independent. Judea
Pearl gives the following illustration. Suppose the variables are the state of the battery in
your car (charged/dead), the state of the fuel tank (not empty / empty) and whether your
car starts (starts / does not). Suppose you regard the states of the battery and of the fuel
tank as independent: knowing the state of the battery gives no information about the state
of the fuel tank, and vice-versa. Now condition on a value of the effect, whether the car
starts, by supposing you are given the information that your car does not start. Now the
information that your battery is charged does provide information about the state of the
fuel tank.
The general principle is that in a causal graph, if edges from two variables that are
independent (or independent conditional on other variables) are both into--collide at--a
third variable, then the two causes are dependent conditional on their effect (and on
whatever other variables had to be conditioned on to make the causes independent). The
principle is
necessary in all linear models, and in all other models satisfying the Causal Markov and
Faithfulness Assumptions.
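The collider effect in Pearl's car example can be checked with a short simulation. The numerical probabilities below are my own illustrative choices, not taken from the text; the structure (battery and fuel independent, starting determined by both) is what matters.

```python
import random

random.seed(0)

# Illustrative probabilities (my own, not Pearl's): the battery is charged
# with probability .9, the tank is non-empty with probability .95, and the
# car starts exactly when both hold.
def sample():
    battery = random.random() < 0.90   # True = charged
    fuel = random.random() < 0.95      # True = not empty
    return battery, fuel, battery and fuel

trials = [sample() for _ in range(200_000)]

# Unconditionally, battery and fuel are independent by construction.
p_fuel = sum(f for _, f, _ in trials) / len(trials)
charged = [f for b, f, _ in trials if b]
p_fuel_given_charged = sum(charged) / len(charged)

# Conditioning on the collider (car fails to start) induces dependence:
# a charged battery now guarantees an empty tank.
failed = [(b, f) for b, f, s in trials if not s]
p_fuel_failed = sum(f for _, f in failed) / len(failed)
failed_charged = [f for b, f in failed if b]
p_fuel_failed_charged = sum(failed_charged) / len(failed_charged)

print(round(p_fuel, 2), round(p_fuel_given_charged, 2))          # both ~0.95
print(round(p_fuel_failed, 2), round(p_fuel_failed_charged, 2))  # differ sharply
```

Learning the state of the battery tells you nothing about the tank until you also learn the car did not start; then it tells you everything.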
This principle is elementary to auto mechanics, but there is wonderful anecdotal evidence
that people do not recognize it in some contexts, and that many professional statisticians
do not recognize it at all. The principle is violated in all regression algorithms, sometimes
with strikingly unfortunate results. For example, estimates of the effect of low-level lead
exposure on the intelligence of children were obtained by step-wise regression, in a case
that violates this principle, with the result that the published estimates of the malign
effect are at least 50% too low (Scheines et al., 1998). The anecdotal evidence comes
from Marilyn
vos Savant, who for many years ran one of the few intelligent newspaper features, a
puzzle and advice column in a Sunday supplement. Vos Savant described, and gave the
correct answer to, the Monty Hall Problem.
The Monty Hall game goes like this: before the contestant arrives, the host, Monty Hall,
places a thousand dollars behind one door and a (quiet) goat behind each of two other
doors. The contestant enters and is told that if he chooses the correct door he will win the
money. The contestant then chooses. Monty Hall then opens one of the doors that the
contestant did not choose and that does not contain the money. (If the contestant chose a
door that does not contain the money, then Monty Hall has no choice as to which door to
open; if the contestant chose the door that does contain the money, then Monty Hall
opens one of the other doors at random.) Now, after a door without the money has been
opened, the contestant is given the option of changing his choice of doors. What should
the contestant do: stick with his first choice, change, or does it not matter? The answer is
that the contestant should change doors: doing so increases the chance of winning
from 1/3 to 2/3. The majority of people presented with the game think it does not matter
whether the choice is changed or not, and when vos Savant published her correct answer,
she received scores of denunciations from professors of statistics.
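vos Savant's answer is easy to verify by Monte Carlo simulation of the game as just described; the sketch below only tracks which door hides the money.

```python
import random

random.seed(1)

def play(switch: bool) -> bool:
    """One round of the Monty Hall game; returns True if the contestant wins."""
    doors = [0, 1, 2]
    prize = random.choice(doors)     # where Monty put the money
    pick = random.choice(doors)      # contestant's original choice
    # Monty opens a door that is neither the prize nor the contestant's pick
    opened = random.choice([d for d in doors if d != prize and d != pick])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

n = 100_000
p_stick = sum(play(False) for _ in range(n)) / n
p_switch = sum(play(True) for _ in range(n)) / n
print(round(p_stick, 2), round(p_switch, 2))  # ~0.33 and ~0.67
```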
The Monty Hall problem is an instance of the collider phenomenon--of independent
variables becoming dependent conditional on a common effect. Monty Hall's original
choice of where to put the money--a variable with three values--and the contestant's
original choice of which door has the money--another variable with three values--
influence which door Monty Hall opens--still another variable with three values. The
contestant's original choice is independent of where Monty Hall put the money. But
conditional on the information about which door Monty Hall opened, the contestant's
original choice provides information about which door Monty Hall put the money behind.
The anecdotal evidence does not decide the question of whether people can and do make
causal judgements in accord with conditional association. In the Monty Hall game, it is
not obvious how to identify the variables, and contestants have no data from which to
learn--so an explicit analysis was required. And statistical eyes have disciplinary
blinders.
3. Testing the Rescorla-Wagner Model
Consider the following structure (figure 1): A -> B <- C -> E.
A has no influence at all on E, and by the Causal Markov Assumption, A and E are
independent. But A and E are dependent conditional on some value of B. Now suppose
that C is not observed. A, B and E are observed, A and B precede E, and the following
associations hold:
A and B are dependent
B and E are dependent
A and E are independent
A and E are dependent conditional on some value of B.
Assuming Faithfulness and the Causal Markov Assumption, these facts and the time
order are inconsistent with any causal pathway from A to E. If one made inferences in
accord with these principles, one would conclude that A has no influence on E. Indeed, in
principle one could further infer that there is some unobserved factor influencing both B
and E.
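The four associations listed above can be verified by exact enumeration of a distribution generated from the graph A -> B <- C -> E. The parameter values below are illustrative choices of mine, not the paper's; any faithful parameterization shows the same pattern.

```python
from itertools import product

# Illustrative parameters for A -> B <- C -> E (my own choices): A and the
# hidden common cause C are independent, B depends on both, E only on C.
p_a, p_c = 0.5, 0.5
p_b = lambda a, c: 0.1 + 0.4 * a + 0.4 * c   # p(B = 1 | A, C)
p_e = lambda c: 0.2 + 0.6 * c                # p(E = 1 | C)

# Exact joint distribution over (A, B, C, E) by enumeration.
joint = {}
for a, b, c, e in product((0, 1), repeat=4):
    p = (p_a if a else 1 - p_a) * (p_c if c else 1 - p_c)
    p *= p_b(a, c) if b else 1 - p_b(a, c)
    p *= p_e(c) if e else 1 - p_e(c)
    joint[a, b, c, e] = p

def marg(**fixed):
    """Marginal probability that the named variables take the given values."""
    return sum(p for (a, b, c, e), p in joint.items()
               if all(dict(a=a, b=b, c=c, e=e)[k] == v
                      for k, v in fixed.items()))

# A and E are marginally independent...
p_e1 = marg(e=1)
p_e1_a1 = marg(a=1, e=1) / marg(a=1)
# ...but become dependent once we condition on the collider B:
p_e_a1_b0 = marg(a=1, b=0, e=1) / marg(a=1, b=0)
p_e_a0_b0 = marg(a=0, b=0, e=1) / marg(a=0, b=0)
print(round(p_e1, 6), round(p_e1_a1, 6))          # equal
print(round(p_e_a1_b0, 3), round(p_e_a0_b0, 3))   # unequal
```

With C marginalized out, a learner who attends only to the A–E association concludes (correctly) that A does not influence E; one who attends to the association conditional on B does not.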
Thanks to David Danks's work, we know the form of the Rescorla-Wagner equilibria for
this sort of setup, although the particular values of the equilibria of course depend on the
particular probability distribution over cues and outcome. The Rescorla-Wagner
equilibrium value of the association between A and E--that is, the value of VA in the pair
<VA, VB> for which the expected change of VA, VB on any new data is zero--is just the
conditional ΔP. That is, the equilibrium RW value of VA for an appropriate probability
distribution p over A, B, E is

ΔP(E, A; ~B) = p(E = 1 | A = 1, B = 0) – p(E = 1 | A = 0, B = 0).

For most probability distributions over A, B, E obtained by marginalizing out C in a
distribution faithful to the graph in figure 1, ΔP(E, A; ~B) is not zero.
If subjects follow the RW model, they will ignore the fact that A and E have no
association and will focus on the fact that A and E are associated conditional on the
absence of B. If subjects make judgements in accord with the Causal Markov Assumption
and Faithfulness, the fact that A and E are not associated will lead them to judge there is
no causal connection between those variables.
So what should the probabilities be? The same pattern of dependencies and
independencies among A, B, E is obtained if we ignore C and use the graph A -> B <- E,
which is easier to parameterize since the distribution factors simply as p(B | A, E) p(A)
p(E), and which is equivalent to setting p(U = E) = 1 for the original graph.
David Danks has provided the following parameterization:
Experimental setup #1: A -> B <- U -> E
Let p(U=E) = 1
p(A = 1) = .1
p(E = 1) = .45
p(B = 1 | A = 1) = .9
p(B = 1 | A = 0) = .2
p(B = 1 | A = 1, E = 1) = .8
p(B = 1 | A = 1, E = 0) = .98
p(B = 1 | A = 0, E = 1) = .02
p(B = 1 | A = 0, E = 0) = .35
The equilibrium value of VA is .35.
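As a check, the conditional ΔP can be computed exactly from this parameterization: since p(U = E) = 1 and A has no path to E, A and E are independent and the joint factors as p(A) p(E) p(B | A, E). The exact value comes to about .34, in line with the .35 reported.

```python
# Setup #1 check, using the conditional probabilities given in the text.
p_a, p_e = 0.1, 0.45
p_b = {(1, 1): 0.8, (1, 0): 0.98, (0, 1): 0.02, (0, 0): 0.35}  # p(B=1 | A, E)

def joint(a, b, e):
    """Joint p(A=a, B=b, E=e), with A and E independent."""
    p = (p_a if a else 1 - p_a) * (p_e if e else 1 - p_e)
    return p * (p_b[a, e] if b else 1 - p_b[a, e])

def p_e1_given(a, b):
    return joint(a, b, 1) / (joint(a, b, 0) + joint(a, b, 1))

# Conditional dP for cue A with B absent -- the RW equilibrium value of VA.
v_a = p_e1_given(1, 0) - p_e1_given(0, 0)
print(round(v_a, 3))  # 0.339
```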
One could, as in the Baker experiment, give subjects a substantial number of trials
according to this probability distribution and ask for their judgement of “efficacy” or
some such thing. But such answers confound subjects’ judgements of the strength of the
dependence with their confidence in their judgement that there is any dependence, and,
when subjects may reasonably think there may be an indirect influence, confuse whether
an estimate of direct influence or of total influence is wanted. A better design, it would
seem, would be to also give subjects a second set of trials in which a distinct cue, C, is
paired with the same cue B as before and the same effect E, but C actually causes E, and
at equilibrium VC < VA. Then subjects may be given a forced choice between A and C to
bring about E, with a reward if they succeed in producing E. Assuming they are near
equilibrium at the end of each of the two sets of trials, the causal interpretation of
Rescorla-Wagner requires that subjects prefer to try to bring about E by bringing about A.
If they use the Causal Markov Assumption and Faithfulness, they should prefer to try to
bring about E by bringing about C.
The graph for the second set of trials is simply B <- C -> E,
and Danks provides the parameterization:
Experimental setup #2: B <- C -> E
p(C = 1) = .1
p(B = 1 | C = 1) = .9
p(B = 1 | C = 0) = .2
p(E = 1 | C = 1) = .7
p(E = 1 | C = 0) = .42
This leads to an equilibrium value of VC = .28. Note that these parameters satisfy
p(A) = p(C), and p(E) is the same in each experiment. With these parameters, RW
predicts that VA > VC, even though p(E = 1 | A = 1) = p(E = 1) = .45, while
p(E = 1 | C = 1) = .7, more than 1.5 times p(E = 1).
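The same style of exact check works for setup #2: since B and E are independent given C, conditioning on B = 0 leaves p(E | C) unchanged, and the conditional ΔP for C falls straight out of the text's parameters, along with the matching marginal p(E).

```python
# Setup #2 check: B <- C -> E, parameters from the text.
p_c = 0.1
p_b = {1: 0.9, 0: 0.2}   # p(B = 1 | C)
p_e = {1: 0.7, 0: 0.42}  # p(E = 1 | C)

def joint(c, b, e):
    """Joint p(C=c, B=b, E=e), with B and E independent given C."""
    p = p_c if c else 1 - p_c
    p *= p_b[c] if b else 1 - p_b[c]
    return p * (p_e[c] if e else 1 - p_e[c])

def p_e1_given_b0(c):
    return joint(c, 0, 1) / (joint(c, 0, 0) + joint(c, 0, 1))

# Conditional dP for cue C with B absent -- the RW equilibrium value of VC.
v_c = p_e1_given_b0(1) - p_e1_given_b0(0)
p_e_marginal = sum(joint(c, b, 1) for c in (0, 1) for b in (0, 1))
print(round(v_c, 2), round(p_e_marginal, 3))  # 0.28 and 0.448 (~.45)
```

So at equilibrium VC = .28 < VA ≈ .35, even though C raises the probability of E and A does not.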
Gregory Cooper and his associates at the University of Pittsburgh tested whether medical
residents made causal judgements in accordance with causal Bayes nets, giving them
numbers for conditional probabilities rather than data, and including a case structurally
like those considered here. The medical residents did not do very well with three
variables, but because of a data artifact neither did a Bayes net learning algorithm. For a
comparison with Rescorla-Wagner, it is essential to give subjects data rather than
numerical probabilities. The principal difficulty in such an experiment is that a large
number of trials would be required to approximate the relevant statistics.