lecture 14

Chapter 7
The Associative Structure of Instrumental Conditioning
The way we talk about Pavlovian conditioning is very ‘cognitive’
• we say that animals form mental representations
of the relationships among stimuli
• the animal has a representation of the CS that gets
associated with some representation of the US
• when we present a CS, it calls up a representation
of the US
• Instrumental/operant conditioning is now viewed
in the same way
• subjects/animals are information processors, not
only with respect to stimuli (Pavlovian) but also with
respect to their own behavior (operant)
3 main components in an operant learning situation:
1. Stimulus – S (or sometimes Sd)
• the discriminative stimulus sets the occasion
for reward by signaling when the response will be
followed by the reinforcer
2. Response – R
3. Outcome – O
Associations develop among each of these elements:
S-R association
• the discriminative stimulus can become directly
associated with the response
S-O association
• the discriminative stimulus can become associated
with the outcome (basically a Pavlovian association)
R-O association
• the response becomes associated with the outcome
In recent years, the notion that animals develop mental
representations of behavior is probably best shown in the
work of Rescorla
Rescorla and colleagues have done a number of experiments
demonstrating that animals develop R-O associations
The typical way to demonstrate an R-O association is to
train rats to make a response for a particular outcome
and then devalue that outcome; if the response is associated
with the outcome, this should lead to a decrease in responding
R-O association
Colwill & Rescorla (1985)

Training                               Devaluation      Test
R1 (LP, lever press) → O1 (sucrose)    O1 → LiCl        R1 and R2
R2 (CP, chain pull)  → O2 (food)       O2 → nothing
(Same rats get both)

Everything is counterbalanced, but for the
sake of simplicity say LP earns sucrose,
CP earns food, and the sucrose is devalued
R-O association
• animals should have developed 2 different R-O
associations
• during the test, both responses were available for 20 min
• they could LP or CP but no outcome was given, i.e.,
essentially an extinction test
• if R1 evokes memory of the devalued (aversive) outcome,
but R2 does not, then should see a decrease in R1
R-O association
Results:
[Figure: mean resp/min (y-axis) over time (x-axis) during the test, plotted separately for R2 (outcome not devalued) and R1 (outcome was devalued)]
R-O association
Results:
• when the outcome (reinforcer) was devalued by pairings
with LiCl, the response that produced the reinforcer
declined
• the reason is that subjects remembered the reinforcer
as being aversive and therefore devalued the response
that was associated with that outcome
• so, memory for, or representation of, the goal object is
crucial for the execution of the response
S-O association
Like Pavlovian CSs, Sd (discriminative stimuli) also
become associated with outcomes
Colwill & Rescorla (1988)

Sd training                       Response training    Test
S1 (N): R1 (LP) → O1 (sucrose)    R3 → O1              S1: R3 vs R4
S2 (L): R2 (CP) → O2 (food)       R4 → O2              S2: R3 vs R4
                                  (2 new responses)
(All rats get both)
S-O association
If the rat has an S-O association (i.e., knows which outcome
went with which Sd), then when given S1 on test, it should
perform the response that was associated with the same
outcome
i.e., when given S1, should perform R3
when given S2, should perform R4
This is essentially what happened
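The prediction can be traced step by step. The sketch below is only an illustration of the logic, assuming the simplified assignments from the design above (S1 with O1, S2 with O2, R3 with O1, R4 with O2). The dictionary names and the function `predicted_response` are hypothetical and are not taken from the study.

```python
# Toy illustration of the S-O transfer prediction (all names are hypothetical).

# Sd training: each discriminative stimulus signalled one outcome.
stimulus_outcome = {"S1": "O1", "S2": "O2"}

# Response training: each new response earned one outcome.
response_outcome = {"R3": "O1", "R4": "O2"}


def predicted_response(stimulus):
    """Return the new response that shares its outcome with the given Sd."""
    outcome = stimulus_outcome[stimulus]
    matches = [r for r, o in response_outcome.items() if o == outcome]
    return matches[0]


for s in ("S1", "S2"):
    print(s, "->", predicted_response(s))
```

The lookup goes from stimulus to outcome and then from outcome to response, which is the chain an S-O association would have to support, since S1 and S2 were never trained with R3 or R4 directly.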
S-O association
Results:
[Figure: mean resp/min (y-axis) across trials (x-axis), plotted separately for the same-outcome response and the different-outcome response]
S-O association
Results:
• in the presence of a particular Sd, the rats performed
the response that was associated with the same outcome
more than the response associated with the different
outcome
• evidence for S-O association
S-R association
• somewhat simpler to demonstrate
T: BP → food
• see more BP during the T than in its absence
• Rescorla has shown with devaluation experiments
that even with complete devaluation, some responding
remains due to the S-R association
for example, devalue the food:
in the presence of the T, the rat still bar presses
(but won’t eat the food)
Hierarchical Associations
In addition to the simple associations of 2 elements
(i.e., S-R, S-O, R-O), can also have hierarchical
associations
• the Sd becomes an occasion setter that signals
when the response will be followed by a reinforcer
S present:  R → O
S absent:   R → nothing
• so, the Sd signals the relationship between a
response and its outcome
S [R → O]
Hierarchical Associations
• Recall from Pavlovian conditioning that a CS is only
powerful when it reliably predicts a US
• When the CS provides no reliable information about
the occurrence of the US, then conditioning is weak
• The same idea has been applied to the learning of a
hierarchical association
Hierarchical Associations
S present:  R → O
S absent:   R → nothing
In this situation, the S is informative about when the R
will be followed by the O
S present:  R → O
S absent:   R → O
However, in the second situation, the S provides no
information about when the R will be followed by the O
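One common way to put a number on "informative" is a contingency difference: the probability that R is followed by O when S is present, minus the same probability when S is absent. This measure is brought in here as an assumption for illustration, since the slides state the idea only qualitatively. The function name `delta_p` and the trial counts are hypothetical.

```python
def delta_p(reinforced_with_s, responses_with_s,
            reinforced_without_s, responses_without_s):
    """Difference in P(O | R) between S-present and S-absent periods."""
    p_with = reinforced_with_s / responses_with_s
    p_without = reinforced_without_s / responses_without_s
    return p_with - p_without


# Hypothetical counts, chosen only to mirror the two situations above.
informative = delta_p(20, 20, 0, 20)     # R -> O only when S is present
uninformative = delta_p(20, 20, 20, 20)  # R -> O whether or not S is present

print(informative)    # 1.0
print(uninformative)  # 0.0
```

A value near 1 means the S tells the animal a lot about when responding pays off. A value near 0 means the R-O relation is the same with or without the S.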
Hierarchical Associations
Rescorla (1990) used this idea to obtain evidence for a
hierarchical association

Training          Test
S1 [R1 → O1]      S1: R1 vs R2
S1 [R2 → O2]      S2: R1 vs R2
S2 [R1 → O2]
S2 [R2 → O1]
But also,
R1 → O1
R2 → O2

Test: 4 30-s presentations of both discriminative stimuli,
with both responses available
Which Sd is informative about the R-O relation?
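One way to work through the question: an Sd is informative if the R-O pairings in force during it differ from the pairings the animal also experiences outside it. The sketch below encodes the training design and checks each Sd. It assumes the "But also" rows mean R1 earns O1 and R2 earns O2 when no Sd is present, and it treats "informative" as "differs from those pairings". Both are interpretive assumptions, and the names `during_sd`, `without_sd` and `is_informative` are hypothetical.

```python
# R-O pairings in force during each Sd (from the training design).
during_sd = {
    "S1": {"R1": "O1", "R2": "O2"},
    "S2": {"R1": "O2", "R2": "O1"},
}

# R-O pairings assumed to hold when no Sd is present ("But also" rows).
without_sd = {"R1": "O1", "R2": "O2"}


def is_informative(sd):
    """True if the Sd signals R-O pairings that differ from those without it."""
    return during_sd[sd] != without_sd


for sd in during_sd:
    print(sd, "informative" if is_informative(sd) else "not informative")
```

Under these assumptions S1 repeats what the animal already knows, while S2 signals that the pairings are reversed, which agrees with the "informative" and "not informative" labels in the results.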
Hierarchical Associations
Results:
[Figure: mean resp/min (y-axis) across trials (x-axis), plotted separately for S2 (informative) and S1 (not informative)]
Theories of Reinforcement
1. Reinforcement as stimulus presentation
What identifies a reinforcer?
Thorndike
• a stimulus that is satisfying
• the problem with this definition is that it is circular
Theories of Reinforcement
Hull’s Drive Reduction Theory
• a biological need upsets the body’s homeostasis and
induces a drive state
• any stimulus that satisfies the biological need, restores
homeostasis, and thus reduces the drive state serves as a
reinforcer
• the problem with this definition is that many
reinforcers do not restore/maintain homeostasis
• e.g., incentive motivation (response elicited by reinforcer),
curiosity, praise, criticism
Theories of Reinforcement
2. Reinforcement as behavior
A. The Premack Principle
• rather than talking about reinforcing stimuli,
Premack focused on reinforcing responses
• so, instead of saying food is a reinforcing stimulus,
Premack said eating is a reinforcing response
• the only difference between an operant response
and a reinforcer is the probability of occurrence
• avoided circularity by defining a reinforcer as a more
probable behavior than the ‘operant’ behavior
• that is, high probability behaviors will reinforce low
probability behaviors, but not the reverse
When a rat is water deprived (E1), it drinks more than it runs. Therefore, drinking
reinforces running, but running does not reinforce drinking. When a rat is not water
deprived (E2), it runs more than it drinks. Then running reinforces drinking, but
drinking does not reinforce running
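The rule is simple enough to write down directly. The following is a minimal sketch, assuming each behavior's baseline probability is measured as its share of time under free access. The function name and the numeric shares are hypothetical and were chosen only to mirror the rat example above.

```python
def reinforcement_relations(baseline):
    """List (reinforcer, reinforced) pairs implied by baseline probabilities.

    baseline maps each behavior to its free-access probability (share of time).
    A behavior can reinforce any behavior with a lower baseline probability.
    """
    pairs = []
    for high, p_high in baseline.items():
        for low, p_low in baseline.items():
            if p_high > p_low:
                pairs.append((high, low))
    return pairs


# Hypothetical baseline shares for the two deprivation conditions.
water_deprived = {"drinking": 0.6, "running": 0.2}
not_deprived = {"drinking": 0.1, "running": 0.5}

print(reinforcement_relations(water_deprived))  # [('drinking', 'running')]
print(reinforcement_relations(not_deprived))    # [('running', 'drinking')]
```

Nothing about drinking or running is fixed: the same two behaviors swap roles once their baseline probabilities swap.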
Theories of Reinforcement
B. Behavioral Regulation Approaches
• expanded on Premack Principle
• took into account the animal’s repertoire of behavior
in a context
• this established the “bliss point”, the optimum
distribution of responding for the subject
• operant contingencies shift the subject away from the
bliss point; the subject behaves so as to approach the
optimum distribution as closely as possible
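The slides do not say how "as closely as possible" is measured. One simple formalization, used here purely as an assumption, is to take the point on the schedule line with the smallest straight-line (Euclidean) distance to the bliss point, weighting the two behaviors equally. The function name, the schedule form y = ratio * x, and the numbers are all hypothetical.

```python
def closest_point_on_schedule(bliss, ratio):
    """Project the bliss point onto the schedule line y = ratio * x.

    bliss: (x0, y0), the preferred amounts of the instrumental and contingent
           behaviors under free access.
    ratio: amount of contingent behavior earned per unit of instrumental
           behavior (the schedule line passes through the origin).
    """
    x0, y0 = bliss
    x = (x0 + ratio * y0) / (1 + ratio ** 2)
    return x, ratio * x


# Free access: 2 units of the instrumental behavior, 6 of the contingent one.
# Schedule: 1 unit of contingent behavior per unit of instrumental behavior.
print(closest_point_on_schedule((2.0, 6.0), 1.0))  # (4.0, 4.0)
```

In this example the subject ends up doing more of the instrumental behavior (4 instead of 2) and less of the contingent behavior (4 instead of 6), a compromise that lies on the schedule line as near to the bliss point as the contingency allows.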