Instrumental Conditioning:
Motivational Mechanisms
Contingency-Shaped
Behaviour
• Uses three-term contingency
• Reinforcement schedule (e.g., FR10)
imposes contingency
• Seen in non-humans and humans
Rule-Governed Behaviour
• Particularly in humans
• Behaviour can be varied and
unpredictable
• Invent rules or use (in)appropriate rules
across conditions (e.g., language)
• Age-dependent, primary vs. secondary
reinforcers, experience
Role of Response in Operant
Conditioning
• Thorndike
– Performance of response necessary
• Tolman
– Formation of expectation
• McNamara, Long & Wike (1956)
– Maze
– Running rats or riding rats (cart)
– Forming the association is what is needed, not performing the response
Role of the Reinforcer
• Is reinforcement necessary for operant
conditioning?
• Tolman & Honzik (1930)
• Latent learning
– Not necessary for learning
– Necessary for performance
Results
(Figure: average errors across days for three groups: food, no food, and no food until Day 11; errors in the delayed-reward group drop sharply once food is introduced on Day 11)
Associative Structure in
Instrumental Conditioning
• Basic forms of association
– S = stimulus, R = response, O = outcome
• S-R
• Thorndike, Law of Effect
• Role of reinforcer: stamps in S-R
association
• No R-O association acquired
Hull and Spence
• Law of Effect, plus a classical
conditioning process
• Stimulus evokes response via
Thorndike’s S-R association
• Also, S-O association creates
expectancy of reward
• Two-process approach
– Classical and instrumental are different
One Process or Two Processes?
• Are instrumental and classical the same
(one process) or different (two
processes)?
• Omission control procedure
– US presentation depends on nonoccurrence of CR
– No CR, then CS ---> US
– CR, then CS ---> no US
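The omission rule amounts to a simple conditional scheduler. A minimal Python sketch (function names are hypothetical, not from the source):

```python
def omission_trial(cr_occurred: bool) -> bool:
    """Omission contingency: the US follows the CS only if no CR occurred."""
    return not cr_occurred

def classical_trial(cr_occurred: bool) -> bool:
    """Yoked classical control: the US follows the CS on every trial,
    regardless of responding."""
    return True
```

So `omission_trial(True)` returns `False` (the CR cancels the US), while the classical control delivers the US no matter what the animal does.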
Omission Control
(Diagram: on a trial with a CR, the CS occurs but the US is withheld; on a trial without a CR, the CS is followed by the US)
Gormezano & Coleman (1973)
• Eyeblink with rabbits
• US=shock, CS=tone
• Classical group: 5 mA shock on each trial,
regardless of response
• Omission group: making eyeblink CR to
CS prevents delivery of US
• One-process prediction:
– CR acquisition faster and stronger for
Omission group
– Reinforcement for CR is shock avoidance
– In Classical group CR will be present
because it somehow reduces shock
aversiveness
• BUT…
– CR acquisition slower in Omission group
– Classical conditioning extinction (not all
CSs followed by US)
• Supports Two-process theory
Classical in Instrumental
• Classical conditioning process provides
motivation
• Stimulus substitution
• S acquires properties of O
– rg = fractional anticipatory goal response
• Response leads to feedback
– sg = sensory feedback
• rg-sg constitutes expectancy of reward
Timecourse
(Diagram: S ---> rg-sg ---> R ---> O)
Through stimulus substitution S elicits rg-sg, giving a motivational expectation of reward
Prediction
• According to rg-sg, the CR should occur before the operant response; but it doesn’t always
• Dog lever pressing on FR33 ---> PRP
• Lever presses low early in the trial, then higher; but salivation appears only later
(Figure: magnitude of lever pressing and salivation over time from the start of the trial)
Modern Two-Process Theory
• Classical conditioning in instrumental
• Neutral stimulus ---> elicits motivation
• Central Emotional State (CES)
• CES is a characteristic of the nervous system (“mood”)
• CES won’t produce only one response
– Complicates predicting the exact effect
Prediction
• Rate of operant response modified by
presentation of CS
• CES develops to motivate operant response
• CS from classical conditioning also elicits
CES
• Therefore, giving CS during instrumental
conditioning should alter CES that motivates
instrumental response
“Explicit” Predictions
• Emotional states
– Appetitive US (e.g., food): CS+ ---> hope; CS- ---> disappointment
– Aversive US (e.g., shock): CS+ ---> fear; CS- ---> relief
• Behavioural predictions (aversive US)
– Positive reinforcement schedule: CS+ (fear) ---> response rate decreases; CS- (relief) ---> increases
– Negative reinforcement schedule: CS+ (fear) ---> increases; CS- (relief) ---> decreases
R-O and S(R-O)
• Earlier interpretations had no response-reinforcement associations
• An intuitive explanation, though
• Perform the response to get the reinforcer
Colwill & Rescorla (1986)
• R-O association
• Devalue reinforcer post-conditioning
• Does the operant response decrease?
• Bar push right or left for different reinforcers
– Food or sucrose
Testing of Reinforcers
(Figure: mean responses/min. across blocks of extinction trials; responding for the devalued reinforcer falls below responding for the normal reinforcer)
Interpretation
• Can’t be S-R
– No reinforcer in this model
• Can’t be S-O
– Two responses, same stimuli (the bar), but
only one response affected
• Conclusion
– Each response associated with its own
reinforcer
– R-O association
Hierarchical S-(R-O)
• R-O model lacks stimulus component
• Stimulus required to activate
association
• Really, Skinner’s (1938) three term
contingency
• Old idea; recent empirical testing
Colwill & Delamater (1995)
• Rats trained on pairs of S+
• Biconditional discrimination problem
– Two stimuli
– Two responses
– One reinforcer
• Match the correct response to the
stimuli to be reinforced
• Training, reinforcer devaluation, testing
• Training
– Tone: lever --> food; chain --> nothing
– Noise: chain --> food; lever --> nothing
– Light: poke --> sucrose; handle --> nothing
– Flash: handle --> sucrose; poke --> nothing
• Aversion conditioning
• Testing: marked reduction in previously reinforced response
– Tone: lever press vs. chain
– Noise: chain vs. lever
– Light: poke vs. handle
– Flash: handle vs. poke
Analysis
• Can’t be S-O
– Each stimulus associated with same reinforcer
• Can’t be R-O
– Each response reinforced with same outcome
• Can’t be S-R
– Due to devaluation of outcome
• Each S activates a corresponding R-O
association
Reinforcer Prediction, A Priori
• Simple definition
– A stimulus that increases the future
probability of a behaviour
– Circular explanation
• Would be nice if we could predict
beforehand
Need Reduction Approach
• Primary reinforcers reduce biological
needs
• Biological needs: e.g., food, water
• Not biological needs: e.g., sex,
saccharin
• Undetectable biological needs: e.g.,
trace elements, vitamins
Drive Reduction
• Clark Hull
• Homeostasis
– Drive systems
• Strong stimuli aversive
• Reduction in stimulation is reinforcer
– Drive is reduced
• Problems
– Objective measurement of stimulus intensity
– Cases where stimulation doesn’t change, or even increases, yet is still reinforcing
Trans-situationality
• A stimulus that is a reinforcer in one
situation will be a reinforcer in others
• Subsets of behaviour
– Reinforcing behaviours
– Reinforceable behaviours
• Often works with primary reinforcers
• Problems with other stimuli
Primary and Incentive
Motivation
• Where does motivation to respond
come from?
• Primary: biological drive state
• Incentive: from reinforcer itself
But… Consider:
• What if we treat a reinforcer not as a stimulus
or an event, but as a behaviour in and of itself?
• Fred Sheffield (1950s)
• Consummatory-response theory
– E.g., not the food, but the eating of food that is the
reinforcer
– E.g., saccharin has no nutritional value, can’t
reduce drive, but is reinforcing due to its
consumability
Premack’s Principle
• Reinforcing responses occur more than the
responses they reinforce
• H = high probability behaviour
• L = low probability behaviour
• If L ---> H, then H reinforces L
• But, if H ---> L, H does not reinforce L
• “Differential probability principle”
• No fundamental distinction between
reinforcers and operant responses
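The differential probability principle reduces to a single comparison of baseline probabilities. A sketch in Python (the probability values are made-up illustrations, not data from the source):

```python
def premack_reinforces(p_reinforcer: float, p_operant: float) -> bool:
    """Premack's differential probability principle: a behaviour can
    reinforce another only if its baseline probability is higher."""
    return p_reinforcer > p_operant

# Hypothetical Phase I baselines for one child:
p_eat, p_pinball = 0.7, 0.3
eating_reinforces_pinball = premack_reinforces(p_eat, p_pinball)   # True
pinball_reinforces_eating = premack_reinforces(p_pinball, p_eat)   # False
```

Note the asymmetry: the same pair of activities yields reinforcement in one direction only, which is exactly what Premack’s Phase II tests checked.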
Premack (1965)
• Two alternatives
– Eat candy, play pinball
– Phase I: determine individual behaviour
probability (baseline)
• Gr1: pinball (operant) to eat (reinforcer)
• Gr2: eating candy (operant) to play pinball
(reinforcer)
– Phase II (testing)
• T1: play pinball (operant) to eat (reinforcer)
– Only Gr1 kids increased operant
• T2: eat (operant) to play pinball (reinforcer)
– Only Gr2 kids increased operant
Premack in Brief
Any activity…
…could be a reinforcer
… if it is more likely to be “preferred” than
the operant response.
Response Deprivation
Hypothesis
• Restrict access to the reinforcer response
• Theory:
– Impose response deprivation
– Now, low probability responses can reinforce high
probability responses
• Instrumental procedures withhold reinforcer
until response made; in essence, deprived of
access to reinforcer
• Reinforcer produced by operant contingency
itself
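The hypothesis is often formalized as an inequality comparing the schedule’s ratio to the free-baseline ratio (following Timberlake & Allison’s response-deprivation condition; this formalization is an assumption here, not something stated in the slides):

```python
def response_deprived(i_required: float, c_earned: float,
                      o_instrumental: float, o_contingent: float) -> bool:
    """Response-deprivation condition: a schedule requiring i_required
    units of the instrumental response per c_earned units of the
    contingent response deprives the animal of the contingent response
    when i_required / c_earned exceeds the free-baseline ratio
    o_instrumental / o_contingent. Deprivation predicts reinforcement,
    even when the contingent response is the less probable one."""
    return i_required / c_earned > o_instrumental / o_contingent

# Hypothetical baselines: 60 s running, 10 s drinking per free session.
# Schedule: 30 s of running buys 1 s of drinking -> drinking is deprived,
# so low-probability drinking should now reinforce high-probability running.
drinking_deprived = response_deprived(30, 1, 60, 10)  # True
```

A lenient schedule (say, 5 s of running per 1 s of drinking) would not satisfy the condition, so no reinforcement effect is predicted.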
Behavioural Regulation
• Physiological homeostasis
• Analogous process in behavioural
regulation
• Preferred/optimal distribution of
activities
• Stressors move organism away from
optimum behavioural state
• Respond in ways to return to ideal state
Behavioural Bliss Point
• Unconstrained condition: distribute
activities in a way that is preferred
• Behavioural bliss point (BBP)
• Relative frequency of all behaviours in
unconstrained condition
• Across conditions
– BBP shifts
• Within condition
– BBP stable across time
Imposing a Contingency
• Puts pressure on BBP
• Act to defend challenges to BBP
• But requirements of contingency (may)
make achieving BBP impossible
• Compromise required
• Redistribute responses so as to get as
close to BBP as possible
Minimum Deviation Model
• Behavioural regulation
• Due to the imposed contingency, redistribute behaviour
• Minimize deviation of responses from BBP
– Get as close as you can
(Figure: time drinking (0-40) plotted against time running (0-40), showing schedule lines for restricted drinking and restricted running relative to the bliss point)
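The minimum deviation idea can be made concrete with a toy calculation. Assuming deviation is measured as Euclidean distance (real behavioural-regulation models may weight activities differently), the best allocation under a schedule drink = k × run is the orthogonal projection of the bliss point onto the schedule line:

```python
def min_deviation_point(bliss, k):
    """Project the bliss point (run*, drink*) onto the schedule line
    drink = k * run, minimizing Euclidean distance to the bliss point.
    Derivation: minimize (run - run*)^2 + (k*run - drink*)^2 over run,
    which gives run = (run* + k*drink*) / (1 + k^2)."""
    run_star, drink_star = bliss
    run = (run_star + k * drink_star) / (1 + k * k)
    return run, k * run

# Hypothetical bliss point: 30 min running, 15 min drinking.
# Schedule: each minute of running earns 0.2 min of drinking (k = 0.2).
run, drink = min_deviation_point((30.0, 15.0), 0.2)
```

At the compromise point the animal runs more than its baseline (run > 30) yet still drinks less than it would freely (drink < 15): it partially defends the restricted activity by overperforming the instrumental one.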
Strengths of BBP Theory
• Reinforcers: not special stimuli or
responses
• No difference between operant and
reinforcer
• Explains new allocation of behaviour
• Fits with findings on cognition and
cost-benefit optimization