Instrumental Conditioning:
Foundations
Name Game
• Instrumental: subject instrumental in
producing outcome
• Operant: subject operates on
environment to produce outcome
Elements
• Discriminative stimulus (SD)
• Response
• Outcome
– Appetitive
– Aversive
• Species and individuals
Outcomes and Effects
• Positive
– Something is delivered
• Negative
– Something is removed
• Reinforcer
– Causes frequency of behaviour to increase
• Punisher
– Causes frequency of behaviour to decrease
• Need to know effect on behaviour before
labeling “reinforcer” or “punisher”
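The two distinctions above (delivered vs. removed, increase vs. decrease) combine into four procedure names. A minimal sketch of that lookup follows; the function name `label_procedure` and the string values are illustrative assumptions, not terminology from the text. Note that the label cannot be produced without the observed change in behaviour.

```python
def label_procedure(stimulus_change: str, behaviour_change: str) -> str:
    """Name an instrumental procedure.

    stimulus_change:  "delivered" or "removed" (what the response produced)
    behaviour_change: "increase" or "decrease" (observed effect on the
                      frequency of the response)
    Both argument vocabularies are hypothetical, chosen for this sketch.
    """
    signs = {"delivered": "positive", "removed": "negative"}
    effects = {"increase": "reinforcement", "decrease": "punishment"}
    if stimulus_change not in signs:
        raise ValueError(f"unknown stimulus change: {stimulus_change!r}")
    if behaviour_change not in effects:
        raise ValueError(f"unknown behaviour change: {behaviour_change!r}")
    return f"{signs[stimulus_change]} {effects[behaviour_change]}"


# Something is removed and the behaviour becomes less frequent.
print(label_procedure("removed", "decrease"))
```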
Omission Training
• Negative punishment
• Text: withdrawing sources of positive
reinforcement
• Omission training technique
– Operant response --> withholding of
appetitive stimulus
– No operant response --> get appetitive
stimulus
A Caveat
• Domjan: picky on operant vs.
instrumental (e.g., p. 151)
• Domjan: sloppy on reinforcer and
punisher re: effect on behaviour
• Don’t be sloppy like Domjan!
• Be clear on outcome’s effect on
increase/decrease of behaviour!
Methodologies/Procedures
• Discrete trial
– One trial at a time
– Reset apparatus
– Latency, running speed, reduction in errors
• Free operant
– Uninterrupted repeated trials
– Less disruptive for subject
– Response rate
Beginnings
• Edward L. Thorndike
• Undergraduate student
• Animal intelligence
• Mazes
• Puzzle boxes (video)
• Trial-and-error learning
Law of Effect
• Responses followed by appetitive
outcomes increase in frequency
• Responses followed by aversive
outcomes decrease in frequency
• Stimulus-Response (S-R) learning
• Association between S and R altered by
experience
Mechanical Strengthening
Processes
• Guthrie & Horton (1946)
• Cat in box with a pole
• Stereotypic behaviours
• Consistent in individual
• Variable across individuals
Stop-Action Principle
• “Random” response strengthened by
success
– Individual predispositions
– E.g. response = bite pole; appetitive
outcome = escape
– “Stops the action”
• Not immediate
• Dominance of one response
Problems with Stop-Action
• Muenzinger (1928)
• Guinea pigs
• Lever press for lettuce
• Not one dominant operant behaviour
Response Classes
• Lashley (1942)
• Reinforcement strengthens class of
operant responses
• End goal
B.F. Skinner
• Operant response
– Meaningful, measurable unit of behaviour
– Defined by effect it has on environment
• Skinner’s approach (video)
• Operant chamber (video)
Shaping
• Trial-and-error somewhat random
• Successive approximations
• Very precise operant response possible
Shaping: Reinforcers
• Conditioned reinforcer
– Previously neutral stimulus that has
acquired the capacity to strengthen
responses because it has been repeatedly
paired with a primary reinforcer
• Primary reinforcer
– Stimulus that naturally strengthens any
response that is paired with it
Shaping a Lever Press
• Gradual process
• Reinforce more appropriate/precise
responses
Behavioural Stereotypy vs.
Variability
• Always some slight variability in
responses
• Degree of stereotypy
– Specific imposed response requirements
– Cost-benefit of different responses
• Can actually condition response
variability
– E.g., Only reinforce novel responses
Page & Neuringer (1985)
• Pigeons
• 8 pecks on two keys
(left and right)
• Exp. Gr.: only reinforced
if response different
from previous 50
responses
• Control Gr.: reinforced
for any response
pattern
[Figure: % novel responses for the Experimental and Control groups, first five sessions vs. last five sessions]
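The experimental-group rule (reinforce an 8-peck pattern only if it differs from the previous 50) can be sketched as a sliding-window check. This is an illustration under stated assumptions, not the original apparatus logic: the class name, the "L"/"R" string encoding, and the choice to record every pattern whether or not it was reinforced are all hypothetical.

```python
from collections import deque


class LagVariabilityContingency:
    """Reinforce a peck pattern only if it is absent from the last `lag` patterns."""

    def __init__(self, lag: int = 50, sequence_length: int = 8):
        self.sequence_length = sequence_length
        # deque with maxlen silently drops the oldest pattern once full
        self.history = deque(maxlen=lag)

    def should_reinforce(self, sequence: str) -> bool:
        # A pattern is a string such as "LLRRLRLR" (left/right key pecks).
        if len(sequence) != self.sequence_length or set(sequence) - {"L", "R"}:
            raise ValueError(f"invalid peck pattern: {sequence!r}")
        novel = sequence not in self.history
        self.history.append(sequence)  # assumption: every trial enters the window
        return novel


def control_should_reinforce(sequence: str) -> bool:
    """Control group as described on the slide: any pattern is reinforced."""
    return True
```

With a short window, for example `LagVariabilityContingency(lag=2)`, a pattern becomes reinforceable again once two other trials have pushed it out of the history.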
Mediators on Response
• Belongingness
– Thorndike: some responses harder to
condition than others
• Biological predispositions
• Breland & Breland (1961) and
instinctive drift
Skinner (1948)
• Superstitious behaviour
• Accidental strengthening of response
• FT-15 sec. grain delivery
• 6 of 8 pigeons develop very
characteristic, unrequired responses
• Humans
– Rituals, personal and societal superstitions;
persistent
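A fixed-time (FT) schedule delivers food on the clock, regardless of what the subject does. The sketch below generates FT-15 delivery times and reports which response happened to come just before each delivery, which is the accidental pairing described above. It does not model strengthening itself; the function names and the response labels in the usage lines are hypothetical.

```python
def fixed_time_deliveries(session_length_s: float, interval_s: float = 15.0) -> list:
    """Times (in seconds) at which food arrives on an FT schedule."""
    times = []
    t = interval_s
    while t <= session_length_s:
        times.append(t)
        t += interval_s
    return times


def response_preceding_each_delivery(deliveries, responses):
    """For each delivery, return the label of the most recent earlier response.

    responses: list of (time_s, label) tuples sorted by time.
    Returns None for a delivery with no earlier response.
    """
    paired = []
    for delivery in deliveries:
        earlier = [label for time_s, label in responses if time_s <= delivery]
        paired.append(earlier[-1] if earlier else None)
    return paired


# Hypothetical observation log for one short session.
observed = [(3, "turn"), (14, "head bob"), (29, "head bob"), (40, "wing flap")]
print(response_preceding_each_delivery(fixed_time_deliveries(45), observed))
```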
Staddon & Simmelhag (1971)
• High speed cameras
• Interim and terminal responses
• Behavioural regularities
• Temporally structured
• Terminal: species specific behaviours
re: food anticipation
• Interim: behaviours not motivated by
food
• R3: peck at floor
• R4: quarter turn
• R8: move along magazine wall
• R1: orient toward food magazine wall
• R7: peck at magazine wall
[Figure: probability of occurrence of terminal and interim responses across the interval (sec.)]
Behaviour Systems Theory
• Periodic food delivery activates feeding
system
• Preorganized species-typical foraging
and feeding responses
• Just after food: post-food focal search
• Middle of time interval: general search
Reinforcer Values
• Response magnitude, rate of learning
• Quantity and quality
– Individual’s ability to assess magnitude differences
– Generally positive correlation for single operant
tasks; more complicated for higher schedules (back
to this with choice section)
• Changes in reinforcement magnitude
– Reinforcement history
– Expectation
– Positive and negative behavioural contrast
Response-Reinforcer
Contingency
• “Causal relation”
• Strong contingency produces stronger
responding and faster learning
• Non-contingent (random) relationship
– Lack of responding
– Extinction
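One common way to put a number on a response-outcome contingency is the difference between the probability of the outcome given a response and the probability of the outcome given no response. The slides do not name this measure, so treating it as the intended definition is an assumption; the function and parameter names are illustrative.

```python
def delta_p(outcomes_after_response: int, response_trials: int,
            outcomes_without_response: int, no_response_trials: int) -> float:
    """Contingency as P(outcome | response) - P(outcome | no response).

    Near +1: the outcome depends strongly on responding.
    Near 0:  the outcome is non-contingent (equally likely either way).
    """
    if response_trials <= 0 or no_response_trials <= 0:
        raise ValueError("need at least one trial of each kind")
    p_given_response = outcomes_after_response / response_trials
    p_given_no_response = outcomes_without_response / no_response_trials
    return p_given_response - p_given_no_response


# Outcome on half the trials regardless of responding: no contingency.
print(delta_p(5, 10, 5, 10))
```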
Temporal Contiguity
• Immediate reinforcement more effective
than delayed
– Which response was reinforced?
– Forgetting; reinforcer devaluation
• Skinner on teaching machine (video)
• Bridge (conditioned reinforcer)
• Marking procedure
Control
• Response’s control over outcome
• Uncontrollable situation
• Aversive outcome
• Learned helplessness
Triadic Design
Group     Exposure Phase             Conditioning Phase   Result
Group E   Escapable shock            Escape-avoidance     Rapid avoidance
Group Y   Yoked inescapable shock    Escape-avoidance     Slow avoidance
Group R   Restricted to apparatus    Escape-avoidance     Rapid avoidance
• Immunization
– Escapable shock, then inescapable shock, then escape-avoidance: rapid avoidance
Theory
• Behaviour has no effect on situation
• Generalization
• Maier & Seligman (1976)
– Motivational, cognitive, and/or emotional
impairment
• Non-human learned helplessness
– Model for human depression
– Situation (specific/global), attribution
(internal/external), time (short- or long-term)