ch5

Instrumental Learning & Operant Reinforcement
Operant Learning
 Stimulus
 Response
 Outcome
Classical vs. Operant
 Classical
 Requires reflex action
 Neutral stimulus associated with US
 Outside of subject’s control
 Operant
 Strengthening/weakening of “voluntary” action
 Subject responds or doesn’t
 Can operate together
What’s in a Name?
 Operant learning: subject operates on environment
 Instrumental conditioning: subject is instrumental in
obtaining outcome
Trial and Error Learning
 E.L. Thorndike
 Animal intelligence
 Maze studies
Puzzle Box
 Cats
 Cage with mechanism to open door
 Escape latency
 Discrete trial procedure
Law of Effect
 Any behaviour followed by an appetitive stimulus will
increase in frequency
Terms
 Operant (response): any behaviour that operates on
the environment to produce an effect
 Reinforcer: any event that increases the frequency of a
behaviour
 Punisher: any event that decreases the frequency of a
behaviour
Operant Learning
 B.F. Skinner
 Operant chamber
 Free operant procedure
Discrete Trial & Free Operant
 Discrete
 One trial at a time
 “Apparatus” must be reset
 Measure some behaviour on each trial (e.g., escape latency)
 e.g., mazes
 Free
 Operant can occur at any time
 Operant can occur repeatedly
 Response rate
 e.g., operant chamber
Four Contingencies
 Positive reinforcement
 Negative reinforcement
 Positive punishment
 Negative punishment
Positive and Negative
 Positive: presents some stimulus
 Negative: removes some stimulus
Reinforcers and Punishers
 Reinforcer: increases a behaviour
 Punisher: decreases a behaviour
Contingencies
Response rate \ Response causes stimulus to be: | Presented | Removed
Increases | Positive reinforcement (Lever press --> Food) | Negative reinforcement (Lever press --> Shock off)
Decreases | Positive punishment (Lever press --> Shock) | Negative punishment (Lever press --> Food removed)
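The table above can be read as a lookup. The sign (positive/negative) comes from whether the response presents or removes a stimulus, and the type (reinforcement/punishment) comes from whether the response rate goes up or down. A minimal Python sketch; the function name `classify_contingency` and its string arguments are hypothetical, chosen only for illustration.

```python
def classify_contingency(stimulus_change: str, rate_change: str) -> str:
    """Name the contingency in the 2 x 2 table (hypothetical helper).

    stimulus_change: "presented" or "removed"  -> Positive / Negative
    rate_change:     "increases" or "decreases" -> reinforcement / punishment
    """
    sign = {"presented": "Positive", "removed": "Negative"}
    kind = {"increases": "reinforcement", "decreases": "punishment"}
    if stimulus_change not in sign or rate_change not in kind:
        raise ValueError("unexpected argument")
    return f"{sign[stimulus_change]} {kind[rate_change]}"


# The four lever-press examples from the table.
examples = {
    "Lever press --> Food": ("presented", "increases"),
    "Lever press --> Shock": ("presented", "decreases"),
    "Lever press --> Shock off": ("removed", "increases"),
    "Lever press --> Food removed": ("removed", "decreases"),
}

for label, (change, rate) in examples.items():
    print(f"{label}: {classify_contingency(change, rate)}")
```

The classification is driven by the observed change in response rate, not by whether the stimulus seems pleasant, which matches the definitions of reinforcer (increases a behaviour) and punisher (decreases a behaviour).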
Types of Reinforcers
 Primary
 Not dependent on an association with other reinforcers
 Secondary
 Initially neutral stimulus
 Paired with primary reinforcer
 “Conditioned Reinforcer”
Secondary Reinforcers
 “Bridging”, “clicker”
 Secondary reinforcers undergo extinction without periodic pairings with the primary reinforcer
 Generally weaker than primary
 Generalized reinforcer
 Paired with many other kinds of reinforcers
 e.g., money
Strength of Operant Learning
 Can condition practically any behaviour
 Shaping (successive approximations)
Shaping a Lever Press
 Gradual process
 Reinforce more appropriate/precise responses
 Feedback
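Shaping can be pictured as a loop with a moving criterion: reinforce any response that meets the current criterion, then tighten the criterion a little. The sketch below is a toy model only; measuring each response as its distance in cm from the lever, and the starting criterion, step size, and response values, are all hypothetical.

```python
def shape(responses, start_criterion, step, final_criterion=0):
    """Toy model of shaping by successive approximations.

    Each response is a (hypothetical) distance in cm from the lever; 0 means
    an actual lever press. A response is reinforced if it is at or below the
    current criterion. After each reinforced response the criterion is
    tightened by `step`, but never below `final_criterion`.
    Returns a list of (distance, criterion_in_force, reinforced) tuples.
    """
    criterion = start_criterion
    log = []
    for distance in responses:
        reinforced = distance <= criterion
        log.append((distance, criterion, reinforced))
        if reinforced:
            # Demand a closer approximation on the next response.
            criterion = max(final_criterion, criterion - step)
    return log


trials = [30, 18, 25, 12, 9, 15, 4, 0]  # hypothetical responses
for distance, criterion, reinforced in shape(trials, start_criterion=20, step=5):
    print(f"distance={distance:>2} cm  criterion={criterion:>2}  reinforced={reinforced}")
```

A fixed step is only the simplest stand-in; in practice the trainer adjusts the criterion based on feedback from the animal's behaviour.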
Response Chains
 Sequences of behaviours in specific order
 Objective: primary reinforcer
 Conditioned reinforcers
 Discriminative stimuli
Forward Chaining
 Start with first response in sequence, then work
through to last response in additive steps
Backwards Chaining
 Often used with “complex” training
 Start with last response in chain
 Next, second last response
 Third last, etc.
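Forward and backward chaining differ only in the order in which links are added during training. A minimal Python sketch, assuming a chain is represented as an ordered list of response names (the example chain is hypothetical); each returned stage is the portion of the chain the subject performs before reinforcement.

```python
def forward_chaining(chain):
    """Stages for forward chaining: first response alone, then the first two,
    and so on, adding one response at the end of each stage."""
    return [chain[: i + 1] for i in range(len(chain))]


def backward_chaining(chain):
    """Stages for backward chaining: last response alone, then the second-last
    plus the last, and so on, adding one response at the start of each stage."""
    return [chain[i:] for i in range(len(chain) - 1, -1, -1)]


# Hypothetical three-link chain ending in a lever press (which produces food).
chain = ["climb ramp", "cross bridge", "press lever"]

for stage in forward_chaining(chain):
    print("forward: ", " -> ".join(stage))
for stage in backward_chaining(chain):
    print("backward:", " -> ".join(stage))
```

In backward chaining every stage ends with the response that produces the primary reinforcer, so each newly added link leads straight into a stimulus already paired with reinforcement; this is one commonly given reason it suits complex training.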
Contingency
 Correlation between behaviour & outcome
 Strong contingency --> better learning
 Random contingency --> no learning
 Applies to both reinforcement and punishment
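The slides do not name a specific contingency measure. One common choice (an assumption here, not taken from the slides) is Delta-P: the probability of the outcome given the response minus its probability without the response. A minimal Python sketch using hypothetical counts of observation periods:

```python
def delta_p(r_and_o, r_no_o, no_r_and_o, no_r_no_o):
    """Delta-P contingency from a 2 x 2 table of counts:
    P(outcome | response) - P(outcome | no response).

    Near +1:  the response reliably produces the outcome (strong contingency).
    Near 0:   the outcome is unrelated to responding (random contingency).
    Negative: the response prevents the outcome (as in avoidance).
    """
    p_o_given_r = r_and_o / (r_and_o + r_no_o)
    p_o_given_no_r = no_r_and_o / (no_r_and_o + no_r_no_o)
    return p_o_given_r - p_o_given_no_r


# Hypothetical counts of observation periods.
strong = delta_p(9, 1, 1, 9)   # outcome mostly follows responding
random_ = delta_p(5, 5, 5, 5)  # outcome equally likely either way
print(f"strong: {strong:.2f}, random: {random_:.2f}")
```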
Contiguity
 Time between behaviour & outcome
 Shorter = better learning
 Delays allow other behaviours to intervene, and permit forgetting and extinction (behaviour without reinforcement)
 Learning despite delay if a stimulus “placeholder” bridges the gap (conditioned reinforcer?)
 Even more important for punishment
Reinforcer Characteristics
 Larger reinforcers --> stronger learning
 Not a linear effect
 Qualitative differences in reinforcers and punishers
 Species & individual differences
 Intensity of punisher
Task Characteristics
 Some tasks easier to learn than others
 Species & individual differences
 Innate and/or prior conditioning
Deprivation Levels
 Generally, the greater the deprivation, the more
effective the reinforcer
 Subjects can become satiated on a reinforcer
 Deprivation can provide motivation to engage in
punishable behaviours
Extinction
 Response no longer produces the same outcome
 Extinction burst (responding briefly rises above the rate seen during reinforcement)
 Variability of behaviour
 Aggression and frustration
 Spontaneous recovery
 Resurgence
Hull’s Drive Reduction Theory
 Animals have motivational states (drives)
 Necessary for survival
 Reinforcers are things that reduce drives
 Physiological value
 Reduce physiological state
Drive Reduction Reinforcers
 Works well with primary reinforcers
 Many secondary reinforcers have no physiological value
 Hull: association links secondary to drive
 Some reinforcers hard to classify as primary or secondary
 e.g., saccharin (no nutritional value)
 Some increase a physiological state
 e.g., roller coasters
 Some necessities undetectable
 e.g., vitamins
Relative Value Theory & Premack Principle
 Treat reinforcers as behaviours
 Is it the food, or the behaviour of eating that is the
reinforcer?
 Behavioural probability scale
 Greater or lesser value of behaviours relative to one
another
 No distinction between primary and secondary
Premack Principle
 One behaviour will reinforce a second behaviour
 High probability behaviour reinforces low
probability behaviour
 Baseline probability scale
 Time: Probability of response = Time spent on response / Total time
 Rank order
 Reinforcement relativity
 No absolutes
Example
 Behaviours
 Eat ice cream (I), play video game (V), read book (B)
 Baseline (30 minutes)
 Student 1: I (2 min), V (8 min), B (20 min)
Scale (lowest to highest probability): I -- V -- B
 Student 2: I (8 min), V (20 min), B (2 min)
Scale (lowest to highest probability): B -- I -- V
 Student 1: V reinforces I, B reinforces V & I
 Student 2: I reinforces B, V reinforces I & B
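The worked example can be checked mechanically: convert each baseline time to a probability (time spent on response / total time), then let every behaviour reinforce every behaviour with a lower probability. A minimal Python sketch using the 30-minute baselines above; the function names are hypothetical.

```python
BASELINE_MINUTES = 30


def probabilities(times, total_time=BASELINE_MINUTES):
    """Probability of each behaviour = time spent on it / total time."""
    return {behaviour: minutes / total_time for behaviour, minutes in times.items()}


def premack_pairs(times, total_time=BASELINE_MINUTES):
    """(reinforcer, reinforced) pairs: a higher-probability behaviour can
    reinforce a lower-probability one. Ties produce no pair."""
    p = probabilities(times, total_time)
    return {(high, low) for high in p for low in p if p[high] > p[low]}


# I = eat ice cream, V = play video game, B = read book (minutes in baseline).
student_1 = {"I": 2, "V": 8, "B": 20}
student_2 = {"I": 8, "V": 20, "B": 2}

for name, times in [("Student 1", student_1), ("Student 2", student_2)]:
    for reinforcer, target in sorted(premack_pairs(times)):
        print(f"{name}: {reinforcer} reinforces {target}")
```

The pairs match the conclusions above, and eating ice cream reinforces nothing for Student 1 but reinforces reading for Student 2, which is the "reinforcement relativity" point.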
Problems
 Baseline phase
 Fair rating?
 How to compare very different behaviours
 Time problems
 What if time not important to behaviour?
 Behaviour duration?
 Length of baseline period?
Response Deprivation Theory
 Deprived behaviours = reinforcing behaviours
 Drop below baseline level of performance
 Not relative frequency of one behaviour compared to
another (i.e., Premack)
 Level of deprivation for a behaviour
 Praise? “Yes”?
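One common way to make "deprivation" precise (an addition, not stated on the slide): suppose a schedule requires I units of the instrumental behaviour to earn C units of the contingent behaviour, with baseline levels O_i and O_c. If the subject kept the instrumental behaviour at baseline, it would earn only O_i x C / I of the contingent behaviour, which is below baseline exactly when I / C > O_i / O_c. A minimal Python sketch; the schedule values are hypothetical, while the baselines come from Student 1 in the Premack example.

```python
def is_response_deprived(req_instrumental, earned_contingent,
                         base_instrumental, base_contingent):
    """True if the schedule would hold the contingent behaviour below its
    baseline when the instrumental behaviour stays at baseline:
        I / C > O_i / O_c
    (cross-multiplied so no division is needed)."""
    return req_instrumental * base_contingent > earned_contingent * base_instrumental


# Student 1's baseline (minutes per 30): video game (V) = 8, ice cream (I) = 2.
# Hypothetical schedules: minutes of V required per minute of I earned.
print(is_response_deprived(10, 1, base_instrumental=8, base_contingent=2))
print(is_response_deprived(2, 1, base_instrumental=8, base_contingent=2))
```

Under the first schedule, ice cream is held below its baseline, so the theory predicts it will reinforce video-game playing even though ice cream is Student 1's lower-probability behaviour; Premack's rank order alone would not predict this.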
Definitions
 Escape
 Get away from aversive stimulus that is in progress
 Avoidance
 Get away from aversive stimulus before it begins
Shuttle Box
 Solomon & Wynne (1953)
 Dogs
 Chamber with barrier; Shock
 Light off as signal
[Figure: shuttle box with Side 1 and Side 2 separated by a barrier; discriminative stimuli; electrifiable floor]
Two-Process Theory
 Classical and operant conditioning
 Shock = US
 Fear/pain/jump/twitch/squeal = UR
 Darkness = CS
 Fear of dark = CR
 Fear: heart rate, breathing, stomach cramps, etc.
 Negative reinforcement
 Removal of fear (CR)
 Escape from CS, not avoidance of shock
Support for Two-Process Theory
 Rescorla & LoLordo (1965)
 Dog in shuttlebox
 No signal
 Response gives “safe time”
 Pair tone with shock
 Tone increases rate of response
 CS can amplify avoidance
 Conditioned inhibition can reduce avoidance
Problems with Two-Process Theory
 Avoidance without observable fear
 Heart rate
 Not consistent
 Fear diminishes with avoidance learning
Measuring Fear
 Kamin, Brimer, and Black (1963)
 Lever press ---> food
 Auditory CS ---> avoidance in shuttle box until: 1, 3, 9, 27 avoidances in a row
 CS in Skinner box; check for suppression of lever press
Results
 Fear decreases during extended avoidance training
 But, avoidance still strong
 Even low fear is enough?
[Figure: Responding vs. number of consecutive avoidance responses (1, 3, 9, 27)]
Extinction in Avoidance Behaviour
 Odd prediction from two-process theory
 “Yo-yo” effect
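 Avoidance should toggle: successful avoidance --> no CS-shock pairings --> fear (CR) extinguishes --> avoidance stops --> shocks resume --> fear reconditioned --> avoidance resumes
 But! Avoidance is extremely persistent
[Figure: successful avoidance across trials]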
One-Process Theory
 Classical conditioning component unnecessary
 Avoidance, not fear reduction, is reinforcer
 “Safety”
Sidman Avoidance Task
 Free-operant avoidance
 Can avoidance be learned if no warning CS?
 Shock at random intervals
 Response gives safe time
 Extensive training --> learn avoidance
 But avoidance is rarely perfect
 High variability across subjects
 Two-process theory suggests:
 Time becomes a CS (time elicits fear)
Herrnstein & Hineline (1966)
 Rapid and slow shock rate schedules
 Lever press switches schedules
 Shocks presented randomly, no signal
 Responses give shock reduction
 Reduction in shock is reinforcer
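The logic of Herrnstein & Hineline's procedure can be illustrated with a toy simulation: there is no warning signal and no guaranteed safe period, yet pressing still lowers the overall shock rate. The per-cycle shock probabilities, the number of cycles, and the rule that a press keeps the slow schedule in force until the next shock are all assumptions for illustration, not parameters taken from the study.

```python
import random


def simulate(press_prob, n_cycles=10_000, p_fast=0.3, p_slow=0.1, seed=0):
    """Toy model of an unsignalled two-schedule shock procedure.

    Time is divided into cycles. In each cycle the subject presses with
    probability `press_prob`; a press switches to the slow schedule.
    Assumption: the slow schedule stays in force until the next shock,
    which returns the subject to the fast schedule. Shocks arrive at
    random with the current schedule's per-cycle probability.
    Returns the total number of shocks received.
    """
    rng = random.Random(seed)
    on_slow = False
    shocks = 0
    for _ in range(n_cycles):
        if rng.random() < press_prob:
            on_slow = True
        if rng.random() < (p_slow if on_slow else p_fast):
            shocks += 1
            on_slow = False  # a shock returns the subject to the fast schedule
    return shocks


never = simulate(press_prob=0.0)
always = simulate(press_prob=1.0)
print(f"shocks without pressing: {never}, pressing every cycle: {always}")
```

Shocks never stop entirely under either condition, so there is no safe period or CS offset to explain the pressing; a lower overall shock rate is the only consequence, consistent with shock reduction itself acting as the reinforcer.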
Learned Helplessness
 Behaviour has no effect on situation
 Generalizes
 Laboratory
 Give inescapable shocks
 Shuttle box
 Will not switch sides
 Expectation that behaviour has no effect
Learned Helplessness in Humans
 Depression
 Situations beyond your control
 Three dimensions
 Situation: specific or global
 Attribute: internal or external
 Time: short-term or long-term
Maier & Seligman (1976)
 Motivational impairment
 Cognitive impairment
 Emotional impairment
Therapeutic Application
 Confidence building (“can not fail”)
 Implementation issues
 Tasks that can be successfully completed
 Produces immunization
 Escapable condition … inescapable condition
 Learned helplessness less likely to develop