Consequence - Learnblock

Lecture 3:
Instrumental learning
The puzzle box
Historical Roots
• Darwin
– variance between animals can be described by the
commonalities between them and other animals both
in terms of behaviour and mental abilities
– led to ‘comparative psychology’
That’s one smart
monkey
BUT is this insight/intelligence?
(i.e. how did the animal actually acquire this behaviour)
Edward Lee Thorndike
1. Cat locked in
puzzle box,
2. Cat makes the
‘right’
response,
3. Door opens,
4. Eat kipper
(Yum).
The puzzle box
Trial and
error
• E.g., time for cat to
escape from puzzle
box.
• Observed
progressive
improvement over
many trials,
not ‘sudden insight’.
Discrete trial procedures
• Single trial procedures.
• Measured objective dependent
variables such as ‘time’ or
‘errors’.
(Used by other
researchers)
Law of Effect
• What a human or animal does is strongly
influenced by the immediate consequences of
such behaviour in the past:
“Of several responses made to the same situation,
those that are accompanied or closely followed by
satisfaction to the animal will, other things being
equal, be more firmly connected with the
situation, so that, when it recurs, they will be more
likely to recur”
- Thorndike (1911, p. 244)
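The Law of Effect can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Thorndike's own formalism: the update rule, the `step` size, the starting strengths and the response names are all hypothetical.

```python
def law_of_effect_trial(strengths, response, satisfied, step=0.2):
    """Return updated connection strengths after one trial.

    Hypothetical rule: the emitted response's strength moves a fraction
    `step` of the way toward 1.0 when it is followed by satisfaction,
    and is left unchanged otherwise.
    """
    updated = dict(strengths)
    if satisfied:
        updated[response] += step * (1.0 - updated[response])
    return updated


def response_probabilities(strengths):
    """Probability of each response, proportional to its strength."""
    total = sum(strengths.values())
    return {name: value / total for name, value in strengths.items()}


# Three responses start equally weak; only pulling the loop opens the door.
strengths = {"scratch bars": 0.1, "meow": 0.1, "pull loop": 0.1}
for _ in range(10):
    strengths = law_of_effect_trial(strengths, "pull loop", satisfied=True)

probabilities = response_probabilities(strengths)
```

In this sketch the reinforced response starts at a probability of one third and ends above 0.8 after ten reinforced trials, while the unreinforced responses keep their original strength.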
Tripartite Contingency - ABC
• Antecedent: The stimulus controlling
behaviour;
– The Discriminative Stimulus (Sd).
• Behaviour: What is the Response being
reinforced?
– The Response (R)
• Consequence: What is the immediate
outcome of a behaviour?
– The Reinforcing Stimulus (Sr).
B. F. Skinner
• A limitation with
Thorndike’s method
was the discrete trial:
– When subject can
respond is constrained,
– One response and/or
one reinforcer per trial,
– Handling stress.
Free operant procedure
1. Rat placed in
Skinner Box
2. Rat makes ‘right’
response,
3. Rat gets food
4. Repeat from 2.
Instrumental learning
Note the important difference between
these procedures and those used by
Pavlov:
– Pavlov: The subject has no control over
events, but responds to them.
– Thorndike/Skinner: The subject has to
respond to change the circumstances.
• The behaviour is instrumental in
determining what happens.
What are Reinforcers
Many reinforcers are intrinsically valued
(primary reinforcers) e.g. giving a dog food
But this is not always the case:
– Secondary reinforcers: Acquire their reinforcing
properties through experience, e.g. clicker with
dog
– Social reinforcement, e.g. praise
– Sensory & activity reinforcers, e.g. learning guitar
A brief exercise…
Shaping
Principle of successive approximation.
• Reinforce behaviours that are closer and closer to a
target behaviour
• Gradually make the conditions of reinforcement more
stringent, more precise.
• Can generate entirely novel behaviours
– Bar pressing in rats,
– Dog opening door.
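Successive approximation can be sketched as a loop in which the criterion for reinforcement tightens each time the animal meets it. This is only an illustrative toy, not a model from the lecture: behaviour is reduced to a single number (for example, distance from the lever), and `spread`, `learning_rate`, `trials` and `seed` are hypothetical parameters.

```python
import random


def shape(target, start, spread=2.0, learning_rate=0.5, trials=200, seed=0):
    """Toy successive-approximation loop (all parameters are hypothetical).

    Returns the list of criterion values, one per trial, where the
    criterion is the largest distance from the target that still earns
    reinforcement.
    """
    rng = random.Random(seed)          # seeded so the run is repeatable
    typical = start                    # the animal's current typical behaviour
    criterion = abs(start - target)    # begin by accepting rough approximations
    history = []
    for _ in range(trials):
        behaviour = rng.gauss(typical, spread)   # behaviour varies from trial to trial
        if abs(behaviour - target) < criterion:  # closer than required: reinforce
            typical += learning_rate * (behaviour - typical)
            criterion = abs(typical - target)    # make the requirement more stringent
        history.append(criterion)
    return history


history = shape(target=0.0, start=20.0)
```

Because only behaviours closer to the target than the current criterion are reinforced, the criterion in `history` never loosens, and the typical behaviour drifts toward a target the animal would have been very unlikely to produce at the start.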
Figure: % social time (0 to 90) plotted against days (1 to 40); labels: Normal, Social, Check.
Practical use of Shaping
Superstitious Behaviour
• Shaping may also explain
superstitious behaviours…
Skinner
“The way reinforcement is
carried out is more important
than the amount of
reinforcement given”
Contiguity & Contingency
Schedules of reinforcement combine two distinctions:
• Fixed or Variable
• Ratio (Responses) or Interval (Time period)
FIXED RATIO: 10 bar presses = 1 food pellet
VARIABLE RATIO: on average 10 bar presses = 1 food pellet
BUT this varies across trials, e.g. 7, 8, 10, 11, 14 presses
FIXED INTERVAL: the first bar press after 10 sec have elapsed = 1 food pellet
(so only 1 pellet is available every 10 sec)
VARIABLE INTERVAL: the first bar press after the interval has elapsed = 1 food pellet
BUT the interval varies across trials around an average of 10 sec, e.g. 6, 9, 10, 12, 13 secs
Reinforcement schedules
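The four schedules can be sketched as small functions that decide whether a given bar press earns a pellet. This is a minimal sketch under stated assumptions: the interval schedules use the standard reading in which the first press after the interval has elapsed is reinforced, and the function names, the uniform random ranges and the press timings are all hypothetical.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response (the press time is ignored)."""
    count = 0

    def respond(time):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False
    return respond


def variable_ratio(mean_n, rng):
    """Reinforce after a varying number of responses averaging mean_n."""
    count, required = 0, rng.randint(1, 2 * mean_n - 1)

    def respond(time):
        nonlocal count, required
        count += 1
        if count >= required:
            count, required = 0, rng.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond


def fixed_interval(seconds):
    """Reinforce the first response made once `seconds` have elapsed
    since the last reinforcer (or since the session started)."""
    available_at = seconds

    def respond(time):
        nonlocal available_at
        if time >= available_at:
            available_at = time + seconds
            return True
        return False
    return respond


def variable_interval(mean_seconds, rng):
    """Like fixed_interval, but the wait varies around mean_seconds."""
    available_at = rng.uniform(0, 2 * mean_seconds)

    def respond(time):
        nonlocal available_at
        if time >= available_at:
            available_at = time + rng.uniform(0, 2 * mean_seconds)
            return True
        return False
    return respond


def pellets_earned(schedule, response_times):
    """Count reinforced responses for presses made at the given times (s)."""
    return sum(schedule(t) for t in response_times)


slow_presses = range(1, 101)                       # one press per second for 100 s
fast_presses = [0.5 * k for k in range(1, 201)]    # two presses per second for 100 s

fr_slow = pellets_earned(fixed_ratio(10), slow_presses)
fr_fast = pellets_earned(fixed_ratio(10), fast_presses)
fi_slow = pellets_earned(fixed_interval(10), slow_presses)
fi_fast = pellets_earned(fixed_interval(10), fast_presses)
```

In this sketch, doubling the press rate doubles the pellets earned on the ratio schedule but leaves the interval schedule's payoff unchanged, which is the key practical difference between the two families.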
Rft Schedule affects Extinction
Figure: persistence at gambling, plotted as plays to extinction (mean log, 1.7 to 2) against percent reinforcement during training (100, 75, 50, 25, 0) (Lewis & Duncan, 1956).
• The less reliably a response is reinforced, the
more persistent it is during extinction.
Can we unlearn? - Extinction
Figure: duration of crying (min, 0 to 60) plotted against times child put to bed (1 to 10); labels: First extinction, Spontaneous recovery, Rapid Reacquisition, Second extinction (Williams, 1959).
Response training: reinforcement
                          Appetitive consequence            Aversive consequence
R produces consequence    Positive Reinforcement:           Positive Punishment:
                          R increases                       R decreases
R terminates consequence  Negative Punishment (omission):   Negative reinforcement (Escape/Avoid):
                          R decreases                       R increases
Positive Reinforcement
Escape learning
WS On
Escape
Shock On
• A barrier divides the shuttle box - one half has a grid floor.
• A warning signal (WS) comes on, followed by a mild foot
shock through the grid floor.
• The subject can escape the shock by leaping over the
barrier to the safe area.
Avoidance
• The animal soon learns to jump over the barrier when the
warning signal comes on, and avoids the shock
altogether.
• Escape = turning off some currently occurring aversive
event,
• Avoid = preventing some aversive event from occurring.
WS On
Happy rat
Avoid
Shock On
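The escape/avoid distinction comes down to whether the jump happens before or after shock onset. A minimal sketch, assuming made-up timings: the function name, its parameters, the 5 s shock delay and the list of latencies are all hypothetical and chosen only for illustration.

```python
def classify_shuttle_trial(jump_time, shock_onset, trial_end):
    """Label one shuttle-box trial.

    Times are seconds since the warning signal (WS) came on. `jump_time`
    is None if the subject never crossed the barrier. All three parameter
    names are hypothetical.
    """
    if jump_time is None or jump_time > trial_end:
        return "no response"
    if jump_time < shock_onset:
        return "avoid"    # jumped before the shock started, so it never occurred
    return "escape"       # jumped while the shock was on, turning it off


# Assumed timings: shock starts 5 s after the WS, trial ends at 30 s.
# Invented latencies that shorten over trials, for illustration only.
latencies = [12.0, 9.5, 6.0, 4.0, 2.5]
labels = [
    classify_shuttle_trial(t, shock_onset=5.0, trial_end=30.0) for t in latencies
]
```

With these invented latencies the early trials are labelled as escapes and the later ones as avoidance, mirroring the shift described in the bullets.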
Negative Reinforcement
Response suppression: Punishment
Positive Punishment
Response suppression: Reinforce other activities
Negative Punishment
Response-consequence
contingencies
Important: Negative reinforcement is NOT Punishment!!!
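The four response-consequence contingencies can be written as a lookup keyed on what the response does and what kind of consequence is involved. This is a minimal sketch; the function name and the string labels are hypothetical choices made for illustration.

```python
def classify_contingency(response_effect, consequence):
    """Name a contingency and its effect on the response (R).

    response_effect: "produces" or "terminates"
    consequence:     "appetitive" or "aversive"
    """
    table = {
        ("produces", "appetitive"): ("positive reinforcement", "increases"),
        ("produces", "aversive"): ("positive punishment", "decreases"),
        ("terminates", "appetitive"): ("negative punishment (omission)", "decreases"),
        ("terminates", "aversive"): ("negative reinforcement (escape/avoid)", "increases"),
    }
    return table[(response_effect, consequence)]


name, change = classify_contingency("terminates", "aversive")
```

Here `name` is the negative reinforcement entry and `change` is "increases", which makes the point explicit: negative reinforcement strengthens a response, whereas both forms of punishment weaken it.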
An analysis of
drug abuse
How can we describe the
spiral from drug use to
drug abuse?
• Sources of +ve Rf
• Sources of -ve Rf
• Goal directed or habit?
• Treat with punishment? Extinction? Omission?
Very simplified analysis
• Role of habituation,
classical cond’n,
discrimination learning,
social learning, etc.
Instrumental learning: A summary
• Unlike classical conditioning, instrumental learning
involves circumstances where behaviour
determines the events that follow.
• The likelihood that a behaviour will increase or
decrease is determined by:
– The nature of the events that follow (appetitive/aversive),
– Whether the behaviour produces or terminates these
events.
• Can be automatic, but can also be goal-oriented.