• 2 doors
• Probabilities of .1 and .2, respectively, of a dollar becoming available
• Can get a dollar behind both doors on the same trial
• Dollars stay there until collected, but never more than 1 dollar per door.
• What order of doors do you choose?
• If choices are made moment by moment, there should be orderly patterns in the choices: 2, 2, 1, 2, 2, 1… (see the sketch below)
• Results are mixed, but promising when time is used as the measure
• Maximizing local rates with moment-to-moment choices can lower the overall reinforcement rate
• Short-term vs. long-term
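A minimal simulation sketch of that moment-by-moment (momentary-maximizing) account of the two-door problem; the bookkeeping and the deterministic choice rule are illustrative assumptions, not a published model:

```python
def momentary_maximizer(p=(0.1, 0.2), n_trials=12):
    """Pick, on every trial, the door more likely to hold a dollar.
    Dollars persist until collected (never more than one per door),
    so a door unvisited for k trials holds one with probability
    1 - (1 - p)**k."""
    k = [1, 1]                       # trials of accrual since each visit
    choices = []
    for _ in range(n_trials):
        hold = [1 - (1 - p[i]) ** k[i] for i in (0, 1)]
        pick = 0 if hold[0] > hold[1] else 1
        choices.append(pick + 1)     # label the doors 1 and 2
        k[pick] = 1                  # the picked door starts accruing afresh
        k[1 - pick] += 1
    return choices

print(momentary_maximizer())  # -> [2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1]
```

The orderly 2, 2, 1 pattern falls straight out of the local rule.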
• Many of life’s reinforcers are delayed…
– Eating right, studying, etc.
• Delay obviously devalues a reinforcer
– How are effects of reinforcers affected by delay?
– Why choose the immediate, smaller reward?
– Why ever show self-control?
• Temporal, not causal
– Judging a causal relation across a delay is very hard
• The same holds for delay of reinforcement
– Effects decrease with delay
• But how does it occur?
• Are there reliable and predictable effects?
• Can we quantify the effect?
How Do We Measure Delay Effects?
Studying preference of delayed reinforcers
Humans:
- verbal reports at different points in time
- "what if" questions
Humans AND nonhumans:
A. Concurrent chains
B. Titration
All are choice techniques.
A. Concurrent chains
Concurrent chains are simply concurrent schedules -- usually concurrent equal VI VI -- in which reinforcers are delayed.
When a response is reinforced, usually both concurrent schedules stop and become unavailable, and a delay starts.
Sometimes the delays are in blackout with no response required to get the final reinforcer (an FT schedule);
Sometimes the delays are actually schedules, with an associated stimulus, like an FI schedule, that requires responding.
[Diagram: the concurrent-chain procedure. Initial links (choice phase): two white keys, concurrent VI VI. Terminal links (outcome phase): VI a s leading to food on one key, VI b s leading to food on the other.]
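A minimal sketch of one such cycle, under the strong simplifying assumptions that the subject responds continuously on both initial-link keys and that the terminal links are simple fixed delays; the names and parameters are illustrative:

```python
import random

def concurrent_chains_cycle(rng, vi_initial=60.0, terminal=(10.0, 20.0)):
    """One cycle: the two equal VI initial links arm at exponentially
    distributed times (a constant-probability stand-in for a VI timer).
    The first link to arm and be pecked starts its terminal link; the
    other initial link is suspended, and food follows the terminal delay."""
    arm = [rng.expovariate(1.0 / vi_initial) for _ in range(2)]
    entered = arm.index(min(arm))                  # which terminal link is entered
    return entered, min(arm) + terminal[entered]   # (key, seconds to food)

rng = random.Random(1)
print(concurrent_chains_cycle(rng))
```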
An example of a concurrent-chain experiment
MacEwen (1972) investigated choice between two terminal-link FI and two terminal-link VI schedules, one of which was always twice as long as the other.
The initial links were always concurrent VI 60-s VI 60-s schedules.
The terminal-link schedules were (shorter vs. longer link in each pair):

  Shorter link   Longer link
  FI 5 s         FI 10 s
  FI 10 s        FI 20 s
  FI 20 s        FI 40 s
  FI 40 s        FI 80 s
  VI 5 s         VI 10 s
  VI 10 s        VI 20 s
  VI 20 s        VI 40 s
  VI 40 s        VI 80 s
Constant reinforcer (delay and immediacy) ratio in the terminal links – all immediacy ratios are 2:1.
[Graph: Bird M6 -- choice plotted against the smaller FI or VI value (0-40 s), with separate curves for FI and VI terminal links.]
From the generalised matching law, we would expect:

$\log \frac{B_1}{B_2} = a_d \log \frac{D_2}{D_1} + \log c$

If $a_d$ was constant, then because $D_2 / D_1$ was kept constant, we would expect no change in choice with changes in the absolute size of the delays.
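To make the prediction concrete: with a constant $a_d$, every condition in the design yields the same value,

$\log \frac{B_1}{B_2} = a_d \log 2 + \log c$

whether the delays are 5 s and 10 s or 40 s and 80 s -- a flat line when plotted against the smaller delay.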
$D_2 / D_1$ was kept constant throughout -- the same terminal-link schedule pairs as listed above.
But choice did change, so $a_d$ did NOT remain constant:
[Graph: Bird M6 -- the same plot of choice against the smaller FI or VI value (s), for FI and VI terminal links.]
But it does give us some data to answer some other questions…
• Now that we have some data…
• How does reinforcer value change over time?
• What is the shape of the decay function?
Basically, the effects that reinforcers have on behaviour decrease as the reinforcers become more delayed after the reinforced response.
[Graph: reinforcer value as a concave-upwards (decreasing) function of reinforcer delay -- value drops steeply at first, then more slowly.]
• What is the “real” delay function?
$V_t = V_0 / (1 + Kt)$

$V_t = V_0 / (1 + Kt)^s$

$V_t = V_0 / (M + Kt^s)$

$V_t = V_0 / (M + t^s)$

$V_t = V_0 \exp(-Mt)$
Exponential versus hyperbolic decay
It is important to understand how the effects of reinforcers decay over time, because different sorts of decay predict different effects.
The two main candidates:
Exponential decay -- the rate of decay remains constant over time in this kind of decay
Hyperbolic decay -- the rate of decay decreases over time
-- as in memory, too
Exponential decay
$V_t = V_0 e^{-bt}$

$V_t$: value of the delayed reinforcer at time t
$V_0$: value of the reinforcer at 0-s delay
$t$: delay in seconds
$b$: a parameter that determines the rate of decay
$e$: the base of natural logarithms.
Hyperbolic decay
$V_t = \frac{h V_0}{h + t}$

In this equation, all the variables are the same as in the exponential decay, except that $h$ is the half-life of the decay -- the time over which the value of $V_0$ is reduced to half its initial value.
Hyperbolic decay is strongly supported by Mazur’s research.
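Both candidates as code -- a minimal sketch with arbitrary parameter values, matched so that both functions halve at t = 10 s:

```python
import math

def exponential_value(v0, t, b):
    """V_t = V_0 * exp(-b*t): a constant proportional rate of decay."""
    return v0 * math.exp(-b * t)

def hyperbolic_value(v0, t, h):
    """V_t = h*V_0 / (h + t): h is the half-life; the proportional
    rate of decay slows down as the delay grows."""
    return h * v0 / (h + t)

v0, h = 1.0, 10.0
b = math.log(2) / h          # match the first half-life at t = 10 s
for t in (0, 10, 30):
    print(t, round(exponential_value(v0, t, b), 3),
          round(hyperbolic_value(v0, t, h), 3))
# t = 10 s: both have halved. t = 30 s: exponential is down to 1/8,
# but hyperbolic is still at 1/4 -- its decay decelerates.
```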
[Graph: hyperbolic and exponential decay of reinforcer value as a function of reinforcer delay (s).]
Two sorts of decay fitted to MacEwen's (1972) data
Hyperbolic is clearly better.
Not that clean, but…
[Graph: hyperbolic (top panel) and exponential (bottom panel) fits to the choice data, plotted against the smaller delay (0-40 s).]
B. Titration -- finding the point of preference reversal
The titration procedure was introduced by Mazur:
- one standard (constant) delay and
- one adjusting delay.
These may differ in what schedule they are (e.g., FT versus VT with the same size reinforcers for both), or they may be the same schedule (both FT, say) with different magnitudes of reinforcers.
What the procedure does is to find the value of the adjusting delay that is equally preferred to the standard delay -- the indifference point in choice.
For example:
- reinforcer magnitudes are the same
- standard schedule is VT 30 s
- adjusting schedule is FT
How long would the FT schedule need to become to make preference equal?
Titration: Procedure
Trials are in blocks of 4.
The first 2 are forced choice, randomly one to each alternative
The last 2 are free choice.
If, on the last 2 trials, it chooses the adjusting schedule twice, the adjusting schedule is increased by a small amount.
If it chooses the standard twice, the adjusting schedule is decreased by a small amount.
If equal choice (1 of each) -- no change
(the von Békésy tracking procedure in audition)
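A sketch of the staircase logic, assuming a simulated subject whose choice probability tracks the relative hyperbolic value of the adjusting delay; the step size, decay parameters, and choice rule are all illustrative, and the forced-choice trials are noted but not simulated, since they never move the staircase:

```python
import random

def titrate(standard_value=2.0, adjust_delay=10.0, step=1.0,
            n_blocks=200, mag=6.0, h=10.0, seed=0):
    """Staircase over blocks of 4 trials (2 forced + 2 free).
    Two free choices of the adjusting side lengthen its delay, two of
    the standard shorten it, and a 1-1 split leaves it unchanged."""
    rng = random.Random(seed)
    for _ in range(n_blocks):
        v_adj = h * mag / (h + adjust_delay)   # hyperbolic value of adjusting side
        free = ['adj' if rng.random() < v_adj / (v_adj + standard_value)
                else 'std' for _ in range(2)]
        if free == ['adj', 'adj']:
            adjust_delay += step
        elif free == ['std', 'std']:
            adjust_delay = max(0.0, adjust_delay - step)
    return adjust_delay

# With a standard worth 2.0, indifference is where 10*6/(10+d) = 2, i.e. d = 20.
print(round(titrate(), 1))   # hovers near 20 s
```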
Mazur's titration procedure
[Diagram: Mazur's titration procedure. ITI, then trial start with a choice between two white keys. A peck at one produces the standard delay (red houselight) ending in 2-s food; a peck at the other produces the adjusting delay (green houselight) ending in 6-s food; each reinforcer is followed by a blackout (BO).]
Why the post-reinforcer blackout?
• Different magnitudes, finding the equivalent delay
– 2-s rf delayed 8 s = 6-s rf delayed 20 s
• Equal magnitudes, variable vs. fixed delay
– Fixed delay 20 s = variable delay 30 s
• Why preference for variable?
– Hyperbolic decay and interval weighting.
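A quick worked check of that claim -- hyperbolic decay applied to each interval separately, then averaged. The two-point delay distribution (5 s or 55 s, mean 30 s) and K = 0.2 are illustrative assumptions:

```python
def hyp(t, k=0.2, m=1.0):
    """Mazur's hyperbolic form: V = M / (1 + K*t)."""
    return m / (1 + k * t)

v_fixed = hyp(30)                          # fixed 30-s delay
v_variable = 0.5 * hyp(5) + 0.5 * hyp(55)  # variable: 5 s or 55 s, mean 30 s
print(round(v_fixed, 3), round(v_variable, 3))   # 0.143 vs 0.292
# Weighting each interval's hyperbolic value separately, the occasional
# very short delays dominate, so the variable schedule is worth more
# than a fixed schedule with the same mean.
```

Because the hyperbola is steepest at short delays, a variable schedule's short intervals buy it more value than its long intervals cost, which is why a fixed 20-s delay can match a variable 30-s one.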
• Which would you prefer?
– $1 in an hour
– $2 tomorrow
• Which would you prefer?
– $1 in a month
– $2 in a month and a day
Here’s the problem:
Preference reversal
In positive self control, the further you are away from the smaller and larger reinforcers, the more likely you are to accept the larger, more delayed reinforcers.
But, the closer you get to the first one, the more likely you are to choose the smaller, more immediate one.
Friday night:
“Alright, I am setting my alarm clock to wake me up at 6.00 am tomorrow morning, and then I’ll go jogging.” ...
Saturday 6.00 am:
“Hmm….maybe not today.”
Assume: At the moment in time when we make the choice, we choose the reinforcer that has the highest current value...
To be able to understand why preference reversal occurs, we need to know how the value of a reinforcer changes with the time by which it is delayed...
Outside the laboratory, the majority of reinforcers are delayed.
Studying the effects of delayed reinforcers is therefore very important.
Animal research: Preference reversal
Green, Fisher, Perlow, & Sherman (1981)
Choice between a 2-s and a 6-s reinforcer.
Larger reinforcer delayed 4 s more than the smaller.
Choice response (across conditions) required from 2 to 28 s before the smaller reinforcer.
We will call this time T.
[Timeline, shown across three slides: the choice response occurs T s (varied from 2 to 28 s across conditions) before the small reinforcer; the large reinforcer follows 4 s after the small one.]
Green et al. (continued)
Thus, if T was 10 s, at the choice point:
- the smaller reinforcer was 10 s away
- the larger was 14 s away
So, as T is changed over conditions, we should see preference reversal.
Control condition: two equal-sized reinforcers were delayed, one 28 s the other 32 s.
Preference was strongly towards the reinforcer that came sooner.
So, at delays that long, pigeons can still clearly tell which reinforcer is sooner and which one later.
[Graph: Green et al. (1981), mean data -- preference plotted against the value of T; self control (above zero) at long T, impulsivity (below zero) at short T.]
Which Delay Function Predicts This?
[Graph: exponential decay -- predicted values of the MAG = 2 and MAG = 6 reinforcers as a function of seconds from the smaller reinforcer; the curves never cross.]
[Graph: hyperbolic decay -- predicted values of the MAG = 2 and MAG = 6 reinforcers as a function of seconds from the smaller reinforcer; the curves cross.]
Only hyperbolic decay can explain preference reversal
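A sketch of this contrast using Green et al.'s magnitudes (2 s and 6 s of food, the larger delayed 4 s more); the decay parameters K = 1 and b = 0.2 are arbitrary illustrations:

```python
import math

def hyperbolic(m, t, k=1.0):
    return m / (1 + k * t)

def exponential(m, t, b=0.2):
    return m * math.exp(-b * t)

for t in (0, 1, 2, 5, 10, 20):        # t = seconds before the smaller reinforcer
    hs, hl = hyperbolic(2, t), hyperbolic(6, t + 4)
    es, el = exponential(2, t), exponential(6, t + 4)
    print(t, 'hyp prefers', 'small' if hs > hl else 'LARGE',
          '| exp prefers', 'small' if es > el else 'LARGE')
# Hyperbolic: 'small' wins only at t = 0 (impulsivity), then preference
# reverses to 'LARGE'. Exponential: the value ratio is the constant
# 3*exp(-4b), so the same side wins at every t -- no reversal is possible.
```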
Hyperbolic predictions shown the same way
[Graph: hyperbolic values of the two reinforcers over time; choice reverses at the point where the curves cross.]
Using strict matching theory to explain preference reversal
The concatenated strict matching law for reinforcer magnitude and delay (see the generalised matching lecture) is:
$\frac{B_1}{B_2} = \frac{M_1}{M_2} \cdot \frac{D_2}{D_1}$

where M is reinforcer magnitude, and D is reinforcer delay.
Note that for delay, a longer delay is less preferred, and therefore $D_2$ is on top.
(OK, we know SM isn’t right, and delay sensitivity isn’t constant)
We will take the situation used by Green et al. (1981), and work through what the STRICT matching law predicts.

The baseline is: $M_1 = 2$, $M_2 = 6$, $D_1 = 0$, $D_2 = 4$:

$\frac{B_1}{B_2} = \frac{M_1}{M_2} \cdot \frac{D_2}{D_1} = \frac{2 \times 4}{6 \times 0} = \frac{8}{0}$

The choice is infinite. Thus, the subject is predicted always to take the smaller, zero-delayed, reinforcer.
Now, add T = 0.5 s, so $M_1 = 2$, $M_2 = 6$, $D_1 = 0.5$, $D_2 = 4.5$:

$\frac{B_1}{B_2} = \frac{M_1}{M_2} \cdot \frac{D_2}{D_1} = \frac{2 \times 4.5}{6 \times 0.5} = \frac{9}{3} = 3$

The subject is predicted to prefer the smaller magnitude reinforcer three times more than the larger magnitude reinforcer, and again be impulsive. But its preference for the immediate reinforcer has decreased a lot.
Then, when T = 1,

$\frac{B_1}{B_2} = \frac{2 \times 5}{6 \times 1} = \frac{10}{6} = 1.67$

The choice is now less impulsive.
For T = 2, the preference ratio $B_1 / B_2$ is 1 -- so now, strict matching predicts indifference between the two choices.
For T = 10, the preference ratio is 0.47 -- more than 2:1 towards the larger, more delayed, reinforcer. That is, the subject is now showing self control.
The whole function is shown next -- predictions for Green et al. (1981) assuming strict matching.
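The whole prediction can be generated directly from the arithmetic above -- a direct transcription, not a new model:

```python
def strict_matching_ratio(t, m1=2.0, m2=6.0, extra=4.0):
    """B1/B2 = (M1 * D2) / (M2 * D1), with D1 = T and D2 = T + 4."""
    d1, d2 = t, t + extra
    return float('inf') if d1 == 0 else (m1 * d2) / (m2 * d1)

for t in (0, 0.5, 1, 2, 10):
    r = strict_matching_ratio(t)
    print(t, 'inf' if r == float('inf') else round(r, 2))
# 0 -> inf, 0.5 -> 3.0, 1 -> 1.67, 2 -> 1.0 (indifference),
# 10 -> 0.47 (self control) -- the same numbers as the worked examples.
```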
This graph shows $\log(B_2 / B_1)$, rather than $B_1 / B_2$; it shows how self control increases as you go back in time from when the reinforcers are due.
[Graph: matching law predictions -- $\log(B_2/B_1)$ as a function of the value of T; values above zero indicate self control, below zero impulsivity.]
Green et al.'s actual data:
[Graph: Green et al. (1981), mean data -- the same crossover from impulsivity at short T to self control at long T, as predicted.]
• Commit now
• Then you don't have the choice to do the bad thing later
• (e.g., Halloween candy)
Commitment in the laboratory
Rachlin & Green (1972)
Pigeons chose between:
EITHER allowing themselves a later choice between a small short-delay (SS) reinforcer and a large long-delay (LL) reinforcer,
OR denying themselves this later choice, so that they could only get the LL reinforcer.
[Diagram: Rachlin & Green (1972). An initial choice, made T s before the outcomes, leads EITHER to a later choice between the smaller-sooner and larger-later reinforcers, OR to the larger-later reinforcer with no further choice; the delays are spent in blackout before the reinforcer.]
As the time T at which the commitment response was offered was moved earlier in time from the reinforcers (from 0.5 to 16 s), preference should reverse.
Indeed, Rachlin and Green found that 4 out of 5 birds developed commitment (what we might call a commitment strategy) when T was larger.
Mischel & Baker (1975)
Experimenter puts one pretzel on a table and leaves the room for an unspecified amount of time.
If the child rings a bell, experimenter will come back and child can eat the pretzel.
If the child waits, experimenter will come back with 3 pretzels.
Most children chose the impulsive option.
But there is apparently a correlation with age, SES, IQ scores.
(correlation!)
Mischel & Baker (1975)
Self control less likely if children are instructed to think about the taste of the pretzels (e.g., how crunchy they are).
Self control was more likely if they were instructed to think about the shape or colour of the pretzels.
Much human data has been replicated with animals by Grosch & Neuringer (1981).
For example, making food reinforcers visible upset self control, but an extraneous task helped self control.
Can nonhumans be trained to show sustained self control?
Mazur & Logue (1978) - Fading in self control
             Delay (s)   Magnitude (s)
  Choice 1       6             2
  Choice 2       6             6
Preferred Choice 2 (larger magnitude, same delay) -- Self control
Over 11,000 trials, they faded the delay to the smaller magnitude (Choice 1) to 0 s -- and self control was maintained!
Additionally, and this is important, self control was maintained even when the outcomes were reversed between the keys.
In other words, the pigeons didn’t have to be re-taught to choose the self control option, but applied it to the new situation.
Contingency contracting
A common therapeutic procedure: e.g., "I give you my CD collection, and agree that if I don't lose 0.5 kg per week, you can chop up one of my CDs -- each week."
You use the facts of self control -- i.e., you say "let's start this a couple of weeks from now" and the client will readily agree -- if you said, "starting today", they most likely would not.
It's easy to give up anything next week...
• Tell your friend to pick you up
• Let everyone know you’ve stopped smoking
• Avoid discriminative stimuli
• Train incompatible behaviours
• Bring consequences closer in time
Social dilemmas
A lot of the world’s problems are problems of self control on a macro scale.
- Investment strategies
Rachlin, H. (2006). Notes on discounting. Journal of the Experimental Analysis of Behavior, 85, 425-435.
“ In general, if a variable can be expressed as a function of its own maximum value, that function may be called a discount function. Delay discounting and probability discounting are commonly studied in psychology, but memory, matching, and economic utility also may be viewed as discounting processes.”