Class 16 - Theories of Reinforcement

Theories of Reinforcement
Law of Effect
• The law of effect is circular:
– What is a reinforcer? An event that
increases behavior.
– What events increase behaviors?
Reinforcers.
• How can we break this circularity?
Need-Reduction Theory
• Deprivation (e.g., food, water) leads to
• Deficiencies (e.g., changes in tissue,
endocrine system) which give rise to
• A need to restore the status quo
(homeostasis).
• Reinforcers - stimuli that decrease some
biological need state. Events that do not
do so will not be reinforcers.
Problems with Need-Reduction Theory
• Sexual behavior: Male rats learn to run to
a female in heat with no copulation.
• Rats pressing lever for electrical brain
stimulation (Olds, 1958)
• Rats will work for saccharin
• It takes time for food to be ingested and
for the biological need to be reduced
– So need reduction implies a very long delay of reinforcement.
Drive-Reduction Theory
• Any strong stimulation (internal or
external) is aversive and:
– produces a negative drive state
– reinforcement occurs when this drive state is
reduced by some consequential event.
• A drive is an intervening variable:
– It is correlated with biological needs.
– Satisfying a drive state is likely to satisfy a
biological need later.
Drive-Reduction Theory
• Food pellets are reinforcing
– Satisfy a drive state.
– Eventually satisfy the need state, as well.
• Unlike needs, drives can be satisfied
instantly, so there is no delay-of-reinforcement
problem.
Drive vs. Need
• Oakley and Pfaffmann (1962)
• Rats worked for saccharin (artificial sweetener,
no nutritive value, therefore doesn’t satisfy a
need).
• Then they placed lesions in the central nervous
system of rats (thalamus) to eliminate the sense
of taste – eliminating drive, but need still
present.
• Response rates for saccharin (no nutritive value)
and sucrose (sugar) both decreased.
• Conclusion: drive appears to be important.
Drive vs. Need
• Epstein and Teitelbaum (1962)
• Rats pressed levers for injection of food into
gastric tube.
• They maintained perfect control of their body
weights.
• Food into stomach can therefore be a reinforcer,
and can be used to train new behaviors.
• This procedure also eliminates drive, while the
food still removes the need, so now need looks
important.
Problems with Drive-Reduction
• Theory is unfalsifiable
– Can have a conditioned drive state for anything
• Monkeys working for visual stimuli, such as
moving toy trains (Butler, 1953).
• Or to see people in the lab.
• Rats working for light (Premack & Collier, 1962).
– But they prefer the dark…
• Exploratory drive? Curiosity drive? Motherhood
drive?
• It is just too easy to think of new drives to
explain everything that animals will work for...
Problems with Drive-Reduction
• Prediction of reinforcers
– If we had an adequate theory of reinforcement, we
could predict what events were reinforcers
• Especially problematic in applied work
– everyone “needs” a different reinforcer.
• Can predict some things
– Make a person hungry, food will probably be a
reinforcer
– But which food?
• Back to the process:
– The details may be different but the underlying
processes should be the same.
Classical Reinforcement Theory
• Assumptions:
– Some responses are reinforcible, and are always
reinforcible.
– Other responses are never reinforcible
(consummatory responses).
[Diagram: responses that can be reinforced (instrumental responses) vs. responses that cannot be reinforced (consummatory responses)]
Classical Reinforcement Theory
• Assumptions:
– Reinforcers are regarded as events, as stimuli
– Reinforcers are trans-situational
• If a stimulus reinforces lever presses, it should also
reinforce other responses.
[Diagram: stimuli that are always reinforcers vs. stimuli that are never reinforcers]
Premack Theory
• Premack suggested a way to predict a priori what events
would be reinforcers.
– Denied assumptions of classical reinforcement theory.
• The reinforcement process:
– Relation between responses
– Not a relation between responses and consequential stimuli.
• No clear boundaries between behaviors and reinforcers
– Is it food that is the reinforcer, or eating?
– Is it the toy, or playing?
• Premack's principle of positive reinforcement:
– If an instrumental response is followed by a contingent response
that is more highly probable, the instrumental response will
increase in frequency.
Premack Theory
• Stated another way:
– If a less probable response is followed by a
more probable response, the less probable
response will increase in frequency.
• L: less probable response
• H: more probable response
– Reinforcement is L → H. (Probability of L
increases)
• But if H → L
– H will decrease in frequency
– Punishment
Premack Theory
• Measure unconstrained baseline behavior and
rank all activities in terms of their probability.
• Every behavior can reinforce behaviors further down
the list and punish behaviors further up.
• The Premack principle implies the following:
– 1. All responses are potentially reinforcible,
whereas classical theory said some were and some
were not.
– 2. All responses are potential reinforcers for
other, less probable, responses.
• Premack's indifference principle
– Irrelevant how the current behavior probabilities got to
be what they are (e.g., through deprivation, learning,
or whatever). All that matters is the current
probabilities.
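The ranking rule on this slide can be sketched in a few lines of Python. This is only an illustration: the activity names and baseline probabilities are hypothetical, and `predict_effect` is a name invented for this sketch, not part of Premack's work.

```python
def predict_effect(baseline, instrumental, contingent):
    """Predict what happens to the instrumental response under the
    probability-differential rule (illustrative sketch only)."""
    p_instrumental = baseline[instrumental]
    p_contingent = baseline[contingent]
    if p_contingent > p_instrumental:
        return "reinforcement"      # L -> H: the less probable response increases
    if p_contingent < p_instrumental:
        return "punishment"         # H -> L: the more probable response decreases
    return "no effect predicted"    # no probability differential


# Hypothetical unconstrained baseline (proportion of free time per activity).
baseline = {"play": 0.6, "read": 0.3, "tidy": 0.1}

# Rank all activities from most to least probable.
ranking = sorted(baseline, key=baseline.get, reverse=True)
print(ranking)
print(predict_effect(baseline, "tidy", "play"))
print(predict_effect(baseline, "play", "tidy"))
```

Only the current baseline values enter the function, which is the indifference principle in miniature: how the probabilities came about is never consulted.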
Testing Premack
• Premack (1963) used four monkeys, and
designed four manipulanda:
– A door they could open (D)
– A lever they could press (L)
– A horizontal lever they could operate (H)
– A plunger they could operate (P)
• Only 1 of the 4 (Chicko) showed clear response-probability differentials
– So, only Chicko can be used to test the theory.
• Chicko’s order: H>D/L>P
Results for Chicko from Premack (1963)
Item    Independent    Paired    Contingency
P(H)         78           68         243
P(D)         78           93         214
L(H)        270          326         342
P(L)         78           40         246
D(H)        382          274         467
L(D)        270          233         245
H(P)        543          584         382
D(P)        382          369         298
H(D)        543          459         424
[Slide annotations: H>D/L>P; No reinforcement]
Summarizing Premack's (1963) Results
• Chicko ordered H>D/L>P
• In contingency tests,
– H reinforced D, L, and P
– D and L did not reinforce H, but did reinforce P
– P did not reinforce any other response.
• Thus, D and L were both reinforcers for P
– but not reinforcers for H.
• This is contrary to classical reinforcement theory:
– Classical theory holds that events are either reinforcers or not reinforcers
– The results violate the trans-situationality assumption
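As a rough check, the rates reported for Chicko can be compared with what the probability ordering predicts. The sketch below rests on several assumptions: an item X(Y) is read as "instrumental response X with Y contingent", the independent rate for P(D) and P(L) is taken to be 78 as in the P(H) row, and any rise from the independent rate to the contingency rate is counted as reinforcement, which is a simplification rather than Premack's own criterion.

```python
# Tiers taken from the slide's ordering H > D/L > P (D and L treated as tied).
tier = {"H": 3, "D": 2, "L": 2, "P": 1}

# (instrumental, contingent): (independent rate, contingency rate).
# The 78 for P(D) and P(L) is an assumption carried over from the P(H) row.
results = {
    ("P", "H"): (78, 243),
    ("P", "D"): (78, 214),
    ("L", "H"): (270, 342),
    ("P", "L"): (78, 246),
    ("D", "H"): (382, 467),
    ("L", "D"): (270, 245),
    ("H", "P"): (543, 382),
    ("D", "P"): (382, 298),
    ("H", "D"): (543, 424),
}


def predicted(instrumental, contingent):
    """Premack: reinforcement only if the contingent response ranks higher."""
    return tier[contingent] > tier[instrumental]


def observed(independent, contingency):
    """Simplification: any rise above the independent rate counts."""
    return contingency > independent


for (inst, cont), (ind, con) in results.items():
    p, o = predicted(inst, cont), observed(ind, con)
    print(f"{inst}({cont}): predicted={p} observed={o} match={p == o}")
```

Under these assumptions every row agrees with the ordering. The same response D acts as a reinforcer in one pairing and not in another, which is the pattern the trans-situationality assumption rules out.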
Reinforcing Consummatory Responses
• Premack (1959) used children as subjects.
– Could play pinball or eat chocolate.
• 61% of the children preferred to play
pinball
• 39% preferred to eat chocolate
• Divided each of these groups into two
subgroups.
– Eat-to-Play
– Play-to-Eat
Premack (1959)
• For the Pinballers:
– Eat to Play increased Eat significantly
– Play to Eat increased Play only a very small
amount
• For the Eaters:
– Eat to Play produced only a very small increase in Eat
– Play to Eat increased Play significantly
• Supports Premack theory
• Demonstrated that eating, a
consummatory response according to
classical reinforcement theory, could be
reinforced
Premack (1962)-Assumptions
• 1. Depriving a rat of water for 24 hours
increases the probability of the rat drinking when
water is available.
• 2. Depriving a rat of running in an activity wheel
for 24 hours increases the probability of running
when the wheel is presented.
• If running is more probable than drinking, then
according to his theory running should reinforce
drinking.
– This is the reverse of the usual reinforcement relation (run to drink)
Premack (1962)
• Condition 1
– Rats had 24-hour access to an activity wheel, but
were not allowed to drink.
– In a 1-hour test session with activity wheel and
drinkometer, they spent, on average, 240 s drinking and 50
s running.
– Drinking was more probable than running.
• Condition 2
– Rats had 24-hour access to water, but they were not
allowed to run.
– In a 1-hour test session, they drank for 28 s on
average, and ran for 329 s.
– Running was more probable than drinking.
Premack (1962) Results
• Two contingency tests for both groups
– Drink to run
– Run to drink
• Under Condition 1 (drinking > running), run-to-drink
reinforced running, but drink-to-run did not
reinforce drinking.
• Under Condition 2 (running > drinking), drink-to-run
reinforced drinking (drinking time went from 28
to 98 s), but run-to-drink did not reinforce running.
• Increase in drinking in the drink-to-run contingency
– Reinforcement of a typical consummatory response.
• Supports Premack's indifference principle
– Doesn't matter how the responses got to be at their
current probability.
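The reversal across the two conditions can be reproduced from the reported baseline times alone. This is a minimal sketch: the function names and the condition labels are made up for illustration, and time spent in the 1-hour session is used as a stand-in for response probability.

```python
SESSION_S = 3600  # length of the 1-hour test session, in seconds


def probability(seconds):
    """Proportion of the session spent on a response."""
    return seconds / SESSION_S


def reinforcing_contingency(times):
    """Name the contingency Premack's rule predicts will reinforce:
    the less probable response must be performed to earn the more probable one."""
    p = {name: probability(s) for name, s in times.items()}
    high = max(p, key=p.get)
    low = min(p, key=p.get)
    return f"{low}-to-{high}"


# Baseline seconds reported on the slides for each deprivation condition.
conditions = {
    "Condition 1 (no drinking allowed beforehand)": {"drink": 240, "run": 50},
    "Condition 2 (no running allowed beforehand)": {"drink": 28, "run": 329},
}

for label, times in conditions.items():
    print(label, "->", reinforcing_contingency(times))
```

The function sees only the current times, not the deprivation history that produced them, so the flip between conditions follows from the indifference principle.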
Punishment
• If a high-probability response is followed
by a lower-probability response, the
high-probability response should decrease in
frequency.
– i.e. H → L
• Problem: If the response has a low
probability, how can we get the animal to
emit it?
Weisman and Premack (unpublished)
• Used motorized activity wheel: makes rats run
• Condition 1: 24 hours water, no activity wheel.
– Running reinforced Drinking in contingency tests.
• Condition 2: 24 hours activity wheel, no water.
– Test: Drink was followed by a 5-s forced run.
– The forced run decreased drinking substantially
• Aged rats (who had a very low probability of
running) completely stopped drinking in the test
sessions when drinking was followed by the
forced run.
• Typical punishment effects.
Relations: Reinforcement and Punishment
• H → L → H → L → H, etc.
• Premack (1969): reinforcement and punishment are
two sides of the same coin, and cannot be
separated.
– When you reinforce anything, you are punishing something else
• Value isn't in the thing itself, it’s in the behavior that
the thing controls.
– Value of old bottle of water in your car?
– Equally, food has no value, except if you can indulge in
eating.
• Symmetrical positive and negative laws of effect?
– Or a single law concerned with how both response
probabilities change when two responses occur in close
temporal conjunction.
Premack
• Premack has had a profound effect on the
theory of reinforcement
– Newer theories (such as response-deprivation
theory and behavior-regulation theory)
– Applications in real-life situations.
Applications of Premack
• Homme, deBaca, Devine, Steinhorst and Rickert
(1963)
– 3-year-old nursery-school children: want them sitting
quietly in chairs
– Verbal instructions ineffective
• High probability behaviors:
– Running around the room, screaming, pushing chairs,
or quietly working on jigsaw puzzles
• Low probability behaviors:
– Sitting in their chairs
• Make the high probability behaviors contingent
on the low probability behaviors:
– Contingency: Sit and attend → Bell + "Run and
Scream!"
Homme, deBaca, Devine, Steinhorst and Rickert (1963)
• Later: Tokens earned for low-probability
behaviors could be used to buy high-probability
behaviors
• Control was virtually perfect after just a few
days.
• "In summary, even in this preliminary,
unsystematic application, the Premack
hypothesis proved to be an exceptionally
practical principle for controlling the behavior of
nursery school Ss."
Token Economies
• Token economy programs
– developed for the rehabilitation of long-term schizophrenic
patients
• The behavior of such patients can be brought under
reinforcement control (Ayllon & Azrin, 1965; Atthowe & Krasner,
1968; Winkler, 1970).
• But all report patients who failed to respond to the token
regime.
– 18% of Ayllon and Azrin's
– 10% of Atthowe and Krasner's
• Despite the use of multiple reinforcers, reinforcer
sampling and reinforcement exposure (Ayllon & Azrin,
1968) designed to increase the utilization of back-up
reinforcers, these patients did not work for the available
rewards.
Mitchell & Stoffelmayr (1973)
• "Application of the Premack Principle to the
behavioral control of extremely inactive
schizophrenics"
• The ward with the most severely ill chronic
patients was chosen. The ongoing treatment
included industrial therapy, ward domestic work,
and weekly group discussions.
• Selected the 4 most inactive patients.
– Identified items that could be used as reinforcers:
candies, cigarettes, fruit, and cookies.
– For two patients, cigarettes and fruit were used
to maintain working
• Found no dispensable reinforcers for WL and PM
– Used the Premack principle directly
[Figure: number of 30-s intervals in which an instance of work occurred in 30 minutes]
• WL and PM were used in tests of Premack theory
Mitchell & Stoffelmayr (1973)
• Instructions + reinforcement sessions.
– Response that occurred freely with very high frequency was
sitting, which was used for both patients.
• Shaping sessions: experimenter approached the patient
and asked him to stand.
– If the patient remained seated, the experimenter would tip the
subject's chair forward until the patient stood up.
– The patient was then given a coil and, if they removed some wire, they
got to sit down.
– Repeat after 90 seconds, gradually increasing response
requirement
– By Session 14, WL was removing three coil wrappings before
reinforcement was given, while PM achieved this by Session 17.
– In the subsequent sessions, the patients were allowed to remain
seated while working. Reinforcement became resting.
Inactive catatonic schizophrenic patients
• "The present results suggest that even the
most severely inactive patient will respond
to a reinforcement regime. The strict
application of Premack's principle then
may have considerable therapeutic
application for those patients, who in
refusing to accept any tangible reward, do
not respond to the token regime."
Changing Relations: Premack (1969)
• The following responses were offered to
rats:
– Drink 4% sucrose
– Drink 32% sucrose
– Run
• Responses tracked over a session
• Tests: Drinking 32% sucrose reinforced running at the start of the
session, but at the end of the session, the opportunity to run
reinforced drinking 32% sucrose.
Problems with Premack
• Response probabilities can fluctuate within
a session, and are therefore difficult to measure.
– In an applied setting, we might offer the
clients a choice:
– “What do you want to do after this activity?”
• Research has found that a lower-probability
response can under certain circumstances
reinforce a higher-probability response.
Response-Deprivation Hypothesis
• According to the probability-differential view (the
original version of Premack theory), a lower-probability
response can never reinforce a
higher-probability response.
• A few studies, however, have shown that this
can happen if the animal is prevented from
emitting the lower-probability response at its
baseline level. Any response, therefore, can be
reinforcing (Timberlake & Allison, 1974).
• And you don’t get the reinforcement effect if the
contingent response can be emitted at the
baseline level.
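The deprivation condition is usually written as I/C > Oi/Oc, where the schedule demands I units of the instrumental response for C units of access to the contingent response, and Oi and Oc are the two unconstrained baselines. The sketch below assumes that form. The baselines reuse the Condition 1 values from Premack (1962) purely for illustration, and both schedules are hypothetical.

```python
def is_response_deprived(i_required, c_allowed, o_instrumental, o_contingent):
    """True if the schedule holds the contingent response below baseline.

    Checks I / C > O_i / O_c, cross-multiplied to avoid division.
    """
    return i_required * o_contingent > c_allowed * o_instrumental


# Baselines in seconds: drinking (instrumental) is MORE probable than
# running (contingent), so the probability-differential view predicts
# that running cannot reinforce drinking.
O_DRINK, O_RUN = 240, 50

# Hypothetical schedule A: 30 s of drinking earns 2 s of running.
strict = is_response_deprived(30, 2, O_DRINK, O_RUN)

# Hypothetical schedule B: 4 s of drinking earns 2 s of running.
lenient = is_response_deprived(4, 2, O_DRINK, O_RUN)

print("strict schedule deprives running:", strict)
print("lenient schedule deprives running:", lenient)
```

Under schedule A, drinking at its baseline level would earn only 16 s of running, well below the 50 s baseline, so running is deprived and the hypothesis predicts it can reinforce drinking. Under schedule B, baseline drinking would earn 120 s of running, so there is no deprivation and no reinforcement effect is expected.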
Summary
• Need-reduction and drive-reduction
theories could not provide satisfactory
explanations of reinforcement.
• Premack has changed how reinforcement
is regarded: Instead of a stimulus/event, it
is regarded as a response.
• Response-deprivation hypothesis is one
example of how Premack’s principle has
been refined.