PSY 402
Theories of Learning
Chapter 7 – Behavior & Its Consequences
Instrumental & Operant Learning
Stimulus Control

Skinner discovered that stimuli (cues) provide
information about the opportunity for
reinforcement (reward).



The stimulus sets the occasion for the behavior.
Fading – gradually transferring stimulus
control from a simple stimulus to a more
complex one.
Operant behavior is controlled by both stimuli
and reinforcers.
Discriminative Stimuli



Discriminative stimuli play a role similar to
“occasion setters” in classical conditioning (see Chap. 5).
The stimulus that signals the opportunity for
responding and gaining a reward is the SD.
The stimulus that signals the absence of that
opportunity is the SΔ (S-delta).
7.6 Simple demonstration of stimulus control (SD: the light signals that reward is available; SΔ: no light, no reward)
Types of Reinforcers

Primary reinforcer – stimuli or events that
reinforce because of their intrinsic properties:


Food, water, sex
Secondary reinforcer – stimuli or events that
reinforce because of their association with a
primary reinforcer:


Money, praise, grades, sounds (clicks)
Called conditioned reinforcers.
Behavior Chains



Secondary (conditioned) reinforcers reward
intermediate steps in a chain of behavior
leading to a primary reinforcer.
Secondary reinforcers can also be
discriminative stimuli that set the occasion for
more responding.
Classical conditioning is the “glue” that holds
together chains of behavior leading to a goal.
7.7 A behavior chain: Each response produces a stimulus
Schedules of Reinforcement

Behavior is recorded continuously on a drum
recorder.




A cumulative graph shows the rate of responding
over time.
The steepness of the line indicates how quickly
the rat is responding.
Hash marks indicate when reward was given.
CRF (Continuous reinforcement) – the rat is
rewarded every time it does the behavior.
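To make the cumulative record concrete, here is a minimal sketch in Python (the response times and reward markers are invented for illustration) that prints a cumulative record from a list of response timestamps; the slope over any stretch is the rate of responding, and reinforced responses get the hash marks mentioned above.

```python
# Minimal sketch of a cumulative record (Python; response times are invented).
# Each response raises the record by one step, so the slope of the curve over
# any stretch of time is the rate of responding.

response_times = [1, 2, 3, 5, 8, 9, 10, 11, 15, 20]   # seconds (hypothetical)
rewarded_responses = {3, 7}                            # indices of reinforced responses

count = 0
for i, t in enumerate(response_times, start=1):
    count += 1
    hash_mark = "  <-- reward (hash mark)" if i in rewarded_responses else ""
    print(f"t = {t:>2} s   cumulative responses = {count}{hash_mark}")

# Overall rate = responses / elapsed time; a steeper record means faster responding.
print("rate ≈", round(count / (response_times[-1] - response_times[0]), 2), "responses/s")
```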
7.8 Cumulative recorder (Part 1: the drum recorder)
7.8 Cumulative record: results (Part 2)
Ratio Schedules

Fixed Ratio (FR) – a fixed number of responses
is required for each reward.



The rat is rewarded for every xth behavior.
FR-15 means the rat gets one reward for every 15
behaviors (e.g., bar presses).
Variable Ratio (VR) – the number of
responses needed varies, but averages out to a
particular ratio.

VR-15 – the ratio varies but averages to 1:15.
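A hedged sketch, in Python, of how the two ratio schedules decide when to deliver reward (the function names and the way the VR requirement is drawn are illustrative assumptions): FR counts a fixed number of responses, while VR redraws the requirement around the average after every reward.

```python
import random

# Fixed Ratio: every Nth response is rewarded.
def fr_schedule(n):
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:          # the Nth response since the last reward
            count = 0
            return True         # deliver reward
        return False
    return respond

# Variable Ratio: the required number of responses varies,
# but averages out to n (here drawn uniformly between 1 and 2n - 1).
def vr_schedule(n):
    target = random.randint(1, 2 * n - 1)
    count = 0
    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = random.randint(1, 2 * n - 1)
            return True
        return False
    return respond

# FR-15 and VR-15: simulate 600 bar presses and count rewards.
fr15, vr15 = fr_schedule(15), vr_schedule(15)
print(sum(fr15() for _ in range(600)))   # exactly 600 / 15 = 40 rewards
print(sum(vr15() for _ in range(600)))   # roughly 40, since the ratio averages to 15
```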
Interval Schedules

Fixed Interval (FI) – rewards are given for the
first response after a given amount of time has
passed.


FI-15 means the first response made after 15
minutes have passed earns one reward.
Variable Interval (VI) – rewards are given
after varying amounts of time that average to
a particular interval.

VI-15 means one reward for the first response after an average of 15 min.
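A matching sketch for interval schedules, again with illustrative names and numbers: here a clock, not a response counter, arms the reward, and only the first response after the interval has elapsed collects it.

```python
import random

# Fixed Interval: the first response made after `interval` time units has
# elapsed since the last reward is reinforced; earlier responses earn nothing.
def fi_schedule(interval):
    armed_at = interval
    def respond(now):
        nonlocal armed_at
        if now >= armed_at:
            armed_at = now + interval   # restart the clock from this reward
            return True
        return False
    return respond

# Variable Interval: the same, but each interval is redrawn at random
# so that it averages out to `interval`.
def vi_schedule(interval):
    armed_at = random.uniform(0, 2 * interval)
    def respond(now):
        nonlocal armed_at
        if now >= armed_at:
            armed_at = now + random.uniform(0, 2 * interval)
            return True
        return False
    return respond

# FI-15 vs VI-15 (minutes): a rat pressing once per minute for 600 minutes
# earns about 600 / 15 = 40 rewards on either schedule -- extra presses
# within an interval would not earn more, unlike a ratio schedule.
fi15, vi15 = fi_schedule(15), vi_schedule(15)
print(sum(fi15(t) for t in range(1, 601)))
print(sum(vi15(t) for t in range(1, 601)))
```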
7.9 Cumulative records showing typical responding on different schedules of reinforcement (annotations mark the pause after each reward, inconsistent responding, and the best and worst schedules)
Effects of Schedule on Behavior



FR leads to steady responding, but post-reinforcement
pauses occur after each reward.
VR leads to a high rate of responding with no
pauses – never know when reward will occur.
FI leads to behavior right before the end of
each interval, with goofing off in between.


Scallops in the cumulative record
VI leads to the lowest rate of responding.
Compound Schedules

Multiple schedules – two or more schedules
alternate, each signaled by a different SD.


Chained schedules – completion of one leads
to the beginning of a new schedule (with SD).


Mixed schedules – schedules alternate but no
stimulus signals which type is being used.
Tandem schedules – like chained, but with no SD.
Differential high/low responding – the schedule
specifies both the behavior and a deadline (interval).
Choice


Concurrent schedules – two different types of
behavior are offered, each with its own
schedule of reinforcement.
Behavior on concurrent schedules follows
Herrnstein’s Matching Law.



The proportion of behavior allocated to a choice
matches the proportion of reinforcement obtained from it.
B1 / (B1 + B2) = R1 / (R1 + R2) or
B1 / B2 = R1 / R2
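A quick worked example of the matching law with made-up reinforcement rates: if choice 1 earns 40 rewards per hour and choice 2 earns 20, the law predicts about two-thirds of all responding goes to choice 1.

```python
# Herrnstein's matching law with hypothetical reinforcement rates.
R1, R2 = 40, 20                  # rewards per hour obtained from choices 1 and 2

share_on_1 = R1 / (R1 + R2)      # B1 / (B1 + B2) = R1 / (R1 + R2)
ratio      = R1 / R2             # B1 / B2 = R1 / R2

print(share_on_1)   # 0.666... -> about 2/3 of all behavior goes to choice 1
print(ratio)        # 2.0      -> choice 1 gets twice as many responses as choice 2
```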
7.10 Matching on concurrent VI VI schedules (Part 1)
7.10 Matching on concurrent VI VI schedules (Part 2)
7.11 Matching in humans
The Law Works for Reward Size


The amount of responding is proportionate to
the relative reward sizes.
If V1 and V2 are different reward sizes, then



B1 / B2 = V1 / V2
The Matching Law says nothing about what
people or rats are thinking.
Melioration – a strategy of shifting behavior toward
whichever choice currently pays better, until the
local rates of reward are equal.
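The slides do not show melioration in action, so this is a sketch under a simplifying assumption: on concurrent VI schedules the local payoff of a choice is roughly its reward rate divided by the behavior spent on it, and repeatedly shifting a little behavior toward whichever choice currently pays better ends up at the matching allocation.

```python
# Melioration sketch (illustrative assumptions, not the textbook's own simulation):
# local reward per unit of behavior on a concurrent VI schedule is approximated
# as R_i / B_i, and behavior drifts toward whichever choice currently pays better.

R1, R2 = 40, 20          # programmed reward rates (rewards per hour)
B1, B2 = 50.0, 50.0      # start by splitting behavior evenly

for _ in range(2000):
    local1, local2 = R1 / B1, R2 / B2        # current payoff per response
    step = 0.05
    if local1 > local2:                       # choice 1 pays better right now
        B1, B2 = B1 + step, B2 - step
    elif local2 > local1:                     # choice 2 pays better right now
        B1, B2 = B1 - step, B2 + step

print(round(B1 / (B1 + B2), 3))   # ≈ 0.667 = R1 / (R1 + R2): matching
```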
A Law for One Choice

If the total amount of behavior (B1 + B2) is K,
then the rate of responding to a single choice
(B1) is:



B1 = K x R1 / (R1 + Ro)
Ro is the reinforcement rate for some other choice
(the reward for doing something else).
This is called the Quantitative Law of Effect
because it predicts the amount of responding.
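A short numeric illustration of the quantitative law of effect, with made-up values for K, R1, and Ro: raising Ro (reward available from doing other things) lowers B1 even though R1 itself never changes, which is the point of the next slide.

```python
# Quantitative law of effect: B1 = K * R1 / (R1 + Ro), with hypothetical numbers.
def response_rate(K, R1, Ro):
    return K * R1 / (R1 + Ro)

K  = 100   # maximum total behavior
R1 = 40    # reinforcement rate for the behavior of interest

for Ro in (10, 40, 120):                       # reinforcement from everything else
    print(Ro, round(response_rate(K, R1, Ro), 1))
# Ro = 10  -> 80.0   little else to do: most behavior goes to B1
# Ro = 40  -> 50.0
# Ro = 120 -> 25.0   a richer environment pulls behavior away from B1
```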
7.12 Response rates of six pigeons, each one tested with several different VI schedules
Implications of the Law

According to the Law, a particular behavior
can be weakened by providing rewards for
other behaviors in the environment.



Drug abuse is more likely for people who have
little other reward in their lives.
Problems can be prevented by making sure there
are reinforcers for pro-social behaviors.
More positive environments can be built.
Impulsiveness

Delayed gratification – the willingness to set
aside an immediate reward in favor of a long-term, larger reward.



People find this difficult to do.
Self-control = delaying gratification.
Impulsive behavior is more likely when small
rewards are imminent (immediate, salient).
7.13 Self-control increases as the time between a choice and reward increases
7.14 The present value of a delayed reward depends on how far you are from the reward in time (just before the small reward arrives, it is worth more than the larger delayed reward)
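The slides do not give a discounting equation, so the sketch below assumes the hyperbolic form V = A / (1 + kD) that is commonly used for delay discounting (the amounts, delays, and k are invented); it reproduces the preference reversal in Figure 7.14: far from both rewards the larger one is worth more, but just before the small reward arrives it wins.

```python
# Hedged sketch: the slides give no equation, so this assumes the common
# hyperbolic discounting model V = A / (1 + k * D); amounts, delays, and k are invented.
def present_value(amount, delay, k=0.5):
    return amount / (1 + k * delay)

small, large = 10, 30          # small reward arrives first, large one 10 days later
for days_to_small in (0, 5, 20):
    v_small = present_value(small, days_to_small)
    v_large = present_value(large, days_to_small + 10)
    choice = "small (impulsive)" if v_small > v_large else "large (self-control)"
    print(f"{days_to_small:>2} days from the small reward: choose {choice}")
# 20 and 5 days out the larger delayed reward is worth more; at 0 days the
# small immediate reward wins -- the preference reversal shown in Figure 7.14.
```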
Hot and Cold Thoughts


Imagining the desirable qualities of an
immediate reward undermines self-control.
Distraction by thinking about something
unrelated supports self-control.



Drug abusers have difficulty with self-control.
Impulsivity may be domain-specific (depending on
the kind of reward involved).
Although mentalistic, “self-control” is defined
in terms of specific behaviors and choices.
Behavioral Economics


Not all reinforcers are alike – substitutability
is a continuum (varies).
Demand curve – how consumption changes as
price rises.


Consumption of elastic commodities falls sharply
with price; consumption of inelastic ones
(necessities) changes little.
Reinforcers can be substitutes, independents,
or complements, depending on their demand
curves.
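A small sketch of reading a demand curve, with invented consumption figures (candy and insulin stand in for a typically elastic and a typically inelastic commodity): elasticity compares the percentage change in consumption to the percentage change in price.

```python
# Demand elasticity from two (price, consumption) points -- numbers are invented.
def elasticity(p1, q1, p2, q2):
    pct_q = (q2 - q1) / q1        # percentage change in consumption
    pct_p = (p2 - p1) / p1        # percentage change in price
    return pct_q / pct_p

# Price rises 10% in both cases (1.00 -> 1.10).
candy   = elasticity(1.00, 100, 1.10, 80)   # consumption drops 20%: elastic (|e| > 1)
insulin = elasticity(1.00, 100, 1.10, 99)   # consumption drops 1%: inelastic (|e| < 1)
print(round(candy, 1), round(insulin, 1))   # -2.0  -0.1
```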
7.15 These curves describe the demand for a commodity as a function of its price
7.16 Demand for two commodities as one of them increases in price
Theories of Reinforcement

Drive Reduction – Hull



Reinforcement occurs when the consequence of
behavior reduces a drive (hunger, thirst).
Not everything reinforcing reduces a drive, and
some reinforcers increase drives (stimulation).
Premack’s Principle – behaviors can be
reinforcers (not just stimuli such as food).

The chance to perform a preferred (higher-probability) behavior can reward a less preferred one.
Problems with Premack’s Principle



Prior preferences are important to the theory,
but how can they be determined in advance?
Restricting a behavior creates a void that must be
filled with some other activity; this, rather than
reward, may account for the observed increase.
Access to even a less-preferred but restricted
behavior can be reinforcing.

The reinforcer need not be a preferred behavior.
Behavioral Regulation Theory

Response deprivation theory – every behavior
has a natural level (the amount someone
wants to do if there are no restrictions).


A behavior will be rewarding if restricted below
the natural level.
Also called behavioral regulation theory.
Blisspoint


The blisspoint is the amount of each of two
behaviors someone would do if unrestricted.
Minimum distance model – someone will do
enough of each of two behaviors to get as
close as possible to the blisspoint.


When a reinforcement schedule makes the two
behaviors contingent, behavior settles at the point
on the schedule line that is the shortest
(perpendicular) distance from the blisspoint.
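A minimal geometric sketch of the minimum distance model under illustrative assumptions: a ratio schedule that requires a units of behavior 1 for every b units of behavior 2 confines behavior to a line through the origin, and the predicted outcome is the point on that line closest to the blisspoint.

```python
# Minimum distance model sketch (Python; numbers are illustrative).
# A ratio schedule "a units of behavior 1 for every b units of behavior 2"
# confines behavior to the line through the origin with direction (a, b);
# the model predicts the point on that line closest to the blisspoint.

def closest_point_on_schedule(blisspoint, a, b):
    x0, y0 = blisspoint
    t = (a * x0 + b * y0) / (a * a + b * b)   # perpendicular projection onto the line
    return (a * t, b * t)

blisspoint = (10, 50)    # freely chosen: 10 lever presses, 50 minutes of wheel running
a, b = 5, 1              # schedule: 5 presses are required for each minute of running

print(closest_point_on_schedule(blisspoint, a, b))
# ≈ (19.2, 3.8): the rat presses more, and runs less, than it would at the blisspoint.
```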
7.17 Reinforcing effect as a function of the rat's preference for each reinforcer
7.18 The minimum distance model
7.19 Number of responses to reach “bliss point” depends on the reinforcement schedule
Selection by Consequences



Reinforcers select behaviors by weeding out
the ones that are less efficient in obtaining
rewards.
Skinner called this “selection by
consequences.”
A process similar to evolution encourages
some behaviors and leads to extinction of
others, shaped by consequences of actions.