5. Operant Conditioning V2

Reward and Punishment
Cats escape from box to get a treat
At first it's all trial and error
When successful, the behaviour is rewarded
This good consequence strengthens the behaviour
Law of effect – behaviour with a good consequence is more
likely to be repeated; behaviour with a bad consequence is not
Instrumental learning – the cat is active in
achieving its own escape and reward
A learning process by which the likelihood of a
particular behaviour occurring is determined by
the consequences of that behaviour
Theory of Operant Conditioning – behaviour
operates on the environment, and our behaviour
is instrumental in producing the consequences (rewards/punishments)
US psychologist Burrhus Frederic (B.F.) Skinner (1904–1990)
referred to the responses observed in
trial and error learning as operants.
Skinner believed behaviour can be reduced to the
relationships between the behaviour, its
antecedents (the events that precede it), and its
consequences.
Operant – a response (or set of responses) that
occurs and acts (“operates”) on the environment
to produce some kind of effect. It is a response
or behaviour that generates consequences.
Operant conditioning is based on
Thorndike’s law of effect: an organism
will tend to repeat a behaviour (operant)
that has desirable consequences (e.g.
receiving a treat) or that enables it to
avoid undesirable consequences (e.g. being given a
detention). Organisms will tend not to repeat
a behaviour that has undesirable
consequences (e.g. disapproval or a fine).
3 components:
 1. Stimulus (S) that precedes an operant response
 2. Operant response (R) to the stimulus
 3. Consequence (C) to the operant response
Sometimes expressed as: S → R → S
where the second S is a stimulus in the form of a
consequence (C).
The model means the probability of an operant
response (R) to a stimulus (S) is a function of (depends
on) the consequence (C) that has followed (R) in the
past.
e.g. For a cat in a puzzle box, S is the box, R is the sequence of
movements needed to open the door, and C is escape
and food.
See further examples in Table 10.2 (page 479)
Skinner used the term “operant conditioning”
rather than “instrumental learning” as he
wanted to emphasise that animals and
people learn to operate on the environment
to produce desired or satisfying consequences.
He proposed that in Thorndike’s experiments
the cat “operated” on the environment to
allow it to escape and get the fish reward.
The operant that became conditioned was the behaviour
of pushing the lever to open the door.
Skinner also contrasted operants with respondents in
classical conditioning. Respondents are behaviours
produced by known or recognised stimuli.
e.g. Pavlov’s dogs salivated in response to meat powder
and, later, the bell. This is why Skinner referred to classical
conditioning as “respondent conditioning”.
In contrast, Thorndike’s cats made many different
responses that were not prompted by a particular
stimulus. Also, the dog receives a consequence (food) whether
or not it has learned the conditioned response.
In operant conditioning, the consequence occurs only
if the organism performs the response; if the response
is not made, the consequence doesn’t happen. In classical
conditioning, the consequence (e.g. food) occurs regardless of
whether the response is made.
Skinner believed that ALL behaviour could be
explained by the relationships between the
behaviour, its antecedents (events occurring
before it) and its consequences.
Skinner argued that any behaviour that is
followed by a consequence will change in
strength (become more, or less, established)
and frequency (occur more, or less often)
depending on the nature of that consequence
(reward or punishment).
The Skinner Box is a small chamber in which
an experimental animal learns to make a
particular response for which the
consequences can be controlled by the experimenter.
It contains a lever that delivers food (or
water) into a dish when pressed.
Some boxes also have lights and buzzers,
some have grid floors that can deliver a mild
electric shock.
The lever is usually wired to a cumulative
recorder (chart paper with a pen that makes a
special mark each time a desired response is
made).
The recorder indicates how often the animal responds (frequency)
and how quickly it responds (rate of response).
Rats – press lever
Pigeons – peck disc.
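The cumulative recorder described above can be pictured as a running count of responses over time, where a steeper rise means a faster rate of responding. The following is a minimal Python sketch, assuming lever presses are logged as times in seconds; the function names and the press log are hypothetical, made up for illustration.

```python
def cumulative_record(press_times, session_length, bin_size):
    """Cumulative number of responses at the end of each time bin (seconds).

    Like the pen on a cumulative recorder, the count either stays level
    (no responding) or steps up (one step per response); it never goes down.
    """
    record = []
    for bin_end in range(bin_size, session_length + 1, bin_size):
        record.append(sum(1 for t in press_times if t <= bin_end))
    return record


def response_rate(press_times, start, end):
    """Responses per minute between start and end (seconds): the slope of the record."""
    count = sum(1 for t in press_times if start < t <= end)
    return count * 60 / (end - start)


# Hypothetical log of lever presses (seconds into the session).
presses = [5, 12, 50, 55, 58, 61, 64, 70, 72, 75, 78]
print(cumulative_record(presses, session_length=80, bin_size=20))
print(response_rate(presses, 0, 40), response_rate(presses, 40, 80))
```

In this made-up log the record rises slowly at first and then more steeply: it prints `[2, 2, 5, 11]`, and the rate climbs from 3.0 to 13.5 presses per minute.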
Skinner referred to different types of rewards as reinforcers.
He used the Skinner Box to reward the animals
according to different types of programs or
schedules of reinforcement.
The fact that the rats were hungry provided the
motivation for their frantic activity, increasing
the probability the lever would eventually be
pressed and the food reward dispensed.
Skinner believed there was no need to search
for internal agents (factors within an
organism) to explain changes in behaviour.
He based his view on the notion that
behaviour can be understood in terms of
environmental or external influences, without
any consideration of internal mental processes.
Any stimulus (event or action) that
subsequently strengthens or increases the
likelihood of the response (behaviour) that it follows.
The reinforcer comes after the response
Reinforcement makes things stronger
Reinforcement can involve receiving a
pleasant stimulus (e.g. a treat for your dog) or
avoiding or escaping an unpleasant stimulus
(e.g. using an umbrella to stay dry on a rainy day).
An essential feature of reinforcement is that
it is only used after the desired or correct
response is made.
A reinforcer is any stimulus (object or event) that strengthens or
increases the frequency or likelihood of a response that it follows.
The word reinforcer is often used interchangeably with the word
reward (although they are not technically the same).
One difference is that a reward suggests an outcome that is positive,
such as satisfaction or pleasure.
A stimulus is a reinforcer if it strengthens the preceding behaviour.
Also, a stimulus can be rewarding because it’s pleasurable, but is not a
reinforcer unless it increases the frequency of a response or the likelihood
of a response occurring.
e.g. Eating chocolate is pleasurable but is not a reinforcer unless it
promotes or strengthens a particular response.
Positive Reinforcer
PLUS something GOOD
A stimulus which strengthens a response by
providing a pleasant or satisfying
consequence
Skinner’s experiment = food pellets
Negative Reinforcer
MINUS something BAD
A stimulus that strengthens a response by the
reduction, removal or prevention of an
unpleasant stimulus
The behaviour that removes, reduces or prevents
an unpleasant stimulus is strengthened
Skinner’s experiment = electric shock turned off by pressing the lever
Taking Panadol for a headache
Driving slowly to avoid a fine
Positive reinforcement: add something good
Negative reinforcement: take away something bad
Both STRENGTHEN a response
The overall outcome is desirable to the organism in both cases;
it is just achieved in different ways
Positive punishment – the delivery of an unpleasant stimulus following an
undesirable response
Negative punishment – the removal of a pleasant (valued) stimulus following an
undesirable response
Punisher – an unpleasant stimulus that, when paired with a response,
weakens the response or decreases the rate of responding over time
Punishers reduce unwanted behaviour
It is usually more effective to reinforce alternative desirable behaviour
than it is to punish undesirable behaviour
Negative punishment is often referred to as
response cost
A valued stimulus is removed
e.g. If you drink drive, we will take away your licence
The way reinforcement is delivered is referred
to as the “schedule of reinforcement”.
It is a program for giving reinforcement,
specifically the frequency and manner in
which a desired response is reinforced.
The schedule influences the speed of learning
and the strength of the learned response.
Continuous reinforcement (every correct response reinforced) is best for
a response to become learned quickly
Partial reinforcement (only some correct responses reinforced) can be more effective
at maintaining a response
Fixed Ratio
 Reinforcement after a fixed number of correct responses
 Being paid $5 for every 100 newspapers delivered
Variable Ratio
 Reinforcement after a variable (unpredictable) number of correct responses
 Poker machines
Fixed Interval
 Reinforcement for the first correct response after a fixed time period
 Teachers at Gleneagles get paid every fortnight
Variable Interval
 Reinforcement for the first correct response after a variable time period
 Fishing
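Each of the four schedules is a rule deciding whether a given correct response earns a reinforcer. The following is a minimal Python sketch, assuming responses arrive at known times (in seconds) and that the variable schedules draw their requirements uniformly around a chosen mean; the class names and parameters are illustrative assumptions, not a standard implementation.

```python
import random


class FixedRatio:
    """FR: reinforce every n-th correct response (e.g. paid per 100 papers delivered)."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        # t (time) is ignored: ratio schedules depend only on the number of responses.
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR: reinforce after an unpredictable number of responses averaging `mean`."""

    def __init__(self, mean, rng):
        self.mean = mean
        self.rng = rng
        self.count = 0
        self.required = self._next_requirement()

    def _next_requirement(self):
        # Drawn uniformly from 1..(2*mean - 1), which averages `mean`.
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._next_requirement()
            return True
        return False


class FixedInterval:
    """FI: reinforce the first response once `interval` seconds have passed."""

    def __init__(self, interval):
        self.interval = interval
        self.available_at = interval

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self.interval  # timer restarts after reinforcement
            return True
        return False  # responses before the interval ends earn nothing


class VariableInterval:
    """VI: like FI, but the waiting time varies unpredictably around `mean` seconds."""

    def __init__(self, mean, rng):
        self.mean = mean
        self.rng = rng
        self.available_at = self._next_wait()

    def _next_wait(self):
        # Drawn uniformly from 0..2*mean seconds, which averages `mean`.
        return self.rng.uniform(0, 2 * self.mean)

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self._next_wait()
            return True
        return False


def count_reinforcers(schedule, response_times):
    """Number of responses (made at the given times) that earn a reinforcer."""
    return sum(schedule.respond(t) for t in response_times)


rng = random.Random(1)
times = range(1, 121)  # one response per second for two minutes
for name, schedule in [
    ("FR-10", FixedRatio(10)),
    ("VR-10", VariableRatio(10, rng)),
    ("FI-30s", FixedInterval(30)),
    ("VI-30s", VariableInterval(30, rng)),
]:
    print(name, count_reinforcers(schedule, times))
```

From the same 120 responses, FR-10 earns 12 reinforcers and FI-30s earns 4; the variable schedules' counts depend on the random draws. This sketch only decides when reinforcement is delivered. It does not model the organism's learning, so on its own it cannot show which schedule is most resistant to extinction.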
The variable ratio schedule is the most
resistant to extinction
It leads to the fastest rate of responding
Gambling addiction can be explained by
variable ratio reinforcement
Order of presentation – reinforcement needs to
occur after the desired response, not before, so that the
organism associates the reinforcement with the
response.
Timing – reinforcers need to occur as close in time to
the desired response as possible. The most effective
reinforcement occurs immediately after the desired
response.
Appropriateness of the reinforcer – For a stimulus to
be a reinforcer it must provide a pleasing or satisfying
consequence for its recipient.
Reinforcers that work in one situation will not always
work in another.
The characteristics of the individual involved and the
particular situation need to be taken into account
when deciding on the best kind of reinforcer to be
used.
An inappropriate punisher can have the opposite
effect and produce the same consequence as a
reinforcer (e.g. a verbal reprimand from a
teacher to an attention-seeking, talkative Year 8
student can act as a reinforcer for the talkative
behaviour).
Punishment may temporarily decrease the
occurrence of unwanted responses or behaviour,
but it doesn’t promote more desirable or
appropriate behaviour in its place.
So, instead Skinner advocated for the greater
use of positive reinforcement to strengthen
desirable behaviours or to promote the learning
of alternative behaviours to punishable ones.
Same key processes as in classical
conditioning
Acquisition – the overall learning process during
which a specific response, or pattern of
responses, is established.
THE MEANS by which this is acquired is different
between operant and classical conditioning.
TYPES OF BEHAVIOURS acquired through
operant conditioning are usually more complex
than the reflexive involuntary responses in
classical conditioning.
Acquisition in operant conditioning is the
establishment of a response through reinforcement.
Speed of establishment of response depends on the
schedule of reinforcement.
Sometimes, a behaviour to be acquired is too complex
to be performed completely at the start of the
acquisition process, so a simpler version of the
behaviour or a step towards the target behaviour is
attempted and reinforced continuously until it is
established. This involves a procedure called shaping.
Extinction – the gradual decrease in the
strength or rate of responding after a period of
non-reinforcement. Extinction occurs after the
termination of reinforcement.
Extinction has occurred when a conditioned
response is no longer present.
Depending on whether partial or continuous
reinforcement has been used, the response rate
may actually increase in the initial phase of
extinction after reinforcement is stopped.
There is often reluctance to stop the response
altogether, as it has had satisfying consequences.
Frustration and anger may also accompany
the increased response rate.
Extinction is less likely to occur when partial
reinforcement is used. Uncertainty leads to a
greater tendency for the response to continue.
Spontaneous recovery – the response is (after a
rest period) again shown in the absence of
reinforcement.
The recovered response is likely to be weaker and will probably
not last very long.
A spontaneously recovered response is often
stronger when it occurs after a lengthy period
following extinction of the response than when
it occurs relatively soon after extinction.
Stimulus generalisation - occurs
when the correct response is made
to another stimulus which is similar
to the stimulus for which
reinforcement is obtained.
Response usually occurs at a
reduced level (frequency and
strength), e.g. pigeons pecked at other
coloured lights.
Stimulus discrimination – the organism makes a response to a
stimulus for which reinforcement is
obtained but not to any other
similar stimulus (e.g. sniffer dogs
used by drug detection units)
Shaping – a strategy in
which a reinforcer is
given for any response
that successively
approximates and
ultimately leads to the
final desired response
Used to train behaviours
that are unlikely to occur
Also known as the method
of successive approximations
Used when the desired
response has a low
probability of occurring
Used in real life – dolphins
at SeaWorld for
entertainment purposes,
search and rescue dogs’
tracking skills, guide dogs.
Learning to write
Children learning to swim
Monkeys trained to assist
quadriplegics (Read Box
10.7 on page 499)
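Shaping can be sketched as a reinforcement criterion that starts easy and is gradually raised towards the final desired response. The following is a minimal Python sketch, assuming each attempt can be scored for how closely it approximates the target (0 to 100) and that the criterion rises by a fixed step after every reinforced attempt. The scoring, the `step` size and the function name are hypothetical simplifications; in practice a trainer raises the criterion once each approximation is established.

```python
def shape(scores, start_criterion, target, step):
    """Reinforce successive approximations to a target behaviour.

    scores: how closely each attempt approximates the final desired
    response (hypothetical 0-100 scale). An attempt is reinforced only if it
    meets the current criterion; each reinforcement raises the criterion by
    `step`, up to `target`, so earlier, cruder approximations stop earning
    reinforcement.
    """
    criterion = start_criterion
    outcomes = []
    for score in scores:
        if score >= criterion:
            outcomes.append(True)  # reinforce this approximation
            criterion = min(criterion + step, target)
        else:
            outcomes.append(False)  # not close enough yet: no reinforcer
    return outcomes, criterion


attempts = [10, 25, 22, 35, 50, 48, 65, 80, 100, 100]
outcomes, final_criterion = shape(attempts, start_criterion=20, target=100, step=20)
print(outcomes)
print(final_criterion)
```

Here the 35-point attempt would have been reinforced at the start, but it earns nothing once the criterion has risen to 40; by the end only the full target behaviour (100) is reinforced.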
Behaviour modification – the consistent use of
operant conditioning to
alter behaviour over
time
 Use of tokens as
rewards that can be
‘cashed in’ for bigger
rewards later
 Schools
 Prisons
Token Economies are a
form of behaviour
modification using
reinforcement tokens to
influence behaviour
E.g. Prisons – tokens
cashed in for rewards
such as cigarettes or
A token economy is a
setting in which an
individual receives
tokens (reinforcers) for
desired behaviour and
these tokens can then be
collected and exchanged
for other reinforcers in
the form of actual or
“real” rewards.
E.g. Prisons, schools
Tokens may be
withdrawn as “penalties”
for undesirable behaviour
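The token economy described above can be sketched as a simple ledger. Desired behaviours earn tokens, undesirable behaviours can cost tokens (a form of response cost), and tokens are exchanged for backup ("real") rewards. The following is a minimal Python sketch; the behaviours, token values, prices and the person "Sam" are hypothetical, not taken from any actual program.

```python
class TokenEconomy:
    """Hypothetical token economy ledger."""

    def __init__(self, earn_rates, fines, prices):
        self.earn_rates = earn_rates  # desired behaviour -> tokens earned
        self.fines = fines            # undesirable behaviour -> tokens withdrawn
        self.prices = prices          # "real" reward -> token cost
        self.balances = {}

    def record(self, person, behaviour):
        """Update and return a person's token balance after a behaviour."""
        balance = self.balances.get(person, 0)
        if behaviour in self.earn_rates:
            balance += self.earn_rates[behaviour]  # token given as a reinforcer
        elif behaviour in self.fines:
            # Penalty (response cost), never taking the balance below zero.
            balance = max(0, balance - self.fines[behaviour])
        self.balances[person] = balance
        return balance

    def exchange(self, person, reward):
        """Cash in tokens for a reward; return True if the person could afford it."""
        cost = self.prices[reward]
        balance = self.balances.get(person, 0)
        if balance < cost:
            return False
        self.balances[person] = balance - cost
        return True


economy = TokenEconomy(
    earn_rates={"finished reading": 2, "helped a classmate": 1},
    fines={"disrupted class": 1},
    prices={"free time": 5},
)
for behaviour in ["finished reading", "finished reading", "disrupted class", "helped a classmate"]:
    economy.record("Sam", behaviour)
print(economy.exchange("Sam", "free time"))  # 4 tokens: not enough yet
economy.record("Sam", "finished reading")
print(economy.exchange("Sam", "free time"))  # 6 tokens: exchange succeeds
print(economy.balances["Sam"])
```

This prints `False`, `True` and `1`. Keeping tokens separate from the rewards they buy is what lets a tutor hand out reinforcement immediately, even when the "real" reward can only be given later.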
Advantage of tokens:
can be used in large
group situations where
real rewards are difficult
to administer
immediately after a
desired behaviour
Once desired behaviour
is established, tokens can
be phased out and
replaced by more
“natural” and easily
administered reinforcers
(e.g. Praise, smile).
E.g. Schools – to
increase reading by
students, improve social
skills of students with
intellectual disabilities.
Sometimes token
economies backfire or fail.
People may feel
manipulated and refuse to cooperate.
Situations are so complex
and uncontrolled that well
planned programs can go
wrong. (e.g. Not smiling
when delivering reinforcer)
Operant conditioning
procedures may fail also
when the underlying cause
of a behaviour is not addressed.
e.g. Rewarding
cheerfulness when the
gloominess is caused by a
boring job – the solution
may lie in changing jobs.