OPERANT CONDITIONING

Many of the behaviours in animals and humans
cannot be explained in terms of classical
conditioning.

Many complex behaviours appear to be voluntary,
goal-directed and governed by anticipated
consequences or rewards.

Different principles are needed to explain how
complex, goal-orientated behaviour is learned and
changed.
TRIAL AND ERROR LEARNING

Trial and error learning describes an organism’s
attempts to learn, or to solve a problem, by trying
alternative possibilities until a correct solution or
desired outcome is achieved.

It involves a number of attempts (trials) and a
number of incorrect choices (errors) before the
correct behaviour is learned.

Once learned, the behaviour will usually be
performed quickly and with few errors.

Sometimes referred to as:
-instrumental learning, because the individual is
instrumental in learning the correct response
-operant conditioning, because the individual
operates on the environment to solve the
problem.

Trial and error learning involves motivation,
exploration, incorrect and correct responses, and
reward.

Receiving a reward of some kind leads to the
repeated performance of the correct responses,
strengthening the association between the
behaviour and its outcome.
The number maze and the learning
curve





Negotiate the maze by drawing a line between
each consecutive number starting at number 1.
You will be given 1 minute for each maze.
Repeat the procedure for each of the 10 mazes.
Record the number you reached in each maze
in the time allowed.
Plot a graph of these numbers against trial
number (trials 1 to 10).
1. What is the shape of the graph?
2. How is the shape of this graph different
from the shapes of graphs obtained by
Thorndike?
3. Work out how long, on average, it took to
get from number to number in the first
trial compared with the last trial
(60 seconds ÷ the number reached).
4. What was the reinforcement that caused
learning to occur in this case?
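Working out the time per number is simple division: 60 seconds divided by the number reached, treating the number reached as a rough count of steps. The Python sketch below is a minimal illustration under that assumption; the name `seconds_per_number` and the ten scores are hypothetical, not real results.

```python
# Sketch: average seconds per number for the number-maze exercise.
# Assumes each trial lasts 60 seconds and the score is the number reached.
# The scores below are hypothetical, for illustration only.

TRIAL_SECONDS = 60


def seconds_per_number(number_reached: int) -> float:
    """Approximate time taken to move from one number to the next."""
    if number_reached <= 0:
        raise ValueError("number_reached must be positive")
    return TRIAL_SECONDS / number_reached


scores = [12, 18, 22, 25, 29, 31, 33, 34, 36, 37]  # hypothetical trials 1-10

first = seconds_per_number(scores[0])
last = seconds_per_number(scores[-1])
print(f"Trial 1: {first:.2f} s per number")
print(f"Trial 10: {last:.2f} s per number")
```

With these made-up scores the time per number falls from 5.00 seconds in trial 1 to about 1.62 seconds in trial 10, the kind of improvement a learning curve shows.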
THORNDIKE’S EXPERIMENT
WITH CATS

Around the turn of the twentieth century, at
about the same time that Pavlov was investigating
the digestive system of dogs, Edward Thorndike
was performing experiments that would form the
basis of operant conditioning.

In Thorndike’s puzzle box experiment, he would
place a cat inside a puzzle box and put a fish
outside the box. The idea was to observe and
time the cat’s attempts to escape the box and get
to the fish.

At first the cat showed a wide range of random
behaviours in attempting to escape the box, until
it accidentally stepped on a lever in the middle
of the box, which released the door.

The cat’s behaviour gradually became less random.

Each time it was put in the box the cat would
escape a little more quickly, until eventually it
escaped as soon as it was put back in the box.

Because the cat had started with random
behaviour and had gradually learned the solution
to the puzzle box, Thorndike believed that
learning was a trial and error process.

Thorndike found that the animal learned those
behaviours that were followed by pleasant
consequences, while other behaviours were not
repeated. This became known as the law of effect.

The law of effect suggests that behaviours that
lead to positive consequences are repeated and
behaviours that do not lead to positive
consequences are not repeated.

The conditioning process became known as
instrumental conditioning, because behaviour is
instrumental in obtaining rewards.

Although it was formulated to explain goal-directed
behaviour, operant conditioning explains such
behaviour in terms of what has happened in the
past (the consequences of earlier responses)
rather than in terms of future goals.
OPERANT CONDITIONING

The term ‘operant conditioning’ was not
introduced until years after Thorndike’s
experiments with cats.

This term was coined by Burrhus Frederic (B. F.)
Skinner.

He suggested that an operant is a response (or
set of responses) that occurs and acts on the
environment to produce some kind of effect.

Essentially, an operant is a response or behaviour
that generates consequences.

Before conditioning, an organism might make many
operant responses (for example, the cat clawing
and biting).

Operant conditioning is based on the principle
that an organism will tend to repeat behaviours
that have desirable consequences, or that will
enable it to avoid undesirable consequences.

Furthermore, organisms will tend not to repeat
behaviours which have undesirable consequences.
SKINNER’S EXPERIMENTS
WITH RATS

Skinner created an apparatus called a Skinner
Box.

A Skinner Box is a small chamber in which an
experimental animal learns to make a particular
response for which the consequences can be
controlled by the researcher.

It has a lever which delivers a reward (food)
when pushed.

Some boxes also have lights, buzzers and grid
floors that can deliver mild electric shocks.

The lever is also attached to a cumulative
recorder which tracks the desired responses,
their frequency and speed.

Rats and pigeons were used for these
experiments.

Skinner (1938) conducted a classic experiment to
demonstrate operant conditioning.

When a hungry rat was placed in the box, it would
scurry around, randomly touching the floor and
walls.

Eventually it would accidentally press the lever
on the wall in which case a pellet of rat food
would drop into the food dish and the rat would
eat it.

With additional repetitions of lever pressing
followed by food, the rat’s random movements
began to disappear and were replaced by more
consistent lever pressing.

Eventually the rat was pressing the lever as fast
as it could eat the pellets.

The pellet was a reward for making the correct
response.

Skinner referred to different kinds of rewards
as reinforcers.

Skinner wanted to demonstrate the impact of
reinforcement delivered on different schedules
of reinforcement, e.g. reinforcing every correct
response compared with reinforcing every second
correct response.

Thorndike’s cats could see their reinforcement
from the box they were placed in, so although it
took them many trials to make the correct
response, their motivation was clear.

Skinner’s lab animals came across their
reinforcement by chance.

Skinner used hungry rats so that they would be
motivated to move around the box and press the
lever by chance.
ELEMENTS OF OPERANT
CONDITIONING

Reinforcement is central to operant conditioning,
because learning through operant conditioning
occurs as a result of the consequences of
behaviour.

A response that is rewarded is strengthened,
whereas one that is punished is weakened.
REINFORCEMENT

How do you train a dog?

How do you ensure that you don’t get wet when
walking in the rain?

Reinforcement may involve receiving a pleasant
stimulus (pat/food) or escaping an unpleasant
stimulus (rain).

In either case the outcome is one that is desired
by the organism performing the behaviour.

Reinforcement is the application of a pleasant
stimulus, or the removal of an unpleasant stimulus,
that strengthens or increases the likelihood of
the response it follows.

The term ‘reinforcer’ is often used
interchangeably with the term ‘reward’.

The only difference is that reward suggests an
outcome that is positive, such as satisfaction or
pleasure.

A stimulus is a reinforcer if it strengthens the
preceding behaviour.
SCHEDULES OF
REINFORCEMENT

Reinforcement may be provided on a continuous
schedule (after every correct response) or on a
partial reinforcement schedule (that is, only on
some occasions).

The difference between the two is the speed
with which the response is conditioned and the
strength of the conditioned response.

In the early stages of conditioning, learning is
most rapid if the correct response is reinforced
every time it occurs.

This is known as continuous reinforcement.

Once a correct response consistently occurs, a
different reinforcement schedule can be used to
maintain, increase or strengthen the response.

Responses maintained through a program of
intermittent reinforcement are stronger and are
less likely to weaken or cease than those
maintained by continuous reinforcement.

Partial reinforcement is the process of
reinforcing some correct responses but not all of
them.

The term schedule of reinforcement refers to
the frequency and manner in which a desired
response is reinforced.

Reinforcement can be given after a certain
number of correct responses have been made
(ratio) or for the first correct response after
a certain amount of time has passed (interval).

Reinforcement may be given on a regular basis
(fixed) or it may be unpredictable (variable).

Behaviour that is conditioned on a schedule of
partial reinforcement is generally the most
difficult to change.

Each schedule produces a different effect on the
rate and pattern of a response.
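The four schedules differ only in the rule that decides when a response earns a reinforcer. The Python sketch below expresses each rule as a small class. The class names, the hypothetical learner that responds once per second, and the choice to draw variable requirements uniformly around a mean are illustrative assumptions, not Skinner's actual procedure.

```python
import random


class FixedRatio:
    """Reinforce every n-th response (e.g. FR-5)."""

    def __init__(self, n: int):
        self.n = n
        self.count = 0

    def respond(self, time_s: float) -> bool:
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce after an unpredictable number of responses averaging mean_n."""

    def __init__(self, mean_n: int, rng: random.Random):
        self.mean_n = mean_n
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self) -> int:
        # Assumption: requirement drawn uniformly from 1 .. 2*mean_n - 1.
        return self.rng.randint(1, 2 * self.mean_n - 1)

    def respond(self, time_s: float) -> bool:
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """Reinforce the first response made once interval_s seconds have passed."""

    def __init__(self, interval_s: float):
        self.interval_s = interval_s
        self.available_at = interval_s

    def respond(self, time_s: float) -> bool:
        if time_s >= self.available_at:
            self.available_at = time_s + self.interval_s
            return True
        return False


class VariableInterval:
    """Like FixedInterval, but each wait is unpredictable, averaging mean_s."""

    def __init__(self, mean_s: float, rng: random.Random):
        self.mean_s = mean_s
        self.rng = rng
        self.available_at = self._draw()

    def _draw(self) -> float:
        # Assumption: wait drawn uniformly from 0 .. 2*mean_s.
        return self.rng.uniform(0, 2 * self.mean_s)

    def respond(self, time_s: float) -> bool:
        if time_s >= self.available_at:
            self.available_at = time_s + self._draw()
            return True
        return False


def count_reinforcers(schedule, response_times) -> int:
    """Count how many of the given responses are reinforced."""
    return sum(schedule.respond(t) for t in response_times)


# Hypothetical learner: one correct response per second for 60 seconds.
times = range(1, 61)
rng = random.Random(0)
for name, schedule in [
    ("FR-5", FixedRatio(5)),
    ("VR-5", VariableRatio(5, rng)),
    ("FI-10", FixedInterval(10)),
    ("VI-10", VariableInterval(10, rng)),
]:
    print(name, count_reinforcers(schedule, times))
```

A learner responding once per second earns 12 reinforcers on FR-5 and 6 on FI-10. On the variable schedules the reinforcers arrive at unpredictable points, so the learner cannot tell which response will pay off.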
POSITIVE REINFORCEMENT

A positive reinforcer is a stimulus that
strengthens or increases the likelihood of a
desired response by providing a satisfying
consequence (reward).

Positive reinforcement is the giving or applying
of a positive reinforcer after the desired
response has been made.

Examples:
-the food pellet in the Skinner box
-receiving a good mark after studying hard

NEGATIVE REINFORCEMENT

A negative reinforcer is any unpleasant or
aversive stimulus that, when removed or avoided,
strengthens or increases the likelihood of a
desired response.

Example: a rat in a Skinner box learns to press
the lever because doing so switches off a mild
electric current.

Negative reinforcement is the removal or
avoidance of an unpleasant stimulus. It has the
effect of increasing the likelihood of a response
being repeated.

The important distinction between positive and
negative reinforcement is that positive
reinforcers are given and negative reinforcers
are removed or avoided.

Both procedures lead to desirable consequences.

Examples of negative reinforcement are:
-turning off a scary video
-driving slowly to avoid a speeding fine

If you take a Panadol when you have a headache
and the headache goes away, the behaviour of
taking the Panadol has been negatively
reinforced, and it is likely you will repeat that
behaviour next time you have a headache.

TO REMEMBER:
-positive (+) reinforcer = adding something pleasant
-negative (-) reinforcer = subtracting something
unpleasant (which results in a pleasant or
desirable outcome)
PUNISHMENT

Punishment is the delivery of an unpleasant
stimulus following a response, or the removal of a
pleasant stimulus following a response.

An unpleasant stimulus used as punishment has the
same quality as a negative reinforcer; the
difference is that the punishment is given or
applied, whereas the negative reinforcer is
removed or avoided.

Punishment is designed to weaken a response, or
decrease the probability of that response
occurring again over time.
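The reinforcement and punishment definitions come down to two questions: is a stimulus added or removed after the response, and is it pleasant or unpleasant? A minimal Python sketch of that rule follows. The function and enum names are hypothetical, and the sketch names only the procedure; whether a stimulus actually works as a reinforcer still depends on its effect on behaviour.

```python
from enum import Enum


class Change(Enum):
    ADDED = "added"
    REMOVED = "removed"


def classify_consequence(change: Change, pleasant: bool) -> str:
    """Name the procedure from what happens to a stimulus after a response."""
    if change is Change.ADDED and pleasant:
        return "positive reinforcement"  # e.g. food pellet after a lever press
    if change is Change.REMOVED and not pleasant:
        return "negative reinforcement"  # e.g. headache goes after taking Panadol
    # An unpleasant stimulus added, or a pleasant stimulus removed.
    return "punishment"


print(classify_consequence(Change.ADDED, pleasant=True))
print(classify_consequence(Change.REMOVED, pleasant=False))
print(classify_consequence(Change.ADDED, pleasant=False))
print(classify_consequence(Change.REMOVED, pleasant=True))
```

The four calls print "positive reinforcement", "negative reinforcement", "punishment" and "punishment", matching the definitions above.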
Factors that influence the
effectiveness of reinforcement
and punishment.

Reinforcement is intended to increase the
likelihood of a behaviour being repeated and
punishment is intended to decrease the likelihood
of a behaviour being repeated. Their effectiveness
depends on:
-Order of presentation
-Timing
-Appropriateness
ORDER OF PRESENTATION

To use reinforcement and punishment effectively,
it is important that they are presented after the
response, never before.

The organism can only learn the consequences of a
response if the consequence follows that response.
TIMING

Reinforcement and punishment are most effective when they are
given immediately after the response has
occurred.

This helps the organism to make the association
between the response and the
reinforcer/punishment.

If there is a delay, learning will take longer.

Sometimes, in real life, it is not possible for
consequences to be given immediately.
APPROPRIATENESS

For any stimulus to be a reinforcer, it must be
pleasing or satisfying in some way.

It is not known whether something will be a
reinforcer until after it has been used.

It cannot be assumed that a reinforcer that works
in one situation will work in other situations.

Characteristics of the individual need to be taken
into account.

A stimulus must be appropriate as a punishment;
that is, it must provide a consequence that is
unpleasant, and therefore likely to decrease the
unwanted behaviour.
KEY PROCESSES IN OPERANT
CONDITIONING

The same key processes are involved in both
classical and operant conditioning; however, the
way in which these processes occur is slightly
different in each.
-Acquisition
-Extinction
-Stimulus generalisation
-Stimulus discrimination
-Spontaneous recovery
ACQUISITION

Acquisition refers to the overall learning process,
during which a specific response, or set of
responses, is established.

The behaviours acquired during operant
conditioning are generally more complex than
those acquired during classical conditioning.

In operant conditioning, acquisition is the
establishment of a response through
reinforcement.

Some behaviours that are operantly conditioned
are too complex to be performed completely in
the beginning of the acquisition process.

Instead, simpler versions of the desired
behaviour, or steps towards it, are rewarded.

This is known as shaping.

Shaping is the procedure in which reinforcement
is given for any response that successively
approximates and ultimately leads to the final
desired response, or target behaviour (also
known as the method of successive
approximations).
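Shaping can be read as a procedure: reinforce any response that meets the current approximation, then raise the requirement a little, until the target behaviour is reached. The Python sketch below assumes the behaviour can be scored as a single number, here a hypothetical measure of how high (in cm) a dog lifts its paw while being taught to "shake hands". The function name, the fixed step size and the scores are illustrative assumptions.

```python
def shape(responses, start, target, step):
    """Return True/False for each response: was it reinforced?

    A response is reinforced if it meets the current criterion; after each
    reinforcement the criterion moves one step closer to the target.
    """
    criterion = start
    decisions = []
    for response in responses:
        if response >= criterion:
            decisions.append(True)
            criterion = min(target, criterion + step)
        else:
            decisions.append(False)
    return decisions


# Hypothetical paw heights (cm) over eight trials.
paw_heights = [2, 3, 5, 4, 7, 9, 10, 12]
print(shape(paw_heights, start=3, target=10, step=2))
```

This prints `[False, True, True, False, True, True, True, True]`. The 4 cm lift in trial 4 is not reinforced because, after two reinforced approximations, the criterion has already risen to 7 cm.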
EXTINCTION

In operant conditioning, extinction may also
occur, and the process is similar to its occurrence
in classical conditioning.

Extinction is the gradual decrease in the
strength or rate of a conditioned response
following consistent non-reinforcement of the
response.

Extinction is less likely to occur when partial
reinforcement is used.
SPONTANEOUS RECOVERY

After the apparent extinction of a response,
spontaneous recovery can occur and the organism
will once again show the response in the absence
of any reinforcement.

The recovered response is usually weaker than
the original conditioned response.

A spontaneously recovered response is often
stronger when it occurs after a lengthy period
following extinction of the response, than when it
occurs relatively soon after extinction.
STIMULUS GENERALISATION

In operant conditioning, stimulus generalisation
occurs when the correct response is made to
another stimulus that is similar to the stimulus
that was present when the conditioned response
was reinforced.

The response usually occurs at a reduced level.
STIMULUS DISCRIMINATION

Stimulus discrimination occurs when an organism
makes the correct response to a stimulus and is
reinforced, but does not respond to other
stimuli, even when they are similar.

Skinner trained pigeons to discriminate between
red and green lights and to peck only when they
saw a green light in order to receive
reinforcement.
COMPARISON OF CLASSICAL
AND OPERANT CONDITIONING
Common elements:
-Acquisition
-Extinction
-Spontaneous recovery
-Stimulus discrimination
-Stimulus generalisation
-Association between two events
-Often occur in the same situation


Major differences:
-Operant: emphasis on consequences
-Classical: behaviour does not have environmental
consequences
-Classical: response is involuntary/automatic
-Operant: responses are mostly voluntary
THE ROLE OF THE LEARNER

In classical conditioning the learner is relatively
passive; that is, the response elicited from the
learner occurs automatically.

In operant conditioning the learner must actively
operate on the environment, and this behaviour
produces the reinforcement or the punishment.
TIMING OF THE STIMULUS
AND RESPONSE

In classical conditioning the response depends on
the unconditioned stimulus (UCS) being presented
first.

In operant conditioning the presentation of the
reinforcer depends on the response occurring
first.

In classical conditioning the timing of the two
stimuli produces an association between them
that conditions the learner to anticipate the UCS
and respond to it even if it is not presented.

In operant conditioning, the association that is
conditioned is between the response and its
consequence. The response is either strengthened by
reinforcement or weakened through punishment.

In classical conditioning the timing of the two
stimuli needs to be very close and the sequencing
is vital.

In operant conditioning, while learning generally
occurs faster when the reinforcement or
punishment occurs soon after the response, there
can be a significant time difference between
them.
THE NATURE OF THE RESPONSE

In classical conditioning the response by the learner is
usually a reflexive, involuntary one.

In operant conditioning, the response by the learner is
usually a voluntary one.

In classical conditioning the response is likely to
involve the action of the autonomic nervous system,
and the association is not conscious or deliberate.

In operant conditioning the response is likely to
involve the somatic nervous system, which controls
voluntary movement, and to be conscious, intentional
and often goal-orientated.