U4- Learning 5 - Operant Conditioning

3. Operant Conditioning
= A form of learning for which the likelihood of a particular
response occurring is determined by the consequences of that
response.
A response that has a desirable consequence will tend to be
repeated, and a response that has an undesirable consequence will
tend not to be repeated.
1. The Stimulus (S) that precedes (comes before) the
operant response.
2. The Operant Response (R) to the stimulus.
3. The Consequence (C) to the operant response.
Stimulus (S) → Operant Response (R) → Consequence (C)
The three-phase model of Operant Conditioning
Skinner and his Rats…
(page 480)
• Skinner placed a hungry animal (a rat or a pigeon) into a SKINNER BOX. The box contained a lever and a food tray.
• The animal would move around the cage and at some stage it would accidentally press the lever – releasing a food pellet.
• Skinner took note of how many trials it took for the rat to learn to press the lever straight away.
• Skinner concluded that BEHAVIOUR IS SHAPED AND MAINTAINED BY ITS CONSEQUENCES.
• This means that what happens directly after a behaviour will determine whether that behaviour will be repeated (strengthened) or will stop (weakened).
Elements of Operant Conditioning
Reinforcement
= Any stimulus that strengthens or increases
the likelihood of a response that it follows.
It can involve either receiving a pleasant stimulus
or ‘escaping’ an unpleasant one.
Reinforcer
= Any stimulus that provides reinforcement,
often referred to as a reward.
Elements of Operant Conditioning
Positive Reinforcement
= The presentation of a PLEASANT stimulus
(consequence) following a desired response,
thereby strengthening the response or making it
more likely to occur again.
E.g. The food pellet.
Or:
We wash the dishes for mum, receive praise
(positive reinforcement) and then are more likely
to do it again.
Any other ideas?
Elements of Operant Conditioning
Negative Reinforcement
(Page 487)
= The removal or avoidance of an unpleasant
stimulus following a response; because the outcome is a pleasant one,
the response that removed the unpleasant stimulus is
strengthened or more likely to occur again.
E.g. We take a Panadol to get rid of a headache.
(We do something to remove an unpleasant stimulus,
so we are then more likely to take Panadol again.)
OR
On a rainy day we use an umbrella to avoid the
unpleasant experience of having wet clothes.
For the maths geeks among us…
To help you remember this difference, you could link the
terms with mathematical symbols:
positive (+) reinforcer = adding something pleasant;
negative (–) reinforcer = subtracting something unpleasant.
In mathematics, two negatives make a positive. The same
applies to the concept of negative reinforcement—the
subtraction of a negative (unpleasant) stimulus results in a
positive (desirable) consequence or outcome.
Elements of Operant Conditioning
SCHEDULES OF REINFORCEMENT:
Reinforcement can be provided on a continuous schedule (that
is, after each correct response) or on a partial reinforcement
schedule (only on SOME occasions after the correct response).
There are four basic schedules of PARTIAL reinforcement.
Each produces a different effect on the response acquisition
rate and the strength of the response.
1. Fixed-Ratio Schedule
2. Variable-Ratio Schedule
3. Fixed-Interval Schedule
4. Variable-Interval Schedule
Elements of Operant Conditioning
SCHEDULES OF REINFORCEMENT:
1. Fixed-Ratio Schedule
A schedule of reinforcement for which a correct
response is reinforced after a SET NUMBER of
correct responses.
The most effective and fastest learned.
EXAMPLES:
A rat must press the lever 10 times to receive the food. The rat will
soon press the lever 10 times very quickly to get the
food.
People who are employed on a ‘piecework’ basis are on a fixed-ratio schedule. For example, a $20 payment for every 100
newspapers sold or $5 for every bucket of cherries picked.
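The fixed-ratio rule can be sketched as a short Python simulation. This is an illustrative sketch only, not part of Skinner's procedure: the function name `fixed_ratio` and its parameters are assumptions made for this example. It marks every `ratio`-th correct response as reinforced, as in the rat example above.

```python
def fixed_ratio(responses, ratio):
    """Return one True/False per correct response: True if it is reinforced.

    Hypothetical helper: every `ratio`-th correct response earns a reinforcer.
    """
    return [n % ratio == 0 for n in range(1, responses + 1)]


# A rat on a fixed-ratio schedule of 10 makes 30 lever presses.
outcomes = fixed_ratio(responses=30, ratio=10)
reinforced_at = [i + 1 for i, r in enumerate(outcomes) if r]
print(reinforced_at)  # [10, 20, 30]
```

Because the reinforcer depends only on the response count, responding faster brings the reinforcer sooner.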
Elements of Operant Conditioning
SCHEDULES OF REINFORCEMENT:
2. Variable-Ratio Schedule
A schedule of reinforcement in which a reinforcer is given after
an UNPREDICTABLE NUMBER of correct responses.
This is also an effective and speedy method as the uncertainty of
when the next reward will occur keeps organisms responding.
EXAMPLES:
A gambler has no way of predicting how many times he must
put a coin in the slot to win but the more times a coin is inserted
the greater the chance of a payout. People who play the pokies are
often reluctant to leave them, especially when they have had a
large number of unreinforced responses.
OR
Door-to-door salespeople: it is uncertain how many houses they
will have to visit to make a sale, but the more houses they try,
the more likely that they will succeed.
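A variable-ratio schedule can be sketched in Python in a similar way. This is a minimal illustration under stated assumptions: the function name `variable_ratio` is hypothetical, and the number of responses required for each reinforcer is drawn uniformly from 1 to `2 * mean_ratio - 1`, which is just one simple way to make the count unpredictable while keeping its average at `mean_ratio`.

```python
import random


def variable_ratio(responses, mean_ratio, rng):
    """Return one True/False per correct response: True if it is reinforced.

    Assumption: the required number of responses is redrawn at random
    after every reinforcer, so the learner cannot predict the next payout.
    """
    outcomes = []
    required = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for _ in range(responses):
        count += 1
        if count >= required:
            outcomes.append(True)
            count = 0
            required = rng.randint(1, 2 * mean_ratio - 1)
        else:
            outcomes.append(False)
    return outcomes


# A gambler inserts 1000 coins; payouts average one per 10 coins.
rng = random.Random(1)  # seeded only so the run is repeatable
outcomes = variable_ratio(responses=1000, mean_ratio=10, rng=rng)
print("payouts:", sum(outcomes))
```

As in the gambler example, more responses still mean more reinforcers overall, but no single response can be predicted to pay out.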
Elements of Operant Conditioning
SCHEDULES OF REINFORCEMENT:
3. Fixed-Interval Schedule
A schedule of reinforcement in which a correct response is
reinforced after a SET PERIOD OF TIME has elapsed since the
previous reinforcer.
Usually produces a moderate rate of response, particularly once
the organism works out that time is the key factor.
EXAMPLES:
A worker who has monthly performance reviews is much more
likely to perform at a higher level in the days just before the
review.
Also, baking a cake for 30 minutes without a timer:
you are not going to bother checking much in the first few minutes, but from
about the 20-minute mark (an estimate) on, you will check more often.
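The fixed-interval rule can be sketched in Python as follows. This is an illustrative sketch, not a definitive model: the function name `fixed_interval` is hypothetical, response times are assumed to be in minutes, and the clock is assumed to start at time 0. A response is reinforced only if the set interval has elapsed since the previous reinforcer.

```python
def fixed_interval(response_times, interval):
    """Return the times of the responses that are reinforced.

    Assumption: the first interval is measured from time 0.
    """
    reinforced = []
    last_reinforcer = 0
    for t in response_times:
        if t - last_reinforcer >= interval:
            reinforced.append(t)
            last_reinforcer = t
    return reinforced


# Checks made at these minutes, with a 30-minute interval.
checks = [5, 12, 20, 25, 28, 30, 31]
print(fixed_interval(checks, interval=30))  # [30]
```

Responses made early in the interval earn nothing, which is consistent with responding clustering near the end of the interval.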
Elements of Operant Conditioning
SCHEDULES OF REINFORCEMENT:
4. Variable-Interval Schedule
A schedule of reinforcement in which a reinforcer is given
after IRREGULAR PERIODS OF TIME have passed, provided
the correct response has been made.
Weakest schedule: it produces a low but steady rate of response.
EXAMPLES:
Fishing – people don’t know if they will get a bite at 20 seconds,
20 minutes, or at all, so they check their line every so
often.
Same for emails. If you usually get about 10 emails a day, but
not at consistent times, you might check your mail randomly
throughout the day.
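A variable-interval schedule can be sketched in Python too. This is a minimal illustration under stated assumptions: the function name `variable_interval` is hypothetical, times are in minutes, and each waiting period is drawn uniformly between 0 and `2 * mean_interval`, one simple way to make the periods irregular. The first response made after the waiting period has passed is reinforced.

```python
import random


def variable_interval(response_times, mean_interval, rng):
    """Return the times of the responses that are reinforced.

    Assumption: after each reinforcer, a new irregular waiting period
    is drawn before the next reinforcer becomes available.
    """
    reinforced = []
    available_at = rng.uniform(0, 2 * mean_interval)
    for t in response_times:
        if t >= available_at:
            reinforced.append(t)
            available_at = t + rng.uniform(0, 2 * mean_interval)
    return reinforced


# Checking email every 5 minutes for 10 hours; new mail arrives
# on average about once an hour, at irregular times.
rng = random.Random(0)  # seeded only so the run is repeatable
checks = list(range(0, 600, 5))
rewarded = variable_interval(checks, mean_interval=60, rng=rng)
print(len(rewarded), "of", len(checks), "checks found new mail")
```

Checking more often does not make reinforcers arrive sooner, which is consistent with the low but steady response rate described above.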
Elements of Operant Conditioning
Punishment
= A negative consequence (an unpleasant event or the
removal of something that is pleasant) following a response
which decreases the likelihood of that response occurring
again.
Punishment can be negative or positive, just like
reinforcement.
E.g. Receiving a speeding fine and demerit points for
speeding. These are unpleasant consequences intended to
reduce the behaviour. (+ Positive Punishment)
If you continue to speed, you will accumulate points and
lose your licence. This is the removal of something pleasant
as a form of punishment. (- Negative Punishment)
Elements of Operant Conditioning
Punishment
Since negative punishment involves taking a stimulus away
or not obtaining a reinforcer as a consequence of behaviour,
it is often referred to as response cost.
Response cost =
The removal of any valued stimulus, whether
or not it causes the behaviour (e.g. licence removal).
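The four kinds of consequence covered so far can be summarised as a small lookup, sketched here in Python. This is only a memory aid under stated assumptions: the function name `classify_consequence` and the labels "added", "removed", "pleasant" and "unpleasant" are chosen for this example. "Positive" means a stimulus is added and "negative" means a stimulus is removed.

```python
def classify_consequence(change, stimulus):
    """Name the consequence type from what happens after the response.

    change:   "added" or "removed" (hypothetical labels)
    stimulus: "pleasant" or "unpleasant" (hypothetical labels)
    """
    table = {
        ("added", "pleasant"): "positive reinforcement",
        ("removed", "unpleasant"): "negative reinforcement",
        ("added", "unpleasant"): "positive punishment",
        ("removed", "pleasant"): "negative punishment (response cost)",
    }
    return table[(change, stimulus)]


# Examples taken from the slides above.
print(classify_consequence("added", "pleasant"))      # praise for washing dishes
print(classify_consequence("removed", "unpleasant"))  # Panadol removes a headache
print(classify_consequence("added", "unpleasant"))    # speeding fine
print(classify_consequence("removed", "pleasant"))    # losing your licence
```

Both kinds of reinforcement make the response more likely, and both kinds of punishment make it less likely.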
Elements of Operant Conditioning
Factors that Influence the effectiveness of
Reinforcement and Punishment
(pg 490)
Order of Presentation: Reinforcement/Punishment needs to be presented AFTER a response. Never before.
Timing: Reinforcement/Punishment is most effective when delivered IMMEDIATELY AFTER the response. This ensures an association between the response and the consequence.
Appropriateness: Reinforcement/Punishment needs to be relevant to the individual. Reinforcers need to be desirable and punishments undesirable.
Key Processes of
Operant Conditioning
Acquisition
= The establishment of a response through reinforcement. The
speed of acquisition depends on the schedule of reinforcement
applied.
COMPARISON TO CLASSICAL CONDITIONING:
Similar to CLASSICAL CONDITIONING in that it refers to the
overall learning process during which a response is established,
but differs with regard to HOW the behaviour is learned.
Key Processes of
Operant Conditioning
Extinction
= The gradual decrease in the strength or rate of a conditioned
(learned) response following consistent non-reinforcement of
the response.
Extinction is less likely to occur when partial reinforcement
occurs because the reinforcement itself is uncertain. For
example, the gambler is USED to the reward being
unpredictable.
COMPARISON TO CLASSICAL CONDITIONING:
Occurs after the removal of the REINFORCEMENT rather than the
unconditioned stimulus.
Key Processes of
Operant Conditioning
Spontaneous Recovery
= The return of a response in expectation of a reinforcer after a
period of extinction. The response is likely to be weaker and will
not last long.
COMPARISON TO CLASSICAL CONDITIONING:
Very similar to classical conditioning in explanation. (Different
components but same concept).
Key Processes of
Operant Conditioning
Stimulus Generalisation
= Occurs when the correct response is made to another stimulus
that is similar (not necessarily identical) to the stimulus that
initially produced the response. E.g. The pigeon that initially
learned to peck a green light to receive food also pecked yellow
and red lights.
COMPARISON TO CLASSICAL CONDITIONING:
Very similar to classical conditioning in explanation. (Different
components but same concept).
Key Processes of
Operant Conditioning
Stimulus Discrimination
= Occurs when an organism makes the correct response to a
stimulus and is reinforced but does not respond to any other
stimulus even if they are similar.
E.g. Skinner trained his pigeons to discriminate between the red
and the green light by only rewarding them when they pecked
the green.
READ pg 496 – Sniffer Dogs.
COMPARISON TO CLASSICAL CONDITIONING:
Very similar to classical conditioning in explanation. (Different
components but same concept).
Further Comparisons between
Classical and Operant Conditioning
The Role of the Learner
Classical Conditioning: Learner is passive. Doesn’t have to do anything for learning to occur. Has no control.
Operant Conditioning: Learner is active. Must perform some activity to receive a consequence. Has control over learning.
Timing of the Stimulus and Response
Classical Conditioning: Response (salivation) requires presentation of the UCS (meat) first. Timing of CS then UCS needs to be immediate.
Operant Conditioning: The presentation of the reinforcer/punishment relies on the response occurring first. Timing can be further apart (e.g. variable interval).
The Nature of the Response
Classical Conditioning: Response by learner is usually reflexive and involuntary (salivating, blinking).
Operant Conditioning: Response is usually voluntary (pressing lever).