Unit 5 – Operant Conditioning

Operant Conditioning
The learner is NOT passive. Learning is based on consequences!
Operant Conditioning



Learning controlled by a connection to the consequence of
one’s behavior
Consequences of behavior determine whether it will be
repeated in future
Vs. Classical Conditioning

Behavior is…



CC: elicited, automatic, reflexive
OC: emitted, voluntary, complex behaviors
Reward is…


CC: provided independent of actions
OC: dependent on behavior
B.F. Skinner
• The most influential behaviorist and proponent of Operant Conditioning.
• Nurture guy through and through.
• Used a Skinner Box (Operant Conditioning Chamber) to prove his concepts.
Skinner

Operant box—non-reflexive behaviors could be altered
by learning
Chaining Behaviors

Subjects are taught a
number of responses
successively in order to
get a reward.
Thorndike’s Puzzle and The Law of Effect
• Edward Thorndike
• Locked cats in a cage
• Behavior changes because of its consequences.
• If a response is rewarded, that response is more likely to occur again.
• If consequences are unpleasant, the Stimulus-Response connection will weaken. (LOE)
• Called the whole process instrumental learning.
• Instrumental behaviors
Thorndike
Operant Conditioning
• Reinforcement
  • Increases probability of response
  • Positive: desirable stimulus is added
  • Negative: undesirable stimulus is removed
• Punishment
  • Decreases probability of response
  • Positive: adding something bad
  • Negative: removing something good
Reinforcement
• When an event increases the likelihood that a response will occur again
• Positive
  • Adding something good
  • Designed to increase behavior
• Negative
  • Removing something bad
  • Designed to increase behavior
Types of reinforcers

Primary vs. secondary



Primary: inherently satisfying to most people
Secondary: gain value from conditioning
Immediate & delayed

Usually needs to be immediate, but humans can handle delayed
reinforcers

Important for self-control
Rat basketball
What type of learning was
this an example of?
Can you explain what
helped the rats learn to
score a basket?
Punishment/Consequence

When an event decreases the likelihood that a response will occur
again

Two types: Positive & Negative
• Positive ≠ Good. POSITIVE = ADD
  • Adding something bad
  • Designed to decrease behavior
• Negative ≠ Bad. NEGATIVE = SUBTRACT
  • Removing something good
  • Designed to decrease behavior
Importance of reinforcement





Punishment signals undesirable behavior but doesn’t
inform of desired behavior
Punished behavior is suppressed
Punishment teaches stimulus discrimination
Punishment (esp. physical) teaches fear & aggression
Ignore behavior that one wants to punish; look for what
to reinforce
Punishment tends to be ineffective
• It tells the organism what not to do, rather than what to do
• Creates anxiety that can interfere with future learning
• Encourages subversive behavior (sneakiness)
• Provides a model for aggressive behavior

The strength of this effect may vary across cultures
Neg. reinforcement ≠ punishment
The Decision Tree

How to solve operant conditioning problems
1. Should the behavior increase or decrease?
   Increase → Reinforcement
   Decrease → Punishment
2. Is something being added or taken away?
   Added → Positive
   Removed → Negative
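The two questions of the decision tree can be sketched as a small lookup. This is only an illustration: the function name and the answer labels ("increase", "decrease", "added", "removed") are assumptions chosen for the example, not terms from the slides.

```python
def classify_consequence(behavior_change: str, stimulus_change: str) -> str:
    """Name the procedure from the two decision-tree answers.

    behavior_change: "increase" or "decrease" (hypothetical labels)
    stimulus_change: "added" or "removed" (hypothetical labels)
    """
    # Question 1: should the behavior increase or decrease?
    effects = {"increase": "reinforcement", "decrease": "punishment"}
    # Question 2: is something being added or taken away?
    signs = {"added": "positive", "removed": "negative"}

    if behavior_change not in effects:
        raise ValueError(f"unknown behavior change: {behavior_change!r}")
    if stimulus_change not in signs:
        raise ValueError(f"unknown stimulus change: {stimulus_change!r}")

    return f"{signs[stimulus_change]} {effects[behavior_change]}"


# A seat-belt buzzer stops when you buckle up: buckling should increase,
# and something unpleasant is removed.
print(classify_consequence("increase", "removed"))  # negative reinforcement
```

Note that the answer to the first question never depends on whether the stimulus is pleasant; it depends only on what happens to the behavior.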
Review
            Punishment                      Reinforcement
            (decreases behavior)            (increases behavior)
Positive    ADD something unfavorable       ADD something desirable
Negative    SUBTRACT something desirable    SUBTRACT something unfavorable
Applications of Operant Conditioning
Behavior Modification


Started with Thorndike
Altering individual behavior (frequency) through positive and negative
reinforcement and positive and negative punishment


Reduction of behavior through its extinction and punishment



Adaptive behaviors
A.K.A. – Applied Behavior Analysis or Positive Behavior Support (PBS)
A child is riding with an adult, and the child is thirsty. So, the child asks
to stop and get a drink. The adult says no, the child asks again, and
again, and again... Finally, the adult gives in, saying, "All right, just this
once." Big mistake, right? Why? The adult has now put the child on a
partial reinforcement schedule, guaranteeing a repetition of the same
behavior later on. Instead, the adult should have said, "All right, I'll get
you a drink IF you don't ask for one for the next 10 minutes." (The time
may have to vary, depending on the child.) Then, the adult is providing the
child with positive reinforcement for being quiet.
Ending a Relationship?????
Behavior Modification




Reinforcement provides a system of rewards and punishments to change negative behavior into positive
responses.

Provides rewards when someone acts in a positive manner. Rewards can range from a compliment to
granting a special privilege to the patient whose behavior becomes desirable.

Unwanted behavior might result in a negative consequence, such as the removal of a favorite
object or the loss of a privilege.
Cognitive behavior modification techniques focus on thought patterns that affect behavior.

Involve teaching a patient to recognize thoughts that may be unrealistic or distort reality.

Techniques include keeping a journal, role-playing, and being asked to defend thoughts that defy reality.

Used for eating disorders, anxiety disorders, OCD, and panic attacks
Aversion behavior modification techniques center on the premise that all behavior is learned and can be
unlearned. (aka CC)

Electrical shock treatment is one example of an aversive stimulus used to treat deviant behavior.

(Mild) medication given to alcoholics that might make them ill if they drink while using the drug.
The token system provides immediate rewards while setting goals for future conduct.

Distribute a token or similar object each time a patient or student exhibits positive behavior.

Tokens can be amassed and later exchanged for a prize or privilege, or lost due to unwanted behavior.

This form of behavior modification is commonly used in mental institutions and prisons to help
control individuals who show violent tendencies.
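The token system described above (earn tokens for positive behavior, lose them for unwanted behavior, exchange them later for a prize or privilege) can be sketched as a simple ledger. The class name, method names, prize, and token costs below are all hypothetical, chosen only for illustration.

```python
class TokenEconomy:
    """Hypothetical token-system ledger; names and prices are illustrative."""

    def __init__(self, prizes):
        self.prizes = dict(prizes)  # prize name -> token cost
        self.balances = {}          # person -> tokens currently held

    def award(self, person, tokens=1):
        """Give tokens immediately after a positive behavior."""
        self.balances[person] = self.balances.get(person, 0) + tokens

    def fine(self, person, tokens=1):
        """Take tokens away after unwanted behavior (never below zero)."""
        self.balances[person] = max(0, self.balances.get(person, 0) - tokens)

    def redeem(self, person, prize):
        """Exchange tokens for a prize; return True if the person could afford it."""
        cost = self.prizes[prize]
        if self.balances.get(person, 0) < cost:
            return False
        self.balances[person] -= cost
        return True


economy = TokenEconomy({"extra recess": 3})
for _ in range(4):
    economy.award("student")       # four positive behaviors
economy.fine("student")            # one unwanted behavior
print(economy.redeem("student", "extra recess"))  # True (3 tokens held, 3 needed)
```

In operant terms, `award` acts as an immediate secondary reinforcer, while `fine` is a form of negative punishment (removing something good).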
Premack principle

A less frequently performed behavior
can be increased by reinforcing it with a
more frequent behavior

Eat your vegetables before you can have
dessert!
Operant Conditioning in Daily Life
Do we wait for the subject to deliver the desired behavior? Sometimes, we
use a process called shaping. Shaping is reinforcing small steps on the way
to the desired behavior.
To train a dog to get your slippers, you would have to reinforce him in
small steps: first to find the slippers, then to put them in his mouth, then
to bring them to you, and so on. This is shaping behavior.
To get Barry to become a better student, you need to do more than give him
a massage when he gets good grades. You have to give him massages when he
studies for ten minutes, or when he completes his homework. Small steps get
him to the desired behavior.
Shaping
Reinforcing responses that come successively closer to the desired response
Successive approximations
Shaping

Reinforcers gradually increase organism’s actions toward
desired end behavior

Successive approximations : behaviors closer & closer to end
learning goal get rewarded
1. Simply turning toward the lever will be reinforced
2. Only stepping toward the lever will be reinforced
3. Only moving to within a specified distance from the lever will be reinforced
4. Only touching the lever with a part of the body will be reinforced
5. Only touching the lever with a specified paw will be reinforced
6. Only depressing the lever partially with the specified paw will be reinforced
7. Only depressing the lever completely with the specified paw will be reinforced
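The successive approximations listed above can be sketched as a rising criterion: a behavior is reinforced only if it meets the current step, and the criterion then moves one step closer to the goal. This is a simplified sketch. The step labels and the function name are assumptions, and raising the criterion after a single reinforcement is a simplification (a real trainer would wait until the step is performed reliably).

```python
# Hypothetical short labels for the seven lever-pressing steps, in order.
STEPS = [
    "turn toward lever",
    "step toward lever",
    "move within set distance of lever",
    "touch lever",
    "touch lever with specified paw",
    "partially depress lever",
    "fully depress lever",
]


def shape(observed, steps=STEPS):
    """Return (behavior, reinforced) pairs for a sequence of observed behaviors."""
    criterion = 0  # index of the least advanced step that still earns a reinforcer
    log = []
    for behavior in observed:
        level = steps.index(behavior)
        reinforced = level >= criterion
        if reinforced:
            # Simplifying assumption: tighten the criterion after every reinforcer.
            criterion = min(criterion + 1, len(steps) - 1)
        log.append((behavior, reinforced))
    return log


session = ["turn toward lever", "turn toward lever", "step toward lever", "touch lever"]
for behavior, reinforced in shape(session):
    print(f"{behavior}: {'reinforced' if reinforced else 'not reinforced'}")
```

The second "turn toward lever" goes unreinforced because the criterion has already moved on to stepping toward the lever.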
Schedules of reinforcement
• How often do you give the reinforcer?
• Every time, or just some of the times, you see the behavior?
Schedules of Reinforcement

Continuous reinforcement schedule:



Reinforcing a response every time
Learning occurs rapidly, extinction occurs rapidly
Partial reinforcement schedule:


Reinforcing a response only some of the time
Slower acquisition, but resistant to extinction


Fixed vs. Variable
Ratio vs. Interval




Fixed ratio: after set # of responses
Variable ratio: after unpredictable # of responses
Fixed interval: after set amount of time has passed
Variable interval: after unpredictable amount of time has passed
Continuous v. Partial Reinforcement
Continuous
• Reinforce the behavior EVERY TIME the behavior is exhibited.
• Usually done when the subject is first learning to make the association.
• Acquisition comes really fast.
• But so does extinction.
Partial
• Reinforce the behavior only SOME of the times it is exhibited.
• Acquisition comes more slowly.
• But is more resistant to extinction.
• FOUR types of Partial Reinforcement schedules.
Schedules of reinforcement

Continuous vs. partial
Ratio schedules
1. Fixed-ratio (FR) schedules:
   Reinforcement after a fixed (predictable) number of responses
   Ex: paid $1 for every 20 apples you pick
2. Variable-ratio (VR) schedules:
   Reinforcement after a varying (unpredictable) number of responses
   Induces very high rate of responding
   Ex: scratch & win lottery tickets
Interval Schedules
3. Fixed-interval (FI) schedule:
   Reinforcement after a fixed (predictable) amount of time
4. Variable-interval (VI) schedule:
   Reinforcement after varying (unpredictable) amounts of time
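The four partial schedules can be sketched as small rules that decide, response by response, whether a reinforcer is delivered. This is a minimal sketch under stated assumptions: the function names are hypothetical, time is passed in as a plain number, the variable schedules draw their requirement from a uniform range around a mean, and the interval schedules reinforce the first response made after the interval has elapsed.

```python
import random


def fixed_ratio(n):
    """FR: reinforce every n-th response."""
    count = 0
    def respond(t):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False
    return respond


def variable_ratio(mean_n, rng):
    """VR: reinforce after an unpredictable number of responses averaging mean_n."""
    count, target = 0, rng.randint(1, 2 * mean_n - 1)
    def respond(t):
        nonlocal count, target
        count += 1
        if count >= target:
            count, target = 0, rng.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond


def fixed_interval(wait):
    """FI: reinforce the first response once `wait` time units have passed."""
    last = 0.0
    def respond(t):
        nonlocal last
        if t - last >= wait:
            last = t
            return True
        return False
    return respond


def variable_interval(mean_wait, rng):
    """VI: like FI, but the required wait changes unpredictably around mean_wait."""
    last, wait = 0.0, rng.uniform(0, 2 * mean_wait)
    def respond(t):
        nonlocal last, wait
        if t - last >= wait:
            last, wait = t, rng.uniform(0, 2 * mean_wait)
            return True
        return False
    return respond


# Apple picking on FR-20: 40 apples picked, one per time unit.
picker = fixed_ratio(20)
print(sum(picker(t) for t in range(1, 41)))  # 2 reinforcers ($2 earned)
```

On the ratio schedules only the number of responses matters, so responding faster earns reinforcers sooner; on the interval schedules extra responses before the time is up earn nothing.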
Reinforcement Schedules
            Fixed                            Variable
Ratio       after set number of responses    after random number of responses
Interval    after set amount of time         after random amount of time
Name that Schedule!
A. Variable Ratio
B. Fixed Ratio
C. Variable Interval
D. Fixed Interval
• Winning at the slot machines
• Getting a free flight after accumulating 10,000 flight miles
• Receiving an allowance every Saturday regardless of how many chores you did, as long as you've done at least one
• Random drug testing at your job