Learning
Chapter 5
Definition of Learning
 Learning is any relatively permanent change in
behavior brought on by experience or practice
 “relatively permanent” refers to the fact that when people learn
anything, some part of their brain is physically changed to
record what they’ve learned
 This is actually a process of memory; without the ability to remember,
people can’t learn anything
 “experience or practice” refers to the tendency for behavior to
differ based on the experience of specific events
 If a behavior results in a positive experience, it is likely to occur again
 If a behavior results in a negative experience, it is not likely to occur again
Classical Conditioning
 Ivan Pavlov – studied the digestive system of dogs
 Reflex – an involuntary response that is not under personal control or
choice
 Ex. Dogs salivate when they receive food
 Stimulus – any object, event, or experience that causes a response (the
reaction of an organism)
 Ex. Food given to dogs that causes the reflexive response of salivation
 Pavlov noticed that his dogs were salivating when they weren’t supposed to
 Some would start salivating when they saw the lab assistant bringing their food,
some when they heard the clatter of the food bowl from the kitchen, some when
it was the time of day when they were usually fed
 Thus, Pavlov switched his focus to study these responses and eventually
termed the phenomenon classical conditioning
 Learning to make an involuntary (reflex) response to a stimulus other than the
original, natural stimulus that normally produces the reflex
Elements of Classical Conditioning
 Unconditioned stimulus (UCS) – the original, naturally occurring stimulus that leads to
an involuntary (reflex) response
 Unconditioned because it is unlearned
 In Pavlov’s research, food is the UCS
 Unconditioned response (UCR) – an involuntary (reflex) response to a naturally
occurring stimulus or UCS
 Also unlearned, occurs because of genetic “wiring” in the nervous system
 In Pavlov’s research, salivation is the UCR
 Conditioned stimulus (CS) – stimulus that becomes able to produce a learned reflex
response by being paired with the original UCS
 Almost any stimulus can become associated with a UCS if it is paired with the UCS often enough
 Before a stimulus is associated with a UCS it is called a neutral stimulus (NS) – stimulus
that has no effect on the desired response
 After being paired with the UCS enough times to produce the reflexive response alone, the NS
becomes the CS
 In Pavlov’s research, the lab assistant bringing food, the bell, or the clatter of the food bowl is the CS
 Conditioned response (CR) – learned reflex response to a CS
 It is usually not as strong as the original UCR, but it is essentially the same response
 In Pavlov’s research, salivation in response to the lab assistant or the clatter of the food bowl is the CR
Pavlov’s Famous Experiments
 Paired the ticking of a metronome with the presentation of food
 Because the metronome’s ticking didn’t normally produce salivation, it was
the NS before any conditioning took place
 CR and UCR are both salivation
 They differ because they are in response to different things
 UCR occurs after a UCS
 CR occurs after a CS
 Food (UCS) produces salivation (UCR)
 Food (UCS) is repeatedly paired with sound of the metronome (NS)
 After pairings: sound of the metronome (CS) produces salivation (CR)
Pavlov’s Basic Principles of Classical Conditioning
 The CS must come before the UCS
 If the sound of the metronome came just after the dogs received food, they did not
become conditioned
 The CS and UCS must come very close together in time – no more than 5
seconds
 When the time between the potential CS and the UCS was extended to several
minutes, no association between the two was made
 Too much could happen in the longer interval of time to interfere with conditioning
 Recent studies have found that the interstimulus interval (ISI), the time between
the CS and UCS, can vary depending on the nature of the conditioning task and even
the organism being conditioned
 Shorter ISIs (less than 500 milliseconds) have been found to be ideal for conditioning
 The NS must be paired with the UCS several times
 Often many pairings are necessary
 The CS is usually some stimulus that is distinctive, or stands out, from other
competing stimuli
 The metronome was a sound that was not normally present in the laboratory and,
therefore, was distinct
Stimulus Generalization and Discrimination
 Stimulus generalization – the tendency to respond to a stimulus that is
similar to the original conditioned stimulus with the conditioned response
 Pavlov noticed that sounds similar to the metronome would produce a similar
CR
 The response is not as strong as the response to the original CS
 Stimulus discrimination – the tendency to stop making a generalized
response to a stimulus that is similar to the original CS because the similar
stimulus is never paired with the UCS
 Pavlov never paired the sounds similar to the metronome with food
 Because only the real CS (metronome) was followed with food (UCS) the dogs
learned to tell the difference, or discriminate, between the “fake” sounds and the
actual CS
 This occurs when an organism learns to respond to different stimuli in different
ways
Extinction and Spontaneous Recovery
 Extinction – the disappearance or weakening of a learned response following the
removal or absence of the UCS
 When the CS (metronome’s ticking) was repeatedly presented without the UCS
(food), the CR (salivation) “died out” or stopped occurring
 In theory, this occurs because new learning has taken place
 During extinction, the learned CS-UCS association weakens, as the CS no longer
predicts the UCS
 For Pavlov’s dogs, they learned not to salivate to the metronome because it no longer predicted
food
 Spontaneous recovery – the reappearance of a learned response after extinction
has occurred
 After extinction, Pavlov waited a few weeks before letting the dogs hear the
metronome again; when he brought it back, they began to salivate again
 This brief recovery of the CR shows that the CR is still retained even after extinction
(remember that learning is relatively permanent), so something that is learned is really
“still in there” even after extinction
 It is just suppressed or inhibited by the lack of an association with the UCS
 As time passes, this inhibition weakens, especially if the original CS has not been
present for a while
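The acquisition, extinction, and spontaneous recovery sequence above can be illustrated with a toy model. This is a minimal sketch, assuming (as the notes describe) that extinction adds inhibition on top of the learned association rather than erasing it. The step size, threshold, decay rate, and the class and method names are all hypothetical, chosen only to make the pattern visible; this is not a published model of conditioning.

```python
from dataclasses import dataclass


@dataclass
class ToyConditioning:
    """Toy illustration of acquisition, extinction, and spontaneous recovery.

    All numbers are hypothetical, chosen only to make the pattern visible.
    """
    step: float = 0.25       # change in strength per trial (hypothetical)
    threshold: float = 0.5   # net strength needed to produce a CR (hypothetical)
    excitation: float = 0.0  # learned CS-UCS association
    inhibition: float = 0.0  # suppression learned during extinction

    def pair(self) -> None:
        """CS (metronome) followed by UCS (food): the association strengthens."""
        self.excitation = min(1.0, self.excitation + self.step)

    def cs_alone(self) -> None:
        """CS without the UCS: inhibition builds, but the association is kept."""
        self.inhibition = min(self.excitation, self.inhibition + self.step)

    def rest(self, decay: float = 0.5) -> None:
        """Time passes without the CS: the inhibition weakens (hypothetical rate)."""
        self.inhibition *= decay

    def produces_cr(self) -> bool:
        """Does the CS alone now produce salivation (the CR)?"""
        return self.excitation - self.inhibition >= self.threshold


dog = ToyConditioning()
print(dog.produces_cr())  # False: the metronome is still a neutral stimulus
for _ in range(4):
    dog.pair()
print(dog.produces_cr())  # True: after pairings, the metronome is a CS
for _ in range(4):
    dog.cs_alone()
print(dog.produces_cr())  # False: extinction
dog.rest()
print(dog.produces_cr())  # True: spontaneous recovery after a rest
```

Because `excitation` is never reduced, the sketch mirrors the point that a learned response is “still in there” after extinction; only the inhibition changes, and it weakens with time.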
Higher-Order Conditioning
 Occurs when a strong CS is paired with another NS, causing
the NS to become a second CS
 Once Pavlov’s dogs were strongly conditioned to
salivate (CR) when they heard the metronome (CS), if another
sound, like a snap (NS), occurred just before the metronome
(CS) enough times, the snap (NS) would become a CS and
produce salivation (CR) by itself
 Food (UCS) would have to be presented every now and then to
maintain the original CR of salivation to the metronome (CS)
 Without the UCS the higher-order conditioning would be
difficult to maintain and would gradually fade away
Conditioned Emotional Responses
 Conditioned emotional responses – emotional response that has become
classically conditioned to occur to learned stimuli
 John Watson’s classic “Little Albert” experiment demonstrated the classical
conditioning of a phobia (an irrational fear response)
 Presentation of a white rat was paired with a loud scary noise until Albert feared the
white rat
 Before conditioning: white rat = NS
 During conditioning: white rat (NS) paired with loud noise (UCS) to produce fear
(UCR)
 After conditioning: white rat (CS) produces fear (CR)
 In advertising, commercials often use things that are known to produce an
emotional response in hopes that the emotional response will become associated
with their product (ex. Attractive women or cuddly puppies)
Vicarious Conditioning
 Vicarious conditioning – classical conditioning of a
reflex response or emotion by watching the reaction of
another person
 Ex. Children used to receive vaccination shots in school
 The nurse would line children up, and one by one they would
receive the shot
 When some children received their shots, they cried a lot
 By the time the nurse got to the end of the line of children, they
were all crying, many of them before the needle even touched
their skin
 The children had learned their fear response to the shot from
watching the reactions of the children who went before them
Other Conditioned Responses
 Conditioned taste aversion – development of a nausea or aversive response to a
particular taste because that taste was followed by a nausea reaction, occurring after
only one association
 Ex. The chemotherapy drugs that cancer patients receive can create severe nausea,
which usually causes them to develop a taste aversion to anything they eat up to 6
hours before the treatment
 Biological preparedness – the tendency of animals to learn certain associations,
such as taste and nausea, with only one or few pairings due to the survival value of
the learning
 Ex. If an animal eats something that makes it sick, it is likely to avoid that
food in the future, which increases its chances of survival and of passing on its
genes to future generations
 These 2 types of conditioning violate 2 of Pavlov’s basic principles
 The CS and UCS must be paired close in time
 Taste aversion can develop even if the food was eaten a considerable time before nausea occurs
 It should take multiple pairings of the CS and UCS to achieve conditioning
 Because of biological preparedness, taste aversion can occur with only one or a few pairings of the
food stimulus with the nausea response
Why Does Classical Conditioning Work?
 2 ways to explain how one stimulus can come to “stand for” another
 Stimulus substitution – Pavlov’s original theory
 Suggested that the CS, through its association close in time with the UCS, came to
activate the same place in the brain that was originally activated by the UCS
 But if a mere association in time is all that is needed, why would conditioning not
work when the CS is presented immediately after the UCS?
 Cognitive perspective – modern explanation
 Suggests that the CS provides information or an expectancy about the coming of
the UCS
 The CS has to provide some kind of information about the coming of the UCS in
order to achieve conditioning
 If the CS comes after the UCS, it can’t provide any information about when the UCS
is coming
 Ex. If rats experience an electric shock (UCS) while a specific tone (NS) is played,
they will expect a shock (UCS) to occur during the tone (CS) and become anxious
(CR) when they hear the tone
 But if the shock (UCS) comes immediately after the tone stops (NS), they will act
normally when hearing the tone and become anxious (CR) when it stops (CS), because they
expect that the shock will not occur during the tone
Operant Conditioning
 Classical conditioning is the kind of learning that occurs
with reflexive, involuntary behavior
 Operant conditioning is the kind of learning that applies
to voluntary behavior
 Operant conditioning – the learning of voluntary
behavior through the effects of pleasant and unpleasant
consequences to responses
Thorndike’s Puzzle Box: How to Frustrate a Cat
 Thorndike would place a hungry cat inside a “puzzle box” from which the
only escape was to press a lever on the floor of the box
 A bowl of food was placed outside the box, so the hungry cat would be highly
motivated to get out
 The cat would move around the box, pushing and rubbing against the walls
trying to escape, and would eventually push the lever by accident and open
the door
 The lever is the stimulus, the pushing of the lever is the response, and the
consequence is both escape from the box and food
 After a number of trials, the cat took less and less time to push the lever
 It’s important not to assume the cat had “figured out” the connection between the
lever and freedom: when Thorndike moved the lever to a different position,
the cat had to learn the whole process over again
 The cat would simply push and rub around the same area that had worked the
last time and each time found the lever a little more quickly
Thorndike’s Law of Effect
 Based on his “puzzle box” research Thorndike developed the
law of effect
 If an action is followed by a pleasurable consequence, it will
tend to be repeated, and if followed by an unpleasant
consequence, it will tend not to be repeated
 This is the basic principle behind learning voluntary behavior
 In the case of the “puzzle box,” pushing of the lever was
followed by a pleasurable consequence (freedom and food), so
pushing the lever became a repeated response
B.F. Skinner: The Next Behaviorist
 Skinner took leadership of behaviorism after Watson
 He combined the work of Pavlov and Thorndike to explain all behavior as the
product of learning
 It was Skinner who actually coined the term operant conditioning for the learning
of voluntary behavior
 Voluntary behavior is what people and animals do to operate in the
world
 Important distinction between operant and classical conditioning
 In classical conditioning, learning a reflex depends on what comes
BEFORE the response: the UCS and what will become the CS
 In operant conditioning, learning depends on what happens AFTER the
response, the consequence
The Concept of Reinforcement
 Reinforcement – any event or stimulus that, when following a
response, increases the probability that the response will occur again
 Typically, reinforcement is pleasurable
 But reinforcement can also be negative, such as escaping or avoiding something
unpleasant
 Ex. When a behavior causes the removal of pain
 Skinner’s research involved something called a “Skinner box” or
“operant conditioning chamber”
 Often involved placing a rat into one of the chambers and training it to
push down on a bar to get food
Primary and Secondary Reinforcers
 Reinforcers are not all alike; there are 2 types
 Primary – any reinforcer that is naturally reinforcing by
meeting a basic biological need, such as hunger or touch
 Infants, toddlers, preschool age children, and animals can be easily
reinforced with primary reinforcers
 Ex. You can reinforce a toddler’s behavior with candy
 Secondary – any reinforcer that becomes reinforcing after
being paired with a primary reinforcer, such as praise, money,
or gold stars
 Ex. Money can be a reinforcer because it is associated with the
ability to obtain (purchase) things that meet basic needs, such as
food and shelter
Positive and Negative Reinforcement
 Reinforcers can also differ in the way they are used
 Positive reinforcement – the reinforcement of a response
by the addition or experiencing of a pleasurable stimulus
 Ex. Every time a rat presses a bar it receives food. The rat’s
pressing of the bar is positively reinforced by the pleasurable
reward of food
 Negative reinforcement – the reinforcement of a
response by the removal, escape from, or avoidance of an
unpleasant stimulus
 Ex. If a rat presses a bar during a mild electric shock, the
shock stops. The rat’s pressing of the bar is negatively reinforced
by the removal of the painful shock stimulus
Schedules of Reinforcement
 The timing of reinforcement can make a tremendous difference in the
speed at which learning occurs and the strength of the learned
response
 Consider this scenario: Heather’s mother gives her a quarter every
night she remembers to put her dirty clothes in the hamper. Sean’s
mother gives him a dollar at the end of the week, but only
if he has put his clothes in the hamper every night that week.
 Which child will learn to put their clothes in the hamper more quickly?
 After both Heather and Sean have been conditioned to put their dirty
clothes in the hamper, if both mothers stop giving money, which child is
likely to keep putting their dirty clothes in the hamper the
longest?
The Partial Reinforcement Effect
 Continuous reinforcement – the reinforcement of each and every
correct response
 Responses that are reinforced each time they occur are more easily and
quickly learned
 Ex. Therefore, because Heather was reinforced every night with a quarter,
she will learn the association faster than Sean
 Partial reinforcement effect – the tendency for a response that is
reinforced after some, but not all, correct responses to be very resistant
to extinction
 Ex. Sean expected to get a reinforcer only after 7 correct responses; when
his reinforcers stop, he might continue to put his dirty clothes in the hamper
for several more days or even another week or so, hoping that the reinforcer
will eventually come anyway
 Heather will probably stop putting her dirty clothes in the hamper more
quickly than Sean because she expects to be reinforced after every correct
response
The Partial Reinforcement Effect
 Partial reinforcement can be accomplished according to different patterns or schedules
 It might be a certain interval of time that’s important
 When timing of the response is more important, it is called an interval schedule
 Ex. If an office safe can only be opened at a specific time of day, it wouldn’t matter how many times
a person tried to open it because it would only work at a specific time
 Ex. A rat can get only 1 food pellet every 2 hours for pressing a lever, regardless of how many times
the bar is pressed
 Or it might be the number of responses required that’s important
 When the number of responses is more important, the schedule is called a ratio schedule, because
a certain number of responses is required for each reinforcer
 Ex. If a person had to sell a certain number of raffle tickets in order to get a prize
 Ex. A rat must press a bar 10 times to get a food pellet, regardless of how long it takes
 Another way schedules of reinforcement can differ is in whether the number of responses or
interval of time is fixed (the same every time) or variable (a different number or interval is
required in each case)
 So it’s possible to have a fixed interval schedule, a variable interval schedule, a fixed ratio schedule,
and a variable ratio schedule
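The four schedules differ only in the rule that decides whether a given response earns a reinforcer. Below is a minimal Python sketch under stated assumptions: the class and function names are hypothetical, the variable schedules draw their requirements from a uniform distribution chosen only for illustration, and both interval schedules reinforce the first response made after the interval has passed (the rule used in the variable interval rat example in these notes).

```python
import random


class FixedRatio:
    """Reinforce every n-th response (like a 10-punch sandwich card)."""

    def __init__(self, n: int):
        self.n = n
        self.count = 0

    def respond(self, t: float) -> bool:
        # Ratio schedules count responses; the time t is ignored.
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio(FixedRatio):
    """Like FixedRatio, but a new required count is drawn after each reinforcer."""

    def __init__(self, mean: int, rng: random.Random):
        self.mean = mean
        self.rng = rng
        super().__init__(self._draw())

    def _draw(self) -> int:
        # Hypothetical choice: uniform from 1 to 2*mean - 1, which averages `mean`.
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self, t: float) -> bool:
        reinforced = super().respond(t)
        if reinforced:
            self.n = self._draw()
        return reinforced


class FixedInterval:
    """Reinforce the first response made after `interval` seconds have passed."""

    def __init__(self, interval: float):
        self.interval = interval
        self.available_at = self._draw()

    def _draw(self) -> float:
        return self.interval

    def respond(self, t: float) -> bool:
        # Interval schedules care about time: extra responses made before the
        # reinforcer becomes available earn nothing.
        if t >= self.available_at:
            self.available_at = t + self._draw()  # timer restarts at reinforcement
            return True
        return False


class VariableInterval(FixedInterval):
    """Like FixedInterval, but each interval is drawn at random around a mean."""

    def __init__(self, mean: float, rng: random.Random):
        self.rng = rng
        super().__init__(mean)

    def _draw(self) -> float:
        # Hypothetical choice: uniform from 0 to 2*mean, which averages `mean`.
        return self.rng.uniform(0, 2 * self.interval)


def reinforcers_earned(schedule, seconds: int, presses_per_second: int) -> int:
    """Press the lever at a steady rate and count the reinforced presses."""
    earned = 0
    for i in range(seconds * presses_per_second):
        if schedule.respond(i / presses_per_second):
            earned += 1
    return earned


# A 10-minute session: doubling the press rate doubles FR reinforcers, not FI ones.
for rate in (1, 2):
    print(rate,
          reinforcers_earned(FixedRatio(10), 600, rate),
          reinforcers_earned(FixedInterval(120), 600, rate))
# 1 60 4
# 2 120 4

# Variable schedules: the exact counts depend on the random draws.
rng = random.Random(0)
print(reinforcers_earned(VariableRatio(20, rng), 600, 1))
print(reinforcers_earned(VariableInterval(120, rng), 600, 1))
```

Ratio schedules pay off per response and interval schedules pay off per unit of time, which is the reasoning behind the fast response rates on ratio schedules and the slower responding on fixed interval schedules described in the following slides.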
Fixed Interval Schedule of Reinforcement
 Fixed Interval Schedule – schedule of reinforcement in
which the interval of time that must pass before
reinforcement becomes possible is always the same
 Ex. Receiving a paycheck at the end of each week
 If you were teaching a rat to press a lever to get food pellets,
you might require it to push the lever at least once within a 2-minute
time span to get a pellet
 It wouldn’t matter how many times the rat pushed the bar; the
rat would only get a pellet at the end of the 2-minute interval if
it had pressed the bar at least once
Fixed Interval Schedule of Reinforcement
 Fixed interval schedule of reinforcement does not produce a fast rate
of responding
 Since it only matters that at least one response is made during the
specific interval of time, speed is not that important
 Eventually the rat will start pushing the lever only as the interval of
time nears its end, which is what causes the “scalloping” effect seen in
the graph
 The response rate goes up just before the reinforcer and then drops off
immediately after, until it is almost time for the next reinforcer
Ex. This is similar to the way factory workers speed up production just before payday and slow down just after payday
Variable Interval Schedule of Reinforcement
 Variable interval schedule of reinforcement – the interval of time
that must pass before reinforcement becomes possible is different for each
trial or event
 A rat might receive a food pellet when it pushes a lever, every 5 minutes on
average, but sometimes the interval might be 2 minutes, sometimes 10
 But the rat must push the lever at least once after that 2 or 10 minute interval to
get the pellet
 Because the rat can’t predict how long the interval is going to be, it pushes the bar
more or less continuously, producing the smooth line on the graph
Ex. Dialing a busy phone number: because you don’t know when the call will go through, you keep dialing and dialing
Ex. Pop quizzes are unpredictable; students don’t know exactly what day they might be given a quiz, so the best strategy is to study a little every night just in case and show up to class…
Fixed Ratio Schedule of Reinforcement
 Fixed ratio schedule of reinforcement – the number of responses required
for reinforcement is always the same
 Notice 2 things about the graph
 The rate of responding is very fast, especially compared to the fixed interval schedule
 Rapid response rate occurs because the rat wants to get to the next reinforcer as fast as possible,
and the number of lever pushes counts
 There are little “breaks” in the response pattern immediately after a reinforcer is
given
 The pauses or breaks come right after a reinforcer, because the rat knows “about how many” lever
pushes will be needed to get to the next reinforcer because it’s always the same
 Fixed schedules, both interval and ratio, are predictable, which allows for rest breaks
Ex. Some sandwich shops give out punch cards to their customers that get punched every time they buy a sandwich; when the card has 10 punches, the customer might get a free sandwich
Variable Ratio Schedule of Reinforcement
 Variable ratio schedule of reinforcement – the number of responses required
for reinforcement is different for each trial or event
 The rat might be expected to push the bar an average of 20 times to get
reinforcement; that means that sometimes the rat would push the lever 10 times
before a reinforcer comes, but on other trials it might take 30 presses or more
 In the graph, the response rate is just as rapid as on the fixed ratio schedule
because the number of responses still matters
 But the line is much smoother: the rat takes no rest breaks because it
doesn’t know how many times it may have to push the lever to get the next food pellet
 Unpredictability makes the variable schedule responses more or less continuous
Ex. People who put money into a slot machine continuously do so because they don’t know how many times they will have to do this until the jackpot comes. They keep going because “the next one” might hit the jackpot. The same is true of lottery tickets and pretty much any sort of gambling
Comparison of Reinforcement Schedules
Additional Factors to Effective Reinforcement
 Regardless of the schedule of reinforcement, 2 additional factors contribute
to making reinforcement of a behavior as effective as possible
 Timing
 In general, a reinforcer should be given as immediately as possible after the
desired behavior
 Delaying reinforcement tends not to work well, especially when dealing with
animals and small children
 Reinforce only the desired behavior
 This should be obvious, but everyone makes mistakes sometimes
 Ex. Many parents make the mistake of giving a child who has not done some
chore the promised treat anyway, which completely undermines the child’s
learning of that chore or task
 Also, who hasn’t given a treat to a pet that has not really done the trick?
 Examples: which kind of reinforcement is going on?
 Andy’s father nags him to wash his car. Andy hates being nagged, so he
washes the car so his father will stop nagging.
 Negative reinforcement: washing his car removes the unpleasant stimulus of
his father nagging
 Bradley learns that talking in a funny voice gets him lots of attention
from his classmates, so now he talks that way often.
 Positive reinforcement: the attention increases his use of the funny voice
 Tina is a server at a restaurant and always tries to smile and be
pleasant because that seems to lead to bigger tips
 Positive reinforcement: Tina’s smiling and pleasantness are reinforced by
better tips
 Will turns his report in to his teacher on the day it is due because
papers get marked down a letter grade for every day they are late
 Negative reinforcement: turning in the paper on time avoids the unpleasant
stimulus of being marked down a grade
The Role of Punishment in Operant Conditioning
 Thinking back to positive and negative reinforcement
 These strategies are important for increasing the likelihood that
the targeted behavior will occur again
 But what about a behavior we do not want to occur again?
 Punishment…
How Does Punishment Differ From Reinforcement?
 People experience 2 kinds of things as consequences in the world
 Things they like (ex. Food, money, candy, sex, praise, etc.)
 Things they don’t like (ex. Spankings, being yelled at, experiencing
any kind of pain, etc.)
 Additionally, people experience these two kinds of consequences
in 1 of 2 ways
 Directly (ex. Getting money for working or getting yelled at for
misbehaving)
 Or they don’t experience them at all (ex. Losing an allowance for
misbehaving or avoiding a scolding by lying about misbehavior)
4 Ways to Modify Behavior
 Positive = adding; negative = removing/avoiding
 Positive reinforcement: adding something valued or desirable
 Ex. Getting a gold star for good behavior
 Punishment by application: adding something unpleasant
 Ex. Getting a spanking for disobeying
 Negative reinforcement: removing or avoiding something unpleasant
 Ex. Avoiding a ticket by stopping at a red light
 Punishment by removal: removing something valued or desirable
 Ex. Losing a privilege such as going out with friends
2 Kinds of Punishment
 Punishment – any event or object that, when following a
response, makes that response less likely to happen again
 Punishment by application – the punishment of a response
by the addition or experiencing of an unpleasant stimulus
 This is the kind of punishment people usually think of
 Ex. Spanking
 Punishment by removal – the punishment of a response by the
removal of a pleasurable stimulus
 This is the kind of punishment people normally confuse with negative
reinforcement
 Ex. “grounding” a teenager is removing the freedom to do what the teenager
wants to do
Negative Reinforcement VS. Punishment by Removal
 Negative reinforcement: stopping at a red light to avoid getting in an accident
 Punishment by removal: losing the privilege of driving because you got into too many accidents
 Negative reinforcement: mailing an income tax return by April 15 to avoid paying a penalty
 Punishment by removal: having to lose some of your money to pay the penalty for late tax filing
 Negative reinforcement: obeying a parent before the parent reaches the count of 3 to avoid getting a scolding
 Punishment by removal: being “grounded” (losing your freedom) because of disobedience
 Negative reinforcement occurs when a response is followed by the removal of an unpleasant
stimulus
 If something unpleasant has just gone away as a consequence of that response, the response
will tend to happen again
 If the response increases, the consequence has to be some kind of reinforcement
 Punishment by removal occurs when a response is followed by the removal of a pleasant
stimulus
 If something pleasant is taken away as a consequence of a response, the response probably
will not happen again
 If the response decreases, the consequence has to be some type of punishment
 In both, something is removed, but the difference between them is what is taken away and
the result it has on behavior
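The distinction above comes down to two questions: is the stimulus pleasant or unpleasant, and is it added or removed/avoided? A minimal Python sketch of the 4 ways to modify behavior, using the labels from these notes (the function names are hypothetical):

```python
def name_consequence(pleasant: bool, added: bool) -> str:
    """Classify a consequence using the 4 ways to modify behavior.

    pleasant: True if the stimulus is something valued or desirable.
    added:    True if it is added (positive); False if removed or avoided (negative).
    """
    if added:
        return "positive reinforcement" if pleasant else "punishment by application"
    return "punishment by removal" if pleasant else "negative reinforcement"


def behavior_change(pleasant: bool, added: bool) -> str:
    """Reinforcement makes the response more likely; punishment makes it less likely."""
    return "increases" if pleasant == added else "decreases"


# Examples taken from these notes
print(name_consequence(pleasant=False, added=False))  # negative reinforcement (Andy: nagging stops)
print(name_consequence(pleasant=True, added=True))    # positive reinforcement (Bradley: attention)
print(name_consequence(pleasant=True, added=False))   # punishment by removal (grounding)
```

In both "negative" cases something is removed; what differs is whether the removed stimulus was unpleasant (the response increases) or pleasant (the response decreases), which is exactly what `behavior_change` encodes.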
Problems With Punishment
 Although punishment can be effective in reducing or weakening a behavior, it has several
drawbacks
 Punishment is used to weaken a response, and getting rid of a response that is already well
established isn’t easy
 In reinforcement, all that has to be done is strengthen an already existing response
 Punishment usually serves to temporarily suppress or inhibit a behavior until enough time
has passed
 Ex. Punishing a child’s bad behavior doesn’t always eliminate the behavior completely
 As time goes on, the punishment is forgotten, and the “bad” behavior may occur again in a
kind of spontaneous recovery of the old (probably pleasurable) behavior
 Punishment by application can be pretty severe, and severe punishment does one thing well: it
stops the behavior immediately
 It may not stop it permanently, but it does stop it
 In a situation in which a child might be doing something dangerous or self-injurious, this kind
of punishment is sometimes more acceptable
 Ex. If a child starts to run into a busy street, the parent might scream at the child to stop and
then administer several rather severe swats to the child’s rear
 If this is not usual behavior for the parent, the child will most likely never run into the street
again
Problems With Punishment
 Other than situations of immediately stopping dangerous behavior,
severe punishment has too many drawbacks to be really useful (it can
also lead to abuse)
 Severe punishment may cause a child (or animal) to avoid the punisher
instead of the behavior being punished, so the child (or animal) learns
the wrong response
 Severe punishment may encourage lying to avoid the punishment (a kind
of negative reinforcement), again, not the response that is desired
 Severe punishment creates fear and anxiety, emotional responses that do
not promote learning; if the point is to teach something, this kind of
consequence isn’t going to help
 Hitting provides a successful model for aggression
Problems With Punishment
 Punishment as a model for aggression
 The adult is using aggression to get what he/she wants from the child
 Children sometimes become more likely to use aggression to get what they want
when they receive this kind of punishment
 And, the adult has lost an opportunity to model a more appropriate way to deal
with parent-child disagreements
 Since aggressive punishment does tend to stop the undesirable behavior, at least for
a little while, the parent actually experiences a kind of negative reinforcement
 When they spank, the unpleasant behavior goes away
 This may increase the tendency to use aggressive punishment over other forms of
discipline and can lead to child abuse
 Some children are so desperate for their parents’ attention that they will misbehave
on purpose
 The punishment is a form of attention, and these children will take whatever
attention they can get, even if it is negative
Problems With Punishment
 Punishment by removal is less objectionable and is the only
kind of punishment that is permitted in many public schools
 But this kind of punishment also has drawbacks
 It teaches the child what not to do but not what the child should do
 Both punishment by removal and punishment by application
are usually only temporary in their effect on behavior
 As time passes, the behavior will most likely return as the
memory of the punishment gets weaker, allowing spontaneous
recovery of the negative behavior
How to Make Punishment More Effective
 Punishment should immediately follow the behavior it is meant to punish
 If the punishment comes long after the behavior, it will not be associated with that
behavior (also true for reinforcement)
 Punishment should be consistent
 If the parent says that a certain punishment will follow a certain behavior, the parent
must make sure to follow through and do what he/she promised
 Punishment for a particular behavior should stay at the same intensity or increase
slightly but never decrease
 Ex. If a child is scolded for jumping on the bed the first time, the second time the behavior happens
the child should be punished by scolding or by a stronger penalty, like removal of a favorite toy
 But if the first misbehavior is punished by spanking and the second only by a scolding, the child
learns to “gamble” with the possible punishment
 Punishment of the wrong behavior should be paired with reinforcement of the right
behavior
 Pairing punishment with reinforcement allows parents and others to use a much
milder punishment and still be effective
 It also teaches the desired behavior rather than just suppressing the undesired one
 Ex. If a 2-year-old is eating with her fingers, the parent should gently pull her hand out of her plate
and say something like “No, we don’t eat with our fingers, we eat with our fork,” then place the fork
in the child’s hand and praise her for using it: “See, you are doing such a good job with your fork,
I’m so proud of you!”
Stimulus Control
 Discriminative stimulus – any stimulus that provides the organism
with a cue for making a certain response in order to obtain
reinforcement
 Specific cues lead to specific responses, and discriminating between cues
leads to success
 Ex. A police car is a discriminative stimulus for slowing down, and a
red stoplight is a cue for stopping, because both of these actions are
usually followed by negative reinforcement: people don’t want to get
a ticket or get hit by another car
 Ex. A doorknob is a cue for where to grab a door to open it
 If a door has a knob, people always turn it, but if it has a handle, people
usually pull it
 The 2 kinds of opening devices each cause a different response from
people, and their reward is opening the door
Other Concepts in Operant Conditioning
 Shaping – the reinforcement of simple steps in behavior that lead to a
desired, more complex behavior
 Ex. If you wanted to train your dog to jump through a hoop, you would have
to start with some behavior that the dog is already capable of doing on its own
 Then you would gradually mold that starting behavior into a jump (something the dog is
capable of doing but not likely to do on its own)
 You would start with the hoop on the ground in front of the dog and then call
the dog through the hoop, using a treat as bait
 After the dog steps through the hoop, you give the dog a treat (positive
reinforcement)
 The next time, you could raise the hoop a little and reward the dog for walking
through it again, then raise the hoop again, reward again, and so on
 The goal is achieved by reinforcing each successive approximation
 Successive approximations – small steps in behavior, one after the other,
that lead to a particular goal behavior
 Extinction in operant conditioning involves the removal of the reinforcement (in classical
conditioning, extinction involves the removal of the UCS)
 Ex. If a child throws a tantrum to get a candy bar, and the parent refuses to give in and removes the
reinforcement (the candy bar and, if possible, parental attention), the tantrum
will eventually stop
 Operantly conditioned responses can also be generalized to stimuli that are similar to the
original stimulus (just like in classical conditioning)
 Ex. When a baby is first learning to label objects and people, he may say “Dada” when his
father is present, and the father reinforces the behavior with praise and attention
 But sometimes the baby will call all men “Dada”; over time, as other men fail to reinforce
this response, he’ll learn to discriminate between them and his father and call only his father
“Dada”
 In this way, the man who is actually his father becomes a discriminative stimulus
 Spontaneous recovery also occurs in operant conditioning (just like in classical
conditioning)
 Ex. In the example of teaching the dog to jump through the hoop, if the dog has already
learned other tricks, like rolling over or shaking paws, when learning a new trick the dog
may try to get a reinforcer by performing its old tricks, before finally walking through the
hoop
Using Operant Conditioning: Behavior Modification
 Behavior modification – the use of operant conditioning techniques to bring about
desired changes in behavior
 Used for many years to change undesirable behavior and create desirable responses in animals
and humans, particularly in schoolchildren
 Ex. A teacher who wants to use behavior modification to help a child learn to be more attentive
during lectures might:
 Select a target behavior, such as making eye contact with the teacher
 Choose a reinforcer, such as a gold star applied to the child’s chart on the wall
 Every time the child makes eye contact, the teacher gives the child a gold star. Inappropriate
behavior, such as looking out the window, is not reinforced with gold stars
 At the end of the day, the teacher gives the child a special treat or reward for having a certain
number of gold stars (reward is decided ahead of time and discussed with the child)
 The gold stars in the example above can be considered tokens, secondary reinforcers that can
be traded in for other kinds of reinforcers
 Token economy – type of behavior modification in which desired behavior is rewarded with
tokens
 Commonly used in programs like Alcoholics Anonymous
 Another tool behaviorists use to modify behavior is called
time-out
 Time-out – form of mild punishment by removal in which a
misbehaving animal, child, or adult is placed in a special area
away from the attention of others
 Essentially, the organism is being “removed” from any possibility
of positive reinforcement in the form of attention
 Applied behavior analysis (ABA) – modern term for a form of
behavior modification that uses both analysis of current behavior and
behavioral techniques to address a socially relevant issue
 Ex. ABA has been used as a technique involving shaping to teach social
skills to individuals with autism
 Small pieces of candy are used as reinforcers to teach social skills and
language to children with autism
 In ABA, skills are broken down into their simplest steps and then
taught to the child through a system of reinforcement
 Prompts (such as moving a child’s face to look at a teacher or a task) are
given as needed when the child is learning a skill or refuses to cooperate
 As the child begins to master a skill and receives reinforcement in the
form of treats or praise, the prompts are gradually taken away until the
child can do the skill independently
 Techniques for modifying responses have been developed so that even
biological responses, normally considered involuntary, such as blood
pressure, muscle tension, and hyperactivity can be brought under conscious
control
 Biofeedback – using feedback about biological conditions to bring
involuntary responses, such as blood pressure and relaxation, under
voluntary control
 A relatively new biofeedback technique, called neurofeedback, involves
trying to change brain-wave activity
 Involves amplifiers connected to a computer that records and analyzes the
physiological activity of the brain
 Neurofeedback can be integrated with video-game-like programs that individuals
can use to learn how to produce brain waves or specific types of brain activity
associated with specific cognitive or behavioral states (ex. increased attention,
staying focused, relaxed awareness)
Cognitive Learning Theory
 Behaviorists believed that only observable, measurable behavior
should be studied
 But, other psychologists had an interest in cognition, the mental
events that take place inside a person’s mind while behaving
 These individuals began to dominate the field of experimental
psychology
 Behaviorists could no longer ignore the thoughts, feelings, and
expectations that clearly existed in the mind and that seemed to
influence observable behavior
 They eventually began to develop a cognitive learning theory to
supplement the more traditional theories of learning (conditioning)
 There are 3 important people who are often cited as key theorists in the
early days of the development of cognitive learning theory
 Edward Tolman (a behaviorist strongly influenced by Gestalt ideas), Gestalt psychologist
Wolfgang Kohler, and modern psychologist Martin Seligman
Latent Learning: Tolman’s Maze-Running Rats
 Tolman’s best-known experiments in learning involved teaching 3
groups of rats the same maze, one at a time
 1st group – each rat was placed in the maze and reinforced with food for
making its way out the other side
 The rat was then placed back in the maze, reinforced when it completed the maze
again, and so on until the rat could successfully solve the maze without making
any errors (like wrong turns)
 2nd group – rats were treated exactly like the first group except that they
didn’t get any reinforcement when they exited the maze; they were
simply put back in over and over again for the first 9 days
 On the 10th day, the rats began to receive reinforcement for getting out of the
maze
 3rd group – served as a control group and were not reinforced over the
entire course of the experiment
 A behaviorist would predict that only the 1st group of rats would learn the maze,
because learning depends on reinforcement
 At first, the 1st group of rats solved the maze after a certain number of trials
 The 2nd and 3rd groups, by contrast, seemed to wander aimlessly around until
accidentally finding their way out
 On the 10th day, the first time the 2nd group was reinforced, they solved the maze
almost immediately
 Rats in the 2nd group, while wandering around in the first 9 days, had learned how
to navigate the maze successfully and had stored this knowledge as a kind of “mental
map,” or cognitive map of the layout of the maze
 The rats in the 2nd group had not demonstrated their learning of the maze in the
first 9 days because they had no reason to
 The cognitive map had remained hidden, or latent, until the rats had a reason to use
it: getting reinforced with food for completing the maze
 Tolman called this latent learning – learning that remains hidden until its
application becomes useful
 The idea that learning could happen without reinforcement, and then later affect behavior, was
something traditional operant conditioning could not explain
Insight Learning: Kohler’s Smart Chimp
 Kohler was a Gestalt psychologist who became marooned on an island
off the coast of North Africa when WWI broke out
 At the time, he was working at a primate research lab on the island
and began to study animal learning
 In one famous study, Kohler set up a problem for a chimpanzee
named Sultan
 The problem was how to get to a banana that was placed just out of his
reach outside his cage
 Sultan solved the problem relatively easily, first trying to reach through
the bars with his arm, then using a stick that was lying in the cage to rake
the banana to him
 Because chimpanzees are natural tool users, this demonstrates only trial-and-error learning
 Then, the banana was placed just out of reach of Sultan’s extended arm with
the stick in his hand
 There were two sticks lying around in the cage, which could be fitted together to
make a single pole that would be long enough to reach the banana
 Sultan first tried one stick, then the other, and after about an hour he
pushed one stick out of the cage as far as it would go toward the banana and
then pushed the other stick behind the first one
 Of course, when he tried to pull the sticks back, he could only get the one in his
hand
 When Kohler gave him the other stick back, he sat on the floor of the cage and
looked at the sticks carefully; he then put them together and retrieved his
banana
 Kohler called Sultan’s rapid “perception of relationships”
insight
 Insight – the sudden perception of relationships among
various parts of a problem, allowing the solution to the problem
to come quickly
 Insight could not be gained through trial-and-error learning
alone
Learned Helplessness: Seligman’s Depressed Dogs
 Learned helplessness – the tendency to fail to act to escape from a
situation because of a history of repeated failures in the past
 Seligman presented a tone followed by a harmless but painful electric
shock to one group of dogs
 The dogs in this group were harnessed so they could not escape the shock
 The researchers assumed the dogs would learn to fear the sound of the
tone and later try to escape from the tone before being shocked
 Another group of dogs was not conditioned to fear the tone
 The dogs were then placed in a box with a low fence
in the middle that divided the box into 2 compartments
 The dogs could easily see over the fence and jump over if
they wanted
 Dogs who had not been conditioned to fear the tone quickly
jumped from one side of the box to the other as soon as the
shock occurred
 When the dogs who were conditioned to fear the tone were
placed in the box
 Instead of jumping over the fence when the tone sounded, they
just sat there
 They showed distress but didn’t try to jump over the fence even
when the shock itself began
 Why?
 The dogs that had been harnessed while being conditioned had
apparently learned in the original tone/shock situation that there was
nothing they could do to escape the shock
 So when placed in a situation in which escape was possible, the dogs still
did nothing because they had learned to be “helpless”
 They believed they could not escape, so they didn’t try
 More recently, learned helplessness has been studied from a
neuroscientific perspective
 Research has indicated that brain areas associated with fear/anxiety,
suppression of the fight-flight response, and higher-level brain areas in the
frontal lobe that determine whether or not a stimulus is controllable all
play a role
 The concept of learned helplessness has been extended to explain
some behaviors characteristic of depression
 Depressed people seem to lack normal emotions and become
somewhat apathetic
 They often stay in unpleasant work environments or bad marriages or
relationships rather than trying to escape or better their situation
 Seligman proposed that this depressive behavior is a form of learned
helplessness
 Depressed people may have learned in the past that they seem to have no
control over what happens to them
 A sense of powerlessness and hopelessness is common in depressed
people, and this seems to apply to the behavior of Seligman’s dogs
Observational Learning
 Observational learning – learning new behavior by
watching a model (someone else who is doing that behavior)
perform that behavior
 Sometimes the behavior is desirable, and sometimes it is not
 Classic study of observational learning: Albert Bandura’s
“Bobo doll” study
Bandura and The Bobo Doll
 In this classic study, 2 groups of preschool children were placed
in a room with an experimenter and a model and watched as
the model interacted with toys in the room
 Group 1 – the model interacted with the toys in a nonaggressive
manner, completely ignoring the presence of a “Bobo” doll
 Group 2 – the model became very aggressive with the doll,
kicking it and yelling at it, throwing it in the air and hitting it with
a hammer
 Then each child was left alone in the room and had the
opportunity to play with the toys in the room while a camera
filmed them through a one-way mirror
 Children in group 1, who had watched the model ignore the
Bobo doll, did not act aggressively toward the doll
 Children in group 2, who had watched the model act
aggressively, beat up on the doll, hitting and kicking it, exactly
imitating the model’s behavior
 Obviously, the aggressive children had learned their
aggressive actions from merely watching the model, with no
reinforcement
 This is an example of learning/performance
distinction – the observation that learning can take place
without actual performance of the learned behavior
 In later studies, 2 groups of children were shown the model acting
aggressively on film
 Group 1 – watched the model act aggressively with the Bobo doll and then be
rewarded
 Group 2 – watched the model act aggressively with the Bobo doll and then be
punished
 When placed in the room with the doll, children from group 1 imitated the
model’s aggressive actions but children from group 2 did not
 Then, children from group 2 were told they would receive a reward if they
could show the experimenter what the model had done
 Each child from group 2 then correctly imitated the model’s aggressive
behaviors
 Apparently, consequences do matter in motivating a child or an adult to
imitate a particular model
 This makes the tendency for some movies and TV programs to
make “heroes” out of violent, aggressive “bad guys” particularly
disturbing
 A recent nationwide study of young people ages 8-18 in the U.S.
 Found that young people spend almost 7.5 hours on average per day
involved in media consumption (TV, computers, video games,
music, cell phones, print, and movies) 7 days a week
 Given the prevalence of media multitasking (using more than one
media device at a time) they are packing in approximately 10 hrs 45
mins of media during those 7.5 hours
 While not all media consumption is violent, it’s easy to imagine that
some of that media is of a violent nature
 Correlational research stretching over nearly 2 decades
suggests that a link exists between viewing violent TV and an
increased level of aggression in children
 While correlations do not prove that viewing violence on TV causes
increased violence, the link between viewing violence on TV and
violent behavior is a strong one
 Although still a topic of debate, there appears to be a strong
body of evidence that exposure to media violence does have
immediate and long-term effects
 Increasing the likelihood of aggressive verbal and physical behavior
and aggressive thoughts and emotions in children, adolescents,
and adults
The 4 Elements of Observational Learning
 Attention – to learn anything through observation, the learner must first pay attention to
the model
 Ex. A person at a fancy dinner party who wants to know which utensil to use has to watch a
person who seems to know what is correct
 Certain characteristics of models can make attention more likely
 Ex. People pay more attention to those they perceive as similar to them, and to those they perceive as attractive
 Memory – the learner must be able to retain the memory of what was done
 Ex. Remembering the steps in preparing a meal that was seen on a cooking show
 Imitation – the learner must be capable of reproducing, or imitating, the actions of the
model
 Ex. A 2 year old might be able to watch someone tie shoelaces and might even remember most
of the steps, but the 2 year old’s chubby little fingers will not have the dexterity necessary for
actually tying the laces
 Motivation – the learner must have the desire or motivation to perform the action
 Ex. The person at the fancy dinner party might not care which fork is the “proper” one to use
 If a person expects a reward because one has been given in the past, has been promised a
future reward, or has witnessed a model getting a reward, that person will be much more likely to
imitate the observed behavior