Unit 4 – Learning through Conditioning

Learning through Conditioning
Learning: a relatively durable change in behaviour or knowledge that
is due to experience.
Conditioning: learning associations between events that occur in an
organism’s environment
Classical Conditioning: a stimulus acquires the capacity to evoke a
response that was originally evoked by another stimulus.
- first studied by Ivan Pavlov
Classical Conditioning Terminology:
Unconditioned stimulus (UCS): evokes an unconditioned response
without previous conditioning
Unconditioned response (UCR): an unlearned reaction to an
unconditioned stimulus that occurs without previous conditioning
Conditioned stimulus (CS): a previously neutral stimulus that has,
through conditioning, acquired the capacity to evoke a conditioned
response
Conditioned response (CR): a learned reaction to a conditioned
stimulus that occurs because of previous conditioning.
CS + UCS → UCR
bell + meat → salivation
CS → CR
bell → salivation
Conditioned fears: many irrational fears and phobias can be traced
back to experiences that involve classical conditioning
Conditioning and Physiological Responses
Ader & Cohen (1981, 1984, 1993) have shown that classical conditioning procedures can lead to immunosuppression - a decrease in the production of antibodies.
They paired a drug that chemically suppresses the
immune system (UCS) with an unusual tasting liquid
(CS). Eventually, the liquid alone resulted in
immunosuppression (CR).
Acquisition: the process of pairing the CS and UCS, until the
conditioned response is elicited by the conditioned stimulus
- stimuli that are novel, unusual, or especially intense have more potential to become conditioned stimuli
- the timing of the stimulus presentations is critical
o Simultaneous conditioning: CS and UCS begin and
end together
o Trace conditioning: CS begins and ends before the UCS
is presented
o Short-delayed conditioning: CS begins just before UCS
(about half a second before) and stops at the same time
as the UCS. This is the timing that works best.
Extinction: when the UCS no longer follows the CS, the CR gets weaker and eventually ceases.
Spontaneous Recovery: the reappearance of an extinguished
response after a period of non-exposure to the conditioned stimulus.
- the renewal effect: if a response is extinguished in a different environment than the one in which it was acquired, it will reappear if the animal is returned to the original environment where acquisition took place
- so extinction leads to suppression of the CR, not “unlearning”
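The acquisition and extinction curves described above can be pictured with a toy simulation (an illustration, not part of the original overheads). The learning rule below is an assumption: on each CS–UCS pairing the associative strength moves a fixed fraction toward its maximum, and on each CS-alone trial it decays toward zero; all numbers are arbitrary.

```python
# Toy sketch of acquisition and extinction in classical conditioning.
# Assumption: a simple linear-operator learning rule; the rate and trial
# counts are arbitrary illustrative values, not taken from the notes.

def run_trials(strength, n_trials, rate, target):
    """Move associative strength a fixed fraction of the way toward `target`."""
    history = []
    for _ in range(n_trials):
        strength += rate * (target - strength)
        history.append(round(strength, 3))
    return strength, history

strength = 0.0

# Acquisition: the CS is paired with the UCS, so strength climbs toward 1.0.
strength, acquisition = run_trials(strength, n_trials=10, rate=0.3, target=1.0)

# Extinction: the CS is presented alone, so strength decays back toward 0.0.
strength, extinction = run_trials(strength, n_trials=10, rate=0.3, target=0.0)

print("acquisition:", acquisition)  # rises quickly, then levels off
print("extinction: ", extinction)   # weakens gradually toward zero
```

Note that this toy model treats extinction as unlearning; as the renewal effect shows, real extinction suppresses the CR rather than erasing it.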
Stimulus Generalization: a stimulus similar to the CS may also elicit
the CR.
- e.g. Little Albert (Watson & Rayner, 1920)
- the more similar new stimuli are to the original CS, the greater the generalization.
Stimulus Discrimination: the process by which an organism learns to respond differently to stimuli distinct from the CS.
- the less similar new stimuli are to the original CS, the greater the likelihood and ease of discrimination.
Higher order conditioning: when a conditioned stimulus functions
as if it were an unconditioned stimulus.
phase 1: a neutral stimulus is paired with a UCS until it becomes a
CS
CS (tone) + UCS (meat) → UCR (salivation)
CS (tone) → CR (salivation)
phase 2:
another neutral stimulus is paired with the previously
established CS, and acquires the capacity to elicit the
response originally evoked by the UCS
CS (red light) + CS (tone) → CR (salivation)
CS (red light) → CR (salivation)
Operant Conditioning
Volitional responses come to be controlled by their consequences.
(also called instrumental learning)
- Edward Thorndike (1898): cats and “puzzle boxes”. Cats used trial-and-error learning, not simple reflexes, to figure out a way out of the puzzle box.
* responses leading to positive outcomes are strengthened.
* responses leading to negative outcomes are weakened.
Thorndike’s Law of Effect: If a response in the presence of a
stimulus leads to satisfying effects, the association between the
stimulus and the response is strengthened.
B.F. Skinner (1953, 1969, 1984): organisms tend to repeat those
responses that are followed by favourable consequences.
Reinforcement: when an event following a response increases an
organism’s tendency to make that response.
Operant chamber (a.k.a. a Skinner Box): a small enclosure in which
an animal can make a specific response that is recorded while the
consequences of the response are systematically controlled.
Reinforcement contingencies: the circumstances or rules that
determine whether responses lead to the presentation of reinforcers.
- the experimenter manipulates whether positive consequences occur when the animal makes a designated response (typically the delivery of a bit of food into a food cup in the chamber)
- the key dependent variable is response rate over time, recorded by the cumulative recorder
Acquisition: the initial stage of learning some new pattern of
responding
Shaping: the reinforcement of closer and closer approximations of a
desired response.
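Shaping can be pictured as a loop in which the standard a response must meet is raised each time the learner reaches the current standard. The sketch below is a toy illustration under assumed numbers (the response model, learning rate, and criterion step are not from the notes).

```python
# Toy sketch of shaping: reinforce successively closer approximations to a
# target response. The response model and all numbers are illustrative
# assumptions, not part of the notes.
import random

random.seed(1)

target = 10.0      # the desired response magnitude (e.g. a full lever press)
typical = 1.0      # the animal's current typical response
criterion = 2.0    # what a response must currently reach to be reinforced

for trial in range(200):
    response = random.gauss(typical, 1.0)  # responses vary around the typical value
    if response >= criterion:
        # Reinforcement shifts the typical response toward the reinforced one.
        typical += 0.3 * (response - typical)
        # Raise the criterion, demanding a closer approximation next time.
        criterion = min(target, criterion + 0.2)

print(f"typical response after shaping: {typical:.1f} (target {target})")
```

Reinforcing only the full target response from the start would almost never pay off, because the animal rarely emits it spontaneously; that is why the gradually rising criterion matters.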
Extinction: the gradual weakening and disappearance of a response
tendency because the response is no longer followed by a reinforcer.
Resistance to extinction: when an organism continues to make a
response after delivery of the reinforcer for it has been terminated.
- the greater the resistance to extinction, the longer the responding will continue
Antecedent stimuli: stimuli that precede a response can also exert
considerable influence over operant behaviour.
- when a response is consistently followed by a reinforcer in the presence of a particular stimulus, that stimulus can serve as a signal indicating that the response is likely to lead to a reinforcer. It becomes a discriminative stimulus: a cue that influences operant behaviour by indicating the probable consequences of a response.
Delayed Reinforcement: The longer the delay between the
response and delivery of a reinforcer, the more slowly conditioning
proceeds.
Conditioned Reinforcement
- primary reinforcers are events that are inherently reinforcing because they satisfy biological needs (e.g. food, water, sex, warmth)
- secondary reinforcers are events that acquire reinforcing qualities by being associated with primary reinforcers (e.g. money, good grades, praise)
Reinforcement Schedules
Continuous reinforcement: when every instance of a designated
response is reinforced.
Intermittent, or partial, reinforcement: when a designated
response is reinforced only some of the time.
Four Popular Intermittent Schedules
Fixed-Ratio (FR): reinforcement comes after a fixed number of
responses, e.g. FR-25 = reinforcement on every 25th response.
- high rate of responding
Variable Ratio (VR): the number of responses needed for
reinforcement varies, e.g. VR-10 = on the average, reinforced after
every 10th response
- highest rate of responding
- greatest resistance to extinction
Fixed Interval (FI): a reinforcer is delivered for the first response made after a fixed period of time has gone by, e.g. FI-10 = the subject has to wait 10 seconds after reinforcement before a response will yield another reinforcer.
- response patterns are scalloped (responding pauses after each reinforcer and speeds up as the interval ends)
Variable Interval (VI): the first response after a variable period of time has elapsed is reinforced, e.g. VI-20 = reinforcers are delivered on average once every 20 seconds.
- generates a low but stable response rate
- hard to extinguish
- ratio schedules produce more rapid responding than interval schedules
- variable schedules tend to produce steadier response rates and greater resistance to extinction
- shifting to a higher ratio stimulates harder work and greater productivity
- gambling is reinforced on a variable-ratio schedule, which produces rapid, steady responding with great resistance to extinction.
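The four intermittent schedules differ only in the rule that decides whether a given response earns a reinforcer. The sketch below expresses those rules directly; the class names, the once-per-second responding in the example, and the way the variable schedules are randomized are assumptions for illustration, not details from the notes.

```python
# Minimal sketch of the four intermittent reinforcement schedules as
# "should this response be reinforced?" rules. Names and numbers are
# illustrative assumptions.
import random

class FixedRatio:
    """FR-n: reinforce every nth response."""
    def __init__(self, n):
        self.n, self.count = n, 0
    def respond(self):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """VR-n: reinforce after a varying number of responses, averaging n."""
    def __init__(self, n):
        self.n, self.count = n, 0
        self.required = random.randint(1, 2 * n - 1)  # mean is roughly n
    def respond(self):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = random.randint(1, 2 * self.n - 1)
            return True
        return False

class FixedInterval:
    """FI-t: reinforce the first response made after t seconds have elapsed."""
    def __init__(self, t):
        self.t, self.last = t, 0.0
    def respond(self, now):
        if now - self.last >= self.t:
            self.last = now
            return True
        return False

class VariableInterval:
    """VI-t: like FI, but the required wait varies, averaging t seconds."""
    def __init__(self, t):
        self.t, self.last = t, 0.0
        self.required = random.uniform(0, 2 * t)  # mean is roughly t
    def respond(self, now):
        if now - self.last >= self.required:
            self.last = now
            self.required = random.uniform(0, 2 * self.t)
            return True
        return False

# Example: a lever pressed once per second for 100 seconds.
fr, fi = FixedRatio(25), FixedInterval(10)
fr_rewards = sum(fr.respond() for _ in range(100))              # 4 reinforcers (every 25th press)
fi_rewards = sum(fi.respond(now=sec) for sec in range(1, 101))  # 10 reinforcers (one per 10 s)
print(fr_rewards, fi_rewards)
```

Notice that on the interval schedules extra responses within the interval earn nothing, which is part of why ratio schedules support faster responding.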
Positive Reinforcement: when a stimulus that follows a behaviour
increases the probability of that behaviour over time
Negative Reinforcement: when the removal of an unpleasant or
aversive stimulus is made contingent on a particular behaviour,
thereby strengthening that response
Avoidance behaviour
- escape learning: when an organism acquires a response that decreases or ends some aversive stimulation.
- shuttle box paradigm: 2 compartments with a door that can be opened and closed by the experimenter
- the animal is placed in one compartment and an electric current in the floor of the chamber is turned on with the doorway open.
- the animal learns to escape the shock by going into the other compartment.
- the escape response gets strengthened through negative reinforcement
- escape learning can lead to avoidance learning: when an organism acquires a response that prevents some aversive stimulus from happening at all.
- e.g. the experimenter gives the animal a signal that the shock is forthcoming
- avoidance responses are long-lasting, even though we’re not sure how the behaviour continues to be reinforced. The best explanation is the two-process theory of avoidance.
The Two-Process Theory of Avoidance
- the warning light becomes a CS (via classical conditioning), eliciting conditioned fear in the animal
- fleeing to the other side of the box is an operant response that produces negative reinforcement because it reduces conditioned fear
- the avoidance response removes an internal aversive stimulus, conditioned fear, rather than an external aversive stimulus, the shock.
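A toy simulation can make the two processes concrete (an illustration under assumed numbers, not a model from the notes): classical conditioning builds fear of the warning signal, and each avoidance response is negatively reinforced because it reduces that fear, even on trials where no shock would have occurred.

```python
# Toy sketch of the two-process theory of avoidance. All update rules and
# numbers are illustrative assumptions, not parameters from the notes.

fear = 0.0         # conditioned fear elicited by the warning signal
avoidance = 0.1    # strength of the jump-to-the-other-side response

for trial in range(50):
    avoided = avoidance > 0.5

    # Process 1 (classical conditioning): when the animal fails to avoid,
    # the warning signal is followed by shock, strengthening conditioned fear.
    if not avoided:
        fear = min(1.0, fear + 0.3)

    # Process 2 (operant conditioning): responding reduces the fear elicited
    # by the signal, and that fear reduction negatively reinforces the
    # response, whether or not a shock was actually coming.
    avoidance = min(1.0, avoidance + 0.2 * fear)

print(f"conditioned fear = {fear:.2f}, avoidance strength = {avoidance:.2f}")
# The avoidance response stays strong long after shocks stop occurring,
# because fear reduction keeps reinforcing it.
```

(The sketch leaves out extinction of the fear itself, which keeps the illustration simple.)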
Punishment: when an event following a response weakens the
tendency to make that response.
- typically involves presentation of an aversive stimulus
- can also involve the removal of a rewarding stimulus
- it can have unintended side-effects:
  - general suppression of behavioural activity
  - strong emotional responses, including fear, anxiety, anger, and resentment
  - physical punishment often leads to an increase in aggressive behaviour
Instinctive Drift: when an animal’s innate response tendencies
interfere with the conditioning process.
Conditioned taste aversion: tendency to associate a substance’s
taste with illness caused by eating that substance.
- can be acquired after even just one pairing
- can occur even with a delay of hours between the taste and the illness
- John Garcia (1989)
- probably a by-product of our evolutionary history
Preparedness and phobias:
Preparedness (Seligman, 1971): a species-specific predisposition to
be conditioned in certain ways and not others.
- instinctive drift, conditioned taste aversion, phobic responses
We carry an innate tendency acquired through natural selection to
respond quickly and automatically to stimuli that posed a survival
threat to our ancestors.
Observational Learning (Bandura, 1977, 1986)
- the Bobo doll study
- an organism’s responding is influenced by the observation of others, who are called models.
Four key processes
1. Attention. You need to be paying attention to someone else’s
behaviour and its consequences
2. Retention. You need to store a mental representation of what
you have witnessed in your memory to be able to use later.
3. Reproduction. You have to be able to reproduce the response.
4. Motivation. You’re not likely to engage in the behaviour unless
you’re motivated, e.g. by expectations that it will pay off for you.
- Reinforcement affects which responses are actually performed more than which responses are acquired.
Bandura’s theory explains why physical punishment tends to increase
aggressive behaviour in children:
- adults who punish physically are unwittingly serving as models for aggressive behaviour