CHAPTER 16: TEACHING AND REVISION NOTES
Comparisons of classical and operant conditioning
Classical conditioning vs operant conditioning

Acquisition
   Classical: by pairing the UCS with the NS
   Operant: by reinforcing the behaviour
Extinction
   Classical: by continually presenting the CS without the UCS
   Operant: by removing reinforcement
Stimulus generalisation
   Classical: a stimulus similar to the CS elicits the CR
   Operant: a similar discriminative stimulus leads to the behaviour
Stimulus discrimination
   Classical: the CR is no longer elicited by a stimulus similar to the CS
   Operant: an environment similar to the discriminative stimulus no longer elicits the behaviour
Spontaneous recovery
   Classical: when the CR has been extinguished, the CS presented after a time delay again causes the CR
   Operant: the extinguished response occurs again when the discriminative stimulus is presented after a delay
Role of the learner
   Classical: passive when either the CS or UCS is presented; the learner has no control over learning
   Operant: active; the learner controls the learning
Nature of the response
   Classical: involuntary (reflexive)
   Operant: voluntary
Timing
   Classical: the reinforcer is the UCS, and this precedes the response
   Operant: the reinforcer follows the desired response
Trial and error learning as informed by Edward Lee Thorndike’s puzzle-box experiment
E.L. Thorndike worked in the USA at the same time as Pavlov was making his discoveries in Russia.
Thorndike’s experiment with a cat in a ‘puzzle box’ was as follows:
• A hungry cat was placed in a wooden cage. The cat could see out (and smell food) but could not escape.
• There was a lever in the cage which, when pressed, would open the door, allowing the cat to escape.
• The cat explored the cage and eventually its random movements caused it to press the lever, and the door
opened.
• The cat escaped and was rewarded (ate the food).
• After about 22 trials (learning was not immediate), the cat escaped much more quickly when placed in the
cage, and its speed of escape remained fast on subsequent occasions.
• Thorndike realised that the cat had been instrumental in obtaining its freedom. He called the process of
learning instrumental conditioning.
• Thorndike framed his ‘Law of Effect’, which states that this type of learning occurs when ‘behaviour becomes
controlled by its consequences’.
Three-phase model of operant conditioning as informed by Skinner: positive and negative reinforcement,
response cost, punishment and schedules of reinforcement
OPERANT CONDITIONING
Learning in which behaviour becomes controlled by its consequences.
Operant conditioning procedures
B.F. Skinner began experimenting with rats in the 1930s. He coined the term operant conditioning to emphasise
that animals and people learn to operate on the environment to produce desired consequences. An operant is a
response (or set of responses) that occurs without being elicited by any stimulus and acts upon the environment
in the same way each time.
Oxford Psychology Units 3 & 4
ISBN 978 0 19 556717 5 © Oxford University Press Australia
THE THREE-PHASE MODEL OF OPERANT CONDITIONING
D-B-C
Discriminative stimulus (antecedent condition)
The condition that signals that it is appropriate to exhibit the behaviour.
Behaviour
Performing the action
Consequence
The outcome as a result of the behaviour
Elements of operant conditioning


• Reinforcement – any stimulus (action or event) that strengthens or increases the likelihood of a response
(behaviour) that it follows.
o Positive reinforcer – a reward which strengthens a response by providing a pleasant or satisfying
consequence.
o Negative reinforcer – strengthens a response through the removal, reduction, or prevention of an
unpleasant stimulus.
• Punishment – any stimulus (action or event) that weakens or decreases the likelihood of a response
(behaviour) that it follows.
o Punishment – weakens a response by presenting an unwanted stimulus.
o Response cost – weakens a response by removing a desired stimulus.
• Schedules of reinforcement (also punishment) – the frequency and manner in which a response is
reinforced.
• Continuous reinforcement – when a correct response is reinforced every time it is given.
• Partial reinforcement – when only some correct responses are reinforced. Responses conditioned under
partial reinforcement are usually stronger (take longer to extinguish) than those under continuous
reinforcement.
• Fixed interval schedule – reinforcement is delivered after a fixed time period (e.g. every 10 seconds).
• Fixed ratio schedule – reinforcement is delivered after a fixed number of correct responses (e.g. every 10th
response).
• Variable interval schedule – reinforcement occurs on an average of a set time interval, but not with regular
frequency (e.g. on an average of every 10 seconds, but with variations from 4 to 16 seconds).
• Variable ratio schedule – reinforcement occurs on the basis of a set average number of correct responses,
but is not regular in its occurrence (e.g. on an average of every 10th response, but with variations from the
2nd to the 18th response).
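
The four partial schedules above can be compared with a short simulation. The following Python sketch is an illustration only, not part of the syllabus: the function names are hypothetical, and it assumes (as in standard laboratory procedures) that an interval schedule reinforces the first response made once the interval has elapsed. The default values reuse the examples given in the notes (every 10th response, every 10 seconds, 2nd to 18th response, 4 to 16 seconds).

```python
import random


def fixed_ratio(num_responses, n=10):
    """Return the response numbers that are reinforced: every n-th one."""
    return [i for i in range(1, num_responses + 1) if i % n == 0]


def variable_ratio(num_responses, low=2, high=18, seed=0):
    """Reinforce after a random number of responses between low and high.

    With low=2 and high=18 the average gap is 10 responses.
    """
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    reinforced = []
    next_at = rng.randint(low, high)
    for i in range(1, num_responses + 1):
        if i == next_at:
            reinforced.append(i)
            next_at = i + rng.randint(low, high)
    return reinforced


def fixed_interval(response_times, interval=10.0):
    """Reinforce the first response made after each fixed interval.

    Assumption: the interval restarts from the reinforced response.
    """
    reinforced = []
    available_at = interval
    for t in response_times:
        if t >= available_at:
            reinforced.append(t)
            available_at = t + interval
    return reinforced


def variable_interval(response_times, low=4.0, high=16.0, seed=0):
    """As fixed_interval, but each wait is random between low and high."""
    rng = random.Random(seed)
    reinforced = []
    available_at = rng.uniform(low, high)
    for t in response_times:
        if t >= available_at:
            reinforced.append(t)
            available_at = t + rng.uniform(low, high)
    return reinforced


if __name__ == "__main__":
    print("Fixed ratio:", fixed_ratio(30))
    print("Variable ratio:", variable_ratio(30))
    times = [1, 5, 11, 12, 25, 30, 36]  # seconds at which responses occur
    print("Fixed interval:", fixed_interval(times))
    print("Variable interval:", variable_interval(times))
```

Under the ratio schedules the learner earns more by responding faster, whereas under the interval schedules extra responses made before the interval ends earn nothing. Continuous reinforcement corresponds to fixed_ratio with n=1.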
Shaping
A procedure in which a reinforcer is given for any response that successively approximates and ultimately leads to
the desired response or target behaviour. Also known as the method of successive approximations.
Extinction, spontaneous recovery, generalisation and discrimination




• Extinction – when the operantly conditioned response disappears over time as reinforcement ceases.
• Spontaneous recovery – the reappearance of an extinguished response after a rest period.
• Generalisation – in operant conditioning this usually refers to generalisation of the reinforcer (stimulus).
• Discrimination – the organism learns which responses will be reinforced and which will not.
Punishment
Distinction from negative reinforcement
• Negative reinforcement (like positive reinforcement) increases the probability of a response occurring.
Punishment decreases the probability of the response occurring.
• Although both negative reinforcement and punishment involve an unpleasant stimulus (e.g. a telling off, a
fine), punishment occurs when this unpleasant stimulus follows the response (e.g. inappropriate behaviour).
Negative reinforcement occurs when the response avoids or stops an existing unpleasant stimulus.
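
One way to keep the four types of consequence apart is to ask two questions: is a stimulus added or removed, and is that stimulus pleasant or unpleasant for the learner? The short Python sketch below tabulates the answers using the definitions in these notes. The names OUTCOMES and classify_consequence are hypothetical and serve only as a revision aid.

```python
# Each key is (what happens to the stimulus, how the learner experiences it).
# Each value is (name of the consequence, effect on the response).
OUTCOMES = {
    ("added", "pleasant"): ("positive reinforcement", "strengthens"),
    ("removed", "unpleasant"): ("negative reinforcement", "strengthens"),
    ("added", "unpleasant"): ("punishment", "weakens"),
    ("removed", "pleasant"): ("response cost", "weakens"),
}


def classify_consequence(change, stimulus):
    """Name the consequence and its effect on the response it follows."""
    try:
        return OUTCOMES[(change, stimulus)]
    except KeyError:
        raise ValueError(
            "change must be 'added' or 'removed'; "
            "stimulus must be 'pleasant' or 'unpleasant'"
        )


if __name__ == "__main__":
    # A fine removes money (a pleasant stimulus) after the behaviour.
    print(classify_consequence("removed", "pleasant"))
    # Taking a painkiller stops an existing headache (an unpleasant stimulus).
    print(classify_consequence("removed", "unpleasant"))
```

Whether a stimulus counts as pleasant or unpleasant is judged from the learner's point of view, not the trainer's.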
Potential punishers are any consequences that might lead to a decrease in a given response. It is therefore
important to know the subject being operantly conditioned well enough to judge which consequences will be
pleasant and which unpleasant. For example, a quiet student who is ‘fussed over’ by the teacher every time she
offers a response in class may shun such attention and see it as threatening (punishment) rather than rewarding
(reinforcing), as the teacher had intended.
Side effects of punishment – frustration and aggression may develop in a child who is punished frequently.
Administering the punishment may be an outlet for frustrations of the punisher. Consequently, punishment may
increase simply because it makes the punisher feel better, not because the person being punished deserves it.
Effective punishment is brief, immediate, and linked to the undesired behaviour in the mind of the person
(or animal) being punished.
Operant conditioning in everyday life:


• Token economy – a situation in which individuals receive tokens for appropriate behaviour and these tokens
can be collected and exchanged for more tangible rewards. Tokens may also be withdrawn and individuals
can be fined in tokens for inappropriate behaviour.
• Animal training – often involves the same sort of shaping procedures used by Skinner to get animals to
perform tricks. Shaping is also used to train dogs for search and rescue operations, to detect drugs and
bombs, and to do guide work for the visually impaired.
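
The bookkeeping behind a token economy can be sketched in a few lines of Python. This is a hypothetical illustration, not a description of any real programme: the class name, the reward and its cost are invented, and the rule that a balance cannot fall below zero is an assumption.

```python
class TokenEconomy:
    """Track tokens earned, fined and exchanged for rewards."""

    def __init__(self, reward_costs):
        # reward_costs maps a reward name to its price in tokens.
        self.reward_costs = dict(reward_costs)
        self.balances = {}

    def award(self, person, tokens):
        """Give tokens for appropriate behaviour (positive reinforcement)."""
        self.balances[person] = self.balances.get(person, 0) + tokens
        return self.balances[person]

    def fine(self, person, tokens):
        """Withdraw tokens for inappropriate behaviour (response cost).

        Assumption: a balance cannot go below zero.
        """
        self.balances[person] = max(0, self.balances.get(person, 0) - tokens)
        return self.balances[person]

    def exchange(self, person, reward):
        """Swap tokens for a tangible reward; return True if affordable."""
        cost = self.reward_costs[reward]
        if self.balances.get(person, 0) < cost:
            return False
        self.balances[person] -= cost
        return True


if __name__ == "__main__":
    economy = TokenEconomy({"extra free time": 5})
    economy.award("Sam", 4)                          # appropriate behaviour
    print(economy.exchange("Sam", "extra free time"))  # not enough tokens yet
    economy.award("Sam", 3)
    economy.fine("Sam", 1)                           # inappropriate behaviour
    print(economy.exchange("Sam", "extra free time"))
    print(economy.balances["Sam"])
```

Awarding tokens acts as positive reinforcement, while fining tokens is an example of response cost, because a desired stimulus is removed.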