CHAPTER 16: TEACHING AND REVISION NOTES

Comparisons of classical and operant conditioning

Acquisition
  Classical conditioning – by pairing the UCS with the NS
  Operant conditioning – by reinforcing the behaviour
Extinction
  Classical conditioning – by repeatedly presenting the CS without the UCS
  Operant conditioning – by removing reinforcement
Stimulus generalisation
  Classical conditioning – a stimulus similar to the CS elicits the CR
  Operant conditioning – a stimulus similar to the discriminative stimulus leads to the behaviour
Stimulus discrimination
  Classical conditioning – the CR is no longer elicited by a stimulus similar to the CS
  Operant conditioning – an environment similar to the discriminative stimulus no longer leads to the behaviour
Spontaneous recovery
  Classical conditioning – after the CR has been extinguished, presenting the CS after a time delay again produces the CR
  Operant conditioning – the extinguished response occurs again when the discriminative stimulus is presented after a delay
Role of the learner
  Classical conditioning – the learner is passive when the CS or UCS is presented and has no control over learning
  Operant conditioning – the learner is active and controls the learning
Type of response
  Classical conditioning – involuntary (reflexive)
  Operant conditioning – voluntary
Timing
  Classical conditioning – the reinforcer is the UCS, which precedes the response
  Operant conditioning – the reinforcer follows the desired response

Trial and error learning as informed by Edward Lee Thorndike’s puzzle-box experiment

E.L. Thorndike worked in the USA at the same time as Pavlov was making his discoveries in Russia. Thorndike’s experiment with a cat in a ‘puzzle box’ was as follows:
• A hungry cat was placed in a wooden cage; the cat could see out (and smell food) but could not escape.
• There was a lever in the cage which, when pressed, would open the door, allowing the cat to escape.
• The cat explored the cage, and eventually its random movements caused it to press the lever and the door opened.
• The cat escaped and was rewarded (ate the food).
• After about 22 trials (learning was not immediate), the cat escaped much more quickly when placed in the cage, and the speed of escape remained fast on subsequent occasions.

Thorndike realised that the cat had been instrumental in obtaining its freedom.
He called this process of learning instrumental conditioning. Thorndike formulated his ‘Law of Effect’, which states that this type of learning occurs when ‘behaviour becomes controlled by its consequences’.

Three-phase model of operant conditioning as informed by Skinner: positive and negative reinforcement, response cost, punishment and schedules of reinforcement

OPERANT CONDITIONING
Learning in which behaviour becomes controlled by its consequences.

Operant conditioning procedures
B.F. Skinner began experimenting with rats in the 1930s. He coined the term operant conditioning to emphasise the idea that animals and people learn to operate on the environment to produce desired consequences. An operant is a response (or set of responses) that occurs without being elicited by any specific stimulus and acts upon the environment in the same way each time.

Oxford Psychology Units 3 & 4 ISBN 978 0 19 556717 5 © Oxford University Press Australia

THE THREE-PHASE MODEL OF OPERANT CONDITIONING (D-B-C)
Discriminative stimulus (antecedent condition) – the condition that signals that it is appropriate to perform the behaviour.
Behaviour – performing the action.
Consequence – the outcome that results from the behaviour.

Elements of operant conditioning
Reinforcement – any stimulus (action or event) that strengthens or increases the likelihood of the response (behaviour) that it follows.
o Positive reinforcer – a reward that strengthens a response by providing a pleasant or satisfying consequence.
o Negative reinforcer – strengthens a response through the removal, reduction or prevention of an unpleasant stimulus.
Punishment – any stimulus (action or event) that weakens or decreases the likelihood of the response (behaviour) that it follows.
o Punishment – weakens a response by presenting an unpleasant stimulus.
o Response cost – weakens a response by removing a desired stimulus.
Schedules of reinforcement (also punishment) – the frequency and manner in which a response is reinforced.
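The four consequence types above can be sorted with two questions: is a stimulus presented or removed after the behaviour, and is that stimulus pleasant or unpleasant for this particular learner? The Python sketch below writes that as a small decision rule. It is illustrative only: the names `Consequence`, `classify` and `strengthens` are hypothetical, and because whether a stimulus is pleasant depends on the individual learner, the `stimulus_pleasant` flag is an assumption the user has to supply.

```python
from enum import Enum


class Consequence(Enum):
    POSITIVE_REINFORCEMENT = "positive reinforcement"  # strengthens the response
    NEGATIVE_REINFORCEMENT = "negative reinforcement"  # strengthens the response
    PUNISHMENT = "punishment"                          # weakens the response
    RESPONSE_COST = "response cost"                    # weakens the response


def classify(stimulus_added: bool, stimulus_pleasant: bool) -> Consequence:
    """Classify a consequence that follows a behaviour.

    stimulus_added: True if a stimulus is presented after the behaviour,
        False if an existing stimulus is removed, reduced or prevented.
    stimulus_pleasant: True if this learner finds the stimulus pleasant
        (an assumption that must be judged for each individual).
    """
    if stimulus_added and stimulus_pleasant:
        return Consequence.POSITIVE_REINFORCEMENT
    if not stimulus_added and not stimulus_pleasant:
        return Consequence.NEGATIVE_REINFORCEMENT
    if stimulus_added and not stimulus_pleasant:
        return Consequence.PUNISHMENT
    return Consequence.RESPONSE_COST  # a pleasant stimulus is taken away


def strengthens(consequence: Consequence) -> bool:
    """Reinforcement strengthens a response; punishment and response cost weaken it."""
    return consequence in (Consequence.POSITIVE_REINFORCEMENT,
                           Consequence.NEGATIVE_REINFORCEMENT)


# Illustrative situations (hypothetical examples):
print(classify(True, True).value)    # praise for finishing homework → positive reinforcement
print(classify(False, False).value)  # buckling up stops a seatbelt buzzer → negative reinforcement
print(classify(True, False).value)   # a telling off → punishment
print(classify(False, True).value)   # losing phone privileges → response cost
```

Note that the same event can fall into different cells for different learners, which is why the pleasantness judgement is left as an input rather than built into the rule.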
Continuous reinforcement – a correct response is reinforced every time it is given.
Partial reinforcement – only some correct responses are reinforced. Responses conditioned under partial reinforcement are usually stronger (take longer to extinguish) than those conditioned under continuous reinforcement.
o Fixed interval schedule – reinforcement is delivered for the first correct response made after a fixed period of time has passed (eg. every 10 seconds).
o Fixed ratio schedule – reinforcement is delivered after a fixed number of correct responses (eg. every 10th response).
o Variable interval schedule – reinforcement is delivered for the first correct response after a period of time that varies around a set average, so it does not occur at regular intervals (eg. on average every 10 seconds, but varying from 4 to 16 seconds).
o Variable ratio schedule – reinforcement is delivered after a number of correct responses that varies around a set average, so it does not occur regularly (eg. on average every 10th response, but varying from the 2nd to the 18th response).

Shaping
A procedure in which a reinforcer is given for any response that successively approximates, and ultimately leads to, the desired response or target behaviour. Also known as the method of successive approximations.

Extinction, spontaneous recovery, generalisation and discrimination
Extinction – the operantly conditioned response gradually disappears once reinforcement ceases.
Spontaneous recovery – the reappearance of an extinguished response after a rest period.
Generalisation – in operant conditioning, this usually refers to stimulus generalisation: the response is made to stimuli similar to the discriminative stimulus present when it was reinforced.
Discrimination – the organism learns which responses will be reinforced and which will not.

Punishment
Distinction from negative reinforcement
Negative reinforcement (like positive reinforcement) increases the probability of a response occurring. Punishment decreases the probability of the response occurring.
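As a rough illustration, the four partial schedules can be simulated by deciding, for each correct response, whether it earns a reinforcer. This Python sketch works under stated assumptions: the class and function names are hypothetical, the variable schedules draw each requirement uniformly from the ranges in the examples above (2 to 18 responses, 4 to 16 seconds), and interval schedules reinforce the first correct response after the interval has elapsed.

```python
import random


class FixedRatio:
    """Reinforce every n-th correct response (eg. n=10: every 10th response)."""

    def __init__(self, n: int):
        self.n = n
        self.count = 0

    def respond(self, t: float) -> bool:
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce after a number of responses drawn uniformly from [low, high]."""

    def __init__(self, low: int, high: int, rng: random.Random):
        self.low, self.high, self.rng = low, high, rng
        self.count = 0
        self.required = rng.randint(low, high)

    def respond(self, t: float) -> bool:
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self.rng.randint(self.low, self.high)
            return True
        return False


class FixedInterval:
    """Reinforce the first response made once `interval` seconds have passed
    since the previous reinforcement (or since the start)."""

    def __init__(self, interval: float):
        self.interval = interval
        self.available_at = interval

    def respond(self, t: float) -> bool:
        if t >= self.available_at:
            self.available_at = t + self.interval
            return True
        return False


class VariableInterval:
    """Like FixedInterval, but each wait is drawn uniformly from [low, high] seconds."""

    def __init__(self, low: float, high: float, rng: random.Random):
        self.low, self.high, self.rng = low, high, rng
        self.available_at = rng.uniform(low, high)

    def respond(self, t: float) -> bool:
        if t >= self.available_at:
            self.available_at = t + self.rng.uniform(self.low, self.high)
            return True
        return False


def count_reinforcers(schedule, response_times) -> int:
    """Number of reinforcers earned by correct responses at the given times (seconds)."""
    return sum(schedule.respond(t) for t in response_times)


rng = random.Random(1)
steady = [float(s) for s in range(1, 101)]  # one response per second for 100 s
fast = [s / 2 for s in range(1, 201)]       # two responses per second for 100 s

print(count_reinforcers(FixedRatio(10), steady))       # → 10
print(count_reinforcers(FixedRatio(10), fast))         # → 20
print(count_reinforcers(FixedInterval(10.0), steady))  # → 10
print(count_reinforcers(FixedInterval(10.0), fast))    # → 10
print(count_reinforcers(VariableRatio(2, 18, rng), steady))
print(count_reinforcers(VariableInterval(4.0, 16.0, rng), steady))
```

Doubling the response rate doubles the reinforcers earned on the fixed ratio schedule but leaves the fixed interval total unchanged, which is one way to see why ratio schedules tend to produce higher response rates than interval schedules.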
Although both negative reinforcement and punishment involve an unpleasant stimulus (eg. a telling off, a fine), punishment occurs when the unpleasant stimulus follows the response (eg. inappropriate behaviour). Negative reinforcement occurs when the response avoids or stops an existing unpleasant stimulus.

Potential punishers are any consequences that might lead to a decrease in a given response. It is therefore important to know the subject being operantly conditioned well enough to judge which consequences will be pleasant and which will be unpleasant. For example, a quiet student who is ‘fussed over’ by the teacher every time she offers a response in class may shun such attention and see it as threatening (punishment) rather than rewarding (reinforcing), as the teacher intended.

Side effects of punishment – frustration and aggression may develop in a child who is punished frequently. Administering the punishment may also become an outlet for the punisher’s frustrations. Consequently, punishment may increase simply because it makes the punisher feel better, not because the person being punished deserves it.

Effective punishment is quick, brief, immediate, and linked to the undesired behaviour in the mind of the person (animal) being punished.

Operant conditioning in everyday life:
Token economy – a situation in which individuals receive tokens for appropriate behaviour; these tokens can be collected and exchanged for more tangible rewards. Tokens may also be withdrawn, so individuals can be fined in tokens for inappropriate behaviour.
Animal training – often involves the same sort of shaping procedures that Skinner used to get animals to perform tricks. Shaping is also used to train dogs for search and rescue operations, to detect drugs and bombs, and to work as guides for people who are visually impaired.
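A token economy works much like a ledger: tokens are earned for appropriate behaviour (positive reinforcement), withdrawn as fines for inappropriate behaviour (response cost), and exchanged for tangible rewards. The Python sketch below models one person’s account. The class name, method names, behaviours and token values are all hypothetical, and it assumes a fine cannot take the balance below zero.

```python
from dataclasses import dataclass, field


@dataclass
class TokenAccount:
    balance: int = 0
    history: list = field(default_factory=list)  # (event, change in tokens)

    def earn(self, behaviour: str, tokens: int) -> None:
        """Give tokens for appropriate behaviour (positive reinforcement)."""
        self.balance += tokens
        self.history.append((behaviour, tokens))

    def fine(self, behaviour: str, tokens: int) -> None:
        """Withdraw tokens for inappropriate behaviour (response cost).
        Assumption: the balance never goes below zero."""
        lost = min(tokens, self.balance)
        self.balance -= lost
        self.history.append((behaviour, -lost))

    def exchange(self, reward: str, price: int) -> bool:
        """Swap saved tokens for a tangible reward, if there are enough."""
        if price > self.balance:
            return False
        self.balance -= price
        self.history.append((reward, -price))
        return True


account = TokenAccount()
account.earn("completed homework", 3)
account.earn("helped a classmate", 2)
account.fine("called out in class", 1)
print(account.balance)                            # → 4
print(account.exchange("free time", 5))           # → False (not enough tokens yet)
print(account.exchange("choice of activity", 4))  # → True
print(account.balance)                            # → 0
```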