Reward and Punishment

Thorndike's cats escaped from a puzzle box to get a treat. At first it was all trial and error; when the cat was successful, the behaviour was rewarded, and this good consequence strengthened the behaviour.

Law of effect – a behaviour followed by a good consequence is more likely to be repeated; a behaviour followed by a bad consequence is not.

Instrumental learning – the cat is active (instrumental) in achieving its own escape and reward. It is a learning process by which the likelihood of a particular behaviour occurring is determined by the consequences of that behaviour.

Theory of Operant Conditioning – behaviour operates on the environment, and our behaviour is instrumental in producing its consequences (rewards/punishments).

US psychologist Burrhus Frederic Skinner (1904–1990) referred to the responses observed in trial-and-error learning as operants. Skinner believed behaviour can be reduced to the relationships between the behaviour, its antecedents (the events that precede it) and its consequences.

Operant – a response (or set of responses) that occurs and acts ("operates") on the environment to produce some kind of effect. It is a response or behaviour that generates consequences.

Operant conditioning is based on Thorndike's law of effect: an organism will tend to repeat behaviours (operants) that have desirable consequences (e.g. receiving a treat) or that enable it to avoid undesirable consequences (e.g. being given a detention), and will tend not to repeat behaviours that have undesirable consequences (e.g. disapproval or a fine).

Three components:
1. Stimulus (S) that precedes an operant response
2. Operant response (R) to the stimulus
3. Consequence (C) to the operant response

S → R → C. Sometimes expressed as S → R → S, where the second S is a stimulus in the form of a consequence. The model means that the probability of an operant response (R) to a stimulus (S) is a function of (depends on) the consequence (C) that has followed R in the past (a small illustrative sketch at the end of this section shows the idea). e.g. For the cat in the puzzle box, S is the box, R is the sequence of movements needed to open the door, and C is escape and food. See further examples in Table 10.2 (page 479).

Skinner used the term "operant conditioning" rather than "instrumental learning" because he wanted to emphasise that animals and people learn to operate on the environment to produce desired or satisfying consequences. He proposed that in Thorndike's experiments the cat "operated" on the environment to allow it to escape and get the fish reward; the operant that became conditioned was the behaviour of pushing the lever to open the door.

Skinner also contrasted operants with respondents in classical conditioning. Respondents are behaviours produced by known or recognised stimuli, e.g. Pavlov's dogs responded by salivating to meat powder and later to the bell, whereas Thorndike's cats made many different responses that were not prompted by a particular stimulus. The dog receives a consequence (food) whether or not it has learned the conditioned response; this is why Skinner referred to classical conditioning as "respondent conditioning". In operant conditioning, the consequence only occurs if the organism performs the response.

Summing up: in operant conditioning, if responses are not made, the consequence doesn't happen; in classical conditioning, the consequence occurs regardless of responding.

Skinner believed that ALL behaviour could be explained by the relationships between the behaviour, its antecedents (events occurring before it) and its consequences.
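The S → R → C relationship can be illustrated with a toy simulation. This is a minimal sketch for illustration only (it is not from the text or from Skinner's work): the starting probability, the update sizes and the "always rewarded" consequence are all assumptions, chosen simply to show the probability of the response depending on past consequences.

```python
import random

# Minimal sketch of the S -> R -> C contingency (illustrative only).
# The probability of emitting response R to stimulus S depends on the
# consequences (C) that followed R in the past.

response_probability = 0.1   # assumed starting likelihood of the operant

def consequence(responded: bool) -> str:
    """In this toy example, the response is always followed by a food reward."""
    return "food" if responded else "nothing"

def update(prob: float, outcome: str) -> float:
    """Strengthen the response after a desirable consequence,
    weaken it after an undesirable one (assumed learning rates)."""
    if outcome == "food":
        return min(1.0, prob + 0.1)
    if outcome == "shock":
        return max(0.0, prob - 0.1)
    return prob

for trial in range(1, 11):
    responded = random.random() < response_probability   # is R emitted to S?
    outcome = consequence(responded)                      # C follows R
    response_probability = update(response_probability, outcome)
    print(f"Trial {trial}: responded={responded}, C={outcome}, "
          f"P(R|S) now {response_probability:.2f}")
```

Over repeated trials, responses followed by the reward become more probable, which is all the three-term model claims.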
Skinner argued that any behaviour that is followed by a consequence will change in strength (become more, or less, established) and frequency (occur more, or less, often) depending on the nature of that consequence (reward or punishment).

The Skinner box is a small chamber in which an experimental animal learns to make a particular response for which the consequences can be controlled by the researcher. It contains a lever that delivers food (or water) into a dish when pressed. Some boxes also have lights and buzzers; some have grid floors that can deliver a mild electric shock. The lever is usually wired to a cumulative recorder (chart paper with a pen that makes a mark each time the desired response is made). The recorder indicates the frequency of responding (how often the response is made) and the rate of responding (speed). Rats press the lever; pigeons peck a disc.

Skinner referred to different types of rewards as "reinforcers". He used the Skinner box to reward the animals according to different programs, or schedules, of reinforcement. The fact that the rats were hungry provided the motivation for their frantic activity, increasing the probability that the lever would eventually be pressed and the food reward dispensed.

Skinner believed there was no need to search for internal agents (factors within an organism) to explain changes in behaviour. He based this view on the notion that behaviour can be understood in terms of environmental or external influences, without any consideration of internal mental processes.

Reinforcement – any stimulus (event or action) that subsequently strengthens or increases the likelihood of the response (behaviour) that it follows. The reinforcer comes after the response, and reinforcement makes the response stronger. Reinforcement can involve receiving a pleasant stimulus (e.g. a treat for your dog) or avoiding or escaping an unpleasant stimulus (e.g. an umbrella on a rainy day). An essential feature of reinforcement is that it is only used after the desired or correct response is made.

A reinforcer is any stimulus (object or event) that strengthens or increases the frequency or likelihood of the response that it follows. The word reinforcer is often used interchangeably with the word reward, although they are not technically the same. One difference is that a reward suggests an outcome that is positive, such as satisfaction or pleasure, whereas a stimulus is a reinforcer only if it strengthens the preceding behaviour. A stimulus can be rewarding because it is pleasurable, yet not be a reinforcer unless it increases the frequency of a response or the likelihood of a response occurring. e.g. Eating chocolate is pleasurable, but it is not a reinforcer unless it promotes or strengthens a particular response.
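That last distinction is a functional one, so it can be stated as a simple check. The sketch below is illustrative only; the "chocolate after homework" scenario and the response rates are invented numbers used to show that a pleasant stimulus only counts as a reinforcer if responding actually increases.

```python
# Illustrative check of the functional definition of a reinforcer:
# a stimulus counts as a reinforcer only if the response it follows
# becomes more frequent. The response rates below are invented data.

def functioned_as_reinforcer(before_per_week: float, after_per_week: float) -> bool:
    """A stimulus reinforced the behaviour if responding increased once the
    stimulus was delivered after each response (pleasantness alone is not enough)."""
    return after_per_week > before_per_week

# Example: chocolate given after each completed homework task (assumed data).
baseline_rate = 2.0     # tasks completed per week before chocolate was introduced
contingent_rate = 2.0   # tasks completed per week while chocolate follows each task

if functioned_as_reinforcer(baseline_rate, contingent_rate):
    print("Chocolate acted as a reinforcer: responding increased.")
else:
    print("Chocolate was pleasant but did not reinforce: responding did not increase.")
```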
Positive reinforcer (PLUS something GOOD) – a stimulus which strengthens a response by providing a pleasant or satisfying consequence. In Skinner's experiments, food pellets; other examples include money, grades and applause.

Negative reinforcer (MINUS something BAD) – a stimulus that strengthens a response through the reduction, removal or prevention of an unpleasant stimulus. The behaviour that removes, reduces or prevents the unpleasant stimulus is strengthened by the consequence. In Skinner's experiments, turning off an electric shock; other examples include taking Panadol for a headache and driving slowly to avoid a fine.

Positive reinforcement adds something good; negative reinforcement takes away something bad. Both STRENGTHEN a response. The overall outcome is desirable to the organism; it has just been achieved in different ways.

Positive punishment (PLUS BAD) – the delivery of an unpleasant stimulus following an undesirable response.
Negative punishment (MINUS GOOD) – the removal of a valued stimulus following an undesired response.
Punisher – an unpleasant stimulus that, when paired with a response, weakens the response or decreases the rate of responding over time. Punishers reduce unwanted behaviour, but it is usually more effective to reinforce an alternative desirable behaviour than it is to punish the undesirable behaviour.
Negative punishment is often referred to as response cost, because a valued stimulus is removed. e.g. If you drink drive, your licence will be taken away.

The way reinforcement is delivered is referred to as the "schedule of reinforcement". It is a program for giving reinforcement, specifically the frequency and manner in which a desired response is reinforced, and it influences the speed of learning and the strength of the learned response. Continuous reinforcement is necessary for a response to become learned; partial reinforcement can be more effective at maintaining a response. The four types of partial schedule (an illustrative sketch of each rule appears at the end of this section):

Fixed ratio – reinforcement after a fixed number of correct responses, e.g. being paid $5 for every 100 newspapers delivered.
Variable ratio – reinforcement after a variable number of correct responses, e.g. poker machines.
Fixed interval – reinforcement for the first response after a fixed time period, e.g. teachers at Gleneagles get paid every fortnight.
Variable interval – reinforcement for the first response after a variable time period, e.g. fishing.

The variable ratio schedule is the most resistant to extinction and leads to the fastest rate of responding; gambling addiction is explicable through variable ratio reinforcement.

Factors influencing the effectiveness of reinforcement:
Order of presentation – reinforcement needs to occur after the desired response, not before, so that the organism associates the reinforcement with the behaviour.
Timing – reinforcers need to occur as close in time to the desired response as possible; reinforcement is most effective when it occurs immediately after the desired response.
Appropriateness of the reinforcer – for a stimulus to be a reinforcer it must provide a pleasing or satisfying consequence for its recipient. Reinforcers that work in one situation will not always work in another, so the characteristics of the individual involved and the particular situation need to be taken into account when deciding on the best kind of reinforcer to use.

An inappropriate punisher can have the opposite effect and produce the same consequence as a reinforcer (e.g. a verbal reprimand from a teacher to an attention-seeking, talkative Year 8 student can act as a reinforcer for the talkative behaviour). Punishment may temporarily decrease the occurrence of unwanted responses or behaviour, but it doesn't promote more desirable or appropriate behaviour in its place. Instead, Skinner advocated the greater use of positive reinforcement to strengthen desirable behaviours or to promote the learning of alternative behaviours to punishable ones.
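The four partial schedules differ only in the rule that decides when a response earns the reinforcer. The following sketch is illustrative only: the ratios, intervals and the "one response per second" assumption are invented, and the simple probabilistic rules stand in for the variable schedules; it is not a model taken from the text.

```python
import random

# Illustrative sketch of the four partial reinforcement schedules.
# Each function answers one question: given the history so far, is this
# response reinforced? All numbers (ratios, intervals) are assumptions.

def fixed_ratio(responses_since_reward, n=5):
    """Reinforce every n-th correct response."""
    return responses_since_reward >= n

def variable_ratio(responses_since_reward, mean_n=5):
    """Reinforce after an unpredictable number of responses (mean_n on average)."""
    return random.random() < 1 / mean_n

def fixed_interval(seconds_since_reward, interval=10):
    """Reinforce the first response after a fixed time period."""
    return seconds_since_reward >= interval

def variable_interval(seconds_since_reward, mean_interval=10):
    """Reinforce the first response after an unpredictable time period
    (the chance of reinforcement grows as time passes)."""
    return random.random() < seconds_since_reward / (mean_interval * 2)

def run(schedule, total_seconds=60):
    """Assume one lever press per second and count rewards under one schedule."""
    rewards = 0
    responses_since = 0
    seconds_since = 0
    for _ in range(total_seconds):
        responses_since += 1
        seconds_since += 1
        if schedule.__name__.endswith("ratio"):
            reinforced = schedule(responses_since)
        else:
            reinforced = schedule(seconds_since)
        if reinforced:
            rewards += 1
            responses_since = 0
            seconds_since = 0
    return rewards

for schedule in (fixed_ratio, variable_ratio, fixed_interval, variable_interval):
    print(f"{schedule.__name__}: {run(schedule)} rewards for 60 responses")
```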
Operant conditioning involves the same key processes as classical conditioning: acquisition, extinction, spontaneous recovery, stimulus generalisation and stimulus discrimination.

Acquisition – the overall learning process during which a specific response, or pattern of responses, is established. THE MEANS by which the response is acquired differs between operant and classical conditioning, and the TYPES OF BEHAVIOURS acquired through operant conditioning are usually more complex than the reflexive, involuntary responses of classical conditioning. In operant conditioning, acquisition is the establishment of a response through reinforcement; the speed with which the response is established depends on the schedule of reinforcement. Sometimes a behaviour to be acquired is too complex to be performed completely at the end of the acquisition process, so a simpler version of the behaviour, or a step towards the target behaviour, is attempted and reinforced continuously until it is established. This involves a procedure called shaping (described below).

Extinction – the gradual decrease in the strength or rate of responding after a period of non-reinforcement. Extinction occurs after the termination of reinforcement, and has occurred when a conditioned response is no longer present. Depending on whether partial or continuous reinforcement has been used, the response rate may actually increase in the initial phase of extinction after reinforcement is stopped; there is often reluctance to stop the response altogether because it has had satisfying consequences, and frustration and anger may accompany the increased response rate. Extinction is less likely to occur when partial reinforcement has been used, because the uncertainty leads to a greater tendency for the response to continue.

Spontaneous recovery – after a rest period, the response is again shown in the absence of reinforcement. The response is likely to be weaker and will probably not last very long. A spontaneously recovered response is often stronger when it occurs after a lengthy period following extinction of the response than when it occurs relatively soon after extinction.

Stimulus generalisation – occurs when the correct response is made to another stimulus which is similar to the stimulus for which reinforcement was obtained. The response usually occurs at a reduced level (frequency and strength), e.g. pigeons pecked lights of other colours.

Stimulus discrimination – the organism makes the response to the stimulus for which reinforcement is obtained but not to any other similar stimulus (e.g. sniffer dogs used by drug detection units).

Shaping – a strategy in which a reinforcer is given for any response that successively approximates and ultimately leads to the final desired response. Also known as the method of successive approximations, it is used to train behaviours that are unlikely to occur spontaneously, that is, when the desired response has a low probability of occurring naturally. Real-life examples include dolphins trained at SeaWorld for entertainment, search and rescue dogs learning tracking skills, guide dogs, children learning to write and to swim, and monkeys trained to assist quadriplegics (read Box 10.7 on page 499).
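Shaping is essentially a loop: reinforce the current approximation until it is established, then move the criterion closer to the target. The sketch below is purely illustrative; the target behaviour, the list of approximation steps and the number of repetitions needed are invented for the example.

```python
# Illustrative sketch of shaping (the method of successive approximations).
# Each step towards the target behaviour is reinforced until "established",
# then the criterion for reinforcement moves closer to the final response.

target = "presses lever"
approximations = [            # assumed steps towards the target behaviour
    "faces the lever",
    "approaches the lever",
    "touches the lever",
    "presses lever",
]

REPEATS_TO_ESTABLISH = 3      # assumed number of reinforced repetitions needed

for step in approximations:
    for repeat in range(1, REPEATS_TO_ESTABLISH + 1):
        # Only the current approximation (or anything closer) is reinforced.
        print(f"Animal performs '{step}' -> reinforce (repetition {repeat})")
    print(f"'{step}' established; raise the criterion.\n")

print(f"Final desired response '{target}' is now reinforced on its own.")
```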
Behaviour modification – the consistent use of operant conditioning to alter behaviour over time, for example through the use of tokens as rewards that can be 'cashed in' for bigger rewards later (used in schools and prisons). Token economies are a form of behaviour modification that uses reinforcement tokens to influence behaviour change.

A token economy is a setting in which an individual receives tokens (reinforcers) for desired behaviour; these tokens can then be collected and exchanged for other reinforcers in the form of actual or "real" rewards, e.g. in prisons, tokens can be cashed in for rewards such as cigarettes or privileges. Tokens may also be withdrawn as "penalties" for undesirable behaviour.

An advantage of tokens is that they can be used in large group situations where real rewards are difficult to administer immediately after a desired behaviour occurs. Once the desired behaviour is established, tokens can be phased out and replaced by more "natural" and easily administered reinforcers (e.g. praise, a smile). In schools, token economies have been used to increase reading by students and to improve the social skills of students with intellectual disabilities.

Sometimes token economies backfire or fail. Why? People may feel manipulated and refuse to cooperate, or the situation may be so complex and uncontrolled that a well-planned program goes wrong (e.g. not smiling when delivering the reinforcer). Operant conditioning procedures may also fail when the underlying cause of a behaviour is not altered, e.g. rewarding cheerfulness when the gloominess is caused by a boring job; the solution may lie in changing jobs.
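The bookkeeping behind a token economy is simple: earn tokens for desired behaviour, lose tokens as a response cost, and exchange them later for backup rewards. The sketch below is illustrative only; the behaviours, token values and backup rewards are invented examples, not a program described in the text.

```python
class TokenEconomy:
    """Illustrative token-economy bookkeeping (behaviours and costs are invented)."""

    def __init__(self, backup_rewards):
        self.backup_rewards = backup_rewards   # token cost of each "real" reward
        self.balance = 0

    def reinforce(self, tokens, behaviour):
        """Deliver tokens immediately after the desired behaviour."""
        self.balance += tokens
        print(f"+{tokens} for '{behaviour}' (balance: {self.balance})")

    def penalise(self, tokens, behaviour):
        """Response cost: withdraw tokens after undesirable behaviour."""
        self.balance = max(0, self.balance - tokens)
        print(f"-{tokens} for '{behaviour}' (balance: {self.balance})")

    def exchange(self, reward):
        """Cash tokens in for a backup reinforcer, if the balance allows."""
        cost = self.backup_rewards[reward]
        if self.balance >= cost:
            self.balance -= cost
            print(f"Exchanged {cost} tokens for '{reward}' (balance: {self.balance})")
        else:
            print(f"Not enough tokens for '{reward}' ({self.balance}/{cost})")


economy = TokenEconomy({"extra recess": 5, "library pass": 8})
economy.reinforce(3, "completed reading task")
economy.reinforce(3, "helped another student")
economy.penalise(1, "calling out")
economy.exchange("extra recess")
```

As the notes point out, delivery matters as much as the ledger: tokens still need to follow the behaviour immediately, and the program breaks down if recipients feel manipulated or the underlying cause of the behaviour is never addressed.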