OPERANT CONDITIONING

Many of the behaviours of animals and humans cannot be explained in terms of classical conditioning. Many complex behaviours appear to be voluntary, goal-directed and governed by anticipated consequences or rewards. Different principles are needed to explain how complex, goal-orientated behaviour is learned and changed.

TRIAL AND ERROR LEARNING

Trial and error learning describes an organism’s attempts to learn, or to solve a problem, by trying alternative possibilities until a correct solution or desired outcome is achieved. It involves a number of attempts (trials) and a number of incorrect choices (errors) before the correct behaviour is learned. Once learned, the behaviour will usually be performed quickly and with few errors.

Sometimes referred to as:
-instrumental learning, as in the individual is instrumental in learning the correct response
-operant conditioning, as in the individual operates on the environment to solve the problem.

Trial and error learning involves motivation, exploration, incorrect and correct responses, and reward. Receiving a reward of some kind leads to the repeated performance of the correct responses, strengthening the association between the behaviour and its outcome.

The number maze and the learning curve

Negotiate the maze by drawing a line between consecutive numbers, starting at number 1. You will be given a 1 minute interval for each maze. Repeat the procedure for each maze. Record the number you reached in each maze in the time allowed. Plot a graph of these numbers against the 10 trial numbers.

1. What is the shape of the graph?
2. How is the shape of this graph different from the shapes of graphs obtained by Thorndike?
3. Work out how long on average it took to get from number to number in the first trial as compared to the last trial (60 seconds / number of numbers reached).
4. What was the reinforcement that caused learning to occur in this case?

THORNDIKE’S EXPERIMENT WITH CATS

In the early years of the twentieth century, at about the same time that Pavlov was investigating the digestive system of dogs, Edward Thorndike was performing experiments that would form the basis of operant conditioning.

In Thorndike’s puzzle box experiment, he would place a cat inside a puzzle box and put a fish outside the box. The idea was to observe and time the cat’s attempts to escape the box and get to the fish. At first the cat showed a wide range of random behaviours in attempting to escape the box, until it accidentally stepped on a lever in the middle of the box which released the door. The cat’s behaviour gradually became less random. Each time it was put in the box the cat would escape a little more quickly, until eventually it escaped as soon as it was put back in the box.

Because the cat had started with random behaviour and had gradually learned the solution to the puzzle box, Thorndike believed that learning was a trial and error process. Thorndike found that the animal learned those behaviours that were followed by pleasant consequences, while other behaviours were not repeated. This became known as the law of effect. The law of effect suggests that behaviours that lead to positive consequences are repeated and behaviours that do not lead to positive consequences are not repeated.

The conditioning process became known as instrumental conditioning, because behaviour is instrumental in obtaining rewards. Although it was formulated to explain goal-directed behaviour, operant conditioning attempts to explain such behaviour in terms of what has happened in the past.

OPERANT CONDITIONING

The term ‘operant conditioning’ was not introduced until years after Thorndike’s experiments with cats. This term was coined by Burrhus Skinner (B. F. Skinner).
Skinner suggested that an operant is a response (or set of responses) that occurs and acts on the environment to produce some kind of effect. Essentially, an operant is a response or behaviour that generates consequences. Before conditioning, an organism might make many operant responses (for example, the cat clawing and biting in the puzzle box).

Operant conditioning is based on the principle that an organism will tend to repeat behaviours that have desirable consequences, or that will enable it to avoid undesirable consequences. Furthermore, organisms will tend not to repeat behaviours which have undesirable consequences.

SKINNER’S EXPERIMENTS WITH RATS

Skinner created an apparatus called a Skinner Box. A Skinner Box is a small chamber in which an experimental animal learns to make a particular response for which the consequences can be controlled by the researcher. It has a lever which delivers a reward (food) when pushed. Some boxes also have lights, buzzers and grid floors that can deliver mild electric shocks. The lever is also attached to a cumulative recorder which tracks the desired responses, their frequency and speed. Rats and pigeons were used for these experiments.

In a classic experiment, Skinner (1938) demonstrated operant conditioning. When a hungry rat was placed in the box, it would scurry around, randomly touching the floor and walls. Eventually it would accidentally press the lever on the wall, in which case a pellet of rat food would drop into the food dish and the rat would eat it. With additional repetitions of lever pressing followed by food, the rat’s random movements began to disappear and were replaced by more consistent lever pressing. Eventually the rat was pressing the lever as fast as it could eat the pellets.

The pellet was a reward for making the correct response. Skinner referred to different kinds of rewards as reinforcers. Skinner wanted to demonstrate how the effect of reinforcement depends on the schedule of reinforcement used, e.g. reinforcing every correct response compared with reinforcing every second correct response.

Thorndike’s cats could see their reinforcement from the box they were placed in, so although it took them many trials to make the correct response, their motivation was clear. Skinner’s lab animals came across their reinforcement by chance. Skinner had to use hungry rats so that they would act erratically and hit the lever by chance.

ELEMENTS OF OPERANT CONDITIONING

Central to operant conditioning is reinforcement, because learning through operant conditioning occurs as a result of the consequences of behaviour. A response that is rewarded is strengthened, whereas one that is punished is weakened.

REINFORCEMENT

How do you train a dog? How do you ensure that you don’t get wet when walking in the rain?

Reinforcement may involve receiving a pleasant stimulus (a pat or food) or escaping an unpleasant stimulus (rain). In either case the outcome is one that is desired by the organism performing the behaviour. Reinforcement is applying a positive stimulus or removing a negative stimulus in order to strengthen, or increase the likelihood of, the response that it follows.

The term ‘reinforcer’ is often used interchangeably with the term ‘reward’. The only difference is that reward suggests an outcome that is positive, such as satisfaction or pleasure. A stimulus is a reinforcer if it strengthens the preceding behaviour.

SCHEDULES OF REINFORCEMENT

Reinforcement may be provided on a continuous schedule (after every correct response) or on a partial reinforcement schedule (that is, only on some occasions). The two differ in the speed with which the response is conditioned and in the strength of the conditioned response. In the early stages of conditioning, learning is most rapid if the correct response is reinforced every time it occurs. This is known as continuous reinforcement.
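The contrast between reinforcing every correct response and reinforcing every second correct response can be sketched in a few lines of Python. This is only an illustration: the function name `reinforced_responses` and the parameter `every_nth` are made up for this sketch and do not come from Skinner's work or from these notes.

```python
def reinforced_responses(num_responses, every_nth=1):
    """Return which correct responses (counted from 1) earn a reinforcer.

    every_nth=1 models continuous reinforcement (every correct response).
    every_nth=2 models a simple partial schedule (every second response).
    Both the function and its parameter are hypothetical, for illustration.
    """
    if every_nth < 1:
        raise ValueError("every_nth must be at least 1")
    return [n for n in range(1, num_responses + 1) if n % every_nth == 0]


# Six correct lever presses under each schedule.
continuous = reinforced_responses(6, every_nth=1)
every_second = reinforced_responses(6, every_nth=2)
print("continuous:", continuous)      # [1, 2, 3, 4, 5, 6]
print("every second:", every_second)  # [2, 4, 6]
```

Under the continuous schedule all six presses are followed by food, while under the partial schedule only three are.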
Once a correct response consistently occurs, a different reinforcement schedule can be used to maintain, increase or strengthen the response. Responses maintained through a program of intermittent (partial) reinforcement are stronger and are less likely to weaken or cease than those maintained by continuous reinforcement. Partial reinforcement is the process of reinforcing some correct responses but not all of them.

The term schedule of reinforcement refers to the frequency and manner in which a desired response is reinforced. Reinforcement can be given after a certain number of correct responses have been made (ratio) or after a certain amount of time has passed (interval). Reinforcement may be given on a regular basis (fixed) or it may be unpredictable (variable). Behaviour that is conditioned on a schedule of partial reinforcement is generally the most difficult to change. Each schedule produces a different effect on the rate and pattern of a response.

POSITIVE REINFORCEMENT

A positive reinforcer is a stimulus that strengthens or increases the likelihood of a desired response by providing a satisfying consequence (reward). Positive reinforcement occurs when a positive reinforcer is given or applied after the desired response has been made. Examples are the food pellet in the Skinner box, and receiving a good mark if you have studied hard.

NEGATIVE REINFORCEMENT

A negative reinforcer is any unpleasant or aversive stimulus that, when removed or avoided, strengthens or increases the likelihood of a desired response. An example is the electric current in the Skinner Box. Negative reinforcement is the removal or avoidance of an unpleasant stimulus. It has the effect of increasing the likelihood of a response being repeated.

The important distinction between positive and negative reinforcement is that positive reinforcers are given and negative reinforcers are removed or avoided. Both procedures lead to desirable consequences.
Examples of negative reinforcement are:
-turning off a scary video
-driving slowly to avoid a speeding fine

If you take a Panadol when you have a headache and the headache goes away, the behaviour of taking the Panadol has been negatively reinforced, and it is likely you will repeat that behaviour next time you have a headache.

TO REMEMBER:
-positive (+) reinforcer = adding something pleasant
-negative (-) reinforcer = subtracting something unpleasant (which results in a pleasant or desirable outcome).

PUNISHMENT

Punishment is the delivery of an unpleasant stimulus following a response, or the removal of a pleasant stimulus following a response. It has the same unpleasant quality as a negative reinforcer, but unlike a negative reinforcer, the punishment is given or applied, whereas the negative reinforcer is removed or avoided. Punishment is designed to weaken a response, or decrease the probability of that response occurring again over time.

Factors that influence the effectiveness of reinforcement and punishment

Reinforcement is intended to increase the likelihood of a behaviour being repeated and punishment is intended to decrease the likelihood of a behaviour being repeated. Three factors influence how effective they are:
-Order of presentation
-Timing
-Appropriateness

ORDER OF PRESENTATION

To use reinforcement and punishment effectively, it is important that the reinforcer or punishment is presented after the response, never before. This allows the organism to learn the consequences of a particular response.

TIMING

Reinforcement and punishment are most effective when they are given immediately after the response has occurred. This helps the organism to make the association between the response and the reinforcer or punishment. If there is a delay, learning will take longer. Sometimes, in real life, it is not possible for consequences to be given immediately.

APPROPRIATENESS

For any stimulus to be a reinforcer, it must be pleasing or satisfying in some way. It is not known whether something is going to be a reinforcer until after it has been used.
It cannot be assumed that a reinforcer that works in one situation will work in other situations. Characteristics of the individual need to be taken into account. Likewise, a stimulus must be appropriate as a punishment: it must provide a consequence that is unpleasant, and therefore likely to decrease the unwanted behaviour.

KEY PROCESSES IN OPERANT CONDITIONING

The same key processes are involved in both classical and operant conditioning; however, the way in which these processes occur is slightly different in each.
-Acquisition
-Extinction
-Stimulus generalisation
-Stimulus discrimination
-Spontaneous recovery

ACQUISITION

Acquisition refers to the overall learning process, during which a specific response, or set of responses, is established. The types of behaviours acquired during operant conditioning are generally more complex than those acquired during classical conditioning. In operant conditioning, acquisition is the establishment of a response through reinforcement.

Some behaviours that are operantly conditioned are too complex to be performed completely at the beginning of the acquisition process. Instead, behaviours that are a simpler version of the desired behaviour, or a step towards it, are rewarded. This is known as shaping. Shaping is the procedure in which reinforcement is given for any response that successively approximates and ultimately leads to the final desired response, or target behaviour. (It is also known as the method of successive approximations.)

EXTINCTION

In operant conditioning, extinction may also occur, and the process is similar to its occurrence in classical conditioning. Extinction is the gradual decrease in the strength or rate of a conditioned response following consistent non-reinforcement of the response. Extinction is less likely to occur when partial reinforcement is used.
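Acquisition through reinforcement and extinction through consistent non-reinforcement can be pictured with a toy model. The sketch below is not taken from these notes or from Skinner's data: the idea of a single "response strength" between 0 and 1, the update rule and the rate of 0.3 are all illustrative assumptions. It also ignores the greater resistance to extinction that follows partial reinforcement.

```python
def update_strength(strength, reinforced, rate=0.3):
    """Move response strength toward 1 if reinforced, toward 0 otherwise.

    The update rule and the default rate are assumptions for illustration.
    """
    target = 1.0 if reinforced else 0.0
    return strength + rate * (target - strength)


def run_trials(strength, reinforced, trials, rate=0.3):
    """Apply the same outcome on every trial and return the full history."""
    history = [strength]
    for _ in range(trials):
        strength = update_strength(strength, reinforced, rate)
        history.append(strength)
    return history


# Acquisition: ten trials in which the response is always reinforced.
acquisition = run_trials(0.0, reinforced=True, trials=10)
# Extinction: ten further trials in which it is never reinforced.
extinction = run_trials(acquisition[-1], reinforced=False, trials=10)

print("after acquisition:", round(acquisition[-1], 3))
print("after extinction:", round(extinction[-1], 3))
```

In this sketch the strength rises steeply at first and then levels off during acquisition, and falls gradually during extinction, which matches the gradual decrease described above.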
SPONTANEOUS RECOVERY

After the apparent extinction of a response, spontaneous recovery can occur and the organism will once again show the response in the absence of any reinforcement. The recovered response is likely to be weaker than the original. A spontaneously recovered response is often stronger when it occurs after a lengthy period following extinction of the response than when it occurs relatively soon after extinction.

STIMULUS GENERALISATION

In operant conditioning, stimulus generalisation occurs when the correct response is made to another stimulus that is similar to the stimulus that was present when the conditioned response was reinforced. The response usually occurs at a reduced level.

STIMULUS DISCRIMINATION

Stimulus discrimination occurs when an organism makes the correct response to a stimulus and is reinforced, but does not respond to any other stimulus, even a similar one. Skinner trained pigeons to discriminate between red and green lights and to peck only when they saw a green light in order to receive reinforcement.

COMPARISON OF CLASSICAL AND OPERANT CONDITIONING

Common elements:
-Acquisition
-Extinction
-Spontaneous recovery
-Stimulus discrimination
-Stimulus generalisation
-Association between two events
-Often occur in the same situation

Major differences:
-Operant: emphasis on consequences
-Classical: behaviour does not have environmental consequences
-Classical: response is involuntary/automatic
-Operant: responses are mostly voluntary

THE ROLE OF THE LEARNER

In classical conditioning the learner is relatively passive; that is, the response made by the learner occurs automatically. In operant conditioning the learner must actively operate on the environment so as to obtain the reinforcement or the punishment.

TIMING OF THE STIMULUS AND RESPONSE

In classical conditioning the response depends on the presentation of the unconditioned stimulus (UCS) occurring first. In operant conditioning the presentation of the reinforcer depends on the response occurring first.
In classical conditioning the timing of the two stimuli produces an association between them that conditions the learner to anticipate the UCS and respond to it even if it is not presented. In operant conditioning, the association that is conditioned is between the response and the stimulus that follows it. The response is either strengthened by reinforcement or weakened through punishment.

In classical conditioning the timing of the two stimuli needs to be very close and the sequencing is vital. In operant conditioning, while learning generally occurs faster when the reinforcement or punishment occurs soon after the response, there can be a significant time difference between them.

THE NATURE OF THE RESPONSE

In classical conditioning the response by the learner is usually a reflexive, involuntary one. In operant conditioning, the response by the learner is usually a voluntary one.

In classical conditioning the response is likely to involve the action of the autonomic nervous system, and the association is not conscious or deliberate. In operant conditioning the response is likely to involve the central nervous system, and to be conscious, intentional and often goal-orientated.