NATIONAL QUALIFICATIONS CURRICULUM SUPPORT

Human Biology

Case Study into Rewarded Behaviour, Unrewarded Behaviour and Shaping Learning

Teacher’s Notes

[HIGHER]

The Scottish Qualifications Authority regularly reviews the arrangements for National Qualifications. Users of all NQ support materials, whether published by Learning and Teaching Scotland or others, are reminded that it is their responsibility to check that the support materials correspond to the requirements of the current arrangements.

Acknowledgement

Learning and Teaching Scotland gratefully acknowledges this contribution to the National Qualifications support programme for Human Biology.

The publisher gratefully acknowledges permission to use the following sources: image of Thorndike’s cat from www.animalbehavoiur.net; image from http://www.simplypsychology.pwp.blueyonder.co.uk/thorndike.jpg

Every effort has been made to trace all the copyright holders but if any have been inadvertently overlooked, the publishers will be pleased to make the necessary arrangements at the first opportunity.

© Learning and Teaching Scotland 2011

This resource may be reproduced in whole or in part for educational purposes by educational establishments in Scotland provided that no profit accrues at any stage.

NEUROBIOLOGY AND BEHAVIOUR (H, HUMAN BIOLOGY) © Learning and Teaching Scotland 2011

Contents

Introduction
Description of activities
Activity 1: A practical with a human subject
Activity 2: An introduction to rewarded behaviour
Activity 3A: Identifying examples of positive reinforcement of behaviour
Activity 3B (Extension): Beyond rewarded behaviour: other forms of operant conditioning
Activity 4: Interpreting and evaluating the first reported experimental evidence of operant conditioning
Activity 5A: An investigation into rewarded and unrewarded behaviour: analysing results and drawing conclusions
Activity 5B (Extension): Interpreting a scientific research paper on operant conditioning: what is learned?
Activity 6: Research into the shaping of behaviour and its applications
Reference

Introduction

Outline

This case study is divided into six sections. Each section comprises an activity based around an aspect of operant conditioning. Activities 2, 5A and 6 are sufficient to cover the content of the Higher Human Biology course; the others can be used, if desired, to cover the topic in more depth. Two activities (3 and 5) have optional extension activities that are suitable for use with high-ability students. All activities that are selected should be completed sequentially.

Background

Operant conditioning can be described by three variables: the behaviour, the consequence of the behaviour and the discriminative stimulus, which signals the contingency (relationship) between the behaviour and the consequence. The consequence is usually referred to as a reinforcer because it has the potential to change the frequency of the behaviour. In the case of rewarded behaviour, the reinforcer has an attractive value to the animal and has a positive contingency with the behaviour (it is presented whenever the behaviour is carried out). As a result of this the frequency of the rewarded behaviour increases. For example, if a rat presses a lever and receives a food reward, then the lever pressing is the behaviour, the food is the consequence/reinforcer and the sight of the lever is the discriminative stimulus. The presentation of the food reinforces the lever-pressing behaviour so that its frequency increases. Another way of describing this is to say that the rat’s lever-pressing behaviour has been conditioned by reinforcement with a food reward. A common mistake made by students is to describe an animal as having ‘been conditioned’. It is always a specific behaviour that is conditioned, not the whole animal.
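The rat example above can be sketched as a toy model for class discussion. The Python below is illustrative only: the class name Contingency, the function simulate_rewarded_behaviour and the values start_rate, max_rate and gain are assumptions introduced here, not part of the original materials. The update rule (the rate moves a fixed fraction of the way towards a ceiling after each rewarded trial) is a deliberate simplification, not an established model of learning.

```python
from dataclasses import dataclass


@dataclass
class Contingency:
    """The three variables used to describe operant conditioning."""
    discriminative_stimulus: str
    behaviour: str
    reinforcer: str


def simulate_rewarded_behaviour(trials, start_rate=2.0, max_rate=100.0, gain=0.5):
    """Return a toy response rate (responses per minute) for each trial.

    Assumption: after every rewarded trial the rate moves a fixed
    fraction (gain) of the remaining distance towards a ceiling (max_rate).
    """
    rates = []
    rate = start_rate
    for _ in range(trials):
        rates.append(rate)
        rate += gain * (max_rate - rate)
    return rates


# The rat example from the text, expressed with the three variables.
rat_example = Contingency(
    discriminative_stimulus="sight of the lever",
    behaviour="lever pressing",
    reinforcer="food",
)

rates = simulate_rewarded_behaviour(trials=6)
print(rat_example)
print([round(rate, 1) for rate in rates])
```

Note that the model describes the frequency of one behaviour (lever pressing), which matches the point that a specific behaviour is conditioned, not the whole animal.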
It is worth likening the process of operant conditioning to natural selection. The starting point is variation in the responses an individual makes. A specific response is then selected for because it benefits the individual in some way in the environment in which it is performed. This response then becomes more likely to be repeated and its frequency increases as long as it continues to benefit the individual.

Description of activities

Activity 1: A practical with a human subject

This is a practical activity in which students condition their peers’ behaviour. The students will carry out an experiment in groups of three. One member of the group will be the subject, one will be the experimenter and one will be the monitor. The experimenter will instrumentally condition an aspect of the subject’s behaviour (tapping their chin).

It is important that the students are naïve to the theory of operant conditioning and the expected outcomes of the experiment in order for it to work as intended, so no mention of operant conditioning should be made. The activity is presented to the students as a practical that will introduce some aspects of this topic.

Each student will have their own role-specific instruction card to read. However, it may be worth briefing the experimenters and the monitors verbally with questions to make sure that they understand what they have to do during the experiment. It is important to emphasise to the experimenter that if chin tapping does not occur at all during the first minutes of the experiment then they should pick another behaviour to reward.

The experiment should take each group a maximum of 30 minutes. After the students have completed the experiment they will discuss what occurred in their groups.
They will then need to be debriefed on the events that occurred. In the final part of the activity the students will plot a graph of the class results. A computer with a spreadsheet is needed.

Background

For background information for this practical visit http://cogprints.org/604/1/biblio25.html.

Equipment required

Each group requires:
1 stop clock
paper and pencils
space (each group should work in an environment free from visual and auditory distraction)
a large bag of chocolate buttons or equivalent.

Debriefing the students

After the students have done the experiment it will be necessary to review the process that was occurring and the key concepts of rewarded behaviour. Emphasis should be placed on the terms behaviour, reward and conditioning. The questions below can be used to guide the discussion:

Was the subject’s conditioned behaviour carried out voluntarily?
Did the subject know why the pencil was being tapped on the table/why they were receiving points?
Was the subject’s conditioned behaviour carried out consciously?
Why did the subject’s behaviour change as it did over the course of the experiment?
Why did the subject’s behaviour change as it did in the last 5 minutes of the experiment?
Did learning occur during the experiment?
What advantages does this kind of behaviour provide?
What aspect of human behaviour often allows repetitive training to be circumvented? (Language: the ability to communicate abstract concepts to other individuals.)
How does this work? (A person can learn that certain behaviour results in a reward simply by being told so. For example, telling the subject at the start of the session that they would receive a point every time they tapped their chin would result in them producing a high response rate immediately.)
Did all the experimenters succeed in conditioning chin tapping in their subjects?
What factors could have prevented conditioning from occurring?

Approximate duration of activity: 90 minutes.

Activity 2: An introduction to rewarded behaviour

In this activity students read information then watch a video clip (access to YouTube needed) and answer questions. The aim is for students to become familiar with the process of rewarded behaviour and the terminology that is used to describe it.

Answers

The stimulus in Activity 1 was simply the presence of the experimenter.

The behaviour being rewarded is the pressing of the lever (it is repeated throughout the clip). The reward is likely to be food.

The rat associates the presence of the lever with the reward. The rat may also associate the light with the rewarded behaviour.

By behaving in this way the rat increases the mass of food it obtains. Food provides the rat with a source of energy and materials for growth. Obtaining more food increases the rat’s chances of surviving, reproducing and passing on its genes.

The use of an operant conditioning chamber allows a researcher to control or systematically alter many environmental variables, eg temperature, light intensity, background noise, availability of food. Lever pressing can be quantified and measured (usually electronically).

Approximate duration of activity: 25 minutes.

Activity 3A: Identifying examples of positive reinforcement of behaviour

This activity gives students the opportunity to apply what they have learned about positive reinforcement in the previous activity. They will complete an online quiz (access to the internet needed) and self-assess their answers.

Approximate duration of activity: 30 minutes.
Activity 3B (Extension): Beyond rewarded behaviour: other forms of operant conditioning

In the extension to this activity students are introduced to the four different forms of operant conditioning using animal examples and will construct further examples of their own in the form of comic strips.

Answers

The responses should be described precisely:

Category One: Response = loud mewing. Reinforcer value = attractive (this means the outcome is rewarding/beneficial for the animal). The response results in the presentation of the reinforcer (carrying out the mewing results in food). This increases the rate of responding so that the animal receives more rewards.

Category Two: Response = touching electric fence. Reinforcer value = aversive (the outcome is detrimental to the animal). The response results in the presentation of the reinforcer (touching the fence results in getting an electric shock). This decreases the rate of responding so that the animal receives fewer unpleasant experiences.

Category Three: Response = scratching mother’s face. Reinforcer value = attractive. The response results in the omission of the reinforcer and this decreases the rate of responding.

Category Four: Response = pressing lever. Reinforcer value = aversive. The response results in the omission of the reinforcer and this increases the rate of responding.

Relationship between response and reinforcer

| Value of reinforcer | Positive (the reinforcer is presented when the response is made) | Negative (the reinforcer is omitted when the response is made) |
|---|---|---|
| Attractive (its presence benefits the animal) | Positive reinforcement (reward). The rate of responding increases. | Extinction. The rate of responding decreases. |
| Aversive (its presence harms the animal) | Punishment. The rate of responding decreases. | Negative reinforcement (avoidance). The rate of responding increases. |
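The four categories in the table can also be expressed as a simple lookup, which students could use to check their own examples. This is a minimal sketch: the names CATEGORIES and classify_conditioning and the labels 'presented' and 'omitted' are assumptions chosen for illustration; the category names and their effects on the rate of responding are taken from the table.

```python
# Lookup built from the table:
# (value of reinforcer, relationship to response) -> (category, effect on rate)
CATEGORIES = {
    ("attractive", "presented"): ("positive reinforcement (reward)", "increases"),
    ("attractive", "omitted"): ("extinction", "decreases"),
    ("aversive", "presented"): ("punishment", "decreases"),
    ("aversive", "omitted"): ("negative reinforcement (avoidance)", "increases"),
}


def classify_conditioning(reinforcer_value, relationship):
    """Return (category, effect on the rate of responding) for one example."""
    key = (reinforcer_value.lower(), relationship.lower())
    if key not in CATEGORIES:
        raise ValueError(f"unknown combination: {key}")
    return CATEGORIES[key]


# Category Two: touching an electric fence results in an electric shock.
category, effect = classify_conditioning("aversive", "presented")
print(f"{category}: the rate of responding {effect}.")
```

Each answer in this activity can be checked in the same way by stating the value of the reinforcer and whether the response presents or omits it.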
Human examples

Positive reinforcement:
1. A child receives money every time they tidy their room so they tidy their room more frequently.
2. A teacher gives the class cake whenever the class average for a test is 70% or greater. As a result the students’ test scores improve.

Punishment:
1. A person receives a fine for dropping litter, so they stop dropping litter.
2. A person doing DIY hits their fingers every time they use a particular hammer, so they stop using the hammer.

Extinction:
1. A person doesn’t receive a drinks can after they have put money into a vending machine, so they stop using the machine.
2. A toddler finds that people don’t pay attention to them when they throw a tantrum, so they stop throwing tantrums.

Negative reinforcement:
1. A person misses their hotel breakfast each time they sleep in. As a result they get up on time more frequently.
2. A pizza delivery person has money taken off their salary each time they take too long to make their deliveries, which causes the pizzas to arrive cold. As a result the pizza delivery person makes their deliveries more quickly.

Approximate duration of activity: 30 minutes.

Activity 4: Interpreting and evaluating the first reported experimental evidence of operant conditioning

For this activity students will read an extract from Edward Thorndike’s original paper, in which he describes his initial investigations into operant conditioning. Students will discuss the key points in the paper, evaluate the author’s claims and sketch predictive graphs from the results that are described. Students will finish by watching a video clip from a documentary that dramatises Thorndike’s experiment (access to YouTube required).
They will be able to assess how closely their interpretation of the extract matches the events portrayed in the clip.

Answers

A diagram that shows a cat/dog/chicken in a box or pen with food outside and a gate to negotiate to escape from the enclosure. The two pictures below are based on Thorndike’s original designs.

Independent variable: how many trials the animal had to escape from the box. Dependent variable: the time it took the animal to successfully escape from the box.

If Thorndike had fed the animal when it failed to escape it would not have learned to associate its escape behaviour with a reward and might not have been motivated to try to escape in future trials. The animal might have learned that to be rewarded with food, all it would have to do is wait in the box until Thorndike released it.

Stimulus: the environment inside the box. Rewarded behaviour: the actions that allowed the animal to open the gate and escape from the box. Reward: the food placed visibly outside the box.

The graph should show the trial number on the x-axis and time to escape on the y-axis. The time to escape should be inversely proportional to the trial number.

The sentence that indicates Thorndike’s consideration for reliability is: ‘Enough animals were taken with each box or pen to make it sure that the results were not due to individual peculiarities.’

Thorndike claims to have used healthy naïve animals (ones that had not had any previous exposure to his experiment’s apparatus) that were kept uniformly hungry. Additional information needed: age of animals, sex of animals, type of food and quantity of food used as reward, exact dimensions and design of box, time of day when experiments were done, room temperature, background noise and other environmental factors, etc.

Carry out the experiment using identical equipment and matched animals, but omit the reward.
Thorndike’s law of effect: Behaviour changes because of its consequences. (Technically, in Thorndike’s words: ‘Of the several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected to the situation.’)

Approximate duration of activity: 30 minutes.

Activity 5A: An investigation into rewarded and unrewarded behaviour: analysing results and drawing conclusions

In this activity students are presented with a fictional experiment and two sets of graphical data that illustrate reinforcement or non-reinforcement of a behaviour and extinction. They will interpret the patterns of data and draw conclusions about the nature of operant conditioning.

Answers

1. For Rat 1: As the number of trials increases from 1 to 6, the number of lever presses per minute increases from just about 0 to just below 100. From trial 6 to trial 10 the number of lever presses per minute remains at a fairly constant value around 100. For Rat 2: There is no increase in the rate of lever pressing over trials 1 to 10. The rate remains between 0 and 5 presses per minute.
2. When Rat 1 presses the lever it receives a sucrose pellet. The sucrose pellet functions as a reward and reinforces Rat 1’s lever-pressing behaviour so that it carries it out at a higher frequency. Rat 2 does not receive a reward when it presses the lever so its lever pressing is not reinforced and remains at a low level.
3. The rate of lever pressing may not have increased between trials 6 and 10 because Rat 1 was pressing the lever as fast as was physically/mechanically possible.
4.
Additional information: age of rats, health status of rats, amount of prior experience that rats had had of the operant conditioning chamber, sex of rats, mass of rats, normal feeding schedule of rats, species of rats, length of time between trial periods, time of day when experiment carried out, the nature of the environment external to the chamber, etc.
5. Only deliver food to the rat when it presses the lever and the light is on.
6. The rats may have been placed in separate choice chambers to eliminate the effects of competition between them.
7. Graph 2 shows that the number of responses per minute for both rats steadily decreases when the reinforcer is omitted.
8. Rat 1 initially received a sucrose pellet every time it pressed the lever in the chamber. Rat 3 initially received a sucrose pellet for every three lever presses it made.
9. The frequency of unrewarded behaviour decreases faster if the rat has initially been conditioned using a continuous reinforcement schedule than if it has initially been conditioned using a partial reinforcement schedule.
10. The response rate would continue to decrease at approximately the same rate (students should be encouraged to add to the graph to make their prediction).
11. The term for withholding a reward after prior conditioning is ‘extinction’ because it results in the extinction (decrease in frequency) of the behaviour that has been conditioned.
12. The experiments could be repeated with more rats (of the same age, sex, species, health status, etc) and replicated with identical equipment to improve the reliability of the results.

Approximate duration of activity: 40 minutes.

Activity 5B (Extension): Interpreting a scientific research paper on operant conditioning: what is learned?
This extension activity will introduce high-ability students to reading and interpreting scientific research papers. The accompanying sheets will provide them with guidance on the structure of a research paper and the approach they should take when reading it.

The students will read a section of an original research paper entitled: ‘Variations in the sensitivity of instrumental responding to reinforcer devaluation’ (Adams, 1982). This was an investigation into the nature of the association formed during operant conditioning of rat behaviour. The task is to produce a poster that summarises the key details and findings of one of the experiments that the paper describes.

This is a very challenging activity, and students may require support to understand the concepts and protocols discussed in the paper. They should be encouraged to annotate their paper as much as possible with their thoughts and queries, and to identify the meaning of all unfamiliar terms.

An explanation of the background to the experiment and its findings is provided in the document ‘A guide to the paper’. This document also includes discussion questions that could be posed to the students to help them focus on and understand the key points of the experiment. The answers to these are given below.

Students can work individually for this extension activity, but they may benefit from doing the activity in a group.

Answers

1. Reward: sucrose (a disaccharide sugar) pellets. Rewarded behaviour: pressing a lever.
2. Changing the value of the reinforcer: making it aversive to the rat instead of attractive (or vice versa). Decreasing the value of the reinforcer is known as reinforcer devaluation.
3. Successful integration: the rats have combined their knowledge about the value of the reinforcer with their knowledge that performing the conditioned response produces the reinforcer as a consequence.
If the reinforcer has been devalued then successful integration will be indicated by a decrease in the rats’ response rates.
4. Extinction test: withholding the reinforcer when the animal makes the conditioned response.
5. Appropriate change in responding: rat will respond less frequently.
6. Independent variable: the number of reinforced lever presses the rats performed in the baseline training phase of the experiment (100 or 500). Dependent variable: the frequency of lever pressing by the trained rats in the extinction test after the reinforcer had been devalued (technically the mean relative response rate).
7. The reinforcer was devalued by exposing rats in one 100 group and one 500 group to a pairing between the reinforcer and sickness (achieved by injecting them with lithium chloride). As a result of this the rats learnt to associate the sucrose pellets with sickness and subsequently avoided eating the ones they were given freely.
8. Naïve: the rats had not previously taken part in any conditioning experiments. This meant they were all starting with the same level of experience.
9. The mass of each rat was reduced prior to the investigation so that the rats would remain hungry throughout the experiments and not become satiated after a limited number of food rewards. This means that the sucrose pellets would retain their effectiveness as a positive reinforcer until devalued.
10. The diagram should be similar to those found on pages 12 and 16 in the student document, and in the video that is linked to on page 12. A full description of the boxes used is given on the fourth page of the research paper under ‘Apparatus’. The term magazine refers to the part of the box from which the rat obtains the sucrose pellets.
11.
12. Controls: The rats in the groups that did not receive reinforcer devaluation (100-U and 500-U) were effectively controls for the rats in the groups that did receive reinforcer devaluation (100-P and 500-P). Rats in the unpaired groups received lithium chloride injections that were not associated with the consumption of sucrose pellets (to control for the effects of injection). Rats in the unpaired groups received matched quantities of sucrose pellets and a saline injection (harmless) in the food-aversion training phase (to control for the effect of being exposed to extra pellets and having this paired with an injection). Rats in the paired groups received a saline injection in the food-aversion training phase (to control for the effect of the unpaired injection the unpaired groups received).
13. Food-aversion training: continued until all the rats in the paired groups no longer ate the sucrose pellets when they were given them freely (without the requirement to press a lever in order to receive them).
14. After baseline training the rats that had performed 500 reinforced lever presses pressed the lever at a higher rate than the rats that had performed 100 reinforced lever presses.
15. The response rate of each rat during testing was presented as a ratio of the rat’s response rate on the last day of baseline training.
16. The 100-P and 100-U groups have significantly different mean relative response rates. The 500-P and 500-U groups do not have significantly different mean relative response rates. The 100-P group showed sensitivity to reinforcer devaluation. Their mean relative response rate in the extinction test was significantly lower than that of the 100-U group. This suggests that the rats had learnt that their lever pressing was associated with the outcome of a food reward and pressed the lever to obtain the food.
The 500-P group did not show sensitivity to reinforcer devaluation. Their mean relative response rate in the extinction test was not significantly different from that of the 500-U group. This suggests that these rats had learnt that lever pressing was associated with exposure to the lever, and pressed the lever out of habit.
17.
18. Reacquisition test: This was carried out to assess the effectiveness of the food-aversion training in devaluing the reinforcer. It measured how the frequency of lever pressing changed when lever pressing resulted in the delivery of a sucrose pellet again after the extinction test. The reacquisition test was important to do because it allowed the authors to check whether devaluation of the reinforcer had been equally effective for both the 100-P and 500-P groups. If it hadn’t then this might have been the reason for any difference in response rates found in the extinction test.
19. The results of the reacquisition test suggest that the food-aversion training was equally effective in devaluing the reinforcer for rats in the 100-P and 500-P groups. Compared to the groups that did not experience reinforcer devaluation, the lever pressing rate of the 100-P and 500-P rats did not recover to the baseline level, and did not differ significantly between the two groups.
20. Conclusion: with extended training the rats’ lever pressing response became a habit.
21. Confounding variables:
1. The number of responses (lever presses) and the number of reinforcers (sucrose pellets) experienced by the rats (we don’t know the extent of the contribution each one made to the measured effect of extended training on reinforcer devaluation).
2. The duration of training the rats received (2 days for 100-P and 100-U, vs. 10 days for 500-P and 500-U).
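The relative response rate described in answer 15 is a simple ratio, and the arithmetic can be sketched as follows for students who find the measure unfamiliar. The function names and all of the numbers are hypothetical, invented purely to show the calculation; they are not data from Adams (1982).

```python
def relative_response_rate(test_rate, baseline_rate):
    """Rate in the test as a ratio of the rate on the last day of baseline training."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be greater than zero")
    return test_rate / baseline_rate


def mean_relative_response_rate(rats):
    """Mean of the per-rat ratios for a group given as (test_rate, baseline_rate) pairs."""
    ratios = [relative_response_rate(test, baseline) for test, baseline in rats]
    return sum(ratios) / len(ratios)


# Hypothetical lever presses per minute for three imaginary rats: (test, baseline).
example_group = [(5.0, 20.0), (10.0, 20.0), (3.0, 12.0)]
print(mean_relative_response_rate(example_group))
```

Expressing each rat’s rate relative to its own baseline allows groups that finished baseline training at different absolute rates (see answer 14) to be compared.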
Approximate duration of activity: 120 minutes, depending on the level of support students require to understand the research paper.

Activity 6: Research into the shaping of behaviour and its applications

In this activity students will carry out independent internet-based research into the shaping of behaviour and its applications. Their aim will be to answer a number of questions then present their findings to their peers. Their presentation should include a dramatised scene to illustrate how the process of shaping works.

Students should work in groups of two to four for this activity.

Approximate duration of activity: 90 minutes.

Reference

Adams, C D. Variations in the sensitivity of instrumental responding to reinforcer devaluation. Q J Exp Psychol. 1982; 34B: 77–98.