Human Biology Case Study into Rewarded Behaviour, Unrewarded Behaviour and Shaping Learning

NATIONAL QUALIFICATIONS CURRICULUM SUPPORT
Human Biology
Case Study into Rewarded
Behaviour, Unrewarded Behaviour
and Shaping Learning
Teacher’s Notes
[HIGHER]
The Scottish Qualifications Authority regularly reviews
the arrangements for National Qualifications. Users of
all NQ support materials, whether published by
Learning and Teaching Scotland or others, are
reminded that it is their responsibility to check that the
support materials correspond to the requirements of the
current arrangements.
Acknowledgement
Learning and Teaching Scotland gratefully acknowledges this contribution to the National
Qualifications support programme for Human Biology.
The publisher gratefully acknowledges permission to use the following sources: image of
Thorndike’s cat from www.animalbehavoiur.net; image from
http://www.simplypsychology.pwp.blueyonder.co.uk/thorndike.jpg
Every effort has been made to trace all the copyright holders but if any have been inadvertently
overlooked, the publishers will be pleased to make the necessary arrangements at the first
opportunity.
© Learning and Teaching Scotland 2011
This resource may be reproduced in whole or in part for educational purposes by educational
establishments in Scotland provided that no profit accrues at any stage.
NEUROBIOLOGY AND BEHAVIOUR (H, HUMAN BIOLOGY)
Contents
Introduction
Description of activities
Activity 1: A practical with a human subject
Activity 2: An introduction to rewarded behaviour
Activity 3A: Identifying examples of positive reinforcement of behaviour
Activity 3B (Extension): Beyond rewarded behaviour: other forms of operant conditioning
Activity 4: Interpreting and evaluating the first reported experimental evidence of operant conditioning
Activity 5A: An investigation into rewarded and unrewarded behaviour: analysing results and drawing conclusions
Activity 5B (Extension): Presenting a scientific research paper on operant conditioning: what is learned?
Activity 6: Research into the shaping of behaviour and its applications
Reference
Introduction
Outline
This case study is divided into six sections. Each section comprises an
activity based around an aspect of operant conditioning. Activities 2, 5A and
6 are sufficient to cover the content of the Higher Human Biology course; the
others can be used, if desired, to cover the topic in more depth. Two activities
(3 and 5) have optional extension activities that are suitable for use with
high-ability students. All activities that are selected should be completed
sequentially.
Background
Operant conditioning can be described by three variables: the behaviour, the
consequence of the behaviour and the discriminative stimulus, which
signals the contingency (relationship) between the behaviour and the
consequence. The consequence is usually referred to as a reinforcer because it
has the potential to change the frequency of the behaviour. In the case of
rewarded behaviour, the reinforcer has an attractive value to the animal and
has a positive contingency with the behaviour (it is presented whenever the
behaviour is carried out). As a result of this the frequency of the rewarded
behaviour increases.
For example, if a rat presses a lever and receives a food reward, then the
lever pressing is the behaviour, the food is the consequence/reinforcer and the
sight of the lever is the discriminative stimulus. The presentation of the food
reinforces the lever-pressing behaviour so that its frequency increases.
Another way of describing this is to say that the rat’s lever-pressing
behaviour has been conditioned by reinforcement with a food reward. A
common mistake made by students is to describe an animal as having ‘been
conditioned’. It is always a specific behaviour that is conditioned, not the
whole animal.
It is worth likening the process of operant conditioning to natural selection.
The starting point is variation in the responses an individual makes. A
specific response is then selected for because it benefits the individual in
some way in the environment in which it is performed. This response then
becomes more likely to be repeated and its frequency increases as long as it
continues to benefit the individual.
Description of activities
Activity 1: A practical with a human subject
This is a practical activity in which students condition their peers’ behaviour.
The students will carry out an experiment in groups of three. One member of
the group will be the subject, one will be the experimenter and one will be the
monitor. The experimenter will instrumentally condition an aspect of the
subject’s behaviour (tapping their chin). It is important that the students are
naïve to the theory of operant conditioning and the expected outcomes of the
experiment in order for it to work as intended, so no mention of operant
conditioning should be made. The activity is presented to the students as a
practical that will introduce some aspects of this topic.
Each student will have their own role-specific instruction card to read.
However, it may be worth briefing the experimenters and the monitors
verbally with questions to make sure that they understand what they have to
do during the experiment. It is important to emphasise to the experimenter
that if chin tapping does not occur at all during the first minutes of the
experiment then they should pick another behaviour to reward.
The experiment should take each group a maximum of 30 minutes. After the
students have completed the experiment they will discuss what occurred in
their groups. They will then need to be debriefed on the events that occurred.
In the final part of the activity the students will plot a graph of the class
results. A computer with a spreadsheet is needed.
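The class graph can also be prepared outside a spreadsheet. The Python sketch below averages the counts recorded by each group for each time interval, giving one value per interval to plot. The data layout (one list of counts per group, one count per interval), the function name and the numbers are assumptions made for illustration; they are not taken from the student instruction cards.

```python
from statistics import mean


def class_means(group_counts):
    """Return the class mean count of the target behaviour for each interval.

    group_counts is a list with one entry per group; each entry is a list
    holding that group's count for each successive time interval.
    """
    if not group_counts:
        raise ValueError("no group results supplied")
    if len({len(counts) for counts in group_counts}) != 1:
        raise ValueError("every group must record the same number of intervals")
    # zip(*...) regroups the data so that each tuple holds one interval's
    # counts from every group.
    return [mean(interval) for interval in zip(*group_counts)]


# Made-up placeholder results for two groups over three intervals.
example_results = [[0, 2, 4], [2, 4, 8]]
print(class_means(example_results))
```

Plotting the returned values against interval number gives the same class graph that the spreadsheet would produce.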
Background
For background information for this practical visit
http://cogprints.org/604/1/biblio25.html.
Equipment required
Each group requires:
• 1 stop clock
• paper and pencils
• space (each group should work in an environment free from visual and
auditory distraction)
• a large bag of chocolate buttons or equivalent.
Debriefing the students
After the students have done the experiment it will be necessary to review the
process that was occurring and the key concepts of rewarded behaviour.
Emphasis should be placed on the terms behaviour, reward and
conditioning. The questions below can be used to guide the discussion:
• Was the subject’s conditioned behaviour carried out voluntarily?
• Did the subject know why the pencil was being tapped on the table/why
they were receiving points?
• Was the subject’s conditioned behaviour carried out consciously?
• Why did the subject’s behaviour change as it did over the course of the
experiment?
• Why did the subject’s behaviour change as it did in the last 5 minutes of
the experiment?
• Did learning occur during the experiment?
• What advantages does this kind of behaviour provide?
• What aspect of human behaviour often allows repetitive training to be
circumvented? (Language: the ability to communicate abstract concepts
to other individuals.) How does this work? (A person can learn that
certain behaviour results in a reward simply by being told so. For
example, telling the subject at the start of the session that they would
receive a point every time they tapped their chin would result in them
producing a high response rate immediately.)
• Did all the experimenters succeed in conditioning chin tapping in their
subjects? What factors could have prevented conditioning from occurring?
Approximate duration of activity: 90 minutes.
Activity 2: An introduction to rewarded behaviour
In this activity students read information then watch a video clip (access to
YouTube needed) and answer questions. The aim is for students to become
familiar with the process of rewarded behaviour and the terminology that is
used to describe it.
Answers
• The stimulus in Activity 1 was simply the presence of the experimenter.
• The behaviour being rewarded is the pressing of the lever (it is repeated
throughout the clip).
• The reward is likely to be food.
• The rat associates the presence of the lever with the reward.
• The rat may also associate the light with the rewarded behaviour.
• By behaving in this way the rat increases the mass of food it obtains. Food
provides the rat with a source of energy and materials for growth.
Obtaining more food increases the rat’s chances of surviving, reproducing
and passing on its genes.
• The use of an operant conditioning chamber allows a researcher to control
or systematically alter many environmental variables, eg temperature, light
intensity, background noise, availability of food. Lever pressing can be
quantified and measured (usually electronically).
Approximate duration of activity: 25 minutes.
Activity 3A: Identifying examples of positive reinforcement of
behaviour
This activity gives students the opportunity to apply what they have learned
about positive reinforcement in the previous activity. They will complete an
online quiz (access to the internet needed) and self-assess their answers.
Approximate duration of activity: 30 minutes.
Activity 3B (Extension): Beyond rewarded behaviour: other
forms of operant conditioning
In the extension to this activity students are introduced to the four different
forms of operant conditioning using animal examples and will construct
further examples of their own in the form of comic strips.
Answers
The responses should be described precisely:
• Category One: Response = loud mewing. Reinforcer value = attractive
(this means the outcome is rewarding/beneficial for the animal). The
response results in the presentation of the reinforcer (carrying out the
mewing results in food). This increases the rate of responding so that the
animal receives more rewards.
• Category Two: Response = touching electric fence. Reinforcer value =
aversive (the outcome is detrimental to the animal). The response results
in the presentation of the reinforcer (touching the fence results in getting
an electric shock). This decreases the rate of responding so that the animal
receives fewer unpleasant experiences.
• Category Three: Response = scratching mother’s face. Reinforcer value =
attractive. The response results in the omission of the reinforcer and this
decreases the rate of responding.
• Category Four: Response = pressing lever. Reinforcer value = aversive.
The response results in the omission of the reinforcer and this increases
the rate of responding.
Relationship between response and reinforcer

Value of reinforcer | Positive (the reinforcer is presented when the response is made) | Negative (the reinforcer is omitted when the response is made)
Attractive (its presence benefits the animal) | Positive reinforcement (reward). The rate of responding increases. | Extinction. The rate of responding decreases.
Aversive (its presence harms the animal) | Punishment. The rate of responding decreases. | Negative reinforcement (avoidance). The rate of responding increases.
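The four categories can also be expressed as a small lookup, which students comfortable with programming may find a useful check on their own classifications. This is only an illustrative Python sketch: the function and label names are hypothetical, and the mapping simply restates the two variables used in these notes (the value of the reinforcer, and whether the response presents or omits it).

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    category: str      # name of the form of operant conditioning
    rate_change: str   # effect on the rate of responding


# Keys: (value of reinforcer, relationship between response and reinforcer).
# "positive" = reinforcer presented when the response is made,
# "negative" = reinforcer omitted when the response is made.
CATEGORIES = {
    ("attractive", "positive"): Outcome("positive reinforcement (reward)", "increases"),
    ("attractive", "negative"): Outcome("extinction", "decreases"),
    ("aversive", "positive"): Outcome("punishment", "decreases"),
    ("aversive", "negative"): Outcome("negative reinforcement (avoidance)", "increases"),
}


def classify(reinforcer_value: str, relationship: str) -> Outcome:
    """Look up the category for one combination of the two variables."""
    key = (reinforcer_value.lower(), relationship.lower())
    if key not in CATEGORIES:
        raise ValueError(f"unknown combination: {key}")
    return CATEGORIES[key]


# Category Two above: touching the fence presents an aversive electric shock.
print(classify("aversive", "positive"))
```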
Human examples
• Positive reinforcement:
1. A child receives money every time they tidy their room so they tidy
their room more frequently.
2. A teacher gives the class cake whenever the class average for a test
is 70% or greater. As a result the students’ test scores improve.
• Punishment:
1. A person receives a fine for dropping litter, so they stop dropping
litter.
2. A person doing DIY hits their fingers every time they use a
particular hammer, so they stop using the hammer.
• Extinction:
1. A person doesn’t receive a drinks can after they have put money into
a vending machine, so they stop using the machine.
2. A toddler finds that people don’t pay attention to them when they
throw a tantrum, so they stop throwing tantrums.
• Negative reinforcement:
1. A person misses their hotel breakfast each time they sleep in. As a
result they get up on time more frequently.
2. A pizza delivery person has money taken off their salary each time
they take too long to make their deliveries, which causes the pizzas
to arrive cold. As a result the pizza delivery person makes their
deliveries more quickly.
Approximate duration of activity: 30 minutes.
Activity 4: Interpreting and evaluating the first reported
experimental evidence of operant conditioning
For this activity students will read an extract from Edward Thorndike’s
original paper, in which he describes his initial investigations into operant
conditioning. Students will discuss the key points in the paper, evaluate the
author’s claims and sketch predictive graphs from the results that are
described. Students will finish by watching a video clip from a documentary
that dramatises Thorndike’s experiment (access to YouTube required). They
will be able to assess how closely their interpretation of the extract matches
the events portrayed in the clip.
Answers
• A diagram that shows a cat/dog/chicken in a box or pen with food outside
and a gate to negotiate to escape from the enclosure. The two pictures
below are based on Thorndike’s original designs.
• Independent variable: how many trials the animal had to escape from the
box.
Dependent variable: the time it took the animal to successfully escape
from the box.
• If Thorndike had fed the animal when it failed to escape it would not have
learned to associate its escape behaviour with a reward and might not
have been motivated to try to escape in future trials. The animal might
have learned that to be rewarded with food, all it would have to do is wait
in the box until Thorndike released it.
• Stimulus: the environment inside the box.
Rewarded behaviour: the actions that allowed the animal to open the
gate and escape from the box.
Reward: the food placed visibly outside the box.
• The graph should show the trial number on the x-axis and time to escape
on the y-axis. The time to escape should fall as the trial number
increases (an approximately inverse relationship).
• The sentence that indicates Thorndike’s consideration for reliability is:
‘Enough animals were taken with each box or pen to make it sure that the
results were not due to individual peculiarities.’
• Thorndike claims to have used healthy naïve animals (ones that had not
had any previous exposure to his experiment’s apparatus) that were kept
uniformly hungry.
• Additional information needed: age of animals, sex of animals, type of
food and quantity of food used as reward, exact dimensions and design of
box, time of day when experiments were done, room temperature,
background noise and other environmental factors, etc.
• Carry out the experiment using identical equipment and matched animals,
but omit the reward.
• Thorndike’s law of effect: Behaviour changes because of its
consequences. (Technically, in Thorndike’s words: ‘Of the several
responses made to the same situation, those which are accompanied or
closely followed by satisfaction to the animal will, other things being
equal, be more firmly connected to the situation.’)
Approximate duration of activity: 30 minutes.
Activity 5A: An investigation into rewarded and unrewarded
behaviour: analysing results and drawing conclusions
In this activity students are presented with a fictional experiment and two sets
of graphical data that illustrate reinforcement or non-reinforcement of a
behaviour and extinction. They will interpret the patterns of data and draw
conclusions about the nature of operant conditioning.
Answers
1. For Rat 1: As the number of trials increases from 1 to 6, the number of
lever presses per minute increases from just about 0 to just below 100.
From trial 6 to trial 10 the number of lever presses per minute remains
at a fairly constant value around 100. For Rat 2: There is no increase in
the rate of lever pressing over trials 1 to 10. The rate remains between 0
and 5 presses per minute.
2. When Rat 1 presses the lever it receives a sucrose pellet. The sucrose
pellet functions as a reward and reinforces Rat 1’s lever-pressing
behaviour so that it carries it out at a higher frequency. Rat 2 does not
receive a reward when it presses the lever so its lever pressing is not
reinforced and remains at a low level.
3. The rate of lever pressing may not have increased between trials 6 and
10 because Rat 1 was pressing the lever as fast as was
physically/mechanically possible.
4. Additional information: age of rats, health status of rats, amount of
prior experience that rats had had of the operant conditioning chamber,
sex of rats, mass of rats, normal feeding schedule of rats, species of
rats, length of time between trial periods, time of day when the experiment
was carried out, the nature of the environment external to the chamber, etc.
5. Only deliver food to the rat when it presses the lever and the light is on.
6. The rats may have been placed in separate choice chambers to eliminate
the effects of competition between them.
7. Graph 2 shows that the number of responses per minute for both rats
steadily decreases when the reinforcer is omitted.
8. Rat 1 initially received a sucrose pellet every time it pressed the lever
in the chamber. Rat 3 initially received a sucrose pellet for every three
lever presses it made.
9. The frequency of unrewarded behaviour decreases faster if the rat has
initially been conditioned using a continuous reinforcement schedule
than if it has initially been conditioned using a partial reinforcement
schedule.
10. The response rate would continue to decrease at approximately the same
rate (students should be encouraged to add to the graph to make their
prediction).
11. The term for withholding a reward after prior conditioning is
‘extinction’ because it results in the extinction (decrease in frequency)
of the behaviour that has been conditioned.
12. The experiments could be repeated with more rats (of the same age, sex,
species, health status, etc) and replicated with identical equipment to
improve the reliability of the results.
Approximate duration of activity: 40 minutes.
Activity 5B (Extension): Interpreting a scientific research
paper on operant conditioning: what is learned?
This extension activity will introduce high-ability students to reading and
interpreting scientific research papers. The accompanying sheets will provide
them with guidance on the structure of a research paper and the approach they
should take when reading it.
The students will read a section of an original research paper entitled:
‘Variations in the sensitivity of instrumental responding to reinforcer
devaluation’ (Adams, 1982). This was an investigation into the nature of the
association formed during operant conditioning of rat behaviour. The task is
to produce a poster that summarises the key details and findings of one of the
experiments that the paper describes.
This is a very challenging activity, and students may require support to
understand the concepts and protocols discussed in the paper. They should be
encouraged to annotate their paper as much as possible with their thoughts
and queries, and to identify the meaning of all unfamiliar terms. An
explanation of the background to the experiment and its findings is provided
in the document ‘A guide to the paper’. This document also includes
discussion questions that could be posed to the students to help them focus on
and understand the key points of the experiment. The answers to these are
given below.
Students can work individually for this extension activity, but they may
benefit from doing the activity in a group.
Answers
1. Reward: sucrose (a disaccharide sugar) pellets. Rewarded behaviour:
pressing a lever.
2. Changing the value of the reinforcer: making it aversive to the rat
instead of attractive (or vice-versa). Decreasing the value of the
reinforcer is known as reinforcer devaluation.
3. Successful integration: the rats have combined their knowledge about
the value of the reinforcer with their knowledge that performing the
conditioned response produces the reinforcer as a consequence. If the
reinforcer has been devalued then successful integration will be
indicated by a decrease in the rats’ response rates.
4. Extinction test: withholding the reinforcer when the animal makes the
conditioned response.
5. Appropriate change in responding: rat will respond less frequently.
6. Independent variable: the number of reinforced lever presses
the rats performed in the baseline training phase of the experiment (100
or 500). Dependent variable: the frequency of lever pressing by the
trained rats in the extinction test after the reinforcer had been devalued
(technically the mean relative response rate).
7. The reinforcer was devalued by exposing rats in one 100 group and one
500 group to a pairing between the reinforcer and sickness (achieved by
injecting them with lithium chloride). As a result of this the rats learnt
to associate the sucrose pellets with sickness and subsequently avoided
eating the ones they were given freely.
8. Naïve: the rats had not previously taken part in any conditioning
experiments. This meant they were all starting with the same level of
experience.
9. The mass of each rat was reduced prior to the investigation so that the
rats would remain hungry throughout the experiments and not become
satiated after a limited number of food rewards. This means that the
sucrose pellets would retain their effectiveness as a positive reinforcer
until devalued.
10. The diagram should be similar to those found on pages 12 and 16 in the
student document, and in the video that is linked to on page 12. A full
description of the boxes used is given on the fourth page of the research
paper under ‘Apparatus’. The term magazine refers to the part of the
box from which the rat obtains the sucrose pellets.
11.
12. Controls: The rats in the groups that did not receive reinforcer
devaluation (100-U and 500-U) were effectively controls for the rats in
the groups that did receive reinforcer devaluation (100-P and 500-P).
Rats in the unpaired groups received lithium chloride injections that
were not associated with the consumption of sucrose pellets (to control
for the effects of injection). Rats in the unpaired groups received
matched quantities of sucrose pellets and a saline injection (harmless)
in the food-aversion training phase (to control for the effect of being
exposed to extra pellets and having this paired with an injection). Rats
in the paired groups received a saline injection in the food-aversion
training phase (to control for the effect of the unpaired injection the
unpaired groups received).
13. Food-aversion training: continued until all the rats in the paired
groups no longer ate the sucrose pellets when they were given them
freely (without the requirement to press a lever in order to receive
them).
14. After baseline training the rats that had performed 500 reinforced lever
presses pressed the lever at a higher rate than the rats that had performed
100 reinforced lever presses.
15. The response rate of each rat during testing was presented as a ratio of
the rat’s response rate on the last day of baseline training.
16. The 100-P and 100-U groups have significantly different mean relative
response rates. The 500-P and 500-U groups do not have significantly
different mean relative response rates.
The 100-P group showed sensitivity to reinforcer devaluation. Their
mean relative response rate in the extinction test was significantly
lower than that of the 100-U group. This suggests that the rats had learnt that
their lever pressing was associated with the outcome of a food reward
and pressed the lever to obtain the food. The 500-P group did not show
sensitivity to reinforcer devaluation. Their mean relative response rate
in the extinction test was not significantly different from that of the
500-U group. This suggests that these rats had learnt that lever pressing
was associated with exposure to the lever, and pressed the lever out of
habit.
17.
18. Reacquisition test: This was carried out to assess the effectiveness of
the food-aversion training in devaluing the reinforcer. It measured how
the frequency of lever pressing changed when lever pressing resulted in
the delivery of a sucrose pellet again after the extinction test. The
reacquisition test was important to do because it allowed the authors to
check whether devaluation of the reinforcer had been equally effective
for both the 100-P and 500-P groups. If it hadn’t then this might have
been the reason for any difference in response rates found in the
extinction test.
19. The results of the reacquisition test suggest that the food-aversion
training was equally effective in devaluing the reinforcer for rats in the
100-P and 500-P groups. Compared to the groups that did not
experience reinforcer devaluation, the lever pressing rate of the 100-P
and 500-P rats did not recover to the baseline level, and did not differ
significantly between the two groups.
20. Conclusion: with extended training the rats’ lever pressing response
became a habit.
21. Confounding variables: 1. The number of responses (lever presses)
and the number of reinforcers (sucrose pellets) experienced by the rats
(we don’t know the extent of the contribution each one made to the
measured effect of extended training on reinforcer devaluation). 2. The
duration of training the rats received (2 days for 100-P and 100-U, vs.
10 days for 500-P and 500-U).
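The ratio described in answer 15 can be illustrated with a short worked example, which may help students see why relative rather than raw response rates are compared. The Python sketch below is a minimal illustration: the function name and all the numbers are hypothetical placeholders, not values from Adams (1982), and rates are assumed to be in lever presses per minute.

```python
from statistics import mean


def relative_response_rate(test_rate: float, baseline_rate: float) -> float:
    """Express a rat's test response rate as a ratio of its baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return test_rate / baseline_rate


# Hypothetical (test rate, baseline rate) pairs for two rats in one group.
example_group = [(5.0, 20.0), (8.0, 16.0)]

ratios = [relative_response_rate(test, base) for test, base in example_group]
group_mean = mean(ratios)  # mean relative response rate for the group

print(ratios)      # [0.25, 0.5]
print(group_mean)  # 0.375
```

A ratio below 1 means the rat pressed the lever less often in the test than at the end of baseline training. Expressing each rat's rate this way allows groups that finished baseline training at different rates, such as the 100 and 500 groups in answer 14, to be compared on the same scale.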
Approximate duration of activity: 120 minutes, depending on the level of
support students require to understand the research paper.
Activity 6: Research into the shaping of behaviour and its
applications
In this activity students will carry out independent internet-based research
into the shaping of behaviour and its applications. Their aim will be to
answer a number of questions then present their findings to their peers. Their
presentation should include a dramatised scene to illustrate how the process
of shaping works.
Students should work in groups of two to four for this activity.
Approximate duration of activity: 90 minutes.
Reference
Adams, C D. Variations in the sensitivity of instrumental responding to
reinforcer devaluation. Q J Exp Psychol. 1982; 34B: 77–98.