Learning Theories 2 and 3

Learning Theory 2 –
Trial and Error Learning
Involves learning by trying alternative possibilities until the desired outcome is achieved.
An organism continues to explore its environment until it discovers a response that allows it to
reach its desired goal.
Edward Thorndike
Thorndike created a puzzle box. He placed a hungry cat in the box. The cat could see out and even
stick its paws out between the wooden slats. The only way for the cat to escape was through a door,
which could be opened by pressing a lever inside the box. Thorndike placed a piece of fish outside
the box, just out of reach. If the cat wanted to eat the fish, it had to work out how to get out of the
box through the door. Thorndike observed the cat’s behaviour and recorded the length of time it
took for the cat to press the lever and exit the box.
When the cat successfully hit the lever (accidentally at first), it could escape and eat the fish
(pleasing consequence).
The cat was placed in the box again and the process was repeated over a number of trials until the
cat would press the lever as soon as it was placed in the box.
Through trial and error, the cat was able to learn how to free itself from the box and obtain the fish.
Thorndike used these observations to formulate the ‘law of effect’.
Law of effect: behaviour that is accompanied or closely followed by a ‘satisfying’ consequence is
more likely to recur (strengthened) and a behaviour that is accompanied or closely followed by
‘annoying’ consequences or discomfort is less likely to recur (weakened).
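As a rough illustration of the law of effect, the sketch below simulates a puzzle box in Python. It is not Thorndike's own model: the function names (`update_strength`, `simulate_puzzle_box`), the starting strength of 0.05 and the step size of 0.2 are all assumptions chosen only to show a response being strengthened by a satisfying consequence.

```python
import random


def update_strength(strength, satisfying, step=0.2):
    """Strengthen a response after a satisfying consequence, weaken it after an annoying one.

    `strength` is a hypothetical 0..1 likelihood that the response recurs.
    """
    if satisfying:
        return min(1.0, strength + step * (1.0 - strength))
    return max(0.0, strength - step * strength)


def simulate_puzzle_box(trials=15, seed=0):
    """Return how many attempts the simulated cat needs to press the lever on each trial."""
    rng = random.Random(seed)
    lever_strength = 0.05  # assumed: the first lever presses are accidental
    attempts_per_trial = []
    for _ in range(trials):
        attempts = 1
        # Each attempt is a lever press with probability `lever_strength`;
        # otherwise the cat tries some other, unsuccessful response.
        while rng.random() > lever_strength:
            attempts += 1
        # The lever press is followed by escape and fish (satisfying consequence).
        lever_strength = update_strength(lever_strength, satisfying=True)
        attempts_per_trial.append(attempts)
    return attempts_per_trial


if __name__ == "__main__":
    print(simulate_puzzle_box())
```

Because the lever press is strengthened after every escape, the number of attempts per trial tends to fall across trials, although any single run is random.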
YouTube link: Thorndike’s puzzle box
http://www.youtube.com/watch?v=BDujDOLre-8
Example of Trial and Error Learning: Matchstick puzzles
http://www.redheads.com.au/games.php
Learning Theory 3 –
Operant Conditioning
A learning process in which the likelihood of a behaviour being repeated is determined by the
consequences of that behaviour.
Operant: a response (or set of responses) that occurs and acts (operates) on the environment to
produce some kind of effect.
Operant Conditioning is based on Thorndike’s Law of Effect. An organism will tend to repeat a
behaviour (operant) that has desirable consequences, or that will enable it to avoid undesirable
consequences.
Organisms will tend not to repeat a behaviour that has an undesirable consequence.
Operant Conditioning is also known as ‘Instrumental Learning’.
Instrumental learning: refers to the process through which an organism learns the association
between a behaviour and its consequences.
B. F. Skinner
An American behavioural psychologist. Skinner’s work on operant conditioning underpins much of
what we know about it today. Using reinforcement, he taught pigeons to dance, play ping-pong and
bowl a ball in a mini bowling alley.
YouTube link: Pigeon ping-pong
http://www.youtube.com/watch?v=vGazyH6fQQ4
Skinner created an apparatus for studying operant conditioning. This conditioning chamber is known
as the ‘Skinner box’. Skinner conducted many experiments with the Skinner box, most notably with
rats.
Hungry rats were placed in the box and learnt to press a lever to receive food.
YouTube link: Skinner box
http://www.youtube.com/watch?v=PQtDTdDr8vs
Three-phase model of Operant Conditioning
1. The stimulus (S) that precedes the operant response
2. The operant response (R) to the stimulus
3. The consequence (C) of the operant response
Stimulus (S) --> Response (R) --> Consequence (C)
Elements of Operant Conditioning
Reinforcement: occurs when a stimulus strengthens (increases the likelihood of) the response that it follows.
Reinforcer: any stimulus that strengthens the response it follows; commonly referred to as a reward.
Schedule of reinforcement: a program for giving reinforcement, specifically the frequency and
manner in which a desired response is reinforced.
Schedules of reinforcement
• Continuous
• Partial
  • Fixed-Ratio
  • Fixed-Interval
  • Variable-Ratio
  • Variable-Interval
Continuous reinforcement: the reinforcer is provided immediately after every correct response.
Partial reinforcement: the process of reinforcing some correct responses, but not all of them.
There are four basic schedules of partial reinforcement.
Ratio = number
Interval = time
Fixed = set
Variable = unpredictable
Fixed-ratio schedule: the reinforcer is given after a set, unvarying number of desired responses have
been made.
Variable-ratio schedule: the reinforcer is given after an unpredictable number of correct responses.
Fixed-interval schedule: the reinforcer is given after a specific period of time has elapsed since the
previous reinforcer, provided the correct response has been made.
Variable-interval schedule: the reinforcer is given after irregular periods of time have passed,
provided the correct response has been made.
• A variable-ratio schedule of reinforcement is the least susceptible to extinction and is employed
by gambling businesses (poker machines).
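The schedules defined above can be sketched as small Python classes that decide whether a given correct response is reinforced. This is only an illustration under stated assumptions: the class names (`RatioSchedule`, `IntervalSchedule`), the `mean` and `seed` parameters, and the way variable targets are drawn at random are hypothetical choices, not part of the definitions.

```python
import random


class RatioSchedule:
    """Reinforce after a number of correct responses (ratio = number)."""

    def __init__(self, mean, variable=False, seed=0):
        self.mean = mean
        self.variable = variable
        self.rng = random.Random(seed)
        self.count = 0
        self.target = self._next_target()

    def _next_target(self):
        if self.variable:
            # Unpredictable number of responses that averages `mean` (assumed distribution).
            return self.rng.randint(1, 2 * self.mean - 1)
        return self.mean  # set, unvarying number of responses

    def respond(self):
        """Record one correct response; return True if it is reinforced."""
        self.count += 1
        if self.count >= self.target:
            self.count = 0
            self.target = self._next_target()
            return True
        return False


class IntervalSchedule:
    """Reinforce the first correct response after a period of time (interval = time)."""

    def __init__(self, mean, variable=False, seed=0):
        self.mean = mean
        self.variable = variable
        self.rng = random.Random(seed)
        self.available_at = self._next_wait()

    def _next_wait(self):
        if self.variable:
            # Irregular waiting time that averages `mean` (assumed distribution).
            return self.rng.uniform(0, 2 * self.mean)
        return self.mean  # specific, fixed period of time

    def respond(self, now):
        """Record a correct response at time `now`; return True if it is reinforced."""
        if now >= self.available_at:
            # The next period is timed from this reinforcer.
            self.available_at = now + self._next_wait()
            return True
        return False
```

For example, `RatioSchedule(10)` reinforces every 10th lever press, `RatioSchedule(10, variable=True)` reinforces after 10 presses on average, and `RatioSchedule(1)` behaves like continuous reinforcement because every correct response is reinforced.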
YouTube link: Schedules of reinforcement
http://www.youtube.com/watch?v=I_ctJqjlrHA
Types of reinforcement
Positive reinforcement: a stimulus that strengthens or increases the frequency or likelihood of a
desired response by providing a satisfying consequence (reward).
Negative reinforcement: the removal or avoidance of an unpleasant stimulus following a response. It
increases the likelihood of the response being repeated, thereby strengthening the response.
Punishment: the delivery of an unpleasant consequence following a response, or the removal of a
pleasant consequence following a response, which decreases the likelihood of the response occurring again.
There are two types of punishment: positive punishment and negative punishment (response cost).
Positive punishment: involves the presentation or introduction of a stimulus that decreases the
likelihood of a response occurring again (giving something).
Negative punishment: involves the removal of a stimulus, thereby decreasing the likelihood of a
response occurring again (taking something away).
Response cost: the removal of any valued stimulus following a response, whether or not that stimulus
causes the behaviour.
                 Giving something       Taking away
Reinforcement    Positive reinforcer    Negative reinforcer
Punishment       Positive punishment    Negative punishment (response cost)

After a response, the event is either presented or removed:
Pleasant event presented: positive reinforcement (positive event follows response).
Pleasant event removed: punishment (response cost) (positive state removed after response).
Unpleasant event presented: punishment (discomfort follows response).
Unpleasant event removed: negative reinforcement (discomfort removed by response).

Reinforcement increases desirable behaviour; punishment decreases undesirable behaviour.
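The four types of consequence defined in this section can also be written as a simple lookup. The Python sketch below is only a study aid: the function names (`classify_consequence`, `effect_on_behaviour`) and the string labels are hypothetical, chosen to mirror the definitions of reinforcement and punishment given here.

```python
def classify_consequence(event, action):
    """Name the consequence for an event that is 'pleasant' or 'unpleasant'
    and is either 'presented' or 'removed' after a response."""
    table = {
        ("pleasant", "presented"): "positive reinforcement",
        ("unpleasant", "removed"): "negative reinforcement",
        ("unpleasant", "presented"): "positive punishment",
        ("pleasant", "removed"): "negative punishment (response cost)",
    }
    return table[(event, action)]


def effect_on_behaviour(consequence):
    """Reinforcement increases the likelihood of a response; punishment decreases it."""
    return "increases" if "reinforcement" in consequence else "decreases"


if __name__ == "__main__":
    # Taking an aspirin removes a headache (an unpleasant event is removed).
    label = classify_consequence("unpleasant", "removed")
    print(label, "->", effect_on_behaviour(label), "the behaviour")
```

Deciding first whether the event is pleasant or unpleasant, and then whether it is presented or removed, is usually enough to work out which of the four consequences applies.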
The Simpsons – Duffless
Learning
1. In Lisa’s first experiment how did the hamster demonstrate insight learning?
2. In Lisa’s second experiment, could the hamster’s behaviour be considered operant conditioning? Why/why not?
3a) Find an example of classical conditioning in the episode. What was it?
b) What was the:
UCS:
UCR:
NS:
CS:
CR:
Research Methods
If Marge was conducting an experiment on Homer when he chose not to drink Duff, answer the following questions.
1. What experimental design did Marge use?
2. The IV in her study was ______________________________________________. The DV in her study was
_________________________________________.
3. Write a hypothesis because Marge can’t because she doesn’t have enough fingers.
4. What are possible extraneous variables?
Operant Conditioning – Worksheet 1
Tick the appropriate column to indicate which schedule of reinforcement is being used in each situation.
Columns: Fixed Interval, Fixed Ratio, Variable Interval, Variable Ratio
1. Louis is given 10 per cent commission on every computer he sells at work.
2. A fisherman is catching fish off a pier.
3. A surfer is riding the waves at Bells Beach.
4. A student is given $100 by his parents if he achieves A’s on his report card after every semester at school.
5. Bob is playing the poker machines at the casino.
6. Jennifer receives free gifts after every $1000 she spends on her credit card.
7. Liz presses the button for the pedestrian lights at the intersection.
8. A rat is reinforced after every 10th time it presses a lever.
9. A rat is reinforced after, on average, every 10th time it presses a lever.
10. A person checks to see whether a load of washing in the machine is complete.
11. A salesperson receives a bonus for every four perfume bottles they sell.
12. A pigeon is reinforced for its first peck after a light comes on every two minutes.
13. Claire repeatedly dials a busy telephone line.
14. Gustav checks to see whether the chicken he is roasting is cooked.
15. Tim hands in a weekly report for the school newspaper.
Operant Conditioning – Worksheet 2
Tick the appropriate column to indicate which element of operant conditioning is being used in each situation.
Columns: Positive Reinforcement, Negative Reinforcement, Punishment, Response Cost
1. A rat quickly learns to press a bar to stop an electric shock being administered through the floor of its cage.
2. Nadia receives pocket money for helping around the house.
3. Sarah crashes her parents’ car into the garage after being told not to drive the car, so she is grounded for a month.
4. Peter is fined $200 for speeding.
5. Sam is talking in class and not doing any work, so the teacher walks to his desk and stands behind him. Sam stops talking and starts to work.
6. Georgia receives an A+ on her Psychology SAC after studying hard.
7. Sophie’s parents chastise her when she eats with her fingers at the dining table.
8. Fido’s owner puts a collar on him that releases an unpleasant sound that only dogs can hear, every time Fido barks.
9. I have a headache, so I take an aspirin and the headache goes away.
10. Brian does his homework to stop his parents nagging him.
11. The judge tells Mr Axe that he is to spend 30 years in jail for the murder he committed.
12. Ari earns a bonus at work for selling more computers than every other sales person.
13. Sonia rubs some ‘After sun’ moisturiser on her sunburn and it stops itching. The next time she is sunburned, she applies the ‘After sun’.
14. Eric fails to return home from a party until after his curfew, so his parents take away his mobile phone and car for a week.
15. Mrs Garcia gives all of her students ‘free time’ on the computers when they complete their work.
16. Marvin is sent a bill for his mobile phone, but he forgets to pay it, so the company sends him another bill, including a late fee.