Lecture 3: Instrumental learning
The puzzle box

Historical Roots
• Darwin – variance between animals can be described by the commonalities between them and other animals, both in terms of behaviour and mental abilities – led to ‘comparative psychology’
That’s one smart monkey
BUT is this insight/intelligence? (i.e. how did the animal actually acquire this behaviour?)

Edward Lee Thorndike
1. Cat locked in puzzle box,
2. Cat makes the ‘right’ response,
3. Door opens,
4. Eat kipper (Yum).

Trial and error
• E.g., time for cat to escape from puzzle box.
• Observed progressive improvement over many trials, not ‘sudden insight’.

Discrete trial procedures
• Single trial procedures.
• Measured objective dependent variables such as ‘time’ or ‘errors’. (Used by other researchers)

Law of Effect
• What a human or animal does is strongly influenced by the immediate consequences of such behaviour in the past:
“Of several responses made to the same situation, those that are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur” - Thorndike (1911, p. 244)

Tripartite Contingency - ABC
• Antecedent: The stimulus controlling behaviour
– The Discriminative Stimulus (Sd)
• Behaviour: What is the response being reinforced?
– The Response (R)
• Consequence: What is the immediate outcome of a behaviour?
– The Reinforcing Stimulus (Sr)

B. F. Skinner
• A limitation with Thorndike’s method was the discrete trial:
– When the subject can respond is constrained,
– One response and/or one reinforcer per trial,
– Handling stress.

Free operant procedure
1. Rat placed in Skinner box,
2. Rat makes the ‘right’ response,
3. Rat gets food,
4. Repeat from 2.

Instrumental learning
Note the important difference between these procedures and those used by Pavlov:
– Pavlov: The subject has no control over events, but responds to them.
– Thorndike/Skinner: The subject has to respond to change the circumstances.
• The behaviour is instrumental in determining what happens.

What are Reinforcers
Many reinforcers are intrinsically valued (primary reinforcers), e.g. giving a dog food.
But this is not always the case:
– Secondary reinforcers: Acquire their reinforcing properties through experience, e.g. clicker with dog
– Social reinforcement, e.g. praise
– Sensory & activity reinforcers, e.g. learning guitar

A brief exercise…

Shaping
Principle of successive approximation.
• Reinforce behaviours that are closer and closer to a target behaviour
• Gradually make the conditions of reinforcement more stringent, more precise.
• Can generate entirely novel behaviours
– Bar pressing in rats,
– Dog opening door.
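The shaping bullets above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not a model from the lecture: the function name `shape`, the idea of a numeric "typical response", the learning rate (0.1) and the criterion step (0.05) are all hypothetical choices made for illustration.

```python
import random


def shape(target=10.0, trials=2000, shaping=True, seed=0):
    """Toy successive-approximation loop (all numbers are illustrative assumptions).

    Returns (final typical response, number of reinforced responses).
    """
    rng = random.Random(seed)
    typical = 0.0                            # learner's current typical response
    criterion = 0.0 if shaping else target   # what currently earns reinforcement
    reinforced = 0
    for _ in range(trials):
        # Behaviour varies from trial to trial around what is currently typical.
        response = rng.gauss(typical, 1.0)
        if response >= criterion:
            # Close enough to the target for now: reinforce it.
            reinforced += 1
            # Assumption: a reinforced response becomes more typical.
            typical += 0.1 * (response - typical)
            # Make the condition for reinforcement a little more stringent.
            criterion = min(target, criterion + 0.05)
    return typical, reinforced


if __name__ == "__main__":
    print("with shaping:   ", shape(shaping=True))
    print("without shaping:", shape(shaping=False))
```

With `shaping=False` the full target is demanded from the first trial. In this toy model the learner never happens to emit the target response, so nothing is ever reinforced and the behaviour does not change, which is the point of reinforcing closer and closer approximations instead.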
Shaping
Practical use of Shaping
[Figure: % social time plotted against days; labels: Normal, Social, Check]

Superstitious Behaviour
• Shaping may also explain superstitious behaviours…

Skinner
“The way reinforcement is carried out is more important than the amount of reinforcement given”

Contiguity & Contingency
Fixed or Variable; Ratio (Responses) or Interval (Time period)
• FIXED RATIO: 10 bar presses = 1 food pellet
• VARIABLE RATIO: on average 10 bar presses = 1 food pellet, BUT this varies across trials, e.g. 7, 8, 10, 11, 14 presses
• FIXED INTERVAL: 10 bar presses = 1 food pellet, BUT only 1 pellet is available every 10 sec
• VARIABLE INTERVAL: 10 bar presses = 1 food pellet, BUT on average only 1 pellet is available every 10 sec, and this varies across trials, e.g. 6, 9, 10, 12, 13 secs

Reinforcement schedules
Reinforcement schedule affects extinction
[Figure: persistence at gambling (Lewis & Duncan, 1956); y-axis: plays to extinction (mean log); x-axis: percent reinforcement during training (100, 75, 50, 25, 0)]
• The less reliably a response is reinforced, the more persistent it is during extinction.

Can we unlearn? - Extinction
[Figure: Williams, 1959; y-axis: duration of crying (min); x-axis: times child put to bed (1 to 10); labels: First extinction, Spontaneous recovery, Rapid Reacquisition, Second extinction]

Response training: reinforcement
The consequence: Appetitive or Aversive
• R produces consequence:
– Appetitive: Positive Reinforcement, R increases
– Aversive: Positive Punishment, R decreases
• R terminates consequence:
– Appetitive: Negative Punishment (omission), R decreases
– Aversive: Negative Reinforcement (Escape/Avoid), R increases
Positive Reinforcement

Escape learning
[Diagram labels: WS On, Shock On, Escape]
• A barrier divides the shuttle box - one half has a grid floor.
• A warning signal (WS) comes on, followed by a mild foot shock through the grid floor.
• The subject can escape the shock by leaping over the barrier to the safe area.

Avoidance
[Diagram labels: WS On, Avoid, Shock On, Happy rat]
• The animal soon learns to jump over the barrier when the warning signal comes on, and avoids the shock altogether.
• Escape = turning off some currently occurring aversive event,
• Avoid = preventing some aversive event from occurring.
Negative Reinforcement

Response suppression: Punishment
The consequence: Appetitive or Aversive
• R produces consequence:
– Appetitive: Positive Reinforcement, R increases
– Aversive: Positive Punishment, R decreases
• R terminates consequence:
– Appetitive: Negative Punishment (omission), R decreases
– Aversive: Negative Reinforcement (Escape/Avoid), R increases
Positive Punishment

Response suppression: Reinforce other activities
The consequence: Appetitive or Aversive
• R produces consequence:
– Appetitive: Positive Reinforcement, R increases
• R terminates consequence:
– Appetitive: Negative Punishment (omission), R decreases
– Aversive: Negative Reinforcement (Escape/Avoid), R increases
Negative Punishment

Response-consequence contingencies
The consequence: Appetitive or Aversive
• R produces consequence:
– Appetitive: Positive Reinforcement, R increases
– Aversive: Positive Punishment, R decreases
• R terminates consequence:
– Appetitive: Negative Punishment (omission), R decreases
– Aversive: Negative Reinforcement (Escape/Avoid), R increases
Important: Negative reinforcement is NOT Punishment!!!

An analysis of drug abuse
How can we describe the spiral from drug use to drug abuse?
• Sources of positive reinforcement (+ve Rf)
• Sources of negative reinforcement (-ve Rf)
• Goal directed or habit?
• Treat with punishment? Extinction? Omission?
Very simplified analysis
• Role of habituation, classical conditioning, discrimination learning, social learning, etc.

Instrumental learning: A summary
• Unlike classical conditioning, instrumental learning involves circumstances where behaviour determines the events that follow.
• The likelihood that a behaviour will increase or decrease is determined by:
– The nature of the events that follow (appetitive/aversive),
– Whether the behaviour produces or terminates these events.
• Can be automatic, but can also be goal-oriented.
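The two factors in the summary (appetitive or aversive consequence; response produces or terminates it) pick out one cell of the response-consequence table. A small lookup makes that explicit. The names `CONTINGENCIES` and `classify` are hypothetical, introduced only for this sketch; the four entries are taken from the table in the lecture.

```python
# The four response-consequence contingencies, keyed by
# (what the response R does to the consequence, kind of consequence).
CONTINGENCIES = {
    ("produces", "appetitive"): ("positive reinforcement", "increases"),
    ("produces", "aversive"): ("positive punishment", "decreases"),
    ("terminates", "appetitive"): ("negative punishment (omission)", "decreases"),
    ("terminates", "aversive"): ("negative reinforcement (escape/avoid)", "increases"),
}


def classify(relation, consequence):
    """Return (contingency name, what happens to R) for one cell of the table."""
    key = (relation.lower(), consequence.lower())
    if key not in CONTINGENCIES:
        raise ValueError(f"unknown combination: {relation!r}, {consequence!r}")
    return CONTINGENCIES[key]


if __name__ == "__main__":
    # Shuttle box: jumping the barrier terminates an aversive shock.
    name, effect = classify("terminates", "aversive")
    print(f"{name}: R {effect}")
```

In the shuttle box example, the response terminates an aversive event, so the lookup returns negative reinforcement with R increasing. This is why negative reinforcement is not punishment: punishment always lands in a cell where R decreases.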