Instrumental Learning & Operant Reinforcement

Operant Learning
- Stimulus --> Response --> Outcome

Classical vs. Operant
- Classical
  - Requires a reflex action
  - Neutral stimulus is associated with a US
  - Outside of the subject’s control
- Operant
  - Strengthening/weakening of “voluntary” action
  - Subject responds or doesn’t
- The two can operate together

What’s in a Name?
- Operant learning: the subject operates on the environment
- Instrumental conditioning: the subject is instrumental in obtaining the outcome

Trial and Error Learning
- E.L. Thorndike
  - Animal intelligence
  - Maze studies

Puzzle Box
- Cats
- Cage with a mechanism to open the door
- Escape latency
- Discrete trial procedure

Law of Effect
- Any behaviour followed by an appetitive stimulus will increase in frequency

Terms
- Operant (response): any behaviour that operates on the environment to produce an effect
- Reinforcer: any event that increases the frequency of a behaviour
- Punisher: any event that decreases the frequency of a behaviour

Operant Learning
- B.F. Skinner
- Operant chamber
- Free operant procedure

Discrete Trial & Free Operant
- Discrete trial
  - One trial at a time
  - “Apparatus” must be reset
  - Measure some behaviour, e.g., mazes
- Free operant
  - Operant can occur at any time
  - Operant can occur repeatedly
  - Measure response rate, e.g., operant chamber

Four Contingencies
- Positive reinforcement
- Negative reinforcement
- Positive punishment
- Negative punishment

Positive and Negative
- Positive: presents some stimulus
- Negative: removes some stimulus

Reinforcers and Punishers
- Reinforcer: increases a behaviour
- Punisher: decreases a behaviour

Contingencies (response causes stimulus to be presented or removed; response rate increases or decreases)
- Stimulus presented, response rate increases: positive reinforcement (lever press --> food)
- Stimulus presented, response rate decreases: positive punishment (lever press --> shock)
- Stimulus removed, response rate increases: negative reinforcement (lever press --> shock off)
- Stimulus removed, response rate decreases: negative punishment (lever press --> food removed)

Types of Reinforcers
- Primary: not dependent on an association with other reinforcers
- Secondary (“conditioned reinforcer”): an initially neutral stimulus paired with a primary reinforcer

Secondary Reinforcers
- “Bridging”, “clicker”
- Secondary reinforcers undergo extinction without periodic pairings with a primary reinforcer
- Generally weaker than primary reinforcers
- Generalized reinforcer: paired with many other kinds of reinforcers, e.g., money

Strength of Operant Learning
- Can condition practically any behaviour
- Shaping (successive approximations)

Shaping a Lever Press
- Gradual process
- Reinforce increasingly appropriate/precise responses
- Feedback

Response Chains
- Sequences of behaviours in a specific order
- Objective: a primary reinforcer at the end of the chain
- Each link’s stimulus acts as a conditioned reinforcer (for the previous response) and a discriminative stimulus (for the next response)

Forward Chaining
- Start with the first response in the sequence, then work through to the last response in additive steps

Backward Chaining
- Often used with “complex” training
- Start with the last response in the chain
- Next, the second-last response
- Then the third-last, etc.

Contingency
- Correlation between behaviour & outcome
- Strong contingency --> better learning
- Random contingency --> no learning
- Applies to both reinforcement and punishment

Contiguity
- Time between behaviour & outcome
- Shorter = better learning
- Delays let other behaviours occur and allow forgetting and extinction (behaviour without reinforcement)
- Learning with a delay is possible if a stimulus “placeholder” is provided (conditioned reinforcer?)
- More important for punishment

Reinforcer Characteristics
- Larger reinforcers --> stronger learning
  - Not a linear effect
- Qualitative differences in reinforcers and punishers
  - Species & individual differences
- Intensity of punisher

Task Characteristics
- Some tasks are easier to learn than others
- Species & individual differences
- Innate and/or prior conditioning

Deprivation Levels
- Generally, the greater the deprivation, the more effective the reinforcer
- Reinforcers can satiate
- Deprivation can provide motivation to engage in punishable behaviours

Extinction
- The response no longer produces the same outcome
- Extinction burst (responding temporarily rises above the level seen with reinforcement)
- Variability of behaviour
- Aggression and frustration
- Spontaneous recovery
- Resurgence

Hull’s Drive Reduction Theory
- Animals have motivational states (drives)
  - Necessary for survival
- Reinforcers are things that reduce drives
  - Physiological value
  - Reduce a physiological state

Drive Reduction Reinforcers
- Works well with primary reinforcers
- Many secondary reinforcers have no physiological value
  - Hull: association links secondary reinforcers to a drive
- Some reinforcers are hard to classify as primary or secondary (e.g., saccharin)
- Some reinforcers increase a physiological state (e.g., roller coasters)
- Some necessities are undetectable (e.g., vitamins)

Relative Value Theory & Premack Principle
- Treat reinforcers as behaviours
- Is it the food, or the behaviour of eating, that is the reinforcer?
- Behavioural probability scale
  - Greater or lesser value of behaviours relative to one another
  - No distinction between primary and secondary reinforcers

Premack Principle
- One behaviour will reinforce a second behaviour
- High-probability behaviour reinforces low-probability behaviour

Baseline Probability Scale
- Rank order behaviours by the time spent on each response
- Reinforcement relativity: probability of response = time spent on response / total time
- No absolutes

Example
- Behaviours: eat ice cream (I), play video game (V), read book (B)
- Baseline (30 minutes); scales run from lowest to highest probability
  - Student 1: I (2 min), V (8 min), B (20 min); scale: I -- V -- B
  - Student 2: I (8 min), V (20 min), B (2 min); scale: B -- I -- V
- Student 1: V reinforces I; B reinforces V & I
- Student 2: I reinforces B; V reinforces I & B

Problems
- Baseline phase
  - Fair rating?
  - How to compare very different behaviours?
- Time problems
  - What if time is not important to the behaviour?
  - Behaviour duration?
  - Length of baseline period?

Response Deprivation Theory
- Deprived behaviours = reinforcing behaviours
  - Behaviour is restricted so it drops below its baseline level of performance
- Not the relative frequency of one behaviour compared to another (as in Premack)
- Level of deprivation for a behaviour
- Praise? “Yes”?

Definitions
- Escape: get away from an aversive stimulus that is in progress
- Avoidance: get away from an aversive stimulus before it begins

Shuttle Box
- Solomon & Wynne (1953)
- Dogs
- Chamber with barrier; shock
- Light off as signal
[Figure: shuttle box with Side 1 and Side 2 separated by a barrier; electrifiable floor; discriminative stimuli]

Two-Process Theory
- Classical and operant conditioning
- Shock = US
- Fear/pain/jump/twitch/squeal = UR
- Darkness = CS
- Fear of dark = CR
- Fear: heart rate, breathing, stomach cramps, etc.
- Negative reinforcement: removal of fear (CR)
- Escape from the CS, not avoidance of the shock

Support for Two-Process Theory
- Rescorla & LoLordo (1965)
- Dog in shuttle box
  - No signal
  - Response gives “safe time”
- Pair tone with shock
- Tone increases rate of response
  - A CS can amplify avoidance
  - Conditioned inhibition can reduce avoidance

Problems with Two-Process Theory
- Avoidance without observable fear
  - Heart rate not consistent
- Fear diminishes with avoidance learning

Measuring Fear
- Kamin, Brimer, and Black (1963)
- Lever press --> food
- Auditory CS --> avoidance in shuttle box until 1, 3, 9, or 27 avoidances in a row
- CS presented in Skinner box; check for suppression of lever pressing

Results
[Figure: responding (lever pressing) during the CS after 1, 3, 9, and 27 consecutive avoidance responses]
- Fear decreases during extended avoidance training
- But avoidance is still strong
- Even low fear is enough?

Extinction in Avoidance Behaviour
- Odd prediction from two-process theory: a “yo-yo” effect
  - On successful avoidance trials the CS is no longer paired with shock, so fear should extinguish and avoidance should stop; shocks would then return, fear would be reconditioned, and avoidance would resume
  - Avoidance should toggle on and off
- But! Avoidance is extremely persistent

One-Process Theory
- Classical conditioning component unnecessary
- Avoidance (“safety”), not fear reduction, is the reinforcer

Sidman Avoidance Task
- Free-operant avoidance
- Can avoidance be learned if there is no warning CS?
- Shocks at regular intervals, with no warning signal
- Each response gives “safe time” (postpones the next shock)
- Extensive training --> avoidance is learned
  - But usually never perfect
  - High variability across subjects
- Two-process theory suggests time becomes a CS (time elicits fear)

Herrnstein & Hineline (1966)
- Rapid and slow shock-rate schedules
- Lever press switches schedules
- Shocks presented randomly, no signal
- Responses reduce the rate of shock
- Reduction in shock is the reinforcer

Learned Helplessness
- Behaviour has no effect on the situation
- Generalizes
- Laboratory
  - Give inescapable shocks
  - Then, in a shuttle box, the animal will not switch sides
  - Expectation that behaviour has no effect

Learned Helplessness in Humans
- Depression
- Situations beyond your control
- Three dimensions
  - Situation: specific or global
  - Attribution: internal or external
  - Time: short-term or long-term
- Maier & Seligman (1976)
  - Motivational impairment
  - Cognitive impairment
  - Emotional impairment

Therapeutic Application
- Confidence building (“cannot fail”)
- Implementation issues
  - Tasks that can be successfully completed
- Produces immunization
  - Escapable condition … inescapable condition
  - Learned helplessness less likely to develop
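As a worked check on the Premack example earlier in these notes (students 1 and 2), the baseline arithmetic can be sketched in Python. This is a minimal sketch under the notes’ own assumptions: the probability of a response is the time spent on it divided by the 30-minute baseline, and any higher-probability behaviour is predicted to reinforce any lower-probability one. The helper names are hypothetical, and equal probabilities are assumed to predict no reinforcement.

```python
# Minimal sketch of the Premack baseline calculation (hypothetical helper names).
# Assumption: probability of a response = time spent on it / total baseline time.

BASELINE_MINUTES = 30  # length of the baseline period in the example


def response_probabilities(times, total=BASELINE_MINUTES):
    """Map each behaviour to (time spent on it) / (total time)."""
    return {behaviour: minutes / total for behaviour, minutes in times.items()}


def probability_scale(times, total=BASELINE_MINUTES):
    """Order behaviours from lowest to highest baseline probability."""
    probs = response_probabilities(times, total)
    return sorted(probs, key=probs.get)


def premack_pairs(times, total=BASELINE_MINUTES):
    """List (reinforcer, reinforced) pairs: a higher-probability behaviour
    is predicted to reinforce each lower-probability one. Ties give no pair."""
    probs = response_probabilities(times, total)
    return sorted(
        (high, low)
        for high in probs
        for low in probs
        if probs[high] > probs[low]
    )


# Baseline minutes from the example: I = ice cream, V = video game, B = book.
student_1 = {"I": 2, "V": 8, "B": 20}
student_2 = {"I": 8, "V": 20, "B": 2}

for name, times in [("Student 1", student_1), ("Student 2", student_2)]:
    print(name, probability_scale(times), premack_pairs(times))
```

Because only the rank order within each baseline matters, the same behaviour plays different roles for different people: ice cream (I) reinforces reading (B) for student 2, but for student 1 it sits at the bottom of the scale and reinforces nothing. The sketch also inherits the “time problems” noted in the Premack section, since it treats time spent as a fair measure of probability.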