ch5

Instrumental Learning & Operant Reinforcement

Operant Learning
- Stimulus
- Response
- Outcome

Classical vs. Operant
- Classical
  - Requires reflex action
  - Neutral stimulus associated with US
  - Outside of subject’s control
- Operant
  - Strengthening/weakening of “voluntary” action
  - Subject responds or doesn’t
- Can operate together

What’s in a Name?
- Operant learning: subject operates on environment
- Instrumental conditioning: subject is instrumental in obtaining outcome

Trial and Error Learning
- E.L. Thorndike
- Animal intelligence
- Maze studies

Puzzle Box
- Cats
- Cage with mechanism to open door
- Escape latency
- Discrete trial procedure

Law of Effect
- Any behaviour followed by an appetitive stimulus will increase in frequency

Terms
- Operant (response): any behaviour that operates on the environment to produce an effect
- Reinforcer: any event that increases the frequency of a behaviour
- Punisher: any event that decreases the frequency of a behaviour

Operant Learning
- B.F. Skinner
- Operant chamber
- Free operant procedure

Discrete Trial & Free Operant
- Discrete
  - One trial at a time
  - “Apparatus” must be reset
  - Measure some behaviour
  - e.g., mazes
- Free
  - Operant can occur at any time
  - Operant can occur repeatedly
  - Response rate
  - e.g., operant chamber

Four Contingencies
- Positive reinforcement
- Negative reinforcement
- Positive punishment
- Negative punishment

Positive and Negative
- Positive: presents some stimulus
- Negative: removes some stimulus

Reinforcers and Punishers
- Reinforcer: increases a behaviour
- Punisher: decreases a behaviour

Contingencies

Response causes     Response rate                Response rate
stimulus to be:     increases                    decreases
Presented           Positive reinforcement       Positive punishment
                    Lever press --> Food         Lever press --> Shock
Removed             Negative reinforcement       Negative punishment
                    Lever press --> Shock off    Lever press --> Food removed

Types of Reinforcers
- Primary
  - Not dependent on an association with other reinforcers
- Secondary
  - Initially neutral stimulus
  - Paired with primary reinforcer
  - “Conditioned Reinforcer”

Secondary Reinforcers
- “Bridging”, “clicker”
- Secondary extinction without periodic pairings with primary
- Generally weaker than primary
- Generalized reinforcer
  - Paired with many other kinds of reinforcers
  - e.g., money

Strength of Operant Learning
- Can condition practically any behaviour
- Shaping (successive approximations)

Shaping a Lever Press
- Gradual process
- Reinforce more appropriate/precise responses
- Feedback

Response Chains
- Sequences of behaviours in specific order
- Objective: primary reinforcer
- Conditioned reinforcers
- Discriminative stimuli

Forward Chaining
- Start with first response in sequence, then work through to last response in additive steps

Backwards Chaining
- Often used with “complex” training
- Start with last response in chain
- Next, second last response
- Third last, etc.

Contingency
- Correlation between behaviour & outcome
- Strong contingency --> better learning
- Random contingency --> no learning
- Both reinforcement and punishment

Contiguity
- Time between behaviour & outcome
- Shorter = better learning
- Delays let other behaviours occur, forgetting, extinction (behaviour w/o reinforcement)
- Learning with delay if stimulus “placeholder” provided (conditioned reinforcer?)
- More important for punishment

Reinforcer Characteristics
- Larger reinforcers --> stronger learning
  - Not a linear effect
- Qualitative differences in reinforcers and punishers
  - Species & individual differences
- Intensity of punisher

Task Characteristics
- Some tasks easier to learn than others
- Species & individual differences
- Innate and/or prior conditioning

Deprivation Levels
- Generally, the greater the deprivation, the more effective the reinforcer
- Reinforcers can satiate
- Deprivation can provide motivation to engage in punishable behaviours

Extinction
- Behaviour does not lead to same outcome
- Response no longer produces same outcome
- Extinction burst (with reinforcement)
- Variability of behaviour
- Aggression and frustration
- Spontaneous recovery
- Resurgence

Hull’s Drive Reduction Theory
- Animals have motivational states (drives)
- Necessary for survival
- Reinforcers are things that reduce drives
- Physiological value
- Reduce physiological state

Drive Reduction Reinforcers
- Works well with primary reinforcers
- Many secondary reinforcers have no physiological value
  - Hull: association links secondary to drive
- Some reinforcers hard to classify as primary or secondary
- Some increase a physiological state
- Some necessities undetectable
- Roller coasters
- Vitamins
- Saccharin

Relative Value Theory & Premack Principle
- Treat reinforcers as behaviours
- Is it the food, or the behaviour of eating that is the reinforcer?
- Behavioural probability scale
- Greater or lesser value of behaviours relative to one another
- No distinction between primary and secondary

Premack Principle
- One behaviour will reinforce a second behaviour
- High probability behaviour reinforces low probability behaviour
- Baseline probability scale
  - Time
  - Rank order
- Probability of response = (time spent on response) / (total time)
- Reinforcement relativity
  - No absolutes

Example
- Behaviours
  - Eat ice cream (I), play video game (V), read book (B)
- Baseline (30 minutes)
  - Student 1: I (2 min), V (8 min), B (20 min)
    - Scale: I -- V -- B
  - Student 2: I (8 min), V (20 min), B (2 min)
    - Scale: B -- I -- V
- Student 1: V reinforces I, B reinforces V & I
- Student 2: I reinforces B, V reinforces I & B

Problems
- Baseline phase
  - Fair rating?
  - How to compare very different behaviours
- Time problems
  - What if time not important to behaviour?
  - Behaviour duration?
  - Length of baseline period?

Response Deprivation Theory
- Deprived behaviours = reinforcing behaviours
- Drop below baseline level of performance
- Not relative frequency of one behaviour compared to another (i.e., Premack)
- Level of deprivation for a behaviour
- Praise? “Yes”?

Definitions
- Escape
  - Get away from aversive stimulus that is in progress
- Avoidance
  - Get away from aversive stimulus before it begins

Shuttle Box
- Solomon & Wynne (1953)
- Dogs
- Chamber with barrier; shock
- Light off as signal
[Figure: shuttle box, labelled with barrier, discriminative stimuli, electrifiable floor, side 1, side 2]

Two-Process Theory
- Classical and operant conditioning
- Shock = US
- Fear/pain/jump/twitch/squeal = UR
- Darkness = CS
- Fear of dark = CR
- Fear: heart rate, breathing, stomach cramps, etc.
- Negative reinforcement
  - Removal of fear (CR)
  - Escape of CS, not avoidance of shock

Support for Two-Process Theory
- Rescorla & LoLordo (1965)
- Dog in shuttle box
- No signal
- Response gives “safe time”
- Pair tone with shock
- Tone increases rate of response
- CS can amplify avoidance
- Conditioned inhibition can reduce avoidance

Problems with Two-Process Theory
- Avoidance without observable fear
- Heart rate
  - Not consistent
- Fear diminishes with avoidance learning

Measuring Fear
- Kamin, Brimer, and Black (1963)
- Lever press ---> food
- Auditory CS ---> avoidance in shuttle box until: 1, 3, 9, 27 avoidances in a row
- CS in Skinner box; check for suppression of lever press

Results
- Fear decreases during extended avoidance training
- But, avoidance still strong
- Even low fear is enough?
[Figure: responding (y-axis) against avoidance responses: 1, 3, 9, 27 (x-axis)]

Extinction in Avoidance Behaviour
- Odd prediction from two-process theory
- “Yo-yo” effect
- Avoidance should toggle
- But! Avoidance is extremely persistent
[Figure: successful avoidance (y-axis) against trials (x-axis)]

One-Process Theory
- Classical conditioning component unnecessary
- Avoidance, not fear reduction, is reinforcer
- “Safety”

Sidman Avoidance Task
- Free-operant avoidance
- Can avoidance be learned if no warning CS?
- Shock at random intervals
- Response gives safe time
- Extensive training --> learn avoidance
- But, usually never perfect
- High variability across subjects
- Two-process theory suggests:
  - Time becomes a CS (time elicits fear)

Herrnstein & Hineline (1966)
- Rapid and slow shock rate schedules
- Lever press switches schedules
- Shocks presented randomly, no signal
- Responses give shock reduction
- Reduction in shock is reinforcer

Learned Helplessness
- Behaviour has no effect on situation
- Generalizes
- Laboratory
  - Give inescapable shocks
  - Shuttle box
  - Will not switch sides
- Expectation that behaviour has no effect

Learned Helplessness in Humans
- Depression
- Situations beyond your control
- Three dimensions
  - Situation: specific or global
  - Attribute: internal or external
  - Time: short-term or long-term

Maier & Seligman (1976)
- Motivational impairment
- Cognitive impairment
- Emotional impairment

Therapeutic Application
- Confidence building (“cannot fail”)
- Implementation issues
- Tasks that can be successfully completed
- Produces immunization
  - Escapable condition … inescapable condition
  - Learned helplessness less likely to develop
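
The Premack example from earlier in these notes (baseline minutes for ice cream, video game, and book) can be worked through in a few lines. This is only a sketch under stated assumptions: the function names `probability_scale` and `reinforces` are made up for illustration, probability is taken as time spent on a response divided by total baseline time, and ties between behaviours are not handled.

```python
def probability_scale(baseline_minutes):
    """Return (behaviour, probability) pairs, lowest probability first.

    Probability of a response = time spent on it / total baseline time.
    """
    total = sum(baseline_minutes.values())
    probabilities = {b: t / total for b, t in baseline_minutes.items()}
    return sorted(probabilities.items(), key=lambda item: item[1])


def reinforces(baseline_minutes):
    """Map each behaviour to the lower-probability behaviours it should reinforce."""
    scale = [b for b, _ in probability_scale(baseline_minutes)]
    # Everything earlier on the scale has a lower baseline probability.
    return {b: scale[:i] for i, b in enumerate(scale)}


# Baseline data from the slide example (minutes out of a 30-minute baseline).
student_1 = {"I": 2, "V": 8, "B": 20}
student_2 = {"I": 8, "V": 20, "B": 2}

for name, baseline in (("Student 1", student_1), ("Student 2", student_2)):
    scale = " -- ".join(b for b, _ in probability_scale(baseline))
    print(name, "scale:", scale, "| reinforces:", reinforces(baseline))
```

For Student 1 this gives the scale I -- V -- B, with V reinforcing I and B reinforcing both I and V. For Student 2 it gives B -- I -- V, matching the slide. The same reinforcer (reading, say) sits at opposite ends of the two scales, which is the "no absolutes" point.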

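The four contingencies from earlier in these notes come down to two questions: is a stimulus presented or removed, and does the response rate go up or down? A minimal sketch of that lookup follows. The function name `classify_contingency` and the string labels are assumptions chosen for illustration, not terminology from the slides beyond the four contingency names.

```python
def classify_contingency(stimulus_change, rate_change):
    """Name the contingency for a response.

    stimulus_change: "presented" or "removed" (what the response does to the stimulus)
    rate_change: "increases" or "decreases" (what happens to the response rate)
    """
    # Positive/negative describes the stimulus; reinforcement/punishment the rate.
    sign = {"presented": "positive", "removed": "negative"}[stimulus_change]
    effect = {"increases": "reinforcement", "decreases": "punishment"}[rate_change]
    return f"{sign} {effect}"


# The lever-press examples from the contingency table.
examples = [
    ("lever press --> food", "presented", "increases"),
    ("lever press --> shock", "presented", "decreases"),
    ("lever press --> shock off", "removed", "increases"),
    ("lever press --> food removed", "removed", "decreases"),
]
for description, stimulus_change, rate_change in examples:
    print(description, "=", classify_contingency(stimulus_change, rate_change))
```

Keeping the two dimensions separate in the code mirrors the point that "positive" and "negative" say nothing about whether the behaviour becomes more or less frequent.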