Instrumental Conditioning: Motivational Mechanisms

Contingency-Shaped Behaviour
• Uses the three-term contingency
• The reinforcement schedule (e.g., FR10) imposes the contingency
• Seen in non-humans and humans

Rule-Governed Behaviour
• Particularly in humans
• Behaviour can be varied and unpredictable
• Invent rules, or use (in)appropriate rules across conditions (e.g., language)
• Depends on age, primary vs. secondary reinforcers, and experience

Role of the Response in Operant Conditioning
• Thorndike: performance of the response is necessary
• Tolman: formation of an expectation
• McNamara, Long & Wike (1956)
– Maze: running rats vs. riding rats (cart)
– Association is what is needed

Role of the Reinforcer
• Is reinforcement necessary for operant conditioning?
• Tolman & Honzik (1930)
• Latent learning
– Not necessary for learning
– Necessary for performance

[Figure: average errors across days for three groups: food, no food, and no food until day 11]

Associative Structure in Instrumental Conditioning
• Basic forms of association
– S = stimulus, R = response, O = outcome

S-R
• Thorndike, Law of Effect
• Role of the reinforcer: stamps in the S-R association
• No R-O association is acquired

Hull and Spence
• Law of Effect, plus a classical conditioning process
• The stimulus evokes the response via Thorndike's S-R association
• In addition, an S-O association creates an expectancy of reward
• Two-process approach
– Classical and instrumental conditioning are different processes

One Process or Two Processes?
• Are instrumental and classical conditioning the same (one process) or different (two processes)?
• Omission control procedure
– US presentation depends on the nonoccurrence of the CR
– No CR: CS ---> US
– CR: CS ---> no US

Omission Control
[Figure: trial timelines; on a trial with a CR the CS is not followed by the US, on a trial without a CR the CS is followed by the US]

Gormezano & Coleman (1973)
• Eyeblink conditioning with rabbits
• US = shock, CS = tone
• Classical group: 5 mA shock on every trial, regardless of response
• Omission group: making the eyeblink CR to the CS prevents delivery of the US
• One-process prediction:
– CR acquisition should be faster and stronger in the Omission group
– Reinforcement for the CR is shock avoidance
– In the Classical group the CR will be present because it somehow reduces the aversiveness of the shock
• BUT…
– CR acquisition was slower in the Omission group
– Consistent with classical-conditioning extinction (not all CSs are followed by the US)
• Supports two-process theory

Classical Conditioning in Instrumental Conditioning
• The classical conditioning process provides motivation
• Stimulus substitution
• S acquires properties of O
– rg = fractional anticipatory goal response
• The response leads to feedback
– sg = sensory feedback
• rg-sg constitutes the expectancy of reward

Timecourse
• S ---> rg-sg ---> R ---> O
• Through stimulus substitution, S elicits rg-sg, giving a motivational expectation of reward

Prediction
• According to the rg-sg account, the CR should occur before the operant response; but it doesn't always
• Dog lever pressing on FR33 ---> post-reinforcement pause (PRP)
• Lever pressing is low early in the trial, then higher; but salivation appears only later

[Figure: magnitude of lever pressing and salivation over time from the start of the trial; pressing rises first, salivation only later]

Modern Two-Process Theory
• Classical conditioning within instrumental conditioning
• Neutral stimulus ---> elicits motivation
• Central Emotional State (CES)
• The CES is a characteristic of the nervous system ("mood")
• A CES won't produce only one response
– A bit annoying for predicting its effect

Prediction
• The rate of the operant response is modified by presentation of a CS
• A CES develops that motivates the operant response
• A CS from classical conditioning also elicits a CES
• Therefore, presenting the CS during instrumental conditioning should alter the CES that motivates the instrumental response

"Explicit" Predictions
• Emotional states, by US type and CS:
– Appetitive US (e.g., food): CS+ → hope; CS− → disappointment
– Aversive US (e.g., shock): CS+ → fear; CS− → relief
• Behavioural predictions, for an aversive-US CS overlaid on an instrumental schedule:
– Positive reinforcement: CS+ (fear) → decrease; CS− (relief) → increase
– Negative reinforcement: CS+ (fear) → increase; CS− (relief) → decrease

R-O and S(R-O)
• Earlier interpretations had no response–reinforcer associations
• The intuitive explanation, though, is: perform the response to get the reinforcer

Colwill & Rescorla (1986)
• R-O association
• Devalue the reinforcer after conditioning
• Does the operant response decrease?
• Push the bar right or left for different reinforcers
– Food or sucrose

Testing of Reinforcers
[Figure: mean responses/min across blocks of extinction trials; normal reinforcer vs. devalued reinforcer]

Interpretation
• Can't be S-R
– There is no reinforcer in that association, so devaluation should have no effect
• Can't be S-O
– Two responses, same stimulus (the bar), but only one response was affected
• Conclusion
– Each response is associated with its own reinforcer
– R-O association

Hierarchical S-(R-O)
• The R-O model lacks a stimulus component
• A stimulus is required to activate the association
• Really, Skinner's (1938) three-term contingency
• An old idea; only recently tested empirically

Colwill & Delamater (1995)
• Rats trained on pairs of S+
• Biconditional discrimination problem
– Two stimuli
– Two responses
– One reinforcer
• Match the correct response to the stimulus to be reinforced
• Training, reinforcer devaluation, testing
• Training
– Tone: lever --> food; chain --> nothing
– Noise: chain --> food; lever --> nothing
– Light: poke --> sucrose; handle --> nothing
– Flash: handle --> sucrose; poke --> nothing
• Aversion conditioning
• Testing: marked reduction in the previously reinforced response
– Tone: lever press vs. chain
– Noise: chain vs. lever
– Light: poke vs. handle
– Flash: handle vs.
poke

Analysis
• Can't be S-O
– Each stimulus was associated with the same reinforcer
• Can't be R-O
– Each response was reinforced with the same outcome
• Can't be S-R
– Because devaluation of the outcome reduced responding
• Each S activates its corresponding R-O association

Reinforcer Prediction, A Priori
• Simple definition
– A stimulus that increases the future probability of a behaviour
– A circular explanation
• It would be nice if we could predict reinforcers beforehand

Need Reduction Approach
• Primary reinforcers reduce biological needs
• Biological needs: e.g., food, water
• Not biological needs: e.g., sex, saccharin
• Undetectable biological needs: e.g., trace elements, vitamins

Drive Reduction
• Clark Hull
• Homeostasis
– Drive systems
• Strong stimuli are aversive
• A reduction in stimulation is the reinforcer
– Drive is reduced
• Problems
– Objective measurement of stimulus intensity
– Cases where stimulation doesn't change, or even increases!

Trans-situationality
• A stimulus that is a reinforcer in one situation will be a reinforcer in others
• Subsets of behaviour
– Reinforcing behaviours
– Reinforceable behaviours
• Often works with primary reinforcers
• Problems with other stimuli

Primary and Incentive Motivation
• Where does the motivation to respond come from?
• Primary: from a biological drive state
• Incentive: from the reinforcer itself

But… Consider:
• What if we treat a reinforcer not as a stimulus or an event, but as a behaviour in and of itself?
• Fred Sheffield (1950s)
• Consummatory-response theory
– E.g., it is not the food but the eating of the food that is the reinforcer
– E.g., saccharin has no nutritional value and can't reduce drive, yet it is reinforcing because of its consumability

Premack's Principle
• Reinforcing responses occur more often than the responses they reinforce
• H = high-probability behaviour
• L = low-probability behaviour
• If L ---> H, then H reinforces L
• But if H ---> L, then L does not reinforce H
• The "differential probability principle"
• No fundamental distinction between reinforcers and operant responses

Premack (1965)
• Two alternatives: eat candy, play pinball
• Phase I: determine each child's behaviour probabilities (baseline)
– Gr1: children who preferred eating candy
– Gr2: children who preferred playing pinball
• Phase II (testing)
– T1: play pinball (operant) to eat candy (reinforcer)
• Only Gr1 children increased the operant
– T2: eat candy (operant) to play pinball (reinforcer)
• Only Gr2 children increased the operant

Premack in Brief
• Any activity could be a reinforcer if it is more probable ("preferred") than the operant response
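The differential probability principle reduces to a simple comparison of baseline probabilities. A minimal sketch, assuming hypothetical baseline numbers and function names of my own (not from the lecture):

```python
# Sketch of Premack's differential probability principle.
# Baseline values and behaviour names are hypothetical examples.

def premack_predicts_reinforcement(baseline, operant, contingent):
    """True if access to `contingent` is predicted to reinforce `operant`:
    the contingent activity must have the higher baseline probability."""
    return baseline[contingent] > baseline[operant]

# Hypothetical Phase I baseline for a child who prefers eating candy.
baseline = {"eat_candy": 0.6, "play_pinball": 0.3}

# Pinball (operant) -> candy (reinforcer): predicted to increase pinball.
print(premack_predicts_reinforcement(baseline, "play_pinball", "eat_candy"))  # True
# Candy (operant) -> pinball (reinforcer): not predicted to work for this child.
print(premack_predicts_reinforcement(baseline, "eat_candy", "play_pinball"))  # False
```

Note that the same contingency yields opposite predictions for a child whose baseline preferences are reversed, which is exactly the Gr1/Gr2 pattern in Premack (1965).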
Response Deprivation Hypothesis
• Restrict access to the reinforcing response
• Theory:
– Impose response deprivation
– Now even low-probability responses can reinforce high-probability responses
• Instrumental procedures withhold the reinforcer until the response is made; in essence, the organism is deprived of access to the reinforcer
• The reinforcer is produced by the operant contingency itself

Behavioural Regulation
• Physiological homeostasis
• An analogous process operates in behavioural regulation
• There is a preferred/optimal distribution of activities
• Stressors move the organism away from its optimal behavioural state
• The organism responds in ways that return it to the ideal state

Behavioural Bliss Point
• Unconstrained condition: activities are distributed in the preferred way
• Behavioural bliss point (BBP)
• The relative frequency of all behaviours in the unconstrained condition
• Across conditions: the BBP shifts
• Within a condition: the BBP is stable across time

Imposing a Contingency
• Puts pressure on the BBP
• The organism acts to defend against challenges to the BBP
• But the requirements of the contingency may make achieving the BBP impossible
• A compromise is required
• Responses are redistributed so as to get as close to the BBP as possible

Minimum Deviation Model
• Behavioural regulation
• Due to the imposed contingency, behaviour is redistributed
• Minimize the deviation of responses from the BBP
– Get as close as you can

[Figure: time drinking vs. time running (each axis 10–40); allocations under restricted-running and restricted-drinking schedules relative to the bliss point]

Strengths of BBP Theory
• Reinforcers are not special stimuli or responses
• No fundamental difference between operant and reinforcer
• Explains the new allocation of behaviour
• Fits with findings on cognition and cost–benefit optimization
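The minimum deviation idea can be sketched as a projection onto the schedule constraint. This is an illustrative toy, not the lecture's own formalism: the bliss-point numbers are invented, and I assume a simple ratio schedule that forces time drinking to be proportional to time running.

```python
# Minimum-deviation sketch of behavioural regulation under a contingency.
# Hypothetical numbers; assumes a straight-line ratio schedule
# drink = ratio * run through the origin.

def closest_allocation(bliss, ratio):
    """Find the allocation on the schedule line drink = ratio * run that
    minimizes the Euclidean distance to the bliss point (the orthogonal
    projection of the bliss point onto the constraint line)."""
    run_b, drink_b = bliss
    run = (run_b + ratio * drink_b) / (1 + ratio ** 2)
    return run, ratio * run

# Hypothetical unconstrained bliss point: 30 min running, 15 min drinking.
bliss = (30.0, 15.0)

# Schedule restricting drinking (1 min drinking per 3 min running):
print(closest_allocation(bliss, 1 / 3))  # ~ (31.5, 10.5)
# Schedule restricting running (3 min drinking per 1 min running):
print(closest_allocation(bliss, 3.0))    # ~ (7.5, 22.5)
```

When drinking is restricted, the model predicts running above its bliss level, i.e., the instrumental response increases; when running is restricted, drinking rises above its bliss level instead. That is the compromise redistribution the slides describe.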