Instrumental Conditioning: Foundations

Name Game
• Instrumental: subject instrumental in producing outcome
• Operant: subject operates on environment to produce outcome

Elements
• Discriminative stimulus (SD)
• Response
• Outcome
  – Appetitive
  – Aversive
• Species and individuals

Outcomes and Effects
• Positive
  – Something is delivered
• Negative
  – Something is removed
• Reinforcer
  – Causes frequency of behaviour to increase
• Punisher
  – Causes frequency of behaviour to decrease
• Need to know effect on behaviour before labeling “reinforcer” or “punisher”

Omission Training
• Negative punishment
• Text: withdrawing sources of positive reinforcement
• Omission training technique
  – Operant response --> withholding of appetitive stimulus
  – No operant response --> get appetitive stimulus

A Caveat
• Domjan: picky on operant vs. instrumental (e.g., p. 151)
• Domjan: sloppy on reinforcer and punisher re: effect on behaviour
• Don’t be sloppy like Domjan!
• Be clear on outcome’s effect on increase/decrease of behaviour!

Methodologies/Procedures
• Discrete trial
  – One trial at a time
  – Reset apparatus
  – Latency, running speed, reduction in errors
• Free operant
  – Uninterrupted repeated trials
  – Less disruptive for subject
  – Response rate

Beginnings
• Edward L. Thorndike
• Undergraduate student
• Animal intelligence
• Mazes
• Puzzle boxes (video)
• Trial-and-error learning

Law of Effect
• Responses followed by appetitive outcomes increase in frequency
• Responses followed by aversive outcomes decrease in frequency
• Stimulus-Response (S-R) learning
• Association between S and R altered by experience

Mechanical Strengthening Processes
• Guthrie & Horton (1946)
• Cat in box with a pole
• Stereotypic behaviours
• Consistent in individual
• Variable across individuals

Stop-Action Principle
• “Random” response strengthened by success
  – Individual predispositions
  – E.g., response = bite pole; appetitive outcome = escape
  – “Stops the action”
• Not immediate
• Dominance of one response

Problems with Stop-Action
• Muenzinger (1928)
• Guinea pigs
• Lever press for lettuce
• Not one dominant operant behaviour

Response Classes
• Lashley (1942)
• Reinforcement strengthens class of operant responses
• End goal

B.F. Skinner
• Operant response
  – Meaningful, measurable unit of behaviour
  – Defined by effect it has on environment
• Skinner’s approach (video)
• Operant chamber (video)

Shaping
• Trial-and-error somewhat random
• Successive approximations
• Very precise operant response possible

Shaping: Reinforcers
• Conditioned reinforcer
  – Previously neutral stimulus that has acquired the capacity to strengthen responses because it has been repeatedly paired with a primary reinforcer
• Primary reinforcer
  – Stimulus that naturally strengthens any response that is paired with it

Shaping a Lever Press
• Gradual process
• Reinforce more appropriate/precise responses

Behavioural Stereotypy vs. Variability
• Always some slight variability in responses
• Degree of stereotypy
  – Specific imposed response requirements
  – Cost-benefit of different responses
• Can actually condition response variability
  – E.g., only reinforce novel responses

Page & Neuringer (1985)
• Pigeons
• 8 pecks on two keys (left and right)
• Exp. Gr.: only reinforced if response different from previous 50 responses
• Control Gr.: reinforced for any response pattern
[Figure: % novel responses for the Experimental and Control groups, first five sessions vs. last five sessions]

Mediators on Response
• Belongingness
  – Thorndike: some responses harder to condition than others
• Biological predispositions
• Breland & Breland (1961) and instinctive drift

Skinner (1948)
• Superstitious behaviour
• Accidental strengthening of response
• FT-15 sec. grain delivery
• 6 of 8 pigeons develop very characteristic, unrequired responses
• Humans
  – Rituals, personal and societal superstitions; persistent

Staddon & Simmelhag (1971)
• High speed cameras
• Interim and terminal responses
• Behavioural regularities
• Temporally structured
• Terminal: species-specific behaviours re: food anticipation
• Interim: behaviours not motivated by food

Responses
• R3: peck at floor
• R4: quarter turn
• R8: move along magazine wall
• R1: orient toward food magazine wall
• R7: peck at magazine wall
[Figure: probability of occurrence of each response across the interval (sec.), with responses marked as interim or terminal]

Behaviour Systems Theory
• Periodic food delivery activates feeding system
• Preorganized species-typical foraging and feeding responses
• Just after food: post-food focal search
• Middle of time interval: general search

Reinforcer Values
• Response magnitude, rate of learning
• Quantity and quality
  – Individual’s ability to assess magnitude differences
  – Generally positive correlation for single operant tasks; more complicated for higher schedules (back to this with choice section)
• Changes in reinforcement magnitude
  – Reinforcement history
  – Expectation
  – Positive and negative behavioural contrast

Response-Reinforcer Contingency
• “Causal relation”
• Strong contingency produces stronger responding and faster learning
• Non-contingent (random) relationship
  – Lack of responding
  – Extinction

Temporal Contiguity
• Immediate reinforcement more effective than delayed
  – Which response was reinforced? Forgetting; reinforcer devaluation
• Skinner on teaching machine (video)
• Bridge (conditioned reinforcer)
• Marking procedure

Control
• Response’s control over outcome
• Uncontrollable situation
• Aversive outcome
• Learned helplessness

Triadic Design
Group     Exposure Phase             Conditioning Phase   Result
Group E   Escapable shock            Escape-avoidance     Rapid avoidance
Group Y   Yoked inescapable shock    Escape-avoidance     Slow avoidance
Group R   Restricted to apparatus    Escape-avoidance     Rapid avoidance
• Immunization
  – Escapable shock --> inescapable shock --> escape-avoidance --> rapid avoidance

Theory
• Behaviour has no effect on situation
• Generalization
• Maier & Seligman (1976)
  – Motivational, cognitive, and/or emotional impairment
• Non-human learned helplessness
  – Model for human depression
  – Situation (specific/global), attribution (internal/external), time (short- or long-term)
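
Sketch: yoking in the triadic design
The point of the yoked group in the triadic design is that Groups E and Y receive the same shock, and differ only in whether their own response controls it. The Python sketch below illustrates that logic for the exposure phase. It is a toy simulation, not the original procedure: the function names, the 30-second cap, and the randomly drawn response latencies are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

MAX_SHOCK_S = 30.0  # hypothetical cap on shock duration (seconds)


@dataclass
class Trial:
    e_latency: float  # Group E's own escape latency
    y_latency: float  # Group Y's own response latency (has no effect)
    shock_e: float    # shock duration experienced by Group E
    shock_y: float    # shock duration experienced by Group Y
    shock_r: float    # Group R is restricted to the apparatus, no shock


def run_exposure_phase(n_trials: int, seed: int = 0) -> List[Trial]:
    """Simulate the exposure phase with made-up response latencies."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        e_latency = rng.uniform(1.0, MAX_SHOCK_S)
        y_latency = rng.uniform(1.0, MAX_SHOCK_S)
        # Yoking: Group E's response ends the shock for both E and Y,
        # so Y's shock duration ignores Y's own behaviour.
        shock = e_latency
        trials.append(Trial(e_latency, y_latency, shock, shock, 0.0))
    return trials


def total_shock(trials: List[Trial]) -> Tuple[float, float, float]:
    """Total seconds of shock for Groups E, Y, and R."""
    return (
        sum(t.shock_e for t in trials),
        sum(t.shock_y for t in trials),
        sum(t.shock_r for t in trials),
    )


if __name__ == "__main__":
    e, y, r = total_shock(run_exposure_phase(25, seed=1))
    print(f"Group E: {e:.1f} s, Group Y: {y:.1f} s, Group R: {r:.1f} s")
```

Because E and Y are matched on total shock, the slower avoidance learning in Group Y during the conditioning phase can be attributed to lack of control over the shock, not to the amount of shock itself.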