Theories of Reinforcement

Law of Effect
• The law of effect is circular:
  – What is a reinforcer? An event that increases behavior.
  – What events increase behavior? Reinforcers.
• How can we break this circularity?

Need-Reduction Theory
• Deprivation (e.g., of food, water) leads to
• Deficiencies (e.g., changes in tissue, endocrine system), which give rise to
• A need to restore the status quo (homeostasis).
• Reinforcers are stimuli that decrease some biological need state. Events that do not do so will not be reinforcers.

Problems with Need-Reduction Theory
• Sexual behavior: Male rats learn to run to a female in heat with no copulation.
• Rats press a lever for electrical brain stimulation (Olds, 1958).
• Rats will work for saccharin.
• It takes time for food to be ingested and the biological need to be reduced
  – Very long delay of reinforcement.

Drive-Reduction Theory
• Any strong stimulation (internal or external) is aversive and:
  – produces a negative drive state
  – reinforcement occurs when this drive state is reduced by some consequential event.
• A drive is an intervening variable:
  – It is correlated with biological needs.
  – Satisfying a drive state is likely to satisfy a biological need later.

Drive-Reduction Theory
• Food pellets are reinforcing
  – They satisfy a drive state.
  – Eventually they satisfy the need state as well.
• Unlike needs, drives can be satisfied instantly, so there is no delay-of-reinforcement problem.

Drive vs. Need
• Oakley and Pfaffmann (1962)
• Rats worked for saccharin (an artificial sweetener with no nutritive value, which therefore does not satisfy a need).
• Lesions were then placed in the central nervous system of the rats (thalamus) to eliminate the sense of taste, eliminating drive while the need was still present.
• Response rates for saccharin (no nutritive value) and sucrose (sugar) both decreased.
• Drive looks important.

Drive vs. Need
• Epstein and Teitelbaum (1962)
• Rats pressed levers for injection of food into a gastric tube.
• They maintained perfect control of their body weights.
• Food delivered into the stomach can therefore be a reinforcer, and can be used to train new behaviors.
• This procedure also eliminates drive, while the food still removes the need, so now need looks like it is important.

Problems with Drive-Reduction
• Theory is unfalsifiable
  – Can have a conditioned drive state for anything
• Monkeys working for visual stimuli, such as moving toy trains (Butler, 1953).
• Or to see people in the lab.
• Rats working for light (Premack & Collier, 1962).
  – But they prefer the dark…
• Exploratory drive? Curiosity drive? Motherhood drive?
• It is just too easy to think of new drives to explain everything that animals will work for...

Problems with Drive-Reduction
• Prediction of reinforcers
  – If we had an adequate theory of reinforcement, we could predict what events were reinforcers
• Especially problematic in applied work: everyone "needs" a different reinforcer.
• Can predict some things
  – Make a person hungry, and food will probably be a reinforcer
  – But which food?
• Back to the process:
  – The details may be different, but the underlying processes should be the same.

Classical Reinforcement Theory
• Assumptions:
  – Some responses are reinforcible, and are always reinforcible (instrumental responses).
  – Other responses are never reinforcible (consummatory responses).
[Diagram: responses that can be reinforced = instrumental responses; responses that cannot be reinforced = consummatory responses]

Classical Reinforcement Theory
• Assumptions:
  – Reinforcers are regarded as events, as stimuli
  – Reinforcers are trans-situational
• If a stimulus reinforces lever presses, it should also reinforce other responses.
[Diagram: stimuli that are always reinforcers vs. stimuli that are never reinforcers]

Premack Theory
• Premack suggested a way to predict a priori what events would be reinforcers.
  – Denied the assumptions of classical reinforcement theory.
• The reinforcement process:
  – A relation between responses
  – Not a relation between responses and consequential stimuli.
• No clear boundaries between behaviors and reinforcers
  – Is it food that is the reinforcer, or eating?
  – Is it the toy, or playing?
• Premack's principle of positive reinforcement:
  – If an instrumental response is followed by a contingent response that is more highly probable, the instrumental response will increase in frequency.

Premack Theory
• Stated another way:
  – If a less probable response is followed by a more probable response, the less probable response will increase in frequency.
• L: less probable response
• H: more probable response
  – Reinforcement is L → H. (Probability of L increases)
• But if H → L
  – H will decrease in frequency
  – Punishment

Premack Theory
• Measure unconstrained baseline behavior and rank all activities in terms of their probability.
• Every behavior can reinforce behaviors further down the list and punish behaviors further up.
• The Premack principle implies the following:
  – 1. All responses are potentially reinforcible, whereas classical theory said some were and some were not.
  – 2. All responses are potentially reinforcers for other, less probable, responses.
• Premack's indifference principle
  – It is irrelevant how the current behavior probabilities got to be what they are (e.g., through deprivation, learning, or whatever). All that matters is the current probabilities.

Testing Premack
• Premack (1963) used four monkeys, and designed four manipulanda
  – A door they could open (D)
  – A lever they could press (L)
  – A horizontal lever they could operate (H)
  – A plunger they could operate (P)
• Only 1 of the 4 (Chicko) showed clear response-probability differentials
  – So, only Chicko can be used to test the theory.
• Chicko's order: H>D/L>P

Results for Chicko from Premack (1963)
Item    Independent   Paired   Contingency
P(H)         78          68        243
P(D)         78          93        214
L(H)        270         326        342
P(L)         78          40        246
D(H)        382         274        467
L(D)        270         233        245
H(P)        543         584        382
D(P)        382         369        298
H(D)        543         459        424
[Slide annotations: H>D/L>P; No reinforcement]

Summarizing Premack's (1963) Results
• Chicko ordered H>D/L>P
• In contingency tests,
  – H reinforced D, L, and P
  – D and L did not reinforce H, but did reinforce P
  – P did not reinforce any other response.
• Thus, D and L were both reinforcers (for P) and not reinforcers (for H).
• This is contrary to classical reinforcement theory:
  – Classical theory holds that events are either reinforcers or not reinforcers
  – The results violate the trans-situationality assumption

Reinforcing Consummatory Responses
• Premack (1959) used children as subjects.
  – They could play pinball or eat chocolate.
• 61% of the children preferred to play pinball
• 39% preferred to eat chocolate
• Each of these groups was divided into two subgroups.
  – Eat-to-Play
  – Play-to-Eat

Premack (1959)
• For the Pinballers:
  – Eat-to-Play increased Eat significantly
  – Play-to-Eat increased Play only a very small amount
• For the Eaters:
  – Eat-to-Play produced a very small increase in Eat
  – Play-to-Eat increased Play significantly
• Supports Premack theory
• Demonstrated that eating, a consummatory response according to classical reinforcement theory, could be reinforced

Premack (1962): Assumptions
• 1. Depriving a rat of water for 24 hours increases the probability of the rat drinking when water is available.
• 2. Depriving a rat of running in an activity wheel for 24 hours increases the probability of running when the wheel is presented.
• If running is more probable than drinking, then according to his theory running should reinforce drinking.
  – Reverse of the usual reinforcer relation: run to drink

Premack (1962)
• Condition 1
  – Rats had 24-hour access to an activity wheel, but were not allowed to drink.
  – In a 1-hour test session with activity wheel and drinkometer, they spent, on average, 240 s drinking and 50 s running.
  – Drinking was more probable than running.
• Condition 2
  – Rats had 24-hour access to water, but they were not allowed to run.
  – In a 1-hour test session, they drank for 28 s on average, and ran for 329 s.
  – Running was more probable than drinking.

Premack (1962) Results
• Two contingency tests for both groups
  – Drink to run
  – Run to drink
• Under Condition 1 (drinking > running), run-to-drink reinforced running, but drink-to-run did not reinforce drinking.
• Under Condition 2 (running > drinking), drink-to-run reinforced drinking (drinking time went from 28 to 98 s), but run-to-drink did not reinforce running.
• Increase in drinking in the drink-to-run contingency
  – Reinforcement of a typical consummatory response.
• Supports Premack's indifference principle
  – It doesn't matter how the responses got to be at their current probability.

Punishment
• If a high-probability response is followed by a lower-probability response, the high-probability response should decrease in frequency.
  – i.e., H → L
• Problem: If the response has a low probability, how can we get the animal to emit it?

Weisman and Premack (unpublished)
• Used a motorized activity wheel, which makes rats run
• Condition 1: 24 hours water, no activity wheel.
  – Running reinforced drinking in contingency tests.
• Condition 2: 24 hours activity wheel, no water.
  – Test: Drinking was followed by a 5-s forced run.
  – The forced run decreased drinking substantially
• Aged rats (who had a very low probability of running) completely stopped drinking in the test sessions when drinking was followed by the forced run.
• Typical punishment effects.

Relations: Reinforcement and Punishment
• H → L → H → L → H etc.
• Premack (1969): reinforcement and punishment are two sides of the same coin, and cannot be separated.
  – Reinforce anything and you're punishing something else
• Value isn't in the thing itself; it's in the behavior that the thing controls.
  – Value of an old bottle of water in your car?
  – Equally, food has no value, except if you can indulge in eating.
• Symmetrical positive and negative laws of effect?
  – Or a single law concerned with how both response probabilities change when two responses occur in close temporal conjunction.

Premack
• Premack has had a profound effect on the theory of reinforcement
  – Newer theories (such as response-deprivation theory and behavior-regulation theory)
  – Applications in real-life situations.

Applications of Premack
• Homme, deBaca, Devine, Steinhorst and Rickert (1963)
  – 3-year-old nursery-school children: want them sitting quietly in chairs
  – Verbal instructions ineffective
• High-probability behaviors:
  – Running around the room, screaming, pushing chairs, or quietly working on jigsaw puzzles
• Low-probability behaviors:
  – Sitting in their chairs
• Make the high-probability behaviors contingent on the low-probability behaviors:
  – Contingency: Sit and attend → Bell + "Run and Scream!"

Homme, deBaca, Devine, Steinhorst and Rickert (1963)
• Later: Tokens were earned for low-probability behaviors and could be used to buy high-probability behaviors
• Control was virtually perfect after just a few days.
• "In summary, even in this preliminary, unsystematic application, the Premack hypothesis proved to be an exceptionally practical principle for controlling the behavior of nursery school Ss."

Token Economies
• Token economy programs
  – developed for the rehabilitation of long-term schizophrenic patients
• The behavior of such patients can be brought under reinforcement control (Ayllon & Azrin, 1965; Atthowe & Krasner, 1968; Winkler, 1970).
• But all report patients who failed to respond to the token regime.
  – 18% of Ayllon and Azrin's
  – 10% of Atthowe and Krasner's
• Despite the use of multiple reinforcers, reinforcer sampling and reinforcement exposure (Ayllon & Azrin, 1968) designed to increase the utilization of back-up reinforcers, these patients did not work for the available rewards.

Mitchell & Stoffelmayr (1973)
• "Application of the Premack Principle to the behavioral control of extremely inactive schizophrenics"
• The ward with the most severely ill chronic patients was chosen. The ongoing treatment included industrial therapy, ward domestic work, and weekly group discussions.
• Selected the 4 most inactive patients.
  – Identified items that could be used as reinforcers: candies, cigarettes, fruit, and cookies.
  – For two patients, cigarettes and fruit were used to maintain working
• Found no dispensable reinforcers for WL and PM
  – Used the Premack principle directly
[Figure: number of 30-s intervals in which an instance of work occurred in 30 minutes. WL and PM were used in tests of Premack theory.]

Mitchell & Stoffelmayr (1973)
• Instructions + reinforcement sessions.
  – The response that occurred freely with very high frequency was sitting, which was used as the reinforcer for both patients.
• Shaping sessions: the experimenter approached the patient and asked him to stand.
  – If the patient remained seated, the experimenter would tip the patient's chair forward until the patient stood up.
  – The patient was then given a coil, and if he removed some wire he got to sit down.
  – This was repeated after 90 seconds, gradually increasing the response requirement
  – By Session 14, WL was removing three coil wrappings before reinforcement was given, while PM achieved this by Session 17.
  – In the subsequent sessions, the patients were allowed to remain seated while working. Reinforcement became resting.

Inactive catatonic schizophrenic patients
• "The present results suggest that even the most severely inactive patient will respond to a reinforcement regime.
The strict application of Premack's principle then may have considerable therapeutic application for those patients, who in refusing to accept any tangible reward, do not respond to the token regime."

Changing Relations: Premack (1969)
• The following responses were offered to rats:
  – Drink 4% sucrose
  – Drink 32% sucrose
  – Run
• Responses were tracked over a session
• Tests: Drinking 32% sucrose reinforced running at the start of the session, but at the end of the session, the opportunity to run reinforced drinking 32% sucrose

Problems with Premack
• Response probabilities can fluctuate within a session, and are therefore difficult to measure.
  – In an applied setting, we might offer the clients a choice:
  – "What do you want to do after this activity?"
• Research has found that a lower-probability response can under certain circumstances reinforce a higher-probability response.

Response-Deprivation Hypothesis
• According to the probability-differential view (the original version of Premack theory), a lower-probability response can never reinforce a higher-probability response.
• A few studies, however, have shown that this can happen if the animal is prevented from emitting the lower-probability response at its baseline level. Any response, therefore, can be reinforcing (Timberlake & Allison, 1974).
• And you don't get the reinforcement effect if the contingent response can be emitted at the baseline level.

Summary
• Need-reduction and drive-reduction theories could not provide satisfactory explanations of reinforcement.
• Premack has changed how reinforcement is regarded: Instead of a stimulus/event, it is regarded as a response.
• The response-deprivation hypothesis is one example of how Premack's principle has been refined.
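
Worked sketch: probability-differential rule vs. response deprivation
The two prediction rules covered in these slides can be compared with a small Python sketch. It is only an illustration under stated assumptions, not a model taken from any of the cited papers. The baselines are the test-session times from Premack (1962); the function names, the schedule values (300 s, 5 s, 30 s, 10 s), and the "no change predicted" outcome for equal baselines are hypothetical. The response-deprivation check uses the condition I/C > Oi/Oc usually attributed to Timberlake and Allison (1974), where I and C are the scheduled instrumental and contingent amounts and Oi and Oc are their unconstrained baselines.

```python
def premack_prediction(baseline, instrumental, contingent):
    """Probability-differential rule: L -> H reinforces L, H -> L punishes H.

    baseline maps each response name to its unconstrained baseline level
    (here, seconds spent on the response in a free-access test session).
    """
    p_instr = baseline[instrumental]
    p_cont = baseline[contingent]
    if p_cont > p_instr:
        return "reinforcement"
    if p_cont < p_instr:
        return "punishment"
    # Assumption for this sketch: equal baselines give no predicted change.
    return "no change predicted"


def response_deprived(i_required, c_allowed, o_instr, o_cont):
    """Response-deprivation condition: I/C > Oi/Oc.

    True means that performing the instrumental response at its baseline
    level would leave the contingent response below its own baseline.
    """
    return i_required / c_allowed > o_instr / o_cont


# Baselines (seconds per 1-hour test) reported for Premack (1962).
condition_1 = {"drink": 240, "run": 50}   # water-deprived rats
condition_2 = {"drink": 28, "run": 329}   # running-deprived rats

# Drink-to-run under Condition 2: running is more probable than drinking.
print(premack_prediction(condition_2, "drink", "run"))
# Drink-to-run under Condition 1: drinking is more probable than running.
print(premack_prediction(condition_1, "drink", "run"))

# Hypothetical run-to-drink schedules under Condition 2 (values invented).
# 300 s of running per 5 s of drinking restricts drinking below baseline.
print(response_deprived(300, 5, condition_2["run"], condition_2["drink"]))
# 30 s of running per 10 s of drinking does not.
print(response_deprived(30, 10, condition_2["run"], condition_2["drink"]))
```

The first schedule shows how a lower-probability response (drinking, in Condition 2) could be predicted to reinforce a higher-probability one (running) once the schedule holds it below baseline, which the probability-differential rule alone would not predict.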