Learning
Chapter 5

Definition of Learning
Learning is any relatively permanent change in behavior brought on by experience or practice.
“Relatively permanent” refers to the fact that when people learn anything, some part of their brain is physically changed to record what they’ve learned.
This is actually a process of memory; without the ability to remember, people can’t learn anything.
“Experience or practice” refers to the tendency for behavior to differ based on the experience of specific events.
If a behavior results in a positive experience, it is likely to occur again.
If a behavior results in a negative experience, it is not likely to occur again.

Classical Conditioning
Ivan Pavlov – studied the digestive system of dogs
Reflex – an involuntary response that is not under personal control or choice
Ex. Dogs salivate when they receive food.
Stimulus – any object, event, or experience that causes a response (the reaction of an organism)
Ex. Food given to dogs causes the reflexive response of salivation.
Pavlov noticed that his dogs were salivating when they weren’t supposed to.
Some would start salivating when they saw the lab assistant bringing their food, some when they heard the clatter of the food bowl from the kitchen, and some when it was the time of day when they were usually fed.
Thus, Pavlov switched his focus to study these responses and eventually termed the phenomenon classical conditioning.
Classical conditioning – learning to make an involuntary (reflex) response to a stimulus other than the original, natural stimulus that normally produces the reflex

Elements of Classical Conditioning
Unconditioned stimulus (UCS) – the original, naturally occurring stimulus that leads to an involuntary (reflex) response
Unconditioned because it is unlearned
In Pavlov’s research, food is the UCS.
Unconditioned response (UCR) – an involuntary (reflex) response to a naturally occurring stimulus or UCS
Also unlearned; occurs because of genetic “wiring” in the nervous system
In Pavlov’s research,
salivation is the UCR.
Conditioned stimulus (CS) – a stimulus that becomes able to produce a learned reflex response by being paired with the original UCS
Almost any stimulus can become associated with a UCS if it is paired with the UCS often enough.
Before a stimulus is associated with a UCS, it is called a neutral stimulus (NS) – a stimulus that has no effect on the desired response.
After being paired with the UCS enough times to produce the reflexive response alone, the NS becomes the CS.
In Pavlov’s research, the lab assistant bringing food, a bell, or the clatter of the food bowl is the CS.
Conditioned response (CR) – a learned reflex response to a CS
It is usually not as strong as the original UCR, but it is essentially the same response.
In Pavlov’s research, salivation in response to the lab assistant or the clatter of the food bowl is the CR.

Pavlov’s Famous Experiments
Pavlov paired the ticking of a metronome with the presentation of food.
Because the metronome’s ticking didn’t normally produce salivation, it was the NS before any conditioning took place.
The CR and UCR are both salivation.
They differ because they are responses to different things: the UCR occurs after a UCS, and the CR occurs after a CS.
Food (UCS) produces salivation (UCR).
Food (UCS) is repeatedly paired with the sound of the metronome (NS).
After pairings: the sound of the metronome (CS) produces salivation (CR).

Pavlov’s Basic Principles of Classical Conditioning
The CS must come before the UCS.
If the sound of the metronome came just after the dogs received food, they did not become conditioned.
The CS and UCS must come very close together in time – no more than 5 seconds.
When the time between the potential CS and the UCS was extended to several minutes, no association between the two was made.
Too much could happen in the longer interval of time to interfere with conditioning.
Recent studies have found that the interstimulus interval (ISI), the time between the CS and UCS, can vary depending on the nature of the conditioning task and even the organism being
conditioned.
Shorter ISIs (less than 500 milliseconds) have been found to be ideal for conditioning.
The NS must be paired with the UCS several times.
Often many pairings are necessary.
The CS is usually some stimulus that is distinctive, or stands out, from other competing stimuli.
The metronome was a sound that was not normally present in the laboratory and, therefore, was distinct.

Stimulus Generalization and Discrimination
Stimulus generalization – the tendency to respond to a stimulus that is similar to the original conditioned stimulus with the conditioned response
Pavlov noticed that sounds similar to the metronome would produce a similar CR.
The response to a similar stimulus is not as strong as the response to the original CS.
Stimulus discrimination – the tendency to stop making a generalized response to a stimulus that is similar to the original CS because the similar stimulus is never paired with the UCS
Pavlov never paired the sounds similar to the metronome with food.
Because only the real CS (metronome) was followed by food (UCS), the dogs learned to tell the difference, or discriminate, between the “fake” sounds and the actual CS.
This occurs when an organism learns to respond to different stimuli in different ways.

Extinction and Spontaneous Recovery
Extinction – the disappearance or weakening of a learned response following the removal or absence of the UCS
When the CS (metronome’s ticking) was repeatedly presented without the UCS (food), the CR (salivation) “died out” or stopped occurring.
In theory, this occurs because new learning has taken place.
During extinction, the CS-UCS association that was learned weakens, as the CS no longer predicts the UCS.
Pavlov’s dogs learned not to salivate to the metronome because it no longer predicted food.
Spontaneous recovery – the reappearance of a learned response after extinction has occurred
After extinction, Pavlov waited a few weeks before letting the dogs hear the metronome again; when he brought it back, they began to
salivate again.
This brief recovery of the CR shows that the CR is still retained even after extinction (remember that learning is relatively permanent), so something that is learned is really “still in there” even after extinction.
It is just suppressed or inhibited by the lack of an association with the UCS.
As time passes, this inhibition weakens, especially if the original CS has not been present for a while.

Higher-Order Conditioning
Occurs when a strong CS is paired with another NS, causing the NS to become a second CS
Once Pavlov’s dogs were strongly conditioned to salivate (CR) when they heard the metronome (CS), if another sound, like a snap (NS), occurred just before the metronome (CS) enough times, the snap (NS) would become a CS and produce salivation (CR) by itself.
Food (UCS) would have to be presented every now and then to maintain the original CR of salivation to the metronome (CS).
Without the UCS, the higher-order conditioning would be difficult to maintain and would gradually fade away.

Conditioned Emotional Responses
Conditioned emotional response – an emotional response that has become classically conditioned to occur to learned stimuli
John Watson’s classic “Little Albert” experiment demonstrated the classical conditioning of a phobia (an irrational fear response).
Presentation of a white rat was paired with a loud, scary noise until Albert feared the white rat.
Before conditioning: white rat = NS
During conditioning: white rat (NS) paired with loud noise (UCS) to produce fear (UCR)
After conditioning: white rat (CS) produces fear (CR)
In advertising, commercials often use things that are known to produce an emotional response in hopes that the emotional response will become associated with their product (ex. attractive women or cuddly puppies).

Vicarious Conditioning
Vicarious conditioning – classical conditioning of a reflex response or emotion by watching the reaction of another person
Ex.
Children used to receive vaccination shots in school.
The nurse would line children up, and one by one they would receive the shot.
When some children received their shots, they cried a lot.
By the time the nurse got to the end of the line of children, they were all crying, many of them before the needle even touched their skin.
The children had learned their fear response to the shot from watching the reactions of the children who went before them.

Other Conditioned Responses
Conditioned taste aversion – development of a nausea or aversive response to a particular taste because that taste was followed by a nausea reaction, occurring after only one association
Ex. The chemotherapy drugs that cancer patients receive can create severe nausea, which usually causes them to develop a taste aversion to anything they eat up to 6 hours before the treatment.
Biological preparedness – the tendency of animals to learn certain associations, such as taste and nausea, with only one or a few pairings due to the survival value of the learning
Ex. If an animal eats something that makes it sick, it is likely to avoid that food in the future, which increases its chances of survival and of passing on its genes to future generations.
These 2 types of conditioning violate 2 of Pavlov’s basic principles:
Principle: the CS and UCS must be paired close together in time.
Taste aversion can develop even if the food was eaten a considerable time before nausea occurs.
Principle: it should take multiple pairings of the CS and UCS to achieve conditioning.
Because of biological preparedness, taste aversion can occur with only one or a few pairings of the stimulus food with the nausea response.

Why Does Classical Conditioning Work?
There are 2 ways to explain how one stimulus can come to “stand for” another.
Stimulus substitution – Pavlov’s original theory
Suggested that the CS, through its association close in time with the UCS, came to activate the same place in the brain that was originally activated by the UCS
But if a mere association in time is all that is needed, why would conditioning not work when the CS is presented immediately after the UCS?
Cognitive perspective – modern explanation
Suggests that the CS provides information or an expectancy about the coming of the UCS
The CS has to provide some kind of information about the coming of the UCS in order to achieve conditioning.
If the CS comes after the UCS, it can’t provide any information about when the UCS is coming.
Ex. If rats experience an electric shock (UCS) while a specific tone (NS) is played, they will expect a shock (UCS) to occur during the tone (CS) and become anxious (CR) when they hear the tone.
But if the shock (UCS) comes immediately after the tone stops (NS), they will act normally while hearing the tone and become anxious (CR) when it stops (CS), because they expect that a shock will not occur during the tone.

Operant Conditioning
Classical conditioning is the kind of learning that occurs with reflexive, involuntary behavior.
Operant conditioning is the kind of learning that applies to voluntary behavior.
Operant conditioning – the learning of voluntary behavior through the effects of pleasant and unpleasant consequences to responses

Thorndike’s Puzzle Box: How to Frustrate a Cat
Thorndike would place a hungry cat inside a “puzzle box” from which the only escape was to press a lever on the floor of the box.
A bowl of food was placed outside the box, so the hungry cat would be highly motivated to get out.
The cat would move around the box, pushing and rubbing against the walls trying to escape, and would eventually push the lever by accident and open the door.
The lever is the stimulus, the pushing of the lever is the response, and the
consequence is both escape from the box and food.
After a number of trials, the cat took less and less time to push the lever.
It is important not to assume the cat had “figured out” the connection between the lever and freedom: Thorndike kept moving the lever to a different position, and the cat had to learn the whole process over again.
The cat would simply push and rub around the same area that had worked the last time, and each time it found the lever a little more quickly.

Thorndike’s Law of Effect
Based on his “puzzle box” research, Thorndike developed the law of effect.
If an action is followed by a pleasurable consequence, it will tend to be repeated, and if followed by an unpleasant consequence, it will tend not to be repeated.
This is the basic principle behind learning voluntary behavior.
In the case of the “puzzle box,” pushing the lever was followed by a pleasurable consequence (freedom and food), so pushing the lever became a repeated response.

B.F. Skinner: The Next Behaviorist
Skinner took leadership of behaviorism after Watson.
He combined the work of Pavlov and Thorndike into a way to explain that all behavior is the product of learning.
It was Skinner who gave the learning of voluntary behavior the name operant conditioning.
Voluntary behavior is what people and animals do to operate in the world.
Important distinction between operant and classical conditioning:
In classical conditioning, learning a reflex depends on what comes BEFORE the response: the UCS and what will become the CS.
In operant conditioning, learning depends on what happens AFTER the response: the consequence.

The Concept of Reinforcement
Reinforcement – any event or stimulus that, when following a response, increases the probability that the response will occur again
Typically, reinforcement is pleasurable.
But reinforcement can also be negative, like avoiding something unpleasant.
Ex.
When a behavior causes the removal of pain
Skinner’s research involved something called a “Skinner box” or “operant conditioning chamber.”
It often involved placing a rat into one of the chambers and training it to push down on a bar to get food.

Primary and Secondary Reinforcers
Reinforcers are not all alike; there are 2 types.
Primary – any reinforcer that is naturally reinforcing by meeting a basic biological need, such as hunger or touch
Infants, toddlers, preschool-age children, and animals can be easily reinforced with primary reinforcers.
Ex. You can reinforce a toddler’s behavior with candy.
Secondary – any reinforcer that becomes reinforcing after being paired with a primary reinforcer, such as praise, money, or gold stars
Ex. Money can be a reinforcer because it is associated with the ability to obtain (purchase) things that meet basic needs, such as food and shelter.

Positive and Negative Reinforcement
Reinforcers can also differ in the way they are used.
Positive reinforcement – the reinforcement of a response by the addition or experiencing of a pleasurable stimulus
Ex. Every time a rat presses a bar, it receives food. The rat’s pressing of the bar is positively reinforced by the pleasurable reward of food.
Negative reinforcement – the reinforcement of a response by the removal, escape from, or avoidance of an unpleasant stimulus
Ex. If a rat presses a bar during a mild electric shock, the shock stops. The rat’s pressing of the bar is negatively reinforced by the removal of the painful shock stimulus.

Schedules of Reinforcement
The timing of reinforcement can make a tremendous difference in the speed at which learning occurs and the strength of the learned response.
Consider this scenario: Heather’s mother gives her a quarter every night she remembers to put her dirty clothes in the hamper. Sean’s mother gives him a dollar at the end of the week, but only if he has put his clothes in the hamper every night that week.
Which child will learn to put their clothes in the hamper more quickly?
After both Heather and Sean have been conditioned to put their dirty clothes in the hamper, if both mothers stop giving money, which child is more likely to continue putting their dirty clothes in the hamper the longest?

The Partial Reinforcement Effect
Continuous reinforcement – the reinforcement of each and every correct response
Responses that are reinforced each time they occur are more easily and quickly learned.
Ex. Because Heather was reinforced every night with a quarter, she will learn the association faster than Sean.
Partial reinforcement effect – the tendency for a response that is reinforced after some, but not all, correct responses to be very resistant to extinction
Ex. Sean expected to get a reinforcer only after 7 correct responses. When his reinforcers stop, he might continue to put his dirty clothes in the hamper for several more days or even another week or so, hoping that the reinforcer will eventually come anyway.
Heather will probably stop putting her dirty clothes in the hamper more quickly than Sean because she expects to be reinforced after every correct response.
Partial reinforcement can be accomplished according to different patterns or schedules.
It might be a certain interval of time that’s important.
When the timing of the response is more important, it is called an interval schedule.
Ex. If an office safe can only be opened at a specific time of day, it wouldn’t matter how many times a person tried to open it because it would only work at a specific time.
Ex. A rat can get only 1 food pellet every 2 hours for pressing a bar, regardless of how many times the bar is pressed.
Or it might be the number of responses required that’s important.
When the number of responses is more important, the schedule is called a ratio schedule, because a certain number of responses is required for each reinforcer.
Ex.
A person has to sell a certain number of raffle tickets in order to get a prize.
Ex. A rat must press a bar 10 times to get a food pellet, regardless of how long it takes.
Another way schedules of reinforcement can differ is in whether the number of responses or interval of time is fixed (the same every time) or variable (a different number or interval is required in each case).
So it’s possible to have a fixed interval schedule, a variable interval schedule, a fixed ratio schedule, and a variable ratio schedule.

Fixed Interval Schedule of Reinforcement
Fixed interval schedule – schedule of reinforcement in which the interval of time that must pass before reinforcement becomes possible is always the same
Ex. Receiving a paycheck at the end of each week
If you were teaching a rat to press a lever to get food pellets, you might require it to push the lever at least once within a 2-minute time span to get a pellet.
It wouldn’t matter how many times the rat pushed the lever; the rat would only get a pellet at the end of the 2-minute interval if it had pressed the lever at least once.
A fixed interval schedule of reinforcement does not produce a fast rate of responding.
Since it only matters that at least one response is made during the specific interval of time, speed is not that important.
Eventually the rat will start pushing the lever only as the interval of time nears its end, which is what causes the “scalloping” effect seen in the graph.
The response rate goes up just before the reinforcer and then drops off immediately after, until it is almost time for the next reinforcer.
Ex.
This is similar to the way factory workers speed up production just before payday and slow down just after payday.

Variable Interval Schedule of Reinforcement
Variable interval schedule of reinforcement – the interval of time that must pass before reinforcement becomes possible is different for each trial or event
A rat might receive a food pellet when it pushes a lever every 5 minutes on average, but sometimes the interval might be 2 minutes, sometimes 10.
But the rat must push the lever at least once after that 2- or 10-minute interval to get the pellet.
Because the rat can’t predict how long the interval is going to be, it pushes the lever more or less continuously, producing the smooth line on the graph.
Ex. Dialing a busy phone number: because you don’t know when the call will go through, you keep dialing and dialing.
Ex. Pop quizzes are unpredictable. Students don’t know exactly what day they might be given a quiz, so the best strategy is to study a little every night just in case and show up to class…

Fixed Ratio Schedule of Reinforcement
Fixed ratio schedule of reinforcement – the number of responses required for reinforcement is always the same
Notice 2 things about the graph:
The rate of responding is very fast, especially compared to the fixed interval schedule.
The rapid response rate occurs because the rat wants to get to the next reinforcer as fast as possible, and the number of lever pushes counts.
There are little “breaks” in the response pattern immediately after a reinforcer is given.
The pauses or breaks come right after a reinforcer because the rat knows “about how many” lever pushes will be needed to get to the next reinforcer, since it’s always the same.
Fixed schedules, both interval and ratio, are predictable, which allows rest breaks.
Ex.
Some sandwich shops give out punch cards to their customers that get punched every time they buy a sandwich; when the card has 10 punches, the customer might get a free sandwich.

Variable Ratio Schedule of Reinforcement
Variable ratio schedule of reinforcement – the number of responses required for reinforcement is different for each trial or event
The rat might be expected to push the lever an average of 20 times to get reinforcement. That means that sometimes the rat would push the lever 10 times before a reinforcer comes, but on other trials it might take 30 presses or more.
In the graph, the line shows just as rapid a response rate as the fixed ratio schedule because the number of responses still matters.
But the graph is much smoother because the rat takes no rest breaks, since it doesn’t know how many times it may have to push the lever to get the next food pellet.
Unpredictability makes the variable schedule responses more or less continuous.
Ex. People who put money into a slot machine continuously do so because they don’t know how many times they will have to do this until the jackpot comes. They keep going because “the next one” might hit the jackpot. The same is true with lottery tickets and pretty much any sort of gambling.

Comparison of Reinforcement Schedules

Additional Factors to Effective Reinforcement
Regardless of the schedule of reinforcement, 2 additional factors contribute to making reinforcement of a behavior as effective as possible.
Timing
In general, a reinforcer should be given as soon as possible after the desired behavior.
Delaying reinforcement tends not to work well, especially when dealing with animals and small children.
Reinforce only the desired behavior
This should be obvious, but everyone makes mistakes sometimes.
Ex.
Many parents make the mistake of giving a child who has not done some chore the promised treat anyway, which completely undermines the child’s learning of that chore or task.
Also, who hasn’t given a treat to a pet that has not really done the trick?

Examples: which kind of reinforcement is going on?
Andy’s father nags him to wash his car. Andy hates being nagged, so he washes the car so his father will stop nagging.
Negative reinforcement: washing his car removes the unpleasant stimulus of his father nagging.
Bradley learns that talking in a funny voice gets him lots of attention from his classmates, so now he talks that way often.
Positive reinforcement: Bradley’s increased use of the voice is reinforced by the attention he gets.
Tina is a server at a restaurant and always tries to smile and be pleasant because that seems to lead to bigger tips.
Positive reinforcement: Tina’s smiling and pleasantness are reinforced by better tips.
Will turns his report in to his teacher on the day it is due because papers get marked down a letter grade for every day they are late.
Negative reinforcement: turning in a paper on time avoids the unpleasant stimulus of being marked down a grade.

The Role of Punishment in Operant Conditioning
Think back to positive and negative reinforcement.
These strategies are important for increasing the likelihood that the targeted behavior will occur again.
But what about a behavior we do not want to occur again?
Punishment…

How Does Punishment Differ From Reinforcement?
People experience 2 kinds of things as consequences in the world:
Things they like (ex. food, money, candy, sex, praise, etc.)
Things they don’t like (ex. spankings, being yelled at, experiencing any kind of pain, etc.)
Additionally, people experience these two kinds of consequences in 1 of 2 ways:
Directly (ex. getting money for working or getting yelled at for misbehaving)
Or they don’t experience them at all (ex.
losing an allowance for misbehaving or avoiding a scolding by lying about misbehavior)

4 Ways to Modify Behavior
(rows: what is done; columns: effect on behavior)
 | Reinforcement | Punishment
Positive (Adding) | Something valued or desirable: Positive Reinforcement (ex. getting a gold star for good behavior) | Something unpleasant: Punishment by Application (ex. getting a spanking for disobeying)
Negative (Removing/Avoiding) | Something unpleasant: Negative Reinforcement (ex. avoiding a ticket by stopping at a red light) | Something valued or desirable: Punishment by Removal (ex. losing a privilege such as going out with friends)

2 Kinds of Punishment
Punishment – any event or object that, when following a response, makes that response less likely to happen again
Punishment by application – the punishment of a response by the addition or experiencing of an unpleasant stimulus
This is the kind of punishment people usually think of.
Ex. Spanking
Punishment by removal – the punishment of a response by the removal of a pleasurable stimulus
This is the kind of punishment people normally confuse with negative reinforcement.
Ex. “Grounding” a teenager is removing the freedom to do what the teenager wants to do.

Negative Reinforcement VS.
Punishment by Removal
Example of Negative Reinforcement | Example of Punishment by Removal
Stopping at a red light to avoid getting in an accident | Losing the privilege of driving because you got into too many accidents
Mailing an income tax return by April 15 to avoid paying a penalty | Having to lose some of your money to pay the penalty for late tax filing
Obeying a parent before the parent reaches the count of 3 to avoid getting a scolding | Being “grounded” (losing your freedom) because of disobedience
Negative reinforcement occurs when a response is followed by the removal of an unpleasant stimulus.
If something unpleasant has just gone away as a consequence of that response, the response will tend to happen again.
If the response increases, the consequence has to be some kind of reinforcement.
Punishment by removal occurs when a response is followed by the removal of a pleasant stimulus.
If something pleasant is taken away as a consequence of a response, the response probably will not happen again.
If the response decreases, the consequence has to be some type of punishment.
In both, something is removed, but the difference between them is what is taken away and the effect it has on behavior.

Problems With Punishment
Although punishment can be effective in reducing or weakening a behavior, it has several drawbacks.
Punishment is used to weaken a response, and getting rid of a response that is already well established isn’t easy.
In reinforcement, all that has to be done is strengthen an already existing response.
Punishment usually serves to temporarily suppress or inhibit a behavior until enough time has passed.
Ex.
Punishing a child’s bad behavior doesn’t always eliminate the behavior completely.
As time goes on, the punishment is forgotten, and the “bad” behavior may occur again in a kind of spontaneous recovery of the old (probably pleasurable) behavior.
Punishment by application can be pretty severe, and severe punishment does one thing well: it stops the behavior immediately.
It may not stop it permanently, but it does stop it.
In a situation in which a child might be doing something dangerous or self-injurious, this kind of punishment is sometimes more acceptable.
Ex. If a child starts to run into a busy street, the parent might scream at the child to stop and then administer several rather severe swats to the child’s rear.
If this is not usual behavior for the parent, the child will most likely never run into the street again.
Other than situations of immediately stopping dangerous behavior, severe punishment has too many drawbacks to be really useful (it can also lead to abuse).
Severe punishment may cause a child (or animal) to avoid the punisher instead of the behavior being punished, so the child (or animal) learns the wrong response.
Severe punishment may encourage lying to avoid the punishment (a kind of negative reinforcement), which again is not the response that is desired.
Severe punishment creates fear and anxiety, emotional responses that do not promote learning. If the point is to teach something, this kind of consequence isn’t going to help.
Hitting provides a successful model for aggression.
Punishment as a model for aggression
The adult is using aggression to get what he or she wants from the child.
Children sometimes become more likely to use aggression to get what they want when they receive this kind of punishment.
And the adult has lost an opportunity to model a more appropriate way to deal with parent-child disagreements.
Since aggressive punishment does tend to stop the undesirable behavior, at least for a little while, the
parent actually experiences a kind of negative reinforcement.
When they spank, the unpleasant behavior goes away.
This may increase the tendency to use aggressive punishment over other forms of discipline and can lead to child abuse.
Some children are so desperate for their parents’ attention that they will misbehave on purpose.
The punishment is a form of attention, and these children will take whatever attention they can get, even if it is negative.
Punishment by removal is less objectionable and is the only kind of punishment that is permitted in many public schools.
But this kind of punishment also has drawbacks.
It teaches the child what not to do but not what the child should do.
Both punishment by removal and punishment by application are usually only temporary in their effect on behavior.
As time passes, the behavior will most likely return as the memory of the punishment gets weaker, allowing spontaneous recovery of the negative behavior.

How to Make Punishment More Effective
Punishment should immediately follow the behavior it is meant to punish.
If the punishment comes long after the behavior, it will not be associated with that behavior (also true for reinforcement).
Punishment should be consistent.
If the parent says that a certain punishment will follow a certain behavior, the parent must make sure to follow through and do what he or she promised.
Punishment for a particular behavior should stay at the same intensity or increase slightly but never decrease.
Ex.
If a child is scolded for jumping on the bed the first time, the second time the behavior happens the child should be punished by scolding or by a stronger penalty, like removal of a favorite toy.
But if the first misbehavior is punished by spanking and the second only by a scolding, the child learns to “gamble” with the possible punishment.
Punishment of the wrong behavior should be paired with reinforcement of the right behavior.
Pairing punishment with reinforcement allows parents and others to use a much milder punishment and still be effective.
It also teaches the desired behavior rather than just suppressing the undesired one.
Ex. If a 2-year-old is eating with her fingers, the parent should pull her hand gently out of her plate and say something like “No, we don’t eat with our fingers, we eat with our fork,” then place the fork in the child’s hand and praise her for using it: “See, you are doing such a good job with your fork, I’m so proud of you!”

Stimulus Control
Discriminative stimulus – any stimulus that provides the organism with a cue for making a certain response in order to obtain reinforcement
Specific cues lead to specific responses, and discriminating between cues leads to success.
Ex. A police car is a discriminative stimulus for slowing down and a red stoplight is a cue for stopping because both of these actions are usually followed by negative reinforcement: people don’t want to get a ticket or get hit by another car.
Ex. A doorknob is a cue for where to grab a door to open it.
If a door has a knob, people always turn it, but if it has a handle, people usually pull it.
The 2 kinds of opening devices each cause a different response from people, and their reward is opening the door.

Other Concepts in Operant Conditioning
Shaping – the reinforcement of simple steps in behavior that lead to a desired, more complex behavior
Ex.
If you wanted to train your dog to jump through a hoop, you would have to start with some behavior that the dog is already capable of doing on its own Then gradually mold that starting behavior into a jump (something the dog is capable of doing but not likely to do on its own) You would start with the hoop on the ground in front of the dog and then call the dog through the hoop, using a treat as bait After the dog steps through the hoop, you give the dog a treat (positive reinforcement) The next time, you could raise the hoop a little, reward the dog for walking through it again, then raise the hoop again, reward again, and so on The goal is achieved by reinforcing each successive approximation Successive approximations – small steps in behavior, one after the other, that lead to a particular goal behavior Extinction in operant conditioning involves the removal of the reinforcement (in classical conditioning, extinction involves the removal of the UCS) Ex. If a child is throwing a tantrum to get a candy bar and the parent does not cave in, withholding the reinforcement (the candy bar) and, if possible, parental attention, the tantrum will eventually stop Operantly conditioned responses can also be generalized to stimuli that are similar to the original stimulus (just like in classical conditioning) Ex. When a baby is first learning to label objects and people, he may say “Dada” when his father is present, and the father reinforces the behavior with praise and attention Sometimes the baby will call all men “Dada,” but over time, as other men fail to reinforce this response, he’ll learn to discriminate between them and his father and only call his father “Dada” In this way, the man who is actually his father becomes a discriminative stimulus Spontaneous recovery also occurs in operant conditioning (just like in classical conditioning) Ex. 
In the example of teaching the dog to jump through the hoop, if the dog has already learned other tricks, like rolling over or shaking paws, when learning a new trick the dog may try to get a reinforcer by performing its old tricks, before finally walking through the hoop Using Operant Conditioning: Behavior Modification Behavior modification – the use of operant conditioning techniques to bring about desired changes in behavior Used for many years to change undesirable behavior and create desirable responses in animals and humans, particularly in school children A teacher who wants to use behavior modification to help a child learn to be more attentive during lectures could proceed as follows: Select a target behavior, such as making eye contact with the teacher Choose a reinforcer, such as a gold star applied to the child’s chart on the wall Every time the child makes eye contact, the teacher gives the child a gold star Inappropriate behavior, such as looking out the window, is not reinforced with gold stars At the end of the day, the teacher gives the child a special treat or reward for having a certain number of gold stars (reward is decided ahead of time and discussed with the child) The gold stars in the example above can be considered tokens, secondary reinforcers that can be traded in for other kinds of reinforcers Token economy – type of behavior modification in which desired behavior is rewarded with tokens Commonly used in programs like Alcoholics Anonymous Another tool behaviorists use to modify behavior is called time-out Time-out – form of mild punishment by removal in which a misbehaving animal, child, or adult is placed in a special area away from the attention of others Essentially, the organism is being “removed” from any possibility of positive reinforcement in the form of attention Applied behavior analysis (ABA) – modern term for a 
form of behavior modification that uses both analysis of current behavior and behavioral techniques to address a socially relevant issue Ex. ABA has been used as a technique involving shaping to teach social skills to individuals with autism Small pieces of candy are used as reinforcers to teach social skills and language to children with autism In ABA, skills are broken down into their simplest steps and then taught to the child through a system of reinforcement Prompts (such as moving a child’s face to look at a teacher or a task) are given as needed when the child is learning a skill or refuses to cooperate As the child begins to master a skill and receives reinforcement in the form of treats or praise, the prompts are gradually taken away until the child can do the skill independently Techniques for modifying responses have been developed so that even biological responses normally considered involuntary, such as blood pressure, muscle tension, and hyperactivity, can be brought under conscious control Biofeedback – using feedback about biological conditions to bring involuntary responses, such as blood pressure and relaxation, under voluntary control A relatively newer biofeedback technique, called neurofeedback, involves trying to change brain-wave activity Involves amplifiers connected to a computer that records and analyzes the physiological activity of the brain Neurofeedback can be integrated with video-game-like programs that individuals can use to learn how to produce brain waves or specific types of brain activity associated with specific cognitive or behavioral states (ex. 
increased attention, staying focused, relaxed awareness) Cognitive Learning Theory Behaviorists believed that only observable, measurable behavior should be studied But other psychologists had an interest in cognition, the mental events that take place inside a person’s mind while behaving These individuals began to dominate the field of experimental psychology Behaviorists could no longer ignore the thoughts, feelings, and expectations that clearly existed in the mind and that seemed to influence observable behavior They eventually began to develop a cognitive learning theory to supplement the more traditional theories of learning (conditioning) There are 3 important people who are often cited as key theorists in the early days of the development of cognitive learning theory: Gestalt psychologists Edward Tolman and Wolfgang Kohler, and modern psychologist Martin Seligman Latent Learning: Tolman’s Maze-Running Rats Tolman’s best-known experiments in learning involved teaching 3 groups of rats the same maze, one at a time 1st group – each rat was placed in the maze and reinforced with food for making its way out the other side The rat was then placed back in the maze, reinforced when it completed the maze again, and so on until the rat could successfully solve the maze without making any errors (like wrong turns) 2nd group – rats were treated exactly like the first group except they didn’t get any reinforcement when they exited the maze; they were simply put back over and over again for the 1st 9 days On the 10th day, the rats began to receive reinforcement for getting out of the maze 3rd group – served as a control group and were not reinforced over the entire course of the experiment A behaviorist would predict that only the 1st group of rats would learn the maze, because learning depends on reinforcement At first, the 1st group of rats solved the maze after a certain number of trials, whereas the 2nd and 3rd groups 
seemed to wander aimlessly around until accidentally finding their way out On the 10th day, the first time the 2nd group was reinforced, they solved the maze almost immediately Rats in the 2nd group, while wandering around in the first 9 days, had learned how to navigate the maze successfully and had stored this knowledge as a kind of “mental map,” or cognitive map of the layout of the maze The rats in the 2nd group had not demonstrated their learning of the maze in the first 9 days because they had no reason to The cognitive map had remained hidden, or latent, until the rats had a reason to use it: getting reinforced with food for completing the maze Tolman called this latent learning – learning that remains hidden until its application becomes useful The idea that learning could happen without reinforcement, and then later affect behavior, was something traditional operant conditioning could not explain Insight Learning: Kohler’s Smart Chimp Kohler was a Gestalt psychologist who became marooned on an island off the coast of North Africa when WWI broke out At the time, he was working at a primate research lab on the island and began to study animal learning In one famous study, Kohler set up a problem for a chimpanzee named Sultan The problem was how to get to a banana that was placed just out of his reach outside his cage Sultan solved the problem relatively easily, first trying to reach through the bars with his arm, then using a stick that was lying in the cage to rake the banana to him As chimpanzees are natural tool users, this only demonstrates trial-and-error learning Then, the banana was placed just out of reach of Sultan’s extended arm with the stick in his hand There were two sticks lying around in the cage, which could be fitted together to make a single pole that would be long enough to reach the banana Sultan first tried one stick, then the other, and after about an hour he pushed one stick out of the cage as far 
as it would go toward the banana and then pushed the other stick behind the first one Of course, when he tried to pull the sticks back, he could only get the one in his hand When Kohler gave him the stick back, he sat on the floor of the cage and looked at the sticks carefully; he then put the sticks together and retrieved his banana Kohler called Sultan’s rapid “perception of relationships” insight Insight – the sudden perception of relationships among various parts of a problem, allowing the solution to the problem to come quickly Insight could not be gained through trial-and-error learning alone Learned Helplessness: Seligman’s Depressed Dogs Learned helplessness – the tendency to fail to act to escape from a situation because of a history of repeated failures in the past Seligman presented a tone followed by a harmless but painful electric shock to one group of dogs This group of dogs was harnessed so they could not escape the shock The researchers assumed the dogs would learn to fear the sound of the tone and later try to escape from the tone before being shocked Another group of dogs was not conditioned to fear the tone The dogs were then placed in a box consisting of a low fence in the middle that divided the box into 2 compartments The dogs could easily see over the fence and jump over if they wanted Dogs who had not been conditioned to fear the tone quickly jumped from one side of the box to the other as soon as the shock occurred When the dogs who were conditioned to fear the tone were placed in the box, instead of jumping over the fence when the tone sounded, they just sat there They showed distress but didn’t try to jump over the fence even when the shock itself began Why? 
The dogs that had been harnessed while being conditioned had apparently learned in the original tone/shock situation that there was nothing they could do to escape the shock So when placed in a situation in which escape was possible, the dogs still did nothing because they had learned to be “helpless” They believed they could not escape, so they didn’t try More recently, learned helplessness has been studied from a neuroscientific perspective; research has indicated that brain areas associated with fear/anxiety, suppression of the fight-flight response, and higher-level brain areas in the frontal lobe which determine whether or not a stimulus is controllable all play a role The concept of learned helplessness has been extended to explain some behaviors characteristic of depression Depressed people seem to lack normal emotions and become somewhat apathetic They often stay in unpleasant work environments or bad marriages or relationships rather than trying to escape or better their situation Seligman proposed that this depressive behavior is a form of learned helplessness Depressed people may have learned in the past that they seem to have no control over what happens to them A sense of powerlessness and hopelessness is common in depressed people, and this seems to apply to the behavior of Seligman’s dogs Observational Learning Observational learning – learning new behavior by watching a model (someone else who is doing that behavior) perform that behavior Sometimes the behavior is desirable, and sometimes it is not Classic study of observational learning: Albert Bandura’s “Bobo doll” study Bandura and The Bobo Doll In this classic study, 2 groups of preschool children were placed in a room with an experimenter and a model and watched as the model interacted with toys in the room Group 1 – the model interacted with the toys in a nonaggressive manner, completely ignoring the presence of a “Bobo” doll Group 2 – the model 
became very aggressive with the doll, kicking it and yelling at it, throwing it in the air and hitting it with a hammer Then each child was left alone in the room and had the opportunity to play with the toys in the room while a camera filmed them through a one-way mirror Children in group 1, who had watched the model ignore the Bobo doll, did not act aggressively toward the doll Children in group 2, who had watched the model act aggressively, beat up the doll, hitting and kicking it, exactly imitating the model’s behavior Obviously, the aggressive children had learned their aggressive actions from merely watching the model, with no reinforcement This is an example of the learning/performance distinction – the observation that learning can take place without actual performance of the learned behavior In later studies, 2 groups of children were shown the model acting aggressively on film Group 1 – watched the model act aggressively with the Bobo doll and then be rewarded Group 2 – watched the model act aggressively with the Bobo doll and then be punished When placed in the room with the doll, children from group 1 imitated the model’s aggressive actions but children from group 2 did not Then, children from group 2 were told they would receive a reward if they could show the experimenter what the model had done Each child from group 2 then correctly imitated the model’s aggressive behaviors Apparently, consequences do matter in motivating a child or an adult to imitate a particular model This makes the tendency for some movies and TV programs to make “heroes” out of violent, aggressive “bad guys” particularly disturbing Recent nationwide study of young people ages 8-18 in the U.S. 
Found that young people spend almost 7.5 hours on average per day involved in media consumption (TV, computers, video games, music, cell phones, print, and movies) 7 days a week Given the prevalence of media multitasking (using more than one media device at a time), they are packing in approximately 10 hrs 45 mins of media during those 7.5 hours While not all media consumption is violent, it’s easy to imagine that some of that media is of a violent nature Correlational research stretching over nearly 2 decades suggests that a link exists between viewing violent TV and an increased level of aggression in children While correlations do not prove that viewing violence on TV is the cause of increased violence, the link between violence and viewing violence on TV is a strong one Although still a topic of debate, there appears to be a strong body of evidence that exposure to media violence does have immediate and long-term effects, increasing the likelihood of aggressive verbal and physical behavior and aggressive thoughts and emotions in children, adolescents, and adults The 4 Elements of Observational Learning Attention – to learn anything through observation, the learner must first pay attention to the model Ex. A person at a fancy dinner party who wants to know which utensil to use has to watch a person who seems to know what is correct Certain characteristics of models can make attention more likely Ex. People pay more attention to those they perceive as similar to them, and to those they perceive as attractive Memory – the learner must be able to retain the memory of what was done Ex. Remembering the steps in preparing a meal that was seen on a cooking show Imitation – the learner must be capable of reproducing, or imitating, the actions of the model Ex. 
A 2-year-old might be able to watch someone tie shoelaces and might even remember most of the steps, but the 2-year-old’s chubby little fingers will not have the dexterity necessary for actually tying the laces Motivation – the learner must have the desire or motivation to perform the action Ex. The person at the fancy dinner party might not care which fork is the “proper” one to use If a person expects a reward because one has been given in the past, or has been promised a future reward, or has witnessed a model getting a reward, they will be much more likely to imitate the observed behavior