Chapter 9
Generalization, Discrimination, and the Representation of Similarity

9.1 Behavioral Processes
• When Similar Stimuli Predict Similar Consequences
• When Similar Stimuli Predict Different Consequences
• Unsolved Mysteries: Why Are Some Feature Pairs Easier to Discriminate Between Than Others?
• When Dissimilar Stimuli Predict the Same Consequence
• Learning and Memory in Everyday Life: Discrimination and Stereotypes in Generalizing about Other People

Generalization and Discrimination
• Generalization: transfer of past learning to new situations/problems. Responding to one stimulus (S) as a result of training with another; influenced by similarity to the training stimulus. Specificity: deciding how narrowly a rule applies. Generality: deciding how broadly a rule applies.
• Discrimination: recognition of differences between stimuli.

When Similar Stimuli Predict Similar Consequences
• Generalization gradient: graph showing how physical changes in stimuli correspond to behavioral response changes.
• In the Guttman and Kalish study: Pigeons learned to peck a yellow light (training S) for food. Gradient shows how often they subsequently pecked different color shades (Fig 9.1). Gradient width illustrates level of S generalization.

(Fig 9.1) Stimulus Generalization Gradients in Pigeons
Adapted from Guttman & Kalish, 1956, pp. 79–88.

What Causes Generalization Gradients?
• Is it discrimination error?
• Logical inference about shared consequences?
• Shepard (1987): Identify regions of shared consequence. Assume all possible regions, small and large. Average probabilistically over all.
Result: standard exponentially declining (exp-declining) gradient.
Argues: “View exp-declining gradients as representing attempt to predict, based on past experience, how likely it is that what is true about the consequences of one stimulus will also be true of other similar stimuli.”

Generalization as a Search for Similar Consequences
• Consequential region: all stimuli with the same results as the training stimulus, as mapped on a generalization gradient.
• For example, the pigeon has a moderate expectation of getting food from pecking a yellow-range light (given Fig 9.1).

The Challenge of Incorporating Similarity into Learning Models
• Discrete-component representation: representation in which each individual stimulus (or stimulus feature) corresponds to its own node or “component.” Simplest possible scheme to represent stimuli.
• Fig 9.2 uses discrete-component representations. It produces an unrealistic generalization gradient (Fig 9.3).

(Fig 9.2) Stimulus Generalization Model Using Discrete-Component Representations

Limitations of Discrete-Component Representations
• Representations are applicable to situations in which stimuli are dissimilar and little generalization would occur. They fail when stimuli have a high degree of physical similarity.
• Note: Different representations in different contexts provide different patterns of similarity. Representations are context-specific.

(Fig 9.3) Generalization Gradient Produced by Discrete-Component Network of Fig 9.2
*Shows no response to yellow-orange light (despite similarity to previously trained yellow light). Responds only to the trained “yellow” stimulus; fails to show a smooth generalization gradient like that shown in Fig 9.1.

Shared Elements and Distributed Representations
• Thorndike (law of effect), Estes (stimulus sampling theory), and Rumelhart (connectionist models) contributed to a contemporary associative-learning model. Conceptualized with distributed representations (overlapping pools of stimulus nodes).
Similar stimuli activate common elements; something learned about one stimulus transfers to other stimuli that activate the same nodes.

Thorndike and Estes Shared Elements
(Figure labels: Yellow, Orange)
Network model follows…

Shared Elements and Distributed Representations
• Fig 9.4a–d shows a network model using distributed representations. Nodes are laid out in a topographic representation (nodes responding to physically similar stimuli are placed beside each other in the model).
• 9.4a shows the model (which is only slightly more complicated than Fig 9.2).
• 9.4b shows the outcome in distributed weights after many acquisition trials.

(Fig 9.4a) Distributed Representation Network

(Fig 9.4b) Train “Yellow”

Shared Elements and Distributed Representations
• 9.4c shows the response strength when a similar stimulus (yellow-orange) is tested.
• 9.4d shows the weaker response when a less similar stimulus (orange) is tested.
• Such a distributed representation model better matches real-life gradients such as Fig 9.1 (see Fig 9.5).

(Fig 9.4c) Test “Yellow-Orange”: Some Decline in Response

(Fig 9.4d) Test “Orange”: More Decline in Response

(Fig 9.5) Stimulus Generalization Gradient Produced by Distributed Representation Model of Fig 9.4

When Similar Stimuli Predict Different Consequences
• Two substances that initially appear similar may become distinguishable over time.
• Example: Gooseberries look like green grapes. If you are allergic to gooseberries, you learn to distinguish them from green grapes (discrimination).

Discrimination Training and Learned Specificity
• The weaker the generalization, the stronger the discrimination. Discrimination = differential responding to two stimuli. Discrimination can be trained; in discrimination training, two different (but similar) stimuli are presented on each trial.
• The steeper (and narrower) the gradient, the higher the discrimination.
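
The contrast between the discrete-component network (Figs 9.2 and 9.3) and the distributed network (Figs 9.4 and 9.5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the textbook's actual model: the 11-node strip, the index 5 standing for “yellow,” the receptive-field half-width, the learning rate, and the delta-rule update are all hypothetical choices made for this sketch.

```python
N_NODES = 11  # hypothetical strip of input nodes ordered by color
YELLOW = 5    # hypothetical index of the trained "yellow" stimulus


def encode(stimulus, width):
    """Activate every node within `width` positions of the stimulus.

    width=0 gives a discrete-component code (one node per stimulus);
    width>0 gives a distributed code in which neighbors share elements.
    """
    return [1.0 if abs(i - stimulus) <= width else 0.0 for i in range(N_NODES)]


def respond(weights, pattern):
    """Output node response: weighted sum of the active input nodes."""
    return sum(w * x for w, x in zip(weights, pattern))


def train(pattern, n_trials=200, lr=0.1):
    """Push the response to `pattern` toward 1.0 (assumed delta rule)."""
    weights = [0.0] * N_NODES
    for _ in range(n_trials):
        error = 1.0 - respond(weights, pattern)
        # Only nodes that are active on the trial change their weight.
        weights = [w + lr * error * x for w, x in zip(weights, pattern)]
    return weights


def generalization_gradient(width):
    """Train on "yellow", then test every stimulus along the strip."""
    weights = train(encode(YELLOW, width))
    return [respond(weights, encode(s, width)) for s in range(N_NODES)]


discrete = generalization_gradient(width=0)
distributed = generalization_gradient(width=1)

for s in range(N_NODES):
    print(f"stimulus {s:2d}  discrete {discrete[s]:.2f}  distributed {distributed[s]:.2f}")
```

Under these assumptions the discrete code responds only to the trained stimulus, while the distributed code responds at about two thirds of full strength to an immediate neighbor and about one third to a stimulus two steps away. That is the qualitative pattern described for Figs 9.4c and 9.4d.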

Discrimination Training and Learned Specificity
• Fig 9.6 shows the adapted results of a classic 1962 experiment (Jenkins and Harrison studied tone discrimination in pigeons). One gradient represents the test pattern for pigeons that heard a 1000 Hz tone before they pecked and received food. The other gradient represents the generalization for pigeons that were intermittently exposed to a similar 950 Hz tone without food. Which is the control group? Experimental group?

(Fig 9.6) Generalization Gradients for Tones of Different Frequencies
Adapted from Jenkins and Harrison, 1962.

Unsolved Mysteries: Why Are Some Feature Pairs Easier to Discriminate between Than Others?
• Some pairs of stimulus features are separable, such as brightness and hue.
• Other feature pairs are perceived holistically, such as brightness and saturation.
• Understanding the nature of feature pairs relates to stimulus generalization.

The Two-Dimensional Filtering Task

Negative Patterning: Differentiating Configurations from Their Individual Components
• Negative patterning occurs when we respond positively to two stimuli presented separately, but we respond negatively to the compound (i.e., the combination).
• Example: Mom at home? Eat dinner in the kitchen. Dad at home? Eat dinner in the kitchen. Both Mom and Dad at home? Don’t eat dinner in the kitchen (eat in the dining room).

Negative Patterning
• Rats, monkeys, and humans learn negative patterning tasks.
• Rabbits can learn to blink to either a tone or a light, and to not blink to a simultaneous tone and light.

Negative Patterning in Rabbit Eyeblink Conditioning
Adapted from Kehoe, 1988, Figure 9.

Negative Patterning
• Single-layer network models using discrete-component representations cannot learn negative patterning.

Negative Patterning
• Fig 9.11 shows a multi-layer network model for negative patterning. It includes extra nodes that fire only when two or more specific features are present.
• In Fig 9.11, a configural node for “tone + light” will fire only if both inputs are active.

(Fig 9.11) Solving Negative Patterning with a Network Model

Configural Learning in Categorization
• Configural tasks require sensitivity to combinations of stimulus cues, above and beyond what is known about stimulus components.
• Configural nodes can be applied to categorization learning, where humans learn to classify stimuli into categories (e.g., diagnosis from symptoms).

Configural Learning in Categorization
• Fig 9.12a–c shows a configural-node model of category learning.
• 9.12a shows the model.
• In 9.12b, fever and soreness together (without ache) predict the disease. Dilemma = combinatorial explosion.
• 9.12c is a simpler, more flexible (alternative) model.

(Fig 9.12a)

(Fig 9.12b)

(Fig 9.12c)

When Dissimilar Stimuli Predict the Same Consequence
• Co-occurrence of stimuli may increase generalization.
• Example: If you like the cookies at a new bakery, you may like their brownies.

Sensory Preconditioning: Similar Predictions for Co-occurring Stimuli
• Sensory preconditioning: conditioning without an explicit US. Prior presentation of compound stimuli results in a later tendency for learning about one stimulus to generalize to the other.

Sensory Preconditioning *Example*
• Step 1: (tone, light)
• Step 2: (light, puff) CR eyeblink should develop over acquisition trials.
• Step 3: (tone alone) If CR eyeblink occurs, we call this phenomenon “sensory preconditioning.”
• Illustrates the generalizability of a stimulus’s power! The tone was never presented as a cue for the puff!

Sensory Preconditioning

Acquired Equivalence: Novel Similar Predictions Based on Prior Similar Consequences
• Acquired equivalence: prior training in stimulus equivalence increases the amount of generalization between two stimuli, even if the stimuli are superficially dissimilar.
• In the Hall study, pigeons learned that dissimilar colors, each paired separately with the same color, had the same result. They demonstrated this generalization in a new situation.

Acquired Equivalence

Learning and Memory in Everyday Life: Discrimination and Stereotypes in Generalizing about Other People
• Category formation is a basic cognitive process.
• Rational generalizations let us tentatively generalize individual outcomes from previous experiences.
• Stereotyping is denying exceptions for individuals from a group about which we may hold oversimplified beliefs. It attempts to justify unfair treatment.

9.1 Interim Summary
• Generalization = transfer of past learning to new situations and problems. Requires finding a balance between specificity (knowing how narrowly a rule applies) and generality (knowing how broadly the rule applies).
• Discrimination = recognizing differences between stimuli; knowing which to prefer.
• Understanding similarity is essential to understanding generalization and discrimination.

9.1 Interim Summary
• Discrete-component representations: assign each stimulus (or feature) to its own node. Applicable to situations in which similarity among features is small enough that there is negligible transfer of response from one to another.
• Distributed representations: incorporate the idea of shared elements. Allow creation of psychological models with concepts represented as patterns of activity over many nodes; provide the ability to model stimulus similarity and generalization.

9.1 Interim Summary
• We tend to assume that patterns formed from compound cues will have consequences that parallel (or even combine) what we know about the individual cues.
• However, some discriminations require sensitivity to the configurations of stimulus cues above and beyond what is known about the individual stimulus cues.
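
The point about configurations of cues can be illustrated with the negative patterning task (tone alone and light alone are reinforced, the tone + light compound is not). The sketch below is a hedged illustration, not the model of Fig 9.11: it assumes a linear output unit trained with a delta rule, a hypothetical learning rate and number of epochs, and a configural node computed simply as the product of the two inputs.

```python
# Each trial: (tone, light, target response). Values are illustrative.
TRIALS = [
    (1.0, 0.0, 1.0),  # tone alone -> respond
    (0.0, 1.0, 1.0),  # light alone -> respond
    (1.0, 1.0, 0.0),  # tone + light compound -> withhold response
    (0.0, 0.0, 0.0),  # no cue -> no response
]


def features(tone, light, configural):
    """Input vector; the optional third node is active only when both cues are on."""
    x = [tone, light]
    if configural:
        x.append(tone * light)  # assumed form of the "tone + light" node
    return x


def respond(weights, x):
    """Linear output: weighted sum of the input nodes (an assumption)."""
    return sum(w * xi for w, xi in zip(weights, x))


def train(configural, n_epochs=2000, lr=0.1):
    """Cycle through the trials, applying a delta-rule update after each one."""
    weights = [0.0] * (3 if configural else 2)
    for _ in range(n_epochs):
        for tone, light, target in TRIALS:
            x = features(tone, light, configural)
            error = target - respond(weights, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return weights


def responses(configural):
    """Responses of the trained network to each cue and to the compound."""
    weights = train(configural)
    return {
        "tone": respond(weights, features(1.0, 0.0, configural)),
        "light": respond(weights, features(0.0, 1.0, configural)),
        "tone+light": respond(weights, features(1.0, 1.0, configural)),
    }


elemental = responses(configural=False)
with_configural = responses(configural=True)

for name, result in [("elemental", elemental), ("configural", with_configural)]:
    print(name, {cue: round(value, 2) for cue, value in result.items()})
```

Under these assumptions the two-input network ends up responding more strongly to the compound than to either cue alone, the opposite of negative patterning. The network with the configural node learns a negative weight on that node and withholds its response to the compound while still responding to each cue alone.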

9.1 Interim Summary
• Animals and people can learn to generalize between stimuli that have no physical similarity but that do have a history of co-occurrence or of predicting the same outcome.

9.2 Brain Substrates
• Cortical Representations and Generalization
• Generalization and the Hippocampal Region

Cortical Representations of Sensory Stimuli
• Initial cortical processing of sensory information (vision, sound, touch, etc.) occurs in areas dedicated to each sense.
• Areas in the mammalian cortex can be organized into topographical maps (e.g., homunculi for primary sensory and motor cortices).

Topographic Map of the Primary Sensory Cortex
Adapted from Penfield & Rasmussen, 1950.

Shared-Elements Models of Receptive Fields
• Does receptive field function match generalization theories?
• If the brain is organized in distributed representations, similar stimuli should activate common nodes (or neurons).

Shared-Elements Models
• Fig 9.17a–c shows a shared-elements network model of generalization. 9.17a shows how a 550-Hz tone might activate nodes 2, 3, and 4. 9.17b shows how a similar 560-Hz tone might activate nodes 3, 4, and 5. 9.17c illustrates the node overlap (3 and 4) that produces generalization between a 550-Hz tone and a 560-Hz tone.

(Fig 9.17a)

(Fig 9.17b)

(Fig 9.17c)

Shared-Elements Models of Receptive Fields
• Auditory neurons respond to varying frequencies. Each neuron responds best to one frequency (see Fig 9.18). The wider the neuron’s receptive field, the broader the range of physical stimuli (in this case, auditory frequencies) processed by that neuron.

(Fig 9.18) Activity of node/neuron #3 in Fig 9.17 is recorded for each of the tones between 520 Hz and 580 Hz; the best frequency is 550 Hz.

Topographic Organization and Generalization
• Richard Thompson’s 1960s animal studies found that an intact auditory cortex is necessary for normal auditory generalization from a specific tone.
Such sensory receptive fields can change with learning.

Plasticity of Cortical Representations
• Even in adult animals, cortical areas temporarily shrink from disuse and spread from use.
• Weinberger studies indicate that cortical plasticity is due to stimulus pairing. Stimulus presentation alone doesn’t drive plasticity; the stimulus needs to be meaningfully related to a consequence.

Plasticity of Representation in the Primary Auditory Cortex
Adapted from Weinberger, 1977, figure 2.

Plasticity of Cortical Representations
• The nucleus basalis in the basal forebrain releases acetylcholine (ACh) throughout the cortex. ACh facilitates cortical plasticity.

Generalization and the Hippocampal Region
• Generalization shown in sensory preconditioning is disrupted by hippocampal-region lesioning. Lesioned rabbits display no sensory preconditioning.

Hippocampal Region and Sensory Preconditioning
Drawn from data presented in Port & Patterson, 1984.

Generalization and the Hippocampal Region
• Similarly, rats with hippocampal region damage (lesions in the entorhinal cortex) showed poor acquired equivalence. Latent inhibition in rabbit eyeblink conditioning was eliminated with entorhinal cortical lesions.

Latent Inhibition in Rabbit Eyeblink Conditioning
Adapted from Shohamy, Allen, & Gluck, 2000.

Modeling the Role of the Hippocampus in Adaptive Representations
• Gluck and Myers (1993, 2001) propose that compression and differentiation of stimulus representations are computed in the hippocampal region. The region acts as an “information highway.” Cerebral cortex and cerebellum process associations for behavioral response and storage.
• Research supports this model.

9.2 Interim Summary
• While it is possible for an animal without an auditory cortex to learn to respond to auditory stimuli, an intact auditory cortex is essential for normal auditory generalization.
Without their auditory cortex, animals can learn to respond to the presence of a tone, but cannot respond precisely to a specific tone.

9.2 Interim Summary
• Cortical plasticity is driven by the correlation between a stimulus and a salient event. Plasticity is not driven by presentation alone; the stimulus has to be meaningfully related to ensuing consequences. However, primary sensory cortices do not receive information about which consequence occurred, only that some sort of salient event has occurred. Thus, primary sensory cortices only determine which stimuli deserve expanded representation and which do not.

9.2 Interim Summary
• When a stimulus is paired with a salient event (such as food or shock), the nucleus basalis becomes active. It delivers acetylcholine to the cortex, enabling cortical remapping to enlarge the representation of the stimulus in the appropriate primary sensory cortex.

9.2 Interim Summary
• The hippocampal region plays a key role in learning behaviors that depend on stimulus generalization (e.g., the classical conditioning paradigms of sensory preconditioning and latent inhibition).
• Computational modeling suggests that this role is related to the hippocampal region’s compression and differentiation of stimulus representations.

9.3 Clinical Perspectives
• Generalization Transfer and Hippocampal Atrophy in the Elderly
• Rehabilitation of Language-Learning-Impaired Children

Generalization Transfer and Hippocampal Atrophy in the Elderly
• Hippocampal or entorhinal cortical atrophy may be an early sign of Alzheimer’s disease.
Adapted from de Leon et al., 1993. Images courtesy of Dr. Mony de Leon, NYU School of Medicine.

Generalization Transfer and Hippocampal Atrophy in the Elderly
• Myers and associates developed a method to study generalization transfer in the elderly.
(Figure) Human acquired equivalence study. Phase 1: equivalence training. Phase 2: train new outcome. Phase 3: transfer.
Adapted from Myers et al., 2003.

Human Acquired Equivalence Study
• Phase 1: Learned to associate the blue fish with the brunette and the blue fish with the blonde (equivalent preference).
• Phase 2: Learned to associate the red fish with the brunette.
• Phase 3: Can they generalize this red fish preference to the blonde?

Human Acquired Equivalence Study
• Results: Healthy participants completed all three phases. Participants with hippocampal atrophy completed phases 1 and 2, but could not transfer learning in phase 3.
• Test might be a quick and easy screening tool for potential cognitive impairment.

Rehabilitation of Language-Learning-Impaired Children
• Language learning impairment (LLI): language-learning problems not attributable to known factors. Children with normal intelligence but very low scores on oral language tests. Tallal found that the problem was not language-specific; rather, it was a problem in rapid sensory processing.

Rehabilitation of Language-Learning-Impaired Children
• In a study (Temple et al., 2003): Participants = 20 dyslexic children (8–12 years old) and 12 children matched for age, gender, handedness, and non-verbal IQ. All received fMRI during a rhyming task before and after the dyslexic children’s training. Study includes behavioral remediation program to improve auditory and language processing. Uses non-linguistic and acoustically modified speech. Conducted 5 days per week, 100 min. per day, for 27.9 (average) training days.

Rehabilitation of Language-Learning-Impaired Children
• Results: Children’s language and reading scores increased. fMRI showed increased activity in language-processing areas (left temporo-parietal cortex).
• Illustrates cortical plasticity in children from intense behavioral treatment.

Brain Plasticity in Children with Dyslexia
Data from Temple et al., 2003; images courtesy of Elise Temple.

9.3 Interim Summary
• Some forms of generalization depend on medial temporal lobe mediation.
• Elderly individuals with hippocampal region atrophy (a risk factor for subsequent development of Alzheimer’s disease) can learn initial discriminations but fail to appropriately transfer learning in later tests.

9.3 Interim Summary
• Studies of dyslexia and other language impairments provide examples of how insights from animal research on cortical function can have clinical implications for humans with learning impairments.