Cognitive Neuroscience and Embodied Intelligence
Perception and Attention
Based on the book "Cognition, Brain and Consciousness", ed. Bernard J. Baars, and courses taught by Prof. Randall O'Reilly, University of Colorado, and Prof. Włodzisław Duch, Uniwersytet Mikołaja Kopernika
http://wikipedia.org/
http://grey.colorado.edu/CompCogNeuro/index.php/CECN_CU_Boulder_OReilly
http://grey.colorado.edu/CompCogNeuro/index.php/Main_Page
Janusz A. Starzyk
EE141

Image Recognition Problem
•How do receptive fields form? Why does the cortex encode oriented bars of light? Through learning based on correlations in natural scenes.
•How do we recognize objects in different locations, sizes and rotations, that is, from different images on the retina?
•Why does the visual system separate into where/what pathways?
Spatial invariance is difficult: different shapes partly occupy the same receptive fields, while the same shape in a different part of the retina, rotated, or at a different size may not activate the same receptive fields at all.

Recognition
Where does invariance come from? One view (Marr 1982): a single 3D representation is remembered, built from 2D projections. Syntactic approach: assemble a whole from parts of a model. A variant (Hinton 1981): search for the transformations (translation, scaling, rotation) that map the image onto a canonical representation in memory. Problem: many 2D projections are consistent with different 3D objects, and matching is hard because the search space for assembling fragments into a whole is too large. Do we really remember 3D objects?

Gradual transformations
In the brain, rotational invariance is strongly limited, e.g. in recognizing rotated faces. Limited invariant object recognition can be achieved through gradual, hierarchical, parallel transformations that increase invariance and create increasingly complex features of distributed representations.
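The idea of gradual transformations can be illustrated in a few lines of code: alternate local feature detection with spatial pooling, so that "what" is kept while the exact "where" is progressively discarded. This is only a toy sketch (a hand-built 1D "retina" and bar detector of my own invention, not the course model):

```python
import numpy as np

def detect(image, template):
    """Correlate a small template at every valid position (a bank of
    identical, shifted feature detectors, like a shared-weight hypercolumn)."""
    w = len(template)
    return np.array([float(np.dot(image[i:i + w], template))
                     for i in range(len(image) - w + 1)])

def pool(responses, width):
    """Max-pool over neighborhoods: keeps 'what' while discarding exact 'where'."""
    return np.array([responses[i:i + width].max()
                     for i in range(0, len(responses) - width + 1, width)])

# A "bar" feature placed at two different positions on a toy 1D retina
bar = np.array([1.0, 1.0, 1.0])
img_left = np.zeros(12);  img_left[1:4] = 1.0
img_right = np.zeros(12); img_right[6:9] = 1.0

# After detection + pooling, both images contain the same peak response:
# the bar is detected regardless of where it appeared.
sig_left = pool(detect(img_left, bar), width=5)
sig_right = pool(detect(img_right, bar), width=5)
print(sig_left.max() == sig_right.max())   # prints True
```

Stacking several such detect/pool stages is what gradually buys larger invariance at the cost of coarser position information, which is the trade-off the hierarchy above describes.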
The goal is not a 3D model, but to retain enough detail to recognize objects invariantly after transformation.
•Map-seeking circuits in visual cognition (D. W. Arathorn, 2002)

Object recognition model
The model objrec.proj has many hypercolumns, but very simple ones. It includes regions and transformations between LGN, V1, V2 and V4/IT. There are 20 images, built only from vertical and horizontal elements; combinations of elements at the IT level should respond invariantly. Output = a representation at the symbolic level. The objects to be recognized consist of 3 out of 6 possible segments. Training on objects 0-17, testing on 18-19. Four sizes: 5, 7, 9 and 11 pixels.

Object recognition model properties
Each hypercolumn sees the same signals, displaced and partly overlapping. Elements inside a hypercolumn compete (kWTA), and elements within the layer also compete through inhibition over a larger area; total inhibition = max(local, whole-layer). Hypercolumns perform the same feature extraction across the whole field of vision, so every hypercolumn can share the same set of weights. Objects are represented by edges in the LGN On/Off layers, each 16x16, with wrap-around edges. V1 has already-learned representations of vertical and horizontal edges, with 4x4 receptive fields in the LGN; there are 8 vertical and horizontal edge detectors for "on" and 8 for "off", together 16 = 4x4 units. V2: 8x8 hypercolumns, each receiving signals from ¼ of the field of vision through a 4x4 matrix. V4/IT: 10x10 units covering the entire visual field, which suffices for such simple objects.

More properties
Simulations without shared weights for the hypercolumns give the same results but are significantly more costly; the Hebbian mechanism leads to identical weights for columns seeing the same (xi, yi). Without Hebbian learning, pure error correction gives completely different representations across the hypercolumns, because it does not detect input correlations.
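The kWTA competition described above can be sketched minimally as follows. Note this is a simplified stand-in, not Leabra's actual conductance-based kWTA; the layer sizes and k values here are arbitrary illustrations:

```python
import numpy as np

def kwta_threshold(acts, k):
    """Inhibition level that lets only ~k units stay active: halfway
    between the k-th and (k+1)-th strongest inputs."""
    srt = np.sort(acts)[::-1]
    return 0.5 * (srt[k - 1] + srt[k])

def apply_inhibition(acts, inhib):
    """Subtractive inhibition; units below the threshold go silent."""
    return np.maximum(acts - inhib, 0.0)

rng = np.random.default_rng(0)
layer = rng.uniform(0, 1, size=(4, 16))   # 4 hypercolumns x 16 units

# Local inhibition inside each hypercolumn (k=2), plus a layer-wide
# inhibition computed over all 64 units (k=6).
local = np.array([kwta_threshold(col, k=2) for col in layer])
global_i = kwta_threshold(layer.ravel(), k=6)

# Total inhibition = max(local, whole-layer), as in the model description.
inhib = np.maximum(local, global_i)
out = np.stack([apply_inhibition(col, g) for col, g in zip(layer, inhib)])
print((out > 0).sum(axis=1))   # at most 2 active units per hypercolumn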
There are no lateral connections in V1: its representation is already fixed, so they are unnecessary and would slow down learning. Such connections are important in completion processes, illusions, and recognizing occluded objects. Parameters: Hebbian proportion 0.005, but between V1/V2 only 0.001, because weight sharing produces more frequent weight changes, hence the lower value. Learning rate: 0.01, reduced to 0.001 after 150 epochs, to speed up initial learning and then stabilize it. Network construction: BuildNet; check connection properties with r.wt.

Network exploration
StepTrain, phase -, then StepTrain, phase +. Full training takes many hours: one object can appear in 4 sizes and 256 positions on the 16x16 grid, i.e. 1024 images per object; with 18 training objects, 18,432 images. The pre-trained network (460 epochs x 150 objects per epoch) reaches good results after about 30,000 presentations, fewer than 2 presentations per image. Setting net_updt => cycle_updt shows activity over the whole settling cycle; in a trained network, the - and + phases are the same. How does the activity of V2 and V4 correlate with the LGN inputs? Receptive fields based on average activation can be seen by correlating each LGN input x with the activity y of a V2 or V4 unit; for each element of the 8x8 hypercolumn we plot every correlation ri.

Averaged-activation receptive fields
Activation of the 16x16 LGN on-center layer for one V2 hypercolumn of 8x8 elements; with weight sharing, the others are the same. The elements shown are from the lower left corner of V2, receiving from ¼ of the whole LGN field. Bright stripes = a unit selective for edges (of different sizes) at a specific location. V2 elements do not respond to single lines, only to combinations of them. Diffuse parallel stripes = a response to the same combination in different locations.

V2 off-center fields
LGN off-center activation for one V2 hypercolumn; with weight sharing, the others are the same. These elements respond more to the ends of shorter lines.
Elements that respond selectively take part in the representation of many images; they detect complex features shared among different objects.

V2 correlations with output objects
The response of V2 units when specific objects are detected, i.e. V2 correlations with the averaged output, for 4x5 = 20 objects.

V4 correlations with output objects
The response of V4 units when specific objects are detected, i.e. V4 correlations with the averaged output, 4x5. Greater selectivity than in V2, because of greater invariance and responses to more complex features.

Receptive field tests
Observation of V2 and V4 responses: 4 probes are used in the tests, each shown in all 8x8 positions of the lower left LGN input quadrant. V2 columns respond to ¼ of the whole field. We compute the response at the V2/V4 level; each quadrant corresponds to a specific test probe. E.g. for probe 0, the responses to all 8x8 positions of the probe appear in the lower left quadrant for a given element; all of its activity for the 4 probes fits in a 16x16 square.

V2 tests for probes
A V2 hypercolumn has 8x8 elements; the responses of each element to the 4 probes, averaged across all positions, are shown in the small 16x16 squares.

V4 tests for probes
V4 has 10x10 elements; the responses of each element to the 4 probes, averaged across all positions, are shown in the small 16x16 squares. Position independence shows up as uniformly yellow squares. Some units respond to single features of a probe, others to the whole probe, and some to the presence of elements shared by every probe.

Statistical tests
Table 8.1 summarizes the results of presenting the 20 objects in all positions and counting the V4 elements that respond (activation > 0.5) to these presentations. For one object in 256 possible positions and 4 sizes (1024 images), there are on average 10 different activation patterns at the V4 level. Detailed results are in objrec.swp_pre.err. The two untrained objects, 18 and 19, give only errors.
Training aimed at generalization: the new object is presented once in every 4 trials, in 36 of the 256 possible positions, at sizes of 5 or 9 pixels, i.e. 14% of positions and 50% of sizes, 72 images (7%). After 60 training epochs with 150 objects/epoch and a learning rate of 0.001, object 18 gave 85% correct answers over its 1024 images; object 19 gave 66% correct answers, failing mainly at small sizes.

Dorsal pathway
Recognition is a function of the ventral pathway; now let's turn to the dorsal pathway. Its functions: motion detection, localization, "where" and how to act, but also what to focus attention on and how to shift attention from one object to another. Attention allows us to tie the different properties of an object into one whole, solving the binding problem despite distributed processing: distributed activations with mutually related features refer to one object. This is mainly a model of attention as an emergent process resulting from the structure and dynamics of neural networks, chiefly inhibition. The effects of attention are universal, visible in many different situations. What should we pay attention to? Is this even a well-posed question? Dogs bite, but not only Spot, not only mongrels, not only black ones...

Spatial attention model
The interaction of spatial representations with object recognition: how does the ventral pathway interact with the dorsal pathway? The parietal cortex holds various spatial representations; here, a simple map of spatial relationships. Posner task: attention is drawn to a cue, which affects the reaction time to a simple target depending on whether the target appears in the same region or a different one. Activation at a given location => faster recognition there. (Figure: no-cue, valid-cue and invalid-cue conditions.)

Spatial attention model
Attentional effects could be mediated by V1 alone, but then inhibition would prevent switching attention to another object. In Posner's original model, the parietal cortex "disengages" attention.
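Why a cue speeds validly cued targets and slows invalidly cued ones can be sketched with two location units using leaky integration and mutual inhibition: residual cue activity either gives the target a head start or competes with it. The parameter values below are arbitrary illustrations, not those of the course's attn_simple.proj model:

```python
import numpy as np

def settle_time(cue_side, target_side, residual=0.3, thresh=0.6,
                dt=0.1, inhib=0.8, max_steps=500):
    """Cycles for the target-side unit to reach threshold, given leftover
    cue activation that either supports or competes with the target."""
    act = np.zeros(2)
    act[cue_side] = residual              # residual activation from the cue
    for step in range(max_steps):
        inp = np.zeros(2)
        inp[target_side] = 1.0            # target input
        net = inp - inhib * act[::-1]     # mutual inhibition between locations
        act += dt * (net - act)           # leaky integration toward net input
        act = np.clip(act, 0.0, 1.0)
        if act[target_side] >= thresh:
            return step
    return max_steps

valid = settle_time(cue_side=0, target_side=0)    # cue and target on same side
invalid = settle_time(cue_side=0, target_side=1)  # opposite sides
print(valid < invalid)   # prints True: valid cues speed detection
```

The same competition, with one side weakened, is what later produces the exaggerated invalid-cue cost seen in the lesion simulations.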
O'Reilly's model
There is direct feedback (V4-V5?) between the dorsal pathway and the ventral pathway, plus a path through V1. Spatial attention influences recognition; thicker lines = stronger effects, imposed by the dorsal pathway (PC).

Lesion studies
Consequences of damage to early visual areas: different visual deficits result from neural damage at different levels of the visual processing hierarchy.
Damage to the retina can result in monocular blindness.
Damage to the LGN can lead to loss of vision in the contralateral visual field.
Damage to a small part of V1 can lead to a clearly defined scotoma.
Patients with V1 damage may still perform better than chance in forced-choice discrimination of objects (blindsight), although they claim they see nothing. Although the pathway from retina to LGN to V1 provides most of the visual input to cortex, several alternative subcortical pathways project to extrastriate areas (MT, V3, V4), bypassing V1; this may explain the forced-choice results.

Lesion studies
Extrastriate lesions: damage outside area V1. Motion blindness caused by a lesion to area MT: the world appears to be a series of still snapshots. Crossing the street is dangerous, since the patient cannot tell how fast the cars are approaching; pouring a cup of coffee becomes a challenge, since she cannot tell how fast the liquid is rising.

Lesion studies
Cortical color blindness may be caused by a lesion to area V4: the world appears drained of color, just shades of gray. Patients can perceive the boundaries between colors but cannot name them.
Lesion studies
Damage to ventral object areas. Visual agnosia: patients with visual agnosia have difficulty recognizing objects, because of impairments either in basic perceptual processing or in higher-level recognition processes. Three types of agnosia: apperceptive agnosia, associative agnosia, and prosopagnosia. Agnosia = "lack of knowledge".

Lesion studies
Patients with apperceptive agnosia can detect the appearance of visually presented items, but they have difficulty perceiving their shape and cannot recognize or name them. Associative agnosia is the inability to recognize objects despite apparently intact perception: the patient can copy a picture of an object but does not recognize it; one such patient mistook his wife for a hat. Associative agnosia results from damage to ventral temporal cortex.

Lesion studies
Patients with optic ataxia can perceive visual orientation and recognize objects but cannot perform visually guided actions; optic ataxia results from damage to the parietal lobe in the dorsal pathway. Patients with prosopagnosia can still recognize objects well but have great difficulty recognizing faces: all faces look the same, and they can recognize animals but not people. Brodmann area 37 is responsible for face recognition; over 90% of the cells in area 37 respond only to faces.

Lesion studies
fMRI analysis of the face recognition process shows activity in the right hemisphere, in the inferior temporal area. Face recognition is important from an evolutionary perspective.

Lesion studies
Patients with achromatopsia are unable to recognize colors. This is often a result of damage to area V4 or to the thalamus.

Lesion studies
Daltonism refers to dichromacy characterized by a lowered sensitivity to green light, resulting in an inability to distinguish green from purplish-red. It is an inherited defect in the perception of red and green, in other words red-green colorblindness.
Dorsal pathway lesions
Lesions in the parietal cortex strongly affect mechanisms of attention and spatial orientation. Extensive lesions in one hemisphere lead to hemispatial neglect, the inability to focus attention on the half of visual space opposite the lesion. With small unilateral lesions we see a noticeable slowing of attention switching toward the damaged side; in more severe cases, switching attention there is not possible. Bilateral lesions lead to Balint's syndrome: difficulties with the coordination of hand and eye movements, and simultanagnosia; differences in attention-switching times in the Posner task are small. Posner argued that this results from attention binding, an inability to disengage, but he gave no disengagement mechanism (disengagement follows once attention is focused elsewhere); a better model assumes normal competition.

Lesion studies
Self-portrait (figure). Damage to the posterior parietal lobe can lead to unilateral neglect, in which a patient completely ignores or does not respond to objects in the contralateral hemifield. Patients with damaged spatiotemporal recognition forget about half of space even though they see it. Patients with right parietal damage may ignore the left half of the visual field, eat half of the food from the plate, or apply make-up to half of the face.

Unilateral neglect
Horizontal line bisection task; copying drawings.

Lesion studies
Bilateral lesions to parietal areas can lead to a much more profound deficit called Balint's syndrome, which is primarily a disruption of spatial attention. It is characterized by three main deficits:
Optic ataxia: inability to point at a target
Ocular apraxia: inability to shift the gaze
Simultanagnosia: inability to perceive more than one object in the visual field
People with Balint's syndrome appear blind, since they focus on only one object and cannot shift attention to anything else.
Linking brain activity and visual experience
Imagine you are sitting in a dark room, looking at a jacket on a chair. Since you cannot see well, your perception is driven by your imagination: you may perceive a strange animal, a person, or a statue sitting there. When vision is ambiguous, perception falters or alternates between different interpretations. This is known as multistable perception. There are many examples of multistable patterns and ambiguous figures that scientists use to investigate the neural correlates of consciousness.

Linking brain activity and visual experience
Binocular rivalry: what you see is what gets activated. You can induce binocular rivalry here using a pair of red-green glasses. When two very different patterns are shown, one to each eye, the brain cannot fuse them together as it normally would. What happens is striking: awareness of one pattern lasts a few seconds, then the other pattern appears.

Linking brain activity and visual experience
What happens in the brain during binocular rivalry? Tong et al. tackled this problem by focusing on two category-selective areas in the ventral temporal lobes, the FFA and the PPA. They used red-green filter glasses to present a face to one eye and a house to the other. Depending on which image was perceived, they observed activity either in the FFA (face) or in the PPA (house).

Linking brain activity and visual experience
The strength of FFA and PPA activation was the same in the rivalry experiment as during physical stimulus alternation. Another approach is to train a monkey to report which of two patterns is dominant during binocular rivalry and to measure the activity of single neurons in different parts of the brain. These experiments support an interactive model of visual perception, in which feedback projections modulate the lower levels.

Linking brain activity and visual experience
Another way to separate physical stimulation from perceptual awareness is a visual detection task.
The subject has to detect a particular pattern, which the experimenter makes harder and harder to see; sometimes there is no pattern at all in the picture. Because the task is difficult, people sometimes get it wrong. Interestingly, when there is a 'false positive' (people see the pattern even though it is not there), there is strong activity in areas V1, V2 and V3; when a faint stimulus is not detected, activity in these areas is much weaker. So what matters is not what was presented, but what is happening in the brain.

Linking brain activity and visual experience
(Figure panels a, b, c.) Close your left eye, look directly at the cross with your right eye, and move the page up close to your nose; then move it slowly away from your face while keeping your eye fixed on the cross. At the right distance, around 12 inches (30 cm) from the page, you should notice the red dot vanish. Notice also how the black stripes fill in: they become joined where the dot disappeared. The brain fills in perception at the blind spot using visual information from around it: constructive perception, or perceptual filling-in.

Linking brain activity and visual experience
Adelson's "motion without movement". Optical illusions are a result of our mind filling in patterns based on experience.

Linking brain activity and visual experience
Two color spirals: zoom in on the color spiral and you find that the two colors are the same shade of green.

Linking brain activity and visual experience
These pictures illustrate another type of filling-in, known as neon color spreading (a), and visual phantoms (b). Neural correlates of neon color spreading have been found in area V1. In a similar way, the apparent motion we see in a movie theater is another type of filling-in by neural activity in V1.
Linking brain activity and visual experience
Neural correlates of object recognition: in binocular rivalry, activity in the fusiform face area and the parahippocampal place area is closely linked to the observer's awareness of faces and houses. Other studies use visually masked objects that can only just be recognized. The Mooney face shown in the figure can be recognized in its upright orientation but is hard to recognize in other orientations. When the objects are recognized, activity in the ventral temporal region is greater, while activity in V1 shows no difference.

Manipulations of visual awareness
To find causal relations between activity in various brain regions, it is useful to stimulate a selected brain area directly with electrical impulses. One way is to use implants, for instance in V1. Another is transcranial magnetic stimulation (TMS): rapidly generating a magnetic field outside the head to induce electrical activity on the cortical surface. Patients report various experiences, including an 'out of body experience', seeing their own body from above.

Manipulations of visual awareness
Unconscious perception: we use this term when subjects report not seeing a stimulus, but their behavior or brain activity suggests that specific information about the unperceived stimulus was indeed processed by the brain. When two different stimuli are flashed in quick succession, the visual system can no longer separate them; instead, people perceive a mix, a fused blend of the two images. Various brain areas may still respond to the individual images without the subject being aware of seeing them.

Manipulations of visual awareness
For instance, a quick presentation of a red square followed by a green square can be perceived as a yellow square. Presenting images of a house and a face in complementary colors to different eyes has the same effect: one of them is not seen.
However, the brain still responds to these unseen patterns: the fusiform face area (FFA) to the face and the parahippocampal place area (PPA) to the house.

Summary
Vision is our most important sensory modality. We discussed the functional properties of neurons as visual signals travel from the retina to the primary visual cortex and onward to higher areas in the dorsal and ventral visual pathways. Progressing up the visual pathway, receptive fields gradually become larger and respond to more complex stimuli, following the hierarchical organization of the visual system. V1 supports conscious vision and provides visual features such as orientation, motion and binocular disparity. V4 is important for color perception; MT is important for motion perception. Damage to the dorsal pathway leads to optic ataxia and neglect; damage to ventral temporal cortex leads to impairments in object or face recognition. Within ventral temporal cortex, some regions such as LOC play a general role in object recognition, while others such as FFA and PPA are more specialized.

Attention model
Model attn_simple.proj, from http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_AttnSimple
Stimuli: single activations in one of 7 places, for two objects (cue and target). There are 3 layers of increasing invariance; each element of a higher layer combines 3 lower ones, so V1 is 2x7, Spat1 and Obj1 are 2x5, Spat2 and Obj2 are 2x3, and the output is 2x1. Reaction time: the time needed for the activity of the target output connected with Obj2 to reach 0.6. Spat2 reacts only to location.

Exploring the model
r.wt shows the connections. The control panel has several scaling parameters: spat_obj = 2, the weight scaling spat => obj; obj_spat = 0.5 (not shown); v1_spat = 2, stronger than v1_obj; light noise, noise_var = 0.0005; cue_dur = 200, the number of cycles during which the cue is presented before the target. Three situations for Multi_objs: a) two different objects, b) two identical objects, c) two different objects in the same place.
act; step through all the events several times. View Graph_log and Run: recognition of overlapping elements is generally slower. View text_log; view batch_text_log and run Batch.

Posner task
env_type std_Posner; view the events: 0 = target only, 1 = cue on the left and target on the left, 2 = cue on the left and target on the right. Activation is not zeroed after the presentation of the first stimulus, only after the whole group. Display on, clear the graph log, Step. Batch repeats 10x. From the graph: how does the network shorten reaction time on the same side, and how does it lengthen it on the opposite side? Test spat_obj = 1 and v1_spat = 1.5, 1. Change env_type to Close_Posner and check the effects.

Simple model of the Posner task
Object recognition times; normalization scales the results to the average adult time.

Group                        Valid    Invalid   Difference
Adults                       350 ms   390 ms     40 ms
Elderly                      540      600        60
Patients                     640      760       120
Elderly normalized (0.65)    350      390        40
Patients normalized (0.55)   350      418        68

Lesion effects
Even after normalization, patients with lesions have significantly longer times in the Posner task, while the elderly after normalization show the same differences as normal adults. Lesion in the model: env_type Std_Posner, Lesion, lesion_lay = Spat1_2 to impair both levels; number of locations = half, number of units = half, i.e. 1 of 2. Check (r.wt) that the weights were zeroed: two elements in the right corner of Spat1, and one in the upper right corner of Spat2. Batch to see the effect.

Lesions reversed
Now reverse the task and switch attention from the lesioned side to the other side. Set env_type to Reverse_Posner: the differences are significantly smaller (note the different scale). Why? The normal side competes easily with the damaged side, so the differences decrease, in accord with the patient observations.
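The normalization in the Posner-task table amounts to a single scale factor per group, chosen so that each group's valid-cue time matches the 350 ms adult baseline. A quick check of the arithmetic (using the factors rounded to two decimals, as in the table):

```python
# Normalization check for the Posner-task reaction times.
adult_valid = 350.0

groups = {                      # (valid ms, invalid ms), values from the table
    "elderly": (540.0, 600.0),
    "patients": (640.0, 760.0),
}

for name, (valid, invalid) in groups.items():
    scale = round(adult_valid / valid, 2)   # 0.65 for elderly, 0.55 for patients
    norm_invalid = invalid * scale
    cost = norm_invalid - adult_valid       # the normalized validity effect
    print(f"{name}: scale={scale:.2f}, invalid={norm_invalid:.0f} ms, cost={cost:.0f} ms")
    # elderly: 390 ms, cost 40 ms; patients: 418 ms, cost 68 ms
```

The elderly keep the normal ~40 ms validity effect after scaling, while the patients' 68 ms cost survives normalization, which is the dissociation the slide draws between the two groups.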
Bilateral lesions: Std_Posner, full number of locations, half the number of units, Batch. The effect is clear, but weaker than for unilateral lesions.

Full lesion
Unilateral neglect with extensive damage. Simulation: Multi_obj, half the locations, full number of units, Run. The network has a tendency to focus attention on the undamaged side regardless of the presentation, neglecting half the field. Patients with unilateral neglect fail to perceive one side of space only when the other side contains a strong stimulus competing for attention (the phenomenon of extinction). Similar neglect occurs for Std_Posner.

Delay effects
If there is a delay of about 500 ms after the cue, an "inhibition of return" phenomenon appears: the times partially reverse, and a change of location causes a faster reaction! This can be simulated by lengthening the cue presentation time and allowing for neuron fatigue (accommodation). Defaults, No_lesion, env_type = Std_Posner, accommodate; change cue_dur from 75 to 200 in steps of 25.

Object-based attentional effects
Attentional effects arising from the interaction of location and object recognition are similar to those arising from the recognition of competing objects (object-based attention). env_type Obj_attn, View Events. The events: 2 objects without cues; a cue in the central location, then two objects in the central area (the network should focus on the first); and, last, a cue and 2 objects in the same place, where yellow = greater activation. Defaults, Step: the first object biases the selection even if the second object is more active.

Summary
Attention effects appear naturally in the model as a result of competition through inhibition, interconnection, and the necessity of compromise. Similar effects can be seen in different cortical mechanisms. Some postulated psychological mechanisms (such as an explicit disengagement of attention) turn out to be unnecessary.
Attention effects provide specific information that allows the models to be fine-tuned to match experimental results, and then used for further predictions; there is also a great deal of neurophysiological data concerning attention. Limits of this model: no effects connected with the thalamus (Wager, O'Reilly), and a very simple representation of objects (a single feature).

Complex recognition model
Model objectrec_multiobj.proj.gz, Chapt. 8.6.1. This model has two extra layers: Spat1, connected with V1, and Spat2, connected with V2. The Spat1 layer has an excitatory self-connection, allowing it to focus on one object. The Target layer shows which image was chosen and whether it matches the output.

Two objects in different places
BuildNet; r.wt to check the connections and the receptive fields in V1. LoadNet; r.wt to check them after training. Spat1 responds to 8x8 fields in V1, with the right edge wrapped onto the left; Spat2 responds to 16x16 fields in V2. Two objects (perpendicular lines) with the same activation appear in different locations. StepTest: object #12 is presented in the lower left corner. There are initial oscillations, but one of the two locations, and the object found there, gradually wins; this feeds back to the lower layers, and in V1 only one activation remains. View Test_log: we can see errors in recognition, because the objects are small and the simultaneous activation of V1 introduces confusion; there is no saccade mechanism that would produce sequential rather than simultaneous activation. Reducing fm_spat1_scale from 1 to 0.01 produces simultanagnosia: it is not possible to recognize two objects, only one!

Influence of spatial location
Spatial activation can at most modulate the recognition process; otherwise we would know where, but not what. This is ensured by inhibition and competition: recognition is a combination of spatial activation and strengthened features in the lower layers. To switch between objects, we turn on accommodation of the neurons.
Accommodate, InitStep, TestStep. After the neurons fatigue on the first object, attention moves to the second one, following layer Spat1. Errors are frequent; this is not yet a good control mechanism. Object-based attention can also be seen in this model: View, Test_Process_ctrl, change the environment from vis_sim_test to obj_attn_test (at the bottom of ScriptEnv); Apply, Reinit, Step. The network recognizes object 17; on the next Step the network recognizes 12 and 17, and stays with 17.

Some answers
Why does the primary visual cortex respond to oriented edges? Because correlational learning in a natural environment leads to this type of detector.
Why does the visual system separate information into the dorsal pathway and the ventral pathway? Because the signal transformations extract qualitatively different information, strengthening some contrasts and weakening others.
Why does damage to the parietal cortex lead to disorders of spatial orientation and attention (neglect)? Because attention is an emergent property of systems with competition.
How do we recognize objects in different locations and orientations, at different distances, with different images projected on the retina? Thanks to transformations that create distributed representations based on increasingly complex and spatially invariant features.