Cue Reliabilities and Cue Combinations
Robert Jacobs
Department of Brain and Cognitive Sciences
University of Rochester

Visual Cue Combination
• Q: Why combine information from multiple cues?
• Q: That is, in what ways might judgments based on multiple cues be superior to judgments based on individual cues?

Weak Versus Strong Fusion
• Terminology from Clark and Yuille (1990)
• Weak fusion: form a linear combination of the judgments based on individual cues
• Strong fusion: form a nonlinear combination of the judgments based on individual cues

Strong Fusion
• Q: What are some types of nonlinear combination that may be worth considering?

Cue Combination
• Within visual modality:
  – Much work has been done on visual depth perception
  – Q: Do the lessons learned in this work apply to other visual judgments?
• Across sensory modalities:
  – Q: Are the cue combination strategies used by the visual system also used to integrate information from multiple sensory systems?

3-D Visual Perception is Easy
• Why is seeing the visual world in three dimensions so easy?
• Many cues to visual depth and shape:
  – object rotation (kinetic depth effect)
  – observer motion (motion parallax)
  – binocular vision (stereopsis)
  – shading gradients
  – texture gradients in retinal images
  – perspective (3-D world → 2-D retina)
• Historically, individual cues have been studied in isolation (Marr, 1982)

Visual Cue Combination
• However, no single cue:
  – is necessary for depth or shape perception
  – dominates our perception in all situations
  – is capable of supporting perception with the robustness and accuracy demonstrated by observers in natural settings
• Need to study the use of multiple visual cues
• Recent emphasis on studying how observers combine information from multiple visual cues
  – key issue: cue reliability

Two Views of Cue Reliability
• What is cue reliability?
  – a cue is reliable if the distribution of inferences given that cue has a small variance
  – a cue is reliable if the inferences based on that cue are consistent with the inferences based on other cues

Visual Cues are Ambiguous
• Inverse optics problem
• Cue ambiguity

Linear Cue Combination
• Linear rule is simple
• Linear rule has received considerable empirical support
• Example: Object depth from motion and texture cues:
  $d(m, t) = w_M \, d_M(m) + w_T \, d_T(t)$
• How do we compute motion and texture weights?

Kalman Filter
• Statistically optimal cue combination rule (given certain mathematical assumptions)
• A highly reliable cue is one for which P(depth | cue) has a small variance
• A less reliable cue is one for which P(depth | cue) has a large variance
• Highly reliable cues are assigned a large weight
• Less reliable cues are assigned a small weight

Kalman Filter
• $d_M^*$ = optimal depth estimate based on motion cue: this is the depth that maximizes P(depth | motion)
• $d_T^*$ = optimal depth estimate based on texture cue: this is the depth that maximizes P(depth | texture)
• $d^*$ = optimal depth estimate based on both cues: this is the depth that maximizes P(depth | m, t)

Kalman Filter
• $\sigma_M^2$ = variance of P(depth | motion)
• $\sigma_T^2$ = variance of P(depth | texture)

Kalman Filter
  $d^* = w_M \, d_M^* + w_T \, d_T^*$
  $w_M = \dfrac{1/\sigma_M^2}{1/\sigma_M^2 + 1/\sigma_T^2}$
  $w_T = \dfrac{1/\sigma_T^2}{1/\sigma_M^2 + 1/\sigma_T^2}$

Do Observers Combine Cues In a Statistically Optimal Fashion?
• Young, Landy, and Maloney (1993):
  – Subjects view a pair of cylinders defined by motion and texture cues
  – Subjects judge which cylinder is greater in depth
  – Reliability of either motion or texture cue was reduced by corrupting the cue with noise
  – When reliability of motion cue was decreased, subjects tended to use the texture cue more
  – When reliability of texture cue was decreased, subjects tended to use the motion cue more

Do Observers Combine Cues In a Statistically Optimal Fashion?
• Knill (1998):
  – Texture patterns contain three nearly independent cues to the orientation of a planar surface
    • Perspective scaling of the texture elements
    • Projective foreshortening of the texture elements
    • Density of the texture elements
• Subjects viewed two simulated textured planar surfaces
• Subjects judged which of the two appeared more slanted
• Subjects relied primarily on the foreshortening cue, secondarily on the scaling cue, and did not make significant use of the density cue
• When the scaling cue was made less reliable, subjects tended to use the foreshortening cue more
• When the foreshortening cue was made less reliable, subjects tended to use the scaling cue more

Do Observers Combine Cues In a Statistically Optimal Fashion?
• Ernst and Banks (2002): Quantitative study
• Stage One: Subjects feel two raised ridges and judge which is taller
  – Estimate variance of P(height | haptic)
• Stage Two: Subjects view two raised ridges defined by a stereo cue and judge which is taller
  – Multiple stereo noise conditions
  – Estimate variance of P(height | stereo) for each noise condition
• Based on results of Stages One and Two, use Kalman filter model to predict subjects’ responses when both haptic and stereo cues are available
• Stage Three: Subjects view and feel two raised ridges and judge which is taller
• Predictions and Results:
  – Low stereo noise
  – Medium stereo noise
  – High stereo noise

Cue Reliabilities and Cue Correlations
• A cue is regarded as reliable if the inferences based on that cue are consistent with the inferences based on other cues in the environment
• Hans Wallach (1985): induced motion and the moon illusion
• Cue reliabilities and visual learning

Visual and Haptic Percepts
• Example: Bishop George Berkeley (1709)
  Perception of depth results from associations between visual cues and sensations of touch and motor movement
  – “Touch educates vision”
  This idea has not been seriously evaluated:
  – visual capture
  – difficult to experimentally
    test in a direct and detailed manner

Research Question
Question: Do observers adapt their visual cue combination strategies on the basis of consistencies (and inconsistencies) between visual and haptic percepts?

Visual Stimuli
• Horizontal cross-section of cylinder:
  – Circular: cylinder equally deep as wide
  – Elliptical: cylinder more deep than wide
  – Elliptical: cylinder less deep than wide
• Visual cues to cylinder shape:
  – Texture cue: homogeneous and isotropic texture mapped to surface of each cylinder
  – Motion cue: texture elements move horizontally along the surface of a cylinder (constant flow field)

Texture Cue
[Movie 1] [Movie 2] [Movie 3]

Cue Conflict in Visual Stimuli
• Cue conflict:
  – Computer graphics manipulation to independently manipulate the shapes indicated by texture and motion cues (Young, Landy, and Maloney, 1993)
• Example:
  – Texture cue: circular cylinder
  – Motion cue: elliptical cylinder that is more deep than wide

Virtual Reality Environment
[Figures]

Visual Cue Combination Model
• Linear model:
  $d_{T,M}(t, m) = w_T \, d_T(t) + w_M \, d_M(m)$, with $w_T \ge 0$, $w_M \ge 0$, $w_T + w_M = 1$
  $d_T(t)$ = depth percept based on texture
  $d_M(m)$ = depth percept based on motion
  $w_T$ = linear coefficient associated with texture
  $w_M$ = linear coefficient associated with motion
• Based on results of test trials, it is possible to estimate $w_T$ and $w_M$

Procedure
• Training procedure:
  – view cylinder
  – adjust thumb and index fingers to indicate depth
  – grasp cylinder
  – judge whether visual depth was less than, equal to, or greater than haptic depth (no feedback)

Procedure
• Visual test:
  – Subjects viewed two sequentially presented displays of cylinders (one from set M, one from set T)
  – Subjects judged which of the two cylinders was greater in depth
  – Allowed us to estimate cue weights $w_T$ and $w_M$

Procedure
• Motor test:
  – Subjects viewed a cylinder
  – Subjects adjusted their thumb and index fingers to indicate the perceived cylinder depth
  – Allowed us
    to estimate cue weights $w_T$ and $w_M$

Visual test: 4 out of 4 subjects have larger motion weights after motion-relevant training

Motor test: 3 out of 4 subjects have larger motion weights after motion-relevant training

Summary
• Observers can use haptic percepts as a standard against which they can evaluate the relative reliabilities of available visual cues.
• Observers can adapt their visual cue combination strategies on the basis of consistencies between visual and haptic percepts so as to place greater emphasis on depth information from visual cues that are consistent with haptic percepts.

Issues for Future Research
• Q: In addition to cue combination, what other things can one do when information from multiple visual cues is available?
• A: Cue recalibration
• What else?

Issues for Future Research
• Context-Dependencies:
  – Example: People rely on depth-from-stereo info more than depth-from-motion info when viewing nearby objects but not when viewing distant objects
• Contributes to robustness and flexibility of observers’ cue combinations
• What are the limits of these context-dependencies?
• Are these limits based on properties of the visual system?

Issues for Future Research
• Statistical metrics: cue variances and cue correlations
• Is the mechanism that computes cue reliabilities:
  – General purpose?
  – Domain specific?

Issues for Future Research
• Wallach (1985) speculated that:
  – There is one primary source of info in every perceptual domain, which is usable and not modifiable by experience
  – Other cues are acquired later through correlation with the innate process
• Which cues are usable innately, and which are acquired on the basis of experience?

Issues for Future Research
• What relationships exist between developmental events that occur in infancy and adult mechanisms for assessing cue reliabilities?
• Development of visual sensitivities to depth cues:
  – Motion parallax (at birth, or nearly so)
  – Binocular disparities (about four months of age)
  – Pictorial cues (about eight months of age)

Issues for Future Research
• Neuroscientific underpinnings:
  – Lateral occipital complex (LOC) preferentially activated by visual objects and haptic objects
  – Shows less activity for visual and haptic scrambled objects and for visual and haptic textures
  – Region appears to be a multimodal object-related network
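
The reliability-weighted linear rule from the Kalman Filter slides (each cue weighted by its inverse variance, with the weights normalized to sum to one) can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in any of the studies cited. The function names `optimal_weights` and `combine` and the example numbers are hypothetical. The combined-variance formula is the standard result for independent Gaussian cues and is an added assumption; it does not appear on the slides.

```python
def optimal_weights(var_m, var_t):
    """Return (w_M, w_T): each cue's inverse variance, normalized to sum to 1."""
    if var_m <= 0 or var_t <= 0:
        raise ValueError("variances must be positive")
    rel_m = 1.0 / var_m  # reliability of the motion cue
    rel_t = 1.0 / var_t  # reliability of the texture cue
    total = rel_m + rel_t
    return rel_m / total, rel_t / total


def combine(d_m, var_m, d_t, var_t):
    """Combine the single-cue estimates d_M* and d_T* into d*.

    Also returns the variance of the combined estimate, assuming
    independent Gaussian cues (an assumption, not taken from the slides).
    """
    w_m, w_t = optimal_weights(var_m, var_t)
    d_star = w_m * d_m + w_t * d_t
    var_star = 1.0 / (1.0 / var_m + 1.0 / var_t)
    return d_star, var_star


# Hypothetical example: motion says depth 10 (variance 1),
# texture says depth 14 (variance 4).
d_star, var_star = combine(d_m=10.0, var_m=1.0, d_t=14.0, var_t=4.0)
print(d_star, var_star)  # about 10.8 and 0.8
```

In this example the motion cue is four times as reliable as the texture cue, so it receives weight 0.8 and the combined estimate lies closer to the motion-based depth. Under the stated assumptions the combined variance is smaller than either single-cue variance, which is one answer to the opening question of why combining cues can be better than relying on any one of them.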