Cue Reliabilities and Cue Combinations
Robert Jacobs
Department of Brain and Cognitive Sciences
University of Rochester
Visual Cue Combination
• Q: Why combine information from multiple cues?
• Q: That is, in what ways might judgments based on
multiple cues be superior to judgments based on
individual cues?
Weak Versus Strong Fusion
• Terminology from Clark and Yuille (1990)
• Weak fusion: form a linear combination of the judgments
based on individual cues
• Strong fusion: form a nonlinear combination of the
judgments based on individual cues
Strong Fusion
• Q: What are some types of nonlinear combination that may
be worth considering?
Cue Combination
• Within visual modality:
– Much work has been done on visual depth perception
– Q: Do the lessons learned in this work apply to other
visual judgments?
• Across sensory modalities:
– Q: Are the cue combination strategies used by the
visual system also used to integrate information from
multiple sensory systems?
3-D Visual Perception is Easy
• Why is seeing the visual world in three dimensions so easy?
• Many cues to visual depth and shape:
– object rotation (kinetic depth effect)
– observer motion (motion parallax)
– binocular vision (stereopsis)
– shading gradients
– texture gradients in retinal images
– perspective (3-D world → 2-D retina)
• Historically, individual cues have been studied in isolation
(Marr, 1982)
Visual Cue Combination
• However, no single cue:
– is necessary for depth or shape perception
– dominates our perception in all situations
– is capable of supporting perception with the robustness
and accuracy demonstrated by observers in natural
settings
• Need to study the use of multiple visual cues
• Recent emphasis on studying how observers combine
information from multiple visual cues
– key issue: cue reliability
Two Views of Cue Reliability
• What is cue reliability?
– a cue is reliable if the distribution of inferences given
that cue has a small variance
– a cue is reliable if the inferences based on that cue are
consistent with the inferences based on other cues
Visual Cues are Ambiguous
• Inverse optics problem
• Cue ambiguity
Linear Cue Combination
• Linear rule is simple
• Linear rule has received considerable empirical support
• Example: Object depth from motion and texture cues:
d(m, t) = w_M d_M(m) + w_T d_T(t)
• How do we compute motion and texture weights?
Kalman Filter
• Statistically optimal cue combination rule (given certain
mathematical assumptions)
• A highly reliable cue is one for which P(depth | cue) has a
small variance
• A less reliable cue is one for which P(depth | cue) has a
large variance
• Highly reliable cues are assigned a large weight
• Less reliable cues are assigned a small weight
Kalman Filter
• d_M* = optimal depth estimate based on the motion cue
--- this is the depth that maximizes P(depth | motion)
• d_T* = optimal depth estimate based on the texture cue
--- this is the depth that maximizes P(depth | texture)
• d* = optimal depth estimate based on both cues
--- this is the depth that maximizes P(depth | m, t)
Kalman Filter
• σ_M² = variance of P(depth | motion)
• σ_T² = variance of P(depth | texture)
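For completeness, here is a brief sketch (not on the original slides) of why these variances determine the optimal weights, writing d for depth and assuming Gaussian single-cue distributions, conditionally independent cues, and a flat prior:

$$
P(d \mid m, t) \;\propto\; P(d \mid m)\, P(d \mid t)
\;\propto\; \exp\!\left(-\frac{(d - d_M^*)^2}{2\sigma_M^2}\right)\exp\!\left(-\frac{(d - d_T^*)^2}{2\sigma_T^2}\right)
$$

Setting the derivative of the log of this product to zero gives

$$
d^* \;=\; \frac{(1/\sigma_M^2)\, d_M^* + (1/\sigma_T^2)\, d_T^*}{1/\sigma_M^2 + 1/\sigma_T^2},
$$

a precision-weighted average, which is exactly the rule on the next slide.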
Kalman Filter
d* = w_M d_M* + w_T d_T*

w_M = (1/σ_M²) / (1/σ_M² + 1/σ_T²)

w_T = (1/σ_T²) / (1/σ_M² + 1/σ_T²)
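A minimal numerical sketch of this inverse-variance weighting rule; the function and the example values below are illustrative, not taken from the lecture:

```python
def combine_cues(d_m, var_m, d_t, var_t):
    """Reliability-weighted (inverse-variance) combination of single-cue
    depth estimates: d_m from motion (variance var_m) and d_t from texture
    (variance var_t), as in the Kalman-filter rule above."""
    w_m = (1.0 / var_m) / (1.0 / var_m + 1.0 / var_t)
    w_t = (1.0 / var_t) / (1.0 / var_m + 1.0 / var_t)
    d_star = w_m * d_m + w_t * d_t
    var_star = 1.0 / (1.0 / var_m + 1.0 / var_t)  # variance of the combined estimate
    return d_star, var_star

# Example: a reliable motion cue (small variance) receives most of the weight.
d_star, var_star = combine_cues(d_m=10.0, var_m=1.0, d_t=14.0, var_t=4.0)
print(d_star, var_star)  # 10.8 0.8
```

Note that the combined variance (0.8 in this example) is smaller than either single-cue variance, which is one concrete sense in which judgments based on multiple cues can be superior to judgments based on any individual cue.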
Do Observers Combine Cues In a
Statistically Optimal Fashion?
• Young, Landy, and Maloney (1993):
– Subjects view a pair of cylinders defined by motion and
texture cues
– Subjects judge which cylinder is greater in depth
– Reliability of either motion or texture cue was reduced
by corrupting the cue with noise
– When reliability of motion cue was decreased, subjects
tended to use the texture cue more
– When reliability of texture cue was decreased, subjects
tended to use the motion cue more
Do Observers Combine Cues In a
Statistically Optimal Fashion?
• Knill (1998):
– Texture patterns contain three nearly independent cues
to the orientation of a planar surface
• Perspective scaling of the texture elements
• Projective foreshortening of the texture elements
• Density of the texture elements
• Subjects viewed two simulated textured planar surfaces
• Subjects judged which of the two appeared more slanted
• Subjects relied primarily on the foreshortening cue,
secondarily on the scaling cue, and did not make
significant use of the density cue
• When the scaling cue was made less reliable, subjects
tended to use the foreshortening cue more
• When the foreshortening cue was made less reliable,
subjects tended to use the scaling cue more
Do Observers Combine Cues In a
Statistically Optimal Fashion?
• Ernst and Banks (2002): Quantitative study
• Stage One: Subjects feel two raised ridges and judge which
is taller
– Estimate variance of P(height | haptic)
• Stage Two: Subjects view two raised ridges defined by a
stereo cue and judge which is taller
– Multiple stereo noise conditions
– Estimate variance of P(height | stereo) for each noise
condition
• Based on the results of Stages One and Two, use the Kalman
filter model to predict subjects’ responses when both haptic and
stereo cues are available (see the sketch below)
• Stage Three: Subjects view and feel two raised ridges and
judge which is taller
• Predictions and Results:
– Low stereo noise
– Medium stereo noise
– High stereo noise
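A small sketch of the prediction step, assuming hypothetical variance estimates from Stages One and Two (the numbers below are placeholders, not the published data):

```python
# Predict the visual-haptic combination from single-cue variances, in the spirit
# of the Ernst and Banks (2002) analysis described above.
var_haptic = 4.0                          # variance of P(height | haptic), Stage One (placeholder)
stereo_noise_variances = [0.5, 2.0, 8.0]  # variances of P(height | stereo), Stage Two (placeholders)

for var_stereo in stereo_noise_variances:
    w_stereo = (1.0 / var_stereo) / (1.0 / var_stereo + 1.0 / var_haptic)
    w_haptic = 1.0 - w_stereo
    var_combined = 1.0 / (1.0 / var_stereo + 1.0 / var_haptic)
    print(f"stereo variance {var_stereo:4.1f}: "
          f"w_stereo = {w_stereo:.2f}, w_haptic = {w_haptic:.2f}, "
          f"combined variance = {var_combined:.2f}")
```

As stereo noise increases, the predicted weight shifts from the stereo cue to the haptic cue; comparing these predictions with the Stage Three judgments under low, medium, and high stereo noise is the test of statistical optimality.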
Cue Reliabilities and Cue Correlations
• A cue is regarded as reliable if the inferences based on that
cue are consistent with the inferences based on other cues
in the environment
• Hans Wallach (1985): induced motion and the moon
illusion
• Cue reliabilities and visual learning
Visual and Haptic Percepts
• Example: Bishop George Berkeley (1709):
Perception of depth results from associations between
visual cues and sensations of touch and motor movement
– “Touch educates vision”
• This idea has not been seriously evaluated, because of:
– visual capture
– the difficulty of testing it experimentally in a direct and
detailed manner
Research Question
Question:
Do observers adapt their visual cue combination strategies on
the basis of consistencies (and inconsistencies) between
visual and haptic percepts?
Visual Stimuli
• Horizontal cross-section of cylinder:
– Circular: cylinder as deep as it is wide
– Elliptical: cylinder deeper than it is wide
– Elliptical: cylinder shallower than it is wide
• Visual cues to cylinder shape:
– Texture cue: homogeneous and isotropic texture mapped
to surface of each cylinder
– Motion cue: texture elements move horizontally along
the surface of a cylinder (constant flow field)
Texture Cue
[Movies 1–3: demonstration stimuli]
Cue Conflict in Visual Stimuli
• Cue conflict:
– Computer graphics techniques are used to independently
manipulate the shapes indicated by the texture and motion
cues (Young, Landy, and Maloney, 1993)
• Example:
– Texture cue: circular cylinder
– Motion cue: elliptical cylinder that is deeper than it is wide
Virtual Reality Environment
Visual Cue Combination Model
• Linear model:
d_{T,M}(t, m) = w_T d_T(t) + w_M d_M(m)

w_T ≥ 0, w_M ≥ 0, w_T + w_M = 1

d_T(t) = depth percept based on texture
d_M(m) = depth percept based on motion
w_T = linear coefficient associated with texture
w_M = linear coefficient associated with motion
• Based on results of test trials, it is possible to estimate
w_T and w_M
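A simplified sketch of how the weights could be estimated from cue-conflict test trials under this linear model. The actual experiment used two-alternative depth comparisons; here, for illustration only, assume we have measured for each conflict stimulus the depth of a cue-consistent stimulus that appears equal to it (a point of subjective equality). All variable names and numbers are hypothetical:

```python
import numpy as np

# Under the linear model, perceived depth of a conflict stimulus is
#   d = w_T * t + (1 - w_T) * m,
# where t and m are the depths signaled by the texture and motion cues.
texture_depth = np.array([10.0, 10.0, 12.0, 12.0])  # depth signaled by texture cue
motion_depth  = np.array([14.0, 16.0, 16.0, 18.0])  # depth signaled by motion cue
pse           = np.array([12.9, 14.1, 14.7, 16.3])  # matched cue-consistent depth

# Rearranging: d - m = w_T * (t - m). Solve for w_T by least squares.
x = texture_depth - motion_depth
y = pse - motion_depth
w_T = float(np.dot(x, y) / np.dot(x, x))
w_M = 1.0 - w_T
print(f"w_T = {w_T:.2f}, w_M = {w_M:.2f}")  # roughly w_T = 0.30, w_M = 0.70
```

A fuller treatment would fit psychometric functions to the two-alternative judgments collected in the visual and motor tests, but the underlying logic of recovering the weights from cue-conflict stimuli is the same.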
Procedure
• Training procedure:
– view cylinder
– adjust thumb and index fingers to indicate depth
– grasp cylinder
– judge whether visual depth was less than, equal to, or
greater than haptic depth (no feedback)
Procedure
• Visual test:
– Subjects viewed two sequentially presented displays of
cylinders (one from set M, one from set T)
– Subjects judged which of the two cylinders was greater
in depth
– Allowed us to estimate the cue weights w_T and w_M
Procedure
• Motor test:
– Subjects viewed a cylinder
– Subjects adjusted their thumb and index fingers to
indicate the perceived cylinder depth
– Allowed us to estimate the cue weights w_T and w_M
Visual test: 4 out of 4 subjects have larger motion
weights after motion-relevant training
Motor test: 3 out of 4 subjects have larger motion
weights after motion-relevant training
Summary
• Observers can use haptic percepts as a standard against
which they can evaluate the relative reliabilities of
available visual cues.
• Observers can adapt their visual cue combination strategies
on the basis of consistencies between visual and haptic
percepts so as to place greater emphasis on depth
information from visual cues which are consistent with
haptic percepts.
Issues for Future Research
• Q: In addition to cue combination, what are other sorts of
things that one can do when information from multiple
visual cues is available?
• A: Cue recalibration
• What else?
Issues for Future Research
• Context-Dependencies:
– Example: People rely on depth-from-stereo info more
than depth-from-motion info when viewing nearby
objects but not when viewing distant objects
• Contributes to robustness and flexibility of observers’ cue
combinations
• What are the limits of these context-dependencies?
• Are these limits based on properties of the visual system?
Issues for Future Research
• Statistical metrics: cue variances and cue correlations
• Is the mechanism that computes cue reliabilities:
– general purpose, or
– domain specific?
Issues for Future Research
• Wallach (1985) speculated that:
– there is one primary source of information in every
perceptual domain, which is innately usable and not
modifiable by experience
– other cues are acquired later through correlation with
this innate process
• Which cues are usable innately, and which are acquired on
the basis of experience?
Issues for Future Research
• What relationships exist between developmental events
that occur in infancy and adult mechanisms for assessing
cue reliabilities?
• Development of visual sensitivities to depth cues:
– Motion parallax (at birth, or nearly so)
– Binocular disparities (about four months of age)
– Pictorial cues (about eight months of age)
Issues for Future Research
• Neuroscientific underpinnings:
– Lateral occipital complex (LOC) is preferentially
activated by both visual and haptic objects
– LOC shows less activity for visual and haptic scrambled
objects and for visual and haptic textures
– The region appears to be part of a multimodal
object-related network