EyetrackingWorkshop

Eye tracking
Applications within cognitive science
Dr. Christa van Mierlo
Why is eye tracking used in cognitive science?
Scan patterns give delayed information about the mental
processes that are developing in a person’s mind and
reveal what visual information is (going to be) used by
these processes.
Frequently studied EM components
•
Fixations
– Gaze stays fixed on one position
– Intake of new visual information
– Planning of new eye movement
•
Saccades
– Fast movement of both eyes in the same direction
– Processing of new visual input is limited:
low spatial frequencies attenuated
high spatial frequencies unaffected.
– Top velocity proportional to amplitude
Visual Search
•
The sometimes difficult process of finding a target among
distractors in often cluttered visual environments.
•
Physical and cognitive processing limitations can prevent
us from instantly recognizing the presence of a target item
in a single glance (e.g. a large number of features shared
with distractors, or fuzzy target specifications)
•
This can be overcome by focusing attention:
– Bottom up
– Top down
Bottom-up factors drawing attention
– onsets (e.g., Theeuwes, Kramer, Hahn, Irwin, &
Zelinsky, 1999; Yantis & Jonides, 1984)
– unique colors (e.g., Theeuwes, 1994; Theeuwes &
Burger, 1998)
Even when the location of the target is known, highly
salient features that are known not to be associated
with the target can still capture attention (Christ &
Abrams, 2006).
Top-down factors focusing attention
•
Target specificity: the number of features the candidate
shares with the target (e.g., Folk, Remington, & Johnston,
1992; Folk, Remington, & Wright, 1994)
•
Memory (Boot, McCarley, Kramer, & Peterson, 2004;
Brockmole & Henderson, 2006; Peterson & Kramer, 2001).
Four Eye Tracking studies within Visual Search
– The effects of target template specificity on visual
search in real-world scenes: Evidence from eye
movements (Malcolm & Henderson, 2009)
– Comparing eye movements to detected vs. undetected
target stimuli in an identity search task (Jacob &
Hochstein 2009)
– Stable individual differences in search strategy?: The
effects of task demands and motivational factors on
scanning strategy in visual search (Boot et al., 2009)
– Where to look next? Eye movements reduce local
uncertainty (Renninger et al. 2007)
The effects of target template specificity on visual search
in real-world scenes
Malcolm & Henderson (JOV 2009)
Searching is faster for more explicitly specified targets. Why?
– Does this affect the activation map that is used to select probable
target regions for fixation?
– Does this allow for faster evaluation of a target candidate?
– Or does it simply allow the search to begin faster?
Method
Subjects had to look for a specific object within a visual
scene.
– Target could be specified by a word or a picture.
Pictures specify the target template more elaborately
than words.
– To manipulate the time that the subject had to build up a
target template and keep it salient in memory, they
manipulated the SOA between the cue presentation and
the onset of visual scene (short/long).
– To manipulate target familiarity, the target specification
was either shown 4 times to the subject prior to the
experiment or not at all.
Analysis
Divide scanpaths into different epochs:
– timing of first saccade (= time it takes to determine the
first possible candidate for target)
– time it takes to find target (all saccades and fixations up
to the first fixation on the target, representing processes
in which target candidates are selected and rejected)
– time it takes to decide that the object really is the target
(verification)
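The three epochs above can be sketched in code. This is a minimal illustration, not the authors' analysis: the `Fixation` record, its field names, and the choice to end the verification epoch at the response time are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Fixation:
    start_ms: float   # fixation onset, relative to scene onset (hypothetical field)
    end_ms: float     # fixation offset, relative to scene onset (hypothetical field)
    on_target: bool   # True if the fixation lands in the target region


def split_epochs(fixations: List[Fixation],
                 response_ms: float) -> Optional[Tuple[float, float, float]]:
    """Return (initiation, scanning, verification) durations in ms.

    Returns None if the target is never fixated after the initial fixation.
    """
    if not fixations:
        return None
    # Search initiation: scene onset until the first saccade is launched,
    # i.e. the end of the initial fixation.
    initiation = fixations[0].end_ms
    # Scanning: from the first saccade up to the first fixation on the target.
    first_hit = next((f for f in fixations[1:] if f.on_target), None)
    if first_hit is None:
        return None
    scanning = first_hit.start_ms - initiation
    # Verification (assumed): first target fixation until the response.
    verification = response_ms - first_hit.start_ms
    return initiation, scanning, verification


example = [
    Fixation(0, 250, False),     # initial fixation
    Fixation(280, 480, False),   # rejected distractor
    Fixation(510, 900, True),    # first fixation on the target
]
epochs = split_epochs(example, response_ms=900)
```

With this definition the three epochs add up to the total search time, so an effect of cue type can be attributed to one epoch at a time.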
Results
Picture rather than word cues resulted in:
– Faster total search times
– Shorter scanning and verification times
•
Fewer regions visited
•
Shorter scanning fixation durations (rejection of
distractors)
Longer SOAs resulted in faster search initiation, but no
interaction with cue type
Discussion
Knowledge of a target’s appearance prior to search
benefits scanning in 2 ways:
– Facilitating the selection of potential target locations
– Decreasing the time that it takes to reject fixated
distractors before moving on to next potential target
Proposed neural mechanism
•
People represent the visual scene in an activity map
•
Search is accelerated by increasing the topographical
activity that is associated with target-similar features and
decreasing the noisy activity of target-irrelevant features,
so that candidate selection and verification of the target
happen more quickly.
Comparing eye movements to detected vs. undetected
target stimuli in an identity search task
Jacob & Hochstein JOV 2009
In conscious search:
– What determines if a target will be found?
– Does conscious detection come before or after concentrated
fixations on the target?
– What is the relation between repeated fixations on the same scene
region and limited WM capacity?
– What in the sequence of fixations reflects or influences ultimate
conscious perception?
Method
•
Find two identical cards among distractors
•
In each set there were two pairs of identical cards instead
of just one
•
Participants were not informed of this
•
After a learning session of 100 trials their eye movements
were measured for 50 trials
Analysis
Compare fixations on detected targets with fixations on undetected
targets
Figure: the detected pair in red, the undetected pair in blue.
Results
•
More and longer fixations on detected items than on
undetected items.
•
Less distance between fixations on detected than on
undetected items.
•
The patterns of fixations are nearly identical up to the point
of approximately 4 fixations before the end of the trial (~1.5
s before the first mouse click). This is true for both long and
short trials.
•
The number of fixations needed for identification is more or
less fixed within a range.
Discussion
So fixations are needed to identify the target; detection is
not an inherent property of the stimulus!
Does the large number of fixations give rise to detection or
is it a result of detection?
•
The fixations just before the mouse click did not always
land on the target cards, indicating that they were not the
result of a verification process.
•
The relatively small increase in cumulative number of
fixations on detected pairs in long searches, implies that
the number of fixations on targets needed for identification
is defined within a certain period of time.
So it seems that the increase in fixations near the target
is necessary for its detection and does not result from
verification!
Why a short burst of fixations near target cards just before
detection?
It may be difficult to keep many cards in working memory at
the same time, so that fixations need to be close to each
other to associate place with identity. Since the sequential
distance decreases when approaching detection, perhaps
a necessary condition for detection is that two cards be
represented concurrently in working memory.
•
Detection may depend on the increase in fixations rising
above some threshold.
•
The point where the slope exceeds a pre-determined
threshold may be regarded as the bifurcation point where
there is a change of state in the search process; a
transition between a first stage of “search in the dark” to a
second stage of “early implicit recognition”.
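The slope-threshold idea can be sketched as follows. This is only an illustration under stated assumptions: the slope is estimated over a sliding window of fixations, and both the window size and the threshold value are hypothetical choices, not the values used by Jacob & Hochstein.

```python
from typing import List, Optional


def bifurcation_index(on_target: List[bool],
                      window: int = 4,
                      threshold: float = 0.5) -> Optional[int]:
    """Index of the first fixation at which the local slope of the cumulative
    count of on-target fixations exceeds `threshold`.

    `on_target[i]` is True if fixation i landed on one of the target cards.
    `window` and `threshold` are illustrative parameters.
    """
    # Cumulative number of fixations on the target pair.
    cumulative = []
    total = 0
    for hit in on_target:
        total += int(hit)
        cumulative.append(total)
    # Slope = increase in on-target fixations per fixation over the window.
    for i in range(window, len(cumulative)):
        slope = (cumulative[i] - cumulative[i - window]) / window
        if slope > threshold:
            return i
    return None


# Sparse target fixations early on, then a burst just before detection.
sequence = [False, False, True, False, False, False, False,
            True, True, True, True]
turning_point = bifurcation_index(sequence)
```

Fixations before the returned index would belong to the first stage ("search in the dark"), fixations from that index onward to the second stage.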
Proposed model
•
Stage 1: Initial search; random fixations on the different
cards in arbitrary order.
•
Stage 2: Implicit (unconscious) recognition of the target
pair, perhaps controlling and guiding eye movements to the
relevant sensed location of these target cards.
•
Stage 3: Insight: Explicit detection with conscious
knowledge of target presence and its location followed by
rapid marking of the two cards.
Stable individual differences in search strategy? The effects of task
demands and motivational factors on scanning strategy in visual
search
Boot et al., JOV 2009
This study seeks to further evaluate and understand individual
differences in visual search behaviour in the context of search tasks in
which poor strategies can have a major impact on performance.
Background
•
In Boot et al. (2006), participants viewed dynamic displays
in which up to 24 dots moved across the display.
•
During some trials a new dot appeared in the display and
the task of the participant was to push a button when this
occurred.
A surprisingly large range in accuracy:
some participants almost always detected the new dot
others missed 50% or more of the onset events
•
The more participants moved their eyes among moving
objects in the display, the fewer targets they detected.
•
When overt searchers were instructed to search covertly,
their performance matched the performance of covert
searchers. Conversely, covert searchers instructed to
search overtly performed just as poorly as overt searchers.
•
This ability to switch strategies suggests that strategy is not
dictated by the size of an individual’s attentional field or
individual differences in visual processing.
Their current study seeks to explore:
–
whether stable individual differences in preference for a
certain scanning strategy might explain maladaptive
scan strategies
– the degree to which strategy might be modulated by
task demands, feedback, motivation and monetary
incentives
Method
Study scanning strategies during:
– dynamic dot detection task
– an efficient search task (a 45° left or right tilted line
among vertical lines)
– an inefficient search task (a tilted T among randomly
tilted Ls that had an offset of the _)
– a change blindness task in which participants searched
for changes in driving scenes (change in colour,
presence or position masked by other changes).
– Change blindness and inefficient search require focal
attention to the target (overt attention).
– Dynamic dot detection task and efficient search task do
not (covert attention).
Analysis
•
As a measure of overt versus covert searchers: average
number of eye movements made per second.
•
averaged across set sizes.
•
correlated across tasks:
If participants use the same scan strategy in different
tasks, regardless of whether or not this strategy is
adaptive, then the rates of eye movements on the
different search tasks should be correlated.
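A minimal sketch of this analysis, assuming each trial is summarised as a (number of saccades, trial duration in seconds) pair. The function names and the example numbers are invented for illustration and are not data from the study.

```python
import math
from typing import List, Sequence, Tuple


def mean_rate(trials: Sequence[Tuple[int, float]]) -> float:
    """Mean saccades per second over (n_saccades, duration_s) trials,
    e.g. pooled across set sizes."""
    rates = [n / d for n, d in trials]
    return sum(rates) / len(rates)


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# One mean saccade rate per participant and task (made-up values).
rate_task_a = [1.0, 2.0, 3.0, 4.0]
rate_task_b = [1.5, 2.5, 3.5, 4.5]
r = pearson_r(rate_task_a, rate_task_b)
```

A high positive `r` across participants would indicate that people who move their eyes a lot in one task also do so in the other, whether or not that helps.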
Predictions
Performance in the dynamic dot detection task has been
shown to be almost exclusively driven by strategy (Boot et
al., 2006).
If the eye movements on this task differ individually but are
similar to those seen on other visual search tasks for
each subject, these differences in scan pattern are likely to
be caused by differences in strategy choice, not differences
in visual processing ability.
In difficult and inefficient search tasks, a covert search
strategy would be highly maladaptive due to the difficulty of
discriminating complex stimuli in the periphery.
In an efficient or easy search task, eye movements might
hinder performance by focusing attention on individual
items rather than allowing the unique target item to pop out.
Results
•
A covert scanning strategy was the optimal strategy in
the dynamic dot detection task
•
A clear trend toward faster response times was found for
more overt searchers in the change blindness task.
•
An overt scanning strategy was the optimal strategy
for the inefficient search task
•
No effect of scanning strategy on performance for the
efficient search task
•
Observers retain their scanning strategy across different
tasks; however, they also adjust their scanning strategy
depending on the task performed.
•
Those observers who adjust their scanning strategy to a
greater degree exhibit the greatest overall benefit in
accuracy.
Discussion
•
Although strategy remained similar, task-specific
modulation of saccade rate was clearly observed.
Participants made fewer saccades in tasks such as the
dynamic dot detection task and the efficient search task
compared to the change blindness task and the inefficient
search task.
•
However, in general, strategy tended to remain similar
across tasks, even when that strategy resulted in slow or
inaccurate performance.
Experiment 2
•
Can participants modify their scanning approach when it
becomes clear that it is resulting in poor performance?
•
In Experiment 2, participants were provided with feedback
after each trial, and monetary incentive to ensure feedback
would be attended.
– If participants do not modify their strategy this would be
evidence of strong, stable individual differences.
– If participants change their strategy based on feedback
and motivation, similar strategy across many tasks
seems to be a weak preference to utilize one strategy
over another under conditions of uncertain performance
and low consequences.
Method
•
Dynamic dot detection versus inefficient search (a tilted T
among randomly tilted Ls that had an offset of the _ of the
Ls).
•
Explicit feedback
– for DDD:
‘incorrect; you missed the target/no target present’ or
‘correct – target/no target present’
– For IS:
‘You were fast!’ or ‘A bit slow!’
Fastest participant received an additional 20 dollars in
payment
Results
•
Feedback and monetary incentives caused participants to
shift their strategy rather than maintain similar strategies
across tasks.
•
Thus, based on situational factors, participants will
abandon their default strategy and adopt a strategy that is
more adaptive to the task at hand.
General Discussion
•
Scan strategies remain stable across a variety of both
static and dynamic tasks when the relationship between
strategy and performance is unclear or motivation to
perform well is low.
•
This suggests that participants might be utilizing a default
strategy.
•
Scan strategies also appear to be shaped by the task. On
average, participants tended to adopt more overt or covert
strategies depending on the demands of the visual search
task at hand, even without explicit feedback about
performance.
•
Those participants who varied their strategy performed
more accurately overall compared to participants who
showed less variability across tasks
Why are there differences in default strategy?
Maybe to compensate for:
– Differences in visual discrimination ability
– Differences in attentional abilities
As a result of differences in the structure and function of
various brain regions known to control endogenous eye
movements
Where to look next? Eye movements reduce local uncertainty
Renninger et al. JOV 2007
•
They use information theory to probe the underlying decision
strategies that govern eye-movement planning.
•
To evaluate the validity of their model they compare individual
fixations against strategy predictions using a signal detection
approach.
Method
Subjects had to familiarize themselves with a novel and
abstract silhouette and decide whether a second silhouette
was identical to the one that they familiarized themselves
with.
•
Five levels of difficulty: degree of
boundary change
•
Degree of boundary change was
calculated as the change in
orientation entropy along the
boundary using a ‘fixation’ at the
shape centroid. This metric scales
with human shape discrimination
performance (Renninger,
Verghese, & Coughlan, 2005a).
Model: Global strategy
•
Model’s aim is to build an accurate representation of each
shape as it is studied with eye movements so that it can be
discriminated from a highly similar shape during the
matching phase.
•
Given current knowledge of V1 processing, the information
needed for this task is the edge orientations derived from
the shape contour.
With each fixation, the model takes a foveated
measurement of the stimulus:
– Estimating the orientations using a set of filters that are
selective to eight discrete orientations
– within a pooling neighbourhood whose size depends on
distance from the current fixation point
– the number of occurrences of different veridical edge
orientations are counted to create a histogram (or
probability distribution after normalization) of the
different orientations at that location.
•
With each successive fixation, the current map is updated
by multiplying it with the new measurement distribution.
This map is flat before the very first fixation is made.
•
Information is the entropy of a probability distribution:
Entropy = −Σ p(x) log p(x).
•
When there are many different orientations in a
neighbourhood (e.g. a bumpy contour in the periphery), all
orientations are equally likely and the distribution will be flat
(high entropy). Alternatively, straight edges will produce
energy at a single orientation or very peaked distributions
(low entropy).
•
As the evidence of orientations accumulates with
successive fixations, the uncertainty of the shape
knowledge at any point in time can be represented by
computing a resolution-dependent entropy (RDE) map.
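The entropy measure and the multiplicative update can be illustrated for a single location with eight orientation bins. This is a sketch, not the authors' implementation: the base-2 logarithm (entropy in bits) and the function names are assumptions, and the resolution-dependent pooling is left out.

```python
import math
from typing import List


def normalize(hist: List[float]) -> List[float]:
    """Turn a histogram of orientation counts into a probability distribution."""
    total = sum(hist)
    return [h / total for h in hist]


def entropy(p: List[float]) -> float:
    """Entropy = -sum p(x) log p(x); base 2 is assumed here, giving bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def update(current: List[float], measurement: List[float]) -> List[float]:
    """Combine the current distribution with a new measurement by
    multiplying bin by bin and renormalizing."""
    return normalize([c * m for c, m in zip(current, measurement)])


flat = normalize([1.0] * 8)             # before the first fixation
straight_edge = normalize([1.0] + [0.0] * 7)
noisy_measurement = normalize([4, 2, 1, 1, 0, 0, 0, 0])

after_one = update(flat, noisy_measurement)
after_two = update(after_one, noisy_measurement)
```

The flat distribution has the maximum entropy for eight bins, the straight edge has zero entropy, and repeating the same measurement sharpens the distribution, so `entropy(after_two)` is lower than `entropy(after_one)`.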
Can this global strategy model predict subjects’ eye
movements across the shapes?
Results
•
Percentage correct ranged from 75% to 78%.
•
Mean amplitudes of object-exploring saccades ranged from
2.38° to 4.44°. Mean dwell times ranged from 175 to 403
ms. All well within the normal range of naturalistic stimuli
and search tasks.
•
Subjects typically made three to five fixations around the
object in the viewing time allowed.
Fixated locations were
found to be spatially
distributed in a donut shape
for three of four subjects.
•
Red fixations are the first fixations to the object. They do
not have the same donut distribution as the other object-exploring
fixations and are biased in the preview direction.
•
Their clustering suggests that they are simply localizing
saccades that are mostly independent of detailed shape
information.
Global strategy predictions?
•
If the goal of eye movements is to gather task relevant
information, then the best strategy is to fixate on locations
that maximize the total information gained about the
contour orientations.
•
This prediction is computed by evaluating all possible next
fixation locations in a grid of positions spaced 0.25° apart
and selecting the position that yields the greatest gain in
total information (= the greatest reduction in total
uncertainty).
First impression
•
The distributions of saccade amplitude and fixation location
are qualitatively similar to those measured for our subjects.
•
The distributions generated by a random strategy that
predicts fixations anywhere on the stimulus are quite
different.
Quantification of fit: fixation error
•
Every human fixation is mapped to the closest strategy
fixation, and the distance errors are accumulated. The
mean of these samples is the fixation error and is taken as
one measure of how well strategy-predicted locations align
with human fixations.
•
The significance of the alignment is assessed by
bootstrapping (1,000 iterations) to get 95% confidence
intervals of the fixation error.
Human fixations are closer to
the global strategy than to
random fixation.
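The fixation error and its bootstrapped confidence interval can be sketched with the standard library. Fixations are assumed to be (x, y) positions in degrees; the percentile method for the interval and the function names are choices made for this example, since the slides do not specify them.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def nearest_distances(human: Sequence[Point],
                      predicted: Sequence[Point]) -> List[float]:
    """Distance from each human fixation to the closest strategy fixation."""
    return [min(math.dist(h, p) for p in predicted) for h in human]


def fixation_error(human: Sequence[Point], predicted: Sequence[Point]) -> float:
    """Mean of the nearest-neighbour distances."""
    d = nearest_distances(human, predicted)
    return sum(d) / len(d)


def bootstrap_ci(samples: Sequence[float], n_iter: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval of the mean."""
    rng = random.Random(seed)
    n = len(samples)
    means = sorted(sum(rng.choices(samples, k=n)) / n for _ in range(n_iter))
    lower = means[int((alpha / 2) * n_iter)]
    upper = means[int((1 - alpha / 2) * n_iter) - 1]
    return lower, upper


human_fix = [(0.0, 0.0), (3.0, 4.0)]
model_fix = [(0.0, 1.0), (3.0, 0.0)]
error = fixation_error(human_fix, model_fix)
ci = bootstrap_ci(nearest_distances(human_fix, model_fix))
```

Running the same computation with fixations from a random strategy in place of `model_fix` gives the baseline against which the global strategy is compared.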
Receiver Operating Characteristic
•
Each new fixation is overlaid on the current map, which is
updated using the previous series of fixations. The map is
rescaled from 0 to 1, and the prediction value is taken as
the maximum value that falls within 1° of the human
fixation.
•
ROC curves are computed and the area under the curve
(AUC) is determined to assess the power of the global
strategy prediction.
– Hits: the probability that the prediction value exceeds a
threshold at fixated locations
– False alarms: the probability that the prediction value
exceeds a threshold at random locations.
– Hits and false alarms are plotted with changing
threshold, sweeping out the ROC curve
•
If the global prediction is no better than random at
predicting human fixations, the ROC curve should lie along
the positive diagonal (AUC = 0.5). If the global strategy is a
good predictor of human fixations, it will tend toward the
upper left-hand corner of the plot (0.5 < AUC < 1.0).
•
To assess the significance of the AUC, hits and false
alarms are resampled with replacement to produce
bootstrapped estimates (significantly better than chance if
95% confidence interval does not include 0.5).
For all of our observers, the global model is significantly better than
random fixations (chance) at predicting the next fixation.
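The ROC construction can be sketched as a threshold sweep over the prediction values. This is a minimal illustration: the observed values are used as thresholds and the area is computed with the trapezoid rule, both of which are implementation choices for the example rather than details given in the slides.

```python
from typing import List, Sequence, Tuple


def roc_points(fixated_values: Sequence[float],
               random_values: Sequence[float]) -> List[Tuple[float, float]]:
    """(false-alarm rate, hit rate) pairs as the threshold is lowered.

    fixated_values: prediction values at human fixations.
    random_values:  prediction values at random locations.
    """
    thresholds = sorted(set(fixated_values) | set(random_values), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        # A value counts as exceeding the threshold if it reaches it.
        hits = sum(v >= t for v in fixated_values) / len(fixated_values)
        false_alarms = sum(v >= t for v in random_values) / len(random_values)
        points.append((false_alarms, hits))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


good_model = auc(roc_points([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))
chance_model = auc(roc_points([0.2, 0.5, 0.8], [0.2, 0.5, 0.8]))
```

Resampling the two value lists with replacement and recomputing `auc` on each resample would give the bootstrapped confidence interval described above.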
Discussion
•
Subjects seem to fixate locations that maximize reduction of
uncertainty.
•
The simple fact that the global model produces a donut-shaped
distribution of fixations may be enough to align it
with human fixation patterns.
•
A much more stringent test would be one that compares
the performance of the global strategy against a smarter
random strategy that knows shape information is near the
edges.
Smart Random Strategy
•
Fixation error:
– Every new human fixation is mapped to a randomly
drawn fixation on previous trials and the distance errors
are accumulated.
•
ROC:
– Hits: pixels of strategy map within 1° of human fixations
that exceed threshold
– False alarms: pixels of strategy map within 1° of a
fixation randomly drawn from their fixations on other
trials
ROC curves shift for each observer
toward the diagonal but the AUC is
still significantly greater than 0.5.
Discussion
•
Global strategy is omniscient because the benefit of all
possible fixations is fully known before a decision is made
about the best next fixation. This would need an enormous
amount of computational power. It is more likely that we
use estimates (e.g., heuristics or learned priors) to
determine the benefit of each possible next fixation. But
again, it is unclear how the visual system would do this
without complex computation.
•
Is there a simpler, more efficient strategy that produces
similar fixation behaviour?
Other strategies
•
Two biologically plausible strategies for making eye-movement decisions:
– Saliency
– Local Uncertainty.
•
They evaluate each strategy against the smart random
strategy baseline.
Saliency
•
Given that the shapes in the psychophysical task are novel,
top–down influences should be minimized and observers
may simply look at salient points on the shape. In our
stimuli, salient locations are those that have an orientation
that differs from their surround, such as corners or sharp
points.
•
We produced saliency prediction maps for our stimuli using
a model of Itti and Koch (2000).
Local Uncertainty
•
Only the most informative points, or points of maximum
entropy, are fixated.
To better understand this difference, imagine two nearby
locations that have similar prediction values. The global
strategy might be to fixate between them to maximize
information about both locations, whereas the local
uncertainty strategy would fixate the one with slightly
higher uncertainty (more information).
•
To model this, we used the RDE map directly as a
prediction map and the strategy is to fixate the hot spots.
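The two-hot-spot example can be made concrete with a one-dimensional toy. Everything here is an assumption for illustration: the uncertainty values, the 0.25° spacing of positions, and the Gaussian fall-off that stands in for the loss of resolution with eccentricity. It is not the RDE computation from the paper.

```python
import math
from typing import List


def local_uncertainty_choice(positions: List[float],
                             uncertainty: List[float]) -> float:
    """Fixate the single position with the highest uncertainty (the hot spot)."""
    best = max(range(len(positions)), key=lambda i: uncertainty[i])
    return positions[best]


def global_choice(positions: List[float], uncertainty: List[float],
                  sigma: float = 2.0) -> float:
    """Fixate the position that maximizes the summed uncertainty reduction,
    where each location contributes less the further it is from fixation
    (toy Gaussian fall-off with width `sigma`, in degrees)."""
    def total_gain(fixation: float) -> float:
        return sum(u * math.exp(-((x - fixation) ** 2) / (2 * sigma ** 2))
                   for x, u in zip(positions, uncertainty))
    return max(positions, key=total_gain)


positions = [i * 0.25 for i in range(41)]   # 0° to 10° in 0.25° steps
uncertainty = [0.0] * len(positions)
uncertainty[16] = 1.0                       # hot spot at 4°
uncertainty[24] = 1.1                       # slightly hotter spot at 6°

local_fix = local_uncertainty_choice(positions, uncertainty)
global_fix = global_choice(positions, uncertainty)
```

In this toy the local strategy goes straight to the hotter spot at 6°, whereas the global strategy lands between the two spots because a fixation there reduces uncertainty about both.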
Analysis
•
Fixation error and ROC curves for each strategy.
•
In the case of the saliency strategy, we include a 1° mask
that inhibits saliency signals at previously fixated locations
to mimic the dynamic changes in the saliency map due to
inhibition of return (IOR). This will presumably improve the prediction of the
saliency strategy by reducing the number of salient
locations that the random strategy may predict.
•
For the local uncertainty strategy, the RDE map is updated
from the history of human fixations.
•
For both strategies, the prediction strength for the next
fixation is evaluated using the maximum value of the
strategy prediction map within 1° of the fixation.
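The 1° inhibition mask and the "maximum within 1°" readout can be sketched on a small grid. The map is assumed to be a list of rows with a fixed cell size in degrees; the 0.25° cell size and the function names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Map = List[List[float]]
Point = Tuple[float, float]   # (x, y) in degrees


def apply_ior_mask(saliency: Map, previous_fixations: Sequence[Point],
                   radius_deg: float = 1.0, deg_per_cell: float = 0.25) -> Map:
    """Copy of `saliency` with every cell within `radius_deg` of a
    previously fixated location set to zero."""
    masked = [row[:] for row in saliency]
    for r, row in enumerate(masked):
        for c in range(len(row)):
            x, y = c * deg_per_cell, r * deg_per_cell
            if any(math.hypot(x - fx, y - fy) <= radius_deg
                   for fx, fy in previous_fixations):
                row[c] = 0.0
    return masked


def prediction_value(strategy_map: Map, fixation: Point,
                     radius_deg: float = 1.0,
                     deg_per_cell: float = 0.25) -> float:
    """Maximum map value within `radius_deg` of the fixation."""
    fx, fy = fixation
    values = [v for r, row in enumerate(strategy_map)
              for c, v in enumerate(row)
              if math.hypot(c * deg_per_cell - fx,
                            r * deg_per_cell - fy) <= radius_deg]
    return max(values) if values else 0.0


saliency = [[0.1] * 9 for _ in range(9)]   # 2° x 2° toy map
saliency[4][4] = 1.0                       # salient point at (1°, 1°)
saliency[0][8] = 0.8                       # salient point at (2°, 0°)

masked = apply_ior_mask(saliency, previous_fixations=[(1.0, 1.0)])
```

After the first fixation at (1°, 1°) the strongest salient point is suppressed, so the next prediction is driven by the remaining point at (2°, 0°).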
Results
Both the saliency and local
uncertainty strategies produce
a donut-shaped distribution,
but neither strategy shows a
distribution of saccade
amplitudes exactly like the
observers.
Fixation error
•
Neither the saliency nor the
local uncertainty strategy
performs as well as or better than
the global strategy.
•
FE ignores the sequence of
fixations, possibly explaining the
larger errors with this metric.
ROC
•
The local uncertainty strategy is
at least as good as the global
strategy at predicting where
observers will look next.
•
It is well known that humans make fixations toward the
centroids of small shapes (Melcher & Kowler, 1999).
•
What if observers are combining the local uncertainty
strategy with a simple centroid prior when planning
fixations?
The discrepancy between fixation error and the ROC
finding could be explained if observers consistently
undershoot the maximum of the local uncertainty
prediction but still land within a hot spot.
Local Uncertainty + Centroid bias
• The calculated fixation locations f are biased
toward the centroid by a weight w, where C is the
centroid and f̂ is the strategy-defined prediction.
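One plausible form of this weighting is a linear interpolation, f = w·C + (1 − w)·f̂. That form is assumed here for illustration and should be checked against the original paper. Under this assumption, an observer's intended fixation can be recovered from an observed one by inverting the same expression.

```python
from typing import Tuple

Point = Tuple[float, float]   # (x, y) in degrees


def bias_toward_centroid(predicted: Point, centroid: Point, w: float) -> Point:
    """Assumed form: f = w * C + (1 - w) * f_hat, applied per coordinate."""
    return tuple(w * c + (1 - w) * p for p, c in zip(predicted, centroid))


def intended_fixation(observed: Point, centroid: Point, w: float) -> Point:
    """Invert the assumed weighting: f_hat = (f - w * C) / (1 - w)."""
    if not 0 <= w < 1:
        raise ValueError("w must be in [0, 1) for the inversion to be defined")
    return tuple((f - w * c) / (1 - w) for f, c in zip(observed, centroid))


centroid = (0.0, 0.0)
strategy_prediction = (4.0, 0.0)

landing = bias_toward_centroid(strategy_prediction, centroid, w=0.25)
recovered = intended_fixation(landing, centroid, w=0.25)
```

With w = 0 the fixation lands on the strategy prediction, and larger values of w pull it toward the centroid, which corresponds to undershooting the hot spot.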
Fixation error
The spatial distribution of predicted fixations has a more
compact donut shape and looks strikingly similar to the
human pattern. This improved distribution is reflected in the
decrease in fixation error.
ROC: AUC comparison
Given observed fixation locations and different values of w,
the observer’s intended fixation can be calculated and
superimposed on the local uncertainty strategy map. Using
the prediction values from these maps, again ROC curves
are computed.
For all subjects,
the local uncertainty strategy
with centroid weighting
provides the best prediction of
human fixation locations.
Discussion
•
The saliency model did not incorporate eccentricity effects.
Salience is less pronounced for more eccentric locations.
However, even without eccentricity factors, the saliency
strategy shows some predictive power.
•
In the stimuli, local uncertainty and saliency predictions
often overlap, especially early in the fixation sequence.
This correlation is likely present in all natural stimuli. Stimuli
that cleanly isolate local uncertainty and saliency effects
would be needed to determine if the visual system makes
use of only one strategy or if it uses both strategies.
•
The inference of orientation at a point, discretization into
eight bins, and use of vernier parameters are all
approximations that may introduce error into the estimates
of local uncertainty. As a result, their prediction maps may
not be correct in detail.
•
Isolated maxima will predict a fixation regardless of
neighbouring activity. This may be the underlying cause of
the bimodal distribution of saccade length for the local
uncertainty strategy.
•
The visual system, perhaps through lateral interactions,
may smooth these spurious signals. Also, large areas of
activity may be sharpened through nonlinear competition.
Thank you for your time!
Goodbye!