
Institut Jean Nicod, Oct 28, 2005

What is focal attention for?

The What and Why of perceptual selection

The central function of focal attention is to select

We must select because our capacity to process information is limited

We must select because we need to be able to mark certain aspects of a scene and to refer to the marked tokens individually

That’s what this talk is principally about; but first, some background

The functions of focal attention

A central notion in vision science is that of “picking out” or selecting (also referring, tracking). The usual mechanism for perceptual selection is called selective attention or focal attention.

Why must we select at all? Overview

We must select because we can’t process all the information available.

This is the resource-limitation reason.

○ But in what way (along what dimensions) is it limited? What happens to what is not selected? The “filter theory” has many problems.

We need to select because certain patterns cannot be computed without first marking certain special elements (e.g. in counting)

We need to select in order to track the identity of individual things, e.g., to solve the correspondence problem by identifying tokens in order to establish the equivalence of this (t = i) and this (t = i + ε)

We need to select because of the way relevant information in the world is packaged. This leads to the Binding Problem. That’s an important part of what I will discuss in this talk.

Broadbent’s Filter Theory

(illustrating the resource-limited account of selection)

[Diagram of Broadbent’s model: a limited-capacity channel, a rehearsal loop, a store of conditional probabilities of past events (in LTM), a motor planner, and effectors.]

Broadbent, D. E. (1958). Perception and Communication . London: Pergamon Press.

Attention and Selection

The question of what is the basis for selection has been at the center of much controversy in vision science. Some options that have been proposed include:

We select what can be described physically (i.e., by “channels”) – we select transducer outputs, e.g., by frequency, color, shape, or location

We select according to what is important to us (e.g., affordances – Gibson), or according to phenomenal salience (William James)

We select what we need to treat as special or what we need to refer to

selecting as “marking”

Consider the options for what is the basis of visual selection

The most obvious answer to what we select is places or locations. We can select most other properties by their location – e.g., we can move our eyes so our gaze lands on different places

Must we always move our eyes to change what we attend to?

Studies of Covert Attention-Movement : Posner (1980)

Other empirical questions about place selection…

• When places are selected, are they selected automatically or can they be selected voluntarily?

• How does the visual system specify where to move attention to?

• Are there restrictions on what places we can select?

• Are selected places punctate or can they be regions?

• Must selected places be filled or can they be empty places?

• Can places be specifiable in relation to landmark objects (e.g., select the place halfway between X and Y)?

[Figure: Posner cueing trial sequence — fixation frame, cue, target–cue interval, detection target appearing at the cued or uncued location while attention moves covertly.]

Example of an experiment using a cue-validity paradigm, showing that the locus of attention moves without eye movements and allowing its speed to be estimated.

Posner, M. I. (1980). Orienting of Attention. Quarterly Journal of Experimental Psychology, 32 , 3-25.
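
To make the paradigm concrete, here is a minimal sketch of the trial logic; the validity proportion and RT values are illustrative assumptions, not Posner’s parameters:

```python
# A minimal sketch of cue-validity trial logic (not Posner's procedure);
# the validity proportion and RT numbers are illustrative assumptions.
import random

CUE_VALIDITY = 0.8   # assumed proportion of trials on which the cue is valid

def run_trial():
    cued = random.choice(["left", "right"])
    valid = random.random() < CUE_VALIDITY
    target = cued if valid else ("right" if cued == "left" else "left")
    # Hypothetical detection-time model: covert attention has shifted to the
    # cued side, so validly cued targets are detected faster.
    rt_ms = 270 if target == cued else 340
    return valid, rt_ms

trials = [run_trial() for _ in range(1000)]
valid_rts = [rt for v, rt in trials if v]
invalid_rts = [rt for v, rt in trials if not v]
print(f"valid: {sum(valid_rts)/len(valid_rts):.0f} ms, "
      f"invalid: {sum(invalid_rts)/len(invalid_rts):.0f} ms")
```

The valid/invalid RT difference at each cue–target interval is what lets the locus (and apparent speed) of covert attention be traced.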

Extension of Posner’s demonstration of attention switch

Does the improved detection at intermediate locations entail that the “spotlight of attention” moves continuously through empty space?

Sperling & Weichselgartner argued that this apparently analog movement is better explained by a quantal mechanism

The theory assumes a quantal jump in attention in which the spotlight pointed at location -2 is extinguished and, simultaneously, the spotlight at location +2 is turned on. Because extinction and onset take a measurable amount of time, there is a brief period when the spotlights partially illuminate both locations simultaneously.
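
A sketch of that quantal crossfade (the linear ramp shape and its duration are assumptions made for illustration):

```python
# Sketch of the quantal account: the spotlight at the old location ramps off
# while the new one ramps on, so for a brief period both locations are
# partially "illuminated" -- but nothing ever travels through the
# intermediate positions. Ramp shape and duration are assumptions.
def spotlight_gains(t, switch_at=0.0, ramp_s=0.1):
    """Return (gain_at_old_location, gain_at_new_location) at time t (s)."""
    phase = min(max((t - switch_at) / ramp_s, 0.0), 1.0)
    return 1.0 - phase, phase

for t in (0.0, 0.05, 0.10):
    old, new = spotlight_gains(t)
    print(f"t={t:.2f}s  old={old:.2f}  new={new:.2f}")
```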

Could objects, rather than places, be the basis for selection?

An independently motivated alternative is that selection occurs when token perceptual objects are individuated

Individuation involves distinguishing something from all the things it is not. In general, individuation involves appealing to properties of the thing in question (cf. Strawson).

○ But a more primitive type of individuation or perceptual parsing may be computed in early vision

Primitive Individuation ( PI ) may be automatic

○ PI is associated with transients or the appearance of a new object

○ PI is sometimes accompanied by assignment of a deictic reference or FINST that keeps individuals distinct without encoding their properties (nonconceptual individuation). This indexing process is, however, numerically limited (to about 4 objects) [* More later]

○ Individuation is often accompanied by the creation of an Object File (OF) for that individual, though the OF may remain empty

Some empirical evidence for object-based selection and indexing

General empirical considerations

Individuals and patterns – the need for argument-binding

Examples: subitizing, collinearity and other relational judgments

Experimental demonstrations

Single-object advantage in joint judgments

Evidence that whole enduring objects are selected

Multiple-Object tracking

Clinical/neuroscience findings


Individuals and patterns

Vision does not recognize patterns by applying templates but by parsing the pattern into parts – recognition-by-components (Biederman)

A pattern is encoded over time (and over eye movements), so the visual system must keep track of the individual parts and recognize them as the same objects at different times and stages of encoding

Individuating is a prerequisite for recognition of configurational properties (patterns) defined among several individual parts

An example of how we can easily detect patterns if they are defined over a small enough number of parts is in subitizing

In order to recognize a pattern, the visual system must pick out individual parts and bind them to the representation being constructed

Examples include what Ullman called “visual routines”

Another area where the concept of an individual has become important is in cognitive development, where it is clear that babies are sensitive to the numerosity of individual things in a way that is independent of their perceptual properties

Are there collinear items (n>3)?

Several objects must be picked out at once in making relational judgments

The same is true for other relational judgments like inside or on-the-same-contour

… and so on. We must pick out the relevant individual objects first. (Task prompts: Inside the same contour? On the same contour?)

Another example: Subitizing vs Counting.

How many squares are there?

Subitizing is fast, accurate, and only slightly dependent on how many items there are. Only the squares on the right can be subitized.

Concentric squares cannot be subitized because individuating them requires a curve tracing operation that is not automatic.

Signature subitizing phenomena only appear when objects are automatically individuated and indexed

Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited capacity preattentive stage in vision. Psychological Review, 101 (1), 80-102.
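
The signature can be caricatured as a piecewise-linear latency function; the limit and slope values below are illustrative stand-ins, not fitted parameters (see Trick & Pylyshyn, 1994, for real data):

```python
# Illustrative piecewise model of the subitizing/counting signature;
# the limit and slope values are stand-ins, not fitted parameters.
def enumeration_rt_ms(n, limit=4, base=400, subitize_slope=50, count_slope=300):
    """RT rises slowly while items can be individuated and indexed (n <= limit),
    then steeply once serial counting is required."""
    if n <= limit:
        return base + subitize_slope * n
    return base + subitize_slope * limit + count_slope * (n - limit)

for n in range(1, 9):
    print(n, enumeration_rt_ms(n), "ms")
```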

Some empirical evidence for object-based selection

General empirical considerations

Individuals and patterns – the need for argument-binding

Examples: subitizing, collinearity and other relational judgments

Some experimental demonstrations

Single-object advantage in joint judgments

Evidence that whole enduring objects are selected

Multiple-Object tracking

Clinical/neuroscience findings

Single-object superiority occurs even when the shapes are controlled

Instruction: Attend to the Red objects

Which vertex is higher, left or right?

(Note: There are now many control studies that eliminate most obvious confounds)

Attention spreads over perceived objects

[Figure: four display configurations — depending on how the display is parsed into objects, attention cued at one location spreads to B and not C, or to C and not B.]

Using a priming method, Egly, Driver & Rafal (1994) showed that the effect of a prime spreads to other parts of the same visual object more readily than to equally distant parts of different objects.

We can select a shape even when it is intertwined among other similar shapes

Are the green items the same? On a surprise test at the end, subjects were not able to recall shapes that had been present but had not been attended in the task

(Rock & Gutman, 1981; DeSchepper & Treisman, 1996)

Further evidence that attention is object-based comes from the finding that various attention phenomena move with moving objects

Once an object is selected, the selection appears to remain with the object as it moves

Inhibition of return appears to be object-based

Inhibition-of-return (IOR) is the phenomenon whereby attention is slow to go back to an object that had been attended about 0.7 – 1.0 secs before

It is thought to help in visual search since it prevents previously visited objects from being revisited

Tipper, Driver & Weaver (1991) showed that IOR moves with the inhibited object

IOR appears to be object-based (it travels with the object that was attended)

Objects endure despite changes in location; and they carry their history with them!

Object File Theory of Kahneman & Treisman

[Figure: frames 1–3 of the preview display — letters A and B appear in two boxes, the boxes move, and a letter then reappears in one of the boxes.]

Letters are faster to read if they appear in the same box where they appeared initially. Priming travels with the object. According to the theory, when an object first appears, a file is created for it and the properties of the object are encoded and subsequently accessed through this object-file.

Some empirical evidence for object-based selection

General empirical considerations

Individuals and patterns – the need for argument-binding

Examples: subitizing, collinearity and other relational judgments

Experimental demonstrations

Single-object advantage in joint judgments

Evidence that whole enduring objects are selected

Multiple-Object tracking studies (later)

Clinical/neuroscience findings

Visual neglect

Balint syndrome & simultanagnosia

Visual neglect syndrome is object-based

When a right-neglect patient is shown a dumbbell that rotates, the patient continues to neglect the object that had been on the right, even though it is now on the left (Behrmann & Tipper, 1999).

Simultanagnosic (Balint Syndrome) patients attend to only one object at a time

Simultanagnosic patients cannot judge the relative length of two lines, but they can tell that a figure made by connecting the ends of the lines is not a rectangle but a trapezoid (Holmes & Horax, 1919).

Balint patients attend to only one object at a time even if they are overlapping!

(Luria, 1959)

Some empirical evidence for object-based selection

Some general empirical considerations

Individuals and patterns – the need for argument-binding

Examples: subitizing, collinearity and other relational judgments

Some direct experimental demonstrations

Single-object advantage in joint judgments

Evidence that whole enduring objects are selected

Multiple-Object tracking studies

Clinical/neuroscience findings

Multiple Object Selection

One of the clearest cases illustrating object-based selection is Multiple Object Tracking

Keeping track of individual objects in a scene requires a mechanism for individuating, selecting, accessing and tracking the identity of individuals over time

These are the functions we have proposed are carried out by the mechanism of visual indexes (FINSTs)

We have been using a variety of methods for studying visual indexing , including subitizing, subset selection for search, and Multiple Object Tracking (MOT).

Multiple Object Tracking

In a typical experiment, 8 simple identical objects are presented on a screen and 4 of them are briefly distinguished in some visual manner – usually by flashing them on and off.

After these 4 “targets” have been briefly identified, all objects resume their identical appearance and move randomly. The subjects’ task is to keep track of which ones had earlier been designated as targets.

After a period of 5-10 seconds the motion stops and subjects must indicate, using a mouse, which objects were the targets.

People are very good at this task (80%-98% correct).

The question is: How do they do it?
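
As a rough sketch of the trial structure just described (the display geometry, speeds, and durations are stand-in values, not those of the actual experiments):

```python
# Sketch of an MOT trial: 8 identical objects, 4 flagged as targets, random
# motion, then a target-report phase. All parameter values are stand-ins.
import random

N_OBJECTS, N_TARGETS, FRAMES = 8, 4, 480   # e.g., 8 s at 60 Hz

def run_mot_trial():
    pos = [[random.random(), random.random()] for _ in range(N_OBJECTS)]
    vel = [[random.uniform(-0.005, 0.005) for _ in range(2)]
           for _ in range(N_OBJECTS)]
    targets = set(random.sample(range(N_OBJECTS), N_TARGETS))  # flashed at start
    for _ in range(FRAMES):                # all objects now look identical
        for i in range(N_OBJECTS):
            for d in (0, 1):
                pos[i][d] += vel[i][d]
                if not 0.0 <= pos[i][d] <= 1.0:   # bounce off the display edge
                    vel[i][d] = -vel[i][d]
                    pos[i][d] = min(max(pos[i][d], 0.0), 1.0)
    return pos, targets   # the observer must now indicate which were targets
```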

Keep track of the objects that flash

How do we do it? Do we encode and update locations serially?

Keep track of the objects that flash

How do we do it? What properties of individual objects do we use?

Explaining Multiple Object Tracking

Basic finding: People (even 5-year-old children) can track 4 to 5 individual objects that have no unique visual properties. How is it done?

Can it be done by keeping track of the only distinctive property of objects – their location?

○ Based on the assumption that attention moves at a finite speed, our modeling suggests that this cannot be done by encoding and updating locations, because of the speed at which the objects move and the distances between them (see the sketch after this list)

○ If tracking is not done by using the only uniquely distinguishing property of objects, then it must be done by tracking their historical continuity as the same individual object
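
Here is a minimal sketch of the serial strategy in question, under assumed (illustrative) speeds and visit rates; it illustrates the argument, not our actual model:

```python
# Sketch of the serial encode-and-update-locations strategy the modeling
# argues against: attention visits one stored location at a time and
# reacquires whichever object is now nearest it. Parameters are illustrative.
import math, random

def serial_location_tracking(frames=480, n_objects=8, n_targets=4,
                             step=0.02, visit_every=15):
    pos = [(random.random(), random.random()) for _ in range(n_objects)]
    stored = {k: pos[k] for k in range(n_targets)}   # targets are objects 0..3
    for t in range(frames):
        pos = [(x + random.uniform(-step, step),     # identical-looking objects
                y + random.uniform(-step, step)) for x, y in pos]
        if t % visit_every == 0:                     # one attentional visit
            k = (t // visit_every) % n_targets
            nearest = min(range(n_objects),
                          key=lambda i: math.dist(stored[k], pos[i]))
            stored[k] = pos[nearest]                 # may now be a distractor!
    hits = sum(math.dist(stored[k], pos[k]) < 0.05 for k in stored)
    return hits   # how many slots still sit near their original targets

print(serial_location_tracking())
```

The faster the motion or the slower the visits, the more often a stored location is reacquired by a distractor, which is the core of the argument against location-updating.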

If we are not using objects’ locations, then how are we tracking them?

Our independently motivated hypothesis is that a small number of objects (e.g., 4-5) are individuated and reference tokens or indexes are assigned to them

An index keeps referring to the object as the object changes its properties and its location (that makes it the same object!)

An object is not selected or tracked by using an encoding of any of its properties. It is picked out nonconceptually, just the way a demonstrative does in language (i.e., this, that)

Although some physical properties must be responsible for the individuation and indexing of an object, we have data showing that these properties are not encoded, and the properties that are encoded need not be used in tracking

What has this to do with the Binding Problem?

First I will introduce the binding problem as it appears in psychology

The role of selection in encoding conjunctions of properties (the binding problem)

The binding problem was initially described by Anne Treisman, who showed conditions under which vision may fail to correctly bind conjunctions of properties (resulting in conjunction illusions)

Feature binding requires focal attention (i.e., selection )

The problem has been of interest to philosophers because it places constraints on how information may be encoded in early vision (or, as Clark would put it, ‘at the sensory level’ or nonconceptually)

I introduce the binding problem to show how the object-based view is essential for its solution

Introduction to the Binding Problem:

Encoding conjunctions of properties

Experiments show the special difficulty that vision has in detecting conjunctions of several properties

It seems that items have to be attended (i.e., individuated and selected) in order for their property-conjunction to be encoded

When a display is not attended, conjunction errors are frequent

Read the vertical line of digits in this display

What were the letters and their colors?

This is what you saw briefly …

Under these conditions, conjunction errors are very frequent

Encoding conjunctions requires selection

One source of evidence is from search experiments:

Single-feature search is fast and appears to be independent of the number of items searched through (suggesting it is automatic and ‘pre-attentive’)

Conjunction search is slower and the time increases with the number of items searched through (suggesting it requires serial scanning of attention)
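
A toy contrast between the two search modes (the step counts stand in for reaction time; the display items are hypothetical feature sets, and nothing here is fitted to data):

```python
# Sketch contrasting pop-out and conjunction search.
def popout_search(display, feature):
    # Pre-attentive: activity anywhere in a single feature map suffices, so
    # the notional cost is one parallel step regardless of display size.
    return any(feature in item for item in display), 1

def conjunction_search(display, f1, f2):
    # Serial: focal attention visits items one by one to check the
    # conjunction, so the step count grows with the items scanned.
    for steps, item in enumerate(display, start=1):
        if f1 in item and f2 in item:
            return True, steps
    return False, len(display)

display = [{"green", "left"}, {"red", "left"},
           {"green", "right"}, {"red", "right"}]
print(popout_search(display, "red"))                  # one parallel step
print(conjunction_search(display, "red", "right"))    # found after a scan
```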

Rapid visual search

(Treisman)

Find the following simple figure in the next slide:

This case is easy – and the time is independent of how many nontargets there are – because there is only one red item. This is called a ‘popout’ search

This case is also easy – and the time is independent of how many nontargets there are – because there is only one right-leaning item. This is also a ‘popout’ search.

Rapid visual search

(conjunction)

Find the following simple figure in the next slide:

Feature Integration Theory and feature binding

Treisman’s “attention as glue” hypothesis: focal attention (selection) is needed in order to bind properties together

We can recognize not only the presence of “squareness” and “redness”, but we can also distinguish between the different ways they may be conjoined:

• Red square and green circle vs. green square and red circle

The evidence suggests that conjoined properties are encoded only if they are attended or selected

Notice that properties are considered to be conjoined if and only if they are properties of the same object, so it is objects that must be selected!

Constraints on nonconceptual representation of visual information (and the binding problem)

Because early (nonconceptual) vision must not lose the conjunctive grouping of properties, visual properties can’t just be represented as being present in the scene – because then the binding problem could not be solved!

What else is required?

The most common answer is that each property must be represented as being at a particular location

According to Peter Strawson and Austin Clark, the basic unit of sensory representation is Feature F at location L. This is the global map or feature placing proposal.
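
To make the proposal concrete, here is a toy rendering of feature placing; the features and coordinates are assumptions made for illustration:

```python
# Sketch of the feature-placing scheme: each entry asserts only that
# feature F occurs at location L, and binding is read off shared location.
scene = {
    ("red",    (1, 1)),
    ("square", (1, 1)),
    ("green",  (4, 2)),
    ("circle", (4, 2)),
}

def conjoined_at(scene, loc):
    """Features count as bound iff they are placed at the same location."""
    return {f for f, l in scene if l == loc}

print(conjoined_at(scene, (1, 1)))   # {'red', 'square'}
# Everything here hangs on "same location" -- which is exactly what the
# objections below put under pressure.
```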

This proposal fails for interesting empirical reasons

But if feature placing is not the answer, what is?

The role of attention to location in Treisman’s Feature Integration Theory

[Diagram of Feature Integration Theory: the original input feeds separate color (R, Y, G), shape, and orientation feature maps; these project to a master location map over which an attention “beam” is directed, and conjunctions are detected where the beam binds features at a location.]

But in encoding properties, early vision can’t just bind them together according to their spatial co-occurrence – even their co-occurrence within the same region. That’s because the relevant region depends on the object. So the selection and binding must be according to the objects that have those properties

If location of properties will not give us a way of solving the binding problem, what will?

This is why we need object-based selection and why the object-based attention literature is relevant …

An alternative view of how we solve the binding problem

If we assume that only properties of indexed objects (of which there are about 4-5) are encoded and that these are stored in object files associated with each object, then properties that belong to the same object are stored in the same object file , which is why they get bound together

This automatically solves the binding problem!

This is the view exemplified by both FINST Theory (1989) and Object File Theory (1992)
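
As an illustration (a sketch of the idea in data-structure terms, not the published formal models):

```python
# Sketch: indexes (FINSTs) point at individuated objects, and whatever
# properties get encoded land in that object's file -- so conjunctions
# come out bound automatically. Scene contents are hypothetical.
from dataclasses import dataclass, field

MAX_FINSTS = 4   # indexing is numerically limited to about 4-5 objects

@dataclass
class ObjectFile:
    finst: int                                       # index pointing at the object
    properties: dict = field(default_factory=dict)   # may remain empty

scene = [{"color": "red", "shape": "square"},
         {"color": "green", "shape": "circle"}]

# Individuation and indexing happen without encoding any properties...
files = [ObjectFile(finst=i) for i in range(min(len(scene), MAX_FINSTS))]

# ...then encoded properties go into the file of the object that bears them.
for f in files:
    f.properties.update(scene[f.finst])

print(files[0].properties)   # {'color': 'red', 'shape': 'square'} -- never
                             # mis-paired with 'green' or 'circle'
```

Binding requires no extra machinery on this view: co-storage in the same object file just is the binding.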

The assumption that only properties of indexed objects are encoded raises the question of what happens to properties of the other (unindexed) objects or properties in a display

The logical answer is that they are not encoded and therefore not available to conceptualization and cognition

But this is counter-intuitive!

An intriguing possibility….

Maybe we see far less than we think we do!

This possibility has received a great deal of recent attention with the discovery of various ‘blindnesses’ such as change-blindness and inattentional blindness

The assumption that no properties other than properties of indexed objects can be encoded is in conflict with strong intuitions – namely that we see much more than we conceptualize and are aware of. So what do we do about the things we “see” but do not conceptualize?

Some philosophers say they are represented nonconceptually

But what makes this a nonconceptual representation , as opposed to just a causal reaction?

○ At the very minimum, postulating that something is a representation must allow generalizations to be captured over its content, which would otherwise not be available

○ Traditionally representations are explanatory because they account for the possibility of misrepresentation and they also enter into conceptualizations and inferences. But unselected objects and unencoded properties don’t seem to fit this requirement (or do they?)

Maybe information about non-indexed objects is not represented at all!!

A possible view (which I am not prepared to fully endorse yet) is that certain topographical or biological reactions (e.g., retinal activity) are not representations – because they have no truth values and so cannot misrepresent

One must distinguish between causal and represented properties

Properties that cause objects to be indexed and tracked and result in object files being created need not be encoded and made available to cognition

Is this just terminological imperialism?

If we call all forms of patterned reactions representations then we will need to have a further distinction among types within this broader class of representation

We may need to distinguish between personal and subpersonal types of ‘representation’ with only the former being representations for our purposes

We may also need to distinguish between patterned states within an encapsulated module that are not available to the rest of the mind/brain and those that are available

○ Certain patterned causal properties may be available to motor control – but does that make them representations?

An essential diagnostic is whether reference to content – to what is represented – allows generalizations that would otherwise be missed; this, in turn, suggests that there is no representation without misrepresentation

○ We don’t want to count retinal images as representations because they can’t misrepresent, though they can be misinterpreted later

What next?

This picture leaves many unanswered questions, but it does provide a mechanism for solving the binding problem and also explaining how mental representations could have a nonconceptual connection with objects in the world (something required if mental representations are to connect with actions)

The End

… except for a few loose ends …

Can objects be individuated but not indexed? A new twist to this story

We have recently obtained evidence that objects that are not tracked in MOT are nonetheless being inhibited and the inhibition moves with them

It is harder to detect a probe dot on an untracked object than on either a tracked object or empty space!

But how can inhibition move with a nontarget when the space through which they move is not inhibited?

Doesn’t this require the nontargets to be tracked?

The beginnings of the puzzle of clustering prior to indexing, and what that might mean!

If moving objects are inhibited then inhibition moves along with the objects. How can this be unless they are being tracked? And if they are being tracked there must be at least 8 FINSTs!

This puzzle may signal the need for a kind of individuation that is weaker than the individuation we have discussed so far – a mere clustering, circumscribing, figure-ground distinction without a pointer or access mechanism – i.e. without reference!

It turns out that such a circumscribing-clustering process is needed to fulfill many different functions in early vision. It is needed whenever the correspondence problem arises – whenever visual elements need to be placed in correspondence or paired with other elements. This occurs in computing stereo, apparent motion, and other grouping situations in which the number of elements does not affect ease of pairing (or even results in faster pairing when there are more elements). Correspondence is not computed over continuous visual manifolds but only over some pre-clustered elements.

Example of the correspondence problem for apparent motion

The grey disks correspond to the first flash and the black ones to the second flash. Which of the 24 possible matches will the visual system select as the solution to this correspondence problem? What principle does it use?

(Panels: curved matches vs. linear matches)
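
For concreteness, a sketch that enumerates the matching space for a four-element display; the coordinates are assumed, and least total motion is only one candidate principle, used here for illustration:

```python
# With four elements per flash there are 4! = 24 candidate pairings.
# Score each by total displacement (one candidate matching principle).
import math
from itertools import permutations

first_flash  = [(0, 0), (2, 0), (0, 2), (2, 2)]                  # grey disks
second_flash = [(0.4, 0.3), (2.2, 0.4), (0.2, 2.3), (1.8, 2.1)]  # black disks

def total_motion(pairing):
    return sum(math.dist(first_flash[i], second_flash[j])
               for i, j in enumerate(pairing))

candidates = sorted(permutations(range(4)), key=total_motion)
print("candidate matches:", len(candidates))    # 24
print("least-motion pairing:", candidates[0])   # here, the identity pairing
```

Note that the enumeration only makes sense over discrete, pre-clustered elements, which is the point made above about correspondence not being computed over continuous visual manifolds.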

Here is how it actually looks

Views of a dome

Structure from Motion Demo

Cylinder Kinetic Depth Effect

The correspondence problem for biological motion

FINST Theory postulates a limited number of pointers in early vision that are elicited by causal events in the visual field and that enable vision to refer to things without doing so under a concept or a description
