Last Lectures - Center for Cognitive Science

Imagery slides
Imagery and Memory
● Memory Examples: Dual Code Theory
 To recall Y you must first recall X
 Windows, doorknob, glasses, other facial features,
global-to-local
 But: Something like the same thing happens in recall of
alphabet letters and many other memorized lists
● Imageability ratings are more effective than frequency of occurrence or frequency of co-occurrence in paired-associates learning.
Vision is clearly involved when
images are superimposed onto vision
 Many experiments show that when you
project an image onto a display the image
acts very much like a superimposed display
• Shepard & Podgorny (paper folding task…)
• Interference effects (Brooks)
 Controversial Perky effect: Perception or response bias?
Project an image onto a perceived form
Brooks’ spatial interference study
Respond by pointing to symbols in a table or by saying the words left or right
Perception or attention effects?
● Many impressive imagery effects can be
plausibly attributed to attention
● Bisiach’s widely-cited finding on visual neglect
 Bartolomeo, P., & Chokron, S. (2002). Orienting of attention in left unilateral
neglect. Neuroscience and Biobehavioral Reviews, 26(2), 217-234.
 Dulin, D., Hatwell, Y., Pylyshyn, Z. W., & Chokron, S. (2008). Effects of
peripheral and central visual impairment on mental imagery capacity.
Neuroscience and Biobehavioral Reviews, 32(8), 1396-1408.
Does neglect require vision?
 Chokron, S., Colliot, P., & Bartolomeo, P. (2004). The role of vision in
spatial representations. Cortex, 40, 281-290.
We can to some extent control our attended region
Is an image being projected onto a percept, or is it just selective attention?
Farah, M. J. (1989). Mechanisms of imagery-perception interaction. Journal of
Experimental Psychology: Human Perception and Performance, 15, 203-211.
Shepard & Podgorny experiment
Both when the displays are seen and when the F is
imagined, RT to detect whether the dot was on the F is
fastest when the dot is at the vertex of the F, then when
on an arm of the F, then when far away from the F –
and slowest when one square off the F.
Similarities between perception of visual
scenes and ‘perception’ of mental images
● Judgments from mental images
 Shape comparisons (of states: Shepard & Chipman)
 Size comparisons (Weber fraction or ratio effect)
• What do they tell us about the format of images?
• But this applies to nonvisual properties (e.g., price, taste)
More demonstrations of the relation between
vision, imagery (and later action)
● Images constructed from descriptions
The D-J example(s)
 Perception or inference/guessing

● But there are even more persuasive
counterexamples we will see later
The two-parallelogram example
• Amodal completion
• Reconstruals: Slezak
• Dynamic imagery
Imagining actions: Paper Folding
Mental rotation
Time to judge whether (a)-(b) or (b)-(c) are the
same except for orientation. Time increases
linearly with the angle between them (Shepard &
Metzler, 1971)
What do you do to judge whether these
two figures are the same shape?
Is this how the process looked to you?
When you make it rotate in your mind, does it seem
to retain its rigid 3D shape without re-computing it?
Mental rotation – the real story
In mental rotation the phenomenology motivates the theory of “rotation” – but what the data actually show is that:
 Mental rotation is only found when the comparison figures are enantiomorphs (e.g., they are 3D mirror-images) or when the difference between figure pairs can only be expressed in figure-centric coordinates.
 No rotation occurs if the figures have landmarks that can be used to identify the relations among their parts.
 Records of eye movements show that mental rotation is done incrementally: it is not a holistic rotation, as often reported. In fact even the phenomenology is not of a smooth continuous rotation.
 The “rate of rotation” depends on the conceptual complexity of both the figure and the comparison task, so that, at least, is not a result of the architecture (Pylyshyn, 1979). There are even demonstrations that it depends on how the subject interprets the figure (Kosslyn, 1994).
Mental Scanning
● Hundreds of experiments have now been done demonstrating that it takes longer to scan attention between places that are farther apart in the imagined scene. In fact, the relation between time and distance is linear.
● These have been reviewed and described in:
 Denis, M., & Kosslyn, S. M. (1999). Scanning visual mental
images: A window on the mind. Cahiers de Psychologie
Cognitive / Current Psychology of Cognition, 18(4), 409-465.
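The linear time–distance relation these studies report can be illustrated with a toy least-squares fit. The numbers below are invented for illustration (they are not data from any of the cited experiments); the point is only the signature linear pattern:

```python
# made-up scanning "data": latency grows linearly with distance on the image
distances = [1, 2, 3, 4, 5, 6]                     # relative distances
latencies = [0.45 + 0.21 * d for d in distances]   # secs; 0.21 s per unit

# ordinary least-squares slope and intercept, no external libraries
n = len(distances)
mx = sum(distances) / n
my = sum(latencies) / n
slope = sum((x - mx) * (y - my) for x, y in zip(distances, latencies)) / \
        sum((x - mx) ** 2 for x in distances)
intercept = my - slope * mx
# slope recovers the hypothetical scan rate; a straight line fits perfectly
```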
Studies of mental scanning
Does it show that images have metrical space?
[Figure: latency (secs), from 0 to 1.8, increases linearly with relative distance on the image]
Does this show that images are spatial, or have spatial properties, or that they “preserve metrical spatial properties”? (Kosslyn, S. M., Ball, T. M., et al. (1978). Visual images preserve metric spatial information: Evidence from studies of image scanning. Journal of Experimental Psychology: Human Perception and Performance, 4, 46-60.)
The idea of images being in some sense
spatial is an interesting and important claim
● I will discuss this claim at some length later because
it reveals a deep and all-consuming error that runs
through all imagery theorizing – by psychologists,
neuroscientists and philosophers.
● This is in addition to the errors I discussed earlier:
The idea that subjects understand the task of
imagining something to be the task of pretending
they are seeing it, and the idea that certain
properties of the world are properties of the image
(the intentional fallacy)
Constructing an image
● What determines what the image is like
when it is constructed from memory or from
knowledge?
● After constructing an image can you see
novel aspects of the imagined situation?
● Examples
Examples to probe your intuition and your tacit knowledge
Imagine seeing these events unfolding…
● You hit a baseball. What shape trajectory does it trace? It is
coming towards you: Where would you run to catch it? If you
have ever played baseball you would have a great deal of “tacit
knowledge” of what to do in such (well studied) cases.
● You drop a rubber ball on the pavement. Tap a button every
time it hits the ground and bounces. Plot height vs time.
[Figure: plot of height vs. time since drop] What is responsible for the pattern shown here?
● Drop a heavy steel ball at the same time as you drop a light ball
(a tennis ball), e.g., from the leaning tower of Pisa. Indicate
when they hit the ground. Repeat for different heights.
● Take a clear glass containing a colored liquid. Tilt it 45º to the
left (counter-clockwise). What is the orientation of the liquid?
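For the bouncing-ball exercise, the pattern a subject’s tacit knowledge would have to reproduce follows from elementary mechanics: bounce heights decay geometrically, so the intervals between impacts shrink by a constant factor. A minimal sketch (the restitution coefficient 0.8 and drop height are arbitrary illustrative values):

```python
import math

def bounce_times(h0: float, e: float, n: int, g: float = 9.81) -> list:
    """Times at which a ball dropped from height h0 hits the ground,
    with coefficient of restitution e (fraction of speed kept per bounce)."""
    t = math.sqrt(2 * h0 / g)          # time of first impact
    times = [t]
    v = e * math.sqrt(2 * g * h0)      # rebound speed after first bounce
    for _ in range(n - 1):
        t += 2 * v / g                 # full up-and-down flight time
        times.append(t)
        v *= e                         # each bounce keeps the same fraction
    return times

times = bounce_times(h0=2.0, e=0.8, n=6)
gaps = [b - a for a, b in zip(times, times[1:])]
# successive inter-bounce intervals shrink geometrically (ratio = e)
```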
What color do you see when two
color filters overlap?
Where would the water go if you poured it
over a full beaker of sugar?
Is there conservation of volume in
your image? If not, why not?
Seeing Mental Images
● Do images have size?
● Can we say that one image is larger than another?
● If so, what properties do we expect the smaller/larger
image to have?
Do mental images have size?
Imagine a very small mouse. Can you see its whiskers?
Now imagine a huge mouse. Can you see its whiskers?
Do this imagery exercise:
Imagine a parallelogram like this one.
Now imagine an identical parallelogram directly below this one.
Connect each corner of the top parallelogram with the corresponding corner of the bottom parallelogram.
What do you see when you imagine the connections?
Did the imagined shape look (and change) like the one you see now?
Slezak figures
Pick one (or two) of these animals and
memorize what they look like. Now
rotate it in your mind by 90 degrees
clockwise and see what it looks like.
Slezak figures rotated 90°
Space
Images and the representation of spatial properties
● We need to understand what it could mean
for a representation to be spatial.
● At the very least it must mean that there are
constraints placed on the form of the
representation that do not apply when the
representation is not spatial.
The idea that images are in some sense
spatial is an interesting and important claim
● I will return to this claim later because it reveals a deep and ubiquitous error that runs through most (all?) imagery theorizing – by psychologists, neuroscientists and philosophers. This is the error of mistaking descriptive adequacy for explanatory adequacy. Let’s call this conflation the missing constraint error.
● This is in addition to the two errors I discussed earlier:
 Ignoring the fact that the task of imagining something is
actually the task of pretending you are seeing it, and
 The mistaken assumption that certain properties of the world
are properties of the image (the intentional fallacy)
Both vision and visual imagery have
some connection to the motor system
● There are a number of experiments showing the close
connection between images and motor control*
 You can get Stimulus-Response compatibility effects between the location of an imagined stimulus in space and the location of the response button in space,
 Ronald Finke showed that you could get adaptation with the imagined (misperceived) position of the hand that was similar to adaptation to displacing prism goggles,
 Both these findings provide support for the view that the
spatial character of images comes from something being
projected onto a concurrently perceived scene and then
functioning much as objects of perception.
* This is the main new idea in Chapter 5 of Things & Places.
Recall the studies of mental scanning…
Does this result show that images have metrical spatial properties?
[Figure: latency (secs), from 0 to 1.8, to scan the image increases linearly with relative distance on the image]

We showed that the image scanning effect is cognitively penetrable. But the way we compute the time it takes to scan across an image is by imagining something moving across the real perceived display. Without this display, we could not use our time-to-collision computation to compute the time to cross various distances on the image because there are no actual distances on the image! (Pylyshyn & Cohen, 1999)
Using a concurrently perceived room to
anchor FINSTs tagged with map labels
The Spatial character of images
What does it mean to say that images are spatial?
● It means that certain constraints hold among spatial measures (e.g., axioms of geometry and measure theory, such as the triangle inequality, symmetry of distances, Euclidean axioms, Pythagoras’ theorem…)
● That certain constraints hold among “distances”, that certain relations can be defined among these distances (e.g., ‘between’, ‘farther than’), and that Newtonian physics holds between the terms that are used in explanations (e.g., distances and times).
● That mental images and motor control interact with one another to some degree – so you can “point to” objects in your image.
● That certain visual-motor ‘reflexes’ are automatic or preconceptual
 They are computed within the encapsulated Visual Module
 Preconceptual motor control is not sensitive to visual illusions, relative to control that is computed by the cognitive (‘seeing as’) system.
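The constraints in the first bullet can be stated concretely: any genuinely spatial medium must satisfy the metric axioms. A minimal check for Euclidean distances in the plane (the points are arbitrary examples):

```python
import itertools
import math

def d(p, q):
    """Euclidean distance between 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

points = [(0, 0), (3, 0), (0, 4), (2, 5)]

# symmetry of distances: d(a, b) == d(b, a)
symmetric_ok = all(d(a, b) == d(b, a)
                   for a, b in itertools.permutations(points, 2))

# triangle inequality: d(a, c) <= d(a, b) + d(b, c)
triangle_ok = all(d(a, c) <= d(a, b) + d(b, c) + 1e-12
                  for a, b, c in itertools.permutations(points, 3))
```

A merely functional “space” is not obliged to satisfy either property; that is exactly the missing constraint discussed below.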
Mental images as “depictive” representations
● “A depictive representation is a type of picture, which specifies the
locations and values of configurations of points in a space.
● The space in which the points appear need not be physical but can be
like an array in a computer, which specifies spatial relations purely
functionally. That is, the physical locations in the computer of
each point in an array are not themselves arranged in an array; it is
only by virtue of how this information is “read” and processed that it
comes to function as if it were arranged into an array….
● Depictive representations convey meaning via their resemblance to
an object.
● When a depictive representation is used, not only is the shape of the represented parts immediately available to appropriate processes,
but so is the shape of the empty space … [and] one cannot represent
a shape in a depictive representation without also specifying a size
and orientation….”
Form vs Content of images
● As in the earlier discussion, one must be careful to distinguish form from content. We know that there is a difference between
the content of images and the content of other (nonimaginal)
thought: Images concern sensory appearances while ‘propositions’
can express most* other contents.
● In attributing a special form of representation to images one
should ask whether some symbolic system (e.g., sentences of
LOT) would not do. Simplicity (Occam’s Razor) would then prefer a
single format over two, especially if the one format is essential for
representing thoughts and inferences [Fodor, J. A. and Z. W. Pylyshyn (1988).
"Connectionism and cognitive architecture: A critical analysis." Cognition 28: 3-71.]
● The most promising contents that might require a different form of representation are those that essentially represent magnitudes. The magnitudes most often associated with images are spatial ones.
* There has been a long-standing debate in Artificial Intelligence concerning the advantages of logical formats vs other symbol systems vs something completely different (procedures).
Thou shalt not cheat
● There is no natural law that requires the representations of time, distance and speed to be related according to the motion equation. You could just as easily imagine an object moving instantly or with constant acceleration or with any motion relation you like, since it is your image!
● There are two possible reasons why the observed relation
Actual time = (represented distance) / (represented speed)
typically holds in an image-scanning task:
1. Because subjects have tacit knowledge that this is what would happen if they viewed a real display, or
2. Because the matrix is taken to be a simulation of a real physical display, as it often is in computer science.
 Notice that in the second case the explanation for the Reaction Time
comes from the simulated real display and not from the matrix.
The missing constraint in appeals to “space”
in both scanning and mental rotation
• What is assumed about the format or architecture of the mental
representation in the examples of mental rotation?
• According to philosopher Jesse Prinz (2002, p. 118):
“If visual-image rotation uses a spatial medium of the kind
Kosslyn envisions, then images must traverse intermediate
positions when they rotate from one position to another. The
propositional [i.e., symbolic] system can be designed to
represent intermediate positions during rotation, but that is not
obligatory.”
• This is a very important observation, but it is incomplete. One
still needs to answer the question: What makes it obligatory that
the object must ‘pass through intermediate positions’ when
rotating in ‘functional space’, and what constitutes an
‘intermediate position’? These terms apply to the represented
world, not to the representation!
The important distinction between
architecture and represented content
● It is only obligatory that a certain pattern must occur if
the pattern is caused by fixed properties of the
architecture as opposed to being due to properties of
what is represented (i.e., what the observer tacitly knows
about the behavior of what is represented)
 If it is obligatory only because the theorist says it is, score that
as a free empirical parameter that any theory can assume.
 This failure of image theories is quite general – all picture
theories suffer from the same lack of principled constraints.
The important distinction between
descriptive and explanatory adequacy
● It is important to recognize that if we allow one theory to stipulate what is obligatory without there being a principle that mandates it, then any other theory can stipulate the same thing. Such theories are unconstrained, so they can fit any possible observation – i.e., they are able to describe anything but explain nothing.
● A theory that does not explain why some pattern is obligatory
can still be useful the way an organized catalog is useful. It
may even list the features according to which it is organized.
But it does not give an account of why it is organized that
way rather than some other way. To do that it needs to
appeal to something constant such as a law of nature or a
fixed property of the architecture.
How are these ‘obligatory’ constraints realized?
● Image properties, such as size and rigidity, are assumed to be inherent in the architecture (e.g., of the ‘display’)
● That raises the question: what kind of architecture could possibly enforce rigidity of shape?
 Notice that there is nothing about a spatial display, let alone a functional space, that makes it obligatory that shape be rigidly maintained as orientation is changed.
 Such rigidity could not be a necessary property of the architecture of an image system because we can easily imagine that rigidity does not hold (e.g., imagine a rotating snake!).
 There is also evidence that ‘mental rotation’ is incremental,
not holistic, and the speed of rotation depends on the
conceptual complexity of the shape and the comparison task.
What makes some properties seem “natural” in a
matrix but not so natural in a symbolic data structure?
1. A matrix is generally viewed as a two-dimensional structure in
which specifying the x and y values (rows and columns)
specifies the location of any cell. But that’s just the way it is
conventionally viewed. Rows, columns and cells are not actually
spatial locations.
2. In a computer there is no requirement that in getting from one
cell to another one must pass through any other specified cells
nor is there any requirement that there be empty cells between
any pairs of cells.
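Point 2 can be seen directly in code: indexing into an array is constant-time random access, and nothing about the data structure forces a path through “intermediate” cells. A small sketch:

```python
# A matrix is just indexed storage: nothing forces "movement" through it
grid = [[0] * 10 for _ in range(10)]

grid[0][0] = 1     # "visit" one corner...
grid[9][9] = 1     # ...then the opposite corner, in a single step
# no cell in between was touched: "between-ness" is our interpretation,
# not a property of the data structure
touched = sum(row.count(1) for row in grid)
```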
What makes some properties “natural” in a matrix
while not so natural in a symbolic data structure?
3. The main reason it is natural to view a matrix as having spatial
constraints is that one is tacitly assuming that it represents some
space. Then it is the represented space that has the constraints,
not the matrix.
 Notice the subtle succumbing to the intentional fallacy again!
4. Any constraints that the functional space exhibits are constraints
extrinsic to the format. Such constraints reside in the external
world which the ‘functional space’ represents.
 But such extrinsic constraints can be added to any model of scanning,
including a propositional one.
What warrants the ‘obligatory’ constraint?
But it is no more obligatory that the relation between distance, speed and time hold in functional space than in a symbolic (propositional) representation. There is no natural law or principle that requires it. You could imagine an object moving instantly or according to any motion relation you like, and the functional space would then comply with that motion since it has no constraints of its own.
 So why does it seem more natural for imagined moving objects to traverse a ‘functional space’ than a sequence of symbolic representations of locations?
There are at least two reasons why a ‘functional space’ might seem
more natural than a symbolic representation of space, and both
depend on (1) subjective experience and (2) the intentional fallacy.
Where does the obligatory constraint come from?
There are at least two reasons why the following equation holds in the mental image scanning task, even though, unlike in the real vision case, it does not follow from a natural law.
Actual time = (represented distance) / (represented speed)
1. Because subjects have tacit knowledge that this is what would happen if they viewed a real display, and they understand the task to be one of reproducing properties of this viewing, or
2. Because the matrix is taken to be a simulation of real space. In that case the reason that the equation holds is that it is supposed to be simulating real space and the equation holds in real space.
 In that case it is not something about the form of the representation that
provides the principled constraint, it’s the fact that it is supposed to be
simulating real space which is where the obligation comes from. But the
same thing can be done for any form of representation.
Why is it ‘natural’ to assume that
functional space is like real space?
There are several reasons why a functional space, such
as a matrix data structure, appears to have natural
spatial properties (e.g., distances, size, empty places):
1. Because when we think of functional space, such as
a matrix, we think of how we usually interpret it.
 A matrix does not intrinsically have distance, empty places,
direction or any other such property, except in the mind of
the person who draws it or uses it!
 Moving from one cell to another does not require passing
through intermediate cells unless we stipulate that it does.
The same goes for the concept of ‘intermediate cell’ itself.
Why is it ‘natural’ to assume that
functional space is like real space?
2. Because when we think of a functional space, such as a
matrix, we think of it as being a way of simulating real
space in the model – making it more convenient to build
the model which otherwise would require special hardware
 This is why we think of some cells as being ‘between’ others and
some being farther away. This makes properties like distances seem
natural because we interpret the matrix as simulating real space.
 In that case we are not appealing to a functional space in explaining
the scanning effect, the size effect, etc. The explanatory force of the
explanation comes from the real space that we are simulating.
• This is just another way of assuming a real space (in the brain) where
representations of objects are located in neural space
• All the reasons why the assumption of real brain space cannot be
sustained in explanations of mental imagery phenomena apply to this
version of ‘functional space.’
Why is it ‘natural’ to assume that
functional space is like real space?
3. Because what we really want to claim is that images are
displayed on a real spatial surface – a blackboard. But to
model this we would need to build a hardware display. {An
easier way to do this is simply to claim explicitly that there is a display or
even simulate one using software (such as Kosslyn, et al. (1979) claim to
have done*)}.
 This allows us to view some cells as being ‘between’ others and
some being farther away. This makes properties like distances seem
natural because we interpret the matrix as simulating or standing in
for a real spatial display board or screen.
 In that case we are not appealing to a functional space in explaining
the scanning effect, the size effect, etc. The explanatory force of the
explanation comes from the real space that we are claiming and
simulating. This is just another way of assuming a real space (in the
brain) where representations of objects are located in neural space.
Functional space and explanatory power
● There is a notion of explanatory power that needs to be kept in
mind. It is best illustrated in terms of models that contain empirical
parameters, as in fitting a polynomial curve to data.
● The general fact about fitting a model to data is that the fewer
parameters that need to be estimated from the data to be fitted, the
more powerful the explanation. The most powerful explanation is
one that does not have to use the to-be-fitted data to tune the model.
● In terms of the current example of explaining results of experiments
involving mental imagery, appealing to a “functional space” leaves
open an indeterminate number of empirical parameters, so it
provides a very weak (or vacuous) explanation.
● A literal (brain) space, on the other hand, is highly constrained since it must conform to Euclidean axioms and Newtonian physics – otherwise it would not be the space of natural science. But that kind of space implies that images are displayed on a surface in the brain, and while that is a logical possibility it is not an empirical one.
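The point about free parameters can be made concrete: a polynomial with as many coefficients as there are data points fits any observations exactly, which is why a perfect fit by itself explains nothing. A self-contained Lagrange-interpolation sketch (the data are arbitrary made-up numbers):

```python
def lagrange_fit(xs, ys):
    """Return f interpolating every data point exactly (degree n-1 polynomial)."""
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f

# any 5 data points whatsoever -- even noise -- are "fit" perfectly
xs = [0, 1, 2, 3, 4]
ys = [0.3, 2.9, 1.1, 4.7, 0.2]       # arbitrary "observations"
f = lagrange_fit(xs, ys)
# with n free parameters the model can describe anything,
# so the fit explains nothing about why the values are what they are
```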
Explanation and Description
● Another way to look at what is going on is to think about the
difference between a description and an explanation. The two
ways of characterizing a set of phenomena appear similar – they
both speak of how things are and how they change (think of the
Code Box example).
● But a description of a system’s behavior can apply to many
different types of system with different mechanisms and
different causal properties. And the same mechanisms can also
produce very different behaviors under different circumstances.
Although a general statement of what constitutes scientific
explanation and how it differs from description has a long and
controversial history, the simple Code Box example will suffice
to suggest the distinction I have in mind.
Cognitive Penetrability again
● A description states the observed generalizations (the
observed patterns of behavior). The explanation goes
beyond this. The difference is related to the question of
how mutable a set of generalizations are and what types of
effects can lead to changes in these generalizations.
● Causal accounts tend to have a longer time scale and when
they change they tend to change according to different
sorts of principles than those that describe the patterns.*
 Notice that we have come back to the criterion of cognitive penetrability. According to this way of looking at the question of explanatory adequacy, a theory meets the criterion of explanatory adequacy if it describes the architecture of the system and its operation.
A note about time scales and types of changes
* “Causal accounts tend to have a longer time scale and when
they change they tend to change according to different sorts
of principles than those that describe the patterns.”
Consider the Code Box example.
 Changes that are not architectural tend to occur rapidly –
different patterns are observed simply because different topics
or words or even languages might be transmitted.
 Changes that are architectural require altering which letters or
other symbols are transmitted (e.g., they may be numerals) or
changing whether the outputs consist of short and long pulses
that are interpretable as Morse Code. They require what we
might think of as “rewiring”.
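A toy rendering of the Code Box contrast (the code table fragment and function names are illustrative, not part of the original example): the letter-to-pulse table plays the role of the architecture, and the message is just data.

```python
# Toy "Code Box": the table is the (fixed) architecture, the message is data
MORSE = {"a": ".-", "b": "-...", "s": "...", "o": "---"}  # fragment of Morse

def transmit(msg):
    """Encode a message with the current architecture (the MORSE table)."""
    return " ".join(MORSE[ch] for ch in msg)

# non-architectural change: a different message, same mechanism
signal = transmit("sos")
# an architectural change would mean altering MORSE itself ("rewiring"),
# e.g., so outputs are no longer interpretable as Morse Code at all
```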
A different way of approaching the
question of spatial representation
 I offer a provisional proposal that preserves some of the
advantages of the global spatial display, but assumes that the
relevant spatial properties are in the perceived world and can
be accessed if we have the right access mechanisms for
selecting and indexing objects in the perceived world
 Let’s call this the Index Projection Hypothesis because it
suggests that mental objects are somehow projected onto and
associated with perceived objects in real space
 But this proposal is very different from image-projection because
only a few object-labels are projected – not the rich visual
properties suggested by the phenomenology
The Index Projection Hypothesis
This projection hypothesis relies on the spatial locations of
objects in the concurrently perceived world to meet the
conditions outlined earlier. It rests on two assumptions:
1. We have a system of “pointers” (viz. the FINST perceptual
index mechanism to be described) by which a small number
(n≤4) of objects in the world can be selected and indexed.
Indexes provide demonstrative references to individual targets
qua individuals, that keep referring to these objects despite
changes in their location or any other properties.
2. When we perceive a scene that contains indexed objects, our
perceptual system is able to treat those objects as though they
were assigned unique labels. Thus our perceptual system is
able to detect configurational properties among the indexed
objects.
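Assumption 1 can be sketched as a toy data structure (all names here are hypothetical, invented for illustration): an index is a reference to an object itself, not a stored location, so it keeps referring to the same object through changes of position.

```python
class FINSTIndexes:
    """Toy sketch of FINST-style indexes: at most four pointers that
    keep referring to the same objects as those objects move."""
    MAX = 4

    def __init__(self):
        self._targets = []             # references to objects, not locations

    def grab(self, obj):
        if len(self._targets) >= self.MAX:
            raise RuntimeError("only ~4 indexes available")
        self._targets.append(obj)

    def locations(self):
        # locations are read off the objects *now*, not stored in the index
        return [obj["pos"] for obj in self._targets]

a = {"pos": (0, 0)}
b = {"pos": (5, 5)}
finsts = FINSTIndexes()
finsts.grab(a)
finsts.grab(b)
a["pos"] = (2, 3)                      # the object moves...
# ...and the index still refers to it, so its new position is accessible
```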
The index projection hypothesis (2)
The hypothesis claims that the subjective impression that we
have access to a panorama of detailed high-resolution
perceptual information is illusory. What we have access to is
only information about selected or indexed objects.
 We have the potential to obtain other information from more of
the scene through the use of our system of perceptual indexes.
This is the basic insight expressed in the “world as external
memory” slogan or the “situated cognition” approach.
 In reasoning using mental images we may assign indexes to
perceived objects based on our memory of (or our assumptions
about) where certain ‘mental’ objects are located
 But notice that the memory representation is itself not used
in spatial reasoning and therefore need not meet the spatial
constraints listed earlier – it can be in some general LOT
Examples of the projection hypothesis
 To illustrate how the projection hypothesis works, first consider index-based projection in the visual modality, where indexes can convert some apparently mental-space phenomena into perceived-space phenomena (more on the non-visual case later)
Examples from some ‘mental imagery’ experiments
 Mental scanning (Kosslyn, 1973)
 Mental image superposition (Podgorny & Shepard, 1978)
 Visual-motor adaptation (Finke, 1979)
 S-R compatibility to imagined locations (Tlauka, 1998)
Studies of mental scanning
Often cited to suggest that spatial representations
are literally spatial and have metrical properties
[Figure: map with labeled places (tower, windmill, steeple, tree, beach); time to “see” a feature on the image increases with distance on the image]
Brain image or index-based projection?
 A way to do this task:
 Associate places on the memorized map with
objects located in the same relative locations in the
world that you perceive (e.g., the room you are in)
 Move your attention or gaze from one place to
another as they are named
Using a perceived room to anchor
FINSTs tagged with map labels
Using vision with selected ‘labeled’ objects
 If you ‘project’ the pattern of map places by indexing objects in the room in front of you that correspond to the memorized relative locations, then you can scan attention from one such indexed object to another. The relation time = distance / speed holds because the space you are scanning is the real physical space in the room.
 You can also use the indexed objects to infer configurational properties you may not have noticed, despite memorizing the locations of the objects, e.g.:
 Which 3 or more places on the map are collinear?
 Which place on the map is furthest North (or South, East, West)?
 Which 3 places form an isosceles triangle?
 Such configurational consequences can be detected, as opposed to logically inferred, so long as they involve only a few places, because the visual system can examine the indexed objects in the scene
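Detecting a configurational property like collinearity over a few indexed objects is a simple geometric test. The coordinates below are invented for illustration (they are not the layout of the actual map):

```python
def collinear(p, q, r, tol=1e-9):
    """Cross-product test: zero signed area means the three points lie on a line."""
    return abs((q[0] - p[0]) * (r[1] - p[1])
               - (q[1] - p[1]) * (r[0] - p[0])) < tol

# hypothetical indexed map places
places = {"beach": (0, 0), "tree": (2, 2), "tower": (4, 4), "steeple": (1, 3)}

on_a_line = collinear(places["beach"], places["tree"], places["tower"])
off_line = collinear(places["beach"], places["tree"], places["steeple"])
```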
Connecting Images and Motor actions
Images and visual-motor phenomena
 S-R Compatibility / Simon effect
 Finke’s imagined wedge goggles
 Harry’s subitizing-by-pointing
Both vision and visual imagery have
some connection to the motor system
 There are a number of experiments showing the close connection between images and motor control*
 You can get Stimulus-Response compatibility effects between the location of an imagined stimulus in space and the location of the response button in space,
 Ronald Finke showed that you could get adaptation with the imagined (misperceived) position of the hand that was similar to adaptation to displacing prism goggles,
 Both these findings provide support for the view that the spatial character of images comes from something being projected onto a concurrently perceived scene and then functioning much as objects of perception.
* This is the main new idea in Chapter 5 of Things & Places.
This story is plausible for visual cases, but
how does it work without vision (e.g., in the
dark)?
 We must rely on our remarkable capacity to orient to
(point to, navigate towards, …) perceived or recalled
objects (including proprioceptive ‘objects’) in space
without vision
 Call this general capacity our spatial sense
 How can the projection hypothesis account for this
apparently world-centered spatial sense without assuming
a global allocentric frame of reference?
 Answer: Just as it does with vision, by anchoring
represented objects to (non-visually) perceived objects in
the world
The spatial sense and the projection
hypothesis
 Indexing non-visual ‘objects’ must exploit auditory and
proprioceptive signals, and perhaps even preparatory
motor programs (the ‘intentional’ frame of reference proposed
by Andersen & Buneo, 2002; Duhamel, Colby & Goldberg,
1992)
 Is there some special problem about proprioceptive
inputs that makes them different from visual inputs?
Is there a problem with proprioceptive inputs
indexing objects the way visual indexes do?
 Unlike visual objects, proprioceptive “objects” are not fixed in
an allocentric frame of reference – or are all objects the same?
 Notice that in vision and audition, even though static objects are
fixed in an allocentric frame of reference, they nonetheless move
relative to sensors, so their location in an allocentric frame must be
updated as the proximal pattern moves (Andersen, 1999; Stricanne, Andersen
& Mazzoni, 1996)
 The neural implementation of FINST indexes in vision requires an
active updating process of some kind
 Maybe the same updating operation can also yield the sense of
“same location in space” for proprioceptive ‘objects’
 There are good reasons to think that proprioceptive signals may also
be given in an allocentric frame of reference! (Yves Rossetti)
What is the real problem of our sense of space?
 In order to solve the problem of how we index objects in the
world using proprioceptive inputs we need to solve the problem
of how we recognize two such inputs as corresponding to
actions (e.g., reaching) towards the same object in the world
 This is the problem of the equivalence of movements, or of
proprioceptive inputs, corresponding to the same object – it is
the problem that Henri Poincaré recognized as the central
problem of understanding our sense of space (in Poincaré, “Why
space has three dimensions,” Dernières Pensées, 1913)
 Solving the equivalence problem would solve the problem of
coordinating signals across frames of reference

That’s why mechanisms of coordinate transformation are of
central importance – they generate the relevant equivalences!
Assumption: Coordinate transformations are the
basis for the illusory “global frame of reference”
 A coordinate transformation operation takes a representation of
an object relative to one coordinate system – say retinal
coordinates – and produces a representation of that object relative
to another frame of reference – say relative to the location of a
hand in proprioceptive or kinematic coordinates
 Coordinate transformations define equivalence classes of
proprioceptive inputs that correspond to actions (e.g., reaching,
eye movements) towards the same object in space
 Such transformations are well-known and ubiquitous in the brain
(especially in posterior parietal cortex and superior colliculus)
 A consequence of these mechanisms is that, as (Colby &
Goldberg, 1999) put it, “Direct sensory-to-motor coordinate
transformation obviates the need for a single representation of
space in environmental coordinates” (p319)
Coordinate transformations need not transform
all points in a given frame of reference
 Coordinate transformations need not transform all points
(including points in empty space) or all sensory objects: Only
a few selected objects need to be transformed at any one time
 The computational complexity of coordinate transformations
can be made tractable by only transforming selected objects (as
is done by matrix operations in computer graphics)
 This idea is closely related to the conversion-on-demand
hypothesis of Henriques et al. (1998) and Crawford et al. (2004).
 In the Henriques et al. COD proposal, visual information about
object locations is held in a gaze-centered frame of reference
and objects are converted to motor coordinates when needed
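The conversion-on-demand idea can be given a rough sketch (the frames, names, and numbers are invented, and real transformations in parietal cortex are of course not literal matrix multiplies): only the few indexed objects are pushed through the frame-to-frame transformation, just as computer graphics applies a pose matrix to selected points.

```python
def transform(pose, point):
    """Map a 2-D point into another frame of reference.

    `pose` is a 3x3 homogeneous matrix (rotation plus translation),
    e.g., from gaze-centered to hand-centered coordinates.
    """
    x, y = point
    xh = pose[0][0] * x + pose[0][1] * y + pose[0][2]
    yh = pose[1][0] * x + pose[1][1] * y + pose[1][2]
    return (xh, yh)

def convert_on_demand(indexed, pose):
    """Transform only the currently indexed objects, not every point
    in the visual field -- the selectivity that keeps the computation
    tractable."""
    return {name: transform(pose, p) for name, p in indexed.items()}

# Hypothetical pose: the hand frame's origin sits at (3, -1) in gaze coordinates
gaze_to_hand = [[1.0, 0.0, -3.0],
                [0.0, 1.0,  1.0],
                [0.0, 0.0,  1.0]]
print(convert_on_demand({"cup": (5, 2)}, gaze_to_hand))  # {'cup': (2.0, 3.0)}
```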
Coordinate transformations define equivalence
classes of gestures which individuate proprioceptive
objects just the way that FINST indexes do in vision
 Coordinate transformations compute equivalence classes of
proprioceptive signals {s} corresponding to distinct motor actions to
individual objects in real space. The equivalence class is given by:
s ≡ s′ iff there is a coordinate transformation s ↔ s′
 As in the visual case, only a few such equivalence classes are
computed, corresponding to a few distal objects that were selected
and assigned an index, as postulated in FINST Theory
 We can thus bind several objects of thought to objects in real space
(including sensory ‘objects’ perceived in proprioceptive modalities)
 This can explain the ‘spatial’ character of spatial representations,
just the way they did in the purely visual cases illustrated earlier
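A toy sketch of how such equivalence classes might be computed (the frames and their origins are invented): each signal is pushed through its frame's transformation into world coordinates, and signals landing on the same world location fall into one class.

```python
def to_world(signal, frame_origins):
    """Map a (frame, (x, y)) signal to world coordinates by adding the
    (hypothetical) origin of that frame -- a stand-in for a full
    coordinate transformation."""
    frame, (x, y) = signal
    ox, oy = frame_origins[frame]
    return (x + ox, y + oy)

def equivalence_classes(signals, frame_origins):
    """Group signals that the transformation maps to the same distal
    location -- the equivalence s ≡ s′ of the slide."""
    classes = {}
    for s in signals:
        classes.setdefault(to_world(s, frame_origins), []).append(s)
    return list(classes.values())

origins = {"gaze": (0, 0), "hand": (3, 2)}   # invented frame origins
signals = [("gaze", (5, 5)), ("hand", (2, 3)), ("gaze", (1, 1))]
print(equivalence_classes(signals, origins))
# [[('gaze', (5, 5)), ('hand', (2, 3))], [('gaze', (1, 1))]]
```

Here the gaze signal (5, 5) and the hand signal (2, 3) are equivalent because both point to world location (5, 5) -- one class per selected distal object, as FINST Theory postulates.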
Mental imagery and neuroscience
● Neuroanatomical evidence for a retinotopic display in
the earliest visual area of the brain (V1)
● Neural imaging data showing V1 is more active during
mental imagery than during other forms of thought
 The form of activity differs for small vs large images in the
way that it differs when viewing small and large displays
● Transcranial magnetic stimulation of visual areas
interferes more with imagery than other forms of
thought
● Clinical cases show that visual and imagery impairments
tend to be similar (Bisiach, Farah)
● More recently, psychophysical measures of images
show parallels with comparable measures of vision,
and these can be related to receptive fields in V1
Status of different types of evidence in the
debate about the form of mental images
● Phenomenology. Is it epiphenomenal?
● Neuroscience evidence for:
 Role of vision
 Type and location of neural structures underlying images
 Are the neural mechanisms for early vision used in imagery?
 Does neuroanatomy provide evidence for the nature of “depictive”
representations?
Neuroscience has shown that the retinal pattern of
activation is displayed on the surface of the cortex
There is a topographical projection
of retinal activity on the visual
cortex of the cat and monkey.
Tootell, R. B., Silverman, M. S., Switkes, E., & de Valois, R. L.
(1982). Deoxyglucose analysis of retinotopic organization in
primate striate cortex. Science, 218, 902-904.
Problems with drawing conclusions about the
nature of mental images from neuroscience data
1. The capacity for imagery and the capacity for vision are known to
be independent. Also, all imagery results are observed in the blind.
2. Cortical topography is 2-D, but mental images are 3-D – all
phenomena (e.g., rotation) occur in depth as well as in the plane.
3. Patterns in the visual cortex are in retinal coordinates whereas
images are in world coordinates
 Your image stays fixed in the room when you move your eyes or turn
your head or even walk around the room
4. Accessing information from an image is very different from
accessing it from the perceived world. Order of access from
images is highly constrained.
 Conceptual rather than graphical properties are relevant to image
complexity (e.g., mental rotation).
Problems with drawing conclusions about mental
images from the neuroscience evidence
5. Retinal and cortical images are subject to Emmert’s Law,
whereas mental images are not;
6. The signature properties of vision (e.g., spontaneous 3D
interpretation, automatic reversals, apparent motion, motion
aftereffects, and many other phenomena) are absent in images;
7. A cortical display account of most imagery findings is
incompatible with the cognitive penetrability of mental imagery
phenomena, such as scanning and image size effects;
8. The fact that the Mind’s Eye is so much like a real eye (e.g.,
oblique effect, resolution fall-off) should serve to warn us that
we may be studying what observers know about how the world
looks to them, rather than what form their images take.
Problems with drawing conclusions about mental
images from the neuroscience evidence
9. Many clinical cases can be explained by appeal to tacit
knowledge and attention
 The ‘tunnel effect’ found in vision and imagery (Farah) is likely due to
the patient knowing what things now looked like to her post-surgery
 Hemispatial neglect seems to be a deficit in attention, which also
explains the “representational neglect” in imagery reported by Bisiach
 A recent study shows that imaginal neglect does not appear if patients
have their eyes closed. This fits well with the account I will offer, in
which the spatial character of a mental image derives from concurrently
perceived space.
10. What if colored three-dimensional images were found in visual
cortex? What would that tell you about the role of mental
images in reasoning? Would this require a homunculus?
Should we welcome back the homunculus?
● In the limit, if the visual cortex contained the contents of one’s
conscious experience in imagery we would need an interpreter to
“see” this display in visual cortex
● But we will never have to face this prospect because many
experiments (including ones by Kosslyn) show that the contents of
mental images are conceptual (or, as Kosslyn puts it, contain
“predigested information”).
● And finally, it is clear to anyone who thinks about it for a few
seconds that you can make your image do whatever you want and
have whatever properties you wish.
 There are no known constraints on mental images that cannot be attributed to
lack of knowledge of the imagined situation (e.g., imagining a 4D cube).
 All currently claimed properties of mental images are cognitively penetrable.
Explaining mental scanning, mental rotation and
image size effects in terms of “functional space”
● When people are faced with the natural conclusion
that the “iconic” position entails space (as in scanning
and size effects) they appeal to “functional space”
● A matrix in a computer is often cited as an example
● Consider a functional space account of scanning or of
mental rotation:
 Why does it take longer to scan a greater distance in a
functional space?
 Why does it take longer to rotate a mental image a greater
angle?
Why do conscious contents misguide
us?
 The contents that appear in our conscious experience almost
always concern what we are thinking about and not what
we are thinking with – with content rather than form.
 The processes that we see unfolding in our mind are almost
always attributable to what we know about how the things
we are thinking about would unfold, rather than being due
to laws that apply to our cognitive architecture (cf. Code Box).
 We should take seriously the possibility that (almost) all
constraints and law-like behaviors of objects of our
experience are due to our knowledge rather than
to the mental architecture. Notice the mental rotation
example and the mistake that Jesse Prinz makes.
This is what our conscious experience
suggests goes on in vision…
[Kliban cartoon]
This is what the demands of explanation
suggests must be going on in vision…
Imagine this shape rotating slowly
Is this how it looked to you?
When you make it rotate in your mind, does it retain its rigid 3D
shape without re-computing it? Would you expect to ‘see’ this kind
of information process? Does the experience in this case reassure
you that the rotation was smooth? Are you sure something
rotated?
What about the evidence of conscious
experience? Is it irrelevant?
 I have often been accused of relegating conscious experience to
the category of epiphenomena – something that accompanies a
process but does not itself have a role in its causation.
 But images are not illusory or unnatural; they are quite real. The
problem is that people have theories of the causal or
information-processing story that underlies these phenomena, and
these theories are almost always false because they assume a
simple and obvious mapping from the experience to the
computational or brain states, so that we can see the form of the
representation.
 The connection between conscious experience and information
processing is deeply mysterious (it’s the mind-body problem). But
one thing we do know is that the sequence of events that unfolds
when we imagine something does not reveal causal laws because
there are no causal laws of conscious states as conscious states.
The important distinction between
architecture and represented content
 It is only obligatory that a certain pattern must occur if the
pattern is caused by fixed properties of the architecture, as
opposed to being due to properties of what is represented
(i.e., what the observer tacitly knows about the behavior of
what is represented)
 If it is obligatory only because the theorist says it is, then score
that as a free empirical parameter (a wild card)
 If we allow one theory to stipulate what is obligatory without
there being a principle that mandates it, then any other theory
can stipulate the same thing. Such theories are unconstrained
and explain nothing.
 This failure of image theories is quite general – all picture
theories suffer from the same lack of principled constraints
How are these ‘obligatory’ constraints realized?
 Image properties, such as size and rigidity, are assumed to be
inherent in the system of representation (its architecture)
 That raises the question of what kind of architecture could
possibly enforce the rigidity of shape
 Notice that neither a spatial display nor a functional space makes it
obligatory that shape be rigidly maintained as orientation is changed.
Only certain physical properties can explain rigidity.
 Such rigidity could not be part of the architecture of an imagery
system because we can easily imagine situations in which rigidity
does not hold (e.g., imagine a rotating snake!)
 There is also evidence that ‘mental rotation’ is incremental, not
holistic, and that the speed of rotation depends on the conceptual
complexity of the shape and the comparison task.
Aside: What can we conclude from the
contents of conscious experience?
What should we conclude about the role of
conscious appearance in cognitive science?
We can’t do without it:
 When we ask which line appears longest, or which version of
an ambiguous figure we see, we are asking for a report of
conscious content. Much of what we know about how vision
works depends on such evidence.
What should we conclude about the role of
conscious appearance in cognitive science?
But we can’t accept it at face value:
 If you ask yourself “what am I thinking …?” you raise one of the most
mysterious problems in the philosophy of mind. You could not
possibly give the correct, or at least an unproblematic answer because
it is quite possible that you don’t know what you are thinking! Your
experience when thinking is as of speaking or of perceiving – what else
could it be?
 Every imagined sentence is infinitely ambiguous and understanding it
presupposes a huge amount about the context of its utterance. Your
experience of speaking cannot be the same as your thought; your
thought precedes your imagined speech and is just one of the contents
that are expressed. The sentences you imagine, as the sentences you
speak, follow Gricean conversational maxims – e.g., don’t state what
your listener already knows, state only what is relevant, state only as
much as necessary to convey your intentions. Something similar is
true of imaging.
Examples of conscious evidence that
leads to false conclusions
 Sentence example
 Example of mental diagram and what it must assume
Do we (or can we) experience our
thoughts?
 There is much that can be said on this topic, but there
is no time for it here. But see:
 Grice, H. P. (1975). Logic and Conversation. Syntax and
Semantics, 3, 41-58.
 Hurlburt, R. T., & Schwitzgebel, E. (2007). Describing
Inner Experience? Cambridge, MA: MIT Press.
 Schwitzgebel, E. (2011). Perplexities of Consciousness.
Cambridge, MA: MIT Press.
But there are examples of solving geometry
problems easily with imagery
• There are many problems that you can solve much
more easily when you imagine a layout than when
you do not.
• In fact, there are many instances of solving problems by
imagining a layout that seem very similar to how one
would solve them if one had pencil and paper.
• The question of how pictures, graphs, diagrams, etc.,
help in reasoning is very closely related to the
question of how imagined layouts function in
reasoning. That is not in question. What is in
question is what happens in either the visual or the
imagined case, and how images can benefit from these
processes even though there is no real diagram.
How do real visual displays help thinking?
 Why do diagrams, graphs, charts, maps, icons and other visual
objects help us to reason and to solve problems?
 The question of why visual aids help is nontrivial; Seeing &
Visualizing, chapter 8, contains some speculative discussion,
e.g., they allow the visual system to:
• make certain kinds of visual inferences
• make use of visual demonstratives to offload some of the memory load
• capitalize on the fact that the displays embody the axioms of measure
theory and of geometry (which are then inherited by thought)
 The big question is whether any of these advantages carry over to
imaginal thinking! Do mental images have some (or any) of the
critical properties that make diagrams helpful in reasoning?
Visual inferences?
● If we recall a visual display it is because we have encoded
enough information about its visual-geometrical properties that
we can meet some criteria, e.g., we can draw it. But there are
innumerably many ways to encode this information that are
sufficient for the task (e.g. by encoding pairwise spatial
relations, global spatial relations, and so on). For many
properties the task of translating from one form to another is
much more difficult than the task of visually encoding it – the
translation constitutes visual inference.
● The visual system generalizes from particular instances as part
of its object-recognition skill (all recognition is recognition-as
and therefore assumes generalization from tokens to types). It
is also very good at noticing certain properties (e.g., relative
sizes, deviations from square or circle, collinearity, inside, and
so on). These capabilities can be exploited in graphical layouts.
Memorize this map so you can draw it accurately
From your memory:
• Which groups of 3 or more locations are collinear?
• Which locations are midway between two others?
• Which locations are closest to the center of the island?
• Which pairs of locations are at the same latitude?
• Which is the top-most (bottom-most) location?
 If you could draw the map from memory using whatever
properties you noticed and encoded, you could easily
answer the questions by looking at your drawing – even
if you had not encoded the relations in the queries.
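The contrast the slide draws can be made concrete: if the locations were stored as explicit coordinates (the names and numbers here are invented), the remaining queries would reduce to simple computations over them, rather than to inspection of a drawing.

```python
from itertools import combinations
from math import hypot

def closest_to(places, center):
    """The location nearest a given point (e.g., the island's center)."""
    return min(places, key=lambda n: hypot(places[n][0] - center[0],
                                           places[n][1] - center[1]))

def same_latitude(places, tol=1e-9):
    """Pairs of locations at (approximately) the same latitude (y)."""
    return [(a, b) for (a, pa), (b, pb) in combinations(places.items(), 2)
            if abs(pa[1] - pb[1]) <= tol]

places = {"hut": (0, 0), "well": (4, 0), "tower": (2, 3)}
print(closest_to(places, (2, 1)))  # tower
print(same_latitude(places))       # [('hut', 'well')]
```

The question at issue is whether memory for the map encodes anything like these coordinates, or only the relations one happened to notice.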
Draw a rectangle. Draw a line from each bottom corner to a point
on the opposite vertical side. Do these two lines intersect? Is the
point of intersection of the two lines below or above the
midpoint? Does it depend on the particular rectangle you drew?
[Diagram: rectangle ABDC with lines x and y drawn from the bottom
corners to the opposite vertical sides; m and m′ mark midpoints]
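The answer the drawing reveals can also be derived. Take the rectangle to have width w and height h, with one line from the bottom-left corner (0, 0) to a point at height y1 on the right side and the other from the bottom-right corner (w, 0) to a point at height y2 on the left side. The lines y = y1·x/w and y = y2·(1 − x/w) cross at height y1·y2/(y1 + y2): the width cancels, and since y1, y2 ≤ h this height never exceeds h/2. So the intersection is never above the midpoint, whatever rectangle you drew. A short check:

```python
def intersection_height(y1, y2):
    """Height of the crossing point of the two lines; the rectangle's
    width drops out of the algebra entirely."""
    return y1 * y2 / (y1 + y2)

h = 10.0
for y1, y2 in [(10.0, 10.0), (6.0, 8.0), (1.0, 9.0)]:
    assert intersection_height(y1, y2) <= h / 2
print(intersection_height(6.0, 8.0))  # about 3.43 -- below the midpoint at 5.0
```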
Which properties of a real diagram also
hold for a mental diagram?
● A mental “diagram” does not have any of the properties that a
real diagram gets from being on a rigid 2D surface.
● When you imagine 3 points on a line, labeled A, B, and C, must
B be between A and C? What makes that so? Is the distance AC
greater than the distance AB or BC?
● When you imagine drawing point C after having drawn points A
and B, must the relation between A and B remain unchanged
(e.g., the distance between them, their qualitative relation such as
above or below)? Why?
● These questions raise what is known as the frame problem in
Artificial Intelligence. If you plan a sequence of actions, how do
you know which properties of the world a particular action will
change and which it will not, given that there are an unlimited
number of properties and connections in the world?