The Illusion of Mental Pictures
Zenon Pylyshyn
Rutgers University, Center for Cognitive Science
http://ruccs.rutgers.edu/faculty/pylyshyn.html

The illusion of mental pictures
● There is no question that people (except maybe ~2% of the population) experience mental images when they recall, plan, anticipate and otherwise enjoy life in the absence of the things and people that they imagine.
● Not only are we able to “picture” some object or scene in our “mind’s eye”, but it seems that we must do so in order to solve certain kinds of problems.
● Books are full of examples of how images helped people to discover and create theories and works of art – creations that would not have happened without the capacity to use mental imagery. I will not rehearse all the examples, but they include Einstein, Kekulé, …

The illusion about the causal role of mental pictures in thought
● What happens when we create and inspect mental images? This is a deep problem because it touches on the mind-body duality and other potentially unsolvable problems. But it is important that as scientists we consider what is entailed by talk of creating, recalling, examining and transforming mental images.
● I have argued, and still believe, that there is a powerful illusion behind not only our folk understanding of mental imagery, but also behind our attempts to build scientific theories of it.
● The illusion is this: When we engage in what we call visualizing there is, somewhere (presumably in our head), a thing that we view more or less the way we view the world; a thing that we might as well call a mental picture since a picture is, after all, something that looks like the thing that is pictured.

Does the intentional fallacy apply to all conscious experiences, or only to sensory experiences (and is there a difference)?
♫ I believe it applies to all conscious experiences because I don’t know of any conscious experiences that are not at some level perceptual.
Even somatic experiences arise from transduction (e.g. headache, nausea, dizziness), even though the conscious content may not signal the locus accurately.
♫ True example: At this very moment I have a toothache in my upper left tooth, but my dentist insists that it is actually in my bottom teeth, and he may be right.

Some common mistakes in thinking about mental imagery
1. The intentional fallacy: Confusing properties of the imagined world with properties of the imagination or the mechanism or medium of imagery. Examples: size, distance, color, shape, temporal duration?
2. Task demands. Insufficient attention is paid to how subjects interpret instructions in imagery experiments. “Imagine x” is understood as “Pretend that you are seeing x happening”. This is a “task demand” and is not a case of subjects being acquiescent or compliant and trying to give the result they think the experimenter wants. It is a rational and appropriate understanding of the task of imagining something. The consequence is that subjects will make as many properties as they know of, and are able to control, come out as they would have in reality.

1. A common mistake in thinking about mental imagery is the “intentional fallacy”, in which one confuses properties of the imagined world with properties of the imagination or the mechanism or medium of the image.
An image of X with property P can mean
   a) (An image of X) with property P: it’s the image that has the property! Or
   b) An image of (X with property P): it’s the X that has the property!
2. Demands of the task to “imagine x”
● Most of the behavioral research into dynamic properties of mental imagery is explained by noticing how one studies properties of imagined processes. When we ask subjects to “imagine that X”, where X is some process (like looking at a small mouse or watching a spot move across a map from one place to another), what we are inviting the subject to do is pretend that they are seeing X happening. In that case, how X unfolds is dictated by what observers believe would happen if they were seeing it. This belief is often tacit and unconscious.
● This is not a case of subjects being disingenuous, or of acquiescing to “experimenter demand”. Rather it’s the rational response to the task as presented.
● Consider the task of imagining hearing a note-for-note perfect performance of a Mozart symphony. How long should that take? (a) 30 seconds, (b) 20 minutes. Mozart claimed (a). Was he telling the truth?

Examples to probe your intuition and your tacit knowledge
Imagine various events unfolding before your “mind’s eye”…
● Imagine turning a heavy wheel. Now a light wheel. Which is faster?
● Imagine a baseball being hit. What shape trajectory does it trace? It is coming towards you: Where would you run to catch it? You have considerable “tacit knowledge” of what to do in this case.
● Imagine a coin dropping and whirling on its edge as it eventually settles.
Describe how it behaves (the “Euler’s Disk” problem was first solved in 2000).
● Imagine a heavy ball (a shot-put) and a light ball (a tennis ball) being dropped at the same time from a building (e.g., the Leaning Tower of Pisa). Indicate when they hit the ground. Repeat at different heights.
● Imagine a clear glass containing a colored liquid. Tilt it 45º to the left (counter-clockwise). What is the orientation of the liquid?
What color do you see when two color filters overlap?
Where would the water go if you poured it over a full beaker of sugar? Is there conservation of volume in your image? If not, why not?
What can we conclude from reports of arithmetic operations on imagined quantities?

And now for something more serious
Representation of Space in Consciousness (especially in the form of mental images)
This is a major topic in imagery research and also in discussions of (self-)consciousness.

Spatial character of mental images
● Among the more impressive findings of research on mental imagery are ones that suggest that images have spatial properties (e.g., mental rotation, mental scanning, mental size effects, psychophysical measures of the “mind’s eye”).
● Intuitively we feel that we can reason by imagining things laid out in space, and that by examining the display we can often read off the solution. Yet there have been few attempts to say exactly what being “laid out in space” means, either formally or physically.
● One of the most explicit statements concerning the spatial properties of images has been a statement by Steve Kosslyn about what he calls the depictive nature of mental images.

Images as displayed in “functional space”
A statement of the picture theory (Kosslyn, 1994)
● “A depictive representation is a type of picture, which specifies the locations and values of configurations of points in a space”.
● “The space in which the points appear need not be physical…, but can be like an array in a computer, which specifies spatial relations purely functionally. That is, the physical locations in the computer of each point in an array are not themselves arranged in an array; it is only by virtue of how this information is ‘read’ and processed that it comes to function as if it were arranged into an array (with some points being close, some far, some falling along a diagonal, etc).” (p. 5)
I will argue that it is important why the information is ‘read’ in one way rather than in another, since that determines whether the account is explanatory or descriptive or merely circular.

The illusion of mental (picture) space
● There have been two options for accounting for the spatial properties of conscious images:
   1. Assume a physical display in the brain, or
   2. Assume a mechanism that simulates spatial properties but is not itself a literal space. This is referred to as functional space, or
   3. Assume that the phenomenology has causal power.
● Neither of these options is consistent with empirical evidence: The cortical space assumption is not consistent with neural or behavioral evidence, and the functional space assumption is either metaphorical or circular. Because a functional space has no inherent constraints, it exhibits whatever properties we stipulate it to have, and thus is not explanatory. Later I will suggest that there is a possible constraint on the spatial properties, but that it lies not in the head but in the relation of thought to concurrently-perceived space.

What does being “spatial” entail?
Images and space: some possible constraints
1. Are images spatial? Do they have spatial properties such as size, distance, and relations such as above, next-to, in-between? Do the axioms of Euclidean geometry and measure theory hold of patterns displayed in them? E.g., for points a, b, c:
   a) $ab + bc \geq ac$ and $ab = ba$
   b) If $\angle abc = 90^\circ$, then $ab^2 + bc^2 = ac^2$
2. If such axioms are true of images, what would that entail about how they must be instantiated in the brain?
   a) Could they be analogue? What constraints does that impose?
   b) Is the space 2-D or 3-D?
3. Is there a coherent notion of a “functional space,” as something with the formal properties of space yet without being instantiated in real physical brain-space?

The spatial-metrical character of images
● The claim that images have spatial properties comes from our phenomenology, and also from a number of experiments suggesting that images must actually have metrical properties, particularly spatial ones (not just represent metrical properties, but have them).
● The most commonly cited experiments are ones that seem to involve continuous spatial properties:
   Mental scanning across an image (discussed earlier)
   Effects of image size
   Mental rotation of images

Do images have “size”?
● There are many studies showing that when subjects imagine something small it takes them longer to detect small features (e.g. the mouse’s whiskers) than when they imagine it as large.
● What do these tell us about what “size” is on an image? There are two possibilities: The size is either the size of the image or it is the size of the thing imagined.
● The first needs either a physical size or some theoretical idea about what constitutes image size that has yet to be provided. The second can yield the observed result simply because the subject knows what would happen in a real viewing: if something is seen to be small the details will not be as clear, or you will need to come closer (or ‘zoom’ in on the object) to see the details, so an observer will surely ensure that this happens. (What if it didn’t?)
Suppose, instead, you were asked how long it took to report details in a large blurred low-resolution image versus a small high-definition image? Why is such a control not done?

Do mental images have (as opposed to represent) size?
1. Imagine a mouse across the room so its image “occupies a small part of your total image display”.
2. Now imagine it close to you so it fills your image display. Of these two conditions, in which do you see small details most clearly? In which does it take longer to “see the mouse’s whiskers?” What does this result tell you about the “size” of a mental image?
3. Imagine a horse. How close can you imagine coming to the horse before it starts to overflow your image? Repeat with a toaster, a table, a person’s face, etc.
4. Does this provide a measure of “the visual angle of the mind’s eye” and a measure of the “mental size” of the horse? {cited as a classical image finding} Do these concepts have any meaning without the literal space view of images?
Clearly these results are the ones you would expect if the subject is telling you what it would be like to see a real horse, mouse, etc.

One of the least controversial examples of image transformation: Mental rotation
Time to judge whether (a)-(b) or (b)-(c) are the same except for orientation increases linearly with the angle between them (Shepard & Metzler, 1971).
Imagine this shape rotating slowly. Is this how it looked to you? When you make it rotate in your mind, does it seem to retain its rigid 3D shape without re-computing it?

The missing ‘obligatory’ constraint
● What is assumed about the format or architecture of the mental representation in the examples of mental rotation?
● According to philosopher Jesse Prinz (2002, p. 118), ‘If visual-image rotation uses a spatial medium of the kind Kosslyn envisions, then images must traverse intermediate positions when they rotate from one position to another. The propositional [symbolic] system can be designed to represent intermediate positions during rotation, but that is not obligatory’. This is a very important observation but it is incomplete.
It needs to answer the question: What makes it obligatory that objects must ‘pass through intermediate positions’ when rotating in ‘functional space’, and what constitutes an ‘intermediate position’? These terms apply to the represented world, not to the representation!
● The important distinction is between architecture and represented content. It is only obligatory that a certain pattern must occur if the pattern is caused by fixed properties of the architecture, as opposed to being due to properties of what is represented (i.e., what the observer tacitly knows about the behavior of what is represented). If it is obligatory only because the theorist says it is, then that is a free empirical parameter. The important consequence is that if we allow one theory to stipulate what is obligatory without there being a principle that mandates it, then any other theory (e.g. a symbolic LOT theory) can stipulate the same thing. Such theories are unconstrained and explain nothing. This failure of image theories is quite general – all picture theories suffer from the same lack of principled constraints.

How are these ‘obligatory’ constraints realized?
● Image properties, such as size and rigidity, are assumed to be inherent in the architecture (of the ‘display’).
● That raises the question of what kind of architecture could possibly enforce rigidity of shape. Notice that there is nothing about a spatial display, let alone a functional space, that makes it obligatory that shape be rigidly maintained as orientation is changed. Also, such rigidity could not be part of the architecture of an imagery module because we can easily imagine situations in which rigidity does not hold (e.g. imagine a rotating snake!).
There is also evidence that ‘mental rotation’ is incremental, not holistic, and that the speed of rotation depends on the conceptual complexity of the shape and the comparison task.
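The observation that nothing about a spatial display makes rigidity obligatory can be illustrated with a minimal Python sketch. It is not part of the original slides: the point-list representation and the names `rotate_rigid`, `rotate_stretch`, `pairwise_distances` and `is_rigid` are hypothetical, chosen only for illustration. The same data structure accepts a shape-preserving update rule and a shape-distorting one equally well, so any rigidity comes from the rule we stipulate, not from the "display".

```python
import math

# A hypothetical "functional space": a shape stored as a list of (x, y) points.
# Nothing in this data structure says how the shape must change over time.
SHAPE = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]


def pairwise_distances(points):
    """Return all inter-point distances, in a fixed order."""
    return [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]


def rotate_rigid(points, angle_deg):
    """A stipulated update rule that happens to preserve shape."""
    a = math.radians(angle_deg)
    return [
        (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a))
        for x, y in points
    ]


def rotate_stretch(points, angle_deg):
    """An equally admissible rule: stretch along x as the angle grows, then rotate."""
    scale = 1.0 + angle_deg / 90.0
    return rotate_rigid([(x * scale, y) for x, y in points], angle_deg)


def is_rigid(before, after, tol=1e-9):
    """True if every inter-point distance is unchanged by the transformation."""
    return all(
        math.isclose(d0, d1, abs_tol=tol)
        for d0, d1 in zip(pairwise_distances(before), pairwise_distances(after))
    )


if __name__ == "__main__":
    print(is_rigid(SHAPE, rotate_rigid(SHAPE, 45)))    # shape-preserving rule
    print(is_rigid(SHAPE, rotate_stretch(SHAPE, 45)))  # shape-distorting rule
```

Both rules run on the same representation without complaint. Choosing the rigid one is a decision made by whoever writes the update rule, which is the sense in which the constraint is stipulated rather than architectural.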
Example 2: Mental Scanning
● Some hundreds of experiments have now been done demonstrating that it takes longer to scan attention between places that are further apart in the imagined scene. In fact the time-distance relation is linear.
● These have been reviewed and described in: Denis, M., & Kosslyn, S. M. (1999). Scanning visual mental images: A window on the mind. Cahiers de Psychologie Cognitive / Current Psychology of Cognition, 18(4), 409-465.
Rarely cited are experiments by Liam Bannon and me (described in Pylyshyn, 1981), which I will summarize for you.

Studies of mental scanning: Does it show that images have metrical space?
[Figure: Latency (secs) as a function of relative distance on image, for three conditions: scan image, imagine lights, show direction. (Pylyshyn & Bannon, described in Pylyshyn, 1981)]
Conclusion: The image scanning effect is cognitively penetrable, i.e., it depends on goals and beliefs, or on tacit knowledge.

The central problem with imagistic explanations…
What is assumed in the mental picture explanations of mental scanning?
● In actual vision, it takes longer to scan a greater distance because real distance, real motion, and real time are involved; therefore this equation holds due to natural law:
   $\text{Time} = \dfrac{\text{distance}}{\text{speed}}$
But what ensures that a corresponding relation holds in an image? The obvious answer is: Because the image is laid out in real space! But what if that option is closed for empirical reasons? Well, you might appeal to a “Functional Space”, which imagists liken to a matrix data structure in which some pairs of cells are closer and others further away, and in which, to move from one cell to another, it is natural that you pass through intermediate cells.
● Question: What makes these sorts of properties “natural” in a matrix data structure? What warrants the ‘obligatory’ constraint? To use Prinz’s term, it is not obligatory that the well-known relation between distance, speed and time hold in functional space or in a matrix.
There is no natural law or principle that requires it. You could imagine an object moving instantly or according to any motion relation you like, and the functional space would then comply with that since it has no constraints of its own.

Where does the obligatory constraint come from?
There are at least two reasons why the following equation holds in the mental image scanning task, even though, unlike in the real vision case, it does not follow from a natural law.
   $\text{Time} = \dfrac{\text{Representation of distance}}{\text{Representation of speed}}$
(By the way, there are reasons to question even this widely-held view!)
1. Because subjects have tacit knowledge that this is what would happen if they viewed a real display, and they understand the task to be one of reproducing properties of this viewing, or
2. Because the matrix is taken to be a simulation of real space. In that case the reason that the equation holds is that it is supposed to be simulating real space and the equation holds in real space. It is then not something about the form of the representation that provides the principled (obligatory) constraint; it’s the fact that it is meant to be simulating real space, which is where the obligation comes from. But, again, the same thing can be done for any form of representation.

Functional space and explanatory power
● There is a notion of explanatory power that needs to be kept in mind. It is best illustrated in terms of models that contain empirical parameters, as in fitting a polynomial curve to data.
● The general fact about fitting a model to data is that the fewer parameters that need to be estimated from the data to be fitted, the more powerful the explanation. Thus the lower the order of the polynomial fit the better the explanation.
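The relation between free parameters and explanatory power can be illustrated with a minimal Python sketch. It is not part of the original slides: the function names `lagrange_fit` and `max_error` and all data values are hypothetical. A polynomial with as many coefficients as there are data points passes through any data set exactly, whether the made-up latencies rise with distance or not, so a perfect fit of that kind tells us nothing about the process that produced the data.

```python
def lagrange_fit(xs, ys):
    """Return a function evaluating the unique polynomial of degree
    len(xs) - 1 that passes through every (x, y) pair (xs must be distinct)."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    # Lagrange basis factor: 1 at xi, 0 at every other xj.
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly


def max_error(model, xs, ys):
    """Largest absolute difference between model predictions and data."""
    return max(abs(model(x) - y) for x, y in zip(xs, ys))


if __name__ == "__main__":
    distances = [1.0, 2.0, 3.0, 4.0, 5.0]
    # Two made-up, mutually incompatible "results" (hypothetical numbers).
    increasing = [0.6, 0.9, 1.2, 1.5, 1.8]
    irregular = [1.8, 1.1, 1.3, 0.7, 0.6]
    for latencies in (increasing, irregular):
        model = lagrange_fit(distances, latencies)
        print(max_error(model, distances, latencies))
```

A straight line, by contrast, has only two coefficients and cannot pass through the irregular data set, which is why a good low-order fit is informative in a way the full-order fit is not.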
● In terms of the current example of explaining results of experiments involving mental imagery, appealing to a “functional space” leaves open an indeterminate number of empirical parameters, so it provides a very weak or vacuous explanation.
● A literal (brain) space, on the other hand, is highly constrained since it must conform to Euclidean axioms and Newtonian physics – otherwise it would not be the space of natural science. But that kind of space implies that images are displayed on a surface in the brain.

What next?
● So we turn now to the only place where we might be able to find properties that explain the experimental imagery results – the brain – because it is the only place where there is literal physical space that could function to underwrite such operations as scanning or rotation. No wonder the more recent work on imagery has been carried out in collaboration with neuroscience.

The good news
I. Is there any reason to be optimistic about finding mechanisms of imagery in visual cortex?
● There is neuroanatomical evidence for a retinotopic layout in the earliest visual area of the brain (V1).
● Neural imaging data show that V1 is more active during mental imagery than during other forms of thought.
● Transcranial magnetic stimulation (TMS) of visual areas interferes more with imagery than with other forms of thought.
● Clinical cases of visual agnosia show that some impairments of vision have associated impairments of imagery (Bisiach, Farah).
● Recent psychophysical observations of imagery show parallels with corresponding observations of vision, and these can be related to the receptive cells in V1 (e.g., oblique effect).

Neuroscience evidence shows that the retinal pattern of activation is displayed on the surface of the cortex
There is a topographical projection of retinal activity on the visual cortex of the cat and monkey.
Tootell, R. B., Silverman, M. S., Switkes, E., & de Valois, R. L. (1982).
Deoxyglucose analysis of retinotopic organization in primate striate cortex. Science, 218, 902-904.

The bad news
II. There are problems with drawing conclusions about mental imagery from such neuroscience data
1. The capacity for imagery and for vision are independent. Notice that all imagery results are observed in the blind as well as in patients with no visual cortex.
2. Cortical topography is 2-D, but mental images are 3-D – all phenomena (e.g. rotation) occur in depth as well as in the plane.
3. Patterns in the visual cortex are in retinal coordinates whereas images are primarily in world-coordinates. Unless you make a special effort, your image of parts of the room stays fixed in the room when you move your eyes or turn your head or walk around the room.

III.
4. Accessing and manipulating information in an image is very different from accessing it from the perceived world.
   Order of access from images is highly constrained. Some have tried to explain this by postulating rapid decay of images, but the times involved in these demonstrations are not consistent with the data (e.g., times for reporting letters are comparable to those involving size or scanning).
   Conceptual rather than graphical properties are relevant to image complexity (e.g., mental rotation), so image representations seem to be conceptual.
   If images consist in patterns on visual cortex, then they behave differently from the same patterns when these are acquired from vision. For example, the important Emmert’s law applies to retinal and cortical images but not to mental images, a fact largely unnoticed.
5. The signature properties of vision (e.g., spontaneous 3D interpretation, automatic reversals, apparent motion, motion aftereffects, and many other phenomena) are absent in images;
6. A cortical display account of most imagery findings is incompatible with the cognitive penetrability of mental imagery phenomena, such as scanning and image size effects;
7. The fact that the Mind’s Eye is so much like a real eye (e.g., oblique effect, resolution fall-off) should serve to warn us that we may be studying what observers know about how the world looks to them, rather than what form their images take (unless the Mind’s Eye is exactly the same as the real eye!).
9. Many clinical cases cited by image theorists can be explained by appeal to tacit knowledge and attention.
   The ‘tunnel effect’ found in vision and imagery (Farah) is likely due to the patient knowing how things looked to her post-surgery.
   Hemispatial neglect seems to be an attention deficit, which explains the neglect in imagery reported by Bisiach. A recent study shows that image neglect does not appear if patients have their eyes closed (Bartolomeo & Chokron, 2002). This fits well with the account I have offered, in which the spatial character of mental images derives from concurrently perceived space.

2. Image size and the visual cortex
● There is evidence that when imagining “large” objects that overflow one’s phenomenal image, a different pattern of activation in visual cortex occurs than when imagining a small object. This in itself is not remarkable since all scientists accept that a difference in mental experience must be accompanied by some difference in the neural state – this is called materialism, or more technically supervenience.
● The activation pattern when imagining a large visual pattern is claimed to be similar to the activation pattern when perceiving a large visual pattern (large on the retina). In vision, objects that extend into the parafovea of the eye project onto the more frontal parts of the visual cortex.
Imagists claim that this is also true when imagining a large pattern that fills the mental screen.

2a. Image size… continued
The mere fact that larger images lead to activation in different (rather than larger) regions of the cortex does not in itself help to explain the size effect. The explanation of why larger images are associated with shorter reaction times is that, for a given display resolution, more details can be displayed when the image is larger (and a neural account for this assumption is given in terms of lateral inhibition among neurons in V1). The actual size (as well as the resolution) of the image display always enters into explanations of the size effect.

The oblique effect
● In vision, when a set of lines is to be discriminated (distinguished from a single blur), the discrimination is easier when the lines are vertical or horizontal than when they are at a 45° angle. This is called the Oblique Effect. It is a low-level psychophysical effect that occurs in early vision.
● Is there an Oblique Effect with mental images? If so it might suggest that vision and imagery are merged early in the visual pathway and so would support the cortical-picture theory.
● The point of this example (and there are many others if anyone is interested) is to show how far writers will go to confirm the intuitive but untenable cortical-picture theory of mental imagery.

Do images have low-level visual properties?
● Imagine a grating in which the bars are: 1. Horizontal 2. Vertical 3. Oblique (45°)
● Imagine the bars getting closer and closer together. In which of these displays do the bars blur together first? (This requires a lab!)
● In vision, the oblique bars blur earlier (this is called the oblique effect). Kosslyn et al. reported a similar result with mental imagery.
It is known that there are more vertical-tuned and horizontal-tuned cells than oblique-tuned cells in visual cortex. In “The case for mental imagery” Kosslyn, et al.
(2006) argued that this confirms that images are projected onto visual cortex preserving orientation and spacing.

Neurological explanations for both cases?
● The translation of the argument from the visual case to the mental (cortical) case rests on a misunderstanding of how the orientation-specific cells get their orientation property: they get it from the way they are wired to photoreceptive cells on the retina. Vertical cells are more often wired to columns of photocells while horizontal cells are more often wired to rows of photocells.
● If patterns of bars were activated on the surface of cortex by mental imagery, then horizontal cells would be no more likely to be activated by horizontal patterns than by vertical patterns. The only way that images of horizontal bars would preferentially activate horizontal cells is if the images were on the retina!
● Many similar findings alleged to show that images are displayed on the surface of visual cortex suffer from the same intentional fallacy (e.g. large images are read quicker than small images).

What happens when horizontal/vertical cells are activated by means other than retinal patterns?
[Figure: sample of activated cells: 9 vertical, 9 horizontal, 5 oblique]
The proportion of Vertical, Horizontal & Oblique cells remains the same in all cases – they are random samples!

An overarching consideration: What if colored three-dimensional neural regions were found in visual cortex? What would that tell you about the role of mental images in thought? What would it tell you about where the content of a conscious experience came from? Would this require a homunculus?

Should we welcome back the homunculus?
● In the limit, if the visual cortex mapped the contents of one’s conscious experience, preserving spatial properties as many people believe images do, we would need an interpreter to “see” this display in visual cortex.
● But we will never have to face this prospect because experiments show clearly that the contents of mental images (including those in iconic memory, which lasts for a fraction of a second) are already conceptual (or, as Kosslyn puts it, are ‘predigested’) and therefore unlike any picture.
● Finally, you can make your image do whatever you want, and have whatever properties you wish. There are no known constraints on mental images that cannot be attributed to lack of knowledge of the imagined situation (e.g., imagining a 4-dimensional object).

Are there any ways of representing spatial layouts that are not excluded by this analysis?
● Maybe we have been looking in the wrong place for things that fall under the formal requirements of being spatial. Maybe they are not in the head after all.
● I have sketched a way of looking at this problem that locates the spatial character of thought in the concurrently-perceived world (see Chapter 5 of my 2007 book, Things and Places). I will end with just a hint of this approach. It relies on findings from the study of the interaction among perceptual modalities and imagery, as well as with motor actions, and also on neuroscience findings concerned with coordinate transformation mechanisms in the brain.

Another chapter in the imagery debate: The interaction of images with vision and motor control
● One of the more interesting lines of research on the spatial character of mental images involves studies of the interaction of images with perceived spatial layouts and with the motor system.
● From the beginning it has been clear to me that one of the properties of mental images that makes them appear spatial is that they connect in certain ways not only with vision, but also with the motor system: We can point to things in our image!
● We can “project” our images onto perceived space – even space perceived in different modalities. I believe that this observation is the key to understanding the alleged spatial character of images.
● This does not require that a picture be projected, only the locations of a small number of features. As it happens, in my day job I have studied a mechanism I call a FINST that is well suited for this task.

Projecting images: Shepard & Podgorny experiment
Both when the displays are seen and when the F is imagined, RT to detect whether the dot was on the F is fastest when the dot is at the vertex of the F, then when on an arm of the F, then when far away from the F – and slowest when one square off the F.

Both vision and visual imagery have some connection to the motor system
● Imagery clearly has some connection to motor control. You can point to things in your image. This may be why images feel spatial. (Important idea!)
● You can get Stimulus-Response compatibility effects between the location of a stimulus in an image and the location of the response button in space.
● Ronald Finke showed that you could get adaptation with imagined hand position that was similar to adaptation to displacing prism goggles.
● Both these findings provide support for the view that the spatial character of images comes from something being projected onto a concurrently perceived scene. (This is the main new idea in Chapter 5 of Things & Places.)

Where do we stand?
● It seems that a literal picture-in-the-brain theory is untenable for many reasons – including the major empirical differences between mental images and cortical images. A serious problem with any format-based explanation of mental imagery is the cognitive penetrability of many of the imagery demonstrations.
● The pictorial quality of images arises from the similarity of the experience of imagining and of seeing, not the similarity of pictures and cortical activity! So how do we explain the similarity of the experience of imagining and of seeing – the fact that they both seem to involve a pictorial panoramic display? It is very likely that neither experience correctly reveals the form of the representation.
This is what our conscious experience suggests goes on in vision…
This is what the demands of explanation suggest must be going on in vision…

For a copy of these slides see: http://ruccs.rutgers.edu/faculty/pylyshyn/Objects&Places2009

You are now here X
But you are also here
END