Master Recherche en Sciences Cognitives
Université Paris Descartes - EHESS - ENS

Exploring the bouba/kiki effect: a behavioral and fMRI study

Nathan Peiffer-Smadja
Master 2 Thesis 2009-2010
Under the supervision of: Pr. Laurent Cohen
"Neuropsychology and Neuroimaging"
CRICM / CNRS UMR7225 / INSERM UMR_S975 / UPMC

ABSTRACT

Sound symbolism theory asserts that vocal sounds have a meaning independently of the words in which they occur. A striking example of this theory is the bouba/kiki effect, which shows that nonsense words are non-randomly mapped onto unknown round or spiky shapes. While many studies have focused on the psychological reality of this phenomenon, only a few have tried to explain it. After showing that both the vowels and the consonants composing the pseudowords modulated the association with round or spiky shapes, we tried to assess the automaticity of this effect in speeded classification tasks. For the first time, we found behavioral evidence that this effect could be due to cross-modal integration. We also report brain imaging results obtained during passive presentation of shapes, pseudowords, or matching and mismatching pairs of shapes and pseudowords. We observed an increased activation for mismatching pairs of stimuli compared with matching stimuli in frontal cortices, which could reflect an incongruency effect. In addition, we found a difference of activation between matching and mismatching pairs, correlated with a behavioral index of sensitivity to the effect, in lateral occipital cortex. We discuss these results in the light of hypotheses concerning the bouba/kiki effect.

CONTENTS

I. Introduction
  1. History of sound symbolism
  2. The bouba/kiki effect
  3. Hypotheses concerning the bouba/kiki effect
  4. Goals of our study
II. Behavioral experiments
  1. Material and methods
  2. Explicit choice experiment
  3. Implicit Association Test
  4. Speeded classification experiment
  5. Speeded double classification experiment
III. fMRI experiment
  1. Objectives and hypotheses
  2. Material and methods
  3. Behavioral Results
  4. Imaging Results
  5. Imaging Discussion
IV. Conclusion
V. Bibliography
VI. Acknowledgements
VII. Annexes

I. Introduction

1. History of sound symbolism

"The link existing between the signified and the signifier is arbitrary, or more simply (…) the linguistic sign is arbitrary. Thus the idea of 'sister' is not linked by any interior relationship with the series of sounds 's-i-s-t-e-r' that corresponds to its signifier; it could quite as well be represented by any other signifier, which is proved by the differences between languages and the very existence of different languages."

When Saussure asserted in his "Course in General Linguistics" that the link between signified and signifier was arbitrary, he intended to put an end to an age-long debate over the conventional or the natural character of the linguistic sign. In his "Cratylus", Plato presented a dialogue between Hermogenes, who held that "if one substituted one name for another, the latter would be as correct as the former", and Cratylus, who asserted that "there was for each object a name that was its own and that belonged to it intrinsically, or by its nature". Socrates concluded that, while in general the link between a thing and its name was arbitrary, there were nevertheless some noble words whose sound reflected their meaning.
Since then, some psychologists have called into question, from an empirical point of view, the fully arbitrary nature of the link between signifier and signified, exploring the possibility that certain sounds contain within themselves a certain meaning. The idea of a direct link between sound and meaning, known under the name of sound symbolism or phonetic symbolism, was upheld in the 19th century by, among others, Wilhelm von Humboldt. The definition of sound symbolism is not exactly the same among different authors, and the term may apply to many different phenomena. Hence, before reviewing the experimental data existing on the subject, we will give a brief outline of the phenomena referred to as sound symbolism.

Traditionally, phonetic symbolism encompasses all the cases in which sounds produced by the human vocal apparatus express a meaning without using the system of language. The research dealing with phonetic symbolism takes on many forms according to the sounds studied, the experimental method applied or the meanings being investigated. First of all, we can define sound symbolism according to the kind of sound it deals with. Indeed, numerous studies refer to sound symbolism on very different levels, ranging from paragraphs to pure tones, with such intermediary stages as words or phonemes. Poetry, for example, uses many techniques whereby the sound is essential to the meaning: in his famous imitative alliteration, Racine wrote "Pour qui sont ces serpents qui sifflent sur vos têtes?", where the idea of snakes rests on the repetition of the sound "s". Such an example would be considered as being on the fringes of sound symbolism. Onomatopoeia is an example often cited as evidence of the link between sound and sense (Nichols & Ohala, 1994). Saussure discussed this idea and considered that the differences between onomatopoeia from one language to another confirmed the arbitrariness of linguistic signs. However, within a given language, it seems clear that the origin of onomatopoeia lies in the imitation of a sound produced by animals, thus creating a kind of sound symbolism. Beyond onomatopoeia, all languages use mimetic words, whose phonetic structure is analogous to the meaning the words convey: "zigzag" in French, "pop" or "bang" in English. Some Asian languages even have categories of words which are recognized as "sound symbolic". Japanese, for example, distinguishes between onomatopoeia (phonomimes), phenomimes and psychomimes (Nichols & Ohala, 1994). Phenomimes and psychomimes are words describing exterior phenomena or psychological states whose phonetic structure reflects their meaning. It remains that these words follow conventional rules that enable their users to link sound and sense.

Fig I.1: Examples of sound-symbolic relations in Japanese phenomimes.

Many studies dealing with phonetic symbolism have focused on mere phonemes. The use of phonemes has many advantages; in particular, it enables us to avoid confusion between what results from a real phenomenon of sound symbolism and what results from a conventional link between an existing word and its meaning known beforehand (e.g. the word "ball" would be considered round independently of sound-symbolic associations).
Indeed, actual words automatically refer to the meaning that is ascribed to them, and it is difficult, in those conditions, to single out a posteriori the elements that seem to belong to sound symbolism, whereas, if we use mere, apparently meaningless, sounds, this risk is more easily eliminated. Numerous experimental methods have been used in order to prove the existence of a privileged link between sound and sense. One of the most often used is cross-language generalization. Such studies have been replicated with subjects from different countries and with different native languages. Subjects are presented with pairs of antonyms belonging to languages unknown to them. They are asked to guess which of the two words refers to a given meaning, e.g. which word means "dark" and which one means "light", and manage to do so at rates exceeding chance (Koriat & Levy, 2004; Brown, Black, & Horowitz, 1955; Taylor & Taylor, 1962; Huang, Pratoomraj, & Johnson, 1969). It has been shown in two recent studies (Imai & Okada, 2008; Nygaard, Cook, & Namy, 2008) that such cross-language generalization could be used in word learning: English-speaking adults and children were better at learning Japanese words that matched their Japanese meaning (e.g. "hayai" (fast) learned as "fast") compared to Japanese words that matched their Japanese antonym's meaning (e.g. "hayai" (fast) learned as "slow"). Another appealing piece of evidence for sound symbolism was recently provided by a team who showed that English nouns and verbs possess systematic differences in phonological properties that influence on-line processing of sentences (e.g. noun-like nouns (based on phonology) were processed faster than verb-like nouns) (Farmer, Christiansen, & Monaghan, 2006). Although these observations support the existence of a natural link between sound and sense, they do not provide an explanation of this link.

2. The bouba/kiki effect

Köhler devised another method to study phonetic symbolism: on the island of Tenerife, he showed subjects two novel shapes, one round and one spiky, and gave them the two nonsense words "baluma" (later renamed "maluma") and "takete" to label them. He observed that a vast majority of English-speaking adults chose maluma for the round shape and takete for the spiky shape (Köhler, 1947). This experiment illustrates a form of sound symbolism in so far as it brings to light a reproducible link between linguistically meaningless sounds and geometrical shapes. Indeed, other researchers have used these words and shapes and have found similar results (Holland & Wertheimer, 1964; Wertheimer, 1958), sometimes with subjects speaking totally unrelated languages (Davis, 1961). More recently, Ramachandran and Hubbard (Ramachandran & Hubbard, 2001) replicated this result and found that 95% of English-speaking adults associated the round shape with the pseudoword "bouba" and the spiky shape with "kiki". Since this study, the phenomenon has often been called the "bouba/kiki effect". This privileged mapping seems reliable, as researchers found that 2.5-year-old English-speaking children and adults significantly associated pseudowords containing rounded vowels with round shapes and unrounded vowels with spiky shapes (Maurer, Pathman, & Mondloch, 2006).

Figure I.2: Original drawings used in Köhler's experiment.

3. Hypotheses concerning the bouba/kiki effect

Köhler originally used the nonsense words "takete" and "baluma" in a written form.
Thus, one of the early hypotheses was that people associated shapes with pseudowords because of the geometrical resemblance between the letters and the shape. This explanation has been progressively dismissed, as the effect has been found in many languages (Davis, 1961) and with spoken words (Maurer, Pathman, & Mondloch, 2006). Moreover, the existence of a bouba/kiki effect in pre-literate 2.5-year-old children cannot be explained by similarities between letters and shapes. Some researchers have tried to put forward other hypotheses to account for the bouba/kiki effect. Ramachandran and Hubbard proposed that we associate spiky shapes with "kiki" and round shapes with "bouba" because they mimic the articulatory movements we make when pronouncing these words. We call this hypothesis the "articulatory hypothesis". Such an explanation can be related to motor theories of speech perception. Speech perception and production are often separated in schematic neurological theories of language, taking place respectively in the left superior temporal and inferior frontal lobes (the so-called Wernicke's and Broca's areas) (Wernicke, 1874; Damasio & Geschwind, 1984; Gernsbacher & Kaschak, 2003). However, many neurobiological and psycholinguistic theories of speech consider that speech perception and production could be strongly linked. For example, the motor theory of speech perception postulates that people perceive speech sounds by identifying the articulatory movements necessary to produce them (Liberman & Mattingly, 1985). The direct realist theory of speech perception also proposes a direct link between speech perception and production (Fowler, 1986). Imaging experiments have brought evidence for these theories, with activation of the left inferior frontal cortex or of motor and premotor cortex while passively listening to words (Wilson, Saygin, Sereno, & Iacoboni, 2004; Hauk, Johnsrude, & Pulvermüller, 2004). One crucial point of these theories has been addressed recently: the specificity of activation for heard speech components in premotor cortex. It has been shown that passive listening to the phonemes [t] and [p] activated different precentral clusters, corresponding to the ones activated when pronouncing them (tongue motor area for [t] and lip motor area for [p]) (Pulvermüller, Huss, Kherif, Moscoso del Prado Martin, Hauk, & Shtyrov, 2006). As the authors of this study note, "this is evidence that information about specific articulatory and motor features of speech sounds is accessed in speech perception". More recently, researchers reported behavioral evidence that specific speech production commands are automatically and involuntarily accessed during speech perception (Yuen, Davis, Brysbaert, & Rastle, 2010). They recorded electropalatography measures of subjects pronouncing syllables beginning with [k] or [s] while listening either to the same syllables or to syllables beginning with [t], and found that the perceived syllable could modify [k] or [s] production. The subjects' electropalatography showed traces of [t] production added to the [k] or [s] production, meaning that this modification was specific to the incongruent syllable used. Thus, it seems highly probable that when hearing "kiki" or "bouba", specific articulatory information about these words is accessed. It remains to be shown how this articulatory information could relate to visual information and lead to links with particular shapes.
The beginnings of an explanation can be found in work showing links between mouth articulation gestures and hand movements (Rizzolatti & Arbib, 1998). In a set of studies, Gentilucci et al. (Gentilucci, Stefanini, Roy, & Santunione, 2004; Gentilucci, Campione, Dalla Volta, & Bernardis, 2009) found that when subjects observe or execute a grasping action on objects of different sizes, the size of the target object influences the articulation of syllables pronounced simultaneously. When observing the grasp of a big solid object (power grasp), people pronounced a "bigger" /da/ (as assessed by the size of the first formant) than when observing the grasp of a small object (precision grasp). They conclude that seeing a grasping movement could specifically activate hand articulatory areas, which in turn may affect speech. If shapes can be processed as graspable objects, we can imagine that specific hand articulatory information about visually presented shapes could enter into conflict with specific mouth articulatory information extracted from pseudowords. This hypothesis may seem far-fetched, but it predicts that the association between "bouba" and round shapes could take place in motor or premotor cortex.

Another theory would be that shapes and sounds are linked because of physical similarities. Indeed, the physical properties of large vs small or round vs spiky objects may make them more prone to produce sounds close to "bouba" or "kiki", based on the fact that, in the real world, low-frequency sounds tend to come from larger and possibly softer objects, and vice versa. Some studies found, for example, links between the relative size of visual stimuli and the relative pitch of sounds, and showed that modulating congruency along these two dimensions could influence multisensory integration (Spence, 2007). We think that the bouba/kiki effect could be an example of multisensory integration based on physical similarity. Spence et al. attribute their results to "synaesthetic congruency", but we think that this term is misleading, as it conflates synaesthesia, in which the correspondences between senses are often unexplained by physical similarity, with their results, which have naïve physical explanations (e.g. big objects produce lower frequencies). Whereas several factors have been identified as having a major impact on multisensory integration (such as spatial or temporal correspondence), the role of "semantic" or "higher-order" congruence is vaguer and has mostly been studied with complex visual and auditory stimuli whose association has been clearly established (Laurienti, Kraft, Maldjian, Burdette, & Wallace, 2004; or see Doehrmann & Naumer, 2008 for a review). In many cases, these associations can be considered arbitrary (Van Atteveldt, Formisano, Goebel, & Blomert, 2004), and studies using low-level visual and auditory features as a "semantic" link are sparse. What Spence calls "synaesthetic congruency" and what Doehrmann calls a weak degree of "semantic" congruence are the same thing, and could represent a new kind of factor playing a role in multisensory integration, explained by physical similarity. Our hypothesis is that the bouba/kiki effect could be the result of a match between low-level features of both stimuli, but we do not know whether this match would take place directly between unisensory areas or whether it is mediated by higher-order regions.
It is possible that low-level physical features of auditory and visual stimuli are both linked to the same higher-order representation, thus creating a (in)congruency effect. As a recent study showed, even a low-level feature such as pitch can be reproducibly mapped along a variety of dimensions such as "light-dark", "sparse-dense" or even "granddaughter-grandma" (Eitan & Timmers, 2010). On the other hand, many recent studies have shown direct links between early sensory regions, and the field of multisensory integration has begun to pay more attention to effects that could take place in these areas (Ghazanfar & Schroeder, 2006; Lemus, Hernández, Luna, Zainos, & Romo, 2010). The latter two theories are not mutually exclusive; it could be that they both participate in this word-shape mapping, and we can even imagine that articulatory information about the pronounced word could reinforce a perceptual link between sound and shape, because of physical features it shares with both stimuli. Indeed, low-level features of shapes can be matched either to low-level features of sounds or to low-level features of articulatory movements, as there is a clear physical link between sounds and the movements used to pronounce them.

4. Goals of our study

We chose to focus on the bouba/kiki effect for several reasons. Firstly, it is intuitively powerful for a great majority of people, while there is no obvious way to account for this link between pseudowords and shapes. Furthermore, this phenomenon deals only with simple abstract shapes and meaningless pseudowords, which are not burdened, at least in part, with the problems caused by actual words. Using real words to analyze this effect seems fraught with difficulty, owing to the fact that they automatically refer to their meaning. Concomitantly, the use of abstract shapes makes it possible to focus on meaning while manipulating simple concepts, like those of shape or size. Last but not least, we think that explaining the bouba/kiki effect could open the way for explanations of a variety of phenomena attributed to sound symbolism. Indeed, we think that some of the observations made by early researchers in the field could be explained either by links between sound properties and other sensory modalities (mainly visual) or by links between speech perception, speech production and higher-order semantic processing. Understanding why nonsense words can be non-arbitrarily mapped to unknown visual objects could bring new arguments to a still active debate over the arbitrariness of the sign and the origins of language. Thus, the main goal of our study is to try to explain why people preferentially associate some nonsense words with some shapes, keeping in mind the major theories that could explain this effect. We consider the bouba/kiki effect as a form of crossmodal integration, driven either by higher-order representations common to both stimuli, by articulatory links between speech production and hand movements, or by shared low-level auditory and visual physical features. Consequently, we think that one of the most promising approaches to disentangle these explanations is brain imaging. Indeed, the three hypotheses do not predict the same pattern of brain activation during integration between nonsense words and shapes.
Whereas it has been shown that semantic high-level crossmodal incongruency effects take place mainly in frontal cortex, we hypothesize that the articulatory hypothesis would predict activations in hand or mouth motor cortex, and that direct linking between low-level features would predict effects in early sensory areas.

First, we ran an explicit choice experiment in which subjects could freely associate shapes and pseudowords. This experiment allowed us to explore the mapping between pseudowords and shapes and to select stimuli that could be used in further experiments. Then, we tried to determine how those sound-shape correspondences could best be captured empirically, using a variety of explicit and implicit behavioral paradigms. Finally, we moved to fMRI and tried to design experiments that could study the audio-visual integration of our stimuli.

II. Behavioral experiments

1. Material and methods

a. Shapes material

We designed shapes using the curve and freeform tools in Microsoft PowerPoint. Two sets of shapes were created, one round, with curved boundaries, and one spiky, with sharp angles. For each shape we created a big and a small version. All the shapes were filled with the same light gray color on a black background. Within each size group (big and small) we adjusted the sizes with Matlab so that every shape had exactly the same luminance. Small shapes all fit in a rectangle of 80 x 60 pixels and big shapes in a rectangle of 250 x 190 pixels.

Fig II.1: Examples of spiky and round shapes with their two sizes.

b. Pseudowords material

Auditory stimuli were recorded in mono at 16 bits and a 44.1 kHz sampling rate, using a Shure SM58 microphone and a Marantz PMD670 recorder, in a soundproof room. The auditory stimuli were composed of more than eighty pseudowords, all with a CVCV (consonant-vowel-consonant-vowel) phonological structure. These pseudowords can be divided into 6 groups depending on their type of vowels and consonants. Mean word duration differed depending on the component consonants and vowels. The choice of pseudowords used in each experiment will be described later.

c. Stimuli presentation

Participants were seated in a room approximately 50 cm away from a screen displaying the centrally presented shape. Auditory stimuli were presented through Sennheiser HD-448 headphones. Shapes were displayed in light gray on a black background on a flat screen at a resolution of 640 x 480; they occupied approximately 17.2 x 11.5 degrees of the visual field for big shapes and 10.3 x 6.9 degrees for small shapes. The experiments were designed using E-Prime 2.0 and the participants answered with an E-Prime SR Box placed in front of the screen. The room remained silent during the experiment and the experimenter stayed out of the participants' sight during the task.

d. Subjects

Sixty-four right-handed native French speakers with normal hearing and normal or corrected-to-normal vision took part in the different experiments. Their handedness was assessed by the Edinburgh Handedness Inventory (mean score: +69, standard deviation: 25). The mean age of the participants was 27 years (range 19-49 years); thirty-nine were women and twenty-five men. They all signed an informed consent form and received 10 Euros for their participation.

2. Explicit choice experiment

a. Aim

The aim of this experiment was to confirm that some pseudowords are more likely to be linked to particular shapes and sizes. This has been shown in some studies (Sapir, 1929; Köhler, 1947), but often anecdotally.
We also wanted to study the respective roles of vowels and consonants in this preferential linkage, both probably playing an important role in phonetic symbolism (Klink, 2000). Moreover, we needed this experiment as a reference to select pseudowords whose association with spiky or round shapes, or with big or small shapes, had been confirmed. Indeed, we think it essential for our other experiments to choose pseudowords that we tested, rather than relying on intuition, one of the flaws of previous studies.

b. Procedure

In this experiment, subjects heard one pseudoword, then saw two shapes, and had to choose which shape best fit the word they had just heard. They pushed the right button to choose the right shape and the left button to choose the left shape. They were explicitly told that there were no rules or "good answers" but that they had to answer intuitively. They had no time limit to answer.

Fig II.2.1: Procedure used in the explicit choice experiment. Subjects heard a pseudoword, saw two shapes, and had unlimited time to choose which shape they associated with this pseudoword.

Four groups of shapes could be presented: big round, small round, big spiky, small spiky, and the subjects never had to choose between two shapes of the same group (e.g. two small spiky shapes). The pseudowords they heard could also be separated into four groups defined by their vowels and consonants. In order to maximize a potential effect, we chose vowels opposed along the front/back axis and in their roundedness, two dimensions that could play a role in shape and size choice. The front/back axis depends on the position of the tongue in the mouth: back vowels are pronounced with the tongue at the back of the mouth and front vowels with the tongue at the front. Vowels can be rounded or unrounded, depending on whether or not the lips form a circle to pronounce them. We chose two of the most unrounded front vowels, [i] and [e], and two of the most rounded back vowels, [u] and [o]. Our hypothesis, based on the literature, was that pseudowords containing [u] and [o] vowels would be "bigger" (Sapir, 1929) and "rounder" (Maurer, Pathman, & Mondloch, 2006; French, 1977) than pseudowords containing [i] and [e] vowels, which would be "smaller" and "more spiky". To study the role of consonants, we chose to oppose voiced continuant consonants to voiceless stop consonants, thinking that these two groups were the most adequate to bias the choice of size and shape. The three unvoiced stop consonants were [p], [k] and [t], and the three voiced continuants were [l], [m] and [ʒ]. We hypothesized that [l], [m] and [ʒ] are "rounder" and "bigger" than [p], [k] and [t]. We chose two vowels and three consonants of each kind to minimize the chance of finding an idiosyncratic "single sound" effect, which could be due to factors other than the phonological properties of the sound, such as letter form or semantic bias. Crossing these two factors, we had four types of pseudowords: voiced continuants with back rounded vowels (e.g. "loju"), voiced continuants with front unrounded vowels (e.g. "leji"), unvoiced stop consonants with back rounded vowels (e.g. "kopu") and unvoiced stop consonants with front unrounded vowels (e.g. "kipe"). We hypothesized that the "loju" group would be associated with big and round shapes and the "kipe" group with small and spiky shapes, but we did not know with what kind of shapes the two intermediate groups, the "leji" and "kopu" groups, would be associated.

Fig II.2.2: Vowels used in the experiment. Two classes of vowels are separated: unrounded front vowels, [i] and [e], and rounded back vowels, [u] and [o].
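To make the resulting two-by-two design concrete, the following sketch enumerates candidate CVCV items for the four classes. This is a minimal illustration using simplified orthographic transcriptions (with [ʒ] written "j", as in "loju"); the actual stimuli were recorded and selected by hand:

```python
# Illustrative only: enumerate CVCV candidates for the four pseudoword classes.
from itertools import product

continuants = ["l", "m", "j"]     # voiced continuants ([ʒ] transcribed "j")
stops = ["p", "k", "t"]           # unvoiced stop consonants
back_rounded = ["o", "u"]         # back rounded vowels
front_unrounded = ["i", "e"]      # front unrounded vowels

def cvcv(consonants, vowels):
    """All CVCV combinations built from the given consonant and vowel sets."""
    return [c1 + v1 + c2 + v2
            for c1, c2 in product(consonants, repeat=2)
            for v1, v2 in product(vowels, repeat=2)]

classes = {
    '"loju"-like': cvcv(continuants, back_rounded),
    '"leji"-like': cvcv(continuants, front_unrounded),
    '"kopu"-like': cvcv(stops, back_rounded),
    '"kipe"-like': cvcv(stops, front_unrounded),
}
for name, items in classes.items():
    print(name, len(items), items[:4])  # 36 candidates per class
```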
16 subjects participated in the experiment; two subjects were excluded because they did not perform the task correctly: one thought he had to choose rounded shapes for [o] and [u] sounds, and one tried to choose according to intonation. To analyze the results, we entered the mean percentage of round-shape over spiky-shape choices (and of big-size over small-size choices) for each class of pseudowords into two-way ANOVAs with two intra-subject factors (vowel type and consonant type) and one random factor (subjects). The vowel factor had two levels, front and back, and the consonant factor two levels, stop and continuant. In the shape choice analysis we only took into account trials where the sizes of the two items were the same. In the size choice analysis we only took into account trials where the shapes of the two items were the same.

c. Results

Fig II.2.3: Mean percentage of round shape over spiky shape choice for the four classes of words we used. Error bars are standard error of the mean (SEM).

Fig II.2.4: Mean percentage of round shape over spiky shape choice for the twenty-four words we used. The words are sorted by percentage of choice. Colors correspond to the four classes of words described in the legend.

We observed a main effect of vowels (F(1,13)=15.07, p<0.002), with back rounded vowels being associated with round shapes and front unrounded vowels with spiky shapes. We also observed a main effect of consonants (F(1,13)=15.77, p<0.002), voiced continuants being associated with round shapes and unvoiced stop consonants with spiky shapes. We did not find a significant interaction between vowels and consonants. These results confirmed our hypothesis, front unrounded vowels and stop consonants being linked to spiky shapes, and back rounded vowels and continuant consonants to round shapes. Crossing these two factors showed us that pseudowords composed of continuant consonants and front unrounded vowels were considered rounder than pseudowords composed of stop consonants and back rounded vowels. Thus, in this experiment, the effect of consonant type was larger than the effect of vowel type on the choice of a round vs a spiky shape. In figure II.2.4 we plotted the percentage of round-shape over spiky-shape choices for every pseudoword we used. Pseudowords ranked perfectly according to their class, showing the robustness of this effect across items.

Fig II.2.5: Mean percentage of big shape over small shape choice for the four classes of words used in the first version of the experiment. Error bars are SEM.
We observed a main effect of vowels (F(1,13)=10.35, p<0.01), with back rounded vowels being associated with big shapes and front unrounded vowels with small shapes. We did not find an effect of consonants on size choice (F(1,13)=0.30, p=0.6), nor any interaction between vowels and consonants.

d. Variability across subjects

Fig II.2.6: Percentage of choice of round shapes over spiky shapes for the four classes of words used in the first version of the experiment. Each line represents a different subject. Subjects are divided into three groups depending on their sensitivity to the effect; subjects mainly affected by vowels are in red, by consonants in blue, and globally not affected in black.

One of the main observations we made during these experiments was that there was substantial inter-subject variability. To illustrate this point, we plotted the percentage of shape choices of the 14 subjects taking part in the first version of the experiment for the four classes of words. On this plot we can identify different response profiles, some people being more sensitive to the vowel effect (red lines), others to the consonant effect (blue lines), and others not really sensitive to any effect (black lines).

e. Discussion / Summary

The role of vowels, both in shape and size choice, could be due either to backness or to roundedness, as we confounded these factors. To further study the role of vowels, it would be necessary to separate these two factors by recording other pseudowords containing unrounded back vowels and rounded front vowels. It would also be interesting to compare the factors playing a role in shape and in size choice. Although consonants played a role in shape choice, either through voicing or manner of articulation, they did not influence size choice. The fact that consonants do not modify size choice seems reliable, as we replicated it in another experiment with other words and 21 new subjects (data not shown). These results could be explained in the framework of the "articulatory" hypothesis, in which words are associated with shapes and sizes depending on the articulatory movements or the shape the mouth takes to pronounce them. Indeed, the vowels [u] and [o] are pronounced with the mouth forming a circle, thus opening the mouth more, whereas to make the sound [i] we have taut lips and the mouth almost closed. These two factors, the form of the mouth and its openness, could respectively influence shape and size choice. For consonants, the airflow and articulatory movement are totally different between stops and continuants. While the continuant movement is rather soft and continuous, evoking the soft contour of round shapes, stop consonants are produced by stopping the airflow and then releasing a burst of air, evoking the sharp angles of spiky shapes. On the other hand, both continuant and stop consonants can be pronounced without modifying mouth openness, perhaps explaining the lack of effect on size choice. Our results confirmed and extended previous work on associations between pseudowords and size or shape. To our knowledge, this is the first study separating the roles of vowels and consonants in the bouba/kiki effect. Last but not least, this experiment allowed us to select pseudowords that we could use in other paradigms, knowing their degree of association with round or spiky shapes.
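As an illustration, the two-way repeated-measures ANOVAs used in this section can be sketched as follows. This is a minimal example assuming a long-format table with hypothetical column names, not the actual analysis script:

```python
# Sketch of a vowel x consonant repeated-measures ANOVA with subjects
# as the random factor. Data and column names are hypothetical.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# One row per subject x condition; the dependent variable is the mean
# percentage of round-shape (over spiky-shape) choices.
df = pd.DataFrame({
    "subject":   [1, 1, 1, 1, 2, 2, 2, 2],
    "vowel":     ["front", "front", "back", "back"] * 2,
    "consonant": ["stop", "continuant"] * 4,
    "round_pct": [20, 45, 40, 80, 25, 50, 55, 90],
})

anova = AnovaRM(df, depvar="round_pct", subject="subject",
                within=["vowel", "consonant"]).fit()
print(anova)  # F and p for the two main effects and their interaction
```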
3. Implicit Association Test

a. Description

We switched to another kind of experiment to look for a bouba/kiki effect without requiring subjects to make any explicit decision about sound-shape associations. We used the Implicit Association Test (IAT) (Greenwald, McGhee, & Schwartz, 1998), a method widely used in social psychology to observe effects that are not found by explicit questioning, for example to assess racial prejudices which subjects would not overtly confess (Phelps, et al., 2000). The IAT provides a measure of the associations between two pairs of contrasted concepts (e.g. flower vs insect names and pleasant vs unpleasant words). Subjects have to categorize each stimulus into one of the four categories defined by the two pairs of concepts. In our experiment the two pairs of concepts were round shape vs spiky shape and "o" pseudoword vs "i" pseudoword. Subjects had to categorize words or shapes according to these categories. The critical trick of the IAT is that responses will be faster and more accurate when concepts that are strongly associated share the same behavioral response, e.g. if responses to insects and to unpleasant words are produced with the same hand. Thus, this experiment allowed us to test the association between sounds and shapes without requiring any explicit choice. We chose to focus on the shape dimension because it is the one showing the most powerful effect in the explicit choice task.

b. Procedure

We followed the methodological advice of Greenwald and chose few items per category, picking those most representative of the concept (Nosek, Greenwald, & Banaji, 2003). We used 4 pseudowords from two classes used previously: voiced continuants with two [o] vowels (e.g. "Lojo") and unvoiced stops with two [i] vowels (e.g. "Kipi"). "Lojo"-like pseudowords were strongly associated with round shapes in the previous explicit choice experiment, whereas "Kipi"-like pseudowords were associated with spiky shapes. Eight big round and eight big spiky shapes were also chosen from the previous experiment. Subjects used their right or left hand to categorize the stimuli and had unlimited time to answer. Following the "standard" IAT methodology (Nosek, Greenwald, & Banaji, 2003), the test was divided into 5 blocks:

- Block 1: Shape classification: subjects had to classify shapes as round or spiky (25 trials).
- Block 2: Word classification: subjects had to classify pseudowords as belonging to the "o" or to the "i" category (25 trials).
- Block 3: Mixed classification: subjects either saw a shape or heard a pseudoword, and had to classify them following the same instructions as in Blocks 1 and 2 (65 trials). Such mixed blocks could be either "congruent" or "incongruent", depending on whether associated sounds and shapes were to be answered with the same hand. In congruent blocks, one hand was used for round shapes and "o" words and the other hand for spiky shapes and "i" words. In incongruent blocks, one hand was used for round shapes and "i" words and the other for spiky shapes and "o" words.
- Block 4: Inverted shape classification: subjects again had to classify only shapes, but with inverted instructions relative to Blocks 1 and 3, e.g. if round shapes were previously associated with a left-hand response, they were now associated with the right hand (45 trials).
- Block 5: Mixed classification 2: same as Block 3, except that subjects had to classify shapes using the new instructions for which they were trained in Block 4 (65 trials).

This procedure allowed us to collect data from every subject in both congruent and incongruent conditions. To avoid confounding the congruent/incongruent distinction with serial order, even within subject, the 5 blocks were run twice in each subject. If in the first run of 5 blocks Block 3 was congruent and Block 5 incongruent, then in the second run Block 3 was incongruent and Block 5 congruent, and conversely. Half of the subjects received the CIIC sequence and the other half the ICCI sequence. Response hands were also counterbalanced across subjects. Instructions appeared at the top of the screen before the beginning of each block and remained visible during the first 5 trials. Shape instructions appeared as two small shapes, one in each top corner; pseudoword instructions appeared as an "I" and an "O", one in each top corner. When the subjects made an error, they saw the word "error" and the instructions were displayed again for 1 second. 24 subjects participated in this experiment. Following the usual IAT procedure, we only analyzed the "mixed classification" blocks, the other blocks being used only to familiarize subjects with the instructions. We entered median reaction times for correct responses into a two-way ANOVA with two intra-subject factors (congruency and modality (shapes or pseudowords)) and one random factor (subjects).

c. Results

Fig II.3.1: A. Median reaction times of correct trials for shape and pseudoword trials in mixed classification congruent and incongruent blocks. Error bars are SEM. B. Same figure with mean error rates.

We found a main effect of congruency, both on reaction times (F(1,23)=12.02, p=0.002) and on error rates (F(1,23)=10.782, p=0.003). We also found a main effect of modality on reaction times (F(1,23)=278.2, p<10⁻¹³): subjects were quicker to categorize shapes than pseudowords. The main effect of congruency shows that it is harder to categorize shapes and pseudowords when hands are associated with shapes and pseudowords which do not match according to phonetic symbolism.

d. Variability across subjects

Fig II.8: Difference between incongruent- and congruent-block median reaction times, by subject.

We looked at the difference in median reaction times between incongruent blocks and congruent blocks, which represents the strength of the association between round shapes and "lojo" pseudowords and between spiky shapes and "kipi" pseudowords. As in the previous (explicit) task, the strength of the association differed substantially between subjects. Three subjects even had a negative effect, which could mean that they are sensitive to the opposite associations.
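The per-subject effect size plotted in Fig II.8 is straightforward to compute. Here is a minimal sketch, assuming a hypothetical trial list restricted to the mixed-classification blocks:

```python
# Sketch: per-subject IAT effect size, i.e. the difference between
# incongruent and congruent median reaction times on correct trials.
import statistics

def iat_effect(trials):
    """trials: dicts with keys 'block' ('congruent'/'incongruent'),
    'correct' (bool) and 'rt' (reaction time in ms)."""
    med = {block: statistics.median(t["rt"] for t in trials
                                    if t["block"] == block and t["correct"])
           for block in ("congruent", "incongruent")}
    # Positive values indicate sensitivity to the expected associations
    # (round/"lojo" and spiky/"kipi"); negative values, the opposite ones.
    return med["incongruent"] - med["congruent"]
```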
e. Discussion

The IAT is a tool used in social cognition studies to assess the strength of association between concepts belonging to two different categories (here, shapes and words). Our results show that the associations we study exist for the majority of our subjects, but the question concerning the implicit nature of these associations remains. First, the word "implicit" may not be appropriate here, as most of our subjects explicitly associate "lojo"-like pseudowords with round shapes and "kipi"-like pseudowords with spiky shapes. Consequently, we prefer the word "automatic". The IAT seems to show that the links we study are automatic and pervasive, but it does not give us information about a possible crossmodal integration between shapes and pseudowords. Indeed, subjects alternately saw a shape or heard a pseudoword and had to categorize it. Such an experimental design does not allow them to directly integrate shapes and sounds (Doehrmann & Naumer, 2008). What we are manipulating in this experiment are the concepts of round shapes or of "lojo" pseudowords, rather than the actual physical stimuli.

4. Speeded classification experiment

a. Procedure

In the first experiment, we observed a sound-shape matching effect in a task explicitly involving both visual and auditory stimuli, and moreover requiring an explicit crossmodal matching decision. In the second experiment, the same effect prevailed in a task also involving both visual and auditory stimuli, but with no explicit crossmodal matching decision required. Going one step further, we wondered whether the matching effect would still be present in a more automatic setting. Would the effect survive even when one modality is entirely irrelevant to the experimental task? We used a speeded classification paradigm, in which participants had to classify pseudowords while task-irrelevant shapes, round or spiky, were presented simultaneously. Subjects listened to CVCV pseudowords and had to decide whether they contained the vowel [i] or [o]. They had 2000 ms to respond by pushing the right or the left button. Pseudowords belonged to 4 different groups, composed of [i], [e], [o], [u] vowels and [p], [k], [t], [l], [m], [ʒ] consonants. The discriminating vowel, [i] or [o], was always placed at the second vowel position of the pseudoword, and the shape appeared at the end of the first syllable. The first vowel was either [u] or [e], and the first three sounds of the word did not allow subjects to predict its end; for example, "Ket" could be followed by either [o] or [i]. Consequently we had 4 classes of words: "Keti"-like, "Muji"-like, "Keto"-like and "Lujo"-like pseudowords. Thus, subjects could not choose before hearing the whole word and seeing the shape. The association between sounds and buttons was counterbalanced across subjects. Sixteen subjects participated in the experiment. We chose these pseudowords because "Lujo"-like and "Keti"-like pseudowords are the two most efficient classes for eliciting a sound-symbolic effect in explicit tasks. The two other classes of pseudowords were only used to ensure that subjects could not respond as soon as they heard the first part of the word. Thus, we decided not to analyze these two classes of pseudowords, in which the vowel effect and the consonant effect act in opposite directions. However, as the task consisted in deciding whether the word contained "o" or "i", it put an emphasis on these vowels, and it is highly plausible that the association is made between the shape and these vowels. We chose to manipulate shape and not size because shape yielded the clearest effect in explicit tasks.
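The structure of this design can be summarized by the trial list it generates. The sketch below is purely illustrative, with hypothetical counts and labels; the actual lists were built in E-Prime:

```python
# Sketch of the speeded classification trial list: 4 word classes x 2
# task-irrelevant shapes, with only the fully consistent classes analyzed.
import random

WORD_CLASSES = ["keti", "muji", "keto", "lujo"]  # discriminating vowel is 2nd
MATCH_FOR = {"lujo": "round", "keti": "spiky"}   # classes entering the analysis

def make_trials(n_per_cell=10):
    trials = []
    for word in WORD_CLASSES:
        for shape in ("round", "spiky"):
            analyzed = word in MATCH_FOR
            trials += [{"word": word,
                        "shape": shape,
                        "target": "o" if word.endswith("o") else "i",
                        "matching": (shape == MATCH_FOR[word]) if analyzed else None,
                        "analyzed": analyzed}
                       for _ in range(n_per_cell)]
    random.shuffle(trials)
    return trials
```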
Our hypothesis was that a matching shape, that is to say a shape linked to the pseudoword in the sense of the effect observed in explicit tasks, would decrease the reaction time and the error rate for the word, and that a mismatching shape would increase them (Noppeney, Josephs, Hocking, Price, & Friston, 2008; Doehrmann & Naumer, 2008). For example, according to our hypothesis, answering [o] on trials where subjects hear "lujo" and see a spiky shape would be slower and more difficult than on the same trials with a round shape. To analyze our data, we entered the mean reaction times to our two word groups of interest ("Lujo" and "Keti") with each type of shape into a two-way ANOVA with two intra-subject factors, shape type and word type, and one random factor (subjects). We used the mean and not the median, as each answer was limited to 2000 ms, preventing outlier effects.

b. Results

Fig II.4: Each bar represents the mean reaction time of correct answers for the type of trial described beneath it. There are 4 types of trials: our two word groups of interest accompanied by a spiky or a round shape. Error bars are SEM.

We found a main effect of word on reaction times (F(1,15)=77.67, p<10⁻⁶), explained by the fact that continuant consonants are typically longer than stop consonants. This factor should not have any influence on the comparison between matching and mismatching stimuli, as they are composed of the same number of "Keti"-like and "Lujo"-like pseudowords. We did not find any interaction between shape type and pseudoword type, either for reaction times (F(1,15)=1.74, p=0.2) or for error rates (F(1,15)=1.06, p=0.3), meaning that there was no matching effect.

c. Discussion

In this experiment we looked for interference prompted by the shape presented at the same time as the word, and failed to find any significant effect. This may be due to the fact that our subjects did not pay enough attention to the presented shape. Some studies suggest that both components of an audiovisual stimulus have to be attended to for crossmodal integration to occur (Alsius, Navarra, Campbell, & Soto-Faraco, 2005; Degerman, et al., 2007). In our experiment, subjects did not have to take the shape into account to perform the task they were required to do, and may have ignored the shape completely. Others have shown incongruency effects due to task-irrelevant stimuli in another modality (Noppeney, Josephs, Hocking, Price, & Friston, 2008; Laurienti, Kraft, Maldjian, Burdette, & Wallace, 2004), but only with strongly linked audio-visual stimuli (such as an animal picture and its vocalization, or a blue patch and the word "blue"). Spence et al. found an incongruency effect on multisensory integration with stimuli that only shared a "physical" link (auditory pitch and visual size), but they used a relative temporal order task, asking subjects which of the two stimuli appeared first. This design allowed them to make sure that subjects attended to both the visual and the auditory stimulus. Thus, working with stimuli that share a rather subtle link, we thought that we had to find ways to force subjects to attend to the presented shape in order to find possible incongruency effects.

5. Speeded double classification experiment

a. Procedure

In this experiment, a word and a shape were presented simultaneously for 600 ms and subjects had to perform a task on both the word and the shape.
First, they had to classify the word based on the vowel it contained ("i" or "o"), and then they had to classify the shape (round or spiky). They used their left or right hand to classify the stimuli; thus, as in the IAT, one hand was associated with two responses, one for the shape and one for the word. Subjects had 2000 ms to classify the word and 2000 ms to classify the shape. There were two types of blocks: congruent, where one hand was associated with "o" and round shapes and the other with "i" and spiky shapes, and incongruent, where one hand was associated with "o" and spiky shapes and the other with "i" and round shapes. This design is very similar to the IAT, except that both types of stimuli and both types of response occur on each trial (in the IAT, subjects saw either a shape or a word). The fact that the shape and the pseudoword are presented simultaneously allows us to study the integration between the two more closely. Indeed, it allows us to separate two effects: an effect due to the association between response hands and pairs of stimuli, which we call the congruency effect, and an effect simply due to the association between stimuli, which we call the matching effect. As a result we could compare congruent versus incongruent blocks, or matching versus mismatching stimuli. Note that both matching and mismatching pairs of stimuli occur in each kind of block. 18 subjects participated in this experiment. Half of the subjects had a congruent block of 96 trials followed by an incongruent block of the same number of trials, and the other half began with the incongruent block. Response hands were randomized across subjects. We used unvoiced stops, voiced continuants, and the vowels [o], [u], [i] and [e]. As in the previous experiment, we did not want subjects to be able to determine whether the word contained "o" or "i" as soon as the word began, so we only placed the "o" and "i" vowels at the end of the words. The first vowel was either [u] or [e], and the first three sounds of the word did not enable the subjects to predict its end; for example, "ket" could be followed by either [o] or [i]. Consequently we had 4 classes of words: "Keto", "Luji", "Keti", "Lujo". As we noted earlier, the last two classes are the easiest to interpret, given that their vowels and consonants are associated with the same kind of shape, and so we focused on these. First, we hypothesized that subjects would be quicker and more accurate in congruent blocks, following the IAT congruency effect. We also hypothesized that they would be quicker to respond to matching pairs than to mismatching pairs, an effect we could not identify in the standard IAT. We wanted to analyze reaction times and error rates separately for word and shape responses. We used a two-way ANOVA with two intra-subject factors (matching and congruency) and one random factor (subjects). This ANOVA was conducted for word and shape mean reaction times and for word and shape mean error rates. We used the mean and not the median, as each answer was limited to 2000 ms, preventing outlier effects.
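Before turning to the results, the factor structure can be made explicit with a small sketch: congruency is a property of the block, matching a property of the stimulus pair, and whether both responses fall on the same hand follows from the two. Labels are simplified; this is an illustration only:

```python
# Sketch: derive the matching and same-hand factors from block type and
# stimulus pair in the double classification task.
def label_trial(block, shape, word_class):
    """block: 'congruent' or 'incongruent'; shape: 'round' or 'spiky';
    word_class: 'lujo' or 'keti'. Returns (matching, same_hand)."""
    matching = (shape == "round") == (word_class == "lujo")
    # In congruent blocks, matching pairs are answered with a single hand;
    # in incongruent blocks, the mismatching pairs are.
    same_hand = matching if block == "congruent" else not matching
    return matching, same_hand

assert label_trial("congruent", "round", "lujo") == (True, True)
assert label_trial("incongruent", "round", "lujo") == (True, False)
assert label_trial("incongruent", "spiky", "lujo") == (False, True)
```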
b. Results

Fig II.5: A. Mean reaction times to pseudowords. B. Mean reaction times to shapes. C. Mean error rates of pseudoword responses. D. Mean error rates of shape responses. We separated matching (Spiky+"Keti" or Round+"Lujo") and mismatching (Spiky+"Lujo" or Round+"Keti") trials, and the congruent (in white) and incongruent (in grey) blocks. We also indicated the pair of responses associated with each type of trial: "same hand" means that subjects had to answer twice with the same hand, and "different hand" that they had to alternate hands to answer correctly. Error bars are SEM.

We found a significant effect of congruency on reaction times to pseudowords (F(1,17)=12.72, p=0.002) and on error rates for pseudowords (F(1,17)=4.69, p<0.05) and shapes (F(1,17)=12.34, p=0.003). This shows that subjects are quicker and more accurate in blocks where round shapes and "lujo"-like pseudowords, and spiky shapes and "keti"-like pseudowords, are associated with the same hand. This observation holds for word and shape responses, and is reflected in both reaction times and error rates. We observed a significant main effect of matching on pseudoword reaction times (F(1,17)=8.07, p=0.01) and error rates (F(1,17)=5.24, p=0.03). This shows that subjects are globally, that is to say considering the two types of blocks together, quicker and more accurate in answering to matching stimuli than to mismatching stimuli. The interaction between matching and congruency for pseudoword reaction times (F(1,17)=6.75, p=0.02) was significant. This interaction reflects the fact that subjects are quicker to classify words accompanied by a matching shape in congruent blocks and words accompanied by a mismatching shape in incongruent blocks.

c. Discussion

The congruency effect we found is reminiscent of the one found in the IAT. Subjects more easily associate round shapes and "Lujo" pseudowords with the same hand, but this does not mean that the two are perceptually linked. The matching effect that we observe is perhaps more interesting. The fact that it is only present for the word answer is not totally unexpected, as it is a more subtle effect than congruency, and the shape response is shorter and more homogeneous across trial types than the word response. In fact, given the design of our experiment, it is likely that subjects directly decide on a pair of responses rather than answering the word and then recalling the shape. The fact that the mental effort takes place before giving the first answer could explain why shape reaction times are almost 4 times shorter than pseudoword reaction times. Before trying to explain this effect, we also have to take into account the interaction we found for word reaction times, which shows that people are better for matching stimuli in congruent blocks and for mismatching stimuli in incongruent blocks.
In fact, this interaction is confounded with a hand effect: in congruent sessions matching stimuli entail two responses with the same hand, as mismatching stimuli do in incongruent sessions. Conversely, matching stimuli in incongruent sessions and mismatching stimuli in congruent sessions entail responses with different hands. Our interpretation is the following: globally, subjects are quicker and more accurate in their word responses to matching stimuli; they are also quicker when answering with the same hand than with different hands. In congruent blocks these two effects go in the same direction, whereas in incongruent blocks they go against each other. Thus, graphically, the main effect of matching on word reaction times can be apprehended as mean RT(congruent mismatch − congruent match) − mean RT(incongruent match − incongruent mismatch) = 81.9 ms (± 28.8 SEM). This measure is fully counterbalanced in terms of the shapes and pseudowords used, and the same-hand advantage cancels out of it; it shows that matching stimuli directly facilitated the behavioral response. We think that this matching effect reflects a crossmodal integration of matching shapes and pseudowords. Indeed, interference effects shown by increased reaction times or error rates during audio-visual stimulus processing are often interpreted as a marker of multisensory integration (Doehrmann & Naumer, 2008). Such effects have been found with semantically linked audio-visual stimuli (Laurienti, Kraft, Maldjian, Burdette, & Wallace, 2004) or with physically linked audio-visual stimuli (Spence, 2007), but this is the first time that they are observed with abstract sound-symbolic links like the one associating round shapes with "lujo"-like pseudowords and spiky shapes with "keti"-like pseudowords. This confirms that the combined presentation of audio-visual stimuli which are congruent even on a subtle level can significantly modulate perceptual decision processes and thereby behavioral performance, a conclusion already advanced by Doehrmann.

III. fMRI experiment

1. Objectives and hypotheses

Our behavioral studies allowed us to conclude that people explicitly and automatically associate round shapes with some pseudowords and spiky shapes with others. The factors we identified as important for this choice were vowel backness and roundedness, and the consonants' manner of articulation and voicing. We showed that this association might take place at different levels, from the perceptual level for the matching effect to the conceptual level for the congruency effect. These results only gave us hypotheses to explain this sound-symbolic phenomenon, and as we wanted to further dissociate these explanations, we conducted this fMRI study. Indeed, the brain areas where integration between shapes and pseudowords takes place could give us important insight into the kind of link implicated (Beauchamp, 2005). In the first experiment, subjects passively attended to unimodal stimuli (shapes or pseudowords), or to bimodal stimuli associating a shape and a pseudoword. Bimodal stimuli were either matching or mismatching, based on our previous experiments. This design allowed us to study automatic associations, as subjects were totally naïve concerning our goals and were performing a task independent of the actual stimuli, which we presented as the main objective of the study. As the behavioral experiments essentially showed no crossmodal matching effects when subjects did not actively attend to both stimuli, we also conducted a second experiment very close to the single-trial IAT. The behavioral results of this experiment are presented, but we do not show the imaging results here, as we lacked time to analyze them.
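As a purely illustrative sketch of how the first experiment's central prediction (mismatching versus matching pairs) could be tested on such a block design, here is what a minimal GLM analysis might look like with nilearn. File names, the events table and all parameters are hypothetical; this is not the pipeline actually used in the thesis:

```python
# Hypothetical sketch: block-design GLM and the mismatch > match contrast.
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

# Toy events table (onsets in seconds; one 7 s block = 10 x 700 ms trials).
events = pd.DataFrame({
    "onset":      [0.0, 14.0, 28.0, 42.0],
    "duration":   [7.0, 7.0, 7.0, 7.0],
    "trial_type": ["match", "mismatch", "shapes", "words"],
})

model = FirstLevelModel(t_r=2.4, hrf_model="spm", smoothing_fwhm=5)
model = model.fit("sub-01_task-passive_bold.nii.gz", events=events)

# Incongruency effect: mismatching pairs > matching pairs
z_map = model.compute_contrast("mismatch - match", output_type="z_score")
```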
III. fMRI experiment

1. Objectives and hypotheses

Our behavioral studies allowed us to conclude that people explicitly and automatically associate round shapes with some pseudowords and spiky shapes with others. The factors we identified as important for this choice were vowel backness and roundedness, and consonants' manner of articulation and voicing. We showed that this association might take place at different levels, from the perceptual level for the matching effect to the conceptual level for the congruency effect. These results only gave us hypotheses to explain this sound-symbolic phenomenon, and as we wanted to dissociate these explanations further, we conducted the present fMRI study. Indeed, the brain areas where integration between shapes and pseudowords takes place could give us important insight into the kind of link implicated (Beauchamp, 2005). In the first experiment, subjects passively attended to unimodal stimuli (shapes or pseudowords), or to bimodal stimuli associating a shape and a pseudoword. Bimodal stimuli were either matching or mismatching, based on our previous experiments. This design allowed us to study automatic associations, as subjects were totally naïve concerning our goals and were performing a task independent from the actual stimuli, which we presented as the main objective of the study. As the behavioral experiments essentially showed no cross-modal matching effects when subjects did not actively attend to both stimuli, we also conducted a second experiment very close to the single-trial IAT. The behavioral results of this second experiment are presented below, but we do not show its imaging results, as we lacked time to analyze them.

2. Material and methods

a. Subjects

Eighteen native right-handed French speakers, 19 to 35 years old (9 men, mean age: 23 years), participated in the present fMRI study. None of the subjects had participated in the behavioral experiments, and none had a history of neurological or psychiatric disease. Their vision was normal or corrected to normal. Handedness was assessed with the Edinburgh Handedness Inventory (mean score: +73, standard deviation: 19). The project was approved by the regional ethics committee and the subjects gave their written informed consent before the experiment. Each subject participated in two different experiments in the scanner, for a total scanning duration of about 1 hour.

b. fMRI acquisition parameters

We used a 3-Tesla MRI scanner (Siemens Trio TIM) with a 12-channel head coil, and a gradient-echo planar imaging sequence sensitive to blood oxygen-level-dependent (BOLD) contrast (40 contiguous axial slices acquired in an ascending interleaved sequence, 3 mm thickness; TR = 2400 ms; flip angle = 90°; TE = 30 ms; in-plane resolution = 3 x 3 mm; matrix = 64 x 64). For each acquisition, the first 4 volumes were discarded to allow magnetization to reach equilibrium. T1-weighted images were also acquired for anatomical localization. 301 functional volumes were acquired for the first experiment, and 4 sessions of 141 functional volumes for the second experiment.

c. Experimental setup

Auditory and visual stimuli were presented using E-Prime 2.0 (Psychology Software Tools) running on a Windows PC. Auditory stimuli were presented via high-fidelity stereo headphones (MR Confon) at approximately 80 dB, and visual stimuli were projected onto a translucent screen, 640 pixels wide (VGA mode), subtending 42.5 cm in width and viewed through a mirror from a distance of 125 cm, for an overall angular size of 19.3 degrees. The subjects performed the behavioral task using an MR-compatible button device connected to the stimulation computer.

d. Passive experiment

Material

In this experiment, we used two classes of pseudowords picked from the behavioral experiments: words with voiced continuant consonants [l], [m], [ʒ] and back vowels [u], [o] (e.g. "loju"), which were associated with round shapes in our behavioral experiments, and words with unvoiced stop consonants [p], [k], [t] and front vowels [i], [e] (e.g. "kipe"), which were associated with spiky shapes. We chose these words because they showed the largest effects in our behavioral experiments. We also picked 18 large spiky shapes and 18 large round shapes; the shapes were presented in light gray on a black background.

Procedure

Subjects had to fixate the screen while 8 different types of blocks randomly followed one another: "lujo"-like pseudowords, "kipi"-like pseudowords, round shapes, spiky shapes, matching pairs of round shapes and "lujo"-like pseudowords, matching pairs of spiky shapes and "kipi"-like pseudowords, mismatching pairs of round shapes and "kipi"-like pseudowords, and mismatching pairs of spiky shapes and "lujo"-like pseudowords. Each type of block included 10 trials and was repeated 10 times with random stimuli over the whole experiment. Each trial consisted of a central fixation point for 100 ms followed by the stimulus for 600 ms. 20 resting blocks of the same duration as activation blocks were randomly interspersed in the experiment, as sketched below.
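As a minimal sketch of this block structure (randomization details beyond those stated in the text are assumptions of ours):

```python
import random

# Sketch of the passive-experiment design described above.
BLOCK_TYPES = [
    "lujo_words", "kipi_words",          # unimodal auditory blocks
    "round_shapes", "spiky_shapes",      # unimodal visual blocks
    "match_round_lujo", "match_spiky_kipi",
    "mismatch_round_kipi", "mismatch_spiky_lujo",
]
TRIALS_PER_BLOCK = 10
REPETITIONS = 10                     # each block type occurs 10 times
N_REST_BLOCKS = 20                   # rest blocks of equal duration
FIXATION_MS, STIMULUS_MS = 100, 600  # one trial = 700 ms

def build_block_sequence(seed: int = 0) -> list:
    """Randomized sequence of 80 activation blocks and 20 rest blocks
    (the 1/20 distractor replacement described next is omitted here).
    100 blocks x 10 trials x 0.7 s = 700 s of stimulation."""
    rng = random.Random(seed)
    blocks = BLOCK_TYPES * REPETITIONS + ["rest"] * N_REST_BLOCKS
    rng.shuffle(blocks)
    return blocks
```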
Two types of distractors, a cross or a "bip", appeared randomly at a frequency of 1/20, replacing a stimulus. The subjects had to push a button with their right hand whenever they saw a cross or heard a "bip". In visual blocks the distractor was always a cross, in auditory blocks always a "bip", and in mixed blocks either a cross or a "bip". This allowed us to monitor subjects' concentration and attention. We did not tell subjects that we were studying sound-shape associations; we told them that their task was to detect crosses and "bips" as quickly and accurately as possible.

e. Active experiment

Material

We used the same classes of words as in the passive experiment, but only chose words containing at least one "o" or one "i". The shapes were the same as in the passive experiment.

Procedure

This experiment was divided into 4 sessions, each beginning with a short training block. Training comprised 10 trials and each session 80 trials, so the total number of trials, training excluded, was 320. On each trial, subjects heard a pseudoword and saw a shape at the same time, and had to classify both the word and the shape. First, they had to decide whether the word contained the sound "o" or "i" (soft or hard pseudowords, respectively); then they had to decide whether the shape accompanying the word was round or spiky. To classify these two stimuli they used their right or left hand. The word response was prompted by a white loudspeaker icon on a black background and the shape response by a white eye icon. The stimuli appeared for 600 ms, and subjects had 1500 ms to give the word answer and then 1500 ms to give the shape answer. During training, subjects had unlimited time to answer. As soon as the stimuli disappeared the loudspeaker appeared, and as soon as the first answer was given the eye prompting the shape response appeared. Between trials, subjects saw a central cross for a variable amount of time, depending on how long they took to answer (3 s minus the time taken to give the word and shape answers). Trials therefore all lasted the same time, whatever the speed of the subject, preventing subjects from answering quickly in order to finish the test earlier. Resting trials, in which subjects only saw the white cross, were interspersed in the blocks at a frequency of 1/5; their duration was the same as that of stimulus trials. The instructions changed across sessions, separating congruent sessions, where subjects had to answer for round shapes and soft pseudowords with the same hand, from incongruent sessions, where subjects had to answer for spiky shapes and soft pseudowords with the same hand. In each type of session, matching and mismatching pairs were presented. For example, if a subject saw a spiky shape and heard "lujo" at the same time, he had to answer with different hands in a congruent session and with the same hand in an incongruent session. For half the subjects the order of the sessions was congruent-incongruent-incongruent-congruent (CIIC) and for the other half incongruent-congruent-congruent-incongruent (ICCI). The hand used for each sound (e.g. left: "o", right: "i") stayed the same throughout the experiment, and the hands used to answer for shapes were reversed between congruent and incongruent sessions. The hands used for "o" and "i" were randomized across subjects. We consequently had 4 groups of subjects: left hand for "o" with CIIC session order, left hand for "o" with ICCI session order, right hand for "o" with CIIC session order, and right hand for "o" with ICCI session order, as sketched below.
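The four counterbalancing groups can be summarized as follows (labels and helper names are ours):

```python
from itertools import product

# Sketch of the counterbalancing in the active experiment.
SESSION_ORDERS = {"CIIC": ["C", "I", "I", "C"],
                  "ICCI": ["I", "C", "C", "I"]}
O_HANDS = ("left", "right")   # hand assigned to "o", fixed per subject

def subject_groups():
    """The 4 groups: hand for "o" crossed with session order."""
    return [{"o_hand": hand, "order": name, "sessions": seq}
            for hand, (name, seq) in product(O_HANDS, SESSION_ORDERS.items())]

def round_hand(o_hand: str, session: str) -> str:
    """Hand for "round" responses. In congruent sessions (C), soft "o"
    words and round shapes share a hand; in incongruent sessions (I),
    the shape hands are reversed while the word hands stay fixed."""
    other = "right" if o_hand == "left" else "left"
    return o_hand if session == "C" else other
```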
Figure III.2: Description of the procedure used in the active fMRI experiment. Subjects saw a shape and heard a word for 600 ms and had to decide first whether the word contained "i" or "o" and then whether the shape was round or spiky, each time by pushing a button with their right or left hand. There were two types of blocks: one where the same hand was used for "o" and round, which we call "congruent", and another where the same hand was used for "o" and spiky, which we call "incongruent".

f. Assessment of synaesthesia

To quantify a putative link between synaesthesia and sound symbolism, we gave our subjects a "synaesthesia form" at the end of the experiment (see Annexes). Our goal was to quantify subjects' synaesthetic tendencies. We did not aim to identify true synaesthetes, but to see whether some subjects were more prone than others to synaesthetic experiences. Subjects had to answer "not at all", "a bit" or "a lot" to 18 questions such as "Do you experience colored sensations when you hear spoken words?", "Do you experience personalities for numbers?" or "Do you experience numbers, days of the week, or months of the year in a spatial arrangement (e.g. a circle, ellipse, oval)?", which covered a range of reported synaesthetic experiences. The score was calculated by adding 1 point for each "a bit" answer and 2 points for each "a lot" answer.

g. fMRI statistical analysis

Individual data processing, performed with SPM8, included corrections for EPI distortion, slice acquisition time, and motion; normalization to the MNI anatomical template; Gaussian smoothing (5 mm FWHM); and fitting with a linear combination of functions derived by convolving the time series of the stimulus categories with the standard hemodynamic response function implemented in SPM8 (a combination of 2 gamma functions, with a rise peaking around 6 s followed by a longer undershoot), without including the temporal derivatives of those functions in the model. Individual contrast images were computed for each stimulus type minus baseline, then smoothed (5 mm FWHM), and finally entered into an ANOVA for random-effects group analysis. Note that the histograms represent values of these same smoothed contrast images at selected voxels, averaged across subjects. Unless stated otherwise, we used a voxelwise threshold of p < 0.001, with a cluster-extent threshold of p < 0.05 corrected for multiple comparisons across the whole brain. We also entered individual images for contrasts of interest into linear regression models with behavioral scores as regressors. These models show voxels whose activation in a given contrast is correlated across subjects with behavioral scores (e.g. voxels whose difference of activation between matching and mismatching stimuli is larger in subjects with higher synaesthesia scores). Activation images are overlaid on the average of the normalized structural T1 images. The coordinates we report are Montreal Neurological Institute (MNI) standard coordinates.

3. Behavioral Results

a. Synaesthesia

Subjects had a mean synaesthesia score (± standard deviation) of 5.9 (± 4.3), with scores ranging from 2 to 15. Subjects mostly reported associations between days of the week or months and spatial locations, months or seasons and colors, music and colors, or numbers and personalities. No subject reported being a synaesthete.

b. Passive experiment

The mean detection rate was 89% for crosses and 87% for "bips". The mean global detection rate (± standard deviation) was 88% (± 14%).
All subjects had a global detection rate within 1.2 SD of the mean (71%-100%), except one whose global detection rate was below 50%. As we could not be sure that this subject correctly attended to the stimuli, we excluded him from the fMRI analysis. We also excluded from the fMRI analysis a subject who did not hear the pseudowords in this experiment due to a technical problem.

c. Active experiment

Fig III.2: A. Mean reaction times to pseudowords. B. Mean reaction times to shapes. C. Mean error rates of pseudoword responses. D. Mean error rates of shape responses. We separated matching (Spiky+"Keti" or Round+"Lujo") and mismatching (Spiky+"Lujo" or Round+"Keti") trials, and congruent (in white) and incongruent (in grey) blocks. We also indicated the pair of responses associated with each type of trial, "same hand" meaning that subjects had to respond twice with the same hand and "different hand" that they had to alternate hands to respond correctly. Error bars are SEM.

The same statistical analysis was applied as described for the single-trial IAT procedure (II.5.1). The mean error rate (± standard deviation) was 5.6% (± 5.3%) for word responses and 4.9% (± 4.4%) for shape responses. 67% of word errors and 44% of shape errors were omitted responses. The mean reaction time (± standard deviation) was 532 ms (± 152 ms) for correct word responses and 417 ms (± 100 ms) for correct shape responses. We found a main effect of congruency in all analyses: word response reaction times (F(1,17)=19.77, p=0.0004) and error rates (F(1,17)=8.45, p<0.01), and shape response reaction times (F(1,17)=16.67, p<0.001) and error rates (F(1,17)=6.09, p=0.02). A main effect of matching was also observed for word reaction times (F(1,17)=41.39, p<0.00001), but not for shape reaction times or for shape and word error rates. Finally, there was a significant interaction between matching and congruency in all analyses: word response reaction times (F(1,17)=23.37, p<0.0001) and error rates (F(1,17)=6.30, p=0.02), and shape reaction times (F(1,17)=6.07, p=0.02) and error rates (F(1,17)=7.48, p=0.01). As in our behavioral experiments, we looked at inter-individual variability, knowing that people could be more or less sensitive to the sound-symbolic association between shapes and pseudowords. First, we plotted the size of the difference in reaction times and in error rates between congruent and incongruent blocks. This was calculated by subtracting the mean reaction time or error rate of all responses (word and shape) in congruent blocks from the same measure in incongruent blocks. Effect size differed considerably between subjects, ranging from more than 350 ms to -100 ms for reaction times. We tested the correlation between these two measures; it was not significant (r²=0.07, p=0.27). Then, we calculated the difference in reaction times and in error rates between matching and mismatching trials.
As for congruency, these two measures were not correlated (r²=0.05, p=0.38). A correlation table showed that the size of the difference between mismatching and matching stimuli was correlated with the size of the difference between incongruent and congruent blocks, both for reaction times (r²=0.35, p=0.01) and error rates (r²=0.32, p=0.01). We also observed that the synaesthesia score was significantly correlated with the difference between mismatching and matching stimuli as reflected in error rates (r²=0.37, p=0.008).

Figure III.3: A. Plot of the difference between mean reaction time in incongruent blocks and mean reaction time in congruent blocks for each subject. B. Plot of the difference between mean error rate in incongruent blocks and mean error rate in congruent blocks for each subject.

From these behavioral scores we derived two regressors to be used for the fMRI analysis. We chose to focus on the congruency effect, the largest and most stable across experiments and analyses. For reaction times, we took each subject's congruency effect size and divided it by the subject's mean reaction time; we obtained 18 numbers, calculated their mean, and subtracted this mean from each number to center the score. For error rates, we subtracted the mean effect size from each subject's effect size. A sketch of this computation is given below.

d. Discussion

Globally, we confirmed the effects found in the behavioral experiment: main effects of congruency and matching, and a matching x congruency interaction. This interaction could simply be due to the difference between same-hand and different-hand answers. Correlations between measures showed that the congruency and matching effects were significantly correlated, both for reaction times and error rates. According to our interpretation of the congruency and matching effects, one hypothesis would be that the subjects who are most sensitive to higher-order associations are also the ones who integrate round shapes and "lujo" pseudowords at a lower level. We hypothesize that the absence of correlation between effects on error rates and on reaction times could result from a trade-off between these two measures, some subjects for example taking more time in order to make fewer errors. The fact that subjects with higher synaesthesia scores make more errors on mismatching than on matching stimuli was unexpected, even if it makes sense. If replicated, this result could be an interesting element linking synaesthetic tendencies to sound-symbolic links between shapes and pseudowords. It remains an open question why it correlates only with error rate differences and not with reaction time differences.
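The regressor computation described above amounts to the following minimal sketch (array names are ours; each argument holds one value per subject):

```python
import numpy as np

# Sketch of the two behavioral regressors described above.
def congruency_regressors(rt_incong, rt_cong, mean_rt, err_incong, err_cong):
    """rt_*: mean reaction times per subject; err_*: error rates per subject;
    mean_rt: each subject's overall mean reaction time."""
    # RT score: congruency effect normalized by the subject's mean RT,
    # then mean-centered across the 18 subjects.
    rt_score = (np.asarray(rt_incong) - np.asarray(rt_cong)) / np.asarray(mean_rt)
    rt_score = rt_score - rt_score.mean()
    # Error score: congruency effect, mean-centered.
    err_score = np.asarray(err_incong) - np.asarray(err_cong)
    err_score = err_score - err_score.mean()
    return rt_score, err_score
```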
4. Imaging Results

a. Activations

Auditory stimuli activated the superior temporal gyrus and sulcus bilaterally (left: MNI -60 28 6, Z>8; right: MNI 66 -20 2, Z>8). At a 0.01 voxelwise threshold, they also activated a bilateral precentral cluster, below the clusterwise significance threshold (left: MNI -48 -4 50, Z=6.04; right: MNI 56 0 44, Z=5.67). The contrast of "lujo"-like minus "keti"-like pseudowords activated a left temporal cluster in the superior temporal gyrus (MNI -50 10 4, Z=6); we think that this difference could be due to low-level acoustic differences between "lujo"-like and "keti"-like pseudowords. The contrast of hard minus soft sounds did not yield any significant activation.

Visual stimuli activated the occipital cortex (left: MNI -10 -102 6, Z>8; right: MNI 28 -92 16, Z>8), the ventral temporal cortex posterior to about y = -35, and the posterior intraparietal cortex bilaterally, with an additional bilateral mid-intraparietal cluster (right: MNI 30 -54 52, Z=4.71; left: MNI -28 -56 58, Z=4.59). At a 0.01 voxelwise threshold, visual stimuli also activated a bilateral precentral cluster, below the clusterwise significance threshold (left: MNI -50 0 46, Z=3.35; right: MNI 48 4 32, Z=3.59). These clusters were slightly anterior to the ones activated by auditory stimuli. The contrast of round minus spiky shapes activated the bilateral calcarine sulcus (left: MNI -8 -82 8, Z=5.95; right: MNI 8 -76 12, Z=6.48). Conversely, spiky minus round shapes activated the bilateral occipital poles (left: MNI -12 -94 -8, Z=3.88; right: MNI 14 -98 -6, Z>8), including the posterior part of the striate cortex. We think that these activations are due to physical differences between round and spiky shapes: whereas the boundaries of spiky shapes lie in a more central region of the visual field, corresponding to posterior visual cortex, the boundaries of round shapes are more peripheral, thus activating more anterior regions.

Figure III.4.1: Visual activations by shapes in the passive experiment. A. The contrast of activations by round minus spiky shapes is in red; the contrast of activations by spiky minus round shapes is in yellow. Round shapes activated ventral and central visual areas, and spiky shapes dorsal and lateral visual areas. B. Superposition of the round and spiky shapes used in the passive experiment, with outlines colored red for round shapes and yellow for spiky shapes. Round shapes' outlines are more peripheral than spiky shapes' outlines, which are more central.

Bimodal stimuli significantly activated the union of the two unimodal networks, plus a bilateral precentral cluster (left: MNI -48 -4 50, Z=7.37; right: MNI 56 2 44, Z=6.52). This precentral cluster approximately corresponds to the union of the clusters found for auditory and visual stimuli. At a 0.01 voxelwise threshold, we saw activation in a left inferior frontal cluster (MNI -46 16 18, Z=3.23) that did not reach the clusterwise significance level. In order to identify regions activated by both unimodal stimulus types, we studied the conjunction of auditory minus rest and visual minus rest. At a 0.01 voxelwise threshold, we found the anterior part of the bilateral precentral clusters (left: MNI -50 0 46, Z=7.37; right: MNI 58 6 40, Z=6.52), which did not reach the clusterwise significance level.

Figure III.4.2: Global auditory (A), visual (B) and bimodal (C) activations in the passive experiment, accompanied by the profile of activation of the most significant voxel in each cluster. Voxelwise threshold: 0.01, no clusterwise threshold.

b. Deactivations

We also looked for regions deactivated by our stimuli relative to rest. Indeed, some studies have found effects of cross-modal stimuli on deactivated regions (Laurienti, Burdette, Wallace, Yen, Field, & Stein, 2002; Kircher, Sass, Sachs, & Krach, 2009), making it important to carefully examine the responses to all multisensory conditions, even in deactivated regions (Beauchamp, 2010). As deactivations concerned larger but less significant regions, we used a 0.01 voxelwise threshold and kept the clusterwise FWE correction across the whole brain, unless specified otherwise.
Auditory stimuli deactivated the bilateral calcarine sulcus and cuneus (left: MNI -26 -56 10, Z=3.61; right: MNI 12 -56 18, Z=3.32) and the anterior cingulate region (left: MNI -12 48 -4, Z=3.39; right: MNI 6 38 14, Z=3.36). Visual stimuli deactivated the cuneus region bilaterally (left: MNI -8 46 52, Z=4.05; right: MNI 24 -42 64, Z=4.34) and a small but highly significant left inferior occipital cluster (thresholds: voxelwise 0.0001 and clusterwise 0.05) (MNI -24 -102 -10, Z=6.95). The same region on the right side (MNI 28 -100 -6, Z=5.04) was also deactivated but fell slightly below the clusterwise statistical threshold. Bimodal stimuli deactivated the union of the two unimodal networks, except for regions deactivated in one condition and activated in the other (such as the calcarine sulcus). In order to identify regions deactivated by both unimodal stimulus types, we studied the conjunction of rest minus auditory and rest minus visual. Even at a low threshold (voxelwise 0.02 for each contrast), we only found small non-significant clusters in the bilateral cuneus and precuneus.

c. Cross-modal matching

To find regions more activated by bimodal than by unimodal stimuli, we studied the conjunction of bimodal minus auditory and bimodal minus visual (AV > A AND AV > V), a statistical criterion used by many researchers in multisensory integration (Beauchamp, 2010); a sketch of this criterion is given at the end of this subsection. We notably found part of the left precentral cluster (MNI -50 2 48, Z=3.26). We also studied the conjunction of auditory minus bimodal and visual minus bimodal to search for regions more deactivated by bimodal than by unimodal stimuli, but did not find any significant clusters. We then contrasted mismatching minus matching bimodal stimuli. This showed bilateral frontal activations (left: MNI -42 44 22, Z=3.92; right: MNI 34 56 28, Z=3.50). These activations bilaterally covered the lateral part of the inferior, middle and superior frontal gyri. They were due to larger deactivations by matching than by mismatching stimuli. When comparing each of the two types of mismatching stimuli separately to matching stimuli, we found the same prefrontal activations, in addition to those resulting from the different stimuli in the two terms of the subtractions (e.g. spiky minus round activations, as seen earlier).

Figure III.4.3: Mismatch-match activations in frontal cortices, accompanied by the profile of activation at the cluster's most significant voxel.

The opposite contrasts yielded no significant activations. When comparing each of the two types of matching stimuli separately to mismatching stimuli, we did not find any activations except for those resulting from the different stimuli in the two terms of the subtractions.
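As a sketch of the AV > A AND AV > V criterion used above: under the conjunction null, a voxel counts as integrative only if both contrasts individually exceed the statistical threshold, which is equivalent to thresholding the voxelwise minimum of the two maps (the threshold value below is an illustrative placeholder, not the one used in SPM8):

```python
import numpy as np

# Sketch of the conjunction criterion for multisensory integration.
def conjunction_mask(t_av_minus_a: np.ndarray,
                     t_av_minus_v: np.ndarray,
                     t_crit: float = 3.1) -> np.ndarray:
    """Voxelwise t maps for the contrasts AV - A and AV - V.
    A voxel survives only if both contrasts exceed t_crit, i.e. if
    the minimum of the two statistics exceeds it."""
    return np.minimum(t_av_minus_a, t_av_minus_v) > t_crit
```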
d. Linear regression

The contrast of match minus mismatch was positively correlated with the behavioral congruency score, as measured during the active fMRI experiment, in the right temporo-occipital cortex (MNI 44 -78 0, Z=4.16). There was a symmetrical left-hemispheric activation somewhat below the cluster-extent threshold (MNI -44 -86 -10, Z=4.27).

Fig III.4.4: Match-mismatch activations correlated with the error-rate congruency score. The figures on the right are plots of the difference of activation between matching and mismatching pairs of stimuli as a function of the difference in error rates between congruent and incongruent blocks. The numbers in the plots represent the subjects; r² is the square of the correlation coefficient.

We did not find activations correlated with the reaction time score. We did, however, observe significant mismatch-match activations correlated with the synaesthesia score in the bilateral inferior postcentral gyri (left: MNI -32 -28 38, Z=4.26; right: MNI 38 -22 34, Z=3.57) and inferior precentral gyri (left: MNI -36 -8 34, Z=3.39; right: MNI 40 -6 34, Z=3.36).

Fig III.4.5: Mismatch-match activations correlated with the synaesthesia index. The figures on the right are plots of the difference of activation between match and mismatch as a function of the synaesthesia index. The numbers in the plots represent the subjects; r² is the square of the correlation coefficient. A sketch of these across-subject correlations follows.
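The across-subject correlations plotted in Figs III.4.4 and III.4.5 amount to the following computation (a sketch; variable names are ours):

```python
import numpy as np
from scipy import stats

# Sketch of the brain-behavior correlations shown in the regression figures.
def brain_behavior_r2(contrast_values: np.ndarray, score: np.ndarray):
    """contrast_values: one (match - mismatch) contrast estimate per subject,
    taken at a voxel or cluster; score: the behavioral regressor, e.g. the
    centered error-rate congruency score or the synaesthesia index."""
    r, p = stats.pearsonr(contrast_values, score)
    return r ** 2, p   # the figures report r2, the squared correlation
```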
5. Imaging Discussion

a. Bimodal precentral clusters

Fig III.5: Precentral cluster in auditory (red) and visual (yellow) blocks.

Fig III.6: Bilateral precentral cluster in bimodal blocks, accompanied by the mean profile of activation of a 1000-voxel cube centered on the most significant voxel of the cluster.

We found a bilateral precentral cluster activated by words, by images, and by bimodal stimuli. This cluster was more anterior for images than for words, but a small part was bilaterally activated both by shapes and by pseudowords. The cluster activated in the auditory condition was apparently just anterior to the motor mouth region (Brown, Laird, Pfordresher, Thelen, Turkeltaub, & Liotti, 2009) and could correspond to activation of the articulatory motor cortex. Such activations have been repeatedly observed during passive speech perception and have been identified as playing a role in audio-visual speech integration (Skipper, Van Wassenhove, Nusbaum, & Small, 2007). The fact that shapes also activated a more anterior precentral cluster is more intriguing. Studies have found motor neurons responding to static 3-D visual objects in the monkey (Murata, Fadiga, Fogassi, Gallese, Raos, & Rizzolatti, 1997); these neurons' activation is selective for object shape and could reflect the specific grip needed to grasp the objects. It has also been shown in humans that motor areas can be activated during passive perception of static graspable objects, consistent with the idea that motor representations are accessed during the perception of objects (Grèzes & Decety, 2002; Grèzes, Tucker, Armony, Ellis, & Passingham, 2003). Thus it could be that our shapes appeared as graspable objects and activated a premotor region corresponding to hand movements. The fact that we observed, for visual and auditory stimuli, activations in different premotor areas that could respectively correspond to hand and mouth motor cortex is very interesting. We also noted that a small cluster situated in the left precentral region responded more to bimodal stimuli than to either unimodal stimulus type. We failed to find a congruency effect in the precentral clusters, meaning that matching and mismatching stimuli did not differentially activate this area. Nevertheless, the profile of activation that we found in this region makes it a good candidate for the integration of shapes and pseudowords. Moreover, it could interestingly correspond to the articulatory theory we developed in the introduction. But, as most of these activations were not significant at the clusterwise threshold, we will need to confirm them, possibly in the analysis of the active experiment.

b. Mismatch-match

We found increased deactivations for matching compared to mismatching stimuli in the left and right lateral frontal gyri. Looking at the activation profile of these areas, we saw that they were almost equally deactivated in the unimodal auditory and visual conditions and in the bimodal mismatching condition, but significantly more deactivated in the matching condition. At the clusterwise significance level, these regions were only deactivated by matching stimuli and not by any other condition. Such an activation profile is difficult to explain, but we think that these regions could reflect an incongruency effect, as the contrast of mismatching minus matching stimuli suggests. Consistent with this, we also found preliminary results in our active experiment showing increased activations for mismatching compared to matching stimuli in bilateral frontal cortex (data not shown). Many studies of multisensory integration have found that frontal cortices respond to semantic inconsistencies between auditory and visual stimuli. Hein et al. found increased activations in inferior frontal cortex for incongruent audiovisual stimuli (animal pictures and vocalizations) and hypothesized that they could reflect the learning of new associations. Noppeney et al. (2008), in an important study on incongruency in cross-modal integration, used either pictures (of animals or man-made objects) or written words as visual primes and studied their respective impact on the processing of subsequently presented natural sounds or spoken words. They observed increased activation for incongruent spoken words and natural sounds in prefrontal and inferior frontal cortex. As they did not present auditory and visual stimuli together, they mostly studied higher-order semantic and conceptual processing rather than perceptual integration. The participation of frontal regions in the processing of incongruent stimuli has also been supported by some EEG studies (Doehrmann & Naumer, 2008).

c. Mismatch-match correlated to synaesthesia

The difference of activation between mismatching and matching conditions correlated with the synaesthesia score is more difficult to interpret. As these regions (bilateral inferior precentral and postcentral gyri) are not significantly activated either by words or by shapes, and are more activated by mismatching than by matching pairs, they do not seem to be regions playing a role in multisensory integration. The most plausible hypothesis is that activation in these regions reflects an incongruency effect, which would be correlated with the synaesthesia index. In the behavioral analysis of the active experiment, we saw that synaesthetic tendencies were significantly correlated with the matching effect, confirming that the synaesthesia index we use and the effects we observe are not independent. It could be that subjects with synaesthetic tendencies have a larger incongruency effect in these regions.

d. Match-mismatch correlated to error score

We found a difference of activation between matching and mismatching stimuli correlated with the error-rate score in bilateral lateral occipito-temporal regions (slightly below the clusterwise threshold on the left side). These regions were activated in the unimodal visual conditions but not in the unimodal auditory conditions. Subjects who were more sensitive to shape-word associations activated these visual areas more for matching than for mismatching stimuli. The role of the lateral occipital cortex (LOC) in object processing is well documented.
Originally, it was described as a visual region preferentially activated by images of intact (as compared to scrambled) objects (Malach et al., 1995). In 2001, Amedi et al. (Amedi, Malach, Hendler, Peled, & Zohary, 2001) found robust activation for tactile objects in the posterior part of LOC, suggesting that this area could constitute a multimodal network, and called it LOtv (lateral occipital tactile-visual). Similar results were found in another study (James, Humphrey, Gati, Servos, Menon, & Goodale, 2002), in which a region posterior to MT was activated by visual and tactile exploration of novel objects made of clay. Many others (e.g. Zhang, Weisser, Stilla, Prather, & Sathian, 2004) have found visuo-tactile activations in LOtv, and the main hypothesis is that this region is engaged in processing higher-order shape representations accessible either by vision or by touch. Evidence has been brought that these activations cannot be explained solely by top-down mental imagery (Beauchamp, 2005). As in our study, it has been shown that LOtv is not activated by auditory stimuli (Naumer et al., 2010; Amedi, Malach, Hendler, Peled, & Zohary, 2001), but some studies have observed modulation by auditory stimuli in this region. First, Amedi et al. used soundscapes created by a visual-auditory sensory substitution device (SSD) that converts visual shape information into an auditory stream via a specific algorithm (Amedi et al., 2007). In this study, sighted and blind subjects who had learned to associate these soundscapes with shapes activated LOtv when listening to them. Another recent study (Doehrmann, Weigelt, Altmann, Kaiser, & Naumer, 2010) found repetition enhancement in LOC for the same auditory animal vocalization (compared to different vocalizations) presented with a congruent animal picture. They showed that "object-related visual cortex is able to discriminate auditory signals at least along the dimension of 'same' versus 'different'". Taken together, there is a growing consensus that LOC is a "metamodal operator" (Lacey, Tal, Amedi, & Sathian, 2009; Amedi, von Kriegstein, van Atteveldt, Beauchamp, & Naumer, 2005) that processes shape information irrespective of the particular input modality. The fact that this region is strongly activated in our visual and bimodal conditions could mean that our subjects extracted tactile and shape information from our stimuli. This is not surprising, as our visual stimuli are abstract round or spiky shapes whose geometrical representations are perhaps easier to access than those of other visual objects. What is more surprising is that we found a matching effect in this area. Our hypothesis for this effect is the following: when subjects see one of our shapes, they extract some abstract properties of this shape (e.g. its 3-D structure), activating LOC. When the shape is accompanied by a matching auditory stimulus, the information seems congruent and subjects still extract the properties of the shape, activating LOC. On the contrary, when the shape is accompanied by a mismatching auditory stimulus, sensitive subjects are confronted with an incongruent situation (as reflected by the frontal activations we found) that could prevent the extraction of shape properties and result in a significant decrease of activation. The question remains whether this conflict is due to top-down incongruency signals or to direct links between unisensory regions.
Studies have shown that direct links between auditory and visual cortices could exist without the need for specific multisensory regions (Kayser, 2010). If this region extracts shape and 3-D information from our visual stimuli, it is possible that the stimuli are treated as graspable objects, perhaps linking this activation to the premotor activations for shapes and to our "articulatory" hypothesis. This region should also be analyzed in the active experiment to look for such effects. To our knowledge, this is the first time that (in)congruency effects have been observed in LOC with fMRI, and they would need to be further tested, for example with congruent or incongruent visuo-tactile objects, which has never been done. Using EEG, Molholm et al. found a different response to congruent and incongruent audio-visual stimuli (animal pictures and vocalizations) taking place in right LOC (Molholm, Ritter, Javitt, & Foxe, 2004). They concluded that the "behavioral multisensory object-recognition effect was due to auditory influences on visual information processing, and that this likely occurred at the feature level of representation". This conclusion is very close to the hypothesis we make about our own activations.

Fig III.7: Summary figure of imaging results, with histograms showing the activation profile at the most significant voxel of each cluster.

IV. Conclusion

The goal of our study was to explore the bouba/kiki effect. The fact that people associate "bouba" with round shapes and "kiki" with spiky shapes has been known for many years and replicated by many studies, but its causes remain mostly unknown. First, we decided to confirm this effect with auditory and visual stimuli of our own design. This was important for at least three reasons: first, part of the evidence concerning the bouba/kiki effect was rather anecdotal, and we wanted to replicate it more systematically; second, we wanted to assess the respective roles of vowels and consonants in this effect; finally, as we intended to conduct further experiments on this effect, we needed to select pseudowords and shapes whose associations had been established. This experiment confirmed that pseudowords were non-randomly mapped onto shapes and showed for the first time that both vowels and consonants modulated this effect. We then checked whether these associations could affect subjects in tasks that did not require an explicit choice. We tried a variety of methods and found evidence that matching pairs of pseudowords and shapes could be processed faster and more accurately than mismatching pairs. We interpreted this finding as the result of cross-modal integration between auditory and visual stimuli. We developed two major hypotheses that could account for cross-modal integration between pseudowords and shapes: the "articulatory" theory and the "low-features" theory. According to the "articulatory" theory, this effect could be the consequence of links between speech perception and shape perception mediated by hand and mouth motor areas. The "low-features" theory asserts that shape-pseudoword associations are driven by links existing in the real world, such as the one between frequency and size. We decided to use brain imaging as a way to bring evidence for one of these theories. We expected that the "articulatory" hypothesis would predict activations in motor and premotor cortex sensitive to matching, whereas the "low-features" theory would predict activations in early sensory areas.
We conducted two imaging experiments: one where subjects passively attended to pseudowords, shapes, and matching or mismatching pairs of pseudowords and shapes, and one where they had to perform a speeded double classification task on matching or mismatching pairs of stimuli. We present here only the results of the passive experiment. We found a significant difference of activation between mismatching and matching pairs in frontal cortices, which could correspond to an incongruency effect. This result shows that subjects are sensitive to the bouba/kiki effect even during passive perception, but it does not specifically support either of the two hypotheses. Indeed, the frontal activations probably represent conflict between higher-order representations that could result from either "articulatory" or "low-features" information about the stimuli. Although we found activations for shapes and pseudowords in slightly different parts of the motor and premotor cortex, they were below threshold, and we did not find a differential response to matching and mismatching pairs in this region. We also observed increased activation for matching stimuli correlated with a behavioral score reflecting sensitivity to the effect in the active experiment. These activations were situated in a lateral occipital region that could correspond to an area identified as a metamodal operator processing shape information irrespective of the input modality. We are currently analyzing the active experiment and will pay particular attention to these two regions. These results raise more questions than they answer and do not really allow us to decide between our theories. As ours is one of the first studies of the bouba/kiki effect, we approached the subject in a rather exploratory fashion and had to devise ways to study it. Much remains to be done on the subject, and we think that behavioral experiments could help disentangle the candidate explanations. Using, for example, silent videos of articulated syllables, or pure tones varying in frequency and amplitude paired with shapes, would bring essential clues about what happens when sounds and shapes are associated. Like Ramachandran and Hubbard, we are convinced that explaining the bouba/kiki effect could bring important insight into theories about language and its origins.

V. Bibliography

Alsius, A., Navarra, J., Campbell, R., & Soto-Faraco, S. (2005). Audiovisual integration of speech falters under high attention demands. Current Biology, 15, 839-843.

Amedi, A., Malach, R., Hendler, T., Peled, S., & Zohary, E. (2001). Visuo-haptic object-related activation in the ventral visual pathway. Nature Neuroscience, 4(3), 324-330.

Amedi, A., Stern, W., Camprodon, J., Bermpohl, F., Merabet, L., Rotman, S., et al. (2007). Shape conveyed by visual-to-auditory sensory substitution activates the lateral occipital complex. Nature Neuroscience, 10(6), 687-689.

Amedi, A., von Kriegstein, K., van Atteveldt, N., Beauchamp, M., & Naumer, M. (2005). Functional imaging of human crossmodal identification and object recognition. Experimental Brain Research, 166(3-4), 559-571.

Beauchamp, M. (2005). See me, hear me, touch me: multisensory integration in lateral occipital-temporal cortex. Current Opinion in Neurobiology, 15(2), 145-153.

Beauchamp, M. (2010). Statistical criteria in fMRI studies of multisensory integration. Neuroinformatics.

Brown, R., Black, A., & Horowitz, A. (1955). Phonetic symbolism in natural languages.
Journal of Abnormal and Social Psychology, 50, 388-393.

Brown, S., Laird, A., Pfordresher, P., Thelen, S., Turkeltaub, P., & Liotti, M. (2009). The somatotopy of speech: phonation and articulation in the human motor cortex. Brain and Cognition, 70, 31-41.

Damasio, A., & Geschwind, N. (1984). The neural basis of language. Annual Review of Neuroscience, 7, 127-147.

Davis, R. (1961). The fitness of names to drawings: A cross-cultural study in Tanganyika. British Journal of Psychology, 52, 259-268.

Degerman, A., Rinne, T., Pekkola, J., Autti, T., Jääskeläinen, I., Sams, M., et al. (2007). Human brain activity associated with audiovisual perception and attention. NeuroImage, 34, 1683-1691.

Doehrmann, O., & Naumer, M. (2008). Semantics and the multisensory brain: how meaning modulates processes of audio-visual integration. Brain Research, 1242, 136-150.

Doehrmann, O., Weigelt, S., Altmann, C., Kaiser, J., & Naumer, M. (2010). Audiovisual functional magnetic resonance imaging adaptation reveals multisensory integration effects in object-related sensory cortices. Journal of Neuroscience, 30(9), 3370-3379.

Eitan, Z., & Timmers, R. (2010). Beethoven's last piano sonata and those who follow crocodiles: Cross-domain mappings of auditory pitch in a musical context. Cognition, 114(3), 405-422.

Farmer, T., Christiansen, M., & Monaghan, P. (2006). Phonological typicality influences on-line sentence comprehension. Proceedings of the National Academy of Sciences of the United States of America, 103(32), 12203-12208.

Fowler, C. A. (1986). An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics, 14, 3-28.

French, P. (1977). Toward an explanation of phonetic symbolism. Word, 28, 305-322.

Gentilucci, M. (2003). Grasp observation influences speech production. European Journal of Neuroscience, 17, 179-184.

Gentilucci, M., Campione, G., Dalla Volta, R., & Bernardis, P. (2009). The observation of manual grasp actions affects the control of speech: A combined behavioral and transcranial magnetic stimulation study. Neuropsychologia, 47(14), 3190-3202.

Gentilucci, M., Stefanini, S., Roy, A. C., & Santunione, P. (2004). Action observation and speech production: Study on children and adults. Neuropsychologia, 42, 1554-1567.

Gernsbacher, M., & Kaschak, M. (2003). Neuroimaging studies of language production and comprehension. Annual Review of Psychology, 54, 91-114.

Ghazanfar, A., & Schroeder, C. (2006). Is neocortex essentially multisensory? Trends in Cognitive Sciences, 10, 278-285.

Greenwald, A., McGhee, D., & Schwartz, J. L. K. (1998). Measuring individual differences in implicit cognition: the Implicit Association Test. Journal of Personality and Social Psychology, 74, 1464-1480.

Grèzes, J., & Decety, J. (2002). Does visual perception of object afford action? Evidence from a neuroimaging study. Neuropsychologia, 40(2), 212-222.

Grèzes, J., Tucker, M., Armony, J., Ellis, R., & Passingham, R. (2003). Objects automatically potentiate action: an fMRI study of implicit processing. European Journal of Neuroscience, 17(12), 2735-2740.

Hauk, O., Johnsrude, I., & Pulvermüller, F. (2004). Somatotopic representation of action words in human motor and premotor cortex. Neuron, 41, 301-307.

Holland, M., & Wertheimer, M. (1964). Some physiognomic aspects of naming, or maluma and takete revisited. Perceptual and Motor Skills, 19, 111-117.

Huang, Y., Pratoomraj, S., & Johnson, R. (1969). Universal magnitude symbolism.
Journal of Verbal Learning and Verbal Behavior, 8, 155-156.

Imai, M., & Okada, H. (2008). Sound symbolism facilitates early verb learning. Cognition.

James, T., Humphrey, G., Gati, J., Servos, P., Menon, R., & Goodale, M. (2002). Haptic study of three-dimensional objects activates extrastriate visual areas. Neuropsychologia, 40, 1706-1714.

Kayser, C. (2010). The multisensory nature of unisensory cortices: a puzzle continued. Neuron, 67(2), 178-180.

Kircher, T., Sass, K., Sachs, O., & Krach, S. (2009). Priming words with pictures: neural correlates of semantic associations in a cross-modal priming task using fMRI. Human Brain Mapping, 30, 4116-4128.

Klink, R. (2000). Creating brand names with meaning: the use of sound symbolism. Marketing Letters, 11, 5-20.

Köhler, W. (1947). Gestalt Psychology: An Introduction to New Concepts in Modern Psychology (2nd ed.). New York: Liveright.

Koriat, A., & Levy, I. (1977). The symbolic implications of vowels and of their orthographic representations in two natural languages. Journal of Psycholinguistic Research, 6.

Lacey, S., Tal, N., Amedi, A., & Sathian, K. (2009). A putative model of multisensory object representation. Brain Topography, 21(3-4), 269-274.

Laurienti, P., Burdette, J., Wallace, M., Yen, Y., Field, A., & Stein, B. (2002). Deactivation of sensory-specific cortex by cross-modal stimuli. Journal of Cognitive Neuroscience, 14, 420-429.

Laurienti, P., Kraft, R., Maldjian, J., Burdette, J., & Wallace, M. (2004). Semantic congruence is a critical factor in multisensory behavioral performance. Experimental Brain Research, 158(4), 405-414.

Lemus, L., Hernández, A., Luna, R., Zainos, A., & Romo, R. (2010). Do sensory cortices process more than one sensory modality during perceptual judgments? Neuron, 67, 335-348.

Liberman, A., & Mattingly, I. (1985). The motor theory of speech perception revised. Cognition, 21(1), 1-36.

Malach, R., Reppas, J., Benson, R., Kwong, K., Jiang, H., Kennedy, W., et al. (1995). Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex. Proceedings of the National Academy of Sciences of the United States of America, 92, 8135-8139.

Maurer, D., Pathman, T., & Mondloch, C. (2006). The shape of boubas: sound-shape correspondences in toddlers and adults. Developmental Science.

Molholm, S., Ritter, W., Javitt, D., & Foxe, J. (2004). Multisensory visual-auditory object recognition in humans: a high-density electrical mapping study. Cerebral Cortex, 14(4), 452-465.

Murata, A., Fadiga, L., Fogassi, L., Gallese, V., Raos, V., & Rizzolatti, G. (1997). Object representation in the ventral premotor cortex (area F5) of the monkey. Journal of Neurophysiology, 78(4), 2226-2230.

Naumer, M., Ratz, L., Yalachkov, Y., Polony, A., Doehrmann, O., Van De Ven, V., et al. (2010). Visuohaptic convergence in a corticocerebellar network. European Journal of Neuroscience, 31(10), 1730-1736.

Nichols, J., & Ohala, J. (Eds.) (1994). Sound Symbolism. New York: Cambridge University Press.

Noppeney, U., Josephs, O., Hocking, J., Price, C., & Friston, K. (2008). The effect of prior visual information on recognition of speech and sounds. Cerebral Cortex, 18(3), 598-609.

Nosek, B., Greenwald, A., & Banaji, M. (2003). Understanding and using the Implicit Association Test: II. Method variables and construct validity. Journal of Personality and Social Psychology, 85, 197-216.
Nygaard, L., Cook, A., & Namy, L. (2008). Sound symbolism in word learning.

Phelps, E., O'Connor, K., Cunningham, W., Funayama, E., Gatenby, J., Gore, J., et al. (2000). Performance on indirect measures of race evaluation predicts amygdala activation. Journal of Cognitive Neuroscience, 12, 729-738.

Pulvermüller, F., Huss, M., Kherif, F., Moscoso del Prado Martin, F., Hauk, O., & Shtyrov, Y. (2006). Motor cortex maps articulatory features of speech sounds. Proceedings of the National Academy of Sciences of the United States of America, 103(20), 7865-7870.

Ramachandran, V., & Hubbard, E. (2001). Synaesthesia - a window into perception, thought and language. Journal of Consciousness Studies, 8(12), 3-34.

Rizzolatti, G., & Arbib, M. (1998). Language within our grasp. Trends in Neurosciences, 21, 188-194.

Sapir, E. (1929). A study in phonetic symbolism. Journal of Experimental Psychology, 12, 225-239.

Skipper, J. I., Van Wassenhove, V., Nusbaum, H. C., & Small, S. L. (2007). Hearing lips and seeing voices: how cortical areas supporting speech production mediate audiovisual speech perception. Cerebral Cortex, 17, 2387-2399.

Spence, C. (2007). Audiovisual multisensory integration. Acoustical Science and Technology, 28.

Taylor, I., & Taylor, M. (1962). Phonetic symbolism in four unrelated languages. Canadian Journal of Psychology, 16, 344-356.

Van Atteveldt, N., Formisano, E., Goebel, R., & Blomert, L. (2004). Integration of letters and speech sounds in the human brain. Neuron, 43, 271-282.

Wernicke, C. (1874). Der aphasische Symptomencomplex. Eine psychologische Studie auf anatomischer Basis. Breslau, Germany: Cohn und Weigert.

Wertheimer, M. (1958). The relation between the sound of a word and its meaning. The American Journal of Psychology, 71, 412-415.

Westermann, G., & Reck Miranda, E. (2004). A new model of sensorimotor coupling in the development of speech. Brain and Language, 89(2), 393-400.

Wilson, S., Saygin, A., Sereno, M., & Iacoboni, M. (2004). Listening to speech activates motor areas involved in speech production. Nature Neuroscience, 7(7), 701-702.

Yuen, I., Davis, M., Brysbaert, M., & Rastle, K. (2010). Activation of articulatory information in speech perception. Proceedings of the National Academy of Sciences, 107(2), 592-597.

Zhang, M., Weisser, V., Stilla, R., Prather, S., & Sathian, K. (2004). Multisensory cortical processing of object shape and its relation to mental imagery. Cognitive, Affective, & Behavioral Neuroscience, 4, 251-259.

VI. Acknowledgements

I wish to thank the people who helped and supported me during this fascinating year:

Professor Laurent COHEN, for his patience, his availability, his wise advice and his kindness, which made this internship a true pleasure.

Dr Lionel NACCACHE, likewise for his kindness and his excellent ideas.

Jean-Rémi KING, for his support, his advice and his constant presence at my side.

Gabriel GARCIA, Fabien VINCKIER and Emilie QIAO, whose presentations at the Ecole de l'Inserm made me want to pursue a master's degree in cognitive science.

The members of the Neuropsychology and Neuroimaging laboratory, in particular Marcin SZWED, Felipe PEGADO, Claire SERGENT, Frédéric FAUGERAS and Kimihiro NAKAMURA.

The members of the Unicog laboratory, in particular Professor Stanislas DEHAENE for his advice, Christophe PALLIER for his technical clarifications, and Antoinette JOBERT, who showed great patience and skill in booking the MRI slots.

The whole Cogmaster team, for this very enriching program.
The whole Ecole de l'Inserm team, in particular Professor Jean-Claude CHOTTARD and Professor Philippe ASCHER, for their unfailing support.

My family, for their presence and affection.

My friends and my flatmate, for their support.

And finally, the many subjects who agreed to take part in my experiments.

VII. Annexes

1. Synaesthesia form

Each of the 18 questions below was answered on the scale: Not at all / A bit / A lot.

Does hearing spoken words give rise to colored sensations?
When you read words printed in black, do you have the impression of seeing these words in color?
Does hearing letters of the alphabet spoken aloud give rise to colored sensations?
When you read letters of the alphabet printed in black, do you have the impression of seeing these letters in color?
Does hearing numbers spoken aloud give rise to colored sensations?
When you read numbers printed in black, do you have the impression of seeing these numbers in color?
Do the days of the week or the months of the year evoke a particular color or hue for you?
Do you find that digits each have a particular personality?
When you think of numbers, do you picture them laid out in space in a specific way (for example in a circle, following a geometric shape, or any other arrangement)?
When you think of the days of the week, do you picture them laid out in space in a specific way (for example in a circle, following a geometric shape, or any other arrangement)?
When you think of the months of the year, do you picture them laid out in space in a specific way (for example in a circle, following a geometric shape, or any other arrangement)?
Does hearing certain noises make you perceive colors?
Does hearing certain noises make you perceive geometric or other shapes?
Does hearing music make you perceive colors?
If you see someone bump into or touch an object, do you feel the same sensation in your own body?
Do you have absolute pitch (when you hear a note, do you immediately know whether it is a C, a D, ...)?
Does feeling certain kinds of pain cause you to perceive particular sounds, shapes or colors?
Have you noticed in yourself other kinds of correspondences between sensations of different types (smells, textures, tastes, noises, words...)? If so, which ones?