The processing of emotional audiovisual stimuli

By: Mariska Koekkoek
Student number: 3670864
Utrecht University
Supervisors: MSc. C. van den Boomen, Prof. dr. C. Kemner
Second reviewer: Dr. M.J.H.M. Duchateau
Date: 17-08-2012

Index
Abstract
Introduction
The effect of bimodal stimuli on the recognition of emotions: behavioural studies
Processing audio-visual stimuli: EEG studies
Processing audio-visual stimuli: imaging studies
Discussion
References

Abstract

In a natural environment humans constantly receive a multitude of different signals, and more often than not these signals engage multiple senses at the same time. These signals make it possible to understand and interact with our environment. In order to express ourselves and communicate with others we use facial expressions and different tones of voice. Correctly and swiftly identifying these signals is an important aspect of interaction. Because both auditory and visual cues are used during interaction, it is important to investigate not only the processing of these signals separately, but also the interaction between the processing of these signals. A number of studies have investigated the difference between the processing of uni- and multimodal emotional and non-emotional stimuli, and several models have been proposed to describe the interaction between multimodal stimuli. Here the results found regarding the interaction of emotional and non-emotional audio-visual stimuli on several different levels, such as behavior and brain activity, will be discussed, as well as the evidence supporting the different models pertaining to this interaction. In conclusion, both emotional and non-emotional audio-visual stimuli show interaction during sensory processing. Moreover, the processing of emotional bimodal stimuli seems to depend in part on the emotion.

Introduction

In our everyday life we constantly receive various kinds of signals from our environment. These signals make it possible to interact with our surroundings and with other beings living in this environment. However, signals are often perceived by multiple senses at the same time, e.g. sight and hearing, and all of these signals are then used to compose a picture of our surroundings. The processing of these signals has been studied extensively, although many of these studies focused on a single modality instead of multiple modalities. In a natural environment, however, more than one sense often receives information at the same time. Investigation of the combination of multiple modalities is therefore important. This has led to the discovery of interesting phenomena.
For example, hearing and seeing someone speak at the same time may lead to the perception of a sound that is formed by neither the auditory nor the visual aspect alone (McGurk and MacDonald, 1976). Additional evidence that the integration of sound and image can have an illusory effect comes from Shams et al. (2001), who showed that when one flash of light is accompanied by two bursts of noise, the single flash is perceived as two separate flashes. Furthermore, visual cues may draw attention towards a possible location of an auditory cue, but only under the condition that the auditory cue is difficult to locate (Spence and Driver, 2000; Vroomen et al., 2001a). Aside from illusory effects, both seeing and hearing speech can also help in the understanding of what is spoken (Grant, 2001; Grant and Seitz, 2000; Schwartz et al., 2004). All these examples illustrate that there seems to be an interaction in the processing of auditory and visual stimuli. However, these studies primarily used very general and biologically irrelevant stimuli. As humans we interact daily with other people. An important aspect of these interactions is the emotion expressed by others through, for example, the face or voice. The ability to accurately and swiftly identify these emotions in others is therefore very important. Early research into the recognition of emotional facial expressions found that basic expressions are recognized as portraying a particular emotion across cultures. Ekman and Friesen (1969) conducted an experiment to investigate whether emotional faces are recognized across cultures regardless of prior experience or lack of interaction between these cultures. The results showed that facial expressions are recognized as portraying a particular emotion and could therefore be considered universal. However, the authors acknowledged the possibility that subjects were able to recognize these expressions because they had all been extensively exposed to the same mass-media depictions of the facial expressions, which may have led them to recognize the expressions. In a follow-up study to investigate this possibility, Ekman and Friesen (1971) showed that cultures that have been minimally exposed to mass media are able to recognize basic emotions such as fear, anger, sadness, disgust, happiness and surprise from other cultures, just as literate cultures are. These results indicate that it is very likely that the recognition of emotions, an important aspect of human interaction, is universal. Apart from facial expressions, emotions can be expressed by the voice as well. It has been found that some emotions are easier to recognize from cues in either facial expressions or tone of voice (de Silva et al., 1997). For example, happy emotions are quite easy to recognize from facial expressions (de Gelder et al., 1998), but quite difficult to recognize from the tone of voice (Vroomen et al., 1993). When investigating whether vocally expressed emotions are recognized across cultures like facial expressions are, similar results have been found. Participants from different cultures were able to discriminate emotions from sentences spoken in foreign languages well above chance level (Pell et al., 2009; Scherer et al., 2001; Thompson and Balkwill, 2006). However, some emotions such as anger and sadness were easier to recognize in foreign languages than other emotions such as fear and joy (Pell et al., 2009; Thompson and Balkwill, 2006), indicating again that some emotions are easier to recognize than others.
These results show that not only are emotional expressions universal, emotions in the voice also seem to be easily recognized across cultures. In short, emotions are easily recognized across cultures regardless of whether they are expressed through faces or voices. The integration of audio-visual stimuli, however, seems to lead to different effects. It is therefore interesting to know the effect of emotional audio-visual integration. In this paper the effect of emotional audio-visual interaction on behavior and brain activity will be investigated. To this aim emotional audio-visual stimuli will be compared to unimodal emotional stimuli as well as to non-emotional audio-visual stimuli. Behavioral, EEG and imaging studies will be reviewed in order to get a full understanding of the effect of bimodal stimuli on behavior and brain activity compared to unimodal stimuli. In order to determine whether integration of the processing of audio-visual stimuli occurs, which seems to be suggested by the phenomena that arise when both auditory and visual stimuli are presented, two different models will be reviewed, which will be explained momentarily. Furthermore, the results of the different studies will be analyzed to determine the nature of this interaction.

The effect of bimodal stimuli on the recognition of emotions: behavioural studies

It is a well-known phenomenon that the simultaneous presentation of auditory and visual stimuli leads to shorter reaction times compared to visual and auditory stimuli presented separately. Additionally, it also leads to higher accuracy in the identification of said stimuli (Giard and Peronnet, 1999; Molholm et al., 2002, 2004; Teder-Sälejärvi et al., 2002; Teder-Sälejärvi et al., 2005). The reaction to bimodal (redundant) targets is often swifter than the reaction to unimodal targets. To explain this effect multiple models have been suggested. One of these models is the race-model. This model suggests that the input from both sources is processed simultaneously, that the first stimulus to be processed determines the reaction time (RT), and that no interaction occurs. The probability of obtaining an RT faster than a given time is higher for bimodal targets compared to unimodal targets (Gondan et al., 2005). However, there is a limit to the redundancy gain that can be explained by this effect, as expressed by Miller's race model inequality (Miller, 1982). Violation of this inequality, for example a redundancy gain larger than the inequality allows, leads to the rejection of separate activation models such as the race-model. In general, studies investigating the redundant target effect examine whether the RTs in response to bimodal and unimodal stimuli exceed the inequality, in order to determine whether the race-model is adequate to explain the difference in RT between bi- and unimodal stimuli. If it is found inadequate, the model is rejected in favor of another model such as a co-activation model, which suggests that information from both sources is integrated, which would explain the excessive redundancy gain (Gondan et al., 2005). Adopting either the race-model or a co-activation model leads to a different assumption about the nature of the processing of audio-visual stimuli. The race-model suggests that there is no neural interaction in the processing of the stimuli, while the co-activation model suggests there is. However, the mere failure to violate the race-model does not prove that an interaction does not occur.
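The race model inequality referred to above states that, for every latency t, the cumulative RT distribution in the bimodal condition may not exceed the sum of the unimodal cumulative distributions: P(RT ≤ t | AV) ≤ P(RT ≤ t | A) + P(RT ≤ t | V). Below is a minimal sketch of such a test on simulated reaction times; the data, variable names and latency grid are hypothetical and serve only to illustrate the comparison, not to reproduce any of the cited analyses.

```python
import numpy as np

def ecdf(rts, t_grid):
    """Empirical cumulative RT distribution evaluated on a grid of latencies (ms)."""
    rts = np.sort(np.asarray(rts))
    return np.searchsorted(rts, t_grid, side="right") / rts.size

# Hypothetical single-subject reaction times (ms), for illustration only.
rng = np.random.default_rng(0)
rt_auditory    = rng.normal(420, 60, 200)
rt_visual      = rng.normal(400, 60, 200)
rt_audiovisual = rng.normal(340, 50, 200)   # redundant (bimodal) targets

t_grid = np.arange(200, 701, 10)            # latencies at which the CDFs are compared
p_av   = ecdf(rt_audiovisual, t_grid)
bound  = np.minimum(ecdf(rt_auditory, t_grid) + ecdf(rt_visual, t_grid), 1.0)

# The race model is violated wherever the bimodal CDF exceeds Miller's bound.
violation = p_av - bound
if np.any(violation > 0):
    t_viol = t_grid[violation > 0]
    print(f"Race model inequality violated between {t_viol.min()} and {t_viol.max()} ms "
          f"(maximum violation {violation.max():.3f})")
else:
    print("No violation: the redundancy gain can still be explained by probability summation.")
```

In the reviewed studies such tests are typically carried out per participant across quantiles of the RT distribution, and a reliable violation at any latency is taken as evidence for co-activation.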
A number of studies have used the inequality to investigate the redundancy gain of both emotional and non-emotional audio-visual stimuli (Collignon et al., 2008, 2010; Corballis, 1998, 2002; de Gelder and Vroomen, 2000; Massaro and Egan, 1996; Marzi et al., 1996; Molholm et al., 2002, 2006; Murray et al., 2001; Teder-Sälejärvi et al., 2005). And although some studies found the race-model to be sufficient to explain the redundancy gain (Corballis, 1998, 2002; Murray et al., 2001), a large number of studies found a violation of the inequality when using both non-emotional and emotional stimuli (de Gelder and Vroomen, 2000; Gielen et al., 1983; Gondan et al., 2005; Massaro and Egan, 1996; Miller, 1982; Schröger and Widmann, 1998). The redundancy gain of multimodal stimuli exceeds the probability summation of the race-model, suggesting that there is an interaction in the processing of both stimuli. This leads to the rejection of this model in favor of a co-activation model. However, it has been argued that the effect of combined modalities might in some cases be due to the instructions given, which lead subjects to pay closer attention to both modalities. In order to rule out this explanation, de Gelder and Vroomen (2000) used a test in which subjects were instructed to pay attention to only one modality and to ignore the input of the other. Should the effect still be present under these conditions, it would suggest that one modality is affected by input from the other, regardless of attentional bias. The results showed that this was indeed the case: the effect found when subjects were instructed to pay attention to only one of the modalities was smaller, but still present. This indicates that the influence of input from a modality to which one pays no attention is involuntary. This cross-modal interaction is even present when participants' attention is further diverted to a completely unrelated task (Vroomen et al., 2001b). It was therefore suggested that cross-modal interaction is not only present but also automatic. Besides the effect bimodal stimuli have on the processing of either modality, when using emotional bimodal stimuli other factors may influence RT as well, such as congruency. Congruent emotional face-voice pairs are recognized more swiftly than incongruent pairs (de Gelder and Vroomen, 2000; Massaro and Egan, 1996), comparable to the effect found for congruent and incongruent spatial distributions of visual and auditory stimuli, in which spatially congruent multimodal stimuli lead to shorter reaction times than incongruently distributed stimuli (Miller, 1991). Massaro and Egan (1996) and de Gelder and Vroomen (2000) thus found similar results with regard to reaction time and the congruency effect. In summary, results from multiple behavioral studies show that there is an interaction in the processing of auditory and visual stimuli, which results in swifter reaction times and more accurate recognition of stimuli. The same effect has been found in response to non-emotional and emotional audio-visual stimuli. Furthermore, a congruency effect has been found in which congruent emotions are recognized more easily than incongruent emotions.

Processing audio-visual stimuli: EEG studies

Behavioral studies show that there appears to be an interaction in the processing of auditory and visual stimuli when they are presented simultaneously.
Nevertheless this does not tell us much about the effects of these stimuli on activity in different brain areas, or how these areas affect one another, only what the result of the integration of the stimuli is in terms of behavior. In order to investigate the actual processing of these integrated stimuli within the brain, one can for example measure brain activity using electroencephalography (EEG) to record early event-related potentials (ERPs). The advantage of EEG is that it has a high temporal resolution. This means that with ERPs one can detect changes over milliseconds, as opposed to other methods for measuring brain activity such as fMRI and PET, which have a much coarser temporal resolution. This makes ERPs particularly suited to investigate early activity and small differences in the latency of ERPs evoked by different types of stimuli. Several studies have investigated brain activity in regard to multimodal interaction using this method, with both non-emotional and emotional stimuli. For example, ERPs evoked by non-emotional audio-visual stimuli often display a difference in the amplitude of modality-specific peaks compared to auditory and visual stimuli alone, even for very early potentials. Furthermore, the processing of audio-visual stimuli evokes components which are not found when using unimodal auditory or visual stimuli (Giard and Peronnet, 1999; Molholm et al., 2002, 2004). For example, several studies have shown an activation over parieto-occipital channels, 40-90 ms after stimulus onset, in the audio-visual condition, with no similar activation in the unimodal conditions (Giard and Peronnet, 1999; Fort et al., 2002). The same studies showed that a component primarily evoked by auditory stimuli, the auditory N1, was enhanced in the multimodal condition compared to the unimodal auditory condition. Similar results were reported by Molholm et al. (2002), who also found an early activation in parieto-occipital regions which was enhanced in the bimodal condition. Additionally, primarily visually evoked components (visual P1 and N1) were inhibited in the audio-visual condition compared to the unimodal conditions, whereas primarily auditory components (auditory N1 and P2) were enhanced in the audio-visual condition. Moreover, the scalp region of activation of the visual N1 was modulated in the bimodal condition, with the N1 peaking more posteriorly compared to the visual condition (Molholm et al., 2004). In summary, these studies indicate that non-emotional audio-visual stimuli show an early interaction during sensory processing, as indicated by the unique early ERP components in audio-visual conditions and the modulation of modality-specific components. A number of ERP studies have also investigated the processing of emotional stimuli (de Gelder et al., 1999; Magnée et al., 2008; Pourtois et al., 2000, 2002), again showing a modulation of the amplitude of the ERPs by combined stimuli, as well as a modulation of the latency of the evoked peaks. Several different components have been examined in order to determine whether components evoked by one modality are influenced by information received from a second modality, for example the auditory N1 and P2b (Pourtois et al., 2000, 2002) and the visual N2 (Magnée et al., 2008). The amplitude of the auditory N1 was enhanced in the emotional audio-visual condition compared to the unimodal conditions.
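Such bimodal-versus-unimodal comparisons are often operationalized by contrasting the ERP to the audio-visual stimulus with the sum of the ERPs to the auditory and visual stimuli presented alone, as in the additive-model approach used in several of these studies (e.g. Giard and Peronnet, 1999). Below is a minimal sketch of that comparison on simulated single-trial data; the sampling rate, analysis window and array names are illustrative assumptions, not values taken from the reviewed studies.

```python
import numpy as np

fs = 500                                  # assumed sampling rate (Hz)
times = np.arange(-0.1, 0.5, 1 / fs)      # epoch from -100 to 500 ms

def average_erp(epochs):
    """Average single-trial epochs (trials x samples) into an ERP."""
    return np.asarray(epochs).mean(axis=0)

# Hypothetical single-channel epochs per condition (trials x samples), filled with
# random data purely for illustration.
rng = np.random.default_rng(1)
epochs = {cond: rng.normal(0.0, 1.0, (120, times.size))
          for cond in ("auditory", "visual", "audiovisual")}

erp_a  = average_erp(epochs["auditory"])
erp_v  = average_erp(epochs["visual"])
erp_av = average_erp(epochs["audiovisual"])

# Additive model: any deviation of AV from (A + V) is taken as a multisensory interaction.
interaction = erp_av - (erp_a + erp_v)

# Summarize the effect in an early window (40-90 ms), matching the latencies reported
# for the earliest audio-visual effects in the reviewed studies.
window = (times >= 0.040) & (times <= 0.090)
print(f"Mean AV - (A + V) amplitude in the 40-90 ms window: {interaction[window].mean():.3f} µV")
```

In practice this difference wave is computed per electrode and tested against zero across participants, which is how the early parieto-occipital effects described above are typically assessed.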
Moreover, this N1 enhancement was not replicated when inverted faces were used, indicating that the processing of facial expressions is required for the enhancement of the N1 component in audio-visual conditions. In addition, when using emotional stimuli, components may also be influenced by the congruity of the emotions: congruent emotional face-voice pairs may affect components differently than incongruent pairs. It was found that congruent face-voice pairs elicited an earlier P2b component compared to incongruent face-voice pairs. In accordance with the auditory components, the visual N2 shows similar results. This component is affected by emotional sound, and it is modulated differently by different emotional voices, displaying a larger amplitude for fearful voices compared to happy voices. Moreover, this peak displays a larger amplitude for congruent fearful face-voice pairs, whereas neither congruent nor incongruent face-voice pairs led to an enhanced amplitude in the happy voice condition. In contrast, several studies have found a slightly different effect with regard to congruency. For example, several studies revealed that when comparing fearful and happy audio-visual stimuli, incongruent stimuli evoke higher activity than congruent stimuli (Magnée et al., 2011; Paulmann and Pell, 2010; Grossman et al., 2006). Additionally, it was found that this congruency effect depended on the length of the auditory stimuli, with shorter auditory fragments evoking larger activity in congruent conditions and longer fragments evoking larger activity in incongruent conditions (Paulmann and Pell, 2010). In addition to these auditory and visual components, de Gelder et al. (1999) examined the mismatch negativity (MMN). The MMN is often examined in auditory studies as a method to investigate the processing of auditory stimuli. When a series of similar auditory stimuli is interrupted by a deviant auditory stimulus, this deviant elicits the MMN, and the amplitude of this brainwave increases in proportion to the difference between the standard and the deviant stimuli. The MMN was long thought to be sensitive to only one modality, but more recent studies have shown that, aside from the processing of auditory stimuli, it can also reflect the processing of cross-modal stimuli. De Gelder et al. (1999) revealed that a series of emotionally congruent audio-visual stimuli (an angry voice combined with an angry face) followed by an incongruent pair (an angry voice combined with a sad face) elicits an MMN. Additionally, the results were replicated when a series of incongruent face-voice pairs was followed by a congruent pair. As mentioned, the MMN is thought to primarily reflect auditory processing. In recent years an analogous component evoked by visual stimuli has been found, the vMMN (Pazo-Alvarez et al., 2003). This component shows enhanced activation to incongruent visual stimuli interrupting a series of similar visual stimuli. The vMMN has been used to investigate the effect of different facial expressions. In a recent study, Zhao and Li (2006) investigated a mismatch negativity evoked by facial expressions while participants focused on another task. Participants were asked to determine the pitch of a series of tones while they were simultaneously shown a series of neutral faces which was interrupted by either sad or happy faces. It was shown that the vMMN evoked by sad faces was more negative than that evoked by happy faces. It was concluded that facial expressions can elicit a vMMN and that different expressions evoke vMMNs of different amplitude.
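The (v)MMN analyses described above amount to contrasting the response to rare deviant stimuli with the response to frequent standards in an oddball sequence. A minimal sketch of such an analysis on simulated epochs is given below; the deviant proportion, time window and variable names are illustrative assumptions rather than parameters from the cited experiments.

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 500
times = np.arange(-0.1, 0.5, 1 / fs)

# Build a hypothetical oddball sequence: mostly standards (e.g. neutral faces) with
# ~15% deviants (e.g. sad faces), never presenting two deviants in a row.
labels = []
for _ in range(400):
    if labels and labels[-1] == "deviant":
        labels.append("standard")
    else:
        labels.append("deviant" if rng.random() < 0.15 else "standard")

# Hypothetical single-channel epochs (one per trial), random data for illustration only.
epochs = {lab: [] for lab in ("standard", "deviant")}
for lab in labels:
    epochs[lab].append(rng.normal(0.0, 1.0, times.size))

erp_standard = np.mean(epochs["standard"], axis=0)
erp_deviant  = np.mean(epochs["deviant"], axis=0)

# The (v)MMN is the deviant-minus-standard difference wave; its amplitude is usually
# summarized in a window roughly 150-350 ms after stimulus onset.
difference = erp_deviant - erp_standard
window = (times >= 0.15) & (times <= 0.35)
print(f"{labels.count('deviant')} deviants out of {len(labels)} trials; "
      f"mean difference-wave amplitude in window: {difference[window].mean():.3f} µV")
```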
In accordance with the results found with non-emotional audio-visual stimuli, these studies show a modulation of early ERP components, both auditory and visual. Furthermore, the MMN evoked by emotional visual stimuli shows that the processing of emotional auditory stimuli is affected by the processing of simultaneously presented emotional visual stimuli, and vice versa. Further analysis of the results shows that the modulation of early ERP components is quite similar in emotional and non-emotional conditions, for example an enhanced amplitude of auditory components in audio-visual conditions. However, the investigation of emotional audio-visual integration also shows that the modulation of the components may depend on the type of emotion presented by the stimuli, with some components showing a larger amplitude for particular emotions. Although these studies give us a good understanding of the interaction of audio-visual stimuli on a fine temporal scale, EEG has a low spatial resolution and is not well suited to localize activity evoked by stimuli across large areas. Other methods such as tract tracing and neuroimaging may clarify the interaction between specific auditory and visual regions in the brain. As opposed to EEG, imaging techniques have a low temporal resolution but a high spatial resolution. These techniques are therefore useful to investigate the location of activation in response to stimuli.

Processing audio-visual stimuli: imaging studies

It is generally accepted that various kinds of sensory input are processed in sensory-specific cortices such as the auditory, tactile and visual cortex. These cortices are hierarchical, with connectivity between lower and higher levels. Furthermore, traditional models indicated that specific sensory systems are linked by association cortices which are activated during the integration of multisensory input. This leads to a bottom-up model which states that different kinds of sensory information converge in specific higher-order multimodal areas. The bottom-up model assumes that sensory-specific stimuli are first processed separately in modality-specific brain regions and are subsequently integrated in higher-order multimodal brain areas via feed-forward projections. More recently, however, evidence has been found supporting the top-down model, which states that these multimodal areas project back to sensory-specific areas through feedback. Both models have been supported by subsequent anatomical and imaging studies using both emotional and non-emotional stimuli. However, as seen in the EEG studies, the processing of audio-visual stimuli may be different for emotional and non-emotional stimuli. In order to get a full understanding of the processing of emotional audio-visual stimuli it is important to compare it not only to unimodal stimuli, but to non-emotional stimuli as well. The next part will therefore first review studies investigating non-emotional audio-visual stimuli. A number of studies have found that brain regions which are thought to be unimodal, and should therefore be activated by a single modality, are actually activated by other modalities as well (Calvert et al., 2002; Finney et al., 2001; Sadato et al., 1996). For example, Calvert et al. (2002) showed that lipreading activated the auditory cortex even in the absence of sound.
However, this was only the case if linguistic facial movements were shown; if non-linguistic facial movements were viewed, no activation in the auditory cortex was found. Further evidence of activation of the auditory cortex by visual stimuli comes from Finney et al. (2001), who examined the activation of the auditory cortex by visual stimuli in deaf subjects. It was found that a non-linguistic moving visual object elicited activation in the auditory cortex. Both of these studies indicate that the auditory cortex can be activated by both language-related and non-language-related visual stimuli. In addition, other sensory-specific areas have been found to show activation in response to information from other senses. Sadato et al. (1996) showed that the visual cortex can be activated by tactile input during Braille reading in blind people, in both linguistic and non-linguistic tasks. Furthermore, the loss of one sensory modality may also lead to a rearrangement of the connectivity between modalities. For example, Weeks et al. (2000) revealed that the loss of sight leads to a change in connectivity in the right posterior parietal cortex, which participates in the localization of sound in congenitally blind individuals. In sum, these studies show that primarily unimodal cortices may be activated by sensory input from other senses. Brain imaging and tracing have also been used extensively to determine which brain regions show activation when a person is presented with bimodal stimuli, using both emotional and non-emotional bimodal stimuli. Furthermore, several studies have investigated the connectivity between auditory and visual regions in several different species presented with non-emotional stimuli, using both imaging and tracing techniques. These studies revealed connections between a great number of regions (Table 1). For example, it was revealed that primary auditory sites project to both auditory and visual brain areas (Eckert et al., 2008), a result further supported by a tracing study by Rockland and Ojima (2003). Similar results were reported by Falchier et al. (2002) and Cappe and Barone (2005), who both showed connectivity between visual regions and auditory regions. In summary, these studies revealed connections both within and between sensory-specific brain regions. Further investigation into brain activity evoked by non-emotional audio-visual stimuli compared to unimodal stimuli revealed that non-emotional bimodal stimuli evoke enhanced activity in several different areas. For example, Calvert et al. (2000) found enhanced activation in the primary and secondary auditory cortex as well as in visual motion cortex. And although some slight differences in specific activation sites were revealed, these results were repeated by Scheef et al. (2009), indicating quite strongly that these brain areas are involved in the processing of non-emotional bimodal stimuli (Table 2). Moreover, to determine the processing of bimodal emotional stimuli, several studies have investigated brain activity when participants are presented with emotional audio-visual stimuli. A number of studies have found that emotional bimodal stimuli evoke enhanced activity compared to neutral bimodal stimuli in the auditory cortex (Ethofer et al., 2006; Robins et al., 2009). Further analysis, however, showed that different emotions activated specific regions (Table 3). In addition, a number of studies have compared bimodal emotional stimuli to unimodal stimuli.
These studies have found, among others, activation in auditory and multimodal brain areas (Table 4). Further analysis again showed a differential enhancement for different emotions.

Table 1: Anatomical regions showing connectivity within and between sensory-specific modalities (inter-/intramodal) across studies (Eckert et al., 2008; Rockland and Ojima, 2003; Falchier et al., 2002; Cappe and Barone, 2005).
Projection sites reported: A1; MT+; left V1; right V1; parietal lobule; caudolateral belt; middle lateral belt; calcarine site; peripheral V1; visual cortex.
Receiving sites reported: left and right superior temporal gyrus; medial geniculate nucleus; anterior calcarine cluster; left and right anterior calcarine fissure; posterior thalamic nuclei; lateral occipito-temporal cortex; Heschl's gyrus; MT+; lateral geniculate nucleus; calcarine fissure; dorsal V2/V1; V2/V1; V2; A1; STP; auditory cortex.
The reported connections include both intramodal and intermodal projections.

Table 2: Areas of enhanced activation in the non-emotional audio-visual condition compared to unimodal conditions across studies (Calvert et al., 2000: congruent bimodal versus unimodal; Scheef et al., 2009: bimodal versus unimodal).
Areas of enhanced activation: right fusiform gyrus; right middle occipital gyrus; left middle frontal gyrus; right inferior parietal lobule; right occipito-temporal gyrus; superior temporal sulcus; left Heschl's gyrus; left cuneus; middle and inferior occipital lobe; middle and superior temporal lobe; medial and superior frontal lobe; right posterior lobe of the cerebellum; fusiform gyrus; cingulate gyrus.

Table 3: Areas of enhanced activity in the emotional audio-visual condition compared to the neutral audio-visual condition across studies (Robins et al., 2009: angry, fearful and happy, analysed combined and per emotion; Kreifelts et al., 2007: alluring, angry, disgusted, fearful, happy and sad).
Areas of enhanced activity: right anterior superior temporal gyrus; left fusiform gyrus; left superior temporal gyrus; lateral fissure; left superior frontal gyrus; right superior temporal gyrus; right fusiform gyrus; left anterior lateral fissure; right superior frontal gyrus; posterior thalamus; posterior superior temporal gyrus; right thalamus.

Table 4: Areas of enhanced activation in emotional bimodal conditions compared to unimodal conditions across studies (Robins et al., 2009: angry, fearful and happy, and fearful separately; Pourtois et al., 2005: fearful and happy, and each emotion separately; Kreifelts et al., 2007: alluring, angry, disgusted, fearful, happy, sad and neutral, with additional analyses on subsets of these emotions).
Areas of enhanced activity: posterior superior temporal sulcus; left superior temporal sulcus; right posterior superior temporal sulcus; right temporal pole; left anterior superior temporal sulcus; left middle temporal gyrus; fusiform gyrus; posterior middle temporal gyrus; superior frontal gyrus; middle frontal gyrus; inferior parietal lobule; left posterior superior temporal gyrus; right posterior superior temporal gyrus; right thalamus.
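The contrasts summarized in Tables 2-4 (bimodal versus unimodal, emotional versus neutral) are in fMRI typically implemented as linear contrasts over condition regressors in a general linear model. The sketch below illustrates that logic on simulated voxel data with ordinary least squares; the design, condition names and contrast weights are illustrative assumptions and do not reproduce any of the cited analyses.

```python
import numpy as np

rng = np.random.default_rng(3)
n_scans = 200

# Hypothetical design matrix: one indicator regressor per condition plus an intercept.
conditions = ["audio", "visual", "audiovisual"]
X = np.column_stack([rng.integers(0, 2, n_scans) for _ in conditions] + [np.ones(n_scans)])

# Hypothetical time series of a single voxel (e.g. in superior temporal cortex).
y = rng.normal(0.0, 1.0, n_scans)

# Ordinary least squares fit of the GLM.
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# Contrast testing whether the audio-visual response exceeds the mean unimodal response:
# AV - 0.5*(A + V). A "max criterion" contrast such as AV > max(A, V) is a stricter variant.
c = np.array([-0.5, -0.5, 1.0, 0.0])
effect = c @ beta

residuals = y - X @ beta
dof = n_scans - np.linalg.matrix_rank(X)
sigma2 = residuals @ residuals / dof
t_value = effect / np.sqrt(sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c))
print(f"Contrast estimate {effect:.3f}, t = {t_value:.2f} (df = {dof})")
```

The emotional-versus-neutral comparisons in Tables 3 and 4 follow the same scheme, only with one regressor per emotion condition and contrast weights chosen accordingly.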
In addition, the effect of congruent compared to incongruent non-emotional and emotional audio-visual stimuli on brain activity has also been studied (Dolan et al., 2001; Jones and Callan, 2003). Jones and Callan (2003) found higher activation in the right supramarginal gyrus and the left inferior parietal lobule during the presentation of non-emotional incongruent stimuli compared to congruent stimuli. In addition, incongruent stimuli resulted in more activation in the right precentral gyrus. Dolan et al. (2001) studied the congruency effect of emotional bimodal stimuli. Enhanced activity was found in, for example, the left amygdala and the right fusiform cortex for fearful congruent conditions. Happy congruent conditions, however, elicited enhanced activity in different brain regions, demonstrating again a differentiation in the activity evoked by different emotions. Furthermore, in order to understand the integration of emotional bimodal stimuli, Kreifelts et al. (2007) investigated the connectivity between specific unimodal brain regions. Increased connectivity was found between the right and left posterior superior temporal sulcus, as well as the right thalamus and several other structures such as the fusiform gyrus. These results, combined with the results of the previously mentioned studies, indicate that the superior temporal sulcus and gyrus may play a major role in the processing of bimodal emotional stimuli. Furthermore, the results also show that, besides these areas, each emotion elicits enhancement in specific brain regions.

In summary, like EEG studies, imaging studies show that emotional audio-visual stimuli lead to a modulation of brain activity compared to unimodal stimuli, although, compared to non-emotional bimodal stimuli, emotional stimuli may have a stronger or a comparable effect on brain activity depending on the type of emotion.

Discussion

This paper reviews the literature on the effect of emotional audio-visual stimuli compared to unimodal emotional stimuli and bimodal neutral stimuli. A great number of studies have found differences in behavior and brain activity in response to emotional and non-emotional bimodal versus unimodal stimulation. The comparison of emotional bimodal and unimodal stimuli shows differences in both behavior and brain activity. And although there are some differences in the processing of emotional bimodal stimuli compared to neutral bimodal stimuli, e.g. different emotions showing enhanced activity in different brain areas, there are also several similarities. For example, both evoke enhanced activity in sensory-specific brain areas in multimodal compared to unimodal conditions. In addition, both emotional and non-emotional multimodal stimuli evoke activity in multimodal regions. One major point of interest in behavioral studies comparing unimodal and bimodal stimuli is the difference in RT. In general, multimodal stimulation leads to a decrease in RT in comparison to unimodal stimulation. Two types of models are often used to explain this result: the race-model and co-activation models. As stated before, as long as the redundancy gain does not exceed the bound given by probability summation, and thus does not violate the inequality, the race-model is considered sufficient to explain the facilitation of RT. In this case there is no reason to assume integration of bimodal sensory information. In contrast, if the redundancy gain does violate the inequality, the race-model is rejected and it is assumed that integration of bimodal input occurs, as is the case in co-activation models.
For example, the race-model has been found to be sufficient to explain the difference in RT in several studies examining visual stimuli presented in the left or right visual field or in both visual fields (Corballis, 1998, 2002; Murray et al., 2001). Corballis (1998, 2002) found a faster response to presentation in both visual fields; however, this facilitation did not exceed the redundancy gain predicted by probability summation, and the race-model was not rejected. In contrast, other studies have found a violation of the race-model. These studies concluded that the race-model was insufficient to explain the facilitation of reaction time and that this facilitation was more likely a result of co-activation of different brain regions instead of simply a result of the fastest processing of one of the presented stimuli (Collignon et al., 2008, 2010; Marzi et al., 1996; Molholm et al., 2002, 2006; Teder-Sälejärvi et al., 2005). In short, a violation of the inequality is found in a number of studies, which indicates an interaction in the processing of auditory and visual stimuli. And although some studies have also found the race-model to be sufficient to explain the RT facilitation, this in itself does not exclude the possibility of interaction between unimodal brain areas, nor does it exclude interaction in the processing of audio-visual stimuli. In short, there is no evidence in behavioral studies to assume that there is no interaction in the processing of audio-visual stimuli, and quite a few results suggest that there is. Moreover, the interaction of audio-visual stimuli is supported not only by behavioral studies, but also by EEG and imaging studies. This is shown by the modulation of ERPs by emotional bimodal stimuli and by the enhanced brain activity revealed by imaging studies in comparison to both emotional unimodal and non-emotional bimodal stimulation. Again, when comparing emotional stimuli with non-emotional stimuli there is very little difference overall; both result in a modulation of modality-specific ERP components and an enhancement of activity in modality-specific and multimodal brain areas. Unlike behavioral studies, however, both EEG and imaging studies also show that the processing of emotional audio-visual stimuli is somewhat more complex than that of non-emotional audio-visual stimuli. Different emotions tend to have a different effect on the modulation of ERP components, i.e. some have little to no effect while others have a very strong effect. Additionally, different emotions tend to elicit enhanced activity in different brain areas. Despite these slight differences between emotional and non-emotional studies, behavioral and brain activity studies do indicate an interaction in the processing of audio-visual stimuli. This interaction might be realized in different ways, and two models are often used to explain it. The first is the bottom-up model, in which it is assumed that sensory-specific stimuli are first processed separately in modality-specific brain regions and are subsequently integrated in higher-order multimodal brain areas via feed-forward projections. The other is the top-down model, which assumes that multisensory stimuli are processed in multimodal brain areas, which then project back to sensory-specific unimodal areas. This model emphasizes the importance of feedback connections from multimodal to unimodal brain areas. Traditionally the processing of multisensory stimuli has been thought to be a bottom-up process, driven by feed-forward projections.
It has been investigated extensively in imaging and tracing studies in animals, as well as in humans using non-invasive methods. These studies of the neural processing of multimodal stimuli revealed brain areas responding to multiple modalities. These areas were found to receive input from modality-specific regions, which is in line with the bottom-up model (Macaluso, 2006). Imaging studies also found, aside from activation in sensory-specific brain areas, activation of multisensory regions by unimodal input independent of attentional awareness or selective attention to a single modality (Macaluso et al., 2002). This result is also in concordance with the bottom-up model, although it does not exclude the top-down model. Furthermore, a number of studies have found evidence more in line with the top-down model. For example, several EEG studies have found ERPs evoked by audio-visual stimuli as early as 40 ms after stimulus onset over parieto-occipital channels (Giard and Peronnet, 1999; Fort et al., 2002). These early potentials are difficult to reconcile with a purely feed-forward account: the activation of unimodal and multimodal brain areas is found around the same time, and these early ERPs occur too soon after stimulus onset to have been caused by feed-forward projections from unimodal brain areas (Foxe and Schroeder, 2005). Further evidence supporting top-down modulation comes from imaging studies. Several studies have found an enhanced activation of sensory-specific cortices during the processing of multimodal stimuli compared to unimodal stimuli; if there were only feed-forward connections from unimodal to multimodal areas, this enhancement would not be found. Additionally, the connectivity between primary sensory modules (Falchier et al., 2002; Rockland and Ojima, 2003) also argues against purely feed-forward connections. In summary, both behavioral and brain activity studies have contributed to the knowledge pertaining to audio-visual integration. Behavioral studies display differences in the reaction to unimodal and multimodal stimuli, and both EEG and imaging studies give us better insight into the brain activity evoked by these stimuli. However, when EEG and imaging techniques are used to gain more insight into the nature of the integration of audio-visual stimuli, the results can be conflicting. EEG and imaging studies have revealed evidence supporting both the bottom-up model and the top-down model. It seems, however, that the bottom-up model is mostly supported by imaging studies, while evidence supporting the top-down model comes primarily from EEG studies and to a lesser degree from imaging studies. The results indicate that both bottom-up and top-down projections play a part in the integration of audio-visual stimuli. Overall, research into the integration of emotional audio-visual stimuli still seems to be in an exploratory phase. Studies primarily focus on the activation elicited by audio-visual stimuli. And although a number of studies have gone further than simply studying which brain areas show activation in response to particular stimuli and have investigated the connectivity between specific brain regions when participants are presented with non-emotional stimuli, similar studies using emotional audio-visual stimuli are scarce. These types of studies would increase our understanding of the nature of the processing of emotional and non-emotional audio-visual stimuli.
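In its simplest form, the kind of connectivity analysis called for here is a correlation between the time courses of regions of interest under different stimulation conditions, in the spirit of the connectivity analyses of Eckert et al. (2008) and Kreifelts et al. (2007). The sketch below illustrates this on simulated ROI time series; the region names, conditions and correlation measure are illustrative assumptions and do not reproduce the cited analyses.

```python
import numpy as np

rng = np.random.default_rng(4)
n_scans = 200

# Hypothetical mean time series for two regions of interest, extracted separately
# for unimodal and emotional audio-visual blocks (values are random for illustration).
roi_timeseries = {
    "unimodal":    {"auditory_cortex": rng.normal(size=n_scans),
                    "posterior_STS":   rng.normal(size=n_scans)},
    "audiovisual": {"auditory_cortex": rng.normal(size=n_scans),
                    "posterior_STS":   rng.normal(size=n_scans)},
}

def roi_correlation(ts_a, ts_b):
    """Pearson correlation between two ROI time courses."""
    return np.corrcoef(ts_a, ts_b)[0, 1]

for condition, rois in roi_timeseries.items():
    r = roi_correlation(rois["auditory_cortex"], rois["posterior_STS"])
    print(f"{condition}: auditory cortex <-> posterior STS, r = {r:.3f}")

# A condition-dependent increase in this coupling (audiovisual > unimodal) would be the
# kind of integration-related connectivity change that has rarely been tested with
# emotional stimuli so far.
```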
In order to expand what is known about audio-visual processing, and specifically emotional audio-visual processing, both EEG and imaging techniques should be used. As said before, EEG has a high temporal and a low spatial resolution; fMRI, on the other hand, has a high spatial and a low temporal resolution. This results in complementary but separate information from the two techniques. In addition, it is also possible to use combined EEG-fMRI. This may reveal underlying generators of EEG activity which cannot be found by using EEG alone (Goldman et al., 2000; Lemieux et al., 2001). Combining these techniques offers the advantage of both a high spatial and a high temporal resolution. However, it is still a fairly new technique and there seems to be no consensus yet on how to combine the techniques most effectively. Still, it may give us valuable insight into the processing of emotional audio-visual stimuli. As of yet very few studies have used combined EEG-fMRI to study bimodal integration, let alone emotional audio-visual integration.

All in all, emotional audio-visual stimuli modulate behavior and brain activity in several ways. First of all, they lead to faster RTs and a more accurate recognition of stimuli compared to unimodal stimulation. Furthermore, the processing of bimodal stimuli leads to a modulation of several ERP components and evokes enhanced brain activity in unimodal and multimodal brain areas. However, these results depend on the type of emotion presented. Some emotions affect ERP components and brain activity more strongly than others. Furthermore, different emotions may cause enhanced activity in different brain areas. Still, even with the slightly different results in response to different emotions, it is clear that emotional audio-visual stimuli affect brain activity and behavior differently than unimodal stimuli. At this point in time it is unclear whether the integration of emotional audio-visual stimuli is predominantly driven by feed-forward or feedback projections. Both EEG and imaging studies reveal that purely feed-forward projections from unisensory areas to multisensory areas are not enough to explain the activity found in uni- and multisensory areas for both emotional and non-emotional stimuli, indicating at least a partial role of feedback connections. Considering the results, it is more logical to assume that multimodal stimuli are processed using a combination of feed-forward and feedback projections. In short, emotional audio-visual stimuli clearly show interaction during the processing of sensory information, and this processing differs from that of unimodal stimuli and even non-emotional bimodal stimuli. However, the underlying mechanisms are not quite clear yet.

Acknowledgements: I would like to thank my supervisors and reviewers for their assistance in writing and for reviewing this thesis.

References

Calvert, G.A., Bullmore, E.T., Brammer, M.J., Campbell, R., Williams, S.C.R., McGuire, P.K., Woodruff, P.W.R., Iversen, S.D., David, A.S. (2002). Activation of auditory cortex during silent lipreading. Science. 276, 593-596.
Calvert, G.A., Campbell, R., Brammer, M.J. (2000). Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology. 10:11, 649-657.
Cappe, C., Barone, P. (2005). Heteromodal connections supporting multisensory integration at low levels of cortical processing in the monkey. European Journal of Neuroscience. 22, 2886-2902.
Collignon, O., Girard, S., Gosselin, F., Roy, S., Saint-Amour, D., Lassonde, M., Lepore, F. (2008). Audio-visual integration of emotional expression. Brain Research. 1242, 126-135.
Collignon, O., Girard, S., Gosselin, F., Roy, S., Saint-Amour, D., Lepore, F., Lassonde, M. (2010). Women process multisensory emotion expressions more efficiently than men. Neuropsychologia. 48, 220-225.
Corballis, M.C. (1998). Interhemispheric neural summation in the absence of the corpus callosum. Brain. 121, 1795-1807.
Corballis, M.C. (2002). Hemispheric interaction in simple reaction time. Neuropsychologia. 40, 423-434.
Dolan, R.J., Morris, J.S., de Gelder, B. (2001). Crossmodal binding of fear in voice and face. PNAS. 98:17, 10006-10010.
Eckert, M.A., Kamdar, N.V., Chang, C.E., Beckmann, C.F., Greicius, M.D., Menon, V. (2008). A cross-modal system linking primary auditory and visual cortices: Evidence from intrinsic fMRI connectivity analysis. Human Brain Mapping. 29, 848-857.
Ekman, P., Friesen, W.V. (1969). The repertoire of nonverbal behavior – Categories, origins, usage and coding. Semiotica. 1, 49-98.
Ekman, P., Friesen, W.V. (1971). Constants across culture in the face and emotion. Journal of Personality and Social Psychology. 17:2, 124-129.
Ethofer, T., Anders, S., Erb, M., Droll, C., Royen, L., Saur, R., Reiterer, S., Grodd, W., Wildgruber, D. (2006). Impact of voice on emotional judgment of faces: An event-related fMRI study. Human Brain Mapping. 27, 707-714.
Falchier, A., Clavagnier, S., Barone, P., Kennedy, H. (2002). Anatomical evidence of multimodal integration in primate striate cortex. Journal of Neuroscience. 22, 5749-5759.
Finney, E.M., Fine, I., Dobkins, K.R. (2001). Visual stimuli activate auditory cortex in the deaf. Nature Neuroscience. 4:12, 1171-1173.
Fort, A., Delpuech, C., Pernier, J., Giard, M.H. (2002). Dynamics of cortico-subcortical cross-modal operations involved in audio-visual object detection in humans. Cerebral Cortex. 12:10, 1031-1039.
Foxe, J.J., Schroeder, C.E. (2005). The case for feedforward multisensory convergence during early cortical processing. NeuroReport. 16:5, 419-423.
de Gelder, B., Böcker, K.B.E., Tuomainen, J., Hensen, M., Vroomen, J. (1999). The combined perception of emotion from voice and face: early interaction revealed by human electric brain responses. Neuroscience Letters. 260, 133-136.
de Gelder, B., Vroomen, J. (2000). The perception of emotion by ear and by eye. Cognition and Emotion. 14:3, 289-311.
de Gelder, B., Vroomen, J., Bertelson, P. (1998). Upright but not inverted faces modify the perception of emotion in the voice. Current Psychology of Cognition. 17, 1021-1032.
Giard, M.H., Peronnet, F. (1999). Auditory-visual integration during multimodal object recognition in humans: A behavioral and electrophysiological study. Journal of Cognitive Neuroscience. 11:5, 473-490.
Gielen, S.C.A.M., Schmidt, R.A., van den Heuvel, P.J.M. (1983). On the nature of intersensory facilitation of reaction time. Perception and Psychophysics. 34:2, 161-168.
Goldman, R.I., Stern, J.M., Engel, J. Jr, Cohen, M.S. (2000). Acquiring simultaneous EEG and functional MRI. Clinical Neurophysiology. 111:11, 1974-1980.
Gondan, M., Niederhaus, B., Rösler, F., Röder, B. (2005). Multisensory processing in the redundant-target effect: A behavioral and event-related potential study. Perception and Psychophysics. 67:4, 713-726.
Grant, K.W. (2001). The effect of speechreading on masked detection threshold for filtered speech. The Journal of the Acoustical Society of America. 109:5, 2272-2275.
Grant, K.W., Seitz, P.-F. (2000). The use of visible speech cues for improving auditory detection of spoken sentences. The Journal of the Acoustical Society of America. 108:3, 1197-1208.
Grossman, T. (2010). The development of emotional perception in face and voice during infancy. Restorative Neurology and Neuroscience. 28, 219-236.
Jones, J.A., Callan, D.E. (2003). Brain activity during audiovisual speech perception: An fMRI study of the McGurk effect. NeuroReport. 14:8, 1129-1133.
Kreifelts, B., Ethofer, T., Grodd, W., Erb, M., Wildgruber, D. (2007). Audiovisual integration of emotional signals in the voice and face: An event-related fMRI study. NeuroImage. 37, 1445-1456.
Lemieux, L., Salek-Haddadi, A., Josephs, O., Allen, P., Toms, N., Scott, C., Krakow, K., Turner, R., Fish, D.R. (2001). Event-related fMRI with simultaneous and continuous EEG: Description of the method and initial case support. NeuroImage. 14:3, 780-787.
Macaluso, E. (2006). Multisensory processing in sensory-specific cortical areas. Neuroscientist. 12:4, 327-338.
Macaluso, E., Frith, C.D., Driver, J. (2002). Directing attention to locations and to sensory modalities: multiple levels of selective processing revealed with PET. Cerebral Cortex. 12, 357-368.
Magnée, M.J.C.M., de Gelder, B., van Engeland, H., Kemner, C. (2008). Atypical processing of fearful face-voice pairs in Pervasive Developmental Disorder: An ERP study. Clinical Neurophysiology. 119, 2004-2010.
Magnée, M.J.C.M., de Gelder, B., van Engeland, H., Kemner, C. (2011). Multisensory integration and attention in Autism Spectrum Disorder: evidence from event-related potentials. PLoS ONE. 6:8, e24196.
Marzi, C.A., Smania, N., Martini, M.C., Gambina, G., Tomelleri, G., Palamara, A., Alessandrini, F., Prior, M. (1996). Implicit redundant-target effect in visual extinction. Neuropsychologia. 34:1, 9-22.
Massaro, D.W., Egan, P.B. (1996). Perceiving affect from the voice and the face. Psychonomic Bulletin and Review. 3:2, 215-221.
McGurk, H., MacDonald, J. (1976). Hearing lips and seeing voices. Nature. 264, 746-748.
Miller, J. (1982). Divided attention: Evidence for coactivation with redundant signals. Cognitive Psychology. 14, 247-279.
Miller, J. (1991). Channel interaction and the redundant-target effects in bimodal divided attention. Journal of Experimental Psychology: Human Perception and Performance. 17:1, 160-169.
Molholm, S., Ritter, W., Murray, M.M., Javitt, D.C., Foxe, J.J. (2004). Multisensory visual-auditory object recognition in humans: A high-density electrical mapping study. Cerebral Cortex. 14, 452-465.
Molholm, S., Ritter, W., Murray, M.M., Javitt, D.C., Schroeder, C.E., Foxe, J.J. (2002). Multisensory auditory-visual interactions during early sensory processing in humans: a high-density electrical mapping study. Cognitive Brain Research. 14, 115-128.
Molholm, S., Sehatpour, P., Mehta, A.D., Shpaner, M., Gomez-Ramirez, M., Ortigue, S., Dyke, J.P., Schwartz, T.H., Foxe, J.J. (2006). Audio-visual multisensory integration in superior parietal lobule revealed by human intracranial recordings. Journal of Neurophysiology. 96, 721-729.
Murray, M.M., Foxe, J.J., Higgins, B.A., Javitt, D.C., Schroeder, C.E. (2001). Visuo-spatial neural response interactions in early cortical processing during a simple reaction time task: A high-density electrical mapping study. Neuropsychologia. 39, 828-844.
Olson, I.R., Plotzker, A., Ezzyat, Y. (2007). The enigmatic temporal pole: A review of findings on social and emotional processing. Brain. 130, 1718-1731.
Paulmann, S., Pell, M.D. (2010). Contextual influences of emotional speech prosody on emotional face processing: How much is enough? Cognitive, Affective and Behavioral Neuroscience. 10:2, 230-242.
Pazo-Alvarez, P., Cadaveira, F., Amenedo, E. (2003). MMN in the visual modality: A review. Biological Psychology. 63, 199-236.
Pell, M.D., Monetta, L., Paulmann, S., Kotz, S.A. (2009). Recognizing emotions in a foreign language. Journal of Nonverbal Behavior. 33, 107-120.
Pourtois, G., Debatisse, D., Despland, P.-A., de Gelder, B. (2002). Facial expressions modulate the time course of long latency auditory brain potentials. Cognitive Brain Research. 14, 99-105.
Pourtois, G., de Gelder, B., Bol, A., Crommelinck, M. (2005). Perception of facial expressions and voices and of their combination in the human brain. Cortex. 41, 49-59.
Pourtois, G., de Gelder, B., Vroomen, J., Rossion, B., Crommelinck, M. (2000). The time-course of intermodal binding between seeing and hearing affective information. NeuroReport. 11:6, 1329-1333.
Robins, D.L., Hunyadi, E., Schultz, R.T. (2009). Superior temporal activation in response to dynamic audio-visual emotional cues. Brain and Cognition. 69, 269-278.
Rockland, K.S., Ojima, H. (2003). Multisensory convergence in calcarine visual areas in macaque monkey. International Journal of Psychophysiology. 50, 19-26.
Sadato, N., Pascual-Leone, A., Grafman, J., Ibañez, V., Deiber, M.-P., Dold, G., Hallett, M. (1996). Activation of the primary visual cortex by Braille reading in blind subjects. Nature. 380, 526-528.
Scherer, K.R., Banse, R., Wallbott, H.G. (2001). Emotion inferences from vocal expression correlate across languages and cultures. Journal of Cross-Cultural Psychology. 32:1, 76-92.
Scheef, L., Boecker, H., Daamen, M., Fehse, U., Landsberg, M.W., Granath, D.-O., Mechling, H., Effenberg, A.O. (2009). Multimodal motion processing in area V5/MT: Evidence from an artificial class of audio-visual events. Brain Research. 1252, 94-104.
Schwartz, J.-L., Berthommier, F., Savariaux, C. (2004). Seeing to hear better: Evidence for early audio-visual interactions in speech identification. Cognition. 93:2, B69-B78.
Schröger, E., Widmann, A. (1998). Speeded responses to audiovisual changes result from bimodal integration. Psychophysiology. 35, 755-759.
Shams, L., Kamitani, Y., Thompson, S., Shimojo, S. (2001). Sound alters visual evoked potentials in humans. NeuroReport. 12:17, 3849-3852.
de Silva, L.C., Miyasato, T., Nakatsu, R. (1997). Facial emotion recognition using multi-modal information. Communications and Signal Processing. 1, 397-401.
Spence, C., Driver, J. (2000). Attracting attention to the illusory location of a sound: reflexive crossmodal orienting and ventriloquism. NeuroReport. 11:9, 2057-2061.
Teder-Sälejärvi, W.A., Di Russo, F., McDonald, J.J., Hillyard, S.A. (2005). Effects of spatial congruity on audiovisual multimodal integration. Journal of Cognitive Neuroscience. 17:9, 1396-1409.
Teder-Sälejärvi, W.A., McDonald, J.J., Di Russo, F., Hillyard, S.A. (2002). An analysis of audio-visual crossmodal integration by means of event-related potential (ERP) recordings. Cognitive Brain Research. 14, 106-114.
Thompson, W.F., Balkwill, L.-L. (2006). Decoding speech prosody in five languages. Semiotica. 158:1/4, 407-424.
Vroomen, J., Collier, R., Mozziconacci, S. (1993). Duration and intonation in emotional speech. Proceedings of the Third European Conference on Speech Communication and Technology, Berlin. 577-580. Berlin: ESCA.
Vroomen, J., Bertelson, P., de Gelder, B. (2001a). Directing spatial attention towards the illusory location of a ventriloquized sound. Acta Psychologica. 108, 21-33.
Vroomen, J., Driver, J., de Gelder, B. (2001b). Is cross-modal integration of emotional expressions independent of attentional resources? Cognitive, Affective, and Behavioral Neuroscience. 1:4, 382-387.
Zhao, L., Li, J. (2006). Visual mismatch negativity elicited by facial expressions under non-attentional conditions. Neuroscience Letters. 410, 126-131.