Lorraine E Bahrick, Florida International University, Miami, Florida, USA
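Integration versus differentiation
Amodal invariant relations
Auditory–visual correspondence
Bimodal perception of speech
Visual–tactile correspondence
Visual–motor correspondence and the self
Neural bases of intermodal perception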
Intermodal perception is the perception of unitary
objects and events through spatially and temporally
coordinated stimulation from multiple sense modalities. Research suggests that the senses are united
in early infancy, fostering the rapid development of
intermodal perception.
Intermodal perception is the perception of an object
or event that makes information available to two or
more sensory systems simultaneously. Most objects
and events are multimodal in that they can be
experienced through multiple sense modalities.
For example, a person talking, a fire, or a bouncing
ball can all be seen as well as heard and felt. Intermodal perception is thus one of the most fundamental human capabilities and forms the basis for
most of what we perceive, learn, and remember.
One of the questions developmental psychologists
have asked is how and when the child comes to
perceive multimodal events as single, unitary
events, in the way adults do. For example, without
prior experience with objects and events, how does
the infant learn that certain patterns of auditory
and visual stimulation, such as the sight of the
mother's face and the sound of her voice, belong
together and constitute a unitary event, whereas
other concurrent patterns of sensory stimulation
are unrelated? How does the child acquire intermodal knowledge such that the sound of footsteps
in the hallway will elicit the expectation of seeing a
person in the doorway?
Researchers have discovered that intermodal perception develops rapidly during infancy. Infants
are intrinsically motivated to pick up new information. Some researchers (e.g., Piaget, 1954) have
characterized the development of intermodal perception as a process of integration. According to
this view, the senses are separate at birth and the
infant must gradually learn to put together or `integrate' stimulation from the different sense modalities in order to perceive a unitary multimodal
event. This `integration' may occur through associating concurrent information across different modalities. Thus, before integration takes place,
infants would perceive only unrelated streams of
light, sound, or tactile impressions. A contrasting
position is the `differentiation' view of development (e.g., Gibson, 1969). According to this view,
the senses are unified at birth, and perceptual development is characterized by a progressive process of `differentiation' of increasingly finer levels
of stimulation. Thus, in early infancy, information
from the different senses must be gradually separated from the global, undifferentiated perceptual
array. From this perspective, intermodal perception of some kinds of information is possible at
birth and infants continue to show perceptual
learning of more complex multimodal relations
throughout infancy and early childhood.
Recent evidence has demonstrated that young
infants are adept at perceiving a wide array of
multimodal objects and events and they do so by
detecting information that is common, or invariant,
across the senses. This body of research has thus
weakened the integration position, especially when
intermodal abilities are discovered in very young
infants who have had little opportunity to learn to
associate or integrate information across the senses.
Much infant research has provided support for the
differentiation view, particularly the large body of
research on young infants' detection of `amodal
invariants', suggesting that the senses are unified
in early infancy.
Amodal information is information that is not specific to a particular sense modality, but is completely redundant or invariant across two or more
senses. For example, the sights and sounds of
hands clapping share a synchrony relation, a
common tempo of action, and a common rhythm.
The same rhythm and tempo can be detected by
watching or hearing the hands clap. Thus, synchrony, rhythm, and tempo are `amodal invariant
relations' in that this information can be perceived
across different sense modalities. Most information
that is amodal characterizes how events are distributed in space and time, two of the most fundamental dimensions of our experience. According to the
differentiation view, detection of amodal relations
focuses attention on meaningful, unitary events
and buffers infants from making incongruent, inappropriate associations (e.g., Bahrick and Pickens,
1994). For example, if the infant detects synchrony,
shared rhythm, and common tempo between the
sight of a person's moving face and the sound of
the person's voice, the infant would necessarily be
attending to a unitary event: the person talking. In
this way, unrelated sounds and movements would
not be merged with the event. In support of the
differentiation view, research has found that
young infants detect a wide array of amodal invariant relations in multimodal events.
In contrast to amodal relations, information can
also be nonredundant and arbitrarily related across
the sense modalities (e.g., speech sounds and the
objects they refer to; particular faces and voices).
Information such as color, pattern, timbre, or pitch
is `modality-specific' and can be perceived only
through a single sense modality. Research suggests
that infants detect amodal relations (such as temporal synchrony) developmentally prior to arbitrary relations, and detection of amodal relations
can then guide and constrain learning about arbitrary relations.
To assess intermodal perception of auditory–visual
relations, sometimes an intermodal preference
method is used. In this method, infants view two
filmed events simultaneously, along with the
soundtrack to one of them coming from a centralized speaker. It is expected that if the infant detects
the intermodal relations, he or she will look longer
at the film that belongs with the soundtrack played.
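The logic of the intermodal preference method can be sketched numerically. The Python sketch below is illustrative only: the proportion-of-looking-time measure, the 0.5 chance level, the function name `proportion_to_match`, and all looking times are assumptions introduced here, not procedures or data reported in this article.

```python
from statistics import mean


def proportion_to_match(match_s: float, nonmatch_s: float) -> float:
    """Proportion of total looking time directed to the sound-matched film."""
    total = match_s + nonmatch_s
    if total <= 0:
        raise ValueError("infant did not look at either film")
    return match_s / total


# Hypothetical looking times in seconds: (matching film, non-matching film).
looking_times = [(32.0, 18.0), (27.5, 22.5), (20.0, 20.0), (36.0, 24.0)]

# One proportion per infant; 0.5 means looking was split evenly.
proportions = [proportion_to_match(m, n) for m, n in looking_times]
group_mean = mean(proportions)

CHANCE = 0.5  # assumed chance level when two films are shown side by side
print(f"mean proportion to matching film: {group_mean:.4f}")
print("above chance" if group_mean > CHANCE else "not above chance")
```

A mean proportion reliably above the assumed chance level would be read as evidence that infants detected the relation between the soundtrack and the matching film. An actual study would evaluate this with an inferential test across infants rather than the simple comparison shown here.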
Research using this and similar procedures has
demonstrated that young infants display a wide
array of intersensory abilities in the area of audiovisual perception (see Gibson and Pick, 2000;
Lewkowicz, 2000; Lewkowicz and Lickliter, 1994).
Neonates turn their eyes in the direction of a sound,
demonstrating a basic coordination of audio-visual
space. In the first month of life, infants detect the
temporal synchrony between sights and sounds of
an object striking a surface, and the spatial location
common to the sights and sounds of a moving
object. By three to five months, infants can match
films and soundtracks of moving objects on the
basis of their substance (rigid versus elastic) or
their composition (single versus multiple objects),
as well as the rhythm and tempo of their impact
sounds. Further, by four to six months, infants can
match faces and voices on the basis of affective
expressions, including happy, sad, neutral, and
angry (Walker-Andrews, 1997). They can also
match faces and voices on the basis of age (adults
versus children) and gender of speaker. All these
relations are amodal and invariant across vision
and audition.
The perception of speech, an auditory–visual event,
has traditionally been studied as a unimodal, auditory event. However, speech is produced by a
speaker who can be heard and seen, and who typically uses gesture as well. It turns out that the multimodal nature of speech is salient to infants and
facilitates its perception (e.g., Meltzoff and Kuhl,
1994). By at least two months of age, infants
are sensitive to voice–lip synchrony during speech.
By four months, infants are able to detect the voice–lip
correspondence for speech sounds such as
`a' and `i'. When one of these speech sounds is
played in synchrony with two side-by-side films of
a speaker's face, each articulating one of the sounds, infants look
more to the face with the matching lip movements.
The McGurk effect, an auditory–visual illusion, also
illustrates how infants and adults merge information for speech across the senses. When we view the
face of a person speaking one speech sound, such as
`ga', while hearing a different speech sound, for
example `ba', we perceive another sound, `da', a
blend of the two. Infants show evidence of
this effect in the first half-year of life. Visual input
appears to have significant auditory consequences.
Amodal information during speech is also important for learning the arbitrary relation between
speech sounds and the objects they denote (Gogate
et al., 2001). By 14 months of age, infants are able to
learn to pair a speech sound and an object during a
brief familiarization. However, if amodal synchrony unites the sounds and object movements,
for example in showing and naming the object
simultaneously, infants can learn the relation as
early as seven months of age. Adults even match
their teaching style to the infant's needs. They use
more synchronous movement with labeling, to
highlight object–sound relations, when they are
first teaching the names of new objects to their
young infants. Their use of synchrony decreases
as infants become more linguistically competent.
Further evidence for the importance of visual information for perceiving speech lies in the success of
teaching speech to deaf individuals using a visual
depiction of the lip and tongue movements involved in different speech sounds.
Amodal invariant relations also unite perception
across vision and touch. Information for shape, texture, substance, and size is invariant across visual
and tactile stimulation (Rose and Ruff, 1987). One
method for investigating perception of visual–tactile
correspondence is the cross-modal transfer
method. An object is presented to one sense modality alone, and a preference test is then given in
another sense modality to determine whether the
information transfers across modalities. Using this
method, research has shown that, by the age of one
month, infants can perceive the correspondence between an object they experienced tactually (by mouth, on
a pacifier) and a visual replica of the object.
Infants looked more to the object of the shape and
texture that they had previously experienced orally.
Infants are also able to transfer information about
the substance of an object (rigid versus deforming)
across touch and sight.
Evidence also shows that infants can transfer
information obtained through manual exploration
to vision, and this develops across the first year.
One factor determining the extent to which manual
information is perceived is whether exploration is
active or passive. Tactile exploration develops over
the first year. Young infants tend to grasp objects,
whereas older infants become more adept at
obtaining tactile feedback by moving their hand
relative to the object's surface. By four months,
infants can perceive whether two parts of objects
are connected or separate, by the type of motion
they produce during haptic exploration. By six
months, infants can recognize the shape of an object
visually that they have manually explored, as long
as exploration is active.
Infants are also able to perceive information specifying the self by detecting amodal invariant relations (Rochat, 1995). Even in the first weeks of life,
infants can imitate facial expressions. In order to do
this, they must relate the visual appearance of the
adult's facial expression with their own production
of the expression. This is most probably guided
by proprioception, that is, information
about self-movement based on feedback from the
muscles, joints, and vestibular system. Facial imitation reveals evidence of early intermodal coordination between visual information and motor
behavior, and this coordination continues to develop over the first year (Meltzoff and Moore,
Infants also show evidence of self-perception by
detecting amodal invariant relations in a procedure
where they view their own body moving live in
a video display (Bahrick, 1995). By three to five
months, infants can distinguish between a live
video of their own legs kicking and a video of
another infant's legs kicking, a pre-recorded video
of their own legs, or a spatially incongruent video of
their own legs. They do this by detecting the
amodal temporal synchrony and spatial relations
common to the visual display of their motion and
the proprioceptive experience of their motion.
Infants also demonstrate a phenomenon called
`visually guided reaching', which develops rapidly
during the first year. That is, they show continuous
adjustments in their reaching and manual behavior
as a function of visual input about the size, shape,
and position of objects. Infants are even able to
contact a moving object by aiming their reach
ahead of the object and taking into account the
speed and direction of its movement as well as
that of their arm motion. Later, infants show an
ability to adapt their crawling and exploratory behavior as a function of visual information about the
slant and solidity of the surface. These examples
illustrate a close coupling between vision and
motor behavior and an understanding of self in
relation to objects (Gibson and Pick, 2000).
Behavioral research on the rapid development of
intermodal perception during infancy is consistent
with research findings from the neurosciences.
Some areas of the brain (cortex, superior colliculus)
contain `multimodal neurons' that respond to
inputs from multiple sense modalities, providing
a biological basis for the early integration of the
senses (e.g., Stein and Meredith, 1993). Further,
some cells of the superior colliculus (devoted to
attention and orienting) are activated much more
by simultaneous auditory and visual inputs than
by either auditory or visual information alone.
Other cells, however, are modality-specific but
can have receptive fields that are spatially coordinated across the sense modalities as a result of experience with multimodal events. Thus, auditory
and visual input from the same spatial location can
be related. Neurophysiological findings suggest
that if input to one modality is somehow modified,
the receptive fields of cells in the superior colliculus
can compensate and realign with those of the other
modality to maintain a coherent multimodal spatial
mapping. The early plasticity of the brain, its sensitivity to multimodal inputs, and its reliance on
experience in the multimodal world to guide neuronal development appear well matched to the behavioral findings on the early development of
intermodal perception.
Infants demonstrate a diverse array of intermodal
abilities. These abilities illustrate the close connection between the senses during early development
and the rapid growth in intersensory abilities
across the first year of life. Development appears
to be guided by the detection of amodal, invariant
relations, and this promotes accurate and unitary
perception of multimodal events.
