Second Language Acquisition of Suprasegmental Phonology
Lesley Carmichael
1.0 Introduction
New questions need to be asked in second language acquisition (SLA) research: Can
second language learners have differential success in acquiring the segmental vs.
suprasegmental phonology of a second language (L2)? That is, are segmental and
suprasegmental phonology independent aspects of phonological acquisition? And if so,
are they necessarily constrained according to the same age-based schedule of acquisition?
In this paper, I draw from the SLA literature to demonstrate how work that has already
been done indicates a natural division between segmental and suprasegmental phonology
in terms of SLA. I propose that second language learners do tend to achieve differential
pronunciation success with the segmental and suprasegmental components of speech as a
function of the age-related characteristics of their SLA experience. Specifically, suprasegmental
acquisition success in an L2 may require an earlier start in life than segmental acquisition
success. The decline of an L2 learner’s ability to acquire nativelike intonation appears to
begin at a very young age, probably before the decline of the ability to achieve nativelike
segmental phonology in an L2.
1.1 Outline of discussion
A review of SLA phonology research and the generally accepted separation of
phonological and morphosyntactic acquisition lay the foundation for the paper. A critique
of SLA research on the acquisition of phonology will present the tendency of SLA
research to evaluate segmental and suprasegmental phonology together as a single
system, and will motivate a natural subdivision between them. I will discuss the
importance of intonation in (the perception of) foreign-accented speech, substantiating
the need to investigate the acquisition of intonation as an independent linguistic system.
Intonation is assessed in terms of its role in the discrimination of languages and its role in
assisting the development of other linguistic subsystems. A look at first language
acquisition suggests that suprasegmental phonology begins developing earlier than
segmental phonology in the development of first language (L1) competence. Combined,
these evaluations of L1 and L2 acquisition research suggest that the segmental and
suprasegmental components of L2 phonology are subject to different developmental
constraints. A new, finer-grained research agenda is proposed to properly characterize the
acquisition of L2 phonology and pronunciation. This agenda includes new approaches
designed to isolate the respective contributions of segmental and suprasegmental
production to foreign accent. The agenda also proposes independent mappings of
segmental and suprasegmental production data to age of acquisition data in order to
assess whether segmental and suprasegmental phonology have different maturational
1.2 Terminology
The terms suprasegmental phonology, intonation, and prosody are used with various
intentions and interpretations across linguistic subdisciplines. In this paper, the term
suprasegmental phonology refers to phrase- and discourse-level pronunciation
components such as the manipulation of the fundamental frequency of the voice, pausing,
and the timing of pitch events across phrases, sentences, and discourse units. I will use
the term intonation to primarily indicate the manipulation of pitch, but also to broadly
include features such as duration and loudness. (The term prosody is often used to refer to
the collective arrangement of pitch, loudness, and duration features.) The distinction
between intonation and prosody is not critical to this particular discussion of a
differentiation between the L2 acquisition of segmental and suprasegmental phonology. It
may be the case that a distinction between intonation and prosody is relevant in other
discussions of the acquisition of L2 phonology, and will become more relevant as
investigation into SLA continues.
2.0 Age of acquisition and second language success
Lenneberg (1967) originally theorized that puberty marked the end of a biological critical
period for language acquisition. Since then, the acquisition of individual linguistic
subsystems (such as phonology and morphology) has been studied, as well as that of
language as a whole. The later in life an individual begins to learn a second language, the more
elusive nativelike pronunciation of the L2 becomes. This observation is supported by a
large body of L2 research correlating the presence and degree of foreign accent in speech
with the starting age of L2 acquisition (e.g., Scovel, 1969, 1988; Birdsong, 1999; Flege et
al., 1995; Carmichael, 2000). Overall, this research seems to indicate that L2 acquisition
at some point before complete biological maturation is critical for a learner to be able to
attain a nativelike command of the pronunciation, morphology, and syntax of the L2.[1]
[1] There are attested cases of late L2 learners achieving a nativelike command of the L2.
Bongaerts, Planken and Schils (1995) (and the 1997 replication by Bongaerts, van Summeren,
Planken, and Schils) found that a small number of late second language learners could pass
for native speakers phonologically. Scovel (1988) describes the Joseph Conrad effect, a
phenomenon in which L2 learners can pass for natives in the areas of syntax, morphology,
and lexicon upon examination of writing samples. The evaluation of SLA in this paper is
based on the prominent trend in SLA research indicating that accent-free speech is
generally unattainable for learners beginning the SLA process later in life.
2.1 Independent critical periods for SLA of phonology and syntax
SLA research shows evidence for differential acquisition success of phonology and
syntax relative to the starting point of SLA during a learner’s life span. There is general
agreement among SLA researchers that L2 learners who begin learning an L2 before
roughly puberty can achieve good syntax (e.g., Patkowski’s [1980] study of ultimate
attainment of L2 syntax and age of arrival effects). Many researchers also maintain that
the same general biological constraints (i.e., puberty) mark the end of one’s ability to
acquire a nativelike pronunciation.[2] For example, Patkowski (1990) found a discrete shift
in pronunciation ability corresponding to an age of acquisition at about age 15, indicating
the end of a critical period for phonological acquisition. Flege, Munro, and MacKay
(1995) also proposed a discrete age-related constraint on phonological acquisition, but
instead found a continuous decline in ratings of foreign accent with respect to L2
learners’ age of arrival in the L2 environment. Research comparing the perceived
nativeness of (both) the grammar and pronunciation attained by L2 learners seems to
indicate a natural split between these two aspects of language acquisition. Ioup (1984)
found that native English listeners could classify native and nonnative English speakers
according to their pronunciation but not by their use of syntax. She later concluded that
near-native fluency of syntax is achieved more easily by adults than near-native
pronunciation (1987). In Scovel’s (1988) perception study, native speaker judges
correctly identified nonnatives 97% of the time based on speech but only 47% of the time
based on writing samples, suggesting that the spoken language contained stronger
evidence of nonnativeness than the grammar did. He claimed that even highly advanced
L2 users can be easily identified as nonnative speakers (p. 108). These differential
acquisition outcomes have been related to differences in the acquisition experiences of
the learners: specifically, the time of life when a second language phonology can be
successfully achieved passes earlier than the time of life when a second language syntax
can be successfully achieved.
[2] This paper is mainly concerned with the performance, or productive, aspect of
intonation. It should be noted that infants demonstrate a perceptive sensitivity to
intonation before they begin to produce intonation features of their native language (see
section 6 for a discussion of intonation in L1, including infants’ perceptual abilities). A
gap between the comprehension and production of both phonology and syntax is well
attested (see the volume on competence and performance in SLA edited by Brown et al., 1996).
2.2 Maturational effects on second language pronunciation
While there is general agreement that people who acquire an L2 before puberty can attain
a good, communicative command of syntax (not easily identified as nonnative), there is
less accord regarding the relationship between age of acquisition and ultimate
pronunciation outcomes. Some work indicates that while most adults retain an accent
when they begin learning a second language after puberty, children who begin the
process before puberty show little or no foreign accent—and the younger they begin, the
better. Oyama (1976) found that L2 learners who were younger than 10 years of age
when they began learning their L2 were rated as having the least accent. Tahta, Wood
and Loewenthal (1981) observed a marked jump in ratings of foreign accentedness for L2
learners who began learning their L2 at age 12 as compared to those who began at age 11.
In Seliger, Krashen and Ladefoged (1975), L2 learners were asked to report on their own
degree of foreign accent in the languages they spoke. About 85% of those who
immigrated to the L2 environment before the age of 10 rated themselves as having no
accent in either their first or second language, while about 90% of those who immigrated
after the age of 15 rated themselves as having an accent in both languages. Those L2
learners who immigrated between the ages of 10-15 were split almost in half in their
self-reports of accent vs. no accent in the L2. Snow and Hoefnagel-Hohle (1978) found that
the teens in their study were learning Dutch pronunciation (also morphology and syntax)
more quickly than the children were; however, Scovel (1988) points out in his review of
research on L2 pronunciation acquisition that those early outcomes do not prove that the
teens ultimately achieved greater pronunciation proficiency.
Other researchers propose that SLA must begin before even 5-7 years of age to result in
accent-free speech. Asher and Garcia (1969) found that 68% of the L2 learners who
began learning the L2 before 7 years of age (n=19) were rated as being near-native. Of
the L2 learners who began learning the L2 after 12 years of age (n=15), 93% were rated
as having a foreign accent. The L2 learners who began learning the L2 between ages 7-12
(n=37) were divided between near-native (41%), slight accent (43%), and definite accent
(16%). Fathman (1975) and Williams (1979) found that younger language students
retained less accent than adolescents who were just past puberty. Furthermore, younger
children have an eventual oral skill achievement advantage over older children. In
Fathman (1975), older children (11- to 15-year-olds) initially showed significant
advantages over younger children (6- to 10-year-olds) in acquiring second language
English. Three years later, the younger children performed better than the older ones on
the same pronunciation measures (and also on morphology and syntax). Other research
confirms this finding, with younger language learners eventually outperforming older
learners in oral skills (e.g., Oyama, 1978). Krashen, Long, and Scarcella (1979) reviewed
several studies on child and adult SLA and concluded that while older is better in terms
of the rate of SLA, younger is better in terms of ultimate attainment.
Scovel (1969) said that accent-free speech would be impossible for anyone who learned
an L2 after the language faculty had lateralized in the brain. Evidence cited by Krashen
(1973) suggested that the lateralization of language occurred around age 5, long before
puberty. Krashen interpreted this to mean that lateralization could therefore have nothing
to do with a critical period for language acquisition, nor could it present any kind of a
biological barrier to the attainment of nativelike pronunciation. (It should be noted that at
that time, a single critical period ending around puberty was a fairly new hypothesis.)
SLA research since that time seems to not only require a softening of Krashen’s
interpretation, but perhaps even a reversal of it. Scovel (1988) actually proposes some
kind of “cutoff” for the ability to acquire nativelike L2 phonology at around age 6 or 7.
Long (1990) supports this claim in his survey of the evidence for critical periods in
language acquisition, saying that in order to attain authentic pronunciation, first exposure
to the language must be before age 6 for many people, and by about age 12 for the rest
(accounting for some exceptional language learners). The tenets of the Constructionist
hypothesis of SLA (Herschensohn, 2000) support observations of early exposure
followed by a progressive development of the L2 phonology, resulting in nativelike
pronunciation. Constructionism contends that an initial period of feature
underspecification is followed by a period in which L2 values are constructed by building
on other constructions. While presented mainly for the SLA of morphology and syntax,
the principles of Constructionism may well account for the development of L2
pronunciation over time.
The increasing granularity and sophistication of SLA research over the last 30 years is
yielding growing support for a hypothesis of L2 phonological acquisition in which
acquisition of the L2 must begin before about age 6 or 7.
3.0 The objects of phonological acquisition research
When age-based constraints are proposed for L2 phonological acquisition, exactly what
aspects of language are covered by the term phonological? SLA research on
phonological acquisition has focused predominantly on segmental and articulatory
phonology, without particular regard to suprasegmental linguistic behavior (e.g., the
construction of new perceptual categories for segments: Flege, 1987; Bohn and Flege,
1990; e.g., the production of new sounds: Lado, 1957; Flege, 1987; Bohn and Flege,
1992; Logan, Lively, and Pisoni, 1991) or on a general measure such as “overall foreign
accent.” Recent pedagogically-driven work (see section 4.1) and work by Archibald
(1998) on second language phonology are notable exceptions. (Archibald [1998] includes
an investigation of the acquisition of L2 metrical structure, and offers a thorough account
of the interlanguage occurring during SLA.) Various components of phonology interact
in pronunciation, and studies of pronunciation acquisition have not considered all
subdivisions of phonology as possibly independent contributors to pronunciation
accuracy. The L2 acquisition of suprasegmental phonology has not been subjected to
much independent investigation; specifically, it has not been intentionally separated from
other aspects of phonological acquisition. Further, the heretofore limited interest in
suprasegmentals as an independent system within a language’s phonology has resulted in
studies of segmental acquisition which do not necessarily consider the influence of
suprasegmental features. Scovel (1988) pointed out in a criticism of Olson and Samuels
(1973) (in which teen L2 learners were found to have better pronunciation than child
L2 learners) that since they did not consider intonation, the question of how “accented”
the speakers sounded was really left unanswered. More recent work has suggested that
intonation plays a very distinct role in terms of its contribution to a foreign accent (e.g.,
Magen, 1998, conducted a perception study using speech synthesized to contain
purported features of accentedness; and Wennerstrom, 2001, analyzed intonation used by
English as a Second Language speakers [see section 4.2]). Various speech-based
technologies are also drawing attention to the complexity of linguistic intonation and its
independent role in establishing pragmatic relationships and discourse structure
(Wichmann, 2000) and synthesizing natural sounding speech (Goldsmith, 1999) (also see
Cutler et al., 1997 for a review of prosody in lexical, sentence level, and discourse
processing). These new applications indicate a need for new research on intonation as its
own sophisticated linguistic system.
3.1 Factors measured in previous SLA phonology research
Age of acquisition was one factor assessed in a study by Flege et al. (1998) of voice onset
time produced by Spanish speakers of English. In Riney,
Flege and Flege (1998), native English listeners assessed “global foreign accent” change
and improvement in the production of English [l] and [ɹ] by Japanese students attending
an American university over a period of four years. They found that the subjects reduced
their overall foreign accent during their time in the university and improved in their
ability to identify and produce the two liquid sounds. It is unclear exactly what factors the
native English respondents were attending to when they made decisions about the global
foreign accent they perceived in the Japanese students’ speech. In another study focusing
on L2 learners’ ability to perceive nonnative segmental contrasts, Lively et al. (1994)
trained Japanese listeners to discriminate between English [l] and [ɹ]. Follow-up testing
three months later showed that they retained the ability to discriminate the sounds
without any further training. They concluded that changes in perception occur from
selective attention to the acoustic cues that signal phonetic contrasts. It is tempting to
expand this general claim about developing perceptual discrimination to the
discrimination of intonational contrasts. Similar research focusing on intonation can help
determine whether selective attention also contributes to the perception of intonational
contrasts in another language.
Claims about critical periods for pronunciation or correlations between age and perceived
degrees of foreign accent, as exemplified by the studies mentioned herein, are typically
based on L2 learners’ ability to acquire the segmental articulation and phonology of the
target language. Others are unspecified in terms of what it is about the speech that is
causing listeners to perceive a foreign accent. (A recent exception is a study of the impact
of speech rate on perceptions of foreign accent by Munro and Derwing, 2001.)
Importantly, this means that most investigations of L2 phonological acquisition to date
not only have not explicitly considered intonation, they have not explicitly controlled for
it either. Flege, Munro, and MacKay (1995) found that foreign accents were detected in
the speech of Italian learners of English who began learning English before puberty;
however, it was not ascertained what aspects of pronunciation caused the listeners to
perceive a foreign accent. Piske, MacKay and Flege (2001) present an excellent review of
work examining factors that have been claimed to affect the degree of perceived foreign
accent in L2 (e.g., age of SLA, length of residence in the L2 environment, instruction,
motivation, etc.). But in their study (presented in the same paper), Piske et al. continue to
use “overall degree of foreign accent” as the measurable outcome. This thorough work
can thus only tell us how the factors influenced the amalgamated end product of segment
production and intonation. This and many other studies dismiss the evaluation of
suprasegmental variation as beyond the scope of their current work; crucially, this
variation is not controlled for as a separate variable contributing to the resulting foreign
accent. Results which correlate (segmental) pronunciation metrics with perceived foreign
accent and age-related acquisition experience may actually reflect L2 learners’ use of
intonation as well.
4.0 Intonation is a key component of foreign accent
Theoretical linguistics has only engaged in a focused investigation of intonation since the
late 1970s (see Cutler et al., 1997 for a comprehensive literature review). In this time, it
has also been acknowledged by language pedagogists and English as a Second Language
(ESL) researchers as a critical subcategory of language—one which plays a significant
role both in the production and perception of language (e.g., James, 1976; Pennington
and Richards, 1986; Nunan, 1999).
4.1 Intonation in communicative language teaching
Nonnative intonation interferes with the accurate communication of meaning, and a
mismatch between intonation and intention can even result in a complete contradiction of
a speaker’s intended message (Nash, 1971). James (1976) stressed the importance of
intonation as a critical factor in (second) language in his work using a speech visualizer
to assist speakers in modeling speech intonation. The best results were achieved by
speakers using an audiovisual representation of the target speech, including a pitch
contour, along with a feedback display showing speakers’ imitations (better than results
achieved with audio or audiovisual presentation but without a feedback display). He
claimed that incorrect use of prosodic components can cause not only misunderstanding
but a total breakdown in communication, and that his results suggested that intonation is
more essential to the acceptability of second language speech than articulation. This
suggestion has since been borne out in writings on second language pedagogy. As noted
by Nunan in his pedagogical book Second Language Teaching and Learning (1999),
improper use of stress, rhythm, and intonation presents more difficulty for hearers than
does poor articulation.
The intentional and primary development of intonation in L2 learning is justified by
Abberton et al. (1978), who point out that intonation provides the temporal foundation for
the distribution and realization of segmental information. Without the organization
nativelike intonation provides, properly articulated segments are delivered in a nonnative
relation to each other. As approaches to language teaching have moved toward
communicative methods (see, for example, Lee and VanPatten, 1995), intonation is
taking a more prominent role in the instruction of pronunciation. While L2 teaching
methodologies have changed significantly and often over the years,[3] the advantage of
prosodic naturalness to communicative efficacy is becoming recognized in L2 pedagogy
(e.g., Nunan, 1999). Pennington and Richards (1986) explicitly state that the positioning
of sounds and words in speech is a critical aspect of pronunciation, and they discourage
the teaching of linguistic subunits in isolation.
4.2 Experimental assessments of L2 intonation
These tendencies in teaching are corroborated by analytical work with L2 learners.
Wennerstrom (2001) analyzed natural dialogues between native English speakers and
ESL learners within the framework of a phonological model of intonation, the Tones and
Break Indices (ToBI) model of standard varieties of English (see Silverman et al., 1992,
for a description). She assessed learners’ use of pitch to disambiguate new and given
information in a discourse and to signal turn taking. Learners who were rated by native
English speakers as sounding more fluent were better able to manipulate pitch to indicate
both turn taking and relationships among discourse elements. Wennerstrom concluded
that proper use of intonational features is an essential aspect of fluency. In addition, she
asserted that intonation is clearly part of a grammatical system and not merely a stylistic
aspect of language and communication.
[3] A number of L2 teaching methodologies have been developed over the last 50 years,
including Situational Language Teaching, Audiolingualism, Total Physical Response, and
Communicative Language Teaching. Different methods prescribe different roles for teachers
and students and varying degrees of interactivity (Richards and Rodgers, 2001).
Wennerstrom’s study is part of a growing body of work indicating that intonation figures
prominently in the perception of foreign accented speech. Pedagogically driven work, as
indicated earlier, positions intonation as a critical foundation for good L2 pronunciation
and communication. Experimental phonetics and phonology create exciting possibilities
for assessing the contribution of intonation to the perception of foreign accent in a more
controlled way. Magen (1998) independently manipulated various segmental and
suprasegmental aspects of foreign accented speech through synthetic alteration and
presented the resulting speech stimuli to native speakers for judgments of accentedness.
In addition to adjusting segmental features such as vowel reduction, consonant aspiration,
and vowel tensing on Spanish-accented utterances, she resynthesized the utterances with
native English intonation contours. The results of her perception experiment (n=10
listeners; n=96 stimulus utterances) indicated that the fundamental frequency (f0)
behavior was the most influential cue (more so than all segmental cues) to the perception
of a foreign accent in the speech.
The results of my research on the contribution of L1 segmental phonology to a foreign
accent in an L2 (Carmichael, 2000) suggested that intonation is so strong a cue to foreign
accent that nonnative use of prosodic features can override accurate segmental phonology
to a native ear. In the study, it appeared that nonnative intonation on monosyllabic
utterances overrode accurate segmental production when listeners were asked to rate the
speech in terms of the degree of foreign accent they perceived. Speech tokens with near-native
segmental characteristics but an intonation contour lacking the typical American
English declarative intonation phrase features received average ratings of “moderately
heavy” to “heavy” accent (3-4 on a 5-point scale [0=no accent; 4=heavy accent]; n=69
respondents; n=192 stimulus tokens). In contrast, tokens with similar segmental features
but an English-like intonation contour received average ratings of “little” to “no” accent
(0-1 on the same 0-4 scale).
4.3 Intonation as an independent factor in the perception of foreign accent
The development of L2 intonation is undoubtedly part of phonological acquisition, as it is
a spoken characteristic of language with its own rules and meaningful contribution to the
language being spoken. Research is just beginning on the possible independence of
intonation within the boundaries of phonological acquisition. It is unknown whether
segmental and suprasegmental phonologies are necessarily constrained by the same
maturational schedule (see section 6 for a discussion of intonation in L1). There has also
been only limited discussion or experimentation thus far about the individual
contributions of segmental vs. suprasegmental aspects of speech to the presence of a
foreign accent (or to nativelike pronunciation, depending on the approach). Pitch proved
to be a stronger cue to the presence of foreign accent than segmental features in Magen
(1998), and intonation also appeared to have an impact on perception ratings in
Carmichael (2000) (see section 4.2). Recent work shows that the intonational structure of
an utterance is not isomorphic with syntactic structure (see Shattuck-Hufnagel and Turk,
1996 for a discussion of the role of prosodic structure in sentence representation), and it
requires its own parsing (Beckman, 1996). A system which is independently functional
and necessitates its own processing may make a unique contribution to the perception of
foreign accent. Intonation needs to be independently investigated to determine whether
excellent command of suprasegmental features outweighs inaccurate segmental
phonology in a native listener’s assessment of an L2 learner’s speech. It may be the case
that phrase-level intonation can even cause the perception of an authentic accent (as in
Carmichael, 2000, in which tokens containing measurably non-native segments and
nativelike intonation were perceived to have little foreign accent).
5.0 Intonation as a perceptual discriminant between languages
The perceptual import of intonation is also indicated by its robustness as a perceptual
discriminant between languages. A neural network model was employed by Cummins et
al. (1999) to empirically distinguish ten languages by their prosodic systems alone.
Synthesized f0 contours of spontaneous speech from native Cantonese, Japanese, and
English speakers provided sufficient information for adult native listeners to correctly
distinguish these languages from one another in a study done by Ohala and Gilbert (1978).
Even infants are able to discriminate between languages based on features of prosody
alone, as shown in a variety of experiments by Jusczyk (1995), Mehler et al. (1997), Nazzi
et al. (1998), and Ramus and Mehler (1999). Jusczyk (1995) presented speech which had
been low-pass filtered at 400 Hz (with the intention of removing segmental information
and retaining the prosodic characteristics of the speech) to infants and found that at only
4.5 months of age, they indicated a preference for their native ambient language over
other languages. Low-pass filtered speech was also used in discrimination tasks presented
by Mehler et al. (1997) and Nazzi et al. (1998). In their experiments, newborns were
shown to discriminate between pairs of languages based only on the low-pass filtered
speech, and they concluded that newborns were particularly sensitive to one aspect of
suprasegmental phonology, linguistic rhythm (the syllable-, mora-, or stress-timing of a
language as described by Pike, 1945; Abercrombie, 1967; and Ladefoged, 1975). In a
series of four experiments conducted by Ramus and Mehler (1999), a resynthesis
technique was used to selectively degrade cues to specific segments but preserve the full
frequency range of the speech signal (thus preventing the inherent unnaturalness and
incompleteness of low-pass filtered speech from being a factor in the experiments).
Consonant and vowel qualities were controlled in each experiment to allow for an
assessment of whether suprasegmental information is cued throughout the signal (in the
form of, e.g., energy bursts in particular frequency ranges or pitch excursions reflected in
higher frequencies). Four sets of resynthesized stimuli were created to examine which
aspects of speech were needed to discriminate between English and Japanese: a)
phonotactics and prosody, b) prosody, c) rhythm only, and d) f0 only. Syllabic rhythm
and intonation (f0) each independently emerged as sufficient discriminants between the
two languages for adult French speakers.
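The low-pass filtering manipulation described above for Jusczyk (1995), in which speech is
filtered at 400 Hz so that pitch and rhythm survive while most segmental detail is removed,
can be illustrated with a minimal sketch. The filter design here (a Hamming-windowed sinc
FIR filter), the 8000 Hz sample rate, the number of taps, and the two test tones are all
assumptions made for illustration; the studies cited do not specify these details, and pure
tones stand in for real speech. The sketch only shows that a 400 Hz cutoff keeps energy in
a typical pitch range and strongly attenuates energy in a higher range where many
segmental cues would be expected.

```python
import math


def lowpass_fir(cutoff_hz, sample_rate_hz, num_taps=101):
    """Hamming-windowed sinc low-pass FIR coefficients (illustrative design choice)."""
    fc = cutoff_hz / sample_rate_hz          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2                 # centre tap index
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response (sinc), with the x == 0 limit handled explicitly.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]         # normalise to unit gain at 0 Hz


def apply_fir(signal, taps):
    """Convolve signal with taps, keeping only fully overlapping output samples."""
    k = len(taps)
    return [sum(taps[j] * signal[i + k - 1 - j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sine(freq_hz, sample_rate_hz, n_samples):
    """Unit-amplitude sine tone, used here as a stand-in for a speech component."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate_hz)
            for t in range(n_samples)]


if __name__ == "__main__":
    rate = 8000.0                            # assumed sample rate
    taps = lowpass_fir(400.0, rate)          # 400 Hz cutoff, as in the text
    for label, freq in (("pitch-range tone", 150.0), ("higher-frequency tone", 2000.0)):
        tone = sine(freq, rate, 2000)
        ratio = rms(apply_fir(tone, taps)) / rms(tone)
        print(f"{label} ({freq:.0f} Hz): output/input RMS = {ratio:.4f}")
```

A resynthesis approach such as the one used by Ramus and Mehler (1999) differs from this
in kind: it preserves the full frequency range and degrades segmental cues selectively, so
it cannot be reproduced by a filter of this sort.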
6.0 Intonation in L1
It is interesting that intonation carries such weight in marking speakers as nonnative, and
that it is a notoriously difficult aspect of an L2 to develop. The young human brain may
be “wired” in some way to develop language (see Kuhl, 1989, for a discussion of infants’
sophistication in perceiving speech sounds); could the persistence of foreign accent in L2
learners’ use of intonation be related to an early and quick development schedule for
suprasegmental acquisition? Research on the development path of first language
acquisition indicates that infants are sensitive to suprasegmental properties of language,
such as rhythm and intonation, long before they are responsive to segmental ones.
Experiments conducted by Ramus and Mehler (1999) indicate that newborns as young as
2-5 days old can discriminate between languages based on prosodic factors but not on
phonotactic ones. These results are consistent with Jusczyk’s evidence that infants do not
indicate familiarity with their native language’s phonotactics until they are 6-9 months
old (Jusczyk et al., 1993).
6.1 The development of language subsystems in L1
While the acquisition of intonation begins very early in life, the sophistication of a native
speaker’s L1 intonation is said to develop over time. According to Neufeld and
Schneiderman (1980), the linguistic system of intonation is in place quite early in life,
and subsequent advances are those of degree and not kind. An analogue to this concept is
the ability of adult native speakers to continue to learn new lexical items throughout life (after the
L1 has been fully acquired or developed, including the structural principles governing
possible lexical items). When new lexical items are learned, they conform to the
phonological and grammatical constraints that have already been acquired. Neufeld and
Schneiderman divide the development of ultimate attainment of intonation into primary
and secondary competence. Primary competence describes the typical 5-year-old, who
has acquired the intonational phonology of the L1 and speaks without an accent, but
whose use of intonation is less sophisticated than that of an adult. Children at this stage
can distinguish between sentence types (such as declarative, imperative, or interrogative)
and can use intonation to signal a variety of affective states. Secondary prosodic
competence develops into adolescence and adulthood and enables the speaker to signal
more subtle affective states and to use language in a more socially appropriate manner.
Thus, the building blocks of intonational phonology appear to be in place by about age 5,
with the ability to use new and more complicated grammatical constructions emerging as
speakers move into adulthood (Berko-Gleason, 1993). Neufeld-Kaiser examined children’s
ability to identify changes in a pronoun’s referent which were signaled by stress alone.
The children demonstrated sensitivity to the stress to near adult level by age 5 when the
contrastively stressed pronouns were part of a transitive context (e.g., “The turtle jumps
over the frog, and then the chicken comes along and jumps over HIM.”). The children
(and also the adults) did not perform as well, however, when the target pronouns were
part of a dative construction (e.g., “Give the turtle to the frog; now give ‘im to the
chicken; now give HIM to the lion.”). Neufeld-Kaiser concluded that their difficulty was
due to limits on their working memory capacity, not to an inability to use the stress
information properly (Neufeld-Kaiser, 1995).[4] A comparison between prosodic and
syntactic development may weaken Neufeld and Schneiderman’s assessment of prosodic
development, however. Many syntactic constructions are not produced by native speakers
until well into adolescence, yet some researchers claim that syntactic competence is in
place in early childhood (e.g., Slobin, 1992). Are later changes in use of syntax, such as
the production of new constructions, changes of degree and not kind? Is the ability to
signal subtle distinctions with intonation any different than the ability to use syntax to
indicate more subtle grammatical relationships? Intonation is expressed by the
manipulation of continuous variables (such as f0 or amplitude), whereas the expression of
different syntactic constructions is discontinuous in nature (such as morphology, or the
presence, absence, or relative location of words). The continuous nature of the
components of intonation should not be misinterpreted to indicate that development of
the intonation system is one of degree and not kind. The underlying structure of
intonation behavior may well be different in kind from intonation behavior used just a
few years prior, even though the signal expressing the intonation appears to have changed
by degree. The lack of sophisticated grammatical or prosodic structures may indicate that
they have not yet been acquired; or it may only indicate a lesser amount of confidence
about using them on the part of the speaker (i.e., the constructions are often avoided even
though they are acquired).
[4] Neufeld-Kaiser refers to Daneman and Carpenter’s (1980) working memory measurement for language,
the reading span test, which showed considerable differences between individuals’ ability to recall
linguistic information. Also, MacDonald, Just, and Carpenter (1992) found that individuals with low
reading span scores responded more slowly to stimuli and interpreted ambiguous sentences incorrectly.
6.2 The acquisition schedule of linguistic subsystems in L1
The acquisition of intonation appears to have already begun within days of birth,
according to the research cited in section 6.0. Stark (1980) identifies five stages of infant
vocalizing preceding meaningful language use. The final of these stages is a period of
non-reduplicated babbling (9-18 months). Stark claims that up to and during this period,
children can use the intonation and stress patterns of their ambient language, but not the
segmental phonology. According to Jusczyk et al. (1993), perceptual sensitivity to the
phonotactics of the native language appears at 6-9 months of age, but Stark maintains that
at no point during the five stages of vocalizing are the segments produced by the children
limited to those of the linguistic environment (1980). In terms of syntax, the acquisition
of the native syntax is considered complete by about the age of four (Slobin, 1992).
However, the productive grammar at that time isn’t replete with the variety and
functional appropriateness of the adult language; it continues to develop (Kuczaj, 1999).
These observations of L1 acquisition could be interpreted to show that these separate
systems within the language each have their own schedule of acquisition. Productive
ability to use the suprasegmental phonology in L1 appears to emerge
earliest, followed by segmental and articulatory features of pronunciation, and then
syntax, with each of these facets of language acquired within some limited amount of
time by the child.
6.3 Prosodic bootstrapping?
Infants’ early sensitivity to their native language prosody may act as a bootstrap for the
acquisition of syntactic structures. Basic prosodic competence would have to be in place
quite early in life in order to serve as a bootstrap to the acquisition of syntax or
information structure. Cutler et al. (1997) point out that prosody and syntax are not
isomorphic; however, after a thorough review of research on prosody in the computation
of syntactic structure, they conclude that listeners prefer syntactic analyses which are in
accord with the prosodic information they hear. Morgan and Demuth’s volume on
prosodic bootstrapping suggests that prosody plays some role in assisting children in the
acquisition of other aspects of language (1996). Jusczyk and Kemler Nelson (1996)
suggest that prosodic groupings may provide a perceptual precursor to the discovery of
syntactic units. Venditti, Jun, and Beckman (1996) claim that prosody serves as a cue to
information structure rather than syntactic structure. Through a comparison of the
mappings between prosodic structure and linguistic categories in Japanese, Korean, and
English, they found a better fit between prosody and pragmatics than between prosody
and syntax. Dresher (1996) and Fisher and Tokura (1996) also conclude that prosody
does not map transparently onto syntax. Fernald and McRoberts (1996) point out that
indirect evidence for prosodic bootstrapping exists, but no direct evidence exists. Only
the results of manipulating the syntax-prosody relationship during the actual language
acquisition process could provide unequivocal evidence for or against the prosodic
bootstrapping hypothesis (p. 365). They also bring up a problem with the way prosodic
bootstrapping has been investigated. Most studies evaluate the prosody-syntax
relationship by estimating the probability of a prosodic cue given a particular syntactic
structure [p(cue|structure)]. With this estimation, the probability could appear high even
when the same cue can be used to signal other constructions. The probability estimation
ought actually to be based on the probability of the syntactic structure given the
occurrence of the prosodic cue [p(structure|cue)]; that is, the probability that the prosodic
cue led to the correct conclusion regarding the syntactic structure (Fernald and
McRoberts, 1996). A dynamic systems perspective is proposed by Hirsh-Pasek and
Tucker (1996) for the role prosody plays in language acquisition. They view prosody as
one of a set of mutually informing systems of language input that “coordinately enable
the induction of grammar” (p. 464). The weighting of each system can change over time
as the different systems interact in the acquisition process.
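The difference between the two estimates can be made concrete with Bayes' rule. The following Python sketch is illustrative only: the function name and every probability value in it are hypothetical and are not taken from Fernald and McRoberts (1996) or from any other study. It shows that a cue which almost always accompanies a given structure may still be a weak predictor of that structure when the same cue also accompanies other constructions.

```python
def p_structure_given_cue(p_cue_given_structure, p_structure, p_cue_given_other):
    """Bayes' rule: p(structure|cue) from p(cue|structure), the base rate
    p(structure), and p(cue|other), the rate at which the same cue occurs
    with any other construction."""
    p_cue = (p_cue_given_structure * p_structure
             + p_cue_given_other * (1.0 - p_structure))
    if p_cue == 0.0:
        raise ValueError("the cue never occurs, so p(structure|cue) is undefined")
    return p_cue_given_structure * p_structure / p_cue


# Hypothetical values, chosen only for illustration.
shared_cue = p_structure_given_cue(0.9, 0.2, 0.6)   # cue also marks other constructions
unique_cue = p_structure_given_cue(0.9, 0.2, 0.0)   # cue marks this structure only
print(round(shared_cue, 3), round(unique_cue, 3))
```

With these made-up values, p(cue|structure) is 0.9 in both cases, yet p(structure|cue) is about 0.27 when the cue is shared with other constructions and 1.0 when it is unique to the structure.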
7.0 The acquisition of intonation in L1 vs. L2
The path of L1 acquisition may be characterized as a sequence of sensitive period onsets
(i.e., the acquisition of one linguistic subsystem must begin before the next one can
begin, possibly implying a sequential bootstrapping mechanism). Infants demonstrate
sensitivity to intonation very early, but they begin developing segmental phonology
before they have attained a level of competence in their intonational phonology. If L1 is
subject to maturational or age-based constraints on acquisition, is it the case that L2 is
subject to a similar maturational schedule? In review, we find evidence in the literature
that syntax acquired after puberty often reveals itself as nonnative. (Note Scovel’s [1988]
finding that 47% of non-native speakers were identified as non-native according to their
writing samples as compared to 97% by their speech samples. It is possible that the
writing task afforded more time and/or attention to the grammar than an on-demand
spoken task may have, enabling L2 learners to self-correct; therefore, the spoken syntax
may well be strongly nonnative.) There is also evidence that pronunciation acquired after
ages as young as 6 or 7 will generally retain a foreign accent. Where does intonation fit
into the L2 acquisition schedule?
In L1, infants seem to begin to acquire their native intonation extremely early; is it the
case that to acquire nativelike intonation in an L2, learning must also begin before some
early point in maturation? The evidence of differential age-related acquisition constraints
for L2 pronunciation and syntax, combined with observations that intonation is generally
the most difficult aspect of an L2 to produce with nativelike mastery, suggests that in
order for non-exceptional L2 learners to achieve nativelike intonation, they would have
to begin their exposure to and use of the L2 at an earlier age than would be necessary to
achieve nativelike segmental outcomes. The traditional analysis of phonological
acquisition must be subdivided into suprasegmental and segmental phonological
acquisition to account for both L1 acquisition behavior and age-related pronunciation
attainment in L2. This reanalysis mandates a new research agenda to explicitly and
separately address segmental vs. suprasegmental acquisition.
8.0 New research agenda for L2 phonological acquisition: Independent
investigations of segmental vs. suprasegmental acquisition
In order to characterize the acquisition of segmental vs. suprasegmental aspects of L2
phonology, new research assessing their separate contributions to L2 pronunciation must
be done. Assessing segmental and suprasegmental aspects of L2 acquisition separately is
a nontrivial task. Finding L2 learners who only show non-native pronunciation along one
of those dimensions is probably impossible. It is also the case that research into
intonation is relatively new, and it’s not entirely clear how suprasegmental behavior
would be evaluated and classified in terms of nativeness.
A common strategy in SLA is to rely upon native speaker judgments to determine the
nativeness of speech. In fact, native speaker impressions are important in speech
assessment across disciplines: It is common to find terms such as “acceptability” used
even when referring to the quality of synthetic speech. Native speaker judgments have
helped SLA researchers learn a great deal about the SLA process, as evidenced by many
of the studies cited herein. Thus, the critical requirement of a new research agenda is the
development of a way to collect native speaker judgments of segmental and
suprasegmental aspects of L2 speech separately, and then organize those judgments in
terms of the age-related acquisition experiences of the L2 learners.
8.1 Experimental isolation of intonation
Modern signal processing technology allows us to manipulate features of pronunciation
independently, keeping other features essentially unchanged. For example, the Pitch
Synchronous Overlap and Add (PSOLA) resynthesis technique can be applied to a speech
stream whose f0 values have been altered. PSOLA and other types of resynthesis can be
done using Praat, a phonetic analysis system developed by Boersma and Weenink (2000).
By using this kind of technology, SLA researchers can independently model and
synthesize foreign-accented segmental or suprasegmental features while preserving all
other aspects of an original, native speech stream. The resynthesized speech samples can
then be presented to native speakers for judgment of degree of foreign accent. With this
technology, different aspects of pronunciation can be independently evaluated for their
contribution to the perceived foreign accent in the speech. The judgments will reflect
only the contribution of the non-native features specifically manipulated into the signal
since all other aspects of the native speech are held constant. The reverse process can also
be done, using nonnative speech as a baseline and manipulating different aspects of the
pronunciation to be nativelike (cf. Magen, 1998).
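As a rough illustration of the step that precedes resynthesis, the Python sketch below prepares an altered f0 track for a native utterance by borrowing the contour shape of a second (donor) utterance. It is a minimal sketch under stated assumptions, not a description of Praat or of PSOLA itself: the function names are hypothetical, both contours are assumed to contain voiced frames only, time alignment is a simple linear stretch over the whole utterance (real stimuli would more plausibly be aligned segment by segment), and the donor contour is rescaled to the native speaker's mean f0 so that only the shape changes. The resynthesis of the waveform from the new f0 values is not shown.

```python
def resample_contour(contour, n_frames):
    """Linearly interpolate an f0 contour (Hz) onto n_frames equally spaced points."""
    if len(contour) < 2 or n_frames < 2:
        raise ValueError("need at least two points in and out")
    step = (len(contour) - 1) / (n_frames - 1)
    resampled = []
    for i in range(n_frames):
        pos = i * step
        left = min(int(pos), len(contour) - 2)   # index of the point to the left
        frac = pos - left                        # distance toward the next point
        resampled.append(contour[left] * (1.0 - frac) + contour[left + 1] * frac)
    return resampled


def transplant_contour(native_f0, donor_f0):
    """Return one target f0 value per native frame: the donor's contour
    shape, rescaled to the native speaker's own mean f0."""
    shape = resample_contour(donor_f0, len(native_f0))
    scale = (sum(native_f0) / len(native_f0)) / (sum(shape) / len(shape))
    return [value * scale for value in shape]


# Hypothetical contours in Hz, for illustration only.
target = transplant_contour([200.0, 200.0, 200.0, 200.0], [100.0, 150.0, 100.0])
print([round(v, 1) for v in target])
```

The resulting values would then be supplied to a resynthesis procedure of the kind described above, leaving the segmental content of the native recording untouched.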
Nonnative speech can also be low-pass filtered to reduce segmental distinctions and
preserve mainly prosodic features. This would allow for native speaker judgments to
be based mostly on suprasegmental information. This method is not ideal for a variety of
reasons: The degraded signal is highly unnatural, which could affect judgments. Further,
low-pass filtering only reduces segmental distinctions; it does not remove them. The
frequency band up to 400 Hz (a typical low-pass filter used in intonation research)
contains features which redundantly indicate segmental information. For example, low
vowels are typically accompanied by a slight lowering of f0, and high vowels are
typically accompanied by a slight raising of f0. Different segments create more energy in
some frequencies than others, so segments which contain more energy in the lower
frequencies will be better represented in a low-pass filtered sample than other segments.
Furthermore, f0 is indicated across frequencies as harmonics (energy at integer multiples
of the f0). Low-pass filtering may therefore remove some of the critical cues to
f0, which is precisely the aspect of the signal the filtering is intended to preserve.
However, low-pass filtering could be used in conjunction with other tasks to help clarify
the independent contribution of intonation to the degree of foreign accent perceived in L2
learners’ pronunciation.
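The effect of low-pass filtering on the harmonics can be sketched numerically. The Python example below is a minimal illustration under assumptions of my own choosing: a synthetic signal with an f0 of 200 Hz and five equal-amplitude harmonics, an 8000 Hz sampling rate, and a generic windowed-sinc filter with the 400 Hz cutoff mentioned above. It uses only the standard library, it is not the filter implemented in any particular phonetics package, and all function names are hypothetical.

```python
import math


def harmonic_complex(f0, n_harmonics, sr, dur):
    """Sum of equal-amplitude sinusoids at integer multiples of f0."""
    n = int(sr * dur)
    return [sum(math.sin(2 * math.pi * f0 * k * t / sr)
                for k in range(1, n_harmonics + 1)) for t in range(n)]


def lowpass_taps(cutoff, sr, n_taps=201):
    """Hamming-windowed sinc low-pass filter, normalized to unit gain at 0 Hz."""
    fc = cutoff / sr                      # cutoff in cycles per sample
    mid = (n_taps - 1) / 2
    taps = []
    for i in range(n_taps):
        x = i - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (n_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def apply_filter(signal, taps):
    """Convolution, keeping only samples where the filter fully overlaps the signal."""
    m = len(taps)
    return [sum(taps[j] * signal[i + m - 1 - j] for j in range(m))
            for i in range(len(signal) - m + 1)]


def amplitude_at(signal, freq, sr):
    """Amplitude of the sinusoidal component of the signal at freq."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * i / sr) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / sr) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n


SR, F0 = 8000, 200.0                      # assumed values, for illustration only
filtered = apply_filter(harmonic_complex(F0, 5, SR, 0.2), lowpass_taps(400.0, SR))
amps = [amplitude_at(filtered, F0 * k, SR) for k in range(1, 6)]
for k, a in enumerate(amps, start=1):
    print(f"harmonic {k} ({F0 * k:.0f} Hz): relative amplitude {a:.2f}")
```

In this sketch the first harmonic passes essentially intact, the second (which falls at the cutoff) is attenuated, and the third and higher harmonics are almost entirely removed, so most of the harmonic structure that redundantly signals f0 is no longer present in the filtered signal.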
8.2 Examining L1-L2 pairings
The two experimental methods mentioned in section 8.1 will also allow investigation into
the pronunciation outcomes of specific L1-L2 language pairs to determine what role, if
any, the specific L1 plays in contributing to the perception of foreign accent. Johnson and
Newport (1989) argued that their work on a maturational model of L2 (based on a
grammaticality judgment task involving Chinese and Korean L2 learners of English)
should extend to any L1-L2 pairing. However, Birdsong and Molis’ (2001) replication of
that study, involving Spanish learners of English, suggested that the L1-L2 pairing may
be a factor in eventual L2 acquisition. Piske et al. (2001) also question whether the
relative contribution of segmental vs. suprasegmental phonology to foreign accent varies
as a function of the L1-L2 pairing. Using phonetic analysis technologies, the
pronunciation features of an L1 speech stream can be mechanically adjusted to match
those of various L2s. The resulting resynthesized speech samples could then be presented
for native speaker judgments to determine the relative influence of the L1 on different
L2s. The experiment can be repeated using different L1 speech streams to show the
relative influence of different L1s on the same L2.
8.3 Modeling intonation
Background preparation for the kind of work envisioned in sections 8.1 and 8.2 includes
a thorough understanding of the intonation systems of the L1 and L2. Research is
currently underway to formulate phonological and phonetic representations of many
languages’ intonation systems. The phonological Tones and Break Indices (ToBI)
intonation model has been applied to at least 10 languages to date, resulting in
phonological models of the categorical possibilities of intonation behavior in each
language. Phonetic models such as Tilt (Taylor, 2000) and INTSINT (Campione et al.,
2000) are being used crosslinguistically to evaluate the acoustic outcomes of intonation
behavior. These
representations can be used to identify the relevant aspects of the foreign-accented
intonation to model on the otherwise native speech signal.
It should be noted here that separate aspects of pronunciation may actually have a
synergistic effect, amplifying the perceived degree of foreign accent when combined.
Alternatively, the naturalness of a full speech signal may override the nonnativeness to some degree.
Therefore, it is important to evaluate the pronunciation of unaltered L2 speech alongside
examinations of any single component of pronunciation.
8.4 Mapping L2 intonation to L2 acquisition experiences
Finally, any methods used must yield measures that can be organized in terms of the SLA
experiences of the speakers after whom each speech sample’s segmental or
suprasegmental features were modeled. Important SLA markers include the age of first
exposure to the L2, the age of arrival in the L2 immersion environment, and the age at
which using the L2 became important or necessary (since age of arrival may not entail
the actual onset of L2 use; see Carmichael, 2000 for a discussion).
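One way such measures might be organized is sketched below in Python. The sketch is purely illustrative: the record layout, the age-band boundaries, the rating scale, and the ratings themselves are hypothetical placeholders rather than data or design decisions from any study. Each record pairs the age marker of the speaker after whom a stimulus was modeled with the manipulated dimension (segmental or suprasegmental) and one native speaker's accent rating; the function averages ratings within each age band and dimension.

```python
import bisect
from collections import defaultdict
from statistics import mean


def age_band(age, edges):
    """Label the band containing age, e.g. edges [6, 12] give '0-5', '6-11', '12+'."""
    i = bisect.bisect_right(edges, age)
    lower = edges[i - 1] if i > 0 else 0
    if i == len(edges):
        return f"{lower}+"
    return f"{lower}-{edges[i] - 1}"


def mean_rating_by_band(records, edges):
    """Average accent ratings per (age band, manipulated dimension).

    records: iterable of (age_marker, dimension, rating) tuples.
    """
    grouped = defaultdict(list)
    for age, dimension, rating in records:
        grouped[(age_band(age, edges), dimension)].append(rating)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical records: (age of first exposure, dimension, rating on an assumed 1-9 scale).
records = [
    (3, "segmental", 1.0),
    (4, "segmental", 2.0),
    (4, "suprasegmental", 3.0),
    (15, "suprasegmental", 7.0),
]
for key, value in sorted(mean_rating_by_band(records, [6, 12]).items()):
    print(key, value)
```

The same grouping could be repeated with age of arrival or age of onset of regular L2 use as the marker, which would allow the three markers to be compared against the same set of judgments.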
Combining new technological methods such as those outlined here with the well-attested
practice of using native speaker judgments will initiate the examination of L2 segmental
and suprasegmental acquisition separately. Subsequent methodological refinements will
determine whether and how these aspects of L2 phonology are differentially acquired.
