Chapter 5 - Rachaelanne.net

The role of the plateau in the perception of duration, pitch height and prominence 146
Chapter 5 The role of the plateau in the perception of
duration, pitch height and prominence
5.1 Introduction
It has been shown that the plateau may be an important marker of linguistic structure
(Chapter 3) and assist in the process of spoken word recognition (Chapter 4). These
previous results, however, relate to the alignment of the plateau and most likely only to
the end point. Another important question is why speakers ever produce plateaux. There
appears to be no physiological reason why speakers could not rise to a single high point
and then fall immediately back down without sustaining high pitch and forming a plateau.
Part of the reason for the occurrence of plateaux may relate to the results discussed in
Chapter 3, based on the work of Xu (2002), which demonstrate that speakers often do not
change pitch at maximum speed even when they are using an expanded span. It is likely
that speakers do not want to use maximum energy and resources all the time and it may in
some sense be easier to produce a smoothed plateau rather than a sharp peak. It may also
be the case, however, that speakers use the plateau for communicative effect. The three
experiments reported in this chapter investigate how the longer stretch of high pitch in the
plateau might affect perceptual attributes such as perceived pitch, duration or prominence
of the syllable with which the plateau is associated.
It is quite likely that some effect of the plateau will be found on the perception of duration
or pitch height (and therefore prominence). As Lehiste (1970) points out (e.g. p. 36) it is
clear that to some extent suprasegmentals do co-occur and interact. Therefore it is
possible that the different durations of high pitch between a plateau and a peak might
affect the perception of pitch itself or the perception of the duration of the syllable with
which they are associated. This chapter presents three experiments that aim to address
these issues. Specific interactions of variables (or lack thereof) will be dealt with in the
discussions following each experiment.
5.2 Experiment 1 – Perceived duration
5.2.1 Introduction
An initial hypothesis was that a plateau in the contour (as opposed to a peak) might make
a syllable sound longer. This could be the case, for example, if the longer
stretch of high pitch interacts with the perceived duration of the segments giving the
impression that they too are of greater duration.
5.2.2 Method
5.2.2.1 Stimuli
The sentence ‘Anna came with Manny’ was spoken by a female phonetician who was
instructed to produce a falling nuclear accent on the word ‘Manny’. The word ‘Manny’
was cut from this sentence and different versions were created by resynthesis as shown in
Figure 5.1. Versions were created with a maximum frequency of each of 160, 170, 180,
190, 200, 210 and 220 Hz. At each of these seven frequencies two different versions
were created in which the shape of the contour associated with the nuclear tone varied. In
one version the high tone was realised as a 100 ms plateau and in the other as a sharp
peak. In the contours containing a plateau, EP was placed at the same position as the
peak in peak stimuli and the plateau was extended backwards in time. Previous
experiments (specifically that described in Chapter 4) have indicated that the end of the
plateau is most likely to be the point used in spoken word recognition so it is important to
maintain its alignment in order to maintain the naturalness of the stimuli. The rise to the
peak or SP began at the beginning of the word.
Figure 5.1 Stimuli used in Experiment 1. The solid line represents the peak stimuli whilst the dotted
line represents the plateau stimuli
5.2.2.2 Subjects
There were 24 subjects, of whom fifteen were female and nine were male. Their ages ranged from
18 to 31 with a mean of 21.7. Fourteen were proficient musicians having attained grade
seven or eight on one or more instruments. Six had some musical training but had
attained a lower level of proficiency and four had never received musical training. All
were students at the University of Cambridge and were paid for their time.
5.2.2.3 Setup and instructions
Testing took place individually in a sound-treated room in the same session as testing for
Experiment 3. The experiment was presented using shell scripts running on a Silicon
Graphics workstation. Stimuli were presented through headphones. For each maximum
frequency all possible pairwise combinations of stimuli were created. Thus, subjects heard
each version of each item paired with itself (14 pairs), each peak as the first item paired
with the same frequency plateau (seven pairs) and each peak as the second item paired
with the same frequency plateau (seven pairs). Pairs of items were pseudo-randomised
with the condition that the same stimulus version did not occur in two successive pairs.
Subjects were instructed to decide which member of each pair was the longer word and
press the corresponding button (marked 1 and 2) on the keyboard. They were warned that
they might find the task very difficult and were told that they should try and give a
response for each pair even if they thought they were guessing. They were also told to
concentrate only on the length of the presented words and to ignore any other difference,
such as differences in perceived pitch or intensity. Before the main experiment, subjects
took part in a practice session consisting of items with the high tone resynthesised at
frequencies not used in the main experiment. In this practice session subjects were
exposed to one pair of identical peak stimuli, one pair of identical plateau stimuli, one
pair where the plateau preceded the peak and one pair where the peak preceded the
plateau.
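The pairing scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions: the original presentation used shell scripts whose details are not given, so the function names, the seed and the rejection-sampling shuffle are hypothetical. The ordering constraint is read here as "two successive pairs may not share a stimulus version".

```python
import random

FREQS_HZ = [160, 170, 180, 190, 200, 210, 220]
SHAPES = ["peak", "plateau"]


def build_pairs():
    """All within-frequency pairs: each stimulus with itself (14 pairs)
    plus peak/plateau in both orders (7 + 7 pairs)."""
    pairs = []
    for f in FREQS_HZ:
        for first in SHAPES:
            for second in SHAPES:
                pairs.append(((f, first), (f, second)))
    return pairs


def shares_stimulus(pair_a, pair_b):
    """True if the two pairs have at least one stimulus version in common."""
    return bool(set(pair_a) & set(pair_b))


def pseudo_randomise(pairs, seed=0, max_tries=100_000):
    """Shuffle until no two successive pairs share a stimulus version.
    Rejection sampling is an assumption, not the documented procedure."""
    rng = random.Random(seed)
    order = list(pairs)
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(not shares_stimulus(x, y) for x, y in zip(order, order[1:])):
            return order
    raise RuntimeError("no valid order found within max_tries")


if __name__ == "__main__":
    pairs = build_pairs()
    print(len(pairs))  # 28 pairs in total
    for first, second in pseudo_randomise(pairs)[:4]:
        print(first, second)
```

Counting the pairs this way reproduces the totals given in the text: 14 identical pairs and 7 pairs in each of the two peak/plateau orders.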
5.2.3 Results
As shown by the black bar in Figure 5.2, the plateau stimuli sounded longer than the peak
stimuli only 49% of the time. Paired t-tests show that this is not significantly different
from the result that would be predicted by chance (t(23) = 0.327, p>0.05). Further
analyses were undertaken to see if the position of the plateau had any effect on the results.
The percentage of responses for ‘plateau sounds longer’ when the plateau was in second
position (51% as shown by the dark grey bar in Figure 5.2) was tested against the
percentage of times the second item was perceived as longer when two identical stimuli
were presented (52%). There is no significant difference between these results (t(23) =
0.145, p>0.05). The same calculation was made for plateaux in first position, which
sounded longer 46% of the time (as shown by the light grey bar in Figure 5.2) against
48% of responses for ‘first item sounds longer’ for pairs of identical stimuli. Again the
result was not significant (t(23) = 0.511, p>0.05). These results demonstrate that a
plateau in the contour does not make syllables sound longer than a peak.
Figure 5.2 Percentage of 'plateau sounds longer' responses overall and separately for the plateau as
the first and second member of the pair
5.2.4 Discussion
It seems that there have not been any previous studies investigating the perceived
duration of syllables or segments when tones of different durations are associated with
them. Nevertheless, a review of the literature seems to suggest that in general the
suprasegmentals duration (or quantity) and frequency do not interact. For example,
Lehiste (1970: 82) states that there is no known evidence that segment length has any
effect on F0 height or that F0 height has any effect on the duration of segments in
production.
In terms of perception, it similarly appears that the realms of frequency and duration are
independent. For example, both Small and Campbell (1962) and Ruhm et al. (1966, cited
in Lehiste 1970) indicate that difference limens for the perception of duration remain the
same regardless of the frequency (ranging from 250 to 5000 Hz in the different studies) of
the tones presented. The results found in the present experiment are obviously different
in nature from these earlier results. Not only are the present results based on speech
stimuli rather than pure tones, they also require the listener to judge the duration of the
syllable when the duration (rather than the frequency) of the associated tone is varied.
Nevertheless, the absence of any effect seems to fit well with the general finding that
frequency and duration are largely independent.
5.3 Experiment 2 – Perceived pitch height
5.3.1 Introduction
A second hypothesis was that the plateau might have an effect on the perceived pitch of a
syllable. Assuming an effect was found it could operate in one of two different ways.
Firstly, a longer plateau could make a syllable sound higher as the high pitch is sustained
for some time. On the other hand, the perception of height could be associated with pitch
dynamism. As we have seen in Chapter 3, accents with a higher maximum frequency not
only have greater pitch excursions but also have shorter plateaux. Therefore, a plateau
could make the syllable sound less high in pitch as there is less pitch movement in a given
period of time.
5.3.2 Method
5.3.2.1 Stimuli
Stimuli for this experiment were based on the utterance ‘came with Manny’ taken from
the sentence ‘Anna came with Manny’ recorded for Experiment 1. This utterance was
resynthesised to create twelve different versions (shown in Figure 5.3) varying in
maximum frequency and shape of the nuclear accent (on ‘Manny’). Versions were
created with maximum frequencies of 160, 180, 200 and 210 Hz¹. At each of these
frequencies, three versions were created which varied in contour shape so that the accent
associated with the word ‘Manny’ was realised as a sharp peak or a plateau of either 50 or
100 ms in duration. As in Experiment 1, plateaux were created by aligning the end of the
plateau at the same place as the peak in peak stimuli and extending the high frequency
backwards in time. This method of creating the plateau was additionally important in this
experiment because, as will be discussed in section 5.4.1, events later in the utterance or
text sound higher than events of the same frequency earlier on due to perceptual
compensation for the effect of declination. So, extending the plateau later in time than
the peak (rightwards) could lead to a false result indicating that plateaux sound higher
than peaks when the real difference is that the accent extends later into the utterance and
is, therefore, perceived as higher in pitch by virtue of its position.
¹ A version with the contour reaching 220 Hz was not created as the speaker did not produce accents that
were this high even in utterances with narrow focus on ‘Manny’.
Figure 5.3 Stimuli used in Experiment 2. The solid line represents peak stimuli, the dashed line
represents 50 ms plateau stimuli, and the dotted line represents 100 ms plateau stimuli
5.3.2.2 Subjects
Subjects were six members of the University of Cambridge. Five were male and one
female. Their ages ranged from 19 to 50 and the mean was 30 years of age. All except
one had a high level of musical training. Two of the male subjects were also subjects in
Experiments 1 and 3.
5.3.2.3 Setup and instructions
At each frequency each version of ‘came with Manny’ was paired with itself. Also, every
version with a peak occurred twice as the first member of the pair and twice as the second
member of a pair when the other member was a 50 or 100 ms plateau of the same
frequency. At each frequency the two durations of plateau were also paired with each
other, with each item occurring once in first position and once in second position. This
made 36 pairs altogether, the combinations being shown in Table 5.1. Although members
of a pair only ever differed in shape and never in frequency, subjects were nevertheless
told to listen to each pair and compare the pitch of the word ‘Manny’ in each utterance.
They were instructed to press one of two buttons to indicate in which member of each
pair (1 or 2) ‘Manny’ sounded higher in pitch.
First Member      Second Member     Type
Peak              Peak              Identical
50 ms plateau     50 ms plateau     Identical
100 ms plateau    100 ms plateau    Identical
Peak              50 ms plateau     Longer second
Peak              100 ms plateau    Longer second
50 ms plateau     Peak              Longer first
50 ms plateau     100 ms plateau    Longer second
100 ms plateau    Peak              Longer first
100 ms plateau    50 ms plateau     Longer first
Table 5.1 Stimuli combinations created at each frequency in Experiment 2
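As a check on the design, the nine combinations in Table 5.1 can be generated at each of the four frequencies and labelled by type. The sketch below is illustrative only; the names `PLATEAU_MS`, `pair_type` and `build_pairs` are hypothetical and do not come from the original experiment scripts.

```python
from collections import Counter

FREQS_HZ = [160, 180, 200, 210]
# Duration of sustained high pitch for each contour shape (a peak has none).
PLATEAU_MS = {"Peak": 0, "50 ms plateau": 50, "100 ms plateau": 100}


def pair_type(first, second):
    """Classify a pair by which member has the longer stretch of high pitch."""
    d_first, d_second = PLATEAU_MS[first], PLATEAU_MS[second]
    if d_first == d_second:
        return "Identical"
    return "Longer first" if d_first > d_second else "Longer second"


def build_pairs():
    """Every ordered combination of shapes at every frequency."""
    return [
        (f, first, second, pair_type(first, second))
        for f in FREQS_HZ
        for first in PLATEAU_MS
        for second in PLATEAU_MS
    ]


if __name__ == "__main__":
    pairs = build_pairs()
    print(len(pairs))  # 36 pairs altogether
    print(Counter(p[3] for p in pairs))
```

Each type (identical, longer first, longer second) accounts for three of the nine combinations per frequency, so the design is balanced across positions.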
5.3.3 Results
5.3.3.1 Overall
Initial results, shown in Figure 5.4, were calculated for the responses overall. The
number of times that a longer stretch of contour sounded higher than a shorter stretch was
counted and expressed as a percentage of total responses. Thus, these results include the
number of times either length of plateau is perceived as higher in pitch than a peak and
also the number of times a 100 ms plateau sounds higher than a 50 ms plateau. All t-tests
in this experiment are two-tailed, as the hypotheses predicted that any significant
differences could occur in either direction.
Overall, as shown by the black bar, longer stretches of contour sound higher in pitch than
shorter stretches 73% of the time, significantly more often than would be predicted by
chance (t(5) = 4.793, p<0.01). Position of occurrence was taken into account by testing
the percentage of ‘longer stretch of contour sounds higher’ responses in either position
against the percentage of responses favouring that position when identical stimuli were
presented. When position of occurrence is taken into account in this way the overall
result is only significant when the plateau is in the second member of the pair (79% vs.
54%, t(6) = 4.108, p<0.01) as shown by the dark grey bar, and not when it is in the first
member of the pair (67% vs. 46%, t(6) = 1.806, p>0.05) as shown by the light grey bar.
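For readers who wish to relate a reported t value to its p value, the following is a minimal sketch that integrates the Student t density numerically using only the standard library. It is not the analysis software used for the experiment, which is not specified; the function names and the choice of Simpson's rule are assumptions made for illustration.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, n=2000):
    """Two-tailed p value, using Simpson's rule on [0, |t|] (n must be even)."""
    upper = abs(t)
    h = upper / n
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # probability mass between 0 and |t|
    return 1 - 2 * central_half


if __name__ == "__main__":
    # Overall result reported above: t(5) = 4.793
    print(two_tailed_p(4.793, 5))
```

Applied to the overall result, t(5) = 4.793, the function should return a value below 0.01, in line with the significance level reported.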
Figure 5.4 Results overall and separately for the longer stretch of contour in first and second position
5.3.3.2 By frequency
Separate analyses were conducted on results at each frequency to see if the overall
significant result holds. Again these results include comparisons of peaks and both
lengths of plateau and also comparisons of the two different lengths of plateau. Analysis
by frequency reveals that the longer stretches of contour sound higher than the shorter
stretches significantly more often than would be predicted by chance for stimuli of 160
Hz (67%, t(6) = 3.873, p<0.01), 200 Hz (83%, t(6) = 5.477, p<0.01) and 210 Hz (67%,
t(6) = 2.705, p<0.05). At 180 Hz however, although the plateau sounds higher 72% of
the time, the result failed to reach significance (t(6) = 2.390, p = 0.062). These results are
shown in Figure 5.5.
Figure 5.5 Results shown separately for each frequency
5.3.3.3 By shape
The final set of results compares the different contour shapes to each other. These results
include responses at every frequency and are shown in Figure 5.6. Overall, the 50 ms
plateau sounds higher than the peak 77% of the time (t(6) = 7.050, p<0.01) and the 100
ms plateau also sounds higher than the peak 77% of the time (t(6) = 4.540, p<0.01).
There is no significant difference between responses when subjects compare the 100 ms
and 50 ms plateau, however, with 65% of responses for ‘100 ms plateau sounds higher’
(t(6) = 1.941, p>0.05).
Figure 5.6 Results shown separately for each combination of contour shapes
5.3.4 Discussion
The results suggest that there is an effect of contour shape on the perception of pitch
height but only in certain circumstances. Firstly, the effect is only reliably found when
the longer stretch of contour is in second position presumably because, as discussed in
more detail below, listeners will normalise for declination and therefore the perceived
pitch height of accents later on in a text or utterance will be increased. Thus, the
advantage to the perception of pitch height caused by the longer stretch of contour in first
position is to some extent counteracted by the positional advantage of the shorter stretch.
When the longer stretch is in second position it enjoys both advantages. It is noteworthy
that when the shorter stretch is in second position it does not sound higher in pitch than
the longer stretch (there is no significant difference) indicating that the effect of length is
more important than the order effect.
The general result seems to hold at every frequency but there is one strange result in
that the finding is of only borderline significance at 180 Hz. An investigation of the
individual subject responses suggests that this is due to one subject who thought the
longer stretches of contour sounded higher in only 17% of cases. This was an unusual
finding even for this particular subject who consistently perceived longer stretches of
contour as higher in pitch at all the other frequencies. At 180 Hz two other subjects
responded at 50%, two others at 67% and two at 100%, suggesting that in general the
overall finding holds at this frequency too but does not reach significance due to the
unusual responses of a single subject.
Finally, it is clear that there is no significant difference in perceived pitch between the
two lengths of plateau. This suggests that there is a categorical effect of peak versus
plateau rather than a gradient effect of plateau length. At first sight it would appear that
two explanations could fit the data, one based on integration and one on temporal
smoothing. An explanation based on integration would suggest that the listener integrates
the entire area under the pitch curve when making judgements about pitch height.
Therefore plateaux, which, as shown in Figure 5.7, have a greater area under the curve
than peaks of the same frequency, would sound higher in pitch. An explanation based on
temporal smoothing would suggest that listeners do not extract pitch at every point in
time, as shown in Figure 5.8, and therefore peaks may be perceived as less high than they
actually are due to the brief amount of time spent at the highest frequency.
Figure 5.7 The greater area under the curve of a plateau than under a peak of the same height
Figure 5.8 A peak perceived as being of a lower frequency due to temporal smoothing in the auditory
system
The two explanations make different predictions about what will happen as plateaux get
increasingly longer. The integration explanation predicts that the perceived pitch will
continue to increase as the area under the curve increases. The smoothing explanation
predicts that this will not be the case. The lack of a perceptual difference between the two
plateau durations found in the present experiment shows that perceived pitch does not
continue to increase in a gradient fashion and therefore supports the smoothing
explanation. The exact physiological mechanism behind such an explanation will be
discussed in detail in section 5.5.1.
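The contrast between the two explanations can be made concrete with a toy calculation. The sketch below is an illustration under strong assumptions and not a model of the auditory system: the contours are piecewise linear, sampled every millisecond, with a hypothetical 120 Hz baseline, 150 ms rise and fall, and a 31 ms moving-average window standing in for temporal smoothing. None of these values comes from the experiment.

```python
def contour(plateau_ms, f_max=200.0, f_base=120.0, ramp_ms=150):
    """Rise-plateau-fall contour sampled at 1 ms; plateau_ms=0 gives a sharp peak.
    All parameter values are illustrative assumptions."""
    rise = [f_base + (f_max - f_base) * i / ramp_ms for i in range(ramp_ms)]
    top = [f_max] * (plateau_ms + 1)  # a peak touches f_max for one sample only
    return rise + top + rise[::-1]


def area_above_baseline(samples, f_base=120.0, step_s=0.001):
    """Integration account: area between the contour and the baseline (Hz * s)."""
    return sum(v - f_base for v in samples) * step_s


def smoothed_max(samples, window=31):
    """Smoothing account: highest value of a moving average over the contour."""
    return max(
        sum(samples[i:i + window]) / window
        for i in range(len(samples) - window + 1)
    )


if __name__ == "__main__":
    for plateau_ms in (0, 50, 100):
        c = contour(plateau_ms)
        print(plateau_ms, area_above_baseline(c), smoothed_max(c))
```

Under these assumptions the area keeps growing with plateau duration, whereas the smoothed maximum is reduced for the peak but identical for the 50 and 100 ms plateaux, since both plateaux are longer than the smoothing window. This is the pattern of predictions that distinguishes the two accounts in the text.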
5.3.4.1 The relation of the present findings to early
psychoacoustic studies
Overall, the results are largely in support of much of the psychophysical literature from
the 1940s and 1950s. For example, Doughty and Garner (1947 and 1948) specifically
investigate the pitch characteristics of short tones. Doughty and Garner (1947)
demonstrate that some pitch percept is available even in very short tones. The
‘click-pitch’ threshold is considered to be the shortest duration at which some pitch
identification is possible from a tone. Doughty and Garner (1947) demonstrate that this
threshold is approximately 11 ms for tones of 250 Hz. However at this ‘click-pitch’
threshold even reasonably good pitch discrimination is not possible. Therefore, Doughty
and Garner (1948) investigate how the perceived pitch of a tone changes as duration is
shortened. In one experiment subjects were played two tones and asked to alter the
frequency of the second, which ranged from 6 to 200 ms in duration, until it matched the
frequency of the first. Results show that at each frequency tones tend to be perceived as
lower in pitch when they are of shorter duration. Specifically, at 250 Hz (the frequency
closest to the speech in the present experiment) the pitch change was estimated to be –4%
when the tone was 6 ms in duration (Doughty and Garner 1948: 484). Pitch perception
was considered to be poor at this duration as it is close to the ‘click-pitch’ threshold
(Doughty and Garner 1948: 490). There was still some loss of pitch (approximately 1 or
2%) for tone durations as long as 25 ms but at all longer durations pitch remained
relatively constant.
In many respects these results fit well with the results of the experiment reported here.
Firstly, the direction of the change is the same; tones sound lower in pitch at shorter
durations. Secondly, there is not a gradient effect of tone duration as the drop in
perceived pitch occurs only at the shortest durations. Thirdly, the main difference in
perceived pitch occurs below 25 ms in accordance with the present finding that there is no
significant difference between the perceived pitch of plateaux of 50 and 100 ms in
duration. It is however important to remember that Doughty and Garner’s (1948)
experiment involved monotone stimuli. It seems likely therefore that the findings result
from a different mechanism to the temporal smoothing hypothesis suggested to explain
the results of the present experiment. The temporal smoothing hypothesis deals with
stimuli that change rapidly in pitch and makes different predictions for some types of
stimuli. For example, the temporal smoothing hypothesis suggests that a low plateau
would sound lower than a well-defined trough as the stimulus stays low for longer giving
the auditory system a greater chance to perceive the lower pitch. Doughty and Garner’s
(1948) results from monotone stimuli suggest that longer stimuli will always sound higher
in pitch than shorter stimuli of the same frequency. Therefore, although the results of the
present experiment are very similar in some respects to those in the early experiments it is
important to remember that they are likely to be based on different auditory mechanisms.
5.4 Experiment 3 - Perceived pitch height and prominence
5.4.1 Introduction
Experiment 2 demonstrated that a plateau in the intonation contour does make syllables
sound higher in pitch than a peak of the same frequency. It is therefore possible that a
plateau will also make a syllable sound more prominent. Rietveld and Gussenhoven
(1985) investigate the relationship between pitch excursion size and prominence. The
authors resynthesised natural rise-falls in sentences in Dutch and varied the size of the
pitch excursion by either increasing the maximum frequency of the accent by 1.5 or 3
semitones or decreasing it by 1.5 semitones. These resynthesised sentences were played
to Dutch listeners who judged which of the two accents in each was the most prominent
and indicated how certain they were about their choice. The results show that greater
pitch excursions are perceived as more prominent and that listeners are sensitive to
changes as small as 1.5 semitones.
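Because excursion sizes are given here in semitones while the stimuli in this chapter are specified in Hz, it may help to recall the standard conversion: the interval between two frequencies is 12 times the base-2 logarithm of their ratio. The sketch below applies it; the 200 Hz reference is a hypothetical value chosen for illustration and is not taken from Rietveld and Gussenhoven (1985).

```python
import math


def semitones_between(f1_hz, f2_hz):
    """Interval from f1 to f2 in semitones (positive if f2 is higher)."""
    return 12 * math.log2(f2_hz / f1_hz)


def shift_by_semitones(f_hz, semitones):
    """Frequency reached by moving f_hz up (or down) by a number of semitones."""
    return f_hz * 2 ** (semitones / 12)


if __name__ == "__main__":
    reference_hz = 200.0  # hypothetical accent maximum, for illustration only
    for step in (-1.5, 1.5, 3.0):
        print(step, round(shift_by_semitones(reference_hz, step), 1))
    # Size in semitones of a 10 Hz step at the same reference
    print(semitones_between(reference_hz, reference_hz + 10))
```

Since the scale is logarithmic, a fixed step in semitones corresponds to a larger step in Hz at higher reference frequencies.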
It is not, however, just the size of the pitch excursion that affects listeners’ judgements of
pitch and prominence. Pierrehumbert (1979) demonstrates that the position of an accent
in the utterance will also have an effect. Pierrehumbert (1979) resynthesised a string of
nonsense syllables so that the maximum frequency of the second of two stressed syllables
was increased and decreased by small increments. Subjects judged whether the first or
second accent in each syllable string sounded higher in pitch. The results indicated that
when the two accents sounded equal in pitch the second accent was actually about 10 Hz
lower in frequency.
These results are interpreted as reflecting “normalisation for declination” (Pierrehumbert
1979: 363). Declination, or the tendency for pitch to drift downwards over an
intonational phrase, means that peaks later in the utterance will, in general, be less high
than those earlier on. It is suggested by Pierrehumbert (1979) that the listener normalises
for the expected slope of declination and therefore assumes that peaks later on are ‘worth
more’ as they are further away from the speaker’s (declining) baseline than peaks of the
same frequency earlier on in the utterance.
Further experimental results from the literature suggest that many aspects of the contour
may influence the degree to which listeners normalise for declination and pitch height.
For example Gussenhoven and Rietveld (1998) resynthesised a female voice so that the
formants were appropriate for either an average male or female speaker. In a similar task
to those described above, listeners were asked to judge the prominence of accents within
a sentence. Results showed that subjects judged accents of the same frequency to be
more prominent when the formants were appropriate for a male than when they were
appropriate for a female, suggesting that listeners make a judgement about the speaker’s
natural pitch range and assign prominence based on the relation of an individual accent to
this estimate. So, in this case, the male-sounding voice is expected to have a narrower
pitch range and therefore accents are assumed to be further above the baseline, and
therefore more prominent, than accents of equal frequency in the speech of the female-sounding voice.
The studies described in this section are of importance for two reasons. Firstly, they
indicate that higher accents, those with a greater pitch excursion, are usually interpreted
as more prominent and this suggests that the plateau, which, as we have seen, increases
the perception of height, may also affect judgements of prominence. Secondly, the
method employed in these studies will allow us to judge the effect of the plateau on
perceived prominence. If the plateau is included as a variable in an experiment similar to
those cited above, we should expect syllables with a plateau in the contour to be judged as
more prominent than syllables with a sharp peak of the same frequency. The hypothesis
is, therefore, that two accents will sound of equal pitch when the second is at a lower
frequency if it is realised as a plateau rather than as a peak.
Figure 5.9 Accents will sound of equal pitch if the second accent is realised as a plateau at a lower
frequency
5.4.2 Method
5.4.2.1 Stimuli
The sentence ‘Anna came with Manny’, used in the previous two experiments, was
resynthesised so that the nuclear accent (on ‘Manny’) varied in both frequency and shape.
Frequency took seven equally spaced values between 160 and 220 Hz whilst at each of these
frequencies the shape of the contour was either a sharp peak or a plateau of 100 ms in
duration. As in previous experiments the plateau was extended backwards in time and the
rise began at the beginning of the word ‘Manny’. In each case the rest of the utterance
was unmodified with the frequency of ‘Anna’ remaining at 224 Hz. Figure 5.10 is a
schematic diagram of the different pitch contours created.
Figure 5.10 Schematic representation of the pitch contour in different versions of the sentence
(accents on ‘Anna’ and ‘Manny’ labelled)
5.4.2.2 Setup and instructions
Subjects heard each of the sentences in a pseudo-random order (so that no frequency or
shape occurred in two successive presentations) and for each were asked to compare the
accents on ‘Anna’ and ‘Manny’. All subjects heard the same sentences in the same order
but twelve were asked which accent sounded higher in pitch whilst the remaining twelve
were asked which accent was more prominent. They registered their choice by pressing
one of two buttons labelled ‘A’ and ‘M’. This experiment was run after the experiment
presented in this chapter as Experiment 1 using the same setup as described there.
5.4.2.3 Subjects
Subjects were the same 24 who served as subjects in Experiment 1. The twelve who
judged pitch height were six men and six women aged between 18 and 31 (mean = 21.9).
Seven were proficient musicians, four had some musical training whilst one was not
musical. The twelve who judged accent prominence were three men and nine women
aged between 20 and 25 (mean = 21.3). Seven were proficient musicians, two had some
training and three were not musical.
5.4.3 Results
Probit analysis identified the point of subjective equality in each stimulus series (peak and
plateau) for each subject. The point of subjective equality is the frequency of ‘Manny’ at
which the accents on ‘Anna’ and ‘Manny’ sound equal in prominence or pitch, so that
subjects are effectively guessing when they respond.
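The idea behind locating a point of subjective equality can be sketched as follows. This is a simplified stand-in for probit analysis, assuming a least-squares line fitted to z-transformed response proportions rather than the maximum-likelihood fit a statistics package would normally perform; the exact procedure used in the experiment is not specified. The response proportions in the example are invented purely for illustration and are not the experimental data.

```python
from statistics import NormalDist


def probit_pse(freqs_hz, prop_manny, eps=0.01):
    """Estimate the frequency at which 'Manny' is chosen 50% of the time.

    Proportions are converted to z-scores (the probit transform), a straight
    line is fitted by least squares, and the PSE is where that line crosses z = 0.
    Proportions of exactly 0 or 1 are clipped to eps / 1 - eps (an assumption).
    """
    unit_normal = NormalDist()
    z = [unit_normal.inv_cdf(min(max(p, eps), 1 - eps)) for p in prop_manny]
    n = len(freqs_hz)
    mean_f = sum(freqs_hz) / n
    mean_z = sum(z) / n
    slope = sum((f - mean_f) * (zi - mean_z) for f, zi in zip(freqs_hz, z)) / sum(
        (f - mean_f) ** 2 for f in freqs_hz
    )
    intercept = mean_z - slope * mean_f
    return -intercept / slope


if __name__ == "__main__":
    freqs = [160, 170, 180, 190, 200, 210, 220]
    # HYPOTHETICAL proportions of 'Manny' responses for one subject and one series
    invented_props = [0.05, 0.10, 0.25, 0.45, 0.70, 0.85, 0.95]
    print(probit_pse(freqs, invented_props))
```

A lower estimate for the plateau series than for the peak series would correspond to the pattern tested in this section: the plateau needs a lower frequency to sound equal to the accent on ‘Anna’.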
For subjects asked about height, there were more ‘Manny’ responses for plateau than
peak stimuli at every frequency except the lowest, as shown in Figure 5.11. In addition,
the mean point of subjective equality (roughly equivalent to 50% on the graph) was at a
higher frequency in the peak than in the plateau series. Paired t-tests revealed that this
difference was significant (201 Hz vs. 190 Hz, t(11) = 5.086, p<0.01). The results for
judgements about prominence closely mirrored those for height. As shown in Figure 5.12
there were more ‘Manny’ responses to plateau than peak stimuli at each frequency except
the lowest and the point of subjective equality was higher in the peak than in the plateau
series (209 Hz vs. 196 Hz, t(11) = 3.623, p<0.01).
Comparisons were also made between the point of subjective equality found for height
and prominence judgements. There was no significant difference between results gained
by the height and the prominence task for either the peak (201 Hz vs. 209 Hz, t(11) =
1.966, p>0.05) or the plateau stimuli set (190 Hz vs. 196 Hz, t(11) = 1.384, p>0.05)
suggesting that both groups of subjects performed in the same way and that responses did
not differ depending on the instructions (pitch or prominence) given.
Figure 5.11 Percentage of 'Manny' responses at each frequency in the height judgement task
(x-axis: frequency of ‘Manny’, 160 to 220 Hz; y-axis: % of ‘Manny’ responses, 0 to 100; series: peak and plateau)
Figure 5.12 Percentage of 'Manny' responses at each frequency in the prominence judgement task
(x-axis: frequency of ‘Manny’, 160 to 220 Hz; y-axis: % of ‘Manny’ responses, 0 to 100; series: peak and plateau)
5.4.4 Discussion
These results suggest that the findings for pitch height in Experiment 2 can indeed be
extended to prominence judgements: in the present experiment a plateau in the contour
makes the syllable sound both higher and more prominent. They also suggest that, in the
present experiment at least, pitch height is a close correlate of perceived prominence, as
the results of the two tasks are so similar.
Although not significant, the results suggest that the point of subjective equality is higher
in both stimulus sets when subjects are asked about prominence than when they are asked
about pitch height. This trend is the opposite of that found by Terken (1999). However,
both the stimuli and the task were different in Terken’s experiment. Terken used a string
of monosyllables and asked subjects to adjust the frequency of the second peak until it
was of equal pitch or prominence to the first peak. In addition, Terken (1999: 1775)
states that the timing of the accents may have been somewhat unnatural, both first and
second accents being timed early within the syllable. It is possible that the non-linguistic
nature of the stimuli removed the close association between pitch and prominence found
here and also that the unnaturalness of the timing of accents led to the different result.
The results of the present experiment also suggest that in both tasks there is a difference
of a little over 10 Hz between the point of subjective equality in peak and plateau stimuli
(11 Hz for height and 13 Hz for prominence judgements). This result suggests that the
effect of the plateau is to raise the perceived pitch of an accent by roughly 10 Hz. This is
broadly consistent with the results of Doughty and Garner (1948) who, as
discussed in section 5.3.3, found that at 250 Hz the pitch drop for the shortest tones
corresponded to 4% or 10 Hz. However, as suggested above, the mechanisms behind the
two results are probably rather different as Doughty and Garner worked with monotone
stimuli rather than dynamic pitch contours.
The results from the peak stimuli alone suggest that the point of subjective equality
occurred at a lower frequency than in the results presented by Pierrehumbert (1979). In
Pierrehumbert’s (1979) results, subjective equality occurred when the second accent was
10 Hz lower than the first. In the present experiment the accent on ‘Anna’ was 224 Hz in
frequency and the point of subjective equality was at 201 Hz (a difference of 23 Hz) for
height judgements and 209 Hz (a difference of 15 Hz) for prominence judgements. This
slight difference in the details of the results may be explained on the basis of several
factors: the present stimuli consist of real words rather than reiterant speech, the quality
of the resynthesis produced by newer systems is likely to be better, and the different
speakers probably have different declination slopes.
5.5 General Discussion
5.5.1 The physiological mechanisms underlying temporal smoothing
The results from Experiments 2 and 3 have shown that plateaux sound higher in pitch
(and therefore more prominent) than sharp peaks of the same maximum frequency but
there is no perceptual difference between two plateaux of different durations. As
discussed in section 5.3.3, the categorical rather than gradient effect suggests that a
suitable explanation may be one based on temporal smoothing. In order to understand
more fully the mechanisms behind this temporal smoothing, we must turn to the
physiological processes that underlie pitch perception.
It is generally considered that there are two perceptual mechanisms for extracting pitch
from a signal (see Moore 1997, especially chpt. 5, for a review). The first of these is
known as the place mechanism whereby the perceived pitch is related to the pattern of
excitation on the basilar membrane. Specifically, the basilar membrane, which acts like a
filter bank, vibrates maximally at different places along its length in response to different
frequencies in the signal. Thus, the location of the maximum vibration allows for the
extraction of pitch. Although the place mechanism operates to some extent at all
frequencies it is generally considered to be less useful for complex sounds such as speech
due to the complex excitation pattern that such stimuli produce on the basilar membrane.
The second pitch extraction mechanism is considered to be a temporal mechanism based
on the timing of neural activity in the auditory nerve. The temporal mechanism is
believed to be the main determiner of perceived pitch for speech sounds. As the basilar
membrane vibrates in response to stimuli, a shearing motion is created between it and the
tectorial membrane. This motion displaces the stereocilia of the outer hair cells causing
the inner hair cells to fire and send information to the auditory nerve. Phase locking
occurs as spikes in the activity of the auditory nerve occur regularly at the same phase of
the stimulating waveform. In this way pitch is extracted from the signal, as the intervals
between nerve firings approximate integer multiples of the period of the stimulating
waveform. Temporal mechanisms of extracting
pitch are present over the entire speech range but generally not above 5 kHz as phase
locking cannot occur at such rapid frequencies due to physiological constraints.
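The logic of the temporal mechanism can be illustrated with a deliberately idealised sketch. It assumes that every spike falls on exactly the same phase of the waveform and that spike times are known without jitter, neither of which holds for real auditory nerve fibres; the function name and the spike pattern are hypothetical.

```python
from math import gcd
from functools import reduce

def f0_from_spike_times(spike_times_us):
    """Estimate F0 (Hz) from spike times given in whole microseconds.

    Toy assumption: every spike falls on exactly the same phase of the
    waveform, so each inter-spike interval is an integer multiple of the period.
    """
    intervals = [b - a for a, b in zip(spike_times_us, spike_times_us[1:])]
    # The greatest common divisor of the intervals recovers the period,
    # provided the skipped-cycle counts share no common factor.
    period_us = reduce(gcd, intervals)
    return 1_000_000 / period_us

# A 200 Hz tone has a 5000 microsecond period; the fibre skips some cycles.
period_us = 5000
cycles_with_spikes = [0, 1, 3, 4, 7, 9]
spikes = [c * period_us for c in cycles_with_spikes]
print(f0_from_spike_times(spikes))
```

Even though the fibre in this toy example does not fire on every cycle, the intervals between firings still carry the period of the waveform.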
Even at frequencies below 5 kHz recent work suggests that the phase locking mechanism
may not work well if the frequency of the signal is changing too rapidly. For example,
Sek and Moore (1999) investigate the discrimination of frequency steps linked by glides
of various durations. In this experiment subjects compared the pitch of two sounds. One
sound was a sinusoid of constant frequency and the other consisted of an initial and final
portion of steady frequency linked by a downward frequency glide of between 5 and 500
ms in duration. The duration of all stimuli was 500 ms, so that stimuli with longer glides
had correspondingly shorter steady states before and after the glide. Frequency
discrimination was found to be worse as the glide duration increased. In particular
performance worsened between glide durations of 200 ms and 500 ms. For glides of 200
ms there was still some steady region of frequency whereas for glides of 500 ms there
was no steady state. In a second condition the frequency glide was replaced by a 5 ms
interval of silence. A comparison of this condition and the stimuli containing 500 ms
glides reveals that below 4 kHz performance was better for the stimuli containing silence.
The authors suggest that, at frequencies where phase locking is operating, discrimination
is better when frequency is changing less rapidly.
Gockel et al. (2001) go on to investigate this effect in response to frequency modulated
tones. They obtained pitch matches between unmodulated sinusoids and those with
repeated u-shaped or inverted u-shaped frequency modulation functions. Subjects
adjusted the frequency of the unmodulated tone until its pitch matched that of the
modulated one. Results showed that for u-shaped modulation patterns the stimuli were
matched to sinusoids of lower frequencies than the mean frequency of the modulated
tone. For the inverted u-shaped stimuli matches were made to sinusoids of a higher
frequency than this mean. The authors interpret these pitch shifts as demonstrating auditory
sluggishness in that the portions of the signal where pitch is changing rapidly “receive
less weight in the computation of pitch than portions where the frequency is changing
more slowly” (Gockel et al., 2001: 705) and refer to this effect as “stability-sensitive
weighting” (Gockel et al., 2001: 702).
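The idea of stability-sensitive weighting can be sketched numerically. The weighting function below, the 1000 Hz centre frequency, the modulation depth and the exact modulation shapes are all assumptions made for illustration; they are not the stimuli or the model of Gockel et al. (2001).

```python
from math import sin, cos, pi

def weighted_pitch(freqs_hz, rates_hz_per_s, k=0.01):
    """Average frequency with less weight where frequency changes quickly.

    The weight 1 / (1 + k * |rate|) is a hypothetical choice made for
    illustration; it is not the weighting function of Gockel et al. (2001).
    """
    weights = [1.0 / (1.0 + k * abs(r)) for r in rates_hz_per_s]
    return sum(w * f for w, f in zip(weights, freqs_hz)) / sum(weights)

def modulated_tone(centre_hz, depth_hz, cycle_s, inverted, n=2000):
    """Sample one cycle of an assumed u-shaped (or inverted) modulation.

    u-shaped: rounded minimum, sharp maxima; inverted: the mirror image.
    """
    sign = 1.0 if inverted else -1.0
    freqs, rates = [], []
    for i in range(n):
        x = pi * (i + 0.5) / n  # phase within the cycle, between 0 and pi
        freqs.append(centre_hz + sign * depth_hz * sin(x))
        rates.append(sign * depth_hz * cos(x) * pi / cycle_s)
    return freqs, rates

for inverted in (False, True):
    freqs, rates = modulated_tone(1000.0, 50.0, 0.2, inverted)
    mean_hz = sum(freqs) / len(freqs)
    print(inverted, round(mean_hz, 1), round(weighted_pitch(freqs, rates), 1))
```

Under these assumptions the weighted estimate falls below the plain mean for the u-shaped pattern and above it for the inverted pattern, which is the direction of the pitch shifts described above.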
It seems likely that ‘stability-sensitive weighting’ can, to some extent, explain the results
found in Experiments 2 and 3. In peak stimuli, the fundamental frequency is changing
rapidly and it is possible therefore that the auditory system does not have time to phase
lock to the very highest frequency as it is reached so briefly. In this way peak stimuli will
be perceived as lower in pitch than they actually are. For plateau stimuli, on the other
hand, the highest frequency is sustained for enough time to allow the system to phase lock
and the contour to be perceived as the height that it really is.
This stability-sensitive weighting effect also explains the lack of a difference between the
two plateau durations found in Experiment 2. In the 50 ms plateau stimuli the high
frequency is already sustained for enough time for the system to phase lock and the real
pitch to be extracted so lengthening the plateau further has no significant effect. It is
likely that the differential ability of the system to phase lock also influences the results of
experiments such as those reported by Doughty and Garner (1948) as the shortest tones
may be too brief for phase locking to occur as so few cycles of the waveform are
presented.
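One way to see why such an account predicts a categorical rather than a gradient effect is to treat the auditory system, very crudely, as taking the maximum of a moving average of the contour. The 50 ms window, the linear rise and fall, the frequency values and the 100 ms comparison plateau are assumptions chosen for illustration, not measurements from the experiments.

```python
def contour(rise_ms, plateau_ms, fall_ms, low_hz=150.0, high_hz=220.0):
    """Return an F0 contour sampled every 1 ms: linear rise, top, linear fall.

    A plateau of 0 ms is treated as a sharp peak with a single top sample.
    """
    span = high_hz - low_hz
    rise = [low_hz + span * i / rise_ms for i in range(rise_ms)]
    top = [high_hz] * max(1, plateau_ms)
    fall = [high_hz - span * i / fall_ms for i in range(1, fall_ms + 1)]
    return rise + top + fall

def smoothed_maximum(f0, window_ms=50):
    """Maximum of a moving average over window_ms samples (1 sample = 1 ms).

    The 50 ms window is a hypothetical integration time.
    """
    means = [sum(f0[i:i + window_ms]) / window_ms
             for i in range(len(f0) - window_ms + 1)]
    return max(means)

for plateau_ms in (0, 50, 100):
    f0 = contour(rise_ms=100, plateau_ms=plateau_ms, fall_ms=100)
    print(plateau_ms, round(smoothed_maximum(f0), 1))
```

In this sketch the sharp peak never reaches its true maximum after smoothing, whereas any plateau at least as long as the window does, so lengthening the plateau beyond that point changes nothing.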
5.5.2 Biological codes relating intonational form to function
It is clear then that the shape of the contour can affect the perception of pitch and that this
effect can be explained by physiological mechanisms such as auditory sluggishness.
However, in order to answer the original question of why speakers produce plateaux at
all, we need to consider how speakers can use their knowledge that plateaux cause
perception of higher pitch for communicative purposes. The following sections will
consider how speakers make use of universal, biological codes that relate intonational
form to function.
Gussenhoven (2002) discusses three metaphors or biological codes that relate aspects of
the speech production process to meaning. The first of these, the frequency code (e.g.
Ohala 1994), relates to the fact that there are universal tendencies for particular uses of
fundamental frequency to be associated with particular emotions and sentence types.
Firstly it seems that there are cross-culturally similar uses of F0 to signal emotions. For
example, high or rising pitch is generally associated with politeness, deference or lack of
confidence. As an extension of this, most languages use a high or rising F0 to mark
questions and a low or falling F0 to mark statements. This informational use is
considered to be related to the emotional use of high or rising F0 because the questioner
desires the good will of the listener and hence wishes to appear polite and deferent.
The universal uses of F0 captured in the frequency code are explained by reference to the
agonistic displays of non-human animals. Morton (1977) points out that in virtually all
vocalising species, when competitive encounters occur, the animal who is, or who wishes
to appear, more dominant makes lower frequency vocalisations than the animal who is
more submissive. Morton (1977) links this vocal behaviour to aspects of visual
competitive behaviour such as hair-raising in dogs and back-arching in cats, both of
which make an animal appear larger. He suggests that lower frequency vocalisations give
the impression of a larger larynx and, by extension, the impression of a larger overall
body size. This impression of larger size is considered to be advantageous as a larger
animal is more likely to win if a physical fight ensues. Higher frequency vocalisations,
by contrast, are interpreted as the animal wishing to show its smaller size and that it does
not want to fight.
Impressions of size for agonistic purposes may also be permanent. The males of many
species develop humps, antlers or manes, which give a permanent impression of larger
size. Ohala (1994) considers the sexual dimorphism of the human larynx to also be a case
in point. So, at puberty the male larynx both increases in size and descends, lengthening
the vocal tract. Both of these features cause the adult male to have a lower fundamental
frequency than infants or adult females. Ohala (1994) suggests that this change occurs at
puberty because, in evolutionary terms, this is the time when the male will be ready to
take on a protective or sexual role and therefore be engaged in combat situations. Also at
puberty the human male’s facial hair begins to grow, a visual display of size equivalent to
those of other animals.
In addition to the frequency code Gussenhoven (2002) discusses two other biological
codes that also exploit universal features of fundamental frequency for communicative
purposes. These are the effort code and the production (phase) code.
The effort code states that the relationship between F0 and meaning can be tied to the
amount of effort expended by the speaker. More effort will result in more numerous,
canonical and precise pitch movements. Thus, for example, wider pitch movements are
associated with greater prominence (as discussed in section 5.4.1 above) because it is
considered that the speaker puts in more effort as s/he believes the associated information
to be important.
The production code states that energy for speech becomes available in phases as it is
linked to the breathing process. Thus more energy is available at the beginning than at
the end of the utterance and therefore pitch will be higher at the beginning than at the end,
all other things being equal. In communication then, high pitch at the beginning of an
utterance may signal a new topic whilst low pitch may signal a continuation. The reverse
is true of utterance ends where high pitch signals continuation and low pitch finality.
Importantly, it seems that the physical correlates of these three codes do not actually have
to occur for the intended meaning to be transmitted. It is enough to create the effect of a
particular code without actually using the physiological mechanisms that underlie it. So,
in the example of the effort code, where wider pitch movements are associated with
emphasis and prominence, the pitch does not actually have to be higher for this effect to
be created. This is nicely demonstrated by the experiment of Gussenhoven and Rietveld
(1998) discussed in section 5.4.1, which shows that the perception of prominence is
speaker dependent. The main finding was that peaks of the same frequency are perceived
as less prominent if made by a female-sounding voice because they are considered to be
closer to the baseline of a female speaker and to therefore have required less effort in
production. This result is an indication that the effect of the effort code can be created
without actually having to reach higher in pitch but merely by the impression that the
speaker has made more effort. In this way the variables commonly associated with the
three biological codes can be replaced by other variables that create the same perceptual
effect.
5.5.2.1 Peak delay as a substitute variable for peak height
One feature that, as we will shortly see, is related to the communicative use of the plateau
is the substitution of peak delay for peak height. As we have seen in Chapter 3, higher
peaks take longer to reach and are correspondingly timed later in the tone-bearing unit
than lower frequency peaks. Thus, as Gussenhoven (2002) suggests, speakers can use
peak delay as a substitute for (or an enhancement of) high peaks as listeners will interpret
a late peak as a high peak by their knowledge of this constraint.
In terms of the effort code Ladd and Morton (1997: 321) indicate that peaks are
invariably later when speakers produce emphatic contours. In one perception experiment
Ladd and Morton (1997) create two sets of stimuli using sentences such as ‘the alarm
went off’. In both stimulus sets the accent (on ‘alarm’ for example) varied in frequency
but in the second set the alignment of the peak was 60 ms later than alignment in the first
set. Subjects took part in an identification task where they were asked whether the event
being described was a normal or an unusual experience, previous experiments having
shown that higher peaks resulted in more ‘unusual experience’ responses than lower
peaks. The results indicate that the abrupt rise from ‘normal’ to ‘unusual experience’
responses occurs at a lower frequency in the late than in the early alignment stimuli
indicating that late peaks lead to accents sounding more prominent or emphatic than
earlier peaks of the same frequency.
Gussenhoven (2002) also discusses how the frequency code can be exploited by the use of
peak delay as a substitute for peak height. Gussenhoven and Chen (2000) recorded made-up
CVCVCV sequences and by resynthesis altered the height and alignment of the pitch
peak. Stimulus pairs were played which varied in terms of height and alignment, and
Dutch, Hungarian and Chinese listeners were asked to judge which member of each pair
sounded more like a question. Results showed that, as expected, utterances with higher
peaks sounded more like questions. In addition, utterances with later peaks also sounded
more like questions. This result was true for listeners of all languages even though all
three languages use different methods to signal interrogativity. Nevertheless, there
appeared to be some language specific component as, for example, Hungarian listeners
were more affected by peak height than Chinese listeners, an effect attributed to the
different intonational means used to signal interrogativity in the two languages. Thus
peak delay may also exploit the meanings associated with the frequency code.
Finally, speakers may also use delayed peaks to simulate the effects of the production
code. For example, Wichmann et al. (1999) investigate the effects of discourse structure
on peak timing in English. Stimuli consisted of texts where the same word occurred in
three different positions, sentence final (nuclear), paragraph initial (the first accented
syllable in a paragraph) and sentence initial (the first accented syllable in a paragraph
medial sentence). Measurements of peak alignment and height were taken from subjects’
readings of these texts. Results, reproduced below in Table 5.2, replicate findings that
later accents are lower in frequency due to declination and also indicate that there is a
significant difference between alignment in all discourse positions, with the more initial
positions being associated with later peaks. In this way later peaks probably enhance the
impression of height and help to signal that an accent is early in the utterance.
                     Alignment (% syll)   Height (Hz)
Paragraph Initial    115.7                314
Sentence Initial     105.2                285
Sentence Final       62.2                 189
Table 5.2 Peak alignment and height in different discourse positions (reproduced from Wichmann et al., 1999)
5.5.2.2 The plateau as a substitute variable for peak height
Peak delay can be considered a substitute for peak height and it seems that the plateau
may have the same effect. I will discuss this hypothesis in relation to the types of plateau
introduced in Chapter 1. I shall begin by discussing the case of the three types of plateau
that have not been the focus of this dissertation, before going on, in the next section, to
suggest why the plateau might be found as a feature of falling nuclear accents in broad
focus declaratives.
Firstly, it is interesting to note that in both discussions of high plateau outside the present
work (and the ProSynth work), plateaux occur in the same environment as peak delay.
The experiment by Wichmann et al. (1999), discussed in the previous section, indicated
that in paragraph initial position some readers reached their maximum frequency at the
same time as in other positions but produced a longer plateau, causing the fall, but not the
rise, to be aligned later. Wichmann et al. (1999) suggest that there may be two reasons
for this plateau. One may be equivalent to simple peak delay and involve delaying only
the falling part of the gesture rather than the whole accent. The other may be to signal
initiality in an alternative way to peak delay (although the mechanisms by which this
would be accomplished are not discussed). The results of Experiments 2 and 3 indicate
that the plateau itself will increase the effect of pitch height and therefore, according to
the production code, also the perception of initiality.
Secondly, work by D’Imperio suggests that a plateau in Neapolitan Italian
shifts perception towards questions in the same way as peak delay. In Neapolitan
Italian (D’Imperio 2002a) both questions and (narrow-focus) statements are realised by
rise-falls, the peaks of which are aligned around 40 ms later in questions. D’Imperio and
House (1997) and D’Imperio et al. (2000) conduct perceptual experiments where the peak
of a rise-fall configuration is shifted backwards in time in steps of 15 ms and Italian
listeners are asked whether each utterance sounds like a statement or a question. Results
show that subjects identify earlier peaks as statements and later peaks as questions in the
same way as is predicted by the frequency code and found by Gussenhoven and Chen
(2000). Also, the result seemed to be universal (D’Imperio et al. 2000) as American
listeners, who did not speak Neapolitan Italian, showed similar results to the Italian
listeners. Results were also influenced by language specific constraints, however, as the
crossover point from statement to question responses for American listeners was later
than for the Italian listeners, possibly reflecting the phonological difference between early
and late rises in American English (D’Imperio 2002a).
In a second part of the experiment by D’Imperio et al. (2000), a plateau continuum was
created where the plateau was 45 ms in duration, spreading forwards in time from the
alignment of the peak. A stimulus set was created with the plateau aligned earlier and
later in the stressed vowel and again listeners were asked to label each utterance as a
question or a statement. When these results are compared to the peak stimuli it is clear
that the plateau elicits more question responses at each alignment point. For example, at
the second time step plateau stimuli gained 75% question responses whilst peak stimuli
gained only around 10%.
These results were further compared to the peak stimulus corresponding to the plateau
offset (that is the peak stimulus 3 time steps later than the peak stimulus timed at plateau
onset). Here the differences between peak and plateau stimuli were much smaller. For
example, at the second time step 82% of responses favoured the question interpretation
compared to 75% for plateau stimuli. This supports the suggestion in Chapter 3 that the
end of intonational plateaux is speakers' real target. However, in D’Imperio et al.’s
(2000) experiment we cannot be completely sure that the declination normalisation issue
is not having an effect as the plateau spreads forwards in time from the peak and will
therefore appear to be at an increased pitch because of its position regardless of its shape.
Despite the complication caused by the declination issue, we again see, in Neapolitan
Italian, that the effect of a plateau in the contour is very similar to that of peak delay, in
this case leading to the perception of a question rather than a statement. This seems likely
to be linked to the frequency code in that higher pitch suggests the speaker needs to
appear passive and docile, as they want the cooperation of the listener.
D’Imperio et al. (2000) do entertain the idea that the effect of the plateau may be to
increase the perceived height of the accent. Therefore, in a further part of the experiment,
they create another set of stimuli where the peak varies in scaling but not in timing and
subjects again judge whether utterances are statements or questions. Some effect of height
is found, specifically that high peaks (300 Hz) sound more like questions than low peaks
(260 Hz). D’Imperio et al. (2000) largely reject this result as an explanation for their
earlier findings, however, on the grounds that there is no significant difference between
peaks of medium height (280 Hz) and peaks of either of the other two heights. However,
as we now know that a plateau in the contour does indeed make syllables sound higher in
pitch it seems reasonable to suggest that the plateau may have had a similar effect in this
case. The fact that no difference was found between D’Imperio et al.’s (2000) low and
medium or medium and high peaks may be due to the fact that the frequency
manipulation changed the height of all targets, rather than just the peak, possibly
lessening the effect of pitch height as the baseline was also altered. In addition, the 20 Hz
changes used result in about 1.3 semitones difference between the low and medium peaks
but only 1.2 semitones difference between the medium and high peaks.
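The semitone figures quoted here follow from the standard conversion of a frequency ratio to semitones, 12 × log2(f2 / f1). A minimal sketch, with a hypothetical function name:

```python
from math import log2

def semitone_difference(f_low_hz, f_high_hz):
    """Return the interval from f_low_hz up to f_high_hz in semitones."""
    return 12 * log2(f_high_hz / f_low_hz)

print(round(semitone_difference(260, 280), 1))  # low to medium peak
print(round(semitone_difference(280, 300), 1))  # medium to high peak
```

Because the scale is logarithmic, equal steps in Hz correspond to progressively smaller steps in semitones as frequency rises.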
Thus it seems that high plateaux can act as a surrogate for pitch height in the same way as
peak delay. This tactic takes advantage of the properties of the auditory system, which
cause plateaux to sound higher in pitch. In this way the plateau appears to be used as a
substitute variable for pitch height exploiting both the frequency code (to increase the
perception of interrogativity) and the production code (to increase the perception of
initiality).
The plateaux in Japanese found by Pierrehumbert and Beckman (1988) represent a
slightly different case because they are associated with low boundary tones rather than
with high targets within a pitch accent. Nevertheless the same effects are probably at
work. Under the auditory sluggishness hypothesis it is likely that a short low tone will
sound higher in pitch than a long low tone because the auditory system will not have time
to extract the lowest pitch from the contour. Thus, the low plateau or strong allophone
will sound lower in pitch. This lowering of a low tone is likely to be equivalent to the
raising of a high tone and goes along with Pierrehumbert and Beckman’s (1988: 29)
suggestion that the strong allophone will sound lower in pitch and be associated with a
stronger boundary than the weak allophone.
5.5.2.3 Interaction of variables signalling peak height
Of course, it is certainly not the case that the plateau is the only determinant of perceived
pitch height or prominence in an utterance. As we have seen, the biological codes can be
used quite straightforwardly by actually increasing the frequency of intonational peaks to
increase perceived prominence. Alternatively, or in addition, peaks can be delayed, also
creating an impression of added height. In some cases these different methods of creating
the impression of pitch height may in fact be incompatible. This seems to be the
explanation behind the shorter plateaux found in expanded pitch spans in Chapter 3. It
would perhaps be plausible to suggest that as the expanded spans are likely to make the
accents sound more emphatic (due to the effort code), the plateau should also be longer to
further increase the effect of prominence. However, it seems that in this case the effort
code is used in a straightforward fashion, the perception of height actually being created
by a higher maximum frequency and enhanced by the later alignment of the beginning of
the plateau and the peak. Other constraints on alignment, such as the rate of change of
the fall and the alignment of the end of the plateau, may mean that it is more difficult for
the speaker to realise a plateau as well. Even if this were not the case it would be difficult
to have both a longer plateau for emphasis and a later peak alignment as the end of the
plateau must be anchored and therefore lengthening can only occur by starting the plateau
earlier.
5.5.2.4 The plateau as a substitute variable in nuclear position in broad focus declaratives
Although I have suggested how plateaux may be used to signal initiality or
interrogativity, I have not yet discussed how the majority of plateaux studied
in this thesis and in House et al. (1999a and 1999b) might be related to the various
biological codes. Why should plateaux be found so consistently in nuclear position?
Cruttenden (1997: 42) states that the word ‘nucleus’ “is used to describe the pitch accent
which stands out as the most prominent in an intonation group”. In broad focus
declaratives in English, such as the utterances studied in Chapter 2, the nucleus is placed
on the stressed syllable of the final content word of the intonation phrase. This final
position of the nucleus may be at odds, to some extent, with the idea that the nuclear
accent should be the most prominent since the effects of declination will mean that the
nucleus will have a lower maximum frequency than any of the prenuclear accents.
Although this problem is partially solved by listeners normalising for declination it is
possible that speakers also produce a plateau in such circumstances to add to the effect of
prominence.
Although this dissertation has not looked at the realisation of high targets in prenuclear
position, unpublished work from the ProSynth project (Knight ms) suggests that
prenuclear accents can be modelled successfully for speech synthesis by using one high
turning point representing the peak rather than two points representing the beginning and
end of a plateau. Thus it seems that plateaux are usually a feature of nuclear rather than
prenuclear position, except in paragraph initial position (cf. Wichmann et al. 1999),
where they may be used to signal initiality.
5.5.3 Physiology of intonation
We have still not fully resolved why speakers should consistently produce plateaux in
nuclear position. Rather than realising a plateau to increase prominence the speaker
could, in theory, raise the frequency of the peak or alter the slope of declination. The
physiology of intonation suggests, however, that these strategies may be impossible or at
least rather effortful.
It is likely that the phenomenon of declination is automatic to some extent. Ladd (1984:
64) in reviewing this issue states that the idea of an automatic declination component is
attractive as it explains the near universality of declination across the world’s languages.
An automatic component would also explain the declination evident in the vocalisations
of other primates (Hauser and Fowler 1992). Indeed it seems that there is good evidence
for declination being linked to an automatic physiological mechanism.
For example, Lieberman (1967: 25) suggests that falls in fundamental frequency at the
end of “unmarked breath groups” are due to a drop in subglottal pressure at the end of
exhalation so that the pressure within the lungs will be below atmospheric pressure for the
next inhalation. Collier (1975) takes up this idea and measures both subglottal pressure
and the activity of various laryngeal muscles during production of Dutch sentences. The
activity of the laryngeal muscles was found not to vary systematically with the slope of
declination. Measures of subglottal pressure, on the other hand, show that, whilst it does
not covary with the individual accents in the utterance, pressure is indeed positively
correlated with the slope of declination.
Thus, Collier (1975) sees intonation as being due to two interacting components. The
first component is the declination line related largely to subglottal pressure. The second
is the individual accents that are superimposed onto this declination line. The
mechanisms responsible for the control of these accents are different from those
responsible for the slope of the declination line. Collier (1975: 250) shows that the main
determiner of the height and location of F0 accents is the action of the cricothyroid
muscle. The cricothyroid muscle produces activity around 94 ms before F0 peaks and is
activated more strongly before higher frequency peaks. It seems that activity of this
muscle stretches the vocal folds, making them longer and tighter and therefore likely to
vibrate at a higher frequency (see Orlikoff and Kahane 1996 for a review of the structure
and function of the larynx including the activity of the different laryngeal muscles).
Collier (1975) demonstrates that for two F0 peaks of around the same frequency, the one
later in the utterance is associated with a greater activity of the cricothyroid muscle. He
suggests that this greater activation is produced in order to overcome the pitch lowering
effect of the falling subglottal pressure. Also related to this is the action of the
cricothyroid muscle when the speaker produces a plateau. In the long (around 800 ms)
plateau produced by the speaker in the example given by Collier (1975: 254), the activity
of the cricothyroid muscle is sustained throughout.
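The interaction between the two components can be illustrated with a minimal numerical sketch. It assumes, purely for illustration, a linear declination line expressed in Hz and simple additive superposition of accents; the function names and all values are hypothetical and are not taken from Collier (1975). The excursion computed here stands in only loosely for the extra accent-related effort needed to reach a given F0 later in the utterance.

```python
# Illustrative two-component sketch: a falling declination line plus an
# accent excursion added on top of it. All numbers are assumed values.

def declination(t, start_hz=130.0, slope_hz_per_s=-10.0):
    """Baseline F0 in Hz at time t (seconds), modelled as a straight falling line."""
    return start_hz + slope_hz_per_s * t


def excursion_needed(t, target_hz, start_hz=130.0, slope_hz_per_s=-10.0):
    """Accent excursion in Hz that must be added to the baseline to reach target_hz at time t."""
    return target_hz - declination(t, start_hz, slope_hz_per_s)


# Two peaks with the same target frequency, one early and one late.
early = excursion_needed(0.5, 150.0)
late = excursion_needed(2.0, 150.0)

print(f"early peak excursion: {early:.1f} Hz")
print(f"late peak excursion:  {late:.1f} Hz")
```

With these assumed values, reaching 150 Hz requires a 25 Hz excursion at 0.5 s but a 40 Hz excursion at 2.0 s. This mirrors the direction of the effect that Collier reports, without modelling muscle activity itself.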
Thus, if declination is physiologically determined and difficult to override, speakers
may seek energy-efficient ways to counteract its effects. This could
be the physiological basis behind the plateaux we find in nuclear position. The realisation
of a high target as a plateau adds to the perceived height and prominence of the nucleus
without the effort needed to counteract the slope of declination by actually achieving a
higher fundamental frequency. Thus, it may be more economical, in some sense, for
speakers to sustain the activity of the cricothyroid muscle to create a plateau rather than to
initiate greater activity in order to produce a higher peak. In this way, plateaux in nuclear
position in broad focus declaratives are again substitute variables for peak height, but this
time under the umbrella of the effort code. The speaker uses a plateau to give the
impression of height, which the listener associates with greater effort and therefore
greater importance.
5.6 Conclusions
The presence of a plateau in the contour adds to the perceived height and prominence (but
not to the perceived duration) of the syllable with which it is associated. This perceptual
effect is probably caused by the auditory sluggishness found when frequency changes
rapidly. The effect of this sluggishness is that, in peak stimuli, the auditory system fails to phase
lock to the highest frequencies of the peak, and pitch is therefore perceived as lower than
it actually is. For stimuli containing a plateau, the system has enough time to phase lock
and pitch is perceived more accurately.
It seems that in natural speech speakers may use their knowledge about the workings of
the auditory system to take advantage of various biological codes that relate intonational
form to meaning. In this way, plateaux may be used as a substitute variable for peak
height. Thus plateaux are associated with paragraph initial accents (the production code),
accents marking questions (the frequency code) and the nuclear accent in broad focus
declaratives (the effort code).
The physiological reasons behind a speaker’s choice of a plateau rather than a higher
frequency peak probably lie in the different actions of the cricothyroid muscle. This
muscle must be activated for a longer time to create a plateau than to create a peak but
more strongly to produce a peak of a higher frequency. As both actions create the same
perceptual effect, it is likely that the speaker chooses the tactic that is more energy-efficient.