Chapter 5
The role of the plateau in the perception of duration, pitch height and prominence

5.1 Introduction

It has been shown that the plateau may be an important marker of linguistic structure (Chapter 3) and assist in the process of spoken word recognition (Chapter 4). These previous results, however, relate to the alignment of the plateau and most likely only to the end point. Another important question is why speakers ever produce plateaux. There appears to be no physiological reason why speakers could not rise to a single high point and then fall immediately back down without sustaining high pitch and forming a plateau. Part of the reason for the occurrence of plateaux may relate to the results discussed in Chapter 3, based on the work of Xu (2002), which demonstrate that speakers often do not change pitch at maximum speed even when they are using an expanded span. It is likely that speakers do not want to use maximum energy and resources all the time and it may in some sense be easier to produce a smoothed plateau rather than a sharp peak.

It may also be the case, however, that speakers use the plateau for communicative effect. The three experiments reported in this chapter investigate how the longer stretch of high pitch in the plateau might affect perceptual attributes such as perceived pitch, duration or prominence of the syllable with which the plateau is associated. It is quite likely that some effect of the plateau will be found on the perception of duration or pitch height (and therefore prominence). As Lehiste (1970) points out (e.g. p. 36), it is clear that to some extent suprasegmentals do co-occur and interact. Therefore it is possible that the different durations of high pitch between a plateau and a peak might affect the perception of pitch itself or the perception of the duration of the syllable with which they are associated.
This chapter presents three experiments that aim to address these issues. Specific interactions of variables (or lack thereof) will be dealt with in the discussions following each experiment.

5.2 Experiment 1 – Perceived duration

5.2.1 Introduction

An initial hypothesis was that a plateau in the contour (as opposed to a peak) might make a syllable sound longer. This could be the case, for example, if the longer stretch of high pitch interacts with the perceived duration of the segments, giving the impression that they too are of greater duration.

5.2.2 Method

5.2.2.1 Stimuli

The sentence ‘Anna came with Manny’ was spoken by a female phonetician who was instructed to produce a falling nuclear accent on the word ‘Manny’. The word ‘Manny’ was cut from this sentence and different versions were created by resynthesis as shown in Figure 5.1. Versions were created with maximum frequencies of 160, 170, 180, 190, 200, 210 and 220 Hz. At each of these seven frequencies two versions were created in which the shape of the contour associated with the nuclear tone varied. In one version the high tone was realised as a 100 ms plateau and in the other as a sharp peak. In the contours containing a plateau, the end of the plateau (EP) was placed at the same position as the peak in peak stimuli and the plateau was extended backwards in time. Previous experiments (specifically that described in Chapter 4) have indicated that the end of the plateau is most likely to be the point used in spoken word recognition, so it is important to maintain its alignment in order to maintain the naturalness of the stimuli. The rise to the peak or to the start of the plateau (SP) began at the beginning of the word.

Figure 5.1 Stimuli used in Experiment 1. The solid line represents the peak stimuli whilst the dotted line represents the plateau stimuli

5.2.2.2 Subjects

There were 24 subjects: fifteen were female and nine were male. Their ages ranged from 18 to 31 with a mean of 21.7. Fourteen were proficient musicians, having attained grade seven or eight on one or more instruments. Six had some musical training but had attained a lower level of proficiency and four had never received musical training. All were students at the University of Cambridge and were paid for their time.

5.2.2.3 Setup and instructions

Testing took place individually in a sound-treated room in the same session as testing for Experiment 3. The experiment was presented using shell scripts running on a Silicon Graphics workstation. Stimuli were presented through headphones. For each maximum frequency all possible pairwise combinations of stimuli were created. Thus, subjects heard each version of each item paired with itself (14 pairs), each peak as the first item paired with the same frequency plateau (seven pairs) and each peak as the second item paired with the same frequency plateau (seven pairs). Pairs of items were pseudo-randomised with the condition that the same stimulus version did not occur in two successive pairs.

Subjects were instructed to decide which member of each pair was the longer word and press the corresponding button (marked 1 and 2) on the keyboard. They were warned that they might find the task very difficult and were told that they should try to give a response for each pair even if they thought they were guessing. They were also told to concentrate only on the length of the presented words and to ignore any other difference, such as differences in perceived pitch or intensity.
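The pairing and pseudo-randomisation described above can be sketched in a few lines of Python. This is only an illustration of the stated design, not the shell scripts actually used: the function names, the random seed and the retry limit are assumptions, and the shuffle-and-check strategy is one simple way of satisfying the condition that no stimulus version occurs in two successive pairs.

```python
import itertools
import random

FREQUENCIES_HZ = [160, 170, 180, 190, 200, 210, 220]
SHAPES = ["peak", "plateau"]


def build_pairs():
    """Return every ordered pair of same-frequency stimuli (28 in total)."""
    pairs = []
    for freq in FREQUENCIES_HZ:
        for first, second in itertools.product(SHAPES, repeat=2):
            pairs.append(((freq, first), (freq, second)))
    return pairs


def pseudo_randomise(pairs, seed=0, max_attempts=10_000):
    """Shuffle until no stimulus version occurs in two successive pairs.

    The seed and retry limit are illustrative assumptions.
    """
    rng = random.Random(seed)
    order = list(pairs)
    for _ in range(max_attempts):
        rng.shuffle(order)
        # Successive pairs must not share any (frequency, shape) stimulus.
        if all(set(a).isdisjoint(b) for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("no valid order found; increase max_attempts")


trial_order = pseudo_randomise(build_pairs())
```

The 28 pairs comprise the 14 identical pairs and the two orders of each peak and plateau combination at the seven frequencies.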
Before the main experiment, subjects took part in a practice session consisting of items with the high tone resynthesised at frequencies not used in the main experiment. In this practice session subjects were exposed to one pair of identical peak stimuli, one pair of identical plateau stimuli, one pair where the plateau preceded the peak and one pair where the peak preceded the plateau.

5.2.3 Results

As shown by the black bar in Figure 5.2, the plateau stimuli sounded longer than the peak stimuli only 49% of the time. Paired t-tests show that this is not significantly different from the result that would be predicted by chance (t(23) = 0.327, p>0.05). Further analyses were undertaken to see if the position of the plateau had any effect on the results. The percentage of responses for ‘plateau sounds longer’ when the plateau was in second position (51% as shown by the dark grey bar in Figure 5.2) was tested against the percentage of times the second item was perceived as longer when two identical stimuli were presented (52%). There is no significant difference between these results (t(23) = 0.145, p>0.05). The same calculation was made for plateaux in first position, which sounded longer 46% of the time (as shown by the light grey bar in Figure 5.2) against 48% of responses for ‘first item sounds longer’ for pairs of identical stimuli. Again the result was not significant (t(23) = 0.511, p>0.05). These results demonstrate that a plateau in the contour does not make syllables sound longer than a peak.
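The comparison against chance reported above can be illustrated with a short Python sketch. It treats the test as a one-sample t-test of per-subject percentages against 50%, which is an assumption about how the comparison was computed. The scores below are invented for illustration and are not the data of Experiment 1. The critical value is the standard two-tailed tabled value for 23 degrees of freedom at the 0.05 level.

```python
import math
import statistics

# Two-tailed critical value of t for df = 23 at alpha = 0.05 (standard tables).
T_CRIT_DF23 = 2.069


def one_sample_t(scores, chance):
    """Return the t statistic for the mean of `scores` against `chance`."""
    n = len(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    return (statistics.mean(scores) - chance) / standard_error


# Hypothetical per-subject percentages of 'plateau sounds longer' responses
# (24 subjects); invented for illustration, not the data of Experiment 1.
hypothetical_scores = [50, 43, 57, 46, 54, 50, 39, 57, 46, 50, 54, 43,
                       50, 46, 57, 50, 43, 54, 46, 50, 61, 43, 50, 46]

t_value = one_sample_t(hypothetical_scores, chance=50.0)
# True only if the mean differs reliably from chance at the 0.05 level.
differs_from_chance = abs(t_value) > T_CRIT_DF23
```

The same function applies to the positional analyses if the chance level is replaced by the percentage obtained for identical pairs, although a paired test on per-subject differences would be the closer analogue in that case.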
Figure 5.2 Percentage of 'plateau sounds longer' responses overall and separately for the plateau as the first and second member of the pair

5.2.4 Discussion

It seems that there have not been any previous studies investigating the perceived duration of syllables or segments when tones of different durations are associated with them. Nevertheless, a review of the literature seems to suggest that in general the suprasegmentals duration (or quantity) and frequency do not interact. For example, Lehiste (1970: 82) states that there is no known evidence that segment length has any effect on F0 height or that F0 height has any effect on the duration of segments in production.

In terms of perception, it similarly appears that the realms of frequency and duration are independent. For example, both Small and Campbell (1962) and Ruhm et al. (1966, cited in Lehiste 1970) indicate that difference limens for the perception of duration remain the same regardless of the frequency (ranging from 250 to 5000 Hz in the different studies) of the tones presented. The results found in the present experiment are obviously different in nature from these earlier results. Not only are the present results based on speech stimuli rather than pure tones, they also require the listener to judge the duration of the syllable when the duration (rather than the frequency) of the associated tone is varied. Nevertheless, the absence of any effect seems to fit well with the general finding that frequency and duration are largely independent.

5.3 Experiment 2 – Perceived pitch height

5.3.1 Introduction

A second hypothesis was that the plateau might have an effect on the perceived pitch of a syllable. Assuming an effect was found it could operate in one of two different ways.
Firstly, a longer plateau could make a syllable sound higher as the high pitch is sustained for some time. On the other hand, the perception of height could be associated with pitch dynamism. As we have seen in Chapter 3, accents with a higher maximum frequency not only have greater pitch excursions but also have shorter plateaux. Therefore, a plateau could make the syllable sound less high in pitch as there is less pitch movement in a given period of time.

5.3.2 Method

5.3.2.1 Stimuli

Stimuli for this experiment were based on the utterance ‘came with Manny’ taken from the sentence ‘Anna came with Manny’ recorded for Experiment 1. This utterance was resynthesised to create twelve different versions (shown in Figure 5.3) varying in maximum frequency and shape of the nuclear accent (on ‘Manny’). Versions were created with maximum frequencies of 160, 180, 200 and 210 Hz¹. At each of these frequencies, three versions were created which varied in contour shape so that the accent associated with the word ‘Manny’ was realised as a sharp peak or a plateau of either 50 or 100 ms in duration. As in Experiment 1, plateaux were created by aligning the end of the plateau at the same place as the peak in peak stimuli and extending the high frequency backwards in time. This method of creating the plateau was additionally important in this experiment because, as will be discussed in section 5.4.1, events later in the utterance or text sound higher than events of the same frequency earlier on, owing to perceptual compensation for the effect of declination. So, extending the plateau later in time than the peak (rightwards) could lead to a false result indicating that plateaux sound higher than peaks when the real difference is that the accent extends later into the utterance and is, therefore, perceived as higher in pitch by virtue of its position.
¹ A version with the contour reaching 220 Hz was not created as the speaker did not produce accents that were this high even in utterances with narrow focus on ‘Manny’.

Figure 5.3 Stimuli used in Experiment 2. The solid line represents peak stimuli, the dashed line represents 50 ms plateau stimuli, and the dotted line represents 100 ms plateau stimuli

5.3.2.2 Subjects

Subjects were six members of the University of Cambridge. Five were male and one female. Their ages ranged from 19 to 50 and the mean was 30 years of age. All except one had a high level of musical training. Two of the male subjects were also subjects in Experiments 1 and 3.

5.3.2.3 Setup and instructions

At each frequency each version of ‘came with Manny’ was paired with itself. Also, every version with a peak occurred twice as the first member of the pair and twice as the second member of a pair when the other member was a 50 or 100 ms plateau of the same frequency. At each frequency the two durations of plateau were also paired with each other, with each item occurring once in first position and once in second position. This made 36 pairs altogether, the combinations being shown in Table 5.1. Although members of a pair only ever differed in shape and never in frequency, subjects were nevertheless told to listen to each pair and compare the pitch of the word ‘Manny’ in each utterance. They were instructed to press one of two buttons to indicate in which member of each pair (1 or 2) ‘Manny’ sounded higher in pitch.
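As a check on the count of 36 pairs, the pairing scheme just described can be enumerated in Python. This is a sketch under the assumption that the design amounts to every ordered pair of the three contour shapes at each of the four frequencies. The variable and function names are illustrative, and a sharp peak is coded as 0 ms of sustained high pitch purely so that the shapes can be ordered.

```python
import itertools
from collections import Counter

FREQUENCIES_HZ = [160, 180, 200, 210]
# Duration of sustained high pitch in ms; a sharp peak is coded as 0 ms
# (an illustrative convention, used only to order the shapes).
SHAPE_DURATION_MS = {"peak": 0, "50 ms plateau": 50, "100 ms plateau": 100}


def pair_type(first, second):
    """Label a pair by which member has the longer stretch of high pitch."""
    d1, d2 = SHAPE_DURATION_MS[first], SHAPE_DURATION_MS[second]
    if d1 == d2:
        return "Identical"
    return "Longer first" if d1 > d2 else "Longer second"


pairs = [
    (freq, first, second, pair_type(first, second))
    for freq in FREQUENCIES_HZ
    for first, second in itertools.product(SHAPE_DURATION_MS, repeat=2)
]
type_counts = Counter(pair[3] for pair in pairs)
```

Nine ordered pairs at each of four frequencies give the 36 pairs, split evenly between identical, longer-first and longer-second pairs.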
First member      Second member     Type
Peak              Peak              Identical
50 ms plateau     50 ms plateau     Identical
100 ms plateau    100 ms plateau    Identical
Peak              50 ms plateau     Longer second
Peak              100 ms plateau    Longer second
50 ms plateau     Peak              Longer first
50 ms plateau     100 ms plateau    Longer second
100 ms plateau    Peak              Longer first
100 ms plateau    50 ms plateau     Longer first

Table 5.1 Stimuli combinations created at each frequency in Experiment 2

5.3.3 Results

5.3.3.1 Overall

Initial results, shown in Figure 5.4, were calculated for the responses overall. The number of times that a longer stretch of contour sounded higher than a shorter stretch was counted and expressed as a percentage of total responses. Thus, these results include the number of times either length of plateau is perceived as higher in pitch than a peak and also the number of times a 100 ms plateau sounds higher than a 50 ms plateau. All t-tests in this experiment are two-tailed, as the hypotheses predicted that any significant differences could occur in either direction.

Overall, as shown by the black bar, longer stretches of contour sound higher in pitch than shorter stretches 73% of the time, significantly more often than would be predicted by chance (t(5) = 4.793, p<0.01). Position of occurrence was taken into account by testing the percentage of ‘longer stretch of contour sounds higher’ responses in either position against the percentage of responses favouring that position when identical stimuli were presented. When position of occurrence is taken into account in this way the overall result is only significant when the plateau is in the second member of the pair (79% vs. 54%, t(6) = 4.108, p<0.01) as shown by the dark grey bar, and not when it is in the first member of the pair (67% vs. 46%, t(6) = 1.806, p>0.05) as shown by the light grey bar.
Figure 5.4 Results overall and separately for the longer stretch of contour in first and second position

5.3.3.2 By frequency

Separate analyses were conducted on results at each frequency to see if the overall significant result holds. Again these results include comparisons of peaks and both lengths of plateau and also comparisons of the two different lengths of plateau. Analysis by frequency reveals that the longer stretches of contour sound higher than the shorter stretches significantly more often than would be predicted by chance for stimuli of 160 Hz (67%, t(6) = 3.873, p<0.01), 200 Hz (83%, t(6) = 5.477, p<0.01) and 210 Hz (67%, t(6) = 2.705, p<0.05). At 180 Hz, however, although the plateau sounds higher 72% of the time, the result failed to reach significance (t(6) = 2.390, p = 0.062). These results are shown in Figure 5.5.

Figure 5.5 Results shown separately for each frequency

5.3.3.3 By shape

The final set of results compares the different contour shapes to each other. These results include responses at every frequency and are shown in Figure 5.6. Overall, the 50 ms plateau sounds higher than the peak 77% of the time (t(6) = 7.050, p<0.01) and the 100 ms plateau also sounds higher than the peak 77% of the time (t(6) = 4.540, p<0.01). There is no significant difference between responses when subjects compare the 100 ms and 50 ms plateau, however, with 65% of responses for ‘100 ms plateau sounds higher’ (t(6) = 1.941, p>0.05).

Figure 5.6 Results shown separately for each combination of contour shapes

5.3.4 Discussion

The results suggest that there is an effect of contour shape on the perception of pitch height but only in certain circumstances.
Firstly, the effect is only reliably found when the longer stretch of contour is in second position, presumably because, as discussed in more detail below, listeners will normalise for declination and therefore the perceived pitch height of accents later on in a text or utterance will be increased. Thus, the advantage to the perception of pitch height caused by the longer stretch of contour in first position is to some extent counteracted by the positional advantage of the shorter stretch. When the longer stretch is in second position it enjoys both advantages. It is noteworthy that when the shorter stretch is in second position it does not sound higher in pitch than the longer stretch (there is no significant difference), indicating that the effect of length is more important than the order effect.

The general result seems to hold at every frequency but there is one strange result in that the finding is of only borderline significance at 180 Hz. An investigation of the individual subject responses suggests that this is due to one subject who thought the longer stretches of contour sounded higher in only 17% of cases. This was an unusual finding even for this particular subject who consistently perceived longer stretches of contour as higher in pitch at all the other frequencies. At 180 Hz two other subjects responded at 50%, two others at 67% and two at 100%, suggesting that in general the overall finding holds at this frequency too but does not reach significance due to the unusual responses of a single subject.

Finally, it is clear that there is no significant difference in perceived pitch between the two lengths of plateau. This suggests that there is a categorical effect of peak versus plateau rather than a gradient effect of plateau length. At first sight it would appear that two explanations could fit the data, one based on integration and one on temporal smoothing.
An explanation based on integration would suggest that the listener integrates the entire area under the pitch curve when making judgements about pitch height. Therefore plateaux, which, as shown in Figure 5.7, have a greater area under the curve than peaks of the same frequency, would sound higher in pitch. An explanation based on temporal smoothing would suggest that listeners do not extract pitch at every point in time, as shown in Figure 5.8, and therefore peaks may be perceived as less high than they actually are due to the brief amount of time spent at the highest frequency.

Figure 5.7 The greater area under the curve of a plateau than under a peak of the same height

Figure 5.8 A peak perceived as being of a lower frequency due to temporal smoothing in the auditory system

The two explanations make different predictions about what will happen as plateaux get increasingly longer. The integration explanation predicts that the perceived pitch will continue to increase as the area under the curve increases. The smoothing explanation predicts that this will not be the case. The lack of a perceptual difference between the two plateau durations found in the present experiment shows that perceived pitch does not continue to increase in a gradient fashion and therefore supports the smoothing explanation. The exact physiological mechanism behind such an explanation will be discussed in detail in section 5.5.1.

5.3.4.1 The relation of the present findings to early psychoacoustic studies

Overall, the results are largely in support of much of the psychophysical literature from the 1940s and 1950s. For example, Doughty and Garner (1947 and 1948) specifically investigate the pitch characteristics of short tones.
Doughty and Garner (1947) demonstrate that some pitch percept is available even in very short tones. The ‘click-pitch’ threshold is considered to be the shortest duration at which some pitch identification is possible from a tone. Doughty and Garner (1947) demonstrate that this threshold is approximately 11 ms for tones of 250 Hz. However, at this ‘click-pitch’ threshold even reasonably good pitch discrimination is not possible. Therefore, Doughty and Garner (1948) investigate how the perceived pitch of a tone changes as duration is shortened. In one experiment subjects were played two tones and asked to alter the frequency of the second, which ranged from 6 to 200 ms in duration, until it matched the frequency of the first. Results show that at each frequency tones tend to be perceived as lower in pitch when they are of shorter duration. Specifically, at 250 Hz (the frequency closest to the speech in the present experiment) the pitch change was estimated to be –4% when the tone was 6 ms in duration (Doughty and Garner 1948: 484). Pitch perception was considered to be poor at this duration as it is close to the ‘click-pitch’ threshold (Doughty and Garner 1948: 490). There was still some loss of pitch (approximately 1 or 2%) for tone durations as long as 25 ms but at all longer durations pitch remained relatively constant.

In many respects these results fit well with the results of the experiment reported here. Firstly, the direction of the change is the same; tones sound lower in pitch at shorter durations. Secondly, there is not a gradient effect of tone duration as the drop in perceived pitch occurs only at the shortest durations. Thirdly, the main difference in perceived pitch occurs below 25 ms, in accordance with the present finding that there is no significant difference between the perceived pitch of plateaux of 50 and 100 ms in duration.
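The different predictions of the integration and temporal smoothing accounts introduced in section 5.3.4 can be made concrete with a toy calculation. The Python sketch below is an illustration only. The contour timings, the 120 Hz baseline and the 40 ms averaging window are all assumed values, and a simple moving average stands in for whatever smoothing the auditory system actually performs. Integration is modelled as the area between the contour and the baseline, and smoothing as the maximum of the moving average.

```python
def contour(plateau_ms, f_max=200.0, f_base=120.0, end_ms=200, total_ms=400):
    """F0 in Hz at 1 ms steps: linear rise, plateau ending at end_ms, linear fall.

    All timings and frequencies are illustrative assumptions. The plateau is
    extended backwards in time from end_ms, as in the stimuli described above.
    """
    start_ms = end_ms - plateau_ms
    f0 = []
    for t in range(total_ms + 1):
        if t < start_ms:
            f0.append(f_base + (f_max - f_base) * t / start_ms)
        elif t <= end_ms:
            f0.append(f_max)
        else:
            f0.append(f_max - (f_max - f_base) * (t - end_ms) / (total_ms - end_ms))
    return f0


def area_above_baseline(f0, f_base=120.0):
    """Integration account: area between contour and baseline (Hz x ms)."""
    return sum(f - f_base for f in f0)


def smoothed_max(f0, window_ms=40):
    """Smoothing account: highest value of a moving average (assumed window)."""
    means = [
        sum(f0[i:i + window_ms]) / window_ms
        for i in range(len(f0) - window_ms + 1)
    ]
    return max(means)


shapes = {"peak": contour(0), "50 ms plateau": contour(50), "100 ms plateau": contour(100)}
areas = {name: area_above_baseline(f0) for name, f0 in shapes.items()}
smoothed = {name: smoothed_max(f0) for name, f0 in shapes.items()}
```

Under these assumptions the area grows with every increase in plateau duration, whereas the smoothed maximum falls short of 200 Hz for the peak and is identical for the two plateaux once the plateau is longer than the averaging window. Only the second pattern matches the categorical result of Experiment 2.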
It is, however, important to remember that Doughty and Garner’s (1948) experiment involved monotone stimuli. It seems likely therefore that the findings result from a different mechanism to the temporal smoothing hypothesis suggested to explain the results of the present experiment. The temporal smoothing hypothesis deals with stimuli that change rapidly in pitch and makes different predictions for some types of stimuli. For example, the temporal smoothing hypothesis suggests that a low plateau would sound lower than a well defined trough as the stimulus stays low for longer, giving the auditory system a greater chance to perceive the lower pitch. Doughty and Garner’s (1948) results from monotone stimuli suggest that longer stimuli will always sound higher in pitch than shorter stimuli of the same frequency. Therefore, although the results of the present experiment are very similar in some respects to those in the early experiments, it is important to remember that they are likely to be based on different auditory mechanisms.

5.4 Experiment 3 – Perceived pitch height and prominence

5.4.1 Introduction

Experiment 2 demonstrated that a plateau in the intonation contour does make syllables sound higher in pitch than a peak of the same frequency. It is therefore possible that a plateau will also make a syllable sound more prominent.

Rietveld and Gussenhoven (1985) investigate the relationship between pitch excursion size and prominence. The authors resynthesised natural rise-falls in sentences in Dutch and varied the size of the pitch excursion by either increasing the maximum frequency of the accent by 1.5 or 3 semitones or decreasing it by 1.5 semitones. These resynthesised sentences were played to Dutch listeners who judged which of the two accents in each was the more prominent and indicated how certain they were about their choice.
The results show that greater pitch excursions are perceived as more prominent and that listeners are sensitive to changes as small as 1.5 semitones.

It is not, however, just the size of the pitch excursion that affects listeners’ judgements of pitch and prominence. Pierrehumbert (1979) demonstrates that the position of an accent in the utterance will also have an effect. Pierrehumbert (1979) resynthesised a string of nonsense syllables so that the maximum frequency of the second of two stressed syllables was increased and decreased by small increments. Subjects judged whether the first or second accent in each syllable string sounded higher in pitch. The results indicated that when the two accents sounded equal in pitch the second accent was actually about 10 Hz lower in frequency. These results are interpreted as reflecting “normalisation for declination” (Pierrehumbert 1979: 363). Declination, or the tendency for pitch to drift downwards over an intonational phrase, means that peaks later in the utterance will, in general, be less high than those earlier on. It is suggested by Pierrehumbert (1979) that the listener normalises for the expected slope of declination and therefore assumes that peaks later on are ‘worth more’ as they are further away from the speaker’s (declining) baseline than peaks of the same frequency earlier on in the utterance.

Further experimental results from the literature suggest that many aspects of the contour may influence the degree to which listeners normalise for declination and pitch height. For example, Gussenhoven and Rietveld (1998) resynthesised a female voice so that the formants were appropriate for either an average male or female speaker. In a similar task to those described above, listeners were asked to judge the prominence of accents within a sentence.
Results showed that subjects judged accents of the same frequency to be more prominent when the formants were appropriate for a male than when they were appropriate for a female, suggesting that listeners make a judgement about the speaker’s natural pitch range and assign prominence based on the relation of an individual accent to this estimate. So, in this case, the male-sounding voice is expected to have a narrower pitch range and therefore accents are assumed to be further above the baseline, and therefore more prominent, than accents of equal frequency in the speech of the female-sounding voice.

The studies described in this section are of importance for two reasons. Firstly, they indicate that higher accents, those with a greater pitch excursion, are usually interpreted as more prominent and this suggests that the plateau, which, as we have seen, increases the perception of height, may also affect judgements of prominence. Secondly, the method employed in these studies will allow us to judge the effect of the plateau on perceived prominence. If the plateau is included as a variable in an experiment similar to those cited above, we should expect syllables with a plateau in the contour to be judged as more prominent than syllables with a sharp peak of the same frequency. The hypothesis is, therefore, that two accents will sound of equal pitch when the second is at a lower frequency if it is realised as a plateau rather than as a peak.

Figure 5.9 Accents will sound of equal pitch if the second accent is realised as a plateau at a lower frequency

5.4.2 Method

5.4.2.1 Stimuli

The sentence ‘Anna came with Manny’, used in the previous two experiments, was resynthesised so that the nuclear accent (on ‘Manny’) varied in both frequency and shape.
Frequency varied in seven equal steps between 160 and 220 Hz whilst at each of these frequencies the shape of the contour was either a sharp peak or a plateau of 100 ms in duration. As in previous experiments the plateau was extended backwards in time and the rise began at the beginning of the word ‘Manny’. In each case the rest of the utterance was unmodified, with the frequency of the accent on ‘Anna’ remaining at 224 Hz. Figure 5.10 is a schematic diagram of the different pitch contours created.

Figure 5.10 Schematic representation of the pitch contour in different versions of the sentence

5.4.2.2 Setup and instructions

Subjects heard each of the sentences in a pseudo-random order (so that no frequency or shape occurred in two successive presentations) and for each were asked to compare the accents on ‘Anna’ and ‘Manny’. All subjects heard the same sentences in the same order but twelve were asked which accent sounded higher in pitch whilst the remaining twelve were asked which accent was more prominent. They registered their choice by pressing one of two buttons labelled ‘A’ and ‘M’. This experiment was run after the experiment presented in this chapter as Experiment 1 using the same setup as described there.

5.4.2.3 Subjects

Subjects were the same 24 who served as subjects in Experiment 1. The twelve who judged pitch height were six men and six women aged between 18 and 31 (mean = 21.9). Seven were proficient musicians, four had some musical training whilst one was not musical. The twelve who judged accent prominence were three men and nine women aged between 20 and 25 (mean = 21.3). Seven were proficient musicians, two had some training and three were not musical.

5.4.3 Results

Probit analysis identified the point of subjective equality in each stimulus series (peak and plateau) for each subject.
The point of subjective equality is the frequency of ‘Manny’ at which the accents on ‘Anna’ and ‘Manny’ sound to be of equal prominence or pitch and therefore subjects are effectively guessing in order to make responses. For subjects asked about height, there were more ‘Manny’ responses for plateau than peak stimuli at every frequency except the lowest, as shown in Figure 5.11. In addition, the mean point of subjective equality (roughly equivalent to 50% on the graph) was at a higher frequency in the peak than in the plateau series. Paired t-tests revealed that this difference was significant (201 Hz vs. 190 Hz, t(11) = 5.086, p<0.01). The results for judgements about prominence closely mirrored those for height. As shown in Figure 5.12 there were more ‘Manny’ responses to plateau than peak stimuli at each frequency except the lowest and the point of subjective equality was higher in the peak than in the plateau series (209 Hz vs. 196 Hz, t(11) = 3.623, p<0.01). Comparisons were also made between the point of subjective equality found for height and prominence judgements. There was no significant difference between results gained by the height and the prominence task for either the peak (201 Hz vs. 209 Hz, t(11) = 1.966, p>0.05) or the plateau stimuli set (190 Hz vs. 196 Hz, t(11) = 1.384, p>0.05) suggesting that both groups of subjects performed in the same way and that responses did not differ depending on the instructions (pitch or prominence) given. 
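The way a point of subjective equality is read off a response curve can be sketched as follows. This is a simplified stand-in for the probit analysis actually used, since it fits a least-squares straight line to probit-transformed proportions rather than using maximum likelihood. The response proportions and the number of trials per point are invented for illustration, and the function name is hypothetical.

```python
from statistics import NormalDist


def probit_pse(freqs_hz, prop_manny, n_trials):
    """Estimate the frequency at which 'Manny' responses reach 50%.

    Proportions are converted to z-scores with the inverse normal CDF and a
    least-squares line z = intercept + slope * f is fitted (a simplification
    of full probit analysis). The PSE is where the line crosses z = 0.
    """
    unit_normal = NormalDist()
    # Clip proportions of 0 and 1 so the inverse CDF stays finite.
    eps = 1.0 / (2 * n_trials)
    z = [unit_normal.inv_cdf(min(max(p, eps), 1 - eps)) for p in prop_manny]
    n = len(freqs_hz)
    mean_f = sum(freqs_hz) / n
    mean_z = sum(z) / n
    slope = (
        sum((f - mean_f) * (zi - mean_z) for f, zi in zip(freqs_hz, z))
        / sum((f - mean_f) ** 2 for f in freqs_hz)
    )
    intercept = mean_z - slope * mean_f
    return -intercept / slope


freqs = [160, 170, 180, 190, 200, 210, 220]
# Hypothetical proportions of 'Manny' responses for one subject; not real data.
hypothetical_peak = [0.05, 0.10, 0.20, 0.35, 0.50, 0.70, 0.90]
hypothetical_plateau = [0.05, 0.15, 0.30, 0.50, 0.70, 0.85, 0.95]

pse_peak = probit_pse(freqs, hypothetical_peak, n_trials=20)
pse_plateau = probit_pse(freqs, hypothetical_plateau, n_trials=20)
```

A lower point of subjective equality in the plateau series means that the accent on ‘Manny’ needs less frequency to match the accent on ‘Anna’, which is the pattern reported above.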
Figure 5.11 Percentage of 'Manny' responses at each frequency in the height judgement task (x-axis: frequency of ‘Manny’, 160 to 220 Hz; y-axis: % of ‘Manny’ responses; series: Peak, Plateau)

Figure 5.12 Percentage of 'Manny' responses at each frequency in the prominence judgement task (x-axis: frequency of ‘Manny’, 160 to 220 Hz; y-axis: % of ‘Manny’ responses; series: Peak, Plateau)

5.4.4 Discussion

These results suggest that the findings for pitch height in Experiment 2 can indeed be extended to prominence judgements, as in the present experiment a plateau in the contour makes the syllable sound both higher and more prominent. They also suggest that, in the present experiment at least, pitch height is a close correlate of perceived prominence as the results of the two tasks are so similar.

Although not significant, the results suggest that the point of subjective equality is higher in both stimulus sets when subjects are asked about prominence than when they are asked about pitch height. This trend is the opposite of that found by Terken (1999). However, both the stimuli and the task were different in Terken’s experiment. Terken used a string of monosyllables and asked subjects to adjust the frequency of the second peak until it was of equal pitch or prominence to the first peak. In addition, Terken (1999: 1775) states that the timing of the accents may have been somewhat unnatural, both first and second accents being timed early within the syllable. It is possible that the non-linguistic nature of the stimuli removed the close association between pitch and prominence found here and also that the unnaturalness of the timing of accents led to the different result.
The results of the present experiment also suggest that in both tasks there is about a 10 Hz difference between the point of subjective equality in peak and plateau stimuli. This result suggests that the effect of the plateau is to raise the perceived pitch of an accent by about 10 Hz. This is roughly consistent with the results of Doughty and Garner (1948) who, as discussed in section 5.3.3, found that at 250 Hz the pitch drop for the shortest tones corresponded to 4% or 10 Hz. However, as suggested above, the mechanisms behind the two results are probably rather different as Doughty and Garner worked with monotone stimuli rather than dynamic pitch contours.
The results from the peak stimuli alone suggest that the point of subjective equality occurred at a lower frequency than in the results presented by Pierrehumbert (1979). In Pierrehumbert’s (1979) results, subjective equality occurred when the second accent was 10 Hz lower than the first. In the present experiment the accent on ‘Anna’ was 224 Hz in frequency and the point of subjective equality was at 201 Hz (a difference of 23 Hz) for height judgements and 209 Hz (a difference of 15 Hz) for prominence judgements. This slight difference in the details of the results may be explained on the basis of several factors: the present stimuli consist of real words rather than reiterant speech, the quality of the resynthesis produced by newer systems is likely to be better, and the different speakers probably have different declination slopes.
5.5 General Discussion
5.5.1 The physiological mechanisms underlying temporal smoothing
The results from Experiments 2 and 3 have shown that plateaux sound higher in pitch (and therefore more prominent) than sharp peaks of the same maximum frequency but there is no perceptual difference between two plateaux of different durations. As discussed in section 5.3.3, the categorical rather than gradient effect suggests that a suitable explanation may be one based on temporal smoothing. In order to understand more fully the mechanisms behind this temporal smoothing, we must turn to the physiological processes that underlie pitch perception. It is generally considered that there are two perceptual mechanisms for extracting pitch from a signal (see Moore 1997, especially ch. 5, for a review). The first of these is known as the place mechanism whereby the perceived pitch is related to the pattern of excitation on the basilar membrane. Specifically, the basilar membrane, which acts like a filter bank, vibrates maximally at different places along its length in response to different frequencies in the signal. Thus, the location of the maximum vibration allows for the extraction of pitch. Although the place mechanism operates to some extent at all frequencies it is generally considered to be less useful for complex sounds such as speech due to the complex excitation pattern that such stimuli produce on the basilar membrane. The second pitch extraction mechanism is considered to be a temporal mechanism based on the timing of neural activity in the auditory nerve. The temporal mechanism is believed to be the main determiner of perceived pitch for speech sounds. As the basilar membrane vibrates in response to stimuli, a shearing motion is created between it and the tectorial membrane.
This motion displaces the stereocilia of the outer hair cells causing the inner hair cells to fire and send information to the auditory nerve. Phase locking occurs as spikes in the activity of the auditory nerve occur regularly at the same phase of the stimulating waveform. In this way pitch is extracted from the signal as integral multiples of the intervals between nerve firings. Temporal mechanisms of extracting pitch are present over the entire speech range but generally not above 5 kHz as phase locking cannot occur at such high frequencies due to physiological constraints. Even at frequencies below 5 kHz recent work suggests that the phase locking mechanism may not work well if the frequency of the signal is changing too rapidly. For example, Sek and Moore (1999) investigate the discrimination of frequency steps linked by glides of various durations. In this experiment subjects compared the pitch of two sounds. One sound was a sinusoid of constant frequency and the other consisted of an initial and final portion of steady frequency linked by a downward frequency glide of between 5 and 500 ms in duration. The duration of all stimuli was 500 ms, so that stimuli with longer glides had correspondingly shorter steady states before and after the glide. Frequency discrimination was found to be worse as the glide duration increased. In particular performance worsened between glide durations of 200 ms and 500 ms. For glides of 200 ms there was still some steady region of frequency whereas for glides of 500 ms there was no steady state. In a second condition the frequency glide was replaced by a 5 ms interval of silence. A comparison of this condition and the stimuli containing 500 ms glides reveals that below 4 kHz performance was better for the stimuli containing silence.
The authors suggest that, at frequencies where phase locking is operating, discrimination is better when frequency is changing less rapidly. Gockel et al. (2001) go on to investigate this effect in response to frequency modulated tones. They obtained pitch matches between unmodulated sinusoids and those with repeated u-shaped or inverted u-shaped frequency modulation functions. Subjects adjusted the frequency of the unmodulated tone until it matched the pitch of the modulated one. Results showed that for u-shaped modulation patterns the stimuli were matched to sinusoids of lower frequencies than the mean frequency of the modulated tone. For the inverted u-shaped stimuli matches were made to sinusoids of a frequency higher than that mean. The authors interpret these pitch shifts as demonstrating auditory sluggishness in that the portions of the signal where pitch is changing rapidly “receive less weight in the computation of pitch than portions where the frequency is changing more slowly” (Gockel et al., 2001: 705) and refer to this effect as “stability-sensitive weighting” (Gockel et al., 2001: 702). It seems likely that ‘stability-sensitive weighting’ can, to some extent, explain the results found in Experiments 2 and 3. In peak stimuli, the fundamental frequency is changing rapidly and it is possible therefore that the auditory system does not have time to phase lock to the very highest frequency as it is reached so briefly. In this way peak stimuli will be perceived as lower in pitch than they actually are. For plateau stimuli, on the other hand, the highest frequency is sustained for enough time to allow the system to phase lock and the contour to be perceived as the height that it really is. This stability-sensitive weighting effect also explains the lack of a difference between the two plateau durations found in Experiment 2.
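The idea of stability-sensitive weighting can be made concrete with a toy calculation. The sketch below builds two piecewise-linear F0 contours with the same maximum (a sharp peak, and a 100 ms plateau extended backwards from the same end point) and computes a weighted mean F0 in which rapidly changing portions receive less weight. This is not the model of Gockel et al. (2001): the weighting function `1 / (1 + |slope| / k)`, the constant `k`, the function names and the contour timings are all assumptions made for illustration.

```python
def contour(peak_hz, base_hz, rise_ms, plateau_ms, fall_ms, step_ms=1):
    """Piecewise-linear F0 contour: rise, optional plateau, fall."""
    samples = []
    n_rise = rise_ms // step_ms
    n_fall = fall_ms // step_ms
    for i in range(n_rise):
        samples.append(base_hz + (peak_hz - base_hz) * i / n_rise)
    for _ in range(plateau_ms // step_ms):
        samples.append(float(peak_hz))
    for i in range(n_fall + 1):
        samples.append(peak_hz - (peak_hz - base_hz) * i / n_fall)
    return samples


def weighted_pitch(samples, step_ms=1, k=0.5):
    """Weighted mean F0 that down-weights samples where F0 changes quickly.

    weight = 1 / (1 + |slope| / k), slope in Hz per ms; k is an arbitrary
    illustrative constant, not a fitted parameter.
    """
    total_weight = 0.0
    total = 0.0
    for i, f in enumerate(samples):
        prev = samples[i - 1] if i > 0 else f
        nxt = samples[i + 1] if i + 1 < len(samples) else f
        slope = abs(nxt - prev) / (2 * step_ms)
        weight = 1.0 / (1.0 + slope / k)
        total_weight += weight
        total += weight * f
    return total / total_weight


# Assumed timings: same overall duration and same fall; the plateau replaces
# the last 100 ms of the rise, so its end point matches the peak position.
peak_contour = contour(200, 120, rise_ms=200, plateau_ms=0, fall_ms=150)
plateau_contour = contour(200, 120, rise_ms=100, plateau_ms=100, fall_ms=150)

pitch_peak = weighted_pitch(peak_contour)
pitch_plateau = weighted_pitch(plateau_contour)
plain_mean_plateau = sum(plateau_contour) / len(plateau_contour)
print(pitch_peak, pitch_plateau, plain_mean_plateau)
```

In this toy model the plateau contour yields a higher weighted pitch than the peak contour, and the weighting pulls the estimate for the plateau contour above its unweighted mean. A simple weighted mean of this kind would also keep rising with plateau duration, so on its own it does not reproduce the absence of a difference between the two plateau durations.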
In the 50 ms plateau stimuli the high frequency is already sustained for enough time for the system to phase lock and the real pitch to be extracted so lengthening the plateau further has no significant effect. It is likely that the differential ability of the system to phase lock also influences the results of experiments such as those reported by Doughty and Garner (1948) as the shortest tones may be too brief for phase locking to occur as so few cycles of the waveform are presented.
5.5.2 Biological codes relating intonational form to function
It is clear then that the shape of the contour can affect the perception of pitch and that this effect can be explained by physiological mechanisms such as auditory sluggishness. However, in order to answer the original question of why speakers produce plateaux at all, we need to consider how speakers can use their knowledge that plateaux cause perception of higher pitch for communicative purposes. The following sections will consider how speakers make use of universal, biological codes that relate intonational form to function. Gussenhoven (2002) discusses three metaphors or biological codes that relate aspects of the speech production process to meaning. The first of these, the frequency code (e.g. Ohala 1994), relates to the fact that there are universal tendencies for particular uses of fundamental frequency to be associated with particular emotions and sentence types. Firstly it seems that there are cross-culturally similar uses of F0 to signal emotions. For example, high or rising pitch is generally associated with politeness, deference or lack of confidence. As an extension of this, most languages use a high or rising F0 to mark questions and a low or falling F0 to mark statements.
This informational use is considered to be related to the emotional use of high or rising F0 because the questioner desires the good will of the listener and hence wishes to appear polite and deferent. The universal uses of F0 captured in the frequency code are explained by reference to the agonistic displays of non-human animals. Morton (1977) points out that in virtually all vocalising species, when competitive encounters occur, the animal who is, or who wishes to appear, more dominant makes lower frequency vocalisations than the animal who is more submissive. Morton (1977) links this vocal behaviour to aspects of visual competitive behaviour such as hair-raising in dogs and back-arching in cats, both of which make an animal appear larger. He suggests that lower frequency vocalisations give the impression of a larger larynx and, by extension, the impression of a larger overall body size. This impression of larger size is considered to be advantageous as a larger animal is more likely to win if a physical fight ensues. Higher frequency vocalisations, by contrast, are interpreted as the animal wishing to show its smaller size and that it does not want to fight. Impressions of size for agonistic purposes may also be permanent. The males of many species develop humps, antlers or manes, which give a permanent impression of larger size. Ohala (1994) considers the sexual dimorphism of the human larynx to also be a case in point. So, at puberty the male larynx both increases in size and descends, lengthening the vocal tract. Both of these features cause the adult male to have a lower fundamental frequency than infants or adult females. Ohala (1994) suggests that this change occurs at puberty because, in evolutionary terms, this is the time when the male will be ready to take on a protective or sexual role and therefore be engaged in combat situations. 
Also at puberty the human male’s facial hair begins to grow, a visual display of size equivalent to those of other animals. In addition to the frequency code Gussenhoven (2002) discusses two other biological codes that also exploit universal features of fundamental frequency for communicative purposes. These are the effort code and the production (phase) code. The effort code states that the relationship between F0 and meaning can be tied to the amount of effort expended by the speaker. More effort will result in more numerous, canonical and precise pitch movements. Thus, for example, wider pitch movements are associated with greater prominence (as discussed in section 5.4.1 above) because it is considered that the speaker puts in more effort as s/he believes the associated information to be important. The production code states that energy for speech becomes available in phases as it is linked to the breathing process. Thus more energy is available at the beginning than at the end of the utterance and therefore pitch will be higher at the beginning than at the end, all other things being equal. In communication then, high pitch at the beginning of an utterance may signal a new topic whilst low pitch may signal a continuation. The reverse is true of utterance ends where high pitch signals continuation and low pitch finality. Importantly, it seems that the physical correlates of these three codes do not actually have to occur for the intended meaning to be transmitted. It is enough to create the effect of a particular code without actually using the physiological mechanisms that underlie it. So, in the example of the effort code, where wider pitch movements are associated with emphasis and prominence, the pitch does not actually have to be higher for this effect to be created.
This is nicely demonstrated by the experiment of Gussenhoven and Rietveld (1998) discussed in section 5.4.1, which shows that the perception of prominence is speaker dependent. The main finding was that peaks of the same frequency are perceived as less prominent if made by a female-sounding voice because they are considered to be closer to the baseline of a female speaker and to therefore have required less effort in production. This result is an indication that the effect of the effort code can be created without actually having to reach higher in pitch but merely by the impression that the speaker has made more effort. In this way the variables commonly associated with the three biological codes can be substituted for variables that create the same perceptual effect.
5.5.2.1 Peak delay as a substitute variable for peak height
One feature that, as we will shortly see, is related to the communicative use of the plateau is the substitution of peak delay for peak height. As we have seen in Chapter 3, higher peaks take longer to reach and are correspondingly timed later in the tone-bearing unit than lower frequency peaks. Thus, as Gussenhoven (2002) suggests, speakers can use peak delay as a substitute for (or an enhancement of) high peaks as listeners will interpret a late peak as a high peak by their knowledge of this constraint. In terms of the effort code Ladd and Morton (1997: 321) indicate that peaks are invariably later when speakers produce emphatic contours. In one perception experiment Ladd and Morton (1997) create two sets of stimuli using sentences such as ‘the alarm went off’. In both stimulus sets the accent (on ‘alarm’ for example) varied in frequency but in the second set the alignment of the peak was 60 ms later than alignment in the first set.
Subjects took part in an identification task where they were asked whether the event being described was a normal or an unusual experience, previous experiments having shown that higher peaks resulted in more ‘unusual experience’ responses than lower peaks. The results indicate that the abrupt rise from ‘normal’ to ‘unusual experience’ responses occurs at a lower frequency in the late than in the early alignment stimuli indicating that late peaks lead to accents sounding more prominent or emphatic than earlier peaks of the same frequency. Gussenhoven (2002) also discusses how the frequency code can be exploited by the use of peak delay as a substitute for peak height. Gussenhoven and Chen (2000) recorded made-up CVCVCV sequences and by resynthesis altered the height and alignment of the pitch peak. Stimulus pairs were played which varied in terms of height and alignment, and Dutch, Hungarian and Chinese listeners were asked to judge which member of each pair sounded more like a question. Results showed that, as expected, utterances with higher peaks sounded more like questions. In addition, utterances with later peaks also sounded more like questions. This result was true for listeners of all languages even though all three languages use different methods to signal interrogativity. Nevertheless, there appeared to be some language specific component as, for example, Hungarian listeners were more affected by peak height than Chinese listeners, an effect attributed to the different intonational means used to signal interrogativity in the two languages. Thus peak delay may also exploit the meanings associated with the frequency code. Finally, speakers may also use delayed peaks to simulate the effects of the production code. For example, Wichmann et al. (1999) investigate the effects of discourse structure on peak timing in English.
Stimuli consisted of texts where the same word occurred in three different positions, sentence final (nuclear), paragraph initial (the first accented syllable in a paragraph) and sentence initial (the first accented syllable in a paragraph medial sentence). Measurements of peak alignment and height were taken from subjects’ readings of these texts. Results, reproduced below in Table 5.2, replicate findings that later accents are lower in frequency due to declination and also indicate that there is a significant difference between alignment in all discourse positions, with the more initial positions being associated with later peaks. In this way later peaks probably enhance the impression of height and help to signal that an accent is early in the utterance.
                     Alignment (% syll)   Height (Hz)
Paragraph Initial    115.7                314
Sentence Initial     105.2                285
Sentence Final       62.2                 189
Table 5.2 Peak alignment and height in different discourse positions (reproduced from Wichmann et al., 1999)
5.5.2.2 The plateau as a substitute variable for peak height
Peak delay can be considered a substitute for peak height and it seems that the plateau may have the same effect. I will discuss this hypothesis in relation to the types of plateau introduced in Chapter 1. I shall begin by discussing the case of the three types of plateau that have not been the focus of this dissertation, before going on, in the next section, to suggest why the plateau might be found as a feature of falling nuclear accents in broad focus declaratives. Firstly, it is interesting to note that in both discussions of high plateau outside the present work (and the ProSynth work), plateaux occur in the same environment as peak delay. The experiment by Wichmann et al.
(1999), discussed in the previous section, indicated that in paragraph initial position some readers reached their maximum frequency at the same time as in other positions but produced a longer plateau, causing the fall, but not the rise, to be aligned later. Wichmann et al. (1999), suggest that there may be two reasons for this plateau. One may be equivalent to simple peak delay and involve delaying only the falling part of the gesture rather than the whole accent. The other may be to signal initiality in an alternative way to peak delay (although the mechanisms by which this would be accomplished are not discussed). The results of Experiments 2 and 3 indicate that the plateau itself will increase the effect of pitch height and therefore, according to the production code, also the perception of initiality. Secondly, work by D’Imperio suggests that the effect of a plateau in Neapolitan Italian influences perception towards questions in the same way as peak delay. In Neapolitan Italian (D’Imperio 2002a) both questions and (narrow-focus) statements are realised by rise-falls, the peaks of which are aligned around 40 ms later in questions. D’Imperio and House (1997) and D’Imperio et al. (2000) conduct perceptual experiments where the peak of a rise-fall configuration is shifted backwards in time in steps of 15 ms and Italian listeners are asked whether each utterance sounds like a statement or a question. Results show that subjects identify earlier peaks as statements and later peaks as questions in the same way as is predicted by the frequency code and found by Gussenhoven and Chen (2000). Also, the result seemed to be universal (D’Imperio et al. 2000) as American listeners, who did not speak Neapolitan Italian, showed similar results to the Italian listeners. 
Results were also influenced by language specific constraints, however, as the crossover point from statement to question responses for American listeners was later than for the Italian listeners, possibly reflecting the phonological difference between early and late rises in American English (D’Imperio 2002a). In a second part of the experiment by D’Imperio et al. (2000), a plateau continuum was created where the plateau was 45 ms in duration, spreading forwards in time from the alignment of the peak. A stimulus set was created with the plateau aligned earlier and later in the stressed vowel and again listeners were asked to label each utterance as a question or a statement. When these results are compared to the peak stimuli it is clear that the plateau elicits more question responses at each alignment point. For example, at the second time step plateau stimuli gained 75% question responses whilst peak stimuli gained only around 10%. These results were further compared to the peak stimulus corresponding to the plateau offset (that is the peak stimulus 3 time steps later than the peak stimulus timed at plateau onset). Here the differences between peak and plateau stimuli were much smaller. For example, at the second time step 82% of responses favoured the question interpretation compared to 75% for plateau stimuli. This supports the suggestion in Chapter 3 that the end of an intonational plateau is the speaker’s real target. However, in D’Imperio et al.’s (2000) experiment we cannot be completely sure that the declination normalisation issue is not having an effect as the plateau spreads forwards in time from the peak and will therefore appear to be at an increased pitch because of its position regardless of its shape.
Despite the complication caused by the declination issue, we again see, in Neapolitan Italian, that the effect of a plateau in the contour is very similar to that of peak delay, in this case leading to the perception of a question rather than a statement. This seems likely to be linked to the frequency code in that higher pitch suggests the speaker needs to appear passive and docile, as they want the cooperation of the listener. D’Imperio et al. (2000) do entertain the idea that the effect of the plateau may be to increase the perceived height of the accent. Therefore, in a further part of the experiment, they create another set of stimuli where the peak varies in scaling but not in timing and subjects again judge whether utterances are statements or questions. Some effect of height is found, specifically that high peaks (300 Hz) sound more like questions than low peaks (260 Hz). D’Imperio et al. (2000) largely reject this result as an explanation for their earlier findings, however, on the grounds that there is no significant difference between peaks of medium height (280 Hz) and peaks of either of the other two heights. However, as we now know that a plateau in the contour does indeed make syllables sound higher in pitch it seems reasonable to suggest that the plateau may have had a similar effect in this case. The fact that no difference was found between D’Imperio et al.’s (2000) low and medium or medium and high peaks may be due to the fact that the frequency manipulation changed the height of all targets, rather than just the peak, possibly lessening the effect of pitch height as the baseline was also altered. In addition, the 20 Hz changes used result in about 1.3 semitones difference between the low and medium peaks but only 1.2 semitones difference between the medium and high peaks.
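The semitone figures quoted above follow from the standard conversion between a frequency ratio and a musical interval, 12 × log2(f_high / f_low). The short sketch below applies it to the three peak heights given in the text (260, 280 and 300 Hz); the function and variable names are chosen here for illustration only.

```python
import math


def semitones(f_low_hz, f_high_hz):
    """Interval in semitones between two frequencies: 12 * log2(f_high / f_low)."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


# Peak heights reported for the scaling continuum discussed above.
low, medium, high = 260.0, 280.0, 300.0

low_to_medium = semitones(low, medium)
medium_to_high = semitones(medium, high)
print(round(low_to_medium, 2), round(medium_to_high, 2))
```

Because the semitone scale is logarithmic, equal 20 Hz steps correspond to a slightly smaller interval higher in the range, which is why the two steps are not perceptually equivalent.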
Thus it seems that high plateaux can act as a surrogate for pitch height in the same way as peak delay. This tactic takes advantage of the properties of the auditory system, which cause plateaux to sound higher in pitch. In this way the plateau appears to be used as a substitute variable for pitch height exploiting both the frequency code (to increase the perception of interrogativity) and the production code (to increase the perception of initiality). The plateaux in Japanese found by Pierrehumbert and Beckman (1988) represent a slightly different case because they are associated with low boundary tones rather than with high targets within a pitch accent. Nevertheless the same effects are probably at work. Due to the auditory sluggishness hypothesis it is likely that a short low tone will sound higher in pitch than a long low tone because the auditory system will not have time to extract the lowest pitch from the contour. Thus, the low plateau or strong allophone will sound lower in pitch. This lowering of a low tone is likely to be equivalent to the raising of a high tone and goes along with Pierrehumbert and Beckman’s (1988: 29) suggestion that the strong allophone will sound lower in pitch and be associated with a stronger boundary than the weak allophone.
5.5.2.3 Interaction of variables signalling peak height
Of course, it is certainly not the case that the plateau is the only determinant of perceived pitch height or prominence in an utterance. As we have seen, the biological codes can be used quite straightforwardly by actually increasing the frequency of intonational peaks to increase perceived prominence. Alternatively, or in addition, peaks can be delayed, also creating an impression of added height. In some cases these different methods of creating the impression of pitch height may in fact be incompatible.
This seems to be the explanation behind the shorter plateaux found in expanded pitch spans in Chapter 3. It would perhaps be plausible to suggest that as the expanded spans are likely to make the accents sound more emphatic (due to the effort code), the plateau should also be longer to further increase the effect of prominence. However, it seems that in this case the effort code is used in a straightforward fashion, the perception of height actually being created by a higher maximum frequency and enhanced by the later alignment of the beginning of the plateau and the peak. Other constraints on alignment, such as the rate of change of the fall and the alignment of the end of the plateau, may mean that it is more difficult for the speaker to realise a plateau as well. Even if this were not the case it would be difficult to have both a longer plateau for emphasis and a later peak alignment as the end of the plateau must be anchored and therefore lengthening can only occur by starting the plateau earlier.
5.5.2.4 The plateau as a substitute variable in nuclear position in broad focus declaratives
Although I have suggested how plateaux may be used to signal initiality or interrogativity, I have not so far discussed how the majority of plateaux studied in this thesis and in House et al. (1999a and 1999b) might be related to the various biological codes. Why should plateaux be found so consistently in nuclear position? Cruttenden (1997: 42) states that the word ‘nucleus’ “is used to describe the pitch accent which stands out as the most prominent in an intonation group”. In broad focus declaratives in English, such as the utterances studied in Chapter 2, the nucleus is placed on the stressed syllable of the final content word of the intonation phrase.
This final position of the nucleus may be at odds, to some extent, with the idea that the nuclear accent should be the most prominent since the effects of declination will mean that the nucleus will have a lower maximum frequency than any of the prenuclear accents. Although this problem is partially solved by listeners normalising for declination it is possible that speakers also produce a plateau in such circumstances to add to the effect of prominence. Although this dissertation has not looked at the realisation of high targets in prenuclear position, unpublished work from the ProSynth project (Knight ms) suggests that prenuclear accents can be modelled successfully for speech synthesis by using one high turning point representing the peak rather than two points representing the beginning and end of a plateau. Thus it seems that plateaux are usually a feature of nuclear rather than prenuclear position, except in paragraph initial position (cf. Wichmann et al. 1999), where they may be used to signal initiality.
5.5.3 Physiology of intonation
We have still not fully resolved why speakers should consistently produce plateaux in nuclear position. Rather than realising a plateau to increase prominence the speaker could, in theory, raise the frequency of the peak or alter the slope of declination. The physiology of intonation suggests, however, that these strategies may be impossible or at least rather effortful. It is likely that the phenomenon of declination is automatic to some extent. Ladd (1984: 64) in reviewing this issue states that the idea of an automatic declination component is attractive as it explains the near universality of declination across the world’s languages. An automatic component would also explain the declination evident in the vocalisations of other primates (Hauser and Fowler 1992).
Indeed it seems that there is good evidence for declination being linked to an automatic physiological mechanism. For example, Lieberman (1967: 25) suggests that falls in fundamental frequency at the end of “unmarked breath groups” are due to a drop in subglottal pressure at the end of exhalation so that the pressure within the lungs will be below atmospheric pressure for the next inhalation. Collier (1975) takes up this idea and measures both subglottal pressure and the activity of various laryngeal muscles during production of Dutch sentences. The activity of the laryngeal muscles was found not to vary systematically with the slope of declination. Measures of subglottal pressure, on the other hand, show that, whilst it does not covary with the individual accents in the utterance, pressure is indeed positively correlated with the slope of declination. Thus, Collier (1975) sees intonation as being due to two interacting components. The first component is the declination line related largely to subglottal pressure. The second is the individual accents that are superimposed onto this declination line. The mechanisms responsible for the control of these accents are different from those responsible for the slope of the declination line. Collier (1975: 250) shows that the main determiner of the height and location of F0 accents is the action of the cricothyroid muscle. The cricothyroid muscle produces activity around 94 ms before F0 peaks and is activated more strongly before higher frequency peaks. It seems that activity of this muscle lengthens the vocal folds making them longer and tighter and therefore likely to vibrate at a higher frequency (see Orlikoff and Kahane 1996 for a review of the structure and function of the larynx including the activity of the different laryngeal muscles). 
Collier (1975) demonstrates that for two F0 peaks of around the same frequency, the one later in the utterance is associated with a greater activity of the cricothyroid muscle. He suggests that this greater activation is produced in order to overcome the pitch lowering effect of the falling subglottal pressure. Also related to this is the action of the cricothyroid muscle when the speaker produces a plateau. In the long (around 800 ms) plateau produced by the speaker in the example given by Collier (1975: 254) the activity of the cricothyroid muscle is sustained throughout. Thus, if declination is physiologically determined and difficult to override it is possible that speakers may wish to find energy-efficient ways to counteract its effects. This could be the physiological basis behind the plateaux we find in nuclear position. The realisation of a high target as a plateau adds to the perceived height and prominence of the nucleus without the effort needed to counteract the slope of declination by actually achieving a higher fundamental frequency. Thus, it may be more economical, in some sense, for speakers to sustain the activity of the cricothyroid muscle to create a plateau rather than to initiate greater activity in order to produce a higher peak. In this way plateaux in nuclear position in broad focus declaratives are again substitute variables for peak height but this time under the umbrella of the effort code. The speaker uses a plateau to give the impression of height, which the listener associates with greater effort and therefore greater importance.
5.6 Conclusions
The presence of a plateau in the contour adds to the perceived height and prominence (but not to the perceived duration) of the syllable with which it is associated. This perceptual effect is probably caused by the auditory sluggishness found when frequency changes rapidly.
The effect of this sluggishness is that in peak stimuli the system fails to phase lock to the highest frequencies of the peak and therefore pitch is perceived as lower than it actually is. For stimuli containing a plateau, the system has enough time to phase lock and pitch is perceived more accurately. It seems that in natural speech speakers may use their knowledge about the workings of the auditory system to take advantage of various biological codes that relate intonational form to meaning. In this way, plateaux may be used as a substitute variable for peak height. Thus plateaux are associated with paragraph initial accents (the production code), accents marking questions (the frequency code) and the nuclear accent in broad focus declaratives (the effort code). The physiological reasons behind a speaker’s choice of a plateau rather than a higher frequency peak probably lie in the different actions of the cricothyroid muscle. This muscle must be activated for a longer time to create a plateau than to create a peak but more strongly to produce a peak of a higher frequency. As both actions create the same perceptual effect it is likely that the speaker chooses the tactic that is more energy-efficient.