Pausing in Dialogues and Read Speech in Swedish:
Speakers’ Production and Listeners’ Interpretation
Beáta Megyesi and Sofia Gustafson-Čapková
Centre for Speech Technology
Department of Speech, Music and Hearing
KTH
S-10044, Stockholm, Sweden
bea@speech.kth.se
Department of Linguistics
Computational Linguistics
Stockholm University
S-10691 Stockholm, Sweden
sofia@ling.su.se
Abstract
In this study, we investigate the characteristics of pausing in
speakers’ production and listeners’ interpretation in three different speaking styles in Swedish: elicited spontaneous dialogues,
professional and non-professional news reading. Considerable
attention is given to the positions in which pauses can appear,
in particular their discourse context regarding theme shift. We
show that the acoustic silent intervals that are perceived by the
listeners correlate with the discourse structure, while perceived
pauses that have an acoustic silence in the speech signal correlate
with the duration of the acoustic silence.
The results show clear differences between the speaking
styles. In reading, the majority of acoustic pauses are perceived
and the majority of both the acoustic and perceived pauses are
located at theme shift. In dialogues, on the other hand, few
acoustic pauses are perceived by the listeners and the majority of both the acoustic and perceived pauses are positioned at
theme continuation. Furthermore, where many pauses are perceived by the listeners, such as in non-professional reading and
dialogues, we find long acoustic silent intervals.
1. Introduction
In the last decades, many studies have been carried out to investigate the characteristics of pausing. One reason is that pauses
often indicate prosodic phrase boundaries which highlight the
organization of the message [1], [2], [3], [4], [5], [6]. Therefore,
knowledge about the variation of pausing in different speaking styles is necessary for several applications, such as text-to-speech systems, speech recognition, and dialogue systems
where the structure of the message can be crucial for good system performance.
The purpose of this study is to investigate the distribution
of pauses in Swedish in three different speaking styles: elicited
spontaneous dialogues, and news read by both professional announcers of radio news and non-professional readers. The questions addressed are in what positions silent intervals occur and where listeners perceive pauses. Do the discourse environments in which acoustic silence appears have any effect on the
perceptual interpretation of pausing? In this study, pauses found
in the acoustic signal are compared to the pauses perceived by
listeners regarding frequency and position.
2. Background
Previous studies have shown that large differences can be found
in the characteristics of pausing across speaking styles.
Several studies [3], [7], [8], [9] report that the pause intervals in spoken language vary across genres, e.g. spontaneous speech and reading aloud.
Spontaneous dialogues and the read version of the same text
have been compared for Swedish in [1] and for English in [3].
These studies reported that the number and the distribution of
pauses as well as the speech rate differ across the speaking
styles.
Hirschberg [3] reports that read speech is more rapid than
spontaneous speech when examining dialogues taken from the
American English ARPA ATIS 0 corpus and the transliteration
of these dialogues, read aloud by the same subjects. In [1], a
spontaneous dialogue and the read version of the same speech
in Swedish are compared and it is reported, among other results,
that the number and the distribution of pauses differ between
the speech styles.
In [8] and [9], the distribution and features of pauses in
professional news announcement, non-professional news reading and monologues have been compared. The results show
that spontaneous speech contains long and frequently occurring pauses, while professional announcing is characterized
by shorter and fewer pauses. Non-professional announcing
falls between these two extremes. The pauses occur
mainly in places relevant to the underlying message, e.g. at syntactic boundaries, and at semantically important words. However, pauses also occur in other positions. In those cases there
seems to be a preference for certain sites, e.g. in connection to conjunctions.
Fant & Kruckenberg [10] and [11] investigated pausing
phenomena in Swedish. They carefully examined durational
patterns and local F0-contours in nine sentences read by a professional reader, and one sentence read by 15 non-professional
readers. They report that pause duration ranges from 50 to 100 ms for short prompters and from 1 to 2 seconds between sentences.
Pause duration within sentences normally ranges from
300 to 600 ms. Furthermore, they report that pauses at sentence
boundaries are usually prolonged and final lengthening is more
frequent at phrase boundaries than at sentence boundaries.
The relevance of pausing for indicating clause and sentence
boundaries is also pointed out by Garman [12] and Goldman-Eisler [13].
Swerts & Geluykens [6] showed that speakers in monologue discourse vary the duration and position of pauses on the
basis of information structure. Pauses occur between all topical
units, and directly after the topic-introducing phrase or clause.
In the following sections, we will describe a study on pausing in Swedish dialogues and read speech where we relate
acoustic silent intervals, the perception of pauses and the discourse environment of these two aspects of pausing.
3. Acoustic and Perceived Pauses in Three
Speaking Styles
This study focuses on differences between read speech and dialogues in three speaking styles:
professional news announcing
non-professional reading
elicited spontaneous dialogues
The material of read speech consists of recordings of
Swedish radio news [14] read by four professional and four
non-professional readers. The spontaneous speech material [15]
consists of recordings of two Swedish map task dialogues, each
with two dialogue participants. The materials consist of 920
words each.
To compare pauses across the three speaking styles, we investigated three dimensions of communication (production, perception, and context) and collected data on each:
acoustic data
subjects’ perception of pauses
data on discourse structure in the texts
In order to investigate the duration, frequency, type and position of acoustic pauses, the speech data was processed automatically by a pause detector. Silent intervals longer than or equal to 100 ms were defined as the acoustic correlate of pausing. Pauses may include natural physical phenomena such as breathing and swallowing intervals. However, particles expressing conversational support (e.g. mmm, aaa, aha) in dialogues are not allowed inside pauses. The output of the automatic detection was checked manually in order to properly include relevant disfluencies.
To find out what kind of acoustic pauses are perceived by listeners, and where the perceived pauses occur, i.e. to examine the frequency and position of the perceived pauses, 20 human subjects annotated the positions of what they identified as pauses. They were asked to use different labels for long and short pauses, and also to mark cases where they were uncertain. Two of the subjects were removed from the investigation because of their highly divergent results. Of the remaining eighteen subjects, eight were female and ten male, belonging to different age groups and linguistic backgrounds. Eleven of the subjects had some knowledge of linguistics but none of them had ever participated in a similar experiment.
To be able to investigate the discourse context of the acoustic and perceived pauses, we asked five subjects to annotate each text material (without listening to the audio files) with discourse labels marking theme shift. Four of the subjects were female, one of whom is a co-author of this paper with knowledge about discourse structure. The other subjects had no expert knowledge in linguistics.
4. The Distribution of Acoustic Pauses
The duration, frequency and position features of acoustic pauses are reported in our previous study [16]. Here, we give a brief summary of the most important features that are relevant for this study, as well as new results on the discourse context of acoustic pauses.
The mean duration of acoustic pauses is lowest in the professional reading (271 ms) and highest in the non-professional reading (561 ms), followed by the dialogues (538 ms).
Considering the frequency of acoustic pauses, the ratio of words per acoustic pause is highest in the professional reading (77 words/pause), while the non-professional reading (8.4 words/pause) has a slightly higher ratio than the dialogues (5.5 words/pause).
Although there are differences in the duration and frequency of pauses between the styles, the total length of the speech files is approximately the same for the reading styles. Hence, the average time it takes to pronounce a word differs between the speaking styles, suggesting greater variation in speech tempo.
We can distinguish between different types of pauses, such as silent pauses and complex pauses with breathing and/or swallowing. The study shows that the usage of the types differs across the speaking styles as well as within each style, see Figure 1. For example, in the dialogues and the non-professional reading more than 60% of pauses are silent, while in the skilled reading 83% of pauses are complex. The two different types (silent and complex pauses) are to a certain extent favored in different positions, as described in [16]. The position of the acoustic pauses was labeled according to turn taking, theme shift/continuation and the type of the following constituent: phrase, clause or sentence. The discourse labeling was carried out by the authors independently. The results were compared and, in case of conflicting analyses, the authors agreed upon a reconciled version of the data. In cases where a pause appears inside a phrase, the PoS of the word was marked, as well as whether the word is a phrasal head or not.
Figure 1: The amount of silent and complex pauses in professional and non-professional reading and in dialogues.
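The acoustic pause criterion used in this study (silent intervals of at least 100 ms) and the summary measures reported in this section (mean pause duration, words per pause) could be sketched as follows. This is a minimal illustration, not the pause detector actually used: the frame-level silence decisions, the 10 ms frame length, and all function names are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

MIN_PAUSE_MS = 100.0  # threshold stated in the text


def find_acoustic_pauses(is_silent: Sequence[bool],
                         frame_ms: float = 10.0,  # assumed frame length
                         min_pause_ms: float = MIN_PAUSE_MS
                         ) -> List[Tuple[float, float]]:
    """Return (start_ms, end_ms) for every run of silent frames
    whose duration is at least min_pause_ms."""
    pauses = []
    run_start = None
    # A trailing False acts as a sentinel that closes a final silent run.
    for i, silent in enumerate(list(is_silent) + [False]):
        if silent and run_start is None:
            run_start = i
        elif not silent and run_start is not None:
            start_ms, end_ms = run_start * frame_ms, i * frame_ms
            if end_ms - start_ms >= min_pause_ms:
                pauses.append((start_ms, end_ms))
            run_start = None
    return pauses


def mean_duration_ms(pauses: List[Tuple[float, float]]) -> float:
    """Mean pause duration in milliseconds (0.0 if there are no pauses)."""
    if not pauses:
        return 0.0
    return sum(end - start for start, end in pauses) / len(pauses)


def words_per_pause(n_words: int, n_pauses: int) -> float:
    """Ratio of words to pauses; a high value means few pauses."""
    return n_words / n_pauses if n_pauses else float("inf")
```

With hypothetical input, a 120 ms silent run is kept, a 50 ms run is discarded, and a run of exactly 100 ms is kept because the criterion is "longer than or equal to".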
The results show that in the professional reading, silent pauses are rare (17% of all pauses are silent) and occur in connection to theme continuation, mainly at sentence boundaries. In the non-professional reading, silent pauses also occur mostly at theme continuation (65.2%), primarily at phrase boundaries and secondarily at clause and sentence boundaries. The remaining 34.8% of the silent pauses occur in connection to theme shift, mainly at sentence boundaries. In the dialogues, 37% of the silent pauses are found at turn taking, often in front of conversational particles. Inside turns, silent pauses are more frequently associated with theme continuation than with theme shift. Additionally, in the dialogues silent pauses also occur in front of head nouns and adverbs.
The results concerning the discourse context of silent pauses are shown in Figure 2. Note that the two columns for the dialogue represent results computed separately for theme shift/continuation with no regard to turn taking, and for turns that overlap with theme shift/continuation.
Figure 2: The discourse context of silent pauses: The position of silent pauses regarding theme shift and theme continuation in three speaking styles.
The results on the position of complex pauses are illustrated in Figure 3. Complex pauses in professional news announcing are mostly found in connection to theme shift at sentence boundaries (70%). The rest are found at theme continuation, mostly at sentence boundaries and between noun phrases in a list. In the non-professional reading, 61% of complex pauses correlate with theme continuation at sentence and clause boundaries and in connection to noun phrases. The remaining part occurs at theme shifts at sentence boundaries. In the dialogues, the distribution of turns, theme shift and theme continuation in connection to pauses is relatively even. Pauses can be found in phrasal heads (nouns, or adverbs preceded by hesitation particles) and in connection to overlapping speech, conversational particles, hesitations, etc.
Figure 3: The discourse context of complex pauses: The position of complex pauses regarding theme shift and theme continuation in three speaking styles.
As mentioned, the discourse annotation in [16] was done by the authors only. To get a more reliable annotation, we let five subjects independently annotate the pure text materials for theme shift (TS). Annotators indicated TS with a mark, and non-marked intervals are assumed to represent theme continuation (TC). Interannotator agreement was computed for all materials and gave a kappa value of K = 0.82 for the news texts, and K = 0.79 for the dialogues. In both cases, the values indicate high interannotator agreement.
With this new discourse data, it is possible to give a picture of the correlation between pausing and TS versus TC in the discourse, as well as the continuum between TS and TC. Figure 4 shows the annotation of the positions of the acoustic pauses.
TC – none of the five subjects labeled a theme shift
Majority TC – only one or two of the five subjects annotated a theme shift
Majority TS – three or four subjects labeled a theme shift
TS – all five subjects agreed on a theme shift
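The four-way continuum between theme continuation and theme shift (TC when no annotator marked a shift, Majority TC for one or two marks, Majority TS for three or four, TS for all five) can be expressed as a small mapping from vote counts to labels. The sketch below is only an illustration; the function names and the input format (one theme-shift vote count per pause position) are assumptions.

```python
from collections import Counter
from typing import Dict, Sequence

LABELS = ("TC", "Majority TC", "Majority TS", "TS")


def discourse_label(ts_votes: int, n_annotators: int = 5) -> str:
    """Map the number of annotators who marked a theme shift to a label."""
    if not 0 <= ts_votes <= n_annotators:
        raise ValueError("ts_votes must be between 0 and n_annotators")
    if ts_votes == 0:
        return "TC"
    if ts_votes == n_annotators:
        return "TS"
    # With five annotators: 1-2 votes -> Majority TC, 3-4 votes -> Majority TS.
    return "Majority TS" if ts_votes > n_annotators / 2 else "Majority TC"


def label_distribution(votes_per_pause: Sequence[int]) -> Dict[str, float]:
    """Share of pauses falling into each of the four categories."""
    counts = Counter(discourse_label(v) for v in votes_per_pause)
    total = len(votes_per_pause)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}
```

Applying `label_distribution` to the vote counts of all acoustic pauses in one speaking style would give the kind of per-style breakdown that Figure 4 reports.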
In this task, no marking of turn boundaries was performed. As shown in Figure 4, the results from this extended annotation task show the same tendencies as the earlier investigation described above. The majority of the acoustic pauses in the professional reading style correspond to a TS position. In the non-professional reading, a majority of the acoustic pauses still correspond to a TS position, but to a lesser extent than in the professional reading. In the dialogue, however, the acoustic pauses tend to occur at TC positions.
Figure 4: Acoustic pauses and discourse context: The discourse
position of acoustic pauses in the three speaking styles.
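The interannotator agreement reported in this section (K = 0.82 for the news texts and K = 0.79 for the dialogues) is given without naming the kappa variant. As one plausible reading, the sketch below computes Fleiss' kappa for five annotators making a binary decision (theme shift or not) at each candidate position. The choice of Fleiss' kappa, the unit of annotation, and the function name are assumptions, not details taken from the study.

```python
from typing import Sequence


def fleiss_kappa_binary(ts_votes: Sequence[int], n_raters: int = 5) -> float:
    """Fleiss' kappa for a two-category task (theme shift vs. continuation).

    ts_votes[i] is the number of raters who marked a theme shift at
    candidate position i; the remaining raters implicitly chose continuation.
    """
    n_items = len(ts_votes)
    if n_items == 0:
        raise ValueError("at least one annotated position is required")

    # Observed agreement: mean over items of the proportion of agreeing rater pairs.
    pair_norm = n_raters * (n_raters - 1)
    p_bar = sum(
        (k * k + (n_raters - k) ** 2 - n_raters) / pair_norm for k in ts_votes
    ) / n_items

    # Chance agreement from the overall share of theme-shift decisions.
    p_ts = sum(ts_votes) / (n_items * n_raters)
    p_e = p_ts ** 2 + (1 - p_ts) ** 2

    if p_e == 1.0:
        return 1.0  # degenerate case: every rater chose the same category everywhere
    return (p_bar - p_e) / (1 - p_e)
```

For example, the hypothetical vote counts `[5, 0, 4, 1]` (two positions with full agreement and two with one dissenting annotator) give a kappa of about 0.6.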
5. The Distribution of Perceived Pauses
The perceived pauses, labeled by the eighteen subjects, are to a large extent evenly distributed across the speaking styles, see Figure 5. The average words/perceived pause ratio is highest in the professional reading (12.2 words/perceived pause) followed by the dialogues (11.4 words/perceived pause), and lowest in the non-professional reading (8.2 words/perceived pause).
Figure 5: The words/perceived pauses ratio in professional
reading, dialogue, and non-professional reading.
Where do the subjects perceive pauses in the different speaking styles? Figure 6 illustrates the distribution of perceived pauses along the theme shift/continuation continuum in the three speaking styles, in the same way as described for the position of acoustic pauses in the last part of Section 4. The results show that in the reading styles most of the perceived pauses are located at theme shift, while in the dialogues most perceived pauses are located at theme continuation.
Figure 6: Perceived pauses and discourse context: The discourse position of perceived pauses in the three speaking styles.
6. The Correlation between Acoustic and Perceived Pauses
What acoustic pauses are perceived and what are not? In Figure 7, the words/pause ratio for acoustic as well as for perceived pauses is shown. It is clear that the relation between the acoustic and perceived pauses varies across the speaking styles. In the professional reading, the number of perceived pauses is much larger than the number of acoustic pauses, while in the dialogues we find the opposite relation, although the difference there is not as striking as in the professional reading. In the non-professional reading, on the other hand, the numbers of perceived and acoustic pauses are comparable.
Figure 7: The ratio of words per acoustic and perceived pause in professional and non-professional reading, and in dialogue.
To give an overall picture of the correlation between the acoustic and perceived pauses, we calculated recall and precision rates for each of the eighteen subjects within every speaking style. Recall describes the percentage of the acoustic pauses that were actually perceived (see Equation 1), while precision gives the percentage of perceived pauses that correspond to acoustic silence (see Equation 2).
$$\text{Recall} = \frac{\text{number of acoustic pauses perceived as pauses}}{\text{total number of acoustic pauses}} \times 100 \tag{1}$$
$$\text{Precision} = \frac{\text{number of perceived pauses corresponding to acoustic silence}}{\text{total number of perceived pauses}} \times 100 \tag{2}$$
The mean of the recall and precision rates for each style is shown in Figure 8. We can see that in the professional reading, a considerable number of acoustic pauses are perceived as pauses by the subjects, but many of the perceived pauses do not have any correlate in acoustic silence. In the non-professional reading, almost every acoustic pause is perceived by the listeners, and the majority of the perceived pauses correspond to silent intervals in the speech signal. In the dialogues, on the other hand, few acoustic pauses are perceived but many of the perceived pauses match acoustic silence.
Figure 8: Recall and precision rates for the perceived pauses in the three speaking styles.
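The recall and precision rates defined in this section, computed per subject and then averaged within a speaking style, could be sketched as follows. Pause positions are represented here as integer indices of word boundaries; that representation, the function names, and the use of the population standard deviation for the between-subject deviation are assumptions for illustration.

```python
import statistics
from typing import Dict, Set, Tuple


def recall_precision(acoustic: Set[int], perceived: Set[int]) -> Tuple[float, float]:
    """Recall and precision (in percent) for one listener.

    acoustic:  positions with an acoustic pause
    perceived: positions where this listener marked a pause
    """
    matched = len(acoustic & perceived)
    recall = 100.0 * matched / len(acoustic) if acoustic else 0.0
    precision = 100.0 * matched / len(perceived) if perceived else 0.0
    return recall, precision


def summarize_style(acoustic: Set[int],
                    perceived_by_subject: Dict[str, Set[int]]) -> Dict[str, float]:
    """Mean and between-subject deviation of recall and precision for one style."""
    rates = [recall_precision(acoustic, p) for p in perceived_by_subject.values()]
    recalls = [r for r, _ in rates]
    precisions = [p for _, p in rates]
    return {
        "mean_recall": statistics.mean(recalls),
        "sd_recall": statistics.pstdev(recalls),
        "mean_precision": statistics.mean(precisions),
        "sd_precision": statistics.pstdev(precisions),
    }
```

In this formulation, a listener who marks many positions without acoustic silence lowers precision but not recall, which is the pattern described for the professional reading.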
We also note that the deviation between the subjects’ interpretations of pausing differs across the speaking styles, see Figures 9 and 10. In the professional news announcement, the recall rate is high as well as the deviation between the subjects, while the precision is low with agreement between the subjects. In the dialogues, the relation appears to be the opposite. However, in the non-professional reading, where both recall and precision rates are high, the deviation between the subjects is relatively small.
Figure 9: Recall rates (%) for the three speaking styles.
Figure 10: Precision rates (%) for the three speaking styles.
As we have seen, there are acoustic pauses that are not perceived by listeners, perceived pauses without any correlate to acoustic silence, and lastly, cases where acoustic and perceived pauses coincide. The distribution of these three conditions differs between the speaking styles. Next, we will describe those cases in detail and relate them to their discourse context.
6.1. Acoustic pause and perceived pause coincide
Acoustic pauses that are perceived by 75% to 100% of the listeners occur in every speaking style. In the professional reading, they are located in connection to theme shift in 100% of the cases according to the majority of the discourse annotators. In the non-professional reading, they can be found at theme shift in 77% of the cases according to the majority of annotators, but also at theme continuation. When the pause is perceived at theme continuation, we found that the acoustic silence interval is shorter (466 ms) than the average duration of all cases (584 ms). In the dialogues, they occur at theme continuation in 79% of the cases, but we did not find any explanation in the duration of these pauses.
6.2. Perceived pause without acoustic silence
Cases where 75% of the listeners perceived a pause without any acoustic silence correlate are common in the professional reading (52 cases) but very rare in the dialogues (2 cases), and non-existent in the non-professional reading. The perceived pauses without a silence correlate in the professional reading are located in connection to theme shift in 71% of the cases, according to the majority of the discourse annotators.
6.3. Not perceived acoustic pauses
When are acoustic silent intervals perceived as pauses by no more than 20% of the listeners? There are many such cases in the dialogues, but only a few in the non-professional reading and none in the professional reading, as shown by the recall rates for each speaking style. If we look at the position of those acoustic pauses that are not perceived by the listeners in the dialogues, we find that the majority of the discourse annotators agreed on theme continuation in 58% of the cases. Those acoustic pauses whose position the majority of annotators regarded as theme continuation are shorter on average (345 ms) than the overall pause duration for all the cases (530 ms).
7. Discussion
The high precision values of the non-professional reading and
the dialogue might be explained by longer pausing duration, 561
and 538 ms respectively, as compared to the professional reading with a mean duration of 271 ms. However, there are large
differences between the speaking styles. In the professional
reading, all acoustic pauses were found, but the listeners also
perceived a great number of additional pauses. The reading styles have similar recall rates,
which indicates that subjects in the professional reading hear
about as many pauses as in the non-professional reading. This
might be due to the fact that the message organization is the
same in both speaking styles. In our study, non-professional
readers use silence to signal a structure while professional readers use other prosodic features. The listener can also chunk
the message according to a clear discourse structure. In the dialogue, however, there are many silent pauses that are ignored by
subjects. This might depend on the low correlation between the
acoustic pauses and the discourse structure. We might find an
explanation in that speakers in spontaneous dialogue use other
prosodic features, e.g. intonational and temporal variation, to
signal prosodic boundaries; perhaps the same features as we
can find in the professional reading.
Our results indicate that high recall mirrors a clear discourse structure, while high precision reflects longer acoustic
silent intervals. In the reading styles, we have high recall and
the majority of the discourse annotators agreed on theme shift.
High precision rates are found in those speaking styles where the average duration of silent intervals is longer, namely in the non-professional reading and the dialogue. Low precision in the professional reading might be because other prosodic features, such as intonational variations, were used for prosodic phrasing.
Additionally, a possible explanation for the low recall in the dialogue might be that the silent intervals often are not as relevant for the message structure as in the reading styles. The discourse structure in the dialogues is more opaque, so the pauses do not coincide with theme shift. This is also suggested by the negative correlation between theme shift and acoustic pausing. Planning pauses are perhaps not perceived in the same way as prosodic phrasing. In spontaneous speech, speakers perhaps primarily use other prosodic features (such as intonational variations, segment lengthening, variation in tempo, etc.) to signal phrasing and discourse structure.
We did not find any correlation between pause duration and
the number of subjects who perceived silent intervals as a pause.
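The three conditions distinguished in Section 6 (acoustic and perceived pause coincide, perceived pause without acoustic silence, acoustic pause not perceived) depend on whether a position contains acoustic silence and on the share of listeners who marked it. The sketch below makes that classification explicit. The thresholds follow the text (75% or more of listeners, and no more than 20% of listeners); treating everything in between as a separate "other" category, and the function and label names, are assumptions.

```python
def classify_position(has_acoustic_pause: bool, share_perceived: float) -> str:
    """Classify one position from acoustic evidence and listener agreement.

    share_perceived is the fraction (0.0-1.0) of listeners who marked a
    pause at this position, e.g. 15 / 18 for fifteen of eighteen subjects.
    """
    if not 0.0 <= share_perceived <= 1.0:
        raise ValueError("share_perceived must be between 0 and 1")
    if has_acoustic_pause and share_perceived >= 0.75:
        return "acoustic and perceived coincide"
    if not has_acoustic_pause and share_perceived >= 0.75:
        return "perceived without acoustic silence"
    if has_acoustic_pause and share_perceived <= 0.20:
        return "acoustic not perceived"
    # Intermediate agreement levels are not discussed in the text.
    return "other"
```

Counting these labels per speaking style, and crossing them with the discourse annotation, would reproduce the kind of breakdown given in Sections 6.1 to 6.3.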
8. Conclusions and Future Directions
In this study we investigated the phenomenon of pausing in three different speaking styles in Swedish: elicited spontaneous dialogues, professional news announcement and non-professional reading. Additionally, we examined the discourse context that corresponded to intervals of acoustic silence and listener-perceived pauses. Our results show large differences across the speaking styles. In the professional reading, all acoustic silence intervals are found by the listeners, but a great number of perceived pauses do not have an acoustic correlate in silence. In the non-professional reading, the majority of the acoustic pauses are perceived by the listeners, and many of the perceived pauses actually have an acoustic correlate. In the dialogues, on the other hand, many acoustic pauses are not perceived as pauses by the listeners but many of the perceived pauses have an acoustic correlate in silence. Considering the discourse environment in which the acoustic and perceptual pauses appear, we observed that silence is perceived if it occurs in connection to theme shift, while if the silence is found at theme continuation, the listeners do not perceive those intervals as pauses. Not surprisingly, we also showed that pause length has an effect on the listeners’ perception: the longer the silent intervals are, the better the chance that the perceived pause is actually an acoustic silent interval.
Questions we find important to explore in future work concern intonational variation in connection to pausing and discourse structure. Since many perceived pauses do not seem to have silence as a primary correlate, analysis of intonational patterns would shed more light on the importance of the intonational variations and their effect on prosodic phrasing.
Other fields for future work include the investigation of the relation between the hierarchical discourse structure and pausing, as well as the closer examination of the syntactic environment of pauses and its relation to the discourse structure.
Acknowledgements
First of all, we would like to thank all the people without whom this study would not have seen the light: Petur Helgason for the dialogue corpus, Swedish Radio for the Swedish news recordings, the four non-professional readers and Mattias Heldner for his help with the recordings of the non-professional readings. Also, a big thank you to the participants in the listening tests and to the subjects of the discourse annotation. Last, but not least, many thanks to Rolf Carlson for the interesting and fruitful discussions, for his brilliant suggestions and valuable comments.
9. References
[1] Bruce, G., “Modelling Swedish Intonation for Read and Spontaneous Speech”, Proceedings of International Congress on Phonetic Sciences, Vol. 2, pp. 28-35, 1995.
[2] Deese, J., “Pauses, prosody and the demands of production in language”, Temporal Variables in Speech, Studies in Honour of Frieda Goldman-Eisler, Hans & Raupach, Manfred (Eds.), Mouton Publishers, 1980.
[3] Hirschberg, J., “Prosodic and other acoustic cues to speaking style in spontaneous and read speech”, Proceedings of International Congress on Phonetic Sciences, Vol. 2, pp. 36-43, 1995.
[4] Hirschberg, J., “Communication and Prosody: Functional Aspects of Prosody”, Speech Communication: Special Issue on Dialogue and Prosody, Terken, J., & Swerts, M. (Eds.), 2001.
[5] Ostendorf, M., “Prosodic Boundary Detection”, Prosody: Theory and Experiment, Studies presented to Gösta Bruce, Kluwer Academic Publisher, 1997.
[6] Swerts, M. & Geluykens, R., “Prosody as a marker of information flow in spoken discourse”, Language and Speech 37, 21-45, 1994.
[7] Hirschberg, J., “Prosodic variation and discourse structure across speaking styles”, Prosody: Theory and Experiment, Studies presented to Gösta Bruce, Kluwer Academic Publisher, 1997.
[8] Strangert, E., “Speaking style and pausing”, PHONUM, Reports from the Department of Phonetics, University of Umeå, 1993.
[9] Strangert, E., “Clause Structure and Prosodic Segmentation”, FONETIK-93 Papers from the 7th Swedish Phonetics Conference, John Sören Petterson (ed), Uppsala, May 12-14, 1993.
[10] Fant, G. & Kruckenberg, A., “Preliminaries to the Study of Swedish Prose Reading and Reading Style”, in STL-QPSR 2/1989 (April-June), Speech Transmission Laboratory (Department of Speech, Music and Hearing), Royal Institute of Technology, Stockholm, Sweden, 1989.
[11] Fant, G., Kruckenberg, A., & Liljencrants, J., “Acoustic-phonetic Analysis of Prominence in Swedish”, in Botinis, A. (Ed.), Intonation: Analysis, Modelling and Technology, Kluwer Academic Publishers, 2000.
[12] Garman, M., “Psycholinguistics”, Cambridge University Press, 1990.
[13] Goldman-Eisler, F., “Pauses, Clauses, Sentences”, Language and Speech, 15:2, 1972.
[14] Recordings of Swedish Radio News, Swedish Radio, 1999-2000.
[15] Helgason, P., “Stockholm Corpus of Spontaneous Speech”, Department of Linguistics, Stockholm University, forthcoming.
[16] Gustafson-Čapková, S. & Megyesi, B., “A Comparative Study of Pauses in Dialogues and Read Speech”, Proceedings of Eurospeech 2001, Volume 2, pp. 931-935, Aalborg, Denmark, September 3-7, 2001.