Lecture 22 Intonation and Discourse CS 4705



What does prosody convey?

• In general, information about:

– What the speaker is trying to convey

• Is this a statement or a question?

– The speaker state

• Is the speaker getting angry, frustrated?

• In dialogue, information about:

– The structure of the dialogue

• Is the user or the system trying to start a new topic?

• Is the speaker talking about given or new information?

– The state of the interaction:

• Is the user having trouble being understood?

• Is the user having trouble understanding the system?

Current Trends

• New description schemes (e.g. ToBI)

• Corpus-based research and machine learning

• Emphasis on evaluation of algorithms and systems (NLE ‘00 special issue)

• Investigation of spontaneous speech phenomena and variation in speaking style

• Applications to CTS, ASR and SDS

Corpora

• Public and semi-public databases

– ATIS, Switchboard, CallHome, Meetings (NIST/DARPA/LDC)

– TRAINS/TRIPS (U. Rochester), FM Radio (BU), BDC (Harvard, AT&T)

• Private collections

– Acquired for speech or dialogue research (August, KTH; Voicemail, AT&T, IBM)

– Meetings, call centers, operator services, focus group collections

• The Web

– Newscasts, radio

To(nes and) B(reak) I(ndices)

• Developed by prosody researchers in four meetings over 1991-94

• Goals:

– devise a common labeling scheme for Standard American English that is robust and reliable

– promote collection of large, prosodically labeled, shareable corpora

• ToBI standards also proposed for Japanese, German, Italian, Spanish, British and Australian English, ...

• Minimal ToBI transcription:

– recording of speech

– f0 contour

– ToBI tiers:

• orthographic tier: words

• break-index tier: degrees of junction (Price et al ‘89)

• tonal tier: pitch accents, phrase accents, boundary tones (Pierrehumbert ‘80)

• miscellaneous tier: disfluencies, non-speech sounds, etc.
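
The tier structure listed above can be sketched as a small data structure. This is only an illustrative layout, not a ToBI file format: the class names, the field names, the file name utt001.wav, and the labels on the example utterance are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WordLabel:
    """One word with its ToBI annotations (hypothetical layout)."""
    word: str                      # orthographic tier
    break_index: int               # break-index tier: juncture after this word (0-4)
    tones: List[str] = field(default_factory=list)  # tonal tier labels on this word
    misc: Optional[str] = None     # miscellaneous tier, e.g. "disfluency"


@dataclass
class Transcription:
    """Minimal ToBI transcription: recording, f0 contour, and labeled words."""
    audio_path: str                # recording of speech
    f0_hz: List[float]             # f0 contour samples
    words: List[WordLabel]

    def tier(self, name: str) -> list:
        """Return one tier as a flat list, aligned word by word."""
        return [getattr(w, name) for w in self.words]


# Invented example labels, for illustration only.
example = Transcription(
    audio_path="utt001.wav",       # placeholder file name
    f0_hz=[],                      # contour omitted in this sketch
    words=[
        WordLabel("take", 1),
        WordLabel("the", 1),
        WordLabel("bus", 4, tones=["H*", "L-", "L%"]),
    ],
)
print(example.tier("word"))         # ['take', 'the', 'bus']
print(example.tier("break_index"))  # [1, 1, 4]
```

Keeping the tiers aligned word by word makes it easy to compare two labelers or to feed one tier to a predictor.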

Sample ToBI Labeling

• Online training material, available at:

– http://www.ling.ohio-state.edu/phonetics/ToBI/

• Evaluation

– Good inter-labeler reliability for expert and naive labelers: 88% agreement on presence/absence of tonal category, 81% agreement on category label, 91% agreement on break indices to within 1 level (Silverman et al. ‘92, Pitrelli et al. ‘94)
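
One way to compute an agreement figure such as "break indices to within 1 level" is to compare every pair of labelers on every word. The sketch below assumes that pairwise scheme; the slide does not say exactly how the cited studies computed their numbers, and the function name and the toy labelings are invented for illustration.

```python
from itertools import combinations
from typing import Sequence


def pairwise_agreement(labelings: Sequence[Sequence[int]], tolerance: int = 0) -> float:
    """Fraction of (labeler pair, word) comparisons whose break indices
    differ by at most `tolerance` levels."""
    agree = 0
    total = 0
    for a, b in combinations(labelings, 2):
        if len(a) != len(b):
            raise ValueError("all labelers must label the same words")
        for x, y in zip(a, b):
            agree += abs(x - y) <= tolerance
            total += 1
    return agree / total if total else 0.0


# Three hypothetical labelers marking break indices on the same five words.
labelers = [
    [1, 1, 3, 1, 4],
    [1, 1, 4, 1, 4],
    [1, 0, 3, 1, 4],
]
exact = pairwise_agreement(labelers)        # exact match required
within_one = pairwise_agreement(labelers, 1)  # allow a difference of 1 level
print(round(exact, 3), within_one)
```

Allowing a tolerance of one level treats a 3-versus-4 disagreement as agreement, which is why the "within 1 level" figure is higher than exact agreement would be.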

Pitch Accent/Prominence in ToBI

• Which items are made intonationally prominent and how?

• Accent type:

– H* simple high (declarative)

– L* simple low (yes-no question)

– L*+H scooped, late rise (uncertainty/ incredulity)

– L+H* early rise to stress (contrastive focus)

– H+!H* fall onto stress (implied familiarity)

• Downstepped accents:

– !H*, L+!H*, L*+!H

• Degree of prominence:

– within a phrase: HiF0

– across phrases

Functions of Pitch Accent

• Given/new information

– S: Do you need a return ticket?

– U: No, thanks, I don’t need a return.

• Contrast (narrow focus)

– U: No, thanks, I don’t need a RETURN…. (I need a time schedule, receipt,…)

• Disambiguation of discourse markers

– S: Now let me get you the train information.

– U: Okay (thanks) vs. Okay….(but I really want…)

Predicting Accent: Is it accented or not?

• Applications: TTS and CTS

• Corpora: read and spontaneous speech

• Features: part-of-speech (POS) window of 3, sentence position, position within NP, number of syllables, position in complex nominal, inferred given/new status, inferred focus, mutual information

• Results: 75-85% correct, depending on genre
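
A few of the features listed above can be extracted from POS-tagged text as sketched below. This is a minimal sketch under stated assumptions, not the feature extractor used in the reported experiments: the function names, the sentence-boundary symbols, and the vowel-run syllable heuristic are all invented here, and the harder features (given/new status, focus, mutual information) are left out.

```python
from typing import Dict, List, Tuple

VOWELS = set("aeiouy")


def rough_syllable_count(word: str) -> int:
    """Crude syllable estimate: count runs of vowel letters (at least 1)."""
    count = 0
    in_vowel_run = False
    for ch in word.lower():
        if ch in VOWELS:
            if not in_vowel_run:
                count += 1
            in_vowel_run = True
        else:
            in_vowel_run = False
    return max(count, 1)


def accent_features(tagged: List[Tuple[str, str]], i: int) -> Dict[str, object]:
    """Features for word i of a POS-tagged sentence."""
    word, pos = tagged[i]
    prev_pos = tagged[i - 1][1] if i > 0 else "<s>"
    next_pos = tagged[i + 1][1] if i + 1 < len(tagged) else "</s>"
    return {
        "pos_prev": prev_pos,          # POS window of 3: previous,
        "pos": pos,                    # current,
        "pos_next": next_pos,          # and next tag
        "sent_position": i / max(len(tagged) - 1, 1),  # 0.0 = first word, 1.0 = last
        "n_syllables": rough_syllable_count(word),
    }


tagged = [("I", "PRP"), ("need", "VBP"), ("a", "DT"),
          ("return", "NN"), ("ticket", "NN")]
features = accent_features(tagged, 3)   # features for "return"
print(features)
```

Feature dictionaries of this kind could then be passed to any standard classifier that predicts accented versus unaccented for each word.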

Prosodic Phrasing in ToBI

• ‘Levels’ of phrasing:

– intermediate phrase: one or more pitch accents plus a phrase accent (H- or L-)

– intonational phrase: one or more intermediate phrases plus a boundary tone (H% or L%)

• ToBI break-index tier

– 0 no word boundary

– 1 word boundary

– 2 strong juncture with no tonal markings

– 3 intermediate phrase boundary

– 4 intonational phrase boundary
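
Given break indices on each word, the two levels of phrasing can be recovered by cutting after every word whose index reaches a threshold: 3 or higher for intermediate phrases, 4 for intonational phrases. The sketch below assumes each index marks the juncture after its word; the function name and the break indices on the example sentence are invented for illustration.

```python
from typing import List, Tuple


def split_phrases(words: List[Tuple[str, int]], min_break: int) -> List[List[str]]:
    """Group words into phrases, closing a phrase after any word whose
    break index is at least `min_break`."""
    phrases: List[List[str]] = []
    current: List[str] = []
    for word, break_index in words:
        current.append(word)
        if break_index >= min_break:
            phrases.append(current)
            current = []
    if current:                    # trailing words with no closing boundary
        phrases.append(current)
    return phrases


# Hypothetical labeling: an intermediate phrase boundary after "ticket".
labeled = [("you", 1), ("should", 1), ("buy", 1), ("the", 1), ("ticket", 3),
           ("with", 1), ("the", 1), ("discount", 1), ("coupon", 4)]

intermediate = split_phrases(labeled, 3)   # cut at break index 3 or 4
intonational = split_phrases(labeled, 4)   # cut at break index 4 only
print(intermediate)
print(intonational)
```

With these labels the utterance forms one intonational phrase made of two intermediate phrases.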

Functions of Phrasing

• Disambiguates syntactic constructions, e.g. PP attachment, restrictive/non-restrictive relative clause:

– S: You should buy the ticket with the discount coupon.

– S: The itinerary which I faxed includes deluxe accommodations.

• Disambiguates scope ambiguities, e.g. negation:

– S: You aren’t booked through Rome because of the fare.

• Or modifier scope:

– S: This fare is restricted to retired politicians and civil servants.

Predicting Phrase Boundaries

• Applications: TTS, CTS, ASR

• Corpora: AP news, Penn Treebank, ATIS

• Features: sentence position, sentence length, part-of-speech (POS) window of 4, location of previous predicted boundary, mutual information, constituent information, dependency structure

• Results: 96% correct

Contours: Accent + Phrasing

• What do intonational contours ‘mean’ (Ladd ‘80, Bolinger ‘89)?

– Speech acts (statements, questions, requests)

S: That’ll be credit card? (L* H- H%)

– Propositional attitude (uncertainty, incredulity)

S: You’d like an evening flight. (L*+H L- H%)

– Speaker affect (anger, happiness, love)

U: I said four SEVEN one! (L+H* L- L%)

– “Personality”

S: Welcome to the Sunshine Travel System.
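
The contour labels attached to the examples above, such as L* H- H%, decompose into pitch accents, a phrase accent, and a boundary tone. The sketch below splits such a label string into those parts and checks each against the inventory given in these slides. It assumes a contour spanning a single intermediate phrase, and the function name and return layout are invented for illustration.

```python
from typing import Dict, List, Union

# Tone inventory as listed in these slides.
PITCH_ACCENTS = {"H*", "L*", "L*+H", "L+H*", "H+!H*", "!H*", "L+!H*", "L*+!H"}
PHRASE_ACCENTS = {"H-", "L-"}
BOUNDARY_TONES = {"H%", "L%"}


def parse_contour(label: str) -> Dict[str, Union[str, List[str]]]:
    """Split e.g. 'L* H- H%' into pitch accents, phrase accent, boundary tone.

    Assumes one intermediate phrase: one or more pitch accents, then a
    phrase accent, then a boundary tone.
    """
    tokens = label.split()
    if len(tokens) < 3:
        raise ValueError("need at least one pitch accent, a phrase accent, "
                         "and a boundary tone")
    *accents, phrase_accent, boundary_tone = tokens
    for accent in accents:
        if accent not in PITCH_ACCENTS:
            raise ValueError(f"unknown pitch accent: {accent}")
    if phrase_accent not in PHRASE_ACCENTS:
        raise ValueError(f"unknown phrase accent: {phrase_accent}")
    if boundary_tone not in BOUNDARY_TONES:
        raise ValueError(f"unknown boundary tone: {boundary_tone}")
    return {
        "pitch_accents": accents,
        "phrase_accent": phrase_accent,
        "boundary_tone": boundary_tone,
    }


question = parse_contour("L* H- H%")       # the yes-no question contour
emphatic = parse_contour("L+H* L- L%")     # the contrastive example
print(question)
print(emphatic)
```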

Pitch Range and Timing

• Level of speaker engagement

S: Welcome to InfoTravel. How may I help you?

• Contour interpretation

– S: You can take the bus (L*+H) from Malpensa to Rome (L- H%).

– U: Take the bus. vs. Take the bus!

• Discourse/topic structure

– Topic beginnings have higher pitch range, faster, preceded by longer pauses

– Endings the opposite

Prosody and Speaker Emotion

• What makes an utterance sound angry? Sad?

– How much comes from the lexical information?

– How much from the acoustic/prosodic?

– Does all anger, e.g., sound the same?

• Cahn ‘88 (examples)

Applications

• Text-to-Speech and Concept-to-Speech generation: improve naturalness

• Speech Recognition: identify suprasegmental meaning

• Spoken Dialogue Systems: understand when people are confused, angry

• Audio Browsing: format corpora for browsing and search

Challenges

• We don’t really know what most contours ‘mean’

• Our accent prediction needs better models of given/new status, focus, and grammatical function

• Our phrasing prediction needs better information about e.g. attachment

• We don’t know much about emotional speech or ‘personality’, both of which are critical to applications
