Predicting Phrasing and Accent Julia Hirschberg CS 6998 7/15/2016 1 Why predict phrasing and accent? TTS and CTS Naturalness Intelligibility Recognition Decrease perplexity Modify durational predictions for words at phrase boundaries Identify most ‘salient’ words 7/15/2016 2 How predict phrasing and accent? Default prosodic assignment from simple text analysis Hand-built rule-based system: hard to modify and adapt to new domains Corpus-based approaches (Sproat et al ’92) Train prosodic variation on large labeled corpora using machine learning techniques Accent and phrasing decisions 7/15/2016 3 Associate prosodic labels with simple features of transcripts • distance from beginning or end of phrase • orthography: punctuation, paragraphing • part of speech, constituent information Apply learned rules to new text 7/15/2016 4 Prosodic Phrasing 2 `levels’ of phrasing in ToBI intermediate phrase: one or more pitch accents plus a phrase accent (Hor L) intonational phrase: one or more intermediate phrases + boundary tone (H% or L% ) ToBI break-index tier 0 no word boundary 1 word boundary 2 3 4 7/15/2016 strong juncture with no tonal markings intermediate phrase boundary intonational phrase boundary 5 What are the indicators of phrasing in speech? Timing Pause Lengthening F0 changes Vocal fry/glottalization 7/15/2016 6 What linguistic and contextual features are linked to phrasing? Syntactic structure? We only suspected they all knew that a burglary had been committed. Abney ’91 chunking Steedman ’90, Oehrle ’91 CCGs … Length of sentence Word co-occurrence statistics I should have known and so should you have. 7/15/2016 7 Are words on each side accented or not? Where is previous phrase boundary? Part-of-speech of words around potential boundary site Largest constituent dominating w(i) but not w(j) Smallest constituent dominating w(i),w(j) 7/15/2016 8 Statistical learning methods Classification and regression trees (CART) Rule induction (Ripper) HMMs 7/15/2016 9 Evaluation: Hard! How to define Gold Standard? Natural speech corpus Multi-speaker/same text Subjective judgments 7/15/2016 10 Today Incremental improvements continue: Adding higher-accuracy parsing (Koehn et al ‘00) Collins ‘99 parser More sophisticated learning algorithms (Schapire & Singer ‘00) Better representations: tree based? Rules always impoverished Where to next? 7/15/2016 11 Pitch Accent/Prominence in ToBI Which items are made intonationally prominent and how? Accent type: H* L* L*+H L+H* simple high (declarative) simple low (ynq) scooped, late rise (uncertainty/ incredulity) early rise to stress (contrastive focus) H+!H* fall onto stress (implied familiarity) 7/15/2016 12 What are the indicators of accent? F0 excursion Durational lengthening Voice quality Vowel quality Loudness 7/15/2016 13 What are the correlates of accent in text and context? Word class: content vs. function words Information status: Given/new Topic/Focus Contrast Grammatical function The ball touches the cone. Surface position: Today George is hungry. 7/15/2016 14 Association with focus: Dogs must be carried. John only introduced Mary to Sue. 7/15/2016 15 How can we capture such information simply? POS window Position of candidate word in sentence Location of prior phrase boundary Pseudo-given/new Location of word in complex nominal and stress prediction for that nominal Word co-occurrence 7/15/2016 16 Today Concept-to-Speech (CTS) – Pan&McKeown99 systems should be able to specify “better” prosody: the system knows what it wants to say and can specify how New features! New learning methods: Boosting and bagging (Sun02) Combine text and acoustic cues for ASR Co-training?? 7/15/2016 17 MAGIC MM system for presenting cardiac patient data Developed at Columbia by McKeown and colleagues in conjunction with Columbia Presbyterian Medical Center to automate post-operative status reporting for bypass patients Uses mostly traditional NLG hand-developed components Generate text, then annotate prosodically Corpus-trained prosodic assignment component 7/15/2016 18 Corpus: written and oral patient reports 50min multi-speaker, spontaneous + 11min single speaker, read 1.24M word text corpus of discharge summaries Transcribed, ToBI labeled Generator features labeled/extracted: syntactic function 7/15/2016 19 p.o.s. semantic category semantic ‘informativeness’ (rarity in corpus) semantic constituent boundary location and length salience given/new focus theme/ rheme ‘importance’ ‘unexpectedness’ 7/15/2016 20 Very hard to label features Results: new features to specify TTS prosody Of CTS-specific features only semantic informativeness (likeliness of occurring in a corpus) useful so far Looking at context, word collocation for accent placement helps predict accent RED CELL (less predictable) vs. BLOOD cell (more) Most predictable words are accented less frequently (40-46%) and least predictable more (73-80%) Unigram+bigram model predicts accent status w/77% (+/-.51) accuracy 7/15/2016 21 Next Week Discussion questions 7/15/2016 22