Predicting Phrasing and Accent Julia Hirschberg CS 6998 7/15/2016 1 A car bomb attack on a police station in the northern Iraqi city of Kirkuk early Monday killed four civilians and wounded 10 others U.S. military officials said. A leading Shiite member of Iraq's Governing Council on Sunday demanded no more "stalling" on arranging for elections to rule this country once the U.S.-led occupation ends June 30. Abdel Aziz al-Hakim a Shiite cleric and Governing Council member said the U.S.-run coalition should have begun planning for elections months ago. 7/15/2016 2 Why predict phrasing and accent? • TTS and CTS – Naturalness – Intelligibility • Recognition – Decrease perplexity – Modify durational predictions for words at phrase boundaries – Identify most ‘salient’ words 7/15/2016 3 How predict phrasing and accent? • Default prosodic assignment from simple text analysis • Hand-built rule-based system: hard to modify and adapt to new domains • Corpus-based approaches (Sproat et al ’92) – Train prosodic variation on large labeled corpora using machine learning techniques – Accent and phrasing decisions 7/15/2016 4 – Associate prosodic labels with simple features of transcripts • distance from beginning or end of phrase • orthography: punctuation, paragraphing • part of speech, constituent information – Apply learned rules to new text 7/15/2016 5 Prosodic Phrasing • 2 `levels’ of phrasing in ToBI – intermediate phrase: one or more pitch accents plus a phrase accent (Hor L- ) – intonational phrase: one or more intermediate phrases + boundary tone (H% or L% ) • ToBI break-index tier – 0 no word boundary – 1 word boundary – 2 strong juncture with no tonal markings – 3 intermediate phrase boundary – 4 intonational phrase boundary 7/15/2016 6 What are the indicators of phrasing in speech? • Timing – Pause – Lengthening • F0 changes • Vocal fry/glottalization 7/15/2016 7 What linguistic and contextual features are linked to phrasing? • Syntactic structure? – We only suspected they all knew that a burglary had been committed. – Abney ’91 chunking – Steedman ’90, Oehrle ’91 CCGs … • Length of sentence • Word co-occurrence statistics – I should have known and so should you have. 7/15/2016 8 • Are words on each side accented or not? • Where is previous phrase boundary? • Part-of-speech of words around potential boundary site • Largest constituent dominating w(i) but not w(j) • Smallest constituent dominating w(i),w(j) 7/15/2016 9 Statistical learning methods • Classification and regression trees (CART) • Rule induction (Ripper) • HMMs 7/15/2016 10 Evaluation: Hard! • How to define Gold Standard? – Natural speech corpus – Multi-speaker/same text – Subjective judgments 7/15/2016 11 Today • Incremental improvements continue: – Adding higher-accuracy parsing (Koehn et al ‘00) • Collins ‘99 parser • More sophisticated learning algorithms (Schapire & Singer ‘00) • Better representations: tree based? • Rules always impoverished • Where to next? 7/15/2016 12 Pitch Accent/Prominence in ToBI • Which items are made intonationally prominent and how? • Accent type: – H* simple high(declarative) – L* simple low (ynq) – L*+H scooped, late rise (uncertainty/ incredulity) – L+H* early rise to stress (contrastive focus) – H+!H* fall onto stress (implied familiarity) 7/15/2016 13 What are the indicators of accent? • • • • • F0 excursion Durational lengthening Voice quality Vowel quality Loudness 7/15/2016 14 What are the correlates of accent in text and context? • Word class: content vs. function words • Information status: – Given/new – Topic/Focus – Contrast • Grammatical function – The ball touches the cone. • Surface position: Today George is hungry. 7/15/2016 15 • Association with focus: – Dogs must be carried. – John only introduced Mary to Sue. 7/15/2016 16 How can we capture such information simply? • • • • • POS window Position of candidate word in sentence Location of prior phrase boundary Pseudo-given/new Location of word in complex nominal and stress prediction for that nominal • Word co-occurrence 7/15/2016 17 Today • Concept-to-Speech (CTS) – Pan&McKeown99 – systems should be able to specify “better” prosody: the system knows what it wants to say and can specify how – New features! • New learning methods: – Boosting and bagging (Sun02) – Combine text and acoustic cues for ASR – Co-training?? 7/15/2016 18 MAGIC • MM system for presenting cardiac patient data – Developed at Columbia by McKeown and colleagues in conjunction with Columbia Presbyterian Medical Center to automate postoperative status reporting for bypass patients – Uses mostly traditional NLG hand-developed components – Generate text, then annotate prosodically – Corpus-trained prosodic assignment component 7/15/2016 19 • Corpus: written and oral patient reports – 50min multi-speaker, spontaneous + 11min single speaker, read – 1.24M word text corpus of discharge summaries – Transcribed, ToBI labeled – Generator features labeled/extracted: • syntactic function 7/15/2016 20 • • • • • • • • • • 7/15/2016 p.o.s. semantic category semantic ‘informativeness’ (rarity in corpus) semantic constituent boundary location and length salience given/new focus theme/ rheme ‘importance’ ‘unexpectedness’ 21 – Very hard to label features • Results: new features to specify TTS prosody – Of CTS-specific features only semantic informativeness (likeliness of occurring in a corpus) useful so far – Looking at context, word collocation for accent placement helps predict accent – RED CELL (less predictable) vs. BLOOD cell (more) Most predictable words are accented less frequently (40-46%) and least predictable more (73-80%) Unigram+bigram model predicts accent status w/77% (+/-.51) accuracy 7/15/2016 22 Future Intonation Prediction • Assigning affect (emotion) • Personalizing TTS • Conveying personality, charisma? 7/15/2016 23 Next Week • Look at 2 other phenomena we’d like to capture in TTS systems: – Information status – Discourse (topic) structure • Homework 2 is due on Monday! 7/15/2016 24