Predicting Phrasing and Accent Julia Hirschberg CS 6998 7/15/2016

advertisement
Predicting Phrasing and Accent
Julia Hirschberg
CS 6998
7/15/2016
1
A car bomb attack on a police station in the northern Iraqi city
of Kirkuk early Monday killed four civilians and wounded 10
others U.S. military officials said. A leading Shiite member of
Iraq's Governing Council on Sunday demanded no more
"stalling" on arranging for elections to rule this country once the
U.S.-led occupation ends June 30. Abdel Aziz al-Hakim a Shiite
cleric and Governing Council member said the U.S.-run
coalition should have begun planning for elections months ago.
7/15/2016
2
Why predict phrasing and accent?
• TTS and CTS
– Naturalness
– Intelligibility
• Recognition
– Decrease perplexity
– Modify durational predictions for words at
phrase boundaries
– Identify most ‘salient’ words
7/15/2016
3
How predict phrasing and accent?
• Default prosodic assignment from simple
text analysis
• Hand-built rule-based system: hard to
modify and adapt to new domains
• Corpus-based approaches (Sproat et al ’92)
– Train prosodic variation on large labeled
corpora using machine learning
techniques
– Accent and phrasing decisions
7/15/2016
4
– Associate prosodic labels with simple
features of transcripts
• distance from beginning or end of phrase
• orthography: punctuation, paragraphing
• part of speech, constituent information
– Apply learned rules to new text
7/15/2016
5
Prosodic Phrasing
• 2 `levels’ of phrasing in ToBI
– intermediate phrase: one or more pitch accents
plus a phrase accent (Hor L- )
– intonational phrase: one or more intermediate
phrases + boundary tone (H% or L% )
• ToBI break-index tier
– 0 no word boundary
– 1 word boundary
– 2 strong juncture with no tonal markings
– 3 intermediate phrase boundary
– 4 intonational phrase boundary
7/15/2016
6
What are the indicators of phrasing in
speech?
• Timing
– Pause
– Lengthening
• F0 changes
• Vocal fry/glottalization
7/15/2016
7
What linguistic and contextual features are
linked to phrasing?
• Syntactic structure?
– We only suspected they all knew that a
burglary had been committed.
– Abney ’91 chunking
– Steedman ’90, Oehrle ’91 CCGs …
• Length of sentence
• Word co-occurrence statistics
– I should have known and so should you have.
7/15/2016
8
• Are words on each side accented or not?
• Where is previous phrase boundary?
• Part-of-speech of words around potential
boundary site
• Largest constituent dominating w(i) but not w(j)
• Smallest constituent dominating w(i),w(j)
7/15/2016
9
Statistical learning methods
• Classification and regression trees (CART)
• Rule induction (Ripper)
• HMMs
7/15/2016
10
Evaluation: Hard!
• How to define Gold Standard?
– Natural speech corpus
– Multi-speaker/same text
– Subjective judgments
7/15/2016
11
Today
• Incremental improvements continue:
– Adding higher-accuracy parsing (Koehn et al
‘00)
• Collins ‘99 parser
• More sophisticated learning algorithms (Schapire &
Singer ‘00)
• Better representations: tree based?
• Rules always impoverished
• Where to next?
7/15/2016
12
Pitch Accent/Prominence in ToBI
• Which items are made intonationally prominent
and how?
• Accent type:
– H*
simple high(declarative)
– L*
simple low (ynq)
– L*+H scooped, late rise (uncertainty/
incredulity)
– L+H* early rise to stress
(contrastive
focus)
– H+!H* fall onto stress (implied familiarity)
7/15/2016
13
What are the indicators of accent?
•
•
•
•
•
F0 excursion
Durational lengthening
Voice quality
Vowel quality
Loudness
7/15/2016
14
What are the correlates of accent in text and
context?
• Word class: content vs. function words
• Information status:
– Given/new
– Topic/Focus
– Contrast
• Grammatical function
– The ball touches the cone.
• Surface position: Today George is hungry.
7/15/2016
15
• Association with focus:
– Dogs must be carried.
– John only introduced Mary to Sue.
7/15/2016
16
How can we capture such information
simply?
•
•
•
•
•
POS window
Position of candidate word in sentence
Location of prior phrase boundary
Pseudo-given/new
Location of word in complex nominal and stress
prediction for that nominal
• Word co-occurrence
7/15/2016
17
Today
• Concept-to-Speech (CTS) – Pan&McKeown99
– systems should be able to specify “better”
prosody: the system knows what it wants to
say and can specify how
– New features!
• New learning methods:
– Boosting and bagging (Sun02)
– Combine text and acoustic cues for ASR
– Co-training??
7/15/2016
18
MAGIC
• MM system for presenting cardiac patient data
– Developed at Columbia by McKeown and
colleagues in conjunction with Columbia
Presbyterian Medical Center to automate postoperative status reporting for bypass patients
– Uses mostly traditional NLG hand-developed
components
– Generate text, then annotate prosodically
– Corpus-trained prosodic assignment
component
7/15/2016
19
• Corpus: written and oral patient reports
– 50min multi-speaker, spontaneous + 11min
single speaker, read
– 1.24M word text corpus of discharge
summaries
– Transcribed, ToBI labeled
– Generator features labeled/extracted:
• syntactic function
7/15/2016
20
•
•
•
•
•
•
•
•
•
•
7/15/2016
p.o.s.
semantic category
semantic ‘informativeness’ (rarity in corpus)
semantic constituent boundary location and
length
salience
given/new
focus
theme/ rheme
‘importance’
‘unexpectedness’
21
– Very hard to label features
• Results: new features to specify TTS prosody
– Of CTS-specific features only semantic
informativeness (likeliness of occurring in a corpus)
useful so far
– Looking at context, word collocation for accent
placement helps predict accent
– RED CELL (less predictable) vs. BLOOD cell (more)
Most predictable words are accented less frequently (40-46%)
and least predictable more (73-80%)
Unigram+bigram model predicts accent status w/77% (+/-.51)
accuracy
7/15/2016
22
Future Intonation Prediction
• Assigning affect (emotion)
• Personalizing TTS
• Conveying personality, charisma?
7/15/2016
23
Next Week
• Look at 2 other phenomena we’d like to capture
in TTS systems:
– Information status
– Discourse (topic) structure
• Homework 2 is due on Monday!
7/15/2016
24
Download