Predicting Phrasing and Accent Julia Hirschberg CS 6998 7/15/2016

Why predict phrasing and accent?
TTS and CTS
Naturalness
Intelligibility
Recognition
Decrease perplexity
Modify durational predictions for words at phrase boundaries
Identify most ‘salient’ words
How predict phrasing and accent?
Default prosodic assignment from simple text analysis
Hand-built rule-based systems: hard to modify and adapt to new domains
Corpus-based approaches (Sproat et al. ’92)
Train prosodic variation on large labeled corpora using machine learning techniques
Accent and phrasing decisions
Associate prosodic labels with simple features of transcripts
• distance from beginning or end of phrase
• orthography: punctuation, paragraphing
• part of speech, constituent information
Apply learned rules to new text
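To make this pipeline concrete, here is a minimal sketch of extracting such juncture features from a transcript and applying a learned-style rule. The feature names, POS tags, and the rule itself are invented for illustration; they are not the features or rules from Sproat et al. ’92.

```python
# Sketch: compute simple text features at each word juncture, then apply a
# hand-written rule of the kind a learner might induce. Illustrative only.

def juncture_features(words, pos_tags, i):
    """Features for the juncture after words[i]."""
    return {
        "dist_from_start": i + 1,                     # words since phrase start
        "dist_to_end": len(words) - (i + 1),          # words until phrase end
        "punct_follows": words[i].endswith((",", ".", ";")),
        "pos_left": pos_tags[i],
        "pos_right": pos_tags[i + 1] if i + 1 < len(pos_tags) else "END",
    }

def predict_boundary(feats):
    """Toy rule: break at punctuation, or between a noun and a following
    function word once we are a few words into the phrase."""
    if feats["punct_follows"]:
        return True
    return (feats["pos_left"] == "NN"
            and feats["pos_right"] in ("IN", "CC")
            and feats["dist_from_start"] >= 4)

words = ["We", "only", "suspected", "they", "all", "knew", "that,", "apparently"]
tags  = ["PRP", "RB", "VBD", "PRP", "DT", "VBD", "IN", "RB"]
boundaries = [i for i in range(len(words) - 1)
              if predict_boundary(juncture_features(words, tags, i))]
# boundaries holds the indices of junctures predicted as phrase breaks
```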
Prosodic Phrasing
• 2 ‘levels’ of phrasing in ToBI
   intermediate phrase: one or more pitch accents plus a phrase accent (H- or L-)
   intonational phrase: one or more intermediate phrases + boundary tone (H% or L%)
• ToBI break-index tier
   0  no word boundary
   1  word boundary
   2  strong juncture with no tonal markings
   3  intermediate phrase boundary
   4  intonational phrase boundary
What are the indicators of phrasing in speech?
Timing
Pause
Lengthening
F0 changes
Vocal fry/glottalization
What linguistic and contextual features are linked to phrasing?
Syntactic structure?
   We only suspected they all knew that a burglary had been committed.
   Abney ’91 chunking
   Steedman ’90, Oehrle ’91 CCGs …
Length of sentence
Word co-occurrence statistics
   I should have known and so should you have.
Are words on each side accented or not?
Where is previous phrase boundary?
Part-of-speech of words around potential boundary site
Largest constituent dominating w(i) but not w(j)
Smallest constituent dominating w(i), w(j)
Statistical learning methods
Classification and regression trees (CART)
Rule induction (Ripper)
HMMs
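As a sketch of what sits at the core of a CART-style learner, the snippet below finds the single feature split that best separates boundary from non-boundary junctures by weighted Gini impurity. The toy data and feature names are invented; a real CART recurses on each split to grow a full tree.

```python
# CART-style single split: pick the (feature, threshold) pair minimizing
# weighted Gini impurity over a toy juncture dataset. Illustrative only.

def gini(labels):
    """Gini impurity of a binary label list."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(X, y):
    """Return the (feature, threshold) minimizing weighted Gini impurity."""
    best = (None, None, float("inf"))
    for f in X[0]:
        for t in sorted({row[f] for row in X}):
            left  = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (f, t, score)
    return best[0], best[1]

# Toy junctures: distance to sentence end, punctuation present? -> boundary?
X = [{"dist_to_end": 9, "punct": 0}, {"dist_to_end": 6, "punct": 1},
     {"dist_to_end": 4, "punct": 0}, {"dist_to_end": 1, "punct": 1},
     {"dist_to_end": 2, "punct": 1}, {"dist_to_end": 8, "punct": 0}]
y = [0, 1, 0, 1, 1, 0]
feature, threshold = best_split(X, y)
# Here punctuation separates the toy data perfectly, so it is chosen.
```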
Evaluation: Hard!
How to define Gold Standard?
Natural speech corpus
Multi-speaker/same text
Subjective judgments
Today
Incremental improvements continue:
   Adding higher-accuracy parsing (Koehn et al. ’00)
      Collins ’99 parser
   More sophisticated learning algorithms (Schapire & Singer ’00)
   Better representations: tree-based?
Rules always impoverished
Where to next?
Pitch Accent/Prominence in ToBI
• Which items are made intonationally prominent, and how?
• Accent type:
   H*     simple high (declarative)
   L*     simple low (yes-no question)
   L*+H   scooped, late rise (uncertainty/incredulity)
   L+H*   early rise to stress (contrastive focus)
   H+!H*  fall onto stress (implied familiarity)
What are the indicators of accent?
F0 excursion
Durational lengthening
Voice quality
Vowel quality
Loudness
What are the correlates of accent in text and context?
Word class: content vs. function words
Information status:
Given/new
Topic/Focus
Contrast
Grammatical function
The ball touches the cone.
Surface position: Today George is hungry.
Association with focus:
Dogs must be carried.
John only introduced Mary to Sue.
How can we capture such information simply?
POS window
Position of candidate word in sentence
Location of prior phrase boundary
Pseudo-given/new
Location of word in complex nominal and stress prediction for that nominal
Word co-occurrence
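A "pseudo-given/new" feature of the kind listed above can be approximated very cheaply: treat a word as given if its (crudely stemmed) form has already appeared in the discourse. The truncation-based "stemmer" below is a deliberately rough stand-in for a real lemmatizer, and the example words are invented.

```python
# Sketch of a pseudo-given/new feature: a word counts as "given" once a
# crude stem of it has been seen earlier in the discourse.

def givenness(words):
    seen, labels = set(), []
    for w in words:
        stem = w.lower().rstrip(".,;")[:4]   # crude stem: first 4 letters
        labels.append("given" if stem in seen else "new")
        seen.add(stem)
    return labels

labels = givenness(["The", "red", "cells", "and", "the", "white",
                    "cells", "differ."])
# Second mentions of "the" and "cells" come out as given.
```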
Today
Concept-to-Speech (CTS) (Pan & McKeown ’99)
   CTS systems should be able to specify “better” prosody: the system knows what it wants to say and can specify how
New features!
New learning methods:
   Boosting and bagging (Sun ’02)
Combine text and acoustic cues for ASR
Co-training??
MAGIC
MM system for presenting cardiac patient data
Developed at Columbia by McKeown and colleagues, in conjunction with Columbia Presbyterian Medical Center, to automate post-operative status reporting for bypass patients
Uses mostly traditional hand-developed NLG components
Generate text, then annotate prosodically
Corpus-trained prosodic assignment component
Corpus: written and oral patient reports
   50 min multi-speaker, spontaneous + 11 min single-speaker, read
   1.24M-word text corpus of discharge summaries
   Transcribed, ToBI-labeled
Generator features labeled/extracted:
   syntactic function
   p.o.s.
   semantic category
   semantic ‘informativeness’ (rarity in corpus)
   semantic constituent boundary location and length
   salience
   given/new
   focus
   theme/rheme
   ‘importance’
   ‘unexpectedness’
Very hard to label features
• Results: new features to specify TTS prosody
   Of the CTS-specific features, only semantic informativeness (likelihood of occurring in a corpus) useful so far
   Looking at context, word collocation for accent placement helps predict accent
      RED CELL (less predictable) vs. BLOOD cell (more predictable)
   Most predictable words are accented less frequently (40-46%) and least predictable more (73-80%)
   Unigram+bigram model predicts accent status with 77% (+/- .51) accuracy
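The predictability effect behind the RED CELL vs. BLOOD cell contrast can be sketched with a toy bigram model: accent a word when its smoothed bigram probability given the previous word falls below a threshold. The tiny corpus, the threshold, and the start token are all invented for illustration; this is not the Pan & McKeown model.

```python
# Toy predictability-based accent assignment: low bigram probability (with
# add-one smoothing over an invented mini-corpus) triggers an accent.
from collections import Counter

corpus = ("the blood cell count is low the blood cell count is stable "
          "the red cell count is low").split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)                       # vocabulary size for smoothing

def bigram_prob(prev, word):
    """Add-one smoothed P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

def accented(sentence, threshold=0.2):
    """Return the words whose predictability falls below the threshold.
    Note: the unseen start token "<s>" makes sentence-initial words look
    unpredictable, so they get accented in this toy model."""
    words = sentence.split()
    return [w for prev, w in zip(["<s>"] + words, words)
            if bigram_prob(prev, w) < threshold]
```

With these counts, "red" after "the" is rare enough to be accented, while the frequent collocation "blood cell" leaves both "blood" and "cell" unaccented, mirroring the slide's example.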
Next Week
Discussion questions