Meaningful Intonational Variation 7/15/2016 1

Today
Assigning variation for TTS, CTS
Contours
Accent
Phrasing
Pitch Range
Amplitude and timing
TTS Production Pipeline
Orthographic input: Dr. Smith lives on Elm Dr.
Text normalization: abbreviation expansion…
Pronunciation modeling: POS identification, word-sense disambiguation
Intonation assignment: parsing, POS identification, robust semantics…
Phonetic/phonological realization: phonological parsing, phonetic analysis
Unit selection: acoustic analysis
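The text normalization step can be illustrated with the slide's own input, where "Dr." must be read two different ways. The following is a minimal sketch, not a production normalizer: the function name `expand_dr` and the context rule (a title before a capitalized word, "Drive" otherwise) are assumptions made for illustration.

```python
import re

def expand_dr(text: str) -> str:
    """Expand the ambiguous abbreviation 'Dr.' from its right-hand context.

    Assumed rule (hypothetical, for illustration only):
      - 'Dr.' followed by a capitalized word is a title  -> 'Doctor'
      - any other 'Dr.' is a street-name suffix          -> 'Drive'
    """
    # Title reading: 'Dr.' directly before a capitalized word.
    text = re.sub(r"\bDr\.(?=\s+[A-Z])", "Doctor", text)
    # At the end of the text the period also closes the sentence, so keep it.
    text = re.sub(r"\bDr\.(?=\s*$)", "Drive.", text)
    # Any remaining 'Dr.' inside the text is read as 'Drive'.
    text = re.sub(r"\bDr\.", "Drive", text)
    return text

print(expand_dr("Dr. Smith lives on Elm Dr."))
```

A rule this simple misreads "Elm Dr. She moved" as a title, which is one reason normalization is usually combined with POS and sense disambiguation later in the pipeline.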
Intonation Assignment: Phrasing
Traditional: hand-built rules
Punctuation: 234-5682
Context/function word: no breaks after a function word: He went to dinner
Parse? She favors the nuts and bolts approach
Current: statistical analysis of a large labeled corpus
Punctuation, POS window, utterance length, …
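The hand-built phrasing rules above can be sketched in a few lines. This is an illustrative toy, not the rule set of any actual system: the function-word list, the `max_words` length fallback, and the name `phrase_breaks` are all assumptions. It breaks at punctuation, and otherwise allows a break only once a phrase is long enough and the current word is not a function word.

```python
import string
from typing import List

# Tiny illustrative function-word list (an assumption, far from complete).
FUNCTION_WORDS = {"a", "an", "the", "to", "of", "in", "on", "at", "and",
                  "or", "but", "he", "she", "it", "we", "they", "i", "you"}
BREAK_PUNCT = ",;:.?!"

def phrase_breaks(text: str, max_words: int = 5) -> List[str]:
    """Split text into phrases using two hand-built rules."""
    phrases, current = [], []
    for token in text.split():
        current.append(token)
        word = token.strip(string.punctuation).lower()
        if token[-1] in BREAK_PUNCT:
            # Rule 1: always break at punctuation.
            phrases.append(" ".join(current))
            current = []
        elif len(current) >= max_words and word not in FUNCTION_WORDS:
            # Rule 2: length fallback, but never break after a function word.
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

print(phrase_breaks("He went to dinner"))
print(phrase_breaks("She favors the nuts and bolts approach", max_words=3))
```

With `max_words=3` the second call splits inside "nuts and bolts", which is the kind of error the "Parse?" bullet points at: word-level rules cannot see constituent structure.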
Functions of Phrasing
• Disambiguates syntactic constructions, e.g. PP attachment:
S: You should buy the ticket with the discount coupon.
• Disambiguates scope ambiguities, e.g. negation:
S: You aren’t booked through Rome because of the fare.
• Or modifier scope:
S: This fare is restricted to retired politicians and civil servants.
Intonation Assignment: Accent
Hand-built rules
Function/content distinction: He went out the back door / He threw out the trash
Complex nominals:
Main Street / Park Avenue
city hall parking lot
Statistical procedures trained on large corpora
Contrastive stress, given/new distinction?
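The function/content rule can be sketched as a lookup on part-of-speech tags. The sketch below assumes hand-supplied Penn Treebank style tags and assumes that "out" is a preposition (IN) in the first example and a verb particle (RP) in the second; the tag set, the accented-tag list, and the function names are illustrative choices, not taken from the slides.

```python
from typing import List, Tuple

# Tag prefixes treated as accent-bearing (an assumption): nouns, verbs,
# adjectives, adverbs, and verb particles.
ACCENTED_TAG_PREFIXES = ("NN", "VB", "JJ", "RB", "RP")

def assign_accents(tagged: List[Tuple[str, str]]) -> List[Tuple[str, bool]]:
    """Mark each (word, tag) pair as accented or not by its POS tag."""
    return [(word, tag.startswith(ACCENTED_TAG_PREFIXES)) for word, tag in tagged]

def render(tagged: List[Tuple[str, str]]) -> str:
    """Show accented words in upper case, deaccented words in lower case."""
    return " ".join(w.upper() if acc else w.lower()
                    for w, acc in assign_accents(tagged))

door = [("He", "PRP"), ("went", "VBD"), ("out", "IN"),
        ("the", "DT"), ("back", "JJ"), ("door", "NN")]
trash = [("He", "PRP"), ("threw", "VBD"), ("out", "RP"),
         ("the", "DT"), ("trash", "NN")]

print(render(door))
print(render(trash))
```

The same word form gets different treatment only because the tagger distinguishes the two uses, so the rule is no better than the POS identification that feeds it.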
Functions of Pitch Accent
• Given/new information
S: Do you need a return ticket?
U: No, thanks, I don’t need a return.
• Contrast (narrow focus)
U: No, thanks, I don’t need a RETURN… (I need a time schedule, receipt, …)
• Disambiguation of discourse markers
S: Now let me get you the train information.
U: Okay (thanks) vs. Okay… (but I really want…)
Intonation Assignment: Contours
• Simple rules
‘.’ = declarative contour
‘?’ = yes-no-question contour unless a wh-word is present at/near the front of the sentence
Well, how did he do it? And what do you know? What else might we do?
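The two punctuation rules above translate almost directly into code. In this sketch the wh-word list, the three-word `window` used for "at/near the front", and the returned label strings are assumptions; the slides do not say which contour a wh-question should receive, so it is returned as its own category.

```python
import string

WH_WORDS = {"who", "whom", "whose", "what", "which",
            "when", "where", "why", "how"}

def choose_contour(sentence: str, window: int = 3) -> str:
    """Pick a contour class from final punctuation and leading wh-words."""
    s = sentence.strip()
    if s.endswith("?"):
        # Look for a wh-word among the first `window` words.
        leading = [w.strip(string.punctuation).lower()
                   for w in s.split()[:window]]
        if any(w in WH_WORDS for w in leading):
            return "wh-question"
        return "yes-no-question"
    # Default (assumption): everything else gets the declarative contour.
    return "declarative"

for s in ["Well, how did he do it?", "And what do you know?",
          "What else might we do?", "Do you want to fly first class?",
          "Dr. Smith lives on Elm Dr."]:
    print(choose_contour(s), "<-", s)
```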
Contours: Accent + Phrasing
• What do intonational contours ‘mean’ (Ladd ‘80, Bolinger ‘89)?
Speech acts (statements, questions, requests)
S: That’ll be credit card? (L* H- H%)
Propositional attitude (uncertainty, incredulity)
S: You’d like an evening flight. (L*+H L- H%)
Speaker affect (anger, happiness, love)
U: I said four SEVEN one! (L+H* L- L%)
“Personality”
S: Welcome to the Sunshine Travel System.
Propositional attitude (uncertainty)
Did you feed the animals?
I fed the L*+H goldfish L-H%
Distinguish direct/indirect speech acts
Can you open the door?
The TTS Front End Today
Corpus-based statistical methods instead of hand-built rule sets
Dictionaries instead of rules (but fall back to rules)
Modest attempts to infer contrast, given/new
Text analysis tools: POS tagger, morphological analyzer, little parsing
TTS: Where are we now?
Natural sounding speech for some utterances
Where there is a good match between input and database
Still… hard to vary prosodic features and retain naturalness
Yes-no questions: Do you want to fly first class?
Context-dependent variation still hard to infer from text and hard to realize naturally:
Appropriate contours from text
Emphasis, de-emphasis to convey focus, given/new distinction: I own a cat. Or, rather, my cat owns me.
Variation in pitch range, rate, pausal duration to convey topic structure
Characteristics of ‘emotional speech’ little understood, so hard to convey: …a voice that sounds friendly, sympathetic, authoritative…
How to mimic real voices?
TTS vs. CTS
Decisions in Text-to-Speech (TTS) depend on syntax, information status, topic structure, … information explicitly available to NLG
Concept-to-Speech (CTS) systems should be able to specify “better” prosody: the system knows what it wants to say and can specify how
But… generating prosody for CTS isn’t so easy
To(nes and)B(reak)I(ndices)
Developed by prosody researchers in four meetings over 1991-94
Goals:
devise a common labeling scheme for Standard American English that is robust and reliable
promote collection of large, prosodically labeled, shareable corpora
ToBI standards also proposed for Japanese, German, Italian, Spanish, British and Australian English, …
Minimal ToBI transcription:
recording of speech
f0 contour
ToBI tiers:
orthographic tier: words
break-index tier: degrees of juncture (Price et al. ‘89)
tonal tier: pitch accents, phrase accents, boundary tones (Pierrehumbert ‘80)
miscellaneous tier: disfluencies, non-speech sounds, etc.
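One way to picture the four tiers is as fields of a per-word record. This is a simplified sketch: real ToBI labels are time-aligned to the recording rather than attached to words, and the class name `ToBIWord`, the 0 to 4 range check, and the labels in the example are illustrative assumptions, not a real transcription.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ToBIWord:
    word: str                                        # orthographic tier
    break_index: int                                 # break-index tier (juncture after the word)
    tones: List[str] = field(default_factory=list)   # tonal tier labels on this word
    misc: Optional[str] = None                       # miscellaneous tier (e.g. disfluency)

def validate(words: List[ToBIWord]) -> List[str]:
    """Return a list of problems; an empty list means the record looks sane."""
    errors = []
    for w in words:
        if not 0 <= w.break_index <= 4:
            errors.append(f"{w.word}: break index {w.break_index} outside 0-4")
    return errors

# Hypothetical labeling, for illustration only.
utterance = [
    ToBIWord("I", 1),
    ToBIWord("fed", 1),
    ToBIWord("the", 1),
    ToBIWord("goldfish", 4, tones=["L*+H", "L-", "H%"]),
]
print(validate(utterance))
```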
Sample ToBI Labeling
• Online training material, available at: http://www.ling.ohio-state.edu/phonetics/ToBI/
• Evaluation
Good inter-labeler reliability for expert and naive labelers: 88% agreement on presence/absence of tonal category, 81% agreement on category label, 91% agreement on break indices to within 1 level (Silverman et al. ‘92, Pitrelli et al. ‘94)
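Agreement figures of this kind can be computed by comparing every pair of labelers on every word. The sketch below is one plausible formulation, assumed for illustration; the slides do not spell out the exact metric used in the cited studies, and the three break-index sequences are invented toy data.

```python
from itertools import combinations
from typing import List, Optional

def pairwise_agreement(labels: List[List[int]],
                       tolerance: Optional[int] = None) -> float:
    """Fraction of (labeler pair, word) comparisons that agree.

    labels: one sequence of break indices per labeler, all the same length.
    tolerance: None for exact match, or a maximum allowed difference.
    """
    agree = total = 0
    for a, b in combinations(labels, 2):
        for x, y in zip(a, b):
            total += 1
            if tolerance is None:
                agree += x == y
            else:
                agree += abs(x - y) <= tolerance
    return agree / total

# Invented toy data: three labelers, six words.
toy = [
    [1, 1, 4, 1, 3, 4],
    [1, 2, 4, 1, 4, 4],
    [1, 1, 4, 0, 3, 4],
]
print(pairwise_agreement(toy))               # exact agreement
print(pairwise_agreement(toy, tolerance=1))  # agreement to within 1 level
```

Allowing a one-level tolerance raises the score, which is why the break-index figure is reported "to within 1 level".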
Pitch Accent/Prominence in ToBI
• Which items are made intonationally prominent and how?
• Accent type:
H*     simple high (declarative)
L*     simple low (ynq)
L*+H   scooped, late rise (uncertainty/incredulity)
L+H*   early rise to stress (contrastive focus)
H+!H*  fall onto stress (implied familiarity)
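Tone strings such as "L*+H L- H%" can be split into their parts mechanically. The following is a minimal sketch: the regular expression and the function name `parse_tones` are assumptions, and it recognizes only the label shapes used in these slides (pitch accents containing `*`, phrase accents ending in `-`, boundary tones ending in `%`), silently skipping anything else.

```python
import re
from typing import Dict, List

# Glosses copied from the accent-type list above.
ACCENT_GLOSS = {
    "H*": "simple high (declarative)",
    "L*": "simple low (ynq)",
    "L*+H": "scooped, late rise (uncertainty/incredulity)",
    "L+H*": "early rise to stress (contrastive focus)",
    "H+!H*": "fall onto stress (implied familiarity)",
}

TONE = re.compile(
    r"(?P<accent>!?[HL]\+!?[HL]\*|!?[HL]\*(?:\+!?[HL])?)"  # e.g. L+H*, H+!H*, L*+H, H*
    r"|(?P<phrase>!?[HL]-)"                                # phrase accent, e.g. L-
    r"|(?P<boundary>[HL]%)"                                # boundary tone, e.g. H%
)

def parse_tones(label: str) -> Dict[str, List[str]]:
    """Sort the tone labels in a string into accent/phrase/boundary lists."""
    parsed: Dict[str, List[str]] = {"accent": [], "phrase": [], "boundary": []}
    for match in TONE.finditer(label):
        parsed[match.lastgroup].append(match.group())
    return parsed

contour = parse_tones("L*+H L-H%")
print(contour)
print(ACCENT_GLOSS[contour["accent"][0]])
```

Because the pattern does not rely on whitespace, fused spellings like "L-H%" and spaced ones like "L- H%" parse the same way.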
• Downstepped accents: !H*, L+!H*, L*+!H
• Degree of prominence:
within a phrase: HiF0
across phrases
Prosodic Phrasing in ToBI
• ‘Levels’ of phrasing:
intermediate phrase: one or more pitch accents plus a phrase accent (H- or L-)
intonational phrase: one or more intermediate phrases plus a boundary tone (H% or L%)
• ToBI break-index tier:
0  no word boundary
1  word boundary
2  strong juncture with no tonal markings
3  intermediate phrase boundary
4  intonational phrase boundary
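The break-index levels can be used to recover both levels of phrasing from a labeled word sequence. This is a small sketch under stated assumptions: the function name `split_phrases` is made up, and the break indices in the example are hypothetical, chosen to show one reading of the modifier-scope sentence from earlier in the deck.

```python
from typing import List

INTERMEDIATE = 3   # break index 3: intermediate phrase boundary
INTONATIONAL = 4   # break index 4: intonational phrase boundary

def split_phrases(words: List[str], breaks: List[int],
                  level: int) -> List[List[str]]:
    """Group words into phrases, closing a phrase at any break >= level."""
    phrases, current = [], []
    for word, brk in zip(words, breaks):
        current.append(word)
        if brk >= level:
            phrases.append(current)
            current = []
    if current:
        phrases.append(current)
    return phrases

words = ("This fare is restricted to retired politicians "
         "and civil servants").split()
# Hypothetical labels: a 3 after "politicians", a 4 at the end.
breaks = [1, 1, 1, 1, 1, 1, 3, 1, 1, 4]

print(split_phrases(words, breaks, INTERMEDIATE))
print(split_phrases(words, breaks, INTONATIONAL))
```

Using `>=` reflects the nesting in the definitions: every intonational phrase boundary also ends an intermediate phrase.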
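[Figure: grid of pitch accents (H*, L*, L*+H) by phrase accent and boundary tone combinations (L-L%, L-H%, H-L%, H-H%)]
[Figure: grid of pitch accents (L+H*, H+!H*, H* !H*) by phrase accent and boundary tone combinations (L-L%, L-H%, H-L%, H-H%)]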
Contour Examples
http://www.cs.columbia.edu/~julia/cs6998/cards/examples.html
And Other Things Contribute:
Pitch Range and Timing (Rate, Pause)
Level of speaker engagement
Hello vs. HELLO
• Contour interpretation
Rise/fall/rise (L*+H L-H%): Elephantiasis isn’t incurable
• Discourse/topic structure: paratones
Corpus-Based Research
Predicting accent, phrasing, contours from large ToBI-labeled corpora
Features:
Word position, POS window, word co-occurrence, punctuation, capitalization, sentence length, paragraph position, …
Results:
~80-85% correct accent prediction
~92-96% correct phrase boundary prediction
Contours????
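Several of the listed features can be read straight off a tagged sentence. The sketch below builds a feature dictionary for one word; the function name, the feature names, the "PUNCT" tag for punctuation tokens, and the "PAD" value for positions outside the sentence are all assumptions, and no classifier or corpus is included.

```python
from typing import Dict, List, Tuple, Union

Feature = Union[int, bool, str]

def word_features(tagged: List[Tuple[str, str]], i: int,
                  window: int = 1) -> Dict[str, Feature]:
    """Features for the word at index i in a (word, POS tag) sequence."""
    word, pos = tagged[i]
    feats: Dict[str, Feature] = {
        "position": i,
        "sentence_length": len(tagged),
        "capitalized": word[:1].isupper(),
        "followed_by_punct": i + 1 < len(tagged) and tagged[i + 1][1] == "PUNCT",
        "pos": pos,
    }
    # POS window: tags of the neighbours within `window` positions.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"pos[{offset:+d}]"] = tagged[j][1] if 0 <= j < len(tagged) else "PAD"
    return feats

sentence = [("I", "PRP"), ("own", "VBP"), ("a", "DT"),
            ("cat", "NN"), (".", "PUNCT")]
print(word_features(sentence, 3))
```

One such dictionary per word, paired with the ToBI label observed on that word, would form a training example for an accent or boundary predictor.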
Reality…
This is my version of a rather long sentence which ideally should be broken into several phrases automatically by a smart system but we don't know if this will actually happen do we?
Is a yes-no question uttered with falling intonation? Does that sound delightful? Mellifluous?
I don’t want cereal I want toast.
….
Next:
Story analysis and generation (readings will be available later this week – we’ll send mail)