Prosody and NLP
We have a presentation this Friday?
Seminar by
Nikhil: 06005004
Adith: 06005005
Prachur: 06D05011
Abstract
Speech Processing and Natural Language Processing share a
common area of study: Language. However, over time, they have
grown to have little in common regarding theoretical models or
methods of analysis.
NLP takes written text as the starting point for its analysis; however,
a lot of valuable information is lost when speech is encoded merely as text.
It is commonly accepted that intonational features of spoken language
can greatly aid NLP tasks (like adjective scope resolution).
We explore the foundations of the study of Prosody and observe some
approaches that use prosodic cues to aid NLP.
Motivation
• Language is primarily speech-driven, not
text-driven.
• NLP currently has written text as the starting
point for its analysis (primarily due to the
abundance of such data).
• A lot of information is lost when spoken
features are ignored and only the text is considered.
A Way Out?
• Utilize spoken features for NLP
tasks
• NLP needs all the help it can
get.
Working at the pragmatic or
discourse level is extremely hard.
Prosodic cues carry useful
pragmatic information.
What is Prosody, exactly?
• The term comes from poetry, where prosody refers to the
study of poetic meter (rhythm)[1].
• Written text treats words as the basic
building blocks of language.
• Spoken language treats syllables as the
basic building block.
What is Prosody, exactly? (contd.)
• Wikipedia has this to say:
Prosody is the rhythm, stress, and
intonation of connected speech (as
opposed to smaller elements like syllables
or words).
Prosody may reflect:
– features of the speaker,
– the emotional state of the speaker,
– features of the utterance,
– irony or sarcasm,
– emphasis, contrast, and focus.
Intonation?
• Conveys paralinguistic information, emphasis and
contrast.
• Intonation on a particular word could differentiate
between sentence moods.
– You are finISHED (interrogative)
– You are FINIshed (imperative)
Image courtesy Google Image Search
And Stress!
• Stress is applied to content words in spoken
utterances.
cOntent - Noun.
"I really liked their presentation's content."
contEnt - Adjective.
"I have done my best. I am content."
• Stress on a pair of words distinguishes between
the syntactic roles played by the words in the pair.
tight<pause>rope: a rope that is held taut.
tightrope: the contraption used in a circus act :) [2]
Courtesy Tom the Dancing Bug
Prosodic cues
Prosodic cues important for linguistics include:
– Marking of boundaries (of syntactic, semantic or
dialogue units)
– Relative duration of phonetic segments
– At the syllable level: energy, intensity, duration and
intonation of the syllable
We now look at two approaches that use these
features in tasks central to NLP.
Prosody-Augmented Syntax Grammars[3]
• Cue used:
Relative duration of phonetic segments
• Aim:
To improve the parsing of ambiguous sentences.
• Method:
Augment the syntax grammar with a few
non-terminals and rules.
• The concept of “word break indices” is used
to show the degree of prosodic decoupling between
neighboring words (a higher index means a stronger break).
• E.g.
– Andrea 1 moved 1 the 0 bottle 3 under 0 the 0 bridge.
– Andrea 1 moved 3 the 0 bottle 1 under 0 the 0 bridge.
Break indices were generated by analysing word-final codas and whether they are followed by a pause.
The coda is the consonant (or consonant cluster) that ends a syllable, e.g. /p/ in “cup”, /lk/ in “milk”.
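To make prosodic decoupling concrete, here is a minimal Python sketch (not Bear and Price's procedure) that groups words into prosodic phrases, assuming a hypothetical rule that any break index of 3 or more closes a phrase. The function name `prosodic_phrases` and the threshold are illustrative choices.

```python
from typing import List, Sequence


def prosodic_phrases(words: Sequence[str], breaks: Sequence[int],
                     threshold: int = 3) -> List[List[str]]:
    """Group words into prosodic phrases.

    breaks[i] is the break index between words[i] and words[i + 1];
    a break index >= threshold (a hypothetical cut-off) starts a new phrase.
    """
    if not words or len(breaks) != len(words) - 1:
        raise ValueError("need one break index between each pair of words")
    phrases: List[List[str]] = [[words[0]]]
    for word, brk in zip(words[1:], breaks):
        if brk >= threshold:
            phrases.append([word])    # strong decoupling: open a new phrase
        else:
            phrases[-1].append(word)  # weak break: stay in the current phrase
    return phrases


words = "Andrea moved the bottle under the bridge".split()
print(prosodic_phrases(words, [1, 1, 0, 3, 0, 0]))
# [['Andrea', 'moved', 'the', 'bottle'], ['under', 'the', 'bridge']]
print(prosodic_phrases(words, [1, 3, 0, 1, 0, 0]))
# [['Andrea', 'moved'], ['the', 'bottle', 'under', 'the', 'bridge']]
```

Under this reading, the first example sets “under the bridge” apart, which fits attaching it to the verb (moved the bottle, under the bridge), while the second keeps “the bottle under the bridge” together, which fits attaching it to the noun.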
Grammar Modification
• Original grammar rules like S -> NP VP
are changed to S -> NP link1 VP.
• “Link” non-terminals are used for the word break indices.
– For rules like NP -> ε (empty constituents), we allow rules of the form
link -> ε.
– To prevent spurious parses due to the introduction
of empty links, some constraints are needed; these
can be easily incorporated.
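The rule transformation can be sketched in Python as below. This is a simplified, hypothetical version: it uses a single `Link` non-terminal instead of numbered ones like link1, names the break-index terminals `b0`, `b1`, … (our convention), and omits the constraints against spurious parses.

```python
from typing import Iterable, List, Tuple

# A context-free rule: (left-hand side, right-hand-side symbols)
Rule = Tuple[str, Tuple[str, ...]]


def add_links(rules: Iterable[Rule], break_indices: Iterable[int]) -> List[Rule]:
    """Insert a Link non-terminal between adjacent right-hand-side symbols.

    Link rewrites to a break-index terminal (b0, b1, ...) or to the empty
    string, in the spirit of the link -> epsilon rules on the slide.
    """
    augmented: List[Rule] = []
    for lhs, rhs in rules:
        new_rhs: List[str] = []
        for i, symbol in enumerate(rhs):
            if i > 0:
                new_rhs.append("Link")  # boundary between two constituents
            new_rhs.append(symbol)
        augmented.append((lhs, tuple(new_rhs)))
    for b in sorted(set(break_indices)):
        augmented.append(("Link", (f"b{b}",)))  # Link -> bN
    # Empty link; without extra constraints this can yield spurious parses
    augmented.append(("Link", ()))
    return augmented


grammar = [("S", ("NP", "VP")), ("VP", ("V", "NP", "PP"))]
for lhs, rhs in add_links(grammar, [0, 1, 3]):
    print(lhs, "->", " ".join(rhs) if rhs else "ε")
# S -> NP Link VP
# VP -> V Link NP Link PP
# Link -> b0
# Link -> b1
# Link -> b3
# Link -> ε
```

A parser run on this grammar then has to consume a break-index token at every constituent boundary, which is how the prosody can rule out some parses.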
Results
• The incorporation of prosody resulted in a
reduction of about 25% in the number of
parses found. Parse times increased by about
37%.
• Extremely common cases of syntactic
ambiguity can be resolved with prosodic
information, and grammars can be
modified to take advantage of prosodic
information for improved parsing.
Using Prosodic Features in Language Models[4]
• The outlined approach uses syllable-based
prosodic cues, namely:
– Duration of the syllable
– Average energy (intensity)
– The average F0 (fundamental frequency) contour
of the syllable
– The slope of the F0 contour (i.e. whether the
intonation is rising or falling/flat)
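The four syllable-level features can be computed from frame-level energy and F0 tracks roughly as follows. This is a sketch under stated assumptions: a 10 ms frame shift, F0 = 0 marking unvoiced frames, and a least-squares line fit for the slope; the paper's exact feature extraction may differ, and `syllable_features` is a hypothetical helper.

```python
from statistics import mean
from typing import Dict, Sequence


def syllable_features(energy: Sequence[float], f0: Sequence[float],
                      frame_shift: float = 0.01) -> Dict[str, float]:
    """Four prosodic features for one syllable from frame-level tracks.

    energy and f0 hold one value per analysis frame (same length);
    f0 == 0 marks an unvoiced frame; frame_shift is in seconds.
    """
    if not energy or len(energy) != len(f0):
        raise ValueError("need equal-length, non-empty energy and f0 tracks")
    duration = len(energy) * frame_shift
    # (time, F0) pairs for voiced frames only
    voiced = [(i * frame_shift, f) for i, f in enumerate(f0) if f > 0]
    avg_f0 = mean(f for _, f in voiced) if voiced else 0.0
    # Least-squares slope of F0 over time (Hz per second); 0 if too few points
    slope = 0.0
    if len(voiced) >= 2:
        t_mean = mean(t for t, _ in voiced)
        num = sum((t - t_mean) * (f - avg_f0) for t, f in voiced)
        den = sum((t - t_mean) ** 2 for t, _ in voiced)
        slope = num / den
    return {"duration": duration, "energy": mean(energy),
            "f0": avg_f0, "f0_slope": slope}


feats = syllable_features([0.2, 0.5, 0.6, 0.4], [0.0, 120.0, 130.0, 140.0])
print(feats["f0_slope"] > 0)  # True: a rising contour
```

The sign of `f0_slope` gives the rising versus falling/flat distinction listed on the slide.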
Recognition of Prosodic Features
Prosody in Language Model
• We want to measure $P(w_n \mid w_{n-1}, w_{n-2}, \ldots, F)$
• Naively modelled by linear interpolation:
– Assumption: the prosodic features F are independent of
the previous words (not true!!).
$$P(w_n \mid w_{n-1}, w_{n-2}, \ldots, F) = \alpha\,P(w_n \mid w_{n-1}, w_{n-2}, \ldots) + (1-\alpha)\,P(w_n \mid F)$$
• We want something better
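As a worked example of the interpolation formula, the sketch below mixes two made-up next-word distributions; the toy vocabulary, the probabilities and α = 0.7 are all hypothetical.

```python
from typing import Dict


def interpolate(p_ngram: Dict[str, float], p_prosody: Dict[str, float],
                alpha: float) -> Dict[str, float]:
    """alpha * P(w | history) + (1 - alpha) * P(w | F), word by word."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    vocab = set(p_ngram) | set(p_prosody)
    return {w: alpha * p_ngram.get(w, 0.0) + (1 - alpha) * p_prosody.get(w, 0.0)
            for w in vocab}


# Hypothetical toy distributions for the next word w_n
p_hist = {"yeah": 0.2, "so": 0.5, "okay": 0.3}    # P(w_n | w_{n-1}, w_{n-2})
p_pros = {"yeah": 0.5, "so": 0.25, "okay": 0.25}  # P(w_n | F)
mixed = interpolate(p_hist, p_pros, alpha=0.7)
print(round(mixed["yeah"], 2))  # 0.29
```

Here 0.7 × 0.2 + 0.3 × 0.5 = 0.29. Because both components are distributions and the weights sum to 1, the mixture is still a distribution; the weakness is that P(w_n | F) ignores the word history, which is exactly the independence assumption flagged above.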
Factored Language Model
• Instead of a word W, we deal with
a set of word factors $F = \{f_1, f_2, \ldots, f_k\}$
(factors may include the word itself).
• Here, F is chosen as {W, prosodic features}.
• Each of the four prosodic features is encoded as
one bit, giving 16 binary codes (s0 to s15).
• One such code is assigned to each
syllable of the word.
• For example, the prosodic representation of the
word “Actually” can be either ‘s10s12s6’ or
‘s10s15s6’.
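The encoding can be sketched as follows. The slide does not say which feature maps to which bit, so the ordering in `BITS` (duration as the most significant bit) and the helper names are assumptions, chosen only so that one set of flags reproduces the “Actually” example.

```python
from typing import Sequence, Tuple

# Assumed bit order, most significant first; the real assignment may differ.
BITS = ("duration", "energy", "f0", "f0_slope")


def syllable_code(flags: Sequence[bool]) -> str:
    """Pack four binary prosodic features into one of 16 symbols, s0 to s15."""
    if len(flags) != len(BITS):
        raise ValueError(f"expected {len(BITS)} flags, got {len(flags)}")
    value = 0
    for flag in flags:
        value = (value << 1) | int(flag)  # shift in one bit per feature
    return f"s{value}"


def word_factors(word: str, syllables: Sequence[Sequence[bool]]) -> Tuple[str, str]:
    """Return the factor pair {W, prosody}: the word and its syllable codes."""
    return word, "".join(syllable_code(s) for s in syllables)


print(word_factors("actually", [(True, False, True, False),
                                (True, True, False, False),
                                (False, True, True, False)]))
# ('actually', 's10s12s6')
```

Setting all four flags on the second syllable instead gives ‘s10s15s6’, the other representation on the slide.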
Conclusion
• Prosodic cues can play an important role
as a heuristic for many NLP tasks.
• It is not all one-way traffic, though: POS
tagging (since it is relatively accurate) is
used to aid speech synthesis tasks, which
conventionally used only prosodic cues[5].
Future Work
• Handling prosodic information is a first
step towards the integration of Speech
Processing and NLP.
Courtesy ZITS
References
1. Wikipedia
2. Fromkin, Rodman and Hyams, An Introduction to Language, 7th ed., Thomson and Wadsworth.
3. John Bear and Patti Price (1990), “Prosody, Syntax and Parsing”, in Proceedings of the 28th Annual Meeting of the ACL.
4. Songfang Huang and Steve Renals (2007), “Using Prosodic Features in Language Models for Meetings”, IRTG annual meeting.
5. http://speech.iiit.net/~raghavendra/Webpage/ppprts.pdf