ppt - Computational Linguistics and Phonetics

Phonetics & Phonology
William Barry
What is phonetics
• The observation of how people say things.
• The description of spoken language at
the level of "pronunciation"
• The measurement of pronunciation events
• The modeling of pronunciation behaviour
• The explanation of the communicative
contribution of pronunciation patterns.
Areas of phonetics
• Speech production
(what do the speech organs do?)
• Speech acoustics
(what does the resulting speech signal look like?)
• Speech perception
(What are the acoustic properties that cause us
to hear what we hear?)
Speech production [levels]
• Respiration (sub-glottal activity)
How do we control our breathing to help our
speech communication?
• Phonation (glottal/laryngeal activity)
How do we control our vocal-folds to help our
speech communication?
• Articulation (supra-glottal activity)
How do we control our articulators to help our
speech communication?
Speech production [analysis]
• We make recordings (of course)
- we choose the type of speech
- we choose the type of speaker
- we choose the type of signal
• These choices determine our analysis:
- speech type (basic sound types, precise vs. casual speech;
monologue vs. dialogue behaviour)
- speaker type (e.g., regional vs. "standard" speakers; ……)
- signal type (acoustic = microphone; physiological; electromyographic; neurological .......)
• Signal type dictates the experimental set-up:
- Only the acoustic signal allows "natural" recordings
A standard text .....
The North Wind and the Sun
The North Wind and the Sun were disputing which was the
stronger, when a traveler came along wrapped in a warm
cloak.
They agreed that the one who first succeeded in making the
traveler take his cloak off should be considered stronger
than the other.
Then the North Wind blew as hard as he could, but the
more he blew the more closely did the traveler fold his cloak
around him; and at last the North Wind gave up the attempt.
Then the Sun shined out warmly, and immediately the
traveler took off his cloak. And so the North Wind was
obliged to confess that the Sun was the stronger of the two.
(see www.coli.uni-saarland.de/elaut for other languages)
For signal analysis
Popular analysis packages:
• Praat (www.praat.org)
(by Paul Boersma & David Weenink, Phonetics Amsterdam)
• Wavesurfer (www.speech.kth.se/wavesurfer/)
(by scientists at Stockholm Technical University - KTH)
• Goldwave (www.goldwave.de)
(commercial program, but available free for trial use)
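One of the most basic measurements such packages offer is an energy (loudness) contour of the recording. The following is only a minimal sketch in plain Python, not the method of any of the packages above: it assumes a synthetic signal (silence followed by a 200 Hz tone) instead of a real recording, and the function names sine_wave and frame_rms are hypothetical.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed for this sketch)


def sine_wave(freq_hz, n_samples, amplitude=0.5):
    """Return n_samples of a sine tone as a list of floats."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n_samples)
    ]


def frame_rms(samples, frame_len):
    """RMS amplitude of consecutive, non-overlapping frames."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


# 0.1 s of silence followed by 0.1 s of a 200 Hz tone
signal = [0.0] * 1600 + sine_wave(200, 1600)

# 10 ms frames (160 samples at 16 kHz)
energy = frame_rms(signal, 160)

for i, e in enumerate(energy):
    print(f"frame {i:2d}: {e:.3f}")
```

The first ten frames come out as zero and the remaining ten as a constant non-zero value, so the onset of the tone is visible in the contour. With a real recording, the same contour separates pauses from speech.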
Speech production [Speech Sounds]
E.g. What are the vowels of English and
German like?
• The cat sat on the mat:
SBE: ??;
US: ??
German: ??
• The computer is broken.
SBE: ??;
US: ??
German: ??
• Can you hear the differences?
• Can you describe the differences?
• Can you say why there are differences?
Vowel quality and symbols
[Figure: vowel charts for German and English; the English chart marks British (Br) and US variants.]
Speech production [Speech Sounds]
E.g. What are the consonants of English and
German like?
• The cat sat on the mat:
SBE: ??;
US: ??
German: ??
• The computer is broken.
SBE: ??;
US: ??
German: ??
• Can you hear the differences?
• Can you describe the differences?
• Can you say why there are differences?
Speech production [symbols & sounds]
Consonant articulation
• Place
– lips (labial)
– teeth (dental)
– alveolar ridge (alveolar)
– hard palate (palatal)
– soft palate (velar)
– uvula (uvular)
– pharynx (pharyngeal)
– larynx/glottis (glottal)
• Manner
– stop/plosive
– fricative
– nasal
– lateral
– glide/approximant
– trill
– tap/flap
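The place and manner categories above can be treated as a small lookup table. The sketch below is illustrative only: the choice of nine consonants is an assumption made here and is not taken from the slides, the labels reuse the slide's own terms, and the names CONSONANTS, describe and with_manner are hypothetical.

```python
# symbol -> (place, manner), using the category names from the lists above
CONSONANTS = {
    "p": ("labial", "stop/plosive"),
    "m": ("labial", "nasal"),
    "t": ("alveolar", "stop/plosive"),
    "s": ("alveolar", "fricative"),
    "n": ("alveolar", "nasal"),
    "l": ("alveolar", "lateral"),
    "j": ("palatal", "glide/approximant"),
    "k": ("velar", "stop/plosive"),
    "h": ("glottal", "fricative"),
}


def describe(symbol):
    """Return a 'place manner' label for a consonant symbol."""
    place, manner = CONSONANTS[symbol]
    return f"{place} {manner}"


def with_manner(manner):
    """Return all symbols in the table that share the given manner."""
    return sorted(s for s, (_, m) in CONSONANTS.items() if m == manner)


print(describe("k"))
print(with_manner("nasal"))
```

A full description would add voicing as a third dimension, which is what distinguishes pairs such as p/b or s/z.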
Speech production [symbols again]
• IPA table
Speech production [Intonation]
(Intonation can have a syntactic or pragmatic function)
Statement – Question (sentence modality)
• Scotland beat France at rugby.
• Scotland beat France at rugby?
Request – Command (pragmatic function)
• Could you come to my office?
• Could you come to my office?
So what's "Phonology"?
• The systematic use of sound segments and
prosody in a specific language
• Examples:
– German has vowels a, b, c, d, f .....
English has a, c, e, .....
– German has final devoicing of obstruents
– The 'voicing' of English plural & genitive {s} and past
tense {d} follows the preceding sound
– Stress falls on the first element in compound words in
German (in English the second element is often
stressed) – Compare English vs. German:
(Eng.) Buckingham Palace, Albert Hall, National
Gallery
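Two of the segmental rules above (German final devoicing and the voicing of the English plural {s}) can be written down as simple functions. This is a simplified sketch under stated assumptions: words are given as lists of phoneme symbols, devoicing is applied only to the last sound of the word, and the /ɪz/ form after sibilants is added here although the slide does not mention it. The function names are hypothetical.

```python
# German: voiced obstruents become voiceless at the end of a word
FINAL_DEVOICING = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}

# English: sound classes that decide the form of the plural {s}
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}


def devoice_final(phonemes):
    """Apply final devoicing to the last phoneme of a word."""
    if not phonemes:
        return []
    *head, last = phonemes
    return head + [FINAL_DEVOICING.get(last, last)]


def plural_suffix(final_phoneme):
    """Choose the plural form according to the preceding sound."""
    if final_phoneme in SIBILANTS:
        return "ɪz"   # buses, dishes (assumption: not on the slide)
    if final_phoneme in VOICELESS:
        return "s"    # cats
    return "z"        # dogs, bees


print(devoice_final(["r", "a", "d"]))   # German "Rad"
print(plural_suffix("t"), plural_suffix("g"), plural_suffix("s"))
```

The same three-way choice applies to the genitive {s}; the past tense {d} follows the same pattern with /t/, /d/ and /ɪd/.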
And perception?
• Interesting facts:
We don't identify the individual sounds as they reach our
ears.
The syllable: (cons) + vowel + (cons) is probably the
smallest unit of perception.
The consonants by themselves contribute less than the
vowels by themselves to our understanding of a spoken
utterance.
(but they contribute more to the understanding of an
utterance if there is one unchanging vowel than the
vowels do with one unchanging consonant!)
And what about written consonants and vowels?
Consonants vs. vowels [1]
__e _ea__e_ _o_e_a__ _o_ _o_o__o_:
_a__e_ __ou_y i_ __e _o__i__ _i__
a _e_ _u__y __e___ i_ __e
a__e__oo_.
Consonants vs. vowels [2]
Th_ w__th_r f_r_c_st f_r t_m_rr_w:
r_th_r cl__d_ _n th_ m_rn_ng w_th _
f_w s_nn_ sp_lls _n th_ _ft_rn__n.
Consonants vs. vowels [3]
• The weather forecast for tomorrow: rather
cloudy in the morning with a few sunny
spells in the afternoon.
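The two masked versions of this sentence can be produced mechanically. A minimal sketch, assuming that the letters a, e, i, o, u and y count as vowels (this matches how "cloudy" and "sunny" are masked); the function name mask_letters is hypothetical.

```python
VOWELS = set("aeiouy")  # assumption: y is treated as a vowel letter


def mask_letters(text, keep):
    """Replace every vowel or consonant letter with an underscore.

    keep="vowels" hides the consonants, keep="consonants" hides the vowels.
    Spaces and punctuation are left untouched.
    """
    if keep not in ("vowels", "consonants"):
        raise ValueError("keep must be 'vowels' or 'consonants'")
    out = []
    for ch in text:
        if not ch.isalpha():
            out.append(ch)
            continue
        is_vowel = ch.lower() in VOWELS
        keep_it = is_vowel if keep == "vowels" else not is_vowel
        out.append(ch if keep_it else "_")
    return "".join(out)


sentence = "The weather forecast for tomorrow"
print(mask_letters(sentence, keep="vowels"))
print(mask_letters(sentence, keep="consonants"))
```

In writing, the consonant-only version is usually much easier to read, which contrasts with the spoken versions, where the vowels carry more of the information.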
Consonants vs. vowels [4]
• The weather forecast for tomorrow: rather
cloudy in the morning with a few sunny
spells in the afternoon.
• speech versions
– only consonants
– only vowels
– original
Consonants vs. vowels [5]
• The vowel information is greater, but we need the
temporal pattern (the rhythm) of the utterance
(a product of the syllable structure (cons + vowels)
and the duration/weight/prominence of the vowels).
• only vowels – without silences
• only vowels – with silences
• only vowels – monotonous
So we perceive in chunks
• The syllables are the (more prominent) vowels
with the (less prominent) consonants around
them
• The sentences are the chains of syllables,
with the more prominent words (the lexical or
content words) giving the content and the less
prominent words (grammatical or function
words) grouped around them, showing the
relation between them
• The melody (intonation pattern) helps to make
the important words more prominent.
Connected speech
"The president will be elected for a period of
four years."
• Natural connected speech
• As a chain of isolated words (no reductions)
• Natural with silences between words
• As a chain of isolated words without silences
• Comparison of isolated vs. connected
function words
Connected speech (summary)
"The president will be elected for a period of four years."
• The content words are longer, louder,
unreduced, comprehensible when excised.
• The grammatical words are shorter, less
loud, strongly reduced, incomprehensible
when excised.
• The "production effort" we invest reflects the importance
of the words (longer + louder + unreduced = more care
and effort).
• Our perception strategies follow this unequal distribution
of effort. We concentrate on the prominent words.
(BUT: Careful! This is a strategy associated with certain
types of languages, the "stress-timed languages".
Not all languages do this. So-called "syllable-timed
languages" are less stress-oriented.)
So, what applications are there?
We don't normally think about pronunciation; we take it for
granted, so understanding the mechanisms of
speech is invaluable in
• foreign language teaching/learning
• pronunciation dictionaries
• speech pathology
• forensic phonetics
• speech technology
Let's look briefly at speech synthesis
Commercial systems use "concatenative" methods
(they stick recorded bits of speech together)
• They don't need very much phonetic knowledge (but they need to have
good selection strategies) and the same approach can be applied to
many different languages.
• The systems are good in limited domains, and using a neutral
speaking style. They are bad on wide-ranging topics and more
expressive speech.
Research systems, such as "articulatory synthesis",
require a lot of knowledge, and are used to find out more
about speech production.
• They are potentially very flexible (so the developer has to know how
to constrain the system just to produce what is natural)
• They are much more complex, and have to be programmed for each
new language on the basis of knowledge acquired about that language.
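The "selection strategies" needed by concatenative systems can be illustrated with a toy example. This is only a sketch and does not describe how any commercial system works: it assumes that each recorded snippet is described by its pitch at the start and at the end, and that a good join is one where the pitch changes little between neighbouring snippets. The names Unit, join_cost and select_units, and all pitch values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str        # which sound the recorded snippet contains
    start_f0: float  # pitch in Hz at the start of the snippet
    end_f0: float    # pitch in Hz at the end of the snippet


def join_cost(left, right):
    """Toy cost: the size of the pitch jump at the join."""
    return abs(left.end_f0 - right.start_f0)


def select_units(targets, inventory):
    """Pick one recorded candidate per target so the total join cost is minimal.

    Dynamic programming: for each candidate of the current target, keep only
    the cheapest path that ends in that candidate.
    """
    best = [(0.0, [u]) for u in inventory[targets[0]]]
    for name in targets[1:]:
        new_best = []
        for cand in inventory[name]:
            cost, path = min(
                ((c + join_cost(p[-1], cand), p) for c, p in best),
                key=lambda item: item[0],
            )
            new_best.append((cost, path + [cand]))
        best = new_best
    return min(best, key=lambda item: item[0])


# Two recorded candidates for each of two invented units
inventory = {
    "ha": [Unit("ha", 120.0, 110.0), Unit("ha", 120.0, 140.0)],
    "lo": [Unit("lo", 138.0, 100.0), Unit("lo", 100.0, 90.0)],
}

total_cost, chosen = select_units(["ha", "lo"], inventory)
print(total_cost, [(u.name, u.start_f0, u.end_f0) for u in chosen])
```

Here the second "ha" and the first "lo" are chosen, because their boundary pitches (140 Hz and 138 Hz) almost match. Real systems combine several such costs, but the principle of searching the inventory for the smoothest sequence is the same.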
Speech synthesis
A locally developed product (and research platform):
• "Mary": http://mary.dfki.de
Now a (staged) example of how unexpressive
synthesis in an inflexible dialogue system can go
wrong (German railway timetable inquiry system):
• Synthesis + Recognition = Dialogue?
(or chaos?)