Data-driven approach to rapid prototyping Xhosa speech synthesis
Albert Visagie
Justus Roux
Centre for Language and Speech Technology
Stellenbosch University
South Africa
Introduction
• Japan-South African Intergovernmental Science and Technology Cooperation Programme.
• Goals:
– Understand what is needed from a linguistic and technology standpoint.
– Build a text-analysis front-end.
– Build an experimental platform.
Outline
• Xhosa:
– orthography,
– phonetics,
– tone
• Approach:
– Text analysis,
– HTS.
Xhosa
• Xhosa is spoken in South Africa by about 8 million people.
• It is one of the official languages of South Africa.
• The writing system is relatively young and based on the English (Latin) alphabet.
• Many dialects.
• Clicks borrowed from Khoisan languages.
Xhosa: Orthography
Agglutinative language.
Nouns:
– 15 classes (including plural & singular).
– Nouns take affixes for the diminutive.
Verbs:
– Verbs are affixed according to subject, tense, negation, etc.
Examples:
teach: -fund
preacher (teacher): umfundisi = u + m(u) + fund + is + i
small preacher: umfundisana = u + m(u) + fund + is + ana
He/she will teach them: uzakubafundisa = u + za + ku + ba + fund + is + a
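The agglutinative structure can be checked mechanically. This minimal Python sketch concatenates the segmentations listed above and confirms that each one yields the written word. It assumes that parenthesised material (the "(u)" in "m(u)") does not surface in the spelling, which is read off the notation rather than stated as a rule; `surface_form` is a hypothetical helper name.

```python
import re

# Segmentations copied from the examples above. Material in parentheses
# (the "u" of "m(u)") is ASSUMED not to appear in the written form.
EXAMPLES = {
    "umfundisi": ["u", "m(u)", "fund", "is", "i"],
    "umfundisana": ["u", "m(u)", "fund", "is", "ana"],
    "uzakubafundisa": ["u", "za", "ku", "ba", "fund", "is", "a"],
}


def surface_form(morphemes):
    """Concatenate morphemes, dropping parenthesised (elided) segments."""
    return "".join(re.sub(r"\(.*?\)", "", m) for m in morphemes)


for word, parts in EXAMPLES.items():
    # Print the analysis next to the word it produces.
    print(f"{' + '.join(parts):34} -> {surface_form(parts)}")
```

Reading the examples this way makes the point of the slide concrete: one root (`fund`) appears in every word, and the prefixes and suffixes carry class, tense, object and diminutive information.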
Xhosa: Phonetics
Consonants:
• Implosive /ɓ/ (written b).
• Ejective and aspirated versions of stops.
• 15 clicks.
Vowels:
• Five basic vowels, with long versions.
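The 15 clicks follow a regular pattern in standard Xhosa spelling: three places of articulation (written c, q and x) crossed with five series. The Python sketch below generates the inventory from that grid. The series labels are simplified (the g- and ng- series are often described as breathy or slack voiced rather than simply voiced), so treat them as approximate.

```python
# Places of articulation, keyed by the base letter used in Xhosa spelling.
PLACES = {"c": "dental", "q": "alveolar", "x": "lateral"}

# Series: (approximate label, prefix, suffix) added around the base letter.
SERIES = [
    ("plain", "", ""),
    ("aspirated", "", "h"),
    ("voiced", "g", ""),
    ("nasal", "n", ""),
    ("voiced nasal", "ng", ""),
]

# Grapheme -> (place, series), e.g. "ngx" -> ("lateral", "voiced nasal").
CLICKS = {
    prefix + base + suffix: (PLACES[base], label)
    for base in PLACES
    for label, prefix, suffix in SERIES
}

print(len(CLICKS))  # 15
for grapheme, (place, label) in sorted(CLICKS.items()):
    print(f"{grapheme:4} {place:9} {label}")
```

Generating the set rather than listing it by hand keeps the grapheme inventory consistent with the 3 x 5 structure, which is convenient when writing letter-to-sound rules.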
Xhosa: Tone
• According to the literature, Xhosa is a tone language.
• High, Low and Falling tones.
• A recent dictionary marks tone on root morphemes; rules can be constructed to predict how tones move under morphological composition.
• Recent work:
– Downing and Roux argue for an accent-based analysis.
– Kuun: a statistical experiment suggests a highly regular structure.
• The observed regularity of pitch rises and duration increases gives a simple method to use in a first prototype.
Approach
Focus on the language-dependent components:
– build the text analyser,
– use an existing synthesiser.
Choice: HTS 2.0
– Model-driven, trainable synthesiser.
– Contains language-independent F0 and duration models.
– Makes good use of the synthesis database by predicting spectrum, F0 and segment duration separately.
HTS
HTS: Symbolic Features
Each segment of audio (HMM state) is labelled according to its linguistic context.
Examples:
• Phonetic context: labels of the preceding and following phones.
• Parts-of-speech.
• Stress or canonical tone.
• Counting (e.g. position of the syllable within the word).
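To make the idea of context labels concrete, here is a minimal Python sketch that turns a list of phones with word-level features into one label string per phone. The quinphone part (`LL^L-C+R=RR`, with `x` for "no phone") follows the notation used in the HTS demo labels; the `/P:`, `/T:` and `/S:` fields are a hypothetical, simplified layout, not the actual HTS field format, and the feature values in the usage example are placeholders rather than real tone transcriptions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    phone: str
    pos: str      # part-of-speech tag of the containing word
    tone: str     # canonical tone / accent mark (placeholder values)
    syl_fwd: int  # syllable position in the word, counted from the start
    syl_bwd: int  # syllable position in the word, counted from the end


def full_context_labels(segments):
    """Build one simplified, HTS-style context label per phone."""
    # Pad with "x" so the first and last phones have a full window.
    phones = ["x", "x"] + [s.phone for s in segments] + ["x", "x"]
    labels = []
    for i, seg in enumerate(segments):
        # Quinphone window: two phones either side of the current one.
        prev2, prev1, cur, next1, next2 = phones[i:i + 5]
        labels.append(
            f"{prev2}^{prev1}-{cur}+{next1}={next2}"
            f"/P:{seg.pos}/T:{seg.tone}/S:{seg.syl_fwd}_{seg.syl_bwd}"
        )
    return labels


# Usage with placeholder features for the word "lali" (la-li).
word = [
    Segment("l", "N", "L", 1, 2),
    Segment("a", "N", "L", 1, 2),
    Segment("l", "N", "H", 2, 1),
    Segment("i", "N", "H", 2, 1),
]
for label in full_context_labels(word):
    print(label)
```

The design point is that the synthesiser never sees raw text: everything the text analyser knows (phones, POS, tone, counts) is flattened into these strings, and HTS's decision-tree clustering decides which context features matter.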
Text Analyser Components
Components:
– Orthographic to phonetic
– Morphological analysis
– Parts-of-speech
– Canonical tone marks
Orthographic to Phonetic
• The orthography is very young and highly consistent with the pronunciation.
• Hand-written letter-to-sound rewrite rules.
• Lexicon for loan words.
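The slides describe hand-written letter-to-sound rewrite rules plus a loan-word lexicon. The Python sketch below is a stand-in for that idea, not the authors' rules: it splits a word into grapheme units by greedy longest match (so clicks such as "nq" or "ngc" become single units) and lets a lexicon override the rules. The multigraph list is an illustrative subset, the output units are simply graphemes rather than a real phone set, and the "taxi" lexicon entry is hypothetical.

```python
# Click spellings: base letter c, q or x with the five series markers.
CLICKS = {
    prefix + base + suffix
    for base in "cqx"
    for prefix, suffix in [("", ""), ("", "h"), ("g", ""), ("n", ""), ("ng", "")]
}
# Illustrative subset of other multi-letter spellings (NOT a complete rule set).
MULTIGRAPHS = CLICKS | {"ph", "th", "kh", "bh", "hl", "dl", "sh", "ny", "ty", "dy", "ts", "tsh"}
MAX_LEN = max(len(g) for g in MULTIGRAPHS)


def letter_to_sound(word, lexicon=None):
    """Split a word into grapheme units; lexicon entries take priority."""
    word = word.lower()
    if lexicon and word in lexicon:
        return list(lexicon[word])
    units, i = [], 0
    while i < len(word):
        # Try the longest multigraph first, down to two letters.
        for size in range(MAX_LEN, 1, -1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in MULTIGRAPHS:
                units.append(chunk)
                i += size
                break
        else:
            # No multigraph matched: emit a single letter.
            units.append(word[i])
            i += 1
    return units


print(letter_to_sound("ukunqwenela"))
# Hypothetical loan-word entry: without it, the "x" would be read as a click.
print(letter_to_sound("taxi", {"taxi": ["t", "e", "k", "s", "i"]}))
```

The "taxi" case shows why the lexicon is needed at all: the spelling rules are reliable for native words, but a loan word written with English letters can collide with Xhosa conventions such as x for a lateral click.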
Morphology
• Bootstrapped from a Zulu version specifically for this project.
• Requires a lexicon of root morphemes.
• Works on isolated words.
• Ambiguous: a word can receive several analyses.
• Ideal: root morpheme boundaries, affix types, and a POS tagger for disambiguation.
• Implemented: none.
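The ambiguity problem shows up even with a tiny lexicon. This Python sketch enumerates every way of splitting an isolated word into known morphs. The morph set is a hypothetical toy built from the earlier examples, with "isa" and "na" added only so that ambiguity appears; it is not the Zulu-derived analyser, which also assigns affix types.

```python
from functools import lru_cache

# Hypothetical toy morph lexicon; "isa" and "na" are included only to
# show how several analyses arise for one word.
MORPHS = {"u", "m", "za", "ku", "ba", "fund", "is", "isa", "a", "i", "ana", "na"}


def segmentations(word, morphs=MORPHS):
    """Return every way of splitting `word` into morphs from `morphs`."""

    @lru_cache(maxsize=None)
    def split_from(start):
        if start == len(word):
            return ((),)  # exactly one way to split an empty remainder
        results = []
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece in morphs:
                for rest in split_from(end):
                    results.append((piece,) + rest)
        return tuple(results)

    return list(split_from(0))


for analysis in segmentations("umfundisana"):
    print(" + ".join(analysis))
# u + m + fund + is + a + na
# u + m + fund + is + ana
# u + m + fund + isa + na
```

Because the analyser only sees the isolated word, nothing here can choose between the three outputs; that is the role the slide assigns to a POS tagger with sentence context.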
Parts-of-Speech
• Can be obtained from morphological analysis.
• Ideal: POS tagger.
• Implemented: exhaustive lists of closed sets (pronouns, conjunctions, prepositions, etc.).
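The implemented approach amounts to a dictionary lookup. In this Python sketch, each closed-class list is inverted into a word-to-tag map, and any word not found is tagged with a catch-all open class. The few entries shown are illustrative only (a real system would use the exhaustive lists), and the `"open"` default tag is an assumption.

```python
# Illustrative entries only; the real lists are exhaustive per closed class.
CLOSED_CLASSES = {
    "pronoun": {"mna", "wena", "thina"},
    "conjunction": {"kodwa", "kwaye"},
}

# Invert to word -> tag for constant-time lookup.
WORD_TO_POS = {w: pos for pos, words in CLOSED_CLASSES.items() for w in words}


def pos_tag(words, default="open"):
    """Tag closed-class words by lookup; everything else gets `default`."""
    return [(w, WORD_TO_POS.get(w.lower(), default)) for w in words]


print(pos_tag(["Kodwa", "amadoda", "wena"]))
```

Closed classes are small and stable, so lookup is cheap and reliable; the open classes (nouns, verbs) are where the morphological analysis or a tagger would be needed.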
Tone
• A printed dictionary with canonical tone markings for root morphemes is available.
• Rules can be constructed to determine the movement of at least High tones under morphological composition.
• Highly regular structure: the 3rd-from-last syllable starts a high pitch excursion, and the 2nd-from-last syllable is lengthened.
• Ideal: exhaustive specification of set tones.
• Implemented: word-level syllable counts, i.e. each syllable's position counted from both ends of the word (3-1, 2-2, 1-3).
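The prototype heuristic can be expressed with the two counts alone. This Python sketch takes a word that is already split into syllables (syllabification is assumed to happen elsewhere), computes each syllable's position from the start and from the end, and flags the 3rd-from-last syllable for the pitch rise and the 2nd-from-last for lengthening. The dictionary keys are hypothetical names chosen for illustration.

```python
def syllable_features(syllables):
    """Mark each syllable with its position counted from both word edges."""
    n = len(syllables)
    features = []
    for i, syl in enumerate(syllables):
        forward, backward = i + 1, n - i
        features.append({
            "syllable": syl,
            "forward": forward,
            "backward": backward,
            # Prototype heuristic: 3rd-from-last starts the high pitch excursion,
            "pitch_rise": backward == 3,
            # and the 2nd-from-last (penultimate) syllable is lengthened.
            "lengthened": backward == 2,
        })
    return features


for f in syllable_features(["a", "ma", "do", "da"]):  # amadoda, pre-syllabified
    print(f)
```

In an HTS setup the flags need not be hard-coded: passing the forward and backward counts as context features lets the trained F0 and duration models learn the regularity from the data, which is the spirit of the data-driven approach.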
Tests
• Basic intelligibility test: listeners were asked to transcribe what they heard.
– Incomplete phrases.
– Two versions of the question set, and natural utterances (recorded).
– Mother-tongue and second-language speakers.
• Impressions:
– “He’s from the townships.”
– “That’s perfect, there’s nothing wrong with that.”
– Also frowns and repeats.
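The slides do not say how the transcriptions were scored. One common measure for this kind of test is word error rate (WER): the word-level edit distance between what was played and what the listener wrote, divided by the number of words played. The Python sketch below implements that measure as an assumption, not as the scoring actually used.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = edits needed to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[-1][-1] / len(ref)


print(word_error_rate("a b c d e", "a b x d e"))  # 0.2
```

Because Xhosa words are long and heavily affixed, a single wrong affix counts as a whole wrong word under WER; a character- or morpheme-level rate may be a fairer complement.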
Next Steps
• Comprehension test?
• Impressions.
• Baseline comparative/preference test.
• Improvements:
– Question phrases.
– Information from morphological analysis.
– Canonical tone markings.
• Zulu
Conclusion
• The system worked very well, considering how little linguistic knowledge is currently incorporated.
• A data-driven approach with HTS is well suited to bootstrapping a new language.
• We now have an experimental platform.
Demos
“Ubangele amadoda amaninzi kule lali,”
– Natural:
– Synthesised:
“waqalisa ukunqwenela ukuba nomzi.”
– Natural:
– Synthesised:
Click song: