Building a Catalan diphone voice Ariadna Font Llitjos May 10, 2001

Defining the phoneset
• Most Catalan phones (34) plus 2 Spanish
phones (th and jj)
– Reason: all Catalan speakers also have the Spanish
phones, and many words borrowed from Spanish
are in most Catalan speakers' lexicons
• Left out phones that need a much finer
classification than the one made for
English phones (beta, gamma, etc.)
Generating Diphone Schema
• Mostly same as Spanish, but with the new set of
phones.
• Catalan has 8 vowels (not counting stress),
whereas Spanish has only 5, so I had to add a level
of vheight (high, mid-high, mid-low, low)
(draw graph on the board)
• Mapping Catalan phones to a predefined set of
phones
• Overgenerative: the schema includes phone combinations
that are not legal in Catalan, so the voice is also suited to
pronouncing foreign or nonsense words built from Catalan phones
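The overgeneration point can be illustrated with a minimal sketch, assuming the schema is approximated as every ordered pair of phones plus silence. This is not the festvox schema generator itself; the function name `diphone_pairs` and the toy phone list are hypothetical.

```python
from itertools import product

def diphone_pairs(phones, silence="pau"):
    """Return every ordered phone pair, including transitions to and from silence."""
    inventory = [silence] + [p for p in phones if p != silence]
    return [f"{left}-{right}" for left, right in product(inventory, repeat=2)]

# Toy subset of the phone set, for illustration only.
toy_phones = ["a", "o", "l", "s", "k"]
pairs = diphone_pairs(toy_phones)

# Every pair is generated, whether or not the language allows the combination.
print(len(pairs))
print(pairs[:4])
```

Because nothing filters the pairs by phonotactics, the list grows with the square of the inventory size, which is why such a schema covers combinations that never occur in native words.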
Mapping Catalan phones to a predefined set of phones
• Options: Spanish and English
• My choice: English
• Reasons:
– English has more vowel phones, so it is a more
appropriate target than Spanish
– Spanish phones have already been mapped to
English phones, so it is better to map the Catalan phones
directly to English rather than indirectly through Spanish
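A direct mapping of this kind can be sketched as a simple lookup table. The entries below are illustrative guesses only, not the table used for this voice; `CATALAN_TO_ENGLISH` and `map_phones` are hypothetical names.

```python
# Hypothetical entries for illustration; the real table is not shown in the slides.
CATALAN_TO_ENGLISH = {
    "a": "aa", "i": "iy", "o": "ow",
    "l": "l", "s": "s", "k": "k", "th": "th",
}

def map_phones(phones, table):
    """Map each phone through the table and collect the ones with no entry."""
    mapped, missing = [], []
    for phone in phones:
        if phone in table:
            mapped.append(table[phone])
        else:
            missing.append(phone)
    return mapped, missing

mapped, missing = map_phones(["o", "l", "a", "jj"], CATALAN_TO_ENGLISH)
print(mapped)   # English phones for the entries that exist
print(missing)  # phones that still need a mapping
```

Reporting the unmapped phones separately makes it easy to see which parts of the phone set still need a decision.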
Generating and recording the prompts
• 1109 prompts (recorded on festvox0)
• Lots of room noise (typing, door, talking,
etc.)
• Microphone not always in same position
• Power, and even intonation and duration,
varied throughout the whole recording
process
Labeling nonsense words
• Automatically:
– make_labs
– make_diph_index
• Manually:
– Find a set of diphones that are wrong and look them up
in dic/afldiph.est
– Edit and correct the corresponding file with emulabel
– Rerun make_diph_index (etc.)
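The lookup step can be sketched as follows, assuming the diphone index is a text file with a header ending in `EST_Header_End`, followed by lines of the form `diphone file start mid end`. That layout and the file ids in the sample are assumptions for illustration; check them against the actual dic/afldiph.est.

```python
def parse_diphone_index(text):
    """Map diphone name -> (file id, start, mid, end), skipping the header."""
    entries = {}
    in_body = False
    for line in text.splitlines():
        line = line.strip()
        if not in_body:
            # Everything up to and including this marker is header.
            in_body = line == "EST_Header_End"
            continue
        fields = line.split()
        if len(fields) != 5:
            continue  # ignore blank or malformed lines
        name, file_id, start, mid, end = fields
        entries[name] = (file_id, float(start), float(mid), float(end))
    return entries

# Hypothetical sample in the assumed layout.
sample = """EST_File index
DataType ascii
NumEntries 2
IndexName afl_diphone
EST_Header_End
pau-o afl_0001 0.100 0.185 0.270
o-l afl_0001 0.270 0.340 0.410
"""
index = parse_diphone_index(sample)
print(index["o-l"][0])  # id of the recording whose label file needs correcting
```

Given a diphone that sounds wrong, the file id points to the label file to open in emulabel.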
Extracting pitchmarks and LPC coefficients
• Automatically:
– make_pm_wav (edit to modify pitch range of
speaker)
– find_powerfactors (tells us what general power
difference exists between files; calculates a
table of power modifiers for each file)
– make_lpc
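The idea of a per-file power modifier can be sketched as below, assuming each file is scaled toward the mean RMS level of the whole set. This is not necessarily how find_powerfactors computes its table; the function names and sample data are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square level of one file's samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def power_factors(files):
    """files: dict of name -> list of samples.

    Returns name -> multiplier that brings each file to the mean RMS level.
    Assumes no file is pure silence (which would give a zero level).
    """
    levels = {name: rms(samples) for name, samples in files.items()}
    target = sum(levels.values()) / len(levels)
    return {name: target / level for name, level in levels.items()}

# Toy data: one quiet and one loud recording.
factors = power_factors({
    "quiet": [1, -1, 1, -1],
    "loud": [3, -3, 3, -3],
})
print(factors)
```

A factor above 1 means the file was recorded more quietly than average and should be boosted; a factor below 1 means it should be attenuated.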
Testing phone synthesis
• (SayPhones '(pau o l a pau s o k l a r i a d n a pau))
– Catalan voice
– Spanish voice
– English voice
(modifying the phones)
Catalan voice is still quite bad
• Bad example
• But it does have a basic Spanish phone…
and without it, it would sound like this
And here is how kal_diphone sounds
Added tokenization
• To be able to say numbers in Catalan
(followed the Spanish tokenizer)
Show file
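The number expansion a tokenizer performs can be sketched as follows for 0 to 99, assuming Central Catalan forms. This is not the tokenizer file shown in the talk; `number_to_words` and `token_to_words` are hypothetical names.

```python
UNITS = ["zero", "u", "dos", "tres", "quatre", "cinc", "sis", "set", "vuit", "nou"]
TEENS = ["deu", "onze", "dotze", "tretze", "catorze", "quinze",
         "setze", "disset", "divuit", "dinou"]
TENS = {2: "vint", 3: "trenta", 4: "quaranta", 5: "cinquanta",
        6: "seixanta", 7: "setanta", 8: "vuitanta", 9: "noranta"}

def number_to_words(n):
    """Return the Catalan words for an integer from 0 to 99 as a list."""
    if not 0 <= n <= 99:
        raise ValueError("only 0-99 are handled in this sketch")
    if n < 10:
        return [UNITS[n]]
    if n < 20:
        return [TEENS[n - 10]]
    tens, unit = divmod(n, 10)
    words = [TENS[tens]]
    if unit:
        if tens == 2:
            words.append("i")  # 21-29 take the linking "i": vint-i-u
        words.append(UNITS[unit])
    return words

def token_to_words(token):
    """Expand digit tokens in range; pass any other token through unchanged."""
    if token.isdigit() and int(token) <= 99:
        return number_to_words(int(token))
    return [token]

print(token_to_words("21"))
print(token_to_words("35"))
```

Returning a list of words rather than a hyphenated string keeps each word available for lexicon lookup.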
Added some lexical entries
• Letters of the alphabet, symbols,
punctuation, some content words…
Phrasing, duration and intonation
• Not there yet
• Nor can I get it to SayText
Summary: building a diphone voice
• Define phoneset
• Generate diphone schema
• Generate prompts
• Record prompts
• Label prompts
• Extract pitchmarks and LPC coefficients
• Test phone synthesis
• Hand correct labels
• Add tokenizer
• Add lexicon
• Add prosody, durations and intonation
• Test and evaluate voice
• Package for distribution