Modern speech synthesis: communication aid personalisation
Sarah Creer
Stuart Cunningham
Phil Green
Clinical Applications of Speech Technology
University of Sheffield
Introduction
• Building voices for VIVOCA (communication aids)
• Speech synthesis techniques
• Future research: personalisation of synthetic voices
Current speech synthesis: communication aids
• High-quality voices available
• E.g. Toby Churchill Lightwriter
  – DECtalk™ (Fonix) for American English
  – Acapela for British English
• Personalisation limited to age, gender and language
Personalisation
• Voice = identity
  – Gender
  – Age
  – Geographic background
  – Socio-economic background
  – Ethnic background
  – As that individual
• Maintains social relationships
• Maintains social closeness
• Sets group membership
VIVOCA
Voice Input Voice Output Communication Aid
Dysarthric speech input → Speech recognition → Recognised text → Text-to-speech synthesis → Intelligible and personalised synthesised speech output
• Retain elements of clients’ identity in the synthesised speech output
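The VIVOCA data flow above is two stages composed in sequence: recognition turns the client's speech into text, and a personalised TTS voice turns that text back into speech. The following is a minimal sketch of that composition only. The `Recogniser`/`Synthesiser` interfaces, the toy template-matching `VocabularyRecogniser` and the `StubSynthesiser` are hypothetical stand-ins, not part of VIVOCA; a real system would plug in a recogniser trained on the client's dysarthric speech and a synthetic voice built for them.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class Recogniser(Protocol):
    """Hypothetical interface: maps input acoustic features to recognised text."""
    def recognise(self, features: Sequence[float]) -> str: ...


class Synthesiser(Protocol):
    """Hypothetical interface: maps text to output audio samples."""
    def synthesise(self, text: str) -> list[float]: ...


@dataclass
class VocabularyRecogniser:
    """Toy stand-in for a small-vocabulary recogniser: returns the word whose
    (hypothetical) feature template is closest in squared distance to the input."""
    templates: dict[str, list[float]]

    def recognise(self, features: Sequence[float]) -> str:
        def distance(template: list[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(features, template))
        return min(self.templates, key=lambda word: distance(self.templates[word]))


@dataclass
class StubSynthesiser:
    """Toy stand-in for a personalised TTS voice: one fake 'sample' per character."""
    voice_name: str

    def synthesise(self, text: str) -> list[float]:
        return [float(ord(ch)) for ch in text]


def vivoca_pipeline(features: Sequence[float], recogniser: Recogniser,
                    synthesiser: Synthesiser) -> tuple[str, list[float]]:
    """Dysarthric speech input -> recognised text -> synthesised speech output."""
    text = recogniser.recognise(features)
    return text, synthesiser.synthesise(text)


recogniser = VocabularyRecogniser({"yes": [1.0, 0.0], "no": [0.0, 1.0]})
text, audio = vivoca_pipeline([0.9, 0.2], recogniser, StubSynthesiser("local-voice"))
print(text, len(audio))
```

Keeping the two stages behind separate interfaces reflects the slide's point that the output voice can be personalised independently of the recogniser.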
VIVOCA: personalisation
• Sheffield/Barnsley user group
• Retain local accent
  – Geographic identity
• Speaker database
  – Arctic database: 593 + 20 sentences
• Professional local speakers
  – Ian McMillan
  – Christa Ackroyd
Concatenative synthesis
Festvox: http://festvox.org/
Building the voice: speech recordings (input data) → unit segmentation (e.g. "i", "a", "sh") → unit database
Synthesis: text input → unit selection from the unit database → concatenation + smoothing → synthesised speech
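Unit selection and concatenation can be illustrated with a toy dynamic-programming (Viterbi) search in the general style of unit-selection synthesis. This is a sketch under simplifying assumptions, not the Festvox implementation: each hypothetical `Unit` is a phone label, a few samples and one pitch value; the target cost is the pitch mismatch against a requested value; the join cost is the jump between the last sample of one unit and the first of the next; and "smoothing" is a linear crossfade over a hypothetical `overlap` number of samples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical database unit: phone label, its samples and a mean pitch (Hz)."""
    phone: str
    samples: tuple[float, ...]
    pitch: float


def target_cost(unit: Unit, target_pitch: float) -> float:
    # How far this unit is from the prosody requested for this position.
    return abs(unit.pitch - target_pitch)


def join_cost(left: Unit, right: Unit) -> float:
    # Penalise an audible discontinuity at the boundary between two units.
    return abs(left.samples[-1] - right.samples[0])


def select_units(database, phones, target_pitches):
    """Viterbi search for the unit sequence minimising total target + join cost.
    Assumes every requested phone has at least one candidate in the database."""
    candidates = [[u for u in database if u.phone == p] for p in phones]
    # best[i][j] = (cheapest path cost ending at candidates[i][j], backpointer)
    best = [[(target_cost(u, target_pitches[0]), -1) for u in candidates[0]]]
    for i in range(1, len(phones)):
        row = []
        for u in candidates[i]:
            prev = [best[i - 1][k][0] + join_cost(v, u)
                    for k, v in enumerate(candidates[i - 1])]
            k_min = min(range(len(prev)), key=prev.__getitem__)
            row.append((prev[k_min] + target_cost(u, target_pitches[i]), k_min))
        best.append(row)
    # Trace back from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(phones) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]


def concatenate(units, overlap: int = 2):
    """Join units, linearly crossfading `overlap` samples at each boundary."""
    out = list(units[0].samples)
    for unit in units[1:]:
        nxt = list(unit.samples)
        n = min(overlap, len(out), len(nxt))
        for k in range(n):
            w = (k + 1) / (n + 1)  # weight of the incoming unit rises across the overlap
            out[-n + k] = (1 - w) * out[-n + k] + w * nxt[k]
        out.extend(nxt[n:])
    return out


db = [
    Unit("y", (0.0, 0.5, 1.0), 120.0), Unit("y", (0.0, 0.2, 0.1), 100.0),
    Unit("e", (1.0, 0.8, 0.6), 118.0), Unit("e", (0.0, 0.1, 0.0), 140.0),
    Unit("s", (0.6, 0.3, 0.0), 115.0),
]
chosen = select_units(db, ["y", "e", "s"], [120.0, 120.0, 120.0])
print([u.pitch for u in chosen], len(concatenate(chosen)))
```

Because the output is built from stretches of the original recordings, it keeps the speaker's character, but the result is only as good as the best-matching units available, which is one reason a large database (~600 sentences) is needed.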
Concatenative synthesis
• High quality
• Natural sounding
• Sounds like the original speaker
• Needs a lot of data (~600 sentences)
• Can be inconsistent
• Difficult to manipulate prosody
HMM synthesis
[Figure: models for "y", "e" and "s" joined to form "yes"]
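The idea of joining "y" + "e" + "s" into "yes" can be sketched by chaining phone models into a word-level state sequence and generating parameters from it. The assumptions here are all hypothetical and for illustration only. Each phone is a three-state left-to-right model. Each state holds one mean for a single acoustic parameter and a duration in frames. Generation simply repeats each state mean for its duration. Real HMM synthesis (e.g. HTS) uses multi-dimensional Gaussian streams, context-dependent clustered models and parameter generation with dynamic features, which smooths the stepwise trajectory produced here.

```python
from dataclasses import dataclass


@dataclass
class State:
    mean: float    # hypothetical 1-D acoustic parameter for this state
    duration: int  # number of frames spent in this state


# Hypothetical three-state phone models; the numbers are made up for illustration.
PHONE_MODELS = {
    "y": [State(1.0, 2), State(1.2, 3), State(1.1, 2)],
    "e": [State(2.0, 4), State(2.3, 5), State(2.1, 4)],
    "s": [State(0.5, 3), State(0.4, 4), State(0.6, 3)],
}


def word_model(phones):
    """Chain left-to-right phone models into one word-level state sequence."""
    states = []
    for p in phones:
        states.extend(PHONE_MODELS[p])
    return states


def generate(states):
    """Naive generation: emit each state's mean for its duration (a step trajectory)."""
    trajectory = []
    for s in states:
        trajectory.extend([s.mean] * s.duration)
    return trajectory


yes_model = word_model(["y", "e", "s"])
frames = generate(yes_model)
print(len(yes_model), len(frames))  # 9 30
```

Because every output frame comes from a model rather than a stored recording, the same text always yields the same parameters, which matches the "consistent" property listed for HMM synthesis.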
HMM synthesis procedure
HTS http://hts.sp.nitech.ac.jp/
Training: speech recordings (input data) → training → speaker model
Synthesis: text input → synthesis using the speaker model → synthesised speech
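The training step can be illustrated, in heavily simplified form, as estimating a Gaussian (mean and variance) for each model state from the speech frames assigned to it. Synthesis then reads the most likely value (the mean) back out of that speaker model. This sketch assumes the frame-to-state alignment is already known. In HTS the alignment is itself estimated iteratively (Baum-Welch / EM), and states are context-dependent and clustered. The state labels, frame values and the `variance_floor` parameter are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def train_state_gaussians(frames, alignment, variance_floor=1e-3):
    """Maximum-likelihood mean and variance per state, given a fixed alignment.

    frames    -- list of 1-D acoustic values, one per frame
    alignment -- list of state labels, one per frame (same length as frames)
    """
    if len(frames) != len(alignment):
        raise ValueError("each frame needs exactly one state label")
    by_state = defaultdict(list)
    for value, state in zip(frames, alignment):
        by_state[state].append(value)
    model = {}
    for state, values in by_state.items():
        mu = fmean(values)
        var = fmean([(v - mu) ** 2 for v in values])   # ML (biased) variance
        model[state] = (mu, max(var, variance_floor))  # floor avoids zero variance
    return model


def synthesise(model, state_sequence):
    """Generate the most likely value (the state mean) for each requested state."""
    return [model[s][0] for s in state_sequence]


frames = [1.0, 1.2, 2.0, 2.2, 2.1, 0.5]
alignment = ["y_1", "y_1", "e_1", "e_1", "e_1", "s_1"]
speaker_model = train_state_gaussians(frames, alignment)
print(synthesise(speaker_model, ["y_1", "e_1", "s_1"]))
```

The variance floor matters in practice: a state seen only once (like `s_1` here) would otherwise get zero variance.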
HMM synthesis
• Consistent
• Intelligible
• Needs relatively little input (~20 mins of speech)
• Can be adapted with a small amount of data (>5 sentences)
• Easier to manipulate
• Buzzy quality
• Less natural than concatenative synthesis
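One way to see why an HMM voice can be adapted from little data is maximum a posteriori (MAP) adaptation of a state mean. This is a standard technique for adapting Gaussian models. Transform-based methods such as MLLR are also common, and this sketch does not claim to be the method used in this work. The adapted mean interpolates between an average-voice prior mean and the target speaker's frames. A prior weight `tau` (a hypothetical setting here) is balanced against the number of adaptation frames N: with few frames the estimate stays close to the prior, and with more data it moves towards the speaker.

```python
def map_adapt_mean(prior_mean, speaker_frames, tau=10.0):
    """MAP estimate of a Gaussian state mean: (tau * prior + sum(x)) / (tau + N).

    prior_mean     -- mean from the average-voice (speaker-independent) model
    speaker_frames -- adaptation frames from the target speaker for this state
    tau            -- hypothetical prior weight, i.e. how many 'virtual frames'
                      the prior is worth
    """
    n = len(speaker_frames)
    return (tau * prior_mean + sum(speaker_frames)) / (tau + n)


average_voice_mean = 2.0
# With only a couple of frames the estimate stays near the prior...
print(map_adapt_mean(average_voice_mean, [3.0] * 2))
# ...and with more data it moves towards the speaker's own mean (3.0).
print(map_adapt_mean(average_voice_mean, [3.0] * 90))
```

This graceful fallback to the prior is what makes adaptation attractive for speakers who can only record a small amount of speech.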
Future research
• Further personalisation for individuals with progressive speech disorders
  – Capturing the essence of a voice
• Voice banking
  – Before deterioration
• Adaptation using HMM synthesis
  – Before or during deterioration
Thank you
This work is sponsored by EPSRC Doctoral Training grant
For further details of VIVOCA see: http://www.shef.ac.uk/cast/
Email: S.Creer@dcs.shef.ac.uk