Modern speech synthesis: communication aid personalisation

Sarah Creer, Stuart Cunningham, Phil Green
Clinical Applications of Speech Technology
University of Sheffield

Introduction
• Building voices for VIVOCA (communication aids)
• Speech synthesis techniques
• Future research: personalisation of synthetic voices

Current speech synthesis: communication aids
• High quality voices available
• E.g. Toby Churchill Lightwriter
  – DECtalk™ (Fonix) for American English
  – Acapela for British English
• Personalisation limited: age, gender, language

Personalisation
• Voice = identity
  – Gender
  – Age
  – Geographic background
  – Socio-economic background
  – Ethnic background
  – As that individual
• Maintains social relationships
• Maintains social closeness
• Sets group membership

VIVOCA
Voice Input Voice Output Communication Aid
Dysarthric speech input → Speech Recognition → Recognised text → Text-to-Speech Synthesis → Intelligible and personalised synthesised speech output
• Retain elements of clients’ identity for synthesised speech output

VIVOCA: personalisation
• Sheffield/Barnsley user group
• Retain local accent: geographic identity
• Speaker database
  – Arctic database: 593 + 20 sentences
• Professional local speakers
  – Ian McMillan
  – Christa Ackroyd

Concatenative synthesis
Festvox: http://festvox.org/
Building the voice: Speech recordings (input data) → Unit segmentation (e.g. "i", "a", "sh") → Unit database
Synthesis: Text input → Unit selection → Concatenation + smoothing → Synthesised speech

Concatenative synthesis
• High quality
• Natural sounding
• Sounds like original speaker
• Need a lot of data (~600 sentences)
• Can be inconsistent
• Difficult to manipulate prosody

HMM synthesis
y, e, s → "yes"

HMM synthesis procedure
HTS: http://hts.sp.nitech.ac.jp/
Building the voice: Speech recordings (input data) → Training → Speaker model
Synthesis: Text input → Synthesis → Synthesised speech

HMM synthesis
• Consistent
• Intelligible
• Needs relatively little input (~20 mins)
• Can be adapted with small amount of data (>5 sentences)
• Easier to manipulate
• Buzzy quality
• Less natural than concatenative

Future research
• Further personalisation for individuals with progressive speech disorders
• Voice banking: capturing the essence of a voice before deterioration
• Adaptation using HMM synthesis: before or during deterioration

Thank you
This work is sponsored by an EPSRC Doctoral Training grant.
For further details of VIVOCA see: http://www.shef.ac.uk/cast/
Email: S.Creer@dcs.shef.ac.uk
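
Supplementary sketch: concatenative synthesis
The concatenative pipeline in the slides (unit database, unit selection, concatenation + smoothing) can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not how Festvox implements it: the names `select_units` and `concatenate`, the greedy join cost (distance between adjacent sample values), and the linear crossfade are all hypothetical choices made for illustration. Real unit-selection systems typically search over both target and join costs computed on acoustic features.

```python
from typing import Dict, List

Waveform = List[float]  # toy stand-in for a recorded speech unit


def select_units(labels: List[str], database: Dict[str, List[Waveform]]) -> List[Waveform]:
    """Greedy unit selection (illustrative assumption, not Festvox's search).

    For each unit label, pick the candidate whose first sample is closest
    to the last sample of the previously chosen unit.
    """
    chosen: List[Waveform] = []
    for label in labels:
        candidates = database[label]
        if not chosen:
            best = candidates[0]  # no left context yet: take the first candidate
        else:
            prev_end = chosen[-1][-1]
            best = min(candidates, key=lambda w: abs(w[0] - prev_end))
        chosen.append(best)
    return chosen


def concatenate(units: List[Waveform], overlap: int = 1) -> Waveform:
    """Join units, smoothing each boundary with a linear crossfade."""
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])
    return out


if __name__ == "__main__":
    # Hypothetical unit database: label -> candidate recordings of that unit.
    db = {
        "y": [[0.0, 0.2, 0.4]],
        "e": [[0.9, 0.5, 0.1], [0.4, 0.6, 0.2]],
        "s": [[0.2, 0.0, 0.0]],
    }
    print(concatenate(select_units(["y", "e", "s"], db)))
```

The greedy choice keeps the sketch short; it also hints at why concatenative output "can be inconsistent": the quality of each join depends on which candidates happen to be in the database.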