Modern speech synthesis: communication aid personalisation

Sarah Creer, Stuart Cunningham, Phil Green
Clinical Applications of Speech Technology, University of Sheffield

Introduction
• Building voices for VIVOCA (communication aids)
• Speech synthesis techniques
• Future research: personalisation of synthetic voices

Current speech synthesis: communication aids
• High quality voices available
• E.g. Toby Churchill Lightwriter
  – DECtalk™ (Fonix) for American English
  – Acapela for British English
• Personalisation limited: age, gender, language

Personalisation
• Voice = identity
  – Gender
  – Age
  – Geographic background
  – Socio-economic background
  – Ethnic background
  – As that individual
• Maintains social relationships
• Maintains social closeness
• Sets group membership

VIVOCA
• Voice Input Voice Output Communication Aid
[Diagram: Dysarthric speech input → Speech Recognition → Recognised text → Text-to-Speech Synthesis → Intelligible (and personalised) synthesised speech output]
• Retain elements of clients' identity for synthesised speech output

VIVOCA: personalisation
• Sheffield/Barnsley user group
• Retain local accent
  – geographic identity
• Speaker database
  – Arctic database: 593 + 20 sentences
• Professional local speakers
  – Ian McMillan
  – Christa Ackroyd

Concatenative synthesis
Festvox: http://festvox.org/
[Diagram: Speech recordings (input data) → Unit segmentation (e.g. i, a, sh) → Unit database; Text input → Unit selection → Concatenation + smoothing → Synthesised speech]

Concatenative synthesis
• High quality
• Natural sounding
• Sounds like original speaker
• Need a lot of data (~600 sentences)
• Can be inconsistent
• Difficult to manipulate prosody

HMM synthesis
[Diagram: the word "yes" modelled as the unit sequence y, e, s]

HMM synthesis procedure
HTS: http://hts.sp.nitech.ac.jp/
[Diagram: Speech recordings (input data) → Training → Speaker model; Text input → Synthesis → Synthesised speech]

HMM synthesis
• Consistent
• Intelligible
• Needs relatively little input (~20 mins)
• Can be adapted with small amount of data (>5 sentences)
• Easier to manipulate
• Buzzy quality
• Less natural than concatenative

Future research
• Further personalisation for individuals with progressive speech disorders
• Voice banking
  – Capturing the essence of a voice
  – Before deterioration
• Adaptation using HMM synthesis
  – Before or during deterioration

Thank you
This work is sponsored by an EPSRC Doctoral Training grant.
For further details of VIVOCA see: http://www.shef.ac.uk/cast/
Email: S.Creer@dcs.shef.ac.uk
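
Illustrative note on the concatenative synthesis slides: the pipeline of unit database, unit selection and concatenation + smoothing can be sketched in a few lines of Python. This is a toy illustration only, not the Festvox implementation. The "waveforms" are short lists of numbers, and the names `join_cost`, `select_units` and `concatenate` are hypothetical. The join cost (amplitude mismatch at the boundary) and the linear cross-fade are assumed stand-ins for the spectral join costs and smoothing a real system would use.

```python
from typing import Dict, List

Unit = List[float]  # toy "waveform" for one phone-sized unit (assumption)


def join_cost(left: Unit, right: Unit) -> float:
    """Hypothetical join cost: amplitude mismatch at the unit boundary."""
    return abs(left[-1] - right[0])


def select_units(phones: List[str], database: Dict[str, List[Unit]]) -> List[Unit]:
    """Pick one candidate per phone, minimising total join cost (dynamic programming)."""
    # best[i][j] = (cheapest cost of a path ending in candidate j of phone i, back-pointer)
    best = [[(0.0, -1) for _ in database[phones[0]]]]
    for i in range(1, len(phones)):
        prev_cands = database[phones[i - 1]]
        row = []
        for cand in database[phones[i]]:
            costs = [best[i - 1][k][0] + join_cost(prev_cands[k], cand)
                     for k in range(len(prev_cands))]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append((costs[k_min], k_min))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    chosen = []
    for i in range(len(phones) - 1, -1, -1):
        chosen.append(database[phones[i]][j])
        j = best[i][j][1]
    chosen.reverse()
    return chosen


def concatenate(units: List[Unit], overlap: int = 2) -> Unit:
    """Join units, linearly cross-fading `overlap` samples at each boundary."""
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for t in range(n):
            w = (t + 1) / (n + 1)  # fade-in weight for the incoming unit
            out[start + t] = (1 - w) * out[start + t] + w * unit[t]
        out.extend(unit[n:])
    return out


# Toy unit database with two recorded candidates per phone (made-up numbers).
database = {
    "y": [[0.0, 0.2, 0.4], [0.0, 0.5, 1.0]],
    "e": [[0.4, 0.3, 0.2], [0.9, 0.1, 0.0]],
    "s": [[0.2, 0.1, 0.0], [0.8, 0.0, 0.0]],
}
chosen = select_units(["y", "e", "s"], database)
wave = concatenate(chosen, overlap=2)
```

The search prefers candidates whose boundaries already match, which is one reason concatenative voices sound like the original speaker but need a large database: with few candidates per unit, good joins are harder to find.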
