Exploring the use of synthetic speech by blind and partially

RNIB Centre for Accessible Information (CAI)

Literature review #2

Exploring the use of synthetic speech by blind and partially sighted people

Published by: RNIB Centre for Accessible Information (CAI), 58-72 John Bright Street, Birmingham, B1 1BN, UK
Commissioned by: Pat Beech, Manager, National Library Service, RNIB
Authors: Heather Cryer* and Sarah Home
*For correspondence Tel: 0121 665 4211 Email: heather.cryer@rnib.org.uk
Date: 24 November 2008
Document reference: CAI-LR2 [11-2008]
Sensitivity: Internal and full public access
Copyright: © RNIB 2008

Citation guidance: Cryer, H. and Home, S. (2008). Exploring the use of synthetic speech by blind and partially sighted people. RNIB Centre for Accessible Information, Birmingham: Literature review #2.

Acknowledgements: Thanks to Pat Beech and Sarah Morley Wilkins for guidance with this paper.

Prepared by: Heather Cryer (Research Officer, CAI)
FINAL report © RNIB 24 November 2008

Table of Contents
Executive summary
1. Introduction
2. Benefits and uses of synthetic speech for blind and partially sighted people
3. Subjective acceptance of synthetic speech - blind and partially sighted people
4. Subjective acceptance of synthetic speech - general population
5. Objective measures - how does synthetic speech affect reading performance?
Summary
References

Executive summary

- This review considers the use of synthetic speech technology by blind and partially sighted people
- There are many areas in which synthetic speech technology can be used by this population, including mobility aids, educational tools, entertainment and communication
- Various benefits of synthetic speech are discussed, including speed of production, confidentiality and the potential for synthetic speech to deliver information which was not otherwise available
- Research with blind and partially sighted people suggests that subjective acceptance of synthetic speech may depend on the users' experience, as people were found to "get used to" synthetic voices. Furthermore, some users preferred less expressive synthetic voices as they felt it helped them to focus on the content of the text, and others reported that whilst they preferred natural speech, synthetic speech was acceptable if it meant they could access the information sooner
- Research with sighted people suggests that subjective acceptance of synthetic voices also varies depending on how natural the voice sounds and the context in which it is being used. For example, some people would prefer to have a computer read private information to them than have another person access that information
- Synthetic speech has been found to affect objective measures of reading performance. For example, synthetic speech may be less intelligible than natural speech, particularly in background noise, and may need to be presented more slowly to be fully understood. However, measures of reading performance with synthetic speech improved with experience, again suggesting users may get used to using synthetic speech

1. Introduction

There are various ways in which blind and partially sighted people access information.
For example, some use large or modified print materials, some use braille and some use audio information. Traditionally, audio information has consisted of having information read out by another person - either live, or on a recording - but developments in technology have led to an alternative. Synthetic speech - which is artificial human speech - can be produced in a number of ways and used for a wide range of applications. These range from talking devices which use simple pre-recordings of a limited range of human speech to "text-to-speech" (TTS) synthesisers, which use algorithms to convert any text input into a speech waveform and therefore have an unlimited vocabulary (Koul, 2003; Freitas and Kouroupetroglou, 2008).

2. Benefits and uses of synthetic speech for blind and partially sighted people

There are various benefits for blind and partially sighted people in using synthetic speech for audio information. For example, in a study by Llisterri, Fernàndez, Gudayol, Poyatos and Martí (1993), users of screen readers featuring synthetic speech outputs reported benefits including quicker access to information (compared to waiting for it to be brailled) and increased confidentiality (not having to ask someone else to read it). Garcia (2004) further heralds synthetic speech as a means to access information without dependence on another person.

Synthetic speech is used by blind and partially sighted people in a range of applications, from leisure activities to devices which support independent living. A review by Freitas and Kouroupetroglou (2008) highlights the importance of audio information as an alternative to print for blind and partially sighted people, and reveals a wide range of applications of text-to-speech (TTS) synthesis.
These include mobility aids (such as GPS navigation devices), educational tools (talking dictionaries, quick access to textbooks through TTS), entertainment (speaking TV subtitles/electronic programme guides) and screen reading software on computers (aiding professional work, internet access and communication through email).

Other research offers more in-depth insight into how synthetic speech can benefit blind and partially sighted users in some of these applications. For example, as reported by Llisterri et al (1993), professional users of screen readers using synthesised speech felt this improved their ability to work competitively with sighted peers. They also reported that using screen readers helped to raise awareness of what blind and partially sighted professionals can achieve, improving their integration in the workplace. Furthermore, Hensil and Whittaker (2000) note that as humans are limited in how quickly they can articulate words, synthesised speech can be delivered much more quickly than natural speech, which may benefit some users. These findings highlight some of the applications and benefits of synthetic speech for the blind and partially sighted population.

There are also benefits of synthetic speech for organisations providing services for blind and partially sighted people. For example, for library services producing audio books, use of synthetic voices could greatly reduce both the time and cost involved in producing an audio book, which could enable a larger amount of information to be made available.

3. Subjective acceptance of synthetic speech - blind and partially sighted people

Before adopting synthetic speech for use in products or publications, research is required to determine whether users would find synthetic speech acceptable. There are a few previous studies looking into blind and partially sighted people's acceptance of synthetic speech.
Firstly, a study by Hjelmquist, Jansson and Torell (1990) investigated the use of synthetic speech as a means for blind and partially sighted people to read daily newspapers. Participants' reading habits with the system were studied over a four month period, and interviews were conducted throughout the project. It was found that whilst initially many users had doubts about the system, and found the synthesised speech difficult to understand, all reported "getting used to it" after a few hours of listening. Indeed, some users preferred the synthetic voice to a human reader, because they felt they could better focus on the content of the text, without being distracted by expressiveness in the voice. Anecdotal evidence also suggests that some users are distracted by particular dialects in human readers, or feel that the emphasis used by human readers may interfere with the listener's imagination or interpretation of the text.

Secondly, in a survey of how blind and partially sighted people would like to receive financial information, Thompson, Reeves and Masters (1999) found that whilst most respondents would prefer a natural voice to synthesised speech, just over half of those interviewed would find synthetic speech an acceptable medium for reading financial information. This suggests that whilst synthetic speech might not be a person's preferred format, there may be instances in which they would accept it, for example if it meant they could access the information sooner.

Thirdly, Loomis, Marston, Golledge and Klatzky (2005) found that blind participants preferred a GPS system with a synthetic voice to one with a tone indicating distance to a particular waypoint. The voice reported distance in feet to the waypoint, whereas the tone issued an "on course" bleeping sound.
Whilst clearly the preference may have been for the more in-depth information (distance) given by the voice compared to that from the tone, this finding highlights that there are instances in which a synthetic voice can provide additional information which may not otherwise be available.

Whilst there is not a great deal of research in this area which has been carried out with blind and partially sighted people specifically, there is a wealth of research into the use of synthetic speech with the general population, which may be useful in guiding the current study in terms of highlighting some of the issues involved.

4. Subjective acceptance of synthetic speech - general population

Research suggests that listeners prefer natural sounding speech, both in comparing natural speech to synthetic speech, and in comparing different synthetic voices. Stevens, Lees, Vonwiller and Burnham (2005) found that in comparisons of three synthetic voices and natural speech, natural speech was rated as both most natural and most preferred. Furthermore, the synthetic voice rated least natural was also rated least preferred. According to Mayo, Clark and King (2005), key factors on which listeners base their judgements of the naturalness of synthetic voices are the appropriateness of the stress and intonation, and the appropriateness of the number of units used in creating the utterance (which relates to how the synthetic speech is constructed).

As technology improves over time, it is likely that the quality of synthetic voices will improve and become more natural sounding. Indeed, according to Black (2007) the quality of synthetic voices is now approaching that of recorded natural speech.

There is some suggestion in the research literature that users' acceptance of synthetic speech may depend on the context in which it is being used.
For example, Francis and Nusbaum (1999) found that in some situations where information was considered private, users may prefer synthetic speech to having a human access that information in order to record it for the user. In this case, users chose an unnatural sounding synthetic voice (which in other contexts they rated as unpleasant) to be the most preferable for a banking-by-telephone system. Users rationalised their choice by saying that they would rather a computer knew their bank balance than have someone else read it to them.

Users' subjective reactions to synthetic speech are important, as these are likely to affect whether they purchase products featuring synthetic voices, and may also affect their enjoyment of publications produced in this way. However, it must also be noted that for some uses - particularly in education or professional use - consideration should also be given to more objective measures, such as how synthetic speech may affect intelligibility, comprehension and reading speed of the spoken word.

5. Objective measures - how does synthetic speech affect reading performance?

There is a great deal of research into the effects of synthetic speech on listeners' performance on reading tasks. A review by Koul (2003) gives a useful overview of some of the key themes, including differences in intelligibility, the effects of background noise, and effects of the speed at which synthetic speech is listened to.

Firstly, research shows that in comparing responses to natural versus synthetic speech, listeners are more accurate at identifying natural speech, and faster to respond to it. These findings have been observed for both single words and full sentences. It is thought that the time lag required for comprehension of synthetic speech indicates that it requires a higher cognitive load than decoding natural speech.
Secondly, the literature suggests that the intelligibility of synthetic speech is more affected by background noise than that of natural speech, meaning that in the presence of noise, performance with synthetic speech is significantly reduced (Koul and Allen, 1993). It is thought that this may be because human speakers naturally adjust their speech to account for surrounding noise (Langer and Black, 2005).

Thirdly, the rate at which synthetic speech is presented is likely to affect comprehension. Koul (2003) reports a number of studies in which slower presentation (e.g. greater spaces between words) resulted in better comprehension of texts. In a study by Hjelmquist, Dahlstrand and Hedelin (1992) blind and partially sighted listeners were allowed to choose the rate at which synthetic speech was presented. Those who were not familiar with synthetic speech were more likely to choose slower rates of presentation, which was thought to help compensate for their lack of experience.

Indeed, experience of listening to synthetic speech is thought to affect reading performance, in that practice improves understanding of synthetic speech. Venkatagiri (1994) found that practice effects occur very quickly, with participants' recognition of synthetic sentences improving significantly within the first five sentences that they heard. Furthermore, these practice effects were carried over onto different days. Participants listened to 20 sentences on three consecutive days. Results showed a significant improvement from day 1 to day 2, with the poorest performers making the biggest improvements. Although performance continued to improve on the third day, this was not statistically significant, probably due to a ceiling effect. Rhyne (1982) reported similar findings with blind children, who listened to four stories per day in synthetic speech over a ten day period.
A linear relationship was found between the amount of practice (number of days) and comprehension scores for questions on the stories. These findings suggest that whilst there may be aspects of synthetic speech which may make it more difficult to understand than natural speech, performance may improve with continued use.

Summary

This review considered the issues around use of synthetic speech for both blind and partially sighted people and the general population. In summary, the research literature highlights a number of issues to consider in undertaking research into user acceptance of synthetic speech. Findings suggest that users' subjective evaluations of synthetic speech are likely to vary based on their experience, the context of use, and the characteristics of the voice. Furthermore, there are important aspects to consider relating to reading performance with synthetic speech publications. The findings of this literature review will contribute to future research activity around the use of synthetic speech by blind and partially sighted people.

References

Black, A.W. (2007). Speech synthesis for educational technology. SLaTE Workshop on Speech and Language Technology in Education. Farmington, PA.

Francis, A.L., and Nusbaum, H.C. (1999). Evaluating the quality of synthetic speech. In D. Gardner-Bonneau (Ed.). Human Factors and Voice Interactive Systems. Boston: Kluwer Academic Publications.

Freitas, D., and Kouroupetroglou, G. (2008). Speech technologies for blind and low vision persons. Technology and Disability, 20, 135-156.

Garcia, L.G. (2004). Assessment of text reading comprehension by Spanish-speaking blind persons. British Journal of Visual Impairment, 22 (1), 4-12.

Hensil, J., and Whittaker, S.G. (2000). Visual reading versus auditory reading by sighted persons and persons with low vision. Journal of Visual Impairment and Blindness, 94 (12), 762-770.
Hjelmquist, E., Dahlstrand, U., and Hedelin, L. (1992). Visually impaired persons' comprehension of text presented with speech synthesis. Journal of Visual Impairment and Blindness, 86 (10), 426-428.

Hjelmquist, E., Jansson, B., and Torell, G. (1990). Computer-oriented technology for blind readers. Journal of Visual Impairment and Blindness, 17, 210-215.

Koul, R. (2003). Synthetic speech perception in individuals with and without disabilities. Augmentative and Alternative Communication, 19 (1), 49-58.

Koul, R.K., and Allen, G.D. (1993). Segmental intelligibility and speech interference thresholds of high-quality synthetic speech in presence of noise. Journal of Speech and Hearing Research, 36, 790-798.

Langer, B., and Black, A. (2005). Improving the understanding of speech synthesis by modelling speech in noise. Proceedings of the 2005 International Conference on Acoustics, Speech and Signal Processing held at Pennsylvania Convention Centre, Pennsylvania.

Llisterri, J., Fernàndez, N., Gudayol, F., Poyatos, J., and Martí, J. (1993). Testing users' acceptance of Ciber232, a text to speech system used by blind people. In Granström, B., Hunnicutt, S., and Spense, K.E. (Eds.) Speech and Language Technology for Disabled Persons. Proceedings of an ESCA Workshop, Stockholm, Sweden. pp 203-206.

Loomis, J.M., Marston, J.R., Golledge, R.G., and Klatzky, R.L. (2005). Personal guidance system for people with visual impairment: a comparison of spatial displays for route guidance. Journal of Visual Impairment and Blindness, 99 (4), 219-232.

Mayo, C., Clark, R.A.J., and King, S. (2005). Multidimensional scaling of listener responses to synthetic speech. INTERSPEECH 2005, Lisbon, Portugal. September 4-8 2005.

Rhyne, J.M. (1982). Comprehension of synthetic speech by blind children. Journal of Visual Impairment and Blindness, 10 (10), 313-316.

Stevens, C., Lees, N., Vonwiller, J., and Burnham, D. (2005). Online experimental methods to evaluate text-to-speech (TTS) synthesis: effects of voice gender and signal quality on intelligibility, naturalness and preference. Computer Speech and Language, 19, 129-146.

Thompson, L., Reeves, C., and Masters, K. (1999). In the balance: making financial information accessible. British Journal of Visual Impairment, 17 (2), 65-70.

Venkatagiri, H.S. (1994). Effect of sentence length and exposure on the intelligibility of synthesized speech. Augmentative and Alternative Communication, 10, 96-104.
