RNIB Centre for Accessible Information (CAI)
Literature review #2
Exploring the use of synthetic
speech by blind and partially
sighted people
Published by:
RNIB Centre for Accessible Information (CAI), 58-72 John Bright
Street, Birmingham, B1 1BN, UK
Commissioned by:
Pat Beech, Manager, National Library Service, RNIB
Authors:
Heather Cryer* and Sarah Home
*For correspondence
Tel: 0121 665 4211
Email: heather.cryer@rnib.org.uk
Date: 24 November 2008
Document reference: CAI-LR2 [11-2008]
Sensitivity: Internal and full public access
Copyright: RNIB 2008
© RNIB 2008
Citation guidance:
Cryer, H. and Home, S. (2008). Exploring the use of synthetic
speech by blind and partially sighted people. RNIB Centre for
Accessible Information, Birmingham: Literature review #2.
Acknowledgements:
Thanks to Pat Beech and Sarah Morley Wilkins for guidance with
this paper.
CAI-LR2 [11-2008]
2
© RNIB 2008
Exploring the use of synthetic speech
by blind and partially sighted people
RNIB Centre for Accessible Information (CAI)
Prepared by:
Heather Cryer (Research Officer, CAI)
FINAL report
© RNIB 24 November 2008
Table of Contents
Executive summary
1. Introduction
2. Benefits and uses of synthetic speech for blind and partially sighted people
3. Subjective acceptance of synthetic speech - blind and partially sighted people
4. Subjective acceptance of synthetic speech - general population
5. Objective measures - how does synthetic speech affect reading performance?
Summary
References
Executive summary
• This review considers the use of synthetic speech technology
by blind and partially sighted people.
• There are many areas in which synthetic speech technology
can be used by this population, including mobility aids,
educational tools, entertainment and communication.
• Various benefits of synthetic speech are discussed, including
speed of production, confidentiality and the potential for
synthetic speech to deliver information which was not otherwise
available.
• Research with blind and partially sighted people suggests that
subjective acceptance of synthetic speech may depend on the
users' experience, as people were found to "get used to"
synthetic voices. Furthermore, some users preferred less
expressive synthetic voices as they felt this helped them to focus
on the content of the text, and others reported that whilst they
preferred natural speech, synthetic speech was acceptable if it
meant they could access the information sooner.
• Research with sighted people suggests that subjective
acceptance of synthetic voices also varies depending on how
natural the voice sounds and the context in which it is being
used. For example, some people would prefer to have a
computer read private information to them than have another
person access that information.
• Synthetic speech has been found to affect objective measures
of reading performance. For example, synthetic speech may be
less intelligible than natural speech, particularly in background
noise, and may need to be presented more slowly to be fully
understood. However, measures of reading performance with
synthetic speech improved with experience, again suggesting
users may get used to using synthetic speech.
1. Introduction
There are various ways in which blind and partially sighted people
access information. For example, some use large or modified print
materials, some use braille and some use audio information.
Traditionally, audio information has consisted of having information
read out by another person, either live or on a recording, but
developments in technology have led to an alternative. Synthetic
speech (artificially generated human speech) can be created in a
number of ways and used for a wide range of applications. These
range from talking devices which use simple pre-recordings of a
limited range of human speech to "text-to-speech" (TTS)
synthesisers which use algorithms to convert any text input into a
speech waveform and therefore have an unlimited vocabulary (Koul,
2003; Freitas and Kouroupetroglou, 2008).
2. Benefits and uses of synthetic speech for blind and partially sighted people
There are various benefits for blind and partially sighted people in
using synthetic speech for audio information. For example, in a
study by Llisterri, Fernàndez, Gudayol, Poyatos and Martí
(1993), users of screen readers featuring synthetic speech outputs
reported benefits including quicker access to information
(compared to waiting for it to be brailled) and increased
confidentiality (not having to ask someone else to read it). Garcia
(2004) further presents synthetic speech as a means of accessing
information without depending on another person.
Synthetic speech is used by blind and partially sighted people in a
wide range of applications, from leisure activities to devices
which support independent living. A review by Freitas and
Kouroupetroglou (2008) highlights the importance of audio
information as an alternative to print for blind and partially sighted
people, and reveals a wide range of applications of text-to-speech
(TTS) synthesis. These include mobility aids (such as GPS
navigation devices), educational tools (talking dictionaries, quick
access to textbooks through TTS), entertainment (speaking TV
subtitles/electronic programme guides) and screen reading
software on computers (aiding professional work, internet access
and communication through email). Other research offers more in-depth insight into how synthetic speech can benefit blind and
partially sighted users in some of these applications. For example,
as reported by Llisterri et al. (1993), professional users of screen
readers with synthesised speech felt this improved their ability to
work competitively with sighted peers. They also reported
that using screen readers helped to raise awareness of what blind
and partially sighted professionals can achieve, improving their
integration in the workplace. In addition, Hensil and Whittaker
(2000) note that because humans are limited in how quickly they can
articulate words, synthesised speech can be delivered much faster
than natural speech, which may benefit some users.
These findings highlight some of the applications and benefits of
synthetic speech for the blind and partially sighted population.
There are also benefits of synthetic speech for organisations
providing services for blind and partially sighted people. For
example, for library services producing audio books, use of
synthetic voices could greatly reduce both the time and cost
involved in producing an audio book, which could enable a larger
amount of information to be made available.
3. Subjective acceptance of synthetic speech - blind and partially sighted people
Before adopting synthetic speech for use in products or
publications, research is required to determine whether users
would find synthetic speech acceptable. A few previous studies
have looked into blind and partially sighted people's acceptance
of synthetic speech. Firstly, a study by Hjelmquist, Jansson and
Torell (1990) investigated the use of synthetic speech as a
means for blind and partially sighted people to read daily
newspapers. Participants' reading habits with the system were
studied over a four-month period, and interviews were conducted
throughout the project. It was found that whilst initially many users
had doubts about the system, and found the synthesised speech
difficult to understand, all reported "getting used to it" after a few
hours of listening. Indeed, some users preferred the synthetic
voice to a human reader, because they felt they could better focus
on the content of the text, without being distracted by
expressiveness in the voice. Similarly, anecdotal evidence suggests
that some users are distracted by particular dialects in human
readers, or feel that the emphasis used by human readers may
interfere with the listener's imagination or interpretation of the text.
Secondly, in a survey of how blind and partially sighted people
would like to receive financial information, Thompson, Reeves
and Masters (1999) found that whilst most respondents would
prefer a natural voice to synthesised speech, just over half of those
interviewed would find synthetic speech an acceptable medium for
reading financial information. This suggests that whilst synthetic
speech might not be a person's preferred format, there may be
instances in which they would accept it, for example if it
meant they could access the information sooner.
Thirdly, Loomis, Marston, Golledge and Klatzky (2005) found that
blind participants preferred a GPS system with a synthetic voice to
one using a tone to guide them to a particular waypoint. The
voice reported the distance in feet to the waypoint, whereas the tone
issued an "on course" bleeping sound. Although the preference
may simply reflect the more detailed information (distance) given
by the voice compared with the tone, this finding highlights that
there are instances in which a synthetic voice can provide
additional information which may not otherwise be available.
Whilst little research in this area has been carried out with blind
and partially sighted people specifically, there is a wealth of
research into the use of synthetic speech with the general
population, which may help guide the current study by
highlighting some of the issues involved.
4. Subjective acceptance of synthetic speech - general population
Research suggests that listeners prefer natural-sounding speech,
both when comparing natural speech with synthetic speech and when
comparing different synthetic voices. Stevens, Lees, Vonwiller
and Burnham (2005) found that in comparisons of three synthetic
voices and natural speech, natural speech was rated as both most
natural and most preferred. Furthermore, the synthetic voice rated
least natural was also rated least preferred.
According to Mayo, Clark and King (2005), key factors on which
listeners base their judgements of the naturalness of synthetic
voices are the appropriateness of the stress and intonation, and
the appropriateness of the number of units used in creating the
utterance (which relates to how the synthetic speech is
constructed). As technology advances, synthetic voices are likely
to improve in quality and become more natural sounding. Indeed,
according to Black (2007), the quality of synthetic voices is now
approaching that of recorded natural speech.
There is some suggestion in the research literature that users'
acceptance of synthetic speech may depend on the context in
which it is being used. For example, Francis and Nusbaum
(1999) found that where information was considered private,
users might prefer synthetic speech to having another person
access that information in order to record it for them. In this
case, users chose an unnatural-sounding synthetic voice
(which in other contexts they rated as unpleasant) as the most
preferable for a banking-by-telephone system. Users explained
their choice by saying that they would rather a computer knew their
bank balance than have someone else read it to them.
Users' subjective reactions to synthetic speech are important, as
they are likely to affect whether people purchase products featuring
synthetic voices, and may also affect their enjoyment of
publications produced in this way. However, for some uses,
particularly in education or professional settings, consideration
should also be given to more objective measures,
such as how synthetic speech may affect intelligibility,
comprehension and reading speed of the spoken word.
5. Objective measures - how does synthetic speech affect reading performance?
There is a great deal of research into the effects of synthetic
speech on listeners' performance in reading tasks. A review by
Koul (2003) gives a useful overview of some of the key themes,
including differences in intelligibility, the effects of background
noise, and the effects of the rate at which synthetic speech is
presented.
Firstly, research shows that in comparing responses to natural
versus synthetic speech, listeners are more accurate at identifying
natural speech, and faster to respond to it. These findings have
been observed for both single words and full sentences. It is
thought that the time lag required for comprehension of synthetic
speech indicates that it requires a higher cognitive load than
decoding natural speech.
Secondly, the literature suggests that the intelligibility of synthetic
speech is more affected by background noise than that of natural
speech, meaning that in the presence of noise, performance with
synthetic speech is significantly reduced (Koul and Allen, 1993). It
is thought that this may be because human speakers naturally
adjust their speech to account for surrounding noise (Langer and
Black, 2005).
Thirdly, the rate at which synthetic speech is presented is likely to
affect comprehension. Koul (2003) reports a number of studies in
which slower presentation (e.g. greater spaces between words)
resulted in better comprehension of texts. In a study by
Hjelmquist, Dahlstrand and Hedelin (1992) blind and partially
sighted listeners were allowed to choose the rate at which
synthetic speech was presented. Those who were not familiar with
synthetic speech were more likely to choose slower rates of
presentation, which was thought to help compensate for their lack
of experience.
Indeed, experience of listening to synthetic speech is thought to
affect reading performance, in that practice improves
understanding of synthetic speech. Venkatagiri (1994) found that
practice effects occur very quickly, with participants' recognition of
synthetic sentences improving significantly within the first 5
sentences that they heard. Furthermore, these practice effects
were carried over onto different days. Participants listened to 20
sentences on three consecutive days. Results showed a
significant improvement from day 1 to day 2, with the poorest
performers making the biggest improvements. Although
performance continued to improve on the third day, this was not
statistically significant, probably due to a ceiling effect.
Rhyne (1982) reported similar findings with blind children, who
listened to four stories per day in synthetic speech over a ten day
period. A linear relationship was found between the amount of
practice (number of days) and comprehension scores for questions
on the stories. These findings suggest that whilst there may be
aspects of synthetic speech which may make it more difficult to
understand than natural speech, performance may improve with
continued use.
Summary
This review considered the issues around use of synthetic speech
for both blind and partially sighted people and the general
population. In summary, the research literature highlights a
number of issues to consider in undertaking research into user
acceptance of synthetic speech. Findings suggest that users'
subjective evaluations of synthetic speech are likely to vary based
on their experience, the context of use, and the characteristics of
the voice. Furthermore, there are important aspects to consider
relating to reading performance with synthetic speech publications,
such as intelligibility, comprehension and reading speed.
The findings of this literature review will contribute to future
research activity around the use of synthetic speech by blind and
partially sighted people.
References
Black, A.W. (2007). Speech synthesis for educational technology.
SLaTE Workshop on Speech and Language Technology in
Education. Farmington, PA.
Francis, A.L., and Nusbaum, H.C. (1999). Evaluating the quality of
synthetic speech. In D. Gardner-Bonneau (Ed.), Human Factors
and Voice Interactive Systems. Boston: Kluwer Academic
Publishers.
Freitas, D., and Kouroupetroglou, G. (2008). Speech technologies
for blind and low vision persons. Technology and Disability, 20,
135 - 156.
Garcia, L.G. (2004). Assessment of text reading comprehension by
Spanish-speaking blind persons. British Journal of Visual
Impairment, 22 (1), 4 - 12.
Hensil, J., and Whittaker, S.G. (2000). Visual reading versus
auditory reading by sighted persons and persons with low vision.
Journal of Visual Impairment and Blindness, 94 (12), 762 - 770.
Hjelmquist, E., Dahlstrand, U., and Hedelin, L. (1992). Visually
impaired persons' comprehension of text presented with speech
synthesis. Journal of Visual Impairment and Blindness, 86 (10),
426 - 428.
Hjelmquist, E., Jansson, B., and Torell, G. (1990). Computer-oriented technology for blind readers. Journal of Visual Impairment
and Blindness, 17, 210 - 215.
Koul, R. (2003). Synthetic speech perception in individuals with
and without disabilities. Augmentative and Alternative
Communication, 19 (1), 49 - 58.
Koul, R.K., and Allen, G.D. (1993). Segmental intelligibility and
speech interference thresholds of high-quality synthetic speech in
presence of noise. Journal of Speech and Hearing Research, 36,
790 - 798.
Langer, B., and Black, A. (2005). Improving the understanding of
speech synthesis by modelling speech in noise. Proceedings of the
2005 International Conference on Acoustics, Speech and Signal
Processing held at Pennsylvania Convention Centre,
Pennsylvania.
Llisterri, J., Fernàndez, N., Gudayol, F., Poyatos, J., and Martí, J.
(1993). Testing users' acceptance of Ciber232, a text to speech
system used by blind people. In Granström, B., Hunnicutt, S., and
Spens, K.E. (Eds.) Speech and Language Technology for
Disabled Persons. Proceedings of an ESCA Workshop, Stockholm,
Sweden. pp 203 - 206.
Loomis, J.M., Marston, J.R., Golledge, R.G., and Klatzky, R.L.
(2005). Personal guidance system for people with visual
impairment: a comparison of spatial displays for route guidance.
Journal of Visual Impairment and Blindness, 99 (4), 219 - 232.
Mayo, C., Clark, R.A.J., and King, S. (2005). Multidimensional
scaling of listener responses to synthetic speech. INTERSPEECH
2005, Lisbon, Portugal. September 4-8 2005.
Rhyne, J.M. (1982). Comprehension of synthetic speech by blind
children. Journal of Visual Impairment and Blindness, 10 (10), 313 - 316.
Stevens, C., Lees, N., Vonwiller, J., and Burnham, D. (2005). Online experimental methods to evaluate text-to-speech (TTS)
synthesis: effects of voice gender and signal quality on
intelligibility, naturalness and preference. Computer Speech and
Language, 19, 129 - 146.
Thompson, L., Reeves, C., and Masters, K. (1999). In the balance:
making financial information accessible. British Journal of Visual
Impairment, 17 (2), 65 - 70.
Venkatagiri, H.S. (1994). Effect of sentence length and exposure
on the intelligibility of synthesized speech. Augmentative and
Alternative Communication, 10, 96 - 104.