Speech Intelligibility

The focus of this discussion will be on the
measurement of speech intelligibility for clinical
populations such as dysarthric speakers, deaf
and hearing-impaired speakers, kids or adults
with speech sound disorders, and speakers of
English as a second language.
For these kinds of speakers, a case can be
made that speech intelligibility is the single
most important measure. The central purpose of
speech is to convey information from the
speaker to the listener. This requires that the
words uttered by the speaker are recovered
accurately by the listeners.
Speech intelligibility is not the only thing
that matters – the naturalness of speech,
for example, is also quite important. But
a good case can be made that speech
intelligibility is of central importance.
Definitions of intelligibility [emphasis added]:
“The quality of language that is comprehensible.”
http://www.thefreedictionary.com/speech+intelligibility
“The term intelligibility refers to 'speech clarity' or
the proportion of a speaker's output that a listener
can readily understand.”
www.speech-languagetherapy.com/index.php?option=com_content&view=article&id=29:admin&catid=11:admin&Itemid=117
“Intelligibility is a measure of how comprehensible
speech is, or the degree to which speech can be
understood. Intelligibility is affected by spoken
clarity, explicitness, lucidity, comprehensibility,
perspicuity, and precision.”
https://en.wikipedia.org/wiki/Intelligibility_(communication)
“Degree to which the speaker’s intended message
is recovered by the listener.” (Kent et al., 1989, Journal of Speech and
Hearing Disorders, 54, 482-499).
Start by taking a close look at these definitions:
Point #1: Does an utterance need to be
understandable or comprehensible to be
intelligible?
Example 1: “Colorless green ideas sleep furiously.”
Do you understand what it means? If this utterance
were to be spoken clearly by a non-disordered,
native English speaker and correctly transcribed by
a neurologically intact native speaker in a quiet
listening environment, would it be reasonable to say
that the utterance was intelligible?
In my opinion, the answer is yes – yet it is not
comprehensible.
Example 2:
The velocity function is the 1st derivative of the
displacement function; the acceleration function
is the 2nd derivative of the displacement
function.
Assume that this utterance is spoken clearly in a
quiet listening environment, and that it is
accurately transcribed by a listener.
Would the utterance necessarily be understood?
Maybe, maybe not. Let’s assume not. Would it be
reasonable to say that the utterance is intelligible?
I say yes.
Point #1 continued
Example 3: Imagine that we asked listeners to
transcribe (or repeat) nonsense syllables; e.g.
ba, foo, blop, poot, …
These utterances are not comprehensible – this is
why they are called nonsense syllables. If listeners
are able to repeat these utterances accurately,
would it be reasonable to say that they are
intelligible? In my opinion, the answer is yes.
Moral: The term that should be used in these
definitions is RECOGNITION, not understandability
or comprehensibility – do listeners recognize the
speech sounds that are spoken?
Aside: Nonsense utterances are sometimes used
to test intelligibility. Why?
Nonsense utterances directly test the
intelligibility of speech with almost no influence
of language – no syntax, no semantics, no
lexicon.
(The only part of the language system that comes
into play is phonology – the test utterances
conform to English phonotactic* rules; e.g.,
utterances like svek [svɛk], ngah [ŋɑ], or bih [bɪ]
are not used because they violate English
phonotactic rules.)
*Aside: For those who are not familiar with the concept of
a phonotactic rule: Phonotactic rules are one type (out of
three) of phonological rule. Phonotactic rules specify
permissible and impermissible combinations of speech
sounds. They are language specific, and all languages
have them. Some examples of English phonotactic rules:
• English words cannot begin with /ʃt/; e.g., “stot” /stɑt/ is
not an English word, but it could be. On the other hand,
a word such as “shtot” /ʃtɑt/ is not permitted; i.e., it
violates an English phonotactic rule.
• English words can begin with /m/ or /n/, but they cannot
begin with /ŋ/.
• English words cannot end in lax vowels (e.g., /ɛ/, /ɪ/, /ʊ/).
For example, /di/ is a possible English word (/i/ is a
tense vowel), but not /dɪ/ (a lax vowel); /fu/ is a possible
English word (/u/ is a tense vowel), but not /fʊ/, etc.
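The three example rules above are simple enough to write down as a check on a broad IPA transcription. What follows is only a sketch: the function name and the string-based representation are assumptions made for illustration, and it encodes just these three examples, not English phonotactics in general.

```python
# Illustrative only: covers just the three example rules from the aside,
# not a full model of English phonotactics.
LAX_VOWELS = {"ɛ", "ɪ", "ʊ"}  # the lax vowels named in the aside


def violates_example_rules(ipa: str) -> list[str]:
    """Return the example rules (if any) broken by a broad IPA transcription."""
    problems = []
    if ipa.startswith("ʃt"):
        problems.append("begins with /ʃt/")
    if ipa.startswith("ŋ"):
        problems.append("begins with /ŋ/")
    if ipa and ipa[-1] in LAX_VOWELS:
        problems.append("ends in a lax vowel")
    return problems


# Forms taken from the examples in the text.
for form in ["stɑt", "ʃtɑt", "ŋɑ", "di", "dɪ", "fu", "fʊ"]:
    print(form, violates_example_rules(form) or "ok")
```

A form that passes this check is not thereby a possible English word; it just does not break any of the three rules listed here.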
Point #1, the bottom line:
I am arguing against the definitions of intelligibility
that include understanding or comprehension.
So, what definition should be used? My opinion is
that – for applications in this field – we need a
definition that focuses explicitly on the
transmission of SPEECH information (i.e., not
language and not meaning).
My proposal: An utterance is intelligible to the
degree that the speaker and the listener agree on
what was said. Intelligibility is maintained to the
degree that the listener recognizes the words
and/or speech sounds that were intended by the
talker.
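One possible way to put a number on "the degree that the speaker and the listener agree" at the level of speech sounds is sketched here. The choice of metric (1 minus a length-normalized edit distance) is an assumption made for illustration, not something the proposal itself specifies, and the example utterance is hypothetical.

```python
def edit_distance(intended, heard):
    """Levenshtein distance between two sequences of sound symbols."""
    prev = list(range(len(heard) + 1))
    for i, a in enumerate(intended, start=1):
        curr = [i]
        for j, b in enumerate(heard, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # intended sound dropped
                            curr[j - 1] + 1,      # extra sound reported
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def agreement(intended, heard):
    """1 minus edit distance per intended sound, floored at 0."""
    if not intended:
        raise ValueError("intended sequence is empty")
    return max(0.0, 1.0 - edit_distance(intended, heard) / len(intended))


# Hypothetical example: the talker intends "blop", the listener reports "bop".
print(agreement(list("blɑp"), list("bɑp")))  # one of four intended sounds lost
```

Because it compares symbol sequences only, the same function works for nonsense syllables: nothing in it depends on the utterance being a real word.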
For SLPs who work with dysarthric speakers, deaf
speakers, kids with speech-sound disorders, etc.,
it is not a crazy idea to assume that variations in
intelligibility arise mainly from the person who is
talking.
But, that does not mean that the listener plays no
role in explaining variations in intelligibility. More
soon.
Note: This concept that intelligibility requires
agreement between the talker and the listener is a
simple but important idea.
Quiz: If the talker intends to say one thing but the
listener hears something else, who made the
error; i.e., is it a speaking error or a listening
error?
a. the speaker
b. the listener
c. the transmission channel (room acoustics,
electronics, etc.)
d. It is not possible to know.
e. all of the above
f. some combination of the above
g. all of the below
Point #2: Does speech intelligibility characterize:
(a) the speaker, (b) the listener, or (c) the
transmission channel (room acoustics and any electronics
between the speaker & listener – more on this soon)?
Short answer: Yes.
This is a pretty big deal, so we’ll spend a little time
talking about it.
The 1st scientists to take a serious interest in
speech intelligibility were not phoneticians or
SLPs. They were communications engineers – the
folks at Bell Labs.
Their problem:
talker > telephone system > listener
Stated more generally:
talker > transmission channel > listener
Now, the reason for all this: If the communication
engineer finds that intelligibility isn’t ideal, he/she
will assume that the problem is:
a. the talker
b. the transmission channel
c. the listener
The phone system (as it looked in the early 1900s
– it’s way more complicated now):
talker > mic > amp > BP filter (~300-4000 Hz) >
conversion to FM radio signal > more amps >
many miles of cable > switching network > more
amps > more cables > conversion from FM signal
back to sound > another amp > earphone >
listener
This is the stuff the phone company is most
interested in – everything between the talker and
the listener, i.e., the transmission channel.
Short form of the telephone system:
talker > transmission channel > listener
Recall the question we started with:
Does speech intelligibility characterize: (a) the
talker, (b) the listener, or (c) the transmission
channel?
To the communications engineer, the answer is
(c) the transmission channel. Why?
(1) The talker and listener are unremarkable
(i.e., ordinary talker, ordinary listener), and (2)
the transmission channel is the only part of the
system they have any control over.
How does an SLP answer the same question?
To the SLP, does speech intelligibility
characterize: (a) the speaker, (b) the listener, or
(c) the listening environment/transmission
channel (explanation soon)?
For an answer, let’s look again at one of the
definitions of intelligibility we saw earlier:
Intelligibility is affected by spoken clarity,
explicitness, lucidity, comprehensibility,
perspicuity, and precision.
Does this description assume that intelligibility
characterizes the speaker, the listener, or the
transmission channel?
It clearly assumes the talker: The terms spoken clarity,
explicitness, lucidity, comprehensibility,
perspicuity, and precision all refer explicitly to the
talker, not the listener and not the communication
channel.
The assumption is made that: (1) the
communication channel is unremarkable (e.g., a
quiet room, live voice or very simple electronic
playback), and (2) the listener is unremarkable
(normal hearing adult). These are not crazy ideas.
Q: Does this mean that the listener doesn’t matter?
A: No.
One of the big ideas that needs to be understood
here is pretty simple: The listener’s familiarity with
the talker can make a big difference.
Listeners are remarkably good at adapting or
accommodating to speech that is distorted in a
variety of different ways.
There are all kinds of excellent literature on this
topic, but ordinary experience with daily life is
enough to make the point.
Example 1: The speech of very young kids.
a. Gay Bardino
b. “You need nukin.”
Example 2: Accented speech – the case of Jimmy Brinegar.
http://www.deeplake.com/content/sounds/koth/boomhauer/dogs.wav
http://www.deeplake.com/content/sounds/koth/boomhauer/seinfeld.wav
Example 3: Speech distorted by bad electronics.
Moral: The goal is to find a measure of speech
intelligibility that characterizes the talker, but the
listener does matter.
Why does the ability of listeners to accommodate to
atypical speech matter?
If intelligibility improves throughout the course of
treatment, who is getting better – the talker or the
listener?
It’s a very simple problem. Solution? Ask someone
other than the clinician to do the listening.
Practical??
Two very different general approaches are used
to assess speech intelligibility:
1. Subjective estimates made by clinicians (by far
the most common) – 50% intelligible, 80%
intelligible, ... Kent calls this method scaling (e.g.,
Kent et al., 1989, JSHD, 54, 482-499).
2. Objective measurement – usually the
percentage of words that are accurately
recognized by a listener. Kent calls this
method item identification (Kent et al., 1989).
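For word lists, the item-identification score in (2) comes down to a simple count. A minimal sketch follows, assuming the listener writes one response per target word in order. The function names, the normalization step, and the example words are all hypothetical, not taken from any published test.

```python
import string


def normalize(text):
    """Lowercase and strip punctuation so scoring ignores orthographic detail."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def percent_words_correct(target, transcript):
    """Percentage of target words the listener wrote down in the same position."""
    target_words = normalize(target)
    heard_words = normalize(transcript)
    if not target_words:
        raise ValueError("target has no words")
    # zip stops at the shorter list, so missing responses count as errors.
    hits = sum(t == h for t, h in zip(target_words, heard_words))
    return 100.0 * hits / len(target_words)


# Hypothetical word-list items and one listener's responses.
target = "ship bead cake tip"
response = "ship bead take dip"
print(percent_words_correct(target, response))  # 2 of 4 words match
```

Position-by-position matching fits word lists. For sentences, where a listener may drop or insert words, some form of alignment would be needed before counting.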
Subjective estimates are easy to make, but: (a)
reliability is imperfect, (b) estimates can change
as clinicians become more familiar with the
talker. (The same can be true of item
identification, depending on how it’s done).
What should be used as speech material?
Big surprise: (a) there are many choices, (b)
the choice of speech material matters.
1. Conversational speech: From one point of
view, this is a great choice – it’s exactly what
you want to know. How well would the talker be
expected to do in ordinary conversation?
Obvious problem: The topic of conversation
will vary all over the place, making it
impossible to get an intelligibility measure that
is standardized in any way – either across
different clients or even within the same talker
across time.
2. Words: Standard word lists can be used.
There are many of these word lists available.
Word lists are very easy to score and, of
course, they are standard across talkers.
Some word intelligibility tests provide multiple
word lists with equivalent intelligibility. This is a
big deal: Listeners may still be adapting to the
speech of the talker, but at least they will not be
as likely to learn the word lists. (This can still
present a problem if the word lists are used
frequently.)
3. Sentences: Standard lists of sentences can
also be used. There are many of these
available as well.
Sentences can be very useful since there are
some talkers who can speak intelligibly with
single words but may have greater difficulty
with more complicated utterances.
Effects of Predictability
Speech is massively redundant, which means
that listeners do not need to catch every little
acoustic-phonetic detail in order to recognize
what is being said. This applies to both words
and sentences, but especially to sentences.
All else being equal, as predictability goes up
intelligibility goes up. Bone-headed simple
example:
Mary had a little [wildly distorted something-or-other].
There’s no mystery about the missing word.
Striking demo from Warren (1970, Science, 167, 392–393).
“The state governors met with their respective
legislatures convening in the capital city.”
Warren entirely deleted one of the speech sounds
(the [s] of legislatures) and replaced it with a
cough. The [s] was gone. Out of 20 listeners, 19
did not notice that anything was missing; one
listener thought that a sound was missing but
guessed wrong about which one.
Q: How did listeners hear a sound that wasn’t
there?
A: Their brains created it.
The effect is called phonemic restoration.
What is the relevance of this to intelligibility
testing?
Pretty simple: One speaker is 75% intelligible,
another is 50% intelligible, but different sentence
intelligibility tests were used.
Does that comparison mean anything? It’s hard
to know for sure, but probably not.
Example:
HINT sentences:
TIMIT sentences: