Speech Intelligibility

The focus of this discussion will be on the measurement of speech intelligibility for clinical populations such as dysarthric speakers, deaf and hearing-impaired speakers, kids or adults with speech sound disorders, and speakers of English as a second language. For these kinds of speakers, a case can be made that speech intelligibility is the single most important measure. The central purpose of speech is to convey information from the speaker to the listener. This requires that the words uttered by the speaker are recovered accurately by the listeners. Speech intelligibility is not the only thing that matters – the naturalness of speech, for example, is also quite important. But a good case can be made that speech intelligibility is of central importance.

Definitions of intelligibility [emphasis added]:

“The quality of language that is comprehensible.” http://www.thefreedictionary.com/speech+intelligibility

“The term intelligibility refers to 'speech clarity' or the proportion of a speaker's output that a listener can readily understand.” www.speech-languagetherapy.com/index.php?option=com_content&view=article&id=29:admin&catid=11:admin&Itemid=117

“Intelligibility is a measure of how comprehensible speech is, or the degree to which speech can be understood. Intelligibility is affected by spoken clarity, explicitness, lucidity, comprehensibility, perspicuity, and precision.” https://en.wikipedia.org/wiki/Intelligibility_(communication)

“Degree to which the speaker’s intended message is recovered by the listener.” (Kent et al., 1989, Journal of Speech and Hearing Disorders, 54, 482-499).

Start by taking a close look at these definitions:

Point #1: Does an utterance need to be understandable or comprehensible to be intelligible?

Example 1: “Colorless green ideas sleep furiously.” Do you understand what it means?
If this utterance were to be spoken clearly by a non-disordered, native English speaker and correctly transcribed by a neurologically intact native speaker in a quiet listening environment, would it be reasonable to say that the utterance was intelligible? In my opinion, the answer is yes – yet it is not comprehensible.

Example 2: “The velocity function is the 1st derivative of the displacement function; the acceleration function is the 2nd derivative of the displacement function.” Assume that this utterance is spoken clearly in a quiet listening environment and that it is accurately transcribed by a listener. Would the utterance necessarily be understood? Maybe, maybe not. Let’s assume not. Would it be reasonable to say that the utterance is intelligible? I say yes.

Point #1 continued

Example 3: Imagine that we asked listeners to transcribe (or repeat) nonsense syllables; e.g., ba, foo, blop, poot, … These utterances are not comprehensible – this is why they are called nonsense syllables. If listeners are able to repeat these utterances accurately, would it be reasonable to say that they are intelligible? In my opinion, the answer is yes.

Moral: The term that should be used in these definitions is RECOGNITION, not understandability or comprehensibility – do listeners recognize the speech sounds that are spoken?

Aside: Nonsense utterances are sometimes used to test intelligibility. Why? Nonsense utterances directly test the intelligibility of speech with almost no influence of language – no syntax, no semantics, no lexicon. (The only part of the language system that comes into play is phonology – the test utterances conform to English phonotactic* rules; e.g., utterances like svek [svɛk], ngah [ŋɑ], or bih [bɪ] are not used because they violate English phonotactic rules.)

*Aside: For those who are not familiar with the concept of a phonotactic rule: Phonotactic rules are one type (out of three) of phonological rule.
Phonotactic rules specify permissible and impermissible combinations of speech sounds. They are language specific, and all languages have them. Some examples of English phonotactic rules:

• English words cannot begin with /ʃt/; e.g., “stot” /stɑt/ is not an English word, but it could be. On the other hand, a word such as “shtot” /ʃtɑt/ is not permitted; i.e., it violates an English phonotactic rule.
• English words can begin with /m/ or /n/, but they cannot begin with /ŋ/.
• English words cannot end in lax vowels (e.g., /ɛ/, /ɪ/, /ʊ/). For example, /di/ is a possible English word (/i/ is a tense vowel), but not /dɪ/ (a lax vowel); /fu/ is a possible English word (/u/ is a tense vowel), but not /fʊ/, etc.

Point #1, the bottom line: I am arguing against the definitions of intelligibility that include understanding or comprehension. So, what definition should be used? My opinion is that – for applications in this field – we need a definition that focuses explicitly on the transmission of SPEECH information (i.e., not language and not meaning).

My proposal: An utterance is intelligible to the degree that the speaker and the listener agree on what was said. Intelligibility is maintained to the degree that the listener recognizes the words and/or speech sounds that were intended by the talker.

For SLPs who work with dysarthric speakers, deaf speakers, kids with speech-sound disorders, etc., it is not a crazy idea to assume that variations in intelligibility arise mainly from the person who is talking. But that does not mean that the listener plays no role in explaining variations in intelligibility. More soon.

Note: This concept that intelligibility requires agreement between the talker and the listener is a simple but important idea.

Quiz: If the talker intends to say one thing but the listener hears something else, who made the error; i.e., is it a speaking error or a listening error?
a. the speaker
b. the listener
c. 
the transmission channel (room acoustics, electronics, etc.)
d. It is not possible to know.
e. all of the above
f. some combination of the above
g. all of the below

Point #2: Does speech intelligibility characterize: (a) the speaker, (b) the listener, or (c) the transmission channel (room acoustics and any electronics between the speaker & listener – more on this soon)?

Short answer: Yes. This is a pretty big deal, so we’ll spend a little time talking about it.

The 1st scientists to take a serious interest in speech intelligibility were not phoneticians or SLPs. They were communications engineers – the folks at Bell Labs.

Their problem: talker > telephone system > listener

Stated more generally: talker > transmission channel > listener

Now, the reason for all this: If the communication engineer finds that intelligibility isn’t ideal, he/she will assume that the problem is:
a. the talker
b. the transmission channel
c. the listener

The phone system (as it looked in the early 1900s – it’s way more complicated now):

talker > mic > amp > BP filter (~300-4000 Hz) > conversion to FM radio signal > more amps > many miles of cable > switching network > more amps > more cables > conversion from FM signal back to sound > another amp > earphone > listener

This is the stuff the phone company is most interested in – everything between the talker and the listener, i.e., the transmission channel.

Short form of the telephone system: talker > transmission channel > listener

Recall the question we started with: Does speech intelligibility characterize: (a) the talker, (b) the listener, or (c) the transmission channel? To the communications engineer, the answer is (c) the transmission channel. Why?
(1) The talker and listener are unremarkable (i.e., ordinary talker, ordinary listener), and (2) the transmission channel is the only part of the system the engineers have any control over.

How does an SLP answer the same question? To the SLP, does speech intelligibility characterize: (a) the speaker, (b) the listener, or (c) the listening environment/transmission channel (explanation soon)?

For an answer, let’s look again at one of the definitions of intelligibility we saw earlier: “Intelligibility is affected by spoken clarity, explicitness, lucidity, comprehensibility, perspicuity, and precision.”

Does this description assume that intelligibility characterizes the speaker, the listener, or the transmission channel? It assumes the talker: the terms spoken clarity, explicitness, lucidity, comprehensibility, perspicuity, and precision all refer explicitly to the talker, not the listener and not the communication channel.

The assumption is made that: (1) the communication channel is unremarkable (e.g., a quiet room, live voice or very simple electronic playback), and (2) the listener is unremarkable (normal-hearing adult). These are not crazy ideas.

Q: Does this mean that the listener doesn’t matter?
A: No.

One of the big ideas that needs to be understood here is pretty simple: The listener’s familiarity with the talker can make a big difference. Listeners are remarkably good at adapting or accommodating to speech that is distorted in a variety of different ways. There are all kinds of excellent literature on this topic, but ordinary experience with daily life is enough to make the point.

Example 1: The speech of very young kids.
a. Gay Bardino
b. “You need nukin.”

Example 2: Accented speech – the case of Jimmy Brinegar.
http://www.deeplake.com/content/sounds/koth/boomhauer/dogs.wav
http://www.deeplake.com/content/sounds/koth/boomhauer/seinfeld.wav

Example 3: Speech distorted by bad electronics.
Moral: The goal is to find a measure of speech intelligibility that characterizes the talker, but the listener does matter.

Why does the ability of listeners to accommodate to atypical speech matter? If intelligibility improves throughout the course of treatment, who is getting better – the talker or the listener? It’s a very simple problem. Solution? Ask someone other than the clinician to do the listening. Practical??

Two very different general approaches are used to assess speech intelligibility:

1. Subjective estimates made by clinicians (by far the most common) – 50% intelligible, 80% intelligible, ... Kent calls this method scaling (e.g., Kent et al., 1989, JSHD, 54, 482-499).
2. Objective measurement – usually the percentage of words that are accurately recognized by a listener. Kent calls this method item identification (Kent et al., 1989).

Subjective estimates are easy to make, but: (a) reliability is imperfect, (b) estimates can change as clinicians become more familiar with the talker. (The same can be true of item identification, depending on how it’s done.)

What should be used as speech material? Big surprise: (a) there are many choices, (b) the choice of speech material matters.

1. Conversational speech: From one point of view, this is a great choice – it’s exactly what you want to know. How well would the talker be expected to do in ordinary conversation? Obvious problem: The topic of conversation will vary all over the place, making it impossible to get an intelligibility measure that is standardized in any way – either across different clients or even within the same talker across time.

2. Words: Standard word lists can be used. There are many of these word lists available. Word lists are very easy to score and, of course, they are standard across talkers. Some word intelligibility tests provide multiple word lists with equivalent intelligibility.
This is a big deal: Listeners may still be adapting to the speech of the talker, but at least they will not be as likely to learn the word lists. (This can still present a problem if the word lists are used frequently.)

3. Sentences: Standard lists of sentences can also be used. There are many of these available as well. Sentences can be very useful since there are some talkers who can speak intelligibly with single words but may have greater difficulty with more complicated utterances.

Effects of Predictability

Speech is massively redundant, which means that listeners do not need to catch every little acoustic-phonetic detail in order to recognize what is being said. This applies to both words and sentences, but especially to sentences. All else being equal, as predictability goes up, intelligibility goes up.

Bone-headed simple example: Mary had a little [wildly distorted something-or-other]. There’s no mystery about the missing word.

Striking demo from Warren (1970, Science, 167, 392–393): “The state governors met with their respective legislatures convening in the capital city.” Warren entirely deleted one of the speech sounds (the [s] of legislatures) and replaced it with a cough. The [s] was gone. Out of 20 listeners, 19 did not notice that anything was missing; one listener thought that a sound was missing but guessed wrong about which one.

Q: How did listeners hear a sound that wasn’t there?
A: Their brains created it. The effect is called phonemic restoration.

What is the relevance of this to intelligibility testing? Pretty simple: One speaker is 75% intelligible, another is 50% intelligible, but different sentence intelligibility tests were used. Does that comparison mean anything? It’s hard to know for sure, but probably not, since the two sets of sentences may differ in predictability.

Example:
HINT sentences:
TIMIT sentences:
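
Aside: The item identification method described earlier (the percentage of words accurately recognized by a listener) is simple enough to sketch in a few lines of Python. This is only an illustration, not the scoring rule of any published test. The function names, the order-insensitive matching rule, and the listener transcription below are all assumptions made up for the example; real word and sentence tests come with their own scoring conventions.

```python
import re
from collections import Counter


def normalize(utterance):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", utterance.lower())


def percent_words_correct(intended, transcribed):
    """Percentage of the intended words that also appear in the transcription.

    Assumption (hypothetical scoring rule): words are matched without regard
    to order, and a repeated word counts only as often as it was transcribed.
    """
    target = Counter(normalize(intended))
    response = Counter(normalize(transcribed))
    total = sum(target.values())
    if total == 0:
        raise ValueError("intended utterance contains no words")
    # Counter intersection keeps the smaller count for each shared word.
    matched = sum((target & response).values())
    return 100.0 * matched / total


# What the talker intended (the opening of Warren's sentence) versus a
# made-up listener transcription.
intended = "The state governors met with their respective legislatures"
transcribed = "The state governors met with the respected legislators"

score = percent_words_correct(intended, transcribed)
print(f"{score:.1f}% of the intended words were recognized")
```

Notice that the score is defined by agreement between what the talker intended and what the listener wrote down. Nothing in the calculation says whether a mismatch was a speaking error or a listening error.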