Hearing-impaired perceivers' encoding and retrieval speeds for auditory, visual, and audio-visual spoken words

Philip F. Seitz and Ken W. Grant
Army Audiology and Speech Center, Walter Reed Army Medical Center

[Poster paper presented at the 133rd Meeting of the Acoustical Society of America, State College, PA, June 19, 1997. (Seitz, P.F. (1997). "Hearing-impaired perceivers' encoding and retrieval speeds for auditory, visual, and audiovisual spoken words," J. Acoust. Soc. Am. 101, 3155.)]

Note: The full manuscript of this paper is currently in review.

OBJECTIVES

In older people with acquired hearing loss, investigate whether there are differences among auditory, visual (lipread), and audio-visual spoken words with respect to:
• Perceptual encoding speed
• Working memory representations.

BACKGROUND

Current theories of spoken word recognition implicitly assume that the same abstract phonological code underlies spoken words' memory representations irrespective of the sensory modality through which the representations are activated; that is, auditory (A), visual (V), and audio-visual (AV) speech signals differ with respect to the phonetic information they afford, but not the phonological representations they activate (Campbell, 1990). However, even if speech processing is not modality-bound, the phonetic informational differences associated with modality can be expected to affect the efficiency of perceptual encoding of spoken words, just as informational differences between natural and synthetic speech do (e.g., Pisoni and Greene, 1990). To the extent that modality differences involve more than just phonetic information (e.g., Campbell, 1985, 1992), they might also affect the quality of words' working memory representations. Such modality-related effects could limit or potentiate the success of speech communication. Perceptual encoding speeds for A, V, and AV speech have not been compared previously.
Assuming that 1) V tends to afford less phonetic information per unit of time than A or AV, and 2) the amount of usable phonetic information per unit of time is the main determinant of perceptual encoding costs for speech, encoding V should be slower than encoding A or AV. Based on observations of 1) faster choice RT to phonetically richer than to phonetically poorer speech signals, e.g., natural vs. synthetic speech (see Pisoni and Greene, 1990), and 2) faster choice RT to bimodal than to unimodal nonspeech stimuli (e.g., Hughes et al., 1994), it would not be surprising to find faster perceptual encoding of AV than A spoken words. However, in a choice RT experiment using synthetic speech syllables, Massaro and Cohen (1983) found no speed "benefit" for AV versus A stimuli.

The speed of retrieving information from working memory, as when a probe stimulus must be judged for membership in a short list of memorized items (Sternberg, 1966), can reveal characteristics of working memory representations. Typically, stimulus quality degradation slows encoding but does not slow retrieval. The classical account of this immunity of retrieval operations to stimulus quality factors is that the operands of the retrieval process are represented in an abstract internal code from which stimulus quality properties have been removed (Sternberg, 1967). Therefore, if it is appropriate to conceive of V speech stimuli simply as degraded (informationally poorer) forms of AV or A events, working memory representations arising from V speech signals should be retrieved with the same speed as representations arising from A or AV signals. If modality differences involve more than phonetic information, however, spoken words received in various modalities could have systematic representational differences, which might be revealed by characteristic retrieval speeds.
APPROACH

Task: Sternberg "memory scanning": The subject is serially presented a "memory set" of 1-4 spoken words, then is presented a "probe" to which he/she makes a speeded YES/NO response indicating whether the probe is a member of the memory set (Sternberg, 1966, 1967, 1969a, 1969b, 1975). See Figure 1.

Fig. 1. Schematic representation of a typical memory scanning trial. The trial depicted has a memory set size of 3 and a correct YES response.

Design: All within-subjects.

Dependent Variables: Intercept and slope of the memory set size-reaction time (RT) function, as fit to a linear model. The intercept relates to perceptual encoding cost. The slope relates to memory retrieval speed, i.e., the speed with which comparisons are made between the probe and the working memory representations of the encoded memory set items.

Main Independent Variable: Modality (auditory, visual, audiovisual) of spoken word stimuli.

Other Independent Variables: Subject age and 3-frequency pure-tone average audiometric threshold (PTA), a measure of severity of auditory disability.

METHODS

Subjects:
• N = 26
• Mean age = 66.2 y., std. dev. = 6.12
• Mean 0.5, 1, 2 kHz average audiometric threshold = 37 dB HL, std. dev. = 11.6 (see Figure 2).

Stimuli: Two tokens of each of 10 words from the CID W-22 list, spoken by one talker and professionally video recorded on optical disc: bread, ear, felt, jump, live, owl, pie, star, three, wool. This set of words was selected to allow 100% accurate closed-set identification in the A, V, and AV modalities.

Procedure:
• After passing a 20/30 vision screening test, subjects established their most comfortable listening levels (MCL) for monotic, better-ear, linear-amplified presentation of the spoken words under an Etymotic ER-3A insert earphone. All subsequent auditory testing was done at individual subjects' MCLs.
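The trial logic of the memory scanning task can be sketched as follows. This is a minimal illustration, not code from the study; the function name `make_trial` and the 50/50 YES/NO split are assumptions, though the 10-word closed set is the one listed under Stimuli.

```python
import random

# Closed set of 10 CID W-22 words used in the study (see Stimuli).
WORDS = ["bread", "ear", "felt", "jump", "live", "owl", "pie", "star", "three", "wool"]

def make_trial(set_size, rng):
    """Build one memory-scanning trial: a memory set of `set_size` words and a
    probe. The correct response is YES iff the probe is in the memory set.
    Assumes set_size < len(WORDS) so a NO probe is always available."""
    memory_set = rng.sample(WORDS, set_size)
    if rng.random() < 0.5:  # assumed: half of trials are YES trials
        probe = rng.choice(memory_set)
    else:
        probe = rng.choice([w for w in WORDS if w not in memory_set])
    correct = "YES" if probe in memory_set else "NO"
    return memory_set, probe, correct

rng = random.Random(0)
memory_set, probe, correct = make_trial(3, rng)  # set size 3, as in Fig. 1
```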
• In a screening test, subjects had to demonstrate perfect recognition of the 20 stimuli (10 words × 2 tokens) in the A, V, and AV modalities, given advance knowledge of the words. Five of the 33 subjects recruited (15%) did not pass the V (lipreading) part of this screening.
• After a 20-trial practice block in each modality, subjects completed three 80-trial A, V, and AV test blocks in counterbalanced order over three 2-hour sessions, providing 240 responses per modality and 60 responses per memory set size within a modality. See Figure 2 for trial design.
• For each subject, 12 RT means were calculated (3 modalities × 4 memory set sizes). Incorrect responses were discarded, as were responses more than 2.5 standard deviations above the mean for each of the 12 cells (Ratcliff, 1993). Least-squares lines were fitted to the four data points representing each subject's memory set size-RT function. Two of the remaining 28 subjects were dropped because of excessive errors or excessively long RTs in testing.

RESULTS

Figure 3 displays the main RT results. Figure 4 summarizes response accuracy.

Perceptual Encoding Speed: The intercept (b0) of the linear model of a subject's memory set size-RT function is assumed to represent the sum of encoding time and motor execution time. Motor execution speed is assumed to be constant within a subject, so relative differences among conditions within a subject indicate factor effects. Wilcoxon Signed Ranks Tests (2-tailed) were performed on the intercepts of each modality pairing (AV-V, A-V, AV-A). The difference for each modality pairing was significant:
• 25/26 subjects encoded AV faster than V (z = 4.41, p < .001)
• 21/26 encoded A faster than V (z = 3.72, p < .001)
• 21/26 encoded AV faster than A (z = 3.44, p < .001).

Retrieval Speed: The slope (b1) of the linear model of a subject's memory set size-RT function is assumed to represent the time it takes to "scan" one item in working memory.
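The per-subject analysis described above (outlier trimming per cell, then a least-squares line over the four set-size means) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the study's actual code: function names and the example RT values are hypothetical, while the 2.5-SD criterion and the interpretation of b0 and b1 follow the text.

```python
from statistics import mean, stdev

def trim_outliers(rts, criterion=2.5):
    """Discard RTs more than `criterion` standard deviations above the cell
    mean, per the Ratcliff (1993) procedure described above."""
    m, s = mean(rts), stdev(rts)
    return [rt for rt in rts if rt <= m + criterion * s]

def fit_line(set_sizes, mean_rts):
    """Least-squares fit of mean RT on memory set size.
    Returns (b0, b1): intercept b0 ~ encoding + motor execution time,
    slope b1 ~ time to scan one item in working memory."""
    mx, my = mean(set_sizes), mean(mean_rts)
    b1 = (sum((x - mx) * (y - my) for x, y in zip(set_sizes, mean_rts))
          / sum((x - mx) ** 2 for x in set_sizes))
    b0 = my - b1 * mx
    return b0, b1

# Hypothetical cell means (ms) for set sizes 1-4 in one modality:
b0, b1 = fit_line([1, 2, 3, 4], [520.0, 570.0, 620.0, 670.0])
# Here b0 = 470 ms and b1 = 50 ms/item.
```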
The average r² of the linear models was greater than 0.9 for all three modalities, indicating that the data are well fit by a linear model. Wilcoxon Signed Ranks Tests (2-tailed) were performed on the slopes of each modality pairing, and showed that both AV and A scanning are significantly faster than V scanning, but AV scanning is not significantly faster than A:
• 20/26 subjects scanned AV faster than V (z = 3.39, p < .001)
• 18/26 scanned A faster than V (z = 2.86, p < .004)
• 14/26 scanned AV faster than A (z = 1.18, ns).

Response Accuracy: Although subjects showed perfect AV, A, and V word identification accuracy during screening, in RT testing they made significantly more response errors in the V condition than in A or AV, as indicated in Figure 4 and in the outcomes of Wilcoxon Signed Ranks Tests (2-tailed). In V, the number of errors was significantly correlated with memory set size (r = .351, p < .01).
• 22/26 subjects made more errors in V than in AV (z = 3.78, p < .001)
• 21/26 subjects made more errors in V than in A (z = 3.89, p < .001).

Effects of Age: Age was significantly positively correlated with encoding time (intercept) only in V (r = .508, p < .01); that is, older subjects were slower to encode lipread words. Age was not significantly correlated with scanning speed in any modality. There were no significant correlations between age and modality-specific or overall error rates.

Effects of Severity of Auditory Disability: The 3-frequency (0.5, 1, 2 kHz) pure-tone average audiometric threshold (PTA) was used as a measure of severity of auditory disability. PTA was not significantly correlated with encoding or scanning speed in any modality, nor with error rates.

CONCLUSIONS

Encoding Speed:
• Encoding of lipread speech is slow, even with a closed set of words for which recognition accuracy is close to perfect.
• Encoding of auditory speech is speeded up by the addition of visual speech information.
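The paired, per-subject comparisons reported above use the Wilcoxon Signed Ranks Test with a normal (z) approximation. A minimal sketch of that statistic is given below, assuming no continuity or tie-variance correction; in practice a library routine such as scipy.stats.wilcoxon would be used, and the data values here are illustrative, not the study's.

```python
from math import sqrt

def wilcoxon_z(x, y):
    """Normal-approximation z for the Wilcoxon Signed Ranks Test on paired
    samples x and y (e.g., per-subject slopes in two modalities)."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    # Rank |d| in ascending order, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for di, r in zip(d, ranks) if di > 0)  # sum of positive ranks
    mu = n * (n + 1) / 4                                  # mean of W under H0
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)          # SD of W under H0
    return (w_plus - mu) / sigma
```

With 26 subjects per comparison, |z| values near 3.4-4.4 (as for the intercept contrasts) correspond to two-tailed p < .001.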
This speed-up could partially underlie findings of superior understanding of audiovisual spoken discourse (e.g., Reisberg, McLean, and Goldfield, 1987).

Working Memory Representations:
• As measured by memory retrieval speed in the memory scanning task, representations of lipread words might be different from those of auditorily or audio-visually perceived words.
• There is no evidence for representational differences between auditorily and audiovisually perceived words.

Aging and Audiological Factors:
• Aging affects the speed of encoding visual speech.
• Within the tested range of hearing loss, severity of auditory disability has no effect on modality-related encoding and retrieval speed.

Alternative Accounts: Differences in error rates among the modalities suggest possible modality-related differences in attentional demands. Apparent representational differences could instead reflect under-rehearsal of memory set items caused by the competing demands of difficult encoding. However, this account is inconsistent with the classical claim of independent encoding and retrieval processing stages.

REFERENCES

Campbell, R. (1985). "The lateralisation of lipread sounds: A first look," Brain and Cognition 5, 1-22.
Campbell, R. (1990). "Lipreading, neuropsychology, and immediate memory," in Neuropsychological Impairments of Short-Term Memory, edited by G. Vallar and T. Shallice (Cambridge U. Press, New York), pp. 268-286.
Campbell, R. (1992). "Lip-reading and the modularity of cognitive function: Neuropsychological glimpses of fractionation for speech and for faces," in Analytical Approaches to Human Cognition, edited by J. Alégria, D. Holender, J. Junça de Morais, and M. Radeau (Elsevier Science Publishers, Amsterdam), pp. 275-289.
Hughes, H.C., Reuter-Lorenz, P.A., Nozawa, G., and Fendrich, R. (1994). "Visual-auditory interactions in sensorimotor processing: Saccades versus manual responses," J. Exp. Psychol.: Human Percept. Perform. 20, 131-153.
Massaro, D.W., and Cohen, M.M. (1983). "Categorical or continuous speech perception: A new test," Speech Commun. 2, 15-35.
Pisoni, D.B., and Greene, B.G. (1990). "The role of cognitive factors in the perception of synthetic speech," in Research on Speech Perception Progress Report No. 16 (Indiana University), pp. 193-214.
Ratcliff, R. (1993). "Methods for dealing with reaction time outliers," Psychol. Bull. 114, 510-532.
Reisberg, D., McLean, J., and Goldfield, A. (1987). "Easy to hear but hard to understand: A lipreading advantage with intact auditory stimuli," in Hearing by Eye: The Psychology of Lipreading, edited by B. Dodd and R. Campbell (Lawrence Erlbaum Assoc., Hillsdale, NJ), pp. 97-114.
Sternberg, S. (1966). "High-speed scanning in human memory," Science 153, 652-654.
Sternberg, S. (1967). "Two operations in character recognition: Some evidence from reaction-time measurements," Percept. and Psychophys. 2, 45-53.
Sternberg, S. (1969a). "The discovery of processing stages: Extensions of Donders' method," in Acta Psychologica 30, Attention and Performance II, edited by W.G. Koster (North Holland, Amsterdam), pp. 276-315.
Sternberg, S. (1969b). "Memory scanning: Mental processes revealed by reaction-time experiments," American Scientist 57, 421-457.
Sternberg, S. (1975). "Memory scanning: New findings and current controversies," Q. J. Exp. Psychol. 27, 1-32.

ACKNOWLEDGMENT

Supported by research grant numbers R29 DC 01643 and R29 DC 00792 from the National Institute on Deafness and Other Communication Disorders, National Institutes of Health. This research was carried out as part of an approved Walter Reed Army Medical Center, Department of Clinical Investigation research study protocol, Work Unit 2553, entitled "Scanning Memory for Auditory, Visual, and Audiovisual Spoken Words." The opinions and assertions contained herein are the private views of the authors and are not to be construed as official or reflecting the views of the Department of the Army or the Department of Defense.
CONTACT INFORMATION

Philip F. Seitz, Ph.D.
Army Audiology and Speech Center
Walter Reed Army Medical Center
Washington, DC 20307-5001
(202) 782-8579
Philip-Franz_Seitz@wramc1-amedd.army.mil