Copyright 1984 by the American Psychological Association, Inc.
Developmental Psychology, 1984, Vol. 20, No. 2, 212-218

Children's Memory for Auditory and Visual Information on Television

Kathy Pezdek, Claremont Graduate School
Ellen Stevens, Stanford University

This study examines the relationship between children's cognitive processing of video and audio information on television. Ninety-six 5-year-old children viewed a videotaped segment of Sesame Street followed by a comprehension test and a recognition test. Equal numbers of subjects viewed an experimental segment in which (a) the audio and video tracks were from the same segment (A/V match), (b) the audio and video tracks were not from the same segment (A/V mismatch), (c) the video track was presented alone, or (d) the audio track was presented alone. This design allows unconfounded comparisons of modality-specific processing. In the A/V mismatch condition, memory for audio information was reduced more than memory for video information. However, comprehension and recognition of audio information were similar in the audio-only and A/V match conditions. These results suggest that in regular television programs, the video information does not interfere with processing the audio information; rather, the video material simply appears to be more salient and more memorable than the audio material.

Television presents a natural medium for studying cognitive processing of visual and auditory information. Comparisons of visual and auditory (most often verbal) processing have been of interest to researchers in cognitive psychology (Baggett, 1979; Pezdek, 1980) and education (Meringoff, 1980; Salomon, 1979). However, few researchers have examined modality-specific processing of television. Only recently have "television researchers" directed their attention specifically to how viewers watch television and what information they retain from watching television.
The present study examines the relationship between children's memory for visual and auditory information presented on television. Specifically, two questions are addressed in this study. First, does processing video information interfere with processing the simultaneously presented audio information? Several studies have reported higher rates of memory and comprehension for video than audio information from intact television programs (Hayes & Birnbaum, 1980; Ward & Wackman, 1973; Zuckerman, Ziegler, & Stevenson, 1978). There is a problem with these comparisons in that the difference may be an artifact of the test items; that is, there is no way to assess the comparability of the audio and video test items. Nevertheless, the superior memory for video over audio information does appear to be robust. The first issue to be addressed in the present study is whether this difference arises because the audio information is simply less compelling than the video information or because processing the video information interferes with processing the audio information.

A number of studies in the cognitive literature have reported that visual stimuli tend to dominate over other modalities in both perceptual and memory tasks (Posner, Nissen, & Klein, 1976). However, none of these studies has used simultaneously presented visual and auditory dimensions of the same semantic message. In situations of this type, it is not clear whether visual superiority would be the result of the visual stimulus interfering with processing the auditory stimulus or whether the visual stimulus would simply be more salient and more memorable.

This research was supported by a grant from the National Institute of Education. We thank Edward Teyber and Daniel Anderson for critical comments on the manuscript. Requests for reprints should be sent to Kathy Pezdek, Psychology Department, Claremont Graduate School, Claremont, California 91711.
In a relevant study, Beagles-Roos and Gat (1983) presented 6- to 11-year-old children with identical stories on the radio or on television. On a later verbal-recall test, the children performed better in the radio than television condition. However, on the visual-memory test (a picture-arrangement task), the children performed better in the television than radio condition. A similar finding was reported by Meringoff (1980) comparing children's verbal and visual memory following listening to a picture book read versus viewing a narrated video film of the same story. These researchers concluded that processing the visual channel on television interferes with processing the verbal channel by making it less salient. However, an alternative interpretation is that the visual channel on television is more salient than the auditory-verbal channel and is therefore more memorable, but the visual channel does not reduce the salience of the auditory-verbal channel. The present study tests this interpretation.

A relevant suggestion also follows from findings reported by Pezdek (1980). In this study, subjects were serially presented pictures and related sentences. Sixth graders (and young adults) integrated in memory the semantically related items despite the modality difference. However, third graders (and older adults) retained the pictures and sentences in an unintegrated form. Perhaps young children are also more likely to process audio and video information from television as separate information sources, rather than as a well-integrated whole. If so, the primary modality source would be expected to interfere with the secondary modality source.

The second question of interest in the present study involves incompatible television programs in which the audio track from one program is combined and presented with the video track from a different program.
With programs of this type, in which viewers have to choose which modality to process, how will processing be allocated to the video as compared with the audio track? With this manipulation it is possible to determine how children process television during ambiguous segments.

Hayes and Birnbaum (1980) compared children's recognition for audio versus video portions of incompatible programs. They concluded that recognition was consistently higher for video than audio information. However, there were several problems with the study. First, as mentioned earlier, the difficulty of the audio and video test questions and the extent to which each type of question could be answered from information presented in the other modality were not controlled for. It would have been useful, for example, to know (a) the guessing rate for both the audio and video questions for subjects who had not seen the modality channel on which the questions were based, and (b) the extent to which each type of question could be answered from information presented on the other channel alone. In this way, the possibility of confounding effects of test-item difficulty and the extent to which the items tested cross-modality information could be assessed. Second, within this study the critical comparisons of memory for incompatible audio versus video information were made on a limited number of program segments and only two (Experiment 1) or five (Experiment 2) questions from each channel of each segment. The effect might, therefore, be an artifact of the segments and test items used. Third, in Experiment 3 of this study, subjects were presented an intact program (Video A with Audio A) or a composite program (Composite 1 = Video A with Audio B; Composite 2 = Video B with Audio A). Recognition accuracy for the Video A information was compared in the intact versus Composite 1 program, and recognition accuracy of the Audio A information was compared in the intact versus Composite 2 program.
However, all questions tapped information presented both visually and auditorily. Thus, the drop in recognition of "auditory" questions in the Composite 2 compared with the intact condition could have been due to the absence of the visual information necessary to answer the question. The result that there was no change in recognition accuracy of "visual" questions (although these questions also tapped both visually and auditorily presented information) could have been because the information presented on the compatible auditory channel was not as necessary for answering the specific questions used. The absence of questions that specifically relied on auditorily presented information alone and visually presented information alone thus limits the interpretation of Hayes and Birnbaum's (1980) study.

The present study further examines children's memory for audio versus video information on television. The audio and video tracks from four different television segments were multiply combined to produce conditions in which (a) the audio and video tracks were from the same segment (A/V match), (b) the audio and video tracks were not from the same segment (A/V mismatch), (c) the video track was presented alone, or (d) the audio track was presented alone. With this design, memory for the audio track alone can be compared with memory for the same audio information presented in the A/V match condition. The confounding effect of test-item differences is thus avoided. This tests whether processing the simultaneously presented visual information interferes with memory for audio information or whether the audio information is simply less compelling than the video information. In addition, memory for the audio versus video portions of the A/V mismatch segments will be compared to examine how children process the two tracks when they have to choose between them.
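The way four source segments yield the four conditions can be made concrete with a short enumeration. The sketch below is only an illustration: the segment labels (`BE-1`, `BB-2`, and so on) are hypothetical, and it assumes, as the Method section states, that each mismatch pairs the audio of one segment with the video of the other segment featuring the same characters.

```python
# Hypothetical segment labels: the source identifies the characters but does
# not name the individual segments.
SOURCE_SEGMENTS = {
    "Bert and Ernie": ("BE-1", "BE-2"),
    "Big Bird": ("BB-1", "BB-2"),
}


def build_experimental_segments(sources=SOURCE_SEGMENTS):
    """Return one record per experimental segment (4 conditions x 4 sources)."""
    segments = []
    for characters, (first, second) in sources.items():
        # Each source segment is used once in every condition; "other" is the
        # second segment featuring the same characters.
        for own, other in ((first, second), (second, first)):
            tracks = {
                "A/V match": (own, own),
                "A/V mismatch": (own, other),  # audio from one, video from the other
                "video only": (None, own),
                "audio only": (own, None),
            }
            for condition, (audio, video) in tracks.items():
                segments.append({
                    "characters": characters,
                    "condition": condition,
                    "audio": audio,
                    "video": video,
                })
    return segments
```

Calling `build_experimental_segments()` returns 16 records, four per condition, which matches the count of experimental segments reported in the Method section.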
Method

Subjects and Design

Ninety-six children were recruited from kindergarten classes in public schools in the San Bernardino, California metropolitan area. The children were brought to the campus by a parent to participate. The experiment used a one-way, independent groups design with four conditions. Twenty-four children were randomly assigned to each of four television program conditions (A/V match, A/V mismatch, video only, and audio only). The sex and ethnic mixture of the subjects was approximately equal in the four conditions, but these factors were not specifically controlled. The dependent variables were (a) the percentage of total viewing time that each child visually attended to the television, (b) recall accuracy on audio and video comprehension questions, and (c) accuracy in recognizing 5-s portions of the audio and video segments.

Setting and Materials

Children individually viewed a color videotaped segment from Sesame Street, selected from programs shown locally and edited by the experimenter. Although all of the subjects were familiar with Sesame Street, none had previously seen the experimental segment. There were four types of segments: A/V match, A/V mismatch, video only, and audio only. To increase the generalizability of the findings, however, there were four segments compiled for each of the four conditions, for a total of 16 segments. Each child viewed only one of these 16 segments. Half of the segments were constructed from audio and video portions of two Bert and Ernie segments. Half were from audio and video portions of two Big Bird segments.
The audio and video tracks from the two Bert and Ernie segments were combined to produce eight experimental segments as follows: (a) both of the compatible audio and video portions were presented together for the two A/V match segments; (b) the audio track of each segment was presented with the video track of the other Bert and Ernie segment for the two A/V mismatch segments; (c) each of the two video tracks was presented alone with no accompanying audio track; and (d) each of the two audio tracks was presented alone with no accompanying video track. The same procedure was applied with two Big Bird segments to produce the additional eight experimental segments. It is important to note that all of the A/V mismatch segments were arrived at by recombining audio and video portions of segments that involved the same characters. Although the voices and mouth movement were not synchronized, the voices and visible characters were the same. Character-compatible audio and video sources in the A/V mismatch condition were included to avoid salient cues that the two sources did not go together.

The experimental segments were each 3 minutes long. Each was immediately preceded on the videotape by a 3-minute intact Sesame Street segment. This filler segment was included to direct children's attention to the television as they situated themselves in the room and to reduce the primacy effect from the beginning of the experimental segment. The same filler segment was used in all conditions. The content of this segment was different from that of the experimental segments. There were no test items on the filler segment.

Procedure

Each parent and child were brought into the comfortably furnished viewing room where the study was briefly explained. The full session took 25 minutes. Each child participated individually. Children were instructed to watch television just like they would if they were in their own home.
They were also told that they would be asked a few questions about the television sequence when it was finished. The experimenter then turned on the television and left the room with the parent, leaving the child alone.

One observer behind a one-way mirror recorded the child's visual attention to the television during the experimental segment. Thus, visual but not auditory attention was monitored in the experiment. Only one observer was deemed necessary due to consistently high interobserver reliability (r = .98) with this measure in a previous study (Pezdek & Hartmann, 1983). The observer knew which condition each subject was in but was "blind" to the specific predictions in the study. The observer depressed a push button attached to a timer every time the child looked at the television and released it when the child looked away. The percentage of the total time that the child was visually attending to the television during the experimental segment was thus calculated.

Each child viewed the 3-minute filler segment and the 3-minute experimental segment from the condition to which they had been assigned. At the end of this sequence the experimenter returned to the room to test the child's comprehension and recognition accuracy for the experimental segment.

In the comprehension test, children were asked six questions from auditorily presented information in the segment and six questions from visually presented information in the segment. Half of the subjects received the video questions first, and half received the audio questions first. The order of the audio and video questions was randomized for each subject.
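The visual-attention measure described in the Procedure (a push button held down while the child looked at the television) amounts to summing the held intervals and dividing by the length of the experimental segment. A minimal sketch follows, assuming the timer output is available as (press, release) pairs in seconds; that data format and the function name are illustrative assumptions, not details given in the source.

```python
def percent_visual_attention(look_intervals, segment_seconds=180.0):
    """Percentage of the segment during which the button was held down.

    look_intervals: iterable of (press, release) times in seconds, measured
    from the start of the experimental segment and assumed not to overlap.
    segment_seconds: 180 s corresponds to the 3-minute experimental segment.
    """
    looking = 0.0
    for press, release in look_intervals:
        # Clip each look to the segment boundaries before summing.
        start = max(0.0, press)
        end = min(segment_seconds, release)
        if end > start:
            looking += end - start
    return 100.0 * looking / segment_seconds
```

For example, a child who looked during the first 90 s and again from 120 s to 150 s attended for 120 of 180 s, or about 66.7% of the segment.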
Each answer was scored on the following 3-point scale: 2 points if the child answered correctly; 1 point if a prompt from the experimenter was necessary before the child answered; and 0 points if the child could not answer or answered incorrectly with the help of the prompt. Prompts were essentially restatements of the original questions but with an additional piece of information given. For example, one of the original visual questions was, "What things were on top of the cabinet that Big Bird was working on?" If the child could not answer the question or answered it incorrectly, a prompt was offered, "What tools were on top of the cabinet that Big Bird was working on?" Prompts to visual questions always included visually presented information. Prompts to auditory questions always included auditorily presented information.

Visual comprehension questions were generated by first watching each segment with the sound turned off. Then to check the questions, the segments were listened to with the picture turned off. Questions were eliminated if it was judged that they could be answered without the visual signal. The converse procedure was followed to generate auditory comprehension questions. Examples of visual comprehension questions are the following: "What type of hat did Big Bird pick out to wear to the party?" "What did Bert find in his toy box?" Examples of auditory comprehension questions are the following: "What languages could Mr. Hooper speak?" "Who did Big Bird want to invite to the party?"

In the A/V match and A/V mismatch conditions, subjects were asked questions on the audio and video portions that had been presented, whether they were compatible or not.
In the video-only condition, subjects were asked questions on the video track that had been presented and also on the compatible audio track that was not heard. In the audio-only condition, subjects were asked questions on the audio track that had been presented and also on the compatible video track that was not seen. Audio questions in the video-only condition and video questions in the audio-only condition were included to get baseline response rates on all questions.

The recognition test followed the comprehension test. Each child was presented twenty-two 5-s video portions and twenty-two 5-s audio portions on videotape. As each item was presented, the subject responded "yes" or "no" to indicate whether the item had been seen or heard in the segment just presented. Half of the audio and video portions were from the specific segment presented to each subject and half were from the other segment that included the same characters (i.e., Bert and Ernie or Big Bird) but had not been presented to them. In the video-only condition, subjects were tested on video but not audio recognition items. In the audio-only condition, subjects were tested on audio but not video recognition items. In both other conditions subjects received both audio and video recognition items. Half of the subjects received the audio test items first and half received the video test items first.

Results

The principal measures were percentage of visual attention to the television, recall accuracy on video and audio comprehension questions, and video and audio recognition accuracy. An initial analysis of variance (ANOVA) indicated no significant effect of either segment type (four different segments were used) or test-item order (audio or video items first) on any measures. Thus, the critical analyses presented are the separate one-way ANOVAs carried out on each measure. The rejection region for all analyses was p < .05.

Table 1
Mean Performance on Each Measure in Each Condition

Condition       Visual attention (%)   Video comprehension (0-2)   Audio comprehension (0-2)   Video recognition (d')   Audio recognition (d')
A/V match       91.6                   1.61                        1.51                        3.92                     3.04
A/V mismatch    83.8                   1.13                        0.85                        3.06                     1.05
Video only      74.0                   1.28                        0.57                        3.15                     n/a
Audio only      28.4                   0.51                        1.20                        n/a                      2.17

Visual Attention

The mean percentage of the total time that children visually attended to the experimental segment was calculated in each of the four conditions. These data are presented in the first column of Table 1 (see Footnote 1). A one-way ANOVA yielded a significant difference among the four conditions, F(3, 92) = 92.83, MSe = .021. Scheffé comparisons indicated that attention was significantly less in the audio-only condition than in each of the other conditions. The only other significant difference was that visual attention was less in the video-only condition than in the A/V match condition.

Footnote 1. The level of visual attention in the A/V match condition was similar to the 88% and 89% figures reported in comparable conditions by Pezdek and Hartmann (1983) and Lorch, Anderson, and Levin (1979), respectively.

Video Comprehension

Children's responses to comprehension questions were coded on a 0 (incorrect answer) to 2 (correct answer without prompt) scale. The mean score for video questions in each of the four conditions is presented in the second column of Table 1. A one-way ANOVA resulted in a significant difference among the four conditions, F(3, 92) = 30.28, MSe = .170. Scheffé tests yielded all comparisons significantly different except the difference between the A/V mismatch and the video-only conditions. To examine these differences more systematically, consider the score in the audio-only condition (.51) to be the chance response rate on video comprehension questions. Comprehension accuracy then was significantly better than chance in each of the other three conditions. Also, video comprehension was significantly higher in the A/V match condition than in either the A/V mismatch or video-only conditions, which did not differ from each other.

Audio Comprehension

The mean comprehension score for audio questions in each of the four conditions is presented in the third column of Table 1. A one-way ANOVA indicated a significant difference among these four conditions, F(3, 92) = 17.13, MSe = .234. Scheffé tests were conducted to examine the data more specifically. The comprehension score in the video-only condition (.57) is considered to be the chance response rate on the audio questions. Scores in the A/V match and audio-only conditions were not significantly different, and both were significantly better than chance. However, audio comprehension in the A/V mismatch condition was both not significantly better than chance and significantly less than that in the A/V match condition.

Video Recognition Accuracy

The recognition data were transformed to the signal detection measure of d'. The d' values were included because the conditions of the experiment suggested that response bias as well as sensitivity might affect recognition accuracy. The values of d' reflect subjects' ability to distinguish old from new test items. (See Banks, 1970, for an explanation of Signal Detection Theory.) The procedure outlined by Hochhaus (1972) was followed for calculating d' values. The mean d' value for video recognition items in each of three conditions is presented in the fourth column of Table 1. Video recognition items were not presented to subjects in the audio-only condition, and audio recognition items were not presented to subjects in the video-only condition. A one-way ANOVA indicated that the differences among conditions were not significant, F(2, 69) = 3.03, MSe = 1.770.

Audio Recognition Accuracy

The mean d' value for audio recognition items in each of three experimental conditions is presented in the fifth column of Table 1. A one-way ANOVA indicated significant differences among conditions, F(2, 69) = 9.66, MSe = 2.409. Scheffé tests revealed that recognition accuracy in the A/V mismatch condition was significantly less than that in both the A/V match and the audio-only conditions, which did not differ from each other.

Discussion

The results are discussed primarily in terms of the issues raised in the introduction. The first question is whether processing video information interferes with processing the simultaneously presented audio information on television. To answer this question, comprehension and recognition of auditory information are compared in the audio-only condition and in the A/V match condition. In this way performance on the same audio test items can be compared with and without the accompanying video material. As reported in Table 1, comprehension and recognition of auditory information were not significantly different in the audio-only and the A/V match conditions. Thus, the video channel in the A/V match condition did not interfere with processing the audio channel. This suggests that the audio and video channels in the A/V match condition, and in most regular television programs, should not be considered as independently processed. Rather, audio and video information on television appear to be processed together as part of a single integrated stimulus.
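For reference, the d' index behind the recognition scores in Table 1 is the difference between the z-transformed hit rate ("yes" to old items) and false-alarm rate ("yes" to new items). The authors obtained d' from the table published by Hochhaus (1972); the sketch below instead uses the inverse normal distribution function from Python's standard library. It assumes 11 old and 11 new items per modality (half of the twenty-two portions), and it applies a log-linear correction for extreme rates. That correction is an assumption made here, since the source does not say how hit rates of 1 or false-alarm rates of 0 were handled.

```python
from statistics import NormalDist


def d_prime(hits, false_alarms, n_old=11, n_new=11):
    """d' = z(hit rate) - z(false-alarm rate).

    Assumption (not from the source): counts are adjusted with the log-linear
    correction (add 0.5 to each count and 1 to each total) so that rates of
    0 or 1 still give a finite d'.
    """
    hit_rate = (hits + 0.5) / (n_old + 1)
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

A child who says "yes" equally often to old and new items gets d' = 0 (no discrimination), and d' grows as hits increase relative to false alarms. Because d' separates sensitivity from the overall tendency to say "yes", it suits a design in which response bias may differ across conditions.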
An alternative interpretation of the above result is that comprehension and recognition of auditory information in the audio-only and A/V match conditions did not differ because processing of auditory information was actually elevated above normal in the audio-only condition. However, this interpretation is at odds with results from several other studies. For example, Lorch, Anderson, and Levin (1979), in one condition, examined the relationship between visual attention to television and comprehension of information presented only auditorily. They reported a significant and substantial positive correlation between visual attention to television at the specific time that the auditory information was presented and comprehension accuracy on questions probing this auditory information. In other words, children comprehended more auditory information from television when they were looking at the television than when they looked away. In the present study, visual attention to the television in the audio-only condition was only 28.4%. Furthermore, using materials similar to those used in the present study, Pezdek and Hartmann (1983) reported that children semantically processed auditory television segments primarily during periods of visual attention. Thus, it is unlikely in the present study that processing of auditory information was elevated above normal in the audio-only condition.

The second question raised in the present study is how children allocate their processing to the audio versus video channels on television in the A/V mismatch condition. The results suggest that when children have to choose one channel or the other, processing the audio information suffers more than does processing the video information. Visual attention in the A/V mismatch condition was not significantly less than in the A/V match condition. However, memory for both audio and video information declined when the two modality channels were incompatible.
Audio comprehension and recognition were significantly reduced in the A/V mismatch condition, with audio comprehension not significantly different from chance. Video comprehension and recognition were relatively more accurate and significantly better than chance. These findings are consistent with the literature on the visual-dominance effect, in which less complex perceptual and memory tasks have been used (cf. Posner et al., 1976).

In summary, the suggestion by Hayes and Birnbaum (1980) that "nursery-school children have a strong tendency to 'look and not listen' while attending to television" (p. 415) is disputed by the present findings. Children's comprehension and recognition of audio information in the A/V match condition were significantly greater than chance and not significantly different from those in the audio-only condition. However, the conclusion by Hayes and Birnbaum (1980), Ward and Wackman (1973), and others that video information is better retained than audio information on television is supported with qualifications. With the A/V match segments in the present study, comparable to typical television programs, comprehension and recognition were similarly high for audio and video information. However, these data cannot be directly compared because they are based on different test items. The relative memorability of audio and video information can be examined in the A/V mismatch condition. In the A/V mismatch condition, comprehension and recognition of audio information were reduced more than video information. When subjects had to choose which of two incompatible channels to process, the video channel was favored, and memory for the audio information was reduced to chance. Together, these results suggest that in typical television programs the video information does not interfere with processing the audio information; rather, the video material simply appears to be more salient and memorable than the audio material.

References

Baggett, P. (1979).
Structurally equivalent stories in movie and text and the effect of the medium on recall. Journal of Verbal Learning and Verbal Behavior, 18, 333-356.
Banks, W. P. (1970). Signal detection theory and human memory. Psychological Bulletin, 74, 81-99.
Beagles-Roos, J., & Gat, I. (1983). The specific impact of radio and television on children's story comprehension. Journal of Educational Psychology, 75, 128-137.
Hayes, D. S., & Birnbaum, D. W. (1980). Preschoolers' retention of televised events: Is a picture worth a thousand words? Developmental Psychology, 16, 410-416.
Hochhaus, L. (1972). A table for the calculation of d' and β. Psychological Bulletin, 77, 375-376.
Lorch, E. P., Anderson, D. R., & Levin, S. R. (1979). The relationship of visual attention to children's comprehension of television. Child Development, 50, 722-727.
Meringoff, L. K. (1980). Influence of the medium on children's story apprehension. Journal of Educational Psychology, 72, 240-244.
Pezdek, K. (1980). Life-span differences in semantic integration of pictures and sentences in memory. Child Development, 51, 720-729.
Pezdek, K., & Hartmann, E. P. (1983). Children's television viewing: Attention and comprehension of auditory versus visual information. Child Development, 54, 1015-1023.
Posner, M. I., Nissen, M. J., & Klein, R. M. (1976). Visual dominance: An information-processing account of its origins and significance. Psychological Review, 83, 157-171.
Salomon, G. (1979). Media and symbol systems as related to cognition and learning. Journal of Educational Psychology, 71, 131-148.
Ward, S., & Wackman, D. B. (1973). Children's information processing of television advertising. In P. Clarke (Ed.), New models for mass communication research (Vol. 2, pp. 119-146). Beverly Hills, CA: Sage.
Zuckerman, M., Ziegler, M., & Stevenson, H. W. (1978). Children's viewing of television and recognition of commercials. Child Development, 49, 96-104.
Received June 7, 1982
Revision received November 11, 1982