Journal of Communication Disorders 36 (2003) 129–151

Diagnostic accuracy and test–retest reliability of nonword repetition and digit span tasks administered to preschool children with specific language impairment

Shelley Gray*

Department of Speech and Hearing Sciences and The National Center for Neurogenic Communication Disorders, The University of Arizona, Tucson, AZ, USA

Received 9 August 2002; received in revised form 28 December 2002; accepted 2 January 2003

* Present address: Department of Speech and Hearing Science, Arizona State University, P.O. Box 870102, Tempe, AZ 85287-0102, USA. Tel.: +1-480-965-6796; fax: +1-480-965-8516. E-mail address: Shelley.Gray@asu.edu (S. Gray).

0021-9924/03/$ – see front matter © 2003 Elsevier Science Inc. All rights reserved. doi:10.1016/S0021-9924(03)00003-0

Abstract

To assess diagnostic accuracy and test–retest reliability, two forms of a nonword repetition task were administered to 22 preschool children with specific language impairment (SLI) and to 22 age- and gender-matched children with normal language (NL). Results were compared with performance on a digit span task and norm-referenced test scores. Nonword repetition scores provided excellent sensitivity and specificity for discriminating between groups. Scores on both nonword repetition and digit span tasks improved significantly from first to second administrations for both groups, but remained relatively stable at the third administration. The SLI group appeared to benefit more from repetition than the NL group. Acceptable levels of test–retest reliability were achieved for the digit span task, but not for the NL group on the nonword repetition task. These preliminary findings suggest that with further refinement to improve test–retest reliability, nonword repetition holds promise as a diagnostic measure for SLI in preschool children.

Educational objectives: As a result of this activity, the participant will be able to (1) describe the content and administration of nonword repetition tasks; (2) explain why evidence of test–retest reliability is necessary before a measure may be considered reliable for diagnostic purposes; and (3) accurately compare the sensitivity and specificity of the nonword repetition task utilized in this study to standardized language test scores.

Keywords: Test–retest reliability; Nonword repetition; Specific language impairment; Preschoolers

1. Introduction

Measures designed to quantify short-term phonological memory have been used increasingly to investigate the language skills of children. Over a decade ago, Gathercole, Baddeley, and colleagues (e.g., Gathercole & Baddeley, 1990b; Gathercole, Hitch, Service, & Martin, 1997; Gathercole, Willis, Baddeley, & Emslie, 1994) explored the relationship between phonological short-term memory and language acquisition using nonword repetition and digit span tasks. Their work demonstrated a strong relationship between retention of phonological information and vocabulary acquisition. This relationship appears in children as young as 4 years of age (e.g., Gathercole & Baddeley, 1989), and continues into adolescence (Gathercole, Service, Hitch, Adams, & Martin, 1999). In their research with children diagnosed with specific language impairment (SLI), Gathercole and Baddeley (1990a, 1990b) concluded that there was a "causal" connection between poor phonological short-term memory and poor vocabulary acquisition (although there is disagreement about this as discussed by Gathercole & Baddeley, 1995; Howard & van der Lely, 1995; van der Lely & Howard, 1993).
Since that time, a number of studies have documented that children with language impairment (LI)¹ perform more poorly on phonological memory measures than their peers with normal language (NL) (Bishop, North, & Donlan, 1996; Dollaghan & Campbell, 1998; Edwards & Lahey, 1998; Ellis Weismer et al., 2000; Gathercole & Baddeley, 1990a; Montgomery, 1995).

¹ Authors use the terms language impairment or specific language impairment in different studies to describe their participants' diagnoses. The authors' use of LI or SLI is reproduced in this paper.

Nonword repetition and digit span are both used to assess children's short-term phonological memory. Nonword repetition tasks require a child to listen to a series of nonsense words of varied lengths, and to repeat them correctly. Digit span tasks require a child to listen to single-digit numbers presented in series of increasing lengths and to repeat them in the correct sequence. The maximum length (i.e., maximum number of digits) that a child repeats correctly constitutes "span."

Gathercole et al. (1997) posited that nonword repetition provided a "purer" assessment of short-term phonological memory than digit span. They reasoned that, unlike the more familiar names for numbers used in digit span, prior lexical knowledge would not be used to supplement temporary representations in the phonological loop. Subsequent research demonstrated that nonword repetition performance may depend not only on phonological memory ability, but also on accumulated lexical knowledge. Gathercole and Baddeley (1990b) and Dollaghan, Biber, and Campbell (1995) demonstrated that the more "wordlike" the nonwords, the easier they are to repeat. The authors concluded that children with more developed lexicons may have an advantage in nonword repetition tasks because they can utilize words stored in long-term memory to help them remember nonwords.
Thus, the "nonlexical" advantage of nonword repetition over digit span as a measure of phonological memory was called into question. Nonword repetition may resemble the challenge a child faces when hearing new words for the first time more closely than digit span does.

Previous research employed nonwords of varying phonemic structures and lengths. In 1990, Gathercole and Baddeley (see Gathercole et al., 1994 for a detailed description) developed the Children's Test of Nonword Repetition (CNRep). It included 40 nonwords (10 each containing one, two, three, or four syllables). The original list of nonwords was later revised by replacing the one-syllable with five-syllable nonwords. The nonwords were designed to "conform to the phonotactic rules of English" and to the "dominant syllable stress patterns in English for words of that length." Researchers presented the nonwords via audiotape then immediately scored each repetition as correct or incorrect. If a child consistently pronounced one phoneme as another it was not counted as an error. The researchers acknowledged that a live system of whole-word scoring was approximate. Nevertheless, reliability calculations based on 104 4-year-old children from the sample of children who received the original nonword list were reportedly found to be high, with 97% agreement between live and tape-recorded responses based on whole-word, correct/incorrect scoring.

Gathercole et al. (1994) published cross-sectional CNRep data for normally developing children ages 4 (N = 142) through 9 (N = 16) years. This study found that children's CNRep scores increased through age 8. Scores on the CNRep were reportedly "highly and significantly" correlated with digit span at all ages tested: age 4, r = 0.524; age 5, r = 0.667 (Gathercole, Willis, & Baddeley, 1991) and age 8, r = 0.445 (Gathercole, Willis, Emslie, & Baddeley, 1992).
Test–retest reliability was considered "satisfactory for psychometric purposes" with correlation coefficients of 0.77 for 5-year-olds and 0.80 for 7-year-olds when the CNRep was readministered after a 4-week interval.

Early in its development, Gathercole and Baddeley (1990a) administered the first version of the CNRep to 5-year-old children with SLI, and found that they performed more poorly than children with NL matched for nonverbal cognitive ability and more poorly than younger children with NL matched for language level. Bishop et al. (1996) reported similar findings using the CNRep in a twin study of 39 7- to 9-year-old children with "persistent" LI and 13 children with "resolved" LI (no longer enrolled in speech–language therapy), and in a second related twin study (Bishop et al., 1999). In these studies, Bishop and colleagues administered the nonwords live rather than via audiotape, but used the same whole-word scoring procedure.

Montgomery (1995) developed a list of 48 nonwords (12 each containing one, two, three, or four syllables) to explore the relationship between phonological working memory and sentence comprehension. The nonwords were presented to school-age children (ages 5–11 years) via audiotape in random order. Four practice items were presented prior to the actual nonwords. Unlike the CNRep, the nonwords were repeated once if requested by the child, and children were allowed to attempt two repetitions of the nonwords. Researchers audiotaped responses, with the final production scored for correct/incorrect repetition of the whole word. Point-to-point agreement between the original and second transcription of 50% of the NL and 50% of the SLI tapes resulted in 97 and 94% reliability, respectively. As in previous studies, children with SLI performed more poorly than their NL peers.
They correctly repeated significantly fewer three- and four-syllable nonwords, but between-group performance did not differ for one- or two-syllable nonwords.

Edwards and Lahey (1998) utilized a list of six nonwords (three each containing three or four syllables) in a nonword repetition experiment designed to investigate variables that might explain nonword repetition inaccuracies exhibited by 6- and 7-year-old children with SLI. The nonwords "obeyed the phonotactic constraints of English" but contained no stressed syllables that were real words. One of the proposed reasons for NL groups outperforming SLI groups is that they have more experience producing varied phonological sequences such as those found in nonword tasks. To investigate this possibility, Edwards and Lahey provided practice by presenting each nonword four successive times. Both the SLI group and the NL control group improved the accuracy of repetitions across administrations. However, the SLI group performed significantly more poorly than the NL control group each time.

The consistently poorer performance by SLI groups relative to NL groups across studies provided researchers with the impetus to evaluate whether performance on nonword repetition tasks might provide a less culturally biased method of contributing to the diagnosis of SLI than currently available norm-referenced language tests. In a two-study series, Dollaghan and Campbell (1998) investigated whether nonword repetition could be used as a screening measure for LI in school-age children. To address the concern that nonword repetition results might be influenced by "wordlikeness" or articulatory difficulty, Dollaghan and Campbell (1998) developed a list of 16 nonwords (four series, each containing one, two, three, or four syllables). According to the authors, "neither the nonwords nor their constituent syllables correspond[ed] to lexical items." Each word comprised early developing, acoustically salient phonemes.
No specific consonants or vowels occurred more than once within the same nonword. The stimuli were recorded in order to standardize presentation, with one-syllable nonwords presented first, progressing in length to four-syllable nonwords. Responses were audiotaped for later phoneme-by-phoneme scoring, resulting in a dependent measure of "Percentage of Phonemes Correct" rather than number of whole nonwords repeated correctly. All phoneme substitutions were scored as incorrect. High scoring reliability was reported using this procedure (94% agreement for judgment of correctness).

In study 1 the authors compared the nonword repetition performance of 20 6- to 9-year-old children enrolled in language intervention with 20 age-matched peers with NL, and found no overlap in scores between the two groups. Importantly, there was also no significant difference in performance between the 25 African American participants and the 9 White participants in this study. According to Dollaghan and Campbell, this finding provided evidence that processing-dependent measures such as nonword repetition may be less culturally biased than commonly used norm-referenced tests.

In study 2, 85 children aged 5 to 12 years (44 LI, 41 NL), including the 40 age-matched children from study 1, completed the same nonword repetition task. Results indicated that children receiving language therapy were significantly poorer at repeating nonwords than their NL peers. Results of likelihood ratio analyses utilizing performance on three- and four-syllable words demonstrated that children with and without LI could be accurately identified based on their nonword repetition performance. In fact, nonword repetition performance was more accurate than the Test of Language Development–Intermediate: 2 (TOLD-I:2) (Hammill & Newcomer, 1988) in predicting language group status.

Ellis Weismer et al.
(2000) used the same list of nonwords with a population-based sample of children to "confirm the extent to which nonword repetition performance can serve as a clinically useful index of language disorder . . . ." The large sample of second-graders was part of an ongoing epidemiological study of SLI (see Tomblin et al., 1997). Children classified as LI were required to score below −1.25 S.D. on at least two of five language composite scores based on local norms as described by Ellis Weismer et al. However, only 90 of the 164 children classified as LI were enrolled in speech–language intervention. Children with LI and with NL were further classified into two cognitive groups, those with normal nonverbal IQs (PIQ of 85 or above on the Wechsler Intelligence Scale for Children–III (WISC-III; Wechsler, 1991)) and those with low nonverbal IQs (PIQ below 85).

Both groups of children with LI scored significantly lower on the nonword repetition task than the NL groups. In fact, the LI group with normal cognitive skills scored lower than the NL group with low cognitive skills. Children enrolled in language intervention scored significantly lower than those not enrolled in intervention. Use of likelihood ratio analyses in this study did not result in the high levels of diagnostic accuracy found by Dollaghan and Campbell (1998), although the authors highlighted a number of differences in selection criteria and sample classification that may have contributed to the different findings. Most notably, Dollaghan and Campbell's participants with SLI represented a clinically referred sample, and the authors utilized relatively equal numbers of NL and LI children, rather than a population-based sample with unequal representation of NL and LI. In addition, some of the children in the NL group had a prior history of LI.

Each of the preceding studies employed nonword repetition as a measure of phonological memory using different stimuli, instructions, scoring, and participant selection criteria.
Despite these methodological differences, the results were consistent: as a group, children with SLI had considerably more difficulty repeating nonwords than children with NL. Thus, nonword repetition, in addition to being a measure of phonological memory, is now also considered a possible diagnostic measure of SLI, although it has been investigated only in school-age children. Further research is needed to support nonword repetition's diagnostic validity and reliability, and to investigate whether it may be used for this purpose with preschool-age children.

To be considered a valid diagnostic measure, performance on nonword repetition tasks must accurately discriminate children with and without SLI. Plante and Vance (1994) suggested that 90% should be considered good discriminate accuracy for diagnostic measures of language impairment in children. That is, a measure should demonstrate at least 90% sensitivity (identifies SLI as SLI) and 90% specificity (identifies NL as NL). Theoretically, the significantly different performance of children with and without the disorder that permits good discriminate accuracy occurs because poor phonological memory capacity negatively affects language acquisition. If this is the case, nonword repetition performance should be significantly correlated with other measures of phonological memory, such as digit span, that might also be an index of SLI.

To be considered a reliable diagnostic measure, performance on nonword repetition tasks must be stable across time. One important measure of stability is test–retest reliability, typically indexed by a significant correlation between scores on successive administrations of a test. If the correlation is high, it may be argued that the skill being measured is stable and that the test accurately measures it. The closer in time the administrations, the higher the expected correlation.
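The validity and reliability criteria described above can be made concrete with a minimal sketch. It assumes a simple two-by-two classification of children and a Pearson correlation as the test–retest index; the function names, counts, and scores are hypothetical illustrations, not data or procedures from this study.

```python
from math import sqrt


def sensitivity(true_pos, false_neg):
    """Proportion of children with SLI whom the measure classifies as SLI."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of children with NL whom the measure classifies as NL."""
    return true_neg / (true_neg + false_pos)


def meets_criterion(sens, spec, cutoff=0.90):
    """Check both rates against the 90% benchmark cited in the text."""
    return sens >= cutoff and spec >= cutoff


def pearson_r(first, second):
    """Pearson correlation between scores from two administrations."""
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    var_1 = sum((a - mean_1) ** 2 for a in first)
    var_2 = sum((b - mean_2) ** 2 for b in second)
    return cov / sqrt(var_1 * var_2)


# Hypothetical classification of 22 SLI and 22 NL children (not study data).
sens = sensitivity(true_pos=21, false_neg=1)
spec = specificity(true_neg=20, false_pos=2)

# Hypothetical scores for five children on two administrations.
time_1 = [7, 9, 5, 12, 8]
time_2 = [8, 11, 6, 12, 9]
r = pearson_r(time_1, time_2)

print(f"sensitivity={sens:.2f} specificity={spec:.2f} r={r:.2f}")
```

In this hypothetical case both rates exceed 0.90, so `meets_criterion(sens, spec)` returns `True`; a measure that missed either threshold would fail the benchmark even if the other rate were perfect.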
The purpose of this study was to conduct a preliminary evaluation of the usefulness of nonword repetition as a diagnostic measure of SLI in younger children than previously studied, and to gather information about the validity and reliability of this measure. The performance of preschoolers with SLI was compared with age- and gender-matched peers with NL. The diagnosis of SLI was made using the standard of therapy enrollment. This allows findings to be generalized to the population of preschool children with SLI that SLPs serve on their caseloads. The stimuli, administration, and scoring procedures were selected to promote high levels of test–retest, administration, and scoring reliability while minimizing scoring time. This was accomplished by delivering the stimuli via computer, utilizing whole-word scoring, administering the first two nonword repetition tasks only 1 day apart, and providing word repetition practice between the second and third administrations of the nonword task to determine whether this would affect performance. Two forms of the CNRep were administered to permit alternate forms comparison. The diagnostic accuracy of nonword repetition and digit span were compared with the diagnostic accuracy of the Structured Photographic Expressive Language Test–II (SPELT-II) (Werner & Krescheck, 1983), a test shown to have good discriminate accuracy for preschoolers with SLI (Plante & Vance, 1994).

The specific questions were:
(1) Do the NL and SLI groups differ significantly on nonword repetition or digit span performance?
(2) Does performance on the nonword repetition task or digit span task accurately discriminate preschool children with and without SLI?
(3) Do scores remain stable with repeated administration, providing evidence for test–retest reliability?
(4) Does word repetition practice improve nonword repetition task performance?
(5) Is there a significant difference between performance on alternate forms of the nonword repetition task?

2. Methods

2.1. Participants

Twenty-two children diagnosed with SLI and 22 children with normally developing language (NL) participated in the study. Each child with NL was selected to match a child with SLI for gender and age (±3 months). All children were between the ages of 4,0 (years, months) and 5,11 and spoke a standard dialect of English as their primary language by parent and teacher report. Table 1 provides descriptive information about both groups. Of the 22 children in the SLI group, 2 were Asian American, 6 were Hispanic, 13 were White, and 1 was Other. In the NL group 2 were Hispanic, 19 were White, and 1 was Other. Each group had 5 girls and 17 boys. A parent or guardian of each child completed a questionnaire regarding the child's developmental history, primary language, and the number of years of education the child's parents or guardians completed. Children were from similar middle-class socioeconomic backgrounds and their mothers reported similar levels of education.

Table 1
Subject description information for children in both language groups

                                    NL group                     SLI group
                               Mean     S.D.    Range       Mean     S.D.    Range
Age in months                  60.09    6.06    20          60.14    8.11    34
Mothers' years of education    15.32    1.83    5.00        15.64    1.60    5.00
K-ABC*                         110.95   10.39   45.00       98.57    12.46   47.00
SPELT-II*                      104.10   17.32   70.00       38.36    30.12   88.75
BBTOP WI*                      70.68    10.42   46.00       30.48    21.61   78.00
PPVT-III                       –a       –a      –a          95.6     13.82   56.00

Note: K-ABC = Nonverbal Scale of the Kaufman Assessment Battery for Children (Kaufman & Kaufman, 1983) (M = 100; S.D. = 15); SPELT-II = Structured Photographic Expressive Language Test–II (Werner & Krescheck, 1983) (M = 100; S.D. = 15); BBTOP = Bankson–Bernthal Test of Phonology Word Inventory Score (number correct out of 80) (Bankson & Bernthal, 1990); PPVT-III = Peabody Picture Vocabulary Test–Third Edition (Dunn & Dunn, 1997) (M = 100; S.D. = 15).
a Test not administered to NL group.
* P < 0.05.

Children in the SLI group were selected for inclusion using the standard of intervention status (see Dollaghan & Campbell, 1998). They were recruited from public school and clinic programs in Tucson that provide language therapy for preschoolers. To qualify for services in these programs, children must score more than 1.5 S.D. below the mean on two norm-referenced language tests. Children were enrolled in the Child Language Center's (CLC) Wings On Words Preschool or in their own preschool program. The Wings On Words program provides a language-rich curriculum for children with SLI and NL and language therapy for children with SLI. An ASHA-certified SLP from the CLC determined that each child classified as SLI met the following additional selection criteria:

1. Hearing within normal limits bilaterally (25 dB HL) at 500, 1000, 2000, and 4000 Hz (American National Standards Institute [ANSI], 1989).
2. Normal nonverbal intelligence as indicated by a nonverbal IQ score of 75 or higher on the Nonverbal Scale of the Kaufman Assessment Battery for Children (K-ABC) (Kaufman & Kaufman, 1983). According to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, mental retardation is characterized by an IQ of approximately 70 or below. The nonverbal IQ cutoff score of 75 used in this study allows for a five-point standard error of measure.
3. With the exception of language, articulation, or phonological problems, no evidence of a frank neurological problem or additional developmental disorder reported by the parent.
4. Speech intelligibility judged to be adequate for applying scoring procedures based on a 3- to 5-min story-retelling task administered by the SLP, and responses to norm-referenced tests.

Therapy records indicated that certified SLPs providing language therapy services to these children during the time of the study targeted grammar objectives for each child.

Children in the NL group were enrolled in the CLC Wings On Words Preschool or another Tucson preschool at the time of the study. A teacher of each child in the NL group was asked to complete a questionnaire regarding the child's speech, language, motor, cognitive, and social-skill development. An ASHA-certified SLP from the CLC determined that each child classified as NL met the following selection criteria:

1. Hearing within normal limits bilaterally (30 dB HL) at 500, 1000, 2000, and 4000 Hz (ANSI, 1989) (a screening level of 30 dB HL was required because of ambient noise in some locations).
2. Normal nonverbal intelligence as indicated by a nonverbal IQ score of 75 or higher on the Nonverbal Scale of the Kaufman Assessment Battery for Children (Kaufman & Kaufman, 1983).
3. Age-expected progress in school by teacher report.
4. Normal motor, cognitive, social-emotional, speech, and language development by parent and teacher report.

Child Language Center SLPs or graduate students under their supervision administered the Structured Photographic Expressive Language Test–II (Werner & Krescheck, 1983) and the Bankson–Bernthal Test of Phonology (BBTOP) (Bankson & Bernthal, 1990) to all participants to further describe their speech and language skills.
The Peabody Picture Vocabulary Test–Third Edition (PPVT-III) (Dunn & Dunn, 1997) was also administered to children in the SLI group. Table 1 lists the mean test scores for each group.

2.2. Procedures

Speech and language tests were administered several days before the children started the research tasks. During the study, children worked one-on-one with a research assistant (RA) in a room at their school for approximately 30 min. On the first day of task administration the digit span and the first nonword repetition list were administered. On the second day the same digit span task and the same nonword repetition list were readministered, except that the order of the nonwords was changed. On days 3–6 children, grouped in pairs, played board games while they practiced repeating English words. One week after the second administration the same digit span task and the second nonword repetition list were administered. The presentation order of nonword lists A and B (described under Section 2.4) was counterbalanced across children.

2.3. Word repetition practice

On the 4 days between the second and third administrations of the nonword repetition and digit span tasks, children met with an RA and another child enrolled in the study to practice repeating one-, two-, three-, four-, and five-syllable English words. The words were practiced between turns of games that were versions of Blue's Clues, Candyland, sticks and marbles, or magnetic fishing. The practice words are listed in Table 2. The same set of words was always practiced with the same game, but the order of presentation was counterbalanced across children. Each child practiced each list one time.

2.4. Stimuli and scoring

Nonwords from the CNRep developed by Gathercole et al. (1994) were utilized for the nonword repetition task. The original list of 40 nonwords was randomly divided by syllable length into two lists of 20 nonwords (A, B) reported in Table 3.
The shorter list reduced administration and scoring time, and permitted test–retest reliability to be calculated for two different nonword lists. The lists were counterbalanced across children. Each child repeated nonwords from the same list on two consecutive days; however, the order of the words was changed. One week later they repeated words from the other list. For all days the pattern of presentation included a two-, three-, four-, then five-syllable nonword followed by the same pattern for the remaining nonwords. The nonwords were recorded into WAV computer files by a female speaker then imported into Microsoft PowerPoint for computer presentation to the children. The children wore a headphone/microphone set while listening to and repeating the nonwords. Children's repetitions were recorded into WAV computer files or were audiotaped.

Table 2
List of English practice words repeated while playing games

Blue's clues
  quaff   moity    ambulate   meticulous     polarization
  fez     nougat   conjugate  lenticular     ramification
  hast    opus     filigree   infirmity      macadamia
  gout    pilar    infrared   habitable      hydrophobia
  etch    roily    masculine  estuary        expediency
  drab    shanty   pacify     disconsolate   disintegration
  cease   tundra   radial     congressional  catamaran
  bask    umber    tabulate   belligerent    analytical

Candyland
  quaint  miser    ambition   morphology     proximity
  fen     nascent  confiscate linoleum       rudimentary
  luge    oblique  exertive   jubilation     orientation
  hasp    pestle   inflection incandescent   indeterminate
  gaff    rivet    lucrative  exhilarant     hemophilia
  dint    shallot  ordinance  dormitory      documentary
  chafe   tepid    quavering  congruity      conglomerate
  baste   ulu      serenade   calligrapher   anatomical

Sticks and marbles
  qualm   meager   asterisk    parochial     sedimentary
  flay    morsel   commingle   mendacity     serendipity
  liege   nabob    deficit     lavatory      proximity
  haze    papal    gabardine   indicative    infatuated
  gape    resin    invalid     gesticulate   hydrodynamics
  deign   savvy    misconstrue embroidery    equilateral
  cad     taro     paraffin    disarmament   dilapidated
  cask    tyrant   ratify      confederate   antiquarian

Magnetic fishing
  quay    methane  affricate   officiatel    ramification
  fray    nadir    concordant  medicinal     sedimentary
  louse   oblige   diffidence  laceration    polarization
  hake    perroni  genial      indelible     inevitable
  gad     restive  lithosphere formidable    homophonous
  dame    scoffer  myriad      effectual     edification
  chaff   tartar   pedigree    dexterity     depository
  bogue   udder    sassafras   charitable    affiliation

Table 3
List A and B of nonwords from the CNRep by syllable length

List A
  Two syllables:    diller, hampent, rubid, bannow, sladding
  Three syllables:  bannifer, brasterer, glistering, thickery, doppelate
  Four syllables:   stopograttic, contramponist, commeecitate, loddenapish, pennerriful
  Five syllables:   altupatory, pristoractional, reutterpation, confrantually, varsatrationist

List B
  Two syllables:    pennel, tafflest, ballop, glistow, prindle
  Three syllables:  barrazon, trumpetine, skiticult, commerine, frescovent
  Four syllables:   blonterstaping, epliforvent, woogalamic, fenneriser, perplisteronk
  Five syllables:   voltularity, sepretennial, defermication, detratapilic, underbrantuand

Prior to the presentation of the nonword repetition task, children practiced repeating three sample nonwords to familiarize them with the task. A voice on the computer said, "Hello there, welcome to the word game. I'm going to say a funny, made-up word. I want you to say it just like me. Are you ready? Say ___." The RA encouraged the child to speak clearly into the microphone during the three practice nonwords, but provided no feedback regarding accuracy of response. The computer screen remained blank while each of the nonwords was presented. After the child repeated each nonword a sound effect WAV file played. The next word was then presented. The sound effect reinforcement was introduced because pilot studies suggested that it encouraged young children to attempt repetition of all of the nonwords. RAs scored each nonword production live.
To be credited with correct repetition during live scoring, children were required to repeat the nonword exactly. Later, when listening to the recorded responses and tallying the score, an incorrect production was rescored as correct if the error was due to a consistent phoneme substitution (e.g., /t/ for /k/) also demonstrated on the Bankson–Bernthal Test of Phonology. Each correct response scored one point.

For the digit span task, numbers from one to nine were randomly selected without replacement to form lists varying from three to nine numbers in length. Two lists were presented at each length, beginning with three digits. The same digit span task was presented to the child on consecutive days. The number names were recorded into WAV computer files by a female speaker, then imported into Microsoft PowerPoint for presentation via computer. As with the nonword repetition task, the children wore a headphone/microphone set while listening to and repeating the digits. Children's repetitions were recorded into WAV computer files or were audiotaped.

Prior to the presentation of the first digit span series, children practiced repeating two two-digit series of numbers. During practice the RA encouraged the child to speak clearly into the microphone, but provided no feedback regarding accuracy of response. A voice on the computer said, "Hello there, welcome to the numbers game. My friends and I are really happy that you're going to play. I'm going to say some numbers, then you say them right after me. Are you ready? Let's practice. Say ___." The computer screen remained blank while each series of digits was presented. After the child repeated the digits, approximately 5 s of music played. The next series of numbers was then presented. RAs scored each digit span repetition live. To be credited with correct repetition children were required to repeat each number of the series in the correct order.
Each correct repetition of a series scored one point. The task continued until the child failed to repeat either series of numbers at the same span length.

2.5. Reliability

Audiotapes or WAV files from 40% of the nonword repetition and 40% of the digit span sessions were selected for independent scoring by a trained listener. Half of the sessions were from children with SLI and half from children with NL. The average point-to-point agreement for correct/incorrect scoring was 94% (range: 70–100%) for nonword repetition and 98% (range: 75–100%) for digit span.

3. Results

A repeated-measures mixed-factorial ANOVA was used to assess between- and within-group differences across administrations of the nonword repetition task. The between-group factors were language group (NL, SLI) and nonword list (A, B), and the within-group factor was time of administration (1, 2, 3). With alpha set at 0.05, preliminary data analysis revealed no significant difference for the nonword lists. These data sets, therefore, were collapsed across language groups. Table 4 reports the mean group scores for the nonword repetition task. Fig. 1 illustrates results for each syllable length. The language groups differed significantly in the number of nonwords repeated correctly at all three times, F(1, 42) = 146.81, P = 0.0001, with the NL group repeating more words correctly. There was a significant within-group difference for time of administration, F(42, 2) = 3.89, P = 0.0242, but no significant time × group interaction. Performance for both groups increased from administration time 1 to 2 and declined slightly from administration time 2 to 3. Effect sizes (Cohen's d; Cohen, 1988) were calculated to determine the degree to which the number of nonwords repeated correctly differed across time for each group. For this metric, Cohen proposed that effects of 0.20, 0.50, and 0.80 represent small, medium, and large effect sizes, respectively.
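As an illustration of the effect size metric, Cohen's d can be computed from the group means and standard deviations alone. The sketch below is hedged: the paper does not state which standard deviation was used as the standardizer, so the function name and both standardizer options are assumptions. The input values are the SLI group's nonword repetition means and S.D.s at times 1 and 2 from Table 4.

```python
import math


def cohens_d(mean_a, mean_b, sd_a, sd_b, standardizer="pooled"):
    """Standardized mean difference (mean_b - mean_a) / SD for two administrations."""
    if standardizer == "pooled":
        sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)  # pooled SD for equal group sizes
    elif standardizer == "second":
        sd = sd_b  # SD of the later administration
    else:
        raise ValueError("standardizer must be 'pooled' or 'second'")
    return (mean_b - mean_a) / sd


# SLI group, nonword repetition, time 1 vs. time 2 (values from Table 4)
d_pooled = cohens_d(7.00, 8.68, 2.86, 3.39)
d_second = cohens_d(7.00, 8.68, 2.86, 3.39, standardizer="second")
print(f"pooled SD: d = {d_pooled:.2f}; time 2 SD: d = {d_second:.2f}")
```

Both choices give a value close to 0.5, in line with the 0.49 reported for this comparison. Because the same children were tested at each administration, a standardizer based on the S.D. of the difference scores would also be defensible, but that statistic cannot be recovered from the summary tables.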
The effect sizes for the SLI and NL groups were 0.49 and 0.30, respectively, from administration 1 to 2, and 0.01 and 0.1 from administration 2 to 3.

Table 4
Nonword repetition scores across administrations

              Mean    S.D.   Minimum   Maximum
SLI group
  Time 1      7.00    2.86     3.00     17.00
  Time 2      8.68    3.39     3.00     18.00
  Time 3      8.18    3.82     1.00     15.00
NL group
  Time 1     15.86    1.73    13.00     18.00
  Time 2     16.41    1.79    12.00     19.00
  Time 3     16.23    2.14    11.00     20.00

A second repeated-measures mixed-factorial ANOVA assessed between- and within-group differences for digit span performance across administrations. The between-group factor was language group, and the within-group factor was time of administration. Table 5 reports the mean group scores. The language groups differed significantly in the number of series repeated correctly at all three times, F(1, 42) = 33.05, P = 0.0001, with the NL group repeating more series correctly. There was a significant within-group difference for time of administration, F(42, 2) = 4.93, P = 0.0095, but no significant time × group interaction. Performance for both groups increased from administration time 1 to 2; from administration time 2 to 3 it declined slightly for the SLI group but increased for the NL group. The effect sizes associated with these differences were 0.50 and 0.02 for the SLI and NL groups from time 1 to 2, and 0.03 and 0.20 from time 2 to 3.

Table 5
Digit span scores across administrations

              Mean    S.D.   Minimum   Maximum
SLI group
  Time 1      1.36    0.85     0.00      3.00
  Time 2      1.86    0.99     0.00      4.00
  Time 3      1.82    1.05     1.00      4.00
NL group
  Time 1      3.68    1.39     2.00      7.00
  Time 2      3.73    1.78     1.00      9.00
  Time 3      4.09    1.74     1.00      8.00

Fig. 1. Mean number of nonwords repeated correctly by the SLI and NL groups across administrations.

3.1. Discrimination

To assess whether performance on the nonword repetition or digit span tasks accurately discriminated children with NL and SLI, and to compare accuracy with the SPELT-II, discriminant function analyses were conducted for each measure for each time of administration. Table 6 reports the results of these analyses. The first administration of the nonword repetition task using the total number of nonwords repeated correctly resulted in excellent sensitivity and specificity, with 95% of the children with SLI identified as SLI, and 100% of the children with NL identified as NL. Sensitivity and specificity decreased slightly on the second administration, but remained high. Specificity was further reduced on the third administration. Although digit span performance was a less accurate discriminator overall than nonword repetition, the level of sensitivity was high on the first and third administrations. Both phonological memory measures demonstrated better sensitivity overall than the norm-referenced SPELT-II.

Table 6
Results of discriminant function analyses showing accuracy of language group classification using nonword repetition, digit span, K-ABC and SPELT-II scores

                      SLI                          NL
                      Sensitivity    Error         Specificity     Error
Nonword repetition
  Time 1              21/22 (95%)    1/22 (5%)     22/22 (100%)    0/22 (0%)
  Time 2              20/22 (91%)    2/22 (9%)     21/22 (95%)     1/22 (5%)
  Time 3              21/22 (95%)    1/22 (5%)     19/22 (86%)     3/22 (14%)
Digit span
  Time 1              20/22 (91%)    2/22 (9%)     17/22 (77%)     5/22 (23%)
  Time 2              17/22 (77%)    5/22 (23%)    17/22 (77%)     5/22 (23%)
  Time 3              20/22 (91%)    2/22 (9%)     16/22 (73%)     6/22 (27%)
SPELT-II              18/22 (82%)    4/22 (18%)    21/22 (95%)     1/22 (5%)

Note: K-ABC = Kaufman Assessment Battery for Children (Kaufman & Kaufman, 1983); SPELT-II = Structured Photographic Expressive Language Test-II (Werner & Krescheck, 1983).

3.2. Test–retest reliability and task-test correlations

To provide an index of test–retest reliability, Pearson correlation coefficients were calculated for the first, second, and third administrations of the nonword repetition and digit span tasks. These results, as well as test-task correlations, are reported in Table 7 for the SLI group and Table 8 for the NL group. Recall that children repeated the same list of nonwords at time 1 and time 2, and a different list of nonwords at time 3, with the lists counterbalanced across groups. Nonword repetition scores were significantly correlated at each time of administration for the SLI group, with the strongest correlation from time 1 to 2 (0.72). Neither the time 1–2 (0.40) nor the time 2–3 (0.22) correlation was significant for the NL group; however, the time 1–3 correlation (0.49) reached significance at the P < 0.05 level. Digit span scores were significantly correlated at each time of administration for both groups, with considerably stronger correlations for the NL group (0.85–0.87) than the SLI group (0.48–0.57).

One caveat to using correlation as a reliability measure is that a high correlation can arise even when scores across a group have improved or declined at the same rate, because correlation reflects the consistency of children's relative standing rather than the stability of their absolute scores. To quantify the direction of change, Fig. 2 illustrates the number of children from each group whose scores increased, stayed the same, or decreased by 1 S.E.M. across administrations of each task. More children from the SLI group than the NL group increased their scores from time 1 to 2 for both tasks.

Fig. 2. Score changes as indexed by 1 S.E.M. or more for the SLI and NL groups across administrations of the nonword repetition and digit span tasks.

In general, performance on speech and language tests was not significantly correlated with phonological memory task performance for the SLI group. This is important to note because SLI and NL group scores differed significantly. The only significant correlations were between the SPELT-II and nonword repetition at time 3, and between the PPVT-III and nonword repetition at time 3. The relationship between test and task performance was stronger and more prevalent for the NL group, with the SPELT-II significantly correlated with nonword repetition at time 2 and digit span at times 2 and 3. Likewise, the BBTOP was significantly correlated with nonword repetition at time 2. K-ABC performance was significantly correlated with digit span at time 3.

Table 7
Correlations among nonword repetition, digit span, and test scores for the SLI group

            NW1        NW2       NW3      DS1       DS2       DS3     K-ABC   SPELT-II   BBTOP   PPVT-III
NW1         –
NW2         0.72****   –
NW3         0.50*      0.52**    –
DS1         0.37       0.19      0.04     –
DS2         0.54**     0.48*     0.40     0.48*     –
DS3         0.63**     0.37      0.29     0.57**    0.56**    –
K-ABC       0.09       0.06      0.18     0.11      0.05      0.13    –
SPELT-II    0.30       0.36      0.49*    0.28      0.23      0.23    0.23    –
BBTOP       0.05       0.24      0.25     0.11      0.14      0.10    0.21    0.12       –
PPVT-III    0.44       0.31      0.45*    0.38      0.16      0.24    0.23    0.75****   0.38    –

Note: NW1 = nonword repetition task, first administration; NW2 = nonword repetition task, second administration; NW3 = nonword repetition task, third administration; DS1 = digit span task, first administration; DS2 = digit span task, second administration; DS3 = digit span task, third administration; K-ABC = Nonverbal Scale of the Kaufman Assessment Battery for Children (Kaufman & Kaufman, 1983) (M = 100; S.D. = 15); SPELT-II = Structured Photographic Expressive Language Test-II (Werner & Krescheck, 1983) (M = 100; S.D. = 15); BBTOP = Bankson–Bernthal Test of Phonology Word Inventory Raw Score (Bankson & Bernthal, 1990); PPVT-III = Peabody Picture Vocabulary Test-Third Edition (Dunn & Dunn, 1997).
* P < 0.05. ** P < 0.01. **** P < 0.0001.

Table 8
Correlations among nonword repetition, digit span, and test scores for the NL group

            NW1      NW2        NW3     DS1        DS2        DS3      K-ABC   SPELT-II   BBTOP
NW1         –
NW2         0.40     –
NW3         0.49*    0.22       –
DS1         0.26     0.51*      0.05    –
DS2         0.17     0.66***    0.07    0.87****   –
DS3         0.15     0.58**     0.06    0.85****   0.87****   –
K-ABC       0.06     0.19       0.01    0.41       0.39       0.48*    –
SPELT-II    0.14     0.57**     0.15    0.34       0.60**     0.57**   0.41    –
BBTOP       0.38     0.63**     0.03    0.30       0.40       0.32     0.02    0.30       –

Note: NW1 = nonword repetition task, first administration; NW2 = nonword repetition task, second administration; NW3 = nonword repetition task, third administration; DS1 = digit span task, first administration; DS2 = digit span task, second administration; DS3 = digit span task, third administration; K-ABC = Nonverbal Scale of the Kaufman Assessment Battery for Children (Kaufman & Kaufman, 1983) (M = 100; S.D. = 15); SPELT-II = Structured Photographic Expressive Language Test-II (Werner & Krescheck, 1983) (M = 100; S.D. = 15); BBTOP = Bankson–Bernthal Test of Phonology Word Inventory Raw Score (Bankson & Bernthal, 1990).
* P < 0.05. ** P < 0.01. *** P < 0.001. **** P < 0.0001.

4. Discussion

This preliminary study evaluated the usefulness of nonword repetition and digit span as diagnostic measures of SLI in a younger sample of children than previously studied, and assessed the validity and reliability of these measures. As in previous studies, the SLI group's performance was significantly poorer than that of age-matched NL peers on both tasks.

Ideally, a diagnostic measure demonstrates high levels of sensitivity and specificity and is quick and simple to administer and to score. The CNRep lists, delivered via computer, using whole-word scoring, appeared to achieve this goal. Dollaghan and Campbell (1998) also found near perfect discrimination of schoolage children with LI and NL using their 16-nonword list. Use of the CNRep lists in the present study permitted comparison of alternate forms, where no significant difference was found between lists. These findings suggest that different nonword stimuli may provide accurate discrimination. Because the CNRep stimuli are more "wordlike," however, performance on the CNRep lists may reflect a greater influence of prior language knowledge than Dollaghan and Campbell's 16-nonword list. This relationship is usually documented by correlating language test scores with task performance. In the present study, scores on the SPELT-II and PPVT-III were significantly correlated only with the third administration of the nonword repetition task for the SLI group. This was a small sample, however, and the language tests measure only limited aspects of language knowledge.

The whole-word scoring utilized in this study resulted in accurate discrimination, as did Dollaghan and Campbell's "Percentage of Phonemes Correct" scoring. The latter procedure may require more time than whole-word scoring, especially as the age of the children studied decreases and the incidence of articulation and phonological errors increases.
If the purpose of a study is to investigate phonological memory, phoneme-by-phoneme scoring would provide the more definitive description of performance. If the purpose is to provide a measure that discriminates children with and without SLI, whole-word scoring may prove effective, and perhaps more efficient for clinicians.

To further increase efficiency of discrimination between children with SLI and NL, investigators need to determine the minimum number of nonwords needed for accurate discrimination. In their first study, Dollaghan and Campbell (1998) reported that repetition of three- and four-syllable nonwords, but not repetition of one- and two-syllable nonwords, was significantly lower for the LI than the NL group in their study of schoolage children. In study 2, repetition of all nonwords resulted in the most accurate classification of children with LI or NL, but repetition of three- and four-syllable nonwords also resulted in high levels of classification accuracy. Ellis Weismer et al. (2000) reported similar findings for LI groups defined by diagnosis at second grade using test scores, or defined by enrollment in language intervention. Between-group differences were most pronounced on three- and four-syllable nonwords. In the present study, the total number of nonwords repeated correctly provided the best levels of sensitivity and specificity, with performance on two-syllable words providing the least. These results suggest that nonword lists of three-, four-, and five-syllable words might prove most effective for diagnostic purposes for preschool and schoolage children, and that 20 or fewer nonwords may be needed to accomplish accurate discrimination.

Nonword repetition diagnostic accuracy surpassed that of the SPELT-II.
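The sensitivity and specificity figures behind this comparison follow directly from the classification counts in Table 6. The sketch below recomputes them for the first nonword repetition administration and for the SPELT-II; the class and field names are illustrative assumptions, and only the counts come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    """Counts from a two-group classification (names are hypothetical)."""
    true_pos: int   # children with SLI classified as SLI
    false_neg: int  # children with SLI classified as NL
    true_neg: int   # children with NL classified as NL
    false_pos: int  # children with NL classified as SLI

    @property
    def sensitivity(self) -> float:
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        return self.true_neg / (self.true_neg + self.false_pos)


# Counts taken from Table 6
nonword_time1 = Classification(true_pos=21, false_neg=1, true_neg=22, false_pos=0)
spelt_ii = Classification(true_pos=18, false_neg=4, true_neg=21, false_pos=1)

for label, result in [("Nonword repetition, time 1", nonword_time1), ("SPELT-II", spelt_ii)]:
    print(f"{label}: sensitivity {result.sensitivity:.0%}, specificity {result.specificity:.0%}")
# Nonword repetition, time 1: sensitivity 95%, specificity 100%
# SPELT-II: sensitivity 82%, specificity 95%
```

With 22 children per group, each misclassified child shifts sensitivity or specificity by about 4.5 percentage points, which is worth keeping in mind when comparing measures that differ by only one or two children.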
Previously the SPELT-II has demonstrated sensitivity and specificity levels above 90%, but not as high as the 95% levels demonstrated in the present study on the first administration of the nonword repetition task. This finding lends support to the suggestion by Dollaghan and Campbell (1998) that nonword repetition may prove to be a more accurate diagnostic measure of LI than norm-referenced language tests.

The preschool SLI group in this study, similar to the schoolage LI group in the Dollaghan and Campbell (1998) study, was selected in part using the standard of enrollment in language intervention services. The high level of identification accuracy found in this study may not be replicated in children with suspected SLI who are not enrolled in services. This was the case in the Ellis Weismer et al. (2000) study, in which nonword repetition provided better discrimination for children classified with LI because they were receiving treatment than for children classified with LI based on test scores alone. Nevertheless, if nonword repetition tasks prove effective, reliable, less biased, and more efficient than norm-referenced language tests for diagnosing impairment in identified populations from preschool through elementary school, this would improve our current standard of practice. Further research is needed to document success in unidentified populations.

To be considered reliable, a diagnostic measure must produce stable scores across administrations. In the present study, as in Edwards and Lahey's (1998) nonword study, scores on both the nonword repetition and digit span tasks improved significantly when the same list was administered a second time to children in both language groups. Unlike in Edwards and Lahey's study, the SLI group appeared to benefit more from repetition than the NL group. These findings suggest a differential practice effect for children with SLI that should be taken into consideration when designing any diagnostic measure.
Effect sizes indicate that the changes in scores from second to third administration were not marked for either group, despite children's opportunity to practice repeating words during the intervening time.

The temporal consistency of scores across administrations, or test–retest reliability, was indexed for both nonword repetition and digit span by calculating the degree of correlation between successive administrations of the tasks. The Pearson product-moment correlations for administrations 1, 2, and 3 varied between groups for the nonword repetition task, and for the NL group were lower than the correlations for the digit span task. Because the first and second administrations of both tasks were only 1 day apart, high correlations would be expected for both groups; yet the nonword repetition task correlation for the NL group from time 1 to 2 was not significant. These results, taken together with the significant difference in group performance from first to second administrations of the tasks, raise concerns about the test–retest reliability of these measures.

Researchers and clinicians might improve score stability by building more practice into nonword repetition tasks before administering scored items. The differential improvement in scores between groups might also reflect poorer test-taking skills by the SLI group that were improved with practice. Score increases, which were primarily the result of better performance on one- and two-syllable nonwords on the nonword repetition task, might also reflect the benefit of production practice from administration 1 to 2. Although increased practice may improve score stability, it may also reduce discriminant accuracy because between-group differences may diminish if SLI group scores increase more than NL group scores.
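Two points from this discussion of test–retest reliability can be illustrated with a short sketch: a uniform practice gain leaves the correlation untouched, and whether a given correlation reaches significance depends on sample size. The code is a hedged illustration only. The scores are invented, the function names are hypothetical, and the significance check assumes that all 22 children in a group contributed to each correlation, which the paper does not state explicitly. The critical value 2.086 is the two-tailed t value for 20 degrees of freedom at alpha = 0.05.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for testing a correlation against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def is_significant(r, n, critical_t):
    return abs(t_for_r(r, n)) > critical_t


# Invented scores: every child gains exactly 2 points at retest.
time1 = [5, 7, 9, 12, 6]
time2 = [score + 2 for score in time1]
print(round(pearson_r(time1, time2), 3))  # 1.0 despite the change in mean score

# NL group nonword repetition correlations reported in Table 8, assuming n = 22.
CRITICAL_T_DF20 = 2.086  # two-tailed, alpha = 0.05
for r in (0.40, 0.49):
    print(r, round(t_for_r(r, 22), 2), is_significant(r, 22, CRITICAL_T_DF20))
```

Under these assumptions the check agrees with the significance markers for the NL group: 0.40 falls short of the criterion and 0.49 exceeds it. It also shows why a correlation coefficient and a test of mean change answer different questions about score stability.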
Performance on the nonword repetition and digit span tasks was significantly correlated at both times of administration, and at higher levels than previously reported by Gathercole et al. (1994). Nevertheless, nonword repetition proved to be the better diagnostic measure of SLI. Although sensitivity was comparable between the two measures, digit span specificity was lower than that of nonword repetition.

In summary, nonword repetition performance holds promise as an identifier of SLI in preschool as well as schoolage children; however, acceptable levels of specificity and sensitivity must be demonstrated across a range of language impairment severity levels. The test–retest reliability of both nonword repetition and digit span requires further investigation before these measures could be used for diagnostic purposes. In particular, evaluation of different nonword lists and varying amounts of time between administrations is needed for all age groups. SLI group performance improved more than NL group performance on both phonological memory tasks, suggesting that practice effects should be taken into consideration when refining nonword repetition as a future diagnostic measure of SLI. The use of whole-word scoring may prove both efficient and effective for discriminating children with and without SLI.

Acknowledgments

This research was supported by funding from the Tucson Scottish Rite Charitable Foundation, by National Multipurpose Research and Training Center Grant DC01409 from the National Institutes of Health NIDCD, and by National Institutes of Health NIDCD 1 R03 DC04240-01. Thank you to the children, families, teachers and administrators from The Scottish Rite–University of Arizona Child Language Center, Little Ranch School, and Castle Hill School who participated in this project, and to the graduate and undergraduate students who administered the assessments and research tasks.
Special thanks to Becky Vance for her invaluable expertise and support and to Linda Swisher, David Ingram, and an anonymous reviewer for review of the manuscript.

Appendix A. Continuing education

1. Two tasks used to assess children's phonological memory include:
   A. Alphabet recitation and story retell.
   B. Nonword repetition and digit span.
   C. The rainbow passage and story retell.
   D. Digit span and alphabet recitation.
   E. Story retell and digit span.

2. To be considered a valid diagnostic measure, performance on a test or task must:
   A. Improve over time.
   B. Exceed a standard score of 95.
   C. Accurately discriminate children with and without the target disorder.
   D. Be an enjoyable task for the child.
   E. Show higher mean scores for normal groups than impaired groups.

3. To be considered a reliable diagnostic measure, performance on a test or task must:
   A. Improve with practice.
   B. Increase with the child's age.
   C. Be easy to administer.
   D. Be computer scored.
   E. Be highly correlated across repeated administrations.

4. In this study performance on the nonword repetition task changed from administration 1 to administration 2 as follows:
   A. The SLI group mean score improved more than the NL group mean score.
   B. The NL group mean score improved more than the SLI group mean score.
   C. Mean scores for both groups declined equally.
   D. Mean scores for both groups increased equally.
   E. The NL group mean score improved but the SLI group mean score declined.

5. The findings of this study suggest that:
   A. Digit span tasks are superior to nonword repetition tasks for identifying SLI in preschoolers.
   B. Nonword repetition tasks hold promise as an identifier of SLI in preschool children.
   C. Standardized language tests demonstrate better sensitivity and specificity than nonword repetition tasks.
   D. Practice has no effect on task performance.
   E. Nonword repetition is too difficult for preschoolers.

References

American National Standards Institute. (1989). Specifications for audiometers (ANSI S3.6-1989). New York: ANSI.
Bankson, N. W., & Bernthal, J. E. (1990). Bankson–Bernthal Test of Phonology. Chicago, IL: The Riverside Publishing Company.
Bishop, D., Bishop, S., Bright, P., James, C., Delaney, T., & Tallal, P. (1999). Different origin of auditory and phonological processing problems in children with language impairment: Evidence from a twin study. Journal of Speech, Language, and Hearing Research, 42, 155–168.
Bishop, D. V. M., North, T. L., & Donlan, C. (1996). Nonword repetition as a behavioural marker for inherited language impairment: Evidence from a twin study. Journal of Child Psychology and Psychiatry, 36, 1–13.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Dollaghan, C., Biber, M., & Campbell, T. (1995). Lexical influences on nonword repetition. Applied Psycholinguistics, 16, 211–222.
Dollaghan, C., & Campbell, T. (1998). Nonword repetition and child language impairment. Journal of Speech, Language, and Hearing Research, 41, 1136–1146.
Dunn, L., & Dunn, L. (1997). Peabody Picture Vocabulary Test-Third Edition. Circle Pines, MN: American Guidance Service.
Edwards, J., & Lahey, M. (1998). Nonword repetitions of children with specific language impairment: Exploration of some explanations for their inaccuracies. Applied Psycholinguistics, 19, 279–309.
Ellis Weismer, S., Tomblin, J. B., Zhang, X., Buckwalter, P., Chynoweth, J. G., & Jones, M. (2000). Nonword repetition performance in schoolage children with and without language impairment. Journal of Speech, Language, and Hearing Research, 43, 865–878.
Gathercole, S. E., & Baddeley, A. D. (1989). Evaluation of the role of phonological STM in the development of vocabulary in children: A longitudinal study. Journal of Memory and Language, 28, 200–213.
Gathercole, S. E., & Baddeley, A. D. (1990a). Phonological memory deficits in language disordered children: Is there a causal connection? Journal of Memory and Language, 29, 336–360.
Gathercole, S. E., & Baddeley, A. D. (1990b). The role of phonological memory in vocabulary acquisition: A study of young children learning new names. British Journal of Psychology, 81, 439–454.
Gathercole, S., & Baddeley, A. (1995). Short-term memory may yet be deficient in children with language impairments: A comment on van der Lely & Howard (1993). Journal of Speech and Hearing Research, 38, 463–466.
Gathercole, S. E., Hitch, G. J., Service, E., & Martin, A. J. (1997). Phonological short-term memory and new word learning in children. Developmental Psychology, 6, 966–979.
Gathercole, S. E., Service, E., Hitch, G. J., Adams, A., & Martin, A. (1999). Phonological short-term memory and vocabulary development: Further evidence on the nature of the relationship. Applied Cognitive Psychology, 13, 65–77.
Gathercole, S. E., Willis, C., & Baddeley, A. D. (1991). Differentiating phonological memory and awareness of rhyme: Reading and vocabulary development in children. British Journal of Psychology, 82, 387–406.
Gathercole, S., Willis, C., Baddeley, A., & Emslie, H. (1994). The Children's Test of Nonword Repetition: A test of phonological working memory. Memory, 2, 103–127.
Gathercole, S. E., Willis, C., Emslie, H., & Baddeley, A. (1992). Phonological memory and vocabulary development during the early school years: A longitudinal study. Developmental Psychology, 28, 887–898.
Hammill, D., & Newcomer, P. (1988). Test of Language Development Intermediate: 2. Austin, TX: Pro-Ed.
Howard, D., & van der Lely, H. (1995). Specific language impairment in children is not due to a short-term memory deficit: Response to Gathercole & Baddeley. Journal of Speech and Hearing Research, 38, 466–472.
Kaufman, A. S., & Kaufman, N. L. (1983). Kaufman Assessment Battery for Children. Circle Pines, MN: American Guidance Service.
Montgomery, J. (1995). Sentence comprehension in children with specific language impairment: The role of phonological working memory. Journal of Speech and Hearing Research, 38, 187–199.
Plante, E., & Vance, R. (1994). Selection of preschool language tests: A data-based approach. Language, Speech, and Hearing Services in Schools, 25, 15–24.
Tomblin, J. B., Records, N., Buckwalter, P., Zhang, X., Smith, E., & O'Brien, M. (1997). Prevalence of specific language impairment in kindergarten children. Journal of Speech, Language, and Hearing Research, 40, 1245–1260.
van der Lely, H., & Howard, D. (1993). Children with specific language impairment: Linguistic impairment or short-term memory deficit? Journal of Speech and Hearing Research, 36, 1193–1207.
Wechsler, D. (1991). Wechsler Intelligence Scale for Children-III (WISC-III). San Antonio, TX: Psychological Corporation.
Werner, E., & Krescheck, J. D. (1983). Structured Photographic Expressive Language Test-II. Sandwich, IL: Janelle Publications.