Journal of Communication Disorders
36 (2003) 129–151
Diagnostic accuracy and test-retest reliability
of nonword repetition and digit span tasks
administered to preschool children with
specific language impairment
Shelley Gray*
Department of Speech and Hearing Sciences and The National Center for Neurogenic
Communication Disorders, The University of Arizona, Tucson, AZ, USA
Received 9 August 2002; received in revised form 28 December 2002; accepted 2 January 2003
Abstract
To assess diagnostic accuracy and test-retest reliability, two forms of a nonword
repetition task were administered to 22 preschool children with specific language
impairment (SLI) and to 22 age- and gender-matched children with normal language
(NL). Results were compared with performance on a digit span task and norm-referenced
test scores. Nonword repetition scores provided excellent sensitivity and specificity for
discriminating between groups. Scores on both nonword repetition and digit span tasks
improved significantly from first to second administrations for both groups, but remained
relatively stable at the third administration. The SLI group appeared to benefit more from
repetition than the NL group. Acceptable levels of test-retest reliability were achieved for
the digit span task, but not for the NL group on the nonword repetition task. These
preliminary findings suggest that with further refinement to improve test-retest reliability,
nonword repetition holds promise as a diagnostic measure for SLI in preschool
children.
Educational objectives: As a result of this activity, the participant will be able to (1)
describe the content and administration of nonword repetition tasks; (2) explain why
evidence of test-retest reliability is necessary before a measure may be considered reliable
for diagnostic purposes; and (3) accurately compare the sensitivity and specificity of the
nonword repetition task utilized in this study to standardized language test scores.
© 2003 Elsevier Science Inc. All rights reserved.
Keywords: Test-retest reliability; Nonword repetition; Specific language impairment; Preschoolers

* Present address: Department of Speech and Hearing Science, Arizona State University, P.O. Box
870102, Tempe, AZ 85287-0102, USA. Tel.: +1-480-965-6796; fax: +1-480-965-8516.
E-mail address: Shelley.Gray@asu.edu (S. Gray).
0021-9924/03/$ - see front matter © 2003 Elsevier Science Inc. All rights reserved.
doi:10.1016/S0021-9924(03)00003-0

1. Introduction
Measures designed to quantify short-term phonological memory have been
used increasingly to investigate the language skills of children. Over a decade ago,
Gathercole, Baddeley, and colleagues (e.g., Gathercole & Baddeley, 1990b;
Gathercole, Hitch, Service, & Martin, 1997; Gathercole, Willis, Baddeley, &
Emslie, 1994) explored the relationship between phonological short-term memory and language acquisition using nonword repetition and digit span tasks. Their
work demonstrated a strong relationship between retention of phonological
information and vocabulary acquisition. This relationship appears in children
as young as 4 years of age (e.g., Gathercole & Baddeley, 1989), and continues into
adolescence (Gathercole, Service, Hitch, Adams, & Martin, 1999).
In their research with children diagnosed with specific language impairment
(SLI), Gathercole and Baddeley (1990a, 1990b) concluded that there was a
"causal" connection between poor phonological short-term memory and poor
vocabulary acquisition (although there is disagreement about this as discussed by
Gathercole & Baddeley, 1995; Howard & van der Lely, 1995; van der Lely &
Howard, 1993). Since that time, a number of studies have documented that
children with language impairment (LI)¹ perform more poorly on phonological
memory measures than their peers with normal language (NL) (Bishop, North, &
Donlan, 1996; Dollaghan & Campbell, 1998; Edwards & Lahey, 1998; Ellis
Weismer et al., 2000; Gathercole & Baddeley, 1990a; Montgomery, 1995).
Nonword repetition and digit span are both used to assess children's short-term
phonological memory. Nonword repetition tasks require a child to listen to a
series of nonsense words of varied lengths, and to repeat them correctly. Digit
span tasks require a child to listen to single digit numbers presented in series of
increasing lengths and to repeat them in the correct sequence. The maximum
length (i.e., maximum number of digits) that a child repeats correctly constitutes "span."
Gathercole et al. (1997) posited that nonword repetition provided a "purer"
assessment of short-term phonological memory than digit span. They reasoned
that, because nonwords are unfamiliar, unlike the number names used in digit span,
prior lexical knowledge would not be used to supplement temporary representations
in the phonological loop. Subsequent research demonstrated that nonword repetition
performance may depend not only on phonological memory ability, but also on
accumulated lexical knowledge. Gathercole and Baddeley (1990b) and Dollaghan,
Biber, and Campbell (1995) demonstrated that the more "wordlike" the
nonwords, the easier they are to repeat. The authors concluded that children with
more developed lexicons may have an advantage in nonword repetition tasks
because they can utilize words stored in long-term memory to help them
remember nonwords. Thus, the "nonlexical" advantage of nonword repetition
over digit span as a measure of phonological memory was called into question.
Even so, nonword repetition may resemble the challenge a child faces when hearing
new words for the first time more closely than digit span does.

¹ Authors use the terms language impairment or specific language impairment in different studies
to describe their participants' diagnoses. The authors' use of LI or SLI is reproduced in this paper.

Previous research employed nonwords of varying phonemic structures and
lengths. In 1990, Gathercole and Baddeley (see Gathercole et al., 1994 for a
detailed description) developed the Children's Test of Nonword Repetition
(CNRep). It included 40 nonwords (10 each containing one, two, three, or four
syllables). The original list of nonwords was later revised by replacing the one-syllable
nonwords with five-syllable nonwords. The nonwords were designed to "conform
to the phonotactic rules of English" and to the "dominant syllable stress patterns
in English for words of that length." Researchers presented the nonwords via
audiotape then immediately scored each repetition as correct or incorrect. If a
child consistently pronounced one phoneme as another it was not counted as an
error. The researchers acknowledged that a live system of whole-word scoring
was approximate. Nevertheless, reliability calculations based on 104 4-year-old
children from the sample of children who received the original nonword list were
reportedly found to be high, with 97% agreement between live and tape-recorded
responses based on whole-word, correct/incorrect scoring.
Gathercole et al. (1994) published cross-sectional CNRep data for normally
developing children ages 4 (N = 142) through 9 (N = 16) years. This study found
that children's CNRep scores increased through age 8. Scores on the CNRep were
reportedly "highly and significantly" correlated with digit span at all ages tested:
age 4, r = 0.524; age 5, r = 0.667 (Gathercole, Willis, & Baddeley, 1991) and
age 8, r = 0.445 (Gathercole, Willis, Emslie, & Baddeley, 1992). Test-retest
reliability was considered "satisfactory for psychometric purposes" with correlation
coefficients of 0.77 for 5-year-olds and 0.80 for 7-year-olds when the
CNRep was readministered after a 4-week interval.
Early in its development, Gathercole and Baddeley (1990a) administered the
first version of the CNRep to 5-year-old children with SLI, and found that they
performed more poorly than children with NL matched for nonverbal cognitive ability
and more poorly than younger children with NL matched for language level. Bishop et al.
(1996) reported similar findings using the CNRep in a twin study of 39 7- to 9-year-old
children with "persistent" LI and 13 children with "resolved" LI (no longer
enrolled in speech-language therapy), and in a second related twin study (Bishop
et al., 1999). In these studies, Bishop and colleagues administered the nonwords
live rather than via audiotape, but used the same whole-word scoring procedure.
Montgomery (1995) developed a list of 48 nonwords (12 each containing one,
two, three, or four syllables) to explore the relationship between phonological
working memory and sentence comprehension. The nonwords were presented to school-age
children (ages 5-11 years) via audiotape in random order. Four practice items
were presented prior to the actual nonwords. Unlike the CNRep procedure, the nonwords
were repeated once if requested by the child, and children were allowed to
attempt two repetitions of the nonwords. Researchers audiotaped responses, with
the final production scored for correct/incorrect repetition of the whole word.
Point-to-point agreement between the original and second transcription of 50%
of the NL and 50% of the SLI tapes resulted in 97% and 94% reliability,
respectively. As in previous studies, children with SLI performed more poorly
than their NL peers. They correctly repeated significantly fewer three- and four-syllable
nonwords, but between-group performance did not differ for one- or
two-syllable nonwords.
Edwards and Lahey (1998) utilized a list of six nonwords (three each containing three or four syllables) in a nonword repetition experiment designed to
investigate variables that might explain nonword repetition inaccuracies exhibited
by 6- and 7-year-old children with SLI. The nonwords "obeyed the phonotactic
constraints of English" but contained no stressed syllables that were real words.
One of the proposed reasons for NL groups outperforming SLI groups is that they
have more experience producing varied phonological sequences such as found in
nonword tasks. To investigate this possibility, Edwards and Lahey provided
practice by presenting each nonword four successive times. Both the SLI group
and the NL control group improved the accuracy of repetitions across administrations.
However, the SLI group performed significantly more poorly than the NL
control group each time.
The consistently poorer performance by SLI groups relative to NL groups
across studies gave researchers the impetus to explore whether performance on
nonword repetition tasks might provide a less culturally biased method of
contributing to the diagnosis of SLI than currently available norm-referenced
language tests. In a two-study series, Dollaghan and Campbell (1998) investigated
whether nonword repetition could be used as a screening measure for LI in
school-age children. To address the concern that nonword repetition results might
be influenced by "wordlikeness" or articulatory difficulty, Dollaghan and Campbell
(1998) developed a list of 16 nonwords (four at each length of one, two, three,
and four syllables). According to the authors, "neither the nonwords nor their
constituent syllables correspond[ed] to lexical items." Each nonword comprised early
developing, acoustically salient phonemes. No specific consonant or vowel occurred
more than once within the same nonword. The stimuli were recorded in order to
standardize presentation, with one-syllable nonwords presented first, progressing in
length to four-syllable nonwords. Responses were audiotaped for later
phoneme-by-phoneme scoring, resulting in a dependent measure of "Percentage of
Phonemes Correct" rather than number of whole nonwords repeated correctly. All phoneme
substitutions were scored as incorrect. High scoring reliability was reported using
this procedure (94% agreement for judgment of correctness).
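
To make the phoneme-level measure concrete, the following is a minimal sketch, not Dollaghan and Campbell's scoring procedure. It assumes each nonword and each response has already been transcribed as a list of phoneme labels and compares them position by position, which is a simplification (a human scorer aligns omissions and additions by ear). The function name and the example transcription are hypothetical.

```python
def percent_phonemes_correct(target, response):
    """Percentage of target phonemes reproduced in the same position.

    Assumption: simple position-by-position comparison. Substitutions and
    omitted phonemes both count as incorrect.
    """
    if not target:
        raise ValueError("target must contain at least one phoneme")
    # zip() stops at the shorter sequence, so missing phonemes earn no credit.
    correct = sum(1 for t, r in zip(target, response) if t == r)
    return 100.0 * correct / len(target)


# Hypothetical four-phoneme nonword with one substitution (/b/ for /v/).
print(percent_phonemes_correct(["t", "a", "v", "i"], ["t", "a", "b", "i"]))
```

Under whole-word scoring the same response would simply be scored incorrect, so the phoneme-level measure retains partial information that whole-word scoring discards.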
In study 1 the authors compared the nonword repetition performance of 20 6- to
9-year-old children enrolled in language intervention with 20 age-matched
peers with NL, and found no overlap in scores between the two groups.
Importantly, there was also no significant difference in performance between
the 25 African American participants and the 9 White participants in this study.
According to Dollaghan and Campbell, this finding provided evidence that
processing-dependent measures such as nonword repetition may be less culturally-biased than commonly used norm-referenced tests.
In study 2, 85 children ages 5 to 12 years (44 LI, 41 NL), including the 40
age-matched children from study 1, completed the same nonword repetition
task. Results indicated that children receiving language therapy were significantly
poorer at repeating nonwords than their NL peers. Results of likelihood ratio
analyses utilizing performance on three- and four-syllable words demonstrated
that children with and without LI could be accurately identified based on their
nonword repetition performance. In fact, nonword repetition performance was
more accurate than the Test of Language Development Intermediate: 2 (TOLD-I:2)
(Hammill & Newcomer, 1988) in predicting language group status.
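
For readers unfamiliar with likelihood ratios, the sketch below shows the general textbook definitions computed from sensitivity and specificity. It is an illustration only and does not reconstruct the specific analysis reported by Dollaghan and Campbell (1998); the function name and example values are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) for a dichotomous test result.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    Both inputs are proportions between 0 and 1.
    """
    for value in (sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("sensitivity and specificity must be in [0, 1]")
    # A perfect specificity makes LR+ unbounded; report it as infinity.
    lr_pos = float("inf") if specificity == 1.0 else sensitivity / (1.0 - specificity)
    # A specificity of zero makes LR- unbounded.
    lr_neg = float("inf") if specificity == 0.0 else (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Hypothetical measure with 90% sensitivity and 90% specificity.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.90)
print(round(lr_pos, 2), round(lr_neg, 2))
```

A larger positive ratio means a failing score raises the odds of impairment more strongly; a negative ratio closer to zero means a passing score lowers those odds more strongly.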
Ellis Weismer et al. (2000) used the same list of nonwords with a population-based
sample of children to "confirm the extent to which nonword repetition
performance can serve as a clinically useful index of language disorder ...." The
large sample of second-graders was part of an ongoing epidemiological study of
SLI (see Tomblin et al., 1997). Children classified as LI were required to score
below −1.25 S.D. on at least two of five language composite scores based on local
norms as described by Ellis Weismer et al. However, only 90 of the 164 children
classified as LI were enrolled in speech-language intervention. Children with LI
and with NL were further classified into two cognitive groups, those with normal
nonverbal IQs (PIQ of 85 or above on the Wechsler Intelligence Scale for Children-III
(WISC-III; Wechsler, 1991)) and those with low nonverbal IQs (<85 PIQ). Both
groups of children with LI scored significantly lower on the nonword repetition
task than the NL groups. In fact, the LI group with normal cognitive skills scored
lower than the NL group with low cognitive skills. Children enrolled in language
intervention scored significantly lower than those not enrolled in intervention. Use
of likelihood ratio analyses in this study did not result in the high levels of
diagnostic accuracy found by Dollaghan and Campbell (1998), although the
authors highlighted a number of differences in selection criteria and sample
classification that may have contributed to the different findings. Most notably,
Dollaghan and Campbell's participants with SLI represented a clinically referred
sample, and the authors utilized relatively equal numbers of NL and LI children,
rather than a population-based sample with unequal representation of NL and LI. In
addition, some of the children in the NL group had a prior history of LI.
Each of the preceding studies employed nonword repetition as a measure of
phonological memory using different stimuli, instructions, scoring, and participant selection criteria. Despite these methodological differences, the results were
consistent: as a group, children with SLI had considerably more difficulty
repeating nonwords than children with NL. Thus, nonword repetition, in addition
to being a measure of phonological memory, is now also considered a
possible diagnostic measure of SLI, although it has been investigated only in
school-age children. Further research is needed to support nonword repetition's
diagnostic validity and reliability, and to investigate whether it may be used for
this purpose with preschool-age children.
To be considered a valid diagnostic measure, performance on nonword
repetition tasks must accurately discriminate children with and without SLI.
Plante and Vance (1994) suggested that 90% should be considered good discriminate accuracy for diagnostic measures of language impairment in children.
That is, a measure should demonstrate at least 90% sensitivity (identifies SLI as
SLI) and 90% specificity (identifies NL as NL). Theoretically, the significantly
different performance of children with and without the disorder that permits good
discriminate accuracy occurs because poor phonological memory capacity negatively
affects language acquisition. If this is the case, nonword repetition
performance should be significantly correlated with other measures of phonological
memory, such as digit span, that might also be an index of SLI.
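
The 90% criterion can be illustrated with a short sketch. It assumes, purely for illustration, that a child is classified as SLI when the score falls at or below a chosen cutoff; the cutoff, the scores, and the function names are hypothetical and are not data from this study.

```python
def sensitivity_specificity(sli_scores, nl_scores, cutoff):
    """Classify scores at or below `cutoff` as SLI and return both rates.

    sensitivity = proportion of children with SLI classified as SLI
    specificity = proportion of children with NL classified as NL
    """
    if not sli_scores or not nl_scores:
        raise ValueError("both groups need at least one score")
    true_positives = sum(1 for s in sli_scores if s <= cutoff)
    true_negatives = sum(1 for s in nl_scores if s > cutoff)
    return true_positives / len(sli_scores), true_negatives / len(nl_scores)


def meets_criterion(sensitivity, specificity, threshold=0.90):
    """Apply the 90% benchmark to both rates."""
    return sensitivity >= threshold and specificity >= threshold


# Hypothetical nonword repetition scores (number correct out of 20).
sli = [2, 3, 4, 5, 9]
nl = [8, 10, 11, 12, 13]
sens, spec = sensitivity_specificity(sli, nl, cutoff=7)
print(sens, spec, meets_criterion(sens, spec))
```

Because both rates must reach the benchmark, a measure that identifies every child with NL correctly can still fail the criterion if it misses too many children with SLI.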
To be considered a reliable diagnostic measure, performance on nonword
repetition tasks must be stable across time. One important measure of stability is
test-retest reliability, typically indexed by a significant correlation between scores
on successive administrations of a test. If the correlation is high, it may be argued
that the skill being measured is stable and that the test accurately measures it. The
closer in time the administrations, the higher the expected correlation.
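
As a minimal sketch of the index described above, the code below computes a Pearson correlation between scores from two administrations. The scores are hypothetical and the hand-written function is for illustration only; it does not reproduce the analysis software used in this study.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from two administrations."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    if ss_1 == 0 or ss_2 == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cross / sqrt(ss_1 * ss_2)


# Hypothetical scores for five children on administrations 1 and 2.
time_1 = [4, 7, 9, 12, 15]
time_2 = [6, 9, 11, 14, 17]  # every child gained two points
print(round(pearson_r(time_1, time_2), 3))
```

The example also shows a limitation of the index: a uniform gain for every child leaves the correlation at 1, so a change in mean performance between administrations has to be examined separately from the correlation.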
The purpose of this study was to conduct a preliminary evaluation of the
usefulness of nonword repetition as a diagnostic measure of SLI in younger
children than previously studied, and to gather information about the validity and
reliability of this measure. The performance of preschoolers with SLI was
compared with age- and gender-matched peers with NL. The diagnosis of SLI
was made using the standard of therapy enrollment. This allows findings to be
generalized to the population of preschool children with SLI that speech-language
pathologists (SLPs) serve on
their caseloads. The stimuli, administration, and scoring procedures were selected
to promote high levels of test-retest, administration, and scoring reliability while
minimizing scoring time. This was accomplished by delivering the stimuli via
computer, utilizing whole-word scoring, administering the first two nonword
repetition tasks only 1 day apart, and by providing word repetition practice
between the second and third administrations of the nonword task to determine
whether this would affect performance.
Two forms of the CNRep were administered to permit alternate forms
comparison. The diagnostic accuracy of nonword repetition and digit span were
compared with the diagnostic accuracy of the Structured Photographic Expressive
Language Test-II (SPELT-II) (Werner & Krescheck, 1983), a test shown to have
good discriminate accuracy for preschoolers with SLI (Plante & Vance, 1994).
The specific questions were: (1) Do the NL and SLI groups differ significantly
on nonword repetition or digit span performance? (2) Does performance on the
nonword repetition task or digit span task accurately discriminate preschool
children with and without SLI? (3) Do scores remain stable with repeated
administration, providing evidence for test-retest reliability? (4) Does word repetition
practice improve nonword repetition task performance? (5) Is there a significant
difference between performance on alternate forms of the nonword repetition task?
2. Methods
2.1. Participants
Twenty-two children diagnosed with SLI and 22 children with normally
developing language (NL) participated in the study. Each child with NL was
selected to match a child with SLI for gender and age (within 3 months). All children
were between the ages of 4, 0 (years, months) and 5, 11 and spoke a standard
dialect of English as their primary language by parent and teacher report. Table 1
provides descriptive information about both groups. Of the 22 children in the SLI
group, 2 were Asian American, 6 were Hispanic, 13 were White, and 1 was Other.
In the NL group 2 were Hispanic, 19 were White, and 1 was Other. Each group had
5 girls and 17 boys. A parent or guardian of each child completed a questionnaire
regarding the child's developmental history, primary language, and the number of
years of education the child's parents or guardians completed. Children were from
similar middle-class socioeconomic backgrounds and their mothers reported
similar levels of education.
Table 1
Subject description information for children in both language groups

                               NL group                   SLI group
                               Mean     S.D.    Range     Mean     S.D.    Range
Age in months                  60.09    6.06    20        60.14    8.11    34
Mothers' years of education    15.32    1.83    5.00      15.64    1.60    5.00
K-ABC*                         110.95   10.39   45.00     98.57    12.46   47.00
SPELT-II*                      104.10   17.32   70.00     38.36    30.12   88.75
BBTOP WI*                      70.68    10.42   46.00     30.48    21.61   78.00
PPVT-III                       -a       -a      -a        95.6     13.82   56.00

Note: K-ABC = Nonverbal Scale of the Kaufman Assessment Battery for Children (Kaufman &
Kaufman, 1983) (M = 100; S.D. = 15); SPELT-II = Structured Photographic Expressive Language
Test-II (Werner & Krescheck, 1983) (M = 100; S.D. = 15); BBTOP WI = Bankson-Bernthal Test
of Phonology Word Inventory Score (number correct out of 80) (Bankson & Bernthal, 1990);
PPVT-III = Peabody Picture Vocabulary Test-Third Edition (Dunn & Dunn, 1997) (M = 100;
S.D. = 15).
a Test not administered to NL group.
* P < 0.05.

Children in the SLI group were selected for inclusion using the standard
of intervention status (see Dollaghan & Campbell, 1998). They were recruited
from public school and clinic programs in Tucson that provide language therapy
for preschoolers. To qualify for services in these programs, children must score
more than 1.5 S.D. below the mean on two norm-referenced language tests.
Children were enrolled in the Child Language Center's (CLC) Wings On Words
Preschool or in their own preschool program. The Wings On Words program
provides a language-rich curriculum for children with SLI and NL and language
therapy for children with SLI. An ASHA-certified SLP from the CLC determined that each child classified as SLI met the following additional selection
criteria:
1. Hearing within normal limits bilaterally (25 dB HL) at 500, 1000, 2000,
and 4000 Hz (American National Standards Institute [ANSI], 1989).
2. Normal nonverbal intelligence as indicated by a nonverbal IQ score of 75
or higher on the Nonverbal Scale of the Kaufman Assessment Battery for
Children (K-ABC) (Kaufman & Kaufman, 1983). According to the
Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition,
mental retardation is characterized by an IQ of approximately 70 or below.
The nonverbal IQ cutoff score of 75 used in this study allows for a five-point standard error of measurement.
3. With the exception of language, articulation, or phonological problems, no
evidence of a frank neurological problem or additional developmental
disorder reported by the parent.
4. Speech intelligibility judged to be adequate for applying scoring
procedures based on a 3- to 5-min story-retelling task administered by
the SLP, and responses to norm-referenced tests.
Therapy records indicated that certified SLPs providing language therapy
services to these children during the time of the study targeted grammar objectives
for each child.
Children in the NL group were enrolled in the CLC Wings On Words Preschool
or another Tucson preschool at the time of the study. A teacher of each child in the
NL group was asked to complete a questionnaire regarding the child's speech,
language, motor, cognitive, and social-skill development. An ASHA-certified
SLP from the CLC determined that each child classified as NL met the following
selection criteria:
1. Hearing within normal limits bilaterally (30 dB HL) at 500, 1000, 2000,
and 4000 Hz (ANSI, 1989) (a screening level of 30 dB HL was
required because of ambient noise in some locations).
2. Normal nonverbal intelligence as indicated by a nonverbal IQ score of 75
or higher on the Nonverbal Scale of the Kaufman Assessment Battery for
Children (Kaufman & Kaufman, 1983).
3. Age-expected progress in school by teacher report.
4. Normal motor, cognitive, social-emotional, speech, and language development by parent and teacher report.
Child Language Center SLPs or graduate students under their supervision
administered the Structured Photographic Expressive Language Test-II (Werner & Krescheck, 1983) and the Bankson-Bernthal Test of Phonology (BBTOP)
(Bankson & Bernthal, 1990) to all participants to further describe their speech and
language skills. The Peabody Picture Vocabulary Test-Third Edition (PPVT-III)
(Dunn & Dunn, 1997) was also administered to children in the SLI group. Table 1
lists the mean test scores for each group.
2.2. Procedures
Speech and language tests were administered several days before the children
started the research tasks. During the study, children worked one-on-one with a
research assistant (RA) in a room at their school for approximately 30 min. On the
first day of task administration the digit span task and the first nonword repetition
list were administered. On the second day the same digit span task and the same
nonword repetition list were readministered, except that the order of the nonwords
was changed. On days 3-6 children, grouped in pairs, played board games while they
practiced repeating English words. One week after the second administration the
same digit span task and the second nonword repetition list were administered.
The presentation order of nonword lists A and B (described under Section 2.4)
was counterbalanced across children.
2.3. Word repetition practice
On the 4 days between the second and third administrations of the nonword
repetition and digit span tasks, children met with an RA and another child enrolled
in the study to practice repeating one-, two-, three-, four-, and ®ve-syllable
English words. The words were practiced between turns of a game that were
versions of Blue's Clues, Candyland, sticks and marbles, or magnetic ®shing. The
practice words are listed in Table 2. The same set of words was always practiced
with the same game, but the order of presentation was counterbalanced across
children. Each child practiced each list one time.
2.4. Stimuli and scoring
Nonwords from the CNRep developed by Gathercole et al. (1994) were
utilized for the nonword repetition task. The original list of 40 nonwords was
randomly divided by syllable length into two lists of 20 nonwords (A, B) reported
in Table 3. The shorter list reduced administration and scoring time, and permitted
test-retest reliability to be calculated for two different nonword lists. The lists
were counterbalanced across children. Each child repeated nonwords from the
same list on two consecutive days; however, the order of the words was changed. One
week later they repeated words from the other list. For all days the pattern of
presentation was a two-, three-, four-, then five-syllable nonword, followed
by the same pattern for the remaining nonwords. The nonwords were recorded
into WAV computer files by a female speaker then imported into Microsoft
PowerPoint for computer presentation to the children. The children wore a
headphone/microphone set while listening to and repeating the nonwords.
Children's repetitions were recorded into WAV computer files or were audiotaped.

Table 2
List of English practice words repeated while playing games

Blue's clues
  quaff    moity     ambulate     meticulous     polarization
  fez      nougat    conjugate    lenticular     ramification
  hast     opus      filigree     infirmity      macadamia
  gout     pilar     infrared     habitable      hydrophobia
  etch     roily     masculine    estuary        expediency
  drab     shanty    pacify       disconsolate   disintegration
  cease    tundra    radial       congressional  catamaran
  bask     umber     tabulate     belligerent    analytical

Candyland
  quaint   miser     ambition     morphology     proximity
  fen      nascent   confiscate   linoleum       rudimentary
  luge     oblique   exertive     jubilation     orientation
  hasp     pestle    inflection   incandescent   indeterminate
  gaff     rivet     lucrative    exhilarant     hemophilia
  dint     shallot   ordinance    dormitory      documentary
  chafe    tepid     quavering    congruity      conglomerate
  baste    ulu       serenade     calligrapher   anatomical

Sticks and marbles
  qualm    meager    asterisk     parochial      sedimentary
  flay     morsel    commingle    mendacity      serendipity
  liege    nabob     deficit      lavatory       proximity
  haze     papal     gabardine    indicative     infatuated
  gape     resin     invalid      gesticulate    hydrodynamics
  deign    savvy     misconstrue  embroidery     equilateral
  cad      taro      paraffin     disarmament    dilapidated
  cask     tyrant    ratify       confederate    antiquarian

Magnetic fishing
  quay     methane   affricate    officiatel     ramification
  fray     nadir     concordant   medicinal      sedimentary
  louse    oblige    diffidence   laceration     polarization
  hake     perroni   genial       indelible      inevitable
  gad      restive   lithosphere  formidable     homophonous
  dame     scoffer   myriad       effectual      edification
  chaff    tartar    pedigree     dexterity      depository
  bogue    udder     sassafras    charitable     affiliation

Table 3
List A and B of nonwords from the CNRep by syllable length

          Two syllables   Three syllables   Four syllables   Five syllables
List A    diller          bannifer          stopograttic     altupatory
          hampent         brasterer         contramponist    pristoractional
          rubid           glistering        commeecitate     reutterpation
          bannow          thickery          loddenapish      confrantually
          sladding        doppelate         pennerriful      varsatrationist

List B    pennel          barrazon          blonterstaping   voltularity
          tafflest        trumpetine        epliforvent      sepretennial
          ballop          skiticult         woogalamic       defermication
          glistow         commerine         fenneriser       detratapilic
          prindle         frescovent        perplisteronk    underbrantuand

Prior to the presentation of the nonword repetition task, children practiced
repeating three sample nonwords to familiarize them with the task. A voice on the
computer said, "Hello there, welcome to the word game. I'm going to say a funny,
made-up word. I want you to say it just like me. Are you ready? Say ____." The
RA encouraged the child to speak clearly into the microphone during the three
practice nonwords, but provided no feedback regarding accuracy of response. The
computer screen remained blank while each of the nonwords was presented. After
the child repeated each nonword a sound effect WAV file played. The next word
was then presented. The sound effect reinforcement was introduced because pilot
studies suggested that it encouraged young children to attempt repetition of all of
the nonwords.
RAs scored each nonword production live. To be credited with correct
repetition during live scoring, children were required to repeat the nonword
exactly. Later, when listening to the recorded responses and tallying the score, an
incorrect production was rescored as correct if the error was due to a consistent
phoneme substitution (e.g., /t/ for /k/) also demonstrated on the Bankson-Bernthal
Test of Phonology. Each correct response scored one point.
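
The whole-word scoring rule, including the rescoring of consistent substitutions, can be sketched as follows. This is an illustration under stated assumptions, not the scoring tool used in the study: it assumes each nonword and response is available as a list of phoneme labels, and that a child's consistent substitutions are supplied as hypothetical (target, produced) pairs.

```python
def score_nonword(target, response, known_substitutions=frozenset()):
    """Whole-word score: 1 for a correct repetition, otherwise 0.

    A mismatch is forgiven only when it is one of the child's known
    consistent substitutions, given as (target phoneme, produced phoneme).
    Omitted or added phonemes always make the repetition incorrect.
    """
    if len(target) != len(response):
        return 0
    for expected, produced in zip(target, response):
        if expected != produced and (expected, produced) not in known_substitutions:
            return 0
    return 1


# Hypothetical child who consistently produces /t/ for /k/.
substitutions = {("k", "t")}
print(score_nonword(["k", "a", "t"], ["t", "a", "t"], substitutions))  # rescored as correct
print(score_nonword(["k", "a", "t"], ["t", "a", "t"]))                 # exact match required
```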
For the digit span task numbers from one to nine were randomly selected
without replacement to form lists varying from three to nine numbers in length.
Two lists were presented at each length, beginning with three digits. The same
digit span task was presented to the child on consecutive days. The number names
were recorded into WAV computer files by a female speaker, then imported into
Microsoft PowerPoint for presentation via computer. As with the nonword
repetition task, the children wore a headphone/microphone set while listening
to and repeating the digits. Children's repetitions were recorded into WAV
computer files or were audiotaped. Prior to the presentation of the first digit
span series, children practiced repeating two two-digit series of numbers. During
practice the RA encouraged the child to speak clearly into the microphone, but
provided no feedback regarding accuracy of response. A voice on the computer
said, "Hello there, welcome to the numbers game. My friends and I are really
happy that you're going to play. I'm going to say some numbers, then you say
them right after me. Are you ready? Let's practice. Say ____." The computer
screen remained blank while each series of digits was presented. After the child
repeated the digits, approximately 5 s of music played. The next series of numbers
was then presented.
RAs scored each digit span repetition live. To be credited with correct
repetition children were required to repeat each number of the series in the
correct order. Each correct repetition of a series scored one point. The task
continued until the child failed to repeat either series of numbers at the same span
length.
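The list construction and scoring rules for the digit span task can be sketched as follows. This is an illustrative sketch only: the function names and the way responses are stored are assumptions, and the original stimuli were fixed recordings rather than lists generated at run time.

```python
import random


def make_digit_lists(rng, min_len=3, max_len=9, lists_per_len=2):
    """Build two lists at each length, drawing digits 1-9 without replacement."""
    lists = []
    for length in range(min_len, max_len + 1):
        for _ in range(lists_per_len):
            lists.append(rng.sample(range(1, 10), length))
    return lists


def score_digit_span(lists, responses, lists_per_len=2):
    """One point per series repeated in the correct order. Scoring stops once
    the child fails every series at the same span length."""
    score = 0
    for start in range(0, len(lists), lists_per_len):
        block = range(start, min(start + lists_per_len, len(lists)))
        correct = sum(
            1 for i in block if i < len(responses) and responses[i] == lists[i]
        )
        score += correct
        if correct == 0:
            break
    return score


rng = random.Random(2003)  # arbitrary seed so the example is repeatable
digit_lists = make_digit_lists(rng)
# Hypothetical child: both three-digit series correct, one four-digit series
# correct, then both five-digit series failed.
responses = [digit_lists[0], digit_lists[1], digit_lists[2], [], [], []]
print(score_digit_span(digit_lists, responses))
```

Under these rules the hypothetical child above earns three points, because scoring ends at the first span length where neither series is repeated correctly.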
2.5. Reliability
Audiotapes or WAV files from 40% of the nonword repetition and 40% of the
digit span sessions were selected for independent scoring by a trained listener.
Half of the sessions were from children with SLI, and half from children with NL.
The average point-to-point agreement for correct/incorrect scoring was 94%
(range: 70-100%) for nonword repetition and 98% (range: 75-100%) for digit span.
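Point-to-point agreement for one session is the percentage of items on which the live score and the independent rescoring match. A minimal sketch, with a hypothetical function name and made-up item scores:

```python
def point_to_point_agreement(live_scores, rescored):
    """Percentage of items scored identically (correct/incorrect) by two scorers."""
    if not live_scores or len(live_scores) != len(rescored):
        raise ValueError("Both scorers must rate the same non-empty set of items.")
    agreements = sum(1 for a, b in zip(live_scores, rescored) if a == b)
    return 100.0 * agreements / len(live_scores)


# Hypothetical session of four items: the scorers disagree on the third item.
print(point_to_point_agreement([1, 0, 1, 1], [1, 0, 0, 1]))
```

The session-level percentages would then be averaged across sessions to obtain figures comparable to those reported above.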
3. Results
A repeated-measures mixed-factorial ANOVA was used to assess between- and
within-group differences across administrations of the nonword repetition task.
The between-group factors were language group (NL, SLI) and nonword list
(A, B), and the within-group factor was time of administration (1, 2, 3). With
alpha set at 0.05, preliminary data analysis revealed no significant difference for
the nonword lists. These data sets, therefore, were collapsed across language
groups. Table 4 reports the mean group scores for the nonword repetition task.
Fig. 1 illustrates results for each syllable length. The language groups differed
significantly in the number of nonwords repeated correctly at all three times,
F(1, 42) = 146.81, P = 0.0001, with the NL group repeating more words correctly.
There was a significant within-group difference for time of administration,
F(42, 2) = 3.89, P = 0.0242, but no significant time × group interaction.
Performance for both groups increased from administration time 1 to 2 and
declined slightly from administration time 2 to 3. Effect sizes (Cohen's d; Cohen,
1988) were calculated to determine the degree to which the number of nonwords
repeated correctly differed across time for each group. For this metric, Cohen
proposed that effects of 0.20, 0.50, and 0.80 represent small, medium, and large
effect sizes, respectively. The effect sizes for score differences for the SLI and NL
groups from administration 1 to 2 were 0.49 and 0.30, respectively, and from
administration 2 to 3 were 0.01 and 0.10.
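As an illustration of how such an effect size is obtained, the sketch below computes Cohen's d from the group means and standard deviations in Table 4. The text does not state which standard deviation was used as the denominator; the sketch assumes the pooled standard deviation of the two administrations, so its output need not match the reported values exactly. The function name is hypothetical.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference using the pooled SD of two equal-sized samples
    (an assumed choice of standardizer)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_b - mean_a) / pooled_sd


# Nonword repetition, administration 1 to 2 (means and S.D.s from Table 4).
d_sli = cohens_d(7.00, 2.86, 8.68, 3.39)
d_nl = cohens_d(15.86, 1.73, 16.41, 1.79)
print(round(d_sli, 2), round(d_nl, 2))
```

With the pooled-SD assumption this yields roughly 0.54 for the SLI group and 0.31 for the NL group, close to but not identical to the reported 0.49 and 0.30, which suggests a somewhat different denominator was used in the original analysis.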
Table 4
Nonword repetition scores across administrations

             Mean     S.D.    Minimum   Maximum
SLI group
  Time 1      7.00    2.86      3.00     17.00
  Time 2      8.68    3.39      3.00     18.00
  Time 3      8.18    3.82      1.00     15.00
NL group
  Time 1     15.86    1.73     13.00     18.00
  Time 2     16.41    1.79     12.00     19.00
  Time 3     16.23    2.14     11.00     20.00

A second repeated-measures mixed-factorial ANOVA assessed between- and
within-group differences for digit span performance across administrations. The
between-group factor was language group, and the within-group factor was time
of administration. Table 5 reports the mean group scores. The language groups
differed significantly in the number of series repeated correctly at all three times,
F(1, 42) = 33.05, P = 0.0001, with the NL group repeating more series correctly.
There was a significant within-group difference for time of administration,
F(42, 2) = 4.93, P = 0.0095, but no significant time × group interaction.
Performance for both groups increased from administration time 1 to 2. From
administration time 2 to 3, it declined slightly for the SLI group but increased for the NL
group. The effect sizes associated with these differences were 0.50 and 0.02 for
the SLI and NL groups from time 1 to 2, and 0.03 and 0.20 from time 2 to 3.
3.1. Discrimination
To assess whether performance on the nonword repetition or digit span tasks
accurately discriminated children with NL and SLI, and to compare accuracy with
the SPELT-II, discriminant function analyses were conducted for each measure
for each time of administration. Table 6 reports the results of these analyses. The
first administration of the nonword repetition task using the total number of
nonwords repeated correctly resulted in excellent sensitivity and specificity, with
95% of the children with SLI identified as SLI, and 100% of the children with NL
identified as NL. Sensitivity and specificity decreased slightly on the second
administration, but remained high. Specificity was further reduced on the third
administration. Although digit span performance was a less accurate discriminator
overall than nonword repetition, the level of sensitivity was high on the first
and third administrations. Both phonological memory measures demonstrated
better sensitivity overall than the norm-referenced SPELT-II.

Table 5
Digit span scores across administrations

             Mean     S.D.    Minimum   Maximum
SLI group
  Time 1      1.36    0.85      0.00      3.00
  Time 2      1.86    0.99      0.00      4.00
  Time 3      1.82    1.05      1.00      4.00
NL group
  Time 1      3.68    1.39      2.00      7.00
  Time 2      3.73    1.78      1.00      9.00
  Time 3      4.09    1.74      1.00      8.00

Fig. 1. Mean number of nonwords repeated correctly by the SLI and NL groups across
administrations.
Table 6
Results of discriminant function analyses showing accuracy of language group classification using
nonword repetition, digit span, K-ABC and SPELT-II scores

                      SLI                          NL
                      Sensitivity    Error         Specificity    Error
Nonword repetition
  Time 1              21/22 (95%)    1/22 (5%)     22/22 (100%)   0/22 (0%)
  Time 2              20/22 (91%)    2/22 (9%)     21/22 (95%)    1/22 (5%)
  Time 3              21/22 (95%)    1/22 (5%)     19/22 (86%)    3/22 (14%)
Digit span
  Time 1              20/22 (91%)    2/22 (9%)     17/22 (77%)    5/22 (23%)
  Time 2              17/22 (77%)    5/22 (23%)    17/22 (77%)    5/22 (23%)
  Time 3              20/22 (91%)    2/22 (9%)     16/22 (73%)    6/22 (27%)
SPELT-II              18/22 (82%)    4/22 (18%)    21/22 (95%)    1/22 (5%)

K-ABC = Kaufman Assessment Battery for Children (Kaufman & Kaufman, 1983); SPELT-II = Structured
Photographic Expressive Language Test-II (Werner & Krescheck, 1983).
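The sensitivity and specificity values in Table 6 follow directly from the classification counts. The short sketch below, with a hypothetical function name, shows the arithmetic for the first administration of the nonword repetition task.

```python
def classification_accuracy(sli_correct, sli_total, nl_correct, nl_total):
    """Sensitivity: proportion of children with SLI classified as SLI.
    Specificity: proportion of children with NL classified as NL."""
    sensitivity = sli_correct / sli_total
    specificity = nl_correct / nl_total
    return sensitivity, specificity


# Nonword repetition, time 1: 21 of 22 children with SLI and 22 of 22 with NL
# were classified correctly (counts from Table 6).
sensitivity, specificity = classification_accuracy(21, 22, 22, 22)
print(f"sensitivity = {sensitivity:.0%}, specificity = {specificity:.0%}")
```

The corresponding error rates are simply one minus each proportion.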
3.2. Test-retest reliability and task-test correlations
To provide an index of test-retest reliability, Pearson correlation coefficients
were calculated for the first, second, and third administrations of the nonword
repetition and digit span tasks. These results, as well as test-task correlations, are
reported in Table 7 for the SLI group and Table 8 for the NL group. Recall that
children repeated the same list of nonwords at time 1 and time 2, and a different
list of words at time 3, with the lists counterbalanced across groups. Nonword
repetition scores were significantly correlated at each time of administration for
the SLI group, with the strongest correlation from time 1 to 2 (0.72). Neither the
time 1-2 (0.40) nor the time 2-3 (0.22) correlation was significant for the NL group;
however, the time 1-3 correlation (0.49) reached significance at the P < 0.05 level.
Digit span scores were significantly correlated at each time of administration for
both groups, with considerably stronger correlations for the NL (0.85-0.87) than
the SLI group (0.48-0.57).
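To illustrate why a correlation of 0.40 falls short of significance in a group of 22 children while 0.49 does not, the sketch below computes a Pearson coefficient and converts a coefficient to a t statistic with n - 2 degrees of freedom. The function names are hypothetical, and the sketch assumes the conventional two-tailed test, for which the critical t value at P < 0.05 with 20 degrees of freedom is about 2.086.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for testing a correlation against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


CRITICAL_T_DF20 = 2.086  # two-tailed, alpha = 0.05, 20 degrees of freedom

for r in (0.40, 0.49):
    t = t_from_r(r, 22)
    print(r, round(t, 2), "significant" if t > CRITICAL_T_DF20 else "not significant")
```

With 22 children per group, a coefficient must exceed roughly 0.42 to reach the 0.05 level under this test, which is consistent with the pattern reported for the NL group.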
One caveat to using correlation as a reliability measure is that a correlation
can remain high even when scores change across a group, provided that
performance improves or declines at a similar rate for all children. To show
the direction of change, Fig. 2 illustrates the number of children from each group
whose scores increased, stayed the same, or decreased by 1 S.E.M. across
administrations of each task. More children from the SLI group than the NL
group increased their scores from time 1 to 2 for both tasks.
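The classification of score changes by 1 S.E.M. can be sketched as follows. The text does not report how the S.E.M. was derived; the sketch assumes the conventional formula, S.E.M. = S.D. × sqrt(1 - r), with r taken as the test-retest correlation, and the function names and example scores are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Conventional S.E.M.: SD * sqrt(1 - reliability). Assumed, not taken from the study."""
    return sd * math.sqrt(1.0 - reliability)


def classify_change(score_before, score_after, sem):
    """Label a change as increased or decreased only if it is at least 1 S.E.M."""
    change = score_after - score_before
    if change >= sem:
        return "increased"
    if change <= -sem:
        return "decreased"
    return "same"


# SLI group, nonword repetition: time 1 S.D. = 2.86, time 1-2 correlation = 0.72.
sem = standard_error_of_measurement(2.86, 0.72)
for before, after in [(7, 9), (7, 8), (7, 5)]:  # hypothetical individual scores
    print(before, after, classify_change(before, after, sem))
```

Under these assumptions the S.E.M. is about 1.5 points, so a one-point gain counts as no change while a two-point gain counts as an increase.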
In general, performance on speech and language tests was not significantly
correlated with phonological memory task performance for the SLI group. This is
important to note because SLI and NL group scores differed significantly. The only
significant correlations were between the SPELT-II and nonword repetition at time
3, and between the PPVT-III and nonword repetition at time 3. The relationship
between test and task performance was stronger and more prevalent for the NL
group, with the SPELT-II significantly correlated with nonword repetition at time 2
and digit span at times 2 and 3. Likewise, the BBTOP was significantly correlated
with nonword repetition at time 2. K-ABC performance was significantly correlated
with digit span at time 3.

Table 7
Correlations among nonword repetition, digit span, and test scores for the SLI group

          NW1       NW2       NW3       DS1       DS2       DS3       K-ABC     SPELT-II  BBTOP     PPVT-III
NW1       -
NW2       0.72****  -
NW3       0.50*     0.52**    -
DS1       0.37      0.19      0.04      -
DS2       0.54**    0.48*     0.40      0.57**    -
DS3       0.63**    0.37      0.29      0.56**    0.48*     -
K-ABC     0.09      0.06      0.18      0.05      0.11      0.13      -
SPELT-II  0.30      0.36      0.49*     0.23      0.28      0.23      0.23      -
BBTOP     0.05      0.24      0.25      0.14      0.11      0.10      0.21      0.75****  -
PPVT-III  0.44      0.31      0.45*     0.16      0.38      0.24      0.23      0.38      0.12      -

Note: NW1 = nonword repetition task first administration; NW2 = nonword repetition task second administration;
NW3 = nonword repetition task third administration; DS1 = digit span task first administration; DS2 = digit span
task second administration; DS3 = digit span task third administration; K-ABC = Nonverbal Scale of the Kaufman
Assessment Battery for Children (Kaufman & Kaufman, 1983) (M = 100; S.D. = 15); SPELT-II = Structured
Photographic Expressive Language Test-II (Werner & Krescheck, 1983) (M = 100; S.D. = 15);
BBTOP = Bankson-Bernthal Test of Phonology Word Inventory Raw Score (Bankson & Bernthal, 1990);
PPVT-III = Peabody Picture Vocabulary Test-Third Edition (Dunn & Dunn, 1997).
* P < 0.05.
** P < 0.01.
**** P < 0.0001.

Table 8
Correlations among nonword repetition, digit span, and test scores for the NL group

          NW1       NW2       NW3       DS1       DS2       DS3       K-ABC     SPELT-II  BBTOP
NW1       -
NW2       0.40      -
NW3       0.49*     0.22      -
DS1       0.26      0.51*     0.05      -
DS2       0.17      0.66***   0.07      0.87****  -
DS3       0.15      0.58**    0.06      0.85****  0.87****  -
K-ABC     0.06      0.19      0.01      0.41      0.39      0.48*     -
SPELT-II  0.14      0.57**    0.15      0.34      0.60**    0.57**    0.41      -
BBTOP     0.38      0.63**    0.03      0.30      0.40      0.32      0.02      0.30      -

Note: NW1 = nonword repetition task first administration; NW2 = nonword repetition task second
administration; NW3 = nonword repetition task third administration; DS1 = digit span task first
administration; DS2 = digit span task second administration; DS3 = digit span task third
administration; K-ABC = Nonverbal Scale of the Kaufman Assessment Battery for Children
(Kaufman & Kaufman, 1983) (M = 100; S.D. = 15); SPELT-II = Structured Photographic
Expressive Language Test-II (Werner & Krescheck, 1983) (M = 100; S.D. = 15);
BBTOP = Bankson-Bernthal Test of Phonology Word Inventory Raw Score (Bankson & Bernthal,
1990).
* P < 0.05.
** P < 0.01.
*** P < 0.001.
**** P < 0.0001.

Fig. 2. Score changes as indexed by 1 S.E.M. or more for the SLI and NL groups across
administrations of the nonword repetition and digit span tasks.
4. Discussion
This preliminary study evaluated the usefulness of nonword repetition and
digit span as diagnostic measures of SLI in a younger sample of children than
previously studied, and assessed the validity and reliability of these measures. As
in previous studies, the SLI group's performance was significantly poorer than
that of age-matched NL peers on both tasks.
Ideally, a diagnostic measure demonstrates high levels of sensitivity and
specificity and is quick and simple to administer and to score. The CNRep lists,
delivered via computer, using whole-word scoring, appeared to achieve this goal.
Dollaghan and Campbell (1998) also found near perfect discrimination of
schoolage children with LI and NL using their 16-nonword list. Use of the
CNRep lists in the present study permitted comparison of alternate forms, where
no significant difference was found between lists. These findings suggest that
different nonword stimuli may provide accurate discrimination. Because
the CNRep stimuli are more "wordlike," however, performance on the CNRep
lists may reflect a greater influence of prior language knowledge than Dollaghan
and Campbell's 16-nonword list. This relationship is usually documented by
correlating language test scores with task performance. In the present study,
scores on the SPELT-II and PPVT-III were significantly correlated only with the
third administration of the nonword repetition task for the SLI group. This was a
small sample, however, and the language tests measure only limited aspects of
language knowledge.
The whole-word scoring utilized in this study resulted in accurate
discrimination, as did Dollaghan and Campbell's "Percentage of Phonemes Correct"
scoring. The latter procedure may require more time than whole-word scoring,
especially as the age of the children studied decreases and the incidence of
articulation and phonological errors increases. If the purpose of a study is to
investigate phonological memory, phoneme-by-phoneme scoring would provide
the more definitive description of performance. If the purpose is to provide a
measure that discriminates children with and without SLI, whole-word scoring
may prove effective, and perhaps more efficient for clinicians.
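The difference between the two scoring approaches can be illustrated with a short sketch. The function names and the two nonword items are hypothetical, and the phoneme comparison is done position by position, a simplification that would not handle omitted or added phonemes the way a trained scorer would.

```python
def whole_word_score(target, produced):
    """1 if the whole nonword was repeated exactly, otherwise 0."""
    return 1 if produced == target else 0


def percent_phonemes_correct(targets, productions):
    """Phonemes repeated correctly divided by total target phonemes, as a percentage.
    Phonemes are compared position by position (a simplifying assumption)."""
    correct = 0
    total = 0
    for target, produced in zip(targets, productions):
        total += len(target)
        correct += sum(1 for t, p in zip(target, produced) if t == p)
    return 100.0 * correct / total


# Two hypothetical nonwords; the second is produced with one phoneme error.
targets = [["n", "a", "b"], ["d", "o", "f", "i"]]
productions = [["n", "a", "b"], ["d", "o", "t", "i"]]

print(sum(whole_word_score(t, p) for t, p in zip(targets, productions)))
print(round(percent_phonemes_correct(targets, productions), 1))
```

In this example whole-word scoring credits one of the two items, whereas phoneme-level scoring credits six of seven phonemes, which shows how the finer-grained measure retains partial information at the cost of more scoring effort.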
To further increase efficiency of discrimination between children with SLI and
NL, investigators need to determine the minimum number of nonwords needed for
accurate discrimination. In their first study, Dollaghan and Campbell (1998)
reported that repetition of three- and four-syllable nonwords, but not repetition of
one- and two-syllable nonwords, was significantly lower for the LI than the NL
group in their study of schoolage children. In study 2, repetition of all nonwords
resulted in the most accurate classification of children with LI or NL, but
repetition of three- and four-syllable nonwords also resulted in high levels of
classification accuracy. Ellis Weismer et al. (2000) reported similar findings for LI
groups defined by diagnosis at second grade using test scores, or defined by
enrollment in language intervention. Between-group differences were most
pronounced on three- and four-syllable nonwords. In the present study, the total
number of nonwords repeated correctly provided the best levels of sensitivity and
specificity, with performance on two-syllable words providing the least. These
results suggest that nonword lists of three-, four-, and five-syllable words might
prove most effective for diagnostic purposes for preschool and schoolage
children, and that 20 or fewer nonwords may be needed to accomplish accurate
discrimination.
Nonword repetition diagnostic accuracy surpassed that of the SPELT-II.
Previously this test has demonstrated sensitivity and specificity levels above 90%, but
not as high as the 95% levels demonstrated in the present study on the first
administration of the nonword repetition task. This finding lends support to the
suggestion by Dollaghan and Campbell (1998) that nonword repetition may prove
to be a more accurate diagnostic measure of LI than norm-referenced language tests.
The preschool SLI group in this study, similar to the schoolage LI group in the
Dollaghan and Campbell (1998) study, was selected in part using the standard of
enrollment in language intervention services. The high level of identification
accuracy found in this study may not be replicated in children with suspected SLI
who are not enrolled in services. This was the case in the Ellis Weismer et al.
(2000) study in which nonword repetition provided better discrimination of
children with LI so classified because they were receiving treatment, than for
children classified with LI based on test scores alone. Nevertheless, if nonword
repetition tasks prove effective, reliable, less biased, and more efficient than
norm-referenced language tests for diagnosing impairment in identified
populations from preschool through elementary school, this would improve our current
standard of practice. Further research is needed to document success in
unidentified populations.
To be considered reliable, a diagnostic measure must produce stable scores
across administrations. In the present study, as in Edwards and Lahey's
(1998) nonword study, scores on both the nonword repetition and digit span tasks
improved significantly when the same list was administered a second
time to children in both language groups. Unlike in Edwards and Lahey's study, the SLI
group appeared to benefit more from repetition than the NL group. These findings
suggest a differential practice effect for children with SLI that should be taken into
consideration when designing any diagnostic measure. Effect sizes indicate that
the changes in scores from second to third administration were not marked for
either group, despite children's opportunity to practice repeating words during the
intervening time.
The temporal consistency of scores across administrations, or test-retest
reliability, was indexed for both nonword repetition and digit span by calculating
the degree of correlation between successive administrations of the tasks. The
Pearson product-moment correlations for administrations 1, 2, and 3 varied between
groups for the nonword repetition task, and for the NL group were lower than
the correlations for the digit span task. Because the first and second
administrations of both tasks were only 1 day apart, high correlations would be expected for
both groups; yet the nonword repetition task correlation for the NL group from
time 1 to 2 was not significant. These results, taken together with the significant
difference in group performance from first to second administrations of the tasks,
raise concerns about the test-retest reliability of these measures. Researchers and
clinicians might improve score stability by building more practice into nonword
repetition tasks before administering scored items. The differential improvement
in scores between groups might also reflect poorer test-taking skills by the SLI
group that were improved with practice. Score increases, which were primarily
the result of better performance on one- and two-syllable nonwords on the
nonword repetition task, might also reflect the benefit of production practice
from administration 1 to 2. Although increased practice may improve score
stability, it may also reduce discriminant accuracy because between-group
differences may diminish if SLI group scores increase more than NL group scores.
Performance on the nonword repetition and digit span tasks was significantly
correlated at both times of administration, and at higher levels than previously
reported by Gathercole et al. (1994). Nevertheless, nonword repetition proved to be
the better diagnostic measure of SLI. Although sensitivity was comparable between
the two measures, digit span specificity was lower than that of nonword repetition.
In summary, nonword repetition performance holds promise as an identifier of
SLI in preschool as well as schoolage children; however, acceptable levels of
specificity and sensitivity must be demonstrated across a range of language
impairment severity levels. The test-retest reliability of both nonword repetition
and digit span requires further investigation before these measures could be used
for diagnostic purposes. In particular, evaluation of different nonword lists and
varying amounts of time between administrations is needed for all age groups. SLI
group performance improved more than NL group performance on both
phonological memory tasks, suggesting that practice effects should be taken into
consideration when refining nonword repetition as a future diagnostic measure
of SLI. The use of whole-word scoring may prove both efficient and effective for
discriminating children with and without SLI.
Acknowledgments
This research was supported by funding from the Tucson Scottish Rite
Charitable Foundation, by National Multipurpose Research and Training Center
Grant DC01409 from the National Institutes of Health NIDCD, and by National
Institutes of Health NIDCD 1 R03 DC04240-01.
Thank you to the children, families, teachers and administrators from The
Scottish Rite-University of Arizona Child Language Center, Little Ranch
School, and Castle Hill School who participated in this project, and to the graduate
and undergraduate students who administered the assessments and research tasks.
Special thanks to Becky Vance for her invaluable expertise and support and to
Linda Swisher, David Ingram, and an anonymous reviewer for review of the
manuscript.
Appendix A. Continuing education
1. Two tasks used to assess children's phonological memory include:
A. Alphabet recitation and story retell.
B. Nonword repetition and digit span.
C. The rainbow passage and story retell.
D. Digit span and alphabet recitation.
E. Story retell and digit span.
2. To be considered a valid diagnostic measure, performance on a test or task
must:
A. Improve over time.
B. Exceed a standard score of 95.
C. Accurately discriminate children with and without the target disorder.
D. Be an enjoyable task for the child.
E. Show higher mean scores for normal groups than impaired groups.
3. To be considered a reliable diagnostic measure, performance on a test or
task must:
A. Improve with practice.
B. Increase with the child's age.
C. Be easy to administer.
D. Be computer scored.
E. Be highly correlated across repeated administrations.
4. In this study performance on the nonword repetition task changed from
administration 1 to administration 2 as follows:
A. The SLI group mean score improved more than the NL group mean
score.
B. The NL group mean score improved more than the SLI group mean
score.
C. Mean scores for both groups declined equally.
D. Mean scores for both groups increased equally.
E. The NL group mean score improved but the SLI group mean score
declined.
5. The findings of this study suggest that:
A. Digit span tasks are superior to nonword repetition tasks for
identifying SLI in preschoolers.
B. Nonword repetition tasks hold promise as an identifier of SLI in
preschool children.
C. Standardized language tests demonstrate better sensitivity and
specificity than nonword repetition tasks.
D. Practice has no effect on task performance.
E. Nonword repetition is too difficult for preschoolers.
References
American National Standards Institute. (1989). Specifications for audiometers (ANSI S3.6-1989).
New York: ANSI.
Bankson, N. W., & Bernthal, J. E. (1990). Bankson-Bernthal Test of Phonology. Chicago, IL: The
Riverside Publishing Company.
Bishop, D., Bishop, S., Bright, P., James, C., Delaney, T., & Tallal, P. (1999). Different origin of
auditory and phonological processing problems in children with language impairment: Evidence
from a twin study. Journal of Speech, Language, and Hearing Research, 42, 155-168.
Bishop, D. V. M., North, T. L., & Donlan, C. (1996). Nonword repetition as a behavioural marker for
inherited language impairment: Evidence from a twin study. Journal of Child Psychology and
Psychiatry, 36, 1-13.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ:
Erlbaum.
Dollaghan, C., Biber, M., & Campbell, T. (1995). Lexical influences on nonword repetition. Applied
Psycholinguistics, 16, 211-222.
Dollaghan, C., & Campbell, T. (1998). Nonword repetition and child language impairment. Journal
of Speech, Language, and Hearing Research, 41, 1136-1146.
Dunn, L., & Dunn, L. (1997). Peabody Picture Vocabulary Test-Third Edition. Circle Pines, MN:
American Guidance Service.
Edwards, J., & Lahey, M. (1998). Nonword repetitions of children with specific language
impairment: Exploration of some explanations for their inaccuracies. Applied Psycholinguistics,
19, 279-309.
Ellis Weismer, S., Tomblin, J. B., Zhang, X., Buckwalter, P., Chynoweth, J. G., & Jones, M. (2000).
Nonword repetition performance in schoolage children with and without language impairment.
Journal of Speech, Language, and Hearing Research, 43, 865-878.
Gathercole, S. E., & Baddeley, A. D. (1989). Evaluation of the role of phonological STM in the
development of vocabulary in children: A longitudinal study. Journal of Memory and Language,
28, 200-213.
Gathercole, S. E., & Baddeley, A. D. (1990a). Phonological memory deficits in language disordered
children: Is there a causal connection? Journal of Memory and Language, 29, 336-360.
Gathercole, S. E., & Baddeley, A. D. (1990b). The role of phonological memory in vocabulary
acquisition: A study of young children learning new names. British Journal of Psychology, 81,
439-454.
Gathercole, S., & Baddeley, A. (1995). Short-term memory may yet be deficient in children with
language impairments: A comment on van der Lely & Howard (1993). Journal of Speech and
Hearing Research, 38, 463-466.
Gathercole, S. E., Hitch, G. J., Service, E., & Martin, A. J. (1997). Phonological short-term memory
and new word learning in children. Developmental Psychology, 6, 966-979.
Gathercole, S. E., Service, E., Hitch, G. J., Adams, A., & Martin, A. (1999). Phonological short-term
memory and vocabulary development: Further evidence on the nature of the relationship. Applied
Cognitive Psychology, 13, 65-77.
Gathercole, S. E., Willis, C., & Baddeley, A. D. (1991). Differentiating phonological memory and
awareness of rhyme: Reading and vocabulary development in children. British Journal of
Psychology, 82, 387-406.
Gathercole, S., Willis, C., Baddeley, A., & Emslie, H. (1994). The Children's Test of Nonword
Repetition: A test of phonological working memory. Memory, 2, 103-127.
Gathercole, S. E., Willis, C., Emslie, H., & Baddeley, A. (1992). Phonological memory and
vocabulary development during the early school years: A longitudinal study. Developmental
Psychology, 28, 887-898.
Hammill, D., & Newcomer, P. (1988). Test of Language Development Intermediate: 2. Austin, TX:
Pro-Ed.
Howard, D., & van der Lely, H. (1995). Specific language impairment in children is not due to a
short-term memory deficit: Response to Gathercole & Baddeley. Journal of Speech and Hearing
Research, 38, 466-472.
Kaufman, A. S., & Kaufman, N. L. (1983). Kaufman Assessment Battery for Children. Circle Pines,
MN: American Guidance Service.
Montgomery, J. (1995). Sentence comprehension in children with specific language impairment: The
role of phonological working memory. Journal of Speech and Hearing Research, 38, 187-199.
Plante, E., & Vance, R. (1994). Selection of preschool language tests: A data-based approach.
Language Speech and Hearing Services in Schools, 25, 15-24.
Tomblin, J. G., Records, N., Buckwalter, P., Zhang, X., Smithe, E., & O'Brien, M. (1997). Prevalence
of specific language impairment in kindergarten children. Journal of Speech, Language, and
Hearing Research, 40, 1245-1260.
van der Lely, H., & Howard, D. (1993). Children with specific language impairment: Linguistic
impairment or short-term memory deficit? Journal of Speech and Hearing Research, 36,
1193-1207.
Wechsler, D. (1991). Wechsler Intelligence Scale for Children-III (WISC-III). San Antonio, TX:
Psychological Corporation.
Werner, E., & Krescheck, J. D. (1983). Structured Photographic Expressive Language Test-II.
Sandwich, IL: Janelle Publications.