Journal of Autism and Developmental Disorders, Vol. 34, No. 1, February 2004 (© 2004)
Matching Strategies in Cognitive Research with Individuals
with High-Functioning Autism: Current Practices,
Instrument Biases, and Recommendations
Laurent Mottron¹,²
A meta-analysis was performed on the 133 cognitive and behavioral papers in autism using comparison groups in the 1999–2002 period. High-functioning (average IQ: 84.7) adolescents
(average age: 14.4 years) are largely dominant. IQ is the most frequent matching variable in use
(51.2%). The instruments that are most frequently used to determine IQ or general level are
Wechsler scales (46.9%), British Picture Vocabulary Scale (BPVS; 22.3%), and Raven Progressive Matrices (RPM; 16.9%). In order to determine if these instruments were equivalent
when applied to individuals with pervasive developmental disorders (PDDs), Wechsler IQ, EVIP
(a French Canadian translation of the BPVS), and RPM were given to a group of 14 individuals with autism and 12 with Asperger syndrome. Comparison of Wechsler and RPM IQ values,
expressed as percentiles, to percentile values of EVIP score revealed that EVIP (and to a lesser
extent RPM) considerably overestimates the level of all PDD participants as compared to
Wechsler Verbal IQ (VIQ), Performance IQ (PIQ), or Full-Scale IQ (FSIQ), whereas these
instruments are reported to be strongly correlated in typically developing individuals. This study
reveals that identification of objects from a verbal label—the BPVS–PPVT–EVIP task—is a
peak of ability in high-functioning individuals with PDDs. This peak of ability, even superior
to that of block design, has a detrimental effect on matching based on this instrument. Accordingly, the BPVS/PPVT/EVIP and the RPM are a potential source of type-1 (for
cognitive deficits) or type-2 (for cognitive hyperfunctioning) errors, and it is recommended
that they be replaced by Wechsler scales as the basis of IQ/level matching.
KEY WORDS: British Picture Vocabulary Scale; high-functioning autism; IQ; matching strategy;
methodology; peaks of ability; Raven Progressive Matrices; Wechsler Intelligence Scale.
INTRODUCTION
The charting of cognitive deficits and strengths
among persons with autism requires comparison of their
performance with that of another group. In addition to
differences in clinical status, the target and comparison
groups can differ on many factors, such as age and level
of functioning, which may influence the cognitive
performances under study. Therefore, some kind of control should be used to limit the confounding effects that
may result from this heterogeneity in these groups.
Abbreviations: ADI, Autism Diagnostic Interview; ADOS, Autism
Diagnostic Observation Schedule; AS, Asperger syndrome; BPVS,
British Picture Vocabulary Scale; CA, chronological age; CELF,
Clinical Evaluation of Language Fundamentals; EVIP, Échelle de
Vocabulaire en Images Peabody; FSIQ, full-scale IQ; HFA, high-functioning autism; MA, mental age; PDD, pervasive developmental
disorder; PDD-NOS, pervasive developmental disorder not otherwise
specified; PIQ, performance IQ; PPVT, Peabody Picture Vocabulary
Test; RPM, Raven Progressive Matrices; RT, reaction time; ToM,
theory of mind; TROG, Test of Reception of Grammar; VIQ, verbal
IQ; WAIS, Wechsler Adult Intelligence Scale; WISC, Wechsler
Intelligence Scale for Children.
¹ Centre de Recherche Fernand Seguin and Département de
Psychiatrie, Université de Montréal, Montréal, Canada.
² Correspondence should be addressed to Laurent Mottron, Clinique
Spécialisée de l’Autisme, Hôpital Rivière des Prairies, 7070 Blvd.
Perras, Montréal, Canada, H1E1A4; e-mail: mottronl@istar.ca
0162-3257/04/0200-0019/0 © 2004 Plenum Publishing Corporation
As autism is a condition that begins early in life,
is observed across the range of functioning levels from
severe mental retardation to superior intelligence, and
may or may not be accompanied by identified neurological syndromes, a random group of individuals who
satisfy the criteria for the diagnosis is likely to be heterogeneous at multiple and possibly interacting levels.
This heterogeneity complicates the matching procedure and, accordingly, leads to the study of specific
subgroups. The choices of these subgroups may be
arbitrary or driven by considerations of convenience
rather than by purely scientific decisions.
The search for the unique characteristics of autism
entails the study of specific factors that are linked either to the phenotype or to the causal mechanism of
autism per se and are not shared with other neurodevelopmental disorders. The attempt to disentangle specific and nonspecific factors in cognitive research on
autism led to the mental age (MA) matching paradigm
that is attributed to Hermelin and O’Connor (1970). Before their seminal contribution, individuals with autism
were typically compared to typically developing individuals of the same chronological age (CA), a
problematic strategy due to the high incidence of low
IQs among persons with autism. In their innovative series of experiments, Hermelin and O’Connor compared
the performance of persons with autism who functioned
in the range of mental retardation to that of younger
typically developing children matched on mental age
(MA) or of nonautistic persons with mental retardation
matched on MA. Despite the contributions of the MA-matching paradigm in controlling for differences in developmental level, this strategy introduces other
confounding elements. For example, inherent differences in CA between younger typically developing children and older persons with low IQ limit
the implications of findings based on MA-matched
studies (for a review, see Burack, Iarocci, Bowler, &
Mottron [2002]) because CA differences between
groups cannot be disentangled from the differences related to autism per se. Alternatively, when persons with
mental retardation matched on IQ are compared to persons with autism, the characteristics associated with the
etiologies of mental retardation in the comparison
group may be the source of artifactual group differences (Yirmiya, Shaked, & Solomonica-Levi, 1998).
In order to minimize the theoretical and practical
shortcomings of MA matching, persons with high-functioning autism (HFA) became the focus of cognitive research during the late 1980s. Individuals with
high-functioning autism may be compared to typically
developing individuals without the confounding factors
of CA discrepancies and syndrome-specific findings.
The study of autism is further facilitated in this group
due to superior compliance and attention span and, for
adults, the ability to legally provide their own consent
to participate in the research.
Although individuals with HFA display the same
variability in average intelligence as typically developing individuals, they present an uneven profile of
strengths and deficits within subtests that comprise the
more general score of intelligence. Peaks and valleys
may differentially influence averaged level measures
according to the instruments used. Accordingly, the
practical aspects of matching on intelligence level may
have important consequences on the similarities between groups matched on intelligence level. Thus, the
purpose of this paper is twofold. The first is to present
a systematic survey of the current strategies for choosing subgroups, matching variables, and standardized
instruments for assessment of level of functioning in the
recent literature on cognitive neuroscience research with
individuals with pervasive developmental disorders
(PDDs). The second purpose of this paper is to compare
empirically the most common instruments used for level
matching which were identified in the first study.
STUDY 1: META-ANALYSIS OF MATCHING
STRATEGIES AND SUBGROUP CHOICES
USED IN BEHAVIORAL NEUROSCIENCE OF
PERVASIVE DEVELOPMENTAL DISORDERS
Method
Behavioral and cognitive, peer-reviewed, empirical papers involving group studies indexed in the MEDLINE database in the 4-year period from January 1999
to December 2002 under the entries “Autism” and
“Asperger syndrome” were selected. The inclusion criteria were that the papers were published in English,
included a full abstract, involved typical or clinical
comparison groups or normative data, and that performance was based on reaction time (RT) scores,
error/success rates, or another behavioral measure
as the dependent variable. Behavioral findings of
brain-imaging studies were included in the survey, but
clinical trials, rehabilitation, brain imaging, and electrophysiological studies without behavioral dependent
measures as well as studies with only individuals with
pervasive developmental disorder not otherwise specified (PDD-NOS) were excluded.
One hundred and thirty-three papers in cognitive
neuroscience on autism and Asperger syndrome were
reviewed, with a total of 169 comparison groups (as
multiple matching groups were used for the same
experiment or for several experiments in the same
paper). Among the 169 comparisons, 106 (62.7%) compared typically developing individuals with individuals with
autism, 24 (14.2%) with individuals with Asperger syndrome, and 39
(23.1%) with combined groups of individuals. For each
comparison group, the mean IQ, MA, and CA of the
clinical population, type of comparison group used
(clinical, typical), type of matching (pair-wise, groupwise, covariation), number of matching variables (e.g.,
FSIQ, VIQ, PIQ, MA, CA), and tool to assess the level
of intelligence or performance used for matching (e.g.,
WAIS, RPM, PPVT, BPVS) were noted.
In order to determine the proportion of participants
with and without mental retardation among the persons
with autism, the intelligence levels provided by the various instruments across the studies were converted to a
common metric. Because the Wechsler instruments are
the most frequent measures of general intelligence in use,
Wechsler Full-Scale IQ (FSIQ; Wechsler, 1974, 1981)
was taken as reference measure and provided when available. In cases where only Wechsler Verbal IQ (VIQ)
(34 comparisons) or Performance IQ (PIQ) (14 comparisons) were mentioned, this value was taken as indication of participants’ intelligence. In the cases of the
British Picture Vocabulary Scale (BPVS; Dunn, Dunn,
Whetton, & Pintilie, 1982), which provides MAs and
percentiles, and Raven Progressive Matrices (RPM;
Raven, 1938, 1947, 1995, 1996), which provides IQs and
percentiles, an approximation of IQ was obtained using
Wechsler Percentile-IQ Correspondence Scale. In the
cases of other tests that provide only MA (e.g., Test of
Reception of Grammar [TROG], Mullen, Bayley,
Vineland Adaptive Behavior Scale, Clinical Evaluation
of Language Fundamentals [CELF], Brunet Lézine, and
Reynell), an approximation of IQs was computed using
the IQ = (MA/CA) × 100 formula. Finally, a cut-off between
high- and low-functioning of IQ = 67 rather than the
usual value of 70 was selected, as this is the lowest value
for which a correspondence between IQ and percentile
value can be obtained according to the Wechsler manual.
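The conversions to a common metric described above can be sketched as follows. This is only an illustration under stated assumptions: the Wechsler percentile-IQ correspondence table is approximated by a normal distribution with mean 100 and standard deviation 15 (the survey used the published table, not this formula), and the function names are hypothetical.

```python
from statistics import NormalDist

# Assumption: deviation IQs are normally distributed (mean 100, SD 15).
# The published correspondence table may differ slightly from this curve.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def percentile_to_iq(percentile: float) -> float:
    """Approximate IQ for a percentile rank strictly between 0 and 100."""
    return IQ_SCALE.inv_cdf(percentile / 100)


def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age over chronological age (same units), times 100."""
    return 100 * mental_age / chronological_age


def is_high_functioning(iq: float, cutoff: float = 67) -> bool:
    """Apply the survey's cut-off of 67 (rather than the usual 70)."""
    return iq >= cutoff
```

For example, `ratio_iq(6, 8)` applies to a test that yields only a mental age (here 6 years at a chronological age of 8 years), whereas `percentile_to_iq` applies to instruments such as the BPVS or RPM that report percentiles.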
Results
Characteristics of Participants
Distribution of Comparisons According to IQ
Level of Participants. The distribution of comparisons
according to IQ levels for the autism, the Asperger, and
the combined PDD subgroups is presented in Table I.
Due to the lack of availability of information for two
papers, proportions are provided for 167 comparisons.
The average IQ across comparisons, with each
comparison weighted equally, was 84.7. The majority
(76.7%) of comparisons relied on participants with
autism without mental retardation.

Table I. Distribution of Comparisons According to the Intelligence Level of the Participantsᵃ

                                            Number of groups (%)
Intelligence level                          Autism        PDD          Asperger
Mild mental retardation (FSIQ < 67)         26 (22.0)     0 (0.0)      0 (0.0)
Borderline intelligence (FSIQ 67 to 90)     45 (38.2)     7 (41.2)     6 (18.8)
Normal intelligence (FSIQ > 90)             47 (39.8)     10 (58.8)    26 (81.2)
Total number of comparisons                 118 (100.0)   17 (100.0)   32 (100.0)

PDD, pervasive developmental disorder; FSIQ, full-scale IQ.
ᵃ PDD refers to studies where data from autism and Asperger syndrome cannot be differentiated.

Table II. Distribution of Comparisons According to Chronological Age of Participants

                        Number of groups (%)
Age range               Autism        PDD          Asperger syndrome
0 to 5 years old        17 (14.5)     1 (5.9)      0 (0.0)
6 years and older       100 (85.5)    16 (94.1)    35 (100.0)
Total                   117 (100.0)   17 (100.0)   35 (100.0)

PDD, pervasive developmental disorder.
Distribution of Comparisons According to the Age
of Participants. The ages of the participants included
in these studies are presented in Table II. The mean
age of participants across studies, with each study
weighted equally, was 14.4 years. In 85.5% of the
studies with persons with autism, 94.1% of those with
persons with PDD, and 100% of those with persons
with Asperger syndrome (AS), the average age of the
participants was older than 6 years. The lack of studies with preschoolers with AS likely results from the
practical problem of providing this diagnosis before
5 years of age.
Matching Strategies
Socio-Demographic Variables. The relative frequencies of the different socio-demographic variables
that are used are shown in Table III. Chronological
age (approximately two-thirds of the groups, 104
comparisons = 61.5%) and gender (approximately
one-fifth, 32 comparisons = 18.9%) were the most
frequently used variables.

Table III. Distribution of Socio-Demographic Matching Strategies Among Comparisons (Total: 169)

                                                                          No differences    Total number
                                              Matching     Covariation    between groups    of groups
Variable                                      n (%)        n (%)          n (%)             n (%)
CA                                            83 (49.1)    3 (1.8)        18 (10.7)         104 (61.5)
Gender                                        29 (17.2)    —              3 (1.8)           32 (18.9)
Educational level                             9 (5.3)      —              1 (0.6)           10 (5.9)
Laterality                                    10 (5.9)     —              —                 10 (5.9)
SES                                           4 (2.4)      —              1 (0.6)           5 (3.0)
Other (profession, interest, sensory profile) 4 (2.4)      —              2 (1.2)           6 (3.6)

CA, chronological age; SES, socioeconomic status.
Type and Number of Matching Variables. A single
matching variable was used in 51 comparisons (30.2%).
The most commonly used were VIQ or Verbal level in
20 comparisons (11.8%), CA in 12 (7.1%), PIQ in
5 (2.9%), and MA in 5 (2.9%). Two variables were used
in 55 comparisons (32.5%). The most frequent combinations were CA + VIQ or Verbal level (17 comparisons; 10%), CA + FSIQ (9 comparisons; 5.3%), CA
+ PIQ (7 comparisons; 4.5%), and CA + Gender (4
comparisons; 2.3%). When three variables were used,
the most frequent combinations were VIQ/level + PIQ
+ CA (7 comparisons; 4.5%), FSIQ + Gender + CA
(5 comparisons; 2.9%), and VIQ/level + PIQ + FSIQ
(3 comparisons; 1.7%). Across all the comparisons, the
most common matching variables were VIQ/level in
74 comparisons (43.8%), PIQ in 53 (31.4%), and FSIQ
in 36 (21.3%).
Instruments Used to Assess General Intelligence.
Among the 130 comparisons where intelligence was
used as matching variable, the Wechsler Intelligence
Scale for Children/Wechsler Adult Intelligence Scale
(WISC/WAIS) was the test used most frequently
(61/130 = 46.9%), followed by the BPVS, the British
version of PPVT-R³ (29/130 = 22.3%) for verbal level,
and by RPM for performance IQ (22/130 = 16.9%).
Scales of adaptive functioning (VABS) were used only
marginally (3/169 = 1.8%).
³ BPVS and PPVT (Dunn & Dunn, 1981) are vocabulary scales that
consist of pointing to the picture corresponding to a verbally
presented name among three distracters. The BPVS was derived
from the PPVT-R. PPVT-III is similar to PPVT-R, with the exception
of an enlarged age validity and improved culture- and gender-fair
pictures. On the BPVS, only 3.9% of PPVT-III items/words
corresponding to specifically U.S. idioms have been replaced by
their U.K. equivalent.
Discussion
The current prototypical study in cognitive neuroscience on autism is a cross-sectional study using individuals with HFA with a FSIQ around 85 and CA of
14 years. Participants with low developmental and
chronological levels are practically discarded from cognitive research on autism. Because this selection is a spontaneous and unquestioned research strategy, this finding raises
important issues of external validity. Can the
knowledge based on this intelligence level and age
group, now clearly the main source of knowledge regarding cognition in autism, be generalized to individuals in the entire range of age and IQ levels? Studying
individuals more than 10 years after the onset of the process
that made them autistic does not allow the specific
effects of the pathological process per se to be separated from those of reactions to this process. The cognitive profile of
adults with autism plausibly results from the combined
effects of expertise and over-training linked to special
interests, adaptive compensatory strategy, or, in
contrast, under-training linked to impoverishment of
external stimulation. Accordingly, the use of adults is
probably not the best strategy to document “primary”
deficits—even if this represents an indispensable
source of information on the cognitive condition of
adults with autism.
The second finding of this literature survey was
that CA and IQ/level, mostly verbal but also performance or FSIQ, are the dominant matching variables
in current cognitive research on PDDs. This is due to
the use of high-functioning individuals, which allows CA
matching—a practice impossible with the use of individuals with mental retardation. However, the use of
IQ or verbal level as the main matching variable puts
a heavy burden on the instruments used to measure this
IQ. This study also establishes that although Wechsler
scales are largely dominant (approximately one-half of
the studies), two other instruments, BPVS (one-quarter)
and RPM (one-sixth) are also used in a significant
proportion of studies. Therefore, possible differences
in sensitivity and lack of comparability among studies
may result from this use of multiple instruments, which
may not be correlated similarly in comparison and clinical populations. This issue will be addressed in the
following study.
STUDY 2: ARE THE DIFFERENT
INSTRUMENTS USED FOR IQ OR LEVEL
MATCHING EQUIVALENT?
A survey of recent cognitive literature on autism
indicates that partial or general measures of IQ are the
variables most frequently used for matching. Nevertheless, researchers have treated as equivalent several
instruments that measure the intelligence level on which
the matching is performed. For example, verbal IQ or
verbal level is obtained through Wechsler VIQ
or BPVS, and nonverbal IQ is obtained through RPM
or Wechsler PIQ. The purpose of Study 2 was therefore to determine if these instruments were equivalent
and, more precisely, the type of biases (over- or
underestimation) possibly introduced by these instruments. To answer these questions, we decided to compare the intelligence level of a group of individuals with
PDD, as measured by the two most frequently used
measures of verbal performance (Wechsler VIQ vs.
EVIP score) and nonverbal performance (Wechsler PIQ
vs. RPM). EVIP is a French translation of PPVT-R,
valid for French-Canadian as well as European French
populations.
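As a rough illustration of how two instruments given to the same participants can be compared, the sketch below computes a paired-samples t statistic and a Wilcoxon matched-pairs z on percentile scores. It is a minimal sketch, not the analysis code of this study: the function names and the percentile values are hypothetical, and the Wilcoxon z uses the plain normal approximation without tie or continuity corrections.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired-samples t test on two equal-length lists."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def wilcoxon_z(x, y):
    """Return (z, two-tailed p) for the Wilcoxon matched-pairs test,
    using the normal approximation (no tie or continuity correction)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j < n and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j for tied values
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j
    w_plus = sum(r for r, d in zip(ranks, ordered) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical percentile scores for six participants (not data from this study).
evip_pct = [90, 75, 84, 95, 60, 88]
viq_pct = [50, 63, 70, 61, 55, 66]
t_stat, df = paired_t(evip_pct, viq_pct)
z_stat, p_value = wilcoxon_z(evip_pct, viq_pct)
```

A positive t or z here means the first instrument yields higher percentiles than the second for the same individuals; the two-tailed p-value for t would additionally require a Student t distribution, which the standard library does not provide.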
Method
All individuals (a) with a diagnosis of autism,
established using both the Autism Diagnostic Interview (ADI or
ADI-R) and the Autism Diagnostic Observation Schedule-Generic (ADOS-G), module 3 or 4, (b) having received
a full-scale intelligence measurement with WISC-III or
WAIS-III, (c) who were administered both an RPM and
an EVIP assessment, and (d) with an FSIQ > 60 were selected from the PDD population of the Clinique Spécialisée de l’Autisme of Hôpital Rivière-des-Prairies,
using the Datafinder database (Digimed Systems).
Among this group, individuals who were diagnosed with autism, but without language delay,
echolalia, pronoun reversal, or stereotyped language at
ADI administration, were considered as having Asperger
syndrome. This resulted in a group of 26 individuals
(autism subgroup: 14 individuals, 11 M, 3 F; mean
age, 12.5; SD, 8.2; range, 6–39; Asperger syndrome
subgroup: 12 individuals, 8 M, 4 F; mean age, 15.3;
SD, 6.5; range, 7–29). PIQ, VIQ, and FSIQ obtained
by WISC/WAIS and IQs or levels from EVIP and RPM
were transformed into percentiles according to the tables provided with each instrument and compared using
Student t test for paired samples (two-tailed). Because
of the relatively small samples, the Wilcoxon matched-pairs
test was also carried out to validate the
Student t-test results (Fig. 1).
Results
Relative IQ/levels (percentile equivalent) obtained
by EVIP, RPM, and Wechsler subscales are represented
in Figure 1, and statistical analyses among values provided by these different instruments are presented in Table IV. For
comparison purposes, the Wechsler block-design and
vocabulary subscales were included in the analyses.

Fig. 1. Percentiles equivalent obtained by EVIP, RPM, and
Wechsler subscales.
Wechsler FSIQ, VIQ, and PIQ comparison revealed a nonsignificant PIQ > VIQ pattern, thus replicating the finding of Siegel et al. (1996) of
unremarkable differences between Wechsler sub- and
full-scale among sufficiently large groups of persons
with autism. In contrast, differences in the percentile
measure obtained by Wechsler VIQ and EVIP level
were statistically significant [Student t test for paired
samples (two-tailed): t(13) = 4.55, p = 0.001;
Wilcoxon matched-pairs: z = 3.30, p = 0.001]. This indicates that the EVIP considerably overestimates verbal abilities in comparison to the Wechsler verbal
subscale. Differences between Wechsler PIQ and RPM
were not significant, but differences between Wechsler
FSIQ and RPM were near significance or significant
according to analyses [Student t test for paired samples
479681.qxd 2/2/04 10:56 AM Page 24
24
Mottron
Table IV. Consistency Among Percentiles Equivalent Obtained by
Different Instruments (t-Test, p-values)
1
2
3
4
5
6
—
.0048c
.05b
.013b
.0084c
.0021c
.0033c
—
.45
.23
.33
.44
.61
—
.77
.0084c
.79
.040b
—
.10
.34
.07
—
.21
.28
—
.21
—
.31
.60
.0020c
.17
.00056d
.00058d
—
.16
.12
.44
.09
.06
—
.082
.015b
.038b
.02b
—
.63
.75
.94
—
.29
.25
—
.42
—
.0053c
.37
.00005d
.0032c
.000002d
.00001d
—
.11
.81
.21
.27
.09
—
.20
.00027d
.053
.0029c
—
.28
.34
.12
—
.86
.90
—
.60
Asperger
1.
2.
3.
4.
5.
6.
7.
EVIP
RPM
Block designa
Vocabularya
PIQa
VIQa
FSIQa
Autism
1.
2.
3.
4.
5.
6.
7.
EVIP
RPM
Block designa
Vocabularya
PIQa
VIQa
FSIQa
Discussion
Type of Bias Introduced by EVIP and RPM
Asperger or autism
1.
2.
3.
4.
5.
6.
7.
EVIP
RPM
Block designa
Vocabularya
PIQa
VIQa
FSIQa
and those with Asperger syndrome, a single analysis was
conducted on both groups. According to this analysis,
the scores were highest on the EVIP, which were superior to those on the three Wechsler subscales. The scores
on both the EVIP and block-design were superior to
those on the Wechsler VIQ and PIQ, thereby indicating
that they may be considered as peaks of ability. At an
individual level, EVIP level was greater than Wechsler
VIQ for 24 out of 26 individuals with autism or Asperger syndrome. In contrast, differences among RPM
and Wechsler PIQ in the combined group were not
significant.
a
Wechsler Intelligence Scale.
p < .05.
c
p < .01.
d
p < .001.
b
(two-tailed): t(13) = 2.06, p = 0.060; Wilcoxon
matched-pairs: z = 2.10, p = 0.036] showing that the
RPM overestimates general intelligence level in comparison to the Wechsler Full-Scale IQ.
Asperger Subgroup
Comparisons among Wechsler FSIQ, VIQ, and
PIQ revealed a nonsignificant VIQ > PIQ pattern that
replicates Miller & Ozonoff’s (2000) findings with this
group. In contrast, EVIP level was significantly superior to Wechsler VIQ [Student t test for paired samples
(two-tailed): t(11) = 4.01, p = 0.002; Wilcoxon
matched-pairs: z = 2.85, p = 0.004]. Differences between Wechsler PIQ and RPM and between Wechsler
FSIQ and RPM were not significant (See Fig. 1).
Findings from the Combined Groups of Persons
with Autism and Persons with Asperger Syndrome
In order to validate the findings in the face of possible disagreements concerning the strategy for diagnostically distinguishing between persons with autism
The purpose of Study 2 was to compare the intelligence levels, expressed as percentile equivalents of
a typically developing population, yielded by the tests most
frequently used to measure IQ or level of performance for matching purposes. The findings indicate that
EVIP, RPM, and Wechsler scales result in important
differences in performance when applied to the same
population of high-functioning PDD individuals. For
the subgroup of individuals with autism, the performance discrepancies are especially important between
EVIP and RPM/Wechsler scales and between RPM
and Wechsler FSIQ. In the subgroup of individuals
with Asperger syndrome and in the combined
autism–Asperger group, the discrepancies are maximum
between EVIP and Wechsler scales. As a consequence,
findings based on matching with vocabulary-based
scales, RPM, and Wechsler measures are not compatible, and the comparisons across studies need to involve
corrections for differences in sensitivity.
The concern that correlations between vocabularybased scales and Wechsler scale might be substantially
lower in clinical populations than in typically developing individuals is supported by prior evidence. In a
MEDLINE literature review, findings from 17 studies conducted with various clinical populations (mostly learning disabilities and mental retardation at large)
indicated that PPVT-R underestimates intelligence level
compared to Wechsler VIQ or FSIQ. In contrast, evidence from three studies suggested that PPVT-R overestimates intelligence level in comparison with
Wechsler VIQ or FSIQ, mostly in neuropsychiatric
patients and elderly individuals with average to high
IQ (Mangiaracina & Simon, 1986; Price, Herbert,
Walsh, & Law, 1990; Snitz, Bieliauskas, Crossland,
Basso, & Roper, 2000). When individuals with low
intelligence are used as a comparison group for research
conducted with persons with PDD, the additive effect
of these two opposite biases creates a high risk of comparing populations with important differences in level
of general intelligence.
The bias introduced by the use of RPM as matching instrument is quantitatively smaller than that associated with the EVIP, but was still significant in the
subgroup of persons with autism. This difference in
measures for persons with autism is discrepant with
the high correlation between RPM and FSIQ or VIQ that
is evidenced among typically developing individuals
(Burke, 1985; Jensen, Saccuzzo, & Larsen, 1988;
O’Leary, Rusch, & Gaustello, 1991) as well as among
a heterogeneous population of patients in psychiatric
hospitals (O’Leary et al., 1991).
These findings from the analyses of the EVIP and
the RPM are inconsistent with the assumption that instruments that measure intelligence in persons with PDD
reflect general intelligence in the same way as in the comparison groups. The matching of persons with autism to
typically developing individuals with vocabulary-based
scales or with RPM results in an overestimation of the
general intelligence of the former group which would
not be manifested with a Wechsler scale. A more radical conclusion is that the concept of VIQ for persons
with autism, and even more so for persons with
Asperger syndrome, consists of two very different measures that cannot be used one for the other. One is an
average of measures involving language use and
language mechanisms, for which the Wechsler VIQ
might be one source, and the other is a measure of the
peak of ability in labeling objects as measured by
vocabulary-based scales.
Possible Explanation of the Differences
in Intelligence Level Between Wechsler Scales
and the Two Other Instruments Under Study
My position is that vocabulary-based instruments,
and to a lesser extent the RPM, are tasks that rely on a
more limited range of cognitive abilities than the corresponding verbal and nonverbal Wechsler subscales
and the resultant FSIQ. The consequence of this difference is that the extreme scores characterizing peaks
of abilities of persons with autism influence the performance on these instruments to a larger extent than
on Wechsler VIQ, PIQ, or FSIQ. This is evident when
the Wechsler standard scores for the subjects in Study 2
are averaged with and without the peaks and valleys.
The peaks were defined according to WISC-III manual
(Canadian supplement) as a significant difference
(p < 0.05) of a specific subtest in relation to the average of six other subtests. This resulted in a mean of
2.7 classical peaks (e.g., block design, vocabulary) and
2.15 valleys (e.g., comprehension, digit-span) per subject. Despite the prevalence of these widely discrepant
levels of ability, the averaged standard scores with and
without the extreme values were similar. In contrast,
strengths in the abilities that are the basis of EVIP and
RPM scores are not compensated by other tasks in
which performance is average or inferior.
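The averaging argument above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the subtest profile is hypothetical, each subtest is compared with the mean of all the other subtests in the profile, and a fixed 3-point threshold stands in for the significance criterion of the WISC-III manual, which is not reproduced here.

```python
from statistics import mean


def classify_subtests(scores, threshold=3.0):
    """Label each subtest 'peak', 'valley', or 'flat' according to its
    distance from the mean of the other subtests (hypothetical threshold)."""
    labels = {}
    for name, score in scores.items():
        others = [s for other, s in scores.items() if other != name]
        delta = score - mean(others)
        if delta >= threshold:
            labels[name] = "peak"
        elif delta <= -threshold:
            labels[name] = "valley"
        else:
            labels[name] = "flat"
    return labels


# Hypothetical scaled scores for one participant (not data from this study).
profile = {
    "block_design": 15,
    "vocabulary": 14,
    "comprehension": 5,
    "digit_span": 6,
    "similarities": 10,
    "picture_completion": 10,
    "arithmetic": 10,
}
labels = classify_subtests(profile)
mean_all = mean(profile.values())
mean_without_extremes = mean(s for n, s in profile.items() if labels[n] == "flat")
```

In this constructed profile the two peaks and two valleys cancel out, so the composite is the same with and without them, whereas a score based on a single peak task (here block design or vocabulary) would sit well above the composite.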
A second position is that vocabulary-based scales
and RPM happen to tap into cognitive systems that correspond to peaks of ability among persons with autism.
Both instruments involve tasks in which the overfunctioning of low-level perceptual cognitive operations that characterize autism (Mottron & Burack,
2001) is an advantage. Accordingly, the association of
a verbal label and a picture (EVIP) may be performed
at a low perceptual level, just as the similarity between a
series of visually presented patterns (RPM) may be detected by extraction of perceptual regularities. Consistent with this interpretation and as indicated in Figure 1,
performance on vocabulary-based scales by persons
with PDDs is even higher than on the Wechsler block-design task, the most documented source of a peak of
ability in this group.
Post hoc Reinterpretations of Data Based
on the Use of These Instruments
One of the major goals of current research on
autism is the search for cognitive hyper- or hypofunctioning. Hyperfunctioning refers to peaks of performance on a certain operation or in processing a certain
type of material that is higher than the performance on
other operations or with other materials. In terms of cognitive operations, most hyperfunctioning is related to
tasks based on perception. In terms of the material that
is processed, most hyperfunctioning in individuals with
autism is found with nonsocial materials (Mottron &
Burack, 2001). In contrast, hypofunctioning refers to
“valleys” in performance, when performance on a given
task or with a specific material is inferior to the average performance of the participant. This is the case for
theory of mind tasks, executive tasks, or more generally
“complex” tasks (Minshew, Sweeney, & Luna, 2002).
In terms of material, most hypofunctioning is evident in
the processing of social material.
The evidence from Study 2 establishes that the
peaks of abilities associated with the EVIP and RPM
result in an overestimation of general intelligence level
in persons with autism. In other words, and keeping in
mind the WISC IQ profile characterizing autism, EVIP
and RPM percentile value are closer to the vocabulary
and the block-design peaks than to the FSIQ. In contrast, FSIQ is closer to the baseline of IQ subtests. As
a consequence, the use of the levels provided by EVIP
and RPM for matching purposes results in a possible
discrepancy between target and comparison group level
according to the task under study. If the task relies
on the same type of cognitive operations as those on
which EVIP and RPM are based, the comparison
group will be at the same level as the persons with
autism. When searching for possible hyperfunctioning,
matching with these instruments will likely result in a
negative finding because the related baseline abilities
of the persons with autism will be overestimated. This
is a false negative finding (or type-2 error: a failure to
reject the null hypothesis when it is in fact false) when
looking for hyperfunctioning of the autism group. Thus,
the peaks of abilities among persons with autism may
not be sufficiently reported in the characterization of
cognitive profiles.
In the case where the task under study is not
related to the matching variable (e.g., looking for an
executive deficit when matching on a vocabulary-based
instrument), the overestimation of the general intelligence of the clinical group caused by the matching
procedure introduces a heightened risk for a false positive finding (or type-1 error, which occurs when a true
null hypothesis is incorrectly rejected) when looking
for deficits. As a consequence, studies matching target
and comparison groups using a vocabulary-based instrument or the RPM may have inflated the deficits characterizing
autism. This is especially important in the field of face
and emotion processing for which verbal performance
is typically used for matching (Ozonoff, Pennington, &
Rogers, 1991). This may also lead to the reconsideration of some of the findings related to impairment in
theory of mind tasks, as the BPVS is frequently used to
match groups in these tasks, following the notion that
they are correlated with language ability (see Burack,
et al., this issue). Similarly, some group differences
(e.g., group effect in reaction time; Mottron, Burack,
Stauder, & Robaey, 1999) may result from overestimation of the level of participants with autism when they are
matched on RPM.
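The two error scenarios described above can be illustrated with a small numerical sketch. It assumes, purely for illustration, that expected task performance equals the level on which the task depends (a 1:1 relation), and every score below is a hypothetical value rather than data from Study 2.

```python
from statistics import mean

# Hypothetical standard scores (mean 100, SD 15) for an autism group.
# Illustrative values only; these are NOT data from Study 2.
autism_fsiq = [82, 85, 88, 90, 80]        # Wechsler Full-Scale IQ
autism_vocab = [104, 108, 110, 112, 101]  # picture-vocabulary score on the same scale


def expected_group_gap(autism_level, comparison_level):
    """Expected comparison-minus-autism difference on a task whose score
    is assumed to equal the level the task depends on."""
    return mean(comparison_level) - mean(autism_level)


# Comparison groups matched one-to-one on each instrument (hypothetical).
comparison_matched_on_vocab = list(autism_vocab)
comparison_matched_on_fsiq = list(autism_fsiq)

# Scenario A: task unrelated to the matching variable (e.g., an executive
# task). Autism performance is assumed to track FSIQ, with no specific deficit.
gap_vocab_matching = expected_group_gap(autism_fsiq, comparison_matched_on_vocab)
gap_fsiq_matching = expected_group_gap(autism_fsiq, comparison_matched_on_fsiq)

# Scenario B: task relying on the same operations as the vocabulary
# instrument. Autism performance is assumed to track the vocabulary peak.
masked_peak_gap = expected_group_gap(autism_vocab, comparison_matched_on_vocab)
visible_peak_gap = expected_group_gap(autism_vocab, comparison_matched_on_fsiq)

print(gap_vocab_matching)  # 22: apparent deficit created by matching alone
print(gap_fsiq_matching)   # 0: no spurious deficit
print(masked_peak_gap)     # 0: the peak is hidden by the matching
print(visible_peak_gap)    # -22: autism group ahead, the peak is detectable
```

Under these assumptions, vocabulary-based matching produces an apparent deficit in scenario A (a type-1 risk) and erases a real peak in scenario B (a type-2 risk), whereas FSIQ-based matching does neither.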
Role of Matching Instruments in the Validation
of the Autism/Asperger Distinction
The evidence from Study 2 suggests that performance on tasks like the EVIP and RPM (see Fig. 1)
differs for persons with autism as compared to those
with Asperger syndrome. The level of performance on
the block-design task is clearly the highest peak among
the persons with autism as it exceeds even performance
on the EVIP, which is enhanced in relation to other
areas of functioning. In contrast, the performance of
the persons with Asperger syndrome was high on the
EVIP, but average on the RPM and block-design
task. When these two subgroups are combined, block-design and EVIP appear as the two main peaks of performance, although the contribution of participants with
autism and Asperger syndrome differs. These results
are discrepant with the recent arguments that autism
and Asperger syndrome are not different at a neuropsychological level (Miller & Ozonoff, 2000). Rather,
the findings presented here indicate that the peaks of
abilities of persons with autism and Asperger syndrome
are not similar and that this difference needs to be considered when choosing matching instruments.
GENERAL RECOMMENDATIONS
Some recommendations follow from these two
studies. The first recommendation would be to focus
on people with autism with mental retardation as well
as on higher functioning persons. According to the current trend revealed by Study 1, matching issues, the availability of high-functioning individuals, and other
practical concerns result in an emphasis on cross-sectional studies involving adult, intelligent individuals with PDDs. Besides the positive aspect of increasing
our understanding of high-functioning individuals, one
may question whether this research strategy will also
be associated with conceptual and empirical insights
about persons with autism who function in the range of
mental retardation. Moreover, the use of adults leads
to difficulties in disentangling nuclear cognitive
deficits—what we are supposed to look for—from the
effects of complex compensatory mechanisms and
training or deprivation (or “life experiences,” see
Burack et al., this issue)—which are measured.
The second recommendation is that the same level
of concern regarding standardization for the diagnosis
of PDD should be applied to the use of intelligence
instruments. Although the scientific community has considerably reduced the noise associated with the diagnosis of PDD through the use of standardized diagnostic
instruments, the tools chosen for the assessment of
intelligence level for matching purposes remain quite
inconsistent. In order to address this problem, the use of
the Wechsler scale should be considered as standard
scientific practice whereas the other instruments that
are commonly used, EVIP–BPVS–PPVT and RPM,
should not be employed as matching instruments because they introduce major risks for type-1 and type-2
errors.
The last recommendation is related to the choice
of a general versus specific matching variable. The data
and arguments presented here distinguish two clearly separate types of cognitive studies with regard to matching strategies. In one scenario, errors related to
over- and underestimation biases are prevalent when
searching for main effects of group, as is the case when
studying peaks or valleys in cognitive performance. In this situation, the use of the Wechsler instruments is recommended to avoid type-1 or type-2
errors. In contrast, when studying group-by-conditions
interactions, an over- or underestimation of the general
level of the clinical group is less influential. This is the
case in fine-tuned cognitive tasks in which the purpose
is to disentangle the relative role of two mechanisms
(or the influence of one operation on another), rather
than main effects of superiority or inferiority. In such
cases, group main effects (i.e., possibly resulting from
an imprecise matching between groups) may be left
uninterpreted (e.g., Mottron et al., 1999). In this case,
a task-specific matching variable is preferable to a general one. This variable needs to be justified for each
different experiment, thereby precluding a single
solution that is valid for all the types of experiments
(Burack et al., this issue).
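The reason a level bias matters less for group-by-condition interactions can be sketched numerically. The sketch assumes, as a simplification not stated in the article, that an imprecise match shifts the comparison group's performance by the same amount in every condition; the condition names and reaction times are hypothetical.

```python
def main_effect(group_a, group_b):
    """Difference between groups in mean score across conditions."""
    mean_a = sum(group_a.values()) / len(group_a)
    mean_b = sum(group_b.values()) / len(group_b)
    return mean_a - mean_b


def interaction(group_a, group_b, cond_1, cond_2):
    """Difference between groups in the cond_1-minus-cond_2 effect."""
    effect_a = group_a[cond_1] - group_a[cond_2]
    effect_b = group_b[cond_1] - group_b[cond_2]
    return effect_a - effect_b


# Hypothetical mean reaction times (ms) in two conditions.
autism = {"condition_1": 620, "condition_2": 560}
comparison = {"condition_1": 580, "condition_2": 600}

# The same comparison group if imprecise matching had made it 40 ms faster
# overall (assumed constant shift across conditions).
offset_ms = -40
comparison_mismatched = {c: rt + offset_ms for c, rt in comparison.items()}

# The main effect of group changes with the mismatch ...
print(main_effect(autism, comparison))             # 0.0
print(main_effect(autism, comparison_mismatched))  # 40.0

# ... but the group-by-condition interaction does not.
print(interaction(autism, comparison, "condition_1", "condition_2"))             # 80
print(interaction(autism, comparison_mismatched, "condition_1", "condition_2"))  # 80
```

If the effect of a level mismatch is not constant across conditions (for example, with floor or ceiling effects), the interaction is no longer protected in this way, which is one reason the matching variable still needs to be justified for each experiment.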
ACKNOWLEDGMENTS
We thank Geneviève Martel, Patricia Jelenic, and
Marie-Josée Caron for research assistance and Jake
Burack, Michelle Dawson, and Oriane Landry for editing help on an earlier version of this manuscript. This
work was supported by a grant from the Canadian
Institute for Health Research, “Characterizing cognitive
deficit in Autism and Asperger Syndrome.”
REFERENCES
Burack, J. A., Iarocci, G., Bowler, D., & Mottron, L. (2002). Benefits and pitfalls in the merging of disciplines: The example of
developmental psychopathology and the study of persons with
autism. Development and Psychopathology, 14, 225–237.
Burke, H. R. (1985). Raven’s Progressive Matrices: More on norms,
reliability and validity. Journal of Clinical Psychology, 41,
231–235.
Datafinder 3.5. Digimed Systems, Montréal, Canada
<info@digimedsystems.com>.
Dunn, L. M., & Dunn, E. S. (1981). Peabody Picture Vocabulary
Test-Revised. Circle Pines, MN: American Guidance Services.
Dunn, L. M., Dunn, L. M., Whetton, C., & Pintilie, D. (1982). British
Picture Vocabulary Scale. Windsor: NFER-NELSON.
Dunn, L. M., Thériault-Whalen, C. M., & Dunn, L. M. (1993). Échelle
de vocabulaire en image Peabody. Toronto, Ontario: Psycan.
Hermelin, B., & O’Connor, N. (1970). Psychological experiments
with autistic children. Oxford: Pergamon Press.
Jensen, A. R., Saccuzzo, D. P., & Larsen, G. E. (1988). Equating
the standard and advanced forms of the Raven Progressive
Matrices. Educational and Psychological Measurement, 48,
1091–1095.
Mangiaracina, J., & Simon, M. J. (1986). Comparison of the PPVT-R
and WAIS-R in state hospital psychiatric patients. Journal of
Clinical Psychology, 42, 817–820.
Miller, J. N., & Ozonoff, S. (2000). The external validity of Asperger
disorder: Lack of evidence from the domain of neuropsychology. Journal of Abnormal Psychology, 109, 227–238.
Minshew, N. J., Sweeney, J., & Luna, B. (2002). Autism as a selective disorder of complex information processing and underdevelopment of neocortical systems. Molecular Psychiatry, 7,
S14–15.
Mottron, L., Burack, J. A., Stauder, J. E., & Robaey, P. (1999).
Perceptual processing among high-functioning persons with
autism. Journal of Child Psychology and Psychiatry, 40,
203–211.
Mottron, L., & Burack, J. (2001). Enhanced perceptual functioning
in the development of autism. In J. A. Burack, T. Charman,
N. Yirmiya, & P. R. Zelazo (Eds.), The development of autism:
Perspectives from theory and research (pp. 131–148). Mahwah,
NJ: Lawrence Erlbaum.
O’Leary, U. M., Rusch, K. M., & Guastello, S. J. (1991). Estimating age-stratified WAIS-R IQs from scores on the Raven’s Standard Progressive Matrices. Journal of Clinical Psychology, 47,
277–284.
Ozonoff, S., Pennington, B. F., & Rogers, S. J. (1991). Executive
function deficits in high-functioning autistic individuals: Relationship to theory of mind. Journal of Child Psychology and
Psychiatry, 32, 1081–1105.
Price, D. R., Herbert, D. A., Walsh, M. L., & Law, J. G. (1990). Study
of WAIS-R, Quick Test and PPVT IQs for neuropsychiatric
patients. Perceptual and Motor Skills, 70, 1320–1322.
Raven, J. C. (1938, 1996). Progressive Matrices: A perceptual test
of intelligence. Individual form. Oxford: Oxford Psychologists
Press Ltd.
Raven, J. C. (1947, 1995). Colored Progressive Matrices Sets I and II.
Oxford: Oxford Psychologists Press Ltd.
Snitz, B. E., Bieliauskas, L. A., Crossland, A., Basso, M. R., &
Roper, B. (2000). PPVT-R as an estimate of premorbid intelligence in older adults. The Clinical Neuropsychologist, 14,
181–186.
Wechsler, D. (1974). Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.
Wechsler, D. (1981). Wechsler Adult Intelligence Scale-Revised. New
York: The Psychological Corporation.
Yirmiya, N., Erel, O., Shaked, M., & Solomonica-Levi, D. (1998).
Meta-analyses comparing theory of mind abilities of individuals
with autism, individuals with mental retardation, and normally
developing individuals. Psychological Bulletin, 124, 283–307.