Journal of Autism and Developmental Disorders, Vol. 34, No. 1, February 2004 (© 2004)

Matching Strategies in Cognitive Research with Individuals with High-Functioning Autism: Current Practices, Instrument Biases, and Recommendations

Laurent Mottron1,2

A meta-analysis was performed on the 133 cognitive and behavioral papers in autism using comparison groups in the 1999–2002 period. High-functioning (average IQ: 84.7) adolescents (average age: 14.4 years) are largely dominant. IQ is the most frequent matching variable in use (51.2%). The instruments that are most frequently used to determine IQ or general level are Wechsler scales (46.9%), British Picture Vocabulary Scale (BPVS; 22.3%), and Raven Progressive Matrices (RPM; 16.9%). In order to determine if these instruments were equivalent when applied to individuals with pervasive developmental disorders (PDDs), Wechsler IQ, EVIP (a French Canadian translation of the BPVS), and RPM were given to a group of 14 individuals with autism and 12 with Asperger syndrome. Comparison of Wechsler and RPM IQ values, expressed as percentiles, to percentile values of EVIP scores revealed that EVIP (and to a lesser extent RPM) considerably overestimates the level of all PDD participants as compared to Wechsler Verbal IQ (VIQ), Performance IQ (PIQ), or Full-Scale IQ (FSIQ), whereas these instruments are reported to be strongly correlated in typically developing individuals. This study reveals that identification of objects from a verbal label (the BPVS–PPVT–EVIP task) is a peak of ability in high-functioning individuals with PDDs. This peak of ability, even superior to that of block design, has a detrimental effect on matching based on this instrument. A recommendation to replace BPVS/PPVT/EVIP or RPM by Wechsler scale as a basis of IQ/level matching is provided.
Accordingly, the former instruments are a potential source of type-1 (for cognitive deficits) or type-2 (for cognitive hyperfunctioning) errors.

KEY WORDS: British Picture Vocabulary Scale; high-functioning autism; IQ; matching strategy; methodology; peaks of ability; Raven Progressive Matrices; Wechsler Intelligence Scale.

INTRODUCTION

The charting of cognitive deficits and strengths among persons with autism requires comparison of their performance with that of another group. In addition to differences in clinical status, the target and comparison groups can differ on many factors, such as age and level of functioning, which may influence the cognitive performances under study. Therefore, some kind of control should be used to limit the confounding effects that may result from this heterogeneity in these groups.

Abbreviations: ADI, Autism Diagnostic Interview; ADOS, Autism Diagnostic Observation Schedule; AS, Asperger syndrome; BPVS, British Picture Vocabulary Scale; CA, chronological age; CELF, Clinical Evaluation of Language Fundamentals; EVIP, Échelle de Vocabulaire en Images Peabody; FSIQ, full-scale IQ; HFA, high-functioning autism; MA, mental age; PDD, pervasive developmental disorder; PDDNOS, pervasive developmental disorder not otherwise specified; PIQ, performance IQ; PPVT, Peabody Picture Vocabulary Test; RPM, Raven Progressive Matrices; RT, reaction time; ToM, theory of mind; TROG, Test of Reception of Grammar; VIQ, verbal IQ; WAIS, Wechsler Adult Intelligence Scale; WISC, Wechsler Intelligence Scale for Children.
1 Centre de Recherche Fernand Seguin and Département de Psychiatrie, Université de Montréal, Montréal, Canada.
2 Correspondence should be addressed to Laurent Mottron, Clinique Spécialisée de l’Autisme, Hôpital Rivière des Prairies, 7070 Blvd.
Perras, Montréal, Canada, H1E1A4; e-mail: mottronl@istar.ca

0162-3257/04/0200-0019/0 © 2004 Plenum Publishing Corporation

As autism is a condition that begins early in life, is observed across the range of functioning levels from severe mental retardation to superior intelligence, and may or may not be accompanied by identified neurological syndromes, a random group of individuals who satisfy the criteria for the diagnosis is likely to be heterogeneous at multiple and possibly interacting levels. This heterogeneity complicates the matching procedure and, accordingly, leads to the study of specific subgroups. The choices of these subgroups may be arbitrary or driven by considerations of convenience rather than by purely scientific decisions.

The search for the unique characteristics of autism entails the study of specific factors that are linked either to the phenotype or to the causal mechanism of autism per se and are not shared with other neurodevelopmental disorders. The attempt to disentangle specific and nonspecific factors in cognitive research on autism led to the mental age (MA) matching paradigm that is attributed to Hermelin and O’Connor (1970). Before their seminal contribution, individuals with autism were typically compared to typically developing individuals of the same chronological age (CA), a problematic strategy due to the high incidence of low IQs among persons with autism. In their innovative series of experiments, Hermelin and O’Connor compared the performance of persons with autism who functioned in the range of mental retardation to that of younger typically developing children matched on MA or of nonautistic persons with mental retardation matched on MA. Despite the contributions of the MA-matching paradigm in controlling for differences in developmental level, this strategy introduces other confounding elements.
For example, inherent differences in CA between younger typically developing children and older persons with low IQ limit the implications of findings based on MA-matched studies (for a review, see Burack, Iarocci, Bowler, & Mottron [2002]) because CA differences between groups cannot be disentangled from the differences related to autism per se. Alternatively, when persons with mental retardation matched on IQ are compared to persons with autism, the characteristics associated with the etiologies of mental retardation in the comparison group may be the source of artifactual group differences (Yirmiya, Shaked, & Solomonica-Levi, 1998).

In order to minimize the theoretical and practical shortcomings of MA matching, persons with high-functioning autism (HFA) became the focus of cognitive research during the late 1980s. Individuals with high-functioning autism may be compared to typically developing individuals without the confounding factors of CA discrepancies and syndrome-specific findings. The study of autism is further facilitated in this group due to superior compliance and attention span and, for adults, the ability to legally provide their own consent to participate in the research.

Although individuals with HFA display the same variability in average intelligence as typically developing individuals, they present an uneven profile of strengths and deficits within the subtests that comprise the more general score of intelligence. Peaks and valleys may differentially influence averaged level measures according to the instruments used. Accordingly, the practical aspects of matching on intelligence level may have important consequences for the actual similarity between the groups so matched. Thus, the purpose of this paper is twofold.
The first is to present a systematic survey of the current strategies for choosing subgroups, matching variables, and standardized instruments for assessment of level of functioning in the recent literature on cognitive neuroscience research with individuals with pervasive developmental disorders (PDDs). The second purpose of this paper is to compare empirically the most common instruments used for level matching which were identified in the first study.

STUDY 1: META-ANALYSIS OF MATCHING STRATEGIES AND SUBGROUP CHOICES USED IN BEHAVIORAL NEUROSCIENCE OF PERVASIVE DEVELOPMENTAL DISORDERS

Method

Behavioral and cognitive, peer-reviewed, empirical papers involving group studies indexed in the MEDLINE database in the 4-year period from January 1999 to December 2002 under the entries “Autism” and “Asperger syndrome” were selected. The inclusion criteria were that the papers were published in English, included a full abstract, involved typical or clinical comparison groups or normative data, and that performance was based on reaction time (RT) scores, error/success rates, or another behavioral measure as the dependent variable. Behavioral findings of brain-imaging studies were included in the survey, but clinical trials, rehabilitation, brain imaging, and electrophysiological studies without behavioral dependent measures, as well as studies with only individuals with pervasive developmental disorder not otherwise specified (PDD-NOS), were excluded.

One hundred and thirty-three papers in cognitive neuroscience on autism and Asperger syndrome were reviewed, with a total of 169 comparison groups (as multiple matching groups were used for the same experiment or for several experiments in the same paper). Among the 169 comparisons, 106 (62.7%) compared typically developing individuals with individuals with autism, 24 (14.2%) with Asperger syndrome, and 39 (23.1%) with combined groups of individuals.
For each comparison group, the mean IQ, MA, and CA of the clinical population, type of comparison group used (clinical, typical), type of matching (pair-wise, groupwise, covariation), number of matching variables (e.g., FSIQ, VIQ, PIQ, MA, CA), and tool to assess the level of intelligence or performance used for matching (e.g., WAIS, RPM, PPVT, BPVS) were noted.

In order to determine the proportion of participants with and without mental retardation among the persons with autism, the intelligence levels provided by the various instruments across the studies were converted to a common metric. Because the Wechsler instruments are the most frequent measures of general intelligence in use, Wechsler Full-Scale IQ (FSIQ; Wechsler, 1974, 1981) was taken as the reference measure and provided when available. In cases where only Wechsler Verbal IQ (VIQ) (34 comparisons) or Performance IQ (PIQ) (14 comparisons) were mentioned, this value was taken as an indication of participants’ intelligence. In the cases of the British Picture Vocabulary Scale (BPVS; Dunn, Dunn, Whetton, & Pintilie, 1982), which provides MAs and percentiles, and Raven Progressive Matrices (RPM; Raven, 1938, 1947, 1995, 1996), which provides IQs and percentiles, an approximation of IQ was obtained using the Wechsler Percentile-IQ Correspondence Scale. In the cases of other tests that provide only MA (e.g., Test of Reception of Grammar [TROG], Mullen, Bayley, Vineland Adaptive Behavior Scale, Clinical Evaluation of Language Fundamentals [CELF], Brunet Lézine, and Reynell), an approximation of IQs was computed using the ratio formula IQ = (MA/CA) × 100. Finally, a cut-off between high- and low-functioning of IQ = 67 rather than the usual value of 70 was selected, as this is the lowest value for which a correspondence between IQ and percentile value can be obtained according to the Wechsler manual.

Results

Characteristics of Participants

Distribution of Comparisons According to IQ Level of Participants.
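The IQ levels reported in this section rest on the conversion to a common metric described in the Method. The following is a minimal sketch of that conversion, not the procedure actually used in the survey. It assumes that the Wechsler percentile-IQ correspondence can be approximated by a normal curve with mean 100 and SD 15 (the survey used the published correspondence table), and that ratio IQ is scaled by 100 as is conventional. All function names are hypothetical.

```python
from statistics import NormalDist

# Assumption: deviation IQs follow a normal curve with mean 100 and SD 15.
# This stands in for the published Wechsler percentile-IQ correspondence table.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def percentile_to_iq(percentile: float) -> float:
    """Approximate IQ for a percentile rank (0 < percentile < 100)."""
    return IQ_SCALE.inv_cdf(percentile / 100)


def iq_to_percentile(iq: float) -> float:
    """Approximate percentile rank for an IQ value."""
    return 100 * IQ_SCALE.cdf(iq)


def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ from MA and CA expressed in the same unit, scaled by 100."""
    return 100 * mental_age / chronological_age


def intelligence_level(iq: float) -> str:
    """Bands used in Table I, with the cut-off of 67 adopted in Study 1."""
    if iq < 67:
        return "mild mental retardation"
    if iq <= 90:
        return "borderline intelligence"
    return "normal intelligence"


if __name__ == "__main__":
    # A BPVS or RPM percentile, a test yielding only MA, and the survey's mean IQ.
    print(round(percentile_to_iq(25), 1))
    print(ratio_iq(mental_age=6, chronological_age=12))
    print(intelligence_level(84.7))
```

Under this approximation an IQ of 67 falls between the 1st and 2nd percentiles, which is consistent with its being the lowest value for which a percentile correspondence is available.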
The distribution of comparisons according to IQ levels for the autism, the Asperger, and the combined PDD subgroups is presented in Table I. Due to the lack of availability of information for two papers, proportions are provided for 167 comparisons. The average IQ across comparisons, with each comparison weighted equally, was 84.7. The majority (76.7%) of comparisons relied on participants with autism without mental retardation.

Table I. Distribution of Comparisons According to the Intelligence Level of the Participants(a)

                                           Number of groups (%)
Intelligence level                         Autism         PDD           Asperger
Mild mental retardation (FSIQ < 67)        26 (22.0)      0 (0.0)       0 (0.0)
Borderline intelligence (FSIQ 67 to 90)    45 (38.2)      7 (41.2)      6 (18.8)
Normal intelligence (FSIQ > 90)            47 (39.8)      10 (58.8)     26 (81.2)
Total number of comparisons                118 (100.0)    17 (100.0)    32 (100.0)

PDD, pervasive developmental disorder; FSIQ, full-scale IQ.
(a) PDD refers to studies where data from autism and Asperger syndrome cannot be differentiated.

Distribution of Comparisons According to the Age of Participants. The ages of the participants included in these studies are presented in Table II. The mean age of participants across studies, with each study weighted equally, was 14.4 years. In 85.5% of the studies with persons with autism, 94.1% of those with persons with PDD, and 100% of those with persons with Asperger syndrome (AS), the average age of the participants was older than 6 years. The lack of studies with preschoolers with AS likely results from the practical problem of providing this diagnosis before 5 years of age.

Table II. Distribution of Comparisons According to Chronological Age of Participants

                     Autism                  PDD                     Asperger syndrome
Age range            Number of groups (%)    Number of groups (%)    Number of groups (%)
0 to 5 years old     17 (14.5)               1 (5.9)                 0 (0.0)
6 years and older    100 (85.5)              16 (94.1)               35 (100.0)
Total                117 (100.0)             17 (100.0)              35 (100.0)

PDD, pervasive developmental disorder.

Matching Strategies

Socio-Demographic Variables.
The relative frequencies of the socio-demographic variables used are shown in Table III. Chronological age (approximately two-thirds of the groups, 104 comparisons = 61.5%) and gender (approximately one-fifth, 32 comparisons = 18.9%) were the most frequently used variables.

Table III. Distribution of Socio-Demographic Matching Strategies Among Comparisons (Total: 169)

Variable                                        Matching (%)   Covariation (%)   No differences between groups (%)   Total number of groups (%)
CA                                              83 (49.1)      3 (1.8)           18 (10.7)                           104 (61.5)
Gender                                          29 (17.2)      -                 3 (1.8)                             32 (18.9)
Educational level                               9 (5.3)        -                 1 (0.6)                             10 (5.9)
Laterality                                      10 (5.9)       -                 -                                   10 (5.9)
SES                                             4 (2.4)        -                 1 (0.6)                             5 (3.0)
Other (profession, interest, sensory profile)   4 (2.4)        -                 2 (1.2)                             6 (3.6)

CA, chronological age; SES, socioeconomic status.

Type and Number of Matching Variables. A single matching variable was used in 51 comparisons (30.2%). The most commonly used were VIQ or Verbal level in 20 comparisons (11.8%), CA in 12 (7.1%), PIQ in 5 (2.9%), and MA in 5 (2.9%). Two variables were used in 55 comparisons (32.5%). The most frequent combinations were CA + VIQ or Verbal level (17 comparisons; 10%), CA + FSIQ (9 comparisons; 5.3%), CA + PIQ (7 comparisons; 4.5%), and CA + Gender (4 comparisons; 2.3%). When three variables were used, the most frequent combinations were VIQ/level + PIQ + CA (7 comparisons; 4.5%), FSIQ + Gender + CA (5 comparisons; 2.9%), and VIQ/level + PIQ + FSIQ (3 comparisons; 1.7%). Across all the comparisons, the most common matching variables were VIQ/level in 74 comparisons (43.8%), PIQ in 53 (31.4%), and FSIQ in 36 (21.3%).

Instruments Used to Assess General Intelligence.
Among the 130 comparisons where intelligence was used as a matching variable, the Wechsler Intelligence Scale for Children/Wechsler Adult Intelligence Scale (WISC/WAIS) was the test used most frequently (61/130 = 46.9%), followed by the BPVS, the British version of the PPVT-R (footnote 3), for verbal level (29/130 = 22.3%), and by RPM for performance IQ (22/130 = 16.9%). Scales of adaptive functioning (VABS) were used only marginally (3/169 = 1.8%).

3 BPVS and PPVT (Dunn & Dunn, 1981) are vocabulary scales that consist of pointing to the picture corresponding to a verbally presented name among three distractors. BPVS has been derived from the PPVT-R. PPVT-III is similar to PPVT-R with the exception of a wider age range of validity and more culture- and gender-fair pictures. On the BPVS, only 3.9% of PPVT-III items/words corresponding to specifically U.S. idioms have been replaced by their U.K. equivalent.

Discussion

The current prototypical study in cognitive neuroscience on autism is a cross-sectional study using individuals with HFA with a FSIQ around 85 and CA of 14 years. Participants with low developmental and chronological levels are practically discarded from cognitive research on autism. As a spontaneous and unquestioned research strategy, this finding raises important issues of external validity. Can the knowledge based on this intelligence level and age group, now clearly the main source of knowledge regarding cognition in autism, be generalized to individuals in the entire range of age and IQ levels? Studying individuals more than 10 years after the onset of the process that made them autistic cannot separate the specific effects of the pathological process per se from those of reactions to this process. The cognitive profile of adults with autism plausibly results from the combined effects of expertise and over-training linked to special interests, adaptive compensatory strategy, or, in contrast, under-training linked to impoverishment of external stimulation.
Accordingly, the use of adults is probably not the best strategy to document “primary” deficits, even if this represents an indispensable source of information on the cognitive condition of adults with autism.

The second finding of this literature survey was that CA and IQ/level, mostly verbal but also performance or FSIQ, are the dominant matching variables in current cognitive research on PDDs. This is due to the use of high-functioning individuals, which allows CA matching, a practice impossible with individuals with mental retardation. However, the use of IQ or verbal level as the main matching variable puts a heavy burden on the instruments used to measure this IQ. This study also establishes that although Wechsler scales are largely dominant (approximately one-half of the studies), two other instruments, BPVS (one-quarter) and RPM (one-sixth), are also used in a significant proportion of studies. Therefore, possible differences in sensitivity and lack of comparability among studies may result from this use of multiple instruments, which may not be correlated similarly in comparison and clinical populations. This issue will be addressed in the following study.

STUDY 2: ARE THE DIFFERENT INSTRUMENTS USED FOR IQ OR LEVEL MATCHING EQUIVALENT?

A survey of recent cognitive literature on autism indicates that partial or general measures of IQ are the variables most frequently used for matching. Nevertheless, researchers have treated as equivalent several instruments for measuring the intelligence level on which the matching is performed. For example, verbal IQ or verbal level is obtained through Wechsler VIQ or BPVS, and nonverbal IQ is obtained through RPM or Wechsler PIQ. The purpose of Study 2 was therefore to determine if these instruments were equivalent and, more precisely, the type of biases (over- or underestimation) possibly introduced by these instruments.
To answer these questions, we decided to compare the intelligence level of a group of individuals with PDD, as measured by the two most frequently used measures of verbal performance (Wechsler VIQ vs. EVIP score) and nonverbal performance (Wechsler PIQ vs. RPM). EVIP is a French translation of PPVT-R, valid for French-Canadian as well as European French populations.

Method

All individuals (a) with a diagnosis of autism, using both the Autism Diagnostic Interview (ADI) or ADI-R and the Autism Diagnostic Observation Schedule-Generic (ADOS-G) module 3 or 4, (b) having received a full-scale intelligence measurement with WISC-III or WAIS-III, (c) who were administered both an RPM and an EVIP assessment, and (d) with a FSIQ > 60 were selected from the PDD population of the Clinique Spécialisée de l’Autisme of Rivière-des-Prairies Hospital, using the Datafinder database (Digimed systems). Among this group, individuals who were diagnosed with autism, but without language delay, echolalia, pronoun reversal, or stereotyped language at ADI administration, were considered as having Asperger syndrome. This resulted in a group of 26 individuals (autism subgroup: 14 individuals, 11 M, 3 F; mean age, 12.5; SD, 8.2; range, 6–39; Asperger syndrome subgroup: 12 individuals, 8 M, 4 F; mean age, 15.3; SD, 6.5; range, 7–29). PIQ, VIQ, and FSIQ obtained by WISC/WAIS and IQs or levels from EVIP and RPM were transformed into percentiles according to the tables provided with each instrument and compared using Student t test for paired samples (two-tailed). Because of the relatively small samples, the Wilcoxon matched-pairs test was also carried out to validate the Student t-test results (Fig. 1).

Fig. 1. Percentiles equivalent obtained by EVIP, RPM, and Wechsler subscales.

Results

Relative IQ/levels (percentile equivalent) obtained by EVIP, RPM, and Wechsler subscales are represented in Figure 1, and statistical analyses among values provided by these different instruments in Table IV. For comparison purposes, the Wechsler block-design and vocabulary subscales were included in the analyses.

Autism Subgroup

Wechsler FSIQ, VIQ, and PIQ comparison revealed a nonsignificant PIQ > VIQ pattern, thus replicating the finding of Siegel et al. (1996) of unremarkable differences between Wechsler sub- and full-scale among sufficiently large groups of persons with autism. In contrast, differences in the percentile measure obtained by Wechsler VIQ and EVIP level were statistically significant [Student t test for paired samples (two-tailed): t(13) = 4.55, p = 0.001; Wilcoxon matched-pairs: z = 3.30, p = 0.001]. This indicates that the EVIP considerably overestimates verbal abilities in comparison to the Wechsler verbal subscale. Differences between Wechsler PIQ and RPM were not significant, but differences between Wechsler FSIQ and RPM were near significance or significant according to the analysis [Student t test for paired samples (two-tailed): t(13) = 2.06, p = 0.060; Wilcoxon matched-pairs: z = 2.10, p = 0.036], showing that the RPM overestimates general intelligence level in comparison to the Wechsler Full-Scale IQ.

Asperger Subgroup

Comparisons among Wechsler FSIQ, VIQ, and PIQ revealed a nonsignificant VIQ > PIQ pattern that replicates Miller & Ozonoff’s (2000) findings with this group. In contrast, EVIP level was significantly superior to Wechsler VIQ [Student t test for paired samples (two-tailed): t(11) = 4.01, p = 0.002; Wilcoxon matched-pairs: z = 2.85, p = 0.004]. Differences between Wechsler PIQ and RPM and between Wechsler FSIQ and RPM were not significant (see Fig. 1).

Findings from the Combined Groups of Persons with Autism and Persons with Asperger Syndrome

In order to validate the findings in the face of possible disagreements concerning the strategy for diagnostically distinguishing between persons with autism and those with Asperger syndrome, a single analysis was conducted on both groups. According to this analysis, scores were highest on the EVIP and were superior to those on the three Wechsler scales. The scores on both the EVIP and block design were superior to those on the Wechsler VIQ and PIQ, thereby indicating that they may be considered as peaks of ability. At an individual level, EVIP level was greater than Wechsler VIQ for 24 out of 26 individuals with autism or Asperger syndrome. In contrast, differences between RPM and Wechsler PIQ in the combined group were not significant.

Table IV. Consistency Among Percentiles Equivalent Obtained by Different Instruments (t-Test, p-values)

                       1              2       3             4       5      6
Asperger
1. EVIP                -
2. RPM                 .0048(c)       -
3. Block design(a)     .05(b)         .45     -
4. Vocabulary(a)       .013(b)        .23     .77           -
5. PIQ(a)              .0084(c)       .33     .0084(c)      .10     -
6. VIQ(a)              .0021(c)       .44     .79           .34     .21    -
7. FSIQ(a)             .0033(c)       .61     .040(b)       .07     .28    .21

Autism
1. EVIP                -
2. RPM                 .31            -
3. Block design(a)     .60            .16     -
4. Vocabulary(a)       .0020(c)       .12     .082          -
5. PIQ(a)              .17            .44     .015(b)       .63     -
6. VIQ(a)              .00056(d)      .09     .038(b)       .75     .29    -
7. FSIQ(a)             .00058(d)      .06     .02(b)        .94     .25    .42

Asperger or autism
1. EVIP                -
2. RPM                 .0053(c)       -
3. Block design(a)     .37            .11     -
4. Vocabulary(a)       .00005(d)      .81     .20           -
5. PIQ(a)              .0032(c)       .21     .00027(d)     .28     -
6. VIQ(a)              .000002(d)     .27     .053          .34     .86    -
7. FSIQ(a)             .00001(d)      .09     .0029(c)      .12     .90    .60

(a) Wechsler Intelligence Scale. (b) p < .05. (c) p < .01. (d) p < .001.

Discussion

Type of Bias Introduced by EVIP and RPM

The purpose of Study 2 was to compare the intelligence levels, expressed as percentile equivalents in a typically developing population, yielded by the tests most frequently used to measure the IQ or level of performance on which matching is based. The findings indicate that EVIP, RPM, and Wechsler scales result in important differences in performance when applied to the same population of high-functioning PDD individuals.
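The paired comparisons reported in the Results of Study 2 (Student t test for paired samples, validated by the Wilcoxon matched-pairs test) can be sketched as follows. This is an illustration only, written with the Python standard library: the percentile values are hypothetical and are not the data of the study, the function names are invented, and the Wilcoxon z uses the plain normal approximation without tie or continuity correction, which may differ from the software used for the reported values. The two-tailed p for t would be read from a Student t distribution with n - 1 degrees of freedom, which is not computed here.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired-samples t test on the differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def wilcoxon_z(x, y):
    """Return (z, two-tailed p) for the Wilcoxon matched-pairs test.

    Zero differences are dropped and tied absolute differences share
    their average rank. Normal approximation, no tie or continuity correction.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - expected) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical percentile equivalents for six participants (not study data).
evip_percentiles = [90, 75, 84, 95, 60, 70]
viq_percentiles = [50, 63, 37, 75, 45, 66]

if __name__ == "__main__":
    print(paired_t(evip_percentiles, viq_percentiles))
    print(wilcoxon_z(evip_percentiles, viq_percentiles))
```

Working on percentile equivalents rather than raw scores is what makes the instruments comparable here, since each is first referred to its own norms in a typically developing population.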
For the subgroup of individuals with autism, the performance discrepancies are especially important between EVIP and RPM/Wechsler scales and between RPM and Wechsler FSIQ. In the subgroup of individuals with Asperger syndrome and in the combined autism–Asperger group, the discrepancies are greatest between EVIP and Wechsler scales. As a consequence, findings based on matching with vocabulary-based scales, RPM, and Wechsler measures are not compatible, and the comparisons across studies need to involve corrections for differences in sensitivity.

The concern that correlations between vocabulary-based scales and Wechsler scales might be substantially lower in clinical populations than in typically developing individuals is supported by prior evidence. In a MEDLINE literature review, findings from 17 studies conducted with various clinical populations (mostly learning disabilities and mental retardation at large) indicated that PPVT-R underestimates intelligence level compared to Wechsler VIQ or FSIQ. In contrast, evidence from three studies suggested that PPVT-R overestimates intelligence level in comparison with Wechsler VIQ or FSIQ, mostly in neuropsychiatric patients and elderly individuals with average to high IQ (Mangiaracina & Simon, 1986; Price, Herbert, Walsh, & Law, 1990; Snitz, Bieliauskas, Crossland, Basso, & Roper, 2000). When individuals with low intelligence are used as a comparison group for research conducted with persons with PDD, the additive effect of these two opposite biases creates a high risk of comparing populations with important differences in level of general intelligence.

The bias introduced by the use of RPM as a matching instrument is quantitatively smaller than that associated with the EVIP, but was still significant in the subgroup of persons with autism.
This difference in measures for persons with autism is discrepant with the high correlation between RPM and FSIQ or VIQ that is evidenced among typically developing individuals (Burke, 1985; Jensen, Saccuzzo, & Larsen, 1988; O’Leary, Rusch, & Guastello, 1991) as well as among a heterogeneous population of patients in psychiatric hospitals (O’Leary et al., 1991).

These findings from the analyses of the EVIP and the RPM are inconsistent with the assumption that instruments that measure intelligence in persons with PDD reflect general intelligence in the same way as in the comparison groups. The matching of persons with autism to typically developing individuals with vocabulary-based scales or with RPM results in an overestimation of the general intelligence of the former group which would not be manifested with a Wechsler scale. A more radical conclusion is that the concept of VIQ for persons with autism, and even more so for persons with Asperger syndrome, consists of two very different measures that cannot be substituted for each other. One is an average of measures involving language use and language mechanisms, for which the Wechsler VIQ might be one source, and the other is a measure of the peak of ability in labeling objects as measured by vocabulary-based scales.

Possible Explanation of the Differences in Intelligence Level Between Wechsler Scales and the Two Other Instruments Under Study

My position is that vocabulary-based instruments, and to a lesser extent the RPM, are tasks that rely on a more limited range of cognitive abilities than the corresponding verbal and nonverbal Wechsler subscales and the resultant FSIQ. The consequence of this difference is that the extreme scores characterizing peaks of abilities of persons with autism influence the performance on these instruments to a larger extent than on Wechsler VIQ, PIQ, or FSIQ. This is evident when the Wechsler standard scores for the subjects in Study 2 are averaged with and without the peaks and valleys.
The peaks were defined according to the WISC-III manual (Canadian supplement) as a significant difference (p < 0.05) of a specific subtest in relation to the average of six other subtests. This resulted in a mean of 2.7 classical peaks (e.g., block design, vocabulary) and 2.15 valleys (e.g., comprehension, digit-span) per subject. Despite the prevalence of these widely discrepant levels of ability, the averaged standard scores with and without the extreme values were similar. In contrast, strengths in the abilities that are the basis of EVIP and RPM scores are not compensated by other tasks in which performance is average or inferior.

A second position is that vocabulary-based scales and RPM happen to tap into cognitive systems that correspond to peaks of ability among persons with autism. Both instruments involve tasks in which the overfunctioning of low-level perceptual cognitive operations that characterizes autism (Mottron & Burack, 2001) is an advantage. Accordingly, the association of a verbal label and a picture (EVIP) may be performed at a low perceptual level, just as the similarity between a series of visually presented patterns (RPM) may be detected by extraction of perceptual regularities. Consistent with this interpretation and as indicated in Figure 1, performance on vocabulary-based scales by persons with PDDs is even higher than on the Wechsler block-design task, the most documented source of a peak of ability in this group.

Post hoc Reinterpretations of Data Based on the Use of These Instruments

One of the major goals of current research on autism is the search for cognitive hyper- or hypofunctioning. Hyperfunctioning refers to peaks of performance on a certain operation or in processing a certain type of material that is higher than the performance on other operations or with other materials. In terms of cognitive operations, most hyperfunctioning is related to tasks based on perception.
In terms of the material that is processed, most hyperfunctioning in individuals with autism is found with nonsocial materials (Mottron & Burack, 2001). In contrast, hypofunctioning refers to “valleys” in performance, when performance on a given task or with a specific material is inferior to the average performance of the participant. This is the case for theory of mind tasks, executive tasks, or more generally “complex” tasks (Minshew, Sweeney, & Luna, 2002). In terms of material, most hypofunctioning is evident in the processing of social material.

The evidence from Study 2 establishes that the peaks of abilities associated with the EVIP and RPM result in an overestimation of general intelligence level in persons with autism. In other words, and keeping in mind the WISC IQ profile characterizing autism, EVIP and RPM percentile values are closer to the vocabulary and the block-design peaks than to the FSIQ. In contrast, FSIQ is closer to the baseline of IQ subtests. As a consequence, the use of the levels provided by EVIP and RPM for matching purposes results in a possible discrepancy between target and comparison group level according to the task under study. If the task relies on the same type of cognitive operations as those on which EVIP and RPM are based, the comparison group will be at the same level as the persons with autism. When searching for possible hyperfunctioning, matching with these instruments will likely result in a negative finding because the related baseline abilities of the persons with autism will be overestimated. This is a false negative finding (or type-2 error: a failure to reject the null hypothesis when it is in fact false) when looking for hyperfunctioning of the autism group. Thus, the peaks of abilities among persons with autism may not be sufficiently reported in the characterization of cognitive profiles.
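The direction of the bias described above can be made explicit with a toy calculation. The sketch below is not a model proposed in this paper: it assumes, purely for illustration, that scores are additive on a single arbitrary scale, that a typical participant's task score equals his or her general level, and that the clinical group's score on the matching instrument equals its general level plus a fixed peak. The function name and the numeric values are hypothetical. The same arithmetic also covers a task unrelated to the matching instrument, the case taken up next.

```python
def group_difference(true_level, peak, task_uses_peak, match_on_peak_instrument):
    """Expected clinical-minus-comparison difference under a toy additive model.

    true_level: general level of the clinical group (as a Wechsler-type
        measure would estimate it, by assumption).
    peak: amount by which the peak ability exceeds that general level.
    task_uses_peak: True if the task under study draws on the peak ability.
    match_on_peak_instrument: True if groups are matched on an instrument
        that itself taps the peak (the EVIP/RPM situation described above).
    """
    # Matching on a peak-sensitive instrument recruits a comparison group
    # whose general level equals the clinical group's inflated score.
    comparison_level = true_level + peak if match_on_peak_instrument else true_level
    clinical_score = true_level + (peak if task_uses_peak else 0)
    comparison_score = comparison_level  # typical participants: flat profile
    return clinical_score - comparison_score


if __name__ == "__main__":
    level, peak = 50, 20  # hypothetical values on an arbitrary scale
    for uses_peak in (True, False):
        for on_peak_instrument in (True, False):
            diff = group_difference(level, peak, uses_peak, on_peak_instrument)
            print(uses_peak, on_peak_instrument, diff)
```

Under these assumptions, a task that draws on the peak shows no group difference when matching is done on the peak-sensitive instrument, although the peak would appear with matching on general level. A task unrelated to the peak shows an apparent deficit of the same size under peak-sensitive matching, and none under matching on general level.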
In the case where the task under study is not related to the matching variable (e.g., looking for an executive deficit when matching on a vocabulary-based instrument), the overestimation of the general intelligence of the clinical group introduced by the matching procedure creates a heightened risk for a false positive finding (or type-1 error, which occurs when a true null hypothesis is incorrectly rejected) when looking for deficits. As a consequence, studies matching target and comparison groups using a vocabulary-based instrument or RPM may have inflated the deficits characterizing autism. This is especially important in the field of face and emotion processing, for which verbal performance is typically used for matching (Ozonoff, Pennington, & Rogers, 1991). This may also lead to the reconsideration of some of the findings related to impairment in theory of mind tasks, as BPVS is frequently used to match groups in these tasks, following the notion that they are correlated with language ability (see Burack et al., this issue). Similarly, some group differences (e.g., group effect in reaction time; Mottron, Burack, Stauder, & Robaey, 1999) may result from overestimation of participants with autism when they are matched on RPM.

Role of Matching Instruments in the Validation of the Autism/Asperger Distinction

The evidence from Study 2 suggests that performance on tasks like the EVIP and RPM (see Fig. 1) differs for persons with autism as compared to those with Asperger syndrome. The level of performance on the block-design task is clearly the highest peak among the persons with autism, as it exceeds even performance on the EVIP, which is enhanced in relation to other areas of functioning. In contrast, the performance of the persons with Asperger syndrome was high on the EVIP, but average on the RPM and block-design task.
When these two subgroups are combined, block design and EVIP appear as the two main peaks of performance, although the contribution of participants with autism and Asperger syndrome differs. These results are discrepant with the recent arguments that autism and Asperger syndrome are not different at a neuropsychological level (Miller & Ozonoff, 2000). Rather, the findings presented here indicate that the peaks of abilities of persons with autism and Asperger syndrome are not similar and that this difference needs to be considered when choosing matching instruments.

GENERAL RECOMMENDATIONS

Some recommendations follow from these two studies. The first recommendation would be to focus on people with autism with mental retardation as well as on higher functioning persons. According to the current trend revealed by Study 1, matching issues, the availability of high-functioning individuals, and other practical concerns result in an emphasis on cross-sectional studies involving adult, intelligent individuals with PDDs. Besides the positive aspect of increasing our understanding of high-functioning individuals, one may question whether this research strategy will also be associated with conceptual and empirical insights about persons with autism who function in the range of mental retardation. Moreover, the use of adults leads to difficulties in disentangling nuclear cognitive deficits (what we are supposed to look for) from the effects of complex compensatory mechanisms and training or deprivation (or “life experiences”; see Burack et al., this issue), which are what is actually measured. The second recommendation is that the same level of concern regarding standardization for the diagnosis of PDD should be applied to the use of intelligence instruments.
Although the scientific community has considerably reduced the noise associated with the diagnoses of PDD by the use of standardized diagnostic instruments, the tools chosen to assess intelligence level for matching purposes remain quite discrepant. In order to address this problem, the use of the Wechsler scale should be considered as standard scientific practice, whereas the other instruments that are commonly used, EVIP–BPVS–PPVT and RPM, should not be employed as matching instruments because they introduce major risks for type-1 and type-2 errors.

The last recommendation is related to the choice of a general versus specific matching variable. The data and arguments presented here lead to two clearly separate types of cognitive studies with regard to matching strategies. In one scenario, errors arising from over- and underestimation biases are prevalent when searching for main effects of group, as is the case when studying peaks or valleys in cognitive performance. In this situation, the use of the Wechsler instruments is recommended to avoid type-1 or type-2 errors. In contrast, when studying group-by-condition interactions, an over- or underestimation of the general level of the clinical group is less influential. This is the case in fine-tuned cognitive tasks in which the purpose is to disentangle the relative role of two mechanisms (or the influence of one operation on another), rather than main effects of superiority or inferiority. In such cases, group main effects (i.e., possibly resulting from an imprecise matching between groups) may be left uninterpreted (e.g., Mottron et al., 1999), and a task-specific matching variable is preferable to a general one. This variable needs to be justified for each different experiment, thereby precluding a single solution that is valid for all the types of experiments (Burack et al., this issue).
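The contrast between group main effects and group-by-condition interactions can be sketched numerically. The following is a minimal sketch under a strong assumption that is not stated in the text: that a mismatch in general level shifts every comparison-group cell mean by the same additive amount. The cell means, the condition names, and the helper functions are all hypothetical.

```python
# Illustrative only: hypothetical cell means (e.g., percent correct), not study data.

def main_effect_of_group(cells):
    """Mean autism performance minus mean comparison performance."""
    autism = cells["autism"]
    comparison = cells["comparison"]
    return (sum(autism.values()) / len(autism)
            - sum(comparison.values()) / len(comparison))


def group_by_condition_interaction(cells):
    """Condition effect (A minus B) in autism minus the same effect in comparison."""
    autism = cells["autism"]
    comparison = cells["comparison"]
    return ((autism["condition_a"] - autism["condition_b"])
            - (comparison["condition_a"] - comparison["condition_b"]))


def shift_comparison(cells, offset):
    """Copy of `cells` with every comparison cell shifted by `offset`.

    Assumption: imprecise matching acts as a constant additive shift.
    """
    shifted = {cond: score + offset for cond, score in cells["comparison"].items()}
    return {"autism": dict(cells["autism"]), "comparison": shifted}


well_matched = {
    "autism": {"condition_a": 80.0, "condition_b": 60.0},
    "comparison": {"condition_a": 70.0, "condition_b": 65.0},
}
# Hypothetical mismatch: comparison group 10 points higher in general level.
mismatched = shift_comparison(well_matched, 10.0)

main_matched = main_effect_of_group(well_matched)
main_mismatched = main_effect_of_group(mismatched)
inter_matched = group_by_condition_interaction(well_matched)
inter_mismatched = group_by_condition_interaction(mismatched)

print("Group main effect:", main_matched, "->", main_mismatched)
print("Interaction:      ", inter_matched, "->", inter_mismatched)
```

Under this additive assumption the group main effect changes sign (2.5 to -7.5) while the interaction stays at 15. If the mismatch affected conditions unequally (for example through ceiling effects), the interaction would not be protected in this way, which is consistent with describing the bias as less influential rather than absent.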
ACKNOWLEDGMENTS

We thank Geneviève Martel, Patricia Jelenic, and Marie-Josée Caron for research assistance and Jake Burack, Michelle Dawson, and Oriane Landry for editing help on an earlier version of this manuscript. This work was supported by a grant from the Canadian Institute for Health Research, “Characterizing cognitive deficit in Autism and Asperger Syndrome.”

REFERENCES

Burack, J. A., Iarocci, G., Bowler, D., & Mottron, L. (2002). Benefits and pitfalls in the merging of disciplines: The example of developmental psychopathology and the study of persons with autism. Development and Psychopathology, 14, 225–237.
Burke, H. R. (1985). Raven’s Progressive Matrices: More on norms, reliability and validity. Journal of Clinical Psychology, 41, 231–235.
Datafinder 3.5. Digimed Systems, Montréal, Canada <info@digimedsystems.com>.
Dunn, L. M., & Dunn, E. S. (1981). Peabody Picture Vocabulary Test-Revised. Circle Pines, MN: American Guidance Services.
Dunn, L. M., Dunn, L. M., Whetton, C., & Pintilie, D. (1982). British Picture Vocabulary Scale. Windsor: NFER-NELSON.
Dunn, L. M., Thériault-Whalen, C. M., & Dunn, L. M. (1993). Échelle de vocabulaire en images Peabody. Toronto, Ontario: Psycan.
Hermelin, B., & O’Connor, N. (1970). Psychological experiments with autistic children. Oxford: Pergamon Press.
Jensen, A. R., Saccuzzo, D. P., & Larsen, G. E. (1988). Equating the standard and advanced forms of the Raven Progressive Matrices. Educational and Psychological Measurement, 48, 1091–1095.
Mangiaracina, J., & Simon, M. J. (1986). Comparison of the PPVT-R and WAIS-R in state hospital psychiatric patients. Journal of Clinical Psychology, 42, 817–820.
Miller, J. N., & Ozonoff, S. (2000). The external validity of Asperger disorder: Lack of evidence from the domain of neuropsychology. Journal of Abnormal Psychology, 109, 227–238.
Minshew, N. J., Sweeney, J., & Luna, B. (2002).
Autism as a selective disorder of complex information processing and underdevelopment of neocortical systems. Molecular Psychiatry, 7, S14–15.
Mottron, L., Burack, J. A., Stauder, J. E., & Robaey, P. (1999). Perceptual processing among high-functioning persons with autism. Journal of Child Psychology and Psychiatry, 40, 203–211.
Mottron, L., & Burack, J. (2001). Enhanced perceptual functioning in the development of autism. In J. A. Burack, T. Charman, N. Yirmiya, & P. R. Zelazo (Eds.), The development of autism: Perspectives from theory and research (pp. 131–148). Mahwah, NJ: Lawrence Erlbaum.
O’Leary, U. M., Rusch, K. M., & Guastello, S. J. (1991). Estimating age-stratified WAIS-R IQs from scores on the Raven’s Standard Progressive Matrices. Journal of Clinical Psychology, 47, 277–284.
Ozonoff, S., Pennington, B. F., & Rogers, S. J. (1991). Executive function deficits in high-functioning autistic individuals: Relationship to theory of mind. Journal of Child Psychology and Psychiatry, 32, 1081–1105.
Price, D. R., Herbert, D. A., Walsh, M. L., & Law, J. G. (1990). Study of WAIS-R, Quick Test and PPVT IQs for neuropsychiatric patients. Perceptual and Motor Skills, 70, 1320–1322.
Raven, J. C. (1938, 1996). Progressive Matrices: A perceptual test of intelligence. Individual form. Oxford: Oxford Psychologists Press Ltd.
Raven, J. C. (1947, 1995). Colored Progressive Matrices Sets I and II. Oxford: Oxford Psychologists Press Ltd.
Snitz, B. E., Bieliauskas, L. A., Crossland, A., Basso, M. R., & Roper, B. (2000). PPVT-R as an estimate of premorbid intelligence in older adults. The Clinical Neuropsychologist, 14, 181–186.
Wechsler, D. (1974). Wechsler Intelligence Scale for Children-Revised. New York: The Psychological Corporation.
Wechsler, D. (1981). Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation.
Yirmiya, N., Erel, O., Shaked, M., & Solomonica-Levi, D. (1998).
Meta-analyses comparing theory of mind abilities of individuals with autism, individuals with mental retardation, and normally developing individuals. Psychological Bulletin, 124, 283–307.