XBA in Perspective (December, 2009)

This document was prepared by Dawn P. Flanagan, Samuel O. Ortiz, and Vincent C. Alfonso in response
to some questions and/or misconceptions about the XBA approach as well as the software programs
that were published in Essentials of Cross-Battery Assessment, 2nd edition (Flanagan, Ortiz, & Alfonso,
2007)."
1. What is the Cross-battery Assessment (XBA) approach?
The XBA approach is an assessment method composed of pillars, guidelines, and procedures
that assist practitioners in measuring a wider and more in-depth range of cognitive abilities and
processes than that represented by a single ability battery (cognitive or achievement) in a manner that is
psychometrically respectable and based on contemporary theory. The XBA approach ensures that
assessments include composites that are relatively pure representations of CHC broad and narrow
abilities, allowing for measurement and interpretation of multiple ability constructs following sound
psychometric principles and procedures. Because most cognitive batteries provide adequate estimates
of broad CHC abilities (e.g., Gf, Gc, Gv), the XBA approach is most useful for ensuring that adequate
measurement of more specific or narrow abilities and processes is represented in assessment. This
method of assessment is used typically when there is a need to measure an individual’s range of
cognitive and academic capabilities comprehensively. It is important to note that XBA is not a diagnostic
method for SLD or any other ability-related educationally handicapping condition.
2. Can you use XBA to measure both broad and narrow CHC abilities?
Yes. To apply XBA, practitioners need to understand how broad and narrow CHC abilities relate
to the reason(s) for and purpose(s) of the referral. Broad abilities represent “basic constitutional and
longstanding characteristics of individuals that can govern or influence a great variety of behaviors in a
given domain” (Carroll, 1993, p. 634). In general, measurement of broad abilities is done when the
purpose of an evaluation is to examine the breadth of broad cognitive constructs that define overall
intellectual/cognitive functioning or g within the psychometric Gf-Gc (or CHC) tradition. Typically, the
broad cognitive constructs that may be represented in a comprehensive evaluation include
Gf, Gc, Gv, Ga, Gsm, Glr, and Gs. The aggregate of these broad abilities provides an estimate of overall
intellectual/cognitive functioning or g. Examination of performance on the separate broad CHC abilities
provides an understanding of an individual’s basic constitutional and longstanding characteristics. It is
recommended that at least two subtests be used to measure a broad ability, each subtest measuring a
qualitatively different aspect of that broad ability (e.g., one measure of Induction and one measure of
General Sequential Reasoning or Deduction to assess broad Gf). Of course, the more qualitatively
different aspects of the broad ability that are assessed, the better the measurement and estimate of the
broad ability.
When assessing broad ability constructs, the specific narrow abilities that make up the construct
are not necessarily of considerable importance. According to the construct validation research,
individuals who perform above average on one aspect of a construct (or broad ability) ought to perform
above average on all aspects of the construct (Messick, 1995). In other words, if a broad ability is made
up of five different narrow abilities, then it is reasonable to expect that an individual will perform about
the same on each of these five narrow ability measures. Following this line of reasoning, when broad
abilities are the focus of assessment, then it is important to ensure that they are measured via at least
two qualitatively different narrow ability indicators of the broad ability. For example, if broad ability #1
is made up of narrow abilities a, b, c, d, and e, then it can be represented in an assessment by any
combination of a, b, c, d, and e. Theoretically (and in large nationally representative samples), broad
ability #1 would be equally well represented by narrow abilities a and d as by narrow abilities c and e.
Therefore, when assessing broad abilities, it is generally only necessary to ensure that the broad ability
is represented and measured via at least two qualitatively different narrow ability indicators. If the
individual performs about the same on two qualitatively different indicators of a given broad ability,
then it is reasonable to assume that the aggregate of performance on the two narrow ability indicators
provides an adequate measurement and estimate of the broad ability. If there is significant variation
between the two narrow ability indicators within a broad ability domain, and the lower of the two
scores is suggestive of a normative weakness, then further assessment within the broad domain is
warranted.
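To make this decision rule concrete, the following is a minimal sketch in Python. It assumes the 15-point criterion for a significant difference described in question 4 below and the below-85 criterion for a normative weakness discussed later in this document; it illustrates the logic only and is not the XBA DMIA software.

# Minimal sketch of the two-indicator decision rule described above.
# Assumed thresholds (not the actual XBA DMIA logic): a difference greater than
# 15 points is treated as significant, and a score below 85 as a normative weakness.

SIGNIFICANT_DIFFERENCE = 15
NORMATIVE_WEAKNESS_CUTOFF = 85

def evaluate_broad_ability(score_a, score_b):
    """Decide whether two qualitatively different narrow-ability scores provide an
    adequate estimate of the broad ability or whether follow-up assessment is warranted."""
    if abs(score_a - score_b) <= SIGNIFICANT_DIFFERENCE:
        # Scores converge: their aggregate adequately estimates the broad ability.
        return {"broad_estimate": round((score_a + score_b) / 2), "follow_up": False}
    if min(score_a, score_b) < NORMATIVE_WEAKNESS_CUTOFF:
        # Significant variation and the lower score suggests a normative weakness:
        # further assessment within the broad domain is warranted.
        return {"broad_estimate": None, "follow_up": True}
    return {"broad_estimate": None, "follow_up": False}

print(evaluate_broad_ability(98, 92))   # convergent -> aggregate of 95
print(evaluate_broad_ability(100, 80))  # divergent, lower score below 85 -> follow up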
Unlike broad abilities, narrow abilities “represent greater specializations of abilities, often in
quite specific ways, that reflect the effects of experience and learning, or the adoption of particular
strategies of performance” (Carroll, 1993, p. 634). There are certain circumstances in which a greater
focus on narrow abilities is warranted. For example, if a child appears to have difficulty with memory,
then it would be important to assess memory in-depth, via the use of multiple narrow ability indicators
of Gsm and Glr. By focusing on the full range of narrow abilities that make up Gsm and Glr,
it is possible to identify memory difficulties more precisely. Specifically, careful examination of task
demands (e.g., free recall memory; paired associative learning), nature of response (e.g., type of
strategy used), and task stimuli (e.g., visual, auditory) will assist the evaluator in understanding an
individual’s memory ability and inform the nature and type of compensatory strategies that are likely to
be most useful for him or her. Narrow abilities are also important to consider when SLD is suspected
because they help identify specific areas of deficit and inform suggestions and
recommendations.
There is a growing body of research on the relations between narrow CHC abilities and academic
skill development and outcomes (see Flanagan, Ortiz, Alfonso, & Mascolo, 2006 and McGrew &
Wendling, in press, for a summary). For example, narrow abilities within the areas of Gc (e.g., Lexical
Knowledge), Ga (e.g., Phonetic Coding), Glr (e.g., Naming Facility), Gsm (e.g., Working Memory), and Gs
(e.g., Perceptual Speed) have been found to show a significant relationship to basic reading skills.
Knowledge of this research guides test and task selection and interpretation for individuals who have
been referred for suspected SLD in reading.
In assessment, narrow abilities should also be represented by at least two subtests. Because
most intelligence batteries do not contain multiple measures of the same narrow abilities (e.g., two or
more tests of inductive reasoning; two or more tests of spatial relations), it is often necessary to cross batteries in an attempt to measure narrow abilities adequately (see Flanagan et al., 2007).
Overall, measurement of broad and narrow abilities may be conducted for different purposes.
Measurement of broad abilities assists in gaining information about an individual’s range of basic
constitutional and longstanding strengths and weaknesses and in determining overall intellectual
functioning. Measurement of narrow abilities assists in gaining more specific (diagnostic) information
about specializations of abilities, or strategies of performance based on learning and experience, that
may be problematic for or slow to develop in the individual and that may contribute to academic and
learning difficulties. Information about narrow ability and processing strengths and weaknesses may be
more useful for the purpose of identifying the most appropriate intervention plans. When the purpose
of the evaluation involves SLD and identifying specific causes of academic and learning difficulties,
measurement of both broad and narrow abilities seems warranted (see Flanagan, Ortiz, & Alfonso, 2007
for details).
3. What is the role of XBA in SLD identification?
The XBA approach addresses the longstanding need within the entire field of assessment, from
learning disabilities to neuropsychological assessment, for methods that “provide a greater range of
information about the ways individuals learn – the ways individuals receive, store, integrate, and
express information” (Brackett & McPherson, 1996, p. 80). The more practitioners know about the
relations between cognitive processing and learning and achievement, the better able they will be at
organizing assessments and making interpretations that are relevant to suspected learning disability.
Once the practitioner determines the abilities and processes that should be measured, he/she may use
XBA to ensure that these constructs are measured, based on one or more batteries, in a manner that is
psychometrically sound and based on contemporary theory. Whenever models of SLD identification rely,
at least in part, on the measurement of cognitive abilities or processes, XBA may be used. XBA, in
and of itself, is not a method of SLD identification. Rather, XBA is a method that is designed to ensure
that the abilities and processes that may be responsible for learning difficulties are measured reliably
and validly, which often requires using more than a single intelligence battery.
4. What is meant by the term “outlier” within the context of XBA?
When three subtest scores within a broad ability domain are evaluated, and there is statistically
significant variation among the scores, often a cluster is generated between two scores that are not
significantly different (i.e., the difference between the two scores is not greater than 15). The third score
that is not included in the cluster is referred to as an outlier. The formulae and discussion about how an
outlier is determined are found on page 98 of Essentials of Cross-Battery Assessment, 2nd edition
(Flanagan, Ortiz, & Alfonso, 2007). In the XBA approach, outliers are not ignored and are always
reported. They are either explained (e.g., in the case of a spoiled test) or other data are gathered in an
attempt to understand why performance on the subtest was unexpected. Examples of factors that may
lead to significantly different scores within a broad cognitive domain include lack of motivation, fatigue,
interruption of the task, violation of standardization, and so forth. In cases where a score falls in the
normative weakness range, it may be important to obtain another measure of that specific narrow
ability to determine whether there is actually a normative deficit in the specific narrow ability in
question. In general, whenever two or more measures do not converge as expected, an explanation
(based on observations, other data sources, etc.) or follow-up assessment is warranted.
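As a rough illustration of this outlier logic (assuming only the "not greater than 15 points" criterion stated above, not the complete formulae found on page 98 of the Essentials book), a short Python sketch might look as follows:

# Illustrative sketch only: applies the "difference not greater than 15 points"
# criterion described above to three subtest scores within one broad ability domain.
# The complete formulae appear on page 98 of the Essentials book.

from itertools import combinations

def find_outlier(scores):
    """scores: dict of {subtest_name: standard_score} for three subtests.
    Returns (cluster_value, outlier_name) when two scores cluster and the third
    falls well outside that cluster; otherwise returns (None, None)."""
    for (name_a, a), (name_b, b) in combinations(scores.items(), 2):
        if abs(a - b) <= 15:  # not significantly different -> these two form a cluster
            outlier = next(n for n in scores if n not in (name_a, name_b))
            if abs(scores[outlier] - (a + b) / 2) > 15:
                return round((a + b) / 2), outlier
    return None, None

# Hypothetical scores: two convergent measures (95, 102) and one divergent measure (76).
print(find_outlier({"Subtest 1": 95, "Subtest 2": 102, "Subtest 3": 76}))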
5. Does XBA eliminate the need for clinical judgment?
No. There is simply no way to make the psychological diagnostic process, particularly in the case
of SLD, a formulaic or completely objective exercise. Tests and tools do not diagnose. Clinicians do. The
tools that practitioners use are only intended to enhance their ability to make diagnostic decisions in a
reliable and valid way, not to make those decisions for them. Even in the case of XBA, where extensive
guidelines are suggested to provide clarity in both measurement and interpretation, there are
exceptions or situations where such guidelines do not necessarily apply or make the most sense.
Guidelines are useful most of the time, but not all of the time. The XBA approach is a set of procedures
that guides assessment and interpretation practices, but that does not dictate these practices.
Therefore, the XBA guidelines should not be applied rigidly. For example, one guideline associated with
the XBA approach is that scores that fall below 85 are suggestive of a normative weakness or deficit
(with the assumption being that all other scores – greater than 85 – are either “normal” or within the average range [85-115]). There are cognitive score performances below 85 that are indeed quite problematic for a child and
that adversely affect academic achievement; but, the same can be said in many situations about scores
of 87 or 88 or 89, for example. Thus, when guidelines are used rigidly, all clinical judgment is removed,
which effectively vitiates the practice of drawing logical conclusions from converging data sources
regarding why the 87, 88, or 89 may represent the underlying cause of or most reasonable explanation
for a child’s academic skill deficiency. Sound psychological practice, particularly in the arena of
assessment, will always necessitate some degree of clinical/professional judgment.
The key is being able to support decisions and judgments with actual data and evidence. We have
always advocated strongly that any and all decisions made in the course of rendering a diagnosis or
classification of disability must be supported by convergent data that establish a convincing and
compelling case. Such practice is predicated on an understanding that certain approaches undermine
sound interpretation of the meaning or significance of certain data. For example, failure to use
confidence bands or to consider measurement error, strict adherence to cutoff values for classification,
use of a single score or procedure, and discounting or ignoring data that are contrary to the proposed
interpretations, are all examples of poor practice. In short, there are no tests, tools, software programs,
or the like that replace clinical judgment. The set of rules and software programs that accompany XBA
may be used to guide assessment and interpretation of performance. Whether or not the data gathered
support a diagnosis or classification of SLD is a judgment or decision made by the practitioner based on
all data, which extends far beyond a profile of broad and narrow cognitive ability and processing
strengths and weaknesses. Determining how such a profile interacts with the child’s environment and
unique learning experiences is not based on a formula or a cutoff score, but is part and parcel of
diagnosis/classification and treatment and is most certainly based, in part, on clinical judgment.
6. Is XBA simply another “cookbook” approach like the discrepancy method in identifying SLD?
No. The XBA approach provides practitioners with a set of guidelines for organizing assessments
and for making systematic, reliable, and valid interpretations of an individual’s cognitive ability and
processing strengths and weaknesses, based on one or more intelligence batteries and/or special
purpose tests in a manner that is consistent with current CHC theory and research. The XBA guidelines
should not be used as rigid cutoffs for diagnosis/classification purposes. Practitioners who use the XBA
guidelines should be able to draw reliable, valid, and useful information from data across cognitive and
achievement batteries. However, these data must be considered within the context of other data
sources and information about the individual prior to making diagnostic/classification decisions.
7. Aren’t XBA and the “Operational Definition of SLD” the same thing?
No. Part of the Operational Definition of SLD (Flanagan, Ortiz, Alfonso, & Mascolo, 2002, 2006)
involves understanding a student’s unique pattern of cognitive ability and processing strengths and
weaknesses. The XBA approach is one method that may be used for this purpose.
8. What is the purpose of the programs on the CD-ROM?
The programs on the CD-ROM represent a convenient way to use the principles and methods
described in Essentials of Cross-Battery Assessment, 2nd edition (Flanagan, Ortiz, & Alfonso, 2007).
9. What is the specific purpose of the Cross-battery Data Management and Interpretive Assistant (XBA
DMIA v1.0)?
The purpose of XBA DMIA v1.0 is to facilitate the organization, management, and interpretation
of data gathered from XBA. The XBA DMIA is not a scoring program. Therefore, it does not convert raw
scores to scaled scores or standard scores. Users of the XBA DMIA are responsible for following the
respective test publishers’ administration and scoring guidelines. Features of the program are as follows:
• Allows data to be entered on separate tabs for the following batteries: WISC-IV, WPPSI-III, WJ III, SB5, KABC-II, DAS-II
• Assists in interpreting data from individual intelligence batteries and special purpose tests
• Allows for data to be entered in individual CHC domains (Gf, Gc, Glr, Gsm, Ga, Gv, Gs, Gq, Grw) via drop-down menus (from Appendix B in the book)
• Assists in interpreting data from across batteries
• Graphs data automatically
10. Are there any exceptions to the interpretation rules of the XBA approach?
Yes. In order to make XBA defensible from a psychometric standpoint, Flanagan, Ortiz, and
Alfonso (2007) developed guidelines based on “general rules of thumb.” The rules were based on the
notion that clusters or composites should only be interpreted as adequate estimates of broad or narrow
abilities when they are either unitary or nonunitary, but clinically meaningful. As such, the rules are
based on both statistical significance and the normative ranges in which scores fall (e.g., Average range,
Below Average range). A discussion of the XBA interpretive guidelines may be found beginning on page
95 in Essentials of Cross-Battery Assessment, 2nd edition.
It is important to note that these rules are meant to be followed in situations where test-based
norms are not available. Consider the following example:
WJ III Analysis-Synthesis (Gf-RG) = 100
WJ III Concept Formation (Gf-I) = 80
When these standard scores are entered into the WJ III tab of the XBA DMIA, along with the
corresponding WJ III Gf standard score (as may be found on the WJ III compuscore printout), the user
will be told that the WJ III Gf cluster is not interpretable (based on the criteria provided in Table 3.1 in
Essentials of Cross-Battery Assessment, 2nd edition). The guidelines for the XBA approach will instruct the
practitioner to follow up on the lower of the two scores in this example, since the lower score is
indicative of a normative weakness (i.e., it is less than 85). To assist in determining whether the Concept
Formation standard score is spuriously low or a reliable and valid estimate of Inductive Reasoning, it is
recommended that an additional measure of Inductive Reasoning be administered to the individual.
The following test was administered with the corresponding result:
WRIT Matrices (Gf-I) = 82
Based on the administration of the WRIT subtest, the practitioner now has additional
information about the individual’s Inductive Reasoning. However, because there are no norms available
for the combination of the WJ III and WRIT tests, the CHC tab of the XBA DMIA was created and is based
on rules for combining tests that were derived from different batteries.
The CHC tab of the XBA DMIA would calculate an arithmetic average (81) for the two measures
of Gf-I and report the measure of Gf-RG as an “outlier”. This particular outcome is linked to “interpretive
statement 4” in the Essentials book. This outcome demonstrates that in the area of Gf, the individual
has a normative weakness in Inductive Reasoning and a relative strength in General Sequential
Reasoning or Deduction.
Consider an alternative outcome:
WRIT Matrices (Gf-I) = 98
The CHC tab of the XBA DMIA would calculate an arithmetic average (99) for Analysis-Synthesis
and WRIT Matrices and call Concept Formation an “outlier”. This particular outcome is linked to
“interpretive statement 2” in the Essentials book. This outcome demonstrates that broad Gf ability,
based on two qualitatively different indicators of Gf (Inductive and Deductive Reasoning), is Average
and that the individual’s performance on Concept Formation was likely an anomalous finding. According
to Flanagan and colleagues, “in cases in which anomalous results are obtained, it is important that the
examiner provide actual reasons for such results. Examples of factors that may lead to anomalous
results include lack of motivation, fatigue, interruption of the task, violation of standardization, and so
forth. In general, whenever two or more measures do not converge as expected (e.g., two measures of
the same narrow ability or process), an explanation is warranted.”
While the above guidelines are helpful, there may be times when an alternative approach to
interpretation is preferred. Consider the following:
WJ III Analysis-Synthesis (Gf-RG) = 98
WJ III Concept Formation (Gf-I) = 83
In this situation the WJ III tab will show that Gf is not interpretable because the difference between the
two scores is statistically significant (see Table 3.1). Because the lower of the two scores is suggestive of
a normative weakness, it is recommended that another measure of Gf-I be administered to determine if
Gf-I is a true weakness.
WRIT Matrices (Gf-I) = 86
When these three scores (two from WJ III and one from WRIT) are entered into the CHC tab, the
program provides a broad ability cluster (92) for Analysis-Synthesis (98) and Matrices (86) because these
two standard scores are not significantly different from one another and they are in the same normative
range (i.e., within normal limits/average range [85-115]). Thus, the interpretation is that the individual’s
Gf is average (92), with a spuriously low Concept Formation performance (which the practitioner will
need to explain).
A better interpretation would be that Gf-I is a weakness for the individual and that Gf-RG is a
relative strength. Thus, while the XBA interpretive guidelines may result in accurate and meaningful
interpretations of data most of the time, they do not do so all of the time. Knowing how the program
works, by understanding the interpretive rules of thumb, is critical to proper interpretation of data.
In short, the CHC tab groups data from different batteries according to the XBA guidelines and
links that configuration of scores to an interpretive statement. These subtest groupings and
corresponding interpretive statements are appropriate and meaningful in most situations. However, if
applied blindly, then one might make an interpretation that does not describe performance in the most
effective and meaningful way, as demonstrated in the example above.
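For readers who want to see the rules of thumb from the three examples above expressed in code, the following Python sketch approximates the grouping logic under two assumptions drawn from the text: two scores are grouped into a cluster when they differ by no more than 15 points and fall in the same normative range (below 85, 85-115, above 115), and any remaining score is reported as an outlier. The sketch reproduces the outcomes of the worked examples, but it is only an approximation of the guidelines, not the actual XBA DMIA program, and, as discussed above, its output still requires clinical judgment.

# Approximation of the grouping logic illustrated in the examples above.
# Assumed rules (drawn from the text, not the actual XBA DMIA code): two scores
# are grouped into a cluster when they differ by no more than 15 points AND fall
# in the same normative range; any remaining score is reported as an outlier.

from itertools import combinations

def normative_range(score):
    if score < 85:
        return "below average"   # normative weakness
    if score <= 115:
        return "average"         # within normal limits
    return "above average"

def chc_tab_grouping(scores):
    """scores: dict of {subtest_name: standard_score} for one broad CHC ability."""
    for (name_a, a), (name_b, b) in combinations(scores.items(), 2):
        if abs(a - b) <= 15 and normative_range(a) == normative_range(b):
            outliers = [n for n in scores if n not in (name_a, name_b)]
            return {"cluster": round((a + b) / 2),
                    "cluster_measures": [name_a, name_b],
                    "outliers": outliers}
    return {"cluster": None, "cluster_measures": [], "outliers": list(scores)}

# First example: cluster of 81 for the two Gf-I measures; Analysis-Synthesis is the outlier.
print(chc_tab_grouping({"Analysis-Synthesis (Gf-RG)": 100,
                        "Concept Formation (Gf-I)": 80,
                        "WRIT Matrices (Gf-I)": 82}))
# Alternative outcome: cluster of 99; Concept Formation is the outlier.
print(chc_tab_grouping({"Analysis-Synthesis (Gf-RG)": 100,
                        "Concept Formation (Gf-I)": 80,
                        "WRIT Matrices (Gf-I)": 98}))
# Final example: cluster of 92; Concept Formation (83) is the outlier, even though
# a better interpretation may be that Gf-I is a weakness (see discussion above).
print(chc_tab_grouping({"Analysis-Synthesis (Gf-RG)": 98,
                        "Concept Formation (Gf-I)": 83,
                        "WRIT Matrices (Gf-I)": 86}))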
11. Does the XBA DMIA eliminate the need for clinical/professional judgment?
No. There is simply no way to make the psychological diagnostic process, particularly in the case of SLD,
a formulaic or completely objective exercise. Tests and tools do not diagnose. People do. The tools that
practitioners use are only intended to enhance their ability to make diagnostic decisions, not to make
those decisions for them. Even in the case of XBA, where extensive guidelines are suggested to provide
clarity in both measurement and interpretation, there are exceptions or situations where such
guidelines do not necessarily apply or make the most sense. Guidelines are useful most of the time, but
not all of the time. The XBA approach is a set of rules that guides assessment and interpretation
practices, but that does not dictate these practices. Therefore, the XBA guidelines should not be applied
rigidly. For example, one guideline associated with the XBA approach is that scores that fall below 85
are suggestive of a normative weakness or deficit (with the assumption being that all other scores –
greater than 85 – are either “normal” or within the average range [85-115]). There are cognitive score
performances below 85 that are indeed quite problematic for a child and that adversely affect academic
achievement; but, the same can be said in many situations about scores of 87 or 88 or 89, for example.
Thus, when guidelines are used rigidly, clinical judgment is removed, which effectively vitiates the
practice of drawing logical conclusions from converging data sources regarding why the 87, 88, or 89
may represent the underlying cause of or most reasonable explanation for a child’s academic skill
deficiency. Sound psychological practice, particularly in the arena of assessment, will always necessitate
some degree of clinical/professional judgment.
The key is being able to support decisions and judgments with actual data and evidence. My colleagues
and I have always advocated strongly that any and all decisions made in the course of rendering a
diagnosis or classification of disability must be bolstered by convergent data that establish a convincing
and compelling case. Such practice is predicated on an understanding that certain approaches
undermine sound interpretation of the meaning or significance of certain data. For example, failure to
use confidence bands or to consider measurement error, strict adherence to cutoff values for
classification, use of a single score or procedure, and discounting or ignoring data that are contrary to
the proposed interpretations, are all examples of poor practice. In short, there are no tests, tools,
software programs, or the like that replace clinical judgment. The set of rules and software programs
that accompany XBA may be used to guide assessment and interpretation of performance. Whether or
not the data gathered support a diagnosis or classification of SLD is a judgment made by the practitioner
based on the totality of the data, which extends far beyond a profile of broad and narrow cognitive
ability and processing strengths and weaknesses. Determining how such a profile interacts with the
child’s environment and unique learning experiences is not based on a formula or a cutoff score, but it is
part and parcel of diagnosis/classification and treatment and is most certainly based, in part, on clinical
judgment.
12. What is the specific purpose of the C-LIM?
The dilemma that has long faced practitioners involved in testing culturally and linguistically
diverse individuals is whether the obtained results are actual (i.e., valid) reflections of ability (or lack
thereof) or instead simply an indication of their cultural knowledge and English language proficiency
(i.e., invalid). According to the Essentials of Cross-Battery Assessment, 2nd edition (Flanagan, Ortiz, &
Alfonso, 2007), “the basic question to be addressed in the evaluation of diverse individuals boils down to
whether the obtained results reflect cultural or linguistic difference or whether they indicate the
presence of some type of disability. This difference versus disorder question is the very reason for the
development of the C-LIM” (p. 175). Thus, the C-LIM is a check on the validity of test results—that is, did
the evaluation result in actual measurement of the intended constructs (e.g., Gf, Gc, Gv, etc.) or in the
measurement of unintended constructs (e.g., level of acculturation and English language proficiency). If
it was the former, results are valid and may be examined and interpreted, preferably in accordance with
XBA principles. If it was the latter, the results are invalid and no inferences regarding functioning in the
various ability domains should be made on the basis of the collected scores. The C-LIM simply arranges
subtests in a manner that allows examination of the impact of an individual’s level of developmentally
based acculturative knowledge and English proficiency on test performance. When the impact on an
individual’s test scores is evaluated to be primarily due to cultural and linguistic factors, the test results
are rendered invalid. When the impact is judged to be only contributory, and not primary, the test
results can stand as valid and the data may be interpreted as appropriate, including as support for the
presence of a disability. The C-LIM is neither designed nor intended to be used as a diagnostic tool in
and of itself. Rather, the central purpose of the C-LIM is to give any practitioner, bilingual or not, the
ability to evaluate, in a systematic manner, the degree to which cultural and linguistic issues may have
affected the validity of their test results. Getting over this huge obstacle is a critical step in being able to
carry out nondiscriminatory assessment.
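To make the idea of “arranging subtests” concrete, here is a minimal sketch, under stated assumptions, of how subtest scores might be grouped by culture-language classifications and summarized to look for the general declining pattern discussed in the next two questions. The subtest names and classifications below are hypothetical placeholders, not the published Culture-Language Test Classifications, and the sketch is not the C-LIM software.

# Illustrative sketch only: group subtest scores by assumed degree of cultural
# loading and linguistic demand (low, moderate, high) and compute cell means to
# examine whether performance declines as both dimensions increase. The subtests
# and classifications below are hypothetical placeholders.

from collections import defaultdict
from statistics import mean

# Hypothetical data: subtest name -> (cultural loading, linguistic demand, obtained score)
subtests = {
    "Subtest A": ("low", "low", 98),
    "Subtest B": ("low", "moderate", 95),
    "Subtest C": ("moderate", "moderate", 91),
    "Subtest D": ("high", "high", 84),
}

scores_by_cell = defaultdict(list)
for name, (culture, language, score) in subtests.items():
    scores_by_cell[(culture, language)].append(score)

ORDER = ["low", "moderate", "high"]
for culture in ORDER:
    for language in ORDER:
        cell = scores_by_cell.get((culture, language))
        if cell:
            print(f"cultural loading={culture:8s} linguistic demand={language:8s} mean={mean(cell):5.1f}")

# A roughly monotonic decline from low/low toward high/high is the general pattern
# described in questions 13 and 14; whether that decline is primary or only
# contributory remains a matter of clinical judgment.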
13. Is the C-LIM “research-based?”
Yes. The C-LIM rests upon two literature bases—an historical one that has been built over the
past century and a new one that is evaluating the current versions of test batteries. With respect to the
historical research base, the C-LIM is predicated upon the persistent research finding that individuals
who are not native English speakers (often simply called “bilinguals”) tend to score about a standard
deviation lower on tests of intelligence and cognitive ability than native English speakers (Figueroa,
1989; Valdes & Figueroa, 1994). Over the years, it has become clear that this difference in functioning is
not due to lower functioning in general, but related to the nature of the tests. Tests that rely more on
language skills and cultural knowledge (i.e., “verbal” tests) tend to be much more difficult for culturally and
linguistically diverse individuals than tests that are more novel and abstract and do not
require much language ability (i.e., “nonverbal” tests). This finding is extremely robust and has been observed
for nearly a century dating back to the advent of psychological testing when immigrants at Ellis Island
were subjected to the early versions of the Binet Scales (Brigham, 1923; Cummins, 1984; Goddard, 1917;
Jensen, 1974, 1976; Mercer, 1979; Sanchez, 1934; Vukovich & Figueroa, 1982; Yerkes, 1921). By
examining the mean values from the tests used in these studies (where available), identifying the CHC
constructs measured by the tests, and using an expert consensus procedure to evaluate task
characteristics, we derived a simple hierarchy that arranges tests from “highest” to “lowest” expected
scores. This hierarchy reflects the degree of impact of cultural loading and linguistic demand and forms
the general “pattern” that serves as the basis for comparison in determining difference versus disorder. There is no question that
the general pattern of decline in performance for diverse individuals varies in accordance with the
cultural/linguistic nature of the test and that a simple dichotomous view (i.e., verbal vs. nonverbal) is
both misleading and inaccurate. Nevertheless, as a general finding, the fact that bilinguals perform more
poorly on language and culture-based tasks than they do on more abstract/novel tasks is too well
established to be a matter of much debate. The second research base, the performance of bilinguals on the newer
versions of tests, is an ongoing effort that both supports the general pattern of decline observed with older
tests and informs the Culture-Language Test Classifications, as described below.
14. How were the Culture-Language Test Classifications used within the C-LIM established?
The Culture-Language Test Classifications are merely an extension of the general pattern of
decline in performance illustrated in the extensive literature on the testing of bilinguals as described
above. Because cultural and linguistic variables are highly correlated, we could have simply classified the
tests strictly in accordance with their mean values. However, we thought there might be some clinical
utility in distinguishing tests that had more (or less) cultural content from those that had more (or less)
language demands. We still believe this to be true, particularly in evaluations of English Learners with
speech-language problems. In short, the classifications were initially made, and continue to be made, using a
variety of procedures, including: 1) actual subtest means obtained from research that tested bilingual
populations; 2) identification of the CHC ability construct measured by the test (either as reported by
the test publisher or determined by factor analysis); and 3) an expert consensus process to examine the
task characteristics utilized in the administration of, or response to, a particular test. Unfortunately, the fact
is that bilinguals have not been formally researched with every known intelligence or cognitive ability test,
and the rapid evolution and ongoing revision of these tests mean this is unlikely ever to happen. But given
the clinical purpose of the classifications (analysis of variation that may be due to task characteristics)
and the main intent (to evaluate a general pattern of decline), use of the other two methods appears to
suffice. Consider first, for example, that in the case of the latest versions of some batteries, we were
able to use historical research as a preliminary guide to classification because many tests are actually
carried over from version to version anyway (e.g., Wechsler Vocabulary, Similarities, Block Design, etc.).
Second, many tests use extremely similar tasks to measure the very same constructs. This is not
surprising because intelligence tests have always been more similar than dissimilar and most can be
traced to the original Binet Scales or tasks developed for the Army Mental Tests. Third, the
advancements in test construction and design, as related to the use and application of CHC theory, have
also promoted better consistency in this regard; therefore, tests with similar instructions,
characteristics, and intended constructs correlate highly and are subject to similar effects from culture
and language. Gc, for example, is difficult, if not impossible, to measure in a “low
culture/low language” manner. Thus, tests of Gc tend to be very similar in terms of cultural loading and
linguistic demand. Fourth, the classifications are based on a scale that uses only three rankings (low,
moderate, and high), and we do not attempt to make fine discriminations, as they are unnecessary.
Whereas a practitioner may dispute whether a test should be classified as low or moderate, or as moderate or high, it is
not difficult to clearly separate tests that are low on both dimensions from those that are high on both
dimensions. This is all that is necessary to provide information regarding performance relative to task
characteristics and it does not alter the basic declining pattern revealed by the historical data which is
the basis of the C-LIM. And finally, we do have actual data on the performance of bilingual individuals on
several tests, which allow us both to examine and refine the classifications (and misclassifications) and to
extrapolate easily in the classification of other similar tests, including the WJ III (Dynda, 2007; Sotelo-Dynega, 2006), WISC-III/IV (Nieves-Brull, 2005; Tychanska, 2008), SB-IV/V (Aziz, 2009; Lella-Sourvalis,
2009), DAS-II (Aguerra, 2005), and others. Indeed, our research has indicated that the declining pattern
is so profound that it does not vary as a function of other variables such as age, grade, battery, or
ethnicity (Beharry, 2008). Certainly, there may be tests we have “misclassified,” but that is the point of
new theory and new research. We never intended the classifications to be absolute, but in fact,
alterable when new data suggest alternative classifications. In the meantime, for what the C-LIM is
designed to do, such classification issues are minor and the fact that we continue to see the historical
pattern of decline in our present research suggests that the classifications are either correct, or very
close. As noted previously, this is a brand new area of research (less than a decade old), tests are
constantly being updated (five major intelligence tests have been revised since 2001), and the
issues and implications for practice are not yet well understood. We expect that shortly this avenue of
research will become more prevalent in the scientific literature.
15. Are there any exceptions to the interpretive rules that underlie the C-LIM?
We have recently begun to evaluate patterns of performance in bilinguals who fall outside the
typical learning disability versus normal categories and have learned that there does appear to be
evidence of unique patterns of decline on the basis of type of disability. This research is quite
preliminary, but we have seen greater decline in the slope of scores for individuals with speech-language
problems and a flatter slope but significantly lower overall mean value for individuals with global
cognitive impairment. These patterns are different from those exhibited by individuals with learning
disabilities and suggest that the C-LIM may be useful in helping identify one type of disorder versus
another. Thus, there may be times when a declining pattern is evident but falls outside the range
delineated by the C-LIM and yet still indicates a disability. In addition, other factors such as motivation,
fatigue, emotional problems, scoring/administration errors, and the like can also lead to problems in
evaluating the presence of a declining pattern.
16. Does the C-LIM eliminate the need for clinical/professional judgment?
No. As noted previously, the C-LIM is not a diagnostic tool or test. It is a system that relies
completely on the judgment of the clinician who administered the tests and is designed to allow for
research to guide evaluation of the impact of cultural and linguistic differences on test performance in a
systematic manner. This determination, essentially “difference vs. disorder,” is not something that can
be reduced to a formula or evaluated directly via testing. It will always remain subject to clinical and
professional judgment. To date, the C-LIM remains the only method that provides practitioners the
ability to make this one very important determination in a systematic and research-based manner.