XBA in Perspective (December 2009)

This document was prepared by Dawn P. Flanagan, Samuel O. Ortiz, and Vincent C. Alfonso in response to some questions and/or misconceptions about the XBA approach as well as the software programs that were published in Essentials of Cross-Battery Assessment, 2nd edition (Flanagan, Ortiz, & Alfonso, 2007).

1. What is the Cross-Battery Assessment (XBA) approach?

The XBA approach is an assessment method composed of pillars, guidelines, and procedures that assist practitioners in measuring a wider and more in-depth range of cognitive abilities and processes than that represented by a single ability battery (cognitive or achievement), in a manner that is psychometrically respectable and based on contemporary theory. The XBA approach ensures that assessments include composites that are relatively pure representations of CHC broad and narrow abilities, allowing for measurement and interpretation of multiple ability constructs following sound psychometric principles and procedures. Because most cognitive batteries provide adequate estimates of broad CHC abilities (e.g., Gf, Gc, Gv), the XBA approach is most useful for ensuring that adequate measurement of more specific or narrow abilities and processes is represented in assessment. This method of assessment is typically used when there is a need to measure an individual's range of cognitive and academic capabilities comprehensively. It is important to note that XBA is not a diagnostic method for SLD or any other ability-related educationally handicapping condition.

2. Can you use XBA to measure both broad and narrow CHC abilities?

Yes. To apply XBA, practitioners need to understand how broad and narrow CHC abilities relate to the reason(s) for and purpose(s) of the referral. Broad abilities represent "basic constitutional and longstanding characteristics of individuals that can govern or influence a great variety of behaviors in a given domain" (Carroll, 1993, p. 634).
In general, measurement of broad abilities is done when the purpose of an evaluation is to examine the breadth of broad cognitive constructs that define overall intellectual/cognitive functioning or g within the psychometric Gf-Gc (or CHC) tradition. Typically, the broad cognitive constructs that may be represented in a comprehensive evaluation include Gf, Gc, Gv, Ga, Gsm, Glr, and Gs. The aggregate of these broad abilities provides an estimate of overall intellectual/cognitive functioning or g. Examination of performance on the separate broad CHC abilities provides an understanding of an individual's basic constitutional and longstanding characteristics. It is recommended that at least two subtests be used to measure a broad ability, each subtest measuring a qualitatively different aspect of that broad ability (e.g., one measure of Induction and one measure of General Sequential Reasoning or Deduction to assess broad Gf). Of course, the more qualitatively different aspects of the broad ability that are assessed, the better the measurement and estimate of the broad ability. When assessing broad ability constructs, the specific narrow abilities that make up the construct are not necessarily of considerable importance. According to construct validation research, individuals who perform above average on one aspect of a construct (or broad ability) ought to perform above average on all aspects of the construct (Messick, 1995). In other words, if a broad ability is made up of five different narrow abilities, then it is reasonable to expect that an individual will perform about the same on each of these five narrow ability measures. Following this line of reasoning, when broad abilities are the focus of assessment, it is important to ensure that they are measured via at least two qualitatively different narrow ability indicators of the broad ability.
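The two-indicator guideline just described can be sketched in code. The sketch below is purely illustrative and is not the XBA DMIA's actual logic; it assumes the rules of thumb cited elsewhere in this document (a difference greater than 15 points treated as significant variation, and a standard score below 85 treated as a normative weakness):

```python
def evaluate_broad_ability(indicator_a: float, indicator_b: float) -> str:
    """Apply the two-indicator rule of thumb to a pair of standard scores
    (mean = 100, SD = 15) from qualitatively different narrow-ability
    indicators of the same broad ability.  Illustrative sketch only."""
    if abs(indicator_a - indicator_b) <= 15:
        # Convergent indicators: their aggregate estimates the broad ability.
        aggregate = round((indicator_a + indicator_b) / 2)
        return f"interpretable; aggregate estimate = {aggregate}"
    if min(indicator_a, indicator_b) < 85:
        # Significant variation, and the lower score is a normative weakness.
        return "not interpretable; follow up on the lower score (normative weakness)"
    # Significant variation, but both scores fall within or above the average range.
    return "not interpretable; explain or follow up on the discrepancy"

# e.g., one Induction measure and one General Sequential Reasoning measure of Gf:
print(evaluate_broad_ability(100, 96))  # interpretable; aggregate estimate = 98
print(evaluate_broad_ability(100, 80))  # not interpretable; follow up on the lower score (normative weakness)
```

As in the prose above, convergent indicators are aggregated into a broad-ability estimate, while a discrepant pair with a below-85 score triggers further assessment within the domain.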
For example, if broad ability #1 is made up of narrow abilities a, b, c, d, and e, then it can be represented in an assessment by any combination of a, b, c, d, and e. Theoretically (and in large nationally representative samples), broad ability #1 would be equally well represented by narrow abilities a and d as by narrow abilities c and e. Therefore, when assessing broad abilities, it is generally only necessary to ensure that the broad ability is represented and measured via at least two qualitatively different narrow ability indicators. If the individual performs about the same on two qualitatively different indicators of a given broad ability, then it is reasonable to assume that the aggregate of performance on the two narrow ability indicators provides an adequate measurement and estimate of the broad ability. If there is significant variation between the two narrow ability indicators within a broad ability domain, and the lower of the two scores is suggestive of a normative weakness, then further assessment within the broad domain is warranted. Unlike broad abilities, narrow abilities "represent greater specializations of abilities, often in quite specific ways, that reflect the effects of experience and learning, or the adoption of particular strategies of performance" (Carroll, 1993, p. 634). There are certain circumstances in which a greater focus on narrow abilities is warranted. For example, if a child appears to have difficulty with memory, then it would be important to assess memory in depth via the use of multiple narrow ability indicators of Gsm and Glr. By focusing on the full range of narrow abilities that make up Gsm and Glr, it is possible to identify memory difficulties more precisely.
Specifically, careful examination of task demands (e.g., free recall memory; paired-associate learning), nature of response (e.g., type of strategy used), and task stimuli (e.g., visual, auditory) will assist the evaluator in understanding an individual's memory ability and inform the nature and type of compensatory strategies that are likely to be most useful for him or her. Narrow abilities are also important to consider when SLD is suspected because doing so helps to identify specific areas of deficit and to inform suggestions and recommendations. There is a growing body of research on the relations between narrow CHC abilities and academic skill development and outcomes (see Flanagan, Ortiz, Alfonso, & Mascolo, 2006, and McGrew & Wendling, in press, for a summary). For example, narrow abilities within the areas of Gc (e.g., Lexical Knowledge), Ga (e.g., Phonetic Coding), Glr (e.g., Naming Facility), Gsm (e.g., Working Memory), and Gs (e.g., Perceptual Speed) have been found to show a significant relationship to basic reading skills. Knowledge of this research guides test and task selection and interpretation for individuals who have been referred for suspected SLD in reading. In assessment, narrow abilities should also be represented by at least two subtests. Because most intelligence batteries do not contain multiple measures of the same narrow abilities (e.g., two or more tests of inductive reasoning; two or more tests of spatial relations), it is often necessary to cross batteries in an attempt to measure narrow abilities adequately (see Flanagan et al., 2007). Overall, measurement of broad and narrow abilities may be conducted for different purposes. Measurement of broad abilities assists in gaining information about an individual's range of basic constitutional and longstanding strengths and weaknesses and in determining overall intellectual functioning.
Measurement of narrow abilities assists in gaining more specific (diagnostic) information about specializations of abilities, or strategies of performance based on learning and experience, that may be problematic for or slow to develop in the individual and that may contribute to academic and learning difficulties. Information about narrow ability and processing strengths and weaknesses may be more useful for the purpose of identifying the most appropriate intervention plans. When the purpose of the evaluation involves SLD and identifying specific causes of academic and learning difficulties, measurement of both broad and narrow abilities seems warranted (see Flanagan, Ortiz, & Alfonso, 2007, for details).

3. What is the role of XBA in SLD identification?

The XBA approach addresses the longstanding need within the entire field of assessment, from learning disabilities to neuropsychological assessment, for methods that "provide a greater range of information about the ways individuals learn – the ways individuals receive, store, integrate, and express information" (Brackett & McPherson, 1996, p. 80). The more practitioners know about the relations between cognitive processing and learning and achievement, the better able they will be to organize assessments and make interpretations that are relevant to a suspected learning disability. Once the practitioner determines the abilities and processes that should be measured, he/she may use XBA to ensure that these constructs are measured, based on one or more batteries, in a manner that is psychometrically sound and based on contemporary theory. Whenever models of SLD identification rely, at least in part, on the measurement of cognitive abilities or processes, XBA may be used. XBA, in and of itself, is not a method of SLD identification.
Rather, XBA is a method that is designed to ensure that the abilities and processes that may be responsible for learning difficulties are measured reliably and validly, which often requires using more than a single intelligence battery.

4. What is meant by the term "outlier" within the context of XBA?

When three subtest scores within a broad ability domain are evaluated, and there is statistically significant variation among the scores, often a cluster is generated from the two scores that are not significantly different (i.e., the difference between the two scores is not greater than 15). The third score, which is not included in the cluster, is referred to as an outlier. The formulae and discussion about how an outlier is determined are found on page 98 of Essentials of Cross-Battery Assessment, 2nd edition (Flanagan, Ortiz, & Alfonso, 2007). In the XBA approach, outliers are not ignored and are always reported. They are either explained (e.g., in the case of a spoiled test) or other data are gathered in an attempt to understand why performance on the subtest was unexpected. Examples of factors that may lead to significantly different scores within a broad cognitive domain include lack of motivation, fatigue, interruption of the task, violation of standardization, and so forth. In cases where a score falls in the normative weakness range, it may be important to obtain another measure of that specific narrow ability to determine whether there is actually a normative deficit in the specific narrow ability in question. In general, whenever two or more measures do not converge as expected, an explanation (based on observations, other data sources, etc.) or follow-up assessment is warranted.

5. Does XBA eliminate the need for clinical judgment?

No. There is simply no way to make the psychological diagnostic process, particularly in the case of SLD, a formulaic or completely objective exercise. Tests and tools do not diagnose. Clinicians do.
The tools that practitioners use are only intended to enhance their ability to make diagnostic decisions in a reliable and valid way, not to make those decisions for them. Even in the case of XBA, where extensive guidelines are suggested to provide clarity in both measurement and interpretation, there are exceptions or situations where such guidelines do not necessarily apply or make the most sense. Guidelines are useful most of the time, but not all of the time. The XBA approach is a set of procedures that guides assessment and interpretation practices, but that does not dictate these practices. Therefore, the XBA guidelines should not be applied rigidly. For example, one guideline associated with the XBA approach is that scores that fall below 85 are suggestive of a normative weakness or deficit (with the assumption being that all other scores, those greater than 85, are either "normal" or within the average range [85-115]). There are cognitive score performances below 85 that are indeed quite problematic for a child and that adversely affect academic achievement; but the same can be said in many situations about scores of 87 or 88 or 89, for example. Thus, when guidelines are used rigidly, all clinical judgment is removed, which effectively vitiates the practice of drawing logical conclusions from converging data sources regarding why the 87, 88, or 89 may represent the underlying cause of or most reasonable explanation for a child's academic skill deficiency. Sound psychological practice, particularly in the arena of assessment, will always necessitate some degree of clinical/professional judgment. The key is being able to support decisions and judgments with actual data and evidence. We have always advocated strongly that any and all decisions made in the course of rendering a diagnosis or classification of disability must be supported by convergent data that establish a convincing and compelling case.
Such practice is predicated on an understanding that certain approaches undermine sound interpretation of the meaning or significance of certain data. For example, failure to use confidence bands or to consider measurement error, strict adherence to cutoff values for classification, use of a single score or procedure, and discounting or ignoring data that are contrary to the proposed interpretations are all examples of poor practice. In short, there are no tests, tools, software programs, or the like that replace clinical judgment. The set of rules and software programs that accompany XBA may be used to guide assessment and interpretation of performance. Whether or not the data gathered support a diagnosis or classification of SLD is a judgment or decision made by the practitioner based on all data, which extends far beyond a profile of broad and narrow cognitive ability and processing strengths and weaknesses. Determining how such a profile interacts with the child's environment and unique learning experiences is not based on a formula or a cutoff score, but is part and parcel of diagnosis/classification and treatment and is most certainly based, in part, on clinical judgment.

6. Is XBA simply another "cookbook" approach like the discrepancy method in identifying SLD?

No. The XBA approach provides practitioners with a set of guidelines for organizing assessments and for making systematic, reliable, and valid interpretations of an individual's cognitive ability and processing strengths and weaknesses, based on one or more intelligence batteries and/or special purpose tests, in a manner that is consistent with current CHC theory and research. The XBA guidelines should not be used as rigid cutoffs for diagnosis/classification purposes. Practitioners who use the XBA guidelines should be able to draw reliable, valid, and useful information from data across cognitive and achievement batteries.
However, these data must be considered within the context of other data sources and information about the individual prior to making diagnostic/classification decisions.

7. Aren't XBA and the "Operational Definition of SLD" the same thing?

No. Part of the Operational Definition of SLD (Flanagan, Ortiz, Alfonso, & Mascolo, 2002, 2006) involves understanding a student's unique pattern of cognitive ability and processing strengths and weaknesses. The XBA approach is one method that may be used for this purpose.

8. What is the purpose of the programs on the CD-ROM?

The programs on the CD-ROM represent a convenient way to use the principles and methods described in Essentials of Cross-Battery Assessment, 2nd edition (Flanagan, Ortiz, & Alfonso, 2007).

9. What is the specific purpose of the Cross-battery Data Management and Interpretive Assistant (XBA DMIA v1.0)?

The purpose of the XBA DMIA v1.0 is to facilitate the organization, management, and interpretation of data gathered from XBA. The XBA DMIA is not a scoring program. Therefore, it does not convert raw scores to scaled scores or standard scores. Users of the XBA DMIA are responsible for following the respective test publishers' administration and scoring guidelines. Features of the program are as follows:

• Allows data to be entered on separate tabs for the following batteries: WISC-IV, WPPSI-III, WJ III, SB5, KABC-II, DAS-II
• Assists in interpreting data from individual intelligence batteries and special purpose tests
• Allows for data to be entered in individual CHC domains (Gf, Gc, Glr, Gsm, Ga, Gv, Gs, Gq, Grw) via drop-down menus (from Appendix B in the book)
• Assists in interpreting data from across batteries
• Graphs data automatically

10. Are there any exceptions to the interpretation rules of the XBA approach?

Yes.
In order to make XBA defensible from a psychometric standpoint, Flanagan, Ortiz, and Alfonso (2007) developed guidelines based on "general rules of thumb." The rules were based on the notion that clusters or composites should only be interpreted as adequate estimates of broad or narrow abilities when they are either unitary or, if nonunitary, clinically meaningful. As such, the rules are based on both statistical significance and the normative ranges in which scores fall (e.g., Average range, Below Average range). A discussion of the XBA interpretive guidelines may be found beginning on page 95 in Essentials of Cross-Battery Assessment, 2nd edition. It is important to note that these rules are meant to be followed in situations where test-based norms are not available. Consider the following example:

WJ III Analysis-Synthesis (Gf-RG) = 100
WJ III Concept Formation (Gf-I) = 80

When these standard scores are entered into the WJ III tab of the XBA DMIA, along with the corresponding WJ III Gf standard score (as may be found on the WJ III Compuscore printout), the user will be told that the WJ III Gf cluster is not interpretable (based on the criteria provided in Table 3.1 in Essentials of Cross-Battery Assessment, 2nd edition). The guidelines for the XBA approach will instruct the practitioner to follow up on the lower of the two scores in this example, since the lower score is indicative of a normative weakness (i.e., it is less than 85). To assist in determining whether the Concept Formation standard score is spuriously low or a reliable and valid estimate of Inductive Reasoning, it is recommended that an additional measure of Inductive Reasoning be administered to the individual. The following test was administered with the corresponding result:

WRIT Matrices (Gf-I) = 82

Based on the administration of the WRIT subtest, the practitioner now has additional information about the individual's Inductive Reasoning.
However, because there are no norms available for the combination of the WJ III and WRIT tests, the CHC tab of the XBA DMIA was created; it is based on rules for combining tests that were derived from different batteries. The CHC tab of the XBA DMIA would calculate an arithmetic average (81) for the two measures of Gf-I and report the measure of Gf-RG as an "outlier". This particular outcome is linked to "interpretive statement 4" in the Essentials book. This outcome demonstrates that in the area of Gf, the individual has a normative weakness in Inductive Reasoning and a relative strength in General Sequential Reasoning or Deduction. Consider an alternative outcome:

WRIT Matrices (Gf-I) = 98

The CHC tab of the XBA DMIA would calculate an arithmetic average (99) for Analysis-Synthesis and WRIT Matrices and call Concept Formation an "outlier". This particular outcome is linked to "interpretive statement 2" in the Essentials book. This outcome demonstrates that broad Gf ability, based on two qualitatively different indicators of Gf (Inductive and Deductive Reasoning), is Average and that the individual's performance on Concept Formation was likely an anomalous finding. According to Flanagan and colleagues, "in cases in which anomalous results are obtained, it is important that the examiner provide actual reasons for such results. Examples of factors that may lead to anomalous results include lack of motivation, fatigue, interruption of the task, violation of standardization, and so forth. In general, whenever two or more measures do not converge as expected (e.g., two measures of the same narrow ability or process), an explanation is warranted." While the above guidelines are helpful, there may be times when an alternative approach to interpretation is preferred.
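The grouping logic behind the two outcomes above can be sketched as follows. This is an illustrative reading of the rules of thumb, not the actual XBA DMIA code; the pairing criteria assumed here are a difference of no more than 15 points and membership in the same normative range (simplified to below 85 vs. 85 and above):

```python
def chc_tab_grouping(scores: dict[str, float]) -> dict:
    """Group three cross-battery standard scores into a two-score cluster
    plus an 'outlier', as in the examples above.  Assumed pairing rule:
    two scores cluster when they differ by 15 points or less and fall in
    the same normative range.  Illustrative sketch only."""
    def same_range(a: float, b: float) -> bool:
        # Simplified normative-range check: normative weakness vs. 85+.
        return (a < 85) == (b < 85)

    names = list(scores)
    for left_out in names:
        a_name, b_name = [n for n in names if n != left_out]
        a, b = scores[a_name], scores[b_name]
        if abs(a - b) <= 15 and same_range(a, b):
            return {"cluster": (a_name, b_name),
                    "average": round((a + b) / 2),
                    "outlier": left_out}
    return {"cluster": None, "average": None, "outlier": None}

# First outcome above: the two Gf-I measures (80, 82) average to 81 and
# Analysis-Synthesis (100) is reported as the outlier.
first = chc_tab_grouping({"Analysis-Synthesis": 100,
                          "Concept Formation": 80,
                          "Matrices": 82})
# Alternative outcome: Analysis-Synthesis (100) and Matrices (98) average
# to 99 and Concept Formation (80) is the outlier.
second = chc_tab_grouping({"Analysis-Synthesis": 100,
                           "Concept Formation": 80,
                           "Matrices": 98})
```

Under these assumed criteria, the sketch reproduces the averages of 81 and 99 reported above for the two outcomes.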
Consider the following:

WJ III Analysis-Synthesis (Gf-RG) = 98
WJ III Concept Formation (Gf-I) = 83

In this situation the WJ III tab will show that Gf is not interpretable because the difference between the two scores is statistically significant (see Table 3.1). Because the lower of the two scores is suggestive of a normative weakness, it is recommended that another measure of Gf-I be administered to determine if Gf-I is a true weakness.

WRIT Matrices (Gf-I) = 86

When these three scores (two from the WJ III and one from the WRIT) are entered into the CHC tab, the program provides a broad ability cluster (92) for Analysis-Synthesis (98) and Matrices (86) because these two standard scores are not significantly different from one another and they are in the same normative range (i.e., within normal limits/average range [85-115]). Thus, the interpretation is that the individual's Gf is average (92), with a spuriously low Concept Formation performance (which the practitioner will need to explain). A better interpretation would be that Gf-I is a weakness for the individual and that Gf-RG is a relative strength. Thus, while the XBA interpretive guidelines may result in accurate and meaningful interpretations of data most of the time, they do not do so all of the time. Knowing how the program works, by understanding the interpretive rules of thumb, is critical to proper interpretation of data. In short, the CHC tab groups data from different batteries according to the XBA guidelines and links that configuration of scores to an interpretive statement. These subtest groupings and corresponding interpretive statements are appropriate and meaningful in most situations. However, if applied blindly, one might make an interpretation that does not describe performance in the most effective and meaningful way, as demonstrated in the example above.

11. Does the XBA DMIA eliminate the need for clinical/professional judgment?

No.
There is simply no way to make the psychological diagnostic process, particularly in the case of SLD, a formulaic or completely objective exercise. Tests and tools do not diagnose. People do. The tools that practitioners use are only intended to enhance their ability to make diagnostic decisions, not to make those decisions for them. Even in the case of XBA, where extensive guidelines are suggested to provide clarity in both measurement and interpretation, there are exceptions or situations where such guidelines do not necessarily apply or make the most sense. Guidelines are useful most of the time, but not all of the time. The XBA approach is a set of rules that guides assessment and interpretation practices, but that does not dictate these practices. Therefore, the XBA guidelines should not be applied rigidly. For example, one guideline associated with the XBA approach is that scores that fall below 85 are suggestive of a normative weakness or deficit (with the assumption being that all other scores, those greater than 85, are either "normal" or within the average range [85-115]). There are cognitive score performances below 85 that are indeed quite problematic for a child and that adversely affect academic achievement; but the same can be said in many situations about scores of 87 or 88 or 89, for example. Thus, when guidelines are used rigidly, clinical judgment is removed, which effectively vitiates the practice of drawing logical conclusions from converging data sources regarding why the 87, 88, or 89 may represent the underlying cause of or most reasonable explanation for a child's academic skill deficiency. Sound psychological practice, particularly in the arena of assessment, will always necessitate some degree of clinical/professional judgment. The key is being able to support decisions and judgments with actual data and evidence.
We have always advocated strongly that any and all decisions made in the course of rendering a diagnosis or classification of disability must be bolstered by convergent data that establish a convincing and compelling case. Such practice is predicated on an understanding that certain approaches undermine sound interpretation of the meaning or significance of certain data. For example, failure to use confidence bands or to consider measurement error, strict adherence to cutoff values for classification, use of a single score or procedure, and discounting or ignoring data that are contrary to the proposed interpretations are all examples of poor practice. In short, there are no tests, tools, software programs, or the like that replace clinical judgment. The set of rules and software programs that accompany XBA may be used to guide assessment and interpretation of performance. Whether or not the data gathered support a diagnosis or classification of SLD is a judgment made by the practitioner based on the totality of the data, which extends far beyond a profile of broad and narrow cognitive ability and processing strengths and weaknesses. Determining how such a profile interacts with the child's environment and unique learning experiences is not based on a formula or a cutoff score, but it is part and parcel of diagnosis/classification and treatment and is most certainly based, in part, on clinical judgment.

12. What is the specific purpose of the C-LIM?

The dilemma that has long faced practitioners involved in testing culturally and linguistically diverse individuals is whether the obtained results are actual (i.e., valid) reflections of ability (or lack thereof) or instead simply an indication of their cultural knowledge and English language proficiency (i.e., invalid).
According to Essentials of Cross-Battery Assessment, 2nd edition (Flanagan, Ortiz, & Alfonso, 2007), "the basic question to be addressed in the evaluation of diverse individuals boils down to whether the obtained results reflect cultural or linguistic difference or whether they indicate the presence of some type of disability. This difference versus disorder question is the very reason for the development of the C-LIM" (p. 175). Thus, the C-LIM is a check on the validity of test results; that is, did the evaluation result in actual measurement of the intended constructs (e.g., Gf, Gc, Gv, etc.) or in the measurement of unintended constructs (e.g., level of acculturation and English language proficiency)? If it was the former, the results are valid and may be examined and interpreted, preferably in accordance with XBA principles. If it was the latter, the results are invalid and no inferences regarding functioning in the various ability domains should be made on the basis of the collected scores. The C-LIM simply arranges subtests in a manner that allows examination of the impact of an individual's level of developmentally based acculturative knowledge and English proficiency on test performance. When the impact on an individual's test scores is evaluated to be primarily due to cultural and linguistic factors, the test results are rendered invalid. When the impact is judged to be only contributory, and not primary, the test results can stand as valid and the data may be interpreted as appropriate, including as support for the presence of a disability. The C-LIM is neither designed nor intended to be used as a diagnostic tool in and of itself. Rather, the central purpose of the C-LIM is to give any practitioner, bilingual or not, the ability to evaluate, in a systematic manner, the degree to which cultural and linguistic issues may have affected the validity of their test results.
Getting over this huge obstacle is a critical step in being able to carry out nondiscriminatory assessment.

13. Is the C-LIM "research-based"?

Yes. The C-LIM rests upon two literature bases: an historical one that has been built over the past century and a new one that is evaluating the current versions of test batteries. With respect to the historical research base, the C-LIM is predicated upon the persistent research finding that individuals who are not native English speakers (often simply called "bilinguals") tend to score about a standard deviation lower on tests of intelligence and cognitive ability than native English speakers (Figueroa, 1989; Valdes & Figueroa, 1994). Over the years, it has become clear that this difference in functioning is not due to lower functioning in general, but is related to the nature of the tests. Tests that rely more on language skills and cultural content (i.e., "verbal" tests) tend to be much more difficult for culturally and linguistically diverse individuals than tests that are more novel and abstract and do not require much language ability (i.e., "nonverbal" tests). This finding is extremely robust and has been observed for nearly a century, dating back to the advent of psychological testing, when immigrants at Ellis Island were subjected to the early versions of the Binet Scales (Brigham, 1923; Cummins, 1984; Goddard, 1917; Jensen, 1974, 1976; Mercer, 1979; Sanchez, 1934; Vukovich & Figueroa, 1982; Yerkes, 1921). Examining the mean values from the tests used in these studies (where available), identifying the CHC constructs measured by the tests, and using an expert consensus procedure to evaluate task characteristics yielded a simple hierarchy that arranged tests from "highest" to "lowest" scores, reflecting the degree of impact of cultural loading and linguistic demand. This hierarchy forms the general "pattern" that serves as the basis for comparison in determining difference versus disorder.
There is no question that the general pattern of decline in performance for diverse individuals varies in accordance with the cultural/linguistic nature of the test and that a simple dichotomous view (i.e., verbal vs. nonverbal) is both misleading and inaccurate. Nevertheless, as a general finding, the fact that bilinguals perform more poorly on language- and culture-based tasks than they do on more abstract/novel tasks is too well established to warrant much debate. The second research base, performance of bilinguals on the newer versions of tests, is an ongoing effort that both supports the general pattern of decline observed with older tests and informs the Culture-Language Test Classifications, as described below.

14. How were the Culture-Language Test Classifications used within the C-LIM established?

The Culture-Language Test Classifications are merely an extension of the general pattern of decline in performance illustrated in the extensive literature on the testing of bilinguals as described above. Because cultural and linguistic variables are highly correlated, we could have simply classified the tests strictly in accordance with their mean values. However, we thought there might be some clinical utility in distinguishing tests that had more (or less) cultural content from those that had more (or less) language demands. We still believe this to be true, particularly in evaluations of English Learners with speech-language problems. In short, the classifications were initially made, and continue to be made, using a variety of procedures, including:

1) actual subtest means obtained from research that tested bilingual populations;
2) identification of the CHC ability construct measured by the test (either as reported by the test publisher or determined by factor analysis); and
3) an expert consensus process to examine the task characteristics utilized in the administration of, or response to, a particular test.
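The "general pattern of decline" that these classifications are designed to reveal can be illustrated with a minimal sketch. The grouping labels, the hypothetical scores, and the simple monotonic-decline criterion below are assumptions for illustration only; the actual C-LIM classifications and decision criteria appear in the Essentials book:

```python
from statistics import mean

def shows_declining_pattern(scores_by_loading: dict[str, list[float]]) -> bool:
    """Return True if mean performance falls as the combined cultural
    loading/linguistic demand of the tests rises (low -> moderate -> high).
    Illustrative sketch only; not the C-LIM's actual decision rule."""
    means = [mean(scores_by_loading[level]) for level in ("low", "moderate", "high")]
    return means[0] > means[1] > means[2]

# Hypothetical standard scores for one examinee, grouped by degree of
# cultural loading and linguistic demand of the subtests:
example = {
    "low": [98, 95],        # novel, abstract, minimal language required
    "moderate": [88, 90],
    "high": [79, 76],       # language- and culture-loaded
}
print(shows_declining_pattern(example))  # True
```

A pattern like this one, where performance drops systematically as cultural loading and linguistic demand increase, is the kind of result that would suggest cultural and linguistic factors, rather than ability, are driving the scores.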
Unfortunately, bilinguals have not been formally studied with every known intelligence or cognitive ability test, and the rapid evolution and ongoing revision of these tests mean this is unlikely ever to happen. But given the clinical purpose of the classifications (analysis of variation that may be due to task characteristics) and their main intent (to evaluate a general pattern of decline), use of the other two methods appears to suffice. Consider first that, in the case of the latest versions of some batteries, we were able to use historical research as a preliminary guide to classification because many tests are carried over from version to version (e.g., Wechsler Vocabulary, Similarities, Block Design, etc.). Second, many tests use extremely similar tasks to measure the very same constructs. This is not surprising because intelligence tests have always been more similar than dissimilar, and most can be traced to the original Binet Scales or to tasks developed for the Army Mental Tests. Third, advancements in test construction and design related to the use and application of CHC theory have also promoted better consistency in this regard; tests with similar instructions, characteristics, and intended constructs correlate highly and are subject to similar effects from culture and language. Gc, for example, is difficult, if not impossible, to measure in a "low culture/low language" manner; thus, tests of Gc tend to be very similar in terms of cultural loading and linguistic demand. Fourth, the classifications are based on a scale that uses only three rankings (low, moderate, and high), and we do not attempt to make fine discriminations because they are unnecessary. Whereas a practitioner may dispute whether a test should be classified as low versus moderate, or as moderate versus high, it is not difficult to clearly separate tests that are low on both dimensions from those that are high on both dimensions.
This is all that is necessary to provide information regarding performance relative to task characteristics, and it does not alter the basic declining pattern revealed by the historical data, which is the basis of the C-LIM. And finally, we do have actual data on the performance of bilingual individuals on several tests, which allow us both to examine and refine the classifications (and misclassifications) and to extrapolate easily to the classification of other, similar tests, including the WJ-III (Dynda, 2007; Sotelo-Dynega, 2006), WISC-III/IV (Nieves-Brull, 2005; Tychanska, 2008), SB-IV/V (Aziz, 2009; Lella-Sourvalis, 2009), DAS-II (Aguerra, 2005), and others. Indeed, our research has indicated that the declining pattern is so robust that it does not vary as a function of other variables such as age, grade, battery, or ethnicity (Beharry, 2008). Certainly, there may be tests we have "misclassified," but that is the point of new theory and new research. We never intended the classifications to be absolute, but rather alterable when new data suggest alternative classifications. In the meantime, for what the C-LIM is designed to do, such classification issues are minor, and the fact that we continue to see the historical pattern of decline in our present research suggests that the classifications are either correct or very close. As noted previously, this is a brand-new area of research (less than a decade old), tests are constantly being updated (five major intelligence tests have been revised since 2001), and the issues and implications for practice are not yet well understood. We expect that this avenue of research will soon become more prevalent in the scientific literature.

15. Are there any exceptions to the interpretive rules that underlie the C-LIM?
We have recently begun to evaluate patterns of performance in bilinguals who fall outside the typical learning disability versus normal categories and have found evidence of unique patterns of decline on the basis of type of disability. This research is quite preliminary, but we have seen a steeper decline in the slope of scores for individuals with speech-language problems, and a flatter slope but a significantly lower overall mean value for individuals with global cognitive impairment. These patterns differ from those exhibited by individuals with learning disabilities and suggest that the C-LIM may be useful in helping to identify one type of disorder versus another. Thus, there may be times when a declining pattern is evident but falls outside the range delineated by the C-LIM, yet may still indicate disability. In addition, other factors such as motivation, fatigue, emotional problems, scoring/administration errors, and the like can also complicate evaluation of the presence of a declining pattern.

16. Does the C-LIM eliminate the need for clinical/professional judgment?

No. As noted previously, the C-LIM is not a diagnostic tool or test. It is a system that relies completely on the judgment of the clinician who administered the tests and is designed to allow research to guide evaluation of the impact of cultural and linguistic differences on test performance in a systematic manner. This determination, essentially "difference vs. disorder," is not something that can be reduced to a formula or evaluated directly via testing. It will always remain subject to clinical and professional judgment. To date, the C-LIM remains the only method that provides practitioners with the ability to make this one very important determination in a systematic and research-based manner.