Clinical Decision-Making: Using Tests to Determine Child Language Impairment Severity
TAMMIE SPAULDING, PHD, CCC-SLP
UNIVERSITY OF CONNECTICUT
NOVEMBER 19, 2011

Student Collaborators
  Graduate Students and Undergraduate Students:
  Cecilia Figueroa, Ashley Burgess, Sabrina Jara, Caitlin Geary, Margaret Swartwout, Kacie Wittke, Calli Schechtman, Shannon Tobin

Clinical Decision-Making
  Not Impaired
  Impaired
  Severity of Impairment
  Prognosis
  Frequency and Duration of Treatment Session
  Treatment Approach & Priorities
  LTGs & STOs

Articles of Reference
  1. Spaulding, T. (in press). Inconsistency in severity ratings for children with specific language impairment. Journal of Communication Disorders.
  2. Spaulding, T., Swartwout, M., & Figueroa, C. (in press). Using norm-referenced tests to determine severity of language impairment in children: Disconnect between U.S. policy-makers and test developers. Language, Speech, and Hearing Services in Schools.

Discuss the Disconnect between Educational Agency Guidelines for School-Based SLPs and Empirical Evidence
  Determine what educational agencies say to do (re: the use of norm-referenced tests) for determining severity of impairment in children
  Determine if norm-referenced tests:
    Indicate that they should be used to determine severity of language impairment
    Provide evidence in their examiner’s manuals for this use
  Determine if empirical evidence supports the use of norm-referenced tests for informing severity decisions

Importance of Determining Severity of Impairment
  Influences Service Delivery (Spaulding et al., pilot data)
    Frequency and duration of intervention
    Treatment approach
    Priorities for intervention
    Prognostic expectations
    Eligibility for service

Characterizing Severity of Impairment
  Categorization is typical
    Clinical practice (Caesar and Kohler, 2009)
    Research studies (e.g., Ballantyne et al., 2007; Cohen et al., 2005; van Daal et al., 2004)

Norm-referenced test use for severity determinations
  Researchers rely on norm-referenced tests as
  the indicator of impairment severity (e.g., Ballantyne et al., 2007; Bishop & Edmundson, 1987; Evans et al., 2009; Lahey et al., 2001; Hart et al., 2004)
  Are SLPs encouraged to do the same?
    School-based SLPs
    U.S. Department of Education guidelines

Research Question: Clinical Guidelines Study
Spaulding, Swartwout, & Figueroa (in press)
  Do U.S. state Departments of Education recommend the use of norm-referenced test scores to inform severity of language impairment decisions?

Method: Clinical Guidelines Study
Spaulding, Swartwout, & Figueroa (in press)

Results: Clinical Guidelines Study
State Departments of Education

  STATE                                     CATEGORY   STANDARD SCORE          STANDARD DEVIATION
  North Dakota, Maine, Arkansas, Colorado   Mild       78-85                   -1.0 to -1.5
                                            Moderate   70-77                   -1.5 to -2.0
                                            Severe     <70                     < -2.0
  Illinois                                  Mild       78-84                   -1.0 to -1.5
                                            Moderate   63-77                   -1.5 to -2.5
                                            Severe     62 or below             -2.5 or below
                                            Profound   No criteria specified   No criteria specified
  Kentucky                                  Mild       75-79                   -1.33 to -1.66
                                            Moderate   70-74                   -1.66 to -2.0
                                            Severe     <70                     -2.0 or below
  Tennessee                                 Mild       70-77                   -1.5 to -2.0
                                            Moderate   62-69                   -2.0 to -2.5
                                            Severe     62 or below             -2.5 or below
  Virginia*                                 Mild       78-84                   -1.0 to -1.5
                                            Moderate   70-75                   -1.5 to -2.0
                                            Severe     <70                     -2.0 or below

Results: Clinical Guidelines Study
State Departments of Education

  STATE DEPT OF ED: OTHER SOURCES OF DATA
  Arkansas: Alternate assessments when validity is a concern
  Colorado: Classroom observations, curriculum-based assessments, oral/written language samples, informal probes
  Illinois: Two or more diagnostic procedures/standardized tests
  Kentucky: Language samples/narratives, classroom observations, teacher/parent interviews, criterion-referenced activities, writing samples
  Maine: Informal assessments can be used
  North Dakota: Checklists, language samples, classroom observations
  Tennessee: Language samples, checklists, observations
  Virginia: Criterion-referenced measures, curriculum-based assessments, dynamic assessment, language samples, contextual probes, structured observations, interviews, reports,
  checklists

Results: Clinical Guidelines Study
State Departments of Education

  STATE DEPT OF ED: WEIGHTING OF MEASURES FOR SEVERITY DECISIONS
  Arkansas, Illinois, Maine: ?
  Colorado:
    Equal weight: norm-referenced test score, informal assessment, comprehension of curricular information
    More weight: impact of linguistic deficits on educational performance
  Kentucky: More weight to functional assessment and impact on educational performance than to norm-referenced test score
  North Dakota: Equal weight to educational impact and norm-referenced test score, less weight to informal assessment results
  Tennessee: Equal weight to norm-referenced test score, informal assessments, and functional/academic language skills
  Virginia: Equal weight to norm-referenced test score, nonstandardized assessment/functional analyses, language functioning in low comprehension/low verbal demand environments, and language functioning in high comprehension/high verbal demand environments

Discussion: Clinical Guidelines Study
State Departments of Education:
  8 state Depts of Ed. say to use norm-referenced tests for severity decisions
    Implication:
  Severity of impairment criteria varied across states
    Implication:
  Indicate specific criteria for severity determinations based on test scores
    Assumption 1: I can apply these boundary criteria on whatever test I select to administer

Research Question: Applying State Ed Criteria to Real Children
Assumption 1: Tested
  I can apply the State Dept of Education criteria on whatever test I select to use and severity of impairment ratings will be consistent.
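The point that severity criteria vary across states can be made concrete with a small sketch. The bands below are copied from the state guideline table for three of the listed criteria sets; the function names, the hypothetical score of 76, and the conversion to standard deviation units (which assumes the conventional standard-score scale with mean 100 and SD 15, something the slides do not state) are illustrative assumptions, not part of the original study.

```python
# Severity bands copied from the state guideline table. Each entry is
# (label, lowest standard score, highest standard score), ends inclusive.
STATE_BANDS = {
    "ND/ME/AR/CO": [("Mild", 78, 85), ("Moderate", 70, 77), ("Severe", 0, 69)],
    "Kentucky": [("Mild", 75, 79), ("Moderate", 70, 74), ("Severe", 0, 69)],
    # The table lists 62 under both Moderate and Severe for Tennessee;
    # this sketch resolves the overlap by checking Moderate first.
    "Tennessee": [("Mild", 70, 77), ("Moderate", 62, 69), ("Severe", 0, 62)],
}


def to_sd_units(standard_score, mean=100.0, sd=15.0):
    """Convert a standard score to SD units (assumes mean 100, SD 15)."""
    return (standard_score - mean) / sd


def severity_label(state, standard_score):
    """Return the first band that contains the score, or None if none does."""
    for label, low, high in STATE_BANDS[state]:
        if low <= standard_score <= high:
            return label
    return None  # score falls outside every band listed for this state


def labels_by_state(standard_score):
    """Apply every criteria set in STATE_BANDS to the same score."""
    return {state: severity_label(state, standard_score) for state in STATE_BANDS}


if __name__ == "__main__":
    score = 76  # hypothetical child, not a study participant
    print(round(to_sd_units(score), 2), labels_by_state(score))
```

For the hypothetical score of 76, the North Dakota/Maine/Arkansas/Colorado bands return Moderate, while the Kentucky and Tennessee bands both return Mild.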
Participants: Demographic Characteristics

                             TD (n=31)             SLI (n=31)
  AGE                        4.66 (4.0-5.5)        4.66 (4.0-5.7)
  SEX                        23 males, 8 females   23 males, 8 females
  MOTHER’S EDUCATION LEVEL   14.24 (11-17)         14.13 (11-17)
  RACE
    African American         2                     4
    American Indian          0                     2
    Asian                    1                     0
    White                    22                    15
    Unspecified              6                     10
  ETHNICITY
    Hispanic                 5                     14
    Non-Hispanic             4                     5
    Unspecified              22                    12

Method: Assumption 1
Applying State Ed Criteria to Real Children
  Preschool children with SLI and typically developing (TD) children were administered:
    Test for Examining Expressive Morphology (TEEM; Shipley, Stone, & Sue, 1983)
    Structured Photographic Expressive Language Test-Preschool, Second Edition (SPELT-P2; Dawson, Stout, Eyer, et al., 2005)
  The most common boundary criteria that state Departments of Education recommended for use were applied to their scores on these assessments
  Consistency between severity rankings was determined

Method: Assumption 1
Applying State Ed Criteria to Real Children
Most common State Dept of Ed boundaries

  SEVERITY CATEGORY      STANDARD SCORE BOUNDARIES
  Typically Developing   >85
  Mild                   78-85
  Moderate               70-77
  Severe                 <70

Participants: Norm-referenced Test Scores

Results: Consistency in Severity Classifications Using State Ed Criteria (SPELT-P2 vs TEEM)
  [Bar chart: number of participants (0-30) with consistent vs. inconsistent classifications, by group (SLI, TD)]

Results: Consistency in Severity Classifications: Specifics of Severity Rankings

                         TEEM     SPELT-P2
  Typically Developing   26 TD    31 TD, 3 SLI
  Mild                   5 TD     6 SLI
  Moderate               0        7 SLI
  Severe                 31 SLI   15 SLI

Assumption 1: Discussion
  Can I apply the State Dept of Education criteria on whatever test I select to use, and will it result in a consistent severity determination?
  ANSWER: NO

Study 1: Assumption 2
Spaulding, Swartwout, & Figueroa (in press)
  8 (16%) of State Departments of Education are saying to use norm-referenced test scores to determine severity of impairment
  ASSUMPTION 2: Norm-referenced tests must be designed for the purpose of determining severity of impairment

Research Questions
Assumption 2: Tested
Spaulding, Swartwout, & Figueroa (in press)
  Do norm-referenced tests indicate that they should be used to determine severity of language impairment in children?
  If so, do they provide empirical evidence supporting this use?

Method: Assumption 2
Spaulding, Swartwout, & Figueroa (in press)
  Ordered the latest edition of 45 norm-referenced tests of child language
  Met Criteria:
    Assessed oral language skills of English-speaking children between the ages of 5-17
    Were normed on kids who spoke American English
    Were commercially available
    Required elicited responses from children
    Were not screening or criterion-referenced measures
  Looked in the examiner’s manuals

Method: Assumption 2
Spaulding, Swartwout, & Figueroa (in press)
  Descriptive data obtained regarding which tests:
    Indicated they should be used for the purpose of determining severity
    Provided information to convert test performance to a severity rating
    What those ratings were, the boundaries between the severity ratings, and how these boundaries were derived

Results: Assumption 2: Norm-Referenced Test Review
(Spaulding, Swartwout, & Figueroa, in press)
  The Test of Word Knowledge (TOWK; Wiig & Secord, 1992) stated that it could be used to determine severity of language impairment in children
  Eleven tests provided information on how to convert test performance to a severity rating

  TEST                                SEVERITY BOUNDARIES
  BBCS:E, BBCS-3:R                    55-70: Very Delayed
                                      75-85: Delayed
  CELF-4, CELF-P2, DELV-NR, MAVA*     ≤70: Very Low/Severe
                                      71-77: Low/Moderate
                                      78-85: Marginal/Borderline/Mild
  CREVT-2, TELD-3, TOPL-2, UTLD-4     <70: Very Poor
                                      70-79: Poor
                                      80-89: Below Average
  ROWPVT-2000                         ≤72: Low
                                      73-88: Below Average

  TEST                               SEVERITY BOUNDARIES               TEST PUBLISHER
  BBCS:E, BBCS-3:R                   55-70: Very Delayed               The Psychological Corporation
                                     75-85: Delayed
  CELF-4, CELF-P2, DELV-NR           ≤70: Very Low/Severe              The Psychological Corporation
                                     71-77: Low/Moderate
                                     78-85: Marginal/Borderline/Mild
  CREVT-2, TELD-3, TOPL-2, UTLD-4    <70: Very Poor                    Pro-Ed
                                     70-79: Poor
                                     80-89: Below Average
  MAVA                               ≤70: Very Low                     Super Duper Publications
                                     71-77: Low
                                     78-85: Borderline/Marginal
  ROWPVT-2000                        ≤72: Low                          Academic Therapy Publications
                                     73-88: Below Average

Take Home Points: Assumption 2
Spaulding, Swartwout, & Figueroa (in press)
  Only eleven test manuals provide criteria for converting a child’s test score to a severity label
    No empirical data to support the cut-off boundaries they provided
    They don’t appear to be based on how children with language impairment perform on the test

Take Home Points:
  You can’t apply the state dept of ed criteria on any test selected for use and expect it to be accurate
  Tests don’t appear to be designed to determine severity of impairment even if they tell you they are

Research Question: Study 2
Spaulding (in press)
  Can I apply the State Dept of Education criteria on whatever test I select to use, and will it result in a consistent severity determination?
  ANSWER: NO
  If test manuals provide the same cut-off boundaries for severity determinations, will children be consistently classified with the same severity of impairment on these tests?
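The Study 2 question can be sketched as a simple agreement count: label each child's score on two tests that share the same manual boundaries, then tally how often the two labels match. The cut-offs below are the ones listed for the CREVT-2, TELD-3, TOPL-2, and UTLD-4; the function names and the score pairs are hypothetical and are not data from the study.

```python
from collections import Counter


def shared_manual_label(standard_score):
    """Label a score with the bands listed for CREVT-2, TELD-3, TOPL-2, UTLD-4."""
    if standard_score < 70:
        return "Very Poor"
    if standard_score <= 79:
        return "Poor"
    if standard_score <= 89:
        return "Below Average"
    return "No severity label"  # 90 and above: outside the listed bands


def consistency_counts(score_pairs):
    """Count children whose two test scores fall in the same band."""
    counts = Counter()
    for score_a, score_b in score_pairs:
        same = shared_manual_label(score_a) == shared_manual_label(score_b)
        counts["consistent" if same else "inconsistent"] += 1
    return counts


# Hypothetical (test A, test B) standard scores for five children.
HYPOTHETICAL_PAIRS = [(68, 74), (82, 85), (79, 80), (65, 66), (88, 93)]

if __name__ == "__main__":
    print(dict(consistency_counts(HYPOTHETICAL_PAIRS)))
```

Even with identical cut-offs, a difference of a point or two between tests (79 vs. 80 in the third hypothetical pair) is enough to change the label, which is why shared boundaries alone do not guarantee consistent classification.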
Participants: Demographics

                               SLI (n=16)         TD (n=16)
  Age                          M = 50.81 months   M = 51.38 months
    SD                         8.89               8.79
    Range                      (38-64 months)     (38-65 months)
  Sex                          9 boys, 7 girls    9 boys, 7 girls
  Mother’s education level     M = 14.28 years    M = 14.61 years
    SD                         1.32               1.50
    Range                      (12-16 years)      (13-17 years)
  Ethnicity (n)
    Not Hispanic               7                  10
    Hispanic                   5                  4
    Not reported               4                  2
  Race (n)
    White                      9                  13
    Black/African American     1                  1
    Multiracial                2                  1
    Not reported               4                  1

Participants: Norm-referenced Test Performance

              SLI Group                    TD Group
              Mean     SD     Range        Mean     SD      Range
  *CELF-P2    77.69    7.19   (65-84)      113.06   9.55    (98-129)
  *PPVT-IV    91.06    9.15   (77-111)     113.75   10.70   (92-126)
  KABC-II     102.88   8.64   (89-115)     106.09   7.51    (98-119)
  *significantly different at p<.05

Descriptive Ratings for Standardized Test Scores

  TELD-3                          UTLD-4
  Score Range   Classification    Score Range   Classification
  >130          Very Superior     131-165       Very Superior
  121-130       Superior          121-130       Superior
  111-120       Above Average     111-120       Above Average
  90-110        Average           90-110        Average
  80-89         Below Average     80-89         Below Average
  70-79         Poor              70-79         Poor
  <70           Very Poor         35-69         Very Poor

Consistency in Language Proficiency Designations Using Procedures in Tests Themselves
UTLD-4 vs.
TELD-3: Severity Consistency
  [Bar chart: number of participants (0-14) with consistent vs. inconsistent designations, by group (SLI, TD)]

Specifics of Proficiency Rankings

                  TELD-3         UTLD-4
  Very Superior                  1 TD
  Superior        3 TD           2 TD, 2 TD
  Above Average   7 TD           4 TD
  Average         8 SLI, 6 TD    1 SLI, 1 SLI, 6 TD, 1 TD
  Below Average   3 SLI          4 SLI, 2 SLI, 2 SLI
  Poor            2 SLI          4 SLI, 1 SLI
  Very Poor       3 SLI          1 SLI

Discussion: Study 2
Spaulding (in press)
  Can I apply the State Dept of Education criteria on whatever test I select to use, and will it result in a consistent severity determination?
  ANSWER: NO
  Can I apply the boundary criteria for determining severity of language impairment in the test manuals themselves and find consistent severity determinations?
  ANSWER: NO

Conclusions
  Be cautious in using norm-referenced test performance to inform severity decisions given:
    Inconsistency in educational agency guidelines
    Lack of consistency in severity designations based on how children with language impairment perform on these tests
    Lack of data within test manuals to support this use

Thank you!

Item Selection: Maximize Severity Determination Accuracy
  Test developers carefully (hopefully) consider the items on the test relative to their purpose
  Considers:
    Item difficulty
    Ability of the people tested
  [Figure labels: Item difficulty, Person ability, Easy item, Moderate item, Difficult item]

Item Selection: Maximize Diagnostic Utility
  [Figure labels: Impaired, Unimpaired, Person’s Ability]

Comparison group should be children with different degrees of impairment
  Do norm-referenced tests provide a sample of children with different degrees of impairment in their examiner’s manuals?
  If so, do they provide a means for comparing a child’s score to children with different degrees of impairment to determine how impaired they are?

Design characteristics to maximize severity accuracy
  Characteristic 1: Diagnosing vs. determining severity of impairment (item selection process differs)
    Lost 4 tests, down to 7
  Characteristic 2: Comparison group should be children with different degrees of impairment
    No tests do
  Characteristic 3: If boundary cut-offs are provided, the manual needs to include an analysis showing these cut-off boundaries accurately distinguish among children with these different degrees of impairment
    No tests do; boundaries based primarily on publisher
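The kind of analysis Characteristic 3 calls for could look roughly like the sketch below: take children whose severity was established independently of the test, apply the manual's cut-offs to their scores, and report how often the two agree within each severity group. The cut-offs are those listed for the CELF-4, CELF-P2, and DELV-NR, with the labels shortened to Severe, Moderate, and Mild; the sample, the reference labels, and the function names are hypothetical, and this is one simple way to run such a check rather than a method taken from any test manual.

```python
from collections import defaultdict


def band_label(standard_score):
    """Apply the cut-offs listed for the CELF-4, CELF-P2, and DELV-NR.

    Labels are shortened from the manuals' wording (e.g. "Very Low/Severe").
    """
    if standard_score <= 70:
        return "Severe"
    if standard_score <= 77:
        return "Moderate"
    if standard_score <= 85:
        return "Mild"
    return "Not impaired"


def agreement_by_group(children):
    """children: iterable of (reference_label, standard_score) pairs.

    Returns, for each reference severity group, the proportion of children
    whose test score falls in the matching band.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for reference_label, score in children:
        totals[reference_label] += 1
        if band_label(score) == reference_label:
            hits[reference_label] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical sample: severity assigned independently of the test score.
HYPOTHETICAL_SAMPLE = [
    ("Mild", 82), ("Mild", 76),
    ("Moderate", 74), ("Moderate", 69),
    ("Severe", 66), ("Severe", 72),
]

if __name__ == "__main__":
    print(agreement_by_group(HYPOTHETICAL_SAMPLE))
```

Low agreement in any group would indicate that the published cut-offs do not separate that degree of impairment from its neighbors.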