CHAPTER 1
Assessment in Social and Educational Contexts (Salvia, Ysseldyke & Bolt, 2012)
Dr. Julie Esparza Brown
SPED 512: Diagnostic Assessment
Winter 2013
Chapters 1, 11, 12, 13, and 14 are included in this presentation

AGENDA – Week 3
• Questions for the Good of the Group
• Instruction and Lab Time: Continue WJ-III
• Break
• Group activity to process Chapters 1, 3, 11, 12, and 14
• PowerPoint overview of Chapters 1, 3, 11, 12, and 14

Individualized Support
• Schools must provide support as a function of individual student need
  • To what extent is the current level of instruction working?
  • How much instruction is needed?
  • What kind of instruction is needed?
  • Are additional supports necessary?

Assessment Defined
• Assessment is the process of collecting information (data) for the purpose of making decisions about students
  • E.g., what to teach, how to teach, whether the student is eligible for special services

How Are Assessment Data Collected?
• Assessment extends beyond testing and may include:
  • Record review
  • Observations
  • Tests
  • Professional judgments
  • Recollections

Why Care About Assessment?
• A direct link exists between assessment and the decisions that we make. Sometimes these decisions are markedly important.
• Thus, the procedures for gathering data are of interest to many people – and rightfully so.
• Why might students, parents, and teachers care?
• The general public?
• Certification boards?

Common Themes Moving Forward
• Not all tests are created equal
  • Differences in content, reliability, validity, and utility
• Assessment practices are dynamic
  • Changes in the political, technological, and cultural landscape drive a continuous process of revision

Common Themes Moving Forward
• The importance of assessment in education
  • Educators are faced with difficult decisions
  • Effective decision-making will require knowledge of effective assessment
• Assessment can be intimidating, but significant improvements have happened and continue to happen
  • More confidence in the technical adequacy of instruments
  • Improvements in the utility and relevance of assessment practices
  • MTSS framework

CHAPTER 11
Assessment of Academic Achievement with Multiple-Skill Devices

Achievement Tests
• Achievement Tests
  • Norm-referenced
    • Allow for comparisons between students
  • Criterion-referenced
    • Allow for comparisons between individual students and a skill benchmark
• Why do we use achievement tests?
  • Assist teachers in determining skills students do and do not have
  • Inform instruction
  • Academic screening
  • Progress evaluation

Classifying Achievement Tests
Diagnostic Achievement
  High: Less efficient administration – Dense content and numerous items allow teachers to uncover specific strengths and weaknesses
  Low: More efficient administration – Comparisons between students can be made, but there is very little power in determining strengths and weaknesses
Number of students who can be tested
  Low: Less efficient administration – Allows for more qualitative information about the student
  High: Efficient administration – Typically only quantitative data are available

Considerations for Selecting a Test
• Four Factors
  • Content validity
    • What the test actually measures should match its intended use
  • Stimulus-response modes
    • Students should not be hindered by the manner of test administration or required response
  • Standards used in state
  • Relevant norms
    • Does the student population being assessed match the population from which the normative data were acquired?

Tests of Academic Achievement
• Peabody Individual Achievement Test (PIAT-R/NU)
• Wide Range Achievement Test 4 (WRAT4)
• Wechsler Individual Achievement Test 3 (WIAT-III)

Peabody Individual Achievement Test – Revised/Normative Update (PIAT-R/NU)
• In general…
  • Individually administered; norm-referenced for K-12 students
• Norm population
  • Most recent update was completed in 1998
  • Representative of each grade level
  • No changes to test structure

PIAT-R/NU Subtests
Mathematics: 100 multiple-choice items assess students’ knowledge and application of math concepts and facts
Reading recognition: 100 multiple-choice items require students to match and name letters and words
General information: 100 questions presented orally. Content areas include social studies, science, sports, and fine arts.
Reading comprehension: 81 multiple-choice items require students to select an appropriate answer following a reading passage
Spelling: 100 items ranging in difficulty from kindergarten (letter naming) to high school (multiple-choice following verbal presentation)
Written expression: Split into two levels.
Level I assesses prewriting skills and Level II requires story writing following a picture prompt

PIAT-R/NU
• Scores
  • For all but one subtest (written expression), response to each item is pass/fail
  • Raw scores converted into:
    • Standard scores
    • Percentile ranks
    • Normal curve equivalents
    • Stanines
  • 3 composite scores
    • Total reading
    • Total test
    • Written language

PIAT-R/NU
• Reliability and Validity
  • Despite new norms, reliability and validity data are only available for the original PIAT-R (1989)
  • Previous reliability and validity data are likely outdated
  • Outdated tests may not be relevant in the current educational context

Wide Range Achievement Test 4 (WRAT4)
• In general…
  • Individually administered
  • 15-45 minute test length depending on age (5-94 age range)
  • Norm-referenced, but covers a limited sample of behaviors in 4 content areas
• Norm population
  • Stratified across age, gender, ethnicity, geographic region, and parental education

WRAT4 Subtests
Word Reading: The student is required to name letters and read words
Sentence Comprehension: The student is shown sentences and fills in missing words
Spelling: The student writes down words as they are read aloud
Math Computation: The student solves basic computation problems
• Scores
  • Raw scores converted to:
    • Standard scores, confidence intervals, percentiles, grade equivalents, and stanines
  • Reading composite available
• Reliability
  • Internal consistency and alternate-form data are sufficient for screening purposes
• Validity
  • Performance increases with age
  • WRAT4 is linked to other tests that have since been updated; additional evidence is necessary

Wechsler Individual Achievement Test – Third Edition (WIAT-III)
• General
  • Diagnostic, norm-referenced achievement test
  • Reading, mathematics, written expression, listening, and speaking
  • Ages 4-19
• Norm Population
  • Stratified sampling was used to sample within several common demographic variables:
    • Pre K – 12, age, race/ethnicity, sex, parent
education, geographic region

WIAT-III
• Subtests and scores
  • 16 subtests arranged into 7 domain composite scores and one total achievement score (structure provided on next slide)
  • Raw scores converted to:
    • Standard scores, percentile ranks, normal curve equivalents, stanines, age and grade equivalents, and growth scale value scores

WIAT-III Subtests
Composite: Subtest
Basic Reading: Word Reading, Pseudoword Decoding
Reading Comprehension and Fluency: Reading Comprehension, Oral Reading Fluency
Early Reading Skills
Mathematics: Math Problem Solving, Numerical Operations
Math Fluency: Math Fluency – (Addition, Subtraction, & Multiplication)
Written Expression: Alphabet Writing Fluency, Spelling, Sentence Composition, Essay Composition
Oral Expression: Listening Comprehension, Oral Expression

WIAT-III
• Reliability
  • Adequate reliability evidence
    • Split-half
    • Test-retest
    • Interrater agreement
• Validity
  • Adequate validity evidence
    • Content
    • Construct
    • Criterion
    • Clinical Utility
• Stronger reliability and validity evidence increase the relevance of information derived from the WIAT-III

Getting the Most Out of an Achievement Test
• Helpful but not sufficient – most tests allow teachers to find an appropriate starting point
• What is the nature of the behaviors being sampled by the test?
• Need to seek out additional information concerning student strengths and weaknesses
  • Which items did the student excel on? Which did he or she struggle with?
  • Were there patterns of responding?

CHAPTER 12
Using Diagnostic Reading Tests

Why Do We Assess Reading?
• Reading is fundamental to success in our society, and therefore reading skill development should be closely monitored
• Diagnostic tests can help to plan appropriate intervention
• Diagnostic tests can help determine a student’s continuing need for special services

The Ways in Which Reading is Taught
• The effectiveness of different approaches is heavily debated
  • Whole-word vs.
code-based approaches
• Over time, research has supported the importance of phonemic awareness and phonics

Skills Assessed by Diagnostic Approaches
• Oral Reading
  • Rate of Reading
  • Oral Reading Errors
    • Teacher pronunciation/aid
    • Hesitation
    • Gross mispronunciation
    • Partial mispronunciation
    • Omission of a word
    • Insertion
    • Substitution
    • Repetition
    • Inversion

Skills Assessed by Diagnostic Approaches (cont.)
• Reading Comprehension
  • Literal comprehension
  • Inferential comprehension
  • Critical comprehension
  • Affective comprehension
  • Lexical comprehension

Skills Assessed by Diagnostic Approaches (cont.)
• Word-Attack Skills (i.e., word analysis skills) – use of letter-sound correspondence and sound blending to identify words
• Word Recognition Skills – “sight vocabulary”

Diagnostic Reading Tests
• See Table 12.1
• Group Reading Assessment and Diagnostic Evaluation (GRADE)
• DIBELS Next
• Test of Phonemic Awareness – 2 Plus (TOPA 2+)

GRADE (Williams, 2001)
• Pre-school to 12th grade
• 60 to 90 minutes
• Assesses pre-reading, reading readiness, vocabulary, comprehension, and oral language
• Some important demographic information is missing for the norm group; total reliabilities are high (subscale reliabilities are lower); adequate information supports the validity of the total score

DIBELS Next (Good and Kaminski, 2010)
• Kindergarten-6th grade
• Very brief administration (used for screening and monitoring)
• First Sound Fluency, Letter Naming Fluency, Phoneme Segmentation Fluency, Nonsense Word Fluency, Oral Reading Fluency, and DAZE (comprehension)
• Use of benchmark expectations or development of local norms
• Multiple administrations necessary for making important decisions

TOPA 2+ (Torgesen & Bryant, 2004)
• Ages 5 to 8
• Phonemic awareness and letter-sound correspondence
• Good norms description
• Reliability better for kindergarteners than for more advanced students
• Adequate overall validity

CHAPTER 13
Using Diagnostic Mathematics Measures

Why Do We Assess Mathematics?
• Multiple-skill assessments provide broad levels of information, but lack specificity when compared to diagnostic assessments
• More intensive assessment of mathematics helps educators:
  • Assess the extent to which current instruction is working
  • Plan individualized instruction
  • Make informed eligibility decisions

Ways to Teach Mathematics
< 1960: Emphasis on basic facts and algorithms, deductive reasoning, and proofs
1960s: New Math; movement away from traditional approaches to mathematics instruction
1980s: Constructivist approach – standards-based math. Students construct knowledge with little or no help from teachers
> 2000: Evidence supports explicit and systematic instruction (most similar to “traditional” approaches to instruction)

Behaviors Sampled by Diagnostic Mathematics Tests
• National Council of Teachers of Mathematics (NCTM)
  • Content Standards
    – Number and operations
    – Algebra
    – Geometry
    – Measurement
    – Data analysis and probability
  • Process Standards
    – Problem solving
    – Reasoning and proof
    – Communication
    – Connections
    – Representation

Specific Diagnostic Math Tests
• Group Mathematics Assessment and Diagnostic Evaluation (G●MADE)
• KeyMath-3 Diagnostic Assessment (KeyMath-3 DA)

G●MADE
• General
  • Group administered, norm-referenced, standards-based test
  • Used to identify specific math skill strengths and weaknesses
  • Students K-12
  • 9 levels of difficulty teachers may select from

G●MADE
• Subtests
  • Concepts and communication
    • Language, vocabulary, and representations of math
  • Operations and computation
    • Addition, subtraction, multiplication, and division
  • Process and applications
    • Applying appropriate operations and computations to solve word problems

G●MADE
• Scores
  • Raw scores converted to:
    • Standard scores, grade scores, stanines, percentiles, normal curve equivalents, and growth scale values
• Norm population
  • 2002 and 2003; nearly 28,000 students
  • Selected based on geographic region, community type, socioeconomic status, students with disabilities

G●MADE
• Reliability
  • Acceptable levels of split-half and alternative form reliability
• Validity
  • Based on NCTM standards (content validity)
  • Strong criterion-related evidence

KeyMath-3 Diagnostic Assessment (KeyMath-3 DA)
• General
  • Comprehensive assessment of math skills and concepts
  • Untimed, individually administered, norm-referenced test; 30-40 minutes
  • 4 years 6 months through 21 years

KeyMath-3 DA Subtests
• Numeration
• Algebra
• Geometry
• Measurement
• Data analysis and probability
• Mental computation and estimation
– Addition and subtraction
– Multiplication and division
– Foundations of problem solving
– Applied problem solving

KeyMath-3 DA
• Scores
  • Raw scores converted to:
    • Standard scores, scaled scores, percentile rank, grade and age equivalents, growth scale values
  • Composite scores
    • Operations, basic concepts, and application
• Norm population
  • 3,630 individuals
  • 4 years 6 months through 21 years – demographic distribution approximates data reported in 2004 census

KeyMath-3 DA
• Reliability
  • Internal consistency, alternate-form, and test-retest reliability
  • Adequate for screening and diagnostic purposes
• Validity
  • Adequate content and criterion-related validity evidence for all composite scores

CHAPTER 14
Using Measures of Oral and Written Language

Assessing Language Competence
• When assessing language skills, it is important to break language down into processes and measure each one
  – Language appears in written and verbal format
    • Comprehension
    • Expression
  – Normal levels of comprehension ≠ normal expression
  – Normal levels of expression ≠ normal comprehension

Terminology: Language as Code
• Phonology:
  • Hearing and discriminating word sounds
• Semantics:
  • Understanding vocabulary, meaning, and concepts
• Morphology and syntax:
  • Understanding the grammatical structure of language
• Supralinguistics and pragmatics:
  • Understanding a speaker’s or writer’s intentions

Assessing Oral and Written Language
• Why?
  • Ability to converse and express thoughts is desirable
  • Basic oral and written language skills underlie higher-order skills
• Considerations in assessing oral language
  • Cultural diversity
    • Dialectal variations are different, but not incorrect
    • Disordered production of primary language or dialect should be considered when evaluating oral language
    • Are the norms and materials appropriate?
  • Developmental considerations
    • Be aware of developmental norms for language acquisition

Assessing Oral and Written Language
• Considerations in assessing written language
  • Form and Content
    • Penmanship
    • Spelling
    • Style
  • May be best assessed by evaluating students’ written work and developing tests (vocabulary, spelling, etc.) that parallel the curriculum

Methods for Observing Language Behavior
• Spontaneous language
  – Record what child says while talking to an adult or playing with toys
  – Prompts may be used for older children
  – Analyze phonology, semantics, morphology, syntax, and pragmatics
• Imitation
  – Require children to repeat words, phrases, or sentences produced by the examiner
  – Valid predictor of spontaneous production
  – Standardized imitation tasks often used in oral language assessment instruments
• Elicited language
  – A picture stimulus is used to elicit language

Methods for Observing Language Behavior
Advantages and disadvantages of each method
Spontaneous
• Advantages
  • Most natural indicator of everyday language performance
  • Informal testing environment
• Disadvantages
  • Not a standardized procedure (more variability)
  • Time-intensive
Imitation
• Advantages
  • Comprehensive
  • Structured and efficient administration
• Disadvantages
  • Auditory memory may affect results
  • Hard to draw conclusions from accurate imitations
  • Boring for child
Elicited language
• Advantages
  • Interesting and efficient
  • Comprehensive
• Disadvantages
  • Difficult to create valid measurement tools

Specific Oral and Written Language Tests
• Test of Written Language – Fourth Edition (TOWL-4)
• Test of Language Development: Primary – Fourth Edition (TOLD-P:4)
• Test of Language Development: Intermediate – Fourth Edition (TOLD-I:4)
• Oral and Written Language Scales (OWLS)

Test of Written Language – Fourth Edition (TOWL-4)
• General
  • Norm-referenced
  • Designed to assess written language competence of students between the ages of 9 and 17
  • Two formats
    • Contrived
    • Spontaneous

TOWL-4 Subtests
• Contrived
  – Vocabulary
  – Spelling
  – Punctuation
  – Logical sentences
  – Sentence combining
• Spontaneous
  – Contextual conventions
  – Story composition

TOWL-4
• Scores
  • Raw scores can be converted to percentile or standard scores
  • Three composite scores and one overall score
    • Contrived writing
    • Logical sentences
    • Spontaneous writing
    • Overall writing

TOWL-4
• Norms
  – Three age ranges: 9-11, 12-14, and 15-17
  – Distribution approximates nationwide school-age population for 2005; however, insufficient data are presented to confirm this
• Reliability
  – Variable data for internal consistency, stability, and inter-scorer agreement
  – 2 composites reliable for making educational decisions about students
• Validity
  – Content, construct, and predictive validity evidence is presented
  – Validity of inferences drawn from data is somewhat unclear

Test of Language Development: Primary – Fourth Edition (TOLD-P:4)
• General
  • Norm-referenced, untimed, individually administered test
  • 4-8 years of age
  • Used to:
    • Identify children significantly below their peers in oral language
    • Determine specific strengths and weaknesses
    • Document progress in remedial programs
    • Measure oral language in research studies

TOLD-P:4
• Subtests
  • Picture vocabulary
  • Relational vocabulary
  • Oral vocabulary
  • Syntactic understanding
  • Sentence imitation
  • Morphological completion
  • Word discrimination
  • Word analysis
  • Word articulation
• Scores
  – Raw scores
converted to:
    • Age equivalents, percentile ranks, subtest scaled scores, and composite scores
  – Composite scores
    • Listening
    • Organizing
    • Speaking
    • Grammar
    • Semantics
    • Spoken language

TOLD-P:4
• Norm population
  • 1,108 individuals across 4 geographic regions
  • Sample partitioned according to the 2007 census
• Reliability
  • Adequate estimates of reliability
    • Coefficient alpha
    • Test-retest
    • Scorer difference
• Validity
  • Adequate content, construct, and criterion-related validity evidence

Test of Language Development: Intermediate – Fourth Edition (TOLD-I:4)
• General
  • Norm-referenced, untimed, individually administered test
  • 8-17 years of age
  • Used to:
    • Identify children significantly below their peers in oral language
    • Determine specific strengths and weaknesses
    • Document progress in remedial programs
    • Measure oral language in research studies

TOLD-I:4
• Subtests
  • Sentence combining
  • Picture vocabulary
  • Word ordering
  • Relational vocabulary
  • Morphological comprehension
  • Multiple meanings
• Norm population
  • 1,097 students from 4 geographic regions
  • Sample partitioned according to the 2007 census
• Scores
  – Raw scores converted to:
    • Age equivalents, percentile ranks, subtest scaled scores, and composite scores
  – Composite scores
    • Listening
    • Organizing
    • Speaking
    • Grammar
    • Semantics
    • Spoken language

TOLD-I:4
• Reliability
  • Adequate estimates of reliability
    • Coefficient alpha
    • Test-retest
    • Scorer difference
• Validity
  • Adequate content, construct, and criterion-related validity evidence

Oral and Written Language Scales (OWLS)
• General
  • Norm-referenced, individually administered assessment of receptive and expressive language
  • 3-21 years of age
• Subtests
  • Listening comprehension
  • Oral expression
  • Written expression

OWLS
• Norm population
  • 1,985 students matched to 1991 census data
• Scores
  • Raw scores converted to:
    • Standard scores, age equivalents, normal-curve equivalents, percentiles, and stanines
  • Scores generated for
each subtest, an oral language composite, and a written language composite

OWLS
• Reliability
  • Sufficient internal and test-retest reliability for screening, but not for making important decisions about individual students
• Validity
  • Adequate criterion-related validity
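
Worked example: derived scores and score precision
Several tests in this presentation report standard scores, percentile ranks, normal curve equivalents (NCEs), and stanines, and several are described as reliable enough for screening but not for decisions about individual students. The sketch below illustrates how those pieces relate. It is not taken from any test manual: it assumes a normally distributed standard-score scale with mean 100 and SD 15, the function names `derived_scores` and `confidence_band` are hypothetical, and the reliability values are made up for illustration. Published tests convert raw scores using their own norm tables rather than these formulas.

```python
from math import floor, sqrt
from statistics import NormalDist

# Assumed scale for illustration only; check the manual of the test in use.
MEAN, SD = 100.0, 15.0


def derived_scores(standard_score: float) -> dict:
    """Approximate derived scores from a standard score under a normal model."""
    z = (standard_score - MEAN) / SD
    # Percentile rank: percentage of the norm group expected to score lower.
    percentile = 100.0 * NormalDist().cdf(z)
    # Normal curve equivalent: mean 50, SD 21.06.
    nce = 50.0 + 21.06 * z
    # Stanine: nine bands, each half an SD wide, centered on 5, clamped to 1..9.
    stanine = min(9, max(1, floor(2.0 * z + 5.5)))
    return {"z": z, "percentile": percentile, "nce": nce, "stanine": stanine}


def confidence_band(standard_score: float, reliability: float,
                    z_crit: float = 1.96) -> tuple:
    """Band around an obtained score based on the standard error of measurement."""
    # SEM = SD * sqrt(1 - reliability); lower reliability gives a wider band.
    sem = SD * sqrt(1.0 - reliability)
    return (standard_score - z_crit * sem, standard_score + z_crit * sem)


if __name__ == "__main__":
    print(derived_scores(115))
    # Same obtained score, two hypothetical reliability coefficients.
    print(confidence_band(100, 0.84))
    print(confidence_band(100, 0.96))
```

Under these assumptions, the band around a score of 100 is roughly twice as wide at a reliability of .84 as at .96, which is one way to see why an instrument judged adequate for screening may still be too imprecise for high-stakes decisions about an individual student.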