Wechsler Individual Achievement Test 1
Running head: WECHSLER INDIVIDUAL ACHIEVEMENT TEST
Review and Critique of Wechsler Individual Achievement Test
Kaitlin Schulz and Megan Saunders
Margaret G. Warner Graduate School of Education
University of Rochester
June 8, 2010

Review of Wechsler Individual Achievement Test

The Wechsler Individual Achievement Test, Second Edition (WIAT-II) is an achievement test designed for individuals between the ages of 4 and 85 years (Salvia, Ysseldyke, & Bolt, 2007b; Plake, Impara, & Spies, 2003; Doll, 2003). Like all achievement tests, the WIAT-II is designed to assess an individual's academic achievement rather than innate cognitive ability (Pierangelo & Giuliani, 2002). The WIAT-II is based on its predecessor, the WIAT, which was published in 1992 by The Psychological Corporation (Plake et al., 2003; Tindal & Nutter, 2003). Development of the WIAT-II began in 1996, four years after the WIAT was published (Doll, 2003; Tindal & Nutter, 2003). It is now the most commonly used achievement test for school-aged children (Mayes & Calhoun, 2008) and is also used to measure the academic achievement of adults (Doll, 2003). The WIAT-II is used to identify a discrepancy between a person's ability and achievement by comparing WIAT-II scores with scores on one of three Wechsler ability tests: the WPPSI-R, WISC-III, or WAIS-III (Salvia et al., 2007b). This comparison often serves as part of the differential diagnosis of students with disabilities (Tindal & Nutter, 2003). Essentially, the discrepancy is used to determine whether a student has a learning disability and whether the student qualifies for special education services (Plake et al., 2003; Pierangelo & Giuliani, 2002).
However, the WIAT-II is also intended for curriculum planning, such as the planning and evaluation of instruction (Pierangelo & Giuliani, 2002), and for the clinical appraisal of preschool children. Tindal and Nutter (2003) also note that WIAT-II results are meant to be used across a variety of settings and not merely for placing students in special education.

The theoretical basis for the WIAT-II comes from the beliefs of David Wechsler. Wechsler believed that intelligence is composed of many different abilities and is affected both by the ways in which these abilities are combined and by a person's own motivation. He also held that intelligence itself cannot be measured; only aspects of an individual's intellectual ability can be. On this notion, and on the foundations set by the Army Alpha and Army Beta exams, Wechsler created his intelligence tests. Wechsler believed intelligence was part of the larger construct of personality, and he developed his scale by focusing on the global nature of intelligence. He made no attempt to design subtests that would measure the basic units of intelligence (Sattler, 2008). This view, combined with the Cattell-Horn-Carroll theory of intelligence, which holds that intelligence has multiple dimensions such as fluid intelligence, crystallized intelligence, general memory and learning, broad visual perception, broad auditory perception, broad retrieval ability, broad cognitive speediness, and decision/reaction time or speed, has shaped the construction of intelligence tests (Salvia, Ysseldyke, & Bolt, 2007a). The WIAT and the WIAT-II reflect Wechsler's beliefs about intelligence and offer a way of comparing a person's achievement with his or her intelligence as measured by a Wechsler intelligence test.
Development of the WIAT-II drew on research conducted by Berninger, the National Reading Panel, and the National Council of Teachers of Mathematics (Tindal & Nutter, 2003). Although the WIAT-II builds on a theoretically grounded predecessor, Muenz, Ouchi, and Cole (1999) stress that the written expression subtest is not based on any validated theory. The WIAT-II is divided into four composites, which are further divided into eight subtests. The four composites are Reading, Written Language, Mathematics, and Oral Language. Reading comprises the word/pseudoword decoding and reading comprehension subtests. Mathematics comprises the math reasoning and numerical operations subtests. Oral expression and listening comprehension make up the Oral Language composite, and Written Language comprises the spelling and written expression subtests (Plake et al., 2003; Salvia et al., 2007b; Pierangelo & Giuliani, 2002).

The standardization of the WIAT-II began in 1997 with a pilot test of 400 students, followed by a larger pilot of 1,900 students. An item analysis of these two pilots was used to create the final version of the WIAT-II (Doll, 2003; Tindal & Nutter, 2003). The norming process tested 3,600 students for the grade-based sample and 2,950 students for the age-based sample (Salvia et al., 2007b; Doll, 2003). The sample reflected 1998 U.S. Census data (Doll, 2003; Salvia et al., 2007b; Tindal & Nutter, 2003). Students who receive special education were included in the sample (Tindal & Nutter, 2003). However, the sample excluded individuals who do not speak English, those with neurological disorders, and individuals taking medication that may affect their performance (Doll, 2003). Salvia, Ysseldyke, and Bolt (2007b) describe the standardization sample as “adequate.”
The sample was stratified by gender, race/ethnicity, geographic location, and parent education level (Doll, 2003; Salvia et al., 2007b; Tindal & Nutter, 2003). However, it was not stratified by socioeconomic status (Salvia et al., 2007b). Qualified, trained examiners administered the tests during norming to ensure the accuracy of the norms (Tindal & Nutter, 2003). Additionally, 1,069 individuals took both the WIAT-II and the corresponding age-appropriate Wechsler intelligence test, providing linked norms for comparing ability and achievement scores when measuring discrepancies (Salvia et al., 2007b; Tindal & Nutter, 2003). The start, reversal, and discontinuation rules of the WIAT-II were developed during the norming process so that items align with age and grade expectations (Tindal & Nutter, 2003).

Evidence for the reliability of the WIAT-II has been provided for internal consistency, stability, and interrater reliability. There is no evidence of alternate-form reliability because the WIAT-II has only one form. Internal consistency was estimated with split-half reliability coefficients. Reviewers report somewhat different coefficients, but most find the test internally consistent. The split-half analysis was conducted by examinee age and grade. For most subtests the reliability coefficient was greater than .80, but numerical operations, written expression, listening comprehension, and oral expression fall below .80 for some ages and grades (Salvia et al., 2007b; Tindal & Nutter, 2003).
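Split-half reliability, as described above, correlates examinees' totals on two halves of a test and then corrects the result upward, because each half is only half as long as the full test. The following is a minimal Python sketch under stated assumptions, not the publisher's procedure: it uses an odd-even item split and the Spearman-Brown formula, r_full = 2r_half / (1 + r_half), and the response matrix is invented for illustration (the sources cited here do not describe how the WIAT-II items were split).

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length lists (undefined if either list is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation between two half-tests."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses):
    """Odd-even split: each row is one examinee's item scores (0/1 or 0/1/2)."""
    odd_totals = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd_totals, even_totals))


# Invented responses: 5 examinees x 6 dichotomously scored items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]

print(round(split_half_reliability(responses), 2))  # 0.87 for this invented data
```

Here the half-test correlation is about .77, and the Spearman-Brown correction raises it to about .87. Because the correction always raises a positive half-test correlation, a reported split-half coefficient describes the full-length subtest rather than either half.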
Doll (2003) reports internal consistency coefficients above .85 for all subtests except written expression and listening comprehension in the school-age sample and oral expression in the college student and adult sample; these three subtests still had coefficients above .70. The reliability of all four composite scores was greater than .80 (Salvia et al., 2007b; Tindal & Nutter, 2003).

Interrater reliability is low for written expression; however, 5 of the 7 elements of written expression were found to be both reliable and valid, and the remaining two were found to be valid but of limited reliability. It is thought that the general scoring system of the WIAT-II improves its interrater reliability and enhances the test's ability to discriminate between good and poor writers (Muenz et al., 1999). Interrater reliability coefficients calculated from 2,180 examinee responses range from .94-.98 for reading comprehension (a dichotomously scored subtest), .91-.99 for oral expression, and .1-.94 for written expression (Salvia et al., 2007b; Tindal & Nutter, 2003).

Stability reliability was estimated with a test-retest design: examinees took the test and then took the same test again an average of 10 days later. A total of 297 students aged 6 to 19 years participated, and all subtest stability coefficients were above .80 (Salvia et al., 2007b). Additionally, Doll (2003) reports stability coefficients above .90 for all composites for school-aged children and .75-.85 for all composite scores of college-aged and adult examinees. In addition to improving interrater reliability, the general scoring system of the WIAT-II also improves the content validity of the assessment (Muenz et al., 1999).
The WIAT-II is considered valid based on the correlations between its subtests and other achievement tests, as well as on expert analysis of how well its items correspond to curriculum objectives (Salvia et al., 2007b; Tindal & Nutter, 2003). A draft of the test was also compared with national and state standards to support the content validity of the exam (Doll, 2003). However, the examiner's manual provides little evidence of the test's validity, and it is unclear whether WIAT-II results will provide more usable information about students' abilities (Doll, 2003). Scores are more valid for examinees in the middle of the age range than for those at either extreme (Pierangelo & Giuliani, 2002). According to Salvia et al. (2007b), the WIAT-II is both reliable and valid.

To administer the WIAT-II, one must purchase the testing kit for $321. The kit includes stimulus book 1, stimulus book 2, 25 record forms, 25 response booklets, an examiner's manual, a scoring and normative supplement for grades PreK-12, a scoring and normative supplement for college students and adults, word cards, audiotapes, and a bag. In addition to this kit, one would also need to purchase scoring software compatible with one's computer. Additional forms and replacement items can also be purchased (Plake et al., 2003). Testing time depends on the examinee, ranging from 45 minutes for preschoolers to two hours for high school students, college students, and adults (Salvia et al., 2007b; Plake et al., 2003; Tindal & Nutter, 2003). The examiner must be trained and qualified in the administration of individual assessment tools and must be involved in educational or psychological testing (Tindal & Nutter, 2003). The test is administered to one student or adult at a time and is not to be given in group settings (Plake et al., 2003).
Additionally, the subtests must be administered in the prescribed order even if only a few are given (Tindal & Nutter, 2003). Start points are given in the test manual and stimulus booklets and are based on the examinee's age or grade (Doll, 2003). A reverse rule applies if the examinee answers any of the first three items incorrectly. Discontinue rules vary by subtest but generally apply when the examinee answers six or seven consecutive items incorrectly (Tindal & Nutter, 2003). In addition, the Reading Comprehension and Written Expression subtests use stop points to mark the end of the subtest. Modifications and allowable accommodations are not spelled out in the test manual and are therefore left to the discretion of the examiner (Tindal & Nutter, 2003).

Scoring is partly completed during administration. Each item is scored either dichotomously (0 or 1) or with partial credit (0, 1, or 2). Examiners are encouraged to record qualitative observations throughout testing to aid in evaluating the student (Tindal & Nutter, 2003). After administration, raw scores are converted into derived scores. Scores are reported for each subtest and combined into four composite scores (mathematics, oral language, written language, and reading), and a total composite score is also provided (Plake et al., 2003; Tindal & Nutter, 2003). An individual's score can be expressed in eight forms: standard score, percentile rank, age equivalent, grade equivalent, normal-curve equivalent, stanine, quartile, and decile (Doll, 2003; Salvia et al., 2007b; Tindal & Nutter, 2003). The report also includes space to record the ability-achievement discrepancy and to plot the student's results on a bell curve (Tindal & Nutter, 2003).
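Several of the derived score types listed above re-express one position in a score distribution. The Python sketch below illustrates those relationships under two assumptions that are not taken from the sources cited here: scores follow a normal distribution, and standard scores use the common Wechsler metric of mean 100 and SD 15. The WIAT-II derives its reported scores from its norm tables, so published values may differ from these approximations, and age and grade equivalents cannot be computed this way at all. The quartile and decile cutoffs follow one common convention, and the function name derived_scores is hypothetical.

```python
from math import ceil
from statistics import NormalDist

MEAN, SD = 100.0, 15.0  # assumed standard-score metric (mean 100, SD 15)


def derived_scores(standard_score):
    """Normal-curve approximations of several derived score types for one standard score."""
    z = (standard_score - MEAN) / SD
    percent_below = NormalDist().cdf(z) * 100
    # Percentile ranks are conventionally reported as whole numbers from 1 to 99.
    pr = min(99, max(1, round(percent_below)))
    # Stanines are half-SD bands; stanine 5 covers z from -0.25 up to +0.25.
    stanine = min(9, max(1, int((z + 2.25) // 0.5) + 1))
    return {
        "z": z,
        "percentile_rank": pr,
        "nce": 50 + 21.06 * z,  # normal-curve equivalent: mean 50, SD 21.06
        "stanine": stanine,
        "quartile": min(4, ceil(pr / 25)),  # PR 1-25 -> 1, 26-50 -> 2, 51-75 -> 3, 76-99 -> 4
        "decile": min(10, ceil(pr / 10)),   # PR 1-10 -> 1, ..., 91-99 -> 10
    }


for score in (70, 100, 115):
    print(score, derived_scores(score))
```

For example, a standard score of 115 is one SD above the mean, which under the normal approximation corresponds to roughly the 84th percentile, an NCE of about 71, and stanine 7.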
However, only professionals trained in testing are qualified to translate the results into educational decisions (Doll, 2003). This is partly because the test does not link assessment scores to future instruction (Tindal & Nutter, 2003).

Sattler (2008) suggests that the WISC-III, the intelligence-test counterpart of the WIAT-II, is consistent across cultures and that scores were remarkably similar across ethnicities. However, Pierangelo and Giuliani (2002) claim that some cultural bias may be present on some subtests, although they do not provide evidence of bias in any specific subtest. Lastly, Tindal and Nutter (2003) state that “conventional and item response theory analyses are presented to document item consistency and to eliminate poorly constructed items, determine correct item order as well as to prevent bias” (p. 1001). No other evidence or claims regarding item or test bias in the WIAT-II were found.

Critique of Wechsler Individual Achievement Test

The development of intelligence tests in the late 1800s led to analytical theories of intelligence. These theories, in turn, have furthered the development of newer intelligence assessments that try to reflect the theorists' models (Sattler, 2008). The Wechsler Individual Achievement Test, Second Edition (WIAT-II) aims to assess the “seven areas of learning disability specified in Public Law 94-142,” along with spelling and pseudoword decoding. The WIAT-II is a newer test whose scores can be correlated with intelligence tests such as the Wechsler Intelligence Scale for Children – Fourth Edition (WISC-IV) (Mayes & Calhoun, 2008). The purpose of this test is to assess an individual's achievement in areas such as reading, spelling, and arithmetic. Ability can be measured in multiple ways, one of which is a person's capability to perform a specific function.
To standardize the test, questions have to be generalized. This corresponds to Spearman's theory of general and specific factors of intelligence (Sattler, 2008). The Listening Comprehension and Oral Expression subtests fit real-world contexts, which allows for more accurate measurement of ability (WIAT-II). The test is designed to be administered to a wide range of individuals, which makes it applicable across many ages and grades. Scoring allows both for standardized scores and for a composite evaluation of eight scores that reflect individual contexts and can be analyzed and assessed (Salvia et al., 2007b). “This supplemental material, which includes a variety of additional subtests that allow for the comparison of student performance across a variety of conditions, is intended to facilitate the identification of specific processing deficits” (Salvia et al., 2007a, p. 291).

The test itself is used to identify differences between a person's ability and achievement. To be classified as having a learning disability, a student must show ability that is high relative to his or her achievement. Using the discrepancy model to label students means that only students achieving below their ability receive extra services. Because of this, the test fails to help students who have both low ability and low achievement, since the model concludes that they are performing to their potential. The same standards should be set for all students, with high expectations for each of them, while recognizing that modifications or accommodations may be required to reach those standards.
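The discrepancy model criticized above reduces to comparing two standard scores. The Python sketch below is a hypothetical illustration rather than the WIAT-II's procedure, which relies on its linking sample and published tables. It assumes both tests use standard scores with the same mean (100) and the same SD, and the correlation of 0.6 and the 15-point cutoff are invented values chosen only to show how the simple-difference and predicted-achievement (regression) methods can disagree.

```python
MEAN = 100.0  # both scores assumed to be standard scores with mean 100 and equal SDs


def simple_difference(ability, achievement):
    """Ability minus achievement, in standard-score points."""
    return ability - achievement


def predicted_achievement(ability, r):
    """Expected achievement given ability; regresses toward the mean when r < 1."""
    return MEAN + r * (ability - MEAN)


def predicted_difference(ability, achievement, r):
    """Predicted achievement minus observed achievement."""
    return predicted_achievement(ability, r) - achievement


def flags_discrepancy(gap, cutoff=15.0):
    """Hypothetical decision rule: flag when the gap meets or exceeds the cutoff."""
    return gap >= cutoff


ability, achievement, r = 115.0, 95.0, 0.6  # illustrative values only
simple_gap = simple_difference(ability, achievement)
predicted_gap = predicted_difference(ability, achievement, r)
print(simple_gap, flags_discrepancy(simple_gap))        # 20.0 True
print(predicted_gap, flags_discrepancy(predicted_gap))  # 14.0 False
```

Under these invented numbers, the same student would be flagged under the simple difference but not under the regression-based comparison, so the choice of method alone can change who qualifies. A student whose ability and achievement are equally low produces no gap at all under the simple-difference method, which is the situation the critique describes.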
Because this test assesses skills that students learn both in the classroom and in real-world contexts, can be given to a wide range of students and adults, and yields standardized scores that reflect these different contexts, its educational usefulness is high. It not only provides information about the student but can also be used to reflect on the educational preparedness and practices of a school. This points to a problem in the assessment. If students are not provided with the contexts and skills the assessment deems necessary, several things must be addressed. First, schools and families need to provide more open, supportive, and critical contexts that develop these skills. Second, students cannot be held accountable for any lack of development, but corrections must be made to give each student the strongest possible set of capabilities. Finally, the skills the assessment deems necessary should be continually reviewed in light of present-day societal, community, and historical contexts.

Assessing intelligence involves many different behaviors, yet these behaviors are identified as related to one another, which is how the idea of “general intelligence” evolved. Intelligence tests such as the Wechsler Intelligence Scale for Children (WISC) measure general intelligence by presenting tasks that require different behaviors and skills (Salvia et al., 2007a). Are these tests truly predictive of a person's intelligence in different situations or environments? Are simple tasks on a test approached differently than similar behaviors applied to a real-life situation? Who decides whether there is such a thing as general intelligence?
Certainly, there are different types of intelligence, different styles and ways of thinking, and different applications of these intelligences. Some subtests require the examiner to discontinue after the examinee answers a certain number of items incorrectly. This assumes that the student cannot answer the following items, which may require a higher level of thinking or more specific content. However, is this assumption fair when some items might be invalid, poorly worded, or about content unfamiliar to the student, even though the student might have been able to answer later items in the section? Also, it is very difficult to track time throughout administration, including how long a student takes to answer a single item, while also scoring responses. Timing alone could introduce variability across raters and administrations, and it may cause anxiety in the test taker.

The WIAT-II is claimed to be both reliable and valid based on its construction and on the performance of children and adults. However, part of the reliability evidence rests on agreement among raters who were trained to score the test. What ensures that raters are accurate and fully consistent with one another? There is always room for disagreement, and at times it may be significant. Having multiple examiners and raters, rather than just one, will always create potential for unreliability. This variable is impossible to control, because a single rater cannot score the millions of WIATs conducted each year. The test is also deemed valid, but largely because its results agree with those of other achievement tests (Salvia et al., 2007b). What, then, makes those achievement tests valid?
Are they being compared with other tests too? Experts judged how well the questions correspond to curriculum objectives. What makes these people experts? Is the curriculum from which the questions were designed reliable in and of itself? Who determines what a person is supposed to know by a specific age? Although the test appears to be both reliable and valid, more complex considerations behind its creation, such as what a person is expected to know and how questions on those topics are constructed and presented, could render it invalid.

Many variables could affect the instrument and its intent. The examiner must be unbiased and indifferent to the student's performance, because any encouragement or disappointment could affect the student's responses or thought processes. In the listening comprehension subtest, variation in the examiner's pronunciation or volume could also affect performance. Whether the testing environment is familiar, and whether the examiner is easy to hear, could change how the student performs. The examinee's age and experiences can also affect the outcome, depending on how a question is presented (Sax, 2005). For example, Sax (2005) argues that boys and girls, being biologically different, learn in different ways and generally perform better when material or questions are differentiated to suit these preferences. According to Sax (2005), boys like material and questions to be succinct and to the point, and they like to start working on a problem immediately.
Girls, by contrast, like to hear a story behind a problem, make a connection to it, and approach it from different angles, and therefore take more time to finish it (Sax, 2005). Neither approach is right or wrong, but Sax presents these as general differences in the ways boys and girls learn and express their understanding. Because the WIAT-II has only one form and is not written differently for boys and girls, some bias could be built into the test and lead to discrepancies in scores. Although this effect is not certain, it is still a possible variable that can affect outcomes.

Language is another variable that could influence achievement on the test. A person might know and understand the content or concepts presented but be unable to decode the sentence structure or level of language used. Math reasoning questions, for example, could be worded in language unfamiliar to the student. A student may have a strong understanding of the math concepts but be confused by the structure or language of a question, possibly leading to a wrong answer. In that case the question would not measure what it is intended to measure, which undermines the test's validity. Cultural differences can also affect the instrument if items favor one religion, race, culture, or ethnicity over another. Certain beliefs or skills are stressed more in some cultures than in others, which could inflate or lower scores and widen the gap between two students of the same age. Students may also have test anxiety, which can affect their performance no matter how well they understand and can apply the relevant behaviors or skills. Because some sections are timed, students for whom time pressure is an issue may not perform to their potential simply because of anxiety rather than a learning disability.
Having reviewed the WIAT-II, we conclude that the assessment has both strengths and limitations. It includes small samples of very specific questions related to each subtest's topic, which are supposed to determine general ability in those thinking skills. We believe the test could include more questions for each subtest in order to determine achievement in that area more accurately. The test does a good job of measuring what it intends to measure (specific thinking in particular subject categories) and is indicative of students' performance in those areas (Salvia et al., 2007b). It can narrowly determine a student's academic achievement in those categories; however, although the subtests are closely related to material presented in schools, they are not indicative of real “intelligence” or of critical thinking skills that are not often taught or practiced in traditional schooling. The test measures what it intends to measure, but is what it measures truly the definition of “intelligence”? We believe the test could offer more opportunities for critical thinking and interdisciplinary problem solving that extend intelligence to real-life situations, beyond pattern recognition and identifying relationships. Other limitations include the variables analyzed earlier and how they can affect a student's performance and, ultimately, scores on this achievement test. Through this activity, we learned that the assessment process, although highly structured, leaving little room for examiner error and requiring little preparation, still has its variables and limitations.
This achievement test is very organized, requiring exact timing and close adherence to the prescribed presentation, yet it can still be invalid or unreliable as a whole, depending on the tests it is compared with and on who created the questions to align with particular ages and curricula. Every achievement test has its own history and reasons for including certain questions, but to this day there is a lack of consideration of differences between cultures, languages, and races, and of the variables present during the actual test administration.

References

Doll, B. (2003). [Review of the Wechsler Individual Achievement Test, Second Edition]. In B. Plake, J. Impara, & R. Spies (Eds.), The fifteenth mental measurements yearbook. Lincoln, NE: University of Nebraska Press.

Mayes, S., & Calhoun, S. (2008). WISC-IV and WIAT-II profiles in children with high-functioning autism. Journal of Autism and Developmental Disorders, 38(3), 428-439. Retrieved from Education Full Text database.

Muenz, T., Ouchi, B., & Cole, J. (1999). Item analysis of written expression scoring systems from the PIAT-R and WIAT. Psychology in the Schools, 36(1), 31-40. Retrieved from Education Full Text database.

Pierangelo, R., & Giuliani, G. A. (2002). Chapter 8: Assessment of academic achievement. In R. Pierangelo & G. A. Giuliani, Assessment in special education (pp. 116-157). Boston: Allyn and Bacon.

Plake, B., Impara, J., & Spies, R. (Eds.). (2003). The fifteenth mental measurements yearbook. Lincoln, NE: University of Nebraska Press.

Salvia, J., Ysseldyke, J., & Bolt, S. (2007a). Chapter 16: Assessment of intelligence: An overview. In Assessment: In special and inclusive education (10th ed.). New York, NY: Houghton Mifflin Company.

Salvia, J., Ysseldyke, J., & Bolt, S. (2007b). Chapter 21: Assessment of academic achievement with multiple-skill devices. In Assessment: In special and inclusive education (10th ed.). New York, NY: Houghton Mifflin Company.

Sattler, J. (2008). Chapter 7: Historical survey and theories of intelligence. In Assessment of children: Cognitive foundations (5th ed.). San Diego, CA: Jerome M. Sattler, Publisher, Inc.

Sax, L. (2005). Why gender matters: What parents and teachers need to know about the emerging science of sex differences. New York: Doubleday.

Tindal, G., & Nutter, M. (2003). [Review of the Wechsler Individual Achievement Test, Second Edition]. In B. Plake, J. Impara, & R. Spies (Eds.), The fifteenth mental measurements yearbook. Lincoln, NE: University of Nebraska Press.