Wechsler Individual Achievement Test 1
Running head: WECHSLER INDIVIDUAL ACHIEVEMENT TEST
Review and Critique of Wechsler Individual Achievement Test
Kaitlin Schulz and Megan Saunders
Margaret G. Warner Graduate School of Education
University of Rochester
June 8, 2010
Review of Wechsler Individual Achievement Test
The Wechsler Individual Achievement Test, Second Edition (WIAT-II) is an
achievement test designed to be given to individuals between the ages of 4 and
85 years (Salvia, Ysseldyke, & Bolt, 2007b; Plak, Impara, & Spies, 2003; Doll, 2003). As with
all achievement tests, the WIAT-II is designed to assess the academic achievement of an
individual as opposed to measuring the student’s innate cognitive ability (Pierangelo & Giuliani,
2002). The WIAT-II is based on its predecessor, the WIAT, which was published in 1992 by The
Psychological Corporation (Plak et al., 2003; Tindal & Nutter, 2003). Work on the WIAT-II
began in 1996, four years after the WIAT was established as an individual
achievement test (Doll, 2003; Tindal & Nutter, 2003). It is now the most commonly used
achievement test for school-aged children (Mayes & Calhoun, 2008) and also serves as an
assessment tool for measuring the academic achievement of adults (Doll, 2003).
The WIAT-II is used to identify a discrepancy between a child’s ability and
achievement by comparing the child’s scores on the WIAT-II with scores on one of three Wechsler
ability tests: WPPSI-R, WISC-III, or WAIS-III (Salvia et al., 2007b). This method of testing
very often serves as a differential diagnosis for students with disabilities (Tindal & Nutter, 2003).
Essentially, the discrepancy or differential is used to determine whether or not a student has a
learning disability and whether or not the student qualifies for special education services (Plak et
al., 2003; Pierangelo & Giuliani, 2002). However, the WIAT-II is also intended to be used for
curriculum planning, such as the planning and evaluation of instruction (Pierangelo & Giuliani,
2002), and clinical appraisal for preschool children. Tindal and Nutter (2003) also note that the
results of the WIAT-II are to be used across a variety of settings and not merely for the purpose
of placing students in special education.
The theoretical basis for the WIAT-II comes from the beliefs of David Wechsler.
Wechsler believed that intelligence is composed of many different types of intelligence and is
affected by both the ways in which these intelligences are combined as well as by one’s own
motivation. He also felt that one could not measure one’s intelligence but rather could only
measure aspects of an individual’s intellectual ability. On this notion and on the foundations set
by the Army Alpha and Army Beta exams, David Wechsler created his intelligence tests.
Wechsler believed intelligence was part of the larger construct of personality and developed his
scale by focusing on the global nature of intelligence. Wechsler made no attempt to design
subtests that would measure the basic units of intelligence (Sattler, 2008). To create his
intelligence tests, Wechsler combined this theory with the Cattell-Horn-Carroll theory of
intelligence, which states that there are multiple dimensions to intelligence, such as fluid
intelligence, crystallized intelligence, general memory and learning, broad visual perception,
broad auditory perception, broad retrieval ability, broad cognitive speediness, and decision/reaction time/speed
(Salvia, Ysseldyke, & Bolt, 2007a). The WIAT and the WIAT-II are reflections of Wechsler’s
beliefs on intelligence and offer a way of measuring one’s achievement in comparison to their
intelligence as measured by a Wechsler Intelligence exam. Development of the WIAT-II
includes research conducted by Berninger, the National Reading Panel, and the National Council
of Teachers of Mathematics (Tindal & Nutter, 2003). While the WIAT-II is based on a test that
is based on theoretical findings, Muenz, Ouchi, and Cole (1999) stress that the written expression
subtest is not based on any validated theory.
The WIAT-II is divided into four composites on which an individual is scored and then
further divided into eight subtests. The four composites are Reading, Written Language,
Mathematics, and Oral Language. Reading is divided into the subtests of word/pseudoword
decoding and reading comprehension. Mathematics is composed of the subtests math reasoning
and numerical operations. Oral expression and listening comprehension make up the Oral
Language composite and Written Language is composed of the spelling and written expression
subtests (Plak et al., 2003; Salvia et al., 2007b; Pierangelo & Giuliani, 2002).
The standardization of the WIAT-II began in 1997 with a pilot test that assessed 400
students and was followed by a larger pilot that sampled 1,900 students. An item analysis of
these two pilot tests was used to create the final version of the WIAT-II (Doll, 2003; Tindal &
Nutter, 2003). The normalization process tested 3,600 students for the grade-based sample and
2,950 students for the age-based sample (Salvia et al., 2007b; Doll, 2003). The sample of
individuals tested reflected the 1998 US Census data (Doll, 2003; Salvia et al., 2007b; Tindal &
Nutter, 2003). Students who receive special education were included in the sample (Tindal &
Nutter, 2003). However, the sample excluded individuals who do not speak English, those with
neurological disorders, and individuals on medication that may affect their performance (Doll,
2003). Salvia, Ysseldyke, and Bolt (2007b) describe the standardization sample as “adequate.”
The sample was stratified by gender, race/ethnicity, geographic location, and parent educational
level (Doll, 2003; Salvia et al., 2007b; Tindal & Nutter, 2003). However, the sample was not
stratified by the socioeconomic status of individuals (Salvia et al., 2007b).
Qualified and trained examiners administered the normalization tests
to ensure validity in the tests’ normalization (Tindal & Nutter, 2003). Additionally, 1,069
individuals took both the WIAT-II and the corresponding, age-appropriate Wechsler Intelligence
test in order to create norms linking ability and achievement test scores for
measuring discrepancies (Salvia et al., 2007b; Tindal & Nutter, 2003). The start, reversal, and
discontinuation rules of the WIAT-II were developed during the normative process to adequately
align the questions with age/grade standards (Tindal & Nutter, 2003).
Evidence for the reliability of the Wechsler Individual Achievement Test-II has been
provided for internal consistency, stability, and interrater reliability. No
evidence of alternate-form reliability is present because there is only one form
of the WIAT-II.
The internal consistency reliability of the WIAT-II was determined by split-half
reliability coefficients. There is some discrepancy in the reliability coefficients provided by
reviewers of the test. However, most reviewers find the test to be internally consistent. The
split-half reliability analysis was conducted based on examinees’ age and grade. For most subtest
scores the reliability coefficient was greater than .80, but numerical operations, written
expression, listening comprehension, and oral expression fall below .80 for some ages and grades
(Salvia et al., 2007b; Tindal & Nutter, 2003). Doll (2003) reports the internal consistency of
subtests as above .85 except for written expression and listening comprehension in the school-aged
sample and oral expression in the college student and adult sample. These three subtests
maintained a reliability coefficient greater than .70. The reliability for all four composite scores
was greater than .80 (Salvia et al., 2007b; Tindal & Nutter, 2003).
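As a rough illustration of how a split-half coefficient can be obtained, the sketch below splits each examinee's item scores into odd and even halves, correlates the two half-test totals, and applies the Spearman-Brown correction to estimate the reliability of the full-length test. The item data, the function names, the odd-even split, and the use of the Spearman-Brown correction are all assumptions made for illustration; the reviewers cited above do not describe the publisher's exact procedure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one row per examinee, one 0/1 item score per column."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd_totals, even_totals))


# Hypothetical responses from five examinees on a six-item subtest.
ITEM_SCORES = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print(f"Split-half reliability: {split_half_reliability(ITEM_SCORES):.2f}")
```

A coefficient computed this way would then be compared against benchmarks such as the .80 level discussed above.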
Interrater reliability indicates a low coefficient in written expression; however, 5 of the 7
elements of written expression were found to be both reliable and valid, and two of the elements
were found to be valid with limited reliability. It is thought that the general scoring system of the
WIAT-II improves its interrater reliability and enhances the test’s ability to discriminate between
good and poor writers (Muenz et al., 1999). The interrater reliability coefficient calculated from
2,180 examinee responses ranges from .94-.98 for reading comprehension (a dichotomously
scored item), .91-.99 for oral expression, and .1-.94 for written expression (Salvia et al., 2007b;
Tindal & Nutter, 2003).
Stability reliability was determined through a test-retest procedure.
Examinees were administered the test and then administered the same test again an average of 10
days later. A total of 297 students aged 6 to 19 years participated in the study, and all subtest
stability reliabilities are above .80 (Salvia et al., 2007b). Additionally, Doll (2003) reports the
stability coefficient for school-aged children to be >.90 for all composites and .75-.85 for all
composite scores of college-aged and adult examinees.
In addition to supporting interrater reliability, the general scoring system of the WIAT-II
also improves the content validity of the assessment (Muenz et al., 1999).
The WIAT-II is considered valid based on the correlation between its subtests and other
achievement tests as well as the WIAT-II’s correlation with curriculum objectives as analyzed by
experts in the field (Salvia et al., 2007b; Tindal & Nutter, 2003). A draft of the test was also
compared to national and state standards to account for the content validity of the exam (Doll,
2003). However, the examiner’s manual for the WIAT-II provides little evidence of the test’s
validity, and it is unclear whether the results of the WIAT-II will provide more usable
information about students’ abilities (Doll, 2003). Scores have been found to be more valid at the
middle ages than at either extreme of the test’s age spectrum (Pierangelo & Giuliani, 2002).
According to Salvia et al. (2007b), the WIAT-II is both reliable and valid.
In order to administer the WIAT-II, one must purchase the testing kit for $321. This kit
includes: stimulus book 1, stimulus book 2, 25 record forms, 25 response booklets, an
examiner’s manual, scoring and normative supplement for grades PreK-12, scoring and
normative supplement for college students and adults, word cards, audiotapes, and a bag. In
addition to this kit, one would also need to purchase the scoring software compatible with their
computer. Additional forms and replacement items can also be purchased for a fee (Plak et al.,
2003). The length of the test depends upon the examinee, ranging from 45 minutes for
preschoolers to two hours for high school/college students and adults (Salvia et al., 2007b; Plak
et al., 2003; Tindal & Nutter, 2003).
The person administering the WIAT-II must be trained and qualified in the
administration of individual assessment tools and must be involved in educational or
psychological testing (Tindal & Nutter, 2003). The test is to be administered to an individual
student or adult and is not to be given in group settings (Plak et al., 2003). Additionally, the
subtests must be administered in the prescribed order even if only a select few subtests are given
(Tindal & Nutter, 2003). Start points are given in the test manual and stimulus booklets and are
based upon the examinee’s age or grade (Doll, 2003). There is a reverse rule that is to be used if
the examinee answers one of the first three questions wrong. Discontinue rules vary for each
subtest, but generally apply when the examinee incorrectly answers six or seven questions in a
row (Tindal & Nutter, 2003). Additionally, the Reading Comprehension and Written Expression
subtests use Stop points to indicate the end of that portion. Modifications and allowable
accommodations are not spelled out in the test manual and therefore fall to the discretion of the
person administering the test (Tindal & Nutter, 2003).
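The discontinue rule described above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function name, the response data, and the default threshold of six consecutive errors are hypothetical, and the actual rule differs by subtest as set out in the test manual.

```python
def items_administered(responses, discontinue_after=6):
    """Return how many items are given before the discontinue rule ends a subtest.

    responses: item scores in administration order (0 = incorrect).
    discontinue_after: number of consecutive incorrect answers that ends the
        subtest (assumed value; the reviewers report six or seven, by subtest).
    """
    consecutive_wrong = 0
    for count, score in enumerate(responses, start=1):
        # Any credited answer resets the run of consecutive errors.
        consecutive_wrong = consecutive_wrong + 1 if score == 0 else 0
        if consecutive_wrong == discontinue_after:
            return count
    return len(responses)


# Hypothetical examinee: six straight errors on items 5-10 end the subtest.
RESPONSES = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1]

if __name__ == "__main__":
    print(items_administered(RESPONSES))
```

In this hypothetical record, the last two items would have been answered correctly, but the examinee never sees them because the subtest has already been discontinued.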
Scoring of the WIAT-II is partially completed during its administration.
Scores on each individual test item are either dichotomous (0 or 1) or given partial credit (0, 1, or
2). The test encourages test administrators to make qualitative recordings throughout the testing
process to aid in the evaluation of the student (Tindal & Nutter, 2003). After the completion
of the test administration, raw test scores are translated into reported scores. Scores are reported
for each subtest and compiled into four composite scores (mathematics, oral language, written
language, and reading); a total composite score is also provided (Plak et al., 2003; Tindal &
Nutter, 2003). An individual’s score on the WIAT-II can be presented in eight different forms:
standard, percentile rank, age equivalent, grade equivalent, normal-curve equivalent, stanine,
quartile, and decile (Doll, 2003; Salvia et al., 2007b; Tindal & Nutter, 2003). The report also
includes a place to report on the ability-achievement discrepancy and plot the student’s results on
a bell curve (Tindal & Nutter, 2003). However, only professionals trained in giving tests are
qualified to translate its results into educational decisions (Doll, 2003). This is in part because
the test does not offer any link between the assessment score and future instruction
(Tindal & Nutter, 2003).
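Several of these score forms are related to one another through the normal curve, which the following sketch tries to make concrete. It assumes a standard-score scale with a mean of 100 and a standard deviation of 15 and a perfectly normal distribution; in practice, examiners look up derived scores in the publisher's norm tables rather than computing them, so the function and constants here are illustrative only.

```python
from bisect import bisect_right
from statistics import NormalDist

MEAN, SD = 100, 15  # assumed standard-score scale
# Cumulative percentage of cases at the upper edge of stanines 1 through 8.
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]


def derived_scores(standard_score):
    """Convert a standard score to other score forms, assuming normality."""
    z = (standard_score - MEAN) / SD
    percentile = 100 * NormalDist().cdf(z)
    return {
        "percentile_rank": round(percentile, 1),
        # Normal-curve equivalents have a mean of 50 and an SD of 21.06.
        "nce": round(50 + 21.06 * z, 1),
        "stanine": bisect_right(STANINE_CUTS, percentile) + 1,
    }


if __name__ == "__main__":
    for score in (85, 100, 115):
        print(score, derived_scores(score))
```

Age and grade equivalents are left out of the sketch because they cannot be derived from the normal curve; they depend entirely on the norming data.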
Sattler (2008) suggests that the WISC-III, the intelligence test cousin of the WIAT-II, is
constant across cultures and that scores were remarkably similar across ethnicities. However,
Pierangelo and Giuliani (2002) claim that some cultural bias may be present on some subtests.
They do not expand on this notion to provide evidence of cultural bias in any specific subtest.
Lastly, Tindal and Nutter (2003) state that “conventional and item response theory analyses are
presented to document item consistency and to eliminate poorly constructed items, determine
correct item order as well as to prevent bias” (p. 1001). No other evidence or claims of item or
test bias on the WIAT-II were found.
Critique of Wechsler Individual Achievement Test
The development of intelligence tests in the late 1800s has led to analytical theories of
intelligence. These analytical theories of intelligence have furthered the development of newer
intelligence assessments that try to reflect the models presented by the theorists (Sattler, 2008).
The Wechsler Individual Achievement Test, Second Edition (WIAT-II) aims to assess “seven
areas of learning disability specified in Public Law 94-142” along with spelling and pseudoword decoding.
The WIAT-II is a new form of test that can be correlated with intelligence tests, such as the
Wechsler Intelligence Scale for Children – Fourth Edition (WISC-IV) (Mayes & Calhoun, 2008).
The purpose of this test is to assess the abilities of an individual to meet the necessary
achievements for reading, spelling, and arithmetic.
Ability can be scored in multiple ways, and one of those ways is through a person’s
capability to achieve a specific function. In order to standardize the test, questions have to be
generalized. This correlates to Spearman’s theory of general and specific factors for intelligence
(Sattler, 2008). The Listening Comprehension and Oral Expression components of the
assessment fit real-world contexts that allow for more accurate ability measurements (WIAT-II).
The test is designed to be administered to a wide range of individuals, which makes it
applicable across many different ages and grades. The scoring of the test allows both
for standardization of scores as well as for the composite evaluation of eight different scores that
reflect individualized contexts, which can be analyzed and assessed (Salvia et al., 2007b). “This
supplemental material, which includes a variety of additional subtests that allow for the
comparison of student performance across a variety of conditions, is intended to facilitate the
identification of specific processing deficits” (Salvia et al., 2007a, p. 291).
The test itself is used to identify the differences between one’s ability and one’s
achievement. For a student to be classified with a learning disability, the student’s ability would be high and
achievement would be lower. The use of the discrepancy model for labeling students means
that only those students who are achieving lower than their ability receive
extra services. Because of this, this achievement test fails to aid those students who have both
low ability and low achievement, since the model concludes that they are performing to their potential. The
same standards need to be set for all students, with high expectations for each one of them and
an understanding that modifications or accommodations may be required to reach those
standards.
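A small sketch may help show why the discrepancy model overlooks students whose ability and achievement are both low. It uses a simple-difference rule with an assumed cutoff of 15 standard-score points; the cutoff, the function name, and the two student profiles are hypothetical, and actual eligibility decisions rely on the publisher's discrepancy tables and local criteria rather than on a single subtraction.

```python
DISCREPANCY_THRESHOLD = 15  # assumed cutoff, in standard-score points


def flagged_by_discrepancy(ability, achievement, threshold=DISCREPANCY_THRESHOLD):
    """Simple-difference rule: flag when achievement trails ability by the cutoff."""
    return (ability - achievement) >= threshold


# Hypothetical (ability, achievement) standard scores for two students.
STUDENTS = {
    "high ability, low achievement": (110, 85),
    "low ability, low achievement": (80, 78),
}

RESULTS = {
    label: flagged_by_discrepancy(ability, achievement)
    for label, (ability, achievement) in STUDENTS.items()
}

if __name__ == "__main__":
    for label, flagged in RESULTS.items():
        print(f"{label}: {'flagged' if flagged else 'not flagged'}")
```

Under these assumptions the second student has the lower achievement score of the two, yet only the first student is flagged for services.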
Because this test assesses skills that students learn both in the classroom and in real-world
contexts, can be given across a wide range of students and adults, and yields scores that can be
standardized to reflect these different contexts, its educational use is high. It not only provides information about the student, but can
be used to reflect upon the educational preparedness and practice of a school. This points out a
problem in the assessment. If students are not provided with the contexts and skills deemed
necessary by this assessment, then multiple things must be addressed. First, schools and family
environments need to promote and provide more open, supportive, and critical contexts that
develop these skills. Second, the students cannot be held accountable for any lack in
development, but correction needs to be made in order to provide this student with the highest
capability set possible. Finally, the requirements that are deemed necessary on the assessment
should be continually reviewed and assessed based on the societal, communal, and historical
contexts of present day.
There are many different behaviors involved in assessing one’s intelligence, yet they are
identified as being similar, hence the idea of “general intelligence” evolved. Intelligence tests
such as the Wechsler Intelligence Scale for Children (WISC) are used to measure one’s general
intelligence by presenting them with different tasks to complete that require the use of different
behaviors and skills (Salvia et al., 2007a). Are these tests truly predictive of one’s intelligence in
different situations or environments? Are simple tasks on a test approached differently than if
they were to apply similar behaviors to a real-life situation? Who decides whether there is such a
thing as general intelligence? Yes, there are different types of intelligence, different styles and
ways of thinking and different applications of these intelligences.
Some parts of the test require that the examiner discontinue a section after the examinee
gets a certain number incorrect. This rule assumes that the student would be unable to answer subsequent
questions that may require a higher level of thinking or more specific content. However, is it fair
to make this assumption when some of the questions might simply be invalid, worded poorly, or
involving content that the student might be unfamiliar with although they might have been able
to answer questions later in the section? Also, it is very difficult to keep track of every second
during which the test is being administered and how long it takes a student to answer one single
question, while at the same time trying to score them. Timing alone could create a variable
among different raters and test administrations, while also causing anxiety within the test taker.
The WIAT is claimed to be both reliable and valid based on the construction of the test
and performance by both children and adults. However, the reliability of the test is based on
interrater agreement among those who were trained to grade the achievement tests. What
qualifies raters to be efficient and 100% consistent in their rating in comparison to other raters?
There can always be room for disagreement, and at times it might be significant. Having
different examiners and raters rather than just one will always carry the potential for a
test to be unreliable. This variable is impossible to control as there cannot be just a single rater
for the millions of WIATs that are conducted each year.
The test is also deemed valid, yet it is deemed valid because of its results in
comparison to other achievement tests (Salvia et al., 2007b). However, what makes those
achievement tests valid then? Are they being compared to other tests too? Experts determine the
correlation of the questions to curriculum objectives. What makes these people experts? Is the
curriculum they are designing these questions from reliable in and of itself? Who is to determine
what a person is supposed to know by a specific age? Although appearing to be both reliable and
valid, there are more complex considerations behind the creation of the test that could deem it
invalid based upon what a person is expected to know, and how questions pertaining to those
topics are constructed and then presented.
There are many variables that could possibly be present to affect the instrument and its
intent. The examiner must be someone who is not biased, and who is indifferent to the student’s
performance. Any encouragement or disappointment could affect the student’s response to
questions or their thought processes. In the listening comprehension section, the variation of the
pronunciation or level of sound of the examiner’s voice could also affect the outcomes of student
performance in that section. Whether the environment was familiar or unfamiliar, and
whether the examiner was difficult to hear or not, could change the way the student performs.
The examinee’s age and experiences definitely can affect the instrument and the
outcome, depending on the way in which the question is presented (Sax, 2005). For example,
being biologically different, boys and girls learn in different ways and in general perform better
when material or questions are differentiated to meet these preferences. Research has shown that
boys like material and questions to be succinct and to the point, and like to immediately start working
on a problem (Sax, 2005). Girls like to hear a story behind it, make a connection to it, and
approach the problem from different angles, therefore taking more time to finish it (Sax, 2005).
Neither approach is right or wrong, but these are generally proven differences in the ways boys and girls
learn and then express their understanding. With only one form of the WIAT, and no version
written differently for boys and girls, there could inherently be some biases that may lead to
discrepancies in the test scores. Although not a definite effect on the test results, it is still a
possible variable that can affect the outcomes.
Language is also a variable that could influence one’s achievement on the test. A person
might know and understand the content or concepts with which they are presented, but may not
be able to decode the sentence structure or level of language being used. Questions that
measure math reasoning, for example, could be constructed in a way that uses language the student
is not familiar with. A student may have a strong understanding of those math concepts, but
could be confused by the structure or language of the question, possibly
leading to a wrong answer. This would make the test invalid, because it would not
measure what it was intended to measure. Cultural differences can also affect the instrument
if it caters to one religion, race, culture, or ethnicity over another. Sometimes certain beliefs
or skills are stressed in a certain culture in comparison to others, which could inflate or lower
scores and cause a greater gap between two students of the same age.
Students can also often have test anxiety, which can affect their performance on the test, no
matter how well they understand and can apply certain behaviors or skills to various problems.
Because the test includes timed sections, students may ultimately not achieve to their potential if
time is a pressing issue for them, simply from anxiety and not from a learning disability.
Having reviewed the WIAT-II, we find that the assessment presents varied strengths and
limitations. The assessment includes small samples of very specific questions related
to the topics of the subtests that are supposed to determine a general intelligence for those
thinking skills. We believe that the test could have a greater number of questions addressing each
subtest, in order to more accurately determine one’s achievement in that section. The test does a
good job of measuring what it is intended to measure (specific thinking in particular subject
categories) and is indicative of students’ performance in those sections (Salvia et al., 2007b). It can
narrowly determine a student’s academic achievement based on those categories. However,
although those subtests are closely related to material presented in schools, the test is not indicative of
real “intelligence” and critical thinking skills that are not often taught or practiced in traditional
schools. It measures what it is intended to measure; however, is what it measures truly the
definition of “intelligence”? We believe that the test could involve several other opportunities for
critical thinking and interdisciplinary problem solving to extend intelligence to real-life
situations, beyond just pattern recognition and identifying relationships.
Other limitations include the variables analyzed earlier, and how they can affect a student’s
performance and ultimately their scores on this achievement test.
During this activity, we learned that the assessment process, although extremely
structured, leaving little room for examiner error and requiring little preparation,
still has its variables and limitations. This achievement test is a very organized test which
requires timing to be exact, and the presentation to be followed very closely, yet it can still be
invalid or unreliable as a whole based upon the tests it is being compared to, and who created the
questions to line up with certain ages and curriculum. Any achievement test has its history and
reasons for generating certain questions, but to this day there is a lack of consideration of differences
between cultures, languages, races, and variables that are also present during the actual test
presentation.
References
Doll, B. (2003). In B. Plak, J. Impara, & R. Spies (Eds.), The fifteenth mental measurement
yearbook. Nebraska: University of Nebraska Press.
Mayes, S., & Calhoun, S. (2008). WISC-IV and WIAT-II Profiles in Children With High-Functioning
Autism. Journal of Autism and Developmental Disorders, 38(3), 428-439.
Retrieved from Education Full Text database.
Muenz, T., Ouchi, B., & Cole, J. (1999). Item analysis of written expression scoring systems
from the PIAT-R and WIAT. Psychology in the Schools, 36(1), 31-40. Retrieved from
Education Full Text database.
Pierangelo, R., & Giuliani, G. A. (2002). Chapter Eight: Assessment of academic achievement.
In R. Pierangelo & G. A. Giuliani, Assessment in Special Education (pp.116-157).
Boston: Allyn and Bacon.
Plak, B., Impara, J., & Spies, R. (Eds.). (2003). The fifteenth mental measurement yearbook.
Nebraska: University of Nebraska Press.
Salvia, J., Ysseldyke, J., & Bolt, S. (2007a).
Chapter 16: Assessment of intelligence: An overview. In Assessment: In special and
inclusive education (10th ed.). New York, NY: Houghton Mifflin Company.
Salvia, J., Ysseldyke, J., & Bolt, S. (2007b). Chapter 21: Assessment of academic achievement
with multiple-skill devices. In Assessment: In special and inclusive education (10th ed.).
New York, NY: Houghton Mifflin Company.
Sattler, J. (2008). Chapter 7: Historical survey and theories of intelligence. In Assessment of
children: cognitive functions (5th ed.). San Diego, CA: Jerome M. Sattler, Publisher, Inc.
Sax, L. (2005). Why gender matters: What parents and teachers need to know about the
emerging science of sex differences. New York: Doubleday.
Tindal, G., & Nutter, M. (2003). In B. Plak, J. Impara, & R. Spies (Eds.), The fifteenth mental
measurement yearbook. Nebraska: University of Nebraska Press.