Generic Test Analysis Template

Hayward, Stewart, Phillips, Norris, & Lovell
Test Review: Woodcock Reading Mastery Tests-Revised (NU Normative Update) (WRMT-R)
Name of Test: Woodcock Reading Mastery Tests-Revised (NU Normative Update) WRMT-R
Author(s): Richard W. Woodcock
Publisher/Year: American Guidance Service; 1973, 1987, 1998 (norms only)
Forms: Form G and H (parallel forms)
Age Range: 5 years through adulthood to “75+”; Grades K to 6
Norming Sample: The 1998 edition differs from 1987 in that norms have been updated. There are also new Instructional Level
Profiles. WRMT-R is co-normed with K-TEA/NU and PIAT-R/NU.
Total Number: 3,184
Number and Age: 3,184 students from kindergarten to grade 12, as well as 245 young adults ages 18 to 22 years, were tested.
Location: 129 sites in 40 states
Demographics: “A stratified multistage sampling procedure” was used to ensure selection of a “nationally representative group” (p. 125), with the sample compared to the March 1994 Current Population Survey (U.S. Bureau of the Census). Sampling targets guided selection and were stratified by gender, race, parent education, and geographic region.
Rural/Urban: no information (see comment below)
SES: SES was determined using parent education level.
Other: Each child was randomly assigned one of the test batteries locally available. Special education students were also included in
small numbers as reflected in the census.
Comments: The Buros review gives the impression that there were problems with the sample: though the authors were ambitious, they did not fulfill their intentions (Crocker & Murray Ward, 2001). Though co-norming with other tests of achievement was seen as economical, the Buros reviewers note that new problems arise. For example, a smaller overall sample resulted (57% less). No adults over 22 years of age were tested in the sample. Many of the major U.S. cities are not represented in the sample. Linking samples also reduced the numbers available for specific subtests: Visual-Auditory Learning n = 1,309, Word Attack n = 751, and Word Comprehension n = 721.
Other criticisms: Buros reviewers note that no rationale, other than the obvious economical reason, was provided for why six tests
were brought together for norming (Crocker & Murray Ward, 2001).
Summary Prepared By: Eleanor Stewart, 31 May and June 2007
Test Description/Overview:
The test kit consists of a technical manual, two booklets (labeled G and H) containing test stimuli, test record forms, and an
audiocassette that provides pronunciation guides for the Word Attack and Word Identification tests. A form G+H Summary Record
combines the derived scores when both G and H have been administered.
The test remains unchanged from the 1989 revision in terms of test items, score forms, procedures for recording and analyzing errors, profiles, and computerized scoring. The test consists of six individual tests that, when grouped, form a “cluster” that addresses a composite of skills necessary for aspects of reading. WRMT-R provides an assessment of reading readiness, basic reading skills, and reading comprehension. It is U.S.-oriented to address federally mandated “Reading First” criteria, which include provision for testing students’ skills as a basis for instructional planning.
The six tests are:
1. Visual-Auditory Learning: Stick figures and line icons in black and white are introduced with the intent of measuring the
student’s ability to make associations between visual stimuli and verbal responses.
2. Letter Identification: The purpose is to identify upper- and lower-case letters as well as letters typed in different styles. Also included is a Supplementary Letter Checklist, which asks the student not only to name letters but also to identify them by sound. This is provided as a checklist only and is not intended to be included in scoring.
3. Word Identification: In this subtest, the student reads aloud printed word stimuli.
4. Word Attack: This subtest tests the student’s ability to decode nonsense words and English words with a low frequency of occurrence using phonic and/or structural analysis strategies.
5. Word Comprehension: This subtest assesses reading vocabulary through three subtests: antonyms, synonyms, and analogies.
6. Passage Comprehension: This subtest includes a short reading passage of 2 to 3 sentences. After reading, the student must
identify which key words are missing.
Tests one to three are found in Form G only. Two of these tests, Visual-Auditory Learning (Test 1) and Letter Identification (Test 2), form the composite Readiness Cluster, while the remaining tests address reading achievement, so that Form G is used to identify readiness and achievement and Form H is used to assess reading achievement only. Test 5 (Word Comprehension) assesses reading vocabulary measured across four content areas: general reading, science-mathematics, social sciences, and humanities.
Purpose of Test: The purpose of this test is to identify areas of strength and weakness, to diagnose reading problems, to measure gains, to plan programs, and to conduct research.
Comment: The Buros reviewers point out that while the author claims these purposes, supporting documentation and studies demonstrating them are unavailable (a validity issue). Extensive critical notes from the Buros reviewers summarize this shortcoming (Crocker & Murray Ward, 2001).
Areas Tested:
• Oral Language
  Vocabulary
  Grammar
  Narratives
  Other
• Print Knowledge
  Environmental Print
  Alphabet
  Other style of print
• Phonological Awareness
  Segmenting
  Blending
  Elision
  Rhyming
  Other: Word Attack
• Reading
  Single Word Reading/Decoding
  Comprehension
• Spelling
  Other
• Writing
  Letter Formation
  Capitalization
  Punctuation
  Conventional Structures
  Word Choice
  Other
• Listening
  Lexical
  Syntactic
  Supralinguistic
  Details
Who can Administer: Examiners must be Level B and have completed assessment/testing and statistics courses. Therefore, they are
likely to be teachers, special educators, psychologists, or speech pathologists.
Administration Time: Administration time varies from 10 to 30 minutes, depending on which cluster of subtests is administered.
Test Administration (General and Subtests):
This test is individually administered. For younger children, practice items are provided and training procedures are outlined. All
instructions are provided in the examiner’s manual and on the test easel.
Comments: What role does memory have in Test 1 where the student must make associations between icons and verbal responses?
Might a child with memory difficulties, such as one with a mild TBI, do poorly for this reason?
The Supplementary Letter Checklist outcomes are not included in scoring; instead, they serve as guidelines for interpretation. Might a tester simply skip this if he or she does not know how to interpret it or what the student’s errors might mean?
Letter identification employs a variety of print styles. What would an occupational therapist or vision specialist think? Again, as with
memory, I wonder if we might be tapping into some other aspect of human skill (introducing bias).
How well known are the predictive abilities for reading success? Do teachers readily identify them? If they are like SLPs, then, when given the option, a teacher may select certain subtests or components over others due to time constraints, or perhaps because he or she thinks more important information will be obtained, much as with the CELF Recalling Sentences subtest.
The Pearson website offers an online demonstration, sample pictures, a WRMT-R/NU bibliography, technical information, samples
and a “Cross-Battery Approach to Individual Achievement Testing” tutorial. The technical manual also has training exercises.
I found the record forms a bit confusing and visually busy. I probably would not use the form with parents for this reason, as it appears intimidating. However, a sample report to share with parents is provided.
Test Interpretation:
Scoring is facilitated with the ASSIST computer program on CD-ROM (available for Macintosh or Windows), which allows the examiner to enter raw scores that are then converted to a statistical profile for the student. In addition to standard scores, NCEs, age and grade equivalents, and percentile ranks, ASSIST provides Relative Performance Indexes, confidence bands at the 68% and 90% levels, Grade Equivalent and Standard Score/Percentile Rank profiles, an Aptitude-Achievement Discrepancy Analysis, and a Narrative Report. Examples from the 1989 version are used to assist the examiner. A discrepancy analysis can be performed with the CD.
Diagnostic profiles allow comparison of WRMT-R results with Goldman-Fristoe-Woodcock Sound-Symbol Tests and the
Woodcock-Johnson Psychoeducational Battery.
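As an illustration of how the 68% and 90% confidence bands relate to a score's reliability, the sketch below computes a band from the standard error of measurement (SEM = SD × √(1 − r)). The standard-score metric (mean 100, SD 15) and the reliability value are illustrative assumptions, not values taken from the WRMT-R norm tables.

```python
import math

def confidence_band(score, reliability, sd=15.0, level=0.68):
    """Confidence band around an obtained standard score.

    SEM = SD * sqrt(1 - reliability); the band is score +/- z * SEM,
    with z = 1.0 for a 68% band and z = 1.645 for a 90% band.
    """
    sem = sd * math.sqrt(1.0 - reliability)
    z = {0.68: 1.0, 0.90: 1.645}[level]
    return (score - z * sem, score + z * sem)

# Illustrative values: obtained score 92, reliability .91 (so SEM = 4.5)
low, high = confidence_band(92, 0.91, level=0.90)
print(round(low, 1), round(high, 1))  # → 84.6 99.4
```

With a reliability of .91, the SEM is 4.5 standard-score points, so even the narrower 68% band spans nine points, which is one reason reviewers caution against over-interpreting a single obtained score.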
No normative values are attached to the checklist scores but the author notes that the checklist is a “diagnostic tool” (see Form G test
manual instruction section for the Supplemental Checklist). The author states that the checklist is intended to be used in instructional
planning.
Comment: Buros reviewers cite several concerns regarding interpretation. The smaller norm sample means that we should be cautious, particularly in interpreting scores for children who were underrepresented in the sample (e.g., urban centres). The reviewers also note that “the presence of new and old norms in the same manual and norm table is misleading at best” (Crocker & Murray Ward, 2001, p. 1371).
Standardization:
Age equivalent scores
Grade equivalent scores
Percentiles
Standard scores
Other (Please Specify): Available clusters are the Readiness Cluster (Form G only; Visual-Auditory Learning and Letter Identification), the Basic Skills Cluster (Word Identification and Word Attack), the Reading Comprehension Cluster (Word Comprehension and Passage Comprehension), the Total Reading Full Scale (Word Identification, Word Comprehension, and Passage Comprehension), and the Total Reading Short Scale (Word Identification and Passage Comprehension). NCEs, Relative Performance Indexes, and instructional ranges are also provided.
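To show how these derived scores interrelate, the sketch below converts a standard score to a percentile rank and an NCE under a normal model. The mean of 100 and SD of 15, and the NCE scale (mean 50, SD 21.06), are conventional assumptions rather than values drawn from the WRMT-R tables.

```python
import math

def percentile_rank(standard_score, mean=100.0, sd=15.0):
    """Percentile rank of a standard score under a normal model."""
    z = (standard_score - mean) / sd
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def nce(standard_score, mean=100.0, sd=15.0):
    """Normal curve equivalent: the same z rescaled to mean 50, SD 21.06."""
    z = (standard_score - mean) / sd
    return 50.0 + 21.06 * z

score = 85  # one SD below the mean
print(round(percentile_rank(score), 1))  # → 15.9
print(round(nce(score), 1))              # → 28.9
```

The example also shows why percentile ranks compress near the mean while NCEs, being an equal-interval rescaling, do not.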
Reliability:
Important note: The reliability reported refers only to the 1989 revision. No updated reliability with the new norm sample was provided.
Internal consistency of items: The split-half median reliability was .91 (range .68-.98) for individual tests; for clusters, median = .95 (range .87-.98); for the total, median = .97 (range .86-.99).
Test-retest: No information found. Technical information from the Pearson site reported “no.”
Inter-rater: No information found. Pearson reports “no.”
Other (Please Specify):
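The split-half coefficients reported above are conventionally obtained by correlating odd- and even-item half-test scores and stepping the result up with the Spearman-Brown formula, r_full = 2r / (1 + r). The item data below are invented for illustration; nothing here reproduces the WRMT-R's actual computations.

```python
def pearson_r(x, y):
    """Pearson correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def split_half_reliability(item_matrix):
    """Odd-even split-half reliability with Spearman-Brown correction.

    item_matrix: one row of 0/1 item scores per examinee.
    """
    odd = [sum(row[0::2]) for row in item_matrix]
    even = [sum(row[1::2]) for row in item_matrix]
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)  # Spearman-Brown step-up

# Invented data: five examinees, six dichotomous items
items = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
]
print(round(split_half_reliability(items), 2))  # → 0.7
```

Because a half-length test is less reliable than the full test, the uncorrected odd-even correlation understates reliability; the Spearman-Brown step-up adjusts for this.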
Comment: I think that it is misleading to have mixed the 1989 reliability data in with the re-normed version. I am not sure how doing so affects the psychometric basis.
Validity:
Important note: The validity information provided in the manual is from the previous 1989 revision and is unrelated to the 1998 norms. Also, because assorted tests and subtests were used in norming/equating, questions arise about the underlying trait being measured. No attempt was evident to ensure that the various authors were operationally defining their terms
equivalently. The Buros reviewer states that this update “still does not address a number of validity and interpretation problems
cited in previous reviews” (Crocker & Murray Ward, 2001, p. 1371). Specifically, the reviewer points out that the content domain
remains undescribed. Also, as in the previous revision, the sources for the selection of items were not provided, nor was there any indication of a rationale for the words or skills included.
The Buros reviews of the 1989 revised WRMT (Cooter & Jaeger, 1989) brought forth reliability and validity issues. I understand from the current reviews that these issues remain, as the WRMT-R/NU is a re-normed version only.
Content: Regarding content validity, the WRMT-R was “developed with contributions from outside experts, including experienced teachers and curriculum specialists” (Woodcock, 1998, p. 97). However, unlike other manuals reviewed for the TELL Project, Woodcock does not provide references in the manual alongside his statement.
Criterion Prediction (Concurrent) Validity: Validity was reported for the WRMT-R and WJ reading tests for children in Grades 1, 3, 5, and 8 across subtests and total reading scores. Correlations range from a low of .39 (Passage Comprehension) to a high of .91 (Full Scale Total Reading). A 1978 study reported correlations of the WRMT-1973 with the Iowa Tests of Basic Skills, Iowa Tests of Educational Development (total reading), PIAT Reading, WJ Reading Achievement, and WRAT Reading, demonstrating coefficients from .79 to .92.
The author justifies: “Although these results are based on the 1973 WRMT, they are reported in this revision because the
psychometric characteristics of the original WRMT (1973) and the WRMT-R are so similar that many generalizations from one to
the other can be validly made” (Woodcock, 1998, p. 100).
Construct Identification Validity: Test and Cluster Intercorrelations: Since tests were clustered to target readiness and skill areas, correlations are reported for subtests within clusters as well as for clusters overall. Subtests and clusters presented predictable correlations.
Comment: Though all the data are presented in Table 5.6 (Woodcock, 1998, p. 97) for Grades 1, 3, 5, 8, 11 and College as well as
Adult, I would have thought it appropriate that the author offer some comments or interpretation rather than leave that task to the
reader.
Differential Item Functioning: Classical and Rasch models were used in item development and selection, though the statement on page 97 is unclear. Woodcock states, “both contributed to the stringent statistical criteria employed during the process of item selection in the WRMT-R” (Woodcock, 1998, p. 97). The correlations range from low (.35 at Grade 3 for Letter Identification/Visual-Auditory Learning) to high (.98 for Total Reading Short Scale/Total Reading Full Scale).
Other (Please Specify): No studies investigating predictive validity with special education populations were reported, though children with special needs did participate in norming.
Comment: With no predictive validity studies, we have little if anything on which to base placement decisions. This is a serious concern, since this test is widely used by school districts for that very purpose and the author promotes this use. The Buros reviewer states: “Scores for
special education students should be used cautiously. Although the author did include special education and gifted students in the
norm sample, matching their prevalence in the general population, their actual numbers are quite small. In addition, there are still
no predictive validity studies to validate the WRMT-R NU with this population; a serious omission because this test is frequently used
in placement and re-evaluation” (Crocker & Murray Ward, 2001, p. 1372).
Summary/Conclusions/Observations:
The Buros reviewers make important comments:
“…three interpretation issues arise. First, the use of the norms generated from the smaller norm sample means that interpretations
are limited…Second, scores for special education students should be used cautiously…[Third,] the author states that
comparisons of old and new norm data clearly show a pattern of lower performances and higher standard and percentile
scores of lower achievers. This effect could result in overestimation of students’ reading levels. Thus, students might not receive
appropriate services or services may be terminated prematurely. Interestingly, there are no cautions to examiners to readjust
score referents to account for these changes” (Crocker & Murray Ward, 2001, p. 1372).
“It should also be remembered that no changes have been made in test skills or items, and there is no stipulation that the other
measures clarify the meaning of the WRMT-R NU scores. In conclusion, the WRMT-R/NU is a limited norms update. The test
still contains many test items and scores, but does not address problems identified by previous reviewers. Furthermore, the
renorming has narrowed the utility of the test. Therefore, the WRMT-R/NU should be used in conjunction with other measures
of reading. Results should not be overinterpreted. The examiner should also be very cautious in using the test with a wide range
of age groups. If these cautions are observed, the test may be useful in helping estimate reading achievement” (Crocker &
Murray Ward, 2001, p. 1372).
Clinical/Diagnostic Usefulness: Based on the critiques available, I think that this test has limited clinical utility and, if used, should only be used as an adjunct to more rigorous and contemporary reading tests. I would be very cautious about using this test’s results to make important decisions about eligibility and intervention, though the author intends for the test to be used in this way. This test has a long history. It is probably widely used and likely to be well embedded in assessment protocols and funding structures. I wonder if educators would give much thought to its selection, or to the consideration of other more recent tests as
alternatives. Several generations of teachers would be familiar with it. A quote from the Pearson website: “I have used the Woodcock Reading Mastery Tests for almost 30 years now . . . I believe it has great value in diagnosing reading difficulties and providing a basis for me to write a prescription for remedying reading difficulties” (Dr. Dianne M. Haneke, Professor of Literacy Education, retired).
References
Cooter, R. B., & Jaeger, R. M. (1989). Test review of the Woodcock Reading Mastery Tests-Revised. In J. C. Conoley & J. J. Kramer (Eds.), The tenth mental measurements yearbook (pp. 909-916). Lincoln, NE: Buros Institute of Mental Measurements.
Crocker, L., & Murray Ward, M. (2001). Test review of the Woodcock Reading Mastery Tests-Revised 1998 Normative Update. In B. S. Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook (pp. 1369-1373). Lincoln, NE: Buros Institute of Mental Measurements.
Current Population Survey, March, 1994 [Machine readable data file]. (1994). Washington, DC: Bureau of the Census (Producer and
Distributor).
Pearson Assessments (2007). Speech and language forum. Retrieved May 31, 2008 from http://www.SpeechandLanguage.com
Woodcock, R. W. (1998). Woodcock reading mastery tests – Revised NU: Examiner’s manual. Circle Pines, MN: American Guidance
Service.
To cite this document:
Hayward, D. V., Stewart, G. E., Phillips, L. M., Norris, S. P., & Lovell, M. A. (2008). Test review: Woodcock reading mastery tests-revised (NU normative update) (WRMT-R). Language, Phonological Awareness, and Reading Test Directory (pp. 1-8).
Edmonton, AB: Canadian Centre for Research on Literacy. Retrieved [insert date] from
http://www.uofaweb.ualberta.ca/elementaryed/ccrl.cfm.