Hayward, Stewart, Phillips, Norris, & Lovell

Test Review: Woodcock Reading Mastery Tests-Revised (NU Normative Update) (WRMT-R)

Name of Test: Woodcock Reading Mastery Tests-Revised (NU Normative Update) (WRMT-R)
Author(s): Richard W. Woodcock
Publisher/Year: 1973, 1987, 1998 (norms only)
Forms: Forms G and H (parallel forms)
Age Range: 5 years through adulthood to “75+”; Grades K to 16
Norming Sample: The 1998 edition differs from the 1987 edition in that the norms have been updated; there are also new Instructional Level Profiles. The WRMT-R was co-normed with the K-TEA/NU and PIAT-R/NU.
Total Number: 3,184
Number and Age: 3,184 students from kindergarten to grade 12, as well as 245 young adults ages 18 to 22 years, were tested.
Location: 129 sites in 40 states
Demographics: “A stratified multistage sampling procedure” was used to ensure selection of a “nationally representative group” (p. 125), with the sample compared to the March 1994 Current Population Survey (U.S. Bureau of the Census). Sampling targets guided selection and were stratified by gender, race, parent education, and geographic region.
Rural/Urban: No information (see comment below)
SES: SES was determined using parent education level.
Other: Each child was randomly assigned one of the test batteries locally available. Special education students were also included in small numbers, reflecting their prevalence in the census.
Comments: From reading the Buros review, I have the impression that there were problems with the sample: though the authors were ambitious, they did not fulfill their intentions (Crocker & Murray Ward, 2001). Though co-norming with other tests of achievement is seen as economical, the Buros reviewers note that new problems arise. For example, a smaller overall sample resulted (57% less). No adults over 22 years of age were included in the sample. Many of the major U.S. cities are not represented in the sample.
Linking samples also reduced the numbers available for specific subtests: Visual-Auditory Learning n = 1,309, Word Attack n = 751, and Word Comprehension n = 721.
Other criticisms: The Buros reviewers note that no rationale, other than the obvious economic one, was provided for why six tests were brought together for norming (Crocker & Murray Ward, 2001).

Summary Prepared By: Eleanor Stewart, 31 May and June 2007

Test Description/Overview: The test kit consists of a technical manual, two booklets (labeled G and H) containing test stimuli, test record forms, and an audiocassette that provides pronunciation guides for the Word Attack and Word Identification tests. A Form G+H Summary Record combines the derived scores when both Forms G and H have been administered. The test remains unchanged from the 1989 revision in terms of test items, score forms, procedures for recording and analyzing errors, profiles, and computerized scoring. The test consists of six individual tests that, when grouped, form a “cluster” addressing a composite of skills necessary for aspects of reading. The WRMT-R provides an assessment of reading readiness, basic reading skills, and reading comprehension. It is U.S.-oriented, addressing the federally mandated “Reading First” criteria, which include provision for testing students’ skills as a basis for instructional planning. The six tests are:
1. Visual-Auditory Learning: Stick figures and line icons in black and white are introduced with the intent of measuring the student’s ability to make associations between visual stimuli and verbal responses.
2. Letter Identification: The purpose is to identify upper- and lower-case letters as well as letters typed in different styles. Also included is a Supplementary Letter Checklist, which asks the student not only to name letters but also to identify them by sound. This is provided as a checklist only and is not intended to be included in scoring.
3. Word Identification: In this subtest, the student reads aloud printed word stimuli.
4. Word Attack: This subtest tests the student’s ability to decode nonsense words and English words with low frequency of occurrence using phonic and/or structural analysis strategies.
5. Word Comprehension: This subtest assesses reading vocabulary through three parts: antonyms, synonyms, and analogies.
6. Passage Comprehension: This subtest includes short reading passages of 2 to 3 sentences. After reading, the student must identify which key words are missing.

Tests 1 (Visual-Auditory Learning) and 2 (Letter Identification) are found in Form G only; together they form the composite Readiness Cluster, while the remaining tests address reading achievement. Thus Form G is used to assess both readiness and achievement, while Form H is used to assess reading achievement only. Test 5 (Word Comprehension) assesses reading vocabulary measured across four content areas: general reading, science-mathematics, social sciences, and humanities.

Purpose of Test: The purpose of this test is to identify areas of strength and weakness, to diagnose reading problems, to measure gains, to plan programs, and to conduct research.
Comment: The Buros reviewers point out that while the author claims these purposes, supporting documentation and studies demonstrating them are unavailable (a validity issue). Extensive critical notes from the Buros reviewers summarize this shortcoming (Crocker & Murray Ward, 2001).
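The form and cluster structure described above can be summarized programmatically. The sketch below is purely illustrative: the cluster names and subtest assignments are taken from this review (the full cluster list appears under Standardization later in the document), but the helper function `clusters_for_form` is hypothetical and is not part of any published scoring tool.

```python
# Illustrative mapping of WRMT-R clusters to their component subtests.
# Cluster compositions follow the review's Standardization section.
CLUSTERS = {
    "Readiness": ["Visual-Auditory Learning", "Letter Identification"],
    "Basic Skills": ["Word Identification", "Word Attack"],
    "Reading Comprehension": ["Word Comprehension", "Passage Comprehension"],
    "Total Reading (Full Scale)": ["Word Identification", "Word Comprehension",
                                   "Passage Comprehension"],
    "Total Reading (Short Scale)": ["Word Identification",
                                    "Passage Comprehension"],
}

# Tests 1 and 2 appear in Form G only; Form H covers achievement only.
FORM_G_ONLY = {"Visual-Auditory Learning", "Letter Identification"}

def clusters_for_form(form: str) -> list[str]:
    """Return the clusters a given form (hypothetical helper) can yield."""
    available = []
    for cluster, subtests in CLUSTERS.items():
        if form == "G" or not (FORM_G_ONLY & set(subtests)):
            available.append(cluster)
    return sorted(available)
```

This makes the asymmetry concrete: Form G supports all five clusters, while Form H cannot yield the Readiness Cluster because its two component tests exist only in Form G.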
Areas Tested:
- Oral Language: Vocabulary, Grammar, Narratives, Other
- Print Knowledge: Environmental Print, Alphabet, Other style of print
- Phonological Awareness: Segmenting, Blending, Elision, Rhyming, Other (Word Attack)
- Reading: Single Word Reading/Decoding, Comprehension, Spelling, Other
- Writing: Letter Formation, Capitalization, Punctuation, Conventional Structures, Word Choice, Other
- Listening: Lexical, Syntactic, Supralinguistic, Details

Who can Administer: Examiners must be Level B qualified and have completed assessment/testing and statistics courses. They are therefore likely to be teachers, special educators, psychologists, or speech pathologists.

Administration Time: Administration time varies from 10 to 30 minutes, depending on which cluster of subtests is administered.

Test Administration (General and Subtests): This test is individually administered. For younger children, practice items are provided and training procedures are outlined. All instructions are provided in the examiner’s manual and on the test easel.

Comments: What role does memory play in Test 1, where the student must make associations between icons and verbal responses? Might a child with memory difficulties, such as one with a mild TBI, do poorly for this reason? The Supplementary Letter Checklist outcomes are not included in scoring; instead, they serve as guidelines for interpretation. Might a tester simply skip this if he or she does not know how to interpret it or what the student’s errors might mean? Letter Identification employs a variety of print styles. What would an occupational therapist or vision specialist think? Again, as with memory, I wonder if we might be tapping into some other aspect of human skill (introducing bias). How well known are the predictive abilities for reading success? Do teachers readily identify them?
If they are like SLPs, perhaps not. I think about this because, when given the option, a teacher may select certain subtests or components over others due to time constraints, or perhaps because he or she believes more important information will be obtained (much as with the CELF Recalling Sentences subtest).

The Pearson website offers an online demonstration, sample pictures, a WRMT-R/NU bibliography, technical information, samples, and a “Cross-Battery Approach to Individual Achievement Testing” tutorial. The technical manual also has training exercises. I found the record forms a bit confusing and visually busy; I probably would not use the form to share results with parents, for that reason, as it appears intimidating. However, a sample report to share with parents is provided.

Test Interpretation: Scoring is facilitated with the ASSIST computer program on CD-ROM (available for Macintosh or Windows), which allows the examiner to enter raw scores that are then converted to a statistical profile for the student. In addition to standard scores, NCEs, age and grade equivalents, and percentile ranks, ASSIST provides Relative Performance Indexes, confidence bands at the 68% and 90% levels, Grade Equivalent and Standard Score/Percentile Rank profiles, an Aptitude-Achievement Discrepancy Analysis, and a Narrative Report. Examples from the 1989 version are used to assist the examiner. A discrepancy analysis can be performed with the CD. Diagnostic profiles allow comparison of WRMT-R results with the Goldman-Fristoe-Woodcock Sound-Symbol Tests and the Woodcock-Johnson Psychoeducational Battery. No normative values are attached to the checklist scores, but the author notes that the checklist is a “diagnostic tool” (see the Form G test manual instructions for the Supplementary Letter Checklist). The author states that the checklist is intended to be used in instructional planning.

Comment: The Buros reviewers cite several concerns regarding interpretation.
The smaller norm sample means that we should be cautious, particularly in interpreting scores for children who were underrepresented in the sample (e.g., urban centres). The reviewers also note that “the presence of new and old norms in the same manual and norm table is misleading at best” (Crocker & Murray Ward, 2001, p. 1371).

Standardization:
- Age equivalent scores
- Grade equivalent scores
- Percentiles
- Standard scores
- Other (Please Specify): The available clusters are the Readiness Cluster (Form G only; consists of Visual-Auditory Learning and Letter Identification), Basic Skills Cluster (Word Identification and Word Attack), Reading Comprehension Cluster (Word Comprehension and Passage Comprehension), Total Reading Full Scale (Word Identification, Word Comprehension, and Passage Comprehension), and Total Reading Short Scale (Word Identification and Passage Comprehension). NCEs, Relative Performance Indexes, and instructional ranges are also provided.

Reliability: Important note: the reliability reported refers only to the 1989 revision. No updated reliability with the new norm sample was provided.
- Internal consistency of items: Split-half median was .91 (range .68-.98); also reported for clusters (median = .95, range .87-.98) and Total (median = .97, range .86-.99).
- Test-retest: No information found. Technical information from the Pearson site reported “no”.
- Inter-rater: No information found. Pearson reports “no”.
- Other (Please Specify): None reported.

Comment: I think it is misleading to have mixed the 1989 reliability data in with the re-normed version, and I am not sure how doing so affects the psychometric basis of the test.

Validity: Important note: the validity information provided in the manual is from the previous 1989 revision and is unrelated to the 1998 norms. Also, because assorted tests and subtests were used in norming/equating, questions arise about the underlying trait being measured. No attempt was evident to ensure that the various authors were operationally defining their terms equivalently.
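As a side note on the reliability figures above: the 68% and 90% confidence bands that ASSIST reports follow from the standard error of measurement (SEM = SD × √(1 − r)). A minimal sketch, using the split-half median of .91 reported above and the conventional standard-score scale (M = 100, SD = 15; the scale is my assumption, as it is not stated in this review):

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_band(score: float, sd: float, reliability: float,
                    z: float) -> tuple[float, float]:
    """Band of +/- z * SEM around an obtained score."""
    half_width = z * sem(sd, reliability)
    return (score - half_width, score + half_width)

# With r = .91 and SD = 15 (assumed scale): SEM = 15 * sqrt(0.09) = 4.5,
# so a score of 100 carries a 68% band of roughly 95.5-104.5 and a
# 90% band (z = 1.645) of roughly 92.6-107.4.
band_68 = confidence_band(100, 15, 0.91, 1.0)
band_90 = confidence_band(100, 15, 0.91, 1.645)
```

The point of the sketch is that even the test's best-case reliability leaves bands several standard-score points wide, which matters when scores drive placement decisions.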
The Buros reviewer states that this update “still does not address a number of validity and interpretation problems cited in previous reviews” (Crocker & Murray Ward, 2001, p. 1371). Specifically, the reviewer points out that the content domain remains undescribed. Also, as in the previous revision, the sources for the selection of items were not provided, nor was there any indication of a rationale for the words or skills included. The Buros reviews of the 1989 revised WRMT (Cooter & Jaeger, 1989) brought forth reliability and validity issues. I understand from the current reviews that these issues remain, as the WRMT-R NU is a re-normed version only.

Content: Content validity, as it refers to the WRMT-R, was “developed with contributions from outside experts, including experienced teachers and curriculum specialists” (Woodcock, 1998, p. 97). However, unlike other manuals reviewed for the TELL Project, Woodcock does not provide references in the manual alongside his statement.

Criterion Prediction (Concurrent) Validity: Validity was reported for the WRMT-R and WJ reading tests for children in Grades 1, 3, 5, and 8 across subtests and total reading scores. Correlations range from a low of .39 (Passage Comprehension) to a high of .91 (Full Scale Total Reading). A 1978 study reported correlations of the 1973 WRMT with the Iowa Tests of Basic Skills, the Iowa Tests of Educational Development (total reading), PIAT Reading, WJ Reading Achievement, and WRAT Reading, ranging from .79 to .92. The author justifies this as follows: “Although these results are based on the 1973 WRMT, they are reported in this revision because the psychometric characteristics of the original WRMT (1973) and the WRMT-R are so similar that many generalizations from one to the other can be validly made” (Woodcock, 1998, p. 100).
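One way to judge the concurrent-validity correlations just cited is as shared variance (r²): a correlation of .39 means the two measures share only about 15% of their variance, whereas .91 corresponds to about 83%. A minimal sketch of this standard conversion (the function name is my own; the values come from the review):

```python
def shared_variance(r: float) -> float:
    """Proportion of variance two measures share, given their correlation r."""
    return r * r

# Low end of the reported range (.39, Passage Comprehension vs. WJ):
low = shared_variance(0.39)   # about 15% shared variance
# High end (.91, Full Scale Total Reading):
high = shared_variance(0.91)  # about 83% shared variance
```

This helps explain why the .39 figure is worrying: at that level, most of what the two comprehension measures capture is not in common.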
Construct Identification Validity: Test and Cluster Intercorrelations: Since tests were clustered to target readiness and skill areas, correlations are reported for subtests within clusters as well as for the clusters overall. Subtests and clusters showed predictable correlations, ranging from low (.35 at Grade 3 for Letter Identification/Visual-Auditory Learning) to high (.98 for Total Reading Short Scale/Total Reading Full Scale).
Comment: Though all the data are presented in Table 5.6 (Woodcock, 1998, p. 97) for Grades 1, 3, 5, 8, 11, College, and Adult, I would have thought it appropriate for the author to offer some comment or interpretation rather than leave that task to the reader.

Differential Item Functioning: Classical and Rasch models were used in item development and selection, though the statement on page 97 is unclear. Woodcock states that “both contributed to the stringent statistical criteria employed during the process of item selection in the WRMT-R” (Woodcock, 1998, p. 97).

Other (Please Specify): No studies investigating predictive validity with special education populations were reported, though children with special needs did participate in norming.
Comment: With no predictive validity studies, we have little if anything on which to base placement decisions. This is a serious concern, since this test is widely used by school districts for that very purpose and the author promotes this use. The Buros reviewer states: “Scores for special education students should be used cautiously. Although the author did include special education and gifted students in the norm sample, matching their prevalence in the general population, their actual numbers are quite small. In addition, there are still no predictive validity studies to validate the WRMT-R NU with this population; a serious omission because this test is frequently used in placement and re-evaluation” (Crocker & Murray Ward, 2001, p.
1372).

Summary/Conclusions/Observations: The Buros reviewers make important comments: “…three interpretation issues arise. First, the use of the norms generated from the smaller norm sample means that interpretations are limited…Second, scores for special education students should be used cautiously…[third]. The author states that comparisons of old and new norm data clearly show a pattern of lower performances and higher standard and percentile scores of lower achievers. This effect could result in overestimation of students’ reading levels. Thus, students might not receive appropriate services or services may be terminated prematurely. Interestingly, there are no cautions to examiners to readjust score referents to account for these changes” (Crocker & Murray Ward, 2001, p. 1372). “It should also be remembered that no changes have been made in test skills or items, and there is no stipulation that the other measures clarify the meaning of the WRMT-R NU scores. In conclusion, the WRMT-R/NU is a limited norms update. The test still contains many test items and scores, but does not address problems identified by previous reviewers. Furthermore, the renorming has narrowed the utility of the test. Therefore, the WRMT-R/NU should be used in conjunction with other measures of reading. Results should not be overinterpreted. The examiner should also be very cautious in using the test with a wide range of age groups. If these cautions are observed, the test may be useful in helping estimate reading achievement” (Crocker & Murray Ward, 2001, p. 1372).

Clinical/Diagnostic Usefulness: Based on the critiques available, I think this test has limited clinical utility and, if used, should serve only as an adjunct to more rigorous and contemporary reading tests. I would be very cautious about using this test’s results to make important decisions about eligibility and intervention, though the author intends for the test to be used in this way. This test has a long history.
It is probably widely used and likely to be well embedded in assessment protocols and funding structures. I wonder whether educators would give much thought to its selection, or to the consideration of more recent tests as alternatives. Several generations of teachers would be familiar with it. A quote from the Pearson website: “I have used the Woodcock Reading Mastery Tests for almost 30 years now . . . I believe it has great value in diagnosing reading difficulties and providing a basis for me to write a prescription for remedying reading difficulties” (Dr. Dianne M. Haneke, Professor of Literacy Education, retired).

References

Cooter, R. B., & Jaeger, R. M. (1989). Test review of the Woodcock Reading Mastery Tests-Revised. In J. C. Conoley & J. J. Kramer (Eds.), The tenth mental measurements yearbook (pp. 909-916). Lincoln, NE: Buros Institute of Mental Measurements.

Crocker, L., & Murray Ward, M. (2001). Test review of the Woodcock Reading Mastery Tests-Revised 1998 Normative Update. In B. S. Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook (pp. 1369-1373). Lincoln, NE: Buros Institute of Mental Measurements.

Current Population Survey, March 1994 [Machine-readable data file]. (1994). Washington, DC: Bureau of the Census (Producer and Distributor).

Pearson Assessments. (2007). Speech and language forum. Retrieved May 31, 2008, from http://www.SpeechandLanguage.com

Woodcock, R. W. (1998). Woodcock Reading Mastery Tests-Revised NU: Examiner’s manual. Circle Pines, MN: American Guidance Service.

To cite this document: Hayward, D. V., Stewart, G. E., Phillips, L. M., Norris, S. P., & Lovell, M. A. (2008). Test review: Woodcock Reading Mastery Tests-Revised (NU Normative Update) (WRMT-R). Language, Phonological Awareness, and Reading Test Directory (pp. 1-8). Edmonton, AB: Canadian Centre for Research on Literacy.
Retrieved [insert date] from http://www.uofaweb.ualberta.ca/elementaryed/ccrl.cfm.