THE MAP TEST -or- “You can attach a meaningless number to anything. Even a child.”

The graphs & statistical analysis for this presentation came from http://conceptualmath.org/misc/MAPtest.htm#stddev

This graph shows the most precise results obtained from MAP tests in the classroom of an expert teacher. Yet 10 out of 38 of the growth scores (26%) fall outside a reasonable range. Further, the scores did not match the student performance that the teacher was observing in the classroom.

• MAP is based on the assumption that all students learn information in the same order, and that all schools teach information in the same order.
• The RIT scores (the RIT score is the final number assigned to each student by NWEA’s MAP test) are no more accurate than these assumptions.

What’s RIT? That’s a loaded question…

IRT is Item Response Theory. NWEA uses a specific IRT model conceived by the Danish mathematician Georg Rasch (1901-1980). They’ve apparently rebranded “IRT” as “RIT”. It’s proprietary…

NWEA claims that characteristics of the RIT scale include:
• It is an achievement scale.
• It is an accurate scale.
• It is an equal-interval scale.
• It helps to measure growth over time.
• It has the same meaning regardless of grade or age of the student.

• MAP levels are based on academic levels (curriculum sequencing), not cognitive levels (Bloom's Taxonomy).
• A high RIT score can represent low-level cognition in advanced curriculum material.
• Thus, MAP is not a good measure of gifted performance, and should not be used to guide decisions for gifted programs (as it has been repurposed in Seattle Public Schools).

• MAP does not provide information as to why a student is having difficulty with specific material.
• MAP does not even provide specific information as to what material students need to work on.
• As a result, MAP does not provide useful information that can guide teachers in meeting students' real needs.
• Teachers don’t see the test, and the test isn’t tied to their classes or curricula.

• All decisions based on test results should consider the 2-sigma range. (Sigma, the Greek letter for the standard deviation, is the standard statistical measure of the size of the measurement error within a set of measurements; a low sigma means more believable data. As astronomers say, ”The road to hell is paved with 3-sigma.”)
• NWEA reports sigma to be about 3 RIT points (except for high RIT scores, where sigma is even larger).
• Thus, all decisions should consider the possibility that each reported score could have been 6 points higher, or 6 points lower.
• Many schools that have retested with MAP have seen evidence that sigma is actually larger than 3.

• RIT scores are actually determined by averaging together strand scores.
• Since a typical MAP test has 50 questions and 5 strands, only 10 questions were asked for each strand.
• Uncertainty for a 10-question multiple-choice test is rather large.
• This is where the dream of the statistician meets the reality of the young student: students can’t do tests for hours, and schools don’t have the computer resources for students to do tests all the time. So you ask just a few questions, and you get results with a lot of, well, sigma. (A short simulation of these error bars follows this list.)
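To put rough numbers on these error bars, here is a minimal Python sketch, assuming normally distributed measurement error with the sigma of about 3 that NWEA reports, and the 50-question, 5-strand test layout described above. The student’s true score and the binomial model for a strand score are illustrative assumptions, not NWEA figures.

```python
import numpy as np

rng = np.random.default_rng(0)

SEM = 3.0           # sigma NWEA reports for a RIT score (larger at high RIT)
TRUE_SCORE = 210.0  # a hypothetical student's "true" RIT level (illustrative)

# Simulate many administrations of the same test to the same student.
observed = rng.normal(TRUE_SCORE, SEM, size=100_000)

# The 2-sigma band that any placement decision should allow for: about +/- 6 points.
print(f"2-sigma band: {TRUE_SCORE - 2 * SEM:.0f} to {TRUE_SCORE + 2 * SEM:.0f}")

# How often does the reported score miss by half a year / a full year of growth?
print(f"off by > 3 points (half a year's growth): {np.mean(np.abs(observed - TRUE_SCORE) > 3):.1%}")
print(f"off by > 6 points (a full year's growth): {np.mean(np.abs(observed - TRUE_SCORE) > 6):.1%}")

# Each strand score rests on only ~10 questions (50 questions / 5 strands).
# The standard error of the proportion correct on n questions is sqrt(p(1-p)/n).
p, n = 0.5, 10
print(f"SE of a 10-question strand score (p = 0.5): +/- {np.sqrt(p * (1 - p) / n):.0%}")
```

Even at NWEA’s optimistic sigma of 3, roughly a third of reported scores land more than half a year’s growth away from the student’s true level, and a swing of about ±16 percentage points on a 10-question strand is far too coarse to say which strand a student actually needs help with. If retesting suggests sigma is larger than 3, these fractions only grow.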
One means of checking the reliability of a test is to compare its results to another test. This is particularly important if the goal is to increase the scores on the other test, as would be the case for the high-stakes testing mandated by NCLB. Below is a comparison of the results of a MAP test to the results of an End of Grade (EOG) test required by the state.

Reliability was low: MAP ranked about 5 students a year higher in performance, and 4 students a year lower, than the EOG (23% of the students in total).

• NWEA's technical manual uses r-values to estimate the reliability of the tests. The r-values they report range from 0.76 to 0.93, with most of the values between 0.80 and 0.89.
• But what does this mean for those wishing to use the tests to guide instruction? One can simulate the r-values to estimate what percentage of scores will lie within a reliable range for given r-values (see the first sketch at the end of this section). An r-value of 0.92 could easily mean that over 25% of the data lies more than 6 points away from true. However, 6 points constitutes a year's normal growth for over half of the tests.
• As simulated in the graph above, even with an r-value of 0.92, 29% of the data is in error by more than a year's normal growth. This imprecision would lead to serious errors in identifying student needs, and then to tracking students incorrectly.

• For high RIT scores (scores that are normal for 8th grade and higher), the SEM (standard error of measurement) is larger than a typical year's growth.
• Thus, MAP is not only unable to identify what skills a student may need; MAP is unable to determine with any precision the grade level at which a student is performing.
• For very high RIT scores, negative growth is more common than positive growth (see the second sketch at the end of this section).
• This implies that either MAP is totally unreliable for high performers, or MAP testing reinforces educational strategies that are counterproductive for high achievers.

So, if the MAP test is not so useful, what should teachers look for?

• If your school is considering adopting a standardized test, quiz the salesperson: ask for a demonstration of how the score reporting will help you.
• Give specific, common examples from your experience: "I have students who can solve two-step equations with minimal difficulty, but frequently confuse the properties of negatives. How will the test report let me know what specific help these students need?"
• Be sure the testing company can give good answers to these questions.
• Ask the test salesperson to demonstrate how the test distinguishes between high-level learning and accelerated knowledge.
• Recommend formative assessments that identify specific strengths and weaknesses, as opposed to tests that rank students for tracking purposes.

Parents & teachers:
• Learn more about testing.
• Do not accept the “results” of any single type of assessment at face value. All tests have measurement errors, nowhere more so than in testing students, particularly when those students have no stake in the outcome.
• Test results are commonly used inappropriately by administrators for invalid purposes like student placement and teacher evaluation.
• If you get a bad review based on test results alone, file a grievance. Insist that the administrator demonstrate that the test was both precise and accurate enough to justify the results.
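For readers who want to reproduce the r-value argument, here is a minimal sketch of that kind of simulation, assuming classical test theory (SEM = SD × sqrt(1 − r)) and a population spread of about 20 RIT points. Both assumptions are mine, not NWEA’s, but with them an r-value of 0.92 reproduces the 29% figure quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)

POP_SD = 20.0       # assumed spread of true RIT scores (illustrative, not an NWEA figure)
YEARS_GROWTH = 6.0  # a typical year's growth, per the discussion above

def pct_off_by_a_year(r, n=200_000):
    """Classical test theory: a reliability of r implies SEM = SD * sqrt(1 - r).
    Returns the fraction of observed scores more than a year's growth from true."""
    sem = POP_SD * np.sqrt(1.0 - r)
    true = rng.normal(0.0, POP_SD, size=n)           # students' true scores
    observed = true + rng.normal(0.0, sem, size=n)   # plus measurement error
    return np.mean(np.abs(observed - true) > YEARS_GROWTH)

for r in (0.76, 0.80, 0.89, 0.92, 0.93):
    print(f"r = {r:.2f}: {pct_off_by_a_year(r):5.1%} of scores off by more than a year's growth")
```

Under these assumptions, even the best reported r-value (0.93) still leaves roughly a quarter of scores more than a year’s growth away from true, and the lower end of NWEA’s reported range leaves about half.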
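The negative-growth observation can also be checked against the measurement model. Below is a minimal sketch, assuming an SEM of 7 points at high RIT levels (the text above says only that it exceeds a year’s growth of 6) and a genuine gain of 2 points; both numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

SEM_HIGH = 7.0   # assumed SEM at high RIT levels (above a year's growth, per the text)
TRUE_GAIN = 2.0  # assumed modest true gain for a high performer (illustrative)
n = 200_000

pre = rng.normal(0.0, SEM_HIGH, size=n)         # fall score: true level + error
post = rng.normal(TRUE_GAIN, SEM_HIGH, size=n)  # spring score: true level + gain + error
growth = post - pre                             # reported growth is the difference
                                                # of two noisy measurements

print(f"reported negative growth:       {np.mean(growth < 0):.1%}")
print(f"reported growth under 6 points: {np.mean(growth < 6):.1%}")
```

Noise alone makes reported negative growth nearly a coin flip (about 42% here) even when the student genuinely improved. For negative growth to be more common than positive, either the true growth must be zero or negative, or the test must misbehave at the top of the scale, which is exactly the dilemma stated above.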