Measurement and RTI Overview - College of Education & Human

RTI Measurement Overview:
Measurement Concepts for RTI
Decision Making
A module for pre-service and in-service
professional development
MN RTI Center
Author: Lisa H. Stewart, PhD
Minnesota State University Moorhead click on RTI Center
MN RTI Center Training Modules
This module was developed with funding from the MN legislature
It is part of a series of modules available from the MN RTI Center
for use in preservice and inservice training:
Module Title - Author(s)
1. RTI Overview - Kim Gibbons & Lisa Stewart
2. Measurement and RTI Overview - Lisa Stewart
3. Curriculum Based Measurement and RTI - Lisa Stewart
4. Universal Screening (Benchmarking): (Two parts) - Lisa Stewart
   What, Why and How
   Using Screening Data
5. Progress Monitoring: (Two parts) - Lisa Stewart & Adam Christ
   What, Why and How
   Using Progress Monitoring Data
6. Evidence-Based Practices - Ann Casey
7. Problem Solving in RTI - Kerry Bollman
8. Differentiated Instruction - Peggy Ballard
9. Tiered Service Delivery and Instruction - Wendy Robinson
10. Leadership and RTI - Jane Thompson & Ann Casey
11. Family Involvement and RTI - Amy Reschly
12. Five Areas of Reading - Kerry Bollman
13. Schoolwide Organization - Kim Gibbons
Purpose(s) of assessment
Characteristics of effective measurement for RTI
Critical features of measurement and RTI in the
areas of screening, progress monitoring, and
diagnostic instructional planning
CBM/GOMs as a frequently used RTI
measurement tool
Multiple sources of information and convergence
Why Learn About Measurement?
“In God we trust…
All others must have data.”
Dr. Stan Deno
One of the Key Components in RTI
Curriculum and
School Wide
Organization &
Problem Solving
(Teams, Process, etc)
DRAFT May 27, 2009
Adapted from Logan City School District, 2002
Measurement and Assessment
Schools have to make many choices about
measurement tools and the process of
gathering information used to make
decisions (assessment)
We need different measurement tools for
different purposes
Some Purposes of Assessment
Diagnostic - instructional planning
Monitoring student progress (formative)
Evaluation (summative)
Screening
Standardized measures given to all students to:
Help identify students at-risk in a PROACTIVE way
Give feedback to the system about how students progress
throughout the year at a gross (e.g., 3x per year) level
 If students are on track in the fall are they still on track in the
 What is happening with students who started the year below
target, are they catching up?
Give feedback to the system about changes from year to year
Is our new reading curriculum having the impact we were
Diagnosis/Instructional Planning
Measures given to understand a student’s skill
level (strengths and weaknesses) help guide:
Instructional grouping
Where to place the student in the curriculum &
curricular materials
What skills are missing or weak and may need to be
retaught or practiced and the level of support and
explicitness needed
Development or selection of curriculum and targeted
Monitoring Student Progress (Formative)
Informally this happens all the time and helps teachers adjust
their teaching on the spot
More formalized progress monitoring involves standardized
measures, tied to important educational outcomes, and given
frequently (e.g. weekly) to:
Prompt you to change what you are doing with a student if it is not
working (formative assessment) so you are effective and efficient
with your time and instruction
Make decisions about instructional goals, materials, levels, and groups
Aid in communication with parents
Document progress for special education students as required for
periodic and annual reviews
Evaluation (Summative)
Measures used to provide a snapshot or
summary of student skill at one particular
point in time, often at the end of the
instructional year or unit
E.g. state high stakes tests
"When the cook tastes the soup, that’s
formative; when the guests taste the soup,
that’s summative."
One Test Can Serve More Than One Purpose
To the extent a test does more than one
thing well, it is a more efficient use of
student time and school resources
Example 1: Reading CBM measures of Oral
Reading Fluency can be used for screening and
progress monitoring
Example 2: the NWEA (MAP) test may be used for
screening and instructional planning
On Measurement Overview Purposes of
Assessment Worksheet
Make a list of all the tests you have learned about or
have seen used in the school setting (or are
currently in use in your school)
Try to decide what purpose(s) each test served
Assessment Tools and Purpose(s)
Name of Test
(Screening, Instructional Planning, Progress Monitoring, Program Eval.)
Buyer Beware
Although it is good if a test can serve more than one
purpose, just because a test manual or advertisement
SAYS it is useful for multiple purposes, doesn’t mean
the test actually IS useful for multiple purposes
Example: Many tests designed for diagnostic purposes or for
summative evaluation state they are also useful for progress
monitoring, but are too time consuming, too costly, too
unreliable, or too insensitive to changes in student skills to be
of practical use for progress monitoring
Establishing a Measurement System
A core feature of RTI is identifying a
measurement system
Screen large numbers of students
Identify students in need of additional intervention
Monitor students of concern more frequently
1 to 4x per month
Typically weekly
Diagnostic testing used for instructional planning to
help target interventions as needed
Characteristics of An Effective
Measurement System for RTI
easily understood
can be given often
sensitive to growth over
short periods of time
Credit: K Gibbons, M Shinn
Technical Characteristics of
Measurement Tools
Reliability- the consistency of the measure
If tested again right away or by a different person
or with an alternate equivalent form of the test,
the score should be similar
Allows us to have confidence in the score and use
the score to generalize what we see today to other
times and situations
If a student knows how to decode simple words on a sheet
of paper at 8am this morning, we would expect him to be
able to decode similar simple words at noon… and the next
Why is Reliability so Important?
Assume you have a test that decides whether or
not you need to take (and pay for) a remedial
math class in college that does not count toward
The test average score is 50 points.
The test has a “cut off” score of 35, so students who
score below 35 have to take the remedial class.
Why is Reliability so Important? (Cont’d)
If the test is reliable, and you get a score of 30, if you take
another version of the test or take the test again a week
later (without major studying or changing what you know!)
you would likely get a score very close to 30….
If the test is not reliable, and you get a score of 30…You
might be able to take the test again or take another
version of the test and get a score of 40…or a score of 20!
If the test is unreliable we can’t have much faith in the score
and it becomes difficult to use the test to make decisions!
But what if the test IS reliable and you get a score of
30 but your math skills are much better than the
score implies? What if you get a score of 30 but you
don’t really need a remedial math class?
Then the test has an issue with VALIDITY
A test is valid only if the interpretation of the test scores are
A common definition of validity is that “the test measures what
it says it measures”
Another definition is that a test is valid if it helps you make
better decisions or leads to better outcomes than if you had
never given the test
Types of Validity
There are many ways to try to demonstrate validity:
Content validity
Criterion related validity: concurrent and predictive
Treatment Validity
Construct Validity
Types of Validity (Cont’d)
Content validity
The test content is reasonable
Criterion related validity: two types
Concurrent- the scores from this test are similar to
scores from other tests that measure the
same/similar thing
Predictive- the test scores from this test do a
pretty good job of letting us know what score a
student will get on another test in the future
Types of Validity (Cont’d)
Treatment Validity
If you use this test to decide about some treatment
or intervention or instructional approach….
Do you make better decisions?
Do you have better goals? Planning? Student
Most importantly: Are the outcomes for your students
Types of Validity (Cont’d)
Construct Validity
Does the test measure the theoretical trait or
E.g. If the theory says children need to have a base of solid
decoding skills before they will be fast and fluent readers of
new text, do the scores on the reading test of decoding and
fluency support that?
All other ways to try to document validity are in
some way also addressing construct validity
(content, criterion, treatment, etc.)
The NOT Validity Kind of Validity
Face validity is NOT really validity
 Positive: It “looks” good
Negative: I just don’t like it
Just because a test looks good or you (or your colleague)
like to give it does not mean it gives you good information
or is the best test to use
Just because a test isn’t set up exactly how you like it does
not mean it does NOT give you good information
Look for EVIDENCE of reliability and validity, don’t
rely on your reaction, or the reactions and
testimonials of colleagues, alone.
Reliability and Validity
Just because a test is reliable does not mean it is valid
It may reliably give you an inaccurate score!
If a test is not reliable, it cannot be valid
No test or test score is perfectly reliable
We use test scores to help make a variety of
decisions-- some “low stakes” and some “high stakes”
So how reliable is “reliable enough”?
It depends ….
Measuring Reliability and Validity
Typically reliability and validity evidence involves
comparing the test to itself or to other tests or
The statistic used to sum up that comparison is often
a correlation ( r )
Correlations can range from r = -1.0 to 1.0; reliability and validity
coefficients usually fall between 0.0 and 1.0
The closer a correlation is to 1.0 the “stronger” the
relationship or the better you can predict one score or
outcome if you know the other one
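A minimal sketch of how such a correlation could be computed for test-retest reliability. The function name pearson_r and the five students' scores are hypothetical and used only for illustration; published reliability studies use far larger samples.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary to compute a correlation")
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for five students on two administrations of the same test
first_try = [30, 42, 55, 61, 48]
second_try = [32, 40, 57, 60, 50]
test_retest_r = pearson_r(first_try, second_try)
```

Here the two sets of scores rank the students almost identically, so test_retest_r comes out close to 1.0.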
How Reliable is Reliable Enough?
For important INDIVIDUAL decisions? r = .90
For SCREENING decisions? r = .80
Salvia & Ysseldyke, 2006
“Reliability is like money, as long as you have
it, it’s not a problem, but if you don’t, it’s a BIG
problem!” ~ Fred Kerlinger
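The two minimums on this slide can be written as a simple lookup. This is only a sketch; the names MIN_RELIABILITY and reliable_enough are hypothetical, and the thresholds are rules of thumb rather than hard limits.

```python
# Minimum reliability coefficients from this slide (Salvia & Ysseldyke, 2006)
MIN_RELIABILITY = {"individual": 0.90, "screening": 0.80}

def reliable_enough(r, decision_type):
    """Return True if reliability r meets the slide's minimum for this decision type."""
    if decision_type not in MIN_RELIABILITY:
        raise ValueError(f"unknown decision type: {decision_type}")
    return r >= MIN_RELIABILITY[decision_type]
```

For example, a test with r = .85 would pass for screening decisions but not for important individual decisions.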
How Valid is Valid Enough?
Little/no validity
Below average
Average validity
Above average
Exceptional validity
Source: Webb, M. W. (1983). Journal of Reading, 26(5), 414-424.
Looking at Validity With a Purpose in Mind
Predictive Validity is really important if you
are using the test as a screening tool to
predict which students are at risk or not at
risk of reading difficulty
Treatment validity is really important if you
are using the test in an effort to lead to some
sort of improved outcome
Validity isn’t Just About the Test
Validity has to do with the test use and
interpretation, so even a “valid” test can be used
for the wrong reasons, misinterpreted, or misused
Example 1: A test score for an ELL student should
reflect the student’s skills, not her ability to understand
the directions and what is being asked
Example 2 on next slide
Validity isn’t Just About the Test (Cont’d)
Example 2: Letter Naming Fluency (LNF)
LNF involves giving a student a page of randomized upper and
lower case letters and having the student name as many letters as
they can in one minute.
As a test of early literacy, LNF has good reliability and concurrent
and predictive validity, especially predictive validity
However, it can be easily MISUSED—
If interpreted correctly, LNF can identify students at risk for early reading
difficulty and get those students into well-rounded early literacy instruction
well suited to them,
BUT, if it is interpreted to mean that a student low in LNF needs to just
have a lot of instructional time spent only learning letter names (often
taking time away from high quality well-rounded early literacy instruction) it
can actually have a negative impact.
Test Utility
Is it easy to use, time efficient, and cheap? 
Even if a test is reliable and valid, if it is too difficult
to use, too time consuming, or too expensive it just
won’t get used
If a reliable and valid progress monitoring tool took 30
minutes per child and you wanted to monitor 10 students in
your class every week, would you use it?
However, if a test is easy and short and cheap… but
isn’t reliable or valid… it’s still a waste of time, no
matter how short!
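The time cost in the example above is easy to work out. A small sketch, where the function name weekly_testing_minutes and the 1-minute comparison tool are assumptions added for illustration:

```python
def weekly_testing_minutes(minutes_per_student, students, times_per_week=1):
    """Total teacher minutes spent on progress monitoring in one week."""
    return minutes_per_student * students * times_per_week

# The slide's scenario: a 30-minute tool, 10 students, monitored weekly
long_tool = weekly_testing_minutes(30, 10)   # 300 minutes, i.e. 5 hours
# Hypothetical comparison: a 1-minute measure for the same 10 students
short_tool = weekly_testing_minutes(1, 10)   # 10 minutes
```

Five hours of testing per week is time taken away from instruction, which is why a long test is unlikely to be used for weekly monitoring.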
Test Utility (Cont’d)
Is it sensitive enough for the decisions you
want to make?
Can it detect the differences between groups of
kids or within an individual that you need to help
you make a decision?
If a progress monitoring tool can only show gains of 1 point
per month, is it sensitive enough to help give you timely
feedback on the student’s response to your instruction?
On “Characteristics of Assessment Tools for RTI”
Make a list of tests you have learned about or have seen used
in the school setting (or are currently in use in your school)
Can use all or some of the tools from the Purposes of
Assessment Worksheet for your list
Is the test reliable and valid FOR THE PURPOSE IT IS BEING USED?
Is it quick and simple?
Is it inexpensive?
Can it be given often (has alternate forms, etc)?
Is it sensitive?
Characteristics of Assessment Tools for RTI
Name of Test | Quick & Simple | Can Be Given Often | Sensitive to Growth Over Short Time
Some Help in Looking for Evidence
Measurement tools are reviewed at the
following sites:
 These sites only review tests submitted, if it is not on the list
it doesn’t mean it is bad, just that it wasn’t reviewed
 Be sure you know the purpose of assessment (screening,
progress monitoring, etc) to best interpret the information
Critical Features of Measurement and RTI
Progress Monitoring
Diagnostic Instructional Planning
Measurement and RTI: Screening
Reliability coefficients of at least r =.80. Higher is
better, especially for screening specificity.
Well documented predictive validity
Evidence the criterion (cut score) being used is
reasonable and creates not too many false positives
(students identified as at risk who aren’t) or false
negatives (students who are at risk who aren’t
identified as such)
Brief, easy to use, affordable, and results/reports are
accessible almost immediately
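One way to check whether a cut score is reasonable is to count false positives and false negatives against what later happened to the same students. A minimal sketch, assuming students scoring below the cut score are flagged as at risk; the function name, scores, and outcomes are hypothetical.

```python
def screening_accuracy(screen_scores, truly_at_risk, cut_score):
    """Count screening outcomes for a given cut score.

    A student is flagged as at risk when the screening score is below cut_score.
    truly_at_risk holds the later, known status for the same students.
    """
    counts = {"true_pos": 0, "false_pos": 0, "true_neg": 0, "false_neg": 0}
    for score, at_risk in zip(screen_scores, truly_at_risk):
        flagged = score < cut_score
        if flagged and at_risk:
            counts["true_pos"] += 1
        elif flagged and not at_risk:
            counts["false_pos"] += 1   # identified as at risk but wasn't
        elif not flagged and at_risk:
            counts["false_neg"] += 1   # at risk but not identified
        else:
            counts["true_neg"] += 1
    return counts

# Hypothetical fall screening scores and later outcomes for six students
scores = [12, 25, 31, 38, 44, 52]
later_struggled = [True, True, False, True, False, False]
result = screening_accuracy(scores, later_struggled, cut_score=35)
```

With these made-up numbers, a cut score of 35 gives one false positive (the student who scored 31) and one false negative (the student who scored 38). Raising or lowering the cut score trades one kind of error for the other.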
Measurement and RTI: Progress Monitoring
Reliability coefficients of r=.90+
Because you are looking at multiple data points
over time, it is possible to use a test with a lower
reliability (e.g. .80-.90), but wait until you have
several data points and use the combined data to
increase confidence in your decisions
Well documented treatment validity!
Measurement and RTI: Progress Monitoring (Cont’d)
Test and scores are very sensitive to increases
or decreases in student skills over time
Evidence of what slope of progress (how much
growth in a day, week or a month) is typical under
what conditions can greatly increase your ability to
make decisions
VERY brief, easy to use, affordable, alternate
forms, and results/reports are accessible
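A slope of progress can be estimated from a series of progress monitoring scores. The sketch below uses an ordinary least-squares line, which is one common choice and not necessarily the method a given tool uses; it assumes scores were collected at equal weekly intervals, and the function name and scores are hypothetical.

```python
def weekly_slope(scores):
    """Least-squares slope of scores collected at equally spaced weekly checks."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two data points")
    weeks = range(n)  # week 0, 1, 2, ...
    mean_w = sum(weeks) / n
    mean_s = sum(scores) / n
    num = sum((w - mean_w) * (s - mean_s) for w, s in zip(weeks, scores))
    den = sum((w - mean_w) ** 2 for w in weeks)
    return num / den

# Hypothetical words-correct-per-minute scores from six weekly checks
wcpm = [40, 42, 41, 45, 46, 48]
growth_per_week = weekly_slope(wcpm)
```

For these scores the slope works out to 1.6 words correct per minute per week, which could then be compared with the growth that is typical under similar conditions.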
Measurement and RTI: Diagnostic
Assessment for Instructional Planning
Reliability coefficients of r =.80+ ASSUMING you are open
to changing the instruction (formative assessment) if your
planning didn’t work out as you thought it might
Aligned with research on the development and teaching of
Well documented treatment validity, utility for instructional
Time and cost efficient but specific enough to be useful for
designing effective interventions
Linked to standards and curriculum scope and sequence
Measurement and RTI: Diagnostic Assessment
for Instructional Planning (Cont’d)
Many instructional planning tools have limited
information on reliability and validity—Look for tools
that do have data.
If creating your own tests, use best practices in test
Overall be sure you are doing standardized frequent
progress monitoring and looking at student engaged
time as other sources of information to ensure
instruction is well planned.
RTI, General Outcome Measures and
Curriculum Based Measurement
Many schools use Curriculum Based Measurement (CBM)
general outcome measures for screening and progress monitoring
You don’t “have to” use CBM, but many schools do
Most common CBM tool in Grades 1-8 is Oral Reading Fluency
Measure of reading rate (# of words correct per minute on a grade
level passage) and a strong indicator of overall reading skill,
including comprehension
Early Literacy Measures are also available such as Nonsense
Word Fluency (NWF), Phoneme Segmentation Fluency (PSF),
Letter Name Fluency (LNF) and Letter Sound Fluency (LSF)
Typically meet the criteria needed for RTI screening
and progress monitoring
Reliable, valid, specific, sensitive, practical
Also, some utility for instructional planning (e.g., grouping)
They are INDICATORS of whether there might be a
problem, not diagnostic!
Like taking your temperature or sticking a toothpick into a cake
Oral reading fluency is a great INDICATOR of reading
decoding, fluency and reading comprehension
Fluency based because automaticity helps discriminate
between students at different points of learning a skill
CBM Oral Reading Fluency
Give 3 grade-level passages using standardized administration
and scoring; use median (middle) score
3-second rule (tell the student the word & point to next word)
Discontinue rule (after 0 correct in first row, if <10 correct on 1st
passage do not give other passages)
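The median rule and part of the discontinue rule can be sketched as follows. The function name orf_score is hypothetical, and reporting the first passage score when testing is discontinued is an assumption, since the slide does not say which score to record in that case.

```python
from statistics import median

def orf_score(passage_scores):
    """Return the reported ORF score from words-correct-per-minute on up to 3 passages."""
    if not passage_scores:
        raise ValueError("need at least one passage score")
    first = passage_scores[0]
    if first < 10:
        # Discontinue rule: fewer than 10 correct on the 1st passage,
        # so the other passages are not given (assumption: report the 1st score)
        return first
    if len(passage_scores) != 3:
        raise ValueError("expected three passage scores")
    # Median (middle) score of the three passages
    return median(passage_scores)
```

For example, scores of 52, 47, and 60 on the three passages give a reported score of 52. Using the median keeps one unusually easy or hard passage from distorting the result.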
Not Errors
Hesitation for >3 seconds
Incorrect pronunciation for context
Omitted Words
Words out of order
Repeated Sounds
Skipped Row
Fluency and Comprehension
The purpose of reading is comprehension
A good measure of overall reading
proficiency is reading fluency
because of its strong correlation to
measures of comprehension.
The Importance of Multiple Sources of Information
No ONE test is going to serve all purposes or give you all the
information you need.
Use MULTIPLE sources of data to make the best decisions
Screening, progress monitoring, diagnostic, and evaluative
data from multiple sources and/or across time
Teacher observation and more formal observations
Other pieces of relevant information such as behavior,
attendance, health, the curriculum and instructional
environment, etc.
Look for CONVERGENCE of data- places where several
sources of data point to the same decision or conclusion
Articles Available with this Module
Shoemaker, J. (2006). Reliability and Validity
Stats “crib sheet” from Heartland AEA (Iowa)
Traditional and Modern Concepts of Validity. ERIC/AE
Also see articles specific to particular uses of
measurement in benchmark and progress monitoring
Recommended Resources
American Psychological Association, American Educational
Research Association, & National Council on Measurement
in Education. (1985). Standards for educational and
psychological testing. Washington, DC: American
Psychological Association.
Educational Measurement Text, e.g. texts by Hogan,
Marzano, or Salvia & Ysseldyke, or a good Educational
Psychology text that covers reliability, validity and utility of
Web Resource on Measurement
Heartland (Iowa) website link with
powerpoints on common myths and
confusions about assessment
RTI Related Resources
National Center on RTI
RTI Action Network – links for Assessment and
Universal Screening
National Center on Student Progress Monitoring
MN RTI Center
Research Institute on Progress Monitoring
RTI Related Resources (Cont’d)
National Association of School Psychologists
National Association of State Directors of Special
Education (NASDSE)
Council of Administrators of Special Education
Office of Special Education Programs (OSEP) toolkit
and RTI materials
1. A purpose of assessment is what?
 A.) Screening
 B.) Diagnostic
 C.) Progress Monitoring
 D.) Evaluation
 E.) All of the above
2. True or False? A test is useful for multiple purposes
as long as its manual or advertisement says it is.
3. The consistency of the measure is called its what?
A.) Validity
B.) Reliability
C.) Criterion
D.) Sensitivity
4. If the test measures the construct it says it measures it
A.) Validity
B.) Reliability
C.) Criterion
True or False for each statement?
5.) Even if a test is not valid, it can still be reliable.
6.) Even if a test is not reliable, it can still be valid.
7.) Validity is not just about the test—it has to do with the
test use and interpretation, so even a valid test can be
used for the wrong reasons, misinterpreted, or misused.
The End 
Note: The MN RTI Center does not endorse any particular
product. Examples used are for instructional purposes only.
Special Thanks:
Thank you to Dr. Ann Casey, director of the MN RTI Center, for
her leadership
Thank you to Aimee Hochstein, Kristen Bouwman, and Nathan
Rowe, Minnesota State University Moorhead graduate
students, for editing, writing quizzes, and enhancing the quality
of these training materials