Webinar PowerPoint Slides - Center on Response to Intervention

Iowa’s Application of Rubrics to
Evaluate Screening and Progress
Tools
John L. Hosp, PhD
University of Iowa
Overview of this Webinar
• Share rubrics for evaluating screening and progress monitoring tools
• Describe the process the Iowa Department of Education used to apply the rubrics
Purpose of the Review
• Survey universal screening and progress monitoring tools currently being used by LEAs in Iowa
• Review these tools for technical adequacy
• Incorporate one tool into the new state data system
• Provide access to tools for all LEAs in the state
Collaborative Effort
The National Center on
Response to Intervention
Structure of the Review Process
• Core Group: IDE staff responsible for administration and coordination of the effort
• Vetting Group: Other IDE staff as well as stakeholders from LEAs, AEAs, and IHEs from across the state
• Work Group: IDE and AEA staff who conducted the actual reviews
Overview of the Review Process
• The work group was divided into three groups:
▫ Group A – Key elements of tools: name, what it measures, grades it is used with, how it is administered, cost, time to administer
▫ Group B – Technical features: reliability, validity, classification accuracy, relevance of the criterion measure
▫ Group C – Application features: alignment with the Iowa CORE, training time, computer system feasibility, turnaround time for data, sample, disaggregated data
• Within each group, members worked in pairs
Overview of the Review Process
• Each pair:
▫ had a copy of the materials needed to conduct the
review
▫ reviewed and scored their parts together and then
swapped with the other pair in their group
• Pairs within each group met only if there were
discrepancies in scoring
▫ A lead person from one of the other groups
participated to mediate reconciliation
• This allowed each tool to be reviewed by every work
group member
Overview of the Review Process
• All reviews will be completed and brought to a
full work group meeting
• Results will be compiled and shared
• Final determinations across groups for each tool
will be shared with the vetting group two weeks
later
• The vetting group will have one month to review
the information and provide feedback to the
work group
Structure and Rationale of Rubrics
• Separate rubrics for universal screening and
progress monitoring
▫ Many tools reviewed for both
▫ Different considerations
• Common header and descriptive information
• Different criteria for each group (A, B, C)
Universal Screening Rubric
Header on cover page
Iowa Department of Education
Universal Screening Rubric for Reading (Revised 10/24/11)
What is a Universal Screening Tool in Reading:
It is a tool that is administered at school with ALL students to identify which students are at risk for reading failure on an outcome measure. It
is NOT a placement screener and would not be used with just one group of students (e.g., a language screening test).
Why use a Universal Screening Tool:
It tells you which students are at-risk for not performing at the proficient level on an end of year outcome measure. These students need
something more and/or different to increase their chances of becoming a proficient reader.
What feature is most critical:
Classification Accuracy because it provides a demonstration of how well a tool predicts who may and may not need something more. It is
critical that Universal Screening Tools identify the correct students with the greatest degree of accuracy so that resources are allocated
appropriately and students who need additional assistance get it.
Group A

Name of Screening Tool:
Skill/Area Assessed with Screener:
Grades: (circle all that apply) K 1 2 3 4 5 6 Above 6
How Screener Administered: (circle one) Group or Individual
Information Relied on to make determinations: (circle all that apply, minimum of two)
 Manual from publisher
 NCRtI Tool Chart
 Buros/Mental Measurement Yearbook
 On-Line publisher Info.
 Outside Resource other than Publisher or Researcher of Tool

Criteria: Cost (minus administrative fees like printing)
Justification: Tools need to be economically viable, meaning the cost would be considered "reasonable" for the state or a district to use. Funds that are currently available can be used and can be sustained. One-time funding to purchase something would not be considered sustainable.
Score 3: Free
Score 2: $.01 to $1.00 per student
Score 1: $1.01 to $2.00 per student
Score 0: $2.01 to $2.99 per student
Kicked out if: $3.00 or more per student

Criteria: Student time spent engaged with tool
Justification: The amount of student time required to obtain the data. This does not include set-up and scoring time.
Score 3: ≤ 5 minutes per student
Score 2: 6 to 10 minutes per student
Score 1: 11 to 15 minutes per student
Kicked out if: > 15 minutes per student
Group B

Criteria: Criterion Measure used for Classification Accuracy (Sheet for Judging Criterion Measure)
Justification: The measure that is being used as a comparison must be determined to be appropriate as the criterion. In order to make this determination, several features of the criterion measure must be examined.
Score 3: 15-12 points on criterion measure form
Score 2: 11-8 points on criterion measure form
Score 1: 7-4 points on criterion measure form
Score 0: 3-0 points on criterion measure form
Kicked out if: Same test but uses a different subtest or composite, OR same test given at a different time

Criteria: Classification Accuracy (Sheet for Judging Classification Accuracy for Screening Tool)
Justification: Tools need to demonstrate they can accurately determine which students are in need of assistance based on current performance and predicted performance on a meaningful outcome measure. This is evaluated with Area Under the Curve (AUC), Specificity, and Sensitivity.
Score 3: 9-7 points on classification accuracy form
Score 2: 6-4 points on classification accuracy form
Score 1: 3-1 points on classification accuracy form
Score 0: 0 points on classification accuracy form
Kicked out if: No data provided

Criteria: Criterion Measure used for Universal Screening Tool (Sheet for Judging Criterion Measure)
Justification: The measure that is being used as a comparison must be determined to be appropriate as the criterion. In order to make this determination, several features of the criterion measure must be examined.
Score 3: 15-12 points on criterion measure form
Score 2: 11-8 points on criterion measure form
Score 1: 7-4 points on criterion measure form
Score 0: 3-0 points on criterion measure form
Kicked out if: Same test but uses a different subtest or composite, OR same test given at a different time
Judging Criterion Measure
Additional Sheet for Judging the External Criterion Measure (Revised 10/24/11)
Name of Criterion Measure: Gates
How Criterion Administered: (circle one) Group or Individual
Used for: (circle all that apply)
 Screening: Classification Accuracy
 Screening: Criterion Validity
 Progress Monitoring: Criterion Validity
Information Relied on to make determinations: (circle all that apply)
 Manual from publisher
 NCRtI Tool Chart
 Buros/Mental Measurement Yearbook
 On-Line publisher info.
 Outside Resource other than Publisher or Researcher of Measure
1. An appropriate Criterion Measure is:
a) External to the screening or progress monitoring tool
b) A broad skill rather than a specific skill
c) Technically adequate for reliability
d) Technically adequate for validity
e) Validated on a broad sample that would also represent Iowa's population
Judging Criterion Measure (cont)

Feature: a) External to the Screening or Progress Monitoring Tool
Justification: The criterion measure should be separate from and not related to the screening or progress monitoring tool, meaning the outside measure should be by a different author/publisher and use a different sample (e.g., NWF can't predict ORF by the same publisher).
Higher score: External with no/little overlap; different author/publisher and standardization group
Lower score: External with some/a lot of overlap; same author/publisher and standardization group
Kicked Out: Internal (same test using a different subtest or composite, OR same test given at a different time)

Feature: b) A broad skill rather than a specific skill
Justification: We are interested in generalizing to a larger domain; therefore, the criterion measure should assess a broad area rather than splinter skills.
Score 3: Broad reading skills are measured (e.g., total reading score on ITBS)
Score 2: Broad reading skills are measured but in one area (e.g., comprehension made up of two subtests)
Score 1: Specific skills measured in two areas (e.g., comprehension and decoding)
Score 0: Specific skill measured in one area (e.g., PA, decoding, vocabulary, spelling)
Judging Criterion Measure (cont)

Feature: c) Technically adequate for Reliability
Justification: Student performance needs to be consistently measured. This is typically demonstrated with reliability across different items (alternate form, split half, coefficient alpha).
Score 3: Some form of reliability above .80
Score 2: Some form of reliability between .70 and .80
Score 1: Some form of reliability between .60 and .70
Score 0: All forms of reliability below .50

Feature: d) Technically adequate for Validity
Justification: The tool measures what it purports to measure. We focused on criterion-related validity to make this determination: the extent to which this criterion measure relates to another external measure that is determined to be good. (A sketch of computing such a coefficient follows this sheet.)
Score 3: Criterion ≥ .70
Score 2: Criterion .50-.69
Score 1: Criterion .30-.49
Score 0: Criterion .10-.29

Feature: e) A broad sample is used
Justification: The sample used in determining the technical adequacy of a tool should represent a broad audience. While a representative sample by grade is desirable, it is often not reported; therefore, taken as a whole, does the population used represent all students, or is it specific to a region or state?
Score 3: National sample
Score 2: Several states (3 or more) across more than one region
Score 1: States (3, 2, or 1) in one region
Score 0: Sample of convenience; does not represent a state
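A minimal sketch of how a criterion-related validity (or alternate-form reliability) coefficient like the ones judged above can be computed: it is simply the Pearson correlation between scores on the tool under review and scores on the external criterion measure. The scores below are made up for illustration.

```python
# A minimal sketch (made-up scores): a criterion-related validity coefficient is
# the Pearson correlation between scores on the tool being reviewed and scores
# on the external criterion measure.
import numpy as np

screening_scores = [14, 22, 9, 31, 18, 27, 12, 25, 20, 16]   # hypothetical screener scores
criterion_scores = [180, 205, 160, 240, 195, 220, 170, 215, 200, 185]  # hypothetical criterion scores

r = np.corrcoef(screening_scores, criterion_scores)[0, 1]
print(f"Criterion coefficient r = {r:.2f}")  # compare against the >= .70 cutoff above
```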
Judging Classification Accuracy
Additional Sheet for Judging Classification Accuracy for Screening Tool (Revised 10/24/11)
Assessment: (include name and grade)
Complete the Additional Sheet for Judging the Criterion Measure. If the criterion measure is not kicked out, complete the review for:
1) Area Under the Curve (AUC)
2) Specificity/Sensitivity
3) Lag time between when the assessments are given

1) Area Under the Curve (AUC)
Feature: Technical Adequacy is Demonstrated for Area Under the Curve
Justification: Area Under the Curve is one way to gauge how accurately a tool identifies students in need of assistance. It is derived from Receiver Operating Characteristic (ROC) curves and is presented as a number to 2 decimal places. One AUC is reported for each comparison: each grade level, each subgroup, each outcome tool, etc. (See the sketch below.)
Score 3: AUC ≥ .90
Score 2: AUC ≥ .80
Score 1: AUC ≥ .70
Score 0: AUC < .70
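A minimal sketch, with made-up numbers, of how an AUC value like those judged above can be computed, here using scikit-learn's roc_auc_score. The outcome labels and screening scores are hypothetical.

```python
# A minimal sketch (hypothetical numbers): computing AUC for a screening tool
# against a yes/no outcome (1 = at risk on the criterion outcome measure).
from sklearn.metrics import roc_auc_score

outcome_at_risk = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]           # criterion result per student
screening_score = [12, 9, 24, 8, 30, 27, 22, 35, 26, 31]   # lower screener score = more risk

# roc_auc_score expects higher scores to indicate the positive class, so the
# screening scores are negated because lower scores signal greater risk here.
auc = roc_auc_score(outcome_at_risk, [-s for s in screening_score])
print(f"AUC = {auc:.2f}")   # about 0.96 for these made-up numbers; report to 2 decimals
```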
Judging Classification Accuracy (cont)

2) Specificity or Sensitivity
Feature: Technical Adequacy is Demonstrated for Specificity or Sensitivity (see below)
Justification: Specificity/Sensitivity is another way to gauge how accurately a tool identifies students in need of assistance. Specificity and Sensitivity can give the same information depending on how the developer reported the comparisons. Sensitivity is often reported as accuracy of positive prediction (yes on both tools). Therefore, if the developer predicted positive/proficient performance, Sensitivity will express how well the screening tool identifies students who are proficient. If the developer predicted at-risk or non-proficient performance, that is what Sensitivity shows. It is important to verify what the developer is predicting so that consistent comparisons across tools can be made (see below).
Score 3: Sensitivity or Specificity ≥ .90
Score 2: Sensitivity or Specificity ≥ .85
Score 1: Sensitivity or Specificity ≥ .80
Score 0: Sensitivity or Specificity < .80

3) Lag time between when the assessments are given
Feature: Lag time (length of time between when the criterion and screening assessments are given)
Justification: The time between when the assessments are given should be short to eliminate effects associated with differential instruction.
Score 3: Under two weeks
Score 2: Between two weeks and 1 month
Score 1: Between 1 month and 6 months
Score 0: Over 6 months
Sensitivity and Specificity Considerations and Explanations
Explanations:
True means “in agreement between screening and outcome”. So true can be
negative to negative in terms of student performance (i.e., negative
meaning at-risk or nonproficient). This could be considered either positive
or negative prediction depending on which the developer intends the tool
to predict. As an example, a tool that has a primary purpose of identifying
students at-risk for future failure would probably use ‘true positives’ to
mean ‘those students who were accurately predicted to fail the outcome
test’.
Sensitivity = true positives / (true positives + false negatives)
Specificity = true negatives / (true negatives + false positives)
Key
+ = proficiency/mastery
- = nonproficiency/at-risk
0 = unknown
Shaded cells in the figures indicate which cells enter the Sensitivity and Specificity calculations.
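A minimal worked example, with made-up counts, of the formulas above. It also illustrates the point made in the considerations that follow: relabeling which result counts as "positive" swaps the roles of Sensitivity and Specificity.

```python
# A minimal worked example (made-up counts) of the Sensitivity and Specificity
# formulas, showing that flipping which outcome is treated as "positive" turns
# Sensitivity into Specificity and vice versa.

def sensitivity(tp, fn):
    # true positives / (true positives + false negatives)
    return tp / (tp + fn)

def specificity(tn, fp):
    # true negatives / (true negatives + false positives)
    return tn / (tn + fp)

# Hypothetical 2x2 counts when "positive" means proficient on both screener and outcome:
tp, fn, fp, tn = 60, 10, 15, 15

print("Positive = proficient: Sensitivity =", round(sensitivity(tp, fn), 2),
      " Specificity =", round(specificity(tn, fp), 2))

# Relabel so "positive" means at risk: the true positives become true negatives.
print("Positive = at risk:    Sensitivity =", round(sensitivity(tn, fp), 2),
      " Specificity =", round(specificity(tp, fn), 2))
```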
Consideration 1:
Determine whether developer is predicting a positive outcome (i.e., proficiency, success, mastery, at or
above a criterion or cut score) from a positive performance on the screening tool (i.e., at or above
benchmark or a criterion or cut score) or a negative outcome (i.e., failure, nonproficiency, below a
criterion or cut score) from negative performance on the screening tool (i.e., below a benchmark,
criterion, or cut score). Prediction is almost always positive to positive or negative to negative;
however in rare cases it might be positive to negative or negative to positive.
Figure 1a (2x2 table: Screening [+, -] by Outcome [+, -]): This is an example of positive to positive prediction. In this case, Sensitivity is positive performance on the screening tool predicting a positive outcome.
Figure 1b (the same 2x2 table with the +/- order reversed): This is the opposite prediction, negative to negative as the main focus. In this case, Sensitivity is negative (or at-risk) performance on the screening tool predicting a negative outcome.
Using the same information in these two tables, Sensitivity in the top table will equal Specificity in the second table. Because our purpose is to predict proficiency, in this instance we would use Specificity as the metric for judging.
Consideration 2:
Some developers may include a third category—unknown prediction. If this is the case, it is still
important to determine whether they are predicting a positive or negative outcome because Sensitivity
and Specificity are still calculated the same way.
Figure 2a (3x3 table: Screening [+, 0, -] by Outcome [+, 0, -]): This is an example of positive to positive prediction. In this case, Sensitivity is positive performance on the screening tool predicting a positive outcome. It represents a similar comparison to that in Figure 1a.
Figure 2b (the same 3x3 table with the order reversed): This is the opposite prediction, negative to negative as the main focus. In this case, Sensitivity is negative (or at-risk) performance on the screening tool predicting a negative outcome. It represents a similar comparison to that in Figure 1b.
Using the same information in these two tables, Sensitivity in the top table will equal Specificity in the second table. Because our purpose is to predict proficiency, in this instance we would use Specificity as the metric for judging.
Consideration 3:
In (hopefully) rare cases, the developer will set up the tables in opposite directions (reversing
screening and outcome or using a different direction for the positive/negative for one or both). This
illustrates why it is important to consider which column or row is positive and negative for both the
screening and outcome tools.
Example table (Screening across the columns [+, 0, -], Outcome down the rows [+, 0, -]): Notice that the Screening and Outcome tools are transposed. This makes Sensitivity and Specificity align within rows rather than columns.
Group B (cont)

Criteria: Criterion Validity for Universal Screening Tool (from technical manual)
Justification: Tools need to demonstrate that they actually measure what they purport to measure (i.e., validity). We focused on criterion-related validity because it is a determination of the relation between the screening tool and a meaningful outcome measure.
Score 3: Criterion ≥ .70
Score 2: Criterion .50-.69
Score 1: Criterion .30-.49
Score 0: Criterion .10-.29
Kicked out if: Criterion < .10 or no information provided

Criteria: Reliability for Universal Screening Tool
Justification: Tools need to demonstrate that the test scores are stable across items and/or forms. We focused on alternate form, split half, and coefficient alpha. (See the coefficient alpha sketch below.)
Score 3: Alternate form > .80; Split-half > .80; Coefficient alpha > .80
Score 2: Alternate form > .70; Split-half > .70; Coefficient alpha > .70
Score 1: Alternate form > .60; Split-half > .60; Coefficient alpha > .60
Score 0: Alternate form > .50; Split-half > .50; Coefficient alpha > .50
Kicked out if: There is no evidence of reliability

Criteria: Reliability across raters for Universal Screening Tool
Justification: How reliable scores are across raters is critical to the utility of the tool. If the tool is complicated to administer and score, it can be difficult to train people to use it, leading to different scores from person to person.
Score 3: Rater ≥ .90
Score 2: Rater .89-.85
Score 1: Rater .84-.80
Score 0: Rater ≤ .75
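A minimal sketch of one of the internal-consistency estimates named above, coefficient (Cronbach's) alpha. The item scores are simulated solely for illustration.

```python
# A minimal sketch (simulated item scores): Cronbach's coefficient alpha, one of
# the reliability estimates named in the rubric row above.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: rows = students, columns = items (0/1 or point scores)."""
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of the item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of students' total scores
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

# Simulate 200 students answering 10 related 0/1 items (purely illustrative numbers).
rng = np.random.default_rng(1)
ability = rng.normal(0, 1, 200)
items = np.column_stack(
    [(ability + rng.normal(0, 1, 200) > 0).astype(float) for _ in range(10)]
)

print(f"coefficient alpha = {cronbach_alpha(items):.2f}")  # compare with the > .80 cutoff
```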
Group C

Criteria: Alignment with Iowa CORE / Demonstrated Content Validity
Justification: It is critical that tools assess skills identified in the Iowa Core.
Literature & Informational:
 Key Ideas & Details
 Craft & Structure
 Integration of Knowledge & Ideas
 Range of Reading & Level of Text Complexity
Foundational (K-1):
 Print Concepts
 Phonological Awareness
 Phonics and Word Recognition
 Fluency
Foundational (2-5):
 Phonics and Word Recognition
 Fluency
Score 3: Has a direct alignment with the Iowa CORE (provide Broad Area and Specific Skill)
Score 2: Has alignment with the Iowa CORE (provide Broad Area)
Kicked out if: Has no alignment with the Iowa CORE
Group C (cont)

Criteria: Training Required
Justification: The amount of time needed for training is one consideration related to the utility of the tool. Tools that can be learned in a matter of hours and not days would be considered appropriate.
Score 3: Less than 5 hours of training (1 day)
Score 2: 5.5 to 10 hours of training (2 days)
Score 1: 10.5 to 15 hours of training (3 days)
Score 0: Over 15.5 hours of training (4+ days)

Criteria: Computer Application (tool and data system)
Justification: Many tools are given on a computer, which can be helpful if schools have computers, the computers are compatible with the software, and the data reporting can be separated from the tool itself. It is also a viable option if hard copies of the tools can be used when computers are not available.
Score 3: Computer or hard copy of tool available; data reporting is separate
Score 2: Computer application only; data reporting is separate
Score 1: Computer or hard copy of tool available; data reporting is part of the system
Score 0: Computer application only; data reporting is part of the system

Criteria: Data Administration and Data Scoring
Justification: The number of people needed to administer and score the data speaks to the efficiency of how data are collected and the reliability of scoring.
Score 3: Student takes the assessment on a computer and it is automatically scored by the computer at the end of the test
Score 2: Adult administers the assessment to the student and enters the student's responses (in real time) into a computer, and it is automatically scored by the computer at the end of the test
Score 1: Adult administers the assessment to the student and then calculates a score at the end of the test by conducting multiple steps
Score 0: Adult administers the assessment to the student and then calculates a score at the end of the test by conducting multiple steps AND referencing additional materials to get a score (having to look up information in additional tables)
Group C (cont)

Criteria: Data Retrieval (time for data to be usable)
Justification: The data need to be available in a timely manner in order to use the information to make decisions about students.
Score 3: Data can be used instantly
Score 2: Data can be used the same day
Score 1: Data can be used the next day
Score 0: Data are not available until 2-5 days later
Kicked out if: Takes 5+ days to use data (have to send data out to be scored)

Criteria: A broad sample is used
Justification: The sample used in determining the technical adequacy of a tool should represent a broad audience. While a representative sample by grade is desirable, it is often not reported; therefore, taken as a whole, does the population used represent all students, or is it specific to a region or state?
Score 3: National sample
Score 2: Several states (3 or more) across more than one region
Score 1: States (3, 2, or 1) in one region
Score 0: Sample of convenience; does not represent a state

Criteria: Disaggregated Data
Justification: Viewing disaggregated data by subgroups (i.e., race, English language learners, economic status, special education status) helps determine how the tool works with each group. This information is often not reported, but it should be considered if it is available.
Score 3: Race, economic status, and special education status are reported separately
Score 2: At least two disaggregated groups are listed
Score 1: One disaggregated group is listed
Score 0: No information on disaggregated groups
Progress Monitoring Rubric
Header on cover page
Iowa Department of Education
Progress Monitoring Rubric (Revised 10/24/11)
Why use Progress Monitoring Tools:
They quickly and efficiently provide an indication of a student’s response to instruction. Progress monitoring tools are sensitive to student growth
(i.e., skills) over time, allowing for more frequent changes in instruction. They allow teachers to better meet the needs of their students and
determine how best to allocate resources.
What feature is most critical:
Sufficient number of equivalent forms so that student skills can be measured over time. In order to determine if students are responding positively
to instruction, they need to be assessed frequently to evaluate their performance and the rate at which they are learning.
Descriptive info on each work group's section
Name of Progress Monitoring Tool:
Name of Criterion Measure:
Skill/Area Assessed with Progress Monitoring Tool:
Grades: (circle all that apply) K 1 2 3 4 5 6 Above 6
How Progress Monitoring Administered: (circle one) Group or Individual
How Criterion Administered: (circle one) Group or Individual
Information Relied on to make determinations: (circle all that apply, minimum of two)
 Manual from publisher
 NCRtI Tool Chart
 Buros/Mental Measurement Yearbook
 On-Line publisher Info.
 Outside Resource other than Publisher or Researcher of Tool
Group A

Criteria: Number of equivalent forms
Justification: Progress monitoring requires frequently assessing a student's performance and making determinations based on their growth (i.e., rate of progress). In order to assess students' learning frequently, progress monitoring is typically conducted once a week. Therefore, most progress monitoring tools have 20 to 30 alternate forms.
Score 3: 20 or more alternate forms
Score 2: 15-19 alternate forms
Score 1: 10-14 alternate forms
Score 0: 9 alternate forms
Kicked out if: Fewer than 9 alternate forms

Criteria: Cost (minus administrative fees like printing)
Justification: Tools need to be economically viable, meaning the cost would be considered "reasonable" for the state or a district to use. Funds that are currently available can be used and can be sustained. One-time funding to purchase something would not be considered sustainable.
Score 3: Free
Score 2: $.01 to $1.00 per student
Score 1: $1.01 to $2.00 per student
Score 0: $2.01 to $2.99 per student
Kicked out if: $3.00 or more per student

Criteria: Student time spent engaged with tool
Justification: The amount of student time required to obtain the data. This does not include set-up and scoring time. Tools need to be efficient to use. This is especially true of measures that teachers would be using on a more frequent basis.
Score 3: ≤ 5 minutes per student
Score 2: 6 to 10 minutes per student
Score 1: 11 to 15 minutes per student
Kicked out if: > 15 minutes per student
Group B

Criteria: Forms are of Equivalent Difficulty (need to provide detail of what these are when the review is published)
Justification: Alternate forms need to be of equivalent difficulty to be useful as a progress monitoring tool. Having many forms of equivalent difficulty allows a teacher to determine how the student is responding to instruction, because the change in score can be attributed to student skill rather than a change in the measure. Approaches include readability formulae (e.g., Flesch-Kincaid, Spache, Lexile, FORCAST), Euclidean distance, equipercentiles, and stratified item sampling. (A readability sketch follows this table.)
Score 3: Addressed equating in multiple ways
Score 2: Addressed equating in 1 way that is reasonable
Score 1/0: Addressed equating in a way that is NOT reasonable
Kicked out if: Does not provide any indication of equating forms

Criteria: Judgment of Criterion Measure (see separate sheet for judging criterion measure)
Justification: The measure that is being used as a comparison must be determined to be appropriate as the criterion. In order to make this determination, several features of the criterion measure must be examined.
Score 3: 15-12 points on criterion measure form
Score 2: 11-8 points on criterion measure form
Score 1: 7-4 points on criterion measure form
Score 0: 3-0 points on criterion measure form

Criteria: Technical Adequacy is Demonstrated for Validity of Performance Score (sometimes called Level)
Justification: Performance score is a student's performance at a given point in time rather than a measure of his/her performance over time (i.e., rate of progress). We focused on criterion-related validity to make this determination because it is a determination of the relation between the progress monitoring tool and a meaningful outcome.
Score 3: Criterion ≥ .70
Score 2: Criterion .50-.69
Score 1: Criterion .30-.49
Score 0: Criterion .10-.29
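A minimal sketch of one of the readability approaches named above, the standard Flesch-Kincaid grade-level formula, applied to two hypothetical passages. The syllable counter is a crude vowel-group heuristic used only for illustration; operational equating would use a published implementation.

```python
# A minimal sketch of the Flesch-Kincaid grade-level formula, one way to check
# that alternate passages are of similar difficulty. Syllables are estimated
# with a naive vowel-group heuristic (an assumption for illustration only).
import re

def count_syllables(word: str) -> int:
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

passage_a = "The cat sat on the mat. It was a warm day. The cat slept in the sun."
passage_b = "Migration patterns among shorebirds depend on seasonal temperature changes."

print(f"Passage A grade level: {flesch_kincaid_grade(passage_a):.1f}")
print(f"Passage B grade level: {flesch_kincaid_grade(passage_b):.1f}")
```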
Group B (cont)

Criteria: Technical Adequacy is Demonstrated for Reliability of Performance Score
Justification: Tools need to demonstrate that the test scores are stable across item samples/forms, raters, and time. Across item samples/forms: coefficient alpha, split half, KR-20, alternate forms. Across raters: interrater (i.e., interscorer, interobserver). Across time: test-retest.
Score 3: Item samples/forms ≥ .80; Rater ≥ .90; Time ≥ .80
Score 2: Item samples/forms .79-.70; Rater .89-.85; Time .79-.70
Score 1: Item samples/forms .69-.60; Rater .84-.80; Time .69-.60
Score 0: Item samples/forms ≤ .59; Rater ≤ .75; Time ≤ .59
Kicked out if: Fewer than 2 of the 3 areas are reported (must report 2 of 3), OR there is a score of 0 in 2 or more areas. (No tool would be kicked out due to lack of any one area.)

Criteria: Technical Adequacy is Demonstrated for Reliability of Slope
Justification: The reliability of the slope tells us how well the slope represents a student's rate of improvement. Two criteria are used: the number of observations (that is, student data points needed to calculate the slope) and the coefficient (that is, reliability of the slope). The coefficient should be reported via HLM (also called LMM or MLM) results; if calculated via OLS, the coefficients are likely to be lower.*
Score 3: 10 or more observations/data points; Coefficient > .80
Score 2: 9-7 observations/data points; Coefficient > .70
Score 1: 6-4 observations/data points; Coefficient > .60
Score 0: 3 or fewer observations/data points; Coefficient < .59
Group B (cont)
* HLM=Hierarchical Linear Modeling
LMM=Linear Mixed Modeling
MLM=Multilevel Modeling
OLS=Ordinary Least Squares
HLM, LMM, and MLM are three different ways to describe a similar approach to analysis. Reliability
of the slope should be reported as a proportion of variance accounted for by the repeated
measurement over time. These methods take into account that the data points are actually related
to one another because they come from the same individual. OLS does not take this into account
and as such, would ascribe the extra variation to error in measurement rather than the relation
among data points.
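A minimal sketch, using simulated weekly scores, of the contrast described above: per-student OLS slopes treat a student's repeated scores as independent, whereas a mixed-effects (HLM-style) growth model, here fit with statsmodels' mixedlm, models the dependence among a student's data points and yields the variance components that HLM-based reliability-of-slope estimates are built from. The data, variable names, and sample sizes are assumptions for illustration.

```python
# A minimal sketch (simulated data): per-student OLS slopes versus a mixed-effects
# growth model that accounts for repeated measures nested within students.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulate 30 students monitored weekly for 10 weeks (all values are made up).
records = []
for student in range(30):
    true_slope = rng.normal(1.5, 0.5)              # hypothetical growth per week
    for week in range(10):
        score = 20 + true_slope * week + rng.normal(0, 4)
        records.append({"student": student, "week": week, "score": score})
df = pd.DataFrame(records)

# OLS: a separate slope for each student, ignoring the nesting of scores in students.
ols_slopes = df.groupby("student").apply(
    lambda g: np.polyfit(g["week"], g["score"], 1)[0]
)
print("Variance of per-student OLS slopes:", round(ols_slopes.var(), 3))

# Mixed model (HLM-style): random intercepts and slopes across students.
hlm = smf.mixedlm("score ~ week", df, groups=df["student"], re_formula="~week").fit()
print("Fixed effect (average growth per week):", round(hlm.fe_params["week"], 2))
print("Random-effects covariance (intercept/slope):")
print(hlm.cov_re)   # the slope variance component feeds HLM-based reliability of slope
```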
Group C

Criteria: Alignment with Iowa CORE / Demonstrated Content Validity
Justification: It is critical that tools assess skills identified in the Iowa Core.
Literature & Informational:
 Key Ideas & Details
 Craft & Structure
 Integration of Knowledge & Ideas
 Range of Reading & Level of Text Complexity
Foundational (K-1):
 Print Concepts
 Phonological Awareness
 Phonics and Word Recognition
 Fluency
Foundational (2-5):
 Phonics and Word Recognition
 Fluency
Score 3: Has a direct alignment with the Iowa CORE (provide Broad Area and Specific Skill)
Score 2: Has alignment with the Iowa CORE (provide Broad Area)
Kicked out if: Has no alignment with the Iowa CORE

Criteria: Training Required
Justification: The amount of time needed for training is one consideration related to the utility of the tool. Tools that can be learned in a matter of hours and not days would be considered appropriate.
Score 3: Less than 5 hours of training (1 day)
Score 2: 5.5 to 10 hours of training (2 days)
Score 1: 10.5 to 15 hours of training (3 days)
Score 0: Over 15.5 hours of training (4+ days)

Criteria: Computer Application (tool and data system)
Justification: Many tools are given on a computer, which can be helpful if schools have computers, the computers are compatible with the software, and the data reporting can be separated from the tool itself. It is also a viable option if hard copies of the tools can be used when computers are not available.
Score 3: Computer or hard copy of tool available; data reporting is separate
Score 2: Computer application only; data reporting is separate
Score 1: Computer or hard copy of tool available; data reporting is part of the system
Score 0: Computer application only; data reporting is part of the system
Group C (cont)

Criteria: Data Administration and Data Scoring
Justification: The number of people needed to administer and score the data speaks to the efficiency of how data are collected and the reliability of scoring.
Score 3: Student takes the assessment on a computer and it is automatically scored by the computer at the end of the test
Score 2: Adult administers the assessment to the student and enters the student's responses (in real time) into a computer, and it is automatically scored by the computer at the end of the test
Score 1: Adult administers the assessment to the student and then calculates a score at the end of the test by conducting multiple steps (adding together scores across many assessments, subtracting errors to get a total score)
Score 0: Adult administers the assessment to the student and then calculates a score at the end of the test by conducting multiple steps AND referencing additional materials to get a score (having to look up information in additional tables)

Criteria: Data Retrieval (time for data to be usable)
Justification: The data need to be available in a timely manner in order to use the information to make decisions about students.
Score 3: Data can be used instantly
Score 2: Data can be used the same day
Score 1: Data can be used the next day
Score 0: Data are not available until 2-5 days later
Kicked out if: Takes 5+ days to use data (have to send data out to be scored)
Group C (cont)

Criteria: A broad sample is used
Justification: The sample used in determining the technical adequacy of a tool should represent a broad audience. While a representative sample by grade is desirable, it is often not reported; therefore, taken as a whole, does the population used represent all students, or is it specific to a region or state?
Score 3: National sample
Score 2: Several states (3 or more) across more than one region
Score 1: States (3, 2, or 1) in one region
Score 0: Sample of convenience; does not represent a state

Criteria: Disaggregated Data
Justification: Viewing disaggregated data by subgroups (i.e., race, English language learners, economic status, special education status) helps determine how the tool works with each group. This information is often not reported, but it should be considered if it is available.
Score 3: Race, economic status, and special education status are reported separately
Score 2: At least two disaggregated groups are listed
Score 1: One disaggregated group is listed
Score 0: No information on disaggregated groups
Findings
• Many of the tools reported are not sufficient (or
appropriate) for universal screening or progress
monitoring
• Some tools are appropriate for both
• No tool (so far) is “perfect”
• There are alternatives from which to choose
Live Chat
• Thursday April 26, 2012
• 2:00-3:00 EDT
• Go to rti4success.org for more details