Implementation Brief:
Setting Parameters for Expected Growth
Suggestions, considerations, and resources for determining whether students have demonstrated high, moderate, or low growth.
Overview
Once a district has identified common assessments
and developed administration and scoring protocols,
the next step is to define parameters. Parameters are
the ranges of scores (see the call-out box below) on
an assessment that indicate whether a student has
demonstrated learning, growth, or achievement at
expected rates, above expected rates, or below
expected rates. Parameters provide a transparent
definition of what we expect from students.
Regulatory Definition, 603 CMR 35.09 (3)
(a) A rating of high indicates significantly higher than
one year's growth relative to academic peers in the
grade or subject.
(b) A rating of moderate indicates one year's growth
relative to academic peers in the grade or subject.
(c) A rating of low indicates significantly lower than
one year's student learning growth relative to
academic peers in the grade or subject.
Moderate Growth as a Range of Scores
It is easy to conflate “moderate growth” with “average
growth.” Educators are used to describing
assessment results using the term “average.” For
example, “average growth on this assessment is
eight points” or “the average student scored an 83 on
the final exam.” Average growth is almost always
described as a single point, whereas when setting
parameters, moderate growth should describe the
range of scores that would satisfy educators’
expectations. Using this logic, high growth represents
the range of scores that exceed expectations and low
growth represents the range of scores that fall below
expectations.
One advantage to establishing a range for moderate
growth is that once it is defined, the ranges for high
and low growth are as well. For example, if a group
of educators determines that growth of 6-12 points is
what they expect on a given assessment, then 5 or
fewer points is less than what is expected, or low
growth, and 13 or more points exceeds expectations,
and therefore represents high growth.
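To make the arithmetic concrete, here is a minimal sketch of this range logic using the 6-12 point example above (the function and variable names are illustrative, not part of any district tool):

```python
# Minimal sketch of the range logic described above, using the example
# parameters: 6-12 points of growth is moderate, so fewer points is low
# growth and more is high growth. Names are illustrative only.

MODERATE_RANGE = (6, 12)  # inclusive bounds agreed on by educators

def growth_rating(points_of_growth: int) -> str:
    low_cut, high_cut = MODERATE_RANGE
    if points_of_growth < low_cut:
        return "low"       # 5 or fewer points: below expectations
    if points_of_growth > high_cut:
        return "high"      # 13 or more points: exceeds expectations
    return "moderate"      # 6-12 points: meets expectations

print(growth_rating(5), growth_rating(9), growth_rating(13))
# -> low moderate high
```

Note how defining the moderate range alone is enough to determine all three categories.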
Setting parameters may be a new practice for many
educators, including even those with deep expertise in
developing and administering assessments. However,
all educators are familiar with grading student work,
determining what types of work products represent “A”
level work, “B” level work, and so on. The knowledge
and skills necessary to fairly grade student work and
assessments are the same as those needed to set parameters.
Both practices require being clear about expectations,
considering all available information, and using
professional judgment to make honest distinctions
about student outcomes that meet, exceed, or fall short
of expectations. Reminding educators of these
similarities can improve their confidence in making
important decisions about parameters.
Prior to Setting Parameters
Before setting parameters, districts are advised to have
the following conditions in place:
 Engage educators in identifying assessments that
are well aligned to content and that provide
valuable information to educators about their
students.
 Develop clear protocols to administer and score
assessments fairly and consistently across all
students, classrooms, and schools.
 Clearly communicate how results from common
assessments are used in the evaluation process.
Specifically, that the Student Impact Rating is
separate from the Summative Performance Rating
and evaluators determine the rating by applying
professional judgment to results from multiple
measures over multiple years.
Suggested Next Steps
The recommendations in this brief can be helpful to
districts as they proceed through the following stages
of common assessment development:
 Identifying an approach for setting parameters.
 Ensuring that parameters are appropriate for all
students.
 Using parameters to support educator practice.
Approaches to Determining Parameters
There are two broad approaches to determining
parameters. A normative approach sets parameters
by pre-determining the percentage of student scores
that will fall into each category, while a criterion
approach defines parameters based on a fixed set of
expectations. Districts may use one approach for all
common assessments, or use different approaches
for different assessments.
Normative Approaches:
A normative approach involves setting a predetermined
percentage of student scores that will fall into the high,
moderate, and low growth categories. This approach
supports comparisons across different assessments.
For example, if a district determines that high growth is
defined as the top third of student scores for all
common assessments, all educators have a clear
understanding of what that means; the art teacher and
the physics teacher are using a common definition for
high growth, even if they are using very different
assessments.
Normative approaches can be based solely on the
students who complete the assessment in a given
district in a given year, or may use an expanded group
of students by including students from other districts or
who completed the assessment in prior years. In
general, the more student scores added to the
population, the more we understand about how
students generally perform on the assessment.
Therefore, when comparing their students’ results to
the population, educators can be more confident that
the parameters reflect meaningful differences in
performance. For example, an educator would likely be
more confident that a student is demonstrating high
performance on an assessment if the student’s scores
were in the top 20% of a national sample compared to
the top 20% of the teacher’s class.
Normative approach based on current students: A
normative approach that looks at only the performance
of current students involves scoring an assessment
and then rank ordering the results. Parameters are
then applied to separate the scores into three groups:
high growth, moderate growth, and low growth based
on predetermined percentages. For example, a district
might choose to use thirds. The lower third of scores
would be designated as low growth, the middle third
would represent moderate growth, and the higher third
of scores would represent high growth. In this model,
the parameters do not necessarily need to be in even
thirds. Districts may instead choose to define low and
high growth as the bottom and top 25% of scores
respectively, leaving the middle 50% to
represent moderate growth. Regardless of the specific
percentages, the parameters are based on the
predetermined proportion and consistent across all
educators using the same assessment. Educators may
be familiar with the similar process of “grading on a
curve.”
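As a rough illustration, the following sketch rank-orders a set of growth scores and splits them by predetermined percentages, here the bottom 25% low, the top 25% high, and the middle 50% moderate (the data and function names are illustrative):

```python
# Minimal sketch of a normative approach based on current students:
# rank-order the growth scores, then split them by predetermined
# percentages. Data and names are illustrative only.

def normative_categories(scores, low_pct=0.25, high_pct=0.25):
    ordered = sorted(scores)
    n = len(ordered)
    low_cut = ordered[int(n * low_pct)]          # lowest score above the low band
    high_cut = ordered[int(n * (1 - high_pct))]  # lowest score in the high band
    return [
        (s, "low" if s < low_cut else "high" if s >= high_cut else "moderate")
        for s in ordered
    ]

growth_scores = [2, 4, 5, 6, 7, 8, 9, 10, 12, 15, 16, 18]
print(normative_categories(growth_scores))
# 2, 4, 5 -> low; 15, 16, 18 -> high; the rest -> moderate
```

Note that the cut scores (here 6 and 15) are only knowable after the scores are rank-ordered, which is the planning drawback discussed below.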
This approach has several advantages:
 Data Requirements: This approach does not
require pre-existing data about the assessment
(i.e., past students’ scores), since parameters
are based on pre-determined percentages for
the high, moderate, and low categories.
Educators might find this especially beneficial
when trying out new assessments.
 Comparability: This approach also supports
comparisons across different assessments.
For example, if a district sets the same
normative parameters for all common
assessments, evaluators and educators will be
assured that all assessments have an equal
level of difficulty because the same percentage
of student scores will fall into the high,
moderate, and low growth categories.
However, there are important drawbacks to this
approach to consider.
 Planning: Since the scores need to be rank
ordered in order to determine which scores
translate to high, moderate, and low growth,
educators cannot know beforehand where the
cut scores will fall.
 Comparability: While one advantage of using
a normative approach is that it assumes a
consistent level of difficulty across all
assessments, this is also a potential drawback.
There may be real differences in the
performance of students across different
assessments that are hidden because the
comparison group for students is limited to a
single subject area.
 Singletons: In some districts, there may be
only one educator serving in a particular
position. Since the percentage of student
scores that will fall into the high, moderate, and
low growth categories is predetermined, a
normative approach that factors in only current
students provides limited feedback to
singletons. Concretely, a singleton who
produces extraordinary growth in students
would have the same percentage of student
scores in the high category as every other
singleton, making it difficult for an evaluator to
draw conclusions about impact.
 Yearly Comparisons: With a normative
approach based only on current students, it
can be challenging to see systematic high
growth in a group of students. For example, if
three quarters of students made tremendous
gains compared to students from previous
years, an approach that breaks high,
moderate, and low growth into even thirds
would mask that growth because only the top
33% of scores would earn the high growth
designation. That is, the same percentage of
scores would be determined as high each
year. By contrast, the same student results
using a normative approach based on multiple
years of student scores or student scores from
multiple schools, districts, or states would allow
an individual class’s high growth to shine
through.
Normative approach based on a wider population:
One way to address some of the disadvantages of
using a normative approach is to base norms on a
larger group of students. Most educators are familiar
with the Student Growth Percentiles (SGPs) calculated
by ESE for statewide assessments. SGPs are an
example of a normative approach based on a group of
students from multiple districts. Although there is a set
percentage of students that are determined to have
demonstrated high growth, there is no set percentage
for each district. As a result, it is possible for all
students in a district to demonstrate high growth on the
state assessment, while at the state level the
percentages of student scores that fall into the three
categories are fixed.
A larger reference population can be geographical or
temporal. Districts using commercial assessments may
be able to use a national norm group to provide a
better reference point for defining high, moderate, or
low growth. Districts can even use this approach with
district-developed assessments by looking at student
results across multiple years.
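Here is a minimal sketch of locating a current student's growth score within a pooled norm group built from prior years (all data and names are illustrative):

```python
# Minimal sketch of a normative approach based on a wider population:
# pool growth scores from prior years into a norm group, then locate a
# current student's score within that pooled distribution.
# All data and names are illustrative.

from bisect import bisect_left

norm_group = sorted(
    [3, 5, 6, 7, 8, 8, 9, 11] +   # prior year 1 growth scores
    [4, 6, 7, 7, 9, 10, 12, 14]   # prior year 2 growth scores
)

def percentile_in_norm(score: float) -> float:
    """Percent of the pooled norm group scoring below this score."""
    return 100 * bisect_left(norm_group, score) / len(norm_group)

# A current student with 12 points of growth:
print(percentile_in_norm(12))  # -> 87.5
```

Because the cut scores come from the pooled group rather than this year's class, every current student could, in principle, land in the high growth range.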
All normative approaches provide the advantage of
being able to set consistent definitions of high,
moderate, and low growth for all content areas,
regardless of the assessments used and their various
point scales and scoring processes. Considering a
wider population of student scores provides the
following additional advantages:
 Planning: Parameters based on either prior
years or a large population are more
predictable and better allow educators to plan
ahead.
 Singletons: Using parameters based on a
wider population is a great opportunity for
singletons to take advantage of the power of
common assessments. Looking at results from
a wider population may allow an educator to
identify areas of strength or weakness that
they were not able to determine before. For
example, foreign language teachers might
consider one of the national assessments
identified as a potential common assessment
by the Massachusetts Foreign Language
Association. These assessments allow
educators to see how their students compare
to a group of students from across the country.
 Yearly Comparisons: Using parameters
informed by results from prior years, educators
are able to make comparisons across multiple
years. For example, if more students in a
teacher’s class demonstrated high growth this
year than in previous years, he/she might think
about what new instructional strategies could
have led to this change and build on them in
subsequent years.
 Competition: By using a wider population of
scores to determine parameters, the impact of
any individual student’s score on the
population is diminished. For example, if
parameters are cut scores based on a national
population, it is possible for all students in a
given district to perform at the high growth
level, whereas if parameters are based solely
on the current students in a district, some
percentage of student scores necessarily falls
into the low growth category.
However, this type of normative approach is not
without potential drawbacks:
 Data Requirements: Collecting and analyzing
data from a wider population, be it scores from
multiple years or multiple locales, requires
personnel time. Even for commercial
assessments, data may not be structured in a
way that easily informs the parameter-setting
process.
 Changes in Assessment: Appropriately,
many districts and educators will want to make
changes to their common assessments from
year to year. If using a normative approach
based on a wider population to set parameters,
districts will have to ensure that educators are
not discouraged from making necessary
improvements to assessments. If modifications
are made, careful consideration must be given
to determine whether the changes are
significant enough to warrant re-setting the
parameters.
Criterion Approaches:
In contrast to normative approaches, criterion
approaches involve educators using professional
judgment to define high, moderate, and low growth.
The advantage of this approach is that growth is
considered in terms of the learning that a student
demonstrates as opposed to how a student’s score on
an assessment relates to other students’ scores.
Educators may find a criterion approach more
authentic than a normative approach because it
involves engaging a group of professionals in the
process of thinking through how they expect students
to perform on the assessment. However, since the
definition is based on professional judgment, it may be
harder to articulate and interpret the expectations for
students embedded in the parameters. For example,
with a normative approach, a district could make the
statement, “High growth means the top 25% of student
scores.” Using a criterion approach, the statement is
less cut and dry, “High growth reflects the range of
scores that educators determine is representative of
exceeding expected performance.” Comparisons
across different assessments in different content areas
is more challenging with a criterion approach because
the criteria for demonstrating high, moderate, and low
growth depend on the specific assessment.
Setting parameters can be a challenge for many
educators because it requires a shift in their thinking.
Most educators are adept at defining expected
achievement in their classrooms. However, they may
have less experience thinking about expected growth.
While this is a challenge, the shift to thinking about
growth is important work because it provides an
opportunity to think about the learning of all students,
regardless of where they started the year.
Since criterion approaches require the use of
professional judgment, they require that educators put
forward their best thinking. One district called the
parameters they developed during the first year
“hypothesized parameters” to make it explicit that
expectations for what constitutes moderate growth may
need to be refined in the future.
How should educators make determinations about
what type of student work represents high, moderate,
and low growth? Some have found it useful to ground
decisions about parameters in discussions about the
specific assessment items and their knowledge of
student learning progressions. The video example
referenced at the end of this brief illustrates how a 5th
grade teacher might set parameters for a pre- and
post-test in mathematics using this strategy. The
educator in the video uses his understanding of the
Curriculum Frameworks, as well as his knowledge of
past students, to identify score ranges that represent
three different groups of students: students entering
5th grade with math skills that are below grade level,
at grade level, and above grade level. Using those
three groups, the educator thinks about which specific
items on the assessment each group would likely
answer correctly on the pre-test and then considers
which additional items each group would need to
answer correctly on the post-test to meet his
expectations for a year of learning.
An On-Ramp?
Determining parameters for a new common
assessment can be challenging. One option is a
blended normative and criterion approach that
unfolds over two years.
For example, consider a 2nd grade team that does
not feel they have the expertise to set criterion-based
parameters for a new assessment. They decide to
use a normative approach in the first year and
determine that the top 25% of student scores on their
common assessment will be considered high growth.
After administering the assessment, they apply this
rule to their results and find that scores of 16 points
of growth or more comprise the top 25% and
therefore fall into the high growth category.
During the second school year, the team now has the
first year of results to inform their parameters.
Instead of going the normative route again, they
decide to use the prior year's results to set the criteria
for their parameters, adopting the same cut score to
define high growth: 16 points or more. As a result, in
year 2 the educators know their parameters before teaching
and can look at each student’s pre-test to identify
how many points he/she would need to earn on the
post-test to demonstrate high growth and use this
information to inform their planning.
Guidebook for Inclusive Practice
Created by Massachusetts educators, the Guidebook
includes tools for districts, schools, and educators
that are aligned to the MA Educator Evaluation
Framework and promote evidence-based best
practices. The Guidebook includes several tools
aligned to this brief for developing common
assessments that are fair and accessible for all
students.
Criterion approaches have several advantages:
 Alignment: Perhaps the greatest advantage of
using a criterion approach is that learning is
defined in relation to standards instead of the
performance of other students. Using a
criterion approach can help shift conversations
away from scores and to student learning.
 Data Requirements: While previous data can
support educators in setting parameters using
a criterion approach, it is not a requirement.
Educators can use experience with similar
tasks, items, or problems to inform the
parameters they set.
 Planning: Parameters should be determined
prior to using the assessment. This allows for
parameters to serve as a planning tool for
educators. By having a clear understanding of
what skills students are expected to
demonstrate over a course or year, educators
can backwards plan appropriate lessons and
assessments to ensure they are on track for
students to meet those expectations.
 Singletons: One of the important advantages
of using common assessments to inform
educator practice is that they can relieve the
isolation that some educators experience. By
looking at the results of students from other
classrooms, an educator can begin to
understand his/her relative strengths and
weaknesses. A singleton teacher using a
criterion approach does not need to set
parameters alone. For example, a ceramics
teacher might invite the two other art teachers
in the district to help develop parameters with
him/her. In fact, the process may help support
a broader and more cohesive approach to arts
education and assessment.
Using a Criterion Approach with a Rubric
Many common assessments are scored using a
rubric. Designed carefully, a rubric can go a long way
toward helping educators understand criterion-based
parameters. For example, consider a writing rubric
written such that moving up 2 to 3 performance
levels is a demonstration of moderate growth;
therefore, moving more than 3 performance levels
would be evidence of high growth and moving fewer
than 2 levels would be evidence of low growth. For
this approach to work, educators would have to be
confident that it is equally challenging to move from
one performance level to the next throughout the
rubric. A sketch of this rule appears below.
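A minimal sketch of the rubric rule described in the box above (the function name is illustrative):

```python
# Minimal sketch of the rubric rule above: moving up 2-3 performance
# levels is moderate growth, more than 3 is high, and fewer than 2 is
# low. The function name is illustrative only.

def rubric_growth(start_level: int, end_level: int) -> str:
    levels_moved = end_level - start_level
    if levels_moved > 3:
        return "high"
    if levels_moved >= 2:
        return "moderate"
    return "low"

print(rubric_growth(1, 3))  # moved 2 levels -> "moderate"
print(rubric_growth(1, 5))  # moved 4 levels -> "high"
```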
 Yearly Comparisons: Since parameters are
not based on student scores, an educator can
look across years to see how they have made
improvements from year to year.
 Competition: Since determinations of high,
moderate, and low growth are not based on
other scores, students who demonstrate high
growth do not impact the ability of other
students to also demonstrate high growth.
However, criterion approaches also have drawbacks
that educators must consider:
 Time: Compared to normative approaches,
criterion approaches are time intensive. They
require time for different educators to share
their perspectives and groups to arrive at
consensus-based decisions.
 Experience: Since the process of determining
parameters using a criterion approach involves
educators’ professional judgment, it relies on
educators with experience sharing what they
know about past students’ learning
progressions. Teams should plan to revisit
parameters in the first few years to capitalize
on increased educator experience and
knowledge in the parameter refinement
process.
 Comparability: A criterion approach has the
advantage of tying parameters closely to the
standards and learning objectives for each
content area. Unfortunately, as a result, it can
be difficult to arrive at a clear cross-content
definition of moderate growth. This can make it
difficult to know whether all groups of
educators are using comparably rigorous
definitions of moderate growth. As a result,
districts need to pay close attention to
comparability across different measures.
Applying Student Results to Parameters
Setting parameters at the student level:
For most assessments, it makes sense to determine
parameters at the student level, whether using a
normative or criterion approach. In other words, the
parameters are applied to each student's work
to determine whether the student demonstrated high,
moderate, or low learning, growth, or achievement.
When looking across the assessment results,
evaluators can ask, “Have more than half of the
educator’s students demonstrated either high or low
growth?” If so, they will look to see if the results are
consistent with a pattern of high or low growth in the
educator’s students on other measures and over
multiple years. This information will ultimately inform
the educator’s Student Impact Rating (see additional
guidance on determining an educator’s Student Impact
Rating).
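A minimal sketch of that screening question, assuming each student's work has already been classified against the parameters (names and data are illustrative):

```python
# Minimal sketch of the evaluator's screening question: have more than
# half of an educator's students demonstrated either high or low
# growth? Names and data are illustrative only.

from collections import Counter

def majority_pattern(ratings):
    """Return 'high' or 'low' if more than half of students fall there."""
    counts = Counter(ratings)
    for category in ("high", "low"):
        if counts[category] > len(ratings) / 2:
            return category
    return None

ratings = ["high", "high", "moderate", "high", "high", "low", "high"]
print(majority_pattern(ratings))  # -> "high" (5 of 7 students)
```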
Setting parameters at the class level:
There may be assessments where setting parameters
at the student level does not make sense. For
example, many art, music, and physical education
teachers work with large numbers of students for short
periods of time. In these cases, it may be appropriate
to use common assessments that look at whether
students have demonstrated high, moderate, or low
growth as a class rather than as individual students. This
approach can also provide powerful feedback about an
educator’s impact.
For example, a team of art teachers may have each of
their students develop an individualized goal based on
an initial assessment. The team could then determine
that moderate growth for a class would be 70% to 90%
of the students in the class meeting their goals.
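A minimal sketch of this class-level rule (the 70% to 90% band comes from the example above; names are illustrative):

```python
# Minimal sketch of the class-level example above: moderate growth for
# a class is 70% to 90% of students meeting their individualized
# goals. Names are illustrative only.

def class_growth(goals_met: int, class_size: int) -> str:
    pct = 100 * goals_met / class_size
    if pct > 90:
        return "high"
    if pct >= 70:
        return "moderate"
    return "low"

print(class_growth(18, 24))  # 75% met their goals -> "moderate"
```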
Same Number of Students in Each Category?
There is no requirement that there be an equal
percentage of student scores that fall into the high,
moderate, and low growth categories for all common
assessments. However, if only a small handful of
scores generally falls in a given category, it becomes
difficult to notice trends or patterns about where and
with whom high or low growth is occurring. Districts
are encouraged to have rigorous definitions of high
and low growth in order to provide meaningful
feedback to educators about student learning. If,
over time, few or no students demonstrate high or
low growth on a particular assessment, it is a good
signal to revisit the parameters to see if they should
be adjusted up or down.
Another example of a class-based approach to
parameters comes from a music teacher who worked
across a district teaching different groups of students
for 13 half-hour lessons. She was concerned that, with so
little instructional time, a pre-test and post-test would
take up too much of her time with each group. She
decided that it would be
more meaningful to measure whether each group as a
unit demonstrated the type of growth she was
expecting. During the first lesson with each group, she
assessed the group using three questions. She then
asked those same three questions during the last
lesson. This short assessment only took a few minutes,
and since her focus was growth as a group, she did not
worry about making determinations at the student level,
which would have involved mitigating floor and ceiling
effects. While this assessment was very simple, it gave
her a robust picture of her impact without overburdening
her classroom with assessments.
amount of data from a large number of students can be
just as meaningful as collecting a large amount of data
from a small number of students.
Parameters Appropriate for All Students
Using Banding:
All students should have an equal chance to
demonstrate growth on a common assessment. The
banding process allows educators to set growth
parameters that capture different cutoff scores
depending on the student’s baseline score.
Setting “bands” according to baseline scores allows
educators to set low, moderate, and high ranges of
growth for students more accurately and acknowledges
that, on many assessments, an increase of 1 point
does not necessarily equal the same amount of growth
consistently across the scale. In other words, it may be
easier for students to move from a baseline score of 5
to an end-of-course score of 10 than it is to move from
a baseline of 90 to an end-of-course score of 95.
The table below is an example of ranges of low,
moderate, and high growth for students in three bands,
based on baseline scores. Educators may create as
many bands in the parameter setting process as they
wish, but three is the recommended minimum when
working with students with diverse learning profiles. In
the following example, three bands were set to capture
different rates of growth.
Districts can use either normative or criterion
approaches with banding. With a normative approach,
instead of low growth representing the students in the
lowest third overall, low growth would represent the
students who performed in the lowest third on the
post-test compared only to students whose pre-test
scores fell in the same initial performance range. This
approach can be seen as a highly simplified version of
the process used to determine SGPs.
With a criterion approach, instead of considering a
single group of students, educators would be asked to
consider a group of students representing each band.
For example, if a district chose to use three bands,
educators would discuss three groups of students:
students whose initial scores suggest they are below
grade level, at grade level, and above grade level. See
the video example of this process below.
Districts are encouraged in future years to continue to
investigate issues of fairness in the process of
continuous improvement.
Moderate Growth Parameters Using Banding

Initial Scores   Low    Moderate   High
0-2              0-5    6-7        8-10
3-5              0-6    7-8        9-10
6-7              0-7    8-9        10
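A minimal sketch of applying the banded parameters in the table above (the band boundaries and cut scores come from the table; all names are illustrative):

```python
# Minimal sketch of applying banded growth parameters. The bands and
# cut scores come from the example table above; all names are
# illustrative, not from an actual district tool.

BANDS = [
    # (range of initial scores, inclusive moderate range on the post-test)
    {"initial": range(0, 3), "moderate": (6, 7)},   # initial scores 0-2
    {"initial": range(3, 6), "moderate": (7, 8)},   # initial scores 3-5
    {"initial": range(6, 8), "moderate": (8, 9)},   # initial scores 6-7
]

def growth_category(initial_score: int, post_score: int) -> str:
    """Classify growth using the band matching the student's initial score."""
    for band in BANDS:
        if initial_score in band["initial"]:
            lo, hi = band["moderate"]
            if post_score < lo:
                return "low"
            if post_score > hi:
                return "high"
            return "moderate"
    raise ValueError("initial score outside the banded range")

print(growth_category(4, 9))  # initial 4, post-test 9 -> "high"
```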
Educators will often discover that students with higher
initial scores present a “ceiling effect” problem that
prevents these students from demonstrating growth.
The addition of new, harder questions on the pre-test,
or the use of a modified assessment are ways to
address the ceiling effect.
When using bands, it is important to keep in mind that
the goal is not to set different expectations for different
students, but rather to acknowledge that the number of
points that represents moderate growth may be
different based on the initial score.
Parameter Setting Example Video
Click on the image to access a step-by-step walk-through
of a criterion-based approach to setting parameters on a
5th grade math assessment using three bands of initial
scores.
Using the Information
The purpose of the educator evaluation system is to
support the professional growth and development of
educators. Ensuring that all educators are using
assessments with clear parameters helps educators
contribute to and understand shared expectations for
student learning, growth, and achievement. Below are
questions for evaluators and educators to support the
use of parameters in improving instruction and,
ultimately, student learning.
For Evaluators: Evaluators should consider two
important questions for all of the assessments used as
evidence for determining Student Impact Ratings.
1. First, evaluators should look across all
common assessments in the district and ask
whether similar numbers of students are
demonstrating high, moderate, or low growth
across the different assessments. Having an
assessment that consistently results in high
numbers of students demonstrating high (or
low) growth is not necessarily a problem.
However, evaluators should look for additional
evidence to support this finding. Absent
evidence that the assessment is helping
educators make meaningful distinctions about
student performance, evaluators should work
with educators to revise the parameters.
Parameters are likely to change over time, as
educators learn more about how students
typically perform on a given assessment.
2. Second, evaluators should investigate whether
students who have demonstrated low growth
on a given assessment are clustered in a
specific classroom. Most of the time, students
who have demonstrated low growth will be
spread across all classrooms. This is to be
expected. However, if low growth is
concentrated in a single classroom, there is
evidence that the educator(s) may need to
reinforce the knowledge and skills measured
by the assessment. That said, it is important to
remember that low growth may cluster in a
classroom based on factors other than the
instruction. For this reason, Student Impact
Ratings always factor in the context in which
the undergirding assessments were
administered.
For Educators: The systematic use of common
assessments provides an excellent opportunity for
educators to take an honest look at the impact of their
instruction.
1. Educators may look across the population of
students that are demonstrating low growth
and investigate whether there are common
characteristics across these students. Doing so
may reveal potential issues of bias with the
assessment. For example, many assessments
result in low growth scores for the highest
achieving students. This is largely due to
ceiling effects that can be mitigated through
revisions to the assessment and/or scoring
process.
2. Educators are encouraged to look closely at
the common assessment results to see if
patterns emerge. For example, educators may
look across how students have grown on
different categories of a rubric and determine
that students in one class are not making the
same type of growth in providing details
connected to text as in other classes.
Understanding how different educators’
students are succeeding on different parts of
an assessment provides a good framework for
collaborative learning between teachers. This
is very much the value and intent of using
common assessments and connects back to
the goal of the educator evaluation framework,
that is, to support educator growth and
development.
Reviewing Parameters
It is expected that districts will review and revise
parameters to ensure they are providing meaningful
feedback to educators, especially during the first
couple of years of using a common assessment. One
strategy for engaging educators in parameter revisions
is identification agreement.
Identification Agreement: The process of
identification agreement involves investigating whether
the determination of high, moderate, or low growth
would remain consistent if more data were used to
make that determination.
There is not an expectation that there will be perfect
agreement. Even a student who has made high growth
may have an ‘off’ day when completing an assessment.
However, if there is an overall pattern of inconsistency,
districts should make changes to parameters and
potentially the assessment itself.
Educators can follow this process:
 Randomly select one student at each level of
growth (high, moderate, and low) on the
assessment.
 Collect additional information about those three
students from the year. This can include
performance on other assessments, student
work samples, and teacher testimonials.
 Looking at these multiple pieces of evidence,
ask whether the totality of the evidence
supports the conclusion that was reached on
the original assessment. In other words, did
the other evidence collected about the student
who demonstrated high growth on the common
assessment also signal that the student
exceeded expectations for the year?
 If the conclusions based on multiple pieces of
evidence match the results of the common
assessment for all three students, there is
some evidence that the parameters are
appropriately set.
 If they do not match, investigate whether the
parameters should be adjusted or whether the
common assessment is not well-aligned to the
other data collected.
 In borderline cases, randomly choose another
three students and repeat the process.