Implementation Brief: Setting Parameters for Expected Growth

Suggestions, considerations, and resources for determining whether students have demonstrated high, moderate, or low growth.

Overview

Once a district has identified common assessments and developed administration and scoring protocols, the next step is to define parameters. Parameters are the ranges of scores (see the regulatory definition below) on an assessment that indicate whether a student has demonstrated learning, growth, or achievement at expected rates, above expected rates, or below expected rates. Parameters provide a transparent definition of what we expect from students.

Regulatory Definition, 603 CMR 35.09(3)

(a) A rating of high indicates significantly higher than one year's growth relative to academic peers in the grade or subject.
(b) A rating of moderate indicates one year's growth relative to academic peers in the grade or subject.
(c) A rating of low indicates significantly lower than one year's student learning growth relative to academic peers in the grade or subject.

Moderate Growth as a Range of Scores

It is easy to conflate "moderate growth" with "average growth." Educators are used to describing assessment results using the term "average": for example, "average growth on this assessment is eight points," or "the average student scored an 83 on the final exam." Average growth is almost always described as a single point, whereas when setting parameters, moderate growth should describe the range of scores that would satisfy educators' expectations. Using this logic, high growth represents the range of scores that exceeds expectations, and low growth represents the range of scores that falls below expectations. One advantage of establishing a range for moderate growth is that once it is defined, the ranges for high and low growth are as well. For example, if a group of educators determines that growth of 6-12 points is what they expect on a given assessment, then 5 or fewer points is less than what is expected, or low growth, and 13 or more points exceeds expectations and therefore represents high growth.

Setting parameters may be a new practice for many educators, including even those with deep expertise in developing and administering assessments. However, all educators are familiar with grading student work and determining what types of work products represent "A" level work, "B" level work, and so on. The knowledge and skills necessary to fairly grade student work and assessments are the same ones needed to set parameters. Both practices require being clear about expectations, considering all available information, and using professional judgment to make honest distinctions about student outcomes that meet, exceed, or fall short of expectations. Reminding educators of these similarities can improve their confidence in making important decisions about parameters.

Prior to Setting Parameters

Before setting parameters, districts are advised to have the following conditions in place:

- Engage educators in identifying assessments that are well aligned to content and that provide valuable information to educators about their students.
- Develop clear protocols to administer and score assessments fairly and consistently across all students, classrooms, and schools.
- Clearly communicate how results from common assessments are used in the evaluation process.
Specifically, communicate that the Student Impact Rating is separate from the Summative Performance Rating, and that evaluators determine the rating by applying professional judgment to results from multiple measures over multiple years.

Suggested Next Steps

The recommendations in this brief can be helpful to districts as they proceed through the following stages of common assessment development:

- Identifying an approach for setting parameters.
- Ensuring that parameters are appropriate for all students.
- Using parameters to support educator practice.

Approaches to Determining Parameters

There are two broad approaches to determining parameters. A normative approach sets parameters by predetermining the percentage of student scores that will fall into each category, while a criterion approach defines parameters based on a fixed set of expectations. Districts may use one approach for all common assessments, or use different approaches for different assessments.

Normative Approaches: A normative approach involves setting a predetermined percentage of student scores that will fall into the high, moderate, and low growth categories. This approach supports comparisons across different assessments. For example, if a district determines that high growth is defined as the top third of student scores for all common assessments, all educators have a clear understanding of what that means; the art teacher and the physics teacher are using a common definition of high growth, even if they are using very different assessments.

Normative approaches can be based solely on the students who complete the assessment in a given district in a given year, or they may use an expanded group of students that includes students from other districts or students who completed the assessment in prior years. In general, the more student scores added to the population, the more we understand about how students generally perform on the assessment. Therefore, when comparing their students' results to the population, educators can be more confident that the parameters reflect meaningful differences in performance. For example, an educator would likely be more confident that a student is demonstrating high performance on an assessment if the student's score were in the top 20% of a national sample rather than the top 20% of the teacher's class.

Normative approach based on current students: A normative approach that looks only at the performance of current students involves scoring an assessment and then rank ordering the results. Parameters are then applied to separate the scores into three groups (high, moderate, and low growth) based on predetermined percentages. For example, a district might choose to use thirds: the lower third of scores would be designated as low growth, the middle third as moderate growth, and the upper third as high growth. The parameters do not need to be even thirds; districts may instead define low and high growth as the bottom and top 25% of scores respectively, leaving the middle 50% to represent moderate growth. Regardless of the specific percentages, the parameters are based on the predetermined proportions and are consistent across all educators using the same assessment. A minimal sketch of this approach appears below.
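To make this concrete, here is a minimal Python sketch, not part of the brief itself, of how a district data team might apply a 25/50/25 normative split to a set of growth scores. The function name, the sample data, and the tie handling are illustrative assumptions, not a prescribed procedure.

```python
# Illustrative sketch of a normative approach based on current students:
# rank-order growth scores, then label the bottom 25% low, the middle 50%
# moderate, and the top 25% high. Percentages follow the brief's 25/50/25
# example; with tied scores the split is approximate.
def normative_categories(growth_scores, low_pct=25, high_pct=75):
    """Return a 'low'/'moderate'/'high' label for each score."""
    ranked = sorted(growth_scores)
    n = len(ranked)
    low_cut = ranked[max(0, int(n * low_pct / 100) - 1)]    # top of the low range
    high_cut = ranked[min(n - 1, int(n * high_pct / 100))]  # bottom of the high range
    return ["low" if s <= low_cut else "high" if s >= high_cut else "moderate"
            for s in growth_scores]

scores = [4, 7, 9, 12, 15, 3, 8, 10, 11, 6, 14, 5]   # hypothetical growth scores
print(list(zip(scores, normative_categories(scores))))
```

Note that the cut scores are only known after all results are in, which is the planning drawback discussed below.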
Educators may be familiar with the similar process of "grading on a curve." This approach has several advantages:

- Data Requirements: This approach does not require pre-existing data about the assessment (i.e., past students' scores), since parameters are based on predetermined percentages for the high, moderate, and low categories. Educators might find this especially beneficial when trying out new assessments.
- Comparability: This approach also supports comparisons across different assessments. For example, if a district sets the same normative parameters for all common assessments, evaluators and educators can be assured that all assessments have an equal level of difficulty, because the same percentage of student scores will fall into the high, moderate, and low growth categories.

However, there are important drawbacks to this approach to consider:

- Planning: Since the scores need to be rank ordered to determine which scores translate to high, moderate, and low growth, educators cannot know beforehand where the cut scores will fall.
- Comparability: While one advantage of a normative approach is that it assumes a consistent level of difficulty across all assessments, this is also a potential drawback. There may be real differences in the performance of students across different assessments that are hidden because the comparison group for students is limited to a single subject area.
- Singletons: In some districts, there may be only one educator serving in a particular position. Since the percentage of student scores that will fall into the high, moderate, and low growth categories is predetermined, a normative approach that factors in only current students provides limited feedback to singletons. Concretely, a singleton who produces extraordinary growth in students would have the same percentage of student scores in the high category as every other singleton, making it difficult for an evaluator to draw conclusions about impact.
- Yearly Comparisons: With a normative approach based only on current students, it can be challenging to see systematic high growth in a group of students. For example, if three quarters of students made tremendous gains compared to students from previous years, an approach that breaks high, moderate, and low growth into even thirds would mask that growth, because only the top 33% of scores would earn the high growth designation; the same percentage of scores would be labeled high each year. By contrast, the same student results under a normative approach based on multiple years of student scores, or on student scores from multiple schools, districts, or states, would allow an individual class's high growth to shine through.

Normative approach based on a wider population: One way to address some of the disadvantages of a normative approach is to base norms on a larger group of students. Most educators are familiar with the Student Growth Percentiles (SGPs) calculated by ESE for statewide assessments. SGPs are an example of a normative approach based on a group of students from multiple districts. Although a set percentage of students statewide is determined to have demonstrated high growth, there is no set percentage for each district. As a result, it is possible for all students in a district to demonstrate high growth on the state assessment, even though the statewide percentages of scores falling into the three categories are fixed. A larger reference population can be geographical or temporal.
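As a concrete illustration of norming against a wider population, the sketch below locates each current student's growth within a distribution pooled from prior cohorts. All data, names, and percentile cuts are hypothetical assumptions, not values from the brief.

```python
# Sketch of a normative approach based on a wider reference population:
# prior-year growth scores form the norm group, and each current student's
# growth is located within that distribution. Because the cuts come from
# the norm group rather than this year's roster, an entire class can land
# in the high range.
from bisect import bisect_left

def percentile_in_norm(score, norm_scores):
    """Percent of the norm group scoring below `score`."""
    ranked = sorted(norm_scores)
    return 100.0 * bisect_left(ranked, score) / len(ranked)

# Pooled growth scores from, say, three prior years (hypothetical data).
norm_group = [2, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10, 11, 12, 13, 15]

for growth in [14, 12, 11]:           # this year's class
    pct = percentile_in_norm(growth, norm_group)
    band = "high" if pct >= 66 else "moderate" if pct >= 33 else "low"
    print(f"growth {growth}: {pct:.0f}th percentile -> {band}")
```

Because the cut points come from the pooled norm group rather than the current roster, no fixed share of this year's students is forced into the low category.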
Districts using commercial assessments may be able to use a national norm group to provide a better reference point for defining high, moderate, or low growth. Districts can even use this approach with district-developed assessments by looking at student results across multiple years. All normative approaches provide the advantage of consistent definitions of high, moderate, and low growth across all content areas, regardless of the assessments used and their various point scales and scoring processes. Considering a wider population of student scores provides the following additional advantages:

- Planning: Parameters based on either prior years or a large population are more predictable and better allow educators to plan ahead.
- Singletons: Using parameters based on a wider population is a great opportunity for singletons to take advantage of the power of common assessments. Looking at results from a wider population may allow an educator to identify areas of strength or weakness that they were not able to identify before. For example, foreign language teachers might consider one of the national assessments identified as a potential common assessment by the Massachusetts Foreign Language Association. These assessments allow educators to see how their students compare to a group of students from across the country.
- Yearly Comparisons: Using parameters informed by results from prior years, educators are able to make comparisons across multiple years. For example, if more students in a teacher's class demonstrated high growth this year than in previous years, the teacher might think about what new instructional strategies could have led to this change and build on them in subsequent years.
- Competition: By using a wider population of scores to determine parameters, the impact of any individual student's score on the population is diminished. For example, if parameters are cut scores based on a national population, it is possible for all students in a given district to perform at the high growth level, whereas if parameters are based solely on the current students in a district, some percentage of student scores necessarily falls into the low growth category.

However, this type of normative approach is not without potential drawbacks:

- Data Requirements: Collecting and analyzing data from a wider population, be it scores from multiple years or multiple locales, requires personnel time. Even for commercial assessments, data may not be structured in a way that easily informs the parameter-setting process.
- Changes in Assessment: Appropriately, many districts and educators will want to make changes to their common assessments from year to year. If using a normative approach based on a wider population to set parameters, districts will have to ensure that educators are not discouraged from making necessary improvements to assessments. If modifications are made, careful consideration must be given to determine whether the changes are significant enough to warrant re-setting the parameters.
Criterion Approaches: In contrast to normative approaches, criterion approaches involve educators using professional judgment to define high, moderate, and low growth. The advantage of this approach is that growth is considered in terms of the learning a student demonstrates, as opposed to how a student's score on an assessment relates to other students' scores. Educators may find a criterion approach more authentic than a normative approach because it engages a group of professionals in thinking through how they expect students to perform on the assessment. However, since the definition is based on professional judgment, it may be harder to articulate and interpret the expectations for students embedded in the parameters. For example, with a normative approach, a district can state, "High growth means the top 25% of student scores." Using a criterion approach, the statement is less cut and dried: "High growth reflects the range of scores that educators determine is representative of exceeding expected performance." Comparison across different assessments in different content areas is also more challenging with a criterion approach, because the criteria for demonstrating high, moderate, and low growth depend on the specific assessment.

Setting parameters can be a challenge for many educators because it requires a shift in their thinking. Most educators are adept at defining expected achievement in their classrooms; they may have less experience thinking about expected growth. While this is a challenge, the shift to thinking about growth is important work because it provides an opportunity to think about the learning of all students, regardless of where they started the year.

Since criterion approaches require the use of professional judgment, they require that educators put forward their best thinking. One district called the parameters it developed during the first year "hypothesized parameters" to make explicit that expectations for what constitutes moderate growth may need to be refined in the future.

How should educators make determinations about what type of student work represents high, moderate, and low growth? Some have found it useful to ground decisions about parameters in discussions about the specific assessment items and their knowledge of student learning progressions. The video example referenced later in this brief illustrates how a 5th grade teacher might set parameters for a pre- and post-test in mathematics using this strategy. The educator in the video uses his understanding of the Curriculum Frameworks, as well as his knowledge of past students, to identify score ranges that represent three different groups of students: students entering 5th grade with math skills that are below grade level, at grade level, and above grade level. Using those three groups, the educator thinks about which specific items on the assessment each group would likely answer correctly on the pre-test, and then considers which additional items each group would need to answer correctly on the post-test to meet his expectations for a year of learning.

An On-Ramp?

Determining parameters for a new common assessment can be challenging. One option is a blended normative and criterion approach that unfolds over two years. For example, consider a 2nd grade team that does not feel it has the expertise to set criterion-based parameters for a new assessment. The team decides to use a normative approach in the first year and determines that the top 25% of student scores on the common assessment will be considered high growth. After administering the assessment, the team applies this rule to its results and finds that scores of 16 points of growth or more comprise the top 25% and therefore fall into the high growth category. During the second school year, the team has the first year of results to inform its parameters. Instead of going the normative route again, the team uses the prior year's results to set the criteria for its parameters, keeping the same cut score, 16 points or more, as the definition of high growth. As a result, in year two the educators know their parameters before teaching begins, and they can look at each student's pre-test to identify how many points he or she would need to earn on the post-test to demonstrate high growth and use this information to inform their planning. A sketch of this blended approach appears below.
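To make the on-ramp concrete, here is a minimal Python sketch of the two-year sequence under the assumptions in the example above; the data, function name, and quartile handling are illustrative only, not part of the brief.

```python
# Sketch of the two-year "on-ramp" (all numbers illustrative).
# Year 1: a normative rule (top 25% of growth scores = high growth) yields
# a cut score. Year 2: that cut score is carried forward as a fixed
# criterion, so teachers know the parameters before instruction begins.
def top_quartile_cut(growth_scores):
    """Lowest growth score still in the top 25% (assumes >= 4 scores)."""
    ranked = sorted(growth_scores, reverse=True)
    return ranked[len(ranked) // 4 - 1]

year1_growth = [3, 5, 7, 9, 11, 12, 14, 16, 17, 20, 8, 10]
high_cut = top_quartile_cut(year1_growth)   # 16 points, as in the example

# Year 2 planning: post-test score each student needs for high growth.
pre_tests = {"student_a": 42, "student_b": 55, "student_c": 61}
for name, pre in pre_tests.items():
    print(f"{name}: needs {pre + high_cut} on the post-test for high growth")
```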
Guidebook for Inclusive Practice

Created by Massachusetts educators, the Guidebook includes tools for districts, schools, and educators that are aligned to the MA Educator Evaluation Framework and promote evidence-based best practices. The Guidebook includes several tools aligned to this brief for developing common assessments that are fair and accessible for all students.

Criterion approaches have several advantages:

- Alignment: Perhaps the greatest advantage of a criterion approach is that learning is defined in relation to standards instead of the performance of other students. Using a criterion approach can help shift conversations away from scores and toward student learning.
- Data Requirements: While previous data can support educators in setting parameters using a criterion approach, they are not a requirement. Educators can use experience with similar tasks, items, or problems to inform the parameters they set.
- Planning: Parameters should be determined prior to using the assessment, which allows them to serve as a planning tool. With a clear understanding of what skills students are expected to demonstrate over a course or year, educators can backwards plan appropriate lessons and assessments to ensure they are on track for students to meet those expectations.
- Singletons: One of the important advantages of using common assessments to inform educator practice is that they can relieve the isolation some educators experience. By looking at the results of students from other classrooms, an educator can begin to understand his or her relative strengths and weaknesses. A singleton teacher using a criterion approach does not need to set parameters alone. For example, a ceramics teacher might invite the two other art teachers in the district to help develop parameters. In fact, the process may help support a broader and more cohesive approach to arts education and assessment.
- Yearly Comparisons: Since parameters are not based on student scores, an educator can look across years to see how his or her practice has improved from year to year.
- Competition: Since determinations of high, moderate, and low growth are not based on other students' scores, students who demonstrate high growth do not affect the ability of other students to also demonstrate high growth.

Using a Criterion Approach with a Rubric

Many common assessments are scored using a rubric. Designed carefully, a rubric can be quite helpful in understanding criterion-based parameters. For example, consider a writing rubric written such that moving up 2 to 3 performance levels is a demonstration of moderate growth; moving up more than 3 performance levels would therefore be evidence of high growth, and moving up fewer than 2 levels would be evidence of low growth. For this approach to work, educators would have to be confident that it is equally challenging to move from one performance level to the next throughout the rubric. The sketch below encodes this rubric logic.
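Here is a minimal sketch of the rubric-based criterion just described, using the brief's example cutoffs (2-3 levels gained = moderate); the function name and sample calls are illustrative.

```python
# Rubric-based criterion growth: moderate = advancing 2-3 performance
# levels between the fall and spring scoring of the rubric. Cutoffs are
# the brief's example values; everything else is illustrative.
def rubric_growth(fall_level: int, spring_level: int) -> str:
    levels_gained = spring_level - fall_level
    if levels_gained > 3:
        return "high"
    if levels_gained >= 2:
        return "moderate"
    return "low"

print(rubric_growth(1, 5))  # 4 levels gained -> high
print(rubric_growth(2, 4))  # 2 levels gained -> moderate
print(rubric_growth(3, 4))  # 1 level gained  -> low
```

The design rests on the assumption named above: each step of the rubric must be roughly equally hard to climb, or the simple level count misstates growth.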
However, criterion approaches also have drawbacks that educators must consider:

- Time: Compared to normative approaches, criterion approaches are time intensive. They require time for educators to share their perspectives and for groups to arrive at consensus-based decisions.
- Experience: Since determining parameters with a criterion approach involves educators' professional judgment, it relies on experienced educators sharing what they know about past students' learning progressions. Teams should plan to revisit parameters in the first few years to capitalize on increased educator experience and knowledge in the parameter refinement process.
- Comparability: A criterion approach has the advantage of tying parameters closely to the standards and learning objectives for each content area. As a result, however, it can be difficult to arrive at a clear cross-content definition of moderate growth, which makes it hard to know whether all groups of educators are using comparably rigorous definitions. Districts therefore need to pay close attention to comparability across different measures.

Same Number of Students in Each Category?

There is no requirement that an equal percentage of student scores fall into the high, moderate, and low growth categories for all common assessments. However, if only a small handful of scores generally falls in a given category, it is difficult to notice trends or patterns about where and with whom high or low growth is occurring. Districts are encouraged to adopt rigorous definitions of high and low growth in order to provide meaningful feedback to educators about student learning. If, over time, few or no students demonstrate high or low growth on a particular assessment, that is a good signal to revisit the parameters and see whether they should be adjusted up or down.

Applying Student Results to Parameters

Setting parameters at the student level: For most assessments, it makes sense to determine parameters at the student level, whether using a normative or criterion approach. In other words, parameters are applied to each student's work to determine whether the student demonstrated high, moderate, or low learning, growth, or achievement. When looking across the assessment results, evaluators can ask, "Have more than half of the educator's students demonstrated either high or low growth?" If so, they will look to see whether the results are consistent with a pattern of high or low growth in the educator's students on other measures and over multiple years. This information will ultimately inform the educator's Student Impact Rating (see additional guidance on determining an educator's Student Impact Rating).

Setting parameters at the class level: There may be assessments for which setting parameters at the student level does not make sense. For example, many art, music, and physical education teachers work with large numbers of students for short periods of time. In these cases, it may be appropriate to use common assessments that look at whether a class, rather than each individual student, has demonstrated high, moderate, or low growth. This approach can also provide powerful feedback about an educator's impact. For example, a team of art teachers may have each of their students develop an individualized goal based on an initial assessment. The team could then determine that moderate growth for a class would be 70% to 90% of the students in the class meeting their goals. A sketch of this class-level logic appears below.
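The following minimal sketch applies the art-team example just described: the share of students meeting their individualized goals determines the class rating, with 70%-90% taken as moderate per the brief's example. The function and the sample class are illustrative assumptions.

```python
# Class-level parameters: the percent of students meeting their
# individualized goals determines the class growth rating.
# 70%-90% meeting goals = moderate growth, per the brief's example.
def class_growth(met_goal, low_below=0.70, high_above=0.90):
    """met_goal: list of booleans, one per student in the class."""
    pct_met = sum(met_goal) / len(met_goal)
    if pct_met > high_above:
        return "high"
    if pct_met >= low_below:
        return "moderate"
    return "low"

# A hypothetical class of 20 students, 15 of whom met their goals (75%).
print(class_growth([True] * 15 + [False] * 5))  # -> "moderate"
```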
Another example of a class-based approach to parameters comes from a music teacher who worked across a district, teaching different groups of students for thirteen half-hour lessons. She was concerned that, with so little instructional time, a pre-test and post-test would consume too much of her time with each group. She decided it would be more meaningful to measure whether each group as a unit demonstrated the type of growth she was expecting. During the first lesson with each group, she assessed the group using three questions, and she asked those same three questions during the last lesson. This short assessment took only a few minutes, and since her focus was growth as a group, she did not have to make determinations at the student level, which would have involved mitigating floor and ceiling effects. While the assessment was very simple, she gained a robust picture of her impact without overburdening her classroom with assessments. In some instances, collecting a small amount of data from a large number of students can be just as meaningful as collecting a large amount of data from a small number of students.

Parameters Appropriate for All Students

Using Banding: All students should have an equal chance to demonstrate growth on a common assessment. The banding process allows educators to set growth parameters with different cutoff scores depending on the student's baseline score. Setting "bands" according to baseline scores allows educators to set low, moderate, and high ranges of growth more accurately, and acknowledges that, on many assessments, an increase of 1 point does not represent the same amount of growth at every point on the scale. In other words, it may be easier for students to move from a baseline score of 5 to an end-of-course score of 10 than it is to move from a baseline of 90 to an end-of-course score of 95. Educators may create as many bands in the parameter-setting process as they wish, but three is the recommended minimum when working with students with diverse learning profiles.

Districts can use either normative or criterion approaches with banding. With a normative approach, instead of low growth representing the students in the lowest third overall, low growth would represent the students who performed in the lowest third on the post-test compared only to students whose pre-test scores fell in the same initial performance range. This approach can be seen as a highly simplified version of the process used to determine SGPs. Districts are encouraged in future years to continue to investigate issues of fairness in the process of continuous improvement.

If a district uses a criterion approach, instead of considering a single group of students, educators would be asked to consider a group of students representing each band. For example, if a district chose to use three bands, educators would discuss three groups of students: students whose initial scores suggest they are below grade level, at grade level, and above grade level. See the video example of this process referenced below.

The table below is an example of ranges of low, moderate, and high growth for students in three bands, based on baseline scores; in this example, three bands were set to capture different rates of growth.

Moderate Growth Parameters Using Banding

Initial Score | Low | Moderate | High
0-2           | 0-5 | 6-7      | 8-10
3-5           | 0-6 | 7-8      | 9-10
6-7           | 0-7 | 8-9      | 10
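As an illustration, the following Python sketch applies the example table above to individual students' scores. The data structure and names are illustrative, not prescribed by the brief; only the band values come from the table.

```python
# Banded parameters implementing the example table: each band keys on the
# initial (pre-test) score and carries its own moderate range of final
# scores. Values are taken directly from the table above.
BANDS = [
    # (initial_min, initial_max, moderate_min, moderate_max)
    (0, 2, 6, 7),   # initial 0-2: moderate growth = final score of 6-7
    (3, 5, 7, 8),   # initial 3-5: moderate growth = final score of 7-8
    (6, 7, 8, 9),   # initial 6-7: moderate growth = final score of 8-9
]

def banded_growth(initial: int, final: int) -> str:
    for lo, hi, mod_min, mod_max in BANDS:
        if lo <= initial <= hi:
            if final > mod_max:
                return "high"
            if final >= mod_min:
                return "moderate"
            return "low"
    raise ValueError(f"no band covers initial score {initial}")

print(banded_growth(1, 8))   # low starter reaching 8  -> high
print(banded_growth(6, 10))  # high starter reaching 10 -> high
```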
Educators will often discover that students with higher initial scores present a "ceiling effect" problem that prevents these students from demonstrating growth. Adding new, harder questions on the pre-test, or using a modified assessment, are ways to address the ceiling effect. When using bands, it is important to keep in mind that the goal is not to set different expectations for different students, but rather to acknowledge that the number of points representing moderate growth may differ based on the initial score.

Parameter Setting Example Video

A companion video provides a step-by-step walk-through of a criterion-based approach to setting parameters on a 5th grade math assessment using three bands of initial scores.

Using the Information

The purpose of the educator evaluation system is to support the professional growth and development of educators. Ensuring that all educators are using assessments with clear parameters ensures that educators contribute to and understand shared expectations for student learning, growth, and achievement. Below are questions for evaluators and educators to support the use of parameters in improving instruction and, ultimately, student learning.

For Evaluators: Evaluators should consider two important questions for all of the assessments used as evidence for determining Student Impact Ratings.

1. First, evaluators should look across all common assessments in the district and ask whether similar numbers of students are demonstrating high, moderate, or low growth across the different assessments. Having an assessment that consistently results in high numbers of students demonstrating high (or low) growth is not necessarily a problem; however, evaluators should look for additional evidence to support the finding. Absent evidence that the assessment is helping educators make meaningful distinctions about student performance, evaluators should work with educators to revise the parameters. Parameters are likely to change over time as educators learn more about how students typically perform on a given assessment.

2. Second, evaluators should investigate whether students who have demonstrated low growth on a given assessment are clustered in a specific classroom. Most of the time, students who have demonstrated low growth will be spread across all classrooms; this is to be expected. However, if low growth is concentrated in a single classroom, there is evidence that the educator(s) may need to reinforce the knowledge and skills measured by the assessment. That said, it is important to remember that low growth may cluster in a classroom based on factors other than instruction. For this reason, Student Impact Ratings always factor in the context in which the underlying assessments were administered.

For Educators: The systematic use of common assessments provides an excellent opportunity for educators to take an honest look at the impact of their instruction.

1. Educators may look across the population of students who are demonstrating low growth and investigate whether there are common characteristics among these students.
Doing so may reveal potential issues of bias in the assessment. For example, many assessments result in low growth scores for the highest-achieving students; this is largely due to ceiling effects, which can be mitigated through revisions to the assessment and/or scoring process.

2. Educators are encouraged to look closely at common assessment results to see whether patterns emerge. For example, educators may look across how students have grown on different categories of a rubric and determine that students in one class are not making the same type of growth in providing details connected to text as students in other classes. Understanding how different educators' students are succeeding on different parts of an assessment provides a good framework for collaborative learning between teachers. This is very much the value and intent of using common assessments, and it connects back to the goal of the educator evaluation framework: to support educator growth and development.

Reviewing Parameters

It is expected that districts will review and revise parameters to ensure they are providing meaningful feedback to educators, especially during the first couple of years of using a common assessment. One strategy for engaging educators in parameter revisions is identification agreement.

Identification Agreement: The process of identification agreement involves investigating whether determinations of high, moderate, and low growth would remain consistent if more data were used to make them. There is no expectation of perfect agreement; even a student who has made high growth may have an "off" day when completing an assessment. However, if there is an overall pattern of inconsistency, districts should make changes to the parameters and potentially to the assessment itself. Educators can follow this process (a sketch of the sampling step appears after the list):

- Randomly select one student who has demonstrated each level of growth (high, moderate, and low) on the assessment.
- Collect additional information about those three students from across the year, such as performance on other assessments, student work samples, and teacher testimonials.
- Looking at these multiple pieces of evidence, ask whether the totality of the evidence supports the conclusion reached on the original assessment. In other words, did the other evidence collected about the student who demonstrated high growth on the common assessment also signal that the student exceeded expectations for the year?
- If the conclusions based on multiple pieces of evidence match the results of the common assessment for all three students, there is some evidence that the parameters are appropriately set. If they do not match, investigate whether the parameters should be adjusted or whether the common assessment is not well aligned to the other data collected. In borderline cases, randomly choose another three students and repeat the process.
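For the first step of this process, a small sketch like the one below could select the review sample; the data structure and names are entirely hypothetical, and the substantive evidence review itself remains a matter of professional judgment.

```python
# Sketch of the sampling step of identification agreement: randomly pick
# one student from each growth category for a closer evidence review.
# Ratings would come from the scored common assessment; names are made up.
import random

ratings = {
    "high":     ["ana", "ben", "carla"],
    "moderate": ["dev", "ellen", "farid", "gia"],
    "low":      ["hana", "ivan"],
}

review_sample = {level: random.choice(students)
                 for level, students in ratings.items()}
print(review_sample)  # one student per level to compare against other evidence
```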