Three Options for Setting Student Growth Targets OMLA 2014

Evaluating Pretest to Posttest
Score Differences in CAP Science and
Social Studies Assessments:
How Much Growth is Enough?
February 2014
Dale Whittington, Ph.D. – Shaker Heights
Russ Brown, Ph.D. – CMSD
Denis Jarvinen, Ph.D. – Strategic Measurement and Evaluation, Inc.
Setting Standards for Performance
Licensing Tests
(e.g., Pharmacists)
One Test and One Standard for Performance
Pass or Fail
State Accountability Testing
(e.g., Ohio OAA)
One Test, Multiple Standards
Below Basic, Basic, Proficient, Advanced
CAP Foundation Science and
Social Studies Assessments
Two Tests, One Standard
Evaluating Growth (How Much?)
Looking at Performance Standards
Content-Based
Standards
Goal of standard setting is to determine
a level of knowledge and skill judged to
be appropriate for test purpose
Growth-Based
Standards
Goal of standard setting is to use common
statistical feature(s) of the data to set a
criterion for acceptable performance
Three Statistic-Based Approaches
for Evaluating Growth
of Student Scores
• Using Effect Size
• Using the Score Distribution
• Using the Standard Error of Measurement
Describing and Comparing
Approaches
• Data Points Needed
• Calculations Required
• Outcomes Using a Common Set of Student Data
• Advantages and Disadvantages
The Common Data Set
Shaker Heights Schools
Effect Size for SLOs and Growth
Prepared by Dale Whittington
Shaker Heights City School District
Ohio Middle Level Annual Conference
Columbus, Ohio
February 21, 2014
What is effect size?
• In an educational setting, effect size is one way to
measure the effectiveness of a particular
intervention.
• Effect size enables us to measure both the
improvement (gain) in learner achievement for a group
of learners AND at the same time, take into account the
variation of student performance.
Adapted from Understanding, using and calculating effect size, Govt of South Australia, Department of
Education & Child Development, http://www.decd.sa.gov.au/quality/files/links/WhatIsEffectSize.pdf
Practical Advantages
• Easy to calculate
• Easy to understand; makes intuitive sense
• Adaptable to different kinds of assessments
• Adaptable to different ways of considering growth and goals for SLOs:
– Shared attribution across the district
– Shared attribution within a school
– Attribution for a specific teacher or group of students
So how do you calculate effect sizes
for SLOs or growth?
Start with a set of pretest scores and
posttest scores for the same students
Calculate the difference between the
pretest & posttest for each student
Student   Pretest   Posttest   Difference (AKA Gain)
Denis     40        35         -5
Donna     25        30         +5
Dale      45        50         +5
Russ      30        40         +10
Calculations Continued
Calculate the means and standard
deviations for both tests
• Pretest
– Mean: 35.0
– SD: 9.1
• Posttest
– Mean: 38.8
– SD: 8.5
Average the Standard Deviations
• The average of 9.1 and 8.5 is 8.8
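The summary statistics for the four example students can be reproduced in a few lines. This is a minimal sketch in Python, assuming the sample standard deviation (n - 1 in the denominator), which is the version that matches the 9.1 and 8.5 shown on the slide; the variable names are illustrative only.

```python
from statistics import mean, stdev

# Pretest and posttest scores for Denis, Donna, Dale, and Russ (from the example).
pretest = [40, 25, 45, 30]
posttest = [35, 30, 50, 40]

pre_mean = mean(pretest)
post_mean = mean(posttest)

# Assumption: sample SD (n - 1), which reproduces the 9.1 and 8.5 on the slide.
pre_sd = stdev(pretest)
post_sd = stdev(posttest)

# Average the two standard deviations.
avg_sd = (pre_sd + post_sd) / 2

print(f"Pretest:  mean={pre_mean:.1f}, SD={pre_sd:.1f}")
print(f"Posttest: mean={post_mean:.2f}, SD={post_sd:.1f}")
print(f"Average SD: {avg_sd:.1f}")
```

The posttest mean is exactly 38.75, which the slide shows rounded to 38.8.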
How to adapt
• If your pretest and posttest are different lengths,
convert to a similar scale, like percentages.
• Think about who you are basing your analysis on
and use that to decide what standard deviation
(SD) to use
– Common attribution for district: District SD
– Common attribution for school: School SD
– Class: Class SD
– Specific group, such as economically disadvantaged: the group’s SD
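Both adaptations can be sketched together: put the two tests on a percent-correct scale, then compute the SD for whichever group the analysis is attributed to. The test lengths, class names, and scores below are hypothetical, and averaging the pretest and posttest SDs follows the earlier calculation rather than a rule stated on this slide.

```python
from statistics import stdev

def to_percent(raw_score, max_score):
    """Convert a raw score to percent correct so tests of different lengths share a scale."""
    return 100.0 * raw_score / max_score

# Hypothetical test lengths: a 40-point pretest and a 50-point posttest.
PRE_MAX, POST_MAX = 40, 50

# Hypothetical records: (attribution group, pretest raw score, posttest raw score).
records = [
    ("Class A", 20, 30),
    ("Class A", 30, 40),
    ("Class B", 10, 25),
    ("Class B", 24, 35),
]

def group_sd(records, group=None):
    """Average of the pretest and posttest percent SDs for one group (all records if group is None)."""
    rows = [r for r in records if group is None or r[0] == group]
    pre = [to_percent(p, PRE_MAX) for _, p, _ in rows]
    post = [to_percent(q, POST_MAX) for _, _, q in rows]
    return (stdev(pre) + stdev(post)) / 2

all_records_sd = group_sd(records)         # common attribution across every record
class_a_sd = group_sd(records, "Class A")  # attribution for a single class

print(f"All records SD: {all_records_sd:.1f}")
print(f"Class A SD:     {class_a_sd:.1f}")
```

Dividing a gain by the SD of a narrower group generally gives a different effect size than dividing by the district SD, so the choice of group should be made before looking at results.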
Use the average standard
deviation and the gains to
calculate the effect size:
Effect Size=Gain/SD
Student   Pretest   Posttest   Gain   Effect
Denis     40        35         -5     -.57
Donna     25        30         +5     +.57
Dale      45        50         +5     +.57
Russ      30        40         +10    +1.14
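The per-student effect sizes in the table can be reproduced as follows. This is a sketch, assuming the average SD is rounded to 8.8 before dividing, as the slide values suggest (with the unrounded SD, Russ comes out at 1.13 rather than 1.14). The group-level figure at the end is an extra illustration not shown in the table.

```python
from statistics import mean, stdev

# (pretest, posttest) for each student in the example.
scores = {
    "Denis": (40, 35),
    "Donna": (25, 30),
    "Dale": (45, 50),
    "Russ": (30, 40),
}

pretest = [pre for pre, _ in scores.values()]
posttest = [post for _, post in scores.values()]

# Assumption: average of the two sample SDs, rounded to one decimal (8.8) as on the slide.
avg_sd = round((stdev(pretest) + stdev(posttest)) / 2, 1)

# Effect Size = Gain / SD
gains = {name: post - pre for name, (pre, post) in scores.items()}
effects = {name: round(gain / avg_sd, 2) for name, gain in gains.items()}

# Illustration only: group effect size as mean gain over the same SD.
group_effect = mean(list(gains.values())) / avg_sd

for name in scores:
    print(f"{name:6s} gain={gains[name]:+3d} effect={effects[name]:+.2f}")
print(f"Group effect size: {group_effect:.2f}")
```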
Interpret your results:
Common criteria
Cohen (1969)
• ‘Small’ (.2)
o real, but difficult to detect
o difference between the heights of 15 year old and 16 year old girls in the US
• ‘Medium’ (.5)
o ‘large enough to be visible to the naked eye’
o difference between the heights of 14 & 18 year old girls
• ‘Large’ (.8)
o ‘grossly perceptible and therefore large’
o difference between the heights of 13 & 18 year old girls
Hattie: “For students moving from one year to the next, the average
effect size across all students is 0.40.”
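One way to apply these criteria to a computed effect size is sketched below. Treating Cohen's benchmarks as lower bounds of bands is an assumption made for illustration (Cohen offered them as reference points, not cutoffs), and the function and constant names are hypothetical.

```python
# Hattie's reported average effect size for one year of schooling.
HATTIE_YEARLY_AVERAGE = 0.40

def cohen_label(effect_size):
    """Label an effect size by the largest Cohen benchmark its magnitude reaches.

    Assumption: the benchmarks (.2, .5, .8) are treated as lower bounds of bands.
    """
    magnitude = abs(effect_size)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "below small"

def exceeds_typical_year(effect_size):
    """True when the gain is larger than Hattie's average year-to-year effect."""
    return effect_size > HATTIE_YEARLY_AVERAGE

for es in (-0.57, 0.57, 1.14):
    print(es, cohen_label(es), exceeds_typical_year(es))
```

The label reflects magnitude only, so a loss of -.57 is still "medium" in size; the sign has to be read separately.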
How results differ, depending on
attribution and how you tier students
Another Example based on OAA
Resources
• Understanding, using and calculating effect size. Government of South
Australia, Department of Education & Child Development,
http://www.decd.sa.gov.au/quality/files/links/WhatIsEffectSize.pdf
• Review Methods/Interpreting Effect Sizes. JHU: Best Evidence
Encyclopedia.
http://www.bestevidence.org/methods/effectsize.htm
• Calculating an effect size: a practical guide. Visible Learning Plus.
http://visiblelearningplus.com/faqs/calculating-effect-size-practical-guide
Establishing Growth Targets
with Limited Data
Prepared by Russ Brown, Ph.D. – CMSD
Overview
• Design Principles for Student Growth Model
work
• The PROBLEM!
• An Idea for a Solution
• Strengths/Weaknesses
Guiding Principles
1. Equity - like measures for like teachers, like
expectations for like students.
2. Simplicity - Parsimony and transparency
are critical.
3. Continuous improvement will be critical – It
simply will not be perfect on the first try!
The PROBLEM
How much growth is enough?
How do you estimate this when you don’t know
the relationship between the two tests?
What do we know?
1. Basic information about the distribution of scores.
Time      Mean    SD
Pretest   24.28   9.6
2. The relative position of each student on the distribution.
Can we leverage this
to set targets?
The Idea
1. We have no way to estimate what growth
“should be”…
2. Students of like ability (i.e., same pretest
scores) would typically be expected to make
comparable growth over time.
3. Use Normal Curve Equivalents as a means to
establish targets and relative growth.
How
1. Translate Pre-Test scores to NCEs
Class    Pretest   PreMean   SD    Pre-Z   PreNCE
Class1   8.0       24.3      9.6   -1.7    14.2
Class1   9.0       24.3      9.6   -1.6    16.4
Class1   9.0       24.3      9.6   -1.6    16.4

Z = (Pretest Score - Mean Pretest Score) / Standard Deviation of the Pretest
NCE = (Z x 21.063) + 50 (1-99 Interval)
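The two formulas translate directly into code. This is a sketch that uses the rounded mean (24.3) and SD (9.6) from the table, since those reproduce the displayed Z and NCE values. Clamping the result to 1-99 is one reading of the "1-99 Interval" note and should be treated as an assumption.

```python
def z_score(score, mean, sd):
    """Z = (score - mean) / SD."""
    return (score - mean) / sd

def nce(z):
    """NCE = (Z x 21.063) + 50."""
    value = z * 21.063 + 50
    # Assumption: keep the result inside the 1-99 interval noted on the slide.
    return max(1.0, min(99.0, value))

PRE_MEAN, PRE_SD = 24.3, 9.6  # rounded pretest mean and SD from the table

for pretest in (8.0, 9.0):
    z = z_score(pretest, PRE_MEAN, PRE_SD)
    print(f"pretest={pretest:.1f}  Z={z:.1f}  NCE={nce(z):.1f}")
```

The same two functions apply to posttest scores, using the posttest mean and SD.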
Outcomes – What Threshold?
Calculating whether the goal is obtained:
                                          Stringency of Goal
Class   PreNCE   PostNCE   NCE Change     0     -5    -7.5
Stu 1   14.2     3.2       -11.1          No    No    No
Stu 2   16.4     7.6       -8.9           No    No    No
Stu 3   16.4     9.0       -7.4           No    No    Yes
Stu 4   18.6     11.9      -6.7           No    No    Yes
• Must make a judgment about the stringency of
the goal/calculation
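The goal check can be sketched as a comparison of each student's NCE change against the chosen stringency level. Reading the levels as minimum acceptable NCE changes is an assumption, though it is consistent with the Yes/No pattern in the table. The changes here are computed from the rounded NCEs, so two of them differ by 0.1 from the displayed values.

```python
# (PreNCE, PostNCE) for the four students in the table.
students = {
    "Stu 1": (14.2, 3.2),
    "Stu 2": (16.4, 7.6),
    "Stu 3": (16.4, 9.0),
    "Stu 4": (18.6, 11.9),
}

# Stringency levels from the table, read as the minimum acceptable NCE change.
THRESHOLDS = (0, -5, -7.5)

def goal_met(pre_nce, post_nce, threshold):
    """Assumption: the goal is met when the NCE change is at or above the threshold."""
    return (post_nce - pre_nce) >= threshold

results = {
    name: [goal_met(pre, post, t) for t in THRESHOLDS]
    for name, (pre, post) in students.items()
}

for name, flags in results.items():
    print(name, ["Yes" if met else "No" for met in flags])
```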
Outcomes – What Performance Level?
Percent of students achieving the Goal   Teacher Growth Rating   Translation
90-100%                                  5                       Above
80-89%                                   4                       Met
70-79%                                   3                       Met
60-69%                                   2                       Met
0-59%                                    1                       Below
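The rating table amounts to a lookup on the percent of students who met the goal. This is a sketch, assuming each band starts at its lower bound, so a value such as 89.5% falls in the 80-89% band. The function names are hypothetical.

```python
# (lower bound of band in percent, rating, translation), highest band first.
RATING_BANDS = [
    (90, 5, "Above"),
    (80, 4, "Met"),
    (70, 3, "Met"),
    (60, 2, "Met"),
    (0, 1, "Below"),
]

def percent_meeting(goal_flags):
    """Percent of students whose goal flag is True."""
    return 100.0 * sum(goal_flags) / len(goal_flags)

def growth_rating(percent):
    """Return (rating, translation) for a percent of students achieving the goal."""
    for lower_bound, rating, translation in RATING_BANDS:
        if percent >= lower_bound:
            return rating, translation
    raise ValueError("percent must be between 0 and 100")

# Hypothetical class of four students, three of whom met the goal.
pct = percent_meeting([True, False, True, True])
print(pct, growth_rating(pct))
```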
Outcomes – What Performance Level?
Percent of Students Reaching the Goal, by stringency of goal
(each cell: growth rating - percent of students)
Group     0           -5          -7.5        Mean Gain
Class 1   1 - 12.0%   1 - 20.0%   1 - 52.0%   24.04
Class 2   4 - 84.0%   5 - 92.0%   5 - 92.0%   37.88
Class 3   1 - 44.0%   1 - 52.0%   2 - 60.0%   34.44
Class 4   1 - 44.0%   1 - 56.0%   2 - 64.0%   34.76
• Not surprisingly – outcomes vary by the stringency of
the expectation…
Outcomes – Quick Comparison
Percent of Students Reaching the Goal, by stringency of goal
(each cell: growth rating - percent of students)
Group     0           -5          -7.5        Mean Gain
Class 1   1 - 12.0%   1 - 20.0%   1 - 52.0%   24.04
Class 2   4 - 84.0%   5 - 92.0%   5 - 92.0%   37.88
Class 3   1 - 44.0%   1 - 52.0%   2 - 60.0%   34.44
Class 4   1 - 44.0%   1 - 56.0%   2 - 64.0%   34.76
Percent of Students Reaching the Goal (SEM)
(each cell: growth rating - percent of students)
Group     3 SE      2 SE       1 SE       Mean Gain
Class 1   1 - 44%   4 - 88%    5 - 100%   24.04
Class 2   5 - 96%   5 - 100%   5 - 100%   37.88
Class 3   1 - 52%   1 - 56%    3 - 76%    34.44
Class 4   1 - 48%   2 - 60%    3 - 76%    34.76
Outcomes – What about Real Data?
Applied to 3rd Grade OAA (Fall to Spring):
Percent of students achieving the Goal   Building Growth Rating   Translation   IRN Count
90-100%                                  5                        Above         0
60-89%                                   2-4                      Met           37
0-59%                                    1                        Below         36
Outcomes – What about Real Data?
Applied to 4th Grade Benchmark to OAA (Fall to Spring):
Percent of students achieving the Goal   Building Growth Rating   Translation   IRN Count   Mean Value Add Index
90-100%                                  5                        Above         2           1.96
60-89%                                   2-4                      Met           50          -.68
0-59%                                    1                        Below         13          -1.56
Pros and Cons
+ Students with like scores have like expectations
for growth
+ Relatively simple and relatively transparent
- Must make a value judgment about the amount
of error for which one wishes to compensate (not
so transparent)
- More adjustment = more bias at the bottom
Standard Error of Measurement
All scores have a “true” score and “error”
• Error bands on score reports
Standard Error quantifies degree of “error” in a test score
Formula is: Standard Error of Measurement = Standard Deviation × √(1 − Reliability of the Test)
Values needed: Mean, Standard Deviation, Reliability of the Test
Assumptions that underlie this approach
Steps
1) For a set of data, calculate the mean and standard deviation
2) Calculate the reliability of the test
3) Use the formula to determine the Standard Error of Measurement (class
level, school level)
4) Set a level for the growth standard (1 se, 2 se, etc.)
5) Add chosen level of standard error to raw score
6) Convert (raw score + standard error) to percent correct on pretest
7) Find corresponding percent correct/raw score on posttest
(Note: Assumptions here not required once IRT equating is completed)
8) Compare actual student posttest score with target score
9) At or above target score = “Acceptable Progress”
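Steps 3 through 9 can be sketched for a single student as follows. The standard error uses the usual formula, SD times the square root of (1 - reliability). The reliability, test lengths, and student scores are hypothetical, and the SD is borrowed from the pretest summary earlier purely for illustration. Step 7 is implemented by assuming that equal percent correct is equivalent across the two tests, which is the assumption the note says IRT equating would remove.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Step 3: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def posttest_target(pre_raw, sem, n_se, pre_max, post_max):
    """Steps 5-7: add n_se standard errors, convert to percent correct on the
    pretest, then find the raw score at that percent on the posttest.

    Assumption: equal percent correct means equal performance on both tests.
    """
    target_pct = 100.0 * (pre_raw + n_se * sem) / pre_max
    return target_pct / 100.0 * post_max

sd = 9.6                    # pretest SD from the earlier summary (illustration only)
reliability = 0.85          # hypothetical test reliability
pre_max, post_max = 50, 60  # hypothetical test lengths
n_se = 1                    # Step 4: chosen level for the growth standard

sem = standard_error_of_measurement(sd, reliability)
target = posttest_target(pre_raw=20, sem=sem, n_se=n_se,
                         pre_max=pre_max, post_max=post_max)

actual_posttest = 30        # hypothetical student posttest raw score
acceptable = actual_posttest >= target  # Steps 8-9

print(f"SEM={sem:.2f}  target={target:.2f}  acceptable progress: {acceptable}")
```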
Calculations for one student
Results
Observations
High pretest scores can lead to out-of-range posttest score
targets.
Any modification to the sample that increases the Standard
Deviation will increase the value of the Standard Error and
therefore require more score growth to reach the target.
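Both observations can be checked numerically. The reliability, test length, and scores below are hypothetical and serve only to show the direction of each effect under the SEM formula, SD times the square root of (1 - reliability).

```python
import math

def sem(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

RELIABILITY = 0.85        # hypothetical
PRE_MAX = POST_MAX = 50   # hypothetical: both tests scored out of 50
N_SE = 2                  # growth standard of 2 standard errors

def target(pre_raw, sd):
    """Posttest target score on a test of POST_MAX points."""
    return (pre_raw + N_SE * sem(sd, RELIABILITY)) / PRE_MAX * POST_MAX

# Observation 1: a high pretest score pushes the target past the top of the scale.
high_target = target(48, sd=9.6)
out_of_range = high_target > POST_MAX

# Observation 2: a larger SD gives a larger SEM and therefore a higher target.
low_sd_target = target(20, sd=9.6)
high_sd_target = target(20, sd=12.0)

print(f"Target for pretest 48: {high_target:.1f} (max {POST_MAX})")
print(f"Target for pretest 20: SD 9.6 -> {low_sd_target:.1f}, SD 12.0 -> {high_sd_target:.1f}")
```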