Using Assessment Results to Inform Educator Effectiveness

Wyoming Accountability Advisory Committee
Scott Marion & Chris Domaleski
Center for Assessment
June 14, 2012

 Some background
 Outline key decisions for creating educator evaluation systems
 Our purpose today is to highlight some of the key decisions we will need to make through the interim
 We'll be asking a lot more questions than providing answers, but we will need to answer these questions in order to move forward…
 A process note: given the number of people on the WebEx/call, I will pause at specific places in the presentation to respond to questions.

 Wyoming, like an increasing number of states, intends to revise its teacher and leader evaluation practices
 Educator effectiveness will be determined "in part by student achievement"
 This enterprise holds great promise, but also presents real challenges
 We are fortunate to be able to build on the work in many other states. We are closely involved in:
◦ CO, RI, NH, GA, PA, UT, NYC, HI, LA

 Why the interest in new forms of teacher evaluation?
 Nobody doubts the critical influence of teacher quality on student achievement
 Current (traditional) evaluation systems rarely identify either highly effective or ineffective teachers

 From Aspen Report and our experience:
◦ Vision and Goals
◦ State-Local Roles and Responsibilities
◦ Theory of Action
◦ General Evaluation Model
 Coherence
◦ Specific Measurement Model(s)
 Attribution rules
 Combining multiple measures
◦ Information Requirements
◦ Capacity Requirements
◦ Reporting & Communication
◦ Consequences & Support
◦ Monitoring and Evaluation

 What is the vision and what are the guiding principles of the system we will design?
 For example, will the system be designed to identify and "counsel out" low-quality educators, or is it designed primarily to improve the performance of the majority of educators?

 The primary purpose of the system is to maximize student learning
 The system is designed to maximize educator development by providing specific information, including appropriate formative information that can be used to improve teaching quality.
 Local instantiations of the State Model system must be designed collaboratively among teachers, leaders, and other key stakeholders such as parents and students as appropriate. Individual educators will have input into the specific nature of their evaluation and considerable involvement in the establishment of their specific goals.
 The effectiveness rating of each educator shall be based on multiple measures of teaching practice and student outcomes, including using multiple years of data when available, especially for measures of student learning.
 The Model system is designed to ensure that the framework, methods, and tools lead to a coherent system that is also coherent with the developing NH Leader Evaluation System.
 The Model system shall be applied by well-trained leaders and evaluation teams using the multiple sources of evidence along with professional judgment to arrive at an overall evaluation for each educator.

 What will be the "reach" of the state in defining local systems?
 What factors must be considered in this decision?
◦ Comparability/portability vs. flexibility
◦ Support and capacity building
◦ Oversight and monitoring
◦ Required Framework, "State Model," or State-required system
 We are proceeding here with the assumption that there will at least be a state-required framework.

 Grounds our design
 Clarifies the assumptions, purposes, and goals of the system
 Specifies the various indicators and mechanisms by which the system will fulfill its purposes (and minimize unintended negative consequences)
 Serves as a framework for evaluation
 The ToA on the following slide is oversimplified and somewhat naïve, but it is what is driving much of the policy. We'll be working with more complex and honest ToAs as we do our work.

[Diagram: measures of educator effectiveness and evaluation processes inform hiring, placement, professional development, compensation, dismissal, and the career ladder, with the intent that student outcomes improve.]

[Diagram: a generic theory-of-action chain running from assumptions or antecedents through activities and mechanisms to proximal indicators, then through further activities and mechanisms to intermediate indicators, distal indicators (the intended outcomes), and consequences.]

 Let's look at a more reasonable approximation for an improvement-based educator evaluation system

[Diagram: an improvement-based theory of action linking the educator evaluation system to improved student learning through intermediate steps: the system focuses educators' attention on productive practices, student performance is well measured, results are used to improve instruction, and evaluation results improve.]

 Policy makers should have to say, very explicitly, why and how implementing test-based approaches to support educator effectiveness for these grades and subjects will lead to improved educational opportunities for students
 For example, one might postulate that holding teachers accountable for increases in student test scores on classroom-based assessments will lead to the development of both better assessments and improvements in student learning.
 What are the specific mechanism(s) by which the intended outcomes will occur?
 E.g., targeted instruction, better PD, and/or more appropriate curricular materials?

 What will be the major components of our system?
◦ Measures of teacher practice
◦ Measures of student performance
◦ Student voice?
◦ Peer input?
◦ Other?
 How will these be combined and weighted? (see the sketch below)
 How will these classes of indicators be integrated to form a coherent picture?

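To make the combining-and-weighting question concrete, here is a minimal sketch of one common approach: rescale each component, apply policy-chosen weights, and map the composite onto a rating label. The components, weights, and cut points below are illustrative assumptions, not a proposal for WY.

```python
# Illustrative sketch only: combining multiple measures into an overall
# rating. Component names, weights, and cut points are hypothetical.

WEIGHTS = {            # policy decision: how much each component counts
    "practice": 0.50,  # observations, artifacts, etc., rescaled to 0-1
    "growth": 0.35,    # student growth measure, rescaled to 0-1
    "slo": 0.15,       # student learning objective attainment, 0-1
}

CUTS = [(0.75, "Highly Effective"), (0.50, "Effective"),
        (0.25, "Developing"), (0.00, "Ineffective")]

def overall_rating(scores: dict[str, float]) -> str:
    """Weighted composite of component scores already scaled to [0, 1]."""
    composite = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    for cut, label in CUTS:
        if composite >= cut:
            return label
    return CUTS[-1][1]

print(overall_rating({"practice": 0.8, "growth": 0.6, "slo": 0.7}))
# -> Effective (composite = 0.715)
```

Even this toy version makes the design choices visible: a compensatory weighted average lets a strong practice score offset weak growth, which is itself a policy decision.
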
 Involves ensuring that the school accountability and educator accountability systems are sending similar messages to schools and stakeholders
 It would make sense to use data from the school accountability system to augment information from the educator system
 Further, it would also make sense to integrate the various components of the educator evaluation system to avoid a silo effect

 The following slides present some of the key decisions related to the measurement model that will need to be made as we proceed.
 As you know, the "devil is in the details" and there are many details with which to contend.
 This is even more complicated when trying to reconcile and be clear about the state role

 What are the indicators that operationalize the knowledge & skills that define educator practice? For example, domains from Danielson's Framework for Teaching include:
 Planning and Preparation
 The Classroom Environment
 Instruction
 Professional Responsibilities
◦ Should these be the default "standards of professional practice," or should WY adopt more general standards (e.g., ISLLC, NC, CO) or leave it up to districts?

 Whatever standards are selected/developed, how shall they be measured?
◦ Classroom observations?
◦ Document (artifact) analysis?
◦ Structured interviews?
◦ Professional portfolios?
 What about required data collection strategies and protocols (e.g., 4 observations/year)?
 What are the expected levels of performance on the various indicators?
 What about observer training and certification?

 What indicators of student growth should be used for PAWS grades and content areas?
 What performance (growth) indicators should be used for non-PAWS grades and content areas?
◦ This is a huge issue!
 Should state-level measures of student growth be combined with local measures of student performance for each educator determination? If so, how?

 What analytic approach (model) will be used for analyzing State test data?
◦ What are the technical and policy issues that need to be considered in choosing a model?
◦ What are the advantages/disadvantages of using SGPs for educator evaluation?
 What is the standard for 'good enough' growth?
 Should growth expectations be "conditioned" on factors other than prior performance, such as poverty, etc.?
 What information should be reported to whom and at what level?

[Chart (Robert Lee, Massachusetts ESE, Spring 2010): Massachusetts educators grouped by the student data available for them: no curriculum framework (25%); curriculum framework but no assessment (32%); assessment, but no growth (10%); growth indirect (17%), e.g., K-4 reading using DIBELS & GRADE**; growth direct (16%), e.g., grades 4-8 ELA and math self-contained and middle school subject teachers, 3rd grade teachers, and grades 9 & 10 ELA and math teachers. Other example teacher types placed on the chart include AP and IB teachers**, visual arts, music, drama, administrative staff, Pre-K-2, phys ed, voc ed, health, business & marketing, MS & HS computers, foreign language, HS electives, 7th grade history teachers*, grades 10 & 11 US history*, grades 11 & 12 STE, ELA & math, MS & HS STE teachers, 8th & 12th grade history & social science, K-12 ELL teachers (MEPA), reading specialists (4-8), and special education (grades 4-10 and K-2, 11 & 12). *HSS tests have been suspended. **These teachers have not been linked yet.]

 Lack of high-quality measures of student performance, particularly for the purposes for which they are being used
 Limitations of analytical options for calculating educator contributions to student performance
 Comparability concerns
 Lack of technical capacity at the local and even state levels
 Lack of predictable course sequences
 Not enough time
 Not enough money
 Too much policy pressure (e.g., 50%)
 Huge risk of corruption
 Challenging issues of attribution
 Many of these are challenges for tested as well as non-tested subjects, but may be exacerbated for non-tested subjects and grades

 Instead of dealing with each individual case, it makes sense to create an approach for addressing categories of educators
 The general categorization can occur at the state level and should be fine-tuned at the district or even school level
 One classification approach is based on the data available for the various groups of educators
 The following excerpt of a chart, created for Colorado, provides examples of the nominal types of educators that would fall into the different data categories

Chart excerpt: personnel categories defined by the end-of-year state summative assessments available, with example personnel types:
 Personnel teaching a core subject area where end-of-year state assessments measuring content taught in their subject area are available in two adjacent grades: grades 4-10 core subject teachers for literacy and math; interventionists/specialists with shared responsibility with core subject teachers for improving literacy/numeracy skills of students in grades 4-10 (e.g., RTI specialists, ELA, special education teachers)
 Personnel teaching in a core subject area where an end-of-year state summative assessment is available to measure content taught in their classrooms: science teachers (currently grades 5, 8, and 10) and grade 3 teachers with end-of-year summative state assessments available for their respective grade
 Personnel teaching in a core subject area where no end-of-year state summative assessments are currently available to measure content taught in their classrooms: core subject teachers in the sciences (with the exception of grades 5, 8, and some personnel for 10) and social studies; all ECE, grades K-2, and grades 11-12 teachers; resource teachers/specialists with instructional responsibility not directly linked to literacy/numeracy skills of students (e.g., music, arts, and P.E. teachers)
 Personnel with no direct instructional responsibilities: resource teachers/specialists with indirect (non-instructional) responsibility for improving literacy/numeracy skills of students (e.g., social workers, psychologists, and school nurses)

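As a sketch of how a data-availability categorization like the excerpt above could be operationalized in a district data system (the flags and category labels are our illustrative assumptions):

```python
# Hypothetical sketch: assign an educator to a data-availability category,
# loosely following the Colorado excerpt above. Field names are invented.

def data_category(teaches_core: bool, has_state_test: bool,
                  adjacent_grade_tested: bool, instructional: bool) -> str:
    if not instructional:
        return "no direct instructional responsibilities"
    if teaches_core and has_state_test and adjacent_grade_tested:
        return "state tests in two adjacent grades (growth possible)"
    if teaches_core and has_state_test:
        return "state test available (status only)"
    return "no state summative assessment available"

# Example: a grade 3 teacher (tested grade, no tested prior grade)
print(data_category(True, True, False, True))
```
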
 What do we mean by comparability in this context?
◦ Educators within the units of analysis are held to similar levels of expectations, at least in some relative sense
◦ For example, it would be a threat to the system if the teachers in grades 4-8 reading and math received noticeably lower ratings than the rest of the teachers (NTSG) in the school (see the sketch below)
 At what levels is comparability important?
◦ Within schools? Clearly yes.
◦ Within districts? Probably yes.
◦ Within states? It would be nice, but it might be too high a bar right now.

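A minimal sketch of what a within-school comparability check could look like, using invented ratings and an assumed tolerance:

```python
# Sketch of a within-school comparability check: do teachers evaluated with
# state growth data (grades 4-8 reading/math) receive systematically lower
# ratings than non-tested subject and grade (NTSG) teachers? Data invented.
from statistics import mean

ratings = {  # hypothetical final ratings on a 1-4 scale for one school
    "tested": [2.4, 2.6, 2.1, 2.8, 2.5],
    "ntsg":   [3.1, 3.3, 2.9, 3.4, 3.0, 3.2],
}

gap = mean(ratings["ntsg"]) - mean(ratings["tested"])
if gap > 0.5:  # the tolerance is a policy choice, not a statistical law
    print(f"Comparability flag: NTSG ratings exceed tested by {gap:.2f}")
```
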
1. Norm-referenced tests (NRTs)
2. Commercial interim assessments
3. State or district created end-of-course exams (both externally and locally developed)
   a. Includes new assessment development in places like DE, CO, Hillsborough, FL
4. School or teacher-developed measures of student performance
   a. Often includes Student Learning Objectives
*Note: 1 & 2 rarely cover courses beyond the core content areas and even then, not well in HS.

 If you thought the measurement/assessment issue was daunting…
 It pales in comparison to the analytic challenges (i.e., how growth is calculated at local levels)
 Remember, using the most sophisticated VAM models with high-quality state test data has been rightfully questioned based on challenges with causal inferences, unreliability (year-to-year), and other technical issues (e.g., the EPI report; Braun et al., 2010; Rothstein, 2009 & 2010)

1. Growth models using pre- and post-tests from the same subject
2. Value-added models
   a. Pre- and post-test scores in the same subject
   b. Conditioned on data other than a pretest from the same content area as the posttest
3. Student Growth Percentiles
4. Shared attribution of aggregate growth/VAM results
5. Student Learning Objectives (SLO)

 Growth refers to measures of performance for the same students at two or more points in time and requires a common, often vertical, scale to evaluate the magnitude of change. This is the only true growth model here.
 VAM generally describes multivariate models that include certain variables to produce an expectation against which actual performance is evaluated.
 Student Growth Percentiles (SGP): a regression-based measure of growth that evaluates current achievement based on prior achievement and describes performance (using percentiles) relative to other students with the "same" prior achievement histories (a toy sketch follows).
 Student Learning Objectives (SLO): a general approach (often called Student Growth Objectives) whereby educators establish goals for individual students or groups of students (often in conjunction with administrators) and then evaluate the extent to which the goals have been achieved.

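To illustrate the SGP idea only: operational SGPs use quantile regression, not the crude prior-score binning below, and all the scores here are invented.

```python
# Toy illustration of the Student Growth Percentile idea: rank each
# student's current score against peers with (roughly) the same prior
# score. Real SGPs use quantile regression; this binned version just
# shows the logic. All data are invented.
from collections import defaultdict

students = [  # (student_id, prior_score, current_score)
    ("a", 410, 455), ("b", 412, 430), ("c", 408, 470),
    ("d", 550, 560), ("e", 560, 590), ("f", 555, 575),
]

BIN = 25  # peer group: students whose prior falls in the same 25-point bin
peers = defaultdict(list)
for _sid, prior, current in students:
    peers[prior // BIN].append(current)

for sid, prior, current in students:
    group = peers[prior // BIN]
    below = sum(1 for s in group if s < current)
    print(f"{sid}: growth percentile ~{round(100 * below / len(group))}")
```

A VAM, by contrast, would regress the current score on the prior score (and possibly other variables) and treat the residual as the teacher-attributable component, which is exactly where the causal-inference objections cited above arise.
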
 Attribution: linking educator behavior to student outcomes
◦ Assigning accountability
 Multiple educators contribute to instruction
 "Contact time" requirements: how long does the student need to be in the teacher's classroom to count? (see the sketch below)
◦ Opportunity to employ shared attribution strategies
 Must be tied to local theories of action or theories of improvement

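A minimal sketch of how contact-time and shared-attribution rules might be encoded; the 70% threshold and the weights are illustrative placeholders, not proposed WY rules:

```python
# Sketch: a student counts toward a teacher's growth aggregate only with
# sufficient contact time, and the weight can be split when instruction
# is shared. Threshold and weights are hypothetical.

MIN_CONTACT = 0.70  # fraction of instructional days required to count

def attributed_weight(days_with_teacher: int, course_days: int,
                      share_of_instruction: float = 1.0) -> float:
    """This student's weight in the teacher's aggregate (0 = excluded)."""
    if days_with_teacher / course_days < MIN_CONTACT:
        return 0.0
    return share_of_instruction  # e.g., 0.5 for a co-taught class

print(attributed_weight(160, 180))        # full attribution
print(attributed_weight(80, 180))         # excluded: too little contact
print(attributed_weight(170, 180, 0.5))   # split with a co-teacher
```
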
 How should we arrive at an overall judgment of educator effectiveness?
◦ Weighting of student performance and knowledge & skills
 What are the different types of information that should be employed when evaluating principals compared with teachers?
◦ We know the specific indicators and even standards will differ
 Who should be responsible for making these overall judgments?

 Data system requirements to link students with teachers at the state level
 Data system requirements to manage the data at the local level
 Dealing with student mobility
 Dealing with missing data, especially non-random missing data
 "Full academic year" rules (see the sketch below)

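As a sketch of how a "full academic year" rule interacts with mobility and missing data (the dates and exclusion rules below are assumptions for illustration):

```python
# Sketch of a "full academic year" (FAY) filter: keep a student in a
# teacher's roster only with continuous enrollment from an October count
# date through the test window and with both scores present. The dates
# and rules are illustrative assumptions.
from datetime import date

COUNT_DATE = date(2012, 10, 1)
TEST_DATE = date(2013, 3, 15)

def meets_fay(enroll_date: date, exit_date: date | None,
              prior_score: float | None, current_score: float | None) -> bool:
    if prior_score is None or current_score is None:
        return False  # missing data: excluded (and rarely missing at random)
    return enroll_date <= COUNT_DATE and (exit_date is None or exit_date >= TEST_DATE)

print(meets_fay(date(2012, 8, 20), None, 410.0, 455.0))  # True
print(meets_fay(date(2012, 12, 1), None, 410.0, 455.0))  # False: mobile
print(meets_fay(date(2012, 8, 20), None, None, 455.0))   # False: missing score
```

Because exclusions like these fall disproportionately on mobile students, the share of students a FAY rule drops is itself worth reporting.
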
 How will this be managed at the state level?
◦ Data, information, and analytics
◦ Reporting and communication
◦ Support and capacity building
◦ Training and monitoring
 How will this be managed at the local level?
◦ Capacity for implementation
 Conducting observations, document analysis, etc.
 Induction, mentoring, and support
 Training
 Record keeping
 Reporting and feedback
 Decision making and appeals

 How will results be communicated to educators to improve practice?
 How will information about the system be communicated to the public and policy makers while protecting educators?

 What sanctions, rewards, and/or consequences are appropriate to advance prioritized outcomes?
 What strategies will be employed to use information to support schools/teachers/students?
 Is there capacity in the state (in the districts) to improve educator quality in WY?
 What resources will be required for this improvement to occur?
◦ Where will they come from?

 As we consider the design and implementation of WY's new educator evaluation system, we must be mindful that the likelihood of getting this wrong (i.e., leading to unintended negative consequences) is at least as high as the chance of getting it right (i.e., improving teacher quality and student learning)
 Unintended consequences could include:
◦ Narrowing curriculum
◦ Competition vs. cooperation
◦ Assignment of students or teachers to selected classes for reasons unrelated to educational benefit
◦ Educator transition
◦ Educator attrition

 "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor." (emphases added)
◦ http://en.wikipedia.org/wiki/Campbell%27s_Law
 Educator accountability systems will invite significantly more implicit and explicit corruption than has been seen with school accountability

 What types of formative evaluation approaches need to be put in place to monitor implementation and consequences?
 Evaluate claims in the theory of action
 Evaluate impact
◦ Establish criteria to determine if results are reasonable
 Develop methods and standards to assess the precision and stability of results (see the sketch below)
 Does the system meet important utility criteria?

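One concrete check on the precision-and-stability question, echoing the year-to-year reliability concerns cited earlier: correlate the same educators' estimates across two years (data invented; the 0.5 threshold is a judgment call, not a standard):

```python
# Sketch: year-to-year stability of educator effectiveness estimates.
# Low correlations signal that classifications may be too noisy for
# high-stakes use. Values are invented.
from statistics import correlation  # Python 3.10+

year1 = [0.21, -0.10, 0.05, 0.33, -0.25, 0.12, -0.02, 0.18]
year2 = [0.02, 0.15, -0.20, 0.05, 0.10, -0.12, 0.01, 0.08]

r = correlation(year1, year2)
print(f"year-to-year correlation: r = {r:.2f}")
if r < 0.5:  # assumed tolerance for this illustration
    print("Caution: ratings may be too unstable for high-stakes decisions")
```
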
 How should we plan our work going forward?
 Who's going to do what?
 How will we work?
 Goals for next meeting…