Wyoming Accountability Advisory Committee
Scott Marion & Chris Domaleski
Center for Assessment
June 14, 2012

◦ Some background
◦ Outline key decisions for creating educator evaluation systems

Our purpose today is to highlight some of the key decisions we will need to make through the interim. We'll be asking a lot more questions than providing answers, but we will need to answer these questions in order to move forward…

A process note: given the number of people on the WebEx/call, I will pause at specific places in the presentation to respond to questions.

Wyoming, like an increasing number of states, intends to revise its teacher and leader evaluation practices. Educator effectiveness will be determined "in part by student achievement." This enterprise holds great promise, but also presents real challenges. We are fortunate to be able to build on the work of many other states. We are closely involved in:
◦ CO, RI, NH, GA, PA, UT, NYC, HI, LA

Why the interest in new forms of teacher evaluation? Nobody doubts the critical influence of teacher quality on student achievement, yet current (traditional) evaluation systems rarely identify either highly effective or ineffective teachers.

Key decisions, from the Aspen report and our experience:
◦ Vision and Goals
◦ State-Local Roles and Responsibilities
◦ Theory of Action
◦ General Evaluation Model
◦ Coherence
◦ Specific Measurement Model(s)
◦ Attribution rules
◦ Combining multiple measures
◦ Information Requirements
◦ Capacity Requirements
◦ Reporting & Communication
◦ Consequences & Support
◦ Monitoring and Evaluation

What is the vision and what are the guiding principles of the system we will design? For example, will the system be designed to identify and "counsel out" low-quality educators, or is it designed primarily to improve the performance of the majority of educators?

Example guiding principles (from our NH work):
◦ The primary purpose of the system is to maximize student learning.
◦ The system is designed to maximize educator development by providing specific information, including appropriate formative information that can be used to improve teaching quality.
◦ Local instantiations of the State Model system must be designed collaboratively among teachers, leaders, and other key stakeholders, such as parents and students, as appropriate.
◦ Individual educators will have input into the specific nature of their evaluation and considerable involvement in the establishment of their specific goals.
◦ The effectiveness rating of each educator shall be based on multiple measures of teaching practice and student outcomes, including multiple years of data when available, especially for measures of student learning.
◦ The Model system is designed to ensure that the framework, methods, and tools lead to a coherent system that is also coherent with the developing NH Leader Evaluation System.
◦ The Model system shall be applied by well-trained leaders and evaluation teams using multiple sources of evidence, along with professional judgment, to arrive at an overall evaluation for each educator.
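The "multiple measures, multiple years" principle has a direct computational analogue. The Python sketch below shows one simple way to pool an educator's growth scores across whatever years are available; the function name, the equal-weighting choice, and the data are illustrative assumptions, not a prescribed method.

```python
from statistics import mean

def pooled_growth_score(yearly_scores):
    """Pool an educator's growth scores (e.g., median SGPs) across years.

    yearly_scores: list of floats, most recent last; missing years are None.
    Uses a simple mean of available years. A real system would also need
    minimum n-count rules and might weight recent years more heavily.
    """
    available = [s for s in yearly_scores if s is not None]
    if not available:
        return None  # no growth evidence; practice measures must carry the rating
    return mean(available)

# Example: a teacher with two of three years of growth data
print(pooled_growth_score([48.0, None, 61.0]))  # -> 54.5
```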
What will be the "reach" of the state in defining local systems? What factors must be considered in this decision?
◦ Comparability/portability vs. flexibility
◦ Support and capacity building
◦ Oversight and monitoring
◦ Required framework, "State Model," or state-required system

We are proceeding here with the assumption that there will be at least a state-required framework.

A theory of action:
◦ Grounds our design
◦ Clarifies the assumptions, purposes, and goals of the system
◦ Specifies the various indicators and mechanisms by which the system will fulfill its purposes (and minimize unintended negative consequences)
◦ Serves as a framework for evaluation

The ToA sketched below is oversimplified and somewhat naïve, but it is what is driving much of the policy. We'll be working with more complex and honest ToAs as we do our work.

[Diagram: a simple policy theory of action in which measures of educator effectiveness and evaluation processes inform hiring, placement, professional development, compensation, dismissal, and career-ladder decisions, with the result that student outcomes improve.]

[Diagram: a generic theory-of-action chain running from assumptions or antecedents, through activities and mechanisms, to proximal indicators, then through further activities and mechanisms to intermediate indicators, and finally to distal indicators (intended outcomes) and consequences.]

Let's look at a more reasonable approximation for an improvement-based educator evaluation system.

[Diagram: an improvement-based theory of action. The educator evaluation system focuses educators' attention on productive practices; student performance is well measured; results are used to improve instruction; evaluation results improve; and student learning improves.]

Policy makers should have to say very explicitly why and how implementing test-based approaches to support educator effectiveness for these grades and subjects will lead to improved educational opportunities for students. For example, one might postulate that holding teachers accountable for increases in student test scores on classroom-based assessments will lead to the development of both better assessments and improvements in student learning. What are the specific mechanism(s) by which the intended outcomes will occur? E.g., targeted instruction, better PD, and/or more appropriate curricular materials?

What will be the major components of our system?
◦ Measures of teacher practice
◦ Measures of student performance
◦ Student voice?
◦ Peer input?
◦ Other?

How will these be combined and weighted? How will these classes of indicators be integrated to form a coherent picture?
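To make the weighting question concrete, here is a minimal Python sketch of one common approach: put each component on a common scale, apply policy-chosen weights, and map the composite to a rating category. The component names, weights, and cut points are hypothetical placeholders, not recommendations.

```python
def overall_rating(components, weights, cuts=(1.5, 2.5, 3.5)):
    """Combine component scores (each on a 1-4 scale) into an overall rating.

    components/weights: dicts keyed by measure name; weights should sum to 1.
    cuts: composite cut points separating the four rating categories.
    """
    composite = round(sum(components[k] * weights[k] for k in components), 2)
    labels = ["ineffective", "developing", "effective", "highly effective"]
    category = sum(composite >= c for c in cuts)  # number of cut points passed
    return composite, labels[category]

# Hypothetical example: 50% practice, 40% student growth, 10% student voice
components = {"practice": 3.2, "growth": 2.5, "student_voice": 3.0}
weights = {"practice": 0.5, "growth": 0.4, "student_voice": 0.1}
print(overall_rating(components, weights))  # -> (2.9, 'effective')
```

A compensatory weighted average like this is only one option; conjunctive rules (minimum performance on each component) or judgment-based matrices are common alternatives with different incentive properties.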
Coherence involves ensuring that the school accountability and educator accountability systems are sending similar messages to schools and stakeholders. It would make sense to use data from the school accountability system to augment information from the educator system. Further, it would also make sense to integrate the various components of the educator evaluation system to avoid a silo effect.

The following slides present some of the key decisions related to the measurement model that will need to be made as we proceed. As you know, the "devil is in the details," and there are many details with which to contend. This is even more complicated when trying to reconcile and be clear about the state role.

What are the indicators that operationalize the knowledge & skills that define educator practice? For example, domains from Danielson's Framework for Teaching include:
◦ Planning and Preparation
◦ The Classroom Environment
◦ Instruction
◦ Professional Responsibilities

Should these be the default "standards of professional practice," or should WY adopt more general standards (e.g., ISLLC, NC, CO), or leave it up to districts?

Whatever standards are selected/developed, how shall they be measured?
◦ Classroom observations?
◦ Document (artifact) analysis?
◦ Structured interviews?
◦ Professional portfolios?

What about required data collection strategies and protocols (e.g., 4 observations/year)? What are the expected levels of performance on the various indicators? What about observer training and certification?

What indicators of student growth should be used for PAWS grades and content areas? What performance (growth) indicators should be used for non-PAWS grades and content areas?
◦ This is a huge issue!

Should state-level measures of student growth be combined with local measures of student performance for each educator determination? If so, how?

What analytic approach (model) will be used for analyzing state test data?
◦ What are the technical and policy issues that need to be considered in choosing a model?
◦ What are the advantages/disadvantages of using SGPs for educator evaluation?

What is the standard for "good enough" growth? Should growth expectations be "conditioned" on factors other than prior performance, such as poverty? What information should be reported to whom, and at what level?
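Since SGPs come up repeatedly, a toy illustration may help ground the discussion. The Python sketch below computes a crude growth percentile by comparing each student's current score only with students who had the identical prior score. Operational SGPs are estimated with quantile regression and can condition on several prior years of scores, so treat this strictly as a conceptual illustration with made-up data.

```python
from collections import defaultdict

def simple_growth_percentiles(records):
    """Crude SGP illustration: percentile rank of a student's current score
    among students with the same prior score.

    records: list of (student_id, prior_score, current_score) tuples.
    Returns {student_id: growth percentile, 0-100}.
    """
    peers = defaultdict(list)
    for _, prior, current in records:
        peers[prior].append(current)
    sgps = {}
    for sid, prior, current in records:
        group = peers[prior]
        below = sum(1 for c in group if c < current)
        sgps[sid] = round(100 * below / len(group))
    return sgps

# Made-up data: three students who all scored 300 last year
data = [("a", 300, 310), ("b", 300, 325), ("c", 300, 305)]
print(simple_growth_percentiles(data))  # -> {'a': 33, 'b': 67, 'c': 0}
```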
[Chart (Robert Lee, Massachusetts ESE, Spring 2010): Massachusetts educators grouped by the student data available for them: no curriculum framework (25%); curriculum framework but no assessment (32%); assessment but no growth measure (10%), e.g., 8th & 12th grade history & social science*; indirect growth measures (17%), e.g., AP and IB teachers** and K-4 reading using DIBELS; direct growth measures (16%), e.g., grades 4-8 ELA and math teachers. *HSS tests have been suspended. **These teachers have not been linked yet.]

Challenges:
◦ Lack of high-quality measures of student performance, particularly for the purposes for which they are being used
◦ Limitations of analytic options for calculating educator contributions to student performance
◦ Comparability concerns
◦ Lack of technical capacity at the local and even state levels
◦ Lack of predictable course sequences
◦ Not enough time
◦ Not enough money
◦ Too much policy pressure (e.g., 50%)
◦ Huge risk of corruption
◦ Challenging issues of attribution

Many of these are challenges for tested as well as non-tested subjects and grades, but they may be exacerbated for non-tested subjects and grades.

Instead of dealing with each individual case, it makes sense to create an approach for addressing categories of educators. The general categorization can occur at the state level and should be fine-tuned at the district or even school level. One classification approach is based on the data available for the various groups of educators. The following excerpt of a chart, created for Colorado, provides examples of the nominal types of educators that would fall into the different data categories.

Personnel, defined by the end-of-year state summative assessments available, with personnel types (examples):

◦ Personnel teaching a core subject area where end-of-year state assessments measuring content taught in their subject area are available in two adjacent grades: grades 4-10 core subject teachers for literacy and math; interventionists/specialists with shared responsibility with core subject teachers for improving literacy/numeracy skills of students in grades 4-10 (e.g., RTI specialists, ELA teachers, special education teachers)

◦ Personnel teaching in a core subject area where an end-of-year state summative assessment is available to measure content taught in their classrooms: science teachers (currently, grades 5, 8, and 10) and grade 3 teachers, with end-of-year summative state assessments available for their respective grade

◦ Personnel teaching in a core subject area where no end-of-year state summative assessments are currently available to measure content taught in their classrooms: core subject teachers in the sciences (with the exception of grades 5, 8, and some personnel for 10) and social studies; all ECE, grades K-2, and grades 11-12 teachers; resource teachers/specialists with instructional responsibility not directly linked to literacy/numeracy skills of students (e.g., music, arts, and P.E. teachers)

◦ Personnel with no direct instructional responsibilities: resource teachers/specialists with indirect (non-instructional) responsibility for improving literacy/numeracy skills of students (e.g., social workers, psychologists, and school nurses)

What do we mean by comparability in this context?
◦ Educators within the units of analysis are held to similar levels of expectations, at least in some relative sense
◦ For example, it would be a threat to the system if the teachers in grades 4-8 reading and math received noticeably lower ratings than the rest of the teachers (NTSG) in the school

At what levels is comparability important?
◦ Within schools? Clearly yes.
◦ Within districts? Probably yes.
◦ Within states? It would be nice, but it might be too high a bar right now.
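One way to monitor this kind of comparability empirically is to compare the rating distributions of tested-subject and non-tested-subject educators within each unit of analysis and flag large gaps. The Python sketch below does this at the school level with fabricated ratings; the 0.5-point flag threshold is an arbitrary illustrative choice.

```python
from statistics import mean

def comparability_flags(ratings, threshold=0.5):
    """Flag schools where mean ratings of tested-subject teachers diverge
    from non-tested-subject (NTSG) teachers by more than `threshold`.

    ratings: list of (school, group, rating) with group in {"tested", "ntsg"}.
    Returns {school: (gap, flagged)}.
    """
    by_school = {}
    for school, group, rating in ratings:
        by_school.setdefault(school, {"tested": [], "ntsg": []})[group].append(rating)
    flags = {}
    for school, groups in by_school.items():
        if groups["tested"] and groups["ntsg"]:
            gap = mean(groups["tested"]) - mean(groups["ntsg"])
            flags[school] = (round(gap, 2), abs(gap) > threshold)
    return flags

# Fabricated example: tested-subject teachers rated lower at School A
data = [("A", "tested", 2.1), ("A", "tested", 2.3), ("A", "ntsg", 3.1),
        ("A", "ntsg", 3.3), ("B", "tested", 3.0), ("B", "ntsg", 3.1)]
print(comparability_flags(data))  # -> {'A': (-1.0, True), 'B': (-0.1, False)}
```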
Types of measures of student performance in use:
1. Norm-referenced tests (NRTs)
2. Commercial interim assessments
3. State- or district-created end-of-course exams (both externally and locally developed)
   a. Includes new assessment development in places like DE, CO, and Hillsborough, FL
4. School- or teacher-developed measures of student performance
   a. Often includes student learning objectives

Note: 1 & 2 rarely cover courses beyond the core content areas, and even then, not well in HS.

If you thought the measurement/assessment issue was daunting… it pales in comparison to the analytic challenges (i.e., how growth is calculated at local levels). Remember, using the most sophisticated VAM models with high-quality state test data has been rightfully questioned based on challenges with causal inference, year-to-year unreliability, and other technical issues (e.g., the EPI report; Braun et al., 2010; Rothstein, 2009 & 2010).

Analytic approaches:
1. Growth models using pre- and post-tests from the same subject
2. Value-added models
   a. Pre- and post-test scores in the same subject
   b. Conditioned on data other than a pretest from the same content area as the posttest
3. Student growth percentiles
4. Shared attribution of aggregate growth/VAM results
5. Student learning objectives (SLOs)

◦ Growth refers to measures of performance for the same students at two or more points in time and requires a common, often vertical, scale to evaluate the magnitude of change. It is the only true growth model here.
◦ VAM generally describes multivariate models that include certain variables to produce an expectation against which actual performance is evaluated.
◦ Student growth percentiles (SGP): a regression-based measure of growth that works by evaluating current achievement based on prior achievement and describing performance (using percentiles) relative to other students with the "same" prior achievement histories.
◦ Student learning objectives (SLO), often called student growth objectives: a general approach whereby educators establish goals for individual or groups of students (often in conjunction with administrators) and then evaluate the extent to which the goals have been achieved.

Attribution: linking educator behavior to student outcomes
◦ Assigning accountability when multiple educators contribute to instruction
◦ "Contact time" requirements: how long does a student need to be in the teacher's classroom to count?
◦ Opportunity to employ shared attribution strategies, which must be tied to local theories of action or theories of improvement
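As a concrete illustration of contact-time rules and shared attribution, the Python sketch below attributes each student's growth score across teachers in proportion to days of enrollment, dropping links that fall below a minimum contact threshold. All names and the 60-day cutoff are hypothetical; real systems would set these through policy.

```python
from collections import defaultdict

def attribute_growth(links, min_days=60):
    """Attribute student growth to teachers, weighted by contact time.

    links: list of (student_id, teacher_id, days_enrolled, growth_score).
    Links under `min_days` are dropped; each surviving student's score is
    split across their remaining teachers in proportion to enrolled days.
    Returns {teacher_id: weighted mean growth of attributed students}.
    """
    kept = [l for l in links if l[2] >= min_days]
    student_days = defaultdict(int)
    for sid, _, days, _ in kept:
        student_days[sid] += days
    totals = defaultdict(float)
    weights = defaultdict(float)
    for sid, tid, days, score in kept:
        w = days / student_days[sid]
        totals[tid] += w * score
        weights[tid] += w
    return {t: round(totals[t] / weights[t], 1) for t in totals}

# One co-taught student (s1) and one who left after 30 days (s2)
links = [("s1", "t1", 90, 60.0), ("s1", "t2", 90, 60.0), ("s2", "t1", 30, 20.0)]
print(attribute_growth(links))  # -> {'t1': 60.0, 't2': 60.0}
```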
How should we arrive at an overall judgment of educator effectiveness?
◦ Weighting of student performance and knowledge & skills

What are the different types of information that should be employed when evaluating principals compared with teachers?
◦ We know the specific indicators and even the standards will differ

Who should be responsible for making these overall judgments?

Information requirements:
◦ Data system requirements to link students with teachers at the state level
◦ Data system requirements to manage the data at the local level
◦ Dealing with student mobility
◦ Dealing with missing data, especially non-random missing data
◦ "Full academic year" rules

How will this be managed at the state level?
◦ Data, information, and analytics
◦ Reporting and communication
◦ Support and capacity building
◦ Training and monitoring

How will this be managed at the local level?
◦ Capacity for implementation: conducting observations, document analysis, etc.; induction, mentoring, and support; training; record keeping; reporting and feedback; decision making and appeals

How will results be communicated to educators to improve practice? How will information about the system be communicated to the public and policy makers while protecting educators?

What sanctions, rewards, and/or consequences are appropriate to advance prioritized outcomes? What strategies will be employed to use information to support schools/teachers/students? Is there capacity in the state (and in the districts) to improve educator quality in WY? What resources will be required for this improvement to occur?
◦ Where will they come from?

As we consider the design and implementation of WY's new educator evaluation system, we must be mindful that the likelihood of getting this wrong (i.e., leading to unintended negative consequences) is at least as high as the chance of getting it right (i.e., improving teacher quality and student learning). Unintended consequences could include:
◦ Narrowing the curriculum
◦ Competition vs. cooperation
◦ Assignment of students or teachers to selected classes for reasons unrelated to educational benefit
◦ Educator transition
◦ Educator attrition

Campbell's law: "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor." (emphases added) http://en.wikipedia.org/wiki/Campbell%27s_Law

Educator accountability systems will invite significantly more implicit and explicit corruption than has been seen with school accountability.

What types of formative evaluation approaches need to be put in place to monitor implementation and consequences?
◦ Evaluate claims in the theory of action
◦ Evaluate impact: establish criteria to determine whether results are reasonable
◦ Develop methods and standards to assess the precision and stability of results (a simple stability check is sketched at the end of this deck)
◦ Does the system meet important utility criteria?

How should we plan our work going forward? Who's going to do what? How will we work? Goals for next meeting…
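Finally, on the monitoring question above about the precision and stability of results: one basic diagnostic is the year-to-year correlation of educators' growth estimates, the quantity the unreliability critiques cited earlier (e.g., Rothstein) examine. The Python sketch below computes a Pearson correlation from paired two-year estimates; the data and any acceptability threshold are illustrative assumptions.

```python
from statistics import correlation  # requires Python 3.10+

def year_to_year_stability(year1, year2):
    """Pearson correlation of educators' growth estimates across two years.

    year1, year2: dicts mapping educator_id -> growth estimate. Only
    educators with estimates in both years contribute to the correlation.
    """
    common = sorted(set(year1) & set(year2))
    return correlation([year1[e] for e in common], [year2[e] for e in common])

# Illustrative estimates for four educators in consecutive years
y1 = {"t1": 40.0, "t2": 55.0, "t3": 62.0, "t4": 48.0}
y2 = {"t1": 45.0, "t2": 50.0, "t3": 58.0, "t4": 57.0}
print(round(year_to_year_stability(y1, y2), 2))  # -> 0.69
```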