ASSESSMENT SRIG BIENNIAL MEETING
MARCH 30, 2012
NAfME NATIONAL CONFERENCE
3:45PM-5:45PM, GRAND B
TIMOTHY S. BROPHY, CHAIR
KELLY PARKES, INCOMING CHAIR
Teacher Evaluation: Issues of Validity and Reliability

TODAY’S PROGRAM
3:45pm. Greeting and Welcome; Election results. Timothy S. Brophy, Chair
3:55pm. Program begins: Teacher Evaluations – Issues of Validity and Reliability
Timothy S. Brophy and Richard Colwell. Teacher Evaluation: Issues of Validity and Reliability.
4:20pm. Dru Davison, Memphis City Schools. The Tennessee Fine Arts Pilot: A Multiple Measures Portfolio System (Perform, Create, Respond, Connect) with Blind Peer Review. Electronic presentation.
4:40pm. Keitha Lucas Hamann, U. Minnesota-Twin Cities, and Doug Orzolek, University of St. Thomas. Teacher Performance Assessment in Minnesota: Challenges for Music Educators.
5:05pm. Breakout groups – Strategies for Measuring Student Growth in Music
5:30pm. Leaders report
5:40pm. Announcements of upcoming events. Closing remarks by Kelly Parkes, Incoming Chair

TEACHER EVALUATIONS: ISSUES OF VALIDITY AND RELIABILITY
TIMOTHY S. BROPHY, UNIVERSITY OF FLORIDA
RICHARD COLWELL, PROFESSOR EMERITUS, UNIVERSITY OF ILLINOIS
NAfME CONFERENCE ASSESSMENT SRIG MEETING
MARCH 30, 2012

SESSION OVERVIEW
The Context for the Reform of Teacher Evaluation
The Problem: Determining Music Teacher Effectiveness
Validity and Reliability Issues
Challenges to the SRIG

THE POLITICAL CONTEXT: THE AMERICAN RECOVERY AND REINVESTMENT ACT (2009)
Achieving Equity in Teacher Distribution
The State will take actions to improve teacher effectiveness and comply with section 1111(b)(8)(C) of the ESEA (20 U.S.C. 6311(b)(8)(C)) in order to address inequities in the distribution of highly qualified teachers between high- and low-poverty schools, and to ensure that low-income and minority children are not taught at higher rates than other children by inexperienced, unqualified, or out-of-field teachers. (H.R.1, p. 169)

THE POLITICAL CONTEXT: RACE TO THE TOP PHASE 2
CFDA NUMBER: 84.395A (2010)
RTTT Phase 2 defines teacher evaluation:
States, LEAs, or schools must include multiple measures, provided that teacher effectiveness is evaluated, in significant part, by student growth (as defined in this notice). Supplemental measures may include, for example, multiple observation-based assessments of teacher performance. (p. 19499)

THE POLITICAL CONTEXT: RACE TO THE TOP PHASE 2
Student achievement means: (b) For non-tested grades and subjects: alternative measures of student learning and performance such as student scores on pre-tests and end-of-course tests; student performance on English language proficiency assessments; and other measures of student achievement that are rigorous and comparable across classrooms. (p. 19500)
Student growth means the change in student achievement for an individual student between two or more points in time. A State may also include other measures that are rigorous and comparable across classrooms. (p. 19500)
Source: Federal Register, Vol. 75, No. 71, Wednesday, April 14, 2010, Notices

THE NEW “EVALUATION EQUATION”
35-50% student achievement + 50-65% observations or other methods = teacher evaluation and “effectiveness” determination

MUSIC TEACHER EFFECTIVENESS
RTTT defines effective teachers in very specific terms. We need to be able to know what it means for music teachers to be:
“Effective” – when students achieve at “acceptable rates” – at least one grade level in an academic year
“Highly effective” – when her/his students achieve at “high rates” – for example, 1.5 grade levels in an academic year
A BIG QUESTION: What is a “year’s growth” in music education? How do we find out?

THE “ELEPHANT IN THE LIVING ROOM”: GROWTH IN MUSIC
What do we need to measure “one grade level” of growth in music?
Rigorous, standards-based grade-level music curriculum on all standards
Clear, consistent grade-level expectations
Valid, reliable assessments
Comparability across schools, districts, and states

PART 1 OF THE EQUATION: VALID AND RELIABLE ASSESSMENTS OF STUDENT MUSIC LEARNING
Student music learning = student achievement in RTTT
Assessment must be done well or not at all
NAEP is one reference for validity and reliability
NAfME continues to advocate for the arts as a core subject. Question: if music is a core subject, how do we define it? What is assessed?
The 2008 NAEP analysis omitted validity, reliability, item analysis, regressions, factor analysis, and other test characteristics
The NAEP analysis was concerned with demographic and SES-related characteristics – race, gender, free and reduced lunch, community and school type, etc.

PART 2 OF THE EQUATION: “OTHER MEASURES” STRENGTHS AND CAUTIONS
Classroom Observation
Principal Evaluation
Instructional Artifact
Portfolio
Teacher Self-Report
Student Survey
Value-Added Model
Source: Goe, Holdheide, & Miller (2011). A practical guide to designing comprehensive teacher evaluation systems. Washington, DC: National Comprehensive Center for Teacher Quality.

GENERAL VALIDITY ISSUES: USING STUDENT LEARNING MEASURES IN TEACHER EVALUATIONS
To what extent do changes in a student’s performance reflect actual changes in his or her understanding of the underlying content?
When student test scores are used to estimate teaching effectiveness, what is the extent to which those estimates accurately represent the teacher’s contribution to student learning?
What evidence do we have regarding various threats to the validity of inferences for a particular use of a measure?
How do we attribute student performance to individual teachers when the assessments are intended to cover material from multiple courses?
Source: Steele, Hamilton, & Stecher (2010). Incorporating student performance measures into teacher evaluation systems. Santa Monica, CA: RAND Corporation.

VALIDITY ISSUES FOR MUSIC TEACHER EVALUATION
Student achievement data used for music teacher evaluation MUST be from music assessments, not an arbitrary attribution of the effect of the music teacher on scores for the “usual tested subjects” of math, reading, science, and writing
Student music achievement MUST be measured using valid, reliable instruments
“Other measures” used MUST be valid for music teachers and account for the variables unique to music education
Observations and evaluative tools MUST be implemented by trained personnel who are content experts in music education

GENERAL RELIABILITY ISSUES
Common approach: internal consistency reliability, which expresses the extent to which items on the test measure the same underlying construct
Measures of internal consistency reliability do not take into account interrater reliability in the scoring of any open-response items that tests may include, and they also do not measure the reliability of the value-added estimates themselves.
Interrater reliability is an important consideration in the case of items that are assessed by human scorers because one wants to minimize the extent to which an individual’s score on the assessment is dependent on the idiosyncrasies of the rater who happens to score it.
Reliability of value-added estimates is an important consideration because, due to random classroom- and student-level error, value-added estimates are known to be unstable from year to year.
Source: Steele, Hamilton, & Stecher (2010). Incorporating student performance measures into teacher evaluation systems. Santa Monica, CA: RAND Corporation.

RELIABILITY NEEDS FOR MUSIC TEACHER EVALUATION: STUDENT MUSIC ACHIEVEMENT
Clearly defining “open-ended” responses in music – prepared performance, on-demand performance, composition, improvisation, arrangement, etc.
Norming/calibration of rubrics used for open-ended responses
Expert rubric development and training of scorers
Thorough item analysis for all item types

DEVELOPING ASSESSMENT RELIABILITY AND VALIDITY: ITEM ANALYSIS
Readily available analysis techniques allow us to obtain sophisticated item analysis data for music items
Item Response Theory (IRT) models should become the standard analysis approach
Three-parameter models for dichotomous items, which measure difficulty and discrimination while controlling for guessing
Polytomous generalized rating scale models extend IRT to the analysis of rubric-based assessments (e.g., Samejima’s graded response model)
Easy-to-use software programs such as XCalibre4™ make these complex calculations accessible
Frank Baker’s classic book, The Basics of Item Response Theory, is now a free ERIC document

TEACHER EFFECTIVENESS: A CALL FOR ACTION IN MUSIC EDUCATION
Prince et al. (2009), The Other 69 Percent:
“Identifying highly effective teachers of subjects that are not tested with standardized achievement tests – such as teachers of art, music, physical education, vocational education, and foreign languages – requires a different approach.” (p. 5)
“It is easy to believe that we can assess whether students read well or solve math problems well or understand social studies or science, but it is much more difficult to imagine how to assess whether students properly understand a subject such as art. Until we can agree on what constitutes effective teacher performance, it will be difficult to measure it and reward it.” (p. 6)

MUSIC TEACHER EFFECTIVENESS: QUESTIONS FOR OUR PROFESSION
What is an effective music teacher?
What is a highly effective music teacher?
How do we measure music teacher effectiveness?
How do we evaluate music teacher effectiveness?
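
For readers unfamiliar with the three-parameter model named on the item analysis slide, the following is a minimal sketch of the standard three-parameter logistic (3PL) item response function. It is offered as an illustration only, not as the procedure used by the presenters or by any particular software package. The function name, parameter names, and the example item values are assumptions chosen for this sketch; real item parameters would have to be estimated from student response data.

```python
import math


def three_pl_probability(theta, a, b, c):
    """Probability of a correct response under the standard 3PL model.

    theta: student ability (hypothetical scale, centered near 0)
    a:     item discrimination (how sharply the item separates abilities)
    b:     item difficulty (ability at which the curve is steepest)
    c:     pseudo-guessing parameter (lower bound on the probability)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # Hypothetical four-option multiple-choice music item (values are made up).
    item = {"a": 1.2, "b": 0.5, "c": 0.25}
    for theta in (-2.0, 0.0, 2.0):
        p = three_pl_probability(theta, **item)
        print(f"ability {theta:+.1f}: P(correct) = {p:.3f}")
```

In this form, a student whose ability equals the item difficulty has a probability of success halfway between the guessing floor c and 1, which is how the model measures difficulty and discrimination while controlling for guessing.
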

CHALLENGE TO THE SRIG: EVALUATION RESEARCH NEEDS
FIRST AND FOREMOST: We must lead the profession to develop technically sound, valid, reliable assessments of student music learning in every state that are thoroughly analyzed for validity, reliability, DIF (differential item functioning), and item characteristics
A process or model of assessment development for states and districts
In cooperation with SMTE, collect and evaluate the validity and reliability of music teacher evaluation systems in NAfME states
Design and implement studies to develop empirically supported criteria for music teacher evaluation, use these to develop music teacher evaluation models, and assess their validity and reliability

THE “EVALUATION DILEMMA”
“Solutions to the evaluation dilemma are as complex as the issue itself. The evaluation of music teachers remains an area in need of relevant research, and the development of an appropriate evaluation and observation instrument must be urgently addressed. It is now the responsibility of the united music teaching profession, in tandem with active music education researchers, to address this challenge.”
Source: Brophy (1993). Evaluation of music educators: Toward defining an appropriate instrument.

THANK YOU