J Pers Eval Educ (2007) 20:165–184
DOI 10.1007/s11092-008-9053-z

What is the Relationship Between Teacher Quality and Student Achievement? An Exploratory Study

James H. Stronge & Thomas J. Ward & Pamela D. Tucker & Jennifer L. Hindman

Received: 19 December 2007 / Accepted: 25 January 2008 / Published online: 13 February 2008
© Springer Science + Business Media, LLC 2008

Abstract The major purpose of the study was to examine what constitutes effective teaching, as defined by measured increases in student learning, with a focus on instructional behaviors and practices. Ordinary least squares (OLS) regression analyses and hierarchical linear modeling (HLM) were used to identify teacher effectiveness levels while controlling for student-level and class/school-level variables. The actual achievement of 1936 third grade students in 85 classrooms on the Virginia Standards of Learning (SOL) assessments in English, Mathematics, Social Studies, and Science was compared to expected achievement, resulting in an indicator of teacher effectiveness. Based on student learning gains, teachers were divided into quartiles. The statistical modeling approach facilitated comparisons of outcomes that were free of the influences of identified extraneous variables. A double blind design was selected for in-depth cross-case studies, with teachers from the highest quartile representing highly effective teachers (N=5) and teachers from the lowest quartile representing the less effective teachers (N=6). The observation team assessed the third grade teachers (N=11) based on 20 categories within four domains: instruction, student assessment, classroom management, and personal qualities. Key findings indicate that effective teachers scored higher across the four domains. Additionally, effective teachers tended to ask a greater number of higher level (e.g., analysis) questions and had fewer incidences of off-task behavior than ineffective teachers.
The exploratory study identified instructional behaviors and practices of teachers that result in higher student learning gains.

J. H. Stronge (*)
School of Education, The College of William and Mary, P.O. Box 8795, Williamsburg, VA 23187-8795, USA
e-mail: jhstro@wm.edu

T. J. Ward
The College of William and Mary, Williamsburg, VA, USA

P. D. Tucker
University of Virginia, Charlottesville, VA, USA

J. L. Hindman
Teacher Quality Resources, LLC, Williamsburg, VA, USA

Keywords Teacher quality · Teacher effectiveness · Ineffective teacher · Effective teacher · Student achievement · Questioning · Student learning gains

In recent years, research has focused on the value-added connection between teaching and learning, with leading examples of this assessment process including the Tennessee Value-Added Assessment System and the Dallas Independent Public Schools (see, for example, Mendro 1998; Nye et al. 2004; Wright et al. 1997). Analysis of data from these and other programs offers dramatic evidence regarding the influence of the classroom teacher on student learning (Stronge and Tucker 2000; Tucker and Stronge 2005; Wenglinsky 2002). Thus, we have seen in recent years the emergence of a new approach to answering an age-old question: What is the value-added impact of teachers on student learning?

The purpose of the study reported here was to examine what constitutes effective teaching as defined by measured increases in student learning. Specifically, what are the instructional practices of teachers who facilitate high growth in student achievement measures? In an effort to address this guiding question, we engaged in the following two steps:

1. Used regression analyses (ordinary least squares and hierarchical linear modeling) to identify teacher effectiveness as measured by student learning gains while controlling for both student-level and class/school-level concomitant variables; and
2. Identified behaviors and practices distinguishing top quartile versus bottom quartile teachers (i.e., teachers who effected higher versus lower than predicted gains in student learning).

1 Background

1.1 Demand for Accountability
The current demand for educational accountability has been building and crystallizing since the post-Sputnik period, when reforms based on “excellence” and “accountability” emerged, due in part to the Elementary and Secondary Education Act (ESEA) of 1965. This predecessor of today’s No Child Left Behind Act (NCLB) was intended to increase quality and equity by emphasizing an accountability component that required evidence of effectiveness for Title I programs (Sacks 1999). In subsequent decades, we have experienced wave after wave of educational reform efforts, most notably those advocated in A Nation at Risk in 1983 (National Commission on Excellence in Education), which “solidified the accountability trends of the 1960s and 1970s” (Heinecke et al. 2003, p. 22) and galvanized the national agenda of high standards. The last 40 years of reform efforts have focused primarily on the development of curriculum standards, assessments to measure student achievement, and school level reporting mechanisms to publicly explain results. Most recently, the reauthorization of the Elementary and Secondary Education Act, better known as the No Child Left Behind Act, is intended to tie federal education funding directly to improvements in student test scores.

Unfortunately, much of the foregoing policy discussion has overlooked the most fundamental unit of change, the classroom, and the primary catalyst for improvement in our schools, the teacher. In recent years, there has been a renewed interest in the role of the teacher as the key to school improvement (Darling-Hammond and Youngs 2002).
In fact, the 2001 NCLB legislation (34 CFR Part 200: Title I–Improving the Academic Achievement of the Disadvantaged; Final Rule) codifies the emphasis on having a highly qualified teacher in every classroom. Although the various states appear to be operationalizing a “highly qualified teacher” under NCLB in relatively simplistic licensure terms, at least the focus for reform has moved from state-, district-, or even school-level reforms to the classroom. To a large extent, this transition is grounded in the realization that any significant improvement in schools and in student learning must have the teacher as a centerpiece (see, for example, Darling-Hammond 1997; Mendro 1998; Stronge and Tucker 2000; Tucker and Stronge 2005). Basic teacher qualifications, as stipulated under NCLB, are certainly an important starting point in acknowledging the critical role of teachers in student learning. In the following section, however, we move beyond issues of preparation and qualifications for teaching to ones of teacher competence and effectiveness.

1.2 Relationship Between Teacher Effectiveness and Student Achievement
Over the past few decades, numerous studies have focused on defining the characteristics of effective schools and teachers. Contemporary research has focused on the value-added connection between teaching and learning, with leading examples of this assessment process including the Tennessee Value-Added Assessment System and the Dallas Independent Public Schools. Analysis of data from these and other programs offers dramatic evidence regarding the influence of the classroom teacher on student learning (Mendro 1998; Nye et al. 2004; Wright et al. 1997). There is a growing body of research critiquing the Tennessee Value-Added Assessment System research (see, for example, Kupermintz 2002). Nonetheless, the evidence from multiple studies seems to confirm the efficacy of value-added approaches for assessing teacher quality.
In a review of studies that utilize value-added modeling to explain teacher effects on student achievement, McCaffrey et al. (2003) concluded that while the value-added approach has limitations, it nonetheless should be an alternative in examining teacher quality. They stated, “…given the current state of knowledge about VAM [value-added modeling] we expect that some efforts to estimate teacher effects could provide useful information on teachers” (p. 114).

The over-arching finding from value-added studies is that effective teachers are, indeed, essential for student success. For example, Wright et al. (1997) found evidence that lower-achieving students are more likely to be placed with less effective teachers. Thus, the neediest students are being instructed by the least capable teachers. Using a multi-year database, Sanders and colleagues found that when children, beginning in third grade, were placed with three high performing teachers in a row, they scored, on average, at the 96th percentile on Tennessee’s statewide mathematics assessment at the end of fifth grade. When children with comparable achievement histories starting in third grade were placed with three low performing teachers in a row, their average score on the same mathematics assessment was at the 44th percentile, yielding a 52-percentile point difference. They claimed that “the immediate and clear implication of this finding is that seemingly more can be done to improve education by improving the effectiveness of teachers than by any other single factor” (Wright et al. 1997, p. 63).

A more recent study based in Tennessee supports Wright et al.’s (1997) conclusions regarding the magnitude of teacher effects. In a randomized experiment in which students and teachers were randomly assigned to classes from Kindergarten through grade 3, Nye et al.
(2004) concluded that “the results of this study support the idea that there are substantial differences among teachers in the ability to produce achievement gains in their students” (p. 253). Students of less effective teachers experienced reading achievement gains of one third of a standard deviation less than those of students with effective teachers. In mathematics the difference was slightly less than one half of a standard deviation.

In addition, data from the Dallas Independent Public Schools revealed that there is a powerful residual effect on student learning based on the quality of the teacher. If a student has a high performing teacher for just 1 year, the student will remain ahead of peers for at least the next few years of schooling. Unfortunately, if a student has an ineffective teacher, the influence on student achievement is not remediated fully for up to 3 years (Mendro 1998).

A study of third grade teachers in an urban Virginia school district found that students of teachers in the top quartile of effectiveness (based on hierarchical linear modeling predictions) scored approximately 30–40 scale score points higher than expected on the Virginia Standards of Learning state assessment in English, Mathematics, Science, and Social Studies, respectively. Students of teachers in the bottom quartile of effectiveness scored approximately 24–32 points below expected scores. (The range for scale scores on these assessments is 200 to 600.) These differences occurred after controlling for multiple demographic variables (e.g., gender, ethnicity, free/reduced lunch, special education status, ESL status, and days absent), students’ prior achievement, and class-level differences (e.g., class size, free/reduced lunch percentages; Stronge and Ward 2002).

1.3 Qualities of Effective Teachers
Dimensions that characterize teacher effectiveness, synthesized from a meta-review of extant research (Stronge 2002, 2007), were used as the conceptual framework for this study.
From this review, the qualities of effective teachers were divided into the dimensions of instructional expertise, student assessment, learning environment, and personal qualities of the teacher (Table 1). Each of these dimensions focuses on a fundamental aspect of the teacher’s professional qualifications or responsibilities and is summarized below.¹

¹ Due to the extensive nature of the extant research related to qualities of effective teachers, it is not feasible to provide a comprehensive review in this manuscript. Thus, the manuscript provides only a summary table depicting prominent research related to key teacher qualities. For more in-depth coverage of teacher qualities, see Stronge 2002, 2007, and similar reviews.

Table 1 Summary of teacher effectiveness dimensions and related research

| Domain | Dimension of teacher effectiveness | Representative research base |
|---|---|---|
| Instruction | Focus on instruction | Allington 2002; Darling-Hammond 2000; Johnson 1997; Wenglinsky 2000 |
| Instruction | Expectations for achievement | Peart and Campbell 1999; Wenglinsky 2002 |
| Instruction | Planning for instruction | Good and Brophy 1997; Jay 2002; Shellard and Protheroe 2000 |
| Instruction | Range of strategies | Pressley et al. 2004; Walsh and Sattes 2005; Weiss et al. 2003 |
| Instruction | Questioning | Eisner 2003/2004; Peart and Campbell 1999; Sternberg 2003; Zahorik et al. 2003 |
| Instruction | Student engagement | Cawelti 2004; Walsh and Sattes 2005; Wenglinsky 2002 |
| Instruction | Homework | Allington 2002; Berliner 1986; Cawelti 2004; Cotton 2000; Johnson 1997 |
| Student assessment | Monitor student progress | Cotton 2000; Foegen et al. 2007; Janisch and Johnson 2003; Yesseldyke and Bolt 2007 |
| Student assessment | Differentiation | Shellard and Protheroe 2000; Tomlinson 1999, 2003; VanTassel-Baska 2005 |
| Learning environment | Classroom management | Johnson 1997; Marzano et al. 2003; Pressley et al. 2004; Wang et al. 1993 |
| Learning environment | Organization | McLeod et al. 2003; Zahorik et al. 2003 |
| Learning environment | Behavioral expectations | Good and Brophy 1997; Hamre and Pianta 2005; Marzano 2003; Pressley et al. 2004 |
| Personal qualities | Caring | Boyle-Baise 2005; Collinson et al. 1999 |
| Personal qualities | Fairness and respect | McBer 2000; Peart and Campbell 1999 |
| Personal qualities | Interactions with students | Corbett and Wilson 2002; Cruickshank and Haefele 2001; Darling-Hammond 2001; Peart and Campbell 1999 |
| Personal qualities | Enthusiasm and motivation | Rowan et al. 1997; Quek 2005 |
| Personal qualities | Attitude toward teaching | Hamre and Pianta 2005; Southeast Center for Teaching Quality 2003 |
| Personal qualities | Reflective practice | Cruickshank and Haefele 2001; Good and Brophy 1997 |

2 Method

2.1 Part I: Identification of Effective Teachers

2.1.1 Setting and Target Population
The data for the current study were collected from third grade students and teachers in a moderately sized urban school district located in Virginia. The school district has 36 schools, a student population of approximately 23,000, and a teacher population of nearly 1,500. The student population is predominantly African-American (60%) and white (35%); approximately 2% of the student population receives ESL services. The sample selected for this study consisted of the third grade regular classroom teachers and students in the school district. Data for 1936 students and 85 classrooms were used for the analyses.

2.1.2 Identifying Teacher Effectiveness
The methodology used for determining teacher effectiveness for the study relied on the assumption that effective teachers are those who foster achievement gains beyond those expected from the student’s past achievement. This methodology is similar to other value-added systems that have been in use for some time (Mendro et al. 1994). Both ordinary least squares (OLS) regression and hierarchical linear modeling (HLM) were employed. Control variables were used at both the individual and classroom levels, as previous research (Mendro et al. 1994) has shown that effectiveness estimates can be biased if individual and classroom level background influences are not controlled for.
Extant research also has shown that multiple models of the data need to be estimated and examined for fit (Webster et al. 1998). Simple one-stage OLS, two-stage OLS, and two-stage, two-level HLM models have been found to be the best fitting in previous applications and were employed in the current study. The tested models and predictors for fitting student achievement are described in Table 2. The HLM analysis was conducted using HLM 6 (Raudenbush et al. 2005). For the HLM analysis, grand-mean centering was utilized.

Table 2 Teacher effectiveness identification models employed in the study

| Model | 1st stage predictors | 2nd stage predictors |
|---|---|---|
| Basic OLS regression | Gender; Age; Free/reduced lunch status; Race; Days absent; School mobility; English proficiency status; Degrees of reading proficiency grade 1; Degrees of reading proficiency grade 2 | None |
| Two-stage OLS regression | Block 1: Gender; Age; Free/reduced lunch status; Race; Days absent; School mobility. Block 2: Degrees of reading proficiency grade 1; Degrees of reading proficiency grade 2 | Class size; Percent free/reduced lunch; Percent male; Percent minority; Percent ESL |
| Two-stage, two-level HLM | Student level: Gender; Age; Free/reduced lunch status; Race; Days absent; School mobility; English proficiency status; Degrees of reading proficiency grade 1; Degrees of reading proficiency grade 2 | Classroom level: Class size; Percent free/reduced lunch; Percent male; Percent minority; Percent ESL; Interactions of dichotomous variables |

The target variables in each case were the third grade results on Virginia’s high stakes student assessment, the Standards of Learning (SOL), in English, Mathematics, Social Studies, and Science. It should be noted that third grade teachers in the school district were in self-contained settings and, consequently, each teacher was primarily responsible for teaching all four subject areas to the students assigned to her/his class.
The purposes of the state’s standards-based assessments at selected grades and high school subjects are to inform parents and teachers about what students are learning in relation to the SOL and to hold schools accountable for teaching the SOL content (Hambleton et al. 2000).

Selection of the specific statistical models was based on examination of statistical fit. In each instance, two-stage OLS models were determined to provide a sufficient model.² Table 3 presents the results of the selected model for each of the dependent variables. Age, race, number of days absent, and previous achievement (second grade measure of reading ability) were the consistent predictors across the analyses. Gender, class size, and percent receiving free or reduced lunch were other predictors that appeared in at least one of the analyses. The identified OLS models were used to establish the achievement expectations for each student.

Actual achievement was then compared to expected achievement estimates from the selected OLS equation. In these analyses, positive differences indicated student achievement beyond expectation, zero differences indicated achievement commensurate with expectation, and negative differences indicated achievement below expectation. The difference scores of the students were standardized, aggregated, and averaged to develop a composite for each teacher. Consistent with previous research, a minimum of ten student cases per teacher was set as the floor value for establishing a teacher composite (Mendro 1998). Teacher composites were then corrected for class size. Analysis of the distribution of teacher composites allowed the identification of the most effective and least effective teachers, based on comparisons of student achievement scores after controlling for the various factors noted above. Figures 1, 2, 3 and 4 illustrate the distribution of teacher residuals for the four subject areas examined.
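The composite calculation described above can be illustrated with a short Python sketch. It is not the authors' code, and it rests on several assumptions that the text does not specify: the expected scores are taken as already produced by the selected regression model, difference scores are standardized as z-scores across all students using the population standard deviation, and the class-size correction is omitted because its form is not described. The function names `teacher_composites` and `quartile_groups` and the toy records are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

MIN_STUDENTS = 10  # floor value stated in the text (Mendro 1998)


def teacher_composites(records, min_students=MIN_STUDENTS):
    """Average standardized (actual - expected) differences per teacher.

    records: iterable of (teacher_id, actual_score, expected_score).
    Assumption: z-scores are computed over all students pooled together.
    The class-size correction mentioned in the text is NOT applied here.
    """
    records = list(records)
    diffs = [actual - expected for _, actual, expected in records]
    mu, sigma = mean(diffs), pstdev(diffs)
    if sigma == 0:
        raise ValueError("difference scores have no variance")

    by_teacher = defaultdict(list)
    for (teacher, _, _), diff in zip(records, diffs):
        by_teacher[teacher].append((diff - mu) / sigma)

    # Teachers with fewer than min_students cases receive no composite.
    return {t: mean(z) for t, z in by_teacher.items() if len(z) >= min_students}


def quartile_groups(composites):
    """Return the teacher ids in the bottom and top quarter of composites."""
    ranked = sorted(composites, key=composites.get)
    k = len(ranked) // 4
    return {"bottom": ranked[:k], "top": ranked[len(ranked) - k:]}


if __name__ == "__main__":
    # Hypothetical toy data: four teachers with ten students each, and one
    # teacher with only five students, who falls below the floor.
    toy = [(t, 400 + d, 400)
           for t, d in (("A", 10), ("B", 2), ("C", -2), ("D", -10))
           for _ in range(10)]
    toy += [("E", 400, 400)] * 5
    composites = teacher_composites(toy)
    print(composites)
    print(quartile_groups(composites))
```

A positive composite corresponds to students who, on average, scored above expectation, and a negative composite to students who scored below it, matching the interpretation of the difference scores given in the text.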
2.2 Part II: Comparative Analysis of Effective and Less Effective Teachers

2.2.1 Sample
Part II of the study involved an examination of the instructional practices of teachers who effected higher than predicted gains in student learning and those who effected lower than predicted gains in student learning as measured by the SOL assessments. In order to explore the phenomenon of effective teaching, exploratory cross-case analyses were used. The results of Part I were used to identify third-grade teachers for in-depth case studies from among the highest and lowest quartiles based on their student academic growth composite. Consequently, five of the 24 teachers from the top quartile and six of the 21 teachers from the lowest quartile were selected.³ Given that the Effective Teacher (ET) and Ineffective Teacher (IT) samples were small, we decided to approach this part of the study as a set of case studies. Therefore, we will not report statistical comparisons of the two groups in this paper.

Table 3 Results of regression analyses

| Variable | Model | R² | Significant predictors |
|---|---|---|---|
| English | Two-stage OLS | 0.74 | Gender; Age; Race; Days absent; Degrees of reading proficiency grade 2; Class size |
| Mathematics | Two-stage OLS | 0.69 | Age; Race; Days absent; Degrees of reading proficiency grade 2; Class size |
| Social studies | Two-stage OLS | 0.70 | Gender; Age; Race; Days absent; Degrees of reading proficiency grade 2; Percent free/reduced lunch |
| Science | Two-stage OLS | 0.69 | Age; Race; Days absent; Degrees of reading proficiency grade 2; Percent free/reduced lunch |

² Since the Basic OLS and HLM models were not utilized, the statistical results for those analyses are not presented here.

³ Due to the extensive time and cost involved in conducting case study research, a small sample was selected (N=11). Thus, caution should be exercised in interpreting the results of this study due to the small sample size.
2.2.2 Data Analysis Approach
In order to explore the phenomenon of effective teaching, the qualitative approach of exploratory cross-case analysis was used. Using multiple cases makes it possible to build a logical chain of evidence (Miles and Huberman 1994; Yin 1994). Additionally, cross-case analysis allows for analysis of consistencies identified across the cases (Welker 2004).

2.2.3 Instrumentation
A variety of data collection instruments were developed or adapted for use in this study to empirically capture selected instructional practices. Specifically, the following instruments were used: (a) questioning analysis chart, (b) narrative running record, (c) time-on-task chart, (d) student-teacher interaction analysis, (e) checklist of student assessment practices, (f) overall time use chart, and (g) teacher interview form. Following the observation and interview, both observers were asked to complete a teacher effectiveness behavior scale based on the dimensions identified in Table 1.

Questioning Analysis Chart
One observer recorded all instructional questions asked by the teacher, orally and in writing, for one to two lessons, or the equivalent of a 60-min time period during the 3-h observation. Subsequently, the observer coded and tallied the questions based on the six levels in Bloom’s taxonomy (1984): knowledge, comprehension, application, analysis, synthesis, and evaluation.

Narrative Running Record
This instrument was designed to record and code the type of classroom activities and interactions, at 5-min intervals, for a 60-min time period during the 3-h observation. This instrument is based on Glickman et al.’s (1998) Teacher Verbal Behaviors Instrument. Following the actual classroom observation, the audiotapes were reviewed and verbatim quotes and examples of classroom activities were added to the record. The teacher’s interactions with the students were categorized as directions/procedures, monitoring, feedback, management, modifications, and questioning.

Fig. 1 Standardized teacher residuals for English

Fig. 2 Standardized teacher residuals for mathematics (histogram of frequency by residual; Mean = .00, Std. Dev = .42, N = 85)

Fig. 3 Standardized teacher residuals for social studies (histogram of frequency by residual; Mean = -.01, Std. Dev = .52, N = 85)

Time-on-Task Chart
This instrument was designed to record student engagement in the teaching-learning process at 5-min intervals for a 60-min period. Additionally, comments regarding off-task behaviors and teacher responses were recorded. It is a modified version of an instrument used in the validity study of the National Board for Professional Teaching Standards (Bond et al. 2000). Student off-task behaviors and teacher management of the behavior, both preventive and reactive, were noted.

Fig. 4 Standardized teacher residuals for science (histogram of frequency by residual; Mean = -.00, Std. Dev = .46, N = 85)

Student–Teacher Interaction Analysis
This instrument was based on the Flanders Interaction Analysis (Flanders 1970) methodology to capture teacher interactions with students throughout a 60-min interval. Teacher interactions with students were categorized according to the following: accepts feelings, praises/encourages, accepts or uses student ideas, asks questions, lectures, gives directions, reprimands or asserts authority, records student talk response, and notes student talk initiation.

Checklist of Student Assessment Practices
Using a checklist of possible types of student assessments, one observer noted the types of assessments used in the classroom and made follow-up notes based on information provided during the teacher interview after the observation.
Overall Time Use Chart
This was a simple recording of how time was used in the classroom during the 3-h visitation. Major activities and the time dedicated to each were noted. Results of this analysis were used to determine the amount of time focused on instruction as compared to administrative tasks, transitions, and other non-instructional activities. The instrument utilized the Stallings Observation System (Stallings 1986) method of providing a snapshot of activities engaged in by the teacher.

Teacher Interview Form
A structured interview protocol that took 20–30 min was completed following the classroom observation. It was used to solicit information from the teacher on teaching credentials, professional development, student assessment strategies, and lesson objectives.

Teacher Effectiveness Behavior Scale
After the 3-h observation, the two raters scored the entire observation using a behavioral summary scale (Bond et al. 2000; McGreal 1990) of effective teacher behaviors. The scale is based on research of effective teaching behaviors and is designed to capture both the types of behaviors and the degree to which the participating classroom teachers exhibited those behaviors. Four levels of performance, ranging from most effective to least effective, were defined for each dimension of effective teaching.

2.2.4 Procedures
A double blind design was employed in which the teachers were not informed as to the reason for their inclusion in the study; additionally, the observers who collected observational data did not know the effective/ineffective identities of the teachers. All identifying teacher information was coded such that only a single school district employee knew the identity of teachers in each group. Classroom observations were conducted with the selected five teachers from the highest quartile and six teachers from the lowest quartile.
Two observers used a variety of data collection strategies during a 3-h classroom visit and a subsequent half-hour interview with each selected teacher.

A training session was provided for classroom observers on conducting observations using the specific instruments developed for this study. The session included an overview of the study, specific training on the use of each protocol, and instruction on synthesizing the data for the overall rating of the observation. Observers were given opportunities to practice using the various observation instruments while viewing practice videotapes. The practice session continued until observers were able to score the videotaped performance of the teaching simulations with 80% or higher agreement.

3 Results
The following sections report the findings from the observational data of “effective” teachers, those who facilitated higher than expected learning gains for students, and “ineffective” teachers, those who facilitated lower than expected learning gains.

3.1 Student–Teacher Interactions
During a 1-h segment of a lesson, the observers recorded student–teacher interactions in three specific domains: indirect, direct, and student talk. There were no noteworthy differences between the effective and less effective teachers noted in this analysis (Table 4).

3.2 Teacher Classroom Behaviors
The observation team assessed each of the selected third grade teachers (N=11) based on 20 categories within four specific domains: instruction, student assessment, classroom management, and personal qualities. Table 5 lists the results of the observation team’s ratings using a four-point behavioral summary scale rubric. The observational data were summarized for the two groups and compared. Because of the small samples representing the two teacher groups, statistical comparisons are not reported here.
Nonetheless, in 18 out of 20 dimensions on which the teachers were compared, the effective teachers received higher scores. Although we did not report statistical significance on each analysis due to the exploratory nature of the study and the small sample sizes, it is worth noting that the effective teacher group did perform higher on two dimensions, instructional differentiation and complexity of instruction, at a significance level of p < .05. A summary of key findings from the comparative analysis of teacher classroom behaviors follows.

Instruction:
1. The effective teachers studied provided more complex instruction with a greater emphasis on meaning versus memorization than those teachers who were considered ineffective.
2. The effective teachers studied demonstrated a broader range of instructional strategies, using a variety of materials and media to support the curriculum, than those teachers who were considered ineffective.

Student Assessment:
1. As a domain, student assessment was found to have noteworthy differences favoring the effective teachers.
2. The effective teachers studied provided more differentiated assignments for students than did the ineffective teachers.

Learning Environment:
1. The effective teachers studied were more organized than ineffective teachers, with efficient routines and procedures for daily tasks.
2. The behavioral expectations for students of the effective teachers studied were higher than the expectations of the ineffective teachers.

Personal Qualities:
1. There was a difference between teachers deemed effective and those deemed ineffective in the overall domain for personal qualities.
2. When compared to the ineffective teachers, the effective teachers studied demonstrated a higher degree of respect for and fairness toward students.

Table 4 Comparative analysis between effective and ineffective teachers regarding types of student–teacher interactions

| Description | Effective teachers (ET) Mean | Ineffective teachers (IT) Mean | Comparison Favors |
|---|---|---|---|
| Indirect | 66.20 | 70.67 | IT |
| Accepts feelings | 1.40 | 1.17 | ET |
| Praises/encourages | 16.20 | 11.67 | ET |
| Accepts student ideas | 5.80 | 3.33 | ET |
| Asks questions | 42.80 | 54.46 | IT |
| Direct | 39.20 | 56.33 | IT |
| Lectures | 4.00 | 3.00 | ET |
| Gives directions | 26.20 | 34.83 | IT |
| Reprimands or asserts authority | 9.00 | 18.50 | IT |
| Student talk | 36.80 | 31.17 | ET |
| Response to | 23.60 | 21.33 | ET |
| Initiation of | 13.20 | 9.83 | ET |
| Total interactive behavior | 142.20 | 158.17 | IT |

3.3 Teacher Questioning Analysis
During 1 h of the observation period, the total number of questions asked by the teachers on three levels was tallied: recall questions, comprehension questions, and higher order questions (based on Bloom’s Taxonomy). Table 6 illustrates key findings from this analysis. A comparative analysis of the types of questions asked by teachers revealed that the effective teachers asked more higher-level questions (i.e., application, analysis, synthesis, evaluation) than did the ineffective teachers, approximately seven times as many.

3.4 Student Off-Task Behavior
During a 60-min period, one member of the observation team noted the number of students who were disengaged or disruptive at 5-min intervals.
Table 7 describes the mean number of students who were disengaged, the mean number who were disruptive, and the total mean number of students who were off-task (disruptive and/or disengaged). The results of this analysis revealed that the effective and the ineffective teachers had essentially the same number of students noted as disengaged; however, the ineffective teachers’ students exhibited more off-task behaviors than the effective teachers. Teachers who were considered ineffective had, on average, almost five disruptive behaviors during the 60-min observation periods compared to approximately one half of a disruptive event per 60-min period for the effective teachers.

Table 5 Comparative analysis between effective and ineffective teachers on research-based dimensions

| Description | Effective teachers (ET) Mean | Ineffective teachers (IT) Mean | Comparison Favors |
|---|---|---|---|
| Instruction | 25.20 | 21.83 | ET |
| Instructional focus | 3.40 | 2.67 | ET |
| Achievement expectations | 3.00 | 3.17 | IT |
| Planning | 3.20 | 2.83 | ET |
| Range of strategies | 3.20 | 2.33 | ET |
| Clarity of expectations | 3.20 | 2.83 | ET |
| Complexity of instruction | 3.00 | 1.83 | ET |
| Questioning | 2.40 | 2.00 | ET |
| Student engagement | 3.00 | 2.50 | ET |
| Homework | 0.60 | 1.67 | IT |
| Student assessment | 5.40 | 3.67 | ET |
| Monitoring students | 2.80 | 2.50 | ET |
| Differentiation | 2.60 | 1.17 | ET |
| Classroom management | 11.00 | 9.00 | ET |
| Management | 3.60 | 3.33 | ET |
| Organization | 3.60 | 2.83 | ET |
| Behavioral expectations | 3.80 | 2.83 | ET |
| Personal qualities | 18.60 | 13.83 | ET |
| Caring | 3.60 | 3.00 | ET |
| Fairness and respect | 3.80 | 3.17 | ET |
| Interactions with students | 3.40 | 2.50 | ET |
| Enthusiasm and motivation | 3.40 | 2.67 | ET |
| Dedication to teaching | 2.20 | 1.00 | ET |
| Reflective practice | 2.20 | 1.50 | ET |
| Overall effectiveness | 60.20 | 48.33 | ET |
Table 6 Questioning analysis of effective and ineffective teachers

| Description | Effective teachers (ET) Mean | Ineffective teachers (IT) Mean | Comparison Favors |
|---|---|---|---|
| Recall | 48.40 | 52.00 | IT |
| Comprehension | 8.80 | 26.80 | IT |
| Application and beyond | 9.80 | 1.20 | ET |

A review of the data indicated that one teacher asked a total of 117 questions during the observation session. This case was considered an anomaly (2.6 questions per minute) and was omitted from the analysis.

Table 7 Comparative analysis between effective and ineffective teachers regarding student behavior

| Description | Effective teachers (ET) Mean | Ineffective teachers (IT) Mean | Comparison Favors |
|---|---|---|---|
| Disengaged students | 7.80 | 7.33 | ET |
| Disruptive students | 0.60 | 4.83 | IT |
| Total students with off-task behaviors | 8.40 | 12.17 | IT |

4 Discussion

4.1 Using Statistical Models to Assess Teacher Effectiveness

The current environment for education is permeated with new calls for accountability at the student, teacher, and school levels. The NCLB Act calls for more attention to student gains and effectiveness of teachers. In the current study, we focused on the identification of teacher outcomes that can be closely linked to accountability. Fairness and usability are central issues in any system of accountability that would be proposed for use in educational settings (Webster and Mendro 1997).

Part 1 of the current study employed a statistical methodology that ensures fairness by creating the teacher composites through the use of statistical controls of concomitant variables. The use of statistical models allows for the comparison of outcomes that are free from the influences of identified extraneous variables. The statistical models tested in Part 1 have the advantage of including measures at the student and classroom levels. While the current study found two-stage OLS regression models to provide an adequate fit, OLS and HLM regression models were tested.
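The idea of comparing actual with expected achievement can be illustrated with a minimal sketch. It is not the study's model: it fits a single pooled OLS regression on one hypothetical covariate (a prior score), averages each teacher's residuals into an index, and splits the index into quartiles. The function names, the covariate, and all data are assumptions made for illustration; the two-stage specification and the HLM alternative are not reproduced here.

```python
import numpy as np

def teacher_effect_index(covariates, scores, teacher_ids):
    """Mean residual (actual minus expected score) per teacher from a pooled OLS fit."""
    design = np.column_stack([np.ones(len(scores)), covariates])  # intercept + covariates
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)        # least-squares coefficients
    residuals = scores - design @ coef                            # actual - expected
    teachers = np.unique(teacher_ids)
    index = np.array([residuals[teacher_ids == t].mean() for t in teachers])
    return teachers, index

def quartile_labels(index):
    """Label each teacher 1 (lowest quartile) to 4 (highest quartile)."""
    cuts = np.quantile(index, [0.25, 0.50, 0.75])
    return np.searchsorted(cuts, index, side="right") + 1

# Synthetic data only (assumed values, not the study's data).
rng = np.random.default_rng(0)
n_teachers, class_size = 8, 20
teacher_ids = np.repeat(np.arange(n_teachers), class_size)
true_effect = np.linspace(-6.0, 6.0, n_teachers)          # hypothetical teacher effects
prior = rng.normal(400.0, 30.0, size=teacher_ids.size)    # hypothetical prior score
noise = rng.normal(0.0, 2.0, size=teacher_ids.size)
scores = 80.0 + 0.8 * prior + true_effect[teacher_ids] + noise

teachers, index = teacher_effect_index(prior, scores, teacher_ids)
labels = quartile_labels(index)
for t, value, q in zip(teachers, index, labels):
    print(f"teacher {t}: index {value:+.2f}, quartile {q}")
```

A pooled fit like this ignores the nesting of students within classrooms, which is the motivation for testing a multilevel model alongside it.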
Previous research had recommended the use of two-level HLM (Webster et al. 1996) but also found that OLS solutions were highly correlated with the HLM solutions and relatively free from bias. The testing of multiple models is recommended as an additional safeguard in the process of identifying effective teachers.

The advantages to school systems that employ such methods to help identify effective and less effective teachers include the possible future benefits to teacher evaluation and teacher development, as well as the ability to demonstrate compliance with new calls for accountability. Although seemingly at odds (Danielson and McGreal 2000; Stronge 2006), the purposes of accountability and professional growth in a teacher evaluation system can be met by examining teacher effects on student achievement and the behaviors of those teachers whose students experience higher than expected learning gains. First, adding teacher effectiveness to a school district’s accountability system would provide a critical empirical perspective to the multifaceted process of teacher evaluation. Second, when the data from teacher effectiveness are associated with professional development opportunities that are structured on the instructional characteristics and behaviors of effective teachers, the ultimate outcome may be increased educational success of more students. The improvement orientation of evaluating teacher effectiveness serves to meet the professional needs of the teacher and to support reform efforts within a school (Stronge 2006). Logic dictates that if teaching improves then student achievement will improve as well.

4.2 Characteristics and Behaviors of Effective Teachers

One important finding of this exploratory cross-case analysis is the preliminary identification of instructional characteristics and behaviors of those teachers who produced high gains in student learning.
The study used assessments that were closely aligned with the curriculum taught by the teachers, which allowed for a meaningful interpretation of student learning gains, both greater and lower than expected. Studies such as this may help us begin to better understand the links between classroom processes and desirable student outcomes. Moreover, by focusing on the hallmarks of effective teachers, eventually we may be better equipped to educate teachers more expertly, to set meaningful performance expectations once teachers are in classrooms, and to evaluate and reward teachers more fairly.

This exploratory study identified three distinct differences in the practices of those teachers who effected greater than expected learning gains for students and those who effected lower than expected learning gains: (1) differentiation and complexity of instructional strategies, (2) questioning practices, and (3) level of disruptive student behavior. Consequently, the study reinforces the link between student learning and these teacher behaviors.

• Differentiation and complexity of instruction. The effective teachers in this study demonstrated that they understood the need to alter the lesson presentation and materials in order to promote student learning, given that a one-size-fits-all approach typically is not the best fit.
• Questioning. The effective and less effective teachers asked comparable numbers of lower-level questions; the distinction between the two groups occurred with effective teachers asking a far greater number of higher-level questions (approximately seven times more).
• Disruptive student behavior. Effective teachers in the study had a disruptive behavior incident about once every 2 h, whereas the ineffective teachers in the case analyses had a disruptive event approximately every 12 min.
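The intervals and the ratio quoted above follow from the group means reported in Tables 6 and 7. The short sketch below reproduces that arithmetic; the variable and function names are illustrative assumptions, and the inputs are the reported means for a 60-min observation.

```python
OBSERVATION_MINUTES = 60  # length of the observation window reported in the study

def minutes_between_events(mean_events, window=OBSERVATION_MINUTES):
    """Average gap between events, given the mean count per observation window."""
    return window / mean_events

# Mean disruptive students per 60-min observation (Table 7).
effective_gap = minutes_between_events(0.60)     # about 100 min, i.e. roughly every 2 h
ineffective_gap = minutes_between_events(4.83)   # about 12 min

# Mean "application and beyond" questions per hour (Table 6), ET divided by IT.
higher_order_ratio = 9.80 / 1.20                 # roughly 8

print(f"effective: one disruption every {effective_gap:.0f} min")
print(f"ineffective: one disruption every {ineffective_gap:.1f} min")
print(f"higher-order question ratio (ET/IT): {higher_order_ratio:.1f}")
```

The ratio computed from the Table 6 means comes out near eight, the same order of magnitude as the "approximately seven times" stated in the text; the source does not explain the difference.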
4.3 Limitations

Due to the very limited sample size of the cross-case analysis (N=11), large number of variables, and the large number of statistical tests, the analyses are presented as exploratory analyses focused on the trends of the findings rather than as statistical analyses. Thus, due caution should be exercised in interpreting or generalizing the results of the study. Given the promise in these findings, and considering the limitations of the current study, we recommend that future work continue this line of research. In particular, studies that could provide for a larger and more representative sample would allow for more robust statistical analyses to be conducted.

5 Conclusions

Although policy makers periodically have suggested that schools have little impact on student learning, recent studies indicate that schools and their efforts do make a difference, and much of that difference can be linked directly to teachers (Darling-Hammond 2000). Given the clear and undeniable link that exists between teacher effectiveness and student learning, the use of student achievement information, when it is curriculum based, can provide an invaluable tool to explore the classroom practices of teachers who enhance student learning beyond predicted levels of accomplishment. Student achievement can be, indeed, should be, an important source of feedback on the effectiveness of schools, administrators, and teachers. The challenge for educators and policy makers is to make certain that student achievement is placed in the broader context of what teachers and schools are accomplishing. Moreover, given the central role that teachers have always played in successful schools, connecting teacher performance and student performance is a natural extension of the educational reform agenda.
“The purpose of teaching is learning, and the purpose of schooling is to ensure that each new generation of students accumulates the knowledge and skills needed to meet the social, political, and economic demands of adulthood” (McConney et al. 1997, p. 162). Most educators view teaching and learning as a reciprocal process, an equal partnership, in which teachers and students, alike, shape the environment and support the learning endeavor through their thoughts and behaviors. Hence, how we conceptualize teacher effectiveness should reflect a balance of the instructional practices of teachers that both enhance teaching and curriculum-based assessments of student learning.

References

Allington, R. L. (2002). What I’ve learned about effective reading instruction. Phi Delta Kappan, 83, 740–747.
Berliner, D. C. (1986). In pursuit of the expert pedagogue. Educational Researcher, 15(7), 5–13.
Berliner, D. C., & Rosenshine, B. V. (1977). The acquisition of knowledge in the classroom. In R. C. Anderson, R. J. Spiro, & W. E. Montague (Eds.) Schooling and the acquisition of knowledge (pp. 375–396). New Jersey: Lawrence Erlbaum Associates.
Bloom, B. S. (1984). The search for methods of group instruction as effective as one-to-one tutoring. Educational Leadership, 41(8), 4–17.
Bond, L., Smith, T., Baker, W. K., & Hattie, J. A. (2000). The certification system of the National Board for Professional Teacher Standards: A construct and consequential validity study. Greensboro, NC: Center for Educational Research and Evaluation, The University of North Carolina at Greensboro.
Boyle-Baise, M. (2005). Preparing community-oriented teachers: Reflections from a multicultural service-learning project. Journal of Teacher Education, 56(5), 446–458.
Cawelti, G. (Ed.). (2004). Handbook of research on improving student achievement (2nd ed.). Arlington, VA: Educational Research Service.
Collinson, V., Killeavy, M., & Stephenson, H. J. (1999).
Exemplary teachers: Practicing an ethic of care in England, Ireland, and the United States. Journal for a Just and Caring Education, 5(4), 349–366.
Corbett, D., & Wilson, B. (2004). What urban students say about good teaching. Educational Leadership, 60(1), 18–22.
Cotton, K. (2000). The schooling practices that matter most. Portland, OR: Northwest Regional Educational Laboratory, & Alexandria, VA: Association for Supervision and Curriculum Development.
Cruickshank, D. R., & Haefele, D. (2001). Good teachers, plural. Educational Leadership, 58(5), 26–30.
Danielson, C., & McGreal, T. L. (2000). Teacher evaluation to enhance professional practice. Princeton, NJ: Educational Testing Service.
Darling-Hammond, L. (1997). Doing what matters most: Investing in quality teaching. New York: National Commission on Teaching and America’s Future.
Darling-Hammond, L. (2000). Teacher quality and student achievement: A review of state policy evidence. Educational Policy Analysis Archives, 8(1). Retrieved from http://olam.ed.asu.edu/epaa/v8n1.
Darling-Hammond, L. (2001). The challenge of staffing our schools. Educational Leadership, 58(8), 12–17.
Darling-Hammond, L., & Youngs, P. (2002). Defining “highly qualified teachers”: What does “scientifically based research” actually tell us? Educational Researcher, 31(9), 13–25.
Eisner, E. W. (2003/2004). Preparing for today and tomorrow. Educational Leadership, 61(4), 6–10.
Flanders, N. A. (1970). Analyzing teaching behavior. New York: Addison–Wesley.
Foegen, A., Jiban, C., & Deno, S. (2007). Progress monitoring measures in mathematics: A review of the literature. Journal of Special Education, 41(2), 121–139.
Glickman, C. D., Gordon, S. P., & Ross-Gordon, J. (1998). Supervision of instruction: A developmental approach (4th ed.). Boston: Allyn and Bacon.
Good, T. L., & Brophy, J. E. (1997). Looking in classrooms (7th ed.). New York: Addison–Wesley.
Hambleton, R.
K., Crocker, L., Cruse, K., Dodd, B., Plake, B. S., & Poggio, J. (2000). Review of selected technical characteristics of the Virginia Standards of Learning (SOL) assessments. Richmond, VA: Virginia Department of Education.
Hamre, B. K., & Pianta, R. C. (2005). Can instructional and emotional support in the first-grade classroom make a difference for children at risk of school failure? Child Development, 76(5), 949–967.
Heinecke, W. F., Curry-Corcoran, D. E., & Moon, T. R. (2003). U.S. schools and the new standards accountability initiative. In D. L. Duke, M. Grogan, P. D. Tucker, & W. F. Heinecke (Eds.) Educational leadership in an age of accountability: The Virginia experience (pp. 7–35). Albany, NY: SUNY.
Janisch, C., & Johnson, M. (2003). Effective literacy practices and challenging curriculum for at-risk learners: Great expectations. Journal of Education for Students Placed At-Risk, 8(1), 295.
Jay, J. K. (2002). Points on a continuum: An expert/novice study of pedagogical reasoning. The Professional Educator, 24(2), 63–74.
Johnson, B. L. (1997). An organizational analysis of multiple perspectives of effective teaching: Implications for teacher evaluation. Journal of Personnel Evaluation in Education, 11, 69–87.
Kupermintz, H. (2002). Value-added assessment of teachers: The empirical evidence. In School proposals: Research evidence by A. Molnar (Ed.). Retrieved February 14, 2002 from http://www.asu.edu/educ/epsl/Reports/epru/EPRU%202002–101/epru-2002–101.htm.
Marzano, R. J. (2003). What works in schools. Alexandria, VA: Association for Supervision and Curriculum Development.
Marzano, R. J., Marzano, J. S., & Pickering, D. J. (2003). Classroom management that works. Alexandria, VA: Association for Supervision and Curriculum Development.
McBer, H. (2000). Research into teacher effectiveness: A model of teacher effectiveness. (Research report #216). Department for Education and Employment: England.
McCaffrey, D. F., Lockwood, J. R., Koretz, D. M., & Hamilton, L.
S. (2003). Evaluating value-added models for teacher accountability. Santa Monica, CA: RAND.
McConney, A. A., Schalock, M. D., & Schalock, H. D. (1997). Indicators of student learning in teacher evaluation. In J. H. Stronge (Ed.) Evaluating teaching: A guide to current thinking and best practice (pp. 162–192). Thousand Oaks, CA: Corwin.
McGreal, T. L. (1990). The use of rating scales in teacher evaluation: Concerns and recommendations. Journal of Personnel Evaluation in Education, 4, 41–58.
McLeod, J., Fisher, J., & Hoover, G. (2003). The key elements of classroom management: Managing time and space, student behavior, and instructional strategies. Alexandria, VA: Association for Supervision and Curriculum Development.
Mendro, R. L. (1998). Student achievement and school and teacher accountability. Journal of Personnel Evaluation in Education, 12, 257–267.
Mendro, R. L., Webster, W. J., Bembry, K., & Orsak, T. H. (1994). An application of hierarchical linear modeling in determining school effectiveness. Phoenix, Arizona: Rocky Mountain Educational Research Association.
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook. Thousand Oaks, CA: Sage.
No Child Left Behind Act of 2001, Pub. L. no. 107–110, 115 Stat. 1425 (codified in 20 USC §6301).
National Commission on Excellence in Education (1983). A nation at risk: The imperative for educational reform. Washington, DC: US Department of Education.
Nye, B., Konstantopoulos, S., & Hedges, L. V. (2004). How large are teacher effects? Educational Evaluation and Policy Analysis, 26(3), 237–257.
Panasuk, R., Stone, W., & Todd, J. (2002). Lesson planning strategy for effective mathematics teaching. Education, 22(2), 714, 808–827.
Peart, N. A., & Campbell, F. A. (1999). At-risk students’ perceptions of teacher effectiveness. Journal for a Just and Caring Education, 5(3), 269–284.
Pressley, M., Raphael, L., Gallagher, J. D., & DiBella, J. (2004).
Providence St. Mel school: How a school that works for African American students works. Journal of Educational Psychology, 96(2), 216–235.
Quek, C. G. (2005). A national study of scientific talent development in Singapore. Unpublished doctoral dissertation, The College of William and Mary, Williamsburg, Virginia.
Raudenbush, S., Bryk, A., & Congdon, R. (2005). HLM for Windows. Redfield, D. (2000). Lincolnwood, IL: SSI.
Rowan, B., Chiang, F. S., & Miller, R. J. (1997). Using research on employees’ performance to study the effects of teachers on student achievement. Sociology of Education, 70, 256–284.
Sacks, P. (1999). Standardized minds: The high price of America’s testing culture and what we can do to change it. Cambridge, MA: Perseus.
Sanders, W. L., & Horn, S. P. (1995). The Tennessee Value-Added Assessment System (TVAAS): Mixed model methodology in educational assessment. In A. J. Shinkfield, & D. L. Stufflebeam (Eds.) Teacher evaluation: Guide to effective practice. Boston: Kluwer.
Shellard, E., & Protheroe, N. (2000). Effective teaching: How do we know it when we see it? The informed educator series. Arlington, VA: Educational Research Services.
Southeast Center for Teaching Quality (2003). How do teachers learn to teach effectively? Quality indicators from quality schools. Teaching Quality in the Southeast: Best Practices and Policies, 7(2), 1–2.
Stallings, J. (1986). Using time effectively: A self-analytic approach. In K. K. Zumwalt (Ed.) Improving teaching (pp. 15–27). Alexandria, VA: Association for Supervision and Curriculum Development.
Sternberg, R. J. (2003). What is an expert student? Educational Researcher, 32(8), 5–9.
Stronge, J. H. (2002). Qualities of effective teachers. Alexandria, VA: Association of Supervision and Curriculum Development.
Stronge, J. H. (2006). Teacher evaluation and school improvement: Improving the educational landscape. In J. H. Stronge (Ed.) Evaluating teaching: A guide to current thinking and best practice (pp. 1–23, 2nd ed.).
Thousand Oaks, CA: Corwin.
Stronge, J. H. (2007). Qualities of effective teachers (2nd ed.). Alexandria, VA: Association of Supervision and Curriculum Development.
Stronge, J. H., & Tucker, P. D. (2000). Teacher evaluation and student achievement. Washington, DC: National Education Association.
Stronge, J. H., & Ward, T. J. (2002). Alexandria City public schools teacher effectiveness study. Report for Alexandria City Public Schools, Alexandria, VA: Authors.
Tobin, K. (1980). The effect of extended teacher wait-time on science achievement. Journal of Research in Science Teaching, 17, 469–475.
Tomlinson, C. (1999). The differentiated classroom: Responding to the needs of all learners. Alexandria, VA: Association for Supervision and Curriculum Development.
Tomlinson, C. A. (2003). Differentiation of instruction in the early grades. ERIC Digest. Washington, DC: ERIC Clearinghouse on Teaching and Teacher Education (ERIC Document Reproduction service no. ED443572).
Tucker, P. D., & Stronge, J. H. (2005). Linking teacher evaluation and student learning. Alexandria, VA: Association for Supervision and Curriculum Development.
VanTassel-Baska, J. (2005). Lessons learned from curriculum differentiation, instruction, and assessment. Presentation at the National Curriculum Network Conference, Williamsburg, VA.
Walsh, J. A., & Sattes, B. D. (2005). Quality questioning: Research-based practice to engage every learner. Thousand Oaks, CA: Corwin.
Wang, M. C., Haertel, G. D., & Walberg, H. J. (1993). What helps students learn? Educational Leadership, 51(4), 74–79.
Webster, W., & Mendro, R. (1997). The Dallas Value-Added Accountability System. In J. Millman (Ed.). Grading teachers, grading schools: Is student achievement a valid evaluation measure? Thousand Oaks, CA: Corwin Press.
Webster, W. J., Mendro, R. L., Orsak, T. H., & Weerasinghe, D. (1996, April). The applicability of selected regression and hierarchical linear models to the estimation of school and teacher effects.
Paper presented at the annual meeting of the National Council on Measurement in Education, New York.
Webster, W. J., Mendro, R. L., Orsak, T. H., & Weerasinghe, D. (1998). An application of hierarchical linear modeling to the estimation of school and teacher effect. Paper presented at the Annual Meeting of the American Educational Research Association (San Diego, CA, April 13–17, 1998).
Weiss, I. R., Pasley, J. D., Smith, P. S., Banilower, E. R., & Heck, D. J. (2003). A study of K-12 mathematics and science education in the United States. Chapel Hill, NC: Horizon Research.
Welker, G. A. (2004). Patterns of order processing: A study of the formalization of the ordering process in order-driven manufacturing companies. The Netherlands: University of Groningen. Dissertation. Retrieved January 11, 2008 from http://dissertations.ub.rug.nl/faculties/management/2004/g.a.welker.
Wenglinsky, H. (2000). How teaching matters: Bringing the classroom back into discussions of teacher quality. Princeton, NJ: Millikan Family Foundation and Educational Testing Service.
Wenglinsky, H. (2002). How schools matter: The link between teacher classroom practices and student academic performance. Educational Policy Analysis Archives, 10(12). Retrieved February 13, 2007 from http://epaa.asu.edu/epaa/v10n12/.
Wright, S. P., Horn, S. P., & Sanders, W. L. (1997). Teacher and classroom context effects on student achievement: Implications for teacher evaluation. Journal of Personnel Evaluation in Education, 11, 57–67.
Ysseldyke, J., & Bolt, D. M. (2007). Effect of technology-enhanced continuous progress monitoring on math achievement. School Psychology Review, 36(3), 453–467.
Yin, R. K. (1994). Case study research: Design and methods. Applied social research methods series. Thousand Oaks, CA: Sage.
Zahorik, J., Halbach, A., Ehrle, K., & Molnar, A. (2003). Teaching practices for smaller classes. Educational Leadership, 61(1), 75–77.