The Effects of Within-Class Ability Grouping on the Social and Emotional Skills of Kindergartners and First Graders

1. Introduction

Research on ability grouping dates back to the early 1900s (Findley and Bryan, 1971). Educational researchers have published numerous papers on ability grouping, and the topic has been the source of much academic debate (see, e.g., Hallinan, 1994; and, Oakes, 1994). Yet for all the effort to assess the effects of ability grouping, much is still unknown about how the practice impacts students, and ability grouping debates march on. In general, students may experience three types of ability grouping: ability grouping across schools (e.g., certain magnet schools, like The Bronx High School of Science in New York City, require students to pass entrance exams for admission); ability grouping across classrooms, which is common in middle and high schools and is also known as tracking; and, within-class ability grouping, common in early elementary grades. While much of the literature on ability grouping focuses on tracking, this study examines the effects of within-class ability grouping on kindergartners and first graders. By many accounts, within-class ability grouping is widespread: in the early 1960s, 85 percent of elementary school administrators in the United States (U.S.) reported “Exclusive” or “Predominant” use of group instruction by reading ability (Baumann, Hoffman, Duffy-Hester and Moon Ro, 2000); in a study of Pennsylvania schools, McPartland, Coldiron and Braddock (1987) found that over 90 percent of first grade teachers used ability grouping in reading and that over 20 percent of them used it in math; and, in a recent, random-sample survey of first through third grade reading teachers from across the U.S., 63 percent of respondents reported using within-class ability grouping (Chorzempa and Graham, 2006).
There is a good amount of evidence suggesting that within-class ability grouping can affect children’s cognitive development. Causal studies and meta-analyses of within-class ability grouping provide consistent evidence of a positive grouping effect on students’ math achievement in the upper elementary grades (Kulik, 1992; Mosteller, Light and Sachs, 1996; Slavin and Karweit, 1985; and, Slavin, 1987), as well as some evidence of a positive effect on students’ reading achievement in elementary, secondary and post-secondary schools (Lou et al., 1996). They also generally find that grouping is better for low achievers than for high achievers (Slavin, 1987; Lou et al., 1996). There are also reasons to believe that within-class ability grouping might affect children’s social and emotional development. Once a child is placed in a grouped classroom, two powerful events occur that can trigger mechanisms affecting students’ social and emotional skills: the identification of students based on their demonstrated or perceived abilities and the division of the classroom into small groups for instruction. Identifying students based on their abilities can impact students’ judgments about their own abilities (self-efficacy beliefs), as well as the ways in which students treat each other. Dividing students into small ability groups usually leads to differentiated instruction (Chorzempa and Graham, 2006; Gamoran, 1986; and Rowan and Miracle, 1983), changes the classroom’s social setting, and can alter how teachers allocate their time. All of these things can impact students’ social and emotional skills. Despite the ample reasons to believe that within-class ability grouping could affect children’s socio-emotional development, the evidence on such effects is unclear. The few existing studies on the topic examine the effect of which group a student is placed in, not the effect of grouping per se.
In addition, most of the extant studies treat grouping as a simple binary variable, examine the effect of grouping on one or a small number of social and emotional skills, and use small samples and descriptive techniques, which are not designed to estimate the causal effects of grouping on children’s socio-emotional development. Understanding these effects is central to understanding the overall effects of grouping and of schools more generally. This study examines main and differential effects of three types of within-class ability grouping, relative to no grouping, on five social and emotional domain skills scores of kindergartners and first graders. The three grouping types are infrequent, frequent and flexible within-class ability groups. I distinguish infrequent groups from frequent groups given the likelihood that the dosage of grouping influences its effects. Indeed, recent research suggests the frequency of grouping matters with respect to students’ academic skills (Robinson, 2009). I define a flexible group as a frequent group in which at least 10 percent of students in the class change groups during the year. This distinction is important for two reasons. First, it is possible that flexible grouping impacts students differently than inflexible grouping. For example, students who change groups during the year are potentially exposed to different peer contexts, instructional content and pedagogy. Secondly, flexible grouping has grown rapidly over the past 50 years, yet little is known about its effects.1 Finally, since within-class ability grouping is so pervasive in first grade reading classes – according to the data in this study, roughly 94 percent of first grade teachers use reading groups – examining the effects of the different types of grouping is useful for adding to the conversation on best literacy instructional practices.
The five socio-emotional skill measures are the Early Childhood Longitudinal Study, Kindergarten Class of 1998-1999 (ECLS-K) “Approaches to Learning,” “Self-Control,” “Interpersonal Skills,” “Externalizing Problem Behaviors,” and “Internalizing Problem Behavior” teacher social rating scale (SRS) scores. Collectively, they measure a broad range of students’ social and emotional skills, from student attentiveness to the frequency with which a child argues or fights. Examining the effects of grouping on multiple social and emotional skills is important, as studies have shown a positive relationship between individual earnings and self-control (Dunifon and Duncan, 1998), persistence (Lindqvist and Westman, 2009), emotional stability (Nyhus and Pons, 2005; and, Lindqvist and Westman, 2009), social skills (Lindqvist and Westman, 2009), self-esteem (Goldsmith, Veum and Darity, 1997), an orientation towards challenges (Dunifon and Duncan, 1998), and general life outlook (Goldsmith, Veum and Darity, 1997).

1 In the early 1960s, between-group mobility was virtually non-existent (Baumann, Hoffman, Duffy-Hester and Moon Ro, 2000), which mostly remained the case in the early 1980s, when researchers found that ability groups remained highly stable throughout the school year (e.g., Rowan and Miracle, 1983; Hallinan and Sørensen, 1983). However, over 50 percent of respondents from two recent surveys reported using flexible ability groups (Baumann, Hoffman, Duffy-Hester and Moon Ro, 2000; and, Chorzempa and Graham, 2006).

This study addresses three main research questions. The first question is: what are the characteristics of schools, teachers and classrooms that use within-class ability grouping? The second question is: to what extent does ability grouping affect students’ socio-emotional development? The third question is: are high- and low-ability students differentially affected by ability grouping?
This study also examines whether or not the answers to these three questions vary based on grouping type, and across subjects and grades. For example, do the characteristics of schools, teachers and classrooms that use within-class ability grouping differ by ability grouping type? Do they differ between math and reading, or kindergarten or first grade? By examining main and differential effects of three types of within-class ability grouping, relative to no grouping, on five social and emotional domain skills scores of kindergartners and first graders, this study extends the body of research on within-class ability grouping in three ways. First, to the best of my knowledge, it is the only nationally-representative study that examines the impact of within-class ability grouping on a wide range of social and emotional skills. Secondly, it is the only such study that I am aware of that analyzes the differential effects of within-class ability grouping on the social and emotional skills of students in different within-class achievement quartiles. Finally, to the best of my knowledge, this is the only study that examines the effect of multiple types of within-class ability grouping – infrequent, frequent and flexible grouping – on children’s social and emotional skills. In fact, it is the only study that I am aware of that analyzes the effect of flexible grouping on any student outcome. The remainder of this paper proceeds as follows: Section Two provides background information on within-class ability grouping, including a review of the ability-grouping literature; Section Three presents my conceptual framework; Section Four describes the study’s data and discusses my approach for dealing with missing variables; Section Five outlines the study’s methods; Section Six reports results; Section Seven discusses the study’s findings; and, Section Eight concludes.

2. Background

Ability grouping is a means of organizing students for instruction to deal with student-body heterogeneity (Barr and Dreeben, 1983). It dates back to the mid-19th century, when schools began to transition from single-room schoolhouses in which one teacher taught children of many ages, towards the current, more differentiated form of schooling (Findley and Bryan, 1971).2 Today, ability grouping generally refers to either between-class ability grouping (also known as tracking) or within-class ability grouping. As mentioned above, this paper focuses on the effects of within-class ability grouping. Within-class ability grouping, common in elementary schools (Sørensen and Hallinan, 1986; Gamoran, 1989; and, Nomi, 2009), refers to the placement of students within a class into small groups with other students, generally for instructional purposes (Chorzempa and Graham, 2006).3 It is most often based on students’ demonstrated or perceived abilities (Chorzempa and Graham, 2006; Felmlee and Eder, 1983; Pallas, Entwisle, Alexander and Stluka, 1994; and, Tach and Farkas, 2005).4

2 Large increases in the demand for public education spurred this transition.

Many features of within-class ability grouping are described in the literature. In most grouped classrooms, students spend only part of the school day in ability groups, sometimes only fifteen to twenty minutes per day (Felmlee and Eder, 1983). Typically, a grouped class has three to four ability groups (Barr and Dreeben, 1983; Chorzempa and Graham, 2006; and, Hallinan and Sørensen, 1985), but the number of groups may not be related to class size (Hallinan and Sørensen, 1985).
Groups can vary in size from four to eight students (Chorzempa and Graham, 2006; and, Hallinan and Sørensen, 1985).5 Students in low ability groups spend less time reading silently, answering critical comprehension questions, and reading expository books than students in high ability groups; and, they spend more time reading orally, answering literal questions and doing non-reading tasks (Chorzempa and Graham, 2006). Similarly, students in low ability groups spend more time being read to by teachers and completing worksheets than students in high ability groups (Chorzempa and Graham, 2006). Over the past 75 years, researchers have been making arguments for and against ability grouping (Slavin, 1987). Perhaps the biggest potential benefit of within-class ability grouping is that it allows teachers to differentiate instruction so that it is neither too hard nor too easy for any student in the classroom (Lou et al., 1996; Sørensen and Hallinan, 1986; and, Slavin, 1987). Other potential benefits include: students pay more attention in small groups (Sørensen and Hallinan, 1986); grouping frees up teachers’ time so that they can provide more individualized attention to students (Lou et al., 1996); and, within-class ability grouping incorporates the beneficial aspects of peer learning, such as peer helping, orally rehearsing materials, and interacting socially (Lou et al., 1996). In contrast, potential detriments of within-class ability grouping include: assignment to a low group communicates self-fulfilling low expectations (Good and Marshall, 1984); students in low ability groups receive inferior instruction (Allington, 1980); and, students in low ability groups suffer from negative peer effects (Felmlee and Eder, 1983).

3 Some teachers use groups in response to guidance from their curriculum, or a district or a school policy (Chorzempa and Graham, 2006).
4 Some teachers make placement decisions on the basis of student behavior (Haller, 1985; and, Tach and Farkas, 2005); however, there is evidence that teachers do not make placement decisions on the basis of socio-economic status (Pallas, Entwisle, Alexander and Stluka, 1994) or race (Haller, 1985).
5 Chorzempa and Graham (2006) provided national average group sizes for low and average ability groups of 4.02 and 6.90 students, respectively. Hallinan and Sørensen (1985) found that the average ability group size in Northern California schools was eight students.

In the late 1940s and 1950s, educational researchers began studying the effects of within-class ability grouping on student test scores; however, none of the early studies produced clear evidence on grouping effects (Dewar, 1963). Evidence became clearer in the late 1950s and 1960s, when researchers began conducting randomized experiments on ability grouping (see, e.g., Dewar, 1963; and, Wallen and Vowles, 1960). Since the 1960s, interest in the effects of within-class ability grouping has grown, and additional within-class ability grouping experiments (e.g., Slavin and Karweit, 1985) have been conducted, along with quasi-experiments (e.g., Robinson, 2009; and, Nomi, 2009) and meta-analyses of causal within-class ability grouping effects research (Slavin, 1987; Kulik, 1992; Mosteller, Light and Sachs, 1996; and, Lou et al., 1996). Taken as a whole, these causal studies and meta-analyses provide consistent evidence of a positive grouping effect in math on students in the upper elementary grades (Kulik, 1992; Mosteller, Light and Sachs, 1996; Slavin and Karweit, 1985; and, Slavin, 1987), as well as some evidence of a positive effect in reading on elementary, secondary and post-secondary students (Lou et al., 1996).
Also, the studies generally find that grouping is better for low achievers than for high achievers (Slavin, 1987; Lou et al., 1996). In keeping with this result, a recent study found that grouping has a positive differential effect on the reading achievement of language minority Hispanic students in early elementary grades (Robinson, 2009). Finally, the reported within-class ability grouping effects on students’ attitudes towards subject matter are mixed (Lou et al., 1996; and, Slavin and Karweit, 1995). One exception in the body of causal research on the academic effects of within-class ability grouping is Nomi (2009), which estimates the effect of grouping on the reading achievement of first graders and does not find an overall effect or differential effects on students with different incoming test scores. The study does, however, find evidence that the effect of grouping is negative in schools with high propensities to group,6 especially on low-achieving students (Nomi, 2009). It also finds a positive effect of ability grouping in schools with a low likelihood of grouping (again, especially on low-achieving students). One potential explanation for why Nomi’s (2009) results contradict those of other causal grouping studies is that she estimates within-class ability grouping as a school-level treatment instead of a class-level treatment. Many of the other studies compare classrooms within the same school. Far less research has been conducted on the effects of within-class ability grouping on students’ social and emotional skills. What is more, the few existing studies on the topic have only examined the effect of which group a student is placed in, not the effect of grouping per se. Most of these studies also used small samples and descriptive techniques, which are not designed to estimate causal effects.
As an example, Felmlee and Eder (1983) estimated the effect of within-class ability grouping on student attentiveness, using data from 16 video-taped lessons of a single grouped classroom. Controlling for students’ prior level of attentiveness, they found that students in the low-ability groups were less attentive than students in higher groups (Felmlee and Eder, 1983). Similarly, Rowan and Miracle (1983) examined the relationship between students’ reading group rank and their habits and classroom conduct, using data on fourth graders from a single urban school district. They found that reading group rank was positively related to teacher assessments of student habits, but not to students’ classroom conduct (Rowan and Miracle, 1983). In a larger study of Baltimore first graders, Pallas, Entwisle, Alexander and Stluka (1994) found little relationship between group placement and changes in children’s conduct marks or self-reported ratings of academic self-esteem, character, and responsibility. Finally, in a study that uses the same data set as this one but focuses solely on group placement effects, Tach and Farkas (2006) found that being placed in a high ability group has a positive effect on students’ Approaches to Learning.

6 These schools serve a high proportion of low-socioeconomic-status and minority students.

In addition to failing to provide any information on how within-class ability groups affect students’ social and emotional skills relative to whole-class instruction, the extant studies on within-class ability groups and children’s socio-emotional development treat grouping as a simple binary variable, yet not all grouping is the same. For example, Robinson (2009) found that grouping three or more times per week had a significant positive effect on language-minority Hispanic students’ reading performance, while grouping twice or less per week did not.
It is possible that grouping frequency is important with respect to students’ social and emotional skills in a similar way. Distinguishing flexible grouping from static grouping may also be important, as students who move between groups throughout the year might be exposed to different peer contexts and instructional content, both of which could impact their socio-emotional development. What is more, the use of flexible groups is rapidly rising. In the early 1960s, between-group mobility was virtually non-existent (Baumann, Hoffman, Duffy-Hester and Moon Ro, 2000). Grouping appears to have remained largely static into the 1980s (Rowan and Miracle, 1983; Hallinan and Sørensen, 1983);7 however, in two recent surveys, over 50 percent of respondents reported using flexible ability groups (Baumann, Hoffman, Duffy-Hester and Moon Ro, 2000; and, Chorzempa and Graham, 2006). In light of the potential for flexible groups to affect students differently than other types of grouping, this study also distinguishes grouping by flexibility level. Shedding light on the effects of flexible grouping is useful given the rapid rise in the use of the practice. Finally, the preponderance of prior ability grouping research has focused on one or a small range of social and emotional skills, such as Approaches to Learning. While studies like these are undoubtedly useful, it is important to examine the impact of within-class ability grouping on multiple aspects of children’s socio-emotional development.
Studies have shown a positive relationship between earnings and the following socio-emotional skills: self-control (Dunifon and Duncan, 1998); persistence (Lindqvist and Westman, 2009); emotional stability (Nyhus and Pons, 2005; and, Lindqvist and Westman, 2009); social skills (Lindqvist and Westman, 2009); self-esteem (Goldsmith, Veum and Darity, 1997); an orientation towards challenges (Dunifon and Duncan, 1998); and, general life outlook (Goldsmith, Veum and Darity, 1997).

7 For example, Rowan and Miracle (1983) found a 0.93 correlation between beginning- and end-of-year grouping assignments. One exception is Barr and Dreeben (1983), which found, in an analysis of data from 15 first grade classes, that flexibility was a prominent feature of ability groups.

By examining main and differential effects of three types of within-class ability grouping, relative to no grouping, on five social and emotional domain skills scores of kindergartners and first graders, this study extends the body of research on within-class ability grouping in three ways. First, to the best of my knowledge, it is the only nationally-representative study that examines the impact of within-class ability grouping on a wide range of social and emotional skills. Secondly, it is the only such study that I am aware of that analyzes the differential effects of within-class ability grouping on the social and emotional skills of students in different within-class achievement spectra. Finally, to the best of my knowledge, this is the only study that examines the effect of multiple types of within-class ability grouping – infrequent, frequent and flexible grouping – on children’s social and emotional skills (most studies treat grouping as a simple binary variable). In fact, it is the only study that I am aware of that analyzes the effect of flexible grouping on any student outcome.

3. Conceptual Framework

This section describes factors that could affect students' selection into ability-grouped classes as well as possible mechanisms by which ability grouping could affect their socio-emotional development. Estimating the effects of ability grouping accurately using non-experimental data requires an understanding of the factors that affect whether or not a student is placed into an ability-grouped classroom. This understanding provides a basis for assessing whether the estimates have ruled out alternative explanations for the patterns in the data. At the same time, to better understand within-class ability grouping effects, and to set the stage for future research, it is useful to identify mechanisms through which grouping could affect students’ social and emotional skills. The conceptual model illustrated in Figure 1 describes ability group placement factors and potential grouping mechanisms, ultimately detailing a process of how within-class ability grouping might affect students’ social and emotional skills. Importantly, this process may differ across infrequently-grouped, frequently-grouped and flexibly-grouped classes, across math and reading classes, across kindergarten and first grade classes, and for students across the within-class achievement distribution.

3.1 Class Placement Factors

There are numerous factors that can impact whether or not a student attends an ability-grouped class. To begin, schools and districts may affect whether or not a student experiences within-class grouping. These organizations directly impact grouping by choosing curricula, instructional materials and interim assessment packages. For example, some curricula, materials and assessment packages emphasize small-group work (like the Phonological Awareness Literacy Screening, or PALS), while others do not. Schools and districts can also affect grouping through professional development choices and teacher hires.
Additionally, some schools and districts have instructional-grouping policies (Chorzempa and Graham, 2006). Finally, formal and informal student placement policies at the school level may impact whether or not students are placed into grouped classes. Obviously, teachers can also affect whether or not a class is grouped. For instance, there is some evidence that teachers’ beliefs about the merits of grouping impact whether or not they group (Chorzempa and Graham, 2006). In addition, teachers who can manage a complex classroom may be more likely to group than those who cannot, while teachers who excel with a large group of students may be less likely to group than the average teacher. Finally, in some school districts, teachers choose their own curricula and instructional resources. Almost certainly, those teachers who choose curricula and resources emphasizing grouping are more likely to group than those who do not. Other classroom characteristics, such as the academic performance heterogeneity of the class, class size, and class supports (e.g., whether or not the classroom has a paid aide), can impact whether or not the class is grouped. The same teacher, for example, might choose not to use ability groups in a classroom in which all students enter with about the same achievement level in a given subject, but may choose to do so if students come in with a wide range of abilities. Moreover, teachers of large classes may be more likely to group than teachers of small classes, either to manage the class’ level of behavior or because they believe students pay more attention in small groups. Likewise, teachers with paid aides may group more often than those without aides, in an effort to maximize classroom resources. A fourth set of factors that can influence whether or not a student experiences grouping includes student and parent characteristics.
For example, the location of a student’s home residence can affect whether or not the student attends an ability-grouped classroom, if ability grouping occurs more frequently in some geographic regions, or in areas with particular economic, racial or cultural compositions. In addition, if school administrators strategically distribute students within a school, then a student’s prior behaviors and abilities can affect his or her class placement. Finally, the level of involvement of a student’s parents can predict a student’s class placement, as some parents lobby for their students to be placed in particular classes. Indeed, there is some evidence from middle schools of a relationship between parents’ educational levels and their children’s placement in math tracks (Useem, 1992).

3.2 Grouping Effect Mechanisms

Once a child is placed in a grouped classroom, there are two potentially-powerful events that occur: students are identified based on their demonstrated or perceived abilities and students are divided into small groups for instruction. These events can trigger a number of mechanisms that affect students’ social and emotional skills. For instance, identifying students based on their abilities can impact students’ self-efficacy beliefs, as well as how students treat each other. At the same time, dividing the classroom into small groups for instruction can lead to differentiated instruction, influence the teacher’s time allocations, and change the classroom’s social setting. Identifying students based on their abilities sends them cues about their in-class positions and chances for future success (Gamoran, 1986). These signals can ultimately affect students’ social and emotional skills. For example, it is not hard to imagine that ability group-based cues can impact a student’s beliefs about his or her academic capabilities, which are more commonly known as self-efficacy beliefs (Bandura, 1977).
Students’ self-efficacy beliefs can, in turn, affect their social and emotional skills. In fact, prior research has linked self-efficacy to motivation (Schunk, 1989), persistence (Multon, Brown and Lent, 1991) and social skills (Schunk, 1989). While it is easy to see how students’ self-efficacy beliefs could mediate a grouping effect, predicting the direction of such an effect is more difficult. Upon group placement, the self-efficacy beliefs of some students may improve (likely those of the high achievers), while those of others may worsen (likely those of the low achievers) or stay the same. However, it is also possible that students’ self-efficacy beliefs are affected after they experience some amount of grouping. For example, the self-efficacy beliefs of low achievers may be buoyed if they experience academic success (which could result from differentiated instruction). In contrast, the self-efficacy beliefs of high achievers may worsen as their reference group no longer includes lower achievers (i.e., the ability of high achievers to positively distinguish themselves on academic grounds is reduced). Identifying students based on their abilities can also impact how students treat each other. Gamoran (1986) notes that students in grouped classrooms are generally aware of the fact that a hierarchy exists, and that most can accurately assess their group’s hierarchical status, even if the teacher gives rank-hiding pseudonyms to the groups, such as “Bluebirds” or “Robins.” More importantly, a student’s peers may communicate their beliefs about the student to the student, based on the student’s group placement (Gamoran, 1986). How students communicate their beliefs can impact the social and emotional skills of the other students.
For instance, a student’s peers may tease the student based on his or her group placement; and, there is evidence that childhood teasing based on competency can affect an individual’s sense of self and self-esteem later in life (Gleason, Alexander and Somers, 2000). However, as is the case with students’ self-efficacy beliefs, predicting the direction of a grouping effect mediated by students’ treatment of each other is not an easy task. For example, a child can be teased for being placed in either a high group or a low group. One of the primary benefits of within-class ability grouping is that it allows teachers to differentiate instruction at the small-group level (Chorzempa and Graham, 2006; Gamoran, 1986; and Rowan and Miracle, 1983). Differentiating instruction so that tasks are neither too easy nor too difficult for students can affect their socio-emotional skills. For example, Umbreit, Lane and Dejud (2004) found that increasing the difficulty of a ten-year-old boy’s tasks that were too easy for him was related to an improvement in his classroom behavior. Additionally, Center, Deitz and Kaufman (1982) found that increasing task difficulty to a level above students’ abilities was associated with an increase in students’ levels of inappropriate behavior. Unlike the grouping mechanisms discussed thus far, differentiated instruction seems to benefit all students in the classroom, so long as the teacher does not misidentify students, can teach to myriad skill levels and can manage a productive classroom with multiple groups – non-trivial qualifications. Not only do grouping teachers often differentiate instruction, most of them allocate their time differently than teachers who employ whole-class instruction. The amount of time teachers spend with students can affect how the students develop socio-emotionally. For instance,
For instance, 15 Skinner and Belmont (1993) found that the level of teacher involvement – which includes time, energy and aid – is positively related to student engagement. Given that grouping teachers usually break up their time amongst groups, any particular student probably has less total teacher time in a grouped class than in an ungrouped class. However, each student’s interaction with the teacher might be more meaningful in a grouped class than in an ungrouped class, which is important in light of the fact that quality is an aspect of Skinner and Belmont’s (1993) measure of teacher involvement. What is more, teachers can spend different amounts of time with different groups. Some research has found that grouping teachers spend less time with low ability groups than with high ability groups (e.g., Allington, 1980), while other research has found that the opposite is true (see, e.g., Rowan, 1983). In light of this conflicting evidence, the direction of a teacher time-driven grouping effect is not clear a priori. Dividing the classroom into small ability groups also changes the room’s social setting, which can impact students’ socio-emotional development. Within-class ability grouping stratifies peer achievement contexts. That is, low academically-able students are placed in one group, average-able students in another, and so on (Rowan and Miracle, 1983). This stratification can engender socio-emotional peer effects. Katz, Kling and Liebman (2001) note that peer effects can arise from peer learning and imitation, as well as from changes in individuals’ beliefs about the social desirability of a given behavior and changes in individuals’ expectations regarding the likelihood that they will be penalized for behaving badly. Empirically, peer effects have been associated with educationally-related behaviors of juveniles (Gaviria and Raphael, 2001), as well as those of college students (Sacerdote, 2001). 
There is general agreement in the literature that achievement and problem behavior are inversely related (Algozzine, Wang and Violette, 2010). This research suggests that students in low ability groups likely suffer from negative peer effects while students in high ability groups likely benefit from positive peer effects. Such a divergence makes predicting an average peer effects-driven grouping effect difficult, but it has clear implications for differential grouping effects. However, it bears mentioning that any influence that peer effects might have on differential grouping effects could be muted by the influence of students’ self-efficacy beliefs, as described above.

3.3 Expected Grouping Effects

The average effect of within-class ability grouping is difficult to predict. Grouping may have positive or negative self-efficacy effects, and these effects may differ by the relative position of the student in the classroom. In addition, grouping can influence how students treat each other, which may have negative effects for lower achievers but may also have positive effects if students are more comfortable with their peer groups. Differentiated instruction can benefit all students, but if teachers do not have the ability to do it well, the overall effect could be negative. Similarly, ability grouping can change the quality and quantity of instructional time that a student has with a teacher, again with positive or negative outcomes. Finally, grouping can change the peer group that students have access to. The peer effects are likely to be more beneficial to high-achieving students, but low-achieving students may have had little access to these students in whole-class settings anyway, so the influence of peer effects may be small. Taking into account all of the possible grouping mechanisms, the overall direction of a grouping effect is unclear.
While it is difficult to predict the overall direction of a within-class ability grouping effect, it is easier to speculate about how such an effect might differ across infrequently-grouped, frequently-grouped and flexibly-grouped classes, across math and reading classes, across kindergarten and first grade classes, and for students in different within-class achievement quartiles.

First, relative to students in infrequently-grouped classes, students in frequently-grouped classes receive a higher dosage of grouping: they are identified by their group placement more often, receive a greater amount of differentiated instruction and differentiated teacher time, and are stratified by social settings more often. For this reason, any grouping effect, positive or negative, is likely to be more pronounced for students in frequently-grouped classes than in infrequently-grouped classes. The one exception to this is the effect of identification itself, which may not be affected by the time spent in groups. Compared to students in frequently-grouped classes, students in flexibly-grouped classes move between ability groups throughout the year. This dynamic can ameliorate the negative grouping effects driven by students’ self-efficacy beliefs and how students treat each other, and it can enhance the benefits of differentiated instruction and peer effects. For these reasons, I expect any flexible grouping effect to be more positive for low-performing students than any frequent grouping effect.

Secondly, potential grouping effects may differ between math and reading classes. In ability-grouped math classes, teachers commonly present the lesson to the group as a whole using a single textbook (Hallinan and Sørensen, 1983), then provide enrichment activities to a group of high achievers and remediation activities to a group of low achievers (Slavin, 1987).
By comparison, in ability-grouped reading classes, teachers often use instructional materials tailored to the group level (Hallinan and Sørensen, 1983). In such reading classes, lessons are presented to ability groups separately. There is also some evidence that math teachers tend to use fewer ability groups than reading teachers. Often, math teachers use only two groups, compared to the three to four typically employed by reading teachers (Slavin, 1987). Both of these differences suggest that the instruction that takes place in ability-grouped math classes is less differentiated than that which takes place in grouped reading classes. Also, the fact that math teachers generally use fewer groups than reading teachers means that students’ achievement-based identification is less refined. Overall, I view grouping in math classes as less intense than grouping in reading classes, and I expect any grouping effect in math to be smaller than that in reading.

Thirdly, grouping might affect kindergartners and first graders differently. Compared to first grade students, kindergartners may be less aware of their group’s hierarchical status. As a result, their self-efficacy beliefs may be less affected than those of first graders when they are identified based on achievement, and kindergartners might treat each other better than first graders as a result. In addition, there may be fewer opportunities to use instructional groups in kindergarten than in first grade. In a study of what takes place in a full-day kindergarten program, Elicker and Mathur (1997) found that students spend 58 percent of the day in child-initiated learning activities (free play, learning centers, cooperative learning and individual creative time), or eating or transitioning between activities. In first grade, students spend more time in structured learning activities.
Because it is likely the case that first graders spend more time in ability groups than kindergartners, and because they are more likely to be aware of the hierarchical meaning of their group placement, I expect any grouping effect, positive or negative, to be more pronounced for first graders than for kindergartners.

Finally, grouping effects may differ for students depending on their within-class achievement quartile. Students in all ability groups can benefit from differentiated instruction, and students in both high and low ability groups are at risk of being teased by their peers based on their group placement; however, students placed in low-ability groups probably suffer from a deterioration of self-efficacy beliefs and negative peer effects, while students placed in high-ability groups probably receive a boost in their self-efficacy beliefs. High achievers are also likely exposed to peers who behave well. Therefore, I expect within-class ability grouping to have a more positive effect on high-ability students. However, if teachers spend more time with low-ability students, my expectation could change (as mentioned above, findings are mixed about how grouping teachers allocate their time amongst low and high achievers).

3.4 Confounding Factors

While the analyses below do not directly address the mechanisms by which grouping might affect student development, they do address the factors that affect grouping. These factors can influence children’s socio-emotional development directly and through grouping assignments (and could therefore bias my results). For example, schools that choose ability-group-based curricula that employ student differentiation may be more or less effective in ways that impact student development. Similarly, teachers who elect to group might also be better at promoting the social and emotional skills of their students than non-grouping teachers, regardless of grouping.
Finally, student and parent characteristics that predict grouping might also affect students’ social and emotional skills. For example, highly-involved parents who lobby for their child to be placed in a grouped classroom could be particularly good at promoting their child’s socio-emotional development. These confounding factors, which are represented in Figure 1 below as dashed lines, could bias my estimates of within-class ability grouping effects. Statistically controlling for them is at the heart of my estimation strategy, described in Section Five below.

4. Data

The data for this study come from the U.S. Department of Education, National Center for Education Statistics’ (NCES) Early Childhood Longitudinal Study-Kindergarten Class of 1998-1999 (ECLS-K). ECLS-K followed the same children from the fall of 1998, when most children began kindergarten, through the spring of 2007, when most students finished the eighth grade. The ECLS-K data set, which is composed of data gathered through parent, teacher and school administrator surveys, contains nationally-representative data on a total of 21,260 children. The data set has information on children’s cognitive, social, emotional, and physical development, home environments, and home educational activities, as well as information on their school environments, classroom environments, classroom curricula, and teachers’ qualifications.

To estimate the effects of within-class ability grouping, I use information from the first two waves of data collection (1998-1999 and 1999-2000). I primarily rely on students’ teacher Social Rating Scale (SRS) scores, data on teachers’ use of within-class ability grouping, and students’ ECLS-K direct cognitive assessment scores in reading and math.

I use teacher SRS scores as measures of students’ social and emotional skills.
In the fall and spring of kindergarten, and in the fall of first grade, teachers rated students on a number of socio-emotional developmental characteristics, using a scale from one (the student never exhibits the behavior) to four (the student exhibits the behavior very often). These data were used to create the five Teacher SRS scale scores that I use in this study: Approaches to Learning, Self-Control, Interpersonal Skills, Externalizing Problem Behaviors, and Internalizing Problem Behavior. The Approaches to Learning Scale includes six items that are intended to capture student attentiveness, task persistence, eagerness to learn, learning independence, flexibility, and organization. The Self-Control Scale is composed of four items intended to measure a child’s ability to respect the property rights of others, control his or her temper, accept peer ideas for group activities, and respond appropriately to pressure from peers. The five Interpersonal Skills items rate a child’s ability to form and maintain friendships, get along with people who are different, comfort or help other children, express feelings, ideas and opinions in positive ways, and show sensitivity to the feelings of others. Externalizing Problem Behaviors include acting-out behaviors; five items on the scale rate the frequency with which a child argues, fights, gets angry, acts impulsively, and disturbs ongoing activities. The Internalizing Problem Behavior Scale is composed of four items, which collectively ask about the apparent presence of anxiety, low self-esteem and sadness. Following Tach and Farkas (2006), I standardize the teacher SRS scale scores, which enhances the interpretability of the results.[8]

To generate the ability grouping variables, I rely on two questions from the ECLS-K Kindergarten Teacher (Spring) and First Grade Teacher (Spring) Questionnaires.
The first question is: “How often do you divide your class into achievement groups for reading or math?”[9] The second question is: “Has the child moved to a higher or lower reading achievement group or not moved during this school year?” Following Robinson (2009), I define infrequent grouping as grouping that occurs less than three to four times per week, and frequent grouping as grouping that occurs three to four or more times per week. I define flexible grouping in reading as grouping that occurs three or four times per week or more, in which more than 10 percent of students move between groups throughout the year. Ten percent between-group mobility is about the average level of mobility in frequently grouped classrooms in which there is any mobility. The ECLS-K data set does not have information that allows for the analysis of flexible groups in math classes. Such an analysis is therefore excluded from this study.

[8] I calculated z-scores prior to restricting or imputing data for missing values so that students’ standardized teacher SRS scores are nationally-representative.

[9] Answer options include: never, less than once per week, once or twice per week, three or four times per week, and daily.

In order to assess the differential effects of ability grouping on high- and low-achieving students, I generate achievement quartile variables using students’ baseline ECLS-K direct cognitive assessment scores in math and reading. For all analyses, I use the ECLS-K T-scores, which are versions of the test scores that are standardized at each assessment wave with a mean of 50 and a standard deviation of 10. Because there is no fall data for first graders, I use students’ spring kindergarten scores as their first-grade baseline. Finally, I standardize each student’s direct cognitive assessment scores within the classroom. It is from these measures that I create within-class achievement quartiles.
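The variable construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the input layout, and the treatment of the four grouping categories as mutually exclusive are hypothetical, and ties in scores are broken arbitrarily by sort order rather than by any rule stated in the text.

```python
from statistics import mean, pstdev

def grouping_type(frequency, share_moved=None):
    """Classify a class from the teacher's answer to the grouping-frequency item.

    `frequency` is one of the survey answer options; `share_moved` is the
    fraction of students who changed reading groups during the year
    (None where it is unavailable, as for math classes).
    """
    if frequency == "never":
        return "ungrouped"
    if frequency in ("less than once per week", "once or twice per week"):
        return "infrequent"
    # Remaining options: three or four times per week, or daily.
    if share_moved is not None and share_moved > 0.10:
        return "flexible"
    return "frequent"

def within_class_quartiles(scores_by_class):
    """Map {class_id: {student_id: score}} to {student_id: quartile in 1..4}."""
    quartiles = {}
    for students in scores_by_class.values():
        values = list(students.values())
        mu, sigma = mean(values), pstdev(values)
        # Standardize within the classroom; a class with no variation gets z = 0.
        z = {s: (x - mu) / sigma if sigma > 0 else 0.0
             for s, x in students.items()}
        ranked = sorted(z, key=z.get)
        n = len(ranked)
        for rank, student in enumerate(ranked):
            # Ranks 0..n-1 are split into four equal-width bins.
            quartiles[student] = 1 + (4 * rank) // n
    return quartiles
```

Because standardizing within a classroom does not change the ordering of students in that classroom, the quartile assignment depends only on within-class rank; the z-score step is kept here to mirror the description in the text.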
I restrict my sample to include only students whose teachers provided within-class ability grouping information and students with valid teacher SRS scores (in the spring of kindergarten and first grade) and direct cognitive assessment scores (in the fall and spring of kindergarten). I also restrict my analyses to a subsample of students who meet a few other criteria. First, I drop students who were not first-time kindergartners from the sample, as well as students who received special educational services. Secondly, I drop students who changed teachers during their kindergarten year (an analogous variable does not exist for first graders). I do this because moving classrooms, as well as a student’s experiences in his or her new classroom, could affect the student’s social and emotional skills and bias my effects estimates. After all data restrictions, my total sample size is 8,459 students.

In addition to restricting the sample based on missing data for the few key variables described above, I impute data for missing values of other variables. To do this, I use multiple imputation by chained equations (White, Royce and Wood, 2010). Imputation by chained equations relies on using the distribution of observed data to estimate plausible values for the missing data (White, Royce and Wood, 2010). I use imputation by chained equations to create multiple imputed data sets, which I use to generate the within-class ability grouping effect estimates.

5. Methods

As mentioned above, this study addresses three primary research questions. The first question is: what are the characteristics of schools, teachers and classrooms that use within-class ability grouping? The second question is: to what extent does within-class ability grouping affect students’ socio-emotional development? The third question is: are high-ability and low-ability students differentially affected by within-class ability grouping?
In what follows, I describe the methods that I use to answer each of these questions.

5.1 What are the characteristics of schools, teachers and classrooms that use within-class ability grouping?

This question is descriptive and exploratory. To answer it, I use basic descriptive techniques. First, I present the mean student, teacher and classroom characteristics for each type of grouping in kindergarten and first grade, and I conduct F-tests to determine whether differences in mean characteristics between no grouping and each type of ability grouping are statistically significant. The student variables I examine include: gender, race, age, baseline achievement test scores, and baseline social and emotional skills scores. The classroom-level variables I analyze include: teacher gender, age, years of experience teaching the grade, certification and attitudes; class size; percentages of African-American, Hispanic and male students in the class; the average age and socioeconomic status of students in the class; the baseline average achievement test and social and emotional skills scores of students in the class; and the within-class standard deviation of student achievement test scores. The school variables that I examine include whether the school is a public or private school and the school’s level of urbanicity.

The results from the F-tests tell me if schools with high levels of grouping, grouping teachers, and students in grouped classrooms differ from their ungrouped counterparts along observable dimensions. This is important because if ungrouped and grouped schools, classrooms and students differ in observable ways, it is more likely that they differ in unobservable ways as well. For example, if students in high socio-economic strata are placed into grouped classrooms more often than less-wealthy students, it could be the case that there are unobserved factors that impact richer students’ placement in grouped classrooms.
Perhaps parents of wealthy students lobby school administrators to have their children placed in grouped classrooms because grouping teachers are better at promoting the academic achievement of students than non-grouping teachers. If this is the case, and good teachers also affect students’ social and emotional skills, then unadjusted regression results will be biased.

If observed differences do exist, they might be explained by other observed factors that differ across grouping types. To test whether this is the case, I run multivariate logit analyses, first including the variables described above and then adding school fixed effects. When analyzed alongside the model without fixed effects, the fixed-effects model allows me to examine whether the factors within a school that predict teachers’ grouping are similar to the factors that predict grouping across schools. The outcomes of the multivariate logits should be interpreted as the comparison between the grouping type (infrequent grouping, frequent grouping and flexible grouping) and no grouping.

5.2. Within-Class Ability Grouping Effects Estimation Strategy:

To estimate the effects of within-class ability grouping, I use several different techniques. The purpose of my effects estimation strategy is to purge the effect estimates of omitted variable bias.

5.2a. Weighted Least Squares regressions:

To begin, I use a weighted least squares regression model with no covariates, given by Equation 1:[10]

$$Y_{icst} = \beta_0 + G_{cst}\beta_1 + e_{icst} \tag{1}$$

where the social and emotional domain skill score of student $i$ (in classroom $c$ in school $s$) in spring (time $t$) is a function of: whether his or her class was ungrouped, infrequently-grouped, frequently-grouped, or flexibly-grouped during the school year, denoted by $G_{cst}$; and an error term composed of student, classroom and school components, as well as unobserved factors, denoted by $e_{icst}$. I estimate Model 1 for all five socio-emotional developmental scores.
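Because Equation 1 contains only the grouping indicators, its weighted least squares coefficients reduce to differences in weighted mean outcomes between each grouping category and the ungrouped reference category. The Python sketch below illustrates that equivalence. The function and argument names are hypothetical, a single sampling weight per student is assumed, and the sketch does not attempt the survey-design standard errors discussed in the text.

```python
from collections import defaultdict

def wls_group_effects(outcomes, groups, weights, reference="ungrouped"):
    """Coefficients of a weighted regression of the outcome on grouping dummies.

    With no other covariates, the intercept equals the weighted mean outcome
    in the reference category, and each coefficient equals the difference
    between a category's weighted mean and that intercept.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for y, g, w in zip(outcomes, groups, weights):
        weighted_sum[g] += w * y
        weight_total[g] += w
    means = {g: weighted_sum[g] / weight_total[g] for g in weight_total}
    beta0 = means[reference]
    beta1 = {g: m - beta0 for g, m in means.items() if g != reference}
    return beta0, beta1
```

For example, calling `wls_group_effects` on standardized SRS scores with categories such as `"ungrouped"` and `"frequent"` returns the weighted mean for ungrouped students and the frequent-minus-ungrouped gap in standard deviation units.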
Given that ungrouped classrooms likely differ from grouped classrooms in ways that affect students’ social and emotional skills, Equation 1 cannot be taken as causal. As a first step in estimating the causal effect of within-class ability grouping on students’ socio-emotional skills, the next model, given by Equation 2, adds baseline student, teacher, classroom and school-level covariates to Equation 1:

$$Y_{icst} = \beta_0 + G_{cst}\beta_1 + X_{icst}\beta_2 + C_{cst}\beta_3 + S_{st}\beta_4 + e_{icst} \tag{2}$$

Student-level covariates, denoted by $X_{icst}$, include race, socioeconomic status (SES) and age. Teacher and classroom covariates, denoted by $C_{cst}$, include: teacher gender, race, education level, certification level, and attitudes (including indicators for whether or not the teacher really enjoys his or her job, strongly believes s/he is making a difference, and believes that not all students are capable of learning); and class size, the percent of black and Hispanic students in the class, the percent of boys in the class, whether or not the class is a full-day class (kindergarten only), the average age of students in the class, the average reading test score of students in the class, the within-class standard deviation of students’ reading test scores, and the average social and emotional skills scores of students in the class.[11] School covariates, denoted by $S_{st}$, include indicators for whether the school is a private school, an urban school or a rural school.

[10] Kindergarten data is weighted using ECLS-K student-level panel weight BYCW0, ECLS-K strata variable BYCWSTR, and ECLS-K primary sampling unit (PSU) variable BYCWPSU (note: strata with single sampling units are treated as certainty units). First grade data is weighted in the same way, but with ECLS-K variables Y2COMW0, Y2COMWSTR, and Y2COMWPSU. Weighting the data in this way corrects standard errors to reflect ECLS-K’s multistage probability sample design.
Since Equations 1 and 2 may not fully capture differences in grouped and ungrouped students’ prior social and emotional skills, I add students’ baseline social and emotional skills scores in Equation 3:[12]

$$Y_{icst} = \beta_0 + G_{cst}\beta_1 + X_{icst}\beta_2 + C_{cst}\beta_3 + S_{st}\beta_4 + \beta_5 Y_{icst-1} + e_{icst} \tag{3}$$

where $Y_{icst-1}$ is student $i$’s fall social and emotional domain skill score.

[11] The ECLS-K data set does not have baseline data for first grade teachers. Consequently, I use first grade teacher data from the spring, after teachers have taught such classes. A teacher’s gender or race is not affected by whether or not s/he groups, and it is highly unlikely that a student’s socioeconomic status, a teacher’s education level, a teacher’s certification level, or any of the classroom-level variables are affected by grouping. However, it may be the case that teachers’ attitudes are affected by grouping. One should therefore view the results of the first grade within-class ability grouping effect estimates with caution.

[12] Students’ fall social and emotional skills levels were measured one to two months into the school year, which means that Model 3 might over-control for these factors. Since no baseline data exists for first graders in the ECLS-K data set, I use students’ spring kindergarten social and emotional skills scores as their baseline scores.

5.2b. Weighted Least Squares regressions with school-level fixed effects:

Models 1-3 control for observable characteristics of students, classrooms and schools; however, schools that employ different types of grouping may differ in unobservable ways. In order to account for these differences, the next model, given by Equation 4, compares classrooms within the same schools using school fixed effects:

$$Y_{icst} = \beta_0 + G_{cst}\beta_1 + X_{icst}\beta_2 + C_{cst}\beta_3 + S_{st}\beta_4 + \beta_5 Y_{icst-1} + \gamma_s + e_{icst} \tag{4}$$

where $\gamma_s$ is a school fixed effect.
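Including a school fixed effect is equivalent to comparing each student with the (weighted) average of his or her own school. The Python sketch below shows this within-school transformation for a deliberately simplified case: a single binary grouping indicator and no other covariates. All names are hypothetical, and the sketch is meant only to illustrate why the school-level term in Equation 4 absorbs differences between schools, not to reproduce the estimation used in this study.

```python
from collections import defaultdict

def within_school_slope(y, g, school, w=None):
    """Slope on g from a weighted regression of y on g with school fixed effects."""
    w = w if w is not None else [1.0] * len(y)
    sum_w = defaultdict(float)
    sum_wy = defaultdict(float)
    sum_wg = defaultdict(float)
    for yi, gi, si, wi in zip(y, g, school, w):
        sum_w[si] += wi
        sum_wy[si] += wi * yi
        sum_wg[si] += wi * gi
    numerator = denominator = 0.0
    for yi, gi, si, wi in zip(y, g, school, w):
        y_dm = yi - sum_wy[si] / sum_w[si]  # outcome minus weighted school mean
        g_dm = gi - sum_wg[si] / sum_w[si]  # grouping minus weighted school mean
        numerator += wi * g_dm * y_dm
        denominator += wi * g_dm ** 2
    if denominator == 0:
        raise ValueError("no within-school variation in grouping")
    return numerator / denominator
```

Schools in which every classroom has the same grouping status contribute nothing to the denominator, which is why the fixed-effects estimate is identified only from schools that contain both grouped and ungrouped classrooms.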
To take into account the ECLS-K multistage probability sampling procedure and calculate adjusted standard errors, I estimate Equation 4 by including sampling strata dummies, and I cluster the standard errors at the primary sampling unit (PSU) level.[13] Doing this is necessary to generalize results to the national level. However, it does not take into account the possibility that the error terms of students in the same classroom are correlated. This is a concern because within-class ability grouping is a class-level “treatment,” and it is possible that one or more explanatory variables not described by Equation 4 could impact all students in a given class. If this is the case, the degrees of freedom used to estimate the within-class ability grouping effects could be too high, which could impact their statistical significance. Unfortunately, there is no way to simultaneously control for the multistage probability sample design of ECLS-K and the natural clustering associated with within-class ability grouping. Thus, there is a tradeoff between clustering standard errors so that estimates are nationally representative and taking into account similarities amongst students within classrooms. Fortunately, in this study, the two approaches produce similar results.

[13] Stata’s survey data commands (also known as “svy” commands), which are required to account for ECLS-K’s multistage probability sample design and to obtain correct standard errors, do not support its “xtreg” commands (these are used to estimate fixed-, between- and random-effects models, as well as population-averaged linear models). A solution to this problem would be to use Stata’s “areg” command with the proper primary sampling unit (PSU) as the cluster variable, along with dummy variables for the number of strata in the sample (including strata dummy variables corrects for the inappropriate increase in the degrees of freedom that occurs when one abandons the svy command).
However, Stata’s multiple imputation estimation command (“mim”), which I use throughout my analyses to control for missing values in the original data set, does not support the use of areg. Thus, I proceeded as follows. First, I estimated the within-class ability grouping effects using the mim and xtreg commands with the “vce(robust)” option. Using the vce(robust) option in this way is equivalent to clustering the standard errors at the school level (source: http://www.stata.com/features/panel-data/xtreg.pdf, pp. 5-6; date last accessed: 1/25/2012). However, it does not account for the ECLS-K sample design, which could lead to incorrect standard errors. Therefore, I abandoned the use of the mim command. Instead, I took the average value across the imputed data sets and then input the averages into a single regression (this is different from running regressions within each data set and then taking the average of those estimates, which is what mim does). To check the impact of taking this approach, I used it with xtreg, fe vce(robust) (these are the same Stata specifications I used with the mim command), and compared the results to those I originally generated with mim. The two methods produced nearly identical results. However, the new approach might also produce incorrect standard errors (they too were calculated using vce(robust); i.e., they do not take into account clustering at the PSU level). To calculate the correct standard errors, I use the new approach with areg, strata dummies and the PSU as the cluster variable. This approach approximates the svy command and takes into account the sample design of ECLS-K.

5.2c. Differential effects Weighted Least Squares regression with teacher-level fixed effects:

While Model 4 controls for school-level variables related to grouping and students’ socio-emotional skills, it does not account for unobserved characteristics of teachers, which are likely the largest potential source of omitted variable bias.
For example, if high-quality teachers are more likely to group than low-quality teachers, then the above approaches will confound teacher quality with grouping. Unfortunately, there is no way to separate teacher-level unobservable variables from grouping to get a clean estimate of the overall effect of grouping with the ECLS-K data set. However, one can look at the differential effects of grouping on different students in the class, while removing the unobserved characteristics of teachers. To do this, I introduce both a teacher fixed effect and an interaction between the grouping variable and a dummy variable for a student’s reading achievement quartile. Equation 5 describes this approach:

$$Y_{ict} = \beta_0 + Q_{ict}\beta_1 + (G_{cst} \times Q_{ict})\beta_2 + X_{icst}\beta_3 + \beta_4 Y_{ict-1} + \gamma_c + e_{ict} \tag{5}$$

where $Q_{ict}$ indicates the baseline within-class reading quartile of student $i$, $G_{cst} \times Q_{ict}$ is an interaction of student $i$’s grouping status and baseline within-class reading achievement quartile, and $\gamma_c$ is a teacher fixed effect. The estimated coefficient $\beta_2$ describes the differential effect of within-class ability grouping on the social and emotional skills score of a student within a particular achievement quartile. So long as teachers are not differentially good at promoting the socio-emotional skills of students within different achievement quartiles, it is an unbiased estimate.

5.2d. Propensity score matching estimators as a specification check:

Equations 1-5 above parameterize the relationship between grouping and student outcomes, which relies on the assumption that the linear model is correctly specified (Reardon, Cheadle and Robinson, 2009). However, if the equations are incorrectly specified, then the estimates might be biased, especially if students differ substantially between ungrouped and grouped classrooms. Propensity score matching can reduce the threat of mis-specification bias. To take advantage of this benefit, I re-estimate Equations 1-5 using a propensity score matching estimator.
I employ nearest neighbor matching, and in general, I follow the matching and effects estimation procedures outlined by Reardon, Cheadle and Robinson (2009). To begin, I fit a logit model predicting whether or not a teacher uses ability groups, with a robust set of student, teacher, and classroom covariates (the same as those in Model 3), as well as higher-order terms to increase covariate balance in the matched sample. From this model, I obtain each student’s propensity score, or the predicted probability of each student’s placement into a grouped classroom.

Secondly, I match each grouped student to all ungrouped students with estimated propensity scores within one percentage point of that of the grouped student, up to a maximum of ten matches (if a grouped student has more than ten matches, I select the closest ten), with replacement (this means that any ungrouped student may be used for matching more than once). My caliper choice of one percentage point ensures that each grouped student is matched to one or more very similar ungrouped students, while my decision to use ten matches is based on the tradeoff between precision and bias, as well as on a sensitivity analysis of covariate balance across the grouped and matched samples (using five matches resulted in worse balance, while I worried that 15 matches would produce matches that are too dissimilar).

Thirdly, I assign a weight to each matched ungrouped student proportional to the extent to which the ungrouped student is used as a match, and such that the sums of the weights for the matched and ungrouped samples are equivalent. Re-weighting the data using the matching weights makes the distributions of grouped and ungrouped students similar (Reardon, Cheadle and Robinson, 2009).
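The second and third steps above can be sketched in Python as follows. This is an illustrative sketch rather than the procedure actually run for this study: the function name and input layout are hypothetical, propensity scores are assumed to be already estimated, and the weighting rule (each grouped student spreads a total weight of one equally over his or her matches) is one simple way to make the weights proportional to how often an ungrouped student is used.

```python
def caliper_match(treated, controls, caliper=0.01, max_matches=10):
    """Match each grouped (treated) student to nearby ungrouped (control) students.

    `treated` and `controls` map student ids to estimated propensity scores.
    Matching is done with replacement, so a control may serve many treated
    students. Returns (matches, control_weights).
    """
    matches = {}
    control_weights = {}
    for t_id, t_score in treated.items():
        # All controls inside the caliper, ordered from closest to farthest.
        in_caliper = sorted(
            (abs(c_score - t_score), c_id)
            for c_id, c_score in controls.items()
            if abs(c_score - t_score) <= caliper
        )
        chosen = [c_id for _, c_id in in_caliper[:max_matches]]
        if not chosen:
            continue  # treated students without a match leave the matched sample
        matches[t_id] = chosen
        for c_id in chosen:
            # Each treated student contributes a total weight of 1 to controls.
            control_weights[c_id] = control_weights.get(c_id, 0.0) + 1.0 / len(chosen)
    return matches, control_weights
```

Under this weighting rule, the control weights sum to the number of matched grouped students, so the weighted ungrouped sample has the same effective size as the matched grouped sample.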
Fourthly, I assess the covariate balance in the matched sample by estimating a series of simple linear regressions in which each covariate used in Model 3 (as well as higher-order terms) is a dependent variable and the grouping variables are the independent variables. The coefficients on the grouping variables indicate the average difference in the covariate between the weighted matched ungrouped and grouped student samples (Reardon, Cheadle and Robinson, 2009). Across all ten matched samples, only three of over three hundred individual covariate estimates are statistically significant.[14] To further assess the overall quality of my matching process, I compute the average of the absolute values of the grouping coefficients across all covariates. Across all ten matched samples, this average is about 0.025. These results suggest that the matching process described above produced good covariate balance.

Finally, I re-estimate Equations 1-5 using the matched samples. Since re-weighting the data using the matching weights makes the distributions of grouped and ungrouped students similar, the within-class ability group effect estimates should be interpreted as estimates of the average effect of grouping on the type of students who are grouped (Reardon, Cheadle and Robinson, 2009). Therefore, when comparing the weighted linear regression and matching estimation results, it is important to remember that the former represent average treatment effect (ATE) estimates, while the latter represent average treatment effect on the treated (ATT) estimates (Reardon, Cheadle and Robinson, 2009).
ATE estimates generalize to students like those in the entire analytic sample, while the ATT estimates generalize only to students like those in the analytic sample who are likely to be grouped.

14 These include: the percentage of Hispanic students in flexible, first-grade reading groups (flexibly-grouped classes have nearly 5 percent more Hispanic students, on average, than ungrouped classes); whether or not a teacher is black in infrequently-grouped first grade math classes (infrequently-grouped classes are roughly 2.5 percent more likely to have a black teacher); and student socioeconomic status levels in frequently-grouped first grade math classes (the average socio-economic status of students in frequently-grouped classes is about one-seventh of a quintile lower than that of students in non-grouped classes).

6. Results

In this section, I present the results of the analyses described above. When interpreting the within-class grouping effect estimates in this paper, it is important to keep in mind that a “positive grouping effect” means that there is evidence that grouping leads to an increase in a student’s given social and emotional skill score. Increases in the Self-Control, Interpersonal Skills and Approaches to Learning scale scores are good things, while increases in Externalizing and Internalizing Problem Behavior are bad. It is also important to keep in mind that all results should be interpreted as comparisons between the given form of within-class ability grouping (e.g., flexible grouping) and no grouping.

6.1 Characteristics of schools, teachers and classrooms that use within-class ability grouping:

Table 1 reports summary statistics as well as population-weighted mean differences in the characteristics of students in grouped and ungrouped classes across my four analytic samples (kindergarten reading, kindergarten math, first grade reading and first grade math).
The summary statistics reported in Table 1 are consistent with those of prior within-class ability grouping research. I find that grouping occurs more frequently in first grade than in kindergarten, and more often in reading classes than in math classes. For example, roughly 61% of kindergartners are grouped in reading classes, below the percentage of first graders in reading groups (94.2%), but above the percentage of first graders in math groups (53.3%).15 I also find that many of today’s students do indeed experience flexible grouping in reading. As an example, in first grade, nearly 47% of students experience flexible grouping in reading.16

15 Chorzempa and Graham (2006) is an example of a past study that documented similar trends.

When examining the population-weighted mean differences in the characteristics of students in grouped and ungrouped classes, I find that grouping has racial and socioeconomic components. There are fewer Caucasian students in grouped classes than in ungrouped classes, while there are more African-American and Hispanic students in grouped classes than in ungrouped classes. Likewise, students in grouped classes tend to be younger, poorer and have lower initial test scores than students who do not experience grouping.17 Separately, my data provide some evidence of a difference between grouped and ungrouped kindergartners with respect to their initial levels of social and emotional skills: in both reading and math classes, grouped students have lower Approaches to Learning skill scores than ungrouped students.

Table 2 summarizes the mean differences amongst teachers who do and do not use within-class ability groups. I find that more teachers group in first grade than in kindergarten and that reading teachers use groups more frequently than math teachers (e.g., roughly 95% of first grade reading teachers use ability groups).
In addition, ungrouped classrooms are generally led more often by Caucasian teachers than grouped classrooms, while grouped classrooms are led more often by African-American teachers than ungrouped classrooms.18 Furthermore, in kindergarten, there is fairly consistent evidence that grouped classrooms are led by teachers who are less satisfied with their jobs than teachers of ungrouped classrooms. There is also some evidence that teachers of full-day kindergarten classes tend to group more often than those of partial-day classes. Finally, across all four analytic samples, there is scant, mixed evidence regarding how within-class student achievement heterogeneity impacts grouping: in kindergarten reading classes, the average standard deviation of within-class assessment z-scores is lower in infrequently-grouped classes than in ungrouped classes; and, in first grade, the average standard deviation of assessment scores is lower in frequently-grouped classes than in ungrouped classes, but higher in flexibly-grouped classes than in ungrouped classrooms.

16 The documented rise in flexible grouping is described in this paper’s introduction.

17 These trends are particularly pronounced in the comparison between frequent grouping and no grouping. One meaningful exception to the racial and socioeconomic grouping trends is in first grade reading classes, where grouping is nearly universal. In first grade reading classes, the only consistent difference between grouped and ungrouped students is that there are more black students in grouped classes than in ungrouped classes. This finding might be explained by the near universality of grouping in first grade reading classes.

18 One meaningful exception pertains to first grade teachers: there are no meaningful racial differences among first grade teachers who do and do not use ability groups. This finding might also be due to the fact that nearly all first grade reading teachers group students.
Tables 3 and 4 summarize my analytic samples at the school level, as well as differences across schools with respect to within-class ability grouping. To begin, it is useful to describe the kindergarten and first grade analytic samples in terms of their school-type compositions. Of the 715 total schools in the kindergarten samples, 78% are public schools (the rest are private schools), 40% are urban schools, 14% are rural schools, and 46% are suburban schools. The breakdown is about the same for the 1,043 total schools in the first grade samples. When analyzing differences across schools pertaining to within-class ability grouping, two trends emerge. First, grouping occurs more frequently in public schools than in private schools. For example, as Table 3 illustrates, in only 18% of all public schools is it the case that no kindergarten reading teachers use ability groups. This figure is well below the corresponding figure of 44% for private schools. Similarly, in 90% of all public schools, at least one first grade teacher uses reading groups, well above the corresponding figure of 82% for private schools (see Table 4). The second school-level grouping trend that emerges pertains to school urbanicity. Within-class ability grouping occurs less frequently in rural schools than in urban and suburban schools. For instance, in only 20% and 23% of urban and suburban schools, respectively, is it the case that no kindergarten teacher uses reading groups. These figures are well below the corresponding figure of 36% for rural schools (see Table 3). If I consider schools in which at least one teacher groups, the urbanicity trends weaken but remain present. As an example, in 69% and 66% of all urban and suburban schools, respectively, at least one first grade teacher uses reading groups. These figures are above the corresponding figure of 65% for rural schools (see Table 4).
Tables 1-4 capture within-class-ability-grouping-based differences in my analytic samples in a univariate framework. The results of logit analyses presented in Table 5 capture differences across grouping paradigms in a multivariate framework. That is, they provide insight into whether or not the univariate differences reported in Tables 1-4 can be explained by other observed factors that differ across grouping types. Table 5 reports results from a logit model with no school fixed effects, as well as those from a model with school fixed effects. The results from the two models shed light on the extent to which the factors that predict grouping within schools are similar to those that predict grouping between schools. Findings from the between-schools logit model generally support results of the mean differences analyses described above. Across all four analytic samples, the logit results provide consistent evidence that African-American and Hispanic students are more likely than students of other races to experience within-class ability grouping. For instance, in first grade reading classes, a one-point increase in the percentage of African-American students in a classroom makes it about 2.4 times as likely that a classroom is frequently grouped (see Table 5). The logit results also provide some evidence that African-American teachers group more frequently than teachers of other races, as well as some evidence that teachers of full-day kindergarten classes are more likely to group than teachers of part-day classes. Finally, the between-schools logit analysis, like the mean differences analyses, provides little, mixed evidence that within-class student performance heterogeneity predicts grouping. However, findings from the between-schools logit analysis and results of the mean differences analyses diverge in two ways. First, in the logit analysis, low job satisfaction no longer predicts whether or not a kindergarten teacher uses ability groups.
Secondly, in the logit analysis, the classroom average level of Internalizing Problem Behavior amongst students predicts whether or not the class is grouped. As an example, in first grade reading classes, a standard deviation increase in the average level of Internalizing Problem Behavior makes it 0.36 times as likely that the class is infrequently grouped. The factors that predict ability grouping within schools are somewhat similar to those that predict grouping between schools. For instance, as is the case between schools, the classroom average level of Internalizing Problem Behavior amongst students negatively predicts whether or not the class is grouped within schools (see Table 5). However, as Table 5 illustrates, within schools, the percentage of African-American students in a class is a statistically-significant predictor only in first grade reading classes. Additionally, the percentage of Hispanic students in a class positively predicts grouping in kindergarten, but negatively predicts grouping in first grade reading classes (this is the only instance in which the percentage of Hispanic students is a negative predictor). Somewhat surprisingly, within-class student performance heterogeneity negatively predicts ability grouping within schools. For example, in first grade reading classes, a standard deviation increase in the heterogeneity of students’ reading assessment z-scores makes it 0.123 times as likely that a class is grouped. Taken as a whole, results of the logit analyses, mean differences analyses and school comparisons provide several meaningful insights into within-class ability grouping. First, ability grouping appears to have a racial component. Nearly all of the mean differences and logit analyses suggest that African-American and Hispanic students are more likely to experience grouping than white students.
On what is perhaps a related note, the comparisons of schools in my samples indicate that grouping occurs more frequently in public, urban and suburban schools than in private and rural schools. While these results are consistent with past research (see, e.g., Robinson, 2009) and thus unsurprising, there are two surprising results from the analyses described above. First, there is scant, mixed evidence that the level of student performance heterogeneity of a classroom predicts grouping. This result is surprising in light of the widely-accepted notion that teachers group to meet diverse student needs. The second surprising finding is that the classroom level of Internalizing Problem Behavior negatively predicts grouping, both within and between schools. One rationale for grouping is managing classroom behavior, so it is somewhat surprising to see that students with low levels of Internalizing Problem Behavior are more likely to be in grouped classes. Overall, the analyses make it clear that grouped students, teachers and schools differ along observable dimensions from their ungrouped counterparts. Since this is true, grouped and ungrouped students, classrooms and schools might differ in unobservable ways and unadjusted regression results could be biased.

6.2 Main within-class ability grouping effect estimates:

Tables 6 presents within-schools estimates of within-class ability grouping effects (the results of model 4). Within-class ability grouping appears to negatively affect a broad range of students’ social and emotional skills. For instance, in first grade reading classes, frequent ability grouping negatively impacts students’ Self-Control, Internalizing Problem Behavior, Interpersonal Skills and Approaches to Learning skills scores. The largest effects pertain to students’ Self-Control and Approaches to Learning scores.
As an example, in first grade reading classes, the effect of infrequent grouping, relative to no grouping, on those skill scores is -0.112 and -0.116 standard deviations, respectively. Interestingly, within-class ability grouping appears to have little to no effect on students’ levels of Externalizing Problem Behavior. It also bears mentioning that the within-schools ability grouping effects tend to be confined to first grade, with the exception of the effect grouping has on students’ Approaches to Learning scores. In both reading and math classes, grouping negatively impacts students’ Self-Control, Internalizing Problem Behavior, and Approaches to Learning skills scores (again, many of the effects are confined to first grade). However, in virtually all cases, the within-class ability grouping effects are stronger in reading than in math. For example, in first grade reading classes, the effect of frequent grouping, relative to no grouping, on students’ Internalizing Problem Behavior scores is +0.049 standard deviations. This effect is larger than the corresponding effect of +0.038 standard deviations in math. There is another way in which the effect of ability grouping differs across reading and math classes. In first grade, reading groups appear to have a negative impact on students’ Interpersonal Skills, while math groups do not appear to impact them. The grouping effect estimates are surprisingly consistent across infrequently-, frequently- and flexibly-grouped classes. If anything, frequent grouping effects appear more modest than infrequent grouping effects; and, flexible grouping effects are very similar to both. For instance, the effect of infrequent grouping, relative to no grouping, on first graders’ Interpersonal Skills scores in reading classes is -0.60 standard deviations. This effect is larger in magnitude than the corresponding effect associated with frequent grouping of -0.051 standard deviations.
One exception to the overall consistency of grouping effects pertains to students’ Internalizing Problem Behavior scores, which are negatively impacted by frequent grouping in both kindergarten and first grade, but not by infrequent grouping.

6.3 Differential within-class ability grouping effect estimates:

Table 9 presents differential within-class ability grouping effects estimates, based on students’ within-class achievement levels (the results of model 5). Across all of the socio-emotional skills domains that I examined, there is at least some evidence of such effects. However, the most consistent evidence of differential grouping effects pertains to students’ Self-Control, Interpersonal Skills, and Approaches to Learning skills scores: within-class ability grouping differentially affects students with respect to these skills in both kindergarten and first grade, as well as in math and in reading. The largest estimated differential effects pertain to students’ Self-Control and Approaches to Learning skill scores. For example, in first grade math classes, infrequent grouping is associated with a 0.136 standard deviation increase in the Approaches to Learning scores of student. There do not appear to be any significant trends in differential effects across subjects or grouping types. However, within-class ability grouping has opposite differential effects in kindergarten and in first grade. In kindergarten, there is consistent evidence that within-class ability grouping has a positive effect on the highest achievers, as well as some evidence that it has a negative differential effect on the socio-emotional development of the lowest-achieving kindergartners. In contrast, within-class ability grouping has a positive differential effect on the lowest achievers but a negative differential effect on the highest achievers in first grade.
For example, in kindergarten reading classes, frequent grouping has a -0.022 standard deviation impact on the Interpersonal Skills scores of the lowest achievers but a +0.065 standard deviation impact on the corresponding scores of students in the highest achievement quartile. However, in first grade reading classes, infrequent grouping is associated with a 0.136 standard deviation increase in the Self-Control scores of the lowest achievers but a 0.148 standard deviation decrease in the corresponding scores of students in the highest achievement quartile. It also bears mentioning that the differential effect estimates tend to be larger in first grade than in kindergarten.

7. Discussion

7.1 Main Findings

Perhaps the most interesting findings from my analyses are: (1) there is evidence that within-class ability grouping is practiced most often in public urban and suburban schools, with African-American and Hispanic students; (2) within schools, within-class ability grouping appears to have a negative overall effect on children’s socio-emotional development; (3) many of the negative within-schools grouping effect estimates are concentrated in first grade; (4) the within-schools effect estimates appear fairly consistent across reading and math classes, but they tend to be larger in reading classes than in math classes; (5) the within-schools effect estimates appear fairly consistent across grouping types; and, finally, (6) within-class ability grouping seems to affect high and low achievers in the same classroom differently in kindergarten than in first grade. Each of these findings merits additional discussion, to which this paper now turns. To begin, it is worth noting that my finding that minority students in public urban and suburban schools are more likely to be grouped than white students in private schools or rural public schools is consistent with prior research.
For example, Robinson (2009) also found that black and Hispanic students are grouped more frequently than white students. In Robinson’s (2009) study, the fact that within-class ability grouping has a racial component is not necessarily a bad thing. In fact, he found that the instructional practice benefitted language-minority Hispanic students more than other groups of students (Robinson, 2009). In contrast, I find that within-class ability grouping has a negative overall effect on children’s socio-emotional development. In light of this finding and studies demonstrating a link between individual earnings and socio-emotional skills (Dunifon and Duncan, 1998; Goldsmith, Veum and Darity, 1997; Lindqvist and Westman, 2009; and, Nyhus and Pons, 2005), the fact that within-class ability grouping has a racial component may be a bigger concern than previously believed. However, it does bear mentioning that this study’s overall grouping effect estimates are imprecisely measured; and, as a result, I cannot rule out the possibility that within-class ability grouping has a positive effect on minority students’ social and emotional skills (this is discussed further below).

As previously mentioned, I find that within-class ability grouping has a negative overall effect on a wide range of students’ social and emotional skills. This result suggests that any grouping benefits from differentiated instruction, changes in the way the teacher allocates his/her time, and/or positive peer effects are outweighed by negative grouping impacts associated with identifying students based on their achievement level, students’ negative treatment of one another, and/or negative peer effects. My finding of a negative overall grouping effect contradicts results of causal studies and meta-analyses of within-class ability grouping that focus on academic outcomes.
These studies provide consistent evidence of a positive grouping effect on the math achievement of students in the upper elementary grades (Kulik, 1992; Mosteller, Light and Sachs, 1996; Slavin and Karweit, 1985; and, Slavin, 1987), as well as some evidence of a positive effect on the reading achievement of elementary, secondary and post-secondary students (Lou et al., 1996). They also show that grouping is better for low achievers than for high achievers with respect to academic outcomes (Slavin, 1987; Lou et al., 1996). I have three theories for why my results differ from those of the causal studies and meta-analyses described above. My first theory is: in first grade (where many of my statistically-significant results are concentrated), it could be the case that the positive benefit of grouping for low achievers outweighs the negative impact of grouping for higher achievers with respect to academic outcomes but not for social and emotional skills. That is, in a grouped class, high achievers might continue to achieve at a high level but could exhibit worse social and emotional skills, perhaps due to less frequent contact with the teacher. Secondly, it could be the case that within-class ability grouping impacts students differently in the early elementary years than in the upper elementary grades, from which most of the findings of the causal studies and meta-analyses come. Finally, it could be the case that the main effects estimates of this study suffer from missing teacher-level variables. Whereas results of the causal studies and meta-analyses control for teacher-level characteristics, my main effects estimates come from within-schools models, not within-classroom models.

One exception to the trend in main effects pertains to students’ Externalizing Problem Behaviors, which appear to be mostly unaffected by within-class ability grouping. I can think of three potential explanations for this anomaly.
First, it could be the case that teachers in both grouped and ungrouped classrooms can more easily detect and address Externalizing Problem Behavior than other social and emotional skills problems. Indeed, fighting and arguing students are easy to spot. Secondly, teachers may be less likely to report Externalizing Problem Behaviors than other problematic socio-emotional behaviors, if they are evaluated on the grounds of how often their students argue and fight. Finally, that within-class ability grouping appears not to affect students’ Externalizing Problem Behaviors could be due to the effectiveness of existing within-school policies geared towards minimizing the extent to which students fight and argue.

A third noteworthy finding in this study is that many of the negative within-schools grouping effect estimates are concentrated in first grade. While this finding met my expectation (described in Section Three), the one exception to the finding (that students’ Approaches to Learning scores are negatively impacted by grouping in both kindergarten and first grade) deserves some consideration. There are at least two possible reasons why this exception exists. First, if ability in kindergarten is difficult to determine, in practice, teachers may assemble heterogeneous ability groups rather than homogeneous groups. Heterogeneous grouping is less likely than homogeneous grouping to affect children’s socio-emotional development through their self-efficacy beliefs, students’ treatment of each other, peer effects and differentiated instruction. This means that, of the causal mechanisms that I enumerated above, only changes in the way a teacher spends his/her time remain. It is possible that this change only impacts things like task persistence and student engagement, which are captured in Approaches to Learning.
For example, it could be the case that fewer, more meaningful interactions with teachers motivate kindergartners more than a higher number of less meaningful interactions. A second possible reason for this finding is that ability grouping in kindergarten may have more to do with classroom management than differentiating instruction; and, improved management may drive persistence and engagement higher, but have little effect on other social and emotional skills. The line of reasoning is similar to that pertaining to heterogeneous grouping.

Another main finding that warrants additional discussion is that my within-schools grouping effect estimates appear fairly consistent across reading and math classes,19 although the effect estimates tend to be larger in reading classes than in math classes. That both reading and math groups appear to affect children’s socio-emotional development is consistent with results of causal grouping studies and meta-analyses that find within-class ability grouping has effects on academic outcomes in both reading (Lou et al., 1996) and math (Kulik, 1992; Mosteller, Light and Sachs, 1996; Slavin and Karweit, 1985; and, Slavin, 1987). Also, the finding that the grouping effects appear larger in reading than in math classes met my expectation, described in Section Three.

One result that surprised me is that my within-schools grouping effect estimates are generally consistent across infrequent, frequent and flexible groups.20 Given that students in frequently-grouped classes receive a higher dosage of grouping than those in infrequently-grouped classrooms, I had expected that frequent grouping would have a more pronounced effect than infrequent grouping. At the same time, given the benefits of flexible grouping relative to frequent grouping for low achievers (described in Section Three), I expected any flexible grouping effects to be more positive than frequent grouping effects for low achievers.
If anything, frequent grouping effects generally appear more modest than infrequent grouping effects; and, in the one case in which flexible grouping affects low achievers in a statistically-significant way,21 flexible grouping has a more negative effect on students than frequent grouping. Two explanations for these findings come to mind. First, it could be the case that a student’s initial group placement is the most influential mediator of within-class ability grouping effects. That is, the impact that being identified based on ability has on students’ self-efficacy beliefs might outweigh grouping effects driven by other mechanisms. Secondly, it is possible that my grouping definitions do not accurately delineate between grouping types. This may be especially true of my definition of flexible grouping (I discuss this potential issue further below).

19 One exception is in first grade, where reading groups appear to negatively impact students’ Interpersonal Skills and math groups do not.

20 One exception pertains to Internalizing Problem Behavior, which is negatively impacted by frequent grouping in kindergarten and first grade, but not by infrequent grouping.

21 See Externalizing Problem Behaviors in first grade reading classes in Table 9.

Perhaps my most striking finding is that within-class ability grouping appears to affect low and high achievers differently in kindergarten and in first grade. Taking into account all of the potential within-class mechanisms that I discussed, I expected grouping to have a more positive effect on high-ability students than on low-ability students. This is what I found in kindergarten but not what I found in first grade, where grouping occurs more often. In kindergarten, there is consistent evidence that within-class ability grouping has a positive effect on the highest achievers, as well as some evidence that it has a negative differential effect on the socio-emotional development of the lowest-achieving kindergartners.
However, the instructional practice has a positive differential effect on the lowest achievers but a negative differential effect on the highest achievers in first grade. I have two theories for why grouping might affect low- and high-achieving kindergartners and first graders differently. One theory is that in first grade, when teachers begin to focus on academics, they devote relatively more time to low achievers. This may be especially true in the current, high-stakes testing environment and in light of achievement gaps. For example, in first grade, when teachers begin to worry about students’ test scores, they may be most worried about the test scores of low achievers and devote more instructional time to them. This, in turn, may differentially affect the low achievers’ social and emotional skills in a positive way, while differentially affecting the high achievers’ social and emotional skills in a negative way (since they receive less teacher time). In fact, there is some evidence that students in low ability groups are involved in more direct interactions with teachers than students in high groups (Rowan and Miracle, 1983). A second theory for why the difference between kindergarten and first grade exists is that high achievers do not begin to feel academically pressured until the first grade, and that this academic pressure negatively impacts their socio-emotional development.

7.2 A Causal Warrant for This Study’s Differential Effects Estimates

As mentioned in Section Five, the purpose of my estimation strategy is to purge my ability grouping effect estimates of bias. In light of the steps I take towards that end, also described above, one can give a causal interpretation to the differential effects estimates of this study so long as grouping teachers are not particularly good at promoting the social and emotional skills of students in a particular achievement quartile in unobserved ways.
While there is no easy way to test this, this study’s finding that differential effects in the kindergarten and first grade samples are nearly opposite suggests that the assumption is met. It is difficult to think of a reason why grouping kindergarten teachers would be particularly good in unobserved ways at promoting the social and emotional skills of the highest achievers, while grouping first grade teachers would be especially good in unobserved ways at promoting the socio-emotional development of the lowest achievers.

7.3 Study Limitations

This study has a number of limitations. Its first limitation is that its results do not rely upon random assignment. While my analytic strategy focuses on purging the grouping effect estimates of omitted variables bias, some bias may still remain. This study’s second limitation pertains to the stable unit treatment value assumption, or SUTVA. SUTVA requires homogeneity of treatment, in this case, that all teachers use ability groups in the same way. While this study goes a long way towards meeting that assumption relative to past ability grouping studies (by delineating infrequent, frequent and flexible grouping), it is possible that a SUTVA violation still exists and that the results of this study are biased. The third limitation of this study pertains to my definition of flexible grouping. As a reminder, I define flexible grouping in reading as grouping that occurs three or four times per week or more, in which more than 10 percent of students move between groups throughout the year. It is unclear whether or not this is the type of flexible grouping used by roughly half of today’s early elementary reading teachers (Baumann, Hoffman, Duffy-Hester and Moon Ro, 2000; and, Chorzempa and Graham, 2006). The final limitation of this study is that it assumes that when teachers answer the question “How often do you divide your class into achievement groups for reading or math?” they are referring to homogeneous ability groups.
Since this is a standard interpretation of the question, the assumption may be met; however, some teachers might also count heterogeneous ability groups as achievement groups.

8. Conclusion

This study investigates the impact of three types of within-class ability grouping on a wide range of students’ social and emotional skills. It extends the large research base on ability grouping that dates back to the early 1900s (Findley and Bryan, 1971), and it informs the ongoing debate on the topic. In summary, within-class ability grouping appears to be a practice most often used with minority students in public urban and suburban schools, which is potentially troubling in light of my finding that, within schools, the practice has a negative overall effect on children’s socio-emotional development. Also within schools, ability grouping seems to impact first graders to a greater extent than kindergartners, which might suggest that grouping does not have much meaning in kindergarten, either for teachers or students. As expected, within-school grouping effects appear stronger in reading than in math, where grouping is less intense. However, to my surprise, there does not seem to be much differentiation across grouping types with respect to grouping effects, although this finding might purely be a function of my grouping definitions. Finally, within-class ability grouping seems to affect high and low achievers within the same classroom differently in kindergarten and in first grade (in first grade, it positively affects the social and emotional skills of low achievers but negatively affects those of high achievers). One explanation for this result is that in first grade, when academics become a central focus in the classroom, teachers spend more time addressing the needs of low achievers than the needs of high achievers, and high achievers act out as a result.
While this study’s results provide insights into the overall and differential effects of within-class ability grouping on children’s socio-emotional development, they do not shed much light on the mechanisms through which grouping affects students. Future research should focus on the mediators of grouping effects. A student’s self-efficacy beliefs may be a particularly powerful grouping-effect mechanism and thus deserve special attention. Separately, it is possible that first grade teachers focus on the needs of low achievers in response to the current high-stakes testing environment and/or in response to achievement gaps. Future research should examine the extent to which this is the case. It should also pay attention to potential unintended consequences of these phenomena.

Figure 1: Conceptual Framework

Students’ Self Efficacy Beliefs:
- E.g., placement in a low group may negatively impact a student’s beliefs and vice versa
Teacher Characteristics:
- Beliefs about grouping
- Pedagogical skills
- Experience
- Effort
- Others (e.g., teacher uses research for instruction)
Student Identified Based on Demonstrated or Perceived Ability
Classroom Characteristics:
- Academic performance heterogeneity
- Class size
- Supports (e.g., paid aides)
Instructional Content:
- Teacher can tailor content to the small group level
Student Placed in Grouped Class
School/District Factors:
- Instructional policy
- Student placement policy
- Curricula
Student/Parent Factors:
- Demographics (via school selection)
- Parental involvement (e.g., lobbying)
- Student behaviors and abilities
How Students Treat Each Other:
- E.g., students may tease each other based on group placement
Class Divided Into Small Groups for Instruction
Social and Emotional Skills:
- End-of-year skills level
- Non-cognitive skills developmental trajectory
Teacher’s Time:
- Students may interact less with teacher; and, some groups may receive greater attention than others
Classroom Social Setting:
- Peer contexts
are stratified by ability group (peer effects)

References

Algozzine, B., Wang, C. and Violette, A.S. (2010). Reexamining the relationship between academic achievement and social behavior. Journal of Positive Behavior Interventions, 13(1): 3-16
Allington, R.L. (1980). Poor readers don’t get to read much in reading groups. Language Arts, 57: 872-876
Barr, R. and Dreeben, R. (1983). How Schools Work. Chicago: The University of Chicago Press
Baumann, J.F., Hoffman, J.V., Duffy-Hester, A.M. and Moon Ro, J. (2000). “The First R” yesterday and today: U.S. elementary reading instruction practices reported by teachers and administrators. Reading Research Quarterly, 35(3): 338-377
Bandura, A. (1977). Self-efficacy: Towards a unifying theory of behavioral change. Psychological Review, 84(2): 191-215
Bowles, S., Gintis, H. and Osborne, M. (2001). The determinants of earnings: A behavioral approach. Journal of Economic Literature, 39(4): 1137-1176
Center, D.B., Deitz, S.M. and Kaufman, M.E. (1982). Student ability, task difficulty, and inappropriate classroom behavior: A study of children with behavior disorders. Behavior Modification, 6: 355-374
Chorzempa, B.F. and Graham, S. (2006). Primary-grade teachers’ use of within-class ability grouping in reading. Journal of Educational Psychology, 98(3): 529-541
Dewar, J.A. (1963). Grouping for arithmetic instruction in sixth grade. The Elementary School Journal, 63(5): 266-269
Dunifon, R. and Duncan, G.J. (1998). Long-run effects of motivation on labor-market success. Social Psychology Quarterly, 61(1): 33-48
Elicker, J. and Mathur, S. (1997). What do they do all day? Comprehensive evaluation of full-day kindergarten. Early Childhood Research Quarterly, 12: 459-480
Felmlee, D. and Eder, D. (1983). The impact of ability groups on student attention. Sociology of Education, 56(2): 77-87
Findley, W.G. and Bryan, M.M. (1971). Ability grouping: 1970 – status, impact, and alternatives.
University of Georgia: Center for Educational Improvement
Gamoran, A. (1989). Rank, performance, and mobility in elementary school grouping. The Sociological Quarterly, 30(1): 109-123
Gamoran, A. (1986). Instructional and institutional effects of ability grouping. Sociology of Education, 59(4): 185-198
Gamoran, A. (1992). Synthesis of research / Is ability grouping equitable? Educational Leadership, 50(2): 11-17
Gaviria, A. and Raphael, S. (2001). School-based peer effects and juvenile behavior. The Review of Economics and Statistics, 83(2): 257-268
Gleason, J.H., Alexander, A.M. and Somers, C.L. (2000). Later adolescents’ reactions to three types of childhood teasing: Relations with self-esteem and body image. Social Behavior and Personality, 25(5): 471-480
Goldsmith, A.H., Veum, J.R. and Darity, Jr., W. (1997). The impact of psychological and human capital on wages. Economic Inquiry, XXXV: 815-829
Haller, E. (1985). Pupil race and elementary school ability grouping: Are teachers biased against black children? American Educational Research Journal, 22(4): 465-483
Hallinan, M.T. (1994). Tracking: from theory to practice. Sociology of Education, 67(2): 79-84
Hallinan, M.T. and Sørensen, A.B. (1985). Ability grouping and student friendships. American Educational Research Journal, 22(4): 485-499
Lindqvist, E., and Westman, R. (2009). The labor market returns to cognitive and noncognitive ability: Evidence from the Swedish enlistment. Research Institute of Industrial Economics (IFN) Working Paper No. 794
Lou, Y., Abrami, P.C., Spence, J.C., Poulsen, C., Chambers, B. and d’Apollonia, S. (1996). Within-class grouping: A meta-analysis. Review of Educational Research, 66(4): 423-458
Katz, L.F., Kling, J.R. and Liebman, J.B. (2001). Moving to opportunity in Boston: Early results of a randomized mobility study. The Quarterly Journal of Economics, 116(2): 607-654
Kulik, J.A. (1992). An analysis of the research on ability grouping: Historical and contemporary perspectives.
The National Research Center on the Gifted and Talented.
Mosteller, F., Light, R.J. and Sachs, J.A. (1996). Sustained inquiry in education: Lessons from skill grouping and class size. Harvard Educational Review, 66(4): 797-842
Multon, K.D., Brown, S.D. and Lent, R.W. (1991). Relation of self-efficacy beliefs to academic outcomes: A meta-analytic investigation. Journal of Counseling Psychology, 38(1): 30-38
Nomi, T. (2009). The effects of within-class ability grouping on academic achievement in early elementary years. Journal of Research on Educational Effectiveness, 3(1): 56-92
Nyhus, E.K. and Pons, E. (2005). The effect of personality on earnings. Journal of Economic Psychology, 26: 363-384
Oakes, J. (1994). More than misapplied technology: A normative and political response to Hallinan on tracking. Sociology of Education, 67(2): 84-91
Pallas, A.M., Entwisle, D.R. and Alexander, K.L. (1994). Ability-group effects: Instructional, social, or institutional? Sociology of Education, 67(1): 27-46
Reardon, S.F., Cheadle, J.E. and Robinson, J.P. (2009). The effect of Catholic schooling on math and reading development in kindergarten through fifth grade. Journal of Research on Educational Effectiveness, 2: 45-87
Robinson, J.P. (2008). Evidence of a differential effect of ability-grouping on the reading achievement growth of language-minority Hispanics. Educational Evaluation and Policy Analysis, 30(2): 141-180
Rowan, B. and Miracle, A.W. (1983). Systems of ability grouping and the stratification of achievement in elementary schools. Sociology of Education, 56(3): 133-144
Sacerdote, B. (2001). Peer effects with random assignment: Results for Dartmouth roommates. The Quarterly Journal of Economics: 681-704
Schunk, D.H. (1989). Self-efficacy and achievement behaviors. Educational Psychology Review, 1(3): 173-208
Skinner, E.A. and Belmont, M.J. (1993). Motivation in the classroom: Reciprocal effects of teacher behavior and student engagement across the year.
Journal of Educational Psychology, 85(4): 571-581
Slavin, R.E. (1987). Ability grouping and student achievement in elementary schools: A best-evidence synthesis. Review of Educational Research, 57(3): 293-336
Slavin, R.E. and Karweit, N.L. (1985). Effects of whole class, ability grouped, and individualized instruction on mathematics achievement. American Educational Research Journal, 22(3): 351-367
Sørensen, A.B. and Hallinan, M.T. (1986). Effects of ability grouping on growth in academic achievement. American Educational Research Journal, 23(4): 519-542
Tach, L.M. and Farkas, G. (2006). Learning-related behaviors, cognitive skills, and ability grouping when schooling begins. Social Science Research, 35: 1048-1079
Umbreit, J., Lane, K.L. and Dejud, C. (2004). Improving classroom behavior by modifying task difficulty: Effects of increasing the difficulty of too-easy tasks. Journal of Positive Behavior Interventions, 6: 13-20
Useem, E.L. (1992). Middle school and math groups: Parents’ involvement in children’s placement. Sociology of Education, 65(4): 263-279
Wallen, N.E. and Vowles, R.O. (1960). The effect of intraclass ability grouping on arithmetic achievement in the sixth grade. Journal of Educational Psychology, 51(3): 159-163
White, I.R., Royston, P. and Wood, A.M. (2010). Multiple imputation using chained equations: Issues and guidance for practice. Statistics in Medicine, 30(4): 377-399

Table 1: Mean Differences in Students' Baseline Characteristics
Type of Grouping Sample size: Kindergarten reading (Kr) Kindergarten math (Km) First grade reading (1r) First grade math (1m) Student-level characteristics: Caucasian: Kr Km 1r 1m African-American: Kr Km 1r 1m Hispanic: Kr Km 1r 1m Age (in months, at start of year): Kr Km 1r 1m SES (ECLS-K categorical scale): Kr Km 1r 1m Assessment z-score: Kr Km 1r 1m Overall Sample No Grouping Infrequent Grouping Frequent Grouping Flexible Grouping°° n=8,459 n=8,459 n=8,459 n=8,459 Std.
Mean Error n=3,308 n=2,406 n=1,214 n=494 n=1,419 Std. Mean Error Mean n=1,602 n=4,567 Std. Mean Diff. Error F-Stat. Mean n=2,409 n=2,473 Std. Mean Diff. Error F-Stat. Mean n=1,513 N/A n=3,954 N/A Std. Mean Diff. Error F-Stat. 0.65 0.65 0.64 0.64 (0.02) (0.02) (0.02) (0.02) 0.72 0.70 0.66 0.68 (0.02) (0.02) (0.07) (0.03) 0.67 0.66 0.67 0.65 (0.02) (0.02) (0.03) (0.02) 0.047** 0.059* 0.884 0.253 0.55 0.48 0.63 0.59 (0.03) (0.04) (0.02) (0.03) 0.000*** 0.000*** 0.617 0.002*** 0.55 N/A 0.63 N/A (0.03) 0.000*** N/A N/A (0.02) 0.619 N/A N/A 0.13 0.13 0.14 0.14 (0.01) (0.01) (0.01) (0.01) 0.10 0.11 0.09 0.10 (0.01) (0.01) (0.02) (0.01) 0.12 0.12 0.14 0.15 (0.02) (0.01) (0.02) (0.01) 0.157 0.476 0.096* 0.004*** 0.16 0.23 0.15 0.15 (0.02) (0.04) (0.01) (0.02) 0.003*** 0.002*** 0.029** 0.027** 0.18 N/A 0.13 N/A (0.03) 0.001*** N/A N/A (0.01) 0.188 N/A N/A 0.15 0.15 0.16 0.16 (0.01) (0.01) (0.01) (0.01) 0.12 0.14 0.17 0.15 (0.02) (0.01) (0.07) (0.03) 0.13 0.15 0.13 0.15 (0.01) (0.01) (0.02) (0.01) 0.796 0.554 0.592 0.938 0.22 0.22 0.15 0.18 (0.02) (0.02) (0.02) (0.02) 0.000*** 0.000*** 0.774 0.332 0.19 N/A 0.17 N/A (0.02) 0.001*** N/A N/A (0.01) 0.973 N/A N/A 68.51 68.51 86.79 86.79 (0.09) (0.09) (0.09) (0.09) 68.80 68.65 87.13 87.39 (0.11) (0.10) (0.32) (0.15) 68.51 68.52 87.24 86.80 (0.13) (0.13) (0.17) (0.11) 0.058* 0.333 0.753 0.002*** 68.22 68.00 86.86 86.40 (0.19) (0.17) (0.14) (0.12) 0.011** 0.000*** 0.415 0.000*** 68.13 N/A 86.51 N/A (0.17) 0.001*** N/A N/A (0.10) 0.067* N/A N/A 3.21 3.21 3.23 3.23 (0.04) (0.04) (0.04) (0.04) 3.34 3.31 3.03 3.17 (0.05) (0.04) (0.19) (0.09) 3.29 3.24 3.26 3.30 (0.05) (0.05) (0.07) (0.04) 0.416 0.245 0.233 0.115 3.00 2.84 3.21 3.13 (0.08) (0.09) (0.05) (0.06) 0.000*** 0.000*** 0.348 0.651 2.99 N/A 3.26 N/A (0.08) 0.000*** N/A N/A (0.05) 0.246 N/A N/A -0.06 0.20 0.04 0.19 (0.03) (0.02) (0.04) (0.02) 0.03 0.25 0.01 0.23 (0.04) (0.02) (0.12) (0.05) 0.00 0.22 0.11 0.21 (0.04) (0.03) (0.05) (0.02) 0.606 0.424 0.394 0.754 -0.29 -0.02 0.05 
0.12 (0.08) (0.05) (0.05) (0.04) 0.000*** 0.000*** 0.756 0.039** -0.14 N/A 0.01 N/A (0.07) 0.042** N/A N/A (0.04) 0.986 N/A N/A
The kindergarten math and reading samples were weighted using ECLS-K student-level panel weight BYCW0, ECLS-K strata variable BYCWSTR, and ECLS-K primary sampling unit (PSU) variable BYCWPSU. The first grade math and reading samples were weighted using Stata’s svy command and ECLS-K student-level panel weight Y2COMW0, ECLS-K strata variable Y2COMSTR, and ECLS-K primary sampling unit (PSU) variable Y2COMPSU. Weighting the data takes into account ECLS-K's multistage probability sampling procedure and produces corrected standard errors.
°°Flexible grouping data not available for math classes
°°°F-Statistic of Adjusted Wald Test of the difference in variable means between grouped and non-grouped students (all grouping types are compared to no grouping)
°°°°Reading assessment z-scores for Kr and 1r; and, math assessment z-scores for Km and 1m
Statistical significance levels: *0.10, **0.05, ***0.01

Table 2: Mean Differences in Teachers' Baseline Characteristics (population-weighted°)
Type of Grouping Sample size: Kindergarten reading (Kr) Kindergarten math (Km) First grade reading (1r) First grade math (1m) Teacher-level characteristics: Caucasian: Kr Km 1r 1m African-American: Kr Km 1r 1m Really enjoys present teaching job:°°°° Kr Km 1r 1m Standard deviation of assessment z-scores: Kr Km 1r 1m Percentage of kindergarten classes that are full day; Kr Km Overall Sample No Grouping n=2,135 n=2,135 n=2,785 n=2,785 Std. Mean Error n=731 n=1,081 n=140 n=436 Std. Mean Error 87.8% 87.8% 84.8% 84.8% (0.01) (0.01) (0.01) (0.01) 92.0% 92.2% 85.7% 87.4% 5.8% 5.8% 6.2% 6.2% (0.01) (0.01) (0.00) (0.00) 59.8% 59.8% 52.9% 52.9% 0.64 0.57 0.41 0.40 Infrequent Grouping Mean n=592 n=693 n=475 n=1,455 Std. Mean Diff. Error F-Stat.
(0.01) (0.01) (0.03) (0.02) 89.0% 87.9% 85.3% 86.8% (0.01) (0.01) (0.02) (0.01) 3.3% 3.1% 4.3% 3.0% (0.01) (0.01) (0.02) (0.01) 5.5% 5.5% 6.7% 5.9% (0.01) (0.01) (0.01) (0.01) 63.7% 62.2% 51.3% 50.2% (0.02) (0.02) (0.04) (0.02) (0.02) (0.01) (0.01) (0.01) 0.65 0.59 0.40 0.40 (0.02) (0.02) (0.04) (0.02) 62.8% (0.02) 62.8% (0.02) 59.4% (0.03) 59.9% (0.03) Frequent Grouping Flexible Grouping°° Mean n=421 n=424 n=982 n=894 Std. Mean Diff. Error F-Stat. Mean 0.104 0.012** 0.895 0.752 80.5% 75.8% 83.8% 80.3% (0.03) (0.03) (0.01) (0.01) 0.000*** 0.000*** 0.565 0.001*** 84.7% (0.02) 0.001*** N/A N/A N/A 85.4% (0.01) 0.909 N/A N/A N/A (0.01) (0.01) (0.01) (0.01) 0.084* 0.044** 0.291 0.016** 9.6% 13.7% 6.9% 8.2% (0.02) (0.02) (0.01) (0.01) 0.001*** 0.000*** 0.240 0.000*** 58.5% 56.5% 48.3% 50.3% (0.02) (0.02) (0.02) (0.01) 0.085* 0.056* 0.524 0.993 55.9% 58.9% 55.1% 58.6% (0.03) (0.03) (0.02) (0.02) 0.024** 0.303 0.403 0.004*** 0.64 0.55 0.40 0.41 (0.03) (0.02) (0.02) (0.01) 0.758 0.065* 0.887 0.801 0.59 0.58 0.31 0.38 (0.05) (0.03) (0.02) (0.01) 0.232 0.595 0.030** 0.344 56.4% (0.03) 0.447 63.7% (0.03) 0.217 70.9% (0.03) 0.008*** 69.1% (0.03) 0.034 7.6% N/A 5.6% N/A n=391 N/A n=1,188 N/A Std. Mean Diff. Error F-Stat. (0.02) 0.015** N/A N/A (0.01) 0.531 N/A N/A 57.8% (0.03) 0.096* N/A N/A N/A 53.2% (0.01) 0.669 N/A N/A N/A 0.68 N/A 0.50 N/A (0.03) 0.473 N/A N/A (0.02) 0.036** N/A N/A 71.5% (0.03) 0.003*** N/A N/A N/A
The kindergarten math and reading samples were weighted using the ECLS-K teacher-level weight B1TW0, strata variable B1TTSTR, and primary sampling unit (PSU) variable B1TTPSU. Weighting the data takes into account ECLS-K's multistage probability sampling procedure and produces corrected standard errors. Analogous weights for first grade teachers do not exist.
°°Flexible grouping data not available for math classes
°°°F-Statistic of Adjusted Wald Test of the difference in variable means between grouped and non-grouped students (all grouping types are compared to no grouping)
°°°°Teacher answered "strongly agree" to a question about whether or not s/he enjoys his/her present teaching job
°°°°°Reading assessment z-scores for Kr and 1r; and, math assessment z-scores for Km and 1m
Statistical significance levels: *0.10, **0.05, ***0.01

Table 3: Comparison of Sample Schools (population-weighted°)
Schools in which all teachers group (by type)
Under Total Sample, % is the percent of all schools; under each grouping type, % is the percent of the school type total°°.

                        Total Sample    No Grouping   Infrequent    Frequent      Flexible
                                                      Grouping      Grouping      Grouping
School Type:            n       %       n     %       n     %       n     %       n     %
Kindergarten reading:
- All schools           715             168   23%     102   14%     44    6%      45    6%
- Public schools        555     78%     98    18%     62    11%     28    5%      33    6%
- Private schools       160     22%     70    44%     40    25%     16    10%     12    8%
- Urban schools°°°      283     40%     56    20%     44    16%     17    6%      25    9%
- Rural schools         101     14%     36    36%     9     9%      7     7%      5     5%
- Suburban schools°°°°  331     46%     76    23%     49    15%     20    6%      15    5%
Kindergarten math:
- All schools           715             221   31%     102   14%     62    9%      N/A
- Public schools        555             135   24%     65    12%     46    8%      N/A
- Private schools       160             86    54%     37    23%     16    10%     N/A
- Urban schools         283             80    28%     48    17%     32    11%     N/A
- Rural schools         101             40    40%     12    12%     10    10%     N/A
- Suburban schools      331             101   31%     42    13%     20    6%      N/A
First grade reading:
- All schools           1,043           33    3%      88    8%      207   20%     216   21%
- Public schools        850     81%     15    2%      52    6%      157   18%     176   21%
- Private schools       193     19%     18    9%      36    19%     50    26%     40    21%
- Urban schools         426     41%     14    3%      42    10%     76    18%     112   26%
- Rural schools         124     12%     5     4%      10    8%      26    21%     21    17%
- Suburban schools      493     47%     14    3%      36    7%      105   21%     83    17%
First grade math:
- All schools           1,043           83    8%      300   29%     160   15%     N/A
- Public schools        850             50    6%      223   26%     126   15%     N/A
- Private schools       193             33    17%     77    40%     34    18%     N/A
- Urban schools         426             34    8%      128   30%     71    17%     N/A
- Rural schools         124             13    10%     35    28%     19    15%     N/A
- Suburban schools      493             36    7%      137   28%     70    14%     N/A

°Kindergarten samples are weighted using the ECLS-K school-level population weight S2SAQW0. Weighting the data takes into account ECLS-K's multistage probability sampling procedure and produces corrected standard errors. Analogous weights for first grade are unavailable.
°°For example, in 18% of public schools, no kindergarten reading teachers use within-class ability groups
°°°Schools in large or mid-sized cities
°°°°Includes schools in large and small towns

Table 4: Comparison of Sample Schools (population-weighted°)
Schools in which some or all teachers group (by type)
Under Total Sample, % is the percent of all schools; under each grouping type, % is the percent of the school type total°°.

                        Total Sample    No Grouping   Infrequent    Frequent      Flexible
                                                      Grouping      Grouping      Grouping
School Type:            n       %       n     %       n     %       n     %       n     %
Kindergarten reading:
- All schools           715             385   54%     343   48%     249   35%     211   30%
- Public schools        555     78%     299   54%     292   53%     224   40%     188   34%
- Private schools       160     22%     86    54%     51    32%     25    16%     23    14%
- Urban schools°°°      283     40%     134   47%     137   48%     103   36%     96    34%
- Rural schools         101     14%     61    60%     32    32%     34    34%     30    30%
- Suburban schools°°°°  331     46%     190   57%     174   53%     112   34%     85    26%
Kindergarten math:
- All schools           715             502   70%     381   53%     226   32%     N/A
- Public schools        555             396   71%     326   59%     205   37%     N/A
- Private schools       160             106   66%     55    34%     21    13%     N/A
- Urban schools         283             177   63%     151   53%     100   35%     N/A
- Rural schools         101             73    72%     42    42%     32    32%     N/A
- Suburban schools      331             252   76%     188   57%     94    28%     N/A
First grade reading:
- All schools           1,043           116   11%     314   30%     612   59%     605   58%
- Public schools        850     81%     81    10%     251   30%     536   63%     536   63%
- Private schools       193     19%     35    18%     63    33%     76    39%     69    36%
- Urban schools         426     41%     44    10%     121   28%     221   52%     256   60%
- Rural schools         124     12%     16    13%     36    29%     77    62%     69    56%
- Suburban schools      493     47%     57    12%     159   32%     314   64%     280   57%
First grade math:
- All schools           1,043           340   33%     763   73%     548   53%     N/A
- Public schools        850             277   33%     640   75%     489   58%     N/A
- Private schools       193             63    33%     123   64%     59    31%     N/A
- Urban schools         426             132   31%     304   71%     223   52%     N/A
- Rural schools         124             43    35%     86    69%     63    51%     N/A
- Suburban schools      493             166   34%     374   76%     262   53%     N/A

°Kindergarten samples are weighted using the ECLS-K school-level population weight S2SAQW0. Weighting the data takes into account ECLS-K's multistage probability sampling procedure and produces corrected standard errors. Analogous weights for first grade are unavailable.
°°For example, in 54% of public schools, some or all kindergarten reading teachers do not use within-class ability groups
°°°Schools in large or mid-sized cities
°°°°Includes schools in large and small towns

Table 5: Logit Analysis (population-weighted°)°°
Grouping Type Classroom-level grouping predictors: African-American teacher: Kindergarten reading (Kr) Kindergarten math (Km) First grade reading (1r) First grade math (1m) Percent of African-American students in class: Kr Km 1r 1m Percent of Hispanic students in class: Kr Km 1r 1m Standard deviation of assessment z-scores in class:°°°° Kr Km 1r 1m Average Internalizing Problem Behavior z-score of students in class: Kr Km 1r 1m Kindergarten class is a full day class: Kr Km Infrequent Grouping School Fixed No Fixed Effects Effects Frequent Grouping School Fixed No Fixed Effects Effects Flexible Grouping°°° School Fixed No Fixed Effects Effects 1.23 (0.41) 1.66 (0.51) 1.20 (0.65) 1.771* (0.58) 0.94 (0.45) 1.34 (0.54) 0.00 (0.00) 1.65 (0.93) 1.748* (0.53) 2.795*** (0.91) 1.15 (0.57) 1.70 (0.58) 1.05 (0.66) 1.82 (0.85) 8021.50 (117552.80) 2.16 (1.32) 1.26 (0.44) 0.99 (0.48) - 0.99 (1.06) 0.31 (1.28) - 2.207** (0.78) 1.01 (0.32) 1.99 (1.03) 1.54 (0.45) 0.83 (0.97) 2.88 (2.61) 0.03 (0.15) 0.36 (0.43) 2.050** (0.74) 2.00 (0.88) 1.63 (0.79) 2.365*** (0.72) 0.90 (1.47) 0.41 (0.48) 1428253.4** (8096775.90) 2.41 (4.07) 2.486** (1.03) 2.488* (1.21) - 1.14 (2.31) 5.92e+51* (3.79e+53) - 1.18 (0.57) 1.04 (0.34) 0.76 (0.37) 0.96 (0.24) 7.506** (7.38) 4.837** (3.49) 0.000560* (0.00) 1.74
(1.67) 2.818** (1.14) 2.014* (0.78) 2.377* (1.09) 1.759** (0.48) 0.63 (0.64) 0.48 (0.41) 0.01 (0.02) 0.48 (0.50) 2.568** (1.01) 1.59 (0.69) - 1.64 (2.63) 4.56e-23* (0.00) - 0.96 (0.12) 0.84 (0.10) 0.98 (0.22) 1.04 (0.17) 1.09 (0.19) 0.758* (0.12) 0.123** (0.11) 0.77 (0.25) 0.87 (0.13) 1.09 (0.15) 0.695** (0.12) 0.92 (0.15) 0.591** (0.15) 0.97 (0.21) 2.02 (1.23) 1.17 (0.44) 1.23 (0.17) 1.593** (0.33) - 1.33 (0.44) 4.17 (7.68) - 2.259** (0.82) 1.70 (0.55) 0.361** (0.17) 0.589** (0.16) 3.031** (1.61) 1.59 (0.65) 0.31 (0.54) 0.413** (0.18) 1.11 (0.44) 1.05 (0.45) 0.351*** (0.14) 0.72 (0.20) 1.85 (1.28) 2.505* (1.39) 0.0832** (0.10) 0.98 (0.51) 1.36 (0.53) 0.303*** (0.14) - 3.76 (3.67) 3.09e-08* (0.00) - 0.717* (0.13) 1.00 (0.14) 0.70 (0.48) 0.75 (0.42) 1.669** (0.38) 1.34 (0.31) 0.46 (0.50) 1.34 (1.17) 1.546** (0.33) - 2.81 (4.23) -
The kindergarten math and reading samples were weighted using the ECLS-K teacher-level weight B1TW0, strata variable B1TTSTR, and primary sampling unit (PSU) variable B1TTPSU. Weighting the data takes into account ECLS-K's multistage probability sampling procedure and produces corrected standard errors. Analogous weights for first grade teachers do not exist.
°°Reported estimates are odds ratios
°°°Flexible grouping data not available for math classes
°°°°Reading assessment z-scores for Kr and 1r; and, math assessment z-scores for Km and 1m
Statistical significance levels: *0.10, **0.05, ***0.001

Table 6: Estimates of within-class ability grouping effects (within schools, by grade°)°°

                Kindergarten                                First Grade
                OLS Estimator         Propensity-Score      OLS Estimator         Propensity-Score
                                      Matching Estimator                          Matching Estimator
Self-Control:
- Rinfreq°°°    -0.0010 (0.021)       -0.0090 (0.022)       -0.112*** (0.035)     -0.0372 (0.055)
- Rfreq         -0.0034 (0.019)       -0.0029 (0.018)       -0.107** (0.049)      -0.0860*** (0.026)
- Rflex         0.0185 (0.018)        -0.0167 (0.036)       -0.128*** (0.048)     -0.149** (0.056)
- Minfreq       -0.0167 (0.015)       -0.0149 (0.013)       -0.0699*** (0.022)    -0.0690** (0.030)
- Mfreq         -0.0079 (0.016)       -0.0343* (0.017)      -0.0524** (0.023)     -0.0552 (0.050)
Externalizing Problem Behaviors:
- Rinfreq       0.0077 (0.019)        0.0051 (0.017)        -0.0512 (0.052)       -0.1020 (0.075)
- Rfreq         0.0043 (0.019)        0.0128 (0.027)        -0.0037 (0.042)       -0.0539 (0.032)
- Rflex         -0.0128 (0.017)       -0.0306 (0.034)       -0.0195 (0.045)       0.0237 (0.086)
- Minfreq       0.0203** (0.009)      0.0114 (0.013)        0.0107 (0.026)        -0.0034 (0.037)
- Mfreq         -0.0149 (0.018)       0.0289 (0.026)        0.0219 (0.033)        0.0186 (0.042)
Internalizing Problem Behavior:
- Rinfreq       -0.0187 (0.020)       -0.0227 (0.017)       0.0285 (0.032)        0.0056 (0.064)
- Rfreq         0.0057 (0.014)        0.0234 (0.022)        0.0498** (0.022)      0.0460 (0.042)
- Rflex         -0.0064 (0.016)       -0.0107 (0.025)       0.0440* (0.023)       0.0485 (0.078)
- Minfreq       0.0014 (0.010)        0.0018 (0.006)        0.0114 (0.011)        -0.0196 (0.015)
- Mfreq         0.0392*** (0.007)     0.0585*** (0.014)     0.0380** (0.018)      -0.0070 (0.024)
Interpersonal Skills:
- Rinfreq       -0.0194 (0.014)       -0.0144 (0.016)       -0.0604** (0.028)     0.0147 (0.052)
- Rfreq         -0.0006 (0.018)       -0.0270 (0.018)       -0.0515* (0.027)      -0.0864*** (0.017)
- Rflex         0.0071 (0.017)        0.0001 (0.026)        -0.0603** (0.028)     -0.0996 (0.066)
- Minfreq       -0.0118 (0.012)       0.0030 (0.015)        -0.0348 (0.023)       -0.0350** (0.015)
- Mfreq         -0.0010 (0.017)       -0.0431** (0.021)     -0.0160 (0.024)       -0.0307 (0.042)
Approaches to Learning:
- Rinfreq       -0.0546*** (0.017)    -0.0517** (0.021)     -0.116*** (0.034)     0.0687 (0.093)
- Rfreq         -0.0354* (0.019)      -0.0375 (0.023)       -0.118*** (0.043)     -0.223*** (0.048)
- Rflex         -0.0373*** (0.008)    -0.0558* (0.030)      -0.0892** (0.042)     -0.277** (0.107)
- Minfreq       -0.0275** (0.013)     -0.0244*** (0.009)    -0.0705* (0.042)      -0.0643** (0.032)
- Mfreq         -0.0258 (0.023)       -0.0448 (0.034)       -0.0513 (0.033)       -0.0783** (0.035)

°Models include school fixed effects and teacher- and student-level covariates (including prior social and emotional skills scores).
°°Kindergarten estimates are weighted using ECLS-K panel weight BYCW0. The kindergarten models also include ECLS-K strata dummy variables (BYCWSTR), and their standard errors are clustered at the ECLS-K primary sampling unit level (BYCWPSU). First grade estimates are weighted using ECLS-K panel weight Y2COMW0. The first grade models also include ECLS-K strata dummy variables (Y2COMSTR), and their standard errors are clustered at the ECLS-K primary sampling unit level (Y2COMPSU). Including strata dummies and clustering standard errors at the PSU level take into account the multistage probability sample design of the ECLS-K data set.
°°° R infreq = infrequently-grouped reading class; R freq = frequently-grouped reading class; R flex = flexibly-grouped reading class; M infreq = infrequently-grouped math class; M freq = frequently-grouped math class
Statistical significance: ***0.01, **0.05, *0.10

Table 7: Estimates of within-class ability grouping effects (within schools, by subject°)°°

                Reading                                     Math
                OLS Estimator         Propensity-Score      OLS Estimator         Propensity-Score
                                      Matching Estimator                          Matching Estimator
Self-Control:
- Kinfreq       -0.0010 (0.021)       -0.0090 (0.022)       -0.0167 (0.015)       -0.0149 (0.013)
- Kfreq         -0.0034 (0.019)       -0.0029 (0.018)       -0.0079 (0.016)       -0.0343* (0.017)
- Kflex         0.0185 (0.018)        -0.0167 (0.036)
- 1infreq       -0.112*** (0.035)     -0.0372 (0.055)       -0.0699*** (0.022)    -0.0690** (0.030)
- 1freq         -0.107** (0.049)      -0.0860*** (0.026)    -0.0524** (0.023)     -0.0552 (0.050)
- 1flex         -0.128*** (0.048)     -0.149** (0.056)
Externalizing Problem Behavior:
- Kinfreq       0.0077 (0.019)        0.0051 (0.017)        0.0203** (0.009)      0.0114 (0.013)
- Kfreq         0.0043 (0.019)        0.0128 (0.027)        -0.0149 (0.018)       0.0289 (0.026)
- Kflex         -0.0128 (0.017)       -0.0306 (0.034)
- 1infreq       -0.0512 (0.052)       -0.1020 (0.075)       0.0107 (0.026)        -0.0034 (0.037)
- 1freq         -0.0037 (0.042)       -0.0539 (0.032)       0.0219 (0.033)        0.0186 (0.042)
- 1flex         -0.0195 (0.045)       0.0237 (0.086)
Internalizing Problem Behavior:
- Kinfreq       -0.0187 (0.020)       -0.0227 (0.017)       0.0014 (0.010)        0.0018 (0.006)
- Kfreq         0.0057 (0.014)        0.0234 (0.022)        0.0392*** (0.007)     0.0585*** (0.014)
- Kflex         -0.0064 (0.016)       -0.0107 (0.025)
- 1infreq       0.0285 (0.032)        0.0056 (0.064)        0.0114 (0.011)        -0.0196 (0.015)
- 1freq         0.0498** (0.022)      0.0460 (0.042)        0.0380** (0.018)      -0.0070 (0.024)
- 1flex         0.0440* (0.023)       0.0485 (0.078)
Interpersonal Skills:
- Kinfreq       -0.0194 (0.014)       -0.0144 (0.016)       -0.0118 (0.012)       0.0030 (0.015)
- Kfreq         -0.0006 (0.018)       -0.0270 (0.018)       -0.0010 (0.017)       -0.0431** (0.021)
- Kflex         0.0071 (0.017)        0.0001 (0.026)
- 1infreq       -0.0604** (0.028)     0.0147 (0.052)        -0.0348 (0.023)       -0.0350** (0.015)
- 1freq         -0.0515* (0.027)      -0.0864*** (0.017)    -0.0160 (0.024)       -0.0307 (0.042)
- 1flex         -0.0603** (0.028)     -0.0996 (0.066)
Approaches to Learning:
- Kinfreq       -0.0546*** (0.017)    -0.0517** (0.021)     -0.0275** (0.013)     -0.0244*** (0.009)
- Kfreq         -0.0354* (0.019)      -0.0375 (0.023)       -0.0258 (0.023)       -0.0448 (0.034)
- Kflex         -0.0373*** (0.008)    -0.0558* (0.030)
- 1infreq       -0.116*** (0.034)     0.0687 (0.093)        -0.0705* (0.042)      -0.0643** (0.032)
- 1freq         -0.118*** (0.043)     -0.223*** (0.048)     -0.0513 (0.033)       -0.0783** (0.035)
- 1flex         -0.0892** (0.042)     -0.277** (0.107)

°Models include school fixed effects and teacher- and student-level covariates (including prior social and emotional skills scores).
°°Kindergarten estimates are weighted using ECLS-K panel weight BYCW0. The kindergarten models also include ECLS-K strata dummy variables (BYCWSTR), and their standard errors are clustered at the ECLS-K primary sampling unit level (BYCWPSU). First grade estimates are weighted using ECLS-K panel weight Y2COMW0. The first grade models also include ECLS-K strata dummy variables (Y2COMSTR), and their standard errors are clustered at the ECLS-K primary sampling unit level (Y2COMPSU). Including strata dummies and clustering standard errors at the PSU level take into account the multistage probability sample design of the ECLS-K data set.
°°° R infreq = infrequently-grouped reading class; R freq = frequently-grouped reading class; R flex = flexibly-grouped reading class; M infreq = infrequently-grouped math class; M freq = frequently-grouped math class
Statistical significance: ***0.01, **0.05, *0.10

Table 8: Within Schools OLS Estimates of Within-class Ability Grouping Effects (by Grouping Type°)°°

                              Self-Control          Externalizing         Internalizing         Interpersonal         Approaches
                                                    Problem Behavior      Problem Behavior      Skills                to Learning
Infrequent Grouping:
- Kindergarten reading (Kr)   -0.0010 (0.021)       0.0077 (0.019)        -0.0187 (0.020)       -0.0194 (0.014)       -0.0546*** (0.017)
- Kindergarten math (Km)      -0.0167 (0.015)       0.0203** (0.009)      0.0014 (0.010)        -0.0118 (0.012)       -0.0275** (0.013)
- First grade reading (1r)    -0.112*** (0.035)     -0.0512 (0.052)       0.0285 (0.032)        -0.0604** (0.028)     -0.116*** (0.034)
- First grade math (1m)       -0.0699*** (0.022)    0.0107 (0.026)        0.0114 (0.011)        -0.0348 (0.023)       -0.0705* (0.042)
Frequent Grouping:
- Kr                          -0.0034 (0.019)       0.0043 (0.019)        0.0057 (0.014)        -0.0006 (0.018)       -0.0354* (0.019)
- Km                          -0.0079 (0.016)       -0.0149 (0.018)       0.0392*** (0.007)     -0.0010 (0.017)       -0.0258 (0.023)
- 1r                          -0.107** (0.049)      -0.0037 (0.042)       0.0498** (0.022)      -0.0515* (0.027)      -0.118*** (0.043)
- 1m                          -0.0524** (0.023)     0.0219 (0.033)        0.0380** (0.018)      -0.0160 (0.024)       -0.0513 (0.033)
Flexible Grouping:
- Kr                          0.0185 (0.018)        -0.0128 (0.017)       -0.0064 (0.016)       0.0071 (0.017)        -0.0373*** (0.008)
- 1r                          -0.128*** (0.048)     -0.0195 (0.045)       0.0440* (0.023)       -0.0603** (0.028)     -0.0892** (0.042)

°Models include school fixed effects and teacher- and student-level covariates (including prior social and emotional skills scores)
°°Kindergarten estimates are weighted using ECLS-K panel weight BYCW0. The kindergarten models also include ECLS-K strata dummy variables (BYCWSTR), and their standard errors are clustered at the ECLS-K primary sampling unit level (BYCWPSU). First grade estimates are weighted using ECLS-K panel weight Y2COMW0.
The first grade models also include ECLS-K strata dummy variables (Y2COMSTR), and their standard errors are clustered at the ECLS-K primary sampling unit level (Y2COMPSU). Including strata dummies and clustering standard errors at the PSU level take into account the multistage probability sample design of the ECLS-K data set.

Table 9: OLS Estimates of Differential Within-class Ability Grouping Effects (by within-class achievement quartile; population-weighted°)

Self-Control

Infrequent Grouping           Quartile 1            Quartile 4            No Quartiles
Kindergarten reading (Kr)     0.0019 (0.017)        0.0077 (0.018)        0.0007 (0.012)
Kindergarten math (Km)        0.0123 (0.016)        0.0168 (0.017)        0.0025 (0.005)
First grade reading (1r)      0.136** (0.061)       -0.148*** (0.048)     -0.0720*** (0.021)
First grade math (1m)         0.0223 (0.025)        -0.0408*** (0.015)    -0.0195* (0.011)

Frequent Grouping             Quartile 1            Quartile 4            No Quartiles
Kindergarten reading (Kr)     -0.0078 (0.017)       0.0276** (0.011)      0.0093 (0.006)
Kindergarten math (Km)        -0.0126 (0.017)       0.0136 (0.017)        0.0082 (0.009)
First grade reading (1r)      0.1160 (0.070)        -0.0946*** (0.033)    -0.0450** (0.020)
First grade math (1m)         0.0531* (0.031)       -0.0374 (0.031)       -0.0233** (0.011)

Flexible Grouping             Quartile 1            Quartile 4            No Quartiles
Kindergarten reading (Kr)     -0.0038 (0.009)       0.0180 (0.021)        0.0071 (0.005)
First grade reading (1r)      -0.0170 (0.016)       0.0080 (0.044)        -0.0170 (0.016)

Externalizing Problem Behaviors

Infrequent Grouping           Quartile 1            Quartile 4            No Quartiles
Kindergarten reading (Kr)     -0.0020 (0.019)       -0.0017 (0.014)       0.0011 (0.008)
Kindergarten math (Km)        -0.0076 (0.017)       -0.0008 (0.019)       0.0036 (0.008)
First grade reading (1r)      -0.0915 (0.115)       0.1310 (0.145)        0.0707 (0.062)
First grade math (1m)         0.0120 (0.032)        0.0892** (0.040)      0.0295* (0.018)

Frequent Grouping             Quartile 1            Quartile 4            No Quartiles
Kindergarten reading (Kr)     -0.0216 (0.023)       -0.0190 (0.019)       -0.0006 (0.005)
Kindergarten math (Km)        -0.0023 (0.020)       -0.0003 (0.021)       0.0018 (0.010)
First grade reading (1r)      -0.0933 (0.113)       0.1130 (0.101)        0.0520 (0.049)
First grade math (1m)         -0.0283 (0.037)       0.0767* (0.043)       0.0301 (0.023)

Flexible Grouping             Quartile 1            Quartile 4            No Quartiles
Kindergarten reading (Kr)     0.0060 (0.022)        -0.0178 (0.020)       -0.0035 (0.008)
First grade reading (1r)      0.0407* (0.021)       -0.0068 (0.045)       0.0039 (0.023)

Internalizing Problem Behavior

Infrequent Grouping           Quartile 1            Quartile 4            No Quartiles
Kindergarten reading (Kr)     0.0126 (0.038)        -0.0234 (0.019)       -0.0107 (0.011)
Kindergarten math (Km)        0.0255 (0.016)        -0.0154 (0.020)       -0.0104 (0.007)
First grade reading (1r)      0.0123 (0.079)        0.0106 (0.028)        0.0041 (0.032)
First grade math (1m)         -0.0023 (0.033)       -0.0075 (0.027)       0.0035 (0.014)

Frequent Grouping             Quartile 1            Quartile 4            No Quartiles
Kindergarten reading (Kr)     -0.0325 (0.040)       0.0093 (0.017)        0.0092 (0.010)
Kindergarten math (Km)        0.0004 (0.019)        -0.0107 (0.019)       -0.0003 (0.009)
First grade reading (1r)      0.0212 (0.041)        0.0682** (0.030)      0.0117 (0.019)
First grade math (1m)         -0.0593 (0.039)       0.0270 (0.021)        0.0333** (0.016)

Flexible Grouping             Quartile 1            Quartile 4            No Quartiles
Kindergarten reading (Kr)     0.0047 (0.031)        -0.0273 (0.023)       -0.0123 (0.012)
First grade reading (1r)      -0.0240 (0.037)       -0.0146 (0.025)       0.0053 (0.014)

Interpersonal Skills

Infrequent Grouping           Quartile 1            Quartile 4            No Quartiles
Kindergarten reading (Kr)     -0.0194 (0.019)       0.0190 (0.019)        0.0058 (0.010)
Kindergarten math (Km)        0.0061 (0.017)        0.00917** (0.004)     0.0000 (0.004)
First grade reading (1r)      0.0585 (0.046)        -0.0459 (0.054)       -0.0432* (0.025)
First grade math (1m)         0.0029 (0.028)        -0.0383 (0.029)       -0.0220** (0.009)

Frequent Grouping             Quartile 1            Quartile 4            No Quartiles
Kindergarten reading (Kr)     -0.0220* (0.011)      0.0653*** (0.013)     0.0232*** (0.008)
Kindergarten math (Km)        -0.0295 (0.018)       0.0115 (0.023)        0.0111 (0.011)
First grade reading (1r)      0.0601 (0.072)        -0.0370 (0.058)       -0.0309 (0.030)
First grade math (1m)         0.0515 (0.031)        -0.0119 (0.045)       -0.0308** (0.015)

Flexible Grouping             Quartile 1            Quartile 4            No Quartiles
Kindergarten reading (Kr)     -0.0186 (0.011)       0.0189 (0.014)        0.0102*** (0.003)
First grade reading (1r)      -0.0238 (0.032)       0.0165 (0.090)        0.0020 (0.030)

Approaches to Learning

Infrequent Grouping           Quartile 1            Quartile 4            No Quartiles
Kindergarten reading (Kr)     -0.0028 (0.029)       0.0397** (0.017)      0.0139 (0.009)
Kindergarten math (Km)        -0.0107 (0.015)       0.0381** (0.019)      0.0154* (0.008)
First grade reading (1r)      0.0533 (0.152)        -0.1120 (0.160)       -0.0610 (0.073)
First grade math (1m)         0.0266 (0.054)        -0.0384 (0.050)       -0.0382* (0.020)

Frequent Grouping             Quartile 1            Quartile 4            No Quartiles
Kindergarten reading (Kr)     0.0158 (0.031)        0.0634*** (0.017)     0.0188*** (0.006)
Kindergarten math (Km)        -0.0406** (0.020)     0.0116 (0.019)        0.0188* (0.011)
First grade reading (1r)      0.0843 (0.144)        -0.0427 (0.126)       -0.0397 (0.059)
First grade math (1m)         0.129** (0.049)       -0.0173 (0.060)       -0.0593*** (0.014)

Flexible Grouping             Quartile 1            Quartile 4            No Quartiles
Kindergarten reading (Kr)     0.0032 (0.028)        0.0637*** (0.023)     0.0234** (0.011)
First grade reading (1r)      -0.0557 (0.116)       0.0063 (0.117)        -0.0012 (0.055)

Standard errors in parentheses.
° Kindergarten estimates are weighted using ECLS-K panel weight BYCWO. The kindergarten models also include ECLS-K strata dummy variables (BYCWSTR), and their standard errors are clustered at the ECLS-K primary sampling unit level (BYCPSU). First grade estimates are weighted using ECLS-K panel weight Y2COMWO. The first grade models also include ECLS-K strata dummy variables (Y2COMSTR), and their standard errors are clustered at the ECLS-K primary sampling unit level (Y2COMPSU). Including strata dummies and clustering standard errors at the PSU level take into account the multistage probability sample design of the ECLS-K data set.
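The significance markers in the tables above (***0.01, **0.05, *0.10) can be roughly checked from a reported coefficient and its standard error. The sketch below is only an illustration under a stated assumption: it uses a two-sided test against the standard normal distribution, whereas the tests behind the tables rest on clustered standard errors and may use a different reference distribution. The function names `two_sided_p` and `stars` are hypothetical, and because the inputs are the rounded values printed in the tables, borderline cases could differ from the published markers.

```python
import math


def two_sided_p(estimate: float, std_error: float) -> float:
    """Two-sided p-value for H0: coefficient = 0.

    Assumption (not from the paper): the ratio estimate / std_error is
    compared with a standard normal distribution.
    """
    z = abs(estimate / std_error)
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


def stars(estimate: float, std_error: float) -> str:
    """Map a p-value to the tables' markers: *** 0.01, ** 0.05, * 0.10."""
    p = two_sided_p(estimate, std_error)
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


if __name__ == "__main__":
    # Four infrequent-grouping entries from Table 8: (coefficient, standard error).
    examples = [
        (-0.0546, 0.017),  # Kr, Approaches to Learning
        (0.0203, 0.009),   # Km, Externalizing Problem Behavior
        (-0.0705, 0.042),  # 1m, Approaches to Learning
        (-0.0167, 0.015),  # Km, Self-Control
    ]
    for est, se in examples:
        print(f"{est:+.4f} ({se:.3f}) p={two_sided_p(est, se):.4f} {stars(est, se)}")
```

For these four entries the normal approximation gives the same markers as Table 8 (three stars, two stars, one star, and none, respectively).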