Agent-based Model Simulations of Open Enrollment Policies

Matt Kasman
Stanford University
March, 2014

Draft paper: please do not cite

Introduction

There is a consistent body of research indicating that a racially diverse educational environment has a number of positive benefits not only for students, but also for their communities, future employers, and society. Being exposed to a diverse set of peers in a school setting fosters tolerance for different perspectives, reduces racial prejudice, and confers stronger cooperative and critical-thinking skills (Orfield et al., 2008). These outcomes facilitate participation both in a global, information-driven economy and as citizens in a democratic society. However, given widespread residential segregation, persistent biases, legal restrictions, and political obstacles, it is unclear whether or how policymakers and other interested parties might be able to substantially increase diversity within schools and school districts. One solution that has gained political and popular traction is the introduction of large-scale school choice policies. Intra-district school choice, also known as open enrollment, has become an increasingly common feature of large urban school districts. These policies give families the option of selecting public schools for their children that differ from their default, neighborhood schools; in some cases, open enrollment policies do away with default schools and ask that every family list a set of preferred schools (i.e. “mandatory choice” policies). Because these policies explicitly decouple school attendance from residence, many have argued that open enrollment policies have the potential to increase diversity within district schools by creating opportunities for families in overwhelmingly impoverished, minority neighborhoods to access schools other than highly segregated (and often low-achieving) traditional neighborhood schools.
However, in practice the impact of open enrollment policies on diversity has been underwhelming, with many schools in large urban school districts that have implemented these policies still serving a substantial majority of students from a single racial background. In order for open enrollment policies to have the desired effect upon diversity, it is necessary to understand what might be limiting their impact and under what (if any) conditions they can succeed. The composition of schools in open enrollment districts is the result of a three-stage process. First families select a set of schools in descending order of desirability. Then, based on these selections as well as rules that determine students’ priorities for seats at schools, students are assigned to schools. Finally, parents can choose to enroll in their assigned school, request reassignment, or not enroll in the school district (e.g. enroll in a private school or relocate to a suburban school district). The effect of specific interventions such as changing priority rules, opening new schools, or making changes to a targeted set of schools will therefore be influenced by two major factors. The first is families’ decision-making processes. If large numbers of families have preferences or information that differ by race, or have preferences that are based strongly on geography or school demographics, then it could negatively affect the impact of open enrollment on diversity. The second factor is families’ responses once they have made their selections of schools for their children and have been offered placement into particular schools. It is possible that there are systematic patterns of attrition from the school district or reassignment requests that might affect diversity within schools and the district as a whole. 
Once these two factors have been explored, it will then be possible to conduct simulations that can help build intuition about trends in school diversity over time and how specific interventions might affect the racial composition of schools and districts.

Background

Previous simulations of school choice

School choice is a topic that lends itself well to study through simulation. Policymakers, school administrators, and families have discrete decision-making opportunities; the effects of school choice policies are intended to reverberate throughout a school system and to arrive as the result of institutional adjustments and shifts in the school landscape that occur over time. Therefore, it is not surprising that several papers have utilized simulations to examine the potential effects of school choice policies. Some of the more prominent simulations of school choice policies use a general equilibrium approach. Epple and Romano (1998) construct a system with households that have income and a student with a given ability, and the presence of both public schools and private schools that can set their tuitions (which can be student-specific, to include the presence of financial aid) and admission policies (minimum ability for attendance). They then explore the effects of the introduction of different voucher policies on the proportion of students in private schools as well as student achievement (which is a function of peer quality and student ability). Nechyba (2000) constructs a system with residents of three districts making decisions about where to live, whether to attend private school, and voting on local tax rates. He then conducts experiments by simulating the introduction of different voucher policies and examining the resulting levels of private school attendance, residential segregation by income, and school quality (a function of school financing, student quality and, potentially, competitive influence).
Similarly, Ferreyra (2007) uses a general equilibrium system of schools and households making decisions about school attendance to estimate the effect of introducing different voucher policies into the Chicago metropolitan area. Although these simulations provide an informative framework for simulating school choice on a large scale, the general equilibrium approach requires simplifying assumptions about how the system operates. In addition, general equilibrium models are by definition focused on identifying outcomes at a system’s equilibrium point. Two other studies attempt to overcome the drawbacks of the general equilibrium approach by constructing agent-based models that simulate the school choice process. Lauren (2004) creates a system of eight schools, giving each a quality value (which is based on average student achievement) and a location on a grid, and students with utility values indicating how highly they value school quality relative to distance, achievement values, and locations on the same grid as schools; initially, students are assigned to their closest school. During a time period, students compare their current school to a randomly chosen comparison school using their utility value and, if the comparison school is preferable and has available seats, they switch. Using this simple model, Lauren (2004) conducts two experiments, varying the capacity of schools (while holding the number of students constant) and the correlation between student location and achievement. He concludes that “slack” in the system (the number of available seats relative to number of students) is essential in order for school choice to operate effectively (i.e. for students to move between schools and for low-quality schools to become extinct), and that achievement segregation hinders its effective operation. Maroulis et al. (2010) use data from Chicago Public Schools to populate an agent-based model of students and schools.
Students and schools have physical locations on a grid corresponding to a map of Chicago. Students also have preferences for distance and school quality as well as achievement levels; schools have enrollment capacities as well as quality values that are based on a function of inherent value-added values and mean student achievement. During each time period in their model, some portion of an incoming cohort of students are designated as choosers; rather than attending their designated neighborhood schools, they will rank schools by preference and attend the highest ranked school that has an available seat. At the end of each time period, students update their achievement based on schools’ value-added, and schools update their quality, enrollment and available capacity; if they do not have enough enrolled students, schools close. Using this basic model, Maroulis et al. (2010) experiment with system conditions such as proportion of choosers and the ability of charter schools to enter into the system and explore their influence on the achievement levels of choosers relative to non-choosers. All of these general equilibrium and agent-based simulations explore the introduction of new school choice policies, with the general equilibrium models simulating the introduction of voucher programs and the agent-based models simulating open enrollment. There is only one study that uses an existing open enrollment program as a baseline and explores alternative ways of processing students’ school selections. Dur et al. (2013) explore the student assignment policy used by Boston Public Schools, which employed a deferred-acceptance student assignment algorithm that allowed for heterogeneity in student priority across seats in a given school (i.e. half of the seats give priority to applicants in a school’s “walk zone” over other applicants, while half do not).
In such a system, the order in which seats in a school are allocated (referred to as “precedence”) influences the schools to which students are assigned. Using the school selections made by students in Boston, they simulate student assignment under several hypothetical combinations of school mixtures of priority sets and precedence types (e.g. all seats that give priority to walk-zone students get filled before all those that don’t). They demonstrate that although a casual observer might assume that having half of all school seats give priority to walk-zone students would give students in schools’ walk zones a strong advantage (and thus fulfill the district’s intention to retain some element of neighborhood schooling), when this mix of seats is coupled with the precedence type employed by the district, it actually resulted in only a slight increase in the proportion of walk-zone students attending schools relative to having no seats give walk-zone students priority. Partially as a result of this finding, Boston Public Schools stopped giving any priority to applicants within school walk zones (Dur et al., 2013).

Simulating open enrollment policies

In any given year, the effects of school choice are determined by a process with three distinct phases: school selection, student assignment, and school enrollment. First, families are given the opportunity to select schools for their children. This stage comprises families’ decisions to participate in the choice process, the options that families are presented with, the factors that families consider when making decisions about schools (and the weight that they give to each factor), the information that families have about schools, and, ultimately, the selections that families make.
This phase is the most conspicuous aspect of the school choice process; every study that simulates the school choice process either models this phase explicitly (Epple and Romano, 1998; Nechyba, 2000; Lauren, 2004; Ferreyra, 2007; Maroulis et al., 2010) or includes school selections as input (Dur et al., 2013). After families have made school selections, students are then assigned to schools. When school choice is solely comprised of residential selection and decisions about whether to attend private school, this phase is fairly straightforward: students are assigned to their neighborhood school unless they apply and are admitted to a private school. Thus, simulations that explore the introduction of vouchers to this simple school choice environment need not explicitly model this phase (Epple and Romano, 1998; Nechyba, 2000; Ferreyra, 2007). However, this phase plays a salient role when school choice includes open enrollment policies (Abdulkadiroglu et al., 2005; Abdulkadiroglu et al., 2009; Toch and Aldeman, 2009; Dur et al., 2013). Dur et al. (2013) focus their study on the student assignment phase. However, the two existing agent-based model simulations of open enrollment school choice gloss over this phase (Lauren, 2004; Maroulis et al., 2010). Both models incorporate capacity constraints that limit school enrollment, but they do not address the way in which students are assigned to those limited school seats. After students are assigned to schools, families can decide how they will respond. They can enroll in the school to which they were assigned or attempt to find some alternative, such as private school, home schooling, relocating to another school district, or taking advantage of a district’s appeals process to procure another school assignment. Although this phase can have a substantial impact on school enrollment, no simulation of the school choice process includes it. 
Agent-based modeling can be used to accurately represent all three phases of the school choice process. In fact, this approach is uniquely suited to simulating dynamic processes such as school choice. The composition of schools in a district, and of the district as a whole, is determined through a complicated, multi-stage matching process that is influenced by families’ preferences for schools, school assignment rules, and families’ enrollment responses subsequent to assignment. This process is also dynamic; the result of the process in one year can influence decisions made during the next year. Agent-based modeling can be used to simulate processes where multiple agents (in the case of school choice, schools and students) repeatedly interact according to a defined set of rules and update themselves or their environment over time; the result of repeated interaction between agents can reveal the emergence of macro-level patterns (Lauren, 2004; Page, 2005; Miller and Page, 2007; Maroulis et al., 2010; Schelling, 1971). Several specific features of agent-based modeling are particularly useful for exploring the school choice process. Agent-based modeling allows for heterogeneity in schools and prospective students. Schools and students can differ on relevant characteristics and can express a variety of behaviors. Schools and students can observe one another, learn from the past, and update their behaviors during the simulation. Students can make decisions based on realistically imperfect information about schools. Agent-based models are robust to the entrance and exit of students and schools from the system. And agent-based models do not require a system to reach an equilibrium state, but can depict trends over time. Therefore, I will use an agent-based model to simulate all three phases of the school choice process in a large urban school district with an existing open enrollment policy and a long history of using open enrollment policies.
By building a model that incorporates families’ behavior, schools’ locations and characteristics, and student assignment rules over a period of several years, I will accomplish two goals. The first is to gain a general intuition about how conditions affect diversity within schools and districts. The second is to create a tool that can be used to predict the efficacy of specific interventions.

Data and Methods

Parameters used for choice and enrollment behavior

I estimate the parameters used to determine families’ behavior in my simulations using a rich set of administrative data from a large urban school district. The data include information on students, schools, school programs, school applications, school assignments, and enrollment. I focus my analyses on students entering Kindergarten in the 2009-2010 through 2012-2013 school years. I choose to focus on students entering Kindergarten because this is the point when the observable selection and enrollment processes are the simplest. Applications and enrollment for students entering middle and high schools are likely to be influenced by the presence of schools with non-standard beginning or terminal grade levels; prior school selection, transfer, and education experiences; a mix of parental and student decision-making; and a greater number of school factors that are difficult to accurately observe (e.g. the quality and types of athletic programs in schools). Families making school selections and enrollment decisions most likely consider the most recent observable school characteristics. Therefore, I aggregate student-level data on ethnicity and standardized test scores from the 2008-2009 through 2011-2012 school years (the years in which families were making decisions for the 2009-2010 through 2012-2013 school years) and supplement these with school-level variables that include school addresses and programs offered. I do not have student-level free or reduced price lunch eligibility.
However, I was able to determine the proportion of students in schools who were eligible for free or reduced price lunches by using the common core of data (NCES, 2012). Both to reflect parents’ lack of direct access to the proportion of free or reduced price lunch eligible students that schools serve and because of the data that are available, I use school levels of free and reduced price lunch eligibility from the 2007-2008 through 2010-2011 school years. In order to explore school selection, I use applications that families submitted to the district indicating ordered preferences for school programs for the 2011-2012 school year; I focus on this year because it is the school year that I use to populate school programs in my simulation (I discuss this in more detail below). The application form allows parents to list up to ten programs in decreasing order of preference. In addition, parents could also fill out supplemental application forms indicating additional preferences for school programs. Because it is likely that families who specify larger numbers of choices differ in important ways, I choose to focus my analyses on families’ first choice selections in order to prevent biased results. In the student application data that I use, each school program selection contains a “sibling priority” flag that indicates whether a prospective student has a sibling currently enrolled in a school program; the presence of siblings in school programs that families do not select remains unobserved. Approximately 30% of prospective kindergarten students apply to school programs where their siblings are enrolled. I restrict the set of students who I include in my analyses to those without observable siblings. I do this because the selection process for families with siblings already in programs is likely to not only focus on the presence of those siblings, but also on prior experiences (i.e. 
prior search and selection as well as siblings’ school experiences), and their search and selection process is unlikely to represent a generalizable formation of preferences for school programs. Using student, school, and application data, I construct a dataset where each observation is a match between a student and a plausible school program selection. All students are matched with all general education programs and all dual immersion programs (which are available to both native and non-native speakers of particular languages). Matches to bilingual education programs are restricted to native speakers of the specified language. Each observation contains variables that are associated with the match between student and program: whether that program was identified as a first choice preference for that student; the distance between a student’s home and the school where a program is located; and whether a student lives in the attendance zone of the school where a program is located. In order to examine the preferences for characteristics of school programs that are revealed when families select school programs, I employ a conditional logit model predicting the likelihood that the family of student i will select school program j as its first choice (McFadden, 1973):

$$\Pr(Y_{ij} = j \mid J_i) = \frac{e^{X_{ij}\beta + Z_{ij}\gamma}}{\sum_{j=1}^{J} e^{X_{ij}\beta + Z_{ij}\gamma}} \qquad (1)$$

where $J_i$ represents the set of school program alternatives that a family chooses from, $X_{ij}$ represents a vector of school program characteristics, and $Z_{ij}$ represents a vector of specific choice characteristics. In these models, the program characteristics that I include consist of student achievement in the school where the program is located (standardized mean of math and language arts test scores), school demographic composition, and dummy variables indicating program type.
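To make equation (1) concrete, the following is a minimal sketch of how first-choice probabilities could be computed for one student and then used to draw a simulated first choice. All numbers, the function name `choice_probabilities`, and the two-variable layout of the characteristic vectors are hypothetical illustrations; they are not the estimated coefficients reported in Appendix table A1, and the sketch is not a description of the author's own code.

```python
import math
import random


def choice_probabilities(x_rows, z_rows, beta, gamma):
    """Conditional logit probabilities over one student's choice set.

    x_rows[j] holds program characteristics and z_rows[j] holds
    choice-specific characteristics for alternative j.
    """
    utilities = [
        sum(b * x for b, x in zip(beta, xr)) + sum(g * z for g, z in zip(gamma, zr))
        for xr, zr in zip(x_rows, z_rows)
    ]
    # Subtracting the maximum leaves the probabilities unchanged
    # and avoids overflow in exp().
    top = max(utilities)
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: three programs, two program characteristics
# (standardized achievement, share of same-race students) and two
# choice-specific characteristics (distance in miles, lives in zone).
x_rows = [[0.5, 0.3], [-0.2, 0.6], [1.1, 0.2]]
z_rows = [[0.8, 1.0], [2.5, 0.0], [4.0, 0.0]]
beta = [0.4, 0.5]     # made-up coefficients
gamma = [-0.6, 0.9]   # made-up coefficients

probs = choice_probabilities(x_rows, z_rows, beta, gamma)

# Draw one simulated first choice using the probabilities as weights.
rng = random.Random(0)
first_choice = rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Because the probabilities depend only on differences in utility across alternatives, any characteristic that is constant across a student's choice set drops out, which is why the model is run separately by racial group rather than including race as a stand-alone regressor.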
The choice-specific characteristics that I include are geographic distance from a student’s home to a school program and whether a student’s family lives within the school attendance area for the school in which the program is located. I select these variables based on prior work that I have done on school selection in this district (Kasman, 2013). I run these models separately for students of each racial group that I will include in my simulations (i.e. White, Black, Hispanic, and Asian students), and report the coefficients in Appendix table A1. In order to explore enrollment decisions subsequent to assignment, I employ two simple logit models:

$$P(Y \mid X) = \frac{1}{1 + e^{-(\alpha_0 + X\beta)}} \qquad (2)$$

These models both predict the probability of a student leaving the school system after receiving an assignment. The first model only uses student race dummy variables and is estimated using students who do not make school selections. The second uses race dummy variables as well as variables that represent the differences between first choice schools and assigned schools in school achievement and percent of students in a school eligible for free and reduced price lunch; in prior work, I identified both of these relative school characteristics as significant predictors of attrition (Kasman, 2014). I report the coefficients from these models in Appendix table A2.

Data used for simulation

Using administrative data from the 2009-2010 through 2012-2013 school years from the same large urban school district, I create a dataset of prospective Kindergarten students. These data include students’ race, geographic location, and relevant geographic attributes (i.e. whether they live in a low test-score area and which school’s attendance zone they reside within). Student attributes are reported in table 1.
I also create a dataset that contains school program options offered for the 2011-2012 school year, including the following characteristics: geographic location, school achievement (standardized at the school level), two measures of value-added (discussed in further detail in Appendix A), program capacity, program demand, school demographics, and program type. Program characteristics are reported in table 2.

Baseline simulation

I run the following baseline simulation model of the school enrollment process:

1) Initialization: I populate the initial set of school programs that families will be given as options; this set is the same as the 2011-2012 school data described above. I then sample a single cohort’s worth of prospective students from the full set of students described above.

2) School choice: Using race-specific estimates of participation probability, I determine the set of students who will make school selections. Using coefficients from the conditional logit model predicting school selection with school characteristics and student race, each student who participates in the choice process makes a ranked set of school program selections.

3) Student Assignment: Based on students’ school selections and student priorities (i.e. residence within an attendance zone or a low test-score area), I run a deferred-acceptance student assignment algorithm. Students who remain unassigned as well as those who did not submit selections will be randomly sorted and assigned to the closest school with an available Kindergarten seat.

4) Student Enrollment: After receiving their assignments, students’ probabilities of enrolling in their assigned school are calculated using coefficients from the logit model predicting attrition probability using school assignment characteristics relative to first choice characteristics and race interaction terms.

5) Iteration: After students have enrolled (or left the district), schools are updated to reflect the demographics of their incoming Kindergarten cohort. Then the next year begins with the creation of a new cohort of students sampled from the full set of student data.

6) Output: After the model has run for a specified number of years, choice, assignment, and enrollment data from the simulation are saved and, using these, output metrics by year are calculated. These include: school segregation within the district (Theil’s H), attrition rates from the district (both total and disaggregated by race), and school enrollment statistics such as distance to school and school achievement levels (both total and disaggregated by race).

These metrics will allow for an observation of trends during the course of the baseline model simulation as well as comparisons between simulated runs of the school selection and enrollment process.

Experimental simulations

I alter the conditions under which my model operates to examine specific scenarios. These scenarios represent “virtual counterfactuals.” By comparing these scenarios to the baseline scenario, which represents the school enrollment trend under current policy conditions, I gain intuition about the effect that specific interventions might have on diversity within the district. First, I examine a scenario in which all families participate in the school selection process, rather than there being race-specific probabilities that students will not have school selections submitted for them. This provides a plausible upper bound for the effects of interventions that are intended to increase engagement. I then simulate full participation for Black and Hispanic students; this represents the effects of efforts to increase engagement among traditionally disadvantaged students in the district. Next, I simulate changes in the information that is provided to families about school quality.
In this scenario, I replace schools’ achievement values, which are solely based on mean scores on standardized math and language arts exams in the baseline model, with two simple value-added measures. This represents a shift in the information that is provided by the district. At present, the district publishes state-mandated school reports that include mean test scores. However, it is possible for the district to follow the example of other large urban school districts and to publish more sophisticated measures of school quality. This scenario can provide some intuition on the effect that this might have on the school choice and enrollment process. I simulate capacity increases in high-demand Kindergarten programs. Some schools consistently receive more selections than they can meet with assignments. In this scenario, I explore the effects of the district making investments in order to expand the number of available seats in these school programs. And finally, I explore changes in the priority rules used by the student assignment algorithm. At present, the district has four levels of priority for students selecting Kindergarten programs. The first tier includes children who have one or more siblings already attending a selected school. The second includes children who reside in areas designated as “low test-score zones” by the district. The third includes children who reside in schools’ attendance zones, and the fourth includes all other children. This priority structure is not the district’s first, and it is certainly possible that it will change to meet changing political pressures or practical considerations. When presented with evidence from simulations of school choice, the Boston school district recently eliminated priority for “walk zone” students in its student assignment algorithm (Dur et al., 2013). 
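The interaction between the four priority tiers described above and a deferred-acceptance assignment can be sketched as follows. This is an illustrative toy under stated assumptions, not the district's or the author's implementation: the text does not say which side proposes or how ties within a tier are broken, so the sketch assumes students propose and ties are broken by a random lottery number. The names `deferred_acceptance`, `make_tier`, and `STUDENTS`, the three students, and the two programs are all hypothetical, and programs are identified with schools for simplicity.

```python
import random


def deferred_acceptance(preferences, capacity, tier, lottery):
    """Student-proposing deferred acceptance (assumed variant).

    preferences: {student: [program, ...]} in decreasing order of preference
    capacity:    {program: number of seats}
    tier:        function (student, program) -> int, lower means higher priority
    lottery:     {student: float}, assumed tie-breaker within a tier
    """
    next_choice = {s: 0 for s in preferences}
    held = {p: [] for p in capacity}
    free = list(preferences)
    while free:
        student = free.pop()
        ranked = preferences[student]
        if next_choice[student] >= len(ranked):
            continue  # list exhausted; student stays unassigned
        program = ranked[next_choice[student]]
        next_choice[student] += 1
        held[program].append(student)
        held[program].sort(key=lambda s: (tier(s, program), lottery[s]))
        if len(held[program]) > capacity[program]:
            # Tentative acceptances are revocable: bump the lowest-priority holder.
            free.append(held[program].pop())
    assignment = {s: None for s in preferences}
    for program, students in held.items():
        for s in students:
            assignment[s] = program
    return assignment


# Hypothetical students: "a" lives in the north zone, "b" lives in a
# low test-score zone, "c" has a sibling at the south school.
STUDENTS = {
    "a": {"sibling_at": None, "low_score_zone": False, "zone": "north"},
    "b": {"sibling_at": None, "low_score_zone": True, "zone": "south"},
    "c": {"sibling_at": "south", "low_score_zone": False, "zone": "north"},
}


def make_tier(use_low_score_zone=True):
    """Build the four-tier priority rule; optionally drop the second tier."""
    def tier(student, program):
        info = STUDENTS[student]
        if info["sibling_at"] == program:
            return 1
        if use_low_score_zone and info["low_score_zone"]:
            return 2
        if info["zone"] == program:
            return 3
        return 4
    return tier


preferences = {"a": ["north", "south"], "b": ["north", "south"], "c": ["south", "north"]}
capacity = {"north": 1, "south": 2}
rng = random.Random(0)
lottery = {s: rng.random() for s in STUDENTS}

with_priority = deferred_acceptance(preferences, capacity, make_tier(True), lottery)
without_priority = deferred_acceptance(preferences, capacity, make_tier(False), lottery)
```

In this toy case the single north seat goes to student "b" under the four-tier rule and to student "a" once low test-score zone priority is removed, which is the kind of shift a change in priority rules can produce. Students left with `None` would, per the baseline model description, be placed in the closest school with an available seat; that step is omitted here.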
The priority-rule scenario follows in a similar spirit, and explores whether eliminating priority for low test-score zone residence might have an impact on the school choice and enrollment process in the district. Both the baseline model and experimental simulation conditions are discussed in greater detail in the appendix section.

Results

Testing Model Validity

Before I explore trends in a full run of the baseline simulation or compare the baseline to simulations under experimental conditions, I first want to examine whether and to what extent those analyses have external validity. In order to do this, I create a version of my simulation that uses data for the students who were assigned Kindergarten seats for the 2011-2012 school year (i.e. reading in this pre-specified set of students instead of a fictional, sampled cohort). I run my baseline simulation for one year, effectively simulating the selection, assignment, and enrollment processes for that school year. I then compare my simulated cohort’s selections, assignments, and enrollment to those made by the actual cohort of students. The results are shown in figures 1 through 9. The distributions of selections, assignments, and attrition subsequent to assignment appear qualitatively very similar to those of the actual cohort. Thus, despite the simplifying assumptions that I make when constructing this abstract model, it appears to be a suitably accurate facsimile of the real-world process, and the results of analyses that use it should have meaningful validity.

Trends in Baseline Simulation

I observe enrollment trends during a ten-year run of my baseline simulation, disaggregated by race. Figure 10 shows trends in school achievement levels enrolled in by students of different races. Overall, there is not much change during the course of the simulation.
The largest change is for Black students, who tend to enroll in slightly lower-achieving schools at the end of the simulation than at the start, although the trend appears to be cyclical in nature. Figure 11 shows trends in distances to schools that students enroll in. Again, there is not a dramatic change in these distances over the course of the simulation. The most notable trend is an overall increase of about a quarter mile in distances between where Black students live and the schools in which they enroll. Figure 12 shows trends in the percentage of students of the same race attending the schools in which students enroll; there is a moderate increase in this for White students that levels out at around year 7, and slight decreases for Black and Asian students. Figure 13 shows trends in the percentage of FRPL eligible students who attend the schools in which students enroll; there is a slight decrease in this for White students and a slight increase for other students. Finally, figure 14 shows trends in attrition rates. As they do in the real world, attrition rates fluctuate a bit from year to year. However, in this simulation, I observe an overall moderate decrease in attrition for White students and an increase for Black and Hispanic students.

Comparing the baseline simulation to experimental simulations

One aspect of racial segregation in schools is a glaring disparity in the achievement levels of schools attended by different children. Along with long-standing patterns of racial isolation in its schools, this school district has experienced persistent gaps in the achievement levels of the schools attended by Black and Hispanic students and White and Asian students, with White and Asian students consistently attending higher-achieving schools. My baseline simulation indicates that this pattern will persist without intervention.
Therefore, I examine whether any of the experimental conditions that I run my simulation under can alleviate these enrollment patterns. Figure 15 shows the gap in enrolled school achievement levels at the end of each of my simulations. Relative to the baseline simulation, the gap increased slightly under full choice participation for Black and Hispanic students and under the removal of assignment priority for low test-score zone residents. However, the gap narrowed moderately when families were provided with school value-added information in place of school achievement levels.

Next, I turn to the effect that policy conditions have on racial diversity in district schools. Specifically, I examine how policy conditions affect the number of predominantly single-race schools (either over 60% single-race or over 80% single-race) and Theil’s H, an index of racial segregation in the district that equals 0 when the racial composition of every school matches that of the district as a whole and 1 under conditions of total racial segregation (Iceland, 2004; Reardon and Firebaugh, 2002). Figure 16 shows these values at the end of each simulation. I find that full school choice participation yields the greatest increase in racial diversity in district schools relative to the baseline simulation; this intervention reduces the number of schools over 60% single-race from 32 to 27, the number of schools over 80% from 10 to 8, and the segregation index from .241 to .226. Conversely, removing assignment priority for low test-score zone residents results in the largest decrease in racial diversity in district schools; at the end of this simulation, I observe 34 schools that are over 60% single-race, 11 that are over 80%, and a segregation index value of .255.
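As a worked illustration of the segregation index used here, the following Python sketch computes Theil’s H from school-by-race enrollment counts, using the standard multigroup entropy formulation (Iceland, 2004; Reardon and Firebaugh, 2002). The function names and the toy enrollment counts are hypothetical and are not taken from the simulation code.

```python
import math


def entropy(counts):
    """Shannon entropy (natural log) of a list of group counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    e = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            e -= p * math.log(p)
    return e


def theil_h(schools):
    """Theil's H for a list of schools, each a list of counts per racial group.

    H = sum_j (t_j / T) * (E - E_j) / E, where E is the district entropy,
    E_j the entropy of school j, t_j its enrollment, and T total enrollment.
    """
    n_groups = len(schools[0])
    district = [sum(s[g] for s in schools) for g in range(n_groups)]
    total = sum(district)
    e_district = entropy(district)
    if e_district == 0:
        # Assumption: a single-group district is treated as H = 0 here,
        # although the index is strictly undefined in that case.
        return 0.0
    h = 0.0
    for s in schools:
        t = sum(s)
        h += (t / total) * (e_district - entropy(s)) / e_district
    return h


# Hypothetical two-group examples.
even = theil_h([[50, 50], [50, 50]])         # every school mirrors the district
segregated = theil_h([[100, 0], [0, 100]])   # no school is shared
mixed = theil_h([[80, 20], [20, 80]])        # partial segregation
```

The first two toy cases reproduce the endpoints described in the text (0 and 1), and the third falls between them.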
Finally, I examine whether any of the policy changes that I simulate have an effect on the overall racial composition in the district through effects on attrition rates. District composition at the end of each simulation is shown in figure 17. I find that the policy conditions that I explore do not seem to have a substantial influence on racial composition in the district.

Discussion

My baseline simulation depicts trends in enrollment absent any changes in policy. I find that there is a large amount of stability in enrollment patterns over a ten-year period. Although this is not an exciting finding on its face, it does increase my confidence in the accuracy of my simulations. The district has a long history of intra-district choice. The composition of schools in the 2010-2011 school year, which I use as the starting point for my simulations, is the result of this history; to the extent that families sort themselves into schools through a combination of residential and school choices, they have already done so at the outset of my analysis. Thus, I would not reasonably expect to see drastic changes to patterns of enrollment during the course of my baseline simulation.

The policy interventions that I simulated did have small to moderate effects on enrollment patterns in the district; the sizes of these effects seem plausible given that I only simulated modest policy changes. I found that getting all families in the district to engage in the school choice process has the largest positive impact on diversity in the district (especially with respect to the number of predominantly single-race schools, a metric that the district is concerned about) and that replacing information given to families about school achievement levels with school value-added measures causes the largest reduction in the gap between the achievement levels of the schools that White and Asian students enroll in and the schools that Black and Hispanic students enroll in.
I was pleasantly surprised by both of these findings, and hopefully both prove to be useful for the school district. This paper demonstrates that it is possible to create plausibly realistic agent-based model simulations of the entire school choice process, including families’ selections of schools, student assignment into schools, and families’ enrollment decisions. My simulations are capable of generating outcomes that match those observed in the real world, despite the fact that these simulations abstract away many of the complexities of school choice in this district. Thus, it is possible to use these simulations as a tool to conduct causal analyses of school choice. I chose to do so here by experimenting with policy changes that the district might consider implementing, thus gaining some intuition about what the effects of these interventions would be. Changes to school choice policies have the potential to not only carry financial costs, but also to be fraught with political conflict. Families tend to have strong opinions about where their children should go to school, and within the public school system open enrollment policies give families the freedom to express those opinions with their selections rather than through residential decisions alone. However, these policies also create a greater degree of uncertainty for families during the student assignment process. The justifiable importance that families place on where their children will be assigned coupled with this uncertainty causes quite a bit of attention to be focused on the details of open enrollment policies. Therefore, I believe that using agent-based model simulations to explore potential changes such as changing student priority, expanding specific programs, or opening new ones is a boon to districts. 
They will be able to determine the most cost-effective strategies before suggesting them to the public or attempting to implement them, and be able to point to evidence to support their decisions in the face of opposition from parents who feel that their interests are being threatened. It is also possible to use this tool to test other hypotheses about school choice that would otherwise be difficult or impossible to explore. For example, one can examine how particular sets of preferences shape enrollment in a district, or how the relationships between poverty, race, and achievement influence the outcomes of school choice policies. I intend to pursue these sorts of questions using agent-based model simulations in future work, and invite other scholars to do the same.

References

Dur, U. M., Kominers, S. D., Pathak, P. A., & Sönmez, T. (2013). The demise of walk zones in Boston: Priorities vs. precedence in school choice (No. w18981). National Bureau of Economic Research.
Epple, D., & Romano, R. (1998). Competition between private and public schools, vouchers, and peer-group effects. American Economic Review, 88(1), 33-62.
Ferreyra, M. (2007). Estimating the effects of private school vouchers in multidistrict economies. American Economic Review, 97(3), 789-817.
Iceland, J. (2004). The multigroup entropy index (also known as Theil’s H or the information theory index). US Census Bureau. Retrieved from http://www.census.gov/housing/patterns/about/multigroup_entropy.pdf
Kasman, M. (2013). How Families Choose Schools: What We Can Learn From School Application Data. Working paper.
Kasman, M. (2014). Enrollment Responses to Student Assignment in a Mandatory Choice School District. Working paper.
Lauren, D. (2004). An agent based modeling approach to school choice. Working Paper, University of Chicago.
Maroulis, S., Bakshy, E., Gomez, L., & Wilensky, U. (2010). An Agent-Based Model of Intra-District Public School Choice.
Downloaded November 17, 2013 from http://ccl.northwestern.edu/papers/choice.pdf.
Miller, J. H., & Page, S. E. (2009). Complex Adaptive Systems: An Introduction to Computational Models of Social Life. Princeton University Press.
NCES (2012). Common Core of Data. Retrieved from http://nces.ed.gov/ccd/.
Nechyba, T. (2000). Mobility, targeting, and private-school vouchers. American Economic Review, 90(1), 130-146.
Orfield, G., Frankenberg, E., & Garces, L. M. (2008). Statement of American social scientists of research on school desegregation to the US Supreme Court in Parents v. Seattle School District and Meredith v. Jefferson County. The Urban Review, 40(1), 96-136.
Page, S. E. (2005). Agent based models. The New Palgrave Dictionary of Economics. Palgrave MacMillan, New York.
Reardon, S. F., & Firebaugh, G. (2002). Measures of multigroup segregation. Sociological Methodology, 32(1), 33-67.
Schelling, T. (1971). Dynamic Models of Segregation. Journal of Mathematical Sociology, 1(May), 143-186.
Tables and Figures

Table 1: Summary statistics for students used in simulation

% Black                            10.1%
% Hispanic                         27.3%
% White                            22.9%
% Asian                            39.7%
% Reside in Low Test Score Area    21.6%
N                                  16,955

Table 2: Initial summary statistics for school programs in simulation

School % Black                                       12.692 (15.284)
School % White                                       13.081 (14.096)
School % Hispanic                                    31.018 (25.844)
School % Asian                                       43.209 (27.65)
School % FRPL eligible                               64.189 (20.178)
Standardized school achievement                      -0.069 (0.962)
Basic school value-added measure                     0.041 (0.542)
More sophisticated school value-added measure        0.029 (0.596)
Immersion Program                                    0.131
Bilingual Program                                    0.262
Spanish Program                                      0.206
Program capacity                                     41.234 (21.213)
Program Demand (first choice selections per seat)    0.998 (0.684)
N                                                    107

Figure 1: Comparing selections for simulated 2011-2012 cohort to actual cohort
[Figure: Distance to First Choice School by race, for Fall 2011 Kindergarten cohort. Panels: Simulated Cohort, Actual Cohort. Y-axis: Distance (in Miles); x-axis: Black, Hispanic, Asian, White. Note: choices made for students without siblings, submitted by application deadline.]

Figure 2: Comparing selections for simulated 2011-2012 cohort to actual cohort
[Figure: Average Achievement in First Choice School by race, for Fall 2011 Kindergarten cohort. Panels: Simulated Cohort, Actual Cohort. Y-axis: Standardized Achievement (Math and ELA); x-axis: Black, Hispanic, Asian, White. Note: choices made for students without siblings, submitted by application deadline.]

Figure 3: Comparing selections for simulated 2011-2012 cohort to actual cohort
[Figure: Percentage of Same Race Students in First Choice School by race, for Fall 2011 Kindergarten cohort. Panels: Simulated Cohort, Actual Cohort. Y-axis: % Same Race; x-axis: Black, Hispanic, Asian, White. Note: choices made for students without siblings, submitted by application deadline.]

Figure 4: Comparing selections for simulated 2011-2012 cohort to actual cohort
[Figure: Percentage of FRPL Eligible Students in First Choice School by race, for Fall 2011 Kindergarten cohort. Panels: Simulated Cohort, Actual Cohort. Y-axis: % FRPL; x-axis: Black, Hispanic, Asian, White. Note: choices made for students without siblings, submitted by application deadline.]

Figure 5: Comparing assignment in simulated 2011-2012 cohort to actual cohort
[Figure: Distance to Assigned School by race, for Fall 2011 Kindergarten cohort. Panels: Simulated Cohort, Actual Cohort. Y-axis: Distance (in Miles); x-axis: Black, Hispanic, Asian, White. Note: assignments based on choices submitted by application deadline.]

Figure 6: Comparing assignment in simulated 2011-2012 cohort to actual cohort
[Figure: Average Achievement in Assigned School by race, for Fall 2011 Kindergarten cohort. Panels: Simulated Cohort, Actual Cohort. Y-axis: Standardized Achievement (Math and ELA); x-axis: Black, Hispanic, Asian, White. Note: assignments based on choices submitted by application deadline.]

Figure 7: Comparing assignment in simulated 2011-2012 cohort to actual cohort
[Figure: Percentage of Same Race Students in Assigned School by race, for Fall 2011 Kindergarten cohort. Panels: Simulated Cohort, Actual Cohort. Y-axis: % Same Race; x-axis: Black, Hispanic, Asian, White. Note: assignments based on choices submitted by application deadline.]

Figure 8: Comparing assignment in simulated 2011-2012 cohort to actual cohort
[Figure: Percentage of FRPL Eligible Students in Assigned School by race, for Fall 2011 Kindergarten cohort. Panels: Simulated Cohort, Actual Cohort. Y-axis: % FRPL; x-axis: Black, Hispanic, Asian, White. Note: assignments based on choices submitted by application deadline.]

Figure 9: Comparing attrition in simulated 2011-2012 cohort to actual cohort
[Figure: Attrition from District by race, for Fall 2011 Kindergarten cohort. Panels: Simulated Cohort, Actual Cohort. Y-axis: % by race; x-axis: Black, Hispanic, Asian, White. Panel annotations: overall attrition 12.2% and 11.9%. Note: students without siblings.]

Figure 10: Trends in standardized school achievement levels of enrolled schools in baseline simulation
[Figure: Average Achievement in Enrolled School by race and year, baseline simulation. X-axis: Year (1 to 10); series: Black, Hispanic, Asian, White.]

Figure 11: Trends in distance to enrolled schools in baseline simulation
[Figure: Mean Distance to Enrolled School by race and year, baseline simulation. Y-axis: Distance; x-axis: Year (1 to 10); series: Black, Hispanic, Asian, White.]

Figure 12: Trends in racial composition of enrolled schools in baseline simulation
[Figure: Mean % Same Race in Enrolled School by race and year, baseline simulation. Y-axis: % Same Race; x-axis: Year (1 to 10); series: Black, Hispanic, Asian, White.]

Figure 13: Trends in FRPL percentage in enrolled schools in baseline simulation
[Figure: Mean % FRPL Eligible in Enrolled School by race and year, baseline simulation. Y-axis: % FRPL; x-axis: Year (1 to 10); series: Black, Hispanic, Asian, White.]

Figure 14: Trends in attrition from district in baseline simulation
[Figure: Attrition from District by race and year, baseline simulation. Y-axis: % of Students; x-axis: Year (1 to 10); series: Black, Hispanic, Asian, White.]

Figure 15: Comparing racial gaps in standardized school achievement levels across simulations
[Figure: Gap in Average Achievement in Enrolled Schools between Black/Hispanic and White/Asian students, by simulation, at year 10 of simulation. Bars: Baseline Simulation, Full Participation, Full Black and Hispanic Participation, Double Capacity in Highest Demand Programs, Simple Value-added, More Complex Value-added, No Priority for Low Test Score Zone Residents.]

Figure 16: Comparing racial isolation (measured using predominantly single-race school counts and Theil’s H) across simulations
[Figure: Comparing Racial Isolation across simulated scenarios, at year 10 of simulation. Stacked bars: # Schools > 60% and < 80%, # Schools > 80%, each bar labeled with its H value. Bars: Baseline Simulation, Full Participation, Full Black and Hispanic Participation, Double Capacity in Highest Demand Programs, Simple Value-added, More Complex Value-added, No Priority for Low Test Score Zone Residents.]

Figure 17: Comparing racial composition of the district across simulations
[Figure: Racial Composition of District across simulated scenarios, at year 10 of simulation. Y-axis: % of District. Values shown in the stacked bars:]

Scenario                                         % Black   % Hispanic   % Asian   % White
Baseline Simulation                              10.20     28.45        40.74     20.62
Full Participation                               10.30     27.51        40.87     21.32
Full Black and Hispanic Participation            10.84     29.02        40.09     20.05
Double Capacity in Highest Demand Programs        9.95     28.10        41.66     20.29
Simple Value-added                               10.50     27.75        41.06     20.69
More Complex Value-added                         10.47     28.48        41.11     19.94
No Priority for Low Test Score Zone Residents    10.00     27.97        42.14     19.89

Appendix: technical description of agent-based model simulation

Initialization

At the start of each model run, school programs are based on data for the programs that were explicitly offered to parents as options for the 2011-2012 school year (i.e. excluding special education programs). I adjust school compositions to solely reflect the proportions of Black, Hispanic, White, and Asian students served by each school. For programs in the one school with missing FRPL eligibility data, I calculate the percent of FRPL eligible students using the same parameters that I use to update FRPL percentages at the end of each simulated year (discussed in more detail below).
At the start of each model run, each school is fully populated; each simulated school program enrolls Kindergarten through fifth grade at the Kindergarten capacity level (thus abstracting away from non-standard grade offerings and later transfers, attrition, or enrollees), and every grade for every school program has racial compositions and achievement levels that match school-level values. Next, I sample a cohort of students from the full set of students.[1] I assign each student an “achievement” value using race-specific distributions of standardized averages of math and ELA tests given to students in the 2nd grade (i.e. the earliest grade in which students are tested) in the 2008-2009 through 2012-2013 school years.[2]

[1] The full set of students is comprised of all Black, Hispanic, Asian, and White students from the 2009-2010 through 2012-2013 school years for whom I have geographic data. A cohort consists of 4600 students, which is approximately the number of students who received assignments for the 2011-2012 school year.
[2] These distributions are as follows: Black (mean = -0.712, s.d. = 0.859); Hispanic (mean = -0.582, s.d. = 0.832); Asian (mean = 0.351, s.d. = 0.876); White (mean = 0.497, s.d. = 0.900).

School choice

The probability of each student participating in the choice process is calculated based on observed race-specific participation rates for students who received a school assignment.[3] Each student who participates considers every plausible school program option (i.e. all programs except for bilingual education programs, which are only considered by students who are native speakers of the program language). The probabilities of students choosing each program are calculated using the parameters presented in Appendix table A1, which were constructed from actual first choice selections. These probabilities are then used to sample up to ten unique, ranked selections for each student.
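The selection step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the simulation's actual code: it assumes that choice probabilities take a conditional-logit (softmax) form, and it obtains unique, ranked selections by drawing programs one at a time without replacement, in proportion to those probabilities. The coefficient values, program attributes, and function names are all hypothetical placeholders; they are not the estimates in Appendix table A1.

```python
import math
import random

# Hypothetical coefficients for one racial group (illustrative values only).
COEFS = {"distance": -1.0, "in_zone": 1.5, "achievement": 0.5}


def utility(program, coefs=COEFS):
    """Linear index for one program from its (hypothetical) attributes."""
    return sum(coefs[k] * program[k] for k in coefs)


def choice_probabilities(utilities):
    """Softmax over programs; assumes a conditional-logit form."""
    m = max(utilities.values())  # subtract the max for numerical stability
    expu = {k: math.exp(u - m) for k, u in utilities.items()}
    z = sum(expu.values())
    return {k: v / z for k, v in expu.items()}


def sample_ranked_selections(probs, max_selections=10, rng=None):
    """Draw up to max_selections unique programs, in ranked order."""
    rng = rng or random.Random()
    remaining = dict(probs)
    ranked = []
    while remaining and len(ranked) < max_selections:
        programs = list(remaining)
        weights = [remaining[p] for p in programs]
        pick = rng.choices(programs, weights=weights, k=1)[0]
        ranked.append(pick)
        del remaining[pick]  # a program can be selected only once
    return ranked


# Hypothetical programs available to one student.
programs = {
    "A": {"distance": 0.5, "in_zone": 1, "achievement": 0.2},
    "B": {"distance": 2.0, "in_zone": 0, "achievement": 1.0},
    "C": {"distance": 4.0, "in_zone": 0, "achievement": -0.5},
}
probs = choice_probabilities({name: utility(p) for name, p in programs.items()})
selections = sample_ranked_selections(probs, rng=random.Random(0))
```

Because each draw removes the selected program from the remaining set, a student with fewer than ten plausible options simply ranks all of them.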
[3] Observed participation rates for school years 2009-2010 through 2012-2013: Black (89.9%); Hispanic (93.0%); Asian (93.7%); White (88.3%).

Student assignment

I calculate priority numbers for each student’s potential selections. These are constructed using a hierarchy consisting of low test-score zone residency, school attendance zone residency, and a randomly generated “lottery number” for each student (e.g. a student who lives in a low test-score zone would have greater priority than one who does not live in such a zone but has a better lottery number).[4]

[4] I do not include sibling priority, as keeping track of (or accurately generating siblings for) students in my simulated Kindergarten cohorts would be prohibitively difficult.

Using these priority values and school selections, I run a deferred-acceptance assignment algorithm as follows:

1) Initially, all students are temporarily assigned to their first choice program in order of their priority numbers for those programs and constrained by capacity in those programs.
2) Unassigned students are considered for temporary assignment in their second choice programs (if they have made a second choice) along with all other students who are either temporarily assigned to those programs or are unassigned and for whom those programs are their second choices; limited by program capacity, students gain or retain temporary assignment in order of priority number (e.g. a student can be temporarily assigned to her second choice program, “bumping” another student who had listed it as her first choice).
3) Step 2 is repeated, with students incrementally going through their available selections, potentially displacing temporarily assigned students, until either all students have been temporarily assigned or until all unassigned students have gone through their school program selections. At this point, temporary assignments are finalized.
4) Unassigned students are randomly ordered and given assignments in the general education (i.e.
not language immersion or bilingual education) program for their attendance zone schools or, if those programs are at capacity, the geographically closest available general education programs.

Enrollment decisions

I calculate the probabilities that assigned students will attrite from the district using the coefficients reported in Appendix table A2. Available program capacity is then updated based on attrition. Remaining unassigned students are then assigned to programs using the same procedure as step 4 of the student assignment algorithm, and then these newly assigned students make their enrollment decisions.

Iteration

I update school characteristics after all enrollment decisions are made. Students currently in fifth grade “graduate” from my simulated schools while the other cohorts in these schools are promoted and the Kindergarten class enters. I then calculate new values for school-level achievement and racial composition using students’ race and achievement values. Because FRPL eligibility is not stored at the student level in district data, I estimate the percentage of FRPL eligible students entering each Kindergarten program using the racial composition of entering Kindergarten students before aggregating the percentage of FRPL eligible students to the school level.[5] At the start of the next simulated year, a new cohort of students is sampled, and these students observe the updated schools; the simulation continues for ten years. During this time, detailed data on students, schools, choices, assignment, and enrollment are saved for analysis.

Experiments

In addition to a “baseline” version of the simulation, I run the simulation under six alternative “experimental” conditions:

1) Full participation: all students participate in the choice process.
2) Full Black/Hispanic participation: all Black/Hispanic students participate in the choice process; participation rates for White and Asian students remain at baseline levels.
3) Basic school value-added information provided to families: I calculate average between-year gains in math and ELA test scores for students in schools, standardize these across schools, and then use these values in place of the initial student achievement values given to simulated schools.[6] Families in the simulation then observe these values (which do not update during the simulation) instead of student achievement when making school selections and enrollment decisions.
4) More sophisticated school value-added information provided to families: I predict the same student gain scores that were used in the previous experiment with the following model:

$$Y_{igys} = \beta_0 + X_i \beta_1 + \gamma_g + \lambda_y + \theta_s + \varepsilon_{igys} \qquad \text{(A1)}$$

where $X_i$ represents racial dummy variables, $\gamma_g$ represents grade dummy variables, $\lambda_y$ represents year dummy variables, and $\theta_s$ are school fixed effects that are saved out as value-added measures and then standardized. As above, families use these values for the duration of the simulation in place of school achievement levels.
5) Increase capacity in high-demand programs: I calculate demand for programs as the number of students who selected each program as their first choice for the 2011-2012 school year divided by its capacity (i.e. demand per seat). In this simulation, I double the capacity in the five highest-demand programs.
6) Remove low test-score zone priority: During the assignment phase, I calculate priority numbers for programs only using attendance zone residency followed by a student’s randomly generated lottery number.

[5] I use school-level data from 2008-2009 through 2011-2012 school years to do this. The results of the regression that I use are reported in Appendix table A3.
[6] I use student test scores from grades 2 through 5 during the 2008-2009 through 2010-2011 school years.
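To make the student assignment procedure (steps 1 through 3 of the algorithm above) and experimental condition 6 more concrete, the following Python sketch implements a student-proposing deferred-acceptance routine with a priority hierarchy. It is a minimal illustration under stated assumptions, not the simulation's actual code: the field names (`low_zone`, `zone_program`, `lottery`), the `use_low_zone` switch, and the toy data are hypothetical, and lottery numbers are assumed to be unique so that priorities are strict. The sketch processes one proposal at a time rather than in rounds; with strict priorities, deferred acceptance reaches the same final matching either way. Students who exhaust their selections are left unassigned for the default placement in step 4.

```python
def priority_key(student, program, use_low_zone=True):
    """Lower tuples mean higher priority: low test-score zone residency,
    then attendance zone residency, then lottery number."""
    low = 0 if (use_low_zone and student["low_zone"]) else 1
    zone = 0 if student["zone_program"] == program else 1
    return (low, zone, student["lottery"])


def deferred_acceptance(students, choices, capacity, use_low_zone=True):
    """students: name -> attributes; choices: name -> ranked program list;
    capacity: program -> seats. Returns name -> program for placed students."""
    next_choice = {s: 0 for s in students}
    held = {p: [] for p in capacity}   # temporary assignments per program
    unassigned = list(students)
    while unassigned:
        s = unassigned.pop()
        prefs = choices.get(s, [])
        if next_choice[s] >= len(prefs):
            continue                   # selections exhausted; see step 4
        p = prefs[next_choice[s]]
        next_choice[s] += 1
        held[p].append(s)
        held[p].sort(key=lambda x: priority_key(students[x], p, use_low_zone))
        if len(held[p]) > capacity[p]:
            unassigned.append(held[p].pop())  # bump the lowest-priority student
    return {s: p for p, names in held.items() for s in names}


# Hypothetical toy data: three students, two one-seat programs.
students = {
    "A": {"low_zone": True, "zone_program": None, "lottery": 3},
    "B": {"low_zone": False, "zone_program": None, "lottery": 1},
    "C": {"low_zone": False, "zone_program": None, "lottery": 2},
}
choices = {"A": ["X"], "B": ["X", "Y"], "C": ["Y"]}
capacity = {"X": 1, "Y": 1}

baseline = deferred_acceptance(students, choices, capacity, use_low_zone=True)
no_priority = deferred_acceptance(students, choices, capacity, use_low_zone=False)
```

In this toy case, student A holds the seat in program X only while low test-score zone priority is in force; once it is removed, B's better lottery number takes precedence and A is left for default placement.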
Appendix tables

Appendix Table A1: race-specific predictions of first choice school programs

VARIABLES                                        White Students        Asian Students       Hispanic Students    Black Students
Distance to School                               -1.155*** (-17.38)    -1.048*** (-21.43)   -1.078*** (-19.64)   -0.646*** (-10.42)
Reside in School Attendance Zone                 1.660*** (14.30)      1.393*** (14.07)     1.121*** (9.678)     1.338*** (7.117)
Standardized School Achievement (Math and ELA)   0.859*** (6.676)      0.411*** (5.040)     0.717*** (6.110)     -0.0128 (-0.0737)
School % Black                                   0.0148 (1.617)        -0.0157 (-1.627)     0.00530 (0.519)      0.0592*** (5.542)
School % Hispanic                                0.0167* (2.286)       0.00998 (1.568)      0.0324*** (3.925)    0.0371*** (3.404)
School % Asian                                   -0.000260 (-0.0508)   0.0282*** (5.337)    0.00741 (1.137)      0.0335*** (3.890)
School % FRPL Eligible                           -0.0499*** (-8.845)   -0.0282*** (-6.176)  -0.0220*** (-3.792)  -0.0515*** (-6.364)
Immersion Program                                -1.384*** (-5.205)    -0.406* (-2.127)     -2.063*** (-3.919)   -2.086*** (-4.584)
Bilingual Program                                -0.910 (-0.850)       1.226*** (12.10)     -2.348*** (-4.368)   -9.800*** (-9.739)
Spanish Program                                  2.132*** (6.824)      -0.625+ (-1.812)     3.446*** (6.512)     1.871*** (3.704)
Observations                                     55,770                82,902               70,886               26,347

Robust z-statistics in parentheses
*** p<0.001, ** p<0.01, * p<0.05, + p<0.1

Appendix Table A2: predicting probability of attrition subsequent to assignment

VARIABLES                                                   (1)                  (2)
Black Student                                               -1.517*** (-5.680)   -0.765*** (-7.186)
Hispanic Student                                            -1.947*** (-8.634)   -0.642*** (-8.571)
Asian Student                                               -0.950*** (-3.983)   -0.992*** (-13.65)
Difference in School Achievement (assigned-first choice)                         -0.296*** (-5.842)
Difference in School % FRPL (assigned-first choice)                              0.0178*** (7.928)
Constant                                                    2.599*** (13.50)     -2.062*** (-38.82)
Observations                                                1,266                16,211

Robust z-statistics in parentheses
*** p<0.001, ** p<0.01, * p<0.05, + p<0.1

Appendix Table A3: predicting school % FRPL with racial composition

VARIABLES            (1)
School % Black       1.274*** (0.0854)
School % Hispanic    1.245*** (0.0826)
School % Asian       1.017*** (0.0936)
Constant             -29.20*** (6.766)
Observations         278
R-squared            0.736
Root MSE             11.169

Robust standard errors in parentheses; schools weighted by enrollment
*** p<0.001, ** p<0.01, * p<0.05, + p<0.1
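As an illustration of how the regression in Appendix Table A3 can be used to update FRPL percentages, the following Python sketch applies the reported coefficients as a linear prediction from racial composition. The function name, the example composition, and the clipping of predictions to the 0 to 100 range are assumptions made for this sketch; the paper does not state how out-of-range predictions are handled.

```python
# Coefficients as reported in Appendix Table A3 (% White is the omitted group).
A3 = {"const": -29.20, "black": 1.274, "hispanic": 1.245, "asian": 1.017}


def predict_frpl(pct_black, pct_hispanic, pct_asian, clip=True):
    """Predicted % FRPL eligible from racial composition (percent units)."""
    pred = (A3["const"]
            + A3["black"] * pct_black
            + A3["hispanic"] * pct_hispanic
            + A3["asian"] * pct_asian)
    if clip:
        # Assumption: keep predictions within a valid percentage range.
        pred = min(100.0, max(0.0, pred))
    return pred


# Hypothetical entering class: 10% Black, 30% Hispanic, 40% Asian, 20% White.
# -29.20 + 12.74 + 37.35 + 40.68 = 61.57
example = predict_frpl(10, 30, 40)
```

The worked arithmetic in the comment gives a predicted FRPL share of about 61.6% for this hypothetical composition, which is close to the mean school FRPL percentage reported in Table 2.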