Agent-based Simulation Models of the College Sorting Process

advertisement
1
Agent-based Simulation Models of the College Sorting Process
Sean F. Reardon, Matt Kasman, Daniel Klasik, and Rachel Baker
Stanford University
DRAFT LAST REVISED: MARCH 2, 2013
PLEASE DO NOT CITE WITHOUT PERMISSION OF THE AUTHORS
Direct correspondence to:
Sean F. Reardon
Stanford University
Center for Education Policy Analysis
520 Galvez Mall, #526
Stanford, CA 94305
sean.reardon@stanford.edu
2
Abstract
In this paper, we explore how dynamic processes operate to sort students into, and
create stratification between, colleges. We use a highly stylized agent-based model to
simulate sorting processes and to explore how factors related to family resources might
influence college application choices and patterns of stratification in college enrollment.
Specifically, our model includes two types of “agents”—students and colleges—to simulate
a two-way matching process with three basic stages that occur during each time period:
application, admission, and enrollment. Within this model, we examine how the following
relationships influence college sorting: the relationship between student resources and
student achievement; the relationship between student resources and the quality of
information used in the college selection process; the relationship between student
resources and the number of applications students submit; the relationship between
student resources and how students value college quality; and the relationship between
student resources and the students’ ability to enhance their apparent caliber. We find that
the relationship between resources and achievement explains much of the sorting of
students by resources, but that other factors have non-trivial influences as well. We
conclude by discussing the policy implications of our results.
3
Agent-based Simulations of the College Sorting Process
Introduction
The economic, social, and health benefits of obtaining a college degree are
substantial. A college degree leads, on average, to greater upward economic mobility,
higher employment rates, higher lifetime income, higher levels of personal savings,
improved health, and a longer life expectancy (Carneiro, Heckman, Vytlacil, 2011; Douglass,
2009). Moreover, these economic benefits have increased substantially in the last few
decades: the annual earnings premium for a college graduate, relative to a high school
graduate, is now roughly double what it was in the 1970s (Autor, 2010; Baum, Ma, & Payea,
2010; Zumeta 2011).
Not all college degrees are the same, however. Students who attend elite, highlyselective schools enjoy larger tuition subsidies, disproportionately extensive resources, and
more focused faculty attention (Hoxby, 2009). Not surprisingly, students who attend more
selective colleges have higher average earnings than those who do not (Long, 2007; Dale
and Krueger, 2001; Black and Smith, 2004; Hoekstra, 2009). Elite graduate schools and top
financial, consulting, and law firms recruit almost exclusively at highly selective colleges;
likewise, evaluators at elite firms routinely use school prestige as a key factor when
screening resumes (Rivera, 2009; Wecker, 2012).
As a result, where (as opposed to if) a student attends college has become
increasingly important, and competition for the relatively few seats at these selective
colleges has intensified (Alon and Tienda, 2007). Students have become more willing to
4
travel long distances to go to good schools and less willing to attend schools that they
perceive as being poorly resourced or having less academically-able peers (Hoxby, 2009).
At the same time, the relationship between family income and students’ academic
achievement has grown substantially over the last few decades (Reardon, 2011). Indeed,
over the last few decades, family income and socioeconomic status have grown more
strongly correlated with academic achievement, college completion, and access to selective
colleges (Alon 2009; Astin and Oseguera, 2004; Bailey and Dynarski, 2011; Bastedo and
Jaquette, 2011; Belley and Lochner, 2007; Karen, 2002; Reardon, 2011).
There are a number of mechanisms that might affect the degree to which family
socioeconomic background is associated with enrollment at highly-selected colleges. Most
significantly, academic achievement is strongly associated with family income and
socioeconomic status: high income students have much higher scores on standardized tests
(including the SAT and ACT) than middle- and low-income students (Reardon, 2011).
Because academic achievement is a key criterion for admission to selective schools, it is not
surprising that high-income students are more likely to be admitted to such schools. But
there are many other factors that may affect admission and enrollment patterns.
First, conditional on academic achievement (and other measures of application
caliber), high- and lower-income students may not apply to the same types of schools.
Hoxby and Avery (2012) show that many high-achieving low-income students make what
appear to be suboptimal application choices—applying to much less selective colleges than
similarly high-achieving high-income students, even when the availability of financial aid
would mean they could afford to attend such schools. This may be because lower-income
5
students have less information than higher-income students about college quality, their
likelihood of admission, or the likely cost (and availability of financial aid). Or it may be
because they do not think such colleges would be as beneficial for them as do similarly
skilled high-income students.
Second, higher-income students may more often do things that enhance their
likelihood of admission to more selective colleges. For example, they may apply to more
schools, thereby increasing their odds of admission to selective schools relative to lowerincome students. Additionally, higher income students may be able to enhance their
probability of admission to selective colleges through activities that require time or money,
such as taking SAT/ACT prep classes, retaking the SAT/ACT (both of which might increase
their SAT/ACT scores visible to colleges), or by participating in activities (overseas trips,
music or arts activities, volunteer activities) that may make them more attractive to
selective colleges.
Third, more selective colleges and universities may be more expensive than lessselective schools. Although some of the most selective schools provide financial aid
packages designed to make school affordable for all students admitted, not all selective
schools can do this. Differences in real cost may play a role in differential college
application and enrollment behavior.
On the other hand, many very selective colleges practice some kind of discreet
affirmative action for lower-income students. In the interest of diversity and to counter
charges of elitism, some selective schools may have a preference for admitting lowerincome students who meet admissions criteria over similarly qualified higher-income
6
students (of course, some may have a preference for admitting higher-income students, all
else equal, in the interest of cultivating the university’s relationship with their parents, who
may be seen as potential donors or conduits to high-wealth and high-influence social
networks). Thus, income-related college admissions preferences may play a role in the
sorting of students among schools by socioeconomic background.
Our goal in this paper is to build intuition about the potential relative strength of
some of these mechanisms in shaping the distribution of students, by socioeconomic
background, among more- and less-selective colleges and universities. We focus in this
paper on only a subset of the mechanisms described above. In particular, we focus on
academic achievement differences, application behavior differences, and application
enhancement differences among students of different socioeconomic background. For
now, we leave cost considerations and college admission preferences out of the model
(though we plan to further develop our models to incorporate these mechanisms in
subsequent versions of this work).
To explore these issues, we use an agent-based model—a simulation model in which
student agents make decisions about what colleges to apply to; colleges make decisions
about which applicants to accept; and students make decisions about which admission
offer to accept. By altering the distribution of student characteristics, their ability to
enhance those characteristics, and the factors that govern their application behaviors, we
can use the simulation models to explore the effect of each factor on college enrollment
patterns. These simulations do not explain why real-world enrollment patterns are the
7
way they are, but they do help build some intuition about the relative influence of different
factors in shaping these patterns.
Background
College Sorting and Income
It is difficult to disentangle whether the positive academic and social outcomes of
students who graduate from elite colleges are due to the fact that these students are
innately highly capable and likely to succeed in most domains, or due to an effect of the
college itself. Yet, increasingly, researchers have been able to disentangle these influences
and have demonstrated that many life outcomes are indeed shaped by the quality of the
college in which students enroll. Indeed, scholars have long recognized the returns to a
four-year college degree, but it is increasingly clear that these returns are greater if these
degrees are earned at more selective colleges (Black & Smith, 2004; Dale & Krueger, 2002;
Hoekstra, 2009; Long, 2007). Part of this benefit may come from the fact that elite schools
elite schools provide a more expensive education at a much lower cost to their students
than other colleges: they spend more on education per student than at less prestigious
schools (Hoxby, 2009). Further, students at more selective schools reap the social network
benefits of attending college with other academically talented students. This association
benefit conveys its own advantages on the labor market, where employers use school
prestige as a screening tool in employment decisions—thus ascribing to individuals the
perceived characteristics of a student body as a whole (Rivera, 2011).
Given the advantages conveyed by graduation from a selective college, these
institutions have the ability to play a large role in facilitating social mobility in the United
8
States. Despite this possibility, low income students are less likely to attend selective
colleges than their high income peers. In fact, Reardon, Baker, and Klasik (2012) show that
students from families earning less than $25,000 a year make up less than half the
percentage of the student body at the most selective colleges (as defined by Barron’s
Profiles of American Colleges) than they do at the least selective four-year colleges.
Further, the underrepresentation of low income students at highly selective schools has
increased over time (Alon, 2009; Astin & Oseguera, 2004; Belley & Lochner, 2007; Karen,
2002). This trend has paralleled an increase in income stratification within the US, as well
as an increase in the achievement gap between high and low income students (Reardon,
2011).
While these concurrent trends suggest that either or both of these increasing gaps
are contributing to the decreasing likelihood that low income students enroll in selective
schools, it is unclear the mechanism behind these relationships. It may be, for example, that
high and low income students have different preferences for colleges; high income students
may prefer more selective colleges more than low income students. Some scholars have
suggested that the greater social and cultural capital of high income students leads to
different tastes for college or gives them more information about the college application
process (Ellwood & Kane, 2000; Grodsky & Jackson 2009; Hossler, Schmit, & Vesper,1999;
Perna, 2006; Perna & Titus 2005), but it is again unclear how these differences lead to
differential college choices based on family income. Do these high-capital students engage
in activities that make them more attractive to colleges? Do high-capital students have
more information about the quality of a college or their own ability to get in? Many or all of
9
these factors may be at play as students choose colleges, but specific the role each plays is
still unknown.
Agent Based Models
Part of the challenge in studying the relationship between income and college
enrollment is that there are multiple stages in the college selection process, each of which
may affected by a number of potentially interrelated factors. Students engage in a number
of activities during their college search both to make decisions about where to apply and to
make themselves more attractive to institutions to which they submit applications, and
many of these choices may be influenced by either a student’s financial resources or his or
her academic ability. Colleges are also actors in the college choice process and pursue their
own goals as they decide which of their applicants to admit. The ultimate enrollment
decision is left to students, who must choose among the schools that have elected to admit
them. The entire process is further complicated by its path-dependent nature: both
students and colleges are likely to use past college admissions and enrollment to inform
their behavior.
Scholars have turned to agent based models when trying to study a host of similarly
complex behaviors. One of the classic uses of agent based models was by Schelling (1971),
who used the method to illustrate residential segregation processes and demonstrated how
it is difficult to make conclusions about individual motives based on group-level patterns of
residential choice. Scholars have also used such modeling to describe the evolution of
societies of living organisms (Gardner, 1970) and how ethnocentrism, or in-group
favoritism, can support cooperation when the cost of such cooperation is high for any given
individual (Hammond & Axelrod, 2006). These investigations were successful because the
10
authors identified the essential elements of the processes under study to reveal surprising
and powerful patterns.
There are many features of agent based models that support such studies of
complex processes, and that suggest that such methods will allow us to gain some traction
on the problem of determining how income affects college sorting. Specifically, agent based
models allow for a diverse set of agents (colleges and students) to interact according to
simple rules and learn over time in ways that lead to emergent effects resulting from
repeated interaction (Page, 2005). Heterogeneity in the model allows for our agents to
differ in important and systematic ways. For example, we can allow agents to vary in both
their family income and academic achievement levels and dictate how these characteristics
determine agents’ behavior. As agents interact, they are able to learn over time. Agents can
be given simple rules that guide their actions, but they can also learn from the successes
and failures of other agents. Such learning is an important feature in the college sorting
environment because students likely learn about the colleges and the application process
from each other (Manski, 1993) and colleges carefully manage their admissions decisions
based on enrollment patterns from previous years. Finally, the result of repeated
interaction between agents can lead to the emergence of macro-level patterns and results.
These patterns can be, but do not necessarily have to be, equivalent to a long-run system
equilibrium. This flexibility allows scholars to use agent based models to study systems
that may not have stable equilibria (Page, 2005).
To date, only one previous study has used agent based models to examine college
sorting. Henrickson (2002) designed an agent based model to demonstrate that such a
model could indeed be used to approximate the college enrollment decisions made by real
11
students. She accomplishes this task by having students use very simple application
strategies to a small number of schools (e.g. apply to all schools or apply to schools
randomly) and compared her results to real world observed college choices. In the section
that follows we discuss how we extend this work by giving students a more optimal college
application allocation algorithm. We also examine what happens to student sorting
between colleges when certain parameters, such as the correlation between family income
and student achievement, are adjusted away from their real-world values. Such
experiments allow us to comment on possible causes and solutions to income segregation
in the market for selective higher education.
Method
Model
Our model of the college sorting process is comprised of two types of agents,
students and colleges, and three stages: application, admission, and enrollment. The
process is repeated over a “run” that simulates the passage of a specified number of years.
During each year, students observe the existing distribution of colleges and select a
portfolio of colleges to which they will apply. Colleges then admit a number of applicants
that they believe will be sufficient to fill their available seats. Finally, students enroll in the
most appealing school to which they have been admitted.
Students have two attributes that we call “resources” (Rs) and “caliber” (Cs).
Resources represent the socioeconomic capital that a student can tap when engaging in the
college application process (e.g. family wealth and social network connections). Caliber
represents the observable markers of academic achievement and potential for future
academic success (e.g. grades, SAT scores, and application essay quality). Both resources
12
and caliber are normally distributed; we specify the amount of correlation between
resources and achievement. In addition, students’ calibers are supplemented by cs, which
represent artificial “enhancement” of apparent caliber in ways that are directly related to
resources (e.g. test preparation, college essay consultation, strategic extracurricular
participation). We specify the amount of application enhancement that is associated with
an increase in resources. The only attribute that colleges have is “quality” (Qc), which
represents the quality of the educational experience at a given school.
At the start of each model run, we generate a distribution of M colleges with m
available seats per year. These colleges will persist throughout the run. During each year of
the model run, a new cohort of N students engages in the college application process. Every
student observes each college’s quality with some amount of noise (ucs), which represents
both imperfect information and idiosyncratic preferences, and then evaluates the potential
utility of attendance:
Q*cs = Qc + ucs
U*cs = as + bs (Q*cs)
where as is the intercept of a linear utility function and bs is the slope. We allow the utility
function to differ between students, specifying the relationships between resources and the
intercept and slope. Students view their own apparent caliber with noise:
C*s = Cs + cs +es
The reliability with which students view both college quality and their own apparent
caliber are directly related to resources; the slopes of these relationships are specified at
the start of a model run along with maximum and minimum reliability levels. If we increase
the magnitude of these relationships, a student with above-average resources will perceive
13
their own apparent caliber and college qualities with less noise, while the opposite will be
true for a student with below-average resources. Based on their observations of their own
caliber and college quality, students estimate their probabilities of admission into each
college:
P*cs = f (C*s - Q*cs)
At any point after the first year of a run, the parameters of this function are based on
admissions from the past y years. A student’s expected utility of applying to one college is
the product of the estimated probability of admission and the estimated utility of
attendance. Students apply to sets of schools that maximize their overall expected utility.1
For example, if a student applies to three colleges, then they will select the set of three
colleges that they believe has the greatest combined expected utility (the algorithm that is
used to generate application portfolios with maximum expected utilities is outlined in
Appendix section A). The number of applications that students submit is directly related to
their resources. We specify this relationship at the beginning of a model run, and stipulate
that every student submits at least one application.
Colleges observe the apparent caliber of applicants with some amount of noise (like
the noise with which students view college quality, this also reflects both imperfect
information as well as idiosyncratic preferences):
C**cs = Cs + cs +wcs
We specify the reliability with which all colleges perceive student caliber at the start
of each model run. After the first year of a run, colleges are able to refer to the proportion
1
This assumption of rational behavior is an abstraction that facilitates focus on the elements of college sorting that
we wish to explore. We recognize that real-world students probably use many different strategies to determine
where they apply.
14
of admitted students over the past z years that they enrolled. Based on these historical
admission yields, colleges determine the number of students that they will need to admit in
order to fill m seats (nc). They then admit the nc applicants with the highest observed
calibers. After this, students enroll in the school with the highest estimated utility of
attendance (U*cs) to which they were admitted. Colleges’ underlying quality values (Qc) are
updated based on the incoming class of enrolled students before the next year’s cohort of
students begins the application process.
We run our model for 30 years (this appears to be a sufficient length of time for our
model to reach a relatively stable state for the parameter specifications that we explore).
We run our model with 40 colleges, each of which has 150 available seats per year, and
8000 students in every cohort. We select this number of students and schools because it
provides a balance between computational speed and distributional density, and because
the ratio of prospective students to available seats approximates what we observe in the
real world. Wherever possible, we use data from the Education Longitudinal Study: 2002
(ELS) and the Integrated Postsecondary Education Data System (IPEDS) to inform our
selection of parameters; where this is not possible (e.g. the reliability with which college
quality and student caliber are observed), we attempt to use plausible values.2 At the start
of each run, college quality is normally distributed with a mean of 1070 and a standard
deviation of 130. This distribution roughly mirrors the real-world distribution of median
SAT scores of admitted students at selective colleges. Student caliber and resources for
each cohort are normally distributed with means of 1000 and 50 and standard deviations
2
In order to alleviate concerns that our results were driven by incorrectly estimated parameters, we run our model
with an extensive series of alternative specifications of our parameters of interest. Our results appear to be robust to
modest changes in parameter values.
15
of 200 and 10, respectively. On average, students submit 4 applications each and view both
their own apparent caliber and colleges’ quality with a 0.7 reliability (with stipulated
minimum reliabilities of 0.5 and maximum reliabilities of 0.9). The intercept of the utility
function slope is -250 and the slope is 1, meaning that students would consider attending a
school with an estimated quality of less than 250 to offer no utility. Colleges view apparent
student caliber with a reliability of 0.8. Colleges use admission yields from the prior 3 years
when determining how many students to admit, and students use admission data from the
prior 5 years when estimating their probability of being admitted to a given college.
Experiments
This simulated world, with flexible parameters and multiple pathways through
which student resources can affect college quality, gives us the ability to understand how
students might be sorted by resources across colleges and gives us intuition about which
kinds of interventions would be the most effective in reducing this phenomenon.
There are five main parameters that we varied in each model:
1. the correlation between student resources and student caliber;
2. the relationship between resources and number of applications submitted;
3. the relationship between student resources and information (the amount of
noise with which they viewed their own caliber and schools’ quality);
4. the extent to which students with more resources could enhance their
perceived caliber;
5. and the relationship between student resources and how much they valued
school quality.
16
We examined how these adjustments affected three main outcomes: likelihood of
enrolling in college by resources, likelihood of attending a top 10% college by resources,
and the relationship between student resource and college quality.
Results
Model 1: Basic – No Resource Influence
In our first model, we did not allow a student’s resources to have any effect on
his/her caliber or application behavior. That is, we set the correlation between caliber and
resources to be 0; did not allow students with more resources to enhance their caliber; had
all students, regardless of resources, submit the same number of applications; did not give
students with more resources better information about school quality or their own caliber;
and had all students value college quality to the same extent.
As expected, this model produces an equal distribution of students from varying
resources across colleges. Higher resource students were no more likely than lower
resource students to enroll in any college (figure 1) or a top 10% college (figure 2). Further,
there was no relationship between student resources and college quality (figure 3).
Model 2: Real World Baseline – All Resource Pathways
In our next model, we allowed resources to affect college quality via all five
pathways. We set the parameters to approximate what we find empirically using realworld data or, where that is not possible, what we determine to be plausible values. First,
we set the correlation between resources and caliber was set to 0.3. This is a conservative
estimate of what this relationship is currently for high school students, based on the
observed 0.43 correlation between students’ SAT scores and the socioeconomic status
index in the ELS dataset (US Department of Education, 2006)). Second, the relationship
17
between resources and number of applications submitted set so that each standard
deviation of resources yields an additional 0.5 applications. Third, the relationship between
student resources and information was set so that each standard deviation of resources
results in 0.1 greater information reliability. Fourth, the relationship between resources
and caliber enhancement was set so that each standard deviation of resources allowed a
student to enhance apparent caliber by 0.1 standard deviations. This number is loosely
derived from research that shows that students who take SAT coaching classes typically
raise their SAT scores by approximately 25 points (Becker, 1990; Buchmann, Condron, &
Roscigno 2010; Powers & Rock, 1999). Finally, high resource students (resources greater
than the median) have a utility function with a lower intercept (need a higher quality
school to receive any utility) and a steeper slope (value high quality schools more than low
resource students do).
Using these plausibly realistic values for parameters we find patterns that are
similar to what we see empirically, which serves to demonstrate the capacity of our model
to mimic real world behavior. For example, in terms of patterns of applications and
admissions, the relationships between college quality and the number of applications
received and number of students admitted and enrolled (figure 4) is similar to what we see
using data from ELS. The relationships between college quality and selectivity (admission
rate) and yield (enrollment rate) (shown in figure 5) are also quite similar to what we see
using this real world data (graphs showing the same relationships using ELS data are in
Appendix B).
These plausible parameter values dramatically change student enrollment
outcomes. In this world, as compared with our basic model, students from high resource
18
backgrounds are much more likely both to enroll in any college (figure 6) and to attend a
top 10% college (figure 7). Students from low resource backgrounds are correspondingly
less likely. While students in the basic model all had about a 75% likelihood of enrollment
in any college, turning on these five pathways increased the likelihood of college
enrollment for the students in the 90th percentile of family resources to nearly 95% while
the likelihood for students from students whose families are in the 10th percentile of
resources decreased to nearly 55%. This change in likelihood is even more dramatic for
enrollment in one of the schools in the top 10 percent of our distribution. Whereas in the
basic model all students have a roughly equal probability of enrolling in a highly selective
school, the five resource pathways turned on, the likelihood of enrollment for 90th
percentile students is nearly 20 times what it is for 10th percentile students. There is also a
strong relationship between student resources and college quality (figure 8). Again we see
that these simulations mimic what we see using real world data. Figure 9 shows the
simulated predicted probability of enrolling in highly selective schools for students from
different resource backgrounds compared with what we find using the ELS data.
These similarities, in terms of application and enrollment behavior, bolster our
confidence that Model 2, in which we have set all resource pathways to realistic levels,
gives us a reasonable starting point from which we can examine how different policy
experiments could impact outcomes.
Models 3-7: Policy Experiments
Starting from the model in which all resource pathways are turned on, we then ran
experiments in which we turned one pathway off at a time. That is, in each model we kept
all parameters constant except for one, which we set to zero:
19

Model 3: We set the correlation between student resources and student caliber to
zero.

Model 4: We turned off the pathway through which higher resourced students
could enhance their perceived caliber.

Model 5: We set the correlation between resources and number of applications to
zero.

Model 6: We set the correlation between resources and information to zero.

Model 7: We set the correlation between resources and value of college quality to
zero.
Figures 10-12 show the results of these experiments. In general, these results show
that the correlation between student resources and caliber has the strongest influence on
the relationship between students’ resources and their college destinations, while other
resource pathways have more subtle, but still notable, impacts.3
Eliminating the correlation between resources and caliber decreases the difference
in probability of enrollment for very high and very low resource students from about 55%
to closer to 25% (figure 10). Figure 11 shows that eliminating this correlation also has a
large impact on differences in probability of enrolling in a highly selective school. Without
the correlation between student resources and student caliber, the students in the 90th
percentile of resources are about four times as likely as those in the 10th percentile to
enroll in highly selective school, compared with about 20 times as likely when all resource
pathways are turned on. The impact on quality of enrollment is also large- without the
3
Looking more closely at which sets of colleges students apply to under these different experimental conditions can
help us to understand why we are seeing these patterns. The figures in Appendix C show the maximum, minimum
and mean quality of schools that high and low resource students at all points of the caliber distribution apply to.
20
resource-caliber correlation, students in the 90th percentile enroll in schools with an
average quality 75 points higher than students in the 10th percentile, which is roughly half
the difference we see when all resource pathways are engaged. While the correlation
between resources and caliber is clearly the most powerful player, other pathways have
non-negligible effects.
Turning off the application enhancement pathway results in a significant shift
toward equality. If students are unable to enhance their perceived caliber, the relationship
between student resources and probability of enrollment at any college decreases. The
probability of a very high resource student enrolling decreases by about five percent (from
95% to 90%) and the probability of a very low resource student increases by a similar
margin (from 42% to 48%). Probabilities for students in the middle of the resource
distribution do not change appreciably, if at all. The relationship between student
resources and probability of enrolling in a top 10% school is also affected when we no
longer allow high resource students to enhance their caliber. Students in the bottom 60%
of the resource distribution are about one percent more likely to attend a selective college,
while students in the top 20% of the distribution are much less likely (up to six percent less
likely).
If we turn off the pathway through which student resources can affect the quality of
the information students have about their own caliber and college quality, the relationship
between student resources and the probability of enrolling in any college remains
remarkably unchanged. However, removing this pathway does affect a student’s
probability of enrolling in a top 10% college. We see that students from the middle of the
income distribution (incomes between about 20-70%) have an increased probability of
21
attending a highly selective school (up to two percent), while students at the very high end
of the resource distribution have a decreased probability (up to six percent less likely).
Eliminating the relationships between resources and the number of applications a
student submits and resources and perceived utility of college quality do not change the
results appreciably. When high-resource students are able to submit more applications,
they have a one-to-two percent higher likelihood of attending any college, while the
probability of any student attending a highly selective college is unchanged. When we
eliminate the relationship between resources and the perceived utility of college quality,
neither the probability of enrolling anywhere nor the probability of enrolling in a highly
selective school changes appreciably for any students.
Discussion and Conclusion
In this paper we have used agent based modeling to simulate the college application
and selection process. Despite the simplifying assumptions we made to execute our model,
we were able to successfully replicate real-world patterns of application and enrollment.
We were then able to conduct “experiments” by manipulating parameters that determine
the specific ways in which student resources might influence student behavior and
enrollment outcomes. Thus, one of the major accomplishments of this paper is to
demonstrate the utility of agent based models both to simulate the college choice process
and to provide a way of exploring hypotheses that are not testable in the real world.
This model thus forms a basic framework to which we can add additional
complexity and address other policy-relevant questions. For example, in this paper we
focus on how students’ information and behavior influence the college sorting process. We
have already begun an extension to this process in which colleges play a more active and
22
strategic role in selecting students. In addition, we will add more nuance by including costs
considerations for both students and schools.
Our strongest finding in this paper is that the relationship between student
resources and student caliber drives most of the observed socioeconomic sorting of
students into schools. This result provides evidence that increasing income-achievement
gaps may have strong and lasting effects on students’ life outcomes, which in turn could
have a negative impact on intergenerational mobility. While clearly important, the
relationship between socioeconomic status and achievement is a broad social phenomenon
that may prove intractable in the short term.
Socioeconomic status affects college destinations in ways other than through its
relationship with academic achievement. These other pathways are important to explore
not only because they provide a deeper understanding of the college sorting process, but
also because they may represent more feasible avenues for policy intervention. Two of the
pathways that we explored seemed particularly promising. Reducing the ability of high
resource students to enhance their apparent caliber had an effect on our outcomes of
interest. Removing the relationship between socioeconomic status and the quality of
information students have about college qualities and their own ability relative to others
also has a positive effect. These results indicate that student- or institution-level policies
(such as application coaching and college information provision to students in low income
schools or encouraging affirmative action-like polices for dimensions other than
race/ethnicity) could have notable impacts on how students sort into colleges.
Much previous work has shown that students are increasingly sorting into colleges
in ways that reflect their socioeconomic origins. While the societal dangers of this trend are
23
clear, the mechanisms behind it are harder to disentangle. This paper provides evidence
that agent based modeling is a promising avenue for exploring potential policy
interventions that could begin to reverse this trend.
24
References
Alon, S. (2009). The evolution of class inequality in higher education: Competition,
exclusion, and adaptation. American Sociological Review, 74, 731.
Astin, A., & Oseguera, L. (2004). The declining “equity” of American higher education. The
Review of Higher Education, 27, 3, 321.
Becker, B. J. (1990). Coaching for the Scholastic Aptitude Test: Further synthesis and
appraisal. Review of Educational Research, 60, 373.
Belley, P. & Lochner, L. (2007). The changing role of family income and ability in
determining educational achievement. Journal of Human Capital, 1, 1, 37.
Black, D. & Smith, J. (2004). How robust is the evidence on the effects of college quality?
Evidence from matching. Journal of Econometrics, 121, 99.
Buchmann, C., Condron, D. J., and Roscigno, V. J. (2010). Shadow education, American style:
Test preparation, the SAT and college enrollment. Social Forces, 89, 2, 435.
Dale, S. and Krueger, A. (2002). Estimating the payoff to attending a more selective college:
An application of selection on observables and unobservables. Quarterly Journal of
Economics, 117, 4, 1491.
Ellwood, D. T. and Kane, T. J. (2000). Who is getting a college education? Family background
and the growing gaps in enrollment. In S. Danziger and J. Waldfogel (Eds.) Securing
the future: Investing in children from birth to college (pp. 283-324). New York:
Russell Sage Foundation.
Gardner, M. (1970). Mathematical games: The fantastic combinations of John Conway’s new
solitaire game “life. Scientific American, 223, 120.
25
Grodsky, E. and Jackson, E (2009). Social stratification in higher education. Teachers College
Record, 111, 10, 2347.
Hammond, R. A., and Axelrod, R. (2006). The evolution of ethnocentrism. Journal of Conflict
Resolution, 50, 926.
Henrickson, L. (2002). Old wine in a new wineskin: College choice, college access using
agent-based modeling. Social Science Computer Review, 20, 400.
Hoekstra, M. (2009) The effect of attending the flagship state university on earnings: a
discontinuity-based approach. The Review of Economics and Statistics, 91, 4, 717.
Hossler, D., Schmit, J., and Vesper, N. (1999). Going to college: How social, economic, and
educational factors influence the decisions students make. Baltimore: The Johns
Hopkins University Press.
Hoxby, C. (2009). The changing selectivity of American colleges. Journal of Economic
Perspectives, 23, 4, 95.
Karen, D. (2002). Changes in access to higher education in the United States: 1980-1992.
Sociology of Education, 75, 3, 191.
Long, M.C. (2007). College quality and early adult outcomes. Economics of Education Review,
27, 588.
Manski, C. (1993). Adolescent econometricians: How do youth infer the returns to
schooling? In C. Clotfelter and M. Rothschild (Eds.) Studies of supply and demand in
higher education (pp. 43–57). Chicago: University of Chicago Press.
Page, S. E. (2008). Agent-Based Models. In S. n. Durlauf, and L. E. Blume (Eds.) The New
Palgrave Dictionary of Economics Online. Basingstoke, Hampshire: Palgrave
Macmillan.
26
Perna, L. W. (2006). Studying college choice: A proposed conceptual model. In J. C. Smart
(Ed.), Higher education: Handbook of theory and research (Vol. 21, pp. 99-157). New
York: Springer.
Perna, L. W. and Titus, M. A. (2005). The relationship between parental involvement as
social capital and college enrollment: An examination of racial/ethnic group
differences. The Journal of Higher Education, 76, 5, 485.
Powers, D. E., and Rock, D. A. (1999). Effects of coaching on SAT I: Reasoning test scores.
Journal of Educational Measurement, 36, 2, 93.
Reardon S.F. (2011). The widening academic achievement gap between the rich and the
poor: New evidence and possible explanations. In Duncan, G.C., & Murnane, R. (Eds.)
Whither Opportunity? New York: Russell Sage Foundation.
Reardon, S. F., Baker, R., B., and Klasik, D. (2012). Race, income, and enrollment patterns in
highly selective colleges, 1982-2004. Center for Education Policy analysis.
Rivera, L. (2011). Ivies, extracurriculars, and exclusion: Elite employers’ use of educational
credentials. Research in Social Stratification and Mobility, 29, 1, 71.
Schelling, T. A. (1971). Dynamic models of segregation. The Journal of Mathematical
Sociology, 1, 2, 143.
US Department of Education (2006). National Center for Education Statistics. Education
Longitudinal Study (ELS), 2002 & 2006: Base Year through Second Follow-up.
27
Figure 1. Probability of enrolling in any college, by student resources percentile, year 30.
28
Figure 2. Probability of enrolling in a top-10% college, by student resource percentile, year
30.
29
Figure 3. Quality of college enrolled in, but student resource percentile, year 30
30
Figure 4. Number of applications, admittees, and enrollees, but college quality, baseline
scenario.
31
Figure 5. College selectivity and yield, by college quality, baseline scenario.
32
Figure 6. Probability of enrolling in any college, by student resource percentile, year 30
33
Figure 7. Probability of enrolling in a top-10% college, by student resource percentile, year
30
34
Figure 8. Probability of enrolling in a top-10% college, by student resource percentile, year
30.
35
Figure 9. Probability of attending a highly selective college, by income, high school class of
2004.
36
Probability of Enrolling in Any College
by Student Resource Percentile, Year 30
1.00
1.00
1.00
0.90
0.90
0.90
0.80
0.80
0.80
0.70
0.70
0.70
0.60
0.60
0.60
0.50
0.50
0.50
0.40
0.40
Full: All Resource Pathways
Full - Corr(Resources, Caliber)
0.30
0
20
40
60
80
100
0
1.00
0.90
0.90
0.80
0.80
0.70
0.70
0.60
0.60
0.50
0.50
Full: All Resource Pathways
Full - Corr(Resources, Info)
0.30
0
20
40
60
80
100
Full - App. Enhancement
0.30
1.00
0.40
Full: All Resource Pathways
20
0.40
40
60
80
0.40
Full: All Resource Pathways
Full - Corr(Resources, #Apps)
0.30
100
0
20
40
60
80
100
Full: All Resource Pathways
Full - Corr(Resources, Utility)
0.30
0
20
40
60
80
100
Figure 10. Probability of enrolling in any college, by student resource percentile and resource pathway, year 30.
37
Probability of Enrolling in a Top 10% College
by Student Resource Percentile, Year 30
Full: All Resource Pathways
Full: All Resource Pathways
Full - Corr(Resources, Caliber)
Full - App. Enhancement
Full: All Resource Pathways
Full - Corr(Resources, #Apps)
0.32
0.32
0.32
0.28
0.28
0.28
0.24
0.24
0.24
0.20
0.20
0.20
0.16
0.16
0.16
0.12
0.12
0.12
0.08
0.08
0.08
0.04
0.04
0.04
0.00
0.00
0.00
0
20
40
60
80
100
0
20
Full: All Resource Pathways
40
60
80
100
0
20
40
60
80
100
Full: All Resource Pathways
Full - Corr(Resources, Info)
Full - Corr(Resources, Utility)
0.32
0.32
0.28
0.28
0.24
0.24
0.20
0.20
0.16
0.16
0.12
0.12
0.08
0.08
0.04
0.04
0.00
0.00
0
20
40
60
80
100
0
20
40
60
80
100
Figure 11. Probability of enrolling in a top-10% college, by student resource percentile and resource pathway, year 30.
38
Quality of College Enrolled In
Full: All Resource Pathways
40
60
80
Full: All Resource Pathways
100
1150
1100
1050
1000
1050
1100
1150
Full - Corr(Resources, #Apps)
1000
20
1200
0
0
20
40
60
80
100
0
20
40
60
Full: All Resource Pathways
Full - Corr(Resources, Utility)
1100
1050
1000
1000
1050
1100
1150
Full - Corr(Resources, Info)
1150
Full: All Resource Pathways
Full - App. Enhancement
1200
1000
1050
1100
1150
Full - Corr(Resources, Caliber)
1200
Full: All Resource Pathways
1200
1200
by Student Resource Percentile, Year 30
0
20
40
60
80
100
0
20
40
60
80
100
Figure 12. Quality of college enrolled in, by student resource percentile and resource pathway, year 30.
80
100
39
Appendix A
Optimal College Portfolio Algorithm
Notation. Let 𝑖 = 1, 𝑡𝑁 index students and let 𝑗 = 1, 𝑑𝑒𝐽 index colleges. Let 𝑄𝑗
denote college quality (or utility). Suppose student 𝑖 applies to some set 𝐀 𝑖 = {𝑎1 , 𝑎2 , … , 𝑎𝑛 }
of 𝑛 schools (where the set is ordered such that 𝑄𝑎1 ≤ 𝑄𝑎2 ≤ ⋯ ≤ 𝑄𝑎𝑛 ). Denote the set of
schools to which student 𝑖 is admitted as 𝐁𝑖 , where 𝐁𝑖 ⊆ 𝐀 𝑖 . Assume that a student will
enroll in the highest quality school to which she is admitted.
Now let 𝑈𝑖 (𝐁𝑖 ) = max(𝑄𝑗 , . . , 𝑄𝑘 |𝑗, … , 𝑘 ∈ 𝐁𝑖 ). In other words 𝑈𝑖 (𝐁𝑖 ) is the quality of
the school that student 𝑖 will enroll in if 𝑖 is admitted to the set of schools 𝐁𝑖 . Define
𝑈𝑖 (∅) = 0 (if 𝑖 is not admitted to any schools, then 𝑖 experiences school quality of 0).
Now define 𝐸𝑖 (𝐀 𝑖 ) = 𝐸[𝑈𝑖 (𝐁𝑖 )|𝐀 𝑖 ] = 𝐸[𝑈𝑖 (𝐂𝑖 )|𝐀 𝑖 ]. That is, let 𝐸𝑖 (𝐀 𝑖 ) indicate the
expected value of the quality of the college student 𝑖 will enroll in if she is applies to the set
of colleges 𝐀 𝑖 .
The student’s challenge is to pick, given a number of applications, the set 𝐌𝑖𝑛 of 𝑛
schools that will maximize 𝐸𝑖 (𝐀 𝑖 ).
Assumptions. Let 𝐴𝑖𝑗 = 1 if student 𝑖 is admitted to school 𝑗 and 𝐴𝑖𝑗 = 0 otherwise.
Let 𝑃𝑖𝑗 = 𝑃𝑖 (𝑄𝑗 ) = Pr(𝐴𝑖𝑗 = 1) be the probability that a given student will be admitted to a
school of quality 𝑄. We will assume that 𝑃𝑖 is a monotonically decreasing function of 𝑄 for
all 𝑖. Assume the utility of enrolling in no college is 0. Assume that the available colleges
range from a value of 0 to 1 (the scale is arbitrary), and that the function 𝑃𝑖 (𝑄) ∙ 𝑄 has a
single local maximum on the interval (0,1). That is, there is at most a single point 𝑚𝑖 in the
interval [0,1] where
𝑑(𝑃𝑖 𝑄)
𝑑𝑄
= 0. Moreover, if there is such a point, then
𝑑(𝑃𝑖 𝑄)
𝑑𝑄
> 0 if 0𝑖𝑄𝑗 <
40
𝑚𝑖 and
𝑑(𝑃𝑖 𝑄)
< 0 if 𝑚𝑖 < 𝑄𝑗 ≤ 1. If there is no such point, then 𝑃𝑖 𝑄 is monotonically
𝑑𝑄
increasing or decreasing over the interval [0,1].
This last assumption about a single local maximum is met if 𝑃𝑖 (𝑄) = (1 + 𝑒 𝑎+𝑏𝑄 )−1
and 𝑏 > 0 (i.e., if the probability of admission is described by a logit function of – 𝑄).
Algorithm. Now, it can be shown that, under these conditions, and a given number
of applications 𝑛, student 𝑖 should apply the set of school 𝐌𝑖𝑛 defined the following way:
First, compute 𝐸𝑖 (𝑗) = 𝑃𝑖 (𝑄𝑗 ) ∙ 𝑄𝑗 for all 𝑗 ∈ 1, ∈ 𝐽. Apply to the school with the
maximum value of 𝐸𝑖 (𝑗). Call this school 𝑎1 . If student 𝑖 were to apply to a single school, this
would optimize the student’s expected utility. In other words, 𝐌𝑖1 = {𝑎1 }, and 𝐸(𝐌𝑖1 ) =
𝑃𝑖 (𝑄𝑎1 ) ∙ 𝑄𝑎1 .
Continue this process recursively to identify schools 𝑎2 , … , 𝑎𝑛 . For 𝑘 = 2, 𝑟𝑛, use the
following algorithm. At the 𝑘 𝑡ℎ step, check if there are schools with 𝑄𝑗 > 𝑄𝑎𝑘−1 . If not, pick
the 𝑛 − (𝑘 − 1) highest quality schools that 𝑖 has not yet applied to and include them in 𝐀 𝑖 .
This will complete the set 𝐌𝑖𝑛 that maximizes 𝐸𝑖 (𝐀 𝑖 ).
If, however, there are schools with 𝑄𝑗 > 𝑄𝑗 > 𝑄𝑎𝑘−1 , then for each such school 𝑗,
compute 𝐸𝑖 (𝐌𝑖𝑘−1 , 𝑗) = (1 − 𝑃𝑖𝑗 )𝐸𝑖 (𝐌𝑖𝑘−1 ) + 𝑃𝑖𝑗 𝑄𝑗 . Pick the school with the maximum
value of 𝐸𝑖 (𝑎𝑖 , 𝑗), and apply to this school as well. Call this school 𝑎𝑘 . Now 𝐌𝑖𝑘 =
{𝐌𝑖𝑘−1 , 𝑎𝑘 } = {𝑎1 , 𝑎2 , … , 𝑎𝑘 }
Under these rules, 𝐌𝑖𝑛 will be the set of 𝑛 schools that maximize the expected quality
of the school in which student 𝑖 will enroll.
It is easy to show that
𝐸𝑖 (𝐌𝑖𝑘 ) − 𝐸𝑖 (𝐌𝑖𝑘−1 ) = (1 − 𝑃𝑖𝑎𝑘 )𝐸𝑖 (𝐌𝑖𝑘−1 ) + 𝑃𝑖𝑎𝑘 𝑄𝑎𝑘 − 𝐸𝑖 (𝐌𝑖𝑘−1 )
41
= 𝑃𝑖𝑎𝑘 (𝑄𝑎𝑘 − 𝐸𝑖 (𝐌𝑖𝑘−1 ))
>0
because 𝑄𝑎𝑘 > max(𝑄𝑎 1 , … , 𝑄𝑎 𝑘−1 ) ≥ 𝐸𝑖 (𝐌𝑖𝑘−1 ). In other words, adding a 𝑘 𝑡ℎ school
of higher quality than any of the 𝑘 − 1 schools in 𝐌𝑖𝑘−1 will always increase the expected
quality of one’s college enrollment.
42
Appendix B
The following figures are based on IPEDS-based admissions statistics from the
2010-2011 admissions cycle. They were used to help confirm the calibration of our model.
Figures B1 and B2 are intended to be compared to Figures 4 and 5, respectively.
Figure B1. Applications and acceptances per enrolled student, by ‘median’ admitted SAT
score. From IPEDS data from 2010011. Median SAT score is approximated half of the sum
of the 25th and 75th percentile of SAT score.
43
Figure B2. Acceptance and yield rates, by ‘median’ admitted SAT score. From IPEDS data
from 2010011Median SAT score is approximated half of the sum of the 25th and 75th
percentile of SAT score
44
Appendix C
Mean Quality of Schools Applied To
by (True) Student Caliber, Year 30
Full: All Resource Pathways
Full - Corr(Resources, Caliber)
1400
1300
1200
1100
1000
900
500
750
1000
1250
1500
500
750
1000
Student Caliber
High resource students
Low resource students
Graphs by Scenario
Figure C1.Mean quality of schools applied to, by (true) student caliber and scenario, year 30.
1250
1500
45
Average Maximum and Minimum Quality of Schools Applied To
by (True) Student Caliber, Year 30
Full: All Resource Pathways
Full - Corr(Resources, Caliber)
1400
1300
1200
1100
1000
900
500
750
1000
1250
1500
500
750
1000
1250
1500
Student Caliber
High resource students (maximum quality)
Low resource students (maximum quality)
High resource students (minimum quality)
Low resource students (minimum quality)
Graphs by Scenario
Figure C2. Average maximum and minimum quality of schools applied, by (true) student caliber and scenario, year 30.
46
Mean Quality of Schools Applied To
by (True) Student Caliber, Year 30
Full: All Resource Pathways
Full - Application Enhancement
1400
1300
1200
1100
1000
900
500
750
1000
1250
1500
500
750
1000
Student Caliber
High resource students
Low resource students
Graphs by Scenario
Figure C3. Mean quality of schools applied to, by (true) student caliber and scenario, year 30.
1250
1500
47
Average Maximum and Minimum Quality of Schools Applied To
by (True) Student Caliber, Year 30
Full: All Resource Pathways
Full - Application Enhancement
1400
1300
1200
1100
1000
900
500
750
1000
1250
1500
500
750
1000
1250
1500
Student Caliber
High resource students (maximum quality)
Low resource students (maximum quality)
High resource students (minimum quality)
Low resource students (minimum quality)
Graphs by Scenario
Figure C4. Average maximum and minimum quality of schools applied to, by (true) student caliber and scenario, year 30.
48
Mean Quality of Schools Applied To
by (True) Student Caliber, Year 30
Full: All Resource Pathways
Full - Corr(Resources, #Apps)
1400
1300
1200
1100
1000
900
500
750
1000
1250
1500
500
750
1000
Student Caliber
High resource students
Low resource students
Graphs by Scenario
Figure C5. Mean quality of schools applied to, by true student caliber and scenario, year 30
1250
1500
49
Average Maximum and Minimum Quality of Schools Applied To
by (True) Student Caliber, Year 30
Full: All Resource Pathways
Full - Corr(Resources, #Apps)
1400
1300
1200
1100
1000
900
500
750
1000
1250
1500
500
750
1000
1250
1500
Student Caliber
High resource students (maximum quality)
Low resource students (maximum quality)
High resource students (minimum quality)
Low resource students (minimum quality)
Graphs by Scenario
Figure C6. Average maximum and minimum quality of schools applied to, by (true) student caliber and scenario, year 30.
50
Mean Quality of Schools Applied To
by (True) Student Caliber, Year 30
Full: All Resource Pathways
Full - Corr(Resources, Information)
1400
1300
1200
1100
1000
900
500
750
1000
1250
1500
500
750
1000
Student Caliber
High resource students
Low resource students
Graphs by Scenario
Figure C7. Mean quality of schools applied to, by (true) student caliber and scenario, year 30.
1250
1500
51
Average Maximum and Minimum Quality of Schools Applied To
by (True) Student Caliber, Year 30
Full: All Resource Pathways
Full - Corr(Resources, Information)
1400
1300
1200
1100
1000
900
500
750
1000
1250
1500
500
750
1000
1250
1500
Student Caliber
High resource students (maximum quality)
Low resource students (maximum quality)
High resource students (minimum quality)
Low resource students (minimum quality)
Graphs by Scenario
Figure C8. Average maximum and minimum quality of schools applied to, by (true) student caliber and scenario, year 30.
52
Mean Quality of Schools Applied To
by (True) Student Caliber, Year 30
Full: All Resource Pathways
Full - Corr(Resources, Utility)
1400
1300
1200
1100
1000
900
500
750
1000
1250
1500
500
750
1000
Student Caliber
High resource students
Low resource students
Graphs by Scenario
Figure C9. Mean quality of schools applied to, by (true) student caliber and scenario, year 30.
1250
1500
53
Average Maximum and Minimum Quality of Schools Applied To
by (True) Student Caliber, Year 30
Full: All Resource Pathways
Full - Corr(Resources, Utility)
1400
1300
1200
1100
1000
900
500
750
1000
1250
1500
500
750
1000
1250
1500
Student Caliber
High resource students (maximum quality)
Low resource students (maximum quality)
High resource students (minimum quality)
Low resource students (minimum quality)
Graphs by Scenario
Figure C10. Average maximum and minimum quality of schools applied to, by (true) student caliber and scenario, year 30.
Download