Domestic Migration Networks in the United States

by

Robert Allen Manduca

B.A., Swarthmore College (2010)

Submitted to the Department of Urban Studies and Planning in partial fulfillment of the requirements for the degree of Master in City Planning at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2014

© Robert Allen Manduca, MMXIV. All rights reserved.

The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Author: Department of Urban Studies and Planning, May 22, 2014
Certified by: Albert Saiz, Associate Professor, Thesis Supervisor
Accepted by: P. Christopher Zegras, Associate Professor, Chair, MCP Committee

Domestic Migration Networks in the United States
by Robert Allen Manduca

Submitted to the Department of Urban Studies and Planning on May 22, 2014, in partial fulfillment of the requirements for the degree of Master in City Planning

Abstract

In recent years, there has been substantial interest in understanding urban systems at the national and global scales: what are the economic and social ties that link cities together, and what is the network structure formed by such ties? At the same time, human capital accumulation is increasingly seen as a primary driver of regional economic growth. Domestic migration patterns have the potential to illuminate the social and economic connections among cities, while also highlighting economically significant flows of human capital. In this thesis I examine the US city system through the lens of gross migration flows, taking advantage of unusually complete data on county-to-county migration compiled annually by the IRS. I compare the observed flows to those predicted by the radiation model,
finding most notably that there are far more long-distance migrants than would be predicted based on the spatial distribution of population alone. I then use reciprocal migration patterns to construct a migration network connecting metro areas in the United States. I utilize current-flow centrality measures to identify the most prominent nodes in this weighted network. Additionally, I use repeated applications of the Louvain community detection algorithm to identify reasonably robust communities within the migration network. These exhibit a striking degree of spatial contiguity.

Thesis Supervisor: Albert Saiz
Title: Associate Professor

Acknowledgments

The seeds of this thesis were planted more than four years ago, when I first encountered the bloggings of Aaron Renn. At the time, the idea of studying these topics in earnest, at MIT no less, seemed fanciful. After two years here, all I can say is that my experience has exceeded all possible expectations.

A number of people have contributed to making that the case. Foremost among these are my classmates: I wouldn't have thought it possible to feel so close to so many people I find so impressive. My advisor, Albert Saiz, has been a consistent source of guidance on everything from the specifics of this thesis to what I should do with my life. He was invaluable in helping me to craft a curriculum in quantitative social science at DUSP. My thesis reader, Marta Gonzalez, has been a tremendous source of enthusiasm and methodological insight throughout this process. I am grateful to her for being so willing to embrace a student from urban planning. Xavier de Souza Briggs and Joe Ferreira have both been foundational to my experience at DUSP, and conversations with them have critically informed my academic and career trajectory. A number of professors across the Institute--Amy Glasmeier, Phil Clay, Phil Thompson, Karl Seidman, Cesar Hidalgo, Fiona Murray,
and Eran Ben-Joseph, among others--have opened my mind and profoundly shaped my worldview. I was expecting MIT professors to be brilliant, but I was not expecting them to be such outstanding teachers.

A big shout out to all of the staff at DUSP. Ezra Glenn has enriched my experience immensely through his boundless creativity, energy, and wisdom. Kirsten Greco kept me on track to graduate by cheerfully answering countless inquiries about departmental minutiae, and Janine Marchese was admirably sanguine about processing a similar number of reimbursement requests.

My partner Roseanna continues to challenge me intellectually, guide me ethically, and support me emotionally. This probably wouldn't have been possible without her.

Finally, my family. My sister Katie remains a constant source of cheer in my life, and a model of how to face difficult situations with courage and good humor. As for my mother and father: the further I go in life the more I realize just how much they have given me. A son could not ask to be raised by better parents.

Contents

1 Introduction
2 Motivating Literature
    2.1 Investigations of Migration
        2.1.1 Determinants of Migration
        2.1.2 Consequences of Migration
    2.2 Human Capital, Social Networks, and Economic Development
        2.2.1 Human Capital and Economic Growth
        2.2.2 The Importance of Social Networks in Cross-Regional Economic Activity
    2.3 Mapping and Analyzing the Urban System
    2.4 Themes and Direction
3 Data Source and Preparation
    3.1 Data Source
    3.2 Data Preparation
        3.2.1 Metropolitan Areas
    3.3 Net, Gross, and Reciprocal Migration
4 Observed and Modeled Migration Patterns
    4.1 Metropolitan Migration Rates
    4.2 Individual Migration Flows
    4.3 The Radiation Model
        4.3.1 Implementation of the Radiation Model
    4.4 Radiation Model Results
    4.5 Patterns of Residuals
5 Centrality Analysis of the Migration Network
    5.1 Degree Centrality
    5.2 Closeness Centrality
    5.3 Betweenness Centrality
6 Detecting Communities in the Migration Network
    6.1 Approach to Community Detection
        6.1.1 Modularity
        6.1.2 The Louvain Algorithm
        6.1.3 Initial Partitions
        6.1.4 Repeated Louvain Runs
    6.2 Community Roles of Individual Metro Areas
        6.2.1 Extra-Community Degree
        6.2.2 Community Diversity
        6.2.3 Within-Community Role
7 Discussion, Limitations, and Further Research
    7.1 Limitations of the Current Research
    7.2 Further Research
References

Chapter 1

Introduction

The United States has long been a particularly mobile society.
Overwhelmingly descended from immigrants, Americans don't seem to stay rooted to their place of birth at the same rate as residents of most other countries. On multiple occasions throughout the nation's history (the era of westward expansion, the Great Migration, the rise of the Sun Belt) literally millions of Americans have uprooted their lives and moved west, north, or south in search of riches, opportunity, or simply better weather. Although internal migration has decreased in recent years, the US still has one of the highest rates of domestic migration in the world (Frey, 2009; Molloy, Smith, & Wozniak, 2011). Moving across the country for college, a job, or just a change of scenery is not unusual.

This propensity to migrate is of particular interest now because of the rising economic importance of human capital. Increasingly, the knowledge spillovers created by having large numbers of workers in the same industry clustered together are seen as a major driver of long-term economic growth. If a significant fraction of that talent pool decides one day to leave, that may cast a shadow on the economic future of their region. Alternatively, if a region can increase its share of the migration take it may be able to improve its economic wellbeing. Under these circumstances, understanding how people move around the country and what drives that movement becomes more than just a matter of intellectual curiosity. The economic vitality of cities and states may depend on it.

At the same time, in these days of ideological polarization and cultural fracturing, migration flows offer a glimpse of the social ties that knit different parts of the country together. People are most likely to move to places they are familiar with, and when they move they do not abandon all ties to their former home. A large exchange of migrants between two regions therefore is both an indicator of existing social connection between them and a harbinger of stronger linkages to come.
Examining the full set of migration ties offers the possibility of identifying relatively self-contained regions of the country and the connecting cities that link them together.

This thesis explores domestic migration in the United States with an eye toward these two topics: human capital accumulation and social distance. Utilizing a large and unusually complete dataset on county-to-county migration patterns from the IRS, I look at the migration flows around the US and the cities they connect. I begin by simply observing the patterns of migration, and comparing them to a model of what patterns might occur if there were no economic or cultural differentiation among cities. Finding that such a model cannot explain the observed patterns, I shift to examining the connections that they illuminate among cities. I find that certain cities occupy particularly central positions in the migration network, serving as hubs that connect otherwise separated parts of the country. Finally I investigate more formally the boundaries of these regions, using the data to articulate migration regions within the country and investigating the roles played by different cities within those regions.

The following section situates this analysis by providing an overview of previous studies of migration, as well as research on the importance of human capital and social networks to economic growth. Chapter 3 discusses the data source used and the process of preparation. The ensuing three chapters describe my investigations in detail, outlining the methods used for each and the results found. I conclude with a discussion of my findings, their limitations, and possibilities for further research.

Chapter 2

Motivating Literature

2.1 Investigations of Migration

Modern study of domestic migration is generally acknowledged to have begun with Ravenstein's study of migration patterns in nineteenth century Britain (Ravenstein, 1885).
He formulated seven laws of migration, among them that most migrants tend to move relatively short distances, that a flow of migrants in one direction produces a countervailing flow in the other direction, and that long-distance migrants tend to move to major cities. Since then, scholarship on domestic migration has proliferated in a diverse array of approaches, theories, and methods. The majority of work has attempted to identify the determinants of migration, at both the individual and aggregate levels. What kinds of people migrate, and why do they go where they do? A second, smaller branch of research has attempted to address the consequences of migration for migrants and the places they encounter.

2.1.1 Determinants of Migration

At the individual level, migration has been widely conceptualized as a human capital investment. This was first proposed by Sjaastad (Sjaastad, 1962) and is widely used today. A person is hypothesized to migrate if the expected benefits of doing so, in terms of increased wages or improved access to amenities, exceed the costs of the move. The importance of wage increases has figured most prominently in the literature (Yezer & Thurston, 1976; Kennan & Walker, 2011), although there does appear to be an initial period of wage decline following a move (Grant & Vanderkamp, 1980; Borjas, Bronars, & Trejo, 1992). Some evidence has suggested that factors other than employment prospects may in fact be dominant in the majority of cases (Morrison & Clark, 2011), or that the relative importance of employment and amenity factors may vary over the lifecycle (Chen & Rosenthal, 2008).

Scholarship on demographic characteristics and migration has found that the propensity to migrate tends to peak in the 20s and 30s, and that the overall national migration rate has risen and fallen on the backs of generations (Plane & Rogerson, 1991; Plane, 1993). Migration rates have also been shown to increase with education level (Greenwood,
1997; Kodrzycki, 2001), and specifically, educated workers have been shown to be more likely to move long distances to areas with better employment prospects (Wozniak, 2010).

Early economic analyses of migration at the regional level viewed it largely as a means by which regional differences in wages and unemployment were equilibrated (Courchene, 1970; Vanderkamp, 1971). If wages in one part of the country were especially high, more people would migrate there until the labor supply increased enough to bring wages down. In the 1980s, an alternative view was proposed that argued that wage differentials across regions are not evidence of a lack of equilibrium, but rather are compensating for differences in amenities between locations, most notably due to climate (Graves, 1980; Graves & Knapp, 1988). In this view interregional migration is not a response to disequilibrium in labor markets but rather reflects changing preferences for the various consumption baskets offered by different cities.

It is worth noting that the equilibrium and disequilibrium approaches are not entirely contradictory: both almost certainly operate at different points in time. Rather, the question is whether regions are mostly in a state of equilibrium or one of disequilibrium. The equilibrium approach also emphasizes the need to control for amenities in the destination region in addition to economic variables, since high wages can be considered compensation for a less appealing set of amenities (Hunt, 1993). However, both frameworks are called into question by recent work by Kemeny and Storper that documents sustained regional disparities in both wages and amenities (Kemeny & Storper, 2012).

Models of migration at the regional level in both the disequilibrium and the equilibrium framework generally model gross or net flows between regions as a function of distance and of conditions at the origin and destination regions.
Common explanatory variables include per capita income, unemployment rates, tax levels, measures of government expenditures, and climatological variables (Schachter & Althaus, 1989; Treyz, Rickman, Hunt, & Greenwood, 1993).

One possibility that has not received extensive amounts of study is that there may be a role for path-dependency in domestic migration. In international migration, the phenomenon of "chain migration" is frequently observed: the first few migrants to a new country will settle in a given city essentially at random (sometimes almost literally, as the government decides where to settle refugees). However, once a few migrants are established there is a great tendency for further migrants from that country to settle in the same place. The original settlers can assist newcomers in adjusting to the new country, and can provide information about opportunities in their new surroundings to acquaintances back home (Elsner, Narciso, & Thijssen, 2013; Massey, 1988). Although the cultural differences in internal migration tend not to be as stark as those involved in international migration, there is still room for information spread through social networks to make a substantial difference in the decisions of migrants. One attempt to model internal migration flows found that migration patterns in 1970 were a better predictor of migration in 2000 than both gravity models and models based on other forms of connectivity such as air traffic (Andris, Halverson, & Hardisty, 2011). Unfortunately this study was unable to conduct a comparison with a fully specified econometric model including all of the variables described above.

2.1.2 Consequences of Migration

Research on the regional impacts of internal migration has been less substantial than that on its determinants. It is relatively difficult to make causal inference about the economic impacts of migration into a region, and there have been few attempts.
A larger number of studies have looked at the demographic impacts of domestic migration in terms of the net population change to a region or changes in the relative size of different demographic groups. Overall net domestic migration is frequently used as an indicator of regional economic health in the popular media (Kotkin, 2012), despite having systematic biases as a statistic (Rogers, 1990). It has been documented that regions that receive large numbers of immigrants have tended to have substantial net domestic outmigration, especially of low-skilled workers (Frey, 1994; Borjas, 2006). Beginning in the 1990s, this prompted concerns about "demographic Balkanization," in which different regions of the country would become increasingly culturally and ethnically distinct, dividing the country spatially along racial, class, and ideological lines (Frey, 1995, 1996; Wright, Ellis, & Reibel, 1997).

One consequence of migration that has attracted a great deal of concern from policymakers, especially in rural or economically distressed areas, is that of "brain drain." In the domestic context, this is the net out-migration of a region's young, college-educated workers. Due to the importance of human capital in stimulating economic growth (see below), the prospect of college graduates leaving town is frequently viewed with alarm by policymakers, and many states and cities have implemented retention programs for their students, offering financial incentives to attend college in the state or to remain after graduation. Brain drain almost certainly does occur, in the sense that university graduates are extremely mobile and many do move away after graduation (Hansen, Ban, & Huggins, 2003; Sanderson & Dugoni, 2002; McGuire, Hardy-Johnston, & Saevig, 2006; Stricker, 2007).
However, several studies have found a small but real relationship between the location of a school and the region where its graduates end up (Bound, Groen, Kézdi, & Turner, 2004; Groen, 2004; Gottlieb & Joseph, 2006). Further, 20-29 year olds nationally are the only age group with a strong net tendency to move into large cities, moves that are balanced out by older residents leaving (Plane, Henrie, & Perry, 2005). This suggests that the proper way for municipalities to approach brain drain may be to acknowledge that young educated workers are likely to move to a major city as a natural part of their life course, note that attending college in an area in fact makes them more likely to end up there than if they had gone to a different school, and perhaps focus on attracting their parents.

2.2 Human Capital, Social Networks, and Economic Development

2.2.1 Human Capital and Economic Growth

Studies of migration have gained renewed relevance in recent years with the increasing emphasis given to the role of human capital in economic development. Knowledge spillovers stemming from the concentration of human capital are seen as one of the primary mechanisms driving endogenous growth (Lucas, 1988; Romer, 1990). Cities with high levels of human capital have been shown to grow more quickly than those with lower levels, perhaps because they are more able to adapt to negative economic shocks (Glaeser, Scheinkman, & Shleifer, 1995; Glaeser & Saiz, 2003; Black & Henderson, 1999; Gottlieb & Fogarty, 2003). This evidence has put a premium on high-skilled workers, and urban policymakers have been encouraged to pursue growth strategies that attempt to accumulate human capital (Mathur, 1999).

The role of urban amenities in particular has received a great deal of attention from both academic and popular sources.
High-amenity cities have been shown to grow faster than those with few amenities (Glaeser, Kolko, & Saiz, 2001), and residential amenities have been found to impact the location decisions of firms (Gottlieb, 1995). Focusing on high-skilled workers specifically, Richard Florida has found them to be concentrated in areas with high levels of diversity as well as amenities (Florida, 2002). Whisler et al. suggest that different types of amenities are important to college-educated workers at different stages in the lifecycle, with young workers being attracted to areas with large numbers of cultural amenities while older workers are drawn to areas with low crime rates and mild climates (Whisler, Waldorf, Mulligan, & Plane, 2008).

It should be noted that there is still very much a debate in the literature about whether the causality described above (amenities attract human capital, which attracts companies) is in fact correct, or if the causality is more or less inverted: strong career prospects attract highly skilled workers to a region, who create a market for the amenities. Some economic geographers, most notably Michael Storper, argue for the latter scenario, claiming that firms at the forefront of innovation-driven sectors are able to secure excess profits, which attract highly skilled workers who are able to create the next round of innovations (Storper, 2010).

2.2.2 The Importance of Social Networks in Cross-Regional Economic Activity

It is increasingly understood that economic activity is embedded in a network of social relationships (Granovetter, 1985). People and places occupying central positions in this network may be able to reap rewards unavailable to those on the periphery (Borondo, Borondo, Rodriguez-Sickert, & Hidalgo, 2014). The importance of social networks to employment prospects in particular is evidenced by the proliferation of LinkedIn and the popularity of happy hours at conferences.
But it was studied more extensively by Mark Granovetter, who found that among white collar workers a large proportion come to their jobs via leads from their social networks (Granovetter, 1973). Granovetter noted that most of these leads came from people the job seeker saw relatively rarely. He suggests that these "weak ties" are extremely valuable, because they connect people who for the most part occupy different social circles and are therefore exposed to different information and different opportunities. An individual with many weak ties will be able to draw on a much larger and more complete body of awareness about what is going on in the world.

Inter-regional migrants are likely to have large numbers of these weak ties. In many cases they will leave a dense network of family and friends behind when they move, a group they feel a strong connection with but no longer see very frequently. In addition, they will create a new circle of social ties in their new home. They are thus very well poised to act as a bridge between their old friends and their new neighbors.

The economic impacts of such bridges on their home regions can be powerful. In the international context, entrepreneurs returning from Silicon Valley were instrumental to the development of the Taiwanese tech industry, largely because they were able to use their connections to and knowledge of Silicon Valley to identify areas where Taiwan could play a complementary role (Saxenian & Sabel, 2008). A similar process has played out in the emergence of Israel's tech industry (Senor & Singer, 2009). In fact, over the past two decades many countries have shifted from viewing their expatriates as a national embarrassment to seeing them as an important economic resource that allows the country to tap foreign markets and investors (Gamlen, 2011; Kuznetsov, 2006).

Even in the domestic context, social ties are important to the development of businesses and industries.
They are a primary mechanism by which innovations are diffused (Hägerstrand, 1966). Business owners tend to purchase from suppliers whom they know, and investors will be predisposed towards cities they are familiar with (Pred, 1977). All of this means that regions with high numbers of informal social links to other, prosperous regions, at what has been termed a low "social distance" (Andris, 2011), may be likely to prosper as well. Practitioners of economic development are increasingly less concerned with retaining residents than with attracting large volumes of migrants from elsewhere, who bring with them new connections and ideas, keeping the intellectual ecosystem of the city vital (Piiparinen & Russell, 2013).

The literature on human capital, economic development, migration, and social ties suggests that migration flows may not be simply a result of economic conditions in the affected regions. Rather, migration may also contribute to the economic development of both sending and receiving regions, whether by supplementing the human capital stock of the receiving region or by creating social links between the two.

2.3 Mapping and Analyzing the Urban System

A number of researchers have attempted to map the economic and social connections among cities over the years. The current round of interest in "city networks" can probably be dated to Saskia Sassen's book The Global City, published in 1991 (Sassen, 1991). Sassen noted that certain high-profile cities, most notably New York, London, and Tokyo, were increasingly becoming the command and control centers for the world economy, and that in many ways they had stronger connections to each other than they did to the rest of their home countries. Attempts to delineate world cities as proposed by Sassen and explore the connections between them have used the location of advanced producer services firms (Taylor, 2001).
These advanced producer services (accounting, corporate law, finance) were argued by Sassen to be the primary mechanisms by which world cities achieve their prominent role. By examining where such firms locate and looking at co-location patterns of their offices, Taylor was able to define a world-city system and articulate links between its component parts.

City systems did not begin with transnational producer services, however, and urban system analysis did not begin with Sassen. In the United States, Allan Pred created an extensive body of historical work on the formation of the US urban system in the eighteenth and nineteenth centuries (Pred, 1977, 1973, 1980) and the spatial diffusion of innovations (Pred, 1975, 1971). Other, recent efforts to map the connections between cities at both a domestic and international level have utilized data from air traffic patterns (Guimera, Mossa, Turtschi, & Amaral, 2005; Neal, 2010), telephone calls (Ratti et al., 2010; Calabrese et al., 2011), and the movement of dollar bills (Thiemann, Theis, Grady, Brune, & Brockmann, 2010).

2.4 Themes and Direction

The economic literature on migration has shed a great deal of light over the years on the determinants of migration: what types of people tend to migrate, what drives them to do so, and how they choose where to move. The empirical literature thus far has had less to say about the impacts of migration, especially at the regional level. However, some glimmers of a theory about such impacts emerge from the literature on human capital and economic growth, migration, and social networks that are worth bearing in mind. It seems plausible that, rather than being solely an effect of economic prosperity in a region, migration may contribute to it. The increase in human capital from in-migration to a region may endogenously contribute to economic growth in the future.
Less directly, it is possible that the social ties migrants establish between their old and new homes may improve the economic prospects of each. Finally, if individuals migrate only when it increases their overall well being, as hypothesized by the human capital investment model, then regions with high volumes of migration churn may enjoy a net benefit independent of any absolute gains in population due to improved matching between residents and opportunities.

Chapter 3

Data Source and Preparation

3.1 Data Source

My primary data source throughout this thesis is the IRS Statistics of Income Migration Data. This dataset is put together each year based on address changes reported on individual tax returns filed with the IRS. Data is aggregated to the county level. For each county, the dataset contains inflows and outflows to each other county in the country measured in three ways: the number of returns, the number of exemptions (generally one exemption is claimed for each person in a filer's family), and the total adjusted gross income. Data for a given pair of counties is suppressed if there are fewer than 10 returns (Internal Revenue Service, 2014).

This dataset is extremely powerful because it is not based on a sample: the entire population of tax filers in the United States is used. That allows researchers to examine the flows of migrants in much greater detail than is possible with datasets based on surveys or samples of the population, which tend to have large margins of error for small geographic samples (Isserman, Plane, & McMillen, 1982). For instance, the American Community Survey 2006-2010 5-year estimates contain a table of county-to-county migration flows, but the margins of error are generally larger in magnitude than the estimates.

The construction of the dataset does present a few limitations, however. Most notably, since it is constructed based on federal income tax returns, it excludes those people who do not file income taxes.
This means that these counts likely underrepresent the poor and the elderly, who are not required to file federal income taxes. Additionally, this data is limited to returns that have been filed by September of the filing year. This captures roughly 95-98% of all returns, but late returns tend to be complex and to report high income, meaning that this data may underrepresent the very wealthy as well (Gross, 1999).

In order to examine the completeness of the dataset, I compare it to population data from the 2010 US Census. For each county in the 2009-2010 IRS dataset I sum the number of exemptions reported as non-migrants and the total in-migrants from all other states and foreign countries. This gives the total number of exemptions for taxes filed in the county in 2010, a rough approximation of the tax filing population as of April 2010. I then calculate the ratio of this total exemption number to the population in the 2010 census. A histogram of this ratio for the entire country is shown in Figure 3-1. On average, counties have about 79% as many exemptions as they do people. The overall ratio of exemptions to people for the entire country is 79% as well. It is fair to say, then, that my dataset for 2010 includes exact data for roughly 79% of the country, but misses completely the other 21%, who are likely to be on average poorer and/or more elderly than those present in my data.

[Figure 3-1: Histogram of Exemptions Compared to Population, 2009-2010. Horizontal axis: Ratio of Exemptions to Population.]

3.2 Data Preparation

In this analysis, I use data from 2009-2010. The IRS data counts the number of people who changed their address between the previous year's and the current year's filings. Here that means it generally captures people who moved between April 15, 2009 and April 15, 2010.
Note that taxes filed in a given year pertain to income earned the year before, so income reported as moving may actually have been earned prior to the move.

3.2.1 Metropolitan Areas

I conduct my analysis at the metropolitan level. The metropolitan area (a core municipality and its surrounding suburbs) is generally considered the most logical definition of a "city" for the purposes of economic analysis, since in many cases the exact municipal boundaries are drawn for political or historical reasons that have little connection to current economic or social processes. In particular, I aggregate counties into Core Based Statistical Areas, as defined by the Office of Management and Budget. CBSAs are collections of counties chosen to consist of one or more urban cores with at least 10,000 people along with the surrounding areas to which they are linked socioeconomically, as measured by commuting patterns (Office of Management and Budget, 2013). As opposed to migration between counties, where a person might move from the city to the suburbs but keep the same job and social ties, migration between CBSAs can generally be thought of as involving a substantial change in a person's life.

Since each county is assigned to at most one CBSA, it is fairly straightforward to map each origin and destination county to its appropriate CBSA and then aggregate the county-to-county flows into CBSA-to-CBSA flows. This results in a dataset with one row for each pair of CBSAs containing the number of returns, exemptions, and gross income flowing in each direction. CBSAs can be further divided into Metropolitan Statistical Areas (MSAs), which have an urban core of at least 50,000 people, and Micropolitan Statistical Areas, which have urban cores of between 10,000 and 50,000 people. For the rest of this thesis, the terms CBSA, MSA, metro, and city will be used interchangeably.
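The county-to-CBSA aggregation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the thesis: the function name, the tuple layout of the flow records, and the lookup dictionary are all assumptions, and the sketch simply drops counties that belong to no CBSA as well as moves that stay within a single CBSA.

```python
from collections import defaultdict


def aggregate_to_cbsa(county_flows, county_to_cbsa):
    """Sum county-to-county flows into CBSA-to-CBSA flows.

    county_flows: iterable of (origin_county, dest_county,
                  returns, exemptions, gross_income) records
                  (hypothetical layout).
    county_to_cbsa: dict mapping a county identifier to its CBSA
                    identifier; counties outside any CBSA are absent.
    Returns a dict keyed by (origin_cbsa, dest_cbsa) with a tuple of
    (returns, exemptions, gross_income) totals.
    """
    totals = defaultdict(lambda: [0, 0, 0])
    for origin, dest, returns, exemptions, income in county_flows:
        origin_cbsa = county_to_cbsa.get(origin)
        dest_cbsa = county_to_cbsa.get(dest)
        # Assumption of this sketch: skip counties outside any CBSA
        # and moves between counties of the same CBSA.
        if origin_cbsa is None or dest_cbsa is None:
            continue
        if origin_cbsa == dest_cbsa:
            continue
        cell = totals[(origin_cbsa, dest_cbsa)]
        cell[0] += returns
        cell[1] += exemptions
        cell[2] += income
    return {pair: tuple(values) for pair, values in totals.items()}
```

Because each county maps to at most one CBSA, a plain dictionary lookup is enough; no flow has to be split between destinations.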
3.3 Net, Gross, and Reciprocal Migration

The first quantity that generally comes to mind when considering migration flows to and from a region is the net change. This is the bottom line (the overall impact of migration on a region's population) and it is certainly an important quantity to know. However, one of the benefits of the IRS dataset is that it allows us to go beyond the net changes to examine the gross flows that underlie them, which often paint a much richer picture.

Consider New York and Boston. As illustrated in Figure 3-2, from 2009 to 2010, according to the IRS data, 7035 people moved from the Boston MSA to the New York MSA, while 7641 people moved in the other direction. That means that there was a net flow of 606 people from New York to Boston. But these 606 people represent just 8% of the people moving from New York to Boston, and just 4% of the total people moving between those two cities. Focusing solely on net change ignores the other 96% of people moving back and forth.

Figure 3-2: Migration Between New York and Boston, 2009-2010 (diagram labels: Boston, New York, Net Migration, Reciprocal Migration)

As another illustration, consider Boston's relationship with two other cities, Chicago and Detroit. In 2010, Boston posted net gains of 211 people from Chicago and 243 from Detroit. But the net gain from Chicago was just 8% of the 2713 people moving back and forth, while that from Detroit was a full 43% of the total migrants. Boston clearly has a much stronger connection with Chicago than with Detroit, but focusing solely on net migration misses that.

To fully access the information contained in migration data, it is necessary to move beyond just net migration and examine the other components of the flow. In- and out-migration are relatively straightforward: the total number of people moving in each direction. Some authors are increasingly promoting the use of gross migration, the total number of people moving in both directions.
Because this thesis views migration as a measure of exchange and connection between cities, I will make extensive use of a measure I call "reciprocal migration." This is the number of people who switched places between two cities in a given year: the number that moved in both directions. It is the complement of net migration, tracking all of the population movement that didn't contribute to a net population shift. Using reciprocal migration isolates the exchange part of migration from the directional movement portion. The relationships among these various types of migration can be understood through the equation:

$$G = I + O = N + 2R$$

Where $G$ is gross migration, $I$ is in-migration, $O$ is out-migration, $N$ is net migration, and $R$ is reciprocal migration.

Chapter 4 Observed and Modeled Migration Patterns

4.1 Metropolitan Migration Rates

In 2009-2010, 9,653,424 people (as approximated by tax exemptions in the IRS data) moved across county boundaries in the United States. 5,280,236 of these moved to another MSA. Table 4.1 shows the top ten metro areas by number of migrants received in 2009-2010. For each metro, it lists the population of the MSA, the number of migrants who entered, the number of migrants who left, and the net population change due to migration.

This table brings home the distinction between net and gross migration. New York and Los Angeles received more migrants than any other cities in the country during 2009-2010. However, their net population change due to domestic migration was negative because even larger numbers of people left. On the other hand, cities like Riverside, Washington DC, and Houston all posted substantial population gains due to migration despite smaller overall flows than LA or New York. Miami and San Diego form a third category, with hardly any net migration at all despite seeing almost 100,000 people move in and out.
In addition, Table 4.1 shows that the net flows are quite small compared to the total number of people moving. Los Angeles had the largest net outflow in the country, and Riverside had the largest net inflow. In both of these cases the net change is less than a quarter of the total flow in that direction, and that proportion drops relatively quickly. On the whole, net population gains and losses due to migration amount to just 7.1% of the total flows into and out of cities, and the average pairwise flow has a net population shift accounting for just 16% of its total flux.

Table 4.1: Top 10 Migrant-Receiving Metros, 2009-2010

MSA                  Population   In-migrants   Out-migrants   Net Migration
Los Angeles, CA      12,828,837       194,069        258,610         -64,541
New York, NY         18,897,109       155,501        214,875         -59,374
Riverside, CA         4,224,851       152,943        117,577          35,366
Washington, DC        5,582,170       124,960        102,225          22,735
Dallas, TX            6,371,773       121,958        103,831          18,127
Houston, TX           5,946,800       103,380         80,884          22,496
Phoenix, AZ           4,192,887        98,921         93,317           5,604
Miami, FL             5,564,635        98,188         99,582          -1,394
San Diego, CA         3,095,313        95,048         94,514             534
San Francisco, CA     4,335,391        94,551        101,321          -6,770

Focusing on population exchange, Figure 4-1 maps the total reciprocal migration by metro. In this map as well as the many to follow, the circles representing MSAs are sized according to population. They are colored based on the variable indicated, with blues indicating low values and reds indicating high ones. Here, we see that the metros with the largest overall numbers of reciprocal migrants are Los Angeles, New York City, and Riverside. New York and LA are not surprising considering that they are the two largest cities in the country. Riverside is less expected, but over half of its migrants are exchanged with Los Angeles. The other cities with high levels of reciprocal migration are Washington DC, Dallas, San Francisco, Miami, San Diego, Chicago, and Phoenix.
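The relationship between gross, net, and reciprocal migration for a single pair of cities can be checked with a short calculation. The sketch below is illustrative only (the function name and dictionary keys are assumptions, not part of the original analysis); it treats net migration as an absolute size and reciprocal migration as the smaller of the two directional flows, and it uses the New York-Boston figures quoted in Section 3.3.

```python
def decompose_pair(inflow, outflow):
    """Split the flows between two cities into gross, net, and
    reciprocal migration, from the point of view of one city.

    Net migration is reported as an absolute size so that the
    identity gross = net + 2 * reciprocal holds.
    """
    gross = inflow + outflow
    net = abs(inflow - outflow)
    reciprocal = min(inflow, outflow)
    # Sanity check of the identity G = I + O = N + 2R.
    assert gross == net + 2 * reciprocal
    net_share = net / gross if gross else 0.0
    return {
        "gross": gross,
        "net": net,
        "reciprocal": reciprocal,
        "net_share_of_gross": net_share,
    }


# Boston's view of its exchange with New York, 2009-2010:
# 7641 people arrived from New York and 7035 left for New York.
boston_ny = decompose_pair(inflow=7641, outflow=7035)
```

Summing the reciprocal term over all of a metro's partners would give a metro-level total of the kind mapped in Figure 4-1, under the assumption that the metro totals are built pair by pair.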
In addition to looking at the total numbers of people moving between cities, it is instructive to consider the rate of migration relative to population. One can think of a city's migration "churn rate" as the percentage of its population that migrates in and out over the course of a given year, independent of any net changes in population. Churn rate has no direct effect on the population size of a city, but places with high churn rates will see large portions of their populations cycle in and out each year, perhaps indicating greater dynamism.

Figure 4-2 maps the churn rates of metros. Two things stand out. The highest churn rates are found in relatively small metro areas that are distributed seemingly at random around the country. However, on closer inspection these appear to be metros that contain military bases. Many of these metros are also among the cities whose migrants travel the farthest average distances. This makes sense: there are a lot of members of the armed forces, and they tend to dominate the rural areas surrounding major bases. They are also likely to move frequently due to changing deployments or discharge from service, and they often have little prior relationship with their post. Because these movements are not strictly economic decisions (army staff choose where to send people based on military needs, and wages are not tied to being in a particular location), they may have different economic impacts than economically motivated migration. However, if soldiers engage with the local communities they may still form social ties that they maintain after their period of service.

Beyond the very high churn rates surrounding military facilities, some regions of the country appear to have generally higher churn rates than others. The West, especially the desert Southwest, and Florida stand out for relatively high churn rates across a number of metro areas. The Northeast and Midwest appear to have lower churn rates on the whole.
Figure 4-3 displays box plots of the distribution of churn rates by Census region. The median western metro has a churn rate more than 50% higher than the median metro in the West North Central region (the western Midwest and northern Great Plains).

Figure 4-1: Reciprocal Migration, 2009-2010

Figure 4-2: Migration Churn Rate, 2009-2010

Figure 4-3: Migration Churn Rate by Census Region, 2009-2010 (regions shown: Mountain, Pacific, South Atlantic, WS Central, New England, EN Central, ES Central, Mid Atlantic, WN Central)

4.2 Individual Migration Flows

Narrowing the focus of analysis, Table 4.2 shows the largest individual flows between pairs of cities. This list is dominated by pairs of large MSAs adjacent to each other, and particularly by the flows between Los Angeles and Riverside, California (the number of people moving between L.A. and Riverside is greater than the rest of the top ten flows combined). In many of these cases, it can be debated whether the two MSAs are truly distinct economic units. In fact, the OMB releases a list of "Combined Statistical Areas" composed of groups of adjacent MSAs that have particularly strong mutual ties. Of the top 10 flows, only those from New York to Miami and Philadelphia cross CSA boundaries.

This raises an interesting point. It is impressive that 90,000 people moved from Los Angeles to Riverside in one year, but it is not necessarily surprising that large numbers of people move between adjacent large cities. Figuring out whether the LA-Riverside migration is notable requires a method to determine which flows are substantially larger than would be expected a priori, and thus suggestive of unusually strong economic or cultural ties between their sending and receiving cities, and which can be explained by geography alone. Identifying meaningful migration links requires a "null hypothesis" for comparison.
How to formulate this null hypothesis is a difficult question. It doesn't make sense to assume that migrants have an equal propensity to migrate between all pairs of cities, but any departure from this should have a theoretical foundation. Further, determining which factors explain two cities' inherent likelihood of exchanging migrants and which ones are additional explanatory variables that signify a special relationship between the cities is something of a judgment call. If there is an interstate highway directly connecting two MSAs, should that go into the model as an a priori predictor of increased migration between them? What about if they are in the same state, and thus share many of the same political and economic institutions? It is conventional wisdom that there has been a net migration of people to warmer climates over the past several decades. Should temperature be considered an inherent factor in predicting migration relationships?

Table 4.2: Top 10 Migration Flows, 2009-2010

Origin               Destination          Migrants   Distance (km)
Los Angeles, CA      Riverside, CA          93,807             194
Riverside, CA        Los Angeles, CA        55,729             194
San Jose, CA         San Francisco, CA      20,014             125
Washington, DC       Baltimore, MD          18,732              89
San Francisco, CA    San Jose, CA           18,120             125
Baltimore, MD        Washington, DC         16,936              89
New York, NY         Philadelphia, PA       16,870             156
San Diego, CA        Riverside, CA          16,179             179
New York, NY         Miami, FL              15,374           1,743
Philadelphia, PA     New York, NY           13,964             156

Ultimately, I include population and distance as the exogenous factors that should be expected to influence migration patterns independent of any special relationships between cities. The population of a sending city determines the number of potential migrants, while that of a receiving city is a reasonable proxy for its overall attractiveness as a migration destination: the likelihood of having a relative or a job opportunity there, for example.
Following Tobler's first law of geography, all else equal we would expect a city to have less interaction with distant metros than with nearby ones.

4.3 The Radiation Model

Even with only two predictor variables there are a number of options for the functional form of the "null hypothesis" model. Traditionally geographers have been fond of gravity models, which take inspiration from the equations governing the gravitational forces that all physical bodies impose on each other. The amount of interaction between two cities is hypothesized to be proportional to the product of those cities' populations divided by the distance between them raised to some exponent. Theory is agnostic on what that exponent should be (in the case of physical gravity, it is equal to two, but that exponent has not been universally observed among geographic phenomena), and in practice it is often selected to fit the data.

The gravity model has a number of appealing properties and it is in wide use today. However, one weakness of it is that while it incorporates information on the population of the origin and destination, as well as the distance between them, it does not include any information on what exists in the space between the cities. The intervening features of a landscape can have a strong impact on the amount of interaction between two places, independent of distance. In Montana, high school sports teams will travel hundreds of miles for a regular season game, because there are so few towns in the intervening area. On the east coast that type of interaction rarely involves a trip of more than a dozen miles, because there are so many nearby alternatives.

One recent attempt to incorporate the population of the area between two cities, and to remove the free parameter that is the exponent in the gravity model, has been termed the "radiation model" (Simini, González, Maritan, & Barabási, 2012).
Rather than using the analogy of gravitational forces, the radiation model treats migrants (or commuters, or freight flows) as if they are physical particles. The probability of a given particle colliding with another at a distance d is equal to the probability of it hitting the latter while not having collided with any particle in the intervening space.

In the case of migration, we can imagine that a person will move if she becomes aware of an opportunity that is sufficiently superior to her current living situation in terms of employment options, access to friends or relatives, access to amenities, or other factors. Two assumptions about these opportunities underlie the radiation model. First, it is assumed that the likelihood of there being such an opportunity in a given city is proportional to its population. Most of the factors that draw people to migrate are in fact correlated with population, so this is a fairly reasonable assumption. However, it does assume that the types of opportunities are distributed uniformly across the country: if Chicago has twice the population of Minneapolis, then it has twice the number of opportunities, period, and there's no reason why it might be more or less attractive to certain subsets of the population. This is almost certainly false, but it creates an effective null hypothesis.

Second, it is assumed that people will tend to move to opportunities closest to their current locations. This could be because they are more likely to learn of opportunities that are nearby, or because longer moves are more costly. The result of these two assumptions is the assertion that people will only move if they find a sufficiently superior opportunity, and that they will move to the closest such opportunity that they find. To estimate the overall flow of migrants from one city to another, this decision process is multiplied by the total number of people in the city of origin.
The estimated flow is thus directly proportional to the population of both cities: the city of origin because its population is the pool of potential migrants, and the destination city because the presence of attractive opportunities is correlated with population. The estimated flow is inversely related to the total population of all cities that are closer to the origin than is this particular destination, because people will only move to the destination if they have not yet found a sufficiently good opportunity at a closer location.

This last assumption is not entirely counterintuitive, but it does seem a little less straightforward than the rest of the assumptions underlying the model. People learn about opportunities in all sorts of ways, and if they are already going to be switching jobs they may not care how far they move. Still, if we limit ourselves to purely a priori thinking based only on population and distance it seems reasonable to think that people will more often tend to move to nearby places than to farther ones. The radiation model was originally developed for commuting flows, and in that case the assumption is more innocuous: since people don't like to commute, they will tend to take the closest job to their house that is a sufficiently good match for their interests and desired compensation.

Ultimately, $F_{ij}$, the fraction of all migrants leaving city $i$ who end up in city $j$, is predicted to be:

$$F_{ij} = \frac{p_i \, p_j}{(p_i + s_{ij})(p_i + p_j + s_{ij})}$$

Where $p_k$ is the population of city $k$ and $s_{ij}$ is the total population of cities that are closer to city $i$ than city $j$ is. This ultimate expression is somewhat similar in form to that of the gravity model, but with $s_{ij}$ replacing distance in the denominator (along with $p_i$ and $p_j$). Note that unlike the gravity model, the radiation model is not symmetric: the predicted flow from city $i$ to city $j$ will not generally be equal to that from city $j$ to city $i$.
This is because while the cities and the distance $d_{ij}$ between them are constant, a circle of radius $d_{ij}$ centered at city $i$ will contain different cities with a different total population than one centered on city $j$. So if city $i$ is in a peripheral location it is likely to send a higher proportion of its total migrants to centrally located city $j$ than city $j$ will send back. However, it is possible that city $j$ may send more total migrants, in which case the numerical flows may be similar.

4.3.1 Implementation of the Radiation Model

Although this study is primarily concerned with migration among MSAs, I include nonmetropolitan counties in this calculation to fully account for the intervening opportunities that are central to the radiation model. To implement the radiation model I first find the geographic centroid for each MSA and nonmetropolitan county. Note that in some cases the MSA centroid may be a somewhat imprecise representation of a metro's center of population. Many MSAs, especially in the western US, contain geographically large counties that are sparsely populated. For example, San Bernardino County, CA, part of the Riverside MSA, is larger than Vermont and New Hampshire combined, but almost all of its population is concentrated in its southwestern corner. The centroid of the Riverside MSA is almost certainly substantially north and east of the vast majority of its population.

Next I compute the great circle distance between each pair of centroids. I then calculate the total population within that distance of each centroid based on the 2010 Census. That information is sufficient to calculate $F_{ij}$ as described above. I then multiply $F_{ij}$ by the total number of migrants originating at city $i$ in the 2009-2010 migration data. This yields the radiation model's predictions for 2009-2010 migration flows.
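The implementation steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the thesis: it uses the haversine formula for great circle distance with an assumed mean Earth radius, takes the standard radiation-model expression from Simini et al. (2012) for the share of out-migrants, and the function names and input layout are hypothetical. Distances are computed between whatever coordinates are supplied, so centroid quality matters exactly as discussed above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radiation_flows(places, out_migrants):
    """Predict origin-to-destination flows with the radiation model.

    places: dict name -> (lat, lon, population)   (hypothetical layout)
    out_migrants: dict name -> total migrants leaving that place
    Returns dict (origin, dest) -> predicted number of migrants.
    """
    names = list(places)
    dist = {
        (a, b): great_circle_km(*places[a][:2], *places[b][:2])
        for a in names for b in names if a != b
    }
    flows = {}
    for i in names:
        p_i = places[i][2]
        for j in names:
            if j == i:
                continue
            p_j = places[j][2]
            d_ij = dist[(i, j)]
            # s_ij: population of all places strictly closer to i than j is,
            # excluding the origin and destination themselves.
            s_ij = sum(
                places[k][2] for k in names
                if k not in (i, j) and dist[(i, k)] < d_ij
            )
            share = (p_i * p_j) / ((p_i + s_ij) * (p_i + p_j + s_ij))
            flows[(i, j)] = out_migrants[i] * share
    return flows
```

The nested loop is quadratic in the number of places for the distance table and cubic for the $s_{ij}$ sums; sorting each origin's neighbors by distance and accumulating population would be the obvious refinement for a full national dataset.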
4.4 Radiation Model Results

Figure 4-4 compares the predicted results of the radiation model with those observed. To measure goodness of fit I use the "common part" statistic employed by Lenormand et al. (Lenormand, Huet, Gargiulo, & Deffuant, 2012). The statistic is computed as:

$$\frac{2 \sum_{i=1}^{n} \sum_{j=1}^{n} \min(T_{ij}, M_{ij})}{\sum_{i=1}^{n} \sum_{j=1}^{n} T_{ij} + \sum_{i=1}^{n} \sum_{j=1}^{n} M_{ij}}$$

Where $n$ is the number of MSAs, $T_{ij}$ is the observed migration flow from location $i$ to location $j$, and $M_{ij}$ is the predicted migration flow from location $i$ to location $j$. When the model and the data have the same total number of migrants, as they do here, the statistic measures the fraction of those that are correctly classified.

The common part statistic for the radiation model on 2009-2010 migration data is 0.527, indicating that the radiation model is able to correctly classify about half of the observed migration flows. This is a reasonably good fit, considering that the radiation model is parameterless (it is based purely on theory and is not tweaked to reflect the observed data) and that it uses only two variables to predict a noisy and idiosyncratic process. The common part statistic of 0.527 is in line with values found by Lenormand et al. in their evaluation of the radiation model across a number of commuting datasets.

Table 4.3: Largest Positive Radiation Model Residuals

Origin             Destination          Predicted   Actual   Residual   Distance (km)
Los Angeles, CA    Riverside, CA           35,087   93,807     58,720             194
New York, NY       Miami, FL                  650   15,374     14,724           1,743
Riverside, CA      Los Angeles, CA         41,942   55,729     13,787             194
San Diego, CA      Riverside, CA            3,797   16,179     12,382             179
Miami, FL          New York, NY               523   11,134     10,611           1,743
New York, NY       Los Angeles, CA            639    7,883      7,244           3,943
San Jose, CA       San Francisco, CA       13,559   20,014      6,455             125
New York, NY       Atlanta, GA              1,150    7,352      6,202           1,222
Los Angeles, CA    New York, NY               785    6,928      6,143           3,943
New York, NY       Orlando, FL                320    5,995      5,675           1,534
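The common part statistic used in this section reduces to a few lines of code. The sketch below is illustrative (the function name and the dictionary-of-pairs input are assumptions); it treats any origin-destination pair missing from one of the two inputs as a flow of zero.

```python
def common_part(observed, predicted):
    """Common part statistic between observed and predicted flows.

    observed, predicted: dict (origin, dest) -> flow.
    Returns a value between 0 (no overlap) and 1 (identical flows).
    """
    pairs = set(observed) | set(predicted)
    shared = sum(
        min(observed.get(pair, 0), predicted.get(pair, 0))
        for pair in pairs
    )
    total = sum(observed.values()) + sum(predicted.values())
    return 2 * shared / total if total else 0.0
```

When both inputs sum to the same number of migrants, the result can be read as the share of migrants the model places on the correct origin-destination pair.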
More interesting than the model's overall fit are the specific instances in which it over- and under-predicts the flows of people. These represent connections between cities that are either stronger or weaker than would be expected based purely on the physical distribution of people across the country. Tables 4.3 and 4.4 show the top ten positive and negative residuals from the radiation model.

Figure 4-4: Actual vs. Predicted Migration Flows, 2009-2010 (logarithmic axes; x-axis: Actual Value)

Table 4.4: Largest Negative Radiation Model Residuals

Origin                 Destination         Predicted   Actual   Residual   Distance (km)
San Diego, CA          Los Angeles, CA        71,669   12,652    -59,017             178
Riverside, CA          San Diego, CA          49,841   12,060    -37,781             179
Las Vegas, NV          Riverside, CA          35,194    3,289    -31,905             212
Los Angeles, CA        San Diego, CA          38,322   13,675    -24,647             178
Phoenix, AZ            Riverside, CA          23,550    2,689    -20,861             405
Colorado Springs, CO   Denver, CO             20,959    3,528    -17,431              69
New York, NY           Philadelphia, PA       32,938   16,870    -16,068             156
Washington, DC         Baltimore, MD          33,121   18,732    -14,389              89
Phoenix, AZ            Tucson, AZ             18,019    4,050    -13,969             121
Tucson, AZ             Phoenix, AZ            17,569    5,654    -11,915             121

Table 4.3 shows the largest positive residuals from the model. These are flows that are much larger, and hence represent stronger connections between their start and end points, than are predicted by the radiation model. There are two main types of flow represented in this table. The first are short-distance flows, generally less than 200 kilometers, meaning that they are between metros that are essentially adjacent. These flows are generally predicted to be quite large, but the observed numbers are even greater. Los Angeles was predicted to send 35,000 people to Riverside, but it actually sent over 90,000.

The second type of positive residual is perhaps more interesting from the perspective of determining the urban structure of the United States.
It consists of long-distance flows, generally between major cities, that are predicted to be small but are in fact quite substantial. The strongest of these link New York City to Miami and Los Angeles, though also in the top ten are flows from New York to Atlanta and Orlando. Outside of the top ten list some of the highest long-distance residuals are found on the flows from Dallas to Atlanta, New York to Tampa, Atlanta to Miami, New York to San Francisco, and Chicago to Phoenix and Los Angeles.

The negative residuals shown in Table 4.4 are somewhat more uniform in nature. They tend to be the inverse of the first type of flow represented among the positive residuals: MSAs that are close together and that do exchange large numbers of migrants, just not as many as the radiation model would predict. The lack of large negative residuals at long distances makes sense given the radiation model's preference for nearby opportunities: it is not going to predict large long-distance movements if there are closer opportunities.

There are a few flows in Table 4.4 (most notably those from Las Vegas and Phoenix to Riverside) where the metros in question are not actually adjacent but simply have very few intervening opportunities (the primary one in these two cases being the Mojave Desert). These two have the smallest observed flows as a percentage of those predicted: roughly 9% and 11%, respectively. They may be cases where distance or physical barriers act as an impediment to migration even in the absence of intervening opportunities. This phenomenon is illustrated even more clearly in the flow from Honolulu to San Francisco, which is predicted to be 8817 people but in reality is just 758.

Possibly the most surprising residual in Table 4.4 is that between Denver and Colorado Springs.
These are relatively large cities in the same state separated only by seventy kilometers of fertile plains, but Colorado Springs sends only about a sixth as many migrants to Denver as the radiation model would predict. This could potentially be due to cultural differences between the two cities: Colorado Springs is known as a relatively conservative city, while Denver is generally seen as more progressive.

The largest positive and negative residuals involve the three major MSAs of southern California: Los Angeles, Riverside, and San Diego. The extreme size of these residuals is likely due in part to the quirk in how MSA centroids are calculated described above. Because the MSA centroids are calculated using the full area of all counties in the MSA, the model treats San Diego as being closer to Los Angeles than to Riverside, and closer to Riverside than Riverside is to LA. It is difficult to precisely define the distance between two cities, but by almost any measure this is incorrect: Riverside and Los Angeles form one continuous urbanized area, while San Diego is a hundred miles to the south. The driving distances between the downtowns of the central cities of these MSAs are 54 miles from Los Angeles to Riverside, 98 miles from Riverside to San Diego, and 120 miles from San Diego to Los Angeles.

This inaccuracy likely has a dramatic impact on the predictions, because it casts San Diego as an intervening opportunity between Los Angeles and Riverside, and Los Angeles as one between Riverside and San Diego. Since these metros are each other's nearest neighbors, this is the difference between having $s_{ij}$ be zero and having it be 12 million. Consider that the flow from San Diego to Riverside was predicted to be just 3,797, while that from San Diego to Los Angeles, just one kilometer closer and with roughly three times the population, was predicted to be 71,669.
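The effect of a single misplaced intervening opportunity can be illustrated with a short calculation. The sketch below is a simplified illustration, not a reproduction of the thesis results: it uses the standard radiation-model share from Simini et al. (2012) and the MSA populations from Table 4.1, and it ignores every other place that would enter $s_{ij}$ in the full calculation. The function and variable names are hypothetical.

```python
def radiation_share(p_i, p_j, s_ij):
    """Share of origin i's out-migrants predicted to go to destination j."""
    return (p_i * p_j) / ((p_i + s_ij) * (p_i + p_j + s_ij))


# MSA populations from Table 4.1.
SAN_DIEGO = 3_095_313
RIVERSIDE = 4_224_851
LOS_ANGELES = 12_828_837

# San Diego -> Riverside if Riverside were San Diego's nearest neighbor,
# so that no population intervenes (s_ij = 0).
share_if_nearest = radiation_share(SAN_DIEGO, RIVERSIDE, 0)

# The same flow if Los Angeles is treated as lying closer to San Diego,
# so that its whole population counts as an intervening opportunity.
share_if_la_intervenes = radiation_share(SAN_DIEGO, RIVERSIDE, LOS_ANGELES)

# How many times larger the predicted share is in the first case.
ratio = share_if_nearest / share_if_la_intervenes
```

Under these simplifying assumptions the one-kilometer difference in centroid distance changes the predicted San Diego to Riverside share by more than an order of magnitude, which is consistent with the gap between the 3,797 and 71,669 predictions noted above.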
4.5 Patterns of Residuals

Figure 4-5 further investigates the relationship between extreme residuals and distance. It shows box and whisker plots summarizing the distribution of residuals for different distance classes. Each distance class is labeled with its upper limit, so the boxplot labeled "250" represents distances between zero and 250 km, that labeled "500" represents distances between 250 and 500 km, and so on. Plot A shows these plots for all residuals. While the vast majority of residuals in all distance classes are clustered around zero, the extreme residuals are almost entirely positive at distances greater than 1000 kilometers.

Plot B further examines this trend by showing only residuals with an absolute value over 500. These instances represent just 3.8% of the migration flows, but account for 55% of the total error associated with the model. At distances under 250 kilometers the median such residual is negative, almost -1000. For distances between 250 and 500 kilometers the median residual is positive but the interquartile range still extends to roughly -1000. At greater distances, though, there are very few large negative residuals, and there are none at all for distances between 1000 and 2250 kilometers. This suggests that there are substantial numbers of large, long-distance migration flows, which are not incorporated into the radiation model's framework. Large residuals generated over distances greater than 250 kilometers account for roughly 18% of the total error.

In addition to grouping residuals by distance, they can be grouped by metro area. Because the radiation model directly incorporates the total number of out-migrants into its predictions, analysis of residuals grouped by the source metro is not useful. But grouping residuals by destination MSA provides a portrait of which MSAs are more attractive than geography would predict.
These are listed in Table 4.5, and the results are striking: some cities attract far, far more migrants than the population distribution alone can account for. Dallas received almost four times as many migrants in 2009-2010 as the radiation model predicted, and many large cities received more than double the amount. These cities may be places whose unique characteristics make them stand out to migrants, offering something that cannot be replicated elsewhere.

Figure 4-5: Residuals versus Distance (Panel A: All Residuals by Distance; Panel B: Residuals >500 by Distance; x-axis: Distance Between Origin and Destination (km))

Table 4.5: Top Aggregate Residuals by Destination MSA

MSA               Actual Flow   Predicted Flow   Residual
Dallas, TX            121,958           33,263     88,695
New York, NY          155,501           71,915     83,586
Washington, DC        124,960           46,983     77,977
Miami, FL              98,188           22,494     75,694
Atlanta, GA            90,857           22,565     68,292
Houston, TX           103,380           38,330     65,050
Phoenix, AZ            98,921           44,619     54,302
Chicago, IL            87,986           35,663     52,323
Riverside, CA         152,943          104,963     47,980
Seattle, WA            75,297           31,138     44,159

Figure 4-6 maps the aggregated residuals for the whole country. Besides the major MSAs in Table 4.5, there are large residuals found in several midsize cities in the southeast. The aggregate residuals for Los Angeles, San Diego, and San Francisco are low compared to most other cities their size. Among large cities Philadelphia stands out as one of the few where in-migration was substantially less than the radiation model predicts.

Taken together, the analysis of residuals by distance and by destination MSA suggests that there are major structural features of the US domestic migration system that are incompatible with the radiation model.
This incompatibility is not an indictment of the model: as stated above, the purpose of using the radiation model here is to create a null hypothesis of what migration would look like if it were influenced only by the spatial distribution of population. The observed migration patterns differ from that baseline, so other features of the urban economic and social system must play a role. Determining the exact nature of these features is beyond the scope of this thesis. Rather, I turn now to simply examining the structure of these linkages in greater detail.

Figure 4-6: Aggregate Residuals by Destination MSA

Chapter 5

Centrality Analysis of the Migration Network

Thus far this thesis has documented and analyzed the patterns of individual migration flows between cities, and determined that they conform to a structure more complex than can be explained by the geographic distribution of population alone. Going forward I shift to an analysis of this structure, remaining agnostic on what factors might influence the pattern of flows that is observed and instead examining the network of ties that they create.

In this section, I use measures of centrality from network science to explore the relative positions of various cities within the domestic migration network. Central cities will tend to have stronger ties to a more diverse set of MSAs. Because the emphasis here is on the strength of connection, and for ease of computation of certain metrics, I use reciprocal migration as my primary variable of interest.

5.1 Degree Centrality

There are a number of measures of centrality commonly used in network science. The most straightforward of these is degree centrality. The degree of a node is defined as the number of connections it has to other nodes, and a central node is one that has strong direct connections to lots of other nodes.
Degree can be measured without weighting, simply counting the total number of other nodes a given node is connected to, or the measure can be weighted by the strength of the connection. In the case of migration, unweighted degree measures the number of cities that a given metro exchanged migrants with, while weighted degree measures the total number of reciprocal migrants that it had.

Both of these measures are useful. The unweighted measure gets at the sheer geographic reach of a city, while the weighted measure describes the total volume of migrant turnover: the total amount of exchange that is happening. However, each measure on its own provides an incomplete picture. A high unweighted measure may originate with a city that exchanges just one migrant with a large number of metros. In that case it may be wrong to conclude that such a city has a particularly large amount of interaction with the rest of the country. On the other hand, a high weighted measure may document a city that exchanges an enormous number of migrants with just one other region.

Figure 5-1: Unweighted Degree, 2009-2010

Figure 5-1 shows the unweighted degree of US metro areas. Again, circles are sized proportionally to their population, while the color ranges from low scores in blue to high ones in red. Phoenix has the highest degree: it exchanged migrants with 368 other metro areas in 2009-2010. Following it are San Diego, Los Angeles, Chicago, Las Vegas, and Houston. With the exception of Chicago, these are all Sunbelt cities, and many have reputations as retirement communities or job magnets. The high unweighted degrees of these cities indicate that they are able to attract migrants from a diverse range of communities at the national scale. However, it is worth keeping in mind that these metros exchange migrants with all of those communities; they don't just receive them.
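The two degree measures, and the way each can mislead on its own, can be illustrated with a short sketch. The reciprocal flows are taken as given here, and the function name, data layout, and toy values are assumptions made for illustration.

```python
from collections import defaultdict


def degree_measures(reciprocal_flows):
    """Return (unweighted, weighted) degree dictionaries.

    `reciprocal_flows` maps an unordered metro pair (a, b) to the number
    of reciprocal migrants exchanged between the two metros.
    """
    unweighted = defaultdict(int)
    weighted = defaultdict(int)
    for (a, b), migrants in reciprocal_flows.items():
        if migrants <= 0:
            continue  # no exchange means no tie
        for metro in (a, b):
            unweighted[metro] += 1        # one more partner metro
            weighted[metro] += migrants   # add the volume of exchange
    return dict(unweighted), dict(weighted)


# Hypothetical flows: A trades one migrant with three metros, while
# B and C exchange a large number of migrants with each other.
toy = {("A", "B"): 1, ("A", "C"): 1, ("A", "D"): 1, ("B", "C"): 500}
unweighted_degree, weighted_degree = degree_measures(toy)
```

In this toy network A has the highest unweighted degree but a tiny weighted degree, while B and C show the opposite pattern, which is why the two measures are best read together.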
So while these high-degree metros are receiving migrants from all over the country, they are sending people to those communities as well.

Weighted degree is the same as total reciprocal migration, and it was already mapped in Figure 4-1. Compared to unweighted degree, the weighted degree is more concentrated: Los Angeles and New York are in a league of their own, with Riverside, Washington DC, and Dallas substantially further behind. New York is an interesting case because its unweighted degree is relatively low, implying that it is exchanging larger numbers of migrants with a smaller set of cities.

Figure 5-2: Degree vs. Weighted Degree, 2009-2010

Figure 5-3: Population vs. Weighted Degree, 2009-2010

Figure 5-2 plots degree versus weighted degree. There is a fairly strong correlation between the two at lower levels that opens up a bit among the most connected metros. Metros that exchange migrants with more than 100 other cities tend to have a higher weighted degree than would be predicted based on the trend among less-connected metros, but there is also substantial variation among them. New York and Los Angeles have particularly high weighted degrees for their levels of unweighted degree, while Phoenix and Las Vegas have lower weighted degrees than are typical for their levels of unweighted degree.

Figure 5-3 plots population versus weighted degree. The relationship is linear and striking: the correlation between population and the number of reciprocal migrants is 0.92.

5.2 Closeness Centrality

A second measure of centrality that is frequently used is closeness centrality. In an unweighted network, this is based on the average number of steps it takes to get from a node of interest to all other nodes in the network: the fewer steps required, the higher the closeness.
This measure thus takes the degree of the node into account, but it also incorporates the degrees of the nodes it is connected to. If a node has only one link, but that link connects it to a hub, it can still claim a high closeness centrality score.

In the case of weighted networks such as this one, this measure becomes somewhat more complicated. How should one calculate the average number of steps when some links carry far more people than others? One approach draws on the physics of electricity (Brandes & Fleischer, 2005). Electric circuits are often constructed with parallel paths that have varying resistances. When this occurs, it is not the case that all of the current flows through the path with the least resistance. Rather, some current flows through each path, with the exact amount proportional to the inverse of the resistance. We can conceptualize a weighted network as an electric circuit, with the weights (the numbers of migrants in this case) determining the "resistance" of each link: the more migrants a link carries, the lower its resistance. Then the analogue of the distance between two nodes is the effective resistance of all paths connecting them. This yields a measure known as the "current flow closeness centrality," which is equivalent to "information centrality," a measure first proposed in the 1980s that has been infrequently used due to its unintuitiveness (Stephenson & Zelen, 1989). Nodes that score high on this measure can be thought of as being near the center of the network in the sense that they communicate relatively well with most other points within it.

Figure 5-4: Closeness Centrality, 2009-2010

Figure 5-4 shows the current flow closeness centrality of the US reciprocal migration network in 2009-2010. What is immediately striking about this map is the sheer amount of red: there are a lot of metros with very high closeness centrality, including almost all the major population centers.
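The electrical analogy behind current flow closeness can be made concrete with a small sketch. It treats each edge weight as a conductance (the inverse of resistance), computes the effective resistance between two nodes by grounding one of them and solving the reduced Laplacian system, and then scores a node as (n - 1) divided by the sum of its effective resistances to all other nodes. That normalization is one common convention and is an assumption here, as are the function names and the toy network; the graph is assumed to be connected.

```python
def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def effective_resistance(nodes, edges, a, b):
    """Effective resistance between a and b; `edges` holds conductances."""
    if a == b:
        return 0.0
    kept = [n for n in nodes if n != b]  # node b is grounded (potential 0)
    idx = {n: i for i, n in enumerate(kept)}
    lap = [[0.0] * len(kept) for _ in kept]
    for (u, v), w in edges.items():
        for x, y in ((u, v), (v, u)):
            if x in idx:
                lap[idx[x]][idx[x]] += w
                if y in idx:
                    lap[idx[x]][idx[y]] -= w
    rhs = [0.0] * len(kept)
    rhs[idx[a]] = 1.0  # inject one unit of current at a
    return solve(lap, rhs)[idx[a]]  # potential at a equals the resistance


def current_flow_closeness(nodes, edges, s):
    total = sum(effective_resistance(nodes, edges, s, t) for t in nodes if t != s)
    return (len(nodes) - 1) / total


# Hypothetical network: a hub with two strong ties and one weakly tied fringe node.
nodes = ["Hub", "A", "B", "Fringe"]
edges = {("Hub", "A"): 100, ("Hub", "B"): 100, ("A", "Fringe"): 1}
hub_score = current_flow_closeness(nodes, edges, "Hub")
fringe_score = current_flow_closeness(nodes, edges, "Fringe")
```

Because parallel paths lower the effective resistance, a node connected through several strong routes ends up "closer" to the rest of the network than one connected through a single weak link.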
The prevalence of high closeness scores suggests that the web of migrants tying the country together is relatively complete, especially among populous metros. If a large metro area doesn't have a direct connection to a given city, it is likely to have a very strong connection to another metro area that does. The metros showing low closeness centrality tend to be micropolitan areas, especially in the upper Midwest and interior south. These are places that exchange only a few migrants with a small number of places, none of which are very well connected.

One interesting feature of the map is that there are extremely peripheral metro areas in physical proximity to major centers. Southern Georgia, for example, has several peripheral nodes sandwiched between Atlanta, Jacksonville, and Augusta, all of which are quite central. This affirms that physical proximity does not necessarily imply a high degree of social connectedness.

Figure 5-5: Closeness Centrality vs Population

Figure 5-5 plots closeness centrality against population. The strong but nonlinear relationship between population and closeness centrality stands out here. In particular, the closeness measure plateaus once the metro population reaches one million: every single metro with more than 1 million people is also at the very top of the closeness centrality measure. This implies both that the large metros are extremely well-connected and that they are roughly equally connected. The overall correlation between closeness centrality and population is 0.36.

Whether closeness centrality as measured here is meaningful sociologically is an open question. If LA exchanges many migrants with New York, which exchanges lots of migrants with Buffalo, does that establish a meaningful proxy connection between LA and Buffalo? It's unlikely that many individual people moving from Buffalo to New York continue onwards to Los Angeles.
But perhaps coexisting with Buffalonians creates in New Yorkers a base level of cultural awareness about that city¹ that they bring with them to Los Angeles.

The lack of differentiation in closeness centrality among large metros is initially frustrating: how does one determine if San Francisco or Houston is more central? But that may be the point: with advances in communications and shipping technology, specific location has become somewhat less important, at least among a certain set of large metropolitan areas. Outside of a few extremely elite or innovative centers, many locations are somewhat interchangeable. A national company looking to site a new mid-level management facility may be equally willing to put it in suburban Providence or suburban Denver. But it will think twice about putting it in suburban Duluth. The closeness centrality measure appears to capture this distinction, and could be considered as a means to delineate the "metropolitan core" of the country: the metro areas from which it is essentially equally possible for a typical business to participate in the national economy.

An alternative view of this measure is that, living in Boston, it is not altogether unusual to come across someone from Minnesota. But it is surprising to find a Minnesotan from outside of the Twin Cities metro area. Much of that is no doubt due to the sheer population share of Minneapolis-St. Paul within the state, but some may be because the cities are more integrated into the national migration network than the rest of the state. Perhaps the set of high-closeness cities can be considered an approximation of the set of places whose residents one expects to come across in everyday life in any given member city. Again, with such a strong relationship between closeness and population, it's hard to parse which effects are driven by population and which by position in the migration network.

5.3 Betweenness Centrality

A third frequently used measure of centrality is betweenness centrality.
This attempts to measure the extent to which a given node acts as a bridge between otherwise disjoint parts of the network. In the unweighted case, this is done by finding the shortest paths (those involving the smallest number of steps) between all pairs of nodes in the network. The betweenness centrality of each individual node is defined as the fraction of all shortest paths that pass through it. A node with high betweenness is "central" in the sense of being integral to the network: if it is removed, it becomes substantially harder for the rest of the nodes to communicate with each other. In the case of an airline network, the nodes with high betweenness are the hubs: the places one has to travel through to get from point A to point B.

¹ Such as whether Buffalo buffalo Buffalo buffalo buffalo do in fact buffalo Buffalo buffalo.

Figure 5-6: Betweenness Centrality, 2009-2010

The interpretation of betweenness centrality is less straightforward in the case of migration. Unlike in air travel, migrants are not constrained to flow via the links in the network. There is no sense in which someone planning to migrate from Boston to Seattle has to stop in Chicago first, and there is no reason to expect that many of the people moving to Chicago from Boston will continue to Seattle in a few years. However, cities with high measures of betweenness centrality can still be thought of as hubs in the sense that they exchange large numbers of migrants with parts of the country that do not have extensive direct interaction. High-betweenness cities are cosmopolitan centers, welcoming migrants from all regions of the country.

As with closeness centrality, the presence of weighted edges makes computing betweenness more difficult. Again, the electric current analogy is used to adjust for the number of migrants flowing on a given link. It also allows for the incorporation of more than just the single shortest path between any two nodes.
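For reference, the unweighted shortest-path definition given earlier in this section can be sketched with a breadth-first search from every node. This is only an illustration of the unweighted case, not the current-flow variant applied to the migration network. The accumulation follows the standard approach of counting shortest paths outward from each source and then passing dependencies back; the normalization by the number of node pairs that exclude the node itself is an assumption, as are the function name and the toy graphs.

```python
from collections import deque


def betweenness(adjacency):
    """Shortest-path betweenness for an undirected, unweighted graph.

    `adjacency` maps each node to the set of its neighbours. The score is
    the average, over all pairs of other nodes, of the fraction of
    shortest paths between the pair that pass through the node.
    """
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        dist = {s: 0}
        sigma = {v: 0 for v in adjacency}  # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adjacency}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(adjacency)
    pairs = (n - 1) * (n - 2) / 2  # pairs that do not include the node itself
    # Each unordered pair was visited from both endpoints, hence the division by 2.
    return {v: (score[v] / 2) / pairs if pairs else 0.0 for v in adjacency}


# Hypothetical airline-style network: every route goes through the hub.
star = {"Hub": {"A", "B", "C"}, "A": {"Hub"}, "B": {"Hub"}, "C": {"Hub"}}
star_scores = betweenness(star)
```

In the star network the hub lies on every shortest path between the other nodes and scores 1, while the spokes score 0, which matches the airline-hub intuition in the text.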
The electric current equivalent of the fraction of shortest paths between node A and node B that pass through node C is the fraction of a unit A-B current that passes through C. With this fraction defined for each pair of nodes, it is straightforward, though computationally intensive, to calculate the overall betweenness centrality.

Figure 5-7: Betweenness Centrality vs. Population

Figure 5-8: Closeness Centrality vs. Betweenness Centrality

Figure 5-6 shows the betweenness centrality of metro areas based on the 2009-2010 reciprocal migration network. As opposed to the closeness centrality map, here there is very little red. Only a few cities have high betweenness, and there is a fairly strict hierarchy among them. Chicago has by far the highest betweenness score. This distinction aligns with its role as a transportation and freight center, and with its reputation as the hub of the United States. Dallas has the second highest betweenness measure, and it shares many characteristics with Chicago: they are both large metros in the center of the country that have extensive transportation links with the rest of the US. It is interesting to find that Dallas has a higher betweenness rating than Houston, since they are similarly sized and Houston has a reputation of being a stronger attractor of international migrants.

New York City ranks third in betweenness, followed by Los Angeles, Atlanta, Phoenix, Houston, Washington DC, and Minneapolis. Though they are situated on the coasts, New York and LA are the largest two cities in the country, the cultural and economic centers of the east and west coasts. As such, it makes sense that they draw migrants from and send migrants to a diverse set of regions. The rest of the high-betweenness cities show a combination of high population with a central or Sunbelt location.

Figure 5-7 plots betweenness centrality versus population².
Even more than with closeness centrality, there is an extremely strong, almost linear relationship between betweenness centrality and population, with a correlation of 0.86.

Figure 5-8 plots closeness centrality against betweenness centrality. The pattern is almost identical to Figure 5-5, with a positive relationship between betweenness and closeness centrality among metros with lower levels of both that flattens out near the top. The overall correlation here is 0.46.

² For ease of viewing I have excluded the Thomaston, Georgia MSA from Figures 5-7 and 5-8. At 6.8 × 10^-17, its betweenness centrality is twelve orders of magnitude smaller than that of any other observation.

Chapter 6

Detecting Communities in the Migration Network

A second major area of network analysis is the identification of communities within networks. Here the goal is to partition the network into communities of nodes such that the connections within each group are much stronger than the connections between groups. In the case of social networks, each community can be thought of as a friend group or social circle: a set of people with tight ties to each other and weaker ties to the rest of the world. In the context of migration networks, communities are regions of the country: groups of cities that exchange many migrants with each other and do not send as many to other parts.

It is not a given that these migration regions will share common cultural attributes or economic linkages. But migration will likely be correlated with both of these types of connection. People leaving their hometown will, all else equal, probably be drawn to culturally similar cities. Once they arrive in their new town, they contribute to a cultural exchange between it and their city of origin. Similarly, as described above, economists have generally found economic considerations to be paramount in determining propensity to migrate.
Cities with strong economic connections to each other (branch locations of the same companies, high volumes of trade) will have more opportunities for migration. And this migration is self-reinforcing, since people are more likely to make investments in or transact with companies in places that they are familiar with.

6.1 Approach to Community Detection

6.1.1 Modularity

How to best partition a network into communities is still an open question in network science. Many community detection algorithms work on the principle of modularity optimization. The "modularity" of a network partition is computed as the fraction of links that fall within the proposed groups minus the fraction that would be expected if links were distributed at random (Newman & Girvan, 2004). Modularity can range from -1 to 1. A score below zero means the partition is terrible: more links cross proposed boundaries than would be expected if the lines were drawn at random. A score of zero means that the partition is exactly in line with random chance: the fraction of ties that cross community boundaries is exactly what chance would predict. Positive values of modularity denote increasingly good partitions. For most networks a good partition will result in a modularity score between 0.3 and 0.7.

6.1.2 The Louvain Algorithm

Even with a well-defined measure of partition quality, finding the optimal partition of a network is extremely difficult. The sheer number of possible partitions makes it impractical to search all of them looking for the best one. Instead, a number of algorithms have been proposed that attempt to find an approximation of the optimal partition. Here I use the Louvain method for community detection (Blondel & Guillaume, 2008). This algorithm is noted for its low computational complexity. It begins by assigning each node to its own community. The algorithm then iterates over the nodes.
For each node, it calculates the change in modularity that would result from assigning the node into each community found among its neighbors. The node is then assigned to whichever community most increases the modularity. The algorithm iterates through all nodes and repeats until it makes a full pass without reassigning any node. Then it creates a new network with one node for each community from the previous phase. It then repeats the entire process until the modularity can no longer be increased. What makes the algorithm so fast is its "greedy" nature: it never takes back an earlier step trying to improve the overall results, even though there are almost certainly times when that would be desirable.

6.1.3 Initial Partitions

Figure 6-1 shows the result of one run of the Louvain algorithm on the migration data for 2009-2010. The country is partitioned into seven communities, each occupying a different region. The entire western third of the US forms one community, encompassing everything from California to Colorado. A second large region might be termed "Greater Texas," containing all of that state as well as Oklahoma, Louisiana, Arkansas, Missouri, Kansas, and New Mexico. The Upper Midwest forms a third community, centered on Chicago and Minneapolis. Michigan, Indiana, and Ohio comprise the core of a community in the eastern Midwest, which extends to include Pittsburgh. The eastern seaboard from Washington DC to Maine makes a fifth community. The final two communities are in the South. One is centered on Virginia and the Carolinas, while the other contains everything from Mississippi to Kentucky to Florida.

One interesting feature of these communities is the fact that each of them is spatially contiguous. There are no enclave cities located within one region but tied most strongly to the nodes in another.
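The modularity score from Section 6.1.1 and the greedy reassignment step of Section 6.1.2 can be sketched as follows. This is a deliberately naive illustration, not the Louvain implementation used for the results: it recomputes modularity from scratch for every candidate move, whereas the real algorithm updates the score incrementally. The function names, the weighted form of the formula, and the toy graph are assumptions, and the graph is assumed to have no self-loops.

```python
from collections import defaultdict


def modularity(edges, community):
    """Weighted modularity of a partition.

    `edges` maps an unordered pair (u, v) to its weight and `community`
    maps each node to a community label. For each community the score
    adds the share of edge weight inside it and subtracts the share
    expected if edges were placed at random given the node strengths.
    """
    total = sum(edges.values())
    inside = defaultdict(float)
    strength = defaultdict(float)
    for (u, v), w in edges.items():
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / total - (strength[c] / (2 * total)) ** 2 for c in strength)


def best_community(node, edges, community):
    """One greedy step: pick the neighbouring community that maximizes modularity."""
    neighbours = {v if u == node else u for (u, v) in edges if node in (u, v)}
    candidates = {community[n] for n in neighbours} | {community[node]}

    def score(label):
        trial = dict(community)
        trial[node] = label
        return modularity(edges, trial)

    return max(candidates, key=score)


# Hypothetical graph: two triangles joined by a single edge (c-d).
edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
by_triangle = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q_triangles = modularity(edges, by_triangle)
q_single = modularity(edges, {n: 0 for n in by_triangle})
```

Splitting the toy graph along its two triangles gives a clearly positive score, while lumping every node into one community gives exactly zero, in line with the interpretation of the scale given above.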
This contiguity is not necessarily unexpected, since it makes sense that proximate places should exchange more migrants, but it does emerge strictly from the data. There is nothing in the community detection algorithm that requires nodes in the same community to be located next to each other.

Figure 6-1: Community Detection Round 1, 2009-2010

Unfortunately, identifying "true" or "correct" communities in a network is not this straightforward. Figures 6-2 and 6-3 show the output from two further runs of the Louvain algorithm. This is the exact same algorithm run with identical settings, differing from Figure 6-1 only in the order through which the nodes are iterated. The results are not entirely dissimilar from those in Figure 6-1, but the differences are substantial. The total number of communities changes from run to run, the relative sizes of each community shift substantially, and sometimes enclaves even appear.

Further runs of this algorithm produce continued variations in the output. There are consistent features: the western states always form one community, Texas is at the center of another, the northeast a third. But the details change substantially from one run to another. The ultimate number of communities varies from five to seven, and even in the most stable parts of the country the community boundaries are always in flux.

This variability in results is partly due to the approximate nature of the algorithm used. There is randomness in the approximation process, so it is natural that repeated runs produce slightly different results. A more thorough algorithm might be able to home in on the one partition that truly maximizes the modularity score. But the true problem lies at a deeper level, with the modularity score itself. The modularity function has been found to have a fairly flat surface (Good, Montjoye, & Clauset, 2010).
In many cases, there is no obvious peak of modularity that defines a clear optimal partition. Rather, there are often a wide variety of partitions with very similar modularities. This is what we observe here. Running the Louvain algorithm 100 times produces 100 substantially different partitions, whose modularity scores vary only from 0.394 to 0.411. The standard deviation of modularity in this sample is only 0.0037. Yet the variation of proposed communities is extensive.

Figure 6-2: Community Detection Round 2, 2009-2010

Figure 6-3: Community Detection Round 3, 2009-2010

Given that such different partitions can have such similar modularities, it becomes hard to argue that modularity maximization alone will find the single optimal partition. Good et al. propose a number of alternatives to pure modularity maximization. These include combining information from many distinct high-modularity partitions, attempting to estimate the statistical significance of a partition, and using generative models to account for overlapping communities.

6.1.4 Repeated Louvain Runs

The approach that I take to overcome the difficulties of modularity optimization is relatively straightforward, and relies on combining information from multiple high-modularity partitions. The results of the Louvain algorithm differ from run to run, but there are common features throughout. The core cities in many communities do not change from run to run, and there are groups of metros that stay together even as they flip from one community to another. To more completely observe these subtleties of the migration community structure, I run the Louvain algorithm repeatedly and examine the co-occurrence of various cities within the same community. This approach turns the approximate nature of the Louvain method into an advantage, using the randomness it induces to more thoroughly explore the various high-modularity partitions. It is similar to the approach taken by Thiemann et al.
in their analysis of money circulation (Thiemann et al., 2010).

Figures 6-4 and 6-5 show the results of this approach applied to 100 runs of the Louvain algorithm. Figure 6-4 draws lines connecting metros that appear in the same community at least 95 percent of the time. These groups of metros are the building blocks of the communities found throughout this process, the pieces that do not come apart.

Figure 6-4: Metros Co-Occurring 95% of the Time

There are two major takeaways from this map. First, there are some communities that really are extremely cohesive. The western states essentially always form one community, and the boundaries rarely change except to include or exclude New Mexico. The line separating the west from the rest of the country is particularly clear cut. Similarly, the Upper Midwest community, comprised of Illinois, Wisconsin, Minnesota, Iowa, and the Dakotas, is present in virtually the same form 95 percent of the time. The northeastern states (excluding western Pennsylvania) and Texas are other areas that form consistent, large communities.

In the eastern half of the country there are fewer consistent large communities of metros. Instead, there are many groups of 10-20 metros that are consistently in the same community. In many cases these groups conform fairly well to state boundaries. Georgia, Florida, Alabama, Mississippi, Kentucky, and Nebraska all have self-contained groups that include almost all of their metros, while Missouri and Kansas form one group.

Additionally, it is interesting to note that there are very few cities that do not share a community 95 percent of the time with at least one other metro area. Most cities have fairly tight relationships with at least one other place.

Figure 6-5 shows the overall pattern of community formation over 100 runs of the Louvain algorithm. Here the darkness of the line connecting two metros is proportional to the fraction of the time that they fall into the same community.
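The co-occurrence bookkeeping described above can be sketched in a few lines. The sketch assumes that each Louvain run has already been reduced to a dictionary mapping metro to community label; the function names, the metro names, and the four toy runs are hypothetical and are not results from the thesis.

```python
from itertools import combinations


def co_occurrence(partitions):
    """Fraction of runs in which each pair of metros shares a community.

    `partitions` is a list of dictionaries, one per run, each mapping
    metro -> community label. Labels only need to be consistent within
    a run, since only same-versus-different is compared.
    """
    metros = sorted(partitions[0])
    counts = {pair: 0 for pair in combinations(metros, 2)}
    for partition in partitions:
        for a, b in counts:
            if partition[a] == partition[b]:
                counts[(a, b)] += 1
    runs = len(partitions)
    return {pair: count / runs for pair, count in counts.items()}


def stable_pairs(fractions, threshold=0.95):
    """Pairs that co-occur at least `threshold` of the time (as in Figure 6-4)."""
    return {pair for pair, share in fractions.items() if share >= threshold}


# Hypothetical output of four runs over four metros.
toy_runs = [
    {"Metro A": 0, "Metro B": 0, "Metro C": 1, "Metro D": 1},
    {"Metro A": 0, "Metro B": 0, "Metro C": 0, "Metro D": 1},
    {"Metro A": 2, "Metro B": 2, "Metro C": 1, "Metro D": 1},
    {"Metro A": 0, "Metro B": 0, "Metro C": 1, "Metro D": 1},
]
fractions = co_occurrence(toy_runs)
building_blocks = stable_pairs(fractions)
```

The resulting fractions can serve directly as edge weights of a co-occurrence network, with the thresholded pairs corresponding to the lines drawn in a map like Figure 6-4.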
The "building blocks" apparent in Figure 6-4 begin to coalesce into larger groupings. The state-sized communities in the Deep South (Alabama, Mississippi, Tennessee) are seen to frequently form one larger mid-south community. Similarly, Oklahoma and Arkansas are joined with Texas almost all of the time, and Kansas-Missouri only slightly less frequently. Perhaps most notable are the strong connections up and down the east coast, linking Florida to the Northeast.

Figure 6-5: Community Co-Occurrence

The relationships diagrammed in Figure 6-5 can be used to weight each edge by the percentage of the time it crosses a community boundary. This yields an approximation of the percentage of a given city's migrants who come from a "different community," even without strictly defining the communities. For analyses that depend only on the percentage of a city's migrants that come from outside of its community, it is perhaps more accurate to use this measure than to impose one formalized partition.

For some analyses, however, it is helpful to have one strict community partition. This can be developed by running the Louvain algorithm on the community co-occurrence network. Figure 6-6 shows the result of this approach. The communities found in this map are quite stable: repeated runs of the Louvain algorithm produce almost no variation in the communities found (the only change that occurs is that Kentucky tends to switch between the Mid-South and East Central communities). The end result of this procedure is a partition of the United States into six migration regions.
These are the West, including all states west of the continental divide as well as Alaska and Hawaii; Greater Texas, which stretches as far as Louisiana, Missouri, and Kansas; the Upper Midwest, including Illinois, Wisconsin, Iowa, Minnesota, and the Dakotas; an East Central region comprised of Michigan, Indiana, and Ohio; the Mid-South of Mississippi, Alabama, Tennessee, and Kentucky; and the East Coast from Maine to Florida. The modularity of this partition is 0.41. That isn't as high as the highest of the individual runs, but it is in the top 5%.

Figure 6-6: Formalized Communities

6.2 Community Roles of Individual Metro Areas

Having arrived at a partition, a natural next step is to investigate the roles played by different cities within the community structure. Do certain cities dominate their communities, either by monopolizing migration within the community or by being its primary link to the rest of the country? This analysis of within-community roles complements the centrality analysis of the full network conducted above.

6.2.1 Extra-Community Degree

First I examine the extent to which different cities are connected to metros beyond their immediate community. This is calculated by weighting the number of migrants exchanged between two metro areas by the percentage of the time they are in different communities, as displayed in Figure 6-5. Table 6.1 displays the top 10 metro areas in terms of this extra-community degree. Chicago tops the list, exchanging almost three-quarters of its reciprocal migrants with cities outside the Upper Midwest. Los Angeles and New York are next, although their extra-community migrants make up a much smaller fraction of a larger total number. Comparing extra-community to overall weighted degree (mapped in Figure 4-1), Chicago is far more prominent here.
Additionally, Riverside has dropped from having the third highest total degree to not even making the top 10 in terms of extra-community degree (it comes in at number 19), because so many of its migrants are exchanged with Los Angeles. New York occupies roughly the same position in extra-community degree as it does in weighted degree, while Washington DC, Dallas, and Atlanta are more prominent in terms of extra-community degree than they are in pure weighted degree.

Table 6.1: Top 10 Extra-Community Migration Hubs, 2009-2010

MSA               Population    Extra-community Degree   Total Weighted Degree
Chicago, IL        9,461,105                    59,914                  83,655
Los Angeles, CA   12,828,837                    49,202                 188,467
New York, NY      18,897,109                    48,013                 153,050
Dallas, TX         6,371,773                    43,146                  95,159
Atlanta, GA        5,268,860                    38,830                  72,252
Washington, DC     5,582,170                    38,329                  95,434
Phoenix, AZ        4,192,887                    33,264                  82,350
Houston, TX        5,946,800                    31,028                  75,998
San Diego, CA      3,095,313                    29,673                  84,420
Miami, FL          5,564,635                    24,946                  87,077

Overall, extra-community migrants are more concentrated in a few major cities than migrants in general. For instance, the top 10 cities account for 37% of the extra-community migrants, compared to just 23% of the total migrants.³ Figure 6-7 shows the rank-ordered distribution of population, weighted degree, and extra-community degree. The x-axis is the metros ranked by the variable of interest (note that the ordering is different for each variable), while the y-axis shows the percentage of the total found in each metro. Population and total weighted degree follow a similar pattern, decaying rapidly through the first two hundred or so metros and more gradually after that. Extra-community degree, on the other hand, decays far more quickly at all levels of the distribution.

There is a very strong correlation between betweenness centrality and extra-community degree, 0.94, compared to only 0.86 between betweenness and total weighted degree.
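The extra-community degree defined in Section 6.2.1 can be sketched as a weighted sum: each reciprocal flow is multiplied by the share of runs in which its two endpoints fall in different communities. The function name, the pair-keyed dictionaries, and all of the toy numbers are assumptions made for illustration and do not come from the thesis data.

```python
from collections import defaultdict


def extra_community_degree(reciprocal_flows, same_community_share):
    """Weight each reciprocal flow by how often it crosses a community boundary.

    `reciprocal_flows` maps a metro pair to its number of reciprocal
    migrants. `same_community_share` maps the same pairs (stored in
    sorted order) to the fraction of runs in which they share a community.
    """
    result = defaultdict(float)
    for (a, b), migrants in reciprocal_flows.items():
        key = tuple(sorted((a, b)))
        crossing_share = 1.0 - same_community_share[key]
        result[a] += migrants * crossing_share
        result[b] += migrants * crossing_share
    return dict(result)


# Hypothetical example: Metro R trades heavily with Metro L, which is always
# in its own community, and only lightly with Metro C, which never is.
toy_flows = {
    ("Metro R", "Metro L"): 1000,
    ("Metro R", "Metro C"): 100,
    ("Metro L", "Metro C"): 400,
}
toy_shares = {
    ("Metro L", "Metro R"): 1.0,
    ("Metro C", "Metro R"): 0.0,
    ("Metro C", "Metro L"): 0.0,
}
extra = extra_community_degree(toy_flows, toy_shares)
```

In the toy data Metro R has a large total degree but a small extra-community degree because most of its exchange stays within its own community, the same pattern the text describes for Riverside and Los Angeles.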
This is not particularly surprising: it makes sense that cities with many connections to metros outside their immediate community should occupy more central locations in the overall network.

Footnote 1: These numbers (the shares of extra-community and total migrants reported above) are calculated by dividing each city's degree by the sum of total degree across all cities. This effectively counts each reciprocal migrant twice, once at each city it connects.

Figure 6-7: Ranked Distributions of Weighted Degree, Extra-Community Degree, and Population (x-axis: Metro Rank; series: Population, All Migrants, Extra-Community Migrants)

6.2.2 Community Diversity

Using the formalized communities shown in Figure 6-6, it is possible to examine the distribution of cities' migrants across the different communities, seeing not just how many migrants cross community boundaries but also where they go. Figure 6-8 shows the breakdown of communities represented among migrants leaving the top seven cities, along with the total population breakdown for comparison. Among these cities, Chicago stands on its own in terms of cosmopolitanism: it sends substantial numbers of migrants to five of the six communities, and sends almost as many migrants to the East Coast as it does to its own community. Dallas and, to a lesser extent, Washington send more than a quarter of their out-migrants to communities besides their own, while Los Angeles, New York, and especially Riverside primarily send migrants to their own community. Notably, for both New York and Washington the West is the second largest destination for migrants, even though it is geographically the most distant.

Figure 6-8: Out-migrants by Community of Destination, Selected Metros (panels: Los Angeles, CA; New York, NY; Riverside, CA; Chicago, IL; Dallas, TX; Washington, DC; San Francisco, CA; Total Population)
To more formally examine the inter-community migration sheds of each city, I compute a "participation coefficient" for each city. This measure, used by Guimerà et al. in their study of air traffic networks, captures the extent to which a city's migrants are spread across all communities (Guimerà et al., 2005). It is computed for node $i$ as:

$$P_i = 1 - \sum_{s=1}^{N_M} \left( \frac{k_{is}}{k_i} \right)^2$$

where $k_{is}$ is node $i$'s degree in community $s$, $k_i$ is node $i$'s total degree, and $N_M$ is the total number of communities. This index will be zero if all of a node's links are within its own community and will approach one (more precisely, its maximum of $1 - 1/N_M$) if a node's links are spread evenly across all communities. Table 6.2 shows the top ten cities by participation coefficient. While Chicago ranks highest in terms of participation coefficient, the list is not dominated by the large metro areas that score highest on extra-community degree. Rather, it contains several mid-size metros in the Midwest and South, especially near the borders of the Mid-South, Greater Texas, East Coast, and East Central communities. The Mid-South is particularly well represented. So while extra-community migrants are more likely to move to and from major cities, there are plenty of mid-size metro areas that receive a high proportion of migrants from multiple communities.

Table 6.2: Top 10 Metros by Community Participation Coefficient, 2009-2010

| MSA | Community | Population | Participation Coef. |
|---|---|---|---|
| Chicago, IL | Upper Midwest | 9,461,105 | 0.79 |
| St. Louis, MO | Greater Texas | 2,837,592 | 0.76 |
| Clarksville, TN | Mid-South | 273,949 | 0.76 |
| Memphis, TN | Mid-South | 1,316,100 | 0.76 |
| Louisville/Jefferson County, KY | Mid-South | 1,283,566 | 0.74 |
| Crestview, FL | Mid-South | 180,822 | 0.73 |
| Columbus, GA | Mid-South | 294,865 | 0.70 |
| Pensacola, FL | Mid-South | 448,991 | 0.70 |
| Fort Leonard Wood, MO | Greater Texas | 52,274 | 0.69 |
| Colorado Springs, CO | West | 645,613 | 0.68 |

6.2.3 Within-Community Role

Having examined the distribution of extra-community migrants, the obvious next step is to look at those migrants who don't leave their communities.
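As a brief aside before turning to those within-community migrants, the participation coefficient defined in Section 6.2.2 can be illustrated with a minimal Python sketch. The function name, the dictionary input, and the migrant counts below are hypothetical and chosen only to show the two limiting cases; they are not taken from the IRS data or from the thesis code.

```python
def participation_coefficient(degree_by_community):
    """Return 1 - sum over communities of (k_is / k_i)^2 for one metro.

    degree_by_community maps a community label to k_is, the metro's
    weighted degree (reciprocal migrants) within that community.
    """
    total = sum(degree_by_community.values())  # k_i, the metro's total degree
    if total == 0:
        return 0.0  # convention assumed here for a metro with no migrants
    return 1.0 - sum((k / total) ** 2 for k in degree_by_community.values())


communities = ["West", "Greater Texas", "Upper Midwest",
               "East Central", "Mid-South", "East Coast"]

# Hypothetical metros: one entirely inward-looking, one split between
# two communities, and one spread evenly across all six.
inward = {"West": 1000, "East Coast": 0, "Greater Texas": 0}
split = {"West": 500, "East Coast": 500}
even = {name: 100 for name in communities}

p_inward = participation_coefficient(inward)
p_split = participation_coefficient(split)
p_even = participation_coefficient(even)
print(p_inward, p_split, round(p_even, 3))
```

With six communities an even spread gives 1 - 1/6, roughly 0.83, which is the ceiling for the index in that setting. Assuming the six formalized communities are the ones used, the values in Table 6.2 sit fairly close to that ceiling.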
A city's intra-community degree can be calculated by looking at the total number of reciprocal migrants it exchanges with other cities in its community. To account for variation in the total number of migrants in each community, it is useful to examine the percentage of a community's total migrants that pass through each city. Figure 6-9 maps this percentage. There appears to be a fair amount of concentration here as well: roughly ten cities stand out as hubs of within-community migration.

Figure 6-9: Percentage of Communities' Migrants By Metro

To better understand the distribution of within-community migrants, Figure 6-10 plots the top end of the rank distribution for each community. Some communities, most notably the Upper Midwest, Greater Texas, and the West, have a few cities that clearly dominate intra-community migration. Chicago, Dallas, and Los Angeles dominate their regions, with Minneapolis, Houston, and Riverside also playing important roles. The East Central and East Coast communities have clear top migration centers in Detroit and New York respectively, although neither plays as strong a role in its community as do Chicago, Dallas, and LA in theirs. By contrast, no cities dominate migration within the Mid-South. Nashville and Birmingham have the highest percentage of migrants, but their shares are substantially lower than the shares of the top cities in other communities.

Figure 6-10: Rank Distribution of Within-Community Degree (x-axis: MSA Rank; series: Greater Texas, Upper Midwest, East Central, West, East Coast, Mid-South)

Finally, Figure 6-11 simply plots extra-community degree against intra-community degree. There appear to be two parts to this relationship. A large number of smaller cities have low extra-community degree: they exchange fewer than 100 migrants with MSAs outside of their own community. Among cities that exchange more than 100 migrants outside their community, intra- and extra-community degree show a fairly strong positive relationship. Places with more migrants outside their community tend also to have more migrants from within their community.

Figure 6-11: Extra Community Degree vs. Intra Community Degree (x-axis: Within-Community Degree)

Chapter 7 Discussion, Limitations, and Further Research

The analysis conducted in this thesis has been primarily descriptive in nature. After showing that migration flows in the US in 2009-2010 systematically differ from the predictions of the radiation model in the presence of large long-distance flows, I identified some of the most central cities in the migration network by various metrics. Each of these sheds light on a different aspect of the migration system and may be useful for a different purpose. Unweighted degree identifies the cities with the widest geographic reach: the ability to draw migrants from many different large and small cities. Phoenix, the capital of retirement, is the dominant city by this measure. Weighted degree centrality measures the total reciprocal migration into and out of an area, and gives a sense of which metros have the most demographic churn: Los Angeles, New York, and Riverside. Closeness centrality appears to highlight metros that make up what might be considered the metropolitan core of the country, though this may simply be due to its high correlation with population. Betweenness centrality identifies the cities that link to multiple distinct regions that don't independently exchange many migrants. High betweenness cities, then, might be thought of as the most nationally cosmopolitan places, where people from many regions meet. Chicago and Dallas rank highest on this measure. Finally, I used reciprocal migration flows as the basis for identifying distinct communities, formed of multiple MSAs that exchange many migrants among themselves and fewer with the rest of the country.
This process identified several very cohesive regions, most notably the West, and other areas where borders were a bit fuzzier. Most of the identified regions have one or two cities that account for a large share of the internal migration flows, and also tend to have the most interaction with the rest of the country.

7.1 Limitations of the Current Research

The scope of this project is relatively limited, and as such it is unable to fully address every aspect of US migration patterns and how they interact with economic and social activity. In particular, two key limitations affect the richness of the picture I have been able to uncover and the strength of its conclusions. One major limitation of this study is the amount of information contained in the IRS migration data about the migrants themselves. The IRS dataset contains only the overall numbers of migrants and their gross income. This is sufficient to determine the flows of people and conduct the centrality analyses. But it skates over the question of who exactly is moving where, and may obscure more subtle patterns. Are the people moving into a given city demographically similar to those moving out? What fraction of reciprocal migrants are people moving back to their hometowns? Are flows between distant areas made up of different types of people than ones between nearby ones? These questions are unanswerable based only on the migration data, but their answers will dramatically shape the impact of migrants on their new communities. Previous research suggests that patterns of migration vary greatly with age: young adults have a marked tendency to move to major metropolitan areas, balanced out by net outflows among the middle-aged and elderly (Plane et al., 2005).
Particularly relevant for economic development, the IRS data doesn't contain any information about the human capital of migrants, so it's difficult to tell what kinds of migrants different places are attracting, and what their economic impact is likely to be. A second limitation is that this thesis is based entirely on migration data for one year, 2009-2010. The networks and communities explored here are valid for that year only, and will shift as flows wax and wane. Determining the speed and extent to which this occurs will be important for assessing the utility of the constructs employed here, especially since previous research has suggested that migration patterns may shift relatively quickly over time (McHugh & Gober, 1992; Clark, 1982). If the central nodes and communities within the network are found to fluctuate quickly over time, it will be difficult to argue that they play a meaningful economic role.

7.2 Further Research

An initial possibility for further work is to extend the analysis conducted in this thesis to other years. The IRS has released data for the years 2004-2011, a long enough time period to establish the speed at which migration patterns are currently shifting and the stability of communities and central locations over the past decade. A more ambitious extension would begin to approach the possibility of making causal claims about the determinants or impacts of migration. Revisiting the radiation model, an investigation could attempt to explain the residuals found there using data on the economic, cultural, and physical conditions of the sending and receiving MSAs and their relationship. This would to some degree emulate previous work modeling migration at the regional level, but may be able to expand on that work by seeking to explain the residual rather than the total flow: that is, why the flow is greater or less than would normally be expected, instead of explaining the total amount.
It also may be able to expand on previous work by making use of interactions between origin and destination variables, such as whether their economies are centered on complementary industries. The complement to this would be to attempt to model the impact of migration (perhaps total churn, or network centrality) on economic performance. This would present a major identification challenge, but might be possible using lagged measures or sufficient controls. Finally, if alternative or supplemental sources of data can be found, it would be extremely informative to conduct a deeper dive into the demographic composition of various migration flows.

References

Andris, C. (2011). Metrics and Methods for Social Distance. Unpublished doctoral dissertation, Massachusetts Institute of Technology.
Andris, C., Halverson, S., & Hardisty, F. (2011, June). Predicting migration system dynamics with conditional and posterior probabilities. Proceedings 2011 IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services, 192-197.
Black, D., & Henderson, V. (1999). A theory of urban growth. Journal of political economy, 107(2), 252-284.
Blondel, V., & Guillaume, J. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 1-2.
Borjas, G. (2006). Native Internal Migration and the Labor Market Impact of Immigration. Journal of Human Resources(August 2005).
Borjas, G., Bronars, S., & Trejo, S. (1992, September). Self-Selection and Internal Migration in the United States. Journal of Urban Economics, 32(2), 159-85.
Borondo, J., Borondo, F., Rodriguez-Sickert, C., & Hidalgo, C. A. (2014, January). To each according to its degree: the meritocracy and topocracy of embedded markets. Scientific reports, 4, 3784.
Bound, J., Groen, J., Kézdi, G., & Turner, S. (2004, July). Trade in university training: cross-state variation in the production and stock of college-educated labor.
Journal of Econometrics, 121(1-2), 143-173.
Brandes, U., & Fleischer, D. (2005). Centrality measures based on current flow. In Proc. 22nd symp. theoretical aspects of computer science (pp. 533-544).
Calabrese, F., Dahlem, D., Gerber, A., Paul, D., Chen, X., Rowland, J., ... Ratti, C. (2011, October). The Connected States of America: Quantifying Social Radii of Influence. 2011 IEEE Third Int'l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int'l Conference on Social Computing, 223-230.
Chen, Y., & Rosenthal, S. S. (2008, November). Local amenities and life-cycle migration: Do people move for jobs or fun? Journal of Urban Economics, 64(3), 519-537.
Clark, G. L. (1982, March). Volatility in the geographical structure of short-run US interstate migration. Environment & planning A, 14(2), 145-67.
Courchene, T. (1970). Interprovincial Migration and Economic Adjustment. Canadian Journal of Economics.
Elsner, B., Narciso, G., & Thijssen, J. (2013). Migrant Networks and the Spread of Misinformation, 1-49.
Florida, R. (2002). The Economic Geography of Talent. Annals of the Association of American geographers, 92(4), 743-755.
Frey, W. H. (1994). The New White Flight. American Demographics, 16(4), 1-8.
Frey, W. H. (1995). Immigration and Internal Migration "Flight": A California Case Study. Population and Environment, 16(4), 353-375.
Frey, W. H. (1996). Immigration, Domestic Migration, and Demographic Balkanization in America: New Evidence for the 1990s. Population and Development Review, 22(4), 741-763.
Frey, W. H. (2009). The Great American Migration Slowdown. Brookings Institution, Washington, DC (December), 1-28.
Gamlen, A. (2011). Creating and destroying diaspora strategies (No. April). Oxford.
Glaeser, E., Kolko, J., & Saiz, A. (2001). Consumer city. Journal of economic geography, 1, 27-50.
Glaeser, E., & Saiz, A. (2003). The Rise of the Skilled City.
Glaeser, E., Scheinkman, J., & Shleifer, A. (1995).
Economic growth in a cross-section of cities. Journal of Monetary Economics, 36.
Good, B. H., Montjoye, Y.-A. D., & Clauset, A. (2010). The Performance of Modularity Maximization in Practical Contexts. Physical Review E, 81, 1-20.
Gottlieb, P. D. (1995, November). Residential Amenities, Firm Location and Economic Development. Urban Studies, 32(9), 1413-1436.
Gottlieb, P. D., & Fogarty, M. (2003, November). Educational Attainment and Metropolitan Growth. Economic Development Quarterly, 17(4), 325-336.
Gottlieb, P. D., & Joseph, G. (2006, October). College-To-Work Migration of Technology Graduates and Holders of Doctorates Within the United States. Journal of Regional Science, 46(4), 627-659.
Granovetter, M. (1973). The Strength of Weak Ties. American journal of sociology, 78(6), 1360-1380.
Granovetter, M. (1985). Economic action and social structure: the problem of embeddedness. American journal of sociology, 91(3), 481-510.
Grant, E., & Vanderkamp, J. (1980). The Effects of Migration on Income: A Micro Study with Canadian Data 1965-71. The Canadian journal of economics.
Graves, P. (1980). Migration and climate. Journal of regional Science.
Graves, P., & Knapp, T. A. (1988, July). Mobility behavior of the elderly. Journal of Urban Economics, 24(1), 1-8.
Greenwood, M. J. (1997). Internal migration in developed countries. Handbook of population and family economics.
Groen, J. A. (2004, July). The effect of college location on migration of college-educated labor. Journal of Econometrics, 121(1-2), 125-142.
Gross, E. (1999). US Population Migration Data: Strengths and Limitations (Tech. Rep.). Internal Revenue Service.
Guimerà, R., Mossa, S., Turtschi, A., & Amaral, L. A. N. (2005, May). The worldwide air transportation network: Anomalous centrality, community structure, and cities' global roles. Proceedings of the National Academy of Sciences of the United States of America, 102(22), 7794-9.
Hagerstrand, T. (1966).
Aspects of the Spatial Structure of Social Communication and the Diffusion of Information. Papers in Regional Science.
Hansen, S., Ban, C., & Huggins, L. (2003). Explaining the "Brain Drain" From Older Industrial Cities: the Pittsburgh Region. Economic Development Quarterly.
Hunt, G. L. (1993, January). Equilibrium and disequilibrium in migration modelling. Regional studies, 27(4), 341-9.
Internal Revenue Service. (2011). Supplemental Documentation for Migration Data Products (Tech. Rep.).
Isserman, A., Plane, D., & McMillen, D. (1982). Internal Migration in the United States: An Evaluation of Federal Data. Review of Public Data Use.
Kemeny, T., & Storper, M. (2012, February). The Sources of Urban Development: Wages, Housing, and Amenity Gaps Across American Cities. Journal of Regional Science, 52(1), 85-108.
Kennan, J., & Walker, J. R. (2011). The Effect of Expected Income on Individual Migration Decisions. Econometrica, 79(1), 211-251.
Kodrzycki, Y. (2001). Migration of recent college graduates: Evidence from the National Longitudinal Survey of Youth. New England Economic Review.
Kotkin, J. (2012). Where Americans Are Moving.
Kuznetsov, Y. (2006). Diaspora Networks and the International Migration of Skills: How Countries Can Draw on Their Talent Abroad.
Lenormand, M., Huet, S., Gargiulo, F., & Deffuant, G. (2012). A universal model of commuting networks. PloS one, 7(10), e45985.
Lucas, R. (1988). On the Mechanics of Economic Development. Journal of monetary economics, 22(February), 3-42.
Massey, D. (1988). Economic Development and International Migration in Comparative Perspective. Population and development review.
Mathur, V. K. (1999, August). Human Capital-Based Strategy for Regional Economic Development. Economic Development Quarterly, 13(3), 203-216.
McGuire, P., Hardy-Johnston, D., & Saevig, L. (2006). Brain Drain in Ohio: Observations and Summaries with Particular Reference to Northwest Ohio.
McHugh, K. E., & Gober, P. (1992).
Short-Term Dynamics of the US Interstate Migration System, 1980-1988. Growth and Change.
Molloy, R., Smith, C. L., & Wozniak, A. K. (2011). Internal Migration in the United States (No. 17307).
Morrison, P. S., & Clark, W. A. V. (2011). Internal migration and employment: macro flows and micro motives. Environment and Planning A, 43(8), 1948-1964.
Neal, Z. (2010, March). Refining the Air Traffic Approach to City Networks. Urban Studies, 47(10), 2195-2215.
Newman, M., & Girvan, M. (2004, February). Finding and evaluating community structure in networks. Physical Review E, 69(2), 026113.
Office of Management and Budget. (2013). Revised Delineations of Metropolitan Statistical Areas, Micropolitan Statistical Areas, and Combined Statistical Areas, and Guidance on Uses of the Delineations of These Areas (Tech. Rep. No. 13).
Piiparinen, R., & Russell, J. (2013). From Balkanized Cleveland to Global Cleveland: A Theory of Change for Legacy Cities (Tech. Rep. No. November).
Plane, D. (1993, January). Demographic influences on migration. Regional studies, 27(4), 375-83.
Plane, D., Henrie, C., & Perry, M. (2005). Migration up and down the urban hierarchy and across the life course. Proceedings of the National Academy of Sciences, 102(43), 15313-15318.
Plane, D., & Rogerson, P. (1991). Tracking the Baby Boom, the Baby Bust, and the Echo Generations: How Age Composition Regulates US Migration. The Professional Geographer.
Pred, A. R. (1971). Large-City Interdependence and the Preelectronic Diffusion of Innovations in the US. Geographical Analysis.
Pred, A. R. (1973). Urban growth and the circulation of information: the United States system of cities, 1790-1840. Cambridge, MA: Harvard University Press.
Pred, A. R. (1975). Diffusion, Organizational Spatial Structure, and City-System Development. Economic Geography, 51(3), 252-268.
Pred, A. R. (1977). City Systems in Advanced Economies. London: Hutchinson.
Pred, A. R. (1980).
Urban Growth and City-Systems in the United States, 1840-1860. Harvard University Press.
Ratti, C., Sobolevsky, S., Calabrese, F., Andris, C., Reades, J., Martino, M., ... Strogatz, S. H. (2010, January). Redrawing the map of Great Britain from a network of human interactions. PloS one, 5(12), e14248.
Ravenstein, E. (1885). The Laws of Migration. Journal of the Statistical Society of London, 48(2), 167-235.
Rogers, A. (1990). Requiem for the Net Migrant. Geographical Analysis, 22(4).
Romer, P. (1990). Endogenous Technological Change. Journal of Political Economy, 98(5).
Sanderson, A., & Dugoni, B. (2002). Interstate Migration Patterns of Recent Science and Engineering Doctoral Recip (Vol. 1999; Tech. Rep.).
Sassen, S. (1991). The Global City: New York, London, Tokyo (2nd ed.). Princeton, NJ: Princeton University Press.
Saxenian, A., & Sabel, C. (2008). Roepke Lecture in Economic Geography Venture Capital in the "Periphery": The New Argonauts, Global Search, and Local Institution Building. Economic Geography.
Schachter, J., & Althaus, P. (1989). An Equilibrium Model of Gross Migration. Journal of Regional Science.
Senor, D., & Singer, S. (2009). Start-up Nation: The Story of Israel's Economic Miracle.
Simini, F., González, M. C., Maritan, A., & Barabási, A.-L. (2012, April). A universal model for mobility and migration patterns. Nature, 484(7392), 96-100.
Sjaastad, L. (1962). The Costs and Returns of Human Migration. The journal of political economy, 70(5), 80-93.
Stephenson, K., & Zelen, M. (1989). Rethinking Centrality: Methods and Examples. Social Networks, 11.
Storper, M. (2010, December). Why do regions develop and change? The challenge for geography and economics. Journal of Economic Geography, 11(2), 333-346.
Stricker, K. (2007). Rural Brain Drain. Unpublished doctoral dissertation, Loyola University Chicago.
Taylor, P. J. (2001, September). Specification of the World City Network. Geographical Analysis, 33(2), 181-194.
Thiemann, C., Theis, F., Grady, D., Brune, R., & Brockmann, D. (2010, January). The structure of borders in a small world. PloS one, 5(11), e15422.
Treyz, G. I., Rickman, D. S., Hunt, G. L., & Greenwood, M. J. (1993). The Dynamics of US Internal Migration. The Review of Economics and Statistics, 75(2), 209-214.
Vanderkamp, J. (1971). Migration flows, their determinants and the effects of return migration. The Journal of Political Economy, 79(5), 1012-1031.
Whisler, R. L., Waldorf, B. S., Mulligan, G. F., & Plane, D. A. (2008, March). Quality of Life and the Migration of the College-Educated: A Life-Course Approach. Growth and Change, 39(1), 58-94.
Wozniak, A. (2010). Are College Graduates More Responsive to Distant Labor Market Opportunities? Journal of Human Resources(May).
Wright, R., Ellis, M., & Reibel, M. (1997). The Linkage between Immigration and Internal Migration in Large Metropolitan Areas in the United States. Economic Geography.
Yezer, A., & Thurston, L. (1976). Migration Patterns and Income Change: Implications for the Human Capital Approach to Migration. Southern Economic Journal.