The Role of Firms in Fostering Within Country Migration: Evidence from a Natural Experiment in India

Prithwiraj Choudhury
Tarun Khanna

Working Paper 14-080
February 28, 2014

Copyright © 2014 by Prithwiraj Choudhury and Tarun Khanna

Working papers are in draft form. This working paper is distributed for purposes of comment and discussion only. It may not be reproduced without permission of the copyright holder. Copies of working papers are available from the author.

The Role of Firms in Fostering Within Country Migration: Evidence from a Natural Experiment in India¹

Prithwiraj Choudhury
Harvard Business School

Tarun Khanna
Harvard Business School

October 30, 2013

ABSTRACT

High ability individuals can be constrained from commensurate employment opportunities due to their geographic location. In the face of physical, informational and social barriers to migration, firms with nation-wide hiring practices can benefit from facilitating the migration of high ability individuals from low employment districts to regions with better employment opportunities. We exploit a natural experiment within an Indian technology firm where the pre-existence of a computer generated talent allocation protocol allows us to isolate the relation between an employee’s prior home town/village and subsequent performance within the firm. Using unique personnel data for entry level undergraduates and leveraging the fact that the assignment of an employee to one of many technology centers within the firm is uncorrelated to observable characteristics of the employee, we find that employees hired from low employment districts (remote employees) out-perform their non-remote counterparts in the short term. They continue to outperform their non-remote counterparts in the long term once we control for the distance of migration.
As a possible explanation of our result, we test for selection and find that employees hired from low employment districts outperform their non-remote counterparts in standardized verbal and logical tests at the recruitment stage. To explain why the firm might be more likely to select high ability individuals from remote districts, we additionally conduct a survey of randomly selected urban and rural colleges and document statistically significant differences in employment opportunities for rural and urban graduates. Our survey results also indicate that not every firm follows the policy of hiring from low-employment districts.

¹ The authors would like to thank seminar participants at Duke, Harvard Business School, the International Economic Association (IEA) meetings in 2011 in Beijing, INSEAD, London School of Economics, NYU Stern School of Business, Washington University St. Louis, Wharton and the World Bank DECTI seminar for their comments on a previous draft.

I. INTRODUCTION

The influence of firms in shaping human capital and labor markets has long been studied by economists (Baker, Gibbs and Holmstrom, 1994). Using personnel data collected from firms, economists have conducted several productivity studies that document how hiring, training and human resources management policies of the firm shape the development of human capital (Bartel, Ichniowski and Shaw, 2004). However, firms are notably absent from the literature on migration. To quote Kerr, Kerr and Lincoln (2013; page 5), “firms are mostly absent from the literature on the impact of immigration.” The authors also argue that “this approach seems quite incomplete for skilled migration” given that firms play an active role in the migration of skilled workers, in the context of the U.S. and other countries.
In this paper, we posit that in the absence of roads, air connectivity, information and other relevant “infrastructure”, nationwide hiring by large firms can facilitate efficient within country migration patterns. Firms with nationwide hiring practices can select high ability individuals from geographically remote areas and can benefit from such hiring practices. If employees hired from remote locations outperform employees recruited from non-remote locations, then this could plausibly lead to positive rents for the firm in question.

We leverage a natural experiment within a large technology firm in India and report several interesting results. Individuals hired from low-employment districts (‘remote’ employees) outperform non-remote employees in the short term. In the longer term, individuals hired from low-employment districts continue to outperform employees hired from non-remote locations, once we control for the distance of migration. To offer a possible explanation of our result, we test for selection. We use standardized test scores in logical and verbal ability during recruitment and provide evidence that employees hired from low-employment districts outperform their non-remote counterparts on standardized verbal and logical ability tests at the point of recruitment.

Our empirical setting and identification strategy exploit a natural experiment within one of India’s largest software multinationals, employing more than 120,000 people worldwide. The Indian technology firm in question (“INDTECH”) has made significant investments in nationwide hiring and recruits talent from over 250 colleges in India. Several of these colleges are in low employment districts of India. After four months of induction training, the firm then randomly allocates the talent it recruits from across the country, including from remote districts, across its projects executed in its development centers serving global clients from locations across India.
This randomization, performed by a computer application, ensures that the end customers of the firm, which are mostly U.S. based firms, are indifferent as to which INDTECH center executes their project. An important point here is that we do not use the random assignment of employees to technology centers as ‘treatment’; instead we use the protocol to control for endogeneity concerns. In other words, our main research question is not to study differences in performance for employees who get assigned to mainstream centers versus employees who get assigned to non-mainstream centers. Our primary research question is to study differences in performance for employees who come from remote versus non-remote districts. However, the randomization protocol helps us control for endogeneity concerns where the center an employee is assigned to is correlated with observable characteristics such as being from a remote district on the one hand, and with future performance on the other. As an example, randomization implies that employees are not systematically assigned to centers close to their hometowns. If this were not the case, employees coming from remote areas would be assigned to a remote development center and their performance ratings would be downward-biased because of the distance from the larger knowledge centers and because of missing out on agglomeration economies.

To conduct the empirical analysis, we collect unique personnel data for undergraduates hired by INDTECH, anonymized so as not to reveal the names of individuals. The data collected include details about the school district, college district, assigned technology center, grades received during training, short and long term performance data and attrition data for individual employees.
To test whether or not the firm selects high ability employees from remote districts, we further collect data on standardized test scores of logical and verbal ability at the recruitment stage for each employee and find evidence of such selection. Our interviews with managers at the focal firm indicate that for high ability individuals from remote areas, joining the firm is among their top career choices, while high ability individuals from the larger cities have several other competing career choices. To further augment our analysis and to establish why INDTECH might be more likely to select high quality individuals in remote districts, we conducted a survey where 11 randomly selected urban and rural engineering colleges participated in a telephonic interview with questions related to the nature of firms hiring from such colleges and the salaries offered by the firms. Analysis of the survey data confirms large, statistically significant differences in salaries for students graduating from urban and rural colleges. The survey also indicates that not every technology firm follows the policy of hiring from remote districts. Multinational technology firms in our survey sample mostly hire from urban colleges and INDTECH has a unique policy of hiring from rural colleges. Interviews conducted with INDTECH confirm the higher costs of hiring from rural colleges, which include the costs of travel and the higher search costs in finding talent from rural colleges. Finally, we also find that employees from low employment districts appear disproportionately to use the firm as a platform to further their education and join a master’s degree program.

As one possible explanation of our result, we test for selection. We build on the models of selection in the migration literature. Borjas (1994) and Young (2013) are two such models that provide a possible theoretical explanation for our results.
The refugee sorting model of Borjas (1994) is based on the Roy (1951) model of self-selection of workers and identifies theoretical conditions under which migrants have below average earnings in the source country but end up in the upper tail of the earnings distribution in the host country. In a recent paper, Young (2013) provides evidence that the large gap between urban and rural living standards in developing countries accounts for much of the inequality within those countries and is attributable to selection based upon unobserved skill and within country migration. Our empirical analysis tests whether or not hiring by firms of migrants from remote areas within the country involves selection based on ability and whether or not high ability employees out-perform their non-remote counterparts in the short term and in the long term.

Over the past few decades, the economics literature on migration has focused on international migration and much of the recent literature studies relatively low skilled Mexican immigrants and their impact on the U.S. labor market. A notable exception here is the recent paper by Young (2013). However, the disproportionate focus of the prior migration literature on international migration and on a single dyad of countries (U.S., Mexico) overlooks the widespread within-country migration happening in several developing countries. To quote Zhao (1999), “the migration of rural labor to urban areas in China since the mid-1980s has created the largest labor flow in world history” (Zhao 1999, page 281). Zhao estimates that the number of rural migrants in urban areas in China in the mid-1990s was around 50 million. In comparison, to quote Borjas (1994), the number of legal immigrants coming into the U.S. between 1981-1990 was 7.3 million (Borjas, 1994; Table 1, page 1668) and in the year 1986, the Border Patrol in the U.S. apprehended 1.8 million illegal aliens (Borjas 1994; page 1669).
Our results help bridge this gap in the migration literature by studying the role of firms in the context of within country migration. Here, we are also motivated by the literature that documents physical, social and informational barriers to migration and employment opportunities for individuals in developing countries (Jensen, 2012; Banerjee et al, 2007). In the face of such barriers to migration, large firms with wide-spread national hiring might be particularly relevant in facilitating within country migration. Given the potential higher costs of hiring from remote districts, the firm will only do so if individuals hired from remote districts are more productive, i.e. they out-perform their non-remote counterparts. In addition to informing the literature on migration, our results have policy implications for India, reportedly the ‘youngest’ country (of appreciable size) in the world, where there is much hope of a demographic dividend and equivalent fear of what we might term a ‘demographic albatross’ that might result if burgeoning youth pools are unemployed and unemployable. Our findings also have implications for other large developing countries like China which has seen widespread within country migration and an increase over the last two decades in economic disparity across regions (Yang, 2002). Our work also has relevance for developed countries like the United States and has a philosophical connection to the “moving to opportunity” experiments conducted in Boston by Katz, Kling and Liebman (2001). In the past decade or so, labor economists have documented the polarization of the U.S. labor market (Autor, Katz and Kearney, 2006) and in recent work, Moretti (2010) has identified large differences in worker earnings for “observationally similar workers” based on the location of the individual. In his new book, Moretti states that “your salary depends more on where you live than on your resume”, (Moretti, 2012; page 88). 
Given this, even in the U.S., hiring practices of firms could have policy implications for efficient within country migration.

The rest of the paper proceeds as follows. In the next section, we summarize our theoretical antecedents with respect to the positive selection of migrants, the barriers to migration in developing countries and the incentives of firms to facilitate migration. Section III describes our empirical setting and the natural experiment. We report the descriptive statistics in Section IV and our econometric specifications and results in Section V. Section VI concludes.

II. THEORY

We posit that firms with nation-wide hiring practices can help high ability individuals overcome barriers to migration and can move them from low employment districts to areas with high employment opportunities. We also posit that despite the higher costs of hiring from remote regions, firms can benefit from such hiring if high ability employees hired from remote regions outperform their non-remote counterparts. The thread of our theorizing is well captured by a quote from Yap (1976): “Firms maximize profits, and individuals maximize utility. However, because of institutional constraints, noneconomic motivations, government policies and imperfect information, factor price differentials between sectors exist and are only gradually reduced over time. Migration between sectors is a means of equalizing factor returns…however, neither migration, nor any equilibrating force is strong enough to eliminate imbalances instantaneously.” (Yap, 1976, page 122)

We build on theoretical antecedents from the literature in economics on migration, development and personnel economics. We first summarize the selection models of migrants. Two possible theoretical models that are relevant to our work are the refugee sorting model from Borjas (1994) and the recent within-country migration and inequality model of Young (2013).
We next summarize the literature that outlines the physical, informational and social barriers to employment opportunities in developing countries. Finally, we summarize the literature in personnel economics that outlines the role of hiring, training and human resource practices of firms in shaping human capital and labor markets.

II.A. Selection Models of Migrants and Refugee Sorting

As Borjas (1994) outlines, the early literature on migration is focused on the labor market performance of immigrants in the host country. Sjaastad (1962), Chiswick (1978) and Carliner (1980) are the pioneering studies in this area. Sjaastad (1962) models migration as an investment decision where each individual assesses the expected utility to be obtained in each possible destination and chooses the location with the highest expected utility. In other words, self-selection is driven by wage differentials net of migration costs. Chiswick finds that after 30 years in the United States, the typical immigrant earns about 11 percent more than a comparable native worker. This result was interpreted by several researchers in terms of a selection argument. Chiswick mentions that immigrants are “more able and more highly motivated” than natives (Chiswick 1978, page 900) and Carliner postulated that immigrants “choose to work longer and harder than non-migrants” (Carliner 1980, page 89).

Subsequent research has further developed the self-selection arguments of immigrant flow. This analysis is based on the Roy (1951) model of self-selection of workers, which describes how workers sort themselves between employment opportunities. In the case of migrants, Borjas (1987) models this based on a wage distribution for the home and host countries of the migrant and identifies conditions for immigrant positive self-selection and immigrant negative self-selection.
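The selection conditions referred to here can be written compactly. What follows is only a sketch of the two-country Roy setup commonly associated with Borjas (1987); the symbols $\mu_j$, $\varepsilon_j$, $\sigma_j$, $\rho$ and $\pi$ are notation assumed for illustration and do not appear in this paper's main text. Subscript 0 denotes the source country and subscript 1 the host country, and $\pi$ stands for a migration cost term.

```latex
\begin{align*}
\ln w_0 &= \mu_0 + \varepsilon_0, \qquad \varepsilon_0 \sim N(0, \sigma_0^2), \\
\ln w_1 &= \mu_1 + \varepsilon_1, \qquad \varepsilon_1 \sim N(0, \sigma_1^2), \qquad
\rho = \operatorname{corr}(\varepsilon_0, \varepsilon_1), \\
\text{migrate if}\quad & (\mu_1 - \mu_0 - \pi) + (\varepsilon_1 - \varepsilon_0) > 0 .
\end{align*}

\begin{align*}
\text{positive selection:}\quad & \frac{\sigma_1}{\sigma_0} > 1
  \ \text{ and } \ \rho > \frac{\sigma_0}{\sigma_1}, \\
\text{negative selection:}\quad & \frac{\sigma_0}{\sigma_1} > 1
  \ \text{ and } \ \rho > \frac{\sigma_1}{\sigma_0}, \\
\text{refugee sorting:}\quad & \rho < \min\!\left( \frac{\sigma_0}{\sigma_1},\ \frac{\sigma_1}{\sigma_0} \right).
\end{align*}
```

Under these assumptions, which case arises depends on the relative dispersion of earnings in the two locations and on how strongly skills valued in one location are rewarded in the other.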
In other words, this model generates conditions under which migrants will either have above average earnings in the source and host country (positive self-selection) or will have below average earnings in the source and host country (negative self-selection). Borjas also identifies conditions for “refugee sorting” (Borjas 1994; page 1689) where immigrants have below average earnings in the source country but end up in the upper tail of the earnings distribution in the host country. This sorting happens when the correlation between the skills of the two countries is small or negative. To illustrate “refugee sorting”, Borjas gives the example of high skilled workers in a Communist country which does not value their skills migrating to a market economy and performing well in the host country’s market economy.²

The recent empirical literature on migration has tested the theoretical predictions of the selection hypotheses; however, most of the work is focused on a single dyad of countries (Mexican immigrants in the U.S.) and on relatively low skilled workers. Recent papers in this area include Munshi (2003), Chiquiar and Hanson (2005), McKenzie and Rapoport (2010), Kaestner and Malamud (2010) and Moraga (2011). McKenzie and Rapoport (2010) consider the effect of migrant networks in influencing self-selection patterns. Munshi (2003) studies Mexican immigrant networks and finds that the same individual is more likely to be employed and retain a higher paying non-agricultural job when the network of the individual is exogenously larger.³

² An exposition of the Borjas (1994) refugee sorting model, reproduced from the original text, is provided in the appendix.

³ There is a recent empirical literature on skilled immigrants in the U.S. and much of this literature is focused on the impact of skilled immigrants on wages and employment opportunities of domestic skilled workers. This literature dates back to Friedberg (2001) who studied the effect of Russian immigrants into Israel and the effect they had on wages of domestic workers. Exploiting information on the immigrants' former occupations abroad, the author finds no adverse impact of immigration on native outcomes. Borjas (2005) uses data from the survey of earned doctorates and the survey of doctoral recipients to estimate that a 10 percent immigration induced increase in the supply of doctorates lowers the wage of competing workers by about 3 percent. In a recent paper, Kerr and Kerr (2013) study the impact of skilled immigrants in occupations related to science, technology, engineering and mathematics (STEM) and find that STEM workers departing their firms during periods of disproportionately high immigration experience difficult employment transitions. A second stream of recent papers focuses on the impact of skilled immigrants on innovation and the progress of science. Hunt and Gauthier-Loiselle (2010) use data from the 2003 National Survey of College Graduates and find that a 1 percentage point increase in immigrant college graduates' population share increases patents per capita by 9-18 percent. Other studies in this area include Borjas and Doran (2012) who study the migration of Russian mathematicians following the collapse of the Soviet Union and Moser et al. (2012) who study Jewish scientist expellees from Nazi Germany.

II.B. Within Country Migration and Selection Models

The theoretical foundations of the within country migration literature date back to the 1970s and the ‘job-search’ models of Harris and Todaro (1970). These models included a rural and urban sector with the urban sector characterized by an institutionally fixed wage above the market clearing level. These models explained within country migration as rational maximizing behavior where the higher urban wage acted as a regulating mechanism. Subsequent work in this area included Pinera and Selowsky (1978) who additionally accounted for the existence of voluntary unemployment, especially in urban areas. Most of the prior literature on within country migration in developing countries is focused on models of agricultural to non-agricultural migration. However, the within-country migration literature has not received much attention in recent times with the notable exception of Young (2013) who provides evidence of self-selection of workers into sectors leading to within country migration.⁴

Young (2013) provides evidence that the large gap between urban and rural living standards in developing countries accounts for much of the inequality within those countries and is attributable to selection based upon unobserved skill and within country migration. One out of four or five individuals raised in rural areas moves to a city as a young adult and earns much higher incomes than non-migrant rural permanent residents. Similarly, one out of four or five individuals raised in urban areas moves to a rural area as a young adult and earns much lower incomes than their non-migrant urban cousins. The theoretical model of Young (2013) assumes the simultaneous existence in developing countries of two sectors, urban and rural. In equilibrium, it is more likely that a skilled worker will work in an urban area and an unskilled worker will work in a rural area. Also, observable educational attainment determines the probability that a worker is skilled. In other words, the relative factor demands of the urban and the rural sectors produce a sorting of workers, so that a worker of a given educational attainment working in the urban sector is more likely to be skilled. The author also assumes underlying heterogeneity in workers’ urban and rural productivity conditional on their skill status. This assumption, together with the concentration of demand for skilled workers in urban areas, leads to the average rural to urban migrant being better educated than the average rural permanent resident. This idea builds on Lewis (1954) and the existence of dual economies where workers in rural areas migrate to cities, by comparing their marginal product in urban output to their average product in rural family output. The model is also similar to Lagakos and Waugh (2011) who argue that workers sort themselves into urban and rural areas based on their intrinsic abilities.

⁴ Schultz (1971) provides evidence that more than one-third of the rural Colombian population under the age of 40 in 1951 had left for urban areas by 1964. The author considers several factors in building a model of rural-urban migration including differences in agricultural and manufacturing wages, population growth rate in a region (greater population growth rate leading to higher out-migration) and characteristics of individual migrants. The author finds that rural-urban migration is selective with respect to age and sex as well as region. He also finds that schooling contributes to out-migration of students, given that the returns to education are higher in the cities compared to the rural areas. Yap (1976) studies rural-urban migration in Brazil and focuses on rural-urban differences in capital and labor productivity, rates of technological change, rates of natural population growth, marginal savings propensities and tax rates. The migration decision here is endogenous to the wage differential between the agricultural and non-agricultural sectors.

II.C. Barriers to Migration and Employment in Developing Countries

The theoretical model of Young (2013) does not assume any barriers to migration within the country and workers can self-select into the urban or rural employment sectors based on their underlying skills and educational attainment. However, the economics literature in the context of developing countries documents physical, informational and social barriers to migration and employment.

First of all, individuals may be located in low-employment regions and may not invest in education given the paucity of employment opportunities around them. This follows the literature in economic geography (e.g. Henderson et al. 2001), which outlines how both across countries and within countries, economic activity is concentrated in a few metropolitan regions. There is also a related literature (Dyson and Moore, 1983 and Foster and Rosenzweig, 2009) that points out how considerations such as the gender or social status (e.g. caste) of the student could negatively affect the perceived benefits to investing in education.⁵ A recent study of how gender is related to underinvestment in education in the context of India is by Jensen (2012). The author ran an experiment where recruiting services for the newly emerging business process outsourcing industry were provided for young women in randomly selected rural Indian villages. The author finds that during this time, young women in treatment villages were significantly less likely to get married or have children; instead they were likely to obtain more schooling or post-school training and enter the labor market. On the issue of caste, scholars such as Banerjee, Iyer and Somanathan (2005) observe that in India, caste classifications are rigid and caste divisions lead to social relations that might become conflict-prone. Members of backward castes can be relatively disadvantaged in their access to public goods related to education and skill development. As Banerjee et al.
(2004) point out in their study of the provision of public goods to various districts of India based on the caste demographics of the area, “Areas with a concentration of Brahmans, the traditional priestly class considered the top of the caste hierarchy, have higher levels of schools….Areas with groups that were recognized as socially and economically marginalized by the Indian state at the time of independence are associated with lower access.” (Banerjee et al. 2004, page 4). Munshi and Rosenzweig (2009) also have a working paper that attributes the persistence of low spatial and marital mobility in rural India, despite increased growth rates and rising inequality in recent years, to the existence of sub-caste networks that provide mutual insurance to their members.

⁵ Dyson and Moore (1983) and Foster and Rosenzweig (2009) outline the practice of ‘patrilocal exogamy’ where the woman gets married to an individual from a different village and leaves her parents’ village to live with her husband's family. This and similar practices imply that in many cases, the returns to investing in girls' human capital do not accrue to the parents and as a result, parents have less incentive to invest in the education of the girls.

Secondly, in developing countries, employment opportunities for talented individuals in remote areas might be hindered by the lack of teaching infrastructure and the lack of committed teachers. Banerjee et al. (2007) summarize the dismal quality of educational services offered to the poor in developing countries. The authors refer to a 2005 India-wide survey on educational attainment and state that 44 percent of the children aged 7-12 cannot read a basic paragraph and 50 percent cannot do simple subtraction. The authors also run two sets of randomized experiments where employing young women to teach students lagging behind in basic literacy and numeracy skills and a computer assisted learning program for Math resulted in higher test scores. Chaudhury et al. (2006) find for a sample of developing countries that includes Bangladesh, Ecuador, India, Indonesia, Peru and Uganda, that on average, 19 percent of teachers were absent during unannounced visits to primary schools made by the researchers. The authors also state that a comparable teacher absence rate for a large sample of school districts in New York State was five percent. Choudhury and Khanna (2012) summarize the physical, informational and social barriers to migration and employment for high ability individuals in remote regions in the context of developing countries.

II.D. Firms and Migration

As described earlier in the introduction, firms are conspicuously absent from the economics literature on migration. To quote Kerr, Kerr and Lincoln (2013), “from an academic perspective, there is very little tradition for considering firms in analyses of immigration. As one vivid example, the word “firm” does not appear in the 51 pages of the classic survey of Borjas (1994) on the economics of immigration, and more recent surveys also tend to pay little attention to firms” (Kerr et al, 2013; page 1).⁶ The authors also explain that the role of firms needs to be studied in the context of migration, particularly in the case of migration of skilled workers.⁷

There is also a separate literature in personnel economics, summarized by Bartel et al. (2004), that outlines the role of hiring, training and human capital management policies of firms in shaping human capital and labor markets. We leverage insights from this literature in personnel economics and posit that firms with country-wide hiring practices might facilitate within-country migration, moving high ability individuals from low employment districts to regions rich in employment opportunities.
If employees hired from remote locations outperform employees recruited from non-remote locations, hiring employees from remote locations could plausibly lead to positive rents for the firm in question. In other words, there could be a “recruiting arbitrage” in hiring employees from remote locations and expropriating rents from their superior performance within the firm. Here, we are motivated by the findings from Fisman and Khanna (2004). The authors show that in equilibrium it is possible for some firms to profit from factor arbitrage in rural India. To quote the authors, “firms that are best able to deal with infrastructure shortages will be more likely to locate in low-infrastructure regions, as it allows them to take advantage of cheap factors of production that arise in equilibrium in order for markets to clear” (Fisman and Khanna, 2004, page 615). The authors also provide evidence that firms in India that are able to operate in states with less than median development indices benefit from lower wage rates (about 30% less, on average, than wage rates in more developed states) and tax rates (average of 8.2% in developed, vs. 7.0% in undeveloped states), as well as a higher average rate of government 'fiscal benefits' (0.8% vs. 0.6%).

⁶ Other recent surveys on immigration include Friedberg and Hunt (1995), Freeman (2006), Dustmann et al. (2008) and Kerr and Kerr (2011).

⁷ The authors analyze the role of firms in immigration in the context of the H-1B visa, where the firm identifies the workers it wants to hire. They use the Longitudinal Employer-Household Dynamics (LEHD) dataset and an unbalanced panel of 319 firms over 1995-2008 and estimate that a 10% increase in a firm’s young skilled immigrant employment correlates with a 6% increase in the total skilled workforce of the firm.

III. NATURAL EXPERIMENT

Our empirical setting is one of India’s largest technology firms (INDTECH) with over 120,000 employees spread over 10 technology centers in India and working on global projects. We exploit a natural experiment with respect to how this firm assigns entry level employees to its 10 technology centers spread all over India. As stated earlier, the firm has made significant investments in nationwide hiring and recruits talent from over 250 colleges in India. Several of these colleges are in low employment districts of India. After four months of induction training, the firm then randomly allocates the talent it recruits from across the country to one of the 10 possible technology centers all across the country. The allocation is done by a computer application that is part of the firm’s enterprise resource planning software. This policy ensures that the allocation of an employee to a particular location within the firm is uncorrelated with measures of observed ability such as test scores at the end of induction training. Out of the 10 centers, 7 centers are located in mainstream locations close to the firm headquarters while 3 centers are in relatively remote parts of India. Figure 1 outlines the geographic distribution of the 10 development centers. Exhibit 1 outlines the steps followed by the computer application that assigns new employees to one of the ten locations.

Interviews with the head of talent development at INDTECH reveal that the “primary motivation” of this talent allocation policy is to ensure that the end-customers of INDTECH, mostly U.S. based firms, are indifferent about the location of the technology center that executes their projects. The secondary motivation of this talent allocation policy is the avoidance of regional and/or ethnic cliques at the technology centers.
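The property that matters for identification, namely that assignment to a center is unrelated to employee characteristics, can be illustrated with a short sketch. This is not the firm's protocol: the steps of the actual application are those in Exhibit 1 and are not reproduced here. The sketch simply assumes a uniform random draw over 10 centers, and the function names, center labels and the `is_remote` flag are hypothetical.

```python
import random
from collections import Counter

# Hypothetical labels for the 10 technology centers (not the firm's names).
CENTERS = [f"center_{i}" for i in range(1, 11)]


def assign_centers(employee_ids, centers=CENTERS, seed=None):
    """Assign each employee to a center uniformly at random.

    Assumption for illustration: the draw ignores every employee
    characteristic (home district, test scores, ethnicity).
    """
    rng = random.Random(seed)
    return {emp: rng.choice(centers) for emp in employee_ids}


def remote_share_by_center(assignment, is_remote):
    """Share of remote-district employees at each center.

    Under random assignment these shares should all be close to the
    overall remote share, which is a simple balance check.
    """
    totals = Counter(assignment.values())
    remote = Counter(c for emp, c in assignment.items() if is_remote[emp])
    return {c: remote[c] / totals[c] for c in totals}


if __name__ == "__main__":
    employees = list(range(5000))
    # Made-up data: every employee whose id ends in 0, 1 or 2 is "remote".
    is_remote = {emp: emp % 10 < 3 for emp in employees}
    assignment = assign_centers(employees, seed=1)
    for center, share in sorted(remote_share_by_center(assignment, is_remote).items()):
        print(center, round(share, 3))
```

If assignment instead depended on home district, the remote share would differ systematically across centers, which is the kind of correlation the randomized protocol rules out.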
To quote the head of talent development at INDTECH, “we do not want all Tamils to join the Chennai center or all Punjabis to join Chandigarh and start conversing in their regional language rather than English. If that happens, both our clients and employees from other parts of the country are affected.”

As described earlier, we do not use the random assignment of employees to technology centers as ‘treatment’; instead we use the protocol to control for endogeneity concerns. In other words, our main research question is not to study differences in performance between employees assigned to mainstream centers and employees assigned to non-mainstream centers. Our primary research question is to study differences in performance between employees who come from remote districts and those who come from non-remote districts. Given this, the talent allocation protocol at INDTECH is extremely valuable from the point of view of the econometrician. As the personnel economics literature (Baker, Gibbs and Holmstrom, 1994; Gibbons, 1995) has long pointed out, there are pre-defined “career ladders” inside firms. These career ladders are long, structured and involve endogenous progressions (“fast moves”) based on prior period performance. In this context, a randomized assignment of employees to locations inside the firm helps avoid several endogeneity concerns that may arise in a more conventional setting. If the allocation of employees to a technology center were endogenous to employee level characteristics, then the performance estimates of remote employees could be downward or upward biased, as the following examples illustrate. For example, in a more conventional setting it is conceivable that employees could be systematically assigned technology centers close to their home towns/villages.
If that is the case, then employees coming from remote areas would be assigned a remote development center and their performance estimates would be downward biased because of the distance from the larger knowledge centers and because of missing out on agglomeration economies.8 In another example, employees could be sorted to locations based on measures of ability observable to the managers of the firm but not to the econometrician. In this case, the highest ability employees might be disproportionately assigned mainstream technology centers. If this is indeed the case and if observed measures of ability are not perfectly correlated with actual ability, then any further study of what drives subsequent performance is subject to methodological bias. In this case for example, if there is a positive correlation between being from a remote district and unobserved measures of ability, and if employees from remote districts are systematically assigned mainstream technology centers, their performance estimates could be upward biased.

8 The researchers conducted several employee interviews at INDTECH to confirm this. A couple of the employees who were interviewed came from the Khordha and Sundergarh districts in the eastern Indian state of Orissa and had families living in these places. In interviews, they confirmed that, given a choice, they would have selected the Bhubaneshwar, Orissa technology center of INDTECH; because they had no choice in selecting the center, both of these employees were assigned to and continue to work in the Bangalore technology center.

The talent assignment also helps us avoid issues related to assortative matching (Becker 1973). In a more conventional setting for example, it is conceivable that employees might be assigned to locations based on considerations such as ethnicity. In this example, all employees who are ethnic Kannadigas (i.e. from Karnataka) could be assigned to the Bangalore technology center, all Tamils (i.e.
from Tamil Nadu) could be assigned to the Chennai technology center and all employees from Orissa to the technology center in Bhubaneshwar. If this is the case and if, further, there is systematic bias in grading performance towards certain ethnic or social groups within the firm in question, we might get spurious results in analyzing how the prior location of the employee relates to subsequent performance. As an example, there could be a systematic positive bias in grading performance for employees from the South Indian centers (Bangalore and Chennai centers). In that case, the performance ratings of employees from the Orissa (Bhubaneshwar) center will be downward biased. Given assortative matching based on ethnicity and given that Orissa has a large number of ‘low employment districts’, the econometrician will observe a downward bias in the performance of employees hired from low employment districts.

IV. DESCRIPTIVE STATISTICS

We collected unique data for entry level employees recruited over 2007-2009 from over 250 colleges all across India. The employees in our sample are undergraduates hired from engineering colleges with no prior full-time employment experience. Post recruitment, they are randomly assigned to one of several technological areas such as .NET, Java or mainframe, and receive four months of induction training prior to being assigned a technology center.9 Entry level undergraduates join the firm and start training between May and November and, prior to starting training, are assigned a ‘technological area’. This assignment of an undergraduate to a technological area is uncorrelated with observable characteristics of the entry level undergraduate. Employees assigned a particular technological area are then trained in batches of around 100 employees each. For the sample of employees hired in 2007, there are 18 batches with an average of 94 employees each. In addition, as described earlier, post training, the assignment of employees to a technology center is not correlated with observable characteristics of employees.

9 The .NET Framework (pronounced dot net) is a software framework developed by Microsoft that runs primarily on Microsoft Windows. It includes a large library and provides language interoperability (each language can use code written in other languages) across several programming languages. (Source: http://en.wikipedia.org/wiki/.NET_Framework)

Each of the 10 technology centers at INDTECH works on projects related to the three major technologies (.NET, Java, Mainframe) that entry level undergraduates are trained in. To avoid being biased by diverse temporal trends affecting the various technologies that the employees are trained in, we restricted our data collection exercise to employees trained in a single technological area (.NET). Focusing on employees trained in the same technological area alleviates concerns that employee performance is biased by short term demand or supply trends affecting the underlying technology they are trained in. In all, we are able to collect data for a total of 8520 undergraduates hired in 2007, 2008 and 2009. Of these, 1696 undergraduates were hired in 2007. The personnel data for the 2007 batch is much more complete and has less missing data compared to data collected for the batches hired in 2008 and 2009. Table I summarizes the personnel data for both the 2007 batch (first three columns) and for all batches (last three columns), and the notes in the table explain the level of completeness of data both for the 2007 batch and for subsequent batches. The firm hires around 10,000 undergraduates every year. Because we only collect data for employees trained in .NET, our sample covers around 17% of the entry level undergraduates hired in 2007.

The main independent variable of interest is whether or not the employee is from a remote district (from remote district). This variable is constructed as follows.
We requested detailed resumes of employees in our sample listing the name and location of their school, high school and undergraduate college. This data was made available for 93% of the 2007 batch and for 37% of employees from all batches. In the next step, we use data from the 2001 Indian census to identify average employment for 594 districts in India. A district is coded as ‘remote’ if the employment level in the district based on the census data is less than the median employment level across all Indian districts. Given this data, we code from remote district as ‘1’ if three conditions are met: (i) the employee went to school in a remote district; (ii) the employee went to high school in a remote district; and (iii) the employee went to undergraduate college in a remote district. Assuming that remote employees perform better, as we will show later, this turns out to be the most conservative way of coding remoteness. In this definition, individuals who went to a remote school but a non-remote college have the variable coded as 0, i.e. though they might have performed better in school and consequently moved to a non-remote college, they are coded as part of the control group. This arguably biases us against finding an effect for the treatment group of remote employees. Table II summarizes the migration patterns in our data.

The next independent variable of interest is whether or not the employee is assigned a mainstream center (assigned mainstream center). INDTECH has 10 technology centers in India. We code seven of these 10 locations as mainstream locations: Bangalore, Hyderabad, Chennai, Mysore, Mangalore, Trivandrum and Pune. The remaining three locations, Bhubaneshwar, Jaipur and Chandigarh, are coded as non-mainstream locations. We did this coding in consultation with the head of talent development at INDTECH and the underlying rationale is predominantly based on the geographic distance to the headquarters (Bangalore).
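The coding of the two independent variables described above can be illustrated with a minimal sketch. The district names and employment levels below are hypothetical placeholders, not census values, and the treatment of missing districts (leaving the variable uncoded) is an assumption made only for this illustration.

```python
from statistics import median

# Hypothetical district-level employment levels (illustrative, not census data).
district_employment = {
    "District A": 0.28,
    "District B": 0.35,
    "District C": 0.41,
    "District D": 0.47,
    "District E": 0.52,
}

# The seven centers the paper codes as mainstream locations.
MAINSTREAM_CENTERS = {
    "Bangalore", "Hyderabad", "Chennai", "Mysore",
    "Mangalore", "Trivandrum", "Pune",
}


def remote_districts(employment):
    """Return the districts whose employment level is below the median."""
    cutoff = median(employment.values())
    return {name for name, level in employment.items() if level < cutoff}


def from_remote_district(school, high_school, college, remote):
    """Conservative coding: 1 only if all three districts are remote.

    Assumption for this sketch: if any district is missing (None),
    the variable is left uncoded and None is returned.
    """
    districts = (school, high_school, college)
    if any(d is None for d in districts):
        return None
    return int(all(d in remote for d in districts))


def assigned_mainstream_center(center):
    """Return 1 for the seven mainstream centers, 0 otherwise."""
    return int(center in MAINSTREAM_CENTERS)


remote = remote_districts(district_employment)
print(from_remote_district("District A", "District B", "District A", remote))
print(from_remote_district("District A", "District A", "District D", remote))
print(assigned_mainstream_center("Jaipur"))
```

Requiring all three districts to be remote places an employee who moved from a remote school to a non-remote college in the control group, which is the conservative choice discussed in the text.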
This coding rationale follows the literature in economic geography (Henderson et al. 2001), which outlines how, both across countries and within countries, knowledge is concentrated in a few central locations. This variable is available for 96% of the 2007 batch and for 88% of employees from all batches.

Our first dependent variable of interest is performance. At the end of every year, an employee generally receives a performance rating only if she worked on a coding/testing project for at least 9 months in the calendar year. For the 2007 batch, we use a measure of performance at the end of 2008 (short term performance) and a measure of performance at the end of 2010 (long term performance). However, given the ‘9 month work’ rule, not every employee in the batch gets a performance rating in every year. Interviews with the head of talent development at INDTECH, with a senior manager in human resources and with several employees in the sample indicate that the performance ratings for entry level undergraduates are based on mostly objective measures, including the quality of coding and/or testing (measured using ‘mistakes’ in the code that are recorded by automated software) and the timeliness and completeness of coding/testing and documentation (measured using automated software). Employees are also tested for their communication skills, which are assessed by the manager of the employee. However, the metrics are predominantly objective and measurable for entry level undergraduates. To quote a senior human resources manager, “For the first three years, performance evaluation is mostly based on objective metrics….there is an underlying normal distribution for the cohort in assigning these ratings.” Interviews with senior human resource managers also indicate important differences in how short term performance (i.e. performance at the end of 2008 for the 2007 batch) and long term performance (i.e. performance at the end of 2010 for the 2007 batch) are measured and coded.
Short term performance (i.e. performance at the end of 2008 for the 2007 batch) is measured along two dimensions: (i) error rate in coding/testing and (ii) completeness in coding/testing and documentation. It is distributed across three possible discrete ratings, with the distribution of ratings across employees fitted using a normal distribution. Given that INDTECH uses automated software to measure the error rate of coding/testing and the completeness of coding/testing and documentation, these measures of performance are quantifiable and objective. Long term performance is measured on one additional dimension. For long term performance, the three dimensions are (i) the error rate in coding/testing, (ii) completeness in coding/testing and (iii) communication skills. The additional measure, communication skills, is assessed more subjectively by the manager of the employee. In addition, long term performance (i.e. performance at the end of 2010 for the 2007 batch) is distributed across five possible discrete ratings, with the distribution of ratings fitted using a normal distribution. For the 2007 batch, data on both short term and long term performance was made available for every employee who satisfied the ‘9 month work rule’ and received a performance rating. Our interviews also indicate that the manager of each employee enters an initial performance rating based on the objective criteria, and managers from human resources then check the rating against the underlying scores (error rate of coding, completeness of coding, etc.) to identify any errors made by the manager in entering the rating.

The second set of dependent variables relates to the verbal and logical scores each employee received on the standardized tests at the recruitment stage.
These variables help us test one of the possible underlying mechanisms for why remote employees might have different subsequent performance compared to their non-remote counterparts, and allow us to test whether or not there is positive selection of remote employees in light of the selection models discussed earlier. INDTECH administers a standardized test to measure verbal and logical ability at the recruitment stage; in certain years, incorrect responses carry negative penalties. We use these scores as an observable measure of the unobservable ‘skill’ or ‘ability’ of each individual.

We also collected data on attrition and have two other dependent variables: quit firm by 2011 and quit for higher studies. To code the second variable, we collected data from exit interviews for each employee in our sample who left the firm by 2011. We also code control variables to indicate the gender of the employee (Male) and whether or not the employee is from one of the underrepresented scheduled castes (SC) or other backward castes (OBCs) in India.10 The gender data was only available for the 2007 batch. A key control variable relates to the distance of migration of the employee from her home town/village to the technology center she is assigned to. We estimate this distance as the distance between the district headquarters of the school district of the employee and the technology center she is assigned to.11 The migration literature has long considered the costs of migration related to the distance of migration. This literature dates back to Schultz (1971), who finds that within country migration between pairs of regions is responsive to locational factors; specifically, the distance between a focal region and each major city influences the cost of travel and in turn affects the migration decision.
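The distance-of-migration control can be illustrated with a short sketch of the great-circle formula given in footnote 11 (the spherical law of cosines written in terms of colatitudes, with an Earth radius of 6371 km). The function name and the city coordinates are illustrative assumptions; the coordinates are approximate and are not taken from the paper's data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # radius used in the footnote formula


def migration_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees.

    Mirrors the spreadsheet-style formula in footnote 11:
    ACOS(COS(90-Lat1)*COS(90-Lat2)
         + SIN(90-Lat1)*SIN(90-Lat2)*COS(Long1-Long2)) * 6371
    """
    colat1 = math.radians(90.0 - lat1)
    colat2 = math.radians(90.0 - lat2)
    dlon = math.radians(lon1 - lon2)
    cos_angle = (math.cos(colat1) * math.cos(colat2)
                 + math.sin(colat1) * math.sin(colat2) * math.cos(dlon))
    # Clamp to [-1, 1] so floating-point drift cannot break acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle) * EARTH_RADIUS_KM


# Approximate, illustrative coordinates (degrees north, degrees east).
bhubaneshwar = (20.27, 85.84)
bangalore = (12.97, 77.59)
print(round(migration_distance_km(*bhubaneshwar, *bangalore)))
```

The clamp is a defensive addition for nearly identical points and is not part of the footnote formula.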
Schultz also conjectures that travel costs are not linearly related to time or distance of travel and uses the logarithm of the time in hours to travel the distance as a proxy for the cost of migration. Recent papers that use similar measures include McKenzie and Rapoport (2010) and Dahl and Sorenson (2010). Our paper is plausibly the first in which this measure is exogenously determined: the home town/village location is pre-determined, and the random assignment of employees to technology centers makes the final destination exogenous. We also control for cumulative grade point average (CGPA) at the end of training. This score controls for performance during training and is expected to be positively correlated with subsequent performance within the firm.

10 As Banerjee et al. (2009) point out, the term ‘Scheduled Castes’ comes from the Ninth Schedule of the Indian Constitution, which lists for each state in India the specific caste groups who are eligible to benefit from the affirmative action provisions outlined in the Constitution.

11 The variable distance of migration is computed as follows. We use the latitudes and longitudes of the district headquarters of the school district and of the final location within INDTECH to which the employee is assigned. Using the latitude and longitude of the pair of towns, we use the following formula to calculate the distance in kilometers:
ACOS(COS(RADIANS(90-Lat1))*COS(RADIANS(90-Lat2)) + SIN(RADIANS(90-Lat1))*SIN(RADIANS(90-Lat2))*COS(RADIANS(Long1-Long2)))*6371

V. RESULTS

V.A.
Econometric Specifications

To estimate whether or not being from a remote district for employee i affects short-term and long-term performance, we run the following specification:

(1) $\text{Performance}_i = \alpha + \delta\,\text{FromRemote}_i + L + \theta' X_i + \varepsilon_i$,

where $\text{Performance}_i$ is a measure of underlying short term (in 2008) or long term (in 2010) performance, $L$ indicates technology center fixed effects and $X_i$ indicates a vector of individual characteristics including gender and whether or not the individual is a member of a scheduled caste/scheduled tribe. Given that performance is measured in normalized bands, we implement the specification using an Ordered Logit model.12 To recap the benefit of exploiting the natural experiment, in our setting, the variable assigned mainstream is arguably uncorrelated with the variable from remote or with observable measures of ability.

12 Let $y_i^{*}$ be the unobserved dependent variable measuring performance, $x_i$ be a vector of independent variables, $\beta$ be the unknown parameter vector and $\varepsilon_i$ the error term, where $y_i^{*} = x_i'\beta + \varepsilon_i$. Instead of $y_i^{*}$, we observe $y_i = 0$ if $y_i^{*} \le \mu_1$; $y_i = 1$ if $\mu_1 < y_i^{*} \le \mu_2$; and so on up to $y_i = J$ if $y_i^{*} > \mu_J$. Consequently, $P(y_i = j) = \Lambda(\mu_{j+1} - x_i'\beta) - \Lambda(\mu_j - x_i'\beta)$, where $\Lambda(z) = 1/(1 + e^{-z})$, $\mu_0 = -\infty$ and $\mu_{J+1} = +\infty$. In other words, $P(y_i \le j) = \Lambda(\mu_{j+1} - x_i'\beta)$.

To estimate whether or not there is positive selection of employees from remote districts, we run the following specification:

(2) $\text{Score}_i = \alpha + \delta\,\text{FromRemote}_i + \theta' X_i + \varepsilon_i$,

where $\text{Score}_i$ is the verbal/logical score at the recruitment stage and $X_i$ indicates a vector of individual characteristics including whether or not the individual is a member of a scheduled caste/scheduled tribe. We implement this specification using OLS with robust standard errors.

To estimate whether or not being from a remote district for employee i affects the probability that the employee leaves the firm to join a master’s degree program, we run the following specification:

(3) $\text{QuitForHigherStudies}_i = \alpha + \delta\,\text{FromRemote}_i + L + \theta' X_i + \varepsilon_i$,

where $\text{QuitForHigherStudies}_i$ is a dummy variable indicating that the employee left the firm by 2011 to join a master’s degree program, $L$ indicates technology center fixed effects and $X_i$ indicates a vector of individual characteristics including gender and whether or not the individual is a member of a scheduled caste/scheduled tribe.
Given the dummy variable nature of the dependent variable, we implement this specification using a Logit regression with robust standard errors.

V.B. Remote Status and Performance

Figure II outlines the short term performance ratings (2008 performance ratings) for the remote and non-remote employees for the 2007 batch.13 We run distributional tests to compare the short term performance for remote and non-remote employees. We run the two-sample Wilcoxon rank-sum (Mann-Whitney) test and reject the null hypothesis that the performance data for the two groups follow the same distribution.

13 Summary statistics for short-term and long-term performance ratings for the 2007 batch are in Appendix III.

Tables III and IV report results for the Ordered Logit regression with robust standard errors described in specification (1). For these regressions we only considered the 2007 batch, given that for the 2007 batch we could analyze the relation between home town/village and both short term (end of 2008) and long term (end of 2010) performance within the firm. Table III reports the relation between the variable from remote district and short term (end of 2008) performance. Table IV reports this relationship for long term (end of 2010) performance. We would also like to highlight that though the size of the 2007 batch is 1696 employees, the sample size of the regression analysis in Tables III and IV is smaller than that, given the ‘9 month rule’ described earlier in Section IV. Table III indicates that there is a positive and statistically significant relation between being from a remote district and short term performance. This result is robust to controlling for whether or not the employee was assigned a mainstream technology center, technology center fixed effects, test scores at the end of training, gender, whether or not the employee is from a scheduled caste and distance of migration from home.
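To make the link between Ordered Logit estimates and predicted probabilities of each rating band concrete, the following is a minimal sketch of how such probabilities follow from a linear index and a set of cutoffs. The cutoffs and the coefficient on remote status are hypothetical values chosen for illustration; they are not the estimates reported in Tables III and IV.

```python
import math


def logistic_cdf(z):
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(index, cutoffs):
    """Probability of each ordered rating band.

    index   : the linear index x'beta for one employee
    cutoffs : ascending thresholds; len(cutoffs) + 1 bands are returned
    P(band j) is the difference of logistic CDFs at consecutive cutoffs.
    """
    if list(cutoffs) != sorted(cutoffs):
        raise ValueError("cutoffs must be in ascending order")
    cumulative = [logistic_cdf(mu - index) for mu in cutoffs] + [1.0]
    probs = []
    previous = 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs


# Hypothetical values: two cutoffs give three rating bands.
cutoffs = [-1.0, 1.0]
remote_coefficient = 0.5  # illustrative only, not an estimate from the paper

p_non_remote = ordered_logit_probs(0.0, cutoffs)
p_remote = ordered_logit_probs(remote_coefficient, cutoffs)
print("P(highest band), non-remote:", p_non_remote[-1])
print("P(highest band), remote:", p_remote[-1])
```

A positive coefficient shifts probability mass toward the higher bands, which is how a coefficient on remote status translates into a difference in the predicted probability of the highest rating.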
Among the control variables, as expected, being assigned a mainstream technology center and CGPA at the end of training are highly correlated with short term performance. Columns IV, VIII and IX also indicate a positive and statistically significant relation between being a member of a scheduled/other backward caste and short term performance. Marginal effects are reported in the top graphic of Figure IV and indicate that the predicted probability that an employee from a remote district will achieve the highest performance rating (for short term performance) is 39% in the fully specified model. The corresponding predicted probability for an employee from a non-remote district is 28%.

Table IV reports the relation between long term (end of 2010) performance and being from a remote district. Columns I-VII suggest that the relation between being from a remote district and long term performance is not statistically significant. However, once we control for the log distance of migration and introduce an interaction effect between being from a remote district and log distance of migration, we find that being from a remote district has a positive and statistically significant relation with long term performance. Column IX indicates that the main effect of being from a remote district on long term performance is positive and statistically significant and the interaction effect is negative. Table IV also indicates that there is no statistically significant relation between being from a scheduled or other backward caste and long term performance. Here too, as expected, CGPA at the end of training is highly correlated with long term performance. Marginal effects are reported in the bottom graphic of Figure IV and indicate that the predicted probability that an employee from a remote district will achieve the highest performance rating (for long term performance) is 17% in the fully specified model.
The corresponding predicted probability for an employee from a non-remote district is 5%. Here, we would like to recap that long term performance (i.e. 2010 performance) has five possible ratings, compared to short term (i.e. 2008) performance, which has three possible ratings. Given this, we also computed the predicted probability of achieving the highest or second best performance rating for long term performance. This combined predicted probability was 39% for remote employees; the corresponding combined predicted probability was 14% for non-remote employees. In summary, the difference in predicted probabilities between remote and non-remote employees was larger for long term performance than for short term performance.

We also analyze the effect of distance of migration on the predicted probability of achieving the highest performance rating for remote employees; results are reported in Figure V. The predicted probability of achieving the highest performance rating for remote employees is 23% when the migration distance is less than 2 miles, 10.4% when the migration distance is 250 miles, and 7.5% when the migration distance is 1850 miles.14

V.C. Evidence of Selection: Remote Status and Recruitment Scores

As one possible explanation of our result, we test for selection. We build on the selection models presented earlier and test whether or not employees from remote districts have higher scores in tests of verbal and logical ability at the time of recruitment. Figure III outlines the plot of verbal and logical scores for the remote and non-remote employees in our sample across all years. We also run distributional tests to compare the verbal and logical scores for remote and non-remote employees.
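The distributional comparisons in this paper rely on the two-sample Wilcoxon rank-sum (Mann-Whitney) test. The following is a minimal sketch of that test using only the standard library, with average ranks for ties and a normal approximation for the p-value. The function names are illustrative, no continuity correction is applied, and statistical packages may differ in such details.

```python
import math
from collections import Counter


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (U for sample x, z score, two-sided p-value).

    Uses the normal approximation with a tie correction in the variance.
    """
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = midranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all observations are tied")
    z = (u1 - n1 * n2 / 2.0) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, z, p_value


# Hypothetical scores for two small groups (illustrative only).
remote_scores = [14, 17, 15, 18, 16]
non_remote_scores = [12, 15, 13, 14, 11]
print(rank_sum_test(remote_scores, non_remote_scores))
```

The tie correction matters for discrete outcomes such as rating bands, where many observations share the same value.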
We run the two-sample Wilcoxon rank-sum (Mann-Whitney) test and reject the null hypothesis that the test score data for the two groups follow the same distribution. Tables V and VI present our regression results, which document a positive and statistically significant relation between being from a remote district and higher verbal and logical scores on the standardized tests at the time of recruitment. Table V relates remote status to logical scores and Table VI does the same for verbal scores. Here we implement specification (2) and use OLS with robust standard errors. In the base case, we run the regressions for the entire cohort (employees hired in 2007, 2008 and 2009). Though the size of the cohort is 8520 employees, the sample size for Tables V and VI is constrained by the availability of data to code the from remote district variable. Among the control variables, as expected, the recruitment test scores are highly correlated with CGPA at the end of training. The recruitment scores are also highly correlated with being a member of a scheduled caste/OBC. These results provide empirical support for the view that the firm is selecting high ability individuals from remote districts.

14 India measures 3,214 km (1,997 miles) from north to south and 2,933 km (1,822 miles) from east to west. (Source: http://en.wikipedia.org/wiki/Geography_of_India)

V.D. Remote Status and Attrition

Next we present evidence of a positive and statistically significant relation between being from a remote district and quitting the firm to join a master’s degree program. Here we implement specification (3) and employ a Logit regression with robust standard errors. The sample size here is determined by the total number of employees in our sample who left the firm by 2011 (N=1823 as indicated in Table I) and the availability of data to code the from remote district variable. Results are reported in Table VII.
The fully specified model indicates that, conditional on quitting the firm by 2011, the predicted probability that an employee from a remote district will quit to join a master’s degree program is 45%. The corresponding predicted probability for an employee from a non-remote district is 38%.

V.E. Robustness Checks

Our most important robustness check relates to verifying the validity of the natural experiment, i.e. validating that the technology center allocation decision is not correlated with observable measures of performance and employee level characteristics. Results are reported in Table VIII and indicate that the decision to allocate an employee to a mainstream technology center post induction training is not correlated with observable employee level characteristics (such as being from a remote district) or observable measures of ability (such as CGPA at the end of training or standardized test scores at the recruitment stage). This validates the talent allocation policy underlying the natural experiment. In additional robustness checks, for the 2007 class, we dropped the individuals who left the firm between 2007 and 2011 and re-ran the specifications; the results remained robust. We also relaxed the definition of the from remote district variable. In the base case, we had taken the most restrictive definition of the variable, only coding the variable as 1 if there was no missing data for the school, high school or college district and if all three of these districts were low employment districts. In robustness checks, we relaxed the requirement of no missing data and our results remain robust. We get larger coefficient estimates for the ‘from remote district’ variable in Tables III-VIII and the statistical significance remains the same or improves. We also ran regressions to track individuals over time.
Results are reported in the appendix and indicate that employees from remote districts who achieved the highest performance rating in the short term are more likely to achieve the highest performance rating over the long term. However, we found no evidence that remote employees disproportionately improved performance between 2008 and 2010, compared to non-remote employees.

V.F. Survey of Urban and Rural Colleges

In addition to conducting the analysis using personnel data from INDTECH, we conducted a survey of randomly selected rural and urban colleges in India to study why INDTECH might have a higher probability of finding a high ability individual in a remote district. In other words, the purpose of the survey was to validate whether or not the demand for skilled engineers and the recruitment opportunities for skilled engineers exhibited geographic separation in line with the underlying assumptions of the refugee sorting model of Borjas (1994) and the geographic skill segregation model of Young (2013). To conduct this analysis, we randomly selected 10 urban and 10 rural colleges from the list of colleges from which INDTECH hired its employees. We then contacted these colleges via phone and/or email. Eleven of the 20 colleges agreed to participate in a telephonic survey that lasted around 30 minutes each. In these interviews, we asked about the total graduating class size at these colleges and starting salaries for undergraduate engineers in 2011 and 2012, as well as which technology firms (both Indian and foreign multinational) hired from these colleges and how many students were hired by each firm. The interviews were conducted with either the head of the college or the head of the group responsible for organizing recruitment at the college. Results are reported in Table IX and indicate that the mean salaries for 2011 and 2012 are significantly higher for the urban colleges compared to the rural colleges.
We also validated that this difference is statistically significant in a t-test comparison of means. These findings are in line with the assumptions of geographically segregated returns to skills in Young (2013) and Borjas (1994) and indicate that firms such as INDTECH can arbitrage such differences in returns to skills across geographic regions. In addition, the survey reveals that not every technology firm hires from low employment districts. Multinational technology firms predominantly hire from the urban colleges, and INDTECH has the distinctive policy of hiring from both urban and rural colleges.

VI. CONCLUSION

Our empirical study attempts to bridge two gaps in the migration literature in economics: the lack of understanding of the role of the firm in facilitating migration and the relative lack of focus in the literature on within country migration. We exploit a natural experiment within a large Indian technology firm that allows us to isolate the relation between the prior location (home town/village) of an employee and subsequent performance within the firm and attrition. The allocation of employees to technology centers is done by a computer application that does not consider observable skills and other personal characteristics of the employee while allocating employees to one of several technology centers. The firm follows this talent allocation policy so that the end customers of the firm, which are firms mostly based in the U.S., are indifferent among the particular INDTECH centers that execute their projects. As described earlier, we do not use the random assignment of employees to technology centers as ‘treatment’; instead we use the protocol to control for endogeneity concerns. This policy helps us avoid endogeneity issues that arise in more usual settings, where employees might have been assigned technology centers close to their home town/village.
If assignment to a technology center were based on proximity to home, then remote employees would have been systematically far away from the larger technology centers and their performance might have been biased downward because they missed out on agglomeration effects. In more usual settings, ethnic or social ties could also determine the assignment of an employee to a technology center, and this too could abet or retard performance in a way that an econometrician may not be able to observe.

Our results indicate that employees hired from low employment districts of India outperform their non-remote counterparts in the short term. In the longer term, remote employees continue to outperform their non-remote counterparts once we control for the distance of migration. An important point here is that although the distance of migration measure has been used in the migration literature since Schultz (1971) and Schwartz (1973), our study is plausibly the first in which the distance of migration is exogenously determined, given the random assignment of employees to technology centers. To provide a possible interpretation of our results, we test for selection and build on the selection models in the migration literature. Borjas (1994) and Young (2013) are two models that provide a plausible theoretical explanation for our results. In light of the selection models, we provide evidence that employees hired from low employment districts have higher verbal and logical test scores at the recruitment stage. Our interviews further corroborate this. To quote the head of talent development at INDTECH, “If you are the best student in Bangalore, you will probably never join INDTECH. Instead, you will join some MBA course to go to America.
However, if you are the best student in Sundergarh Orissa, a remote district with high unemployment levels, INDTECH is your dream first job.” To further establish the geographic separation of employment opportunities for undergraduates in India and why INDTECH might be able to hire high ability individuals from remote colleges, we conducted a survey at 11 randomly selected urban and rural engineering colleges. Starting salaries at the urban engineering colleges are on average 2.3 times those at the rural colleges, and this difference is statistically significant. Multinational technology firms like IBM almost exclusively visit urban colleges in our sample. The large difference in wages for skilled workers graduating from urban and rural colleges is arguably an illustration of the underlying conditions that lead to ‘refugee sorting’ in the Borjas (1994) model of self-selection of migrants. Our results also indicate that remote employees are more likely to join a master’s degree program when they leave the firm.

Our study has several limitations. Our natural experiment and data come from a single firm. We follow the tradition of insider econometrics in personnel economics; future analysis needs to test our central findings in other settings and will need to corroborate that (i) employees from low employment districts outperform their non-remote counterparts and (ii) this is attributable to selection. Also, though we interpret our results using plausible selection models in the migration literature and the regressions of recruitment test scores, we have no way to tease out alternative and complementary mechanisms for why employees from remote districts perform better. It is plausible that remote employees exert more effort (in addition to being of higher ability), and we are unable to test this.
Moreover, our empirical analysis only provides estimates of the performance premium of hiring remote employees and does not provide estimates of the incremental costs related to hiring from remote districts. Our interviews with managers at INDTECH indicate that there are incremental costs of hiring from remote districts compared to hiring from cities like Bangalore, and we do not incorporate these additional costs in our theoretical or empirical analysis. These costs relate to travel and search costs in remote locations, and we do not have an estimate of how long it took INDTECH to assemble the infrastructure to hire from remote districts or of the sunk costs of such investments.15 Given that smaller firms, including technology startups in Beijing or Bangalore, may not be able to incur these additional costs and might focus their hiring around the large cities, from a social planner’s point of view it might make sense for government actors to facilitate employment fairs in remote districts where a large number of smaller firms might be able to hire highly skilled workers. We also do not have a way to compare the ‘hiring funnel ratio’ for rural and urban colleges; this is a term used by hiring managers at INDTECH for the number of individuals to whom a job offer is made as a fraction of the individuals tested. It is plausible that the ‘average’ remote student who is tested and not hired is of lower ability than the comparable non-remote student who is tested and not hired, and we do not have the data to test this. Another limitation of our study is that, in equilibrium, the gains from hiring remote employees are likely to disappear as firms set up centers in the remote districts over time and/or as the barriers to within-country migration are gradually overcome.
However, here again we borrow from the long-standing wisdom in the migration literature and posit that “neither migration, nor any equilibrating force is strong enough to eliminate imbalances instantaneously” (Yap, 1976, page 122).

Our study, though empirical in nature, informs the prior theory literature on migration and labor markets. In addition to being related to the selection models in the migration literature, including Borjas (1994) and Young (2013), our survey results provide evidence of a geographic separation of skills and employment opportunities. This finding is related to the segmented labor markets literature in economics, notably the work of Reich, Gordon and Edwards (1973) and Dickens and Lang (1988).16

15 Our back-of-envelope analysis, however, does suggest a net gain from hiring remote employees. This is calculated as follows. In the first step we compute the gains from hiring a remote employee. We base this analysis on 2008 performance data for the 2007 batch. We use the predicted probabilities of achieving the highest performance rating for remote and non-remote employees. Figure IV indicates that remote employees are 11% more likely to achieve the highest performance rating compared to non-remote employees. We assume that entry level salaries are ~$8000 per year (at 2013 USD to Rupee exchange rates). We also assume that, compared to employees who achieve the highest performance rating, other employees need 35% more man-days to correct coding/testing/documentation errors (this assumption was based on rough calculations with INDTECH human resource managers on the error rate and the man-days lost due to coding/testing/documentation errors for top-tier employees compared to other employees). We then calculate the gains from hiring a remote employee to be around $308 per year (consistent with 0.11 × 0.35 × $8000). Next we compute the incremental costs of recruiting remote employees. This analysis is based on discussions with recruiting managers at INDTECH and leads to an estimate of $21 for the incremental cost of hiring a remote employee. This is based on the incremental travel costs for INDTECH executives involved in hiring from remote locations, the additional search costs of visiting rural colleges, and the drop in the ‘funnel ratio’ (number of students offered a job as a percentage of the number of students who are invited for the recruitment tests) for rural colleges. Given these numbers, the net gain from hiring a remote employee is estimated to be around $287 per employee per year ($308 less $21). This works out to around 4% of the entry level salary of an undergraduate employee ($287/$8000 ≈ 3.6%).

Our findings have several policy implications for India, which aspires to enjoy a demographic dividend in the coming decades. As Chandrasekhar et al. (2006) point out, in 2020 the average Indian will be only 29 years old, compared with an average age of 37 years in China and the US, 45 in Western Europe and 48 in Japan. Within the next 30 years, 64 per cent of India’s population will fall within the working age range (15-64). Aiyar and Mody (2011) estimate that these shifts in India’s age structure will contribute significantly to India’s economic growth, adding upwards of 2 percentage points per year to India’s per capita GDP in the next two decades. However, aligning economic outcomes with demographic trends is no easy task, especially in a country with already existing regional inequalities. Deaton and Dreze (2002) delineate how regional disparities increased in the 1990s, with southern and western regions showing much higher rates of growth than the northern and eastern regions. While urban populations have seen an increase in average per capita expenditures (APCE) as high as 20 to 30 per cent, poorer states have seen minimal growth in APCE in recent years.
They also document minimal reduction of poverty in rural areas.17 In addition, Chandrasekhar et al. (2006) point out that absorption of Indian youth into the labor force is not as high as one would expect. In our own analysis, using data on where the young Indian population lives (collected from the Indian census of 2001) and data on where the technology firms are located (based on membership data of the National Association of Software and Services Companies, or NASSCOM), we outline this geographic mismatch in the context of the Indian technology sector. Table X outlines this analysis. It is clear that in the absence of within-country migration that moves highly skilled young people to the urban centers which have the employment opportunities, the perceived demographic dividend could become a ‘demographic albatross’.

16 The theory of segmented labor markets came into prominence in the 1970s. As Reich, Gordon and Edwards (1973) point out, “American workers seemed to operate in different labor markets, with different working conditions, different promotional opportunities, different wages and different market institutions”. The authors also point out that these segmented labor markets were created through differences in race, sex, educational credentials, etc. The theory of segmented labor markets was implicitly used by authors such as Summers (1986) in analyzing differences in unemployment rates among workers with different age and gender characteristics. The theory also made a prominent comeback in the late 1980s in work by Dickens and Lang (1988). The authors summarize the key propositions of this theory. First, the labor market has two sectors: a high wage primary sector with stable employment and substantial returns to human capital variables such as education, and a low wage secondary sector with the opposite characteristics. Moreover, primary jobs are rationed: not every worker who desires a job in the primary sector can obtain one.

17 Deaton and Dreze used adjusted headcount ratios (HCR), poverty indexes, and per capita expenditures to estimate the gaps between rural and urban poverty lines in 1987-88, 1993-94, and 1999-00 across states and regions.

Our results also have policy implications for other large developing countries such as China, which have witnessed significant within-country migration and have arguably seen an increase in regional disparity. Yang (2002) documents how regional inequality in China has risen in the past two decades. The author attributes this inequality primarily to a large rural–urban income gap and growing inland–coastal disparity and documents that the ratio of urban to rural income and consumption has hovered between 2 and 3.5 since the inception of reform. Yang also documents that per capita production and consumption diverged across China’s regions: the initially rich coastal provinces were better off and the interior provinces became relatively disadvantaged during the reform period (Fleisher & Chen, 1997; Kanbur & Zhang, 1999). Overall, the indices of regional inequality first showed moderate declines, but then rose. Towards the end of the 1990s, they gradually climbed to the peak historical levels of the Great Leap Famine period (Kanbur & Zhang, 2002).

Our work also has relevance for developed countries like the United States. In the context of the U.S. labor market, economists have also found that geographic location has a significant effect on wages. To quote Moretti (2010, page 1238), “the hourly wage of workers located in metropolitan areas at the top of the wage distribution is more than double the wage of observationally similar workers located in metropolitan areas at the bottom of the distribution”.
In the same light, in his book ‘The New Geography of Jobs’, Moretti states that “your salary depends more on where you live than on your resume” (Moretti, 2012; page 88). In addition, building on Katz, Kling and Liebman (2001), the geographic disparities in wages and employment opportunities in the U.S. are accentuated by residential segregation by race and by income in the large metropolitan areas. In this context, the authors analyzed the “moving to opportunity” experiment in Boston, in which certain households in high poverty public housing projects received ‘Section 8 housing vouchers’ that could be used to help pay rent to private landlords. Children in households offered vouchers valid only in low poverty neighborhoods had reduced likelihoods of injury and victimization by crime.

In conclusion, both in the U.S. and in developing countries, firms can play an important role in facilitating the within-country migration of talent from low employment areas to areas with more employment opportunities. Our results indicate that a focal firm can benefit from such hiring practices if it is able to effectively screen and select high ability individuals in remote districts and if such individuals perform better compared to their non-remote counterparts.

TABLE I
EMPLOYEE CHARACTERISTICS FOR 2007 BATCH AND FOR ALL BATCHES

| Variable | 2007 Batch: N | Mean | Std. Dev. | All Years (2007-2009): N | Mean | Std. Dev. |
|---|---|---|---|---|---|---|
| From remote district | 1578 | 0.38 | 0.49 | 3174 | 0.33 | 0.47 |
| Assigned mainstream center | 1621 | 0.89 | 0.31 | 7497 | 0.89 | 0.31 |
| CGPA (end of training) | 1696 | 4.48 | 0.43 | 8517 | 4.50 | 0.51 |
| Logical score (recruitment test) | 1635 | 4.72 | 3.67 | 8401 | 0.68 | 3.71 |
| Verbal score (recruitment test) | 1635 | 4.29 | 3.98 | 8401 | 0.57 | 4.34 |
| Member of scheduled or other backward caste | 1696 | 0.51 | 0.50 | 8520 | 0.10 | 0.30 |
| Male | 1696 | 0.65 | 0.48 | 1696 | 0.65 | 0.48 |
| Log distance of migration | 1189 | 6.30 | 1.58 | 2487 | 6.17 | 1.67 |
| Quit firm by 2011 | 1696 | 0.42 | 0.49 | 8520 | 0.21 | 0.41 |
| Quit for higher studies | 713 | 0.39 | 0.49 | 1823 | 0.24 | 0.42 |

Notes: This table lists employee characteristics for the 2007 batch (columns 1-3) and for all the batches that joined in 2007, 2008 and 2009 (columns 4-6). The variable ‘from remote district’ is coded as 1 if the individual went to school, high school and college in a low employment district. This variable was coded based on resumes collected for the employees; the data to code this variable is available for 93% of the 2007 batch and for 37% of employees for all batches. The variable ‘assigned mainstream center’ is coded as 1 if the individual is randomly assigned to one of the mainstream technology centers of the firm; data to code this variable is available for 96% of the 2007 batch and for 88% of employees from all batches. CGPA is the cumulative grade point average at the end of the training and is available for the entire 2007 batch and 99.9% of employees from all batches. The logical and verbal scores are from the standardized multiple-choice recruitment tests; the standardized test includes negative penalties for wrong answers. The test was changed between 2007 and 2008, and this is reflected in the mean scores for the 2007 and later batches; this information is available for 96% of the 2007 batch and 99% of employees from all batches. The variable ‘member of scheduled or other backward caste’ is coded as 1 if the employee is a member of one of the scheduled or other backward castes (OBCs). This variable is available for the entire batch; the mean for the 2007 batch is significantly higher than the mean for other batches.
There were changes in government reservation policy towards OBCs in 2008 (on 10th April 2008 the Supreme Court of India upheld the government's initiative of 27% OBC quotas in government-funded institutions); however, it is unclear whether that affected the change in the fraction of SC/OBC employees between 2007 and 2008. The variable ‘log distance of migration’ is the log of the distance from the district headquarters of the school district of the employee to the final technology center the employee is assigned to. This distance was calculated from the latitude and longitude of the two locations (district headquarters of school district and final location) using an algorithm that calculates the distance between two locations by the arc radians method. Data to code this variable was available for 70% of the 2007 batch and for 29% of employees from all years. The variable ‘quit firm by 2011’ was available for all the batches across the years; the variable ‘quit for higher studies’ was coded based on data from exit interviews.

TABLE II
MIGRATION PATTERNS

| School and high school | College | Share of 2007 batch |
|---|---|---|
| Remote (low employment district) | Remote | 38% |
| Remote (low employment district) | Non-remote | 12% |
| Non-remote (high employment district) | Remote | 20% |
| Non-remote (high employment district) | Non-remote | 21% |

Notes: This table outlines the various migration patterns that emerge for employees in the sample collected. Similar to Young (2013), we observe bi-directional migration flows in our data (i.e. remote to non-remote and non-remote to remote). We also observe students who spend their entire educational career in remote districts and non-remote districts.
1. Path 1 (top row) refers to the set of employees who spend their entire educational career in a low employment district prior to being hired by INDTECH. For the 2007 batch, this proportion is 38% and for the entire batch, the proportion is 33%. Only these employees are coded as being from a remote district.
2. Path 2 (bottom row) refers to employees who spend their entire educational career in a high employment district prior to being hired by INDTECH. Their proportion is 21% in the 2007 batch and 16% for all batches.
3. The proportion of students who went to a (i) non-remote school, (ii) non-remote high school and (iii) remote college is 20% in the 2007 batch and 24% for all batches.
4. The proportion of students who went to a (i) remote school, (ii) remote high school and (iii) non-remote college is 12% in the 2007 batch and 9% for all batches.
5. The proportion of students who went to a (i) non-remote school, (ii) remote high school and (iii) remote college is 3% in the 2007 batch and 1% for all batches.
6. The proportion of students who went to a (i) remote school, (ii) non-remote high school and (iii) remote college is 3% in the 2007 batch and 13% for all batches.
7. The proportion of students who went to a (i) remote school, (ii) non-remote high school and (iii) non-remote college is 2% in the 2007 batch and 3% for all batches.
8. The proportion of students who went to a (i) non-remote school, (ii) remote high school and (iii) non-remote college is 1% in the 2007 batch and 1% for all batches.
9. As Table I indicates, the data to code this variable is available for 93% of the employees in the 2007 batch and for 37% of employees for all batches.
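As a rough illustration of how the two constructed variables described in the notes to Tables I and II could be coded, the sketch below marks an employee as remote only when school, high school and college were all in low employment districts, and computes a log distance of migration from latitude and longitude. The function names, the use of the spherical law of cosines, the Earth radius in miles and the example coordinates are assumptions made for illustration; the notes say only that an 'arc radians method' was used and do not state the unit of distance.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed unit; the source does not state it


def from_remote_district(school_remote, high_school_remote, college_remote):
    """Return 1 only if all three education stages were in a remote district."""
    return int(school_remote and high_school_remote and college_remote)


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance via the spherical law of cosines (inputs in degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_MILES * math.acos(cos_angle)


def log_distance_of_migration(home, center):
    """Natural log of the distance between two (lat, lon) pairs."""
    return math.log(great_circle_miles(*home, *center))


# Shares (%) of the eight education paths in the 2007 batch (notes 1-8 of Table II).
SHARES_2007 = [38, 21, 20, 12, 3, 3, 2, 1]

if __name__ == "__main__":
    # The eight paths should cover the whole coded sample.
    print(sum(SHARES_2007))
    # Remote school and high school but non-remote college: not coded as remote.
    print(from_remote_district(True, True, False))
    # Approximate, illustrative coordinates (not taken from the paper).
    bhubaneswar = (20.30, 85.82)
    bangalore = (12.97, 77.59)
    print(round(log_distance_of_migration(bhubaneswar, bangalore), 2))
```

The strict all-three-stages rule mirrors note 1 of Table II, under which only Path 1 employees count as remote.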
TABLE III
ORDERED LOGIT OF REMOTE STATUS ON SHORT TERM PERFORMANCE

Dependent Variable = Performance at the end of 2008

| | I | II | III | IV | V | VI | VII | VIII | IX |
|---|---|---|---|---|---|---|---|---|---|
| From remote district | 0.29* (0.18) | 0.41** (0.18) | 0.30* (0.18) | 0.32* (0.18) | 0.31* (0.18) | 0.39* (0.20) | 0.33* (0.18) | 0.46** (0.19) | 0.52*** (0.18) |
| CGPA (end of training) | - | 2.31*** (0.30) | - | - | - | - | - | 2.39*** (0.31) | 2.39*** (0.31) |
| Assigned mainstream center | - | - | 0.53* (0.28) | - | - | - | - | 0.68** (0.30) | - |
| Member of scheduled or other backward caste | - | - | - | 0.34** (0.16) | - | - | - | 0.48** (0.18) | 0.49*** (0.17) |
| Male | - | - | - | - | 0.30* (0.17) | - | - | 0.22 (0.18) | 0.32* (0.18) |
| Log distance of migration | - | - | - | - | - | 0.00 (0.06) | - | - | - |
| Fixed Effects for center | - | - | - | - | - | - | Yes | - | Yes |
| N | 627 | 627 | 604 | 627 | 627 | 486 | 627 | 604 | 627 |

Notes: This table reports the relation between the variable ‘from remote district’ and short term (end of 2008) performance and implements specification (1) using an Ordered Logit Regression with robust standard errors (reported in parentheses). We only considered the 2007 batch, given that for the 2007 batch we could analyze the relation between home town/village and both short term (end of 2008) and long term (end of 2010) performance within the firm (reported in Table IV). Though the size of the 2007 batch is 1696 employees, the sample size of the regression analysis in Tables III and IV is lower than that, given the ‘9 month rule’ described earlier in Section IV. This table indicates that there is a positive and statistically significant relation between being from a remote district and short term performance. There is also a positive and statistically significant relation between being from a scheduled/other backward caste and short term performance. *Significant at the 10% level. **Significant at the 5% level. ***Significant at the 1% level.

TABLE IV
ORDERED LOGIT OF REMOTE STATUS ON LONG TERM PERFORMANCE

Dependent Variable = Performance at the end of 2010

| | I | II | III | IV | V | VI | VII | VIII | IX |
|---|---|---|---|---|---|---|---|---|---|
| From remote district | 0.10 (0.11) | 0.14 (0.12) | 0.11 (0.12) | 0.10 (0.12) | 0.10 (0.12) | 0.09 (0.12) | 0.09 (0.13) | 1.19* (0.62) | 1.37** (0.62) |
| CGPA (end of training) | - | 1.23*** (0.19) | - | - | - | - | - | - | 1.33*** (0.22) |
| Assigned mainstream center | - | - | -0.02 (0.19) | - | - | - | - | - | -0.04 (0.23) |
| Fixed Effects for center | - | - | - | Yes | - | - | - | - | - |
| Member of scheduled or other backward caste | - | - | - | - | 0.01 (0.11) | - | - | - | 0.06 (0.14) |
| Male | - | - | - | - | - | 0.52*** (0.12) | - | - | 0.45*** (0.15) |
| Log distance of migration | - | - | - | - | - | - | -0.09* (0.05) | 0.02 (0.07) | 0.05 (0.07) |
| From remote * log distance of migration | - | - | - | - | - | - | - | -0.17* (0.10) | -0.19* (0.10) |
| N | 1165 | 1165 | 1130 | 1163 | 1165 | 1165 | 853 | 853 | 828 |

Notes: This table reports the relation between the variable ‘from remote district’ and long term (end of 2010) performance and implements specification (1) using an Ordered Logit Regression with robust standard errors (reported in parentheses). As in Table III, we only considered the 2007 batch. The sample size is less than the size of the 2007 batch (1696 employees) given the ‘9 month rule’ (Section IV). The table indicates a positive and statistically significant relation between being from a remote district and long term performance once we control for the distance of migration and introduce an interaction term between being from a remote district and distance of migration (Columns VIII and IX). *Significant at the 10% level. **Significant at the 5% level. ***Significant at the 1% level.
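To see what the interaction term in Columns VIII and IX of Table IV implies, the sketch below computes the effect of remote status on the ordered-logit index as a function of migration distance, and the distance at which that effect crosses zero. It assumes that 1.19 and -0.17 (Column VIII) and 1.37 and -0.19 (Column IX) are the ‘from remote district’ and interaction coefficients, that logs are natural logs, and that distance is measured in the same unit as the regressions (Figure V uses miles). The function names are illustrative, and the calculation uses point estimates only, ignoring standard errors.

```python
import math


def remote_effect(beta_remote, beta_interaction, distance):
    """Effect of remote status on the latent index at a given migration distance."""
    return beta_remote + beta_interaction * math.log(distance)


def break_even_distance(beta_remote, beta_interaction):
    """Distance at which the remote premium on the latent index is zero."""
    return math.exp(-beta_remote / beta_interaction)


# (remote coefficient, remote x log-distance coefficient) as read from Table IV.
COLUMNS = {"VIII": (1.19, -0.17), "IX": (1.37, -0.19)}

if __name__ == "__main__":
    for name, (b_remote, b_inter) in COLUMNS.items():
        cutoff = break_even_distance(b_remote, b_inter)
        at_250 = remote_effect(b_remote, b_inter, 250)
        print(f"Column {name}: effect at 250 = {at_250:.2f}, "
              f"odds ratio = {math.exp(at_250):.2f}, break-even = {cutoff:.0f}")
```

Under these assumptions the point estimate of the remote premium is positive at short distances and falls to zero at a distance of roughly 1,100 in Column VIII, since 1.19/0.17 = 7.0 and e^7 is about 1,097.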
TABLE V
OLS OF REMOTE STATUS ON LOGICAL SCORE DURING RECRUITMENT TEST

Dependent variable = Logical score at recruitment test

| | I | II | III | IV | V |
|---|---|---|---|---|---|
| From remote district | 0.85*** (0.17) | 0.82*** (0.16) | 0.69*** (0.15) | 0.46*** (0.14) | 0.42*** (0.14) |
| CGPA | - | 1.46*** (0.14) | - | - | 0.62*** (0.12) |
| Member of scheduled or other backward caste | - | - | 4.02*** (0.16) | - | 2.44*** (0.17) |
| Verbal score | - | - | - | 0.47*** (0.01) | 0.38*** (0.01) |
| N | 3118 | 3118 | 3118 | 3118 | 3118 |

Notes: This table presents our regression results establishing a positive and statistically significant relation between being from a remote district and logical scores from the standardized tests at the time of recruitment. Here we implement specification (2) and use OLS with robust standard errors (reported in parentheses). In the base case, we run the regressions for the entire cohort (employees hired in 2007, 2008 and 2009). Though the size of the cohort is 8520 employees, the sample size for the regressions in this table is constrained by the availability of data to code the ‘from remote district’ variable. *Significant at the 10% level. **Significant at the 5% level. ***Significant at the 1% level.

TABLE VI
OLS OF REMOTE STATUS ON VERBAL SCORE DURING RECRUITMENT TEST

Dependent variable = Verbal score at recruitment test

| | I | II | III | IV | V |
|---|---|---|---|---|---|
| From remote district | 0.82*** (0.19) | 0.79*** (0.18) | 0.67*** (0.17) | 0.33** (0.16) | 0.31** (0.15) |
| CGPA | - | 1.53*** (0.16) | - | - | 0.62*** (0.14) |
| Member of scheduled or other backward caste | - | - | 3.87*** (0.18) | - | 1.78*** (0.18) |
| Logical score | - | - | - | 0.58*** (0.02) | 0.50*** (0.02) |
| N | 3118 | 3118 | 3118 | 3118 | 3118 |

Notes: This table presents our regression results establishing a positive and statistically significant relation between being from a remote district and verbal scores from the standardized tests at the time of recruitment. Here we implement specification (2) and use OLS with robust standard errors (reported in parentheses).
In the base case, we run the regressions for the entire cohort (employees hired in 2007, 2008 and 2009). Though the size of the cohort is 8520 employees, the sample size for the regressions in this table is constrained by the availability of data to code the ‘from remote district’ variable. *Significant at the 10% level. **Significant at the 5% level. ***Significant at the 1% level.

TABLE VII
LOGIT OF REMOTE STATUS ON ATTRITION TO JOIN MASTERS DEGREE PROGRAM

Dependent variable: Quit firm to join master’s degree program

| | I | II | III | IV | V |
|---|---|---|---|---|---|
| From remote district | 0.35** (0.14) | 0.25* (0.15) | 0.31** (0.15) | 0.33** (0.14) | 0.31** (0.16) |
| Assigned mainstream center | - | 0.45* (0.27) | - | - | 0.49* (0.28) |
| CGPA (end of training) | - | - | 1.94*** (0.14) | - | 1.34*** (0.19) |
| Member of scheduled or other backward caste | - | - | - | 0.46*** (0.14) | -0.34** (0.16) |
| N | 1059 | 767 | 1059 | 1059 | 767 |

Notes: In this table, we present evidence of a positive and statistically significant relation between being from a remote district and quitting the firm to join a master’s degree program. Here we implement specification (3) and employ a Logit regression with robust standard errors. The sample size here is determined by the total number of employees in our sample who left the firm by 2011 (N=1823 as indicated in Table I) and the availability of data to code the ‘from remote district’ variable. *Significant at the 10% level. **Significant at the 5% level. ***Significant at the 1% level.
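The logit coefficients in Table VII are in log-odds units. As a reading aid (an illustration, not a result reported in the paper), exponentiating the Column I coefficient on ‘from remote district’ gives the implied odds ratio:

```latex
\[
\text{odds ratio} = e^{\hat{\beta}} = e^{0.35} \approx 1.42
\]
```

That is, among employees who left the firm, the point estimate implies that the odds of leaving to join a master’s degree program are about 42% higher for remote employees. The corresponding figure for Column V is $e^{0.31} \approx 1.36$.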
TABLE VIII
LOGIT OF REMOTE STATUS AND TEST SCORES ON BEING ASSIGNED MAINSTREAM CENTER

Dependent variable: Assigned mainstream center

| | I | II | III | IV | V |
|---|---|---|---|---|---|
| From remote district | -0.02 (0.17) | - | - | - | -0.06 (0.17) |
| Logical score at recruitment test | - | 0.02 (0.02) | - | - | 0.02 (0.02) |
| Verbal score at recruitment test | - | - | -0.01 (0.02) | - | -0.01 (0.02) |
| CGPA (end of training) | - | - | - | -0.22 (0.23) | -0.15 (0.26) |
| N | 1508 | 1564 | 1564 | 1621 | 1459 |

Notes: This table reports results from an important robustness check: that the technology center allocation decision is not correlated with observable measures of performance or with employee level characteristics such as being from a remote district.

TABLE IX
SURVEY OF URBAN AND RURAL ENGINEERING COLLEGES

| | Urban colleges | Rural colleges |
|---|---|---|
| Average size of graduating class in computer science/IT (undergraduate & masters) | 342 | 458 |
| Average percentage of graduating class in computer science/IT hired by INDTECH (in 2011, 2012) | 0.17 | 0.06 |
| Average percentage of graduating class in computer science/IT hired by multinational technology firms IBM and Cognizant (in 2011, 2012) | 9% | 1% |
| Mean annual salary (Rupees Lakhs, 2011 and 2012 average) | 6.2 | 2.7 |
| N | 7 | 4 |

Notes: The researchers randomly selected 10 urban and 10 rural engineering colleges from the list of colleges that INDTECH hires from and contacted the colleges to participate in a telephonic survey. The researchers were able to conduct interviews with 7 out of the 10 urban colleges and these included R.V. College of Engineering, Bangalore; M.S. Ramaiah Institute of Technology, Bangalore; MLR Institute of Technology, Hyderabad; Muffakham Jah College of Engineering and Technology, Hyderabad; Vasavi College of Engineering, Hyderabad; G.Narayanamma Institute of Technology & Science (GNITS), Hyderabad and Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad. The researchers were also able to conduct interviews with 4 out of the 10 selected rural colleges. These included M.J.P.
Rohilkhand University; Majhighariani Institute of Technology & Science, Rayagada Orissa; Bapatla Engineering College, Guntur and Jaya Prakash Narayan College of Engineering, Mahabubnagar. The survey results indicate that the mean salaries for 2011 and 2012 are significantly higher for the urban colleges compared to the rural colleges and this difference is statistically significant for a t-test comparison of means. In addition, the survey reveals that multinational technology firms predominantly hire from the urban colleges while INDTECH has the distinctive policy of hiring from both urban and rural colleges. Rs. 1 lakh = Rs. 100,000.

TABLE X
PERCENTAGE OF YOUNG POPULATION AND TECHNOLOGY FIRMS IN INDIAN STATES

| State | What percentage of the 20-34 years population in India lives in the state | Percentage of NASSCOM firms that are headquartered in the state |
|---|---|---|
| Uttar Pradesh | 0.34 | 0.08 |
| Maharashtra | 0.23 | 0.22 |
| West Bengal | 0.19 | 0.04 |
| Andhra Pradesh | 0.18 | 0.11 |
| Bihar | 0.17 | 0.00 |
| Tamil Nadu | 0.15 | 0.10 |
| Madhya Pradesh | 0.13 | 0.00 |
| Karnataka | 0.13 | 0.20 |
| Gujarat | 0.12 | 0.03 |
| Rajasthan | 0.12 | 0.01 |
| Orissa | 0.09 | 0.00 |
| Kerala | 0.08 | 0.01 |
| Assam | 0.06 | 0.00 |
| Jharkhand | 0.06 | 0.00 |
| Punjab | 0.06 | 0.00 |
| Haryana | 0.05 | 0.12 |
| Chattisgarh | 0.05 | 0.00 |
| Delhi | 0.04 | 0.05 |
| Jammu and Kashmir | 0.02 | 0.00 |
| Uttarakhand | 0.02 | 0.00 |
| Himachal Pradesh | 0.01 | 0.00 |
| Tripura | 0.01 | 0.00 |
| Manipur | 0.01 | 0.00 |
| Meghalaya | 0.01 | 0.00 |

Notes: This table tabulates the percentage of population in the age group of 20-34 years living in each Indian state and the corresponding fraction of information technology firms listed with the National Association of Software and Services Companies (NASSCOM) in India. States with less than 0.01 percent of the population in age group 20-34 years were dropped from the table. Data collected by the researchers.

FIGURE I
TECHNOLOGY CENTERS OF INDTECH IN INDIA

Notes: INDTECH has 10 technology centers in India.
We code seven of these 10 locations as mainstream locations: Bangalore, Hyderabad, Chennai, Mysore, Mangalore, Trivandrum and Pune. These centers are marked in blue. The remaining three locations, Bhubaneshwar, Jaipur and Chandigarh, are coded as non-mainstream locations and are marked in red. We did this coding in consultation with the head of talent development at INDTECH and the underlying rationale is predominantly based on the geographic distance to the headquarters (Bangalore).

FIGURE II
SHORT TERM (END OF 2008) PERFORMANCE FOR 2007 BATCH

[Figure: distribution of short term performance ratings (1 to 3), Not Remote vs. Remote]

Notes: This graphic plots the distributions of short term performance (i.e. performance at the end of 2008) for the 2007 batch. Interviews with managers at INDTECH indicate that short term performance is measured using two dimensions, (i) error rate in coding/testing and (ii) completeness in coding/testing and documentation, and is distributed across three possible discrete ratings, with the distribution of ratings across employees fitted using a normal distribution. We also conduct distributional tests to compare the short term performance for remote and non-remote employees. We run the two-sample Wilcoxon rank-sum (Mann-Whitney) test and reject, at the 10% level, the null hypothesis that the performance data for the two groups follow the same distribution (Prob > |z| = 0.0937).

FIGURE III
LOGICAL AND VERBAL SCORES FROM RECRUITMENT TEST

[Figure: kernel densities of logical scores (top panel) and verbal scores (bottom panel), Not Remote vs. Remote]

Notes: This graphic plots the kernel densities of logical scores (top panel) and verbal scores (bottom panel) for remote and non-remote employees. These scores are based on standardized tests conducted at the time of recruitment. We also conduct distributional tests to compare the scores for remote and non-remote employees.
We run the two-sample Wilcoxon rank-sum (Mann-Whitney) test and reject the null hypothesis that the scores for the two groups follow the same distribution (Prob > |z| = 0.0000).

FIGURE IV
PREDICTED PROBABILITIES FOR SHORT TERM AND LONG TERM PERFORMANCE
[Figure: two panels, "Predicted probabilities - Short term performance" and "Predicted probabilities - Long term performance"; vertical axis: predicted probability; horizontal axis: performance slabs (Lowest rating, Middle rating, Highest rating); series: Remote, Non remote.]
Notes: This graphic plots the predicted probabilities of achieving the highest performance rating for remote and non-remote employees in the short term and in the long term, based on the regressions reported in Tables III and IV. The top panel plots the predicted probabilities for short term (end of 2008) performance and the bottom panel plots the predicted probabilities for long term (end of 2010) performance, both for the 2007 batch. Both plots indicate an 11-12% higher predicted probability of achieving the highest performance rating for a remote employee, vis-à-vis a non-remote employee.

FIGURE V
MIGRATION DISTANCE AND PREDICTED PROBABILITIES FOR ACHIEVING HIGHEST RATING IN LONG TERM PERFORMANCE (FOR REMOTE EMPLOYEES)
[Figure: vertical axis: predicted probability of achieving highest rating for long term performance (for remote employees); horizontal axis: migration distance (miles).]
Notes: This graphic plots the effect of distance of migration on the predicted probability of achieving the highest performance rating for remote employees. This analysis was done for long term performance (performance at the end of 2010) for the 2007 batch. The predicted probability of achieving the highest performance rating for remote employees is 23% when migration distance is less than 2 miles.
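The predicted probability of achieving the highest performance rating for remote employees is 10.4% when migration distance is 250 miles. The predicted probability drops to 7.5% when the distance of migration is 1850 miles.

APPENDIX 1
REFUGEE SORTING MODEL FROM BORJAS (1994) pages 1687-1689

Suppose the residents of country 0 (source) consider migrating to country 1 (host), where the earnings distributions in the two countries are given by:

$$\log w_0 = \mu_0 + \varepsilon_0, \qquad \varepsilon_0 \sim N(0, \sigma_0^2)$$

and

$$\log w_1 = \mu_1 + \varepsilon_1, \qquad \varepsilon_1 \sim N(0, \sigma_1^2)$$

The migration decision is determined by a comparison of earning opportunities across countries, net of migration costs ($C$). The model then defines an index function:

$$I = \log\left(\frac{w_1}{w_0 + C}\right) \approx (\mu_1 - \mu_0 - \pi) + (\varepsilon_1 - \varepsilon_0)$$

where $\pi = C / w_0$ gives a “time-equivalent” measure of migration cost. A worker migrates to the host country if $I > 0$.

The model then computes the conditional means $E(\log w_0 \mid I > 0)$, which gives immigrant earnings prior to migration, and $E(\log w_1 \mid I > 0)$, which gives earnings of immigrants in the host country. Writing $v = \varepsilon_1 - \varepsilon_0$, the first conditional mean can be written as

$$E(\log w_0 \mid I > 0) = E\big(\mu_0 + \varepsilon_0 \mid v > -(\mu_1 - \mu_0 - \pi)\big)$$

which reduces to

$$E(\log w_0 \mid I > 0) = \mu_0 + \frac{\sigma_0 \sigma_1}{\sigma_v}\left(\rho - \frac{\sigma_0}{\sigma_1}\right)\lambda$$

and $E(\log w_1 \mid I > 0)$ reduces to

$$E(\log w_1 \mid I > 0) = \mu_1 + \frac{\sigma_0 \sigma_1}{\sigma_v}\left(\frac{\sigma_1}{\sigma_0} - \rho\right)\lambda$$

where $\lambda = \phi(z) / (1 - \Phi(z))$, $z = -(\mu_1 - \mu_0 - \pi)/\sigma_v$, $\rho$ is the correlation between $\varepsilon_0$ and $\varepsilon_1$, and $\sigma_v$ is the standard deviation of $v$.

The model then defines $Q_0 = E(\varepsilon_0 \mid I > 0)$ and $Q_1 = E(\varepsilon_1 \mid I > 0)$. Refugee sorting happens under $Q_0 < 0$ and $Q_1 > 0$, and this happens if $\rho$ (the correlation coefficient between $\varepsilon_0$ and $\varepsilon_1$) is very small. The exact condition is:

$$Q_0 < 0 \text{ and } Q_1 > 0 \quad \text{iff} \quad \rho < \min\left(\frac{\sigma_1}{\sigma_0}, \frac{\sigma_0}{\sigma_1}\right)$$

APPENDIX 2
STEPS OF EMPLOYEE RANDOM ASSIGNMENT
[Based on interviews and INDTECH internal documents. Part of the text is copied from INDTECH internal documents]

Allocation of Software Engineer trainees (SET) to business units is done by a computer application called ‘Talent Planning’ that is part of the firm enterprise resource allocation software system.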
This application allocates SETs to a unit and location based on the quarterly manpower budget released by Corporate Planning (CPLAN).

The “process lifecycle steps” are:
- Collating the manpower budget and unit wise requirements
- Allocation of individuals to various technology streams
- Trainee allocation (Unit and Location)
- Communication to stakeholders

Talent Planning does the allocation by matching the following:
- Unit wise requirements (Business HR at each location provides data on requirement for SETs trained in various technologies)
- Data from HR located at the training location. Two weeks prior to completion of training batches, HR at the training location releases data on which individuals are expected to complete training

The two variables that the ‘Talent Planning’ team looks at while doing the matching on an automated system are the stream of training for the trainee and the estimated date of completion of training. The prior background of the employee and the test scores of the employee are not considered in this decision.

Communication of allocation decisions is through a centralized portal.

APPENDIX 3
SUMMARY DATA OF PERFORMANCE RATINGS FOR 2007 BATCH

Variable        N      Mean   Std. Dev.   Min   Max
perf08band1     711    0.34   0.47        0     1
perf08band2     711    0.62   0.49        0     1
perf08band3     711    0.05   0.21        0     1
perf10band1     1696   0.07   0.26        0     1
perf10band2     1696   0.12   0.32        0     1
perf10band3     1696   0.43   0.49        0     1
perf10band4     1696   0.1    0.3         0     1
perf10band5     1696   0.02   0.12        0     1

Appendix 4: Tracking Individuals Over Time

Dependent Variable = Did employee achieve highest performance rating in 2010

                                          (1)       (2)       (3)       (4)       (5)       (6)       (7)
From remote district                      -0.26     -0.09     -0.20     -0.28     -0.24     -0.02     0.30*
                                          (0.27)    (0.27)    (0.27)    (0.27)    (0.27)    (0.28)    (0.18)
From remote district * Achieved highest   0.97***   0.65*     0.97***   0.99***   0.98***   0.69*     -
  performance rating in 2008              (0.34)    (0.36)    (0.34)    (0.35)    (0.35)    (0.37)
CGPA (end of training)                    -         1.22***   -         -         -         1.13***   -
                                                    (0.34)                                  (0.35)
Assigned mainstream center                -         -         0.15      -         -         0.20      -
                                                              (0.33)                        (0.34)
Member of scheduled or OBC                -         -         -         -0.08     -         -0.01     -
                                                                        (0.19)              (0.20)
Male                                      -         -         -         -         0.57***   0.50**    -
                                                                                  (0.21)    (0.22)
Post 2008                                 -         -         -         -         -         -         -0.62***
                                                                                                      (0.13)
From remote district * Post 2008          -         -         -         -         -         -         -0.24
                                                                                                      (0.24)
N                                         627       627       604       627       627       604       1881

Notes: This table tracks performance of individuals over time. The first 6 columns use cross-sectional data to track whether remote employees who achieved the highest performance rating in 2008 continue to perform well and achieve the highest performance rating in 2010. The key variable of interest is the interaction term (from remote district * achieved highest performance rating in 2008); it indicates that remote employees who achieve the highest rating in 2008 are also likely to achieve the highest rating in 2010. The last column then conducts a difference-in-differences test using performance data from 2008, 2009 and 2010 and tests whether remote employees are disproportionately likely to improve performance between 2008 and 2010. The key variable is the interaction term (from remote district * post 2008); we find no statistically significant effect.

References

Aiyar, Shekhar, and Ashoka Mody. Demographic Dividend: Evidence from the Indian States. International Monetary Fund, 2011.
Autor, David H., Lawrence F. Katz, and Melissa S. Kearney. The polarization of the US labor market. No. w11986. National Bureau of Economic Research, 2006.
Baker, George, Gibbs, Michael and Holmstrom, Bengt, “The Internal Economics of the Firm: Evidence from Personnel Data”, The Quarterly Journal of Economics, Vol. 109, No. 4 (Nov., 1994), pp. 881-919.
Banerjee, Abhijit, Shawn Cole, Esther Duflo, and Leigh Linden, “Remedying Education: Evidence from Two Randomized Experiments in India”, The Quarterly Journal of Economics (2007) 122 (3): 1235-1264.
Banerjee, Abhijit, and Rohini Somanathan. "The political economy of public goods: Some evidence from India." Journal of Development Economics 82.2 (2007): 287-314.
Bartel, Ann, Casey Ichniowski and Kathryn Shaw, “Using "Insider Econometrics" to Study Productivity”, The American Economic Review, Vol. 94, No. 2 (May, 2004), pp. 217-223.
Borjas, George J. 1985. Assimilation, Changes in Cohort Quality, and the Earnings of Immigrants. Journal of Labor Economics 3 (4) (October): 463-89.
Borjas, George, "Economics of Immigration", Journal of Economic Literature 32 (1994), 1667-1717.
Borjas, George, "The Labor Demand Curve Is Downward Sloping: Reexamining the Impact of Immigration on the Labor Market", The Quarterly Journal of Economics 118:4 (2003), 1335-1374.
Borjas, George, "Do Foreign Students Crowd Out Native Students from Graduate Programs?", in Ehrenberg, Ronald, and Paula Stephan (ed.), Science and the University (Madison, WI: University of Wisconsin Press, 2005).
Borjas, George, and Kirk Doran, "The Collapse of the Soviet Union and the Productivity of American Mathematicians", Quarterly Journal of Economics 127:3 (2012), 1143-1203.
Carliner, Geoffrey. “Wages, Earnings and Hours of First, Second and Third Generation American Males,” Econ. Inquiry, Jan. 1980, 18(1), pp. 87–102.
Chandrasekhar, C. P., Ghosh, Jayati, Roychowdhury, Anamitra, “The ‘Demographic Dividend’ and Young India’s Economic Future”, Economic and Political Weekly, December 9, 2006.
Chaudhury, Nazmul, Jeffrey Hammer, Michael Kremer, Karthik Muralidharan and F. Halsey Rogers, “Missing in Action: Teacher and Health Worker Absence in Developing Countries”, Journal of Economic Perspectives, Vol. 20, No. 1 (Winter 2006), pp. 91–116.
Chiswick, Barry, “The Effect of Americanization on the Earnings of Foreign-Born Men,” J. Polit. Econ., Oct. 1978, 86(5), pp. 897–921.
Choudhury, Prithwiraj and Khanna, Tarun, “Physical, Social and Informational Barriers to Domestic Migration”, Chapter 17, Between Theory and Developmental, Institutional Realities, edited by Masahiko Aoki, Ken Binmore, Simon Deakin and Timur Kuran, 2012.
Deaton, Angus, and Jean Dreze. "Poverty and inequality in India: a re-examination." Economic and Political Weekly (2002): 3729-3748.
Dickens, W. T., and Lang, K., “The Reemergence of Segmented Labor Market Theory”, The American Economic Review, Vol. 78, No. 2, Papers and Proceedings of the One-Hundredth Annual Meeting of the American Economic Association (May, 1988), pp. 129-134.
Dyson, Tim, and Mick Moore. "On kinship structure, female autonomy, and demographic behavior in India." Population and Development Review (1983): 35-60.
Fisman, Raymond and Khanna, Tarun. "Facilitating Development: The Role of Business Groups." World Development 32, no. 4 (April 2004): 609-628.
Fleisher, Belton M., and Jian Chen. "The coast–non coast income gap, productivity, and regional economic policy in China." Journal of Comparative Economics 25.2 (1997): 220-236.
Friedberg, Rachel, "The Impact of Mass Migration on the Israeli Labor Market", Quarterly Journal of Economics 116:4 (2001), 1373-1408.
Friedberg, Rachel, and Jennifer Hunt, "The Impact of Immigrants on Host Country Wages, Employment and Growth", Journal of Economic Perspectives 9:2 (1995), 23-44.
Gibbons, Robert, “Incentives and Careers in Organizations”, Advances in Economics and Econometrics: Theory and Applications, ed. D. Kreps and K. Wallis, Cambridge University Press, 1995.
Henderson, J. Vernon, Shalizi, Zmarak and Anthony J. Venables, “Geography and Development”, Journal of Economic Geography, Volume 1, Issue 1, pp. 81-105.
Hunt, Jennifer, and Marjolaine Gauthier-Loiselle, “How Much Does Immigration Boost Innovation?”, American Economic Journal: Macroeconomics, Vol. 2, No. 2 (April 2010), pp. 31-56.
Jensen, Robert, “Do Labor Market Opportunities Affect Young Women's Work and Family Decisions? Experimental Evidence from India”, The Quarterly Journal of Economics (2012) 127 (2): 753-792, first published online March 3, 2012.
Kanbur, Ravi, and Xiaobo Zhang. "Fifty years of regional inequality in China: a journey through central planning, reform, and openness." Review of Development Economics 9.1 (2005): 87-106.
Katz, Lawrence F., Jeffrey R. Kling, and Jeffrey B. Liebman, “Moving to Opportunity in Boston: Early Results of a Randomized Mobility Experiment”, The Quarterly Journal of Economics (2001) 116 (2): 607-654.
Kerr, Sari Pekkala, and William Kerr, "Economic Impacts of Immigration: A Survey", Finnish Economic Papers 24:1 (2011), 1-32.
Kerr, Sari Pekkala, Kerr, William and Lincoln, William F., “Skilled Immigration and the Employment Structures of U.S. Firms”, Working Paper, February 2013.
Kerr, William, "Breakthrough Inventions and Migrating Clusters of Innovation", Journal of Urban Economics 67:1 (2010), 46-60.
Moretti, Enrico. "Local labor markets." Handbook of Labor Economics 4 (2011): 1237-1313.
Moretti, Enrico. The new geography of jobs. Houghton Mifflin Harcourt, 2012.
Munshi, Kaivan, and Mark Rosenzweig. Why is mobility in India so low? Social insurance, inequality, and growth. No. w14850. National Bureau of Economic Research, 2009.
Reich, M., Gordon, D. M., and Edwards, R. C., “A theory of labor market segmentation”, The American Economic Review, 1973.
Roy, Andrew D., “Some Thoughts on the Distribution of Earnings,” Oxford Econ. Pap., N.S., June 1951, 3, pp. 135–46.
Schultz, T. Paul. "Rural-urban migration in Colombia." The Review of Economics and Statistics 53.2 (1971): 157-163.
Schwartz, Aba, “Interpreting the Effect of Distance on Migration”, The Journal of Political Economy, Vol. 81, No. 5 (Sep. - Oct., 1973), pp. 1153-1169.
Singh, Nirvikar, Bhandari, Laveesh, Chen, Aoyu and Aarti Khare, “Regional Inequality in India: A Fresh Look”, Economic and Political Weekly, Vol. 38, No. 11 (Mar. 15-21, 2003), pp. 1069-1073.
Sjaastad, Larry A., “The Costs and Returns of Human Migration”, The Journal of Political Economy, Vol. 70, No. 5, Part 2: Investment in Human Beings (Oct., 1962), pp. 80-93.
Summers, L. H., and Abraham, K. G., “Why is the unemployment rate so very high near full employment?”, Brookings Papers on Economic Activity, 1986.
Yap, Lorene. "Internal migration and economic development in Brazil." The Quarterly Journal of Economics 90.1 (1976): 119-137.
Young, Alwyn. “Inequality, the Urban-Rural Gap and Migration”, The Quarterly Journal of Economics (2013).
Zhao, Yaohui. "Leaving the countryside: rural-to-urban migration decisions in China." The American Economic Review 89.2 (1999): 281-286.
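
As an illustration of the allocation rule described in Appendix 2, the following minimal Python sketch matches trainees to unit requirements using only the training stream and the expected completion date. It is a sketch under stated assumptions, not INDTECH's 'Talent Planning' application, whose internals the source does not describe. The names `Trainee`, `Requirement` and `allocate`, the sample units, dates and scores, and the rule of placing earlier finishers first are all hypothetical. The point is only that an employee's prior background and test scores never enter the matching.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Trainee:
    name: str
    stream: str          # technology stream of training
    completion: date     # expected date of completion of training
    home_district: str   # recorded, but never read by allocate()
    test_score: float    # recorded, but never read by allocate()


@dataclass(frozen=True)
class Requirement:
    unit: str            # hypothetical business unit label
    location: str
    stream: str
    seats: int


def allocate(trainees, requirements):
    """Assign trainees to (unit, location) seats by stream and completion date only."""
    # Expand each requirement into individual open seats, grouped by stream.
    open_seats = defaultdict(list)
    for req in requirements:
        open_seats[req.stream].extend([(req.unit, req.location)] * req.seats)

    assignment = {}
    # Assumption: earlier finishers are placed first; ties are broken by name.
    for t in sorted(trainees, key=lambda t: (t.completion, t.name)):
        seats = open_seats[t.stream]
        if seats:
            assignment[t.name] = seats.pop(0)
    return assignment


# Hypothetical sample data, for illustration only.
trainees = [
    Trainee("A", "java", date(2007, 9, 1), "remote", 7.5),
    Trainee("B", "java", date(2007, 8, 15), "non-remote", 9.0),
    Trainee("C", "testing", date(2007, 9, 1), "remote", 6.0),
]
requirements = [
    Requirement("Unit-1", "Pune", "java", 1),
    Requirement("Unit-2", "Jaipur", "java", 1),
    Requirement("Unit-3", "Chennai", "testing", 1),
]
result = allocate(trainees, requirements)
print(result)
```

Because `allocate` never reads `home_district` or `test_score`, changing those fields leaves the assignment unchanged. This mirrors the property the identification strategy relies on: assignment to a technology center is unrelated to observable employee characteristics.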