The Role of Firms in Fostering Within Country Migration: Experiment in India

The Role of Firms in Fostering
Within Country Migration:
Evidence from a Natural
Experiment in India
Prithwiraj Choudhury
Tarun Khanna
Working Paper
14-080
February 28, 2014
Copyright © 2014 by Prithwiraj Choudhury and Tarun Khanna
Working papers are in draft form. This working paper is distributed for purposes of comment and
discussion only. It may not be reproduced without permission of the copyright holder. Copies of working
papers are available from the author.
The Role of Firms in Fostering Within Country Migration
Evidence from a Natural Experiment in India1
Prithwiraj Choudhury
Harvard Business School
Tarun Khanna
Harvard Business School
October 30, 2013
ABSTRACT
High ability individuals can be constrained from commensurate employment opportunities due to
their geographic location. In the face of physical, informational and social barriers to migration,
firms with nation-wide hiring practices can benefit from facilitating the migration of high ability
individuals from low employment districts to regions with better employment opportunities. We
exploit a natural experiment within an Indian technology firm where the pre-existence of a
computer generated talent allocation protocol allows us to isolate the relation between an
employee’s prior home town/village and subsequent performance within the firm. Using unique
personnel data for entry level undergraduates and leveraging the fact that the assignment of an
employee to one of many technology centers within the firm is uncorrelated with observable
characteristics of the employee, we find that employees hired from low employment districts
(remote employees) out-perform their non-remote counterparts in the short term. They continue
to outperform their non-remote counterparts in the long term once we control for the distance of
migration. As a possible explanation of our result, we test for selection and find that employees
hired from low employment districts outperform their non-remote counterparts in standardized
verbal and logical tests at the recruitment stage. To explain why the firm might be more likely to
select high ability individuals from remote districts, we additionally conduct a survey of
randomly selected urban and rural colleges and document statistically significant differences in
employment opportunities for rural and urban graduates. Our survey results also indicate that not
every firm follows the policy of hiring from low-employment districts.
1
The authors would like to thank seminar participants at Duke, Harvard Business School, the International
Economic Association (IEA) meetings in 2011 in Beijing, INSEAD, London School of Economics, NYU Stern
School of Business, Washington University St. Louis, Wharton and the World Bank DECTI seminar for their
comments on a previous draft.
I. INTRODUCTION
The influence of firms in shaping human capital and labor markets has long been studied
by economists (Baker, Gibbs and Holmstrom, 1994). Using personnel data collected from firms,
economists have conducted several productivity studies that document how hiring, training and
human resources management policies of the firm shape the development of human capital
(Bartel, Ichniowski and Shaw, 2004). However, firms are notably absent from the literature on
migration. To quote Kerr, Kerr and Lincoln (2013; page 5), “firms are mostly absent from the
literature on the impact of immigration.” The authors also argue that “this approach seems quite
incomplete for skilled migration” given that firms play an active role in the migration of
skilled workers, in the U.S. and in other countries.
In this paper, we posit that in the absence of roads, air connectivity, information and other
relevant “infrastructure”, nationwide hiring by large firms can facilitate efficient within country
migration patterns. Firms with nationwide hiring practices can select high ability individuals
from geographically remote areas and can benefit from such hiring practices. If employees hired
from remote locations outperform employees recruited from non-remote locations, then this
could plausibly lead to positive rents for the firm in question.
We leverage a natural experiment within a large technology firm in India and report
several interesting results. Individuals hired from low-employment districts (‘remote’
employees) outperform non-remote employees in the short term. In the longer term individuals
hired from low-employment districts continue to outperform employees hired from non-remote
locations, once we control for the distance of migration.
To offer a possible explanation of our result, we test for selection. We use standardized
test scores in logical and verbal ability during recruitment and provide evidence that employees
hired from low-employment districts outperform their non-remote counterparts on standardized
verbal and logical ability tests at the point of recruitment.
Our empirical setting and identification strategy exploit a natural experiment within one
of India’s largest software multinationals, which employs more than 120,000 people worldwide.
The Indian technology firm in question (“INDTECH”) has made significant investments in
nationwide hiring and recruits talent from over 250 colleges in India. Several of these colleges
are in low employment districts of India. After four months of induction training, the firm
randomly allocates the talent it recruits from across the country, including from remote districts,
across the projects executed in its development centers, which serve global clients from locations
across India. This randomization, performed by a computer application, is intended to ensure that
the end customers of the firm, which are mostly U.S. based firms, are indifferent among the
INDTECH centers that might execute their project.
An important point here is that we do not use the random assignment of employees to
technology centers as ‘treatment’; instead we use the protocol to control for endogeneity
concerns. In other words, our main research question is not to study differences in performance
for employees who get assigned to mainstream centers versus employees who get assigned to
non-mainstream centers. Our primary research question is to study differences in performance
for employees who come from remote versus non-remote districts. However, the randomization
protocol helps us control for the endogeneity concerns that would arise if the center an employee
is assigned to were correlated with observable characteristics, such as being from a remote
district, on the one hand, and with future performance on the other. As an example, randomization
implies that employees are not systematically assigned to centers close to their hometowns. If
this were not the case, then employees coming from remote areas would be assigned to remote
development centers and their performance ratings would be downward-biased because of the distance
from the larger knowledge centers and because of missing out on agglomeration economies.
To conduct the empirical analysis, we collect unique personnel data for undergraduates
hired by INDTECH, anonymized so as not to reveal the names of individuals. The data collected include
details about the school district, college district, assigned technology center, grades received
during training, short and long term performance data and attrition data for individual
employees. To test whether the firm selects high ability employees from remote
districts, we further collected data on the standardized test scores of logical and verbal ability at
the recruitment stage for each employee and find evidence of such selection. Our interviews with
managers at the focal firm indicate that for high ability individuals from remote areas, joining the
firm is among their top career choices; while high ability individuals from the larger cities have
several other competing career choices.
To further augment our analysis and to establish why INDTECH might be more likely to
select high quality individuals in remote districts, we conducted a survey where 11 randomly
selected urban and rural engineering colleges participated in a telephonic interview with
questions related to the nature of firms hiring from such colleges and the salaries offered by the
firms. Analysis of the survey data confirms large, statistically significant differences in salaries
for students graduating from urban and rural colleges. The survey also indicates that not every
technology firm follows the policy of hiring from remote districts. Multinational technology firms
in our survey sample mostly hire from urban colleges and INDTECH has a unique policy of
hiring from rural colleges. Interviews conducted with INDTECH confirm the higher costs of
hiring from rural colleges, which include travel costs and the higher search costs of finding
talent at rural colleges.
Finally, we also find that employees from low employment districts appear
disproportionately to use the firm as a platform to further education and join a master’s degree
program.
As one possible explanation of our result, we test for selection. We build on the models
of selection in the migration literature. Borjas (1994) and Young (2013) are two such models that
provide a possible theoretical explanation for our results. The refugee sorting model of Borjas
(1994) is based on the Roy (1951) model of self-selection of workers and identifies theoretical
conditions under which migrants have below average earnings in the source country but end up
in the upper tail of the earnings distribution in the host country. In a recent paper, Young (2013)
provides evidence that the large gap between urban and rural living standards in developing
countries accounts for much of the inequality within those countries and is attributable to
selection based upon unobserved skill and within country migration. Our empirical analysis tests
whether or not hiring by firms of migrants from remote areas within the country involves
selection based on ability and whether or not high ability employees out-perform their non-remote
counterparts in the short-term and in the long-term.
Over the past few decades, the economics literature on migration has focused on
international migration and much of the recent literature studies relatively low skilled Mexican
immigrants and their impact on the U.S. labor market. A notable exception here is the recent
paper by Young (2013). However, the disproportionate focus of the prior migration literature on
international migration and on a single dyad of countries (U.S., Mexico) overlooks the
widespread within-country migration happening in several developing countries. To quote Zhao
(1999), “the migration of rural labor to urban areas in China since the mid-1980s has created
the largest labor flow in world history” (Zhao 1999, page 281). Zhao estimates that the number
of rural migrants in urban areas in China in the mid-1990s was around 50 million. In comparison,
according to Borjas (1994), the number of legal immigrants coming into the U.S. between 1981 and 1990
was 7.3 million (Borjas, 1994; Table 1, page 1668) and in the year 1986, the Border Patrol in the
U.S. apprehended 1.8 million illegal aliens (Borjas 1994; page 1669).
Our results help bridge this gap in the migration literature by studying the role of firms in
the context of within country migration. Here, we are also motivated by the literature that
documents physical, social and informational barriers to migration and employment
opportunities for individuals in developing countries (Jensen, 2012; Banerjee et al, 2007). In the
face of such barriers to migration, large firms with widespread national hiring might be
particularly relevant in facilitating within country migration. Given the potentially higher costs
of hiring from remote districts, the firm will only do so if individuals hired from remote
districts are more productive, i.e. they out-perform their non-remote counterparts.
In addition to informing the literature on migration, our results have policy implications
for India, reportedly the ‘youngest’ country (of appreciable size) in the world, where there is
much hope of a demographic dividend and equivalent fear of what we might term a
‘demographic albatross’ that might result if burgeoning youth pools are unemployed and
unemployable. Our findings also have implications for other large developing countries like
China, which has seen widespread within country migration and an increase over the last two
decades in economic disparity across regions (Yang, 2002).
Our work also has relevance for developed countries like the United States and has a
philosophical connection to the “moving to opportunity” experiments conducted in Boston by
Katz, Kling and Liebman (2001). In the past decade or so, labor economists have documented
the polarization of the U.S. labor market (Autor, Katz and Kearney, 2006) and in recent work,
Moretti (2010) has identified large differences in worker earnings for “observationally similar
workers” based on the location of the individual. In his new book, Moretti states that “your
salary depends more on where you live than on your resume”, (Moretti, 2012; page 88). Given
this, even in the U.S., hiring practices of firms could have policy implications for efficient within
country migration.
The rest of the paper proceeds as follows. In the next section, we summarize our
theoretical antecedents with respect to the positive selection of migrants, the barriers to migration in
developing countries and the incentives of firms to facilitate migration. Section III describes our
empirical setting and the natural experiment. We report the descriptive statistics in Section IV
and our econometric specifications and results in Section V. Section VI concludes.
II. THEORY
We posit that firms with nation-wide hiring practices can help high ability
individuals overcome barriers to migration and can move them from low employment districts to
areas with high employment opportunities. We also posit that despite the higher costs of hiring
from remote regions, firms can benefit from such hiring if high ability employees hired from
remote regions outperform their non-remote counterparts. The thread of our theorizing is well
captured by a quote from Yap (1976) – “Firms maximize profits, and individuals maximize
utility. However, because of institutional constraints, noneconomic motivations, government
policies and imperfect information, factor price differentials between sectors exist and are only
gradually reduced over time. Migration between sectors is a means of equalizing factor
returns…however, neither migration, nor any equilibrating force is strong enough to eliminate
imbalances instantaneously.” (Yap, 1976, page 122)
We build on theoretical antecedents from the literature in economics on migration,
development and personnel economics. We first summarize the selection models of migrants.
Two possible theoretical models that are relevant to our work are the refugee sorting model from
Borjas (1994) and the recent within-country migration and inequality model of Young (2013).
We next summarize the literature that outlines the physical, informational and social barriers to
employment opportunities in developing countries. Finally we summarize the literature in
personnel economics that outlines the role of hiring, training and human resource practices of
firms in shaping human capital and labor markets.
II.A. Selection Models of Migrants and Refugee Sorting
As Borjas (1994) outlines, the early literature on migration is focused on the labor market
performance of immigrants in the host country. Sjaastad (1962), Chiswick (1978) and Carliner
(1980) are the pioneering studies in this area. Sjaastad (1962) models migration as an investment
decision where each individual assesses the expected utility to be obtained in each possible
destination and chooses the location with the highest expected utility. In other words, self-selection
is driven by wage differentials net of migration costs. Chiswick (1978) finds that after 30 years in the
United States, the typical immigrant earns about 11 percent more than a comparable native
worker. This result was interpreted by several researchers in terms of a selection argument.
Chiswick mentions that immigrants are “more able and more highly motivated” than natives
(Chiswick 1978, page 900) and Carliner postulated that immigrants “choose to work longer and
harder than non-migrants” (Carliner 1980, page 89).
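The decision rule attributed to Sjaastad (1962) above, in which an individual picks the location with the highest expected utility net of migration costs, can be illustrated with a minimal sketch. The function name, the location labels and all numbers below are hypothetical and are not taken from any of the cited studies.

```python
def choose_location(expected_utility, migration_cost, home):
    """Return the location with the highest expected utility net of migration cost.

    expected_utility: dict mapping location -> expected utility in that location
    migration_cost:   dict mapping location -> cost of moving there from home
    home:             current location (staying put costs nothing)
    """
    def net_payoff(location):
        cost = 0.0 if location == home else migration_cost[location]
        return expected_utility[location] - cost

    return max(expected_utility, key=net_payoff)


# Hypothetical numbers, for illustration only.
utility = {"village": 10.0, "city_a": 18.0, "city_b": 25.0}
cost = {"city_a": 5.0, "city_b": 16.0}
best = choose_location(utility, cost, home="village")
```

In this made-up example the destination with the highest gross utility is not chosen, because its migration cost outweighs the gain; lowering that cost changes the choice.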
Subsequent research has further developed the self-selection arguments of immigrant
flow. This analysis is based on the Roy (1951) model of self-selection of workers, which
describes how workers sort themselves between employment opportunities. In the case of
migrants, Borjas (1987) models this based on a wage distribution for the home and host countries
of the migrant and identifies conditions for immigrant positive self-selection and immigrant
negative self-selection. In other words, this model generates conditions under which migrants
will either have above average earnings in the source and host country (positive self-selection) or
will have below average earnings in the source and host country (negative self-selection).
Borjas also identifies conditions for “refugee sorting” (Borjas 1994; page 1689) where
immigrants have below average earnings in the source country but end up in the upper tail of the
earnings distribution in the host country. This sorting happens when the correlation between the
skills of the two countries is small or negative. To illustrate “refugee sorting”, Borjas gives the
example of high skilled workers in a Communist country which does not value their skills
migrating to a market economy and performing well in the host country’s market economy.2
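The three sorting outcomes described above can be illustrated with a small simulation of a two-location Roy-type model. This is a sketch under stated assumptions, not the exposition in Borjas (1987, 1994): skill draws are assumed jointly normal, a worker is assumed to migrate when the earnings gain exceeds a fixed cost, and every parameter value is hypothetical.

```python
import math
import random


def simulate_roy(rho, sigma0, sigma1, gain=0.0, cost=0.0, n=100_000, seed=0):
    """Simulate migration choices and return migrants' mean skill draws.

    Log earnings are mu0 + e0 in the source and mu1 + e1 in the host location,
    where (e0, e1) are jointly normal with standard deviations sigma0, sigma1
    and correlation rho. A worker migrates when gain + e1 - e0 > cost, with
    gain standing for mu1 - mu0. Both draws have population mean zero, so a
    positive migrant mean signals above-average earnings in that location.
    """
    rng = random.Random(seed)
    sum_e0 = sum_e1 = 0.0
    migrants = 0
    for _ in range(n):
        z0 = rng.gauss(0.0, 1.0)
        z1 = rng.gauss(0.0, 1.0)
        e0 = sigma0 * z0
        # Build e1 so that corr(e0, e1) equals rho.
        e1 = sigma1 * (rho * z0 + math.sqrt(1.0 - rho * rho) * z1)
        if gain + e1 - e0 > cost:
            migrants += 1
            sum_e0 += e0
            sum_e1 += e1
    return sum_e0 / migrants, sum_e1 / migrants


# Hypothetical parameter values chosen to land in each regime.
positive = simulate_roy(rho=0.9, sigma0=0.5, sigma1=1.0)   # high correlation, more dispersed host
negative = simulate_roy(rho=0.9, sigma0=1.0, sigma1=0.5)   # high correlation, more dispersed source
refugee = simulate_roy(rho=-0.3, sigma0=1.0, sigma1=1.0)   # negative correlation
```

Each call returns the pair (mean source draw, mean host draw) among migrants. Under these assumptions the first case yields two positive means, the second two negative means, and the third a negative source mean with a positive host mean, which is the "refugee sorting" pattern.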
The recent empirical literature on migration has tested the theoretical predictions of the
selection hypotheses; however, most of the work is focused on a single dyad of countries
(Mexican immigrants in the U.S.) and has focused on relatively low skilled workers. Recent
papers in this area include Munshi (2003), Chiquiar and Hanson (2005), McKenzie and Rapoport
(2010), Kaestner and Malamud (2010) and Moraga (2011). McKenzie and Rapoport (2010)
consider the effect of migrant networks in influencing self-selection patterns. Munshi (2003)
studies Mexican immigrant networks and finds that the same individual is more likely to be
employed and retain a higher paying non-agricultural job when the network of the individual is
exogenously larger.3
2
An exposition of the Borjas (1994) refugee sorting model, reproduced from the original text is provided in the
appendix
3
There is a recent empirical literature on skilled immigrants in the U.S. and much of this literature is focused on the
impact of skilled immigrants on wages and employment opportunities of domestic skilled workers. This literature
dates back to Friedberg (2001) who studied the effect of Russian immigrants into Israel and the effect they had on
wages of domestic workers. Exploiting information on the immigrants' former occupations abroad, the author finds
no adverse impact of immigration on native outcomes. Borjas (2005) uses data from the survey of earned doctorates
and the survey of doctoral recipients to estimate that a 10 percent immigration induced increase in the supply of
doctorates lowers the wage of competing workers by about 3 percent. In a recent paper, Kerr and Kerr (2013) study
the impact of skilled immigrants in occupations related to science, technology, engineering and mathematics
(STEM) and find that STEM workers departing their firms during periods of disproportionately high immigration
experience difficult employment transitions. A second stream of recent papers focuses on the impact of skilled
immigrants on innovation and the progress of science. Hunt and Gauthier-Loiselle (2010) use data from the 2003
National Survey of College Graduates and find that a 1 percentage point increase in immigrant college graduates'
population share increases patents per capita by 9-18 percent. Other studies in this area include Borjas and Doran
(2012) who study the migration of Russian mathematicians following the collapse of the Soviet Union and Moser et
al. (2012) who study Jewish scientist expellees from Nazi Germany.
II.B. Within Country Migration and Selection Models
The theoretical foundations of the within country migration literature date back to the
1970s and the ‘job-search’ models of Harris and Todaro (1970). These models included a rural
and urban sector with the urban sector characterized by an institutionally fixed wage above the
market clearing level. These models explained within country migration as rational maximizing
behavior where the higher urban wage acted as a regulating mechanism. Subsequent work in this
area included Pinera and Selowsky (1978) who additionally accounted for the existence of
voluntary unemployment, especially in urban areas. Most of the prior literature on within country
migration in developing countries is focused on models of agricultural to non-agricultural
migration. However, the within-country migration literature has not received much attention in
recent times with the notable exception of Young (2013) who provides evidence of self-selection
of workers into sectors leading to within country migration.4
Young (2013) provides evidence that the large gap between urban and rural living
standards in developing countries accounts for much of the inequality within those countries and
is attributable to selection based upon unobserved skill and within country migration. One out of
four or five individuals raised in rural areas moves to a city as a young adult and earns much
higher incomes than non-migrant rural permanent residents. Similarly, one out of four or five
4
Schultz (1971) provides evidence that more than one-third of the rural Colombian population under the age of 40
in 1951 had left for urban areas by 1964. The author considers several factors in building a model of rural-urban
migration including differences in agricultural and manufacturing wages, population growth rate in a region (greater
population growth rate leading to higher out-migration) and characteristics of individual migrants. The author finds
that rural-urban migration is selective with respect to age and sex as well as region. He also finds that schooling
contributes to out-migration of students, given that the returns to education are higher in the cities compared to that
in the rural areas. Yap (1976) studies rural-urban migration in Brazil, focusing on rural-urban differences in
capital and labor productivity, rates of technological change, rates of natural population growth, marginal savings
propensities and tax rates. The migration decision here is endogenous to the wage differential between the
agricultural and non-agricultural sectors.
individuals raised in urban areas moves to a rural area as a young adult and earns much lower
incomes than their non-migrant urban cousins.
The theoretical model of Young (2013) assumes the simultaneous existence in
developing countries of two sectors, urban and rural. In equilibrium, it is more likely that a
skilled worker will work in an urban area and an unskilled worker will work in a rural area. Also,
observable educational attainment determines the probability that a worker is skilled. In other
words, the relative factor demands of the urban and the rural sectors produce a sorting of
workers, so that a worker of a given educational attainment working in the urban sector is more
likely to be skilled. The author also assumes underlying heterogeneity in workers’ urban and
rural productivity conditional on their skill status. This assumption together with the
concentration of demand for skilled workers in urban areas leads to the average rural to urban
migrant being better educated than the average rural permanent resident. This idea builds on
Lewis (1954) and the existence of dual economies where workers in rural areas migrate to cities,
by comparing their marginal product in urban output to their average product in rural family
output. The model is also similar to Lagakos and Waugh (2011) who argue that workers sort
themselves into urban and rural areas based on their intrinsic abilities.
II.C. Barriers to Migration and Employment in Developing Countries
The theoretical model of Young (2013) does not assume any barriers to migration within
the country and workers can self-select into the urban or rural employment sectors based on their
underlying skills and educational attainment. However, the economics literature in the context of
developing countries documents physical, informational and social barriers to migration and
employment.
First of all, individuals may be located in low-employment regions and may not invest in
education given the paucity of employment opportunities around them. This follows the
literature in economic geography (e.g. Henderson et al. 2001), which outlines how, both across
countries and within countries, economic activity is concentrated in a few metropolitan regions.
There is also a related literature (Dyson and Moore, 1983 and Foster and Rosenzweig, 2009) that
points out how considerations such as the gender or social status (e.g. caste) of the student could
negatively affect the perceived benefits to investing in education.5 A recent study of how gender
is related to underinvestment in education in the context of India is by Jensen (2012). The author
ran an experiment where recruiting services for the newly emerging business process outsourcing
industry were provided for young women in randomly selected rural Indian villages. The author
finds that during this time, young women in treatment villages were significantly less likely to
get married or have children; instead they were likely to obtain more schooling or post-school
training and enter the labor market. On the issue of caste, scholars such as Banerjee, Iyer and
Somanathan (2005) observe that in India, caste classifications are rigid and caste divisions lead
to social relations that might become conflict-prone. Members of backward castes can be
relatively disadvantaged in their access to public goods related to education and skill
development. As Banerjee et al. (2004) point out in their study of the provision of public goods
to various districts of India based on the caste demographics of the area, “Areas with a
concentration of Brahmans, the traditional priestly class considered the top of the caste
hierarchy, have higher levels of schools….Areas with groups that were recognized as socially
and economically marginalized by the Indian state at the time of independence are associated
with lower access.” (Banerjee et al. 2004, page 4). Munshi and Rosenzweig (2009) also have a
working paper that attributes the persistence of low spatial and marital mobility in rural India,
despite increased growth rates and rising inequality in recent years, to the existence of sub-caste
networks that provide mutual insurance to their members.
Secondly, in developing countries, employment opportunities for talented individuals in
remote areas might be hindered by the lack of teaching infrastructure and the lack of committed
teachers. Banerjee et al. (2007) summarize the dismal quality of educational services offered to
the poor in developing countries. The authors refer to a 2005 India-wide survey on educational
attainment and state that 44 percent of the children aged 7-12 cannot read a basic paragraph and
50 percent cannot do simple subtraction. The authors also run two sets of randomized
experiments in which two interventions resulted in higher test scores: employing young women to
teach students lagging behind in basic literacy and numeracy skills, and a computer assisted
learning program for math. Chaudhury et al. (2006) find for a sample of developing countries that includes
5
Dyson and Moore (1983) and Foster and Rosenzweig (2009) outline the practice of ‘patrilocal exogamy’ where the
woman gets married to an individual from a different village and leaves her parents’ village to live with her
husband's family. This and similar practices imply that in many cases, the returns to investing in girls' human capital
does not accrue to the parents and as a result, parents have less incentive to invest in the education of the girls.
Bangladesh, Ecuador, India, Indonesia, Peru and Uganda, that on average, 19 percent of teachers
were absent during unannounced visits to primary schools made by the researchers. The authors
also state that a comparable teacher absence rate for a large sample of school districts in New
York State was five percent. Choudhury and Khanna (2012) summarize the physical,
informational and social barriers to migration and employment for high ability individuals in
remote regions in the context of developing countries.
II.D. Firms and Migration
As described earlier in the introduction, firms are conspicuously absent from the
economics literature on migration. To quote Kerr, Kerr and Lincoln (2013), “from an academic
perspective, there is very little tradition for considering firms in analyses of immigration. As one
vivid example, the word “firm” does not appear in the 51 pages of the classic survey of Borjas
(1994) on the economics of immigration, and more recent surveys also tend to pay little attention
to firms” (Kerr et al, 2013; page 1).6 The authors also explain that the role of firms needs to be
studied in the context of migration, particularly in the case of migration of skilled workers.7
There is also a separate literature in personnel economics, summarized by Bartel et al.
(2004) that outlines the role of hiring, training and human capital management policies of firms
in shaping human capital and labor markets.
We leverage insights from this literature in personnel economics and posit that firms with
country-wide hiring practices might facilitate within-country migration, moving high ability
individuals from low employment districts to regions rich in employment opportunities. If
employees hired from remote locations outperform employees recruited from non-remote
locations, hiring employees from remote locations could plausibly lead to positive rents for the
firm in question. In other words, there could be a “recruiting arbitrage” in hiring employees from
remote locations and expropriating rents from their superior performance within the firm.
Here, we are motivated by the findings from Fisman and Khanna (2004). The authors
show that in equilibrium it is possible for some firms to profit from factor arbitrage in rural India.
6
Other recent surveys on immigration include Friedberg and Hunt (1995), Freeman (2006), Dustmann et al. (2008)
and Kerr and Kerr (2011).
7
The authors analyze the role of firms in immigration in the context of the H-1B visa, where the firm identifies the
workers it wants to hire. They use the Longitudinal Employer-Household Dynamics (LEHD) dataset and an
unbalanced panel of 319 firms over 1995-2008 and estimate that a 10% increase in a firm’s young skilled immigrant
employment correlates with a 6% increase in the total skilled workforce of the firm.
To quote the authors, “firms that are best able to deal with infrastructure shortages will be more
likely to locate in low-infrastructure regions, as it allows them to take advantage of cheap factors
of production that arise in equilibrium in order for markets to clear”. (Fisman and Khanna, 2004,
page 615). The authors also provide evidence that firms in India that are able to operate in states
with below-median development indices benefit from lower wage rates (about 30% less, on
average, than wage rates in more developed states) and tax rates (average of 8.2% in developed,
vs. 7.0% in undeveloped states), as well as a higher average rate of government 'fiscal benefits'
(0.8% vs. 0.6%).
III. NATURAL EXPERIMENT
Our empirical setting is one of India’s largest technology firms (INDTECH) with over
120,000 employees spread over 10 technology centers in India and working on global projects.
We exploit a natural experiment in how this firm assigns entry level employees to its 10
technology centers spread all over India.
As stated earlier, the firm has made significant investments in nationwide hiring and
recruits talent from over 250 colleges in India. Several of these colleges are in low employment
districts of India. After four months of induction training, the firm then randomly allocates talent
it recruits from across the country to one of the 10 possible technology centers all across the
country. The allocation is done by a computer application that is part of the firm’s enterprise
resource planning software. This policy ensures that the allocation of an employee to a particular
location within the firm is uncorrelated to measures of observed ability such as test scores at the
end of induction training. Out of the 10 centers, 7 centers are located in mainstream locations
close to the firm headquarters while 3 centers are in relatively remote parts of India. Figure 1
outlines the geographic distribution of the 10 development centers. Exhibit 1 outlines the steps
followed by the computer application that assigns new employees to one of the ten locations.
Interviews with the head of talent development at INDTECH reveal that the “primary
motivation” of this talent allocation policy is to ensure that the end-customers of INDTECH,
mostly U.S. based firms, are indifferent about the location of the technology center that executes
their projects. The secondary motivation of this talent allocation policy is the avoidance of regional
and/or ethnic cliques at the technology centers. To quote the head of talent development at
INDTECH, “we do not want all Tamils to join the Chennai center or all Punjabis to join
Chandigarh and start conversing in their regional language rather than English. If that happens,
both our clients and employees from other parts of the country are affected.”
As described earlier, we do not use the random assignment of employees to technology
centers as ‘treatment’; instead we use the protocol to control for endogeneity concerns. In other
words, our main research question is not to study differences in performance for employees who
get assigned to mainstream centers versus employees who get assigned to non-mainstream
centers. Our primary research question is to study differences in performance for employees who
come from remote versus non-remote districts. Given this, the talent allocation protocol at
INDTECH is extremely valuable from the point of view of the econometrician. As the personnel economics
literature (Baker, Gibbs and Holmstrom, 1994; Gibbons, 1995) has long pointed out, there are
pre-defined “career ladders” inside firms. These career ladders are long, structured and involve
endogenous progressions (“fast moves”) based on prior period performance. In this context,
being able to find a randomized assignment of employees to locations inside the firm helps avoid
several endogeneity concerns that may arise in a more conventional setting. If the allocation of
employees to a technology center were endogenous to employee level characteristics, then the
performance estimates of remote employees could be downward or upward biased, as the
following few examples illustrate.
For example, in a more conventional setting it is conceivable that employees could be
systematically assigned technology centers close to their home towns/villages. If that is the case,
then employees coming from remote areas would be assigned a remote development center and
their performance estimates would be downward biased because of the distance from the
larger knowledge centers and because of missing out on agglomeration economies.8 In another
example, employees could be sorted to locations based on measures of ability observable to the
managers of the firm but not to the econometrician. In this case, the highest ability employees
might be disproportionately assigned mainstream technology centers. If this is indeed the case
and if observed measures of ability are not perfectly correlated with actual ability, then any
further study of what drives subsequent performance is subject to methodological bias. In this
case for example, if there is a positive correlation between being from a remote district and
8
The researchers conducted several employee interviews at INDTECH to confirm this. A couple of the employees
who were interviewed came from the Khordha and Sundergarh districts in the eastern Indian state of Orissa and had
families living in these places. In interviews, they confirmed that, given a choice, they would have selected the
Bhubaneshwar, Orissa technology center of INDTECH but given that they had no choice in selecting the center,
both of these employees were assigned to and continue to work in the Bangalore technology center.
unobserved measures of ability, and if employees from remote districts are systematically
assigned mainstream technology centers, their performance estimates could be upward biased.
The talent assignment also helps us avoid issues related to assortative matching (Becker
1973). In a more conventional setting for example, it is conceivable that employees might be
assigned to locations based on considerations such as ethnicity. In this example, all employees
who are ethnic Kannadigas (i.e. from Karnataka) could be assigned to the Bangalore technology
center, all Tamils (i.e. from Tamil Nadu) could be assigned to the Chennai technology center and
all employees from Orissa to the technology center in Bhubaneshwar. If this is the case and
further if there is systematic bias in grading performance towards certain ethnic or social groups
within the firm in question, we might get spurious results in analyzing how prior location of the
employee relates to subsequent performance. As an example, there could be a systematic positive
bias in grading performance for employees from the South Indian centers (Bangalore and
Chennai centers). In that case, the performance ratings of employees from the Orissa
(Bhubaneshwar) center will be downward biased. Given assortative matching based on ethnicity
and given that Orissa has a large number of ‘low employment districts’, the econometrician will
observe a downward bias in the performance of employees hired from low employment districts.
IV. DESCRIPTIVE STATISTICS
We collected unique data for entry level employees recruited over 2007-2009 from over
250 colleges all across India. The employees in our sample are undergraduates hired from
engineering colleges with no prior full-time employment experience. Post recruitment, they are
randomly assigned to one of several technological areas such as .NET, Java or mainframe, and
receive four months of induction training prior to being assigned a technology center.9
Entry level undergraduates join the firm and start training between May and November
and prior to starting training are assigned a ‘technological area’. This assignment of an
undergraduate to a technological area is uncorrelated to observable characteristics of the entry
level undergraduate. Employees assigned a particular technological area are then trained in
batches of around 100 employees each. For the sample of employees hired in 2007, there are 18
9
The .NET Framework (pronounced dot net) is a software framework developed by Microsoft that runs primarily on
Microsoft Windows. It includes a large library and provides language interoperability (each language can use code
written in other languages) across several programming languages. (Source:
http://en.wikipedia.org/wiki/.NET_Framework)
batches with an average of 94 employees each. In addition, as described earlier, post training, the
assignment of employees to a technology center is not correlated to observable characteristics of
employees. Each of the 10 technology centers at INDTECH works on projects related to the three
major technologies (.NET, Java, Mainframe) that entry level undergraduates are trained in.
To avoid being biased by diverse temporal trends affecting various technologies that the
employees are trained in, we restricted our data collection exercise to employees trained in a
single technological area (the single area is .NET). Focusing on employees trained in the same
technological area enables us to alleviate concerns of employee performance being biased by
short term demand or supply trends affecting the underlying technology they are trained in. In
all, we are able to collect data for a total of 8520 undergraduates hired in 2007, 2008 and 2009.
Of this, 1696 undergraduates were hired in 2007. The personnel data for the 2007 batch is much
more complete and has less missing data compared to data collected for the batches hired in 2008
and 2009. Table I summarizes the personnel data for both the 2007 batch (first three columns)
and for all batches (last three columns) and the notes in the table explain the level of
completeness of data both for the 2007 batch as well as for subsequent batches. The firm hires
around 10,000 undergraduates every year. Because we only collect data for employees trained
in .NET, our sample covers around 17% of the entry level undergraduates hired in 2007 (1696 of roughly 10,000).
The main independent variable of interest is whether or not the employee is from a
remote district (from remote district). This variable is constructed as follows. We requested
detailed resumes of employees in our sample listing the name and location of their school, high
school and undergraduate college. This data was made available for 93% of the 2007 batch and
for 37% of employees from all batches. In the next step, we use data from the 2001 Indian census
to identify average employment for 594 districts in India. A district is coded as ‘remote’, if the
employment level in the district based on the census data is less than the median employment
level across all Indian districts.
Given this data, we code from remote district as ‘1’ if three conditions are met – (i) the
employee went to school in a remote district; (ii) the employee went to high school in a remote
district and (iii) the employee went to undergraduate college in a remote district. Assuming that
remote employees perform better, as we will show later, this turns out to be the most
conservative way of coding remoteness. In this definition, individuals who went to a remote
school but a non-remote college have the variable coded as 0, i.e. though they might have
performed better in school and consequently moved to a non-remote college, they are coded as
part of the control group. This arguably biases us against finding an effect for the treatment
group of remote employees. Table II summarizes the migration patterns in our data.
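The coding rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the employment figures are hypothetical placeholders, not census values or INDTECH data, and a missing district is treated as failing the test, which matches the conservative base case.

```python
from statistics import median


def remote_districts(employment_by_district):
    """Return the districts whose employment level is below the median."""
    cutoff = median(employment_by_district.values())
    return {d for d, level in employment_by_district.items() if level < cutoff}


def from_remote_district(school, high_school, college, remote):
    """Code 1 only if all three districts are known and all are remote, else 0."""
    districts = (school, high_school, college)
    return int(all(d is not None and d in remote for d in districts))


# Hypothetical employment levels, for illustration only (not census values).
employment = {"A": 0.20, "B": 0.35, "C": 0.50, "D": 0.65}
remote = remote_districts(employment)

# Remote school and high school but non-remote college: coded 0 (control group).
mover = from_remote_district("A", "B", "C", remote)
# Remote at all three stages: coded 1.
stayer = from_remote_district("A", "B", "A", remote)
```

Requiring all three conditions means that an individual who moved to a non-remote college is placed in the control group, which is the source of the conservative bias discussed above.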
The next independent variable of interest is whether or not the employee is assigned a
mainstream center (assigned mainstream center). INDTECH has 10 technology centers in India.
We code seven of these 10 locations as mainstream locations and these are Bangalore,
Hyderabad, Chennai, Mysore, Mangalore, Trivandrum and Pune. The remaining three locations,
Bhubaneshwar, Jaipur and Chandigarh, are coded as non-mainstream locations. We did this
coding in consultation with the head of talent development at INDTECH and the underlying
rationale is predominantly based on the geographic distance to the headquarters (Bangalore).
This follows the literature in economic geography (Henderson et al. 2001), that outlines how
both across countries and within countries, knowledge is concentrated in a few central locations.
This variable is available for 96% of the 2007 batch and for 88% of employees from all batches.
Our first dependent variable of interest is performance. At the end of every year, each
employee generally receives a performance rating if and only if she worked on a coding/testing
project for at least 9 months in the calendar year. For the 2007 batch, we use a measure of
performance at the end of 2008 (short term performance) and a measure of performance at the
end of 2010 (long term performance). However, given the ‘9 month work’ rule, not every
employee in the batch gets a performance rating in every year.
Interviews with the head of talent development at INDTECH, with a senior manager in
human resources and with several employees in the sample indicate that the performance ratings
for entry level undergraduates are based on mostly objective measures including the quality of
coding and/or testing (measured using ‘mistakes’ in the code that are recorded by automated
software) and the timeliness and completeness in coding/testing and documentation (measured
using automated software). Employees are also tested for their communication skills and this is
assessed by the manager of the employee. However, the metrics are predominantly objective and
measurable for entry level undergraduates. To quote a senior human resources manager, “For the
first three years, performance evaluation is mostly based on objective metrics….there is an
underlying normal distribution for the cohort in assigning these ratings.”
Interviews with senior human resource managers also indicate important differences in
how short term performance (i.e. performance at the end of 2008 for the 2007 batch) and long
term performance (i.e. performance at the end of 2010 for the 2007 batch) are measured and
coded. Short term performance (i.e. performance at the end of 2008 for the 2007 batch) is
measured using the following two dimensions – (i) error rate in coding/testing and (ii)
completeness in coding/testing and documentation and is distributed across three possible
discrete ratings, with the distribution of ratings across employees fitted using a normal
distribution. Given that INDTECH uses automated software to measure the error rate of
coding/testing and the completeness of coding/testing and documentation, it is safe to say that
the measures of performance are quantifiable and objective.
Long term performance is measured based on one additional dimension. For long term
performance, the three dimensions are (i) the error rate in coding/testing, (ii) completeness in
coding/testing and (iii) communication skills. This additional measure, communication skills, is
subject to a more subjective assessment by the manager of the employee. In addition, long term
performance (i.e. performance at the end of 2010 for the 2007 batch) is distributed across five
possible discrete ratings, with the distribution of ratings fitted using a normal distribution.
For the 2007 batch, data on both short term and long term performance was made
available for every single employee who exceeded the ‘9 month work rule’ and received a
performance rating. Our interviews also indicate that the manager of each employee enters an
initial performance rating based on the objective criteria and then managers from human
resources check the rating to the underlying scores (scores of error rate of coding, completeness
of coding, etc.) to ascertain errors committed by the manager in entering the scores.
The second set of dependent variables relate to the verbal and logical scores each
employee received for the standardized tests at the recruitment stage. This variable helps us test
for one of the possible underlying mechanisms for why remote employees might have different
subsequent performance compared to their non-remote counterparts and allows us to test whether
or not there is positive selection of remote employees in light of the selection models discussed
earlier. INDTECH administers a standardized test to measure verbal and logical ability at the
recruitment stage and questions can have negative penalties for incorrect responses in certain
years. We use these scores as an observable measure of the unobservable ‘skill’ or ‘ability’ of
each individual.
We also collected data on attrition and have two other dependent variables – quit firm by
2011 and quit for higher studies. To code the second variable, we collected data from exit
interviews for each employee in our sample who left the firm by 2011.
We also code control variables to indicate the gender of the employee (Male) and
whether or not the employee is from one of the underrepresented scheduled castes (SCs) or other
backward castes (OBCs) in India.10 The gender data was only available for the 2007 batch.
A key control variable relates to the distance of migration of the employee from her
home town/village to the technology center she is assigned to. We estimate this distance using
the distance of the district headquarters of the school district of the employee and the technology
center she is assigned to.11 The migration literature has long considered the costs of migration
related to the distance of migration. This literature dates back to Schultz (1971), who finds that
within country migration between pairs of regions is responsive to locational factors and,
specifically, that the distance between a focal region and each major city influences the cost of travel
and in turn affects the migration decision. The author also conjectures that travel costs are not
linearly related to time or distance of travel and uses the logarithm of the time in hours to travel
the distance as a proxy for the cost of migration. Recent papers that use similar measures include
McKenzie and Rapoport (2010) and Dahl and Sorenson (2010). Our paper is plausibly the first
paper where this measure is exogenously determined. The home town/village of each employee is
pre-determined, and the random assignment of employees to technology centers fixes the final
destination, so the distance of migration is exogenous.
We also control for cumulative grade point average (CGPA) at the end of training. This is
a cumulative grade point average score that controls for performance during training and is
expected to be positively correlated to subsequent performance within the firm.
10
As Banerjee et al. (2009) point out, the term ‘Scheduled Castes’ comes from the Ninth Schedule of the Indian
Constitution, which lists for each state in India the specific caste groups who are eligible to benefit from the
affirmative action provisions outlined in the Constitution.
11
The variable distance of migration is computed as follows. We use the latitudes and longitudes of the district
headquarters of the school district and the final location within INDTECH to which the employee is assigned. Using
the latitude and longitude of the pair of towns, we use the following formula to calculate the distance in kilometers:
ACOS(COS(RADIANS(90-Lat1))*COS(RADIANS(90-Lat2)) + SIN(RADIANS(90-Lat1))*SIN(RADIANS(90-Lat2))*COS(RADIANS(Long1-Long2)))*6371
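The spreadsheet formula in this footnote is the spherical law of cosines applied to the two colatitudes (90 minus latitude). A minimal Python sketch follows. The function name is hypothetical, the clamping step is an added safeguard against floating point rounding that is not part of the original formula, and the example coordinates are approximate values used only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # same radius as in the footnote formula


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km via the spherical law of cosines."""
    a = math.radians(90.0 - lat1)  # colatitude of the first town
    b = math.radians(90.0 - lat2)  # colatitude of the second town
    dlon = math.radians(lon1 - lon2)
    cos_c = math.cos(a) * math.cos(b) + math.sin(a) * math.sin(b) * math.cos(dlon)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_c = max(-1.0, min(1.0, cos_c))
    return math.acos(cos_c) * EARTH_RADIUS_KM


# Approximate coordinates (assumed for illustration): Bhubaneshwar and Bangalore.
d = distance_km(20.27, 85.84, 12.97, 77.59)
log_d = math.log(d)  # the regressions use the logarithm of the distance
```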
V. RESULTS
V.A. Econometric Specifications
To estimate whether or not being from a remote district for employee i affects short-term
and long-term performance, we run the following specification:
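1) $Y_i = \beta_0 + \beta_1\,(\text{from remote district})_i + \beta_2\,(\text{assigned mainstream})_i + L + \gamma' X_i + \varepsilon_i$,
where $Y_i$ is a measure of underlying short term (in 2008) or long term (in 2010) performance, L
indicates technology center fixed effects and $X_i$ indicates a vector of individual characteristics
including gender and whether or not individual is a member of scheduled caste/scheduled tribe.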
Given that performance is measured in normalized bands, we implement the specification using
an Ordered Logit model.12 To recap the benefit of exploiting the natural experiment, in our
setting, the variable assigned mainstream is arguably uncorrelated to the variable from remote or
observable measures of ability.
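To make the Ordered Logit step concrete, the sketch below shows how predicted probabilities for each rating band follow from a latent index and a set of cutpoints. It is an illustration only: the function names, the cutpoints and the coefficient on remote status are hypothetical and are not estimates from the paper.

```python
import math


def logistic_cdf(z):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(xb, cutpoints):
    """Return P(y = j) for j = 0..len(cutpoints), given the latent index xb
    and strictly increasing cutpoints."""
    # Cumulative probabilities P(y <= j); the last category closes at 1.
    cdf = [logistic_cdf(c - xb) for c in cutpoints] + [1.0]
    probs = []
    prev = 0.0
    for value in cdf:
        probs.append(value - prev)
        prev = value
    return probs


# Hypothetical values: two cutpoints give three rating bands, as for the
# short term rating; beta_remote is an assumed coefficient, not an estimate.
cutpoints = [-1.0, 1.0]
beta_remote = 0.5
p_non_remote = ordered_logit_probs(0.0, cutpoints)
p_remote = ordered_logit_probs(beta_remote, cutpoints)
```

With a positive coefficient, the probability of the highest band is larger for the remote index than for the non-remote index, which is the comparison that the predicted probabilities reported later rely on.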
To estimate whether or not there is positive selection of employees from remote districts,
we run the following specification:
2) $S_i = \alpha_0 + \alpha_1\,(\text{from remote district})_i + \delta' X_i + \varepsilon_i$,
where $S_i$ is the verbal/logical scores at the recruitment stage and $X_i$ indicates a vector of
individual characteristics including whether or not individual is a member of scheduled
caste/scheduled tribe. We implement this specification using OLS with robust standard errors.
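12
Let $y^*$ be the unobserved dependent variable measuring performance, $x$ be a vector of independent variables, $\beta$ be the unknown
parameter vector and $\varepsilon$ the error term, where $y^* = x'\beta + \varepsilon$.
Instead of $y^*$, we observe $y = j$ if $\mu_{j-1} < y^* \le \mu_j$, for $j = 1, \dots, J$, with $\mu_0 = -\infty$ and $\mu_J = +\infty$.
Consequently, $P(y = j \mid x) = F(\mu_j - x'\beta) - F(\mu_{j-1} - x'\beta)$, where $F(z) = 1/(1 + e^{-z})$ is the logistic distribution function.
In other words, $\log\left[ P(y \le j \mid x) / P(y > j \mid x) \right] = \mu_j - x'\beta$.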
To estimate whether or not being from a remote district for employee i affects the
probability that the employee leaves the firm to join a master’s degree program, we run the
following specification:
3) $Q_i = \theta_0 + \theta_1\,(\text{from remote district})_i + L + \lambda' X_i + \varepsilon_i$,
where $Q_i$ is a dummy variable indicating that the employee left the firm by 2011 to join a
master’s degree program, L indicates technology center fixed effects and $X_i$ indicates a vector of
individual characteristics including gender and whether or not individual is a member of
scheduled caste/scheduled tribe. Given the binary nature of the dependent variable, we
implement this specification using a Logit regression and robust standard errors.
V.B. Remote status and Performance
Figure II outlines the short term performance ratings (2008 performance ratings) for the remote
and non-remote employees for the 2007 batch.13 We run distributional tests to compare the short
term performance for remote and non-remote employees. We run the two-sample Wilcoxon
rank-sum (Mann-Whitney) test and reject the null hypothesis that the performance data for the
two groups follow the same distribution.
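For readers unfamiliar with the rank-sum test, a minimal sketch is given below. It uses the normal approximation with a tie correction and no continuity correction, which is an assumption of this sketch and may differ from the statistical package used for the reported test. The ratings are made-up placeholders on a three-point scale, not INDTECH data.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test.

    Returns (z, p) from the normal approximation with tie correction.
    Assumes the pooled sample is not constant (otherwise the variance is zero).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average rank for a block of ties
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return z, p


# Made-up ratings on a 1-3 scale, for illustration only.
remote_ratings = [3, 3, 3, 2, 3, 2]
non_remote_ratings = [1, 2, 1, 2, 2, 1]
z, p = rank_sum_test(remote_ratings, non_remote_ratings)
```

Because the ratings take only a few discrete values, ties are pervasive, which is why the tie correction in the variance matters in this setting.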
Tables III and IV report results for the Ordered Logit regression with robust standard
errors described in specification (1). For these regressions we only considered the 2007 batch
given that for the 2007 batch we could analyze the relation between home town/village and both
short term (end of 2008) and long term (end of 2010) performance within the firm. Table III
reports the relation between the variable from remote district and short term (end of 2008)
performance. Table IV reports this relationship for long term (end of 2010) performance. We
would also like to highlight that though the size of the 2007 batch is 1696 employees, the sample
size of the regression analysis in Tables III and IV is smaller than that, given the ‘9 month rule’
described earlier in Section IV.
Table III indicates that there is a positive and statistically significant relation between
being from a remote district and short term performance. This result is robust to controlling for
13
Summary statistics for short-term and long-term performance ratings for the 2007 batch are in Appendix III
whether or not the employee was assigned a mainstream technology center, technology center
fixed effects, test scores at the end of training, gender, whether or not the employee is from a
scheduled caste and distance of migration from home. Among the control variables, as expected,
being assigned a mainstream technology center and CGPA at the end of training are highly
correlated to short term performance. Columns IV, VIII and IX also indicate a positive and
statistically significant relation between being a member of a scheduled/other backward caste
and short term performance. Marginal effects are reported in the top graphic of Figure IV and
indicate that the predicted probability that an employee from a remote district will achieve the
highest performance rating (for short term performance) is 39% in the fully specified model. The
corresponding predicted probability that an employee from a non-remote district will achieve the
highest performance rating (for short term performance) is 28%.
Table IV reports the relation between long term (end of 2010) performance and being
from a remote district. Columns I-VII suggest that the relation between being from a remote district
and long term performance is not statistically significant. However, once we control for the log distance
of migration and introduce an interaction effect between being from remote district and log
distance of migration, we find that being from a remote district has a positive and statistically
significant relation with long term performance. Column IX indicates that the main effect of
being from a remote district on long term performance is positive and statistically significant and
the interaction effect is negative. Table IV also indicates that there is no statistically significant
relation between being from a scheduled or other backward caste and long term performance.
Here too, as expected, CGPA at the end of training is highly correlated to long term
performance.
Marginal effects are reported in the bottom graphic of Figure IV and indicate that the
predicted probability that an employee from a remote district will achieve the highest
performance rating (for long term performance) is 17% in the fully specified model. The
corresponding predicted probability that an employee from a non-remote district will achieve the
highest performance rating (for long term performance) is 5%. Here, we would like to recap that
long term performance (i.e. 2010 performance) has five possible ratings compared to short term
(i.e. 2008) performance which has three possible ratings. Given this, we also computed the
predicted probability of achieving the highest or second best performance rating for long term
performance. This combined predicted probability was 39% for remote employees; the
corresponding combined predicted probability was 14% for non-remote employees. In summary,
the difference in predicted probabilities for remote and non-remote employees was larger for
long term performance, compared to short term performance.
We also analyze the effect of distance of migration on the predicted probability of
achieving the highest performance rating for remote employees and results are reported in Figure
V. The predicted probability of achieving the highest performance rating for remote employees is
23% when migration distance is less than 2 miles. The predicted probability of achieving the
highest performance rating for remote employees is 10.4% when migration distance is 250 miles.
The predicted probability drops to 7.5% when the distance of migration is 1850 miles.14
V.C. Evidence of Selection – Remote Status and Recruitment Scores
As one possible explanation of our result, we test for selection. We build on the selection
models presented earlier and test whether or not employees from remote districts have higher
scores in tests of verbal and logical ability during the time of recruitment. Figure III outlines the
plot of verbal and logical scores for the remote and non-remote employees in our sample across
all years. We also run distributional tests to compare the verbal and logical scores for remote and
non-remote employees. We run the two-sample Wilcoxon rank-sum (Mann-Whitney) test and
reject the null hypothesis that the test scores data for the two groups follow the same distribution.
Tables V and VI present our regression results, which document a positive and statistically
significant relation between being from a remote district and higher verbal and logical scores
from the standardized tests at the time of recruitment. Table V relates remote status to logical
scores and Table VI does the same for verbal scores. Here we implement specification (2) and
use OLS with robust standard errors. In the base case, we run the regressions for the entire cohort
(employees hired in 2007, 2008 and 2009). Though the size of the cohort is 8520 employees, the
sample size for Tables V and VI is constrained by the availability of data to code the from
remote district variable. Among the control variables, as expected, the recruitment test scores are
highly correlated with CGPA at the end of training. The recruitment scores are also highly
correlated to being a member of a scheduled caste/OBC. These results provide empirical
validation that the firm is selecting high ability individuals from remote districts.
14
India measures 3,214 km (1,997 miles) from north to south and 2,933 km (1,822 miles) from east to west (Source:
http://en.wikipedia.org/wiki/Geography_of_India)
V.D. Remote Status and Attrition
Next we present evidence of a positive and statistically significant relation between being
from a remote district and quitting the firm to join a master’s degree program. Here we
implement specification (3) and employ a Logit regression with robust standard errors. The
sample size here is determined by the total number of employees in our sample who left the firm
by 2011 (N=1823 as indicated in Table I) and the availability of data to code the from remote
district variable. Results are reported in Table VII. The fully specified model indicates that
conditional on quitting the firm by 2011, the predicted probability that an employee from a
remote district will quit to join a master’s degree program is 45%. The corresponding predicted
probability that conditional on quitting the firm by 2011, an employee from a non-remote district
will quit to join a master’s degree program is 38%.
V.E. Robustness Checks
Our most important robustness check relates to verifying the validity of the natural
experiment, i.e. validating that the technology center allocation decision is not correlated with
observable measures of performance and employee level characteristics. Results are reported in
Table VIII and indicate that the decision to allocate an employee to a mainstream technology
center post induction training is not correlated to observable employee level characteristics (such
as being from a remote district) or observable measures of ability (such as CGPA at the end of
training or standardized test scores at recruitment stage). This validates the talent allocation
policy underlying the natural experiment.
In additional robustness checks, for the 2007 class, we dropped the individuals who left
the firm between 2007 and 2011 and ran the specifications and the results remained robust. We
also relaxed the definition of the from remote district variable. In the base case, we had taken the
most limiting definition of the variable, only coding the variable as 1 if there was no missing data
for the school, high school or college district and if all three of these districts were low
employment districts. In robustness checks, we relaxed this limitation of missing data and our
results remain robust. We get larger coefficient estimates for the ‘from remote district’ variable
in Tables III-VIII and the statistical significance remains the same or improves.
We also ran regressions to track individuals over time. Results are reported in the
appendix and indicate that employees from remote districts who achieved the highest
performance rating in the short term are more likely to achieve the highest performance rating
over the long term. We however found no evidence that remote employees disproportionately
improved performance between 2008 and 2010, compared to non-remote employees.
V.F. Survey of Urban and Rural Colleges
In addition to conducting the analysis using personnel data from INDTECH, we
conducted a survey of randomly selected rural and urban colleges in India, to study why
INDTECH might have a higher probability of finding a high ability individual in a remote
district. In other words, the purpose of the survey was to validate whether or not the demand for
skilled engineers and the recruitment opportunities for skilled engineers exhibited geographic
separation in line with the underlying assumptions of the refugee sorting model of Borjas (1994)
and the geographic skill segregation model of Young (2013).
To conduct this analysis, we randomly selected 10 urban and 10 rural colleges from the
list of colleges from which INDTECH hired its employees. We then contacted these colleges via
phone and/or email. Eleven of the 20 colleges agreed to participate in a telephonic survey that
lasted for around 30 minutes each. In these interviews, we asked questions about the total
graduating class size at these colleges, starting salaries for undergraduate engineers in 2011 and
2012 and asked questions related to which technology firms (both Indian and foreign
multinational) hired from these colleges and how many students were hired by each firm. The
interviews were conducted with either the head of the college, or the head of the group that was
responsible for organizing recruitment at the college. Results are reported in Table IX and
indicate that the mean salaries for 2011 and 2012 are significantly higher for the urban colleges
compared to the rural colleges. We also validated that this difference is statistically significant in
a t-test comparison of means. These findings are in line with the geographic returns to skills
segregation assumptions of Young (2013) and Borjas (1994) and indicate that firms such as
INDTECH can arbitrage such differences in returns to skills across geographic regions. In
addition, the survey reveals that not every technology firm hires from low employment districts.
Multinational technology firms predominantly hire from the urban colleges and INDTECH has
the distinctive policy of hiring from both urban and rural colleges.
VI. CONCLUSION
Our empirical study attempts to bridge two gaps in the migration literature in economics
– the lack of understanding on the role of the firm in facilitating migration and the relative lack
of focus in the literature on within country migration. We exploit a natural experiment within a
large Indian technology firm that allows us to isolate the relation between the prior location
(home town/village) of an employee and the employee’s subsequent performance and attrition within the firm.
The allocation of employees to technology centers is done by a computer application that
does not consider observable skills and other personal characteristics of the employee while
allocating employees to one of several technology centers. The firm follows this talent allocation
policy so that the end customers of the firm, which are firms mostly based in the U.S., are
indifferent among the particular INDTECH center that executes their project. As described
earlier, we do not use the random assignment of employees to technology centers as ‘treatment’;
instead we use the protocol to control for endogeneity concerns. This policy helps us avoid
endogeneity issues that would typically arise if employees were assigned to
technology centers close to their home town/village. If assignment of technology center was
based on proximity to home, then remote employees would have been systematically far away
from the larger technology centers and their performance might have been downward biased due
to missing out on agglomeration effects. In more usual settings, ethnic or social ties could also
determine the assignment of an employee to a technology center and this too could abet or
retard performance in a way that an econometrician may not be able to observe.
Our results indicate that employees hired from low employment districts of India outperform their non-remote counterparts in the short term. In the longer term, remote employees
continue to outperform their non-remote counterparts, once we control for the distance of
migration. An important point here is that although the distance of migration measure has been
used in the migration literature since Schultz (1971) and Schwartz (1973), given the
random assignment of employees to technology centers, our study is plausibly the first in which the
distance of migration is exogenously determined.
To provide a possible interpretation of our results, we test for selection and build on the
selection models in the migration literature. Borjas (1994) and Young (2013) are two possible
models that provide a plausible theoretical explanation for our results. In light of the selection
models, we provide evidence that employees hired from low employment districts have higher
verbal and logical test scores at the recruitment stage. Our interviews further corroborate this. To
quote the head of talent development at INDTECH, “If you are the best student in Bangalore,
you will probably never join INDTECH. Instead, you will join some MBA course to go to
America. However, if you are the best student in Sundergarh Orissa, a remote district with high
unemployment levels, INDTECH is your dream first job.”
To further establish the geographic separation of employment opportunities for
undergraduates in India and why INDTECH might be able to hire high ability individuals from
remote colleges, we conducted a survey at 11 randomly selected urban and rural engineering
colleges. The urban engineering colleges have 2.3 times higher starting salaries on average and
this difference is statistically significant. Multinational technology firms like IBM almost
exclusively visit urban colleges in our sample. The large difference in wages for skilled workers
graduating from urban and rural colleges is arguably an exposition of the underlying conditions
that lead to ‘refugee sorting’ in the Borjas (1994) model of self-selection of migrants.
Our results also indicate that remote employees are more likely to join a master’s degree
program when they leave the firm.
Our study has several limitations. Our natural experiment and data come from a single
firm. We follow the tradition of insider econometrics in personnel economics and collect data
from a single firm; future analysis needs to test our central findings in other settings and will
need to corroborate that (i) employees from low employment districts outperform their non-remote counterparts and (ii) this is attributable to selection in other settings. Also, though we
interpret our results using plausible selection models in the migration literature and the
regressions of recruitment test scores, we do not have a way to tease out alternative and
complementary mechanisms for why employees from remote districts perform better. It is
plausible that remote employees exert more effort (in addition to being higher ability) and we are
unable to test this.
Moreover, our empirical analysis only provides estimates of the performance premium of
hiring remote employees and does not provide estimates of the incremental costs related to hiring
from remote districts. Our interviews with managers at INDTECH indicate that there are
incremental costs of hiring from remote districts compared to hiring from cities like Bangalore
and we do not incorporate these additional costs in our theoretical or empirical analysis. These
costs relate to travel and search costs in remote locations and we do not have an estimate of how
long it took INDTECH to assemble the infrastructure to hire from remote districts and the sunk
costs of such investments.15 Given that smaller firms including technology startups in Beijing or
Bangalore may not be able to incur these additional costs and might focus their hiring around the
large cities, from a social planner’s point of view, it might make sense for government actors to
facilitate employment fairs in remote districts where a large number of smaller firms might be
able to hire highly skilled workers. We also do not have a way to compare the ‘hiring funnel
ratio’, a term used by hiring managers at INDTECH which measures the number of individuals
to whom a job offer is made as a fraction of individuals tested, for rural and urban colleges. It is
plausible that the ‘average’ remote student who is not hired (i.e. the average remote student who
is tested and not hired) is lower ability than the average non-remote student (similar student in
non-remote districts who is tested and not hired) and we do not have the data to test this.
Another limitation of our study is that, in equilibrium, the gains from hiring remote
employees are likely to disappear as firms set up centers in the remote districts over time and/or as
the barriers to within-country migration are gradually overcome. However, here again we borrow
from the long standing wisdom in the migration literature and posit that “neither migration, nor
any equilibrating force is strong enough to eliminate imbalances instantaneously” (Yap, 1976,
page 122).
Our study, though empirical in nature, informs the prior theory literature on migration
and labor markets. In addition to being related to the selection models in the migration literature,
including Borjas (1994) and Young (2013), our survey results provide evidence of a geographic
separation of skills and employment opportunities. This finding is related to the segmented labor
markets literature in economics, notably the work of Reich, Gordon and Edwards (1973) and
Dickens and Lang (1988).16
15
Our back-of-envelope analysis, however, does suggest a net gain in hiring remote employees. This is calculated as
follows. In the first step we compute the gains from hiring a remote employee. We base this analysis on 2008
performance data for the 2007 batch. We use the predicted probabilities of achieving the highest performance rating
for remote and non-remote employees. Figure IV indicates that remote employees are 11% more likely to achieve
the highest performance rating compared to non-remote employees. We assume that the entry level salaries are
~$8000 per year (at 2013 USD to Rupee exchange rates). We also assume that compared to employees who achieve
the highest performance rating, other employees need 35% more man-days to correct coding/testing/documentation
errors (this assumption was based on rough calculations with INDTECH human resource managers on the error rate
and the man-days lost due to coding/testing/documentation errors for top-tier employees compared to other
employees). We then calculate the gains from hiring a remote employee to be around $308 per year.
Next we compute the incremental costs of recruiting remote employees. This analysis is based on discussions with
recruiting managers at INDTECH and leads to an estimate of $21 incremental cost of hiring a remote employee.
This is based on the incremental travel costs for INDTECH executives involved in hiring from remote locations,
the additional search costs of visiting rural colleges and the drop in the ‘funnel ratio’ (number of students offered a
job as a percentage of the number of students who are invited for the recruitment tests) for rural colleges. Given
these numbers, the net gain of hiring a remote employee is estimated to be around $287 per employee per year. This
works out to around 4% of the entry level salary of an undergraduate employee.
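The back-of-envelope calculation in footnote 15 can be reproduced with a few lines of arithmetic. The sketch below rests on one reading of the footnote that happens to reproduce its figures, not on a formula the footnote states: it assumes the annual gain equals the 11% difference in the probability of a top rating, times the 35% additional man-days, times the ~$8000 entry level salary. All variable and function names are illustrative.

```python
# Back-of-envelope net gain from hiring a remote employee (footnote 15).
# Assumption (not spelled out in the footnote): gain = probability premium
# x extra man-days share x entry level salary.

ENTRY_SALARY_USD = 8000.0        # approximate entry level salary per year
TOP_RATING_PREMIUM = 0.11        # higher likelihood of the top rating for remote employees
EXTRA_MAN_DAYS = 0.35            # extra man-days needed by employees without the top rating
INCREMENTAL_HIRING_COST = 21.0   # incremental cost of recruiting a remote employee


def gross_gain(salary, premium, extra_man_days):
    """Expected annual saving, valued at the entry level salary."""
    return premium * extra_man_days * salary


def net_gain(salary, premium, extra_man_days, hiring_cost):
    """Gross gain minus the incremental cost of remote recruiting."""
    return gross_gain(salary, premium, extra_man_days) - hiring_cost


gain = gross_gain(ENTRY_SALARY_USD, TOP_RATING_PREMIUM, EXTRA_MAN_DAYS)
net = net_gain(ENTRY_SALARY_USD, TOP_RATING_PREMIUM, EXTRA_MAN_DAYS,
               INCREMENTAL_HIRING_COST)
share = net / ENTRY_SALARY_USD

print(f"gross gain: ${gain:.0f}, net gain: ${net:.0f}, share of salary: {share:.1%}")
```

Under these assumptions the gross gain is about $308 and the net gain about $287, or roughly 3.6% of the entry level salary, which the footnote rounds to around 4%.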
Our findings have several policy implications for India, which aspires to enjoy a
demographic dividend in the coming decades. As Chandrasekhar et al. (2006) point out, in 2020,
the average Indian will be only 29 years old, compared with the average age of 37 years in China
and the US, 45 in Western Europe and 48 in Japan. Within the next 30 years, 64 per cent of India’s
population will fall within working age range (15-64). Aiyar and Mody (2011) estimate that
these shifts in India’s age structure will contribute significantly to India’s economic growth,
adding upwards of 2 percentage points per year to India’s per capita GDP in the next two
decades. However, aligning economic outcomes with demographic trends is no easy task,
especially in a country with already existing regional inequalities. Deaton and Dreze (2002)
delineate how regional disparities increased in the 1990s, with southern and western regions
showing much higher rates of growth than the northern and eastern regions. While urban
populations have seen an increase in average per capita expenditures (ACPE) as high as 20 to 30
per cent, poorer states have seen minimal growth in ACPE in recent years. They also document
minimal reduction of poverty in rural areas.17 In addition, Chandrasekhar et al. (2006) point out
that absorption of the Indian youth into the labor force is not as high as one would expect. In our
own analysis, using data on where the young Indian population lives (collected from the Indian
census of 2001) and data on where the technology firms are located (based on membership data
of the National Association of Software and Services Companies or NASSCOM), we outline this
geographic mismatch in the context of the Indian technology sector. Table X outlines this
analysis. It is clear that in the absence of within-country migration that moves highly skilled young
people to the urban centers which have the employment opportunities, the perceived
demographic dividend could become a ‘demographic albatross’.
16
The theory of segmented labor markets came into prominence in the 1970s. As Reich, Gordon and Edwards
(1973) point out, “American workers seemed to operate in different labor markets, with different working
conditions, different promotional opportunities, different wages and different market institutions”. The authors also
point out that these segmented labor markets were created through differences in race, sex, educational credentials,
etc. The theory of segmented labor markets was implicitly used by authors such as Summers (1986) in analyzing
differences in unemployment rates among workers belonging to different age and gender characteristics. The theory
also made a prominent comeback in the late 1980s in work by Dickens and Lang (1988). The authors summarize the
key propositions of this theory. Firstly, the labor market has two sectors – a high wage primary sector with stable
employment and substantial returns to human capital variables such as education and a low wage secondary sector
with the opposite characteristics. Moreover, primary jobs are rationed: not every worker who desires a job in the
primary sector can obtain one.
17
Deaton and Dreze used adjusted headcount ratios (HCR), poverty indexes, and per capita expenditures to estimate
the gaps between rural and urban poverty lines in periods from 1987-88, 1993-94, and 1999-00 across states and
regions.
Our results also have policy implications for other large developing countries such as
China which have witnessed significant within-country migration and have arguably seen an
increase in regional disparity. Yang (2002) documents how regional inequality in China has risen
in the past two decades. The author attributes this inequality primarily to a large rural–urban
income gap and growing inland–coastal disparity and documents that the ratio of urban–rural
income and consumption hovered between 2 and 3.5 since the inception of reform. Yang also
documents that per capita production and consumption diverged across China’s regions—the
initially rich coastal provinces were better off and the interior provinces became relatively
disadvantaged during the reform period (Fleisher & Chen, 1997; Kanbur & Zhang, 1999).
Overall, the indices of regional inequality first showed moderate declines, but then rose. Towards
the end of the 1990s, they gradually climbed to the peak historical levels seen during the Great Leap Famine
(Kanbur & Zhang, 2002).
Our work also has relevance for developed countries like the United States. In the context
of the U.S. labor market, economists have also found that geographic location has a significant
effect on wages. To quote Moretti (2010), “the hourly wage of workers located in metropolitan
areas at the top of the wage distribution is more than double the wage of observationally similar
workers located in metropolitan areas at the bottom of the distribution”, (Moretti 2010, page
1238). In the same light, in his new book titled ‘The New Geography of Jobs’, Moretti states that
“your salary depends more on where you live than on your resume”, (Moretti, 2012; page 88). In
addition, building on Katz, Kling and Liebman (2001), the geographic disparities in wages and
employment opportunities in the U.S. are accentuated by residential segregation by race and
by income in the large metropolitan areas. In this context, the authors conducted the “moving to
opportunity” experiments in Boston where, in the context of high poverty public housing projects,
certain households received ‘Section 8 housing vouchers’ that could be used to help pay
rent to private landlords. Children in households offered vouchers valid only in low poverty
neighborhoods had reduced likelihoods of injury and victimization by crime.
In conclusion, both in the U.S. and in developing countries, firms can play an important
role in facilitating the within-country migration of talent from low employment areas to areas with
more employment opportunities. Our results indicate that a focal firm can benefit from such
hiring practices if it is able to effectively screen and select high ability individuals in remote
districts and if such individuals perform better compared to their non-remote counterparts.
TABLE I
EMPLOYEE CHARACTERISTICS FOR 2007 BATCH AND FOR ALL BATCHES
                                                       2007 Batch                All Years (2007-2009)
                                                  N      Mean   Std. Dev.       N      Mean   Std. Dev.
From remote district                            1578     0.38     0.49        3174     0.33     0.47
Assigned mainstream center                      1621     0.89     0.31        7497     0.89     0.31
CGPA (end of training)                          1696     4.48     0.43        8517     4.50     0.51
Logical score (recruitment test)                1635     4.72     3.67        8401     0.68     3.71
Verbal score (recruitment test)                 1635     4.29     3.98        8401     0.57     4.34
Member of scheduled or other backward caste     1696     0.51     0.50        8520     0.10     0.30
Male                                            1696     0.65     0.48        1696     0.65     0.48
Log distance of migration                       1189     6.30     1.58        2487     6.17     1.67
Quit firm by 2011                               1696     0.42     0.49        8520     0.21     0.41
Quit for higher studies                          713     0.39     0.49        1823     0.24     0.42
Notes: This table lists employee characteristics for the 2007 batch (columns 1-3) and for all the batches
who join in 2007, 2008, 2009 (columns 4-6). The variable ‘from remote district’ is coded as 1 if the
individual went to school, high school and college in a low employment district. This variable was coded
based on resumes collected for the employees; the data to code this variable is available for 93% of the
2007 batch and for 37% of employees for all batches. The variable ‘assigned mainstream center’ is coded
as 1 if the individual is randomly assigned one of the mainstream technology centers of the firm; data to
code this variable is available for 96% of the 2007 batch and for 88% of employees from all batches.
CGPA is the cumulative grade point average at the end of the training and is available for the entire 2007
batch and 99.9% of employees from all batches. The logical and verbal scores are from the standardized
multiple-choice recruitment tests; the standardized test includes negative penalties for wrong answers.
The test was changed between 2007 and 2008 and this reflects in the mean scores for the 2007 and later
batches; this information is available for 96% of the 2007 batch and 99% of employees from all batches.
The variable ‘member of scheduled or other backward caste’ is coded as 1 if the employee is member of
one of the scheduled or other backward castes (OBCs). This variable is available for the entire batch; the
mean for the 2007 batch is significantly higher than the mean for other batches. There were changes in
government reservation policy towards OBCs in 2008 (on 10th April 2008 the Supreme Court of India
upheld the government's initiative of 27% OBC quotas in government-funded institutions); however it is
unclear if that affected the change in fraction of SC/OBC employees between 2007 and 2008. The
variable ‘log distance of migration’ is the log of the distance from the district headquarters of the school
district of the employee to the final technology center the employee is assigned to. This distance was
calculated from the latitude and longitude of the two locations (district headquarters of school district and
final location), using an algorithm that computes the distance between two locations by the arc
radians method. Data to code this variable were available for 70% of the 2007 batch and for 29% of
employees from all years. The variable ‘quit firm by 2011’ was available for all the batches across the
years; the variable ‘quit for higher studies’ was coded based on data from exit interviews.
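The distance calculation described in the notes can be sketched as follows. The notes do not spell out the ‘arc radians method’; the sketch assumes it is the spherical law of cosines applied to coordinates converted to radians, and it assumes a natural log of kilometers. The Earth radius, the function names and the example coordinates are illustrative and are not taken from the INDTECH data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, not from the paper


def great_circle_km(lat1, lon1, lat2, lon2):
    """Distance in km between two points given as (latitude, longitude) in
    degrees, using the spherical law of cosines on radian coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_lambda = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(delta_lambda))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_KM * math.acos(cos_angle)


def log_distance_of_migration(lat1, lon1, lat2, lon2):
    """Natural log of the distance in km (undefined when the distance is zero)."""
    return math.log(great_circle_km(lat1, lon1, lat2, lon2))


# Approximate, illustrative coordinates for two cities that host INDTECH centers.
bhubaneswar = (20.30, 85.82)
bangalore = (12.97, 77.59)

distance = great_circle_km(*bhubaneswar, *bangalore)
print(round(distance), round(log_distance_of_migration(*bhubaneswar, *bangalore), 2))
```

For distances of a few hundred to a few thousand kilometers the natural log falls between roughly 5 and 8, which is the same order of magnitude as the means reported in Table I.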
TABLE II
MIGRATION PATTERNS
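School        High School   College       Share of 2007 batch
Remote        Remote        Remote        38%
Remote        Remote        Non-remote    12%
Non-remote    Non-remote    Remote        20%
Non-remote    Non-remote    Non-remote    21%
Remote = low employment district; Non-remote = high employment district.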
Notes: This table outlines the various migration patterns that emerge for employees in the sample
collected. Similar to Young (2013), we observe bi-directional migration flows in our data (i.e. remote to
non-remote and non-remote to remote). We also observe students who spend their entire educational
career in remote districts and non-remote districts.
1. Path 1 (top row) refers to the set of employees who spend their entire educational career in a low
employment district prior to being hired by INDTECH. For the 2007 batch, this proportion is 38%
and for the entire batch, the proportion is 33%. Only these employees are coded as being from a remote
district.
2. Path 2 (bottom row) refers to employees who spend their entire educational career in a high
employment district prior to being hired by INDTECH. Their proportion is 21% in the 2007 batch and
16% for all batches.
3. The proportion of students who went to a (i) non-remote school, (ii) non-remote high school and
(iii) remote college is 20% in the 2007 batch and 24% for all batches.
4. The proportion of students who went to a (i) remote school, (ii) remote high school and (iii) non-remote college is 12% in the 2007 batch and 9% for all batches.
5. The proportion of students who went to a (i) non-remote school, (ii) remote high school and
(iii) remote college is 3% in the 2007 batch and 1% for all batches.
6. The proportion of students who went to a (i) remote school, (ii) non-remote high school and
(iii) remote college is 3% in the 2007 batch and 13% for all batches.
7. The proportion of students who went to a (i) remote school, (ii) non-remote high school and (iii)
non-remote college is 2% in the 2007 batch and 3% for all batches.
8. The proportion of students who went to a (i) non-remote school, (ii) remote high school and (iii)
non-remote college is 1% in the 2007 batch and 1% for all batches.
9. As Table I indicates, the data to code this variable is available for 93% of the employees in the 2007
batch and for 37% of employees for all batches.
TABLE III
ORDERED LOGIT OF REMOTE STATUS ON SHORT TERM PERFORMANCE
Dependent Variable = Performance at the end of 2008
                                I         II        III       IV        V         VI        VII       VIII      IX
From remote district            0.29*     0.41**    0.30*     0.32*     0.31*     0.39*     0.33*     0.46**    0.52***
                                (0.18)    (0.18)    (0.18)    (0.18)    (0.18)    (0.20)    (0.18)    (0.19)    (0.18)
CGPA (end of training)          -         2.31***   -         -         -         -         -         2.39***   2.39***
                                          (0.30)                                                      (0.31)    (0.31)
Assigned mainstream center      -         -         0.53*     -         -         -         -         0.68**    -
                                                    (0.28)                                            (0.30)
Member of scheduled or          -         -         -         0.34**    -         -         -         0.48**    0.49***
other backward caste                                          (0.16)                                  (0.18)    (0.17)
Male                            -         -         -         -         0.30*     -         -         0.22      0.32*
                                                                        (0.17)                        (0.18)    (0.18)
Log distance of migration       -         -         -         -         -         0.00      -         -         -
                                                                                  (0.06)
Fixed Effects for center        -         -         -         -         -         -         Yes       -         Yes
N                               627       627       604       627       627       486       627       604       627
Notes: This table reports the relation between the variable from remote district and short term (end of
2008) performance and implements specification (1) using an Ordered Logit Regression with robust
standard errors (reported in parentheses). We only considered the 2007 batch given that for the 2007
batch we could analyze the relation between home town/village and both short term (end of 2008) and
long term (end of 2010) performance within the firm (reported in Table IV). Though the size of the 2007
batch is 1696 employees, the sample size of the regression analysis in Tables III and IV is lower than that,
given the ‘9 month rule’ described earlier in Section IV. This table indicates that there is a positive and
statistically significant relation between being from a remote district and short term performance. There is
also a positive and statistically significant relation between being from a scheduled/other
backward caste and short term performance.
*Significant at the 10% level. **Significant at the 5% level. *** Significant at the 1% level.
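Ordered logit coefficients are expressed in log-odds, so their size is easier to read once exponentiated into proportional odds ratios. The sketch below applies this standard conversion to the ‘from remote district’ coefficient in Column I of Table III (0.29, robust standard error 0.18). The interval uses a normal approximation, which is an assumption of this illustration and not a result reported in the paper; function names are illustrative.

```python
import math


def odds_ratio(coefficient):
    """Proportional odds ratio implied by an ordered logit coefficient."""
    return math.exp(coefficient)


def odds_ratio_interval(coefficient, std_error, z=1.96):
    """Approximate 95% interval for the odds ratio, assuming the coefficient
    estimate is normally distributed (an assumption of this sketch)."""
    return (math.exp(coefficient - z * std_error),
            math.exp(coefficient + z * std_error))


# 'From remote district', Column I of Table III.
ratio = odds_ratio(0.29)
low, high = odds_ratio_interval(0.29, 0.18)
print(f"odds ratio {ratio:.2f}, approx. 95% interval [{low:.2f}, {high:.2f}]")
```

Read this way, a coefficient of 0.29 corresponds to roughly one-third higher odds of being in a higher performance category for remote employees, with an approximate 95% interval that still includes 1, consistent with significance at the 10% level only.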
TABLE IV
ORDERED LOGIT OF REMOTE STATUS ON LONG TERM PERFORMANCE
Dependent Variable = Performance at the end of 2010
                                I         II        III       IV        V         VI        VII       VIII      IX
From remote district            0.10      0.14      0.11      0.10      0.10      0.09      0.09      1.19*     1.37**
                                (0.11)    (0.12)    (0.12)    (0.12)    (0.12)    (0.12)    (0.13)    (0.62)    (0.62)
CGPA (end of training)          -         1.23***   -         -         -         -         -         -         1.33***
                                          (0.19)                                                                (0.22)
Assigned mainstream center      -         -         -0.02     -         -         -         -         -         -0.04
                                                    (0.19)                                                      (0.23)
Fixed Effects for center        -         -         -         Yes       -         -         -         -         -
Member of scheduled or          -         -         -         -         0.01      -         -         -         0.06
other backward caste                                                    (0.11)                                  (0.14)
Male                            -         -         -         -         -         0.52***   -         -         0.45***
                                                                                  (0.12)                        (0.15)
Log distance of migration       -         -         -         -         -         -         -0.09*    0.02      0.05
                                                                                            (0.05)    (0.07)    (0.07)
From remote * log distance of   -         -         -         -         -         -         -         -0.17*    -0.19*
migration                                                                                             (0.10)    (0.10)
N                               1165      1165      1130      1163      1165      1165      853       853       828
Notes: This table reports the relation between the variable from remote district and long term (end of 2010) performance and implements
specification (1) using an Ordered Logit Regression with robust standard errors (reported in parentheses). Like in Table III, we only considered the
2007 batch. The sample size is less than the size of the 2007 batch (1696 employees) given the ‘9 month rule’ (Section IV). The table indicates a
positive and statistically significant relation between being from a remote district and long term performance once we control for the distance of migration
and introduce an interaction term between being from a remote district and distance of migration (Columns VIII and IX). *Significant at the 10%
level. **Significant at the 5% level. *** Significant at the 1% level.
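With an interaction term, the implied log-odds effect of remote status depends on the distance of migration: it is the ‘from remote district’ coefficient plus the interaction coefficient times the log distance. The sketch below works this out for the Column VIII point estimates (1.19 and -0.17) at the mean log distance for the 2007 batch from Table I (6.30). It uses point estimates only and ignores sampling uncertainty; function names are illustrative.

```python
def remote_effect(log_distance, b_remote, b_interaction):
    """Implied log-odds effect of remote status at a given log distance."""
    return b_remote + b_interaction * log_distance


def break_even_log_distance(b_remote, b_interaction):
    """Log distance at which the implied remote effect equals zero."""
    return -b_remote / b_interaction


# Column VIII of Table IV: from remote district = 1.19, interaction = -0.17.
B_REMOTE, B_INTERACTION = 1.19, -0.17
MEAN_LOG_DISTANCE_2007 = 6.30  # Table I, 2007 batch

effect_at_mean = remote_effect(MEAN_LOG_DISTANCE_2007, B_REMOTE, B_INTERACTION)
threshold = break_even_log_distance(B_REMOTE, B_INTERACTION)
print(f"effect at mean log distance: {effect_at_mean:.2f}; break-even at {threshold:.1f}")
```

At the mean log distance the implied effect is about 0.12, close to the coefficients in Columns I to VII, and it reaches zero at a log distance of about 7.0. The remote premium implied by these estimates therefore shrinks as employees are assigned farther from home.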
TABLE V
OLS OF REMOTE STATUS ON LOGICAL SCORE DURING RECRUITMENT TEST
Dependent variable = Logical score at recruitment test
                        I         II        III       IV        V
From remote district    0.85***   0.82***   0.69***   0.46***   0.42***
                        (0.17)    (0.16)    (0.15)    (0.14)    (0.14)
CGPA                    -         1.46***   -         -         0.62***
                                  (0.14)                        (0.12)
Member of scheduled or  -         -         4.02***   -         2.44***
other backward caste                        (0.16)              (0.17)
Verbal score            -         -         -         0.47***   0.38***
                                                      (0.01)    (0.01)
N                       3118      3118      3118      3118      3118
Notes: This table presents our regression results establishing a positive and statistically significant
relation between being from a remote district and logical scores from the standardized tests at the time of
recruitment. Here we implement specification (2) and use OLS with robust standard errors (reported in
parentheses). In the base case, we run the regressions for the entire cohort (employees hired in 2007, 2008
and 2009). Though the size of the cohort is 8520 employees, the sample size for the regressions in this
table is constrained by the availability of data to code the from remote district variable.
*Significant at the 10% level. **Significant at the 5% level. *** Significant at the 1% level.
TABLE VI
OLS OF REMOTE STATUS ON VERBAL SCORE DURING RECRUITMENT TEST
Dependent variable = Verbal score at recruitment test
                        I         II        III       IV        V
From remote district    0.82***   0.79***   0.67***   0.33**    0.31**
                        (0.19)    (0.18)    (0.17)    (0.16)    (0.15)
CGPA                    -         1.53***   -         -         0.62***
                                  (0.16)                        (0.14)
Member of scheduled or  -         -         3.87***   -         1.78***
other backward caste                        (0.18)              (0.18)
Logical score           -         -         -         0.58***   0.50***
                                                      (0.02)    (0.02)
N                       3118      3118      3118      3118      3118
Notes: This table presents our regression results establishing a positive and statistically significant
relation between being from a remote district and verbal scores from the standardized tests at the time of
recruitment. Here we implement specification (2) and use OLS with robust standard errors (reported in
parentheses). In the base case, we run the regressions for the entire cohort (employees hired in 2007, 2008
and 2009). Though the size of the cohort is 8520 employees, the sample size for the regressions in this
table is constrained by the availability of data to code the from remote district variable.
*Significant at the 10% level. **Significant at the 5% level. *** Significant at the 1% level.
TABLE VII
LOGIT OF REMOTE STATUS ON ATTRITION TO JOIN MASTERS DEGREE PROGRAM
Dependent variable: Quit firm to join master’s degree program
                            I         II        III       IV        V
From remote district        0.35**    0.25*     0.31**    0.33**    0.31**
                            (0.14)    (0.15)    (0.15)    (0.14)    (0.16)
Assigned mainstream center  -         0.45*     -         -         0.49*
                                      (0.27)                        (0.28)
CGPA (end of training)      -         -         1.94***   -         1.34***
                                                (0.14)              (0.19)
Member of scheduled or      -         -         -         0.46***   -0.34**
other backward caste                                      (0.14)    (0.16)
N                           1059      767       1059      1059      767
Notes: In this table, we present evidence of a positive and statistically significant relation between being
from a remote district and quitting the firm to join a master’s degree program. Here we implement
specification (3) and employ a Logit regression with robust standard errors. The sample size here is
determined by the total number of employees in our sample who left the firm till 2011 (N=1823 as
indicated in Table I) and the availability of data to code the from remote district variable.
*Significant at the 10% level. **Significant at the 5% level. *** Significant at the 1% level.
TABLE VIII
LOGIT OF REMOTE STATUS AND TEST SCORES ON BEING ASSIGNED MAINSTREAM
CENTER
Dependent variable: Assigned mainstream center
                                    I         II        III       IV        V
From remote district                -0.02     -         -         -         -0.06
                                    (0.17)                                  (0.17)
Logical score at recruitment test   -         0.02      -         -         0.02
                                              (0.02)                        (0.02)
Verbal score at recruitment test    -         -         -0.01     -         -0.01
                                                        (0.02)              (0.02)
CGPA (end of training)              -         -         -         -0.22     -0.15
                                                                  (0.23)    (0.26)
N                                   1508      1564      1564      1621      1459
Notes: This table reports results from an important robustness check, that the technology center allocation
decision is not correlated to observable measures of performance and employee level characteristics such
as being from a remote district.
TABLE IX
SURVEY OF URBAN AND RURAL ENGINEERING COLLEGES
                                                                          Urban colleges   Rural colleges
Average size of graduating class in computer science/IT
(undergraduate & masters)                                                      342              458
Average percentage of graduating class in computer science/IT
hired by INDTECH (in 2011, 2012)                                               0.17             0.06
Average percentage of graduating class in computer science/IT
hired by multinational technology firms IBM and Cognizant (in 2011, 2012)      9%               1%
Mean annual salary (Rupees Lakhs, 2011 and 2012 average)                       6.2              2.7
N                                                                              7                4
Notes: The researchers randomly selected 10 urban and 10 rural engineering colleges from the list of
colleges that INDTECH hires from and contacted the colleges to participate in a telephonic survey. The
researchers were able to conduct interviews with 7 out of the 10 urban colleges and these included R.V.
College of Engineering, Bangalore; M.S. Ramaiah Institute of Technology, Bangalore; MLR Institute of
Technology, Hyderabad; Muffakham Jah College of Engineering and Technology, Hyderabad; Vasavi
College of Engineering, Hyderabad; G.Narayanamma Institute of Technology & Science (GNITS),
Hyderabad and Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad. The
researchers were also able to conduct interviews with 4 out of the 10 selected rural colleges. These
included M.J.P. Rohilkhand University; Majhighariani Institute of Technology & Science, Rayagada
Orissa; Bapatla Engineering College, Guntur and Jaya Prakash Narayan College of Engineering,
Mahabubnagar.
The survey results indicate that the mean salaries for 2011 and 2012 are significantly higher for the urban
colleges compared to the rural colleges and this difference is statistically significant for a t-test
comparison of means. In addition, the survey reveals that multinational technology firms predominantly
hire from the urban colleges while INDTECH has the distinctive policy of hiring from both urban and
rural colleges. Rs.1 lakh = Rs. 100,000.
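The salary gap in Table IX can be restated as a ratio and as an absolute difference in rupees. The short sketch below uses only the two mean salaries reported in the table and the conversion Rs. 1 lakh = Rs. 100,000; variable names are illustrative.

```python
RUPEES_PER_LAKH = 100_000

urban_salary_lakhs = 6.2  # Table IX, urban colleges
rural_salary_lakhs = 2.7  # Table IX, rural colleges

# Ratio of mean starting salaries and absolute gap in rupees.
ratio = urban_salary_lakhs / rural_salary_lakhs
gap_rupees = (urban_salary_lakhs - rural_salary_lakhs) * RUPEES_PER_LAKH

print(f"urban/rural ratio: {ratio:.1f}, gap: Rs. {gap_rupees:,.0f}")
```

The ratio of about 2.3 matches the figure quoted in the conclusion, and the gap corresponds to Rs. 3.5 lakh per year.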
TABLE X
PERCENTAGE OF YOUNG POPULATION AND TECHNOLOGY FIRMS IN INDIAN STATES
                      What percentage of the       Percentage of NASSCOM
                      20-34 years population in    firms that are headquartered
State                 India lives in the state     in the state
Uttar Pradesh                  0.34                         0.08
Maharashtra                    0.23                         0.22
West Bengal                    0.19                         0.04
Andhra Pradesh                 0.18                         0.11
Bihar                          0.17                         0.00
Tamil Nadu                     0.15                         0.10
Madhya Pradesh                 0.13                         0.00
Karnataka                      0.13                         0.20
Gujarat                        0.12                         0.03
Rajasthan                      0.12                         0.01
Orissa                         0.09                         0.00
Kerala                         0.08                         0.01
Assam                          0.06                         0.00
Jharkhand                      0.06                         0.00
Punjab                         0.06                         0.00
Haryana                        0.05                         0.12
Chattisgarh                    0.05                         0.00
Delhi                          0.04                         0.05
Jammu and Kashmir              0.02                         0.00
Uttarakhand                    0.02                         0.00
Himachal Pradesh               0.01                         0.00
Tripura                        0.01                         0.00
Manipur                        0.01                         0.00
Meghalaya                      0.01                         0.00
Notes: This table tabulates the percentage of population in the age group of 20-34 years living in each
Indian state and the corresponding fraction of information technology firms listed with the National
Association of Software and Services Companies (NASSCOM) in India. States with less than 0.01
percent of the population in age group 20-34 years were dropped from the table. Data collected by the
researchers.
FIGURE I
TECHNOLOGY CENTERS OF INDTECH IN INDIA
Notes: INDTECH has 10 technology centers in India. We code seven of these 10 locations as mainstream
locations and these are Bangalore, Hyderabad, Chennai, Mysore, Mangalore, Trivandrum and Pune.
These centers are marked in blue. The remaining three locations, Bhubaneshwar, Jaipur and Chandigarh,
are coded as non-mainstream locations and are marked in red. We did this coding in consultation with the
head of talent development at INDTECH and the underlying rationale is predominantly based on the
geographic distance to the headquarters (Bangalore).
FIGURE II
SHORT TERM (END OF 2008) PERFORMANCE FOR 2007 BATCH
[Figure: distributions of short term performance ratings (horizontal axis, 1 to 3) for Not Remote and
Remote employees.]
Notes: This graphic plots the distributions of short term performance (i.e. performance at the end of 2008)
for the 2007 batch. Interviews with managers at INDTECH indicate that short term performance is
measured on two dimensions: (i) error rate in coding/testing and (ii) completeness in coding/testing and
documentation. It is distributed across three possible discrete ratings, with the distribution of ratings
across employees fitted using a normal distribution.
We also conduct distributional tests to compare the short term performance of remote and non-remote
employees. We run the two-sample Wilcoxon rank-sum (Mann-Whitney) test and reject, at the 10 percent
level, the null hypothesis that the performance data for the two groups follow the same distribution
(Prob > |z| = 0.0937).
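For readers who want to see how such a test statistic is formed, the following is a minimal Python sketch of the two-sample Wilcoxon rank-sum test using the normal approximation with a tie-corrected variance. The function name `rank_sum_test`, the coding of ratings as 1 to 3, and the sample ratings are illustrative assumptions; they are not INDTECH data and will not reproduce the p-value reported above.

```python
import math
from collections import Counter


def rank_sum_test(x, y):
    """Two-sample Wilcoxon rank-sum (Mann-Whitney) test.

    Uses the normal approximation with a tie-corrected variance and
    returns (z, two_sided_p). Undefined if every observation is tied.
    """
    pooled = sorted(list(x) + list(y))
    n1, n2 = len(x), len(y)
    n = n1 + n2

    # Assign midranks: tied values share the average of their ranks.
    midrank = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j

    w = sum(midrank[v] for v in x)           # rank sum of the first sample
    expected_w = n1 * (n + 1) / 2            # mean of W under the null
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

    z = (w - expected_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value, Prob > |z|
    return z, p


# Hypothetical ratings on an assumed three-point scale (1 = lowest, 3 = highest).
remote = [2, 3, 2, 2, 3, 1, 2, 2]
not_remote = [1, 2, 2, 1, 2, 1, 2, 3]
z, p = rank_sum_test(remote, not_remote)
print(f"z = {z:.3f}, Prob > |z| = {p:.4f}")
```

With only three discrete ratings, ties are pervasive, which is why the sketch uses midranks and the tie-corrected variance rather than the untied formula.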
FIGURE III
LOGICAL AND VERBAL SCORES FROM RECRUITMENT TEST
[Figure: two kernel density panels (vertical axes labelled "kdensity ls" and "kdensity vs"), each plotted
against x for Remote and Not Remote employees.]
Notes: This graphic plots the kernel densities of logical scores (top panel) and verbal scores (bottom
panel) for remote and non-remote employees. These scores are based on standardized tests conducted at
the time of recruitment. We also conduct distributional tests to compare the scores for remote and
non-remote employees. We run the two-sample Wilcoxon rank-sum (Mann-Whitney) test and reject the null
hypothesis that the scores for the two groups follow the same distribution (Prob > |z| = 0.0000).
FIGURE IV
PREDICTED PROBABILITIES FOR SHORT TERM AND LONG TERM PERFORMANCE
[Figure: two panels, "Predicted probabilities - Short term performance" and "Predicted probabilities -
Long term performance". Each plots the predicted probability (vertical axis) against performance slabs
(lowest, middle and highest rating) for Remote and Non remote employees.]
Notes: This graphic plots the predicted probabilities of achieving the highest performance rating for
remote and non-remote employees in the short term and in the long term, based on the regressions
reported in Tables III and IV. The top panel plots the predicted probabilities for short term (end of 2008)
performance and the bottom panel plots the predicted probabilities for long term (end of 2010)
performance, both for the 2007 batch. Both plots indicate an 11-12% higher predicted probability of
achieving the highest performance rating for a remote employee, vis-à-vis a non-remote employee.
FIGURE V
MIGRATION DISTANCE AND PREDICTED PROBABILITIES FOR ACHIEVING HIGHEST
RATING IN LONG TERM PERFORMANCE (FOR REMOTE EMPLOYEES)
[Figure: predicted probability of achieving the highest rating for long term performance, for remote
employees (vertical axis, 0% to 25%), plotted against migration distance in miles (horizontal axis, 0 to
2000).]
Notes: This graphic plots the effect of distance of migration on the predicted probability of achieving the
highest performance rating for remote employees. This analysis was done for long term performance
(performance at the end of 2010) for the 2007 batch. The predicted probability of achieving the highest
performance rating for remote employees is 23% when migration distance is less than 2 miles. The
predicted probability of achieving the highest performance rating for remote employees is 10.4% when
migration distance is 250 miles. The predicted probability drops to 7.5% when the distance of migration
is 1850 miles.
APPENDIX 1
REFUGEE SORTING MODEL FROM BORJAS (1994) pages 1687-1689
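Suppose the residents of country 0 (source) consider migrating to country 1 (host), where the earnings
distributions in the two countries are given by:
$$\log w_0 = \mu_0 + \varepsilon_0, \qquad \varepsilon_0 \sim N(0, \sigma_0^2)$$
and
$$\log w_1 = \mu_1 + \varepsilon_1, \qquad \varepsilon_1 \sim N(0, \sigma_1^2)$$
The migration decision is determined by a comparison of earning opportunities across countries, net of
migration costs (C).
The model then defines an index function:
$$I = \log\left(\frac{w_1}{w_0 + C}\right) \approx (\mu_1 - \mu_0 - \pi) + (\varepsilon_1 - \varepsilon_0)$$
where $\pi = C / w_0$ gives a “time-equivalent” measure of migration cost. A worker migrates to the host
country if $I > 0$.
The model then computes the conditional means $E(\log w_0 \mid I > 0)$, which gives the earnings of
immigrants prior to migration, and $E(\log w_1 \mid I > 0)$, which gives immigrant earnings in the host
country. Writing $v = \varepsilon_1 - \varepsilon_0$, $E(\log w_0 \mid I > 0)$ can be written as
$$E(\log w_0 \mid I > 0) = \mu_0 + E\big(\varepsilon_0 \mid v > -(\mu_1 - \mu_0 - \pi)\big)$$
and $E(\varepsilon_0 \mid I > 0)$ reduces to
$$E(\varepsilon_0 \mid I > 0) = \rho_{0v}\,\sigma_0\,\lambda = \frac{\sigma_0 \sigma_1}{\sigma_v}\left(\rho - \frac{\sigma_0}{\sigma_1}\right)\lambda$$
where $\rho_{0v}$ is the correlation between $\varepsilon_0$ and $v$, $\sigma_v$ is the standard deviation of
$v$, $\rho$ is the correlation between $\varepsilon_0$ and $\varepsilon_1$, $\lambda = \phi(z) / (1 - \Phi(z))$
and $z = -(\mu_1 - \mu_0 - \pi) / \sigma_v$. Similarly, $E(\varepsilon_1 \mid I > 0)$ reduces to
$$E(\varepsilon_1 \mid I > 0) = \frac{\sigma_0 \sigma_1}{\sigma_v}\left(\frac{\sigma_1}{\sigma_0} - \rho\right)\lambda$$
The model then defines
$$Q_0 = E(\varepsilon_0 \mid I > 0) \quad \text{and} \quad Q_1 = E(\varepsilon_1 \mid I > 0)$$
Refugee sorting happens under $Q_0 < 0$ and $Q_1 > 0$, and this happens if $\rho$ (the correlation coefficient
between $\varepsilon_0$ and $\varepsilon_1$) is very small. The exact condition is
$$Q_0 < 0 \text{ and } Q_1 > 0 \quad \text{iff} \quad \rho < \min\left(\frac{\sigma_1}{\sigma_0}, \frac{\sigma_0}{\sigma_1}\right)$$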
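A small numerical sketch can make the refugee sorting condition concrete. It assumes the usual joint-normal version of the model, in which the source and host earnings disturbances have standard deviations `sigma0` and `sigma1` and correlation `rho`, and it evaluates the selection terms Q0 (source) and Q1 (host) for migrants. The function names and all parameter values below are illustrative assumptions and are not taken from the paper or from Borjas (1994).

```python
import math


def selection_terms(mu0, mu1, pi, sigma0, sigma1, rho):
    """Return (Q0, Q1): mean source and host disturbances among migrants."""
    # Standard deviation of v = eps1 - eps0.
    sigma_v = math.sqrt(sigma0 ** 2 + sigma1 ** 2 - 2 * rho * sigma0 * sigma1)
    z = -(mu1 - mu0 - pi) / sigma_v
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)   # phi(z)
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))        # 1 - Phi(z)
    lam = pdf / upper_tail                                # inverse Mills ratio
    scale = sigma0 * sigma1 / sigma_v
    q0 = scale * (rho - sigma0 / sigma1) * lam
    q1 = scale * (sigma1 / sigma0 - rho) * lam
    return q0, q1


def is_refugee_sorting(sigma0, sigma1, rho):
    """True when rho < min(sigma1/sigma0, sigma0/sigma1)."""
    return rho < min(sigma1 / sigma0, sigma0 / sigma1)


# Illustrative parameters only: a low correlation between the two disturbances.
q0, q1 = selection_terms(mu0=0.0, mu1=0.5, pi=0.1, sigma0=1.0, sigma1=1.5, rho=0.2)
print(q0 < 0 and q1 > 0, is_refugee_sorting(1.0, 1.5, 0.2))
```

Raising `rho` above `sigma0 / sigma1` in this example turns Q0 positive, so the same code can be used to see where refugee sorting gives way to positive selection.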
APPENDIX 2
STEPS OF EMPLOYEE RANDOM ASSIGNMENT
[Based on interviews and INDTECH internal documents. Part of the text is copied from
INDTECH internal documents]
Allocation of Software Engineer trainees (SET) to business units is done by a computer
application called ‘Talent Planning’ that is part of the firm's enterprise resource allocation
software system. This application allocates SETs to a unit and location based on the quarterly
manpower budget released by Corporate Planning (CPLAN).
The “process lifecycle steps” are:
• Collating the manpower budget and unit-wise requirements
• Allocation of individuals to various technology streams
• Trainee allocation (Unit and Location)
• Communication to stakeholders
Talent Planning does the allocation by matching the following:
• Unit-wise requirements (Business HR at each location provides data on the requirement for
SETs trained in various technologies)
• Data from HR located at the training location. Two weeks prior to completion of training
batches, HR at the training location releases data on which individuals are expected to
complete training
• The two variables that the ‘Talent Planning’ team looks at while doing the matching on
an automated system are the stream of training for the trainee and the estimated date
of completion of training. The prior background of the employee and the test scores of
the employee are not considered in this decision
• Communication of allocation decisions is through a centralized portal
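The matching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not INDTECH's ‘Talent Planning’ application: the class names, the `needed_from` field, the "Java" stream, the tie-breaking by name and the sample records are all hypothetical. The one property taken from the text is that matching reads only the training stream and the estimated completion date, never the trainee's background or test scores.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Trainee:
    name: str
    stream: str             # technology stream of training
    completion_date: date   # estimated date of completion of training
    home_district: str      # prior background: never read by allocate()
    test_score: float       # recruitment test score: never read by allocate()


@dataclass
class Requirement:
    unit: str
    location: str
    stream: str
    seats: int
    needed_from: date       # hypothetical: earliest date the unit takes trainees


def allocate(trainees, requirements):
    """Match trainees to unit/location seats on stream and completion date only."""
    seats_left = [r.seats for r in requirements]
    allocation = {}
    # Process trainees in order of completion date (name is an arbitrary tie-break).
    for t in sorted(trainees, key=lambda t: (t.completion_date, t.name)):
        for k, r in enumerate(requirements):
            if (r.stream == t.stream
                    and t.completion_date >= r.needed_from
                    and seats_left[k] > 0):
                allocation[t.name] = (r.unit, r.location)
                seats_left[k] -= 1
                break
    return allocation


requirements = [
    Requirement("Unit A", "Bangalore", "Java", 1, date(2007, 9, 1)),
    Requirement("Unit B", "Jaipur", "Java", 1, date(2007, 9, 1)),
]
trainees = [
    Trainee("T1", "Java", date(2007, 9, 15), "district X", 71.0),
    Trainee("T2", "Java", date(2007, 9, 20), "district Y", 88.0),
]
print(allocate(trainees, requirements))
```

Swapping the `home_district` and `test_score` values between the two trainees leaves the allocation unchanged, which is the feature of the protocol that the identification strategy relies on.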
APPENDIX 3
SUMMARY DATA OF PERFORMANCE RATINGS FOR 2007 BATCH
Variable       N      Mean   Std. Dev.   Min   Max
perf08band1    711    0.34   0.47        0     1
perf08band2    711    0.62   0.49        0     1
perf08band3    711    0.05   0.21        0     1
perf10band1    1696   0.07   0.26        0     1
perf10band2    1696   0.12   0.32        0     1
perf10band3    1696   0.43   0.49        0     1
perf10band4    1696   0.10   0.30        0     1
perf10band5    1696   0.02   0.12        0     1
Appendix 4: Tracking Individuals Over Time
Dependent Variable = Did employee achieve highest performance rating in 2010

                                        (1)       (2)       (3)       (4)       (5)       (6)       (7)
From remote district                    -0.26     -0.09     -0.20     -0.28     -0.24     -0.02     0.30*
                                        (0.27)    (0.27)    (0.27)    (0.27)    (0.27)    (0.28)    (0.18)
From remote district * Achieved highest 0.97***   0.65*     0.97***   0.99***   0.98***   0.69*     -
performance rating in 2008              (0.34)    (0.36)    (0.34)    (0.35)    (0.35)    (0.37)
CGPA (end of training)                  -         1.22***   -         -         -         1.13***   -
                                                  (0.34)                                  (0.35)
Assigned mainstream center              -         -         0.15      -         -         0.20      -
                                                            (0.33)                        (0.34)
Member of scheduled or OBC              -         -         -         -0.08     -         -0.01     -
                                                                      (0.19)              (0.20)
Male                                    -         -         -         -         0.57***   0.50**    -
                                                                                (0.21)    (0.22)
Post 2008                               -         -         -         -         -         -         -0.62***
                                                                                                    (0.13)
From remote district * Post 2008        -         -         -         -         -         -         -0.24
                                                                                                    (0.24)
N                                       627       627       604       627       627       604       1881
Notes: This table tracks performance of individuals over time. The first 6 columns use cross-sectional data to track whether remote employees
who achieved the highest performance rating in 2008 continue to perform well and achieve the highest performance rating in 2010. The key
variable of interest is the interaction term (from remote district * achieved highest performance rating in 2008) and indicates that remote
employees who achieve the highest rating in 2008 are also likely to achieve the highest rating in 2010. The last column then conducts a difference
in differences test by using performance data from 2008, 2009 and 2010 and tests whether remote employees are disproportionately likely to
improve performance between 2008 and 2010. The key variable is the interaction term (from remote district * post 2008); we find no statistically
significant effect.
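The logic of the difference in differences comparison in the last column can be illustrated with a short Python sketch. This section does not state the estimator used, so the sketch simply compares raw shares of top ratings across the four remote/non-remote and pre/post cells, which is what the interaction coefficient equals in a saturated linear probability model. The function name and the toy records are hypothetical and are not the INDTECH data.

```python
def mean(values):
    return sum(values) / len(values)


def diff_in_diff(records):
    """records: iterable of (remote, year, top_rating) with top_rating in {0, 1}.

    Returns the change in the share of top ratings after 2008 for remote
    employees minus the same change for non-remote employees.
    """
    cells = {(r, post): [] for r in (True, False) for post in (True, False)}
    for remote, year, top in records:
        cells[(remote, year > 2008)].append(top)
    change_remote = mean(cells[(True, True)]) - mean(cells[(True, False)])
    change_non_remote = mean(cells[(False, True)]) - mean(cells[(False, False)])
    return change_remote - change_non_remote


# Toy panel: four remote and four non-remote employees observed in 2008-2010.
records = (
    [(True, 2008, v) for v in (1, 0, 0, 0)]
    + [(True, 2009, v) for v in (0, 0, 0, 1)]
    + [(True, 2010, v) for v in (0, 0, 0, 0)]
    + [(False, 2008, v) for v in (1, 1, 0, 0)]
    + [(False, 2009, v) for v in (1, 0, 0, 0)]
    + [(False, 2010, v) for v in (1, 0, 0, 0)]
)
print(diff_in_diff(records))
```

A value near zero would correspond to the finding in the notes that remote employees are not disproportionately likely to improve between 2008 and 2010.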
References
Aiyar, Shekhar, and Ashoka Mody. Demographic Dividend: Evidence from the Indian States.
International Monetary Fund, 2011.
Autor, David H., Lawrence F. Katz, and Melissa S. Kearney. The polarization of the US labor market.
No. w11986. National Bureau of Economic Research, 2006.
Baker, George, Gibbs, Michael and Holmstrom, Bengt, “The Internal Economics of the Firm: Evidence
from Personnel Data”, The Quarterly Journal of Economics, Vol. 109, No. 4 (Nov., 1994), pp. 881-919
Banerjee, Abhijit, Shawn Cole, Esther Duflo, and Leigh Linden, “Remedying Education: Evidence from
Two Randomized Experiments in India”, The Quarterly Journal of Economics (2007) 122 (3): 1235-1264
Banerjee, Abhijit, and Rohini Somanathan. "The political economy of public goods: Some evidence from
India." Journal of Development Economics 82.2 (2007): 287-314.
Bartel, Ann, Casey Ichniowski and Kathryn Shaw, “Using "Insider Econometrics" to Study
Productivity”, The American Economic Review, Vol. 94, No. 2, (May, 2004), pp. 217-223
Borjas, George J. 1985. Assimilation, Changes in Cohort Quality, and the Earning of Immigrants. Journal
of Labor Economics 3 (4) (October): 463-89.
Borjas, George, "Economics of Immigration", Journal of Economic Literature 32 (1994), 1667-1717.
Borjas, George, "The Labor Demand Curve Is Downward Sloping: Reexamining the Impact of
Immigration on the Labor Market", The Quarterly Journal of Economics 118:4 (2003), 1335-1374.
Borjas, George, "Do Foreign Students Crowd Out Native Students from Graduate Programs?”, in
Ehrenberg, Ronald, and Paula Stephan (ed.), Science and the University (Madison, WI: University of
Wisconsin Press, 2005).
Borjas, George, and Kirk Doran, "The Collapse of the Soviet Union and the Productivity of American
Mathematicians", Quarterly Journal of Economics 127:3 (2012), 1143-1203.
Carliner, Geoffrey. “Wages, Earnings and Hours of First, Second and Third Generation American Males,”
Econ. Inquiry, Jan. 1980, 18(1), pp. 87–102
Chandrasekhar, C. P., Ghosh, Jayati, Roychowdhury, Anamitra “The ‘Demographic Dividend’ and Young
India’s Economic Future”, Economic and Political Weekly December 9, 2006
Chaudhury, Nazmul, Jeffrey Hammer, Michael Kremer, Karthik Muralidharan and F. Halsey Rogers,
“Missing in Action: Teacher and Health Worker Absence in Developing Countries”, Journal of Economic
Perspectives—Volume 20, Number 1—Winter 2006—Pages 91–116
Chiswick, Barry, “The Effect of Americanization on the Earnings of Foreign-Born Men,” J. Polit. Econ.,
Oct. 1978, 86(5), pp. 897–921.
Choudhury, Prithwiraj and Khanna, Tarun, “Physical, Social and Informational Barriers to Domestic
Migration”, Chapter 17, Between Theory and Developmental, Institutional Realities, edited by Masahiko
Aoki, Ken Binmore, Simon Deakin and Timur Kuran, 2012
Deaton, Angus, and Jean Dreze. "Poverty and inequality in India: a re-examination." Economic and
Political Weekly (2002): 3729-3748.
Dickens, W., T., and Lang, K., “The Reemergence of Segmented Labor Market Theory”, The American
Economic Review, Vol. 78, No. 2, Papers and Proceedings of the One-Hundredth Annual Meeting of the
American Economic Association (May, 1988), pp. 129-134
Dyson, Tim, and Mick Moore. "On kinship structure, female autonomy, and demographic behavior in
India." Population and development review (1983): 35-60.
Fisman, Raymond and Khanna, Tarun "Facilitating Development: The Role of Business Groups." World
Development 32, no. 4 (April 2004): 609-628.
Fleisher, Belton M., and Jian Chen. "The coast–non coast income gap, productivity, and regional
economic policy in China." Journal of Comparative Economics 25.2 (1997): 220-236.
Friedberg, Rachel, "The Impact of Mass Migration on the Israeli Labor Market", Quarterly Journal of
Economics 116:4 (2001), 1373-1408.
Friedberg, Rachel, and Jennifer Hunt, "The Impact of Immigrants on Host Country Wages, Employment
and Growth", Journal of Economic Perspectives 9:2 (1995), 23-44.
Gibbons, Robert, “Incentives and Careers in Organizations”, Advances in Economics and Econometrics:
Theory and Applications, ed. D.Kreps and K. Wallis, Cambridge University Press, 1995.
Henderson, J. Vernon, Shalizi, Zmarak and Anthony J. Venables, “Geography and Development”, Journal
of Economic Geography, Volume 1, Issue 1, pp. 81-105
Hunt, Jennifer, and Marjolaine Gauthier-Loiselle, “How Much Does Immigration Boost
Innovation?”, American Economic Journal: Macroeconomics, Vol. 2, No. 2 (April 2010), pp. 31-56
Jensen, Robert, “Do Labor Market Opportunities Affect Young Women's Work and Family Decisions?
Experimental Evidence from India”, The Quarterly Journal of Economics (2012) 127 (2): 753-792 first
published online March 3, 2012
Kanbur, Ravi, and Xiaobo Zhang. "Fifty years of regional inequality in China: a journey through central
planning, reform, and openness." Review of Development Economics 9.1 (2005): 87-106.
Katz, Lawrence F., Jeffrey R. Kling, and Jeffrey B. Liebman, “Moving to Opportunity in Boston: Early
Results of a Randomized Mobility Experiment”, The Quarterly Journal of Economics (2001) 116 (2):
607-654
Kerr, Sari Pekkala, and William Kerr, "Economic Impacts of Immigration: A Survey", Finnish
Economics Papers 24:1 (2011), 1-32.
Kerr, Sari Pekkala, Kerr, William and Lincoln, William F., “Skilled Immigration and the Employment
Structures of U.S. Firms”, Working Paper. February 2013.
Kerr, William, "Breakthrough Inventions and Migrating Clusters of Innovation", Journal of Urban
Economics 67:1 (2010), 46-60.
Moretti, Enrico. "Local labor markets." Handbook of labor economics 4 (2011): 1237-1313.
Moretti, Enrico. The new geography of jobs. Houghton Mifflin Harcourt, 2012.
Munshi, Kaivan, and Mark Rosenzweig. Why is mobility in India so low? Social insurance, inequality,
and growth. No. w14850. National Bureau of Economic Research, 2009.
Reich, M., Gordon, D., M., and Edwards, R., C., “A theory of labor market segmentation”, The
American Economic Review, 1973
Roy, Andrew, D. “Some Thoughts on the Distribution of Earnings,” Oxford Econ. Pap., N.S.,
June 1951, 3, pp. 135–46
Schultz, T. Paul. "Rural-urban migration in Colombia." The Review of Economics and Statistics 53.2
(1971): 157-163.
Schwartz, Aba, “Interpreting the Effect of Distance on Migration”, The Journal of Political Economy, Vol.
81, No. 5 (Sep. - Oct., 1973), pp. 1153-1169
Singh, Nirvikar, Bhandari, Laveesh, Chen, Aoyu and Aarti Khare, “Regional Inequality in India: A Fresh
Look”, Economic and Political Weekly, Vol. 38, No. 11 (Mar. 15-21, 2003), pp. 1069-1073
Sjaastad, Larry A., “The Costs and Returns of Human Migration”, The Journal of Political Economy, Vol.
70, No. 5, Part 2: Investment in Human Beings. (Oct., 1962), pp. 80-93
Summers, L., H., and Abraham, K., G., “Why is the unemployment rate so very high near full
employment?” Brookings Papers on Economic Activity, 1986
Yap, Lorene. "Internal migration and economic development in Brazil." The Quarterly Journal of
Economics 90.1 (1976): 119-137.
Young, Alwyn. “Inequality, the Urban-Rural Gap and Migration”, The Quarterly Journal of
Economics (2013)
Zhao, Yaohui. "Leaving the countryside: rural-to-urban migration decisions in China." The American
Economic Review 89.2 (1999): 281-286.