What Determines Learning among Indigenous and Non-Indigenous Children in Peru? An Analysis of the First Three Waves of the Young Lives Survey

PRELIMINARY AND INCOMPLETE. PLEASE DO NOT CITE OR CIRCULATE WITHOUT PERMISSION.

Irma A. Arteaga [1]

Abstract

In Peru, indigenous children obtain lower test scores, on average, than non-indigenous children. In this study we investigate whether a test score achievement gap exists before children start school (age 5) and whether this gap widens by age 8. We are particularly interested in decomposing the ethnic gap into its underlying determinants. Specifically, we examine four pathways: the effects of family characteristics and behavior, child nutrition, neighborhood, and school on child development. A major problem in estimating the effect of child health on child cognitive skills is that unobserved factors may affect both simultaneously, so the effect of child health on child skills may not be correctly identified. To date, studies on the relationship between child health and child cognitive skills have used instrumental variables methods to correct for this potential bias. In this paper, rainfall around the time of birth, the district’s health infrastructure, and exogenous changes in food availability when the child was a newborn are used as instruments. Using the 2000/01 cohort of the Young Lives Longitudinal Survey for Peru, our results suggest that the gap between indigenous and non-indigenous children increases over time. Child nutrition also has a positive effect on vocabulary and math test scores, and the estimated effect on both is larger when child nutrition is instrumented than when it is not. Equalizing family characteristics and behavior, child nutrition, neighborhood, and school factors to the average level of Spanish-speaking children would potentially reduce the test score gap between indigenous and non-indigenous children by up to 76%.
The pathways considered yield valuable information for policy makers who are interested in targeting policies. These pathways are important for human capital formation and could have long-lasting impacts on educational attainment and poverty in Peru.

JEL classification: I20; J15; H75; O15

Keywords: Cognitive skills; test score; child development; indigenous – non-indigenous gap; discrimination

[1] Paper prepared for presentation at the 2012 Annual Meeting of the Association for Education Finance and Policy. Please do not cite without permission. Arteaga is an Assistant Professor at the University of Missouri. Corresponding author: Irma Arteaga, 112 Middlebush Hall, Truman School of Public Affairs, University of Missouri, MO 65211. Email: arteagai@missouri.edu.

1. Introduction

The gap in test scores between indigenous and non-indigenous children in Latin America is of increasing concern. The country with the largest indigenous population in Latin America is Peru, where 8.5 million people speak 67 different indigenous languages. Indigenous people in Peru account for approximately 30% of the total population, and Quechua is the most widely spoken indigenous language (Solis, 2009). Indigenous people have some of the lowest overall test scores in both reading and math (World Bank, 2009). This problem is compounded by large, persistent test score gaps between rural and urban students as well as between indigenous and non-indigenous students (Sakellariou, 2008). Prior research in Latin America has shown that these gaps persist even after controlling for socioeconomic status indicators (Hernandez-Zavala et al., 2006; Marshall, 2009; McEwan, 2007; McEwan and Trowbridge, 2007).

The sources of the disparities between indigenous and non-indigenous children have not been fully identified. In some studies, more than 50% of the achievement gap is left unexplained. Prior research has found that family and parental influences are important factors.
The effects of schools have also been examined, although results are mixed. Data from Chile suggest that schools explain a large portion of the gap (McEwan, 2007), but data from Mexico and Guatemala show that school effects are not as important as family factors, and that the combination of family and school effects does not satisfactorily explain the gap (Hernandez-Zavala et al., 2006).

Although parental education and attitudes, family health, and school quality have been suggested along with the usual determinants of educational outcomes, no study of the ethnic achievement gap adequately addresses all of these dimensions. Specifically, child health is absent from these studies even though it is theoretically crucial to explaining the achievement gap (Glewwe & Miguel, 2008; Marshall, 2009). Another problem that we address is that school and neighborhood effects are often treated as the same effect. Additionally, the effects of parental preferences and attitudes have not been examined separately from demographics and socioeconomic status because of data limitations. Prior research has been limited to an analysis of parents’ roles based solely on education and ability.

The importance of family influences on cognitive development is well established in the literature. Basic socioeconomic factors along with genetic endowments are accepted components of the production function for cognitive development. The role of parental behavior, however, has not been fully explored in research on Latin American countries. In the United States, James-Burdumy (2005) finds that birth order, number of children, mother’s age at birth, parents’ education, and the presence of a grandmother in the household all have significant effects on cognitive development. These variables represent parental attitudes towards child rearing.
A study of the test score gap between black and white students in the United States finds that differences in “household” inputs explain 10-20% of the gap and that differences in mothers’ abilities explain as much as 50% of the test score gap (Todd and Wolpin, 2007). This result raises the question of which household inputs are the important ones.

School and teacher quality have also been examined as critical elements. Cook and Evans (2000) examine the contribution of these factors to the narrowing of the test score achievement gap between black and white students in the United States between the 1970s and 2000. Using the Oaxaca-Blinder decomposition technique and data from the National Assessment of Education Progress, they find that, together, convergence in family factors and school quality explains only 25% of the convergence in test scores. They conclude that more detailed variables for parental education and school quality would improve the model but that elements such as health and parenting attitudes are still missing. Further investigation using data from the National Institute of Child Health and Human Development Study of Early Child Care and Youth Development (NICHD) shows that, among third graders, the quality of parenting has a significant effect on learning (Murnane et al., 2006). Results show that a rough measure, “maternal sensitivity,” explains almost one-third of the gap in test scores between black and white students.

Estimates for Latin American countries vary a great deal regarding the relative importance of family, school, and parenting quality. An analysis of data on test scores in Mexico and Guatemala shows that the test score gap between indigenous and non-indigenous students cannot be fully explained by family and school factors; only 41-75% of the gap is explained by these variables (Hernandez-Zavala et al., 2006).
Results from an analysis of the National Census for Student Achievement (SIMCE) in Chile show that school effects have a large impact on the test score gap between indigenous and non-indigenous students (McEwan, 2007). The study, which focused on Mathematics and Spanish scores for 8-year-olds in 1997 and 2001, uses a differences-in-differences estimation method and a modified Oaxaca-Blinder decomposition. The results showed that a family’s socioeconomic status had only a small effect on the child’s cognitive development, but school factors explained a significant portion of the achievement gap between indigenous and non-indigenous children.

In addition to family, schools, and parenting, health has been shown to have a large influence on a child’s cognitive development. Research shows that a well-nourished child is better able to pay attention in class, focus, concentrate, and get better grades (Currie, 2009). Moreover, in a developing country such as Peru, a child who does not receive proper nutrition may develop lasting deficiencies or stunting that would be difficult or impossible to remedy (Alderman et al., 2006). This is one of the main reasons why, in contrast with the U.S. literature, research on developing countries has placed greater emphasis on child health as a determinant of cognitive development and little emphasis on parenting practices and home environment. However, child health has not been used to explain the test score gap between indigenous and non-indigenous students.

One reason health and nutrition have been excluded from previous analyses is that data limitations make it difficult to measure nutrition and maternal health at the beginning of a child’s life, when nutrition is thought to have the greatest influence on cognitive development. Few data sets currently available include information on early nutrition and maternal health together with test scores that capture cognitive development.
Additionally, it is challenging to measure health accurately since it encompasses a broad range of factors. Moreover, when child health is included, most of the literature ignores that child health and child cognitive skills both reflect household decisions regarding investments in children’s human capital, and therefore child health cannot be treated as an exogenous variable. Only a few studies from developing countries control for the behaviors that determine child health when estimating the impact of child health on child cognitive skills (Alderman et al., 2001; Glewwe, Jacoby & King, 2001; Behrman & Levy, 1998). These studies on Pakistan and the Philippines suggest that food availability, maternal health, and environmental health factors are ideal instruments for child health. Moreover, their findings indicate that child health is crucial to cognitive development and that the estimated effect of child health on child cognitive skills is three to seven times greater than when child health is assumed to be an exogenous variable.

The analysis in this paper contributes to the existing literature in a number of ways. First, this is one of the first studies that explicitly examines the achievement gap between Quechua and Spanish speakers. School performance data and the information needed to model its determinants are not widely available for Peru’s Quechua population, despite the Quechua being the largest indigenous group in Peru and Quechua being the second official language. Furthermore, even when such data are gathered, cultural factors and differences in modeling behavior in rural areas pose additional difficulties. For the purposes of this analysis, we use an unusually rich dataset containing information on the child, caregiver, family, and community, which identifies ethnicity for each member of the child’s nuclear family.
Second, this may be the first study on a developing country that considers parenting behavior as one of the components explaining the achievement gap, even though these factors have proven to be important determinants of children’s cognitive skills (James-Burdumy, 2005; Kimmel & Connelly, 2007; Paxson & Schady, 2007). Third, this paper expands the existing body of literature by incorporating several previously excluded variables: mother and child health, parenting behavior, and separate neighborhood and school characteristics.

We use the Young Lives Study to measure the effects of parents’ education, family factors, home environment, child health, neighborhood, and school. The data set provides longitudinal data for 1,681 Peruvian children who were 1 year old in 2002, 5 years old in 2006, and 8 years old in 2009. It includes information on the factors listed above, test scores from the Spanish version of the Peabody Picture Vocabulary Test, and information on the child’s ethnic and linguistic background.

The remainder of this paper is organized as follows. Section 2 provides background on the indigenous people in Peru. Section 3 describes the conceptual and empirical framework. Section 4 briefly describes the Peruvian setting, the sample, and the variables used in the analysis. Section 5 studies the effects of child health, family characteristics and behavior, and school and neighborhood characteristics on children’s cognitive skills, as well as the sources of the test score gap between indigenous and non-indigenous children. Section 6 concludes.

2. Background on Peru

Peru has succeeded in providing broad access to preprimary and primary school: the net enrollment rates for preschool and primary school were 64.4% and 96.6%, respectively, in 2008, according to the Peruvian Ministry of Education. Improving educational outcomes for those who are in school, however, remains a challenge.
Along these lines, more than half of Peruvian children scored in the lowest of five categories on the PISA 2009 [2] reading test (OECD, 2010). Improving investments in the first years of life may be an important strategy to improve learning outcomes in Peru and other developing countries.

[2] PISA is the acronym for the Programme for International Student Assessment. It is run by the Organisation for Economic Co-operation and Development (OECD). Peru participated in PISA 2000 and 2009. In 2009, 65 countries participated and Peru ranked 62 in reading and 60 in mathematics.

Peru is a lower-middle income country, where the GDP per capita in 2010 was only $5,205, 11 percent that of the United States (IMF, 2010). An estimated 31 percent of people live below the official poverty line (INEI, 2011). An estimated 8 percent of the population lives on less than a dollar per person per day, and 19 percent live on less than two dollars per day (United Nations, 2010).

Overall, education and health indicators for Peru are roughly on par with those of other Latin American countries. Net primary enrollment in Peru for males and females is 96% and 97%, respectively, whereas the Latin American regional averages are 94% and 95%, respectively. Net secondary enrollment in Peru is 75% for both males and females, whereas the Latin American regional averages are 69% and 74%, respectively. Infant and child mortality are lower in Peru than in the rest of the Latin American region: the infant mortality rate per 1,000 live births is 17 for Peru, compared to an average of 22 for Latin America, and the child mortality rate is 20 for Peru and 26 for Latin America.

However, Peru has worse indicators for some measures of child nutrition. Thirty percent of children younger than five have low height-for-age (stunting), a figure that is far above the Latin American average of 16 percent (UNICEF, 2009). Moreover, one in ten children has low birth weight, whereas the Latin American average is nine percent.
Indicators of child health status are even worse for indigenous children in Peru. But what do we mean by indigenous people? According to Vasquez, Chumpitaz and Jara (2008), there are three main definitions. The first refers to the mother tongue. According to these authors, the mother tongue is the most widely used definition in Latin America. Bolivia, Ecuador, Colombia, Honduras, Mexico, Paraguay, and Peru, among others, use this definition. This criterion is considered “objective” because language allows culture to be transmitted from one generation to the next. When a group loses its own language, it also loses a fundamental element in the socialization process and therefore begins to disappear as an ethnic group (Vasquez, Chumpitaz, and Jara, 2008).

The second definition of indigenous refers to self-identification and is regarded as a “subjective” measure. It is based on a person’s own perception of belonging to a certain ethnic group and his or her self-identification with its ancestors and customs. This criterion can be influenced by an individual’s pride in belonging to a certain group or by a feeling of exclusion, which is why this definition is more subjective. Finally, the last definition is based on the recognition of an indigenous community through its territory, race, religion, or clothing. Guatemala and Paraguay, for example, use type of shoes and indigenous clothing as a complementary criterion to identify native groups. In this paper we use the objective definition of language learned at home.

Traditional measures in Peru show that around 30 percent of the population is indigenous; these traditional measures use the first definition and complement their findings with the other measures of self-identification, territory, and clothing. The Young Lives Survey shows that when mothers are asked about their language and when they are asked about their ethnicity, 30 percent identify themselves as indigenous.
This is similar to the traditional measure. However, when the survey asks their children about the language they speak, only 13 percent respond that they speak an indigenous language. This result is similar to the Peruvian National Census of 2007, where 15 percent of the population five years and older responded that they learned an indigenous language at home. The children in the Young Lives Survey were born in 2001, which suggests that the younger the cohort, the lower the percentage of individuals who are learning an indigenous language. This consistency with the national census suggests that our results are generalizable. According to the Census of 2007, 83 percent of the indigenous population is Quechua, 11 percent is Aymara, 2 percent is Ashaninka, and 4 percent belong to other indigenous groups.

The Peruvian National Institute of Statistics reports that the enrollment rate for indigenous children 6 to 16 years of age is 88 percent, whereas for Spanish-speaking children it is 96 percent. Moreover, 29 percent of indigenous children who are enrolled in school do not attend, and 73 percent are one or more grade levels below the grade level appropriate to their age.

Because the Political Constitution of Peru guarantees education for all, and because of the large share and vulnerability of the indigenous population in Peru, the Ministry of Education initiated a National Policy of Bilingual Education in the early 1970s. During that time, the government gathered information about appropriate techniques for educating the indigenous population. Since then, some advances have been made, but different governments have had different visions of bilingual education, and the National Division of Intercultural and Bilingual Education still does not have the capacity to hire teachers or to approve a bilingual curriculum for children or for future teachers.
One of the most important policies that has been implemented is the creation of intercultural bilingual schools. A bilingual school is defined as a school that has at least one teacher who has been trained in intercultural and bilingual education. According to the Peruvian Ministry of Education, only 35% of the rural bilingual population attended an intercultural bilingual school in 2008.

3. Conceptual Framework (insert all the methodological issues and production functions)

A standard procedure used to analyze the determinants of the child’s cognitive achievement gap is to fit equations relating cognitive achievement to observed characteristics. Given that there are two groups, indigenous (I) and non-indigenous (NI), let the outcome variable be Dcog, and denote the explanatory variables by X. The question is how much of the mean outcome difference is accounted for by group differences in predictors:

$$\text{Gap} = E(Dcog^{I}) - E(Dcog^{NI}) \tag{1}$$

where $E(Dcog)$ denotes the expected value of the outcome variable, the child’s cognitive achievement (e.g. test scores). Consider the following linear model:

$$Y_l = X_l \beta_l + \varepsilon_l, \quad \text{assuming } E(\varepsilon_l \mid X_l) = 0, \quad l \in \{I, NI\} \tag{2}$$

where $Y_l$ is the outcome $Dcog$ for group $l$ and $\varepsilon_l$ denotes unobserved determinants of $Y$ and random noise in the $Y$ variable. This setup allows one to write the gap between indigenous and non-indigenous children as follows:

$$\text{Gap} = E(Dcog^{I}) - E(Dcog^{NI}) = E(X_I)\beta_I - E(X_{NI})\beta_{NI} \tag{3}$$

because

$$E(Y_l) = E(X_l \beta_l + \varepsilon_l) = E(X_l \beta_l) + E(\varepsilon_l) = E(X_l)\beta_l,$$

with $E(\beta_l) = \beta_l$ and $E(\varepsilon_l) = 0$ by assumption.

To identify the contribution of group differences in the variables and of differences in the impact of those variables, equation (3) can be rearranged so that the observed cognitive achievement differential is decomposed as [3]:

$$\text{Gap} = E(Dcog^{I}) - E(Dcog^{NI}) = (\bar{X}_I - \bar{X}_{NI})\beta_I + \bar{X}_{NI}(\beta_I - \beta_{NI}) \tag{4}$$

where $E(X) = \bar{X}$. A child is identified as indigenous if the child speaks an indigenous language.
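As a numerical illustration of equation (4), the following is a minimal sketch of the two-term decomposition on simulated data. It is not the estimation code used in this paper: the function names, the single covariate, and the simulated coefficients are all assumptions made for the example. The first term weights the difference in mean characteristics by the indigenous-group coefficients, and the second term weights the difference in coefficients by the non-indigenous mean characteristics.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y = X @ beta + eps (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def oaxaca_blinder(X_i, y_i, X_ni, y_ni):
    """Two-term decomposition of mean(y_i) - mean(y_ni), as in equation (4)."""
    beta_i = fit_ols(X_i, y_i)      # coefficients, indigenous group
    beta_ni = fit_ols(X_ni, y_ni)   # coefficients, non-indigenous group
    xbar_i = X_i.mean(axis=0)       # mean characteristics, indigenous
    xbar_ni = X_ni.mean(axis=0)     # mean characteristics, non-indigenous
    return {
        "gap": y_i.mean() - y_ni.mean(),
        # (Xbar_I - Xbar_NI) * beta_I
        "first_term": (xbar_i - xbar_ni) @ beta_i,
        # Xbar_NI * (beta_I - beta_NI)
        "second_term": xbar_ni @ (beta_i - beta_ni),
    }


# Simulated data (hypothetical values, for illustration only).
rng = np.random.default_rng(0)


def simulate(n, x_mean, beta):
    x = rng.normal(x_mean, 1.0, size=n)
    X = np.column_stack([np.ones(n), x])   # constant plus one covariate
    y = X @ beta + rng.normal(0.0, 1.0, size=n)
    return X, y


X_i, y_i = simulate(500, 0.0, np.array([85.0, 4.0]))
X_ni, y_ni = simulate(500, 1.0, np.array([90.0, 5.0]))
result = oaxaca_blinder(X_i, y_i, X_ni, y_ni)
print(result)
```

Because each regression includes a constant, the fitted line passes through the group means, so the two terms add up to the raw gap up to floating-point error.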
Robustness checks that use different definitions of indigenous (the mother’s tongue is indigenous, maternal ethnicity is indigenous, or the child’s ethnicity is indigenous) showed similar results.

The decomposition in equation (4) is known as the Oaxaca-Blinder decomposition (Blinder, 1973; Oaxaca, 1973). The overall achievement gap is decomposed into two components. The first is the portion attributed to differences in mean characteristics, $(\bar{X}_I - \bar{X}_{NI})$, evaluated on the endowments of indigenous children, $\beta_I$. This portion is known in the literature as the endowment effect, because it accounts for the effect of having been born as an indigenous child, given by $\beta_I$. Thus, this portion of the gap answers the following question: if the initial endowment were indigenous ($\beta_I$), what would be the difference between indigenous and non-indigenous children? The second component is attributable to differences in performance, $(\beta_I - \beta_{NI})$, across indigenous and non-indigenous children for the same characteristics, $\bar{X}_{NI}$. This portion of the gap is known in the literature as the coefficient effect, because it accounts for the contribution of differences in the coefficients. It answers the following question: if indigenous children had the characteristics of non-indigenous children, what would be the difference between indigenous and non-indigenous children?

[3] A second way to do the decomposition is: $\text{Gap} = (\bar{X}_I - \bar{X}_{NI})\beta_{NI} + \bar{X}_I(\beta_I - \beta_{NI})$.

4. Sample, Data, and Measures

The data used in this paper include the first three waves of the Peruvian Young Lives Survey, which consists of 2,000 one-year-old children randomly selected in 20 districts that are part of 18 provinces in Peru [4]. In addition, district data were gathered from FONCODES (Fondo Nacional de Compensacion para el Desarrollo; Peruvian National Fund for Development and Social Compensation).
This section briefly describes some important features of the Peruvian data and discusses measures of cognitive development, child health, family characteristics, home environment, and school characteristics.

[4] A district was randomly chosen within each of the 18 provinces. Only in the province of Lima were three districts selected, because almost one-third of the population of Peru lives in Lima.

4.1. Data

The sample in this paper consists of urban and rural children from the Young Lives Survey (YLS) panel data for Peru. Young Lives is a longitudinal research project that collects data for investigating the changing nature of childhood poverty. It tracks the development of 12,000 children in Ethiopia, Vietnam, India, and Peru by following two cohorts of children over a 15-year period. The younger cohort consists of about 2,000 children per country, aged between 6 and 18 months in 2002. The older cohort consists of 1,000 children per country, aged between 7.5 and 8.5 years in 2002. The present study uses only data from Peru.

In Peru, Young Lives used multi-stage, stratified, random sampling to select the two cohorts of children. The panel dataset is composed of three waves of data, collected in 2002, 2006, and 2009. This study uses the panel data set for the younger cohort, children who were about 1 year old in 2002, because it includes data on parenting practices just after birth as well as two waves of data on cognitive development. Previous research has shown that cognitive ability is a cumulative process (Todd & Wolpin, 2007). The survey provides information on household characteristics, detailed caregiver records, and child characteristics, as well as results of the Spanish version of the Peabody Picture Vocabulary Test (PPVT), a test of vocabulary recognition that has been widely used in Latin America as a general measure of cognitive development (Bernal, 2008; James-Burdumy, 2005; Schady, 2006).
Results from the PPVT constitute a key variable of interest in this study and serve as the primary measure of cognitive ability. This panel data set contains 1,784 observations from the whole country: 35% of the children live on the coast, 50% in the mountains, and 15% in the cloud forest/jungle.

Given that the YLS is a longitudinal study, it is important to consider the potential for attrition bias. Such bias may occur if attrition is non-random and if characteristics correlated with attrition are also correlated with the outcomes of interest. Outes-Leon and Dercon (2008) analyzed the first and second waves of data for the four YLS countries. They found that the attrition rate for the Peru sample (3.5%, excluding 6 deaths) is low compared to other longitudinal studies in developing countries. Furthermore, the authors use attrition probits and BGLW [5] tests for attrition bias and conclude that attrition is overwhelmingly random and unlikely to lead to significant biases.

[5] BGLW stands for the Becketti, Gould, Lillard and Welch (1988) test for selection on observables.

A sample of 1,681 children was used for the analyses in this paper. First, we selected the complete panel of children from the younger cohort, a total of 1,959 children. Second, 103 children were excluded because of missing information on the control variables used in the analyses. In most cases, missing values were due to missing paternal characteristics (either the father was not present in the household at the time of the interview or the father refused to answer those questions). Third, 278 children were excluded because of missing information on the dependent variables used in the analysis: 223 were not reached in the second wave, and an additional 55 were not reached in the third wave of data collection.
A comparison of the mean characteristics of the full sample of children and the subsample of children used in the analysis, presented in Appendix Table A1, suggests that differences between the two groups are very small and statistically insignificant.

4.2. Measures

Descriptive statistics are shown in Appendix Table A1. The two outcome measures for the analysis in this paper are a child’s score on the quantitative subset of questions of the quantitative/mathematics test and a child’s score on the PPVT-III test. Both tests were conducted in rounds 2 and 3.

The former test was developed by the International Evaluation Association. For round 2, the test used was the CDA-Q (cognitive development assessment – quantitative), which was specially developed for children 4.5-5.5 years of age. It consists of 15 items. For each item the child is shown a picture, asked a question, and asked to choose the best answer from three or four choices. An example is the following: “Look at the plates of cupcakes; point to the plate that has the fewest cupcakes.” The child scores 1 point for each correct answer and 0 points for incorrect or incomplete answers. The raw score is calculated as the number of correct answers.

For round 3, the mathematical achievement test measures various numerical abilities appropriate for children aged 8, and it has two parts. The first part consists of 9 questions, for which the fieldworker showed the child a series of cards and asked him or her to answer some questions orally. Examples of these questions are: “Please put your finger on number 21,” and “Which number should be in the blank space?” In the second part the child was tested on math computation. The fieldworker gave the child a test booklet and explained the instructions found in it. The child had 12 minutes to answer 20 basic arithmetic questions.
The PPVT is a test of receptive vocabulary that is frequently used to evaluate Hispanic children and adults (Paxson & Schady, 2007; Umbel et al., 1992). During the test, children were asked to select from among four pictures the one that best represented the meaning of a word presented to them orally by the interviewer. The test is composed of 125 questions arranged in 10 groups. Each group corresponds to items in which children of a certain age should be proficient. Thus, the first questions are targeted to 3-4 year-old children, the next group to 5 year-old children, and so on, with pictures becoming gradually more difficult; the last group targets 14 year-old children. Children who are 8 years old are not expected to answer all 125 questions correctly, because they would be shown pictures representing words that are completely unfamiliar to them. However, some children will be able to recognize words that are far beyond the expectations for their age. Test items have been selected for their universality and appropriateness to Hispanic communities and have been widely used in Latin American countries and with Spanish-speaking children in the United States (Paxson & Schady, 2007; Rosenzweig & Wolpin, 1994). In the case of the Young Lives survey, the test was available in Spanish, Quechua, and Aymara, the most frequently spoken languages in the country (Spanish and Quechua are both official languages in Peru).

We examine how performance on the PPVT differs across children of different ages, which requires the use of age-normed scores. The PPVT has not been normed for samples of Peruvian children. We standardized the raw scores on the test using the norms published by the developers of the Peabody test, which are based on samples of Mexican and Puerto Rican children but apply to Hispanic populations. These norms are set so that the mean should be 100 and the standard deviation 15 at each age.
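As a small illustration of how a score on this normed scale maps onto standard deviation units, the sketch below uses only the norming mean of 100 and standard deviation of 15 stated above. The function name and the example score of 85 are hypothetical and chosen for the example; they are not taken from the survey data.

```python
NORM_MEAN = 100.0  # mean of the norming group at each age
NORM_SD = 15.0     # standard deviation of the norming group at each age


def normed_score_to_sd_units(normed_score):
    """Distance of a normed score from the norming-group mean, in SD units."""
    return (normed_score - NORM_MEAN) / NORM_SD


# A hypothetical normed score of 85 lies one standard deviation below the mean.
print(normed_score_to_sd_units(85.0))  # -1.0
```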
It is important to note that the PPVT has a minimum raw score that can be normed for children of each age. For children over 132 months of age (11 years old), the minimum raw score always corresponds to a normed score of 55. The existence of these minimum scores is a challenge for our analysis. A fraction of children in our sample (8.3 percent overall) do not attain the minimum raw score that can be normed. However, in contrast to other studies (Paxson & Schady, 2007), none of the children in our sample have a raw score of zero or close to zero; most of them have a score between 40 and 54, which indicates that they have made at least some progress. A common practice is to exclude such children from analyses, but this will generally result in biased parameter estimates. Instead of dropping these children, and because their scores are very close to the minimum norm, we assign them the minimum normed score for their age.

Including the censored values, average performance on the PPVT is low: the mean of 88.71 indicates that, on average, the children in our sample are 0.75 standard deviations below the mean of the group used for norming the PPVT in round 2. Twelve and a half percent of children in the sample have censored PPVT scores. When the censored group is excluded, the mean PPVT score rises to 90.1, still well below the normed mean.

Our variable of interest is ethnicity. There is no consensus in the literature about the definition of ethnicity. Some researchers use the mother’s tongue, others use the child’s language, and others use the child’s self-reported ethnicity. Our analysis in this paper defines indigenous by the child’s spoken language. However, auxiliary regressions using the other definitions of ethnicity yield similar results.

This paper controls for socioeconomic characteristics, child health, parenting characteristics, neighborhood, and school characteristics.
The socioeconomic variables include the natural logarithm of annual household expenditures, father's education, and mother's education. To calculate annual household expenditures, we added up all the expenses reported by the head of the household during the year. These expenses included items bought on a weekly, monthly and annual basis, as well as paid utilities and services. The education of parents is given in single years.

This paper uses four measures of parental behavior towards the education of their child: the natural logarithm of expenditure in education, number of months in day care, number of months in preschool, and hours per week spent in preschool. One variable is used to control for the home environment: the number of books in the household.

Measures of school characteristics were used only for round 3, because children had not yet started school in round 2. This study uses three measures of school characteristics: grade repetition, teacher quality, and type of school (private or public). All measures are based on parents' self-reports. Teacher quality was constructed using two measures: whether the parent thinks that the teacher is a good teacher or not, and whether the teacher frequently misses school or not. Grade repetition is a dichotomous variable that indicates whether the child repeated a grade in round 3 or not.

This paper uses the child's height-for-age z-score as an indicator of early childhood malnutrition. There is a consensus in the literature that this is the best single indicator of cumulative nutritional status among infants and children (McKee & Todd, 2009; Behrman & Hoddinott, 2001; Glewwe & King, 2001). To avoid an endogeneity problem caused by the fact that parents make decisions about health/nutrition and education simultaneously, this study instruments the child's height-for-age (Glewwe, Jacoby & King, 2001; Glewwe & Miguel, 2008).
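For readers unfamiliar with the measure, a height-for-age z-score expresses a child's height as a distance from the median of a reference population of the same age and sex, in reference standard deviations. The sketch below is a simplified illustration: the reference median and standard deviation are hypothetical placeholders, not values from any published growth standard, and the function names are introduced here only for the example. The -2 cutoff for stunting is the conventional threshold.

```python
def height_for_age_z(height_cm, ref_median_cm, ref_sd_cm):
    """Z-score of a child's height relative to a reference population
    of the same age and sex."""
    return (height_cm - ref_median_cm) / ref_sd_cm

def is_stunted(z_score, cutoff=-2.0):
    """Conventional stunting flag: more than 2 SD below the reference median."""
    return z_score < cutoff

# Hypothetical child and hypothetical reference values (assumptions).
z = height_for_age_z(height_cm=103.0, ref_median_cm=110.0, ref_sd_cm=4.6)
print(round(z, 2), is_stunted(z))
```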
Moreover, ordinary least squares (OLS) estimates of the parameters of the demand for cognitive skills are likely to be biased for several reasons. First, a child's nutritional status may be correlated with unobserved parental tastes regarding their child's nutrition. Second, the height-for-age z-score may not completely reflect children's nutritional status (there is variation in height and weight even among healthy children, which implies that this z-score can be a noisy measure of a child's nutritional status). Because these two problems can be addressed using instrumental variable methods, we consider the choice of instruments for child growth.

To deal with the first problem, the instruments must be correlated with child growth but uncorrelated with parental tastes or a child's inherited ability to learn, both of which are unobserved variables in the demand equation for the child's cognitive skills. Three variables that would appear to satisfy these criteria are exogenous changes in the food supply, rainfall in each child's birth year and birth location, and a measure of the physical environment such as health characteristics of the community. Food shocks when a child is 6-18 months of age allow one to control for the behavior determining the child health stock, as measured by the child's height by age 5. These food shocks can serve as instrumental variables because they may disrupt income-earning activities or raise individuals' exposure to disease-spreading pathogens, both of which are important determinants of health status. Such exogenous changes in food supply (e.g. livestock death, crop failure, livestock theft, and exogenous changes in food availability) are uncorrelated with shocks at the subsequent age at which decisions are made about whether or not to invest in children's education.
Therefore, this procedure permits estimation of the impact of child health on child cognitive skills without contamination from unobserved behaviors, such as time allocation decisions about other household members, that are concurrent with the child's cognitive development decision.

Similar conditions are satisfied by the rainfall shocks instrument. In a rural country like Peru, higher birth-year rainfall can be interpreted as a positive shock to a Peruvian community, leading to higher local crop output and household income. These exogenous weather conditions, rainfall shocks, are expected to have a positive and significant effect on a child's nutrition and health in the first year of life. The nutrition literature suggests that this initial health endowment is determined by environmental conditions early in life and has lasting effects. We obtained historical rainfall data for weather stations across Peru from the Global Historical Climatology Network Precipitation and Temperature Data (these data are available at http://www.ncdc.noaa.gov). The data include monthly records for each station as well as its latitude and longitude. For each month between 1999 and 2001, we use the station location information to match each birth district represented in the YLS to the closest weather station. We matched a total of 20 stations with YLS birth districts.

Finally, we use population per health center as our third instrument, to measure the health environment in the community. All three of these variables appear to resolve the first econometric problem.

Regarding the second problem, changes in height may not fully measure children's nutritional status, because height may not reflect micronutrient deficiencies; this would imply that the error term in the demand equation for cognitive skills could be correlated with the instruments. However, the literature suggests that this is not going to generate a significant bias.
The reason is that low-income households not only have lower calorie and protein intakes but also less diverse diets, which leads to micronutrient deficiencies. Because of the correlation between calorie and protein intakes and micronutrient intakes, child growth may reflect micronutrient deficiencies, leaving little possibility for such deficiencies to be revealed in the error term in the demand equation for cognitive skills.

We explore how much the estimates differ between this preferred procedure and a naïve model in which child health is treated as exogenous, as well as other alternatives. The alternatives include adding the district's health infrastructure characteristics during the first wave of data collection to the first-stage estimates, in an attempt to control for the behaviors determining child health.

4.3. Empirical Framework

The empirical approach is based on the Oaxaca-Blinder decomposition, which is commonly used to assess the determinants of earnings gaps between two groups (Blinder, 1973; Oaxaca, 1973). This approach has recently been used to decompose the determinants of racial and ethnic gaps in test scores (Hernandez-Zavala et al., 2006; Neidell, 2000).

As mentioned above, the Young Lives data collect information for children in 20 sites. Each group of about 100 children lives in the same neighborhood, and it is highly likely that these neighborhoods vary in terms of unobserved family variables such as wealth or parental behavior (due to choices parents made regarding where to raise their children). If these omitted variables have a causal impact on test scores, then the estimated impact of neighborhood effects will be biased in ordinary least squares (OLS) regressions. This raises concern about omitted variable bias in OLS estimates.
One way to reduce this concern is to control for neighborhood fixed effects, which is facilitated by the availability of multiple observations, both indigenous and non-indigenous, within neighborhoods.

4.3.1. Ordinary Least Squares (OLS) Estimation

In order to explore the indigenous – non-indigenous gap in test scores, this study starts by specifying the test score as a function of the child's ethnicity and a set of child and household characteristics that may influence test scores (child's gender, per capita expenditure, father's years of education, mother's years of education, height-for-age z-score, per capita expenditure in education, months of child in day care, months of child in preschool, and hours a week spent in a preschool center). It begins with the estimation of OLS regressions for indigenous and non-indigenous children (as indicated in the superscripts) within each year's sample:

$$Test_{ij}^{ind} = X_{ij}^{ind}\beta^{ind} + \delta_{j}^{ind} + \varepsilon_{ij}^{ind} \qquad (5)$$

$$Test_{ij}^{nind} = X_{ij}^{nind}\beta^{nind} + \delta_{j}^{nind} + \varepsilon_{ij}^{nind} \qquad (6)$$

The test score ($Test$) of child $i$ in neighborhood $j$ is regressed on a vector of child and family variables ($X$). The $\beta$'s are coefficients to be estimated. Each neighborhood (district) is allowed to have a separate intercept via a series of fixed effects ($\delta_j$). Finally, $\varepsilon_{ij}$ is an error term, which represents all unobserved variables. The standard errors are adjusted for arbitrary forms of heteroskedasticity because observations within neighborhoods are correlated due to their exposure to the same community factors; this makes the disturbances not independently and identically distributed (Wooldridge, 2010).

Given the coefficient estimates, the mean difference in indigenous and non-indigenous test scores within each cross-section can be rewritten as:

$$\underbrace{\Delta\overline{Test}}_{\text{Test score difference}} = \underbrace{\Delta\bar{X}\,\hat{\beta}^{ind}}_{\text{Endowments}} + \underbrace{\bar{X}^{nind}\left(\hat{\beta}^{ind} - \hat{\beta}^{nind}\right)}_{\text{Performance}} + \underbrace{\Delta\bar{\hat{\delta}}}_{\text{Neighborhood effect}} \qquad (7)$$

where $\Delta\overline{Test} = \overline{Test}^{ind} - \overline{Test}^{nind}$, $\Delta\bar{X} = \bar{X}^{ind} - \bar{X}^{nind}$, and $\Delta\bar{\hat{\delta}} = \bar{\hat{\delta}}^{ind} - \bar{\hat{\delta}}^{nind}$.
The overbars refer to variable means and the superscripts refer to the subsamples of indigenous and non-indigenous children. The first term on the right-hand side of equation (7) is the portion of the gap that can be explained by differences in each group's endowments, such as maternal schooling or the number of books in the household. The second term is known as the performance effect. Technically speaking, this is the sum, over all the x variables, of the contributions to the difference in mean test scores between non-indigenous and indigenous children caused by differences, across the two groups of children, in the impacts of the x variables on test scores. For example, it may be that the impact ($\beta$) of maternal years of education on test scores is higher for non-indigenous than for indigenous children because school quality is higher among the non-indigenous population. The third term in equation (7) is the portion of the gap due to living in a certain neighborhood, also known as the neighborhood effect.

One of the goals of this paper is to understand the indigenous – non-indigenous gap in each round of data; another goal is to understand the causes of changes in the test score gap. To this end, equation (7) can be successively applied to the 2005 and 2009 samples. The difference between the two can be written as:

$$\underbrace{\Delta\left(\Delta\overline{Test}\right)}_{\text{Test score diff-in-diff}} = \underbrace{\Delta\left(\Delta\bar{X}\,\hat{\beta}^{ind}\right)}_{\text{endowments}} + \underbrace{\Delta\left[\bar{X}^{nind}\left(\hat{\beta}^{ind} - \hat{\beta}^{nind}\right)\right]}_{\text{performance}} + \underbrace{\Delta\left(\Delta\bar{\hat{\delta}}\right)}_{\text{neighb. effect}} \qquad (8)$$

where:

$$\Delta\left(\Delta\bar{X}\,\hat{\beta}^{ind}\right) = \Delta\bar{X}_{09}\,\hat{\beta}_{09}^{ind} - \Delta\bar{X}_{05}\,\hat{\beta}_{05}^{ind},$$

$$\Delta\left[\bar{X}^{nind}\left(\hat{\beta}^{ind} - \hat{\beta}^{nind}\right)\right] = \bar{X}_{09}^{nind}\left(\hat{\beta}_{09}^{ind} - \hat{\beta}_{09}^{nind}\right) - \bar{X}_{05}^{nind}\left(\hat{\beta}_{05}^{ind} - \hat{\beta}_{05}^{nind}\right).$$

The left-hand side of equation (8) is the test score difference-in-difference discussed in the conceptual framework section. The right-hand side decomposes it into three terms. The first term represents the change in the gap that is due to changes in endowments between indigenous and non-indigenous children.
We consider four types of endowments: socio-economic, child's health, home environment, and parental behavior towards education. The second term measures whether the changing performance of child and family variables (that is, the coefficients on child and family variables) changes the test score gap. Finally, the third term represents the portion of the changing gap that is due to changes in neighborhood effects.

4.3.2. Instrumental Variable Approach

Following Glewwe and King (2001), this paper uses food availability, rainfall shocks and environmental health as instruments for the child's nutritional status. The child's nutritional status is measured by the height-for-age z-score, which measures the deviation of the child's height, given her/his age and gender, from that of the reference population. This variable is commonly used to measure a child's nutritional status.

We use three instruments: food availability shocks, rainfall shocks, and health infrastructure in the child's district. We are able to use food shocks during wave 1 to control for the behavior determining the child's nutritional status in waves 2 and 3. Such food availability shocks are uncorrelated with shocks during the following waves, at which decisions are made about investing in the child's cognitive skills. Similarly, we use birth-year rainfall shocks in the locality where the child was born to instrument the child's nutritional status. Rainfall shocks should be uncorrelated with decisions about whether or not to invest in the child's cognitive skills. In the same way, health infrastructure in the district in wave 1 should be uncorrelated with parental decisions about investing in their child's cognitive skills. Therefore, this procedure permits estimation of the impact of child health on child cognitive skills without contamination from unobserved behaviors, such as allocation decisions for other household members, that are concurrent with the child's cognitive skills decision.
This suggests running the following regression in the first stage:

$$\text{height-for-age}_{t} = f\left(\text{food availability}_{t},\ \text{rainfall}_{t},\ \text{neigh}_{t},\ u_{t}\right)$$

where height-for-age is the child's height-for-age z-score, food availability refers to food availability shocks in wave 1, rainfall refers to rainfall shocks, and neigh refers to health characteristics in the district in wave 1 (e.g. population per health center). As Fortin, Lemieux, and Firpo (2011) suggest, the Oaxaca-Blinder decomposition can be performed by replacing the OLS estimates of the beta coefficients with their instrumental variables equivalents.

5. Results

This section examines the test scores of the 1,681 children in the younger cohort, who were about five years old when they were tested in mid 2006 and about eight to nine years old when they were retested at the end of 2009. The section begins by comparing the test scores of ethnic majority (Spanish speakers) and ethnic minority (Quechua, Aymara and other native language speakers) children, first for the entire sample and then for the subsample of districts that have both Spanish speaking and non-Spanish speaking children. Then, it presents estimates of cognitive skills production functions for both vocabulary and mathematics that attempt to explain the determinants of test scores and why test scores are lower for ethnic minority children.

Table 1 presents means and standard deviations of the scores for the Math Achievement test and the PPVT-III test, first for the full sample and then separately for the Spanish speaking population and the indigenous speaking population. The top half of the table shows statistics for all students, while the bottom half limits the sample to the seven districts that had both Spanish speakers and ethnic minority children. Beginning with the first row in that table, the average child in the full sample had a standardized PPVT score of 88.7 in 2005.
The gap between non-indigenous and indigenous children is not so large: the former had a mean score of 90 while the latter had a mean score of 79.9, a difference of 10.1, which is equivalent to 0.4 standard deviations of the distribution of test scores. This gap, however, increases considerably in 2009, when non-indigenous children had a mean score of 99.4 and indigenous children a mean score of 77.0, a difference of 22.4, which is equivalent to 1.1 standard deviations of the distribution of test scores.

Turning to the mathematics scores, the average child in the full sample correctly answered 8.4 of the 15 questions in 2005. The gap between non-indigenous and indigenous children is small: the former had a mean score of 8.5 and the latter had a mean score of 8.0, a difference of 0.5 points, which is equivalent to 0.2 standard deviations of the distribution of test scores. However, the mathematics achievement gap increases in 2009. The average child in the full sample correctly answered 12.1 of the 29 questions in the Mathematics Achievement test. Non-indigenous children had a mean score of 12.6 while indigenous children had a mean score of 8.2, a difference of 4.5, which is equivalent to 0.9 standard deviations of the distribution of test scores.

These simple descriptive statistics show that the test score gap between non-indigenous and indigenous children is increasing over time. It is possible, however, that part of the difference between the test scores of non-indigenous and indigenous children may be due to the fact that they live in different communities, so that the neighborhood environment can play a role. It may also be that living in different communities implies that children attend different schools. To control for differences in communities, the bottom half of Table 1 presents the same statistics as the top half, but is limited to the seven districts that have both non-indigenous and indigenous children in the sample.
This comparison does result in somewhat smaller gaps for 2005 (44 percent smaller for the PPVT test and 65 percent smaller for the Math test), and marginally smaller gaps for 2009 (12 percent smaller for both PPVT and Math tests). Yet, there are still large gaps even when comparing non-indigenous and indigenous children who live in the same district.

To better understand the nature of the gaps, the rest of this section presents regressions that attempt to explain the differences in the test scores of indigenous and non-indigenous children in Peru. The results for the PPVT test are shown in Table 2, those for the Mathematics test are shown in Table 3, and those for the gap decomposition in Table 4. The first column of Table 2 shows the estimates of $\beta^{nind}$ in equation (6) and the second column shows the estimates of $\beta^{ind}$ in equation (5), both for 2005. The next two columns show similar results for 2009.

For 2005, per capita expenditure, age of child in months, per capita expenditure in education, and months in preschool have a significant effect on non-indigenous PPVT skills. Positive impacts of household income on child cognitive abilities are often found in other studies and are quite intuitive; however, a negative impact of expenditure in education is not that intuitive. One possible explanation is that in 2005 most of the children were 5 years old, and expenses on education are not that common in Peru at that age. In 2009, however, expenditure in education had a positive and significant effect on PPVT skills for non-indigenous children, as did household income, maternal education, gender, age of child in months and height-for-age z-scores.
Similar effects were found for indigenous children: in 2005, age of child in months, household income, maternal education, height-for-age z-score and months in day care had a significant effect on PPVT scores; and in 2009, age of child in months, maternal and paternal education, household expenditures and height-for-age z-scores had positive and significant effects.

As previously described, the gap in mathematics achievement was very small in 2005, but increased eight hundred percent by 2009. In the latter year, age in months was significant for both non-indigenous and indigenous children. It is possible that this reflects the fact that older children are more mature and are therefore able to acquire more skills. Paternal education has a significant effect on indigenous children, but not on non-indigenous children. Presumably, this reflects the fact that in indigenous societies the male plays a predominant role, and if the father is educated, he will want his child to be educated as well. Maternal education, on the other hand, has a significant effect only on non-indigenous children. This result is frequently found in the literature (Currie, 2009; James-Burdumy, 2005; Kimmel & Connelly, 2007). Height-for-age z-score has a positive and significant impact on mathematics achievement, as often found in the literature, but only for non-indigenous children.

The Oaxaca-Blinder decomposition can be used to see how much of the observed gap between non-indigenous and indigenous children in their mean PPVT test scores is explained by differences in the means of the explanatory variables between non-indigenous and indigenous children (endowment effect) and how much is explained by differences in the impacts of those variables (performance effect). Table 4 shows the decomposition results. The left-most columns report the decomposition within the 2005 sample, based upon Equation (7).
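To make the mechanics of Equation (7) concrete, the following is a minimal sketch of a three-way decomposition with neighborhood fixed effects on simulated data. It is an illustration under stated assumptions, not the estimation code behind Table 4: the data are randomly generated, the helper names `fe_fit` and `oaxaca_blinder` are introduced here, and standard errors are ignored.

```python
import numpy as np

def fe_fit(y, X, groups):
    """Within (fixed-effects) estimator: demean y and X by neighborhood,
    run least squares, and recover the observation-weighted mean fixed effect."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    y_dm, X_dm = y.copy(), X.copy()
    for g in np.unique(groups):
        mask = groups == g
        y_dm[mask] -= y[mask].mean()
        X_dm[mask] -= X[mask].mean(axis=0)
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    mean_fe = y.mean() - X.mean(axis=0) @ beta
    return beta, mean_fe

def oaxaca_blinder(y_ind, X_ind, g_ind, y_nind, X_nind, g_nind):
    """Split the mean gap (indigenous minus non-indigenous) into the
    endowment, performance, and neighborhood terms of Equation (7)."""
    b_ind, fe_ind = fe_fit(y_ind, X_ind, g_ind)
    b_nind, fe_nind = fe_fit(y_nind, X_nind, g_nind)
    xbar_ind = np.mean(X_ind, axis=0)
    xbar_nind = np.mean(X_nind, axis=0)
    return {
        "gap": np.mean(y_ind) - np.mean(y_nind),
        "endowments": (xbar_ind - xbar_nind) @ b_ind,
        "performance": xbar_nind @ (b_ind - b_nind),
        "neighborhood": fe_ind - fe_nind,
    }

# Simulated data (assumption): two covariates, five neighborhoods per group.
rng = np.random.default_rng(0)

def simulate(n, x_shift, beta, fe_values):
    groups = rng.integers(0, len(fe_values), size=n)
    X = rng.normal(loc=x_shift, scale=1.0, size=(n, 2))
    y = X @ beta + fe_values[groups] + rng.normal(size=n)
    return y, X, groups

y_i, X_i, g_i = simulate(300, 0.0, np.array([1.0, 0.5]), np.linspace(-1.0, 0.0, 5))
y_n, X_n, g_n = simulate(300, 0.8, np.array([1.5, 0.5]), np.linspace(0.0, 1.0, 5))

parts = oaxaca_blinder(y_i, X_i, g_i, y_n, X_n, g_n)
explained_sum = parts["endowments"] + parts["performance"] + parts["neighborhood"]
print(parts)
print("share due to endowments:", parts["endowments"] / parts["gap"])
```

By construction the three terms add up exactly to the raw mean gap, so each can be reported as a percentage of the gap, which is how the shares discussed in this section should be read.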
Approximately 71 percent of the 2005 PPVT test score gap can be explained by the different endowments across indigenous and non-indigenous children. A smaller share, 5 percent, is explained by differing neighborhood effects. In 2009, however, 64 percent of the PPVT gap can be explained by the different endowments across indigenous and non-indigenous children. We can observe that parental education explains 49 percent of the gap, followed by preschool (13 percent). Twenty-two percent of the gap is accounted for by the returns, or performance.

Some researchers call the endowment effect the "explained" part of the gap, because modifying the endowments of parental education, child health, or preschool for the indigenous groups will reduce the gap. This can be done, for example, with policies that reduce adult illiteracy in indigenous communities, policies that provide incentives to indigenous people to get vaccinations for their children, or policies that provide them with nutritious food for their children. These same researchers call the performance effect the "unexplained" part of the gap, because there are no simple policies that can be used to modify it. This means that we can also interpret these results in the following way: seventy-eight percent of the gap can be explained by parental, child and preschool factors (64 percent) and neighborhood factors (14 percent), and 22 percent of the gap is unexplained.

One objective of this study is to decompose the gaps within years, but another is to understand what caused the increase in the gap between 2005 and 2009. Thus, the right-most columns report the decomposition of the test score difference-in-differences in each subject, based upon Equation (8). Overall, the gap increases by 0.46 standard deviations in vocabulary and by 1.08 in mathematics.
The decomposition of the vocabulary gap suggests that 79 percent of this change is explained by changing endowments of family, child and neighborhood, consistent with previous descriptive evidence. Similarly, in the case of mathematics, 77 percent of the gap can be explained by changes in the endowments. To summarize, the evidence suggests that convergence in endowments of family, children, preschool and neighborhood variables, at least those observed in the Young Lives data, plays a significant role in the increasing gap.

The next tables present results with the IV method. Recall that the OLS estimates in Table 2 and Table 3 for non-indigenous and indigenous children in 2005 and for indigenous children in 2009 suggest that poor nutrition has no significant effect on children's cognitive development, which is consistent with previous findings (Alderman et al., 2001; Stein et al., 1975; Waber et al., 1981), but is inconsistent with Glewwe & King (2001) and Villar et al. (1984). Yet, the OLS results may be misleading, because tastes for child quality may induce a correlation between child nutrition and cognitive skills that is not causal. To address this problem, we use an IV approach and present these results in Tables 5 and 6. Estimates in these tables treat child nutrition as endogenous, using food availability shocks, rainfall shocks, and health characteristics in the community as instruments. The first-stage results are given in Appendix Table A2.

These parameter estimates are different from the OLS results. Indeed, a Hausman test rejects the hypothesis that the two sets of estimates are equal. The most notable difference is that the coefficient on child nutrition increased relative to the OLS estimate. These results are consistent with Glewwe & King (2001), Alderman et al. (2001), Waber et al. (1981), and Stein et al. (1975), whose studies suggest that the OLS estimates are biased.
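One mechanism by which an IV coefficient can exceed its OLS counterpart is measurement error in the nutrition proxy, the second problem noted in the discussion of instruments. The sketch below illustrates this on simulated data with a hand-rolled two-stage least squares; it is not the estimation used in this paper. The data-generating values, the three simulated instruments, and the function names `ols` and `two_sls` are all assumptions made for the example.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients with an intercept in position 0."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def two_sls(y, endog, instruments):
    """Manual 2SLS with one endogenous regressor: regress it on the
    instruments, then regress y on the first-stage fitted values."""
    Z1 = np.column_stack([np.ones(len(y)), instruments])
    gamma, *_ = np.linalg.lstsq(Z1, endog, rcond=None)
    fitted = Z1 @ gamma
    return ols(y, fitted)

rng = np.random.default_rng(42)
n = 20_000
true_effect = 2.0

# Three simulated instruments (stand-ins for food, rainfall, health shocks).
Z = rng.normal(size=(n, 3))
true_nutrition = Z @ np.array([0.5, 0.4, 0.3]) + rng.normal(size=n)
# Observed height-for-age is a noisy measure of true nutritional status.
observed_haz = true_nutrition + rng.normal(size=n)
test_score = true_effect * true_nutrition + rng.normal(size=n)

beta_ols = ols(test_score, observed_haz)[1]
beta_iv = two_sls(test_score, observed_haz, Z)[1]
print(f"OLS: {beta_ols:.2f}  IV: {beta_iv:.2f}  true: {true_effect:.2f}")
```

In this simulation OLS is attenuated toward zero while the IV estimate is close to the true effect. The second-stage standard errors from a manual procedure like this one are not valid, so inference should rely on a dedicated IV routine.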
The instrumental variables used for the height-for-age z-score have strong predictive power, as revealed by the F-test statistics (bottom of Table A2). The model easily passes the standard over-identification test, suggesting that the instruments are not correlated with the error term in the demand equation for cognitive skills.

Table 5 presents regression results using an IV approach that includes measures of child health, parenting, and neighborhood characteristics. Regressions include the full vector of characteristics specified in Table 2. Results suggest that for 2005 and 2009 maternal education, gender and expenditure in education have a statistically significant effect on vocabulary test scores for non-indigenous children. The effect of expenditure in education is negative in 2005. As shown in Table A1, Peruvian parents spent 5 times more in 2009 than in 2005; thus, presumably, because they spent so little in 2005, it does not have a positive influence on test scores. In addition, for 2009, height-for-age z-score has a positive effect on vocabulary for indigenous children. This effect is 3.5 times bigger than the one described in Table 2 (OLS). In the case of non-indigenous children, maternal education, household expenses and educational expenses have positive effects on vocabulary of a similar magnitude to the ones found with the OLS regressions.

Similar effects are found for mathematics achievement in 2009. Age of child in months has a positive effect on mathematics achievement for indigenous children. Expenditure in education also has a positive and significant effect on mathematics achievement for indigenous and non-indigenous children. For indigenous children, being a boy has a positive impact on mathematics achievement, as does height-for-age z-score. The effect of height-for-age z-score on mathematics achievement is almost 6 times larger in the IV estimation than in the OLS estimation.
For non-indigenous children, maternal and paternal education also play an important role.

Results of the Oaxaca-Blinder decomposition are presented in Table 6. As before, the left-most columns report the decomposition within the 2005 sample. Approximately one-third of the 2005 PPVT test score gap can be explained by parental endowments, while in 2009 half the gap was explained by parental endowments. The right-most columns report the decomposition of the test score difference-in-differences in each subject. Overall, the gap increases by 0.46 standard deviations in vocabulary and by 1.1 in mathematics. The decomposition of the vocabulary gap suggests that 78 percent of this change is explained by changing endowments of family and child, consistent with previous descriptive evidence. Similarly, in the case of mathematics, 79 percent of the gap can be explained by changes in the endowments. In both cases, the major contributor is parental education.

To summarize, the evidence suggests that convergence in endowments of family, children, and parental behavior towards daycare and preschool plays a significant role in the increasing gap. Even though child health is an important determinant of vocabulary and mathematics scores for indigenous children when using the IV approach, it does not explain a greater portion of the endowment effect of the ethnic gap. It may be that child health operates through the performance effect. This result is consistent with the previous literature in developing countries, which shows that child health is an important determinant of a child's cognitive skills. However, our results also show that parental behavior towards daycare and preschool plays a significant role. Thus, it seems that investments in early education will help close the ethnic gap.

Auxiliary regressions were run for different scenarios. We used different definitions of indigenous and found similar results.
Also, we included school variables for 2009 (when children were old enough to go to school). Table 7 reports the indigenous – non-indigenous gap decomposition considering schooling factors. However, we found that the effect of school was small.

6. Discussion

In this paper we studied the achievement gap between indigenous and non-indigenous children using a unique longitudinal data set, one that follows a large sample of children in a low-income country from age 1 to age 8. Several important findings emerge from this analysis.

First, there is a strong relationship between childhood nutrition and academic achievement, and this factor explains an important portion of the achievement gap between indigenous and non-indigenous children. The positive relationship between nutrition and achievement persists even after controlling for parental and household characteristics. Our results thus support a causal link between nutrition and academic success, though arguably a definitive answer to the causality question is only possible from an ideal (and therefore unlikely) empirical experiment.

Second, parental characteristics, especially maternal education, have a positive and significant effect on child achievement. Moreover, this factor is also important in closing the indigenous – non-indigenous gap. However, the child development literature in the US points out that this is a crucial factor, probably the most important in explaining child achievement. Preliminary results that are not presented in this paper show that this is also the case for older children (using the Young Lives older cohort for Peru, where children were seven years old in wave 1, twelve in wave 2 and fifteen in wave 3). In the case studied in this paper, it seems possible that the effects of parental attitudes and behaviors when children were infants and toddlers are acting through the daycare and preschool variables.
It is important to note that there is no consensus in the literature about the definition of indigenous. Some studies define indigenous based on self-reported ethnicity, maternal tongue or child's tongue. We used the latter definition. However, as a robustness check, we tried the other two definitions and the results were similar.

When considering the policy implications of our findings, substantial caution is in order. First, although we found evidence that the effect of school characteristics is small, we cannot conclude that schooling variables are not important. The data set that we used was collected by interviewing individuals in the household. We did not have access to teaching reports or specific data from schools. Our schooling variables are based on parental perceptions about schools. Second, these findings are based on one area of one developing country; further evidence is needed from other countries before we can claim that parental education is the most critical factor in closing the indigenous – non-indigenous achievement gap. Third, even though child health has a large, positive and significant effect on achievement for indigenous children, the cost of preventing child malnutrition must be compared with the cost of literacy programs and further education classes for adults targeted to indigenous women.

In sum, the results of this study suggest that parenting characteristics, child nutrition, school and neighborhood characteristics are responsible for more than 75 percent of the indigenous – non-indigenous gap. These findings suggest that there may be large returns to programs that address malnutrition, as well as to literacy programs for adult women and to programs that improve parenting quality. However, a full assessment of such policies would require a more complete analysis of how families make decisions about what inputs to provide to their children.

References

Alderman, H., Behrman, J.R., Lavy, V., & Menon, R. (2001).
Child health and school enrolment: A longitudinal analysis. The Journal of Human Resources, 36(1): 185205. Becker, G.S. (1965). A theory of the allocation of time. The Economic Journal, 75(299), 493-517. Bernal, R. (2008). The effect of maternal employment and child care on children’s cognitive development. International Economic Review, 49(4):1173-1209. Cook, M.D., & Evans, W.N. (2000). Families or schools? Explaining the convergence in white and black academic performance. Journal of Labor Economics, 18(4): 729754. Currie, J. (2009). Healthy, Wealthy, and Wise: Socioeconomic Status, Poor Health in Childhood, and Human Capital Development, Journal of Economic Literature 47(1), 87-122. Fortin, N., Lemieux, T., and Firpo, S. Decomposition methods in economics. In Handbook of Labor Economics, 2011. Glewwe, P., & King, E.M. (2001). The impact of early childhood nutritional status on cognitive development: Does the timing of malnutrition matter? The World Bank Economic Review, 15(1): 81-115. Glewwe, P., & E.A. Miguel. (2008). The impact of child health and nutrition on education in less developed countries, Handbook of Development Economics, Elsevier. Vol. 4: 3562-3604. Hernandez-Zavala, M., Patrinos, H.A., Sakellariou, C., & Shapiro, J. (2006). Quality of schooling and quality of schools for indigenous students in Guatemala, Mexico and Peru. World Bank Working paper 3982. International Monetary Fund. World Economic Outlook Database-September 2011, International Monetary Fund. Accessed on September 20, 2011. James-Burdumy, S. (2005). The effect of maternal labor force participation on child development. Journal of Labor Economics, 23(1), 177-211. Kimmel, J., & Connelly R. (2007). Mother’s time choices. Caregiving, leisure, home production, and paid work. Journal of Human Resources, 42(3), 643-81. 36 Leibowitz, A. (1977). Parental inputs and children’s achievement. Journal of Human Resources, 12(2), 242-51. Marshall, J.H. (2009). 
School quality and learning gains in Guatemala. Economics of Education Review, 28(2): 207-216. McEwan, P.J. (2007). Can schools reduce the indigenous test score gap? Evidence from Chile. Journal of Developmental Studies, 44(10): 1506-1530. McEwan, P.J., & Trowbridge, M. (2007). The achievement of indigenous students in Guatemalan primary schools. International Journal of Educational Development, 27: 61-76. McKee, & Todd, P. (2009). The longer term effects of human capital enrichment programs on poverty and inequality: Oportunidades in Mexico. Working paper. Neidell, M. (2000). Early Parental Time Investments. In Children's Human Capital Development: Effects Of Time In The First Year On Cognitive And Non-Cognitive Outcomes. UCLA Economics Working Papers, 886. Oaxaca, R. & Ransom, M. (1999). Identification in detailed wage decomposition. Review of Economics and Statistics, 81(1): 154-157. Paxson, C., & Schady, N. (2007). Cognitive development among young children in Ecuador: The role of health, wealth and parenting. Journal of Human Resources, 42(1), 49-84. Schady, N., “Early childhood development in Latin America and the Caribbean (March 1, 2006),” World Bank Policy Research Working Paper No. 3869. Shin, H. (2007). Child Health in Peru: Importance of Regional Variation and Community Effects on Children's Height and Weight. Journal of Health and Social Behavior, 48(4): 418-433. Stein, Z., Susser, M., Saenger, G., & Marolla, F. (1975). Famine and Human Development: The Dutch Hunger Winter of 1944-45. New York: Oxford University Press. Todd, P., & Wolpin, K. (2007). The production of cognitive achievement in children: home, school, and racial test scores gaps. Journal of Human Capital, 1(1), 91-136. Umbel, V.M., B.Z. Pearson, M.C. Fernandez, & Oller, D.K. (1992) Measuring bilingual children’s receptive vocabularies. Child Development, 63(4): 1012-1020. UNICEF. (2009). 
The state of the world’s children special edition: Celebrating 20 years of the Convention on the Rights of the Child.
United Nations. (2010). Human Development Report 2010. New York: Palgrave Macmillan.
Villar, J., Smerglio, V., Martorell, R., Brown, C.H., & Klein, R.E. (1984). Heterogeneous growth and mental development of intrauterine growth-retarded infants during the first 3 years of life. Pediatrics, 74(5), 783-791.
Waber, D., Vuori-Christiansen, L., Ortiz, N., Clement, J., Christiansen, N., Mora, J., Reed, R., & Herrera, G. (1981). Nutritional supplementation, maternal education, and cognitive development of infants at risk of malnutrition. American Journal of Clinical Nutrition, 34, 807-813.
Wilcox-Gök, V. (1983). The determination of child health: An application of sibling and adoption data. The Review of Economics and Statistics, 65(5), 266-273.
Wooldridge, J. (2010). Econometric analysis of cross-section and panel data (2nd ed.). MIT Press.

Figure 1 – Regions in the Sample by Child’s Spoken Language
Language Spoken: Only Spanish; Spanish and Indigenous (Quechua, Aymara or other)

Figure 2 – PPVT and Math Test Scores by Ethnicity, rounds 2 and 3

Figure 3 – Math Test Scores by different characteristics, rounds 2 and 3 (panels: Round 2, Round 3)

Table 1: Mean Test Scores for Ethnic Majority and Ethnic Minority Children

All districts

| Student Type | Variable | Year | Mean | Std. Dev. |
|---|---|---|---|---|
| Full sample (n=1,662) (n=1,681) | PPVT standardized score | 2005 | 88.71 | 21.77 |
| | | 2009 | 96.55 | 19.92 |
| | CDA-Q raw score | 2005 | 8.40 | 2.11 |
| | | 2009 | 12.05 | 4.91 |
| Spanish (child speaks Spanish) (n=1,450) (n=1,462) | PPVT standardized score | 2005 | 90.00 | 22.04 |
| | | 2009 | 99.41 | 18.57 |
| | CDA-Q raw score | 2005 | 8.47 | 2.13 |
| | | 2009 | 12.63 | 4.72 |
| Indigenous (speaks other language) (n=212) (n=219) | PPVT standardized score | 2005 | 79.93 | 17.49 |
| | | 2009 | 76.97 | 17.72 |
| | CDA-Q raw score | 2005 | 7.96 | 1.91 |
| | | 2009 | 8.17 | 4.34 |

Mixed districts

| Student Type | Variable | Year | Mean | Std. Dev. |
|---|---|---|---|---|
| Full sample (n=572) (n=584) | PPVT standardized score | 2005 | 89.67 | 21.99 |
| | | 2009 | 95.93 | 20.92 |
| | CDA-Q raw score | 2005 | 8.48 | 2.09 |
| | | 2009 | 11.95 | 5.08 |
| Non-indigenous (n=360) (n=365) | PPVT standardized score | 2005 | 85.57 | 20.86 |
| | | 2009 | 96.78 | 20.28 |
| | CDA-Q raw score | 2005 | 8.14 | 2.11 |
| | | 2009 | 12.09 | 4.84 |
| Indigenous (n=212) (n=219) | PPVT standardized score | 2005 | 79.93 | 17.49 |
| | | 2009 | 76.97 | 17.72 |
| | CDA-Q raw score | 2005 | 7.96 | 1.91 |
| | | 2009 | 8.17 | 4.34 |

Table 2 - Regression Estimates for PPVT Test

2005 Spanish Indigenous Age of Child in month 0.596 *** Per Capita Expenditure(log) 6.455 *** Father's education 0.325 -0.085 Mother's education 0.303 1.008 Child is male -4.318 Height-for-age Z-score Per Capita Expenditure in education, (log) -1.324 Months in pre-school -0.039 * 1.553 0.572 *** 0.344 *** 0.437 *** 5.703 *** 3.496 ** 4.005 ** 0.894 ** *** ** *** 1.196 3.425 ** 3.051 1.975 * 3.252 -0.146 3.422 *** 0.708 -0.06 0.127 -0.152 0.11 0.119 1.083 0.268 Hours per week in pre-school 0.117 -0.091 -0.03 0.027 0.942 0.95 0.973 0.96 Adjusted R 0.302 * Months in daycare, 2 * 0.359 1.542 2.775 *** 2009 Spanish Indigenous * **

Notes: Child’s districts are also taken as control for neighborhood effect. *p<.1, **p<.05, ***p<.01.

Table 3 - Regression Estimates for Mathematics Test

2005 Spanish Indigenous Age of Child in month Per Capita Expenditure(log) 0.106 0.096 * *** 0.017 0.05 0.221 0.109 0.1 Father's education 0.013 0.061 0.116 0.226 Mother's education 0.017 0.021 0.274 *** -0.013 Child is male 0.119 -0.054 1.118 ** 0.831 Height-for-age Z-score Per Capita Expenditure, (log) 0.042 -0.139 0.329 -0.116 -0.077 1.089 Months in pre-school -0.002 0.004 0.021 0.041 0.014 -0.01 0.01 -0.006 -0.003 -0.011 -0.041 0.938 0.943 0.88 0.853 Months in daycare, Hours per week in preschool 2 Adjusted R 0.25 *** 2009 Spanish Indigenous 0.048 ** *** *** * ** 0.977 *** 0.716 **

Notes: Child’s districts are also taken as control for neighborhood effect. *p<.1, **p<.05, ***p<.01.

Table 4 – Decomposition Results

Panel A. PPVT Test Score

| Component | 2005 | % | 2009 | % | Difference | % |
|---|---|---|---|---|---|---|
| Endowment Effect: Parental Education | 1.60 | 45% | 5.72 | 49% | 4.12 | 51% |
| Endowment Effect: Child's Health | -0.51 | -14% | 0.35 | 3% | 0.86 | 11% |
| Endowment Effect: Preschool | -1.84 | -52% | 1.46 | 13% | 3.30 | 41% |
| Endowment Effect: Other | 3.26 | 92% | 1.60 | 14% | -1.66 | -21% |
| Endowment Effect: Neighborhood Effects | 0.18 | 5% | -0.08 | -1% | -0.26 | -3% |
| Performance Effect | 0.86 | 24% | 2.57 | 22% | 1.71 | 21% |
| Total Gap | 3.56 | 100% | 11.63 | 100% | 8.07 | 100% |

Panel B. Math Test Score

| Component | 2005 | % | 2009 | % | Difference | % |
|---|---|---|---|---|---|---|
| Endowment Effect: Parental Education | 0.08 | -227% | 1.14 | 51% | 1.06 | 47% |
| Endowment Effect: Child's Health | 0.01 | -38% | 0.05 | 2% | 0.04 | 2% |
| Endowment Effect: Preschool | -0.22 | 614% | 0.56 | 25% | 0.77 | 34% |
| Endowment Effect: Other | 0.15 | -440% | 0.08 | 4% | -0.08 | -3% |
| Endowment Effect: Neighborhood Effects | 0.03 | -76% | -0.03 | -1% | -0.05 | -2% |
| Performance Effect | -0.09 | 266% | 0.44 | 20% | 0.53 | 23% |
| Total Gap | -0.04 | 100% | 2.24 | 100% | 2.28 | 100% |

Table 5 – Regression Estimates for PPVT Test, Instrumental Variable Method

Mathematics Vocabulary 2005 Variable Age of Child in month Household Per Capita Expenditure(log) Spanish 0.182 *** 2009 Indigenous 0.093 *** Spanish 2005 Indigenous -0.032 0.119 ** Spanish 0.699 ** *** 2009 Indigenous 0.664 ** 0.313 5.456 ** 3.599 0.040 0.229 0.256 -0.076 6.169 Father's education -0.027 0.063 0.173 * 0.160 0.281 -0.125 Mother's education -0.034 0.025 0.306 *** 0.170 0.233 0.919 Child is male 0.244 -0.067 1.058 ** 1.037 * Height-for-age Z-score Education per Capita Expenditure, (log) 1.405 ** 0.222 1.097 3.440 *** -0.168 *** -0.074 1.257 0.537 * Months in pre-school -0.007 0.004 0.018 0.042 -0.043 0.013 -0.011 0.045 0.275 Months in daycare 0.050 *** *** -4.159 * 0.321 -1.406 *** ** Spanish 0.883 ** 0.389 0.459 *** 0.282 1.473 3.418 ** 4.313 -0.666 1.101 -0.246 3.540 -0.060 0.125 -0.162 0.106 0.357 * 18.843 ** -0.607 Hours per week in pre-school -0.005 -0.004 0.002 -0.047 0.113 -0.065 -0.021 0.015 Adjusted R2 0.8564 0.8470 0.8006 0.8026 0.8593 0.8486 0.801 0.7982 *** 2.777 1.557 0.332 * Indigenous ***

Notes: Child’s districts are also taken as control for neighborhood effect. *p<.1, **p<.05, ***p<.01.

Table 6 – Decomposition results, Instrumental Variable Method

Panel A. PPVT Test Score

| Component | 2005 | % | 2009 | % | Difference | % |
|---|---|---|---|---|---|---|
| Endowment Effect: Parental Education | 1.3 | 36% | 5.83 | 50% | 4.53 | 55% |
| Endowment Effect: Child's Health | 0.11 | 3% | 0.20 | 2% | 0.09 | 1% |
| Endowment Effect: Preschool | -1.94 | -54% | 1.53 | 13% | 3.47 | 42% |
| Endowment Effect: Other | 3.13 | 88% | 1.64 | 14% | -1.49 | -18% |
| Endowment Effect: Neighborhood Effects | 0.18 | 5% | -0.09 | -1% | -0.27 | -3% |
| Performance Effect | 0.79 | 22% | 2.63 | 22% | 1.84 | 22% |
| Total Gap | 3.57 | 100% | 11.74 | 100% | 8.17 | 100% |

Panel B. Math Test Score

| Component | 2005 | % | 2009 | % | Difference | % |
|---|---|---|---|---|---|---|
| Endowment Effect: Parental Education | -0.16 | 291% | 1.37 | 60% | 1.53 | 66% |
| Endowment Effect: Child's Health | 0.44 | -804% | -0.18 | -8% | -0.62 | -27% |
| Endowment Effect: Preschool | -0.28 | 499% | 0.65 | 29% | 0.92 | 40% |
| Endowment Effect: Other | 0.06 | -112% | 0.13 | 6% | 0.07 | 3% |
| Endowment Effect: Neighborhood Effects | 0.03 | -47% | -0.04 | -2% | -0.07 | -3% |
| Performance Effect | -0.15 | 273% | 0.34 | 15% | 0.49 | 21% |
| Total Gap | -0.06 | 100% | 2.27 | 100% | 2.32 | 100% |

Appendix Tables

Appendix Table 1 – Descriptive Statistics – Children in and out of the sample

| Variable | Sample in this study, all districts (n=1662): Mean | Std. Dev. | Sample in this study, mixed districts (n=987): Mean | Std. Dev. | All YLS for Peru: Mean | Std. Dev. | Obs. |
|---|---|---|---|---|---|---|---|
| Standardized PPVT, round 2 | 88.71 | 21.77 | 89.67 | 21.99 | 88.42 | 21.79 | 1903 |
| Standardized PPVT, round 3 | 96.55 | 19.92 | 95.93 | 20.92 | 95.58 | 20.15 | 1904 |
| Math test, round 2 | 8.40 | 2.11 | 8.48 | 2.09 | 8.37 | 2.16 | 1949 |
| Math test, round 3 | 12.05 | 4.91 | 11.95 | 5.08 | 11.90 | 4.93 | 1884 |
| Being indigenous (child's spoken language) | 0.13 | 0.33 | 0.21 | 0.41 | 0.13 | 0.34 | 1963 |
| Being indigenous (maternal tongue) | 0.28 | 0.45 | 0.47 | 0.50 | 0.29 | 0.45 | 1954 |
| Age of child in months, round 2 | 63.53 | 4.69 | 63.42 | 4.86 | 63.45 | 4.71 | 1963 |
| Age of child in months, round 3 | 103.53 | 4.69 | 103.42 | 4.86 | 103.45 | 4.71 | 1963 |
| Per Capita Expenditure (logs), round 2 | 7.11 | 0.90 | 7.06 | 0.94 | 7.07 | 0.91 | 1963 |
| Per Capita Expenditure (logs), round 3 | 7.51 | 0.85 | 7.51 | 0.87 | 7.47 | 0.86 | 1932 |
| Father's education, round 2 | 9.24 | 3.94 | 9.21 | 4.05 | 9.15 | 3.97 | 1906 |
| Father's education, round 3 | 9.16 | 4.06 | 9.09 | 4.16 | 9.07 | 4.05 | 1923 |
| Mother's education, round 2 | 8.03 | 4.52 | 7.63 | 4.78 | 7.81 | 4.55 | 1954 |
| Mother's education, round 3 | 8.08 | 4.40 | 7.71 | 4.63 | 7.89 | 4.44 | 1816 |
| Male | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 2052 |
| Height-for-age Z-score, round 2 | -1.43 | 1.08 | -1.53 | 1.08 | -1.50 | 1.12 | 1950 |
| Height-for-age Z-score, round 3 | -1.11 | 1.03 | -1.23 | 1.03 | -1.17 | 1.06 | 1938 |
| Expenditure in child’s education (logs), round 2 | 3.68 | 2.91 | 3.60 | 3.00 | 3.51 | 3.00 | 1963 |
| Expenditure in child’s education (logs), round 3 | 5.27 | 1.23 | 5.35 | 1.19 | 5.15 | 1.45 | 1943 |
| Months in pre-school | 14.10 | 10.19 | 13.68 | 10.17 | 13.09 | 10.42 | 2052 |
| Months in daycare, round 2 | 3.04 | 7.28 | 3.43 | 7.70 | 2.78 | 7.02 | 2052 |
| Hours per week in pre-school | 17.31 | 9.49 | 17.53 | 9.87 | 16.20 | 10.04 | 2052 |
| Number of books in household (excluding textbooks) | 1.71 | 1.26 | 1.69 | 1.26 | 1.68 | 1.26 | 1936 |
| Student repeated a grade on round 3 | 0.04 | 0.21 | 0.04 | 0.20 | 0.04 | 0.21 | 2052 |
| Teacher quality, round 3 | 0.63 | 0.48 | 0.63 | 0.48 | 0.59 | 0.49 | 2052 |
| Attended private school, round 3 | 0.19 | 0.39 | 0.23 | 0.42 | 0.18 | 0.39 | 1938 |
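As a rough arithmetic check on the decomposition tables, the sketch below recomputes each component's share of the raw gap from the point estimates reported in Table 4, Panel A (PPVT test score, 2009 column). Only the numbers come from Table 4; the variable and function names are illustrative assumptions, and the code does not reproduce the estimation itself.

```python
# Point estimates copied from Table 4, Panel A (PPVT test score), 2009 column.
# All names below are illustrative assumptions, not part of the original analysis.
ENDOWMENT_2009 = {
    "Parental Education": 5.72,
    "Child's Health": 0.35,
    "Preschool": 1.46,
    "Other": 1.60,
    "Neighborhood Effects": -0.08,
}
PERFORMANCE_2009 = 2.57
TOTAL_GAP_2009 = 11.63


def shares(endowment, performance, total_gap):
    """Return each component's share of the total gap, in percent."""
    parts = dict(endowment)
    parts["Performance Effect"] = performance
    return {name: 100.0 * value / total_gap for name, value in parts.items()}


def explained_share(endowment, total_gap):
    """Share of the gap attributed to the endowment components, in percent."""
    return 100.0 * sum(endowment.values()) / total_gap


if __name__ == "__main__":
    result = shares(ENDOWMENT_2009, PERFORMANCE_2009, TOTAL_GAP_2009)
    for name, pct in result.items():
        print(f"{name}: {pct:.0f}%")
    total = explained_share(ENDOWMENT_2009, TOTAL_GAP_2009)
    print(f"Endowment components combined: {total:.0f}%")
```

With these inputs the endowment components sum to roughly 78 percent of the 2009 PPVT gap, which is in line with the "more than 75 percent" figure in the Discussion. Small discrepancies against the table are expected because the published estimates are rounded to two decimals.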