Towards Causal Estimates of Children’s Time Allocation on Skill Development

Gregorio Caetano, Josh Kinsler and Hao Teng*

December 2015

Abstract

Cognitive and non-cognitive skills are critical for a host of economic and social outcomes as an adult. While there is broad agreement that a significant amount of skill acquisition and development occurs early in life, the precise activities and investments that drive this process are not well understood. In this paper we examine how children’s time allocation affects their accumulation of skill. Children’s time allocation is endogenous in a model of skill production since it is chosen by parents and children. We apply a recently developed test of exogeneity to search for models that yield causal estimates of the impact time inputs have on child skills. We show that the test, which exploits bunching in time inputs induced by a non-negativity time constraint, has power to detect endogeneity stemming from omitted variables, simultaneity, measurement error, and several forms of model misspecification. Results suggest that with a rich set of controls we can consistently estimate the impact of time inputs on skills, though there is significant heterogeneity in which controls matter for different skills at different ages. For children aged 12 to 17, active time with adult family members appears to be the most productive use of time in developing cognitive skills. In contrast, time spent alone and passive time with the family are important to develop non-cognitive skills at these ages. For children aged 5 to 11, with few exceptions time allocation does not have a strong effect on cognitive skills. Moreover, sleep is the most productive input for non-cognitive skill development at these ages.

* Gregorio Caetano: Economics Department, University of Rochester. Josh Kinsler: Economics Department, University of Georgia. Hao Teng: Economics Department, University of Rochester.
We would like to thank Carolina Caetano, Vikram Maheshri, Daniel Ringo, and David Slichter for helpful discussions. All errors are our own.

1 Introduction

There is a growing consensus among economists that skills acquired during childhood have an important influence on later life outcomes.1 Cognitive skills measured as early as age seven have been linked with educational attainment, employment, and wages as an adult.2 Recent research stresses the importance of early childhood non-cognitive skills on later life labor market, marriage, health, and criminal outcomes.3 Additional analyses indicate that adult labor market outcomes are largely determined by skills already in place by age 16.4 In light of the evidence linking children’s cognitive and non-cognitive skills to adult outcomes, it is important to understand the determinants of these skills.

One can envision a production technology for child skills where endowed ability is combined with various environmental elements and time inputs contributed by parents, siblings, grandparents, friends, etc. Households will choose inputs and activities optimally given their preferences and constraints. The challenge in estimating the underlying skill production function is two-fold. First, the time allocation of child activities is endogenous since it reflects choices made by households. Second, researchers are typically unable to observe the full vector of activities, and can often only observe one or two time inputs. Thus, even if exogenous variation is available for a particular activity of interest, it is difficult to interpret the resulting coefficient without information on the substitution among all potential activities.
1 See Almond and Currie (2011) for a comprehensive review of the literature.

2 For instance, McLeod and Kaiser (2004) find that age 7 test scores and family background measures predict 11% (12%) of the variability in the probability of high school (college) completion. Using similar measures, Currie and Thomas (1999) are able to explain 4-5% of the variation in employment and 20% of the variation in wages at age 33.

3 See for example Cunha et al. (2006), Deming (2009) and Heckman et al. (2013).

4 See Keane and Wolpin (1997) and Cameron and Heckman (1998) as examples.

These challenges are well known and have been widely discussed in the literature, yet so far they have not been systematically addressed.5 On the one hand, studies such as Dustmann and Schönberg (2012) and Bernal and Keane (2010) utilize quasi-experimental policy variation that enables them to study how an increase in maternal time affects child cognition. However, a lack of comprehensive data on all other time inputs prevents them from understanding the substitution between activities, making it difficult to infer what the effect of a reallocation of time inputs would be in circumstances other than the change implied by the policy. On the other hand, studies such as Todd and Wolpin (2007), Fiorini and Keane (2014), and Del Boca et al. (2013) estimate the impact of a more comprehensive list of child inputs on skills, but lack quasi-experimental variation.6 Going forward, the lack of exogenous variation in models with many inputs is unlikely to be solved through an instrumental variables approach since each input requires its own instrument. Moreover, running an experiment is also difficult, as the treatment arms would have to consist of fully prescribed inputs for each child. If only some inputs are manipulated, parents can optimize over the remaining ones to either reinforce or negate the intended effect. Either way, it will be difficult to estimate the ceteris paribus causal impact of substituting one time input for a well-defined alternative time input.

5 See Todd and Wolpin (2003) and Keane (2010) for a discussion of these issues.

6 Todd and Wolpin (2007) choose their most preferred model using root mean-squared error as the selection criterion. However, as the authors point out, this measure speaks to fit but not necessarily to whether the parameters of interest can be interpreted as causal. Fiorini and Keane (2014) approach the identification problem by estimating multiple production functions that rely on different exogeneity assumptions. Stability of the productivity ranking of inputs is cited as evidence that a reallocation of time use can enhance child development. However, as the authors point out, it is difficult to claim these estimates are causal since the models could suffer from similar biases. Similarly, Del Boca et al. (2013) assume inputs are exogenous conditional on lagged skill, but it is difficult to assert whether these controls are enough to handle endogeneity.

Our aim in this paper is to estimate the impact of children’s time allocation on cognitive and non-cognitive skills while also investigating endogeneity concerns in a systematic manner. To do this, we exploit a recently developed test of exogeneity (Caetano (2015); see also Caetano and Maheshri (2015)) that provides an objective statistical criterion to determine whether the parameters of interest can be interpreted as causal. We use the test to guide us in the search for causal models without an explicit source of quasi-experimental variation. Our empirical approach is essentially a model selection one, but with a key difference from existing approaches: our criterion to select models speaks directly to causality, not to fit. In the context of skill development, model selection occurs in a very large model space; indeed, there are many ex ante equally plausible ways of formulating the relationship between time inputs and skill. We show that a multivariate version of the exogeneity test in this context can be a feasible tool to substantially reduce the set of models that can plausibly be considered appropriate to make causal inference.

The intuition behind the test is as follows. Consistent with the rest of the literature, we assume that cognitive and non-cognitive skills are continuous functions of time allocated to various activities. Thus, conditional on controls, skill measures should not vary discontinuously when a child spends exactly zero minutes on any activity (in comparison to, say, just a few minutes). If we find such discontinuities, then it is evidence that unobserved confounders that are not absorbed by controls vary discontinuously at the zero minute threshold, and hence the model suffers from endogeneity.

For a test based on this logic to have power, we need to ensure that unobservables vary discontinuously when time inputs equal zero. We argue that this is the case in our context since children cannot spend negative time in any activity, and therefore may bunch at zero because of a “corner-solution”. For instance, consider children’s reading ability as an important unobserved confounder. Intuitively, children who spend a few minutes per week reading choose that amount optimally given their constraints and preferences. In contrast, among children who spend zero minutes per week reading, there are children who optimally choose exactly zero minutes and children who would optimally choose negative amounts of reading time if possible. This implies that the average reading ability among children who spend x minutes reading tends to be discontinuous at x = 0, and thus the test would have power to detect endogeneity stemming from unobservables related to reading ability.
The more unobserved confounders that vary discontinuously at zero minutes for a given time input, the more powerful the test will be as it can detect endogeneity stemming from multiple sources. One potential concern is that our exogeneity test might lack power: in spite of the intuition above, potential confounders might not vary discontinuously when children spend zero minutes in an activity. If this is the case, then we would not be able to detect endogeneity even if endogeneity exists. To allay this concern we first show that each activity of interest has a discontinuous mass of observations at zero minutes, providing direct evidence of bunching (McCrary (2008)). Next, we show that a variety of key observed characteristics are discontinuous at the zero threshold. While not a formal test, this is suggestive that unobservables are discontinuous as well.7 It also suggests that discontinuities when inputs are zero are the norm rather than the exception. Finally, we are able to detect discontinuities in the most parsimonious models. It is only when we add a richer set of controls that we fail to reject exogeneity. Even if the evidence discussed above establishes that the test has power, the test might not have power to detect all types of endogeneity we are concerned about. In particular, some potential confounders might be continuous when inputs are zero. We allay concerns about their existence in many ways. First, we test for discontinuities at zero minutes for all time inputs jointly. This increases the power of the test since a confounder that is continuous when one input is zero can be discontinuous when a different input is zero. Second, we provide empirical evidence that key variables that elicit the most salient sources of potential endogeneity – omitted variables (e.g., child’s reading ability), simultaneity, measurement error, and many forms of model misspecification – are discontinuous when inputs are zero. 
Third, we implement a series of robustness checks designed to detect confounders that cannot be detected by the test, and find that models that survive the exogeneity test also survive these robustness checks. Finally, we show that estimates are similar across models that survive the test, although they are often different among models that do not survive.

7 This logic is analogous to the one used in regression discontinuity (RD) designs. Support for the RD identifying assumption is based on the idea that if observables vary smoothly at a threshold, then unobservables are also likely to vary smoothly at that same threshold.

We implement our approach using skill assessments and time diaries from the Child Development Supplements of the Panel Study of Income Dynamics (PSID). With the help of their primary caregiver, children filled out a detailed 24-hour time diary to record all of their activities during the day, where each activity took place, and with whom they did the activity. These time diaries were collected in 1997, 2002, and 2007, and covered one weekday and one weekend day for each survey year. Cognitive and non-cognitive skills are also assessed during each wave of the survey. We split the sample into older (12-17) and younger (5-11) children and estimate separate production functions to allow for heterogeneity in the accumulation of skill as children age. In addition to time use and skill assessments, the PSID also includes a detailed list of child demographics, family background characteristics, and other measures of the environment in which the child is raised.

Our search for a model whose parameters can be interpreted as causal proceeds in the following manner. In our baseline models, we categorize child activities according to the level of engagement – active (e.g., reading) or passive (e.g., watching TV) – and with whom the activity is completed – mother, father, siblings, friends, grandparents, others, or no one.
We then relate the time devoted to these activities to skill measures (math, vocabulary, comprehension, non-cognitive) using standard production functions, such as value-added. We also consider models containing many other subclassifications of activities as suggested in the previous literature. The activity excluded from all models is sleeping, so that the results should be interpreted as a substitution between a given activity and sleeping time. Every model also includes a series of indicator variables that reflect whether the time devoted to each activity is zero. These indicator variables are included to absorb any discontinuous change in the outcome variable when inputs are zero, conditional on controls. If we reject the null hypothesis that the coefficients on the zero time input indicators are jointly equal to zero, then we conclude that this particular model suffers from an endogeneity problem. In our context, a model is defined by the outcome variable (skill), the main explanatory variables (time inputs), controls (ranging from none to a detailed list of child, family, and environmental observables) and a particular functional form establishing the relationships between these variables.

For both younger and older children, our search ultimately identifies models for which we are unable to reject exogeneity of children’s time inputs. Part of our success in this endeavor is due to the detailed nature of the time use data. When we estimate the impact of a particular time input, we are able to control for all other time inputs of the same child. These alternative inputs absorb much of the endogeneity, as they elicit heterogeneity in preferences and constraints across children and activity partners in the sample. However, the time use data alone is not sufficient to account for all endogeneity. For younger children, child and family characteristics are crucial additional controls needed to absorb endogeneity in all cognitive and non-cognitive skill models.
In contrast, for older children different categories of controls are important in absorbing endogeneity for different skill measures. For math skill models, child and family characteristics are crucial, while for vocabulary and comprehension models, child characteristics, school characteristics and school experience are important. Further, for models of non-cognitive skills, child characteristics, family environmental characteristics and school characteristics are important for controlling for endogeneity.8 As surveys become increasingly onerous in their time demands on respondents, our results can provide guidance regarding which observed variables are critical to collect when studying child skill development.

Once we arrive at models for which we fail to reject exogeneity, we turn our attention to the key parameters of interest. For older children, the activities that most promote cognitive skill formation are active time with adult family members, such as parents and grandparents. For example, one additional hour per week spent in active time with grandparents leads to an increase in comprehension scores of 2.9% of a standard deviation.9 Non-cognitive skills, on the other hand, are increased most by passive time with parents and alone. For younger children, the estimates of the impact of time inputs are quite different. Sleeping or napping is the most productive time for the development of non-cognitive skills. For example, one additional hour per week spent sleeping or napping rather than in active time with friends would increase non-cognitive skills by 1.8% of a standard deviation. Math, vocabulary, and comprehension skills are less sensitive to alternative time allocations. Our results for cognitive skill development of young children contrast with the results of Fiorini and Keane (2014), who find that active or educational time with parents is quite productive.
We also find such results in our more parsimonious specifications; however, these specifications fail the exogeneity test.

8 Our results help clarify the key finding in Fiorini and Keane (2014) that the ranking of time inputs across various production function models is stable. Each of their preferred models, including contemporaneous, lagged score, and lagged input, includes controls similar to the ones we identify as being important for allaying endogeneity.

9 This result aligns with studies in developmental psychology which emphasize the irreplaceable role of grandparents in the development of grandchildren (see Smith (2003) for more details).

The rest of the paper is organized as follows. In Section 2, we describe the PSID data. Section 3 presents our approach and provides both theoretical and empirical evidence in support of this approach. In Section 4, we present our main results. In Section 5, we perform various robustness checks. In Section 6, we discuss why selection on observables is sufficient to identify the parameters of interest in this context, before we conclude in Section 7.

2 Data

To estimate the effect of time inputs on skill we use data from the Panel Study of Income Dynamics (PSID) and the three waves of the Child Development Supplements (CDS-I, CDS-II, and CDS-III).10 In 1997, the PSID started collecting data on a random sample of the PSID families that had children under the age of 13. About 3,500 children aged 0-12 residing in 2,400 households were interviewed in 1997, and then followed in two subsequent waves, 2002 and 2007. Rows 1-3 in Table 1 illustrate the age range and average age for each wave, respectively. Data collected in the CDS include measures of children’s cognitive and non-cognitive skills, time use diaries, and information about child and family characteristics, such as parent-child relationships, child health, and home environment.
We match the CDS children with their PSID families to get additional information such as family annual income, mother and father’s ages, mother and father’s education levels, and so on. We pool CDS children across the three waves and divide all observations into two groups based on age: younger children (5-11 years old) and older children (12-17 years old). Pooling the data in this manner maximizes our sample size while still allowing for heterogeneous production functions across developmental stages. Rows 4 and 5 in Table 1 illustrate the age range and average age for each group.

To the best of our knowledge, the only other data set combining information on child skills and family background with time use diaries is the Longitudinal Study of Australian Children (LSAC). Compared to the LSAC, the PSID-CDS has the advantage of focusing on a larger age range of children (0-22 years old) and has richer time use data in terms of the number of children’s activities and with whom these activities were performed. Importantly, the PSID-CDS allows us to separate the time children spend with mothers and fathers, which, according to Del Boca et al. (2013), have differential impacts on skill development.11

10 The Panel Study of Income Dynamics is a US longitudinal survey of a nationally representative sample of individuals and families, started in 1968 with a sample of 4,800 families. It is funded by the National Institute of Child Health and Human Development (NICHD).

11 While our focus in this paper is on the PSID-CDS, we have completed analyses similar to Fiorini and Keane (2014) using the LSAC. However, we find that we have significantly less power to detect endogeneity in that sample. In the contemporaneous skill model with no additional controls we fail to reject exogeneity. The lack of power likely stems from somewhat smaller sample sizes and minimal bunching at zero time inputs, for this particular categorization of inputs.

2.1 Time Use Diaries

The time use diary from the CDS collects the details of child activities for two random days of a week (one weekday and one weekend day). Diary forms are mailed to each child’s address, and each child (with the help of her primary caregiver if needed) fills out a detailed 24-hour time diary to record all of her activities during the day, including where each activity took place and with whom she did the activity.12 An interviewer then visits the household to check/edit the diary that has been completed.13 Child activities are classified according to the type of activity (215 in CDS I, 317 in CDS II, and 315 in CDS III), where the activity took place (14), and with whom (11) the activity was completed. In CDS I, most diaries (80%) were completed by the child’s primary caregiver or the child and her primary caregiver together. Sampled children are considerably older in CDS II and CDS III, and as a result approximately half of the children in these rounds completed the time diaries on their own.

We clean the time use data so that the diaries are as representative as possible. Time diaries may have limited reliability since they are only a very small sample of a given child’s days.14 To allay this concern, we first exclude cases where either the weekday or the weekend diary was not returned. Second, we exclude diaries that describe a non-typical day. Third, we keep only complete diaries and do not impute unassigned slots, with one exception: time periods between 10 p.m. and 6 a.m. that are missing are recoded as sleeping or napping, as in Fiorini and Keane (2014).15 As a result, we drop 4% of time diaries in CDS I, 3% in CDS II, and 1% in CDS III. Thus, we are left with complete diaries – those in which the durations of all activities add up to 24 hours. The numbers of observations in our samples are 2,807 in CDS I, 2,520 in CDS II, and 1,424 in CDS III.
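The diary-completeness rules above can be illustrated with a minimal sketch. It assumes a hypothetical per-minute representation of a single diary day (a list of 1,440 activity labels, with `None` marking an unassigned slot) and hypothetical labels such as `"sleep"` and `"refused"`; the PSID-CDS files are not stored this way, so the code only illustrates the logic of the rule, not the actual processing of the data.

```python
MINUTES_PER_DAY = 24 * 60
NIGHT_START = 22 * 60  # 10 p.m., in minutes after midnight
NIGHT_END = 6 * 60     # 6 a.m.


def is_night(minute):
    """True if the minute falls between 10 p.m. and 6 a.m."""
    return minute >= NIGHT_START or minute < NIGHT_END


def clean_diary(diary):
    """Apply the completeness rule to one diary day.

    diary: list of 1,440 labels (hypothetical layout); None = missing slot.
    Returns the cleaned diary, or None if the diary must be dropped.
    """
    if len(diary) != MINUTES_PER_DAY:
        return None  # durations do not add up to 24 hours
    cleaned = []
    for minute, activity in enumerate(diary):
        # Night-time slots that are missing or "refused" are recoded as sleep.
        if activity in (None, "refused") and is_night(minute):
            activity = "sleep"
        # Any other missing slot is not imputed: the diary is incomplete.
        if activity is None:
            return None
        cleaned.append(activity)
    return cleaned


# Usage: a gap from 11 p.m. to midnight is recoded; a gap at noon is not.
night_gap = ["school"] * MINUTES_PER_DAY
for m in range(23 * 60, 24 * 60):
    night_gap[m] = None
noon_gap = ["school"] * MINUTES_PER_DAY
noon_gap[12 * 60] = None

kept = clean_diary(night_gap)
dropped = clean_diary(noon_gap)
```

Daytime "refused" slots are left in place in this sketch, since such responses are retained in the sample while missing activities are not.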
Since we have over 200 variables corresponding to the type of activity the child performed and 11 variables corresponding to with whom the child performed the activity, it is not feasible to estimate the effect of every single combination of these two variables given the available sample size. Ideally, all else constant, more disaggregated categories of activities are preferred, as they better exploit the heterogeneity in the data and yield more interpretable parameters. However, as the categories of activities become increasingly disaggregated, the set of potential control variables gets exponentially larger.

We choose to categorize children’s activities into two general types of activities, namely active and passive. Active (passive) activities include all activities in which the child actively (passively) participates. The activities that we recode as active are taking lessons (e.g., dancing), reading, socializing, active leisure, household chores, jobs, school/day care, and organizational activities. In contrast, the activities we recode as passive are obtaining goods and services, traveling/waiting, using computers, watching TV, passive leisure, and personal needs and care.16

12 92% (90%) of the primary caregivers are mothers for the younger (older) cohort.

13 Some interviews were done via phone: 24% for the younger cohort and 9% for the older cohort.

14 Researchers have found that young children’s parents enjoy working with their child to complete the child’s time diaries, and these diaries can adequately represent the child’s day (Timmer et al. (1985)), but it is not clear whether that particular day is a representative sample.

15 We also follow them by recoding as sleeping or napping the time periods between 10 p.m. and 6 a.m. originally filled as “refused to answer”.

We categorize with whom the child performed the activities into seven groups of people: mother, father, grandparents, siblings, friends, others (i.e., someone other than the first five groups), and self. In reality, a child could perform an activity with many different people at the same time. Whenever the child was with more than one person within the same time slot, we assign the slot to what we consider to be the primary person according to the following order: mother, father, grandparent, sibling, friend, and others. Finally, we also add two other categories: “refused to answer or do not know” and “sleeping or napping”.17 We choose sleeping or napping as the omitted time input in our estimation, so that all our reported results should be interpreted as a substitution between a given activity and sleeping or napping.

Our decision to categorize child activities according to activity partners reflects two ideas. First, prior research indicates that there is important heterogeneity in the productivity of time depending on the partner (Del Boca et al. (2013)). Second, disaggregating activity partners allows us to better control for selection. Each of these partners has different preferences and marginal costs of time, which may influence input choices and, for a given input choice, the return of these inputs on skills (which likely depends on factors such as partner’s age, level of education, intensity of engagement, child’s perceived authority of activity partner, etc.). Controlling for how time is allocated across all these potential partners helps minimize this source of endogeneity. However, we also explore the sensitivity of our results to additional controls that relate to the categorizations chosen by Fiorini and Keane (2014) and Del Boca et al. (2013).
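The recoding described in this section, with two engagement types, seven activity partners, and two residual categories, can be sketched as follows. The string labels, the set-based representation of who was present, and the function names are hypothetical choices made for illustration; only the active/passive grouping and the partner ordering are taken from the text.

```python
# Activity groups recoded as active or passive (labels are illustrative).
ACTIVE = {
    "lessons", "reading", "socializing", "active leisure",
    "household chores", "jobs", "school/day care",
    "organizational activities",
}
PASSIVE = {
    "obtaining goods and services", "traveling/waiting", "using computers",
    "watching TV", "passive leisure", "personal needs and care",
}
RESIDUAL = {"sleeping or napping", "refused or do not know"}

# Order used to pick the primary partner when several people are present.
PARTNER_PRIORITY = ["mother", "father", "grandparent", "sibling", "friend", "other"]


def primary_partner(present):
    """Return the primary partner for a time slot, or 'self' if alone."""
    for partner in PARTNER_PRIORITY:
        if partner in present:
            return partner
    return "self"


def input_category(activity, present):
    """Map one time slot to one of the 16 time input categories."""
    if activity in RESIDUAL:
        return activity
    if activity in ACTIVE:
        kind = "active"
    elif activity in PASSIVE:
        kind = "passive"
    else:
        raise ValueError(f"unknown activity group: {activity}")
    return f"{kind} with {primary_partner(present)}"


# Usage: reading with both father and a friend counts as active time with father.
example = input_category("reading", {"friend", "father"})
```

With seven partners (including self) and two engagement types, this yields 14 categories, and the two residual categories bring the total to the 16 time inputs discussed in the summary statistics.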
2.2 Summary Statistics

2.2.1 Children’s Time Allocation

In this section, we describe children’s time allocation using the recoded activity categories as described above. We construct a weekly measure for each time input by multiplying the weekday hours by 5 and the weekend day hours by 2, and then adding up the total hours. Tables 2 and 3 show the weekly distributions of time (in hours) for younger and older children, respectively. Sleeping or napping is the most popular activity in our sample, as expected. Younger children also spend a lot of passive time with their mother, while older children spend a lot of active time by themselves. This is not surprising as a five- to eleven-year-old child is likely to spend a lot of time on personal needs and care with her mother, while an older, twelve- to seventeen-year-old child is likely to spend a lot of time at school.18 Compared to younger children, older children have smaller means across the 16 time categories except for active and passive time with friends and with self. Specifically, as children age, the average active time with the mother drops from 13 hours per week to 8 hours per week. Active and passive time with friends and with self seem to reduce time with parents, grandparents and siblings as children age. It is also evident that there is less variation in active time with parents amongst older children, while there is more variation in passive time with parents. Importantly, almost every input category has a sizable mass of respondents reporting zero minutes.

16 A full description of our recoding rules is available upon request.

17 Following Fiorini and Keane (2014), we distinguish “refused to answer or do not know” (which we include in our sample) from the case where an activity is missing (which we exclude from our sample).

2.2.2 Children’s Skills, Demographics, and Parental Background

In this section, we discuss other variables in the data that are relevant to our analysis.
We start by describing the children’s skill variables that we use as outcomes in our models. PSID-CDS children aged 3 and older were evaluated using the Woodcock-Johnson Revised Tests of Achievement (WJ-R), Form B (Woodcock and Johnson (1989)). In 1997, children aged 3-5 were administered Letter-Word Identification and Applied Problems sub-tests. Children aged 6 and above received Letter-Word and Passage Comprehension sub-tests as well as Applied Problems and Calculation sub-tests. In the 2002 and 2007 waves, these tests were re-administered, with the exception of the Calculation sub-test. Since the Calculation sub-test was only administered for the 1997 wave, we do not include it as one of our skill measures. Thus, we use standardized versions of Letter-Word, Applied Problems, and Passage Comprehension as our child cognitive skill measures.19 In the following sections we refer to these scores as Vocabulary, Math, and Comprehension.

18 90% (96%) of self active time for the older (younger) cohort consists of time at school. We incorporate school time into self active time because it is difficult to determine with whom the child spends the bulk of their time at school. As discussed in Section 5, our results are robust to splitting self active time into two categories of inputs, one incorporating school activities and the other incorporating the other activities comprising self active time.

19 We do not use the standardized scores provided by the PSID-CDS. Instead we standardize the raw score of each skill measure to have mean 0 and standard deviation 1 for both older and younger children.

Non-cognitive skills are measured through parental assessment. In all three waves, the primary caregiver was asked questions about the child’s behavioral problems. Twenty-six questions are used to measure the child’s behavioral problem scale, and ten other questions were asked about the positive aspects of children’s lives, including obedience/compliance, social sensitivity, persistence and autonomy. With these thirty-six questions, we construct a measure of non-cognitive skills by using iterated principal factor analysis, similar to Cunha and Heckman (2008) and Fiorini and Keane (2014). In Table 21 in the Appendix we show the rotated factor loadings. The factor loadings are all above 0.19 and stable across the two age groups. The constructed measure is standardized to have mean zero and standard deviation one and is ordered so that a higher score means better non-cognitive skills.

The PSID-CDS collects extensive information on the child, her household, as well as her school environment. In Table 4, we present demographic and parental background statistics for a few selected variables. Child characteristics are presented in rows 1 to 4, parental characteristics are presented in rows 5 to 12, and environmental characteristics are presented in rows 13 to 16. The table shows that the younger and older children are fairly similar in terms of demographic and parental background characteristics. On average, children in the sample are their mother’s second child, and more than 50% live with both biological parents. The only sizable difference across age groups is the age of the children and their parents. Parents of the younger children are on average in their late thirties and parents of the older children are on average in their early forties. Also, household income is slightly higher for the older children relative to their younger counterparts.

3 Empirical Approach

In this section, we discuss our empirical approach, paying special attention to how we implement the test of exogeneity in our setting. A more technical discussion of the test in a multivariate context can be found in Caetano and Maheshri (2015); see Caetano (2015) for the formal description of the test in the univariate context.
We are interested in assessing whether we can consistently estimate $\beta$ via OLS in the following equation:

$$Skill_i = Input_i \beta + Control_i \pi + Error_i, \qquad (1)$$

where $i$ denotes a child. $Skill_i$ refers to a particular skill of the child (e.g., mathematics skill), as measured by standardized assessment scores. $Input_i$ refers to a vector of all activities done by the child in hours per week, whose $j$th element is denoted as $Input_i^j$ (e.g., active time spent with the mother). $Control_i$ refers to covariates added to absorb confounding factors, and $Error_i$ refers to the unobserved determinants of $Skill_i$ that are not absorbed by covariates. In this context, a “model” is defined as a unique combination of (Skill, Input, Control) in equation (1) for precise definitions of Skill, Input and Control. We can consistently estimate $\beta := (\beta^1, ..., \beta^J)'$ via OLS in model (Skill, Input, Control), described in equation (1), if:

Assumption 1. $Cov\big(Error_i, Input_i^j \mid \underbrace{Input_i^{-j}, Control_i}_{Covariates_i^j}\big) = 0$ for all $j$, where $Input_i^{-j} := (Input_i^1, ..., Input_i^{j-1}, Input_i^{j+1}, ..., Input_i^J)$.

Our approach consists of testing Assumption 1 (jointly for all $j$) in all feasible models. In the models that survive the test, we conclude that, at the same time for all $j$, all confounding factors that would bias $\beta^j$ are absorbed by $Covariates_i^j := (Input_i^{-j}, Control_i)$. Thus, we can reasonably interpret the $\hat{\beta}^{OLS}$ estimated from the models that survive the test as causal effects of time inputs on skills. Of course, the credibility of this approach depends crucially on the capability of the test to detect potential endogeneity. In the rest of this section, we explain the test and discuss in detail which types of endogeneity the test can and cannot detect in our context.
3.1 Testing the Exogeneity Assumption

The test of exogeneity relies on the assumption that unobserved confounders will be discontinuous when at least one time input is zero; thus, when inputs are zero, unobserved confounders are elicited in the equation. If these unobservables affect $Skill_i$ conditional on covariates (i.e., if $E[Skill_i \mid Input_i^j = x, Covariates_i^j]$ is discontinuous at $x = 0$, for some $j$), then they are not fully controlled for and hence $\beta$ cannot be consistently estimated via OLS in this model. Thus, the test reflects whether covariates are able to absorb the confounding unobservables which vary discontinuously when at least one input is equal to zero.

We explain the intuition of this test in Figure 1, which illustrates the correlation between a generic time input and a generic child skill across all children in the sample. The goal of the test is to understand whether part of this correlation can be interpreted as causal. The discontinuity shown in Panel (a) must be the result of either the observed covariates or unobserved confounders varying discontinuously when the time input is equal to zero.20 As shown in Panel (b), conditional on all observed covariates $(Input_i^{-j}, Control_i)$, the discontinuity remains. The remaining discontinuity in Panel (b) must be the result of unobserved confounders that are not absorbed by the covariates, so Assumption 1 is rejected for this model.

20 Of course, we rule out the possibility that the main effect is discontinuous at zero in equation (1). This implicit assumption, also made in all papers in the literature, is plausible in our context (e.g., a second of reading a book should not affect the child’s skills that much). Another reason this assumption seems innocuous in our context is that we cannot reject the null hypothesis of continuity for models with a detailed enough list of covariates.

The statistical power of the test comes from the assumption that unobservables vary discontinuously when a time input is zero. Below we show empirical evidence supportive of that; but first, we discuss why this is the case. Unobservables are likely discontinuous at zero in our context because observations are bunching at a threshold, leading to a “corner solution” problem. For instance, consider a generic unobservable “mother type”, which helps determine the skills of a child. Figure 2 illustrates how the average mother type varies depending on the level of time spent reading books to the child. Mothers who read less to their child tend to have a lower type, as illustrated in the figure. However, something unique happens at zero. The mothers who read zero minutes to their child are discontinuously different from the mothers who read a little to their child. The reason is that among mothers who read zero, there are some whose type is so low that if possible they would have read negative amounts of time to their child. In this example, if mother type is not fully absorbed by covariates, then $E[Skill_i \mid Input_i^j = x, Covariates_i^j]$ will be discontinuous at $x = 0$, which explains the discontinuity found in Panel (b) of Figure 1. Each time input can elicit many such unobservable confounders; if covariates do not fully absorb them, then endogeneity will be detected.

More concretely, we implement the test by adding the vector $D_i$ to equation (1):

$$Skill_i = Input_i \beta + Control_i \pi + D_i \delta + Error_i', \qquad (2)$$

where $D_i := (d_i^1, ..., d_i^J)$ and $d_i^j := 1\{Input_i^j = 0\}$. The vector $D_i$ allows for a discontinuity in $E[Skill_i \mid Input_i^j = x, Covariates_i^j]$ at $x = 0$ for each $j$. We implement an F-test for whether $\delta = 0$, which tests for the null hypothesis that $E[Skill_i \mid Input_i^j = x, Covariates_i^j]$ is continuous at $x = 0$ for all $j$ jointly. This test is equivalent to testing whether Assumption 1 holds (Caetano (2015); Caetano and Maheshri (2015)).
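
The logic of the test can be illustrated with a small simulation. The sketch below is not the estimation code used in this paper. It assumes a single time input (J = 1), a hypothetical scalar confounder named `mother_type`, normally distributed shocks, and homoskedastic standard errors, all chosen purely for illustration. With one input, the F-statistic for the coefficient on the zero-input indicator reduces to its squared t-statistic.

```python
import random


def ols(X, y):
    """Return (coefficients, standard errors) from OLS via the normal equations."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I].
    aug = [xtx[a] + [1.0 if a == b else 0.0 for b in range(k)] for a in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    se = [(sigma2 * inv[a][a]) ** 0.5 for a in range(k)]
    return beta, se


def simulate(n=4000, seed=0):
    """Hypothetical data: the desired input can be negative, the observed input cannot."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        mother_type = rng.gauss(0.0, 1.0)            # assumed unobserved confounder
        desired = mother_type + rng.gauss(0.0, 1.0)  # unrestricted optimal choice
        hours = max(0.0, desired)                    # corner solution at zero
        skill = 0.5 * hours + 1.0 * mother_type + rng.gauss(0.0, 1.0)
        rows.append((skill, hours, mother_type))
    return rows


def exogeneity_f(rows, control_for_type):
    """F-statistic (squared t) on the indicator d = 1{hours = 0}, as in equation (2)."""
    X, y = [], []
    for skill, hours, mother_type in rows:
        d = 1.0 if hours == 0.0 else 0.0
        x = [1.0, hours, d]
        if control_for_type:
            x.append(mother_type)
        X.append(x)
        y.append(skill)
    beta, se = ols(X, y)
    return (beta[2] / se[2]) ** 2


rows = simulate()
f_omitted = exogeneity_f(rows, control_for_type=False)
f_controlled = exogeneity_f(rows, control_for_type=True)
print("F with mother_type omitted: ", round(f_omitted, 2))
print("F with mother_type included:", round(f_controlled, 2))
```

Under these assumptions, the statistic should be large when `mother_type` is omitted, because the children bunched at zero hours have discontinuously lower average `mother_type`, and it should be much smaller once `mother_type` is included as a control. This mirrors the way richer sets of controls allow a model to survive the test.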
3.2 Evidence of Bunching

The test described above exploits the potential clustering of observations that results from a non-negative time constraint. Here we show empirically that observations do indeed bunch at zero time input thresholds. Figure 4 shows the cumulative distribution function (CDF) of various activities for both older and younger children. The fact that the CDFs cross the vertical axis away from the origin is direct evidence of bunching, as it shows that the probability density function is discontinuous at zero (McCrary (2008)). Moreover, the CDFs are smooth away from zero, suggesting that there is no bunching elsewhere. We find many other activities with similar distribution functions. Tables 2 and 3 show the proportion of observations with zero time inputs for each input.

Note that the bunching of inputs is not necessary for the test to work; it is sufficient that unobservables are discontinuous at the threshold for whatever reason. Nevertheless, the evidence of bunching is suggestive of the “corner solution” intuition developed above: unobservables should be discontinuous because people cannot choose negative amounts of time for any child related activity.

3.3 Sources of Detectable Endogeneity

While the bunching evidence above indicates that the proposed test will have power, the test may not have power to detect all sources of endogeneity. In this section, we provide evidence concerning the sources of endogeneity the test has power to detect. To structure the discussion, we write the following general model of skill production:

$$Skill_i = f(\widetilde{Input}_i, Other_i), \qquad (3)$$

where $\widetilde{Input}_i$ is a vector of $\tilde{J}$ activities, defined at a very detailed level, and $Other_i$ is a vector that includes all other inputs in the production function. Elements of $\widetilde{Input}_i$ in this generic framework are defined precisely by a unique combination of all its features.
For instance, reading different books, or reading different pages of the same book, refer to different time inputs. The production function $f(\cdot)$ is unrestricted in this general setting.

Assumption 1 essentially combines all of the simplifying assumptions that are needed to go from the general production function outlined above to the OLS specification we aim to estimate. For example, Assumption 1 includes assumptions about linearity, additive separability, and that $Control_i$ is sufficient to account for $Other_i$. A failure of any of these assumptions will imply the existence of a variable $w_i$ that is excluded from Equation (2) and that may be correlated with $Input_i$. If this variable $w_i$ is discontinuous when $Input_i^j = 0$ for some $j$, then it is correlated with $Input_i$ and the test has power to detect its presence.

Figure 5 shows a few examples of potentially omitted variables $w_i$ that are likely elements of $Other_i$, where $E[w_i \mid Input_i^j = x]$ is discontinuous at $x = 0$ for some $j$.21 These plots indicate that the exogeneity test has power to detect whether we incorrectly omitted $w_i$: if $w_i$ affects skill conditional on covariates then skill will be discontinuous at $Input_i^j = 0$. As an example, Panel (a) of Figure 5 indicates that if maternal education affects child skill conditional on covariates, then the test will detect endogeneity when maternal education is excluded from $Control_i$ in (2). Panels (b)-(d) in Figure 5 indicate discontinuities in the number of books at home, hours the mother works per week, and household income when various time inputs are zero, expanding the list of potential elements of $Other_i$ whose omission the test has power to detect. Additional examples are provided in Figures 10 and 11 in the Appendix.

21 Each plot shows $E[w_i \mid Input_i^j = x]$ along with a local cubic fit, where $w_i$ is denoted in the title, and $Input_i^j$ is denoted in the horizontal axis. At $x = 0$, we also show the 95% confidence interval. For $x > 0$, the scatter plot aggregates to the next hour of the time input. The shaded region represents the 95% confidence interval for the fit with an out-of-sample prediction at zero minutes. For the local fit, we use the Epanechnikov kernel with the rule-of-thumb bandwidth for the kernel and 1.5 times the rule-of-thumb bandwidth for the standard-error calculation. Results are robust for different choices of kernel and bandwidths.

The $w_i$ we consider in Figure 5 are observable, so they can in principle be incorporated in $Control_i$ to avoid any endogeneity stemming from their exclusion. However, if observables are discontinuous at zero, then unobservables are also likely to be discontinuous at zero. For instance, the discontinuity in mother’s education level suggests that unobservables such as whether the mother is well read, whether the mother pays attention to the academic progress of the child, etc., might also be discontinuous.22 If these variables are not fully absorbed by covariates, then their discontinuity will be captured by $D_i$, leading to a rejection of the model.

Another potential source of endogeneity is simultaneity. If time inputs are caused by skills, rather than the other way around, Assumption 1 will be violated. For instance, children with low comprehension skill may be less willing to read, which may generate a spurious correlation between time spent reading and the comprehension skill measure. Because of bunching, this spurious correlation should result in a discontinuity at zero, which our test will detect. In contrast, the causal relationship of interest is plausibly continuous at zero: for instance, spending time to read the first word of the title of a book has essentially the same effect on comprehension skills as not reading it at all.
In Figure 6, we show that strong correlates of skill levels, such as birth weight, race, age and height, are in fact discontinuous when various time inputs are zero, suggesting that we have power to detect simultaneity bias.

Measurement error (i.e., $\widetilde{Input}_i \neq Input_i$) can also generate endogeneity if the degree of misreporting is correlated with the observed time input vector. For example, children who spend more (active or passive) time alone may be more likely to fill out their own time-use survey, and children might tend to overstate certain inputs relative to adults (e.g., they might overstate the amount of time they spend with friends or other family members to conceal how often they are alone). In Figure 7, we show discontinuities at zero time inputs for examples of $w_i$ that are likely correlates of misreporting, such as whether the child completed the time diary alone. These discontinuities suggest that we have power to detect endogeneity stemming from measurement error.

22 The argument that similar patterns of discontinuity in observables should also be found in unobservables is analogous to the one made by researchers about continuity when implementing regression discontinuity designs.

Our test is also useful for detecting misspecification errors. This is important in our context since there are countless ways to group activities and model the relationship between skill and time inputs. In particular, we make four key simplifying assumptions to arrive at Equation (1). First, we aggregate many time activities into only a few categories, which may induce endogeneity due to over-aggregation (i.e., $\tilde{J} > J$). As an example, if a subcategory of maternal active time, such as reading with the mother, increases disproportionately as active time with the mother increases, then we may arrive at a biased estimate of maternal active time. Figure 8 shows discontinuities for a few examples of $w_i$ that speak directly to this potential issue.
For example, Panel (a) shows that children who spend no passive time with their father are likely to spend a discontinuously larger proportion of the active time they spend with their father at home, relative to children who spend little passive time with their father. This suggests that if active time with the father is differentially productive at home (a type of heterogeneity precluded by our aggregation scheme) then the indicator variable for passive time with father would detect it.

Second, we assume that $f(\widetilde{Input}_i, Other_i)$ is separable, ruling out the presence of heterogeneous effects. For instance, mothers who read well may be more willing to read to their children, and this activity may generate a higher return to their children’s skill relative to mothers who do not read well. The plots in Figure 5 (see also Figure 10) depict discontinuities for examples of $w_i$ along which heterogeneity in returns of activities likely occurs, suggesting that we can detect endogeneity resulting from heterogeneous treatment effects. For instance, Panel (a) of Figure 5 shows that the level of education of the mother is discontinuous when active time spent with the mother is zero. Thus, the test has power to detect endogeneity from heterogeneous effects to the extent that any other time input (e.g., passive time with the mother) has a different effect on the child’s skills depending on the level of education of the mother.

A third potential misspecification issue that will generate endogeneity is the presence of non-linear effects (i.e., $f(\widetilde{Input}_i, Other_i) \neq \widetilde{Input}_i \beta + Control_i \pi$). In this case, $w_i = f(\widetilde{Input}_i, Other_i) - \widetilde{Input}_i \beta - Control_i \pi$ might be discontinuous when inputs are zero. Panel (a) of Figure 9 shows that $E[Input_i^{j'} \mid Input_i^j = x]$ is discontinuous at $x = 0$ for examples of $j \neq j'$.
Children who spend zero active time with their mother spend a discontinuously larger amount of active time with their grandparents, an average increase from 2 to 5 hours per week. This is direct evidence that the test has power to detect endogeneity from non-linear effects (e.g., $f^{j'}(5) - f^{j'}(2) \neq (5 - 2)\beta^{j'}$).23 Additionally, our linear model (2) may incorrectly predict a discontinuous impact of inputs at zero because of non-linearities away from zero. In this case, the coefficients on $D_i$ will be significantly different from zero in an attempt to correct for this model misspecification. Regardless of the reason, the test has power to detect endogeneity stemming from non-linear effects.

23 In reality, there is heterogeneity across observations with the same value of $Input_i^j$, which enhances the power of the test because it can detect endogeneity if $f^{j'}(x_1) - f^{j'}(x_2) \neq (x_1 - x_2)\beta^{j'}$ for other values of $x_1$ and $x_2$. For instance, Panel (b) of Figure 9 shows that the entire distribution is discontinuous at $x = 0$, not only its first moment. Caetano and Maheshri (2015) discusses this point in more detail.

Finally, misspecification of controls may also lead to endogeneity. If $w_i$ is discontinuous at zero, then $w_i' := g(w_i)$ is also discontinuous at zero for almost all functions $g(\cdot)$. Thus, the test also has power to detect endogeneity due to misspecification of observed controls $w_i$, which can occur since it is unclear how they should be included in the equation.24

These examples of $w_i$ are just a small subset of observed variables for which we find discontinuities at $x = 0$. Moreover, for ease of exposition, we have discussed the implications of each simplification individually. Our approach is agnostic about the specific reason why Assumption 1 might fail, and in fact jointly tests for all sources of detectable endogeneity, even ones we may not conceive. Of course, even among these sources of endogeneity there are potential confounders that cannot be detected by the test.
For example, some confounder $w_i$ implied by an aggregation choice may not be discontinuous when inputs are zero. Next, we argue why these confounders are likely to be rare in our context.

3.4 Which confounders cannot be detected by the test?

Consider the set of all potentially endogenous variables $w$, characterized by being correlated with both $Input_i$ and $Skill_i$. If any such $w$ is not absorbed by covariates, Assumption 1 in the context of Equation (1) will be violated. These variables fall under two categories: (a) those that vary discontinuously at $Input_i^j = 0$, and (b) those that vary continuously at $Input_i^j = 0$. Thus far, we have focused our discussion on potential confounders of type (a) and our ability to detect them with our test. However, potentially endogenous variables of type (a) can actually be further subdivided in two types: (a1) those that are correlated with $Skill_i$ when $Input_i^j = 0$, and (a2) those that are uncorrelated with $Skill_i$ when $Input_i^j = 0$. These three types of potential confounders, (a1), (a2), and (b), form a partition: any potential confounder $w_i$ is of one and exactly one of these three types.

The exogeneity test described above can detect all confounders of type (a1), but cannot detect confounders of types (a2) or (b). Note that our multivariate setting adds redundancies that contribute to the power of the test since a confounder of type (a2) or (b) for input $j$ can be of type (a1) for input $j'$.25 In any case, as we argue next, confounders of type (a1) should be the norm rather than the exception in our context.

24 If $w_i$ enters the equation non-linearly, discontinuities in higher moments of the distribution will add power to the test. Here we show only discontinuities in the first moment of the distribution, but we actually find discontinuities in the whole distribution. For instance, the variance of $w_i$ is often discontinuously higher when inputs are zero. This is intuitive, as observations tend to be discontinuously more heterogeneous at that point because of bunching.

25 All observations with $Input_i^j > 0$ are such that $Input_i^{j'} = 0$ for some $j'$, for all $j$, reducing the possibilities of confounders of type (a2). Similarly, all observations with $Input_i^j = 0$ are such that $Input_i^{j'} = 0$ for some $j'$, for all $j$, reducing the possibility of confounders of type (b).

To help frame our argument, Figure 3 illustrates examples of confounders of types (a) and (b). The solid black lines (and points) in the figure correspond to $Input_i^j$, while the dashed black lines correspond to $Input_i^{j*}$, where $Input_i^{j*}$ is the unrestricted optimal choice of input $j$ by individual $i$. Of course, $Input_i^{j*} = Input_i^j$ for $Input_i^j > 0$, but $Input_i^{j*} \leq Input_i^j$ for $Input_i^j = 0$. This occurs because people cannot spend a negative amount of time on an activity.26

Panel (a) of Figure 3 distinguishes between confounders of type (a1) and (a2), both of which vary discontinuously at $Input_i^j = 0$. In this example, the red range along the vertical axis is the support region of the confounder for the whole sample, while the blue range is the support region for the subsample of observations such that $Input_i^j = 0$. A confounder, by definition, must be correlated with $Skill_i$ in the red range. A confounder is of type (a1) if it is correlated with $Skill_i$ in the blue range, while it is of type (a2) if it is not correlated with $Skill_i$ in the blue range. The evidence of bunching shown above suggests that type (a2) is unlikely, since a significant portion of the sample is such that $Input_i^j = 0$. Moreover, the redundancies implied by the multivariate test are particularly helpful in this case because the blue range of the same confounder will vary across inputs, allowing the test to cover more of the support of any confounder.

Panel (b) of Figure 3 depicts a confounder of type (b). In this case, the average of the confounder when $Input_i^{j*} \leq 0$ has to be equal to the corresponding average for observations where $Input_i^{j*} = 0$.
This is implausible because the confounder is by definition correlated with $Input_i^j$ and there are many observations such that $Input_i^{j*} < 0$ (as per the bunching evidence shown in Section 3.2). Indeed, the discontinuity plots discussed in the previous section suggest that confounders of type (a) are much more likely to occur. Despite the improbability of confounders of types (a2) and (b), we pursue in Section 5 an extensive set of robustness checks designed to detect them. However, prior to presenting these checks, we report our main findings.

26 Note that $Input_i^j$, rather than $Input_i^{j*}$, should be included as inputs in the production function, since we want to identify the effect of the actual (not the desired) time spent on activities. Thus, we do not have a censored model. Caetano (2015) discusses this distinction in detail.

4 Main Results

We start by proposing a set of models that we can plausibly estimate given the data described in Section 2. As explained, a “model” is defined as a unique combination of (Skill, Input, Control) in equation (1), where Skill $\in$ {math, vocabulary, comprehension, non-cognitive skills}, Input $\in$ {sleeping or napping, active time with “companion”, passive time with “companion”, don’t know or refuse to answer}, and companion $\in$ {mother, father, grandparents, siblings, friends, others, self}. The set of potential Controls will vary by cohort because of data restrictions. We now describe the models we consider and present our results for each age group in turn.

4.1 Linear Treatment Effects

4.1.1 Older Children (12-17 Year Olds)

For older children, we consider only value-added models, which have been standard in the literature, by including in all specifications the value of $Skill_i$ observed in the previous wave.27 We consider a sequence of seven specifications of the value-added model, where each specification includes a richer set of controls than the previous one. We take this approach for two reasons.
First, it illustrates that our exogeneity test has power to detect endogeneity in the most parsimonious models. Second, it helps identify the key controls that absorb important sources of endogeneity.

The details of each specification are as follows. Specification (1) has no controls other than the corresponding lagged skill. Specification (2) adds child characteristics, such as age, gender, and race. Specification (3) adds mother demographic characteristics, such as age, education level, and age at child’s birth. Specification (4) adds family demographic characteristics, such as father’s age, whether the child lives with biological parents, and household annual income. Specification (5) adds family environmental characteristics, such as whether the child has a musical instrument at home and whether the child’s neighborhood is safe. Specification (6) adds school characteristics, such as whether the child is in a public or private school and the school’s teacher-student ratio. Specification (7) adds the child’s school experience, such as whether the child has ever repeated a grade, and whether the child has ever attended a gifted program.28

27 We also show results for non value-added models in the appendix, for completeness.

28 Here is a full list of the control variables included for each category. Child characteristics: child’s age, child’s age squared, child’s gender, child’s race indicators, birth order to mother, born in the US indicator, child’s grade indicator, and child’s BMI. Mother demographic characteristics: mother’s education in years, mother’s current age, mother’s current age squared, and mother’s age at child birth. Family demographic characteristics: father’s education in years, father’s current age, father’s age at child birth, mother’s marital status at child birth, household annual income (in $1,000s), number of siblings child lives with, indicators of whether child lives with biological parents, and indicator of whether child lives with grandparents. Family environmental characteristics: spending on tutoring programs, spending on extracurricular lessons, indicator for whether caregivers spent on school supplies for the child, indicator for whether child has a musical instrument at home, indicator for whether child has a desk at home, and rating of neighborhood quality. School characteristics: indicator for whether school is public, indicators for school enrollment criteria (e.g., based on geography), school’s teacher-student ratio, and child’s teacher’s full-time teaching experience. Child’s school experience: indicator for whether child has ever skipped grade, indicator for whether child has ever attended a gifted program, and indicator for whether child has ever repeated a grade.

Table 5 shows the exogeneity test F-statistic and corresponding p-value for each model we consider. The F-statistics and p-values in bold represent the surviving specifications, i.e., specifications for which we are not able to reject exogeneity at the 10% significance level. In specification (1), we reject exogeneity irrespective of the dependent variable, which provides direct evidence that our test has power to detect endogeneity in the basic value-added model we consider, complementing the evidence shown in Section 3.3.

The specification of Control at which the test is no longer able to reject exogeneity differs across skill measures. For example, the child’s observed characteristics, together with their lagged skill and all time inputs, are enough to absorb any confounder in the production function of math, vocabulary, and non-cognitive skills. In contrast, comprehension seems to be a more complex production process, as we fail to reject only models that include all observed child, family, and school characteristics. Adding school characteristics (i.e., specification (6)) leads to a jump in the p-value for comprehension skills (from 0.079 to 0.175), which is suggestive of the importance of school characteristics in absorbing endogeneity. Thus, Table 5 suggests that, for older children, different groups of control variables are playing very different roles in absorbing endogeneity depending on the skill in question.29 We are unable to reject exogeneity in specifications (6) and (7) for all four skills.

29 Note that in exercises not reported here, we vary the order in which we add controls. The importance of each group of variables in accounting for endogeneity is similar to what is observed in Table 5. We complete a similar exercise for the younger cohort as well. A full description of the permutation exercises is available upon request.

Table 6 presents the estimated coefficients of time inputs from a surviving specification (specification (7)) for all four skill measures. We find that for math skills, active time with mother, active time with father, active time with grandparents, active time with friends, self active time, passive time with mother, passive time with siblings, and self passive time are statistically significant. Active time with grandparents is the most productive input: one more hour a week spent on active time with grandparents rather than on sleeping or napping would increase the math test score by 2.4% of a standard deviation, while one more hour a week spent on passive time with mother rather than sleeping or napping would increase the test score by about 0.6% of a standard deviation. It is also noteworthy that active time with friends is as productive as active time with mother for older children.

Although we find that parental inputs have an impact on math skills for older children, there is little to no effect on vocabulary skills. This result is consistent with the findings of Del Boca et al. (2013), who find that as children age the impact of parental inputs on child vocabulary declines significantly. In contrast, we find that active time with grandparents has a statistically significant effect on child cognitive skills generally (i.e., math, vocabulary and comprehension). We also find that passive time with mother, self active time, and self passive time have similar effects (0.7%-0.8% of a standard deviation) on child’s non-cognitive skills.

The coefficients in Table 6 indicate the impact of each input on skills relative to sleeping. However, by comparing the coefficients with each other we can comment on the relative effectiveness of various inputs. For example, substituting an additional hour per week of active time with the father for active time with others would increase math scores by 1.4% of a standard deviation (with a standard error of 0.2%). The effect on skills of substituting one input for another among older children could be quite different from the effect on younger children, a group we now turn to.

4.1.2 Younger Children (5-11 Year Olds)

Data restrictions prevent us from considering value-added models for the younger set of respondents. For a child in this age group to have Letter Word or Applied Problem scores in the previous wave, she has to be at least 8 years old in the current wave, and for her to have a Passage Comprehension score in the previous wave, she has to be at least 11 years old in the current wave.30 As a result, if we want to estimate the value added model for younger children in the same way as for older children, we would be left with 172 observations only. Given that we have 15 time inputs and a large number of control variables (i.e.,
11 in specification (2), 15 in specification (3), 24 in specification (4), 30 in specification (5), 36 in specification (6), and 39 in specification (7)), there would be very few degrees of freedom left. Thus, we consider only models that exclude lagged skills. So in the baseline model (i.e., specification (1)) for young children, we include only time inputs. Specifications (2) through (7) add the same controls used when estimating the production function for older children.31

30 As described in Section 2, Letter Word and Applied Problems were not administered for children below 3, and Passage Comprehension was not administered for children below 6.

31 See footnote 28 for a full description of the control variables.

32 For both age groups, the surviving specifications are the same irrespective of which input is omitted. This is not surprising, as any unobservable that might be correlated with sleeping or napping will necessarily have to be correlated with one of the other 15 inputs that are included in the regression.

Table 7 presents the F-statistics and corresponding p-values for the test of exogeneity performed in each model we consider.32 Unlike in the case of older children, child characteristics are no longer enough to absorb confounders with regard to math skills. Instead, mother’s demographic characteristics are pivotal for absorbing endogeneity in math skills. For non-cognitive skills, family demographic characteristics are important to absorb confounders amongst younger children. Family demographic characteristics are also crucial for absorbing confounders for the vocabulary skills, but for comprehension skills, mother demographic characteristics seem to be enough. We fail to reject exogeneity in specifications (4)-(7) for all skill measures.33 This result is somewhat surprising given the fact that lagged test scores are not included as controls, as they are in the case of older children.
This could be explained in part by the fact that younger children have fewer opportunities to choose different activities from each other relative to older children, reducing the scope for endogeneity.34 Table 8 shows the estimated coefficients from specification (7) for the four skill measures. The estimates for children ages 5-11 in Table 8 are quite different from the estimates in Table 6 for children ages 12-17. Time inputs, in the way we categorize them, do not seem to play a critical role in improving younger children’s math or vocabulary skills. This result contrasts with the results of Fiorini and Keane (2014), who find that active or educational time with parents is productive. When we control only for child characteristics, we also find significant impacts of maternal time on cognitive skill formation, however these models are rejected by the exogeneity test. Once we add controls for parental characteristics, the impact of parental time on skill formation vanishes and we fail to reject the model. Fiorini and Keane (2014) include similar measures of parental characteristics in all of their models, suggesting that the difference in results does not necessarily indicate an endogeneity problem in their specification. Comprehension skills, on the other hand, are affected by time use at younger ages. Passive time with siblings, passive time with mother, as well as self active time have a statistically significant influence on comprehension skills. For non-cognitive skills, the estimates for younger children are mostly negative or zero, which suggests that spending time sleeping or napping appears to be the most productive way to improve younger children’s non-cognitive skills.35 Among the negative coefficients, active time with friends is the most unproductive one: one more hour a week spent in sleeping or napping rather than in active time with friends would increase the child’s non-cognitive skills by 1.8% of a standard deviation. 
Comparing the estimates for the non-cognitive skills between the younger and older children, we find that in order to improve a child’s non-cognitive skills, it is more productive to spend time sleeping or napping when the child is young, and more time with herself or with her parents as the child gets older. 33 For both age groups, the surviving models do not change even when we explicitly include napping as an input in the regression (leaving only night sleeping as the input of reference), along with its corresponding indicator variable, and implement a stronger test of whether the 16 coefficients of the indicator variables are equal to zero. 34 In the appendix, we present estimation results without including lagged skills for older children (Tables 22 and 23). Not surprisingly, without the lagged scores more controls are generally needed in order for us to fail to reject exogeneity. However, we are able to arrive at specifications where the coefficients can be interpreted causally. Reassuringly, the surviving models for children aged 12-17, with or without lagged test scores, provide similar estimates for all time inputs (see Tables 23 and 6). 35 Fiorini and Keane (2014) also find that parental time inputs have no impact on non-cognitive skill formation. 22 Similar to older children, we can compare various activities to each other using the coefficients in Table 8. Because many of these coefficients are small and insignificant, the differences will also be small and insignificant. One exception would be to substitute one additional hour per week in active time with mother for active time with friends. This would yield a 1.7% of a standard deviation (with a standard error of 0.3%) increase in non-cognitive skills. While variation in time inputs has relatively little impact for younger children, family background variables are quite strong predictors of math and verbal skills. 
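As a rough illustration of the substitution calculations described above, the sketch below computes the effect of moving one hour per week from one input to another. The effect is the difference of two coefficients, each measured relative to sleeping, and the sketch also returns its standard error. The function name `substitution_effect` is hypothetical. The two point estimates are the rounded values implied by the figures quoted in the text, while the variance and covariance entries are made-up placeholders chosen only for illustration, not estimates from Table 8.

```python
import math

def substitution_effect(beta, cov, gain, lose):
    """Effect of moving one weekly hour from input `lose` to input `gain`.

    beta: dict of coefficients, each relative to the omitted input (sleeping).
    cov:  dict mapping (name_a, name_b) to the covariance of the two estimates.
    """
    effect = beta[gain] - beta[lose]
    # Var(b_gain - b_lose) = Var(b_gain) + Var(b_lose) - 2 Cov(b_gain, b_lose)
    variance = cov[(gain, gain)] + cov[(lose, lose)] - 2.0 * cov[(gain, lose)]
    return effect, math.sqrt(variance)

# Point estimates implied by the text (rounded); covariances are hypothetical.
beta = {"active_mother": -0.001, "active_friends": -0.018}
cov = {
    ("active_mother", "active_mother"): 4.0e-6,
    ("active_friends", "active_friends"): 6.0e-6,
    ("active_mother", "active_friends"): 0.5e-6,
}

effect, se = substitution_effect(beta, cov, "active_mother", "active_friends")
print(f"effect = {effect:.3f} SD, standard error = {se:.3f}")
```

The covariance term matters: ignoring it would overstate or understate the standard error of the substitution effect whenever the two coefficient estimates are correlated.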
We suspect that the structured nature of younger children’s days makes it difficult to identify the impact of alternative time uses.36

4.2 Non-Linear Treatment Effects

Our main specifications assume that the effects of time inputs on child skills are linear, but there can be interesting hidden heterogeneity in the results.37 In this section, we re-estimate our models using a linear B-spline in order to allow for non-linear treatment effects:

$$\text{Skill}_i = \sum_j f^j(\text{Input}_i^j) + \text{Control}_i \pi + D_i \delta + \text{Error}'_i, \qquad (4)$$

where $f^j(\cdot)$ is a linear B-spline function of $\text{Input}_i^j$ with parameters $\beta_{jk}$, $k = 1, 2, 3$, representing the linear effect within equally frequent intervals of the distribution of $\text{Input}_i^j$.

4.2.1 Older Children (12-17 Year Olds)

The exogeneity test results for children ages 12-17 are presented in Table 9. For comparison, we show in bold the surviving specifications according to the linear model (2). It is useful to check if the models that survive the linear exogeneity test also survive the non-linear exogeneity test. As discussed at the end of Section 3.3, the coefficients of $D_i$ in these linear models can capture endogeneity from either discontinuous confounders or from a failure of the linearity assumption. The results show that the specifications that survive the exogeneity test in the linear model also tend to survive the exogeneity test in the B-spline model, and vice-versa. The only exception is specification (5) for comprehension, which survives the non-linear test but does not survive the linear test, suggesting that the linear test detects endogeneity partly due to misspecification of the production function. Overall, most of the

36 In the decade since these households were interviewed, there has been a significant focus both in academia and the public media on early childhood investments. It would be interesting to explore these same questions for a more recent cohort of children.
37 In general, our approach is not capable of addressing the question of whether there is substantial, interesting heterogeneity which is not explained in the current linear model; our approach only aims at addressing the question of whether this unexplained heterogeneity generates endogeneity. 23 power of the test seems to stem from discontinuous unobservables, otherwise the B-spline models would fail to reject in even the most parsimonious specifications. From specifications (6) onwards, all models survive both exogeneity tests for all skills. Table 10 shows estimates for all four skill measures in our preferred model of specification (7). We find that maternal active time has a significant positive effect on math and noncognitive skills only when it is more than 15 hours per week, and maternal passive time only has a significant positive effect on math and comprehension skills when it is below 17 hours per week, and in fact has a negative, significant effect in comprehension skills when it is above 29 hours per week. A large amount (above 36 hours per week) of active self time seems to be productive for math, while a little (up until 1 hour per week) passive time with the father seems to be productive for vocabulary. These results are consistent with the linear results, but provide further details about the production function of skills. 4.2.2 Younger Children (5-11 Year Olds) The exogeneity test results for younger children are presented in Table 11. Again, models that survive the linear test of exogeneity tend to survive the non-linear one and vice-versa, with a few exceptions, suggesting that the linear test of exogeneity detects endogeneity partly stemming from misspecification of the skill production function. Table 12 reports the estimation results for children ages 5-11. 
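To make the spline specification in equation (4) concrete, the sketch below builds a piecewise-linear basis for a single time input, with interior knots at the terciles so that the three intervals are equally frequent. This is an illustration under assumptions, not the authors' code. The helper `linear_spline_basis` is hypothetical, and it uses a segment ("broken-stick") parameterization that spans the same continuous piecewise-linear functions as a linear B-spline basis. It also places the knots using all observations, whereas the paper does not state how the bunched zeros are treated when forming the intervals.

```python
import numpy as np

def linear_spline_basis(x, n_segments=3):
    """Piecewise-linear basis with knots at equally spaced quantiles of x.

    Column k holds the hours accumulated inside interval k, so its regression
    coefficient is the slope of the skill-input relationship within that interval.
    """
    x = np.asarray(x, dtype=float)
    probs = np.arange(1, n_segments) / n_segments
    knots = np.quantile(x, probs)                      # interior knots
    edges = np.concatenate(([x.min()], knots, [x.max()]))
    cols = [np.clip(x, lo, hi) - lo for lo, hi in zip(edges[:-1], edges[1:])]
    return np.column_stack(cols), knots

# Simulated weekly hours (hypothetical data) and a noise-free skill outcome
rng = np.random.default_rng(0)
hours = rng.uniform(0.0, 40.0, size=300)
basis, knots = linear_spline_basis(hours)

true_slopes = np.array([0.02, 0.00, -0.01])            # one slope per interval
skill = 0.5 + basis @ true_slopes

design = np.column_stack([np.ones(len(hours)), basis])
coef, *_ = np.linalg.lstsq(design, skill, rcond=None)
print("knots:", knots)
print("estimated slopes:", coef[1:])
```

When an input is heavily bunched at zero, two quantile knots can coincide. The corresponding column is then identically zero and that segment cannot be estimated, which may be related to the remark in footnote 39 that some inputs did not allow for all three spline terms.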
Active time with the mother has a positive effect on non-cognitive skills, if in moderation (between 6 and 15 hours per week), but a negative effect when it is more than 15 hours per week. A lot of passive time with the mother or with siblings seems to be productive for vocabulary and comprehension. In contrast, passive time with the father seems to be counterproductive for the same skills. Moreover, up to 32 hours per week of self active time (mostly due to school activities, as discussed previously) has a positive effect on cognitive skills.

Remark 1. The fact that the treatment effect estimates are not linear is not evidence that our surviving specifications in the linear models suffer from endogeneity. Indeed, the results suggest that the linear estimates in our preferred models are a weighted average of the corresponding non-linear estimates. For example, the coefficient of passive time with the mother on mathematics skills is 0.006 for older children, which is similar to a weighted average of the three coefficients of passive time with the mother from specification (7) shown in Table 10 (i.e., $0.006 \approx \frac{1}{3}(0.016 + 0.000 + 0.004)$). In general, an F-test for whether each coefficient of the linear model is the same as the weighted average of the corresponding coefficients of the B-spline model for all 15 time inputs yields a p-value of 0.6526.

5 Sensitivity Analysis

Thus far, we have chosen appropriate models for causal inference purely based on the exogeneity test described in Section 3. However, there can be confounders that are not detectable by the test. As discussed in Section 3.3, there are two potential categories of confounders: (a) confounders that are discontinuous at $\text{Input}_i^j = 0$, and (b) confounders that are continuous at $\text{Input}_i^j = 0$. Among type (a) confounders, there are two subtypes: (a1) those that are correlated with skill at $\text{Input}_i^j = 0$, and (a2) those that are not.
The exogeneity test introduced in Section 3.1 is capable of detecting all unobservables of type (a1), but is incapable of detecting unobservables of types (a2) or (b). As discussed in Subsection 3.4, there are a number of reasons to believe that the class of variables included in types (a2) and (b) is small in our context. Regardless of how implausible the existence of these variables might be, this section provides robustness checks that can in principle detect them.

5.1 Comparing Surviving and Non-Surviving Specifications

In this section, we compare estimates of the input coefficients $\beta$ across specifications, irrespective of whether the specification survives or does not survive the test, as shown in Section 4. This comparison is often done in empirical studies, where, heuristically, a good model is one that provides estimates that are robust to added controls (which might be omitted variables in the model).38 This “test of stable coefficients” is in principle capable of detecting endogeneity from the two undetectable sources of endogeneity discussed above. If a model survives the test of exogeneity, but does not survive this test, then it is evidence that some added control partially absorbs confounders of type (a2) or (b). We test whether the fifteen elements of $\beta$ in each specification (1)-(6) from Section 4 are jointly significantly different from the corresponding coefficients in specification (7), our preferred model. We present the p-value of this test for each skill measure for older and younger children in Tables 13 and 14, respectively. Numbers in bold refer to those specifications that survive the exogeneity test at the 10% level of significance. In general, specifications that survive the exogeneity test (in bold) also survive the test of stable coefficients (p-value > 10%). Across all models of both tables, only one model that survives the exogeneity test is rejected by the other test: specification (2) for math in Table 5.
This suggests that confounders from the undetectable sources of endogeneity discussed above are only controlled for after mother characteristics are added as controls (specification (3)).

38 For instance, Fiorini and Keane (2014) implement a somewhat weaker version of this test whereby they compare whether the ranking of the magnitude of each coefficient is the same across specifications.

Conversely, a few models do not survive the exogeneity test but survive the other test (e.g., specification (4) for comprehension in Table 5, specification (3) for non-cognitive skill in Table 7). In these cases, the test of stable coefficients is unable to detect some confounders that are discontinuous when inputs are zero because they are not correlated to the full list of controls of specification (7). From specification (5) onwards, all specifications survive both tests for all skills and both cohorts. Overall, these results are consistent with the idea that, as we add controls from specifications (1) to (7) in Section 4, we converge to the true causal estimates.

Tables 15 and 16 show analogous results for the non-linear models discussed in Section 4.2. For each cohort and each specification (1)-(6), we show the p-value from a test of whether the 27 coefficients $\beta_{jk}$ are significantly different from the corresponding ones in specification (7).39 We show in bold the specifications that survive the exogeneity test. For older children, all surviving specifications according to the exogeneity test also survive the other test, but the reverse is not true. For younger children, in two cases a specification survives the exogeneity test but does not survive the other test, and in one case the reverse happens. As in the linear models, all specifications from specification (5) onwards survive both tests for all skills and both age groups.
In an online appendix, we present the actual estimates for specifications (1)-(7) for each age group and for each skill, for both the linear and the B-spline cases, illustrating more explicitly how the estimates are virtually unchanged for the surviving specifications but often change for the non-surviving ones.40

5.2 Alternative Specifications

In this section, we perform many additional robustness checks on specification (7) from Section 4. Tables 17 and 18 report the p-value of a test for whether the input coefficients $\beta$ change as we add controls to specification (7) from Section 4. Each specification in these tables contains additional controls of two types: (a’) variables that are discontinuous when some input is zero (some of which are shown in the plots presented in Section 3), and (b’) variables that are continuous when each input is zero, for all inputs.41 These variables might be correlated to undetectable confounders, as discussed above. For instance, observables of type (a’) (type (b’)) might be correlated to unobservables of type (a2) (type (b)). The p-values in Tables 17 and 18 provide clear evidence that our estimates of specification (7) are statistically unchanged in all alternative specifications for both age groups. Specifications (1’)-(3’) are particularly useful to allay further concerns about omitted variables and simultaneity.

39 Some inputs did not allow for more than one or two B-spline terms.
40 The online appendix is available at http://bit.ly/1KOy1aj.
41 Of course, these variables may not be confounders of type (b), because they may not be correlated to inputs at all.
In specification (1’), we add more control variables related to child characteristics, family demographic characteristics, and environmental characteristics.42 In specification (2’), we add the 15 lagged (i.e., from the previous wave) time inputs.43 In specification (3’), we add the other three lagged skill measures as well as the interactions between any two of the four lagged skills.44 In specification (4’), we add controls related to misreporting of time diaries (12 additional controls)45 , to allay further concerns about measurement error. Specifications (5’)-(11’) are included to check for undetectable confounders from over-aggregation. Active time activities are further subcategorized in the data as educational, social, and school activities, while passive time activities are further subcategorized in the data as general care and media activities.46 In specification (5’), we add one more time input by separating school time from self active time, and test whether any of the 15 original coefficients change.47 In specification (6’), we add the proportions of each active time input spent in educational activities (7 additional controls).48 In specification (7’), we add the proportions of each passive time input spent in general care (7 additional controls).49 In specification (8’), we add the proportions of each passive time input spent watching TV (7 additional controls).50 42 Here is the full list of added controls in specification (1’): child’s birth weight, child’s current height, mother’s race indicators, father’s race indicators, birth order to father, mother’s working hours in a week, mother’s working days in a week, indicator for whether mother’s working schedule is a regular (vs. night) shift, number of books mother read last year, and indicator for whether caregivers spent on clothes for the child last year. We show in Section 3.3 that some of these controls are discontinuous when input is zero (e.g. 
child’s birth weight and hours mother works per week). 43 This specification is referred to as the “cumulative model” by Todd and Wolpin (2007) and Fiorini and Keane (2014). 44 We do not present younger cohort’s results for specification (3’) for lack of data, as discussed in Section 4. 45 The list includes whether the diary was self-administered, whether the diary was reviewed face-to-face, whether the diary was reviewed via phone, and indicators of who completed the diaries. Some of these variables (e.g. indicators of who completed the diaries) are shown to be discontinuous when inputs are zero in Section 3.3. 46 Fiorini and Keane (2014) stratifies activities according to these five types, depending on whether the activity involves parents. 47 School activities, originally fully included in self active time, comprise attending classes for full-time students, and daycare or nursery school for children not in school. They represent about 19% (18%) of all activities, 59% (54%) of all active activities and 90% (96%) of the self active time activity for the older (younger) cohort. 48 Educational activities include helping adults doing household chores, taking extracurricular lessons, and reading. They represent about 6% (5%) of all activities and 20% (15%) of all active activities for the older (younger) cohort. 49 General care include obtaining goods and services, personal needs and care (e.g. having meals), and traveling/waiting. They represent about 16% (15%) of all activities and 55% (60%) of all passive activities for the older (younger) cohort. 
50 Watching TV represents about 8% (8%) of all activities and 29% (31%) of all passive activities for the

In specification (9’), we add the proportions of each time input spent at home as opposed to elsewhere (14 additional controls).51 In specification (10’), we add the proportions of each time input spent in activities with someone participating (14 additional controls).52 In specification (11’), we add the proportions of each time input spent during weekends (14 additional controls). In specification (12’), we check if our results are robust to the definition of age groups. In this specification, the younger group refers to five- to twelve-year-old children (rather than five- to eleven-year-old children) and the older group refers to thirteen- to seventeen-year-old children (rather than twelve- to seventeen-year-old children).

Analogously, Tables 19 and 20 present the same robustness checks for the non-linear models discussed in Section 4.2, with the aim of allaying further concerns about non-linearities. We test whether the coefficients $\beta_{jk}$, for all $j$ and $k$ (27 coefficients), change as we change specification (7). The results show that, similarly to the linear models, the estimates do not change.

Given the evidence presented in this section, it is difficult to conceive of a confounder that may be biasing our estimates. It needs to be of type (a2) or (b) for all inputs and at the same time be undetectable by all the robustness checks provided in this section. For instance, it is difficult to conceive of variables (of type (a2) or (b) for all inputs) correlated to both $\text{Skill}_i$ and $\text{Input}_i$ observed in the current wave, and yet uncorrelated to both $\text{Skill}_i$ and $\text{Input}_i$ observed in the previous wave.

6 Why Does Selection on Observables Work in This Context?

The results for the linear and non-linear models discussed in the prior sections indicate that with rich enough controls we are able to arrive at specifications for which we fail to reject exogeneity.
Moreover, as discussed in detail in the past sections, this does not appear to result from a lack of power. A natural question to ask at this point is why a selection on observables approach seems to be appropriate in the context of this application.

older (younger) cohort.
51 Time spent at home accounts for about 26% of children’s total time in a week for both cohorts.
52 When filling out the time diaries, the respondents were asked not only about with whom each activity was performed, but also whether the partner actually participated in the activity (versus being just around while the child performed the activity). Participation time accounts for about 18% (21%) of children’s total time in a week for the older (younger) cohort. This variable was used in Del Boca et al. (2013) to categorize inputs.

While the richness of the available controls in the PSID is certainly helpful for mitigating endogeneity, incorporating the full set of inputs into the production function is also quite useful. To see this, consider the following simple model of input choices and skill formation where, for simplicity, we treat the child as the sole decision-maker. Skill for individual $i$ is determined according to

$$\text{Skill}_i = f(\text{Input}_i, \theta_{i1}),$$

where $\text{Input}_i$ is a vector of $J$ time inputs and $\theta_{i1}$ is a vector of unobservables impacting skill (e.g., how much attention the child pays when reading). Individuals choose $\text{Input}_i$ to maximize utility

$$U_i = g(\text{Skill}_i, \text{Input}_i, \theta_{i1}, \theta_{i2})$$

subject to $\text{Input}_i^j \geq 0$ and $\sum_{j=1}^{J} \text{Input}_i^j = T$, where $T$ is the total available time (i.e., 24 hours per day). $\theta_{i2}$ is a vector of unobservables that affect utility, but do not impact the production of skill directly (e.g., how much the child enjoys reading).53 Note that in this general formulation time inputs can affect utility directly, as can the unobservables influencing the production of skill, $\theta_{i1}$.
Given this maximization problem, the chosen vector of time inputs is implicitly defined by the levels of $\theta_{i1}$ and $\theta_{i2}$: individuals with different levels of $(\theta_{i1}, \theta_{i2})$ tend to choose different levels of inputs. In particular, endogeneity arises if an input is correlated with $\theta_{i1}$ across individuals, conditional on covariates. In our context, we add as covariates all other inputs except sleeping. Thus, to the extent that these other inputs absorb elements of $\theta_{i1}$, adding them as covariates can substantially reduce the potential for endogeneity. Note that the variation in inputs due to $\theta_{i2}$ is not endogenous and is in fact precisely the type of variation we want to exploit when estimating the production function. Of course, although $\theta_{i2}$ would make ideal instruments to identify the effect of interest, it is difficult to know ex ante which source of variation is included in $\theta_{i2}$ and which source of variation is included in $\theta_{i1}$, hence our need to develop an alternative identification strategy in this paper.

Importantly, the full set of controls incorporated in the empirical model must be unable to thoroughly absorb $\theta_{i2}$, otherwise there would be no independent variation remaining in $\text{Input}_i$ to estimate the production function. $\theta_{i2}$ reflects tastes, which are likely quite heterogeneous across people, while $\theta_{i1}$ is bound by technical features of the skill production technology. Thus, it is not surprising that observables can fully control for $\theta_{i1}$ without fully controlling for $\theta_{i2}$.

53 $\theta_{i2}$ can also include additional constraints in the maximization problem.

The above example illustrates a largely under-appreciated benefit of modeling the full vector of inputs in skill production. The inclusion of a comprehensive list of time activities not only helps the interpretability of the production parameters, but also can substantially allay endogeneity concerns.

7 Conclusion

Cognitive and non-cognitive skills are critical for a host of economic and social outcomes as an adult.
While there appears to be a consensus view that a significant amount of skill acquisition and development occurs early in life, the precise activities and investments that drive this process are not well understood. In this paper we examine how children’s time allocation affects the accumulation of skill. To do this, we apply a recently developed test of exogeneity to search for models that yield causal estimates of the impact time inputs have on child skills. The test exploits bunching in time inputs induced by a non-negativity time constraint. We provide evidence that the test is able to detect endogeneity arising from omitted variables, simultaneity, measurement error, and misspecification errors. There are potential sources of endogeneity that the test is unable to detect. However, our robustness exercises, which are designed to detect them, suggest that our rich set of controls, together with a comprehensive list of time inputs, are able to absorb them. The test indicates that with a sufficient set of controls, already available in the most detailed datasets, we are unable to reject exogeneity of time inputs for both younger and older children. For younger children, we find that sleeping is critical for the development of non-cognitive skills while maternal passive time is important for cognitive skill development. For older children, active time with adults is relatively valuable in developing cognitive skills, while passive time with parents and alone are important for non-cognitive skills. However, these effects are likely to be heterogeneous across families, children within families and activities within our time input categories. As better data become available, a similar approach to the one implemented here can be used to uncover causal estimates at a more disaggregated level. 
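The logic of the bunching-based test summarized above can be sketched in a few lines. The code below is a minimal illustration on simulated data, not the authors' implementation. It adds one indicator per input for observations bunched at zero, estimates the regression by OLS, and forms a heteroskedasticity-robust F-type statistic for the joint hypothesis that all indicator coefficients are zero. The function name `bunching_f_stat`, the HC0 covariance estimator, and the simulated data-generating process are all assumptions made for the example.

```python
import numpy as np

def bunching_f_stat(skill, inputs, controls):
    """Robust F-type statistic for H0: all zero-input indicator coefficients are 0.

    skill: (n,) outcome; inputs: (n, J) time inputs; controls: (n, K) covariates.
    Returns (statistic, numerator df, denominator df).
    """
    skill = np.asarray(skill, dtype=float)
    inputs = np.asarray(inputs, dtype=float)
    controls = np.asarray(controls, dtype=float)
    n = len(skill)
    dummies = (inputs == 0).astype(float)
    # Drop indicators with no variation (inputs never or always at zero)
    counts = dummies.sum(axis=0)
    dummies = dummies[:, (counts > 0) & (counts < n)]
    q = dummies.shape[1]
    if q == 0:
        raise ValueError("no input bunches at zero, so the test is unavailable")
    X = np.column_stack([np.ones(n), inputs, controls, dummies])
    coef, *_ = np.linalg.lstsq(X, skill, rcond=None)
    resid = skill - X @ coef
    bread = np.linalg.inv(X.T @ X)
    meat = (X * resid[:, None] ** 2).T @ X
    cov = bread @ meat @ bread                      # HC0 sandwich covariance
    delta = coef[-q:]                               # indicator coefficients
    wald = delta @ np.linalg.solve(cov[-q:, -q:], delta)
    return wald / q, q, n - X.shape[1]

# Simulated example (hypothetical data-generating process)
rng = np.random.default_rng(0)
n = 5000
theta = rng.normal(size=n)                          # unobserved confounder
hours = np.maximum(theta + rng.normal(size=n), 0)   # input bunched at zero
control = rng.normal(size=n)
noise = rng.normal(size=n)

skill_exog = 0.5 * hours + 0.3 * control + noise    # confounder does not enter skill
skill_endog = skill_exog + theta                    # confounder enters skill

f_exog, q, df = bunching_f_stat(skill_exog, hours[:, None], control[:, None])
f_endog, _, _ = bunching_f_stat(skill_endog, hours[:, None], control[:, None])
print(f"exogenous design:  F = {f_exog:.2f}")
print(f"endogenous design: F = {f_endog:.2f}")
```

In the endogenous design the unobservable is discontinuous at zero hours, so the indicator picks it up and the statistic is large. A p-value could be obtained from an F distribution with the returned degrees of freedom, for instance with `scipy.stats.f.sf`.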
An additional benefit of our approach is that it can be used as a first stage in a broader model aimed at understanding household decisions about work, leisure, and investments in children. Typically, papers that are interested in such questions embed a skill production function in a more detailed household behavioral model (e.g., Del Boca et al. (2013)). Using our estimates would reduce the computational burden as well as ensure that endogeneity concerns have been considered. 30 Finally, our approach to estimating how children’s time allocation affects skill development can be utilized to study the consequences of other similar resource allocation decisions. Examples include understanding the impact of watching violent media on violent behavior or the productivity benefits of time spent exercising. In both examples, the activity of interest is endogenous, it is unclear which activity is being substituted for,54 and individuals are likely to bunch at zero as a result of non-negativity constraints. As time diaries become more ubiquitous, the methodology employed here provides researchers with a potential tool to study causality without an ex-ante source of exogenous variation. 54 DellaVigna and Ferrara (2015) discusses these endogeneity issues in the context of the economic and social impacts of the media. 31 References Almond, D. and Currie, J. (2011). Human capital development before age five. Handbook of labor economics, 4:1315–1486. Bernal, R. and Keane, M. P. (2010). Quasi-structural estimation of a model of childcare choices and child cognitive ability production. Journal of Econometrics, 156(1):pp.164– 189. Caetano, C. (2015). A test of exogeneity without instrumental variables in models with bunching. Econometrica, 83(4):pp.1581–1600. Caetano, G. and Maheshri, V. (2015). Identifying dynamic spillovers of crime: An empirical approach to model selection. Cameron, S. V. and Heckman, J. J. (1998). 
Life cycle schooling and dynamic selection bias: Models and evidence for five cohorts of American males. Journal of Political Economy, 106(2):pp. 262–333.

Cunha, F. and Heckman, J. J. (2008). Formulating, identifying and estimating the technology of cognitive and noncognitive skill formation. The Journal of Human Resources, 43(4):pp. 738–782.

Cunha, F., Heckman, J. J., Lochner, L., and Masterov, D. V. (2006). Interpreting the evidence on life cycle skill formation. Handbook of the Economics of Education, 1:pp. 697–812.

Currie, J. and Thomas, D. (1999). Early test scores, socioeconomic status and future outcomes. Technical report, National Bureau of Economic Research.

Del Boca, D., Flinn, C., and Wiswall, M. (2013). Household choice and child development. The Review of Economic Studies, page rdt026.

DellaVigna, S. and Ferrara, E. L. (2015). Economic and social impacts of the media. Technical report, National Bureau of Economic Research.

Deming, D. (2009). Early childhood intervention and life-cycle skill development: Evidence from Head Start. American Economic Journal: Applied Economics, pages 111–134.

Dustmann, C. and Schönberg, U. (2012). Expansions in maternity leave coverage and children’s long-term outcomes. American Economic Journal: Applied Economics, 4(3):pp. 190–224.

Fiorini, M. and Keane, M. P. (2014). How the allocation of children’s time affects cognitive and noncognitive development. Journal of Labor Economics, 32(4):pp. 787–836.

Heckman, J., Pinto, R., and Savelyev, P. (2013). Understanding the mechanisms through which an influential early childhood program boosted adult outcomes. The American Economic Review, 103(6):pp. 2052–2086.

Keane, M. P. (2010). A structural perspective on the experimentalist school. The Journal of Economic Perspectives, 24(2):pp. 47–58.

Keane, M. P. and Wolpin, K. I. (1997). The career decisions of young men. Journal of Political Economy, 105(3):pp. 473–522.

McCrary, J. (2008). Manipulation of the running variable in the regression discontinuity design: A density test. Journal of Econometrics, 142(2):pp. 698–714.

McLeod, J. D. and Kaiser, K. (2004). Childhood emotional and behavioral problems and educational attainment. American Sociological Review, 69(5):pp. 636–658.

Smith, P. K. (2003). The psychology of grandparenthood: An international perspective. Routledge.

Timmer, S. G., Eccles, J., and O’Brien, K. (1985). How children use time. Time, goods, and well-being, pages 353–382.

Todd, P. E. and Wolpin, K. I. (2003). On the specification and estimation of the production function for cognitive achievement. The Economic Journal, 113(485):pp. F3–F33.

Todd, P. E. and Wolpin, K. I. (2007). The production of cognitive achievement in children: Home, school, and racial test score gaps. Journal of Human Capital, 1(1):pp. 91–136.

Woodcock, R. W. and Johnson, M. B. (1989). Woodcock-Johnson tests of achievement. DLM Teaching Resources.

Table 1: Summary of Ages

| | CDS I: 1997 | CDS II: 2002 | CDS III: 2007 | Younger Children | Older Children |
|---|---|---|---|---|---|
| Age Range | 0-12 years old | 5-17 years old | 10-22 years old | 5-11 years old | 12-17 years old |
| Average Age | 6 years and 9 months | 11 years and 9 months | 16 years and 9 months | 8 years and 4 months | 14 years and 6 months |

Table 2: Weekly Time in Each Activity (in Hours), Younger Children

| Activity | Mean | SD | Proportion of Zero |
|---|---|---|---|
| Active time with mother | 13.04 | 9.77 | 0.09 |
| Passive time with mother | 23.21 | 12.09 | 0.04 |
| Active time with father | 1.93 | 4.28 | 0.69 |
| Passive time with father | 2.66 | 5.58 | 0.57 |
| Active time with grandparents | 1.25 | 4.12 | 0.85 |
| Passive time with grandparents | 1.90 | 5.98 | 0.79 |
| Active time with siblings | 2.48 | 4.77 | 0.62 |
| Passive time with siblings | 2.95 | 5.62 | 0.52 |
| Active time with friends | 2.44 | 5.45 | 0.70 |
| Passive time with friends | 1.13 | 3.18 | 0.71 |
| Active time with others | 2.41 | 4.98 | 0.69 |
| Passive time with others | 3.60 | 7.61 | 0.39 |
| Self active time | 30.45 | 11.52 | 0.08 |
| Self passive time | 7.72 | 4.62 | 0.00 |
| Sleeping or napping | 70.49 | 7.67 | 0.00 |
| Refused to answer or do not know | 0.28 | 1.88 | 0.96 |

Note: The third column shows the proportion of children who spend zero minutes in a week on the corresponding time category.

Table 3: Weekly Time in Each Activity (in Hours), Older Children

| Activity | Mean | SD | Proportion of Zero |
|---|---|---|---|
| Active time with mother | 7.69 | 8.21 | 0.23 |
| Passive time with mother | 22.13 | 14.45 | 0.07 |
| Active time with father | 1.22 | 3.80 | 0.81 |
| Passive time with father | 2.56 | 5.62 | 0.62 |
| Active time with grandparents | 0.46 | 2.16 | 0.92 |
| Passive time with grandparents | 1.38 | 5.69 | 0.86 |
| Active time with siblings | 1.60 | 4.07 | 0.75 |
| Passive time with siblings | 3.49 | 6.74 | 0.55 |
| Active time with friends | 4.42 | 7.51 | 0.56 |
| Passive time with friends | 5.19 | 7.77 | 0.37 |
| Active time with others | 1.91 | 5.18 | 0.81 |
| Passive time with others | 2.58 | 7.89 | 0.58 |
| Self active time | 34.95 | 13.67 | 0.06 |
| Self passive time | 10.98 | 8.11 | 0.00 |
| Sleeping or napping | 64.55 | 10.01 | 0.00 |
| Refused to answer or do not know | 2.90 | 6.65 | 0.67 |

Note: The third column shows the proportion of children who spend zero minutes in a week on the corresponding time category.

Table 4: Demographics and Parental Background

| Variable | Younger Children: Mean | Younger Children: SD | Older Children: Mean | Older Children: SD |
|---|---|---|---|---|
| Child’s age (months) | 100.50 | 24.33 | 174.10 | 20.57 |
| Child’s gender | 0.51 | 0.50 | 0.50 | 0.50 |
| Birth order to mother | 1.92 | 1.09 | 1.95 | 1.06 |
| Born in US | 0.98 | 0.14 | 0.98 | 0.14 |
| Mother’s age | 35.20 | 6.37 | 41.36 | 6.01 |
| Father’s age | 38.02 | 6.95 | 44.27 | 6.65 |
| Mother’s age at child birth | 27.53 | 6.02 | 27.91 | 5.71 |
| Father’s age at child birth | 30.54 | 6.45 | 31.02 | 6.31 |
| Mother has only high school degree | 0.29 | 0.45 | 0.31 | 0.46 |
| Mother has college degree | 0.20 | 0.40 | 0.21 | 0.41 |
| Father has only high school degree | 0.17 | 0.37 | 0.15 | 0.36 |
| Father has college degree | 0.27 | 0.45 | 0.27 | 0.45 |
| Number of siblings child lives with | 1.67 | 1.86 | 2.21 | 2.66 |
| Lives with two biological parents | 0.60 | 0.49 | 0.55 | 0.50 |
| Lives with grandparent | 0.08 | 0.27 | 0.07 | 0.26 |
| Household annual income (in $1,000s) | 107.95 | 128.00 | 117.71 | 129.68 |

Table 5: Exogeneity Test Results: Older Children

| Controls | Math F-stat | Math p-Value | Vocabulary F-stat | Vocabulary p-Value | Comprehension F-stat | Comprehension p-Value | Non-cognitive F-stat | Non-cognitive p-Value |
|---|---|---|---|---|---|---|---|---|
| (1) Lagged Score | 2.920 | 0.000 | 2.519 | 0.001 | 2.923 | 0.000 | 1.900 | 0.019 |
| (2) Child Chrs. | 1.312 | 0.187 | 1.262 | 0.219 | 2.103 | 0.008 | 1.315 | 0.185 |
| (3) Mother Demog. Chrs. | 1.256 | 0.223 | 1.246 | 0.230 | 1.857 | 0.023 | 1.312 | 0.186 |
| (4) Family Demog. Chrs. | 1.071 | 0.379 | 1.160 | 0.297 | 1.694 | 0.046 | 1.341 | 0.169 |
| (5) Family Environ. Chrs. | 1.065 | 0.385 | 1.078 | 0.373 | 1.557 | 0.079 | 1.267 | 0.215 |
| (6) School Chrs. | 0.951 | 0.506 | 1.075 | 0.375 | 1.331 | 0.175 | 1.225 | 0.245 |
| (7) School Experience | 0.879 | 0.588 | 1.020 | 0.431 | 1.376 | 0.151 | 1.232 | 0.240 |

Note: Entries in bold are “surviving specifications” for which we cannot reject exogeneity at 10% of significance. Each specification contains different control variables: (1) no controls, except for the lagged corresponding input; (2) child characteristics; (3) mother demographic characteristics; (4) family demographic characteristics; (5) family environmental characteristics; (6) school characteristics; (7) child’s school experience. See footnote 28 for a full description of the control variables. All standard errors are corrected for heteroskedasticity.
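The F-statistics in Table 5 come from the exogeneity test described in the main text, which looks for a discontinuity in expected skill at zero time input. As a rough illustration only, the sketch below assumes one simple way to operationalize that idea: add an indicator for each input being exactly zero and test the indicators jointly. The function names, the simulated data, and the use of a classical (homoskedastic) F-statistic are all assumptions made for this example; the tables report heteroskedasticity-corrected results, which this sketch does not reproduce.

```python
import numpy as np

def ols_rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def zero_dummy_f_stat(y, inputs, controls=None):
    """Classical F-statistic for the joint significance of 1{input_j = 0}.

    Assumption: the discontinuity at zero is captured by one dummy per input.
    Errors are treated as homoskedastic, unlike the robust errors in the tables.
    """
    n, k = inputs.shape
    blocks = [np.ones((n, 1)), inputs]
    if controls is not None:
        blocks.append(controls)
    X_restricted = np.hstack(blocks)
    X_full = np.hstack([X_restricted, (inputs == 0).astype(float)])
    rss_r, rss_u = ols_rss(y, X_restricted), ols_rss(y, X_full)
    df_resid = n - X_full.shape[1]
    return ((rss_r - rss_u) / k) / (rss_u / df_resid)

# Hypothetical data: one time input censored at zero and one confounder.
rng = np.random.default_rng(0)
n = 5000
confounder = rng.normal(size=n)
latent = 1.0 + confounder + rng.normal(size=n)        # desired time, may be negative
time_input = np.maximum(latent, 0.0).reshape(-1, 1)   # observed time bunches at zero
skill = 0.5 * time_input[:, 0] + confounder + rng.normal(size=n)

f_without = zero_dummy_f_stat(skill, time_input)                             # confounder omitted
f_with = zero_dummy_f_stat(skill, time_input, confounder.reshape(-1, 1))     # confounder controlled
```

In this simulated setting the statistic is large when the confounder is omitted and falls once it is included as a control, which mirrors the pattern across specifications (1) to (7) in the table.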
Table 6: Effects of Children’s Time Allocation: Older Children

| | Math | Vocabulary | Comprehension | Non-cognitive |
|---|---|---|---|---|
| Active time with mother | 0.008** (0.003) | 0.002 (0.003) | 0.001 (0.003) | 0.005 (0.004) |
| Passive time with mother | 0.006** (0.002) | 0.002 (0.002) | -0.001 (0.003) | 0.007** (0.003) |
| Active time with father | 0.016** (0.006) | 0.007 (0.006) | -0.000 (0.007) | 0.008 (0.007) |
| Passive time with father | -0.001 (0.004) | -0.001 (0.004) | 0.007 (0.004) | 0.009* (0.005) |
| Active time with grandparents | 0.024** (0.012) | 0.025** (0.012) | 0.029** (0.013) | 0.019 (0.018) |
| Passive time with grandparents | -0.003 (0.004) | -0.003 (0.005) | -0.002 (0.005) | -0.000 (0.006) |
| Active time with siblings | -0.002 (0.005) | -0.006 (0.005) | -0.011* (0.007) | 0.010 (0.007) |
| Passive time with siblings | 0.006* (0.004) | 0.003 (0.003) | 0.000 (0.004) | 0.005 (0.005) |
| Active time with friends | 0.008** (0.003) | 0.005 (0.003) | -0.002 (0.004) | 0.002 (0.005) |
| Passive time with friends | 0.004 (0.003) | -0.004 (0.003) | -0.001 (0.003) | 0.002 (0.004) |
| Self active time | 0.007** (0.002) | -0.001 (0.002) | 0.002 (0.002) | 0.007** (0.003) |
| Self passive time | 0.005* (0.003) | 0.002 (0.002) | -0.001 (0.003) | 0.008** (0.003) |
| Active time with others | 0.002 (0.006) | -0.007 (0.005) | -0.005 (0.006) | 0.003 (0.007) |
| Passive time with others | 0.003 (0.003) | -0.004 (0.003) | -0.005 (0.003) | 0.001 (0.006) |
| Don’t know or refuse to answer | 0.003 (0.003) | 0.001 (0.003) | -0.000 (0.004) | -0.000 (0.004) |
| R-Squared | 0.604 | 0.608 | 0.568 | 0.400 |
| Observations | 1453 | 1455 | 1453 | 1454 |
| Exogeneity test F-statistic | 0.879 | 1.020 | 1.376 | 1.232 |
| Exogeneity test p-value | 0.588 | 0.431 | 0.151 | 0.240 |

Note: All estimates are for specification (7). See footnote 28 for a full description of the control variables. Standard errors corrected for heteroskedasticity are in parentheses. * Significant at the 10% level. ** Significant at the 5% level.

Table 7: Exogeneity Test Results: Younger Children

| Controls | Math F-stat | Math p-Value | Vocabulary F-stat | Vocabulary p-Value | Comprehension F-stat | Comprehension p-Value | Non-cognitive F-stat | Non-cognitive p-Value |
|---|---|---|---|---|---|---|---|---|
| (1) No controls | 11.065 | 0.000 | 11.327 | 0.000 | 8.001 | 0.000 | 2.774 | 0.000 |
| (2) Child Chrs. | 2.020 | 0.011 | 2.761 | 0.000 | 2.497 | 0.001 | 2.427 | 0.002 |
| (3) Mother Demog. Chrs. | 0.979 | 0.475 | 1.675 | 0.049 | 1.338 | 0.171 | 1.644 | 0.056 |
| (4) Family Demog. Chrs. | 0.794 | 0.685 | 1.094 | 0.356 | 1.034 | 0.416 | 1.434 | 0.122 |
| (5) Family Environ. Chrs. | 0.719 | 0.768 | 1.126 | 0.326 | 1.025 | 0.426 | 1.396 | 0.140 |
| (6) School Chrs. | 0.685 | 0.802 | 1.104 | 0.346 | 0.958 | 0.498 | 1.389 | 0.143 |
| (7) School Experience | 0.657 | 0.828 | 1.116 | 0.335 | 0.891 | 0.574 | 1.333 | 0.173 |

Note: Entries in bold are “surviving specifications” for which we cannot reject exogeneity at 10% of significance. Each specification contains different control variables: (1) no controls; (2) child characteristics; (3) mother demographic characteristics; (4) family demographic characteristics; (5) family environmental characteristics; (6) school characteristics; (7) child’s school experience. See footnote 28 for a full description of the control variables. All standard errors are corrected for heteroskedasticity.

Table 8: Effects of Children’s Time Allocation: Younger Children

| | Math | Vocabulary | Comprehension | Non-cognitive |
|---|---|---|---|---|
| Active time with mother | 0.001 (0.002) | 0.001 (0.002) | 0.004 (0.002) | -0.001 (0.004) |
| Passive time with mother | 0.001 (0.002) | 0.003* (0.001) | 0.004* (0.002) | -0.004 (0.003) |
| Active time with father | 0.001 (0.003) | 0.000 (0.003) | 0.004 (0.005) | 0.008 (0.007) |
| Passive time with father | -0.003 (0.003) | -0.005** (0.003) | -0.004 (0.004) | -0.003 (0.006) |
| Active time with grandparents | 0.002 (0.003) | 0.001 (0.003) | 0.008 (0.005) | -0.004 (0.008) |
| Passive time with grandparents | -0.004 (0.003) | 0.002 (0.002) | 0.003 (0.004) | -0.006 (0.005) |
| Active time with siblings | -0.001 (0.003) | -0.002 (0.003) | -0.003 (0.004) | -0.007 (0.006) |
| Passive time with siblings | 0.002 (0.002) | 0.003 (0.002) | 0.007** (0.003) | -0.007 (0.005) |
| Active time with friends | -0.000 (0.003) | -0.001 (0.003) | -0.003 (0.003) | -0.018** (0.008) |
| Passive time with friends | 0.000 (0.003) | 0.000 (0.003) | 0.004 (0.005) | -0.007 (0.009) |
| Self active time | 0.003 (0.002) | 0.001 (0.002) | 0.005** (0.003) | -0.009** (0.004) |
| Self passive time | 0.002 (0.002) | 0.002 (0.002) | 0.003 (0.003) | -0.003 (0.005) |
| Active time with others | 0.000 (0.003) | 0.004 (0.003) | 0.007 (0.004) | -0.014** (0.006) |
| Passive time with others | -0.001 (0.002) | -0.002 (0.002) | -0.001 (0.003) | -0.005 (0.004) |
| Don’t know or refuse to answer | 0.007 (0.009) | 0.003 (0.006) | 0.010 (0.009) | -0.018 (0.020) |
| R-Squared | 0.805 | 0.807 | 0.673 | 0.131 |
| Observations | 2443 | 2449 | 2085 | 2548 |
| Exogeneity test F-statistic | 0.657 | 1.116 | 0.891 | 1.333 |
| Exogeneity test p-value | 0.828 | 0.335 | 0.574 | 0.173 |

Note: All estimates are for specification (7). See footnote 28 for a full description of the control variables. Standard errors corrected for heteroskedasticity are in parentheses. * Significant at the 10% level. ** Significant at the 5% level.

Table 9: Exogeneity Test Results: Older Children, B-spline

| Controls | Math F-stat | Math p-Value | Vocabulary F-stat | Vocabulary p-Value | Comprehension F-stat | Comprehension p-Value | Non-cognitive F-stat | Non-cognitive p-Value |
|---|---|---|---|---|---|---|---|---|
| (1) Lagged Score | 1.959 | 0.015 | 2.214 | 0.005 | 2.217 | 0.005 | 1.832 | 0.026 |
| (2) Child Chrs. | 1.109 | 0.342 | 1.366 | 0.156 | 1.641 | 0.057 | 1.400 | 0.139 |
| (3) Mother Demog. Chrs. | 1.100 | 0.351 | 1.438 | 0.121 | 1.753 | 0.036 | 1.383 | 0.147 |
| (4) Family Demog. Chrs. | 1.049 | 0.401 | 1.402 | 0.138 | 1.698 | 0.045 | 1.383 | 0.147 |
| (5) Family Environ. Chrs. | 0.853 | 0.618 | 1.322 | 0.180 | 1.434 | 0.123 | 1.255 | 0.224 |
| (6) School Chrs. | 0.827 | 0.648 | 1.305 | 0.191 | 1.339 | 0.170 | 1.239 | 0.235 |
| (7) School Experience | 0.805 | 0.673 | 1.270 | 0.214 | 1.320 | 0.182 | 1.228 | 0.243 |

Note: All specifications in this table are in the form of a linear B-Spline with 2 knots placed at 33rd and 67th percentiles of each time input, whenever possible. Entries in bold are “surviving specifications” for which we cannot reject exogeneity at 10% of significance in the linear model.
Each specification contains different control variables: (1) no controls, except for the lagged corresponding input; (2) child characteristics; (3) mother demographic characteristics; (4) family demographic characteristics; (5) family environmental characteristics; (6) school characteristics; (7) child’s school experience. See footnote 28 for a full description of the control variables. All standard errors are corrected for heteroskedasticity.

Table 10: B-spline Estimation Results: Older Children

| Time input (interval, hours per week) | Math | Vocabulary | Comprehension | Noncognitive |
|---|---|---|---|---|
| Active time with mother (0,5.8) | 0.024 (0.016) | 0.013 (0.015) | 0.016 (0.016) | -0.007 (0.018) |
| Active time with mother (5.8,15) | -0.009 (0.008) | -0.004 (0.007) | -0.005 (0.008) | -0.001 (0.008) |
| Active time with mother (15,.) | 0.020** (0.005) | 0.006 (0.005) | 0.003 (0.006) | 0.014** (0.006) |
| Passive time with mother (0,17.4) | 0.016** (0.006) | 0.008 (0.005) | 0.010* (0.006) | 0.007 (0.007) |
| Passive time with mother (17.41,28.7) | 0.000 (0.006) | 0.001 (0.005) | 0.005 (0.006) | 0.011* (0.007) |
| Passive time with mother (28.7,.) | 0.004 (0.003) | 0.000 (0.003) | -0.010** (0.004) | 0.004 (0.004) |
| Active time with father (0,.) | 0.014** (0.006) | 0.007 (0.006) | 0.000 (0.007) | 0.007 (0.007) |
| Passive time with father (0,1.2) | 0.131 (0.122) | 0.263* (0.146) | 0.119 (0.161) | -0.075 (0.143) |
| Passive time with father (1.2,.) | -0.001 (0.005) | -0.003 (0.004) | 0.008 (0.005) | 0.010* (0.005) |
| Active time with grandparents (0,.) | 0.026** (0.012) | 0.025** (0.012) | 0.031** (0.012) | 0.018 (0.018) |
| Passive time with grandparents (0,.) | -0.003 (0.004) | -0.002 (0.005) | -0.001 (0.005) | -0.000 (0.006) |
| Active time with siblings (0,.) | -0.001 (0.006) | -0.005 (0.005) | -0.010 (0.007) | 0.010 (0.007) |
| Passive time with siblings (0,1.7) | -0.064 (0.084) | -0.098 (0.075) | -0.047 (0.080) | -0.100 (0.104) |
| Passive time with siblings (1.7,.) | 0.008** (0.004) | 0.005 (0.004) | 0.002 (0.004) | 0.006 (0.005) |
| Active time with friends (0,.) | 0.009** (0.003) | 0.005 (0.003) | 0.000 (0.004) | 0.003 (0.005) |
| Passive time with friends (0,0.8) | 0.104 (0.376) | 0.305 (0.442) | -0.006 (0.331) | -0.061 (0.269) |
| Passive time with friends (0.8,.) | 0.004 (0.003) | -0.004 (0.003) | -0.000 (0.003) | 0.001 (0.004) |
| Self active time (0,31.5) | 0.003 (0.004) | -0.001 (0.004) | 0.004 (0.005) | 0.009 (0.006) |
| Self active time (31.5,36.3) | 0.019 (0.014) | -0.005 (0.013) | -0.012 (0.015) | 0.013 (0.017) |
| Self active time (36.3,.) | 0.008** (0.003) | 0.001 (0.003) | 0.004 (0.003) | 0.005 (0.004) |
| Self passive time (0,5.1) | -0.013 (0.028) | 0.002 (0.027) | -0.013 (0.031) | -0.016 (0.035) |
| Self passive time (5.1,9) | 0.007 (0.015) | 0.015 (0.014) | 0.004 (0.016) | 0.029 (0.018) |
| Self passive time (9,.) | 0.005* (0.003) | 0.000 (0.003) | 0.001 (0.003) | 0.006 (0.004) |
| Active time with others (0,.) | 0.002 (0.006) | -0.007 (0.005) | -0.004 (0.006) | 0.003 (0.007) |
| Passive time with others (0,2.5) | -0.059 (0.039) | 0.014 (0.036) | -0.062 (0.043) | -0.025 (0.049) |
| Passive time with others (2.5,.) | 0.006* (0.003) | -0.004 (0.004) | -0.002 (0.004) | 0.001 (0.007) |
| Don’t know or refuse to answer | 0.004 (0.003) | 0.001 (0.003) | 0.001 (0.004) | -0.001 (0.004) |
| R-squared | 0.609 | 0.612 | 0.575 | 0.404 |
| Observations | 1,453 | 1,455 | 1,453 | 1,454 |
| Exogeneity test F-statistic | 0.805 | 1.270 | 1.320 | 1.228 |
| Exogeneity test p-value | 0.673 | 0.214 | 0.182 | 0.243 |

Note: All estimates are for specification (7). See footnote 28 for a full description of the control variables. In the first column, the parentheses after each time input indicate the time interval. For example, (0,2.5) means between 0 hours and 2.5 hours per week. Depending on the distribution, some time inputs have fewer than three time intervals because the time input was not complex enough to accommodate two knots. Standard errors corrected for heteroskedasticity are in parentheses. * Significant at the 10% level. ** Significant at the 5% level.

Table 11: Exogeneity Test Results: Younger Children, B-spline

| Controls | Math F-stat | Math p-Value | Vocabulary F-stat | Vocabulary p-Value | Comprehension F-stat | Comprehension p-Value | Non-cognitive F-stat | Non-cognitive p-Value |
|---|---|---|---|---|---|---|---|---|
| (1) No Controls | 6.918 | 0.000 | 8.080 | 0.000 | 6.055 | 0.000 | 2.142 | 0.006 |
| (2) Child Chrs. | 1.502 | 0.096 | 1.358 | 0.159 | 1.059 | 0.390 | 1.610 | 0.063 |
| (3) Mother Demog. Chrs. | 0.981 | 0.472 | 1.052 | 0.398 | 0.764 | 0.719 | 1.459 | 0.112 |
| (4) Family Demog. Chrs. | 0.815 | 0.661 | 0.841 | 0.632 | 0.675 | 0.811 | 1.615 | 0.062 |
| (5) Family Environ. Chrs. | 0.661 | 0.825 | 0.785 | 0.696 | 0.554 | 0.911 | 1.474 | 0.106 |
| (6) School Chrs. | 0.625 | 0.857 | 0.766 | 0.717 | 0.519 | 0.932 | 1.474 | 0.106 |
| (7) School Experience | 0.642 | 0.842 | 0.837 | 0.637 | 0.570 | 0.900 | 1.460 | 0.112 |

Note: All specifications in this table are in the form of a linear B-Spline with 2 knots placed at 33rd and 67th percentiles of each time input, whenever possible. Entries in bold are “surviving specifications” for which we cannot reject exogeneity at 10% of significance in the linear model. Each specification contains different control variables: (1) no controls, except for the lagged corresponding input; (2) child characteristics; (3) mother demographic characteristics; (4) family demographic characteristics; (5) family environmental characteristics; (6) school characteristics; (7) child’s school experience.
See footnote 28 for a full description of the control variables. All standard errors are corrected for heteroskedasticity.

Table 12: B-spline Estimation Results: Younger Children

| Time input (interval, hours per week) | Math | Vocabulary | Comprehension | Noncognitive |
|---|---|---|---|---|
| Active time with mother (0,5.8) | -0.015 (0.011) | -0.001 (0.011) | 0.010 (0.015) | -0.029 (0.023) |
| Active time with mother (5.8,15) | 0.002 (0.004) | 0.002 (0.004) | 0.005 (0.006) | 0.019** (0.008) |
| Active time with mother (15,.) | 0.002 (0.002) | 0.000 (0.002) | 0.002 (0.003) | -0.009** (0.004) |
| Passive time with mother (0,17.4) | 0.005 (0.004) | 0.000 (0.004) | 0.001 (0.005) | 0.002 (0.008) |
| Passive time with mother (17.41,28.7) | 0.001 (0.003) | 0.003 (0.003) | 0.003 (0.004) | -0.009 (0.006) |
| Passive time with mother (28.7,.) | 0.000 (0.002) | 0.004* (0.002) | 0.006* (0.004) | -0.000 (0.005) |
| Active time with father (0,.) | 0.001 (0.003) | 0.000 (0.003) | 0.005 (0.005) | 0.008 (0.007) |
| Passive time with father (0,1.2) | -0.020 (0.066) | 0.046 (0.068) | 0.039 (0.093) | -0.013 (0.121) |
| Passive time with father (1.2,.) | -0.002 (0.003) | -0.006** (0.003) | -0.004 (0.004) | -0.003 (0.006) |
| Active time with grandparents (0,.) | 0.002 (0.004) | 0.002 (0.003) | 0.008 (0.005) | -0.004 (0.008) |
| Passive time with grandparents (0,.) | -0.003 (0.003) | 0.002 (0.002) | 0.002 (0.004) | -0.005 (0.005) |
| Active time with siblings (0,.) | -0.001 (0.003) | -0.003 (0.003) | -0.002 (0.004) | -0.007 (0.006) |
| Passive time with siblings (0,1.7) | -0.017 (0.036) | -0.034 (0.036) | -0.051 (0.050) | -0.130* (0.075) |
| Passive time with siblings (1.7,.) | 0.003 (0.003) | 0.004 (0.002) | 0.009** (0.003) | -0.004 (0.005) |
| Active time with friends (0,.) | 0.001 (0.003) | -0.001 (0.003) | -0.002 (0.003) | -0.017** (0.008) |
| Passive time with friends (0,0.8) | 0.003 (0.125) | -0.070 (0.115) | -0.107 (0.193) | 0.496** (0.234) |
| Passive time with friends (0.8,.) | 0.001 (0.004) | 0.000 (0.003) | 0.004 (0.005) | -0.011 (0.010) |
| Self active time (0,31.5) | 0.006** (0.003) | 0.006** (0.002) | 0.009** (0.003) | -0.007 (0.005) |
| Self active time (31.5,36.3) | -0.004 (0.007) | -0.008 (0.007) | -0.000 (0.009) | -0.022* (0.013) |
| Self active time (36.3,.) | -0.000 (0.005) | -0.003 (0.004) | 0.001 (0.007) | -0.004 (0.009) |
| Self passive time (0,5.1) | 0.035** (0.013) | 0.018 (0.012) | 0.061** (0.020) | 0.023 (0.025) |
| Self passive time (5.1,9) | -0.001 (0.008) | 0.006 (0.008) | 0.005 (0.011) | -0.004 (0.015) |
| Self passive time (9,.) | 0.000 (0.003) | -0.002 (0.003) | -0.003 (0.004) | -0.007 (0.007) |
| Active time with others (0,.) | -0.000 (0.003) | 0.004 (0.003) | 0.006 (0.004) | -0.014** (0.006) |
| Passive time with others (0,2.5) | -0.009 (0.018) | -0.001 (0.018) | -0.002 (0.025) | -0.003 (0.034) |
| Passive time with others (2.5,.) | -0.001 (0.002) | -0.001 (0.002) | 0.000 (0.003) | -0.004 (0.004) |
| Don’t know or refuse to answer | 0.007 (0.009) | 0.002 (0.006) | 0.009 (0.009) | -0.017 (0.020) |
| R-squared | 0.807 | 0.808 | 0.676 | 0.138 |
| Observations | 2,443 | 2,449 | 2,085 | 2,548 |
| Exogeneity test F-statistic | 0.642 | 0.837 | 0.570 | 1.460 |
| Exogeneity test p-value | 0.842 | 0.637 | 0.900 | 0.112 |

Note: All estimates are for specification (7). See footnote 28 for a full description of the control variables. In the first column, the parentheses after each time input indicate the time interval. For example, (0,2.5) means between 0 hours and 2.5 hours per week. Depending on the distribution, some time inputs have fewer than three time intervals because the time input was not complex enough to accommodate two knots. Standard errors corrected for heteroskedasticity are in parentheses. * Significant at the 10% level. ** Significant at the 5% level.

Table 13: p-Values for Comparing Surviving and Non-surviving Specifications: Older Children

| Controls | Math | Vocabulary | Comprehension | Non-cognitive |
|---|---|---|---|---|
| (1) Lagged Score | 0.000 | 0.000 | 0.000 | 0.975 |
| (2) Child Chrs. | 0.040 | 0.463 | 0.006 | 0.974 |
| (3) Mother Demog. Chrs. | 0.185 | 0.540 | 0.072 | 0.984 |
| (4) Family Demog. Chrs. | 0.585 | 0.491 | 0.303 | 0.999 |
| (5) Family Environ. Chrs. | 0.932 | 0.958 | 0.675 | 1.000 |
| (6) School Chrs. | 0.958 | 0.998 | 0.957 | 0.998 |

Note: This table shows the p-values of a joint test for whether the 15 coefficients of Input_i for each specification are the same as the corresponding ones from specification (7) in Table 5. Entries in bold are “surviving specifications” with respect to the exogeneity test, i.e., those for which we cannot reject exogeneity at 10% of significance. Each specification contains different control variables: (1) no controls, except for the lagged corresponding input; (2) child characteristics; (3) mother demographic characteristics; (4) family demographic characteristics; (5) family environmental characteristics; (6) school characteristics. See footnote 28 for a full description of the control variables. All standard errors are corrected for heteroskedasticity.

Table 14: p-Values for Comparing Surviving and Non-surviving Specifications: Younger Children

| Controls | Math | Vocabulary | Comprehension | Non-cognitive |
|---|---|---|---|---|
| (1) No controls | 0.000 | 0.000 | 0.000 | 0.004 |
| (2) Child Chrs. | 0.000 | 0.000 | 0.001 | 0.011 |
| (3) Mother Demog. Chrs. | 0.364 | 0.151 | 0.350 | 0.531 |
| (4) Family Demog. Chrs. | 0.570 | 0.555 | 0.533 | 0.904 |
| (5) Family Environ. Chrs. | 0.374 | 0.431 | 0.441 | 0.942 |
| (6) School Chrs. | 0.830 | 0.682 | 0.828 | 1.000 |

Note: This table shows the p-values of a joint test for whether the 15 coefficients of Input_i for each specification are the same as the corresponding ones from specification (7) in Table 7. Entries in bold are “surviving specifications” with respect to the exogeneity test, i.e., those for which we cannot reject exogeneity at 10% of significance. Each specification contains different control variables: (1) no controls; (2) child characteristics; (3) mother demographic characteristics; (4) family demographic characteristics; (5) family environmental characteristics; (6) school characteristics.
See footnote 28 for a full description of the control variables. All standard errors are corrected for heteroskedasticity.

Table 15: p-Values for Comparing Surviving and Non-surviving Specifications: Older Children, B-spline

| Controls | Math | Vocabulary | Comprehension | Non-cognitive |
|---|---|---|---|---|
| (1) Lagged Score | 0.000 | 0.000 | 0.000 | 0.990 |
| (2) Child Chrs. | 0.182 | 0.857 | 0.148 | 0.998 |
| (3) Mother Demog. Chrs. | 0.465 | 0.844 | 0.428 | 0.996 |
| (4) Family Demog. Chrs. | 0.734 | 0.869 | 0.591 | 0.999 |
| (5) Family Environ. Chrs. | 0.996 | 0.998 | 0.956 | 1.000 |
| (6) School Chrs. | 1.000 | 1.000 | 0.999 | 1.000 |

Note: This table shows the p-values of a test for whether the 26 coefficient estimates of Input_i for each specification are statistically the same as the corresponding ones from Specification (7) in Table 9. Entries in bold are “surviving specifications” for which we cannot reject exogeneity at 10% of significance. Each specification contains different control variables: (1) no controls, except for the lagged corresponding input; (2) child characteristics; (3) mother demographic characteristics; (4) family demographic characteristics; (5) family environmental characteristics; (6) school characteristics. See footnote 28 for a full description of the control variables. All standard errors are corrected for heteroskedasticity.

Table 16: p-Values for Comparing Surviving and Non-surviving Specifications: Younger Children, B-spline

| Controls | Math | Vocabulary | Comprehension | Non-cognitive |
|---|---|---|---|---|
| (1) No Controls | 0.000 | 0.000 | 0.000 | 0.007 |
| (2) Child Chrs. | 0.002 | 0.000 | 0.005 | 0.035 |
| (3) Mother Demog. Chrs. | 0.331 | 0.120 | 0.378 | 0.898 |
| (4) Family Demog. Chrs. | 0.660 | 0.583 | 0.610 | 0.979 |
| (5) Family Environ. Chrs. | 0.760 | 0.813 | 0.694 | 0.999 |
| (6) School Chrs. | 0.943 | 0.872 | 0.899 | 1.000 |

Note: This table shows the p-values of a test for whether the 27 coefficient estimates of Input_i for each specification are statistically the same as the corresponding ones from Specification (7) in Table 11. Entries in bold are “surviving specifications” for which we cannot reject exogeneity at 10% of significance. Each specification contains different control variables: (1) no controls; (2) child characteristics; (3) mother demographic characteristics; (4) family demographic characteristics; (5) family environmental characteristics; (6) school characteristics. See footnote 28 for a full description of the control variables. All standard errors are corrected for heteroskedasticity.

Table 17: Alternative Specifications: Older Children

| | Alternative Specifications | Math | Vocabulary | Comprehension | Non-cognitive |
|---|---|---|---|---|---|
| (1’) | (7) + more controls | 0.855 | 0.969 | 0.992 | 0.966 |
| (2’) | (7) + lagged inputs | 0.997 | 0.996 | 0.990 | 0.979 |
| (3’) | (7) + lagged skills | 0.798 | 0.452 | 0.791 | 0.826 |
| (4’) | (7) + measurement error controls | 0.534 | 0.660 | 0.597 | 0.995 |
| (5’) | (7), school time as a separate input | 0.998 | 0.999 | 0.958 | 1.000 |
| (6’) | (7) + prop. educational activities | 0.984 | 0.960 | 0.723 | 0.671 |
| (7’) | (7) + prop. general care | 0.973 | 0.723 | 0.988 | 0.995 |
| (8’) | (7) + prop. watching TV | 0.506 | 0.827 | 0.822 | 0.999 |
| (9’) | (7) + prop. at home | 0.781 | 0.920 | 0.907 | 0.498 |
| (10’) | (7) + prop. participation time | 0.863 | 0.629 | 0.873 | 0.874 |
| (11’) | (7) + prop. weekend time | 0.965 | 0.822 | 0.699 | 0.689 |
| (12’) | (7), change age of cohorts | 0.953 | 0.571 | 0.462 | 0.243 |

Note: This table shows the p-values of a test for whether the 15 coefficient estimates of Input_i for each alternative specification are statistically the same as the corresponding ones from Specification (7) in Table 5.
Alternative specifications: (1’) the full list of added controls can be seen in footnote 42; (2’) lagged time inputs of all 15 activities; (3’) lagged skill measures of other types and interactions of any two skills; (4’) full list of added controls can be seen in footnote 45; (5’) 16 time inputs (15 original time inputs plus school activities), whereby the p-value refers to a test of whether the 15 coefficients of the original time inputs are statistically unchanged with respect to specification (7); (6’) proportions of each time input spent in educational activities (e.g., reading): 7 additional covariates; (7’) proportions of each time input spent in general care (i.e., having meals): 7 additional covariates; (8’) proportions of each time input spent watching TV: 7 additional covariates; (9’) proportions of each time input spent at home: 14 additional covariates; (10’) proportions of each time input that partner actually participates in the activity: 12 additional covariates; (11’) proportions of each time input spent during weekends: 15 additional covariates; (12’) the age range of the older cohort is changed to be 13-17 years old.

Table 18: Alternative Specifications: Younger Children

| | Alternative Specifications | Math | Vocabulary | Comprehension | Non-cognitive |
|---|---|---|---|---|---|
| (1’) | (7) + more controls | 0.902 | 0.717 | 0.389 | 0.898 |
| (2’) | (7) + lagged inputs | 0.991 | 0.874 | 0.981 | 0.942 |
| (4’) | (7) + measurement error controls | 0.836 | 0.877 | 0.504 | 0.989 |
| (5’) | (7), school time as a separate input | 1.000 | 1.000 | 1.000 | 0.988 |
| (6’) | (7) + prop. educational activities | 0.918 | 0.934 | 0.922 | 0.842 |
| (7’) | (7) + prop. general care | 0.997 | 0.968 | 0.982 | 1.000 |
| (8’) | (7) + prop. watching TV | 0.999 | 0.999 | 0.999 | 1.000 |
| (9’) | (7) + prop. at home | 0.663 | 0.097 | 0.518 | 0.727 |
| (10’) | (7) + prop. participation time | 0.955 | 0.991 | 0.882 | 0.919 |
| (11’) | (7) + prop. weekend time | 0.976 | 0.874 | 0.906 | 0.942 |
| (12’) | (7), change age of cohorts | 0.954 | 0.554 | 0.492 | 0.833 |

Note: This table shows the p-values of a test for whether the 15 coefficient estimates of Input_i for each alternative specification are statistically the same as the corresponding ones from Specification (7) in Table 7. Alternative specifications: (1’) the full list of added controls can be seen in footnote 42; (2’) lagged time inputs of all 15 activities; (3’) lagged skill measures of other types and interactions of any two skills, which is not implemented for this cohort because of lack of data; (4’) full list of added controls can be seen in footnote 45; (5’) 16 time inputs (15 original time inputs plus school activities), whereby the p-value refers to a test of whether the 15 coefficients of the original time inputs are statistically unchanged with respect to specification (7); (6’) proportions of each time input spent in educational activities (e.g., reading): 7 additional covariates; (7’) proportions of each time input spent in general care (i.e., having meals): 7 additional covariates; (8’) proportions of each time input spent watching TV: 7 additional covariates; (9’) proportions of each time input spent at home: 14 additional covariates; (10’) proportions of each time input that partner actually participates in the activity: 12 additional covariates; (11’) proportions of each time input spent during weekends: 15 additional covariates; (12’) the age range of the younger cohort is changed to be 5-12 years old.
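The B-spline tables (Tables 9 to 12 and 19 to 20) use a linear spline with 2 knots at the 33rd and 67th percentiles of each time input, "whenever possible". A minimal sketch of how such a basis might be built follows. It uses a per-interval slope parameterization, which spans the same continuous piecewise-linear functions as a linear B-spline basis and matches the interval labels in Tables 10 and 12; whether the authors used this exact parameterization is an assumption, and the function name and toy data are hypothetical.

```python
import numpy as np

def linear_spline_segments(x, percentiles=(33, 67)):
    """Split a non-negative time input into piecewise-linear segments.

    Knots are the requested percentiles of x. Knots that coincide with each
    other, with zero, or with the maximum are dropped, so inputs with heavy
    bunching at zero end up with fewer than three segments.
    Each returned column's regression coefficient is the slope on its interval.
    """
    x = np.asarray(x, dtype=float)
    knots = np.unique(np.percentile(x, percentiles))
    knots = knots[(knots > 0.0) & (knots < x.max())]
    edges = np.concatenate(([0.0], knots, [np.inf]))
    segments = [np.clip(x - lo, 0.0, hi - lo) for lo, hi in zip(edges[:-1], edges[1:])]
    return knots, np.column_stack(segments)

# Hypothetical weekly hours: two thirds of children report zero.
hours = np.array([0, 0, 0, 0, 0, 0, 0, 0, 2, 4, 6, 8], dtype=float)
knots, segs = linear_spline_segments(hours)

# Hypothetical input with even more zeros: both percentiles fall at zero.
sparse = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 5], dtype=float)
sparse_knots, sparse_segs = linear_spline_segments(sparse)
```

In the first toy input the 33rd percentile is zero, so only one knot survives and the input gets two segments; in the second, both percentiles are zero and the input enters linearly, as with the rows labelled (0,.) in Tables 10 and 12. The segments always sum back to the original input.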
Table 19: Alternative Specifications: Older Children, B-spline

| | Alternative Specifications | Math | Vocabulary | Comprehension | Non-cognitive |
|---|---|---|---|---|---|
| (1’) | (7) + more controls | 0.940 | 0.997 | 0.991 | 0.999 |
| (2’) | (7) + lagged inputs | 1.000 | 1.000 | 1.000 | 1.000 |
| (3’) | (7) + lagged skills | 0.946 | 0.871 | 0.830 | 0.994 |
| (4’) | (7) + measurement error controls | 0.887 | 0.869 | 0.978 | 0.999 |
| (5’) | (7), school time as a separate input | 0.999 | 1.000 | 0.995 | 1.000 |
| (6’) | (7) + prop. educational activities | 0.999 | 0.990 | 0.937 | 0.989 |
| (7’) | (7) + prop. general care | 1.000 | 0.969 | 1.000 | 1.000 |
| (8’) | (7) + prop. watching TV | 0.923 | 0.970 | 0.990 | 1.000 |
| (9’) | (7) + prop. at home | 0.915 | 0.999 | 1.000 | 0.861 |
| (10’) | (7) + prop. participation time | 0.996 | 0.909 | 0.998 | 0.997 |
| (11’) | (7) + prop. weekend time | 1.000 | 0.997 | 0.974 | 0.847 |
| (12’) | (7), change age of cohorts | 0.908 | 0.224 | 0.795 | 0.313 |

Note: This table shows the p-values of a test for whether the coefficient estimates of Input_i for each alternative specification are statistically the same as the corresponding ones from Specification (7) in Table 9. All specifications in this table are in the form of a linear B-Spline with 2 knots placed at 33rd and 67th percentiles of each time input, whenever possible. Alternative specifications: (1’) the full list of added controls can be seen in footnote 42; (2’) lagged time inputs of all 15 activities; (3’) lagged skill measures of other types and interactions of any two skills; (4’) full list of added controls can be seen in footnote 45; (5’) 30 time inputs (27 original time inputs plus 3 school inputs), whereby the p-value refers to a test of whether the 27 coefficients of the original time inputs are statistically unchanged with respect to specification (7); (6’) proportions of each time input spent in educational activities (e.g., reading): 7 additional covariates; (7’) proportions of each time input spent in general care (i.e., having meals): 7 additional covariates; (8’) proportions of each time input spent watching TV: 7 additional covariates; (9’) proportions of each time input spent at home: 14 additional covariates; (10’) proportions of each time input that partner actually participates in the activity: 12 additional covariates; (11’) proportions of each time input spent during weekends: 15 additional covariates; (12’) the age range of the older cohort is changed to be 13-17 years old.

Table 20: Alternative Specifications: Younger Children, B-spline

| | Alternative Specifications | Math | Vocabulary | Comprehension | Non-cognitive |
|---|---|---|---|---|---|
| (1’) | (7) + more controls | 0.951 | 0.979 | 0.821 | 0.973 |
| (2’) | (7) + lagged inputs | 1.000 | 0.993 | 1.000 | 0.998 |
| (4’) | (7) + measurement error controls | 0.993 | 0.987 | 0.908 | 1.000 |
| (5’) | (7), school time as a separate input | 0.999 | 0.998 | 1.000 | 0.999 |
| (6’) | (7) + prop. educational activities | 1.000 | 1.000 | 0.999 | 0.997 |
| (7’) | (7) + prop. general care | 1.000 | 1.000 | 1.000 | 1.000 |
| (8’) | (7) + prop. watching TV | 1.000 | 1.000 | 1.000 | 1.000 |
| (9’) | (7) + prop. at home | 0.860 | 0.269 | 0.896 | 0.984 |
| (10’) | (7) + prop. participation time | 1.000 | 1.000 | 0.999 | 0.998 |
| (11’) | (7) + prop. weekend time | 1.000 | 0.999 | 0.999 | 1.000 |
| (12’) | (7), change age of cohorts | 0.659 | 0.690 | 0.384 | 0.573 |

Note: This table shows the p-values of a test for whether the coefficient estimates of Input_i for each alternative specification are statistically the same as the corresponding ones from Specification (7) in Table 11. All specifications in this table are in the form of a linear B-Spline with 2 knots placed at 33rd and 67th percentiles of each time input, whenever possible.
Alternative specifications: (1’) the full list of added controls can be seen in footnote 42; (2’) lagged time inputs of all 15 activities; (3’) lagged skill measures of other types and interactions of any two skills, which is not implemented for this cohort because of lack of data; (4’) full list of added controls can be seen in footnote 45; (5’) 30 time inputs (27 original time inputs plus 3 school inputs), whereby the p-value refers to a test of whether the 27 coefficients of the original time inputs are statistically unchanged with respect to specification (7); (6’) proportions of each time input spent in educational activities (e.g., reading): 7 additional covariates; (7’) proportions of each time input spent in general care (i.e., having meals): 7 additional covariates; (8’) proportions of each time input spent watching TV: 7 additional covariates; (9’) proportions of each time input spent at home: 14 additional covariates; (10’) proportions of each time input that partner actually participates in the activity: 12 additional covariates; (11’) proportions of each time input spent during weekends: 15 additional covariates; (12’) the age range of the younger cohort is changed to be 5-12 years old.

Figure 1: Intuition for the Test of Exogeneity
[Two panels with Input_i^j on the horizontal axis. (a) Correlation between Time Input and Child Skill, Unconditional: plots E[Skill_i | Input_i^j = x] and marks E[Skill_i | Input_i^j = 0]. (b) Correlation between Time Input and Child Skill, Conditional on Covariates: plots E[Skill_i | Input_i^j = x, Covariates_i^j] and marks E[Skill_i | Input_i^j = 0, Covariates_i^j].]

Figure 2: Why are Unobservables Discontinuous at Input_i^j = 0?
[One panel: Mother’s Type_i on the vertical axis against Input_i^j on the horizontal axis, with E[Mother’s Type_i | Input_i^j = 0] marked.]

Figure 3: Types of Confounders
[Two panels, each with Confounder_i on the vertical axis against Input_i^j and Input_i^{j*} on the horizontal axis, with E[Confounder_i | Input_i^j = 0] marked: (a) Type (a1) vs. Type (a2); (b) Type (b).]

Note: Input_i^{j*} represents the optimal choice of input j by individual i. Red range: Support of confounder among all observations of sample.
Blue range: Support of confounder among all observations of sample for which Input_i^j = 0. The confounder is of type (a1) if some of its correlation with Skill_i happens for values of the confounder in the blue range; otherwise it is of type (a2).

Figure 4: Evidence of Bunching
[Four panels, each plotting Cumulative Density (vertical axis, 0 to 1) against weekly hours in the activity (horizontal axis): (a) Self Active Time, Older Cohort; (b) Passive Time with Friends, Older Cohort; (c) Active Time with Mother, Older Cohort; (d) Active Time with Father, Younger Cohort.]

Note: Each plot shows the cumulative density function of the time spent in the corresponding activity for the corresponding cohort. The fact that these plots do not cross the vertical axis at the origin is direct evidence of bunching, as it implies the probability density function is discontinuously larger at zero. Time on the horizontal axis is reported in hours per week but measured continuously (in minutes).
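The reading of Figure 4 described in its note can be sketched in a few lines: the height at which an empirical cumulative distribution meets the vertical axis equals the share of observations at exactly zero. The function names and the simulated data below are illustrative assumptions, not the authors' code or the CDS data.

```python
import numpy as np

def mass_at_zero(hours):
    """Share of observations with exactly zero hours."""
    hours = np.asarray(hours, dtype=float)
    if np.any(hours < 0):
        raise ValueError("time inputs cannot be negative")
    return float(np.mean(hours == 0.0))

def empirical_cdf(hours, grid):
    """Empirical CDF of `hours` evaluated at each point of `grid`."""
    hours = np.sort(np.asarray(hours, dtype=float))
    return np.searchsorted(hours, grid, side="right") / hours.size

# Hypothetical data: a latent desired amount of time, censored at zero
# by the non-negativity constraint.
rng = np.random.default_rng(0)
latent = rng.normal(loc=1.0, scale=4.0, size=5000)
observed = np.maximum(latent, 0.0)

share_zero = mass_at_zero(observed)
cdf_at_zero = empirical_cdf(observed, np.array([0.0]))[0]
```

Because the observed input cannot be negative, the CDF evaluated at zero and the share of zeros coincide, which is why a plot that starts above the origin signals bunching.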
Figure 5: Evidence of Power to Detect Endogeneity from Omitted Variables. Panels: (a) Mother’s Level of Education (Years), Younger Children; (b) Number of Books Child Has, Younger Children; (c) Hours Mother Works Per Week, Older Children; (d) Household Income ($1,000s Per Year), Older Children. Horizontal axes: a time input in hours per week (active time with mother; active time with father; “Don’t know or refuse to answer”; passive time with father). Vertical axes: conditional mean of the confounder.
Note: In each plot, the vertical axis shows the mean of a potential confounder conditional on a given level of the time input (i.e., the horizontal-axis variable). The scatter plot represents the observed conditional mean of the confounder (aggregated to the next hour of the time input). At zero time input, we show the 95% confidence interval. The solid curve represents a third-order local polynomial regression of the confounder on the time input, using time input data at the minute level. The shaded region represents the 95% confidence interval for this regression with an out-of-sample prediction at zero minutes. See footnote 21 for more details on the regression and confidence interval.
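The comparison plotted in each panel can be sketched numerically: fit a polynomial of the confounder on the time input using only strictly positive inputs, extrapolate it to zero, and compare the result with the observed mean among observations at zero. The sketch below is a simplification and not the authors' procedure: it uses a global cubic least-squares fit instead of the third-order local polynomial regression described in the note, it omits confidence intervals, and it runs on hypothetical data. All function and variable names are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_polynomial(xs, ys, degree=3):
    """Least-squares polynomial coefficients, lowest order first."""
    k = degree + 1
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def gap_at_zero(inputs, confounder, degree=3):
    """Observed mean at zero input minus the value extrapolated from positive inputs."""
    at_zero = [c for x, c in zip(inputs, confounder) if x == 0]
    positive = [(x, c) for x, c in zip(inputs, confounder) if x > 0]
    if not at_zero or len(positive) <= degree:
        raise ValueError("need observations at zero and enough positive inputs")
    coefs = fit_polynomial([x for x, _ in positive], [c for _, c in positive], degree)
    predicted_at_zero = coefs[0]  # a polynomial evaluated at 0 is its intercept
    return sum(at_zero) / len(at_zero) - predicted_at_zero


# Hypothetical data: five children with zero input and a confounder mean of 10,
# plus children with positive input whose confounder follows 12 + 0.1 * hours.
positive_inputs = [0.5 * k for k in range(1, 21)]  # 0.5, 1.0, ..., 10.0 hours
inputs = [0.0] * 5 + positive_inputs
confounder = [10.0] * 5 + [12.0 + 0.1 * x for x in positive_inputs]

gap = gap_at_zero(inputs, confounder)
print(f"Observed mean at zero minus extrapolated value: {gap:.3f}")
```

A gap that is large relative to its sampling uncertainty is the kind of discontinuity at zero that the panels are designed to display. Assessing that uncertainty requires the confidence intervals that this sketch leaves out.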
Figure 6: Evidence of Power to Detect Endogeneity from Simultaneity. Panels: (a) Child’s Birth Weight (Pounds), Older Children; (b) Child is White, Older Children; (c) Child’s Age (Months), Younger Children; (d) Child’s Height (Inches), Younger Children. Horizontal axes (hours per week): passive time with father in panels (a) and (b); active time with friends in panels (c) and (d). Vertical axes: conditional mean of the confounder.
Note: In each plot, the vertical axis shows the mean of a potential confounder conditional on a given level of the time input (i.e., the horizontal-axis variable). The scatter plot represents the observed conditional mean of the confounder (aggregated to the next hour of the time input). At zero time input, we show the 95% confidence interval. The solid curve represents a third-order local polynomial regression of the confounder on the time input, using time input data at the minute level. The shaded region represents the 95% confidence interval for this regression with an out-of-sample prediction at zero minutes. See footnote 21 for more details on the regression and confidence interval.
Figure 7: Evidence of Power to Detect Endogeneity from Measurement Error. Panels: (a) Weekend Diary was Completed Without Help, Younger Children; (b) Child Completed Weekday Diary (With or Without Help), Older Children. Horizontal axes: a time input in hours per week (active time with mother; passive time with others). Vertical axes: conditional mean of the confounder.
Note: In each plot, the vertical axis shows the mean of a potential confounder conditional on a given level of the time input (i.e., the horizontal-axis variable). The scatter plot represents the observed conditional mean of the confounder (aggregated to the next hour of the time input). At zero time input, we show the 95% confidence interval. The solid curve represents a third-order local polynomial regression of the confounder on the time input, using time input data at the minute level. The shaded region represents the 95% confidence interval for this regression with an out-of-sample prediction at zero minutes. See footnote 21 for more details on the regression and confidence interval.

Figure 8: Evidence of Power to Detect Endogeneity from Over-Aggregation of Inputs. Panels: (a) Proportion of Active Time with Father Spent at Home, Younger Children; (b) Proportion of Passive Time with Friends Watching TV, Older Children. Horizontal axes: a time input in hours per week (passive time with father; active time with friends). Vertical axes: conditional mean of the confounder.
Note: In each plot, the vertical axis shows the mean of a potential confounder conditional on a given level of the time input (i.e., the horizontal-axis variable). The scatter plot represents the observed conditional mean of the confounder (aggregated to the next hour of the time input). At zero time input, we show the 95% confidence interval.
The solid curve represents a third-order local polynomial regression of the confounder on the time input, using time input data at the minute level. The shaded region represents the 95% confidence interval for this regression with an out-of-sample prediction at zero minutes. See footnote 21 for more details on the regression and confidence interval.

Figure 9: Evidence of Power to Detect Endogeneity from Non-Linear Effects. Panels: (a) Active Time with Grandparents (1st Moment), Younger Children; (b) Active Time with Grandparents (Distribution), Younger Children. Panel (a): horizontal axis is active time with mother (hours per week), vertical axis is the conditional mean. Panel (b): horizontal axis is active time with grandparents (hours per week), vertical axis is the cumulative probability, with separate curves for active time with mother equal to 0, 1, 2, and 3.
Note: In the right-hand plot, we show the cumulative distribution function of the confounder for selected values of the time input (in hours), for the confounder and time input shown in the left-hand plot. In the left-hand plot, the vertical axis shows the mean of a potential confounder conditional on a given level of the time input (i.e., the horizontal-axis variable). The scatter plot represents the observed conditional mean of the confounder (aggregated to the next hour of the time input). At zero time input, we show the 95% confidence interval. The solid curve represents a third-order local polynomial regression of the confounder on the time input, using time input data at the minute level. The shaded region represents the 95% confidence interval for this regression with an out-of-sample prediction at zero minutes. See footnote 21 for more details on the regression and confidence interval.
Appendix

Table 21: Non-cognitive Skills Loading Factors

| Item | Younger Cohort | Older Cohort |
|---|---|---|
| Cheats or tells lies | 0.4988 | 0.5272 |
| Bullies or mean to others | 0.5519 | 0.5513 |
| Feels no sorry after misbehaving | 0.4300 | 0.4729 |
| Breaks things on purpose | 0.4874 | 0.4803 |
| Has sudden changes in mood | 0.5532 | 0.5647 |
| Feels no love | 0.4679 | 0.5351 |
| Too fearful or anxious | 0.4455 | 0.4776 |
| Feels worthless or inferior | 0.4894 | 0.5751 |
| Sad or depressed | 0.5354 | 0.6158 |
| Cries too much | 0.4139 | 0.3418 |
| Easily confused | 0.5152 | 0.5314 |
| Has obsessions | 0.4735 | 0.5828 |
| Rather high strung, tense and nervous | 0.5248 | 0.5353 |
| Argues too much | 0.5822 | 0.5771 |
| Disobedient | 0.5439 | 0.5872 |
| Stubborn, sullen, or irritable | 0.6125 | 0.6310 |
| Has a very strong temper | 0.6117 | 0.6362 |
| Has difficulty concentrating | 0.5895 | 0.5874 |
| Impulsive, or acts without thinking | 0.5933 | 0.6395 |
| Restless or overly active | 0.5370 | 0.5324 |
| Has trouble getting along with other children | 0.5942 | 0.6106 |
| Not liked by other children | 0.4502 | 0.4747 |
| Withdrawn, does not get involved with others | 0.4028 | 0.4533 |
| Clings to adults | 0.3216 | 0.2812 |
| Demands a lot of attention | 0.5431 | 0.5203 |
| Too dependent on others | 0.4666 | 0.4593 |
| Thinks before acting, not impulsive | 0.4883 | 0.5418 |
| Generally well behaved, does what adults request | 0.5719 | 0.5788 |
| Can get over being upset quickly | 0.4433 | 0.4623 |
| Waits turn in games and other activities | 0.5142 | 0.4801 |
| Gets along well with other children | 0.6195 | 0.6168 |
| Admired by other children | 0.5779 | 0.5424 |
| Cheerful, happy | 0.4721 | 0.5283 |
| Tries things for himself/herself | 0.3619 | 0.4122 |
| Does neat, careful work | 0.4185 | 0.4201 |
| Curious and exploring, likes new experiences | 0.1932 | 0.2397 |

Note: The larger the factor loading, the larger the conditional correlation between the variable and the factor (i.e., the measure of non-cognitive skills).

Table 22: Exogeneity Test Results: Older Children, Not Value-Added

| Controls | (1) No Controls | (2) Child Chrs. | (3) Mother Demographic Chrs. | (4) Family Demographic Chrs. | (5) Family Environmental Chrs. | (6) Other Environmental Chrs. | (7) School Experience |
|---|---|---|---|---|---|---|---|
| Math: F-stat | 3.406 | 1.214 | 1.077 | 0.919 | 0.943 | 0.834 | 0.838 |
| Math: p-value | 0.000 | 0.254 | 0.373 | 0.543 | 0.515 | 0.640 | 0.635 |
| Vocabulary: F-stat | 2.468 | 1.719 | 1.644 | 1.490 | 1.333 | 1.351 | 1.349 |
| Vocabulary: p-value | 0.001 | 0.042 | 0.056 | 0.101 | 0.174 | 0.164 | 0.165 |
| Comprehension: F-stat | 2.009 | 1.681 | 1.466 | 1.385 | 1.283 | 1.205 | 1.300 |
| Comprehension: p-value | 0.012 | 0.049 | 0.110 | 0.146 | 0.205 | 0.261 | 0.194 |
| Non-cognitive: F-stat | 1.722 | 1.281 | 1.208 | 1.326 | 1.220 | 1.175 | 1.169 |
| Non-cognitive: p-value | 0.041 | 0.206 | 0.258 | 0.178 | 0.249 | 0.284 | 0.289 |

Note: “Surviving specifications” are those for which we cannot reject exogeneity at the 10% significance level. Each specification contains different control variables: (1) no controls; (2) child characteristics; (3) mother demographic characteristics; (4) family demographic characteristics; (5) family environmental characteristics; (6) school characteristics; (7) child’s school experience. All standard errors are corrected for heteroskedasticity. See footnote 28 for a full description of the control variables.
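The rule for identifying surviving specifications can be applied directly to the p-values reported in Table 22. The short sketch below does this, reading the rule as "retain a specification when its p-value is at least 0.10". The variable and function names are illustrative.

```python
# p-values from Table 22, ordered by specification (1) through (7).
P_VALUES = {
    "Math": [0.000, 0.254, 0.373, 0.543, 0.515, 0.640, 0.635],
    "Vocabulary": [0.001, 0.042, 0.056, 0.101, 0.174, 0.164, 0.165],
    "Comprehension": [0.012, 0.049, 0.110, 0.146, 0.205, 0.261, 0.194],
    "Non-cognitive": [0.041, 0.206, 0.258, 0.178, 0.249, 0.284, 0.289],
}


def surviving_specifications(p_values, level=0.10):
    """Return the specification numbers for which exogeneity is not rejected."""
    return [spec for spec, p in enumerate(p_values, start=1) if p >= level]


for skill, p_values in P_VALUES.items():
    print(skill, surviving_specifications(p_values))
```

Under this reading, the number of controls needed before exogeneity is no longer rejected differs across skills, which is visible in how early each list starts.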
Table 23: Effects of Children’s Time Allocation: Older Children, Not Value-Added

| | Math | Vocabulary | Comprehension | Non-cognitive |
|---|---|---|---|---|
| Active time with mother | 0.009** (0.004) | 0.004 (0.004) | 0.003 (0.004) | 0.001 (0.004) |
| Passive time with mother | 0.005* (0.003) | 0.002 (0.003) | 0.000 (0.003) | 0.004 (0.003) |
| Active time with father | 0.018** (0.007) | 0.013 (0.008) | 0.010 (0.009) | 0.009 (0.008) |
| Passive time with father | 0.002 (0.005) | 0.001 (0.005) | 0.010** (0.005) | 0.005 (0.006) |
| Active time with grandparents | 0.022* (0.012) | 0.028* (0.016) | 0.036** (0.013) | 0.018 (0.019) |
| Passive time with grandparents | -0.005 (0.005) | -0.004 (0.006) | -0.002 (0.006) | -0.009 (0.008) |
| Active time with siblings | 0.003 (0.007) | -0.003 (0.007) | -0.004 (0.007) | 0.011 (0.008) |
| Passive time with siblings | 0.008* (0.004) | 0.004 (0.004) | 0.004 (0.004) | 0.000 (0.006) |
| Active time with friends | 0.010** (0.004) | 0.008* (0.004) | 0.002 (0.004) | -0.002 (0.005) |
| Passive time with friends | 0.003 (0.004) | -0.002 (0.004) | 0.001 (0.004) | -0.004 (0.005) |
| Self Active time | 0.009** (0.003) | 0.001 (0.003) | 0.004 (0.003) | 0.004 (0.004) |
| Self Passive time | 0.004 (0.003) | 0.002 (0.003) | 0.000 (0.003) | 0.003 (0.004) |
| Active time with others | 0.011 (0.007) | -0.003 (0.006) | 0.002 (0.008) | 0.005 (0.009) |
| Passive time with others | 0.002 (0.004) | -0.005 (0.005) | -0.006 (0.004) | 0.003 (0.006) |
| Don’t know or refuse to answer | 0.005 (0.004) | 0.003 (0.004) | 0.003 (0.004) | -0.001 (0.006) |
| R-Square | 0.478 | 0.422 | 0.436 | 0.134 |
| Observations | 1453 | 1455 | 1453 | 1454 |
| Exogeneity test F-statistic | 0.838 | 1.349 | 1.300 | 1.169 |
| Exogeneity test p-value | 0.635 | 0.165 | 0.194 | 0.289 |

Note: Standard errors corrected for heteroskedasticity are in parentheses. All estimates are for specification (7). * Significant at the 10% level. ** Significant at the 5% level. See footnote 28 for a full description of the control variables.
Figure 10: Further Evidence of Power of Test (1 of 2). Panels: (a) Mother’s Age, Younger Children; (b) Mother’s Age at Child Birth, Younger Children; (c) Mother is Married, Older Children; (d) Child Lives with Biological Parents, Older Children; (e) Child Has Musical Instrument at Home, Older Children; (f) Caregiver Spent on School Supplies Last Year, Older Children. Horizontal axes: a time input in hours per week (active time with father, active time with mother, or passive time with father). Vertical axes: conditional mean of the confounder.
Notes: See the note for Figure 5 in the text.
Figure 11: Further Evidence of Power of Test (2 of 2). Panels: (a) Father’s Level of Education (Years), Younger Children; (b) Neighborhood is Safe at Night (Rating 1-5), Older Children; (c) Child Completed Weekend Diary (With or Without Help), Older Children; (d) Primary Caregiver Completed Weekday Diary, Older Children; (e) Proportion of Active Time with Friends Engaging in Arts and Crafts, Older Children; (f) Proportion of Passive Time with Mother Watching TV, Younger Children. Horizontal axes: a time input in hours per week (active time with mother, “Don’t know or refuse to answer”, passive time with friends, or passive time with father). Vertical axes: conditional mean of the confounder.
Notes: See the note for Figure 5 in the text.