MEMORANDUM No 20/2010

ISSN: 0809-8786
Department of Economics, University of Oslo

This series is published by the University of Oslo, Department of Economics, P.O. Box 1095 Blindern, N-0317 Oslo, Norway. Telephone: +47 22855127. Fax: +47 22855035. Internet: http://www.sv.uio.no/econ. E-mail: econdep@econ.uio.no

In co-operation with The Frisch Centre for Economic Research, Gaustadalléen 21, N-0371 Oslo, Norway. Telephone: +47 22 95 88 20. Fax: +47 22 95 88 25. Internet: http://www.frisch.uio.no. E-mail: frisch@frisch.uio.no

Previous issues of the memo-series are available in PDF format at: http://www.sv.uio.no/econ/forskning/publikasjoner/memorandum

IDENTIFYING TREND AND AGE EFFECTS IN SICKNESS ABSENCE FROM INDIVIDUAL DATA: SOME ECONOMETRIC PROBLEMS*)

ERIK BIØRN
Department of Economics, University of Oslo, P.O.
Box 1095 Blindern, 0317 Oslo, Norway
E-mail: erik.biorn@econ.uio.no

Abstract: When using data from individuals who are in the labour force to disentangle the empirical relevance of cohort, age and time effects for sickness absence, the inference may be biased, affected by sorting-out mechanisms. One reason is unobserved heterogeneity potentially affecting both health status and ability to work, which can bias inference because the individuals entering the data set are conditional on being in the labour force. Can this sample selection be adequately handled by attaching unobserved heterogeneity to non-structured fixed effects? In the paper we examine this issue and discuss the econometric setup for identifying time effects in sickness absence from such data. The inference and interpretation problem is caused, on the one hand, by the occurrence of time, cohort and age effects also in the labour market participation, on the other hand by correlation between unobserved heterogeneity in health status and in ability to work. We show that running panel data regressions, ordinary or logistic, of sickness absence data on certain covariates, when neglecting this sample selection, is likely to obscure the interpretation of the results, except in certain, not particularly realistic, cases. However, the fixed individual effects approach is more robust in this respect than an approach controlling for fixed cohort effects only.

Keywords: Sickness absence, health-labour interaction, cohort-age-time problem, self-selection, latent heterogeneity, bivariate censoring, truncated binormal distribution, panel data

JEL classification: C23, C25, I38, J22

*) This paper is part of the project “Absenteeism in Norway – Causes, Consequences, and Policy Implications”, financed by the Norwegian Research Council (grant #187924). I am grateful to Knut Røed for comments.
1 Introduction During the last two decades, the rate at which workers have been absent from work due to sickness – absenteeism – has risen in several countries. Norway, for instance, has seen a sharp increase, from around 4–5 per cent of paid hours in the early 1990s to around 6.5 per cent in 2010. This rise has occurred despite general improvements in self-reported health conditions. In a recent paper, Biørn et al. (2010) have, by exploiting individual data on long-term absence spells for virtually all workers in Norway over a 13-year period, addressed this problem empirically, attempting, in particular, to disentangle the empirical relevance of cohort, age and time effects by means of “fixed effect methods”. It is obvious that the data available for a study of this kind are potentially affected by sorting-out mechanisms because the individuals entering the data set are conditional on being in the labour force. It may be questioned whether this sample selection can be adequately treated by handling unobserved heterogeneity through fixed effects, and whether suppressing individual heterogeneity and instead conditioning on cohort or age is likely to accentuate the bias in the estimation of time effects. In this paper we examine this issue and discuss more thoroughly the econometric setup for identifying from such data time effects in sickness absence while, as far as possible, controlling for cohort/age effects and systematic sample selection. The inference and interpretation problems arise, on the one hand, because of the occurrence of time and cohort/age effects also in the labour market participation, on the other hand because unobserved heterogeneity which most likely affects both health status and ability to work. Specifically, the modelling and inference should account for these two latent variables being correlated. 
Running regressions, ordinary or logistic, of sickness absence data on certain regressors, without accounting for this sample selection, is likely to obscure the interpretation of the findings and make it difficult to explain their message to non-specialists.

The content of the paper is organized as follows. In Section 2 a simple basic model is formulated, explaining jointly degree of ability to work and degree of sickness by time, cohort and age, accounting for the exact collinearity of the latter, as well as individual heterogeneity. The modelling of unobserved heterogeneity and its implication for the interpretation of the coefficients are discussed in Section 3. Derived sickness probabilities are discussed in Section 4, where we emphasize the distinction between conditioning on individual effects and conditioning on cohort or age. Next, in Section 5, models treating degree of sickness as an observable quantitative variable are discussed, while in Section 6 models treating it as binary (sick versus non-sick) are considered. In the two latter sections, selection bias problems and ways of coming to grips with them are put in focus. Some concluding remarks follow in Section 7.

2 Notation and basic model: Heterogeneity unmodelled

Let $i$ and $t$ denote individual and time period (year) and let $c_i$ and $a_{it}$ be birth cohort and age. The three variables are collinear, since by definition

$$a_{it} \equiv t - c_i. \tag{2.1}$$

Let $1\{A\} = 1$ if event $A$ is true and $= 0$ if it is untrue, and define

$$w_{it} = 1\{\text{Individual } i \text{ belongs to the labour force at time } t\}, \qquad s_{it} = 1\{\text{Individual } i \text{ is reported sick at time } t\}.$$

Let also

$w^*_{it}$ = Degree of ability to work, individual $i$, time $t$,
$s^*_{it}$ = Degree of sickness, individual $i$, time $t$,

both quantitative and continuous, although not frequently observable in this way.
Regardless of whether $(w^*_{it}, s^*_{it})$ are observable or latent, we postulate that they depend on cohort, time, age, and latent heterogeneity $(\mu^w_i, \mu^s_i)$ as

$$w^*_{it} = \beta_w c_i + \gamma_w t + \delta_w a_{it} + \mu^w_i + \varepsilon^w_{it}, \tag{2.2}$$

$$s^*_{it} = \beta_s c_i + \gamma_s t + \delta_s a_{it} + \mu^s_i + \varepsilon^s_{it}, \tag{2.3}$$

$$\begin{pmatrix}\varepsilon^w_{it}\\ \varepsilon^s_{it}\end{pmatrix}\Big|\,[c_i, t, \mu^w_i, \mu^s_i] \sim \mathrm{IID}\left(\begin{pmatrix}0\\ 0\end{pmatrix}, \begin{pmatrix}\sigma_w^2 & \sigma_{ws}\\ \sigma_{ws} & \sigma_s^2\end{pmatrix}\right) \equiv \mathrm{IID}(0, \Sigma), \tag{2.4}$$

where $\varepsilon^w_{it}$ and $\varepsilon^s_{it}$ are genuine disturbances. Whether or not the covariance matrix $\Sigma$ is diagonal, i.e., whether $\sigma_{ws} = 0$ or $\sigma_{ws} \neq 0$, will be important for the selection bias issue. Ways of modelling the latent individual effects $(\mu^w_i, \mu^s_i)$ and their consequences will be discussed in Section 3.

We treat cohort, year and age as quantitative, but the terms involving these variables in (2.2)–(2.3) and formulae derived from them can easily be replaced by terms in cohort, year and age dummies, if desired. Specifically, we may extend $t, c_i, a_{it}$ to (column) vectors of cohort, time, age dummies, and extend the scalar coefficients $(\beta_w, \gamma_w, \delta_w)$ and $(\beta_s, \gamma_s, \delta_s)$ to (row) vectors of dummy coefficients, paying regard to the definitional relationships between the dummies which correspond to (2.1).

Our primary objective is to identify $\gamma_s$, in combination with $\beta_s$ or $\delta_s$ if possible, while controlling for observed and unobserved heterogeneity. Because cohort, time and age are linearly related, confer (2.1), and the equations under consideration are linear, the dimension of the equations must be reduced accordingly when (2.2) or (2.3) is confronted with data. As a starting point for the empirical modelling we can therefore take either of the following versions of the equations:

$$\begin{aligned} w^*_{it} &= (\beta_w-\delta_w)c_i + (\gamma_w+\delta_w)t + \mu^w_i + \varepsilon^w_{it}\\ &\equiv (\gamma_w+\beta_w)t + (\delta_w-\beta_w)a_{it} + \mu^w_i + \varepsilon^w_{it}\\ &\equiv (\beta_w+\gamma_w)c_i + (\delta_w+\gamma_w)a_{it} + \mu^w_i + \varepsilon^w_{it}, \end{aligned} \tag{2.5}$$

$$\begin{aligned} s^*_{it} &= (\beta_s-\delta_s)c_i + (\gamma_s+\delta_s)t + \mu^s_i + \varepsilon^s_{it}\\ &\equiv (\gamma_s+\beta_s)t + (\delta_s-\beta_s)a_{it} + \mu^s_i + \varepsilon^s_{it}\\ &\equiv (\beta_s+\gamma_s)c_i + (\delta_s+\gamma_s)a_{it} + \mu^s_i + \varepsilon^s_{it}. \end{aligned} \tag{2.6}$$
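The identification problem behind (2.5)–(2.6) can be illustrated numerically. The following is a minimal Python sketch, not part of the original analysis: the helper names `composites` and `linear_index` and all coefficient values are hypothetical. It shows that two different triples $(\beta, \gamma, \delta)$ give the same composite coefficients and the same systematic part, because $a_{it} \equiv t - c_i$.

```python
import math

def composites(beta, gamma, delta):
    """Composite coefficients of the three equivalent forms in (2.5)-(2.6)."""
    return {
        "cohort_time": (beta - delta, gamma + delta),  # coefficients of (c_i, t)
        "time_age": (gamma + beta, delta - beta),      # coefficients of (t, a_it)
        "cohort_age": (beta + gamma, delta + gamma),   # coefficients of (c_i, a_it)
    }

def linear_index(beta, gamma, delta, cohort, year):
    """Systematic part beta*c_i + gamma*t + delta*a_it, with a_it = t - c_i as in (2.1)."""
    age = year - cohort
    return beta * cohort + gamma * year + delta * age

# Hypothetical values, chosen only for illustration.
base = (0.3, 0.5, -0.2)                             # (beta, gamma, delta)
k = 0.7                                             # arbitrary shift
shifted = (base[0] + k, base[1] - k, base[2] + k)   # a different triple

# Both triples imply the same systematic part for any (cohort, year) pair ...
same_index = math.isclose(
    linear_index(*base, cohort=1950, year=2000),
    linear_index(*shifted, cohort=1950, year=2000),
)
# ... and the same composite coefficients in all three reduced forms.
same_composites = all(
    math.isclose(x, y, abs_tol=1e-12)
    for key in composites(*base)
    for x, y in zip(composites(*base)[key], composites(*shifted)[key])
)
print(same_index, same_composites)
```

Only the composites, such as $\gamma + \delta$ and $\beta - \delta$, are therefore recoverable from data on $(c_i, t)$, whatever the estimation method.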
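3 Extensions: Modelling systematic heterogeneity

The latent effects are likely to be correlated with observed regressors, for instance because norms with respect to labour force participation and absenteeism are correlated with cohort. Econometrically, a ‘norm’ is a latent entity, which must be attached to, ‘proxied by’, observable variables to be of relevance. A simple way of formalizing this is

$$\mu^w_i = \alpha_w + \lambda_w c_i + \nu^w_i \equiv \alpha_w + \lambda_w(t-a_{it}) + \nu^w_i, \tag{3.1}$$

$$\mu^s_i = \alpha_s + \lambda_s c_i + \nu^s_i \equiv \alpha_s + \lambda_s(t-a_{it}) + \nu^s_i, \tag{3.2}$$

$$\begin{pmatrix}\nu^w_i\\ \nu^s_i\end{pmatrix}\Big|\,[c_i, t] \sim \mathrm{IID}\left(\begin{pmatrix}0\\ 0\end{pmatrix}, \begin{pmatrix}\omega_w^2 & \omega_{ws}\\ \omega_{ws} & \omega_s^2\end{pmatrix}\right) \equiv \mathrm{IID}(0, \Omega), \tag{3.3}$$

and concurrently modify (2.4) to

$$\begin{pmatrix}\varepsilon^w_{it}\\ \varepsilon^s_{it}\end{pmatrix}\Big|\,[c_i, t, \nu^w_i, \nu^s_i] \sim \mathrm{IID}\left(\begin{pmatrix}0\\ 0\end{pmatrix}, \begin{pmatrix}\sigma_w^2 & \sigma_{ws}\\ \sigma_{ws} & \sigma_s^2\end{pmatrix}\right) \equiv \mathrm{IID}(0, \Sigma). \tag{3.4}$$

Inserting (3.1)–(3.2) in (2.5)–(2.6), we obtain

$$\begin{aligned} w^*_{it} &= \alpha_w + (\beta_w+\lambda_w-\delta_w)c_i + (\gamma_w+\delta_w)t + \nu^w_i + \varepsilon^w_{it}\\ &\equiv \alpha_w + (\gamma_w+\beta_w+\lambda_w)t + (\delta_w-\beta_w-\lambda_w)a_{it} + \nu^w_i + \varepsilon^w_{it}\\ &\equiv \alpha_w + (\beta_w+\lambda_w+\gamma_w)c_i + (\delta_w+\gamma_w)a_{it} + \nu^w_i + \varepsilon^w_{it}, \end{aligned} \tag{3.5}$$

$$\begin{aligned} s^*_{it} &= \alpha_s + (\beta_s+\lambda_s-\delta_s)c_i + (\gamma_s+\delta_s)t + \nu^s_i + \varepsilon^s_{it}\\ &\equiv \alpha_s + (\gamma_s+\beta_s+\lambda_s)t + (\delta_s-\beta_s-\lambda_s)a_{it} + \nu^s_i + \varepsilon^s_{it}\\ &\equiv \alpha_s + (\beta_s+\lambda_s+\gamma_s)c_i + (\delta_s+\gamma_s)a_{it} + \nu^s_i + \varepsilon^s_{it}. \end{aligned} \tag{3.6}$$

This stylized modelling of heterogeneity makes $(\beta_w, \beta_s)$ unidentifiable, as it implies that we in (2.5)–(2.6) must extend $(\beta_w, \beta_s)$ to $(\beta_w+\lambda_w, \beta_s+\lambda_s)$ and replace $(\mu^w_i, \mu^s_i)$ by $(\nu^w_i, \nu^s_i)$. In view of (3.3)–(3.4), the composite disturbances

$$(u^w_{it}, u^s_{it}) = (\nu^w_i+\varepsilon^w_{it},\ \nu^s_i+\varepsilon^s_{it})$$

have a vector error components form with components mutually orthogonal ($\varepsilon^z \perp \nu^z$, $z = w, s$) and orthogonal to both regressors, with standard deviations $(\tau_w, \tau_s) = [(\sigma_w^2+\omega_w^2)^{1/2}, (\sigma_s^2+\omega_s^2)^{1/2}]$, covariance $\tau_{ws} = \sigma_{ws} + \omega_{ws}$ and correlation coefficient $\kappa_{ws} = \tau_{ws}/[\tau_w\tau_s]$.

We will to some extent stick to (3.1)–(3.4) as a way of modelling systematic heterogeneity in the following. However, unobserved heterogeneity may be related also to observable variables other than cohort, some of which are time-varying, reflecting (gradual) changes in ‘norms’ (‘norm drift’); (3.1)–(3.2) may be argued to be too ‘simplistic’.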
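The moments of the composite disturbances defined above, $(\tau_w, \tau_s, \tau_{ws}, \kappa_{ws})$, are simple functions of the elements of $\Sigma$ and $\Omega$. The following minimal Python sketch computes them; the function name `composite_moments` and all numerical values are hypothetical. The second call illustrates one special case: $\kappa_{ws}$ can be zero even when $\sigma_{ws}$ and $\omega_{ws}$ are both non-zero, if they offset each other exactly.

```python
import math

def composite_moments(sigma_w, sigma_s, sigma_ws, omega_w, omega_s, omega_ws):
    """Standard deviations, covariance and correlation of (u^w_it, u^s_it)."""
    tau_w = math.sqrt(sigma_w**2 + omega_w**2)    # tau_w = (sigma_w^2 + omega_w^2)^(1/2)
    tau_s = math.sqrt(sigma_s**2 + omega_s**2)    # tau_s = (sigma_s^2 + omega_s^2)^(1/2)
    tau_ws = sigma_ws + omega_ws                  # covariance of the composite disturbances
    kappa_ws = tau_ws / (tau_w * tau_s)           # correlation coefficient
    return tau_w, tau_s, tau_ws, kappa_ws

# Hypothetical values: both covariances negative.
tau_w, tau_s, tau_ws, kappa_ws = composite_moments(1.0, 1.0, -0.3, 0.5, 0.5, -0.1)

# Hypothetical values: covariances of opposite sign that offset each other exactly.
*_, kappa_offset = composite_moments(1.0, 1.0, -0.3, 1.0, 1.0, 0.3)

print(kappa_ws, kappa_offset)
```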
Consider a variant of (2.2)–(2.3) where the uni-dimensional heterogeneity $(\mu^w_i, \mu^s_i)$ is generalized to two-dimensional heterogeneity $(\mu^w_{it}, \mu^s_{it})$ and (3.1)–(3.2) are extended to

$$\mu^w_{it} = \alpha_w + \lambda_w c_i + \gamma_w^{\ddagger} t + \delta_w^{\ddagger} a_{it} + \nu^w_i + \varepsilon^{\ddagger w}_{it},$$

$$\mu^s_{it} = \alpha_s + \lambda_s c_i + \gamma_s^{\ddagger} t + \delta_s^{\ddagger} a_{it} + \nu^s_i + \varepsilon^{\ddagger s}_{it}.$$

It is easy to show that this essentially implies extending $(\gamma_w, \delta_w, \gamma_s, \delta_s)$ in (3.5)–(3.6) to include $(\gamma_w^{\ddagger}, \delta_w^{\ddagger}, \gamma_s^{\ddagger}, \delta_s^{\ddagger})$, and $(\varepsilon^w_{it}, \varepsilon^s_{it})$ to include $(\varepsilon^{\ddagger w}_{it}, \varepsilon^{\ddagger s}_{it})$, respectively.

Obvious, but important, conclusions so far are:

Conclusion 1: The interpretation of ‘time effect in absenteeism’ depends on which mechanism determines the two kinds of unobserved heterogeneity and whether cohort or age is the other control variable.

Conclusion 2: The time effects in absenteeism obtained from (2.6), with (2.4) assumed, and with heterogeneity accounted for, i.e., $\gamma_s + \delta_s$ or $\gamma_s + \beta_s$, may be a more stable ‘structure’ (the equation has a higher degree of ‘autonomy’) than the time effects according to (3.6), with (3.3)–(3.4) assumed, or extensions of it. The latter, unlike the former, change when the parameters of (3.2) change.

4 Sickness probabilities

4.1 Threshold values for sickness and ability to work

As remarked, $w^*_{it}$ and $s^*_{it}$, in particular the former, may not be observable as continuous variables, while their qualitative counterparts, i.e., whether or not individual $i$ is in the labour force and/or is sick at time $t$, are usually known. Let $\bar{w}, \bar{s}$ be unknown critical threshold values for the two continuous variables determining the status ‘being in the labour force’ and ‘being reported sick’:

$w^*_{it} \geq \bar{w} \implies$ Individual $i$ is observed belonging to the labour force.
$s^*_{it} \geq \bar{s} \implies$ Individual $i$ is observed being declared sick by a doctor.
The work ability threshold $\bar{w}$ may be time invariant or time dependent, in the latter case capturing, inter alia, (worker) ‘norm drift’. The sickness threshold $\bar{s}$ may, likewise, be time invariant or time dependent, in the latter case also capturing (worker) ‘norm drift’ as well as drift in doctors’ norms or attitudes with respect to issuing sickness certificates.

We want to derive expressions for the corresponding sickness probabilities. Let, as a start, $\psi(u, v)$ be the joint density of the standardized disturbances in (2.5)–(2.6), or in (3.5)–(3.6), i.e., $(u, v) = (\varepsilon^w_{it}/\sigma_w, \varepsilon^s_{it}/\sigma_s)$, or $(u, v) = (u^w_{it}/\tau_w, u^s_{it}/\tau_s)$, and define, for arbitrary $a, b$,

$$f(a, b) = P(u > a, v > b) = \int_a^\infty\!\!\int_b^\infty \psi(u, v)\, dv\, du, \tag{4.1}$$

$$g(a, b) = P(v > b \mid u > a) = \frac{P(u > a, v > b)}{P(u > a)} = \frac{f(a, b)}{f(a, -\infty)}. \tag{4.2}$$

In (2.5)–(2.6), while utilizing (3.1)–(3.2), it is convenient to define

$$\begin{aligned} \mu^{w*}_i &= (\beta_w-\delta_w)c_i + \mu^w_i = \alpha_w + (\beta_w+\lambda_w-\delta_w)c_i + \nu^w_i,\\ \mu^{w\dagger}_i &= (\beta_w+\gamma_w)c_i + \mu^w_i = \alpha_w + (\beta_w+\lambda_w+\gamma_w)c_i + \nu^w_i, \end{aligned} \tag{4.3}$$

$$\begin{aligned} \mu^{s*}_i &= (\beta_s-\delta_s)c_i + \mu^s_i = \alpha_s + (\beta_s+\lambda_s-\delta_s)c_i + \nu^s_i,\\ \mu^{s\dagger}_i &= (\beta_s+\gamma_s)c_i + \mu^s_i = \alpha_s + (\beta_s+\lambda_s+\gamma_s)c_i + \nu^s_i. \end{aligned} \tag{4.4}$$

They can be interpreted as representing ‘gross individual heterogeneity’ inclusive of cohort effects. Then (3.5)–(3.6) can be rewritten more simply as

$$w^*_{it} = (\gamma_w+\delta_w)t + \mu^{w*}_i + \varepsilon^w_{it} \equiv (\delta_w+\gamma_w)a_{it} + \mu^{w\dagger}_i + \varepsilon^w_{it}, \tag{4.5}$$

$$s^*_{it} = (\gamma_s+\delta_s)t + \mu^{s*}_i + \varepsilon^s_{it} \equiv (\delta_s+\gamma_s)a_{it} + \mu^{s\dagger}_i + \varepsilon^s_{it}. \tag{4.6}$$

Combining these expressions with (2.5)–(2.6), using the definition of the binary variables in Section 2, we obtain

$$w_{it} = 1 \iff w^*_{it} \geq \bar{w} \iff \varepsilon^w_{it} \geq \bar{w}-(\gamma_w+\delta_w)t-\mu^{w*}_i = \bar{w}-(\gamma_w+\delta_w)a_{it}-\mu^{w\dagger}_i,$$

$$s_{it} = 1 \iff s^*_{it} \geq \bar{s} \iff \varepsilon^s_{it} \geq \bar{s}-(\gamma_s+\delta_s)t-\mu^{s*}_i = \bar{s}-(\gamma_s+\delta_s)a_{it}-\mu^{s\dagger}_i.$$

To simplify notation and put in focus the kind of parameters identifiable from binary response data (confer Section 6), we introduce two sets of rescaled parameters, obtained by normalizing coefficients and thresholds against the relevant disturbance standard deviations.
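The first is related to (2.5)–(2.6), the second to (3.5)–(3.6), giving, respectively, ‘σ-normalized’ parameters:

$$\gamma_{w\sigma} = \frac{\gamma_w+\delta_w}{\sigma_w}, \quad \mu_{iw\sigma} = \frac{\mu^{w*}_i}{\sigma_w}, \quad \mu^{\dagger}_{iw\sigma} = \frac{\mu^{w\dagger}_i}{\sigma_w}, \quad \bar{w}_\sigma = \frac{\bar{w}}{\sigma_w}, \tag{4.7}$$

$$\gamma_{s\sigma} = \frac{\gamma_s+\delta_s}{\sigma_s}, \quad \mu_{is\sigma} = \frac{\mu^{s*}_i}{\sigma_s}, \quad \mu^{\dagger}_{is\sigma} = \frac{\mu^{s\dagger}_i}{\sigma_s}, \quad \bar{s}_\sigma = \frac{\bar{s}}{\sigma_s},$$ 
with the second row numbered (4.8), and ‘τ-normalized’ parameters:

$$\gamma_{w\tau} = \frac{\gamma_w+\delta_w}{\tau_w}, \quad \beta_{w\tau} = \frac{\beta_w+\lambda_w-\delta_w}{\tau_w}, \quad \bar{w}_\tau = \frac{\bar{w}}{\tau_w}, \tag{4.9}$$

$$\gamma_{s\tau} = \frac{\gamma_s+\delta_s}{\tau_s}, \quad \beta_{s\tau} = \frac{\beta_s+\lambda_s-\delta_s}{\tau_s}, \quad \bar{s}_\tau = \frac{\bar{s}}{\tau_s}, \tag{4.10}$$

where, obviously, $(\bar{w}_\tau, \bar{s}_\tau, \gamma_{w\tau}, \gamma_{s\tau})$ are smaller (in absolute value) than $(\bar{w}_\sigma, \bar{s}_\sigma, \gamma_{w\sigma}, \gamma_{s\sigma})$.[fn. 1] We then obtain from (2.5)–(2.6)

$$w_{it} = 1 \iff \frac{\varepsilon^w_{it}}{\sigma_w} \geq \bar{w}_\sigma-\gamma_{w\sigma}t-\mu_{iw\sigma} = \bar{w}_\sigma-\gamma_{w\sigma}a_{it}-\mu^{\dagger}_{iw\sigma}, \tag{4.11}$$

$$s_{it} = 1 \iff \frac{\varepsilon^s_{it}}{\sigma_s} \geq \bar{s}_\sigma-\gamma_{s\sigma}t-\mu_{is\sigma} = \bar{s}_\sigma-\gamma_{s\sigma}a_{it}-\mu^{\dagger}_{is\sigma}, \tag{4.12}$$

and, likewise, from (3.5)–(3.6)

$$w_{it} = 1 \iff \frac{u^w_{it}}{\tau_w} \geq \bar{w}_\tau-\gamma_{w\tau}t-\beta_{w\tau}c_i = \bar{w}_\tau-(\gamma_{w\tau}+\beta_{w\tau})t+\beta_{w\tau}a_{it}, \tag{4.13}$$

$$s_{it} = 1 \iff \frac{u^s_{it}}{\tau_s} \geq \bar{s}_\tau-\gamma_{s\tau}t-\beta_{s\tau}c_i = \bar{s}_\tau-(\gamma_{s\tau}+\beta_{s\tau})t+\beta_{s\tau}a_{it}. \tag{4.14}$$

Footnote 1: Possible smooth ‘norm-drift’ in $\bar{w}$ and $\bar{s}$ could be absorbed into $(\gamma_{w\tau}, \gamma_{s\tau})$ or $(\gamma_{w\sigma}, \gamma_{s\sigma})$.

4.2 Probabilities conditional on individual effects

Conditioning on individual effects, we can, using (4.1)–(4.2), (4.7)–(4.8) and (4.11)–(4.12), express the probability of being sick unconditionally and conditional on being in the labour force, as, respectively,[fn. 2]

$$P(s_{it} = 1; t, \mu_{is\sigma}) = f(-\infty, \bar{s}_\sigma-\gamma_{s\sigma}t-\mu_{is\sigma}) = f(-\infty, \bar{s}_\sigma-\gamma_{s\sigma}a_{it}-\mu^{\dagger}_{is\sigma}), \tag{4.15}$$

$$\begin{aligned} P(s_{it} = 1 \mid w_{it} = 1; t, \mu_{iw\sigma}, \mu_{is\sigma}) &= g(\bar{w}_\sigma-\gamma_{w\sigma}t-\mu_{iw\sigma},\ \bar{s}_\sigma-\gamma_{s\sigma}t-\mu_{is\sigma})\\ &= g(\bar{w}_\sigma-\gamma_{w\sigma}a_{it}-\mu^{\dagger}_{iw\sigma},\ \bar{s}_\sigma-\gamma_{s\sigma}a_{it}-\mu^{\dagger}_{is\sigma}). \end{aligned} \tag{4.16}$$

If $\varepsilon^w_{it}$ and $\varepsilon^s_{it}$ are stochastically independent, then

$$g(\bar{w}_\sigma-\gamma_{w\sigma}t-\mu_{iw\sigma},\ \bar{s}_\sigma-\gamma_{s\sigma}t-\mu_{is\sigma}) \equiv g(-\infty,\ \bar{s}_\sigma-\gamma_{s\sigma}t-\mu_{is\sigma}) \equiv f(-\infty,\ \bar{s}_\sigma-\gamma_{s\sigma}t-\mu_{is\sigma}).$$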
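The probabilities (4.15)–(4.16) can be evaluated numerically once a distribution is chosen. The sketch below assumes a standardized binormal $\psi$ (the paper introduces this assumption only later, in Section 5) and evaluates $f$ and $g$ by a simple midpoint rule. The function names and all parameter values are hypothetical, and the integration bounds and step count are ad hoc choices.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal c.d.f. (also valid for +/- infinity)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def f_joint(a, b, rho, steps=4000, bound=10.0):
    """f(a, b) = P(u > a, v > b) in (4.1) for a standardized binormal (u, v).

    Uses P(v > b | u) = 1 - Phi((b - rho*u)/sqrt(1 - rho^2)) and a midpoint
    rule over u in [max(a, -bound), bound]; the tails beyond are negligible.
    """
    lo = max(a, -bound)
    if lo >= bound:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    h = (bound - lo) / steps
    total = 0.0
    for k in range(steps):
        u = lo + (k + 0.5) * h
        total += phi(u) * (1.0 - Phi((b - rho * u) / scale))
    return total * h

def g_cond(a, b, rho):
    """g(a, b) = P(v > b | u > a) in (4.2)."""
    return f_joint(a, b, rho) / f_joint(a, -math.inf, rho)

def sick_prob_given_work(t, rho, w_bar, gamma_w, mu_w, s_bar, gamma_s, mu_s):
    """Right-hand side of (4.16); all arguments are sigma-normalized."""
    return g_cond(w_bar - gamma_w * t - mu_w, s_bar - gamma_s * t - mu_s, rho)

# Hypothetical sigma-normalized parameter values.
pars = dict(w_bar=-1.0, gamma_w=-0.03, mu_w=0.2, s_bar=1.2, gamma_s=0.05, mu_s=0.1)
p_indep = sick_prob_given_work(5.0, 0.0, **pars)   # independent disturbances
p_neg = sick_prob_given_work(5.0, -0.4, **pars)    # negatively correlated disturbances
print(p_indep, p_neg)
```

With zero correlation the conditional probability coincides with the unconditional one, $1 - \Phi(\cdot)$, as stated above; with negative correlation, conditioning on labour force participation lowers the sickness probability in this example.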
4.3 Probabilities conditional on cohort or on age

Conditioning instead on cohort, or equivalently on age, we can, using (4.1)–(4.2), (4.9)–(4.10) and (4.13)–(4.14), express the probability of being sick unconditionally and conditional on being in the labour force, as, respectively,[fn. 3]

$$P(s_{it} = 1; t, c_i) = f(-\infty, \bar{s}_\tau-\gamma_{s\tau}t-\beta_{s\tau}c_i) = f(-\infty, \bar{s}_\tau-(\gamma_{s\tau}+\beta_{s\tau})t+\beta_{s\tau}a_{it}), \tag{4.17}$$

$$\begin{aligned} P(s_{it} = 1 \mid w_{it} = 1; t, c_i) &= g(\bar{w}_\tau-\gamma_{w\tau}t-\beta_{w\tau}c_i,\ \bar{s}_\tau-\gamma_{s\tau}t-\beta_{s\tau}c_i)\\ &= g(\bar{w}_\tau-(\gamma_{w\tau}+\beta_{w\tau})t+\beta_{w\tau}a_{it},\ \bar{s}_\tau-(\gamma_{s\tau}+\beta_{s\tau})t+\beta_{s\tau}a_{it}). \end{aligned} \tag{4.18}$$

If not only $\varepsilon^w_{it}$ and $\varepsilon^s_{it}$, but also $\nu^w_i$ and $\nu^s_i$ are stochastically independent, then

$$g(\bar{w}_\tau-\gamma_{w\tau}t-\beta_{w\tau}c_i,\ \bar{s}_\tau-\gamma_{s\tau}t-\beta_{s\tau}c_i) \equiv g(-\infty,\ \bar{s}_\tau-\gamma_{s\tau}t-\beta_{s\tau}c_i) \equiv f(-\infty,\ \bar{s}_\tau-\gamma_{s\tau}t-\beta_{s\tau}c_i).$$

5 Models treating sickness as quantitative

In this section, leaving the probability expressions in Section 4, we return to the setup presented in Sections 2 and 3 and consider three models with sickness assumed quantitatively observable, say measured as the number of sickness days per unit of time. All models condition on time or on age; otherwise they differ with respect to the conditioning assumed: the individual effect (Section 5.1), the birth cohort (Section 5.2), the age (Section 5.3). Conditioning on age and on cohort gives, however, models which mirror models where the conditioning is on time and cohort. We assume throughout that the observable variables are $s^*_{it}, w_{it}, t, c_i$.

Footnote 2: Formally, the latter probability is conditional both on being in the labour force, and on unobserved individual-specific heterogeneity in sickness and ability to work.

Footnote 3: Formally, the latter probability is conditional both on being in the labour force, and on the observed cohort to which the individual belongs.

5.1 Conditioning on individual effect

Assume first that $(\mu^w_i, \mu^s_i)$ are treated as fixed effects and, accordingly, that the heterogeneity submodel (3.1)–(3.3) is ‘suspended’.
It follows from (2.4)–(2.6) and (4.6) that only $\gamma_s+\delta_s$ and the composite parameters $\mu^{s*}_i$ defined in (4.4) can be identified. With respect to the sample, we distinguish between two cases:

[A] If the sample were not censored by labour force participation, the sick-leave trend estimated by regressing $s^*_{it}$ linearly on $(t, \mu^{s*}_i)$ would have been $\gamma_s+\delta_s$, since then

$$E(s^*_{it} \mid t, \mu^{s*}_i) = (\gamma_s+\delta_s)t + \mu^{s*}_i. \tag{5.1}$$

[B] If the sample is censored by labour force participation, the sick-leave trend we actually estimate differs from $\gamma_s+\delta_s$. We have

$$E(s^*_{it} \mid w_{it} = 1; t, \mu^{s*}_i) = (\gamma_s+\delta_s)t + \mu^{s*}_i + E(\varepsilon^s_{it} \mid w_{it} = 1; t, \mu^{s*}_i). \tag{5.2}$$

This equation exemplifies a bivariate sample selection model, whose last term accounts for the sample selection; see, e.g., Cameron and Trivedi (2005, Section 16.5.3). This model type is sometimes referred to as Amemiya’s ‘Type 2 Tobit Model’; confer Amemiya (1985, Section 10.7). In the binormal case, where

$$\psi(u, v) = (2\pi)^{-1}(1-\rho^2)^{-\frac{1}{2}} \exp\left\{-\tfrac{1}{2}(u^2 - 2\rho uv + v^2)/(1-\rho^2)\right\},$$

we can express $E(\varepsilon^s_{it} \mid w_{it} = 1; t, \mu^{s*}_i)$ analytically as follows. Letting $\phi(\cdot)$ and $\Phi(\cdot)$ be the univariate normal density and c.d.f., respectively, we get, by exploiting $\phi'(u) = -u\phi(u)$, $E(v \mid u) = \rho u$, and $E(u \mid a \leq u \leq b) = [\phi(a)-\phi(b)]/[\Phi(b)-\Phi(a)]$ [see Johnson, Kotz and Balakrishnan (1994, Section 10.1) or Biørn (2008, Appendix 8A)]:

$$E(v \mid a \leq u \leq b) = \rho\, \frac{\phi(a)-\phi(b)}{\Phi(b)-\Phi(a)}, \tag{5.3}$$

and also that, for any $a$,

$$\lambda(a) \equiv \frac{\phi(a)}{\Phi(a)} \equiv \frac{\phi(-a)}{1-\Phi(-a)}, \tag{5.4}$$

$$\lambda'(a) \equiv -\xi(a) = -\lambda(a)[\lambda(a)+a]. \tag{5.5}$$

Therefore, if $(\varepsilon^w_{it}, \varepsilon^s_{it})$ are binormal, letting $\rho_{ws} = \sigma_{ws}/(\sigma_w\sigma_s)$ and using the ‘σ-normalized’ parameters (4.7)–(4.8), we obtain

$$\begin{aligned} E(\varepsilon^s_{it} \mid w_{it} = 1; t, \mu^{s*}_i) &= E[\varepsilon^s_{it} \mid \varepsilon^w_{it} \geq \bar{w} - (\gamma_w+\delta_w)t-\mu^{w*}_i;\ t, c_i]\\ &= \sigma_s E\!\left(\frac{\varepsilon^s_{it}}{\sigma_s}\,\Big|\, \frac{\varepsilon^w_{it}}{\sigma_w} \geq \bar{w}_\sigma - \gamma_{w\sigma}t-\mu_{iw\sigma};\ t, c_i\right)\\ &= \rho_{ws}\sigma_s \lambda(\gamma_{w\sigma}t - \bar{w}_\sigma + \mu_{iw\sigma}). \end{aligned} \tag{5.6}$$

Inserting (5.6) in (5.2) we obtain

$$\begin{aligned} E(s^*_{it} \mid w_{it} = 1; t, \mu^{s*}_i) &= (\gamma_s+\delta_s)t + \mu^{s*}_i + \rho_{ws}\sigma_s\lambda(\gamma_{w\sigma}t - \bar{w}_\sigma + \mu_{iw\sigma})\\ &= \sigma_s[\gamma_{s\sigma}t + \mu_{is\sigma} + \rho_{ws}\lambda(\gamma_{w\sigma}t - \bar{w}_\sigma + \mu_{iw\sigma})]. \end{aligned} \tag{5.7}$$
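The selection term in (5.6) rests on the binormal result $E(v \mid u \geq a) = \rho\lambda(-a)$, which is (5.3) with $b \to \infty$ combined with (5.4). The following minimal Monte Carlo sketch checks this result under assumed values of $\rho$ and $a$; the function names, the seed and the sample size are illustrative choices, not taken from the paper.

```python
import math
import random

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal c.d.f."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def inv_mills(a):
    """lambda(a) = phi(a)/Phi(a), as in (5.4)."""
    return phi(a) / Phi(a)

def simulated_selected_mean(rho, a, n=200_000, seed=12345):
    """Monte Carlo estimate of E(v | u >= a) for a standardized binormal (u, v)."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    total, count = 0.0, 0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        if u >= a:  # keep only 'labour force participants'
            v = rho * u + scale * rng.gauss(0.0, 1.0)
            total += v
            count += 1
    return total / count

# Hypothetical values; a plays the role of w_bar_sigma - gamma_w_sigma*t - mu_iw_sigma.
rho, a = -0.4, 0.5
analytic = rho * inv_mills(-a)          # rho * lambda(-a), the form used in (5.6)
simulated = simulated_selected_mean(rho, a)
print(analytic, simulated)
```

The two numbers should agree up to simulation noise; multiplying by $\sigma_s$ gives the last term of (5.7).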
Hence, utilizing (5.5), we find that the correct sickness trend, allowing for the systematic censoring, is, in general, non-linear and given by

$$\begin{aligned} \partial E(s^*_{it} \mid w_{it} = 1; t, \mu^{s*}_i)/\partial t &= \gamma_s+\delta_s + \rho_{ws}\sigma_s\, \partial\lambda(\gamma_{w\sigma}t - \bar{w}_\sigma + \mu_{iw\sigma})/\partial t\\ &= \sigma_s[\gamma_{s\sigma} - \gamma_{w\sigma}\rho_{ws}\xi(\gamma_{w\sigma}t - \bar{w}_\sigma + \mu_{iw\sigma})]. \end{aligned} \tag{5.8}$$

If $\rho_{ws} \neq 0$, i.e., if the genuine disturbances in (2.2) and (2.3) are correlated, the sickness trend (5.8) depends on $\sigma_s$, $\gamma_{w\sigma}$, $\bar{w}_\sigma$ and $\mu_{iw\sigma}$. Hence, when $\rho_{ws} \neq 0$, the correct trend will be individual-specific.

What can be said about the sign of the last component in (5.8)? First, (5.5) implies that $\xi(\gamma_{w\sigma}t - \bar{w}_\sigma + \mu_{iw\sigma})$ is positive, since $0 < \xi(\cdot) < 1$ for the normal distribution. Second, assume that some common unspecified factors lead both to absenteeism and drop-out from the labour force and hence $\rho_{ws} < 0$. Third, assume that the trend in inclusion into (exclusion from) the labour market is negative (positive), i.e., $\gamma_{w\sigma} < 0$. Hence, (5.8) most likely implies that

$$\partial E(s^*_{it} \mid w_{it} = 1; t, \mu^{s*}_i)/\partial t < \gamma_s+\delta_s.$$

Conclusion 3: If the sample is censored by labour force participation and $\rho_{ws} \neq 0$, the (theoretical) regression $E(s^*_{it} \mid w_{it} = 1; t, \mu^{s*}_i)$ is, in general, non-linear in $(t, \mu^{s*}_i)$. Its form depends on the coefficients of (2.5) and (2.6) as well as the distribution of $(\varepsilon^w_{it}, \varepsilon^s_{it})$, as expressed by (5.7) in the binormal case. A linear regression of $s^*_{it}$ on $(t, \mu^{s*}_i)$ will result in biased estimation of the composite sickness trend coefficient $\sigma_s\gamma_{s\sigma} = \gamma_s+\delta_s$ and the composite individual effects $\mu^{s*}_i$. If $\rho_{ws} = 0$ the bias disappears: $\partial E(s^*_{it} \mid w_{it} = 1; t, \mu^{s*}_i)/\partial t = \sigma_s\gamma_{s\sigma} = \gamma_s+\delta_s$.

In the latter case, (2.5)–(2.6) form a recursive structure, conditional on the individual effects: first labour market participation is decided, next sickness is determined. Conditional on the individual effects, there are no latent elements bringing feedback from the latter to the former.
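The trend formula (5.8) can be checked against a numerical derivative of (5.7). The sketch below does this in Python under the binormal assumption. The function names and all parameter values are hypothetical; `gs` stands for the composite $\gamma_s+\delta_s$.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal c.d.f."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def inv_mills(a):
    """lambda(a) = phi(a)/Phi(a), see (5.4)."""
    return phi(a) / Phi(a)

def xi(a):
    """xi(a) = lambda(a)[lambda(a) + a] = -lambda'(a), see (5.5)."""
    lam = inv_mills(a)
    return lam * (lam + a)

def cond_mean(t, gs, sigma_s, rho, gw_sigma, w_bar_sigma, mu_w_sigma, mu_s_star):
    """E(s*_it | w_it = 1; t, mu_i^{s*}) as in (5.7)."""
    arg = gw_sigma * t - w_bar_sigma + mu_w_sigma
    return gs * t + mu_s_star + rho * sigma_s * inv_mills(arg)

def cond_trend(t, gs, sigma_s, rho, gw_sigma, w_bar_sigma, mu_w_sigma):
    """Derivative of (5.7) with respect to t, as in (5.8)."""
    arg = gw_sigma * t - w_bar_sigma + mu_w_sigma
    return gs - sigma_s * gw_sigma * rho * xi(arg)

# Hypothetical values: rho_ws < 0 and gamma_w_sigma < 0, as in the sign discussion.
params = dict(gs=0.05, sigma_s=1.5, rho=-0.4, gw_sigma=-0.03,
              w_bar_sigma=-1.0, mu_w_sigma=0.2)
t, h = 5.0, 1e-5
numeric = (cond_mean(t + h, mu_s_star=0.0, **params)
           - cond_mean(t - h, mu_s_star=0.0, **params)) / (2.0 * h)
analytic = cond_trend(t, **params)
print(numeric, analytic, params["gs"])
```

With these assumed signs the censored-sample trend falls short of $\gamma_s+\delta_s$, in line with the inequality stated above.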
5.2 Conditioning on birth-cohort

Assume next that heterogeneity modelled as (3.1)–(3.2) is part of the model, and let $c_i$ be the conditioning variable in addition to $t$. It follows from (3.3)–(3.6) and (4.6) that only $\gamma_s+\delta_s$ and $\beta_s+\lambda_s-\delta_s$ (or one-to-one transformations of them) can be identified. With respect to the sample, we again distinguish between two cases.

[A] If the sample were non-censored, the trend coefficient we would have estimated by regressing $s^*_{it}$ linearly on $(t, c_i)$ would have been $\gamma_s+\delta_s$, since then

$$E(s^*_{it} \mid t, c_i) = \alpha_s + (\gamma_s+\delta_s)t + (\beta_s+\lambda_s-\delta_s)c_i. \tag{5.9}$$

[B] If the sample is censored by labour force participation, the trend coefficient actually estimated by regressing $s^*_{it}$ on $(t, c_i)$ differs from $\gamma_s+\delta_s$. We have

$$E(s^*_{it} \mid w_{it} = 1; t, c_i) = \alpha_s + (\gamma_s+\delta_s)t + (\beta_s+\lambda_s-\delta_s)c_i + E(u^s_{it} \mid w_{it} = 1; t, c_i). \tag{5.10}$$

This equation again exemplifies a bivariate sample selection model, whose last term accounts for the effects of the sample selection on the expected response variable. Now, however, the origin of the selection is the composite disturbance $u^s_{it} = \nu^s_i+\varepsilon^s_{it}$.

Assume in addition that $(\varepsilon^w_{it}, \varepsilon^s_{it})$ and $(\nu^w_i, \nu^s_i)$ are binormal, implying that $(u^w_{it}, u^s_{it})$ are binormal with standard deviations $(\tau_w, \tau_s)$, covariance $\tau_{ws}$ and correlation coefficient $\kappa_{ws}$. From (5.3), (5.4) and (5.10), introducing the ‘τ-normalized’ parameters, (4.9)–(4.10), we then obtain

$$\begin{aligned} E(s^*_{it} \mid w_{it} = 1; t, c_i) &= \alpha_s + (\gamma_s+\delta_s)t + (\beta_s+\lambda_s-\delta_s)c_i + \kappa_{ws}\tau_s\lambda(\gamma_{w\tau}t+\beta_{w\tau}c_i - \bar{w}_\tau)\\ &= \alpha_s + \tau_s[\gamma_{s\tau}t+\beta_{s\tau}c_i + \kappa_{ws}\lambda(\gamma_{w\tau}t+\beta_{w\tau}c_i - \bar{w}_\tau)], \end{aligned} \tag{5.11}$$

where $\kappa_{ws} \neq 0$ if at least one of $\sigma_{ws}$ and $\omega_{ws}$ is non-zero (and the two do not exactly offset each other).
We then find, in a similar way as from (5.7), that the correct trend and the correct cohort effects are, in general, non-linear and given by, respectively,

$$\begin{aligned} \partial E(s^*_{it} \mid w_{it}=1; t, c_i)/\partial t &= \gamma_s+\delta_s + \kappa_{ws}\tau_s\, \partial\lambda(\gamma_{w\tau}t+\beta_{w\tau}c_i - \bar{w}_\tau)/\partial t\\ &= \tau_s[\gamma_{s\tau} - \gamma_{w\tau}\kappa_{ws}\xi(\gamma_{w\tau}t+\beta_{w\tau}c_i - \bar{w}_\tau)], \end{aligned} \tag{5.12}$$

$$\begin{aligned} \partial E(s^*_{it} \mid w_{it}=1; t, c_i)/\partial c_i &= \beta_s+\lambda_s-\delta_s + \kappa_{ws}\tau_s\, \partial\lambda(\gamma_{w\tau}t+\beta_{w\tau}c_i - \bar{w}_\tau)/\partial c_i\\ &= \tau_s[\beta_{s\tau} - \beta_{w\tau}\kappa_{ws}\xi(\gamma_{w\tau}t+\beta_{w\tau}c_i - \bar{w}_\tau)]. \end{aligned} \tag{5.13}$$

If $\kappa_{ws} \neq 0$, both derivatives depend on $\tau_s$, $\gamma_{w\tau}$, $\beta_{w\tau}$ and $\bar{w}_\tau$, which implies that the correct trend is cohort-specific, while the correct cohort effect is time-varying.

What can be said about the sign of the last components in (5.12) and (5.13)? First, (5.5) implies that $\xi(\gamma_{w\tau}t + \beta_{w\tau}c_i - \bar{w}_\tau)$ is positive, since $0 < \xi(\cdot) < 1$ for the normal distribution. Second, assume [1] that some common latent individual-specific factors lead to absenteeism and drop-out from the labour force and hence $\omega_{ws} < 0$, or [2] that some unspecified time-varying factors also lead to absenteeism and drop-out from the labour force and hence $\sigma_{ws} < 0$. Together, [1] or [2] suggests $\kappa_{ws} < 0$. Third, assume that the trend in inclusion into (exclusion from) the labour force is negative (positive), i.e., $\gamma_{w\tau} < 0$. Hence, (5.12) implies that

$$\partial E(s^*_{it} \mid w_{it} = 1; t, c_i)/\partial t < \gamma_s+\delta_s.$$

Conclusion 4: If the sample is censored by labour force participation, the (theoretical) regression $E(s^*_{it} \mid w_{it} = 1; t, c_i)$ is, in general, non-linear in $(t, c_i)$. Its form depends on the coefficients of both (2.2)–(2.3) and (3.1)–(3.2), as well as the distribution of $(u^w_{it}, u^s_{it})$, as expressed by (5.11) in the binormal case. A linear (empirical) regression of $s^*_{it}$ on $(t, c_i)$ will result in biased estimation of the adjusted trend coefficient $\gamma_s+\delta_s$. If both $\sigma_{ws} = 0$ and $\omega_{ws} = 0$ hold, implying $\kappa_{ws} = 0$, the biases disappear: $\partial E(s^*_{it} \mid w_{it} = 1; t, c_i)/\partial t = \tau_s\gamma_{s\tau} = \gamma_s+\delta_s$ and $\partial E(s^*_{it} \mid w_{it} = 1; t, c_i)/\partial c_i = \tau_s\beta_{s\tau} = \beta_s+\lambda_s-\delta_s$.
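In the latter case, (3.5)–(3.6) form a recursive structure, unconditional on the individual effects: first labour market participation is decided, next sickness is determined. Conditional on cohort, but unconditional on the individual effects, there is no feedback from the latter to the former.

5.3 Conditioning on age

Assume again that (3.1)–(3.2) are part of the model, and let $t$ and $a_{it}$ be the conditioning variables. It follows from (3.3)–(3.6) and (4.6) that only $\gamma_s+\beta_s+\lambda_s$ and $\delta_s-\beta_s-\lambda_s$ (or one-to-one transformations of them) can be identified. With respect to the sample, we again distinguish between two cases.

[A] If the sample were non-censored, the trend coefficient we would have estimated by regressing $s^*_{it}$ linearly on $(t, a_{it})$ would have been $\gamma_s+\beta_s+\lambda_s$, since then

$$E(s^*_{it} \mid t, a_{it}) = \alpha_s + (\gamma_s+\beta_s+\lambda_s)t + (\delta_s-\beta_s-\lambda_s)a_{it}. \tag{5.14}$$

[B] If the sample is censored by labour force participation, the trend coefficient actually estimated by regressing $s^*_{it}$ on $(t, a_{it})$ differs from $\gamma_s+\beta_s+\lambda_s$. We have[fn. 4]

$$E(s^*_{it} \mid w_{it}=1; t, a_{it}) = \alpha_s + (\gamma_s+\beta_s+\lambda_s)t + (\delta_s-\beta_s-\lambda_s)a_{it} + E(u^s_{it} \mid w_{it}=1; t, a_{it}). \tag{5.15}$$

From (5.3), (5.4), (5.15) and (4.9), we obtain, in the binormal case,

$$E(s^*_{it} \mid w_{it} = 1; t, a_{it}) = \alpha_s + (\gamma_s+\beta_s+\lambda_s)t + (\delta_s-\beta_s-\lambda_s)a_{it} + \kappa_{ws}\tau_s\lambda[(\gamma_{w\tau}+\beta_{w\tau})t-\beta_{w\tau}a_{it} - \bar{w}_\tau]. \tag{5.16}$$

The correct trend and the correct age effects therefore become[fn. 5]

$$\begin{aligned} \partial E(s^*_{it} \mid w_{it} = 1; t, a_{it})/\partial t &= \gamma_s+\beta_s+\lambda_s + \kappa_{ws}\tau_s\, \partial\lambda[(\gamma_{w\tau}+\beta_{w\tau})t-\beta_{w\tau}a_{it} - \bar{w}_\tau]/\partial t\\ &= \tau_s\big[(\gamma_{s\tau}+\beta_{s\tau}) - (\gamma_{w\tau}+\beta_{w\tau})\kappa_{ws}\xi[(\gamma_{w\tau}+\beta_{w\tau})t-\beta_{w\tau}a_{it} - \bar{w}_\tau]\big], \end{aligned} \tag{5.17}$$

$$\begin{aligned} \partial E(s^*_{it} \mid w_{it} = 1; t, a_{it})/\partial a_{it} &= \delta_s-\beta_s-\lambda_s + \kappa_{ws}\tau_s\, \partial\lambda[(\gamma_{w\tau}+\beta_{w\tau})t-\beta_{w\tau}a_{it} - \bar{w}_\tau]/\partial a_{it}\\ &= \tau_s\big[-\beta_{s\tau} + \beta_{w\tau}\kappa_{ws}\xi[(\gamma_{w\tau}+\beta_{w\tau})t-\beta_{w\tau}a_{it} - \bar{w}_\tau]\big]. \end{aligned} \tag{5.18}$$

If $\kappa_{ws} \neq 0$, both derivatives depend on $\tau_s$, $\gamma_{w\tau}$, $\beta_{w\tau}$ and $\bar{w}_\tau$, which implies that the correct trend is age-specific and the correct age effect is time-varying.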
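The step from (5.16) to (5.17)–(5.18) uses only the chain rule, (5.5) and the definitions (4.9)–(4.10). As a sketch, with $x_{it}$ a shorthand introduced here (not in the original) for the argument of $\lambda$:

```latex
\begin{align*}
x_{it} &\equiv (\gamma_{w\tau}+\beta_{w\tau})t-\beta_{w\tau}a_{it}-\bar{w}_\tau,
  \qquad \lambda'(x_{it}) = -\xi(x_{it}) \quad\text{by (5.5)},\\[4pt]
\frac{\partial \lambda(x_{it})}{\partial t}
  &= -(\gamma_{w\tau}+\beta_{w\tau})\,\xi(x_{it}),
  \qquad
\frac{\partial \lambda(x_{it})}{\partial a_{it}}
  = \beta_{w\tau}\,\xi(x_{it}),\\[4pt]
\gamma_s+\beta_s+\lambda_s
  &= (\gamma_s+\delta_s)+(\beta_s+\lambda_s-\delta_s)
   = \tau_s(\gamma_{s\tau}+\beta_{s\tau}),
  \qquad
\delta_s-\beta_s-\lambda_s = -\tau_s\beta_{s\tau},\\[4pt]
\frac{\partial E(s^*_{it}\mid w_{it}=1;t,a_{it})}{\partial t}
  &= \tau_s(\gamma_{s\tau}+\beta_{s\tau})
   - \kappa_{ws}\tau_s(\gamma_{w\tau}+\beta_{w\tau})\,\xi(x_{it}),\\[4pt]
\frac{\partial E(s^*_{it}\mid w_{it}=1;t,a_{it})}{\partial a_{it}}
  &= -\tau_s\beta_{s\tau}
   + \kappa_{ws}\tau_s\beta_{w\tau}\,\xi(x_{it}).
\end{align*}
```

Factoring out $\tau_s$ in the last two lines gives the bracketed expressions in (5.17) and (5.18).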
Conclusion 5: If the sample is censored by labour force participation, the (theoretical) regression $E(s^*_{it} \mid w_{it} = 1; t, a_{it})$ is, in general, non-linear in $(t, a_{it})$. Its form depends on the coefficients of both (2.2)–(2.3) and (3.1)–(3.2), as well as the distribution of $(u^w_{it}, u^s_{it})$, as given by (5.16) in the binormal case. A linear (empirical) regression of $s^*_{it}$ on $(t, a_{it})$ will result in biased estimation of the actual trend coefficient $\gamma_s + \beta_s + \lambda_s$ and of the adjusted age coefficient $\delta_s - \beta_s - \lambda_s$. If both $\sigma_{ws} = 0$ and $\omega_{ws} = 0$ hold, implying $\kappa_{ws} = 0$, the biases disappear: $\partial E(s^*_{it} \mid w_{it} = 1; t, a_{it})/\partial t = \tau_s(\gamma_{s\tau} + \beta_{s\tau}) = \gamma_s + \beta_s + \lambda_s$ and $\partial E(s^*_{it} \mid w_{it} = 1; t, a_{it})/\partial a_{it} = -\tau_s\beta_{s\tau} = \delta_s - \beta_s - \lambda_s$.

Then (3.5)–(3.6) form a recursive structure, unconditional on the individual effects. Conditional on cohort, but unconditional on the individual effects, there is no feedback from sickness to labour force participation.

Footnote 4: This equation, of course, mirrors (5.10).

Footnote 5: These equations mirror (5.12)–(5.13).

6 Models treating sickness as dichotomously observable

6.1 General remarks

Having explored the situation where the degree of absenteeism, $s^*_{it}$, is assumed to be recorded quantitatively, we next consider models where absenteeism is assumed to be recorded qualitatively (dichotomously). This may sometimes be a more realistic assumption. Even if continuous observations are available, the analyst may want to exploit them only dichotomously for ‘institutional’ reasons, because of measurement problems which may plague the data collection, suggesting a need for ‘robustifying’ the results, etc. This corresponds to the approach of Biørn et al. (2010). With respect to the sample, we distinguish between cases [A] and [B], as in Section 5.

[A] Data for all individuals, whether in the labour force or outside, are in the sample. Then we could want to make inference on trend effects in the sickness probability from (4.15) or (4.17).
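If we base inference on (4.17) when (3.1)–(3.2) are part of the data generating mechanism, using standard binomial logit or probit analysis, and hence conditioning on $c_i$ or $a_{it}$, we would estimate τ-normalized coefficients. If we base inference on (4.15), using binomial logit or probit analysis, and hence conditioning on $\mu_{is\sigma}$, we would estimate σ-normalized coefficients.[fn. 6] Derivatives of the (log-)probabilities, ‘marginal effects’, could be estimated from either.

[B] The sample is only from individuals being in the labour force. Then, to obtain valid inference on trend effects in the sickness probability, we should account for the implicit censoring. Again, we could only obtain inference on τ- or σ-normalized coefficients. Since the relevant sickness-absence probabilities underlying our binary response data are conditional on $w_{it} = 1$, they are of the form (4.16) or (4.18).

When conditioning on $\mu_{is\sigma}$ ($\mu^{\dagger}_{is\sigma}$), we obtain more robust inference on the trend in the sickness probability than when conditioning on $c_i$ ($a_{it}$). To see this we differentiate the relevant expressions for the conditional log-probability of absenteeism with respect to time and the other relevant covariates. Let $\Psi_u(u; b) = \int_b^\infty \psi(u, v)\,dv$ and $\Psi_v(v; a) = \int_a^\infty \psi(u, v)\,du$, and write (4.1) as

$$f(a, b) = \int_a^\infty \Psi_u(u; b)\,du = \int_b^\infty \Psi_v(v; a)\,dv \implies \begin{cases}\partial f(a, b)/\partial a = -\Psi_u(a; b),\\ \partial f(a, b)/\partial b = -\Psi_v(b; a).\end{cases} \tag{6.1}$$

Now differentiation of (4.2) gives

$$\frac{\partial g(a, b)}{\partial a} = -g(a, b)G_a(a, b) \iff \frac{\partial \ln g(a, b)}{\partial a} = -G_a(a, b), \tag{6.2}$$

$$\frac{\partial g(a, b)}{\partial b} = -g(a, b)G_b(a, b) \iff \frac{\partial \ln g(a, b)}{\partial b} = -G_b(a, b), \tag{6.3}$$

where

$$G_a(a, b) = \frac{\Psi_u(a; b)}{f(a, b)} - \frac{\Psi_u(a; -\infty)}{f(a, -\infty)} = \frac{\int_b^\infty \psi(a, v)\,dv}{\int_a^\infty\!\int_b^\infty \psi(u, v)\,dv\,du} - \frac{\int_{-\infty}^\infty \psi(a, v)\,dv}{\int_a^\infty\!\int_{-\infty}^\infty \psi(u, v)\,dv\,du},$$

$$G_b(a, b) = \frac{\Psi_v(b; a)}{f(a, b)} = \frac{\int_a^\infty \psi(u, b)\,du}{\int_a^\infty\!\int_b^\infty \psi(u, v)\,dv\,du}.$$

Footnote 6: In both cases the non-normalized coefficients are non-identifiable when only discrete information is exploited since no metric for $(w^*_{it}, s^*_{it})$ and $(\bar{w}, \bar{s})$ is exploited.

6.2 Conditioning on individual effect

It follows by combining (6.2)–(6.3) with (4.16) that the derivative of the log-probability of absenteeism with respect to time is

$$\begin{aligned} \partial \ln P(s_{it} = 1 \mid w_{it} = 1; t, \mu_{iw\sigma}, \mu_{is\sigma})/\partial t &= \partial \ln g(\bar{w}_\sigma-\gamma_{w\sigma}t-\mu_{iw\sigma},\ \bar{s}_\sigma-\gamma_{s\sigma}t-\mu_{is\sigma})/\partial t\\ &= \gamma_{s\sigma}G_b(\bar{w}_\sigma-\gamma_{w\sigma}t-\mu_{iw\sigma},\ \bar{s}_\sigma-\gamma_{s\sigma}t-\mu_{is\sigma})\\ &\quad + \gamma_{w\sigma}G_a(\bar{w}_\sigma-\gamma_{w\sigma}t-\mu_{iw\sigma},\ \bar{s}_\sigma-\gamma_{s\sigma}t-\mu_{is\sigma}). \end{aligned} \tag{6.4}$$

The first term after the last equality sign represents the direct effect of the trend in absenteeism, mirroring the effect of the trend term in (2.6). It is positive when $\gamma_{s\sigma} > 0$ since $G_b(\bar{w}_\sigma-\gamma_{w\sigma}t-\mu_{iw\sigma}, \bar{s}_\sigma-\gamma_{s\sigma}t-\mu_{is\sigma})$ is positive. The second term represents the indirect effect, via the trend in the ability to work and dropping out of the labour market, mirroring the effect of the trend term in (2.5). It is negative if $\gamma_{w\sigma} < 0$, since $G_a(\bar{w}_\sigma-\gamma_{w\sigma}t-\mu_{iw\sigma}, \bar{s}_\sigma-\gamma_{s\sigma}t-\mu_{is\sigma})$ is, most likely, positive.

6.3 Conditioning on cohort or age

Combining (6.2)–(6.3) with (4.18), it follows, likewise, that

$$\begin{aligned} \partial \ln P(s_{it}=1 \mid w_{it}=1; t, c_i)/\partial t &= \partial \ln g(\bar{w}_\tau-\gamma_{w\tau}t-\beta_{w\tau}c_i,\ \bar{s}_\tau-\gamma_{s\tau}t-\beta_{s\tau}c_i)/\partial t\\ &= \gamma_{s\tau}G_b(\bar{w}_\tau-\gamma_{w\tau}t-\beta_{w\tau}c_i,\ \bar{s}_\tau-\gamma_{s\tau}t-\beta_{s\tau}c_i)\\ &\quad + \gamma_{w\tau}G_a(\bar{w}_\tau-\gamma_{w\tau}t-\beta_{w\tau}c_i,\ \bar{s}_\tau-\gamma_{s\tau}t-\beta_{s\tau}c_i), \end{aligned} \tag{6.5}$$

$$\begin{aligned} \partial \ln P(s_{it}=1 \mid w_{it}=1; t, c_i)/\partial c_i &= \partial \ln g(\bar{w}_\tau-\gamma_{w\tau}t-\beta_{w\tau}c_i,\ \bar{s}_\tau-\gamma_{s\tau}t-\beta_{s\tau}c_i)/\partial c_i\\ &= \beta_{s\tau}G_b(\bar{w}_\tau-\gamma_{w\tau}t-\beta_{w\tau}c_i,\ \bar{s}_\tau-\gamma_{s\tau}t-\beta_{s\tau}c_i)\\ &\quad + \beta_{w\tau}G_a(\bar{w}_\tau-\gamma_{w\tau}t-\beta_{w\tau}c_i,\ \bar{s}_\tau-\gamma_{s\tau}t-\beta_{s\tau}c_i), \end{aligned} \tag{6.6}$$

or equivalently

$$\begin{aligned} \partial \ln P(s_{it} = 1 \mid w_{it}=1; t, a_{it})/\partial t &= \partial \ln g(\bar{w}_\tau-(\gamma_{w\tau}+\beta_{w\tau})t+\beta_{w\tau}a_{it},\ \bar{s}_\tau-(\gamma_{s\tau}+\beta_{s\tau})t+\beta_{s\tau}a_{it})/\partial t\\ &= (\gamma_{s\tau}+\beta_{s\tau})G_b(\bar{w}_\tau-(\gamma_{w\tau}+\beta_{w\tau})t+\beta_{w\tau}a_{it},\ \bar{s}_\tau-(\gamma_{s\tau}+\beta_{s\tau})t+\beta_{s\tau}a_{it})\\ &\quad + (\gamma_{w\tau}+\beta_{w\tau})G_a(\bar{w}_\tau-(\gamma_{w\tau}+\beta_{w\tau})t+\beta_{w\tau}a_{it},\ \bar{s}_\tau-(\gamma_{s\tau}+\beta_{s\tau})t+\beta_{s\tau}a_{it}), \end{aligned} \tag{6.7}$$

$$\begin{aligned} \partial \ln P(s_{it} = 1 \mid w_{it}=1; t, a_{it})/\partial a_{it} &= \partial \ln g(\bar{w}_\tau-(\gamma_{w\tau}+\beta_{w\tau})t+\beta_{w\tau}a_{it},\ \bar{s}_\tau-(\gamma_{s\tau}+\beta_{s\tau})t+\beta_{s\tau}a_{it})/\partial a_{it}\\ &= -\beta_{s\tau}G_b(\bar{w}_\tau-(\gamma_{w\tau}+\beta_{w\tau})t+\beta_{w\tau}a_{it},\ \bar{s}_\tau-(\gamma_{s\tau}+\beta_{s\tau})t+\beta_{s\tau}a_{it})\\ &\quad - \beta_{w\tau}G_a(\bar{w}_\tau-(\gamma_{w\tau}+\beta_{w\tau})t+\beta_{w\tau}a_{it},\ \bar{s}_\tau-(\gamma_{s\tau}+\beta_{s\tau})t+\beta_{s\tau}a_{it}). \end{aligned} \tag{6.8}$$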
Again, in (6.5)–(6.8) the first terms after the last equality signs represent the direct effects, while the second terms represent the indirect effects, via the ability to work and dropping out of the labour market.

6.4 The recursive case and a synthesis

It is illuminating to compare (6.4)–(6.8) with the expressions obtained when the structure is recursive, i.e., $\rho_{ws} = 0$ or $\tau_{ws} = 0$. We then have $g(a, b) = \int_b^\infty \phi(v)\,dv = 1 - \Phi(b)$, which implies:[7]

$$G_a(a, b) = 0, \qquad G_b(a, b) = \frac{\phi(b)}{1 - \Phi(b)} = \frac{\phi(-b)}{\Phi(-b)} = \lambda(-b).$$

[7] If $u$, $v$ are independent, then $\psi(u, v) = \phi(u)\phi(v)$, $f(a, b) = [1 - \Phi(a)][1 - \Phi(b)]$, and $g(a, b) = 1 - \Phi(b)$ for all $a$, where $\phi(u)$ and $\Phi(u)$ are the univariate density and the c.d.f. of $u$ or $v$ [confer Section 4.1]. Then $\Psi_u(u; b) = [1 - \Phi(b)]\phi(u)$, $\Psi_v(v; a) = [1 - \Phi(a)]\phi(v)$, and hence $G_a(a, b) = 0$, $G_b(a, b) = \phi(b)/[1 - \Phi(b)]$.

Then (6.4)–(6.8) are simplified as follows.

$\rho_{ws} = 0$ (recursivity conditional on individual effects) $\Longrightarrow$

$$\frac{\partial \ln P(s_{it} = 1 \mid w_{it} = 1;\, t, \mu_{iw\sigma}, \mu_{is\sigma})}{\partial t} = \frac{\partial \ln P(s_{it} = 1;\, t, \mu_{is\sigma})}{\partial t} = \gamma_{s\sigma}\,\lambda(\gamma_{s\sigma}t + \mu_{is\sigma} - \bar{s}_\sigma).$$

$\tau_{ws} = 0$ (recursivity conditional on cohort) $\Longrightarrow$

$$\frac{\partial \ln P(s_{it} = 1 \mid w_{it} = 1;\, t, c_i)}{\partial t} = \frac{\partial \ln P(s_{it} = 1;\, t, c_i)}{\partial t} = \gamma_{s\tau}\,\lambda(\gamma_{s\tau}t + \beta_{s\tau}c_i - \bar{s}_\tau),$$

$$\frac{\partial \ln P(s_{it} = 1 \mid w_{it} = 1;\, t, c_i)}{\partial c_i} = \frac{\partial \ln P(s_{it} = 1;\, t, c_i)}{\partial c_i} = \beta_{s\tau}\,\lambda(\gamma_{s\tau}t + \beta_{s\tau}c_i - \bar{s}_\tau).$$

$\tau_{ws} = 0$ (recursivity conditional on age) $\Longrightarrow$

$$\frac{\partial \ln P(s_{it} = 1 \mid w_{it} = 1;\, t, a_{it})}{\partial t} = \frac{\partial \ln P(s_{it} = 1;\, t, a_{it})}{\partial t} = (\gamma_{s\tau} + \beta_{s\tau})\,\lambda((\gamma_{s\tau} + \beta_{s\tau})t - \beta_{s\tau}a_{it} - \bar{s}_\tau),$$

$$\frac{\partial \ln P(s_{it} = 1 \mid w_{it} = 1;\, t, a_{it})}{\partial a_{it}} = \frac{\partial \ln P(s_{it} = 1;\, t, a_{it})}{\partial a_{it}} = -\beta_{s\tau}\,\lambda((\gamma_{s\tau} + \beta_{s\tau})t - \beta_{s\tau}a_{it} - \bar{s}_\tau).$$

Conclusion 6: When we condition on (a) the individual latent heterogeneity or (b) cohort (or age) only, we should account for sample truncation when formulating the appropriate response probabilities as functions of covariates and likelihood functions for estimating trends in absenteeism, except when $\sigma_{ws} = 0$ (in case (a)) and $\omega_{ws} = \sigma_{ws} = 0$ (in case (b)). The correct form of the likelihood function will, in the general case, reflect the mixture of discrete choice and sample truncation.

7 Concluding remarks

We have in this paper presented a simple model framework for analyzing jointly degree of sickness and degree of work ability with two kinds of latent heterogeneity interacting, one related to absenteeism (sickness absence), the other related to ability to work. Obtaining valid inference on trend effects has been the main focus of the paper. Sometimes also cohort effects or age effects can be uncovered. We have shown that the correlation pattern of the two kinds of latent heterogeneity is important. Treating the two decisions as recursive may not always be the answer, and neglecting the sample selection may obscure the interpretation of the coefficients estimated.

An overall conclusion, somewhat related to and extending conclusions derived for bivariate ‘Tobit models’ in the literature, is that when we stick to linear regression, the conditions which need to be satisfied for estimated composite trends (time effects) to be unbiased are stronger when the other covariate (conditioning variable) is cohort or age than when we condition on individual effects (and, by implication, eliminate any relationship between individual heterogeneity and cohort). When we condition on individual effects, the genuine disturbances in the underlying sickness equation and work ability equation should be uncorrelated. When we condition on cohort or age, a kind of ‘double recursivity’ should hold: both the genuine disturbances and the latent individual effects in the two equations should be uncorrelated. Inference on sickness absence trends obtained by linear regression with fixed individual effects (additive shifts in the intercept) included may therefore be characterized as more robust than that obtained when including only cohort or age as regressors and throwing all heterogeneity into (gross) disturbances. Essentially, these conclusions also carry over to the case where absenteeism is only observed dichotomously.
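The point that neglecting sample selection may obscure interpretation can be illustrated with a small simulation. The sketch below is a toy under stated assumptions, not the paper's model: it draws a pair of standard normal latent variables `u` (work ability) and `v` (sickness) with an assumed correlation `rho`, treats `u > a` as being in the labour force and `v > b` as absence, and compares the absence share in the full sample with the share among the selected subsample. The function name, thresholds, sample size and seed are all hypothetical.

```python
import math
import random

def absence_shares(a, b, rho, n=200_000, seed=0):
    """Simulate (u, v) standard binormal with correlation rho.

    Returns the share with v > b in the full sample and the share
    with v > b among draws with u > a (the 'labour force' subsample).
    """
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    sick_all = in_force = sick_in_force = 0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        v = rho * u + s * rng.gauss(0.0, 1.0)
        sick = v > b
        sick_all += sick
        if u > a:
            in_force += 1
            sick_in_force += sick
    return sick_all / n, sick_in_force / in_force

# Hypothetical thresholds a = b = 0; compare the recursive case (rho = 0)
# with a negatively correlated case.
for rho in (0.0, -0.5):
    full, selected = absence_shares(a=0.0, b=0.0, rho=rho)
    print(f"rho={rho:+.1f}  full sample: {full:.3f}  labour force only: {selected:.3f}")
```

With `rho = 0` the two shares should agree up to simulation noise, while with a nonzero `rho` the share in the selected subsample departs from the full-sample share, which is the mechanism the truncation-adjusted probabilities are meant to capture.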
Natural, and rather straightforward, extensions, not elaborated in the paper, could be to replace the time, cohort and age variables by corresponding time, cohort and age dummies. Genuine ‘economic regressors’ could also be included, formally as extensions of the models’ intercepts, except that no such regressor could be individual specific, in order to avoid perfect collinearity with the individual effects. Neither could, for a similar reason, time specific regressors be included in models where time dummies replace the continuous time variable.

References

Amemiya, T. (1985): Advanced Econometrics. Cambridge (Ma.): Harvard University Press.
Biørn, E. (2008): Økonometriske emner. En videreføring. Oslo: Unipub.
Biørn, E., Gaure, S., Markussen, S., and Røed, K. (2010): The Rise in Absenteeism: Disentangling the Impacts of Cohort, Age and Time. IZA Discussion Paper No. 5091. IZA, Bonn.
Cameron, A.C. and Trivedi, P.K. (2005): Microeconometrics. Methods and Applications. Cambridge: Cambridge University Press.
Johnson, N.L., Kotz, S., and Balakrishnan, N. (1994): Continuous Univariate Distributions, Volume 1, Second Edition. New York: Wiley.