Part 6: MLE for RE Models [ 1/38]
Econometric Analysis of Panel Data
William Greene
Department of Economics
Stern School of Business

Part 6: MLE for RE Models [ 2/38]
The Random Effects Model
y_it = x_it′β + c_i + ε_it, observation for person i at time t
y_i = X_iβ + c_i i + ε_i, T_i observations in group i
    = X_iβ + c_i + ε_i, where c_i = (c_i, c_i, ..., c_i)′
y = Xβ + c + ε, Σ_{i=1}^N T_i observations in the sample
c = (c_1′, c_2′, ..., c_N′)′, a Σ_{i=1}^N T_i by 1 vector (c_i repeated T_i times)
c_i is uncorrelated with x_it for all t: E[c_i | X_i] = 0
E[ε_it | X_i, c_i] = 0

Part 6: MLE for RE Models [ 3/38]
Error Components Model
Generalized regression model:
y_it = x_it′β + ε_it + u_i
E[ε_it | X_i] = 0, E[ε_it² | X_i] = σ_ε²
E[u_i | X_i] = 0, E[u_i² | X_i] = σ_u²
y_i = X_iβ + ε_i + u_i i for the T_i observations in group i, with
Var[ε_i + u_i i] =
  [ σ_ε²+σ_u²   σ_u²        ...   σ_u²      ]
  [ σ_u²        σ_ε²+σ_u²   ...   σ_u²      ]
  [ ...                                      ]
  [ σ_u²        σ_u²        ...   σ_ε²+σ_u² ]

Part 6: MLE for RE Models [ 4/38]
Notation
Var[ε_i + u_i i] = σ_ε² I_{T_i} + σ_u² ii′ = Ω_i, a T_i × T_i matrix.
Var[w | X] = block-diag(Ω_1, Ω_2, ..., Ω_N).
(Note these blocks differ only in their dimension T_i.)

Part 6: MLE for RE Models [ 5/38]
Maximum Likelihood
Assume normality of ε_it and u_i. Treat the T_i joint observations on [(ε_i1, ε_i2, ..., ε_iT_i), u_i] as one T_i-variate observation. The mean vector of ε_i + u_i i is zero and the covariance matrix is Ω_i = σ_ε² I + σ_u² ii′. The joint density for ε_i = (y_i − X_iβ) is
f(ε_i) = (2π)^(−T_i/2) |Ω_i|^(−1/2) exp[−½ (y_i − X_iβ)′ Ω_i⁻¹ (y_i − X_iβ)].
logL = Σ_{i=1}^N logL_i, where
logL_i(β, σ_ε², σ_u²) = −½ [T_i log 2π + log|Ω_i| + (y_i − X_iβ)′ Ω_i⁻¹ (y_i − X_iβ)]
                      = −½ [T_i log 2π + log|Ω_i| + ε_i′ Ω_i⁻¹ ε_i].

Part 6: MLE for RE Models [ 6/38]
MLE Panel Data Algebra (1)
Ω_i⁻¹ = (1/σ_ε²) [ I_{T_i} − σ_u²/(σ_ε² + T_i σ_u²) ii′ ].
So
ε_i′ Ω_i⁻¹ ε_i = (1/σ_ε²) [ ε_i′ε_i − σ_u² (T_i ε̄_i)² / (σ_ε² + T_i σ_u²) ],
where ε̄_i = (1/T_i) i′ε_i.

Part 6: MLE for RE Models [ 7/38]
MLE Panel Data Algebra (1, cont.)
Ω_i = σ_ε² I + σ_u² ii′ = σ_ε² [I + λ² ii′] = σ_ε² A, where λ² = σ_u²/σ_ε².
|Ω_i| = (σ_ε²)^{T_i} Π_{t=1}^{T_i} μ_t, μ_t = a characteristic root of A.
The roots are (real, since A is symmetric) solutions to Ac = μc:
Ac = μc = c + λ² i(i′c), or λ² i(i′c) = (μ − 1)c.
Any vector whose elements sum to zero (i′c = 0) is a characteristic vector that corresponds to root μ = 1. There are T_i − 1 such vectors, so T_i − 1 of the roots are 1.
Suppose i′c ≠ 0. Premultiply by i′ to find λ² i′i (i′c) = (μ − 1)(i′c), i.e., T_i λ² (i′c) = (μ − 1)(i′c). Since i′c ≠ 0, divide by it to obtain the remaining root, μ = 1 + T_i λ².
Therefore |Ω_i| = (σ_ε²)^{T_i} Π_{t=1}^{T_i} μ_t = (σ_ε²)^{T_i} (1 + T_i λ²).

Part 6: MLE for RE Models [ 8/38]
MLE Panel Data Algebra (1, conc.)
logL_i = −½ [T_i log 2π + log|Ω_i| + ε_i′ Ω_i⁻¹ ε_i]
       = −½ [T_i log 2π + T_i log σ_ε² + log(1 + T_i λ²)] − (1/(2σ_ε²)) [ε_i′ε_i − σ_u²(T_i ε̄_i)²/(σ_ε² + T_i σ_u²)].
logL = Σ_{i=1}^N logL_i
     = −½ [(log 2π + log σ_ε²) Σ_{i=1}^N T_i + Σ_{i=1}^N log(1 + T_i λ²)] − (1/(2σ_ε²)) Σ_{i=1}^N [ε_i′ε_i − σ_u²(T_i ε̄_i)²/(σ_ε² + T_i σ_u²)].
Since λ² = σ_u²/σ_ε²,
σ_u²(T_i ε̄_i)²/(σ_ε² + T_i σ_u²) = λ²(T_i ε̄_i)²/(1 + T_i λ²),
so
logL_i = −½ [T_i (log 2π + log σ_ε²) + log(1 + T_i λ²)] − (1/(2σ_ε²)) [ε_i′ε_i − λ²(T_i ε̄_i)²/(1 + T_i λ²)].

Part 6: MLE for RE Models [ 9/38]
Maximizing the Log Likelihood
Difficult: "brute force" + some elegant theoretical results; see Baltagi, pp. 22-23. (Back and forth from GLS to σ_ε² and σ_u².)
Somewhat less difficult and more practical: at any iteration, given estimates of σ_ε² and σ_u², the estimator of β is GLS (of course), so we iterate back and forth between these. See Hsiao, pp. 39-40. A code sketch of the iteration follows this outline.
0. Begin the iterations with, say, FGLS estimates of β, σ_ε², σ_u².
1. Given σ̂²_ε,r and σ̂²_u,r, compute β̂_{r+1} by FGLS(σ̂²_ε,r, σ̂²_u,r).
2. Given β̂_{r+1}, compute σ̂²_ε,r+1 = Σ_{i=1}^N ε̂_{i,r+1}′ M_D^i ε̂_{i,r+1} / Σ_{i=1}^N (T_i − 1), where M_D^i is the within-group deviations matrix.
3. Given β̂_{r+1}, compute σ̂²_u,r+1 from the group means, σ̂²_u,r+1 = (1/N) Σ_{i=1}^N ε̄̂²_{i,r+1}.
4. Return to step 1 and repeat until β̂_{r+1} − β̂_r = 0.
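A minimal sketch of this back-and-forth algorithm in Python (numpy), assuming the data arrive as lists of per-group arrays. The function name and the pooled-OLS starting values are illustrative, not Greene's routine; and because E[ε̄_i²] = σ_u² + σ_ε²/T_i, the step-3 update here subtracts the average of σ̂_ε²/T_i from the mean squared group mean, one common moment-based variant of the slide's formula.

```python
import numpy as np

def iterate_re_mle(y_groups, X_groups, tol=1e-8, max_iter=200):
    """Hsiao-style iteration: GLS for beta given variances, then
    moment updates for the variances given beta. y_groups/X_groups
    are lists of (T_i,) and (T_i, K) arrays, one entry per group."""
    K = X_groups[0].shape[1]
    N = len(y_groups)
    # Step 0 (simplified): pooled OLS start instead of FGLS.
    X_all = np.vstack(X_groups)
    y_all = np.concatenate(y_groups)
    beta = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
    s2_e = s2_u = 1.0
    for _ in range(max_iter):
        # Step 1: GLS, using Omega_i^{-1} = (1/s2_e)[I - w * ii'],
        # w = s2_u/(s2_e + T_i*s2_u).
        XtOX = np.zeros((K, K))
        XtOy = np.zeros(K)
        for y_i, X_i in zip(y_groups, X_groups):
            T_i = len(y_i)
            w = s2_u / (s2_e + T_i * s2_u)
            sx = X_i.sum(axis=0)                     # X_i' i
            sy = y_i.sum()                           # i' y_i
            XtOX += (X_i.T @ X_i - w * np.outer(sx, sx)) / s2_e
            XtOy += (X_i.T @ y_i - w * sx * sy) / s2_e
        beta_new = np.linalg.solve(XtOX, XtOy)
        # Steps 2-3: update the variance components from new residuals.
        within_ss, within_df, mean_sq = 0.0, 0, 0.0
        for y_i, X_i in zip(y_groups, X_groups):
            e_i = y_i - X_i @ beta_new
            within_ss += ((e_i - e_i.mean()) ** 2).sum()   # e_i' M_D e_i
            within_df += len(e_i) - 1
            mean_sq += e_i.mean() ** 2
        s2_e = within_ss / within_df
        # Subtract E[eps_bar^2 | u_i = 0] = s2_e/T_i; keep s2_u > 0.
        s2_u = max(mean_sq / N - s2_e * np.mean([1 / len(y) for y in y_groups]),
                   1e-12)
        # Step 4: stop when beta stops changing.
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, s2_e, s2_u
        beta = beta_new
    return beta, s2_e, s2_u
```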
Part 6: MLE for RE Models [ 10/38]
Direct Maximization of LogL
Simpler: take advantage of the invariance of maximum likelihood estimators to transformations of the parameters. Let
θ = 1/σ_ε², γ = σ_u²/σ_ε², R_i = 1 + T_i γ, Q_i = γ/R_i. Then
logL_i = −½ [θ(ε_i′ε_i − Q_i (T_i ε̄_i)²) + log R_i − T_i log θ + T_i log 2π].
This can be maximized using ordinary optimization methods (not Newton, as suggested by Hsiao). Treat it as a standard nonlinear optimization problem and solve with iterative, gradient methods.

Part 6: MLE for RE Models [ 11/38]

Part 6: MLE for RE Models [ 12/38]

Part 6: MLE for RE Models [ 13/38]
Maximum Simulated Likelihood
Assume ε_it and u_i are normally distributed. Write u_i = σ_u v_i where v_i ~ N[0,1]. Then
y_it = x_it′β + σ_u v_i + ε_it.
If v_i were observed data, all observations would be independent, and
log f(y_it | x_it, v_i) = −½ [log 2π + log σ_ε² + (y_it − x_it′β − σ_u v_i)²/σ_ε²].
Let θ² = 1/σ_ε². The log of the joint density for the T_i observations with common v_i is
logL_i(β, σ_u, θ² | v_i) = Σ_{t=1}^{T_i} −½ [log 2π − log θ² + θ²(y_it − x_it′β − σ_u v_i)²].
The conditional log likelihood for the sample is then
logL(β, σ_u, θ² | v) = Σ_{i=1}^N Σ_{t=1}^{T_i} −½ [log 2π − log θ² + θ²(y_it − x_it′β − σ_u v_i)²].

Part 6: MLE for RE Models [ 14/38]
Likelihood Function for Individual i
The unconditional likelihood is obtained by integrating v_i out of L_i(β, σ_u, θ² | v_i):
L_i(β, σ_u, θ²) = ∫ Π_{t=1}^{T_i} (θ/√(2π)) exp[−(θ²/2)(y_it − x_it′β − σ_u v_i)²] φ(v_i) dv_i
                = E_{v_i}[L_i(β, σ_u, θ² | v_i)].
The integral usually does not have a closed form. (For the normal distribution above, actually, it does. We used that earlier. We ignore that for now.)

Part 6: MLE for RE Models [ 15/38]
Log Likelihood Function
The full log likelihood function that needs to be maximized is
logL = Σ_{i=1}^N logL_i(β, σ_u, θ²)
     = Σ_{i=1}^N log ∫ Π_{t=1}^{T_i} (θ/√(2π)) exp[−(θ²/2)(y_it − x_it′β − σ_u v_i)²] φ(v_i) dv_i
     = Σ_{i=1}^N log E_{v_i}[L_i(β, σ_u, θ² | v_i)].
This is the function to be maximized to obtain the MLE of [β, θ, σ_u].

Part 6: MLE for RE Models [ 16/38]
Computing the Expected LogL
How to compute the integral: first note that φ(v_i) = exp(−v_i²/2)/√(2π), and
E_{v_i}[L_i(β, σ_u, θ² | v_i)] = ∫ Π_{t=1}^{T_i} (θ/√(2π)) exp[−(θ²/2)(y_it − x_it′β − σ_u v_i)²] φ(v_i) dv_i.
(1) Numerical (Gauss-Hermite) quadrature for integrals of this form is remarkably accurate:
∫ e^{−v²} g(v) dv ≈ Σ_{h=1}^H w_h g(a_h).
Example: Hermite quadrature nodes and weights, H = 5:
Nodes: −2.02018, −0.95857, 0.00000, 0.95857, 2.02018
Weights: 0.0199532, 0.393619, 0.945309, 0.393619, 0.0199532
Applications usually use many more points, up to 96, and much more accurate (more digits) representations.

Part 6: MLE for RE Models [ 17/38]
Quadrature
A change of variable is needed to get the integral into the right form. Each term then becomes
L_i,Q = (1/√π) Σ_{h=1}^H w_h Π_{t=1}^{T_i} (θ/√(2π)) exp[−(θ²/2)(y_it − x_it′β − σ_u √2 a_h)²],
and the problem is solved by maximizing
logL_Q = Σ_{i=1}^N log L_i,Q
with respect to β, θ², σ_u. (Maximization will be continued later in the semester.)

Part 6: MLE for RE Models [ 18/38]
Gauss-Hermite Quadrature
∫ Π_{t=1}^{T_i} (θ/√(2π)) exp[−(θ²/2)(y_it − x_it′β − σ_u v_i)²] φ(v_i) dv_i, with φ(v_i) = exp(−v_i²/2)/√(2π).
Make the change of variable a_i = v_i/√2, so v_i = √2 a_i and dv_i = √2 da_i:
= (1/√π) ∫ exp(−a_i²) Π_{t=1}^{T_i} (θ/√(2π)) exp[−(θ²/2)(y_it − x_it′β − σ_u √2 a_i)²] da_i
= (1/√π) ∫ exp(−a_i²) g(a_i) da_i
≈ (1/√π) Σ_{h=1}^H w_h g(a_h).
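A minimal sketch of the quadrature formula above, assuming numpy and the same per-group data layout as the earlier sketch; the function name loglik_gh and the choice H = 32 are illustrative. numpy's hermgauss returns the nodes and weights for the weight function exp(−a²), which is exactly the form produced by the change of variable on this slide.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def loglik_gh(beta, theta, sigma_u, y_groups, X_groups, H=32):
    """Gauss-Hermite approximation to sum_i log L_i,Q:
    L_i,Q = (1/sqrt(pi)) * sum_h w_h * prod_t (theta/sqrt(2*pi))
            * exp[-(theta^2/2)(y_it - x_it'beta - sigma_u*sqrt(2)*a_h)^2]."""
    nodes, weights = hermgauss(H)            # a_h, w_h for weight exp(-a^2)
    logL = 0.0
    for y_i, X_i in zip(y_groups, X_groups):
        resid = y_i - X_i @ beta                                       # (T_i,)
        # Residual at each node: y - x'b - sigma_u * sqrt(2) * a_h.
        e = resid[None, :] - sigma_u * np.sqrt(2.0) * nodes[:, None]   # (H, T_i)
        # log of the conditional normal density, summed over t.
        log_f = np.log(theta) - 0.5 * np.log(2 * np.pi) - 0.5 * (theta * e) ** 2
        L_h = np.exp(log_f.sum(axis=1))      # conditional likelihood at each node
        logL += np.log((weights * L_h).sum() / np.sqrt(np.pi))
    return logL
```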
Part 6: MLE for RE Models [ 19/38]
Simulation
The unconditional log likelihood is an expected value:
logL_i(β, σ_u, θ²) = log ∫ Π_{t=1}^{T_i} (θ/√(2π)) exp[−(θ²/2)(y_it − x_it′β − σ_u v_i)²] φ(v_i) dv_i
                   = log E_{v_i}[L_i(β, σ_u, θ² | v_i)] = log E_v[g(v_i)].
An expected value can be 'estimated' by sampling observations and averaging them:
Ê_v[g(v_i)] = (1/R) Σ_{r=1}^R Π_{t=1}^{T_i} (θ/√(2π)) exp[−(θ²/2)(y_it − x_it′β − σ_u v_ir)²].
The simulated log likelihood function is then
Σ_{i=1}^N log [(1/R) Σ_{r=1}^R Π_{t=1}^{T_i} (θ/√(2π)) exp[−(θ²/2)(y_it − x_it′β − σ_u v_ir)²]].
This is a function of (β, θ², σ_u | y_i, X_i, v_i,1, ..., v_i,R), i = 1, ..., N. The random draws on v_i,r become part of the data, and the function is maximized with respect to the unknown parameters. (A code sketch of this estimator appears after the convergence results below.)

Part 6: MLE for RE Models [ 20/38]
Convergence Results
The target is the expected log likelihood: log E_{v_i}[L(β, θ² | v_i)]. The simulation estimator is based on random sampling from the population of v_i:
LogL_S(β, θ²) = Σ_{i=1}^N log [(1/R) Σ_{r=1}^R Π_{t=1}^{T_i} (θ/√(2π)) exp[−(θ²/2)(y_it − x_it′β − σ_u v_ir)²]].
The essential result is
plim (R→∞) LogL_S(β, θ²) = log E_{v_i}[L(β, θ² | v_i)].
Conditions: (1) general regularity and smoothness of the log likelihood; (2) R increases faster than √N. ('Intelligent draws,' e.g., Halton sequences, make this somewhat ambiguous.)
Result: the maximizer of LogL_S(β, θ²) converges to the maximizer of log E_{v_i}[L(β, θ² | v_i)].
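A sketch of the simulation estimator LogL_S, under the same assumed data layout as the earlier sketches; the function name, R = 500, and the fixed seed are illustrative. Fixing the draws across calls (common random numbers) keeps the simulated objective smooth in the parameters, which matters for the gradient-based optimizers discussed earlier.

```python
import numpy as np

def loglik_msl(beta, theta, sigma_u, y_groups, X_groups, R=500, seed=1234):
    """Simulated log likelihood: average the conditional likelihood over
    R draws v_ir ~ N(0,1) per group, then sum the logs over groups."""
    rng = np.random.default_rng(seed)        # same draws on every call
    logL = 0.0
    for y_i, X_i in zip(y_groups, X_groups):
        v = rng.standard_normal(R)           # v_i1, ..., v_iR for this group
        resid = y_i - X_i @ beta
        e = resid[None, :] - sigma_u * v[:, None]     # (R, T_i)
        log_f = np.log(theta) - 0.5 * np.log(2 * np.pi) - 0.5 * (theta * e) ** 2
        L_r = np.exp(log_f.sum(axis=1))      # conditional likelihoods, one per draw
        logL += np.log(L_r.mean())           # E_v[L_i] estimated by the average
    return logL
```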
Part 6: MLE for RE Models [ 21/38]
MSL vs. ML
(Estimate comparison omitted; the slide notes that .15427² = .023799, linking the estimate of σ_u to the estimate of σ_u².)

Part 6: MLE for RE Models [ 22/38]
Two Level Panel Data
Nested by construction.
Unbalanced panels: no real obstacle to estimation, just some inconvenient algebra. In two-step FGLS of the RE model, an average "1/T" is needed to solve for an estimate of σ_u². What to use?
(1/T)‾ = (1/N) Σ_{i=1}^N (1/T_i) (early NLOGIT)
Q_H = [Π_{i=1}^N (1/T_i)]^{1/N} (Stata)
1/T̄, with T̄ the average group size (TSP, current NLOGIT; do not use this.)

Part 6: MLE for RE Models [ 23/38]
Balanced Nested Panel Data
Z_ijkt = test score for student t, teacher k, school j, district i
L = 2 school districts, i = 1, ..., L
M_i = 3 schools in each district, j = 1, ..., M_i
N_ij = 4 teachers in each school, k = 1, ..., N_ij
T_ijk = 20 students in each class, t = 1, ..., T_ijk
Antweiler, W., "Nested Random Effects Estimation in Unbalanced Panel Data," Journal of Econometrics, 101, 2001, pp. 295-313.

Part 6: MLE for RE Models [ 24/38]
Nested Effects Model
y_ijkt = x_ijkt′β + u_ijk + v_ij + w_i + ε_ijkt
Strict exogeneity; all parts uncorrelated. (The normality assumption is added later.)
Var[u_ijk + v_ij + w_i + ε_ijkt] = σ_u² + σ_v² + σ_w² + σ_ε²
The overall covariance matrix Ω is block diagonal over i, each diagonal block is block diagonal over j, each of these, in turn, is block diagonal over k, and each lowest-level block has the form of the Ω_i we saw earlier.

Part 6: MLE for RE Models [ 25/38]
GLS with Nested Effects
Define
σ_1² = Tσ_u² + σ_ε²
σ_2² = NTσ_v² + Tσ_u² + σ_ε² = σ_1² + NTσ_v²
σ_3² = MNTσ_w² + NTσ_v² + Tσ_u² + σ_ε² = σ_2² + MNTσ_w².
GLS is equivalent to OLS regression of
y_ijkt* = y_ijkt − θ_1 ȳ_ijk. − θ_2 ȳ_ij.. − θ_3 ȳ_i...,
θ_1 = 1 − σ_ε/σ_1, θ_2 = σ_ε/σ_1 − σ_ε/σ_2, θ_3 = σ_ε/σ_2 − σ_ε/σ_3,
on the same transformation of x_ijkt. FGLS estimates are obtained from "three group-wise between estimators and the within estimator for the innermost group."

Part 6: MLE for RE Models [ 26/38]
Unbalanced Nested Data
With unbalanced panels, all of the preceding results fall apart. GLS, FGLS, even fixed effects become analytically intractable. The log likelihood, however, is very tractable. Note a collision of practicality with nonrobustness: normality must be assumed.

Part 6: MLE for RE Models [ 27/38]
Log Likelihood (1)
Define the variance ratios
φ_u = σ_u²/σ_ε², φ_v = σ_v²/σ_ε², φ_w = σ_w²/σ_ε².
Construct
θ_ijk = 1 + T_ijk φ_u,
ψ_ij = 1 + φ_v Σ_{k=1}^{N_ij} T_ijk/θ_ijk,
ω_i = 1 + φ_w Σ_{j=1}^{M_i} (Σ_{k=1}^{N_ij} T_ijk/θ_ijk)/ψ_ij.
Sums of squares, with e_ijkt = y_ijkt − x_ijkt′β:
A_ijk = Σ_{t=1}^{T_ijk} e_ijkt²,
B_ijk = Σ_{t=1}^{T_ijk} e_ijkt, B_ij = Σ_{k=1}^{N_ij} B_ijk/θ_ijk, B_i = Σ_{j=1}^{M_i} B_ij/ψ_ij.

Part 6: MLE for RE Models [ 28/38]
Log Likelihood (2)
With H = the total number of observations,
logL = −½ [ H log(2πσ_ε²)
  + Σ_{i=1}^L { log ω_i − (φ_w/σ_ε²) B_i²/ω_i
    + Σ_{j=1}^{M_i} { log ψ_ij − (φ_v/σ_ε²) B_ij²/ψ_ij
      + Σ_{k=1}^{N_ij} { log θ_ijk + (1/σ_ε²)(A_ijk − φ_u B_ijk²/θ_ijk) } } } ].
(For 3 levels instead of 4, set L = 1 and φ_w = 0.)

Part 6: MLE for RE Models [ 29/38]
Maximizing Log L
Antweiler provides analytic first derivatives for gradient methods of optimization. They are ugly to program. Numerical derivatives: let δ be the full vector of the K+4 parameters, and let ι_r be a perturbation vector with ε_r = max(ε⁰, ε¹|δ_r|) in the rth position and zero in the other K+3 positions. Then
∂logL/∂δ_r ≈ [logL(δ + ι_r) − logL(δ − ι_r)] / (2ε_r).
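A sketch of the central-difference rule above in Python; the function name and the default step constants ε⁰ = ε¹ = 1e-5 are illustrative, and loglik stands for any routine that evaluates the nested-effects log likelihood at a full (K+4)-vector δ.

```python
import numpy as np

def numerical_gradient(loglik, delta, eps0=1e-5, eps1=1e-5):
    """Central differences: perturb one parameter at a time by
    max(eps0, eps1*|delta_r|), as in the rule on the slide."""
    g = np.zeros_like(delta)
    for r in range(len(delta)):
        step = max(eps0, eps1 * abs(delta[r]))
        d_plus, d_minus = delta.copy(), delta.copy()
        d_plus[r] += step                    # delta + iota_r
        d_minus[r] -= step                   # delta - iota_r
        g[r] = (loglik(d_plus) - loglik(d_minus)) / (2 * step)
    return g
```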
Part 6: MLE for RE Models [ 30/38]
Asymptotic Covariance Matrix
"Even with an analytic gradient, however, the Hessian matrix, Ψ, is typically obtained through numeric approximation methods." Read: "the second derivatives are too complicated to derive, much less program." Also, since logL is not a sum of individual terms, the BHHH estimator is not usable. Numerical second derivatives were used.

Part 6: MLE for RE Models [ 31/38]
An Appropriate Asymptotic Covariance Matrix
The expected Hessian is block diagonal, so we can isolate β:
−∂²logL/∂β∂β′ = (1/σ_ε²) [ Σ_{i=1}^L Σ_{j=1}^{M_i} Σ_{k=1}^{N_ij} Σ_{t=1}^{T_ijk} x_ijkt x_ijkt′
  − φ_u Σ_{i=1}^L Σ_{j=1}^{M_i} Σ_{k=1}^{N_ij} (1/θ_ijk) (Σ_{t=1}^{T_ijk} x_ijkt)(Σ_{t=1}^{T_ijk} x_ijkt)′
  − φ_v Σ_{i=1}^L Σ_{j=1}^{M_i} (1/ψ_ij) (Σ_{k=1}^{N_ij} (1/θ_ijk) Σ_{t=1}^{T_ijk} x_ijkt)(Σ_{k=1}^{N_ij} (1/θ_ijk) Σ_{t=1}^{T_ijk} x_ijkt)′
  − φ_w Σ_{i=1}^L (1/ω_i) (Σ_{j=1}^{M_i} (1/ψ_ij) Σ_{k=1}^{N_ij} (1/θ_ijk) Σ_{t=1}^{T_ijk} x_ijkt)(Σ_{j=1}^{M_i} (1/ψ_ij) Σ_{k=1}^{N_ij} (1/θ_ijk) Σ_{t=1}^{T_ijk} x_ijkt)′ ].
The inverse of this matrix, evaluated at the MLEs, provides the estimated asymptotic covariance matrix for β̂. Standard errors for the estimated variance components are not needed.

Part 6: MLE for RE Models [ 32/38]
Some Observations
Assuming the wrong (e.g., nonnested) error structure:
  Still consistent: GLS with the wrong weights.
  Standard errors (apparently) biased downward (the Moulton bias).
Adding "time" effects or other nonnested effects is "very challenging." Perhaps do it with "fixed" effects (dummy variables).

Part 6: MLE for RE Models [ 33/38]
An Application
y_jkt = log of the atmospheric sulfur dioxide concentration at observation station k at time t, in country j.
H = 2621 observations; 293 stations; 44 countries; various numbers of observations, not equally spaced.
Three levels here, not 4 as in the article.
x_jkt = 1, log(GDP/km²), log(K/L), log(Income), Suburban, Rural, Communist, log(Oil price), average temperature, time trend.

Part 6: MLE for RE Models [ 34/38]
Estimates (t ratios in parentheses; C = varies across countries, S = across stations, T = over time)

Variable (varies by)              Random Effects     Nested Effects
x1  Constant                     -10.787 (12.03)    -7.103 (5.613)
x2  log(GDP/km²)   (C,S,T)         0.445 (7.921)     0.202 (2.531)
x3  log(K/L)       (C,T)           0.255 (1.999)     0.371 (2.345)
x4  log(Income)    (C,T)          -0.714 (5.005)    -0.477 (2.620)
x5  Suburban       (C,S,T)        -0.627 (3.685)    -0.720 (4.531)
x6  Rural          (C,S,T)        -0.834 (2.181)    -1.061 (3.439)
x7  Communist      (C)             0.471 (2.241)     0.613 (1.443)
x8  log(Oil price) (T)            -0.831 (2.267)    -0.089 (2.410)
x9  Avg. temperature (C,S,T)      -0.045 (4.299)    -0.044 (3.719)
x10 Time trend     (T)            -0.043 (1.666)    -0.046 (10.927)
σ_ε                                0.330             0.329
σ_u                                1.807             1.017
σ_v                                -                 1.347
logL                           -2645.4           -2606.0

Part 6: MLE for RE Models [ 35/38]
Rotating Panel-1
The structure of the sample and the selection of individuals in a rotating sampling design are as follows. Let all individuals in the population be numbered consecutively. The sample in period 1 consists of N_1 individuals. In period 2, a fraction, m_e2 (0 < m_e2 < N_1), of the sample in period 1 is replaced by m_i2 new individuals from the population. In period 3, another fraction of the sample in period 2, m_e3 (0 < m_e3 < N_2) individuals, is replaced by m_i3 new individuals, and so on. Thus the sample size in period t is N_t = N_{t-1} − m_et + m_it. The procedure of dropping m_et individuals selected in period t − 1 and replacing them by m_it individuals from the population in period t is called rotating sampling. In this framework, the total number of observations and the number of individuals observed are Σ_t N_t and N_1 + Σ_{t=2}^T m_it, respectively.
Heshmati, A., "Efficiency measurement in rotating panel data," Applied Economics, 30, 1998, pp. 919-930.

Part 6: MLE for RE Models [ 36/38]
Rotating Panel-2
The outcome of the rotating sample for farms producing dairy products is given in Table 1 [of Heshmati]. Each annual sample is composed of four parts or subsamples. For example, in 1980 the sample contains 79, 62, 98, and 74 farms. The first three parts (79, 62, and 98) are those not replaced during the transition from 1979 to 1980. The last subsample contains 74 newly included farms from the population. At the same time, 85 farms are excluded from the sample in 1979. The difference between the excluded part (85) and the included part (74) corresponds to the change in the rotating sample size between these two periods, i.e., 313 − 324 = −11. This difference includes only the part of the sample where each farm is observed consecutively for four years, N_rot. The difference in the non-rotating part, N_non, is due to those farms which are not observed consecutively. The proportion of farms not observed consecutively, N_non, in the total annual sample varies from 11.2 to 22.6% with an average of 18.7 per cent.

Part 6: MLE for RE Models [ 37/38]
Rotating Panels-3
Simply an unbalanced panel; treat it with the familiar techniques.
Time effects may be complicated.
The accounting is complicated. Biorn and Jansen (Scand. J. of Economics, 1983) study households in which cohort 1 has T = 1976, 1977 while cohort 2 has T = 1977, 1978. But, ...
"Time in sample bias" may require special treatment. The Mexican labor survey has a 3-period rotation; some families appear in 1, 2, or 3 periods.

Part 6: MLE for RE Models [ 38/38]
Pseudo Panels
T different cross sections:
y_{i(t),t} = x_{i(t),t}′β + u_{i(t)} + ε_{i(t),t}, i(t) = 1, ..., N(t); t = 1, ..., T.
These are Σ_{t=1}^T N(t) independent observations. Define C cohorts, e.g., those born 1950-1955, and average within cohort-period cells:
ȳ_{c,t} = x̄_{c,t}′β + ū_{c,t} + ε̄_{c,t}, c = 1, ..., C; t = 1, ..., T.
Cohort sizes are N_c(t); assume they are large. Then ū_{c,t} ≈ ū_c for each cohort. This creates a fixed effects model:
ȳ_{c,t} = x̄_{c,t}′β + ū_c + ε̄_{c,t}, c = 1, ..., C; t = 1, ..., T.
(See Baltagi 10.3 for issues relating to measurement error.)
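A minimal sketch of the cohort averaging step in Python (pandas), with illustrative column names; it collapses the repeated cross sections into the C × T cell means on which the fixed effects regression above is run.

```python
import pandas as pd

def make_pseudo_panel(df, cohort="cohort", period="year", cols=("y", "x1", "x2")):
    """Average y and the x's within each cohort-period cell. The cell
    means form the pseudo panel; df has one row per individual
    observation, with a cohort identifier already assigned."""
    g = df.groupby([cohort, period])
    cells = g[list(cols)].mean()
    cells["n_ct"] = g.size()   # N_c(t): should be large for u_bar_{c,t} ~ u_bar_c
    return cells.reset_index()
```

The resulting frame is then fit by the within (fixed effects) estimator, treating the cohort as the "individual."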