Econometric Analysis of Panel Data
William Greene, Department of Economics, Stern School of Business

Part 12: Random Parameters Linear Models

Parameter Heterogeneity
(1) Regression model: $y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta}_i + \varepsilon_{it}$
(2) Conditional probability model: $f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol{\beta}_i)$
(3) Heterogeneity: how are the parameters distributed across individuals?
    (a) Discrete: the population contains a mixture of Q types of individuals.
    (b) Continuous: the parameters are part of the stochastic structure of the population.

Agenda
'True' random parameter variation:
    Discrete: latent class
    Continuous: classical and Bayesian

Discrete Parameter Variation: The Latent Class Model
(1) The population is a (finite) mixture of Q types of individuals, q = 1,...,Q, with the Q 'classes' differentiated by $(\boldsymbol{\beta}_q, \sigma_q)$.
    (a) The analyst does not know class memberships ('latent').
    (b) The 'mixing probabilities' (from the point of view of the analyst) are $\pi_1,\dots,\pi_Q$, with $\sum_{q=1}^{Q}\pi_q = 1$.
(2) The conditional density is $P(y_{it} \mid \text{class } q) = f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol{\beta}_q, \sigma_q)$.

Log Likelihood for an LC Model
The conditional density for each observation is $P(y_{it} \mid \mathbf{x}_{it}, \text{class } q) = f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol{\beta}_q)$.
The joint conditional density for the $T_i$ observations is
$f(y_{i1}, y_{i2}, \dots, y_{iT_i} \mid \mathbf{X}_i, \boldsymbol{\beta}_q) = \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol{\beta}_q)$.
($T_i$ may be 1; this is not only a 'panel data' model.)
Maximize this for each class if the class memberships are known. They aren't.
The unconditional density for individual i is
$f(y_{i1}, y_{i2}, \dots, y_{iT_i} \mid \mathbf{X}_i) = \sum_{q=1}^{Q} \pi_q \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol{\beta}_q)$.
The log likelihood is
$\log L(\boldsymbol{\beta}_1,\dots,\boldsymbol{\beta}_Q,\boldsymbol{\delta}_1,\dots,\boldsymbol{\delta}_Q) = \sum_{i=1}^{N} \log\left[\sum_{q=1}^{Q} \pi_q \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol{\beta}_q)\right]$.

Example: Mixture of Normals
Q normal populations, each with mean $\mu_q$ and standard deviation $\sigma_q$. For each individual in each class at each period,
$f(y_{it} \mid \text{class } q) = \frac{1}{\sigma_q\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{y_{it}-\mu_q}{\sigma_q}\right)^2\right] = \frac{1}{\sigma_q}\,\phi\!\left(\frac{y_{it}-\mu_q}{\sigma_q}\right)$.
With panel data, T observations on each individual i,
$f(y_{i1},\dots,y_{iT} \mid \text{class } q) = \prod_{t=1}^{T} \frac{1}{\sigma_q\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{y_{it}-\mu_q}{\sigma_q}\right)^2\right]$,
so the log likelihood is
$\log L = \sum_{i=1}^{N} \log \sum_{q=1}^{Q} \pi_q \prod_{t=1}^{T} \frac{1}{\sigma_q\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{y_{it}-\mu_q}{\sigma_q}\right)^2\right]$.

Unmixing a Mixed Sample (NLOGIT)
Sample  ; 1 - 1000 $
Calc    ; Ran(123457) $
Create  ; lc1 = rnn(1,1) ; lc2 = rnn(5,1) $
Create  ; class = rnu(0,1) $
Create  ; if(class < .3) ylc = lc1 ; (else) ylc = lc2 $
Kernel  ; rhs = ylc $
Regress ; lhs = ylc ; rhs = one ; lcm ; pts = 2 ; pds = 1 $
[Figure: Kernel density estimate for YLC]

Mixture of Normals [figure]

Estimating Which Class
Prior probability: $\text{Prob}[\text{class}=q] = \pi_q$.
The joint conditional density for the $T_i$ observations is
$P(y_{i1},\dots,y_{iT_i} \mid \text{class } q) = \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol{\beta}_q, \sigma_q)$.
The joint density for the data and class membership is
$P(y_{i1},\dots,y_{iT_i}, \text{class } q) = \pi_q \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol{\beta}_q, \sigma_q)$.
Use Bayes' theorem to compute the posterior probability for the class, given the data:
$w(q \mid \text{data}_i) = P(\text{class } q \mid y_{i1},\dots,y_{iT_i}) = \frac{P(y_{i1},\dots,y_{iT_i}, \text{class } q)}{\sum_{j=1}^{Q} P(y_{i1},\dots,y_{iT_i}, \text{class } j)} = \frac{\pi_q \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol{\beta}_q, \sigma_q)}{\sum_{q=1}^{Q} \pi_q \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol{\beta}_q, \sigma_q)}$.
Best guess: the class with the largest posterior probability.
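A minimal sketch of this posterior computation, mirroring the NLOGIT unmixing experiment above (Python with numpy/scipy assumed; the slides use NLOGIT, and all names here are illustrative). It simulates the 30/70 mixture of N(1,1) and N(5,1) and evaluates $w(q \mid \text{data}_i)$ with the true parameters plugged in:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(123457)

# Simulate the mixed sample from the slides: 30% N(1,1), 70% N(5,1)
n = 1000
cls = rng.uniform(size=n) < 0.3
y = np.where(cls, rng.normal(1.0, 1.0, n), rng.normal(5.0, 1.0, n))

# Posterior class probabilities w(q | y_i) by Bayes' theorem,
# here with the true (mu_q, sigma_q, pi_q) plugged in
mu, sigma, pi = np.array([1.0, 5.0]), np.array([1.0, 1.0]), np.array([0.3, 0.7])
dens = norm.pdf(y[:, None], loc=mu, scale=sigma)   # f(y_i | class q), n x Q
joint = pi * dens                                  # pi_q * f(y_i | class q)
w = joint / joint.sum(axis=1, keepdims=True)       # posterior probabilities

# Best guess: the class with the largest posterior probability
guess = w.argmax(axis=1)
print("share assigned to the N(5,1) class:", guess.mean())
```

In practice the $(\mu_q, \sigma_q, \pi_q)$ are replaced by the latent class maximum likelihood estimates, exactly as in the posterior formula on the next slide.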
Posterior for Normal Mixture
$\hat{w}(q \mid \text{data}_i) = \hat{w}(q \mid i) = \frac{\hat{\pi}_q \prod_{t=1}^{T_i} \frac{1}{\hat{\sigma}_q}\,\phi\!\left(\frac{y_{it}-\hat{\mu}_q}{\hat{\sigma}_q}\right)}{\sum_{q=1}^{Q} \hat{\pi}_q \prod_{t=1}^{T_i} \frac{1}{\hat{\sigma}_q}\,\phi\!\left(\frac{y_{it}-\hat{\mu}_q}{\hat{\sigma}_q}\right)} = \frac{c_{iq}\hat{\pi}_q}{\sum_{q=1}^{Q} c_{iq}\hat{\pi}_q}$

Estimated Posterior Probabilities [figure]

More Difficult When the Populations Are Close Together [figure]

The Technique Still Works
----------------------------------------------------------------------
Latent Class / Panel LinearRg Model
Dependent variable                 YLC
Sample is 1 pds and 1000 individuals
LINEAR regression model
Model fit with 2 latent classes.
--------+-------------------------------------------------------------
Variable| Coefficient  Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
--------+-------------------------------------------------------------
        |Model parameters for latent class 1
Constant|  2.93611***      .15813       18.568    .0000
   Sigma|  1.00326***      .07370       13.613    .0000
        |Model parameters for latent class 2
Constant|   .90156***      .28767        3.134    .0017
   Sigma|   .86951***      .10808        8.045    .0000
        |Estimated prior probabilities for class membership
Class1Pr|   .73447***      .09076        8.092    .0000
Class2Pr|   .26553***      .09076        2.926    .0034
--------+-------------------------------------------------------------

Predicting Class Membership
Cross tabulations of true class (columns) against predicted class CLASS1 (rows):

Means = 1 and 5                        Means = 1 and 3
CLASS1 | Total |    0      1           CLASS1 | Total |    0      1
     0 |   787 |  759     28                0 |   787 |  523     97
     1 |  1713 |   18   1695                1 |  1713 |  250   1622
 Total |  2500 |  777   1723            Total |  2500 |  777   1723

Note: This comparison is generally not possible, as the true underlying class membership is not known.

How Many Classes?
(1) Q is not a 'parameter'; it cannot be 'estimated' along with δ and β.
(2) One cannot 'test' down or up to Q by comparing log likelihoods; the degrees of freedom for Q+1 vs. Q classes are not well defined.
(3) Use the Akaike information criterion, AIC = -2 log L + 2(#parameters). For our mixture-of-normals problem:
    AIC(1 class)   = 10827.88
    AIC(2 classes) =  9954.268
    AIC(3 classes) =  9958.756

Latent Class Regression
Assume normally distributed disturbances:
$f(y_{it} \mid \text{class } q) = \frac{1}{\sigma_q}\,\phi\!\left(\frac{y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta}_q}{\sigma_q}\right)$.
The mixture of normals sets $\mathbf{x}_{it}'\boldsymbol{\beta}_q = \mu_{itq}$.

An Extended Latent Class Model
(1) There are Q classes, unobservable to the analyst.
(2) Class-specific model: $f(y_{it} \mid \mathbf{x}_{it}, \text{class } q) = g(y_{it}, \mathbf{x}_{it}, \boldsymbol{\beta}_q)$.
(3) Conditional class probabilities $\pi_q$. The common multinomial logit form for the prior class probabilities constrains all probabilities to (0,1) and forces them to sum to one (see the sketch below):
$\pi_q = P(\text{class}=q \mid \boldsymbol{\delta}) = \frac{\exp(\theta_q)}{\sum_{j=1}^{Q}\exp(\theta_j)}, \qquad \theta_Q = 0, \qquad \sum_{q=1}^{Q}\pi_q = 1.$
Note that $\theta_q = \log(\pi_q/\pi_Q)$.
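A minimal sketch of fitting the two-class normal mixture by direct maximum likelihood, with the mixing probability in the logit form just described and the AIC computed as above (Python with numpy/scipy assumed; this is not the NLOGIT LCM routine used in the slides, and all names are illustrative):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_loglik(params, y):
    """Negative log likelihood of a 2-class normal mixture.
    params = (mu1, mu2, log_s1, log_s2, theta); pi1 = logit^{-1}(theta)."""
    mu1, mu2, ls1, ls2, theta = params
    pi1 = 1.0 / (1.0 + np.exp(-theta))      # logit form keeps pi in (0,1)
    f = pi1 * norm.pdf(y, mu1, np.exp(ls1)) + \
        (1.0 - pi1) * norm.pdf(y, mu2, np.exp(ls2))
    return -np.log(f).sum()

rng = np.random.default_rng(123457)
y = np.where(rng.uniform(size=1000) < 0.3,
             rng.normal(1.0, 1.0, 1000), rng.normal(5.0, 1.0, 1000))

# Perturbed starting values: identical classes satisfy the first order
# conditions (see the EM implementation notes below), so split the means.
start = np.array([y.mean() - 1.0, y.mean() + 1.0, 0.0, 0.0, 0.0])
res = minimize(neg_loglik, start, args=(y,), method="BFGS")
aic = 2.0 * res.fun + 2 * len(start)        # AIC = -2 logL + 2 (#parameters)
print(res.x, aic)
```

Refitting with Q = 1, 2, 3 components and comparing the resulting AIC values is the model-selection exercise described on the "How Many Classes?" slide.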
Baltagi and Griffin's Gasoline Data
World Gasoline Demand Data, 18 OECD countries, 19 years. Variables in the file:
COUNTRY  = name of country
YEAR     = year, 1960-1978
LGASPCAR = log of consumption per car
LINCOMEP = log of per capita income
LRPMG    = log of real price of gasoline
LCARPCAP = log of per capita number of cars
See Baltagi (2001, p. 24) for an analysis of these data. The article on which the analysis is based is Baltagi, B. and Griffin, J., "Gasoline Demand in the OECD: An Application of Pooling and Testing Procedures," European Economic Review, 22, 1983, pp. 117-137. The data were downloaded from the website for Baltagi's text.

3 Class Linear Gasoline Model [figure]

Estimating $E[\boldsymbol{\beta}_i \mid \mathbf{X}_i, \mathbf{y}_i, \boldsymbol{\beta}_1, \dots, \boldsymbol{\beta}_Q]$
(1) Classification: use $\hat{\boldsymbol{\beta}}_q$ from the class with the largest estimated posterior probability.
(2) Probabilistic: $\hat{\boldsymbol{\beta}}_i = \sum_{q=1}^{Q} \widehat{\text{Prob}}[\text{class}=q \mid \text{data}_i]\;\hat{\boldsymbol{\beta}}_q$.

Estimated Parameters: LCM vs. Gen1 RPM [figure]

Heckman and Singer's RE Model
A random effects model: random constants with a discrete distribution.
(1) There are Q classes, unobservable to the analyst.
(2) Class-specific model: $f(y_{it} \mid \mathbf{x}_{it}, \text{class } q) = g(y_{it}, \mathbf{x}_{it}, \alpha_q, \boldsymbol{\beta})$.
(3) Conditional class probabilities $\pi_q$, again in the multinomial logit form that constrains all probabilities to (0,1):
$\pi_q = P(\text{class}=q \mid \boldsymbol{\delta}) = \frac{\exp(\theta_q)}{\sum_{j=1}^{Q}\exp(\theta_j)}, \qquad \theta_Q = 0, \qquad \sum_{q=1}^{Q}\pi_q = 1.$

LC Regression for Doctor Visits [figure]

3 Class Heckman-Singer Form [figure]

The EM Algorithm
Latent class is a 'missing data' model. Let $d_{iq} = 1$ if individual i is a member of class q. If $d_{iq}$ were observed, the complete-data log likelihood would be
$\log L_c = \sum_{i=1}^{N} \log \sum_{q=1}^{Q} d_{iq} \prod_{t=1}^{T_i} f(y_{it} \mid \text{data}_{it}, \text{class } q)$
(only one of the Q terms is nonzero). The Expectation-Maximization algorithm has two steps:
(1) Expectation step: form the 'expected log likelihood' given the data and a prior guess of the parameters.
(2) Maximization step: maximize the expected log likelihood to obtain a new guess for the model parameters.
(E.g., http://crow.ee.washington.edu/people/bulyko/papers/em.pdf)

Implementing EM for LC Models
Given initial guesses $\pi_q^0 = \pi_1^0, \pi_2^0, \dots, \pi_Q^0$ and $\boldsymbol{\beta}_q^0 = \boldsymbol{\beta}_1^0, \boldsymbol{\beta}_2^0, \dots, \boldsymbol{\beta}_Q^0$; e.g., use 1/Q for each $\pi_q$ and the MLE of $\boldsymbol{\beta}$ from a one-class model. (Each must be perturbed slightly: if all $\pi_q$ are equal and all $\boldsymbol{\beta}_q$ are the same, the model already satisfies the first order conditions.) A worked sketch follows this outline.
(1) Compute $\hat{F}(q \mid i)$, the posterior class probabilities, using $\hat{\boldsymbol{\beta}}^0, \hat{\boldsymbol{\delta}}^0$. Reestimate each $\boldsymbol{\beta}_q$ using a weighted log likelihood; that is, maximize with respect to $\boldsymbol{\beta}_q$
$\sum_{i=1}^{N} \hat{F}_{iq} \sum_{t=1}^{T_i} \log f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol{\beta}_q)$.
(2) Reestimate $\pi_q$ (by reestimating $\boldsymbol{\delta}$) using $\hat{\pi}_q = (1/N)\sum_{i=1}^{N}\hat{F}(q \mid i)$, with the old $\hat{\boldsymbol{\delta}}$ and the new $\hat{\boldsymbol{\beta}}$.
Now return to step 1. Iterate until convergence.
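A minimal sketch of these two EM steps for the mixture-of-normals example with one observation per individual (Python with numpy/scipy assumed; the weighted maximum likelihood step for a normal density has the closed forms used below; all names illustrative):

```python
import numpy as np
from scipy.stats import norm

def em_normal_mixture(y, Q=2, iters=200, seed=0):
    """EM for a Q-class normal mixture. Class labels are arbitrary:
    the components may come back in any order."""
    rng = np.random.default_rng(seed)
    pi = np.full(Q, 1.0 / Q)
    mu = y.mean() + rng.normal(scale=y.std(), size=Q)   # perturbed starts
    sig = np.full(Q, y.std())
    for _ in range(iters):
        # E step: posterior class probabilities F(q|i)
        joint = pi * norm.pdf(y[:, None], mu, sig)      # n x Q
        F = joint / joint.sum(axis=1, keepdims=True)
        # M step: weighted MLE of (mu_q, sigma_q), then pi_q = mean_i F(q|i)
        w = F.sum(axis=0)
        mu = (F * y[:, None]).sum(axis=0) / w
        sig = np.sqrt((F * (y[:, None] - mu) ** 2).sum(axis=0) / w)
        pi = w / len(y)
    return pi, mu, sig

# Example with the simulated 30/70 mixture used earlier
rng = np.random.default_rng(123457)
y = np.where(rng.uniform(size=1000) < 0.3,
             rng.normal(1.0, 1.0, 1000), rng.normal(5.0, 1.0, 1000))
print(em_normal_mixture(y))
```

For a latent class regression, the M step for each $\boldsymbol{\beta}_q$ becomes a weighted least squares regression with weights $\hat{F}(q \mid i)$.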
Continuous Parameter Variation (The Random Parameters Model)
$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta}_i + \varepsilon_{it}$, each observation;
$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta}_i + \boldsymbol{\varepsilon}_i$, $T_i$ observations;
$\boldsymbol{\beta}_i = \boldsymbol{\beta} + \mathbf{u}_i$, with
$E[\mathbf{u}_i \mid \mathbf{X}_i] = \mathbf{0}$,
$\text{Var}[\mathbf{u}_i \mid \mathbf{X}_i] = \boldsymbol{\Gamma}$, constant but nonzero,
$f(\mathbf{u}_i \mid \mathbf{X}_i) = g(\mathbf{u}_i, \boldsymbol{\Gamma})$, a density that does not involve $\mathbf{X}_i$.

OLS and GLS Are Consistent
$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta}_i + \boldsymbol{\varepsilon}_i$, $T_i$ observations, with $\boldsymbol{\beta}_i = \boldsymbol{\beta} + \mathbf{u}_i$, so
$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{X}_i\mathbf{u}_i + \boldsymbol{\varepsilon}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{w}_i$,
$E[\mathbf{w}_i \mid \mathbf{X}_i] = \mathbf{X}_i E[\mathbf{u}_i \mid \mathbf{X}_i] + E[\boldsymbol{\varepsilon}_i \mid \mathbf{X}_i] = \mathbf{0}$,
$\text{Var}[\mathbf{w}_i \mid \mathbf{X}_i] = \sigma^2\mathbf{I} + \mathbf{X}_i\boldsymbol{\Gamma}\mathbf{X}_i'$.
(Discussed earlier: two-step GLS.)

ML Estimation of the RPM
Sample data generation: $y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta}_i + \varepsilon_{it}$.
Individual heterogeneity: $\boldsymbol{\beta}_i = \boldsymbol{\beta} + \mathbf{u}_i$.
Conditional log likelihood:
$\log f(y_{i1},\dots,y_{iT_i} \mid \mathbf{X}_i, \boldsymbol{\beta}_i, \sigma) = \log \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol{\beta}_i, \sigma)$.
Unconditional log likelihood:
$\log L(\boldsymbol{\beta}, \boldsymbol{\Gamma}, \sigma) = \sum_{i=1}^{N} \log \int_{\boldsymbol{\beta}_i} \prod_{t=1}^{T_i} f(y_{it} \mid \mathbf{x}_{it}, \boldsymbol{\beta}_i, \sigma)\, g(\boldsymbol{\beta}_i \mid \boldsymbol{\beta}, \boldsymbol{\Gamma})\, d\boldsymbol{\beta}_i$.
(1) Using simulated ML or quadrature, maximize to estimate $\boldsymbol{\beta}, \boldsymbol{\Gamma}, \sigma$.
(2) Using the data and the estimated structural parameters, compute $\hat{E}[\boldsymbol{\beta}_i \mid \text{data}_i, \boldsymbol{\beta}, \boldsymbol{\Gamma}, \sigma]$.

RP Gasoline Market [figure]

Parameter Covariance Matrix [figure]

RP vs. Gen1 [figure]

Modeling Parameter Heterogeneity
Conditional linear regression: $y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta}_i + \varepsilon_{it}$.
Individual heterogeneity in the means of the parameters: $\boldsymbol{\beta}_i = \boldsymbol{\beta} + \boldsymbol{\Delta}\mathbf{z}_i + \mathbf{u}_i$, $E[\mathbf{u}_i \mid \mathbf{X}_i, \mathbf{z}_i] = \mathbf{0}$.
Heterogeneity in the variances of the parameters: $\text{Var}[u_{ik} \mid \text{data}_i] = \sigma_k^2 \exp(\mathbf{h}_i'\boldsymbol{\delta}_k)$.
Estimation is by maximum simulated likelihood.

Hierarchical Linear Model
For the gasoline data (COUNTRY; YEAR 1960-1978; y = LGASPCAR, with z, x1, and x2 taken from LINCOMEP, LRPMG, and LCARPCAP):
$y_{it} = \beta_{1i} + \beta_{2i}x_{1,it} + \beta_{3i}x_{2,it} + \varepsilon_{it}$,
$\beta_{1i} = \beta_1 + \gamma_1 z_i + u_{1i}$,
$\beta_{2i} = \beta_2 + \gamma_2 z_i + u_{2i}$,
$\beta_{3i} = \beta_3 + \gamma_3 z_i + u_{3i}$.

Estimated HLM [figure]

RP vs. HLM [figure]

A Hierarchical Linear Model: German Health Care Data
$\text{Hsat}_{it} = \beta_1 + \beta_2 \text{AGE}_{it} + \gamma_i \text{EDUC}_{it} + \beta_4 \text{MARRIED}_{it} + \varepsilon_{it}$,
$\gamma_i = \alpha_1 + \alpha_2 \text{FEMALE}_i + u_i$.
NLOGIT commands:
Sample    ; all $
Setpanel  ; Group = id ; Pds = ti $
Regress   ; For [ti = 7] ; Lhs = newhsat ; Rhs = one,age,educ,married
          ; RPM = female ; Fcn = educ(n)
          ; pts = 25 ; halton ; panel ; Parameters $
Sample    ; 1 - 887 $
Create    ; betaeduc = beta_i $
Dstat     ; rhs = betaeduc $
Histogram ; Rhs = betaeduc $

OLS Results
OLS starting values for the random parameters model:
Ordinary least squares regression
LHS = NEWHSAT       Mean = 6.69641      Standard deviation = 2.26003
Number of observs.  = 6209
Model size          Parameters = 4      Degrees of freedom = 6205
Residuals           Sum of squares = 29671.89461
                    Standard error of e = 2.18676
Fit                 R-squared = .06424  Adjusted R-squared = .06378
Model test          F[3, 6205] (prob) = 142.0 (.0000)
--------+----------------------------------------------------------
        |              Standard               Prob.       Mean
 NEWHSAT|  Coefficient    Error         z     z>|Z|       of X
--------+----------------------------------------------------------
Constant|   7.02769***   .22099       31.80  .0000
     AGE|   -.04882***   .00307      -15.90  .0000      44.3352
 MARRIED|    .29664***   .07701        3.85  .0001       .84539
    EDUC|    .14464***   .01331       10.87  .0000      10.9409
--------+----------------------------------------------------------
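Before turning to the reported output, a minimal sketch of the maximum simulated likelihood computation for this kind of specification: one normally distributed coefficient whose mean shifts with FEMALE, simplified to a single regressor (Python with numpy/scipy assumed; plain standard normal draws stand in for the Halton draws NLOGIT uses; the data and all names are illustrative, not the GSOEP data):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(7)
N, T, R = 300, 7, 25                       # individuals, periods, draws

# Illustrative data: y_it = b0 + g_i*educ_it + e_it, g_i = a1 + a2*female_i + u_i
female = (rng.uniform(size=N) < 0.5).astype(float)
educ = rng.uniform(8, 18, size=(N, T))
g_i = 0.15 - 0.03 * female + 0.10 * rng.normal(size=N)
y = 7.0 + g_i[:, None] * educ + rng.normal(0, 1.5, size=(N, T))

draws = rng.normal(size=(N, R))            # fixed draws (Halton in NLOGIT)

def neg_simulated_loglik(p):
    b0, a1, a2, log_s_u, log_s_e = p
    # R draws of the random coefficient for each individual
    g = a1 + a2 * female[:, None] + np.exp(log_s_u) * draws        # N x R
    resid = y[:, :, None] - b0 - g[:, None, :] * educ[:, :, None]  # N x T x R
    # Simulated likelihood: average over draws of prod_t f(y_it | g_ir)
    f = norm.pdf(resid, scale=np.exp(log_s_e)).prod(axis=1).mean(axis=1)
    return -np.log(np.maximum(f, 1e-300)).sum()   # guard against underflow

res = minimize(neg_simulated_loglik, x0=np.array([7.0, 0.1, 0.0, -1.0, 0.5]),
               method="Nelder-Mead", options={"maxiter": 5000})
print(res.x)   # b0, mean of g, FEMALE shift, log sd(u), log sd(e)
```

The reported NLOGIT estimates below are the same construction applied to the health satisfaction data, with AGE and MARRIED as additional nonrandom parameters.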
Maximum Simulated Likelihood
------------------------------------------------------------------
Random Coefficients LinearRg Model
Dependent variable               NEWHSAT
Log likelihood function          -12583.74717
Estimation based on N = 6209, K = 7
Unbalanced panel has 887 individuals
LINEAR regression model
Simulation based on 25 Halton draws
--------+----------------------------------------------------------
        |              Standard               Prob.       Mean
 NEWHSAT|  Coefficient    Error         z     z>|Z|       of X
--------+----------------------------------------------------------
        |Nonrandom parameters
Constant|   7.34576***   .15415       47.65  .0000
     AGE|   -.05878***   .00206      -28.56  .0000      44.3352
 MARRIED|    .23427***   .05034        4.65  .0000       .84539
        |Means for random parameters
    EDUC|    .16580***   .00951       17.43  .0000      10.9409
        |Scale parameters for dists. of random parameters
    EDUC|   1.86831***   .00179     1044.68  .0000
        |Heterogeneity in the means of random parameters
cEDU_FEM|   -.03493***   .00379       -9.21  .0000
        |Variance parameter given is sigma
Std.Dev.|   1.58877***   .00954      166.45  .0000
--------+----------------------------------------------------------

Simulating Conditional Means for Individual Parameters
$\hat{E}[\boldsymbol{\beta}_i \mid \mathbf{y}_i, \mathbf{X}_i] = \frac{\frac{1}{R}\sum_{r=1}^{R} (\hat{\boldsymbol{\beta}} + \hat{\mathbf{L}}\mathbf{w}_{i,r}) \prod_{t=1}^{T_i} \frac{1}{\hat{\sigma}}\,\phi\!\left(\frac{y_{it} - \mathbf{x}_{it}'(\hat{\boldsymbol{\beta}} + \hat{\mathbf{L}}\mathbf{w}_{i,r})}{\hat{\sigma}}\right)}{\frac{1}{R}\sum_{r=1}^{R} \prod_{t=1}^{T_i} \frac{1}{\hat{\sigma}}\,\phi\!\left(\frac{y_{it} - \mathbf{x}_{it}'(\hat{\boldsymbol{\beta}} + \hat{\mathbf{L}}\mathbf{w}_{i,r})}{\hat{\sigma}}\right)} = \sum_{r=1}^{R} \text{Weight}_{ir}\,(\hat{\boldsymbol{\beta}} + \hat{\mathbf{L}}\mathbf{w}_{i,r})$
These are posterior estimates of E[parameters(i) | Data(i)].

"Individual Coefficients"
--> Sample ; 1 - 887 $
--> Create ; betaeduc = beta_i $
--> Dstat  ; rhs = betaeduc $
Descriptive Statistics (all results based on nonmissing observations)
Variable | Mean     Std.Dev.  Minimum    Maximum   Cases  Missing
---------+-------------------------------------------------------
BETAEDUC | .161184  .132334   -.268006   .506677   887    0
[Figure: Histogram of BETAEDUC]

Hierarchical Bayesian Estimation
Sample data generation: $y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta}_i + \varepsilon_{it}$, $\varepsilon_{it} \sim N[0, \sigma^2]$.
Individual heterogeneity: $\boldsymbol{\beta}_i = \boldsymbol{\beta} + \mathbf{u}_i$, $\mathbf{u}_i \sim N[\mathbf{0}, \boldsymbol{\Gamma}]$.
What information exists about 'the model'?
Prior densities for the structural parameters:
$p(\log \sigma)$ = uniform density with (large) parameter $A^0$;
$p(\boldsymbol{\beta}) = N[\boldsymbol{\beta}^0, \boldsymbol{\Sigma}^0]$, e.g., $\mathbf{0}$ and (large) $v^0\mathbf{I}$;
$p(\boldsymbol{\Gamma})$ = inverse Wishart[...].
Priors for the parameters of interest:
$p(\boldsymbol{\beta}_i) = N[\boldsymbol{\beta}, \boldsymbol{\Gamma}]$;
$p(\sigma)$ = as above.

Estimation of Hierarchical Bayes Models
(1) Analyze the 'posteriors' for the hyperparameters $\boldsymbol{\beta}, \boldsymbol{\Gamma}, \sigma$.
(2) Analyze the posteriors for the group-level parameters $\boldsymbol{\beta}_i$.
The estimators are the means and variances of the posterior distributions.
Algorithm: generally, Gibbs sampling from the posteriors, with resort to laws of large numbers. To be discussed later; a bare-bones sketch appears below.
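A minimal sketch of the Gibbs sampler for a simplified version of this hierarchy: one random coefficient per group, with $\sigma$ and $\boldsymbol{\Gamma}$ treated as known so that only the conditional posteriors for $\beta_i$ and $\beta$ are sampled (Python assumed; a full sampler would also draw $\sigma$ and $\boldsymbol{\Gamma}$ from their inverse gamma / inverse Wishart conditionals; all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 10
sigma2, gamma2 = 1.0, 0.25                 # treated as known for the sketch

# Illustrative data: y_it = b_i * x_it + e_it, b_i ~ N(beta, gamma2)
x = rng.normal(size=(N, T))
b_true = 1.5 + np.sqrt(gamma2) * rng.normal(size=N)
y = b_true[:, None] * x + rng.normal(size=(N, T))

beta = 0.0                                  # hyperparameter to be sampled
draws = []
for it in range(2000):
    # p(b_i | y_i, beta): normal, a precision-weighted combination of the
    # within-group least squares information and the prior N(beta, gamma2)
    prec = (x ** 2).sum(axis=1) / sigma2 + 1.0 / gamma2
    mean = ((x * y).sum(axis=1) / sigma2 + beta / gamma2) / prec
    b = mean + rng.normal(size=N) / np.sqrt(prec)
    # p(beta | b_1,...,b_N): normal with mean b-bar under a flat prior on beta
    beta = b.mean() + np.sqrt(gamma2 / N) * rng.normal()
    draws.append(beta)

# Posterior mean and std. dev. of beta from the retained (post burn-in) draws
post = np.array(draws[500:])
print(post.mean(), post.std())
```

The posterior means and standard deviations of the retained draws are exactly the "means and variances of the posterior distributions" that serve as the Bayesian point estimates and precision measures.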