Notes on Panel Data Models with Discrete Outcomes

Longitudinal and Multilevel Methods for Models with Discrete Outcomes with Parametric and Non-Parametric Corrections for Unobserved Heterogeneity

David K. Guilkey

Focus of this talk:
- Binary dependent variables
- Unordered categorical dependent variables
- Models are logit based; probit, Poisson, and negative binomial models are not discussed, although Stata has methods for these estimators as well

Empirical example uses data from the Indonesian Family Life Survey (IFLS). Two outcomes:
- Binary indicator for whether the respondent uses contraception
- Unordered categorical variable for method choice

Data Set Overview
- Four waves of data: 1993, 1997, 2000, and 2007
- Individual level information on fertility, education, migration
- Community and facility level data on health and family planning providers
- Data from 321 enumeration areas, which we treat as communities

IFLS Longitudinal Sample Size (rows: survey year; columns: initial participation cohort)

    Survey year   Wave 1 cohort   Wave 2 cohort   Wave 3 cohort   Wave 4 cohort   Total observations
    1993                   3520
    1997                   2873            2207
    2000                   2684            1742            1466
    2007                   1498            1152             933            2287
    total                 10575            5101            2399            2287                20362

IFLS Summary Statistics

    Variable                        mean      s.d.
    Dependent variables
      Contraceptive use             .588      .492
      Method choice
        no method                   .412
        temporary modern            .397
        long lasting modern         .168
        traditional                 .023
    Independent variables
      highest ed grade school      0.669     0.470
      highest ed high school       0.169     0.375
      highest ed college           0.049     0.217
      age                         34.099     8.842
      muslim                       0.891     0.312
      number of posyandus          7.507     6.251
    Observations                  20,000

Basic Model for Longitudinal Logit:

$$\ln\left[\frac{P(Y_{ti}=1\mid\mu_i)}{P(Y_{ti}=0\mid\mu_i)}\right]=X_{ti}\beta+P_{ti}\alpha+Z_i\gamma+\rho\mu_i$$

Where:
- $Y_{ti}$: observed binary variable (respondent i from time period t)
- $X_{ti}$: time varying explanatory variables (age and education level)
- $P_{ti}$: time varying program variable (posyandus)
- $Z_i$: time invariant regressors (Muslim)
- $i=1,2,\dots,N$ (individuals)
- $t=1,2,\dots,T_i$ (observations per individual; the panel is unbalanced)

Assumptions: $\mu_i\sim N(0,1)$ for the parametric logit in Stata (xtlogit, melogit, and one variant of GLLAMM) and

$$E(X_{ti}\mu_i)=E(P_{ti}\mu_i)=E(Z_i\mu_i)=0$$

Note that observations for the same individual will be correlated because of the time invariant error, sometimes referred to as unobserved heterogeneity.

Given the assumptions, estimation options are:
1. Simple logit yields consistent point estimates but incorrect SE's
2. Simple logit with cluster option corrects SE's
3. Parametric or semi-parametric maximum likelihood

The likelihood function for this model is derived as follows:

$$P(Y_{ti}=1\mid\mu_i)=\frac{e^{X_{ti}\beta+P_{ti}\alpha+Z_i\gamma+\rho\mu_i}}{1+e^{X_{ti}\beta+P_{ti}\alpha+Z_i\gamma+\rho\mu_i}}$$

This is the probability that individual i at time t is using contraception, conditional on time invariant heterogeneity. For individual i, we observe $T_i$ binary responses, for example $Y_i=(1,0,0,1)$ for a woman who is observed for 4 time periods and used contraception at times 1 and 4. Let $Y_i$ be the set of observed outcomes for individual i, then:

$$P(Y_i)=\int_{-\infty}^{\infty}\prod_{t=1}^{T_i}P(Y_{ti}=1\mid\mu_i)^{Y_{ti}}\,\bigl[1-P(Y_{ti}=1\mid\mu_i)\bigr]^{1-Y_{ti}}f(\mu_i)\,d\mu_i$$

The joint probability must be approximated, which amounts to approximating the area under a curve. With the assumption of normality, the approximation method is Gaussian quadrature (Hermite integration).

Points:
1. More accurate with more Hermite points, but execution time is longer.
2. You need more points as $T_i$ gets larger.
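As a rough illustration of the approximation, the sketch below evaluates $P(Y_i)$ for the example sequence (1,0,0,1) with 3 and 5 Hermite points and compares both with a brute-force numerical integral. It is written in Python rather than Stata. The index values, the scale `rho`, and all function names are made up for illustration, and it assumes the heterogeneity enters the logit index as `rho` times a standard normal draw.

```python
import math

# Closed-form Gauss-Hermite rules for a standard normal variable,
# stored as (mass point, weight) pairs; the weights sum to one.
_A = math.sqrt(5.0 - math.sqrt(10.0))          # inner node of the 5-point rule
_B = math.sqrt(5.0 + math.sqrt(10.0))          # outer node of the 5-point rule
_WA = (7.0 + 2.0 * math.sqrt(10.0)) / 60.0     # weight on the inner nodes
_WB = (7.0 - 2.0 * math.sqrt(10.0)) / 60.0     # weight on the outer nodes
RULES = {
    3: [(-math.sqrt(3.0), 1.0 / 6.0), (0.0, 2.0 / 3.0), (math.sqrt(3.0), 1.0 / 6.0)],
    5: [(-_B, _WB), (-_A, _WA), (0.0, 8.0 / 15.0), (_A, _WA), (_B, _WB)],
}


def seq_prob_given_mu(y, index, rho, mu):
    """Probability of the 0/1 sequence y conditional on heterogeneity mu."""
    prob = 1.0
    for y_t, xb_t in zip(y, index):
        p1 = 1.0 / (1.0 + math.exp(-(xb_t + rho * mu)))
        prob *= p1 if y_t == 1 else 1.0 - p1
    return prob


def quadrature_prob(y, index, rho, n_points):
    """Hermite approximation: weighted sum over the known mass points."""
    return sum(w * seq_prob_given_mu(y, index, rho, mu) for mu, w in RULES[n_points])


def brute_force_prob(y, index, rho, lo=-8.0, hi=8.0, steps=16000):
    """Midpoint-rule reference value for the same integral."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        mu = lo + (k + 0.5) * h
        density = math.exp(-0.5 * mu * mu) / math.sqrt(2.0 * math.pi)
        total += seq_prob_given_mu(y, index, rho, mu) * density * h
    return total


y_i = [1, 0, 0, 1]                 # used contraception at times 1 and 4
index_i = [0.5, 0.3, 0.1, -0.2]    # hypothetical values of X*beta + P*alpha + Z*gamma
rho = 1.6                          # hypothetical scale of the heterogeneity term

for m in (3, 5):
    print(m, "points:", quadrature_prob(y_i, index_i, rho, m))
print("brute force:", brute_force_prob(y_i, index_i, rho))
```

Production estimators typically use many more points (the GLLAMM runs in these notes use 20), but the structure of the calculation is the same: a weighted sum replaces the integral.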
Hermite integration replaces the integral with a sum:

$$P(Y_i)\approx\sum_{m=1}^{M}w_m\prod_{t=1}^{T_i}P(Y_{ti}=1\mid\mu_m)^{Y_{ti}}\,\bigl[1-P(Y_{ti}=1\mid\mu_m)\bigr]^{1-Y_{ti}}$$

where the weights ($w_m$'s) and the mass points ($\mu_m$'s) are known because of the assumption of normality.

Alternative: The discrete factor approximation searches over weights and mass points along with the other parameters of the model. A normalization must be imposed:
1. Weights sum to one
2. Either set one mass point to zero (Fortran program) or set the mean of the distribution to zero (GLLAMM)

Simple Logit

    logit cont_use posyandus age grade_school high_school college muslim

    Logistic regression                          Number of obs = 20000
                                                 LR chi2(6)    = 494.70
                                                 Prob > chi2   = 0.0000
    Log likelihood = -13304.212                  Pseudo R2     = 0.0183

    cont_use     |     Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
    -------------+--------------------------------------------------------------
    posyandus    |  .0246126     .00295     8.34   0.000    .0188306    .0303946
    age          | -.0267579   .0016964   -15.77   0.000   -.0300828   -.0234329
    grade_school |   .488657   .0469818    10.40   0.000    .3965744    .5807396
    high_school  |  .3713791   .0569712     6.52   0.000    .2597175    .4830406
    college      |  .3879987   .0789006     4.92   0.000    .2333564     .542641
    muslim       | -.0432766   .0468967    -0.92   0.356   -.1351924    .0486392
    _cons        |  .7282074   .0909596     8.01   0.000    .5499298     .906485

Simple Logit with Corrected SE's

    logit cont_use posyandus age grade_school high_school college muslim, cluster(ind_id)

    Logistic regression                          Number of obs = 20000
                                                 Wald chi2(6)  = 346.25
                                                 Prob > chi2   = 0.0000
    Log pseudolikelihood = -13304.212            Pseudo R2     = 0.0183
    (Std. Err. adjusted for 9351 clusters in ind_id)

                 |              Robust
    cont_use     |     Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
    -------------+--------------------------------------------------------------
    posyandus    |  .0246126   .0035385     6.96   0.000    .0176772     .031548
    age          | -.0267579   .0019687   -13.59   0.000   -.0306165   -.0228993
    grade_school |   .488657   .0573944     8.51   0.000     .376166     .601148
    high_school  |  .3713791   .0679893     5.46   0.000    .2381225    .5046357
    college      |  .3879987   .0930589     4.17   0.000    .2056065    .5703909
    muslim       | -.0432766    .057396    -0.75   0.451   -.1557708    .0692176
    _cons        |  .7282074   .1084511     6.71   0.000    .5156472    .9407677

Parametric Maximum Likelihood

    gllamm cont_use posyandus age grade_school high_school college muslim, i(ind_id) family(binomial) link(logit) nip(20) ip(g) trace dot

    log likelihood = -12661.672

    cont_use     |     Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
    -------------+--------------------------------------------------------------
    posyandus    |  .0316745   .0049243     6.43   0.000     .022023    .0413259
    age          | -.0378263   .0026797   -14.12   0.000   -.0430785   -.0325741
    grade_school |   .651587   .0775775     8.40   0.000    .4995379    .8036361
    high_school  |   .453883   .0939311     4.83   0.000    .2697814    .6379847
    college      |  .4845458     .12741     3.80   0.000    .2348268    .7342648
    muslim       |  .0000928   .0809837     0.00   0.999   -.1586322    .1588179
    _cons        |  .9398335   .1461969     6.43   0.000    .6532929    1.226374

    Variances and covariances of random effects
    ***level 2 (ind_id)
        var(1): 2.6610493 (.1476163)

Semi-Parametric Maximum Likelihood

    gllamm cont_use posyandus age grade_school high_school college muslim, i(ind_id) family(binomial) link(logit) nip(3) ip(f) trace dot

    log likelihood = -12660.352

    cont_use     |     Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
    -------------+--------------------------------------------------------------
    posyandus    |  .0311694   .0048755     6.39   0.000    .0216135    .0407253
    age          | -.0379544    .002707   -14.02   0.000     -.04326   -.0326488
    grade_school |  .6591399   .0788521     8.36   0.000    .5045927    .8136871
    high_school  |  .4674408   .0945268     4.95   0.000    .2821716      .65271
    college      |  .4973757   .1278639     3.89   0.000    .2467671    .7479843
    muslim       |  .0008321   .0812183     0.01   0.992   -.1583529     .160017
    _cons        |  1.020194   .1998693     5.10   0.000    .6284575    1.411931

    Probabilities and locations of random effects
    ***level 2 (ind_id)
        loc1: -2.0306, 2.9674, .16649
        var(1): 2.780105
        prob: 0.2982, 0.1744, 0.5274

Multilevel Panel Models

Basic form of the model:

$$\ln\left[\frac{P(Y_{tij}=1\mid\mu_{ij},\lambda_j)}{P(Y_{tij}=0\mid\mu_{ij},\lambda_j)}\right]=X_{tij}\beta+P_{tij}\alpha+Z_j\gamma+\rho_1\mu_{ij}+\rho_2\lambda_j$$

where
- $j=1,2,\dots,J$ (communities)
- $i=1,2,\dots,N_j$ (individuals from community j)
- $t=1,2,\dots,T_{ij}$ (observations for person i from community j)
- $X_{tij}$: individual level variables (some could be fixed through time)
- $P_{tij}$: time varying program variable
- $Z_j$: time invariant community level variables
- $\mu_{ij}$: time invariant individual level unobserved heterogeneity
- $\lambda_j$: time invariant community level unobserved heterogeneity

This model allows observations on the same individual to be correlated and observations from the same community to be correlated.

Assumptions:

$$E(X_{tij}\lambda_j)=E(P_{tij}\lambda_j)=E(Z_j\lambda_j)=0$$

$$E(X_{tij}\mu_{ij})=E(P_{tij}\mu_{ij})=E(Z_j\mu_{ij})=0$$

Estimation options:
1. Simple logit yields consistent point estimates but incorrect SE's
2. Simple logit with cluster option corrects SE's (at community level)
3. Parametric or semi-parametric maximum likelihood

The maximum likelihood estimator is a straightforward extension of the longitudinal data model:

$$P(Y_{tij}=1\mid\mu_{ij},\lambda_j)=\frac{e^{X_{tij}\beta+P_{tij}\alpha+Z_j\gamma+\rho_1\mu_{ij}+\rho_2\lambda_j}}{1+e^{X_{tij}\beta+P_{tij}\alpha+Z_j\gamma+\rho_1\mu_{ij}+\rho_2\lambda_j}}$$

You need the unconditional joint probability of the observed set of outcomes for the set of individuals in each community. Conditional on the unobservables at the community level, the probability of the set of observed outcomes for person i from community j is:

$$P(Y_{ij}\mid\lambda_j)=\int_{-\infty}^{\infty}\prod_{t=1}^{T_{ij}}P(Y_{tij}=1\mid\mu_{ij},\lambda_j)^{Y_{tij}}\,\bigl[1-P(Y_{tij}=1\mid\mu_{ij},\lambda_j)\bigr]^{1-Y_{tij}}f(\mu_{ij})\,d\mu_{ij}$$

The unconditional joint probability of the set of observed outcomes for all individuals in community j is then:

$$P(Y_j)=\int_{-\infty}^{\infty}\prod_{i=1}^{N_j}P(Y_{ij}\mid\lambda_j)\,f(\lambda_j)\,d\lambda_j$$

We then use either Hermite integration or the discrete factor method to approximate the integrals.

Simple logit

    logit cont_use posyandus age grade_school high_school college muslim

The output is identical to the simple logit reported above (log likelihood = -13304.212).

Simple Logit with Corrected SE's

    logit cont_use posyandus age grade_school high_school college muslim, cluster(com_id)

    Logistic regression                          Number of obs = 20000
                                                 Wald chi2(6)  = 263.28
                                                 Prob > chi2   = 0.0000
    Log pseudolikelihood = -13304.212            Pseudo R2     = 0.0183
    (Std. Err. adjusted for 313 clusters in com_id)

                 |              Robust
    cont_use     |     Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
    -------------+--------------------------------------------------------------
    posyandus    |  .0246126   .0052652     4.67   0.000     .014293    .0349322
    age          | -.0267579   .0022948   -11.66   0.000   -.0312555   -.0222603
    grade_school |   .488657   .0796778     6.13   0.000    .3324914    .6448226
    high_school  |  .3713791   .0929568     4.00   0.000    .1891871     .553571
    college      |  .3879987   .1057477     3.67   0.000     .180737    .5952603
    muslim       | -.0432766   .1257938    -0.34   0.731   -.2898279    .2032747
    _cons        |  .7282074   .1919567     3.79   0.000    .3519792    1.104436

Parametric Maximum Likelihood

    gllamm cont_use posyandus age grade_school high_school college muslim, i(ind_id com_id) family(binomial) link(logit) nip(20) ip(g) trace dot

    number of level 1 units = 20000
    number of level 2 units = 9394
    number of level 3 units = 313

    gllamm model
    log likelihood = -12548.522

    cont_use     |     Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
    -------------+--------------------------------------------------------------
    posyandus    |  .0228368   .0069701     3.28   0.001    .0091757     .036498
    age          |  -.037996   .0026758   -14.20   0.000   -.0432405   -.0327516
    grade_school |  .6367873   .0786581     8.10   0.000    .4826202    .7909543
    high_school  |  .4122478   .0975244     4.23   0.000    .2211036    .6033921
    college      |  .4165882   .1299495     3.21   0.001    .1618919    .6712844
    muslim       |  .0376821   .1052797     0.36   0.720   -.1686623    .2440266
    _cons        |   1.00658   .1701569     5.92   0.000    .6730791    1.340082

    Variances and covariances of random effects
    ***level 2 (ind_id)
        var(1): 2.2860509 (.13570515)
    ***level 3 (com_id)
        var(1): .34625941 (.04611334)

Non-parametric Maximum Likelihood

    gllamm cont_use posyandus age grade_school high_school college muslim, i(ind_id com_id) family(binomial) link(logit) nip(3) ip(f) trace dot

    number of level 1 units = 20000
    number of level 2 units = 9394
    number of level 3 units = 313

    gllamm model
    log likelihood = -12546.725

    cont_use     |     Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
    -------------+--------------------------------------------------------------
    posyandus    |  .0208781   .0067149     3.11   0.002    .0077171    .0340391
    age          |   -.037949   .0027013   -14.05   0.000   -.0432435   -.0326546
    grade_school |   .637163   .0795007     8.01   0.000    .4813445    .7929815
    high_school  |  .4185102    .097197     4.31   0.000    .2280075    .6090129
    college      |  .4177381   .1293205     3.23   0.001    .1642745    .6712016
    muslim       | -.0883427   .1015703    -0.87   0.384   -.2874169    .1107315
    _cons        |  1.164577   .1836346     6.34   0.000    .8046593    1.524494

    Probabilities and locations of random effects
    ***level 2 (ind_id)
        loc1: -1.9001, 2.6348, .23361
        var(1): 2.2386873
        prob: 0.295, 0.1648, 0.5402
    ***level 3 (com_id)
        loc1: -1.4082, .65457, -.1872
        var(1): .33048135
        prob: 0.0826, 0.3421, 0.5753

Testing for Program Targeting
- Programs may target high need areas or areas where they feel residents would be receptive to family planning. For example, family planning programs may concentrate on high fertility areas.
- The result is that simple methods may understate or overstate program impact.

Statistical implication of program targeting:

$$E(P_{tij}\lambda_j)\neq 0$$

Solutions:
- Explicitly model program placement and estimate placement simultaneously with program impact equations (Angeles, Guilkey, and Mroz, 1998)
- Treat the $\lambda_j$ as fixed effects and include dummies for communities or use some other fixed effects method (Gertler and Molyneaux, 1994)

Angeles, Guilkey, and Mroz show that the joint modeling approach yields smaller standard errors in Tanzania, but the two methods gave similar results.

Example (fixed effects) plus Hausman test for endogenous placement:

Efficient estimator under the null of no endogeneity (random effects):

    melogit cont_use posyandus age grade_school high_school college muslim || prov_id: || ind_id:, intp(20)
    Mixed-effects logistic regression            Number of obs = 20000

                    |    No. of       Observations per Group
     Group Variable |    Groups    Minimum    Average    Maximum
    ----------------+-------------------------------------------
            prov_id |        16         11     1250.0       3116
             ind_id |      9507          1        2.1          4

    Integration method: mvaghermite              Integration points = 20
                                                 Wald chi2(6)  = 468.70
    Log likelihood = 5142.1765                   Prob > chi2   = 0.0000

    cont_use       |     Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
    ---------------+--------------------------------------------------------------
    posyandus      |  .0260336   .0036554     7.12   0.000     .018869    .0331981
    age            | -.0279207   .0019312   -14.46   0.000   -.0317057   -.0241357
    grade_school   |   .603515   .0539052    11.20   0.000    .4978628    .7091672
    high_school    |  .4773575   .0663403     7.20   0.000    .3473329     .607382
    college        |  .5571055   .0914372     6.09   0.000    .3778918    .7363192
    muslim         |  .2747446   .0685264     4.01   0.000    .1404353    .4090539
    _cons          |  .3397094   .1159908     2.93   0.003    .1123716    .5670472
    ---------------+--------------------------------------------------------------
    prov_id        |
        var(_cons) |   .264253   .1427983                     .0916312    .7620726
    ---------------+--------------------------------------------------------------
    prov_id>ind_id |
        var(_cons) |  2.878393   .2659968                     2.401537    3.449935

    LR test vs. logistic regression: chi2(2) = 36892.78    Prob > chi2 = 0.0000
    Note: LR test is conservative and provided only for reference.

    . estimates store efficient

Consistent estimator under the alternative (fixed effects):

    xi: melogit cont_use posyandus age grade_school high_school college muslim i.prov_id || ind_id:, intp(20)

    Integration method: mvaghermite              Integration points = 20
                                                 Wald chi2(21) = 485.84
    Log likelihood = -12574.068                  Prob > chi2   = 0.0000

    cont_use       |     Coef.   Std. Err.       z   P>|z|    [95% Conf. Interval]
    ---------------+--------------------------------------------------------------
    posyandus      |  .0279454   .0059279     4.71   0.000    .0163268    .0395639
    age            | -.0368115   .0026797   -13.74   0.000   -.0420635   -.0315595
    grade_school   |  .6695194   .0781108     8.57   0.000     .516425    .8226137
    high_school    |  .5039237   .0954551     5.28   0.000    .3168351    .6910123
    college        |  .5033064   .1282156     3.93   0.000    .2520084    .7546044
    muslim         |  .0992815   .1055572     0.94   0.347   -.1076069    .3061698
    _Iprov_id_13   |  .6485017   .1559989     4.16   0.000    .3427495    .9542539
      .
      .
    _Iprov_id_76   | -.5329505   .8731102    -0.61   0.542   -2.244215    1.178314
    _cons          |  .2371631   .1764231     1.34   0.179   -.1086197     .582946
    ---------------+--------------------------------------------------------------
    ind_id         |
        var(_cons) |  2.564441   .1439339                     2.297299    2.862648

    LR test vs. logistic regression: chibar2(01) = 1223.43    Prob>=chibar2 = 0.0000

    . estimates store consistent

Hausman test results:

    hausman consistent efficient

                 |      ---- Coefficients ----
                 |       (b)          (B)          (b-B)     sqrt(diag(V_b-V_B))
                 |   consistent    efficient     Difference          S.E.
    -------------+---------------------------------------------------------------
    posyandus    |    .0279454     .0260336       .0019118        .0046667
    age          |   -.0368115    -.0279207      -.0088908        .0018577
    grade_school |    .6695194      .603515       .0660044         .056529
    high_school  |    .5039237     .4773575       .0265662        .0686342
    college      |    .5033064     .5571055      -.0537991        .0898804
    muslim       |    .0992815     .2747446      -.1754631        .0802898

    b = consistent under Ho and Ha; obtained from meglm
    B = inconsistent under Ha, efficient under Ho; obtained from meglm

    Test: Ho: difference in coefficients not systematic
        chi2(6) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 31.45
        Prob>chi2 = 0.0000

State Dependence and Unobserved Heterogeneity

Consider the simple model:

$$Y_{ti}=\mu_i+\varepsilon_{ti}$$

Note that $Y_{t-1,i}=\mu_i+\varepsilon_{t-1,i}$, which implies

$$\operatorname{corr}(Y_{ti},Y_{t-1,i})\neq 0$$

unless $\mu_i=0$ (no time invariant unobserved heterogeneity).

Now consider:

$$Y_{ti}=\delta Y_{t-1,i}+\varepsilon_{ti}$$

Now again:

$$\operatorname{corr}(Y_{ti},Y_{t-1,i})\neq 0$$

- It is very difficult to distinguish between the two models.
- The same problem would exist if the unobserved heterogeneity were at the community level.

The solution is to estimate a comprehensive model:

$$\ln\left[\frac{P(Y_{tij}=1\mid\mu_{ij},\lambda_j)}{P(Y_{tij}=0\mid\mu_{ij},\lambda_j)}\right]=\delta Y_{t-1,ij}+X_{tij}\beta+P_{tij}\alpha+Z_j\gamma+\rho_1\mu_{ij}+\rho_2\lambda_j$$

Initial conditions problem: we must either be able to set $Y_{1ij}=0$ or jointly estimate the equation of interest with an equation of the form:

$$\ln\left[\frac{P(Y_{1ij}=1)}{P(Y_{1ij}=0)}\right]=X_{1ij}\beta^0+P_{1ij}\alpha^0+Z_j\gamma^0+\rho_1^0\mu_{ij}+\rho_2^0\lambda_j$$

Often it is reasonable to set the initial value, for example when observations start at the beginning of the woman's child bearing years. In this example it is not, since women enter the year one data set at different ages.

Joint estimation is basically a simultaneous equations problem subject to standard identification issues.
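Returning to the two simple models above, a small simulation can illustrate why they are hard to tell apart. The sketch below is in Python with made-up parameter values (`delta = 0.5`, unit variances) and hypothetical function names. It draws from a pure heterogeneity model and from a pure state dependence model and reports the correlation between $Y_{ti}$ and $Y_{t-1,i}$ in each.

```python
import math
import random


def corr(a, b):
    """Sample correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate_pairs(n, delta, sigma_mu, rng, periods=20):
    """Draw (Y_{t-1,i}, Y_{ti}) from Y_ti = delta*Y_{t-1,i} + mu_i + eps_ti.

    delta = 0 gives the pure heterogeneity model;
    sigma_mu = 0 gives the pure state dependence model.
    """
    lagged, current = [], []
    for _ in range(n):
        mu = rng.gauss(0.0, sigma_mu) if sigma_mu > 0 else 0.0
        y_prev, y = 0.0, 0.0
        for _ in range(periods):          # early periods act as burn-in
            y_prev = y
            y = delta * y_prev + mu + rng.gauss(0.0, 1.0)
        lagged.append(y_prev)
        current.append(y)
    return lagged, current


rng = random.Random(12345)
het_lag, het_cur = simulate_pairs(20000, delta=0.0, sigma_mu=1.0, rng=rng)
sd_lag, sd_cur = simulate_pairs(20000, delta=0.5, sigma_mu=0.0, rng=rng)
print("heterogeneity only   :", round(corr(het_lag, het_cur), 3))
print("state dependence only:", round(corr(sd_lag, sd_cur), 3))
```

With these particular values both models imply a lag-one correlation of 0.5 in the population (var(mu)/(var(mu)+var(eps)) in the first case, delta in the second), so the first-order correlation alone cannot separate them.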
In this application, time varying exogenous variables (age and education) provide identification. Example follows.

Estimation with no controls for unobserved heterogeneity and initial conditions:

    DEPENDENT VARIABLE (LOGIT TYPE EQUATION): cont_use
    UNCONDITIONAL RESULTS
    LOG ODDS OF CATEGORY 2 RELATIVE TO CATEGORY 1

    RHS. VAR.       COEFFICIENT   STD. ERR.   T-SCORE         FPD          SPD
    one                 1.68868      0.1512    11.168   0.193E+00   -0.205E+04
    posyandus           0.01820      0.0044     4.162   0.141E+01   -0.154E+06
    grade_school        0.30074      0.0664     4.532   0.136E+00   -0.140E+04
    high_school         0.38269      0.0873     4.385   0.254E-01   -0.293E+03
    college             0.65160      0.1258     5.178   0.753E-02   -0.834E+02
    age                -0.06683      0.0030   -22.524   0.747E+01   -0.302E+07
    muslim             -0.11749      0.0710    -1.655   0.172E+00   -0.183E+04
    cont_use_lag        1.55126      0.0481    32.257   0.193E+00   -0.142E+04

Estimation with Controls:

    RESULTS FOR LOGIT-TYPE EQUATION -- NUMBER: 1
    DEPENDENT VARIABLE (LOGIT TYPE EQUATION): cont_use
    UNCONDITIONAL RESULTS
    LOG ODDS OF CATEGORY 2 RELATIVE TO CATEGORY 1

    RHS. VAR.       COEFFICIENT   STD. ERR.   T-SCORE         FPD          SPD
    one                 0.58041      0.4776     1.215   0.506E-02   -0.263E+03
    posyandus           0.03158      0.0134     2.350   0.297E-01   -0.197E+05
    grade_school        0.38268      0.1948     1.965   0.370E-02   -0.200E+03
    high_school         0.29902      0.2368     1.263   0.730E-03   -0.385E+02
    college             0.48432      0.3531     1.372   0.200E-03   -0.101E+02
    age                -0.03249      0.0089    -3.654   0.145E+00   -0.283E+06
    muslim             -0.31757      0.2206    -1.439   0.476E-02   -0.230E+03
    OMEGAcl             0.0       -- NORMALIZED AT ZERO
    OMEGAcl             1.75070      0.3575     4.897  -0.249E-03   -0.244E+02
    OMEGAcl             0.12941      0.3337     0.388   0.336E-02   -0.107E+03
    OMEGAi              0.0       -- NORMALIZED AT ZERO
    OMEGAi              8.78497      2.3058     3.810   0.536E-03   -0.941E-04

Estimation with Controls (continued)

    RESULTS FOR LOGIT-TYPE EQUATION -- NUMBER: 2
    DEPENDENT VARIABLE (LOGIT TYPE EQUATION): cont_use
    UNCONDITIONAL RESULTS
    LOG ODDS OF CATEGORY 2 RELATIVE TO CATEGORY 1

    RHS. VAR.       COEFFICIENT   STD. ERR.   T-SCORE         FPD          SPD
    one                 1.14288      0.2293     4.984   0.114E-02   -0.130E+04
    posyandus           0.02006      0.0065     3.070   0.138E-01   -0.900E+05
    grade_school        0.32652      0.0831     3.929   0.146E-02   -0.107E+04
    high_school         0.38546      0.1017     3.790   0.161E-04   -0.259E+03
    college             0.66607      0.1411     4.721   0.397E-03   -0.814E+02
    age                -0.07099      0.0033   -21.261   0.376E-01   -0.195E+07
    muslim             -0.02557      0.1082    -0.236   0.130E-02   -0.114E+04
    cont_use_lag        1.37790      0.0633    21.774  -0.861E-03   -0.109E+04
    OMEGAcl             0.0       -- NORMALIZED AT ZERO
    OMEGAcl             1.00568      0.2002     5.024  -0.619E-04   -0.250E+03
    OMEGAcl             0.58970      0.0829     7.114   0.140E-02   -0.572E+03
    OMEGAi              0.0       -- NORMALIZED AT ZERO
    OMEGAi              0.51559      0.1148     4.492  -0.123E-02   -0.281E+03

    HETEROGENEITY INFORMATION
    COMMUNITY SPECIFIC DISTRIBUTION
    POINT #    PROBABILITY WEIGHT
       1       0.31240645
       2       0.17598629
       3       0.51160726
    INDIVIDUAL SPECIFIC DISTRIBUTION
    POINT #    PROBABILITY WEIGHT
       1       0.57174255
       2       0.42825745

Basic Model Longitudinal Multinomial Logit with 3 Choices:

$$U_{1ti}=X_{ti}\beta_1+P_{ti}\alpha_1+Z_i\gamma_1+\rho_1\mu_i+\varepsilon_{1ti}$$

$$U_{2ti}=X_{ti}\beta_2+P_{ti}\alpha_2+Z_i\gamma_2+\rho_2\mu_i+\varepsilon_{2ti}$$

$$U_{3ti}=X_{ti}\beta_3+P_{ti}\alpha_3+Z_i\gamma_3+\rho_3\mu_i+\varepsilon_{3ti}$$

Individual i at time t makes choice 3 (for example) if $U_{3ti}>U_{1ti}$ and $U_{3ti}>U_{2ti}$, which occurs with probability

$$P(U_{3ti}>U_{1ti}\ \text{and}\ U_{3ti}>U_{2ti})$$

If we assume that the $\varepsilon$'s follow independent extreme value distributions and impose the restriction that

$$\beta_1=\alpha_1=\gamma_1=\rho_1=0$$

so that the probabilities sum to one, then:

$$\ln\left[\frac{P(Y_{ti}=k\mid\mu_i)}{P(Y_{ti}=1\mid\mu_i)}\right]=X_{ti}\beta_k+P_{ti}\alpha_k+Z_i\gamma_k+\rho_k\mu_i$$

for k = 2, 3.

The discrete factor model allows a more general pattern of correlation:

$$\ln\left[\frac{P(Y_{ti}=k\mid\mu_{km})}{P(Y_{ti}=1\mid\mu_{km})}\right]=X_{ti}\beta_k+P_{ti}\alpha_k+Z_i\gamma_k+\mu_{km}$$

for $m=1,2,\dots,M$, where a common set of weights $w_m$ allows for correlation in the $\mu$'s.

Unfortunately, GLLAMM estimates a needlessly restrictive version of the model:
- Parametric: $\rho_2=\rho_3$. If there are more than 3 choices, all $\rho$'s are restricted.
- Non-parametric: $\mu_{2m}=\mu_{3m}$ for all m.

Extension to Multilevel Panel Model:

Parametric:

$$\ln\left[\frac{P(Y_{tij}=k\mid\mu_{ij},\lambda_j)}{P(Y_{tij}=1\mid\mu_{ij},\lambda_j)}\right]=X_{tij}\beta_k+P_{tij}\alpha_k+Z_j\gamma_k+\rho_{1k}\mu_{ij}+\rho_{2k}\lambda_j$$

Semi-parametric:

$$\ln\left[\frac{P(Y_{tij}=k\mid\mu_{km},\lambda_{kn})}{P(Y_{tij}=1\mid\mu_{km},\lambda_{kn})}\right]=X_{tij}\beta_k+P_{tij}\alpha_k+Z_j\gamma_k+\mu_{km}+\lambda_{kn}$$

The empirical example estimates a model with four choices:
- 1 = Non use
- 2 = Temporary Methods (pill, condom, injection)
- 3 = Long Lasting Methods (IUD, sterilization)
- 4 = Traditional Methods

We show the complete results for the most general model and then report partial results for other models:

    DEPENDENT VARIABLE (LOGIT TYPE EQUATION): new_meth
    UNCONDITIONAL RESULTS

    LOG ODDS OF CATEGORY 2 RELATIVE TO CATEGORY 1

    RHS. VAR.       COEFFICIENT   STD. ERR.   T-SCORE         FPD          SPD
    Posyandus           0.01467      0.0102     1.437  -0.233E+00   -0.120E+06
    age                -0.06430      0.0034   -18.883  -0.109E+01   -0.246E+07
    grade_school        0.62999      0.0992     6.348  -0.219E-01   -0.163E+04
    high_school         0.37284      0.1190     3.132  -0.678E-02   -0.351E+03
    college             0.21653      0.1497     1.446  -0.171E-02   -0.908E+02
    muslim              0.75840      0.1621     4.678  -0.300E-01   -0.201E+04
    constant            0.18753      0.3810     0.492  -0.336E-01   -0.227E+04
    Community Heterogeneity
    OMEGAcl             0.0       -- NORMALIZED AT ZERO
    OMEGAcl             0.27663      0.2840     0.974  -0.102E-01   -0.889E+03
    OMEGAcl             1.07322      0.1974     5.437  -0.181E-01   -0.116E+04
    Individual Heterogeneity
    OMEGAi              0.0       -- NORMALIZED AT ZERO
    OMEGAi              1.41424      0.1658     8.530  -0.154E-01   -0.627E+03
    OMEGAi             -1.28669      0.1762    -7.302  -0.182E-01   -0.816E+03

    LOG ODDS OF CATEGORY 3 RELATIVE TO CATEGORY 1

    RHS. VAR.       COEFFICIENT   STD. ERR.   T-SCORE         FPD          SPD
    Posyandus           0.03780      0.0188     2.006  -0.972E-01   -0.699E+05
    age                 0.01783      0.0066     2.702  -0.451E+00   -0.100E+07
    grade_school        0.35402      0.1371     2.582  -0.838E-02   -0.555E+03
    high_school         0.21220      0.1968     1.078  -0.270E-02   -0.130E+03
    college             0.48971      0.2856     1.715  -0.892E-03   -0.433E+02
    muslim             -0.92293      0.2182    -4.229  -0.108E-01   -0.648E+03
    constant            1.59472      0.5112     3.120  -0.134E-01   -0.760E+03
    Community Heterogeneity
    OMEGAcl             0.0       -- NORMALIZED AT ZERO
    OMEGAcl            -2.75334      0.4127    -6.671   0.524E-04   -0.139E+03
    OMEGAcl            -0.57495      0.4030    -1.427  -0.700E-02   -0.430E+03
    Individual Heterogeneity
    OMEGAi              0.0       -- NORMALIZED AT ZERO
    OMEGAi             -3.49818      0.3190   -10.965  -0.898E-03   -0.114E+03
    OMEGAi             -4.50909      0.1863   -24.207  -0.399E-02   -0.153E+03

    LOG ODDS OF CATEGORY 4 RELATIVE TO CATEGORY 1

    RHS. VAR.       COEFFICIENT   STD. ERR.   T-SCORE         FPD          SPD
    Posyandus           0.05196      0.0139     3.734   0.431E-01   -0.845E+05
    age                 0.02778      0.0051     5.405   0.174E+00   -0.949E+06
    grade_school        1.03143      0.2152     4.793   0.271E-02   -0.427E+03
    high_school         1.55313      0.2487     6.246   0.140E-02   -0.117E+03
    college             1.70120      0.2826     6.021   0.535E-03   -0.419E+02
    muslim             -0.72360      0.1498    -4.831   0.375E-02   -0.584E+03
    constant           -4.48901      0.4868    -9.221   0.490E-02   -0.711E+03
    Community Heterogeneity
    OMEGAcl             0.0       -- NORMALIZED AT ZERO
    OMEGAcl            -0.36298      0.3285    -1.105   0.136E-02   -0.167E+03
    OMEGAcl            -0.25288      0.3296    -0.767   0.231E-02   -0.235E+03
    Individual Heterogeneity
    OMEGAi              0.0       -- NORMALIZED AT ZERO
    OMEGAi             -1.15542      0.5734    -2.015   0.681E-03   -0.104E+02
    OMEGAi              0.09344      0.2834     0.330   0.355E-02   -0.517E+03

    HETEROGENEITY INFORMATION
    COMMUNITY SPECIFIC DISTRIBUTION
    POINT #    PROBABILITY WEIGHT
       1       0.25422817
       2       0.25159735
       3       0.49417448
    INDIVIDUAL SPECIFIC DISTRIBUTION
    POINT #    PROBABILITY WEIGHT
       1       0.23493000
       2       0.32619060
       3       0.43887940

Comparison of Posyandu effects across estimation methods:

                                    Coefficient    Std Error    Z statistic
    Multinomial Logit
      2 versus 1                        .01127      .0027824        4.05
      3 versus 1                      .0348281      .0031497       11.06
      4 versus 1                        .03814      .0062596        6.09

    Multinomial Logit with community corrected SE's
      2 versus 1                        .01127      .0042326        2.66
      3 versus 1                      .0348281      .0058663        5.94
      4 versus 1                        .03814      .0081891        4.66

    Parametric Random effects Multilevel Multinomial Logit (GLLAMM restrictions)
      2 versus 1                      .0114064       .005755        1.98
      3 versus 1                      .0348275      .0059394        5.86
      4 versus 1                      .0384119      .0080835        4.75
      Heterogeneity
        Community                    .35132246     .04663305        7.53
        Individual                   2.3408954     .13771143       17.00

    Non-Parametric Random effects Multilevel Multinomial Logit (GLLAMM restrictions)
      2 versus 1                      .0116273      .0059026        1.97
      3 versus 1                          .035      .0060997        5.74
      4 versus 1                      .0385815      .0082056        4.70
      Heterogeneity                Mass points       Weights
        Community                      -1.3963         0.083
                                         .6822        0.3236
                                       -.17666        0.5934
        Individual                     -1.9467        0.2916
                                        2.6613        0.1622
                                        .24876        0.5462

    Non-Parametric Random effects Multilevel Multinomial Logit (Fortran)
      2 versus 1                       0.01467        0.0102        1.44
      3 versus 1                       0.03780        0.0188        2.01
      4 versus 1                       0.05196        0.0139        3.74
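One way to relate the non-parametric GLLAMM rows of the comparison table to the parametric rows is to compute the mean and variance implied by each set of mass points and weights. The sketch below does this in Python; the function name is hypothetical, and reading the parametric heterogeneity rows as variance estimates is an assumption rather than something the table states.

```python
def discrete_moments(points, weights):
    """Mean and variance of a discrete distribution on the given mass points."""
    mean = sum(w * p for p, w in zip(points, weights))
    variance = sum(w * (p - mean) ** 2 for p, w in zip(points, weights))
    return mean, variance


# Mass points and weights copied from the non-parametric GLLAMM rows of the table.
community = ([-1.3963, 0.6822, -0.17666], [0.083, 0.3236, 0.5934])
individual = ([-1.9467, 2.6613, 0.24876], [0.2916, 0.1622, 0.5462])

for label, (points, weights) in (("community", community), ("individual", individual)):
    mean, variance = discrete_moments(points, weights)
    print(label, "mean:", round(mean, 4), "variance:", round(variance, 4))
```

Both means come out at essentially zero, consistent with the GLLAMM normalization, and the implied variances (about 0.33 and 2.29) are close to the parametric heterogeneity estimates of .351 and 2.341.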
