Part 23: Parameter Heterogeneity

Econometric Analysis of Panel Data
William Greene
Department of Economics
Stern School of Business

Econometric Analysis of Panel Data
23. Individual Heterogeneity and Random Parameter Variation

Heterogeneity
Observational: Observable differences across individuals (e.g., choice makers)
Choice strategy: How consumers make decisions - the underlying behavior
Structural: Differences in model frameworks
Preferences: Differences in model ‘parameters’

Parameter Heterogeneity
(1) Regression model: $y_{i,t} = x_{i,t}'\beta_i + \varepsilon_{i,t}$
(2) Conditional probability or other nonlinear model: $f(y_{i,t} \mid x_{i,t}, \beta_i)$
(3) Heterogeneity - how are parameters distributed across individuals?
  (a) Discrete - the population contains a mixture of Q types of individuals.
  (b) Continuous - parameters are part of the stochastic structure of the population.

Distinguish Bayes and Classical
Both depart from the heterogeneous ‘model,’ $f(y_{it} \mid x_{it}) = g(y_{it}, x_{it}, \beta_i)$
What do we mean by ‘randomness’?
  With respect to the information of the analyst (Bayesian)
  With respect to some stochastic process governing ‘nature’ (Classical)
Bayesian: No difference between ‘fixed’ and ‘random’
Classical: Full specification of joint distributions for observed random variables; piecemeal definitions of ‘random’ parameters.
Usually a form of ‘random effects’

Fixed Management and Technical Efficiency in a Random Coefficients Model
Antonio Alvarez, University of Oviedo
Carlos Arias, University of Leon
William Greene, Stern School of Business, New York University

The Production Function Model
Definition: Maximal output, given the inputs
Inputs: Variable factors, quasi-fixed (land)
Form: Log-quadratic - translog
Latent management as an unobservable input
$\ln y_{it} = \alpha + \beta_x \ln x_{it} + \tfrac{1}{2}\beta_{xx}(\ln x_{it})^2 + \beta_m m_i + \tfrac{1}{2}\beta_{mm} m_i^2 + \beta_{xm} \ln x_{it}\, m_i + v_{it}$

Application to Spanish Dairy Farms
N = 247 farms, T = 6 years (1993-1998)

Input   Units                                                 Mean      Std. Dev.   Minimum    Maximum
Milk    Milk production (liters)                              131,108   92,539      14,110     727,281
Cows    # of milking cows                                     22.12     11.27       4.5        82.3
Labor   # man-equivalent units                                1.67      0.55        1.0        4.0
Land    Hectares of land devoted to pasture and crops         12.99     6.17        2.0        45.1
Feed    Total amount of feedstuffs fed to dairy cows (tons)   57,941    47,981      3,924.14   376,732

Translog Production Model
$\ln y_{it} = \ln y_{it}^* - u_{it}$
$\quad = \alpha + \sum_{k=1}^{K} \beta_k \ln x_{itk} + \tfrac{1}{2}\sum_{k=1}^{K}\sum_{l=1}^{K} \beta_{kl} \ln x_{itk} \ln x_{itl} + \beta_m m_i^* + \tfrac{1}{2}\beta_{mm} m_i^{*2} + \sum_{k=1}^{K} \beta_{km} \ln x_{itk}\, m_i^* + v_{it} - u_{it}$
$m_i^*$ is an unobserved, time invariant effect.
$u_{it} = \ln y_{it}^* - \ln y_{it} = \left(\beta_m + \sum_{k=1}^{K} \beta_{km} \ln x_{itk}\right)(m_i^* - m_i) + \tfrac{1}{2}\beta_{mm}\left(m_i^{*2} - m_i^2\right) \ge 0$

Random Coefficients Model
$\ln y_{it} = \left(\alpha + \beta_m m_i^* + \tfrac{1}{2}\beta_{mm} m_i^{*2}\right) + \sum_{k=1}^{K} \left(\beta_k + \beta_{km} m_i^*\right) \ln x_{itk} + \tfrac{1}{2}\sum_{k=1}^{K}\sum_{l=1}^{K} \beta_{kl} \ln x_{itk} \ln x_{itl} + v_{it} - u_{it}$
$\quad = \alpha_i + \sum_{k=1}^{K} \beta_{ki} \ln x_{itk} + \tfrac{1}{2}\sum_{k=1}^{K}\sum_{l=1}^{K} \beta_{kl} \ln x_{itk} \ln x_{itl} + \varepsilon_{it}$
[Chamberlain/Mundlak:] $m_i^* = \sum_{k=1}^{K} \theta_k \ln \bar{x}_{ik} + w_i$
(1) Same random effect appears in each random parameter
(2) Only the first order terms are random

Discrete vs. Continuous Variation
Classical context: Description of how parameters are distributed across individuals
Variation
  Discrete: Finite number of different parameter vectors distributed across individuals
    Mixture is unknown as well as the parameters: Implies randomness from the point of view of the analyst. (Bayesian?)
    Might also be viewed as a discrete approximation to a continuous distribution
  Continuous: There exists a stochastic process governing the distribution of parameters, drawn from a continuous pool of candidates.
Background common assumption: An overarching stochastic process that assigns parameters to individuals

Discrete Parameter Variation
The Latent Class Model
(1) Population is a (finite) mixture of Q types of individuals, q = 1,...,Q. The Q 'classes' are differentiated by ($\beta_q$).
  (a) Analyst does not know class memberships. ('latent.')
  (b) 'Mixing probabilities' (from the point of view of the analyst) are $\pi_1, \ldots, \pi_Q$, with $\sum_{q=1}^{Q} \pi_q = 1$
(2) Conditional density is $P(y_{i,t} \mid \text{class} = q) = f(y_{it} \mid x_{i,t}, \beta_q)$

Latent Classes
A population contains a mixture of individuals of different types (classes)
Common form of the data generating mechanism within the classes
Observed outcome y is governed by the common process $F(y \mid x, \theta_j)$
Classes are distinguished by the parameters, $\theta_j$.

How Finite Mixture Models Work

Find the ‘Best’ Fitting Mixture of Two Normal Densities
$\text{LogL} = \sum_{i=1}^{1000} \log \sum_{j=1}^{2} \pi_j \frac{1}{\sigma_j} \phi\!\left(\frac{y_i - \mu_j}{\sigma_j}\right)$

Maximum Likelihood Estimates
            Class 1                  Class 2
            Estimate   Std. Error    Estimate   Std. Error
$\mu$       7.05737    .77151        3.25966    .09824
$\sigma$    3.79628    .25395        1.81941    .10858
$\pi$       .28547     .05953        .71453     .05953

$\hat{F}(y) = .28547\,\frac{1}{3.79628}\,\phi\!\left(\frac{y - 7.05737}{3.79628}\right) + .71453\,\frac{1}{1.81941}\,\phi\!\left(\frac{y - 3.25966}{1.81941}\right)$

Mixing probabilities .715 and .285

Figure legend: Approximation; Actual Distribution

Application: Shoe Brand Choice
Simulated Data: Stated Choice, 400 respondents, 8 choice situations
3 choice/attributes + NONE
  Fashion = High=1 / Low=0
  Quality = High=1 / Low=0
  Price = 25/50/75/100/125 coded 1,2,3,4,5 then divided by 25.
Heterogeneity: Sex, Age (<25, 25-39, 40+) categorical
Underlying data generated by a 3 class latent class process (100, 200, 100 in classes)
Thanks to www.statisticalinnovations.com (Latent Gold)

A Random Utility Model
Random Utility Model for Discrete Choice Among J alternatives at time t by person i.
$U_{itj} = \alpha_j + \beta' x_{itj} + \varepsilon_{itj}$
$\alpha_j$ = Choice specific constant
$x_{itj}$ = Attributes of choice presented to person (Information processing strategy. Not all attributes will be evaluated. E.g., lexicographic utility functions over certain attributes.)
$\beta$ = ‘Taste weights,’ ‘Part worths,’ marginal utilities
$\varepsilon_{itj}$ = Unobserved random component of utility
Mean = $E[\varepsilon_{itj}] = 0$; Variance = $\mathrm{Var}[\varepsilon_{itj}] = \sigma^2$

The Multinomial Logit Model
Independent type 1 extreme value (Gumbel): $F(\varepsilon_{itj}) = \exp(-\exp(-\varepsilon_{itj}))$
Independence across utility functions
Identical variances, $\sigma^2 = \pi^2/6$
Same taste parameters for all individuals
$\text{Prob}[\text{choice } j \mid i, t] = \dfrac{\exp(\alpha_j + \beta' x_{itj})}{\sum_{m=1}^{J(i,t)} \exp(\alpha_m + \beta' x_{itm})}$

Estimated MNL
Discrete choice (multinomial logit) model
Log likelihood function     -4158.503
Akaike IC = 8325.006        Bayes IC = 8349.289
R2 = 1 - LogL/LogL*         Log-L fncn    R-sqrd    RsqAdj
Constants only              -4391.1804    .05299    .05259

Variable   Coefficient    Standard Error   b/St.Er.   P[|Z|>z]
BF         1.47890473     .06776814        21.823     .0000
BQ         1.01372755     .06444532        15.730     .0000
BP         -11.8023376    .80406103        -14.678    .0000
BN         .03679254      .07176387        .513       .6082

Latent Classes and Random Parameters
Heterogeneity with respect to 'latent' consumer classes
$\Pr(\text{Choice}_i) = \sum_{q=1}^{Q} \Pr(\text{choice}_i \mid \text{class} = q)\,\Pr(\text{class} = q)$
$\Pr(\text{choice}_i \mid \text{class} = q) = \dfrac{\exp(x_{i,\text{choice}}'\beta_{\text{class}})}{\sum_{j} \exp(x_{i,j}'\beta_{\text{class}})}$
$\Pr(\text{class} = q \mid i) = \pi_{i,q}$, e.g., $F_{i,q} = \dfrac{\exp(z_i'\delta_q)}{\sum_{q} \exp(z_i'\delta_q)}$
Simple discrete random parameter variation
$\Pr(\text{choice}_i \mid \beta_i) = \dfrac{\exp(x_{i,\text{choice}}'\beta_i)}{\sum_{j} \exp(x_{i,j}'\beta_i)}$
$\Pr(\beta_i = \beta_q) = F_{i,q} = \dfrac{\exp(z_i'\delta_q)}{\sum_{q} \exp(z_i'\delta_q)}, \quad q = 1, \ldots, Q$
$\Pr(\text{Choice}_i) = \sum_{q=1}^{Q} \Pr(\text{choice} \mid \beta_i = \beta_q)\,\Pr(\beta_q)$

Estimated Latent Class Model
Latent Class Logit Model
Log likelihood function     -3649.132

Variable   Coefficient    Standard Error   b/St.Er.   P[|Z|>z]
Utility parameters in latent class -->> 1
BF|1       3.02569837     .14335927        21.106     .0000
BQ|1       -.08781664     .12271563        -.716      .4742
BP|1       -9.69638056    1.40807055       -6.886     .0000
BN|1       1.28998874     .14533927        8.876      .0000
Utility parameters in latent class -->> 2
BF|2       1.19721944     .10652336        11.239     .0000
BQ|2       1.11574955     .09712630        11.488     .0000
BP|2       -13.9345351    1.22424326       -11.382    .0000
BN|2       -.43137842     .10789864        -3.998     .0001
Utility parameters in latent class -->> 3
BF|3       -.17167791     .10507720        -1.634     .1023
BQ|3       2.71880759     .11598720        23.441     .0000
BP|3       -8.96483046    1.31314897       -6.827     .0000
BN|3       .18639318      .12553591        1.485      .1376
This is THETA(1) in class probability model.
Constant   -.90344530     .34993290        -2.582     .0098
_MALE|1    .64182630      .34107555        1.882      .0599
_AGE25|1   2.13320852     .31898707        6.687      .0000
_AGE39|1   .72630019      .42693187        1.701      .0889
This is THETA(2) in class probability model.
Constant   .37636493      .33156623        1.135      .2563
_MALE|2    -2.76536019    .68144724        -4.058     .0000
_AGE25|2   -.11945858     .54363073        -.220      .8261
_AGE39|2   1.97656718     .70318717        2.811      .0049
This is THETA(3) in class probability model.
Constant   .000000        ......(Fixed Parameter).......
_MALE|3    .000000        ......(Fixed Parameter).......
_AGE25|3   .000000        ......(Fixed Parameter).......
_AGE39|3   .000000        ......(Fixed Parameter).......

Latent Class Elasticities
Elasticity averaged over observations.
Effects on probabilities of all choices in the model (* marks the own-price elasticity):

Attribute is PRICE in choice B1       MNL       LCM
* Choice=B1                           -.889     -.801
  Choice=B2                           .291      .273
  Choice=B3                           .291      .248
  Choice=NONE                         .291      .219
Attribute is PRICE in choice B2
  Choice=B1                           .313      .311
* Choice=B2                           -1.222    -1.248
  Choice=B3                           .313      .284
  Choice=NONE                         .313      .268
Attribute is PRICE in choice B3
  Choice=B1                           .366      .314
  Choice=B2                           .366      .344
* Choice=B3                           -.755     -.674
  Choice=NONE                         .366      .302

Individual Specific Means

A Practical Distinction
Finite Mixture (Discrete Mixture):
  Functional form strategy
  Component densities have no meaning
  Mixing probabilities have no meaning
  There is no question of “class membership”
  The number of classes is uninteresting - enough to get a good fit
Latent Class:
  Mixture of subpopulations
  Component densities are believed to be definable “groups” (Low Users and High Users in Bago d’Uva and Jones application)
  The classification problem is interesting - who is in which class?
  Posterior probabilities, P(class|y,x), have meaning
  Question of the number of classes has content in the context of the analysis

The Latent Class Model
(1) There are Q classes, unobservable to the analyst
(2) Class specific model: $f(y_{it} \mid x_{it}, \text{class} = q) = g(y_{it}, x_{it}, \beta_q)$
(3) Conditional class probabilities (possibly given some information, $z_i$): $P(\text{class} = q \mid z_i, \delta)$
Common multinomial logit form for prior class probabilities:
$P(\text{class} = q \mid z_i, \delta) = \pi_{iq} = \dfrac{\exp(z_i'\delta_q)}{\sum_{q=1}^{Q} \exp(z_i'\delta_q)}, \quad \delta_Q = 0$
Note, if no $z_i$, $\delta_q = \log(\pi_{iq} / \pi_{iQ})$.
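The multinomial logit form for the prior class probabilities can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function name `class_probabilities` is hypothetical, and the reading of `_AGE25` as an "under 25" dummy and `_AGE39` as a "25-39" dummy is an assumption. The coefficient values are the THETA(1) and THETA(2) estimates reported in the Estimated Latent Class Model table, with THETA(3) fixed at zero.

```python
import math

def class_probabilities(z, deltas):
    """Prior class probabilities pi_iq = exp(z'delta_q) / sum_q exp(z'delta_q).

    z      : covariates for individual i (include 1.0 for the constant)
    deltas : Q-1 coefficient vectors; class Q is the base class with delta_Q = 0
    """
    scores = [sum(zk * dk for zk, dk in zip(z, d)) for d in deltas] + [0.0]
    top = max(scores)                      # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# THETA(1), THETA(2) from the latent class table: constant, male, age<25, age 25-39
# (the interpretation of the two age dummies is an assumption).
theta = [
    [-0.90344530, 0.64182630, 2.13320852, 0.72630019],
    [0.37636493, -2.76536019, -0.11945858, 1.97656718],
]
young_male = class_probabilities([1.0, 1.0, 1.0, 0.0], theta)

# With no z_i, delta_q = log(pi_q / pi_Q) reproduces the mixing probabilities.
pi = [0.25, 0.50, 0.25]
constants = [[math.log(p / pi[-1])] for p in pi[:-1]]
recovered = class_probabilities([1.0], constants)
```

The second call checks the note on the slide: with constants only, the logit parameters are just log odds against the base class.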

Estimating an LC Model
Conditional density for each observation is
$P(y_{i,t} \mid x_{i,t}, \text{class} = q) = f(y_{it} \mid x_{i,t}, \beta_q)$
Joint conditional density for $T_i$ observations is
$f(y_{i1}, y_{i2}, \ldots, y_{i,T_i} \mid X_i, \beta_q) = \prod_{t=1}^{T_i} f(y_{it} \mid x_{i,t}, \beta_q)$
($T_i$ may be 1. This is not only a 'panel data' model.)
Maximize this for each class if the classes are known. They aren't. Unconditional density for individual i is
$f(y_{i1}, y_{i2}, \ldots, y_{i,T_i} \mid X_i, z_i) = \sum_{q=1}^{Q} \pi_{iq} \prod_{t=1}^{T_i} f(y_{it} \mid x_{i,t}, \beta_q)$
Log likelihood:
$\text{LogL}(\beta_1, \ldots, \beta_Q, \delta_1, \ldots, \delta_Q) = \sum_{i=1}^{N} \log \sum_{q=1}^{Q} \pi_{iq} \prod_{t=1}^{T_i} f(y_{it} \mid x_{i,t}, \beta_q)$

Estimating Which Class
Prior class probability: $\text{Prob}[\text{class} = q \mid z_i] = \pi_{iq}$
Joint conditional density for $T_i$ observations is
$P(y_{i1}, y_{i2}, \ldots, y_{i,T_i} \mid X_i, \text{class} = q) = \prod_{t=1}^{T_i} f(y_{it} \mid x_{i,t}, \beta_q)$
Joint density for data and class membership is the product
$P(y_{i1}, y_{i2}, \ldots, y_{i,T_i}, \text{class} = q \mid X_i, z_i) = \pi_{iq} \prod_{t=1}^{T_i} f(y_{it} \mid x_{i,t}, \beta_q)$
Posterior probability for class, given the data
$P(\text{class} = q \mid y_{i1}, \ldots, y_{i,T_i}, X_i, z_i) = \dfrac{P(y_i, \text{class} = q \mid X_i, z_i)}{P(y_{i1}, \ldots, y_{i,T_i} \mid X_i, z_i)} = \dfrac{P(y_i, \text{class} = q \mid X_i, z_i)}{\sum_{q=1}^{Q} P(y_i, \text{class} = q \mid X_i, z_i)}$
Use Bayes Theorem to compute the posterior (conditional) probability
$w(q \mid y_i, X_i, z_i) = P(\text{class} = q \mid y_i, X_i, z_i) = \dfrac{\pi_{iq} \prod_{t=1}^{T_i} f(y_{it} \mid x_{i,t}, \beta_q)}{\sum_{q=1}^{Q} \pi_{iq} \prod_{t=1}^{T_i} f(y_{it} \mid x_{i,t}, \beta_q)} = w_{iq}$
Best guess = the class with the largest posterior probability.

‘Estimating’ $\beta_i$
(1) Use $\hat{\beta}_j$ from the class with the largest estimated probability
(2) Probabilistic - in the same spirit as the 'posterior mean'
$\hat{\beta}_i = \sum_{q=1}^{Q} \text{Posterior Prob}[\text{class} = q \mid \text{data}_i]\,\hat{\beta}_q = \sum_{q=1}^{Q} \hat{w}_{iq}\,\hat{\beta}_q$
Note: This estimates $E[\beta_i \mid y_i, X_i, z_i]$, not $\beta_i$ itself.

How Many Classes?
(1) Q is not a 'parameter' - can't 'estimate' Q with $\delta$ and $\beta$
(2) Can't 'test' down or 'up' to Q by comparing log likelihoods.
Degrees of freedom for Q+1 vs. Q classes is not well defined.
(3) Use AKAIKE IC; AIC = -2 logL + 2#Parameters.

Modeling Obesity with a Latent Class Model
Mark Harris, Department of Economics, Curtin University
Bruce Hollingsworth, Department of Economics, Lancaster University
Pushkar Maitra, Department of Economics, Monash University
William Greene, Stern School of Business, New York University

300 Million People Worldwide.
International Obesity Task Force: www.iotf.org

Costs of Obesity
In the US more people are obese than smoke or use illegal drugs
Obesity is a major risk factor for noncommunicable diseases like heart problems and cancer
Obesity is also associated with:
  lower wages and productivity, and absenteeism
  low self-esteem
An economic problem. It is costly to society:
  USA costs are around 4-8% of all annual health care expenditure - US $100 billion
  Canada, 5%; France, 1.5-2.5%; and New Zealand 2.5%

Measuring Obesity
An individual’s weight given their height should lie within a certain range
Body Mass Index (BMI) = Weight (kg) / height (meters)$^2$
World Health Organization guidelines:
  Underweight      BMI < 18.5
  Normal           18.5 ≤ BMI < 25
  Overweight       25 ≤ BMI < 30
  Obese            BMI ≥ 30
  Morbidly Obese   BMI ≥ 40

Two Latent Classes: Approximately Half of European Individuals

Modeling BMI Outcomes
Grossman-type health production function
Health Outcomes = f(inputs)
Existing literature assumes BMI is an ordinal, not cardinal, representation of individuals.
Weight-related health status
Do not assume a one-to-one relationship between BMI levels and (weight-related) health status levels
Translate BMI values into an ordinal scale using WHO guidelines
Preserves underlying ordinal nature of the BMI index but recognizes that individuals within a so-defined weight range are of an (approximately) equivalent (weight-related) health status level

Conversion to a Discrete Measure
Measurement issues: Tendency to under-report BMI
  women tend to under-estimate/report weight;
  men over-report height.
Using bands should alleviate this
Allows focus on discrete ‘at risk’ groups

A Censored Regression Model for BMI
Simple Regression Approach Based on Actual BMI:
  $BMI^* = \beta'x + \varepsilon$, $\varepsilon \sim N[0, \sigma^2]$, $\sigma^2 = 1$
  True BMI = weight proxy is unobserved
Interval Censored Regression Approach
  WT = 0 if $BMI^* < 25$           Normal
       1 if $25 \le BMI^* < 30$    Overweight
       2 if $BMI^* \ge 30$         Obese
Inadequate accommodation of heterogeneity
Inflexible reliance on WHO classification
Rigid measurement by the guidelines

Heterogeneity in the BMI Ranges
Boundaries are set by the WHO, narrowly defined for all individuals
Strictly defined WHO definitions may consequently push individuals into inappropriate categories
We allow flexibility at the margins of these intervals
Following Pudney and Shields (2000) we therefore consider Generalised Ordered Choice models - boundary parameters are now functions of observed personal characteristics

Generalized Ordered Probit Approach
A Latent Regression Model for True BMI
  $BMI_i^* = \beta'x_i + \varepsilon_i$, $\varepsilon_i \sim N[0, \sigma^2]$, $\sigma^2 = 1$
Observation Mechanism for Weight Type
  $WT_i$ = 0 if $BMI_i^* \le 0$                  Normal
           1 if $0 < BMI_i^* \le \mu_i(w_i)$     Overweight
           2 if $\mu_i(w_i) < BMI_i^*$           Obese

Latent Class Modeling
Several ‘types’ or ‘classes’.
Obesity may be due to genetic reasons (the FTO gene) or lifestyle factors
Distinct sets of individuals may have differing reactions to various policy tools and/or characteristics
The observer does not know from the data which class an individual is in.
Suggests a latent class approach for health outcomes (Deb and Trivedi, 2002, and Bago d’Uva, 2005)

Latent Class Application
Two class model (considering FTO gene):
  More classes make class interpretations much more difficult
  Parametric models proliferate parameters
Endogenous class membership: Two classes allow us to correlate the equations driving class membership and observed weight outcomes via unobservables.

Heterogeneous Class Probabilities
$\pi_j$ = Prob(class=j) = governor of a detached natural process. Homogeneous.
$\pi_{ij}$ = Prob(class=j | $z_i$, individual i)
Now possibly a behavioral aspect of the process, no longer “detached” or “natural”
Nagin and Land 1993, “Criminal Careers…

Endogeneity of Class Membership
Class Membership: $C^* = z_i'\delta + u_i$, $C = 1[C^* > 0]$ (Probit)
BMI | Class = 0,1: $BMI^* = \beta_c'x_i + \varepsilon_{c,i}$, BMI group = $OP[BMI^*, \mu(\gamma_c'w_i)]$
Endogeneity: $\begin{pmatrix} u_i \\ \varepsilon_{c,i} \end{pmatrix} \sim N\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho_c \\ \rho_c & 1 \end{pmatrix}\right]$
Bivariate Ordered Probit (one variable is binary). Full information maximum likelihood.

Model Components
x: determines observed weight levels within classes
  For observed weight levels we use lifestyle factors such as marital status and exercise levels
z: determines latent classes
  For latent class determination we use genetic proxies such as age, gender and ethnicity: the things we can’t change
w: determines position of boundary parameters within classes
  For the boundary parameters we have:
  weight-training intensity and age (BMI inappropriate for the aged?)
  pregnancy (small numbers and length of term unknown)

Data
US National Health Interview Survey (2005); conducted by the National Center for Health Statistics
Information on self-reported height and weight levels, BMI levels
Demographic information
Split sample (30,000+) by gender

Outcome Probabilities
Class 0 dominated by normal and overweight probabilities: ‘normal weight’ class
Class 1 dominated by probabilities at top end of the scale: ‘non-normal weight’ class
Unobservables for weight class membership are negatively correlated with those determining weight levels.

Figure panels: Class 1 (Normal, Overweight, Obese); Class 0 (Normal, Overweight, Obese)

Classification (Latent Probit) Model

BMI Ordered Choice Model
Conditional on class membership, lifestyle factors
Marriage comfort factor only for normal class women
Both classes associated with income, education
Exercise effects similar in magnitude
Exercise intensity only important for ‘non-normal’ class
Home ownership only important for ‘non-normal’ class, and negative: result of differing socioeconomic status distributions across classes?

Effects of Aging on Weight Class

Effect of Education on Probabilities

Effect of Income on Probabilities

Inflated Responses in Self-Assessed Health
Mark Harris, Department of Economics, Curtin University
Bruce Hollingsworth, Department of Economics, Lancaster University
William Greene, Stern School of Business, New York University

Introduction
Health sector an important part of developed countries’ economies: E.g., Australia 9% of GDP
To see if these resources are being effectively utilized, we need to fully understand the determinants of individuals’ health levels
To this end much policy, and even more academic research, is based on measures of self-assessed health (SAH) from survey data

SAH vs. Objective Health Measures
Favorable SAH categories seem artificially high.
  60% of Australians are either overweight or obese (Dunstan et al., 2001)
  1 in 4 Australians has either diabetes or a condition of impaired glucose metabolism
  Over 50% of the population has elevated cholesterol
  Over 50% has at least 1 of the “deadly quartet” of health conditions (diabetes, obesity, high blood pressure, high cholesterol)
  Nearly 4 out of 5 Australians have 1 or more long term health conditions (National Health Survey, Australian Bureau of Statistics 2006)
  Australia ranked #1 in terms of obesity rates
Similar results appear for other countries

SAH vs. Objective Health
1. Are these SAH outcomes “over-inflated”?
2. And if so, why, and what kinds of people are doing the over-inflating/misreporting?

HILDA Data
The Household, Income and Labour Dynamics in Australia (HILDA) dataset:
1. a longitudinal survey of households in Australia
2. well tried and tested dataset
3. contains a host of information on SAH and other health measures, as well as numerous demographic variables

Self Assessed Health
“In general, would you say your health is: Excellent, Very good, Good, Fair or Poor?"
Responses 1,2,3,4,5 (we will be using 0,1,2,3,4)
Typically ¾ of responses are “good” or “very good” health; in our data (HILDA) we get 72%
Similar numbers for most developed countries
Does this truly represent the health of the nation?

A Two Class Latent Class Model
True Reporter
Misreporter

Reporter Type Model
$r^* = x_r'\beta_r + \varepsilon_r$
r = 1 if $r^* > 0$     True reporter
    0 if $r^* \le 0$   Misreporter
r is unobserved

Figure labels: Y=4, Y=3, Y=2, Y=1, Y=0

Pr(true, y) = Pr(true) * Pr(y | true)

Mis-reporters choose either good or very good
The response is determined by a probit model
$m^* = x_m'\beta_m + \varepsilon_m$
Figure labels: Y=3, Y=2

Observed Mixture of Two Classes

Pr(y) = Pr(true) Pr(y | true) + Pr(misreporter) Pr(y | misreporter)

Who are the Misreporters?
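The mixture Pr(y) = Pr(true) Pr(y | true) + Pr(misreporter) Pr(y | misreporter) from the preceding slides can be sketched numerically. This is a minimal illustration, not the authors' estimator. It assumes that true reporters follow a standard ordered probit over the five SAH outcomes and that a misreporter answers Y=3 when $m^* > 0$ and Y=2 otherwise (the slides only state that misreporters choose 2 or 3 by a probit). All function names, index values and cut points are hypothetical.

```python
import math

def norm_cdf(a):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def ordered_probit_probs(index, cuts):
    """Pr(y = j | true reporter), j = 0..len(cuts), for ascending cut points."""
    cdf = [0.0] + [norm_cdf(c - index) for c in cuts] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cuts) + 1)]

def sah_probs(r_index, y_index, cuts, m_index):
    """Observed SAH probabilities as a two class mixture.

    r_index : x_r'beta_r, so Pr(true reporter) = Phi(r_index)
    y_index : ordered probit index for true reporters (assumed form)
    m_index : x_m'beta_m, misreporter probit index (Y=3 if m* > 0, assumed)
    """
    p_true = norm_cdf(r_index)
    p_y_true = ordered_probit_probs(y_index, cuts)
    p_three = norm_cdf(m_index)
    p_y_mis = [0.0, 0.0, 1.0 - p_three, p_three, 0.0]
    return [p_true * a + (1.0 - p_true) * b for a, b in zip(p_y_true, p_y_mis)]

# Hypothetical index values and cut points, for illustration only.
cuts = [-1.5, -0.5, 0.5, 1.5]
probs = sah_probs(r_index=0.8, y_index=0.2, cuts=cuts, m_index=0.3)
true_only = ordered_probit_probs(0.2, cuts)
```

Because misreporters never choose outcomes 0, 1 or 4, the mixture only adds probability mass to outcomes 2 and 3, which is the "inflation" the slides describe.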

Priors and Posteriors
M = Misreporter, T = True reporter
Priors: $\Pr(M) = \Phi(-x_r'\beta_r)$, $\Pr(T) = \Phi(x_r'\beta_r)$
Posteriors:
  Noninflated outcomes 0, 1, 4:
  $\Pr(M \mid y = 0, 1, 4) = 0$, $\Pr(T \mid y = 0, 1, 4) = 1$
  Inflated outcomes 2, 3:
  $\Pr(M \mid y = 2) = \dfrac{\Pr(y = 2 \mid M)\Pr(M)}{\Pr(y = 2 \mid M)\Pr(M) + \Pr(y = 2 \mid T)\Pr(T)}$

General Results

Latent Class Efficiency Studies
Battese and Coelli - growing in weather “regimes” for Indonesian rice farmers
Kumbhakar and Orea - cost structures for U.S. Banks
Greene (Health Economics, 2005) - revisits WHO Year 2000 World Health Report

Studying Economic Efficiency in Health Care
Hospital and Nursing Home
Cost efficiency
Role of quality (not studied today)
Agency for Health Research and Quality (AHRQ)

Stochastic Frontier Analysis
logC = f(output, input prices, environment) + v + u
ε = v + u
  v = noise - the usual “disturbance”
  u = inefficiency
Frontier efficiency analysis
  Estimate parameters of model
  Estimate u (to the extent we are able - we use E[u|ε])
  Evaluate and compare observed firms in the sample

Nursing Home Costs
44 Swiss nursing homes, 13 years
Cost, Pk, Pl, output, two environmental variables
Estimate cost function
Estimate inefficiency

Estimated Cost Efficiency

Inefficiency?
Not all agree with the presence (or identifiability) of “inefficiency” in market outcomes data.
Variation around the common production structure may all be nonsystematic and not controlled by management
Implication, no inefficiency: u = 0.
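The step "Estimate u (we use E[u|ε])" on the Stochastic Frontier Analysis slide can be illustrated as follows. The slides do not state the distributions, so this sketch assumes the common normal-half-normal specification (v normal, u half-normal) and uses the Jondrow et al. conditional mean formula adapted to a cost frontier, where ε = v + u. The function name `jlms_cost` and the parameter values are hypothetical.

```python
import math

def norm_pdf(a):
    """Standard normal density."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def norm_cdf(a):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def jlms_cost(eps, sigma_u, sigma_v):
    """E[u | eps] for a cost frontier, eps = v + u.

    Assumes v ~ N(0, sigma_v^2) and u = |U| with U ~ N(0, sigma_u^2).
    Not safeguarded against underflow for extremely negative eps.
    """
    if sigma_u == 0.0:
        return 0.0                               # no inefficiency: u = 0
    sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)
    lam = sigma_u / sigma_v                      # lambda = sigma_u / sigma_v
    sigma_star = sigma_u * sigma_v / sigma
    a = eps * lam / sigma
    return sigma_star * (norm_pdf(a) / norm_cdf(a) + a)

# Hypothetical values: a larger composed error implies more estimated inefficiency.
low = jlms_cost(0.0, sigma_u=0.2, sigma_v=0.1)
high = jlms_cost(0.3, sigma_u=0.2, sigma_v=0.1)
none = jlms_cost(0.3, sigma_u=0.0, sigma_v=0.1)
```

Setting `sigma_u = 0` returns zero for every firm, which corresponds to the "no inefficiency: u = 0" case discussed on the last slide above.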

A Two Class Model
Class 1: With Inefficiency
  logC = f(output, input prices, environment) + $\sigma_v v + \sigma_u u$
Class 2: Without Inefficiency
  logC = f(output, input prices, environment) + $\sigma_v v$, $\sigma_u = 0$
Implement with a single zero restriction in a constrained (same cost function) two class model
Parameterization: $\lambda = \sigma_u / \sigma_v = 0$ in class 2.

LogL = 464 with a common frontier model, 527 with two classes

Random Parameters (Mixed) Models
A General Model Structure
$f(y_{it} \mid x_{it}, \beta_i) = g(y_{it} \mid x_{it}, \beta_i, \theta)$
$\beta_i$ = a set of random parameters = $\beta + u_i$
$f(\beta_i \mid z_i) = h(\beta_i, z_i, \Omega)$
$\theta$ = a set of nonrandom parameters in the density of $y_{it}$
$\Omega$ = a set of parameters in the distribution of $\beta_i$
Typical application "repeated measures" = panel
The "mixed" model
$f(y_{it} \mid x_{it}, z_i, \theta, \Omega) = \int_{\beta_i} f(y_{it} \mid x_{it}, \beta_i, \theta)\, h(\beta_i, z_i, \Omega)\, d\beta_i$
forms the basis of a likelihood function for the observed data.

Mixed Model Estimation
WinBUGS:
  MCMC
  User specifies the model - constructs the Gibbs Sampler/Metropolis Hastings
SAS: Proc Mixed.
  Classical
  Uses primarily a kind of GLS/GMM (method of moments algorithm for loglinear models)
Stata:
  Classical
  Several loglinear models - GLLAMM. Mixing done by quadrature. (Very slow for 2 or more dimensions)
LIMDEP/NLOGIT:
  Classical
  Mixing done by Monte Carlo integration - maximum simulated likelihood
  Numerous linear, nonlinear, loglinear models
Ken Train’s Gauss Code:
  Monte Carlo integration
  Used by many researchers
  Mixed Logit (mixed multinomial logit) model only (but free!)
Programs differ on the models fitted, the algorithms, the paradigm, and the extensions provided to the simplest RPM, $\beta_i = \beta + w_i$.
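The phrase "mixing done by Monte Carlo integration" refers to replacing the integral in the "mixed" model with an average over random draws of $\beta_i$. The sketch below is an illustration under simplifying assumptions: a probit with one regressor, a single observation, and one normally distributed random coefficient $\beta_i = \beta + \sigma w_i$. In that special case the integral has a closed form, which is used only as a check. Function names and parameter values are hypothetical.

```python
import math
import random

def norm_cdf(a):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def simulated_mixed_probit_prob(x, beta, sigma, R=20000, seed=12345):
    """Approximate the integral of Phi(x * beta_i) h(beta_i) d(beta_i)
    by averaging over R draws of beta_i = beta + sigma * w, w ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(R):
        beta_i = beta + sigma * rng.gauss(0.0, 1.0)
        total += norm_cdf(x * beta_i)
    return total / R

def exact_mixed_probit_prob(x, beta, sigma):
    """Closed form for this special case: Phi(x*beta / sqrt(1 + x^2 sigma^2))."""
    return norm_cdf(x * beta / math.sqrt(1.0 + (x * sigma) ** 2))

approx = simulated_mixed_probit_prob(x=1.5, beta=0.4, sigma=0.8)
exact = exact_mixed_probit_prob(x=1.5, beta=0.4, sigma=0.8)
```

In realistic applications no closed form exists, and the simulated average is inserted directly into the log likelihood, which is then maximized (maximum simulated likelihood).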

Modeling Parameter Heterogeneity
Conditional model, linear or nonlinear density:
$f(y_{i,t} \mid x_{i,t}, \beta_i, \theta) = g(y_{i,t}, x_{i,t}, \beta_i, \theta)$
Individual heterogeneity in the means of the parameters
$\beta_i = \beta + \Delta z_i + u_i$, $E[u_i \mid X_i, z_i] = 0$
Heterogeneity in the variances of the parameters
$\mathrm{Var}[u_{i,k} \mid z_i] = \phi_{ik} = \phi_k \exp(z_i'\delta_k)$
$\mathrm{Var}[u_i \mid z_i] = \Phi_i = \mathrm{diag}(\phi_{ik})$
(Different variables in $z_i$ may appear in means and variances.)
Free correlation: $\mathrm{Var}[u_i \mid z_i] = \Sigma_i = \Gamma \Phi_i \Gamma'$, $\Gamma$ = a lower triangular matrix with 1s on the diagonal.

A Mixed Probit Model
Random parameters probit model
$f(y_{it} \mid x_{it}, \beta_i) = \Phi[(2y_{it} - 1)\, x_{it}'\beta_i]$
$\beta_i = \beta + u_i$, $u_i \sim N[0, \Sigma]$, $\Sigma = \Gamma \Lambda^2 \Gamma'$
$\Lambda$ = diagonal matrix of standard deviations
$\Gamma$ = lower triangular matrix, or I if uncorrelated
$\text{LogL}(\beta, \Gamma, \Lambda) = \sum_{i=1}^{N} \log \int_{\beta_i} \prod_{t=1}^{T_i} \Phi[(2y_{it} - 1)\, x_{it}'\beta_i]\; N[\beta, \Gamma \Lambda^2 \Gamma']\, d\beta_i$

Maximum Simulated Likelihood
$\log L(\theta, \Omega) = \sum_{i=1}^{N} \log \int_{\beta_i} \prod_{t=1}^{T_i} f(y_{it} \mid x_{it}, \beta_i, \theta)\, h(\beta_i \mid z_i, \Omega)\, d\beta_i$
$\Omega = \beta, \Delta, \phi_1, \ldots, \phi_K, \delta_1, \ldots, \delta_K, \Gamma$

Simulated Log Likelihood for a Mixed Probit Model
Random parameters probit model
$f(y_{it} \mid x_{it}, \beta_i) = \Phi[(2y_{it} - 1)\, x_{it}'\beta_i]$
$\beta_i = \beta + u_i$, $u_i \sim N[0, \Gamma \Lambda^2 \Gamma']$
$\text{LogL}(\beta, \Gamma, \Lambda) = \sum_{i=1}^{N} \log \int_{\beta_i} \prod_{t=1}^{T_i} \Phi[(2y_{it} - 1)\, x_{it}'\beta_i]\; N[\beta, \Gamma \Lambda^2 \Gamma']\, d\beta_i$
$\text{LogL}_S = \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T_i} \Phi[(2y_{it} - 1)\, x_{it}'(\beta + \Gamma \Lambda v_{ir})]$
We now maximize this function with respect to $(\beta, \Gamma, \Lambda)$.

Application - Doctor Visits
German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods
Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. They can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice. This is a large data set. There are altogether 27,326 observations. The number of observations ranges from 1 to 7.
(Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987). Note, the variable NUMOBS below tells how many observations there are for each person. This variable is repeated in each row of the data for the person.
Variables in the file are
  DOCTOR  = 1(Number of doctor visits > 0)
  HSAT    = health satisfaction, coded 0 (low) - 10 (high)
  DOCVIS  = number of doctor visits in last three months
  HOSPVIS = number of hospital visits in last calendar year
  PUBLIC  = insured in public health insurance = 1; otherwise = 0
  ADDON   = insured by add-on insurance = 1; otherwise = 0
  HHNINC  = household nominal monthly net income in German marks / 10000. (4 observations with income=0 were dropped)
  HHKIDS  = children under age 16 in the household = 1; otherwise = 0
  EDUC    = years of schooling
  AGE     = age in years
  MARRIED = marital status

Estimates of a Mixed Probit Model
Random Coefficients Probit Model
Dependent variable            DOCTOR
Log likelihood function       -16483.96
Restricted log likelihood     -17700.96
Unbalanced panel has 7293 individuals.

Variable   Coefficient    Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Means for random parameters
Constant   -.09594899     .04049528        -2.369     .0178
AGE        .02102471      .00053836        39.053     .0000      43.5256898
HHNINC     -.03119127     .03383027        -.922      .3565      .35208362
EDUC       -.02996487     .00265133        -11.302    .0000      11.3206310
MARRIED    -.03664476     .01399541        -2.618     .0088      .75861817
---------------------------------------------------------------------------
Constant   .02642358      .05397131        .490       .6244
AGE        .01538640      .00071823        21.423     .0000      43.5256898
HHNINC     -.09775927     .04626475        -2.113     .0346      .35208362
EDUC       -.02811308     .00350079        -8.031     .0000      11.3206310
MARRIED    -.00930667     .01887548        -.493      .6220      .75861817

Random Parameters Probit
Variable   Coefficient    Standard Error   b/St.Er.   P[|Z|>z]
Diagonal elements of Cholesky matrix
Constant   .55259608      .05381892        10.268     .0000
AGE        .279052D-04    .00041019        .068       .9458
HHNINC     .03545309      .04094725        .866       .3866
EDUC       .00994387      .00093271        10.661     .0000
MARRIED    .01013553      .00643526        1.575      .1153
Below diagonal elements of Cholesky matrix
lAGE_ONE   .00668600      .00071466        9.355      .0000
lHHN_ONE   -.23713634     .04341767        -5.462     .0000
lHHN_AGE   .09364751      .03357731        2.789      .0053
lEDU_ONE   .01461359      .00355382        4.112      .0000
lEDU_AGE   -.00189900     .00167248        -1.135     .2562
lEDU_HHN   .00991594      .00154877        6.402      .0000
lMAR_ONE   -.04871097     .01854192        -2.627     .0086
lMAR_AGE   -.02059540     .01362752        -1.511     .1307
lMAR_HHN   -.12276339     .01546791        -7.937     .0000
lMAR_EDU   .09557751      .01233448        7.749      .0000

Application: Shoe Brand Choice
Simulated Data: Stated Choice, 400 respondents, 8 choice situations
3 choice/attributes + NONE
  Fashion = High=1 / Low=0
  Quality = High=1 / Low=0
  Price = 25/50/75/100/125 coded 1,2,3,4,5 then divided by 25.
Heterogeneity: Sex, Age (<25, 25-39, 40+) categorical
Underlying data generated by a 3 class latent class process (100, 200, 100 in classes)
Thanks to www.statisticalinnovations.com (Latent Gold and Jordan Louviere)

A Discrete (4 Brand) Choice Model with Heterogeneous and Heteroscedastic Random Parameters

\[
\begin{aligned}
U_{i,1,t} &= \beta_{F,i}\,\mathrm{Fashion}_{i,1,t} + \beta_{Q}\,\mathrm{Quality}_{i,1,t} + \beta_{P,i}\,\mathrm{Price}_{i,1,t} + \varepsilon_{i,1,t} \\
U_{i,2,t} &= \beta_{F,i}\,\mathrm{Fashion}_{i,2,t} + \beta_{Q}\,\mathrm{Quality}_{i,2,t} + \beta_{P,i}\,\mathrm{Price}_{i,2,t} + \varepsilon_{i,2,t} \\
U_{i,3,t} &= \beta_{F,i}\,\mathrm{Fashion}_{i,3,t} + \beta_{Q}\,\mathrm{Quality}_{i,3,t} + \beta_{P,i}\,\mathrm{Price}_{i,3,t} + \varepsilon_{i,3,t} \\
U_{i,\mathrm{NONE},t} &= \alpha_{\mathrm{NONE}} + \varepsilon_{i,\mathrm{NONE},t} \\
\beta_{F,i} &= \beta_{F} + \delta_{F}\,\mathrm{Sex}_{i} + \left[\sigma_{F}\exp(\gamma_{F1}\,\mathrm{AgeL25}_{i} + \gamma_{F2}\,\mathrm{Age2539}_{i})\right] w_{F,i}, \quad w_{F,i} \sim N[0,1] \\
\beta_{P,i} &= \beta_{P} + \delta_{P}\,\mathrm{Sex}_{i} + \left[\sigma_{P}\exp(\gamma_{P1}\,\mathrm{AgeL25}_{i} + \gamma_{P2}\,\mathrm{Age2539}_{i})\right] w_{P,i}, \quad w_{P,i} \sim N[0,1]
\end{aligned}
\]

Multinomial Logit Model Estimates

(Results table not preserved in this text.)

Mixed Logit Estimates

+---------------------------------------------+
| Random Parameters Logit Model               |
| Log likelihood function      -3911.945      |
| At start values  -4158.5029  .05929  .05811 |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
 Random parameters in utility functions
 BF          1.46523951      .12626655      11.604    .0000
 BQ          1.14369857      .16954024       6.746    .0000
 Nonrandom parameters in utility functions
 BP        -12.1098155       .91584476     -13.223    .0000
 BN           .17706909      .07784730       2.275    .0229
 Heterogeneity in mean, Parameter:Variable
 BF:MAL       .28052695      .14266576       1.966    .0493
 BQ:MAL      -.42310284      .20387789      -2.075    .0380
 Derived standard deviations of parameter distributions
 NsBF        1.16430284      .13731611       8.479    .0000
 NsBQ        1.81872569      .18108194      10.044    .0000
 Heteroscedasticity in random parameters
 sBF|AG      -.32466344      .16986949      -1.911    .0560
 sBF0|AG     -.51032609      .23975740      -2.129    .0333
 sBQ|AG      -.37953350      .13798031      -2.751    .0059
 sBQ0|AG     -.41636803      .17143046      -2.429    .0151
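The random parameter specification on the model slide can be turned into simulated choice probabilities by drawing the w terms and averaging the logit probabilities over the draws. The following is a minimal sketch under stated assumptions: the function name `mixed_logit_probs`, the dictionary keys, and every numeric value are hypothetical placeholders chosen for illustration (they are not the estimates in the table), and the random coefficients are placed on Fashion and Price as written in the utility equations.

```python
import numpy as np

def mixed_logit_probs(attrs, sex, age_lt25, age_2539, p, n_draws=1000, seed=12345):
    """Simulated choice probabilities for brands 1-3 plus NONE.

    attrs: (3, 3) array; columns are fashion, quality, price for brands 1-3.
    p:     dict of hypothetical parameter values (illustration only).
    """
    rng = np.random.default_rng(seed)
    w_f = rng.standard_normal(n_draws)  # w_F ~ N[0,1]
    w_p = rng.standard_normal(n_draws)  # w_P ~ N[0,1]

    # Heteroscedastic standard deviations: sigma * exp(gamma1*AgeL25 + gamma2*Age2539)
    sd_f = p["sigma_F"] * np.exp(p["gamma_F1"] * age_lt25 + p["gamma_F2"] * age_2539)
    sd_p = p["sigma_P"] * np.exp(p["gamma_P1"] * age_lt25 + p["gamma_P2"] * age_2539)

    # Heterogeneous means shift with Sex; one coefficient per draw
    beta_f = p["beta_F"] + p["delta_F"] * sex + sd_f * w_f
    beta_p = p["beta_P"] + p["delta_P"] * sex + sd_p * w_p

    # Utilities for the three brands, shape (n_draws, 3)
    u_brands = (beta_f[:, None] * attrs[:, 0]
                + p["beta_Q"] * attrs[:, 1]
                + beta_p[:, None] * attrs[:, 2])
    u_none = np.full((n_draws, 1), p["alpha_NONE"])
    u = np.hstack([u_brands, u_none])

    # Logit probabilities per draw (max subtracted for numerical stability)
    u -= u.max(axis=1, keepdims=True)
    probs = np.exp(u)
    probs /= probs.sum(axis=1, keepdims=True)

    # Simulated probability = average over the draws
    return probs.mean(axis=0)

# Hypothetical values, for illustration only
params = {
    "beta_F": 1.0, "delta_F": 0.3, "sigma_F": 1.0, "gamma_F1": -0.3, "gamma_F2": -0.5,
    "beta_P": -10.0, "delta_P": 0.0, "sigma_P": 2.0, "gamma_P1": -0.3, "gamma_P2": -0.4,
    "beta_Q": 1.0, "alpha_NONE": 0.2,
}
attrs = np.array([[1.0, 1.0, 0.12],   # brand 1: fashion, quality, price
                  [0.0, 1.0, 0.04],   # brand 2
                  [1.0, 0.0, 0.08]])  # brand 3
probs = mixed_logit_probs(attrs, sex=1, age_lt25=0, age_2539=1, p=params)
```

In estimation the same average would be computed for each respondent's observed sequence of choices and its logarithm summed over respondents; this sketch covers a single choice situation only.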
Estimated Elasticities

+--------------------------------------------------------------+
| Elasticity             Averaged over observations.           |
| Effects on probabilities of all choices in the model:        |
|                                       RPL     MNL     LCM    |
| Attribute is PRICE in choice B1                              |
| *  Choice=B1       .000    .000      -.818   -.889   -.801   |
|    Choice=B2       .000    .000       .240    .291    .273   |
|    Choice=B3       .000    .000       .244    .291    .248   |
|    Choice=NONE     .000    .000       .241    .291    .219   |
| Attribute is PRICE in choice B2                              |
|    Choice=B1       .000    .000       .291    .313    .311   |
| *  Choice=B2       .000    .000     -1.100  -1.222  -1.248   |
|    Choice=B3       .000    .000       .270    .313    .284   |
|    Choice=NONE     .000    .000       .276    .313    .268   |
| Attribute is PRICE in choice B3                              |
|    Choice=B1       .000    .000       .287    .366    .314   |
|    Choice=B2       .000    .000       .326    .366    .344   |
| *  Choice=B3       .000    .000      -.647   -.755   -.674   |
|    Choice=NONE     .000    .000       .311    .366    .302   |
+--------------------------------------------------------------+

Conditional Estimators

Counterpart to Bayesian posterior mean and variance

\[
\hat{\Omega} = \arg\max_{\Omega} \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T_i} P_{ijt}(\beta_{ir} \mid \Omega, \mathrm{data}_{it})
\]
\[
\hat{L}_{i,r} = \prod_{t=1}^{T_i} \hat{P}_{ijt}(\hat{\beta}_{i,r} \mid \hat{\Omega}, \mathrm{data}_{it}), \qquad
\hat{w}_{i,r} = \frac{\hat{L}_{i,r}}{(1/R)\sum_{r=1}^{R} \hat{L}_{i,r}}
\]
\[
\hat{E}[\beta_{i,k} \mid \mathrm{data}_i]
= \frac{(1/R)\sum_{r=1}^{R} \hat{\beta}_{i,k,r} \prod_{t=1}^{T_i} \hat{P}_{ijt}(\hat{\beta}_{i,r} \mid \hat{\Omega}, \mathrm{data}_{it})}
       {(1/R)\sum_{r=1}^{R} \prod_{t=1}^{T_i} \hat{P}_{ijt}(\hat{\beta}_{i,r} \mid \hat{\Omega}, \mathrm{data}_{it})}
= \frac{1}{R}\sum_{r=1}^{R} \hat{w}_{i,r}\, \hat{\beta}_{i,k,r}
\]
\[
\hat{E}[\beta_{i,k}^{2} \mid \mathrm{data}_i]
= \frac{(1/R)\sum_{r=1}^{R} \hat{\beta}_{i,k,r}^{2} \prod_{t=1}^{T_i} \hat{P}_{ijt}(\hat{\beta}_{i,r} \mid \hat{\Omega}, \mathrm{data}_{it})}
       {(1/R)\sum_{r=1}^{R} \prod_{t=1}^{T_i} \hat{P}_{ijt}(\hat{\beta}_{i,r} \mid \hat{\Omega}, \mathrm{data}_{it})}
= \frac{1}{R}\sum_{r=1}^{R} \hat{w}_{i,r}\, \hat{\beta}_{i,k,r}^{2}
\]
\[
\widehat{\mathrm{Var}}[\beta_{i,k} \mid \mathrm{data}_i] = \hat{E}[\beta_{i,k}^{2} \mid \mathrm{data}_i] - \left(\hat{E}[\beta_{i,k} \mid \mathrm{data}_i]\right)^{2}
\]

$\hat{E}[\beta_{i,k} \mid \mathrm{data}_i] \pm 2\sqrt{\widehat{\mathrm{Var}}[\beta_{i,k} \mid \mathrm{data}_i]}$ will encompass 95% of any reasonable distribution.

Individual E[β_i | data_i] Estimates

(Figure not preserved in this text.)

Disaggregated Parameters

The description of classical methods as only producing aggregate results is obviously untrue.

As regards “targeting specific groups…” both of these sets of methods produce estimates for the specific data in hand. Unless we want to trot out the specific individuals in this sample to do the analysis and marketing, any extension is problematic. This should be understood in both paradigms.

NEITHER METHOD PRODUCES ESTIMATES OF INDIVIDUAL PARAMETERS, CLAIMS TO THE CONTRARY NOTWITHSTANDING. BOTH PRODUCE ESTIMATES OF THE MEAN OF THE CONDITIONAL (POSTERIOR) DISTRIBUTION OF POSSIBLE PARAMETER DRAWS CONDITIONED ON THE PRECISE SPECIFIC DATA FOR INDIVIDUAL I.

Appendix: EM Algorithm

The EM Algorithm

Latent Class is a 'missing data' model.
$d_{i,j} = 1$ if individual i is a member of class j.
If $d_{i,j}$ were observed, the complete data log likelihood would be
\[
\log L_c = \sum_{i=1}^{N} \log \left[ \sum_{j=1}^{J} d_{i,j} \prod_{t=1}^{T_i} f(y_{i,t} \mid \mathrm{data}_{i,t}, \mathrm{class} = j) \right]
\]
(Only one of the J terms would be nonzero.)
The Expectation-Maximization algorithm has two steps:
(1) Expectation Step: Form the 'expected log likelihood' given the data and a prior guess of the parameters.
(2) Maximization Step: Maximize the expected log likelihood to obtain a new guess for the model parameters.
(E.g., http://crow.ee.washington.edu/people/bulyko/papers/em.pdf)

Implementing EM

Given initial guesses $\pi_{iq}^{0} = \pi_{i1}^{0}, \pi_{i2}^{0}, \ldots, \pi_{iQ}^{0}$ and $\beta_{q}^{0} = \beta_{1}^{0}, \beta_{2}^{0}, \ldots, \beta_{Q}^{0}$.
E.g., use 1/Q for each $\pi_{iq}$ and the MLE of β from a one class model. (Each one has to be perturbed slightly: if all $\pi_{iq}$ are equal and all $\beta_q$ are the same, the model already satisfies the FOC.)
(1) Compute $\hat{w}(q \mid i)$ = posterior class probabilities, using $\hat{\beta}^{0}, \hat{\delta}^{0}$. Reestimate each $\beta_q$ using a weighted log likelihood:
\[
\text{Maximize wrt } \beta_q: \quad \sum_{i=1}^{N} \hat{w}_{iq} \sum_{t=1}^{T_i} \log f(y_{it} \mid x_{it}, \beta_q)
\]
(2) Reestimate $\pi_{iq}$ by reestimating $\delta_q$.
If no $z_i$: new $\hat{\pi}_q = (1/N)\sum_{i=1}^{N} \hat{w}(q \mid i)$, using old $\hat{\pi}$ and new $\hat{\beta}$.
If $z_i$:
\[
\text{Maximize wrt } \delta_q: \quad \sum_{i=1}^{N} \hat{w}(q \mid i) \log \frac{\exp(z_i'\delta_q)}{\sum_{q=1}^{Q} \exp(z_i'\delta_q)}
\]
Now, return to step 1. Iterate until convergence.

Appendix: Monte Carlo Integration

Monte Carlo Integration

(1) The integral is of the form
\[
K = \int_{\text{range of } v} g(v \mid \mathrm{data}, \beta)\, f(v \mid \Omega)\, dv
\]
where f(v) is the density of random variable v, possibly conditioned on a set of parameters Ω, and g(v | data, β) is a function of data and parameters.
(2) By construction, K(Ω) = E[g(v | data, β)].
(3) Strategy:
a. Sample R values from the population of v using a random number generator.
b. Compute the average $\bar{K} = (1/R)\sum_{r=1}^{R} g(v_r \mid \mathrm{data}, \beta)$.
By the law of large numbers, plim $\bar{K}$ = K.

Monte Carlo Integration

\[
\frac{1}{R}\sum_{r=1}^{R} f(u_{ir}) \;\xrightarrow{P}\; \int_{u_i} f(u_i)\, g(u_i)\, du_i = E_{u_i}[f(u_i)]
\]
(Certain smoothness conditions must be met.)
Drawing $u_{ir}$ by 'random sampling': $u_{ir} = t(v_{ir})$, $v_{ir} \sim U[0,1]$.
E.g., $u_{ir} = \sigma\Phi^{-1}(v_{ir}) + \mu$ for $N[\mu, \sigma^{2}]$.
Requires many draws, typically hundreds or thousands.

Example: Monte Carlo Integral

\[
\int_{-\infty}^{\infty} \Phi(x_1 + .9v)\,\Phi(x_2 + .9v)\,\Phi(x_3 + .9v)\, \frac{\exp(-v^{2}/2)}{\sqrt{2\pi}}\, dv
\]
where Φ is the standard normal CDF and $x_1 = .5$, $x_2 = -.2$, $x_3 = .3$. (Looks like a RE probit model.)
The weighting function for v is the standard normal.
Strategy: Draw R (say 1000) standard normal random draws, $v_r$. Compute the 1000 functions $\Phi(x_1 + .9v_r)\Phi(x_2 + .9v_r)\Phi(x_3 + .9v_r)$ and average them.
(Based on 100, 1000, 10000 draws, I get .28746, .28437, .27242.)

Generating a Random Draw

The most common approach is the "inverse probability transform."
Let u = a random draw from the standard uniform (0,1).
Let x = the desired population to draw from.
Assume the CDF of x is F(x).
The random draw is then $x = F^{-1}(u)$.
Example: exponential, θ. $f(x) = \theta\exp(-\theta x)$, $F(x) = 1 - \exp(-\theta x)$. Equate u to F(x): $x = -(1/\theta)\log(1-u)$.
Example: Normal(μ, σ). The inverse function does not exist in closed form. There are good polynomial approximations to produce a draw from N[0,1] from a U(0,1). Then x = μ + σv.
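The inverse probability transform and the example integral above can be sketched together in a few lines. This is a minimal illustration, not the author's implementation: the function names are hypothetical, Python's `random.Random` is assumed as the source of uniform draws, and `statistics.NormalDist().inv_cdf` stands in for the polynomial approximation to the inverse normal CDF.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def exponential_draw(u, theta):
    """Inverse transform for F(x) = 1 - exp(-theta * x)."""
    return -math.log(1.0 - u) / theta

def normal_draw(u, mu=0.0, sigma=1.0):
    """x = mu + sigma * v, where v = inverse standard normal CDF of u."""
    return mu + sigma * STD_NORMAL.inv_cdf(u)

def mc_integral(n_draws, seed=1):
    """Average of Phi(x1+.9v) * Phi(x2+.9v) * Phi(x3+.9v) over normal draws v."""
    rng = random.Random(seed)
    x = (0.5, -0.2, 0.3)
    total = 0.0
    for _ in range(n_draws):
        u = rng.random()
        while u == 0.0:          # inv_cdf requires 0 < u < 1
            u = rng.random()
        v = normal_draw(u)
        total += math.prod(STD_NORMAL.cdf(xj + 0.9 * v) for xj in x)
    return total / n_draws

estimate = mc_integral(20000)
```

With a few thousand draws the estimate should settle near the values quoted for the example; the exact digits depend on the generator and the seed.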
This leaves the question of how to draw the U(0,1).

Drawing Uniform Random Numbers

Computer generated random numbers are not random; they are Markov chains that look random.
The Original Random Number Generator for 32 bit computers. SEED originates at some large odd number.

    d3 = 2147483647.0
    d2 = 2147483655.0
    d1 = 16807.0
    SEED = Mod(d1*SEED, d3)   ! MOD(a,p) = a - INT(a/p) * p
    X = SEED/d2 is a random value between 0 and 1.

Problems:
(1) Short period. Based on 32 bits, so it recycles after 2^31 - 1 values.
(2) Evidently not very close to random. (Recent tests have discredited this RNG.)

L’Ecuyer’s RNG

Define:
    norm = 2.328306549295728e-10,
    m1 = 4294967087.0, m2 = 4294944443.0,
    a12 = 1403580.0, a13n = 810728.0,
    a21 = 527612.0, a23n = 1370589.0.
Initialize:
    s10 = the seed, s11 = 4231773.0, s12 = 1975.0,
    s20 = 137228743.0, s21 = 98426597.0, s22 = 142859843.0.
Preliminaries for each draw (resets at least some of 5 seeds):
    p1 = a12*s11 - a13n*s10, k = int(p1/m1), p1 = p1 - k*m1
    if p1 < 0, p1 = p1 + m1
    s10 = s11, s11 = s12, s12 = p1;
    p2 = a21*s22 - a23n*s20, k = int(p2/m2), p2 = p2 - k*m2
    if p2 < 0, p2 = p2 + m2
    s20 = s21, s21 = s22, s22 = p2;
Compute the random number:
    u = norm*(p1 - p2) if p1 > p2,
    u = norm*(p1 - p2 + m1) otherwise.
Passes all known randomness tests. Period = 2^191.
Pierre L'Ecuyer. Canada Research Chair in Stochastic Simulation and Optimization. Département d'informatique et de recherche opérationnelle, University of Montreal.

Quasi-Monte Carlo Integration Based on Halton Sequences

Coverage of the unit interval is the objective, not randomness of the set of draws.
Halton sequences: Markov chain.
p = a prime number, r = the sequence of integers, decomposed as
\[
r = \sum_{i=0}^{I} b_i p^{i}, \qquad H(r \mid p) = \sum_{i=0}^{I} b_i p^{-i-1}, \qquad r = r_1, \ldots \ (\text{e.g., } 10, 11, 12, \ldots)
\]
For example, using base p = 5, the integer r = 37 has $b_0 = 2$, $b_1 = 2$, and $b_2 = 1$ ($37 = 1 \times 5^{2} + 2 \times 5^{1} + 2 \times 5^{0}$).
Then $H(37 \mid 5) = 2 \times 5^{-1} + 2 \times 5^{-2} + 1 \times 5^{-3} = 0.488$.

Halton Sequences vs. Random Draws

Requires far fewer draws; for one dimension, about 1/10. Accelerates estimation by a factor of 5 to 10.
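
The Halton construction above is short enough to sketch directly. The function names below are hypothetical, and the conversion to normal draws via `statistics.NormalDist().inv_cdf` is an assumption added for illustration. For r = 37 and p = 5 the base-5 digits are 2, 2, 1, so the function returns 2/5 + 2/25 + 1/125 = 0.488.

```python
from statistics import NormalDist

def halton(r, p):
    """Radical inverse of the positive integer r in prime base p.

    Writes r = sum_i b_i * p**i and returns sum_i b_i * p**(-i-1).
    """
    h = 0.0
    scale = 1.0 / p
    while r > 0:
        r, digit = divmod(r, p)   # peel off the lowest base-p digit
        h += digit * scale
        scale /= p
    return h

def halton_normal_draws(n, p, start=10):
    """n quasi-random N[0,1] draws from consecutive Halton points r = start, start+1, ..."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(halton(r, p)) for r in range(start, start + n)]

h37 = halton(37, 5)
draws = halton_normal_draws(100, 5)
```

Starting the sequence at r = 10 follows the "e.g., 10, 11, 12, ..." on the slide; with several dimensions a different prime would be used for each one.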