DISCRETE CHOICE MODELING IN HEALTH ECONOMICS COPYRIGHT OF: ABHINAV ANAND JYOTI ARORA SHRADDHA RAMSWAMY INTRODUCTION Studies suggest that self rated health score is a reliable predictor of health status We investigate impact of a host of personal and status characteristics such as age, gender etc on the health perception of US Citizens DATA Dataset : NHANES Epidemiological Followup Study :1992 Health status , represented by Yi coded as follows POOR FAIR GOOD • Yi = 1 • Yi = 2 • Yi = 3 VERY GOOD EXCELLENT • Yi = 5 • Yi = 4 Age is measured in years, education is measured in terms of number of years of schooling completed and dichotomous variable is created for gender (female = 1) and race (black = 1). METHODOLOGY 1) Ordered Logit model for the first part of our enquiry. 2) Sequential Logit Model for the second part of our enquiry ORDERED LOGIT MODEL SPECIFICATION A multinomial choice model where the values taken by the dependent variable takes a natural order. Yi* is latent variable such that 5 Yi = j when αj-1 < Yi*< αj where j= 1,2,3,4,5 and Yi* = β’Xi + ui where u follows logistic distribution. α1 α2 α3 α4 ORDERED LOGIT MODEL Where F( ) is a cdf and j = 1,2,3,4,5 and i is theith individual We assume that u follows logistic distribution CONTD…. P(Yi =1/Xi ) = F [ α1 – β’X] P(Yi = 2/Xi) = F [α2 – β’X] – F[α1-β’X] P(Yi = 3/Xi) = F [α3 – β’X] – F [α2 -β’X] P(Yi = 4/Xi) = F [α4– β’X] – F [α3 – β’X] P(Yi = 5/Xi) = 1– F [α4-β’X] Where F ( ) is defined as above. For estimating the model we specify 5 dummy variables for the ith individual with the following rule Zij = 1 if Yi = j where j = 1,2,3,4,5. = 0 otherwise ORDERED LOGIT MODEL ESTIMATION Using MLE 5 5 Assuming independent observations, we get 3712 5 F Using Newton Raphson formula. F RESULTS COMMAND: proc logistic data = sasuser.nhanes descending; model health = age gender race edu south; run; Analysis of Maximum Likelihood Estimates Parameter Intercept 5 Intercept 4 Intercept 3 Intercept 2 Age gender race edu south DF 1 1 1 1 1 1 1 1 1 Estimate -1.446 0.1255 1.6139 3.138 -0.0313 0.00989 -0.2122 0.1553 -0.7989 Standard Error 0.2473 0.2463 0.2479 0.2539 0.00262 0.0605 0.0669 0.0114 0.1072 Chi-Square 34.1904 0.2598 42.3953 152.7003 143.3251 0.0267 10.0676 184.097 55.5218 Pr > Chisq <.0001 0.6103 <.0001 <.0001 <.0001 0.8701 0.0015 <.0001 <.0001 ..contd Odds Ratio Estimate Point Effect Estimate Age 0.969 95% Wald Confidence Limit 0.964 0.897 1.137 race 0.709 0.922 edu 1.168 south 0.45 1.142 0.365 Percent Concordant 65.8 Somers'D 0.322 Percent Discordant 33.6 Gamma 0.324 Percent Tied 0.6 Tau-a 0.244 Pairs 522179 9 c 0.974 gender1.01 0.809 Association of Predicted Probabilities and Observed Responses 1.194 0.555 0.661 Probability estimate for ith individual (-1.4460+β’Xi) 1 (-1.4460+β’Xi) 2 (0.1225+β’Xi) (0.1225+β’Xi) (1.6139+β’Xi) 3 (1.6139+β’Xi) (3.138+β’Xi) 4 (3.138+β’Xi) 5 1 (-1.4460+β’Xi) (-1.4460+β’Xi) (0.1225+β’Xi) (0.1225+β’Xi) (1.6139+β’Xi) (1.6139+β’Xi) (3.138+β’Xi) (3.138+β’Xi) INFERENCE (ORDERED LOGIT) One additional year of age results in a 3.13% decreases in odds ratio of higher self rating. The impact of gender is almost negligible. Blacks are 19.12% less likely than whites to rate their health at higher response values An additional year of schooling leads to 16.80% increase in odds ratio higher self rating The Southern residents in each district are 55% less likely than the northern to rate their health at higher response values. There are 522179 pairs of observations Of these 65.8% are concordant pairs while 33.6% are discordant pairs. SEQUENTIAL LOGIT MODEL Choices/Responses follow a sequence, so we need (m-1) latent variables to characterize (m) unordered choices. Self-rated health measure can be considered as a purely cardinal variable following a sequence instead of some natural ordering. This allows us to perform discrete choice analysis using (non-ordered) sequential logit model. SEQUENTIAL LOGIT MODEL Framework •Five choices, and hence we have 4 latent variables to describe the choices. •Choices in each step are independent of the previous step. Root (Sample) Poor (1) Fair (2) Fair+++ (2 or 3 or 4 or 5) Good++ (3 or 4 or 5) Good (3) VeryGood+ (4 or 5) Very Good (4) Excellent (5) Probability Computation Example P (Yi = 2) = P [Yi ≠ 1 and Yi = 2 |Yi ≠ 1] = P [Yi ≠ 1] P [Yi = 2|Yi ≠ 1 ] Therefore, for an individual i the conditional probability that his self-rated health measure will have a value j є {1,2,3,4,5} will be given by : Pij= P (Yi = j |Xi ) and so on till j = 5 ESTIMATION IN SEQUENTIAL LOGIT MODEL One-shot joint optimization with Independent Examples Maximum Likelihood Estimation •Thus, the parameter β1 can be estimated by dividing the entire sample into two groups Poor Fair OR Good OR Very Good OR Excellent •β2 can be estimated by first taking the sub-sample of those did not report poor into two groups Fair Good OR Very Good OR Excellent •β3 can be estimated by taking the sub-sample of those who didn’t report poor or fair into two groups Good Repeated Optimization Very Good OR Excellent •β4 can be estimated by taking the sub-sample of those who didn’t report poor or fair or good into two groups Very Good In each case the binary models can be estimated by logit using MLE. Excellent SEQUENTIAL LOGIT MODEL Implementation in SAS data seqlogit; Among those who report fair or good or very good or excellent health, the odds of reporting fair (rather than good++) are 64% lower among residents south of baseline than residents north of baseline of the same age, gender, education and race. set seqlogit; fairplus = (shm>1); fair = (shm=2); if fairplus = 1; run; proc format; value shm 1='poor' 2-5='fair+++'; value gender 0='male' 1='female'; value race 0='white' 1='black'; value resid 0='north' 1='south'; run; proc qlim data=seqlogit; *covest=qml; class race resid gender; endogenous fair ~ discrete(dist=logistic order=formatted); model fair = age gender race edu resid; format gender gender. race race. resid resid.; run; The QLIM Procedure Parameter Estimates Standard Estimate Error Parameter Intercept Age t Value Pr > |t| -0.9028 0.40898 -2.21 0.0273 0.031085 0.004264 7.29 <.0001 -0.03239 0.098606 -0.33 0.7426 Gender female Gender male 0 . . . Race black 0.12122 0.10717 1.13 0.258 Race white 0 . . . -0.15498 0.018192 -8.52 <.0001 Edu Resid south -1.03592 0.142367 -7.28 <.0001 Resid north 0 . . . CONCLUSION Ordered Logit Model Sequential Logit Model • Age, race, education (in terms of number of years of schooling ), and having residence in southern part of the district have a significant impact on self rated health. • Gender doesn’t have a significant impact. • Age, education ( in terms of schooling) and having residence in southern part of the district have a significant impact on self rated health. • Gender and race don’t have significant impact. REFERENCES • Agresti A. Categorical Data Analysis, Second edition. New York: John Wiley & Sons; 2002 • Gardiner J C. , Luo Z. Logit Models in Practice: B, C, E, G, M, N, O… SAS Institute Inc. ; 2011