Spatial Discrete Choice Models Professor William Greene Stern School of Business, New York University Spatial Correlation Spatially Autocorrelated Data Per Capita Income in Monroe County, New York, USA The Hypothesis of Spatial Autocorrelation Spatial Discrete Choice Modeling: Agenda Linear Models with Spatial Correlation Discrete Choice Models Spatial Correlation in Nonlinear Models Basics of Discrete Choice Models Maximum Likelihood Estimation Spatial Correlation in Discrete Choice Binary Choice Ordered Choice Unordered Multinomial Choice Models for Counts Linear Spatial Autocorrelation ( x i) W( x i) ε , N observations on a spatially arranged variable W ' contiguity matrix;' Wii 0 W must be specified in advance. It is not estimated. spatial autocorrelation parameter, -1 < < 1. E[ε]=0, Var[ε]=2 I ( x i) [I W]1 ε = Spatial "moving average" form E[x ]=i, Var[x ]=2 [(I W)(I W)]-1 Testing for Spatial Autocorrelation Spatial Autocorrelation y Xβ Wε. E[ε|X]=0, Var[ε|X]=2 I E[y|X]=Xβ Var[y|X] = 2 2 WW A Generalized Regression Model Spatial Autoregression in a Linear Model y Wy + Xβ ε. E[ε|X]=0, 2 Var[ε|X]= I y [I W]1 (Xβ ε) [I W]1 Xβ [I W]1 ε E[y|X]=[I W]1 Xβ -1 Var[y|X] = [(I W) (I W)] 2 Complications of the Generalized Regression Model Potentially very large N – GPS data on agriculture plots Estimation of . There is no natural residual based estimator Complicated covariance structure – no simple transformations Panel Data Application E.g., N countries, T periods (e.g., gasoline data) y it x it β c i it ε t Wε t v t = N observations at time t. Similar assumptions Candidate for SUR or Spatial Autocorrelation model. Spatial Autocorrelation in a Panel Alternative Panel Formulations Pure space-recursive - dependence pertains to neighbors in period t-1 y i,t [Wy t 1 ] i regression + it Time-space recursive - dependence is pure autoregressive and on neighbors in period t-1 y i,t y i,t-1 + [Wy t 1 ] i regression + it Time-space simultaneous - dependence is autoregressive and on neighbors in the current period y i,t y i,t-1 + [Wy t ] i regression + it Time-space dynamic - dependence is autoregressive and on neighbors in both current and last period y i,t y i,t-1 + [Wy t ] i + [Wy t 1 ] i regression + it Analytical Environment Generalized linear regression Complicated disturbance covariance matrix Estimation platform Generalized least squares Maximum likelihood estimation when normally distributed disturbances (still GLS) Discrete Choices Land use intensity in Austin, Texas – Intensity = 1,2,3,4 Land Usage Types in France, 1,2,3 Oak Tree Regeneration in Pennsylvania Number = 0,1,2,… (Many zeros) Teenagers physically active = 1 or physically inactive = 0, in Bay Area, CA. Discrete Choice Modeling Discrete outcome reveals a specific choice Underlying preferences are modeled Models for observed data are usually not conditional means Generally, probabilities of outcomes Nonlinear models – cannot be estimated by any type of linear least squares Discrete Outcomes Discrete Revelation of Underlying Preferences Binary choice between two alternatives Unordered choice among multiple alternatives Ordered choice revealing underlying strength of preferences Counts of Events Simple Binary Choice: Insurance Redefined Multinomial Choice Fly Ground Multinomial Unordered Choice - Transport Mode Health Satisfaction (HSAT) Self administered survey: Health Care Satisfaction? (0 – 10) Continuous Preference Scale Ordered Preferences at IMDB.com Counts of Events Modeling Discrete Outcomes “Dependent Variable” typically labels an outcome No quantitative meaning Conditional relationship to covariates No “regression” relationship in most cases The “model” is usually a probability Simple Binary Choice: Insurance Decision: Yes or No = 1 or 0 Depends on Income, Health, Marital Status, Gender Multinomial Unordered Choice - Transport Mode Decision: Which Type, A, T, B, C. Depends on Income, Price, Travel Time Health Satisfaction (HSAT) Self administered survey: Health Care Satisfaction? (0 – 10) Outcome: Preference = 0,1,2,…,10 Depends on Income, Marital Status, Children, Age, Gender Counts of Events Outcome: How many events at each location = 0,1,…,10 Depends on Season, Population, Economic Activity Nonlinear Spatial Modeling Discrete outcome yit = 0, 1, …, J for some finite or infinite (count case) J. i = 1,…,n t = 1,…,T Covariates xit . Conditional Probability (yit = j) = a function of xit. Two Platforms Random Utility for Preference Models Outcome reveals underlying utility Binary: u* = ’x y = 1 if u* > 0 Ordered: u* = ’x y = j if j-1 < u* < j Unordered: u*(j) = ’xj , y = j if u*(j) > u*(k) Nonlinear Regression for Count Models Outcome is governed by a nonlinear regression E[y|x] = g(,x) Probit and Logit Models Prob(yi 1 or 0|xi ) = F(θxi ) or [1- F(θxi )] Implied Regression Function Estimated Binary Choice Models: The Results Depend on F(ε) LOGIT Variable EXTREME Estimate VALUE t-ratio Estimate t-ratio -0.42085 -2.662 -0.25179 -2.600 0.00960 0.078 X1 0.02365 7.205 0.01445 7.257 0.01878 7.129 X2 -0.44198 -2.610 -0.27128 -2.635 -0.32343 -2.536 X3 0.63825 8.453 0.38685 8.472 0.52280 8.407 Constant Estimate PROBIT t-ratio Log-L -2097.48 -2097.35 -2098.17 Log-L(0) -2169.27 -2169.27 -2169.27 Effect on Predicted Probability of an Increase in X1 + 1 (X1+1) + 2 (X2) + 3 X3 (1 is positive) Estimated Partial Effects vs. Coefficients Applications: Health Care Usage German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods Variables in the file are Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. They can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice. This is a large data set. There are altogether 27,326 observations. The number of observations ranges from 1 to 7. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987). (Downloaded from the JAE Archive) DOCTOR HOSPITAL HSAT DOCVIS HOSPVIS PUBLIC ADDON HHNINC HHKIDS EDUC AGE FEMALE EDUC = 1(Number of doctor visits > 0) = 1(Number of hospital visits > 0) = health satisfaction, coded 0 (low) - 10 (high) = number of doctor visits in last three months = number of hospital visits in last calendar year = insured in public health insurance = 1; otherwise = 0 = insured by add-on insurance = 1; otherswise = 0 = household nominal monthly net income in German marks / 10000. (4 observations with income=0 were dropped) = children under age 16 in the household = 1; otherwise = 0 = years of schooling = age in years = 1 for female headed household, 0 for male = years of education An Estimated Binary Choice Model An Estimated Ordered Choice Model An Estimated Count Data Model 210 Observations on Travel Mode Choice CHOICE MODE TRAVEL AIR .00000 TRAIN .00000 BUS .00000 CAR 1.0000 AIR .00000 TRAIN .00000 BUS .00000 CAR 1.0000 AIR .00000 TRAIN .00000 BUS 1.0000 CAR .00000 AIR .00000 TRAIN .00000 BUS .00000 CAR 1.0000 INVC 59.000 31.000 25.000 10.000 58.000 31.000 25.000 11.000 127.00 109.00 52.000 50.000 44.000 25.000 20.000 5.0000 ATTRIBUTES INVT TTME 100.00 69.000 372.00 34.000 417.00 35.000 180.00 .00000 68.000 64.000 354.00 44.000 399.00 53.000 255.00 .00000 193.00 69.000 888.00 34.000 1025.0 60.000 892.00 .00000 100.00 64.000 351.00 44.000 361.00 53.000 180.00 .00000 GC 70.000 71.000 70.000 30.000 68.000 84.000 85.000 50.000 148.00 205.00 163.00 147.00 59.000 78.000 75.000 32.000 CHARACTERISTIC HINC 35.000 35.000 35.000 35.000 30.000 30.000 30.000 30.000 60.000 60.000 60.000 60.000 70.000 70.000 70.000 70.000 An Estimated Unordered Choice Model Maximum Likelihood Estimation Cross Section Case Binary Outcome Random Utility: y* = x + Observed Outcome: y = 1 if y* > 0, 0 if y* 0. Probabilities: P(y=1|x) = Prob(y* > 0|x ) = Prob( > -x) P(y=0|x) = Prob(y* 0|x ) = Prob( -x ) Likelihood for the sample = joint probability = Log Likelihood = i n i n 1 1 Prob(y=y i|x i ) logProb(y=y i|x i ) Cross Section Case y1 y Prob 2 yn x1 x 2 x n or > x1 ) or > x 2 ) ... or > x n ) j | x1 1 or > j | x2 or > Prob 2 ... ... j | x n n or > = Prob(1 Prob(2 Prob(n We operate on the marginal probabilities of n observations Log Likelihoods for Binary Choice Models Logl(|X, y )= i 1 logF 2y i 1 x i Probit t 1 F(t) = (t) exp( t 2 / 2)dt 2 n t (t)dt Logit exp(t) F(t) = (t) = 1 exp(t) Spatially Correlated Observations Correlation Based on Unobservables y 1 x1 u1 y 2 x 2 u2 ... y n x n un 0 u1 1 u 2 W 2 ~ f 0 , 2 WW ... ... ... 0 un n W = the usual spatial weight matrix. In the cross section case, W= I. Now, it is a full matrix. The joint probably is a single n fold integral. Spatially Correlated Observations Correlated Utilities y 1* y 1* x1 1 x1 1 * * y 2 W y 2 x 2 2 I W 1 x 2 2 ... ... ... ... * * x n n yn y n x n n W = the usual spatial weight matrix. In the cross section case, W = I. Now, it is a full matrix. The joint probably is a single n fold integral. Log Likelihood In the unrestricted spatial case, the log likelihood is one term, LogL = log Prob(y1|x1, y2|x2, … ,yn|xn) In the discrete choice case, the probability will be an n fold integral, usually for a normal distribution. LogL for an Unrestricted BC Model LogL(|X, y)=log x n ... x1 q 11 1 q 1q 2w 12 q qqw 1 n 2 2 1 2 21 ... ... ... q n n q n q 1w n 1 q n q 2w n 2 ... q 1q nw 1n ... q 2q nw 2n d ... ... ... 1 1 2 ... n q i 1 if y i = 0 and +1 if y i = 1. One huge observation - n dimensional normal integral. Not feasible for any reasonable sample size. Even if computable, provides no device for estimating sampling standard errors. Solution Approaches for Binary Choice Distinguish between private and social shocks and use pseudo-ML Approximate the joint density and use GMM with the EM algorithm Parameterize the spatial correlation and use copula methods Define neighborhoods – make W a sparse matrix and use pseudo-ML Others … Pseudo Maximum Likelihood Smirnov, A., “Modeling Spatial Discrete Choice,” Regional Science and Urban Economics, 40, 2010. Spatial Autoregression in Utilities y* Wy * X , y 1( y* 0) for all n individuals y* (I W) 1 X (I W) 1 (I W) 1 t 0 (W)t assumed convergent = A = D + A -D where D = diagonal elements y* AX D A - D Private Social Suppose individuals ignore the social "shocks." Then nj 1aij x j Prob[y i 1 or 0 | X] F (2y i 1) di , probit or logit. Pseudo Maximum Likelihood Assumes away the correlation in the reduced form Makes a behavioral assumption Requires inversion of (I-W) Computation of (I-W) is part of the optimization process - is estimated with . Does not require multidimensional integration (for a logit model, requires no integration) GMM Pinske, J. and Slade, M., (1998) “Contracting in Space: An Application of Spatial Statistics to Discrete Choice Models,” Journal of Econometrics, 85, 1, 125-154. Pinkse, J. , Slade, M. and Shen, L (2006) “Dynamic Spatial Discrete Choice Using One Step GMM: An Application to Mine Operating Decisions”, Spatial Economic Analysis, 1: 1, 53 — 99. y*=Xθ+, = Wε+u = [I-W]1u = Au Cross section case: =0 Probit Model: FOC for estimation of is based on the ˆi = y i E [ | y i ] generalized residuals u (y i (x i ))(x i ) x i 1 i (x )[1 (x )] = 0 i i Spatially autocorrelated case: Moment equations are still valid. Complication is computing the variance of the moment n equations, which requires some approximations. GMM y*=Xθ+, = Wε+u = [I-W]1u = Au Autocorrelated Case: 0 Probit Model: FOC for estimation of is based on the ˆi = y i E [ | y i ] generalized residuals u i n 1 x i x i y i a ii () a ii () zi =0 x i 1 x i a ii () a ( ) ii Requires at least K +1 instrumental variables. GMM Approach Spatial autocorrelation induces heteroscedasticity that is a function of Moment equations include the heteroscedasticity and an additional instrumental variable for identifying . LM test of = 0 is carried out under the null hypothesis that = 0. Application: Contract type in pricing for 118 Vancouver service stations. Copula Method and Parameterization Bhat, C. and Sener, I., (2009) “A copula-based closed-form binary logit choice model for accommodating spatial correlation across observational units,” Journal of Geographical Systems, 11, 243–272 Basic Logit Model y *i x i i , y i 1[y i* 0] (as usual) Rather than specify a spatial weight matrix, we assume [1 , 2 ,..., n ] have an n-variate distribution. Sklar's Theorem represents the joint distribution in terms of the continuous marginal distributions, (i ) and a copula function C[u1=(1 ) ,u2 (2 ) ,...,un (n ) | ] Copula Representation Model Likelihood Parameterization Other Approaches Case A (1992) Neighborhood influence and technological change. Economics 22:491–508 Beron KJ, Vijverberg WPM (2004) Probit in a spatial context: a monte carlo analysis. In: Anselin L, Florax RJGM, Rey SJ (eds) Advances in spatial econometrics: methodology, tools and applications. Springer, Berlin Case (1992): Define “regions” or neighborhoods. No correlation across regions. Produces essentially a panel data probit model. Beron and Vijverberg (2003): Brute force integration using GHK simulator in a probit model. Others. See Bhat and Sener (2009). Ordered Probability Model y* βx , we assume x contains a constant term y 0 if y* 0 y = 1 if 0 < y* 1 y = 2 if 1 < y* 2 y = 3 if 2 < y* 3 ... y = J if J-1 < y* J In general : y = j if j-1 < y* j , j = 0,1,...,J -1 , o 0, J , j-1 j, j = 1,...,J Outcomes for Health Satisfaction A Spatial Ordered Choice Model Wang, C. and Kockelman, K., (2009) Bayesian Inference for Ordered Response Data with a Dynamic Spatial Ordered Probit Model, Working Paper, Department of Civil and Environmental Engineering, Bucknell University. Core Model: Cross Section y *i βx i i , y i = j if j 1 y i* j , Var[i ] 1 Spatial Formulation: There are R regions. Within a region y *ir βx ir ui ir , y ir = j if j 1 y ir* j Spatial heteroscedasticity: Var[ir ] r2 Spatial Autocorrelation Across Regions u = Wu + v , v ~ N[0,v2 I] u = (I-W) 1 v ~ N[0,v2 {(I-W)(I-W)} 1 ] The error distribution depends on 2 parameters, v2 and Estimation Approach: Gibbs Sampling; Markov Chain Monte Carlo Dynamics in latent utilities added as a final step: y*(t)=f[y*(t-1)]. OCM for Land Use Intensity OCM for Land Use Intensity Estimated Dynamic OCM Unordered Multinomial Choice Core Random Utility Model Underlying Random Utility for Each Alternative U(i,j) = j x ij ij , i = individual, j = alternative Preference Revelation Y(i) = j if and only if U(i,j) > U(i,k) for all k j Model Frameworks Multinomial Probit: [1 ,..., J ] ~ N[0, ] Multinomial Logit: [1 ,..., J ] ~ iid type I extreme value Multinomial Unordered Choice - Transport Mode Decision: Which Type, A, T, B, C. Depends on Income, Price, Travel Time Spatial Multinomial Probit Chakir, R. and Parent, O. (2009) “Determinants of land use changes: A spatial multinomial probit approach, Papers in Regional Science, 88, 2, 328-346. Utility Functions, land parcel i, usage type j, date t U(i,j,t)=jt x ijt ik ijt Spatial Correlation at Time t ij l 1 w il lk n Modeling Framework: Normal / Multinomial Probit Estimation: MCMC - Gibbs Sampling Modeling Counts Canonical Model Rathbun, S and Fei, L (2006) “A Spatial Zero-Inflated Poisson Regression Model for Oak Regeneration,” Environmental Ecology Statistics, 13, 2006, 409-426 Poisson Regression y = 0,1,... exp() j Prob[y = j|x] = j! Conditional Mean = exp(x) Signature Feature: Equidispersion Usual Alternative: Various forms of Negative Binomial Spatial Effect: Filtered through the mean i = exp(x i +i ) i = m 1 w im m i n