Hybrid Choice Modeling of New Technologies for Car Use in Canada: A Simulated Maximum Likelihood Approach

PRELIMINARY. Please do not quote or cite without the authors’ permission.

by Denis Bolduc (1), Nathalie Boucher, Ricardo Alvarez-Daziano

Département d’Économique, Université Laval, Pavillon J.-A.-DeSève, QC G1V 0A6
(418) 656-5427; Fax: (418) 656-2707
denis.bolduc@ecn.ulaval.ca (1: corresponding author)
nboucher@ecn.ulaval.ca
ricardo.alvarez-daziano.1@ulaval.ca

May 8, 2008

Abstract

In the last decade, a new trend in discrete choice modeling has emerged in which psychological factors are explicitly incorporated in order to enhance the behavioural representation of the choice process. In this context, hybrid choice models expand on standard choice models by including attitudes and perceptions as latent (unobserved) variables. The complete model is composed of a group of structural equations describing the latent variables in terms of observable exogenous variables, and a group of measurement relationships linking latent variables to observable indicators. With a multinomial logit model to describe the choice, the contribution of a given observation to the likelihood function of the full system is an integral of dimension equal to the number of latent variables in the model. Although the estimation of hybrid choice models requires the evaluation of complex multi-dimensional integrals, simulated maximum likelihood is implemented in order to solve the integrated multi-equation model. In practice, we replace the multidimensional integrals with a smooth simulator with good properties, thus leading to a maximum simulated likelihood (MSL) solution. In this paper we apply the hybrid choice modeling framework to data from a survey conducted by the Energy and Materials Research Group (EMRG) (Simon Fraser University, 2002-2003) of virtual personal vehicle choices made by Canadian consumers when faced with technological innovations.
The survey also includes a complete list of indicators, allowing us to apply a hybrid choice model formulation. In our application, the choice model is assumed to be of the logit mixture type, which increases the computational complexity in the evaluation of the likelihood function. The choice model contains variables that are normally used in a travel choice model specification as well as two transportation-related latent variables that we call EC (environmental concern) and ACF (appreciation of car features). Our framework also makes it possible to tackle the problem of measurement errors in variables in a very natural way. Our initial results indicate that the quality of the information regarding income was questionable. We therefore include in the choice model a third latent variable that we call REV, which represents the latent true income variable. We conclude that the hybrid choice model is genuinely capable of adapting to practical situations by including latent variables among the set of explanatory variables. Incorporating perceptions and attitudes, as well as allowing for measurement errors, leads to more realistic models and gives a better description of the profile of consumers and their adoption of new private transportation technologies.

1 Introduction

In the current context of global warming, it is more relevant than ever to explore sustainable solutions to problems created by personal transportation. Technological innovation and the use of cleaner alternative fuels are among the solutions that have traditionally been proposed to contain, or at least alleviate, these problems. It therefore becomes important to model the effect of introducing new technologies for car use on transportation choice behaviour in a more realistic and expanded way.
The modeling challenge is to develop transportation demand models that are able not only to adequately predict individual preferences but also to recognize the impact of psychological factors, such as perceptions and attitudes. In this paper we analyze the practical implementation of hybrid choice models in order to include perceptions and attitudes in a standard discrete choice setting. In section 2, we discuss the derivation of discrete choice models and their integration into a hybrid model framework using latent variables. In section 3, we present the empirical data on car choice that we use from the survey. In section 4, we discuss the results of a multinomial logit (MNL) model estimation focusing exclusively on the car choice dimension. In section 5, we expand this model, emphasizing the results of each partial model that makes up the hybrid model setting. In section 6, we present the main conclusions of our work.

2 Hybrid Choice Modeling

2.1 Standard Discrete Choice Modeling

In discrete choice modeling, the most common approach is based on random utility theory. According to this theory, each individual has a utility function associated with each of the alternatives. This individual function can be divided into a systematic part, which considers the effect of the explanatory variables, and a random part that takes into account all the effects not included in the systematic part of the utility function. In other words, choices are modeled using a structural equation (1), the utility function, representing the individual preferences, where the explanatory variables are the alternative attributes and individual characteristics. The observed choice corresponds to the alternative which maximizes the individual utility function, a process represented by a measurement equation (2). Because the utility function has a random nature, the output of the model actually corresponds to the probability of individual n choosing alternative i.
The set of equations describing the standard discrete choice setting is given by:

$$U_{in} = X_{in}\beta + \upsilon_{in} \tag{1}$$

$$y_{in} = \begin{cases} 1 & \text{if } U_{in} \geq U_{jn},\ \forall j \neq i \\ 0 & \text{otherwise,} \end{cases} \tag{2}$$

where $U_{in}$ corresponds to the utility of alternative $i$ as perceived by individual $n$; $X_{in}$ is a row vector of attributes of alternative $i$ and socioeconomic characteristics of individual $n$; $\beta$ is a column vector of unknown parameters; $\upsilon_{in}$ is an error term and $y_{in}$ corresponds to an indicator of whether alternative $i$ is chosen by individual $n$ or not.

Different choice models can be derived depending on the assumptions considered for the distribution of the random error term (Ben-Akiva and Lerman, 1985). So far the workhorses in this area have been the multinomial logit model (McFadden, 1974) and the nested logit model (Ben-Akiva, 1973). Both offer closed forms for choice probabilities but rely on restrictive simplifying assumptions. To gain generality, more flexible models have been incorporated in practice. Indeed, one powerful modeling alternative is the logit mixture model (Bolduc and Ben-Akiva, 1991; Brownstone and Train, 1999), which can approximate any random utility maximization model (McFadden and Train, 2000). The main idea of this kind of model is to consider more than one random component, allowing for the presence of a more flexible covariance structure. The estimation implies the evaluation of integrals without a closed form solution, although it is possible to use computer aided simulation techniques to approximate them.

Discrete choice modeling has thus evolved quickly, and powerful models are now available. However, under this whole approach, discrete choice models represent the decision process as an obscure black box, where attitudes, perceptions and knowledge are neglected. This is why in the last decade discrete choice modeling has evolved towards an explicit incorporation of psychological factors affecting decision making (Ben-Akiva et al., 2002).
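As a small illustration of equations (1) and (2): when the error terms are i.i.d. Type 1 extreme value, the choice probabilities take the closed multinomial logit form $\exp(X_{in}\beta) / \sum_j \exp(X_{jn}\beta)$. The sketch below computes these probabilities in plain Python. The attribute matrix and the parameter values are hypothetical and chosen only for illustration; they are not estimates from this paper.

```python
import math

def mnl_probabilities(X, beta):
    """Logit probabilities exp(X_i b) / sum_j exp(X_j b) for one individual."""
    # Systematic utilities V_i = X_i * beta
    v = [sum(x_k * b_k for x_k, b_k in zip(row, beta)) for row in X]
    v_max = max(v)  # subtract the maximum for numerical stability
    exp_v = [math.exp(v_i - v_max) for v_i in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]

# Hypothetical example: 3 alternatives described by 2 attributes
# (a cost-like attribute and an availability-like attribute).
X = [[1.0, 1.0], [1.2, 0.5], [1.5, 0.2]]
beta = [-0.8, 1.3]  # illustrative values only
probs = mnl_probabilities(X, beta)
```

The probabilities sum to one, and the alternative with the highest systematic utility receives the largest probability.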
According to 2002 Nobel laureate Daniel Kahneman, there still remains a significant difference between economist modellers who develop practical models of decision-making and behavioural scientists who focus on in-depth understanding of agent behaviour. Both have fundamental interests in behaviour but each works with different assumptions and tools. McFadden (1986) points out the need to bridge these worlds by incorporating attitudes in choice models. In his 2000 Nobel lecture, McFadden emphasized the need to incorporate attitudinal constructs in conventional economic models of decision making.

2.2 Latent Variables and Discrete Choice: The Hybrid Choice Model

The new trend in discrete choice modeling is to enhance the behavioural representation of the choice process. As a direct result, the model specification is improved and the model gains in predictive power. Econometrically, the improved representation model, as displayed in Figure 1, involves dealing with a choice model formulation that contains unobserved psychometric variables (attitudes, perceptions, opinions) as latent variables incorporated as explanatory variables. Since the latent variables are not observed, they are normally linked to questions of a survey: the indicators. These indicators can be continuous, binary or categorical variables expressed by responses to attitudinal and perceptual survey questions. The latent variable model is composed of a structural equation, which describes the latent variables in terms of observable exogenous variables, and a group of measurement relationships (the measurement model) linking latent variables to indicators (Jöreskog and Sörbom, 1984). It is possible to integrate the latent variable model into the standard choice model setting, obtaining a group of structural and measurement equations leading to the Hybrid Choice Model (HCM) framework (Ben-Akiva et al., 2002; Walker and Ben-Akiva, 2002).

Figure 1: Latent Variables and Discrete Choice.
The hybrid choice model that we consider may be written as follows:

Structural equations

$$x^*_n = B w_n + \zeta_n, \quad \zeta_n \sim N(0, \Psi), \tag{3}$$

$$U_n = X_n \beta + C x^*_n + \upsilon_n, \tag{4}$$

Measurement equations

$$I_n = \alpha + \Lambda x^*_n + \varepsilon_n, \quad \varepsilon_n \sim N(0, \Theta), \tag{5}$$

$$y_{in} = \begin{cases} 1 & \text{if } U_{in} \geq U_{jn},\ \forall j \neq i \\ 0 & \text{otherwise,} \end{cases} \tag{6}$$

where $x^*_n$ is a $(L \times 1)$ vector of latent variables; $w_n$ is a $(M \times 1)$ vector of explanatory variables affecting the latent variables; $B$ is a $(L \times M)$ matrix of unknown parameters describing the effects of the explanatory variables $w_n$ on the latent variables; and $\Psi$ is a $(L \times L)$ variance covariance matrix which describes the relationship among the latent variables through the error term.

The choice model in equation (4) is written in vector form, where we assume that there are $J$ alternatives. Therefore, $U_n$ is a $(J \times 1)$ vector of utilities and $\upsilon_n$ is a $(J \times 1)$ error vector whose elements are assumed to be independent and identically distributed (i.i.d.) Type 1 extreme value. $X_n$ is a $(J \times K)$ matrix with $X_{in}$ designating the $i$th row. $\beta$ is a $(K \times 1)$ vector of unknown parameters. $C$ is a $(J \times L)$ matrix of unknown parameters associated with the latent variables present in the utility function, with $C_i$ designating the $i$th row of matrix $C$.

In the set of measurement equations, $I_n$ corresponds to a $(R \times 1)$ vector of indicators of latent variables associated with individual $n$; $\alpha$ is a $(R \times 1)$ vector of constants and $\Lambda$ is a $(R \times L)$ matrix of unknown parameters that relate the latent variables to the indicators. The term $\varepsilon_n$ is a $(R \times 1)$ vector of independent error terms. This implies that $\Theta$ is a diagonal matrix with variance terms on the diagonal. Finally, we stack the choice indicators $y_{in}$ into a $(J \times 1)$ vector called $y_n$.

If the latent variables were not present, the choice probability would correspond exactly to the standard choice probability that we denote $P(y_n | X_n, \beta)$.
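To make the roles of equations (3) to (6) concrete, the following sketch simulates one individual from the system: it draws the latent variables, builds the utilities and indicators, and records the chosen alternative. It is only an illustration under stated assumptions: $\Psi$ and $\Theta$ are taken to be diagonal, all indicators are continuous, and every dimension and parameter value (as well as the helper names `matvec`, `gumbel` and `simulate_individual`) is hypothetical.

```python
import math
import random

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def gumbel(rng):
    """One standard Type 1 extreme value draw by inverse transform."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly in (0, 1)
    return -math.log(-math.log(u))

def simulate_individual(B, w, psi_diag, X, beta, C, alpha, Lam, theta_diag, rng):
    # Eq. (3): latent variables x* = B w + zeta (diagonal Psi assumed)
    x_star = [m + rng.gauss(0.0, math.sqrt(v))
              for m, v in zip(matvec(B, w), psi_diag)]
    # Eq. (4): utilities U = X beta + C x* + upsilon
    U = [xb + cx + gumbel(rng)
         for xb, cx in zip(matvec(X, beta), matvec(C, x_star))]
    # Eq. (5): continuous indicators I = alpha + Lambda x* + epsilon
    I = [a + lx + rng.gauss(0.0, math.sqrt(t))
         for a, lx, t in zip(alpha, matvec(Lam, x_star), theta_diag)]
    # Eq. (6): the chosen alternative maximizes utility
    best = max(range(len(U)), key=lambda i: U[i])
    y = [1 if i == best else 0 for i in range(len(U))]
    return x_star, U, I, y

# Hypothetical dimensions: L = 2, M = 2, J = 3, K = 2, R = 3
rng = random.Random(42)
x_star, U, I, y = simulate_individual(
    B=[[0.5, 0.2], [0.1, -0.3]], w=[1.0, 0.0], psi_diag=[1.0, 1.0],
    X=[[1.0, 0.2], [1.2, 0.5], [1.5, 0.9]], beta=[-0.8, 1.3],
    C=[[0.0, 0.4], [0.6, 0.4], [0.9, 0.5]],
    alpha=[0.0, 0.0, 0.0], Lam=[[1.0, 0.0], [0.7, 0.0], [0.0, 1.0]],
    theta_diag=[1.0, 1.0, 1.0], rng=rng)
```

In estimation only `w`, `X`, `I` and `y` would be observed; `x_star` and `U` remain latent, which is why the likelihood has to integrate over the latent variables.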
In a setting with given values for the latent variables $x^*_n$, the choice probability would be represented by $P(y_n | x^*_n, X_n, \theta)$, where $\theta$ contains all the unknown parameters in the choice model of equation (4). Since latent variables are not actually observed, the choice probability is obtained by integrating the latter expression over the whole space of $x^*_n$:

$$P(y_n | X_n, w_n, \theta, B, \Psi) = \int_{x^*_n} P(y_n | x^*_n, X_n, \theta)\, g(x^*_n | B, w_n)\, dx^*_n, \tag{7}$$

which is an integral of dimension equal to the number of latent variables in $x^*_n$ and where $g(x^*_n | B, w_n)$ is the density of $x^*_n$ defined in equation (3).

Indicators are introduced in order to characterize the unobserved latent variables, and econometrically they permit identification of the parameters of the latent variables. Indicators also provide efficiency in estimating the choice model with latent variables, because they add information content. The variables $y_n$ and $I_n$ are assumed to be correlated only via the presence of the latent variables $x^*_n$ in equations (4) and (5). Given our assumptions, the joint probability of observing $y_n$ and $I_n$ may thus be written as:

$$P(y_n, I_n | X_n, w_n, \delta) = \int_{x^*_n} P(y_n | x^*_n, X_n, \theta)\, f(I_n | \Lambda, x^*_n)\, g(x^*_n | B, w_n)\, dx^*_n, \tag{8}$$

where $f(I_n | \Lambda, x^*_n)$ is the density of $I_n$ defined in equation (5). The vector $\delta$ designates the full set of parameters to estimate.

Few applications of the hybrid choice model are found in the literature. While Ben-Akiva and Boccara (1987) develop the idea of hybrid modeling in the context of a more comprehensive travel behaviour framework, Walker and Ben-Akiva (2002) and Morikawa, Ben-Akiva and McFadden (2002) extend this development, showing how the model can be estimated and implemented in practice. The contribution of Bolduc, Ben-Akiva, Walker and Michaud (2005) is the first example of the analysis and implementation of a situation characterized by a large number of latent variables and a large number of choices.
The work of Ben-Akiva, Bolduc and Park (2008) is a recent application of the hybrid choice setting to the freight sector.

2.3 Estimation

In practice, with a large number of latent variables (more than 3), we replace the multidimensional integral with a smooth simulator which has good properties. The number of latent variables has an impact on the computation of the joint probability in equation (9). Indeed, if $\upsilon_n$ is i.i.d. extreme value type 1 distributed, then conditional on $x^*_n$ the probability of choosing alternative $i$ has the multinomial logit form, which leads to the following expression:

$$P(y_n, I_n | X_n, w_n, \delta) = \int_{x^*_n} \frac{\exp(X_{in}\beta + C_i x^*_n)}{\sum_j \exp(X_{jn}\beta + C_j x^*_n)}\, f(I_n | \Lambda, x^*_n)\, g(x^*_n | B, w_n)\, dx^*_n. \tag{9}$$

Taking advantage of the expectation form, we can replace the probability with the following empirical mean:

$$\tilde{P}(y_n, I_n | X_n, w_n, \delta) = \frac{1}{S} \sum_{s=1}^{S} \frac{\exp(X_{in}\beta + C_i x^{*s}_n)}{\sum_j \exp(X_{jn}\beta + C_j x^{*s}_n)}\, f(I_n | \Lambda, x^{*s}_n), \tag{10}$$

where $x^{*s}_n$ corresponds to a random draw $s$ from its distribution. This sum is computed over $S$ draws. This simulator is known to be unbiased, consistent and smooth with respect to the unknown parameters. Replacing $P(y_n, I_n | X_n, w_n, \delta)$ with $\tilde{P}(y_n, I_n | X_n, w_n, \delta)$ in the log likelihood leads to a maximum simulated likelihood (MSL) solution. We therefore consider the following objective function:

$$\max_{\delta} \sum_{n=1}^{N} \ln \tilde{P}(y_n, I_n | X_n, w_n, \delta).$$

In the past few years, a lot of progress has been made regarding MSL estimation. Train (2003) gives an in-depth analysis of the properties of MSL estimators. Recent results, mostly attributable to Bhat (2001), suggest that simulation should exploit Halton draws. Halton type sequences are known to produce simulators with a given level of accuracy using far fewer draws than conventional uniform random draws (Ben-Akiva et al., 2002; Munizaga and Alvarez-Daziano, 2005). Simulated maximum likelihood is now well known and has been applied in numerous circumstances.
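The simulator in equation (10) can be sketched in a few lines of plain Python. The example below is a minimal illustration rather than the estimation code used for this paper: it assumes uncorrelated latent variables (a diagonal $\Psi$), continuous indicators only, and unscrambled Halton draws with one prime base per latent variable. The function names and all numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist

def halton(index, base):
    """index-th element (index >= 1) of the Halton sequence in a prime base."""
    result, f = 0.0, 1.0 / base
    while index > 0:
        result += f * (index % base)
        index //= base
        f /= base
    return result

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def simulated_probability(i, X, beta, C, mean_x, psi_sd, I, alpha, Lam, theta, S=200):
    """Empirical mean of eq. (10) for one individual who chose alternative i.

    mean_x is B w_n, psi_sd holds the standard deviations of the latent
    variables, and theta holds the standard deviations of the indicators.
    """
    primes = [2, 3, 5, 7, 11, 13]  # supports up to 6 latent variables
    inv_cdf = NormalDist().inv_cdf
    total = 0.0
    for s in range(1, S + 1):
        # Draw x*s from g(x* | B, w) by transforming Halton points
        x = [m + sd * inv_cdf(halton(s, primes[l]))
             for l, (m, sd) in enumerate(zip(mean_x, psi_sd))]
        # Logit kernel conditional on the draw
        v = [sum(a * b for a, b in zip(X[j], beta)) +
             sum(c * xl for c, xl in zip(C[j], x)) for j in range(len(X))]
        v_max = max(v)
        exp_v = [math.exp(vj - v_max) for vj in v]
        p_choice = exp_v[i] / sum(exp_v)
        # Density of the continuous indicators conditional on the draw
        f_ind = 1.0
        for r in range(len(I)):
            z = (I[r] - alpha[r] - sum(lam * xl for lam, xl in zip(Lam[r], x))) / theta[r]
            f_ind *= normal_pdf(z) / theta[r]
        total += p_choice * f_ind
    return total / S

# Hypothetical inputs: 3 alternatives, 2 attributes, 2 latent variables, 3 indicators
p_tilde = simulated_probability(
    i=1, X=[[1.0, 0.2], [1.2, 0.5], [1.5, 0.9]], beta=[-0.8, 1.3],
    C=[[0.0, 0.4], [0.6, 0.4], [0.9, 0.5]], mean_x=[0.5, 0.1], psi_sd=[1.0, 1.0],
    I=[0.4, 0.2, -0.1], alpha=[0.0, 0.0, 0.0],
    Lam=[[1.0, 0.0], [0.7, 0.0], [0.0, 1.0]], theta=[1.0, 1.0, 1.0], S=200)
```

Summing `math.log` of this quantity over individuals gives the simulated log likelihood that a numerical optimizer would maximize with respect to the parameters. Keeping the same Halton points across iterations keeps the objective smooth in the parameters.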
The logit probability kernel present in equation (10) makes the simulated log likelihood fairly well behaved. Asymptotically, meaning as $S \to \infty$ and as $N \to \infty$, the solution becomes identical to a solution arising from maximizing the usual log likelihood function: $\sum_{n=1}^{N} \ln P(y_n, I_n | X_n, w_n, \delta)$.

Regarding the measurement model, we assume that each equation that links the indicators and the latent variables corresponds to a continuous, a binary, or a multinomial ordered response. A measurement equation $r$ in the continuous case is given by $I_{rn} = I^*_{rn}$ with:

$$I^*_{rn} = \alpha_r + \Lambda_r x^*_n + \varepsilon_{rn}, \quad \varepsilon_{rn} \sim N(0, \theta_r^2). \tag{11}$$

In the binary case, we instead have:

$$I_{rn} = \begin{cases} 1 & \text{if } I^*_{rn} \geq 0 \\ 0 & \text{otherwise,} \end{cases} \tag{12}$$

while in the multinomial ordered case with $Q$ responses, we obtain:

$$I_{rn} = \begin{cases} 1 & \text{if } \gamma_0 < I^*_{rn} \leq \gamma_1 \\ 2 & \text{if } \gamma_1 < I^*_{rn} \leq \gamma_2 \\ \vdots & \\ Q & \text{if } \gamma_{Q-1} < I^*_{rn} \leq \gamma_Q, \end{cases} \tag{13}$$

where $I_{rn}$ and $\varepsilon_{rn}$ are the $r$th elements of $I_n$ and $\varepsilon_n$ respectively, $\theta_r^2$ is the $r$th element on the diagonal of $\Theta$, and $\Lambda_r$ denotes row $r$ of $\Lambda$. In the multinomial ordered case, the $\gamma_q$'s are estimated. By convention, $\gamma_0$ and $\gamma_Q$ are fixed to $-\infty$ and $\infty$ respectively. We assume that $\Theta$ is diagonal, which implies that the indicators are not cross-correlated. For identification, the constant terms $\alpha_r$ must be set to 0 in the non-continuous cases.

Given our assumptions, the density $f(I_n | \Lambda, x^*_n)$, which we denote as $f(I_n)$ to simplify, corresponds to:

$$f(I_n) = \prod_{r=1}^{R} f(I_{rn}). \tag{14}$$

If measurement equation $r$ is continuous, then

$$f(I_{rn}) = \frac{1}{\theta_r}\, \phi\!\left(\frac{I_{rn} - \alpha_r - \Lambda_r x^*_n}{\theta_r}\right), \tag{15}$$

where $\phi$ denotes the probability density function (pdf) of a standard normal. If measurement equation $r$ corresponds to a binary response, then

$$f(I_{rn}) = \Phi\!\left(\frac{\alpha_r + \Lambda_r x^*_n}{\theta_r}\right)^{I_{rn}} \left(1 - \Phi\!\left(\frac{\alpha_r + \Lambda_r x^*_n}{\theta_r}\right)\right)^{(1 - I_{rn})}, \tag{16}$$

where $\Phi$ denotes the cumulative distribution function (cdf) of a standard normal. Finally, if measurement equation $r$ corresponds to a multinomial ordered response, then

$$f(I_{rn} = q) = \Phi\!\left(\frac{\gamma_q - \Lambda_r x^*_n}{\theta_r}\right) - \Phi\!\left(\frac{\gamma_{q-1} - \Lambda_r x^*_n}{\theta_r}\right). \tag{17}$$

Except for the continuous case, the variances $\theta_r^2$ cannot be estimated. They are fixed to 1 automatically.

3 The Data

The data for this study are from a survey conducted by EMRG (Energy and Materials Research Group, Simon Fraser University) in 2002-2003 on 1500 Canadian consumers living in urban areas with populations over 250,000 people (Horne, Jaccard, and Tiedman, 2005). Survey participants were first contacted in a telephone interview used to personalize a detailed questionnaire that was then mailed to them. The telephone conversation allowed general information to be gathered on the socioeconomic characteristics of the participants and their households, as well as on their automobile fleet. The mailed questionnaire consisted of:

• Part 1: Transportation options, requirements and habits
• Part 2: Virtual personal vehicle choices made by Canadian consumers when faced with technological innovations (stated preferences experiment)
• Part 3: Transportation mode preferences
• Part 4: Views on transportation issues
• Part 5: Additional information (gender, education, income)

The stated preferences (SP) hypothetical car choices in Part 2 considered four vehicle types:

• A conventional vehicle (operating on gasoline or diesel) (SGV)
• A natural-gas vehicle as the alternative fuel vehicle (AFV)
• A hybrid vehicle (gasoline-electric) (HEV)
• A hydrogen fuel cell vehicle (HFC)

The characteristics of the vehicles were depicted as:

• Capital Cost: purchase price
• Operating Cost: fuel cost
• Fuel Available: percentage of stations selling the proper fuel type
• Express Lane Access: whether or not the vehicle would be granted express lane access
• Emissions Data: emissions compared to a standard gasoline vehicle
• Power: power compared to their current vehicle

Under the SP experimental design of the survey, each participant was asked to make up to four consecutive virtual choices, with the features of the vehicles modified after each round.
The sample has 866 completed surveys (77% response rate) and, since each respondent provided up to four vehicle choices, after some clean-up there remain 1877 usable observations. The various values assumed for the characteristics of the vehicles that were used in the experimental design of the SP survey are set out in Table 1. The basis of comparison of the experimental design refers to the motor vehicle that is the most frequently used by the respondent.

4 Multinomial Logit Results

We first apply a standard multinomial logit model, using as variables the list of attributes for the vehicle choice. The choice model is depicted in Figure 2, while in Table 2 we present the estimation results, which are equivalent to those reported by Horne, Jaccard, and Tiedman (2005).

The estimates of the standard multinomial logit model show that the SP experiment was well designed in the sense that it allows one to estimate a significant effect of each experimental attribute. Signs are also correct: costs are perceived as negative attributes, while Fuel Available, Express Lane Access and Power are considered good characteristics. The magnitude of each estimated parameter permits one to rank the impact of an attribute on the choice probability. For example, power appears to be much more important than accessibility to an express lane. The alternative specific constants (ASC) allow for reproducing the experimental market shares.

TABLE 1 SP Design

FIGURE 2 Basic Choice Model.

TABLE 2 Multinomial Logit Results

Parameter                   Estimate   Student t
ASC AFV                       -4.500       -6.81
ASC HEV                       -1.380       -2.18
ASC HFC                       -2.100       -3.26
Capcost                       -0.856       -4.07
Fuelcost                      -0.826       -4.18
Fuelavail                      1.360        7.32
Expaccess                      0.156        2.29
Power                          2.700        4.12

Number of observations          1877
Final log-likelihood        -1984.55
Adjusted rho-square            0.234

5 Hybrid Choice Model

5.1 Definition of the Latent Variables

The first step in building a hybrid choice model is to define the latent variables involved.
In the survey, people answered several questions. We conduct our analysis focusing on:

Question 26: What is your level of support for/opposition to the following government actions that would influence your transportation system? Degree of support: 5 levels from Strongly Opposed to Strongly Supportive.

1. Improving traffic flow by building new roads and expanding existing roads
2. Discouraging automobile use with road tolls, gas taxes and vehicle surcharges
3. Making neighbourhoods more attractive to walkers and cyclists using bike lanes and speed controls
4. Reducing vehicle emissions with regular testing and manufacturer emissions standards
5. Making carpooling and transit faster by giving them dedicated traffic lanes and priority at intersections
6. Making transit more attractive by reducing fares, increasing frequency, and expanding route coverage
7. Reducing transportation distances by promoting mixed commercial, residential and high-density development
8. Reducing transportation needs by encouraging compressed workweeks and working from home

Question 29: Thinking about your daily experiences, how serious do you consider the following problems related to transportation to be? Degree of seriousness of problem: 5 levels from Not a Problem to Major Problem.

1. Traffic congestion you experience while driving
2. Traffic noise you hear at home, work, or school
3. Vehicle emissions which impact local air quality
4. Accidents caused by aggressive or absent-minded drivers
5. Vehicle emissions which contribute to global warming
6. Unsafe communities due to speeding traffic

Question 6: What importance did the following factors have in your family’s decision to purchase this vehicle? Degree of importance: 5 levels from Not at All Important to Very Important.

1. Purchase Price
2. Fuel Economy
3. Horsepower
4. Safety
5. Seating Capacity
6. Reliability
7. Appearance and Styling

Considering answers to those questions as indicators, we identify two travel related latent variables:

• Environmental Concern (EC): related to transportation and its environmental impact ($x^*_{1n}$)
• Appreciation of new car features (ACF): related to car purchase decisions and how important the characteristics of this new alternative are ($x^*_{2n}$)

Our model also includes a third latent variable called REV to account for the measurement error problem in quantifying the income variable.

The hybrid model setting that we consider is represented in Figure 3, where the complete set of structural and measurement equations is sketched, depicting the relationships between explanatory variables and each partial model. Indeed, we can distinguish the choice model, which is centered on the utility function modeling; the latent variables structural model, which links the latent variables with the characteristics of the traveller; and the latent variables measurement model, which links each latent variable with the indicators.

FIGURE 3 Hybrid Choice Model.

5.2 Car Choice Model

The set of equations for the mode choice model alone is given by:

$$U_{SGVn} = V_{SGVn} + c_{1,2}\, ACF_n + \upsilon_{SGVn} \tag{18}$$

$$U_{AFVn} = V_{AFVn} + c_{2,1}\, EC_n + c_{2,2}\, ACF_n + \upsilon_{AFVn} \tag{19}$$

$$U_{HEVn} = V_{HEVn} + c_{3,1}\, EC_n + c_{3,2}\, ACF_n + \upsilon_{HEVn} \tag{20}$$

$$U_{HFCn} = V_{HFCn} + c_{4,1}\, EC_n + c_{4,2}\, ACF_n + c_{4,3}\, REV_n + \upsilon_{HFCn}, \tag{21}$$

where $V_{in} = X_{in}\beta$ denotes the deterministic part of the utility expression given in equation (1). It contains the variables listed in Table 2. The utility specifications also contain the effect of the latent variables. The latent variable related to environmental concern (EC) was not included in the utility of the standard gasoline vehicle (SGV). On the basis of several attempts, the latent income variable (REV) was included only in the hydrogen fuel cell vehicle (HFC) alternative. In Table 4, we report the results of the choice model.
TABLE 4 Car Choice Model Results

                            Hybrid Choice Model        MNL
Parameter                   Estimate   Student t    Estimate   Student t
ASC AFV                       -6.626      -5.162      -4.500       -6.81
ASC HEV                       -4.383      -5.688      -1.380       -2.18
ASC HFC                       -6.403      -8.750      -2.100       -3.26
Capcost                       -0.943      -4.369      -0.856       -4.07
Fuelcost                      -0.849      -3.917      -0.826       -4.18
Fuelavail                      1.384       7.096       1.360        7.32
Expaccess                      0.162       2.229       0.156        2.29
Power                          2.710       3.985       2.700        4.12
Latent Variables
ACF on SGV                     3.160      25.984
EC on AFV                      0.798       2.695
ACF on AFV                     2.810      30.708
EC on HEV                      0.770       3.965
ACF on HEV                     2.810      30.708
EC on HFC                      1.085       5.620
ACF on HFC                     3.054      31.048
REV on HFC                     0.456       3.373

Number of individuals           1877                    1877
Final global function      -57624.26                       -
Adjusted rho-square            0.236                   0.234
Number of Halton draws           500                       -

Parameters shared with the standard multinomial logit model have the same sign and magnitude, except for the alternative specific constants. In addition, the significance of the latent variable parameters shows that they have a significant effect on the individual utilities. The latent variables all enter very significantly and positively into the choice model specification. EC has the highest effect on the HFC alternative.

5.3 Structural Model

The set of equations for the latent variables’ structural model is given by the following expressions, while the estimation results are shown in Table 5:

$$\begin{aligned} EC_n ={}& B_{1,1}\,\text{Intercept} + B_{1,2}\,\text{user car} + B_{1,3}\,\text{user carpool} + B_{1,4}\,\text{user transit} \\ &+ B_{1,5}\,\text{gender} + B_{1,6}\,\text{HighInc5} + B_{1,10}\,\text{EDUC5} + B_{1,11}\,\text{AGE2} \\ &+ B_{1,12}\,\text{AGE3} + B_{1,13}\,\text{AGE4} + \zeta_{1n} \end{aligned} \tag{22}$$

$$\begin{aligned} ACF_n ={}& B_{2,1}\,\text{Intercept} + B_{2,2}\,\text{user car} + B_{2,5}\,\text{gender} + B_{2,6}\,\text{HighInc5} \\ &+ B_{2,10}\,\text{EDUC5} + B_{2,11}\,\text{AGE2} + B_{2,12}\,\text{AGE3} + B_{2,13}\,\text{AGE4} + \zeta_{2n} \end{aligned} \tag{23}$$

$$\begin{aligned} REV_n ={}& B_{3,1}\,\text{Intercept} + B_{3,10}\,\text{EDUC5} + B_{3,11}\,\text{AGE2} + B_{3,12}\,\text{AGE3} \\ &+ B_{3,13}\,\text{AGE4} + \zeta_{3n} \end{aligned} \tag{24}$$

The structural equations link consumers’ characteristics to the latent variables. Based on the results presented in Table 5, we can conclude that environmental concern is more important for transit users than for car pool users. We also observe a negative parameter for driving-alone users.
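As a worked illustration of how the structural equation (22) is read, the sketch below evaluates the systematic part of EC for two hypothetical respondent profiles, using the EC estimates reported in Table 5. The dictionary keys and the two profiles are assumptions introduced only for this example; the omitted categories serve as the reference levels.

```python
# EC column of Table 5 (structural model estimates for equation (22)).
# The key names are illustrative labels, not identifiers from the survey.
EC_COEF = {
    "intercept": 2.434, "user_car": -0.020, "user_carpool": 0.097,
    "user_transit": 0.204, "gender": 0.258, "high_inc": -0.011,
    "educ5": 0.064, "age2": 0.187, "age3": 0.262, "age4": 0.332,
}

def expected_ec(profile):
    """Systematic part of EC (error term set to zero) for 0/1 dummy values."""
    return EC_COEF["intercept"] + sum(EC_COEF[k] * v for k, v in profile.items())

# Hypothetical respondent A: female transit user, university degree, aged 41 to 55
ec_a = expected_ec({"user_transit": 1, "gender": 1, "educ5": 1, "age3": 1})
# Hypothetical respondent B: male driving-alone user in the reference age and education groups
ec_b = expected_ec({"user_car": 1})
```

Under these assumptions respondent A has an expected EC of about 3.222 and respondent B of about 2.414, a gap of roughly 0.81 that is driven mainly by the gender, age and transit-use terms.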
A high education level has a significant positive effect on the REV variable, while the effect is positive but not significant on both the EC and ACF variables. The effect of age on EC increases as individuals get older. The effect of age on ACF is highest for people between 41 and 55 years old. Not surprisingly, the effect of age on REV is small and not significant for people older than 55 years of age.

Note that we also estimate the elements of the covariance matrix. As shown in Table 6, the elements on the diagonal of the $\Psi$ matrix are significant and show the presence of heteroscedasticity. In this version, we assumed that the latent variables are uncorrelated.

TABLE 5 Structural Model Results

                                     EC                    ACF                   REV
Variable                             Estimate  Student t   Estimate  Student t   Estimate  Student t
Intercept                               2.434      9.660     -3.094    -18.495      1.293      7.374
Driving Alone User: user car           -0.020     -0.428     -0.118     -2.244          -          -
Car Pool User: user carpool             0.097      1.135      0.100      0.938          -          -
Transit User: user transit              0.204      2.572          -          -          -          -
Gender (Female Dummy)                   0.258      7.392      0.283      6.474          -          -
High Income Dummy (>80,000$)           -0.011     -0.294     -0.001     -0.017          -          -
Education level 5 (University)          0.064      1.712      0.008      0.170      0.598      9.522
Age level 2 (26 to 40 years)            0.187      2.279      0.328      3.961      0.592      5.363
Age level 3 (41 to 55 years)            0.262      3.105      0.621      6.657      0.889      7.952
Age level 4 (56 years & more)           0.332      3.702      0.525      5.278      0.228      1.840

TABLE 6 Latent Variables Variances

             Estimate   Student t
Var(EC)         0.553      20.934
Var(ACF)        0.708      23.410
Var(REV)        0.449       3.336

5.4 Measurement Model

The latent variable measurement model links the latent variables to the indicators, and a typical equation for this model has the form:

$$q26ind2 = \alpha_{q26ind2} + \lambda_{1,q26ind2}\, EC_n + \lambda_{2,q26ind2}\, ACF_n + \varepsilon_{26ind2}. \tag{25}$$

In this example, we can see that the effects on the second indicator of question 26 are measured using a constant and the two latent variables. We have considered 21 indicators, so it is necessary to specify 21 equations.
Their relations with the latent variables are depicted in Table 7 and the estimation results are displayed in Table 8. As explained before, this model measures the effect of the latent variables on each indicator. Some interesting conclusions can be drawn from the results presented in Table 8. For example, the effect of EC on the indicator related to the support of expanding and upgrading roads is negative. This sign reflects the idea that car priority in the context of rising road capacity is negatively perceived because of its negative impact on the environment. By contrast, the effect of EC on the indicator related to the support of applying road tolls and gas taxes is positive, indicating a perceived positive environmental impact of measures allowing a rational use of the car. Note also that the effect of ACF, the other transport related latent variable considered, on the same indicator is negative although not significant. This sign can be explained by the perceived negative impact of this kind of car use restriction, especially if the user is considering buying a new car. A similar analysis can be done for the other indicators. For example, both transport related latent variables have a positive effect on the support for reducing vehicle emissions with regular testing and manufacturer emissions standards: this measure is perceived as having a positive environmental impact, and it is also positively perceived by consumers as a good attribute of a potential new car. It is worth noting that such results permit us to establish a consumer profile in a way not possible with standard discrete choice models. Using simple questions we can enrich the model and obtain better knowledge about the user’s characteristics and his or her behavioral attitudes and perceptions.
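To show how a five-level indicator responds to the latent variables, the sketch below evaluates the ordered response probabilities of equation (17) for the road tolls and gas taxes indicator. It is a hypothetical illustration: it treats the indicator as an ordered response (the form of equation (25), which includes a constant, suggests the paper may instead treat it as continuous), it uses the Table 8 loadings 0.581 (EC) and -0.091 (ACF), and it assumes threshold values and latent variable values that are not reported in the paper.

```python
import math

def std_normal_cdf(z):
    """Standard normal cdf; returns 0.0 at -inf and 1.0 at +inf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probs(lam_x, thresholds, theta=1.0):
    """P(I = q), q = 1..Q, from eq. (17) with gamma_0 = -inf and gamma_Q = +inf."""
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    return [std_normal_cdf((cuts[q] - lam_x) / theta) -
            std_normal_cdf((cuts[q - 1] - lam_x) / theta)
            for q in range(1, len(cuts))]

LAM_EC, LAM_ACF = 0.581, -0.091    # Table 8 loadings for road tolls & gas taxes
GAMMAS = [-1.0, 0.0, 1.0, 2.0]     # hypothetical thresholds for 5 response levels

def toll_support_probs(ec, acf):
    # alpha_r is set to 0 and theta_r to 1, as required for non-continuous indicators
    return ordered_probs(LAM_EC * ec + LAM_ACF * acf, GAMMAS)

low_ec = toll_support_probs(ec=1.0, acf=0.0)    # hypothetical latent values
high_ec = toll_support_probs(ec=3.0, acf=0.0)
```

With a positive loading on EC, raising EC shifts probability mass toward the Strongly Supportive end of the scale, which is the pattern described in the text above.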
TABLE 7 Indicators and Latent Variables

TABLE 8 Structural Model Covariates

                                                        Estimate   Student t
Question 26
EC on Expanding & Upgrading Roads                         -0.392      -5.405
EC on Road Tolls & Gas Taxes                               0.581       6.600
ACF on Road Tolls & Gas Taxes                             -0.091      -1.389
EC on Bike Lanes & Speed Controls                          0.532       8.507
EC on Reducing Car Emissions                               0.478       7.944
ACF on Reducing Car Emissions                              0.295       7.856
EC on High Occupancy Vehicles & Transit Priorities         0.606       8.143
EC on Improving Transit Service                            0.491       8.352
EC on Promoting Compact Communities                        0.206       2.994
EC on Encouraging Short Work Weeks                         0.396       7.159
Question 29
EC on Traffic Congestion                                   0.735       9.154
EC on Traffic Noise                                        0.901       9.495
EC on Poor Local Air Quality                               1.000           -
ACF on Poor Local Air Quality                             -0.061      -1.416
EC on Accidents Caused by Bad Drivers                      0.837      14.472
EC on Emissions & Global Warming                           1.113      16.200
EC on Speeding Drivers in Neighborhoods                    1.107      15.790
Question 6
ACF on Purchase Price                                     -0.004      -0.087
ACF on Fuel Economy Importance                             0.259       5.008
ACF on Horsepower Importance                               0.433       7.220
ACF on Safety Importance                                   1.000           -
ACF on Seating Capacity Importance                         0.684      11.563
ACF on Reliability Importance                              0.537      16.241
ACF on Styling                                             0.371       6.660
Income Class
REV on rev                                                  1.00           -

6 Conclusion

In the last decade, discrete choice modeling has evolved towards an explicit incorporation of psychological factors affecting decision making. Traditionally, discrete choice models represented the decision process as an obscure black box for which attributes of alternatives and characteristics of individuals were inputs and where the observed choice made by the individual corresponded to the output of the system. The new trend in discrete choice modeling is to enhance the behavioral representation of the choice process. As a direct result, the model specification is improved and the model gains in predictive power.
Econometrically, the improved representation, called the hybrid choice model, involves dealing with a choice model formulation that contains latent psychometric variables among the set of explanatory variables. Since perceptions and attitudes are now incorporated, this leads to more realistic models. In this paper we have described the hybrid choice model, composed of a group of structural equations describing the latent variables in terms of observable exogenous variables, and a group of measurement relationships linking latent variables to certain observable indicators. We have shown that although the estimation of hybrid models requires the evaluation of complex multi-dimensional integrals, simulated maximum likelihood can be successfully implemented and offers an unbiased, consistent and smooth estimator of the true probabilities.

Using real data about virtual personal vehicle choices made by Canadian consumers when faced with technological innovations, we showed that the hybrid choice model is genuinely capable of adapting to practical situations. Indeed, the results provide a better description of the profile of consumers and their adoption of new private transportation technologies. We identified two travel-related dimensions with a significant impact: environmental concern and appreciation of new car features. We also successfully included in the choice model a latent income variable to account for the measurement errors associated with the reported income classes. A valuable feature of the hybrid choice framework is that enriching the model is quite easy: people are used to rating exercises from marketing surveys, and responses to such questions are simple to include in econometric studies.
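The simulation idea summarized in this section, replacing an integral over the latent variables with an average over random draws, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the estimation code used for the paper: it handles a single standard normal latent disturbance, covers only the logit choice probability (the measurement densities of the indicators, which also enter the full likelihood, are omitted), and the names `simulated_choice_prob`, `v` and `gamma` are hypothetical.

```python
import numpy as np

def simulated_choice_prob(v, gamma, chosen, n_draws=500, seed=0):
    """Simulate a logit choice probability by averaging over latent draws.

    v      : (J,) systematic utilities without the latent term (hypothetical values)
    gamma  : (J,) coefficients of the latent variable in each utility
    chosen : index of the chosen alternative
    """
    v = np.asarray(v, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n_draws)               # draws of the latent disturbance
    u = v[None, :] + eta[:, None] * gamma[None, :]   # (R, J) utilities, one row per draw
    u -= u.max(axis=1, keepdims=True)                # numerical stabilization
    p = np.exp(u)
    p /= p.sum(axis=1, keepdims=True)                # logit probabilities for each draw
    return p[:, chosen].mean()                       # average over draws

# Illustrative values only: three alternatives, latent variable enters the first.
v = np.array([0.5, 0.0, -0.3])
gamma = np.array([0.8, 0.0, 0.0])
probs = [simulated_choice_prob(v, gamma, j) for j in range(3)]
```

Because each draw yields a logit probability that is a smooth function of the parameters, the average is smooth as well for fixed draws, which is what makes it usable inside a gradient-based likelihood maximization. When `gamma` is zero the simulator reduces to the plain multinomial logit probability.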
Further research will include enriching the model specification through specification testing – taking advantage of the complete information provided by the survey – and analyzing the results of a prediction and simulation exercise, which will permit us to compare the predictions of a hybrid choice model versus those obtained from a conventional choice model.