Getting Started with Limdep for Windows

CITY AND REGIONAL PLANNING 776
Philip A. Viton
January 11, 2006
Contents
1 Introduction
2 Data preparation
3 Limdep's opening screen
4 Open the spreadsheet data file
5 Data examination
6 Data transformations
7 Projects
8 Setting up the model: dependent variable
9 Setting up the model: independent variables
10 Run the model
11 Examine the results
12 Command files
13 Choice-based sampling
14 The mixed-logit model
15 References
1 Introduction
This note is an illustrated guide to using Limdep/NLogit to estimate a discrete-choice (logit)
model.
The program is available on the computers of the KSOA network. It should also be
available on Civil Engineering computers. A full set of manuals is available in the KSOA
library. There are actually two versions of the program: Limdep and NLogit. For most purposes
they work identically, and datasets are transferable between the two versions; the exception
is some advanced discrete choice models, like the nested logit or mixed logit models (see
section 14). Unless otherwise stated, when I refer to Limdep, I mean Limdep or NLogit.
There is also a free student edition available on the KSOA Faculty (typically X:) drive
at
viton_philip/homework/general/handouts/stat.exe
This will only be accessible if you're enrolled in the School of Architecture: others can stop
by my office with a Zip disk (total capacity 250M or less; the program needs 4M free space,
plus another 8M if you want documentation), or a writeable CD, or jump drive. To install the
student edition, just run the exe file. There's no point in changing the default location,
since that will always be created on your hard disk, no matter what you say.
The student edition is limited to datasets of fewer than 50,000 values, with no individual
variable having more than 1000 observations: in other words, it is suitable for all but very
large analyses. If you have a large dataset, you can develop your analysis at home using a
subset of the data (to get all the procedures right, etc.), then save the commands into a
file, bring them into one of the labs and run the analysis on the large dataset.
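Before cutting a large dataset down for use at home, it can help to check how big a subset the student edition will accept. The following Python sketch encodes the two limits quoted above. It assumes that the number of "values" is simply observations times variables; that reading, and the function names, are illustrative and do not come from the Limdep documentation.

```python
# Limits quoted in the text for the student edition.
MAX_TOTAL_VALUES = 50_000      # the dataset must hold fewer values than this
MAX_OBS_PER_VARIABLE = 1_000   # no variable may have more observations than this


def fits_student_edition(n_observations, n_variables):
    """True if a rectangular dataset respects both limits.

    Assumption: total values = observations x variables.
    """
    total_values = n_observations * n_variables
    return (n_observations <= MAX_OBS_PER_VARIABLE
            and total_values < MAX_TOTAL_VALUES)


def largest_subset(n_variables):
    """Largest number of observations that stays within both limits."""
    by_total = (MAX_TOTAL_VALUES - 1) // n_variables
    return min(MAX_OBS_PER_VARIABLE, by_total)


if __name__ == "__main__":
    # A hypothetical dataset with 100 variables.
    print(fits_student_edition(1000, 100))
    print(largest_subset(100))
```

With many variables the total-values limit binds first; with few variables the 1000-observation limit does.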
I concentrate here on the discrete-choice model; but Limdep is an extremely full-featured
general-purpose statistics package: it will allow you to estimate just about any model you
find discussed in the econometrics literature.
In this connection, note that even for simple things like regression models, Excel's
numerical accuracy is suspect: one recent analysis by a statistician recommended that it should
not be used if accurate results were desired. On the other hand, most “real” statistics
packages (e.g., SAS or SPSS) can estimate many of these models, though typically not as many
as Limdep; and SAS in particular is good at database management, which is one of Limdep's
weaknesses. One recent PhD student here did all his data manipulation in SAS and then
imported the final massaged data into Limdep to do the actual estimation.
In this note I emphasize Limdep's point-and-click graphical interface. My own view is that
this is a bit clunky, and that for any serious work you may want to consider the batch-like
command-based interface (see section 12).
The dataset I'll use is clogit.dat; this contains data on 210 individuals' mode choices
for intercity travel in Australia. You can get the data by opening Limdep's online Help
and selecting Datasets in the Contents. The actual data is under “Data on Mode Choice for
Discrete Choice Models”. You can copy it to the clipboard and then paste it into a NotePad
window and save it to disk; or you can paste it directly into one of Limdep's command
windows and then Run the window. Since most of your data will not be available in this
form, for purposes of illustration I imported the data set into an Excel spreadsheet, and I'll
begin with that.
2 Data preparation
Limdep can read in data in Excel (.xls), Lotus 1-2-3 (.wks or .wk1), or plain text
formats.
Spreadsheet data should consist of values only: formulas should be converted to
values before saving. Limdep will consider any non-numeric data (except in Row 1
of a spreadsheet) as missing data, and will assign it the value -999.
The first row in your spreadsheet can be used to name the variables. If you don't
do this, the variables are named automatically as X1, X2, X3, etc., which is pretty
unhelpful. Names must be 8 characters or fewer; case is not significant.
Don't try anything fancy like multicolumn titles, or Limdep will have trouble understanding them.
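The rules above can be checked before the data ever reaches Limdep. The Python sketch below looks for variable names longer than 8 characters and counts non-numeric cells in the data rows (the ones that would be read as missing). The function names and the sample data are hypothetical. Treating two names that differ only in case as a clash is an assumption drawn from the remark that case is not significant.

```python
import csv
import io

MAX_NAME_LENGTH = 8  # limit on variable names stated in the text


def check_names(names):
    """Return a list of problems found in the first-row variable names."""
    problems = []
    seen = set()
    for name in names:
        if len(name) > MAX_NAME_LENGTH:
            problems.append(f"{name}: longer than {MAX_NAME_LENGTH} characters")
        key = name.upper()  # case is not significant, so compare in one case
        if key in seen:
            problems.append(f"{name}: duplicates another name when case is ignored")
        seen.add(key)
    return problems


def count_non_numeric(rows):
    """Count cells that cannot be read as numbers (they would become missing)."""
    count = 0
    for row in rows:
        for cell in row:
            try:
                float(cell)
            except ValueError:
                count += 1
    return count


if __name__ == "__main__":
    # Hypothetical spreadsheet exported as comma-separated text.
    sample = "MODE,INVC,TravelTime\n1,59,100\n0,n/a,372\n"
    rows = list(csv.reader(io.StringIO(sample)))
    print(check_names(rows[0]))
    print(count_non_numeric(rows[1:]))
```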
3 Limdep's opening screen
You see an “Untitled project” containing no data variables (you can tell that there are none
because there's no “+” beside the “Variables” folder). Note that there are entries in the Matrices and Scalars folders; but if you open the folders and look, they are all empty elements.
4 Open the spreadsheet data file
Our first task is to read in some data to analyze.
Do Project -> Import -> Variable
Find your spreadsheet file via the standard Windows interface.
Click Open to read in the data.
Note that a little “+” now appears before the Variables folder, indicating that some
variables are available to work with.
5 Data examination
Click on the Project Window under Variables to see which data series have been read
in.
You can use the Data Editor to see the actual data, though the editor is limited in the
number of rows (observations) that can be displayed.
To start the editor, double-click on the series you want to view. You will also see a
few of the series surrounding that one.
You can also edit the data from the Cell box at the top of the editor. Note that if you
do so, you are not prompted to save any changed data when you close the data editor.
To save changed data, do Project -> Export -> Variables and choose your export
format.
6 Data transformations
Limdep includes many ways to create new variables by transforming old ones. Personally
I find it easiest to do this “by hand” in a command window (see section 12) using the
create command. But you can also do it using the point-and-click interface.
First enter the Data Editor (double-click on any variable or do Project -> Data Editor).
Right-click anywhere in the Data Editor, and then select New variable.
You get a dialog-based way to create a new variable: see the picture below.
– Fill in the name you wish to give your new variable in the Name field. Remember
that names must be 8 characters or fewer; case is not significant.
– You can then choose one or more transformations from the list on the right.
Unfortunately, some of these are rather cryptically named; and the online help
provides no guidance at all.
– Choosing a transformation inserts a function name into the Expression box;
this function name typically contains one or more place-holders (e.g., x) where
you must fill in the name of an existing variable: unfortunately, there is no
dialog-based way to do this.
– Click OK to have the new variable created. You can examine it in the Data Editor
to ensure that the command you constructed actually did what you intended it to
do.
7 Projects
Reading in the data is probably the slowest part of Limdep. To get round this, you
can read in the data once, then save the current Limdep workspace as a Project. Once
this has been done, you can read in the Project instead of re-reading the data: this is
almost instantaneous.
To save a workspace as a project do File -> Save Project As. Provide a name only:
the extension .lpj will be automatically appended.
At the end of your Limdep session, you'll be asked if you want to save the current
project. There's probably no reason to do this unless you've changed the data (for
example, added new variables). But you should know that a Limdep project includes
all matrices and scalars shown in the Project Window, so if you do want to save the
latest versions of these, then (re-)save the project when asked.
8 Setting up the model: dependent variable
We will now set up to estimate a logit model of discrete choice for our data.
Do Model -> Discrete Choice -> Discrete Choice to start.
Note that, despite the model's being known as the Logit model, you do not choose
Logit. The name “Logit Model” is used in some of the econometrics literature for a
slightly different model, and Limdep respects that usage.
Now set up the dependent variable (the one describing the choices the individuals in
the sample actually made) in the Main tab.
Click on the drop-down menu under Choice Variable to select the choice indicator.
In our case, the dependent variable is called MODE.
You must also provide names for the choices represented in the dependent variable
(this is how Limdep keeps things straight internally).
The names can be anything you like, but obviously it's a good idea to make them
reflect the alternatives actually represented in the data, in the proper order.
Here we choose Air,Train,Bus,Car, but we could just as well use (say) A,B,C,D.
Limdep allows each individual in the sample to have a different-sized choice set: in
this case you need to provide a variable describing, for each individual, which modes
are available.
9 Setting up the model: independent variables
Click on the Options tab.
In the middle of the page, you select variables from the list on the right.
Then click on << in the Attributes frame to add them to the list. If you change your
mind, then selecting a variable in that frame and clicking >> removes it from the list
of independent variables.
The variable ONE is built in to the program: it represents a constant (vector of 1's).
But for the discrete choice model it has a special usage: if you include it in your list of
independent variables you will automatically get the full set of estimable
alternative-specific constants. (The omitted constant will correspond to the last alternative.)
However, for this particular dataset, the alternative-specific constants are included
as variables (AASC, BASC, CASC, and TASC).
If you click on the << in the Interact with ASC frame, you create a version of
the variable which differs by choice (that is, the product of a variable with the set
of alternative-specific constants). So, for example, if you wanted the coefficients of
In-Vehicle Travel Time to vary by mode, you'd enter it using the Interact with ASC
buttons.
10 Run the model
You can experiment with the other options, but most of the time, the defaults will
suffice. However, the Display frame on the Output tab allows you to request that
certain additional results of the estimation (for example a full variance-covariance
matrix, or a set of descriptive statistics) be printed.
When you're ready to run the model, click Run.
Here's the output:
Note the little box at the bottom, marked Matrix LastOutp: this is a little
spreadsheet-like object containing the estimation results. You can open it, select the entire array
by clicking on the top-left cell, copy the contents to the clipboard, and then paste it
into an Excel spreadsheet or Word document for further manipulation. Note that this
object is not saved when you save the results to a text file.
11 Examine the results
The most important result is given at the end of the Trace window, where you see the
Exit Status for the model you have just estimated. This should always be 0: if it is not,
then something has gone wrong, and the results are unreliable. Only after you have
checked this should you look at the actual estimation results, in the lower part of the
window.
You can save the results into a text file by doing File -> Save and then providing a
file name. The default extension (.lim) is optional.
Here are the contents of the saved file:
The variables here are:
– INVC : in-vehicle trip cost
– INVT : in-vehicle trip time
– AASC, TASC, BASC : mode-specific dummies for air, train and bus (respectively).
The entry “Log likelihood function” is what the SFBA handout refers to as the
log-likelihood at convergence.
The columns marked R-sqrd give the likelihood ratio index (McFadden's $\rho^2$) for the
models. This statistic represents the gain in information provided by the model, versus
the no-information case. It is roughly analogous to the $R^2$ in linear regression. Note
that there are two concepts of “no information” used here: a model in which all the
coefficients are zero (“No coefficients”) and a model in which we are assumed to
know only mode-specific constants (“Constants only”). This latter makes sense as
a no-information model because we do not need to gather any information (data) in
order to run a model with dummies. Note that before the coefficient-estimate results
there is a useful box of explanations of how these are computed.
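To see what the two baselines mean numerically, here is a small Python sketch of the usual textbook form of the likelihood ratio index, one minus the ratio of the model log-likelihood to the baseline log-likelihood. It assumes every individual has all alternatives available. The model log-likelihood and the choice counts in the example are made up for illustration and are not taken from the Limdep output, and Limdep may also report adjusted versions that are not reproduced here.

```python
import math


def likelihood_ratio_index(ll_model, ll_baseline):
    """McFadden's index: 1 - LL(model) / LL(baseline)."""
    return 1.0 - ll_model / ll_baseline


def ll_no_coefficients(n_individuals, n_alternatives):
    """Log-likelihood when all coefficients are zero.

    Every alternative then has probability 1/J for every individual
    (assumes all individuals face the same J alternatives).
    """
    return n_individuals * math.log(1.0 / n_alternatives)


def ll_constants_only(counts):
    """Log-likelihood of a model with alternative-specific constants only.

    Such a model reproduces the sample shares, so each individual who
    chose alternative j contributes log(n_j / N).
    """
    total = sum(counts)
    return sum(n * math.log(n / total) for n in counts if n > 0)


if __name__ == "__main__":
    ll0 = ll_no_coefficients(210, 4)           # 210 individuals, 4 modes
    llc = ll_constants_only([50, 60, 40, 60])  # hypothetical choice counts
    ll_hat = -200.0                            # hypothetical log-likelihood at convergence
    print("No coefficients:", likelihood_ratio_index(ll_hat, ll0))
    print("Constants only: ", likelihood_ratio_index(ll_hat, llc))
```

The constants-only baseline is never worse than the no-coefficients baseline, so the index measured against it is the smaller of the two.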
12 Command files
For many analyses, I prefer to create Limdep commands “by hand”: this makes it easy
to modify a command (you just want to add or remove an independent variable, for
example) without going through the whole point-and-click menu system. Limdep has
a built-in Text/Command Document window, which you can use for this purpose.
However, this requires a knowledge of the syntax of Limdep's commands; see the
online help. You can get a leg up on syntax from your previous output, which shows
commands Limdep built from your point-and-click instructions. They appear in the
lower frame of your Trace window (also in any saved output) preceded by -->.
For example, suppose you want to re-run the model we've just estimated without the
mode-specific dummies AASC, TASC and BASC. Here's the Output window, showing
the command you just ran:
Open a Command Document: click File -> New (or click on the New Document
icon at the far left), then select Text/Command Document.
Copy-and-paste the DiscreteChoice lines (without the -->). Do not remove the
$ at the end of the line: this is how Limdep knows that the command has ended. (This
also means that you can break up a command onto multiple lines, which may make it
easier to read.)
The lhs= (“left-hand side” of the regression equation) modifier provides the
dependent (observed choice) variable. The independent variables are entered using the rhs=
(“right-hand side”) modifier, and choices= supplies the names you chose for the
alternatives. Both rhs and choices are comma-separated lists of names; and note that
each modifier ends with a semi-colon (;).
Edit out the alternative-specific constants.
You can add comment lines to remind yourself of what you're doing. Comment lines
begin with a question-mark (?). They do not appear in your output.
Now highlight all the lines (models) you want to run. (You can highlight comments:
they will be ignored.)
Click the green Go button to run the models, or do Run -> Run Selection.
You can also construct your command file using an external text editor (like NotePad)
and import it and then run it from within Limdep. After a while you'll be able to write
Limdep commands without having to examine a previously run command to see what
they should look like.
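If you prepare command files outside Limdep, the pieces described above (the DiscreteChoice command name, the lhs=, choices= and rhs= modifiers, semi-colons between them and a closing $) can be assembled mechanically. The Python sketch below only builds a text string. The exact spelling and layout it produces are an assumption based on the description in this section and should be checked against the command that Limdep echoes in the Trace window.

```python
def build_discrete_choice(lhs, choices, rhs):
    """Assemble a DiscreteChoice command line as plain text.

    Layout (an assumption): command name, then modifiers separated by
    semi-colons, with a final $ marking the end of the command.
    """
    modifiers = [
        "lhs=" + lhs,
        "choices=" + ",".join(choices),
        "rhs=" + ",".join(rhs),
    ]
    return "DiscreteChoice;" + ";".join(modifiers) + "$"


def drop_variables(rhs, to_drop):
    """Remove variables from the right-hand side, ignoring case."""
    dropped = {name.upper() for name in to_drop}
    return [name for name in rhs if name.upper() not in dropped]


full_rhs = ["INVC", "INVT", "AASC", "TASC", "BASC"]
reduced_rhs = drop_variables(full_rhs, ["aasc", "tasc", "basc"])
command = build_discrete_choice("MODE", ["Air", "Train", "Bus", "Car"], reduced_rhs)

# A comment line (starting with ?) followed by the command itself.
print("? Re-run without the mode-specific dummies")
print(command)
```

The printed lines can be pasted into a Text/Command Document or saved to a text file and imported.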
13 Choice-based sampling
As it happens, the sample in this dataset is not random: rather it is choice-based, and as
we've seen, ignoring this fact leads to incorrect estimation results. In order to correct the
problem, we need to know the population selection proportions, which are as follows [1]:

Mode     Sample    Population
Air      0.2667    0.1400
Train    0.5200    0.1300
Bus      0.0267    0.0900
Car      0.1867    0.6400

[1] There is further discussion of this dataset and its applications in Louvière, Hensher and
Swait, Stated Choice Methods; however note that the population proportions given on p. 157
of the book are obviously wrong, since they don't add up to 1.0. I got the proportions in
the table above from an old Limdep manual.

Note what is happening here: we are over-sampling the non-road modes, Air and Train. To
have Limdep correct for this, you need to tell it what the true (population) proportions are:
you do this in the model setup dialog:
Then click Run to run the model.
And here is the output. Note that you no longer have Likelihood Ratio statistics; and
that the output reminds you that we are correcting for choice-based sampling.
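One common way to correct for choice-based sampling is to weight each observation by the ratio of the population share to the sample share of the mode actually chosen (the weighted exogenous sample maximum likelihood approach of Manski and Lerman). The Python sketch below computes those weights from the proportions in the table above. Whether Limdep applies exactly this weighting internally is an assumption here, not something stated in this note.

```python
# Sample and population shares from the table in this section.
SAMPLE_SHARE = {"Air": 0.2667, "Train": 0.5200, "Bus": 0.0267, "Car": 0.1867}
POPULATION_SHARE = {"Air": 0.1400, "Train": 0.1300, "Bus": 0.0900, "Car": 0.6400}


def choice_based_weights(sample, population):
    """Weight for an observation whose chosen mode is the dictionary key.

    Over-sampled modes get weights below 1, under-sampled modes above 1.
    """
    return {mode: population[mode] / sample[mode] for mode in sample}


if __name__ == "__main__":
    weights = choice_based_weights(SAMPLE_SHARE, POPULATION_SHARE)
    for mode, weight in weights.items():
        print(f"{mode:5s} {weight:.4f}")
```

Air and Train, which are over-sampled, receive weights below one; Bus and Car receive weights above one.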
14 The mixed-logit model
The mixed logit model handles the case of unobserved heterogeneity by assuming that (some
of) the weighting coefficients vary in the population according to some distribution, and
estimating the parameters of those distributions. This model may only be estimated with
NLogit, not Limdep (and hence not with the student version of Limdep). In the KSOA there
is limited availability of NLogit: it is restricted to 3 simultaneous users. Note that the only
advantage of NLogit over Limdep is in a few of these specialized models: for other models,
the two programs accept precisely the same commands and give precisely the same results.
To estimate a mixed logit model you need to make a few decisions:
Which coefficients will you assume to be random, and which distributions will you
use? NLogit gives you a choice of normal, uniform, triangular and lognormal
distributions. There are some tricky issues here: for example, you'd usually want a price
coefficient to be always negative, since increasing prices reduces utility for anyone.
But if it had a normal distribution, then there is some probability that it could be
positive, since the domain of the normal distribution is the entire real line. To handle
this in NLogit, one creates a new variable, the negative of prices, and then forces the
coefficient to be from a non-negative distribution. That's why the non-negative
lognormal distribution is available. You can also force the triangular distribution to be
non-negative.
Another issue concerns the values of times (or, in general, substitution between a
modal characteristic and cost). As we've seen, this is the ratio of the characteristic's
coefficient and the coefficient of cost. If you specify both of these to be random,
you are forcing the values to have a very complicated distribution (the ratio of two
normals, for example, is not normal). In order to make the interpretation of the results
easier, one often specifies that cost (the quantity in the denominator) be non-random.
How will you generate the random draws needed for the simulation? You have two
choices: via uniform random numbers or by a special procedure known as Halton
quasi-random numbers. Most people believe that Halton numbers are substantially
better: you get more precise results with fewer repetitions, i.e., less computational effort.
How many repetitions will you use? There is no exact answer here: one guideline is
that if you use Halton numbers then somewhere between 100 and 500 is reasonable:
if you use uniform numbers then you may need between 4 and 10 times as many. That
can be seriously time consuming.
Do you want robust standard errors? There are two ways of computing the estimated
standard errors: the ordinary (fast) way, which however is sensitive to misspecification,
and the robust way, which is not. Since we will not in general know that
our specification of the model is correct, we will almost always want to request robust
standard errors. But these take a bit longer to compute.
Warning: there are identification (uniqueness of the estimates) issues connected with
specifying randomly varying choice-specific dummies: you can in general estimate
only $J(J-1)/2 - 1$ of them (where $J$ is the size of the choice set). There may
be identification issues connected with individual characteristics attached to a single
alternative: the situation here is not quite clear. There are however no identification
issues connected with allowing characteristics of the alternatives (cost, time etc.) to
vary randomly. See the References for more information on this.
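To give a feel for what Halton quasi-random numbers are, the following Python sketch generates the standard Halton sequence for a prime base and converts the resulting uniform values into standard normal draws through the inverse normal distribution function. This is the textbook construction; how NLogit generates and uses its draws internally is not described in this note, so treat the details (choice of base, no discarded initial draws, no scrambling) as assumptions.

```python
from statistics import NormalDist


def halton(index, base):
    """Return element number `index` (starting at 1) of the Halton sequence.

    The element is obtained by writing `index` in the given base and
    reflecting its digits about the decimal point.
    """
    result = 0.0
    fraction = 1.0 / base
    i = index
    while i > 0:
        result += (i % base) * fraction
        i //= base
        fraction /= base
    return result


def halton_sequence(n, base=2):
    """First n Halton numbers in the given (prime) base, all in (0, 1)."""
    return [halton(i, base) for i in range(1, n + 1)]


def normal_draws(n, base=2):
    """Standard normal draws obtained from Halton numbers by inversion."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf(u) for u in halton_sequence(n, base)]


if __name__ == "__main__":
    print(halton_sequence(4, base=2))
    print(normal_draws(4, base=2))
```

Each new Halton number fills in the largest remaining gap in the unit interval, which is why fewer repetitions are typically needed than with ordinary pseudo-random draws.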
To specify a mixed-logit model you will ordinarily use the command editor, and then
submit your command to NLogit for estimation: there's no graphical command builder for
this model.
Here's an example. Note that this is to be considered only an example: as we've seen,
this dataset is choice-based, and NLogit contains no way to estimate the mixed logit model
on a choice-based sample. In this model we shall allow the coefficient of generalized cost
(GC) to have a normal distribution, we shall use Halton quasi-random numbers and $R = 100$
repetitions for the simulation, and we shall request robust standard errors. The picture below
shows the commands: note that the main command is nlogit, not DiscreteChoice.
The lhs, rhs, and choices are as before.
rpl is the part of the command that requests the mixed-logit model: “rpl” stands for
“random parameters logit”
halton requests Halton quasi-random numbers. If you omit this, you get uniform
numbers.
pts is the number of repetitions. The default is 100.
The fcn part of the command is where you specify the distribution of the random
parameters. This consists of a variable name (here, gc) followed in parentheses by an
identifier for the distribution you want: your choices here are n (normal), u (uniform),
l (lognormal) or t (triangular). More complicated setups are possible: you could for
example specify that the distributions are non-independent by including the keyword
;correlated.
Here's the output from this model:
A quick guide to understanding the output:
We begin with a standard logit model, in order to get reasonable starting points. That's
the only use for the first block; otherwise you don't need to worry about it.
This is followed by the mixed-logit results. Most of the quantities in the top box
should be familiar. It tells us how many repetitions we requested, and confirms that
Halton numbers were used. It also tells us that a Robust VC (variance-covariance
matrix) was used.
This is followed by the estimates themselves. The important thing to note here is
that in the case of the random coefficients we are estimating the parameters of the
underlying distribution, not the individual-specific weights themselves. The estimates
are broken into three parts:
– The first, in the section random parameters in utility functions, gives the
means of the distributions of the random parameters.
– The second, nonrandom parameters in utility functions, gives the coefficient
estimates for parameters you have decided are to be non-random. These
have precisely the same interpretation as with “standard” logit.
– The third, Derived standard deviations of parameter distributions,
gives the estimated standard deviations of the random distributions. (The notation
NsGC reminds you that the chosen distribution was Normal; if you'd chosen,
say, a triangular distribution, it would have said TsGC; the “s” stands for
“standard deviation”.)
In this model, for example, we have estimated that the generalized cost has a normal
distribution with mean .01578 and standard deviation .0001879. Note the low
t-statistic on the standard deviation: in fact, we cannot reject the null hypothesis that
the standard deviation is 0, implying that in fact GC isn't random at all.
15 References
There are two extremely good books on discrete choice, both available free over the internet:
Kenneth E. Train, Qualitative Choice Analysis: Theory, Econometrics and Applications
to Automobile Demand. MIT Press, 1986, 1993. See
http://elsa.berkeley.edu/books/choice.html.
Kenneth E. Train, Discrete Choice Methods with Simulation. Cambridge University
Press 2003. See http://elsa.berkeley.edu/books/choice2.html. This book
focuses on the more difficult mixed logit model.
The issue of identification in these models is covered in two papers, both available on
the internet at http://web.mit.edu/jwalker/www/home.htm:
Joan Walker, “The Mixed Logit (or Kernel Logit) Model: Dispelling Misconceptions of
Identification”. Also in Transportation Research Record 1805:86–98.
Joan Walker, Moshe Ben-Akiva and Denis Bolduc, “Identification of the Logit Kernel
(or Mixed Logit) Model”. MIT working paper.