Agricultural and Applied Economics 637
Lab Section 1: Introduction to GAUSS
January 24, 2007
As noted in class, we believe the best way to really learn econometrics is to go “inside the black
box”. As such, throughout the semester you will not only be asked to undertake empirically
based assignments but you will also be asked to develop your own parameter estimation
method(s). We have found that the GAUSS software package is a very useful tool for obtaining
an understanding of applied econometrics as it allows you to directly translate what is presented
in class into an operational software program. This software has been used extensively by
econometricians since the early 1980s. I started using this system in 1983 (before there were
hard drives available on the typical desktop computer).
The best way to start learning GAUSS is to use it. Today’s lab exercise will take you through a
basic introduction of the software and familiarize you with the programming language. You’ll
want to save your code when you are finished, as we’ll build on it next week. Be aware: the
steps outlined here are just one way of doing things. There are many ways to edit, run, code,
etc., and coding is a very personal thing. As we go through the class, you will see more methods
and you will invariably develop your own style. Through the years, I have found a number of
strategies that help with the development and debugging of my own computer code. I will offer
these insights to you over the course of the semester. I hope you will find them useful as well.
The GAUSS software is a very extensive system that can be used for a variety of
numerical/statistical operations. You will not use all of the features and commands associated
with this program during this course. You will, however, be able to undertake a variety of basic
as well as fairly advanced tasks by the end of the course. I would recommend that you become
familiar with Chapter 4 (Introduction to the Windows Interface) and Chapter 5 (Using the
Windows Interface) of the GAUSS Users Guide that you can download from the class website.
We also encourage you to learn to use the on-line help system available from the main menu
once GAUSS is started (Reference and Command Reference Sections) and from the above
GAUSS Users Guide.
The goal today is to make sure everyone knows how to get started developing your own GAUSS
code, accessing GAUSS data sets and outputting the results in a readable format. So let’s get
started.
1. LOG INTO THE AAE NETWORK
If you have an AAE account, log in as usual. If you do not have an AAE account you can login
using a guest account:
Username=heccquest
Pwd=Ag&AppEc
2. OPEN GAUSS
Go to the Start menu → Programs → Statistics → GAUSS 7.0 (Note: If you are trying this on your
own machine, you will probably have an icon on your Desktop. You will also probably locate
the GAUSS program in a different location than the above.)
This opens the GAUSS PROGRAM. Close the Tip of the Day Box if it appears. The
“Command Input-Output” window is open. The cursor is blinking at the gauss prompt. Here
you can enter commands line-by-line and see your output in the same window (Note: you can
also include the following commands in an ASCII file with each command followed by a “;” and
tell GAUSS to run this file). For example, type
cls <hit the return key> (This clears the screen of any characters)
y=16+15 <hit the return key> (This adds 16 and 15)
y <hit the return key> (This displays the current value of y, which happens to be a scalar)
and see what happens.
3. WRITING A PROGRAM.
You can undertake various tasks using GAUSS one command at a time (Command mode) or by
running a bunch of commands at once (Batch mode). To run a file in batch mode you first create
an ASCII file containing these commands. This file can be created using your own favorite
ASCII file editor (e.g., Notepad, Tedit, etc) or you can use the native GAUSS program editor.
(Note: I use a specialized editor designed for programmers because of some of its
features.) Let’s assume that you will be using the GAUSS editor. Once GAUSS is started click on
the first button on the tool bar (or File→new). This will open up an editor window where you
can write an entire program and run it all at once. It helps to go to the Window menu and choose
“tile horizontal”. Now you have your edit window in the top half and the command I-O window
on the bottom half of the right side of your screen.
Open up the file example_1.gas which is located in the network temporary drive (T:\) via the
GAUSS program editor. You should see something like the following:
new;                     /* Use this to start every program. This clears out
                            previous information from past programs */
cls;                     /* Clears the screen of previous gauss output */
let a={1 2 3, 4 5 6};    /* One of several ways to create a (2x3) matrix with these
                            particular elements; see Felix Ritchie’s “Basic
                            operations”. NOTE the let command is necessary */
b=ones(2,3);             /* Creates a matrix with 2 rows and 3 columns, full of 1’s */
c=a + b;                 /* Defines the variable c */
"a" a;                   /* Prints the letter “a” and then the matrix called a */
"b" b;
"c" c;
end;
FYI: a few general comments about the code.
There must be a semi-colon to end every command. You can have many commands
in the same line, but they must be separated by a semicolon. Much time has been
wasted trying to debug code that doesn’t run because of a forgotten semicolon!!
You can place a long command onto separate lines of your code so long as a “;” is
not used. You should not have lines longer than 80 characters (for printing
reasons). If your line is more than 80 characters, continue with your code on a 2nd
line, indent, and keep typing.
Every program should start with new; and finish with end;
Any variable to the left of an equals sign is a name I made up, i.e., one that I want to
define for GAUSS.
Anything to the right of an equals sign is a variable name I already defined or a
GAUSS command/operation/function/reserved word.
Comments, which don’t affect the operation of the program, are located between /* and */
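To illustrate the comments above, here is a minimal sketch. The variable names x, y and z are made up for this illustration and are not part of example_1.gas. It shows two commands on one line, one command continued over two lines, and comments:

```gauss
new;                 /* start with a clean workspace */
x = 3; y = 4;        /* two commands on one line, each ended by a semicolon */
z = x^2
    + y^2;           /* one command split over two lines; only the last line ends it */
"z" z;               /* prints the label and then the value of z (9 + 16 = 25) */
end;
```

If you delete the semicolon after y^2 and run the file again, you should get an error message, which is a cheap way to see what a forgotten semicolon looks like.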
The GAUSS editor has the normal Windows-based file management system so you can edit and
save this file like any other program you have used before. I would recommend that you save
this file to your H drive. Go to File→Save, and save it on your H drive, or a removable disk.
(Hint: When creating GAUSS program files, it is recommended that you give them a unique file
extension different from “txt”. Personally I use “gas” as the file extension for
GAUSS program files.) With your cursor located in the GAUSS editor window, click on the Run
Active File button on the toolbar (the 12th from the left) or go to Run→Run Active File. What
happened? Is the matrix c what you expected it to be?
Now in your edit window, add the following to the end of your program JUST BEFORE THE
END STATEMENT. What do you expect it to do?
"d"; d=a*b; d;
Run the program again and notice what happens- it doesn’t work. That is, you receive the error
message in the bottom window that looks similar to the following:
M:\aae637\spring_07\example_1.gas(9): error G0036: Matrices are not conformable
The number in parentheses provides an indication of the line number that generated the error.
(Note: For more complicated code and depending on the error, the error will be in the
neighborhood of this line number).
This new command doesn’t make sense since the “a” and “b” matrices are not dimension
compatible with respect to matrix multiplication. This error was found while GAUSS was
compiling the program file. As such, given the severity of the error GAUSS did not run any of
the program. There is a useful message in the command window, though. When you get this
message, it is often helpful to verify the dimensions of your matrices. You will find the Source
View window very useful for the debugging of your code. From within the Source View
window, if you click on the Symbols tab, and then click on matrices, you see a list of the
matrices you created and their size. If you click on a particular matrix, a matrix viewer window
opens up and shows you the values of the particular matrix of concern.
The problem here is that the GAUSS command * performs matrix multiplication, and here you
are telling it to multiply a (2x3) matrix times a (2x3) matrix (which you should remember is not
possible).
Change the last line to read
d=a .* b; "d"; d;
Also add the line
e=a*b'; "e"; e; /* Note the transpose */
Run the code again. Does it work? What is different about these? What do these commands do?
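If it helps to see the difference in terms of dimensions, the following sketch repeats the two working commands and prints the size of each result. It assumes the same a and b as in example_1.gas; the calls to rows() and cols() are added here only for illustration.

```gauss
new;
let a={1 2 3, 4 5 6};      /* (2x3) matrix from example_1.gas */
b=ones(2,3);               /* (2x3) matrix of ones */
d=a .* b;                  /* element-by-element product: (2x3) with (2x3) gives (2x3) */
e=a*b';                    /* matrix product: (2x3) times (3x2) gives (2x2) */
"size of d" rows(d) cols(d);
"size of e" rows(e) cols(e);
"d" d;
"e" e;
end;
```

Since b is full of ones, d should equal a, and each row of e should hold the corresponding row sum of a (6 and 15).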
4. READING IN A DATA SET
For most of the assignments and in-class discussion, we will work with data sets that have
already been created. So, we need to read in and open this data file. I will always provide the
data as a GAUSS data set, although you can also read in ASCII files and even Excel spreadsheets
directly into GAUSS. GAUSS datasets (like STATA and SAS datasets) are binary
files that cannot be opened up and examined using an ASCII (text) editor. DBMSCOPY is a
useful program that can transform any ASCII or binary file of a particular type (e.g., SAS,
STATA, EXCEL, GAUSS) into any other file type. Besides being able to transform data from
one form to another, this software also has the nice characteristic of allowing you to view any
type of data, undertake data management activities and to obtain descriptive statistics of this
data. The DBMSCOPY manual can be downloaded from the class website. (It can be found in
the GAUSS Programming & Programming Hints Section.) For now, I have put a gauss data set
on the temp drive, T, in the folder aae637. The data set is called china_00. (Warning: This file
has a “dat” extension. Do not use your file manager to try to open this file: it is binary, and
Windows will try to use Notepad to open it.)
Copy this file to your H drive, or removable disk. Normally you will have to download and
unzip the data set from the course webpage to your H drive, temp drive, disk, etc. A GAUSS data
set (in V92/V96 format) consists of the actual data and information as to variable names, variable
type and column location in the data set (in binary form).
One complaint I had when first learning GAUSS is that I couldn’t start by “seeing” my data.
DBMSCOPY helps me here too. You can open a GAUSS dataset in DBMSCOPY and look at
the variable names, locations, create new variables and a lot of other stuff. You can try this now
or later; ask me if you want to know more. (Note: One caveat when using DBMSCOPY is that
if you want to convert, for example, a SAS dataset to a GAUSS dataset, make sure you save the file in
the GAUSS V92/V96 format.) Start DBMSCOPY and look at the GAUSS dataset china_00.
4.1 Accessing DBMSCOPY Remotely
Note, you can do the above at home or some other remote location even if you do not have
access to DBMSCOPY so long as you have internet access, XP Professional is your computer’s
OS and you have a valid AAE account. If you meet these requirements, you can use the Remote
Desktop feature of the XP OS to connect to the AAE Remote Server. If you are at a remote
location, click on START, then PROGRAMS then ACCESSORIES then COMMUNICATIONS
then REMOTE DESKTOP CONNECTION. The computer you want to access is our remote
server which has the computer name (address) of: remote.aae.wisc.edu . Once you enter this
address, you will then see what looks like another Windows session. What is actually occurring
is that your local machine is in fact starting a session on another Windows machine (i.e., the
remote server). You can then log in to the remote computer using your AAE username and
password. The remote machine is set up exactly like (or very similar to) the regular AAE lab
machines. This means that you have access to all the programs in the AAE lab from your remote
location. Since you are logging in using your AAE account, you have the same drive mappings
as you have when you log into a lab machine. This means that so long as your data, programs,
etc. are stored on a network resource, you have access to them from your remote location.
4.2 GAUSS Code for Reading in a Dataset
Now, I would like you to create a new gauss command file. Use File→Close to close the above
GAUSS command file. Then use File→Open to open up the file: example_2.gas. Use the
Window→Tile Horizontal command sequence to place your various windows in an orderly
manner. Type the following lines into the GAUSS edit window. (Note: Where I’ve written
t:\\aae637\\ type the location of the folder in which you’ve copied the data, making sure you use
the double backslash (e.g., h:\\private\\). Also note the use of quotation marks.) The following is
a listing of the commands contained in the above GAUSS command file.
new;                                 /* Command to clear memory */
cls;                                 /* Clears screen from previous output */
format /rd 8,2;                      /* Formats screen output, 8 places, 2 decimal pts */
outwidth 256;                        /* Output width of your printer/screen */
basepath="t:\\aae637\\";             /* Create path acronym */
outpath="t:\\aae637\\output\\";      /* Create path to where output file is to be placed */
datafile = basepath $+ "china_00";   /* Identifies Gauss data set */
outfile=outpath $+ "china_ex.out";   /* Complete path for output ascii file */
output file=^outfile reset;          /* Identifies output file */
open fp=^datafile varindxi;          /* Open data file for reading */
numobs=rowsf(fp);                    /* Determine number of observations */
mydata=readr(fp,numobs);             /* Read in data all at once */
vvv=close(fp);                       /* Close data file */
print "I have successfully run the program";
end;
This block of text, or something very similar to it, will be needed at the beginning of virtually all
of your GAUSS programs.
What do the above commands do?
format controls the format of numerical output. The values “/rd 8,2” give a right-justified
decimal number allowing 8 total spaces: 5 to the left of the decimal point, the decimal point
itself, and 2 to the right (e.g., 62534.78). THIS IS IMPORTANT: ON ASSIGNMENTS I WANT THE
NUMBERS FORMATTED TO A REASONABLE NUMBER OF DECIMALS.
Multiple format commands can be used in a single GAUSS run to control the look of
specific output. You can look up the format command in the on-line help system. Give
it a try.
outwidth sets the width of your output file. You’ll almost always use 256.
basepath is a user-supplied name I made up and defined as the words in quotes. This
saves typing below. This basically contains a path statement that will be appended to a
file name later on to completely identify the location and name of a particular file. Make
sure the directory exists.
outpath is similar to basepath but defines the location (directory) where your output is to
be placed. Make sure that this directory exists prior to you running your program.
datafile tells GAUSS what GAUSS data set to open and where it is located. In this case,
I want to open the file containing the GAUSS data set, china_00, located in the folder
t:/aae637. I could type
datafile = "t:\\aae637\\china_00"; /* Note the double back-slashes */
Since I defined a variable called basepath, I can save typing by using the syntax above.
This is useful if your folder path is very long, for example if your data is stored in
h:/private/school/spring2007/aae637.
Note that you do not need to include the “.dat” extension to the GAUSS dataset name.
For some reason people don’t seem to remember this. This just saves you typing.
outfile is a user supplied name I made up to identify the location and name of an ASCII
output file I want GAUSS to create. After I run my program, I can go to my computer,
look in my t:\aae637\output folder and see a file called china_ex.out.
output file actually creates an ASCII output file for your program and the “reset”
command overwrites the contents of this file with every run. You can specify that the
output file be saved as a text file by replacing “.out” with “.txt” if you prefer. The
“^outfile” says to GAUSS to place the value of “outfile” here.
open fp opens the data file identified by datafile. The varindxi command allows one to
identify variables by their name instead of column location. varindxi creates new
variables in memory equal in number to the number of variables in a GAUSS data set
where the letter “i” is prefixed to each variable’s name (the first 7 letters). The values of
these scalar variables will be the column number of the variable in the gauss data set
being accessed. This information will facilitate your ability to access specific variables in
a data set without knowing the column location of the variables. You only need to know
the variable name. Use the Source View window to verify the contents of these varindxi
variables.
rowsf(fp) identifies the number of rows in the file fp. rowsf is a native GAUSS
command. Here we call this number numobs. This name is arbitrary and you can call it
whatever you want. Don’t confuse the rowsf command with the rows command, which returns the
number of rows in a matrix x. Use the online help system for more information about
these commands.
readr(fp,numobs) reads numobs rows from the file fp and assigns this data to a matrix
referred to using the arbitrary name mydata. The value of numobs was obtained from the
previous command.
close(fp) closes the file fp.
FYI: You must first open and then read the data. It is important that you read the correct
number of rows. Too many rows and things don’t work, too few and you lose data.
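As a small illustration of how the path pieces fit together, the sketch below builds the full file name with the string concatenation operator $+ and prints it. It uses the same basepath as above; adjust it to your own folder. Nothing is opened or read here, so it should run even if the data set is not present.

```gauss
new;
basepath="t:\\aae637\\";             /* folder holding the data; change to your own location */
datafile=basepath $+ "china_00";     /* $+ joins the two strings into one */
print datafile;                      /* shows the complete path that the open command would use */
end;
```

If the printed path does not match where you copied china_00.dat, fix basepath before running the full program.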
5. RUN THE CODE, AND CHECK TO MAKE SURE YOU HAVE SUCCESSFULLY
LOADED THE DATA.
Hit the run button. Go to the source view window and then click on the symbols tab and then
click on the matrices section. You should see that the matrix of mydata should have a dimension
of (2050 x 4). This means there are 4 variables in the dataset with 2050 observations.
6. IDENTIFY YOUR VARIABLES
The variables in this data set are
percinc   Per Capita Income
fah       At home food expenditures
fafh      Away from home food expenditures
region    Region (either 1, 2, or 3)
After the “readr” command you now have a (2050 x 4) matrix of data. The matrix name is
“mydata”.
To make things easier, we might like to identify the columns of the data matrix by name. The
varindxi command we used allows us to do this easily if we know the variable names, even if we
don’t know which column is which variable. Go to the source view window and then click on
the symbols tab and then click on the matrix ipercinc. You should see the value “4” indicating
that the per capita income variable is in column “4” of the GAUSS dataset.
In your command window, type each of the following:
inc=mydata[.,ipercinc] <Hit the return key>
fafh=mydata[.,ifafh] <Hit the return key>
reg=mydata[.,iregion] <Hit the return key>
This tells GAUSS that the user-defined variable inc is all observations in the 4th column of the
user-defined mydata matrix (i.e., the column of information associated with per capita income),
etc. The use of the “.” in the above commands tells GAUSS to grab all the rows. If you wanted
to grab only the 120th-220th observations for some reason you would type something like:
inc_2=mydata[120:220,ipercinc]
Go to the source view window and then click on the symbols tab to look at the income (INC) and
food-away-from-home (FAFH) matrices. They should both be (2050 x 1) in size.
If you want to create a (2050 x 2) matrix containing per capita income and food-away-from
home expenditures you could enter the following either in your command file or interactively:
income_fafh=mydata[.,ipercinc ifafh];
Note there is no comma or other delimiter between ipercinc and ifafh. This would grab the
columns associated with PERCINC and FAFH from the mydata matrix
using all the observations (which is what the “.” means). I could have created this new variable
via the following:
inc_fafh_2=inc~fafh
The “~” operator represents horizontal concatenation of two matrices (vectors).
In the command I/O window type the following command:
begind=numobs-10;
fafh[begind:numobs]~reg[begind:numobs]
and see what comes up.
FYI: The square brackets attached to the end of a matrix name allow you to specify part of
a matrix. For example,
mydata[2,4] is the single element in the 2nd row and 4th column of mydata
mydata[2,1:3] is the (1x3) vector of elements in the 2nd row and 1st through 3rd
columns
mydata[.,4] is the vector of elements in the entire 4th column.
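A quick way to convince yourself of these indexing rules is to try them on a small matrix whose entries reveal their own position. The matrix m below is made up for this illustration (each entry is 10 times the row number plus the column number); it is not part of the china_00 data.

```gauss
new;
let m={11 12 13 14, 21 22 23 24, 31 32 33 34};   /* (3x4) made-up matrix */
e24=m[2,4];           /* single element in row 2, column 4, i.e. 24 */
r2=m[2,1:3];          /* (1x3) row vector: 21 22 23 */
c4=m[.,4];            /* entire 4th column: 14, 24, 34 */
c42=m[.,4 2];         /* columns 4 and 2, in that order, for all rows */
same=m[.,4]~m[.,2];   /* the same (3x2) matrix built by horizontal concatenation */
"e24" e24;
"r2" r2;
"c4" c4;
"c42" c42;
"same" same;
end;
```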
FYI: If we did not know the variable names we could get them with the following
commands in the command window.
varnames=getname(datafile); /* varnames is arbitrary name */
$varnames; /* the $ tells GAUSS that varnames is a character matrix */
You could also look at this character vector using the source view window, symbols tab. If you
click on the matrix varnames all you see are zeros. That’s because GAUSS does not know that
varnames is a character vector. If you click on the column header for (1), then click on format,
then choose character, the window will then show the elements of the varnames vector. In this
example, the order of the variable names also reflects the order of the variables in the mydata
matrix.
7. NOW LET’S DO SOME MATHEMATICAL MANIPULATIONS AND DATA
INTERPRETATION
Refer to the Gauss reference material, help manual, and the list of commands below to answer
the following questions:
What is the average income and food expenditures?
What is the maximum level of income?
What percentage of households live in region 1?
What is the min, max, and mean fafh in region 1?
/********************************************************************/;
ADDITIONAL COMMANDS YOU SHOULD BE FAMILIAR WITH
sumc(x) returns a column vector containing the sum of each column of matrix x
cumsumc(x) returns the cumulative sum of the columns of matrix x
meanc(x) returns the mean of every column of matrix x
stdc(x) returns the standard deviation of the elements in each column of matrix x
minc(x) returns a column vector containing the smallest element in each column of matrix x
maxc(x) returns a column vector containing the largest element in each column of matrix x
vcx(x) computes the variance-covariance matrix
zeros(r,c) creates a matrix with r rows and c columns full of zeros
ones(r,c) creates a matrix with r rows and c columns full of ones
eye(N) creates an NxN identity matrix (you must either use a number in place of N or define N)
rndn(r,c) creates a matrix with r rows and c columns of independent std. normal random
numbers
sortc(x,c) sorts matrix x in increasing order according to column c
rows(x) returns the number of rows in matrix x
cdfn(x) returns the Prob(z<=x) where z is a N(0,1) random variable
y=selif(examdata[.,2:5], data_matrix2[.,4] .ge 10)
This creates a matrix y that contains selected rows of the previously defined matrix
examdata. Here y is all rows of the 2nd through 5th columns of the examdata matrix for which
the 4th column of the matrix data_matrix2 is greater than or equal to 10. (Note: This
assumes that the number of rows of data_matrix2 is the same as the number of rows of
the examdata matrix.)
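To see how selif and the column functions work together, here is a sketch on a tiny made-up data set. The five values of region and fafh below are invented for illustration and are not the china_00 data; the same pattern can be applied to the columns of mydata.

```gauss
new;
let region={1, 2, 1, 3, 1};          /* made-up region codes, (5x1) */
let fafh={10, 20, 30, 40, 50};       /* made-up expenditures, (5x1) */
in1=region .eq 1;                    /* (5x1) vector of 0/1's: 1 where region equals 1 */
share1=meanc(in1);                   /* mean of a 0/1 vector is the share of 1's: 3/5 = 0.6 */
fafh1=selif(fafh,in1);               /* keeps only the rows of fafh where in1 is 1 */
"share in region 1" share1;
"min, max, mean of fafh in region 1" minc(fafh1) maxc(fafh1) meanc(fafh1);
end;
```

With these made-up numbers the selected values are 10, 30 and 50, so the minimum is 10, the maximum is 50 and the mean is 30.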
Some logical expressions (Note these are all element by element comparisons),
.gt → greater than
e.g. vvv =(x .gt y) will create a vector of 0,1’s of dimension rows(x) depending on
whether the comparison is true or false, =1 if xi > yi, 0 otherwise
.lt → less than
e.g. vvv =(x .lt y) will create a vector of 0,1’s of dimension rows(x) depending on
whether the comparison is true or false, =1 if xi < yi, 0 otherwise
.ge → greater than or equal to
e.g. vvv =(x .ge y) will create a vector of 0,1’s of dimension rows(x) depending on
whether the comparison is true or false, =1 if xi ≥ yi, 0 otherwise
.le → less than or equal to
e.g. vvv =(x .le y) will create a vector of 0,1’s of dimension rows(x) depending on
whether the comparison is true or false, =1 if xi ≤ yi, 0 otherwise
.eq → equal to
e.g. vvv =(x .eq y) will create a vector of 0,1’s of dimension rows(x) depending on
whether the comparison is true or false, =1 if xi = yi, 0 otherwise
.and → allows you to combine logical expressions
e.g., vvv= (x .gt 0) .and ( y .le 0) will create a vector of 0,1’s of dimension rows(x) =1
if xi>0 and yi ≤0
.or → allows you to examine the union of two logical expressions
e.g., vvv= (x .gt 0) .or ( y .le 0) will create a vector of 0,1’s of dimension rows(x) =1
if xi>0 or yi ≤0 or both.
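The element-by-element nature of these comparisons is easiest to see on a short example. The vectors x and y below are made up for this illustration; the comments show the 0/1 pattern each expression should produce.

```gauss
new;
let x={-1, 0, 2, 5};                  /* made-up (4x1) vector */
let y={3, -2, 0, 1};                  /* made-up (4x1) vector */
g=x .gt 0;                            /* 0 0 1 1 : 1 where x is positive */
both=(x .gt 0) .and (y .le 0);        /* 0 0 1 0 : 1 only where both conditions hold */
either=(x .gt 0) .or (y .le 0);       /* 0 1 1 1 : 1 where at least one condition holds */
"x, y, g, both, either side by side";
x~y~g~both~either;
end;
```

Because the results are ordinary 0/1 vectors, they can be passed directly to selif or summed with sumc to count how many observations satisfy a condition.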
The following illustrates the use of a “For Loop”. You should refer to your reference material
for instructions on its use. The For or Do loops can make your life much easier if you are doing
repetitive tasks.
x=zeros(20,1);    /* This initializes the x matrix so there are place holders for later use. In
                     general this only needs to be done when using for or do loops. */
for i (1,20,2);   /* i is a temporary variable used to control the loop. In this example
                     the loop goes from 1 to 20 in increments of 2, i.e., 10 steps. That
                     is, every other element of x will be non-zero */
    x[i]=i;       /* the ith element of the vector x is assigned the current value of i */
endfor;           /* end of for loop */
Another way to accomplish the above:
for i (1,20,1);
    if i .eq 1;
        x=i;
    else;
        x=x|i;    /* This vertically concatenates the previous x vector with another
                     element equal to “i” */
    endif;
endfor;
/* comments*/ anything in the comments area will not be read by GAUSS but can help you
organize your program.
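Since Do loops are mentioned above but not shown, here is a sketch of the first for loop rewritten as a do while loop. This is only one possible way to write it. Note that with a do loop you must initialize and increment the counter i yourself.

```gauss
new;
x=zeros(20,1);          /* place holders for the results */
i=1;                    /* the counter must be set before the loop starts */
do while i <= 20;
    x[i]=i;             /* same assignment as in the for loop */
    i=i+2;              /* step by 2; forgetting this line gives an endless loop */
endo;
"sum of x" sumc(x);     /* 1 + 3 + ... + 19 = 100 */
end;
```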
Matrix concatenation can be achieved via the following assuming conformability:
~ → horizontally concatenates two matrices
In GAUSS type the following:
let a1={1,2,3} <Hit return key>
let a2={4,5,6} <Hit return key>
a3=a1~a2 <Hit return key>
"a1" a1 <Hit return key>
"a2" a2 <Hit return key>
"a3" a3 <Hit return key>
| → vertically concatenates two matrices
In GAUSS type the following:
a4=a1|a2 <Hit return key>
"a1" a1 <Hit return key>
"a2" a2 <Hit return key>
"a4" a4 <Hit return key>
8. RUN THE DEMO_GAUSS PROGRAM
Now you are ready to run a longer GAUSS program. Download from the Lab section of the
class website the program DEMO_GAUSS_07.GAS command file and store locally. This file
undertakes some basic matrix manipulations. After examining this file, run it in GAUSS and
make sure you understand the various operations.
A FINAL NOTE: Try to develop a habit of writing clean and neat code (e.g. use of
indentation for loops, lines of code that don’t go on forever, etc). This will facilitate your
debugging of code and your ability to understand what is actually being undertaken. Refer
to the GAUSS users manuals for guidance.