A GAUSS Starter Document

GAUSS STARTER HINTS
(ORIGINAL AUTHORS: PAULA DESPINS AND RAGAN PETRIE)
Commands for General Use of GAUSS:
We are going to be using the Windows version of GAUSS because the interface is better and much easier to
use. The DOS version does exist, and all the program commands that apply to DOS apply to Windows and
vice-versa. There are some commands for maneuvering around the program that are unique to the DOS
version, but we will not cover those here.
To start GAUSS from the desktop of any computer in the HECC, double-click on the GAUSS icon. Or, if
the icon is not on your desktop, click on the start button, then click on programs, then click on statistics
packages, then click on GAUSS. This will put you in the GAUSS-Command window where there will be a
[gauss] prompt after which you can type commands directly into GAUSS (more on these later). This is not
the window in which you write and edit your own GAUSS code. This is done in the GAUSS-Edit window.
To open this window, click on the Window menu at the top of the screen and then click on GAUSS-Edit.
The editing capabilities of this window are very similar to those of a simple text editor such as Microsoft
Notepad.
Other functions available through GAUSS pull-down menus:
file menu
in this menu, you can save your program files, find a saved file to edit, debug programs
(unfortunately, this isn't a magic debug function! In fact, I've never used it.), run a GAUSS
program, and exit GAUSS. Note that some functions are listed twice (edit, run, save). The
ones listed at the top refer to actions on saved files, and the ones at the bottom refer to actions
on currently active (open) files.
edit menu
in this menu, you can cut, copy, and paste text and also undo previous steps.
search menu
in this menu, you can find text and replace text. This can be particularly useful if you need to
replace text that occurs several times in your program.
windows menu
in this menu, you can switch between edit and command mode. This can also be done
by clicking on the buttons on the bottom of the screen (just as you would switch between
Word and Excel, for example).
help menu
in this menu, you can find the Windows version of the help manual.
Below the menus are four large buttons, two text boxes, and four smaller arrow buttons. The top box is the
run box and displays a selected file from the run list (files, typically ones you have written, that may be run in
GAUSS). Clicking on the arrow to the right of the run box displays the entire run list. Similarly, the bottom
box is the edit box.
run button
clicking on the run button will run the entire file currently displayed in the run box. You can
add files to the run list by using the first "run" option in the file menu. If you have output from a
program run, the output will be printed in the command window (as well as in an output file
that you specify … see below), and you can scroll up and down the window to view your
output. If you have graphs, a window for each graph will be displayed.
Before running a GAUSS program, GAUSS automatically saves the current version of the
program. So if you are experimenting with new code, you might want to save your work
under a new file name before overwriting your old code. You can do this by pulling down the
file menu in the GAUSS-Edit window and clicking on Save-As.
save button
clicking on the save button will save the file currently displayed under its existing file name, thus
overwriting the previous version.
edit button
this functions analogously to the run button. You can also choose files to edit by pulling down
the File menu in the GAUSS-Edit window and selecting Edit.
stop button
this button will stop your program while it is running. This is particularly useful if you appear to
be stuck in an infinite loop. Be patient when clicking the stop button – sometimes there is a
significant delay before GAUSS recognizes the command.
Debug button
again, nothing to get excited about here. This just compiles your program without
running it to check for syntax errors. But GAUSS does this before running a program,
anyway.
running only part of your program
if you want to run only a portion of your program, highlight that section and click on the run
button. This is useful for debugging your code, but it won’t work if there is an error
somewhere else in the program.
GAUSS is actually a very simple language for writing programs, but it takes practice. Before getting into the
mathematical commands allowed in GAUSS, note the following basic operating commands that are needed in
many if not all GAUSS programs. One thing to keep in mind is that blank lines and extra spaces in your code
typically are ignored by GAUSS.
;
the semi-colon must end every command in GAUSS (many hours have been spent
debugging code because of missing semi-colons). The command itself can be written over
several lines, but every command must have a semi-colon between it and the next command
regardless of where line breaks occur.
new;
this command should start all of your programs
end;
this command should end all of your programs. There are other choices such as “closeall;”
that may be more appropriate, but “end;” will always work.
output file=<path>\filename.out reset;
this command creates an ASCII output file for your program and the reset portion allows that
file to be overwritten at every run. If instead of reset, you use “on”, then all printed output will
be appended in the specified file. You can also specify that the output file be saved as a text
file by replacing “.out” with “.txt” if you prefer.
outwidth
this command sets the width of your output file. To avoid lines wrapping around on an 8-1/2
by 11 inch page, use “outwidth 256;”
format
controls the format of numerical output. Adding “/rd” will return a right-justified signed
decimal number of the form [-]####.####, where #### is one or more decimal digits. The
field width and the number of decimal places are specified by “#,#.” For
example, to format output with a field width of seven and two decimal places, the command is
“format /rd 7,2;” This leaves four spaces to the left of the decimal point, one space for the
decimal point, and two spaces to the right.
screen off
this prevents output from being printed to the screen (but it still is saved in your output file).
This is often useful when your program has a lot of output because it takes much less time to
calculate something than it does to simultaneously calculate and show it. For small output
programs this is no problem and you won’t need to use this. “screen on” will turn the screen
back on in the middle of a program; it will automatically come back on when the program
finishes executing.
/*…*/ or @…@
these are comment markers. GAUSS will ignore everything you type between them.
They are equivalent, except that “/*” opens a comment and “*/” closes a comment, whereas
“@” both opens and closes a comment. Comments are very useful for organizing your code
(especially when the number of variables gets large) and for testing sections of your code (you
can temporarily prevent certain lines from running without deleting text to isolate a problem).
print “whatever”
use this to print text to the screen or to an output file. Here, “whatever” will be
printed. The quotation marks are necessary, but the print command is not – print is the default
command in GAUSS. So, the commands “print “whatever”;” and ““whatever”;” are
equivalent. To print a GAUSS object, such as a matrix, just type its name with no quotation
marks. So, “print “whatever ” name;” will print “whatever name” – notice the space after
“whatever” is within the quote marks - that is how you must insert spaces. However, you
need a space between the quote marks and “name” as well because they are treated
separately. Simply typing “name;” will print the object.
GAUSS supposedly has a nice graphics program (pgraph), but I’ve never used it – I always
import output files into Excel to create graphs and tables. It’s up to you.
? or “”
this is used to add a blank line when printing output.
timestr(0)
this prints the current time.
datestr(0)
this prints the current date.
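To see how the commands above fit together, here is a minimal sketch of a complete program. The output file name starter.out and the variable x are made up for illustration, and the file is written to whatever directory GAUSS is currently working in.

```gauss
new;                                /* start with a clean workspace */
output file = starter.out reset;    /* hypothetical file name; overwritten at every run */
outwidth 256;                       /* wide lines so output does not wrap */
format /rd 7,2;                     /* field width 7, two decimal places */

x = 3.14159;                        /* hypothetical value to print */
print "The value of x is " x;
print;                              /* blank line */
print "Run on " datestr(0) " at " timestr(0);

end;                                /* ends the program */
```

Swapping “reset” for “on” in the second line should append each run to the file instead of overwriting it.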
Mathematical, Matrix and Logical Operators:
Almost all common functions are included in GAUSS, like the absolute value function, the log function, the
sine, square root, gradient, integral, determinant, eigenvalue, rank, etc. A list is given in the front of Volume 2
of the manual (the command reference, ch. 21). It might be worth a quick read. Commands also are listed in
the pull-down help file. Also note that many common statistical operations, like the mean, median, and
correlation matrices, are supported as well.
Below, “x” signifies a mathematical object – typically either a scalar, vector or matrix – but almost any text
string (without spaces) could be substituted in its place. Exceptions to this are the “reserved words” (such
as proc, col, row, etc.) listed in Appendix G of the manual. “x” must be initialized before it may be
included in any operation (more on this later). Note that typing “y=ln(x);” will assign the value given by ln(x)
to the variable y and store it in memory, but typing “ln(x);” only will print the value given by ln(x) to the screen
and/or output file. Virtually all of these commands may be used both in your code and at the [gauss] prompt
in the command window. Also note that
“(m x n)” should be read as “m by n”.
*
multiplication
/
division
+
addition
-
subtraction
^
raised to the power of... e.g. “x^2” is x-squared
=
equal to – typically used to assign values (e.g., y = ln(x);)
eq
equal to – typically used to check equality (e.g., if x eq 0; y = 1; endif;)
ne
not equal to
ge
greater than or equal to
gt
greater than
le
less than or equal to
lt
less than
.
combining the period with any of the above operators causes the operation to be done
element-by-element. If you are dealing with matrices the default is to use matrix
operations, not element by element, so be sure which you want.
x’
transpose of x
x~y
horizontally concatenates x and y. That is, it takes these two matrices or vectors and
combines them into one by stacking them side by side.
x|y
vertically concatenates x and y. This works just like “~” except it stacks them on top
of one another.
rndu(m,n)
generates an (m x n) matrix of random numbers drawn from the uniform distribution
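The difference between matrix and element-by-element operations is easiest to see on a small case. The sketch below uses two made-up (2 x 2) matrices, a and b, and can be typed into a program file and run.

```gauss
new;
a = { 1 2, 3 4 };
b = { 5 6, 7 8 };

print "matrix product a*b:";
print a*b;                  /* rows: 19 22, 43 50 */
print "element-by-element product a.*b:";
print a.*b;                 /* rows: 5 12, 21 32 */

print "a~b (side by side, 2 x 4):";
print a~b;
print "a|b (stacked, 4 x 2):";
print a|b;

u = rndu(2,3);              /* (2 x 3) matrix of uniform random draws */
print "u has " rows(u) " rows and " cols(u) " columns";
end;
```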
cdfn(x)
computes the cdf for the normal distribution. If “x” is (m x n), then “cdfn(x)”
generates an (m x n) matrix of cdf’s. “cdfnc” computes “1-F(x)”. The same process
is supported for the chi-square, F, t, beta, exponential, and gamma distributions.
These commands all start with “cdf” as well and you can look them up in the
command reference. We won’t have many opportunities to use other distributions so
these should be enough for you. Similarly, “pdfn” computes the pdf for the normal
distribution.
ceil(x)
rounds x up
floor(x)
rounds x down
round(x)
rounds x to the nearest integer
exp(x)
computes the exponential function of x
ln(x)
computes the natural log of x
log(x)
computes the log (base 10) of x
sqrt(x)
computes the square root of x
abs(x)
computes the absolute value of x
gradp(&f,x0)
computes the gradient vector or matrix (Jacobian) of a function that has been defined
by a procedure.
hessp(&f,x0)
computes the matrix of second partial derivatives (Hessian) of a function defined in a
procedure.
det(x)
computes the determinant of a matrix (x)
inv(x)
computes the inverse of a matrix
invpd(x)
computes the inverse of a symmetric positive definite matrix (what is the difference?
None, mathematically, but a lot in terms of computation time, especially for large
matrices. For symmetric, positive definite matrices (such as moment matrices),
“invpd” is about twice as fast as “inv”.)
rank(x)
computes the rank of a matrix
eig
computes the eigenvalues of a matrix
eigv
computes both the eigenvalues and the eigenvectors of a matrix. (As with calculating
an inverse, there are faster computation methods available under certain conditions.
Refer to the manual for these.)
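As a rough illustration of “gradp” and “hessp”, the sketch below defines a made-up quadratic function in a procedure and differentiates it numerically at one point. The function f, the point x0 and the names g and h are assumptions chosen only for this example. Analytically the gradient at (1, 2) is (8, 11) and the Hessian has rows 2 3 and 3 4, so the numerical results should be close to those values.

```gauss
new;

/* hypothetical function: f(x) = x1^2 + 3*x1*x2 + 2*x2^2 */
proc (1) = f(x);
    retp(x[1]^2 + 3*x[1]*x[2] + 2*x[2]^2);
endp;

x0 = { 1, 2 };             /* (2 x 1) point at which to differentiate */
g = gradp(&f, x0);         /* numerical gradient */
h = hessp(&f, x0);         /* numerical Hessian, (2 x 2) */

print "gradient:";
print g;
print "Hessian:";
print h;
print "determinant of the Hessian:" det(h);
print "inverse of the Hessian:";
print inv(h);              /* inv, not invpd: this Hessian is not positive definite */
end;
```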
Most of the following commands work on either a vector or a matrix. When “x” is a vector, each command
returns a scalar; when “x” is a matrix, each command returns a vector.
sumc(x)
computes the sum of the elements in each column of a matrix
cumsumc(x)
computes the cumulative sum of the elements in each column of a matrix
prodc(x)
computes the product of the elements in each column of a matrix
cumprodc(x)
computes the cumulative product of the elements in each column of a matrix
maxc(x)
returns the maximum element of each column
minc(x)
returns the minimum element of each column
rows(x)
returns the number of rows of a matrix
cols(x)
returns the number of columns of a matrix
meanc(x)
computes the mean of each column
medianc(x)
computes the median of each column
corrx(x)
computes the correlation matrix of x
corrvc(x)
computes the correlation matrix of a variance-covariance matrix
crossprd
computes a cross product
sortc(x,1)
sorts the rows of a matrix according to the numeric order of the first column (the second
argument is the number of the column to sort on). There are other “sort...” commands
which get a little fancier.
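A small sketch of the column-wise commands, using a made-up (3 x 2) matrix. Because each command returns one value per column as a column vector, the results are transposed here so that they print on one line.

```gauss
new;
x = { 1 10, 2 20, 3 30 };        /* hypothetical (3 x 2) data matrix */

print "column sums:";
print sumc(x)';                   /* 6 60 */
print "column means:";
print meanc(x)';                  /* 2 20 */
print "column maxima:";
print maxc(x)';                   /* 3 30 */
print "cumulative sums down each column:";
print cumsumc(x);                 /* rows: 1 10, 3 30, 6 60 */
print "x is " rows(x) " by " cols(x);
end;
```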
eye(n)
creates an (n x n) identity matrix
zeros(m,n)
creates an (m x n) null matrix. This is commonly used to initialize matrices. For
example, if we want to record the results from each run of a loop in a matrix, we need
to initialize the matrix prior to starting the loop.
ones(m,n)
creates an (m x n) summer matrix of ones. Useful for defining a constant term in a
regression.
if “logical expression”; “what you want to happen”; endif;
this is how you set up a conditional expression. The condition ends with a semi-colon and
the statements to run simply follow it; GAUSS has no “then” keyword. Notice the
“endif;” – you always need to close these expressions with “endif;”. The logical
expressions supported here include the comparison operators shown above plus “and” and
“or.” You can include as many “and” / “or” expressions as you like, but it’s often useful
to use parentheses to keep track of how they relate to one another. To test a further
condition when the first one fails, add a line after the “if” statements of the form
“elseif “logical expression”; “what you want to happen”;” -- you also can use as many
“elseif” expressions as you like.
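The sketch below combines a conditional with a matrix that is initialized by “zeros” before a loop fills it in. The names n and results, the cut-off values, and the use of a “for” loop and an “else;” branch are choices made for this illustration and are not taken from the text above.

```gauss
new;
n = 5;
results = zeros(n,1);          /* initialize before the loop */

for i (1, n, 1);               /* i runs from 1 to n in steps of 1 */
    if i lt 3;
        results[i] = -1;
    elseif (i ge 3) and (i lt 5);
        results[i] = 0;
    else;
        results[i] = 1;
    endif;
endfor;

print results';                /* -1 -1 0 0 1 */
end;
```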
x[m,n]
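this is not a command by itself but is an integral part of many commands. In a declaration
such as “let” or “load” it gives the dimensions (m x n) of the matrix x; in an expression it
refers to the element in row m and column n of x.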
y=x[2:5,1:3]
This assigns to the variable y a submatrix of the matrix x. Here, y is now a (4 x 3)
matrix with values given by the corresponding elements in rows 2 – 5 and columns 1 –
3 of the x matrix. The operation also supports scalar variables in place of numerical
values, such as y = x[1:rows(x)-1,1:cols(x)-1], which assigns all but the last row and
column of the matrix x to the variable y. Or
y = x[a:b,c:d], where a,b,c and d have been defined previously in the program. If you
write y=x[.,c:d], then all rows of x are preserved; if you write y=x[a:b,.], then all
columns of x are preserved (notice the “.” inside the square brackets).
y=selif(x, [logical expression])
this selects the rows of the x matrix that satisfy the logical expression and
assigns them to y. The logical expression must evaluate to a column vector of 1s and 0s with
one entry per row of x. See the command reference for an example. For very large data
sets, this procedure requires a lot of memory because it loads the entire x matrix each
time it checks the logical expression. In such cases, it may be better to write-out the
code explicitly to accomplish this task.
y=delif(x, [logical expression])
this deletes the rows of the x matrix that satisfy the logical expression and
assigns the resulting matrix to y. See the command reference for an example.
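Indexing, “selif” and “delif” can be tried out on a small made-up matrix. In this sketch the names y, top, keep, pos and neg are hypothetical, and the condition is built with the element-by-element operator “.gt” so that it yields one 1 or 0 per row of x.

```gauss
new;
x = { 1 50, 2 -3, 3 12, 4 -8, 5 7 };     /* hypothetical (5 x 2) matrix */

y = x[2:4,1];              /* rows 2-4 of column 1: (3 x 1) */
top = x[1:rows(x)-1,.];    /* all but the last row: (4 x 2) */

keep = x[.,2] .gt 0;       /* (5 x 1) vector of 1s and 0s */
pos = selif(x, keep);      /* rows 1, 3 and 5 of x */
neg = delif(x, keep);      /* rows 2 and 4 of x */

print "rows with a positive second column:";
print pos;
print "remaining rows:";
print neg;
end;
```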
Data Issues:
Several of the commands shown below use “let” to begin the command line. In many cases, this is not
necessary – in fact, GAUSS may sometimes produce an error message if you use “let”. For example, you
cannot use “let” to concatenate matrices you have already made (e.g. “let x=a~b” is not supported where “a”
and “b” are matrices; but “x=a~b” is supported). However, sometimes GAUSS seems to produce an error
message if you don’t use “let” – this inconsistency is due to the fact that “let” has been phased out of the
GAUSS source code, but not entirely. The old command reference shows a lot of “let” commands, but you
should ignore most of these.
let x = { 1 2, 3 4 }
this returns a (2 x 2) matrix with 1 and 2 in the first row and 3 and 4 in the second row. The
comma is necessary to demarcate the rows. “let” isn’t necessary.
let x = { 1 2,
          3 4 }
GAUSS does not distinguish between this command and the one shown above. Use this one
if you find it convenient to type in data in matrix form. Note that you have to manually stack
the matrix by hitting return and inserting spaces. “let” isn’t necessary.
let x[2,2] = 1 2 3 4
this returns the same matrix as above. “let” is necessary.
let x = 1 2 3 4
this returns a column vector of “1 2 3 4.” Notice that the column vector is the default data
format for GAUSS – unlike the previous commands, GAUSS doesn’t know what dimensions
to use here, so it uses a column vector. “let” is necessary.
let x[2,2]=1
this returns a (2 x 2) summer matrix (i.e., a matrix of ones). You also could use “ones(m,n)”
but either is supported. “let” is necessary.
let x[2,2]
this returns a (2 x 2) null matrix (i.e., a matrix of zeros). You also could use “zeros(m,n)” but
either is supported. “let” is necessary.
y = reshape(x,r,c)
this reshapes a matrix x of arbitrary dimensions into an (r x c) matrix. The first c elements are
put into the first row of y, the second c elements into the second row of y, and so on. If there
are more elements in x than in y, the remaining elements are discarded. If there are not enough
elements in x to fill y, then when reshape runs out of elements, it goes back to the first element
of x and repeats.
seqa(starting point,increment,number of elements in sequence)
this is a quick way to generate a vector which contains regular intervals (e.g. integers 1-10).
“seqm” works the same way but multiplies rather than adds the successive elements.
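A short sketch of the matrix-building commands described above. All variable names are made up, and the comments give the results one would expect from the descriptions in this section.

```gauss
new;
x = { 1 2, 3 4 };          /* (2 x 2) matrix, no "let" needed */
let v = 1 2 3 4;           /* (4 x 1) column vector */
let w[2,2] = 1 2 3 4;      /* same (2 x 2) matrix as x */

s = seqa(1,1,10);          /* 1, 2, ..., 10 */
m = seqm(1,2,5);           /* 1, 2, 4, 8, 16 */

y = reshape(s,2,3);        /* rows: 1 2 3, 4 5 6; elements 7-10 are discarded */
z = reshape(x,3,3);        /* rows: 1 2 3, 4 1 2, 3 4 1; x is reused cyclically */

print "y:";
print y;
print "z:";
print z;
end;
```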
GAUSS runs in your computer’s RAM, but data and output are stored on disk. Therefore, data must be
loaded into RAM before GAUSS can use it, and output must be stored on disk if you intend to use it after
quitting GAUSS (note: screen output is stored in the specified output file as discussed previously. The
commands presented here are for storing data not printed to the screen.). There are several different ways to
load data into a GAUSS program and to save objects in GAUSS to disk. Various “load” and “save”
commands are the most common, but others such as “import” also exist. Before you load or save data you
must specify a path, one for loading and one for saving. The easiest way to do this is just to type “load path =
drive:\folders;” and “save path = drive:\folders;”. These will remain the defaults unless you overwrite them
later in the program.
load x=name of matrix; or load x=name of text file.txt
this will load a previously saved GAUSS matrix (with the default extension “fmt”) or text file
and assign it to the matrix “x”. “loadm” will also work for matrices.
load x[m,n]=name of ASCII file.asc
this will load an ASCII file and assign it to the matrix “x”. Note that if you mess up the
dimensions, GAUSS will not notice and will just reshape the data to fit the “m” and “n” you
specified (e.g. if you have a file with 8 entries and you mean it to be a 4 by 2 matrix but you
type “load x[2,4]=name of file”, you will get a 2 by 4 instead of a 4 by 2).
x=loadd(“name of file”)
this will load a previously saved GAUSS data set or small ASCII file and assign it to the
matrix “x”. Note the quotation marks are required even though you will NOT see them in the
manual. Read the section on the “Atog” utility for instructions on converting an ASCII file into
a GAUSS data set prior to loading - this will be required with any substantial ASCII files.
The program DBMSCOPY can be used to convert ASCII files to GAUSS data sets, as well.
save x;
this saves the object x as a GAUSS object named “x”. Here, “x” can be a procedure,
function, matrix, string, etc. This is much more convenient than using the output file (which
essentially is just a printout) because “save” allows you to easily reload the object into another
GAUSS program. DO NOT put an extension on the object – the computer will assign the file
a default extension. Leave this alone, it will make your life easier when you reload the object.
If you want to save an object without specifying a default path, you can write “save
x=drive:\folders\file;”.
y=saved(x,name of file,variable names string);
this saves the matrix x as a GAUSS data set. This may be convenient if you want to transform
some variables and then save them for later use in a regression, particularly when your dataset
is large. “save” also allows you to do this.
clear x
saving an object to disk does not clear the RAM associated with that object. This can be a
problem when you are working with large data sets. Use “clear” to free up RAM after
you have saved an object. If you clear x before saving x, you will permanently lose x.
x=missrv(x[.,1],y[.,1])
this replaces any missing values in the first column of x with the corresponding values in the
first column of y. “miss” will reverse the process and substitute GAUSS’ missing value code
wherever it finds the number you specify. “code”, “recode” and “substute” also do roughly
the same thing, except that you define exactly what to look for (e.g. some number) and what to
substitute (e.g. another number).
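The following sketch strings together saving, clearing, reloading and missing-value handling. The folder c:\gauss\work is a hypothetical path that would have to exist on your machine, and the value -999 is an assumed code for missing data in the made-up vector d.

```gauss
new;
save path = c:\gauss\work;     /* hypothetical folder */
load path = c:\gauss\work;

x = rndu(100,3);
save x;                        /* written to the save path with the default extension */
clear x;                       /* frees the memory used by x */
load x;                        /* reads the saved matrix back in */
print "x is " rows(x) " by " cols(x);

d = { 1, -999, 3, -999 };      /* -999 stands for a missing observation */
d = miss(d, -999);             /* replace -999 with the GAUSS missing value code */
r = { 10, 20, 30, 40 };
d = missrv(d, r);              /* fill missing entries from r: 1 20 3 40 */
print d';
end;
```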
Graphics:
GAUSS has a pretty good graphics program, and it may be worth learning. If you have more than one series
to graph, you can graph them together in one graph or in several different windows on the same page.
GAUSS also supports 3-D graphs. Personally, I prefer to load data into Excel to generate graphs, so I am
fairly unfamiliar with GAUSS’ graphics capabilities. But here is a brief intro.
library pgraph
this lets GAUSS know that you will be using the graphics module - there are several separate
modules for use with GAUSS e.g. one for maximum likelihood estimations and you must
always ‘open the door’ to them at the beginning of the program if you will need to use them.
title(“title of the graph”)
this puts a title on your graph. Note that the quotation marks are required.
ylabel(“label for the y axis of your graph”)
this puts a label for the vertical axis on the graph. Note that the quotation marks are required.
xlabel(“label for the x axis of your graph”)
this puts a label for the horizontal axis on the graph. Note that the quotation marks are required.
xy(x,y)
this makes a 2-d graph of your x and y vectors (x and y must be column vectors, so transpose
them if necessary). You can also concatenate two series together in order to graph them on
the same graph. For example, say you have the distribution of income for two groups (denote
x and y as the distribution for group A and B) which you want to graph together. Let z=x~y
and w is income, then “xy(w,z)” would graph both series together.
_plegstr=”graph1\000graph2”
this gives legend text in a box on your graph (here, “graph1” and “graph2”). If you have
several curves, separate the text with “\000.”
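Putting the graphics commands together, a program might look like the sketch below. The data are made up (two normal density curves rather than income distributions). The calls to “graphset” (which resets the graphics settings) and “_plegctl = 1” (which switches the legend on) are not covered above, so treat them as assumptions to check against the pgraph documentation.

```gauss
new;
library pgraph;                    /* make the graphics module available */
graphset;                          /* assumption: reset graphics settings to defaults */

w = seqa(-3, 0.1, 61);             /* horizontal axis values from -3 to 3 */
x = pdfn(w);                       /* first series */
y = pdfn(w - 1);                   /* second series, shifted to the right */
z = x~y;                           /* one column per series */

title("Two series on one graph");
xlabel("w");
ylabel("density");
_plegctl = 1;                      /* assumption: turns the legend on */
_plegstr = "graph1\000graph2";     /* one legend entry per column of z */

xy(w, z);                          /* draws both curves against w */
end;
```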