Getting Started With Stata - Department of Epidemiology

Session 1
Jim Anthony
John Troost
Department of Epidemiology
Michigan State University
Windowing and the Edit submenus
The Review Window displays a record of the commands implied by your changes to the data
editor. You can save these commands so that you do not have to enter the basic data
structure next time.
VARIABLE window, with variables being created
COMMAND window
The ‘Log’ or ‘Output’ window echoes back the commands and the result of each command.
You’ll learn to save a log file that you can use to document your work, copy/paste tables
to emails, or print it out.
Enter the 1 0 1 0 sequence in the first four rows of var1 as shown
here, and then click on the first row of the second column.
In that cell of the dataset, enter 1 as shown below.
Repeat that process for the var2 variable by double-clicking on ‘var2’
at the top of the second column of values, and make the changes as
shown below in order to label the unprotected sex variable. This
variable is also a ‘dummy-coded’ exposure variable, which means that
1 is the code for exposed and 0 is the code for unexposed.
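For readers who prefer typed commands, the Data Editor steps above can be sketched roughly as follows. This is only an illustrative sketch: the slides state the 1 0 1 0 sequence for var1 and a 1 in the first row of var2, so the remaining var2 values (1 0 0), the label texts, and the value-label name yesno are assumptions. The variable names aids and u_sex are taken from the table described later in this session.

```stata
* Sketch of the Data Editor steps as typed commands.
* ASSUMPTIONS: var2 rows 2-4, the label texts, and the name "yesno".
clear
input var1 var2
1 1
0 1
1 0
0 0
end

* Give the two columns informative names
rename var1 aids
rename var2 u_sex

* Attach descriptive labels to the variables
label variable aids  "AIDS case status"
label variable u_sex "Unprotected sex exposure"

* Dummy coding: 1 = yes (case or exposed), 0 = no
label define yesno 0 "No" 1 "Yes"
label values aids u_sex yesno

list
```

The commands that Stata records in the Review window after you use the Data Editor may look different from these, but they should produce an equivalent dataset.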
In the jargon of our field, any 0/1 coded variable is a ‘dummy-coded’
variable. Question for you: Is the aids variable a ‘dummy-coded’
variable even though it has to do with ‘case’ status and not with
‘exposure’ status? Think about it. The answer is on the next slide.
Yes, the aids variable is a dummy-coded variable as well because it has the 0/1 coding
scheme.
ANY 0/1 variable might be thought of as a dummy-coded variable, whether it applies to
case status, exposure status, or any other kind of variable. A dummy-coded variable
is always a ‘binary’ or ‘dichotomous’ variable.
I think it is helpful to reserve the term ‘dummy-coded’ for variables whose level of
measurement is ‘nominal’.
This concept of ‘level of measurement’ is an important one.
Nominal variables are at a very low level of measurement. The values are names, and
the 0/1 codes need not correspond to units of measurement, in contrast to an
‘ordinal’ variable, whose values convey rank.
For example, an ordinal variable might convey class rank. The best student gets a class
rank value of 1 (first in the class; best grade). The next best student gets a class
rank value of 2 (second in class; next best score), and so on. The integers convey the
‘distance’ or ranking of each student in relation to an underlying scale, and we can
interpret the meaning of a unit change in the class rank variable.
In this sense, we look across values of nominal variables, but can’t compare levels.
Nominal variables can reveal group differences, but not levels of the variable.
Actually, you can leave this at float type and change the
format to %4.0f, which gives a bit more generality.
But name the variable struc1 (your first data structure).
The data structure you just created corresponds to a null association between aids and
unprotected sex.
One way to think about this data structure is that it is the kind of structure we might
generate if we had flipped two fair heads/tails coins for each of the 400 people, and
then used the pattern of heads and tails to place each person into a case-exposure cell
of the table.
If the laws of chance worked exactly as they should work with respect to these 400
people, and we are flipping Coin #1 and then Coin #2, and looking at the combinations
of heads and tails on the two paired coins, then how many combinations of each type
should we see?
How many ‘head-head’ combinations?
How many ‘tail-tail’ combinations?
How many ‘head-tail’ combinations?
And how many ‘tail-head’ combinations?
Hint:
The chance of a ‘head-head’ combination
= the chance of a ‘tail-tail’ combination
= the chance of a ‘head-tail’ combination
= the chance of a ‘tail-head’ combination.
ANSWER IS ON THE NEXT PAGE
If the laws of chance work exactly as they should work
with respect to these 400 people, then how many of each
paired coin-flip combination should be generated?
100 of each combination type.
This is the data structure we just created using the Stata Data
Editor, for an initial look at the association between being a
case of AIDS and prior unprotected sex exposure.
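The arithmetic behind the 100-per-cell answer, and the resulting struc1 variable, can be sketched in Stata as follows. Each of the four coin combinations has probability 0.5 x 0.5 = 0.25, so 400 people give 100 per cell. This is a sketch only: the u_sex pattern (1 1 0 0) is an assumed row ordering, and the slides build struc1 in the Data Editor rather than with generate.

```stata
* Sketch: expected cell counts under a null association.
* ASSUMPTION: rows ordered as (aids, u_sex) = (1,1) (0,1) (1,0) (0,0).
clear
input aids u_sex
1 1
0 1
1 0
0 0
end

* Expected number of people per cell: 400 x 0.5 x 0.5
display 400 * 0.5 * 0.5

* Store that count as the frequency for every case-exposure cell
generate float struc1 = 400 * 0.5 * 0.5

* Keep the float type but display whole numbers, as the slides suggest
format struc1 %4.0f

list
```

The four struc1 values should sum to 400, the total number of people in the hypothetical coin-flipping exercise.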
Here we leave it as a float variable,
but change the format to: %4.0f.
Name it struc2.
When you go back to the main Stata windows after closing the Data Editor window, you will
find that Stata has kept a record of all your commands in the REVIEW window. You can save
them and study the program syntax later. The next slide explains how to do it.
Your VARIABLES window now has a list of all the variables you created in the dataset,
which you can save.
The BIG window is the OUTPUT or LOG window, and it shows your commands and their
execution results.
You probably won’t see anything in the COMMAND window at this time.
To save your commands for later
inspection, move your cursor into the
Review Window, and right click.
Slide your cursor to ‘SAVE ALL’ and
you will be prompted to declare a
location and file name where you can
save them.
By tradition, the extension for Stata
syntax files includes the letters ‘do’
because these are ‘do’ files.
Save them with an informative name,
such as ‘build26feb11.do’ so that you
can remember how to ‘build’ a
dataset from scratch using this file.
We can go over the other options
later.
However, if you have issued some incorrect
commands, you may want to separate the
correct commands from the incorrect ones
before you save your program syntax
commands for later use.
Incorrect commands show up in the
Review window in red font.
To separate out the incorrect ones (if you
have made any mistakes in issuing commands),
slide the Review Window to the right (red
arrow at bottom of the snapshot), and then
click on the _rc letters printed at the top
of the Review window's command list.
Now you can either select only the correct
commands and save the selected ones,
using the menu from the last slide, or you
can save ALL of the commands, sorted by
correct status.
Now, let’s have you look at the data structures you built, using a basic tabulate command,
which is abbreviated by Stata as: tab
Start by typing tab in the COMMAND window, and follow the instructions below, step by
step. Then go to the next slide.
The result should look like what you see down below.
Now, position your cursor in the command window to the right of the word
‘aids’ as shown above.
Use the space bar to add a space and ENTER this phrase
[fweight=struc1]
Then press the ENTER key.
This command applies ‘struc1’ values as ‘frequency weights’ and builds the
2x2 aids – u_sex table, as shown in the log window (next slide).
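Put together, the full command built up in the steps above might look like the sketch below. The dataset is re-created inline so the example stands alone. The order of the two variables (u_sex before aids) and the row ordering of the data are assumptions, since the slides only show that ‘aids’ is the last word before the weight expression.

```stata
* Sketch: weighted 2x2 table for the null-association structure.
* ASSUMPTIONS: variable order in the tab command and row ordering of the data.
clear
input aids u_sex struc1
1 1 100
0 1 100
1 0 100
0 0 100
end

* Each row of the dataset stands for struc1 people, so the four rows
* expand into a 2x2 table of 400 people
tab u_sex aids [fweight=struc1]
```

Frequency weights let a four-row dataset represent all 400 people without entering 400 separate records.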
The result should look like what you see down below, but
the command line should be empty. Look at the table
before going to the bottom of this slide.
Now, let’s apply the struc2 weights and see the positive association table. Do this by
pressing the PgUp key to retrieve your just-issued command, and change struc1 to
struc2. (You can just change the number. No need to type the entire word again.)
Press ENTER to issue this command.
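The edited command can be sketched as below. The struc2 values used in this session appear only in the slide images, so the counts here (150, 50, 50, 150) are hypothetical values chosen purely to illustrate a positive association. Substitute the values you entered in the Data Editor.

```stata
* Sketch: weighted 2x2 table for a positive-association structure.
* ASSUMPTION: these struc2 counts are HYPOTHETICAL, not the session's values.
clear
input aids u_sex struc2
1 1 150
0 1  50
1 0  50
0 0 150
end

* Same command as before, with struc1 changed to struc2
tab u_sex aids [fweight=struc2]
```

With these illustrative counts, cases are concentrated among the exposed and non-cases among the unexposed, which is the pattern a positive association produces.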
ANNOTATING THE OUTPUT WITH COMMENTS
SAVING THE DATASET
SAVING THE OUTPUT IN A LOG FILE YOU CAN EDIT WITH NOTEPAD
SAVING THE COMMAND FILE YOU CAN EDIT WITH NOTEPAD
SEE SLIDE 24-25, AS SHOWN BEFORE
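The four tasks listed above also have typed-command counterparts, sketched below. The file names session1.log and session1.dta are hypothetical, and the slides carry out these steps through the menus, so treat this as one possible equivalent rather than the procedure shown in the screenshots.

```stata
* Sketch: log the output, annotate it, and save the dataset.
* ASSUMPTION: file names are hypothetical; files go to the working directory.
clear
input aids u_sex struc1
1 1 100
0 1 100
1 0 100
0 0 100
end

* Open a plain-text log that can be edited with Notepad
log using "session1.log", text replace

* Lines that start with an asterisk are comments and are echoed in the log
* Null-association structure: 100 people in each cell
tab u_sex aids [fweight=struc1]

* Save the dataset in Stata format
save "session1.dta", replace

* Close the log so the file is complete on disk
log close
```

The text option writes a plain-text log rather than Stata's default formatted log, and replace overwrites any earlier file with the same name.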
Second Part of Session 1
• As a work group or on your own, view the UCLA Stata introductory
streaming video on other ways to bring data into the Stata environment
(e.g., if you have an .xls spreadsheet version of the data):
http://www.ats.ucla.edu/stat/stata/notes_old/movies/IntroStata1.html
• This video also teaches some nifty Stata tricks about describing datasets,
etc.
• Information about importing SPSS and SAS files into Stata can be found
here:
http://www.ats.ucla.edu/stat/stata/faq/convert_pkg.htm
• Other Stata aids at the UCLA site are here:
http://www.ats.ucla.edu/stat/stata/
Session 2 Overview
• An overview of the Stata epitab commands will be provided.
• The ‘immediate’ commands will be covered in detail.
http://www.stata.com/help.cgi?epitab
In advance of Session 2, read Chapter 1 (3 pages) of this online text if you are new
to epidemiology or need a quick refresher overview.
http://www.epi.msu.edu/janthony/Epidemiologic%20Analysis%20with%20a%20Programmable%20Calculator.pdf
End of Session 1
A copy of this PPT and an annotated Stata do-file with these commands
can be found at the following URL:
http://www.epi.msu.edu/janthony/stata/session1/
Try the .zip file if you cannot access the individual files.