Time Saving Tips for Handling Multiple Data in STATA

Handling Data
▪ A dataset contains a number of observations (people) and a list of variables (their attributes)
▪ Data can be stored in either long or wide format
• Wide – each observation corresponds to a single individual
• Long – more than one observation per person
Wide Format
Long Format
Formats
▪ Even though there is more than one observation per patient in long format, each row can still be considered a ‘single’ observation, $X_{ij}$, where:
• i = the patient to which the observation belongs
• j = the within-patient identifier
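To make the $X_{ij}$ idea concrete, here is a minimal sketch using hypothetical long-format data (the variables id, visit and bp are invented for illustration). The `isid` command checks that the pair (i, j) = (id, visit) uniquely identifies every row, which is what allows each row to be treated as a single observation.

```stata
* Hypothetical long-format data: patients (i = id) seen at visits (j = visit)
clear
input id visit bp
1 1 120
1 2 125
1 3 118
2 1 140
2 2 135
2 3 138
end

* id alone repeats, but (id, visit) identifies each row;
* isid stops with an error if that were not the case
isid id visit
list, sepby(id)
```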
Interchanging between formats (I)
▪ STATA has to be told how to split the data
▪ Command to use for reshaping:
reshape direction varnames, i(varlist) j(varname)
• direction – the format you want the data to be in after reshaping, e.g. if you are reshaping wide → long you need to specify ‘long’
• varnames – the variables you want to be reshaped. When going wide → long these are the stubs of the variable names (e.g. bp for bp1 bp2 bp3). All other variables are assumed to be constant within each value of i(varlist).
• i(varlist) – the variable(s) whose unique values denote a logical observation. In my data this corresponds to the patient id number.
• j(varname) – the variable whose unique values denote sub-observations/within-patient identifiers
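A minimal sketch of both directions, assuming hypothetical wide data with a patient identifier id, a variable sex that is constant within patient, and blood pressure stored as bp1, bp2 and bp3. The stub bp is the varnames argument, id is i() and visit is the new j() variable.

```stata
* Hypothetical wide-format data: one row per patient
clear
input id sex bp1 bp2 bp3
1 0 120 125 118
2 1 140 135 138
end

* wide -> long: stub bp, i() = patient id, j() = new within-patient variable
reshape long bp, i(id) j(visit)
list, sepby(id)

* long -> wide: the same stub, i() and j() reverse the operation
reshape wide bp, i(id) j(visit)
list
```

sex is not listed as a varname, so reshape carries it along unchanged in both directions.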
Interchanging between formats (II)
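▪ If reshaping from long → wide you need to generate a within-patient identifier for j() (typically from the system variable _n) so that STATA can use it for reshaping, unless such an identifier is already present
• _n contains the number of the current observation; when used with by id:, it counts observations within each patient
• _N contains the total number of observations in the data; when used with by id:, it gives the number of observations for that patient
▪ If reshaping from wide → long there is no need to generate an identifier, since reshape builds j from the suffixes of the variable names, but you need to be more careful after reshaping: for example, patients with fewer visits gain rows in which the reshaped variables are missing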
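Both points can be sketched with one small example. The data are hypothetical: each patient has a variable day giving the timing of the visit, but no visit number. Sorting by day inside bysort makes _n number the visits in time order; without that sort the numbering would follow whatever order the rows happen to be in. Reshaping back to long then shows the extra, empty row for the patient who had fewer visits.

```stata
* Hypothetical long data: no visit number, only the day of each visit
clear
input id day bp
1 60 125
1 10 120
2  5 140
2 90 138
2 40 135
end

* long -> wide: build the within-patient identifier first.
* Under by, _n is the row number within each patient (in day order)
* and _N is that patient's number of rows.
bysort id (day): generate visit = _n
by id: generate nvisits = _N

* day varies within patient, so it is reshaped alongside bp
reshape wide bp day, i(id) j(visit)
list

* wide -> long: patient 1 had no third visit, yet gets a row for it
reshape long bp day, i(id) j(visit)
count if missing(bp)
drop if missing(bp)
```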
Illustration of interchanging
Other uses of the reshape command
▪ Reshape is particularly useful for dealing with numeric indicator variables
▪ In STATA you can avoid tedious programming by using asterisks (*) and question marks (?) as wildcards, and hyphens (-) to denote a range of variables, in a variable list (varlist)
▪ It is also possible to use time-saving shorthand for numbers in a number list (numlist)
Varlist substitutions
▪ * – stands for any number of characters: STATA deals with all variables whose names match the letters surrounding the ‘*’
• id* – all variables starting with id
• *id – all variables ending with id
• *id* – all variables containing id
▪ ? – stands for exactly one character: STATA deals with all variables whose names differ only in that position. More than one question mark can be used in a command
▪ first-last – STATA deals with all the variables from the first variable specified to the last one, in the order they are stored in the dataset
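The substitutions can be tried out with `ds`, which lists the variables a varlist expands to. The variable names below are hypothetical, chosen so that each pattern matches a different set.

```stata
* Hypothetical variables, created in this order
clear
set obs 1
generate idnum  = 0
generate idcode = 0
generate patid  = 0
generate bp1    = 0
generate bp2    = 0
generate bp10   = 0
generate age    = 0

ds id*        // starts with id: idnum idcode
ds *id        // ends with id: patid
ds *id*       // contains id: idnum idcode patid
ds bp?        // bp plus exactly one character: bp1 bp2 (not bp10)
ds bp1-bp10   // dataset order from bp1 to bp10: bp1 bp2 bp10
```

Note that bp1-bp10 depends on the storage order of the variables, not on their numeric suffixes.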
Numlist substitutions
▪ Specify the numbers you want to use in a command
e.g. 1 2 3 4 5 or 5 10 15 20 25
▪ Specify a range of consecutive integers
e.g. 1/5 instead of 1 2 3 4 5
▪ Represent regular intervals in a sequence
e.g. 5(5)25 instead of 5 10 15 20 25
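The `numlist` command expands a number list and stores the result in r(numlist), which is a quick way to check what a shorthand will produce before using it in a command.

```stata
* numlist expands a number list and stores the result in r(numlist)
numlist "1/5"
display "`r(numlist)'"    // 1 2 3 4 5

numlist "5(5)25"
display "`r(numlist)'"    // 5 10 15 20 25

* The shorthands can be combined in one list
numlist "1/3 10(10)30"
display "`r(numlist)'"    // 1 2 3 10 20 30
```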
Soap opera illustration
For var/num
▪ These commands cycle through a list (of variables or numbers) and perform an operation on each member of the list
▪ ‘for var’ is used for performing an operation on variables whose names have nothing in common, e.g. length, weight and height
▪ ‘for num’ is used for substituting numbers in a command
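A minimal sketch of the legacy `for` syntax, in which the placeholder X is replaced by each list element in turn. The data and variable names are hypothetical. This command comes from older STATA versions and has since been superseded by foreach/forvalues, so its availability in a given release is an assumption worth checking.

```stata
* Hypothetical data with unrelated variable names
clear
input length weight height
10 70 180
12 80 175
end

* for var: X is replaced by each variable name in turn
for var length weight height: replace X = X / 10

* for num: X is replaced by each number in turn
for num 17 18: count if height >= X
```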
For var/num examples
Foreach, forvalues
▪ These are recent additions to STATA
▪ They allow for more complicated programming and are closer in syntax to the rest of STATA, but are harder to get to grips with than the ‘for var’ and ‘for num’ commands
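For comparison, here is the same kind of task written with foreach and forvalues, again on hypothetical data. Instead of a fixed placeholder X, each loop names a local macro (v and t here), which is referred to with the `name' quoting used elsewhere in STATA.

```stata
* Hypothetical data with unrelated variable names
clear
input length weight height
10 70 180
12 80 175
end

* foreach: the local macro v holds each variable name in turn
foreach v of varlist length weight height {
    replace `v' = `v' / 10
}

* forvalues: the local macro t takes each value in the numlist 17/18
forvalues t = 17/18 {
    count if height >= `t'
}
```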
Foreach/forval examples
Other uses of foreach and forval
▪ They may be used to work through several datasets in a single loop
▪ They can be used for variables containing blank elements
▪ They can loop over the elements of a local macro (Gus will discuss this in the next session)
▪ They can loop over the elements of a global macro (Gus will also cover this in the next session)
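A minimal sketch combining the first and third uses: two small hypothetical datasets (clinic1 and clinic2, invented names) are saved to the working directory, and a foreach loop over a local macro holding the file names appends them one at a time.

```stata
* Create two small hypothetical datasets in the working directory
clear
input id bp
1 120
2 140
end
save clinic1, replace

clear
input id bp
3 130
4 125
end
save clinic2, replace

* Loop over the elements of a local macro holding the file names,
* appending each dataset in turn
local files clinic1 clinic2
clear
foreach f of local files {
    append using `f'
}
list

* A global macro works the same way: foreach f of global files
```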
Other examples
Discussion
▪ It is important to keep a systematic naming convention in STATA:
pop60 pop70 and pop80
is easier to handle than
pop1960 pop70 and popn80
▪ It is worth finding a system you feel comfortable using and sticking with it
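As a sketch of why consistent names pay off, assume hypothetical regional population data stored as pop60 pop70 pop80. A single wildcard picks up all three, and reshape can read the year directly from the numeric suffix. With pop1960 pop70 popn80 the suffixes would not give a consistent year variable, so the variables would first need renaming.

```stata
* Hypothetical population data using a consistent naming convention
clear
input region pop60 pop70 pop80
1 100 110 125
2 200 190 185
end

* One wildcard picks up every population variable
summarize pop*

* reshape reads the year straight from the numeric suffix
reshape long pop, i(region) j(year)
list, sepby(region)
```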
General STATA Issues (I)
▪ Any current issues/problems that need addressing
▪ Forum for essential commands that are commonly used or commands that have been recently discovered
General STATA Issues (II)
▪ STATA 6 manuals (except St-Z)
▪ An Introduction to Survival Analysis Using Stata by MA Cleves, WW Gould and RG Gutierrez
▪ The Stata Journal – includes useful programming tips and time-saving advice