Time Saving Tips for Handling Multiple Data in STATA

Handling Data
A dataset contains a number of observations (people) and a list of variables (their attributes)
Data can be stored in either long or wide format
• Wide – each observation corresponds to a single individual
• Long – more than one observation per person

Wide Format / Long Format

Formats
Even though there is more than one observation per patient in long format, each one can still be considered a ‘single’ observation – Xij
• i = the patient to which the observation belongs
• j = the within-patient identifier

Interchanging between formats (I)
STATA has to be told how to split the data
Command to use for reshaping:
reshape direction varnames, i(varlist) j(varname)
• Direction – the format you want the data to be in after reshaping, e.g. if you are reshaping wide to long you need to specify ‘long’
• Varnames – the variables you want to be reshaped. All other variables are assumed to be constant within i(varlist).
• i(varlist) – the variable(s) whose unique values denote a logical observation. In my data this corresponds to the patient id number.
• j(varname) – the variable whose unique values denote sub-observations/within-patient identifiers

Interchanging between formats (II)
If reshaping from long to wide you need to generate a within-patient identifier from the system variables so that STATA can use it for reshaping, unless one is already present
• _n contains the number of the current observation
• _N contains the total number of observations in the data
If reshaping from wide to long there is no need to generate such a variable, but you need to be more careful after reshaping

Illustration of interchanging

Other uses of the reshape command
Reshape is particularly useful for dealing with numeric indicator variables
In STATA you can avoid laborious programming by using asterisks (*), hyphens (-) and question marks (?) to replace characters in a variable list (varlist)
It is also possible to use time saving methods for dealing with numbers in a number list (numlist)

Varlist substitutions
* stands for any number of characters (including none). STATA deals with all variables with the same letters as those surrounding the ‘*’
• id* – all variables starting with id
• *id – all variables ending with id
• *id* – all variables containing id
? represents a single character and directs STATA to deal with all variables where only that character differs. It is possible to use more than one question mark in a command
first-last: STATA deals with all the variables between the first variable specified and the last one, in dataset order

Numlist substitutions
Specify the numbers you want to use in a command, e.g. 1 2 3 4 5 or 5 10 15 20 25
Specify a regular sequence, e.g. 1/5 instead of 1 2 3 4 5
Represent regular intervals in a sequence, e.g. 5(5)25 instead of 5 10 15 20 25

Soap opera illustration

For var / for num
These commands cycle through a list and perform an operation on each member of the list
• ‘for var’ is used for performing an operation on variables whose names have nothing in common, e.g. length, weight and height
• ‘for num’ is used for substituting numbers in a command

For var/num examples

Foreach, forvalues
These are recent additions to STATA
They allow for more complicated programming and are closer in syntax to the rest of STATA, but are harder to get to grips with than the ‘for var’ and ‘for num’ commands

Foreach/forvalues examples

Other uses of foreach and forvalues
They may be used to deal with several datasets simultaneously
They can be used for variables containing blank elements
Looping over the elements of a local macro (Gus will discuss this in the next session)
Looping over the elements of a global macro (Gus, next session)

Other examples

Discussion
It is important to keep a systematic naming convention in STATA
• pop60, pop70 and pop80 are easier to handle than pop1960, pop70 and popn80
It is worth finding a system you feel comfortable using and sticking with it

General STATA Issues (I)
Any current issues/problems that need addressing
Forum for essential commands that are commonly used or commands that have been recently discovered

General STATA Issues (II)
STATA 6 manuals (except St-Z)
An introduction to survival analysis using STATA by MA Cleves, WW Gould, RG Gutierrez
STATA journal – includes useful programming tips and time saving advice
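
Sketch: interchanging between formats
The reshape slides can be illustrated with a short do-file. This is a minimal sketch, not the data used in the talk: the patient data and the variable names id, sex, wt1-wt3, visit and nvisits are all invented for illustration.

```stata
* Hypothetical wide data: one row per patient, weight at visits 1 to 3
clear
input id sex wt1 wt2 wt3
1 0 70 72 71
2 1 55 54 56
end

* Wide to long: wt is the stub to reshape, id identifies the patient,
* and visit is created to hold the within-patient identifier (1, 2, 3)
reshape long wt, i(id) j(visit)
list, sepby(id)

* Long to wide: visit already exists, so nothing needs generating
reshape wide wt, i(id) j(visit)

* If long data arrived without a within-patient identifier, one could be
* built from the system variables _n and _N (simulated here by dropping visit)
reshape long wt, i(id) j(visit)
drop visit
sort id, stable
by id: generate visit = _n
by id: generate nvisits = _N
reshape wide wt, i(id) j(visit)
```

Within a by group, _n counts rows inside each patient and _N gives that patient's number of rows, which is what makes them usable as a within-patient identifier. sex and nvisits are constant within id, so they are carried through the reshape unchanged.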
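
Sketch: varlist and numlist substitutions
The wildcard and number-list shorthand can be tried on a toy dataset. The variable names below (id, idhosp, pop60, pop70, pop80, length, weight, height) and their values are assumptions chosen to mirror the examples on the slides.

```stata
* Hypothetical data whose names follow a systematic convention
clear
input id idhosp pop60 pop70 pop80 length weight height
1 10 5 6 7 50 3.2 20
2 11 8 9 10 52 3.5 21
end

* id* matches every variable starting with id: id and idhosp
describe id*

* pop?0 matches names where exactly one character differs: pop60 pop70 pop80
summarize pop?0

* first-last takes every variable between the two, in dataset order
summarize length-height

* Number lists: 1/5 expands to 1 2 3 4 5, and 5(5)25 to 5 10 15 20 25
numlist "1/5"
display "`r(numlist)'"
numlist "5(5)25"
display "`r(numlist)'"
```

The numlist command is used here only to display how each shorthand expands; in practice the shorthand is typed directly into whichever command accepts a number list.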
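
Sketch: looping with foreach and forvalues
The looping slides can be sketched as follows. This example sticks to foreach and forvalues rather than the older ‘for’ command, and the data and the log_ prefix for the new variables are hypothetical.

```stata
* Hypothetical data
clear
input length weight height pop60 pop70 pop80
50 3.2 20 5 6 7
52 3.5 21 8 9 10
end

* foreach over variables whose names have nothing in common
foreach v of varlist length weight height {
    generate log_`v' = ln(`v')
}

* forvalues over a regular sequence of numbers: 60, 70, 80
forvalues y = 60(10)80 {
    summarize pop`y'
}

* foreach over an explicit number list
foreach n of numlist 1/3 {
    display "pass `n'"
}
```

Inside each loop the macro name in left and right quotes (for example `v') is replaced by the current list element before the command runs, which is why a systematic naming convention such as pop60, pop70, pop80 makes the forvalues loop possible.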