Stata0, Introduction Hein Stigum Presentation, data and programs at: http://folk.uio.no/heins/ Why Stata • Pro – – – – – Aimed at epidemiology Many methods, growing Graphics Structured, Programmable Coming soon to a course near you • Con – Memory>file size – Copy tables May-16 H.S. 2 Data handling Import data • Using SPSS 14.0 – Save as, Stata Version 8 SE May-16 H.S. 4 Interface May-16 H.S. 5 Do Editor • New – Ctrl-8, or: • Run – Mark commands, Ctrl-D to do (execute) May-16 H.S. 6 Do-file example May-16 H.S. 7 Syntax • Syntax [bysort varlist:] command [varlist] [if exp] [in range][, opts] • Examples – – – – May-16 mean age mean age if sex==1 bysort sex: summarize age summarize age ,detail H.S. 8 Use and save data • Open data – set memory 200m – use “C:\Course\Myfile.dta”, clear • Describe – describe – list x1 x2 in 1/20 describe all variables list obs nr 1 to 20 • Save data – save “C:\Course\Myfile.dta” ,replace May-16 H.S. 9 Drop and keep • Drop – drop x1 x2 – drop if sex==1 – drop if age==. drop variables x1 and x2 drop males drop missing • Keep – same as drop May-16 H.S. 10 Recode • Syntax – From 4 to 2 groups: recode educ (1 2=1) (3 4=2)(missing=.), gen(educ2) – From cont. to 3 groups: recode age (min/19=1) (20/29=2) (30/max=3), gen(age3) May-16 H.S. 11 Labels • Variable – label variable q1 ”Age” • Value 1 ) label define freqLab 1”Low” 2”Med” 3”High” 2a) label values smoke freqLab 2b) label values drink freqLab • List – label list May-16 H.S. 12 Generate, replace • Age square – generate ageSqr=age^2 • Young/Old • Alternatives – generate old=0 if (age<=50) – replace old=1 if (age>50) generate old=(age>50) generate old=(age>50) if age<. • Observation numbers – gen id=_n – gen lag=age[ _n-1] May-16 H.S. 13 Dates • From numeric to date ex: m=12, d=2, y=1987 generate bdate=mdy(m,d,y) format bdate %d • From string to date ex: bstr=“01.12.1987” generate bdate=date(bstr,”dmy”) format bdate %d May-16 H.S. 14 Missing • Obs!!! – Missing values are large numbers – age>30 will include missing. – age>30 if age<. will not. • Test – replace x=0 if (x==.) • Remove – drop if age==. • Change – replace educ=. if educ==99 May-16 H.S. 15 Describe missing • Summarize variables • Missing in tables May-16 H.S. 16 Handle data with many variables • Describe – describe vars – summarize vars – codebook vars format and labels N, mean, std, min and max range, missing, mean and std, percentiles • Find variables – describe, simple – lookfor age – describe age*, n list all variables list variables with “age” in name or label list vars starting with “age” and show var number • Change order – order vars May-16 change order of variables H.S. 17 Help • General – help – findit command keyword search Stata+net • Examples – help table – findit aflogit May-16 H.S. 18 Summing up • Use do files – Mark, Ctrl-D to do (execute) • Syntax – command [varlist] [if exp] [in range] [, options] • Missing – age>30 & age<. – generate old=(age>50) if age<. • Help – help describe May-16 H.S. 19 Books Data Analysis Using Stata by Ulrich Kohler and Frauke Kreuter Statistics with Stata (Updated for Version 9) by Lawrence C. Hamilton A visual guide to Stata graphics by M.N. Mitchell Multilevel and longitudinal modeling using Stata by S. Rabe-Hesketh, A. Skrondal May-16 H.S. 20