Stata Introduction, Short v2
Hein Stigum (H.S.), May-16
Presentation, data and programs at: http://folk.uio.no/heins/ (courses)

Stata introduction
• General use
  – Interface and menu
  – Do-files and syntax
  – Data handling
• Analysis
  – Descriptive
  – Graphs
  – Bivariate

Why Stata
• Pro
  – Aimed at epidemiology
  – Many methods, and growing
  – Graphics
  – Structured, programmable
  – Coming soon to a course near you
• Con
  – The whole dataset is held in memory, so available memory must exceed the file size

Interface
[Screenshot: Stata 9 interface]
[Screenshot: Stata 12 interface, with the Do-file editor and Data editor buttons marked]

Menu
[Screenshot: Stata menus]

Do-file example
• New do-file: toolbar icon or Ctrl-9
• Run: mark the lines, then Ctrl-D

Syntax
• Syntax
  [bysort varlist:] command [varlist] [if exp] [in range] [, options]
• Examples
  – mean age
  – mean age if sex==1
  – bysort sex: summarize age
  – summarize age, detail

Data handling

Import data
• Using SPSS 14.0-17.0
  – Save as, Stata Version 8 SE

Use and save data
• Open data
  – use "C:\Course\Myfile.dta", clear
• Describe
  – describe                   describe all variables
  – list x1 x2 in 1/20         list observations 1 to 20
• Save data
  – save "C:\Course\Myfile.dta", replace

Use data from web
• webuse "file"                use data from the Stata homepage
  1. webuse set "http://www.med.uio.no/forskning/doktorgradkarriere/forskerutdanning/kurs/biostatistikk/mf9510-logistisk-regresjon-overlevelsesanalysecox/"     set the homepage
  2. webuse "birth1"           data for exercise 1

Generate, replace
• Index
  – generate index=0
  – replace index=1 if sex==1 & age<30
• Young/Old
  – generate old=(age>50) if age<.
• Serial numbers, lags
  – generate id=_n
  – generate age1=age[_n-1]

Dates
• From numeric to date, e.g. m=12, d=2, y=1987
  generate birth=mdy(m,d,y)
  format birth %td
• From string to date, e.g. bstr="01.12.1987"
  generate birth=date(bstr,"DMY")
  format birth %td

Missing
• Note!
  – Missing is represented as "."
  – Missing values are stored as very large numbers, larger than any real value
  – age>30 will include missing
  – age>30 & age<. will not
• Test for missing
  – replace age=0 if (age==.)
• Remove
  – drop if age==.
• Change a code to missing
  – replace educ=. if educ==99

Describe missing
• Summarize variables
  summarize id bullied sex
• Missing in tables
  tab bullied sex, missing
  misstable summarize bullied sex      (new command)

Help
• General
  – help command
  – findit keyword             search Stata and the net
• Examples
  – help table
  – findit aflogit

Summing up
• Use do-files
  – Run: mark, Ctrl-D
• Syntax
  – command [varlist] [if exp] [in range] [, options]
• Missing
  – age>30 & age<.
  – generate old=(age>50) if age<.
• Help
  – help describe

Descriptive
• Continuous
  summarize weight
  summarize weight, detail             percentiles and more
• Categorical
  tabulate bullied
  tabulate bullied, nolabel            shows the numeric coding

Other descriptives
  tabstat mAge, stat(N min p50 mean max) by(parity)

Graphics

Twoway plots
• Syntax
  – twoway (plot1, opts) (plot2, opts), opts
• One plot
  – kdensity bw
    [Figure: kernel density estimate of birth weight; kernel = epanechnikov, bandwidth = 102.3251]
  – scatter bw gest
    [Figure: birth weight against gestational age]

  twoway ( kdensity bw if sex==1, lcolor(blue) ) ///
         ( kdensity bw if sex==2, lcolor(red) )
  [Figure: Weight distribution by sex; x axis: gram]

  twoway (scatter bw gest) (fpfitci bw gest) (lfit bw gest)
  – scatter: scatter plot; fpfitci: smooth with CI; lfit: line fit
  [Figure: Weight by gestational age; y axis: gram, x axis: days]

Titles
  scatter bw gest, title("title") subtitle("subtitle") xtitle("xtitle") ytitle("ytitle") note("note")
  [Figure: scatter plot showing where title, subtitle, xtitle, ytitle and note are placed]

Bivariate analysis

2 independent samples
Do boys and girls have the same mean birth weight?
  twoway ( kdensity weight if sex==1, lcolor(blue) ) ///
         ( kdensity weight if sex==2, lcolor(red) )
[Figure: kernel densities of birth weight for boys and girls; x axis: Birth weight]
Equal means? Equal variance?
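The general-use and descriptive commands above can be run together in one do-file. This is a minimal sketch under stated assumptions, not the course program: part 1 types in a tiny invented dataset with input (variable names follow the slides, all values are made up), and part 2 uses auto.dta, the example dataset that ships with Stata, as a stand-in for the course data, so rep78, foreign, mpg, weight and length are auto.dta variables, not slide variables. The scalar names n_with_missing and n_without_missing are hypothetical.

```stata
* Part 1: tiny made-up dataset (values invented for illustration only)
clear
input sex age m d y str10 bstr
1 25 12  2 1987 "02.12.1987"
2 45  6 15 1970 "15.06.1970"
1 60  1 31 1955 "31.01.1955"
2  .  3  1 1990 "01.03.1990"
end

* Index: 1 for boys (sex==1) under 30, otherwise 0
generate index = 0
replace index = 1 if sex==1 & age<30

* Young/old: "if age<." leaves old missing when age is missing
generate old = (age>50) if age<.

* Serial number and lag (age of the previous observation)
generate id = _n
generate age1 = age[_n-1]

* Dates from numeric parts and from a day.month.year string
generate birth = mdy(m, d, y)
generate birth2 = date(bstr, "DMY")
format birth birth2 %td

* Missing is larger than any number: the first count includes obs 4
count if age>30
scalar n_with_missing = r(N)
count if age>30 & age<.
scalar n_without_missing = r(N)
list id sex age index old age1 birth birth2, clean

* Part 2: Stata's bundled auto.dta as a stand-in for the course data
sysuse auto, clear

* Describe missing: rep78 has missing values in this dataset
tabulate rep78 foreign, missing
misstable summarize rep78

* Continuous and categorical descriptives
summarize weight, detail            // percentiles and more
tabulate foreign, nolabel           // shows the numeric coding
tabstat mpg, statistics(count min p50 mean max) by(foreign)

* Two kernel densities overlaid in one twoway graph
twoway (kdensity weight if foreign==0, lcolor(blue)) ///
       (kdensity weight if foreign==1, lcolor(red)),  ///
       title("Weight distribution by origin") xtitle("Weight")

* Scatter, fractional-polynomial smooth with CI, and linear fit, with titles
twoway (scatter weight length) (fpfitci weight length) (lfit weight length), ///
       title("title") subtitle("subtitle") xtitle("xtitle") ytitle("ytitle") note("note")

summarize rep78                     // r(N) = number of nonmissing rep78
```

Run it from the Do-file editor (mark all, Ctrl-D). Comparing the two counts in part 1 shows the missing-value pitfall: the first includes the observation with missing age, the second does not.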
2 independent samples, test
  ttest weight, by(sex)                2-sample t-test
  ttest weight, by(sex) unequal        2-sample t-test, unequal variances
  ttest w1 == w2                       paired t-test

Crosstables
Are boys bullied as much as girls?
  tabulate bullied sex, col chi2 nofreq      equal proportions?

Summing up
• Descriptive
  summarize weight
  tabulate sex
• Graphs
  twoway (plot1, opts) (plot2, opts), opts
• Bivariate
  ttest weight, by(sex)
  tabulate bullied sex, chi2
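The bivariate tests summed up above can be sketched in the same way. This assumes auto.dta (bundled with Stata) as a stand-in, with mpg in place of birth weight and foreign in place of sex, a made-up indicator heavy for the crosstable, and a small invented paired dataset (w1, w2) for the paired test; none of these values come from the course data.

```stata
* Stand-in data: auto.dta ships with Stata
sysuse auto, clear

* Two independent samples: equal and unequal variances
ttest mpg, by(foreign)
ttest mpg, by(foreign) unequal

* Crosstable with column percentages and a chi-squared test
* heavy is a made-up indicator: weight above 3000 lbs.
generate heavy = (weight>3000) if weight<.
tabulate heavy foreign, col chi2 nofreq

* Paired samples: the same units measured twice (invented values)
clear
input w1 w2
3100 3250
2900 3000
3300 3320
3050 3200
end
ttest w1 == w2
```

In the paired test each w1 is compared with the w2 on the same row, so the test works on the four differences rather than on two separate groups.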