Stata 0: Introduction (3h)
Hein Stigum
Presentation, data and programs at: http://folk.uio.no/heins/ (under courses)
May-16 H.S.

Stata introduction
• General use
  – Interface and menu
  – Do-files and syntax
  – Data handling
• Analysis
  – Descriptive
  – Graphs
  – Bivariate

Why Stata
• Pro
  – Aimed at epidemiology
  – Many methods, and the number keeps growing
  – Graphics
  – Structured and programmable
  – Coming soon to a course near you
• Con
  – The whole dataset is held in memory, so available memory must be larger than the data file

INTERFACE

Interface
[Screenshot: the Stata 12 main window, with the Do-file editor and Data editor buttons marked]

Menu
[Screenshot: the Stata menu]

Do-file example
• New do-file: toolbar icon or Ctrl-9
• Run: mark the lines, then press Ctrl-D

Syntax
• Syntax
  [bysort varlist:] command [varlist] [if exp] [in range] [, options]
• Examples
  – mean age
  – mean age if sex==1
  – bysort sex: summarize age
  – summarize age, detail

DATA HANDLING

Import data
• Using SPSS (version 14.0 onward)
  – Save as, Stata Version 8 SE

Use and save data
• Open data
  – use "C:\Course\Myfile.dta", clear
• Describe
  – describe                  describe all variables
  – list x1 x2 in 1/20        list observations 1 to 20
• Save data
  – save "C:\Course\Myfile.dta", replace

Stata via kiosk
• Stata 13
  – https://kiosk.uio.no
  – Analyse > StataMp13 > …
• Course files
  1. webuse set "http://www.med.uio.no/forskning/doktorgradkarriere/forskerutdanning/kurs/biostatistikk/mf9510-logistiskregresjon-overlevelsesanalyse-cox/"        set homepage
  2. webuse "birth1"        data for exercise 1

Exercise 1
• Start Stata
• Open a new do-file (syntax file)
• Paste in the webuse set "…" command. The text between the quotation marks should turn red. If it doesn't, retype the quotation marks as straight double quotes ("). Remove any blank spaces inside the red text. Run the command.
• Type webuse "birth1" and run it.
• Describe all variables: describe
• List the first 10 observations of weight, sex and mother's age (mage)
• Save the do-file for later use in the course
30 minutes
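The general syntax and the use/describe/list/save commands can be tried out in a short do-file. This is a minimal sketch, not part of the original course material: it assumes Stata's bundled example dataset auto.dta (loaded with sysuse) in place of the course data, and it saves to a temporary file (the name mycopy is hypothetical) rather than to a course folder.

```stata
* Minimal do-file sketch of the general syntax:
*   [bysort varlist:] command [varlist] [if exp] [in range] [, options]
* Assumption: uses Stata's bundled auto.dta, not the course data
version 12
sysuse auto, clear                 // open data (bundled example dataset)
describe                           // describe all variables
list make price mpg in 1/10        // list observations 1 to 10
summarize price                    // command + varlist
summarize price if foreign==1      // restrict with an if expression
bysort foreign: summarize mpg      // repeat the command for each group
summarize price, detail            // add an option after the comma
tempfile mycopy                    // hypothetical temporary file name
save "`mycopy'", replace           // save a copy of the data in memory
```

Run the whole block at once (mark all lines, Ctrl-D): the local macro created by tempfile only exists while the marked selection runs.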
Descriptive
• Continuous
  summarize weight
  summarize weight, detail        fractiles and more
• Categorical
  tabulate bullied
  tabulate bullied, nolab         show the numeric coding

Other descriptives
  tabstat mAge, stat(N min p50 mean max) by(parity)

Generate, replace, recode
• Index (0/1) (young men)
  – generate index=0
  – replace index=1 if sex==1 & age<30
• Old (0/1)
  – generate old=(age>50) if age<.
• Recode 1 to 0 and 2 to 1 into sex0
  – recode sex (1=0) (2=1), generate(sex0)

Dates
• From numeric to date (3 numeric variables into a date variable)
  ex: m=12, d=2, y=1987
  generate birth=mdy(m,d,y)
  format birth %td
• From string to date (1 string variable into a date variable)
  ex: bstr="01.12.1987"
  generate birth=date(bstr,"DMY")
  format birth %td

Exercise 2
• Summarize mother's age
• Tabulate sex
• Recode parity into parity4 with categories 0, 1, 2, 3-7 (hint: the rule (3/7=3))
  – Tabulate parity by parity4
• Generate gestational age in weeks
  – Summarize the new variable
• Generate a new variable birth in date format from the three variables day, month and year (if day, month and year do not exist, run the 3 lines at the start of the do-file "Stata 0, Introduction.do")
  – List day, month, year and birth to check the results
15 minutes

Missing
• Note!
  – Missing values are represented as "."
  – Missing values are stored as larger than any number
  – age>30 will include missing
  – age>30 if age<. will not
• Test for missing
  – replace age=0 if (age==.)
• Remove
  – drop if age==.
• Change
  – replace educ=. if educ==99

Describe missing
• Summarize missing
  misstable summarize weight sex gest
• Missing in tables
  tab bullied sex, missing

Exercise 3
• Tabulate missing in gestational age (gest) with the misstable command
• Tabulate gest4 versus sex and include missing
• Summarize mage if gest is greater than 260 days
  – Will this include missing in gest?
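The generate/replace/recode, date and missing-value rules can be checked on a tiny dataset. The sketch below is an illustration under stated assumptions: the four observations and the variable names (age, sex, m, d, y, bstr) are hypothetical, chosen to mirror the slides, and are not taken from the course data. The last two count commands show that a missing age passes age>30.

```stata
* Sketch on a small hypothetical dataset (values made up for illustration)
version 12
clear
input age sex m d y str10 bstr
25 1 12  2 1987 "02.12.1987"
45 2  1 15 1990 "15.01.1990"
 . 1  6 30 1975 "30.06.1975"
60 2  3  1 1960 "01.03.1960"
end

* Indicator for young men (0/1)
generate index = 0
replace index = 1 if sex==1 & age<30

* Old (0/1), left missing when age is missing
generate old = (age>50) if age<.

* Recode 1 -> 0 and 2 -> 1 into a new variable
recode sex (1=0) (2=1), generate(sex0)

* Dates from three numeric variables and from a DMY string
generate birth = mdy(m, d, y)
format birth %td
generate birth2 = date(bstr, "DMY")
format birth2 %td

* Missing is larger than any number, so it passes age>30
count if age>30              // 3: includes the missing age
count if age>30 & age<.      // 2: missing excluded
list
```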
  – Summarize mage if gest is greater than 260 days, excluding missing in gest
10 minutes

Help
• General
  – help command
  – findit keyword            search Stata and the net
• Examples
  – help table
  – findit aflogit

Summing up
• Use do-files
  – Run: mark, Ctrl-D
• Syntax
  – command [varlist] [if exp] [in range] [, options]
• Missing
  – age>30 if age<.
  – generate old=(age>50) if age<.
• Help
  – help describe

GRAPHICS

Twoway plots
• Syntax
  – twoway (plot1, opts) (plot2, opts), opts
• One plot
  – kdensity bw
  – scatter bw gest
[Figure: kernel density estimate of birth weight (kernel = epanechnikov, bandwidth = 102.3251), and scatter plot of birth weight versus gestational age]

twoway (scatter bw gest) (fpfitci bw gest) (lfit bw gest)
[Figure: "Weight by gestational age" (y axis: gram, x axis: days), showing the scatter, a smooth with CI (fpfitci) and a linear fit line (lfit)]

Titles
scatter bw gest, title("title") subtitle("subtitle") xtitle("xtitle") ///
    ytitle("ytitle") note("note")
[Figure: scatter plot showing where title, subtitle, xtitle, ytitle and note are placed]

Exercise 4
• Make a density plot of birth weight (weight)
• Make a scatter plot of birth weight versus gestational age (gest)
  – Remove outliers (hint: if gest>250 & gest<310)
  – Add a linear fit line to the scatter plot to see the trend
  – Add a smoothing curve with confidence interval (fpfitci) to look for a non-linear trend (hint: the order of the plots matters)
  – Add a title, ytitle and xtitle to the plot
15 minutes

BIVARIATE ANALYSIS

2 independent samples
Do boys and girls have the same mean birth weight?
twoway (kdensity weight if sex==1, lcolor(blue)) ///
       (kdensity weight if sex==2, lcolor(red))
Equal means? Equal variance?
[Figure: overlaid density curves of birth weight for boys and girls; x axis: Birth weight]

2 independent samples test
ttest weight, by(sex)              2-sample t-test
ttest weight, by(sex) unequal      2-sample t-test, unequal variances
ttest w1 == w2                     paired t-test
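The twoway overlay, the "order of the plots matters" hint and the t-tests can be sketched on built-in data. This is an assumption-laden illustration, not the course solution: it uses Stata's bundled auto.dta, with mpg, weight and foreign standing in for the course variables bw (or weight), gest and sex.

```stata
* Sketch: twoway overlays and t-tests on Stata's bundled auto.dta
* (mpg, weight and foreign stand in for bw, gest and sex)
version 12
sysuse auto, clear

* Overlay three plots; the CI band is drawn first so that
* the scatter points and the linear fit stay visible on top
twoway (fpfitci mpg weight) (scatter mpg weight) (lfit mpg weight), ///
    title("Mileage by weight") ytitle("Miles per gallon") ///
    xtitle("Weight (lbs.)")

* Compare the two group distributions before testing
twoway (kdensity mpg if foreign==0, lcolor(blue)) ///
       (kdensity mpg if foreign==1, lcolor(red))

* Two-sample t-tests, assuming equal and unequal variances
ttest mpg, by(foreign)
ttest mpg, by(foreign) unequal
```

Drawing the band last instead would paint it over the points, which is why the plot order matters.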
Crosstables
Are boys bullied as much as girls?
tabulate bullied sex, col chi2 nofreq        equal proportions?

Exercise 5
• The variable magegr2 contains mother's age in two groups. Run tab magegr2 and tab magegr2, nolab to find the groups and the coding. Alternatively, list all value labels with label list.
• Plot the birth weight distribution for each of the two mother's age groups.
• Run a ttest of weight by magegr2. Are the means different?
• Redo the ttest for weight>2000 to get more nearly normal distributions.
  – Are the means different?
• Generate an indicator for high birth weight (>4500).
• Make a table of high birth weight by gestgr2 with column percentages and a chi-square test.
15 minutes

Extra (if you have time)
• Run help tabstat and look at the statistics options
• Run a tabstat of weight showing N min p25 p50 p75 max, by magegr2

Summing up
• Descriptive
  summarize weight                     continuous
  tabulate sex                         categorical
• Graphs
  twoway (plot1, opts) (plot2, opts), opts
• Bivariate
  ttest weight, by(sex)                continuous
  tabulate bullied sex, chi2           categorical

EXTRA INFO

Keep graphs (see the last part of the do-file)
1. Set (run once)
   set autotabgraphs on
2. Give each plot a name
   scatter weight mage, name(plot1, replace)
   kdensity weight, name(plot2, replace)
3. Both plots are now on the screen

Stata via kiosk on Mac
Mac users must first remotely control a Windows server and, from there, use Internet Explorer to log in to the Windows version of kiosk. UiO recommends that Mac users download and install the program CoRD, which can be used to remotely control Windows machines. From there, start Internet Explorer and log in to http://kiosk.uio.no.
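The crosstable, tabstat and named-graph commands can be combined in one short do-file. This is a minimal sketch under stated assumptions: it uses Stata's bundled auto.dta, and heavy is a hypothetical 0/1 indicator that plays the role of the high birth weight indicator from Exercise 5. It leaves out set autotabgraphs and only shows the naming step.

```stata
* Sketch: crosstable with chi-square, tabstat by group, and named graphs
* Assumption: Stata's bundled auto.dta; heavy is a hypothetical indicator
version 12
sysuse auto, clear

* Indicator (0/1) for heavy cars, kept missing if weight is missing
generate heavy = (weight>3000) if weight<.

* Column percentages and chi-square test, frequencies suppressed
tabulate heavy foreign, column chi2 nofreq

* Selected statistics of mpg for each group
tabstat mpg, statistics(n min p25 p50 p75 max) by(foreign)

* Named graphs stay in memory instead of replacing each other
scatter mpg weight, name(plot1, replace)
kdensity mpg, name(plot2, replace)
graph dir                      // list the graphs currently in memory
```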
Information and guidance on downloading, configuring and using CoRD is available at:
http://www.uio.no/tjenester/it/maskin/programvare/hjelp/finne-programmer/sentraleservere.html
https://www.uio.no/tjenester/it/maskin/programvare/programkiosk/hjelp/tilgjengelige_programmer.html
UiO recommends saving documents on the M: drive under Computer (either directly on M:, or in M:\pc\Desktop or M:\pc\My Documents).