Stata Intro
Mixed Models
Hein Stigum
Presentation, data and programs at: http://folk.uio.no/heins/

Why Stata
• Pro
  – Aimed at epidemiology
  – Many methods, growing
  – Graphics
  – Structured, programmable
  – Coming soon to a course near you
• Con
  – Memory > file size (the whole dataset must fit in memory)
  – Copy tables

Apr-20 H.S.

Use

Interface

Do Editor
• New
  – Ctrl-8, or: [toolbar icon]
• Run
  – Mark commands, Ctrl-D to do (execute)

Do-file example

Syntax
• Syntax
  [bysort varlist:] command [varlist] [if exp] [in range] [, opts]
• Examples
  – mean age
  – mean age if sex==1
  – bysort sex: summarize age
  – summarize age, detail

Data handling

Import data
• Using SPSS 14.0
  – Save as, Stata Version 8 SE

Use and save data
• Open data
  – set memory 200m
  – use "C:\Course\Myfile.dta", clear
• Describe
  – describe                  describe all variables
  – list x1 x2 in 1/20        list obs nr 1 to 20
• Save data
  – save "C:\Course\Myfile.dta", replace

Generate, replace
• Age square
  – generate ageSqr=age^2
• Young/Old
  – generate old=0 if (age<=50)
  – replace old=1 if (age>50)
• Alternatives
  – generate old=(age>50)               missing age is coded as 1
  – generate old=(age>50) if age<.      missing age stays missing
• Observation numbers
  – gen id=_n
  – gen lag=age[_n-1]

Missing
• Note!
  – Missing values (.) are stored as larger than any number
  – age>30 will include missing.
  – age>30 & age<. will not.
• Test
  – replace x=0 if (x==.)
• Remove
  – drop if age==.
• Change
  – replace educ=. if educ==99

Calculator
• Display
  – dis 26/3
  – dis exp(1.2)
• Store results
  – scalar se=sqrt( 0.8*(1-0.8)/60 )
  – dis se

Help
• General
  – help command
  – findit keyword            search Stata+net
• Examples
  – help table
  – findit aflogit

Summing up
• Use do files
  – Mark, Ctrl-D to do (execute)
• Syntax
  – command [varlist] [if exp] [in range] [, options]
• Missing
  – age>30 & age<.
  – generate old=(age>50) if age<.
• Help
  – help describe

Books
Web: http://www.stata.com/bookstore
A Gentle Introduction to Stata by Alan C. Acock
A Visual Guide to Stata Graphics by M.N. Mitchell
Multilevel and Longitudinal Modeling Using Stata by S. Rabe-Hesketh, A. Skrondal

Graphics

Twoway density
• Syntax
  – graph twoway (plot1, opts) (plot2, opts), opts
• One plot
  – kdensity x
• Two plots, boys and girls compared
  twoway ( kdensity weight if sex==1, lcolor(blue) ) ///
         ( kdensity weight if sex==2, lcolor(red) )

[Figure: Weight distribution by sex. Two kernel density curves; x-axis: gram.]

Twoway scatter
• Syntax
  – graph twoway (plot1, opts) (plot2, opts), opts
• Examples
  – scatter y x
  – twoway (scatter y x) (lfit y x)

  Fitlines          with CI     Type
  lfit              lfitci      Linear
  qfit              qfitci      Quadratic
  mband, mspline                Median band, median spline
                    fpfitci     Fractional polynomial
  lowess                        Local regression

twoway (scatter weight gest)(lfitci weight gest)

[Figure: Weight by gestational age. Scatter with linear fit and CI; x-axis: days, y-axis: gram.]

Descriptives

Central tendency and dispersion
Mean and standard deviation:
Mean with confidence interval:

Frequency and proportion
Frequency:
Proportion with CI:

Crosstables
Are boys bullied as much as girls? Equal proportions?

Tables for epidemiologists
• Data
  – Must be 0/1
  – Long format. Wide format
• Commands
  – cc                        Case-control
  – mcc                       Matched case-control
• Example
  – cc disease exposed, by(sex)     Stratified MH-OR
• Calculator (i=immediate)
  – cci 10 90 5 95            OR

Logistic regression
Being bullied

Syntax
• Estimation
  – logistic y x1 x2          logistic regression
  – xi: logistic y x1 i.c1    categorical c1
• Post estimation
  – predict yf, pr            predict probability
• Manage models
  – estimates store m1        save model
  – est table m1, eform       show OR
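
The estimation, prediction, and model-management commands above can be strung together in one do-file. This is a minimal sketch, assuming Stata's `lbw` example dataset (fetched with `webuse`, which needs an internet connection) in place of the course data. The variables `low`, `age`, `smoke`, and `race` come from that dataset; `p1` and `m1` are names chosen only for illustration.

```stata
* Sketch only: lbw is a Stata example dataset, not the course data
webuse lbw, clear

* Logistic regression; xi: expands the categorical variable race into dummies
xi: logistic low age smoke i.race

* Predicted probability of the outcome for each observation
predict p1, pr
summarize p1

* Store the model, then show it with exponentiated coefficients (odds ratios)
estimates store m1
estimates table m1, eform
```

Mark the lines in the Do Editor and press Ctrl-D to run them.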

Bivariate, dummies
Generate dummies
  gen Island=  (country==2) if country<.
  gen Norway=  (country==3)
  gen Finland= (country==4)
  gen Denmark= (country==5)

Model 1: outcome and exposure
Alternative commands:
  xi:logistic bullied i.country                        use xi: i.var for categorical variables
  xi:logistic bullied i.country, coef                  coefficients instead of ORs
  xi:logistic bullied i.country if sex!=. & age!=.     run only if sex and age are not missing

Model 2: Add confounders
Estimate associations: m1=m2
Predict: m2 best

Model 3: interaction
  lincom age+1*agesex         effect of age for boys
  lincom age+2*agesex         effect of age for girls

Regression Summary
• Estimation
  – regress y x1 x2           linear regression
  – logistic y x1 x2          logistic regression
  – xi:regress y x1 i.x2      categorical x2
• Manage results
  – estimates store m1        store results
  – estimates table m1 m2     table of results
  – estimates stats m1 m2     statistics of results
• Post estimation
  – predict yhat, xb          linear prediction
  – predict res, resid        residuals
  – lincom b0+2*b3            linear combination
• Help
  – help logistic postestimation

Mixed Models
Multilevel models
Panel data
Repeated measurements

Long and wide data

Wide data
       id     bp0     bp1     bp2     bp3
  1.    1   151.6   156.8   138.5   161.7
  2.    2   132.5   139.1   150.0   159.9

  reshape wide bp, i(id) j(occ)      long to wide
  reshape long bp, i(id) j(occ)      wide to long

Long data
       id   occ      bp
  1.    1     0   151.6
  2.    1     1   156.8
  3.    1     2   138.5
  4.    1     3   161.7
  5.    2     0   132.5
  6.    2     1   139.1
  7.    2     2   150.0
  8.    2     3   159.9

Correlated measures
• Two measures per person: W1 W2
  – symmetry W1 W2            Measure the same?
• Matched Case-Control
  – mcc expCase expContr      Matched OR

Multilevel data
• Panel data
• xt                          xsectional time data
• help xt

Setup and describe
• Set panel data
  – xtset school              pupils nested in schools
  – xtset id time             times nested in subjects
• Describe panel data
  – xtdes                     describe data and missing
  – xtsum bp                  summarize blood pressure
  – xttab ht                  tabulate hypertension
  – xtline bp                 plot bp versus time for each id
• Lag and lead
  – replace bp=bp[_n+1] if id==1

Logistic regression methods
• Fixed effects models
  – logit y x1 x2, or
• Conditional fixed effects models
  – clogit y x1 x2, group(id) or
• Random intercept models
  – xtlogit y x1 x2, i(id) or
• Mixed effects models
  – xtmelogit y x1 x2 || id: x1, or
• Population average effects
  – xtgee y x1 x2, i(id) t(time) fam(bin) link(logit) robust eform
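
The panel-data steps above (reshape, xtset, describe, then a model) can be tried end to end. This is a minimal sketch under two assumptions: Part 1 types in the two subjects from the long/wide slide, and Part 2 uses Stata's `union` example dataset (fetched with `webuse`, which needs an internet connection) instead of the course data. The variables `idcode`, `year`, `union`, `age`, and `grade` belong to that example dataset. The panel is declared with `xtset`, so the `i()` and `t()` options are left out, and `vce(robust)` is used as the current spelling of `robust`.

```stata
* Part 1: wide to long, using the two subjects from the slide
clear
input id bp0 bp1 bp2 bp3
1 151.6 156.8 138.5 161.7
2 132.5 139.1 150.0 159.9
end
reshape long bp, i(id) j(occ)
xtset id occ
xtsum bp
list, sepby(id)

* Part 2: models on a Stata example dataset (assumption, not the course data)
webuse union, clear
xtset idcode year
xtdes

* Random intercept model, reported as odds ratios
xtlogit union age grade, or

* Conditional fixed effects: uses within-subject variation only
clogit union age grade, group(idcode) or

* Population average effects with robust standard errors
xtgee union age grade, family(binomial) link(logit) vce(robust) eform
```

The three models answer different questions, so their odds ratios need not agree: `xtlogit` and `clogit` give subject-specific effects, while `xtgee` gives effects averaged over the population.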