Stata
A statistical package for epidemiologists
29.05.2016 H.S.

Packages compared

                     SPSS              Stata          Splus / R   SAS
typical user         Social scientist  Epi/med stat   Mat stat
cost, kr             15 000            2 500          8000 / 0    50 000
menu / syntax        yes               yes            yes         yes
user friendly        4                 4              2           4
data handling        OK                OK             OK          OK
graphics             4                 6
non-parametric       4                 5
regression           3                 5
epidemiology         0                 5
survival analysis    3                 5
factor analysis      4                 4
multi level          0                 4
path-models (SEM)    0                 3
measurement error    0                 3
programmable         no                yes
new methods          2                 5

Remaining ratings for Splus / R and SAS, in slide order (row alignment not recoverable):
Splus / R: 5 5 5 3 6 4 3
SAS: 4 5 5 - yes 5 3 yes 5

Why Stata
• Pro
  – Aimed at epidemiology
  – Many methods, growing
  – Graphics
  – Structured, programmable
  – Coming soon to a course near you
• Con
  – Memory must exceed file size (the dataset is held in memory)

Syntax
• Full syntax
  [by varlist:] command [varlist] [if] [in] [, options]
• Examples
  – mean age
  – mean age if sex==1
  – by sex, sort: summarize age
  – summarize age, detail

Ways of working
• Testing
  – p-values
• Estimation
  – Estimate with confidence interval

Bivariate

Mean with CI
• Advanced features
  – Standardization
  – Clustering
  – Bootstrap

Median and percentiles with CI

Compare means
[Figure: density plot "Birth weight distribution by sex"; legend: Boys, N=291; Girls, N=273; x-axis: gram; y-axis: density]

Compare means, T-test
  – ttest weight, by(sex) unequal     (unequal variances)
  – ttest var1 == var2                (paired test)

Graphics

Plot types
[Figure: gallery of plot types. Twoway plots: histogram, density, scatter, matrix plot. Bar and dot plots: bar, hbar, dot. Box plots: box, hbox. Pie plot. Variables shown: weight, gest, mage; groups: <=280 days, >280 days]

Legend with extra information
[Figure: density plot "Birth weight distribution by sex"; legend: Boys, N=291; Girls, N=273; x-axis: gram; y-axis: density]
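The syntax pattern, the t-test and the density plot with group sizes in the legend can be tried together in a short do-file. This is only a sketch under stated assumptions: the slides' own birth data are not available here, so it uses the low birth weight example data that `webuse lbw` downloads from the Stata website (internet access needed), with `bwt` (birth weight in grams) standing in for weight and `smoke` standing in for sex. The local macro names `n0` and `n1` are made up for the illustration.

```stata
* Stand-in data (assumption): low birth weight example data, not the slides' data
webuse lbw, clear

* Syntax pattern: [by varlist:] command [varlist] [if] [in] [, options]
mean bwt
mean bwt if smoke == 1
by smoke, sort: summarize bwt
summarize bwt, detail

* Compare means between two groups, allowing unequal variances
ttest bwt, by(smoke) unequal

* Count the non-missing observations in each group for the legend
count if smoke == 0 & !missing(bwt)
local n0 = r(N)
count if smoke == 1 & !missing(bwt)
local n1 = r(N)

* Overlaid kernel densities, with the group sizes written into the legend
twoway (kdensity bwt if smoke == 0) (kdensity bwt if smoke == 1), ///
    title("Birth weight distribution by smoking status")          ///
    xtitle("gram") ytitle("density")                               ///
    legend(order(1 "Non-smokers, N=`n0'" 2 "Smokers, N=`n1'"))
```

Storing the counts in local macros keeps the legend text in step with the data, so the labels do not have to be retyped when observations are dropped.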
Density with min, max and fractiles
[Figure: density plot of Weight; x-axis labelled at min, fractiles and max: 1392, 2750, 3220, 3630, 3960, 4520, 5488 gram; N=553]

Scatter with fitline + extra point
[Figure: scatter plot "Birth weight by gestational age" with fitted line; one extra point marked "Removed before analysis"; x-axis: Gestational age in days]

Bar with labels inside
[Figure: "Horizontal bars"; x-axis: mean of weight]
Long labels are not a problem with horizontal bars and labels inside.

Regression results
[Figure: "Bullied"; crude and adjusted models for sex, single, chron; x-axis: Odds ratios with 95% confidence interval]

Regression

Purpose of regression
• Prediction
  – Use an estimated model to predict the outcome given covariates in a new dataset
• Estimation
  – Estimate the association between the outcome and each covariate, adjusted for the other covariates

Linear regression, exposure only

Add confounders and compare

Assumptions and influence
• Test of assumptions
  – Independent errors
  – Linear effects
  – Constant error variance
• Influence, robustness

Influence
[Figure: regression line with outlier and regression line without outlier; one point marked "Outlier"; x-axis: Gestational age]

Binary regression
• Odds ratio, OR
  – binreg y x1 x2, or     Link: logit
• Risk ratio, RR
  – binreg y x1 x2, rr     Link: log (ln)
• Risk difference, RD
  – binreg y x1 x2, rd     Link: identity

Odds ratio, Relative risk, Risk Diff
Problems: convergence

Help

Search for help
• General
  – help command
  – findit keyword     (searches the net)
• Examples
  – help table
  – findit GAM
• My home page
  – http://folk.uio.no/heins/

Books
  – A Visual Guide to Stata Graphics, by M.N. Mitchell
  – Data Analysis Using Stata, by Ulrich Kohler and Frauke Kreuter
  – Statistics with Stata (Updated for Version 9), by Lawrence C. Hamilton
  – Multilevel and Longitudinal Modeling Using Stata, by S. Rabe-Hesketh and A. Skrondal
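As a companion to the regression slides, the steps "exposure only", "add confounders and compare" and the three `binreg` variants could look roughly like this in a do-file. It is a sketch, not the analysis behind the slides: it assumes the low birth weight example data from `webuse lbw` (internet access needed), treats `smoke` as the exposure and `age` and `lwt` as confounders purely for illustration, and the stored-estimate names `crude` and `adjusted` are arbitrary.

```stata
* Stand-in data (assumption): low birth weight example data, not the slides' data
webuse lbw, clear

* Linear regression, exposure only
regress bwt smoke
estimates store crude

* Add confounders and compare the exposure coefficient across models
regress bwt smoke age lwt
estimates store adjusted
estimates table crude adjusted, b(%9.1f) se

* Binary regression on the indicator low (low birth weight)
binreg low smoke age, or    // odds ratio, logit link
binreg low smoke age, rr    // risk ratio, log link
binreg low smoke age, rd    // risk difference, identity link
```

The log and identity links do not keep fitted risks between 0 and 1, so the `rr` and `rd` models may fail to converge on some data, which is the problem the slide points to.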