Stata 2, Bivariate
Hein Stigum
Presentation, data and programs at: http://folk.uio.no/heins/
May-16

Datatypes
• Categorical data
  – Nominal: married/ single/ divorced
  – Ordinal: small/ medium/ large
• Numerical data
  – Discrete: number of children
  – Continuous: weight

Data type dictates type of analysis
• Numerical, normal data: means, T-test, linear regression
• Numerical, not normal data: medians, non-parametric tests
• Categorical: frequency table, cross table, chi-square, logistic regression

Continuous symmetric outcome
Example: Birth weight

Distribution
drop if weight<2000
kdensity weight
Figure: kernel density plots of weight (x: weight, y: Density)

Central tendency and dispersion
Mean and standard deviation:
Mean with confidence interval:

Compare groups, equal variance?
• Equal
• Not equal

2 independent samples
Are birth weights the same for boys and girls?
Figure: density plot and scatterplot of birth weight by sex (Boys, Girls)

2 independent samples test

K independent samples
• Is birth weight the same over parity?
Figure: density plot and scatterplot of birth weight (g) by parity (0, 1, 2+)
Equal means? Linear effect? Outliers? Equal variances?

K independent samples test
equal means? Equal variances?

Continuous by continuous
• Does birth weight depend on gestational age?
Figure: scatterplots of birth weight against gestational age, with and without the outlier

Continuous by continuous tests
• Cut gestational age up in groups, then use T-test or ANOVA
or
• Use linear regression with 1 covariate
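
The descriptive steps and the equal-variance question on the slides above could be run in Stata roughly as follows. This is a minimal sketch on simulated toy data, not the presenter's birth data: the 0/1 coding of sex, the three parity levels, the group count of 4, and the variable name gestGr are all assumptions made for illustration. The commands themselves (summarize, mean, sdtest, robvar, ttest, egen cut, oneway) are standard Stata.

```stata
* Toy data only: simulated values, not the presenter's birth data.
clear
set seed 12345
set obs 500
* sex as a 0/1 indicator and parity as 0, 1, 2 are hypothetical codings
generate sex     = runiform() < 0.5
generate parity  = floor(3*runiform())
* gestational age in days, birth weight in grams
generate gestAge = round(rnormal(280, 12))
generate weight  = 3500 + 100*sex + 20*(gestAge - 280) + rnormal(0, 450)

* Central tendency and dispersion
* summarize gives mean and standard deviation
summarize weight
* mean gives the mean with standard error and 95% confidence interval
mean weight

* Equal variance? Two groups: variance-ratio F test
sdtest weight, by(sex)
* K groups: Levene and Brown-Forsythe tests
robvar weight, by(parity)

* If the variances look unequal, the t-test can drop that assumption
ttest weight, by(sex) unequal

* Continuous by continuous, first option: cut gestational age into groups
* group(4) asks for 4 groups of roughly equal size (4 is an arbitrary choice)
egen gestGr = cut(gestAge), group(4)
oneway weight gestGr, tabulate
```

With real data, the first block of generate lines would be replaced by loading the data set; the remaining commands only assume variables named weight, sex, parity and gestAge.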

Test situations
• 2 independent samples
  – ttest weight, by(sex)
• K independent samples
  – oneway weight parity
• By continuous
  – regress weight gestAge
• 2 dependent samples (Paired)
  – ttest weight_last_year = weight_today

Continuous skewed outcome
Example: Number of sexual partners

Distribution
kdensity partners if partners<=50
Figure: distribution of number of lifetime partners, N=394 (x: Partners, ticks at 1, 4, 9, 20, 50; percentile marks 25%, 50%, 75%, 95%)

Central tendency and dispersion
Median and percentiles:

2 independent samples
Do males and females have the same number of partners?
Figure: density plot and scatterplot of partners by gender (Males, Females)

2 independent samples test
equal medians?

K independent samples
Do partners vary with age?
Figure: density plot and scatterplot (partners<20) by age group agegr3 (18-29, 30-44, 45-60)

K independent samples test
equal medians?

Table of tests
                        Numerical data                                         Proportions
                        Normal                     Skewed
1 sample                One sample T-test          Kolmogorov-Smirnov          Binomial
2 independent samples   Independent sample T-test  Mann-Whitney U              Chi-square
K independent samples   ANOVA                      Kruskal-Wallis              Chi-square
2 dependent samples     Paired sample T-test       Wilcoxon signed rank test   McNemar (2x2)
Categorical ordered: use nonparametric tests

Categorical data
Example: Being bullied

Frequency and proportion
Frequency:
Proportion with CI:

Proportion, confidence interval
x = "disease", n = total number
proportion: $p = \frac{x}{n}$
standard error: $se(p) = \sqrt{\frac{p(1-p)}{n}}$
confidence interval: $CI(p) = p \pm 2 \cdot se(p)$

Crosstables
Are boys bullied as much as girls?
equal proportions?

Ordered categories, trend
Does bullied vary with age?
twoway (fpfitci bullied agegr) ///
       (lfit bullied agegr)
Figure: fitted proportion bullied by age group (x: Age group; 2-6 y, 7-12 y, 13-17 y)

Ordered categories, trend
Trend? equal proportions?

Table of tests
                        Numerical data                                         Proportions
                        Normal                     Skewed
1 sample                One sample T-test          Kolmogorov-Smirnov          Binomial
2 independent samples   Independent sample T-test  Mann-Whitney U              Chi-square
K independent samples   ANOVA                      Kruskal-Wallis              Chi-square
2 dependent samples     Paired sample T-test       Wilcoxon signed rank test   McNemar (2x2)
Categorical ordered: use nonparametric tests
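
The skewed and proportion columns of the table of tests map onto Stata commands roughly as sketched below. The data are simulated toy values, not the presenter's; the variable codings and the scalar names x_bul, n_tot, p_hat and se_p are hypothetical. The hand calculation follows the slide's formula (p = x/n, se = sqrt(p(1-p)/n), CI = p ± 2 se). The closing logistic regression is only one possible way to look at a trend over ordered age groups, chosen here for illustration and not taken from the slides.

```stata
* Skewed outcome. Toy data only: simulated values.
clear
set seed 394
set obs 394
generate gender   = runiform() < 0.5
generate agegr3   = 1 + floor(3*runiform())
generate partners = floor(exp(rnormal(1.4, 1)))

* Median and percentiles
centile partners, centile(25 50 75 95)
* 2 independent samples: Mann-Whitney U (Wilcoxon rank-sum)
ranksum partners, by(gender)
* K independent samples: Kruskal-Wallis
kwallis partners, by(agegr3)

* Categorical outcome. Toy data only: simulated values.
clear
set seed 2016
set obs 600
generate sex     = runiform() < 0.5
generate agegr   = 1 + floor(3*runiform())
generate bullied = runiform() < (0.25 - 0.04*(agegr - 1))

* Proportion by hand: p = x/n, se = sqrt(p(1-p)/n), CI = p +/- 2*se
quietly count if bullied == 1
scalar x_bul = r(N)
quietly count
scalar n_tot = r(N)
scalar p_hat = x_bul/n_tot
scalar se_p  = sqrt(p_hat*(1 - p_hat)/n_tot)
display "p = " %5.3f p_hat "   se(p) = " %5.3f se_p
display "CI(p): " %5.3f (p_hat - 2*se_p) " to " %5.3f (p_hat + 2*se_p)

* Built-in estimate; its interval method may differ slightly from the hand formula
proportion bullied

* Crosstables with chi-square test of equal proportions
tabulate bullied sex, column chi2
tabulate bullied agegr, column chi2

* One possible trend check: age group entered as a linear term
logistic bullied agegr
```

The scalar names deliberately avoid short forms such as se, because Stata resolves a name in an expression to a variable (including an abbreviation of sex) before it looks for a scalar.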