Stata: A statistical package for epidemiologists
H.S., 29.05.2016
Packages compared

                    SPSS              Stata
typical user        Social scientist  Epi/med stat
cost, kr            15 000            2 500
menu / syntax       yes               yes
user friendly       4                 4
data handling       OK                OK
graphics            4                 6
non-parametric      4                 5
regression          3                 5
epidemiology        0                 5
survival analysis   3                 5
factor analysis     4                 4
multi level         0                 4
path-models (SEM)   0                 3
measurement error   0                 3
programmable        no                yes
new methods         2                 5

Splus / R (incomplete column, values in listed order): Mat stat, 8000 / 0, yes, 2, OK, 5, 5, 5, 3, 6, 4, 3
SAS (incomplete column, values in listed order): 50 000, yes, 4, OK, 4, 5, 5, -, yes, 5, 3, yes, 5
Why Stata
• Pro
– Aimed at epidemiology
– Many methods, growing
– Graphics
– Structured, programmable
– Coming soon to a course near you
• Con
– Memory must be larger than the file size (the whole dataset is held in memory)
Syntax
• Full syntax
[by varlist:] command [varlist] [if] [in] [, options]
• Examples
– mean age
– mean age if sex==1
– by sex, sort: summarize age
– summarize age, detail
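The syntax pattern can be tried on auto.dta, an example dataset shipped with Stata. This is a minimal sketch, not the course data: price and foreign are assumed stand-ins for age and sex.

```stata
* Sketch on Stata's shipped auto.dta; price/foreign stand in for age/sex
sysuse auto, clear

* command varlist
mean price

* command varlist if
mean price if foreign == 1

* by varlist: command varlist   (the sort option sorts the data first)
by foreign, sort: summarize price

* command varlist, options
summarize price, detail

* command varlist in   (restrict to the first ten observations)
summarize price in 1/10
```

Each line uses only the parts of the full syntax it needs; the square brackets in the syntax diagram mark optional elements.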
Ways of working
• Testing
– p-values
• Estimation
– Estimate with confidence interval
Bivariate
Mean with CI
• Advanced features
– Standardization
– Clustering
– Bootstrap
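A rough sketch of how the mean command could cover these features, using the shipped auto.dta rather than the course data. The cluster variable rep78 is an assumption chosen only to show the syntax. Standardization is not shown because it needs standard weights, which this dataset does not have.

```stata
sysuse auto, clear

* Mean with the default analytic standard error and 95% CI
mean price

* Means by group
mean price, over(foreign)

* Cluster-robust standard error; rep78 is used as a cluster id
* only to illustrate the option
mean price, vce(cluster rep78)

* Bootstrap standard error; seed fixed for reproducibility
mean price, vce(bootstrap, reps(200) seed(12345))
```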
Median and percentiles with CI
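One way to get percentiles with confidence intervals is the centile command. The sketch below uses the shipped auto.dta and the variable price as an assumed example; the slides do not say which command was used.

```stata
sysuse auto, clear

* Median with 95% CI (default binomial-based interval)
centile price

* Quartiles with CIs
centile price, centile(25 50 75)

* Same percentiles with conservative CIs at the 90% level
centile price, centile(25 50 75) cci level(90)
```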
Compare means
[Figure: Birth weight distribution by sex. Density curves for Boys, N=291 and Girls, N=273. x-axis: gram (2000 to 6000); y-axis: density (0 to .0008).]
Compare means, T-test
– ttest weight, by(sex) unequal (unequal variances)
– ttest var1 == var2 (paired test)
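Both forms of the t-test can be tried on datasets shipped with Stata. This is a sketch under the assumption that auto.dta and bpwide.dta are acceptable stand-ins for the birth weight data.

```stata
* Two independent groups, unequal variances
* (Satterthwaite degrees of freedom by default)
sysuse auto, clear
ttest price, by(foreign) unequal

* Paired test: two measurements on the same subjects
sysuse bpwide, clear
ttest bp_before == bp_after
```

The output reports the p-value together with the mean difference and its confidence interval, so it serves both testing and estimation.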
Graphics
Plottypes
[Figure: gallery of plot types. Panels: Bar and dot plots (Bar, HBar, Dot), Box plots (Box, HBox), Histogram, Density, Scatter, Twoway plots, Matrix plot, Pie plot. Variables: weight, gest, mage, and mean of weight by gestational age group (<=280 days, >280 days).]
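The plot types named on this slide correspond to standard graph commands. The sketch below draws one of each on the shipped auto.dta; the variables are assumed stand-ins for weight, gest and mage.

```stata
sysuse auto, clear

* Distribution of one variable
histogram price
kdensity price

* Two variables, and all pairs of several variables
twoway scatter price weight
graph matrix price mpg weight

* Group means as bars or dots
graph bar (mean) price, over(foreign)
graph hbar (mean) price, over(foreign)
graph dot (mean) price, over(foreign)

* Group distributions as box plots
graph box price, over(foreign)
graph hbox price, over(foreign)

* Group sizes as a pie
graph pie, over(foreign)
```

Each command replaces the graph in the Graph window, so run them one at a time or add a name() option to keep several open.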
Legend with extra information
[Figure: Birth weight distribution by sex. Density curves with legend entries "Boys, N=291" and "Girls, N=273". x-axis: gram (2000 to 6000); y-axis: density (0 to .0008).]
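A legend of this kind can be built by counting each group first and pasting the counts into the legend text. The following is a sketch on the shipped auto.dta, not the author's code; the local macro names n0 and n1 are made up for the example.

```stata
sysuse auto, clear

* Store the group sizes in local macros
count if foreign == 0
local n0 = r(N)
count if foreign == 1
local n1 = r(N)

* Overlay one density per group and put the counts in the legend
twoway (kdensity price if foreign == 0) ///
       (kdensity price if foreign == 1), ///
    legend(order(1 "Domestic, N=`n0'" 2 "Foreign, N=`n1'")) ///
    title("Price distribution by origin") ///
    xtitle("USD") ytitle("density")
```

The /// line continuation works in a do-file; in the Command window the twoway command has to be typed on one line.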
Density with min, max and fractiles
[Figure: Density of Weight, N=553. x-axis: gram, labelled at the minimum, the fractiles and the maximum: 1392, 2750, 3220, 3630, 3960, 4520, 5488. y-axis: Density (0 to .0008).]
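Axis labels at the observed minimum, percentiles and maximum can be taken from the results that summarize leaves behind. This is a sketch on the shipped auto.dta; the choice of the 5th, 25th, 50th, 75th and 95th percentiles is an assumption, since the slide does not say which fractiles it shows.

```stata
sysuse auto, clear

* summarize, detail stores min, max and percentiles in r()
summarize price, detail
local ticks `r(min)' `r(p5)' `r(p25)' `r(p50)' `r(p75)' `r(p95)' `r(max)'
local n = r(N)

* Use the stored values as the x-axis labels
kdensity price, xlabel(`ticks', angle(45) format(%9.0f)) ///
    note("N=`n'") xtitle("USD") ytitle("Density")
```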
Scatter with fitline + extra point
[Figure: Birth weight by gestational age. Scatter plot with fitted line and one extra point labelled "Removed before analysis". x-axis: Gestational age in days (240 to 320); y-axis: 1000 to 5000.]
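A plot like this can be layered with twoway: the analysed points, the fitted line, and the excluded point drawn with a different marker. The sketch below uses the shipped auto.dta, and the exclusion rule (price above 15000) is hypothetical, chosen only so that there is a point to highlight.

```stata
sysuse auto, clear

* Hypothetical exclusion rule, only to have a point to highlight
generate byte excluded = price > 15000

twoway (scatter price weight if !excluded) ///
       (lfit price weight if !excluded) ///
       (scatter price weight if excluded, msymbol(X) msize(large)), ///
    legend(order(1 "Analysed" 2 "Fitted line" 3 "Removed before analysis")) ///
    xtitle("Weight (lbs.)") ytitle("Price (USD)")
```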
Bar with labels inside
[Figure: Horizontal bars. The bar labels read "Long labels are not a problem with horizontal bars and labels inside". x-axis: mean of weight (0 to 4,000).]
Regression results
[Figure: Bullied. Odds ratios with 95% confidence interval for sex, single and chron, from crude models and an adjusted model. x-axis: 0 to 2.5.]
Regression
Purpose of regression
• Prediction
– Use an estimated model to predict the outcome given covariates in a new dataset
• Estimation
– Estimate association between outcome and covariates adjusted for the other covariates
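The two purposes can be illustrated with one fitted model. This is a sketch on the shipped auto.dta; the random split into a fitting part and a "new" part is an assumption made to keep the example self-contained, and the variable names train and price_hat are made up.

```stata
sysuse auto, clear

* Random split: about 70% of the rows are used to fit the model
set seed 1
generate byte train = runiform() < 0.7

* Estimation: association of price with weight, adjusted for foreign
regress price weight foreign if train

* Prediction: apply the fitted model to every row, including those
* not used in the fit, which play the role of a new dataset
predict price_hat, xb
list make price price_hat if !train
```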
Linear regression, exposure only
Add confounders and compare
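One possible way to compare the exposure-only model with the adjusted model is to store both sets of estimates and tabulate them side by side. This is a sketch on the shipped auto.dta, where foreign is an assumed stand-in for the exposure and weight for a confounder.

```stata
sysuse auto, clear

* Exposure only (crude model)
regress price foreign
estimates store crude

* Add a potential confounder
regress price foreign weight
estimates store adjusted

* Coefficients and standard errors side by side
estimates table crude adjusted, b(%9.1f) se(%9.1f)
```

A clear change in the exposure coefficient between the two columns suggests that the added covariate confounds the crude association.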
Assumptions and influence
• Test of assumptions
– Independent errors
– Linear effects
– Constant error variance
• Influence, robustness
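The checks listed above map onto standard post-estimation commands. The following sketch uses the shipped auto.dta; which diagnostics the author actually ran is not stated, so the selection here is an assumption.

```stata
sysuse auto, clear
regress price weight foreign

* Constant error variance: residuals vs fitted, Breusch-Pagan test
rvfplot, yline(0)
estat hettest

* Linear effects: augmented component-plus-residual plot
acprplot weight, lowess

* Influence: Cook's distance and DFBETAs for each covariate
predict cooksd, cooksd
dfbeta

* Robustness: heteroskedasticity-robust standard errors
regress price weight foreign, vce(robust)

* Independent errors: with clustered observations one could use
*   regress y x, vce(cluster clustervar)
```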
Influence
[Figure: Scatter plot with two fitted lines, "Regression with outlier" and "Regression without outlier"; the outlier is marked. x-axis: Gestational age (200 to 700); y-axis: 2000 to 6000.]
Binary regression
• Odds ratio, OR
– binreg y x1 x2, or (Link: logit)
• Risk ratio, RR
– binreg y x1 x2, rr (Link: log (ln))
• Risk difference, RD
– binreg y x1 x2, rd (Link: identity)
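The three binreg variants can be tried on simulated data. This is a minimal sketch: the data are generated for the example (true risks 0.2 and 0.3), and the variable names x and y are placeholders.

```stata
clear
set seed 2016
set obs 1000

* Simulated data: binary exposure x, outcome y with risk 0.2 or 0.3
generate byte x = runiform() < 0.5
generate byte y = runiform() < cond(x, 0.3, 0.2)

binreg y x, or    // odds ratio, logit link
binreg y x, rr    // risk ratio, log link
binreg y x, rd    // risk difference, identity link
```

With a single binary covariate all three models fit the same two risks, so they differ only in the scale on which the effect is reported.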
Odds ratio, Relative risk, Risk Diff
Problems: convergence
Help
Search for help
• General
– help command
– findit keyword (search the net)
• Examples
– help table
– findit GAM
• My home page
– http://folk.uio.no/heins/
Books
A Visual Guide to Stata Graphics, by M.N. Mitchell
Data Analysis Using Stata, by Ulrich Kohler and Frauke Kreuter
Statistics with Stata (Updated for Version 9), by Lawrence C. Hamilton
Multilevel and Longitudinal Modeling Using Stata, by S. Rabe-Hesketh and A. Skrondal