Stata0, Introduction Hein Stigum Presentation, data and programs at:

advertisement
Stata0, Introduction
Hein Stigum
Presentation, data and programs at:
http://folk.uio.no/heins/
Why Stata
• Pro
–
–
–
–
–
Aimed at epidemiology
Many methods, growing
Graphics
Structured, Programmable
Coming soon to a course near you
• Con
– Memory>file size
– Copy tables
May-16
H.S.
2
Data handling
Import data
• Using SPSS 14.0
– Save as, Stata Version 8 SE
May-16
H.S.
4
Interface
May-16
H.S.
5
Do Editor
• New
– Ctrl-8,
or:
• Run
– Mark commands, Ctrl-D to do (execute)
May-16
H.S.
6
Do-file example
May-16
H.S.
7
Syntax
• Syntax
[bysort varlist:] command [varlist] [if exp] [in range][, opts]
• Examples
–
–
–
–
May-16
mean age
mean age if sex==1
bysort sex: summarize age
summarize age ,detail
H.S.
8
Use and save data
• Open data
– set memory 200m
– use “C:\Course\Myfile.dta”, clear
• Describe
– describe
– list x1 x2 in 1/20
describe all variables
list obs nr 1 to 20
• Save data
– save “C:\Course\Myfile.dta” ,replace
May-16
H.S.
9
Drop and keep
• Drop
– drop x1 x2
– drop if sex==1
– drop if age==.
drop variables x1 and x2
drop males
drop missing
• Keep
– same as drop
May-16
H.S.
10
Recode
• Syntax
– From 4 to 2 groups:
recode educ (1 2=1) (3 4=2)(missing=.), gen(educ2)
– From cont. to 3 groups:
recode age (min/19=1) (20/29=2) (30/max=3), gen(age3)
May-16
H.S.
11
Labels
• Variable
– label variable q1 ”Age”
• Value
1 ) label define freqLab 1”Low” 2”Med” 3”High”
2a) label values smoke freqLab
2b) label values drink freqLab
• List
– label list
May-16
H.S.
12
Generate, replace
• Age square
– generate ageSqr=age^2
• Young/Old
• Alternatives
– generate old=0 if (age<=50)
– replace old=1 if (age>50)
generate old=(age>50)
generate old=(age>50) if age<.
• Observation numbers
– gen id=_n
– gen lag=age[ _n-1]
May-16
H.S.
13
Dates
• From numeric to date
ex: m=12, d=2, y=1987
generate bdate=mdy(m,d,y)
format bdate %d
• From string to date
ex: bstr=“01.12.1987”
generate bdate=date(bstr,”dmy”)
format bdate %d
May-16
H.S.
14
Missing
• Obs!!!
– Missing values are large numbers
– age>30
will include missing.
– age>30 if age<.
will not.
• Test
– replace x=0 if (x==.)
• Remove
– drop if age==.
• Change
– replace educ=. if educ==99
May-16
H.S.
15
Describe missing
• Summarize variables
• Missing in tables
May-16
H.S.
16
Handle data with many variables
• Describe
– describe vars
– summarize vars
– codebook vars
format and labels
N, mean, std, min and max
range, missing, mean and std, percentiles
• Find variables
– describe, simple
– lookfor age
– describe age*, n
list all variables
list variables with “age” in name or label
list vars starting with “age” and show var number
• Change order
– order vars
May-16
change order of variables
H.S.
17
Help
• General
– help
– findit
command
keyword
search Stata+net
• Examples
– help table
– findit aflogit
May-16
H.S.
18
Summing up
• Use do files
– Mark, Ctrl-D to do (execute)
• Syntax
– command [varlist] [if exp] [in range] [, options]
• Missing
– age>30 & age<.
– generate old=(age>50) if age<.
• Help
– help describe
May-16
H.S.
19
Books
Data Analysis Using Stata
by Ulrich Kohler and Frauke Kreuter
Statistics with Stata (Updated for Version 9)
by Lawrence C. Hamilton
A visual guide to Stata graphics
by M.N. Mitchell
Multilevel and longitudinal modeling using Stata
by S. Rabe-Hesketh, A. Skrondal
May-16
H.S.
20
Download