Stata 0 Introduction
3h
Hein Stigum
Presentation, data and programs at:
http://folk.uio.no/heins/
courses
May-16
H.S.
Stata introduction
• General use
– Interface and menu
– Do-files and syntax
– Data handling
• Analysis
– Descriptive
– Graphs
– Bivariate
Why Stata
• Pro
– Aimed at epidemiology
– Many methods, growing
– Graphics
– Structured, Programmable
– Coming soon to a course near you
• Con
– Data must fit in memory (memory > file size)
INTERFACE
Interface Stata 12
[Screenshot: Stata 12 main window; toolbar buttons labelled "Do file" and "Data edit"]
Menu
Do-file example
New do-file: icon or Ctrl-9
Run: mark the lines, then Ctrl-D
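A minimal sketch of what a short do-file might contain. It assumes Stata's shipped example dataset auto (loaded with sysuse) rather than the course data, so the variable names price, mpg and foreign come from that dataset and not from the course files.

```stata
* Illustrative do-file (not the course file); uses Stata's shipped auto data
sysuse auto, clear

* A line starting with * is a comment
describe

* A long command can be continued on the next line with ///
summarize price mpg ///
    if foreign == 1
```

Marking one or more lines and pressing Ctrl-D runs only the marked lines.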
Syntax
• Syntax
[bysort varlist:] command [varlist] [if exp] [in range][, opts]
• Examples
– mean age
– mean age if sex==1
– bysort sex: summarize age
– summarize age, detail
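The parts of the syntax diagram can be tried one at a time. The sketch below assumes Stata's shipped auto dataset, so mpg and foreign are stand-ins for age and sex in the examples above.

```stata
* Assumption: the shipped auto dataset replaces the course data
sysuse auto, clear

* command varlist
mean mpg

* command varlist if exp
mean mpg if foreign == 1

* bysort varlist: command varlist
bysort foreign: summarize mpg

* command varlist in range, options
summarize mpg in 1/20, detail
```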
DATA HANDLING
Import data
• Using SPSS 14.0-??
– Save as, Stata Version 8 SE
Use and save data
• Open data
– use "C:\Course\Myfile.dta", clear
• Describe
– describe (describe all variables)
– list x1 x2 in 1/20 (list observations 1 to 20)
• Save data
– save "C:\Course\Myfile.dta", replace
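A short sketch of the open, describe, list and save cycle. It assumes the shipped auto dataset, and the file name auto_copy.dta is hypothetical; the file is written to the current working directory.

```stata
* Assumption: shipped auto data; auto_copy.dta is a made-up file name
sysuse auto, clear

* Describe all variables, then list the first 5 observations
describe
list make price in 1/5

* Save a copy and open it again; clear drops the data in memory first
save "auto_copy.dta", replace
use "auto_copy.dta", clear
```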
Stata via kiosk
• Stata 13
– https://kiosk.uio.no
– Analyse > StataMp13 > …
• Course files
1. webuse set "http://www.med.uio.no/forskning/doktorgradkarriere/forskerutdanning/kurs/biostatistikk/mf9510-logistiskregresjon-overlevelsesanalyse-cox/" (set homepage)
2. webuse "birth1" (data for exercise 1)
Exercise 1
• Start Stata
• Open a new syntax file
• Paste in the webuse set "…" text. The text between the quotation marks should turn red. If it doesn't, retype the quotation marks as straight quotes ("). Remove any blank spaces inside the red text. Run the command.
• Type in webuse "birth1" and run.
• Describe all variables: describe.
• List the first 10 observations of weight, sex and mother's age (mage)
• Save the syntax file for later use in the course
30 minutes
Descriptive
• Continuous
summarize weight
summarize weight, detail (fractiles ++)
• Categorical
tabulate bullied
tabulate bullied, nolab (show coding)
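The same descriptive commands can be tried without the course data. This sketch assumes the shipped auto dataset, with price as the continuous variable and foreign as the categorical one.

```stata
* Assumption: shipped auto data instead of the birth data
sysuse auto, clear

* Continuous: basic summary, then percentiles, skewness and kurtosis
summarize price
summarize price, detail

* Categorical: with value labels, then with the underlying numeric codes
tabulate foreign
tabulate foreign, nolabel
```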
Other descriptives
tabstat mage, stat(N min p50 mean max) by(parity)
Generate, replace, recode
• Index (0/1) (young men)
– generate index=0
– replace index=1 if sex==1 & age<30
• Old (0/1)
– generate old=(age>50) if age<.
• Recode 1 to 0 and 2 to 1 into sex0
– recode sex (1=0) (2=1), generate(sex0)
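A sketch of the three patterns above on the shipped auto dataset. The new variable names index, highrep and origin12 are made up for the illustration.

```stata
* Assumption: shipped auto data; new variable names are hypothetical
sysuse auto, clear

* Index (0/1): cheap foreign cars
generate index = 0
replace index = 1 if foreign == 1 & price < 5000

* 0/1 variable that stays missing where rep78 is missing
generate highrep = (rep78 > 3) if rep78 < .

* Recode 0 to 1 and 1 to 2 into a new variable, then check the result
recode foreign (0=1) (1=2), generate(origin12)
tabulate foreign origin12
```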
Dates
• From numeric to date (3 numeric variables into date variable)
ex: m=12, d=2, y=1987
generate birth=mdy(m,d,y)
format birth %td
• From string to date (1 string variable into date variable)
ex: bstr="01.12.1987"
generate birth=date(bstr,"DMY")
format birth %td
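Both conversions can be tested on a one-observation dataset built from the example values on this slide. The variable names birth1 and birth2 are hypothetical; they are used so that both versions can exist in the same dataset.

```stata
* Build a one-row dataset with the example values
clear
set obs 1
generate m = 12
generate d = 2
generate y = 1987

* From three numeric variables to a date
generate birth1 = mdy(m, d, y)
format birth1 %td

* From one string variable to a date
generate bstr = "01.12.1987"
generate birth2 = date(bstr, "DMY")
format birth2 %td

list
```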
Exercise 2
• Summarize mother’s age
• Tabulate sex
• Recode parity into parity4 with categories 0, 1, 2, 3-7 (hint: (3/7=3))
– Tabulate parity by parity4
• Generate gestational age in weeks
– Summarize the new variable
• Generate and format new variable birth in date format
based on the three variables day, month and year (If
day, month and year do not exist, run the 3 lines at the start of the syntax file
“Stata 0, Introduction.do”)
– List day, month, year and birth to check the results
15 minutes
Missing
• Note!
– Represented as "."
– Missing values are treated as larger than any number
– age>30 will include missing.
– age>30 if age<. will not.
• Test
– replace age=0 if (age==.)
• Remove
– drop if age==.
• Change
– replace educ=. if educ==99
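The effect of missing values on a "greater than" condition can be seen by counting. This sketch assumes the shipped auto dataset, where rep78 has some missing values; the variable name good is hypothetical.

```stata
* Assumption: shipped auto data; rep78 contains missing values
sysuse auto, clear

* Missing counts as larger than any number, so this includes missing
count if rep78 > 3

* Adding rep78 < . excludes the missing values
count if rep78 > 3 & rep78 < .

* A 0/1 variable that is missing where rep78 is missing
generate good = (rep78 > 3) if rep78 < .
tabulate good, missing
```

The first count exceeds the second by the number of observations with missing rep78.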
Describe missing
• Summarize missing
misstable summarize weight sex gest
• Missing in tables
tab bullied sex, missing
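A sketch of both commands on the shipped auto dataset, assumed here because its rep78 variable has missing values while price does not.

```stata
* Assumption: shipped auto data instead of the birth data
sysuse auto, clear

* Reports only the variables in the list that have missing values
misstable summarize rep78 price

* Adds a row for missing rep78 to the two-way table
tabulate rep78 foreign, missing
```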
Exercise 3
• Tabulate missing in gestational age (gest) with the
misstable command
• Tabulate gest4 versus sex and include missing
• Summarize mage if gest is greater than 260 days
– Will this include missing in gest?
– Summarize mage if gest is greater than 260 days excluding
missing in gest
10 minutes
Help
• General
– help command
– findit keyword (search Stata+net)
• Examples
– help table
– findit aflogit
Summing up
• Use do files
– Run: Mark, Ctrl-D
• Syntax
– command [varlist] [if exp] [in range] [, options]
• Missing
– age>30 if age<.
– generate old=(age>50) if age<.
• Help
– help describe
GRAPHICS
Twoway plots
• Syntax
– twoway (plot1, opts) (plot2, opts), opts
• One plot
– kdensity bw
[Figure: kernel density estimate of birth weight; x-axis: Birth weight; kernel = epanechnikov, bandwidth = 102.3251]
– scatter bw gest
[Figure: scatter plot; y-axis: Birth weight, x-axis: Gestational age]
twoway (scatter bw gest) (fpfitci bw gest) (lfit bw gest)
(scatter = scatter; fpfitci = smooth with CI; lfit = line fit)
[Figure: "Weight by gestational age"; y-axis: gram, x-axis: days]
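A sketch of an overlaid twoway graph on the shipped auto dataset, which is assumed here in place of the birth data; mpg and weight (car weight) are variables in that dataset.

```stata
* Assumption: shipped auto data; weight here is car weight, not birth weight
sysuse auto, clear

* Plots are drawn in the order listed, so later plots lie on top of earlier ones
twoway (scatter mpg weight) (fpfitci mpg weight) (lfit mpg weight)

* Restricting the range with if applies inside each plot
twoway (scatter mpg weight if weight < 4000) ///
       (lfit mpg weight if weight < 4000)
```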
Titles
scatter bw gest, ///
title("title") subtitle("subtitle") ///
xtitle("xtitle") ytitle("ytitle") note("note")
[Figure: scatter plot showing where title, subtitle, xtitle, ytitle and note are placed]
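A sketch of the title options with real text, assuming the shipped auto dataset. The wording of the titles is made up for the illustration.

```stata
* Assumption: shipped auto data; the title texts are illustrative
sysuse auto, clear

* Each /// continues the command on the next line
scatter mpg weight, ///
    title("Mileage by weight") subtitle("Example data") ///
    xtitle("Weight (lbs.)") ytitle("Mileage (mpg)") ///
    note("Data: auto dataset shipped with Stata")
```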
Exercise 4
• Make a density plot of birth weight (weight)
• Make a scatter plot of birth weight versus gestational age (gest)
– Remove outliers (hint if gest>250 & gest<310)
– Add a linear fit line to the scatter plot to see the trend
– Add a smoothing curve with confidence interval to the plot (fpfitci) to look for a non-linear trend (hint: the order of the plots matters)
– Add a title, ytitle and xtitle to the plot
15 minutes
BIVARIATE ANALYSIS
2 independent samples
Do boys and girls have the same mean birth weight?
twoway ///
( kdensity weight if sex==1, lcolor(blue) ) ///
( kdensity weight if sex==2, lcolor(red) )
Equal means?
Equal variance?
[Figure: kernel densities of birth weight for each sex; x-axis: Birth weight]
2 independent samples test
ttest weight, by(sex) (2-sample t-test)
ttest weight, by(sex) unequal (unequal variances)
ttest w1==w2 (paired t-test)
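A sketch of the three t-test variants on datasets shipped with Stata, assumed here in place of the course data: auto for the two-sample tests and bpwide (blood pressure before and after) for the paired test.

```stata
* Assumption: shipped example datasets instead of the birth data
sysuse auto, clear

* Two independent samples, equal variances assumed
ttest mpg, by(foreign)

* Two independent samples, unequal variances allowed
ttest mpg, by(foreign) unequal

* Paired test: two measurements on the same subjects
sysuse bpwide, clear
ttest bp_before == bp_after
```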
Crosstables
Are boys bullied as much as girls?
tabulate bullied sex, col chi2 nofreq
equal proportions?
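A sketch of a cross table with column percentages and a chi-square test, assuming the shipped auto dataset. The indicator highmpg and its cut-off of 25 are made up for the illustration.

```stata
* Assumption: shipped auto data; highmpg and the cut-off 25 are hypothetical
sysuse auto, clear

* 0/1 indicator, kept missing where mpg is missing
generate highmpg = (mpg > 25) if mpg < .

* Column percentages only (nofreq), with Pearson chi-square test
tabulate highmpg foreign, column chi2 nofreq
```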
Exercise 5
• The variable “magegr2” contains mother’s age in two groups. Do
tab magegr2 and tab magegr2, nolab to find the groups and the
coding. An alternative way is to list all labels: label list
• Make a plot of the birth weight distribution for each of the two
groups of mother’s age.
• Do a ttest of weight by magegr2. Are the means different?
• Redo the ttest for weight>2000 to get more normal distributions.
– Are the means different?
• Generate an indicator for high birth weight (>4500).
• Make a table of high birth weight by gestgr2 with column percentages and a chi-square test.
15 minutes
Extra (if you have time)
• Do a help tabstat and look at the statistics options
• Do a tabstat of weight showing N min p25 p50 p75 max, by
magegr2
Summing up
• Descriptive
summarize weight (continuous)
tabulate sex (categorical)
• Graphs
twoway (plot1, opts) (plot2, opts), opts
• Bivariate
ttest weight, by(sex) (continuous)
tabulate bullied sex, chi2 (categorical)
EXTRA INFO
Keep graphs
(see last part of syntax file)
1. Set (run once)
set autotabgraphs on
2. Give each plot a name
scatter weight mage, name("plot1", replace)
kdensity weight, name("plot2", replace)
3. Both plots are now on the screen
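A sketch of naming graphs so that several stay available at once, assuming the shipped auto dataset. The names plot1 and plot2 follow the slide and are written without quotes here.

```stata
* Assumption: shipped auto data instead of the birth data
sysuse auto, clear

* Each named graph gets its own window (or tab) and stays in memory
scatter mpg weight, name(plot1, replace)
kdensity mpg, name(plot2, replace)

* List the graphs in memory and bring one back to the front
graph dir
graph display plot1
```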
Stata via kiosk on Mac
Mac users must first connect by remote desktop to a Windows server and from there use Internet Explorer to log in to the Windows version of the kiosk.
UiO recommends that Mac users download and install the program CoRD, which can be used to control Windows machines remotely. From there, start Internet Explorer and log in at http://kiosk.uio.no.
Information and guidance on downloading, configuring and using CoRD is available at:
http://www.uio.no/tjenester/it/maskin/programvare/hjelp/finne-programmer/sentraleservere.html
https://www.uio.no/tjenester/it/maskin/programvare/programkiosk/hjelp/tilgjengelige_programmer.html
UiO recommends saving documents on the M: drive under Computer (either directly on M: or in M:\pc\Desktop or M:\pc\My Documents).