Stata help

Mar. 16
3/8/2016 7:39 AM H.S.

1 Basics
1.1 Help
1.2 Short cuts
1.3 Options
1.3.1 Memory, max variables and max matrix size
1.4 Save commands
1.5 Save output
1.6 Notation
1.6.1 Variable names
1.7 Command syntax
1.7.1 By
1.7.2 Weights
1.7.3 If exp
1.7.4 Ranges
1.8 Prefix commands
1.9 Estimation commands
1.10 Postestimation commands
2 Functions
2.1 Mathematical functions
2.2 Statistical functions
2.2.1 Examples
2.3 Logical
3 Data handling
3.1 Import data
3.2 Use and save
3.3 Describe, labels
3.4 Formats
3.5 Recoding
3.6 Generate, replace
3.7 Extended generate
3.7.1 Functions
3.8 Drop, keep
3.9 Missing
3.10 Sort
3.11 String commands
3.12 Accessing results from commands
3.12.1 System variables
3.12.2 Saved results
3.12.3 Accessing results from commands, save as macros
4 Uni- and bivariate
4.1 List
4.2 Tabulate
4.2.1 One-way tables
4.2.2 Two-way tables
4.2.3 Three-way tables
4.3 Table of summary statistics
4.4 Means and confidence intervals
4.5 Summarize
4.6 T-test
4.6.1 Test of equal variance (standard deviation)
4.6.2 One-way ANOVA
4.7 Non-parametric analysis
4.8 Proportions
5 Graphics
5.1 Plot types
5.2 Graph twoway
5.2.1 Twoway syntax
5.2.2 Twoway plot types
5.2.3 Twoway fitlines
5.3 Graph bar, hbar and dot
5.3.1 Syntax
5.3.2 Options
5.4 Graph box, hbox
5.5 Graph pie
5.5.1 Options
5.6 Graph matrix
5.7 Other graphs
5.8 Titles
5.8.1 Title options
5.9 Legend
5.10 Axis scale, label, ticks and grid
5.10.1 Axis title
5.10.2 Axis scale
5.10.3 Axis labels and ticks
5.11 Text
5.12 Markers and marker labels
5.12.1 Markers
5.12.2 Marker labels
5.13 Lines
5.13.1 Connecting points
5.13.2 Line options
5.14 Text box options
5.15 Other options
5.15.1 Colors
5.15.2 Positions
5.16 Over()
5.17 By()
5.18 Schemes
5.19 Combining graphs
6 Regression commands
6.1 Regression models
6.1.1 Linear regression with simple error structure
6.1.2 GLM
6.1.3 Conditional logistic
6.1.4 Multiple outcome
6.1.5 Linear regression with complex error structure
6.1.6 Survival models
6.2 Orthogonal variables
6.3 Test after regression commands
6.3.1 Wald test
6.3.2 Likelihood ratio test
6.4 Cataloging estimation results
6.5 Cov, corr, AIC, BIC and sample
6.6 Prediction
7 Linear regression
7.1.1 Test of assumptions
7.1.2 Test of influence
7.1.3 Test of multicollinearity
8 Logistic regression
8.1 Syntax
8.2 Categorical covariates
8.3 Residuals, goodness-of-fit
8.4 Diagnostic plots
9 st survival time data
9.1 Initial settings and description
9.2 Kaplan-Meier …
9.3 Survival regression models
9.3.1 Cox
9.3.2 Parametric survival
10 xtmixed -- Multilevel mixed-effects linear regression
10.1 Syntax
10.2 Random effect covariances
10.3 Predict
11 Data reduction
11.1 Factor analysis
12 Programming
12.1 Programs
12.1.1 Program definition
12.2 Macros
12.3 Loops
12.3.1 For loop
12.3.2 Foreach
12.3.3 While
12.4 Conditions
12.4.1 If
12.5 Matrix expressions
12.5.1 Matrix operators
13 GLLAMM
13.1 Installation
13.2 Data format
13.3 Syntax examples
13.3.1 A two-level random intercept model (logistic)
13.3.2 A two-level random intercept and slope model (linear)
13.3.3 A two-level random intercept model, x1 and x2 categorical
13.4 Prediction
13.4.1 Syntax and options
14 Survey commands
14.1 Setting stratification, clustering, finite population correction and sample weights
14.2 Means and proportions
14.3 Tables
14.4 Regression
14.5 Stata web links

1 Basics
1.1 Help
help cmd             show help file for cmd
1.2 Short cuts
Ctrl-R               run selection in do file
Ctrl-D               do selection in do file
Ctrl-Alt-T           start Stata
PgUp / PgDown        previous/next command in command window
#review n            show last n commands
Esc                  clear command
1.3 Options
1.3.1 Memory, max variables and max matrix size
set memory 100m          default = 10 MB, max = as large as the OS allows
set maxvar 1000          default = 5000, max = 32767
set matsize 500          default = 400, max = 11000
set xxx, permanently     will set for all sessions
1.4 Save commands
cmdlog using myfile      start a command log file
cmdlog close             close (and save) command log file
Can also save the Review window as a do file: click on the "minus" in its upper left corner.
1.5 Save output
(set more off), begin log, …, close log, save log, print log
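The logging steps above can be sketched with the log command. This is a minimal sketch, not the author's own workflow: the file name mylog.log is a hypothetical placeholder, and the display line stands in for whatever analysis is being logged.

```stata
* Minimal sketch: capture output in a plain-text log file.
* "mylog.log" is a hypothetical file name chosen for illustration.
set more off                        // do not pause output at each screenful
log using mylog.log, replace text   // begin log (overwrite an old copy)
display "analysis starts here"      // placeholder for the real commands
log close                           // close (and save) the log
```

The text option writes a plain-text log that can be opened and printed from any editor.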
1.6 Notation
==                       equal
~= (or !=)               not equal
&                        and
|                        or
~ (or !)                 not
x^2                      x squared
+                        string concatenation
.                        missing
x[3]                     third observation of x
x[_n-1]                  previous value of x
replace x=2 if _n==3     sets x[3]=2
1.6.1 Variable names
Names can be 1-32 characters long and may contain letters (case sensitive), digits and underscores. Start with a letter.
1.7 Command syntax
[by varlist:] command [varlist] [weight] [if exp] [in range] [using filename] [, options]
Note: all commands are written in lower case letters!
1.7.1 By
by varlist:              repeat for all combinations of values in varlist; use sort varlist first
by varlist, sort:        same, but sorts by varlist first, so no separate sort is needed
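A minimal sketch of the by prefix, using a small made-up dataset (the variable names group and x and all values are illustrative assumptions, not taken from these notes):

```stata
* Hypothetical toy data, deliberately unsorted
clear
input group x
2 4
1 1
2 6
1 3
end

* Sort and repeat the command within each value of group in one step
by group, sort: egen mean_x = mean(x)

* The data are now sorted by group, so a plain by: prefix works
by group: gen n = _N        // number of observations in each group
list, sepby(group)
```

Without the sort option (or a preceding sort group), the first by: command would stop with a "not sorted" error.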
1.7.2 Weights
[weighttype=var]
fweight=freq             frequency weighting for aggregated data
aweight=1/variance       analytic weighting by precision (inverse variance)
pweight=1/prob           probability weighting by inverse sampling probabilities
iweight=                 importance weighting, manual control of weights
Ref: U 23.13 and U 30
1.7.3 If exp
if exp                   do if exp == true (note: missing counts as larger than any number, so x>5 also selects missing values)
1.7.4 Ranges
in range                 restrict to range (in first/last); f=first, l=last, -n counts from the end. Ex: 5/25, -10/l
list x in 5/10           x from 5 to 10
list x in f/10           x from first to 10
list x in -10/l          x from -10 to last = the last 10 observations
1.8 Prefix commands
by:
statsby:
bootstrap:
jackknife:
simulate:
svy:
stepwise:
xi:
1.9 Estimation commands
1.10 Postestimation commands
mfx                      marginal effects
adjust                   adjusted means
estat vce                variance/covariance of estimates
predict, predictnl       predictions
ereturn list             list of saved results
test, testnl             linear and nonlinear Wald tests
lrtest                   likelihood ratio tests
lincom                   point estimates and confidence intervals of linear combinations
nlcom                    same for nonlinear combinations
estimates                store and retrieve results
2 Functions
2.1 Mathematical functions
sqrt()
ln() or log()            natural log
log10()
abs()
int()
exp()
min(x1,…,xn), max(x1,…,xn)
2.2 Statistical functions
comb(n,k)                "n over k"
binomial(n,k,p)
chi2(df,x)               cumulative chi2
normden(z,s)             density of N(0,s^2)
norm(z)                  cumulative N(0,1)
uniform()                uniform random number, 0-1
2.2.1 Examples
a+(b-a)*uniform()            random uniform on [a,b)
a+int((b-a+1)*uniform())     random integers in [a,b]
mu+s*invnorm(uniform())      random normal with mean mu and variance s^2
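The three random-number recipes can be tried out as follows. This is a sketch under assumptions: it uses runiform() and invnormal(), the newer names for uniform() and invnorm() in more recent Stata versions, and the values a=2, b=5, mu=10, s=2, the seed and the sample size are arbitrary choices for illustration.

```stata
* Illustrative values only; a, b, the seed and the sample size are arbitrary
clear
set obs 1000
set seed 12345
local a = 2
local b = 5

gen u = `a' + (`b' - `a')*runiform()            // uniform between a and b
gen k = `a' + int((`b' - `a' + 1)*runiform())   // random integers a, a+1, ..., b
gen z = 10 + 2*invnormal(runiform())            // normal with mean 10 and sd 2

summarize u k z
```

Setting the seed makes the draws reproducible from run to run.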
2.3 Logical
cond(x,a,b)              if x then a else b
3 Data handling
3.1 Import data
Use DBMS/Copy to convert from SPSS to Stata format. Choose Stata 6 with 8-byte doubles as the output file format.
3.2 Use and save
use file.dta
save newfile.dta             save new copy
save file.dta, replace       overwrite original data
3.3 Describe, labels
describe                                    overview of variables
label var varname "text"                    variable label
label define lblname # "text" # "text"…     define mapping between numeric values (#) and labels ("text") called lblname
label values varname lblname                associate mapping with variable
3.4 Formats
format varname %w.dtype      w=width in columns, d=decimal places, type: g=general, f=fixed, s=string
Examples: %9.0g, %9.2f, %10s
3.5 Recoding
recode varlist (rule) (rule), gen(varlist) copy
syntax
recode x (1 2=1 low) (3 4=2 high)(missing=.), gen(x2)
recode 1 and 2 into 1, 3 and 4 into 2 give labels and generate new x2
recode x(1=2) if sex==1), gen(x2) copy
copy values for sex!=1
egen ageGr3=cut(age), group(3) label
3 equal sized groups
egen ageGr2=cut(age), at(0,50,80) label
2 groups 0-50, 50-80, values outside set to missing
encode stringvar, generate(newvar)
make numerical newvar (1,2,3…) based on stringvar values
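The recoding commands above can be tried on the auto dataset that ships with Stata. This is a sketch only: the dataset, the cut points and the new variable names (rep2, priceGr3, mpgGr2, make_n) are illustrative assumptions, not part of the original notes.

```stata
sysuse auto, clear

* rep78 takes values 1-5 and has some missing values
recode rep78 (1 2 = 1 low) (3 4 5 = 2 high) (missing = .), gen(rep2)
tab rep78 rep2, missing

* three roughly equal-sized price groups, coded 0, 1, 2
egen priceGr3 = cut(price), group(3)

* two mpg groups [0,20) and [20,50), coded 0 and 1 and labelled by lower bound
egen mpgGr2 = cut(mpg), at(0,20,50) label

* numeric version (1, 2, 3, ...) of the string variable make
encode make, generate(make_n)
```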
3.6
Generate, replace
generate newvar=exp              create new variable
replace oldvar=exp
gen agegr=age>=30 if age!=.      missing values are greater than all numerical values
gen xlag=x[_n-1]
gen xlead=x[_n+1]
3.7
Extended generate
egen [type] newvar = fcn(arguments) [if exp] [in range] [, options]
egen newvar=fcn(arg)
extended generate: make newvar from stored functions.
Ex: by code, sort: egen mx=median(x)
gives medians of x by values of code
by ... : may be used with some egen functions
3.7.1
Functions
count(exp)
number of nonmissing observations of exp.
cut(varname), {at(#,#,...,#)|group(#)}
cut at the at() numbers, or in equal groups
mean, median, max, min, std, sum
pctile(exp) [, p(#)]
percentiles
group(var1 var2)
new var from all combinations of var1 and var2
rmiss
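A short sketch of a few of the egen functions listed above, again on the shipped auto dataset. The dataset and the new variable names are illustrative assumptions.

```stata
sysuse auto, clear

* median price within each level of foreign (by-prefix form)
bysort foreign: egen medprice = median(price)

* number of nonmissing rep78 values within each level of foreign (by() option form)
egen n_rep = count(rep78), by(foreign)

* one group number per observed combination of foreign and rep78
egen grp = group(foreign rep78)
tab grp, missing
```

Observations with a missing rep78 get a missing group number.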
3.8
Drop, keep
drop varnames        drop variables from memory
drop in 3            drop observation 3
keep var1-var5       keep variables var1 to var5. Note: keep if age>=10 will also keep missing.
drop if age==.       remove missing
3.9
Missing
.
numerical missing
""
string missing
missing(x)
is equivalent to x==. if x is numeric, and to x=="" if x is string
missing values are greater than all numerical values and are sorted last, age>=30 will include missing.
gen agegr=age>=30 if age!=.
drop if age==.
Remove missing
mvdecode x1, mv(99)
set 99 to missing
mvencode x1, mv(.=99)
set missing to 99
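The pitfall described above can be seen on a four-row toy dataset. The data and the variable names naive and agegr are made up for illustration.

```stata
clear
input age
25
35
.
99
end

gen byte naive = age >= 30                   // the missing age is coded 1 (wrong)
gen byte agegr = age >= 30 if !missing(age)  // the missing age stays missing
list

mvdecode age, mv(99)        // 99 becomes missing
count if missing(age)
mvencode age, mv(99)        // all missing values become 99
```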
3.10
Sort
sort varname                 sort by variable. Use before a "by var:" command
3.11
String commands
fname+" "+lname              string concatenation
substr(name,1,10)
See U 16.3.5
3.12
Aggregate
contract vars, freq(fname) percent(pname)    contract (aggregate) over variable patterns to frequencies and percentages
collapse vars                                collapse data to means (or other statistics) over variable patterns
3.13
Accessing results from commands
3.13.1 System variables
_b[varname]
regression coef
_b[_cons]
intercept
_se[varname]
SE of regression coef
_n
current observation
_N
total number of obs
_pi
pi
Ex: regress y x, _b[_cons] gives constant term, _b[x[1]] gives coeff of first category of x, _se[x[1]] gives standard error
Ex: xi: regress y i.x, _b[_Ix_2] gives coef of second level of x (created dummy called _Ix_2)
3.13.2 Saved results
return list
run after a command to find list of saved results
ereturn list
run after a command to find list of estimated saved results
e(name)
estimation class, live until next estimation
r(name)
result class, live until next command
Ex: summarize age, then gen agedev=age-r(mean)
Ex: regress y x1 x2, then matrix B=e(b) and matrix V=e(V) save the coefficient vector and the variance-covariance matrix
3.13.3 Accessing results from commands, save as macros
sum w if c==1
mean of w for c=1
global w1=r(mean)
save as global macro
dis $w1
show content of macro
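A sketch that pulls the pieces of this section together on the shipped auto dataset: r() results, a global macro, e() matrices and the _b[] and _se[] system variables. The dataset and the names pricedev, p1, B and V are illustrative assumptions.

```stata
sysuse auto, clear

* r() results live until the next r-class command, so use them right away
quietly summarize price
gen double pricedev = price - r(mean)

* keep a result in a global macro
quietly summarize price if foreign == 1
global p1 = r(mean)
display $p1

* e() results live until the next estimation command
quietly regress price weight mpg
ereturn list
matrix B = e(b)      // coefficient vector
matrix V = e(V)      // variance-covariance matrix of the estimates
display _b[weight] "  " _se[weight] "  " _b[_cons]
```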
4 Uni- and bivariate
4.1
List
list varlist [, [no]display nolabels]    list variables; nodisplay gives tabular data, nolabels gives values
list varname_i-varname_j                 list a group of variables
list in 3                                observation 3; -1=last, 1/10 = 1 to 10
list if exp                              list if var>10, list if var==10
list var1 if var2==.                     list if var2 is missing
4.2
Tabulate
4.2.1
One-way tables
tabulate var [weight] [if expr] [in range] [, nofreq plot missing nolabel]
nolabel shows category values
tab1 varlist
one way tables for all variables
tab c, gen(c)
create dummies c1, c2,.. for each category of c
4.2.2
Two-way tables
tab var1 var2 [weight][if expr][in range][,nofreq col row cells chi2 exact missing nolabel]
tab var1 var2 , nofreq col chi
crosstab column % no freq with chi-square test
tab var1 var2 ,exact
Fisher exact test
tabi 30 20 \ 20 10, col chi2
immediate table
tab var1 var2, summarize(var3)
mean, sd and freq of var3 by var1 and var2. Use the means, standard or freq options to limit the output
4.2.3
Three-way tables
sort var3
by var3: tab var1 var2
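The tabulate variants above, sketched on the shipped auto dataset. The dataset and the choice of variables are illustrative assumptions; the immediate table reuses the counts from the tabi line above.

```stata
sysuse auto, clear

tab rep78                               // one-way table
tab rep78 foreign, nofreq col chi2      // column percentages with chi-square test
tab rep78 foreign, exact                // Fisher's exact test
tabi 30 20 \ 20 10, col chi2            // immediate table from typed counts

* three-way table: one two-way table per level of foreign
gen byte heavy = weight > 3000
sort foreign
by foreign: tab rep78 heavy

* indicator variables rep1, rep2, ... for each category of rep78
tab rep78, gen(rep)
```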
4.3
Table of summary statistics
table rowvar [colvar [supercolvar]] [if] [in] [weight] [, options]
table rowvar, contents(clist) row col
clist:freq, mean, sd, sum, n, max, min, median, p# (percentile),iqr. Totals: row col.
Show missing: missing
table rowvar colvar supercolvar, by(superrowvarlist)    multi-way tables
Ex: table sex, c(n age mean age mean educ) row    number of subjects, mean age and mean educ by sex, plus total row
tabstat varlist [if] [in] [weight] [, options]
epitab
4.4
Means and confidence intervals
means varlist                          3 types of means with ci
ci varlist, binomial poisson total     ci for means, proportions and counts
4.5
Summarize
summarize vars             number, mean, sd, min, max. Summarize alone takes all variables.
summarize vars, detail     percentiles, var, skew, kurt
inspect var                details on values
4.6
T-test
ttest var=#
one sample T-test
ttest var, by(c)
two sample T-test
ttest var1=var2
paired two sample T-test
ttest var1=var2, unpaired
two sample T-test
,unequal
equal variances not assumed
Ex: sdtest age, by(sex) (equal var rejected) ttest age, by(sex) unequal
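The example above (test equality of variances first, then pick the t-test) can be sketched with saved results. The auto dataset, the 0.05 cut-off and the macro name p_sd are illustrative assumptions; choosing the test from a pretest is the approach of these notes, not the only defensible one.

```stata
sysuse auto, clear

sdtest mpg, by(foreign)
local p_sd = r(p)            // two-sided p-value of the variance ratio test

if `p_sd' < 0.05 {
    ttest mpg, by(foreign) unequal     // equal variances rejected
}
else {
    ttest mpg, by(foreign)
}
display "difference in means: " r(mu_1) - r(mu_2) ", p = " r(p)
```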
4.6.1
Test of equal variance (standard deviation)
sdtest var=#
standard deviation=#
sdtest var, by(c)
two groups compared
sdtest var1=var2
same variance in both variables
4.6.2
One way anova
oneway response_var factor_var [weight] [if exp] [in range] [, noanova nolabel missing wrap tabulate
[no]means [no]standard [no]freq [no]obs bonferroni scheffe sidak ]
Ex: oneway var c, tabulate
analysis of var by c
4.7
Non-parametric analysis
by gender, sort: centile partners, centile(25 50 75) cci percentiles with exact confidence interval
ranksum partners, by(gender)
Mann-Whitney test=Wilcoxon rank sum, 2 group
kwallis partners, by(age3)
Kruskal Wallis K-group test
4.8
Proportions
proportion x1, over(c)
proportions with ci
5 Graphics
5.1
Plot types
graph twoway            scatter, line, density, histogram, function, ...
graph matrix
graph bar, hbar, dot
graph box
graph pie
5.2
Graph Twoway
5.2.1
Twoway syntax
graph twoway plot [if exp] [in range] [, options] twoway syntax (graph may be omitted)
where plot=(plottype varlist, options)
plot syntax, several plots may be listed and combined
where varlist= y1 y2 … x
last variable is x
Ex: twoway scatter y x
plot y by x
5.2.2
Twoway plot types
scatter, line, connected, area
dot, bar, histogram, kdensity
kernel density
function y=f(x),range( x1 x2)
f(x) from x1 to x2
rarea rcap rbar
range area, range cap, range bar ,
Ex: twoway area y x , sort base(50)
gives shading from 50
Ex: histogram x, bin(10) start(-2.5) percent (or frequency)
Ex: twoway (histogram x, width(1) frequency) (kdensity x, area(3200))
area scaled to the sum of subjects
Ex: twoway function y=normden(x), range(-4 4) droplines(-1.96 1.96)    function plots
Ex: twoway dropline db id if abs(db)>.25, mlabel(id)
abs(deltabeta) >0.25
5.2.3
Twoway fitlines
lfit, qfit, mband, mspline,lowess
linear and quadratic fits, median band, median splines and lowess
lfitci, qfitci, fpfitci
fit with CI: linear, quadratic, fractional polynom
Ex: (lfitci y x, ciplot(rline)) default is rarea
Ex.: twoway (lfit y x) (lowess y x) (scatter y x)
scatter with linear and lowess fit
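A sketch of three of the twoway examples above on the shipped auto dataset. The dataset and the graph names fitplot, normplot and histplot are illustrative assumptions, and normalden() is the newer name for normden(). area(74) scales the density to the 74 cars in the data, matching a frequency histogram with width 1.

```stata
sysuse auto, clear

* scatter with linear fit (plus CI band) and lowess smoother
twoway (lfitci mpg weight) (lowess mpg weight) (scatter mpg weight), ///
    name(fitplot, replace)

* function plot of the standard normal density with droplines at +/-1.96
twoway function y = normalden(x), range(-4 4) droplines(-1.96 1.96) ///
    name(normplot, replace)

* frequency histogram with a kernel density scaled to the number of subjects
twoway (histogram mpg, width(1) frequency) (kdensity mpg, area(74)), ///
    name(histplot, replace)
```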
5.3
Graph Bar, Hbar and Dot
5.3.1
Syntax
graph bar/hbar/dot yvars [if exp] [in range] [, options]
Where yvars=varlist, or =(stat) varlist, or= (stat) name=varname
stat= mean, median, p1, p2, p99, sum, count, min, max
Ex: graph bar x ,over(c) nofill
means of x over categories of c
Ex: graph bar (mean) meany=x (median) medy=x
mean and median of the same variable
Ex: graph bar (median) x1 x2 , percent stack
stacked percentages
5.3.2
Options
nofill
skip empty categories
sort(1)
sort by 1 variable
over(c1)
values for each c1
by(c2)
separate plots for each c2
bargap(0)
% overlap, -30=30% overlap, 30=gap.
blabel(what,where_and_how)
bar labels
what: bar/ total/ name/ group
print height, total height, name of yvar, name of first over() group
Where_and_how:
position(outside/ inside/base/center)
where to place the bar label
format(%9.1f) gap(rel_size) textbox_options
options for labels
Ex: graph bar teq1 ,over(landsdel) nofill blabel(bar, pos(inside) size(*1.3) format(%9.1f) color(white))
Ex: graph hbar teq1, over(landsdel, axis(off) sort(1)) nofill blabel(group, pos(base) size(*1.3) format(%9.1f) color(white))
5.4
Graph Box, Hbox
graph box x1 x2 x3, ascategory
boxplot of separate variables, ascat puts labels on the y-axis
graph hbox x, over(c, total)
plot of x over categories of c plus total
5.5
Graph Pie
graph pie x1 x2 x3
sum of x1, x2 and x3
graph pie x ,over(c)
sum of x for each category of c
graph pie ,over(c)
number of cases for each category of c
5.5.1
Options
plabel(_all sum/ percent/ name/ text, text_box_options)
label all slices with sum, percent, x-names or a given text
5.6
Graph Matrix
graph matrix x1-x5
scatter of all 5 variables
5.7
Other graphs
gladder y, qladder y
histograms over different transformations of y, QQ plot of the same
5.8
Titles
title("text"), xtitle("text"), ytitle("text")
titles
title, subtitle, caption, note
title types
5.8.1
Title options
position(clockpos)
ring(ringpos)
span
text_box_options
Ex: scatter teq1 moralder, title("Title", position(12) ring(0))
5.9
Legend
legend([contents] [location])
Contents:
order(1 2 3)
may also use order(1 "label1" 2 3)
label(1 "label1")
override legend for var 1
cols(1)
legend in 1 column; rows(1) gives 1 row
stack
stack symbol and text
rowgap(2) colgap(2)
gap between each element
Location:
on/off
legend on/off
position(clock)
position of legend
ring(1)
radial distance from plot, ring(0)=inside
Ex: legend(label(1 "Density of TEQ") label(2 "Mean") label(3 "Median") ring(0) pos(2) cols(1))
Ex: graph bar teq_di teq_fu teq_npcb teq_mopc teq_hcb ///
, legend(row(1) stack colgap(10) label(1 "Dioxin") label(2 "Furan") label(3 "Non-o") label(4 "Mono") label(5 "HCB"))
5.10
Axis scale, label, ticks and grid
5.10.1 Axis title
x|ytitle("line1" "line2")
5.10.2 Axis scale
x|yscale(opts)
Options:
axis(1)
axis to modify (1-9)
[no]log
[no]reverse
range(0 100)
extend range, will not decrease range. range(0): start at 0, range(100): end at 100
alt
axis at alternative side
on/off
axis on/off
Ex: scatter teq1 moralder,xscale(range(0 80)) yscale(off)
no y-axis
5.10.3 Axis labels and ticks
x|ylabel(rule_or_values,opts)
major ticks and labels
x|ytick(rule_or_values)
major ticks
x|ymlabel(rule_or_values)
minor ticks and labels
x|ymtick(rule_or_values)
minor ticks
rule or values (may use both):
#10
10 nice values
1 5 50
labels at 1, 5 and 50
0 5 10 "mean" 15 20
labels every 5, with mean printed at 10
0 (10) 100
labels from 0 to 100 in steps of 10
minmax
min and max values
none
Label options:
angle(0)
[no]grid
add gridlines
format(%5.0f)
5 places, 0 decimals, fixed
Ex: xlabel(1 "Low" 2 "Medium" 3 "High", angle(45))    text labels at values 1, 2 and 3, at 45 deg
Ex: scatter teq1 moralder,xlabel(#10,grid)
5.11
Text
text(y x "text", opts)
text at y,x in the plot
placement(c)
c=centered, n=north, s=south, ..
orientation(vertical)
box
draw box around text
Ex: graph …, text(10 50 "Line1" "Line2", just(left) color(blue))
two lines of text at (y,x)=(10,50)
5.12
Markers and marker labels
5.12.1 Markers
mstyle(p1 p2)
default styles
msymbol(sym1 sym2 …)
marker: S square, s square (small), Sh square (hollow), D diamond, T triangle, O circle, X, +, p point, . default, i invisible. Ex: msymbol(S)
msize(small medium large), msize(*2)
small, medium and large markers; twice the size
mcolor(green)
both outside and inside color
Ex: msymbol(. t Oh)
markers for 3 variables: default, small triangles and hollow circles
Ex twoway scatter y x [aweight=z], msymbol(oh) msize(small)
point size prop to z
5.12.2 Marker labels
mlabel(var)
label marker by var content
mlabsize(size)
mlabcolor(color)
mlabposition(12)
label at 12 o'clock position
mlabvposition(var)
positions based on a variable containing clock positions
mlabgap(*3)
3 times larger gap between marker and label
Ex: scatter y x, mlabel(z) mlabpos(0) msymbol(i)    use contents of z to label points, labels in the center and invisible points
5.13
Lines
5.13.1 Connecting points
twoway scatter y x, connect(l) sort    sort points, connect with line
connect(l)                             line
connect(L)                             separate line for each series
connect(J)                             stairstep, for survival curves
5.13.2 Line options
lcolor(red)                            line color
lwidth(thick) or lwidth(*3)            thick line
lpattern(dash)
lpattern("l" ".-" "-###")              solid, dotdashed, dash+3 spaces
5.14
Text box options
tstyle(textboxstyle)
overall style
box/nobox
border
size(textsizestyle)
color(colorstyle)
text color
justification(justificationstyle)
text left, center, right
alignment(alignmentstyle)
text top, middle, bottom, baseline
bfcolor(colorstyle)
background color
bcolor(colorstyle)
background and border color
blstyle(linestyle)
style of border
orientation(orientationstyle)
vertical/horizontal, rvertical/rhorizontal
placement(compassdirstyle)
location
ring(1)
0:inside, 1-7 outside
format(%9.1f)
9 places, 1 decimal, fixed
Ex: graph …, title("My title", color(red) box size(*1.5))
5.15
Other options
5.15.1 Colors
black, white, red, blue, cyan, green, mint, yellow….
gs0… gs16
gray scales from black to white
gray=gs8
color*0.5
half the intensity
5.15.2 Positions
clockpos(12)
12 o’clock. clockpos(0) means center if valid
placement(north)
alternative to clock with 9 positions
ring(1)
0:inside, 1-7 outside
justification(left/ centered/ right)
text justification
alignment(top/ middle/ bottom/ baseline)
text alignment
orientation(horizontal/ vertical/rhorizontal/ rvertical)
5.16
Over()
over(c, total)
split by categories of c plus total, can use over(c1) over(c2)
over(c, descending)
sort values.
over(c, sort(c2)), sort(1)
sort by c2 or by the first y variable
over(var, relabel(1 "lab1" 2 "lab2"))
new labels for "over" variable
ascategory / asyvars
as categories: plotted with spaces, as yvars: plotted dense
missing, nofill
show missing, do not show empty combinations
Ex: graph bar teq_di teq_fu ,over(landsdel, total) nofill
5.17
By()
by(varlist, suboptions)
separate graphs for each varlist
total
add total group
missing
add missing groups
colfirst
display down columns
rows(#), cols(#)
number of rows or cols
holes(numlist)
positions to leave blank
compact
Ex: graph bar teq_di teq_fu ,by(star, total rows(1) compact)
5.18
Schemes
set scheme schemename [, permanently]    set overall look of graphs
graph …, scheme(schemename)              set overall look for current graph
graph query, schemes                     list installed schemes
schemenames:
s2color                                  default, will vary colors of lines and markers
s2mono                                   monochrome, will vary patterns of lines and markers
5.19
Combining graphs
graph …, saving(plt1, replace)              saving to file
graph …, name(plt1, replace)                saving to memory
graph use plt1.gph                          show saved graph from file (graph display plt1 shows a graph held in memory)
graph combine plt1 plt2, ycommon cols(1)    combine from memory in 1 column with same y scaling
graph combine plt1.gph plt2.gph             combine from file
5.20
Graph query
graph query                     list of all styletypes
graph query colorstyle          list of all colorstyles
graph query linepatternstyle    list of all linepatternstyles
5.21
Palettes
palette line                    plot showing the linetypes
palette symbol                  plot showing the symboltypes
palette color color1 color2     plot comparing colors
6 Regression commands
6.1
Regression models
6.1.1
Linear regression with simple error structure
regress
linear regression (also heteroskedastic errors)
boxcox
linear regression on BoxCox transformations of y and x’s
nl
non linear least squares
6.1.2
GLM
logistic
logistic regression
poisson
Poisson regression
binreg
binary outcome, OR, RR, or RD effect measures
glm
use for non-canonical links
6.1.3
Conditional logistic
clogit
for matched case-control data
6.1.4
Multiple outcome
mlogit
multinomial logit (not ordered)
ologit
ordered logit
6.1.5
Linear regression with complex error structure
xtmixed
linear mixed models
xtlogit
random effect logistic
xtpoisson
random effect Poisson
6.1.6
Survival models
stcox
Cox proportional hazard models (with frailty)
streg
parametric survival models (with frailty)
6.2
Orthogonal variables
orthog x1 x2 x3, gen(q1 q2 q3) matrix(R)    make orthogonal variables and transformation matrix R
regress y q1 q2 q3                          regression command
matrix b=e(b)*inv(R)'                       transforming coefs back to original metric
matrix list b                               show coefs
6.3
Test after regression commands
6.3.1
Wald test
test x1 x2           joint effect of two variables
test x1=-2           H0: x1=-2
test x1-2*x2=3       test of linear combinations of variables
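The three forms of test can be sketched after a regression on the shipped auto dataset. The dataset, the model and the tested values are illustrative assumptions with no substantive meaning.

```stata
sysuse auto, clear
regress price weight length mpg

test length mpg                 // joint effect of two variables
test weight = 2                 // H0: coefficient of weight equals 2
test weight - 2*length = 3      // H0 on a linear combination of coefficients
display "F = " r(F) ", p = " r(p)
```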
6.3.2
Likelihood ratio test
regress y x1 x2 x3 x4        fit model 1
estimates store m1           store model 1
regress y x1 x2              fit model 2
lrtest m1 .                  test model 1 against current model
lrtest m1 m2                 test m1 vs m2 (model 2 must first be stored as m2)
6.4
Cataloging estimation results
quietly: regress y x1 x2     fit model without output
estimates store m1           store results as m1
estimates dir                list stored results
est table m1 m2 …            compare coefs
est stats m1 m2 …            compare fit (ll, AIC..)
estimates replay             show results
estimates restore m2         make m2 active
6.5
Cov, Corr, AIC, BIC and sample
estat vce                    vce=variance-covariance estimate
estat vce, corr              correlation matrix
estat ic                     information criteria: AIC and BIC
estat summarize              show mean, min and max for variables in the model
6.6
Prediction
regress y x1 x2                           fit model
gen y1=_b[_cons]+_b[x1]*x1+_b[x2]*x2      direct prediction
predict y1                                prediction in the same metric as the outcome: prob of success for logistic, counts for Poisson, …
predict y1, xb                            linear prediction
pred y1 if e(sample), xb                  linear prediction restricted to the estimation sample
pred sey, stdp                            standard error of prediction
pred r1, resid                            residuals
pred c1, cooksd                           Cook's distance
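A sketch combining stored estimates, a likelihood ratio test of nested models and prediction, on the shipped auto dataset. The dataset, the two models and the names m1, m2, yhat, yhat2 and res are illustrative assumptions.

```stata
sysuse auto, clear

quietly regress price weight length mpg headroom
estimates store m1
quietly regress price weight length
estimates store m2

lrtest m1 m2                 // m2 is nested in m1
estimates table m1 m2
estimates stats m1 m2

* make m1 the active model again, then predict from it
estimates restore m1
predict double yhat if e(sample), xb
predict double res if e(sample), residuals

* direct prediction from the coefficients gives the same values
gen double yhat2 = _b[_cons] + _b[weight]*weight + _b[length]*length ///
    + _b[mpg]*mpg + _b[headroom]*headroom
```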
7 Linear regression
regress y x1 x2 x3                 regress y on x1 x2 x3
regress                            repeat last result
test x2 x3                         F-test of joint effect of x2 and x3
vce                                variance-covariance matrix of estimators. vce, rho gives corr matrix
predict                            predicted values
predict newvar, stat               pred, resid, DFBeta, …
regress y x1 x2 x3 if influ<1      stored Cook's dist in influ, rerun without highly influential points
7.1.1
Test of assumptions
predict fteq, xb                                 predicted y
predict res, res                                 residuals
twoway (qfitci res fteq) (scatter res fteq)      scatter with quadratic fit + ci
rvfplot, mlabel(id) yline(0)                     residuals versus fitted, look for non-linearity and heteroskedasticity
ovtest                                           test for omitted higher order y's, p<.05 means non-linear effects
ovtest, rhs                                      test for omitted higher order x-variables, p<.05 means non-linear effects
hettest                                          test for heteroskedasticity, p<0.05 means heteroskedasticity
7.1.2
Test of influence
lvr2plot, mlabel(id)               leverage vs residuals squared, look for high leverage
avplot moralder, mlabel(id)        added variable plot
7.1.3
Test of multicollinearity
vif                                variance inflation factor, look for vif>10 (or 30) and mean vif>1
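A sketch of the non-graphical diagnostics above on the shipped auto dataset, using the estat forms of ovtest, hettest and vif. The dataset, the model and the 4/N cut-off for Cook's distance are illustrative assumptions (the notes above use influ<1).

```stata
sysuse auto, clear
regress price weight mpg foreign

predict double influ, cooksd       // Cook's distance
predict double res, residuals

estat ovtest                       // omitted higher-order terms of fitted y
estat ovtest, rhs                  // omitted higher-order terms of the x-variables
estat hettest                      // heteroskedasticity
estat vif                          // variance inflation factors

* rerun without the most influential points (assumed cut-off 4/N)
local cut = 4/e(N)
regress price weight mpg foreign if influ < `cut'
```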
8 Logistic regression
8.1
Syntax
logistic y x1 x2 x3            show odds ratios
logistic, coef                 show coefs of last model
logit                          show coefs of last model
8.2
Categorical covariates
xi: logistic y x1 i.x2 x3      indicator variables for x2
char _dta[omit] prevalent      make the most prevalent group the reference category (permanent setting)
char _dta[omit]                make the first group the reference (permanent setting)
char catvar[omit] 3            make the third group of catvar the reference (permanent setting)
8.3
Residuals, goodness-of-fit
predict newvar, stat
predict statistic and put into newvar.
stat: p=probabilities, xb=fitted values, db=delta beta, de=deviance resid, r=Pearson resid, rsta=standardized resid, hat=leverage
test x1 x2
test joint effect of x1 x2
lfit
Pearson chi-square goodness of fit; the option group(10) gives Hosmer-Lemeshow with 10 groups
lstat
summary statistics
lincom
OR of one covariate pattern versus another
8.4
Diagnostic plots
After fitting the logistic model do:
predict p, p
probabilities
predict db, db
delta beta
predict dx2, dx2
Hosmer Lemeshow delta chi-square influence
twoway scatter dx2 p [aweight=db], msymbol(oh) title("Symbol size prop to delta-beta")
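The diagnostic statistics above can be generated as follows. This is a sketch on the shipped auto dataset; the dataset, the model and the use of estat gof for the Hosmer-Lemeshow test are illustrative assumptions.

```stata
sysuse auto, clear
logistic foreign weight mpg

predict double p, p            // predicted probabilities
predict double db, dbeta       // delta beta influence statistic
predict double dx2, dx2        // Hosmer-Lemeshow delta chi-square

estat gof, group(10)           // Hosmer-Lemeshow goodness of fit, 10 groups
summarize p db dx2
```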
9 ST Survival time data
9.1
Initial settings and description
stset timevar, failure(died)    set time variable and failure indicator
stdes                           describe data
stsum                           summarize data
9.2
Kaplan-Meier …
sts graph, by(drug)             Kaplan-Meier plot
sts test drug                   log rank test
stci, by(sex) p(25)             25th percentile with ci by sex
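A sketch of the commands above on the cancer dataset that ships with Stata, which contains studytime, died and drug. The dataset is an illustrative assumption; the graph is left out so the example also runs without a graphics window.

```stata
sysuse cancer, clear

stset studytime, failure(died)     // time variable and failure indicator
stdescribe
stsum

sts test drug                      // log rank test across the drug groups
stci, by(drug) p(25)               // 25th percentile of survival time with ci
```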
9.3
Survival regression models
9.3.1
Cox
9.3.2
Parametric survival
10 xtmixed -- Multilevel mixed-effects linear regression
10.1
Syntax
xtmixed y x1 x2 x3 || id: x1, cov(ind)
y and fixed part || id for second level: random part (intercept understood), covariance
10.2
Random effect covariances
independent      one variance parameter per random effect, all covariances zero; default
exchangeable     equal variances for random effects, and one common pairwise covariance
identity         equal variances for random effects, all covariances zero; the default for factor vars
unstructured     all variances/covariances distinctly estimated
10.3
Predict
xb               linear predictor for the fixed portion of the model
stdp             standard error of the fixed-portion linear prediction xb
fitted           fitted values, linear predictor of the fixed portion plus predicted random effects
residuals        response minus fitted values
rstandard        standardized residuals
Ex: predict yhat, fitted     predict fixed and random effect into new variable yhat
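A sketch of a random-intercept model and its predictions on simulated data. Everything about the data (50 clusters of 5 observations, the true coefficients, the seed) is made up for illustration, and rnormal() is assumed to be available in the Stata version used.

```stata
* simulate 50 clusters with 5 observations each
clear
set seed 2024
set obs 50
gen id = _n
gen double u = rnormal(0, 2)          // cluster-level random intercept
expand 5
gen double x1 = rnormal()
gen double y  = 1 + 0.5*x1 + u + rnormal()

xtmixed y x1 || id:                   // random intercept for id

predict double yfix, xb               // fixed portion only
predict double yhat, fitted           // fixed portion plus predicted random effects
predict double res, residuals         // response minus fitted values
```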
11 Data reduction
11.1
Factor analysis
factor v1 v2 v3 v4, mineigen(1) factors(5)    minimum eigenvalue 1, max number of factors 5
estat anti                                    anti-image corr and cov
estat kmo                                     Kaiser-Meyer-Olkin measure of sampling adequacy: 0.00 to 0.49 unacceptable, 0.50 to 0.59 miserable, 0.60 to 0.69 mediocre, 0.70 to 0.79 middling, 0.80 to 0.89 meritorious, 0.90 to 1.00 marvelous
rotate                                        varimax orthogonal
loadingplot                                   plot 2 factors
12 Programming
12.1
Programs
12.1.1 Program definition
program define name
args x1 x2 x3
local m=`x1' +1
.
end
program drop name          remove old program definition
12.2
Macros
local name "content"       define macro
local name= expression     define macro
`name'                     use local macro
global name= expression    define macro
$name                      use global macro
12.3
Loops
12.3.1 For loop
forvalues i=1(1)10 {
disp `i'
}
12.3.2 Foreach
foreach lname in any_list {
foreach lname of local lmacname {
foreach lname of global gmacname {
foreach lname of varlist varlist {
foreach lname of newlist newvarlist {
foreach lname of numlist numlist {
Ex:
local grains "rice wheat corn rye barley oats"
foreach x of local grains {
display "`x'"
}
commands on separate lines
Ex: foreach x of varlist mpg weight-turn {
...
}
12.3.3 While
local i=1
while `i'<5 {
commands
local i= `i'+1
}
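A small runnable sketch of a program definition, macros and the three loop types above. The program name addone and the macro names total and n are hypothetical, chosen only for illustration.

```stata
* a program with one argument that returns its result in r()
capture program drop addone
program define addone, rclass
    args x
    local m = `x' + 1
    return scalar value = `m'
end

addone 4
display r(value)

* forvalues: sum of 1..10
local total = 0
forvalues i = 1(1)10 {
    local total = `total' + `i'
}
display `total'

* foreach over the words of a local macro
local grains "rice wheat corn rye barley oats"
local n = 0
foreach x of local grains {
    display "`x'"
    local n = `n' + 1
}

* while loop
local i = 1
while `i' < 5 {
    local i = `i' + 1
}
```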
12.4
Conditions
12.4.1 If
if exp {
Commands
}
else {
commands
}
The else part is optional.
12.5
Matrix expressions
matrix A=(1,2,3\4,5,6)           define matrix A as 2 by 3
A[.,"col1"] or A[.,1]            first col, "col1" is the column name
A["row1",.] or A[1,.]            first row
A["row i","col j"] or A[i,j]     element i,j
A[2...,1..2]                     submatrix (2-n) by (1-2), may also use names
mat B=J(3,4,0)                   3 by 4 matrix of zeros
mat B[2,2]=1                     change element
12.5.1 Matrix operators
-B          negate
B'          transpose
B \ C       add rows of C below rows of B
B , C       add columns of C to the right of B
B + C       add
B - C       subtract
B * C       multiply (including mult. by scalar)
B / z       division by scalar
B # C       Kronecker product
matrix list A        show matrix
matrix dir           list the currently defined matrices
matrix list          display the contents of a matrix
matrix rename        rename a matrix
matrix drop          drop a matrix
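A short sketch of the matrix expressions and operators above. The matrix names C, At and S are illustrative.

```stata
matrix A = (1,2,3 \ 4,5,6)       // 2 by 3
matrix B = J(3,4,0)              // 3 by 4 matrix of zeros
matrix B[2,2] = 1                // change one element

matrix C  = A*B                  // 2 by 4 product
matrix At = A'                   // transpose, 3 by 2
matrix S  = A[1..2, 2..3]        // submatrix: rows 1-2, columns 2-3

matrix list C
matrix dir
```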
13 GLLAMM
13.1
Installation
Run the following Stata command to install gllamm:
ssc install gllamm, replace
13.2
Data format
Use long data format with identifiers at the different levels
13.3
Syntax examples
13.3.1 A two-level random intercept model (logistic)
gllamm y x1 x2, i(level2_id) family(binom) link(logit) nip(8)    number of integration points=8
13.3.2 A two-level random intercept and slope model (linear)
gen cons=1
eq interc: cons
eq slope1: x1
gllamm y x1, i(level2_id) nrf(2) eqs(interc slope1)
number of random functions=2,
3 random parameters estimated: var(interc), var(slope1) and covar(interc,slope1).
Option nocor would set the last to 0
13.3.3 A two-level random intercept model, x1 and x2 categorical
xi: gllamm y i.x1 i.x2, i(level2_id) family(binom) link(logit) nip(8)
13.4
Prediction
13.4.1 Syntax and options
gllapred varname [, xb u linpred]
xb           fixed effect part of linear prediction
u            posterior means and std for latent variables
linpred      linear prediction of both fixed and random parts
14 Survey commands
A family of commands to account for survey design (stratification and clustering)
14.1
Setting stratification, clustering, finite population correction and sample weights
svyset strata varname       stratification
svyset psu varname          clustering (psu=primary sampling unit)
svyset fpc varname          finite population correction
svyset pweight=varname      sample probability weights
Settings remain until cleared
svyset, clear
svyset                      shows current settings
14.2
Means and proportions
svymean varname, by(variable) subpop(variable)
subpop will select values different from 0 and missing. Do not use if in svy commands
svyprop varname
svyratio varname
svytotal varname
14.3
Tables
svytab x y, row column obs se ci    two-way tables
14.4
Regression
svyreg           linear
svylogit         logistic
svypois          Poisson
14.5
Stata web links
Stata programs for generalized linear measurement error models, USA
Programs by R. J. Carroll, J. Hardin, and H. Schmiediche fit generalized linear models when one or more
covariates are measured with error.
Stata program by Tony Brady, Sealed Envelope Ltd
Programs for Hosmer–Lemeshow goodness of fit test, conversion of regression output into near publication
quality tables, time utilities to translate strings in 24 hr clock HH:MM format to elapsed times and back again,
tabulate longitudinal data at the cluster level, count clusters in longitudinal data, etc.
Stata programs from Dr. Gareth Ambler, University College, UK
Programs for Hosmer-Lemeshow test, penalised logistic regression, and generalized additive models, and a
postestimation routine.
One great source for user-written software for Stata is the Stata Journal (SJ). There are many other resources available,
including the Statalist archive, but we will use the SJ archive for this example.
From Stata's toolbar, click on Help > SJ and User-written Programs, or at the command prompt, type [view] help
net_mnu.
15 New in Stata 10
15.1
Graph editor
15.2
Exact
15.3
Mixed models
xtmelogit
xtmepoisson
15.4
Survival
sts graph, risktable ci plotopt() ciopt()
stcurve, ---#---
15.5
Power
stpower cox
stpower logrank 0.7 0.8, power(0.8)
sample size required to detect an increase in survival from 0.7 (untreated) to 0.8 (treated) at the
end of the study
stpower logrank 0.7, n(100 250 500) hratio(0.1(0.01)0.9) saving(mypower)
15.6
Saved results
est save filename
est use filename
15.7
Mata
15.8
Diverse
lpoly
mkspline