Stata help
1 Basics

1.1 Help
help cmd                 show help file for cmd

1.2 Short cuts
Ctrl-R                   run selection in do file
Ctrl-D                   do selection in do file
Ctrl-Alt-T               start Stata
PgUp / PgDown            previous/next command in command window
#review n                show last n commands
Esc                      clear command

1.3 Options
1.3.1 Memory, max variables and max matrix size
set memory 100m          default = 10 Mb, max = as large as OS allows
set maxvar 1000          default = 5000, max = 32767
set matsize 500          default = 400, max = 11000
set xxx, permanently     will set for all sessions

1.4 Save commands
cmdlog using myfile      start a command log file
cmdlog close             close (and save) command log file
Can also save the Review window as a do file: click on the upper left "minus".

1.5 Save output
(set more off), begin log, ..., close log, save log, print log

1.6 Notation
==                       equal
~= (or !=)               not equal
&                        and
|                        or
~ (or !)                 not
x^2                      x squared
+                        string concatenation
.                        missing
x[3]                     3rd observation of x
x[_n-1]                  previous value of x
replace x=2 if _n==3     sets x[3]=2

1.6.1 Variable names
Names can be 1-32 characters long: letters (case sensitive), digits, underscore. Start with a letter.

1.7 Command syntax
[by varlist:] command [varlist] [weight] [if exp] [in range] [using filename] [, options]
Note: all commands are lower case letters!

1.7.1 By
by varlist:              repeat for all combinations of values in varlist, use sort varlist first
by varlist, sort:        same, sorting by varlist first

1.7.2 Weights
[weighttype=var]
fweight=freq             frequency weighting for aggregated data
aweight=1/sd             analytic weighting by precision
pweight=1/prob           probability weighting by sample probabilities
iweight=                 importance weighting, manual control of weights
ref U 23.13 and U 30

1.7.3 If exp
if exp                   do if exp == true (note: missing included)

1.7.4 Ranges
in range                 restrict to range (in first/last), f=first, l=last, -n from end. Ex: 5/25, -10/l
list x in 5/10           x from 5 to 10
list x in f/10           x from first to 10
list x in -10/l          x from -10 to last = 10 last observations

1.8 Prefix commands
by: statsby: bootstrap: jackknife: simulate: svy: stepwise: xi:

1.9 Estimation commands

1.10 Postestimation commands
mfx                      marginal effects
adjust                   adjusted means
estat vce                variance/covariance of estimates
predict, predictnl       predictions, linear and nonlinear
ereturn list             list of saved results
test, testnl             linear and nonlinear Wald test
lrtest                   likelihood ratio tests
lincom                   point estimates and conf int of linear combinations
nlcom                    non-linear combinations
estimates                store and retrieve results

2 Functions

2.1 Mathematical functions
sqrt()
ln() or log()            natural log
log10()
abs()
int()
exp()
min(x1,...,xn)
max(x1,...,xn)

2.2 Statistical functions
comb(n,k)                "n over k"
binomial(n,k,p)
chi2(df,x)               cum chi2
normden(z,s)             N(0,s2) density
norm(z)                  cum N(0,1)
uniform()                0-1 random uniform

2.2.1 Examples
a+(b-a)*uniform()            random uniform on [a,b)
a+int((b-a+1)*uniform())     random integers on [a,b]
mu+s*invnorm(uniform())      random normal, mean mu and variance s2

2.3 Logical
cond(x,a,b)              if x then a else b

3 Data handling

3.1 Import data
Use DBMS/Copy to convert from SPSS to Stata format. Use Stata 6, 8 byte double as outcome file.

3.2 Use and save
use file.dta
save newfile.dta         save new copy
save file.dta, replace   overwrite original data

3.3 Describe, labels
describe                                  overview of variables
label var varname "text"                  variable label
label define lblname # "text" # "text"... define mapping between numeric values (#) and labels ("text") called lblname
label values varname lblname              associate mapping with variable

3.4 Formats
format varname %w.d type
w = width in columns, d = decimal places, type: g = general, f = fixed, s = string.
Examples: %9.0g, %9.2f, %10s
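The labelling and format commands in 3.3 and 3.4 can be tried out as in the following minimal sketch. It assumes the auto example dataset that ships with Stata; the mapping name originlbl and the label texts are made up for illustration.

```stata
* Sketch only: auto.dta is an assumed example dataset, originlbl a hypothetical name
sysuse auto, clear

* 3.3: variable label, then a value-label mapping attached to a variable
label variable mpg "Mileage (miles per gallon)"
label define originlbl 0 "Domestic car" 1 "Foreign car"
label values foreign originlbl

* 3.4: fixed format, 9 columns wide with 2 decimals
format price %9.2f

* Inspect the result
describe mpg foreign price
list make price foreign in 1/3
```

The numeric values of foreign are unchanged; only the way they are displayed differs, and list ..., nolabel shows the underlying codes again.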
3.5 Recoding
recode varlist (rule) (rule), gen(varlist) copy               syntax
recode x (1 2=1 low) (3 4=2 high) (missing=.), gen(x2)        recode 1 and 2 into 1, 3 and 4 into 2, give labels and generate new x2
recode x (1=2) if sex==1, gen(x2) copy                        copy: copy values for sex!=1
egen ageGr3=cut(age), group(3) label                          3 equal sized groups
egen ageGr2=cut(age), at(0,50,80) label                       2 groups 0-50, 50-80, values outside set to missing
encode stringvar, generate(newvar)                            make numerical newvar (1,2,3...) based on stringvar values

3.6 Generate, replace
generate newvar=exp              create new variable
replace oldvar=exp
gen agegr=age>=30 if age!=.      missing values are greater than all numerical values
gen xlag=x[_n-1]
gen xlead=x[_n+1]

3.7 Extended generate
egen [type] newvar = fcn(arguments) [if exp] [in range] [, options]
egen newvar=fcn(arg)             extended generate: make newvar from stored functions
Ex: by code, sort: egen mx=median(x)     gives medians of x by values of code
by ... : may be used with some egen functions

3.7.1 Functions
count(exp)                                 number of nonmissing observations of exp
cut(varname), {at(#,#,...,#)|group(#)}     cut at the at() numbers, or in equal groups
mean, median, max, min, std, sum
pctile(exp) [, p(#)]                       percentiles
group(var1 var2)                           new var from all combinations of var1 and var2
rmiss

3.8 Drop, keep
drop varnames            drop variables from memory
drop in 3                drop observation 3
keep var1-var5           keep variables 1 to 5
Note: keep if age>=10 will also keep missing.
drop if age==.           remove missing

3.9 Missing
.                        numerical missing
""                       string missing
missing(x)               equivalent to x==. if x is numeric, to x=="" if x is string
Missing values are greater than all numerical values and are sorted last, so age>=30 will include missing.
gen agegr=age>=30 if age!=.
drop if age==.           remove observations with missing age
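The rule that missing values compare as larger than any number is easy to get wrong, so here is a small sketch of the pitfall. It assumes the auto example dataset shipped with Stata, whose variable rep78 contains some missing values; the variable name high is hypothetical.

```stata
* Sketch only: auto.dta and the new variable "high" are assumptions
sysuse auto, clear

* Missing counts as larger than any number, so this count includes missing rep78
count if rep78 >= 4

* Excluding missing explicitly gives the count that is usually intended
count if rep78 >= 4 & rep78 != .

* Same idea when generating an indicator: keep missing as missing
generate byte high = rep78 >= 4 if rep78 != .
tabulate high, missing
```

Without the if rep78 != . qualifier, every observation with a missing rep78 would be coded high = 1.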
mvdecode x1, mv(99)      set 99 to missing
mvencode x1, mv(.=99)    set missing to 99

3.10 Sort
sort varname             sort by variable. Use before "by var:" command

3.11 String commands
fname+" "+lname          string concatenation
substr(name,1,10)
See U 16.3.5

3.12 Aggregate
contract vars, freq(fname) percent(pname)    contract (aggregate) over variable patterns to freq and percents
collapse vars                                collapse data to means (or other stats) over variable patterns

3.13 Accessing results from commands
3.13.1 System variables
_b[varname]              regression coef
_b[_cons]                intercept
_se[varname]             SE of regression coef
_n                       current observation
_N                       total number of obs
_pi                      pi
Ex: regress y x, _b[_cons] gives constant term, _b[x[1]] gives coeff of first category of x, _se[x[1]] gives standard error
Ex: xi: regress y i.x, _b[_Ix_2] gives coef of second level of x (created dummy called _Ix_2)

3.13.2 Saved results
return list              run after a command to find list of saved results
ereturn list             run after a command to find list of estimated saved results
e(name)                  estimation class, live until next estimation
r(name)                  result class, live until next command
Ex: summarize age, gen agedev=age-r(mean)
Ex: regress y x1 x2, matrix B=e(b), matrix V=e(V)     save coefficient and variance-covariance matrices

3.13.3 Accessing results from commands, save as macros
sum w if c==1            mean of w for c=1
global w1=r(mean)        save as global macro
dis $w1                  show content of macro

4 Uni- and bivariate

4.1 List
list varlist [, [no]display nolabel]     list variables, nodisplay gives tabular data, nolabel gives values
list varname_i-varname_j                 list a group of variables
list in 3                                3rd observation, -1 = last, 1/10 = 1 to 10
list if exp                              list if var>10, list if var==10
list var1 if var2==.                     list if var2 is missing

4.2 Tabulate
4.2.1 One-way tables
tabulate var [weight] [if expr] [in range] [, nofreq plot missing nolabel]
nolabel shows category values
tab1 varlist             one way tables for all variables
tab c, gen(c)            create dummies c1, c2, ... for each category of c
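As a hedged illustration of the gen() option above, the sketch below builds indicator variables from a categorical variable. It assumes the auto example dataset shipped with Stata; the stub name rep is arbitrary.

```stata
* Sketch only: auto.dta is an assumed example dataset, "rep" an arbitrary stub
sysuse auto, clear

* One-way table of the categorical variable
tabulate rep78

* Same table, also creating one dummy per category: rep1, rep2, ...
tabulate rep78, generate(rep)

* Each dummy is 1 for its category, 0 for the others, missing if rep78 is missing
summarize rep1-rep5
```

Because the dummies are missing wherever rep78 is missing, a later regression on them drops those observations just as a regression on rep78 would.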
4.2.2 Two-way tables
tab var1 var2 [weight] [if expr] [in range] [, nofreq col row cells chi2 exact missing nolabel]
tab var1 var2, nofreq col chi2       crosstab, column %, no freq, with chi-square test
tab var1 var2, exact                 Fisher exact test
tabi 30 20 \ 20 10, col chi2         immediate table
tab var1 var2, summarize(var3)       mean, sd and freq of var3 by var1 and var2. Use means, standard or freq to limit the output

4.2.3 Three-way tables
sort var3
by var3: tab var1 var2

4.3 Table of summary statistics
table rowvar [colvar [supercolvar]] [if] [in] [weight] [, options]
table rowvar, contents(clist) row col
clist: freq, mean, sd, sum, n, max, min, median, p# (percentile), iqr. Totals: row col. Show missing: missing
table rowvar colvar supercolvar, by(superrowvarlist)     multi way tables
Ex: table sex, c(n age mean age mean educ) row           subjects, mean age and mean educ by sex, plus total row
tabstat varlist [if] [in] [weight] [, options]
epitab

4.4 Means and confidence intervals
means varlist                         3 types of means with ci
ci varlist, binomial poisson total    ci for means, proportions and counts

4.5 Summarize
summarize vars           number, mean, sd, min, max. Summarize alone takes all variables.
summarize vars, detail   percentiles, var, skew, kurt
inspect var              details on values

4.6 T-test
ttest var=#                  one sample T-test
ttest var, by(c)             two sample T-test
ttest var1=var2              paired two sample T-test
ttest var1=var2, unpaired    two sample T-test
, unequal                    equal variances not assumed
Ex: sdtest age, by(sex)      (equal var rejected)
    ttest age, by(sex) unequal

4.6.1 Test of equal variance (standard deviation)
sdtest var=#             standard deviation = #
sdtest var, by(c)        two groups compared
sdtest var1=var2         same variance in both variables

4.6.2 One way anova
oneway response_var factor_var [weight] [if exp] [in range] [, noanova nolabel missing wrap tabulate [no]means [no]standard [no]freq [no]obs bonferroni scheffe sidak]
Ex: oneway var c, tabulate     analysis of var by c

4.7 Non-parametric analysis
by gender, sort: centile partners, centile(25 50 75) cci     percentiles with exact confidence interval
ranksum partners, by(gender)                                 Mann-Whitney test = Wilcoxon rank sum, 2 groups
kwallis partners, by(age3)                                   Kruskal-Wallis K-group test

4.8 Proportions
proportion x1, over(c)   proportions with ci

5 Graphics

5.1 Plot types
graph twoway             scatter, line, density, histogram, function, ...
graph matrix
graph bar, hbar, dot
graph box
graph pie
5.2 Graph Twoway
5.2.1 Twoway syntax
graph twoway plot [if exp] [in range] [, options]     twoway syntax (graph may be omitted)
where plot = (plottype varlist, options)              plot syntax, several plots may be listed and combined
where varlist = y1 y2 ... x                           last variable is x
Ex: twoway scatter y x                                plot y by x

5.2.2 Twoway plot types
scatter, line, connected, area
dot, bar, histogram, kdensity          kdensity = kernel density
function y=f(x), range(x1 x2)          f(x) from x1 to x2
rarea rcap rbar                        range area, range cap, range bar
Ex: twoway area y x, sort base(50)                                              gives shading from 50
Ex: histogram x, bin(10) start(-2.5) percent (or frequency)
Ex: twoway (histogram x, width(1) frequency) (kdensity x, area(3200))           area scaled to the sum of subjects
Ex: twoway function y=normden(x), range(-4 4) droplines(-1.96 1.96)             function plots
Ex: twoway dropline db id if abs(db)>.25, mlabel(id)                            delta-beta > 0.25

5.2.3 Twoway fitlines
lfit, qfit, mband, mspline, lowess     linear and quadratic fits, median band, median splines and lowess
lfitci, qfitci, fpfitci                fit with CI: linear, quadratic, fractional polynomial
Ex: (lfitci y x, ciplot(rline))                        default is rarea
Ex: twoway (lfit y x) (lowess y x) (scatter y x)       scatter with linear and lowess fit

5.3 Graph Bar, Hbar and Dot
5.3.1 Syntax
graph bar/hbar/dot yvars [if exp] [in range] [, options]
where yvars = varlist, or (stat) varlist, or (stat) name=varname
stat = mean, median, p1, p2, ..., p99, sum, count, min, max
Ex: graph bar x, over(c) nofill                        means of x over categories of c
Ex: graph bar (mean) meany=x (median) medy=x           mean and median of the same variable
Ex: graph bar (median) x1 x2, percent stack            stacked percentages

5.3.2 Options
nofill                   skip empty categories
sort(1)                  sort by 1st variable
over(c1)                 values for each c1
by(c2)                   separate plots for each c2
bargap(0)                gap between bars in %: -30 = 30% overlap, 30 = gap
blabel(what, where_and_how)     bar labels
what: bar / total / name / group     print height, total height, name of yvar, name of first over() group
where_and_how:
position(outside / inside / base / center)     where to place the bar label
format(%9.1f)
gap(rel_size)
textbox_options                                options for labels
Ex: graph bar teq1, over(landsdel) nofill blabel(bar, pos(inside) size(*1.3) format(%9.1f) color(white))
Ex: graph hbar teq1, over(landsdel, axis(off) sort(1)) nofill blabel(group, pos(base) size(*1.3) format(%9.1f) color(white))

5.4 Graph Box, Hbox
graph box x1 x2 x3, ascategory     boxplot of separate variables, ascat puts labels on the y-axis
graph hbox x, over(c, total)       plot of x over cat of c plus total

5.5 Graph Pie
graph pie x1 x2 x3       sum of x1, x2 and x3
graph pie x, over(c)     sum of x for each category of c
graph pie, over(c)       number of cases for each category of c

5.5.1 Options
plabel(_all sum / percent / name / text, text_box_options)     label all slices with sum, percent, x-names or a given text

5.6 Graph Matrix
graph matrix x1-x5       scatter of all 5 variables

5.7 Other graphs
gladder y, qladder y     histograms over different transformations of y, QQ plot of the same

5.8 Titles
title("text"), xtitle("text"), ytitle("text")     titles
title, subtitle, caption, note                    title types

5.8.1 Title options
position(clockpos) ring(ringpos) span text_box_options
Ex: scatter teq1 moralder, title("Title", position(12) ring(0))

5.9 Legend
legend([contents] [location])
Contents:
order(1 2 3)             may also use order(1 "label1" 2 3)
label(1 "label1")        override legend for var 1
cols(1)                  legend in 1 column
rows(1)                  legend in 1 row
stack                    stack symbol and text
rowgap(2) colgap(2)      gap between each element
Location:
on/off                   legend on/off
position(clock)          position of legend
ring(1)                  radial distance from plot, ring(0) = inside
Ex: legend(label(1 "Density of TEQ") label(2 "Mean") label(3 "Median") ring(0) pos(2) cols(1))
Ex: graph bar teq_di teq_fu teq_npcb teq_mopc teq_hcb ///
    , legend(row(1) stack colgap(10) label(1 "Dioxin") label(2 "Furan") label(3 "Non-o") label(4 "Mono") label(5 "HCB"))

5.10 Axis scale, label, ticks and grid
5.10.1 Axis title
x|ytitle("line1" "line2")

5.10.2 Axis scale
x|yscale(opts)
Options:
axis(1)                  axis to modify (1-9)
[no]log
[no]reverse
range(0 100)             extend range, will not decrease range. range(0): start at 0, range(100): end at 100
alt                      axis at alternative side
on/off                   axis on/off
Ex: scatter teq1 moralder, xscale(range(0 80)) yscale(off)     no y-axis

5.10.3 Axis labels and ticks
x|ylabel(rule_or_values, opts)     major ticks and labels
x|ytick(rule_or_values)            major ticks
x|ymlabel(rule_or_values)          minor ticks and labels
x|ymtick(rule_or_values)           minor ticks
rule or values (may use both):
#10                      10 nice values
1 5 50                   labels at 1, 5 and 50
0 5 10 "mean" 15 20      labels every 5, with mean printed at 10
0(10)100                 labels from 0 to 100 in steps of 10
minmax                   min and max values
none
Label options:
angle(0)
[no]grid                 add gridlines
format(%5.0f)            5 places, 0 decimals, fixed
Ex: xlabel(1 "Low" 2 "Medium" 3 "High", angle(45))     text labels at values 1, 2 and 3, at 45 deg
Ex: scatter teq1 moralder, xlabel(#10, grid)

5.11 Text
text(y x "text", opts)   text at y,x in the plot
placement(c)             c = centered, n = north, s = south, ...
orientation(vertical)
box                      draw box around text
Ex: graph ..., text(10 50 "Line1" "Line2", just(left) color(blue))     two lines of text at (y,x) = (10,50)

5.12 Markers and marker labels
5.12.1 Markers
mstyle(p1 p2)                  default styles
msymbol(sym1 sym2 ...)         marker symbols: S square, s square (small), Sh square (hollow); Square, Diamond, Triangle, O circle, X, +, p point, . default, i invisible
Ex: msymbol(S)
msize(small medium large), msize(*2)     small, medium, large markers; twice the size
mcolor(green)                            both outside and inside color
Ex: msymbol(. t Oh)                      markers for 3 variables: default, small triangles and hollow circles
Ex: twoway scatter y x [aweight=z], msymbol(oh) msize(small)     point size prop to z

5.12.2 Marker labels
mlabel(var)              label marker by var content
mlabsize(size)
mlabcolor(color)
mlabelpos(12)            label at 12 o'clock position
mlabvposition(var)       positions based on variable containing clock positions
mlabgap(*3)              3 times larger gap between marker and label
Ex: scatter y x, mlabel(z) mlabpos(center) msymbol(i)     use contents of z to label points, labels in the center and invisible points

5.13 Lines
5.13.1 Connecting points
twoway scatter y x, connect(l) sort     sort points, connect with line
connect(l)               line
connect(L)               separate line for each series
connect(J)               stairstep, for survival curves

5.13.2 Line options
lcolor(red)                      line color
lwidth(thick) or lwidth(*3)      thick line
lpattern(dash)
lpattern("l" ".-" "-###")        solid, dot-dashed, dash + 3 spaces

5.14 Text box options
tstyle(textboxstyle)                     overall style
box/nobox                                border
size(textsizestyle)
color(colorstyle)                        text color
justification(justificationstyle)        text left, center, right
alignment(alignmentstyle)                text top, middle, bottom, baseline
bfcolor(colorstyle)                      background color
bcolor(colorstyle)                       background and border color
blstyle(linestyle)                       style of border
orientation(orientationstyle)            vertical/horizontal, rvertical/rhorizontal
placement(compassdirstyle)               location
ring(1)                                  0: inside, 1-7: outside
format(%9.1f)                            9 places, 1 decimal, fixed
Ex: graph ..., title("My title", color(red) box size(*1.5))

5.15 Other options
5.15.1 Colors
black, white, red, blue, cyan, green, mint, yellow, ...
gs0 ... gs16             gray scales from black to white, gray = gs8
color*0.5                half the intensity

5.15.2 Positions
clockpos(12)             12 o'clock
clockpos(0)              center, where valid
placement(north)         alternative to clock with 9 positions
ring(1)                  0: inside, 1-7: outside
justification(left / centered / right)                          text justification
alignment(top / middle / bottom / baseline)                     text alignment
orientation(horizontal / vertical / rhorizontal / rvertical)

5.16 Over()
over(c, total)                           split by categories of c plus total, can use over(c1) over(c2)
over(c, descending)                      sort values
over(c, sort(c2)), sort(1)               sort by c2 or by the first y variable
over(var, relabel(1 "lab1" 2 "lab2"))    new labels for "over" variable
ascategory / asyvars                     as categories: plotted with spaces, as yvars: plotted dense
missing, nofill                          show missing, do not show empty combinations
Ex: graph bar teq_di teq_fu, over(landsdel, total) nofill

5.17 By()
by(varlist, suboptions)  separate graphs for each varlist
total                    add total group
missing                  add missing groups
colfirst                 display down columns
rows(#), cols(#)         number of rows or cols
holes(numlist)           positions to leave blank
compact
Ex: graph bar teq_di teq_fu, by(star, total rows(1) compact)

5.18 Schemes
set scheme schemename [, permanently]    set overall look of graphs
graph ..., scheme(schemename)            set overall look for current graph
graph query, schemes                     list installed schemes
schemenames:
s2color                  default, will vary colors of lines and markers
s2mono                   monochrome, will vary patterns of lines and markers

5.19 Combining graphs
graph ..., saving(plt1, replace)         saving to file
graph ..., name(plt1, replace)           saving to memory
graph use plt1.gph                       show saved graph from file (graph display plt1 for a graph in memory)
graph combine plt1 plt2, ycommon cols(1) combine from memory in 1 column with same y scaling
graph combine plt1.gph plt2.gph          combine from file

5.20 Graph query
graph query                      list of all styletypes
graph query colorstyle           list of all colorstyles
graph query linepatternstyle     list of all linepatternstyles

5.21 Palettes
palette linepalette              plot showing the linetypes
palette symbolpalette            plot showing the symboltypes
palette color color1 color2      plot comparing colors

6 Regression commands
6.1 Regression models
6.1.1 Linear regression with simple error structure
regress                  linear regression (also heteroscedastic errors)
boxcox                   linear regression on Box-Cox transformations of y and x's
nl                       non-linear least squares

6.1.2 GLM
logistic                 logistic regression
poisson                  Poisson regression
binreg                   binary outcome, OR, RR, or RD effect measures
glm                      use for non-canonical links

6.1.3 Conditional logistic
clogit                   for matched case-control data

6.1.4 Multiple outcome
mlogit                   multinomial logit (not ordered)
ologit                   ordered logit

6.1.5 Linear regression with complex error structure
xtmixed                  linear mixed models
xtlogit                  random effect logistic
xtpoisson                random effect Poisson

6.1.6 Survival models
stcox                    Cox proportional hazard models (with frailty)
streg                    parametric survival models (with frailty)

6.2 Orthogonal variables
orthog x1 x2 x3, gen(q1 q2 q3) matrix(R)     make orthogonal variables and transformation matrix R
regress y q1 q2 q3                           regression command
matrix b=e(b)*inv(R)'                        transforming coefs back to original metric
matrix list b                                show coefs

6.3 Test after regression commands
6.3.1 Wald test
test x1 x2               joint effect of two variables
test x1=-2               H0: x1=-2
test x1-2*x2=3           test of linear combinations of variables

6.3.2 Likelihood ratio test
regress y x1 x2 x3 x4    fit model 1
estimates store m1       store model 1
regress y x1 x2          fit model 2
lrtest m1 .              test model 1 against current model
lrtest m1 m2             test m1 vs m2

6.4 Cataloging estimation results
quietly: regress y x1 x2     fit model without output
estimates store m1           store results as m1
estimates dir                list stored results
est table m1 m2 ...          compare coefs
est stats m1 m2 ...          compare fit (ll, AIC, ...)
estimates replay             show results
estimates restore m2         make m2 active

6.5 Cov, Corr, AIC, BIC and sample
estat vce                vce = variance-covariance estimate
estat vce, corr          correlation matrix
estat ic                 information criteria: AIC and BIC
estat summarize          show mean, min and max for variables in the model

6.6 Prediction
regress y x1 x2                              fit model
gen y1=_b[_cons]+_b[x1]*x1+_b[x2]*x2         direct prediction
predict y1                                   prediction in the same metric as the outcome: prob of success for logistic, counts for Poisson, ...
predict y1, xb                               linear prediction
pred y1 if e(sample), xb                     linear prediction restricted to the estimation sample
pred sey, stdp                               standard error of prediction
pred r1, resid                               residuals
pred c1, cooksd                              Cook's distance

7 Linear regression
regress y x1 x2 x3               regress y on x1 x2 x3
regress                          repeat last result
test x2 x3                       F-test of joint effect of x2 and x3
vce                              variance-covariance matrix of estimators. vce, rho gives corr matrix
predict                          predicted values
predict newvar, stat             pred, resid, DFBeta, ...
regress y x1 x2 x3 if influ<1    stored Cook's dist in influ, rerun without highly influential points

7.1.1 Test of assumptions
predict fteq, xb                             predicted y
predict res, res                             residuals
twoway (qfitci res fteq) (scatter res fteq)  scatter with quadratic fit + ci
rvfplot, mlabel(id) yline(0)                 residuals versus fitted, look for non-linearity and heterosk.
ovtest                                       test for omitted higher order y's, p<.05 means non-linear effects
ovtest, rhs                                  test for omitted higher order x-variables, p<.05 means non-linear effects
hettest                                      test for heterosk., p<0.05 means heterosk.

7.1.2 Test of influence
lvr2plot, mlabel(id)             leverage vs residuals squared, look for high leverage
avplot moralder, mlabel(id)      added variable plot

7.1.3 Test of multicollinearity
vif                      variance inflation factor, look for vif>10 (or 30) and mean vif>1

8 Logistic regression

8.1 Syntax
logistic y x1 x2 x3      show odds ratios
logistic, coef           show coefs of last model
logit                    show coefs of last model

8.2 Categorical covariates
xi: logistic y x1 i.x2 x3        indicator variables for x2
char _dta[omit] prevalent        make the most prevalent group the reference category (permanent setting)
char _dta[omit]                  make the 1st group the reference (permanent setting)
char catvar[omit] 3              make the 3rd group of catvar the reference (permanent setting)

8.3 Residuals, goodness-of-fit
predict newvar, stat     predict statistic and put into newvar.
                         stat: p = probabilities, xb = fitted values, db = delta beta, de = deviance resid, r = Pearson resid, rsta = standardized resid, hat = leverage
test x1 x2               test joint effect of x1 x2
lfit                     Pearson chi-square goodness of fit. lfit, group(10) gives Hosmer-Lemeshow with 10 groups
lstat                    summary statistics
lincom                   OR of one covariate pattern versus another

8.4 Diagnostic plots
After fitting the logistic model do:
predict p, p             probabilities
predict db, db           delta beta
predict dx2, dx2         Hosmer-Lemeshow delta chi-square influence
graph dx2 p [w=db], border ylab xlab t1("Symbol size prop to delta-beta")

9 ST Survival time data

9.1 Initial settings and description
stset timevar, failure(died)     set time variable and failure indicator
stdes                            describe data
stsum                            summarize data

9.2 Kaplan-Meier ...
sts graph, by(drug)      Kaplan-Meier plot
sts test drug            log rank test
stci, by(sex) p(25)      25th percentile with ci by sex

9.3 Survival regression models
9.3.1 Cox
9.3.2 Parametric survival

10 xtmixed -- Multilevel mixed-effects linear regression

10.1 Syntax
xtmixed y x1 x2 x3 || id: x1, cov(ind)     y and fixed part || id for second level: random part (intercept understood), covariance

10.2 Random effect covariances
independent              one variance parameter per random effect, all covariances zero; default
exchangeable             equal variances for random effects, and one common pairwise covariance
identity                 equal variances for random effects, all covariances zero; the default for factor vars
unstructured             all variances/covariances distinctly estimated

10.3 Predict
xb                       linear predictor for the fixed portion of the model
stdp                     standard error of the fixed-portion linear prediction xb
fitted                   fitted values, linear predictor of the fixed portion plus predicted random effects
residuals                residuals, response minus fitted values
rstandard                standardized residuals
Ex: predict yhat, fitted     predict fixed and random effect into new variable yhat

11 Data reduction

11.1 Factor analysis
factor v1 v2 v3 v4, mineigen(1) factors(5)     minimum eigenvalue 1, max number of factors 5
estat anti               anti-image corr and cov
estat kmo                Kaiser-Meyer-Olkin measure of sampling adequacy: 0.00 to 0.49 unacceptable, 0.50 to 0.59 miserable, 0.60 to 0.69 mediocre, 0.70 to 0.79 middling, 0.80 to 0.89 meritorious, 0.90 to 1.00 marvelous
rotate                   varimax orthogonal
loadingplot              plot 2 factors

12 Programming

12.1 Programs
12.1.1 Program definition
program define name
    args x1 x2 x3
    local m=`x1'+1
    .
end
program drop name        remove old program definition

12.2 Macros
local name "content"         define macro
local name = expression      define macro
`name'                       use local macro
global name = expression     define macro
$name                        use global macro

12.3 Loops
12.3.1 For loop
forvalues i=1(1)10 {
    disp `i'
}

12.3.2 Foreach
foreach lname in any_list {
foreach lname of local lmacname {
foreach lname of global gmacname {
foreach lname of varlist varlist {
foreach lname of newlist newvarlist {
foreach lname of numlist numlist {
Ex: local grains "rice wheat corn rye barley oats"
    foreach x of local grains {
        display "`x'"        commands on separate lines
    }
Ex: foreach x of varlist mpg weight-turn {
        ...
    }
}

12.3.3 While
local i=1
while `i'<5 {
    commands
    local i=`i'+1
}

12.4 Conditions
12.4.1 If
if exp {
    commands
}
else {
    commands                    the else part is optional
}

12.5 Matrix expressions
matrix A=(1,2,3\4,5,6)          define matrix A as 2 by 3
A[.,"col1"] or A[.,1]           first col, "col1" is the column name
A["row1",.] or A[1,.]           first row
A["row i","col j"] or A[i,j]    element i,j
A[2...,1..2]                    submatrix (2-n) by (1-2), may also use names
mat B=J(3,4,0)                  3 by 4 matrix of zeros
mat B[2,2]=1                    change element

12.5.1 Matrix operators
-B                              negate
B'                              transpose
B \ C                           add rows of C below rows of B
B , C                           add columns of C to the right of B
B + C                           add
B - C                           subtract
B * C                           multiply (including mult. by scalar)
B / z                           division by scalar
B # C                           Kronecker product

matrix list A                   show matrix
matrix dir                      list the currently defined matrices
matrix list                     display the contents of a matrix
matrix rename                   rename a matrix
matrix drop                     drop a matrix

13 GLLAMM

13.1 Installation
Run the following Stata command to install gllamm:
ssc install gllamm, replace

13.2 Data format
Use long data format with identifiers at the different levels

13.3 Syntax examples
13.3.1 A two-level random intercept model (logistic)
gllamm y x1 x2, i(level2_id) family(binom) link(logit) nip(8)
                                number of integration points=8

13.3.2 A two-level random intercept and slope model (linear)
gen cons=1
eq interc: cons
eq slope1: x1
gllamm y x1, i(level2_id) nrf(2) eqs(interc slope1)
                                number of random functions=2; 3 random parameters estimated:
                                var(interc), var(slope1) and covar(interc,slope1).
                                Option nocor would set the last to 0

13.3.3 A two-level random intercept model, x1 and x2 categorical
xi: gllamm y i.x1 i.x2, i(level2_id) family(binom) link(logit) nip(8)

13.4 Prediction
13.4.1 Syntax and options
gllapred varname [, xb u linpred]
xb                              fixed effect part of linear prediction
u                               posterior means and std for latent variables
linpred                         linear prediction of both fixed and random parts

14 Survey commands
A family of commands to account for survey design (stratification and clustering)

14.1 Setting stratification, clustering, finite population correction and sample weights
svyset strata varname           stratification
svyset psu varname              clustering (psu=primary sampling unit)
svyset fpc varname              finite population correction
svyset pweight=varname          sample probability weights
Settings remain until cleared
svyset, clear
svyset                          shows current settings

14.2 Means and proportions
svymean varname
    by(variable)
    subpopulation(variable)     subpopulation will select values different from 0 and missing.
                                Do not use if in svy commands
svyprop varname
svyratio varname
svytotal varname

14.3 Tables
svytab x y, row column obs se ci
                                two-way tables

14.4 Regression
svyreg                          linear
svylogit                        logistic
svypois                         Poisson

14.5 Stata web links
Stata programs for generalized linear measurement error models, USA
    Programs by R. J. Carroll, J. Hardin, and H. Schmiediche that fit generalized linear models when one or more covariates are measured with error.
Stata program by Tony Brady, Sealed Envelope Ltd
    Programs for the Hosmer-Lemeshow goodness of fit test, conversion of regression output into near publication quality tables, time utilities to translate strings in 24 hr clock HH:MM format to elapsed times and back again, tabulate longitudinal data at the cluster level, count clusters in longitudinal data, etc.
Stata programs from Dr. Gareth Ambler, University College, UK
    Programs for the Hosmer-Lemeshow test, penalised logistic regression, and generalized additive models, and a postestimation routine.
One great source for user-written software for Stata is the Stata Journal (SJ). There are many other resources available, including the Statalist archive, but we will use the SJ archive for this example. From Stata's toolbar, click on Help > SJ and User-written Programs, or at the command prompt, type help net_mnu.

15 New in Stata 10

15.1 Graph editor

15.2 Exact

15.3 Mixed models
xtmelogit
xtmepoisson

15.4 Survival
sts graph, risktable ci plotopt() ciopt()
stcurve, ---#---

15.5 Power
stpower cox
stpower logrank 0.7 0.8, power(0.8)
                                sample required to detect an increase in survival from 0.7 (untreated) to 0.8 (treated)
                                at the end of the study, with 80% power
stpower logrank 0.7, n(100 250 500) hratio(0.1(0.01)0.9) saving(mypower)

15.6 Saved results
est save filename
est use filename

15.7 Mata

15.8 Diverse
lpoly
mkspline
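
Ex (for 15.6 Saved results): a minimal sketch of saving estimation results to disk and reloading them later. It assumes the auto example dataset that ships with Stata; the file name myfit and the choice of model variables are hypothetical and only for illustration.

```stata
* Sketch only: assumes the auto example dataset shipped with Stata;
* "myfit" is a hypothetical file name.
sysuse auto, clear

* Fit a logistic model and write its results to myfit.ster
logistic foreign mpg weight
estimates save myfit, replace

* Fit something else, so the current results are overwritten
regress price mpg

* Reload the saved logistic results and replay the odds-ratio table
estimates use myfit
logistic
```

Typing the estimation command without arguments replays the reloaded results; est save and est use in 15.6 are abbreviations of estimates save and estimates use.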