Uploaded by Christopher Meixner-Alter

AllCheatSheets Stata v15

advertisement
Data Processing
Cheat Sheet
with Stata 15
For more info see Stata’s reference manual (stata.com)
Basic Syntax
All Stata commands have the same format (syntax):
[by varlist1:]
command
[varlist2]
[=exp]
apply the
command across
each unique
combination of
variables in
varlist1
Useful Shortcuts
keyboard buttons
F2
describe data
Ctrl + 8
open the data editor
clear
delete data in memory
AT COMMAND PROMPT
PgUp
Tab
cls
PgDn
Ctrl + 9
open a new .do file
Ctrl + D
highlight text in .do file,
then ctrl + d executes it
in the command line
scroll through previous commands
autocompletes variable name after typing part
clear the console (where results are displayed)
Set up
pwd
print current (working) directory
cd "C:\Program Files (x86)\Stata13"
change working directory
dir
display filenames in working directory
dir *.dta
List all Stata data in working directory underlined parts
are shortcuts –
capture log close
use "capture"
close the log on any existing do files or "cap"
log using "myDoFile.txt", replace
create a new log file to record your work and results
search mdesc
packages contain
find the package mdesc to install extra commands that
expand Stata’s toolkit
ssc install mdesc
install the package mdesc; needs to be done once
Import Data
sysuse auto, clear
for many examples, we
load system data (Auto data) use the auto dataset.
use "yourStataFile.dta", clear
load a dataset from the current directory frequently used
commands are
import excel "yourSpreadsheet.xlsx", /*
highlighted in yellow
*/ sheet("Sheet1") cellrange(A2:H11) firstrow
import an Excel spreadsheet
import delimited "yourFile.csv", /*
*/ rowrange(2:11) colrange(1:8) varnames(2)
import a .csv file
webuse set "https://github.com/GeoCenter/StataTraining/raw/master/Day2/Data"
webuse "wb_indicators_long"
set web-based directory and load data from the web
function: what are
you going to do
to varlists?
[if exp]
[in range]
[weight]
column to save output as condition: only apply to
apply
a new variable apply the function specific rows
command to
if something is true
bysort rep78 : summarize
price
[using filename]
apply
weights
pull data from a file
(if not loaded)
[,options]
special options
for command
In this example, we want a detailed summary
with stats like kurtosis, plus mean and median
if foreign == 0 & price <= 9000, detail
To find out more about any command – like what options it takes – type help command
Arithmetic
add (numbers)
+ combine (strings)
− subtract
* multiply
/ divide
^ raise to a power
Basic Data Operations
Logic
&
and
! or ~ not
| or
== tests if something is equal
= assigns a value to a variable
== equal
!= not
or
~= equal
if foreign != 1 & price >= 10000
make
Chevy Colt
Buick Riviera
Honda Civic
Volvo 260
foreign
0
0
1
1
price
3,984
10,372
4,499
11,995
< less than
<= less than or equal to
> greater than
>= greater or equal to
if foreign != 1 | price >= 10000
make
Chevy Colt
Buick Riviera
Honda Civic
Volvo 260
foreign
0
0
1
1
price
3,984
10,372
4,499
11,995
Explore Data
VIEW DATA ORGANIZATION
describe make price
display variable type, format,
and any value/variable labels
count
count if price > 5000
number of rows (observations)
Can be combined with logic
ds, has(type string)
lookfor "in."
search for variable types,
variable name, or variable label
isid mpg
check if mpg uniquely
identifies the data
SEE DATA DISTRIBUTION
codebook make price
overview of variable type, stats,
number of missing/unique values
summarize make price mpg
print summary statistics
(mean, stdev, min, max)
for variables
inspect mpg
show histogram of data,
number of missing or zero
observations
histogram mpg, frequency
plot a histogram of the
distribution of a variable
BROWSE OBSERVATIONS WITHIN THE DATA
Missing values are treated as the largest
browse or
Ctrl + 8 positive number. To exclude missing values,
ask whether the value is less than "."
open the data editor
list make price if price > 10000 & !missing(price)
clist ... (compact form)
list the make and price for observations with price > $10,000
display price[4]
display the 4th observation in price; only works on single values
gsort price mpg (ascending)
gsort –price –mpg (descending)
sort in order, first by price then miles per gallon
duplicates report
assert price!=.
finds all duplicate values in each variable
verify truth of claim
levelsof rep78
display the unique values for rep78
Tim Essam (tessam@usaid.gov) • Laura Hughes (lhughes@usaid.gov)
follow us @StataRGIS and @flaneuseks
inspired by RStudio’s awesome Cheat Sheets (rstudio.com/resources/cheatsheets)
Change Data Types
Stata has 6 data types, and data can also be missing:
no data
true/false
numbers
words
missing
string int long float double
byte
To convert between numbers & strings:
gen foreignString = string(foreign)
"1"
tostring foreign, gen(foreignString)
1
"1"
decode foreign , gen(foreignString)
"foreign"
1
gen foreignNumeric = real(foreignString)
"1"
"1"
destring foreignString, gen(foreignNumeric)
encode foreignString, gen(foreignNumeric) "foreign"
recast double mpg
generic way to convert between types
Summarize Data
include missing values create binary variable for every rep78
value in a new variable, repairRecord
tabulate rep78, mi gen(repairRecord)
one-way table: number of rows with each value of rep78
tabulate rep78 foreign, mi
two-way table: cross-tabulate number of observations
for each combination of rep78 and foreign
bysort rep78: tabulate foreign
for each value of rep78, apply the command tabulate foreign
tabstat price weight mpg, by(foreign) stat(mean sd n)
create compact table of summary statistics
displays stats
formats numbers for all data
table foreign, contents(mean price sd price) f(%9.2fc) row
create a flexible table of summary statistics
collapse (mean) price (max) mpg, by(foreign) replaces data
calculate mean price & max mpg by car type (foreign)
Create New Variables
generate mpgSq = mpg^2 gen byte lowPr = price < 4000
create a new variable. Useful also for creating binary
variables based on a condition (generate byte)
generate id = _n
bysort rep78: gen repairIdx = _n
_n creates a running index of observations in a group
generate totRows = _N
bysort rep78: gen repairTot = _N
_N creates a running count of the total observations per group
pctile mpgQuartile = mpg, nq = 4
create quartiles of the mpg data
see help egen
egen meanPrice = mean(price), by(foreign)
calculate mean price for each group in foreign for more options
geocenter.github.io/StataTraining
Disclaimer: we are not affiliated with Stata. But we like it.
updated June 2016
CC BY 4.0
Data Transformation
with Stata 15
Cheat Sheet
For more info see Stata’s reference manual (stata.com)
Select Parts of Data (Subsetting)
SELECT SPECIFIC COLUMNS
drop make
remove the 'make' variable
keep make price
opposite of drop; keep only variables 'make' and 'price'
FILTER SPECIFIC ROWS
drop if mpg < 20
drop in 1/4
drop observations based on a condition (left)
or rows 1-4 (right)
keep in 1/30
opposite of drop; keep only rows 1-30
keep if inrange(price, 5000, 10000)
keep values of price between $5,000 – $10,000 (inclusive)
keep if inlist(make, "Honda Accord", "Honda Civic", "Subaru")
keep the specified values of make
sample 25
sample 25% of the observations in the dataset
(use set seed # command for reproducible sampling)
Replace Parts of Data
CHANGE COLUMN NAMES
rename (rep78 foreign) (repairRecord carType)
rename one or multiple variables
Reshape Data
webuse set https://github.com/GeoCenter/StataTraining/raw/master/Day2/Data
webuse "coffeeMaize.dta"
load demo dataset
MELT DATA (WIDE → LONG)
reshape variables starting
with coffee and maize
reshape long coffee@ maize@, i(country) j(year)
new variable
convert a wide dataset to long
TIDY
DATASETS
WIDE
LONG (TIDY) have each obsercountry
year coffee maize
melt
coffee maize maize
vation in its own
country coffee
2011
2012
2011 2012
Malawi
2011
row and each
Malawi
2012
Malawi
Rwanda
2011
variable in its own
Rwanda
REPLACE MISSING VALUES
useful for cleaning survey datasets
mvdecode _all, mv(9999)
replace the number 9999 with missing value in all variables
mvencode _all, mv(9999)
useful for exporting data
replace missing values with the number 9999 for all variables
Label Data
Value labels map string descriptions to numbers. They allow the
underlying data to be numeric (making logical tests simpler)
while also connecting the values to human-understandable text.
label define myLabel 0 "US" 1 "Not US"
label values foreign myLabel
define a label and apply it the values in foreign
label list
list all labels within the dataset
note: data note here
place note in dataset
Tim Essam (tessam@usaid.gov) • Laura Hughes (lhughes@usaid.gov)
follow us @StataRGIS and @flaneuseks
Rwanda
Uganda
Uganda
cast
Uganda
CAST DATA (LONG → WIDE)
2012
2011
2012
what will be create new variables
unique id with the year added
variable (key) to the column name
create new variables named
coffee2011, maize2012...
reshape wide coffee maize, i(country) j(year)
convert a long dataset to wide
When datasets are
tidy, they have a
consistent,
standard format
that is easier to
manipulate and
analyze.
xpose, clear varname
transpose rows and columns of data, clearing the data and saving
old column names as a new variable called "_varname"
blue
id
id
+
blue
webuse coffeeMaize2.dta, clear
save coffeeMaize2.dta, replace load demo data
webuse coffeeMaize.dta, clear
pink
pink
blue
pink
should
contain
the same
variables
(columns)
append using "coffeeMaize2.dta", gen(filenum)
add observations from "coffeeMaize2.dta" to
current data and create variable "filenum" to
track the origin of each observation
MERGING TWO DATASETS TOGETHER
must contain a
common variable
id blue pink (id) id brown
+
ONE-TO-ONE
=
id
blue
pink brown _merge
3
3
3
MAid
blue
pink
id brown
+
=
_merge code
1 row only
(master) in ind2
2 row only
(using) in hh2
3 row in
(match) both
id
blue
pink brown _merge
3
.
3
1
3
.
.
.
3
1
2
webuse ind_age.dta, clear
save ind_age.dta, replace
webuse ind_ag.dta, clear
merge 1:1 id using "ind_age.dta"
one-to-one merge of "ind_age.dta"
into the loaded dataset and create
variable "_merge" to track the origin
webuse hh2.dta, clear
save hh2.dta, replace
webuse ind2.dta, clear
merge m:1 hid using "hh2.dta"
many-to-one merge of "hh2.dta"
into the loaded dataset and create
variable "_merge" to track the origin
FUZZY MATCHING: COMBINING TWO DATASETS WITHOUT A COMMON ID
reclink match records from different data sets using probabilistic matching ssc install reclink
jarowinkler create distance measure for similarity between two strings ssc install jarowinkler
inspired by RStudio’s awesome Cheat Sheets (rstudio.com/resources/cheatsheets)
FIND MATCHING STRINGS
display strmatch("123.89", "1??.?9")
return true (1) or false (0) if string matches pattern
display substr("Stata", 3, 5)
return string of 5 characters starting with position 3
list make if regexm(make, "[0-9]")
list observations where make matches the regular
expression (here, records that contain a number)
list if regexm(make, "(Cad.|Chev.|Datsun)")
return all observations where make contains
"Cad.", "Chev." or "Datsun"
compare the given list against the first word in make
TRANSFORM STRINGS
ADDING (APPENDING) NEW DATA
id
GET STRING PROPERTIES
display length("This string has 29 characters")
return the length of the string
* user-defined package
charlist make
display the set of unique characters within a string
display strpos("Stata", "a")
return the position in Stata where a is first found
list if inlist(word(make, 1), "Cad.", "Chev.", "Datsun")
return all observations where the first word of the
make variable contains the listed words
Combine Data
CHANGE ROW VALUES
replace price = 5000 if price < 5000
replace all values of price that are less than $5,000 with 5000
recode price (0 / 5000 = 5000)
change all prices less than 5000 to be $5,000
recode foreign (0 = 2 "US")(1 = 1 "Not US"), gen(foreign2)
change the values and value labels then store in a new
variable, foreign2
unique id create new variable which captures
variable (key) the info in the column names
Manipulate Strings
display regexr("My string", "My", "Your")
replace string1 ("My") with string2 ("Your")
replace make = subinstr(make, "Cad.", "Cadillac", 1)
replace first occurrence of "Cad." with Cadillac
in the make variable
display stritrim(" Too much Space")
replace consecutive spaces with a single space
display trim(" leading / trailing spaces ")
remove extra spaces before and after a string
display strlower("STATA should not be ALL-CAPS")
change string case; see also strupper, strproper
display strtoname("1Var name")
convert string to Stata-compatible variable name
display real("100")
convert string to a numeric or missing value
Save & Export Data
compress
compress data in memory
Stata 12-compatible file
save "myData.dta", replace
saveold "myData.dta", replace version(12)
save data in Stata format, replacing the data if
a file with same name exists
export excel "myData.xls", /*
*/ firstrow(variables) replace
export data as an Excel file (.xls) with the
variable names as the first row
export delimited "myData.csv", delimiter(",") replace
export data as a comma-delimited file (.csv)
geocenter.github.io/StataTraining
Disclaimer: we are not affiliated with Stata. But we like it.
updated June 2016
CC BY 4.0
Data Visualization
Cheat Sheet
with Stata 15
graph <plot type>
titles
y2
y3
kdensity mpg, bwidth(3)
main plot-specific options;
see help for complete set
b
c
graph bar (count), over(foreign, gap(*0.5)) intensity(*0.5)
graph hbar draws horizontal bar charts
23
20
graph hbar ...
(asis) • (percent) • (count) • over(<variable>, <options: gap(*#) •
relabel • descending • reverse>) • cw •missing • nofill • allcategories •
percentages • stack • bargap(#) • intensity(*#) • yalternate • xalternate
DISCRETE X, CONTINUOUS Y
graph bar (median) price, over(foreign)
graph hbar ...
graph dot (mean) length headroom, over(foreign) m(1, ms(S))
dot plot (asis) • (percent) • (count) • (stat: mean median sum min max ...)
over(<variable>, <options: gap(*#) • relabel • descending • reverse
sort(<variable>)>) • cw • missing • nofill • allcategories • percentages
linegap(#) • marker(#, <options>) • linetype(dot | line | rectangle)
dots(<options>) • lines(<options>) • rectangles(<options>) • rwidth
graph hbox mpg, over(rep78, descending) by(foreign) missing
graph box draws vertical boxplots
box plot
over(<variable>, <options: total • gap(*#) • relabel • descending • reverse
sort(<variable>)>) • missing • allcategories • intensity(*#) • boxgap(#)
medtype(line | line | marker) • medline(<options>) • medmarker(<options>)
ssc install vioplot
violin plot over(<variable>, <options: total • missing>) • nofill •
vertical • horizontal • obs • kernel(<options>) • bwidth(#) •
barwidth(#) • dscale(#) • ygap(#) • ogap(#) • density(<options>)
bar(<options>) • median(<options>) • obsopts(<options>)
Plot Placement
twoway scatter mpg price, by(foreign, norescale)
SUPERIMPOSE
total • missing • colfirst • rows(#) • cols(#) • holes(<numlist>)
compact • [no]edgelabel • [no]rescale • [no]yrescal • [no]xrescale
[no]iyaxes • [no]ixaxes • [no]iytick • [no]ixtick • [no]iylabel
[no]ixlabel • [no]iytitle • [no]ixtitle • imargin(<options>)
graph twoway scatter mpg price in 27/74 || scatter mpg price /*
*/ if mpg < 15 & price > 12000 in 27/74, mlabel(make) m(i)
combine twoway plots using ||
Laura Hughes (lhughes@usaid.gov) • Tim Essam (tessam@usaid.gov)
follow us @flaneuseks and @StataRGIS
2
17
10
twoway scatter mpg weight, mlabel(mpg)
vertical • horizontal • headlabel
THREE VARIABLES
twoway contour mpg price weight, level(20) crule(intensity)
scatter plot with labelled values
3D contour plot
ccuts(#s) • levels(#) • minmax • crule(hue | chue | intensity | linear) •
scolor(<color>) • ecolor (<color>) • ccolors(<colorlist>) • heatmap
interp(thinplatespline | shepard | none)
jitter(#) • jitterseed(#) • sort • cmissing(yes | no)
connect(<options>) • [aweight(<variable>)]
twoway connected mpg price, sort(price)
regress price mpg trunk weight length turn, nocons
matrix regmat = e(V)
ssc install plotmatrix
plotmatrix, mat(regmat) color(green)
line
twoway area mpg price, sort(price)
line plot with area shading
heatmap
mat(<variable) • split(<options>) • color(<color>) • freq
SUMMARY PLOTS
twoway mband mpg weight || scatter mpg weight
sort • cmissing(yes | no) • vertical, • horizontal
base(#)
plot median of the y values
bands(#)
twoway bar price rep78
ssc install binscatter
binscatter weight mpg, line(none)
bar plot
plot a single value (mean or median) for each x value
vertical, • horizontal • base(#) • barwidth(#)
twoway dot mpg rep78
dot plot
vertical, • horizontal • base(#) • ndots(#)
dcolor(<color>) • dfcolor(<color>) • dlcolor(<color>)
dsize(<markersize>) • dsymbol(<marker type>)
dlwidth(<strokesize>) • dotextend(yes | no)
medians • nquantiles(#) • discrete • controls(<variables>) •
linetype(lfit | qfit | connect | none) • aweight[<variable>]
FITTING RESULTS
twoway lfitci mpg weight || scatter mpg weight
calculate and plot linear fit to data with confidence intervals
level(#) • stdp • stdf • nofit • fitplot(<plottype>) • ciplot(<plottype>) •
range(# #) • n(#) • atobs • estopts(<options>) • predopts(<options>)
twoway lowess mpg weight || scatter mpg weight
twoway dropline mpg price in 1/5
dropped line plot
calculate and plot lowess smoothing
twoway rcapsym length headroom price
calculate and plot quadriatic fit to data with confidence intervals
bwidth(#) • mean • noweight • logit • adjust
vertical, • horizontal • base(#)
twoway qfitci mpg weight, alwidth(none) || scatter mpg weight
range plot (y1 ÷ y2) with capped lines
vertical • horizontal
see also rcap
level(#) • stdp • stdf • nofit • fitplot(<plottype>) • ciplot(<plottype>) •
range(# #) • n(#) • atobs • estopts(<options>) • predopts(<options>)
REGRESSION RESULTS
twoway rarea length headroom price, sort
vertical • horizontal • sort
cmissing(yes | no)
combine 2+ saved graphs into a single plot
plot several y values for a single x value
twoway pccapsym wage68 ttl_exp68 wage88 ttl_exp88
Slope/bump plot
(sysuse nlswide1)
range plot (y1 ÷ y2) with area shading
graph combine plot1.gph plot2.gph...
scatter y3 y2 y1 x, msymbol(i o i) mlabel(var3 var2 var1)
twoway scatter mpg weight, jitter(7)
vertical, • horizontal
half • jitter(#) • jitterseed(#)
diagonal • [aweights(<variable>)]
jitter(#) • jitterseed(#) • sort
see also
connect(<options>) • cmissing(yes | no)
over(<variable>, <options: gap(*#) • relabel • descending • reverse
sort(<variable>)>) • cw • missing • nofill • allcategories • percentages
stack • bargap(#) • intensity(*#) • yalternate • xalternate
JUXTAPOSE (FACET)
twoway pcspike wage68 ttl_exp68 wage88 ttl_exp88
(sysuse nlswide1)
Parallel coordinates plot
scatter plot with connected lines and symbols
bar plot (asis) • (percent) • (count) • (stat: mean median sum min max ...)
vioplot price, over(foreign)
save
scatter plot of each combination of variables
jitter(#) • jitterseed(#) • sort • cmissing(yes | no)
connect(<options>) • [aweight(<variable>)]
(asis) • (percent) • (count) • over(<variable>, <options: gap(*#) •
relabel • descending • reverse>) • cw •missing • nofill • allcategories •
percentages • stack • bargap(#) • intensity(*#) • yalternate • xalternate
a
annotations
by(var) xline(xint) yline(yint) text(y x "annotation")
plot size
scatter plot
bar plot
grouped bar plot
facet
axes
graph matrix mpg price weight, half
y1
bin(#) • width(#) • density • fraction • frequency • percent • addlabels
addlabopts(<options>) • normal • normopts(<options>) • kdensity
kdenopts(<options>)
graph bar (percent), over(rep78) over(foreign)
[if ], <plot options>
TWO+ CONTINUOUS VARIABLES
histogram
DISCRETE
plot-specific options
<marker, line, text, axis, legend, background options> scheme(s1mono) play(customTheme) xsize(5) ysize(4) saving("myPlot.gph", replace)
histogram mpg, width(5) freq kdensity kdenopts(bwidth(5))
bwidth • kernel(<options>
normal • normopts(<line options>)
[in]
custom appearance
sysuse auto, clear
smoothed histogram
y1 y2 … yn x
title("title") subtitle("subtitle") xtitle("x-axis title") ytitle("y axis title") xscale(range(low high) log reverse off noline) yscale(<options>)
For more info see Stata’s reference manual (stata.com)
ONE VARIABLE
CONTINUOUS
variables: y first
BASIC PLOT SYNTAX:
twoway rbar length headroom price
range plot (y1 ÷ y2) with bars
vertical • horizontal • barwidth(#) • mwidth
msize(<marker size>)
inspired by RStudio’s awesome Cheat Sheets (rstudio.com/resources/cheatsheets)
regress price mpg headroom trunk length turn
ssc install coefplot
coefplot, drop(_cons) xline(0)
Plot regression coefficients
baselevels • b(<options>) • at(<options>) • noci • levels(#)
keep(<variables>) • drop(<variables>) • rename(<list>)
horizontal • vertical • generate(<variable>)
regress mpg weight length turn
margins, eyex(weight) at(weight = (1800(200)4800))
marginsplot, noci
Plot marginal effects of regression
horizontal • noci
geocenter.github.io/StataTraining
updated February 2016
Disclaimer: we are not affiliated with Stata. But we like it.
CC BY 4.0
Plotting in Stata 15
Apply Themes
ANATOMY OF A PLOT
Customizing Appearance
title
For more info see Stata’s reference manual (stata.com)
200
annotation
y-axis
plots contain many features
y-axis title
150
graph region
inner graph region
inner plot region
9
6
2
50
y-axis labels
plot region
10
marker
grid lines
20
40
x-axis
specify the fill of the background in RGB or with a Stata color
60
80
x-axis title
SYNTAX
COLOR
for example:
scatter price mpg, xline(20, lwidth(vthick))
mcolor("145 168 208")
mcolor(none)
mfcolor("145 168 208")
mfcolor(none)
specify the fill and stroke of the marker
in RGB or with a Stata color
specify the fill of the marker
LINES / BORDERS
line
specify the marker size:
SIZE / THICKNESSS
ehuge
vhuge
huge
vlarge
large
POSITION
APPEARANCE
msymbol(Dh)
medlarge
color("145 168 208")
D
T
S
o
d
t
s
Oh
Dh
Th
Sh
oh
dh
th
sh
+
X
p
jitter(#)
randomly displace the markers
none i
jitterseed(#)
set seed
tlcolor("145 168 208")
marker
tick marks
grid lines
vvvthick
vvthick
vthick
thick
medthick
medium
line
axes
grid lines
solid
dash
dot
lpattern(dash)
glpattern(dash)
Create custom themes by
help scheme entries
saving options in a .scheme file
see all options for setting scheme properties
adopath ++ "~/<location>/StataThemes"
set path of the folder (StataThemes) where custom
.scheme files are saved
set as default scheme
title(...)
subtitle(...)
xtitle(...)
ytitle(...)
annotation
text(...)
axis labels
xlabel(...)
ylabel(...)
legend
legend(...)
install William Buchanan’s package to generate custom
schemes and color palettes (including ColorBrewer)
USING THE GRAPH EDITOR
twoway scatter mpg price, play(graphEditorTheme)
color(none)
specify the color of the text
marker label mlabcolor("145 168 208")
Select the
Graph Editor
labcolor("145 168 208")
axis labels
mcolor("145 168 208 %20")
mlwidth(thin)
tlwidth(thin)
glwidth(thin)
medthin
thin
vthin
vvthin
vvvthin
none
specify the
line pattern
longdash
longdash_dot
shortdash
shortdash_dot
dash_dot
blank
axes off no axis/labels
noline
tick marks noticks
tick marks tlength(2)
grid lines nogrid nogmin nogmax
axes
tick marks xlabel(#10, tposition(crossing))
number of tick marks, position (outside | crossing | inside)
Laura Hughes (lhughes@usaid.gov) • Tim Essam (tessam@usaid.gov)
follow us @flaneuseks and @StataRGIS
twoway scatter mpg price, scheme(customTheme)
adjust transparency by adding %#
glcolor("145 168 208")
specify the thickness
(stroke) of a line:
medsmall
small
vsmall
tiny
vtiny
O
lcolor(none)
lwidth(medthick)
medium
specify the marker symbol:
titles
marker label
lcolor("145 168 208")
specify the stroke color of the line or border
marker mlcolor("145 168 208")
USING A SAVED THEME
net inst brewscheme, from("https://wbuchanan.github.io/brewscheme/") replace
<marker
options>
grid lines
msize(medium)
tick marks
axes
marker
Schemes are sets of graphical parameters, so you don’t
have to specify the look of the graphs every time.
set scheme customTheme, permanently
change the theme
TEXT
<line options> <marker xscale(...) grid lines
options> yscale(...)
xline(...)
xlabel(...)
yline(...)
legend
ylabel(...)
legend(region(...))
tick marks
100
y2
Fitted values
legend
specify the fill of the plot background in RGB or with a Stata color
marker
tick marks
0
0
scatter price mpg, plotregion(fcolor("224 224 224") ifcolor("240 240 240"))
<marker
options>
5
3
outer region
inner region
scatter price mpg, graphregion(fcolor("192 192 192") ifcolor("208 208 208"))
arguments for the plot
objects (in green) go in the
options portion of these
commands (in orange)
line
7
y-line
SYMBOLS
marker label
1
8
4
100
y-axis title
titles
subtitle
specify the size of the text:
size(medsmall)
marker label mlabsize(medsmall)
axis labels labsize(medsmall)
Text
Text
Text
Text
Text
Text
Click
Record
vhuge
Text medsmall
Text small
huge
Text
vsmall
Text
tiny
vlarge
Text
half_tiny
large
Text
third_tiny
medlarge Text quarter_tiny
Text
medium
minuscule
Double click on
symbols and areas
on plot, or regions
on sidebar to
customize
Unclick
Record
marker label mlabel(foreign)
label the points with the values
of the foreign variable
axis labels
nolabels
axis labels
format(%12.2f )
no axis labels
change the format of the axis labels
legend
off
legend
label(# "label")
turn off legend
change legend label text
marker label mlabposition(5)
label location relative to marker (clock position: 0 – 12)
inspired by RStudio’s awesome Cheat Sheets (rstudio.com/resources/cheatsheets)
Save theme
as a .grec file
Save Plots
graph twoway scatter y x, saving("myPlot.gph") replace
save the graph when drawing
graph save "myPlot.gph", replace
save current graph to disk
graph combine plot1.gph plot2.gph...
combine 2+ saved graphs into a single plot
graph export "myPlot.pdf", as(.pdf )
see options to set
export the current graph as an image file size and resolution
geocenter.github.io/StataTraining
Disclaimer: we are not affiliated with Stata. But we like it.
updated June 2016
CC BY 4.0
Examples use auto.dta (sysuse auto, clear)
unless otherwise noted
univar price mpg, boxplot
ssc install univar
calculate univariate summary, with box-and-whiskers plot
stem mpg
return stem-and-leaf display of mpg
frequently used commands are
summarize price mpg, detail
highlighted in yellow
calculate a variety of univariate summary statistics
for Stata 13: ci mpg price, level (99)
ci mean mpg price, level(99)
r compute standard errors and confidence intervals
correlate mpg price
return correlation or covariance matrix
pwcorr price mpg weight, star(0.05)
return all pairwise correlation coefficients with sig. levels
mean price mpg
estimates of means, including standard errors
proportion rep78 foreign
estimates of proportions, including standard errors for
e
categories identified in varlist
ratio
estimates of ratio, including standard errors
total price
estimates of totals, including standard errors
Statistical Tests
tabulate foreign rep78, chi2 exact expected
tabulate foreign and repair record and return chi2
and Fisher’s exact statistic alongside the expected values
ttest mpg, by(foreign)
estimate t test on equality of means for mpg by foreign
r prtest foreign == 0.5
one-sample test of proportions
ksmirnov mpg, by(foreign) exact
Kolmogorov-Smirnov equality-of-distributions test
ranksum mpg, by(foreign)
equality tests on unmatched data (independent samples)
webuse systolic, clear
anova systolic drug
analysis of variance and covariance
e pwmean mpg, over(rep78) pveffects mcompare(tukey)
estimate pairwise comparisons of means with equal
variances include multiple comparison adjustment
TIME SERIES OPERATORS
L.
F.
D.
S.
lag x t-1
lead x t+1
difference x t-x t-1
seasonal difference x t-xt-1
USEFUL ADD-INS
tscollap
carryforward
tsspell
measure something
CATEGORICAL VARIABLES
identify a group to which
an observations belongs
INDICATOR VARIABLES
denote whether
T F
something is true or false
L2.
F2.
D2.
S2.
1850
1900
tsline plot
0
4
webuse drugtr, clear
regress price mpg weight, vce(robust)
estimate ordinary least squares (OLS) model
on mpg weight and foreign, apply robust standard errors
regress price mpg weight if foreign == 0, vce(cluster rep78)
regress price only on domestic cars, cluster standard errors
rreg price mpg weight, genwt(reg_wt)
estimate robust regression to eliminate outliers
probit foreign turn price, vce(robust)
ADDITIONAL MODELS
estimate probit regression with
built-in Stata principal components analysis
pca
command
robust standard errors
factor analysis
factor
poisson • nbreg
count outcomes
logit foreign headroom mpg, or
censored data
tobit
estimate logistic regression and
instrumental variables
ivregress ivreg2
report odds ratios
difference-in-difference
diff
user-written
bootstrap, reps(100): regress mpg /*
ssc
install
ivreg2
regression discontinuity
rd
*/ weight gear foreign
xtabond xtdpdsys dynamic panel estimator
estimate regression with bootstrapping teffects psmatch
propensity score matching
jackknife r(mean), double: sum mpg
synthetic control analysis
synth
Blinder-Oaxaca decomposition
jackknife standard error of sample mean oaxaca
more details at http://www.stata.com/manuals/u25.pdf
DESCRIPTION
specify indicators
specify base indicator
command to change base
treat variable as continuous
o.
#
omit a variable or indicator
specify interactions
regress price io(2).rep78
regress price mpg c.mpg#c.mpg
specify rep78 variable to be an indicator variable
set the third category of rep78 to be the base category
set the base to most frequently occurring category for rep78
treat mpg as a continuous variable and
specify an interaction between foreign and mpg
set rep78 as an indicator; omit observations with rep78 == 2
create a squared mpg term to be used in regression
##
specify factorial interactions
regress price c.mpg##c.mpg
create all possible interactions with mpg (mpg and mpg2)
EXAMPLE
regress price i.rep78
regress price ib(3).rep78
fvset base frequent rep78
regress price i.foreign#c.mpg i.foreign
id 4
1970
1980
1990
webuse nhanes2b, clear
svyset psuid [pweight = finalwgt], strata(stratid)
declare survey design for a dataset
r
svydescribe
report survey data details
svy: mean age, over(sex)
estimate a population mean for each subpopulation
svy, subpop(rural): mean age
estimate a population mean for rural areas
e
svy: tabulate sex heartatk
report two-way table with tests of independence
svy: reg zinc c.age##c.age female weight rural
estimate a regression using survey weights
stores results as e -class
OPERATOR
i.
ib.
fvset
c.
Tim Essam (tessam@usaid.gov) • Laura Hughes (lhughes@usaid.gov)
follow us @StataRGIS and @flaneuseks
SURVEY DATA
stset studytime, failure(died)
r declare survey design for a dataset
stsum
summarize survival-time data
stcox drug age
e
estimate a Cox proportional hazard model
1 Estimate Models
id 3
0
compact time series into means, sums and end-of-period values
carry non-missing values forward from one obs. to the next
identify spells or runs in time series
SURVIVAL ANALYSIS
id 2
2
1950
2-period lag x t-2
2-period lead x t+2
difference of difference xt-xt−1-(xt−1-xt−2)
lag-2 (seasonal difference) xt−xt−2
id 1
2
100
0
webuse nlswork, clear
xtset id year
declare national longitudinal data to be a panel
xtdescribe
xtline plot
report panel aspects of a dataset
wage relative to inflation
r
xtsum hours
summarize hours worked, decomposing
standard deviation into between and
within components
xtline ln_wage if id <= 22, tlabel(#3)
plot panel data as a line plot
xtreg
ln_w c.age##c.age ttl_exp, fe vce(robust)
e
estimate a fixed-effects model with robust standard errors
4
200
Estimation with Categorical & Factor Variables
CONTINUOUS VARIABLES
PANEL / LONGITUDINAL
inspired by RStudio’s awesome Cheat Sheets (rstudio.com/resources/cheatsheets)
2 Diagnostics
some are inappropriate with robust SEs
estat hettest test for heteroskedasticity
r
ovtest test for omitted variable bias
vif
report variance inflation factor
dfbeta(length)
Type help regress postestimation plots
calculate measure of influence for additional diagnostic plots
rvfplot, yline(0)
avplots
plot residuals
plot all partialmpg
rep78
against fitted
regression leverage
values
plots in one graph
Fitted values
weight
headroom
3 Postestimation
price
Summarize Data
webuse sunspot, clear
tsset time, yearly
declare sunspot data to be yearly time series
tsreport
r
report time series aspects of a dataset
generate lag_spot = L1.spot
create a new variable of annual lags of sun spots
Number of sunspots
tsline spot
plot time series of sunspots
e
arima spot, ar(1/2)
estimate an auto-regressive model with 2 lags
price
Results are stored as either r -class or e -class. See Programming Cheat Sheet
TIME SERIES
price
For more info see Stata’s reference manual (stata.com)
By declaring data type, you enable Stata to apply data munging and analysis functions specific to certain data types
price
Cheat Sheet
with Stata 15
Declare Data
Residuals
Data Analysis
commands that use a fitted model
regress price headroom length Used in all postestimation examples
display _b[length]
display _se[length]
return coefficient estimate or standard error for mpg
from most recent regression model
margins, dydx(length) returns e-class information when post option is used
return the estimated marginal effect for mpg
r
margins, eyex(length)
return the estimated elasticity for price
predict yhat if e(sample)
create predictions for sample on which model was fit
predict double resid, residuals
calculate residuals based on last fit model
test headroom = 0
r test linear hypotheses that headroom estimate equals zero
lincom headroom - length
test linear combination of estimates (headroom = length)
geocenter.github.io/StataTraining
Disclaimer: we are not affiliated with Stata. But we like it.
updated June 2016
CC BY 4.0
Programming
with Stata 15
Cheat Sheet
For more info see Stata’s reference manual (stata.com)
1 Scalars
both r- and e-class results contain scalars
scalar x1 = 3
create a scalar x1 storing the number 3
scalar a1 = “I am a string scalar”
create a scalar a1 storing a string
2 Matrices
Scalars can hold
numeric values or
arbitrarily long strings
DISPLAYING & DELETING BUILDING BLOCKS
[scalar | matrix | macro | estimates] [list | drop] b
list contents of object b or drop (delete) object b
[scalar | matrix | macro | estimates] dir
list all defined objects for that class
matrix list b
matrix dir
scalar drop x1
list contents of matrix b list all matrices delete scalar x1
GLOBALS
public or private variables storing text
available through Stata sessions
LOCALS
R- AND E-CLASS: Stata stores calculation results in two* main classes:
r
return results from general commands
such as summarize or tabulate
e
return results from estimation
commands such as regress or mean
To assign values to individual variables use:
r individual numbers or strings
e rectangular array of quantities or expressions
e pointers that store text (global or local)
1 SCALARS
2 MATRICES
3 MACROS
Loops: Automate Repetitive Tasks
ANATOMY OF A LOOP
objects to repeat over
temporary variable used
only within the loop
requires local macro notation
* there’s also s- and n-class
PUBLIC
available only in programs, loops, or .do files PRIVATE
local myLocal price mpg length
create local variable called myLocal with the
strings price mpg and length
summarize `myLocal' add a ` before and a ' after local macro name to call
summarize contents of local myLocal
levelsof rep78, local(levels)
create a sorted list of distinct values of rep78,
store results in a local macro called levels
local varLab: variable label foreign can also do with value labels
store the variable label for foreign in the local varLab
mean price
e ereturn list
returns list of scalars, macros,
matrices and functions
summarize price, detail
r return list
returns a list of scalars
scalars:
r(N)
r(mean)
r(Var)
r(sd)
=
=
=
=
74
6165.25...
86995225.97...
2949.49...
...
Results are replaced
each time an r-class
/ e-class command
is called
scalars:
e(df_r)
e(N_over)
e(N)
e(k_eq)
e(rank)
=
=
=
=
=
73
1
73
1
1
generate meanN = e(N)
create a new variable equal to
obs. in estimation command
preserve create a temporary copy of active dataframe set restore points
restore restore temporary copy to point last preserved to test code that
generate p_mean = r(mean)
create a new variable equal to
average of price
ACCESSING ESTIMATION RESULTS
changes data
After you run any estimation command, the results of the estimates are
stored in a structure that you can save, view, compare, and export
regress price weight
Use estimates store
estimates store est1
to compile results
store previous estimation results est1 in memory for later use
ssc install estout
eststo est2: regress price weight mpg
eststo est3: regress price weight mpg foreign
estimate two regression models and store estimation results
estimates table est1 est2 est3
print a table of the two estimation results est1 and est2
EXPORTING RESULTS
The estout and outreg2 packages provide numerous, flexible options for making tables
after estimation commands. See also putexcel and putdocx commands.
esttab est1 est2, se star(* 0.10 ** 0.05 *** 0.01) label
create summary table with standard errors and labels
esttab using “auto_reg.txt”, replace plain se
export summary table to a text file, include standard errors
outreg2 [est1 est2] using “auto_reg2.txt”, see replace
export summary table to a text file using outreg2 syntax
see also while
Stata has three options for repeating commands over lists or values:
foreach, forvalues, and while. Though each has a different first line,
the syntax is consistent:
foreach x of varlist var1 var2 var3 {
Many Stata commands store results in types of lists. To access these, use return or
ereturn commands. Stored results can be scalars, macros, matrices or functions.
matrix ad2 = a , d
matrix ad1 = a \ d
row bind matrices
column bind matrices
matselrc b x, c(1 3) findit matselrc
select columns 1 & 3 of matrix b & store in new matrix x
mat2txt, matrix(ad1) saving(textfile.txt) replace
export a matrix to a text file
ssc install mat2txt
global pathdata "C:/Users/SantasLittleHelper/Stata"
define a global variable called pathdata
cd $pathdata add a $ before calling a global macro
change working directory by calling global macro
global myGlobal price mpg length
summarize $myGlobal
summarize price mpg length using global
basic components of programming
4 Access & Save Stored r- and e-class Objects
e-class results are stored as matrices
matrix a = (4\ 5\ 6)
matrix b = (7, 8, 9)
create a 3 x 1 matrix
create a 1 x 3 matrix
matrix d = b' transpose matrix b; store in d
3 Macros
Building Blocks
command `x', option
...
}
open brace must
appear on first line
command(s) you want to repeat
can be one line or many
close brace must appear
on final line by itself
FOREACH: REPEAT COMMANDS OVER STRINGS, LISTS, OR VARIABLES
foreach x in|of [ local, global, varlist, newlist, numlist ] {
list types: objects over which the
Stata commands referring to `x'
commands will be repeated
}
loops repeat the same command
STRINGS
over different arguments:
sysuse "auto.dta", clear
foreach x in auto.dta auto2.dta {
same as...
tab rep78, missing
sysuse "`x'", clear
tab rep78, missing
sysuse "auto2.dta", clear
}
tab rep78, missing
LISTS
foreach x in "Dr. Nick" "Dr. Hibbert" {
display length("Dr. Nick")
display length ( "` x '" )
display length("Dr. Hibbert")
}
When calling a command that takes a string,
surround the macro name with quotes.
VARIABLES
foreach x in mpg weight {
summarize `x'
}
must define list type
foreach x of varlist mpg weight {
summarize `x'
}
• foreach in takes any list
as an argument with
elements separated by
spaces
• foreach of requires you
to state the list type,
which makes it faster
summarize mpg
summarize weight
FORVALUES: REPEAT COMMANDS OVER LISTS OF NUMBERS
iterator
forvalues i = 10(10)50 {
display `i'
numeric values over
which loop will run
}
DEBUGGING CODE
Use display command to
show the iterator value at
each step in the loop
ITERATORS
display 10
display 20
...
i = 10/50
10, 11, 12, ...
i = 10(10)50
10, 20, 30, ...
i = 10 20 to 50 10, 20, 30, ...
see also capture and scalar _rc
set trace on (off )
trace the execution of programs for error checking
PUTTING IT ALL TOGETHER
sysuse auto, clear
pull out the first word
generate car_make = word(make, 1)
from the make variable
calculate unique groups of
levelsof car_make, local(cmake)
define the
car_make and store in local cmake
local i to be
local i = 1
an iterator
Additional Programming Resources
store the length of local
local cmake_len : word count `cmake'
cmake in local cmake_len
bit.ly/statacode
foreach x of local cmake {
download all examples from this cheat sheet in a .do file
display in yellow "Make group `i' is `x'"
TEMPVARS & TEMPFILES special locals for loops/programs
ssc install adolist
adoupdate
adolist
if `i' == `cmake_len' {
initialize
a
new
temporary
variable
called
temp1
tempvar temp1
Update user-written .ado files
List/copy user-written .ado files
tests the position of the
display "The total number of groups is `i'"
save squared mpg values in temp1
generate `temp1' = mpg^2
iterator,
executes
contents
net install package, from (https://raw.githubusercontent.com/username/repo/master) in brackets when the
summarize the temporary variable temp1
}
summarize `temp1'
install a package from a Github repository
condition is true
local
i
=
`++i'
increment iterator by one
tempfile myAuto create a temporary file to see also
https://github.com/andrewheiss/SublimeStataEnhanced
}
save `myAuto'
be used within a program tempname
configure Sublime text for Stata 11-14
Tim Essam (tessam@usaid.gov) • Laura Hughes (lhughes@usaid.gov)
follow us @StataRGIS and @flaneuseks
inspired by RStudio’s awesome Cheat Sheets (rstudio.com/resources/cheatsheets)
geocenter.github.io/StataTraining
Disclaimer: we are not affiliated with Stata. But we like it.
updated June 2016
CC BY 4.0
Download