Stata.graphics

Computing for Research I
Spring 2012
Stata Graphics
February 16
Primary Instructor:
Elizabeth Garrett-Mayer
Basic syntax for commands
• prefix: command varlist, options
• Examples:
– regress y x, level(90)
– by race: sum y x, detail
– ttest y, by(x) unequal
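The pattern above can be tried on a dataset that ships with Stata. This is only a sketch: the auto data and its variables (price, weight, mpg, foreign) stand in for the generic y, x, and race on the slide. Note that the by prefix needs sorted data, so bysort is used here.

```stata
* Illustrative only: auto.dta stands in for the generic y/x/race above
sysuse auto, clear

* command varlist, options
regress price weight, level(90)

* prefix: command varlist, options
* (bysort sorts by foreign first; plain "by" requires sorted data)
bysort foreign: summarize price mpg, detail

* command varlist, options
ttest mpg, by(foreign) unequal
```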
Stata Graphics
• Maybe we can just end class now!
• Check out these links:
– http://www.ats.ucla.edu/stat/stata/library/GraphExamples/default.htm
– http://www.ats.ucla.edu/stat/stata/topics/graphics.htm
– http://data.princeton.edu/stata/graphics.html
– http://www.stata.com/capabilities/graphics.html
Basic univariate displays
• Boxplots
• Stem and leaf
• Histograms
• Density plots
Ceramide Data
• Let’s look at the ceramide markers
• What are their distributions?
• Are there outliers?
• Should we consider taking logs, or using % change?
Results of a phase II trial of gemcitabine plus doxorubicin in patients with recurrent head and neck cancers: serum C₁₈-ceramide as a novel biomarker for monitoring response. Saddoughi SA, Garrett-Mayer E, Chaudhary U, O'Brien PE, Afrin LB, Day TA, Gillespie MB, Sharma AK, Wilhoit CS, Bostick R, Senkal CE, Hannun YA, Bielawski J, Simon GR, Shirai K, Ogretmen B. Clin Cancer Res. 2011 Sep 15;17(18):6097-105. Epub 2011 Jul 26.
Histogram
• hist c18
[Figure: histogram of C18 ceramide; x-axis: C18 ceramide, y-axis: Density]
Let’s make it prettier
* prettier histograms
hist c18, freq xaxis(1 2) ylabel(0(2)24) xlabel(20 "Twenty" 40 "Forty")
hist c18, title("Histogram of C18 Ceramide") subtitle("PI: K. Shirai")
hist c18, ytitle("number of patients") freq yline(0(10)20)
hist c18, xaxis(1 2) xlabel(19.6 "mean" 11.9 "median", axis(2) grid)
Finding help on these options can sometimes be tricky! E.g., help axis_choice_options
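Rather than typing the mean and median by hand, they can be pulled from summarize and passed to the second x-axis. A minimal sketch, assuming the built-in auto data (mpg stands in for c18; the titles are made up for illustration):

```stata
* Illustrative only: mpg from auto.dta stands in for c18
sysuse auto, clear

* summarize, detail leaves the mean in r(mean) and the median in r(p50)
summarize mpg, detail
local m   = r(mean)
local med = r(p50)

* axis 1 keeps the usual numeric labels; axis 2 marks the mean and median
histogram mpg, freq xaxis(1 2) ///
    xlabel(`m' "mean" `med' "median", axis(2) grid) ///
    title("Histogram of mpg") ytitle("number of cars")
```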
[Figures: four histograms of C18 ceramide: (1) frequency scale with x-axis labels "Twenty" and "Forty"; (2) titled "Histogram of C18 Ceramide" with subtitle "PI: K. Shirai"; (3) frequency scale with horizontal reference lines; (4) second x-axis marking "median" and "mean"]
Boxplots
• graph box c18
[Figure: boxplot of C18 ceramide]
Boxplots
graph box c18, by(cycle)
graph box c18, over(cycle)
tab cycle
graph box c18 if cycle<7, over(cycle)
sort patient cycle
merge m:1 patient using "Ptdata.GemDox.dta"
graph box c18 if cycle<7, over(cycle) over(gender)
graph hbox c18, over(initial) capsize(5)
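The difference between by() and over() is easiest to see side by side. The sketch below assumes the built-in auto data (mpg, rep78, and foreign stand in for c18, cycle, and gender):

```stata
* Illustrative only: auto.dta stands in for the ceramide data
sysuse auto, clear

* by(): one separate panel per group
graph box mpg, by(foreign)

* over(): all groups share a single plot region
graph box mpg, over(foreign)

* two over() options nest the first grouping within the second
graph box mpg if !missing(rep78), over(rep78) over(foreign)

* horizontal boxes, with caps drawn on the whiskers
graph hbox mpg, over(foreign) capsize(5)
```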
[Figures: boxplots of C18 ceramide: separate panels by cycle ("Graphs by Cycle"); over cycle in one plot; over cycle (1, 3, 5) within gender (f, m); horizontal boxplots over initial (CR, PD, PR, SD)]
graph hbox c18, over(initial) capsize(5)
graph hbox c18, over(initial) medtype(marker) medmarker(msymbol(+) msize(large))
graph hbox c18, over(initial) ytitle("C18")
Labels
• Sometimes xlabels cannot be applied (e.g. boxplots)
• Instead, you need to label the values of the grouping variable
• Example: cycle for boxplots
– label define cycle 1 "cycle 1" 3 "cycle 3" 5 "cycle 5" 7 "cycle 7"
– label values cycle cycle
– graph box c18 if cycle<7, over(cycle)
• (Hint: use this on the homework!)
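The same two-step pattern (define a value label, then attach it to the variable) can be tried on the built-in auto data. The label name repair and the label text below are made up for illustration:

```stata
* Illustrative only: rep78 from auto.dta stands in for cycle
sysuse auto, clear

* step 1: define a value label (name and text are hypothetical)
label define repair 1 "rep 1" 2 "rep 2" 3 "rep 3" 4 "rep 4" 5 "rep 5"

* step 2: attach it to the grouping variable
label values rep78 repair

* the boxplot now shows the label text instead of the raw codes
graph box mpg, over(rep78)
```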
Stem and Leaf
. stem c18
Stem-and-leaf plot for c18ceramide (C18 ceramide)
c18ceramide rounded to nearest multiple of .1
plot in units of .1
  0** | 42,43,44,46
  0** | 57,57,67,81,89,90,96,98,99,99
  1** | 01,06,08,08,14,15,19,20,35,44
  1** | 62
  2** | 03,15,16,18,19,19,22
  2** | 82
  3** | 17
  3** |
  4** | 23,49
  4** | 58,68,68
  5** | 37
  5** |
  6** |
  6** | 86
Dotplot
• Excellent way to show data across groups
when you have a relatively small dataset
• dotplot y, over(group)
dotplot c18, over(cycle)
dotplot c18, over(gender)
dotplot c18, over(gender) nogroup
dotplot c18, over(gender) nogroup jitter(3)
dotplot c18, over(gender) nogroup median center
[Figure: "Dotplot, by gender": C18 ceramide (y-axis) by gender (f, m)]
Scatterplots
• Two way graph
• Syntax (the last variable listed goes on the x-axis):
– graph twoway scatter y1 y2 x
– graph twoway scatter y x
• Example:
– graph twoway scatter c18 totalceramide
[Figure: scatterplot of C18 ceramide (y-axis) against total ceramide levels (x-axis)]
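A quick way to check which variable ends up on which axis is to try the syntax on the built-in auto data. This is a sketch only; the variables are chosen for illustration, not because they belong on a shared scale:

```stata
* Illustrative only: auto.dta variables stand in for c18 and totalceramide
sysuse auto, clear

* one y variable: mpg on the y-axis, weight on the x-axis
twoway scatter mpg weight

* several variables: the LAST one is x, all earlier ones are y
twoway scatter length turn weight

* marker labels and hollow circles
twoway scatter mpg weight, mlabel(make) msymbol(Oh)
```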
Regression example
• Scatterplot
• Residual plots
• Leverage
• Fitted line with raw data
Code
graph twoway scatter c18 totalcer
regress c18 totalcer
* residual plot (residual vs. fitted)
rvfplot
* the long way
* 1. generate a new variable from the regression: residuals
predict resid, residuals
* 2. generate a new variable from the regression: fitted values
predict fit
scatter resid fit, yline(0)
* leverage vs. squared normalized residual plot
lvr2plot
* take transform of C18?
gladder c18
boxcox c18
* generate new variable
gen logc18=log(c18)
scatter logc18 totalcer
scatter logc18 totalcer, mlabel(gender)
scatter logc18 totalcer, mlabel(gender) s(i)
scatter logc18 totalcer, s(Oh)
* redo regression
regress logc18 totalcer
rvfplot, yline(0)
lvr2plot
predict logfit
* make plot of fitted model and raw data
scatter logfit logc18 totalcer
scatter logfit logc18 totalcer, s(i o) c(l .)
graph twoway scatter logfit totalcer, s(i) c(l) || scatter logc18 totalcer, s(o) c(.)
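The same workflow can be run end to end on the built-in auto data. This is a sketch under that assumption: price and weight stand in for c18 and totalcer, and the new variable names (fit_hat, resid_hat, logprice, logfit) are made up for illustration.

```stata
* Illustrative only: price/weight from auto.dta stand in for c18/totalcer
sysuse auto, clear
regress price weight

* built-in diagnostics
rvfplot, yline(0)          // residual vs. fitted
lvr2plot                   // leverage vs. squared normalized residual

* the long way: save fitted values and residuals, then plot them
predict double fit_hat                 // fitted values (the default)
predict double resid_hat, residuals
scatter resid_hat fit_hat, yline(0)

* log-transform the outcome, refit, and overlay fit and raw data
gen double logprice = ln(price)
regress logprice weight
predict double logfit
twoway (scatter logprice weight) (line logfit weight, sort)
```

Sorting inside line (the sort option) keeps the fitted line from zig-zagging when the data are not ordered by x.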
The next graph to create
Fancier way to put regression lines
infile str14 country setting effort change ///
using http://data.princeton.edu/wws509/datasets/effort.raw
graph twoway scatter change setting
graph twoway (scatter change setting) (lfit change setting)
graph twoway (scatter change setting) (qfit change setting)
graph twoway (scatter change setting) (lfitci change setting)
• scatter makes a scatterplot of the two variables
• lfit plots the regression line of y on x
• qfit plots a fitted quadratic model of y on x
• lfitci plots the line AND a confidence interval!
[Figures: "Plot using qfit" and "Plot using lfitci": change (y-axis) against setting (x-axis), with fitted values and, for lfitci, the 95% CI]
graph twoway (lfitci change setting) ///
(scatter change setting, mlabel(country) )
[Figure: change against setting with fitted values, 95% CI, and each point labeled by country]
• One slight problem with the labels is the overlap of Costa Rica and Trinidad Tobago (and to a lesser extent Panama and Nicaragua).
• We can solve this problem by specifying the position of the label relative to the marker using a 12-hour clock (so 12 is above, 3 is to the right, 6 is below and 9 is to the left) and the mlabv() option.
• We create a variable to hold the position, set by default to 3 o'clock, and then move Costa Rica to 9 o'clock and Trinidad Tobago to just a bit above that at 11 o'clock (we can also move Nicaragua and Panama up a bit, say to 2 o'clock).
[Figure: the same plot with label positions adjusted for Trinidad Tobago, Costa Rica, Panama, and Nicaragua]
gen pos=3
replace pos = 11 if country == "TrinidadTobago"
replace pos = 9 if country == "CostaRica"
replace pos = 2 if country == "Panama" | country == "Nicaragua"
graph twoway (lfitci change setting) ///
(scatter change setting, mlabel(country) mlabv(pos) )
Legends
[Figures: two versions of "Fertility Decline by Social Setting" (y-axis: Fertility Decline, x-axis: setting) with country labels: one with a legend inside the plot region ("linear fit", "95% CI"), one with the legend turned off]
graph twoway (lfitci change setting) ///
(scatter change setting, mlabel(country) mlabv(pos) ) ///
, title("Fertility Decline by Social Setting") ///
ytitle("Fertility Decline") ///
legend(ring(0) pos(5) order(2 "linear fit" 1 "95% CI"))
graph twoway (lfitci change setting) ///
(scatter change setting, mlabel(country) mlabv(pos) ) ///
, title("Fertility Decline by Social Setting") ///
ytitle("Fertility Decline") ///
legend(off)
Spaghetti plots
Command available from UCLA: spagplot
* spaghetti plots
clear
insheet using "I:\MUSC Oncology\Shirai, Keisuke\October2010\ceramide.csv"
findit spagplot
spagplot c18 cycle, id(patient)
spagplot c18 cycle, id(patient) nofit
* remove patients who only have cycle=1
sort patient cycle
by patient: gen visit=_n
egen maxvis=max(visit), by(patient)
spagplot c18 cycle if maxvis>1, id(patient) nofit
* or, use c(L)
graph twoway scatter c18 cycle if maxvis>1, c(L)
help connectstyle
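The c(L) trick works because connect(L) joins consecutive points only while x is increasing, so with data sorted by patient and cycle the line breaks each time a new patient starts. A minimal sketch on simulated data (all values and variable names here are made up; they are not the ceramide data):

```stata
* Illustrative only: simulated longitudinal data, 5 hypothetical patients
clear
set seed 12345
set obs 5
gen patient = _n
expand 4                                   // 4 rows per patient
bysort patient: gen cycle = 2*_n - 1       // cycles 1, 3, 5, 7
gen y = 10 + 2*patient + rnormal(0, 3)     // made-up marker values

* give patient 5 a single visit, to mimic patients with cycle 1 only
drop if patient == 5 & cycle > 1

* count visits per patient and keep the maximum on every row
bysort patient (cycle): gen visit = _n
egen maxvis = max(visit), by(patient)

* sort so that each patient's cycles are consecutive and ascending;
* connect(L) then starts a new line whenever cycle drops back to 1
sort patient cycle
twoway scatter y cycle if maxvis > 1, connect(L)
```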
other neat stuff
• graph matrix
• saving graphs: click and save in the desired format
• saving and combining (see the Princeton site, section 3.3)
– http://data.princeton.edu/stata/graphics.html
• See GraphExamples on the UCLA site:
– http://www.ats.ucla.edu/stat/stata/library/GraphExamples/
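Graphs can also be saved and combined from the command line instead of by clicking. A sketch, assuming the built-in auto data; the graph names (h_mpg, b_mpg) and file names are made up, and PNG export may not be available in every Stata installation:

```stata
* Illustrative only: auto.dta, hypothetical graph and file names
sysuse auto, clear

* scatterplot matrix (half shows only the lower triangle)
graph matrix price mpg weight, half

* keep two graphs in memory under names of our choosing
histogram mpg, name(h_mpg, replace)
graph box mpg, over(foreign) name(b_mpg, replace)

* put them side by side, then save in Stata's own format
graph combine h_mpg b_mpg, cols(2)
graph save "combined.gph", replace

* export to an image format (assumes PNG is supported on this platform)
graph export "combined.png", replace
```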