ggplot2 - TAU R Workshop 2015

ggplot2
A cool way to create plots in R
Maria Novosolov
Your scientific graphic
options
Why not only Excel?
• Excel is relatively limited in its support of
scientific graphing
• Its options provide limited control over
the output
• Limited selection of graph types
• Limited number of data points that can be
plotted
Why ggplot2?
• Helps create good-looking graphs for
publishing
• The package format allows manipulation
on every step of the way
• Create new graphics tailored for your
problem
• Easy to create complex plots
The idea behind
ggplot2
Combining all the good and
leaving out all the bad of all R
packages for plotting
Some examples
Attention!!
It doesn’t suggest what graphics
you should use to answer the
questions you are interested in
The basics of ggplot2
• Works as a layer by layer system
• Based on the Grammar of Graphics
(Wilkinson, 2005)
• The plots can be edited after they are
ready
How it works
Two options of working with the
package
1. Easy – using qplot() (=quick plot)
2. Complicated – using ggplot()
qplot
• Basically very similar to the function
plot() in R
• The first two arguments to qplot() are x
and y
• An optional data argument (recommended)
• Example: qplot(x, y, data = mydata)
• You can change color, size, shape and transparency
(all of this is called aesthetics)
• These can either be left at their defaults or set
manually using the function I().
• You can choose the type of plot (scatterplot,
box-plot, histogram) with the “geom” argument.
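The qplot() options above can be sketched as follows. This is only an illustration, assuming the mpg dataset that ships with ggplot2; the colour names and sizes are arbitrary choices. Recent ggplot2 releases mark qplot() as deprecated, so a warning may appear, but the calls should still run.

```r
library(ggplot2)

# Map colour to a variable: ggplot2 picks the colours and adds a legend
p_mapped <- qplot(displ, hwy, data = mpg, colour = class)

# Set fixed values manually by wrapping them in I()
p_fixed <- qplot(displ, hwy, data = mpg,
                 colour = I("steelblue"), size = I(3), alpha = I(0.5))

# Choose the plot type with the geom argument
p_box  <- qplot(class, hwy, data = mpg, geom = "boxplot")
p_hist <- qplot(hwy, data = mpg, geom = "histogram", binwidth = 2)

print(p_mapped)
```

Without I(), a value such as "steelblue" would be treated as data to map rather than as a literal colour.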
ggplot
• Works by creating the plot layer by layer
• A layer is composed of four parts:
• data and aesthetic mapping
• a statistical transformation (stat)
• a geometric object (geom)
• and a position adjustment.
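To make the four parts concrete, here is a minimal sketch that spells each of them out with ggplot2's layer() function. The mpg dataset and the choice of displ and hwy are assumptions made for illustration.

```r
library(ggplot2)

# Every part of the layer is written out explicitly
p <- ggplot() +
  layer(
    data     = mpg,                      # data
    mapping  = aes(x = displ, y = hwy),  # aesthetic mapping
    stat     = "identity",               # statistical transformation (none)
    geom     = "point",                  # geometric object
    position = "identity"                # position adjustment (none)
  )

# The usual shorthand: geom_point() fills in the same stat and position
p_short <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

print(p)
```

In everyday use the geom_*() shorthand is enough; layer() is mostly useful for seeing what those helpers fill in.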
Layers
• Iteratively update a plot
• Change a single feature at a time
• Think about the high level aspects of the
plot in isolation
• Instead of choosing a static type of plot,
create new types of plots on the fly
• Cure against immobility
• Developers can easily develop new layers
without affecting other layers
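As a rough sketch of what "iteratively update a plot" can look like, the example below stores a plot in a variable and adds one layer at a time. The dataset (mpg), the linear fit and the axis labels are assumptions chosen for illustration.

```r
library(ggplot2)

# Start with data and mappings only; nothing is drawn yet
p <- ggplot(mpg, aes(x = displ, y = hwy))

# Add one feature at a time, keeping each intermediate plot
p_points <- p + geom_point()
p_smooth <- p_points + geom_smooth(method = "lm", formula = y ~ x)
p_final  <- p_smooth + labs(x = "Engine displacement (l)", y = "Highway mpg")

print(p_final)
```

Each step returns a new plot object, so any intermediate version can be printed or reused without affecting the others.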
Let's dive into the
syntax
Anatomy of a plot
1. Data *
2. Aesthetic mapping *
3. Geometric object *
4. Statistical transformation
5. Scales
6. Facets
7. Themes
8. Position adjustments
* Mandatory layers
1. Data
In ggplot2, we always explicitly specify
the data
qplot(displ, hwy, data = mpg)
2. Aesthetic Mapping
In ggplot land, aesthetic means "something you can see"
Includes:
• Position (i.e., on the x and y axes)
• Color ("outside" color)
• Fill ("inside" color)
• Shape (of points)
• Line type
• Size
Aesthetic mappings are set with the aes() function
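A small sketch of the difference between mapping an aesthetic inside aes() and setting it to a fixed value outside aes(). The mpg dataset and the specific colour, shape and size values are assumptions for illustration.

```r
library(ggplot2)

# Mapping: colour depends on a variable, so it goes inside aes()
p_map <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point()

# Setting: a fixed value for all points goes outside aes()
p_set <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(colour = "blue", shape = 17, size = 3)

print(p_map)
```

Mapped aesthetics get a legend automatically; fixed values do not, since they carry no information about the data.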
3. Geometric Object
• A geom can only display certain aesthetics
• A plot must have at least one geom; there is no
upper limit
Example
ggplot(data = mpg, aes(x = cty, y = hwy)) +
  geom_point()
• data = mpg: the data.frame to plot
• aes(x = cty, y = hwy): the aesthetic mappings
• geom_point(): what geom to use in plotting
4. Statistical Transformations
• Each geom has a default statistic, but these can be
changed
• Some plot types (such as scatterplots) do not require
transformations: each point is plotted at x and y
coordinates equal to the original value
• Other plots, such as boxplots, histograms, prediction
lines etc. require statistical transformations
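The points above can be illustrated with a short sketch, again assuming the mpg dataset. The bin width and the choice of the mean as summary are arbitrary; layer_data() is used only to peek at the values a stat computed.

```r
library(ggplot2)

# Scatterplot: the identity stat, values are plotted as they are
p_scatter <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Histogram: the default stat counts the observations in each bin
p_hist <- ggplot(mpg, aes(hwy)) + geom_histogram(binwidth = 2)

# Boxplot: the default stat computes quartiles and whiskers
p_box <- ggplot(mpg, aes(class, hwy)) + geom_boxplot()

# Changing the default: bars normally count rows;
# stat = "summary" draws the mean of hwy per class instead
p_mean <- ggplot(mpg, aes(class, hwy)) +
  geom_bar(stat = "summary", fun = "mean")

# Inspect the values computed for the histogram layer
head(layer_data(p_hist))
```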
5. Scales
• Control mapping from data to aesthetic attributes
• One scale per aesthetic
In ggplot2 scales include:
• position
• color and fill
• size
• shape
• line type
Modified with scale_<aesthetic>_<type>
Common Scale Arguments:
• name: the first argument gives the axis or legend title
• limits: the minimum and maximum of the scale
• breaks: the points along the scale where labels should
appear
• labels: the labels that appear at each break
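The common scale arguments can be tried out in a sketch like the one below. The dataset (mpg), the axis title, the limits and the legend labels are all illustrative assumptions.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  # Position scale for x: title, range and tick positions
  scale_x_continuous(
    name   = "Engine displacement (l)",
    limits = c(1, 7),
    breaks = c(2, 4, 6)
  ) +
  # Colour scale: legend title and a readable label for each level
  scale_colour_discrete(
    name   = "Drive",
    breaks = c("4", "f", "r"),
    labels = c("Four-wheel", "Front-wheel", "Rear-wheel")
  )

print(p)
```

Both functions follow the scale_<aesthetic>_<type> naming pattern: x with a continuous type, colour with a discrete type.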
Partial combination matrix of available scales
6. Faceting
• Lay out multiple plots on a page
• Split data into subsets
• Plot subsets into different panels
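A minimal faceting sketch, assuming the mpg dataset and using class, drv and cyl as the splitting variables purely for illustration:

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# One panel per value of class, wrapped into a grid
p_wrap <- base + facet_wrap(~ class)

# Rows split by drv, columns split by cyl
p_grid <- base + facet_grid(drv ~ cyl)

print(p_wrap)
```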
7. Themes
• The ggplot2 theme system handles non-data plot
elements such as
  • Axis labels
  • Plot background
  • Facet label background
  • Legend appearance
• Specific theme elements can be overridden using
theme()
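A sketch of overriding individual theme elements, one for each of the non-data elements listed above. The starting theme (theme_bw()), the colours and the font settings are assumptions chosen only for illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  facet_wrap(~ class) +
  theme_bw() +  # start from a complete built-in theme
  theme(
    axis.title       = element_text(size = 12, face = "bold"),  # axis labels
    panel.background = element_rect(fill = "grey95"),           # plot background
    strip.background = element_rect(fill = "lightblue"),        # facet label background
    legend.position  = "bottom"                                 # legend appearance
  )

print(p)
```

The theme() call comes after theme_bw() so that its overrides are not reset by the complete theme.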
8. Position adjustments
• Tweak positioning of geometric objects
• Avoid overlaps
Example: position_jitter
• Avoid overplotting by jittering points
ggplot(overplotted, aes(x, y)) +
  geom_point(position = position_jitter(width = 0.1, height = 0.1))
Summary
• Lets you create high-quality plots
• Many options
• Many blogs and webpages explaining how
to do different plots in ggplot
• Disadvantage – takes time to learn the
grammar
• There is a book:
ggplot2: Elegant Graphics for Data Analysis
Useful links
• http://ggplot2.org
• http://wiki.stdout.org/rcookbook/Graphs/
• http://www.ceb-institute.org/bbs/wpcontent/uploads/2011/09/handout_ggplot2.pdf
• http://yeroon.net/ggplot2/