ggplot2
A cool way to create plots in R
Maria Novosolov

Your scientific graphic options

Why not only Excel?
• Excel is relatively limited in its support of scientific graphing
• Its options provide limited control over the output
• Limited selection of graph types
• Limited number of data points that can be plotted

Why ggplot2?
• Helps create good-looking graphs for publication
• The package format allows manipulation at every step of the way
• Create new graphics tailored to your problem
• Easy to create complex plots

The idea behind ggplot2
Combining all the good and leaving out all the bad of all R packages for plotting

Some examples

Attention!!
It doesn’t suggest what graphics you should use to answer the questions you are interested in

The basics of ggplot2
• Works as a layer-by-layer system
• Based on the Grammar of Graphics (Wilkinson, 2005)
• Plots can be edited after they are created

How it works
Two options for working with the package:
1. Easy – using qplot() (= quick plot)
2. Complicated – using ggplot()

qplot
• Basically very similar to the function plot() in R
• The first two arguments to qplot() are x and y
• An optional data argument (recommended)
• Example: qplot(x, y, data = mydata)
• You can change color, size, shape and transparency (these are all called aesthetics)
• An aesthetic can either be mapped to a variable (using the default scale) or set manually to a fixed value using the function I()
• You choose the type of plot (scatterplot, box plot, histogram) with the geom argument

ggplot
• Works by creating the plot layer by layer
• A layer is composed of four parts:
  • data and aesthetic mapping
  • a statistical transformation (stat)
  • a geometric object (geom)
  • a position adjustment
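The two routes described above can be compared side by side. The following is a minimal sketch, assuming ggplot2 is installed and using the mpg data set that ships with it; the object names p_quick and p_layer and the color "steelblue" are illustrative choices, not part of the slides. Note that qplot() is deprecated in recent ggplot2 releases (3.4.0 onward), so it may print a warning.

```r
library(ggplot2)

# Route 1: quick plot. x and y come first, data is given explicitly,
# the plot type is chosen with the geom argument, and I() sets a
# fixed color instead of mapping color to a variable.
p_quick <- qplot(displ, hwy, data = mpg,
                 geom = "point", colour = I("steelblue"))

# Route 2: the same scatterplot built as one explicit layer,
# spelling out the four parts of a layer.
p_layer <- ggplot() +
  layer(
    data     = mpg,                        # data ...
    mapping  = aes(x = displ, y = hwy),    # ... and aesthetic mapping
    stat     = "identity",                 # statistical transformation (none)
    geom     = "point",                    # geometric object
    position = "identity",                 # position adjustment (none)
    params   = list(colour = "steelblue")  # fixed, non-mapped setting
  )

# print(p_quick); print(p_layer)  # both should draw the same scatterplot
```

In everyday use the shortcut geom_point() fills in the stat and position defaults, so the long layer() form is mainly useful for seeing what a layer contains.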
Layers
• Iteratively update a plot
• Change a single feature at a time
• Think about the high-level aspects of the plot in isolation
• Instead of choosing a static type of plot, create new types of plots on the fly
• Cure against immobility
• Developers can easily develop new layers without affecting other layers

Let’s dive into the syntax

Anatomy of a plot
1. Data *
2. Aesthetic mapping *
3. Geometric object *
4. Statistical transformation
5. Scales
6. Facets
7. Themes
8. Position adjustments
* Mandatory layers

1. Data
In ggplot2, we always explicitly specify the data
qplot(displ, hwy, data = mpg)

2. Aesthetic Mapping
In ggplot land, aesthetic means "something you can see"
Includes:
• Position (i.e., on the x and y axes)
• Color ("outside" color)
• Fill ("inside" color)
• Shape (of points)
• Line type
• Size
Aesthetic mappings are set with the aes() function

3. Geometric Object
• A geom can only display certain aesthetics
• A plot must have at least one geom; there is no upper limit

Example
ggplot(data=mpg, aes(x=cty, y=hwy)) + geom_point()
• data=mpg: the data.frame to plot
• aes(x=cty, y=hwy): the aesthetic mappings
• geom_point(): what geom to use in plotting

4. Statistical Transformations
• Each geom has a default statistic, but these can be changed
• Some plot types (such as scatterplots) do not require transformations: each point is plotted at x and y coordinates equal to the original value
• Other plots, such as boxplots, histograms, prediction lines etc., require statistical transformations

5. Scales
• Control mapping from data to aesthetic attributes
• One scale per aesthetic

In ggplot2 scales include:
• position
• color and fill
• size
• shape
• line type
Modified with scale_<aesthetic>_<type>

Common scale arguments:
• name: the first argument gives the axis or legend title
• limits: the minimum and maximum of the scale
• breaks: the points along the scale where labels should appear
• labels: the labels that appear at each break

[Table: partial combination matrix of available scales]

6. Faceting
• Lay out multiple plots on a page
• Split data into subsets
• Plot subsets into different panels

7. Themes
• The ggplot2 theme system handles non-data plot elements such as:
  • Axis labels
  • Plot background
  • Facet label background
  • Legend appearance
• Specific theme elements can be overridden using theme()

8. Position adjustments
• Tweak positioning of geometric objects
• Avoid overlaps
Example: position_jitter
• Avoid overplotting by jittering points
ggplot(overplotted, aes(x, y)) + geom_point(position = position_jitter(width = 0.1, height = 0.1))

Summary
• Allows creating high-quality plots
• Many options
• Many blogs and webpages explain how to make different plots in ggplot2
• Disadvantage – it takes time to learn the grammar
• There is a book: ggplot2: Elegant Graphics for Data Analysis

Useful links
• http://ggplot2.org
• http://wiki.stdout.org/rcookbook/Graphs/
• http://www.ceb-institute.org/bbs/wpcontent/uploads/2011/09/handout_ggplot2.pdf
• http://yeroon.net/ggplot2/
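Worked example

As a recap, the eight components from "Anatomy of a plot" can be combined in a single plot. This is only a sketch under stated assumptions: it uses the mpg data set bundled with ggplot2, and the axis title, legend labels, limits, breaks and theme choices are illustrative values, not taken from the slides.

```r
library(ggplot2)

p <- ggplot(data = mpg, aes(x = cty, y = hwy)) +           # 1. data, 2. aesthetic mapping
  geom_point(aes(colour = drv),                            # 3. geometric object
             position = position_jitter(width = 0.1,      # 8. position adjustment
                                        height = 0.1)) +
  stat_smooth(method = "lm", formula = y ~ x,              # 4. statistical transformation
              se = FALSE, colour = "black") +
  scale_x_continuous(name = "City mpg",                    # 5. scales: name, limits, breaks
                     limits = c(5, 40),
                     breaks = seq(10, 40, by = 10)) +
  scale_colour_discrete(name = "Drive",                    #    scale_<aesthetic>_<type>
                        labels = c("4wd", "front", "rear")) +
  facet_wrap(~ drv) +                                      # 6. facets: one panel per drive type
  theme_bw() +                                             # 7. themes: non-data elements
  theme(legend.position = "bottom")

# print(p)  # draws three panels, one per value of drv
```

Because each component is added with +, any single line can be removed or swapped without touching the others, which is the layer-by-layer idea in practice.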