ggplot2 Package (An implementation of the grammar of graphics)
Sandra P. Galeano (sgalea1@tigers.lsu.edu)
05/12/13

ggplot2 (created by Hadley Wickham and Winston Chang) is a data visualization package for R that works with an underlying grammar of graphics. It is highly flexible and makes it easy to produce visually appealing graphs. Compared with base R graphics, ggplot2 may be more verbose for simple plots but is less verbose for complex ones. ggplot2 also makes it easy to add, remove, or alter components of a plot, and to build up a plot from multiple data sources. It can be used to construct basic figures such as histograms, box plots and scatterplots, or more complex figures such as maps and heat maps.

ggplot2 offers two main functions that can be used to construct the same kinds of figures: qplot() and ggplot(). qplot() mimics R's traditional plot() function and provides a simpler syntax than ggplot(). However, ggplot() gives access to the full set of features and the most flexibility, and is therefore recommended for constructing complex plots.

This self-tutorial goes step by step to help you start using ggplot2 to create basic figures with the ggplot() function. Although the examples provided here show how to create and modify scatterplots, they explain the basic grammar that ggplot2 uses to construct any kind of figure.

In this self-tutorial I used a modified version of Fisher's Iris dataset (Fisher 1936), which I have called “irismod”. The original dataset represents the morphological variation of three species of Iris (Iris setosa, Iris virginica, and Iris versicolor) based on measurements of the length and width of petals and sepals. In “irismod”, I added a “site” column (north and south) to the original dataset, only for the purpose of this exercise.
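
For readers who do not have the irismod.txt file, a rough stand-in can be built from R's built-in iris data. The sketch below is only an approximation under stated assumptions: the alternating South/North assignment is made up for illustration and is not the site coding of the original file, and the built-in species names are lowercase ("setosa") rather than "Setosa".

```r
# Stand-in for irismod.txt, built from R's built-in iris data.
# ASSUMPTION: the alternating South/North pattern is invented for
# illustration; the site values in the original file may differ.
irismod <- iris[, c("Petal.Width", "Petal.Length",
                    "Sepal.Width", "Sepal.Length", "Species")]
irismod$Site <- factor(rep(c("South", "North"), length.out = nrow(irismod)))

str(irismod)                          # structure: column names and types
table(irismod$Species, irismod$Site)  # observations per species and site
```

With this stand-in, the plotting commands in the rest of the tutorial can be run without the original file, although the points assigned to each site will not match the original figures.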
Documentation on the ggplot2 package is available from CRAN at the following websites:
http://cran.r-project.org/web/packages/ggplot2/index.html
http://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf

I made this self-tutorial based on the following three resources:
- Chang, Winston. 2013. R Graphics Cookbook. O'Reilly Media, Inc. USA.
- Ram, Karthik. 2013. A quick introduction to ggplot2. Available at: http://www.rbloggers.com/a-quick-introduction-to-ggplot2/
- Wickham, Hadley. 2009. ggplot2: Elegant Graphics for Data Analysis. Springer Science+Business Media. USA.

Getting started:

Note: Installing R 3.0.0 may be required to run some functions.

Installing and loading ggplot2 package and dependencies:
> install.packages("ggplot2", dependencies = TRUE)
Install ggplot2 and automatically install its dependencies (other packages needed by ggplot2)
> library("ggplot2")
Load ggplot2

Installing and loading reshape2 and plyr packages:
> install.packages("reshape2")
Install reshape2. This package allows for restructuring and aggregating data
> install.packages("plyr")
Install plyr. This package allows for splitting, applying and combining data
> library("reshape2")
Load reshape2
> library("plyr")
Load plyr

Reading the data (iris database modified):
> irismod <- read.table(file = "c:/Users/Sandra/Desktop/R class/ggplot2/irismod.txt", header = TRUE)
Read the content of the irismod.txt dataset
> irismod[1:5,]
Display the first 5 rows of the irismod.txt dataset
  Petal.Width Petal.Length Sepal.Width Sepal.Length Species  Site
1         0.2          1.4         3.3          5.0  Setosa South
2         0.2          1.0         3.6          4.6  Setosa North
3         0.2          1.6         3.1          4.8  Setosa South
4         0.1          1.4         3.6          4.9  Setosa North
5         0.2          1.3         3.2          4.4  Setosa South

Exploring the data using a scatterplot:
> ggplot(data = irismod, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()

The command “aes” stands for “aesthetics” and refers to “something you can see” such as position, shape, color, etc.
The command “geom” stands for “geometric” and refers to “marks to put in a plot” such as line, point, bar, etc. Plots in ggplot2 are constructed by adding multiple “aes” and “geom”. There is no limit to the number of “geom” you can add to a line of code to make a plot. In this case the position of each variable is provided under the aes command (x-axis and y-axis), and “geom_point” is selected to represent the data in the scatterplot.

Adding color, trendlines, and error shades:
> ggplot(irismod, aes(Sepal.Length, Sepal.Width, color = Species)) + geom_point(size = 3.5)
“color=Species” adds colors automatically by species and “size=3.5” changes point size.

> ggplot(irismod, aes(Sepal.Length, Sepal.Width, color = Species)) + geom_point(aes(shape = Species), size = 3.5)
“shape=Species” changes the shape of the points per species

> ggplot(irismod, aes(Sepal.Length, Sepal.Width, color = Species)) + geom_point(size = 3.5) +
    geom_smooth(method = "lm", se = FALSE)
“geom_smooth” adds a trendline to the data per species (here a linear fit, method = "lm"); “se=FALSE” hides the shaded confidence band, which is based on the standard error of the fit

> ggplot(irismod, aes(Sepal.Length, Sepal.Width, color = Species)) + geom_point(size = 3.5) +
    geom_smooth(method = "lm")
To show the shaded confidence band around each trendline, delete “se=FALSE” from geom_smooth (the band is drawn by default)

> ggplot(irismod, aes(Sepal.Length, Sepal.Width, color = Species)) + geom_point(size = 3) +
    geom_smooth(method = "lm") + geom_text(aes(label = rownames(irismod)))
“label=rownames(irismod)” is used to add the row label to each point in the figure

Selecting colors:

Selecting colors manually (useful when only a few colors are needed):
> ggplot(irismod, aes(Sepal.Length, Sepal.Width, color = Species)) + geom_point(size = 3.5) +
    geom_smooth(method = "lm") + scale_color_manual(values = c("red", "green", "blue"))
“scale_color_manual” allows selecting colors manually for each species

Selecting colors from a library with palettes of colors (useful when more than three colors are needed):
> library(RColorBrewer)
Load RColorBrewer
to use palettes of colors
> display.brewer.all()
Display palettes of colors

> ggplot(irismod, aes(Sepal.Length, Sepal.Width, color = Species)) + geom_point(size = 3.5) +
    geom_smooth(method = "lm") + scale_color_brewer(palette = "Dark2")
“scale_color_brewer” selects the preferred palette from RColorBrewer

Selecting colors from color blind friendly palettes:
Colors in palettes are specified as a hexadecimal RGB triplet, such as "#0066CC". The first two digits are the level of red, the next two green, and the last two blue. The value for each ranges from 00 to FF in hexadecimal (base-16) notation; "#0066CC", for example, has red 00 (0), green 66 (102), and blue CC (204). At least two color blind friendly palettes can be used in figures made with ggplot2: the cbPalette and the cbbPalette.

> cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
Defines the color blind friendly palette with grey

> cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
Defines the color blind friendly palette with black

> ggplot(irismod, aes(Sepal.Length, Sepal.Width, color = Species)) + geom_point(size = 3.5) +
    geom_smooth(method = "lm") + scale_colour_manual(values = cbbPalette)
“values=cbbPalette” selects the color blind friendly palette with black; use “values=cbPalette” to select the palette with grey

Faceting:
Faceting in ggplot2 is used to generate small multiple plots using the “facet” commands. Each small plot shows a different subset of the data, allowing comparison in one figure. To generate small multiples two commands can be used: facet_wrap and facet_grid.

> ggplot(irismod, aes(Sepal.Length, Sepal.Width, color = Species)) + geom_point(size = 3.5) +
    geom_smooth(method = "lm") + scale_colour_manual(values = cbPalette) + facet_wrap(~ Species)
“facet_wrap” defines subsets as the levels of a single grouping variable, in this case the species variable.
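
The facet_wrap step can also be tried without the irismod.txt file. The following is a minimal sketch that assumes the built-in iris data as a stand-in for irismod; the name `panels` and the panel count check are additions for illustration and are not part of the original tutorial.

```r
library(ggplot2)

# Built-in iris used as a stand-in for irismod (assumption for illustration).
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 3.5) +
  geom_smooth(method = "lm") +
  facet_wrap(~ Species)        # one panel per level of Species

print(p)

# The data computed for a layer carry a PANEL column; its distinct values
# correspond to the panels created by facet_wrap.
panels <- unique(layer_data(p, 1)$PANEL)
length(panels)
```

Because iris has three species, the plot is split into three panels, each showing the points and trendline for a single species.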
> ggplot(irismod, aes(Sepal.Length, Sepal.Width, color = Species)) + geom_point(size = 3.5) +
    geom_smooth(method = "lm") + scale_colour_manual(values = cbPalette) + facet_grid(Species ~ Site)
“facet_grid” defines subsets as the combinations of levels of two grouping variables, in this case the species variable (rows) and the site variable (columns).

Editing plots:
The command “theme” is used to edit backgrounds, legends, labels, and fonts in ggplot2. A list of the possible elements can be found under:
> ?theme
Opens the help page that lists all theme elements

> ggplot(irismod, aes(Sepal.Length, Sepal.Width, color = Species)) + geom_point(size = 3.5) +
    geom_smooth(method = "lm") + scale_colour_manual(values = cbPalette) + facet_grid(Species ~ Site) +
    theme_bw() +
    theme(legend.position = "none", axis.text = element_text(size = 11),
          axis.title = element_text(size = 14, face = "bold"), strip.text = element_text(size = 12))
“theme_bw()” changes the gray background of the entire figure into a white background while keeping the grid. “legend.position="none"” removes the legend from the figure, “axis.text=element_text()” modifies the axis text, “axis.title=element_text()” modifies the axis titles, and “strip.text=element_text()” modifies the facet text, which in this case is the name of each species and of each site.
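
The theme settings above can be tried on a stand-in dataset. This is a sketch under stated assumptions: `dat` is the built-in iris data with a made-up Site column (not the original irismod.txt), and the final line simply reads one setting back from the plot object to confirm it was stored.

```r
library(ggplot2)

# Stand-in data: built-in iris plus an invented Site column (ASSUMPTION;
# the original irismod.txt file is not reproduced here).
dat <- iris
dat$Site <- rep(c("South", "North"), length.out = nrow(dat))

p <- ggplot(dat, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 3.5) +
  facet_grid(Species ~ Site) +
  theme_bw() +                                   # white background, grid kept
  theme(legend.position = "none",                # remove the legend
        axis.text  = element_text(size = 11),    # axis tick labels
        axis.title = element_text(size = 14, face = "bold"),
        strip.text = element_text(size = 12))    # facet labels

print(p)

# Settings added with theme() are stored on the plot object.
p$theme$legend.position
```

Note that theme_bw() is added before theme(): a complete theme replaces earlier theme settings, so individual adjustments made with theme() should come after it.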