Uploaded by Javeria Malik

Course 7 Week 4 (Visualizations in R)

Visualizations in R
Visualization basics in R and tidyverse
Different visualization packages in R
Base R has its own graphics package, and there are other useful packages you can add. They'll
help you do almost anything you want with your data from making simple pie charts, to
creating more complex visuals like interactive graphs and maps. Some of the most
popular packages include:
● ggplot2
● Plotly (offers a wide range of visualization functions)
● Lattice
● RGL (focuses on specific solutions like 3D visuals)
● Dygraphs
● Leaflet
● Highcharter
● Patchwork
● gganimate
● ggridges.
ggplot2 (R visualization package)
It's the most popular visualization package in R. A lot of data analysts prefer to use
ggplot2. You can use ggplot2 on its own or extend its powers with other packages.
Ggplot2 was originally created by the statistician and developer Hadley Wickham in 2005.
Wickham's inspiration for creating ggplot2 came from the 1999 book The Grammar of
Graphics, a scholarly study of data visualization by computer scientist Leland Wilkinson.
The first two letters of ggplot2 actually stand for grammar of graphics. And in the
same way the grammar of a human language gives us rules to build any kind of sentence,
the grammar of graphics gives us rules to build any kind of visual.
Benefits of ggplot2
● Create different types of plots including scatter plots, bar charts, line diagrams and
tons more
● Customize the look and feel of plots (change the colors, layout and dimensions of
your plots and add text elements like titles, captions and labels).
● Create high-quality visuals.
● Combine data manipulation and visualization using the pipe operator.
● You can add or remove layers of detail to your plot without changing its basic
structure or the underlying data.
Core Concepts in ggplot2
● Aesthetics (An aesthetic is a visual property of an object in your plot. For example,
in a scatter plot aesthetics include things like the size, shape or color of your data
points. Think of an aesthetic as a connection or mapping between a visual feature
in your plot and a variable in your data).
● Geoms (A geom refers to the geometric object used to represent your data. For
example, you can use points to create a scatter plot, bars to create a bar chart, or
lines to create a line diagram).
● Facets (Facets let you display smaller groups or subsets of your data. With facets,
you can create a separate plot for each value of a variable in your dataset).
● Labels and Annotations (Label and annotate functions let you customize your
plot. You can add text like titles, subtitles and captions to communicate the purpose
of your plot or highlight important data).
The ggplot2 cheat sheet
RStudio has a useful reference guide called the “Data Visualization with ggplot2 Cheat
Sheet.” You can use the Cheat Sheet as a quick reference while you work to learn about
the main functions and features of ggplot2.
Click the link to check it out: Cheat Sheet
https://r4ds.had.co.nz/data-visualisation.html
https://r4ds.had.co.nz/graphics-for-communication.html
Getting started with ggplot()
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
The code uses functions from ggplot2 to plot the relationship between body mass and
flipper length. In R, a function's name is followed by a set of parentheses. Lots of functions
require special information to do their jobs. You write this information, called the function's
arguments, inside the parentheses. The three functions in the code are the ggplot
function, the geom_point function, and the aes function.
Every ggplot2 plot starts with the ggplot function. The argument of the ggplot function
tells R what data to use for your plot. So the first thing to do is choose a data frame to
work with. You can set up the code like this. Inside the parentheses of the function write
the word data, then an equal sign, then penguins. This code initializes or starts the plot.
This is just the first step in creating a plot.
The next thing you might notice about this code is the plus sign at the end of the first line.
You use the plus sign to add layers to your plot. In ggplot2 plots are built through
combinations of layers. First, we start with our data. Then we add a layer to our plot by
choosing a geom to represent our data. The function geom_point tells R to use points to
represent our data. Keep in mind that the plus sign must be placed at the end of each line
to add a layer. Adding a geom function is the second step in creating a plot. As a reminder,
a geom is a geometric object used to represent your data. Geoms include points,
bars, lines, and more. In our code, the function geom_point tells R to use points and
create a scatter plot.
Next, we need to choose specific variables from our dataset and tell R how we want these
variables to look in our plot. In ggplot2, the way a variable looks is called its aesthetic. An
aesthetic is a visual property of an object in your plot, like its position, color, shape, or
size. The mapping = aes() part of the code tells R what aesthetics to use for the plot.
You use the aes function to define the mapping between your data and your plot. Mapping
means matching up a specific variable in your dataset with a specific aesthetic. For
example, you can map a variable to the x-axis of your plot, or you can map a variable to
the y-axis of your plot. In a scatter plot, you can also map a variable to the color, size,
and shape of your data points. Mapping aesthetics to variables is the third step in creating
a plot. In our code, we map the variable flipper length to the x-axis and the variable body
mass to the y-axis. Inside the parentheses of the aes function, we write the name of the
aesthetic then the equal sign, then the name of the variable. Run the code and then the
scatter plot appears showing the relationship between flipper length and body mass of
penguins.
To learn more about any R function, run a question mark followed by the function name. For
example, if you want to learn more about the geom_point function, run:
?geom_point
As a new learner, you might not understand all the concepts in the help page. At the
bottom of the page, you can find specific examples of code that may show you how to
solve your problem.
Steps to create a plot
1. Start with the ggplot function and choose a dataset to work with.
2. Add a geom_function to display your data.
3. Map the variables you want to plot in the argument of the aes function.
We can also turn our code into a reusable template for creating plots in ggplot2. To make
a plot, replace the bracketed sections in the code with a dataset, a
geom_function, or a group of aesthetic mappings. We can make all kinds of
different plots using this template.
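The template itself is not reproduced in these notes. As a sketch, it probably has the shape shown in the comment below, with one bracketed slot each for the dataset, the geom function, and the aesthetic mappings. The example assumes the penguins data frame comes from the palmerpenguins package; the name template_plot is only for illustration.

```r
# Assumes ggplot2 and palmerpenguins are installed.
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

# Likely shape of the template (not runnable until the brackets are filled in):
#   ggplot(data = <DATA>) +
#     <GEOM_FUNCTION>(mapping = aes(<AESTHETIC MAPPINGS>))

# One way to fill in the three bracketed sections.
template_plot <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

print(template_plot)  # draws the scatter plot
```

Swapping geom_point for another geom function, or penguins for another data frame, reuses the same structure.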
Aesthetics in ggplot2
Ggplot2 is an R package that allows you to create different types of data visualizations
right in your R workspace. In ggplot2, an aesthetic is defined as a visual property of an
object in your plot.
Three commonly used aesthetic attributes in ggplot2 are:
● Color: this allows you to change the color of all of the points on your plot, or the
color of each data group
● Size: this allows you to change the size of the points on your plot by data group
● Shape: this allows you to change the shape of the points on your plot by data
group
Enhancing visualizations in R
In a scatter plot, we can make a point small, triangular or blue or a combination of these.
Let's go back to our penguins dataset and review the code for our plot that shows the
relationship between body mass and flipper length.
You can also map data to other aesthetics, like color, size and shape. Right now, the
plot of penguins data is in black and white. It clearly shows the positive relationship
between the two variables. As the values on the x-axis increase, the values on the y-axis increase. But it's also got some limitations. For example, we can't tell which data
points refer to each of the three penguin species. To solve this problem, we can map a
new variable to a new aesthetic. Let's add a third variable to our scatter plot by mapping
it to a new aesthetic. We'll map the variable species to the aesthetic color by adding
some code inside the parentheses of the aes function. We'll add a comma after the
body mass variable and type color = species. Our code tells R to assign a
different color to each species of penguin. Let's check it out.
ggplot(data = penguins) + geom_point(mapping =
aes(x=flipper_length_mm, y=body_mass_g, color = species))
The Gentoo are the largest of the three penguin species. The legend just to the right of
the plot shows us that the blue points refer to the Gentoo penguins. Not only does R
automatically apply different colors to each data point, it also gives a legend to show us
the color-coding.
We can also use shape to highlight the different penguin species. Let's map the variable
species to the aesthetic shape. To do this, we can change the code from color =
species to shape = species. Instead of colored points, R assigns different shapes to
each species.
ggplot(data = penguins) + geom_point(mapping =
aes(x=flipper_length_mm, y=body_mass_g, shape = species))
Now the legend shows us a circle for the Adelie species, a triangle for the Chinstraps
and a square for the Gentoos. You might notice that our plot's in black and white again
because we removed the code for color. Let's put some color back into our plot. If we
want we can map more than one aesthetic to the same variable. Let's map both color
and shape to species. We'll add the code color = species while keeping the code
shape = species.
ggplot(data = penguins) + geom_point(mapping =
aes(x=flipper_length_mm, y=body_mass_g, shape = species, color =
species))
Now our plot shows a different color and a different shape for each species.
Let's add size as well and map three aesthetics to species. If we add size =
species, each colored shape will also be a different size.
ggplot(data = penguins) + geom_point(mapping =
aes(x=flipper_length_mm, y=body_mass_g, shape = species, color =
species, size = species))
Using more than one aesthetic can also be a way to make your visuals more accessible
because it gives your viewers more than one way to make sense of your data.
We can also map species to the alpha aesthetic, which controls the transparency of the
points. Our first plot showed the relationship between body mass and flipper length in
black and white. Then we mapped the variable species to the aesthetic color to show
the difference between each of the three penguin species. If we want to keep our graph
in black and white, we can map the alpha aesthetic to species. This will make some
points more transparent or see-through than others. This gives us another way to
represent each penguin species.
ggplot(data = penguins) + geom_point(mapping =
aes(x=flipper_length_mm, y=body_mass_g, alpha = species))
Alpha is a good option when you've got a dense plot with lots of data points.
You can also set an aesthetic to a fixed value instead of mapping it to a variable. Let's say we want to
change the color of all the points to purple. Here we don't want to map color to a specific
variable like species. We just want every point in our scatter plot to be purple. So we
need to set our new piece of code outside of the aes function and use quotation marks
for our color value. This is because all the code inside of the aes function tells R how to
map aesthetics to variables. For example, mapping the aesthetic color to the variable
species. If we want to change the appearance of our overall plot without regard to
specific variables, we write code outside of the aes function. Let's write the code and
run it.
ggplot(data = penguins) + geom_point(mapping =
aes(x=flipper_length_mm, y=body_mass_g), color = "purple")
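To make the difference between setting and mapping concrete, here is a small sketch that builds both versions side by side. It assumes the penguins data frame comes from the palmerpenguins package; the names set_plot and mapped_plot are only for illustration.

```r
# Assumes ggplot2 and palmerpenguins are installed.
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

# Setting: color sits outside aes(), so every point is drawn in purple.
set_plot <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g),
             color = "purple")

# Mapping by mistake: inside aes(), "purple" is treated as a data value,
# so ggplot2 picks a default color and adds a legend entry labelled "purple".
mapped_plot <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = "purple"))
```

If the points come out in an unexpected color with a one-item legend, the color value has probably ended up inside aes().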
Additional resources
For more information about aesthetic attributes, check out these resources:
● Data visualization with ggplot2 cheat sheet: RStudio’s cheat sheet is a great
reference to use while working with ggplot2. It has tons of helpful information,
including explanations of how to use geoms and examples of the different
visualizations that you can create.
● Stats Education’s Introduction to R: This resource is a great way to learn the
basics of ggplot2 and how to apply aesthetic attributes to your plots. You can
return to this tutorial as you work more with ggplot2 and your own data.
● RDocumentation aes function: This guide describes the syntax of the aes
function and explains what each argument does.
Create Different Graphs/Plots Using Different Geom Functions
There are lots of different geoms available. You can choose a specific geom based on
how you want to represent your data and your goals for communicating it. This lets you
tell the story of your data in different ways and communicate effectively to different
audiences. In ggplot2, a geom is the geometrical object used to represent your data.
Geoms include points, bars, lines, and more. The geom_point function uses points
to create scatter plots. The geom_bar function uses bars to create bar charts and so on.
To change the geom in our plot, we need to change the geom function in our code.
For creating a scatter chart, we use the following code:
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
For creating a smooth line chart we use the following code:
ggplot(data = penguins) +
  geom_smooth(mapping = aes(x = flipper_length_mm, y = body_mass_g))
We still have the same data, but now the data's got a different visual appearance. Instead
of points, there's a smooth line that fits the data. The geom_smooth function is
useful for showing general trends in our data. The line clearly shows the positive
relationship between body mass and flipper length. The larger the penguin, the longer the
flipper. We can even use two geoms in the same plot. Let's say we want to show the
relationship between the trend line and the data points more clearly. We can combine the
code for geom_point and the code for geom_smooth by adding a plus symbol after
geom_smooth. Let's write the code and run it.
ggplot(data = penguins) +
  geom_smooth(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
Now we want to plot a separate line for each species of penguin. We can add the
linetype aesthetic to our code and map it to the variable species.
ggplot(data = penguins) +
  geom_smooth(mapping = aes(x = flipper_length_mm, y = body_mass_g, linetype = species))
Geom_smooth will draw a different line with a different line type for each species of
penguin. The legend shows how each line type matches with each species. The plot
clearly shows the trend for each species.
The geom_jitter function creates a scatter plot and then adds a small amount of random
noise to each point in the plot. Jittering helps us deal with over-plotting, which happens
when the data points in a plot overlap with each other. Jittering makes the points easier
to find. To jitter the points in our plot, we run the following code:
ggplot(data = penguins) +
  geom_jitter(mapping = aes(x = flipper_length_mm, y = body_mass_g))
Bar Charts with ggplot2
For bar charts we use the diamonds dataset. This includes data like the quality, clarity,
and cut for over 50,000 diamonds. This dataset comes with the ggplot2 package, so it's
available as soon as ggplot2 is loaded. To make a bar chart, we use the geom_bar function. Let's write some
code that plots a bar chart of the variable cut in the diamonds dataset.
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut))
Cut refers to a diamond's proportions, symmetry, and polish. Notice that we didn't supply
a variable for the y-axis. When you use geom_bar, R automatically counts how many
times each x-value appears in the data, and then shows the counts on the y-axis. The
default for geom_bar is to count rows. For example, the x-axis of our plot shows five
categories of cut quality: fair, good, very good, premium, and ideal. The y-axis shows the
number of diamonds in each category.
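To see what geom_bar is counting, we can compute the same counts by hand and plot them with geom_col, which uses the y values it is given instead of counting rows. This is only a sketch of the idea; the names cut_counts and manual_bar are for illustration.

```r
# Assumes ggplot2 is installed; diamonds ships with it.
library(ggplot2)

# Count the rows for each cut by hand (base R, no extra packages).
cut_counts <- as.data.frame(table(cut = diamonds$cut))
names(cut_counts) <- c("cut", "n")
print(cut_counts)

# geom_col() plots the supplied y values, so this should draw the same
# bars as geom_bar(mapping = aes(x = cut)).
manual_bar <- ggplot(data = cut_counts) +
  geom_col(mapping = aes(x = cut, y = n))
```

The counts add up to the number of rows in diamonds, which is why the bar heights in the geom_bar plot match this table.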
Geom_bar uses several aesthetics that you're already familiar with, such as color, size,
and alpha. Let's add the color aesthetic to our plot and map it to the variable cut.
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, color = cut))
The color aesthetic adds color to the outline of each bar. R also supplies a legend to show
the color-coding.
Let's say we want to highlight the difference between cuts even more clearly to make our
plot easier to understand. We can use the fill aesthetic to add color to the inside of each
bar.
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut,fill = cut))
R automatically chooses the colors and supplies a legend.
If we map fill to a new variable, geom_bar will display what's called a stacked
bar chart. Let's map fill to clarity instead of cut.
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, fill = clarity))
Our plot now shows 40 different combinations of cut and clarity. Each combination has
its own colored rectangle. The rectangles that have the same cut value are stacked on
top of each other in each bar. The plot organizes the complex data. Now we know the
difference in volume between cuts and we can figure out the difference in clarity within
each cut.
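The figure of 40 combinations can be checked directly: it is the number of cut levels multiplied by the number of clarity levels, provided every pairing actually occurs in the data. A quick sketch (the name combos is only for illustration):

```r
# Assumes ggplot2 is installed; diamonds ships with it.
library(ggplot2)

# Distinct (cut, clarity) pairs that actually appear in the data.
combos <- unique(as.data.frame(diamonds)[, c("cut", "clarity")])
nrow(combos)  # 40

# Upper bound if every pairing is present: 5 cuts x 8 clarity grades.
nlevels(diamonds$cut) * nlevels(diamonds$clarity)  # 40
```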
Aesthetics and facets
Facet functions let you display smaller groups or subsets of your data. A facet is a side
or section of an object, like the sides of a gemstone. Facets show different sides of your
data by placing each subset on its own plot. Faceting can help you discover new patterns
in your data and focus on relationships between different variables. For example, let's say
you're looking at sales data for a clothing company. You might want to break down your
data by category to show specific trends: children's clothing versus adult clothing, or
spring fashions versus fall fashions. Or if you are running an employee engagement
survey, you might want to break down your data by tenure and compare senior employees
to new employees.
Ggplot2 has two functions for faceting:
● facet_wrap
● facet_grid
To facet your plot by a single variable, use facet_wrap. Let's say we wanted to focus on
the data for each species of penguin. Take our plot that shows the relationship between
body mass and flipper length in each penguin species. The facet_wrap function lets us
create a separate plot for each species. To add a new layer to our plot, we'll add a plus
symbol to our code.
ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species)) + facet_wrap(~species)
The separate plots show the relationship between body mass and flipper length within
each species of penguin. Facets help us focus on important parts of our data that we
might not notice in a single plot. If your visual is too busy, for example, if it's got too
many variables or levels within variables, faceting can be a good option.
The tilde (~) operator is used to define the relationship between the dependent variable and
the independent variables in a statistical model formula. The variable on the left-hand side of
the tilde is the dependent variable, and the variable(s) on the right-hand side are the
independent variable(s). So the tilde expresses that the dependent variable depends on the
independent variable(s) on its right-hand side (retrieved from tutorialspoint.com). In
facet_wrap(~species), the formula has only a right-hand side, which names the variable used
to split the plot into facets.
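If the formula notation feels unfamiliar, ggplot2 also accepts the faceting variable wrapped in vars(). The sketch below builds the same faceted plot both ways. It assumes the penguins data frame comes from the palmerpenguins package; base_plot, by_formula, and by_vars are illustrative names.

```r
# Assumes ggplot2 and palmerpenguins are installed.
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

base_plot <- ggplot(data = penguins,
                    mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(mapping = aes(color = species))

# Formula notation: a one-sided formula naming the faceting variable.
by_formula <- base_plot + facet_wrap(~species)

# vars() notation: the same facets without writing a formula.
by_vars <- base_plot + facet_wrap(vars(species))
```

Both versions should produce one panel per species; which notation to use is mostly a matter of taste.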
Let's try faceting the diamonds dataset. Earlier, we made a bar chart that showed the
number of diamonds for each category of cut: fair, good, very good, premium, and ideal.
We can use facet_wrap on the cut variable to create a separate plot for each category of
cut.
ggplot(data = diamonds) + geom_bar(mapping = aes(x = color, fill = cut)) +
facet_wrap(~cut)
To facet your plot with two variables, use the facet_grid function. Facet_grid will split the
plot into facets vertically by the values of the first variable and horizontally by the values
of the second variable. For example, we can take our penguins plot and use
facet_grid with the two variables, sex and species. In the parentheses following the
facet_grid function, we write sex, then the tilde symbol, then species. Let's run the code.
ggplot(data = penguins) + geom_point(mapping = aes(x = flipper_length_mm, y =
body_mass_g, color = species)) + facet_grid(sex~species)
There are nine separate plots, each based on a combination of the three species of
penguin and three categories of sex (female, male, and NA for penguins whose sex was not recorded). Facet_grid lets you quickly reorganize and display
complex data and makes it easier to spot relationships between different groups.
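The third category of sex comes from missing values. As a sketch, we can count them and, if we prefer a grid with only female and male rows, drop those penguins before plotting. This assumes the penguins data frame comes from the palmerpenguins package; known_sex and grid_plot are illustrative names.

```r
# Assumes ggplot2 and palmerpenguins are installed.
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

# How many penguins fall into each sex category, including missing values?
table(penguins$sex, useNA = "ifany")

# Keep only penguins with a recorded sex, then facet as before.
known_sex <- subset(penguins, !is.na(sex))
grid_plot <- ggplot(data = known_sex) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species)) +
  facet_grid(sex ~ species)
```

With the missing values removed, the grid should have two rows instead of three.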
If we want, we can focus our plot on only one of the two variables. For example, we can
tell R to remove sex from the vertical dimension of the plot and just show species. Let's
check it out.
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  facet_grid(~species)
You can easily spot differences in the relationship between flipper length and body mass
between the three species. In the same way, we can focus our plot on sex instead of
species.
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  facet_grid(~sex)
Facets let you reorganize your data to show specific relationships between variables and
reveal important patterns and trends in subsets of your data.
Creating a plot with rotated x-axis labels (using a hotel_bookings data frame):
ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = distribution_channel)) +
  facet_wrap(~deposit_type) +
  theme(axis.text.x = element_text(angle = 45))
Filtering and plots
By this point you have likely downloaded at least a few packages into your R library. The
tools in some of these packages can actually be combined and used together to become
even more useful. This reading will share a few resources that will teach you how to use
the filter function from dplyr to make the plots you create with ggplot2 easier to read.
Example of filtering data for plotting
Filtering your data before you plot it allows you to focus on specific subsets of your data
and gain more targeted insights. To do this, pipe your data through the dplyr filter() function
before passing it to ggplot().
Example code
data %>%
  filter(variable1 == "DS") %>%
  ggplot(aes(x = weight, y = variable2, colour = variable1)) +
  geom_point(alpha = 0.3, position = position_jitter()) +
  stat_smooth(method = "lm")
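The example above uses placeholder names (data, variable1, variable2). Here is a sketch of the same pattern with the penguins data, keeping only the Gentoo penguins before plotting. It assumes dplyr, ggplot2, and palmerpenguins are installed; gentoo_plot is an illustrative name.

```r
# Assumes dplyr, ggplot2, and palmerpenguins are installed.
library(dplyr)
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

gentoo_plot <- penguins %>%
  filter(species == "Gentoo") %>%          # keep one species only
  ggplot(aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(alpha = 0.3, position = position_jitter()) +
  stat_smooth(method = "lm")               # straight-line trend
```

Note that the pipe (%>%) connects the data steps, while the plus sign connects the ggplot2 layers.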
Additional resources
To learn more details about ggplot2 and filtering with dplyr, check out these resources:
● Putting it all together: (dplyr+ggplot): The RLadies of Sydney’s course on R
uses real data to demonstrate R functions. This lesson focuses specifically on
combining dplyr and ggplot to filter data before plotting it. The instructional video
will guide you through every step in the process while you follow along with the
data they have provided.
● Data transformation: This resource focuses on how to use the filter() function in
R, and demonstrates how to combine filter() with ggplot(). This is a useful resource
if you are interested in learning more about how filter() can be used before plotting.
● Visualizing data with ggplot2: This comprehensive guide includes everything
from the most basic uses for ggplot2 to creating complicated visualizations. It
includes the filter() function in most of the examples so you can learn how to
implement it in R to create data visualizations.
Chart title
To add a title to the chart, use a label function: title = Average product rating.
Blue and yellow bars
To highlight underperforming products, use an aesthetics function: col = ifelse(x < 2, 'blue', 'yellow').
Bar chart
To create the bars on the chart, use a geom function: geom_bar().
Trend line
To create a trend line, use a geom function: geom_smooth().
Scatter plot chart
To create the scatter plot, use a geom function: geom_point().
Compare data
To compare data trends across average ratings, use a facets function: facet_wrap(~Average Rating)
Axis labels
To label the axes, use an aesthetics function: aes(x = Average price (USD), y = Product)
Labels and Annotations
Annotate means to add notes to a document or diagram to explain or comment upon it.
In ggplot2, adding annotations to your plot can help explain the plot's purpose or highlight
important data.
Labels
Labels include titles, subtitles, and captions. To add a title to our plot that shows the
relationship between body mass and flipper length for the three penguin species, we use
the following code:
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass VS. Flipper Length")
R automatically displays the title at the top of the plot.
We can also add a subtitle to our plot to highlight important information about our data by
using the following code:
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass VS. Flipper Length",
       subtitle = "Sample of Three Penguin Species")
R automatically displays the subtitle just below the title.
We can add a caption to our plot in the same way. Captions let us show the source of our
data. The Palmer Penguins data was collected from 2007 to 2009 by Dr. Kristen Gorman,
a member of the Palmer Station Long Term Ecological Research program. Let's cite Dr.
Gorman in our caption.
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass VS. Flipper Length",
       subtitle = "Sample of Three Penguin Species",
       caption = "Data Collected by Dr. Kristen Gorman")
R automatically displays the caption at the bottom right of our plot.
Annotations
Titles, subtitles, and captions are labels that we put outside of the grid of our plot to
indicate important information. If we want to put text inside the grid to call out specific data
points, we can use the annotate function. For example, let's say we want to highlight the
data from the Gentoo penguins. We can use the annotate function to add some text
next to the data points that refer to the Gentoos. This text will clearly communicate what
the plot shows and reinforce an important part of our data.
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass VS. Flipper Length",
       subtitle = "Sample of Three Penguin Species",
       caption = "Data Collected by Dr. Kristen Gorman") +
  annotate("text", x = 220, y = 3500, label = "The Gentoos are the Largest")
In the parentheses of the annotate function, we've got information on the type of label,
the specific location of the label and the content of the label. In this case, we want to write
a text label. We also want to place it near the Gentoo data points. Let's put it at the
following coordinates: x-axis equals 220 millimeters and y-axis equals 3,500 grams.
R automatically places the text label on the correct coordinates in our plot.
We can customize our annotation even more. Let's say we want to change the color of
our text. Well, we can add color = followed by the name of the color in quotation marks. Let's try purple.
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass VS. Flipper Length",
       subtitle = "Sample of Three Penguin Species",
       caption = "Data Collected by Dr. Kristen Gorman") +
  annotate("text", x = 220, y = 3500, label = "The Gentoos are the Largest", color = "purple")
We can also change the font style and size of our text. Use fontface and size to write the
code. Let's bold our text and make it a little larger.
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass VS. Flipper Length",
       subtitle = "Sample of Three Penguin Species",
       caption = "Data Collected by Dr. Kristen Gorman") +
  annotate("text", x = 220, y = 3500, label = "The Gentoos are the Largest",
           color = "purple", fontface = "bold", size = 4.5)
We can even change the angle of our text. For example, we can tilt our text at a 25 degree
angle to line it up with our data points.
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass VS. Flipper Length",
       subtitle = "Sample of Three Penguin Species",
       caption = "Data Collected by Dr. Kristen Gorman") +
  annotate("text", x = 220, y = 3500, label = "The Gentoos are the Largest",
           color = "purple", fontface = "bold", size = 4.5, angle = 25)
Store your Plot as a Variable in R
That looks great. By this point, our code is getting pretty long. If you want to use less
code, you can store your plot as a variable in R. As a quick reminder, to create a variable
in R you type the variable name, then the assignment operator <- (a less-than sign followed by a dash). Let's try it with
the variable name p.
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass VS. Flipper Length",
       subtitle = "Sample of Three Penguin Species",
       caption = "Data Collected by Dr. Kristen Gorman")
Now, instead of writing all the code again, we can just call p and add an annotation to it
like this:
p + annotate("text", x = 220, y = 3500, label = "The Gentoos are the Largest")
Adding annotations in R
Annotations are a useful way to add notes to your plot. They help you explain the plot’s
purpose, highlight important data points, or comment on any data trends or findings the
plot illustrates. You have already learned how to add notes as labels, titles, subtitles,
and captions. You can also draw arrows or add shapes to your plot to create more
emphasis. Usually you add these kinds of annotations in your presentation application
after you have saved the visualizations. But, you can now add lines, arrows, and shapes
to your plots using ggplot2.
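As a sketch of what that can look like, the annotate function also accepts "rect" and "segment" as the type of annotation, and a segment can carry an arrowhead. The coordinates below are made-up values chosen to sit near the Gentoo points, not values from the course. The example assumes the penguins data frame comes from the palmerpenguins package; annotated is an illustrative name.

```r
# Assumes ggplot2 and palmerpenguins are installed.
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

annotated <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species)) +
  # Shape: a lightly shaded rectangle around a region of the plot
  # (hypothetical coordinates).
  annotate("rect", xmin = 205, xmax = 232, ymin = 3900, ymax = 6400,
           alpha = 0.1, fill = "purple") +
  # Arrow: a line segment with an arrowhead at its end point.
  annotate("segment", x = 195, xend = 204, y = 6000, yend = 5700,
           arrow = arrow(length = unit(0.3, "cm"))) +
  # Text placed at the start of the arrow.
  annotate("text", x = 190, y = 6100, label = "Gentoos")
```

Each annotate call adds one more layer, so the annotations can be added or removed without touching the geom_point layer.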
Resources
Check out these resources to learn more:
● Create an annotation layer: This guide explains how to add an annotation layer
with ggplot2. It includes sample code and data visualizations with annotations
created in ggplot2.
● How to annotate a plot in ggplot2: This resource includes explanations about
how to add different kinds of annotations to your ggplot2 plots, and is a great
reference if you need to quickly look up a specific kind of annotation.
● Annotations: Chapter eight of the online ggplot2 textbook is focused entirely on
annotations. It provides in-depth explanations of the different types of
annotations, how they are used, and detailed examples.
● How to annotate a plot: This R-Bloggers article includes explanations about
how to annotate plots in ggplot2. It starts with basic concepts and covers more
complicated information the further on you read.
● Text Annotations: This resource focuses specifically on adding text annotations
and labels to ggplot2 visualizations.
Saving your visualizations
To save a plot we'll use the Export option in the plots tab of RStudio or the ggsave
function provided by the ggplot2 package.
The Export option appears in the Plots tab in RStudio, where you can save the plot either
as an image or as a PDF.
ggsave is a useful function for saving a plot. It defaults to saving the last plot that you
displayed and uses the size of the current graphics device. Ggsave will automatically
save the plot that shows the relationship between body mass and flipper length because
this is the last plot that we displayed. We have to give the file a name and say what kind
of file we want to save it as. Let's write the code.
ggsave("Three Penguins Species.png")
Now, if we click on the files tab, we'll find our new file in the list.
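Relying on "the last plot displayed" can be fragile in a longer script. As a sketch, we can instead pass a stored plot to ggsave and state the size explicitly. The file name, the dimensions, and the use of a temporary folder below are illustrative choices, and the penguins data frame is assumed to come from the palmerpenguins package.

```r
# Assumes ggplot2 and palmerpenguins are installed.
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data frame

penguin_plot <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species))

# Save to a temporary folder so the example does not clutter the project.
out_file <- file.path(tempdir(), "three_penguin_species.png")

# plot = names the plot to save; width, height, units and dpi fix its size.
ggsave(filename = out_file, plot = penguin_plot,
       width = 7, height = 5, units = "in", dpi = 300)

file.exists(out_file)
```

The file extension in the name tells ggsave which format to write, so changing ".png" to ".pdf" should save a PDF instead.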
Saving images without ggsave()
In most cases, ggsave() is the simplest way to save your plot. But there are situations
when it might be best to save your plot by writing it directly to a graphics device. This
reading will cover some of the different ways you can save images and plots without
ggsave(), and includes additional resources to check out if you want to learn more.
A graphics device allows a plot to appear on your computer. Examples include:
● A window on your computer (screen device)
● A PDF, PNG, or JPEG file (file device)
● An SVG, or scalable vector graphics file (file device)
When you make a plot in R, it has to be “sent” to a specific graphics device. To save
images without using ggsave(), you can open an R graphics device like png() or pdf();
these will allow you to save your plot as a .png or .pdf file. You can also choose to print
the plot and then close the device using dev.off().
Example of using png()
png(file = "exampleplot.png", bg = "transparent")
plot(1:10)
rect(1, 5, 3, 7, col = "white")
dev.off()
Example of using pdf()
pdf(file = "/Users/username/Desktop/example.pdf", width = 4, height = 4)
plot(x = 1:10, y = 1:10)
abline(v = 0)
text(x = 0, y = 1, labels = "Random text")
dev.off()
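The same open, draw, close pattern also works for a ggplot2 plot. The sketch below assumes the built-in mtcars data and a hypothetical file name in a temporary folder; the one difference from base R plotting is that the ggplot object is drawn with print().

```r
library(ggplot2)

# Build the plot first; nothing is drawn yet.
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()

# Hypothetical output file in a temporary folder.
device_file <- file.path(tempdir(), "device_example.pdf")

# 1. Open a file device (size is in inches for pdf()).
pdf(file = device_file, width = 6, height = 4)

# 2. Draw the plot on the open device. Inside functions, loops, or
#    sourced scripts a ggplot is not drawn unless you print it.
print(p)

# 3. Close the device so the file is finished and written to disk.
dev.off()
```

If you forget dev.off(), the device stays open and later plots keep going to the file instead of the Plots tab.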
To learn more about the different processes for saving images, check out these
resources:
● Saving images without ggsave(): This resource is pulled directly from the
ggplot2 documentation at tidyverse.org. It explores the tools you can use to
save images in R, and includes several examples to follow along with and learn
how to save images in your own R workspace.
● How to save a ggplot: This resource covers multiple different methods for
saving ggplots. It also includes copyable code with explanations about how each
function is being used so that you can better understand each step in the
process.
● Saving a plot in R: This guide covers multiple file formats that you can use to
save your plots in R. Each section includes an example with an actual plot that
you can copy and use for practice in your own R workspace.
Hands-On Activity: Annotating and saving visualizations
You also want to add another detail about what time period this data covers. To do this,
you need to find out when the data is from.
You realize you can use the `min()` function on the year column in the data:
```{r earliest year}
min(hotel_bookings$arrival_date_year)
```
And the `max()` function:
```{r latest year}
max(hotel_bookings$arrival_date_year)
```
But you will need to save them as variables in order to easily use them in your labeling;
the following code chunk creates two of those variables:
```{r latest date}
mindate <- min(hotel_bookings$arrival_date_year)
maxdate <- max(hotel_bookings$arrival_date_year)
```
Now, you will add in a subtitle using `subtitle=` in the `labs()` function. Then, you can
use the `paste0()` function to use your newly-created variables in your labels. This is
really handy, because if the data gets updated and there is more recent data added,
you don't have to change the code below because the variables are dynamic:
```{r city bar chart with time frame}
ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = market_segment)) +
  facet_wrap(~hotel) +
  theme(axis.text.x = element_text(angle = 45)) +
  labs(title = "Comparison of market segments by hotel type for hotel bookings",
       subtitle = paste0("Data from: ", mindate, " to ", maxdate))
```
```{r city bar chart with time frame as caption}
ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = market_segment)) +
  facet_wrap(~hotel) +
  theme(axis.text.x = element_text(angle = 45)) +
  labs(title = "Comparison of market segments by hotel type for hotel bookings",
       caption = paste0("Data from: ", mindate, " to ", maxdate))
```
```{r city bar chart with x and y axis}
ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = market_segment)) +
  facet_wrap(~hotel) +
  theme(axis.text.x = element_text(angle = 45)) +
  labs(title = "Comparison of market segments by hotel type for hotel bookings",
       caption = paste0("Data from: ", mindate, " to ", maxdate),
       x = "Market Segment",
       y = "Number of Bookings")
```
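If you do not have the hotel_bookings data loaded, the sketch below shows the same dynamic-label idea on a small made-up data frame. The toy_bookings data and its values are hypothetical; only the column names mirror the activity.

```r
library(ggplot2)

# Hypothetical toy data; only the column names mirror hotel_bookings.
toy_bookings <- data.frame(
  arrival_date_year = c(2015, 2016, 2016, 2017, 2017, 2017),
  market_segment = c("Direct", "Online TA", "Direct",
                     "Groups", "Online TA", "Direct")
)

# Store the earliest and latest year so the label updates with the data.
mindate <- min(toy_bookings$arrival_date_year)
maxdate <- max(toy_bookings$arrival_date_year)

# paste0() joins its arguments with no separator between them.
time_label <- paste0("Data from: ", mindate, " to ", maxdate)
time_label
# "Data from: 2015 to 2017"

# Use the stored label as the caption of a bar chart.
toy_plot <- ggplot(data = toy_bookings) +
  geom_bar(mapping = aes(x = market_segment)) +
  labs(title = "Bookings by market segment (toy data)",
       caption = time_label)

toy_plot
```

If a row for a later year were added to the data frame, rerunning the code would update the caption without any change to the labs() call.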