Visualizations in R

Visualization basics in R and tidyverse

Different visualization packages in R

Base R has its own plotting functions, and there are many other useful packages you can add. They help you do almost anything you want with your data, from making simple pie charts to creating more complex visuals like interactive graphs and maps. Some of the most popular packages include:
● ggplot2
● Plotly (offers a wide range of visualization functions)
● Lattice
● RGL (focuses on specific solutions like 3D visuals)
● Dygraphs
● Leaflet
● Highcharter
● Patchwork
● gganimate
● ggridges

ggplot2 (R visualization package)

ggplot2 is the most popular visualization package in R, and many data analysts prefer it. You can use ggplot2 on its own or extend its powers with other packages. ggplot2 was originally created by the statistician and developer Hadley Wickham in 2005. Wickham's inspiration came from the 1999 book The Grammar of Graphics, a scholarly study of data visualization by computer scientist Leland Wilkinson. The first two letters of ggplot2 stand for "grammar of graphics." In the same way that the grammar of a human language gives us rules to build any kind of sentence, the grammar of graphics gives us rules to build any kind of visual.

Benefits of ggplot2
● Create different types of plots, including scatter plots, bar charts, line diagrams, and many more.
● Customize the look and feel of plots (change the colors, layout, and dimensions of your plots, and add text elements like titles, captions, and labels).
● Create high-quality visuals.
● Combine data manipulation and visualization using the pipe operator.
● Add or remove layers of detail in your plot without changing its basic structure or the underlying data.

Core Concepts in ggplot2
● Aesthetics: an aesthetic is a visual property of an object in your plot. For example, in a scatter plot, aesthetics include things like the size, shape, or color of your data points. Think of an aesthetic as a connection, or mapping, between a visual feature in your plot and a variable in your data.
● Geoms: a geom is the geometric object used to represent your data. For example, you can use points to create a scatter plot, bars to create a bar chart, or lines to create a line diagram.
● Facets: facets let you display smaller groups, or subsets, of your data. With facets, you can create a separate plot for each value of a variable in your dataset.
● Labels and Annotations: label and annotate functions let you customize your plot. You can add text like titles, subtitles, and captions to communicate the purpose of your plot or highlight important data.

The ggplot2 cheat sheet

RStudio has a useful reference guide called the "Data Visualization with ggplot2 Cheat Sheet." You can use it as a quick reference while you work to learn about the main functions and features of ggplot2. Click the link to check it out: Cheat Sheet. For a longer introduction, see these chapters of R for Data Science:
https://r4ds.had.co.nz/data-visualisation.html
https://r4ds.had.co.nz/graphics-for-communication.html

Getting started with ggplot()

```r
library(ggplot2)
library(palmerpenguins)  # provides the penguins dataset

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
```

The code uses functions from ggplot2 to plot the relationship between body mass and flipper length. In R, a function's name is followed by a set of parentheses. Many functions need extra information to do their jobs. You write this information, called the function's arguments, inside the parentheses. The three functions in the code are `ggplot()`, `geom_point()`, and `aes()`.

Every ggplot2 plot starts with the `ggplot()` function. Its argument tells R what data to use for your plot, so the first thing to do is choose a data frame to work with. Inside the parentheses, write `data`, then an equal sign, then `penguins`. This code initializes, or starts, the plot.
This is just the first step in creating a plot. The next thing you might notice is the plus sign at the end of the first line. You use the plus sign to add layers to your plot: in ggplot2, plots are built through combinations of layers. First, we start with our data. Then we add a layer by choosing a geom to represent our data. Keep in mind that the plus sign must be placed at the end of a line, not at the start of the next one, to add a layer.

Adding a geom function is the second step in creating a plot. As a reminder, a geom is a geometric object used to represent your data; geoms include points, bars, lines, and more. In our code, `geom_point()` tells R to use points and create a scatter plot.

Next, we need to choose specific variables from our dataset and tell R how we want these variables to look in our plot. In ggplot2, the way a variable looks is called its aesthetic: a visual property of an object in your plot, like its position, color, shape, or size. The `mapping = aes()` part of the code tells R which aesthetics to use. You use the `aes()` function to define the mapping between your data and your plot. Mapping means matching up a specific variable in your dataset with a specific aesthetic. For example, you can map a variable to the x-axis or the y-axis of your plot. In a scatter plot, you can also map a variable to the color, size, and shape of your data points.

Mapping aesthetics to variables is the third step in creating a plot. In our code, we map the variable flipper length to the x-axis and the variable body mass to the y-axis. Inside the parentheses of `aes()`, we write the name of the aesthetic, then the equal sign, then the name of the variable. Run the code, and a scatter plot appears showing the relationship between flipper length and body mass of penguins.
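The three steps above can be run one at a time to see what each one adds. This is a minimal sketch, assuming the palmerpenguins package is installed (it supplies the `penguins` data frame). The names `base`, `p`, and `pts` are illustrative, and `layer_data()` is used only to peek at the values ggplot2 computed for the geom layer.

```r
library(ggplot2)
library(palmerpenguins)  # provides the penguins data frame

# Step 1: choose the data. On its own this draws an empty canvas.
base <- ggplot(data = penguins)

# Steps 2 and 3: add a geom layer with + and map variables to x and y inside aes()
p <- base +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

print(p)  # draws the scatter plot; rows with missing values are dropped with a warning

# layer_data() returns the computed data for layer 1,
# with the mapped values stored in the x and y columns
pts <- layer_data(p, 1)
head(pts[, c("x", "y")])
```

Keeping the base plot in its own variable makes it easy to try different geoms on the same data without retyping step 1.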
To learn more about any R function, run `?` followed by the function name. For example, to learn more about `geom_point()`, type:

```r
?geom_point
```

As a new learner, you might not understand all the concepts on the help page. At the bottom of the page, you can find specific examples of code that may show you how to solve your problem.

Steps to create a plot
1. Start with the `ggplot()` function and choose a dataset to work with.
2. Add a geom function to display your data.
3. Map the variables you want to plot in the argument of the `aes()` function.

We can also turn our code into a reusable template for creating plots in ggplot2: `ggplot(data = <DATA>) + <GEOM_FUNCTION>(mapping = aes(<AESTHETIC MAPPINGS>))`. To make a plot, replace the bracketed sections with a dataset, a geom function, and a group of aesthetic mappings. We can make all kinds of different plots using this template.

Aesthetics in ggplot2

ggplot2 is an R package that allows you to create different types of data visualizations right in your R workspace. In ggplot2, an aesthetic is defined as a visual property of an object in your plot. Three commonly used aesthetic attributes in ggplot2 are:
● Color: changes the color of all the points on your plot, or the color of each data group.
● Size: changes the size of the points on your plot by data group.
● Shape: changes the shape of the points on your plot by data group.

Enhancing visualizations in R

In a scatter plot, we can make a point small, triangular, or blue, or a combination of these. Let's go back to our penguins dataset and review the code for our plot that shows the relationship between body mass and flipper length. We can map data to other aesthetics, like color, size, and shape. Right now, the plot of penguins data is in black and white. It clearly shows the positive relationship between the two variables: as the values on the x-axis increase, the values on the y-axis increase. But it also has some limitations.
For example, we can't tell which data points refer to each of the three penguin species. To solve this problem, we can add a third variable to our scatter plot by mapping it to a new aesthetic. We'll map the variable species to the aesthetic color: inside the parentheses of `aes()`, add a comma after the body mass variable and type `color = species`. This tells R to assign a different color to each species of penguin. Let's check it out.

```r
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species))
```

The Gentoo are the largest of the three penguin species. The legend to the right of the plot shows that the blue points refer to the Gentoo penguins. R not only applies a different color to each species automatically, it also adds a legend that explains the color-coding.

We can also use shape to highlight the different penguin species. To map species to the aesthetic shape, change `color = species` to `shape = species`. Instead of colored points, R assigns a different shape to each species.

```r
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, shape = species))
```

Now the legend shows a circle for the Adelie species, a triangle for the Chinstraps, and a square for the Gentoos. The plot is in black and white again because we removed the code for color.

Let's put some color back into our plot. We can map more than one aesthetic to the same variable. To map both color and shape to species, add `color = species` while keeping `shape = species`.

```r
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, shape = species, color = species))
```

Now our plot shows a different color and a different shape for each species.
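When color and shape are mapped to the same variable, ggplot2 combines them into a single legend. The sketch below adds one extra step, assuming you want a palette that is easier for color-blind viewers to tell apart: `scale_color_viridis_d()` swaps the default hues for the discrete viridis palette. `layer_data()` is used only to check that each species received one color and one shape.

```r
library(ggplot2)
library(palmerpenguins)

# Map both color and shape to species, then swap in the viridis palette,
# which is designed to stay distinguishable for color-blind viewers
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species, shape = species)) +
  scale_color_viridis_d()

# layer_data() returns the values ggplot2 computed for each point
pts <- layer_data(p, 1)
table(pts$colour, pts$shape)  # each color appears with exactly one shape
```

Pairing a color with a shape means a viewer who cannot distinguish the colors can still tell the species apart.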
Let's add size as well and map three aesthetics to species. If we add `size = species`, each colored shape will also be a different size.

```r
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, shape = species, color = species, size = species))
```

Because species is a categorical variable, ggplot2 warns that using size for a discrete variable is not advised, but it still draws the plot. Using more than one aesthetic can also make your visuals more accessible, because it gives your viewers more than one way to make sense of your data.

We can also map species to the alpha aesthetic, which controls the transparency of the points. Our first plot showed the relationship between body mass and flipper length in black and white. Then we mapped the variable species to the aesthetic color to show the difference between the three penguin species. If we want to keep our graph in black and white, we can map alpha to species instead. This makes some points more transparent, or see-through, than others, which gives us another way to represent each penguin species.

```r
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, alpha = species))
```

Alpha is a good option when you have a dense plot with lots of data points.

You can also set an aesthetic independently of any variable. Let's say we want to change the color of all the points to purple. Here we don't want to map color to a variable like species; we just want every point in our scatter plot to be purple. So we put the new piece of code outside the `aes()` function and use quotation marks around the color value. This is because everything inside `aes()` tells R how to map aesthetics to variables, for example, mapping the aesthetic color to the variable species. To change the appearance of the overall plot without regard to specific variables, we write code outside of `aes()`. Let's write the code and run it.
```r
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g), color = "purple")
```

Additional resources
For more information about aesthetic attributes, check out these resources:
● Data visualization with ggplot2 cheat sheet: RStudio's cheat sheet is a great reference to use while working with ggplot2. It has lots of helpful information, including explanations of how to use geoms and examples of the different visualizations that you can create.
● Stats Education's Introduction to R: This resource is a great way to learn the basics of ggplot2 and how to apply aesthetic attributes to your plots. You can return to this tutorial as you work more with ggplot2 and your own data.
● RDocumentation aes function: This guide describes the syntax of the aes function and explains what each argument does.

Create Different Graphs/Plots Using Different Geom Functions

There are lots of different geoms available. You can choose a specific geom based on how you want to represent your data and your goals for communicating it. This lets you tell the story of your data in different ways and communicate effectively with different audiences.

In ggplot2, a geom is the geometric object used to represent your data. Geoms include points, bars, lines, and more. The `geom_point()` function uses points to create scatter plots, the `geom_bar()` function uses bars to create bar charts, and so on. To change the geom in our plot, we change the geom function in our code.

To create a scatter plot, we use the following code:

```r
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
```

To create a smooth line chart, we use the following code:

```r
ggplot(data = penguins) +
  geom_smooth(mapping = aes(x = flipper_length_mm, y = body_mass_g))
```

We still have the same data, but now it has a different visual appearance. Instead of points, there's a smooth line that fits the data.
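When no method is given, `geom_smooth()` chooses a smoother based on the size of the data (loess for fewer than 1,000 observations, which covers the penguins data) and prints a message naming the method and formula it used. This is a minimal sketch, assuming a straight-line fit is what you want: setting `method = "lm"` and `formula = y ~ x` fits a linear model and silences that message. The names `p` and `fit` are illustrative.

```r
library(ggplot2)
library(palmerpenguins)

# Fit a straight line (linear model) instead of the default loess curve;
# giving method and formula explicitly also silences the "using method = ..." message
p <- ggplot(data = penguins) +
  geom_smooth(mapping = aes(x = flipper_length_mm, y = body_mass_g),
              method = "lm", formula = y ~ x)

# The computed layer holds the fitted line (x, y) and its confidence band (ymin, ymax),
# evaluated at 80 evenly spaced x values by default
fit <- layer_data(p, 1)
head(fit[, c("x", "y", "ymin", "ymax")])
```

The gray band around the line is the confidence interval; add `se = FALSE` to `geom_smooth()` if you only want the line.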
The `geom_smooth()` function is useful for showing general trends in our data. The line clearly shows the positive relationship between body mass and flipper length: the heavier the penguin, the longer the flipper.

We can even use two geoms in the same plot. Let's say we want to show the relationship between the trend line and the data points more clearly. We can combine the code for `geom_point()` and `geom_smooth()` by adding a plus sign after `geom_smooth()`. Let's write the code and run it.

```r
ggplot(data = penguins) +
  geom_smooth(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
```

Now we want to plot a separate line for each species of penguin. We can add the linetype aesthetic to our code and map it to the variable species.

```r
ggplot(data = penguins) +
  geom_smooth(mapping = aes(x = flipper_length_mm, y = body_mass_g, linetype = species))
```

`geom_smooth()` draws a separate line, with a different line type, for each species of penguin. The legend shows how each line type matches each species, and the plot clearly shows the trend for each species.

The `geom_jitter()` function creates a scatter plot and then adds a small amount of random noise to each point. Jittering helps us deal with overplotting, which happens when data points in a plot overlap each other, and makes the points easier to see. To jitter our plot, we run the following code:

```r
ggplot(data = penguins) +
  geom_jitter(mapping = aes(x = flipper_length_mm, y = body_mass_g))
```

Bar Charts with ggplot2

For bar charts, we use the diamonds dataset, which includes data like the quality, clarity, and cut for over 50,000 diamonds. This dataset comes with the ggplot2 package, so it's available as soon as ggplot2 is loaded. To make a bar chart, we use the `geom_bar()` function. Let's write some code that plots a bar chart of the variable cut in the diamonds dataset.
```r
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
```

Cut refers to a diamond's proportions, symmetry, and polish. Notice that we didn't supply a variable for the y-axis. When you use `geom_bar()`, R automatically counts how many times each x-value appears in the data and then shows the counts on the y-axis; the default for `geom_bar()` is to count rows. In our plot, the x-axis shows five categories of cut quality: fair, good, very good, premium, and ideal. The y-axis shows the number of diamonds in each category.

`geom_bar()` uses several aesthetics that you're already familiar with, such as color, size, and alpha. Let's add the color aesthetic to our plot and map it to the variable cut.

```r
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, color = cut))
```

The color aesthetic adds color to the outline of each bar, and R supplies a legend to show the color-coding. Let's say we want to highlight the difference between cuts even more clearly to make our plot easier to understand. We can use the fill aesthetic to add color to the inside of each bar.

```r
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut))
```

R automatically chooses the colors and supplies a legend. If we map fill to a different variable, `geom_bar()` displays what's called a stacked bar chart. Let's map fill to clarity instead of cut.

```r
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity))
```

Our plot now shows 40 different combinations of cut and clarity, each with its own colored rectangle. The rectangles that have the same cut value are stacked on top of each other in each bar. The plot organizes the complex data: we can see the difference in volume between cuts and figure out the difference in clarity within each cut.

Aesthetics and facets

Facet functions let you display smaller groups, or subsets, of your data. A facet is a side or section of an object, like the sides of a gemstone.
Facets show different sides of your data by placing each subset on its own plot. Faceting can help you discover new patterns in your data and focus on relationships between different variables. For example, let's say you're looking at sales data for a clothing company. You might want to break down your data by category to show specific trends: children's clothing versus adult clothing, or spring fashions versus fall fashions. Or, if you are running an employee engagement survey, you might want to break down your data by tenure and compare senior employees to new employees.

ggplot2 has two functions for faceting:
● facet_wrap()
● facet_grid()

To facet your plot by a single variable, use `facet_wrap()`. Let's say we want to focus on the data for each species of penguin. Take our plot that shows the relationship between body mass and flipper length in each penguin species. The `facet_wrap()` function lets us create a separate plot for each species. To add it as a new layer, we add a plus sign to our code.

```r
ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species)) +
  facet_wrap(~ species)
```

The separate plots show the relationship between body mass and flipper length within each species of penguin. Facets help us focus on important parts of our data that we might not notice in a single plot. If your visual is too busy, for example, if it has too many variables or too many levels within variables, faceting can be a good option.

The tilde (~) operator is used to define the relationship between the dependent variable and the independent variables in a statistical model formula. The variable on the left-hand side of the tilde is the dependent variable, and the variable(s) on the right-hand side are the independent variable(s). So, the tilde operator defines that the dependent variable depends on the independent variable(s) on its right-hand side.
(Retrieved from tutorialspoint.com.) In ggplot2's facet functions, the formula is used a little differently: `facet_wrap(~ species)` has nothing on the left-hand side and simply names the variable whose values define the separate panels.

Let's try faceting the diamonds dataset. Earlier, we made a bar chart that showed the number of diamonds for each category of cut: fair, good, very good, premium, and ideal. We can use `facet_wrap()` on the cut variable to create a separate plot for each category of cut. In this version, the x-axis of each panel shows diamond color, and the bars are filled by cut.

```r
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = color, fill = cut)) +
  facet_wrap(~ cut)
```

To facet your plot by two variables, use the `facet_grid()` function. `facet_grid()` splits the plot into facets vertically by the values of the first variable (one row per value) and horizontally by the values of the second variable (one column per value). For example, we can take our penguins plot and use `facet_grid()` with the two variables sex and species. In the parentheses of `facet_grid()`, we write sex, then the tilde symbol, then species. Let's run the code.

```r
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  facet_grid(sex ~ species)
```

There are nine separate plots, each based on a combination of the three species of penguin and three categories of sex (female, male, and missing values shown as NA). `facet_grid()` lets you quickly reorganize and display complex data and makes it easier to spot relationships between different groups.

If we want, we can focus our plot on only one of the two variables. For example, we can tell R to remove sex from the vertical dimension of the plot and just show species. Let's check it out.

```r
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  facet_grid(~ species)
```

You can easily spot differences in the relationship between flipper length and body mass between the three species. In the same way, we can focus our plot on sex instead of species.
```r
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  facet_grid(~ sex)
```

Facets let you reorganize your data to show specific relationships between variables and reveal important patterns and trends in subsets of your data.

The following chunk applies the same ideas to the hotel_bookings data: it facets a bar chart of distribution channels by deposit type and rotates the x-axis text 45 degrees with `theme()`.

```{r creating a plot with rotated labels}
ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = distribution_channel)) +
  facet_wrap(~deposit_type) +
  theme(axis.text.x = element_text(angle = 45))
```

Filtering and plots

By this point, you have likely downloaded at least a few packages into your R library. The tools in some of these packages can be combined to become even more useful. This reading shares a few resources that will teach you how to use the `filter()` function from dplyr to make the plots you create with ggplot2 easier to read.

Example of filtering data for plotting

Filtering your data before you plot it allows you to focus on specific subsets of your data and gain more targeted insights. To do this, include the dplyr `filter()` function in your ggplot syntax by piping the filtered data into `ggplot()`.

Example code (data, variable1, variable2, and weight are placeholder names):

```r
data %>%
  filter(variable1 == "DS") %>%
  ggplot(aes(x = weight, y = variable2, colour = variable1)) +
  geom_point(alpha = 0.3, position = position_jitter()) +
  stat_smooth(method = "lm")
```

Additional resources
To learn more about ggplot2 and filtering with dplyr, check out these resources:
● Putting it all together (dplyr+ggplot): The R-Ladies Sydney course on R uses real data to demonstrate R functions. This lesson focuses specifically on combining dplyr and ggplot to filter data before plotting it. The instructional video guides you through every step in the process while you follow along with the data they provide.
● Data transformation: This resource focuses on how to use the filter() function in R and demonstrates how to combine filter() with ggplot(). It is useful if you want to learn more about how filter() can be used before plotting.
● Visualizing data with ggplot2: This comprehensive guide covers everything from the most basic uses of ggplot2 to complicated visualizations. It uses the filter() function in most of its examples, so you can learn how to apply it when creating data visualizations in R.

Which function creates each part of a chart:
● Chart title: to add a title to the chart, use a label function: `labs(title = "Average product rating")`.
● Blue and yellow bars: to highlight underperforming products, use an aesthetics function: `col = ifelse(x < 2, 'blue', 'yellow')`.
● Bar chart: to create the bars on the chart, use a geom function: `geom_bar()`.
● Trend line: to create a trend line, use a geom function: `geom_smooth()`.
● Scatter plot chart: to create the scatter plot, use a geom function: `geom_point()`.
● Compare data: to compare data trends across average ratings, use a facets function: ``facet_wrap(~ `Average Rating`)`` (column names that contain spaces must be wrapped in backticks).
● Axis labels: to put variables on the axes, use an aesthetics function: ``aes(x = `Average price (USD)`, y = Product)``. The axis titles default to the mapped column names; `labs(x = ..., y = ...)` renames them.

Labels and Annotations

To annotate means to add notes to a document or diagram to explain or comment on it. In ggplot2, adding annotations to your plot can help explain the plot's purpose or highlight important data.

Labels

Labels include titles, subtitles, and captions. To add a title to our plot that shows the relationship between body mass and flipper length for the three penguin species, we use the following code:

```r
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass VS. Flipper Length")
```

R automatically displays the title at the top of the plot. We can also add a subtitle to highlight important information about our data by using the following code:

```r
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass VS. Flipper Length",
       subtitle = "Sample of Three Penguin Species")
```

R automatically displays the subtitle just below the title. We can add a caption to our plot in the same way. Captions let us show the source of our data. The Palmer penguins data was collected from 2007 to 2009 by Dr. Kristen Gorman, a member of the Palmer Station Long Term Ecological Research program. Let's cite Dr. Gorman in our caption.

```r
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass VS. Flipper Length",
       subtitle = "Sample of Three Penguin Species",
       caption = "Data Collected by Dr. Kristen Gorman")
```

R automatically displays the caption at the bottom right of our plot.

Annotations

Titles, subtitles, and captions are labels that we put outside the grid of our plot to indicate important information. If we want to put text inside the grid to call out specific data points, we can use the `annotate()` function. For example, let's say we want to highlight the data from the Gentoo penguins. We can use `annotate()` to add some text next to the data points for the Gentoos. This text clearly communicates what the plot shows and reinforces an important part of our data.

```r
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass VS. Flipper Length",
       subtitle = "Sample of Three Penguin Species",
       caption = "Data Collected by Dr. Kristen Gorman") +
  annotate("text", x = 220, y = 3500, label = "The Gentoos are the Largest")
```

In the parentheses of `annotate()`, we supply the type of annotation, its location, and its text. In this case, we want a text label ("text"), and we want to place it near the Gentoo data points. Let's put it at the following coordinates: x = 220 millimeters and y = 3,500 grams.
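The coordinates 220 and 3,500 above are typed in by hand. The sketch below assumes you would rather derive the label position from the data: it takes the median flipper length and body mass of the Gentoo rows, places the text a hypothetical 800 g above that point, and uses `annotate("segment")` with `arrow()` to point from the label back to the cluster. `gentoo`, `label_x`, `label_y`, and the 800 g offset are illustrative choices, not part of the original example.

```r
library(ggplot2)
library(palmerpenguins)

# Summarize the Gentoo rows to find where their points sit
gentoo <- subset(penguins, species == "Gentoo")
label_x <- median(gentoo$flipper_length_mm, na.rm = TRUE)
label_y <- median(gentoo$body_mass_g, na.rm = TRUE)

offset <- 800  # hypothetical vertical gap (in grams) between label and cluster

p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  # Text label placed above the Gentoo cluster
  annotate("text", x = label_x, y = label_y + offset,
           label = "The Gentoos are the Largest") +
  # Arrow from just below the label down to the cluster's median point
  annotate("segment", x = label_x, xend = label_x,
           y = label_y + offset - 100, yend = label_y,
           arrow = arrow(length = unit(0.2, "cm")))

print(p)
```

Because the position is computed, the label follows the Gentoo points if the data changes, at the cost of a few extra lines.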
R places the text label at x = 220 and y = 3,500 in our plot.

We can customize our annotation even more. Let's say we want to change the color of our text. We add `color =` followed by the name of the color in quotation marks. Let's try purple.

```r
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass VS. Flipper Length",
       subtitle = "Sample of Three Penguin Species",
       caption = "Data Collected by Dr. Kristen Gorman") +
  annotate("text", x = 220, y = 3500, label = "The Gentoos are the Largest",
           color = "purple")
```

We can also change the font style and size of our text with the `fontface` and `size` arguments. Let's make our text bold and a little larger.

```r
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass VS. Flipper Length",
       subtitle = "Sample of Three Penguin Species",
       caption = "Data Collected by Dr. Kristen Gorman") +
  annotate("text", x = 220, y = 3500, label = "The Gentoos are the Largest",
           color = "purple", fontface = "bold", size = 4.5)
```

We can even change the angle of our text. For example, we can tilt our text at a 25-degree angle to line it up with our data points.

```r
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass VS. Flipper Length",
       subtitle = "Sample of Three Penguin Species",
       caption = "Data Collected by Dr. Kristen Gorman") +
  annotate("text", x = 220, y = 3500, label = "The Gentoos are the Largest",
           color = "purple", fontface = "bold", size = 4.5, angle = 25)
```

Store your Plot as a Variable in R

That looks great, but by this point our code is getting pretty long. If you want to use less code, you can store your plot as a variable in R. As a quick reminder, to create a variable in R, you type the variable name, then the assignment operator `<-` (a less-than sign followed by a dash).
Let's try it with the variable name p.

```r
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  labs(title = "Palmer Penguins: Body Mass VS. Flipper Length",
       subtitle = "Sample of Three Penguin Species",
       caption = "Data Collected by Dr. Kristen Gorman")
```

Now, instead of writing all the code again, we can just call p and add an annotation to it like this:

```r
p + annotate("text", x = 220, y = 3500, label = "The Gentoos are the Largest")
```

Adding annotations in R

Annotations are a useful way to add notes to your plot. They help you explain the plot's purpose, highlight important data points, or comment on any data trends or findings the plot illustrates. You have already learned how to add notes as labels, titles, subtitles, and captions. You can also draw arrows or add shapes to your plot to create more emphasis. These kinds of annotations are usually added in a presentation application after the visualizations are saved, but you can also add lines, arrows, and shapes to your plots directly with ggplot2.

Resources
Check out these resources to learn more:
● Create an annotation layer: This guide explains how to add an annotation layer with ggplot2. It includes sample code and data visualizations with annotations created in ggplot2.
● How to annotate a plot in ggplot2: This resource explains how to add different kinds of annotations to your ggplot2 plots and is a great reference if you need to quickly look up a specific kind of annotation.
● Annotations: Chapter 8 of the online ggplot2 textbook is focused entirely on annotations. It provides in-depth explanations of the different types of annotations, how they are used, and detailed examples.
● How to annotate a plot: This R-Bloggers article explains how to annotate plots in ggplot2. It starts with basic concepts and covers more complicated information the further you read.
● Text Annotations: This resource focuses specifically on adding text annotations and labels to ggplot2 visualizations.

Saving your visualizations

To save a plot, you can use the Export option in the Plots tab of RStudio or the `ggsave()` function provided by the ggplot2 package. The Export option in the Plots tab lets you save the plot as an image or as a PDF.

`ggsave()` is a useful function for saving a plot. It defaults to saving the last plot that you displayed and uses the size of the current graphics device. In our case, `ggsave()` will save the plot that shows the relationship between body mass and flipper length, because that's the last plot we displayed. We have to give the file a name, and the file extension tells `ggsave()` what kind of file to save it as. Let's write the code.

```r
ggsave("Three Penguins Species.png")
```

Now, if we click on the Files tab, we'll find our new file in the list; it is saved to the current working directory.

Saving images without ggsave()

In most cases, `ggsave()` is the simplest way to save your plot. But there are situations when it might be best to save your plot by writing it directly to a graphics device. This reading covers some of the different ways you can save images and plots without `ggsave()`, and includes additional resources to check out if you want to learn more.

A graphics device allows a plot to appear on your computer. Examples include:
● A window on your computer (screen device)
● A PDF, PNG, or JPEG file (file device)
● An SVG, or scalable vector graphics, file (file device)

When you make a plot in R, it has to be "sent" to a specific graphics device. To save images without using `ggsave()`, you can open an R graphics device like `png()` or `pdf()`; these let you save your plot as a .png or .pdf file. Then you print the plot to the open device and close the device with `dev.off()`, which finishes writing the file.
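Writing a ggplot to a file device follows the same open, draw, close pattern as the png() and pdf() examples that follow, with one catch: inside a function, a loop, or a script run with `source()`, a ggplot object is only drawn when it is printed, so call `print()` explicitly before `dev.off()`. This is a minimal sketch; the output file name is hypothetical and is written to R's temporary folder.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species))

# Hypothetical output path in R's temporary folder
out_file <- file.path(tempdir(), "penguins_scatter.pdf")

pdf(file = out_file, width = 6, height = 4)  # 1. open a file device (size in inches)
print(p)                                     # 2. draw the ggplot onto that device
dev.off()                                    # 3. close the device, which writes the file
```

Forgetting `print()` in those contexts typically produces a blank or missing file rather than an error.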
Example of using png():

png(file = "exampleplot.png", bg = "transparent")
plot(1:10)
rect(1, 5, 3, 7, col = "white")
dev.off()

Example of using pdf():

pdf(file = "/Users/username/Desktop/example.pdf", width = 4, height = 4)
plot(x = 1:10, y = 1:10)
abline(v = 0)
text(x = 0, y = 1, labels = "Random text")
dev.off()

To learn more about the different processes for saving images, check out these resources:
● Saving images without ggsave(): This resource is pulled directly from the ggplot2 documentation at tidyverse.org. It explores the tools you can use to save images in R and includes several examples you can follow to learn how to save images in your own R workspace.
● How to save a ggplot: This resource covers several methods for saving ggplots. It also includes copyable code with explanations of how each function is used, so you can better understand each step in the process.
● Saving a plot in R: This guide covers multiple file formats you can use to save your plots in R. Each section includes an example with an actual plot that you can copy and use for practice in your own R workspace.

Hands-On Activity: Annotating and saving visualizations
You also want to add another detail about what time period this data covers. To do this, you need to find out when the data is from. You realize you can use the `min()` function on the year column in the data:

```{r earliest year}
min(hotel_bookings$arrival_date_year)
```

And the `max()` function:

```{r latest year}
max(hotel_bookings$arrival_date_year)
```

But you will need to save them as variables in order to easily use them in your labeling; the following code chunk creates two of those variables:

```{r latest date}
mindate <- min(hotel_bookings$arrival_date_year)
maxdate <- max(hotel_bookings$arrival_date_year)
```

Now, add a subtitle using `subtitle=` in the `labs()` function. Then, use the `paste0()` function to include your newly created variables in your labels.
This is really handy: because `mindate` and `maxdate` are computed from the data, you won't have to change the code below if the data is updated with more recent bookings.

```{r city bar chart with time frame}
ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = market_segment)) +
  facet_wrap(~hotel) +
  theme(axis.text.x = element_text(angle = 45)) +
  labs(title = "Comparison of market segments by hotel type for hotel bookings",
       subtitle = paste0("Data from: ", mindate, " to ", maxdate))
```

You can also show the time frame as a caption instead of a subtitle:

```{r city bar chart with time frame as caption}
ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = market_segment)) +
  facet_wrap(~hotel) +
  theme(axis.text.x = element_text(angle = 45)) +
  labs(title = "Comparison of market segments by hotel type for hotel bookings",
       caption = paste0("Data from: ", mindate, " to ", maxdate))
```

Finally, give the x- and y-axes clearer labels:

```{r city bar chart with x and y axis}
ggplot(data = hotel_bookings) +
  geom_bar(mapping = aes(x = market_segment)) +
  facet_wrap(~hotel) +
  theme(axis.text.x = element_text(angle = 45)) +
  labs(title = "Comparison of market segments by hotel type for hotel bookings",
       caption = paste0("Data from: ", mindate, " to ", maxdate),
       x = "Market Segment",
       y = "Number of Bookings")
```
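To see why computing the dates from the data keeps the labels current, here is a small self-contained sketch. Because hotel_bookings may not be loaded, it uses a tiny hypothetical stand-in data frame, `bookings`, with made-up rows, plus a hypothetical helper, `make_date_label()`, that wraps the same `paste0()` call used in the chunks above. Adding a newer row changes the label without any edit to the plotting code.

```r
library(ggplot2)

# Hypothetical stand-in for hotel_bookings: made-up rows, only the columns used below
bookings <- data.frame(
  hotel = c("City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"),
  market_segment = c("Online TA", "Direct", "Groups", "Online TA"),
  arrival_date_year = c(2015, 2016, 2017, 2016)
)

# Hypothetical helper: builds the same "Data from: X to Y" label from the data
make_date_label <- function(df) {
  paste0("Data from: ", min(df$arrival_date_year), " to ", max(df$arrival_date_year))
}

label_before <- make_date_label(bookings)
print(label_before)  # [1] "Data from: 2015 to 2017"

# Simulate a data update: one newer booking arrives
bookings <- rbind(bookings, data.frame(hotel = "City Hotel",
                                       market_segment = "Direct",
                                       arrival_date_year = 2018))

label_after <- make_date_label(bookings)
print(label_after)   # [1] "Data from: 2015 to 2018"

# The plotting code is unchanged; only the data changed
chart <- ggplot(data = bookings) +
  geom_bar(mapping = aes(x = market_segment)) +
  facet_wrap(~hotel) +
  theme(axis.text.x = element_text(angle = 45)) +
  labs(title = "Comparison of market segments by hotel type for hotel bookings",
       caption = make_date_label(bookings),
       x = "Market Segment", y = "Number of Bookings")
```

Printing `chart` displays the bar chart with the updated caption, and a follow-up `ggsave()` call would save it like any other plot.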