Looking at Health Disparities with Data

In this tutorial we will learn how to visually show health disparities in the United States using the same dataset we used in our first R tutorial.

Part A: Import data set and install the necessary R packages.

1. Create a new RScript file by clicking on the new-file icon in the upper left-hand corner and choosing RScript. Make sure to save this file (using the save icon at the top of the RScript) under the following file name: Disparities_Data_Work

2. Import the “2019 County Health Rankings Data - v3” dataset, following the steps from “Installing R and Importing Excel Files”.
OR
You can copy, paste and run the code below to get the data into R (remember to change the file path to where you saved the dataset on your computer). The read_xls function comes from the readxl package, so that package must be loaded first:
   library(readxl)
   CHRD_2019 <- read_xls("C:\\Users\\nanab\\Box\\ECO 390\\ECO390_R\\2019 County Health Rankings Data - v3.xls", sheet = "Additional Measure Data", skip=1)

3. To create the charts in this tutorial, you need to install three R packages. The first package you must install is named ggplot2. Install the ggplot2 package by typing install.packages("ggplot2") into your RScript. Press CTRL + Enter. Wait as the package installs (it will take 2-3 minutes).
- Command you type: install.packages("ggplot2")

4. The second package you must install is named tidyverse. Install the tidyverse package the same way you installed the ggplot2 package. Type install.packages("tidyverse") into your RScript. Press CTRL + Enter. Wait as the package installs; again, it will take 2-3 minutes.
- Command you type: install.packages("tidyverse")

5. The third package you must install is named scales. Install the scales package the same way you installed the first two packages. Type install.packages("scales") into your RScript. Press CTRL + Enter. Wait as the package installs; again, it will take 2-3 minutes.
- Command you type: install.packages("scales")
You have successfully installed all necessary packages (ggplot2, scales and tidyverse) and are ready to move on to Part B!

Part B: Create a histogram.

1. To use the three R packages you just installed (ggplot2, scales and tidyverse) you must load those packages. Enter the library(ggplot2) command into your RScript to load the ggplot2 package.
- Command you type: library(ggplot2)

2. Now load the tidyverse package using the same library command.
- Command you type: library(tidyverse)

3. Finally, load the scales package.
- Command you type: library(scales)

4. In this tutorial, we will be looking at racial health disparities. Our analyses will differentiate between counties with an African American population of 29% or more and counties with an African American population of less than 29%. We will call the former “More Black counties” and the latter “Less Black counties”. To do that, we will create a new variable which will have a value of “More Black” if a county's African American population is 29% or more and “Less Black” otherwise. Let’s name the variable “pblack29”.
- Command you type: CHRD_2019$pblack29 <- ifelse(CHRD_2019$`% African American` >= 29, "More Black", "Less Black")

5. Your RScript file is ready for the code needed to create a histogram using your new variable. Copy and paste the command below.
- Command you type:
   CHRD_2019 %>%
     ggplot(aes(x=`Life Expectancy`))+
     geom_histogram(aes(y=after_stat(width*density)), binwidth = 5, color = "black", fill = "gray90")+
     scale_y_continuous(labels=percent)+
     facet_grid(cols=vars(pblack29))+
     labs(title = 'Breakdown by Percentage of Black People in a County', x = 'Life Expectancy', y = 'Percent')
NOTE: In R ggplot, the + sign must be at the end of each code line to show that the entire block of code should be run together, except for the very last line (labs(…)).

6. The window in the lower right-hand corner has a tab labeled “Plots”. To view the histogram you’ve created using this tutorial, click on that tab.
The histogram should look identical to this:
[Figure: histogram of Life Expectancy (x-axis) by Percent (y-axis), with one panel for “Less Black” counties and one for “More Black” counties]

Part C: Export the histogram and save file.

1. In the Plots tab, there is an Export button. Click this and choose to export the histogram to a PDF file. The Export button also allows you to copy the plot to the Clipboard; then you can paste the plot into a .docx file.

2. Save the PDF file as “Disparities_Histogram”. The PDF file will be saved in your current working directory (we learnt about working directories in the first tutorial).

Congratulations! You have successfully created a histogram in RStudio!

Part D: Now that you know how to create a histogram, let’s learn how to create a weighted scatterplot.

This plot is different from the ordinary scatterplot in that it weights the size of the dots (or circles) by a chosen variable; in our case, each county’s population. Like before, we look at disparities in life expectancy between “More Black” counties and “Less Black” counties. This time, however, we look at differences in life expectancy based on median household income in each county. Copy and paste the command below to create a weighted scatterplot:
   ggplot(CHRD_2019, aes(x=`Household Income`/1000, y=`Life Expectancy`, color=pblack29, size=Population/1000000)) +
     geom_point(shape=1, stroke=1.2)+
     scale_x_continuous(breaks=seq(50,150,50))+
     scale_size(range=c(1,30))+
     scale_color_manual(values=c("Red","Blue"))+
     labs(title="Life Expectancy by Median Household Income", x="Median Household Income", y="Life Expectancy", size="Population(Millions)", color="", caption = "*More Black represents counties with an African American population of 29% or more")

Part E: Finally, you may want to know the relationship between life expectancy and income, still making a distinction between “More Black” counties and “Less Black” counties to check for any disparities in that relationship.

Here, we will still look at a weighted scatterplot as before.
However, this time, we will include two regression lines to show the relationship between life expectancy and median income: one for “More Black” counties and one for “Less Black” counties. Copy and paste the command below to create the weighted scatterplot with two regression lines:
   ggplot(CHRD_2019, aes(x=`Household Income`/1000, y=`Life Expectancy`, size=Population/1000000)) +
     geom_point(shape=1, stroke=1.2, color="forestgreen")+
     scale_x_continuous(breaks=seq(50,150,50))+
     scale_size(range=c(1,30))+
     geom_smooth(method='lm', se=FALSE, aes(color=pblack29))+
     scale_color_manual(values=c("Red","Blue"))+
     guides(size="none")+
     labs(title="Life Expectancy by Median Household Income", x="Median Household Income", y="Life Expectancy", size="Population(Millions)", caption = "*More Black represents counties with an African American population of 29% or more")

You may save the plots using the steps in Part C.
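
If you are curious what the two regression lines in Part E represent, the short sketch below may help. It uses a tiny made-up data set (the data frame toy and its columns pct_black, income_k and life_exp are hypothetical stand-ins, not part of CHRD_2019) and only base R, so no extra packages are needed. It applies the same 29% rule as Part B and then fits one straight line per group, which is roughly what geom_smooth(method='lm') does when color is mapped to pblack29.

```r
# Toy data standing in for CHRD_2019 (all values are made up for illustration)
toy <- data.frame(
  pct_black = c(5, 12, 35, 48, 29, 3),     # % African American in each county
  income_k  = c(60, 75, 40, 45, 55, 90),   # median household income, in $1000s
  life_exp  = c(79, 81, 75, 76, 77, 83)    # life expectancy, in years
)

# Same rule as Part B, step 4: 29% or more is "More Black"
toy$pblack29 <- ifelse(toy$pct_black >= 29, "More Black", "Less Black")

# Fit one linear regression per group (life expectancy on income)
fits <- lapply(split(toy, toy$pblack29),
               function(d) lm(life_exp ~ income_k, data = d))

# Pull out the slope of each line: change in life expectancy per $1000 of income
slopes <- sapply(fits, function(f) coef(f)[["income_k"]])

print(table(toy$pblack29))   # how many counties fall in each group
print(slopes)                # one slope per group
```

Comparing the two slopes is one simple way to check whether the relationship between income and life expectancy differs between the two groups of counties. Note that these fits, like the lines in Part E, are not weighted by population; only the size of the circles is.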