Uploaded by abroyalty

Looking at Health Disparities with Data (Current)

Looking at Health Disparities with Data
In this tutorial we will learn how to visually show health disparities in the United States using the
same dataset we used in our first R tutorial.
Part A: Import data set and install the necessary R packages.
1. Create a new RScript file by clicking on the icon in the upper left-hand corner.
Choose RScript. Make sure to save this file (using the icon at the top of the RScript)
using the following file name:
Disparities_Data_Work
2. Import the “2019 County Health Rankings Data - v3” dataset, following the steps from
“Installing R and Importing Excel Files”.
OR
You can copy, paste and run the code below to get the data in R (remember to change the
file path to where you saved the dataset on your computer):
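library(readxl)  # provides read_xls(); assumed to be installed from the first tutorial
# Keep the file path on a single line: a line break inside the quotes becomes part of the path
CHRD_2019 <- read_xls("C:\\Users\\nanab\\Box\\ECO 390\\ECO390_R\\2019 County Health Rankings Data - v3.xls",
                      sheet = "Additional Measure Data", skip = 1)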
3. To create the charts in this tutorial, you need to install three R packages.
The first package you must install is named ggplot2. Install the ggplot2 package by
typing install.packages("ggplot2") into your RScript. Press CTRL + Enter. Wait as the
package installs (it will take 2-3 minutes).
- Command you type: install.packages("ggplot2")
4. The second package you must install is named tidyverse. Install the tidyverse package
the same way you installed the ggplot2 package. Type install.packages("tidyverse")
into your RScript. Press CTRL + Enter. Wait as the package installs; again, it will take 2-3 minutes.
- Command you type: install.packages("tidyverse")
5. The third package you must install is named scales. Install the scales package the same
way you installed the first two packages. Type install.packages("scales") into your
RScript. Press CTRL + Enter. Wait as the package installs; again, it will take 2-3
minutes.
- Command you type: install.packages("scales")
6. You have successfully installed all necessary packages (ggplot2, scales and tidyverse)
and are ready to move on to Part B!
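If you are not sure whether a package is already installed, a small check can save the installation time. The sketch below is one possible approach, not part of the original tutorial; missing_packages is a hypothetical helper name chosen for illustration, and the install step only runs in an interactive session.

```r
# Return the names in `pkgs` that are not installed yet.
# `missing_packages` is a hypothetical helper, not part of the tutorial.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("ggplot2", "tidyverse", "scales")
to_install <- missing_packages(needed)

# Only install inside an interactive session, and only what is missing
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Note that install.packages() accepts a vector of package names, so several packages can be installed with one command.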
Part B: Create a histogram.
1. To use the three R packages you just installed (ggplot2, scales and tidyverse) you must
load those packages. Enter the library(ggplot2) command into your RScript to load the
ggplot2 package.
- Command you type: library(ggplot2)
2. Now load the tidyverse package using the same library command.
- Command you type: library(tidyverse)
3. Finally, load the scales package.
- Command you type: library(scales)
4. In this tutorial, we will be looking at racial health disparities. Our analyses will
compare counties with an African American population of 29% or more against counties
below that threshold. We will call the former “More Black counties” and the latter
“Less Black counties”.
To do that, we will create a new variable which will have the value “More Black” if a
county's African American population is 29% or more and “Less Black” otherwise. Let’s
name the variable “pblack29”.
Command you type:
CHRD_2019$pblack29 <- ifelse(CHRD_2019$`% African American` >= 29, "More Black", "Less Black")
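To see how ifelse() assigns the two labels, it can help to try it on a tiny table first. The sketch below uses made-up county names and percentages (they are not from the County Health Rankings data) and the same 29% threshold.

```r
# Toy data: made-up county names and percentages, for illustration only
toy <- data.frame(County = c("A", "B", "C", "D"),
                  pct    = c(10.5, 29, 45.2, 28.9))
names(toy)[2] <- "% African American"  # same column name as in the dataset

# Same rule as in the tutorial: 29% or more is labeled "More Black"
toy$pblack29 <- ifelse(toy$`% African American` >= 29, "More Black", "Less Black")

# Count how many counties fall in each group (here: 2 and 2)
table(toy$pblack29)
```

County B sits exactly at 29 and is labeled “More Black” because the comparison uses >= rather than >.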
5. Your RScript file is ready for the code needed to create a histogram using your new
variable. Copy and paste the command below.
- Command you type:
CHRD_2019 %>%
  ggplot(aes(x = `Life Expectancy`)) +
  # width * density is the share of counties in each bin;
  # after_stat() replaces the older stat() in ggplot2 3.3.0 and later
  geom_histogram(aes(y = after_stat(width * density)),
                 binwidth = 5, color = "black", fill = "gray90") +
  scale_y_continuous(labels = percent) +
  facet_grid(cols = vars(pblack29)) +
  labs(title = 'Breakdown by Percentage of Black People in a County',
       x = 'Life Expectancy', y = 'Percent')
NOTE: In a ggplot command, the + sign must be at the end of each code line, except for the
very last line (labs(…)). The trailing + tells R that the block of code continues on the
next line and should be run together.
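The same rule applies to ordinary arithmetic in R, which makes it easy to test. The minimal sketch below (not part of the original tutorial) shows what happens when the + is placed at the end of a line versus at the start of the next one.

```r
# A trailing + tells R the expression continues on the next line
a <- 1 +
  2

# A leading + starts a new expression, so b is assigned 1
# and the second line is evaluated separately as +2
b <- 1
  + 2

a  # 3
b  # 1
```

With ggplot layers, a line that starts with + typically stops with an error instead of silently giving a different result, because the first part of the plot is treated as a complete command.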
6. The window in the lower right-hand corner is the “Files/Plots/Packages” window.
There is a tab labeled “Plots”. To view the histogram you’ve created using this tutorial,
click on that tab. The histogram should look identical to this:
[Figure: histogram of life expectancy, with one panel for “Less Black” counties and one for “More Black” counties]
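The y-axis of the histogram is computed as width times density, which is the share of counties that fall in each bin. The sketch below reproduces that arithmetic with base R's hist() on made-up life expectancy values; the exact bin edges ggplot2 chooses may differ from the ones used here.

```r
# Made-up life expectancy values, for illustration only
life <- c(71, 73, 74, 76, 77, 78, 78, 79, 81, 84)

# Bins of width 5, analogous to binwidth = 5 in the tutorial
h <- hist(life, breaks = seq(70, 85, by = 5), plot = FALSE)

bin_width <- diff(h$breaks)
share <- h$density * bin_width  # proportion of observations in each bin

share       # 0.3 0.5 0.2
sum(share)  # the shares add up to 1
```

Because the shares add up to 1 within each panel, scale_y_continuous(labels = percent) can display them as percentages.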
Part C: Export the histogram and save file.
1. In the Plots tab, there is an Export button. Click this and choose to export the histogram to
a PDF file.
The Export button also allows you to copy the plot to the Clipboard; then you can paste the plot into a .docx file.
2. Save the PDF file as “Disparities_Histogram”. The PDF file will be saved in your
current working directory (we learnt about working directories in the first tutorial).
Congratulations! You have successfully created a histogram in RStudio!
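As an alternative to the Export button, a plot can also be saved from code with ggsave() from the ggplot2 package. The sketch below is an assumption-laden example rather than part of the original tutorial: it uses R's built-in mtcars data as a stand-in plot, and the width and height values are arbitrary choices.

```r
library(ggplot2)

# A small stand-in plot using R's built-in mtcars data
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 5, color = "black", fill = "gray90")

# Save the plot as a PDF in the current working directory
# (width and height are in inches by default)
ggsave("Disparities_Histogram.pdf", plot = p, width = 7, height = 5)
```

Storing the plot in a variable such as p makes it possible to save it without first displaying it in the Plots tab.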
Part D: Now that you know how to create a histogram, let’s learn how to create a weighted
scatterplot. This plot is different from the ordinary scatterplot in that it weights the size of the
dots (or circles) by each county’s population (in our case). Like before, we look at disparities in
life expectancy between “More Black” counties and “Less Black” counties. This time, however,
we look at differences in life expectancy based on median household income in each county.
Copy and paste the command below to create a weighted scatterplot:
ggplot(CHRD_2019, aes(x = `Household Income`/1000, y = `Life Expectancy`,
                      color = pblack29, size = Population/1000000)) +
  geom_point(shape = 1, stroke = 1.2) +            # hollow circles
  scale_x_continuous(breaks = seq(50, 150, 50)) +  # axis ticks at 50, 100 and 150
  scale_size(range = c(1, 30)) +                   # smallest and largest circle sizes
  scale_color_manual(values = c("red", "blue")) +
  labs(title = "Life Expectancy by Median Household Income",
       x = "Median Household Income (Thousands of Dollars)",
       y = "Life Expectancy",
       size = "Population (Millions)", color = "",
       caption = "*More Black represents counties with an African American population of 29% or more")
Part E: Finally, you may want to know the relationship between life expectancy and income,
still making a distinction between “More Black” counties and “Less Black” counties to check for
any disparities in that relationship. Here, we will still look at a weighted scatterplot as before.
However, this time, we will include two regression lines to show the relationship between life
expectancy and median income for “More Black” counties and “Less Black” counties.
Copy and paste the command below to create the weighted scatterplot with two regression lines:
ggplot(CHRD_2019, aes(x = `Household Income`/1000, y = `Life Expectancy`,
                      size = Population/1000000)) +
  geom_point(shape = 1, stroke = 1.2, color = "forestgreen") +
  scale_x_continuous(breaks = seq(50, 150, 50)) +
  scale_size(range = c(1, 30)) +
  # one straight regression line per group, without a confidence band
  geom_smooth(method = 'lm', se = FALSE, aes(color = pblack29)) +
  scale_color_manual(values = c("red", "blue")) +
  guides(size = "none") +  # hide the size legend; size = FALSE is deprecated in newer ggplot2
  labs(title = "Life Expectancy by Median Household Income",
       x = "Median Household Income (Thousands of Dollars)",
       y = "Life Expectancy",
       size = "Population (Millions)",
       caption = "*More Black represents counties with an African American population of 29% or more")
You may save the plots using the steps in Part C.
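To read the actual slopes behind the two regression lines, the same kind of fit can be run with lm() for each group. The sketch below uses made-up numbers that are exactly linear within each group, so the results are easy to check; as far as we can tell, geom_smooth(method = 'lm') fits an unweighted line per color group unless a weight aesthetic is mapped, which is what this mirrors.

```r
# Toy data: made-up values, constructed to be exactly linear within each group
toy <- data.frame(
  income   = c(30, 40, 50, 60, 30, 40, 50, 60),  # thousands of dollars
  pblack29 = rep(c("Less Black", "More Black"), each = 4)
)
toy$life <- ifelse(toy$pblack29 == "Less Black",
                   70 + 0.10 * toy$income,
                   65 + 0.20 * toy$income)

# Fit one unweighted regression line per group
fits <- lapply(split(toy, toy$pblack29),
               function(d) coef(lm(life ~ income, data = d)))

fits[["Less Black"]]  # intercept 70, slope 0.10
fits[["More Black"]]  # intercept 65, slope 0.20
```

A difference in slopes between the two groups would suggest that the relationship between income and life expectancy differs between “More Black” and “Less Black” counties.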