04-Charts-and-Graphs

Outline
• Research Question: What determines height?
• Data Input
• Look at One Variable
• Compare Two Variables
• Children’s Height and Parents’ Height
• Children’s Height and Gender
• Graphic Packages: ggplot2
What factors are most responsible for height?
Galton’s Family Height Dataset
X1 (Dad’s Height), X2 (Mom’s Height), X3 (Gender) -> Y (Child’s Height)
Galton’s Notebook on Families & Height
> getwd()
[1] "C:/Users/johnp_000/Documents"
> setwd()
Dataset Input
h <- read.csv("GaltonFamilies.csv")
Object: h   Function: read.csv()   Filename: "GaltonFamilies.csv"
str() summary()
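A minimal sketch of the input-and-inspect step. Because GaltonFamilies.csv may not be in the working directory, the example builds a small stand-in data frame with simulated values (not Galton's measurements). The column names father, mother, gender and childHeight are assumed to match the CSV.

```r
# Stand-in for read.csv("GaltonFamilies.csv"): simulated values, NOT Galton's data.
set.seed(1)
n <- 200
h <- data.frame(
  father = rnorm(n, mean = 69, sd = 2.5),
  mother = rnorm(n, mean = 64, sd = 2.3),
  gender = factor(sample(c("female", "male"), n, replace = TRUE))
)
# Simulated child heights with a gender offset (illustrative numbers only)
h$childHeight <- ifelse(h$gender == "male", 69, 64) + rnorm(n, sd = 2.5)

str(h)            # one line per column: its type and the first few values
summary(h)        # numeric columns: quartiles and mean; factors: counts per level
sapply(h, class)  # which columns are numeric and which are factors
```

In R 4.0 and later, read.csv() leaves text columns as character unless stringsAsFactors = TRUE is passed, so a column such as gender may need factor() before it is treated as categorical.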
Data Types: Numbers and Factors/Categorical
Variable                     Type         Steps
Child’s Height               Continuous   Histogram
Dad’s Height, Mom’s Height   Continuous   Scatter
Gender                       Categorical  Boxplot
Frequency Distribution, Histogram
hist(h$childHeight)
Density Plot
plot(density(h$childHeight))
Area = 1
Mode, Bimodal
hist(h$childHeight, freq = FALSE, breaks = 25, ylim = c(0, 0.14))
curve(dnorm(x, mean = mean(h$childHeight), sd = sd(h$childHeight)), col = "red", add = TRUE)
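The "Area = 1" note on the density plot can be checked numerically. The sketch below uses simulated bimodal heights as a stand-in (not Galton's data) and approximates the area under the density() estimate with a simple Riemann sum. It also checks the bars of a histogram drawn on the density scale.

```r
set.seed(2)
# Simulated bimodal heights: two overlapping groups (illustrative values only)
x <- c(rnorm(150, mean = 64, sd = 2.3), rnorm(150, mean = 69, sd = 2.6))

d <- density(x)           # kernel density estimate on an equally spaced grid
step <- diff(d$x)[1]      # spacing between neighbouring grid points
area <- sum(d$y) * step   # Riemann-sum approximation of the area under the curve
area                      # should be close to 1

# On the density scale, the histogram bars also have total area 1
hh <- hist(x, breaks = 25, plot = FALSE)
bar_area <- sum(hh$density * diff(hh$breaks))
bar_area
```

This is why freq = FALSE is needed before overlaying a normal curve: both are then on the same density scale.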
Grammar of Graphics
Seven Components
[Figure: components of a graphic; surviving labels include Legend and Axes]
ggplot2 built using the grammar of graphics approach
Hadley Wickham and ggplot2
Asst. Professor of Statistics
at Rice University
ggplot2
plyr
reshape
rggobi
profr
http://ggplot2.org/
ggplot2
In ggplot2 a plot is made up of layers.
Plot
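A small sketch of what "made up of layers" means in practice, assuming the ggplot2 package is installed (version 3.3.0 or later for after_stat()). The data are simulated stand-ins, and the object names p and p2 are only for illustration.

```r
library(ggplot2)  # assumes ggplot2 is installed

set.seed(3)
# Simulated stand-in data, NOT Galton's measurements
h <- data.frame(childHeight = rnorm(300, mean = 66.7, sd = 3.6))

# Base object: data plus aesthetic mapping, no layers yet
p <- ggplot(h, aes(x = childHeight))

# Each geom added with "+" becomes one layer; labs() adds labels, not a layer
p2 <- p +
  geom_histogram(aes(y = after_stat(density)), binwidth = 1, fill = "grey80") +
  geom_density(colour = "red") +
  labs(x = "Height", y = "Density")

length(p$layers)   # layers in the base object
length(p2$layers)  # layers after adding the histogram and the density curve
print(p2)          # draws the histogram with the density curve on top
```

Layers are drawn in the order they are added, so the density curve sits on top of the histogram here.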
ggplot2
library(ggplot2)
h.gg <- ggplot(h, aes(childHeight))
h.gg + geom_histogram(binwidth = 1) + labs(x = "Height", y = "Frequency")
h.gg + geom_density()
ggplot2
h.gg <- ggplot(h, aes(childHeight)) + theme(legend.position = "right")
h.gg + geom_density() + labs(x = "Height", y = "Density")
h.gg + geom_density(aes(fill = factor(gender)), size = 2)
Box Plot
Children’s Height vs. Gender
boxplot(childHeight ~ gender, data = h, col = c("pink", "lightblue"),
main = "Children's Height by Gender", xlab = "Gender", ylab = "")
Descriptive Stats: Box Plot
Subset Males
men <- subset(h, gender == 'male')
Subset Females
women <- subset(h, gender == 'female')
Children’s Height: Males
hist(men$childHeight)
Children’s Height: Females
hist(women$childHeight)
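The subsets above can be tied back to the box plot with a short sketch. It uses simulated stand-in data (not Galton's measurements) and shows the five numbers a box plot is built from, plus a grouped summary that avoids manual subsetting.

```r
set.seed(4)
n <- 200
gender <- factor(sample(c("female", "male"), n, replace = TRUE))
# Simulated heights with a gender offset (illustrative values only)
h <- data.frame(
  gender = gender,
  childHeight = ifelse(gender == "male", 69, 64) + rnorm(n, sd = 2.5)
)

men   <- subset(h, gender == "male")
women <- subset(h, gender == "female")

# Five numbers behind each box: lower whisker, lower hinge, median,
# upper hinge, upper whisker (hinges are approximately the quartiles)
men_stats   <- boxplot.stats(men$childHeight)$stats
women_stats <- boxplot.stats(women$childHeight)$stats
men_stats
women_stats

# Same comparison in one call, grouped by the factor
group_medians <- tapply(h$childHeight, h$gender, median)
group_medians
```

tapply() returns one value per factor level, which is often enough when the separate data frames are not needed later.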
ggplot2
library(ggplot2)
h.bb <- ggplot(h, aes(factor(gender), childHeight))
h.bb + geom_boxplot()
h.bb + geom_boxplot(aes(fill = factor(gender)))
Variable                     Role     Type         Steps
Child’s Height               Y        Continuous   Histogram
Dad’s Height, Mom’s Height   X1, X2   Continuous   Scatter
Gender                       X3       Categorical  Boxplot
Correlation
?cor
cor(h$father, h$childHeight)
[1] 0.2660385
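To show what cor() computes by default, here is a sketch that works out the Pearson correlation by hand and compares it with the built-in function. The data are simulated (the slope and noise level are arbitrary choices), so the value will not match the one above.

```r
set.seed(5)
n <- 500
# Simulated father and child heights (illustrative values, NOT Galton's data)
father <- rnorm(n, mean = 69, sd = 2.5)
child  <- 40 + 0.4 * father + rnorm(n, sd = 3.4)

# Pearson r: sum of cross-products of deviations, divided by the
# square root of the product of the two sums of squared deviations
dx <- father - mean(father)
dy <- child - mean(child)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

r_builtin <- cor(father, child)   # default method is "pearson"

all.equal(r_manual, r_builtin)    # the two calculations agree
```

Because r is built from deviations scaled by their spread, it does not change if heights are converted from inches to centimetres.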
Scatterplot Matrix: pairs()
Correlations Matrix
library(car)
scatterplotMatrix(~ father + mother + childHeight, data = h)
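A base-R sketch of the correlation matrix and scatterplot matrix, for cases where the car package is not available. The data frame is a simulated stand-in, with column names assumed to match the dataset.

```r
set.seed(6)
n <- 300
# Simulated stand-in values, NOT Galton's data
father <- rnorm(n, mean = 69, sd = 2.5)
mother <- rnorm(n, mean = 64, sd = 2.3)
h <- data.frame(
  father = father,
  mother = mother,
  childHeight = 22 + 0.35 * father + 0.3 * mother + rnorm(n, sd = 3)
)

# Keep only the numeric columns of interest
num <- h[, c("father", "mother", "childHeight")]

cm <- cor(num)   # 3 x 3 matrix of pairwise Pearson correlations
round(cm, 2)

pairs(num, main = "Scatterplot matrix (simulated data)")
```

The matrix is symmetric with ones on the diagonal, so only the entries below (or above) the diagonal carry information.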
ggplot2
Analytics & History: 1st Regression Line
The first “Regression Line”
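A sketch of fitting and drawing a regression line in R with lm() and abline(). The data are simulated, with an arbitrary slope below 1, so this illustrates the technique rather than reproducing the historical line.

```r
set.seed(7)
n <- 400
# Simulated heights (illustrative values, NOT Galton's data)
father <- rnorm(n, mean = 69, sd = 2.5)
child  <- 40 + 0.4 * father + rnorm(n, sd = 3.4)

fit <- lm(child ~ father)   # least-squares fit of child height on father height
coef(fit)                   # intercept and slope

plot(father, child, xlab = "Father's height", ylab = "Child's height")
abline(fit, col = "red", lwd = 2)   # fitted regression line
abline(a = 0, b = 1, lty = 2)       # reference line: child as tall as father
```

A fitted slope below 1 means that children of very tall or very short fathers tend to be closer to the average, which is the sense in which heights "regress".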
Appendix
What software do you use for creating charts or data visualizations?
(May 2013, N=172)

Software         Share
R                47%
Excel            45%
Python           19%
Tableau          15%
D3               14%
BI Tools         8%
Matlab           5%
Javascript       4%
SAS              3%
SPSS             2%
Google           2%
Scientific S/W   2%
Mathematica      2%
SQL              2%
Others           22%

Others: .net, BIRT, cytoscape, flot, gephi, gnuplot, graphite, iDashboards, Incanter, Java, JMP, LogiXML, MDX, Mondrian, octave, openlayers, OpenViz, PHP, Powerpoint, precog, Prezi, processing, Ptotobi, Silverlight, splunk, SSRS, talend, webGL, Wijmo, WPF, Xcelsius, XLMiner
Visualization and Reporting
[Figure: tools placed on two axes, Steep Learning Curve vs. Easy to Use and Standard vs. Interactive Visualizations]
BI Software: Tableau
http://public.tableausoftware.com/views/PapelbonPitchFX/PapelbonPitchFX
http://rcharts.io/gallery/
https://plot.ly/r/
http://shiny.rstudio.com/gallery/movie-explorer.html
The next data visual was produced with about 150 lines of R code
Data Viz Tutorials