Darrin Rogers
Psychology Department
Feb. 5, 2016
INTRODUCTION TO R – PART 1
Structure of this course
 General information about R
 How to get help and information
 How to do basic things
 Basic analyses
Installing R
 http://r-project.org
 You want the "base" package.
 Click “Download R for....” (Windows, Mac OS, etc.)
 If you have a Mac, do this FIRST (for best experience):
 Install TCL/TK:
http://cran.r-project.org/bin/macosx/tools/ → tcltk-8.5.5-x11.dmg
What is R?
 A computer language developed specifically for data analysis
You don’t need to “program”
You don’t need to write programs or scripts
 R is interactive
 Type something in, get an immediate result
What can R do?
 Anything any other statistics software can do
 And usually lots more
 Because it’s a programming language
What can R do with Graphics?
 Anything (eventually...)
 Some of my graphs (from research & teaching):
Figure 2. Distribution of true Phase 2 SIS scores (blue) versus randomly-generated
profiles (red).
Figure 4. AUCs for 100 runs of SIS discrimination between original profiles and partially (1%
through 100%) random profiles. Light blue lines are AUCs for 100 individual runs; dark blue line
indicates mean AUC at each point.
R and Graphics
 Others’ graphs (not mine)
http://www.statmethods.net/advgraphs/
http://gallery.r-enthusiasts.com/graph/US.2004.elections.map.113
http://gallery.r-enthusiasts.com/graph/Scatter.plot.3D.44
http://gallery.r-enthusiasts.com/graph/Smily.and.Grumpy.faces.174
http://gallery.r-enthusiasts.com/graph/Correlation.matrix.ellipses.149
http://gallery.r-enthusiasts.com/graph/Correlograms.148
http://gallery.r-enthusiasts.com/graph/SuperStorm.Sandy.170
http://gallery.r-enthusiasts.com/graph/Notched.boxplots.6
http://gallery.r-enthusiasts.com/graph/Image.lag.plot.matrix.158
Using R as a calculator
 Start R (double-click a big blue R icon)
 Make something happen!
 Some of the more popular operators:
Operation      in R
Add            +
Subtract       -
Multiply       *
Divide         /
Exponent       ^
Square root    sqrt()
Sum            sum()
Log            log()
Try some stuff!
5+3
4-12*9
sqrt(10^2.5)
sum(5,3,2)
log(127)
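A few of the expressions above are worth a second look, because the result depends on operator precedence and on which logarithm log() computes. This is a small sketch using only base R; the extra calls (log10 and the base argument) are additions for comparison, not part of the original slide.

```r
# Multiplication binds tighter than subtraction, so this is 4 - (12 * 9)
4 - 12 * 9            # -104
(4 - 12) * 9          # -72

# The exponent is evaluated first, then sqrt() is applied to the result
sqrt(10^2.5)          # same value as 10^1.25

# sum() adds all of its arguments
sum(5, 3, 2)          # 10

# log() is the natural log unless a base is given
log(127)              # natural log of 127
log10(1000)           # 3
log(8, base = 2)      # 3
```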
Assignment
= or <- creates an object and assigns information to it
Try this:
x <- 5
Now x "contains" the value 5
To see the contents of the object, just type its name
x
Assignment: Try it
One number
num <- 5
Multiple
nums <- c(5, 6, 9, 12, 100)
One character string
beast <- "Aardvark"
Multiple
beasts <- c("Bird", "Dog", "Hi there")
Other objects
bestiary <- c(beast, beasts)
Action Words: Functions
 Some special words included by default, like
mean, c, cor, sd, t, t.test, hist, sum, anova, barplot, sqrt, lm, etc...
 Other users (and you!) make their own
recode, fa, ggplot, qqPlot
How do functions work?
name(argument = value, argument = value, ...)
ls()
hist(data)
mean(x = myvalues, na.rm = TRUE)
lm(y ~ x, data = surveyData)
recode(responses, recodes = "1=5; 2=4; 4=2; 5=1")
fa(Dataset, nfactors = 4 , rotate = "oblimin" , fm = "gls" )
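The name(argument = value, ...) pattern can be tried with built-in functions alone. This is a minimal sketch; myvalues is made-up data, included only so the calls run as written.

```r
# Made-up data with one missing value (hypothetical, for illustration only)
myvalues <- c(4, 8, 15, 16, 23, 42, NA)

# Arguments can be matched by name...
mean(x = myvalues, na.rm = TRUE)    # 18

# ...or by position (the first argument of mean() is x)
mean(myvalues, na.rm = TRUE)        # 18

# Without na.rm = TRUE, one missing value makes the result missing
mean(myvalues)                      # NA

# args() lists the arguments a function accepts
args(sd)
```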
Use some functions
T-scores:
x <- rnorm(200, mean=50, sd=10)
Letters:
let <- LETTERS
Google helps you learn which functions exist and how to use them
Try a few functions
mean(x)
sd(x)
hist(x)
summary(x)
length(let)
Assign output of function to an object
zx <- (x - mean(x)) / sd(x)
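As a quick check on that last line: standardized scores should come out with a mean of 0 and a standard deviation of 1. A small sketch follows; set.seed() is added only to make the random numbers repeatable, and zx2 is an illustrative name.

```r
set.seed(1)                           # repeatable random numbers
x  <- rnorm(200, mean = 50, sd = 10)  # T-scores
zx <- (x - mean(x)) / sd(x)           # z-scores

round(mean(zx), 10)                   # 0, up to rounding error
sd(zx)                                # 1

# scale() performs the same standardization in one step
zx2 <- as.vector(scale(x))
all.equal(zx, zx2)                    # TRUE
```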
Functions: How Do You Know?
 Which functions exist?
Google!
 How to use them?
?functionname
or
help("functionname")
also: Google
Built-in doodads
 Sequences
1:100
 Randomness
sample(1:100, 5, replace=TRUE)
 Distributions
 Normal
pnorm(-1.645)
pnorm(95, mean=100, sd=15/sqrt(25))
 t
pt(-1.73, df=24)
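One way to read the distribution calls above: the p-functions return the area below a cutoff. The sketch below spells out the second pnorm() call as a sample mean of 95 from a population with mean 100 and SD 15, with n = 25. That interpretation is an assumption, since the slide gives only the numbers. The q-, r- and d-functions are shown for comparison.

```r
# Area below z = -1.645 under the standard normal curve (about 0.05)
pnorm(-1.645)

# The second call, step by step (assumed reading: mean of n = 25 scores)
se <- 15 / sqrt(25)                   # standard error = 3
z  <- (95 - 100) / se                 # z-score of the sample mean
all.equal(pnorm(z), pnorm(95, mean = 100, sd = se))   # TRUE

# q-functions go the other way: from probability to cutoff
qnorm(0.05)                           # about -1.645
qt(0.05, df = 24)                     # cutoff for t with 24 df

# r-functions draw random values, d-functions give densities
rnorm(3)
dnorm(0)
```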
Quick Demo: Twenty-One Strategy
 In a game of Twenty-One, how often would you win
versus the dealer, if...
 Dealer always "holds" at 17
 You always "hold" at 18
 Simplified (for brevity):
(gotta take risks sometimes...)
 Aces always equal 1
 Initial 2 cards + 2 more "hits" (maximum)
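The plots in this demo use objects named sum4 and sum4.d, but the code that creates them is not part of these notes. What follows is a minimal sketch of one way such a simulation could be written under the rules above. It is not the original demo code. Assumptions: cards are drawn with replacement (an "infinite deck"), face cards count as 10, a bust always loses, and ties do not count as wins. The names n.games, card.values, deal, play and win are illustrative.

```r
set.seed(21)                          # repeatable results
n.games <- 10000

# Card values: Ace = 1, 2 through 10, and J, Q, K = 10 each
card.values <- c(1:10, 10, 10, 10)

# Deal 4 cards per game up front; cards 3 and 4 are used only on a "hit"
deal <- function(n) {
  matrix(sample(card.values, n * 4, replace = TRUE), ncol = 4)
}

# Start with 2 cards, then take up to 2 hits while below the hold value
play <- function(cards, hold.at) {
  total <- cards[, 1] + cards[, 2]
  for (k in 3:4) {
    hit   <- total < hold.at
    total <- total + ifelse(hit, cards[, k], 0)
  }
  total
}

sum4   <- play(deal(n.games), hold.at = 18)   # you hold at 18
sum4.d <- play(deal(n.games), hold.at = 17)   # dealer holds at 17

# You win if you did not bust and the dealer either busted or scored lower
bust   <- sum4 > 21
bust.d <- sum4.d > 21
win    <- !bust & (bust.d | sum4 > sum4.d)
mean(win)                             # proportion of games won
```

Changing hold.at for either player and re-running shows how the win rate responds to a more or less cautious strategy.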
Demo: Twenty-One
 We can pause and see what the distributions look like...
 Set the graphics space to print 2 rows (1 column) of charts:
par(mfrow=c(2,1))
 Histogram of our outcomes:
hist(sum4, col="lightgreen")
 Add a vertical line at 21:
abline(v=21, col="red", lwd=3)
 Now the dealer:
hist(sum4.d, col="pink")
abline(v=21, col="red", lwd=3)
Quick regression demo
x <- rnorm(200, mean=50, sd=10)
y <- x + rnorm(200, mean=0, sd=7)
Scatterplot
plot(x,y)
Or...
plot(y ~ x)
Regression analysis
mod <- lm(y ~ x)
Now view the analysis
summary(mod)
plot(mod)
Prettier graph
plot(y ~ x, pch=19, col="blue", main="Regression Plot")
abline(mod, col="red", lwd=2)
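Besides printing and plotting, a fitted model can be queried for specific pieces. The sketch below repeats the simulated data (with set.seed() added so the numbers are repeatable) and pulls out a few of them using base R functions.

```r
set.seed(42)
x   <- rnorm(200, mean = 50, sd = 10)
y   <- x + rnorm(200, mean = 0, sd = 7)
mod <- lm(y ~ x)

coef(mod)                     # intercept and slope; the slope should be near 1
summary(mod)$r.squared        # proportion of variance explained
confint(mod)                  # 95% confidence intervals for the coefficients

# Predicted y for new x values
predict(mod, newdata = data.frame(x = c(40, 50, 60)))
```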
Getting Data Into R
Can be frustrating at first
Then you learn how to do it
And how to fix the details that can go wrong
And then it's amazingly flexible and quite easy
Perfect microcosm of R
Import Data
 CSV format is your friend!
 From Excel or SPSS (or anything) → Save As .csv
 Then in R
CleverName <- read.csv("yourfile.csv")  # placeholder: the path to your .csv file
 Result: a data frame
 Works from URLs
Pun <- read.csv("http://darrinlrogers.com/static/data/pun.csv")
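To try the CSV round trip without a network connection, a tiny data set can be written to a temporary file and read back. This is a sketch with made-up data; scores, csv.file and Imported are illustrative names.

```r
# Build a tiny made-up data set and save it as CSV in a temporary file
scores <- data.frame(id    = 1:3,
                     group = c("a", "b", "a"),
                     score = c(2.5, 3, NA))
csv.file <- tempfile(fileext = ".csv")
write.csv(scores, csv.file, row.names = FALSE)

# Read it back in; the result is a data frame
Imported <- read.csv(csv.file)
class(Imported)               # "data.frame"
str(Imported)

# "NA" in the file comes back as a missing value
is.na(Imported$score)         # FALSE FALSE  TRUE
```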
View the data
 Names of variables:
names(Pun)
 See the first few values:
head(Pun)
 Information about variables:
summary(Pun)
str(Pun)
 See the full matrix
edit(Pun)
Working with Data Frames
 How to access individual variables:
$
dataframename$variablename
Pun$o.age
sub.num  sub.grp  trt.pro  p.sex  p.age  p.ethn  p.politaffil  first.o.type  o.age  o.devlvl  pun.so  pun.nso  ...
1        ugs      n        f      19     Wh      -2            s             24     3         3       3.25     ...
2        ugs      n        f      19     Wh      0             s             18     3         2.5     2.75     ...
3        ugs      n        m      19             0             n             21     3         3       2.5      ...
4        ugs      n        f      18     Wh      0             s             16     2         2.5     2.5      ...
5        ugs      n        m      18     Wh      1             s             22     3         2.5     2.5      ...
6        ugs      n                                            n             23     3         2.25             ...
7        ugs      n        m      18     Wh      -1            n             17     2         2       1.5      ...
8        ugs      n        f      20     Wh      -1            s             20     3         3.5     3        ...
9        ugs      n        m      19     Wh      0             n             27     3         3       1.5      ...
10       ugs      n        f      19     NW      1             n             19     3         2.5     2.25     ...
11       ugs      n        m      22     Wh      0             s             26     3         3.25    3.25     ...
12       ugs      n        m      23     Wh      0             n             25     3         2.75    3        ...
13       ugs      n        f      18     NW      0             n             15     2         2.25    2.25     ...
14       ugs      n        f      19     Wh      -1            s             7      1         1.75    1        ...
15       ugs      n        f      18     Wh      0             s             8      1         2.5     1.75     ...
16       ugs      n        f      18     NW      0             s             12     2         2.25    1.5      ...
...      ...      ...      ...    ...    ...     ...           ...           ...    ...       ...     ...      ...
(Blank cells are missing values. Row 6 has only one punishment score, 2.25; whether it belongs to pun.so or pun.nso is not recoverable.)
Working with Data Frames
 See all the values of participant age
Pun$p.age
 Summary stats of participant age
summary(Pun$p.age)
 Histogram of religious fundamentalism scores
hist(Pun$relig.fund)
 Right-wing authoritarianism by participant age
plot(Pun$rw.auth ~ Pun$p.age)
 Judgments of sex offender accountability by participant group
boxplot(Pun$acc.so ~ Pun$sub.grp)
 Barplot of offender development level
barplot(table(Pun$o.devlvl))  # note: table!
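The same access patterns can be practiced on a small stand-in data frame that needs no download. In this sketch Mini and all of its values are invented for illustration, and the group label "pro" is hypothetical. Only the column names are borrowed from Pun.

```r
# Made-up stand-in for Pun (all values invented for illustration)
Mini <- data.frame(
  p.age    = c(19, 18, NA, 22, 20, 19),
  sub.grp  = c("ugs", "ugs", "ugs", "pro", "pro", "pro"),
  o.devlvl = c(3, 3, 2, 1, 2, 3)
)

Mini$p.age                        # one variable, returned as a vector
mean(Mini$p.age, na.rm = TRUE)    # 19.6
table(Mini$o.devlvl)              # counts 1, 2, 3 for levels 1, 2, 3
Mini[Mini$sub.grp == "pro", ]     # rows for one group only
Mini[["p.age"]]                   # same result as Mini$p.age
```

The table() call is what barplot() needs: it turns raw codes into counts per category.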
Try some more stuff
 Look at names and structures of Pun variables...
 Apply functions to variables (substitute for x & y, below)
summary(Pun$p.age)
tbl <- table(Pun$o.devlvl)
# sometimes useful to have table as an object
barplot(tbl)
# see?
hist(Pun$pun.nso)
plot(rw.auth ~ relig.fund, data = Pun)
Try 2-way table (be sure to choose categorical variables):
tbl2 <- table(Pun$p.politaffil, Pun$sub.grp)
mosaicplot(tbl2, col = c("skyblue", "lightgreen", "pink"))
barplot(tbl2, beside = TRUE, col = c("red", "orange", "yellow", "green", "blue"))
Some Things Are Easier in R
 Histogram of number of sex offenders known
hist(Pun$num.offs.known)
 Histogram of transformed variable
hist(log(Pun$num.offs.known))
Some Things Are Easier in R
 Histogram of accountability ratings (SO+NSO)
hist(Pun$acc.all)
 Histogram of undergrad accountability ratings
with(subset(Pun, trt.pro == "n"), hist(acc.all , col="pink") )
 Add therapist accountability ratings (and make it pretty)
with(subset(Pun, trt.pro == 'y'), hist(acc.all, add=TRUE, col="skyblue") )
Some Things Are Easier in R
Regression and ANOVA (i.e., linear models): lm()
 Predict accountability judgments of offenders (acc.all) by
religious fundamentalism (relig.fund), right-wing
authoritarianism (rw.auth), and professional status (trt.pro)
acc.lm <- lm(acc.all ~ relig.fund + rw.auth + trt.pro, data=Pun)
summary(acc.lm)
Diagnostic Plots:
plot(acc.lm)
Some Things Are Easier in R
 ANOVA
 Effects of participant group (sub.grp) and offender development
level (o.devlvl) on preference for punishing offenders (pun.all)
pun.lm <- lm(pun.all ~ sub.grp * o.devlvl, data=Pun)
anova(pun.lm)
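The formula syntax can be explored on simulated data before applying it to a real data set. In the sketch below, Sim, grp, lvl and out are hypothetical stand-ins, not variables from Pun. One point worth keeping in mind: if a grouping variable is stored as numbers (for example, levels coded 1, 2, 3), lm() treats it as a continuous predictor unless it is wrapped in factor().

```r
set.seed(7)

# Simulated stand-in data: 2 groups x 3 levels, 20 cases per cell
Sim <- expand.grid(rep = 1:20, grp = c("a", "b"), lvl = c("1", "2", "3"))
Sim$grp <- factor(Sim$grp)
Sim$lvl <- factor(Sim$lvl)

# Outcome with a built-in effect of lvl, plus random noise
Sim$out <- 2 + 0.5 * as.numeric(Sim$lvl) + rnorm(nrow(Sim), sd = 1)

# grp * lvl expands to grp + lvl + grp:lvl (main effects plus interaction)
sim.lm <- lm(out ~ grp * lvl, data = Sim)
anova(sim.lm)                 # one row per term, plus residuals
```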
Some things are easier with packages
 Thousands of user-created packages
 There are even companies who do this as a business model
 Available nearly instantly in R
 To install a package named newfunctions...
install.packages("newfunctions")
 To load it into the workspace (i.e., make it accessible)
library(newfunctions)
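Installing is needed only once per machine, while library() is needed in every new session. A common pattern is to install only when the package is missing. In this sketch the package name is "stats", chosen because it ships with R, so the example runs without touching the network. The variable pkg is illustrative.

```r
# Install only if the package is not already available
pkg <- "stats"
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# character.only = TRUE is needed when the name is held in a variable
library(pkg, character.only = TRUE)

# A single function can also be called without library(), via package::function
stats::median(c(1, 3, 10))    # 3
```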
Some useful packages
psych – lots of nifty tools designed for psychological research
car – lots of amazing regression tools
dplyr – powerful data manipulation
lavaan – structural equation modeling
rvest – data scraping from the web
lme4, nlme – mixed-effects modeling (i.e., HLM, MLM, etc.)
Amelia, mice – multiple imputation for missing data
Bioconductor – repository of packages for bioinformatics research
ggplot2 – graphics that make more sense than in base R
Some more...
Package: ggplot2
 Better, prettier, more logical graphics
 Example: accountability ratings by treatment group and
offender developmental level
install.packages("ggplot2")
library(ggplot2)
ggplot(Pun, aes(x = o.devlvl, y = acc.all,
color=sub.grp, group=sub.grp)) +
stat_summary(fun="mean", na.rm=TRUE, geom="point") +
stat_summary(fun="mean", na.rm=TRUE, geom="line")
Package: psych
 Scatterplot matrix with histograms and correlations
install.packages("psych")
library(psych)
vars <- c("o.age", "pun.all", "acc.all", "trt.all", "relig.fund", "rw.auth")
pairs.panels(subset(Pun, select = vars))
Package: corrgram
 Correlogram
install.packages("corrgram")
library(corrgram)
corrgram(subset(Pun, select = vars),
upper.panel=panel.pie, lower.panel=panel.ellipse)
The R Learning Curve
More work than “learning SPSS”
Less work than “learning javascript”
About the same as "learning Stata"
Tips to reduce the learning curve
 Get a good book (or three) on R
 Get comfortable with Google and Stack Overflow
 Only learn what you need to, right now
 Try R Commander GUI?
YMMV
Basic R Resources
Help system built into R
Google
Stack Overflow
Google
R-Help mailing list archives (not so active)
Google
Websites made for R users
Google
Comprehensive R Archive Network (cran.r-project.org)
R Books
 Bare-Bones R (Thomas P. Hogan)
 Extremely short
 Nice for beginners
 Statistical Analysis with R (John M. Quick)
 Good reviews from R beginners
 An Introduction to R
 A bit dry, technical
 An R Companion to Applied Regression
 Lots more out there, some free and online
Some very helpful websites
 Quick-R
 A bunch of links from UCLA’s excellent IDRE
 Google “R Tutorial”
 Here’s one by Kelly Black at Clarkson University
 Another one from a help book series
Seriously, Google is your friend.
There are thousands of sites (at least!) with R help on them.
More notes on Getting Help
 From within R
 RSiteSearch("xxx")
 ?xxx or help("xxx")
 Google
 “R-Help xxx”
 “R package xxx”
 “R how-to xxx”
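A few more base R helpers can narrow a search before turning to Google. This is a brief sketch; the search pattern and the function names looked up are arbitrary examples.

```r
# Find functions by name when only part of the name is remembered
hits <- apropos("^read\\.csv")
hits                          # includes "read.csv"

# Show a function's arguments without opening its help page
args(read.csv)

# Run the examples from a function's help page
example("mean", echo = FALSE)

# Keyword search across installed help pages (can be slow, so left commented)
# help.search("linear model")
```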
this is
THE END