INTRODUCTION TO R – PART 1
Darrin Rogers
Psychology Department
Feb. 5, 2016

Structure of this course
  General information about R
  How to get help and information
  How to do basic things
  Basic analyses

Installing R
  http://r-project.org
  You want the "base" package.
  Click “Download R for...” (Windows, Mac OS, etc.)
  If you have a Mac, do this FIRST (for best experience):
    Install TCL/TK: http://cran.r-project.org/bin/macosx/tools/ → tcltk-8.5.5-x11.dmg

What is R?
  A computer language developed specifically for data analysis
  You don’t need to “program”
    You don’t need to write programs or scripts
  R is interactive
    Type something in, get an immediate result

What can R do?
  Anything any other statistics software can do
  And usually lots more
    Because it’s a programming language

What can R do with Graphics?
  Anything*
  Some of my graphs (from research & teaching):
    Figure 2. Distribution of true Phase 2 SIS scores (blue) versus randomly-generated profiles (red).
    Figure 4. AUCs for 100 runs of SIS discrimination between original profiles and partially (1% through 100%) random profiles. Light blue lines are AUCs for 100 individual runs; dark blue line indicates mean AUC at each point.
  * Eventually...

R and Graphics
  Others’ graphs (not mine)
    http://www.statmethods.net/advgraphs/
    http://gallery.r-enthusiasts.com/graph/US.2004.elections.map.113
    http://gallery.r-enthusiasts.com/graph/Scatter.plot.3D.44
    http://gallery.r-enthusiasts.com/graph/Smily.and.Grumpy.faces.174
    http://gallery.r-enthusiasts.com/graph/Correlation.matrix.ellipses.149
    http://gallery.r-enthusiasts.com/graph/Correlograms.148
    http://gallery.r-enthusiasts.com/graph/SuperStorm.Sandy.170
    http://gallery.r-enthusiasts.com/graph/Notched.boxplots.6
    http://gallery.r-enthusiasts.com/graph/Image.lag.plot.matrix.158

Using R as a calculator
  Start R (double-click a big blue R icon)
  Make something happen! Try some stuff!
  Some of the more popular operators:
    Operation     In R
    Add           +
    Subtract      -
    Multiply      *
    Divide        /
    Exponent      ^
    Square root   sqrt()
    Sum           sum()
    Log           log()

  Examples:
    5+3
    4-12*9
    sqrt(10^2.5)
    sum(5,3,2)
    log(127)

Assignment
  = or <- creates an object and assigns information to it
  Try this:
    x <- 5
  Now x "contains" the value 5
  To see the contents of the object, just type its name:
    x
  [Diagram: numbers (7, 25, 12.99, ...) and names ("Bob", "Alice", "Fred", ...) being stored in an object with <-]

Assignment: Try it
  One number
    num <- 5
  Multiple
    nums <- c(5, 6, 9, 12, 100)
  One character string
    beast <- "Aardvark"
  Multiple
    beasts <- c("Bird", "Dog", "Hi there")
  Other objects
    bestiary <- c(beast, beasts)

Action Words: Functions
  Some special words included by default, like
    mean  c  cor  sd  t  t.test  hist  sum  anova  barplot  sqrt  lm  etc...
  Other users (and you!) make their own
    recode  fa  ggplot  qqPlot

How do functions work?
  name(argument = value, argument = value, ...)
    ls()
    hist(data)
    mean(x = myvalues, na.rm = TRUE)
    lm(y ~ x, data = surveyData)
    recode(responses, recodes = "1=5; 2=4; 4=2; 5=1")
    fa(Dataset, nfactors = 4, rotate = "oblimin", fm = "gls")

Use some functions
  (Google helps you learn which functions exist and how to use them)
  T-scores:
    x <- rnorm(200, mean=50, sd=10)
  Letters:
    let <- LETTERS
  Try a few functions
    mean(x)
    sd(x)
    hist(x)
    summary(x)
    length(let)
  Assign output of function to an object
    zx <- (x - mean(x)) / sd(x)

Functions: How Do You Know?
  Which functions exist?
    Google!
  How to use them?
    ?functionname
    or help("functionname")
    also: Google

Built-in doodads
  Sequences
    1:100
  Randomness
    sample(1:100, 5, replace=TRUE)
  Distributions
    Normal
      pnorm(-1.645)
      pnorm(95, mean=100, sd=15/sqrt(25))
    t
      pt(-1.73, df=24)

Quick Demo: Twenty-One Strategy
  In a game of Twenty-One, how often would you win versus the dealer, if...
    Dealer always "holds" at 17
    You always "hold" at 18 (gotta take risks sometimes...)
  Simplified (for brevity):
    Aces always equal 1
    Initial 2 cards + 2 more "hits" (maximum)

Demo: Twenty-One
  We can pause and see what the distributions look like...
  Set the graphics space to print 2 rows (1 column) of charts:
    par(mfrow=c(2,1))
  Histogram of our outcomes:
    hist(sum4, col="lightgreen")
  Add a vertical line at 21:
    abline(v=21, col="red", lwd=3)
  Now the dealer:
    hist(sum4.d, col="pink")
    abline(v=21, col="red", lwd=3)

Quick regression demo
    x <- rnorm(200, mean=50, sd=10)
    y <- x + rnorm(200, mean=0, sd=7)
  Scatterplot
    plot(x,y)
    Or... plot(y ~ x)
  Regression analysis
    mod <- lm(y ~ x)
  Now view the analysis
    summary(mod)
    plot(mod)
  Prettier graph
    plot(y ~ x, pch=19, col="blue", main="Regression Plot")
    abline(mod, col="red", lwd=2)

Getting Data Into R
  Can be frustrating at first
  Then you learn how to do it
    And how to fix the details that can go wrong
  And then it's amazingly flexible and quite easy
  Perfect microcosm of R

Import Data
  CSV format is your friend!
  From Excel or SPSS (or anything)
    Save As .csv
  Then in R
    CleverName <- read.csv("yourfile.csv")   # replace with the path to your own file
  Result: a data frame
  Works from URLs
    Pun <- read.csv("http://darrinlrogers.com/static/data/pun.csv")

View the data
  Names of variables:
    names(Pun)
  See the first few values:
    head(Pun)
  Information about variables:
    summary(Pun)
    str(Pun)
  See the full data frame:
    edit(Pun)

Working with Data Frames
  How to access individual variables: $
    dataframename$variablename
    Pun$o.age

  sub.num sub.grp trt.pro p.sex p.age p.ethn p.politaffil first.o.type o.age o.devlvl pun.so pun.nso ...
  1 ugs n f 19 Wh -2 s 24 3 3 3.25 ...
  2 ugs n f 19 Wh 0 s 18 3 2.5 2.75 ...
  3 ugs n m 19 0 n 21 3 3 2.5 ...
  4 ugs n f 18 Wh 0 s 16 2 2.5 2.5 ...
  5 ugs n m 18 Wh 1 s 22 3 2.5 2.5 ...
  6 ugs n n 23 3 2.25 ...
  7 ugs n m 18 Wh -1 n 17 2 2 1.5 ...
  8 ugs n f 20 Wh -1 s 20 3 3.5 3 ...
  9 ugs n m 19 Wh 0 n 27 3 3 1.5 ...
  10 ugs n f 19 NW 1 n 19 3 2.5 2.25 ...
  11 ugs n m 22 Wh 0 s 26 3 3.25 3.25 ...
  12 ugs n m 23 Wh 0 n 25 3 2.75 3 ...
  13 ugs n f 18 NW 0 n 15 2 2.25 2.25 ...
  14 ugs n f 19 Wh -1 s 7 1 1.75 1 ...
  15 ugs n f 18 Wh 0 s 8 1 2.5 1.75 ...
  16 ugs n f 18 NW 0 s 12 2 2.25 1.5 ...
  ...
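
The `$` access shown above can be tried without downloading anything. The following is a minimal sketch on a made-up data frame: the name `mini` and all of its values are hypothetical, and only the column names are borrowed from the Pun data.

```r
# Hypothetical miniature data frame; the values are invented for illustration
mini <- data.frame(
  p.age = c(19, 18, 23, NA),
  p.sex = c("f", "m", "f", "m"),
  o.age = c(24, 18, 21, 16)
)

mini$o.age                       # one column, returned as a plain vector
mean(mini$p.age, na.rm = TRUE)   # skips the missing value; gives 20
mini[mini$o.age > 18, ]          # only the rows where o.age is above 18
mini[["p.sex"]]                  # another way to write mini$p.sex
```

The same patterns should carry over to Pun by swapping in its name and variables, for example mean(Pun$p.age, na.rm = TRUE).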
Working with Data Frames
  See all the values of participant age
    Pun$p.age
  Summary stats of participant age (summary() reports missing values on its own)
    summary(Pun$p.age)
  Histogram of religious fundamentalism scores
    hist(Pun$relig.fund)
  Right-wing authoritarianism by participant age
    plot(Pun$rw.auth ~ Pun$p.age)
  Judgments of sex offender accountability by participant group
    boxplot(Pun$acc.so ~ Pun$sub.grp)
  Barplot of offender development level
    barplot( table(Pun$o.devlvl) )   # note: table!

Try some more stuff
  Look at names and structures of Pun variables...
  Apply functions to variables (substitute other variables in the examples below)
    summary(Pun$p.age)
    tbl <- table(Pun$o.devlvl)   # sometimes useful to have table as an object
    barplot(tbl)                 # see?
    hist(Pun$pun.nso)
    plot(rw.auth ~ relig.fund, data = Pun)
  Try 2-way table (be sure to choose categorical variables):
    tbl2 <- table(Pun$p.politaffil, Pun$sub.grp)
    mosaicplot(tbl2, col = c("skyblue", "lightgreen", "pink"))
    barplot(tbl2, beside = TRUE, col = c("red", "orange", "yellow", "green", "blue"))

Some Things Are Easier in R
  Histogram of number of sex offenders known
    hist(Pun$num.offs.known)
  Histogram of transformed variable
    hist(log(Pun$num.offs.known))

Some Things Are Easier in R
  Histogram of accountability ratings (SO+NSO)
    hist(Pun$acc.all)
  Histogram of undergrad accountability ratings
    with(subset(Pun, trt.pro == "n"), hist(acc.all, col="pink"))
  Add therapist accountability ratings (and make it pretty)
    with(subset(Pun, trt.pro == "y"), hist(acc.all, add=TRUE, col="skyblue"))

Some Things Are Easier in R
  Regression and ANOVA (i.e., linear models): lm()
  Predict accountability judgments of offenders (acc.all) by religious fundamentalism (relig.fund), right-wing authoritarianism (rw.auth), and professional status (trt.pro)
    acc.lm <- lm(acc.all ~ relig.fund + rw.auth + trt.pro, data=Pun)
    summary(acc.lm)
  Diagnostic Plots:
    plot(acc.lm)

Some Things Are Easier in R
  ANOVA
  Effects of participant group (sub.grp) and offender development level (o.devlvl) on preference for punishing
  offenders (pun.all)
    pun.lm <- lm(pun.all ~ sub.grp * o.devlvl, data=Pun)
    anova(pun.lm)

Some things are easier with packages
  Thousands of user-created packages
    There are even companies that do this as a business model
  Available nearly instantly in R
  To install a package named newfunctions...
    install.packages("newfunctions")
  To load it into the workspace (i.e., make it accessible)
    library(newfunctions)

Some useful packages
  psych – lots of nifty tools designed for psychological research
  car – lots of amazing regression tools
  dplyr – powerful data manipulation
  lavaan – structural equation modeling
  rvest – data scraping from the web
  lme4, nlme – mixed-effects modeling (i.e., HLM, MLM, etc.)
  Amelia, mice – multiple imputation for missing data
  Bioconductor – a separate repository of packages for bio research
  ggplot2 – graphics that make more sense than in base R
  Some more...

Package: ggplot2
  Better, prettier, more logical graphics
  Example: accountability ratings by treatment group and offender developmental level
    install.packages("ggplot2")
    library(ggplot2)
    ggplot(Pun, aes(x = o.devlvl, y = acc.all, color = sub.grp, group = sub.grp)) +
      stat_summary(fun = "mean", na.rm = TRUE, geom = "point") +
      stat_summary(fun = "mean", na.rm = TRUE, geom = "line")
  (In ggplot2 versions before 3.3.0, write fun.y instead of fun.)

Package: psych
  Scatterplot matrix with histograms and correlations
    install.packages("psych")
    library(psych)
    vars <- c("o.age", "pun.all", "acc.all", "trt.all", "relig.fund", "rw.auth")
    pairs.panels(subset(Pun, select = vars))

Package: corrgram
  Correlogram
    install.packages("corrgram")
    library(corrgram)
    corrgram(subset(Pun, select = vars), upper.panel=panel.pie, lower.panel=panel.ellipse)

The R Learning Curve
  More work than “learning SPSS”
  Less work than “learning JavaScript”
  About the same as "learning Stata"
  Tips to reduce the learning curve
    Get a good book (or three) on R
    Get comfortable with Google and Stack Overflow
    Only learn what you need to, right now
    Try R Commander GUI?
      YMMV

Basic R Resources
  Help system built into R
  Google
  Stack Overflow
  Google
  R-Help mailing list archives (not so active)
  Google
  Websites made for R users
  Google
  Comprehensive R Archive Network (cran.r-project.org)

R Books
  Bare-Bones R (Thomas P. Hogan)
    Extremely short
    Nice for beginners
  Statistical Analysis with R (John M. Quick)
    Good reviews from R beginners
  An Introduction to R
    A bit dry, technical
  An R Companion to Applied Regression
  Lots more out there, some free and online

Some very helpful websites
  Quick-R
  A bunch of links from UCLA’s excellent IDRE
  Google “R Tutorial”
    Here’s one by Kelly Black at Clarkson University
    Another one from a help book series
  Seriously, Google is your friend. There are thousands of sites (at least!) with R help on them.

More notes on Getting Help
  From within R
    RSiteSearch("xxx")
    ?xxx or help("xxx")
  Google
    “R-Help xxx”
    “R package xxx”
    “R how-to xxx”

THE END
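
Supplementary sketch for the Twenty-One demo: the slides plot sum4 and sum4.d but do not show how they were produced. The code below is one possible way to simulate them under the stated rules (dealer holds at 17, you hold at 18, aces equal 1, two initial cards plus at most two hits). It is not the code used in the original demo. The helper play.hand, the number of hands, the seed, drawing cards with replacement, and the win rule are all assumptions.

```r
set.seed(21)      # assumption: fixed seed so the run is repeatable
n <- 10000        # assumption: number of simulated hands

# Card values: ace = 1, 2 through 10 at face value, J/Q/K = 10
card.values <- c(1:10, 10, 10, 10)

# Hypothetical helper: play one hand, holding once the total reaches `hold`
play.hand <- function(hold) {
  cards <- sample(card.values, 4, replace = TRUE)  # assumption: draws with replacement
  total <- cards[1] + cards[2]                     # initial 2 cards
  for (i in 3:4) {                                 # at most 2 more "hits"
    if (total < hold) total <- total + cards[i]
  }
  total
}

sum4   <- replicate(n, play.hand(hold = 18))   # our final totals
sum4.d <- replicate(n, play.hand(hold = 17))   # the dealer's final totals

# Assumed win rule: we are not bust, and the dealer is bust or has a lower total
win <- sum4 <= 21 & (sum4.d > 21 | sum4 > sum4.d)
mean(win)   # estimated proportion of hands won
```

With sum4 and sum4.d defined this way, the hist() and abline() calls from the demo slide should run as written.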