Introduction to R: Joseph Powell

Overall Aims
• Introduce programming concepts relevant to MX
• Demonstrate the strengths (and weaknesses) of R

Books
• The R Book – Crawley (2007)
• Introductions to statistics using R
  – Cohen Y. and Cohen J. Y. (2008). Statistics and Data with R.
  – Crawley M. (2005). Statistics: An Introduction using R.
  – Dalgaard P. (2002). Introductory Statistics with R.
  – Maindonald J. & Braun J. (2003). Data Analysis and Graphics Using R: An Example-based Approach.
• Books on biological topics
  – Paradis E. (2006). Analysis of Phylogenetics and Evolution with R.
  – Broman K. W. & Sen S. (2009). A Guide to QTL Mapping with R/qtl.
  – Bolker B.M. (2008). Ecological Models and Data in R.
• Books on statistical topics
  – Aitkin M. et al. (2009). Statistical Modelling in R.
  – Faraway J. (2009). Linear Models with R.
  – Albert J. (2009). Bayesian Computation with R.
  – Bivand R.S. et al. (2009). Applied Spatial Data Analysis with R.
  – Cowpertwait P.S.P. & Metcalfe A.V. (2009). Introductory Time Series with R.
• Books on R specifics and R programming
  – Spector P. (2008). Data Manipulation with R.
  – Murrell P. (2006). R Graphics.
  – Chambers J. M. (2008). Software for Data Analysis: Programming with R.

Websites
• Websites:
  – CRAN R: http://www.r-project.org/
  – R cookbook: http://www.r-cookbook.com/
  – R graphics: http://addictedtor.free.fr/graphiques/
  – R wiki: http://wiki.r-project.org/
  – Mailing lists: http://www.r-project.org/mail.html
  – R seek: http://www.rseek.org/
• Websites on statistical topics
  – R genetics: http://rgenetics.org/trac/rgalaxy
  – Bioconductor: http://www.bioconductor.org/

The console
• Load up R
• Console window appears, with a command prompt
• Everything in the R console can be partitioned into two fundamental operations:
  – Input variables
    > x <- 2
  – Output variables
    > x
    [1] 2

Objects
• Names
  – Case sensitive, no spaces
  – Must begin with a letter but also can contain numbers and: . _
  – Try to give your objects meaningful names
    > My_f4vourite.langua6e_evR <- "R"
• x, y and My_f4v… are objects that we have created
    > ls() # this will bring up a list of all our objects
    > rm(y) # this deletes y (forever)
    > rm(list=ls()) # this deletes everything (..forever)

Workspace 1
• Everything shown in this list of objects comprises our 'workspace'
    > ls()
    [1] "My_f4vourite.langua6e_evR" "x" "y"
    > save.image(file="myworkspace.RData")
    > rm(list=ls())
    > ls()
    character(0)
    > load(file = "myworkspace.RData")
    > ls()
    [1] "My_f4vourite.langua6e_evR" "x" "y"
• Objects are internal to R
  – Does not behave like a file structure on the computer
  – Can't be read or interpreted outside R (?)
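
The save and load round trip can be tried safely with a small sketch like the one below. It is only an illustration: the objects x and y and the temporary file ws_file are made up for the example, and it uses save() with an explicit object list so that it touches nothing else in your workspace (save.image() would write every object in the same way).

```r
# Hypothetical objects and a temporary file, used only for illustration
x <- 2
y <- c(1, 2, 3)
ws_file <- tempfile(fileext = ".RData")

save(x, y, file = ws_file)     # write the two objects to disk
rm(x, y)                       # remove them from memory
x_gone <- !exists("x", inherits = FALSE)  # no x left in this environment

load(file = ws_file)           # restore the saved objects
print(x)
print(y)
```

After load() the objects are back under their original names, which is why loading a workspace can silently overwrite objects that share a name.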

Workspace 2
• You can select which objects to save
    > save(y, x, file = "two_objects.RData")
• Different computer folders can be accessed
    > getwd() # shows the current working directory
    > dir() # lists the files in the current working directory
    > setwd("~/work_directory") # sets R's focus to a different computer folder

Built-in functions
• Native functions make R succinct
• Diverse range available, from graphics to data manipulation to statistical algorithms etc.
• Highly optimised, so use them if they are available instead of writing your own
• Function structure:
    > function_name(<argument 1>, <argument 2>, …)

Missing values
• NA is a "reserved" word in R
• It is a single element (length 1) that indicates a missing value
• A helpful alternative to coding missing values with a number (e.g. -99)
    > my_array <- c(NA,100,120,120,120,130,NA)
    > sum(my_array)
    [1] NA
    > sum(my_array, na.rm=TRUE) # most functions allow you to explicitly state how to handle NA
    [1] 590
    > table(my_array) # HOWEVER the default action varies from function to function
    my_array
    100 120 130
      1   3   1

R help pages
• Each function has its own unique syntax
  – Default arguments
  – Data structure requirements
  – Output options
    > ?seq # brings up help page of seq() function
    > ??"sequence" # searches for all related functions
• Note
    > seq(from = 2, to = 100, by = 2)
  is clearer than
    > seq(2,100,2)

Basic Scripting
• Note pad / text editor
  – Within the R GUI
  – Open with: File > New Script or Ctrl+N
  – Layout as tile is useful: Windows > Tile
  – Useful for keeping all work together
  – Scripts can be saved
  – Can be used to save a "program"
  – Add # comments
  – Check individual bits of code
  – Run code with Ctrl+R: the whole current line, or the selected code

Basic Scripting
• Brackets
  – ( ) functions
  – [ ] subsets
  – { } processes
• Subsets
  – Take a subset of an object
  – Objects have either 1 x n or m x n dimensions
    > x
    [1]  2  5  6  2  6 77 55
    > x[5]
    [1] 6
    > x
         [,1] [,2] [,3] [,4]
    [1,]    1    4    7   10
    [2,]    2    5    8   11
    [3,]    3    6    9   12
    > x[3,4] # [rows, columns]
    [1] 12

Basic Scripting
• Data input
  – Direct input into the console
    • scan()
  – Reading in data
    • read.table / read.csv
      – "name.txt"
      – "c:\\temp\\name.txt"
      – file.choose()
      – list.files()
      – dir()
    > y <- scan()
    1: 3
    2: 4
    3: 12
    4: 3
    5: 5
    6: 2
    7: 14
    8:
    Read 7 items
    > dir()
    [1] "temp.csv" "temp2.csv" "name.txt"
    > y <- read.table("name.txt", header=TRUE, sep="\t")

Basic Scripting
• Data output
  – Divert console output to a file
    • sink()
  – Writing out data
    • write.table / write.csv
      – "name.txt"
      – "c:\\temp\\name.txt"
    > sink("sink_tmp.txt") # console output now goes to the file
    > i <- 1:10
    > outer(i, i, "*")
    > sink() # return output to the console
    > dir()
    [1] "temp.csv" "temp2.csv" "name.txt"
    > write.table(y, file="name.txt", col.names=TRUE, sep="\t")

Basic Scripting
• Adding rows and columns
  – Allows objects to be joined, either to an existing object or to make a new object
  – cbind() – adds columns together
  – rbind() – adds rows together
    > y1
         [,1] [,2] [,3]
    [1,]    1    3 12.5
    [2,]    1    2 13.8
    [3,]    1    5 15.3
    [4,]    1    4 16.8
    > y2
          [,1]
    [1,] 0.349
    [2,] 0.745
    [3,] 0.684
    [4,] 0.964
    > y3 <- cbind(y1, y2)
    > y3
         [,1] [,2] [,3]  [,4]
    [1,]    1    3 12.5 0.349
    [2,]    1    2 13.8 0.745
    [3,]    1    5 15.3 0.684
    [4,]    1    4 16.8 0.964
    > y3 <- rbind(y1, y2[1:3])
    > y3
          [,1]  [,2]   [,3]
    [1,] 1.000 3.000 12.500
    [2,] 1.000 2.000 13.800
    [3,] 1.000 5.000 15.300
    [4,] 1.000 4.000 16.800
    [5,] 0.349 0.745  0.684

Basic Scripting
• for loops
  – loop through a set of commands a given number of times
  – very useful, but not optimal for memory or speed, so prefer a built-in function where one exists
    > dim(y)
    [1] 10 10
    > for(i in 1:nrow(y)) { y_mean <- mean(y[i, 1:10]) } # y_mean is overwritten on every pass
    > y_mean # only the mean of the last row is left
    [1] 0.1974492
    > out <- array(0, c(nrow(y), 1))
    > for(i in 1:nrow(y)) { out[i] <- mean(y[i, ]) } # store the mean of each row
    > out
                [,1]
     [1,] -0.3110800
     [2,] -0.2000344
     [3,]  0.2019573
     [4,]  0.2859823
     [5,]  0.1932523
     [6,]  0.2759323
     [7,] -0.2571102
     [8,] -0.1037983
     [9,]  0.3522018
    [10,]  0.1974492

Data Manipulation
• Check data
  – dim()
  – mydata[1:10, 1:10]
  – str()
  – summary()
  – head()
  – tail()
  – table()
  – etc…
    > mydata <- read.table("mydata.txt", header=TRUE, sep="\t")
    > dim(mydata)
    [1]  642 1470
    > mydata[1:10, 1:10]
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
     [1,]    2    2    1    2    1    2    0    1    0     1
     [2,]    0    0    2    2    0    0    1    2    1     2
     [3,]    0    2    2    2    1    1    0    0    2     1
     [4,]    2    0    2    2    2    0    1    2    0     1
     [5,]    2    0    0    2    0    1    1    0    2     0
     [6,]    2    1    2    1    1    0    2    2    1     1
     [7,]    1    1    2    2    1    2    2    2    0     1
     [8,]    0    1    0    0    0    1    1    1    1     1
     [9,]    0    0    1    2    1    2    2    0    0     1
    [10,]    1    0    1    1    2    0    1    0    0     1

Data Manipulation
• Reordering
  – Works on a data.frame or matrix (numbers or letters)
  – Use: order()
  – index <- order(old[,1], decreasing=TRUE)
    > dim(lamb)
    [1] 1600    5
    > head(lamb)
      Field   Weight sire dam sex
    1     A 22.92368    1   1   F
    2     A 27.52896    1   1   F
    3     A 25.52592    1   1   M
    4     A 25.56016    1   1   M
    5     A 24.53296    1   2   F
    6     A 22.03344    1   2   F
    > lamb <- lamb[order(lamb$sex, decreasing=FALSE), ]
    > head(lamb)
       Field   Weight sire dam sex
    1      A 22.92368    1   1   F
    2      A 27.52896    1   1   F
    5      A 24.53296    1   2   F
    6      A 22.03344    1   2   F
    9      A 30.37944    2   1   F
    10     A 25.93680    2   1   F

Data Manipulation
• Reordering
  – order()
    > lamb <- lamb[order(lamb$sex, decreasing=FALSE), ]
  Expanded way
    > rows <- order(lamb$sex, decreasing=FALSE)
    > lamb <- lamb[rows, ]
    > index <- order(lamb$sex, decreasing=FALSE)
    > head(index)
    [1]  1  2  5  6  9 10
    > lamb <- lamb[index, ]

Data Manipulation
• Replacing
  – which()
  – index
    > class(lamb)
    [1] "matrix"
    > head(lamb)
      Field   Weight sire dam sex
    1     A 22.92368    1   1   F
    2     A 27.52896    1   1   F
    3     B 25.52592    1   1   M
    > index <- which(lamb[,1]=="A")
    > head(index)
    1 2 4 6 7 10
    > lamb[index, 1] <- "C"
    > index <- lamb[,1]=="A" # a logical index works too
    > head(index)
    [1] TRUE TRUE FALSE TRUE FALSE
    > lamb[index, 1] <- "C"
    > head(lamb)
      Field   Weight sire dam sex
    1     C 22.92368    1   1   F
    2     C 27.52896    1   1   F
    3     B 25.52592    1   1   M
  Put it together
    > lamb[which(lamb[,1]=="A"), 1] <- "C"

Data Manipulation
• Replacing
    > class(lamb)
    [1] "matrix"
    > head(lamb)
      Field   Weight sire dam sex
    1     A 22.92368    1   1   F
    2     A 27.52896    1   1   F
    3     B 25.52592    1   1   M
    > index <- lamb[,2] <= 22.000
    > table(index)
    index
    FALSE  TRUE
     1553    47
    > lamb[index, 2] <- NA # the reserved missing value, not the string "NA"
    > which(lamb[,2] >= 20.0 & lamb[,2] <= 21.0)
    214 363 496 842 921 983 1103 1126
    > which(lamb[,1]=="A" & lamb[,2] >= 20.0 & lamb[,2] <= 21.0)
    214 363 496
    > new_lamb <- lamb[which(lamb[,1]=="A" & lamb[,2] >= 20.0 & lamb[,2] <= 21.0), ]
    > new_lamb
        Field Weight sire dam sex
    214     A   2046   27   2   F
    363     A   2008   46   1   M
    496     A   2041   62   2   M

Graphics with R: Overview
1. Why graphics?
2. Why graphics in R?
3. The R graphics systems (did you really expect just one?)
4. Graphics basics and examples
5. Customisation of a graphic
6. Overview of different systems and packages

plot(x, y, …)
    > ?Formaldehyde
    > head(Formaldehyde)
      carb optden
    1  0.1  0.086
    2  0.3  0.269
    3  0.5  0.446
    4  0.6  0.538
    5  0.7  0.626
    6  0.9  0.782
    > plot(Formaldehyde)
    > ?par
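
As a rough sketch of how plot() and par() fit together, the example below draws the Formaldehyde data with a few customised settings. The output file, margin values, labels, symbol and colour are illustrative assumptions, not settings taken from the slides; the plot is written to a temporary PDF so the code also runs without a screen device.

```r
# Formaldehyde ships with R (datasets package); everything else here is illustrative
plot_file <- tempfile(fileext = ".pdf")   # hypothetical output file
pdf(plot_file)                            # draw to a file instead of the screen

old_par <- par(mar = c(5, 4, 2, 1))       # change the margins, keep the old settings
plot(Formaldehyde$carb, Formaldehyde$optden,
     xlab = "Carbohydrate (ml)",
     ylab = "Optical density",
     main = "Formaldehyde",
     pch = 19, col = "blue")
par(old_par)                              # restore the previous settings

dev.off()                                 # close the file device
```

Saving the value returned by par() and passing it back afterwards is a common way to stop one plot's settings leaking into the next.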