R Basics
Xudong Zou
Prof. Yundong Wu
Dr. Zhiqiang Ye
18th Dec. 2013

R Basics
History of R language
How to use R
Data type and Data Structure
Data input
R programming
Summary
Case study

History of R language
Robert Gentleman, Ross Ihaka
2013-09-25: Version: R-3.0.2
Packages: 5088

What is R?
• R is a programming language and also an environment for statistical analysis and graphics.

Why use R
• R is open and free. It currently contains 5088 packages, which make R a powerful tool for financial analysis, bioinformatics, social network analysis, natural language processing, and so on.
• More and more people in science are learning and using R.
# Bioconductor: bioinformatics analysis (microarray)
# survival: survival analysis

How to use R
Console: enter commands here
Create or open an R script
Use ? to get help
Click here to add R packages

Data type and Data structure
Data type in R:
  numeric: integer, single float, double float (R stores all real numbers in double precision)
  character
  complex
  logical

Data structure in R:
Objects      Class                                          Mixed-class permitted?
Vector       numeric, char, complex, logical                no
Factor       numeric, char                                  no
Array        numeric, char, complex, logical                no
Matrix       numeric, char, complex, logical                no
Data frame   numeric, char, complex, logical                yes
list         numeric, char, complex, logical, func, exp…    yes

Vector and vector operation
A vector is the simplest data structure in R: a single entity containing a collection of numbers, characters, complex values, or logical values.
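A minimal sketch of creating vectors and checking their attributes. The name vec1 is the one used on the following slides; vec2, mixed, and all of the values are assumptions chosen for illustration, not the presenter's data.

```r
# Create two vectors with the assignment operator <-
# (values are hypothetical, chosen only for illustration)
vec1 <- c(2, 4, 6, 8, 10, 12)            # numeric vector
vec2 <- c("geneA", "geneB", "geneC")     # character vector

# Check the attributes of each vector
class(vec1)    # "numeric"
length(vec1)   # 6
class(vec2)    # "character"

# A vector cannot hold mixed classes: elements are coerced to one type
mixed <- c(1, "a", TRUE)
class(mixed)   # "character"
```

Because a vector holds a single class, mixing numbers, strings, and logicals in c() silently converts every element to character, which matches the "no" in the mixed-class column of the table above.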
The slide creates two vectors with the assignment operator (note the left-pointing arrow, <-), checks their attributes, and then performs basic operations on them.

# basic operation on vector:
> max(vec1)
> min(vec1)
> mean(vec1)
> median(vec1)
> sum(vec1)
> summary(vec1)
> vec1
> vec1[1]                 # first element
> x <- vec1[-1]; x[1]     # drop the first element, then show the new first element
> vec1[7] <- 15; vec1     # assign to position 7, then print the vector

array and matrix
An array can be considered as a multiply subscripted collection of data entries.
> x <- 1:24
> dim(x) <- c(4,6)      # create a 2D array with 4 rows and 6 columns
> dim(x) <- c(2,3,4)    # create a 3D array

array()
> x <- 1:24
> array(data=x, dim=c(4,6))
> array(x, dim=c(2,3,4))

array indexing
> x <- 1:24
> y <- array(data=x, dim=c(2,3,4))
> y[1,1,1]
> y[,,2]
> y[,,1:2]

A matrix is an array with exactly two dimensions.
> class(potentials)       # "matrix" (R 4.0 and later print "matrix" "array")
> dim(potentials)         # 20 20
> rownames(potentials)    # GLY ALA SER …
> colnames(potentials)    # GLY ALA SER …
> min(potentials)         # -4.4

list
A list is an object that contains other objects as its components. A component can be a numeric vector, a logical value, a character string, another list, and so on. The components of a list do not need to be of one type; they can be of mixed types.
> Lst <- list(drugName="warfarin", no.target=3, price=500,
+             symb.target=c("geneA","geneB","geneC"))
> length(Lst)          # 4
> attributes(Lst)
> names(Lst)
> Lst[[1]]
> Lst[["drugName"]]
> Lst$drugName

Data Frame
A data frame is a list with some restrictions:
① The components must be vectors, factors, numeric matrices, lists, or other data frames.
② Numeric vectors, logicals, and factors are included as is. By default, character vectors are coerced to factors whose levels are the unique values appearing in the vector (this was the default up to R 3.x; since R 4.0.0 character vectors stay character unless stringsAsFactors=TRUE).
③ Vector structures appearing as variables of the data frame must all have the same length, and matrix structures must all have the same number of rows.

Data Frame
> names(cars)          # names of components
[1] "speed" "dist"
> length(cars)         # 2
> cars[[1]]
> cars$speed           # recommended
> attach(cars)         # puts the columns on the search path, so speed works without cars$
> detach(cars)
> summary(cars$dist)   # anything we can do with a vector

Data Input
scan(file, what=double(), sep="", …)
# scan returns a vector whose data type matches the type given in the what argument.
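A minimal sketch of how the what argument controls the type scan returns. To stay self-contained it reads from inline strings through scan's text argument instead of a file; the input values and the names nums and words are assumptions made for illustration.

```r
# Default what=double(): whitespace-separated fields are read as numbers
nums <- scan(text = "1.5 2.5 3.5", quiet = TRUE)
class(nums)    # "numeric"

# what=character() with sep=",": comma-separated fields are read as strings
words <- scan(text = "geneA,geneB,geneC", what = character(),
              sep = ",", quiet = TRUE)
class(words)   # "character"
```

With a real data file, the file path replaces the text argument as the first argument; quiet=TRUE only suppresses the "Read N items" message.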
read.table(file, header=FALSE, sep="", row.names, col.names, …)
# read.table returns a data.frame object
# my_data.frame <- read.table("MULTIPOT_lu.txt", row.names=1, header=TRUE)

From other software
# from SPSS and SAS
library(Hmisc)                                # load package
mydata <- spss.get("test.file", use.value.labels=TRUE)
mydata <- sasxport.get("test.file")
# from Stata and Systat
library(foreign)
mydata <- read.dta("test.file")
mydata <- read.systat("test.file")
# from Excel
library(RODBC)
channel <- odbcConnectExcel("D:/myexcel.xls")
mydata <- sqlFetch(channel, "mysheet")
odbcClose(channel)

Operators

R Programming
Control Statements
# repeat {…}
# switch(statement, list)

R Programming
Function Definition
Example:
matrix.axes <- function(data) {
  # tick positions for the rows, evenly spaced from 0 to 1
  x <- (1:dim(data)[1] - 1) / (dim(data)[1] - 1)
  axis(side=1, at=x, labels=rownames(data), las=2)   # bottom axis: row names
  # tick positions for the columns, evenly spaced from 0 to 1
  x <- (1:dim(data)[2] - 1) / (dim(data)[2] - 1)
  axis(side=2, at=x, labels=colnames(data), las=2)   # left axis: column names
}

Summary
Data type and Data Structure
  numeric, character, complex, logical
  vector, array/matrix, list, data frame
Data Input
  scan, read.table
  load from other software: SPSS, SAS, Excel
Operators: <-
R Programming

Case study
Residue-based protein-protein interaction potential analysis:
Lu et al. (2003) Development of Unified Statistical Potentials Describing Protein-Protein Interactions, Biophysical Journal 84(3), p1895-1901

Reference
CRAN-Manual: http://cran.r-project.org/
Quick-R: http://www.statmethods.net/index.html
R tutorial: http://www.r-tutor.com/
MOAC: http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/r/matrix_contour/