R data structures
Bonie Thiel, Ph.D. and Gürkan Bebek, Ph.D., M.S.
Systems Biology and Bioinformatics Graduate Program

Different data structures
Zimmermann, Niklaus & Steinmann, Katharina. (2021). A short Introduction to statistics using R.

Getting information about an object
> x <- c('one'=1,'two'=2,'three'=3,'four'=4,'five'=5) #this is a named vector
> typeof(x)
[1] "double"
> y <- c('one'=1L,'two'=2L,'three'=3L,'four'=4L,'five'=5L)
> typeof(y)
[1] "integer"
> length(y)
[1] 5
> attributes(y) #this shows the metadata
$names
[1] "one"   "two"   "three" "four"  "five"
> x.y <- cbind(x,y) #building more complex structures
> str(x.y)
 num [1:5, 1:2] 1 2 3 4 5 1 2 3 4 5
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:5] "one" "two" "three" "four" ...
  ..$ : chr [1:2] "x" "y"
> class(x.y)
[1] "matrix" "array"
> typeof(x.y)
[1] "double"

Useful functions: str(), typeof(), class(), length(), attributes(), rownames(), colnames()

Reading Data Into R
> Biomarkers <- read.csv("bio.csv", header=TRUE, sep=",")
> library(RODBC)
> sheet <- "c:\\Documents and Settings\\user\\sheet.xls"
> con <- odbcConnectExcel(sheet)
Can also use library(XML) to read newer MS Office documents (e.g. .docx, .xlsx), since they are zipped XML files.
> library(RMySQL); drv <- dbDriver("MySQL")
> con <- dbConnect(drv, dbname=, user=, password=, host=) #fill in your own connection details
> mydata <- dbGetQuery(con, "SELECT * FROM mydata")
> wpage <- readLines("http://www.programr.com/list.html")
> author_lines <- wpage[grep("<I>", wpage)]

Concepts for Using R - Data Frames
# Create a data frame:
> info <- data.frame(gender = c("M", "M", "F"), ht = c(172, 186.5, 165), wt = c(91, 99, 74))
> info
  gender    ht wt
1      M 172.0 91
2      M 186.5 99
3      F 165.0 74
> info[1,2]
[1] 172
> names(info)
[1] "gender" "ht"     "wt"
> info$ht
[1] 172.0 186.5 165.0
> row.names(info) <- c("S1","S2","S3")
> info
   gender    ht wt
S1      M 172.0 91
S2      M 186.5 99
S3      F 165.0 74

Concepts for Using R - Data Frames
> height <- info$ht
> height
[1] 172.0 186.5 165.0
> info$age = c(28,55,43)
> info
   gender    ht wt age
S1      M 172.0 91  28
S2      M 186.5 99  55
S3      F 165.0 74  43
> subset(info, age > 50)
   gender    ht wt age
S2      M 186.5 99  55

Concepts for Using R - Data Frames
A data frame is an object that contains data in a format that allows for easier manipulation, reshaping, and open-ended analysis. Data frames are tightly coupled collections of variables.

# read clinical_trial.txt into R
# name it clinical.trial
clinical.trial <- read.delim("clinical_trial.txt")

Download clinical_trial.txt from Canvas to test.
NOTE: when copying a path from Windows, use forward slashes: change 'C:\Desktop\SYBB402\example.csv' to 'C:/Desktop/SYBB402/example.csv'

# use head() and str() functions to investigate clinical.trial
> head(clinical.trial)
> str(clinical.trial)

#select age variable
> clinical.trial$age
> clinical.trial["age"]
> clinical.trial[2]
> clinical.trial[[2]]
#what is the difference between using [] vs [[]]?
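
One way to explore the [] vs [[]] question is to run both on a small data frame and compare what comes back. The sketch below reuses the info data frame from the earlier slides rather than clinical.trial, since that file's contents are not shown here. The names one_col_df and ht_vec are made up for illustration.

```r
# Rebuild the info data frame from the earlier slides so this runs on its own
info <- data.frame(gender = c("M", "M", "F"),
                   ht = c(172, 186.5, 165),
                   wt = c(91, 99, 74))

# Single brackets keep the container: the result is still a data frame
one_col_df <- info["ht"]      # info[2] selects the same column by position
class(one_col_df)             # "data.frame"
dim(one_col_df)               # 3 rows, 1 column

# Double brackets extract the contents: the result is a plain vector
ht_vec <- info[["ht"]]        # info[[2]] and info$ht return the same vector
class(ht_vec)                 # "numeric"

# Vector functions can be applied directly to the [[ ]] result
mean(ht_vec)                  # 174.5
```

In short, [] returns a smaller object of the same kind (a one-column data frame), while [[]] and $ pull out the column itself as a vector, which is usually what summary functions such as mean() expect.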