Example of multivariate data

What is R?

R is a language and environment for statistical computing and graphics. R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS. R can be extended (easily) via packages. There are about eight packages supplied with the R distribution, and many more are available through the CRAN family of Internet sites, covering a very wide range of modern statistics.

The R environment

A fully planned and coherent system that includes:
• an effective data handling and storage facility,
• a suite of operators for calculations on arrays (matrices),
• a large, coherent, integrated collection of intermediate tools for data analysis,
• graphical facilities for data analysis and display (on-screen or on hardcopy),
• a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
Download R for free at: http://www.r-project.org/

[Screenshot slides: R download, R packages, R console, Import data in R, Install packages, R script, RStudio, Import data in RStudio, Install packages in RStudio, R in Linux]

Essential commands in R

Vectors in R

# Numeric vector:
> c(2,3,5,7,9)
[1] 2 3 5 7 9
# Character vector:
> c("Huey","Dewey","Louie")
[1] "Huey"  "Dewey" "Louie"
# Logical vector:
> c(T,T,F,T)
[1]  TRUE  TRUE FALSE  TRUE

# Functions that create vectors:
# c - "concatenate"
> c(42,57,12,39)
[1] 42 57 12 39
# seq - "sequence"
> seq(4,9)
[1] 4 5 6 7 8 9
# rep - "replicate"
> rep(1:2,5)
 [1] 1 2 1 2 1 2 1 2 1 2
> rep(1:2,c(3,4))
[1] 1 1 1 2 2 2 2

Factors in R

Factor: a data structure that makes it possible to assign meaningful names to the categories.

> pain=c(0,3,2,2,1)
> fpain=factor(pain,levels=0:3)
> levels(fpain)=c("none","mild","medium","severe")
> fpain
[1] none   severe medium medium mild
Levels: none mild medium severe
> levels(fpain)
[1] "none"   "mild"   "medium" "severe"

Matrices and arrays

> x=1:12
> dim(x)=c(3,4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> x=matrix(1:12,nrow=3,byrow=T)
> rownames(x)=LETTERS[1:3]
> x
  [,1] [,2] [,3] [,4]
A    1    2    3    4
B    5    6    7    8
C    9   10   11   12
> t(x)
     A B  C
[1,] 1 5  9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12

LETTERS: a built-in variable that contains the capital letters A-Z.
t(x): the transpose of the matrix x.
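The vector, factor and matrix commands above can be combined into one short script. This is only a sketch: the arguments by=, each= and labels= are standard base R options that the slides do not use, and the object names (s, r, codes) are chosen here for illustration.

```r
# seq() with a step and rep() repeating each element
s <- seq(4, 10, by = 2)     # 4 6 8 10
r <- rep(1:2, each = 3)     # 1 1 1 2 2 2

# Factor with the category names given at creation time
# (alternative to assigning levels(fpain) afterwards)
pain  <- c(0, 3, 2, 2, 1)
fpain <- factor(pain, levels = 0:3,
                labels = c("none", "mild", "medium", "severe"))
codes <- as.integer(fpain)  # internal integer codes: 1 4 3 3 2
table(fpain)                # number of observations per category

# Matrix filled by row, indexed by row name and column number
x <- matrix(1:12, nrow = 3, byrow = TRUE)
rownames(x) <- LETTERS[1:3]
x["B", 3]                   # 7
dim(t(x))                   # 4 3
```

Passing labels= to factor() gives the same result as the two-step version with levels(fpain)=..., while keeping the numeric coding and the names in one call.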
# Use the functions cbind and rbind to "bind" vectors together columnwise or rowwise.
> cbind(A=1:4,B=5:8,C=9:12)
     A B  C
[1,] 1 5  9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
> rbind(A=1:4,B=5:8,C=9:12)
  [,1] [,2] [,3] [,4]
A    1    2    3    4
B    5    6    7    8
C    9   10   11   12

Data frames

A data frame is a list of vectors and/or factors of the same length, which are related "across", such that data in the same position come from the same experimental unit (subject, animal, etc.).

> conc=c(5,12,20,24,35,40)
> vol=c(20,25,33,40,50,55)
> d=data.frame(conc,vol)
> d
  conc vol
1    5  20
2   12  25
3   20  33
4   24  40
5   35  50
6   40  55

Data manipulation in R

Data: "Soil"
Soil properties of two adjacent locations on Wimbledon Common: a sandy lowland heath (site 1) and adjoining spoil mounds of calcareous clay (site 2).

Parameters:
Site - site number
rep - quadrat replicate number
pH
cond - electrical conductivity of soil solution
OM - percentage organic matter composition of soil
H2O - percentage water content of soil after drying to 105°F

Read data in R

# A comment in R is marked with #
# Import a .txt file:
> Soil=read.table("E:/Multivariate_analysis/Data/Soil.txt",header=T)
# Import a .csv file:
> Soil=read.csv("E:/Multivariate_analysis/Data/Soil.csv",header=T)
> Soil
  Site rep  pH cond OM H2O
1    1   1 4.5   55 26  17
2    1   1 5.4   60 16  21
3    1   3 5.1   49 NA  18
4    1   4 4.8   55 27  18
5    2   1 7.6  155  5  25
6    2   2 7.8  124 NA  35
7    2   3 7.2  141  6  32
8    2   4 7.3  166  8  29

# Display the column names of the "Soil" data:
> names(Soil)
[1] "Site" "rep"  "pH"   "cond" "OM"   "H2O"
# Display the row names:
> rownames(Soil)
[1] "1" "2" "3" "4" "5" "6" "7" "8"
# Display the dimensions of the "Soil" data:
> dim(Soil)
[1] 8 6
# 8 rows (observations), 6 columns (variables)

# Select the second column of the data:
> Soil[,2]
[1] 1 1 3 4 1 2 3 4
# or:
> Soil$rep
[1] 1 1 3 4 1 2 3 4
# Select the third row of the data:
> Soil[3,]
  Site rep  pH cond OM H2O
3    1   3 5.1   49 NA  18
# Select rows 2, 4, and 5:
> Soil[c(2,4,5),]
  Site rep  pH cond OM H2O
2    1   1 5.4   60 16  21
4    1   4 4.8   55 27  18
5    2   1 7.6  155  5  25

# Display the length of the second column:
> length(Soil[,2])
[1] 8
# Add a new column log.pH containing the (natural) logarithm of pH:
> Soil2=transform(Soil,log.pH=log(Soil$pH))
> Soil2
  Site rep  pH cond OM H2O   log.pH
1    1   1 4.5   55 26  17 1.504077
2    1   1 5.4   60 16  21 1.686399
3    1   3 5.1   49 NA  18 1.629241
4    1   4 4.8   55 27  18 1.568616
5    2   1 7.6  155  5  25 2.028148
6    2   2 7.8  124 NA  35 2.054124
7    2   3 7.2  141  6  32 1.974081
8    2   4 7.3  166  8  29 1.987874

# Delete the third column (pH) of the "Soil2" data:
> Soil3=Soil2[,-3]
> Soil3
  Site rep cond OM H2O   log.pH
1    1   1   55 26  17 1.504077
2    1   1   60 16  21 1.686399
3    1   3   49 NA  18 1.629241
4    1   4   55 27  18 1.568616
5    2   1  155  5  25 2.028148
6    2   2  124 NA  35 2.054124
7    2   3  141  6  32 1.974081
8    2   4  166  8  29 1.987874

# Select the first four columns of the "Soil" data:
> Soil4=Soil[,1:4]
> Soil4
  Site rep  pH cond
1    1   1 4.5   55
2    1   1 5.4   60
3    1   3 5.1   49
4    1   4 4.8   55
5    2   1 7.6  155
6    2   2 7.8  124
7    2   3 7.2  141
8    2   4 7.3  166

# Obtain a subset of the "Soil" data with cond>100:
> Soil5=subset(Soil,Soil$cond>100)
> Soil5
  Site rep  pH cond OM H2O
5    2   1 7.6  155  5  25
6    2   2 7.8  124 NA  35
7    2   3 7.2  141  6  32
8    2   4 7.3  166  8  29
# Obtain a subset of the "Soil" data with cond>100 and H2O<32:
> Soil6=subset(Soil,Soil$cond>100&Soil$H2O<32)
> Soil6
  Site rep  pH cond OM H2O
5    2   1 7.6  155  5  25
8    2   4 7.3  166  8  29

# Obtain a subset of the "Soil" data with no missing values (NA) in OM:
> Soil7=subset(Soil,!is.na(Soil$OM))
> Soil7
  Site rep  pH cond OM H2O
1    1   1 4.5   55 26  17
2    1   1 5.4   60 16  21
4    1   4 4.8   55 27  18
5    2   1 7.6  155  5  25
7    2   3 7.2  141  6  32
8    2   4 7.3  166  8  29
# Obtain a subset of the "Soil" data with missing values (NA) in OM:
> Soil8=subset(Soil,is.na(Soil$OM))
> Soil8
  Site rep  pH cond OM H2O
3    1   3 5.1   49 NA  18
6    2   2 7.8  124 NA  35

# Identify which observations have pH<7:
> which(Soil$pH<7)
[1] 1 2 3 4
# Observations (rows) 1, 2, 3, and 4 have pH<7.
# Identify which observations have missing values for OM:
> which(is.na(Soil$OM))
[1] 3 6
# Observations 3 and 6 have missing values for OM.
# Identify which observation has pH=5.4:
> which(Soil$pH==5.4)
[1] 2
# Identify which observations are not from Site 1:
> which(Soil$Site!=1)
[1] 5 6 7 8

# Order the "Soil" data by pH:
# Increasing
> Soil9=Soil[order(Soil$pH),]
> Soil9
  Site rep  pH cond OM H2O
1    1   1 4.5   55 26  17
4    1   4 4.8   55 27  18
3    1   3 5.1   49 NA  18
2    1   1 5.4   60 16  21
7    2   3 7.2  141  6  32
8    2   4 7.3  166  8  29
5    2   1 7.6  155  5  25
6    2   2 7.8  124 NA  35
# Decreasing
> Soil10=Soil[order(-Soil$pH),]
> Soil10
  Site rep  pH cond OM H2O
6    2   2 7.8  124 NA  35
5    2   1 7.6  155  5  25
8    2   4 7.3  166  8  29
7    2   3 7.2  141  6  32
2    1   1 5.4   60 16  21
3    1   3 5.1   49 NA  18
4    1   4 4.8   55 27  18
1    1   1 4.5   55 26  17

# Save the "Soil10" data from the R console to your computer:
> write.table(Soil10,file="E:/Multivariate_analysis/pH_Order_Soil.csv",row.names=F,col.names=names(Soil10),quote=F,sep=",")

Get help in R

# Load a package in R (after installing it):
> library(MASS) # load the package called MASS
# Get help with R functions:
> help(read.table)
# or
> ?read.table

Simple summary statistics

# Calculate the mean, standard deviation, variance, median, sum, and maximum and minimum values for "cond" in the "Soil" data:
> mean(Soil$cond)
[1] 100.625
> sd(Soil$cond)
[1] 50.54824
> var(Soil$cond)
[1] 2555.125
> median(Soil$cond)
[1] 92
> sum(Soil$cond)
[1] 805
> max(Soil$cond)
[1] 166
> min(Soil$cond)
[1] 49
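The Soil examples read the data from a file path that will not exist on most machines, and the summary statistics are shown only for cond, which has no missing values. As a minimal sketch, the same eight rows can be typed in with data.frame() (values copied from the printed table), which also lets one see what happens for OM, where two values are NA. The object names om_mean, cond_by_site and complete are illustrative, and na.rm=, tapply() and complete.cases() are base R features not used in the slides.

```r
# Rebuild the Soil table without reading a file
Soil <- data.frame(
  Site = c(1, 1, 1, 1, 2, 2, 2, 2),
  rep  = c(1, 1, 3, 4, 1, 2, 3, 4),
  pH   = c(4.5, 5.4, 5.1, 4.8, 7.6, 7.8, 7.2, 7.3),
  cond = c(55, 60, 49, 55, 155, 124, 141, 166),
  OM   = c(26, 16, NA, 27, 5, NA, 6, 8),
  H2O  = c(17, 21, 18, 18, 25, 35, 32, 29)
)

mean(Soil$cond)                          # 100.625
mean(Soil$OM)                            # NA, because OM has missing values
om_mean <- mean(Soil$OM, na.rm = TRUE)   # drop the NAs before averaging
om_mean                                  # 14.66667

# Mean conductivity within each site
cond_by_site <- tapply(Soil$cond, Soil$Site, mean)
cond_by_site                             # site 1: 54.75, site 2: 146.5

# Rows with no NA in any column (for these data, the same rows as Soil7)
complete <- Soil[complete.cases(Soil), ]
nrow(complete)                           # 6
```

The same na.rm = TRUE argument is accepted by sd(), var(), median(), sum(), max() and min().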
Graphics in R