Example of multivariate data
What is R?
R is a language and environment for statistical computing and graphics.
R is available as Free Software under the terms of the Free Software Foundation's
GNU General Public License in source code form.
It compiles and runs on a wide variety of UNIX platforms and similar systems
(including FreeBSD and Linux), Windows and macOS.
R can be extended (easily) via packages. There are about eight packages supplied
with the R distribution and many more are available through the CRAN family of
Internet sites covering a very wide range of modern statistics.
The R environment
A fully planned and coherent system that includes:
• an effective data handling and storage facility,
• a suite of operators for calculations on arrays (matrices),
• a large, coherent, integrated collection of intermediate tools for data analysis,
• graphical facilities for data analysis and display (on-screen or on hardcopy),
• a well-developed, simple and effective programming language which
includes conditionals, loops, user-defined recursive functions and input and
output facilities.
Download R for free at:
http://www.r-project.org/
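As a rough illustration of the language features listed above (calculations on arrays, conditionals, loops and user-defined recursive functions), here is a minimal sketch. The function name `fact` and the sample values are made up for this example.

```r
# Arithmetic operators act on whole vectors and matrices at once.
m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled column by column
doubled <- m * 2             # element-wise multiplication

# A user-defined recursive function with a conditional.
fact <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * fact(n - 1)
}

# A loop that accumulates a running total.
total <- 0
for (i in 1:5) {
  total <- total + i
}

print(doubled)
print(fact(5))
print(total)
```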
Download R
R packages
R Console
Import data in R
Install packages
R script
RStudio
Import data in RStudio
Install packages in RStudio
R in Linux
Essential commands in R
Vectors
# Numeric vector:
> c(2,3,5,7,9)
[1] 2 3 5 7 9
# Character vector:
> c("Huey","Dewey","Louie")
[1] "Huey"  "Dewey" "Louie"
# Logical vector:
> c(TRUE,TRUE,FALSE,TRUE)
[1]  TRUE  TRUE FALSE  TRUE
# Functions that create vectors:
# c - "concatenate"
> c(42,57,12,39)
[1] 42 57 12 39
# seq - "sequence"
> seq(4,9)
[1] 4 5 6 7 8 9
# rep - "replicate"
> rep(1:2,5)
 [1] 1 2 1 2 1 2 1 2 1 2
> rep(1:2,c(3,4))
[1] 1 1 1 2 2 2 2
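The functions `seq` and `rep` accept further arguments beyond those used above. The sketch below shows the base R arguments `by`, `length.out` and `each`, plus indexing and element-wise arithmetic on a vector. The variable names are chosen for this example only.

```r
evens <- seq(4, 9, by = 2)           # step of 2, stops before exceeding 9
grid  <- seq(0, 1, length.out = 5)   # 5 equally spaced values from 0 to 1
each3 <- rep(1:2, each = 3)          # repeat each element 3 times

# Vectors are indexed from 1, and arithmetic is element-wise.
v <- c(2, 3, 5, 7, 9)
third   <- v[3]
squares <- v^2

print(evens)
print(grid)
print(each3)
print(third)
print(squares)
```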
Factors
Factors – a data structure that makes it possible to assign meaningful names
to the categories.
> pain=c(0,3,2,2,1)
> fpain=factor(pain,levels=0:3)
> levels(fpain)=c("none","mild","medium","severe")
> fpain
[1] none severe medium medium mild
Levels: none mild medium severe
> levels(fpain)
[1] "none" "mild" "medium" "severe"
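The level names can also be supplied in a single step through the `labels` argument of `factor`. The sketch below repeats the pain example that way and then counts the observations per category with `table`. The names `counts` and `codes` are chosen for this example.

```r
pain <- c(0, 3, 2, 2, 1)
fpain <- factor(pain, levels = 0:3,
                labels = c("none", "mild", "medium", "severe"))

counts <- table(fpain)        # number of observations in each category
codes  <- as.integer(fpain)   # underlying integer codes (1 = first level)

print(fpain)
print(counts)
print(codes)
```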
Matrices and arrays
> x=1:12
> dim(x)=c(3,4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> x=matrix(1:12,nrow=3,byrow=TRUE)
> rownames(x)=LETTERS[1:3]
> x
  [,1] [,2] [,3] [,4]
A    1    2    3    4
B    5    6    7    8
C    9   10   11   12
> t(x)
     A B  C
[1,] 1 5  9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
LETTERS – a built-in variable that contains the capital letters A-Z.
t(x) – the transpose of the matrix x.
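Individual elements, rows and columns of a matrix are selected with square brackets, by position or by name. The sketch below rebuilds the matrix `x` from above and picks out a few parts. The result names are chosen for this example.

```r
x <- matrix(1:12, nrow = 3, byrow = TRUE)
rownames(x) <- LETTERS[1:3]

elem  <- x[2, 3]     # single element: row 2, column 3
row_b <- x["B", ]    # whole row, selected by row name
col_4 <- x[, 4]      # whole column (keeps the row names)
shape <- dim(t(x))   # transposing swaps the two dimensions

print(elem)
print(row_b)
print(col_4)
print(shape)
```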
Matrices and arrays
# Use the functions cbind and rbind to “bind” vectors together
# column-wise or row-wise.
> cbind(A=1:4,B=5:8,C=9:12)
     A B  C
[1,] 1 5  9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
> rbind(A=1:4,B=5:8,C=9:12)
  [,1] [,2] [,3] [,4]
A    1    2    3    4
B    5    6    7    8
C    9   10   11   12
Data frames
Data frame – a list of vectors and/or factors of the same length that are
related “across”, such that data in the same position come from the same
experimental unit (subject, animal, etc.).
> conc=c(5,12,20,24,35,40)
> vol=c(20,25,33,40,50,55)
> d=data.frame(conc,vol)
> d
  conc vol
1    5  20
2   12  25
3   20  33
4   24  40
5   35  50
6   40  55
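Because a data frame is a list of equal-length columns, a column can be inspected or added with the `$` operator. The sketch below rebuilds `d` from above, prints its structure with `str`, and adds a derived column. The column name `ratio` is a hypothetical choice for this example.

```r
conc <- c(5, 12, 20, 24, 35, 40)
vol  <- c(20, 25, 33, 40, 50, 55)
d <- data.frame(conc, vol)

str(d)                      # compact description of the columns and their types
d$ratio <- d$vol / d$conc   # add a derived column
first_two <- head(d, 2)     # first two rows only

print(d)
print(first_two)
```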
Data manipulation
Data: “Soil”
Soil properties of two adjacent locations on Wimbledon Common: a sandy
lowland heath (site 1) and adjoining spoil mounds of calcareous clay (site 2).
Parameters:
Site - site number
rep - quadrat replicate number
pH
cond - electrical conductivity of soil solution
OM - percentage organic matter composition of soil
H2O – percentage water content of soil after drying to 105°F
Read data in R
A comment in R is marked with #
# Import a .txt file:
> Soil=read.table("E:/Multivariate_analysis/Data/Soil.txt",header=TRUE)
# Import a .csv file:
> Soil=read.csv("E:/Multivariate_analysis/Data/Soil.csv",header=TRUE)
> Soil
  Site rep  pH cond OM H2O
1    1   1 4.5   55 26  17
2    1   1 5.4   60 16  21
3    1   3 5.1   49 NA  18
4    1   4 4.8   55 27  18
5    2   1 7.6  155  5  25
6    2   2 7.8  124 NA  35
7    2   3 7.2  141  6  32
8    2   4 7.3  166  8  29
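The file paths above refer to a drive that may not exist on another machine. As a sketch for following along without those files, the eight observations printed above can be typed in directly, written to a temporary CSV file and read back. The temporary path and the name `Soil_csv` are assumptions made for this example.

```r
# The eight observations printed above, entered by hand.
Soil <- data.frame(
  Site = rep(1:2, each = 4),
  rep  = c(1, 1, 3, 4, 1, 2, 3, 4),
  pH   = c(4.5, 5.4, 5.1, 4.8, 7.6, 7.8, 7.2, 7.3),
  cond = c(55, 60, 49, 55, 155, 124, 141, 166),
  OM   = c(26, 16, NA, 27, 5, NA, 6, 8),
  H2O  = c(17, 21, 18, 18, 25, 35, 32, 29)
)

# Write to a temporary CSV file and read it back in.
path <- tempfile(fileext = ".csv")   # stand-in for a real file location
write.csv(Soil, path, row.names = FALSE)
Soil_csv <- read.csv(path, header = TRUE)

str(Soil_csv)
```

Missing values are written as `NA` and read back as `NA` by default, so the two missing OM entries survive the round trip.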
Data manipulation
#Display the column names of “Soil” data:
> names(Soil)
[1] "Site" "rep" "pH" "cond" "OM" "H2O"
#Display the row names:
> rownames(Soil)
[1] "1" "2" "3" "4" "5" "6" "7" "8"
#Display the dimensions of the Soil data:
> dim(Soil)
[1] 8 6
# The data have 8 rows (observations) and 6 columns (variables).
Data manipulation
#Select the second column of the data:
> Soil[,2]
[1] 1 1 3 4 1 2 3 4
#or:
> Soil$rep
[1] 1 1 3 4 1 2 3 4
#Select the third row of the data:
> Soil[3,]
  Site rep  pH cond OM H2O
3    1   3 5.1   49 NA  18
#Select rows 2, 4, and 5:
> Soil[c(2,4,5),]
  Site rep  pH cond OM H2O
2    1   1 5.4   60 16  21
4    1   4 4.8   55 27  18
5    2   1 7.6  155  5  25
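Besides numeric positions, columns can be selected by name and rows by a logical condition. The sketch below assumes the Soil values printed earlier, typed in by hand; the result names are chosen for this example.

```r
Soil <- data.frame(
  Site = rep(1:2, each = 4),
  rep  = c(1, 1, 3, 4, 1, 2, 3, 4),
  pH   = c(4.5, 5.4, 5.1, 4.8, 7.6, 7.8, 7.2, 7.3),
  cond = c(55, 60, 49, 55, 155, 124, 141, 166),
  OM   = c(26, 16, NA, 27, 5, NA, 6, 8),
  H2O  = c(17, 21, 18, 18, 25, 35, 32, 29)
)

by_name  <- Soil[, c("pH", "cond")]   # columns selected by name
high_ph  <- Soil[Soil$pH > 7, ]       # rows where the condition is TRUE
one_cell <- Soil[3, "cond"]           # single value: row 3, column cond

print(by_name)
print(high_ph)
print(one_cell)
```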
Data manipulation
#Display the length of the second column:
> length(Soil[,2])
[1] 8
#Add a new column log.pH containing the natural logarithm of pH:
> Soil2=transform(Soil,log.pH=log(Soil$pH))
> Soil2
  Site rep  pH cond OM H2O   log.pH
1    1   1 4.5   55 26  17 1.504077
2    1   1 5.4   60 16  21 1.686399
3    1   3 5.1   49 NA  18 1.629241
4    1   4 4.8   55 27  18 1.568616
5    2   1 7.6  155  5  25 2.028148
6    2   2 7.8  124 NA  35 2.054124
7    2   3 7.2  141  6  32 1.974081
8    2   4 7.3  166  8  29 1.987874
Data manipulation
#Delete the third column (pH) of the “Soil2” data:
> Soil3=Soil2[,-3]
> Soil3
  Site rep cond OM H2O   log.pH
1    1   1   55 26  17 1.504077
2    1   1   60 16  21 1.686399
3    1   3   49 NA  18 1.629241
4    1   4   55 27  18 1.568616
5    2   1  155  5  25 2.028148
6    2   2  124 NA  35 2.054124
7    2   3  141  6  32 1.974081
8    2   4  166  8  29 1.987874
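Deleting a column by position breaks if the column order changes, so dropping it by name can be more robust. The sketch below assumes the Soil values printed earlier, typed in by hand, and shows that `transform` can refer to a column directly by name.

```r
Soil <- data.frame(
  Site = rep(1:2, each = 4),
  rep  = c(1, 1, 3, 4, 1, 2, 3, 4),
  pH   = c(4.5, 5.4, 5.1, 4.8, 7.6, 7.8, 7.2, 7.3),
  cond = c(55, 60, 49, 55, 155, 124, 141, 166),
  OM   = c(26, 16, NA, 27, 5, NA, 6, 8),
  H2O  = c(17, 21, 18, 18, 25, 35, 32, 29)
)

# Inside transform(), columns can be used without the Soil$ prefix.
Soil2 <- transform(Soil, log.pH = log(pH))

# Keep every column whose name is not "pH".
Soil3 <- Soil2[, names(Soil2) != "pH"]

print(names(Soil3))
```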
Data manipulation
#Select the first four columns of the “Soil” data:
> Soil4=Soil[,1:4]
> Soil4
  Site rep  pH cond
1    1   1 4.5   55
2    1   1 5.4   60
3    1   3 5.1   49
4    1   4 4.8   55
5    2   1 7.6  155
6    2   2 7.8  124
7    2   3 7.2  141
8    2   4 7.3  166
Data manipulation
#Obtain a subset of the “Soil” data with cond>100:
> Soil5=subset(Soil,Soil$cond>100)
> Soil5
  Site rep  pH cond OM H2O
5    2   1 7.6  155  5  25
6    2   2 7.8  124 NA  35
7    2   3 7.2  141  6  32
8    2   4 7.3  166  8  29
#Obtain a subset of the “Soil” data with cond>100 and H2O<32:
> Soil6=subset(Soil,Soil$cond>100&Soil$H2O<32)
> Soil6
  Site rep  pH cond OM H2O
5    2   1 7.6  155  5  25
8    2   4 7.3  166  8  29
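Inside `subset`, columns can be written without the `Soil$` prefix, and the `select` argument restricts the columns returned. The sketch below assumes the Soil values printed earlier, typed in by hand.

```r
Soil <- data.frame(
  Site = rep(1:2, each = 4),
  rep  = c(1, 1, 3, 4, 1, 2, 3, 4),
  pH   = c(4.5, 5.4, 5.1, 4.8, 7.6, 7.8, 7.2, 7.3),
  cond = c(55, 60, 49, 55, 155, 124, 141, 166),
  OM   = c(26, 16, NA, 27, 5, NA, 6, 8),
  H2O  = c(17, 21, 18, 18, 25, 35, 32, 29)
)

# Same condition as Soil5 above, using the bare column name.
Soil5 <- subset(Soil, cond > 100)

# Two conditions combined with &, keeping only three columns.
Soil6 <- subset(Soil, cond > 100 & H2O < 32, select = c(Site, pH, cond))

print(Soil5)
print(Soil6)
```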
Data manipulation
#Obtain a subset of the “Soil” data with no missing values (NA) for OM:
> Soil7=subset(Soil, !is.na(Soil$OM))
> Soil7
  Site rep  pH cond OM H2O
1    1   1 4.5   55 26  17
2    1   1 5.4   60 16  21
4    1   4 4.8   55 27  18
5    2   1 7.6  155  5  25
7    2   3 7.2  141  6  32
8    2   4 7.3  166  8  29
#Obtain a subset of the “Soil” data with missing values (NA) for OM:
> Soil8=subset(Soil,is.na(Soil$OM))
> Soil8
  Site rep  pH cond OM H2O
3    1   3 5.1   49 NA  18
6    2   2 7.8  124 NA  35
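The subsets above test a single column. To check every column at once, base R provides `complete.cases`, and many summary functions take `na.rm = TRUE` to skip missing values. The sketch below assumes the Soil values printed earlier, typed in by hand; the result names are chosen for this example.

```r
Soil <- data.frame(
  Site = rep(1:2, each = 4),
  rep  = c(1, 1, 3, 4, 1, 2, 3, 4),
  pH   = c(4.5, 5.4, 5.1, 4.8, 7.6, 7.8, 7.2, 7.3),
  cond = c(55, 60, 49, 55, 155, 124, 141, 166),
  OM   = c(26, 16, NA, 27, 5, NA, 6, 8),
  H2O  = c(17, 21, 18, 18, 25, 35, 32, 29)
)

complete  <- Soil[complete.cases(Soil), ]   # rows with no NA in any column
n_missing <- colSums(is.na(Soil))           # number of NAs in each column
mean_om   <- mean(Soil$OM, na.rm = TRUE)    # mean of the non-missing OM values

print(complete)
print(n_missing)
print(mean_om)
```

Without `na.rm = TRUE`, `mean(Soil$OM)` returns `NA` because two values are missing.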
Data manipulation
#Identify which observations have pH<7:
> which(Soil$pH<7)
[1] 1 2 3 4
# Observations (rows) 1, 2, 3, and 4 have pH<7.
#Identify which observations have missing values for OM:
> which(is.na(Soil$OM))
[1] 3 6
#observations 3 and 6 have missing values for OM.
#Identify which observation has pH=5.4:
> which(Soil$pH==5.4)
[1] 2
#Identify which observations are not from Site 1:
> which(Soil$Site!=1)
[1] 5 6 7 8
Data manipulation
#Order “Soil” data by pH:
# Increasing:
> Soil9=Soil[order(Soil$pH),]
> Soil9
  Site rep  pH cond OM H2O
1    1   1 4.5   55 26  17
4    1   4 4.8   55 27  18
3    1   3 5.1   49 NA  18
2    1   1 5.4   60 16  21
7    2   3 7.2  141  6  32
8    2   4 7.3  166  8  29
5    2   1 7.6  155  5  25
6    2   2 7.8  124 NA  35
# Decreasing:
> Soil10=Soil[order(-Soil$pH),]
> Soil10
  Site rep  pH cond OM H2O
6    2   2 7.8  124 NA  35
5    2   1 7.6  155  5  25
8    2   4 7.3  166  8  29
7    2   3 7.2  141  6  32
2    1   1 5.4   60 16  21
3    1   3 5.1   49 NA  18
4    1   4 4.8   55 27  18
1    1   1 4.5   55 26  17
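`order` also accepts several sort keys and a `decreasing` argument. The sketch below, assuming the Soil values printed earlier typed in by hand, sorts by Site first and then by decreasing pH within each site. The result names are chosen for this example.

```r
Soil <- data.frame(
  Site = rep(1:2, each = 4),
  rep  = c(1, 1, 3, 4, 1, 2, 3, 4),
  pH   = c(4.5, 5.4, 5.1, 4.8, 7.6, 7.8, 7.2, 7.3),
  cond = c(55, 60, 49, 55, 155, 124, 141, 166),
  OM   = c(26, 16, NA, 27, 5, NA, 6, 8),
  H2O  = c(17, 21, 18, 18, 25, 35, 32, 29)
)

# First key: Site (increasing). Second key: pH (decreasing, via the minus sign).
by_site_ph <- Soil[order(Soil$Site, -Soil$pH), ]

# Alternative to negating the key: the decreasing argument.
desc <- Soil[order(Soil$pH, decreasing = TRUE), ]

print(by_site_ph)
print(desc)
```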
Data manipulation
#Save “Soil10” data from the R console to your computer:
> write.table(Soil10,file="E:/Multivariate_analysis/pH_Order_Soil.csv",
+ row.names=FALSE,col.names=names(Soil10),quote=FALSE,sep=",")
#Load a package in R (after installing it):
> library(MASS)
# loads the package called MASS
Get help in R
# Get help with R functions:
> help(read.table)
# or
> ?read.table
Simple summary statistics
#Calculate mean, standard deviation, variance, median, sum, and maximum
#and minimum values for “cond” in “Soil” data:
> mean(Soil$cond)
[1] 100.625
> sd(Soil$cond)
[1] 50.54824
> var(Soil$cond)
[1] 2555.125
> median(Soil$cond)
[1] 92
> sum(Soil$cond)
[1] 805
> max(Soil$cond)
[1] 166
> min(Soil$cond)
[1] 49
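Several of these statistics can be obtained in one call with `summary`, and `tapply` computes a statistic separately for each group. The sketch below uses the `cond` and `Site` values printed earlier, typed in by hand; the name `site_means` is chosen for this example.

```r
cond <- c(55, 60, 49, 55, 155, 124, 141, 166)
Site <- rep(1:2, each = 4)

print(summary(cond))                    # min, quartiles, median, mean and max

site_means <- tapply(cond, Site, mean)  # mean conductivity for each site
cond_range <- range(cond)               # minimum and maximum together

print(site_means)
print(cond_range)
```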
Graphics in R