PCA Example
The data set “Alaskan” records microcrustacean densities in seven streams in
southeastern Alaska that formed when glaciers melted. Five replicate
samples were taken from each stream. The sites range in age from 18 to
approximately 1400 years. How are the microcrustacean species distributed
among the seven streams?
R data “Alaskan”:
Variables:
Str – stream
rep – replicate
sp1 – Nitocra hibernica
sp2 – Attheyella illinoisensis
sp3 – Attheyella idahoensis
sp4 – Bryocamptus hiemalis
sp5 – Bryocamptus zschokkei
sp6 – Acanthocyclops vernalis
sp7 – Alona guttata
sp8 – Graptoleberis sp.
sp9 – Chydorus
sp10 – Macrothricidae
sp11 – Maraenobiotus insignipes
Read the data “Alaskan”:
> Alaskan=read.csv("E:/Multivariate_analysis/Data/Alaskan.csv",header=TRUE)
Subset the data by removing the first three columns. This leaves a data frame that
contains only the microcrustacean species:
> Al=Alaskan[,-c(1:3)]
Apply a log(x+1) transformation to the Al data:
> Allog=log(Al[,1:11]+1)
Calculate the variance for each variable:
> round(sapply(Allog,var),2)
sp1 sp2 sp3 sp4 sp5 sp6 sp7 sp8 sp9 sp10 sp11
2.72 0.95 1.90 5.05 5.03 5.51 0.23 0.45 0.45 0.23 1.02
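The variances above differ by more than an order of magnitude (0.23 to 5.51), which is one plausible reason for running the PCA on the correlation matrix (cor=TRUE) later in this example. The sketch below illustrates the effect of that choice. It uses the built-in USArrests data as a stand-in, which is an assumption made only so the code runs without the Alaskan file.

```r
# Stand-in data (assumption): built-in USArrests, not the Alaskan data
X <- log(USArrests + 1)
round(sapply(X, var), 2)          # variances differ between variables

pca_cov <- princomp(X)             # PCA on the covariance matrix (default)
pca_cor <- princomp(X, cor = TRUE) # PCA on the correlation matrix

# Share of the total variance carried by the first component in each case
prop_cov <- pca_cov$sdev[1]^2 / sum(pca_cov$sdev^2)
prop_cor <- pca_cor$sdev[1]^2 / sum(pca_cor$sdev^2)
c(covariance = unname(prop_cov), correlation = unname(prop_cor))
```

With cor=TRUE every variable is standardised first, so the component variances sum to the number of variables and no single high-variance species dominates.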
Calculate the correlation matrix of Allog data:
> Allog_Cor=round(cor(Allog),2)
> Allog_Cor
sp1 sp2 sp3 sp4 sp5 sp6 sp7 sp8 sp9 sp10 sp11
sp1 1.00 0.72 -0.16 -0.34 -0.31 0.18 0.47 0.64 -0.10 0.30 -0.12
sp2 0.72 1.00 -0.12 -0.26 -0.23 0.23 -0.05 0.79 -0.07 0.57 -0.09
sp3 -0.16 -0.12 1.00 0.36 0.03 -0.22 -0.07 -0.10 0.63 -0.07 -0.12
sp4 -0.34 -0.26 0.36 1.00 0.53 -0.36 -0.15 -0.21 0.35 -0.15 -0.05
sp5 -0.31 -0.23 0.03 0.53 1.00 -0.14 -0.13 -0.19 -0.19 -0.13 -0.02
sp6 0.18 0.23 -0.22 -0.36 -0.14 1.00 0.11 0.23 -0.21 0.06 -0.16
sp7 0.47 -0.05 -0.07 -0.15 -0.13 0.11 1.00 -0.04 -0.04 -0.03 -0.05
sp8 0.64 0.79 -0.10 -0.21 -0.19 0.23 -0.04 1.00 -0.06 -0.04 -0.07
sp9 -0.10 -0.07 0.63 0.35 -0.19 -0.21 -0.04 -0.06 1.00 -0.04 -0.07
sp10 0.30 0.57 -0.07 -0.15 -0.13 0.06 -0.03 -0.04 -0.04 1.00 -0.05
sp11 -0.12 -0.09 -0.12 -0.05 -0.02 -0.16 -0.05 -0.07 -0.07 -0.05 1.00
Calculate the eigenvectors and eigenvalues for the correlation matrix:
> eigen(Allog_Cor)
$values
[1] 3.209758751 1.793647878 1.317301716 1.149864175 1.039372721 1.032387926 0.669324985 0.418930903 0.240541559 0.122835010 0.006034375
$vectors
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 0.46505201 -0.18595669 0.081346302 -0.197489482 -0.25285267 0.21921003 0.05270992 -0.0313541917 0.007230858 0.76488888 0.051683732
[2,] 0.46362324 -0.27574083 -0.338845535 0.075375284 -0.03802553 0.02737146 -0.05648792 0.0003910978 0.001684852 -0.26146377 -0.717475043
[3,] -0.22164587 -0.55323204 0.147472706 -0.007644819 0.08091379 -0.05457051 -0.21479408 -0.7000387364 -0.277652259 0.01348259 0.003537124
[4,] -0.35482789 -0.27945753 -0.310489062 -0.270900614 -0.17976336 0.12866504 -0.15177057 0.5039342415 -0.545694872 0.05062810 0.003220773
[5,] -0.25824043 0.11764614 -0.517812434 -0.432641139 -0.21547262 0.07078586 -0.24889042 -0.2761087974 0.522539482 0.03781096 0.008025880
[6,] 0.25707343 0.18349145 0.087053889 -0.232422653 0.39317365 -0.37462577 -0.71256124 0.1252277896 -0.088963762 0.11074437 0.010540076
[7,] 0.14166000 0.04890876 0.551371317 -0.410584228 -0.50816748 0.16017748 -0.16550098 0.0107956538 -0.006776436 -0.43920135 -0.024673194
[8,] 0.39286126 -0.22400349 -0.233646760 -0.163383379 0.35240864 0.39936624 0.04566958 -0.0050671692 0.004854094 -0.34940832 0.553435777
[9,] -0.17300311 -0.58704793 0.258726119 0.129284720 0.09953428 -0.00974481 -0.16245744 0.4021581567 0.585641363 0.02586437 0.004527292
[10,] 0.23136438 -0.14558042 -0.243144638 0.356342342 -0.53479363 -0.49483048 -0.14233570 0.0119226888 -0.012177037 -0.11106273 0.418845372
[11,] -0.05575345 0.19830939 -0.006860953 0.549276160 -0.12808736 0.59683133 -0.52824412 -0.0328509707 -0.030258508 0.04642032 0.002844908
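The eigenvalues are the variances of the principal components, so the proportion of variance explained follows directly from them. The sketch below recomputes those proportions from the eigenvalues printed above. The results differ slightly from the princomp() summary later in this example, presumably because Allog_Cor was rounded to two decimals before eigen() was called.

```r
# Eigenvalues copied from the eigen(Allog_Cor) output above
ev <- c(3.209758751, 1.793647878, 1.317301716, 1.149864175, 1.039372721,
        1.032387926, 0.669324985, 0.418930903, 0.240541559, 0.122835010,
        0.006034375)

# For a correlation matrix the eigenvalues sum to the number of variables (11)
total <- sum(ev)

prop  <- ev / total     # proportion of variance per component
cumul <- cumsum(prop)   # cumulative proportion
sdev  <- sqrt(ev)       # standard deviation of each component

round(data.frame(eigenvalue = ev, sdev = sdev,
                 proportion = prop, cumulative = cumul), 3)
```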
Extract the principal components:
> Allog_PCA=princomp(Allog,cor=TRUE)
> summary(Allog_PCA,loadings=TRUE)
Importance of components:
Comp.1 Comp.2 Comp.3
Standard deviation 1.7912956 1.3406876 1.1475338
Proportion of Variance 0.2917036 0.1634039 0.1197122
Cumulative Proportion 0.2917036 0.4551076 0.5748197
Loadings:
     Comp.1 Comp.2 Comp.3
sp1  -0.465 -0.188
sp2  -0.464 -0.273  0.340
sp3   0.223 -0.553 -0.146
sp4   0.353 -0.278  0.312
sp5   0.258  0.122  0.515
sp6  -0.257  0.187
sp7  -0.140        -0.553
sp8  -0.393 -0.223  0.232
sp9   0.174 -0.586 -0.256
sp10 -0.231 -0.145  0.245
sp11         0.203
The first three components account for 57.5% of the variance, with the first
component alone accounting for 29.2%.
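The princomp() loadings match the eigenvectors from eigen() except that some columns have the opposite sign (Comp.1 and Comp.3 here). The sign of an eigenvector is arbitrary, so both describe the same components. The sketch below checks this relationship on the built-in USArrests data, used as a stand-in (an assumption) so that the code runs without the Alaskan file.

```r
# Stand-in data (assumption): built-in USArrests, not the Alaskan data
X <- log(USArrests + 1)

pca <- princomp(X, cor = TRUE)
eig <- eigen(cor(X))

# Component variances equal the eigenvalues of the correlation matrix
all.equal(unname(pca$sdev^2), eig$values)

# Loadings equal the eigenvectors up to an arbitrary sign per column
L <- unclass(pca$loadings)
all.equal(abs(unname(L)), abs(eig$vectors))
```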
The equations of the first three principal components (coefficients are the princomp loadings; loadings smaller than 0.1 in absolute value are omitted):
Y1 = -0.465 sp1 - 0.464 sp2 + 0.223 sp3 + 0.353 sp4 + 0.258 sp5 - 0.257 sp6 - 0.140 sp7 - 0.393 sp8 + 0.174 sp9 - 0.231 sp10
Y2 = -0.188 sp1 - 0.273 sp2 - 0.553 sp3 - 0.278 sp4 + 0.122 sp5 + 0.187 sp6 - 0.223 sp8 - 0.586 sp9 - 0.145 sp10 + 0.203 sp11
Y3 = 0.340 sp2 - 0.146 sp3 + 0.312 sp4 + 0.515 sp5 - 0.553 sp7 + 0.232 sp8 - 0.256 sp9 + 0.245 sp10
Species 1, 2, and 8 contribute most to the first principal component,
species 3 and 9 to the second, and species 5 and 7 to the third.
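Each equation is a weighted sum of the standardised variables, with the loadings as weights, and evaluating it for every sample gives the component scores. The sketch below reproduces the princomp() scores by hand. It again uses the built-in USArrests data as a stand-in (an assumption), and the names sd_n, Z and scores_manual are illustrative only.

```r
# Stand-in data (assumption): built-in USArrests, not the Alaskan data
X <- as.matrix(log(USArrests + 1))
n <- nrow(X)
pca <- princomp(X, cor = TRUE)

# princomp() standardises with the divisor n rather than n - 1
sd_n <- apply(X, 2, sd) * sqrt((n - 1) / n)
Z <- scale(X, center = colMeans(X), scale = sd_n)

# Score = sum over variables of (loading x standardised value)
scores_manual <- Z %*% unclass(pca$loadings)

all.equal(unname(scores_manual), unname(pca$scores))
```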
Plot the variances of the principal components:
> screeplot(Allog_PCA,main="Alaskan", cex.names=0.5)
[Figure: scree plot titled "Alaskan"; y-axis "Variances" from 0 to 3, bars for successive components (Comp.1, Comp.3, ..., Comp.9 labelled).]
Calculate the axis scores for each principal component:
> round(Allog_PCA$scores,2)
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10 Comp.11
[1,] 0.30 1.37 -0.48 -2.97 0.17 2.41 -0.96 -0.25 0.01 -0.06 -0.01
[2,] 0.26 1.23 -0.49 -2.60 0.12 2.00 -0.60 -0.23 0.00 -0.09 -0.01
[3,] 0.06 0.52 -0.54 -0.67 -0.16 -0.13 1.27 -0.12 -0.09 -0.25 -0.02
[4,] 0.06 0.52 -0.54 -0.67 -0.16 -0.13 1.27 -0.12 -0.09 -0.25 -0.02
[5,] 0.06 0.52 -0.54 -0.67 -0.16 -0.13 1.27 -0.12 -0.09 -0.25 -0.02
[6,] -0.56 0.96 -0.77 -0.13 -1.02 -1.11 -0.44 0.18 0.10 0.04 -0.01
[7,] -0.60 1.00 -0.78 -0.09 -1.08 -1.19 -0.56 0.21 0.12 0.06 -0.01
[8,] -0.67 1.05 -0.81 -0.03 -1.17 -1.29 -0.75 0.24 0.14 0.09 -0.01
[9,] -0.85 1.18 -0.88 0.14 -1.43 -1.58 -1.25 0.33 0.20 0.18 0.00
………………………………………………………………………………………………………………….
[35,] -4.58 -1.36 1.10 0.73 -1.79 1.37 0.10 0.01 0.03 -0.50 0.38
Plot PC1 vs PC2 with different symbols for each stream:
> plot(Allog_PCA$scores[1:5,2]~Allog_PCA$scores[1:5,1],ylim=c(-2.5,2),xlim=c(-3,3),xlab="PC1",ylab="PC2",pch=15)
> points(Allog_PCA$scores[6:10,2]~Allog_PCA$scores[6:10,1],pch=16)
> points(Allog_PCA$scores[11:15,2]~Allog_PCA$scores[11:15,1],pch=17)
> points(Allog_PCA$scores[16:20,2]~Allog_PCA$scores[16:20,1],pch=0)
> points(Allog_PCA$scores[21:25,2]~Allog_PCA$scores[21:25,1],pch=1)
> points(Allog_PCA$scores[26:30,2]~Allog_PCA$scores[26:30,1],pch=2)
> points(Allog_PCA$scores[31:35,2]~Allog_PCA$scores[31:35,1],pch=5)
> legend("bottomleft",legend=as.character(unique(Alaskan[,1])),bty="n",pch=c(15,16,17,0,1,2,5))
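The same plot can be drawn in a single call by indexing the symbol vector with the stream factor, which avoids typing the row ranges by hand. The sketch below is one possible way to do this, not the method used in the slides. It runs on simulated counts (an assumption), and the stream names and their order are taken from the plot legend.

```r
set.seed(1)
# Simulated stand-in (assumption): 7 streams x 5 replicates, 11 species
streams <- c("stonefly", "wolf", "berg_n", "tyndall",
             "berg_s", "rush_pt", "carolus")
stream <- factor(rep(streams, each = 5), levels = streams)
counts <- matrix(rpois(35 * 11, lambda = 3), nrow = 35,
                 dimnames = list(NULL, paste0("sp", 1:11)))
pca <- princomp(log(counts + 1), cor = TRUE)

symbols <- c(15, 16, 17, 0, 1, 2, 5)   # same plotting symbols as above
sc <- pca$scores

# pch is looked up per row from the stream factor; the default axis
# limits cover all points, so no sample is clipped
plot(sc[, 1], sc[, 2], pch = symbols[as.integer(stream)],
     xlab = "PC1", ylab = "PC2")
legend("bottomleft", legend = levels(stream), pch = symbols, bty = "n")
```

Replacing sc[, 2] with sc[, 3] and the y-axis label with "PC3" gives the PC1 vs. PC3 plot.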
Plot PC1 vs. PC2 and PC1 vs. PC3 scores with different symbols for each stream:
[Figure: two scatter plots of the scores, PC1 vs. PC2 and PC1 vs. PC3, with a separate symbol for each stream (stonefly, wolf, berg_n, tyndall, berg_s, rush_pt, carolus).]
The Carolus and Berg North streams have the most distinct microcrustacean
communities, as shown by their separation along the PC1 axis. Differences between
replicates appear on PC2 (Berg North) and PC3 (Carolus).
Make a biplot showing the loadings of each variable on PC1 and PC2:
> biplot(Allog_PCA,xlabs=abbreviate(Alaskan[,1]),xlim=c(-0.6,0.3),ylim=c(-0.7,0.2))
[Figure: biplot of Comp.1 (x-axis) vs. Comp.2 (y-axis); samples labelled with abbreviated stream names (stnf, wolf, brg_n, tynd, brg_s, rsh_, crls) and arrows for sp1 to sp11.]
Carolus is characterised by species 1, 2, and 8, and Berg North by
species 3 and 9. The two streams are separated along PC1.
Make a biplot showing the loadings of each variable on PC1 and PC3:
> biplot(Allog_PCA,xlabs=abbreviate(Alaskan[,1]),choices=c(1,3),xlim=c(-0.6,0.4),ylim=c(-0.4,0.3))
[Figure: biplot of Comp.1 (x-axis) vs. Comp.3 (y-axis); samples labelled with abbreviated stream names (stnf, wolf, brg_n, tynd, brg_s, rsh_, crls) and arrows for sp1 to sp11.]
Carolus and Berg North are separated on both Comp.1 and Comp.3.
Species 2, 8, and 10 are associated with Carolus and species 3 and 9 with
Berg North, which accounts for the difference between the two streams.