PCA Example

The data set "Alaskan" presents microcrustacean densities in seven streams in southeastern Alaska that formed when glaciers melted. Five replicate samples were taken from each stream. The sites range in age from 18 to approximately 1400 years. How are the microcrustacean species distributed among the seven streams?

R data "Alaskan": Variables:
Str - stream
rep - replicate
sp1 - Nitocra hibernica
sp2 - Attheyella illinoisensis
sp3 - Attheyella idahoensis
sp4 - Bryocamptus hiemalis
sp5 - Bryocamptus zschokkei
sp6 - Acanthocyclops vernalis
sp7 - Alona guttata
sp8 - Graptoleberis sp.
sp9 - Chydorus
sp10 - Macrothricidae
sp11 - Maraenobiotus insignipes

Read the data "Alaskan":

> Alaskan=read.csv("E:/Multivariate_analysis/Data/Alaskan.csv",header=TRUE)

Subset the data by removing the first three columns. The result contains only the microcrustacean species:

> Al=Alaskan[,-c(1:3)]

Log transform the Al data with log(x+1):

> Allog=log(Al[,1:11]+1)

Calculate the variance of each variable:

> round(sapply(Allog,var),2)
 sp1  sp2  sp3  sp4  sp5  sp6  sp7  sp8  sp9 sp10 sp11
2.72 0.95 1.90 5.05 5.03 5.51 0.23 0.45 0.45 0.23 1.02

Calculate the correlation matrix of the Allog data:

> Allog_Cor=round(cor(Allog),2)
> Allog_Cor
       sp1   sp2   sp3   sp4   sp5   sp6   sp7   sp8   sp9  sp10  sp11
sp1   1.00  0.72 -0.16 -0.34 -0.31  0.18  0.47  0.64 -0.10  0.30 -0.12
sp2   0.72  1.00 -0.12 -0.26 -0.23  0.23 -0.05  0.79 -0.07  0.57 -0.09
sp3  -0.16 -0.12  1.00  0.36  0.03 -0.22 -0.07 -0.10  0.63 -0.07 -0.12
sp4  -0.34 -0.26  0.36  1.00  0.53 -0.36 -0.15 -0.21  0.35 -0.15 -0.05
sp5  -0.31 -0.23  0.03  0.53  1.00 -0.14 -0.13 -0.19 -0.19 -0.13 -0.02
sp6   0.18  0.23 -0.22 -0.36 -0.14  1.00  0.11  0.23 -0.21  0.06 -0.16
sp7   0.47 -0.05 -0.07 -0.15 -0.13  0.11  1.00 -0.04 -0.04 -0.03 -0.05
sp8   0.64  0.79 -0.10 -0.21 -0.19  0.23 -0.04  1.00 -0.06 -0.04 -0.07
sp9  -0.10 -0.07  0.63  0.35 -0.19 -0.21 -0.04 -0.06  1.00 -0.04 -0.07
sp10  0.30  0.57 -0.07 -0.15 -0.13  0.06 -0.03 -0.04 -0.04  1.00 -0.05
sp11 -0.12 -0.09 -0.12 -0.05 -0.02 -0.16 -0.05 -0.07 -0.07 -0.05  1.00

Calculate the eigenvectors and eigenvalues of the correlation matrix:

> eigen(Allog_Cor)
$values
 [1] 3.209758751 1.793647878 1.317301716 1.149864175 1.039372721 1.032387926
 [7] 0.669324985 0.418930903 0.240541559 0.122835010 0.006034375

$vectors
             [,1]        [,2]         [,3]         [,4]        [,5]        [,6]        [,7]          [,8]         [,9]       [,10]        [,11]
 [1,]  0.46505201 -0.18595669  0.081346302 -0.197489482 -0.25285267  0.21921003  0.05270992 -0.0313541917  0.007230858  0.76488888  0.051683732
 [2,]  0.46362324 -0.27574083 -0.338845535  0.075375284 -0.03802553  0.02737146 -0.05648792  0.0003910978  0.001684852 -0.26146377 -0.717475043
 [3,] -0.22164587 -0.55323204  0.147472706 -0.007644819  0.08091379 -0.05457051 -0.21479408 -0.7000387364 -0.277652259  0.01348259  0.003537124
 [4,] -0.35482789 -0.27945753 -0.310489062 -0.270900614 -0.17976336  0.12866504 -0.15177057  0.5039342415 -0.545694872  0.05062810  0.003220773
 [5,] -0.25824043  0.11764614 -0.517812434 -0.432641139 -0.21547262  0.07078586 -0.24889042 -0.2761087974  0.522539482  0.03781096  0.008025880
 [6,]  0.25707343  0.18349145  0.087053889 -0.232422653  0.39317365 -0.37462577 -0.71256124  0.1252277896 -0.088963762  0.11074437  0.010540076
 [7,]  0.14166000  0.04890876  0.551371317 -0.410584228 -0.50816748  0.16017748 -0.16550098  0.0107956538 -0.006776436 -0.43920135 -0.024673194
 [8,]  0.39286126 -0.22400349 -0.233646760 -0.163383379  0.35240864  0.39936624  0.04566958 -0.0050671692  0.004854094 -0.34940832  0.553435777
 [9,] -0.17300311 -0.58704793  0.258726119  0.129284720  0.09953428 -0.00974481 -0.16245744  0.4021581567  0.585641363  0.02586437  0.004527292
[10,]  0.23136438 -0.14558042 -0.243144638  0.356342342 -0.53479363 -0.49483048 -0.14233570  0.0119226888 -0.012177037 -0.11106273  0.418845372
[11,] -0.05575345  0.19830939 -0.006860953  0.549276160 -0.12808736  0.59683133 -0.52824412 -0.0328509707 -0.030258508  0.04642032  0.002844908

Extract the principal components:

> Allog_PCA=princomp(Allog,cor=TRUE)
> summary(Allog_PCA,loadings=TRUE)
Importance of components:
                          Comp.1    Comp.2    Comp.3
Standard deviation     1.7912956 1.3406876 1.1475338
Proportion of Variance 0.2917036 0.1634039 0.1197122
Cumulative Proportion  0.2917036 0.4551076 0.5748197

Loadings:
     Comp.1 Comp.2 Comp.3
sp1  -0.465 -0.188
sp2  -0.464 -0.273  0.340
sp3   0.223 -0.553 -0.146
sp4   0.353 -0.278  0.312
sp5   0.258  0.122  0.515
sp6  -0.257  0.187
sp7  -0.140        -0.553
sp8  -0.393 -0.223  0.232
sp9   0.174 -0.586 -0.256
sp10 -0.231 -0.145  0.245
sp11         0.203

The first three components account for 57.5% of the variance, with the first component accounting for 29.2%. Blank entries are loadings smaller than 0.1 in absolute value, which summary() does not print. The loadings on Comp.1 and Comp.3 have the opposite sign to the corresponding eigen() columns; the sign of an eigenvector is arbitrary, so both describe the same components. Small numerical differences between the two outputs are expected because eigen() was applied to the correlation matrix rounded to two decimals.

The equations of the first three principal components, using the printed loadings (sp1 to sp11 are the standardized log(x+1) densities):

Y1 = -0.465 sp1 - 0.464 sp2 + 0.223 sp3 + 0.353 sp4 + 0.258 sp5 - 0.257 sp6 - 0.140 sp7 - 0.393 sp8 + 0.174 sp9 - 0.231 sp10
Y2 = -0.188 sp1 - 0.273 sp2 - 0.553 sp3 - 0.278 sp4 + 0.122 sp5 + 0.187 sp6 - 0.223 sp8 - 0.586 sp9 - 0.145 sp10 + 0.203 sp11
Y3 = 0.340 sp2 - 0.146 sp3 + 0.312 sp4 + 0.515 sp5 - 0.553 sp7 + 0.232 sp8 - 0.256 sp9 + 0.245 sp10

Species 1, 2, and 8 have the largest absolute loadings on the first principal component, species 3 and 9 on the second, and species 5 and 7 on the third.
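The link between eigen() and princomp() used above can be checked on any data set. The following is a minimal sketch on R's built-in USArrests data, which is an assumption made only so that the code runs without Alaskan.csv; the object names X, ev, and pca are illustrative. It uses the unrounded correlation matrix, so the two results should agree to numerical precision.

```r
# Illustration on R's built-in USArrests data (an assumption; not the Alaskan data).
X <- log(USArrests + 1)          # same log(x + 1) transform as in the example

ev  <- eigen(cor(X))             # eigen-decomposition of the unrounded correlation matrix
pca <- princomp(X, cor = TRUE)   # PCA on the correlation matrix

# Component standard deviations are the square roots of the eigenvalues.
sd_match <- all.equal(unname(pca$sdev), sqrt(ev$values))

# Loadings equal the eigenvectors up to the sign of each column,
# so compare absolute values.
L <- unclass(pca$loadings)
load_match <- all.equal(abs(unname(L)), abs(ev$vectors))

# Proportion of variance explained by each component:
# eigenvalue divided by the number of variables (the sum of the eigenvalues).
prop_var <- ev$values / sum(ev$values)

print(sd_match)
print(load_match)
print(round(cumsum(prop_var), 3))
```

Applied to the example, the same reasoning gives the "Proportion of Variance" row: each squared standard deviation divided by 11, the number of species.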
Plot the variances of the principal components:

> screeplot(Allog_PCA,main="Alaskan",cex.names=0.5)

[Figure: scree plot titled "Alaskan"; y-axis: Variances; x-axis: principal components starting at Comp.1]

Calculate the axis scores for each principal component:

> round(Allog_PCA$scores,2)
      Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10 Comp.11
 [1,]   0.30   1.37  -0.48  -2.97   0.17   2.41  -0.96  -0.25   0.01   -0.06   -0.01
 [2,]   0.26   1.23  -0.49  -2.60   0.12   2.00  -0.60  -0.23   0.00   -0.09   -0.01
 [3,]   0.06   0.52  -0.54  -0.67  -0.16  -0.13   1.27  -0.12  -0.09   -0.25   -0.02
 [4,]   0.06   0.52  -0.54  -0.67  -0.16  -0.13   1.27  -0.12  -0.09   -0.25   -0.02
 [5,]   0.06   0.52  -0.54  -0.67  -0.16  -0.13   1.27  -0.12  -0.09   -0.25   -0.02
 [6,]  -0.56   0.96  -0.77  -0.13  -1.02  -1.11  -0.44   0.18   0.10    0.04   -0.01
 [7,]  -0.60   1.00  -0.78  -0.09  -1.08  -1.19  -0.56   0.21   0.12    0.06   -0.01
 [8,]  -0.67   1.05  -0.81  -0.03  -1.17  -1.29  -0.75   0.24   0.14    0.09   -0.01
 [9,]  -0.85   1.18  -0.88   0.14  -1.43  -1.58  -1.25   0.33   0.20    0.18    0.00
 ...
[35,]  -4.58  -1.36   1.10   0.73  -1.79   1.37   0.10   0.01   0.03   -0.50    0.38

Plot PC1 vs. PC2 with a different symbol for each stream (five consecutive rows per stream):

> plot(Allog_PCA$scores[1:5,2]~Allog_PCA$scores[1:5,1],ylim=c(-2.5,2),xlim=c(-3,3),xlab="PC1",ylab="PC2",pch=15)
> points(Allog_PCA$scores[6:10,2]~Allog_PCA$scores[6:10,1],pch=16)
> points(Allog_PCA$scores[11:15,2]~Allog_PCA$scores[11:15,1],pch=17)
> points(Allog_PCA$scores[16:20,2]~Allog_PCA$scores[16:20,1],pch=0)
> points(Allog_PCA$scores[21:25,2]~Allog_PCA$scores[21:25,1],pch=1)
> points(Allog_PCA$scores[26:30,2]~Allog_PCA$scores[26:30,1],pch=2)
> points(Allog_PCA$scores[31:35,2]~Allog_PCA$scores[31:35,1],pch=5)
> legend("bottomleft",legend=as.character(unique(Alaskan[,1])),bty="n",pch=c(15,16,17,0,1,2,5))

Plot PC1 vs. PC2 and PC1 vs. PC3 scores with different symbols for each stream:

[Figure: two score plots, PC2 against PC1 and PC3 against PC1; legend: stonefly, wolf, berg_n, tyndall, berg_s, rush_pt, carolus]

The Carolus and Berg North springs have the most distinct microcrustacean communities, as shown on the PC1 axis. Differences between replicates are observed on PC2 (Berg North) and PC3 (Carolus).

Make a biplot showing the loadings of each variable on PC1 and PC2:

> biplot(Allog_PCA,xlabs=abbreviate(Alaskan[,1]),xlim=c(-0.6,0.3),ylim=c(-0.7,0.2))

[Figure: biplot of Comp.1 against Comp.2; samples labelled with abbreviated stream names (stnf, wolf, brg_n, tynd, brg_s, rsh_, crls); loadings shown for sp1 to sp11]

The Carolus springs are associated with species 1, 2, and 8, and the Berg North springs with species 3 and 9. The separation between the two springs is shown on PC1.

Make a biplot showing the loadings of each variable on PC1 and PC3:

> biplot(Allog_PCA,xlabs=abbreviate(Alaskan[,1]),choices=c(1,3),xlim=c(-0.6,0.4),ylim=c(-0.4,0.3))

[Figure: biplot of Comp.1 against Comp.3; samples labelled with abbreviated stream names; loadings shown for sp1 to sp11]

The Carolus and Berg North springs are separated on both Comp.1 and Comp.3. The presence of species 2, 8, and 10 in the Carolus spring and of species 3 and 9 in the Berg North spring accounts for the difference between the two springs.
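The score plot drawn earlier with one points() call per block of five rows can also be produced in a single call by looking up the plotting symbol from a grouping factor, which avoids hard-coded row ranges. The following is a minimal sketch on R's built-in iris data, an assumption made only so that the code runs without Alaskan.csv; the names groups, symbols, and pch_vec are illustrative.

```r
# Illustration on R's built-in iris data (an assumption; not the Alaskan data).
pca    <- princomp(iris[, 1:4], cor = TRUE)

# Grouping factor with levels in order of first appearance,
# so that symbols and legend entries line up.
groups <- factor(iris$Species, levels = unique(iris$Species))

# One plotting symbol per group, selected by the integer code of the factor.
symbols <- c(15, 16, 17)
pch_vec <- symbols[as.integer(groups)]

# Single plot call: every observation gets the symbol of its group.
plot(pca$scores[, 1], pca$scores[, 2], pch = pch_vec,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(groups), pch = symbols, bty = "n")
```

For the Alaskan data the same pattern would presumably use Alaskan[,1] as the grouping variable and the seven symbols c(15,16,17,0,1,2,5) from the original commands; this has not been run against the actual file.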