Stat 415, section 3. HW 1 solutions This is written in Word, which changes characters to look pretty. You can cut and paste most of the code below into R. You can not cut and paste code with quotes. R doesn’t recognize Word’s pretty quotes. 1) Echinacea metabolites a) Pur504 (row 30) You are looking for the sample with the smallest distance. That’s most easily done by looking at the Pur307 row of the matrix containing the distances. lipo.m <- as.matrix(lipo[,-(1:2)]) dimnames(lipo.m)[[1]] <- lipo$access lipo.canb <- vegdist(lipo.m, ‘canb’) temp <- as.matrix(lipo.canb) temp[‘Pur307’, ] The smallest is Pur504 (row 30) with a distance of 0.426 b) There are number of metabolites that are in Pur307 and Pur504, but not in Ang267. lipo.m[c(1,28, 30),] c) Pur 313 lipo.pa <- decostand(lipo.m, 'pa') lipo.jd <- vegdist(lipo.pa, 'jacc') temp <- as.matrix(lipo.jd) temp[‘Pur307’, ] The smallest is Pur313 (row 29). d) Most metabolites are present in all 3 samples or absent in all three samples. There are two that are not. Pur307 and 313 differ in metabolite (unknown_A2). Pur307 and 504 differ in 2 (unknown_A2 unknown_B7). lipo.pa[28:30,] e) I expect them to be quite similar because the two distance measures are highly correlated. The plot is on the next page 0.8 0.6 0.4 0.0 0.2 lipo.jd 0.2 0.4 0.6 0.8 1.0 lipo.canb f) 3 Dimensions. Stress for the nMDS 2D solution is 11.5%. That for the 3D solution is 7.9% lipo.mds2 <- metaMDS(lipo.canb, k = 2, expand=F) # because we’re starting with the distance matrix autotransform=F is not necessary #or, to get the species scores, use the data matrix lipo.mds2 <- metaMDS(lipo.m, ‘canb’, k=2, autotransform=F, expand=F) lipo.mds2 <- metaMDS(lipo.canb, k = 2, expand=F) g) The very obvious ones are Pur, Lae, Tenn. All three Pur samples and all Tenn samples are on the edge of the plot of samples, close to each other and relatively well separated from the rest. The three Lae samples are also close to each other in the middle of the plot but separated from the other species. ordirgl(lipo.mds3) Note: Pal215 is one sample that stands out from the rest, but the other Pal samples are quite distant. h) Again, Lae, Pur, and Tenn are the species relatively well separated from the rest (plot on next page). ordiplot(lipo.mds2,type=’t’,disp=’sites’) 0.4 0.2 Tenn326 Tenn325 Tenn324 Tenn250 AngAng318 0.0 Pal275 AngAng285 AngStr266 AngAng272 Lae310 Lae314 Lae312 Lae316 Atr260 Atr262 Atr255 Ang267 Atr299 Simu249 ParaNeg264 Hyb306 ParaNeg263 ParaNeg265 Pal293 Simu308 Simu304 AngStr320 Pal290 ParaPara301 Pal296 ParaPara321 Hyb294 ParaPara292 -0.2 NMDS2 Sang258 Sang878 Sang257 Pal215 -0.4 Pur504 Pur307 -0.4 -0.2 Pur313 0.0 0.2 0.4 NMDS1 i) Your explanation here is more important than your specific answer. I would use the 2D plot because it represents the same features seen in the 3D plot, even though the stress is slightly above 10%. j) Unknown B1, Unknown B6, amide 16, amide 13. Note: You are looking for the compounds (species scores to vegan) that are plotted in the direction of the Tenn samples. To get the species scores, you need to do the mds on the raw data matrix. ordipointlabel(lipo.mds2) k) Yes, the plot does. 1.0 lipo.m[, "amide_16"] 0.07 0.06 0.5 0.05 0.04 0.03 0.0 0.01 -0.5 NMDS2 0.02 -1.0 0 -2 -1 0 1 2 NMDS1 amide 16 increases towards the locations of the Tenn samples. ordisurf(lipo.mds2, lipo.m[,'amide_16']) l) Yes, the arrow appropriate summarizes the variation in amide 16. The arrow points in the direction of increasing amounts of amide 16, the contours from ordisurf are reasonably parallel and equi-distant. 1.0 lipo.m[, "amide_16"] 0.07 2 1 0.06 0.5 0.05 0.04 0.03 0.0 -0.5 0.01 0 -1.0 NMDS2 0.02 -2 -1 0 NMDS1 1 2 2) Missouri River fish data a) Bray-Curtis after standardizing to site total and Morisita-Horn are very highly correlated. Jaccard is much less correlated. BC and MH will give similar results; Jaccard may give very different results. Note: the curvature in BC vs MH plot is not relevant (at least for nMDS) because nMDS is based on ranks, not linear correlation. fish.bray <- vegdist(decostand(fish.m,’total’), ‘bray’) fish.mh <- vegdist(fish.m, ‘horn’) fish.j <- vegdist(decostand(fish.m, ‘pa’), ‘jacc’) pairs(cbind(fish.bray, fish.mh, fish.j)) 0.2 0.4 0.6 0.8 1.0 0.6 0.8 1.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.2 0.4 fish.bray 0.0 0.2 0.4 0.6 0.8 1.0 fish.mh fish.j 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 b) Yes. Standardizing by site total removes the dependence on total abundance. BC is most sensitive to differences in the abundant species. MH would have also been appropriate. c) You would need 4 dimensions if you strictly demanded stress < 10%. Stress for a 2D solution is 0.143, for a 3D solution is 0.102, for a 4D solution is ca 0.080. fish.mds2 <- metaMDS(fish.bray, k=2) fish.mds3 <- metaMDS(fish.bray, k=3, trymax=200) fish.mds4 <- metaMDS(fish.bray, k=4, trymax=200) Note: it was hard to find the best 3D solution and even harder to find the best 4D. trymax= tells metaMDS to try even more starting positions. d) gear is the only factor with visually obvious association with species composition. The convex hulls for sites and habitats overlap each other, but those for some gears don’t overlap. plot(fish.mds2,disp='sites') ordihull(fish.mds2,fish.env$site,label=T) title("Sites") plot(fish.mds2,disp='sites') ordihull(fish.mds2,fish.env$habitat,label=T) title("Habitats") plot(fish.mds2,disp='sites') ordihull(fish.mds2,fish.env$gear,label=T) title("Gears") 0.2 Sites 0.0 L -0.2 O -0.4 NMDS2 C -0.4 -0.2 0.0 NMDS1 0.2 0.4 0.6 0.0 m c -0.4 -0.2 NMDS2 0.2 Habitats -0.4 -0.2 0.0 0.2 0.4 0.6 NMDS1 0.2 Gears SH 0.0 SN -0.2 LH EF -0.4 NMDS2 FN -0.4 -0.2 0.0 NMDS1 0.2 0.4 0.6 e) OcLH02 is very different from the others. The two species strongly associated with that sample are GZSD and BKCP. Note: You are looking at for a point well separated from the others. The species most strongly associated with that sample are those with species scores closest to OcLH02. f) When this HW was assigned, we knew about two ways to answer this question: visual assessment, e.g. using convex hulls and testing association between the ordination and the habitat using envfit. Either was appropriate. In neither approach is there evidence of a difference. Visually: 0.0 0.2 0.4 -0.4 mc -0.8 NMDS2 Habitats, LH gear -0.5 0.0 0.5 1.0 1.5 NMDS1 fish.lhmds <- metaMDS(decostand(fish.lh, ‘total’), autotransform=F, expand=F) plot(fish.lhmds, disp=’sites’) ordihull(fish.lhmds, fish.envlh$habitat) The envfit test has a p-value of 0.97. (because this is permutation based, you may have a slightly different p-value). No evidence of a difference among habitats. envfit(fish.lhmds, fish.envlh$habitat)