Jen Costanza
12/5/05
Biol 112
Vegetation Analysis – Final Lab

Introduction

A common goal of ecologists is to describe communities by determining which species occur together and why (McCune and Grace 2002). Because vegetation communities do not have clear boundaries, community description is often difficult. Similarly, because environmental variables interact to influence species composition, determining the most important variables is challenging. However, several multivariate analysis techniques can help tease apart these interdependences and summarize important interactions among environmental variables. In this study, I examined species occurrence data from plots in the Duke Forest, North Carolina, as well as environmental data from the same plots. I used several multivariate community analysis techniques to classify and describe the communities present in Duke Forest, and I explored the main factors that distinguish each group.

Data and Methods

Data

I used stem counts of woody species from 106 Carolina Vegetation Survey plots in Duke Forest. The stem count data were log-transformed, then relativized by plot so that the values for all species within a single plot summed to 100. A total of 56 species were present in the data set (Table 1). I also used data for 16 environmental variables in the same plots. Environmental variables included soil characteristics such as pH, nutrients and texture, as well as topographic factors such as slope, aspect, and distance to water.

Analyses

First, I ran an ordination in PC-ORD to compare plots along important axes and to help determine which species and environmental variables are most important in determining species composition in the plots. I used nonmetric multidimensional scaling (NMS) because it avoids the assumption of linear relationships among variables (McCune and Grace 2002).
NMS uses ranked distances, so it tends to linearize the relationship between distances measured in species space and distances in environmental space (McCune and Grace 2002). I ran a preliminary step-down ordination to choose the dimensionality of my focal ordination. For the step-down ordination, I started with six dimensions and used the Bray-Curtis distance measure, a random starting configuration, and 20 runs with real data. The step-down ordination produced six models, one each with six, five, four, three, two and one dimensions.

For each model, PC-ORD calculated the stress, which measures how far the distances among plots in the ordination depart from the rank order of the distances in the original data (McCune and Grace 2002). According to McCune and Grace (2002), stress values below 20 indicate that a model should provide useful results, with values closer to 0 most preferred but rarely achieved. The scree plot shows stress as a function of dimensionality. I examined the scree plot and determined that three dimensions were adequate to describe my data, since the three-dimensional model had a stress under 20 and models with four, five or six dimensions did not reduce the stress below 10. In addition, the stress for the three-dimensional model was stable after it reached a solution.

Therefore, I ran a focal NMS ordination with three dimensions, using the distance measure and other criteria listed above. Using PC-ORD, I created biplots showing where each plot occurred along each of the three axes. I overlaid the species abundance data and environmental attributes on these biplots to determine which of them were correlated with which axes. The output from the NMS ordination included r² values for the correlations between each species or environmental variable and each of the three axes in ordination space. These were helpful in determining which axes corresponded to which species and which environmental variables.
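
To make the distance measure used in the ordination concrete, the following is a minimal Python sketch of the data preparation and the Bray-Curtis distance described above. It is not PC-ORD's implementation. The log10(x + 1) form of the log transform, the function names, and the three small example plots are assumptions made for illustration; none of the numbers come from the Duke Forest data.

```python
import math

def log_relativize(counts):
    """Log-transform stem counts, then rescale so the plot sums to 100."""
    # Assumption: log10(x + 1), so that absent species (count 0) stay at 0.
    logged = [math.log10(c + 1) for c in counts]
    total = sum(logged)
    if total == 0:
        return [0.0 for _ in logged]  # empty plot: nothing to relativize
    return [100.0 * v / total for v in logged]

def bray_curtis(a, b):
    """Bray-Curtis (Sorensen) distance between two abundance vectors."""
    shared_difference = sum(abs(x - y) for x, y in zip(a, b))
    combined_total = sum(x + y for x, y in zip(a, b))
    return shared_difference / combined_total if combined_total else 0.0

# Hypothetical stem counts for four species in three plots.
plots = {
    "plot_a": [12, 3, 0, 0],
    "plot_b": [10, 0, 5, 0],
    "plot_c": [0, 0, 4, 20],
}

rel = {name: log_relativize(counts) for name, counts in plots.items()}
dist = {(p, q): bray_curtis(rel[p], rel[q]) for p in rel for q in rel}

for (p, q), d in sorted(dist.items()):
    if p < q:
        print(p, q, round(d, 3))
```

Plots that share no species, such as plot_a and plot_c in this example, are at the maximum distance of 1, while a plot compared with itself has a distance of 0. NMS then works with the rank order of these pairwise distances rather than with their raw values.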
To examine the characteristics of the data in more detail, I used polythetic hierarchical agglomerative cluster analysis. This analysis sorts the plots into groups based on their species composition, according to a matrix of distances between each pair of plots. I used polythetic clustering because it bases clustering on multiple species. The clustering is hierarchical because larger groups are formed from smaller groups, so later fusions depend on earlier fusions. It is agglomerative because it starts with individual plots and groups them into successively larger clusters. Chaining occurs when new groups are formed by the addition of single items to existing groups, and a low amount of chaining is desirable (McCune and Grace 2002). I used the Sorensen distance measure with flexible beta as my linkage method, since according to McCune and Grace (2002) the two are compatible. I chose a beta of -0.25 because it has the least propensity to chain (McCune and Grace 2002). I ran the cluster analysis for six groups and included all lower-level clustering, so the output included cluster dendrograms for six, five, four, three and two groups. I then qualitatively examined the species composition of each group, as well as the environmental variables that ?...

I used this clustering as the basis of my next analysis, indicator species analysis (ISA). I used ISA to characterize the species that belong to each group. ISA combines species abundance and frequency to determine to what extent a species is diagnostic for a particular group. A perfect indicator species for a group occurs only in that group (100% of its abundance in that group) and occurs in all plots in that group (100% frequency for that group). Relative abundance (RA) is calculated as the average abundance of a given species in a given group divided by the sum of that species' average abundances across all groups, expressed as a percent.
Relative frequency (RF) is calculated as the percent of samples in a given group where the species is present. An indicator value (IV) is calculated as the product of RA and RF, expressed as a percent. For every species, the maximum IV across groups (IVmax) was recorded as well. To test the significance of the results, I ran a Monte Carlo test using 1000 randomizations. This method randomly reassigns plots to groups 1000 times and calculates an IVmax for each randomization. The null hypothesis is that the IVmax from the clustering for a particular species is no larger than would be expected by chance from the randomization. P-values below 0.05 indicate that the IVmax for a particular species is significantly larger than expected by chance. I ran ISA in PC-ORD for the three-, four-, five- and six-group clusters. RA, RF, and IV were outputs. To determine how many groups to use for my ISA and community description, I looked for the cluster level that produced the lowest average p-value and the largest number of significant p-values (< 0.05).

Results

Ordination

The three-dimensional model in the focal run had a final stress of 15.80993 and a final instability of 0.00044. These values are acceptable, and models with more dimensions did not reduce stress by a large amount. Biplots from the three-dimensional solution are shown in Figure 1. All species correlated with the ordination axes (r² > .200) are shown as arrows overlaid on the plot data. In addition, Table 2 shows the r² values at or above .200 for species and environmental factors that are correlated with each axis.

The ordination graphs with tree species and environmental variables overlaid give a visual picture of how these correspond to the three axes. The correlation coefficients show quantitatively how species and environmental variables relate to the axes. From the graphs, Acer rubrum and Quercus prinus are positively correlated with Axis 1, with Quercus prinus showing a slightly stronger relationship.
The r² values for Axis 1 show the same trend: Quercus prinus has an r² of .408, while Acer rubrum has an r² of .281.

Four species are sorted along Axis 2. Fagus grandifolia and Liriodendron tulipifera are positively correlated with Axis 2, and Juniperus virginiana and Quercus stellata are negatively correlated, according to the graphs. Each of these has an r² > .300.

Several species are sorted along Axis 3. The distributions of Liquidambar styraciflua, Carpinus caroliniana and Ulmus alata are positively correlated with the axis, while Oxydendrum arboreum and Quercus alba are negatively correlated, according to the graphs. Liquidambar styraciflua and Quercus alba have the strongest relationships to the axis, as shown by the longer lines for those species. Again, this corresponds to the r² values for these species. All of the species have r² values above .200, and Liquidambar styraciflua and Quercus alba have very high r² values, above .500.

The graphs with the environmental variables overlaid show that Al, pH, Mn, and Ca have the greatest correlation with Axis 1. The r² values for all of these variables are at .200 and above. Therefore, the distributions of Acer rubrum and Quercus prinus each likely depend on these variables. However, the graphs show no environmental variables that are sorted along Axis 2. Similarly, none of the environmental variables has an r² > .200 for that axis. This probably means that the species that sort along Axis 2 are influenced by a variable that was not included in this data set. The environmental variables that are correlated with Axis 3 are Mg, Ca, distance to water, and elevation. Each of these has an r² > .200. In particular, distance to water has an r² of .542, indicating a relatively strong relationship. Therefore, the distributions of species correlated with Axis 3, such as Quercus alba, Liquidambar styraciflua, and Oxydendrum arboreum, are likely influenced by these environmental variables.
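
The r² values reported above summarize how tightly a species or environmental variable tracks the plot scores on a single axis. The sketch below shows that calculation as a squared Pearson correlation, which is an assumption about what the overlay statistic represents rather than a description of PC-ORD's internals. The function name and all of the numbers are hypothetical and are not the Duke Forest values.

```python
def r_squared(axis_scores, variable):
    """Squared Pearson correlation between axis scores and one variable."""
    n = len(axis_scores)
    mean_x = sum(axis_scores) / n
    mean_y = sum(variable) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(axis_scores, variable))
    sxx = sum((x - mean_x) ** 2 for x in axis_scores)
    syy = sum((y - mean_y) ** 2 for y in variable)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant input cannot be correlated with anything
    return (sxy * sxy) / (sxx * syy)

# Hypothetical scores of five plots on one ordination axis, and a
# hypothetical environmental measurement taken in the same five plots.
axis_scores = [-1.2, -0.4, 0.1, 0.6, 1.1]
distance_to_water = [310.0, 220.0, 150.0, 90.0, 40.0]

print(round(r_squared(axis_scores, distance_to_water), 3))
```

Because r² discards the sign of the correlation, the direction of each relationship (positive or negative along the axis) still has to be read from the overlay arrows on the biplots.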
Clustering and ISA

The six-group clustering level has the highest number of significant p-values and the highest average p-value (Figure 2), so that is the level I used for this analysis. The clustering dendrogram (Figure 3) shows the agglomeration done by PC-ORD. Table 3 shows the species with the highest IVs for each group, along with the RA and RF values that correspond to those species. An IV in bold indicates that it is the maximum IV for that species and is significantly different from the result of the Monte Carlo randomization (p < .05). Based on the IV, RA and RF values for each group, I determined the dominant species for each group.

Group 1 is the Quercus alba/Oxydendrum arboreum/Quercus velutina group, or the oak-sourwood community. This group is made up of 44 of the 106 plots, so it would be expected to have a great deal of variation in species composition among plots. This probably accounts for the relatively low IV and RA values in this group; no species has an IV greater than 50%. Distance to water and soil aluminum also appear to be associated with this group, based on biplots and overlays in PC-ORD.

Group 3 can be characterized as an Ulmus/Ilex group, or an elm-holly community. Both Ulmus alata and Ulmus rubra have relatively high IVs in Group 3, as well as high RA and RF values. However, none of the environmental variables measured appears to be associated with this group.

Group 12 is characterized by Fagus grandifolia and, to a lesser extent, Cornus florida and Liriodendron tulipifera. I will name this group the beech group, since F. grandifolia has fairly high values for RA and RF in Group 12. Soil pH is associated with this group as well.

Group 36 is the Quercus prinus or chestnut oak community, although Quercus coccinea (scarlet oak) is often present. Distance to water and elevation are associated with this group, so the presence of chestnut oak may depend on these environmental factors.
Group 37 is the Fraxinus/Cercis canadensis group, or an ash-redbud community. Both of these species are present in all plots in this group (RF = 100%), and a large portion of their total abundances falls within this group (RA > 50%). Group 37 also seems to be characterized by distance to water, soil pH, and soil Mn.

Group 87 has three pine species with fairly high IVs and RF = 100. This group appears to represent a southern pine community; however, the group is made up of only 3 plots. Therefore, it is difficult to determine whether this is an actual community type or just a residual group of plots that should be in other groups. No environmental variables measured here are associated with this group.

Discussion

As a result of the ordination and clustering analysis, I was able to identify approximately six community types in the Duke Forest data: oak-sourwood, elm-holly, beech, chestnut oak, ash-redbud and perhaps a southern pine community type. Ordination and subsequent overlays of species and environmental data were helpful in determining the characteristics associated with each of these groups. Most groups showed at least one soil variable associated with them, except for the elm-holly and southern pine communities. It could be that none of the environmental variables measured in this data set influences the presence of the species in these groups. However, environmental variables probably do not correlate with the southern pine group because there are only three plots in that group. Personal knowledge of the data would aid in determining which of these groups may be valid or invalid.

Reference

McCune, B. and J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, OR.

Table 1: Species present in the data set.
Code   Scientific name
ACNE   ACER NEGUNDO
ACRU   ACER RUBRUM
ACSA   ACER SACCHARUM
AMAR   AMELANCHIER ARBOREA
BENI   BETULA NIGRA
CACR   CARPINUS CAROLINIANA
CACA   CARYA CAROLINAE-SEPTENTRIONALIS
CACO   CARYA CORDIFORMIS
CAGL   CARYA GLABRA
CAOL   CARYA OVALIS
CAOV   CARYA OVATA
CAPA   CARYA PALLIDA
CATO   CARYA TOMENTOSA
CECA   CERCIS CANADENSIS
CEOC   CELTIS OCCIDENTALIS
COFL   CORNUS FLORIDA
COST   CORNUS STRICTA
CRMA   CRATAEGUS MARSHALLII
CRUN   CRATAEGUS UNIFLORA
CRAT   CRATAEGUS SP.
DIVI   DIOSPYROS VIRGINIANA
FAGR   FAGUS GRANDIFOLIA
FRAX   FRAXINUS SP.
ILAM   ILEX AMBIGUA
ILDE   ILEX DECIDUA
ILOP   ILEX OPACA
JUNI   JUGLANS NIGRA
JUVI   JUNIPERUS VIRGINIANA
LIST   LIQUIDAMBAR STYRACIFLUA
LITU   LIRIODENDRON TULIPIFERA
LOJA   LONICERA JAPONICA
MATR   MAGNOLIA TRIPETALA
MORU   MORUS RUBRA
NYSY   NYSSA SYLVATICA
OSVI   OSTRYA VIRGINIANA
OXAR   OXYDENDRUM ARBOREUM
PITA   PINUS TAEDA
PIEC   PINUS ECHINATA
PIVI   PINUS VIRGINIANA
PLOC   PLATANUS OCCIDENTALIS
PRAM   PRUNUS AMERICANA
PRSE   PRUNUS SEROTINA
QUAL   QUERCUS ALBA
QUCO   QUERCUS COCCINEA
QUFA   QUERCUS FALCATA
QUMA   QUERCUS MARILANDICA
QUMI   QUERCUS MICHAUXII
QUNI   QUERCUS NIGRA
QUPH   QUERCUS PHELLOS
QUPR   QUERCUS PRINUS
QURU   QUERCUS RUBRA
QUSH   QUERCUS SHUMARDII
QUST   QUERCUS STELLATA
QUVE   QUERCUS VELUTINA
SAAL   SASSAFRAS ALBIDUM
ULAL   ULMUS ALATA
ULAM   ULMUS AMERICANA
ULRU   ULMUS RUBRA

Table 2: Species and environmental factors with strong correlations to the three ordination axes. r² values of .200 or greater are listed. Refer to Table 1 for species codes.

Species / Env     Axis 1   Axis 2   Axis 3
ACRU              0.281
CACR                                0.317
FAGR                       0.331
JUVI                       0.378
LIST                                0.536
LITU                       0.319
OXAR                                0.22
QUAL                                0.531
QUPR              0.408
QUST                       0.39
ULAL                                0.248
pH                0.368
Ca                0.223             0.286
Mg                                  0.332
Al                0.200
Mn                0.236
Distance to H2O                     0.542
Elevation                           0.223

Table 3: Indicator values (IV), relative abundances (RA) and relative frequencies (RF) for species in each group as a result of the clustering analysis. The species with the top seven IVs for each group are shown. Significant maximum IVs were shown in bold in the original table. Refer to Table 1 for species codes.
Group 1 (ID 1, 44 plots)
Species   IV   RA   RF
QUAL      44   44  100
OXAR      37   44   84
QUVE      35   45   80
CATO      34   36   95
COFL      33   34   98
CAOL      28   51   55
NYSY      28   28   98

Group 2 (ID 3, 11 plots)
Species   IV   RA   RF
ULAL      66   81   82
ILDE      62   97   64
LIST      60   60  100
CACO      49   67   73
ULRU      49   89   55
MORU      43   59   73
CAOV      39   54   73

Group 3 (ID 12, 29 plots)
Species   IV   RA   RF
FAGR      73   88   83
LITU      58   68   86
COFL      34   37   93
QURU      30   38   79
ACRU      29   29  100
LIST      22   31   72
NYSY      20   24   86

Group 4 (ID 36, 11 plots)
Species   IV   RA   RF
QUPR      99   99  100
OXAR      37   41   91
QUCO      37   82   45
ACRU      33   33  100
CAPA      18  100   18
QUVE      17   23   73
QUMA      14   76   18

Group 5 (ID 37, 8 plots)
Species   IV   RA   RF
FRAX      74   74  100
CECA      59   59  100
OSVI      45   52   88
QURU      38   43   88
QUAL      35   35  100
CAGL      34   39   88
PRSE      33   38   88

Group 6 (ID 87, 3 plots)
Species   IV   RA   RF
PIEC      84   84  100
PIVI      69   69  100
JUVI      63   63  100
QUST      59   59  100
PITA      39   39  100
DIVI      19   19  100
QUMA      13   19   67

[Figure 1 panels (PC-ORD output TreeLong_NMS; plots are coded by the six cluster groups 1, 3, 12, 36, 37, 87). (a) Axis 1 vs. Axis 2, with overlays for LITU, FAGR, ACRU, QUPR, JUVI and QUST. (b) Axis 1 vs. Axis 3, with overlays for LIST, CACR, ULAL, ACRU, QUPR, OXAR and QUAL. (c) Axis 2 vs. Axis 3, with overlays for LIST, CACR, ULAL, LITU, FAGR, JUVI, QUST, OXAR and QUAL.]

Figure 1a-c: Biplots from the three-dimensional NMS ordination solution. All species with r² > .200 for a given axis are overlaid on the plot data. Groups correspond to those resulting from the cluster analysis. Refer to Table 1 for species codes.

[Figure 2 chart. Horizontal axis: Number of Groups (3, 4, 5 and 6 groups). Left vertical axis: Number of Significant p-values (0 to 30). Right vertical axis: Average p-value (0 to 0.18).]

Figure 2: The number of significant p-values and the average p-values for all grouping levels. The six-group level has the most significant p-values, and the largest average p-value, so it was the grouping level chosen in this analysis.
[Figure 3 dendrogram of the 106 plots (PC-ORD output). Scales: Distance (Objective Function), from 1.6E-02 to 25, and Information Remaining (%), from 100 to 0. Branches are coded by the six cluster groups 1, 3, 12, 36, 37, 87.]

Figure 3: Clustering dendrogram showing six groups. The length of each branch in the dendrogram indicates the amount of information needed to create each group. Colors correspond to group ID numbers: Group 1 – red, Group 3 – green, Group 12 – light blue, Group 36 – purple, Group 37 – dark blue, Group 87 – yellow.