Jen Costanza
12/5/05
Biol 112
Vegetation Analysis – Final Lab
Introduction
A common goal of ecologists is to describe communities by determining which species occur
together and why (McCune and Grace 2002). Because vegetation communities do not have clear
boundaries, community description is often difficult. Similarly, because environmental variables
interact to influence species composition, determining the most important variables is
challenging. However, there are several techniques in multivariate analysis that can aid in teasing
apart these interdependences and summarizing important interactions among environmental
variables. In this study, I examined species occurrence data from plots in the Duke Forest, North
Carolina, as well as environmental data from the same plots. I used several multivariate
community analysis techniques to classify and describe the communities present in Duke Forest,
and I explored the main factors that distinguish each group.
Data and Methods
Data
I used stem counts of woody species from 106 Carolina Vegetation Survey plots in Duke Forest.
The stem count data were log-transformed, then relativized by plot so that all values for species
within a single plot summed to 100. A total of 56 species were present in the data set (Table 1). I also
used data for 16 environmental variables in the same plots. Environmental variables included soil
characteristics such as pH, nutrients and texture, as well as topographic factors such as slope,
aspect, and distance to water.
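The transformation described above can be sketched as follows. This is a minimal illustration, not the PC-ORD procedure itself: the function name `log_relativize` is hypothetical, and the use of log10(x + 1) is an assumption, since the report does not state which log transform was applied.

```python
import math

def log_relativize(counts):
    """Log-transform stem counts, then rescale each plot (row) to sum to 100."""
    result = []
    for row in counts:
        # Assumption: log10(x + 1), so that zero counts stay at zero.
        logged = [math.log10(c + 1) for c in row]
        total = sum(logged)
        if total == 0:
            # A plot with no stems stays all zeros rather than dividing by zero.
            result.append([0.0] * len(row))
        else:
            result.append([100.0 * v / total for v in logged])
    return result

# Made-up stem counts: three plots (rows) by three species (columns).
plots = [[9, 0, 99], [0, 0, 0], [4, 4, 4]]
transformed = log_relativize(plots)
```

After relativization, every non-empty plot sums to 100, so plots with many stems and plots with few stems are compared on composition rather than on total density.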
Analyses – good detail
First, I ran an ordination in PC-ORD to compare plots along important axes and to help
determine which species and environmental variables are most important in determining species
composition in the plots. I used nonmetric multidimensional scaling (NMS) because it avoids
the assumption of linear relationships among variables (McCune and Grace 2002). NMS uses
ranked distances, so it tends to linearize the relationship between distances measured in species
space and distances in environmental space (McCune and Grace 2002). I did a preliminary, step-down ordination
to determine the dimensionality to choose for my focal ordination. For the step-down ordination,
I chose six dimensions using the Bray-Curtis distance measure, a random number starting
configuration, and 20 runs with real data. The step-down ordination created six models, one each
with six, five, four, three, two and one dimensions. For each model, it calculated the amount of
stress on the model, or how far the data after ordination diverge from the original data (McCune
and Grace 2002). According to McCune and Grace (2002), stress values below 20 indicate that a
model should provide useful results, with values closer to 0 most preferred but rarely achieved.
The scree plot is a graphical output that shows the stress as a function of dimensionality. I
examined the scree plot and determined that three dimensions were adequate to describe my
data, since the three-dimensional model had a stress of under 20. Models with four, five or six
dimensions did not reduce the stress below 10. In addition, the stress for the three-dimensional
model was stable after it reached a solution.
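The dimensionality choice described above can be sketched as a simple rule: take the fewest dimensions whose final stress falls below the threshold of 20. The function name `pick_dimensionality` is hypothetical, and the stress values are made-up placeholders, except the three-dimensional value, which is rounded from the final stress reported in the Results.

```python
def pick_dimensionality(stress_by_dim, threshold=20.0):
    """Return the smallest number of dimensions with stress below threshold."""
    for dims in sorted(stress_by_dim):
        if stress_by_dim[dims] < threshold:
            return dims
    return None  # no model met the threshold

# Placeholder stress values per dimensionality (only the 3-D value is from the report).
stress_by_dim = {1: 45.0, 2: 24.0, 3: 15.81, 4: 12.5, 5: 11.0, 6: 10.2}
chosen = pick_dimensionality(stress_by_dim)
```

A rule like this captures only the threshold part of the decision. Reading the scree plot for diminishing returns and checking the stability of the solution remain judgment calls.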
Therefore, I did a focal NMS ordination with three dimensions, using the distance measure and
other criteria listed above. Using PC-ORD, I was able to create biplots showing where each plot
occurred along each of the three axes. I overlaid the species abundance data and environmental
attributes on these biplots to determine which of these were correlated with which axes. The
output from the NMS ordination included r2 values for the correlations between each species or
environmental variable and each of the three axes in ordination space. These were helpful in
determining which axes corresponded to which species and which environmental variables.
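As a rough illustration of the r2 values mentioned above, the sketch below computes the squared Pearson correlation between plot scores on one axis and one overlaid variable. It assumes the reported r2 is a squared Pearson coefficient; the function name and all numbers are hypothetical.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no correlation
    return (sxy * sxy) / (sxx * syy)

# Made-up axis scores for five plots and a made-up variable measured in them.
axis_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
variable = [2.0, 1.0, 4.0, 3.0, 5.0]
r2 = r_squared(axis_scores, variable)
```

Because r2 discards the sign, the direction of each relationship still has to be read from the overlay arrows on the biplots.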
To examine in more detail the characteristics of the data, I used polythetic hierarchical
agglomerative cluster analysis. This analysis sorts the plots into groups based on their species
composition according to a matrix of distances between each pair of plots. I used polythetic
clustering because it bases clustering on multiple species. Hierarchical clustering was used
because larger groups are formed from smaller grouping levels. Later fusions therefore depend
on earlier fusions. Since it is agglomerative, the clustering starts with individual plots and begins
grouping them into successively larger clusters. Chaining occurs when new groups are formed
by the addition of single items to existing groups, and a low amount of chaining is desirable
(McCune and Grace 2002). I used the Sorensen distance measure, with flexible beta as my
linkage method, since according to McCune and Grace (2002) it is compatible with the Sorensen
measure. I chose a beta of -0.25 because it has the least propensity to chain (McCune and Grace
2002). I ran the cluster analyses for six groups and included all lower-level clustering, so the
output included cluster dendrograms for six, five, four, three and two groups. I then qualitatively
examined the species composition of each group, as well as the environmental variables that ?...
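The clustering steps above can be sketched in a few lines. This is an illustrative toy version, not PC-ORD's implementation: it assumes the quantitative Sorensen (Bray-Curtis) distance and the standard Lance-Williams update for flexible beta, in which both alpha weights equal (1 - beta) / 2. The function names and the example data are hypothetical.

```python
def sorensen_distance(a, b):
    """Quantitative Sorensen (Bray-Curtis) distance between two plots."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 - 2.0 * shared / total if total else 0.0

def flexible_beta_clusters(rows, n_groups, beta=-0.25):
    """Agglomerate plots until n_groups clusters remain; return member lists."""
    alpha = (1.0 - beta) / 2.0  # flexible beta: alpha_i = alpha_j
    clusters = {i: [i] for i in range(len(rows))}
    dist = {}
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            dist[(i, j)] = sorensen_distance(rows[i], rows[j])
    next_id = len(rows)
    while len(clusters) > n_groups:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        d_ij = dist.pop((i, j))
        merged = clusters.pop(i) + clusters.pop(j)
        for k in clusters:
            d_ki = dist.pop((min(k, i), max(k, i)))
            d_kj = dist.pop((min(k, j), max(k, j)))
            # Lance-Williams update for the newly fused cluster.
            dist[(k, next_id)] = alpha * d_ki + alpha * d_kj + beta * d_ij
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())

# Made-up abundances: five plots (rows) by three species (columns).
rows = [[10, 0, 0], [9, 1, 0], [0, 10, 0], [0, 9, 1], [0, 0, 10]]
groups = flexible_beta_clusters(rows, n_groups=3)
```

In this toy data set the two pairs of compositionally similar plots fuse first, leaving the fifth plot as its own group. Each fusion depends on the earlier ones, which is the hierarchical property described above.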
I used this clustering as the basis of my next analysis, indicator species analysis (ISA). I used
ISA to characterize the species that belong to each group. ISA combines species abundance and
frequency to determine to what extent a species is diagnostic for a particular group. A perfect
indicator species for a group will only occur in that group (100% of its abundance in that group),
and will occur in all plots in that group (a value of 100% frequency for that group). Relative
abundance (RA) is calculated as the average abundance of a given species in a given group
divided by the sum of that species' average abundances across all groups, expressed as a percent.
Relative frequency (RF) is calculated as the percent of samples in a given group where the
species is present. The indicator value (IV) is the product of RA and RF, expressed as a percent.
For every species, a maximum IV across groups (IVmax) was calculated as well. To test for
significance of the results, I ran a Monte Carlo test using 1000 randomizations. This method
randomly reassigns plots to groups 1000 times, and calculates an IVmax for each randomization.
The null hypothesis is that IVmax from the clustering for a particular species is no larger than
would be expected by chance from the randomization. P-values of < 0.05 indicate that the IVmax
for a particular species is significantly larger than expected by chance.
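The indicator value calculation and its randomization test can be sketched for a single species as follows. This is a minimal sketch under stated assumptions, not PC-ORD's code: RA is taken as the group mean divided by the sum of group means (the Dufrene and Legendre form), IV as RA x RF / 100, and the p-value as (1 + number of randomizations with IVmax at least as large as observed) / (1 + number of randomizations). The function names and data are hypothetical.

```python
import random

def indicator_values(abund, groups):
    """Return {group: IV} for one species, given per-plot abundances and labels."""
    labels = sorted(set(groups))
    means, freqs = {}, {}
    for g in labels:
        vals = [a for a, lab in zip(abund, groups) if lab == g]
        means[g] = sum(vals) / len(vals)
        freqs[g] = 100.0 * sum(1 for v in vals if v > 0) / len(vals)  # RF
    total = sum(means.values())
    ivs = {}
    for g in labels:
        ra = 100.0 * means[g] / total if total else 0.0  # RA
        ivs[g] = ra * freqs[g] / 100.0  # IV = RA x RF, as a percent
    return ivs

def monte_carlo_p(abund, groups, n_perm=1000, seed=0):
    """Permutation p-value for the observed IVmax of one species."""
    rng = random.Random(seed)
    observed = max(indicator_values(abund, groups).values())
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign plots to groups at random
        if max(indicator_values(abund, shuffled).values()) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)

# Made-up abundances of one species in six plots, three plots per group.
abund = [5, 4, 6, 0, 0, 0]
groups = ["A", "A", "A", "B", "B", "B"]
ivs = indicator_values(abund, groups)
p_value = monte_carlo_p(abund, groups)
```

Here the species is a perfect indicator of group A: all of its abundance falls in that group and it occurs in every plot of that group. With only six plots, random relabeling can reproduce that pattern fairly often, so a perfect indicator value does not by itself guarantee a small p-value.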
I ran ISA in PC-ORD for the three, four, five and six group clusters. RA, RF, and IV were
outputs. To determine how many groups to use for my ISA and community description, I
looked for the cluster level that produced results with the lowest average p-value, and
the largest number of significant p-values (< 0.05).
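The comparison of cluster levels described above amounts to summarizing the per-species p-values at each level. The sketch below shows one way to tabulate both criteria; the function name is hypothetical and the p-values are placeholders, not results from the Duke Forest analysis.

```python
def summarize_levels(p_values_by_level, alpha=0.05):
    """Return {level: (mean p-value, count of p-values below alpha)}."""
    summary = {}
    for level, p_values in p_values_by_level.items():
        mean_p = sum(p_values) / len(p_values)
        n_significant = sum(1 for p in p_values if p < alpha)
        summary[level] = (mean_p, n_significant)
    return summary

# Placeholder p-values (NOT from this study), one list per number of groups.
p_values_by_level = {
    3: [0.01, 0.20, 0.40, 0.03],
    4: [0.01, 0.04, 0.30, 0.02],
}
summary = summarize_levels(p_values_by_level)
lowest_mean_level = min(summary, key=lambda level: summary[level][0])
most_significant_level = max(summary, key=lambda level: summary[level][1])
```

The two criteria need not agree on real data, in which case the choice of level has to weigh one against the other.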
Results
Ordination
The three-dimensional model in the focal run had a final stress of 15.80993 and a final instability
of 0.00044. These values are acceptable, and models with more dimensions did not reduce stress
by a large amount. Biplots from the three-dimensional solution are shown in Figure 1. All
species correlated with the ordination axes (r2 > .200) are shown as arrows overlaid on the plot
data. In addition, Table 2 shows the r2 values at or above .200 for species and environmental
factors that are correlated with each axis. The ordination graphs with tree species and
environmental variables overlaid give a visual picture of how these correspond to the three axes.
The correlation coefficients show how species and environmental variables relate to the axes
quantitatively. From the graphs, Acer rubrum and Quercus prinus are positively correlated with
Axis 1, with Quercus prinus showing a slightly stronger relationship. The r2 correlation
coefficients for Axis 1 show the same trend: Quercus prinus has an r2 value of .408, while Acer
rubrum has an r2 of .281. Four species are sorted along Axis 2. Fagus grandifolia and
Liriodendron tulipifera are positively correlated with Axis 2, and Juniperus virginiana and
Quercus stellata are negatively correlated, according to the graphs. Each one of these has an r2 >
.300. Several species are sorted along Axis 3. The distributions of Liquidambar styraciflua,
Carpinus caroliniana and Ulmus alata are positively correlated with the axis, while Oxydendrum
arboreum and Quercus alba are negatively correlated, according to the graphs. Liquidambar
styraciflua and Quercus alba have the strongest relationship to the axis, as shown by the longer
lines for those species. Again, this corresponds to r2 values for these species. All of the species
have r2 above .200. Liquidambar styraciflua and Quercus alba have very high r2 values above
.500.
The graphs with the environmental variables overlaid show that Al, pH, Mn, and Ca have the
greatest correlation with Axis 1. The r2 values for all of these variables are at .200 and above.
Therefore, the distributions of Acer rubrum and Quercus prinus each likely depend on these
variables. However, the graphs show no environmental variables that are sorted along Axis 2.
Similarly, none of the environmental variables has an r2 > .200 for that axis. This probably means
that the species that sort along Axis 2 are influenced by a variable that was not included in this
data set. The environmental variables that are correlated with Axis 3 are Mg, Ca, distance to
water, and elevation. Each of these has an r2 > .200. In particular, distance to water (Dist-H2O)
has an r2 of .542, indicating a relatively strong relationship. Therefore, the distribution of species
correlated with Axis 3, such as Quercus alba, Liquidambar styraciflua, and Oxydendrum
arboreum, is likely influenced by these environmental variables.
Clustering and ISA
The six-group clustering level has the highest number of significant p-values and the highest
average p-value (Figure 2), so that is the one that was used for this analysis. The clustering
dendrogram (Figure 3) shows the agglomeration done by PC-ORD. Table 3 shows the species
with the highest IVs for each group, along with the RA and RF values that correspond to those
species. An IV in bold indicates that it is a maximum IV for that species, and is significantly
different from the result of the Monte Carlo randomization (p < .05).
Based on the IV, RA and RF values for each group, I determined the dominant species for each
group. Group 1 is the Quercus alba/Oxydendrum arboreum/Quercus velutina group, or the oak-sourwood community. This group is made up of 44 of the 106 plots, so it would be expected to
have a great deal of variation in species composition among plots. This probably accounts for the
relatively low IV and RA values in this group; no species has an IV of greater than 50%.
Distance to water and soil aluminum also appear to be associated with this group based on
biplots and overlays in PC-ORD. Group 3 can be characterized as an Ulmus/Ilex group, or an
elm-holly community. Both Ulmus alata and Ulmus rubra have relatively high IVs in Group 3,
as well as RA and RF values. However, it does not appear that any of the environmental
variables measured is associated with this group.
Group 12 is characterized by Fagus grandifolia, as well as Cornus florida and Liriodendron
tulipifera to a lesser extent. I will name this group the beech group, since F. grandifolia has
fairly high values for RA and RF in Group 12. Soil pH is associated with this group as well.
Group 36 is the Quercus prinus or chestnut oak community, although Quercus coccinea (scarlet
oak) is often present. Distance to water and elevation are associated with this group, so the
presence of chestnut oak may depend on these environmental factors. Group 37 is the
Fraxinus/Cercis canadensis group, or an ash-redbud community. Both of these species are
present in all plots in this group (RF = 100%), and a large portion of their total abundances fall
within this group (RA > 50%). Group 37 also seems to be characterized by distance to water, soil
pH, and soil Mn.
Group 87 has three pine species with fairly high IVs and RF = 100%. This group appears to
represent a southern pine community; however, the group is only made up of 3 plots. Therefore,
it is difficult to determine whether this is an actual community type or if it is just a residual group
of plots that should be in other groups. No environmental variables measured here are associated
with this group.
Discussion
As a result of the ordination and clustering analysis, I was able to identify approximately six
community types in the Duke Forest data: oak-sourwood, elm-holly, beech, chestnut oak, ash-redbud, and perhaps a southern pine community type. Ordination and subsequent overlays of
species and environmental data were helpful in determining the characteristics associated with
each of these groups. Most groups showed at least one soil variable associated with them, except
for the elm-holly and southern pine communities. It could be that none of the environmental
variables measured in this data set influence the presence of the species in these groups.
However, environmental variables probably do not correlate with the southern pine group
because there are only three plots in that group. Personal knowledge of the data would aid in
determining which of these groups may be valid or invalid.
Reference
McCune, B. and J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software
Design, Gleneden Beach, OR.
Table 1: Species present in the data set.
Code  Scientific name
ACNE  ACER NEGUNDO
ACRU  ACER RUBRUM
ACSA  ACER SACCHARUM
AMAR  AMELANCHIER ARBOREA
BENI  BETULA NIGRA
CACR  CARPINUS CAROLINIANA
CACA  CARYA CAROLINAE-SEPTENTRIONALIS
CACO  CARYA CORDIFORMIS
CAGL  CARYA GLABRA
CAOL  CARYA OVALIS
CAOV  CARYA OVATA
CAPA  CARYA PALLIDA
CATO  CARYA TOMENTOSA
CECA  CERCIS CANADENSIS
CEOC  CELTIS OCCIDENTALIS
COFL  CORNUS FLORIDA
COST  CORNUS STRICTA
CRMA  CRATAEGUS MARSHALLII
CRUN  CRATAEGUS UNIFLORA
CRAT  CRATAEGUS SP.
DIVI  DIOSPYROS VIRGINIANA
FAGR  FAGUS GRANDIFOLIA
FRAX  FRAXINUS SP.
ILAM  ILEX AMBIGUA
ILDE  ILEX DECIDUA
ILOP  ILEX OPACA
JUNI  JUGLANS NIGRA
JUVI  JUNIPERUS VIRGINIANA
LIST  LIQUIDAMBAR STYRACIFLUA
LITU  LIRIODENDRON TULIPIFERA
LOJA  LONICERA JAPONICA
MATR  MAGNOLIA TRIPETALA
MORU  MORUS RUBRA
NYSY  NYSSA SYLVATICA
OSVI  OSTRYA VIRGINIANA
OXAR  OXYDENDRUM ARBOREUM
PITA  PINUS TAEDA
PIEC  PINUS ECHINATA
PIVI  PINUS VIRGINIANA
PLOC  PLATANUS OCCIDENTALIS
PRAM  PRUNUS AMERICANA
PRSE  PRUNUS SEROTINA
QUAL  QUERCUS ALBA
QUCO  QUERCUS COCCINEA
QUFA  QUERCUS FALCATA
QUMA  QUERCUS MARILANDICA
QUMI  QUERCUS MICHAUXII
QUNI  QUERCUS NIGRA
QUPH  QUERCUS PHELLOS
QUPR  QUERCUS PRINUS
QURU  QUERCUS RUBRA
QUSH  QUERCUS SHUMARDII
QUST  QUERCUS STELLATA
QUVE  QUERCUS VELUTINA
SAAL  SASSAFRAS ALBIDUM
ULAL  ULMUS ALATA
ULAM  ULMUS AMERICANA
ULRU  ULMUS RUBRA
Table 2: Species and environmental factors with strong correlations to the three ordination axes. r2
values > .200 are listed. Refer to Table 1 for species codes.
Species / Env     Axis 1   Axis 2   Axis 3
ACRU              0.281    -        -
CACR              -        -        0.317
FAGR              -        0.331    -
JUVI              -        0.378    -
LIST              -        -        0.536
LITU              -        0.319    -
OXAR              -        -        0.22
QUAL              -        -        0.531
QUPR              0.408    -        -
QUST              -        0.39     -
ULAL              -        -        0.248
pH                0.368    -        -
Ca                0.223    -        0.286
Mg                -        -        0.332
Al                0.200    -        -
Mn                0.236    -        -
Distance to H2O   -        -        0.542
Elevation         -        -        0.223
Table 3: Importance values (IV), relative abundances (RA) and relative frequencies (RF) for species in
each group as a result of the clustering analysis. The species with the top seven IV’s for each group are
shown. Significant maximum IV’s are shown in bold. Refer to Table 1 for species codes.

Group 1 (ID 1, 44 plots)
Species  IV   RA   RF
QUAL     44   44   100
OXAR     37   44   84
QUVE     35   45   80
CATO     34   36   95
COFL     33   34   98
CAOL     28   51   55
NYSY     28   28   98

Group 2 (ID 3, 11 plots)
Species  IV   RA   RF
ULAL     66   81   82
ILDE     62   97   64
LIST     60   60   100
CACO     49   67   73
ULRU     49   89   55
MORU     43   59   73
CAOV     39   54   73

Group 3 (ID 12, 29 plots)
Species  IV   RA   RF
FAGR     73   88   83
LITU     58   68   86
COFL     34   37   93
QURU     30   38   79
ACRU     29   29   100
LIST     22   31   72
NYSY     20   24   86

Group 4 (ID 36, 11 plots)
Species  IV   RA   RF
QUPR     99   99   100
OXAR     37   41   91
QUCO     37   82   45
ACRU     33   33   100
CAPA     18   100  18
QUVE     17   23   73
QUMA     14   76   18

Group 5 (ID 37, 8 plots)
Species  IV   RA   RF
FRAX     74   74   100
CECA     59   59   100
OSVI     45   52   88
QURU     38   43   88
QUAL     35   35   100
CAGL     34   39   88
PRSE     33   38   88

Group 6 (ID 87, 3 plots)
Species  IV   RA   RF
PIEC     84   84   100
PIVI     69   69   100
JUVI     63   63   100
QUST     59   59   100
PITA     39   39   100
DIVI     19   19   100
QUMA     13   19   67
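The IV, RA, and RF columns of Table 3 can be cross-checked against each other. The sketch below assumes that IV equals RA x RF / 100 and uses the Group 36 (chestnut oak) rows as printed in the table; the helper name `iv_from_ra_rf` is hypothetical.

```python
def iv_from_ra_rf(ra, rf):
    """Indicator value implied by relative abundance and relative frequency (percents)."""
    return ra * rf / 100.0

# (species, IV, RA, RF) for Group 36, copied from Table 3.
group_36 = [
    ("QUPR", 99, 99, 100),
    ("OXAR", 37, 41, 91),
    ("QUCO", 37, 82, 45),
    ("ACRU", 33, 33, 100),
    ("CAPA", 18, 100, 18),
    ("QUVE", 17, 23, 73),
    ("QUMA", 14, 76, 18),
]

# Largest disagreement between the tabulated IV and the value implied by RA and RF.
max_gap = max(abs(iv - iv_from_ra_rf(ra, rf)) for _, iv, ra, rf in group_36)
```

Small gaps are expected because the table reports rounded whole numbers. The rows also show how the two components trade off: CAPA has all of its abundance in this group but occurs in few of its plots, so its IV stays low.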
[Figure 1: three NMS biplots (TreeLong_NMS), with plots coded by Group6 cluster ID (1, 3, 12, 36, 37, 87).
(a) Axes 1 and 2, with overlays for LITU, FAGR, ACRU, QUPR, JUVI and QUST.
(b) Axes 1 and 3, with overlays for LIST, CACR, ULAL, ACRU, QUPR, OXAR and QUAL.
(c) Axes 2 and 3, with overlays for LIST, CACR, ULAL, LITU, FAGR, JUVI, QUST, OXAR and QUAL.]
Figure 1a-c: Biplots from the three-dimensional NMS ordination solution. All species with r2 > .200 for
a given axis are overlaid on the plot data. Groups correspond to those resulting from the cluster analysis.
Refer to Table 1 for species codes.
[Figure 2: chart of the number of significant p-values (axis from 0 to 30) and the average p-value (axis from 0 to 0.18) against the number of groups (3, 4, 5 and 6 groups).]
Figure 2: The number of significant p-values and the average p-values for all grouping levels. The six-group level has the most significant p-values, and the largest average p-value, so it was the grouping
level chosen in this analysis.
[Figure 3: cluster dendrogram of the 106 plots, with axes Distance (Objective Function), from 1.6E-02 to 2E+01, and Information Remaining (%), from 100 to 0. Branches are coded by Group6 cluster ID (1, 3, 12, 36, 37, 87).]
Figure 3: Clustering dendrogram showing six groups. The length of each branch in the dendrogram
indicates the amount of information needed to create each group. Colors correspond to group ID
numbers: Group 1 – red, Group 3 – green, Group 12 – light blue, Group 36 – purple, Group 37 – dark
blue, Group 87 – yellow.
Excellent!
Very clear, good detail
26/26