Similarity Assignment

Biol 529 Plant Ecology
Similarity among communities
Communities can differ in a number of ways. Considering only the plant component of a system, two
communities can differ in species composition (taxonomy), total number of species (richness), and the
relative abundance of species (evenness). Species diversity refers to a community-level concept that
combines both richness and evenness.
We use a number of different indices to estimate the similarity, or dissimilarity, of two communities.
There is considerable controversy about the effectiveness of these indices, because their performance varies with
factors such as the total number of species involved, the influence of rare species, and the abundance of the dominant species.
A number of recent papers have explored which index is most appropriate.
We will use a few of the common ones that you will frequently come across in the literature, so you can get an
impression of their usefulness. They fall into two categories: presence-absence measures, which focus on
richness and composition, and abundance measures, which incorporate richness, composition, and abundance.
We will use the following indices:
Jaccard (presence-absence)
$$S_J = \frac{a}{a+b+c}$$
Sorensen-Dice (Sorensen or Sorensen binary) (presence-absence)
$$S_{SD} = \frac{2a}{2a+b+c}$$
Bray-Curtis quantitative (sometimes called Pielou’s percentage similarity or Czekanowski’s)
$$S_{BC} = \frac{2\sum_i \min(n_{1i}, n_{2i})}{\sum_i n_{1i} + \sum_i n_{2i}}$$
Where:
a = number of species found at both sites
b = number of species found only at the first site
c = number of species found only at the second site
$n_{1i}$ = the number of individuals, % cover, or Importance Value (IV) of the ith species in sample 1
$n_{2i}$ = the number of individuals, % cover, or Importance Value (IV) of the ith species in sample 2
min = the smaller of the two abundance values for species i in the two samples being compared
The Jaccard index, also known as the Jaccard similarity coefficient (originally named the coefficient de communauté
by Paul Jaccard), is a statistic used for comparing the similarity and diversity of sample sets. It is defined as the
size of the intersection divided by the size of the union of the sample sets: the number of shared species divided by
the number of species found in either sample. This index uses only presence-absence data.
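The intersection-over-union definition maps directly onto set operations. The sketch below is not part of the lab's Excel procedure; it is a minimal Python illustration in which the function name `jaccard` and both species lists are hypothetical (the codes follow the handout's naming style, but the lists are invented).

```python
def jaccard(site1: set[str], site2: set[str]) -> float:
    """Jaccard similarity: |intersection| / |union| = a / (a + b + c)."""
    union = site1 | site2  # species found at either site (a + b + c)
    if not union:
        raise ValueError("both sites are empty")
    return len(site1 & site2) / len(union)  # shared species (a) / union


# Hypothetical species lists, invented for illustration (not Sierran data).
site_high = {"Abmag", "Abcon", "Pijef"}
site_low = {"Abcon", "Pijef", "Picon"}

print(jaccard(site_high, site_low))  # 2 shared / 4 in the union = 0.5
```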
The Sørensen index, also known as Sørensen’s similarity coefficient, is a statistic used for comparing the
similarity of two samples. It was developed by the botanist Thorvald Sørensen and published in 1948. It also uses
presence-absence data. When you apply both the Jaccard and Sørensen indices to the same data set, note how they
differ in performance.
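One way to see how the two presence-absence indices differ is to compute both on the same pair of samples. This is a minimal Python sketch with invented species lists and hypothetical function names `jaccard` and `sorensen`. Algebraically, Sørensen = 2J / (1 + J), so the two indices always rank elevation pairs in the same order; Sørensen simply gives higher values (except at 0 and 1) because it counts shared species twice.

```python
def jaccard(site1: set[str], site2: set[str]) -> float:
    """a / (a + b + c); assumes at least one site has a species."""
    return len(site1 & site2) / len(site1 | site2)


def sorensen(site1: set[str], site2: set[str]) -> float:
    """Sørensen (Dice) similarity: 2a / (2a + b + c)."""
    a = len(site1 & site2)  # shared species
    b = len(site1 - site2)  # first site only
    c = len(site2 - site1)  # second site only
    if a + b + c == 0:
        raise ValueError("both sites are empty")
    return 2 * a / (2 * a + b + c)


# Hypothetical species lists, invented for illustration (not Sierran data).
site_high = {"Abmag", "Abcon", "Pijef"}
site_low = {"Abcon", "Pijef", "Picon"}

j = jaccard(site_high, site_low)   # 2 / 4 = 0.5
s = sorensen(site_high, site_low)  # 4 / 6, about 0.667
print(round(j, 3), round(s, 3))
# Sørensen is a monotonic function of Jaccard: S = 2J / (1 + J), so S >= J.
print(round(2 * j / (1 + j), 3))
```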
Sørensen’s index is easily extended to the abundance, rather than just the incidence (presence), of species. This
quantitative version of the Sørensen index is also known as the Bray-Curtis similarity or quantitative index. When
using the Bray-Curtis quantitative index, the numerator uses, for each species, the smaller (“minimum”) of its two
values (number of individuals, % cover, or Importance Value) in the two samples being compared.
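The quantitative formula can be sketched the same way. In this hypothetical Python helper, `bray_curtis`, each sample is a dictionary from species code to abundance (IV, % cover, or number of individuals); the values are invented for illustration.

```python
def bray_curtis(sample1: dict[str, float], sample2: dict[str, float]) -> float:
    """Quantitative Sørensen / Bray-Curtis similarity.

    2 * sum_i min(n1i, n2i) / (sum_i n1i + sum_i n2i); a species that is
    missing from a sample is treated as having abundance 0 there.
    """
    species = set(sample1) | set(sample2)
    shared = sum(min(sample1.get(sp, 0.0), sample2.get(sp, 0.0)) for sp in species)
    total = sum(sample1.values()) + sum(sample2.values())
    if total == 0:
        raise ValueError("both samples have zero total abundance")
    return 2 * shared / total


# Hypothetical importance values (IV), invented for illustration.
sample_a = {"Abmag": 40.0, "Abcon": 30.0, "Pijef": 30.0}
sample_b = {"Abcon": 50.0, "Pijef": 10.0, "Picon": 40.0}

# minima: Abmag 0, Abcon 30, Pijef 10, Picon 0 -> 40; totals 100 + 100
print(bray_curtis(sample_a, sample_b))  # 2 * 40 / 200 = 0.4
```

Passing 0/1 values instead of IVs turns this into the binary Sørensen index, 2a / (2a + b + c), which is why Bray-Curtis is described as the quantitative Sørensen index.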
Ecologists often use different names to describe the same index. For example, Pielou’s percentage
similarity is the same as Bray-Curtis, Sorensen’s quantitative index and the Czekanowski’s quantitative
index (but there is disagreement about this latter case).
Lab Problems:
We will use your Sierran forest data set to compare the performance of these similarity indices.
Assignment: Take the Sierran data and calculate all three similarity indices between each elevation and the
next elevation down the gradient.
1. Create both an ‘average IV matrix’ (the average IV of each species at each elevation) and a presence-absence
matrix.
2. Calculate a Jaccard index for 7000 & 6500, 6500 & 6000, 6000 & 5500, etc. Do the same for Sørensen’s. To
do this, you’ll need to calculate the number of species shared by the two elevations and the number of species
restricted to each elevation.
3. Create a graph that contains each index. Plot the index value on the Y-axis. For the X-axis, use the ‘joint’
elevation value, i.e., the elevation pair (7000&6500, 6500&6000, etc.).
4. Examine the graphs. In a paragraph or two, discuss whether these similarity indices suggest that there are
one, two, three or more plant communities along the elevation gradient. In other words, is there a
significant change in the value of the indices such that one elevation is much more different (less similar) from
the next elevation than is typical for the other elevation pairs?
5. You can work in groups to do the calculations and discuss them, but write the paragraph by yourself.
6. Turn in the graphs you make along with the discussion.
Here are the details of how to do this:
1. Create a presence/absence file
1) Open up the Sierra data file in Excel
2) Highlight, copy and paste the names of the species next to the data on the same row (leave at least one
open column)
3) Starting with this new row of names, in the cell beneath the first species name, Abmag (Abies magnifica),
enter the following formula:
=IF([click on cell for Abmag data]>0,1,0)
Then hit ‘return’. For example, if the Abmag value were in cell B6, the formula would be:
=IF(B6>0,1,0)
What this means is: if the cell for Abies magnifica contains a value greater than 0, place a 1 in the ‘IF’
cell; if not, place a 0.
4) Highlight the ‘If’ cell (click on it), copy it, and paste it across under the names of the species.
5) Now highlight the whole row of ‘IF’ formulas, and paste them down for all the rows of data.
6) You should now have a matrix of 1s and 0s
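The same presence/absence conversion can be sketched outside Excel. This Python version is only an illustration: the plot rows, the `SPECIES` list, and the `to_presence` helper are hypothetical, and each row stands in for one row of the Sierra data sheet.

```python
# Hypothetical plot rows: (elevation, IV of each species in SPECIES order).
# The numbers are invented for illustration; they are not the Sierran data.
SPECIES = ["Abmag", "Abcon", "Pijef", "Picon"]
plots = [
    (7000, [55.0, 20.0, 0.0, 0.0]),
    (7000, [70.0, 0.0, 0.0, 0.0]),
    (6500, [0.0, 35.0, 12.5, 0.0]),
]


def to_presence(values: list[float]) -> list[int]:
    """Same rule as =IF(B6>0,1,0): 1 if the species has a value > 0, else 0."""
    return [1 if v > 0 else 0 for v in values]


presence_rows = [(elev, to_presence(ivs)) for elev, ivs in plots]
for elev, row in presence_rows:
    print(elev, dict(zip(SPECIES, row)))
```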
2. Now make a single presence/absence species list for each elevation
1) Copy and paste the names over again on the same row, leaving 2 columns open
2) In the column before the names you just pasted, list each elevation, from 7000 down, like this:
        Abmag   Abcon   Pijef   Picon
7000
6500
6000
5500
5000
4000
3000
2000
3) Now, to get a single value for each elevation, we’re going to use the IF function again. Sum the Abies
magnifica column over the rows for one elevation; if that sum is greater than zero, then the species is present
at that elevation. If there were five rows for one elevation, the IF formula would look like this:
=IF(SUM(P6:P10)>0,1,0)
You would interpret this equation as: If the sum of those 5 cells for Abies magnifica is greater than zero, place
a one in the cell, otherwise, place a zero.
4) Now repeat this process for Abies magnifica at all elevations, being careful to use the correct cell ranges at
each step. Check for accuracy.
5) Copy this column of “If” equations and paste across beneath all the names.
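Collapsing plot rows into one presence/absence row per elevation is a group-by followed by the same “sum greater than zero” test. A hypothetical Python sketch (the 0/1 rows are invented, and the helper name `presence_by_elevation` is illustrative):

```python
# Hypothetical 0/1 plot rows (elevation, presence of each species); invented
# values in the species order Abmag, Abcon, Pijef, Picon.
presence_rows = [
    (7000, [1, 1, 0, 0]),
    (7000, [1, 0, 0, 0]),
    (6500, [0, 1, 1, 0]),
    (6500, [1, 0, 1, 0]),
]


def presence_by_elevation(rows: list[tuple[int, list[int]]]) -> dict[int, list[int]]:
    """Same rule as =IF(SUM(P6:P10)>0,1,0): a species is present at an
    elevation if it occurs in at least one row at that elevation."""
    column_sums: dict[int, list[int]] = {}
    for elev, row in rows:
        running = column_sums.get(elev, [0] * len(row))
        column_sums[elev] = [s + v for s, v in zip(running, row)]
    return {elev: [1 if s > 0 else 0 for s in sums] for elev, sums in column_sums.items()}


summary = presence_by_elevation(presence_rows)
for elev, row in summary.items():
    print(elev, row)
```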
3. Now you can produce a similarity analysis for the Jaccard and Sørensen approaches.
For the Jaccard and Sørensen indices, you need three values: a = the number of shared species; b = the number
of species found only in the first site; c = the number of species found only in the second site. You can use IF
formulas again to figure these out.
1) Copy and paste the names over again next to the summary matrix.
2) Let’s calculate the a, b, and c values for comparing elevations 7000 and 6500.
If, for example, Abies magnifica for 7000 ft was in cell AD6, and for 6500 ft in AD7, the formula for a would be:
=IF((AD6+AD7)=2,1,0)
This means: if both the 7000 and 6500 ft cells for Abies magnifica contain a 1, then both sites share that
species, so place a 1 in the new IF cell; otherwise place a 0.
Copy and paste this across beneath each species name.
The b and c values count species found in only one of the two sites (b: only the first site, 7000 ft; c: only the
second site, 6500 ft). We can use IF formulas again:
For the b value:
=IF(AD6>AD7,1,0)
For the c value:
=IF(AD7>AD6,1,0)
Create a SUM column next to the a, b, and c rows you just calculated and write a summary formula, for example
(adjusting the cell names to be accurate for your data):
=SUM(AU6:BG6)
3) Next to the a, b, c rows you just created, label a set of rows 7000&6500, 6500&6000, etc. Label the
adjacent column Jaccard and the next one Sorensen. Now write the Jaccard and Sørensen formulas
appropriate for each of the cells.
The Jaccard equation is a/(a+b+c), so, using the SUM cells for a, b, and c comparing 7000 and 6500:
=BH6/(BH6+BH7+BH8)
This calculates the Jaccard index, if BH6 is ‘a’, BH7 is ‘b’, and BH8 is ‘c’.
The Sørensen index doubles the a value in the previous formula, so your formula would be:
=(2*BH6)/((2*BH6)+BH7+BH8)
Repeat this process for all elevation pairs down the gradient.
You should have a column of labels, followed by a column of Jaccard values, then Sorensen values.
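The a, b, c counts and both indices for adjacent elevation pairs can be sketched in a few lines of Python. The presence matrix below is invented, and `abc` is a hypothetical helper whose three conditions mirror the IF formulas above; as in this procedure, b is the first (higher) elevation only and c the second (lower) elevation only.

```python
# Hypothetical per-elevation presence rows (species order: Abmag, Abcon,
# Pijef, Picon). Invented values, listed from the top of the gradient down.
presence = {
    7000: [1, 1, 0, 0],
    6500: [1, 1, 1, 0],
    6000: [0, 1, 1, 1],
}


def abc(first: list[int], second: list[int]) -> tuple[int, int, int]:
    """Mirror the IF formulas: a = in both, b = first only, c = second only."""
    pairs = list(zip(first, second))
    a = sum(1 for x, y in pairs if x + y == 2)  # =IF((AD6+AD7)=2,1,0)
    b = sum(1 for x, y in pairs if x > y)       # =IF(AD6>AD7,1,0)
    c = sum(1 for x, y in pairs if y > x)       # =IF(AD7>AD6,1,0)
    return a, b, c


elevations = list(presence)  # dicts keep insertion order: 7000, 6500, 6000
results = {}
for upper, lower in zip(elevations, elevations[1:]):  # adjacent pairs only
    a, b, c = abc(presence[upper], presence[lower])
    jaccard = a / (a + b + c)  # assumes each pair has at least one species
    sorensen = 2 * a / (2 * a + b + c)
    label = f"{upper}&{lower}"
    results[label] = (jaccard, sorensen)
    print(f"{label}: a={a} b={b} c={c} Jaccard={jaccard:.3f} Sorensen={sorensen:.3f}")
```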
4. Now produce a ‘quantitative’ similarity analysis using the Bray-Curtis approach.
1) The first step is to create an average IV matrix (each species at each elevation). Copy and paste the
species names again, and in the column before the names, list the elevations 7000, 6500, etc., as you did before.
2) Now create an average IV value for each species at each elevation, using the AVERAGE function. So, for
Abies magnifica at 7000 ft:
=AVERAGE(B6:B10) <- assuming cells B6, B7…B10 are the 7000 ft cells for Abies magnifica.
Repeat this process for each elevation, carefully making sure the cells are correct. Then you can copy and paste
the Abies column across for all species. As you did before, create a SUM column that totals the average IVs at
each elevation; these totals form the denominator of the Bray-Curtis index.
3) Now copy and paste the species names again, next to your average IV matrix. In the column in front of the
species names, list the pairs 7000&6500, 6500&6000, 6000&5500, etc. Now we’re going to select the smaller of
the two IV values for each pair of elevations.
4) To conduct a Bray-Curtis analysis for 7000’ versus 6500’, first compare the IVs for each species and
select the smaller of the two, and place that smaller value in your new matrix. For example, if you were
comparing Abies magnifica at 7000 versus 6500, and the two cells were B41 and B42:
=IF(B41>B42,B42,B41)
This means: if the value in B41 is the larger value, write the value of B42 into the cell; otherwise, write B41
into the cell. In other words, write the smaller value into the cell (=MIN(B41,B42) gives the same result). Repeat
this for every species for the 7000 vs 6500 ft pair, and then for each of the other elevation pairs. Create a SUM
column for each elevation comparison and sum the values.
5) Now that you’ve selected the smaller value of each pair, we can proceed with the Bray-Curtis index. In this
index, the sum of the smaller values is doubled and then divided by the sum of the total IVs of both elevations.
See the Bray-Curtis equation at the beginning of this handout.
Create a new column labeled Bray-Curtis (the quantitative Sørensen index), with the 7000&6500, 6500&6000,
etc. labels in the column before it, as before. The Bray-Curtis index is calculated as:
=(2*[cell with the sum of the minimum IV values])/([sum of the IVs for the first elevation]+[sum of the IVs for the second elevation])
Carefully place your cell locations in the right spot and you will have the correct index. Continue for all
elevation comparisons.
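The whole Bray-Curtis procedure (average IV per elevation, per-species minimum, then the doubled sum of minima divided by the two elevation totals) can be sketched in Python as follows. The plot IVs are invented, and `average_iv` and `bray_curtis` are hypothetical helper names, not part of the handout.

```python
# Hypothetical plot-level IVs per species (order: Abmag, Abcon, Pijef, Picon);
# invented numbers, listed from the top of the gradient down.
plots = [
    (7000, [60.0, 40.0, 0.0, 0.0]),
    (7000, [80.0, 20.0, 0.0, 0.0]),
    (6500, [30.0, 50.0, 20.0, 0.0]),
    (6500, [10.0, 50.0, 40.0, 0.0]),
    (6000, [0.0, 40.0, 30.0, 30.0]),
    (6000, [0.0, 20.0, 50.0, 30.0]),
]


def average_iv(rows: list[tuple[int, list[float]]]) -> dict[int, list[float]]:
    """Step 2: like =AVERAGE(...) of each species' IV over one elevation's rows."""
    grouped: dict[int, list[list[float]]] = {}
    for elev, ivs in rows:
        grouped.setdefault(elev, []).append(ivs)
    return {elev: [sum(col) / len(col) for col in zip(*group)]
            for elev, group in grouped.items()}


def bray_curtis(first: list[float], second: list[float]) -> float:
    """Steps 4-5: 2 * sum of per-species minima / (total IV first + total IV second)."""
    shared = sum(min(x, y) for x, y in zip(first, second))  # like =IF(B41>B42,B42,B41)
    return 2 * shared / (sum(first) + sum(second))


avg = average_iv(plots)
elevations = list(avg)  # 7000, 6500, 6000 (insertion order)
bc = {f"{u}&{l}": bray_curtis(avg[u], avg[l]) for u, l in zip(elevations, elevations[1:])}
for pair, value in bc.items():
    print(pair, round(value, 3))
```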
5. Create a graph of the three similarity indices.
To create an Excel graph of these results, create a matrix with your Jaccard, Sorensen and Bray-Curtis values
like this (these values are made up for illustration only; with real data, the Sørensen value for a pair can never
be smaller than its Jaccard value):
             Jaccard   Sorensen   BrayCurtis
7000&6500    0.5       0.66       0.75
6500&6000    0.75      1          0.85
6000&5500    0.9       0.8        0.8
5500&5000    0.8       0.7        0.4
5000&4000    0.45      0.35       0.8
4000&3000    0.8       0.9        0.7
3000&2000    0.3       0.04       0.1
Then, highlight all the cells. Click on ‘charts’, then click on line graphs, and click on the first type. You’ll get
a line graph with one line for each index, plotted across the elevation pairs.
Congratulations. Now discuss with others how to interpret these values. Questions to ask include: “Similar
communities should have high similarity values, so when there is a ‘dip’, does that mean there is a shift in
communities?” and “How well does each index perform, based on your experience of the data collection?”
Assignment:
Use the graph you create. Click on the graph, then click on its outer border, then copy it (Control-C).
Paste it into a Word document. Below that, write a paragraph or two (on your own) discussing whether there
are one, two or three communities based on the similarity indices. Is there a significant ‘dip’ in similarity
that would lead you to make those decisions? Does one similarity index work better for making decisions,
or does each work well depending on what you’re interested in (for example, within-community or
between-community comparisons)? Write at least 200-300 words, but no more than 400 words.