Exercise 3 - Null models in R

Introduction to Null Models: Exercise 3
Created by: J. Sebastián Tello
Center for Conservation and Sustainable Development
Missouri Botanical Garden
Objective: The purpose of this session is to become familiar with some of the basic capabilities
of R to conduct null model analyses. In this exercise, we will dive a bit deeper into the guts of a
null model algorithm.
One of the greatest benefits of learning R is its flexibility: you can combine and create tests in
essentially any way you like. At the same time, one of the reasons null models have become so
popular is that they can be easily built and modified in R. Once you know R reasonably well,
creating your own code to implement null models is not that difficult.
1. Mid-domain effect – a null model for biodiversity patterns.
The mid-domain effect in biogeography was first described in 1994 and proposed as a null model to
study biodiversity patterns created by the distribution of species at large geographic scales. In
essence, the mid-domain effect is the result of three principles underlying the distribution of species:
1. Species are distributed within constrained domains. For example, the continental edges
represent hard boundaries for the distribution of most terrestrial species.
2. At large spatial scales, species distributions are strongly aggregated. That means
populations of a species are close to one another, not scattered across the domain of
distribution.
3. Species within the domain are distributed randomly with respect to one another and with
respect to underlying environmental gradients.
As such, the mid-domain effect has been used as a null model primarily to test ideas about the
role of environmental variables in the formation of biodiversity gradients. Here, we will build a
simple null model algorithm to produce species distributions expected by the mid-domain effect.
This is a quick overview of the algorithm; it omits many of the details and steps you would need
to consider when translating your own ideas into code.
This null model will use a one-dimensional continuous domain. To start, we will create an
artificial domain by defining its beginning and ending points:
domain <- c(0,1)
In this case, the domain has coordinates that vary between 0 and 1, but any other values would
work just as well.
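As a small illustration that nothing in the later steps depends on the 0 to 1 choice, the sketch below uses a hypothetical latitudinal domain (the values -20 and 60, and the names alt.domain, alt.width and alt.center, are made up for illustration). It derives the two quantities the rest of the exercise keeps reusing: the width of the domain and its center.

```r
# Hypothetical alternative domain, e.g. degrees of latitude (illustrative values only)
alt.domain <- c(-20, 60)

# Width and center of the domain, written the same way as in the rest of the exercise
alt.width  <- alt.domain[2] - alt.domain[1]
alt.center <- alt.domain[1] + alt.width / 2

alt.width
alt.center
```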
The null model algorithm will place species distributions at random with the constraint that the
entire species range will have to be contained within the domain (i.e., part of the range cannot be
outside of the limits of the domain). We will represent the domain graphically in this figure:
plot(domain, c(0, domain[2]-domain[1]), ylab="Range Size",
xlab="Range Midpoint", type="n")
The domain of distribution is the x-axis. The y-axis is the range size of species, which will be
useful when plotting the results of the algorithm.
Empirical species vary in how geographically widespread they are, so we will create an artificial
distribution of range sizes:
pool.distris <- runif(200, 0, domain[2]-domain[1])
hist(pool.distris)
Here, we have created a “pool” of 200 range sizes that follow a uniform distribution ranging
from 0 to 1. This distribution of range sizes is very unrealistic, but it will help us develop the null
model algorithm.
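Empirical range-size distributions tend to be right-skewed, with many small-ranged species and few widespread ones. As a hedged sketch of how the pool could be swapped for something less uniform, the code below draws range sizes from a beta distribution and rescales them to the width of the domain. The shape parameters (1 and 4) and the name skewed.pool are assumptions chosen only for illustration; they are not estimated from any dataset.

```r
set.seed(1)                     # only to make the sketch reproducible
domain <- c(0, 1)
domain.width <- domain[2] - domain[1]

# rbeta() returns values between 0 and 1; multiplying by the width keeps every
# range size no larger than the domain itself
skewed.pool <- rbeta(200, shape1 = 1, shape2 = 4) * domain.width

# most range sizes are now small, with a tail of widespread species
summary(skewed.pool)
```

Calling hist(skewed.pool) afterwards shows the skew in the same way as the histogram of the uniform pool.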
For a given range size, the location of its mid-point is constrained if we want to have the entire
range within the domain. This constraint can be added to the figure we created above:
plot(domain, c(0, domain[2]-domain[1]), ylab="Range Size",
xlab="Range Midpoint", type="n")
polygon(x=c(domain[1], domain[1]+(domain[2]-domain[1])/2, domain[2]),
y=c(0, domain[2]-domain[1], 0), col="grey80", border="grey80")
The grey triangle is the area where the range mid-points could be located. For small-ranged
species, the mid-point can be almost anywhere, while for large-ranged species the mid-point
needs to be closer to the center of the domain.
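The triangle can also be read numerically. The sketch below defines a small helper, midpoint.limits (a hypothetical name, not part of the exercise files), that returns the interval in which the mid-point of a range of a given size may fall so that the whole range stays inside the domain.

```r
# Hypothetical helper: feasible interval for the mid-point of a range of a given size
midpoint.limits <- function(range.size, domain) {
  stopifnot(range.size >= 0, range.size <= domain[2] - domain[1])
  c(lower = domain[1] + range.size / 2,
    upper = domain[2] - range.size / 2)
}

domain <- c(0, 1)
midpoint.limits(0.1, domain)  # small range: mid-point can fall almost anywhere
midpoint.limits(0.9, domain)  # large range: mid-point is confined near the center
midpoint.limits(1.0, domain)  # range as wide as the domain: only the center is possible
```

The width of the feasible interval shrinks linearly with range size, which is what the sloping sides of the grey triangle show.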
In this algorithm, the first step to randomly place a species in the domain is to select a position
(i.e., a mid-point) given its range size. For example, to find a random position for the first species
we need to do:
first.range.size <- pool.distris[1]
first.range.size
first.midpoint <- runif(1, domain[1]+(first.range.size/2),
domain[2]-(first.range.size/2))
first.midpoint
The third line of code selects a point at random from within the range of possible mid-points. Of course, we need to do this for each species:
midpoints <- lapply(1:length(pool.distris), function(x, pool.distris, domain)
{runif(1, domain[1]+(pool.distris[x]/2), domain[2]-(pool.distris[x]/2))},
pool.distris=pool.distris, domain=domain)
midpoints<-unlist(midpoints)
hist(midpoints)
As you can probably see, we have used the function lapply to repeat a procedure (in this case,
selecting a random mid-point) for every species. This could also have been done using a for
loop, but this way the code is briefer and often a bit faster. There is a whole family of apply
functions, each with its own use. If you are building null models or simulations, it is very useful
to learn how to use them.
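For this particular step there is an even shorter route, because runif accepts vectors for its min and max arguments. The sketch below is an alternative to the lapply call, not a replacement prescribed by the exercise; the name midpoints.vec is introduced here only for illustration.

```r
set.seed(42)                    # only to make the sketch reproducible
domain <- c(0, 1)
pool.distris <- runif(200, 0, domain[2] - domain[1])

# runif() uses min and max element-wise, so a single call draws one mid-point
# per species, each from that species' own feasible interval
midpoints.vec <- runif(length(pool.distris),
                       min = domain[1] + pool.distris / 2,
                       max = domain[2] - pool.distris / 2)

# every mid-point should respect its own limits
all(midpoints.vec >= domain[1] + pool.distris / 2 &
    midpoints.vec <= domain[2] - pool.distris / 2)
```

The apply family remains the more general tool when the per-species procedure cannot be expressed as a single vectorized call.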
Knowing the mid-points and the range sizes, now we can define the distribution of each species
in the domain using its lower and upper limits.
current.distris<-t( apply(cbind(midpoints, pool.distris), 1,
function(x)c(x[1]-(x[2]/2), x[1]+(x[2]/2))) )
colnames(current.distris)<-c("lower_distri_limit", "upper_distri_limit")
head(current.distris)
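Before moving on, it can be worth confirming that the constraint of the null model actually holds. The sketch below rebuilds a small set of limits with plain vector arithmetic and checks that every range lies inside the domain and keeps its original size. The object names (range.sizes, mids, limits) and the tolerance value are assumptions made for this illustration.

```r
set.seed(7)                     # only to make the sketch reproducible
domain <- c(0, 1)
range.sizes <- runif(50, 0, domain[2] - domain[1])
mids <- runif(length(range.sizes),
              min = domain[1] + range.sizes / 2,
              max = domain[2] - range.sizes / 2)

# lower and upper limits by plain vector arithmetic
limits <- cbind(lower_distri_limit = mids - range.sizes / 2,
                upper_distri_limit = mids + range.sizes / 2)

# sanity checks: ranges stay inside the domain and keep their original size
tol <- 1e-12                    # small tolerance for floating-point rounding
stopifnot(all(limits[, 1] >= domain[1] - tol),
          all(limits[, 2] <= domain[2] + tol),
          isTRUE(all.equal(unname(limits[, 2] - limits[, 1]), range.sizes)))
head(limits)
```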
Now, we can add the species distributions to the figure:
plot(domain, c(0, domain[2]-domain[1]), ylab="Range Size",
xlab="Range Midpoint", type="n")
polygon(x=c(domain[1], domain[1]+(domain[2]-domain[1])/2, domain[2]), y=c(0,
domain[2]-domain[1], 0), col="grey80", border="grey80")
apply(current.distris, 1, function(x) lines(x=c(x[1], x[2]), y=rep(x[2]-x[1],
2), col="darkorange", lwd=2))
points(pool.distris~midpoints, cex=1, col="darkorange3", pch=16)
Finally, we will divide the domain into discrete cells and calculate the species richness
expected by the null model in each cell.
domain.cells <- 20 # The number of cells to be used
# this will create a matrix with the upper and lower limits that define the
# cells
cells.limits <- seq(domain[1], domain[2], length.out=domain.cells+1)
cells <- matrix(0, domain.cells, 3)
colnames(cells) <- c("lower_cell_limit", "upper_cell_limit", "richness")
cells[,1] <- cells.limits[1:domain.cells]
cells[,2] <- cells.limits[2:(domain.cells+1)]
# this will create a composition matrix where rows are cells and columns are
# species. A 1 means the species is present in the cell, a 0 means it is
# absent.
final.composition <- matrix(0, domain.cells, length(pool.distris))
colnames(final.composition) <- rownames(current.distris)
rownames(final.composition) <- 1:domain.cells
for(j in 1:nrow(current.distris))
{
WhichCells <- intersect(which(cells[,1]<current.distris[j,2]),
which(cells[,2]>current.distris[j,1]))
final.composition[WhichCells,j] <- 1
}
# this calculates the number of species in each cell
cells[,3] <- rowSums(final.composition)
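The overlap rule inside the loop (a species occupies a cell when the cell starts before the range ends and ends after the range starts) can be checked by hand on a tiny case. The sketch below uses four cells and two made-up species ranges, and builds the same kind of composition matrix with outer instead of a loop. The toy values and the object names are assumptions for illustration only.

```r
domain <- c(0, 1)
domain.cells <- 4
cells.limits <- seq(domain[1], domain[2], length.out = domain.cells + 1)
cell.lower <- cells.limits[-length(cells.limits)]
cell.upper <- cells.limits[-1]

# two made-up species ranges (lower limit, upper limit), for illustration only
toy.distris <- rbind(sp1 = c(0.10, 0.35),
                     sp2 = c(0.40, 0.90))

# outer() builds cells-by-species logical matrices; a species occupies a cell
# when the cell starts before the range ends AND ends after the range starts
overlaps <- outer(cell.lower, toy.distris[, 2], "<") &
            outer(cell.upper, toy.distris[, 1], ">")
toy.composition <- overlaps * 1          # convert TRUE/FALSE to 1/0
dimnames(toy.composition) <- list(as.character(seq_len(domain.cells)),
                                  rownames(toy.distris))

toy.composition
rowSums(toy.composition)                 # richness per cell: 1 2 1 1
```

Species sp1 falls in cells 1 and 2 and sp2 in cells 2 to 4, so only the second cell holds both species.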
Finally, we can create a plot of the species richness along the domain expected by the mid-domain effect:
cell.midpoints<-cells[,1]+((cells[,2]-cells[,1])/2)
plot(cells[,"richness"]~cell.midpoints, ylim=c(0, max(cells[,3])),
ylab="Species Richness", xlab="Domain", pch=21, bg="grey60", col="grey30",
cex=1.5)
The species richness per cell could then serve as a test statistic to contrast empirical data with
null model expectations. We will not use any empirical data in this example, but many studies have
done precisely this.
Ideally, the code above would be developed further so that the randomization can be repeated
any required number of times. It is often also helpful to wrap that code in a function, so that it
can be reused easily afterwards. This is precisely what has been done in the files mdeCH.R and
mdeCH.richness.R, which you should have downloaded onto your computer. You can open the files
with a text editor to explore their code. For now, let’s load the functions into R by browsing to
each file:
source(file.choose()) # search for the file mdeCH.R
source(file.choose())# search for the file mdeCH.richness.R
The function mdeCH implements the algorithm we described above, as well as the other
algorithms for random distributions in a 1D domain described in Colwell and Hurtt (1994). Let’s
look at some examples:
# Model 1 – this is not a Mid-domain effect model
MDE.res1<-mdeCH(required.div=300, domain=c(0,1), CH.model=1,
pool.type="uniform", domain.cells=20, fig=TRUE)
# Model 4 – this is the model we developed in some detail above
MDE.res4<-mdeCH(required.div=300, domain=c(0,1), CH.model=4,
pool.type="uniform", domain.cells=20, fig=TRUE)
The function mdeCH.richness is a wrapper that uses the function mdeCH and runs it repeatedly to
produce replicated random distributions expected by a particular mid-domain effect algorithm.
POOL<-runif(300,0,1)
names(POOL)<-paste("a", 1:300, sep="")
results.model.1<-mdeCH.richness(rand.n=100, compo.list=TRUE,
distri.list=TRUE, required.div=300, domain=c(0,1), CH.model=1,
pool.type="user", pool.distris=POOL, domain.cells=20)
results.model.4<-mdeCH.richness(rand.n=100, compo.list=TRUE,
distri.list=TRUE, required.div=300, domain=c(0,1), CH.model=4,
pool.type="user", pool.distris=POOL, domain.cells=20)
The results include various summaries of the data produced by the null model:
names(results.model.1)
These summaries include the species richness per cell, which we can plot:
cell.midpoints <- rowMeans(results.model.1$cells)
par(mfrow=c(1,2))
plot(results.model.1$richness[,1]~ cell.midpoints, type="n",
ylim=range(c(0,results.model.1$richness)), ylab="Richness", xlab="Domain",
main="CH model 1")
for(i in 1:ncol(results.model.1$richness))
points(results.model.1$richness[,i]~cell.midpoints, type="b",
col="darkgoldenrod3")
plot(results.model.4$richness[,1]~ cell.midpoints, type="n",
ylim=range(c(0,results.model.4$richness)), ylab="Richness", xlab="Domain",
main="CH model 4")
for(i in 1:ncol(results.model.4$richness))
points(results.model.4$richness[,i]~cell.midpoints, type="b",
col="darkgoldenrod3")
As you can see, the algorithm consistently produces a peak of species richness near the center of
the domain – hence the mid-domain effect.
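To connect the replicated randomizations back to the idea of richness per cell as a test statistic, the sketch below summarizes a set of null replicates as a mean and a 95% envelope per cell, and flags cells where an "observed" richness falls outside that envelope. It does not use mdeCH.R or mdeCH.richness.R: the helper mde.richness.once is a hypothetical, compact re-implementation of the steps walked through in this exercise. The "observed" vector is only a stand-in produced by one more simulated draw, since no empirical data are used here.

```r
# Hypothetical helper (not the code in mdeCH.R): one randomization of the
# algorithm described in this exercise, returning richness per cell
mde.richness.once <- function(pool.distris, domain = c(0, 1), domain.cells = 20) {
  midpoints <- runif(length(pool.distris),
                     min = domain[1] + pool.distris / 2,
                     max = domain[2] - pool.distris / 2)
  lower <- midpoints - pool.distris / 2
  upper <- midpoints + pool.distris / 2
  cells.limits <- seq(domain[1], domain[2], length.out = domain.cells + 1)
  cell.lower <- cells.limits[-length(cells.limits)]
  cell.upper <- cells.limits[-1]
  # cells-by-species overlap, then number of species per cell
  rowSums(outer(cell.lower, upper, "<") & outer(cell.upper, lower, ">"))
}

set.seed(123)                   # only to make the sketch reproducible
pool <- runif(300, 0, 1)

# cells in rows, replicates in columns
null.richness <- replicate(100, mde.richness.once(pool))

# null expectation and a 95% envelope for each cell
null.mean <- rowMeans(null.richness)
null.env  <- apply(null.richness, 1, quantile, probs = c(0.025, 0.975))

# stand-in for empirical data: one more simulated draw, so it should fall
# inside the envelope in most cells; real richness counts would go here
observed <- mde.richness.once(pool)
outside  <- observed < null.env[1, ] | observed > null.env[2, ]
data.frame(cell = seq_along(observed), observed, null.mean, outside)
```

With real data, cells flagged as outside the envelope would be candidates for richness that the geometric constraints alone do not explain. How the envelope is built and how multiple cells are tested jointly are design choices this sketch leaves open.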