Introduction to Null Models: Exercise 3

Created by: J. Sebastián Tello
Center for Conservation and Sustainable Development
Missouri Botanical Garden

Objective: The purpose of this session is to become familiar with some of the basic capabilities of R for conducting null model analyses.

In this exercise, we will dive a bit deeper into the guts of a null model algorithm. One of the greatest benefits of learning R is its flexibility: you can combine and create tests in essentially any way you like. Likewise, one of the reasons null models have become so popular is that they can be easily built and modified in R. Once you know R reasonably well, writing your own code to implement null models is not that difficult.

1. Mid-domain effect: a null model for biodiversity patterns.

The mid-domain effect in biogeography was discovered in 1994 and proposed as a null model to study biodiversity patterns created by the distribution of species at large geographic scales. In essence, the mid-domain effect is the result of 3 principles underlying the distribution of species:

1. Species are distributed within constrained domains. For example, continental edges represent hard boundaries for the distribution of most terrestrial species.
2. At large spatial scales, species distributions are strongly aggregated. That is, the populations of a species are close to one another, not scattered across the domain of distribution.
3. Species within the domain are distributed randomly with respect to one another and with respect to underlying environmental gradients.

As such, the mid-domain effect has been used as a null model primarily to test ideas about the role of environmental variables in the formation of biodiversity gradients. Here, we will build a simple null model algorithm to produce the species distributions expected under the mid-domain effect.
This is a quick review of the algorithm; it omits many of the details and steps you would need to consider when translating your own ideas into code. This null model will use a one-dimensional continuous domain. To start, we will create an artificial domain by defining its beginning and ending points:

    domain <- c(0, 1)

In this case, the domain has coordinates that vary between 0 and 1, but any other values would work just as well.

The null model algorithm will place species distributions at random, with the constraint that the entire species range must be contained within the domain (i.e., no part of the range can fall outside the limits of the domain). We will represent the domain graphically in this figure:

    plot(domain, c(0, domain[2]-domain[1]), ylab="Range Size", xlab="Range Midpoint", type="n")

The domain of distribution is the x-axis. The y-axis is the range size of species, which will be useful when plotting the results of the algorithm.

Empirical species vary in how geographically widespread they are, so we will create an artificial distribution of range sizes:

    pool.distris <- runif(200, 0, domain[2]-domain[1])
    hist(pool.distris)

Here, we have created a “pool” of 200 range sizes that follow a uniform distribution between 0 and the width of the domain (here, 1). This distribution of range sizes is very unrealistic, but it will help us develop the null model algorithm.

For a given range size, the location of the range mid-point is constrained if the entire range is to fit within the domain. This constraint can be added to the figure we created above:

    plot(domain, c(0, domain[2]-domain[1]), ylab="Range Size", xlab="Range Midpoint", type="n")
    # triangle with its apex at the centre of the domain
    polygon(x=c(domain[1], domain[1]+(domain[2]-domain[1])/2, domain[2]),
            y=c(0, domain[2]-domain[1], 0),
            col="grey80", border="grey80")

The grey triangle is the area where the range mid-points can be located.
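The grey triangle follows from a simple inequality: a range of size r fits inside the domain only if its mid-point lies between domain[1] + r/2 and domain[2] - r/2. As a minimal sketch of that constraint (the helper name midpoint_bounds and the three example range sizes are hypothetical and not part of the exercise files), the feasible interval could be computed like this:

```r
# Hypothetical helper (not part of the original exercise): feasible
# mid-point interval for ranges of a given size inside the domain.
midpoint_bounds <- function(range.size, domain = c(0, 1)) {
  cbind(lower = domain[1] + range.size / 2,
        upper = domain[2] - range.size / 2)
}

# Three illustrative range sizes: small, medium, and large
bounds <- midpoint_bounds(c(0.1, 0.5, 0.9))
bounds

# Width of the feasible interval; it shrinks as the range size grows
widths <- bounds[, "upper"] - bounds[, "lower"]
widths
```

In a domain of width 1, the width of the feasible interval is simply 1 minus the range size, which is why the triangle narrows to a single point at the top.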
For small-ranged species, the mid-point can be almost anywhere, while for large-ranged species the mid-point needs to be closer to the center of the domain.

In this algorithm, the first step in randomly placing a species in the domain is to select a position (i.e., a mid-point) given its range size. For example, to find a random position for the first species we do:

    first.range.size <- pool.distris[1]
    first.range.size
    first.midpoint <- runif(1, domain[1]+(first.range.size/2), domain[2]-(first.range.size/2))
    first.midpoint

The third line of code selects a point at random from within the range of possible mid-points. Of course, we need to do this for each species:

    midpoints <- lapply(seq_along(pool.distris),
                        function(x, pool.distris, domain) {
                          runif(1, domain[1]+(pool.distris[x]/2), domain[2]-(pool.distris[x]/2))
                        },
                        pool.distris=pool.distris, domain=domain)
    midpoints <- unlist(midpoints)
    hist(midpoints)

As you can probably see, we have used the function lapply to repeat a procedure (in this case, selecting a random mid-point) for every species. This could also have been done with a for loop, but this way the code is briefer and can be a bit faster. There is a whole family of apply functions, each with its own use. If you are building null models or simulations, it is very useful to learn how to use them.

Knowing the mid-points and the range sizes, we can now define the distribution of each species in the domain using its lower and upper limits.
    # lower and upper limits are the mid-point minus/plus half the range size
    current.distris <- t(apply(cbind(midpoints, pool.distris), 1,
                               function(x) c(x[1]-(x[2]/2), x[1]+(x[2]/2))))
    colnames(current.distris) <- c("lower_distri_limit", "upper_distri_limit")
    head(current.distris)

Now, we can add the species distributions to the figure:

    plot(domain, c(0, domain[2]-domain[1]), ylab="Range Size", xlab="Range Midpoint", type="n")
    polygon(x=c(domain[1], domain[1]+(domain[2]-domain[1])/2, domain[2]),
            y=c(0, domain[2]-domain[1], 0),
            col="grey80", border="grey80")
    # draw each range as a horizontal line at the height of its range size;
    # invisible() only suppresses the list of NULLs that apply would print
    invisible(apply(current.distris, 1,
                    function(x) lines(x=c(x[1], x[2]), y=rep(x[2]-x[1], 2), col="darkorange", lwd=2)))
    points(pool.distris~midpoints, cex=1, col="darkorange3", pch=16)

Next, we will divide the domain into discrete cells and calculate the species richness expected in each cell under the null model.

    domain.cells <- 20 # the number of cells to be used

    # this will create a matrix with the upper and lower limits that define the cells
    cells.limits <- seq(domain[1], domain[2], length.out=domain.cells+1)
    cells <- matrix(0, domain.cells, 3)
    colnames(cells) <- c("lower_cell_limit", "upper_cell_limit", "richness")
    cells[,1] <- cells.limits[1:domain.cells]
    cells[,2] <- cells.limits[2:(domain.cells+1)]

    # this will create a composition matrix where rows are cells and columns are species.
    # A 1 means the species is present in the cell, a 0 means it is absent.
    final.composition <- matrix(0, domain.cells, length(pool.distris))
    colnames(final.composition) <- rownames(current.distris)
    rownames(final.composition) <- 1:domain.cells

    for(j in seq_len(nrow(current.distris)))
    {
      # a species occupies every cell that its range overlaps
      WhichCells <- which(cells[,1] < current.distris[j,2] & cells[,2] > current.distris[j,1])
      final.composition[WhichCells, j] <- 1
    }

    # this calculates the number of species in each cell
    cells[,3] <- rowSums(final.composition)

Finally, we can plot the species richness along the domain expected under the mid-domain effect:

    cell.midpoints <- cells[,1]+((cells[,2]-cells[,1])/2)
    plot(cells[,"richness"]~cell.midpoints, ylim=c(0, max(cells[,3])),
         ylab="Species Richness", xlab="Domain",
         pch=21, bg="grey60", col="grey30", cex=1.5)

The species richness per cell could then serve as a test statistic to contrast empirical data with null model expectations. We will not use any empirical data in this example, but many studies have done precisely this.

Ideally, the code above would be developed further so that the randomization is repeated any required number of times. It is often also helpful to turn the code into a function, so that it can be reused easily. This is precisely what has been done in the files mdeCH.R and mdeCH.richness.R, which you should have downloaded onto your computer. You can open the files with a text editor to explore their code. For now, let’s load the functions into R using the file browser:

    source(file.choose()) # search for the file mdeCH.R
    source(file.choose()) # search for the file mdeCH.richness.R

The function mdeCH implements the algorithm we described above, as well as the other algorithms for random distributions in a 1D domain described in Colwell and Hurtt (1994).
Let’s look at some examples:

    # Model 1: this is not a mid-domain effect model
    MDE.res1 <- mdeCH(required.div=300, domain=c(0,1), CH.model=1,
                      pool.type="uniform", domain.cells=20, fig=TRUE)

    # Model 4: this is the model we developed in some detail above
    MDE.res4 <- mdeCH(required.div=300, domain=c(0,1), CH.model=4,
                      pool.type="uniform", domain.cells=20, fig=TRUE)

The function mdeCH.richness is a wrapper that runs mdeCH repeatedly to produce replicated random distributions expected under a particular mid-domain effect algorithm.

    POOL <- runif(300, 0, 1)
    names(POOL) <- paste("a", 1:300, sep="")

    results.model.1 <- mdeCH.richness(rand.n=100, compo.list=TRUE, distri.list=TRUE,
                                      required.div=300, domain=c(0,1), CH.model=1,
                                      pool.type="user", pool.distris=POOL, domain.cells=20)

    results.model.4 <- mdeCH.richness(rand.n=100, compo.list=TRUE, distri.list=TRUE,
                                      required.div=300, domain=c(0,1), CH.model=4,
                                      pool.type="user", pool.distris=POOL, domain.cells=20)

The results include various summaries of the data produced by the null model:

    names(results.model.1)

These include the species richness per cell, which we can plot:

    cell.midpoints <- rowMeans(results.model.1$cells)

    par(mfrow=c(1,2))

    # one line per randomization, model 1
    plot(results.model.1$richness[,1]~cell.midpoints, type="n",
         ylim=range(c(0, results.model.1$richness)),
         ylab="Richness", xlab="Domain", main="CH model 1")
    for(i in 1:ncol(results.model.1$richness))
      points(results.model.1$richness[,i]~cell.midpoints, type="b", col="darkgoldenrod3")

    # one line per randomization, model 4
    plot(results.model.4$richness[,1]~cell.midpoints, type="n",
         ylim=range(c(0, results.model.4$richness)),
         ylab="Richness", xlab="Domain", main="CH model 4")
    for(i in 1:ncol(results.model.4$richness))
      points(results.model.4$richness[,i]~cell.midpoints, type="b", col="darkgoldenrod3")

As you can see, the algorithm developed above (model 4) consistently produces a peak of species richness near the center of the domain, hence the name mid-domain effect.
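The idea of wrapping the step-by-step code into a function and repeating the randomization can be sketched in a few lines. This is an illustration only and is not the code in mdeCH.R or mdeCH.richness.R: the names mde_once, pool, richness, and envelope are hypothetical, the seed is arbitrary, and the sketch swaps the lapply and for loops used earlier for vectorized calls to runif and outer.

```r
# Hypothetical function (not the code in mdeCH.R): one randomization of the
# range-placement algorithm walked through above, returning richness per cell.
mde_once <- function(range.sizes, domain = c(0, 1), n.cells = 20) {
  # runif() is vectorized over min and max, so all mid-points are drawn at once
  midpoints <- runif(length(range.sizes),
                     min = domain[1] + range.sizes / 2,
                     max = domain[2] - range.sizes / 2)
  lower <- midpoints - range.sizes / 2
  upper <- midpoints + range.sizes / 2

  # lower and upper limits of each cell
  limits <- seq(domain[1], domain[2], length.out = n.cells + 1)
  cell.lower <- limits[-(n.cells + 1)]
  cell.upper <- limits[-1]

  # cells x species logical matrix: TRUE where a range overlaps a cell
  present <- outer(cell.lower, upper, "<") & outer(cell.upper, lower, ">")
  rowSums(present)
}

set.seed(1)  # arbitrary seed, used only to make the sketch reproducible
pool <- runif(200, 0, 1)                    # same kind of pool as in the walk-through
richness <- replicate(100, mde_once(pool))  # 20 cells x 100 randomizations

# Summaries across randomizations: mean richness and a 95% envelope per cell
mean.richness <- rowMeans(richness)
envelope <- apply(richness, 1, quantile, probs = c(0.025, 0.975))
round(mean.richness, 1)
```

Under these assumptions, mean.richness should be highest in the central cells and lowest at the two edges. An envelope of this kind is one simple way to set an observed richness gradient against the null expectation.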