Appendix S3, Supplementary Materials and Methods

Dealing with study biases. In the MCMCglmm analyses, rather than modeling each location separately, we simultaneously modeled habitat-defined data for all locations. While this allowed us to better capture general patterns and reduced type II errors due to small sample sizes, the approach also has some limitations. The most important one is increased heterogeneity due to differences between regions in sampling effort, type of sampling and mode of urbanization. Differences in the mode of urbanization are expected to mostly affect the habitat filtering hypothesis by varying the quality of the surrounding habitat (i.e. the pool of potential urban invaders) and the extent to which the urban habitat sorts species into avoiders and exploiters. We tackled these limitations with a hierarchical modeling approach in which the intercepts of the lines relating response variables and predictors were allowed to vary across regions. In addition, we examined possible biases in two further ways. First, we included the survey method as a fixed factor in all models to reduce sources of heterogeneity. Second, we weighted the regions according to sampling effort. The results revealed that the conclusions were not affected by heterogeneity in data quality (Tables S1, S2, S4).

MCMCglmm specifications. Because we did not have a priori information about parameter distributions for our models, we used weakly informative priors, so that the information in the analysis comes mainly from the data (Hadfield 2010). The models were run for 50,000,000 iterations with a burn-in of 10,000 and a thinning interval of 500. This generated 10,000 samples from each chain from which parameters were estimated. Terms were considered statistically significant when 95% CIs did not span 0 and pMCMC values calculated in MCMCglmm were less than 0.05 (Hadfield 2010).
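The significance criterion described above can be illustrated with a minimal sketch in base R. The posterior draws are simulated and the term names (habitat_urban, altitude) are hypothetical; in a fitted model the draws would come from the stored fixed-effect samples. The interval here is an equal-tailed 95% interval, used as a simple stand-in for the highest-posterior-density intervals that MCMCglmm summaries report, and the pMCMC is computed as twice the smaller proportion of draws falling on either side of zero, which is an assumption about how the value is obtained rather than a reproduction of the package code.

```r
# Hypothetical posterior draws for two fixed effects (simulated here;
# in a real analysis these would be the stored samples of a fitted model).
set.seed(1)
draws <- cbind(
  habitat_urban = rnorm(10000, mean = -0.8,  sd = 0.2),  # clearly negative
  altitude      = rnorm(10000, mean =  0.05, sd = 0.2)   # centred near zero
)

# Equal-tailed 95% credible interval per term (stand-in for an HPD interval).
ci95 <- t(apply(draws, 2, quantile, probs = c(0.025, 0.975)))

# Two-sided pMCMC: twice the smaller proportion of draws on either side of 0.
p_mcmc <- apply(draws, 2, function(x) 2 * min(mean(x > 0), mean(x < 0)))

# A term is flagged only when the interval excludes 0 AND pMCMC < 0.05.
significant <- (ci95[, 1] > 0 | ci95[, 2] < 0) & p_mcmc < 0.05

print(cbind(ci95, pMCMC = p_mcmc, significant = significant))
```

With these simulated draws, the first term satisfies both conditions and the second satisfies neither, which shows that the two conditions usually agree but are checked jointly.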
We analyzed the Markov chains of all our models to check the reliability of the posterior approximation. We assessed the level of non-independence between successive samples in each chain and found autocorrelation below 0.1 between successive stored iterations (Hadfield 2010). Furthermore, we applied the Heidelberger and Welch stationarity test with the R package “coda”, a convergence diagnostic that uses the Cramér-von Mises statistic to test the null hypothesis that the sampled values come from a stationary distribution (Plummer et al. 2006).

The code used to implement the analyses presented in Tables S1-S5 and S7-S9 is shown below.

Table S1

    Prior <- list(R = list(V = 1, nu = 0.002),
                  G = list(G1 = list(V = 1, nu = 0.002, alpha.mu = 0, alpha.V = 1000),
                           G2 = list(V = 1, nu = 0.002, alpha.mu = 0, alpha.V = 1000)))

    Model: MCMCglmm(Rarefied diversity ~ habitat + log(altitude + 0.5) + survey method
                        + detectability + distance to Equator,
                    random = ~ location + us(mesd):units,
                    prior = Prior)

Table S2

    Prior <- list(R = list(V = 1, nu = 0.002),
                  G = list(G1 = list(V = 1, nu = 0.002, alpha.mu = 0, alpha.V = 1000),
                           G2 = list(V = 1, nu = 0.002, alpha.mu = 0, alpha.V = 1000)))

    Model: MCMCglmm(Rarefied diversity ~ all habitats + season + log(altitude + 0.5)
                        + survey method + detectability + distance to Equator,
                    random = ~ location + us(mesd):units,
                    prior = Prior)

Tables S3-S4

    Prior <- list(R = list(V = diag(2), n = 0.002, fix = 2),
                  G = list(G1 = list(V = 1, n = 0.002),
                           G2 = list(V = 1, n = 0.002),
                           G3 = list(V = 1, n = 0.002),
                           G4 = list(V = 1, n = 0.002)))

    Model: MCMCglmm(abundance urbanized habitat (i.e. urban or suburban) ~ trait - 1
                        + at.level(trait, 1):log(relative abundance in surrounding + 0.5)
                        + at.level(trait, 1):reproductive season
                        + at.level(trait, 1):altitude
                        + at.level(trait, 1):survey.method
                        + at.level(trait, 1):detectability
                        + at.level(trait, 1):distance to Equator,
                    random = ~ idh(at.level(trait, 1)):location
                        + idh(at.level(trait, 1)):animal
                        + idh(at.level(trait, 1)):species
                        + us(mesd):units,
                    rcov = ~ idh(trait):units,
                    prior = Prior,
                    pedigree = tr[[i]],
                    family = "zipoisson")

Tables 1 and S6

    Prior <- list(R = list(V = 1, fix = 1),
                  G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000),
                           G2 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000),
                           G3 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000)))

    Model: MCMCglmm(tolerance ~ predictor1 + ... + predictorn,
                    random = ~ animal + species + location,
                    pedigree = tree[[i]],
                    prior = Prior,
                    family = "categorical")

References

Hadfield, J. D. 2010. MCMC methods for multi-response generalized linear mixed models: the MCMCglmm R package. Journal of Statistical Software 33:1–22.

Plummer, M., N. Best, K. Cowles, and K. Vines. 2006. CODA: Convergence diagnosis and output analysis for MCMC. R News 6:343–4.