Appendix 1: Additional methodological information associated with region delineation, diversification rate estimation and the calculation of ecological divergence.

Regionalization based on maximizing dissimilarity

We first calculated the Simpson dissimilarity between assemblages over a 12,100 km² equal-area grid. We then clustered this dissimilarity matrix using average-linkage UPGMA (Unweighted Pair Group Method with Arithmetic Mean) clustering, which allows k bioregions to be defined by cutting the resulting dendrogram into k partitions. Cells with bird or mammal richness below five were excluded before this clustering procedure, as these cells (usually remote oceanic islands) tended to display idiosyncratic clustering (although the results remained similar when these cells were retained).

Diversification rates

We considered several ways of calculating DivRate. As version (1), we used the entire phylogenetic tree (Jetz et al. 2012), but restricted the DivRate calculation to below the most ancient node of species in the region. As this metric may conflate speciation and extinction events that occur within a region with those outside, we also examined two more restrictive alternatives: (2) down-weighting the effect of ancient speciation events by focusing only on diversification rates within the last X years, varying values of X from 3 to 30 million years BP; (3) a modified version of the method above in which we used region-specific cutoffs determined by the time since the onset of the bioregion assessment (see below). All methods returned similar results, and the exact specification of the DivRate metric did not affect our main conclusions (Fig. S1). In addition, we examined a version (4) of the DivRate metric in which all calculations are restricted to only those species associated with a region.
The DivRate values based on this regional set of species are highly correlated with other measures of diversification calculated for these regionally subset phylogenies, such as the Kendall-Moran estimator (r = 0.99 and 0.98 for birds and mammals, respectively) and a birth-death model (r = 0.93 and 0.80). However, all these region-restricted measures of diversification rate disregard speciation events in which one of the descendants remains outside the focal region. The degree to which such speciation events go uncaptured increases with decreasing region richness and is thus expected to result in spuriously high associations with Richness. This is confirmed by our results, but, as we show, it did not substantially alter any of the focal associations in our study (Fig. S2). We therefore restrict our main presentation of results to version (1) of the DivRate estimate, as it requires neither an arbitrary temporal cutoff nor external estimates of bioregion age, and it is not biased by outside-region speciation.

Ecological divergence

Data. We closely follow the categorization of Wilman et al. (2014), with the following exceptions:
(i) Activity time is transformed to an ordinal variable with five categories (1 - nocturnal, 2 - nocturnal and crepuscular, 3 - crepuscular or cathemeral, 4 - diurnal and crepuscular, 5 - diurnal) to better represent variability in diel activity patterns. While crepuscular and cathemeral (irregularly active at any time of night or day) represent very different activity patterns, they are both intermediate between diurnal and nocturnal patterns and hence are given an intermediate score.
(ii) Bird foraging height data were matched to those of mammals to include four ordinal categories (1 - ground level, 2 - scansorial / low vegetation / understory, 3 - fully arboreal / canopy, 4 - aerial).
(iii) We added an ordinal variable indicating the degree to which species forage in aquatic habitats (1 - aquatic, 2 - semi-aquatic, 3 - terrestrial/non-aquatic).
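To make the ordinal coding above concrete, the following Python sketch encodes the three modified traits and computes a simple distance between two species. The category scores are those listed above; everything else is an assumption made for illustration. In particular, the shortened category labels, the three example species, and the distance itself (a mean of range-scaled absolute differences, in the style of a Gower distance) are hypothetical and are not stated in this appendix to be the measure used in the study.

```python
# Ordinal scores as listed in the Data paragraph; the short labels are
# hypothetical abbreviations of the category names.
ACTIVITY = {
    "nocturnal": 1,
    "nocturnal and crepuscular": 2,
    "crepuscular or cathemeral": 3,
    "diurnal and crepuscular": 4,
    "diurnal": 5,
}
FORAGING_HEIGHT = {"ground": 1, "understory": 2, "canopy": 3, "aerial": 4}
AQUATIC = {"aquatic": 1, "semi-aquatic": 2, "terrestrial": 3}

SCALES = (ACTIVITY, FORAGING_HEIGHT, AQUATIC)


def encode(activity, height, aquatic):
    """Map category labels to a tuple of ordinal scores."""
    return (ACTIVITY[activity], FORAGING_HEIGHT[height], AQUATIC[aquatic])


def ordinal_distance(a, b):
    """Mean range-scaled absolute difference between two encoded species.

    ASSUMPTION: a Gower-style distance chosen for illustration only.
    """
    total = 0.0
    for x, y, scale in zip(a, b, SCALES):
        span = max(scale.values()) - min(scale.values())
        total += abs(x - y) / span
    return total / len(SCALES)


# Hypothetical example species.
owl_like = encode("nocturnal", "canopy", "terrestrial")
swift_like = encode("diurnal", "aerial", "terrestrial")
otter_like = encode("crepuscular or cathemeral", "ground", "semi-aquatic")

print(ordinal_distance(owl_like, swift_like))
print(ordinal_distance(owl_like, otter_like))
```

Scaling each trait by its range keeps the five-level activity variable from outweighing the three-level aquatic variable. Treating ordinal scores as evenly spaced is itself a simplification.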
Metric calculation. There is currently no consensus on the best way to quantify trait diversity from a distance matrix (Mouchet et al. 2010; Schleuter et al. 2010). We thus explored three methods.

First, we used the sum of the branch lengths of the dendrogram formed by removing all terminal branches belonging to species not present in the assemblage or not belonging to the taxon examined. The dendrogram was based on UPGMA clustering of the trait distance matrix (using the R function "hclust" from R package "stats"), which in simulation studies has been shown to provide the best representation of the original dissimilarities (Merigot et al. 2010). Indeed, we found the goodness-of-fit between the original distance and the clustered distance [as measured by the 2-norm (Merigot et al. 2010)] to be better than the values obtained by alternative clustering methods (UPGMA: 81,347; WPGMA: 91,209; Neighbor joining: 262,871; Ward: 81,536,156). Using a consensus tree from alternative clustering methods (Mouchet et al. 2008) was computationally infeasible given the number of species considered.

Second, we examined ‘functional attribute diversity’, which is simply the sum of the pairwise distances between all species in an assemblage. The advantage of this method is that it does not require the additional clustering stage, but results were extremely similar to those found using the sum of dendrogram branch lengths (the correlation between rarefied dendrogram-based and functional attribute trait diversity was 0.90 and 0.77 for birds and mammals, respectively).

Finally, we used the convex hull approach, using Principal Coordinates axes to represent the dissimilarity matrix (Cornwell et al. 2006; Ricklefs 2012). However, we found it unsuitable for our purposes, as the convex hull approach is influenced by extreme traits and is insensitive to variation in traits for species that do not possess extreme values.
Moreover, the calculation of a convex hull over many trait axes for a large number of species is computationally challenging. Thus, for subsequent analyses we show only the results obtained from the first method, the sum of dendrogram branch lengths.

References:

Cornwell W.K., Schwilk D.W. & Ackerly D.D. (2006). A trait-based test for habitat filtering: Convex hull volume. Ecology, 87, 1465-1471.
Jetz W., Thomas G.H., Joy J.B., Hartmann K. & Mooers A.O. (2012). The global diversity of birds in space and time. Nature, 491, 444-448.
Merigot B., Durbec J.P. & Gaertner J.C. (2010). On goodness-of-fit measure for dendrogram-based analyses. Ecology, 91, 1850-1859.
Mouchet M., Guilhaumon F., Villeger S., Mason N.W.H., Tomasini J.A. & Mouillot D. (2008). Towards a consensus for calculating dendrogram-based functional diversity indices. Oikos, 117, 794-800.
Mouchet M.A., Villeger S., Mason N.W.H. & Mouillot D. (2010). Functional diversity measures: an overview of their redundancy and their ability to discriminate community assembly rules. Funct Ecol, 24, 867-876.
Ricklefs R.E. (2012). Species richness and morphological diversity of passerine birds. P Natl Acad Sci USA, 109, 14482-14487.
Schleuter D., Daufresne M., Massol F. & Argillier C. (2010). A user's guide to functional diversity indices. Ecol Monogr, 80, 469-484.
Wilman H., Belmaker J., Simpson J., de la Rosa C., Rivadeneira M.M. & Jetz W. (2014). EltonTraits 1.0: Species-level foraging attributes of the world's birds and mammals. Ecology, 95, 2027.