Electronic supplementary material

Supplementary Text

This document contains two sections. The first, “Variable definitions”, defines the variables in the dataset (see Dataset in the electronic supplementary material). The second, “Frequently asked questions”, gives further justification and discussion of our analyses to aid interpretation by researchers who wish to evaluate or repeat our analyses.

Variable definitions

Column A “Time”: Time before present (Ma) according to Harland et al. (1990).

Column B “Red”: The red curve in figure 4 of Royer et al. (2004), which is estimated low latitude sea surface temperature relative to present (°C). Data originate from Veizer et al. (2000) but have been adjusted for pH effects due to changes in seawater Ca++ concentration, and CO2 based on GEOCARB III.

Column C “Bottom orange”: the lower bound of the orange band in figure 4 of Royer et al. (2004), which is as Red, but holding Ca++ levels constant.

Column D “Top orange”: the upper bound of the orange band in figure 4 of Royer et al. (2004), which is as Red, but allowing the saturation state of CaCO3 in the ocean to vary through time.

Column E “CO2”: RCO2 from GEOCARB III (Berner & Kothavala 2001). RCO2 = ratio of mass of CO2 at time t to that at present, as used in Royer et al. (2004).

Column F “All-Fam-Max”. Total standing diversity of families from Benton (1993), using maximum dating assumption (see Benton 1993, 1995).

Column G “All-Fam-Min”. Total standing diversity of families from Benton (1993), using minimum dating assumption (see Benton 1993, 1995).

Column H “Marine-Fam-Max”. Standing diversity of marine families from Benton (1993), using maximum dating assumption (see Benton 1993, 1995).

Column I “Marine-Fam-Min”. Standing diversity of marine families from Benton (1993), using minimum dating assumption (see Benton 1993, 1995).

Column J “Terrestrial-Fam-Max”.
Standing diversity of terrestrial families from Benton (1993), using maximum dating assumption (see Benton 1993, 1995).

Column K “Terrestrial-Fam-Min”. Standing diversity of terrestrial families from Benton (1993), using minimum dating assumption (see Benton 1993, 1995).

Column L “All-Fam-pcOrig-Max”. Per capita origination rate for all families from Benton (1993) using the maximum dating assumption (see Benton 1993, 1995). Per capita origination rate for an interval is calculated as (number of originations/standing diversity)/duration of interval.

Column M “All-Fam-pcOrig-Min”. Per capita origination rate for all families from Benton (1993) using the minimum dating assumption (see Benton 1993, 1995). Per capita origination rate for an interval is calculated as (number of originations/standing diversity)/duration of interval.

Column N “Marine-Fam-pcOrig-Max”. Per capita origination rate for marine families from Benton (1993) using the maximum dating assumption (see Benton 1993, 1995). Per capita origination rate for an interval is calculated as (number of originations/standing diversity)/duration of interval. Standing diversity here refers to that of marine families.

Column O “Marine-Fam-pcOrig-Min”. Per capita origination rate for marine families from Benton (1993) using the minimum dating assumption (see Benton 1993, 1995). Per capita origination rate for an interval is calculated as (number of originations/standing diversity)/duration of interval. Standing diversity here refers to that of marine families.

Column P “Terrestrial-Fam-pcOrig-Max”. Per capita origination rate for terrestrial families from Benton (1993) using the maximum dating assumption (see Benton 1993, 1995). Per capita origination rate for an interval is calculated as (number of originations/standing diversity)/duration of interval. Standing diversity here refers to that of terrestrial families.

Column Q “Terrestrial-Fam-pcOrig-Min”.
Per capita origination rate for terrestrial families from Benton (1993) using the minimum dating assumption (see Benton 1993, 1995). Per capita origination rate for an interval is calculated as (number of originations/standing diversity)/duration of interval. Standing diversity here refers to that of terrestrial families.

Column R “All-Fam-pcExtin-Max”. Per capita extinction rate for all families from Benton (1993) using the maximum dating assumption (see Benton 1993, 1995). Per capita extinction rate for an interval is calculated as (number of extinctions/standing diversity)/duration of interval.

Column S “All-Fam-pcExtin-Min”. Per capita extinction rate for all families from Benton (1993) using the minimum dating assumption (see Benton 1993, 1995). Per capita extinction rate for an interval is calculated as (number of extinctions/standing diversity)/duration of interval.

Column T “Marine-Fam-pcExtin-Max”. Per capita extinction rate for marine families from Benton (1993) using the maximum dating assumption (see Benton 1993, 1995). Per capita extinction rate for an interval is calculated as (number of extinctions/standing diversity)/duration of interval. Standing diversity here refers to that of marine families.

Column U “Marine-Fam-pcExtin-Min”. Per capita extinction rate for marine families from Benton (1993) using the minimum dating assumption (see Benton 1993, 1995). Per capita extinction rate for an interval is calculated as (number of extinctions/standing diversity)/duration of interval. Standing diversity here refers to that of marine families.

Column V “Terrestrial-Fam-pcExtin-max”. Per capita extinction rate for terrestrial families from Benton (1993) using the maximum dating assumption (see Benton 1993, 1995). Per capita extinction rate for an interval is calculated as (number of extinctions/standing diversity)/duration of interval. Standing diversity here refers to that of terrestrial families.

Column W “Terrestrial-Fam-pcExtin-Min”.
Per capita extinction rate for terrestrial families from Benton (1993) using the minimum dating assumption (see Benton 1993, 1995). Per capita extinction rate for an interval is calculated as (number of extinctions/standing diversity)/duration of interval. Standing diversity here refers to that of terrestrial families.

Column X “Sep_ngen”. Standing diversity of marine animal genera from Sepkoski (2002). Data were extracted from http://strata.ummp.lsa.umich.edu/jack/, using stage resolution.

Column Y “Sep-pcOrig”. Per capita origination rate of marine animal genera from Sepkoski (2002). Data were extracted from http://strata.ummp.lsa.umich.edu/jack/, using stage resolution. Per capita origination rate for an interval is calculated as (number of originations/standing diversity)/duration of interval, durations taken from Harland et al. (1990) for comparability with above variables.

Column Z “Sep-pcExtin”. Per capita extinction rate of marine animal genera from Sepkoski (2002). Data were extracted from http://strata.ummp.lsa.umich.edu/jack/, using stage resolution. Per capita extinction rate for an interval is calculated as (number of extinctions/standing diversity)/duration of interval, durations taken from Harland et al. (1990) for comparability with above variables.

Column AA “allmaxbc”. Standing diversity of boundary crossing families, taken from Benton (1993), using maximum dating assumption. For definitions of boundary crossers see Foote (2000).

Column AB “allmaxp”. Estimated per capita rate of origination, p, of all families in Benton (1993), using maximum dating assumption. p is defined as in Foote (2000): -ln(Nbt / Nt) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary of an interval, Nt = number of taxa crossing the top boundary of an interval and Δt is interval duration.

Column AC “allmaxq”. Estimated per capita rate of extinction, q, of all families in Benton (1993), using maximum dating assumption.
q is defined as in Foote (2000): -ln(Nbt / Nb) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary of an interval, Nb = number of taxa crossing the bottom boundary of an interval and Δt is interval duration.

Column AD “allminbc”. Standing diversity of boundary crossing families, taken from Benton (1993), using minimum dating assumption. For definitions of boundary crossers see Foote (2000).

Column AE “allminp”. Estimated per capita rate of origination, p, of all families in Benton (1993), using minimum dating assumption. p is defined as in Foote (2000): -ln(Nbt / Nt) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary of an interval, Nt = number of taxa crossing the top boundary of an interval and Δt is interval duration.

Column AF “allminq”. Estimated per capita rate of extinction, q, of all families in Benton (1993), using minimum dating assumption. q is defined as in Foote (2000): -ln(Nbt / Nb) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary of an interval, Nb = number of taxa crossing the bottom boundary of an interval and Δt is interval duration.

Column AG “marmaxbc”. Standing diversity of marine boundary crossing families, taken from Benton (1993), using maximum dating assumption. For definitions of boundary crossers see Foote (2000).

Column AH “marmaxp”. Estimated per capita rate of origination, p, of marine families in Benton (1993), using maximum dating assumption. p is defined as in Foote (2000): -ln(Nbt / Nt) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary of an interval, Nt = number of taxa crossing the top boundary of an interval and Δt is interval duration.

Column AI “marmaxq”. Estimated per capita rate of extinction, q, of marine families in Benton (1993), using maximum dating assumption.
q is defined as in Foote (2000): -ln(Nbt / Nb) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary of an interval, Nb = number of taxa crossing the bottom boundary of an interval and Δt is interval duration.

Column AJ “marminbc”. Standing diversity of marine boundary crossing families, taken from Benton (1993), using minimum dating assumption. For definitions of boundary crossers see Foote (2000).

Column AK “marminp”. Estimated per capita rate of origination, p, of marine families in Benton (1993), using minimum dating assumption. p is defined as in Foote (2000): -ln(Nbt / Nt) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary of an interval, Nt = number of taxa crossing the top boundary of an interval and Δt is interval duration.

Column AL “marminq”. Estimated per capita rate of extinction, q, of marine families in Benton (1993), using minimum dating assumption. q is defined as in Foote (2000): -ln(Nbt / Nb) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary of an interval, Nb = number of taxa crossing the bottom boundary of an interval and Δt is interval duration.

Column AM “terrmaxbc”. Standing diversity of terrestrial boundary crossing families, taken from Benton (1993), using maximum dating assumption. For definitions of boundary crossers see Foote (2000).

Column AN “terrmaxp”. Estimated per capita rate of origination, p, of terrestrial families in Benton (1993), using maximum dating assumption. p is defined as in Foote (2000): -ln(Nbt / Nt) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary of an interval, Nt = number of taxa crossing the top boundary of an interval and Δt is interval duration.

Column AO “terrmaxq”. Estimated per capita rate of extinction, q, of terrestrial families in Benton (1993), using maximum dating assumption.
q is defined as in Foote (2000): -ln(Nbt / Nb) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary of an interval, Nb = number of taxa crossing the bottom boundary of an interval and Δt is interval duration.

Column AP “terrminbc”. Standing diversity of terrestrial boundary crossing families, taken from Benton (1993), using minimum dating assumption. For definitions of boundary crossers see Foote (2000).

Column AQ “terrminp”. Estimated per capita rate of origination, p, of terrestrial families in Benton (1993), using minimum dating assumption. p is defined as in Foote (2000): -ln(Nbt / Nt) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary of an interval, Nt = number of taxa crossing the top boundary of an interval and Δt is interval duration.

Column AR “terrminq”. Estimated per capita rate of extinction, q, of terrestrial families in Benton (1993), using minimum dating assumption. q is defined as in Foote (2000): -ln(Nbt / Nb) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary of an interval, Nb = number of taxa crossing the bottom boundary of an interval and Δt is interval duration.

Column AS “Sepbc”. Number of marine animal boundary crossing genera from Sepkoski (2002). Data were extracted from http://strata.ummp.lsa.umich.edu/jack/, using stage resolution.

Column AT “sep-p”. Estimated per capita rate of origination, p, of marine animal genera in Sepkoski (2002). p is defined as in Foote (2000): -ln(Nbt / Nt) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary of an interval, Nt = number of taxa crossing the top boundary of an interval and Δt is interval duration. Data were extracted from http://strata.ummp.lsa.umich.edu/jack/, using stage resolution.

Column AU “sep-q”. Estimated per capita rate of extinction, q, of marine animal genera in Sepkoski (2002).
q is defined as in Foote (2000): -ln(Nbt / Nb) / Δt, where Nbt = number of taxa that cross both the top and bottom boundary of an interval, Nb = number of taxa crossing the bottom boundary of an interval and Δt is interval duration. Data were extracted from http://strata.ummp.lsa.umich.edu/jack/, using stage resolution.

Column AV “pulsedR”. Mean per-stage sampling probability in marine animal genera taken from the model of pulsed origination and pulsed extinction in Foote (2005). This model fit the data better than alternative models in that paper. Data provided courtesy of Michael Foote.

Column AW “p&f01terrf”. Standing diversity, per epoch, of terrestrial sedimentary formations, taken from column 4 of Appendix 2 in Peters & Foote (2001). Data originate from Keroher et al. (1967).

Column AX “p&f02f”. Standing diversity of marine sedimentary formations, per stage, taken from the supplementary information of Peters & Foote (2002). Data originate from Keroher et al. (1967).

Column AY “cosunaD”. Standing diversity of marine sedimentary packages, per stage, used in Peters (2005). Data originate from Childs & Salvador (1985), provided courtesy of Shanan Peters.

Column AZ “cosunaP”. Estimated per capita rate of origination, p, of marine sedimentary packages, used in Peters (2005). p is defined as in Foote (2000): -ln(Nbt / Nt) / Δt, where Nbt = number of packages that cross both the top and bottom boundary of an interval, Nt = number of packages crossing the top boundary of an interval and Δt is interval duration. Data originate from Childs & Salvador (1985), provided courtesy of Shanan Peters.

Column BA “cosunaQ”. Estimated per capita rate of extinction, q, of marine sedimentary packages, used in Peters (2005). q is defined as in Foote (2000): -ln(Nbt / Nb) / Δt, where Nbt = number of packages that cross both the top and bottom boundary of an interval, Nb = number of packages crossing the bottom boundary of an interval and Δt is interval duration.
Data originate from Childs & Salvador (1985), provided courtesy of Shanan Peters.

Frequently asked questions

1. “Why did you transform the data?” Answer: The data are skewed, requiring us either to transform the data to meet parametric assumptions or to use non-parametric statistics. The use of non-parametric ranking statistics involves throwing away information embodied in the data, and this may seriously limit our ability to detect what turn out to be relatively weak associations (correlations <0.5) when sample size is small. Furthermore, it would limit our ability to incorporate control variables, and hence to test the robustness of our results that way, because parametric statistics are very flexible in that regard and non-parametric statistics less so.

2. “Why did you de-trend the data prior to analysis?” Answer: When time series show serial trends, spurious associations can arise simply because of those trends. Whilst the trends might be causally linked, one cannot tell: they might just as well be caused by other variables. So the variation in a time series that one can actually use to detect associations is the variation around the trend. To examine that, we first have to find the trend and then remove it by taking residuals from it. De-trending also has a biological motivation and lends some robust elements to our analysis: it is widely recognized that fossil diversity generally has increased enormously over geological time (either logistically or exponentially depending on who you read), no doubt because of the standard process of clade growth. But the standard process of clade growth is not of interest here, so it is the variation around this that we must consider. Thus, it can be argued that an increase should only be regarded as significant biologically if it is a residual from the trend. Another issue is that some of the long term trend is artefactual due to changes in the availability of fossil-bearing rocks of different ages (sampling effects).
Here it also makes sense to de-trend to remove such artefacts.

3. “Why did you mean-standardize the residuals prior to analysis?” Answer: This simply allows us to plot the two variables on the same y-axis (number of standard deviations from the mean), which helps for visualization (see figures 1-3 of the main paper).

4. “Why did you assess significance using bootstrapping instead of in the normal way?” Answer: The transformed, detrended, mean-standardized time series are still serially autocorrelated, meaning that data points close to each other in the time series take similar values and hence are not fully independent. This means that the standard p values from tables will be too low, giving rise to Type I errors. Bootstrapping is a randomization procedure that allows us to calculate the correct probabilities even in the presence of such autocorrelation.

5. “How did you control for sampling effects and other confounding variables in your analyses?” Answer: We did this using multiple regression (general linear model, the term we use in the main paper, is the more general term for the class of statistics). Diversity, origination or extinction was the response variable, and temperature was one explanatory variable. Measures of sampling or other, possibly confounding, variables were included as other explanatory variables, and a regression model was built using both temperature and the other potential explanatory variable(s), so that the effect of temperature is considered whilst the effect of these is already included.

6. “Raw data in some of the time series are at the stage level (about 80 data points), but these were somehow converted to a coarser time series of about 50 points. How was this done? What problems might it introduce?” Answer: The temperature and CO2 data are point estimates at 10 Myr intervals. We needed point estimates at 10 Myr intervals for diversity and origination/extinction.
We simply used the diversity/rates for whichever stage each 10 Myr point falls in, thus avoiding the need for aggregation or interpolation. The main problem with this is the usual one of stages being of different duration, but use of boundary crossers etc. should help to ameliorate that. One consequence is that the same values for a stage are sometimes used in consecutive time intervals, but this should actually be conservative given that we are creating extra error that our tests have to withstand (i.e. in reality diversity is probably not static but we are sometimes artificially making it so).

7. “If these associations are real, why did nobody notice them before?” Answer: Correlations between CO2 and origination, extinction and diversity have been found previously at the same scales, and CO2 is associated with temperature in our data (they both basically signify greenhouse and icehouse worlds, though temperature is obviously more definitive). This suggests that the temperature associations we report are real. The temperature data are quite new, so there was not previously much opportunity to conduct these analyses. A final reason is that the data are not simple and the associations are not very strong; the diversity trends are non-linear, the temperature data are cyclical with a more linear trend, and this means that associations will not be obvious on first inspection, and crude analyses will not necessarily pick them up.

8. “There are several intervals in Earth history when climate has been warm, but diversity has risen, for example the Paleocene/Eocene and mid Cretaceous, both periods of rising diversity. Aren’t the present results at odds with those? Furthermore, geographically the places with highest mean temperatures have highest diversity. Isn’t that observation also at odds with the present results?” Answer: Not necessarily.
After you de-trend, the mid and late Cretaceous show falling diversity residuals as temperature residuals are high (figure 1 of the main paper). So de-trending leads to the opposite conclusion. The Paleocene/Eocene likewise is relatively warm, but forms part of a falling long term trend in temperature residuals, whilst diversity rises. So de-trending alters one’s view of the associations. The reasons for de-trending are explained above. In contrast, previously published associations between CO2 and diversity, origination and extinction in the Phanerozoic would predict associations in the same directions as those we find. The observation on the geographic distribution of biodiversity is also not necessarily at odds with our results. The latitudinal gradient in diversity has persisted for millions of years (Willig et al. 2003), obviously independently of global climate and total changes in global diversity. It is the changes in global diversity that concern us here. Current studies (Williams et al. 2007) suggest that areas of high diversity, mostly tropical, are currently most at risk from climate change because they will experience the greatest loss of current climates. So a warmer world could be at greater risk of taxonomic extinction. It would obviously be easier to interpret such apparent conflicts if we knew what the underlying causes of our results are, but currently we can only hypothesize.

9. “Why does standing diversity sometimes take non-integer values in your data?” Answer: This is a result of the date falling at a geological boundary, in which case the mean diversity of the two intervals either side of the boundary was taken. Means were also taken for rates in such cases.

10. “What defines a marine or terrestrial family in your data?” Answer: “Marine” families from Benton (1993) are defined here as any family occupying marine habitats, either wholly or in combination with other habitats.
Similarly, “Terrestrial” families are defined as any family occupying terrestrial habitats, either wholly or in combination with other habitats.

11. “Why is there no data at time zero in some of your time series?” Answer: At time 0 (=Holocene), taxonomic rates take extraordinary values, due to the unusually short duration of this geological interval. Such values were omitted from analysis and are recorded as “no data” here.

References:

Benton, M. J., ed. 1993 The fossil record 2. London: Chapman & Hall.

Benton, M. J. 1995 Diversification and extinction in the history of life. Science 268, 52–58.

Berner, R. A. & Kothavala, Z. 2001 Geocarb III: a revised model of atmospheric CO2 over Phanerozoic time. Am. J. Sci. 301, 182–204.

Childs, O. E. & Salvador, A. 1985 Correlation of Stratigraphic Units of North America (COSUNA). Amer. Assoc. Petr. Geol. Bull. 69, 173–180.

Foote, M. 2000 Origination and extinction components of taxonomic diversity: general problems. Paleobiol. 26 (Supplement), 74–102.

Foote, M. 2005 Pulsed origination and extinction in the marine realm. Paleobiol. 31, 6–20.

Harland, W. B. et al. 1990 A geologic time scale. Cambridge: Cambridge University Press.

Keroher, G. C. 1967 Lexicon of geologic names of the United States for 1936–1960. US Geological Survey Bull. 1200. Reston, VA: US Geological Survey.

Peters, S. E. 2005 Geological constraints on the macroevolutionary history of marine animals. Proc. Natl. Acad. Sci. USA 102, 12326–12331.

Peters, S. E. & Foote, M. 2001 Biodiversity in the Phanerozoic: a reinterpretation. Paleobiol. 27, 583–601.

Peters, S. E. & Foote, M. 2002 Determinants of extinction in the fossil record. Nature 416, 420–424.

Royer, D. L., Berner, R. A., Montañez, I. P., Tabor, N. J. & Beerling, D. J. 2004 CO2 as a primary driver of Phanerozoic climate. GSA Today 14, 4–10.

Sepkoski, J. J. Jr. 2002 A compendium of fossil marine animal genera. Bull. Amer. Paleontol. 363, 1–560.

Veizer, J., Godderis, Y. & François, L. M.
2000 Evidence for decoupling of atmospheric CO2 and global climate during the Phanerozoic eon. Nature 408, 698–701.

Williams, J. W., Jackson, S. T. & Kutzbach, J. E. 2007 Projected distributions of novel and disappearing climates by 2100 AD. Proc. Natl. Acad. Sci. USA 104, 5738–5742.

Willig, M. R., Kaufman, D. M. & Stevens, R. D. 2003 Latitudinal gradients of biodiversity: pattern, process, scale and synthesis. Ann. Rev. Ecol. Syst. 34, 273–309.
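
Illustrative note on the rate definitions. For readers who wish to recompute the rate columns, the two kinds of rate defined under “Variable definitions” can be sketched in Python as below. This is a minimal sketch and not the code used to build the dataset: the function names and the example counts are hypothetical, chosen only to show the arithmetic of the simple per capita rate, (number of events/standing diversity)/duration, and of the boundary-crosser rates p and q of Foote (2000).

```python
import math


def per_capita_rate(n_events, standing_diversity, duration_myr):
    """Simple per capita rate: (number of events / standing diversity) / duration."""
    return (n_events / standing_diversity) / duration_myr


def foote_p(n_bt, n_t, duration_myr):
    """Origination rate p = -ln(Nbt / Nt) / dt.

    n_bt: taxa crossing both the bottom and top boundary of the interval.
    n_t:  taxa crossing the top boundary of the interval.
    """
    return -math.log(n_bt / n_t) / duration_myr


def foote_q(n_bt, n_b, duration_myr):
    """Extinction rate q = -ln(Nbt / Nb) / dt.

    n_b: taxa crossing the bottom boundary of the interval.
    """
    return -math.log(n_bt / n_b) / duration_myr


# Hypothetical interval (counts invented for illustration only):
# 100 taxa enter, 120 leave, 80 range through, over 10 Myr.
example_p = foote_p(n_bt=80, n_t=120, duration_myr=10.0)
example_q = foote_q(n_bt=80, n_b=100, duration_myr=10.0)
example_simple = per_capita_rate(n_events=20, standing_diversity=100, duration_myr=10.0)
print(example_p, example_q, example_simple)
```

The sketch does no input checking: an interval with no range-through taxa (Nbt = 0) or a zero duration raises an error rather than returning a rate, so such intervals would need to be handled separately, for example by recording “no data”.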