Supplementary Information for: "Two waves of diversification in mammals and reptiles of Baja California revealed by hierarchical Bayesian analysis"

Contents:
S1: Hierarchical approximate Bayesian computation
S2: Hierarchical population divergence model
S3: Summary statistic vector
S4: References
Figure S1: Multiple population-pair divergence model
Table S1: Parameters and their prior distributions

S1: Hierarchical Approximate Bayesian Computation

The hierarchical model employed in our ABC test for simultaneous divergence across Y taxon-pairs consists of sub-parameters (φ; within population-pair parameters) that are conditional on "hyper-parameters" (Φ) that describe the variability of φ among the Y population-pairs. For example, divergence times (τ) can vary across a set of population pairs conditional on the set of hyper-parameters (Φ), which vary according to their hyper-prior distribution. Instead of explicitly calculating the likelihood P(Data | φ, Φ) to obtain a posterior distribution, we sample from the posterior distribution P(φ, Φ | Data) by simulating the data K times under the coalescent model, using candidate parameters drawn from the prior distribution P(φ, Φ). A summary statistic vector D for each simulated dataset is then compared with the observed summary statistic vector in order to generate random observations from the joint posterior distribution f(φi, Φi | Di) by way of a rejection/acceptance algorithm (Weiss and von Haeseler 1998), followed by an optional weighted local regression step (Beaumont et al. 2002). Loosely speaking, hyper-parameter values are accepted and used to construct the posterior distribution with probabilities proportional to the similarity between the summary statistic vector from the observed data and the summary statistic vector calculated from the simulated data.

S2: Hierarchical Population Divergence Model

The hierarchical model consists of Y ancestral populations that split at divergence times TY = {τ1, …, τY} in the past (Supplementary Figure 1).
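Returning to the rejection/acceptance step of S1, the following Python sketch shows its basic logic with a hypothetical one-parameter toy model standing in for the coalescent simulations. The prior, the simulator, the Euclidean distance, and the choice to keep a fixed number of closest draws (rather than a fixed tolerance) are all assumptions made for illustration; the optional weighted local regression step of Beaumont et al. (2002) is not shown.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length summary-statistic vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rejection_abc(observed, sample_prior, simulate, n_sims, n_keep, rng):
    """Plain rejection ABC (no regression adjustment).

    Draws n_sims parameter sets from the prior, simulates a summary-statistic
    vector for each, and keeps the n_keep draws whose simulated vectors lie
    closest to the observed vector. The retained draws approximate the
    posterior; the implied tolerance is the n_keep-th smallest distance.
    """
    scored = []
    for _ in range(n_sims):
        params = sample_prior(rng)
        stats = simulate(params, rng)
        scored.append((euclidean(stats, observed), params))
    scored.sort(key=lambda pair: pair[0])
    return [params for _, params in scored[:n_keep]]


def toy_prior(rng):
    # Hypothetical one-parameter prior standing in for P(phi, Phi).
    return {"tau": rng.uniform(0.0, 10.0)}


def toy_simulate(params, rng):
    # Hypothetical stand-in for a coalescent simulation: two noisy
    # summary statistics whose expectations increase with tau.
    tau = params["tau"]
    return [tau + rng.gauss(0.0, 0.5), 2.0 * tau + rng.gauss(0.0, 0.5)]


rng = random.Random(1)
accepted = rejection_abc([3.0, 6.0], toy_prior, toy_simulate, 20000, 200, rng)
post_mean = sum(p["tau"] for p in accepted) / len(accepted)
print(f"approximate posterior mean of tau: {post_mean:.2f}")
```

Keeping the closest fraction of simulations fixes the acceptance rate in advance; fixing a distance threshold instead is an equally common variant of the same rejection step.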
The hyper-parameter set Φ quantifies the degree of variability in these Y divergence times across the Y ancestral populations and their Y descendant population pairs: (1) Ψ, the number of possible divergence times (1 ≤ Ψ ≤ Y); (2) E(τ), the mean divergence time; and (3) Ω, the ratio of the variance to the mean of these Y divergence times, Var(τ)/E(τ). The sub-parameters for the i-th population pair (φi) are allowed to vary independently across the Y population pairs. The sub-parameters consist of each taxon-pair's divergence time and demographic parameters drawn from sub-priors (Supplementary Table 1). Each pair of daughter populations a and b descends from an ancestral population at divergence time τ. The population mutation parameters (θ = 2Nμ, where N is the female effective population size and μ is the per gene per generation mutation rate) for daughter populations a and b are θa and θb, whereas θ'a and θ'b are the population mutation parameters of daughter populations a and b from the time of divergence until the end of the bottleneck. For each of the Y taxon-pairs, θa + θb = θ. The daughter populations (θ'a and θ'b) then grow exponentially to sizes θa and θb. The population mutation parameter of each ancestral population is denoted θA. Each divergence time parameter τ is scaled by θAVE, where θAVE is a constant determined by the mean of the sub-prior for θ (Supplementary Table 1). The uniform prior for θ spans all of the empirical estimates of θ from a comparative phylogeographic dataset obtained using either Tajima's (1983) or Watterson's (1975) estimator of θ. In mammals, the maximum bound of the sub-prior for θ was θmax = 50.0, whereas in squamate reptiles it was θmax = 200.0.

S3: Summary Statistic Vector

The summary statistic vector D we employ consists of up to six summary statistics collected from each of the Y population pairs (π, θW, Var(π − θW), πnet, πb, and πw).
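The within-pair statistics listed above (π, θW, and πnet, the last being the difference πb − πw) can be computed directly from aligned sequences. The following Python sketch is a minimal illustration under stated assumptions: sequences are equal-length strings without missing data, each population contains at least two sequences, π and θW are computed from the pooled sequences of a pair, and πw is taken as the mean of the two within-population values. All function names are hypothetical, and Var(π − θW) is not computed. The last function assembles one row per taxon-pair, ordered by ascending πnet as in the description of D that follows.

```python
from itertools import combinations, product


def n_diffs(s1, s2):
    """Number of sites at which two aligned sequences differ."""
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


def mean_pairwise(seqs):
    """pi: mean number of pairwise differences among all sequences in seqs."""
    pairs = list(combinations(seqs, 2))
    return sum(n_diffs(a, b) for a, b in pairs) / len(pairs)


def watterson_theta(seqs):
    """theta_W: segregating sites divided by a_n = sum_{k=1}^{n-1} 1/k."""
    n_seg = sum(len(set(column)) > 1 for column in zip(*seqs))
    a_n = sum(1.0 / k for k in range(1, len(seqs)))
    return n_seg / a_n


def pi_net(pop_a, pop_b):
    """Nei and Li's net divergence, pi_b - pi_w.

    Assumption: pi_w is the mean of the two within-population pi values.
    """
    n_between = len(pop_a) * len(pop_b)
    pi_b = sum(n_diffs(x, y) for x, y in product(pop_a, pop_b)) / n_between
    pi_w = (mean_pairwise(pop_a) + mean_pairwise(pop_b)) / 2.0
    return pi_b - pi_w


def summary_matrix(pairs):
    """One row [pi_net, pi, theta_W] per taxon-pair, sorted by ascending pi_net.

    Var(pi - theta_W) is omitted from this sketch.
    """
    rows = []
    for pop_a, pop_b in pairs:
        pooled = list(pop_a) + list(pop_b)
        rows.append([pi_net(pop_a, pop_b), mean_pairwise(pooled), watterson_theta(pooled)])
    return sorted(rows, key=lambda row: row[0])
```

For example, with pop_a = ["AAAA", "AAAT"] and pop_b = ["TTAA", "TTAT"], the pooled π is 12 differences over 6 pairs, and 3 segregating sites among 4 sequences give θW = 3 / (1 + 1/2 + 1/3).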
This includes π, the average number of pairwise differences among all sequences within each population pair; θW, the number of segregating sites within each population pair normalized for sample size (Watterson 1975); Var(π − θW) in each population pair; and πnet, Nei and Li's net nucleotide divergence between each pair of populations (Nei and Li 1979). This last summary statistic is the difference πb − πw, where πb is the average number of pairwise differences between the two populations of a pair and πw is the average number of pairwise differences within the sister pair of descendant populations. The vector D is a two-dimensional array in which the columns correspond to the classes of summary statistics and the rows correspond to the taxon-pairs (Y) in a comparative phylogeographic dataset. We use up to four classes of summary statistics: π, πnet, θW, and Var(π − θW). Given these four classes of summary statistics collected per taxon-pair across Y taxon-pairs, the summary statistic vector

$$
D = \begin{pmatrix}
(\pi_{net})_1 & \pi_1 & (\theta_W)_1 & \mathrm{Var}(\pi - \theta_W)_1 \\
\vdots & \vdots & \vdots & \vdots \\
(\pi_{net})_Y & \pi_Y & (\theta_W)_Y & \mathrm{Var}(\pi - \theta_W)_Y
\end{pmatrix}
$$

would include 4Y summary statistics. For each dataset of Y taxon-pairs, rows 1 through Y within each column of D are ordered by ascending values of net divergence (πnet).

S4: References

Beaumont, M. A., Zhang, W. & Balding, D. J. 2002 Approximate Bayesian computation in population genetics. Genetics 162, 2025-2035.
Hickerson, M. J., Dolman, G. & Moritz, C. 2006 Comparative phylogeographic summary statistics for testing simultaneous vicariance across taxon-pairs. Mol Ecol 15, 209-224.
Nei, M. & Li, W. 1979 Mathematical model for studying variation in terms of restriction endonucleases. Proc Natl Acad Sci USA 76, 5269-5273.
Weiss, G. & von Haeseler, A. 1998 Inference of population history using a likelihood approach. Genetics 149, 1539-1546.

Supplementary Figure 1. Depiction of the multiple population-pair divergence model used for the ABC estimates of Ψ, E(τ), and Ω.
(A): The white lines depict a gene tree, with TMRCA being the time to the most recent common ancestor of the gene sample; the black tree containing the gene tree is the population/species tree. (B): Parameters in the multiple population-pair divergence model. The population mutation parameter, θ, is 2Nμ, where 2N is the summed haploid effective female population size of each pair of daughter populations (μ is the per gene per generation mutation rate). The time since isolation of each population pair is denoted by τ (in units of 2NAVE generations, where NAVE is the parametric expectation of N across the Y population pairs given the prior distribution). Population mutation parameters for daughter populations a and b are θa and θb, whereas θ'a and θ'b are the population mutation parameters for the sizes of daughter populations a and b from the time of divergence until the end of the bottleneck. The daughter populations (θ'a and θ'b) then grow exponentially to sizes θa and θb. The population mutation parameter for each ancestral population is denoted θA. The migration rate between each pair of daughter populations is denoted M (number of effective migrants per generation). (C): Example of four population-pairs where the parameters in (B) are drawn from uniform priors.

Supplementary Table 1. Parameters and their prior distributions. Hyper-parameters (Φ) are randomly drawn once per set of Y taxon-pairs. Sub-taxon parameters (φ) are randomly drawn once per i-th taxon-pair. The per generation per gene DNA mutation rate (μ) is uniform across all taxa.

Hyper-parameters (Φ)
Parameter | Description | Prior distribution
μ | Per gene per generation mutation rate | Assumed to be uniform across taxon-pairs
Ψ | Number of possible divergence times across Y taxon-pairs | Discrete uniform (1, Y)
T = {t1, …, tΨ} | Matrix of possible divergence times (t) among Y taxon-pairs | Each t within T drawn from uniform (0, τmax)
TY = {τ1, …, τY} | Matrix of Y divergence times among Y taxon-pairs | Each τ within TY randomly drawn with replacement from the T matrix
E(τ) | The mean τ across Y taxon-pairs, calculated from τ1, …, τY | Determined by τmax, Ψ, Y
Ω | Var(τ)/E(τ), the variance of τ divided by the mean of τ across Y taxon-pairs, calculated from τ1, …, τY | Determined by τmax, Ψ, Y

Sub-parameters (φ)
Parameter | Description | Prior distribution
τi, i = 1, …, Y | Each (i-th) taxon-pair's divergence time, drawn randomly (with replacement) from the divergence times within matrix T = {t1, …, tΨ} | Uniform (0, τmax); τmax = 10.0
θi, i = 1, …, Y | Total population mutation parameter of each taxon-pair, where θi = 2Niμ | Uniform (0.01, θmax); θmax = 60.0 in mammals, θmax = 200.0 in squamates
(θa)i, (θb)i, i = 1, …, Y | Population mutation parameters for daughter populations a and b, where θi = (θa + θb)i | (θa)i = uniform (0.0, θi); (θb)i = θi − (θa)i
(θA)i, i = 1, …, Y | Population mutation parameter for the ancestral population size of the i-th taxon-pair | Uniform (0.01, (θA)max)
Bottleneck magnitude, daughter populations a and b, i = 1, …, Y | Coefficient of population bottleneck magnitude in daughter populations a and b at the beginning of the population bottleneck | Uniform (0.01, …)
Bottleneck timing, daughter populations a and b, i = 1, …, Y | Length of time between the beginning of the bottleneck in daughter populations a and b and the present time | Uniform (0.0, τi)
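The hierarchical sampling scheme in Supplementary Table 1 can be sketched in Python as follows. This is an illustrative reading of the table, not the authors' simulation code: it draws Ψ, the candidate times T, the Y divergence times τi (with replacement from T), and the per-pair θ parameters, and then computes E(τ) and Ω. Assumptions: Ω uses the population variance of τ1, …, τY; the bound (θA)max is passed as a hypothetical argument theta_anc_max because no value is given in the table; bottleneck parameters and the coalescent simulation itself are omitted.

```python
import random
import statistics


def sample_hierarchical_prior(n_pairs, tau_max, theta_max, theta_anc_max, rng):
    """Draw one parameter set for Y = n_pairs taxon-pairs (sketch of Table S1).

    theta_anc_max is a hypothetical stand-in for the (theta_A)max bound;
    bottleneck parameters are not sampled.
    """
    # Hyper-parameters: Psi ~ discrete uniform(1, Y), then Psi candidate times.
    psi = rng.randint(1, n_pairs)
    candidate_times = [rng.uniform(0.0, tau_max) for _ in range(psi)]
    # Each pair's tau_i is drawn with replacement from the candidate times.
    taus = [rng.choice(candidate_times) for _ in range(n_pairs)]
    mean_tau = statistics.mean(taus)
    # Omega = Var(tau) / E(tau); population variance assumed here.
    omega = statistics.pvariance(taus) / mean_tau

    pairs = []
    for tau in taus:
        theta = rng.uniform(0.01, theta_max)   # total theta of the pair
        theta_a = rng.uniform(0.0, theta)      # daughter population a
        pairs.append({
            "tau": tau,
            "theta": theta,
            "theta_a": theta_a,
            "theta_b": theta - theta_a,        # so theta_a + theta_b = theta
            "theta_anc": rng.uniform(0.01, theta_anc_max),
        })
    return {"psi": psi, "T": candidate_times, "E_tau": mean_tau,
            "omega": omega, "pairs": pairs}


rng = random.Random(42)
# Y = 9 and theta_anc_max = 60.0 are arbitrary illustrative values.
draw = sample_hierarchical_prior(9, 10.0, 60.0, 60.0, rng)
print(draw["psi"], round(draw["E_tau"], 3), round(draw["omega"], 3))
```

Because each τi is drawn with replacement from T, the number of distinct divergence times realized in a draw can be smaller than Ψ; when all Y pairs share a single time, Ω is 0, which corresponds to fully simultaneous divergence.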