Guschanski_et_al_SI


Supporting Information

Guschanski et al.

SI MATERIAL AND METHODS

Definition of Ancestral Ranges for Lagrange Analysis

To reconstruct ancestral ranges in Lagrange v20110117 (Ree et al. 2005; Ree & Smith 2008), we defined the following geographical regions (Fig. 1):

the Congo basin;

northern DRC - north of the Congo River and extending into Central African Republic, confined to the west by the Oubangui River and to the east by the Rift Valley;

northern Rift Valley - the northern part of the Albertine Rift Valley, extending as far south as Rwanda and Burundi and north to the border with Sudan;

Upper Guinean region and Lower Guinean region - the Dahomey Gap defines the border between the two;

Angola - corresponding to the northern part of the country;

southeastern DRC - Katanga (DRC), including the eastern part of Angola and the northwestern part of Zambia;

southeastern Africa - southern part of the East African region;

northeastern Africa - northern part of the East African region, extending to the Rift Valley in Tanzania and Uganda and reaching the border with Ethiopia;

Zambia - northeastern part of the country and southern Rift Valley;

Ethiopia/Sudan - only the southernmost part of Sudan is taken into account.

The definition of biogeographical provinces by Udvardy (1975) and of ecoregions by Olson et al. (2001) was used as the basis for the above subdivision. However, we also took into account previous studies of plant endemism (Linder 2001) and biogeographical analyses of plant and animal taxa (Moodley & Bruford 2007; Couvreur et al. 2008). Importantly, these regions are based on the current distribution of guenon species.

LASER Analyses on Different Taxonomic Units

The birth–death likelihood (BDL) method as implemented in LASER (Rabosky 2006) relies strongly on the taxonomic scheme. Therefore, to account for the current taxonomic uncertainty of the guenon radiation, we chose to independently analyze three different taxonomic scenarios: (A) trees pruned to a single representative per species following the taxonomy of Grubb et al. (2003) (Table S1); (B) trees pruned to a single representative per species following Wilson & Reeder (2005); and (C) trees pruned to a single representative per subspecies following Grubb et al. (2003). The result of the last scenario is presented in the main text.

For each analysis, the critical value of ΔAIC_RC (the difference in AIC between the best rate-constant (RC) model and each of the rate-variable models) was calculated from a null distribution of ΔAIC_RC generated by fitting the candidate models to a set of 1,000 phylogenies simulated with the same number of taxa as the original chronograms and the ML birth rate estimated by LASER. To account for incomplete taxon sampling, we also calculated the critical ΔAIC_RC from a set of 1,000 phylogenies simulated with complete sampling and the ML birth rate, but pruned randomly to reach the sampled number of taxa.

Finally, to test whether the best-fit model for the maximum clade credibility trees was consistently selected across the posterior distribution of branching times, we fitted BDL models to each of 1,000 BEAST chronograms under the three taxonomic scenarios, generating a posterior distribution of ΔAIC_RC scores.
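The simulation-based test described above can be illustrated with a minimal Python sketch. It is not the LASER implementation: the function names are hypothetical, the sign convention (rate-constant AIC minus rate-variable AIC, so that positive values favour the rate-variable model) is an assumption, and the null values are assumed to come from model fits to the simulated phylogenies, which are not reproduced here.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def delta_aic_rc(aic_rate_constant, aic_rate_variable):
    """Difference in AIC between the best rate-constant model and a
    rate-variable model (assumed sign: positive favours rate variation)."""
    return aic_rate_constant - aic_rate_variable


def critical_value(null_deltas, alpha=0.05):
    """Smallest null value that at most a fraction alpha of the
    simulated null values exceed."""
    ordered = sorted(null_deltas)
    n_allowed_above = int(alpha * len(ordered))
    return ordered[len(ordered) - n_allowed_above - 1]


def empirical_p_value(observed, null_deltas):
    """Fraction of simulated null values at least as large as the observed one."""
    n_exceeding = sum(1 for value in null_deltas if value >= observed)
    return n_exceeding / len(null_deltas)
```

In use, `null_deltas` would hold one value per simulated tree (1,000 in the analysis described above), and `observed` would be the value obtained from the empirical chronogram.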


SI RESULTS

Which Factors Influence DNA Recovery from Museum Specimens?

We tested whether the recovery of DNA from museum samples (measured as the number of fragments that could be mapped to the reference genome) differed between sample sources (dried tissue from skeleton and skull, teeth, bone material, nasal bones, ear cartilage, finger tips) or between museums, and whether it depended on the age of the specimen or the weight of the sample (Table S2). We also analyzed whether the extraction set and the enrichment pool had an effect. A linear regression model in R first showed that age and weight did not affect the recovery of endogenous DNA (p = 0.89 and p = 0.97, respectively). Specimen ages spanned more than 100 years, with collection dates from 1894 to 1997. Sample weight ranged from 10 to 253 mg and thus differed by more than 25-fold. Next, accounting for all other variables, we tested whether samples from different museums performed differently, which could indicate different storage conditions. Only samples from the Royal Belgian Institute of Natural Sciences (RBINS) showed significantly lower DNA recovery than samples from other museums (pairwise t-test, adjusted p-values = 0.1-0.002). Different extraction conditions for the majority of these samples (9 out of 11) could explain this result: they were extracted in a different laboratory from the rest, and no siliconized tubes were used during extraction or storage. We therefore excluded the data from this museum from further tests. Next, we tested whether the source materials differed. No source material performed differently from any other (pairwise t-test, adjusted p-values = 1), in agreement with Mason et al. (2011). Extraction sets differed significantly from each other, with one set performing exceptionally well (May_4: adjusted p-values = 0.05 to 2×10^-16). There was also a significant difference among enrichment pools, with two pools performing worse than the others: one contained the combination of three bait species (3 species pool 2) and the other was captured with Erythrocebus patas (E. patas pool 1) (Table S2).
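The text reports adjusted p-values for pairwise t-tests but does not name the correction. As an illustration only, the sketch below implements the Holm step-down adjustment, which is the default of R's `pairwise.t.test`; whether this default was used here is an assumption, and the function name is hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment of a list of raw p-values.

    The i-th smallest p-value (i = 0, 1, ...) is multiplied by (m - i),
    the results are made non-decreasing in that order, and capped at 1.
    Adjusted values are returned in the input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[index])
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted
```

Because the adjusted values are capped at 1, a set of comparisons with uniformly large raw p-values yields adjusted p-values of exactly 1, the pattern reported above for the source materials.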

Sample Misidentification

To avoid possible biases in taxonomic and phylogenetic interpretation as a result of specimen mix-up, we collected and sequenced multiple representatives of the same taxon wherever possible. In many cases the representatives were collected in different museums. For instance, the unexpected placement of Cercopithecus solatus apart from other members of the C. preussi species group was confirmed by two independent specimens collected in the Natural History Museum, London, United Kingdom (NHM) and the Royal Museum for Central Africa, Tervuren, Belgium (RMCA). The only species within Cercopithecini that were represented by a single specimen and whose placement was not supported by additional samples were C. dryas and C. hamlyni. The former was collected from the type specimen in RMCA; the latter was sequenced from a high-quality sample derived from an animal in Leipzig Zoo, Germany, and obtained from the DPZ (German Primate Center, Göttingen, Germany).

We identified three cases of mislabeling of museum-preserved specimens or mix-up during sample collection. Notably, all of these cases are confined to specimens from the Museum für Naturkunde in Berlin (MfN), Germany. Based on its phylogenetic placement, the specimen labeled Lophocebus albigena is most likely a representative of the genus Cercocebus, whereas two specimens labeled Cercopithecus nictitans fall within the Mona group. Another potential case could involve a specimen designated as Chlorocebus tantalus from Togo. While specimens of Chl. tantalus from Cameroon and the Central African Republic cluster together, this specimen forms a sister branch to the cluster of Chl. pygerythrus, Chl. cynosuros, Chl. tantalus and Chl. aethiops. Given the possibility of sample confusion, we refrain from any phylogenetic conclusions until further evidence has been collected.

Differential Diversification through Time Estimated with LASER

The Yule-3-rate model consistently obtained the lowest AIC score in all three taxonomic scenarios (Table S7). It provided a significantly better fit than the best rate-constant model (pure birth) for taxonomic scenarios B (Wilson & Reeder (2005) species-level tree, ΔAIC_RC = 11.72, P < 0.01) and C (Grubb et al. (2003) subspecies tree, ΔAIC_RC = 13.75, P < 0.01), regardless of whether the critical ΔAIC_RC values were estimated from trees simulated with or without accounting for incomplete taxon sampling. For taxonomic scenario A (Grubb et al. (2003) species-level tree), Yule-3-rate was not significantly better than the rate-constant models (ΔAIC_RC = 6.56, P = 0.06), although the test approached significance. Therefore, rate constancy could not be rejected for this scenario. The maximum likelihood estimate of the first shift in diversification rate was the same for all three taxonomic scenarios, i.e. 2.77 Ma (Table S7). According to the model, the rate of diversification across the guenon phylogeny as a whole increased 2.2- to 2.8-fold (r2/r1). Near the present, all three trees also supported a strong, more than sevenfold decrease in diversification rate. However, the date of the second shift differed depending on the taxonomic scenario (A: 1.21, B: 1.08, C: 0.44 Ma). This is to be expected, given that diversification at the subspecies level occurred more recently than at the species level.


Ancestral Ranges and Dispersal Events within Species Groups

Range overlap increased with time (Fig. 4 and S8), which for nuclear data was approximated by branch length. Based on nuclear data, the regression of range overlap on branch length was significantly positive (adjusted R-squared = 0.25, p = 0.03), and the Y-intercept of -0.1 was not significantly different from 0 (p = 0.58).
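For readers unfamiliar with the statistics quoted above, the following Python sketch shows how the slope, intercept and adjusted R-squared of a simple linear regression are obtained. It is a generic ordinary least squares calculation, not the authors' analysis script; the function name is hypothetical and any input values used with it are illustrative.

```python
def simple_ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared, adjusted_r_squared) for a
    model with a single predictor.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    # One predictor: residual degrees of freedom are n - 2.
    adjusted = 1.0 - (1.0 - r_squared) * (n - 1) / (n - 2)
    return slope, intercept, r_squared, adjusted
```

Here `x` would hold the branch lengths and `y` the corresponding range overlap values; an intercept close to zero corresponds to negligible range overlap at the time of divergence.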


Inferring possible geographical ranges for the ancestral nodes shed light on the dispersal events within individual guenon species groups. It also illustrated how climatic events could have simultaneously affected speciation in multiple species groups. Because we consider gene divergence times here and not population divergences, which can be considerably younger, we can only speculate about the role of particular climatic events in guenon speciation.

C. mitis species group.

The C. mitis group seems to have originated around the watershed of the lower Congo River at the border between Lower Guinea and Angola (Fig. 1). After the initial split around 2.4 Ma, the western populations dispersed into Lower Guinea to become C. nictitans and southwards towards Angola to become C. mitis mitis. The eastern populations moved south- and eastwards towards Zambia and there split into two sub-lineages around 1.7 Ma, one to the west of the Albertine Rift Valley and the other to the east. This time period coincides with the major uplifts in the Malawi Rift (Ebinger et al. 1993), which might have triggered the separation. Members of the western sub-lineage occupied the Congo basin (C. m. heymansi) and also moved northwards along the western side of the Albertine Rift Valley. Upon entering the “northern Rift Valley” region they diversified into C. m. kandti, C. m. doggetti and C. m. stuhlmanni. Cercopithecus m. kolbi, which nowadays occurs in central Kenya, could represent a remnant population of the former eastwards range expansion. Members of the eastern sub-lineage dispersed north- and southwards along the East African coast. This sub-lineage contains C. m. boutourlini, C. m. albogularis, C. m. albotorquatus, C. m. monoides, C. m. erythrarchus, C. m. moloneyi and C. m. labiatus and represents the most recent radiation event within the Mitis group (less than 1 Ma).

C. cephus species group.

After splitting from C. preussi / C. lhoesti, the C. cephus group originated at the contact zone between Upper and Lower Guinea about 2.2 Ma. The most ancestral lineage, including C. erythrogaster pococki and C. petaurista, diversified in Upper Guinea. The current distribution of the species in this lineage suggests a scenario of gradual westwards migration. The common ancestor entered Nigeria, which is now occupied by C. erythrogaster, and moved westwards through Togo. It crossed the Dahomey Gap, and gene flow across this region seems to have ceased ca. 1.4 Ma, which gave rise to C. petaurista. This species subsequently diversified into subspecies in Upper Guinea. The second lineage of the C. cephus group diversified within Lower Guinea. It contains C. erythrotis as sister species to a subset of C. cephus, consisting of C. c. ngottoensis and an unidentified subspecies from Cameroon. Together, this C. cephus subgroup occupies the northernmost range of the species. The C. erythrotis subspecies, one of which (C. e. erythrotis) is endemic to Bioko Island, split ca. 0.4 Ma, slightly later than the C. preussi subspecies (C. p. preussi and C. p. insularis, ca. 0.7 Ma), which show a similar distribution. The third lineage, containing C. cephus cephus (collected south of the lower Congo River) and C. ascanius, occurs further to the south. C. ascanius diversified into multiple subspecies in and around the Congo basin in one of the most recent radiation events in guenons (less than 1 Ma). The most ancestral lineage within C. ascanius, C. a. schmidti, occurs north and east of the Congo River, while all other subspecies are distributed within and south of the Congo basin. In contrast to the forest cover fluctuations that seem to be responsible for many of the previously described diversification events, the changing course of the rivers might have played a more pronounced role for C. ascanius. Many of the subspecies are clearly confined to interfluvial areas, whose bounding rivers might form impermeable barriers to dispersal. The Congo basin is a region of low altitudinal profile and was suggested to have been occupied by a large water body as recently as the Pliocene (Goudie 2005). It is thus conceivable that the course of the Congo River tributaries was established only recently, which in turn would explain the young separation ages of C. ascanius within this area.


C. mona species group.

The C. mona group shows a pattern of diversification strikingly similar to that of the C. cephus group. This similarity and the large range overlap between the two species groups suggest that the same environmental factors have played a role in creating today’s diversity and distribution. It is likely that the ancestor of the C. mona group occupied the contact zone of Lower and Upper Guinea, corresponding to today’s distribution range of C. mona. One of the lineages remained to the west of the Nigeria/Cameroon border, while the other dispersed eastwards. The western lineage entered the Dahomey Gap, in which C. mona still occurs today (Campbell et al. 2008). About 2.7 Ma it gave rise to C. campbelli and C. lowei, which are distributed further to the west, similar to the stepwise dispersal of C. erythrogaster and C. petaurista (see above). The eastern lineage, containing C. pogonias, gave rise to two clusters. One of them occurs west and north of the Congo River and contains C. p. pogonias, C. p. nigripes and C. p. grayi. Another member of this cluster could be C. p. schwarzianus from western DRC, as its relationship with the sister cluster is only weakly supported (Fig. S3). Finally, the cluster containing the other members of C. pogonias has diversified in northeastern Congo (C. p. denti) and within the Congo basin (C. p. pyrogaster, C. p. wolfi and C. p. elegans). This last radiation within the Congo basin, in particular, occurred at about the same time (ca. 1 Ma) as the diversification of C. ascanius in the same geographical area. Again, the rapidly changing drainage system of the Congo River tributaries could have contributed to this diversification.

It is interesting to note the unexpected placement of the C. mona specimen collected from the island of Principe (São Tomé) within C. pogonias (Fig. S3 and Fig. S4). Island radiations of guenons represent a special case, which includes dispersal by humans (Horsburgh et al. 2003). It is possible that this specimen reflects past hybridization between C. pogonias and C. mona. Importantly, hybrids between C. mona and C. pogonias have previously been described (Struhsaker 1970; Detwiler et al. 2005).


C. aethiops species group (genus Chlorocebus).

Recovering an ancient geographical signal for the widely distributed members of the C. aethiops group is challenging. It is conceivable that the ancestor originated in the area around Lower Guinea and northern DRC. Chl. sabaeus, the most ancestral member of the C. aethiops group, could be the result of an ancient westwards movement of this ancestor. Another lineage subsequently spread eastwards north of the Congo River before entering the Congo basin, where its remnant population, represented by C. dryas, still lives today. The northern population gave rise to the western Chl. tantalus, the eastern Chl. aethiops, and the Tanzanian Chl. pygerythrus. Another branch of this dispersal moved southwards and gave rise to Chl. cynosuros and the South African Chl. pygerythrus.


REFERENCES

Campbell G, Teichroeb J, Paterson JD 2008. Distribution of diurnal primate species in Togo and Benin. Folia Primatol. (Basel). 79:15-30.

Couvreur TL, Chatrou LW, Sosef MS, Richardson JE 2008. Molecular phylogenetics reveal multiple tertiary vicariance origins of the African rain forest trees. BMC Biol. 6:54.

Detwiler KM, Burrell AS, Jolly CJ 2005. Conservation Implications of Hybridization in African Cercopithecine Monkeys. Int. J. Primatol. 26:661-684.

Ebinger CJ, Deino AL, Tesha AL, Becker T, Ring U 1993. Tectonic Controls on Rift Basin Morphology: Evolution of the Northern Malawi (Nyasa) Rift. J. Geophys. Res. 98:17821-17836.

Goudie AS 2005. The drainage of Africa since the Cretaceous. Geomorphology. 67:437-456.

Grubb P, Butynski TM, Oates JF, Bearder SK, Disotell TR, Groves CP, Struhsaker TT 2003. Assessment of the diversity of African primates. Int. J. Primatol. 24:1301-1357.

Horsburgh KA, Matisoo-Smith E, Glenn ME, Bensen KJ 2003. Genetic Study of Translocated Guenons: Cercopithecus mona on Grenada. In: Glenn ME, Cords M, editors. The Guenons: Diversity and Adaptation in African Monkeys. New York. Kluwer Academic/Plenum Publishers. p. 289-306.

Linder HP 2001. Plant diversity and endemism in sub-Saharan tropical Africa. J. Biogeogr. 28:169-182.

Mason VC, Li G, Helgen KM, Murphy WJ 2011. Efficient cross-species capture hybridization and next-generation sequencing of mitochondrial genomes from noninvasively sampled museum specimens. Genome Res. 21:1695-1704.

Moodley Y, Bruford MW 2007. Molecular biogeography: towards an integrated framework for conserving pan-African biodiversity. PLoS ONE. 2:e454.

Olson DM, Dinerstein E, Wikramanayake ED, Burgess ND, Powell GVN, Underwood EC, D'Amico JA, Itoua I, Strand HE, Morrison JC, Loucks CJ, Allnutt TF, Ricketts TH, Kura Y, Lamoreux JF, Wettengel WW, Hedao P, Kassem KR 2001. Terrestrial ecoregions of the world: A new map of life on Earth. Bioscience. 51:933-938.

Rabosky DL 2006. Likelihood methods for detecting temporal shifts in diversification rates. Evolution. 60:1152-1164.

Ree RH, Moore BR, Webb CO, Donoghue MJ 2005. A likelihood framework for inferring the evolution of geographic range on phylogenetic trees. Evolution. 59:2299-2311.

Ree RH, Smith SA 2008. Maximum likelihood inference of geographic range evolution by dispersal, local extinction, and cladogenesis. Syst. Biol. 57:4-14.

Struhsaker TT 1970. Phylogenetic implications of some vocalizations of Cercopithecus monkeys. In: Napier JR, Napier PH, editors. Old World Monkeys: evolution, systematics and behaviour. New York. Academic Press. p. 365-444.

Udvardy MDF 1975. A classification of the biogeographical provinces of the world. Morges, Switzerland: IUCN.

Wilson DE, Reeder DM 2005. Mammal species of the world: A taxonomic and geographic reference. Baltimore: The Johns Hopkins University Press.


TABLE LEGENDS

Table S1: Specimen taxonomy, collection locality and GenBank accession numbers.

The taxonomic classifications by Grubb et al. (2003) and Wilson and Reeder (2005) are juxtaposed and the taxonomic correspondence of the specimens is given. The table lists the country and locality of origin for each specimen and indicates missing taxa.

Table S2: Sample source, extraction, sequencing and enrichment success.

The basic data for each extracted sample are summarized. The first column lists the name of the specimen as given in museum records. Tree label: name of each sample in the phylogenetic trees (Figs. 1, 2, 3, S3, S5, S6). Museum number: museum-internal catalog number. Collection date: date the specimen was collected in the wild and thus the date that the animal was killed or died. Extracted part: part of the specimen from which the sample for DNA extraction was collected. Extraction set: group of samples that were extracted together. Each of the seven extraction sets was complemented by two negative controls. Enrichment pool: indicates which samples were pooled for mitochondrial genome capture (C. m. monoides pool1-3, E. patas pool1-3, and 3species pool1-3, which corresponds to the combination of C. m. monoides, E. patas and C. diana mtDNA genomes). Number of merged fragments: number of fragments that were sequenced and merged for each sample. Number of unique mapped fragments: number of merged fragments with unique start and end coordinates that could be mapped to the reference genome. Percentage of contaminated reads: percentage of all reads that had a higher alignment score with the set of reference sequences that included the human mtDNA genome than with the set of only Cercopithecini reference sequences (see main text for more details). Average coverage: average sequencing depth per position of each sequenced mtDNA genome. Percentage of fragments mapped to reference: corresponds to the efficiency of the enrichment method in capturing fragments with mitochondrial sequence. The next three columns summarize the number of sequenced nucleotides, nucleotides not sequenced, and ambiguous nucleotides (1-fold coverage or inconsistent base calling) per genome. We further indicate whether the genome had more than 13,000 sequenced nucleotides and was thus used in the subset of analyses that rely on almost complete mitochondrial genomes.
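Two of the quantities defined in this legend can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used in the study: fragments are represented as hypothetical (start, end) tuples with 1-based inclusive coordinates on a linear reference, and average coverage is assumed to be computed from the de-duplicated fragments.

```python
def unique_fragments(fragments):
    """Keep one fragment per distinct (start, end) coordinate pair,
    preserving the order of first occurrence."""
    seen = set()
    unique = []
    for start, end in fragments:
        if (start, end) not in seen:
            seen.add((start, end))
            unique.append((start, end))
    return unique


def average_coverage(fragments, genome_length):
    """Mean sequencing depth per position: total number of bases
    covered by all fragments divided by the genome length."""
    covered_bases = sum(end - start + 1 for start, end in fragments)
    return covered_bases / genome_length
```

Collapsing fragments that share both coordinates removes likely amplification duplicates, which is why the count of unique mapped fragments, rather than of all mapped fragments, serves as the measure of DNA recovery.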

Table S3: Mitochondrial primers.

Sequences of the primers used to produce the mitochondrial genome from a high-quality sample (Cercopithecus hamlyni). Primers used to generate long-range PCR products for bait construction are also listed. S = sequencing primer, R = re-amplification from the initial long-range PCR product.

Table S4: Best partitioning scheme identified with PartitionFinder for mtDNA data using only RAxML evolutionary models and all models. The substitution model and the data subsets are listed for each data partition.

Table S5: Taxonomic representation of nuclear data from Perelman et al. 2011.

Table S6: Lagrange analysis: geographic species distribution, adjacency matrix and allowed ranges. Range assignment as in SI Material and Methods: A: the Congo basin; B: northern DRC; C: northern Rift Valley; D: Upper Guinean region; G: Lower Guinean region; I: Angola; J: southeastern DRC; K: southeastern Africa; L: northeastern Africa; M: Zambia; N: Ethiopia/Sudan.
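To illustrate how an adjacency matrix can constrain the set of allowed ranges, the Python sketch below encodes the area codes listed in this legend and enumerates the ranges whose areas are all connected. The adjacency shown is a hypothetical four-area example, not the matrix in Table S6, and treating allowed ranges as connected sets of areas is an assumption about one plausible rule rather than a description of the settings used in the Lagrange analysis.

```python
from itertools import combinations

# Area codes as listed in the Table S6 legend.
REGION_CODES = {
    "A": "Congo basin", "B": "northern DRC", "C": "northern Rift Valley",
    "D": "Upper Guinean region", "G": "Lower Guinean region", "I": "Angola",
    "J": "southeastern DRC", "K": "southeastern Africa",
    "L": "northeastern Africa", "M": "Zambia", "N": "Ethiopia/Sudan",
}

# Hypothetical adjacency for four areas only; NOT the matrix in Table S6.
EXAMPLE_ADJACENCY = {
    "A": {"B", "J"},
    "B": {"A", "C"},
    "C": {"B"},
    "J": {"A"},
}


def is_connected(areas, adjacency):
    """True if every area in the range can be reached from any other
    by stepping only through adjacent areas that are in the range."""
    wanted = set(areas)
    if not wanted:
        return False
    start = sorted(wanted)[0]
    visited = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbour in adjacency.get(current, ()):
            if neighbour in wanted and neighbour not in visited:
                visited.add(neighbour)
                stack.append(neighbour)
    return visited == wanted


def enumerate_allowed_ranges(adjacency, max_areas):
    """List all connected ranges of up to max_areas areas, smallest first."""
    codes = sorted(adjacency)
    allowed = []
    for size in range(1, max_areas + 1):
        for combo in combinations(codes, size):
            if is_connected(combo, adjacency):
                allowed.append("".join(combo))
    return allowed
```

Restricting ranges in this way keeps the number of candidate ancestral ranges manageable, since the number of unconstrained subsets grows exponentially with the number of areas.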

Table S7: Tests of diversification models in LASER.

Models were fitted to three maximum clade credibility trees pruned according to three different taxonomic scenarios. AIC = Akaike information criterion; ΔAIC_RC = difference in AIC score between the best rate-constant model (pure birth) and the model with the lowest AIC score (Yule-3-rate); r1 = initial net diversification rate (speciation events per million years); r2 = diversification rate after first rate shift; r3 = diversification rate after second rate shift; a = extinction fraction; k = carrying capacity parameter; x = rate change parameter; st1, st2 = inferred times of rate shifts in Ma before present. P values are based on comparing ΔAIC_RC with the critical ΔAIC_RC from: † trees simulated under rate constancy with the same number of taxa sampled in this study; †† trees simulated under rate constancy with complete taxon sampling and pruned to the number of species sampled.
