Fig. S1: Graphic representation of the four competing phylogeographic scenarios considered in the DIYABC analysis (descriptions of the scenarios are in the Materials and Methods section). Red represents Scandinavia, green represents the Balkans and blue represents Iberia.

Fig. S2: Mismatch distribution of the number of pairwise nucleotide site differences for European badgers. The solid line represents the observed distribution and the dashed line represents the simulated distribution for an expanding population. 95% CI limits are shown in grey.
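
As an illustration of what the observed curve in a mismatch distribution summarises, the following is a minimal sketch that counts pairwise site differences among aligned sequences. The function name and the toy sequences are hypothetical and are not taken from the badger data set; the simulated curve for an expanding population is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(sequences):
    """Return {number of differing sites: number of sequence pairs}."""
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    counts = Counter()
    for a, b in combinations(sequences, 2):
        # Count the sites at which the two aligned sequences differ.
        diffs = sum(1 for x, y in zip(a, b) if x != y)
        counts[diffs] += 1
    return dict(sorted(counts.items()))


# Hypothetical toy alignment, for illustration only.
toy = ["ACGTAC", "ACGTAT", "ACCTAT", "ACGTAC"]
print(mismatch_distribution(toy))  # {0: 1, 1: 3, 2: 2}
```

Plotting these counts (or their relative frequencies) against the number of differences gives the observed line of a mismatch plot.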

[Plot for Fig. S3: x-axis, no. of genetic clusters (2 to 14).]

Fig. S3: Inference of genetic clusters using the STRUCTURE algorithm. Plot of the number of genetic clusters tested against their estimated log-likelihood. STRUCTURE was run using the admixture and correlated allele frequencies models.

Fig. S4: Location of the genetic sub-populations inferred using the STRUCTURE and GENELAND algorithms. At K=10 – the optimal STRUCTURE clustering solution – three different ways of partitioning the data into populations were obtained in 10 independent runs (Figs. 1-3), while the best ten GENELAND runs converged on the same nine genetic populations (Fig. 4). The size of the pie charts is indicative of the number of samples collected from a certain locality, while the pattern of the pie chart is indicative of the identity of the genetic clusters.

Fig. S5: The fit of the data used in the ABC analysis, visualized using principal components analysis (PCA), in which the observations are the simulated data sets and the variables are the summary statistics. The plot shows the fit of the badger data set (‘Observed data set’) in comparison to 50,000 simulated data sets of the four scenarios.

Fig. S6: Microsatellite-based genetic diversity measures for 25 European badger populations with a sample size of N 10. The estimate of allelic richness was based on a sample of 16 individuals and 16 microsatellite markers in all 25 populations. Unbiased He: unbiased expected heterozygosity.
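
For readers unfamiliar with the measure, unbiased expected heterozygosity at a single locus is commonly computed with Nei's (1978) small-sample correction, He = 2n/(2n-1) × (1 − Σ p_i²), where n is the number of diploid individuals and p_i are the allele frequencies. The sketch below assumes this estimator; the caption does not state which software or formula was used, and the function name and genotypes are hypothetical.

```python
from collections import Counter


def unbiased_he(genotypes):
    """Unbiased expected heterozygosity for one locus.

    genotypes: list of (allele, allele) tuples, one per diploid individual.
    Assumes Nei's (1978) correction factor 2n / (2n - 1).
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    n_copies = len(alleles)  # 2n gene copies
    if n_copies < 2:
        raise ValueError("need at least one diploid genotype")
    freqs = [count / n_copies for count in Counter(alleles).values()]
    homozygosity = sum(p * p for p in freqs)
    return n_copies / (n_copies - 1) * (1 - homozygosity)


# Hypothetical genotypes at one microsatellite locus (allele sizes in bp).
sample = [(120, 120), (120, 124), (124, 124), (120, 124)]
print(round(unbiased_he(sample), 4))  # 0.5714
```

A multi-locus value would typically be the mean of the per-locus estimates.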

Fig. S7: Mitochondrial control region-based genetic diversity measures for European badger populations with a sample size of N 7.

[Plot for Fig. S8: x-axis, simulated time since decline (ta): 25, 100, 500; legend, simulated N0: 50, 150, 1000.]

Fig. S8: Effect of simulated time since population decline and simulated current effective population size on the accuracy of the MSVAR estimate of log(N0): interaction plot of simulated time since decline with bottlenecks of different simulated current effective population sizes (N0), i.e. bottlenecks of differing severity.

[Plot for Fig. S9: six panels (N=20 and N=50, each at ta=25, ta=100 and ta=500), each subdivided by sim. N0=50, sim. N0=150 and sim. N0=1000; x-axis, no. of simulated data set (1 to 5); y-axis ticks from -2 to 7.]

Fig. S9: Inference of the past effective population size, log(N1), of simulated data sets using MSVAR. The simulated data sets were characterised by different sample sizes (N), times since population decline (ta) and current effective population sizes (sim. N0). The past effective population size was always simulated to be 5000 individuals. In each graph, the grey bars represent the 90% credible intervals of the point estimate represented by a cross. The dotted and the dashed lines indicate the simulated past and current effective population sizes, respectively.
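
To make the reading of the grey bars concrete, here is a minimal sketch of how a 90% credible interval and a point estimate could be derived from posterior draws of log(N1) and compared with the simulated value of 5000. Several choices are assumptions not stated in the caption: a base-10 logarithm, an equal-tailed (rather than highest-density) interval, and the median as the point estimate. The posterior draws are synthetic stand-ins, not MSVAR output.

```python
import math
import random
import statistics


def credible_interval_90(samples):
    """Equal-tailed 90% interval: 5th and 95th percentiles of the draws."""
    cuts = statistics.quantiles(samples, n=20)  # cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]


# Synthetic posterior draws of log10(N1); illustration only.
rng = random.Random(0)
posterior_log_n1 = [rng.gauss(3.7, 0.3) for _ in range(2000)]

low, high = credible_interval_90(posterior_log_n1)
point = statistics.median(posterior_log_n1)  # assumed point estimate
true_log_n1 = math.log10(5000)               # simulated past size from the caption

# True when the interval covers the simulated value.
covers = low <= true_log_n1 <= high
print(f"point={point:.2f}, 90% CI=({low:.2f}, {high:.2f}), covers truth: {covers}")
```

In the figure, an interval that covers the simulated value corresponds to a grey bar crossing the dotted line.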

[Plot for Fig. S10: six panels (N=20 and N=50, each at ta=25, ta=100 and ta=500), each subdivided by sim. N0=50, sim. N0=150 and sim. N0=1000; x-axis, no. of simulated data set (1 to 5); y-axis ticks from -2 to 8.]

Fig. S10: Inference of the time (in generations) since the decline, log(ta), of simulated data sets using MSVAR. The simulated data sets were characterised by different sample sizes (N), times since population decline (ta) and current effective population sizes (sim. N0). Missing values indicate a lack of convergence between chains and/or a bimodal posterior distribution. The past effective population size was always simulated to be 5000 individuals. In each graph, the grey bars represent the 90% credible intervals of the point estimate represented by a cross. The dotted lines indicate the simulated time since the decline.
