ELE_1454_sm_appendixS2

Supplementary Information, Appendix 3: Sample size analysis and permutations.
Results of a bootstrap analysis of the effect of sample size on the estimate of nucleotide diversity (π). For three populations (Mionectes oleagineus – Bocas del Toro, Panama; Myrmeciza exsul – Bocas del Toro, Panama; Henicorhina leucosticta – Darien, Panama) we created 8 bootstrap datasets, each consisting of 1000 subsamples of the entire population dataset. Each subsampled dataset contained between 2 and 10 individuals, sampled with replacement from the population. For each of these 8000 subsampled datasets we calculated π. At each sample size (e.g. 2 to 8) we calculated the mean (Nuc[leotide] Div[ersity] mean), standard deviation (Nuc Div St Dev), median (Nuc Div Median), and median minus mean (Median – Mean) of π. We present these data in three figures below.
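The resampling procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes that each individual contributes a single aligned sequence and that π is computed as the mean proportion of differing sites over all pairs of sequences in a subsample. The function names (nucleotide_diversity, bootstrap_pi), the random seed, and the toy sequences are hypothetical.

```python
import random
import statistics
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites, averaged over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences to compute pi")
    length = len(seqs[0])
    per_pair = [sum(a != b for a, b in zip(s1, s2)) / length for s1, s2 in pairs]
    return sum(per_pair) / len(per_pair)


def bootstrap_pi(population, sample_sizes, n_reps=1000, seed=0):
    """For each sample size, draw n_reps subsamples with replacement and summarise pi."""
    rng = random.Random(seed)
    summary = {}
    for n in sample_sizes:
        # rng.choices samples with replacement, so an individual can be drawn more than once
        pis = [nucleotide_diversity(rng.choices(population, k=n)) for _ in range(n_reps)]
        mean = statistics.mean(pis)
        median = statistics.median(pis)
        summary[n] = {
            "mean": mean,
            "sd": statistics.stdev(pis),
            "median": median,
            "median_minus_mean": median - mean,
        }
    return summary


# Hypothetical toy population of five aligned 10-bp sequences
toy_population = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "TCGTACGTAC", "ACGTACGTAC"]
for n, summary_stats in bootstrap_pi(toy_population, range(2, 11), n_reps=200).items():
    print(n, round(summary_stats["mean"], 4), round(summary_stats["sd"], 4),
          round(summary_stats["median"], 4))
```

Each entry of the returned dictionary holds the four summary statistics for one sample size; with real data, population would be the full set of aligned sequences for one population and n_reps would be 1000.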
In all three cases, we noted a minor downward bias in the mean and median estimates of π at sample sizes of 2 to 4. Above n = 4, however, we detected no bias in π associated with sample size. Likewise, the standard deviation of π was relatively large for samples of 2 or 3, but above n = 4 it decreased little with further increases in sample size. Finally, median values of π were extremely stable above n = 4. We infer from these data that the minor bias in the smallest samples is due to the failure to sample relatively uncommon, genetically dissimilar alleles.
The conclusions in this manuscript are based on estimates of nucleotide diversity in 49 populations for which the true parameter values remain unknown. Our bootstrap simulations indicate that sample-size-related artifacts are unlikely to have played an appreciable role in the patterns we observed in the geographical distribution of nucleotide diversity among these populations. Nevertheless, we repeated our null-hypothesis tests on reduced subsets of the data to explicitly address any issues related to the smallest sample sizes in our overall dataset, as follows.
We sequentially removed small-sample-size populations from our analysis to create six subset data matrices (minimum n = 3 to minimum n = 8), and for each we evaluated whether πi > πi+1 held more often than expected by chance. As for the full dataset, in every reduced subset the proportion of comparisons in which πi > πi+1 was not significantly different from 0.50. Thus, we conclude that small samples were not responsible for the patterns observed in our study.
Supplementary Table 2. Effect of removing small sample-size populations on the test of an inverse relationship between
latitude and nucleotide diversity.
minimum n    count of πi > πi+1    number of comparisons    ratio (count / comparisons)    p-value*
2**          21                    38                       0.55                           0.31
3            21                    37                       0.57                           0.26
4            21                    36                       0.58                           0.20
5            21                    34                       0.62                           0.11
6            19                    30                       0.63                           0.10
7            18                    29                       0.62                           0.13
8            16                    27                       0.59                           0.22
* p-value from a binomial exact test: the probability of a ratio at least this large if the true ratio were 0.50.
** i.e., the full dataset as evaluated in the main Results section.
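The p-values in Supplementary Table 2 appear to correspond to a one-sided exact binomial test, i.e. the probability of at least the observed count of πi > πi+1 when each comparison is equally likely to go either way. The Python sketch below assumes that one-sided form, uses a hypothetical helper name (binomial_upper_tail), and recomputes the ratio and p-value for each row from the counts in the table.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """One-sided exact binomial p-value: P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# (minimum n, count of pi_i > pi_i+1, number of comparisons) from Supplementary Table 2
ROWS = [(2, 21, 38), (3, 21, 37), (4, 21, 36), (5, 21, 34),
        (6, 19, 30), (7, 18, 29), (8, 16, 27)]

for min_n, count, total in ROWS:
    print(f"min n = {min_n}: ratio = {count / total:.2f}, "
          f"one-sided p = {binomial_upper_tail(count, total):.2f}")
```

math.comb gives exact binomial coefficients, so no statistics library is needed; a two-sided test against 0.50 would give about twice these values.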
We again sequentially removed small-sample-size populations from our analysis to create six subset data matrices (minimum n = 3 to minimum n = 8), and for each we calculated the probability of observing as few, or fewer, species with their maximum π value in an edge population. The probability of observing no species with maximum π in an edge population is the easiest to calculate: it is the product, across species, of the probability that maximum π occurs in a non-edge population. For any given species, this probability is simply the number of non-edge populations divided by the total number of populations. For example, in the full dataset, the probability that no species has maximum π in an edge population is:
(p Phaethornis longirostris [4/6]) × (p Phaethornis striigularis [4/6]) × (p Amazilia tzacatl [4/6]) × (p Glyphorynchus spirurus [3/5]) × (p Myrmeciza exsul [2/4]) × (p Pipra mentalis [3/5]) × (p Mionectes oleagineus [4/6]) × (p Henicorhina leucosticta [4/6]) × (p Euphonia gouldi [3/5])
= 0.67 × 0.67 × 0.67 × 0.60 × 0.50 × 0.60 × 0.67 × 0.67 × 0.50 = 0.012
When maximum π occurs in an edge population in at least one species, it is necessary to account for all equally extreme or more extreme outcomes, so the probability is the sum over all such scenarios. For an observed count of one, this sum includes the product above for zero edge species plus the nine scenarios (one for each species) in which exactly one species has maximum π in an edge population. For each of these nine scenarios, the probability is calculated as above, except that for the given species the probability of maximum π occurring in a non-edge population (e.g. 4/6) is replaced by the probability of it occurring in an edge population (e.g. 2/6). These nine probabilities are then summed together with the probability that no species has maximum π in an edge population.
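The enumeration described above can be written as a short Python function. It is a sketch under the assumption the text's product already implies, namely that species are independent; the function name prob_at_most is hypothetical, and each species is described only by its probability of having maximum π in a non-edge population. As a check, it is applied to the nine factors as printed in the decimal product for the full dataset.

```python
from itertools import combinations
from math import prod


def prob_at_most(p_nonedge, max_edge):
    """P(no more than max_edge species have maximum pi in an edge population).

    p_nonedge[i] is the probability, for species i, that its maximum-pi population
    is a non-edge population (non-edge populations / all populations). Species are
    treated as independent; every way of choosing 0..max_edge "edge" species is
    enumerated and its probability added to the total.
    """
    species = range(len(p_nonedge))
    total = 0.0
    for k in range(max_edge + 1):
        for edge_set in combinations(species, k):
            # Edge species contribute (1 - p); all other species contribute p
            total += prod(
                (1 - p) if i in edge_set else p
                for i, p in enumerate(p_nonedge)
            )
    return total


# Per-species non-edge probabilities exactly as printed in the decimal product
printed = [0.67, 0.67, 0.67, 0.60, 0.50, 0.60, 0.67, 0.67, 0.50]
print(round(prob_at_most(printed, 0), 3))  # prints 0.012
```

Enumerating subsets grows quickly with the number of species and the observed count; a Poisson-binomial recursion would give the same sums more efficiently, but for nine species and counts of 0 or 1 direct enumeration is trivial and mirrors the hand calculation.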
In the full dataset, and in 5 of the 6 reduced datasets, the number of species with the maximum observed π value in an edge population was significantly smaller than expected by chance (i.e. than expected if populations with the maximum π value were distributed randomly throughout a species' range). In the remaining case (minimum n = 3), the p-value (0.065) was close to 0.05. We take this as evidence that small sample sizes do not account for our observation that maximum π values occur in range-center populations more often than expected by chance.
Supplementary Table 3. Effect of removing small sample-size populations on the test of whether the maximum π value occurs
in edge populations less frequently than expected by chance.
minimum n    number of species    number of species with max π in edge population    joint probability of a result equally or more extreme
2*           9                    0                                                  0.012
3            9                    1                                                  0.065
4            9                    1                                                  0.043
5            9                    1                                                  0.039
6            8                    0                                                  0.012
7            8                    0                                                  0.010
8            8                    0                                                  0.006
Probabilities below α = 0.05 are significant; all values except 0.065 (minimum n = 3) meet this criterion.
* i.e. the full dataset as evaluated in the main Results section.