SUPPLEMENTARY METHODS
Heterozygote deficiencies in parasite component populations:
An evaluation of interrelated hypotheses in the raccoon tick, Ixodes texanus
G Dharmarajan, JC Beasley and OE Rhodes Jr.
Department of Forestry and Natural Resources,
Purdue University,
West Lafayette, Indiana, USA
Large allele dropout
The most commonly used method for detection of large allele dropout is that implemented in the software
MICROCHECKER (Van Oosterhout et al., 2004). However, large allele dropout can also be recognized in the presence
of a statistically significant negative slope when regressing FIS of the kth allele in the jth locus in the ith IP (fijk) against
allele size (see de Meeûs et al., 2004). Thus, at each locus we estimated the fijk separately for each IP and used a
GLM (normally distributed errors and identity link; SPSS ver. 16) to explore the effect of allele size on fijk while
controlling for host level factors. The initial model had the form fijk ~ poly(Allele size, 2) + Host sex + Host age +
Host sex:Host age + Constant, where poly(Allele size, 2) was the quadratic function of allele size. Apart from the
quadratic function, we also tested for a monotonic relationship between fijk and allele size. As recommended by de
Meeûs et al. (2004), we weighted each observation by the product Nij × pijk × (1 − pijk), where Nij is the number of
individuals genotyped and pijk is the frequency of the kth allele (at the jth locus in the ith IP). Thus, more weight was
given to larger and more polymorphic samples. Model parsimony was assessed using AICc. We used the same
procedure as described above to test for signatures of large allele dropout at the CP scale.
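The weighting scheme and the regression of fijk on allele size can be illustrated with a minimal sketch. It is not the SPSS GLM used in the study: it omits the host-level covariates and any significance test, and the function names and data values are hypothetical, chosen only to show how the weights Nij × pijk × (1 − pijk) enter a weighted least-squares fit.

```python
import numpy as np

def dropout_weights(n_genotyped, allele_freq):
    """Return N_ij * p_ijk * (1 - p_ijk) for each allele-level observation."""
    n = np.asarray(n_genotyped, dtype=float)
    p = np.asarray(allele_freq, dtype=float)
    return n * p * (1.0 - p)

def weighted_poly_fit(allele_size, f_values, weights, degree=1):
    """Weighted least-squares polynomial fit of f on allele size.

    Returns coefficients with the highest power first
    (degree=1 gives [slope, intercept]).
    """
    x = np.asarray(allele_size, dtype=float)
    y = np.asarray(f_values, dtype=float)
    w = np.asarray(weights, dtype=float)
    # np.polyfit applies w to the unsquared residuals, so sqrt(weights)
    # is passed in order to minimise sum(weights * residual**2).
    return np.polyfit(x, y, degree, w=np.sqrt(w))

# Hypothetical values for one locus in one IP (not data from the study).
sizes = [180, 184, 188, 192, 196]            # allele sizes in bp
f_ijk = [0.30, 0.22, 0.10, 0.05, -0.02]      # per-allele FIS estimates
weights = dropout_weights([20] * 5, [0.30, 0.25, 0.20, 0.15, 0.10])

slope, intercept = weighted_poly_fit(sizes, f_ijk, weights, degree=1)
print(f"weighted slope of f on allele size: {slope:.4f}")
```

Calling `weighted_poly_fit(..., degree=2)` gives the quadratic counterpart; a negative, statistically supported slope is the pattern the text associates with large allele dropout.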
Cryptic structure
We primarily relied on the program BAPS ver. 5.3 (Corander et al., 2008) for detection of cryptic population
structure. While numerous spatial genetic clustering algorithms are currently available to test for cryptic
microgeographic structure, Latch et al. (2006) demonstrated that BAPS ver. 3.1 (Corander and Marttinen, 2006) and
STRUCTURE ver. 2.1 (Pritchard et al., 2000) are capable of accurately partitioning individuals even at low levels of
genetic differentiation (FST ≥ 0.05). Although both programs identify cryptic sub-structure by minimizing
Hardy-Weinberg and linkage disequilibrium within each of k clusters, they use algorithms that have different
advantages and disadvantages. While the stochastic-greedy algorithm used by BAPS is fast, it tends to overestimate k
due to the inference of a few small spurious clusters. On the other hand, the Markov Chain Monte-Carlo (MCMC)
algorithm used by STRUCTURE more accurately identifies the number of clusters but is slower to run (Latch et
al., 2006).
Given the possibility that relying on BAPS alone may lead to the spurious identification of cryptic population
structure, we confirmed the BAPS results using the program STRUCTURE ver. 2.2. Briefly, we first calculated the
maximum k identified by BAPS with a posterior probability ≥ 0.05 (designated k0.05) to facilitate STRUCTURE analyses
(see below). We then used STRUCTURE ver. 2.2 to calculate the overall likelihood of the genotypic data assuming
each CP was composed of 1 to k distinct clusters. To this end, we used a model that assumed admixture and
correlated allele frequencies between clusters. We constrained the maximum k for each STRUCTURE run to k0.05 + 2
(the additional values being evaluated to facilitate Δk computation; see below). Because BAPS likely overestimates the true
number of clusters (Latch et al., 2006), we felt the loss of information due to constraining the maximal k in
STRUCTURE would be minimal. We performed 5 iterations per k, first allowing the Markov chain to reach stationarity
with a burn-in of 150 000 MCMC simulations, followed by 500 000 MCMC simulations to find optimal clusters.
Provided k > 1 (as determined by the natural log likelihood of the data given k), we chose the most parsimonious number of
clusters using the Δk method proposed by Evanno et al. (2005).
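The Δk statistic can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' script: it uses the absolute second-order difference of the mean ln likelihoods across replicate runs divided by the standard deviation of the ln likelihoods at k, which is one common way of computing the statistic of Evanno et al. (2005). The function name and the likelihood values are hypothetical. Because Δk at a given k needs the likelihoods at k − 1 and k + 1, values are only returned for interior k, which is why runs beyond the largest k of interest are needed.

```python
import numpy as np

def evanno_delta_k(ln_likelihoods):
    """Compute delta-k for every k that has both neighbours k-1 and k+1.

    ln_likelihoods maps each k to a list of ln likelihood values,
    one per replicate run.
    """
    mean_l = {k: float(np.mean(v)) for k, v in ln_likelihoods.items()}
    sd_l = {k: float(np.std(v, ddof=1)) for k, v in ln_likelihoods.items()}
    delta = {}
    for k in sorted(ln_likelihoods):
        if (k - 1) not in mean_l or (k + 1) not in mean_l:
            continue  # second-order difference undefined at the ends
        second_diff = abs(mean_l[k + 1] - 2.0 * mean_l[k] + mean_l[k - 1])
        # A zero SD across replicates would make the ratio undefined.
        delta[k] = second_diff / sd_l[k] if sd_l[k] > 0 else float("nan")
    return delta

# Hypothetical ln likelihoods for k = 1..5, three replicate runs each.
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-800.0, -801.0, -799.0],
    4: [-795.0, -797.0, -793.0],
    5: [-792.0, -790.0, -794.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

In this toy example the likelihood rises steeply up to k = 3 and then plateaus, so Δk peaks at k = 3.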
Kin structure
We evaluated the levels of Type I (α) and Type II (β) errors in classification of half-sib groups by three clustering
algorithms implemented in STRUCTURE (Pritchard et al. 2000), BAPS (Corander et al. 2008) and PEDIGREE
(Herbinger 2005). In order to evaluate the error rates we first generated 20 random populations utilizing the software
KINGROUP (Konovalov et al. 2004). Each population consisted of 7 half-sib groups (group sizes were 5, 4, 3, 3, 2, 2,
2 individuals; 21 individuals in total/population) which were generated utilizing global allele frequencies at the 11 I.
texanus microsatellites. We evaluated levels of FIS at each of the 11 loci and obtained approximate 95% confidence
intervals by jackknifing across populations using the program FSTAT (Goudet, 1995; Fig. S2A). Each of the 20
random populations was then analysed separately using the three clustering algorithms. In the case of
STRUCTURE, we calculated the overall likelihood of the genotypic data assuming each random population was
composed of k = 1–10 distinct clusters, using a model that assumed admixture and correlated allele frequencies
between clusters. In the case of BAPS, for each random population we carried out 5 runs wherein the maximum k was
constrained to 10, and the k having the highest posterior probability (across the 5 runs) was considered the most
likely number of clusters identified by BAPS. Finally, in the case of PEDIGREE, for each random population we carried
out 5 runs using 500 000 iterations/run and the following run parameters: Full sib constraint = 0; Temperature
= 10; Weight = 1 (see Table S1 for run parameter details). Once the best partition was generated by each of the three
algorithms for each of the 20 random populations, the level of α error was calculated as the proportion of unrelated
individual pairs incorrectly grouped together in a single cluster (i.e. the proportion of unrelated pairs wrongly
classified as being half-sibs). The power (1 − β) was calculated as the proportion of true half-sib pairs
correctly grouped together in a single cluster (i.e. the proportion of half-sib pairs correctly classified as being related). The
levels of α error and power for the clustering algorithms implemented in STRUCTURE, BAPS and PEDIGREE are given
in Fig. S2B.
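The pairwise definitions of α error and power used for the clustering comparison can be made concrete with a short sketch. The function name and the toy labels are hypothetical; the only inputs assumed are, for each individual, its true half-sib group and the cluster assigned by a given algorithm.

```python
from itertools import combinations

def pairwise_error_rates(true_groups, inferred_clusters):
    """Return (alpha, power) from per-individual group and cluster labels.

    alpha: proportion of unrelated pairs placed in the same cluster.
    power: proportion of true half-sib pairs placed in the same cluster.
    """
    related_total = related_together = 0
    unrelated_total = unrelated_together = 0
    for i, j in combinations(range(len(true_groups)), 2):
        same_cluster = inferred_clusters[i] == inferred_clusters[j]
        if true_groups[i] == true_groups[j]:
            related_total += 1
            related_together += same_cluster
        else:
            unrelated_total += 1
            unrelated_together += same_cluster
    alpha = unrelated_together / unrelated_total if unrelated_total else float("nan")
    power = related_together / related_total if related_total else float("nan")
    return alpha, power

# Toy example: two true half-sib groups, one individual misassigned.
true_groups = ["A", "A", "B", "B"]
inferred = [1, 1, 1, 2]
alpha, power = pairwise_error_rates(true_groups, inferred)
print(alpha, power)
```

With the simulated design of 21 individuals in groups of 5, 4, 3, 3, 2, 2 and 2, each random population contains 25 true half-sib pairs and 185 unrelated pairs out of 210 pairs in total, so power and α error are proportions of 25 and 185 pairs, respectively.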
Testing assumptions of MC simulation
The post hoc Monte-Carlo (MC) simulation was performed to evaluate whether the levels of kin structure and life-history
characteristics of I. texanus could adequately explain deviations from HWE at the IP scale. The MC
simulation approach, principally based on the subdivided breeding group model proposed by Criscione and Blouin
(2005) and modified to take into account the levels of kin structure observed in I. texanus, was based on three major
assumptions. First, we assumed that there was pangamy at the scale of the raccoon den, an assumption that seemed
justified in the absence of empirical data indicative of (positive/negative) assortative mating. Second, we assumed
that mating took place prior to dispersal in I. texanus. This assumption also seemed reasonable since mating takes
place prior to blood-feeding in the case of nidicolous ticks (Sonenshine, 1993) and dispersal can only take place
while ticks are feeding on the host. Finally, we assumed that IP scale allele frequencies were an adequate estimate of
the allele frequencies of ticks that will mate to produce the subsequent generation of ticks. While we feel these assumptions
are realistic based on I. texanus biology, there was no direct way of evaluating whether they were accurate.
However, it was clear that these assumptions could only affect the results of the MC simulation through the genetic
patterns of the kin groups generated. Thus, to test whether these assumptions were likely to affect the results of our MC
simulation, for each observed kin group (with > 1 tick) we generated 100 kin groups of the same size (following the
MC simulation approach outlined in the main text). Within the observed and simulated kin groups we calculated the
average pair-wise relatedness (Queller and Goodnight 1989) and FIS as implemented in SPAGEDI (Hardy and
Vekemans 2002) and GENEPOP’007 (Rousset 2008), respectively. We created a frequency distribution of observed
average relatedness and FIS values and compared this distribution with the distribution of simulated values. The
above test was only carried out in the five CPs that showed significant levels of kin structure (see main text) because
in CPs with non-significant levels of kin structure the MC simulation assumed all ticks were unrelated (i.e. kin
group size = 1; number of kin groups = number of ticks sampled). The results of comparing the frequency
distribution of average relatedness and FIS values in observed and MC simulated kin groups showed a strong
concordance between the two at all loci and CPs examined (see Fig. S5 and S6, respectively).
Supplementary references
Corander J, Waldmann P, Marttinen P, Sillanpaa MJ (2004). BAPS 2: enhanced possibilities for the analysis of
genetic population structure. Bioinformatics 20(15): 2363-2369.
Corander J, Marttinen P, Sirén J, Tang J (2008). Enhanced Bayesian modelling in BAPS software for learning
genetic structures of populations. BMC Bioinformatics 9: 539.
Criscione CD, Blouin MS (2005). Effective sizes of macroparasite populations: a conceptual model. Trends
Parasitol 21(5): 212-217.
de Meeûs T, Humair PF, Grunau C, Delaye C, Renaud F (2004). Non-Mendelian transmission of alleles at
microsatellite loci: an example in Ixodes ricinus, the vector of Lyme disease. Int J Parasitol 34(8): 943-950.
Evanno G, Regnaut S, Goudet J (2005). Detecting the number of clusters of individuals using the software
STRUCTURE: a simulation study. Mol Ecol 14(8): 2611-2620.
Goudet J (1995). FSTAT (Version 1.2): A computer program to calculate F-statistics. J Hered 86(6): 485-486.
Hardy OJ, Vekemans X (2002). SPAGEDi: a versatile computer program to analyse spatial genetic structure at the
individual or population levels. Mol Ecol Notes 2(4): 618-620.
Herbinger CM (2005). PEDIGREE. Available from http://herbinger.biology.dal.ca:5080/Pedigree/.
Konovalov DA, Manning C, Henshaw MT (2004). KINGROUP: a program for pedigree relationship reconstruction
and kin group assignments using genetic markers. Mol Ecol Notes 4(4): 779-782.
Latch EK, Dharmarajan G, Glaubitz JC, Rhodes OE (2006). Relative performance of Bayesian clustering software
for inferring population substructure and individual assignment at low levels of population differentiation.
Conserv Genet 7(2): 295-302.
Pritchard JK, Stephens M, Donnelly P (2000). Inference of population structure using multilocus genotype data.
Genetics 155(2): 945-959.
Queller DC, Goodnight KF (1989). Estimating relatedness using genetic markers. Evolution 43(2): 258-275.
Rousset F (2008). GENEPOP'007: a complete re-implementation of the GENEPOP software for Windows and
Linux. Mol Ecol Res 8(1): 103-106.
Sonenshine DE (1993). Biology of ticks. Vol. 2. Oxford University Press: New York.
Van Oosterhout C, Hutchinson WF, Wills DPM, Shipley P (2004). MICRO-CHECKER: software for identifying
and correcting genotyping errors in microsatellite data. Mol Ecol Notes 4(3): 535-538.