SUPPLEMENTAL MATERIAL
MATERIALS AND METHODS
Simulated datasets:
All simulated datasets were generated in HYBRIDLAB version 1.0 (Nielsen et al., 2006), using the
representative allopatric samples of M. fuliginosus and M. giganteus as parental populations.
1. Initial dataset: 500 representative individuals of each species and 500 F1 hybrids were
simulated. From these simulated individuals, 500 first, 500 second and 500 third generation
backcrosses to each species were also simulated. This dataset was used to assess the cut-off
values (see main text), optimise settings in NEWHYBRIDS (Anderson and Thompson, 2002) and
assess the accuracy of the clustering algorithms in both NEWHYBRIDS and STRUCTURE
(Pritchard et al., 2000).
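As a quick check on the composition of the initial dataset, the class sizes can be tallied as in the sketch below. It assumes 500 individuals per backcross generation per species, a reading that agrees with the total of 4500 simulated individuals quoted in the next section. The dictionary labels are illustrative only and are not HYBRIDLAB output.

```python
# Tally of the initial simulated dataset (item 1).
# Assumption: 500 individuals per class, including each backcross
# generation to each parental species. Labels are illustrative.
N_PER_CLASS = 500
SPECIES = ("M. fuliginosus", "M. giganteus")
BACKCROSS_GENERATIONS = (1, 2, 3)

composition = {}
for species in SPECIES:
    composition[f"pure {species}"] = N_PER_CLASS
composition["F1"] = N_PER_CLASS
for generation in BACKCROSS_GENERATIONS:
    for species in SPECIES:
        composition[f"Bx{generation} to {species}"] = N_PER_CLASS

total = sum(composition.values())

for label, count in composition.items():
    print(f"{label:30s} {count}")
print(f"{'total':30s} {total}")
```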
Optimizing settings in NEWHYBRIDS
When hybridisation is known to be rare, a reduction in the ζg’s for those genotype frequency classes
that require more episodes of interbreeding within the last n generations may be used in NEWHYBRIDS
to increase the likelihood of detecting hybridisation (see Anderson and Thompson (2002) for a
detailed explanation). For instance, reducing the hybrid genotype frequency class for the
backcross categories increases the likelihood of detecting hybrids resulting from multiple generations
of backcrossing. The optimum ζg’s for the backcross genotype frequency classes in our study, defined as
those that maximised detection whilst minimising false positives in the representative samples,
were obtained using the initial simulation dataset (see above) containing all 4500 simulated
individuals across all categories.
Choice of method for hybrid detection
A simulation study was conducted to determine the most appropriate method for detecting hybrid
genotypes in our study system: STRUCTURE, NEWHYBRIDS with default settings, or NEWHYBRIDS with
optimised settings.
First, the three methods were compared using the initial simulated dataset (see 1 above). Second,
since variation in the frequency of hybrid genotypes in the sample has been shown to influence the
accuracy of these programs (Vaha and Primmer, 2005), multiple simulated datasets (listed below)
containing variable proportions of hybrid genotypes from each hybrid category were created to
test whether the choice of method would change under different scenarios. These datasets fell into
two groups: those in which single or multiple hybrid categories were completely (or almost
completely) removed, and those in which the ratio of all hybrid categories to pure individuals in
the sample was varied.
2. Removal of categories – results are shown in Figure S4
i. None – Initial dataset (see 1 above) containing all simulated individuals
ii. 90% F1 – 90% of simulated F1 hybrids removed from i.
iii. 95% F1 – 95% of simulated F1 hybrids removed from i.
iv. F1 – 100% of simulated F1 hybrids removed from i.
v. F1&1Bx – 100% of simulated F1 and 1st generation backcrosses removed from i.
vi. F1, 1Bx&2Bx – 100% of simulated F1, 1st and 2nd generation backcrosses removed from i.
3. Frequency of hybrids – results are shown in Figure S5
i. 0.25% – 3 F1 hybrids, 5 first generation, 10 second generation and 20 third generation
backcrosses, and 1000 ‘pure’
ii. 0.5% – 5 F1 hybrids, 10 first generation, 20 second generation and 40 third generation
backcrosses, and 1000 ‘pure’
iii. 1% – 10 F1 hybrids, 20 first generation, 40 second generation and 80 third generation
backcrosses, and 1000 ‘pure’
iv. 2.5% – 25 F1 hybrids, 50 first generation, 100 second generation and 200 third generation
backcrosses, and 1000 ‘pure’
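The composition of the four frequency datasets in list 3 can be summarised with the short sketch below. The counts are copied from the list. The category keys (`F1`, `Bx1`, `Bx2`, `Bx3`) and the `summarise` helper are illustrative names. The last quantity, F1 hybrids per 100 ‘pure’ individuals, is computed only because it appears to track the dataset labels closely; that reading is an inference from the counts, not a definition given in the text.

```python
# Hybrid counts per frequency dataset, copied from list 3.
# Each dataset also contains 1000 'pure' individuals.
N_PURE = 1000
datasets = {
    "0.25%": {"F1": 3, "Bx1": 5, "Bx2": 10, "Bx3": 20},
    "0.5%": {"F1": 5, "Bx1": 10, "Bx2": 20, "Bx3": 40},
    "1%": {"F1": 10, "Bx1": 20, "Bx2": 40, "Bx3": 80},
    "2.5%": {"F1": 25, "Bx1": 50, "Bx2": 100, "Bx3": 200},
}


def summarise(counts, n_pure=N_PURE):
    """Return total hybrids, total sample size and F1 per 100 pure."""
    n_hybrids = sum(counts.values())
    return {
        "n_hybrids": n_hybrids,
        "n_total": n_hybrids + n_pure,
        # Inferred interpretation of the label, not stated in the text.
        "f1_per_100_pure": 100 * counts["F1"] / n_pure,
    }


summaries = {label: summarise(counts) for label, counts in datasets.items()}

for label, summary in summaries.items():
    print(label, summary)
```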
RESULTS
Distinguishing Hybrid and Pure genotypes:
Analysis of the initial simulated dataset in NEWHYBRIDS revealed that a definitive
assignment (either correct or incorrect) was made for 81.8% of backcrosses (Figure S1).
Figure S1: The assignment probabilities of simulated first (a), second (b) and third (c)
generation backcrosses in NEWHYBRIDS. The black line indicates the cut-off value for a definitive
assignment (0.75). [Each panel plots % of sample against assignment probability.]
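The cut-off rule illustrated in Figure S1 can be sketched as follows. This is a minimal illustration, not the authors' script: the function name, the category labels and the posterior probabilities are hypothetical, and treating a probability exactly equal to 0.75 as definitive is an assumption.

```python
CUTOFF = 0.75  # cut-off for a definitive assignment (Figure S1)


def definitive_assignment(posteriors, cutoff=CUTOFF):
    """Return the best-supported category, or None when no category
    reaches the cut-off (i.e. no definitive assignment is made).

    Assumption: a probability equal to the cut-off counts as definitive.
    """
    category, probability = max(posteriors.items(), key=lambda item: item[1])
    return category if probability >= cutoff else None


# Hypothetical posterior probabilities for two individuals (not real output).
individual_a = {"pure_A": 0.02, "pure_B": 0.01, "F1": 0.05, "Bx_A": 0.90, "Bx_B": 0.02}
individual_b = {"pure_A": 0.40, "pure_B": 0.01, "F1": 0.03, "Bx_A": 0.55, "Bx_B": 0.01}

print(definitive_assignment(individual_a))
print(definitive_assignment(individual_b))
```

Under this rule the first individual is definitively assigned to a backcross category, while the second is left unassigned because its best-supported category falls below the cut-off.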
Optimizing Settings in NEWHYBRIDS
Reducing the backcross hybrid genotype frequency class values in NEWHYBRIDS increased the
proportion of simulated backcrosses that were distinguished (Figure S2). However, the likelihood
of false positives in the representative sample also increased (Figure S2). Furthermore, the
proportion of simulated first generation backcrosses correctly distinguished decreased as the
backcross hybrid genotype frequency class was lowered (Figure S2), which was attributable to
misidentification as F1s. The backcross genotype frequency class was reduced from 0.50 to 0.35
before false positives were evident in the representative individuals (0.02%), corresponding to
increases of 30.7% and 33.0% in the proportion of simulated second and third generation backcrosses
distinguished, respectively (Figures S2 and S3). It was not until this value was lowered to 0.15
that any representative individuals (3/1000) were misidentified with a high degree of certainty
(>0.75). Therefore, in analyses of the empirical data, a value of 0.35 was used for the backcross
hybrid genotype frequency class, as this provided the highest overall proportion of correct
assignments. Comparing the results of optimised NEWHYBRIDS with those of the default settings did
not yield additional information capable of consistently distinguishing between generations of
backcrosses. However, individuals detected only with the optimised settings were more frequently
second generation or later.
Figure S2: The effect of reducing the hybrid genotype frequency class values for the backcross
categories in NEWHYBRIDS on the identification of pure individuals as well as first, second and
third generation backcrosses. [The plot shows the proportion correctly assigned against the hybrid
genotype frequency class.]
Choice of Methodology for hybrid detection
1. Initial dataset:
STRUCTURE distinguished 100% of F1 hybrids, 91.5% of simulated first generation
backcrosses, 52.1% of simulated second generation backcrosses and 21.4% of third
generation backcrosses from “pure” (Figure S3). However, because the Q values for
simulated F1 hybrids (0.3-0.7) and backcrosses (0.43-0.98) overlapped, a lower proportion
of simulated first, second and third generation backcrosses (64.0%, 50.1% and 21.2%,
respectively) were correctly identified (Figure S3).
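The overlap in Q values described above can be made explicit with a small interval calculation. The sketch below uses the two ranges reported in the text; the helper name `interval_overlap` is illustrative. Any individual whose Q value falls in the overlapping range cannot be assigned to F1 or backcross on the basis of Q alone.

```python
def interval_overlap(a, b):
    """Return the (low, high) overlap of two closed intervals, or None."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None


# Q-value ranges reported for the simulated individuals in STRUCTURE.
F1_Q_RANGE = (0.30, 0.70)
BACKCROSS_Q_RANGE = (0.43, 0.98)

overlap = interval_overlap(F1_Q_RANGE, BACKCROSS_Q_RANGE)
print(overlap)
```

With the reported ranges, the ambiguous region runs from 0.43 to 0.70.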
Similar results were apparent in NEWHYBRIDS, with 100% of F1 hybrids, 90.4% of first
generation backcrosses, 50.5% of second and 19.3% of third generation backcrosses
distinguished from “pure” (Figure S3). Separation into hybrid categories did not substantially
alter the proportion of individuals correctly assigned, with 98.6% of F1 hybrids, 85.5% of
simulated first generation backcrosses, 50.5% of second and 19.3% of third generation
backcrosses correctly identified (Figure S3). However, 4.7% of simulated first generation
backcrosses were misidentified as F1 hybrids. Differences in the assignment probabilities of
individuals could not be used to consistently separate generations of backcrossing.
The proportion of backcrosses correctly identified increased with the optimised
NEWHYBRIDS settings, with 92.9% of first, 81.2% of second and 52.3% of third generation
backcrosses correctly identified (Figure S3).
Figure S3: Comparison of the proportion of first generation hybrids (F1), first (Bx1), second (Bx2)
and third (Bx3) generation backcrosses and pure individuals correctly assigned by STRUCTURE (S),
default NEWHYBRIDS (NH) and optimised NEWHYBRIDS (NH2). [The plot shows the proportion correctly
assigned for each category.]
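The gain from the optimised settings can be checked directly from the percentages of correctly identified backcrosses reported above for default and optimised NEWHYBRIDS. The sketch below is simple arithmetic on those reported values; the dictionary names are illustrative.

```python
# Percentage of simulated backcrosses correctly identified (from the text).
default_nh = {"Bx1": 85.5, "Bx2": 50.5, "Bx3": 19.3}
optimised_nh = {"Bx1": 92.9, "Bx2": 81.2, "Bx3": 52.3}

# Gain in percentage points, rounded to one decimal place to avoid
# floating-point noise in the subtraction.
gain = {
    category: round(optimised_nh[category] - default_nh[category], 1)
    for category in default_nh
}

for category, points in gain.items():
    print(f"{category}: +{points} percentage points")
```

The differences for second and third generation backcrosses (30.7 and 33.0 percentage points) agree with the increases quoted in the section on optimising settings in NEWHYBRIDS.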
2. Removal of Categories
The removal of hybrid categories decreased the proportion of the remaining simulated
hybrids correctly identified by all methods (Figure S4), and this effect was amplified by the
removal of multiple groups (Figure S4). Optimised NEWHYBRIDS (Figure S4a) was least affected by
the removal of simulated F1 hybrids, with an overall decrease in the detection of simulated
backcrosses of less than 1%, compared with 3.9% in default NEWHYBRIDS (Figure S4b) and 15.3% in
STRUCTURE (Figure S4c). Moreover, this decrease in detection was not substantially changed when
a proportion (10 or 5%) of simulated F1 hybrids remained within the sample (Figure S4). The
removal of multiple categories further reduced the proportion of backcrosses distinguished from
pure (Figure S4). Overall, the optimised NEWHYBRIDS detected the highest proportion of simulated
backcrosses.
Figure S4: The detection of first, second and third generation backcrosses and pure individuals
by optimised NEWHYBRIDS (a), default NEWHYBRIDS (b) and STRUCTURE (c) following the complete or
partial removal of specific hybrid categories, and combinations of categories (see text), from the
initial sample. [Each panel plots frequency correct against removal categories i to vi.]
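For readers who wish to reproduce the removal scenarios (i to vi), one possible way of deriving them from the initial dataset is sketched below. The text does not state how individuals were chosen for removal, so random sampling with a fixed seed is an assumption, as are the function name, the `(identifier, category)` record layout and the pooling of backcrosses to both species into one label per generation.

```python
import random


def remove_fraction(individuals, category, fraction, seed=0):
    """Return a copy of `individuals` with `fraction` of the records in
    `category` removed. Selection is random (an assumption)."""
    rng = random.Random(seed)
    target_ids = [ident for ident, cat in individuals if cat == category]
    n_remove = round(fraction * len(target_ids))
    dropped = set(rng.sample(target_ids, n_remove))
    return [(ident, cat) for ident, cat in individuals if ident not in dropped]


# Initial dataset: 1000 pure, 500 F1 and 1000 backcrosses per generation
# (500 to each species, pooled here under one label per generation).
sizes = {"Pure": 1000, "F1": 500, "Bx1": 1000, "Bx2": 1000, "Bx3": 1000}
initial = [(f"{cat}_{i}", cat) for cat, n in sizes.items() for i in range(n)]

scenarios = {
    "i": initial,
    "ii": remove_fraction(initial, "F1", 0.90),
    "iii": remove_fraction(initial, "F1", 0.95),
    "iv": remove_fraction(initial, "F1", 1.0),
}
scenarios["v"] = remove_fraction(scenarios["iv"], "Bx1", 1.0)
scenarios["vi"] = remove_fraction(scenarios["v"], "Bx2", 1.0)


def count(individuals, category):
    """Number of records carrying the given category label."""
    return sum(1 for _, cat in individuals if cat == category)


for name, data in scenarios.items():
    print(name, len(data), {cat: count(data, cat) for cat in sizes})
```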
3. Frequency of hybrids
Reducing the frequency of hybridisation in the sample from 2.5% to 0.25% (through 1% and 0.5%)
resulted in an overall reduction in the proportion of simulated backcrosses distinguished of 13%
in optimised NEWHYBRIDS, 5.53% in default NEWHYBRIDS and 1% in STRUCTURE (Figure S5). The largest
reduction occurred between 0.5% and 0.25%, regardless of the method. Some variation in accuracy
at low frequencies (Figure S5) was associated with the individual genotypes included rather than
with the relatively small number of hybrid genotypes in the sample. For instance, only minor
(<0.01) changes were apparent in the posterior probabilities of individuals used in multiple runs
in either NEWHYBRIDS or STRUCTURE. Again, the optimised NEWHYBRIDS detected the greatest
proportion of backcrosses.
Figure S5: The detection of F1 hybrids, first, second and third generation backcrosses and pure
individuals by optimised NEWHYBRIDS (a), default NEWHYBRIDS (b) and STRUCTURE (c) with varying
frequencies of hybridization. [Each panel plots frequency correct against frequency of
hybridization: 0.25%, 0.50%, 1% and 2.50%.]
Chosen Method
The simulation study showed that, in all scenarios, the highest proportion of correct assignments
was obtained with the optimised NEWHYBRIDS method, with the ζg’s for the backcross genotype
frequency classes set at 0.35. This therefore appeared to be the most appropriate method for
investigating hybridisation within our study system, since the frequency of hybridisation is
expected to be low (Kirsch and Poole, 1972). However, the optimised settings resulted in the
misidentification of 5.2% of the simulated first generation backcrosses as F1 hybrids. Therefore,
to detect the presence of F1 hybrids, the empirical data were also examined with the default
NEWHYBRIDS settings and with STRUCTURE.