ReadMe file
Shurtliff, Q. R., Murphy, P. J., and Matocq, M. D. (In press). Ecological segregation in a small
mammal hybrid zone: habitat-specific mating opportunities and selection against hybrids restrict
gene flow on a fine spatial scale. Evolution.
Direct questions to Quinn R. Shurtliff (qshurtliff@gmail.com)
Model Results
This dataset shows outputs from two models (NewHybrids and Structure) used to assign
individual woodrats to genotypic classes.
ID: Unique identifier for each woodrat specimen analyzed. Tissue samples from many individuals
were archived at the Museum of Vertebrate Zoology, Berkeley, California, USA
(http://mvz.berkeley.edu/Mammal_Collection.html; MVZ:Mamm:225847–226077; 227035–227040).
NEWHYBRIDS: Genotypic class assignments based on the NEWHYBRIDS model.
bryanti = N. bryanti
BC-bry or BC-bryanti = backcross N. bryanti
BC-lep or BC-lepida = backcross N. lepida
lepida = N. lepida
F1 = First-generation hybrid
F2 = Theoretically, a second-generation hybrid produced by crossing two F1s (in reality, it
could result from a cross between any two hybrids).
Hybrid = Sample cannot be assigned to a single genotypic class because its posterior probability
does not reach the threshold (0.70) in any one class; however, its posterior probabilities summed
across all hybrid classes exceed the threshold. The hybrid class in parentheses is the one with
the highest posterior probability.
Where two classes are separated by a “/”, the class listed first had the higher posterior
probability, though neither reached the 0.70 threshold.
Color Code
o Blue = N. bryanti
o Yellow = N. lepida
o Purple = hybrid
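The labeling rules for the NEWHYBRIDS column can be sketched as a small function. This is a hypothetical reconstruction, not the authors' code: the class keys (bryanti, BC-bry, F1, F2, BC-lep, lepida), the use of ≥ at the 0.70 threshold, and the order in which the rules are checked are all assumptions based on the descriptions above.

```python
# Hypothetical sketch: turn NEWHYBRIDS posterior probabilities into the labels
# used in this column. Class names and the >= comparisons are assumptions.
PARENTAL = {"bryanti", "lepida"}
THRESHOLD = 0.70


def label_newhybrids(posteriors, threshold=THRESHOLD):
    """Map {class name: posterior probability} to a genotypic-class label."""
    # Rank classes from highest to lowest posterior probability.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    best_class, best_p = ranked[0]

    # Rule 1: a single class reaches the threshold.
    if best_p >= threshold:
        return best_class

    # Rule 2: no single class qualifies, but the hybrid classes do when summed.
    hybrids = {k: v for k, v in posteriors.items() if k not in PARENTAL}
    if sum(hybrids.values()) >= threshold:
        top_hybrid = max(hybrids, key=hybrids.get)
        return f"Hybrid ({top_hybrid})"

    # Rule 3: report the two most probable classes, highest first.
    return f"{best_class}/{ranked[1][0]}"


example = {"bryanti": 0.05, "BC-bry": 0.45, "F1": 0.30,
           "F2": 0.10, "BC-lep": 0.05, "lepida": 0.05}
print(label_newhybrids(example))  # Hybrid (BC-bry)
```

The example values are invented for illustration; no single class reaches 0.70, but the four hybrid classes sum to 0.90, so the sample is labeled with its most probable hybrid class.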
STRUCTURE: Genotypic class assignments based on the STRUCTURE model. Codes are as
above, except that “F1/F2” represents a combined genotypic class (see Supplement 1).
Combined: Final genotypic class assignments based on the combined evidence from both
models. The specific rules for combining the results are presented in the main text and in
Supplement 1.
Simulation Results – STRUCTURE
This worksheet displays the STRUCTURE results using the simulated dataset, as described in the
text.
ID: Unique identifiers for each individual
Sim_Geno_Class: Genotypic classification for each simulated genotype (prior to analysis). The
class names are shown for each simulated and real individual (the 100 individuals in the parental
classes are from real data; all others are simulated). The simulated alleles are not shown.
bryanti – N. bryanti. Seed individuals (see text for details).
BC-bryanti (w/F1) – generated by “mating” N. bryanti with simulated F1s.
BC-bryanti (w/F2) – generated by “mating” N. bryanti with simulated F2s.
F1 – generated by “mating” N. bryanti with N. lepida.
F2 – generated by “mating” simulated F1s.
BC-lepida (w/F1) – generated by “mating” N. lepida with simulated F1s.
BC-lepida (w/F2) – generated by “mating” N. lepida with simulated F2s.
lepida – N. lepida. Seed individuals (see text for details).
bryanti q: The q-value produced by STRUCTURE; in theory, the proportion of a sample’s
genotype composed of N. bryanti alleles.
Structure Threshold Determination – Optimized q-value ranges following the methods of Vaha
and Primmer (2006; see text for full citation) presented with summary statistics resulting from
the STRUCTURE analysis of the simulated dataset.
Total simulated: The total number of simulated and seed individuals within each
genotypic class that were analyzed.
# correct: The number of individuals that were correctly assigned to their actual
genotypic class using the optimized threshold values.
# others: The number of samples incorrectly assigned to the given genotypic class.
Efficiency: Efficiency score following Vaha and Primmer.
Accuracy: Accuracy score following Vaha and Primmer.
Overall Performance: Overall Performance score following Vaha and Primmer.
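As a sketch of how the three scores relate to the "Total simulated", "# correct", and "# others" columns, the function below uses the definitions as we understand them from Vaha and Primmer (2006): efficiency is the share of a class's true members that were correctly assigned, accuracy is the share of individuals assigned to a class that truly belong to it, and overall performance is their product. The function name and the example counts are hypothetical.

```python
# Sketch of the Vaha and Primmer (2006) scores as we understand them:
#   efficiency          = correctly assigned / true members of the class
#   accuracy            = correctly assigned / all individuals assigned to the class
#   overall performance = efficiency * accuracy


def vaha_primmer_scores(total, correct, others):
    """total: 'Total simulated'; correct: '# correct'; others: '# others'."""
    efficiency = correct / total if total else 0.0
    assigned = correct + others
    accuracy = correct / assigned if assigned else 0.0
    return {
        "efficiency": efficiency,
        "accuracy": accuracy,
        "overall_performance": efficiency * accuracy,
    }


# Hypothetical counts for one genotypic class (not from this dataset).
print(vaha_primmer_scores(total=40, correct=30, others=10))
# {'efficiency': 0.75, 'accuracy': 0.75, 'overall_performance': 0.5625}
```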
q-value ranges: This table compares Efficiency and Accuracy scores based on various q-value
ranges for each genotypic class. The ranges were chosen, based on visual inspection of the data,
with the intent of maximizing the summed score. The row highlighted in yellow indicates the range
that was ultimately used.
Total simulated: Same as above.
# correct: Same as above.
# others: Same as above.
Efficiency: Same as above.
Accuracy: Same as above.
Sum: Sum of the Efficiency and Accuracy scores. The q-value range with the highest summed
score (or, in the case of a tie, the tied range with the fewest erroneous assignments) was
considered optimal.
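The selection rule for the optimal q-value range (highest Efficiency + Accuracy, ties broken by the fewest "# others") can be sketched as follows. The RangeResult fields and the candidate values are hypothetical; rounding the sum is a design choice so that floating-point noise does not hide a genuine tie.

```python
from dataclasses import dataclass


@dataclass
class RangeResult:
    """One row of the q-value ranges table (field names are hypothetical)."""
    q_low: float       # lower bound of the bryanti q-value range
    q_high: float      # upper bound of the bryanti q-value range
    efficiency: float
    accuracy: float
    others: int        # '# others': erroneous assignments to this class

    @property
    def summed(self) -> float:
        # Round so that, e.g., 0.90 + 0.80 and 0.80 + 0.90 compare as a tie.
        return round(self.efficiency + self.accuracy, 9)


def pick_optimal(rows):
    """Highest summed score wins; ties go to the fewest erroneous assignments."""
    return max(rows, key=lambda r: (r.summed, -r.others))


candidates = [
    RangeResult(0.00, 0.20, efficiency=0.90, accuracy=0.80, others=5),
    RangeResult(0.00, 0.25, efficiency=0.80, accuracy=0.90, others=3),
    RangeResult(0.00, 0.15, efficiency=0.85, accuracy=0.80, others=1),
]
best = pick_optimal(candidates)
print(best.q_low, best.q_high)  # 0.0 0.25
```

The first two invented candidates tie on the summed score (1.70), so the one with fewer erroneous assignments wins.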
Simulation Results – NEWHYBRIDS
This worksheet displays the NEWHYBRIDS results using the simulated dataset, as described in
the text.
ID: Unique identifiers for each individual. These ID numbers correspond to the ID numbers in
the STRUCTURE results.
Sim_Geno_Class: Genotypic classification for each simulated genotype (prior to analysis). The
class names are shown for each simulated and real individual (the 100 individuals in the parental
classes are from real data; all others are simulated). The simulated alleles are not shown.
bryanti – N. bryanti. Seed individuals (see text for details).
BC-bryanti (w/F1) – generated by “mating” N. bryanti with simulated F1s.
BC-bryanti (w/F2) – generated by “mating” N. bryanti with simulated F2s.
F1 – generated by “mating” N. bryanti with N. lepida.
F2 – generated by “mating” simulated F1s.
BC-lepida (w/F1) – generated by “mating” N. lepida with simulated F1s.
BC-lepida (w/F2) – generated by “mating” N. lepida with simulated F2s.
lepida – N. lepida. Seed individuals (see text for details).
Posterior Probabilities: Posterior probabilities for each of the six genotypic classes.
NewHybrids Threshold Testing: Shows how we tested seven posterior probability thresholds
(0.80 to 0.51, in increments of 0.05), following Vaha and Primmer (2006), to determine that the
optimal threshold was 0.70 (see Supplement A).

Total – Total individuals within each class (all individuals were simulated except parental
classes).
# ≥ [posterior probability threshold] – Number of samples within each class that were
correctly assigned at the given threshold.
# others ≥ [posterior probability threshold] – Number of samples that were incorrectly
assigned to the given class.
Efficiency: Efficiency score following Vaha and Primmer.
Accuracy: Accuracy score following Vaha and Primmer.
Overall Performance: Overall Performance score following Vaha and Primmer.
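The threshold test can be sketched as a sweep over candidate posterior probability thresholds for one genotypic class. The list of seven thresholds is our assumed reading of "0.80 – 0.51 at 5-point increments", and the toy samples are invented. Because every threshold is above 0.50, an individual can exceed it in at most one class.

```python
# Assumed reading of "0.80 – 0.51 at 5-point increments" (seven thresholds).
THRESHOLDS = [0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.51]


def threshold_sweep(samples, target, thresholds=THRESHOLDS):
    """samples: list of (true_class, {class: posterior}) pairs.

    Returns one row per threshold:
    (threshold, # correct, # others, efficiency, accuracy, overall performance).
    """
    total = sum(1 for true, _ in samples if true == target)
    rows = []
    for t in thresholds:
        # Individuals assigned to the target class at this threshold.
        hits = [true for true, post in samples if post.get(target, 0.0) >= t]
        correct = sum(1 for true in hits if true == target)
        others = len(hits) - correct
        efficiency = correct / total if total else 0.0
        accuracy = correct / len(hits) if hits else 0.0
        rows.append((t, correct, others, efficiency, accuracy, efficiency * accuracy))
    return rows


# Toy simulated data (hypothetical): three true F1s and two true F2s.
toy = [("F1", {"F1": 0.90}), ("F1", {"F1": 0.72}), ("F1", {"F1": 0.60}),
       ("F2", {"F1": 0.74}), ("F2", {"F1": 0.30})]
for row in threshold_sweep(toy, "F1"):
    print(row)
```

Lowering the threshold raises efficiency but can pull in misassigned individuals, which is the trade-off the Overall Performance score is meant to balance.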
Isotopes
Subject #: Unique identifiers. ID numbers do not correspond to ID numbers in other datasets.
d15N (‰ vs. air): Nitrogen 15 isotope ratio values.
d13C (‰ vs. VPDB): Carbon 13 isotope ratio values.
Species ID: Genotypic class of each sampled individual
Bryanti = N. bryanti
Lepida = N. lepida
BC-lepida = backcrossed N. lepida
BC-bryanti = backcrossed N. bryanti
F1 = 1st generation hybrid
Survival on Site
Data used to calculate survivorship estimates
SurvID: Unique identifiers. ID numbers do not correspond to ID numbers in other datasets.
Sex:
M = male
F = female
Period (.) = Unknown
Spp: Genotypic Class
bryanti = N. bryanti
lepida = N. lepida
hybrid = any hybrid including backcrosses, F1s and F2s.
Age First Capture: Age at first capture
1 = juvenile
2 = subadult
3 = adult
Young-of-the-year individuals occur in both class 1 and class 2.
Survival Seasons: Equivalent to the number of years an individual was recaptured.
0 = found only in the season first tagged
1 = found one year post-tagging (but not thereafter)
2 = found ≥2 years post-tagging
Censored: Whether the animal was censored. Censored animals were excluded from a survival
analysis if censoring occurred BEFORE the survival transition being tested.
0 = no
1 = yes
Dispersal
Data used to calculate dispersal values
DispID: Unique identifiers. ID numbers do not correspond to ID numbers in other datasets.
Sex: Self-explanatory
Spp: Genotypic class of an individual
bryanti = N. bryanti
lepida = N. lepida
hybrid = any hybrid including backcrosses, F1s and F2s.
Age-befDisp: Age of an individual in the season just before the recorded dispersal event.
1 = juvenile
2 = subadult
3 = adult
Young-of-the-year individuals occur in both class 1 and class 2.
year_to_year_disp_m: Distance (in meters) between capture localities (i.e., dispersal distance) in
two different years. Sampling always occurred in May and June.
Ln_yy_disp_m: Natural-log transformation of the year-to-year dispersal distance (year_to_year_disp_m, in meters).
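A minimal sketch of the transformation, assuming (as the column name suggests) the natural logarithm. A distance of zero has no finite logarithm, and this README does not say how such cases were handled; returning None is purely an assumption of this sketch, and the function name is hypothetical.

```python
import math
from typing import Optional


def ln_dispersal(distance_m: float) -> Optional[float]:
    """Natural log of a year-to-year dispersal distance in meters.

    Zero distances have no finite logarithm; returning None here is an
    assumption, since the README does not say how they were handled.
    """
    if distance_m <= 0:
        return None
    return math.log(distance_m)


print(ln_dispersal(100.0))  # about 4.605
print(ln_dispersal(0.0))    # None
```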
Vegetation
Dataset used to test environment/genotype correlation
Woodrat ID: Unique identifier. These numbers correspond to the ID numbers in the Model
Results dataset (for individuals included in both datasets).
GenClass: Genotypic Class of individual.
Spp1 – Spp45: Species1 – Species45. Number of individual plants, per species, observed along
transects radiating out from woodrat nests.
Rock: Number of times rock was observed at each meter along the 4 transects (part of substrate
analysis).
Sand: Number of times sand was observed at each meter along the 4 transects (part of substrate
analysis).