ReadMe file

Shurtliff, Q.R., Murphy, P.J., and Matocq, M.D. (In Press). ECOLOGICAL SEGREGATION IN A SMALL MAMMAL HYBRID ZONE: HABITAT-SPECIFIC MATING OPPORTUNITIES AND SELECTION AGAINST HYBRIDS RESTRICT GENE FLOW ON A FINE SPATIAL SCALE. Evolution.

Direct questions to Quinn R. Shurtliff (qshurtliff@gmail.com)


Model Results

This dataset shows outputs from two models (NewHybrids and Structure) used to assign individual woodrats to genotypic classes.

ID: Unique identifiers for each woodrat specimen analyzed. Tissue samples for many individuals were archived at the Museum of Vertebrate Zoology, Berkeley, California, USA (http://mvz.berkeley.edu/Mammal_Collection.html; MVZ:Mamm:225847–226077; 227035–227040).

NEWHYBRIDS: Genotypic class assignments based on the NEWHYBRIDS model.
    bryanti = N. bryanti
    BC-bry or BC-bryanti = backcross N. bryanti
    BC-lep or BC-lepida = backcross N. lepida
    lepida = N. lepida
    F1 = First-generation hybrid
    F2 = Theoretically, a second-generation hybrid generated by crossing two F1s (though in reality it could result from any two hybrids breeding).
    Hybrid = Sample cannot be assigned to a single genotypic class because its posterior probability does not reach the threshold value in any one class. However, if posterior probabilities were summed across all hybrid classes, they would exceed the threshold. The hybrid class in parentheses had the highest posterior probability.
    Where two classes are separated by a “/”, the class listed first had the highest posterior probability, though neither had a value high enough to exceed the threshold of 0.70.
    Color Code
    o Blue = N. bryanti
    o Yellow = N. lepida
    o Purple = hybrid

STRUCTURE: Genotypic class assignments based on the STRUCTURE model. Codes are as above, except that “F1/F2” represents a combined genotypic class (see Supplement 1).

Combined: Final genotypic class assignments based on the combined evidence from both models.
Specific rules regarding how results were combined are presented in the main text and in Supplement 1.


Simulation Results – STRUCTURE

This worksheet displays the STRUCTURE results using the simulated dataset, as described in the text.

ID: Unique identifiers for each individual.

Sim_Geno_Class: Genotypic classification for each simulated genotype (prior to analysis). The class names are shown for each simulated and real individual (the 100 individuals in the parental classes are from real data; all others are simulated). The simulated alleles are not shown.
    bryanti – N. bryanti. Seed individuals (see text for details).
    BC-bryanti (w/F1) – generated by “mating” N. bryanti with simulated F1s.
    BC-bryanti (w/F2) – generated by “mating” N. bryanti with simulated F2s.
    F1 – generated by “mating” N. bryanti with N. lepida.
    F2 – generated by “mating” simulated F1s.
    BC-lepida (w/F1) – generated by “mating” N. lepida with simulated F1s.
    BC-lepida (w/F2) – generated by “mating” N. lepida with simulated F2s.
    lepida – N. lepida. Seed individuals (see text for details).

bryanti q: The q-value produced by STRUCTURE. In theory, the proportion of a sample’s genotype that is composed of N. bryanti alleles.

Structure Threshold Determination: Optimized q-value ranges following the methods of Vaha and Primmer (2006; see text for full citation), presented with summary statistics from the STRUCTURE analysis of the simulated dataset.
    Total simulated: The total number of simulated and seed individuals within each genotypic class that were analyzed.
    # correct: The number of individuals that were correctly assigned to their actual genotypic class using the optimized threshold values.
    # others: The number of samples incorrectly assigned to the given genotypic class.
    Efficiency: Efficiency score following Vaha and Primmer.
    Accuracy: Accuracy score following Vaha and Primmer.
    Overall Performance: Overall Performance score following Vaha and Primmer.
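The three scores listed above can be computed from the count columns. The sketch below assumes the usual reading of Vaha and Primmer (2006): efficiency is the share of a class's individuals that were correctly assigned, accuracy is the share of all assignments to that class that were correct, and overall performance is their product. The function name and the example counts are hypothetical and are not taken from the worksheet.

```python
def assignment_scores(total_simulated, n_correct, n_others):
    """Return (efficiency, accuracy, overall performance) for one genotypic class.

    total_simulated: individuals that truly belong to the class ("Total simulated")
    n_correct:       of those, the number assigned to the class ("# correct")
    n_others:        individuals from other classes assigned to it ("# others")
    """
    if total_simulated <= 0:
        raise ValueError("total_simulated must be positive")
    efficiency = n_correct / total_simulated
    assigned = n_correct + n_others
    # Assumption: accuracy is reported as 0 when nothing was assigned to the class.
    accuracy = n_correct / assigned if assigned else 0.0
    return efficiency, accuracy, efficiency * accuracy


# Hypothetical counts for illustration only.
eff, acc, perf = assignment_scores(total_simulated=100, n_correct=90, n_others=10)
print(f"efficiency={eff:.2f} accuracy={acc:.2f} overall={perf:.2f}")
```

Keeping the three scores in one function makes it easy to recompute them for any candidate q-value range or posterior probability threshold.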
q-value ranges: This table compares Efficiency and Accuracy scores across various q-value ranges for each genotypic class. The ranges were chosen by visual inspection of the data, with the intent of optimizing the summed score. The yellow row indicates the range that was ultimately used.
    Total simulated: Same as above.
    # correct: Same as above.
    # others: Same as above.
    Efficiency: Same as above.
    Accuracy: Same as above.
    Sum: Sum of the Efficiency and Accuracy scores. The q-value range with the highest summed score (or, in the case of a tie, the tied range with the fewest erroneous assignments) was considered optimal.


Simulation Results – NEWHYBRIDS

This worksheet displays the NEWHYBRIDS results using the simulated dataset, as described in the text.

ID: Unique identifiers for each individual. These ID numbers correspond to the ID numbers in the STRUCTURE results.

Sim_Geno_Class: Genotypic classification for each simulated genotype (prior to analysis). The class names are shown for each simulated and real individual (the 100 individuals in the parental classes are from real data; all others are simulated). The simulated alleles are not shown.
    bryanti – N. bryanti. Seed individuals (see text for details).
    BC-bryanti (w/F1) – generated by “mating” N. bryanti with simulated F1s.
    BC-bryanti (w/F2) – generated by “mating” N. bryanti with simulated F2s.
    F1 – generated by “mating” N. bryanti with N. lepida.
    F2 – generated by “mating” simulated F1s.
    BC-lepida (w/F1) – generated by “mating” N. lepida with simulated F1s.
    BC-lepida (w/F2) – generated by “mating” N. lepida with simulated F2s.
    lepida – N. lepida. Seed individuals (see text for details).

Posterior Probabilities: Posterior probabilities for each of the six genotypic classes.
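The threshold rule described for the NEWHYBRIDS column under Model Results can be sketched from the six posterior probabilities. This is a minimal illustration and not the authors' code. It assumes the 0.70 threshold, treats a posterior probability equal to the threshold as passing, and uses a hypothetical "unassigned" label for cases the ReadMe does not describe. The class names follow the codes used in this file.

```python
# The four hybrid classes; the two parental classes are "bryanti" and "lepida".
HYBRID_CLASSES = ("F1", "F2", "BC-bryanti", "BC-lepida")


def classify(posteriors, threshold=0.70):
    """Label one individual from a dict mapping class name -> posterior probability."""
    best = max(posteriors, key=posteriors.get)
    if posteriors[best] >= threshold:
        return best  # a single class reaches the threshold
    # No single class qualifies: sum the posterior probabilities of the hybrid classes.
    hybrid_total = sum(posteriors.get(c, 0.0) for c in HYBRID_CLASSES)
    if hybrid_total >= threshold:
        top_hybrid = max(HYBRID_CLASSES, key=lambda c: posteriors.get(c, 0.0))
        return f"Hybrid ({top_hybrid})"
    return "unassigned"  # hypothetical label, not defined in the ReadMe


# Hypothetical posterior probabilities for illustration only.
example = {"bryanti": 0.10, "lepida": 0.00, "F1": 0.20,
           "F2": 0.30, "BC-bryanti": 0.40, "BC-lepida": 0.00}
print(classify(example))
```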
NewHybrids Threshold Testing: Displays how we tested seven different posterior probability thresholds (0.80 – 0.51 at 5-point increments), following Vaha and Primmer (2006), to determine that the optimal threshold was 0.70 (see Supplement A).
    Total: Total individuals within each class (all individuals were simulated except those in the parental classes).
    # ≥ [posterior probability threshold]: Number of samples within each class that were correctly assigned at the given threshold.
    # others ≥ [posterior probability threshold]: Number of samples that were incorrectly assigned to the given class.
    Efficiency: Efficiency score following Vaha and Primmer.
    Accuracy: Accuracy score following Vaha and Primmer.
    Overall Performance: Overall Performance score following Vaha and Primmer.


Isotopes

Subject #: Unique identifiers. ID numbers do not correspond to ID numbers in other datasets.

d15N (‰ vs. air): Nitrogen-15 isotope ratio values.

d13C (‰ vs. VPDB): Carbon-13 isotope ratio values.

Species ID: Genotype of each sampled individual.
    Bryanti = N. bryanti
    Lepida = N. lepida
    BC-lepida = backcrossed N. lepida
    BC-bryanti = backcrossed N. bryanti
    F1 = First-generation hybrid


Survival on Site

Data used to calculate survivorship estimates

SurvID: Unique identifiers. ID numbers do not correspond to ID numbers in other datasets.

Sex:
    M = male
    F = female
    Period (.) = unknown

Spp: Genotypic class.
    bryanti = N. bryanti
    lepida = N. lepida
    hybrid = any hybrid, including backcrosses, F1s and F2s.

Age First Capture: Age at first capture.
    1 = juvenile
    2 = subadult
    3 = adult
    Young of the year are included in both classes 1 and 2.

Survival Seasons: Equivalent to the number of years over which an individual was recaptured.
    0 = found only in the season first tagged
    1 = found one year post-tagging (but not thereafter)
    2 = found ≥2 years post-tagging

Censored: Censored animals were excluded from the survival analysis if censoring occurred BEFORE the survival transition being tested.
    0 = no
    1 = yes


Dispersal

Data used to calculate dispersal values

DispID: Unique identifiers. ID numbers do not correspond to ID numbers in other datasets.

Sex: Self-explanatory

Spp: Genotypic classification of an individual.
    bryanti = N. bryanti
    lepida = N. lepida
    hybrid = any hybrid, including backcrosses, F1s and F2s.

Age-befDisp: Age of an individual in the season just before the dispersal event was recorded.
    1 = juvenile
    2 = subadult
    3 = adult
    Young of the year are included in both classes 1 and 2.

year_to_year_disp_m: Distance (in meters) between capture localities in two different years (i.e., dispersal distance). Sampling always occurred in May and June.

Ln_yy_disp_m: Log-transformed year-to-year dispersal distance (in meters).


Vegetation

Dataset used to test environment/genotype correlation

Woodrat ID: Unique identifier. These numbers correspond to ID numbers in the Model Results dataset (where individuals were included in both datasets).

GenClass: Genotypic class of the individual.

Spp1 – Spp45: Species1 – Species45. Number of individual plants, per species, observed along transects radiating out from woodrat nests.

Rock: Number of times rock was observed at each meter along the 4 transects (part of substrate analysis).

Sand: Number of times sand was observed at each meter along the 4 transects (part of substrate analysis).