mec13269-sup-0002-AppendixS1

SI2: Additional details on data collection, data processing, and analysis
Leaves were collected from wild plants in the field, kept fresh during transport,
and stored at −80 °C until DNA extraction. DNA was extracted from fresh or
frozen material using the DNeasy plant extraction protocol (DNeasy, Qiagen, Valencia,
CA) as reported in Cavender-Bares and Pahlich (2009).
Voucher specimens for all RAD-sequenced individuals, and for one representative per site
of the individuals sampled for nuclear SSR and chloroplast markers, are housed in the
University of Minnesota Bell Museum of Natural History.
Collection authorizations and permits
Collections of the southeastern US species (Q. virginiana, Q. minima, Q. geminata) were
authorized in Florida by the Florida Department of Environmental Protection, Division of
Recreation and Parks, and approved by Clif Maxwell, District Park Biologist; and in North
Carolina by District Superintendent William Berry and Marshall Ellis, North Carolina
Department of Environment and Natural Resources, Division of Parks and Recreation.
Collection of Q. sagraena in Pinar del Rio, Cuba, was conducted with the permission of
Dr. Antonio Lopez Almirall of the Museo Nacional de Historia Natural, La Habana, Cuba.
Scientific research permits for collection were obtained in Belize from the Ministry of
Natural Resources and the Environment, authorized by Chief Forest Officer Hannah
Martinez; in Honduras from Dr. George Pilz, director of the Paul C. Standley Herbarium at
the University of Zamorano, and Dr. Lilian Ferrufino; in Costa Rica from the Ministry of
Energy and Environment, authorized by Roger Blanco; and in Mexico from the Secretariat
of Environment and Natural Resources (SEMARNAT), authorized by its Director General,
Dr. Francisco García García. Other field collections did not require written permits
because they were made from roadside populations or on private land (with the
landholders' permission) in unregulated areas. In several cases, DNA was extracted from
progeny grown in greenhouse facilities from field-collected seeds. Export, phytosanitary,
and import permits were obtained for seeds collected in Honduras, Costa Rica, Belize,
and Mexico and can be provided upon request.
RAD library preparation. DNA was extracted as described above. DNA extracts were
quantified on agarose gels by visual comparison with the New England Biolabs 100 bp
DNA Ladder (NEB, Ipswich, MA).
DNA concentrations of the extracts ranged from 5 to 10 ng/μl. A RAD sequencing library
was prepared at Floragenex Inc. (Eugene, Oregon) as described in Hipp et al. (in press),
using the PstI restriction enzyme (a 6-base cutter: 5′-CTGCA|G-3′; 3′-G|ACGTC-5′)
and attaching sample-specific barcodes. Assuming a GC content of 40%, a genome size of
500 Mb (both typical of oaks), and independently drawn nucleotides, we expect about
72,000 PstI cut sites in the oak genome: the probability of CTGCAG at a given position is
0.2⁴ × 0.3² = 1.44 × 10⁻⁴, and 1.44 × 10⁻⁴ × 5 × 10⁸ = 72,000. There was no obvious
correlation between sequence quality and either initial DNA concentration or material
type (fresh vs. frozen).
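The expected number of cut sites follows from the probability of drawing CTGCAG at random. The short Python sketch below reproduces that arithmetic under the same stated assumptions (40% GC, 500 Mb genome, independent nucleotides); the function names are illustrative only.

```python
# Expected number of PstI recognition sites (5'-CTGCAG-3') in a random genome,
# assuming independent nucleotides with a given GC content.

def base_probabilities(gc_content):
    """Per-base probabilities under independent, identically distributed nucleotides."""
    p_gc = gc_content / 2        # P(G) = P(C)
    p_at = (1 - gc_content) / 2  # P(A) = P(T)
    return {"G": p_gc, "C": p_gc, "A": p_at, "T": p_at}


def expected_sites(site, genome_size_bp, gc_content):
    """Expected occurrences of `site` when scanning one strand of a random genome.

    CTGCAG is palindromic (it equals its own reverse complement), so scanning
    one strand already counts each site once; no factor of two is needed.
    """
    probs = base_probabilities(gc_content)
    p_site = 1.0
    for base in site:
        p_site *= probs[base]
    # The number of start positions is genome_size - len(site) + 1, i.e. ~genome_size.
    return p_site * genome_size_bp


n_sites = expected_sites("CTGCAG", genome_size_bp=500e6, gc_content=0.40)
print(f"{n_sites:,.0f}")  # 72,000
```

Real genomes are not random sequences (methylation-sensitive cutters such as PstI and non-uniform base composition both matter), so this figure is an order-of-magnitude expectation rather than a prediction of recovered loci.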
Data filtering. Raw sequence data were analyzed in the software pipeline PyRAD v.1.4
(Eaton & Ree 2013), which filters and clusters RAD sequences to identify putatively
orthologous loci. This pipeline suits the phylogenetic scale of our study because it uses
global-alignment clustering, which can cluster highly divergent sequences while
accounting for indel variation. Filtering parameters were set to replace base calls with a
quality score below 20 (Q < 20) with an ambiguous base (N) and to discard sequences
containing more than three Ns.
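A minimal sketch of this read-level filter, assuming Phred+33 quality encoding (Illumina 1.8+). The function names and defaults are hypothetical and are not PyRAD's internals.

```python
# Hypothetical sketch: mask low-quality base calls as N, then drop reads with too many Ns.

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ ASCII quality encoding


def mask_low_quality(seq, qual, min_q=20):
    """Replace every base call whose Phred score is below min_q with 'N'."""
    return "".join(
        base if (ord(q) - PHRED_OFFSET) >= min_q else "N"
        for base, q in zip(seq, qual)
    )


def passes_n_filter(seq, max_n=3):
    """Keep reads containing at most max_n ambiguous bases."""
    return seq.count("N") <= max_n


def filter_reads(reads, min_q=20, max_n=3):
    """reads: iterable of (sequence, quality string) pairs; returns masked reads kept."""
    kept = []
    for seq, qual in reads:
        masked = mask_low_quality(seq, qual, min_q)
        if passes_n_filter(masked, max_n):
            kept.append(masked)
    return kept


reads = [("ACGTACGT", "IIIIIIII"), ("ACGTACGT", "II####II")]
print(filter_reads(reads))  # ['ACGTACGT']
```

In the usage line, 'I' encodes Q40 and '#' encodes Q2, so the second read acquires four Ns and is discarded.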
Within samples, reads clustered at 85% and 92% similarity yielded similar results; we
therefore report only the 85% run. Consensus base calls were made for clusters with a
depth of coverage greater than five. After error correction, loci containing more than
two alleles were excluded as potential paralogs (all taxa in this study are diploid).
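The depth and allele-count filters can be illustrated with a simplified within-sample consensus caller. This is a hypothetical sketch, not PyRAD's algorithm: PyRAD's own base caller is model-based, whereas here a site is called heterozygous when the minor base reaches an assumed fraction of reads (`het_frac`, 20%). Reads are assumed to be already aligned and of equal length.

```python
from collections import Counter

# IUPAC ambiguity codes for two-base heterozygous calls
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def consensus(cluster, min_depth=6, het_frac=0.2, max_alleles=2):
    """Consensus for one sample's cluster of aligned reads, or None if it fails a filter.

    min_depth=6 encodes "depth greater than five"; het_frac is a hypothetical threshold.
    """
    if len(cluster) < min_depth:
        return None  # too shallow to make consensus calls
    seq, het = [], {}  # het maps site index -> the two called alleles
    for i in range(len(cluster[0])):
        counts = Counter(read[i] for read in cluster if read[i] in "ACGT")
        top = counts.most_common(2)
        if not top:
            seq.append("N")  # no informative base in this column
        elif len(top) == 2 and top[1][1] / sum(counts.values()) >= het_frac:
            pair = frozenset((top[0][0], top[1][0]))
            seq.append(IUPAC[pair])
            het[i] = pair
        else:
            seq.append(top[0][0])  # homozygous call; rare bases treated as errors
    # Phase reads across heterozygous sites. A read carrying a base outside the
    # two called alleles is treated as containing a sequencing error and skipped.
    haplotypes = {
        tuple(read[i] for i in het)
        for read in cluster
        if all(read[i] in pair for i, pair in het.items())
    }
    if len(haplotypes) > max_alleles:
        return None  # more than two alleles in a diploid: likely clustered paralogs
    return "".join(seq)
```

For example, three reads of `ACGT` plus three of `ACAT` yield `ACRT`, while six reads spread over three distinct haplotypes are rejected.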
Consensus loci were then clustered across samples at 85% similarity and aligned. A final
filtering step excluded loci containing any site that is heterozygous in more than three
samples, because such shared heterozygosity is more likely to reflect a fixed difference
between clustered paralogs than a true polymorphism at the scale of this study.
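The shared-heterozygosity filter amounts to counting, at each alignment column, how many samples carry an IUPAC ambiguity code. A hypothetical sketch, assuming each locus is stored as a mapping from sample name to aligned consensus sequence:

```python
HET_CODES = set("RYSWKM")  # two-base IUPAC ambiguity codes produced by consensus calling


def max_shared_het(locus):
    """Largest number of samples heterozygous at any single alignment column.

    locus: dict mapping sample name -> aligned consensus sequence (equal lengths).
    """
    seqs = list(locus.values())
    ncol = len(seqs[0]) if seqs else 0
    return max(
        (sum(seq[i] in HET_CODES for seq in seqs) for i in range(ncol)),
        default=0,
    )


def filter_loci(loci, max_shared=3):
    """Drop loci with any site heterozygous in more than max_shared samples."""
    return {name: locus for name, locus in loci.items()
            if max_shared_het(locus) <= max_shared}
```

A locus with a column that is heterozygous in four samples is removed, whereas one shared by three samples is retained.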
Phylogenetic supermatrices. In RADseq data sets it is common for any given locus to be
missing from many samples, especially among more distantly related taxa. To minimize
the amount of missing data in concatenated supermatrices, we included only loci with
data for at least a specified minimum number of samples, creating several data sets of
varying size for different analyses. The largest supermatrix, which we refer to as the
“All_min4” data set, includes all loci present in at least four samples; a smaller but more
densely sampled matrix, termed the “All_min20” data set, was generated by requiring a
minimum of 20 samples. The “min4” data sets were generally too large for dating
analyses, so only the denser concatenated data sets were used for dating.
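The minimum-sample rule used to build the All_min4 and All_min20 matrices can be sketched as follows. The data layout is an assumption (each locus as a dict of sample name to aligned sequence), and padding absent samples with N is one common convention for coding missing data, not necessarily the one used here.

```python
def build_supermatrix(loci, samples, min_samples):
    """Concatenate the loci present in at least `min_samples` samples.

    loci: list of dicts mapping sample name -> aligned sequence; all sequences
          within one locus are assumed to share the same length.
    Returns (dict of sample -> concatenated sequence, number of loci kept).
    """
    kept = [locus for locus in loci if len(locus) >= min_samples]
    parts = {s: [] for s in samples}
    for locus in kept:
        length = len(next(iter(locus.values())))
        for s in samples:
            # Samples lacking this locus are filled with N (missing data).
            parts[s].append(locus.get(s, "N" * length))
    return {s: "".join(p) for s, p in parts.items()}, len(kept)
```

Raising `min_samples` (e.g. from 4 to 20) yields fewer loci but a smaller proportion of N-filled cells, which is the trade-off between the All_min4 and All_min20 data sets.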
The dense supermatrices comprise three clade-specific data sets (MGV_min16 for the
Q. virginiana, Q. minima and Q. geminata clade; OS_min14 for the Q. oleoides and
Q. sagraena clade; and FB_min13 for Q. fusiformis and Q. brandegei), as well as the
Sub_min20 data set, a subsample of the All_min20 supermatrix that excludes 10 samples
whose close relatives were also sampled in the data set.
Range size calculations
Latitude and longitude were compiled for every occurrence locality in our database, and
for those available from herbarium records, across the ranges of all species.
Occurrence localities were provided by Tropicos® Missouri Botanical Garden
(www.tropicos.org); the Instituto Nacional de Biodiversidad, Costa Rica; the University
of Alabama Biodiversity and Systematics (accessed through the GBIF Data Portal,
www.gbif.org, 7 July 2008); and the United States Department of Agriculture (USDA)
PLANTS Database (plants.usda.gov). Points were checked against the reported localities,
and erroneous points were removed. Range sizes were estimated as minimum convex
polygons of species occurrence points in ArcMap, projected in the UTM zone most
appropriate for each species and clipped by water boundaries. Range sizes were also
estimated from climatic niche envelopes for each species. Briefly, climate models were
generated with Maxent 3.3.1, which uses the maximum entropy method to model
species geographical distributions (Phillips, 2006). The model was run separately for
each species using all 19 WorldClim climatic variables (Hijmans et al., 2005) at each
locality. Half of the occurrence localities were used to train the model and the other half
to test its fit. Model performance was assessed from AUC values (the area under the
receiver operating characteristic curve). An AUC of 1.0 is optimal, indicating that the
model correctly predicts every occurrence of the species. Range area was based on the
95% distribution envelope. These
contrasting estimation approaches were strongly correlated with each other (R² = 0.94).
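The minimum convex polygon estimate described above can be approximated outside ArcMap with a convex hull and the shoelace formula. This sketch assumes the occurrence coordinates have already been projected to metres (for example into the appropriate UTM zone) and, unlike the analysis reported here, does not clip the polygon by water boundaries; all function names are hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for a simple polygon given in vertex order."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2


def mcp_area_km2(projected_points_m):
    """Minimum convex polygon area (km²) of projected occurrence points in metres."""
    return polygon_area(convex_hull(projected_points_m)) / 1e6
```

Interior points such as (500, 500) inside a 1 km square do not change the hull, so the area of that square's corners plus the interior point is 1.0 km².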
Metrics of genetic diversity were regressed against range area, used as a proxy for
population size, to test whether population size predicts genetic diversity.
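The regression of genetic diversity on range area can be sketched as an ordinary least-squares fit. This minimal version returns the slope, intercept and R²; the input layout (paired per-species lists) is an assumption, and whether range area should be transformed (e.g. log-scaled) before fitting is left to the caller.

```python
def ols(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared
```

A call such as `ols(range_area_km2, heterozygosity)` (hypothetical names, one value per species) gives the fitted line and the proportion of variance in the diversity metric explained by range area.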