SI2: Additional details on data collection, data processing, and analysis

Leaves were collected from wild plants in the field, kept fresh during transport, and stored as leaf tissue at –80°C until DNA extraction. DNA was extracted from fresh or frozen material using the DNeasy plant extraction protocol (DNeasy, Qiagen, Valencia, CA) as reported in Cavender-Bares and Pahlich (2009). Voucher specimens for all RAD-sequenced individuals, and for one representative per site for the individuals sampled for nuclear SSR and chloroplast markers, are housed in the University of Minnesota Bell Museum of Natural History.

Collection authorizations and permits. Collections of the southeastern US species (Q. virginiana, Q. minima, Q. geminata) were authorized in Florida by the Florida Division of Environmental Protection, Division of Recreation and Parks, and approved by Clif Maxwell, District Park Biologist, and in North Carolina by District Superintendent William Berry and Marshall Ellis, North Carolina Department of Environment and Natural Resources, Division of Parks and Recreation. Collection of Q. sagraena in Pinar del Rio, Cuba, was conducted with permission of Dr. Antonio Lopez Almirall at the Museo Nacional de Historia Natural, La Habana, Cuba. Scientific research permits for collection were obtained in Belize from the Ministry of Natural Resources and the Environment, authorized by Chief Forest Officer Hannah Martinez; in Honduras from the director of the Paul C. Standley Herbarium at the University of Zamorano, Dr. George Pilz, and Dr. Lilian Ferrufino; in Costa Rica from the Ministry of Energy and Environment, authorized by Roger Blanco; and in Mexico from the Secretariat of Environment and Natural Resources (SEMARNAT), authorized by the Director General, Dr. Francisco García García. Other field collections did not require written permits, as they were made from roadside populations or private land (with permission from landholders) in unregulated areas.
In several cases, DNA was extracted from progeny grown in greenhouse facilities from field-collected seeds. Export, phytosanitary, and import permits were obtained for seed collection from Honduras, Costa Rica, Belize, and Mexico and can be provided upon request.

RAD library preparation. DNA extractions (described above) were gel-quantified in agarose by visual comparison with the New England Biolabs 100 bp DNA Ladder (NEB, Ipswich, MA). Extraction concentrations ranged from 5–10 ng DNA/μl. A RAD sequencing library was prepared at Floragenex Inc. (Eugene, Oregon) as described in Hipp et al. (in press) using the PstI restriction enzyme (a 6-base cutter: 5′-CTGCA|G-3′; 3′-G|ACGTC-5′) and attachment of sample-specific barcodes. Assuming a GC content of 40% and a genome size of 500 Mb (both typical of oaks), and a completely random draw of nucleotides, we expect about 72,000 PstI cut sites in the oak genome (0.2^4 × 0.3^2 × 5 × 10^8 = 72,000). There was no obvious correlation between sequence quality and initial DNA concentration or material type (fresh vs. frozen).

Data filtering. Raw sequence data were analyzed in the software pipeline PyRAD v.1.4 (Eaton & Ree 2013), which filters and clusters RAD sequences to identify putatively orthologous loci. This pipeline is suited to the phylogenetic scale of our study because it uses global alignment clustering, which can cluster highly divergent sequences while taking indel variation into account. Filtering parameters were set to replace base calls of Q<20 with an ambiguous base (N) and to discard sequences containing more than three Ns. Reads clustered at 85% and 92% similarity yielded similar results; we therefore report only those of the 85% run. Consensus base calls were made for clusters with a minimum depth of coverage greater than five.
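The read-level quality filter described above (mask base calls below Q20, then discard reads with more than three Ns) can be illustrated with a minimal sketch. This is not PyRAD's own code: the function names `phred33` and `mask_low_quality` are hypothetical, and the Phred+33 quality encoding is an assumption about the input FASTQ files.

```python
def phred33(quality_string):
    """Decode a FASTQ quality string, ASSUMING the Phred+33 offset."""
    return [ord(ch) - 33 for ch in quality_string]


def mask_low_quality(seq, quals, min_q=20, max_n=3):
    """Replace base calls with quality below min_q by 'N'.

    Returns the masked read, or None if the masked read contains
    more than max_n Ns (Ns already present in the read are counted too).
    Hypothetical helper for illustration; thresholds follow the text.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    masked = "".join(
        base if q >= min_q else "N" for base, q in zip(seq, quals)
    )
    return masked if masked.count("N") <= max_n else None


# Example: one low-quality call ('#' decodes to Q2) is masked, read is kept.
read = mask_low_quality("ACGTACGT", phred33("IIII#III"))
```

In this example `read` is the masked sequence with a single N at the fifth position; a read with four or more masked positions would instead be returned as `None` and dropped.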
After correcting for errors, loci containing more than two alleles were excluded as potential paralogs (all taxa in this study are diploid). Consensus loci were then clustered across samples at 85% similarity and aligned. A final filtering step excluded loci containing any site that is heterozygous across more than three samples, as such a site is more likely to represent a fixed difference among clustered paralogs than a true polymorphism at the scale of this study.

Phylogenetic supermatrices. It is common for data to be missing from many samples at any given locus in a RADseq data set, especially among more distantly related taxa. To minimize the amount of missing data in concatenated supermatrices, we included only loci that had data for at least some minimum number of samples. In this way we created a number of data sets of varying size that were used for different analyses. The largest supermatrix includes all loci with at least four samples present, which we refer to as the “All_min4” data set; a smaller but more densely sampled matrix, termed the “All_min20” data set, was generated by requiring a minimum of 20 samples. The “min4” data sets were generally too large for dating analyses, so only the denser concatenated data sets were used. The dense supermatrices comprise one for the Q. virginiana, Q. minima, and Q. geminata clade (MGV_min16), one for the Q. oleoides and Q. sagraena clade (OS_min14), and one for Q. fusiformis and Q. brandegei (FB_min13), as well as a subsample of the All_min20 supermatrix that excluded 10 samples with other close relatives sampled in the data set, which we refer to as the Sub_min20 data set.

Range size calculations. Latitude and longitude of each occurrence locality in our database, as well as those available from herbarium records, were compiled across the ranges of all species.
Occurrence localities were provided by Tropicos® Missouri Botanical Garden (www.tropicos.org); the Instituto Nacional de Biodiversidad, Costa Rica; the University of Alabama Biodiversity and Systematics (accessed through the GBIF Data Portal, www.gbif.org, 7 July 2008); and the United States Department of Agriculture (USDA) PLANTS Database (plants.usda.gov). Points were examined in relation to reported locations, and errors were removed. Range sizes were estimated using minimum convex polygons of species occurrence points in ArcMap, projected using the UTM zone projection most appropriate for each species and clipped by water boundaries.

Range sizes were also estimated from climatic niche envelopes for each species. Briefly, climate models were generated using the program Maxent 3.3.1, which uses the maximum entropy method for modelling species geographical distributions (Phillips, 2006). The model was run separately for each species with all 19 climatic variables from WorldClim (Hijmans et al., 2005) at each locality. Fifty per cent of the occurrence localities were used for training the model and the other fifty per cent for testing its fit. The AUC (area under the curve of a receiver operating characteristic plot) values were examined to measure model performance. An AUC value of 1.0 is optimal, with the model predicting each occurrence of a species. Range area was based on the 95% distribution envelope. The range areas from these two contrasting estimation approaches were strongly correlated with each other (R² = 0.94). Metrics of genetic diversity were regressed against range area, a proxy for population size, to test whether genetic diversity could be predicted by population size.
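The minimum convex polygon estimate of range size described above was computed in ArcMap; the underlying geometry can be illustrated with a minimal sketch. The sketch assumes occurrence points are already projected to planar coordinates (for example UTM eastings and northings in metres) and does not reproduce the clipping by water boundaries. The function names `convex_hull` and `polygon_area` are hypothetical.

```python
def convex_hull(points):
    """Convex hull by Andrew's monotone chain.

    Takes an iterable of (x, y) tuples in planar (projected) coordinates
    and returns the hull vertices in counter-clockwise order.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # Drop the last point of each chain (it repeats the start of the other).
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula (squared input units)."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


# Toy example with made-up coordinates: four corners plus one interior point.
toy_points = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
toy_area = polygon_area(convex_hull(toy_points))
```

For the toy points the hull is the 4 × 3 rectangle, so the interior point does not affect the area. With real data the result would be in square metres if the inputs are UTM coordinates in metres.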