Additional file

Tables
Table S1. Loci used in this study
Table S2. Summary statistics and neutrality tests of populations at species boundaries
Table S3. Other priors tested in IMa2
Table S4. Model evaluation for ecological niche modeling

Figures
Figure S1. Locations of occurrence data used for ecological niche modeling
Figure S2. Projection of current and past distribution of R. palmatus and R. grayanus using Maxent

Tables

Table S1. Loci used in this study

Loci used and summary statistics at species boundaries and contact zone (Table 1). The putative gene functions are annotated based on the database of Arabidopsis thaliana reference proteins (minimum e-value threshold below 10⁻²⁰). bp: total length of sequence used for analysis; n(total): the total number of phased sequences; S: the total number of segregating sites; fixed S: the number of segregating sites between species that are fixed within a species; FST: population differentiation. COP1- and GSTF-homolog gene sequences included non-coding regions. Asterisk indicates chloroplast DNA region.

Contact Zones
putative gene function   bp     n     S     number of haplotypes   θ(total)   π(total)
CO                       305    154   6     9                      0.0041     0.0067
COP1                     276    158   5     6                      0.0032     0.0046
                         118    158   4     4                      0.0060     0.0122
                         482    160   9     8                      0.0035     0.0070
                         258    154   3     4                      0.0021     0.0026
                         328    158   7     4                      0.0037     0.0077
                         349    158   7     10                     0.0036     0.0054
                         639    158   9     7                      0.0025     0.0042
                         416    160   10    5                      0.0042     0.0107
                         362    158   7     6                      0.0034     0.0077
PAL4                     603    160   15    13                     0.0044     0.0054
LDOX                     393    154   4     4                      0.0018     0.0035
PHR2                     518    158   9     11                     0.0038     0.0045
EXPA                     237    160   4     5                      0.0030     0.0047
XET                      248    160   5     5                      0.0036     0.0067
hypothetical protein     276    154   2     3                      0.0013     0.0034
IAA                      351    160   3     4                      0.0015     0.0026
TrnH_psbA*               238    80    3     2                      0.0030     0.0053
total                    6302         112

The eight rows without a label correspond to CCNB1, hypothetical protein, ETR, PHYA, F3H, F3GT, GI and GSTF; the row-by-row assignment of these labels is not preserved here.

Table S2. Summary statistics and neutrality tests of populations at species boundaries

Population summary statistics for the populations at the species boundaries (Table 1).
The parameters shown are the number of haplotypes studied (n); the average number of segregating sites (S); and nucleotide diversity (θ and π) for total, synonymous (s), non-synonymous (a), and non-coding sites. Tajima's D and Fu and Li's D were also calculated. Mean values and lower and upper 95% intervals were obtained from 1000 coalescent simulations and compared with the observed values to test their significance. +p < 0.10, *p < 0.05, **p < 0.01.

Table S3. Other priors tested in IMa2

Other priors tested in this study and the highest probability density (HPD) estimates, with 95% HPD intervals in parentheses, in isolation-with-migration models of two species populations (pop1 and pop2): divergence time of the two focal populations in millions of years (t); population size in thousands for population 1 (θ1), population 2 (θ2), and the ancestral population (θA); and migration rate from population 2 to 1 (2NM1<2) and from population 1 to 2 (2NM1>2). Migration rates (2NM) were tested by likelihood ratio tests (Nielsen and Wakeley, 2001); *p < 0.05, **p < 0.01, ***p < 0.001. We used two migration priors for independent runs: a uniform distribution on [0,1] for migration rate, and an exponential distribution with mean m* = 0.05.
prior
model   pop1, 2     θ     m       t    hn
run1    pYK, pEB    10    1       4    20
run2    pYK, pEB    10    1       5    40
run3    pYK, pEB    10    0.05#   5    40
run4    gAM, gYK    4     2       2    20
run5    gAM, gYK    10    1       5    40
run6    gAM, gYK    5     0.05#   4    40
run7    pYK, gAM    10    1       5    40
run8    pYK, gAM    10    0.05#   5    40
run9    pYK, gYK    10    1       5    40
run10   pYK, gYK    10    0.05#   5    40

posterior
model   t                     θ1                 θ2                 θA                 2NM1<2                    2NM1>2
run1    0.14 (0.03-3.63)      18.8 (9.3-32.0)    28.8 (16.1-45.7)   54.3 (0-255.2)     0.020 (0-0.216)           0.223 (0-0.362)
run2    0.15 (0.03-4.54)      18.4 (9.3-32.0)    28.9 (13.6-45.7)   54.8 (0-288.4)     0.031 (0-0.215)           0.227 (0-0.363)
run3    0.11 (0.05-0.21)      17.5 (8.4-31.1)    27.5 (14.8-46.1)   59.8 (34.8-96.1)   0.000 (0-0.035)           0.000 (0-0.073)
run4    0.016 (0.003-0.13)    5.4 (1.3-14.1)     11.1 (2.9-27.6)    22.4 (7.9-42.6)    0.034 (0-0.235)           0.054 (0-0.477)
run5    0.015 (0.006-0.07)    4.8 (1.5-14.0)     10.2 (2.7-26.5)    22.3 (8.9-40.2)    0.010 (0-0.104)           0.007 (0-0.212)
run6    0.015 (0.006-0.065)   4.8 (1.5-13.5)     9.8 (2.7-26.5)     22.3 (9.4-40.2)    0.000 (0-0.013)           0.000 (0-0.025)
run7    1.15 (0.42-4.59)      42.2 (26.7-62.9)   12.9 (6.0-25.0)    20.9 (0-348.4)     0.000 (0-0.106)           0.000 (0-0.079)
run8    0.99 (0.44-1.67)      45.1 (29.0-66.4)   15.8 (7.7-27.9)    34.2 (0-123.1)     0.000 (0-0.041)           0.000 (0-0.019)
run9    1.00 (0.76-4.74)      12.7 (6.2-23.4)    13.9 (6.8-25.8)    12.1 (0-528.7)     0.072*** (0.028-0.131)    0.015** (0.001-0.070)
run10   1.07 (0.27-2.22)      17.5 (8.6-29.9)    16.9 (8.6-29.3)    0.9 (0-170.3)      0.022*** (0.005-0.057)    0.005** (0.0003-0.026)

Table S4. Model evaluation for ecological niche modeling

Averaged TSS, AUC and Kappa values for model evaluation. The values were averaged from 10 runs of each modeling technique for each of 5 sets of pseudo-absence datasets.

R. palmatus
model   GLM     GAM     MARS    CTA     RF      MAXENT
TSS     0.850   0.846   0.847   0.818   0.854   0.839
AUC     0.960   0.960   0.959   0.926   0.971   0.960
KAPPA   0.862   0.826   0.875   0.740   0.960   0.946

R. grayanus
model   GLM     GAM     MARS    CTA     RF      MAXENT
TSS     0.937   0.959   0.961   0.940   0.998   0.997
AUC     0.969   0.979   0.987   0.970   0.999   0.997
KAPPA   0.859   0.857   0.855   0.837   0.865   0.854

Figures

Figure S1. Locations of occurrence data used for ecological niche modeling. Red and green circles represent occurrence data for R. palmatus and R. grayanus, respectively.
Occurrence data were obtained from GBIF (http://www.gbif.org) and also included the sampling sites for phylogeographic analysis in this study.

Figure S2. Projection of current (a, b) and past (c, d) distribution of R. palmatus and R. grayanus using Maxent. Maxent is a modeling technique that uses only presence data. A dozen runs were cross-validated with random seeds and averaged for each species projection.
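
As an illustration of the evaluation statistics reported in Table S4, the following minimal Python sketch shows how TSS and Kappa can be computed from a confusion matrix, and AUC from predicted suitability scores. It is not the pipeline used in this study: the function names (tss, kappa, auc), the example counts, and the example scores are hypothetical and chosen only for illustration. "Absence" here would correspond to pseudo-absence points.

```python
def tss(tp, fp, fn, tn):
    """True skill statistic: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)  # fraction of presences predicted present
    specificity = tn / (tn + fp)  # fraction of absences predicted absent
    return sensitivity + specificity - 1.0


def kappa(tp, fp, fn, tn):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal totals of the confusion matrix.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (p_observed - p_expected) / (1.0 - p_expected)


def auc(presence_scores, absence_scores):
    """AUC as the probability that a presence outscores an absence.

    Ties between a presence score and an absence score count as one half.
    """
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts (not data from this study).
    tp, fp, fn, tn = 40, 10, 5, 45
    print("TSS  ", round(tss(tp, fp, fn, tn), 3))
    print("Kappa", round(kappa(tp, fp, fn, tn), 3))
    # Hypothetical suitability scores for presence and pseudo-absence points.
    print("AUC  ", round(auc([0.9, 0.8, 0.4], [0.5, 0.3]), 3))
```

TSS and Kappa depend on the threshold used to convert continuous suitability into presence or absence, whereas AUC is threshold-independent, which is one reason the three statistics can rank the modeling techniques differently.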