Recurrent selection for the Winters sex-ratio genes in Drosophila simulans

Submitted to Genetics

Sarah B. Kingan*1, Daniel Garrigan†, and Daniel L. Hartl*

* Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138 USA
† Department of Biology, University of Rochester, Rochester, NY 14627
1. Present Address: Department of Biology, University of Rochester, Rochester, NY 14627

Corresponding Author: Sarah B. Kingan, University of Rochester, Department of Biology, River Campus, Box 270211, Rochester, NY 14627-0211. Phone: (585) 275-4509. Fax: (585) 275-2070. e-mail: skingan@mail.rochester.edu

ABSTRACT

Drosophila simulans is unusual in having at least three distinct systems of X chromosome meiotic drive. These selfish genetic elements have biased transmission during meiosis, resulting in an excess of female progeny. Here, we characterize naturally occurring genetic variation at the Winters sex-ratio driver (Distorter on the X or Dox), its progenitor gene (Mother of Dox or MDox), and its suppressor gene (Not Much Yang or Nmy), which have been previously mapped and characterized. We survey three North American populations as well as 13 globally distributed strains, and present molecular polymorphism data at the three loci. We find that all three genes show signatures of selection in North America, judging from levels of polymorphism and skews in the site-frequency spectrum. This signature of selection likely results from the biased transmission of the driver and selection on the suppressor for the maintenance of equal sex ratios. The timing of selection is more recent than the age of the alleles, suggesting that the driver and suppressor are coevolving under an evolutionary “arms race”. None of the Winters sex-ratio genes are fixed in D. simulans, and at all loci we find ancestral alleles, which lack the gene insertions and exhibit high levels of nucleotide polymorphism compared to the derived alleles.
In addition, we find several “null” alleles that have mutations on the derived Dox background, which result in loss of drive function. We discuss the possible causes of the maintenance of presence-absence polymorphism in the Winters sex-ratio genes.

INTRODUCTION

Meiotic drive can leave signatures in the genome similar to those of positive natural selection without increasing the fitness of an organism (LYTTLE 1993). Drive elements are preferentially transmitted during meiosis by disrupting the development or function of sperm carrying the homologous chromosome (ZIMMERING et al. 1970, meiotic drive sensu lato), or by true chromosome segregation defects during meiosis (SANDLER and NOVITSKI 1957, meiotic drive sensu stricto; TAO et al. 2007a); we use the term in the former sense throughout this article. While drive elements may arise on any chromosome, sex-chromosome meiotic drive is more easily detected due to the impact it has on the sex ratio, and a sex-linked driver is also more likely to invade a population than an autosomal driver (Hurst and Pomiankowski 1991). A chromosomal driver must maintain tight linkage with an insensitive target locus lest it drive against itself, a condition ensured by the lack of recombination between sex chromosomes (Charlesworth and Hartl 1978). Because of the impact drive elements have on sex ratios, sex-linked drivers are often referred to as “sex-ratio distorters” and the phenotype of skewed progeny sex ratios is termed “sex-ratio”.

The mere transmission advantage of a driver, unless balanced by some detrimental fitness effect or masked by a suppressor, can cause it to sweep through a population in a manner similar to a positively selected mutation (Edwards 1961; Vaz and Carvalho 2004). A complete sweep of a sex-linked driver dooms a male-less (or female-less) population to extinction (Hamilton 1967), and natural selection strongly favors genetic factors that suppress drive and restore Mendelian segregation.
Fisher (1930) presented a qualitative argument for the maintenance of an equal sex ratio, which predicts selection on any heritable variant that increases the production of the rarer sex. Fisher’s principle has been formalized mathematically and demonstrated empirically (e.g. Bodmer and Edwards 1960; Carvalho et al. 1998). Suppressors have been identified in a wide variety of meiotic drive systems and are predicted to be strongly favored by natural selection for the maintenance of equal sex ratios (reviewed by Jaenike 2001). Furthermore, the evolution of enhancer genes on a suppressed driving chromosome may enable drivers to evade suppression, setting off another bout of selection for Fisherian sex ratios through suppression (Hartl 1975).

Meiotic drive is widespread, with systems identified in mammals, insects, and plants (JAENIKE 2001). Drosophila is the most extensively studied insect taxon, and sex-chromosome meiotic drive systems have been identified in more than a dozen species (Jaenike 2001). The rapid evolution of suppressors by Fisherian selection results in a cryptic sex-ratio distorter, which may be identified only when the association between the driver and suppressor is lost, such as in hybrids between species or populations that do not share meiotic drive systems (MERCOT et al. 1995). The coevolutionary arms race driving strong selection on drivers and suppressors likely contributes to Haldane’s rule (the preferential sterility or inviability of heterogametic hybrids) and is a leading explanation for the importance of X-linked loci in causing hybrid male sterility (FRANK 1991; HURST and POMIANKOWSKI 1991; PRESGRAVES 2008; TAO et al. 2007b). Two recently characterized hybrid male sterility factors, which are also sex-ratio distorters, are evidence of a possible link between meiotic drive and speciation (ORR and IRVING 2005; PHADNIS and ORR 2008; TAO et al. 2001).
Drosophila simulans is unusual in having at least three distinct X-linked drive systems, termed Paris, Durham, and Winters (Tao et al. 2007a). Here, we focus on the Winters sex-ratio (SR), whose driver and suppressor have been mapped to the gene level and whose molecular and cellular features have been elucidated (Tao et al. 2007a; Tao et al. 2007b). Two genes, Distorter on the X (Dox) and Mother of Dox (MDox), are required for sex-ratio distortion (TAO et al. 2007a, Y. Tao personal comm.). Dox is a duplicate copy of MDox, which is located 70 kilobases (kb) proximal from its progenitor locus on the X chromosome (Figure 1). A dominant suppressor, called Not Much Yang (Nmy), evolved on chromosome 3R as a retrotransposed copy of Dox (TAO et al. 2007b). Nmy likely suppresses Dox through an RNA interference mechanism by forming a double-stranded RNA with homology to the distorter RNAs (TAO et al. 2007b). The genes of the Winters sex-ratio are not found in D. melanogaster, which diverged from D. simulans ~2.3 million years ago (LI et al. 1999). Initial surveys of the genes in the simulans clade indicate that a functional Nmy gene is present in D. mauritiana (Tao et al. 2007b). Thus, the Winters genes are more than 250,000 years old, the speciation time of D. simulans, D. mauritiana, and D. sechellia (McDermott and Kliman 2008).

Signatures of positive selection have been previously detected at genomic regions linked to Drosophila sex-ratio distorters, but we present the first evidence of selection acting directly on a sex-ratio distorter gene and its suppressor gene. In Drosophila recens, driving X chromosomes show reduced nucleotide and haplotype variability relative to standard (non-driving) X chromosomes, and linkage disequilibrium extends over 130 cM of the driving chromosome (DYER et al. 2007). The D. recens driver is located in a large chromosomal inversion and appears to be in the early stages of mutational degradation. In the Paris sex-ratio of D.
simulans, Derome et al. (2004) found reduced haplotype diversity at the Nrg locus, which is closely linked to the Paris sex-ratio gene. In a later study, the group further localized the Paris driver to a pair of duplicated loci 150 kb apart, and demonstrated reduced haplotype diversity and linkage disequilibrium between variants associated with drive (Derome et al. 2008). In this study, we characterize patterns of genetic variation in natural populations of North American D. simulans and find signatures of strong positive selection at all three genes of the Winters sex-ratio.

MATERIALS AND METHODS

Population Samples: Samples from three North American populations of D. simulans were examined in this study (Supplementary Table 1). Two sets of isofemale lines were established from Massachusetts in September 2006: Tremont, collected in a backyard grape arbor on Tremont Street in Cambridge (n = 34), and Nicewicz, collected at the Nicewicz Family Farm in Bolton (n = 12), ~30 mi west of Cambridge. F1 males were frozen and used for DNA extraction. In addition, a set of isofemale lines collected in Winters, CA, in the summer of 1995 (Begun and Whitley 2000) was kindly donated by Sergey Nuzhdin. We also obtained 13 lines of diverse geographic origins from the Tucson Species Stock Center: 5 African (Madagascar 14021.0251.196, 14021.0251.197, Kenya 14021.0251.199, Congo 14021.0251.184, and South Africa 14021.0251.169), 2 North American (California 14021.0251.194, unknown 14021.0251.195), 2 European (Scotland 14021.0251.216, Greece 14021.0251.181), and 4 Oceanian (New Guinea 14021.0251.009, New Zealand 14021.0251.007, Australia 14021.0251.176, New Caledonia 14021.0251.198). All strains were sampled randomly with respect to sex-ratio phenotype and genotype.

Data Collection: Genomic DNA was extracted from single males using a modified protocol of the Wizard Genomic DNA Purification Kit from Promega.
From the Massachusetts populations, F1 males from wild-caught females were used, and so both autosomal alleles were included in our sample. All other stocks are inbred lab lines and the autosomal loci were found to be homozygous. Polymerase chain reaction was performed using Takara LA Taq polymerase according to the manufacturer’s instructions. Previously published PCR primers for Dox, MDox, and Nmy were used that amplified complete genes as well as flanking sequence (Tao et al. 2007a; Tao et al. 2007b; Figure 1). Internal sequencing primers were used to obtain 2X coverage (forward and reverse reads) for PCR amplicons. Primers were designed using Primer3Plus (Untergasser et al. 2007) and Amplify v. 3.14 (Engels 2005). Sequencing was performed on an ABI3730 capillary sequencer according to the manufacturer’s protocols. Sequences were edited using Sequencher v. 4.8 (Gene Codes Corp.) and aligned by eye with the aid of the bl2seq program of the BLAST package (Tatusova and Madden 1999). Additional editing was performed using BioEdit (Hall 2007). At the Nmy locus, singleton variants that were observed as heterozygous sites in chromatograms were confirmed with repeated PCR and sequencing. Two samples from the Tremont population, T44 and T62, are double heterozygotes at the Nmy locus; both heterozygous sites for these samples feature a singleton variant. Haplotype phase was resolved for these two samples by assuming that each singleton variant arose on the wildtype (i.e. most common) background, rather than the less likely case of both singletons having arisen on the same genetic background. For each sample we collected 6.2 kb from Dox, 4.5 kb from MDox, and 7.5 kb from Nmy (Figure 1). A total of 1.6 Megabases (Mb) of resequence data were obtained.

Data Analysis: We calculated population genetic summary statistics using DnaSP (Rozas et al. 2003).
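As an illustration of the kind of summary statistics computed here, the following minimal Python sketch calculates pairwise diversity, Watterson's estimator, and Tajima's D from a set of aligned sequences. It is not the DnaSP implementation; the function names and the four-sequence toy alignment are hypothetical, and the sketch assumes gap-free sequences of equal length with at most two alleles per site.

```python
from itertools import combinations
from math import sqrt


def count_segregating_sites(haplotypes):
    """Number of alignment columns at which the sequences are not all identical."""
    length = len(haplotypes[0])
    return sum(1 for j in range(length) if len({h[j] for h in haplotypes}) > 1)


def pairwise_diversity(haplotypes):
    """Average number of differences between all pairs of sequences (per locus)."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def watterson_theta(haplotypes):
    """Segregating sites divided by the harmonic number a1 (per locus)."""
    n = len(haplotypes)
    a1 = sum(1.0 / i for i in range(1, n))
    return count_segregating_sites(haplotypes) / a1


def tajimas_d(haplotypes):
    """Tajima's D (Tajima 1989); returns None when there are no segregating sites."""
    n = len(haplotypes)
    s = count_segregating_sites(haplotypes)
    if s == 0:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: difference between the two estimators; denominator: its standard deviation
    return (pairwise_diversity(haplotypes) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignment: three singleton variants, so D should be negative
sample = ["AAAA", "AAAT", "AATA", "ATAA"]
print(pairwise_diversity(sample), watterson_theta(sample), tajimas_d(sample))
```

An excess of rare variants, as in the toy alignment, pulls pairwise diversity below Watterson's estimator and makes D negative, which is the direction expected after a selective sweep.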
The population mutation rate was estimated as the average pairwise diversity, $\pi$ (Tajima 1983), and Watterson’s estimator, $\theta_W$ (Watterson 1975), which is based on the number of segregating sites, corrected for sample size. The site frequency spectrum was summarized by both Tajima’s D (Tajima 1989) and Fay and Wu’s H (Fay and Wu 2000). To summarize linkage disequilibrium (LD) across each gene, we estimated the statistic $Z_{nS}$ (Kelly 1997), which is the average pairwise $r^2$ value among all variable sites (Hill and Robertson 1968), and h, the number of haplotypes. In order to calculate the age of the origin of the genes, we estimated divergence as the average number of nucleotide substitutions, $D_{XY}$ (Nei 1987, equation 10.20). The fit of various summary statistics to the standard neutral model was assessed through coalescent simulations using the observed number of segregating sites, the conservative assumption of no recombination, and 1000 simulations, as implemented in DnaSP. HKA tests were performed using the HKA software (HEY 2004; HUDSON et al. 1987). Significance of HKA tests was determined from 10,000 coalescent simulations.

Modeling selection: A Bayesian approach was taken to estimate the time since selection on the Winters sex-ratio genes in each of the three North American populations, using coalescent simulations of neutral variants linked to a site under selection. The simulation has two phases (going forward in time): a complete selective sweep of a new beneficial variant followed by a neutral (recovery) phase. We used a modified version of a computer program by Przeworski (2003), which models the selected phase as the structured coalescent in which recombination between neutral variants and the site under selection is treated analogously to migration between demes (KAPLAN et al. 1989).
The neutral locus evolves according to the infinite sites model, with population mutation rate $\theta = 4N\mu L$ (where N is the effective population size, $\mu$ is the per-site, per-generation mutation rate, and L is the length of the sequence) and population recombination rate $\rho = 4NrL$ (where r is the per-site, per-generation recombination rate). Recombination between the neutral and selected sites occurs with rate $C = 4NrK$ (where K is the distance between the closest neutral site and the selected site). The Bayesian method estimates the posterior probability distribution for the intensity ($4Ns$) and the time (T) since the completion of the selective sweep, using a summary likelihood method, in which the data are summarized by the number of segregating sites (S), number of haplotypes (h), and Tajima’s D. The selection model has the following parameters: N, effective population size; s, the selection coefficient; $\mu$, mutation rate; r, recombination rate; and T, time since fixation of the beneficial variant. The posterior probability distributions for model parameters were generated using a rejection algorithm (Tavare et al. 1997). Briefly, parameter values are sampled from a prior distribution, a genealogy is simulated with the sampled parameters, and S segregating sites are placed randomly onto the genealogy. The data summaries described above are calculated from the simulated genealogy and compared to the summaries from the observed data. Parameter values that generate the observed number of haplotypes and a Tajima’s D value within some user-specified interval of the observed value are accepted and output to the posterior. To capture uncertainty in model parameters, the prior distributions of $\mu$, r, and N are gamma-distributed, whereas s is sampled from a uniform prior.

Choice of prior distributions of parameters. In an effort to ensure that the prior distributions of model parameters accurately reflect neutral variation in North American populations of D.
simulans, we calculated the mean $\theta_W$ and $\rho$ (HUDSON 1987) for 29 loci on the X and chromosome 3R sequenced in the same Winters, CA population (BEGUN and WHITLEY 2000, see Supplementary Table 2). We used gamma-distributed priors for N, r, and $\mu$ that yielded priors of the model parameters, $\theta$ and $\rho$, with these empirically observed means. We estimated $\theta_W$ and $\rho$ separately for loci on the X versus 3R and included all variable sites. The empirically estimated mean per-site $\theta_W$ and $\rho$ for the X loci are 0.00488 and 0.01947, and for the 3R loci are 0.01029 and 0.08431 (Supplementary Table 2). The inheritance scalar for the effective size of the X chromosome relative to that of the autosomes is accounted for in the joint prior probability distribution of $\theta$ and $\rho$. The analysis outputs time scaled in units of 4N generations, and that scaling can be considered arbitrary. To avoid confusion, we have reported scaled times in units of N generations.

Mutation rate was calculated from whole-genome divergence between D. simulans and D. melanogaster. Begun et al. (2007) estimated lineage-specific divergence for D. simulans in 10 kb windows across the entire genome. We calculated $\mu$ for each window as the lineage-specific divergence divided by 2.3 MY, the divergence time for D. simulans and D. melanogaster (LI et al. 1999). Assuming 10 generations per year, this calculation gives a median per-site, per-generation mutation rate for chromosomes X and 3R of $1.2 \times 10^{-9}$ and $1.0 \times 10^{-9}$, respectively. These estimates are within the range of other estimated mutation rates for Drosophila (ANDOLFATTO and PRZEWORSKI 2000), but slightly lower than a commonly used mutation rate based on synonymous sites only (SHARP and LI 1989). If we assume there is a single effective population size for a population, the per-site, per-generation r can be calculated as $r = \rho\mu/\theta_W$. For the Winters, CA population data (BEGUN and WHITLEY 2000), we calculated $r = 4.8 \times 10^{-9}$ for the X and $r = 8.2 \times 10^{-9}$ for 3R.
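The recombination-rate calculation above can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the per-site means and median mutation rates quoted in the text; the function and dictionary names are hypothetical. Because the population mutation and recombination rates share the factor 4N, their ratio multiplied by the mutation rate gives r without needing N.

```python
def per_site_recombination_rate(rho, mu, theta_w):
    """r = rho * mu / theta_W, assuming rho and theta_W share the same 4N."""
    return rho * mu / theta_w


# Per-site means and median mutation rates as quoted in the text
estimates = {
    "X": {"theta_w": 0.00488, "rho": 0.01947, "mu": 1.2e-9},
    "3R": {"theta_w": 0.01029, "rho": 0.08431, "mu": 1.0e-9},
}

for chromosome, v in estimates.items():
    r = per_site_recombination_rate(v["rho"], v["mu"], v["theta_w"])
    print(f"{chromosome}: r = {r:.1e}")
```

With these inputs the rounded results agree with the values reported for the X and for 3R.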
The prior distributions of $\mu$ for the X and 3R were gamma with shape parameter 12 and 10, respectively, and scale parameter $10^{-10}$; thus, the means of these distributions are $1.2 \times 10^{-9}$ and $1.0 \times 10^{-9}$, respectively (Table 2). The prior distributions of r for the X and 3R were gamma with shape parameter 48 and 82, respectively, and scale parameter $10^{-10}$; thus, the means of these distributions are $4.8 \times 10^{-9}$ and $8.2 \times 10^{-9}$, respectively. The prior for the selection coefficient, s, was uniform between $5 \times 10^{-4}$ and 0.5.

We chose to estimate r using a population genetic estimate ($\rho$) rather than genetic map data for several reasons. First, recombination rates estimated from genetic maps are systematically higher than those estimated from population genetic parameters (ANDOLFATTO and PRZEWORSKI 2000; O'REILLY et al. 2008). While this pattern may be shaped by selection, demographic factors such as population bottlenecks or population expansions may also increase levels of LD in natural populations (STUMPF and MCVEAN 2003). Second, recombination rate in Drosophila is sensitive to maternal age, temperature, and genetic background, and recombination estimates in laboratory stocks do not take these biological factors into account (ASHBURNER et al. 2005). Third, our use of the lower, population-based estimates of recombination is conservative with regard to the estimated strength and timing of selection (i.e. time since selection may be overestimated and strength of selection may be underestimated).

RESULTS

Ancestral alleles observed at all loci: For each of the three sequenced loci we observe multiple chromosomes that lack the gene insertion, which represent the ancestral state of each locus (Supplementary Table 1). For convenience we refer to the presence of the gene insertion as the “derived” allele.
At the Dox locus, four strains (two from Madagascar, one from New Caledonia, and one from New Zealand) lack the 3833 bp Dox gene insertion; at MDox, four samples lack the 3549 bp gene insertion (two from Madagascar, one from Congo, and one from New Zealand); and at Nmy, two North American samples lack the 2041 bp gene insertion (one individual each from Winters, CA, and the Tremont population from Massachusetts).

Null mutations at Dox: Three different alleles at Dox were observed that have the derived gene insertion but have lost their ability to drive (see Figure 2). The wild-type allele is the functional distorter Dox and is present in 75% of the sampled lines (n = 53). A previously characterized null allele, dox[del105], is present in 3 copies (4%) (Tao et al. 2007a). This allele has a 105 bp deletion overlapping intron 2 and exon 3, which removes a region that is critical for distortion. Ten samples (14%) have the dox[del150] null allele, which has a total of 150 bp deleted in exon 4, including one large 135 bp deletion and two smaller deletions of 12 bp and 3 bp. We found a single copy of dox[del585], which shares the exon 4 deletions with dox[del150] but has an additional 435 bp deletion spanning exon 1 and intron 1. We tested the ability of dox[del150] and dox[del105] to distort sex ratios in a non-suppressing nmy background, where nmy is a loss-of-function mutant of the Nmy gene (Tao et al. 2007b). These crosses yielded progeny with equal sex ratios (see Supplementary Table 3 and Supplementary Figure 1). We assume that dox[del585] is a loss-of-function mutant because it shares the dox[del150] deletions in addition to the large deletion in exon 1.

Insertion-deletion polymorphism: Insertion-deletion (indel) polymorphisms at the Dox locus were already discussed in the context of loss-of-function mutations.
At MDox, we observe one copy of MDox[del105], which has a 105 bp deletion that spans exons 2 and 3; one copy of MDox[ins135], which has a total of 135 bp inserted into exon 3; and one copy of MDox[ins32], which has 32 bp inserted in exon 1. The functional implications of these mutations are not known. In some cases, the same indel polymorphisms were observed at Dox and MDox, and evidently derive from gene conversion between the two paralogs (see next section). In addition to indel polymorphism in the MDox gene sequence, we observe variable numbers and lengths of the 360 bp repetitive elements that flank the MDox gene (Tao et al. 2007a). Copies of this element also flank the Dox gene and may facilitate gene conversion between the paralogs. The New Zealand and Kenyan samples have an additional full-length repeat element 5' of the MDox gene, and one of the Madagascar samples (14021.0251.197) is missing the two 3' repeat elements. At Nmy, three samples (two from Madagascar and one from Congo) have a 6 bp insertion in one of the inverted repeats necessary for suppression by Nmy; we refer to this allele as Nmy[ins6]. Two of these three samples (the Congolese sample and one Madagascar sample, 14021.0251.196) also have a 201 bp insertion adjacent to a deletion of 77 bp between the inverted repeats, which is in the putative loop region of the RNA secondary structure (Tao et al. 2007a). The functional implications of these mutations at Nmy are not known.

Nucleotide polymorphism and divergence: Estimates of nucleotide polymorphism for the full dataset at all three genes are relatively low, but not unusually so compared to other datasets for D. simulans (ANDOLFATTO 2001; BEGUN and WHITLEY 2000). Importantly, the derived alleles have significantly reduced levels of nucleotide polymorphism compared to ancestral alleles (Table 1, Figure 3).
Derived alleles have 2.22% of the ancestral allele diversity at Dox when measured as $\pi$ (4.38% when measured as $\theta_W$), and the corresponding values estimated for MDox are 0.99% (4.62%) and for Nmy 2.29% (14.65%). To test the statistical significance of the reduction, we performed pairwise HKA tests (HUDSON et al. 1987) for each locus, in which levels of polymorphism and divergence are compared for derived and ancestral alleles (Figure 3). Divergence was measured from D. melanogaster in the region flanking the genes. Deviation from neutral expectations is significant for all three loci (Dox: $\chi^2$ = 57.84, P < 0.0001; MDox: $\chi^2$ = 35.05, P < 0.0001; Nmy: $\chi^2$ = 13.716, P < 0.0036).

To determine whether the Winters SR genes show signatures of positive selection, we conducted multilocus HKA tests in which we compared polymorphism and divergence at each of the three Winters SR genes in the three North American populations to that of 13 unrelated loci sampled in the same Winters, CA population (Table 3). For our “neutral” set of loci, we chose a subset of the 29 loci sampled by Begun and Whitley (2000) that had the largest number of sampled chromosomes (n = 8). The Winters SR genes are predicted to be non-protein-coding RNA genes (TAO et al. 2007a; TAO et al. 2007b), so we cannot restrict our analysis to synonymous sites, whose evolution is least likely to be influenced by non-neutral processes (ANDOLFATTO 2005; HALLIGAN and KEIGHTLEY 2006); we therefore included all variable sites. The original Begun and Whitley (2000) study analyzed only synonymous sites, so we reanalyzed all sites in their data in order to directly compare the datasets. A multilocus HKA test on these 13 loci does not show any departure from neutral expectations ($\chi^2$ = 17.99, P < 0.0764, Table 3). However, when we include the Winters SR genes we observe significant deviation from the neutral expectations in all but one test (Table 3).
We first conducted nine tests where we added data for a single Winters SR locus from a North American population to the 13 Begun and Whitley (2000) loci. All nine tests are significant except when we added Nmy from the Winters population ($\chi^2$ = 20.28, P = 0.0903). For the Nmy data, we conducted two additional tests for the Tremont and Winters populations where we excluded the single ancestral allele present in each population. Both of these tests are significant (Winters: $\chi^2$ = 59.92, P < 0.0001; Tremont: $\chi^2$ = 94.52, P < 0.0001). (Here we report the uncorrected P-value, but all tests remain significant at P < 0.0011 after a Bonferroni correction for multiple tests.) If positive selection has acted on the Winters SR genes, we expect to see deviation in the test in the direction of elevated divergence and reduced polymorphism at the Winters SR genes. In five of the 11 tests conducted, the Winters SR gene showed the largest deviation from neutral expectations in both polymorphism and divergence. In the remaining five significant tests, the Winters SR gene had the largest deviation from neutral expectations for divergence but not polymorphism. Moreover, these deviations were in the direction of reduced polymorphism and elevated divergence.

Site-frequency spectrum: Non-neutral processes such as natural selection or non-equilibrium demography shape the site-frequency spectrum, which is commonly summarized by the statistics Tajima’s D (TAJIMA 1989) and Fay and Wu’s H (FAY and WU 2000). Tajima’s D (TD) is a summary of the folded frequency spectrum and compares two estimates of nucleotide polymorphism, $\pi$ and $\theta_W$, yielding a negative value if a locus has an excess of low-frequency variants and a positive value if a locus has an excess of intermediate-frequency variants. Fay and Wu’s H (FWH) is a summary of the unfolded frequency spectrum and is sensitive to the frequency of derived mutations such that it is negative when there is an excess of high-frequency derived variants.
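The behavior of Fay and Wu's H described above can be sketched in Python from an unfolded site-frequency spectrum. This is a minimal illustration of the unnormalized statistic (pairwise diversity minus an estimator weighted toward high-frequency derived variants), not the DnaSP implementation; the function names and the toy spectra are hypothetical.

```python
def theta_pi_from_sfs(sfs, n):
    """Pairwise diversity from an unfolded spectrum.

    sfs maps a derived-allele count i (1..n-1) to the number of sites with that count.
    """
    return sum(2 * sites * i * (n - i) for i, sites in sfs.items()) / (n * (n - 1))


def theta_h_from_sfs(sfs, n):
    """Estimator that weights each site by the square of its derived-allele count."""
    return sum(2 * sites * i * i for i, sites in sfs.items()) / (n * (n - 1))


def fay_wu_h(sfs, n):
    """Unnormalized Fay and Wu's H: negative when high-frequency derived variants are in excess."""
    return theta_pi_from_sfs(sfs, n) - theta_h_from_sfs(sfs, n)


# Hypothetical sample of 10 chromosomes
high_frequency_derived = {9: 3}  # three sites where 9 of 10 chromosomes carry the derived allele
rare_derived = {1: 3}            # three sites where the derived allele is a singleton

print(fay_wu_h(high_frequency_derived, 10))  # negative
print(fay_wu_h(rare_derived, 10))            # positive
```

The two toy spectra have identical pairwise diversity, so the sign of H depends only on whether the derived alleles are common or rare.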
Both of these statistics are commonly used as tests of selection where a negative value for each is compatible with a locus having experienced a selective sweep. We calculated TD and FWH at the Winters SR genes for a sample of all chromosomes, for each of the three North American populations and five African samples, and for the derived and ancestral alleles (Table 1). In the full dataset for all loci, we observe significantly negative Tajima’s D values at each gene (Dox: -2.19, P < 0.00001; MDox: -2.58, P < 0.00001; Nmy: -2.76, P < 0.00001). For the North American populations, all but two population samples for which we could conduct tests have significantly negative Tajima’s D values (Dox: Nicewicz, -2.17, P = 0.003; Tremont, -0.29, n.s.; Winters, -0.34, n.s.; MDox: Nicewicz, -2.09, P < 0.00001; Tremont, -1.98, P = 0.008; Winters, -1.14, P < 0.05; Nmy: Nicewicz, n.a.; Tremont, -2.88, P < 0.00001; Winters, -2.32, P = 0.008). The African sample has a significantly positive TD at Dox (1.82, P = 0.001) and TD values close to zero for the other loci (MDox: -0.36, n.s.; Nmy: 0.32, n.s.). At all loci, the derived alleles have significantly negative TD values (Dox: -1.48, P = 0.041; MDox: -2.40, P < 0.00001; Nmy: -2.69, P < 0.00001), whereas the ancestral alleles have TD values close to zero (Dox: 0.35, n.s.; MDox: -0.27, n.s.; Nmy: n.a.). Samples for which TD values could not be calculated due to lack of segregating sites or too few samples are indicated with “n.a.”

In summary, TD estimates are compatible with positive selection acting at all three Winters SR genes. At each gene, samples including all chromosomes as well as only the derived alleles show significantly negative TD values. Because the site frequency spectrum is sensitive to population pooling (HAMMER et al. 2003; PTAK and PRZEWORSKI 2002), the estimates for individual North American populations minimize this problem (but may not eliminate it, as the geographic scale of population structure in North American D.
simulans is not well understood). For the individual populations, we observe significantly negative TD values for all tests except for Tremont and Winters at Dox. This pattern is not likely to result from demographic forces such as population growth because significantly negative TD values are not observed at any of the reanalyzed Begun and Whitley (2000) loci (Supplementary Table 2), which were sampled in the same Winters, CA population.

For the complete dataset, we observe significant FWH at Nmy, and marginal significance at the driver loci (Dox: -25.95, P = 0.067; MDox: -24.57, P = 0.052; Nmy: -91.63, P < 0.00001). Similarly, North American populations and samples of derived alleles also have significant FWH at Nmy only (Dox: Nicewicz, n.a.; Tremont, 0.06, n.s.; Winters, n.a.; derived, 0.03, n.s.; MDox: Nicewicz, 0.15, n.s.; Tremont, 0.18, n.s.; Winters, 0.15, n.s.; derived, 0.15, n.s.; Nmy: Nicewicz, n.a.; Tremont, -36.19, P = 0.005; Winters, -72.88, P < 0.00001; derived, -29.72, P = 0.003). None of the tests are significant for the African sample or the ancestral alleles.

Gene conversion between Dox and MDox: Alignment of the paralogous region of the Dox and MDox loci reveals three gene-conversion tracts. The dox[del150] allele has a sequence motif of three deletions and a cluster of 5 single nucleotide polymorphisms (SNPs) that is shared with the wild-type MDox haplotype. In addition, we find one MDox haplotype that resembles the wild-type Dox haplotype in that it lacks these same three deletions and the SNP motif. Finally, the 105 bp deletion that characterizes the dox[del105] allele is also found in one MDox haplotype. These gene-conversion tracts were identified by eye and confirmed with the method of Betran et al. (1997) using the DnaSP software.
Linkage disequilibrium: Positive selection on a beneficial mutation can cause linked neutral variants to increase in frequency along with the selected site, which results in elevated levels of linkage disequilibrium across the genomic region. To test for elevated levels of LD at the Winters SR genes, we summarized LD as the average pairwise $r^2$ value across each gene, $Z_{nS}$ (KELLY 1997). We also tested for a reduction in the number of haplotypes (h), which results from hitchhiking by positive selection (NIELSEN 2005). The results of this test are largely parallel with the estimates of $Z_{nS}$ (Table 1). In the complete dataset, we observe significantly elevated LD at all three loci (Dox: 0.52, P = 0.003; MDox: 0.32, P = 0.046; Nmy: 0.40, P = 0.01). Six of the ten populations for which we could calculate $Z_{nS}$ show elevated LD (Dox: Nicewicz, 0.80, P = 0.007; Tremont, 0.35, n.s.; Winters, 0.76, P = 0.038; Africa, 0.97, P = 0.003; MDox: Nicewicz, 0.83, P = 0.015; Tremont, 0.25, n.s.; Winters, n.a.; Africa, 0.37, n.s.; Nmy: Nicewicz, n.a.; Tremont, 0.84, P < 0.00001; Winters, 1.00, P < 0.00001; Africa, 0.47, n.s.).

Several factors besides selection may increase levels of LD in a sample. These include pooling derived and ancestral alleles (particularly when alleles differ by large genomic insertions that may inhibit recombination), paralogous gene conversion, and pooling samples from different biological populations. To explore these effects, we first calculated $Z_{nS}$ separately for derived and ancestral alleles. The sample including all derived alleles at Nmy showed elevated LD (n = 113, $Z_{nS}$ = 0.41, P = 0.005), but we see no significant $Z_{nS}$ values at other loci (Table 1).
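The LD summary used here, the average pairwise squared correlation among variable sites, can be sketched in a few lines of Python. This is a minimal illustration rather than the DnaSP implementation; it assumes phased haplotypes coded as strings of 0 (one allele) and 1 (the other), and the function names and toy samples are hypothetical.

```python
from itertools import combinations


def r_squared(site_a, site_b):
    """Squared correlation between two biallelic sites coded as lists of 0/1."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(a & b for a, b in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def zns(haplotypes):
    """Average pairwise r^2 over all variable sites; None with fewer than two such sites."""
    columns = [[int(h[j]) for h in haplotypes] for j in range(len(haplotypes[0]))]
    variable = [c for c in columns if 0 < sum(c) < len(c)]
    pairs = list(combinations(variable, 2))
    if not pairs:
        return None
    return sum(r_squared(a, b) for a, b in pairs) / len(pairs)


def haplotype_count(haplotypes):
    """Number of distinct haplotypes (h)."""
    return len(set(haplotypes))


# Hypothetical sample: three identical chromosomes plus one divergent chromosome
pooled = ["0000", "0000", "0000", "1111"]
print(zns(pooled), haplotype_count(pooled))

# Hypothetical sample with all four two-site combinations present (no association)
shuffled = ["00", "01", "10", "11"]
print(zns(shuffled), haplotype_count(shuffled))
```

In the first toy sample every variable site distinguishes the same single chromosome from the rest, so all pairs of sites are perfectly associated; this illustrates how one divergent haplotype in a sample can by itself produce a maximal LD summary.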
When we exclude the ancestral alleles in the Tremont and Winters populations at Nmy (no ancestral alleles were observed at Dox or MDox in North America), the signature of LD is no longer evident (Tremont, ZNS = 0.0002, n.s.; Winters, ZNS = n.a., no segregating sites), meaning that the elevated LD was caused by the presence of the single divergent ancestral allele. Next, gene conversion between Dox and MDox may have introduced several non-independent mutations, which will initially be in linkage disequilibrium with each other until the association is eroded by recombination or mutation. We performed a second analysis of LD after encoding all mutations within gene conversion tracts as a single mutation. This reanalysis only differed from our initial analysis in the LD estimates at Dox and MDox, and resulted in no significant LD in the North American populations or the total sample of derived alleles (data not shown). Finally, pooling among subpopulations can result in spuriously high levels of LD (HARTL and CLARK 2007). The African sample includes several lines from populations that are genetically differentiated from each other (BAUDRY et al. 2006), which may be the cause of the elevated LD in the complete dataset at each locus as well as the African sample at Dox. In summary, we do not observe elevated LD in samples of derived alleles in our North American populations after correcting for gene conversion or excluding ancestral gene copies.

Age of derived alleles: To estimate the age of the genes, the nucleotide divergence between the flanking sequence in the ancestral and derived alleles was calculated at each locus. From the sequence divergence, the age in years can be estimated as $t = dg/(2\mu)$, where $d$ is the divergence per site, $\mu$ is the per-site per-generation mutation rate, and $g$ is the generation time in years. We used the mutation rates calculated above for the modeling of selection.
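The age calculation described above can be sketched as follows. The mutation rate of 1.2 x 10^-9 per site per generation is an assumption taken from the prior mean listed for Dox and MDox in Table 2, the generation time of 0.1 years corresponds to 10 generations per year, and the function name is hypothetical. The exact rates used for the published estimates may differ slightly, so the output should be read as approximate.

```python
def allele_age_years(d, mu, g):
    """Age in years from per-site divergence d, per-site per-generation
    mutation rate mu, and generation time g in years: t = d * g / (2 * mu).

    The factor of 2 reflects mutations accumulating on both lineages
    since the ancestral and derived alleles diverged.
    """
    return d * g / (2.0 * mu)


# Illustrative inputs: the per-site divergence reported for Dox in this
# section, with an ASSUMED mutation rate (Table 2 prior mean).
dox_age = allele_age_years(d=0.0467, mu=1.2e-9, g=0.1)
print(f"Approximate age: {dox_age / 1e6:.2f} million years")
```

With these assumed inputs the result is a little under 2 million years, in line with the scale of the Dox estimate reported in this section.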
The per-site divergence values between the ancestral and derived alleles for Dox, MDox, and Nmy are 0.0467, 0.0198, and 0.0165, yielding age estimates of 1.96 MY, 832,000 years, and 817,000 years, respectively. Based on this result, the Dox gene appears to be much older than the other genes. It is possible that the duplication and transposition event that created the Dox gene may also be associated with extensive sequence changes, particularly in the repetitive sequences that flank the gene. A more accurate method of dating the Dox gene insertion is to determine the divergence between Dox and MDox at the gene insertion sequence, which is 0.0206, giving an age of 864,000 years, an estimate that is closer to the estimated ages of the other two genes. At MDox and Dox, we observed no shared polymorphisms and 22 and 77 fixed differences, respectively, between the ancestral and derived alleles. At Nmy, there are 12 shared polymorphisms and 45 fixed differences; these shared polymorphisms result from a recombination event in the middle of the sequenced region such that sample T37a has the ancestral haplotype at the Nmy gene and a derived haplotype in the region distal to the gene.

Timing of selection: To estimate the time since selection on the three genes of the Winters sex-ratio, we implement a model of a selective sweep followed by a neutral (recovery) phase in each of the three North American populations (Przeworski 2003). We assume the selective sweep was complete and therefore restrict our analysis to the derived alleles at each gene, which leads us to exclude one ancestral Nmy allele from each of the Tremont and Winters populations. We were unable to perform the analysis for the Nicewicz population at the Nmy locus, because only one segregating site is present and Tajima’s D could not be calculated.
By assuming fixation, we may be upwardly biasing our estimates of the time since selection at Nmy (in North America, Dox and MDox are fixed in our sample so this is less likely to be a problem at these loci). If ancestral alleles are segregating in the population, recombination between derived and ancestral alleles may introduce mutations onto the derived background, which would make derived alleles seem more diverse, and it would appear that selection occurred longer ago than it actually did. In view of the results actually obtained, therefore, excluding the ancestral Nmy sequences is conservative. In addition, the presence of gene conversion between Dox and MDox results in conservative estimates of time since selection. Gene conversion increases the number of segregating sites by introducing multiple non-independent mutations, thus increasing the apparent length of the recovery phase after selection is complete. We generate 1000 sets of model parameters that are compatible with our data summaries at each locus. For five of the datasets, we accepted simulated Tajima’s D values within a tolerance of 0.1 of the observed data. However, three datasets (Dox-Tremont, Dox-Winters, and Nmy-Winters) exhibited low acceptance rates, which led us to increase the tolerance to 0.5. The fit of the selection model to the data summaries can be evaluated based on the shape of the posterior distribution for T, the time since the sweep in coalescent time units of N generations (see Supplementary Figure 2). If the posterior is flat, it suggests that selection is either too old to be detected (i.e., more than 4N generations ago), or else did not occur (PRZEWORSKI 2003). Based on an effective population size on the order of 1 x 10^6 and 10 generations per year, we should be able to detect selection that occurred up to 4 million generations, or 400,000 years, ago. All eight datasets are compatible with the model of selection (Supplementary Figure 2).
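The detection horizon quoted above is a unit conversion from coalescent time to years. A minimal sketch, assuming that T is measured in units of N generations as stated for the posterior distribution, and using a hypothetical helper name:

```python
def coalescent_to_years(T, N, gen_time_years):
    """Convert a time T, expressed in units of N generations, to years.

    N is the effective population size and gen_time_years is the
    generation time in years (0.1 for 10 generations per year).
    """
    return T * N * gen_time_years


# Oldest detectable sweep: 4N generations with N on the order of 1e6.
horizon_years = coalescent_to_years(T=4.0, N=1e6, gen_time_years=0.1)
print(f"{horizon_years:,.0f} years")
```

The same conversion applies to any posterior summary of T, provided the effective population size appropriate to the locus is used.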
The median time since selection for Dox and MDox ranges from 0.0304 N generations to 0.0348 N generations (Table 4). At Nmy, selection is more recent, with a median time of 0.0068 N generations for the Tremont population and 0.0164 N generations for the Winters population. The time since selection in years can be calculated as t = TNg, where T is the time in coalescent time units, N is the effective population size, and g is the generation time in years, in this case 0.1, or 10 generations per year. At Dox and MDox, selection occurred around 3,000 years ago, with median times ranging from 2,800 years for the Tremont population at MDox to 3,500 years for the Nicewicz population at Dox (Table 4). Selection in the Tremont population at Nmy is most recent (median time = 1,600 years), while in the Winters population at Nmy the median time since selection is 3,800 years. Importantly, the 95% credible interval for all eight datasets excludes the origin of the genes more than 250,000 years ago (TAO et al. 2007a). Selection most likely occurred less than 14,000 years ago, well after the genes of the Winters SR had evolved in the ancestor of the D. simulans clade.

DISCUSSION

In this study, we characterize patterns of genetic variation in North American populations of D. simulans at the genes of the Winters sex-ratio, one of three X-linked meiotic drive systems in this species (TAO et al. 2007a). We find that all three genes (the distorter locus, Dox; its progenitor gene, MDox; and the suppressor, Nmy) are polymorphic for presence in this species. The frequencies of the ancestral form of the driver loci (the allele that lacks the gene insertion) are highest in African and Oceanian samples, while ancestral Nmy is rare in the North American samples and absent in samples from other geographic localities. We also find evidence of gene conversion between Dox and MDox, the paralogous gene pair responsible for sex-ratio distortion in this system.
Finally, we find several loss-of-function mutations on the derived Dox background, consistent with virtually complete suppression of the sex-ratio system in North American populations. All three genes of the Winters sex-ratio show signatures consistent with recent positive selection. In this context, we use the term “selection” to also include the transmission-ratio advantage of the meiotic drive locus. The evidence for selection is two-fold. First, nucleotide variability on the derived allele background is greatly reduced compared to the ancestral allele background (Table 1, Figure 3). Second, all genes show skews in the site-frequency spectrum, with an excess of low-frequency variants observed in all genes and an excess of high-frequency derived variants observed at Nmy. These site-frequency skews are reflected in significantly negative Tajima’s D and Fay and Wu’s H statistics (Table 1). Both of these patterns are consistent with a hitchhiking model where a new mutation has rapidly increased in frequency in a population due to natural selection or biased transmission during meiosis. In addition, we find our data to be compatible with a coalescent model of a recent selective sweep at all loci that occurred well after the origins of the genes (Table 4 and Figure 4). In fact, the 95% credible interval for the time since selection at all loci is more recent than the split between D. simulans and D. mauritiana, ~250,000 years ago (MCDERMOTT and KLIMAN 2008). This result is consistent with the theoretical prediction that meiotic drive systems experience repeated bouts of drive and suppression, and thus multiple rounds of selection (FRANK 1991). Selection on the Winters sex-ratio is older than on the Paris sex-ratio, the other system in D. simulans that has been extensively studied. Derome et al. (2008) estimated that selection acted on the Paris driver only 88 years ago, based on an analysis of linkage disequilibrium across a region linked to the driver.
Our results indicate that selection acted less than 15,000 years ago, with an average age across loci of 3,000 years. Consistent with this estimate, we do not observe elevated linkage disequilibrium in derived gene copies at any of the Winters sex-ratio genes, after correcting for gene conversion between Dox and MDox. Significant linkage disequilibrium would be a signature of very recent selection. This signal is absent, whereas the signals of reduced polymorphism and skewed site frequencies are evident. At the time that selection was most likely acting on the genes of the Winters sex-ratio, the geographic range of D. simulans was restricted to Africa, the Indian Ocean islands, and Eurasia (Lachaise et al. 1988). North America was likely settled ~500 years ago during the European colonization of the New World, facilitated by commensalism with humans (Lachaise and Silvain 2004). Interestingly, the most recent round of selection on the Winters SR occurred around the time of the expansion into Eurasia, 6,500-5,000 years ago (LACHAISE and SILVAIN 2004). Female-biased populations have higher growth rates than populations with even sex ratios (HAMILTON 1967), suggesting that the unleashing of the Winters driver and the resulting excess of females may have facilitated the colonization of new habitats. However, due to the large credible intervals of the estimated time since selection, we cannot exclude the possibility that selection occurred when the species range was restricted to Africa. Could other evolutionary forces besides selection have caused these departures from neutral patterns? Demographic forces such as population-size changes or population subdivision can have profound effects on genetic variation. However, these factors shape variation across all loci whereas selection targets particular genes or functional regions.
Patterns of variation at Dox, MDox, and Nmy are unusual when compared to other loci sampled in North American populations (BEGUN and WHITLEY 2000). In all three populations, each gene has either reduced polymorphism or elevated divergence, or both, as evidenced by significant multi-locus HKA tests (Table 3). Population growth can result in skews in the site-frequency spectrum similar to what we observed (i.e., an excess of rare variants and negative Tajima’s D). However, previous work indicates that populations of D. simulans were subject to a population bottleneck during their colonization of the New World (WALL et al. 2002). Recent population bottlenecks are expected to result in an excess of intermediate-frequency variants (WAKELEY 2009), whereas we observe a dearth of such variants in our data. Indeed, the Tajima’s D estimates for the Begun and Whitley (2000) data are slightly positive, consistent with a population bottleneck (Supplementary Table 2). Combined with our detailed knowledge of the function of these genes (TAO et al. 2007a; TAO et al. 2007b), we are confident that the observed departures from neutral equilibrium expectations at the genes of the Winters sex-ratio are due to selection. If all three genes show signatures of positive selection, why are they not fixed in the species? Even under a simple model of selective neutrality and drift, neutral mutations are not expected to persist beyond 4N generations, or roughly 400,000 years in the case of D. simulans if we assume 10 generations per year and an effective population size on the order of one million (HARTL and CLARK 2007). Four copies of the ancestral distorter alleles were found in African and Oceanian populations and two ancestral suppressors were found in North America. Polymorphism at the suppressor can be explained by a simple model of selection to maintain Fisherian sex ratios.
Assume, after Fisher (1930), that the total reproductive value of males and females is equal, $\sum_{i=1}^{m} W_i = \sum_{j=1}^{f} W_j$, where $W_i$ is the fitness of the ith male, $W_j$ is the fitness of the jth female, and there are m males and f females in the population. If we apportion fitness evenly among individuals of each sex, the fitness of each male is then simply equal to the sex ratio, $W_i = f/m$. In a female-biased population, members of the “rarer sex” (males) have higher fitness. Under a model where a sex-ratio distorter invades a population and fixes due to its transmission advantage, selection on a new suppressor is frequency dependent. At low frequency, a population is female-biased and selection for the maintenance of equal sex ratios is strong; but at high frequency, most copies of the distorter are masked, the population sex ratio is close to 50/50, and selection is much weaker. This result explains why selection for Fisherian sex ratios may be inefficient at removing the last few copies of a non-suppressor allele, even though under a deterministic model, the suppressor will eventually fix (VAZ and CARVALHO 2004). In addition, selection is expected to be even less efficient at purging null suppressors if the functional suppressor is dominant, as vanishingly few individuals will express sex-ratio. This verbal model makes many simplifying assumptions such as panmixia, infinite population size, no pleiotropic fitness effects of drivers or distorters, and dominant suppression, but it could nevertheless explain the presence of ancestral Nmy alleles in North American populations that are fixed for the derived allele at both Dox and MDox. Understanding the presence of null alleles of Dox and MDox is more complex. Under simple, single-population models of sex-chromosome drive, polymorphism between driving (SR) and standard (ST) X chromosomes can result from three conditions (Vaz and Carvalho 2004).
First, the transmission advantage of an SR chromosome may be balanced by deleterious effects of either the driving locus itself or linked variants. Experimental work in a variety of Drosophila species indicates that when mated multiply, SR males suffer reduced fertility as well as reduced sperm competitive ability; these are examples of pleiotropic effects of the drive locus due to reduced sperm production (Jaenike 2001). Linked deleterious mutations may affect either male or female fitness and are common when driver elements occur in chromosomal inversions. In D. recens, females homozygous for SR chromosomes have reduced fertility, presumably due to a mutation at an unrelated locus trapped in the large inversion which contains the drive locus (DYER et al. 2007). The last two conditions for SR/ST polymorphism require the evolution of suppressors by selection for Fisherian sex ratios or genomic conflict, which mask the expression of drive. If suppression is complete (i.e., suppressors are fixed) the meiotic drive system is essentially “dead” and both loci evolve neutrally. If the suppression is partial (i.e., suppressors are polymorphic) polymorphism in the driver may be maintained. For the Winters sex-ratio, we may argue against an offsetting deleterious effect based on several lines of evidence. First, the distorter is not located within a chromosomal inversion and is unlikely to be associated with deleterious variants. Second, theoretical work indicates that SR chromosomes balanced by deleterious effects cannot reach a frequency high enough to skew sex ratios and induce selection for suppressors (Vaz and Carvalho 2004). So the mere presence of Nmy indicates the Dox/MDox is not maintained as a balanced polymorphism. However, rejection of this hypothesis requires careful measurement of the fitness of each genotype.
Interestingly, experiments suggest that the fertility of males expressing drive may be reduced relative to that of males with suppressed drivers (TAO et al. 2007b). Although rates of female remating in D. simulans are low (MARKOW 1996), in a female-biased population, sperm limitation may be an issue for males. A difficulty in testing this hypothesis stems from the fact that small fitness effects may have important consequences in natural populations yet be undetectable in the laboratory. The partial suppression hypothesis seems unlikely because, although Nmy is not fixed, the frequency of males homozygous for non-suppressing Nmy is very low. Based on the observed allele frequencies in our sample, non-suppressing males are expected to occur at 0.6% in the Winters population and at 0.02% in the Tremont population. Thus, the “neutral” explanation seems most likely, as it is supported by the presence of loss-of-function mutations on the derived Dox background and the near complete suppression of driving chromosomes based on observed allele frequencies in our sample. Our ability to distinguish among these three hypotheses for the polymorphism in the Winters driver is complicated by the fact that D. simulans violates many assumptions of the simple population-genetic models implicit in the discussion above. The species exhibits high levels of population structure, particularly in Africa (Hamblin and Veuille 1999), and it is possible that the ancestral alleles were sampled in populations that do not exchange migrants with populations that currently harbor the Winters sex-ratio genes. More extensive population sampling of the Madagascar, Congolese, New Caledonia, and New Zealand populations may shed light on this possibility. The possibility of competitive exclusion of the Winters driver by the Paris driver also exists. Notably, the frequency of the Paris driver is highest in central Africa and the Indian Ocean islands (JUTIER et al.
2004), where, based on our coarse global sampling, ancestral copies of the Winters driver are found. Consistent with the competitive exclusion hypothesis, the intensity of drive is higher in the Paris system than in the Winters system, ~96% versus ~81% (Montchamp-Moreau et al. 2006; Tao et al. 2007b). Neither driver appears to be a balanced polymorphism that would limit the spread of the drivers through the population, so differential intensity of drive would in large part determine the frequency of the drivers in the population (Thomson and Feldman 1975). Testing the competitive exclusion hypothesis will require more extensive population sampling of the Winters driver, particularly in Africa and the Indian Ocean islands, as well as competition experiments between the two drivers in population cages in the laboratory. Our analysis indicates that selection is much more recent than the actual origin of the Winters sex-ratio genes about 850,000 years ago. The date is based on sequence analysis and is consistent with the species distribution of the genes. All are absent in D. melanogaster but preliminary data indicates that the genes are present in D. mauritiana (Tao et al. 2007b, Kingan, unpublished data). Moreover, the D. sechellia Y chromosome is sensitive to drive by Dox (Tao et al. 2007a). An old origin but recent selection is suggestive of a genetic “arms-race” model for the evolution of drivers and suppressors, whereby multiple rounds of suppression and distortion occur due to ongoing genetic conflict between the loci (Frank 1991). In fact, the structure of the driving locus for Winters supports this “arms-race” model. Dox may have evolved as an enhancer or modifier of an original distorter, most likely MDox, which had been suppressed by an unknown locus (or an earlier form of Nmy). The most recent suppressor, Nmy, may then have evolved to suppress the new, compound distorter. 
This model is testable by substituting chromosomes with a derived MDox and ancestral Dox into a variety of autosomal backgrounds. If drive is observed for some genotypes, it would confirm that MDox was once able to drive alone. In addition, if there is polymorphism in the drive phenotype, one may be able to map the original suppressor of MDox. The Winters sex-ratio is not the only trans-specific meiotic drive system: in mice, stalk-eyed flies, and Drosophila, shared drive systems are found in multiple closely related species (Jaenike 2001). The genomic conflict that results from a single meiotic drive system can have profound effects on patterns of genomic diversity in multiple species over a period of millions of years. On the molecular level, these patterns are indistinguishable from those caused by adaptation based on novel variation. It is only with a detailed understanding of the functional importance of genomic regions that one can attribute genomic signatures of selection to processes that increase the fitness of individual organisms.

ACKNOWLEDGMENTS

We thank Yun Tao for generously sharing research materials, fly stocks, and unpublished results as well as his expertise and insight into this system; also Luciana Araripe, Horacio Montenegro, Kalsang Namgyal, Erik Dopman, and Nguyen Nguyen for technical assistance, and Noemi Velazguez for administrative assistance. The Nicewicz family farm kindly gave us access to their farm for fly collections. We are grateful to John Wakeley and Molly Przeworski for help with the coalescent modeling. Daven Presgraves and Yun Tao provided thoughtful comments, which greatly improved the manuscript. This work was supported by NIH grant GM065169 to D.L.H. and an NSF Graduate Research Fellowship to S.B.K.

REFERENCES

ANDOLFATTO, P., 2001 Contrasting patterns of X-linked and autosomal nucleotide variation in Drosophila melanogaster and Drosophila simulans. Mol Biol Evol 18: 279-290.
ANDOLFATTO, P., 2005 Adaptive evolution of non-coding DNA in Drosophila. Nature 437: 1149-1152.
ANDOLFATTO, P., and M. PRZEWORSKI, 2000 A genome-wide departure from the standard neutral model in natural populations of Drosophila. Genetics 156: 257-268.
ASHBURNER, M., K. G. GOLIC and R. S. HAWLEY, 2005 Drosophila. A Laboratory Handbook. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
BAUDRY, E., N. DEROME, M. HUET and M. VEUILLE, 2006 Contrasted polymorphism patterns in a large sample of populations from the evolutionary genetics model Drosophila simulans. Genetics 173: 759-767.
BEGUN, D. J., A. K. HOLLOWAY, K. STEVENS, L. W. HILLIER, Y. P. POH et al., 2007 Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans. PLoS Biol 5: e310.
BEGUN, D. J., and P. WHITLEY, 2000 Reduced X-linked nucleotide polymorphism in Drosophila simulans. Proc Natl Acad Sci U S A 97: 5960-5965.
BETRAN, E., J. ROZAS, A. NAVARRO and A. BARBADILLA, 1997 The estimation of the number and the length distribution of gene conversion tracts from population DNA sequence data. Genetics 146: 89-99.
BODMER, W. F., and A. W. EDWARDS, 1960 Natural selection and the sex ratio. Ann Hum Genet 24: 239-244.
CARVALHO, A. B., M. C. SAMPAIO, F. R. VARANDAS and L. B. KLACZKO, 1998 An experimental demonstration of Fisher's principle: evolution of sexual proportion by natural selection. Genetics 148: 719-731.
CHARLESWORTH, B., and D. L. HARTL, 1978 Population dynamics of the Segregation Distorter polymorphism of Drosophila melanogaster. Genetics 89: 171-192.
DEROME, N., E. BAUDRY, D. OGEREAU, M. VEUILLE and C. MONTCHAMP-MOREAU, 2008 Selective sweeps in a 2-locus model for sex-ratio meiotic drive in Drosophila simulans. Mol Biol Evol 25: 409-416.
DEROME, N., K. METAYER, C. MONTCHAMP-MOREAU and M. VEUILLE, 2004 Signature of selective sweep associated with the evolution of sex-ratio drive in Drosophila simulans. Genetics 166: 1357-1366.
DYER, K. A., B. CHARLESWORTH and J. JAENIKE, 2007 Chromosome-wide linkage disequilibrium as a consequence of meiotic drive. Proc Natl Acad Sci U S A 104: 1587-1592.
EDWARDS, A. W. F., 1961 The population genetics of "sex-ratio" in Drosophila pseudoobscura. Heredity 16: 291-304.
FAY, J. C., and C. I. WU, 2000 Hitchhiking under positive Darwinian selection. Genetics 155: 1405-1413.
FISHER, R. A., 1930 The Genetical Theory of Natural Selection.
FRANK, S. A., 1991 Divergence of meiotic drive-suppression systems as an explanation for sex-biased hybrid sterility and inviability. Evolution 45: 262-267.
HALLIGAN, D. L., and P. D. KEIGHTLEY, 2006 Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res 16: 875-884.
HAMBLIN, M. T., and M. VEUILLE, 1999 Population structure among African and derived populations of Drosophila simulans: evidence for ancient subdivision and recent admixture. Genetics 153: 305-317.
HAMILTON, W. D., 1967 Extraordinary sex ratios. A sex-ratio theory for sex linkage and inbreeding has new implications in cytogenetics and entomology. Science 156: 477-488.
HAMMER, M. F., F. BLACKMER, D. GARRIGAN, M. W. NACHMAN and J. A. WILDER, 2003 Human population structure and its effects on sampling Y chromosome sequence variation. Genetics 164: 1495-1509.
HARTL, D. L., 1975 Modifier theory and meiotic drive. Theor Popul Biol 7: 168-174.
HARTL, D. L., and A. G. CLARK, 2007 Principles of Population Genetics. Sinauer Associates, Inc., Sunderland, MA.
HEY, J., 2004 HKA, pp.
HILL, W. G., and A. ROBERTSON, 1968 Linkage disequilibrium in finite populations. Theoretical and Applied Genetics 38: 226-231.
HUDSON, R. R., 1987 Estimating the recombination parameter of a finite population model without selection. Genet Res 50: 245-250.
HUDSON, R. R., M. KREITMAN and M. AGUADE, 1987 A test of neutral molecular evolution based on nucleotide data. Genetics 116: 153-159.
HURST, L. D., and A. POMIANKOWSKI, 1991 Causes of sex ratio bias may account for unisexual sterility in hybrids: a new explanation of Haldane's rule and related phenomena. Genetics 128: 841-858.
JAENIKE, J., 2001 Sex Chromosome Meiotic Drive. Annual Review of Ecology and Systematics 32.
JUTIER, D., N. DEROME and C. MONTCHAMP-MOREAU, 2004 The sex-ratio trait and its evolution in Drosophila simulans: a comparative approach. Genetica 120: 87-99.
KAPLAN, N. L., R. R. HUDSON and C. H. LANGLEY, 1989 The "hitchhiking effect" revisited. Genetics 123: 887-899.
KELLY, J. K., 1997 A test of neutrality based on interlocus associations. Genetics 146: 1197-1206.
LACHAISE, D., M.-L. CARIOU, J. R. DAVID, F. LEMEUNIER, L. TSACAS et al., 1988 Historical biogeography of the Drosophila melanogaster species subgroup. Evolutionary Biology 22: 159-225.
LACHAISE, D., and J. F. SILVAIN, 2004 How two Afrotropical endemics made two cosmopolitan human commensals: the Drosophila melanogaster-D. simulans palaeogeographic riddle. Genetica 120: 17-39.
LI, Y. J., Y. SATTA and N. TAKAHATA, 1999 Paleo-demography of the Drosophila melanogaster subgroup: application of the maximum likelihood method. Genes Genet Syst 74: 117-127.
LYTTLE, T. W., 1993 Cheaters sometimes prosper: distortion of mendelian segregation by meiotic drive. Trends Genet 9: 205-210.
MARKOW, T. A., 1996 Evolution of Drosophila mating systems, pp. 73-106 in Evolutionary Biology, Vol 29.
MCDERMOTT, S. R., and R. M. KLIMAN, 2008 Estimation of isolation times of the island species in the Drosophila simulans complex from multilocus DNA sequence data. PLoS ONE 3: e2442.
MERCOT, H., A. ATLAN, M. JACQUES and C. MONTCHAMP-MOREAU, 1995 Sex-ratio distortion in Drosophila simulans: co-occurrence of a meiotic drive and a suppressor of drive. Journal of Evolutionary Biology 8: 283-300.
MONTCHAMP-MOREAU, C., D. OGEREAU, N. CHAMINADE, A. COLARD and S. AULARD, 2006 Organization of the sex-ratio meiotic drive region in Drosophila simulans. Genetics 174: 1365-1371.
NEI, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.
NIELSEN, R., 2005 Molecular signatures of natural selection. Annu Rev Genet 39: 197-218.
O'REILLY, P. F., E. BIRNEY and D. J. BALDING, 2008 Confounding between recombination and selection, and the Ped/Pop method for detecting selection. Genome Res 18: 1304-1313.
ORR, H. A., and S. IRVING, 2005 Segregation distortion in hybrids between the Bogota and USA subspecies of Drosophila pseudoobscura. Genetics 169: 671-682.
PHADNIS, N., and H. A. ORR, 2008 A Single Gene Causes Both Male Sterility and Segregation Distortion in Drosophila Hybrids. Science.
PRESGRAVES, D. C., 2008 Sex chromosomes and speciation in Drosophila. Trends Genet 24: 336-343.
PRZEWORSKI, M., 2003 Estimating the time since the fixation of a beneficial allele. Genetics 164: 1667-1676.
PTAK, S. E., and M. PRZEWORSKI, 2002 Evidence for population growth in humans is confounded by fine-scale population structure. Trends Genet 18: 559-563.
ROZAS, J., J. C. SANCHEZ-DELBARRIO, X. MESSEGUER and R. ROZAS, 2003 DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19: 2496-2497.
SANDLER, L., and E. NOVITSKI, 1957 Meiotic drive as an evolutionary force. American Naturalist 91: 105-110.
SHARP, P. M., and W. H. LI, 1989 On the rate of DNA sequence evolution in Drosophila. J Mol Evol 28: 398-402.
STUMPF, M. P., and G. A. MCVEAN, 2003 Estimating recombination rates from population-genetic data. Nat Rev Genet 4: 959-968.
TAJIMA, F., 1983 Evolutionary relationship of DNA sequences in finite populations. Genetics 105: 437-460.
TAJIMA, F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585-595.
TAO, Y., L. ARARIPE, S. B. KINGAN, Y. KE, H. XIAO et al., 2007a A sex-ratio meiotic drive system in Drosophila simulans. II: an X-linked distorter. PLoS Biol 5: e293.
TAO, Y., D. L. HARTL and C. C. LAURIE, 2001 Sex-ratio segregation distortion associated with reproductive isolation in Drosophila. Proc Natl Acad Sci U S A 98: 13183-13188.
TAO, Y., J. P. MASLY, L. ARARIPE, Y. KE and D. L. HARTL, 2007b A sex-ratio meiotic drive system in Drosophila simulans. I: an autosomal suppressor. PLoS Biol 5: e292.
TAVARE, S., D. J. BALDING, R. C. GRIFFITHS and P. DONNELLY, 1997 Inferring coalescence times from DNA sequence data. Genetics 145: 505-518.
THOMSON, G. J., and M. W. FELDMAN, 1975 Population genetics of modifiers of meiotic drive: IV. On the evolution of sex-ratio distortion. Theor Popul Biol 8: 202-211.
VAZ, S. C., and A. B. CARVALHO, 2004 Evolution of autosomal suppression of the sex-ratio trait in Drosophila. Genetics 166: 265-277.
WAKELEY, J., 2009 Coalescent Theory: An Introduction. Roberts and Company Publishers, Greenwood Village, CO.
WALL, J. D., P. ANDOLFATTO and M. PRZEWORSKI, 2002 Testing models of selection and demography in Drosophila simulans. Genetics 162: 203-216.
WATTERSON, G. A., 1975 On the number of segregating sites in genetical models without recombination. Theor Popul Biol 7: 256-276.
ZIMMERING, S., L. SANDLER and B. NICOLETTI, 1970 Mechanisms of meiotic drive. Annu Rev Genet 4: 409-436.
TABLE 1
Population genetic summary statistics

Sample          n (nanc)  L     S    π        θW       h      ZNS      TD        FWH
Dox
  All Data      71        2342  155  0.00509  0.01396  6***   0.52**   -2.19***  -25.95
  Population
    Nicewicz    12 (0)    5521  19   0.00057  0.00114  4*     0.80**   -2.17**   --
    Tremont     34 (0)    5956  12   0.00045  0.00049  7      0.35     -0.29     0.06
    Winters     12 (0)    6061  8    0.0004   0.00044  3      0.76*    -0.34     --
    Africa      5 (2)     2343  116  0.02945  0.02376  3*     0.97**   1.82*     -1.20
  Allele
    Derived     67        5511  22   0.00044  0.00084  8*     0.31     -1.48*    0.03
    Ancestral   4         2388  84   0.01982  0.01919  4      0.5      0.35      2.33
MDox
  All Data      69        2788  118  0.0023   0.00918  10***  0.32*    -2.58***  -24.57
  Population
    Nicewicz    12 (0)    4401  12   0.00045  0.0009   3*     0.83*    -2.09***  0.15
    Tremont     33 (0)    4507  9    0.00017  0.00049  5      0.25     -1.98**   0.18
    Winters     12 (0)    4508  1    0.00004  0.00007  2      --       -1.14*    0.15
    Africa      5 (3)     2788  103  0.01772  0.01859  4      0.37     -0.36     6.40
  Allele
    Derived     65        4400  18   0.00018  0.00086  8      0.26     -2.40***  0.15
    Ancestral   4         2815  92   0.01812  0.0186   4      0.41     -0.27     4.33
Nmy
  All Data      115       5335  155  0.0009   0.00553  11***  0.40*    -2.76***  -91.63***
  Population
    Nicewicz    24 (0)    7461  0    0        0        1      --       --        --
    Tremont     66 (1)    5403  60   0.00034  0.00233  7***   0.84***  -2.88***  -36.19**
    Winters     12 (1)    5385  121  0.00374  0.00744  2***   1.00***  -2.32**   -72.88***
    Africa      5 (0)     7311  60   0.00372  0.00359  4***   0.47     0.315     -3.50
  Allele
    Derived     113       7310  67   0.00028  0.00179  11***  0.41**   -2.69***  -29.72**
    Ancestral   2         5402  66   0.01222  0.01222  2      1.00     --

n is the number of chromosomes sampled. nanc is the number of ancestral alleles present in each population sample. L is the total number of sites analyzed, excluding alignment gaps. S is the number of segregating sites. h is the number of haplotypes. π is the average number of pairwise differences (Nei 1987). θW is Watterson's estimator of population diversity (Watterson 1975). ZNS is the average pairwise R2 (Kelly 1997). TD is Tajima's D (Tajima 1989). FWH is Fay and Wu's H (Fay and Wu 2000). * indicates P < 0.05, ** indicates P < 0.01, *** indicates P < 0.001.
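The Watterson's estimator values reported in Table 1 can be cross-checked from the tabulated S, n, and L. The following is a minimal sketch, assuming the standard per-site form of the estimator, S / (a_n L) with a_n = 1 + 1/2 + ... + 1/(n-1) (Watterson 1975). The function name is illustrative and is not part of the original analysis.

```python
def watterson_theta_per_site(num_segregating: int, num_chromosomes: int, num_sites: int) -> float:
    """Per-site Watterson's estimator: S / (a_n * L).

    a_n is the harmonic sum 1 + 1/2 + ... + 1/(n - 1).
    The function name and signature are hypothetical, for illustration only.
    """
    if num_chromosomes < 2:
        raise ValueError("at least two chromosomes are needed")
    if num_sites <= 0:
        raise ValueError("the number of sites must be positive")
    a_n = sum(1.0 / i for i in range(1, num_chromosomes))
    return num_segregating / (a_n * num_sites)


# Dox in the Nicewicz sample (Table 1): S = 19, n = 12, L = 5521
theta_nicewicz_dox = watterson_theta_per_site(19, 12, 5521)
print(round(theta_nicewicz_dox, 5))  # 0.00114, the value tabulated for this sample
```

Because the estimator depends only on S, n, and L, a recomputed value that differs from the tabulated one by an order of magnitude usually points to a transcription error in one of those three columns.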
TABLE 2
Prior distributions of parameters for selection model

Parameter  Locus  Prior             Mean       95% density
μ          Dox    Gam(10^-10, 12)   1.2x10^-9  6.3x10^-10 - 2.0x10^-9
μ          MDox   Gam(10^-10, 12)   1.2x10^-9  6.3x10^-10 - 2.0x10^-9
μ          Nmy    Gam(10^-10, 10)   1.0x10^-9  4.8x10^-10 - 1.7x10^-9
r          Dox    Gam(10^-10, 48)   4.8x10^-9  3.5x10^-9 - 6.3x10^-9
r          MDox   Gam(10^-10, 48)   4.8x10^-9  3.5x10^-9 - 6.3x10^-9
r          Nmy    Gam(10^-10, 82)   8.2x10^-9  6.5x10^-9 - 10x10^-9
N          Dox    Gam(4x10^4, 25)   1.0x10^6   6.5x10^5 - 1.4x10^6
N          MDox   Gam(4x10^4, 25)   1.0x10^6   6.5x10^5 - 1.4x10^6
N          Nmy    Gam(1x10^5, 25)   2.5x10^6   1.6x10^6 - 3.7x10^6
s          Dox    U(5x10^-4, 0.5)   --         --
s          MDox   U(5x10^-4, 0.5)   --         --
s          Nmy    U(5x10^-4, 0.5)   --         --
θ          Dox    --                30         4.9-110
θ          MDox   --                22         3.6-80
θ          Nmy    --                75         9.7-320
ρ          Dox    --                120        35-330
ρ          MDox   --                87         25-240
ρ          Nmy    --                610        160-1540

μ is the per site mutation rate. r is the per site recombination rate. N is the effective population size. s is the selection coefficient. θ is the per locus population mutation parameter. ρ is the per locus population recombination parameter.

TABLE 3
HKA Tests

Gene       Population  Chromosome  L     n   S    Div     χ2         P-value
Winters SR Data
Dox        Nicewicz    X           5521  12  19   0.0567  52.51      <0.0001
Dox        Tremont     X           5956  34  12   0.0567  93.27      <0.0001
Dox        Winters     X           6061  12  8    0.0567  72.17      <0.0001
MDox       Nicewicz    X           4401  12  12   0.0611  41.22 (b)  <0.0001
MDox       Tremont     X           4507  33  9    0.0611  59.31      <0.0001
MDox       Winters     X           4508  12  1    0.0611  49.72      <0.0001
Nmy        Nicewicz    3R          7461  24  0    0.0516  77.34      <0.0001
Nmy        Tremont     3R          7460  65  0    0.0516  94.52      <0.0001
Nmy        Winters     3R          7461  11  1    0.0516  59.92      <0.0001
NmyAll (a) Tremont     3R          5403  66  60   0.0513  42.85      <0.0001
NmyAll     Winters     3R          5385  12  121  0.0511  20.28      0.0903
Begun and Whitley (2000) Data
bnb        Winters     X           1015  8   11   0.0197  17.99      0.0764
mei-218    Winters     X           1187  8   14   0.0687
ovo        Winters     X           1356  8   9    0.0270
sn         Winters     X           1437  8   28   0.0370
sog        Winters     X           1233  8   8    0.0251
X          Winters     X           1425  8   24   0.0281
yp3        Winters     X           1227  8   8    0.0473
AP-50      Winters     3R          1398  8   58   0.0293
fzo        Winters     3R          1360  8   22   0.0708
hyd        Winters     3R          1786  8   26   0.0208
Osbp       Winters     3R          1166  8   31   0.0266
ry         Winters     3R          1362  8   54   0.0419
T-cp1      Winters     3R          1201  8   9    0.0325

L is the number of bases sequenced in D. simulans. n is the number of D. simulans chromosomes sampled. S is the number of segregating sites. Div is the per-base divergence from D. melanogaster. χ2 and P values correspond to multi-locus HKA tests on 13 loci previously sequenced in D. simulans (bottom) and when data from single Winters SR genes for each North American population were added to the 13 loci (top). See text for details.
a. Ancestral alleles were not excluded from the analysis (Nicewicz has no ancestral alleles in sample).

TABLE 4
Posterior distributions of parameters for selection model

Gene  Statistic  T (N gen)      T (years)   θ       ρ        s
Nicewicz
Dox   median     0.0348         3,500       28      120      0.063
Dox   95% CI     0.0064-0.112   610-10,000  15-49   70-180   0.0019-0.46
MDox  median     0.0308         2,900       20      86       0.063
MDox  95% CI     0.0037-0.112   330-11,000  11-35   52-130   0.0015-0.46
Tremont
Dox   median     0.0312         2,900       21      110      0.1
Dox   95% CI     0.0072-0.156   760-12,000  9.6-43  63-170   0.0023-0.48
MDox  median     0.0304         2,800       17      83       0.055
MDox  95% CI     0.0040-0.104   360-9,700   8.9-32  50-130   0.0017-0.47
Nmy   median     0.0068         1,600       61      560      0.28
Nmy   95% CI     0.0020-0.0212  550-4,500   29-130  340-850  0.014-0.49
Winters
Dox   median     0.034          3,200       22      110      0.059
Dox   95% CI     0.0035-0.136   300-12,000  11-42   68-180   0.0021-0.48
MDox  median     0.0328         3,100       18      81       0.26
MDox  95% CI     0.0040-0.148   400-14,000  8.5-36  49-133   0.021-0.48
Nmy   median     0.0164         3,800       59      570      0.27
Nmy   95% CI     0.0032-0.064   790-14,000  26-120  340-890  0.023-0.49

T (N gen) is the time since selection in coalescent time units. T (years) is the time since selection in years. θ is the per-locus population mutation rate. ρ is the per-locus population recombination rate. s is the selection coefficient. 95% CI is the 95% credible interval.
SUPPLEMENTARY TABLE 1
Geographic Distribution of Ancestral and Derived Alleles

North American Populations: number of ancestral alleles (total sampled)

Geographic Origin  Population Name  Dox     MDox    Nmy
Bolton, MA         Nicewicz         0 (12)  0 (12)  0 (24)
Cambridge, MA      Tremont          0 (34)  0 (33)  1 (66)
Winters, CA        Winters          0 (12)  0 (12)  1 (12)

Global Panel

Geographic Origin       Stock ID        Dox Allele  MDox Allele  Nmy Allele
Madagascar              14021.0251.196  ancestral   ancestral    derived
Madagascar              14021.0251.197  ancestral   ancestral    derived
Kenya                   14021.0251.199  derived     ancestral    derived
Congo                   14021.0251.184  derived     derived      derived
South Africa            14021.0251.169  derived     derived      derived
California              14021.0251.194  derived     derived      derived
North America, unknown  14021.0251.195  derived     derived      derived
Scotland                14021.0251.216  derived     derived      derived
Greece                  14021.0251.181  derived     derived      derived
New Guinea              14021.0251.009  derived     derived      derived
New Zealand             14021.0251.007  ancestral   ancestral    derived
Australia               14021.0251.176  derived     derived      derived
New Caledonia           14021.0251.198  ancestral   derived      derived

SUPPLEMENTARY TABLE 2
Begun and Whitley 2000 loci
Complete sequence. Winters, CA populations

Locus         Chromosome  n   L     S   h   π       θW      TD       ρ
Aats-gluprop  3R          6   1348  32  6   0.0101  0.0110  -0.5508  0.0367
AP-50         3R          8   1398  58  7   0.0161  0.0166  -0.1578  0.0360
Cen190        3R          7   1287  23  4   0.0091  0.0073  1.4001   0.0093
fzo           3R          8   1362  22  5   0.0068  0.0062  0.4938   0.0041
Hsc70         3R          7   1292  10  6   0.0031  0.0032  -0.1073  0.0271
hyd           3R          8   1793  26  7   0.0057  0.0056  0.0794   0.0000
miranda       3R          5   1200  29  5   0.0113  0.0120  -0.4673  0.8265
nos           3R          7   1073  20  7   0.0074  0.0078  -0.2433  0.0096
Osbp          3R          8   1167  31  7   0.0101  0.0103  -0.0914  0.1235
oso           3R          7   971   24  5   0.0107  0.0101  0.3380   0.0092
pit           3R          7   1267  52  5   0.0196  0.0179  0.5476   0.0333
ry            3R          8   1362  54  7   0.0163  0.0164  -0.0374  0.0088
T-cp1         3R          8   1201  9   6   0.0028  0.0029  -0.1608  0.0257
tld           3R          7   1013  40  7   0.0189  0.0169  0.6559   0.0305
MEAN                                        0.0106  0.0103  0.1213   0.0843
bnb           X           8   1030  11  6   0.0035  0.0042  -0.8319  0.0000
ct            X           6   1090  2   3   0.0008  0.0008  -0.0500  0.0298
dec-1         X           7   1493  23  5   0.0068  0.0063  0.3685   0.0163
garnet        X           7   1265  17  3   0.0039  0.0055  -1.6704  0.0000
mei-218       X           8   1230  14  4   0.0058  0.0046  1.4083   0.0111
mei-9         X           7   890   6   4   0.0029  0.0028  0.2540   0.0000
otu           X           6   1162  29  5   0.0125  0.0109  0.9132   0.0220
ovo           X           8   1359  9   6   0.0029  0.0026  0.5952   0.0174
pgd           X           7   912   17  3   0.0091  0.0076  1.0809   0.0024
r             X           6   1198  9   4   0.0032  0.0033  -0.1132  0.0451
sn            X           8   1450  28  5   0.0088  0.0078  0.6903   0.0109
sog           X           8   1233  8   5   0.0021  0.0025  -0.8615  0.1266
sqh           X           7   777   10  3   0.0055  0.0053  0.1431   0.0022
X             X           8   1425  24  5   0.0082  0.0065  1.3544   0.0071
yp3           X           8   1241  8   4   0.0027  0.0025  0.2580   0.0011
MEAN                                        0.0052  0.0049  0.2359   0.0195

n is the number of chromosomes sampled. L is the length of sequence in base pairs. S is the number of segregating sites. h is the number of haplotypes. π is the per site average number of pairwise differences (Nei 1987). θW is the per site Watterson's estimator of population diversity (Watterson 1975). TD is Tajima's D (Tajima 1989). ρ is the per site population recombination rate (Hudson 1987).

SUPPLEMENTARY TABLE 3

      Genotype          Number of males tested  Mean k (s.d.)
1     Dox; nmy          18                      0.96 (0.038)
2     dox[del105]; nmy  18                      0.53 (0.050)
3     dox[del150]; nmy  19                      0.56 (0.061)
4     dox[del150]; nmy  19                      0.52 (0.032)
5     dox[del150]; nmy  17                      0.53 (0.057)

k is the proportion of female progeny.

FIGURE 1. Sequenced regions of the Winters sex-ratio genes. The chromosomal locations of the distorter locus and the suppressor locus are shown at the top. The two genes of the distorter are Distorter on the X (Dox) and Mother of Dox (MDox). The suppressor gene is called Not Much Yang (Nmy). Dox and MDox are separated by ~70 kb of DNA sequence on the X chromosome. Triangles indicate the locations of the PCR primers used. Arrows indicate the direction of transcription of the genes (Tao et al. 2007a; Tao et al. 2007b).

FIGURE 2. Predicted exon structure of the loss-of-function mutants at Dox. The allele name is followed by the frequency of the mutant in the total pooled sample. Exons are illustrated as grey boxes; deletions are shown as dashed lines.

FIGURE 3. Pairwise HKA tests indicate selection acting on the derived form of the Winters SR genes. The per-site level of polymorphism (θW) is shown above the x-axis, and the per-site average divergence in flanking sequence (DXY) between D. simulans and D. melanogaster is shown below the x-axis. Derived alleles are shown in dark grey; ancestral alleles are shown in light grey.

FIGURE 4. Posterior distributions of the time since selection (in years) for a hitchhiking model.

SUPPLEMENTARY FIGURE 1. Crossing scheme used to test for sex-ratio distortion of different Dox alleles. In generation 1, X chromosomes of known Dox genotype were extracted from the Tremont lines. These were substituted into an isogenic background homozygous for a non-functional suppressor allele, “nmy[sim1247].” Males were mated to tester females (w; e) and the sex ratios of their progeny were determined.

SUPPLEMENTARY FIGURE 2. Posterior distributions of the time since selection (in units of N generations) for a hitchhiking model.