The Y chromosome, with the genes to make a man, has been sequenced. Although the chromosome is often regarded as a genetic wasteland, its sequence reveals that we may have underestimated its powers. Here, Nature presents the research, as well as news, reviews and analysis. As with all of Nature's genome content, these articles are available free online.

News and Views
Tales of the Y chromosome
H. F. Willard
Determining the sequence of the human Y chromosome presented a daunting challenge to genome researchers. But the task is now done, and the secrets revealed justify the effort.

Article
The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes
H. Skaletsky et al.

Letter
Abundant gene conversion between arms of palindromes in human and ape Y chromosomes
S. Rozen et al.

Nature Science Update
Y chromosome sequence completed
DNA readout reveals genetic palindromes safeguard male-defining chromosome.
Y chromosomes rewrite British history
Anglo-Saxons' genetic stamp weaker than historians suspected.

All articles here are available free to registered users

evolution

Human spermatozoa: The future of sex
R. John Aitken, Jennifer A. Marshall Graves
The vulnerability of the Y chromosome will be a key factor in shaping the evolutionary future of our species.
Nature 415, 963 (28 Feb 2002)

Unexpectedly similar rates of nucleotide substitution found in male and female hominids
Hacho B. Bohossian, Helen Skaletsky, David C. Page
Nature 406, 622 - 625 (10 Aug 2000)

Y-chromosome variation and Irish origins
Emmeline W. Hill, Mark A. Jobling, Daniel G. Bradley
Nature 404, 351 - 352 (23 Mar 2000)

The application of molecular genetic approaches to the study of human evolution
L. Luca Cavalli-Sforza, Marcus W. Feldman
Nature Genetics 33, 266 - 275 (01 Mar 2003)

The human Y chromosome, in the light of evolution
Bruce T. Lahn, Nathaniel M. Pearson, Karin Jegalian
Nature Reviews Genetics 2, 207 - 216 (01 Mar 2001)

Y chromosome sequence variation and the history of human populations
Peter A. Underhill et al.
Nature Genetics 26, 358 - 361 (01 Nov 2000)

development

DMY is a Y-specific DM-domain gene required for male development in the medaka fish
Masaru Matsuda et al.
Nature 417, 559 - 563 (30 May 2002)

Male development of chromosomally female mice transgenic for Sry
Koopman P, Gubbay J, Vivian N, Goodfellow P, Lovell-Badge R.
Nature 351, 96 (9 May 1991)

Sox9 induces testis development in XX transgenic mice
Valerie P.I. Vidal, Marie-Christine Chaboissier, Dirk G. de Rooij, Andreas Schedl
Nature Genetics 28, 216 - 217 (01 Jul 2001)

A transgenic insertion upstream of Sox9 is associated with dominant XX sex reversal in the mouse
Colin E. Bishop et al.
Nature Genetics 26, 490 - 494 (01 Dec 2000)

genetics

Human mtDNA and Y-chromosome variation is correlated with matrilocal versus patrilocal residence
H. Oota, W. Settheetham-Ishida, D. Tiwawech, T. Ishida, & M. Stoneking
Nature Genetics 29, 20-21 (2001)

Retroposition of autosomal mRNA yielded testis-specific gene family on human Y chromosome
Bruce T Lahn, David C Page
Nature Genetics 21, 429 - 433

Reduced adaptation of a non-recombining neo-Y chromosome
Doris Bachtrog, Brian Charlesworth
Nature 416, 323 - 326 (21 Mar 2002)

Strong male-driven evolution of DNA sequences in humans and apes
Kateryna D. Makova, Wen-Hsiung Li
Nature 416, 624 - 626 (11 Apr 2002)

A physical map of the human Y chromosome
Charles A. Tilford et al.
Nature 409, 943 - 945 (15 Feb 2001)

Picture credits:
The creator and voices behind the Simpsons Matt Groening © MC PHERSON COLIN/CORBIS SYGMA
Vitruvian Man by Leonardo da Vinci © Bettmann/CORBIS
David by Michelangelo © Royalty-Free/CORBIS

Nature 423, 810 - 813 (19 June 2003); doi:10.1038/423810a
Genome biology: Tales of the Y chromosome
HUNTINGTON F. WILLARD

Huntington F.
Willard is at the Institute for Genome Sciences and Policy, and the Department of Molecular Genetics and Microbiology, Duke University, Durham, North Carolina 27710, USA. e-mail: hunt.willard@duke.edu

Determining the sequence of the human Y chromosome presented a daunting challenge to genome researchers. But the task is now done, and the secrets revealed justify the effort.

Ancient maps showed the known world in colourful detail, beyond the edges of which lay vast expanses of terra incognita. Much creative thought went into portraying this unexplored territory, often featuring nasty-looking serpents and dragons. Only when Magellan managed to circumnavigate the globe did it become apparent that the unknown was in fact navigable, and that the serpents and dragons, if not illusory, could at least be tamed.

The human genome has its terra incognita too, some of it known, much of it subject to alternating angst and fascination by genome biologists, and all of it to be avoided if possible, until now. On pages 825 and 873 of this issue1, 2, a group of modern-day Magellans describe how they sailed headlong into the frothy seas of duplicated, inverted and otherwise troublesome sequences on the human Y chromosome. They have emerged safely on the other side, with tales to tell.

Because of its distinctive role in sex determination, the Y chromosome has long attracted special attention from geneticists, evolutionary biologists and even the lay public. It is known to consist of regions of DNA that show quite distinctive genetic behaviour and genomic characteristics. The two human sex chromosomes, X and Y (Fig. 1), originated a few hundred million years ago from the same ancestral autosome (a non-sex chromosome) during the evolution of sex determination3. They then diverged in sequence over the succeeding aeons.
Nowadays, there are relatively short regions at either end of the Y chromosome that are still identical to the corresponding regions of the X chromosome, reflecting the frequent exchange of DNA between these regions ('recombination') that occurs during sperm production4. But more than 95% of the modern-day Y chromosome is male-specific, consisting of some 23 million base pairs (Mb) of euchromatin, the part of our genome containing most of the genes, and a variable amount of heterochromatin, consisting of highly repetitive DNA and often dismissed as non-functional. Now, in an accomplishment that can only be described as heroic, Skaletsky et al.1 report the complete sequence of the 23-Mb euchromatic segment, which they designate the MSY, for 'male-specific region of the Y'.

Figure 1 Male make-up.

Prioritization in the Human Genome Project had led to the heterochromatic regions of the Y and other chromosomes being set aside to be dealt with later, if ever. But there was reason to hope that the euchromatin of the Y chromosome would present no more difficult a sequencing challenge than that found elsewhere in the genome. That supposition could not have been more wrong. As Skaletsky et al. report, the MSY is a mosaic of complex and interrelated sequences that made this one of the most problematic regions of the human genome thus far to be successfully sequenced and assembled.

For instance, about 10–15% of the MSY consists of stretches of sequence that moved there from the X chromosome within only the past few million years. These stretches are still 99% identical to their X-chromosome counterparts and are dominated by a high proportion of interspersed repetitive sequences, with only two genes. A further 20% of the MSY consists of a class of sequences ('X-degenerate' sequences1) that are more distantly related to the X chromosome, reflecting their more ancient common origin.
And the remainder comprises a web of Y-specific repetitive sequences that make up a series of palindromes — sequences that read the same on both strands of the DNA double helix, with two 'arms' stretching out from a central point of mirrored symmetry. These palindromes come in a range of sizes, up to almost 3 Mb in length, with more than 99.9% identity between the two arms of each palindrome. The repetitive sequences, particularly the palindromes, caused some difficulties for sequence assemblers. Genome-sequencing projects involve fragmenting the genome in question into small, overlapping pieces, sequencing them, and then using computer algorithms to put the pieces together in the correct order. There are various ways of doing this; assembling the MSY's palindromes (and discriminating between their arms) required an iterative mapping and sequencing process more reminiscent of the knowledge-based mapping approaches of the early days of the Human Genome Project than the high-throughput assemblies that have emerged for most of the genome5, 6. This strategy was aided by the fact that the sequence came from a single Y chromosome, so Skaletsky et al. knew that minor sequence variations must have come from duplicated copies on the same chromosome, rather than from different Y chromosomes. Although necessarily more painstaking, this overall approach provides a model for how researchers might attack at least some of the troublesome areas of the rest of the genome — such as blocks of repetitive heterochromatin and the hundreds of regions of substantial sequence duplication7 — where standard assembly programs can be fooled. This is not just a celebratory tale of a successful sequencing journey, however. Along the way, Skaletsky et al. picked up artefacts of Y-chromosome antiquity, dating as far back as 300 million years, that allow a glimpse into the evolutionary strategies that the Y chromosome has used to survive. 
For instance, from the degree and patterns of divergence of the genes found on both sex chromosomes, the authors provide evidence for the stepwise decay of the Y chromosome over time and define changes in both Y-chromosome organization and gene content and expression. Unlike the regions at the ends, most of the lengths of the sex chromosomes do not exchange sequence during sperm production, and Skaletsky et al. point to two consequences of this suppression of recombination. First, selection occurred on the Y chromosome for a group of testis-specific genes that the authors argue may have enhanced male fertility. Most of these genes are found within the palindromes, showing why it can be important to sequence such difficult regions. Second, as X–Y recombination became suppressed during evolution, an alternative mechanism had to emerge to maintain the sequence and function of the remaining Y-chromosome genes and to prevent the accumulation of inactivating mutations and the ultimate demise of the chromosome8.

To gain insight into this alternative mechanism, Rozen et al.2 examined the hypothesis that X–Y recombination has been replaced by extensive, ongoing recombination between the arms of the MSY palindromes, where the sequence on one arm of the palindrome alters or 'converts' the sequence on the other. To test the predictions of this model, the authors sequenced one particular palindrome-embedded gene from Y chromosomes from around the world, representing the full tree of the previously established Y-chromosome genealogy9. They found several instances where the sequence of the copy of the gene on one arm of the palindrome had altered the sequence of the other arm's copy. From this, they calculate that as many as 600 base pairs (from the 5.4 Mb contained in MSY palindromes) must be converted in each newborn male in the human population.
These data also indicate that gene conversion in general may be more common than previously suspected, especially in other palindromic and duplicated regions around the genome7. This supports a more dynamic view of genome change, in which, even within a single generation, not only does the occasional mutation occur (there are estimated to be as many as 100–200 new base-pair changes in each person), but also perhaps thousands of gene-conversion events.

The tales told by these Magellans of the genome hold two lessons for those who might question the wisdom of such exploration. First, even the most repetitive and seemingly impenetrable stretches of the genome hold secrets that justify the effort. Second, each chromosome has its own story to tell, quite apart from the story of the genome as a whole. Although the sex chromosomes provide the strongest case for a special relationship between genome organization and the unique biology of a chromosome10, 11, the other chromosomes shouldn't feel left out. Each is the product of hundreds of millions of years of evolution, shaped by processes that have rearranged and exchanged sequences, contributed to the formation of new species, given birth to new genes and gene families, and provided the basis for a range of genetically determined or genomically influenced traits. Piecing together these events remains a worthwhile challenge, for among the flotsam and jetsam of each chromosome lie clues to our history.

References
1. Skaletsky, H. et al. Nature 423, 825-837 (2003).
2. Rozen, S. et al. Nature 423, 873-876 (2003).
3. Ohno, S. Sex Chromosomes and Sex-Linked Genes (Springer, Berlin, 1967).
4. Burgoyne, P. S. Hum. Genet. 61, 85-90 (1982).
5. International Human Genome Sequencing Consortium Nature 409, 860-921 (2001).
6. Venter, J. C. et al. Science 291, 1304-1351 (2001).
7. Bailey, J. A. et al. Science 297, 1003-1007 (2002).
8. Marshall Graves, J. A. Trends Genet. 18, 259-264 (2002).
9. Cavalli-Sforza, L. L. & Feldman, M. W. Nature Genet. 33, 266-275 (2003).
10. Lahn, B. T. & Page, D. C. Science 278, 675-680 (1997).
11. Carrel, L., Cottle, A. A., Goglin, K. C. & Willard, H. F. Proc. Natl Acad. Sci. USA 96, 14440-14444 (1999).

Nature 423, 825 - 837 (19 June 2003); doi:10.1038/nature01722
The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes
HELEN SKALETSKY*, TOMOKO KURODA-KAWAGUCHI*, PATRICK J. MINX†, HOLLAND S. CORDUM†, LADEANA HILLIER†, LAURA G. BROWN*, SJOERD REPPING‡, TATYANA PYNTIKOVA*, JOHAR ALI†, TAMBERLYN BIERI†, ASIF CHINWALLA†, ANDREW DELEHAUNTY†, KIM DELEHAUNTY†, HUI DU†, GINGER FEWELL†, LUCINDA FULTON†, ROBERT FULTON†, TINA GRAVES†, SHUN-FANG HOU†, PHILIP LATRIELLE†, SHAWN LEONARD†, ELAINE MARDIS†, RACHEL MAUPIN†, JOHN MCPHERSON†, TRACIE MINER†, WILLIAM NASH†, CHRISTINE NGUYEN†, PHILIP OZERSKY†, KYMBERLIE PEPIN†, SUSAN ROCK†, TRACY ROHLFING†, KELSI SCOTT†, BRIAN SCHULTZ†, CINDY STRONG†, AYE TIN-WOLLAM†, SHIAWPYNG YANG†, ROBERT H. WATERSTON†, RICHARD K. WILSON†, STEVE ROZEN* & DAVID C. PAGE*
* Howard Hughes Medical Institute, Whitehead Institute, and Department of Biology, Massachusetts Institute of Technology, 9 Cambridge Center, Cambridge, Massachusetts 02142, USA
† Genome Sequencing Center, Washington University School of Medicine, 4444 Forest Park Boulevard, St Louis, Missouri 63108, USA
‡ Center for Reproductive Medicine, Department of Gynaecology and Obstetrics, Academic Medical Centre, Amsterdam 1105 AZ, the Netherlands
Correspondence and requests for materials should be addressed to D.C.P. (page_admin@wi.mit.edu). GenBank accession numbers are listed in Fig. 2l and the Supplementary Information.
The male-specific region of the Y chromosome, the MSY, differentiates the sexes and comprises 95% of the chromosome's length. Here, we report that the MSY is a mosaic of heterochromatic sequences and three classes of euchromatic sequences: X-transposed, X-degenerate and ampliconic. These classes contain all 156 known transcription units, which include 78 protein-coding genes that collectively encode 27 distinct proteins. The X-transposed sequences exhibit 99% identity to the X chromosome. The X-degenerate sequences are remnants of ancient autosomes from which the modern X and Y chromosomes evolved. The ampliconic class includes large regions (about 30% of the MSY euchromatin) where sequence pairs show greater than 99.9% identity, which is maintained by frequent gene conversion (nonreciprocal transfer). The most prominent features here are eight massive palindromes, at least six of which contain testis genes.

The history of human Y chromosome research can be divided into three eras. The first era focused on mendelian examination of human family trees. In the opening decades of the twentieth century, proponents of Mendel's concept of the gene observed three modes of inheritance in our species: autosomal recessive, autosomal dominant and X-linked recessive. Contemporaneously, other scholars sought to identify traits that exhibited Y-linked (father to son) transmission. These scholars erroneously claimed success, presenting family trees purported to demonstrate that Y-chromosomal genes were responsible for hairy ears, scaly skin and other traits. Meanwhile, light microscopic studies of human cells provided strong physical evidence of the existence of a male-specific chromosome1. By 1950, studies of human pedigrees reported at least 17 Y-linked traits2.

The second era was dominated by the view that the Y chromosome was a genetic wasteland, based on the debunking of earlier studies and a dearth of new evidence for genes.
In the 1950s, Stern systematically exposed critical flaws in each of the preceding pedigree studies and dismissed them2. In 1959, Jacobs' study of Klinefelter (XXY) males3 and Ford's research on Turner (X0) females4 demonstrated that the Y chromosome carries a pivotal sex-determining gene, but this gene was considered to be an exception on a generally desolate chromosome. In the 1960s, Ohno proposed that the mammalian X and Y chromosomes had evolved from an ordinary pair of autosomes5. Ohno speculated that the X chromosome had retained the ancestral autosome's gene content whereas the Y chromosome had lost all but perhaps one gene involved in sex determination. Thus emerged the understanding of the human Y chromosome as a profoundly degenerate X chromosome.

The hallmark of the third and present era has been the application of recombinant DNA and genomic technologies to the Y chromosome, culminating in molecularly based conclusions about its genes. In recent decades, an understanding of the Y chromosome's biological functions has begun to emerge from DNA studies of individuals with partial Y chromosomes, coupled with molecular characterization of Y-linked genes implicated in gonadal sex reversal, Turner syndrome, graft rejection and spermatogenic failure6. Genomic studies revealed that the Y chromosome contains a region, comprising 95% of its length, where there is no X–Y crossing over. This region came to be known as the non-recombining region, or NRY, although our discovery of abundant recombination, as reported here and in the accompanying manuscript, compels us to rename it the male-specific region, or MSY7. The MSY is flanked on both sides by pseudoautosomal regions, where X–Y crossing over is a normal and frequent event in male meiosis (see Supplementary Note 1).

Previous efforts to construct accurate, high-resolution physical maps of the MSY had been stymied by an abundance of lengthy, intrachromosomal repetitive sequences, or amplicons8.
To overcome this difficulty, we identified minute variations between amplicon copies, and then highlighted these minute variants (sequence family variants9) as markers to be ordered with respect to one another, yielding a map amenable to iterative refinement. However, the minute variants could only be found by fully and accurately sequencing and comparing near-identical amplicon copies. Thus, in our effort to determine the nucleotide sequence of the MSY, mapping and sequencing activities were fused into a single, iterative analytic process. We have previously reported the physical map that emerged from these efforts10. Here we report the sequencing of the MSY.

We mapped and sequenced a tiling path of 220 bacterial artificial chromosome (BAC) clones, each containing a portion of the MSY from the same individual. We used only one man's Y chromosome to prevent any allelic variation, or polymorphism, from confounding our search for minute sequence variation between amplicon copies. (MSY amplicon copies can differ as little in sequence as two Y chromosomes chosen at random from the population11.) We chose to sequence highly redundant BACs, especially in amplicon-rich regions: about 12.7 million (roughly 60%) of the euchromatic nucleotides were sequenced in at least two independent BAC clones. This redundancy allowed us to refine and validate the MSY sequence by exhaustively investigating, and in most cases resolving, sequence discrepancies between overlapping BACs.

Sequencing of euchromatic and heterochromatic regions

We begin with a statistical synopsis of the MSY sequence, considering the euchromatic and heterochromatic portions separately. (In this analysis, we have equated satellite sequences with heterochromatin, and all other sequences with euchromatin.) The product of our present research is a 'reference' sequence from one man's Y chromosome. A full description of the nature and extent of Y chromosome variation in human populations must await future studies.
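The idea of using minute differences between near-identical amplicon copies as map markers can be illustrated with a small sketch. The code below is not the authors' pipeline, which is not described at this level of detail; it is a minimal illustration that assumes the two copies have already been aligned to the same length, and the function name and toy sequences are hypothetical.

```python
def sequence_family_variants(copy_a: str, copy_b: str) -> list[tuple[int, str, str]]:
    """List the positions at which two pre-aligned amplicon copies differ.

    Assumption for illustration: the copies are already aligned and have
    equal length; '-' marks an alignment gap and gap columns are skipped.
    Returns (zero-based position, base in copy_a, base in copy_b) tuples.
    """
    if len(copy_a) != len(copy_b):
        raise ValueError("copies must be aligned to the same length")
    variants = []
    for pos, (a, b) in enumerate(zip(copy_a.upper(), copy_b.upper())):
        if a == "-" or b == "-":
            continue  # insertions and deletions are not counted here
        if a != b:
            variants.append((pos, a, b))
    return variants


# Toy example (made-up sequences): the copies differ at a single position.
copy_1 = "ACGTACGTTAGC"
copy_2 = "ACGTACCTTAGC"
print(sequence_family_variants(copy_1, copy_2))
```

Each reported position could then serve as a marker that distinguishes one copy from the other, which is only informative if, as in the study, all clones derive from a single Y chromosome.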
We and our colleagues have previously reported the nucleotide sequence of two portions of the MSY (the AZFa and AZFc regions12, 13). We have incorporated this previously reported sequence data in our present analysis of the entire MSY.

The MSY's euchromatic DNA sequences total roughly 23 megabases (Mb), including 8 Mb on the short arm (Yp) and 14.5 Mb on the long arm (Yq) (Fig. 1). We obtained finished sequence, with an estimated error rate of about 1 per 10^5 nucleotides, for all MSY euchromatin, with two known exceptions. First, there remain two gaps, each of which is roughly 50 kilobases (kb) long as judged by chromosomal fluorescence in situ hybridization (FISH) (Supplementary Fig. 1). Second, we obtained representative but incomplete sequence for a tandem array that spans roughly 0.7 Mb on Yp. We estimate that we obtained finished nucleotide sequence for roughly 97% of the MSY euchromatin, and that we captured 99% of the sequence complexity of MSY's euchromatin.

Figure 1 The male-specific region of the Y chromosome.

So far, efforts to gain sequence-based understanding of human chromosomes have largely by-passed heterochromatic regions (refs 14, 15; see also Supplementary Note 2), including a large block of heterochromatic sequences found in the centromeric region of every nuclear chromosome16. In addition to its centromeric heterochromatin (approximately 1 Mb, ref. 17), the Y chromosome was previously shown to contain a second, much longer heterochromatic block (roughly 40 Mb) that comprises the bulk of the distal long arm (Fig. 1; see also Supplementary Note 3). In the course of the present sequencing project, we discovered and characterized a third heterochromatic block: a sharply demarcated island that spans approximately 400 kb, comprises >3,000 tandem repeats of 125 base pairs (bp), and interrupts the euchromatic sequences of proximal Yq (Figs 1 and 2).
The other two heterochromatic blocks also consist of massively amplified tandem repeats of low sequence complexity. We attempted to sequence BACs spanning the boundaries and representing the body of each of the three heterochromatic blocks. We succeeded, with the exception that the distal boundary of the major heterochromatic region, on distal Yq, was not identified with certainty (Supplementary Fig. 1). In total, we found that the heterochromatin of MSY encompasses at least six distinct sequence species (Table 1), each of which forms long, homogeneous tandem arrays. Our findings are detailed in Supplementary Note 4 and Supplementary Fig. 3.

Figure 2 Sequence-based map of the MSY; a detailed view of the 24-Mb region shown in Fig. 1b.

A catalogue of genes and transcription units

With a comprehensive reference sequence of the MSY in hand, we set out to catalogue systematically the genes of the MSY. We electronically identified and manually examined all matches to previously reported MSY genes. Furthermore, we used polymerase chain reaction with reverse transcription (RT–PCR) and/or sequencing of complementary DNA clones to evaluate electronic matches to publicly available expressed sequence tags (ESTs), as well as potential genes that were predicted using GenomeScan software18. For all experimentally verified genes whose expression patterns had not been reported previously, we tested for expression in diverse human tissues by RT–PCR and subsequent sequencing of RT–PCR products.

We found that the MSY includes at least 156 transcription units, half of which probably encode proteins (Table 2 and Figs 2, 3; see also Supplementary Tables 1 and 2). All 156 transcription units identified are located in euchromatic sequences. We have no evidence of transcription of the MSY heterochromatin.
Of the approximately 78 protein-coding units, about 60 are members of nine different MSY-specific gene families, each characterized by >98% nucleotide identity among family members, in both exons and introns. The remaining 18 protein-coding genes are present in one copy each in the MSY. (These include two genes, RPS4Y1 and RPS4Y2, that exhibit 93.6% nucleotide identity in coding exons but are much more diverged in introns.) Thus, the MSY seems to encode at least 27 distinct proteins or protein families.

Figure 3 MSY genes, transcription units and palindromes.

Furthermore, the MSY includes at least 78 transcription units for which strong evidence of protein coding is lacking; many of these transcription units are probably non-coding. Of these 78 transcription units, 13 occur in single copy in the MSY and the remaining 65 are members of 15 MSY-specific families. Considering together both coding and non-coding transcription units, the MSY appears to contain 24 MSY-specific families, which collectively account for 125 of the 156 MSY transcription units identified so far.

On the basis of earlier experiments, most of the genes of the MSY were thought to fall into two functional classes, with genes in the first group expressed throughout the body, in many organs, and genes in the second group expressed predominantly or exclusively in testes19. Our present catalogue of MSY genes and their patterns of tissue expression (Table 2) corroborate this model. Of the MSY's 27 distinct protein-coding genes or gene families identified so far, 12 are expressed ubiquitously and 11 are expressed exclusively or predominantly in testes.

Three classes of sequences in the MSY euchromatin

We find that nearly all of the euchromatic sequences fall into three classes, which we have named X-transposed, X-degenerate and ampliconic. As shown in Figs 1 and 2, the MSY euchromatin is a patchwork of these three sequence classes.
The characteristics of the classes are summarized in Fig. 4.

Figure 4 Three sequence classes in the MSY euchromatin.

The X-transposed sequences are 99% identical to DNA sequences in Xq21, a band in the midst of the long arm of the human X chromosome. The X-transposed sequences are so named because their presence in the human MSY is the result of a massive X-to-Y transposition that occurred about 3–4 million years ago, after the divergence of the human and chimpanzee lineages20-22. Subsequently, an inversion within the MSY short arm cleaved the X-transposed block into two non-contiguous segments, as observed in the modern MSY (Figs 1 and 2)21, 22. The X-transposed sequences do not participate in X–Y crossing over during male meiosis, distinguishing them from the pseudoautosomal sequences found in the telomeric regions of the human X and Y chromosomes. Within the X-transposed segments, which have a combined length of 3.4 Mb, we identified only two genes, both of which have homologues in Xq21 (Table 2). Thus the X-transposed sequences exhibit the lowest density of genes among the three sequence classes in the MSY euchromatin (Figs 1 and 3), as well as the highest density of interspersed repeat elements (Fig. 1). In particular, long interspersed nuclear element 1 (LINE1) elements account for 36% of all X-transposed sequence, or nearly twice the genome average of 20%14, 15. As expected, low gene density and high repeat density also characterize the homologous sequence block in Xq21.

In contrast to the X-transposed sequence blocks, the X-degenerate segments of the MSY are dotted with single-copy gene or pseudogene homologues of 27 different X-linked genes. These single-copy MSY genes and pseudogenes display between 60% and 96% nucleotide sequence identity to their X-linked homologues, and they seem to be surviving relics of ancient autosomes from which the X and Y chromosomes coevolved, as explained below.
In 13 cases, the MSY homologue is a pseudogene with sequence similarity to exons and introns of the functional X homologue (Supplementary Table 3). In the remaining 14 cases, the MSY homologue seems to be a transcribed, functional gene, and the X- and Y-linked genes encode very similar but non-identical protein isoforms (Table 2 and Figs 2, 3). These include two cases in which a functional X-linked gene has two expressed homologues in the MSY. The Y-linked genes RPS4Y1 and RPS4Y2 are full-length homologues of the X-linked gene RPS4X, and they apparently encode two different, full-length isoforms of ribosomal protein S4. In contrast, the Y-linked genes CYorf15A and CYorf15B are homologous to, respectively, 5' and 3' portions of the X-linked gene CXorf15, and they apparently encode proteins homologous to, respectively, amino- and carboxy-terminal portions of the predicted CXORF15 protein (Supplementary Fig. 4). Together, the X-degenerate sequences encode 16 of the MSY's 27 distinct proteins or protein families. Notably, all 12 ubiquitously expressed MSY genes reside in the X-degenerate regions; no such genes have been identified elsewhere in the MSY. Conversely, among the 11 MSY genes found to be expressed predominantly in testes, only one gene, the sex-determining SRY, is X-degenerate.

The third class of euchromatic sequences, the ampliconic segments, is composed largely of sequences that exhibit marked similarity (as much as 99.9% identity over tens or hundreds of kilobases) to other sequences in the MSY. We refer to these long, MSY-specific repeat units, of which there are many families, as amplicons. The amplicons are located in seven segments that are scattered across the euchromatic long arm and proximal short arm (Figs 1 and 2), and whose combined length is 10.2 Mb.

We identified these ampliconic regions through a comprehensive analysis of similarities within the sequenced portions of the MSY.
We calculated the percentage nucleotide identity between all pairs of known MSY sequences and then plotted the data in two ways. First we determined, at each point along the length of the sequenced MSY, the highest intrachromosomal similarity. The resulting graph (Fig. 5c) identifies the ampliconic regions as those where intrachromosomal identity, over stretches of 50 kb or more, generally exceeds 50%. Notably, 60% (6.1 Mb) of the ampliconic sequences exhibit intrachromosomal identities of 99.9% or greater.

Figure 5 Sequence similarities within the MSY.

A more spatially detailed representation of intrachromosomal similarities is shown in Fig. 5a, which records the locations of all MSY sequence pairs characterized by at least 65% identity within a sliding window of 2,000 nucleotides. After heterochromatic and LINE1 repeats have been accounted for, the MSY is seen to contain many long stretches of sequence that are similar to those elsewhere in the MSY. As shown in the inset to Fig. 5a, the triangular plot can be broken down into two smaller triangles (one representing sequence comparisons within Yp, the other depicting comparisons within Yq) and a rectangle depicting comparisons between Yp and Yq. Scrutiny of these Yp, Yq and Yp–Yq components of the plot reveals a wealth of sequence similarities within and between ampliconic segments on both arms of the chromosome.

The ampliconic sequences exhibit by far the highest density of genes, both coding and non-coding, among the three sequence classes in the MSY euchromatin (Figs 1 and 3). We identified nine distinct MSY-specific protein-coding gene families, with copy numbers ranging from two (VCY, XKRY, HSFY, PRY) to three (BPY2) to four (CDY, DAZ) to six (RBMY) to approximately 35 (TSPY) (Table 2 and Figs 2, 3). (These copy numbers pertain to the particular Y chromosome that we sequenced; they may vary in human populations.)
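As a quick consistency check, the copy numbers just listed can be tallied. The short sketch below simply sums the values quoted in the text; TSPY is given only as "approximately 35", so the total is approximate, and the variable names are illustrative.

```python
# Copy numbers as quoted in the text for the one Y chromosome sequenced.
# TSPY is stated only as "approximately 35", so the total is approximate.
family_copies = {
    "VCY": 2, "XKRY": 2, "HSFY": 2, "PRY": 2,
    "BPY2": 3,
    "CDY": 4, "DAZ": 4,
    "RBMY": 6,
    "TSPY": 35,
}

n_families = len(family_copies)
total_units = sum(family_copies.values())
print(f"{n_families} families, about {total_units} transcription units")
```

The tally gives nine families and about 60 units, in line with the figure of about 60 family members among the protein-coding units reported in the gene catalogue above.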
In aggregate, these nine coding families encompass roughly 60 transcription units. Furthermore, the ampliconic sequences include at least 75 other transcription units for which strong evidence of protein coding is lacking (Figs 2 and 3; see also Supplementary Table 2). Of these 75 putative non-coding transcription units, 65 are members of 15 MSY-specific families, and the remaining 10 occur in single copy. Considering together both coding and non-coding elements, the ampliconic sequences contain 135 of the 156 MSY transcription units identified so far. In contrast to the ubiquitous expression of most X-degenerate genes, the ampliconic genes and transcription units show highly restricted expression (Table 2). All nine protein-coding families in the ampliconic regions are expressed predominantly or exclusively in testes, as are most of the regions' non-coding transcription units. Among the three euchromatic sequence classes, the ampliconic sequences exhibit by far the lowest densities of LINE1 and total interspersed repeat elements (Fig. 1). Indeed, the interspersed repeat content of the MSY's ampliconic sequences (35%) is far below the mean for the human genome (44%; z-test yields P < 0.000001). Eight palindromes comprising 25% of MSY euchromatin The most pronounced structural features of the ampliconic regions of Yq are eight massive palindromes (Table 3). In the dot plot of Fig. 5a, the longer palindromes are visible as vertical blue lines that approach the baseline. An MSY map highlighting all eight palindromes is shown in Fig. 3a. In all eight palindromes, the arms are highly symmetrical, with arm-to-arm nucleotide identities of 99.94–99.997%. (By convention, these percentage identities refer only to nucleotide substitutions and do not take account of insertions and deletions by which palindrome arms differ.) The palindromes are long, their arms ranging from 9 kb to 1.45 Mb in length.
They are imperfect in that each contains a unique, non-duplicated spacer, 2–170 kb in length, at its centre. Palindrome P1 is particularly spectacular, having a span of 2.9 Mb, an arm-to-arm identity of 99.97%, and bearing two secondary palindromes (P1.1 and P1.2, each with a span of 24 kb) within its arms13. The eight palindromes collectively comprise 5.7 Mb, or one-quarter of the MSY euchromatin. Six of the eight palindromes carry recognized protein-coding genes, all of which seem to be expressed specifically in testes (Fig. 3b). In all known cases of genes on MSY palindromes, identical or nearly identical gene copies exist on opposite arms of the palindrome. Of the nine multi-copy, protein-coding gene families identified so far in the MSY, eight have members on palindromes. Indeed, six families are located exclusively in palindromes. These include the DAZ genes, which exist in four copies—two in palindrome P1 and two in P2—and the CDY genes, which also occur in four copies—two in P1 and two in P5 (Fig. 3b). In addition, the palindromes contain at least seven families of apparently non-coding transcription units, all expressed exclusively or predominantly in testes (Fig. 3e). In addition to the eight palindromes, the ampliconic regions of Yq and Yp contain five sets of more widely spaced inverted repeats with repeat lengths of 62–298 kb (Fig. 2; see also Supplementary Table 4). Three of these inverted repeat pairs (IR1, IR2 and IR3) exhibit nucleotide identities of 99.66–99.95%. Inversion of the IR3 repeats, both located on Yp, was probably a direct consequence of the molecular evolutionary event that cleaved the X-transposed sequences into two non-contiguous segments (Supplementary Fig. 5). Subsequent homologous recombination between inverted IR3 repeats was responsible, we suspect, for a 3.6-Mb inversion polymorphism observed on the short arm of the modern Y chromosome (Supplementary Figs 5 and 6)10. 
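The palindrome geometry and the identity convention described above (two inverted arms around a unique spacer, with identity counted over substitutions only) can be illustrated with a minimal Python sketch. It assumes the two arms have already been aligned, with '-' marking insertions and deletions; the function names and the toy sequences are hypothetical and are not taken from the study.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (gaps are preserved)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N", "-": "-"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def substitution_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage identity over aligned columns, ignoring indel columns.

    Columns where either sequence has a gap ('-') are skipped, so only
    nucleotide substitutions lower the score.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # indels are excluded by convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy palindrome: left arm, unique spacer, right arm (invented sequences).
left_arm = "ACGTACGTAC"
spacer = "TTT"
right_arm = reverse_complement("ACGTACGTAA")  # one substitution vs. left arm
palindrome = left_arm + spacer + right_arm

# Compare the left arm with the reverse complement of the right arm.
arm_identity = substitution_identity(left_arm, reverse_complement(right_arm))
```

In this toy case nine of the ten compared positions match. For real arms of up to 1.45 Mb, the alignment step (not shown) is what determines where the gap columns fall.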
Transcriptionally active tandem arrays In addition to palindromes and inverted repeats, the ampliconic regions of Yq and Yp contain a variety of long tandem arrays. Prominent among these are the newly identified NORF (no long open reading frame) clusters, which in aggregate account for about 622 kb on Yp and Yq, and the previously reported TSPY clusters, which comprise about 700 kb of Yp (Fig. 2). Triangular dot plots that highlight the regularities and relatively crisp borders of the NORF and TSPY arrays are shown in Supplementary Fig. 7 (see also Supplementary Note 5). The NORF arrays are based on a repeat unit of 2.48 kb. A consensus sequence for the repeat is readily identifiable (Supplementary File 2), but the sequence of individual repeat elements typically diverges from that consensus by 14–20%. The NORF arrays are so named because they harbour a great diversity of spliced but apparently non-coding transcription units, including the TTTY1, TTTY2, TTTY6, TTTY7, TTTY8, TTTY18, TTTY19, TTTY21 and TTTY22 families. Both strands of the NORF arrays are transcribed; 3' portions of the TTTY1 and TTTY2 transcripts are complementary (Supplementary Fig. 8). The TSPY arrays are based on a 20.4-kb repeat unit23 that encodes, on one strand, a previously identified protein, TSPY. A newly identified transcription unit, CYorf16, is found on the opposite strand; its protein coding potential remains to be tested. Approximately 35 copies of this repeat unit—and hence 35 TSPY genes and 35 CYorf16 transcription units—are found in a single, highly regular tandem array in proximal Yp (Fig. 2 and Supplementary Fig. 7d, e); here the sequences of individual repeat units rarely differ from the consensus by more than 1%. Furthermore, a single, isolated TSPY repeat unit, whose sequence diverges 3% from the consensus, is located more distally in Yp, embedded in the distal IR3 inverted repeat (Fig. 2).
The 35-unit TSPY cluster is the largest and most homogeneous protein-coding tandem array identified so far in the human genome. The evolution of the MSY On the basis of our present findings and previous studies, we propose a model of MSY evolution that addresses all three euchromatic sequence classes (Figs 6 and 7). In developing the model, we will offer an evolutionary map of the MSY (Fig. 8). We will then consider the two largest and most gene-rich sequence classes—X-degenerate and ampliconic—arguing that two opposed evolutionary dynamics have been at work: gene decay versus gene acquisition and conservation. Throughout, we will propose decisive roles for modulation of DNA recombination, both crossing over and gene conversion, in the evolution and on-going maintenance of the MSY (Fig. 9).

Figure 6 Molecular evolutionary pathways and processes that gave rise to genes in three MSY euchromatic sequence classes. X-degenerate genes and pseudogenes (yellow background) derived from an autosomal pair that was ancestral to both the X and Y chromosomes (and that was enlarged by subsequent fusion with other autosomes or autosomal segments50). X-transposed genes (pink background) derived from X-linked genes, which in turn derived from the ancestral autosomal pair.

Figure 7 Plot of Ks (Supplementary Table 5) versus X-linked gene order for 31 X–Y gene (or gene/pseudogene) pairs.

Figure 8 Evolutionary map of the MSY.

Figure 9 MSY sequences exhibiting 99.9% intrachromosomal identity probably undergo Y–Y gene conversion.

The human X and Y chromosomes are thought to have evolved from an ordinary pair of autosomes5, 24. Support for this hypothesis, and a proposed 300-million-year timeline for human sex chromosome evolution, have emerged from studies of modern X–Y gene pairs.
In this context, investigators have interpreted the X–Y gene pairs as surviving 'fossils' where extensive sequence identity between ancestral X and Y chromosomes once existed25, 26. Our present sequencing of the MSY euchromatin expands the catalogue of known X–Y gene pair fossils, providing opportunity to reexamine models developed in earlier studies. Evolutionary stratification of X–Y genes Lahn and Page previously studied the evolutionary ages of X–Y gene pairs, as measured by synonymous X–Y nucleotide divergence, or Ks (ref. 26). They reasoned that X–Y differentiation would have begun only after X–Y crossing over ceased. They observed a strong correlation between the age (Ks) of individual X–Y gene pairs and the locations of their X members on the human X chromosome. Among the 19 X–Y gene pairs studied, age increased in a stepwise fashion along the length of the X chromosome, in four 'evolutionary strata'. This suggested that at least four events had punctuated human sex chromosome evolution, with each event suppressing X–Y crossing over in one stratum without grossly disturbing gene order in the X chromosome. We re-analysed this published information and combined the results with Ks and map location data for 12 additional X–Y gene pairs, thus compiling data on 31 X–Y pairs in all (Supplementary Table 5). In each of 27 pairs, the Y member is an X-degenerate gene or pseudogene. The other four pairs include two in which the Y member is an X-transposed gene and two in which the Y members are ampliconic gene families. Among all X-degenerate pairs, and the two ampliconic pairs, the previously reported correlation between age (Ks) and X map position is readily apparent, with age increasing from the distal short arm to the long arm of the X chromosome (Fig. 7). Furthermore, as observed in the earlier study, the order of the homologous genes in the MSY appears to be scrambled with respect to Ks (Supplementary Fig. 9).
These observations, together with the earlier arguments of Lahn and Page, suggest three conclusions. First, all MSY genes and pseudogenes identified here as X-degenerate seem to be products of a single molecular evolutionary process: the region-by-region suppression of crossing over in ancestral autosomes, with subsequent differentiation of the Y from the X chromosome (Fig. 6). Second, at least two of the MSY's ampliconic gene families, VCY and RBMY, also originated in this manner, but subsequently acquired the characteristics of ampliconic sequences (Fig. 6; for independent evidence concerning RBMY see refs 27 and 28). Third, as previously hypothesized, inversions in the Y chromosome may have suppressed crossing over with the X chromosome. X-transposed genes as exceptions A very different evolutionary model accounts for the X-transposed genes, as confirmed by our Ks analysis. If, as hypothesized, these MSY genes are the result of a single, recent transposition from the X chromosome (Fig. 6), then the Ks values of the two X-transposed X–Y gene pairs should be similar to each other but much lower than the Ks values of the nearby (X-degenerate) pairs in the X-chromosome long arm. This prediction is met (Fig. 7). The two X-transposed X–Y gene pairs seem to be orders of magnitude younger than the ancient pairs (group 1 in Fig. 7) among which they are physically situated in the X chromosome. Blurred boundaries Our observations differ from those of Lahn and Page in that the boundaries between X–Y gene groups 2 and 3, and between groups 3 and 4, now seem less distinct (Fig. 7; compare with Fig. 2 in ref. 26). Whereas our present observations could be interpreted as evidence that suppression of X–Y crossing over evolved in more than four steps, such a conclusion would be premature. 
The apparent overlaps between groups could be artefacts of local errors in ordering X-linked genes, these regions not yet having been fully sequenced, or simply of large standard errors for some Ks estimates (Fig. 7). Some changes in local gene order in the X chromosome may also have occurred during its evolution. Another potentially confounding factor is X–Y gene conversion, which would depress Ks values and estimated ages for gene-converted X–Y pairs. Gene conversion depends on high sequence similarity, and thus one might expect any such effect to be greater among the younger X–Y pairs, in groups 3 and 4. Indeed, comparisons of X and Y genomic sequences suggest that the VCX/Y pair and 3' portions of the KAL1/P pair (both pairs in group 4) have engaged in extensive gene conversion (Supplementary Fig. 10), depressing their Ks values below those of the 5' portion of the KAL1/P pair and of other group 4 pairs (Fig. 7). A map of male-specific ages Having examined the evolutionary ages of all 31 X–Y gene pairs, we used them to anchor an evolutionary map of the modern human MSY. The map displays the male-specific ages of many sequence segments (Fig. 8). Here, male-specific age is the estimated number of years that have passed since sequences ancestral to that segment were incorporated into the MSY (having previously been autosomal, pseudoautosomal, or X-linked). We estimated the age of each gene or segment using Lahn and Page's methods that combined Ks analysis (Supplementary Table 5) with comparative gene mapping data from other mammals. The resulting estimated ages are graphed on a logarithmic scale to accommodate a range that extends from approximately 4 million years (the X-transposed sequences; the youngest known sequences in the MSY) to approximately 300 million years (SRY, the sex determinant and arguably the oldest gene in the MSY). As can be seen in Fig. 8, the MSY euchromatin is an elaborate patchwork of sequences of diverse male-specific ages.
The result of a single, recent transposition from the X chromosome, the MSY's X-transposed sequences are homogeneously youthful. The sequences of both the X-degenerate and ampliconic classes are much older, and they display a wide range of male-specific ages (Fig. 8). As we will argue, it is in comparing and contrasting these two chronologically diverse classes that the central themes of MSY evolution and function are revealed most clearly. Evolutionary dynamics of X-degenerate and ampliconic sequences To appreciate the evolutionary dynamics of these two sequence classes, we need to consider both their similarities and differences. In many senses, the X-degenerate and ampliconic sequences together dominate the euchromatic MSY. The X-degenerate and ampliconic classes are physically intermingled in the MSY, and they are comparably large, constituting, respectively, 38% and 45% of the MSY's euchromatic sequences (Fig. 1 and Supplementary Table 6). Together, these two sequence classes carry all but two of the MSY's 78 known protein-coding transcription units (Table 2). The X-degenerate and ampliconic classes display comparable diversities of male-specific ages, from tens to hundreds of millions of years (Fig. 8). This implies that X-degenerate and ampliconic sequences evolved in parallel, as parts of a single DNA molecule, for as much as 300 million years. Moreover, we infer that the X-degenerate and ampliconic sequences evolved under similar, unusual circumstances: both were transmitted exclusively through the male germ line, and neither participated in meiotic crossing over with a homologous counterpart. However, a number of marked structural and functional differences between these two sequence classes suggest that they followed different evolutionary trajectories. Palindromes are prevalent in ampliconic sequences. The density of transcription units is much higher and the density of interspersed repeats is much lower in ampliconic than in X-degenerate sequences (Fig.
1). The two sequence classes also diverge starkly with respect to gene-expression patterns. Most X-degenerate genes are expressed widely throughout the body, and many are probably involved in cellular housekeeping activities that are critical in both males and females. In contrast, most ampliconic genes are expressed predominantly or exclusively in testes, where they probably function in spermatogenesis. Decay in the absence of sexual recombination The X-degenerate sequences are adequately explained by the prevailing theory of sex chromosome evolution, which states that as the X and Y chromosomes evolved from an autosomal pair, the X chromosome maintained most of its ancestor's genes whereas the Y chromosome lost them5, 24-26. Our findings support the two major premises of this theory: the evolutionary genetic benefits of sexual recombination through meiotic crossing over, and the deleterious consequences of its absence. According to this theory, most ancestral genes remained functionally intact in the X chromosome, where the benefits of crossing over (in females) continued. In the Y chromosome, in contrast, the shutting down of X–Y crossing over during evolution triggered a monotonic decline in gene function. This model is corroborated by the presence, in the MSY's X-degenerate sequences, of decayed, intron-bearing pseudogenes of 13 different X-linked genes (Supplementary Table 3). Presumably, many hundreds of other X-homologous genes were deleted outright from the evolving MSY, leaving no trace in the DNA sequence of the modern human MSY. Seen in this light, the 16 protein-coding genes in the modern MSY's X-degenerate sequences (Table 2 and Fig. 3) appear as rare examples of persistence in the absence of sexual recombination. 
Acquisition and conservation of spermatogenic functions This evolutionary model of the Y chromosome as a decaying X chromosome, however, provides no explanation for central characteristics of the MSY's ampliconic sequences, including testis-specific gene expression, near-perfect palindromes, and an abundance of autosomal (as well as X-chromosomal) sequence similarities. To account for these characteristics, we propose that the MSY acquired, and evolved a means of conserving, genes that specifically enhanced male fertility. Unlike the X-degenerate sequences, all of which trace to the MSY's shared ancestry with the X chromosome, the ampliconic sequences evolved from a great variety of genomic sources, and by a diversity of molecular mechanisms (Fig. 6). As mentioned previously, the ampliconic genes VCY and RBMY were, similar to the X-degenerate genes, derived from common ancestors of the X and Y chromosomes27, 28. In contrast, the DAZ genes arose, during primate evolution, by transposition and subsequent amplification of an autosomal transcription unit, DAZL, which still exists on human chromosome 3 (ref. 29). Indeed, systematic analysis of MSY/autosome similarities suggests that a series of autosomal transpositions contributed to the MSY's ampliconic sequences during primate evolution (Fig. 8; see also ref. 13). Yet another molecular mechanism accounts for the CDY genes, which arose by retroposition (and subsequent amplification) of a processed messenger RNA derived from an autosomal gene 30. This retroposition event was previously thought to have occurred during primate evolution, but our present Ks analysis indicates a much older date, probably before the lineages of marsupials and placental mammals diverged (Fig. 8; see also Supplementary Table 5). 
Despite the wide variety of genomic sources and molecular evolutionary mechanisms that gave rise to the ampliconic genes, they all came to exist in the MSY in multiple, nearly identical copies, and they evolved remarkably uniform patterns of tissue expression. Indeed, detailed studies of several ampliconic gene families have revealed that they are expressed predominantly or exclusively in one cell lineage: the spermatogenic cells of the testis. What accounts for this convergence of evolutionary outcomes? The genesis of XY sex chromosomes during mammalian evolution, and specifically the emergence of a male-specific domain, created a genomic niche where selection could operate to enhance male germ-cell development. Amplification of the testis genes might have enhanced sperm production through high levels of expression. However, in a region devoid of crossing over, amplification might also have allowed another type of homologous recombination, gene conversion, to emerge as a means of conserving gene function. Abundant Y–Y gene conversion in ampliconic regions Gene conversion is the non-reciprocal transfer of sequence information from one DNA duplex to another31. This type of genetic recombination has been studied most extensively in fungi, where it was originally demonstrated to occur between chromosome homologues, or at lower frequency between sister chromatids, in meiosis. It was later shown that gene conversion could also occur between duplicated sequences on a single chromosome, and in mitosis32. Here we will argue that gene conversion (non-reciprocal recombination) is as frequent in the MSY as crossing over (reciprocal recombination) is in ordinary chromosomes. Specifically, two major findings provide evidence that gene conversion occurs routinely in 30% of the MSY euchromatin, including nearly all of the MSY's testis-specific gene families. 
The accompanying study7 reports the identification and sequencing of chimpanzee Y-linked orthologues of human MSY palindromes and establishes that gene conversion between palindrome arms has occurred in both the human and chimpanzee lineages, and has continued to occur in human populations. Here we report that these palindromes are representative of a large, discrete fraction of MSY sequences, all of which bear at least 99.9% identity to other MSY sequences. These findings suggest that the entire fraction is subject to frequent gene conversion. Above we described calculations of percentage nucleotide identity between all pairs of known MSY sequences. We defined and mapped the ampliconic regions by reporting, at each point along the length of the MSY euchromatin, the highest percentage identity to other MSY sequences (intrachromosomal similarity; Fig. 5). To view this data from another perspective, we electronically fractionated all MSY sequences according to intrachromosomal similarity. As seen in Fig. 9a, 30% of MSY euchromatic sequences display intrachromosomal identities of 99.9–100%. As intrachromosomal identity declines below 99.9%, the fractional representation of MSY sequences drops abruptly. Thus, the sequences displaying intrachromosomal identities of 99.9% represent a large and distinct subset of the MSY euchromatin. This 99.9% subset comprises the eight palindromes as well as large portions of the IR2 and IR3 inverted repeats described above (Figs 2 and 3). Indeed, nearly all of the 99.9% sequences exist as pairs in inverted orientation. Thus, the MSY palindromes in which gene conversion has been demonstrated 7 are typical and representative of the 99.9% fraction. We extrapolate that nearly all of the 99.9% fraction is engaged in gene conversion on a routine basis, resulting in a degree of identity among MSY's inverted sequence pairs that rivals that of two autosomal homologues, or alleles, chosen at random from the human population15, 33. 
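The "electronic fractionation" described above amounts to binning sequence by its best intrachromosomal identity and reporting the share of total length in each bin. The following Python sketch shows one way this could be done; the bin edges, the function name and the toy segment lengths are illustrative assumptions, not values or code from the study.

```python
from typing import Dict, List, Tuple


def fractionate_by_identity(
    segments: List[Tuple[int, float]],
    lower_edges: List[float],
) -> Dict[float, float]:
    """Share of total length falling into each identity bin.

    segments    -- (length_bp, best_identity_percent) for each segment
    lower_edges -- lower bounds of the bins; a segment is assigned to the
                   highest bin whose lower bound it meets or exceeds
    """
    total_length = sum(length for length, _ in segments)
    if total_length == 0:
        raise ValueError("no sequence to fractionate")
    fractions = {edge: 0.0 for edge in lower_edges}
    for length, identity in segments:
        eligible = [edge for edge in lower_edges if identity >= edge]
        if not eligible:
            continue  # below every bin; not counted in any fraction
        fractions[max(eligible)] += length / total_length
    return fractions


# Invented toy data: 10 kb of sequence in three segments.
toy_segments = [(3000, 99.95), (5000, 0.0), (2000, 85.0)]
toy_fractions = fractionate_by_identity(toy_segments, [0.0, 50.0, 99.9])
```

With these invented inputs, 30% of the toy sequence lands in the 99.9% bin. An abrupt drop between the top bin and the bin below it is the kind of pattern that marks a distinct high-identity subset.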
Two modes of productive recombination in the human Y chromosome Combined with previous discoveries in the pseudoautosomal regions, the present findings imply that two modes of homologous recombination occur regularly in the human Y chromosome. First, there is crossing over with the X chromosome in the pseudoautosomal regions (aggregate length 3.0 Mb) (Supplementary Note 6). Second, there is Y–Y gene conversion in the 99.9% regions (aggregate length 6.1 Mb) dispersed throughout the MSY (Fig. 9b)7. We refer to both routine modes of Y chromosome recombination as 'productive' to distinguish them from the relatively rare, aberrant recombination events (typically Y–Y or X–Y) that perturb sex differentiation or fertility and thereby diminish the reproductive fitness of affected individuals. Genetic mapping studies have shown that, typically, one X–Y crossover occurs per generation in the pseudoautosomal regions (Supplementary Note 7). As described in the accompanying report7, steady-state calculations suggest that, on average, multiple Y–Y gene conversion events take place per generation in the MSY. Thus, most homologous recombination events in the Y chromosome probably occur in the MSY. In recent years, we and other investigators have referred to the MSY as the NRY, or 'non-recombining region of the Y chromosome'. This usage reflected both awareness that productive X–Y crossing over did not occur in the MSY, and ignorance of the Y–Y gene conversion that is apparently commonplace there. We now refer to the NRY as the MSY, or 'male-specific region of the Y chromosome', because it is recombinogenic and unique to males. Gene conversion and the MSY's testis gene families Examination of the MSY's testis gene families provides additional insight into the potential biological significance of the 99.9% fraction and the gene conversion associated with it. 
Eight of the MSY's nine identified testis gene families have members in the palindromes or inverted repeats that comprise the 99.9% fraction just described. (The exceptional family is TSPY, most of whose members are found in a long tandem array.) Many of these family members are intact gene copies, but others are apparent pseudogenes with disrupted splice sites or reading frames. For each of the eight testis gene families, we counted the numbers of intact and pseudogene copies, both within and without the 99.9% fraction (Table 4). Whereas large numbers of pseudogenes are present both inside and outside the 99.9% fraction, the intact gene copies, 25 in all, are located exclusively in the 99.9% fraction. Thus, there is an evident association of intact testis genes with near-identical inverted sequence pairs that undergo gene conversion. What is the biological significance of this association? We envision two possibilities, which are not mutually exclusive. First, we note that in all cases examined so far, expression of these testis-specific gene families has been found to be limited to or most pronounced in cells of the spermatogenic lineage—in germ cells. Perhaps these near-identical sequence pairs are transcriptionally active in germ cells because there they generate cruciforms or other unusual chromatin configurations. Second, the occurrence of MSY gene pairs that are subject to frequent gene conversion might provide a mechanism for conserving gene functions across evolutionary time in the absence of crossing over. Implications for future studies We anticipate that the nucleotide sequence reported here, and the methods with which it was obtained, will find many applications in human biology and beyond. Comparisons with other human Y chromosomes The sequence of one man's MSY, as reported here, provides a point of departure for systematic, comprehensive characterization of MSY sequence variation in human populations. 
The MSY's unique characteristics—male specificity, no crossing over and abundant gene conversion— suggest that its sequence variation might differ markedly from that of ordinary human chromosomes. Already the availability of MSY sequence information in public databases has accelerated the emergence of MSY sequence variation as a powerful tool in reconstructing the patrilineal origins of modern human populations11, 34. Comparisons (or lack of) with other species Little is known about the DNA sequences of Y chromosomes in other animals or plants, and thus it is not possible at present to compare systematically the human MSY with that of any other species. Both the Drosophila and mouse Y chromosomes contain genes required for spermatogenesis, but meagre Y chromosome sequence data is available in either species. In Drosophila, the sequences of autosomes and the X chromosome were assembled from whole-genome shotgun data. Unfortunately, this shotgun analysis was insufficient to assemble much Y chromosome sequence35, 36, confirming prior suspicions that, in Drosophila as in humans, the Y chromosome poses special challenges. In the mouse, a draft sequence of the female genome is available37, but systematic efforts to sequence the male-specific region of the Y chromosome have yet to be initiated. If undertaken, Y chromosome sequencing projects in Drosophila, mouse and other species are likely to encounter special technical hurdles, but they are also likely to yield entirely unforeseen biological insights, as was the case here for the human MSY. The availability of human MSY sequence has already enabled new tests and rekindled debate of Haldane's hypothesis that mutations in the male germ line greatly outnumber those in the female germ line (Supplementary Note 8). This debate will surely be fuelled by sequencing of other primate and mammalian Y chromosomes. 
Methods for sequencing difficult genomic regions Our strategy of iterative mapping and sequencing was laborious but essential. Two faster, less costly strategies have been used recently in sequencing large genomes: whole-genome shotgun analysis15, 35 and sequencing a tiling path of mapped clones (ref. 14 and Supplementary Note 9). Neither of these sequencing strategies would have yielded a coherent picture of the MSY. This is especially true of the MSY's ampliconic regions, and most particularly the 30% of the MSY euchromatin (including the eight palindromes) exhibiting intrachromosomal similarities of 99.9%. Large amplicons like those described here are not unique to the MSY, but as in the MSY, they have proven to be formidable obstacles to whole-genome methods38, 39. The iterative mapping and sequencing strategy used here should be considered by genome scientists wishing to determine the structure and sequence of amplicon-rich regions of human autosomes, the X chromosome and other genomes. The medical relevance of the MSY Propelled by advances in MSY genomics, the biomedical significance of the MSY has begun to surface in recent years, with evidence of roles in such diverse processes as gonadal sex determination, skeletal growth, germ-cell tumorigenesis and graft rejection6. Two research areas that should benefit from the present MSY sequence and gene catalogue are of particular note. First, one of the most common chromosomal disorders of girls and women is Turner syndrome, classically associated with a 45,X (X0) karyotype. Haploinsufficiency of particular genes common to the X and Y chromosomes may be responsible for somatic features of the syndrome40-42. In most cases, the molecular identity of these Turner genes remains to be determined. One or more Turner genes are likely to be found within the catalogue of X-degenerate genes (and their X-linked homologues; see Table 2). 
A highly active area of MSY research explores spermatogenesis and the genetic basis of male infertility. MSY deletions have emerged as the most common of the known genetic causes of spermatogenic failure in human populations13, 43-46. The availability of MSY sequence has already begun to transform our understanding, enabling investigators to precisely define four distinct classes of recurrent MSY deletions causing spermatogenic failure, identify the MSY genes absent as a result of these deletions (typically members of testis-specific families), and demonstrate that most such deletions are the result of homologous recombination between near-identical amplicons13, 43-46. Thus, the ampliconic structures that may help preserve testis gene function across evolutionary time (through gene conversion) also put individuals at risk of spermatogenic failure (again, through homologous recombination). Genetic and biological differences between males and females It is commonly stated that the genomes of two randomly selected members of our species exhibit 99.9% nucleotide identity. In reality, this statement holds only if one is comparing two males, or two females. If one compares a female with a male, the second X chromosome (160 Mb, or roughly 3% of the diploid DNA content) is replaced by the largely dissimilar Y chromosome (60 Mb, or 1% of the diploid DNA content). This common substitution of the Y chromosome for the second X chromosome dwarfs all other DNA polymorphism in the human genome. In decades past, and with the important exception of X-linked recessive diseases, biologists often judged this genomic dimorphism to be of limited functional consequence, especially because of inactivation of the second X chromosome in females and the presumed paucity of genes in the Y chromosome. 
Now we must begin to reconsider this position, given the unanticipated number and variety of MSY genes, many of which are expressed throughout the body, and the fact that many X-linked genes are expressed from both X chromosomes in female cells47. The present sequence of the MSY, and the emerging sequence of the X chromosome, offer the near prospect of a comprehensive catalogue of genetic and sequence differences between human males and females. Translating this knowledge into an understanding of the myriad differences between the sexes in anatomy, physiology, cognition, behaviour and disease susceptibility presents a monumental challenge, but surely one of broad significance and interest. Methods Iterative mapping and sequencing The method of iterative mapping and sequencing used here has been described10, 13. All MSY BACs selected for sequencing were isolated from the RPCI-11 library48, with the exception of 11 clones (nine spanning the AZFa region12, and two used to narrow gaps10) from the CITB and CITC libraries. We made frequent use of publicly available BAC-end sequences as a source of markers during the final stages of map construction49. Two gaps were closed by long-range PCR; see Supplementary Fig. 11. Unfortunately, no cell line is available from the donor of the RPCI-11 BAC library. Thus, to confirm the large-scale organization of MSY sequences reported here, we PCR-amplified the inner and outer boundaries of all palindromes in ten men with genetically diverse Y chromosomes (PCR primers in Supplementary Table 8). We sequenced all resulting products. These experiments confirmed that each palindrome boundary is present in the great majority of human Y chromosomes. Intrachromosomal sequence similarity Analyses of intrachromosomal similarity were performed using custom Perl code. This code used BLAST (http://blast.wustl.edu) to compare all 5-kb sequence segments, in 2-kb steps, to the entire remainder of the MSY sequence.
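The windowing scheme described above can be illustrated with a short sketch. This is not the authors' Perl code: it only enumerates the coordinates of 5-kb segments taken in 2-kb steps, and it omits the BLAST comparison entirely. The function name `sliding_windows`, the 0-based half-open coordinates and the choice to drop a trailing partial window are all assumptions made for illustration.

```python
from typing import Iterator, Tuple


def sliding_windows(
    seq_len: int, window: int = 5_000, step: int = 2_000
) -> Iterator[Tuple[int, int]]:
    """Yield 0-based, half-open (start, end) coordinates of full-length windows.

    Assumption: a final window shorter than `window` is dropped rather than
    padded or shortened; the source text does not say how ends were handled.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = seq_len - window  # negative if the sequence is shorter than one window
    for start in range(0, max(last_start, -1) + 1, step):
        yield start, start + window


if __name__ == "__main__":
    # A hypothetical 12-kb sequence yields four overlapping 5-kb segments.
    for start, end in sliding_windows(12_000):
        print(start, end)
```

Each segment produced this way would then be used as a query against the remainder of the sequence, with the best hit recorded as that segment's intrachromosomal similarity.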
Interspersed repeats We electronically identified interspersed repeats with RepeatMasker (http://repeatmasker.genome.washington.edu). Homology to other chromosomes To identify sequence similarities to other human chromosomes, we conducted BLAST searches against GenBank databases with the sequence of each MSY clone. Interspersed repeats and low-complexity regions were masked using RepeatMasker. To experimentally verify the chromosomal origins of sequences similar to the MSY, we designed STSs from those sequences and assayed them against the NIGMS human/rodent somatic cell hybrid mapping panels 1 and 2 (NIGMS Human Genetic Cell Repository, http://locus.umdnj.edu/nigms/maps/mapping.html). Identification of new genes and transcription units We identified potential transcripts from three sources: (1) BLAST matches to cDNA sequences (EST or full length). We pursued matches where the cDNA sequence showed evidence of polyadenylation or splicing, or where there were multiple matching cDNA sequences. (2) BLAST matches to fragments of putative MSY transcripts that had been cloned by cDNA selection of testis cDNA against a flow-sorted, genomic Y-chromosome library19. (3) GenomeScan18 predictions in the NCBI annotation of Y-chromosome contigs. We then tested for transcription by RT–PCR as previously described13. Chromosomal FISH One- or two-colour FISH to human chromosomes was performed as previously described9. Calculation of Ks and Ka We calculated the numbers of synonymous substitutions per synonymous site (Ks) and of non-synonymous substitutions per non-synonymous site (Ka) as follows. We used FASTA (ftp://ftp.virginia.edu/pub/fasta) to align the pairs of coding sequences in Supplementary Table 5. For non-transcribed MSY pseudogenes, we used FASTA to align the genomic sequence of pseudogene exons to the corresponding transcribed coding sequence (Supplementary Table 5 and File 3). Then, as is standard practice, insertions/deletions were manually removed from the alignments. 
We calculated Ks and Ka for these alignments using the diverge function in the Wisconsin Package (Version 10.2, Genetics Computer Group).
Supplementary information accompanies this paper.
Received 7 March 2003; accepted 8 April 2003
References
1. Painter, T. S. The Y-chromosome in mammals. Science 53, 503-504 (1921)
2. Stern, C. The problem of complete Y-linkage in men. Am. J. Hum. Genet. 9, 147-166 (1957)
3. Jacobs, P. A. & Strong, J. A. A case of human intersexuality having a possible XXY sex determining mechanism. Nature 183, 302-303 (1959)
4. Ford, C. E., Miller, O. J., Polani, P. E., de Almeida, J. C. & Briggs, J. H. A sex-chromosome anomaly in a case of gonadal dysgenesis (Turner's syndrome). Lancet 1, 711-713 (1959)
5. Ohno, S. Sex Chromosomes and Sex-linked Genes (Springer, Berlin, 1967)
6. Vogt, P. H. et al. Report of the third international workshop on Y chromosome mapping 1997. Cytogenet. Cell Genet. 79, 1-20 (1997)
7. Rozen, S. et al. Abundant gene conversion between arms of palindromes in human and ape Y chromosomes. Nature 423, 873-876 (2003)
8. Foote, S., Vollrath, D., Hilton, A. & Page, D. C. The human Y chromosome: Overlapping DNA clones spanning the euchromatic region. Science 258, 60-66 (1992)
9. Saxena, R. et al. Four DAZ genes in two clusters found in AZFc region of human Y chromosome. Genomics 67, 256-267 (2000)
10. Tilford, C. et al. A physical map of the human Y chromosome. Nature 409, 943-945 (2001)
11. Shen, P. et al. Population genetic implications from sequence variation in four Y chromosome genes. Proc. Natl Acad. Sci. USA 97, 7354-7359 (2000)
12. Sun, C. et al. An azoospermic man with a de novo point mutation in the Y-chromosomal gene USP9Y. Nature Genet.
23, 429-432 (1999)
13. Kuroda-Kawaguchi, T. et al. The AZFc region of the Y chromosome features massive palindromes and uniform recurrent deletions in infertile men. Nature Genet. 29, 279-286 (2001)
14. Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860-921 (2001)
15. Venter, J. C. et al. The sequence of the human genome. Science 291, 1304-1351 (2001)
16. Schueler, M. G., Higgins, A. W., Rudd, M. K., Gustashaw, K. & Willard, H. F. Genomic and genetic definition of a functional human centromere. Science 294, 109-115 (2001)
17. Tyler-Smith, C. et al. Localization of DNA sequences required for human centromere function through an analysis of rearranged Y chromosomes. Nature Genet. 5, 368-375 (1993)
18. Yeh, R. F., Lim, L. P. & Burge, C. B. Computational inference of homologous gene structures in the human genome. Genome Res. 11, 803-816 (2001)
19. Lahn, B. T. & Page, D. C. Functional coherence of the human Y chromosome. Science 278, 675-680 (1997)
20. Page, D. C., Harper, M. E., Love, J. & Botstein, D. Occurrence of a transposition from the X-chromosome long arm to the Y-chromosome short arm during human evolution. Nature 311, 119-123 (1984)
21. Mumm, S., Molini, B., Terrell, J., Srivastava, A. & Schlessinger, D. Evolutionary features of the 4-Mb Xq21.3 XY homology region revealed by a map at 60-kb resolution. Genome Res. 7, 307-314 (1997)
22. Schwartz, A. et al. Reconstructing hominid Y evolution: X-homologous block, created by X-Y transposition, was disrupted by Yp inversion through LINE-LINE recombination. Hum. Mol. Genet. 7, 1-11 (1998)
23. Tyler-Smith, C., Taylor, L. & Muller, U.
Structure of a hypervariable tandemly repeated DNA sequence on the short arm of the human Y chromosome. J. Mol. Biol. 203, 837-848 (1988)
24. Graves, J. A. & Schmidt, M. M. Mammalian sex chromosomes: Design or accident? Curr. Opin. Genet. Dev. 2, 890-901 (1992)
25. Jegalian, K. & Page, D. C. A proposed path by which genes common to mammalian X and Y chromosomes evolve to become X inactivated. Nature 394, 776-780 (1998)
26. Lahn, B. T. & Page, D. C. Four evolutionary strata on the human X chromosome. Science 286, 964-967 (1999)
27. Delbridge, M. L., Lingenfelter, P. A., Disteche, C. M. & Graves, J. A. M. The candidate spermatogenesis gene RBMY has a homologue on the human X chromosome. Nature Genet. 22, 223-224 (1999)
28. Mazeyrat, S., Saut, N., Mattei, M. G. & Mitchell, M. J. RBMY evolved on the Y chromosome from a ubiquitously transcribed X-Y identical gene. Nature Genet. 22, 224-226 (1999)
29. Saxena, R. et al. The DAZ gene cluster on the human Y chromosome arose from an autosomal gene that was transposed, repeatedly amplified and pruned. Nature Genet. 14, 292-299 (1996)
30. Lahn, B. T. & Page, D. C. Retroposition of autosomal mRNA yielded testis-specific gene family on human Y chromosome. Nature Genet. 21, 429-433 (1999)
31. Szostak, J. W., Orr-Weaver, T. L., Rothstein, R. J. & Stahl, F. W. The double-strand-break repair model for recombination. Cell 33, 25-35 (1983)
32. Jackson, J. A. & Fink, G. R. Gene conversion between duplicated genetic elements in yeast. Nature 292, 306-311 (1981)
33. The International SNP Map Working Group. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928-933 (2001)
34. Underhill, P. A. et al.
Y chromosome sequence variation and the history of human populations. Nature Genet. 26, 358-361 (2000)
35. Adams, M. et al. The genome sequence of Drosophila melanogaster. Science 287, 2185-2195 (2000)
36. Carvalho, A. B., Dobo, B. A., Vibranovski, M. D. & Clark, A. G. Identification of five new genes on the Y chromosome of Drosophila melanogaster. Proc. Natl Acad. Sci. USA 98, 13225-13230 (2001)
37. Waterston, R. H. et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520-562 (2002)
38. Bailey, J. A. et al. Recent segmental duplications in the human genome. Science 297, 1003-1007 (2002)
39. Stankiewicz, P. & Lupski, J. R. Genome architecture, rearrangements and genomic disorders. Trends Genet. 18, 74-82 (2002)
40. Ferguson-Smith, M. A. Karyotype-phenotype correlations in gonadal dysgenesis and their bearing on the pathogenesis of malformations. J. Med. Genet. 2, 142-155 (1965)
41. Zinn, A. R., Page, D. C. & Fisher, E. M. C. Turner syndrome: The case of the missing sex chromosome. Trends Genet. 9, 90-93 (1993)
42. Rao, E. et al. Pseudoautosomal deletions encompassing a novel homeobox gene cause growth failure in idiopathic short stature and Turner syndrome. Nature Genet. 16, 54-63 (1997)
43. Sun, C. et al. Deletion of azoospermia factor a (AZFa) region of human Y chromosome caused by recombination between HERV15 proviruses. Hum. Mol. Genet. 9, 2291-2296 (2000)
44. Blanco, P. et al. Divergent outcomes of intrachromosomal recombination on the human Y chromosome: male infertility and recurrent polymorphism. J. Med. Genet. 37, 752-758 (2000)
45. Kamp, C., Hirschmann, P., Voss, H., Huellen, K. & Vogt, P. H.
Two long homologous retroviral sequence blocks in proximal Yq11 cause AZFa microdeletions as a result of intrachromosomal recombination events. Hum. Mol. Genet. 9, 2563-2572 (2000)
46. Repping, S. et al. Recombination between palindromes P5 and P1 on the human Y chromosome causes massive deletions and spermatogenic failure. Am. J. Hum. Genet. 71, 906-922 (2002)
47. Carrel, L., Cottle, A. A., Goglin, K. C. & Willard, H. F. A first-generation X-inactivation profile of the human X chromosome. Proc. Natl Acad. Sci. USA 96, 14440-14444 (1999)
48. Osoegawa, K. et al. A bacterial artificial chromosome library for sequencing the complete human genome. Genome Res. 11, 483-496 (2001)
49. Zhao, S. et al. Human BAC ends quality assessment and sequence analyses. Genomics 63, 321-332 (2000)
50. Watson, J. M., Spencer, J. A., Riggs, A. D. & Graves, J. A. Sex chromosome evolution: Platypus gene mapping suggests that part of the human X chromosome was originally autosomal. Proc. Natl Acad. Sci. USA 88, 11256-11260 (1991)
Acknowledgements. We thank R. Giardine, R. Oates and S. Silber for patient samples; and J. Alfoldi, C. Disteche, J. Koubova and J. Lange for comments on the manuscript. This work was supported by the National Institutes of Health and the Howard Hughes Medical Institute.
Competing interests statement. The authors declare that they have no competing financial interests.
Figure 1 The male-specific region of the Y chromosome. a, Schematic representation of the whole chromosome, including the pseudoautosomal and heterochromatic regions. b, Enlarged view of a 24-Mb portion of the MSY, extending from the proximal boundary of the Yp pseudoautosomal region to the proximal boundary of the large heterochromatic region of Yq.
Shown are three classes of euchromatic sequences, as well as heterochromatic sequences. A 1-Mb bar indicates the scale of the diagram. c, d, Gene, pseudogene and interspersed repeat content of three euchromatic sequence classes. c, Densities (numbers per Mb) of coding genes, non-coding transcription units, total transcription units and pseudogenes. d, Percentages of nucleotides contained in Alu, retroviral, LINE1 and total interspersed repeats. The data shown in c and d are available in numerical form in Supplementary Tables 6 and 7. Supplementary Table 6 also provides information about the size and (G + C) content of each sequence class. Figure 2 Sequence-based map of the MSY; a detailed view of the 24-Mb region shown in Fig. 1b. Backgroun indicate the three classes of MSY euchromatic sequences: X-transposed (pink), X-degenerate (yellow) and am (blue), as well as heterochromatic (red stripes) and pseudoautosomal (green) sequences and NORF arrays (gre Two gaps in the sequence are indicated at the top edge of the diagram. a, Eight primary palindromes (P1–P8) secondary palindromes (P1.1 and P1.2). Diverging black arrows mark the left and right arms of each palindrom between diverging arrows represent non-palindromic spacers at the centres of these structures. b, Near-perfect repeats (non-palindromic), three in all (IR1 to IR3; Supplementary Table 4). In each case, the left and right arm >99.5% nucleotide identity. c, Other inverted repeats (non-palindromic). Grey arrows (IR4) denote two region identity, one on Yp and one on Yq. Yellow arrows (IR5) denote four regions of >92% identity, all on Yq. d, D any of the four indicated regions—AZFa, P5/proximal P1 (AZFb), AZFc, or P5/distal P1—cause spermatogen 46 . e, Previously reported genes and new, experimentally verified transcription units for which cDNA sequenci protein-coding potential (Table 2). Plus (+) and minus (-) strand are indicated by the top or bottom row, respec Fig. 5c. g, Scale (Mb). 
h, Sequences whose transcription has been verified (in this or previous studies) but for little or no evidence of protein-coding potential (Supplementary Table 2). i, Previously reported pseudogenes apparently non-transcribed homologues of known coding genes (Supplementary Table 3). j, (G + C) content ( in a 100-kb sliding window with 1-kb steps. k, Alu, LINE1 and human endogenous retroviral (HERV) repeat expressed as percentage of nucleotides, calculated in a 200-kb sliding window with 1-kb steps. l, 220 BAC clo completely or partially sequenced. Each bar represents size and position of one BAC clone, identified by the n portion of its GenBank accession number (in each case beginning with the prefix AC). Black bars represent fin sequences deposited in GenBank, where finished sequences are trimmed to retain only 200 bp of overlap with BACs. Grey bars represent the 'trimmings' of those BACs, not deposited in GenBank. Striped bars represent B sequence has not been finished but has been deposited in GenBank. See Supplementary Fig. 2 for a more detai this figure. The composite sequence of the 24-Mb region studied is available as Supplementary File 1. Figure 3 MSY genes, transcription units and palindromes. a, Triangles denote sizes and locations of arms of e repeats (whose arms exhibit 99.95% identity). Gaps between opposed triangles represent the non-duplicated 's schematic, as in Fig. 1b. c, Nine families of protein-coding genes. Solid triangles denote apparently intact gene are not shown. d, Single-copy protein-coding genes. e, Single-copy transcription units. These give rise to splic Fifteen families of transcription units. g, Merged map of all genes and transcription units. Figure 4 Three sequence classes in the MSY euchromatin. Colour scheme as in Fig. 2. Figure 5 Sequence similarities within the MSY. a, Triangular dot plot in which the MSY's sequence is compa the plot, each dot represents a match of >65% within a window of 2,000 nucleotides. 
Green dots represent mat between LINE1 elements; red dots represent matches between heterochromatic sequences; blue dots represent other sequences. Direct repeats appear as horizontal lines, inverted repeats as vertical lines, and palindromes a nearly intersect the baseline. Long arrays of tandem repeats appear as pyramids. The inset indicates that the la contains two smaller triangles (one revealing sequence similarities within Yp, and one revealing similarities w rectangle (revealing similarities between Yp and Yq). b, MSY schematic, as in Fig. 1b. c, Plot of intrachromos similarity, which serves to identify ampliconic sequences (blue). Using a 50-kb sliding window and 1-kb steps euchromatic sequence was compared to all other available MSY euchromatic sequences. (Long interspersed re before analysis.) At each point along the length of the MSY, the highest sequence similarity (expressed as per identity) was identified. All such values >50% are shown. An expanded version of this plot is shown in Fig. 2f Figure 6 Molecular evolutionary pathways and processes that gave rise to genes in three MSY euchromatic sequence classes. X-degenerate genes and pseudogenes (yellow background) derived from an autosomal pair that was ancestral to both the X and Y chromosomes (and that was enlarged by subsequent fusion with other autosomes or autosomal segments50). X-transposed genes (pink background) derived from X-linked genes, which in turn derived from the ancestral autosomal pair. Ampliconic genes (blue background) were derived through three converging processes: amplification of X-degenerate genes (for example, RBMY, VCY); transposition and amplification of autosomal genes (DAZ); and retroposition and amplification of autosomal genes (CDY). Boxes enumerate dominant themes in X-degenerate (yellow) and ampliconic (blue) gene evolution. 
The asterisk indicates that Y–Y gene conversion is apparently common in the 61% of ampliconic sequences that exhibit intrachromosomal identities of 99.9%. Figure 7 Plot of Ks (Supplementary Table 5) versus X-linked gene order for 31 X–Y gene (or gene/pseudogene) pairs. Colour highlighting of X-linked gene names indicates whether Y homologues are X-degenerate (yellow), ampliconic (blue) or X-transposed (pink). Within the plot, four yellow rectangles denote four previously defined 'evolutionary strata', or groups of genes26; a small pink rectangle highlights two X-transposed genes. Genes in the X chromosome are ordered according to the NCBI sequence assembly of November 2002; distances between genes are not drawn to scale. Standard errors for Ks values are shown. Figure 8 Evolutionary map of the MSY. At the bottom is an MSY schematic, as in Fig. 1b. Coloured rectangl schematic depict the estimated male-specific ages of the corresponding segments of the modern MSY. These a logarithmic scale (a). b, X–Y strata 1, 2, 3 and 4 (ref. 26 and Fig. 7). c, The chromosomes (more properly, the of the chromosomes) from which the indicated X-transposed or ampliconic sequences apparently arose throug evolution. d, MSY genes that apparently arose at the indicated times. e, Approximate times of divergence betw other vertebrate lineages. The methods used to estimate the male-specific ages of each of the sequences and ge Supplementary Table 9. Figure 9 MSY sequences exhibiting 99.9% intrachromosomal identity probably undergo Y–Y gene conversion. a, Electronic fractionation of MSY euchromatic sequences according to intrachromosomal similarity (per cent identity to other MSY sequences), plotted on a logarithmic scale. Values <70% are not shown. b, Sites of productive recombination in the Y chromosome. Shown at the top is a schematic representation of the entire Y chromosome, including the pseudoautosomal regions (green). 
The pseudoautosomal regions are sites of frequent X–Y crossing over. Within the MSY's ampliconic sequences are many sites of apparently frequent Y–Y gene conversion; all of these sites display intrachromosomal identities of 99.9%.

Nature 423, 873-876 (19 June 2003); doi:10.1038/nature01723
Abundant gene conversion between arms of palindromes in human and ape Y chromosomes
STEVE ROZEN*, HELEN SKALETSKY*, JANET D. MARSZALEK*, PATRICK J. MINX†, HOLLAND S. CORDUM†, ROBERT H. WATERSTON†, RICHARD K. WILSON† & DAVID C. PAGE*
* Howard Hughes Medical Institute, Whitehead Institute, and Department of Biology, Massachusetts Institute of Technology, 9 Cambridge Center, Cambridge, Massachusetts 02142, USA
† Genome Sequencing Center, Washington University School of Medicine, 4444 Forest Park Boulevard, St Louis, Missouri 63108, USA
Correspondence and requests for materials should be addressed to D.C.P. (page_admin@wi.mit.edu). All new DNA sequences and STSs were submitted to GenBank with accession numbers AC139189–AC139194 (chimpanzee BACs), AY090860–AY090881 (palindrome boundary sequences in apes), and G73582–G73595 (STS for amplifying palindrome boundaries); see Supplementary Information for details.

Eight palindromes comprise one-quarter of the euchromatic DNA of the male-specific region of the human Y chromosome, the MSY (ref. 1). They contain many testis-specific genes and typically exhibit 99.97% intra-palindromic (arm-to-arm) sequence identity1. This high degree of identity could be interpreted as evidence that the palindromes arose through duplication events that occurred about 100,000 years ago. Using comparative sequencing in great apes, we demonstrate here that at least six of these MSY palindromes predate the divergence of the human and chimpanzee lineages, which occurred about 5 million years ago. The arms of these palindromes must have subsequently engaged in gene conversion, driving the paired arms to evolve in concert.
Indeed, analysis of MSY palindrome sequence variation in existing human populations provides evidence of recurrent arm-to-arm gene conversion in our species. We conclude that during recent evolution, an average of approximately 600 nucleotides per newborn male have undergone Y–Y gene conversion, which has had an important role in the evolution of multi-copy testis gene families in the MSY. The human MSY palindromes, designated P1–P8, are surprisingly large, with arm lengths that range from 9 kilobases (kb; P7) to 1.45 megabases (Mb; P1) (see Table 2 and Figs 2, 3 and 5 of the accompanying manuscript1). The paired arms of each palindrome are separated by a non-duplicated spacer that measures 2–170 kb in length. Fifteen gene and transcript families have been identified in the palindrome arms (none in the spacers), and all seem to be expressed predominantly or exclusively in testes1. Similar to the palindrome arms in which they reside, these gene families are characterized by extremely low sequence divergence between the copies found in a single Y chromosome. The DAZ gene family of the MSY resides exclusively in the arms of palindromes P1 and P2 (ref. 2). Near identity between DAZ copies in a single Y chromosome led some investigators to conclude, based on molecular clock reasoning, that DAZ gene amplification had occurred only within the last 200,000 years3. However, multiple Y-linked copies of DAZ also exist in apes and Old World monkeys3-6. This suggests that palindromes P1 and P2, which contain the DAZ genes, might predate the divergence of humans from other primate lineages. This may be true for the other MSY palindromes as well. In that case, the near identity observed between palindrome arms could be the consequence of gene conversion: "the non-reciprocal transfer of information from one DNA duplex to another"7. Gene conversion sometimes involves transfer between repeated sequences on the same chromosome8.
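The molecular-clock reasoning referred to above can be made concrete with a small sketch. It rests on a deliberately naive assumption, the one this paper goes on to challenge: that two copies have accumulated mutations independently since a single duplication event, so that their divergence is d = 2µt. The mutation rate and arm-to-arm divergence are the values given later in this paper; the function name is illustrative only.

```python
def naive_duplication_age(divergence: float, mutation_rate: float) -> float:
    """Years since duplication, assuming both copies diverged independently (d = 2 * mu * t).

    This ignores gene conversion, which is exactly why the estimate can be
    misleadingly young for sequences that are homogenized over time.
    """
    return divergence / (2.0 * mutation_rate)


MU = 1.6e-9    # substitutions per nucleotide per year (human MSY, see Methods)
D_ARMS = 3e-4  # arm-to-arm divergence, i.e. 99.97% identity

age_years = naive_duplication_age(D_ARMS, MU)
print(f"Naive molecular-clock age: {age_years:,.0f} years")
```

Under these assumptions the estimate is on the order of 10^5 years, in line with the figure of about 100,000 years quoted for the palindromes as a whole.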
To test the ancient origins/gene-conversion hypothesis, we looked for evidence that MSY palindromes were present in the common ancestor of humans and chimpanzees. Specifically, we searched for orthologues of the eight human palindromes in chimpanzees (Pan troglodytes), bonobos (pygmy chimpanzee, Pan paniscus) and gorillas (Gorilla gorilla). In each species, and for each palindrome, we attempted to amplify, by polymerase chain reaction (PCR), and sequence the two inner boundaries (between spacer and arms) and the two outer boundaries (between arms and surrounding sequences). We successfully amplified both inner boundaries in multiple palindromes (Table 1). In all of these cases, the PCR products were observed only when male genomic DNAs were used as templates, and never when using female genomic DNAs (data not shown). This implies that the PCR products were amplified from the male-specific regions of the great ape Y chromosomes. In all cases, the boundary sequences were essentially identical in humans and great apes (Fig. 1a; see also Supplementary Information). Only for P7 did we successfully amplify both outer boundaries (in chimpanzee and bonobo). These findings suggested that: (1) most palindromes found in the modern human MSY were already present, in the MSY, in the common ancestor of humans and chimpanzees; and (2) inner boundaries are more highly conserved than outer boundaries.
Figure 1 Sequence comparison of human and ape MSY palindromes.
To enable detailed comparisons of human and chimpanzee palindromes, we screened a male chimpanzee genomic bacterial artificial chromosome (BAC) library for clones homologous to the inner boundaries of human palindromes P1–P8. We identified and then sequenced chimpanzee BACs corresponding to palindromes P1, P2, P6 and P7.
(The BAC library provided only one- to twofold coverage, on average, of chimpanzee MSY sequences and thus was not expected to contain all boundaries of MSY palindromes.) Comparative sequence analysis confirmed the structural similarity of the human and chimpanzee palindromes and, by inference, their common ancestry (Fig. 1b; see Supplementary Information for complete sequence alignments). We observed 1.44% sequence divergence, on average, between orthologous palindrome arms in human and chimpanzee (Fig. 1b and Table 2). Such divergence between species probably reflects the simple accumulation of neutral mutations in the human and chimpanzee lineages after their separation. However, within each of the chimpanzee palindromes studied, we observed markedly little arm-to-arm divergence: 0.028%, on average, which is statistically indistinguishable from the 0.021% arm-to-arm divergence observed in the human MSY palindromes (Table 2; see also Supplementary Table 7). We conclude that the MSY palindromes predated separation of the human and chimpanzee lineages, and that, in both the human and chimpanzee lineages, the paired arms of the palindromes evolved in concert. If gene conversion between palindrome arms was responsible for our findings, it might leave traces in the recent genealogy of the human MSY. In particular, we might find evidence that single nucleotide differences between the two arms of a human MSY palindrome had been eliminated by gene conversion. Examination of two CDY genes— one in each arm of palindrome P1—revealed a duplicated site of sequence variation that fulfilled this prediction. By sequencing this duplicated site in diverse, unrelated men, we identified some Y chromosomes with a C at this site in both arms of P1 (C/C chromosomes), other chromosomes with a C in one arm and a T in the second arm (C/T chromosomes), and other chromosomes with a T in both arms (T/T chromosomes; Fig. 2a). 
We confirmed these findings using a PCR/restriction-digestion assay (Supplementary Fig. 4). This single nucleotide substitution occurs at nucleotide 381 of the CDY coding region but does not alter the predicted amino acid sequence.
Figure 2 Site in CDY1 showing evidence of multiple independent gene conversion events.
We then typed this nucleotide variant in 171 unrelated men chosen to represent the great diversity of Y chromosomes that other investigators have discovered in human populations. Specifically, these 171 Y chromosomes represented 42 distinct branches of a robust tree of human Y chromosome genealogy (ref. 9; Supplementary Fig. 1). In this sampling of the MSY genealogical tree, C/T chromosomes and T/T chromosomes were confined to a young cluster of five closely related branches (Fig. 2b; see also Supplementary Fig. 1). In the 37 other tested branches, only C/C chromosomes were observed. This distribution (Fig. 2b) suggested that the chromosome immediately ancestral to the five-branch cluster was C/T, and that this chromosome had arisen (from a C/C chromosome) by a C→T substitution in one arm of palindrome P1. In three of this cluster's five branches, we observed T/T as well as C/T chromosomes (Fig. 2; see also Supplementary Fig. 1). This finding is readily explained by gene conversion in a C/T chromosome (the ancestral chromosome for this cluster) replacing the C in one arm of palindrome P1 with the T in the other arm. The data reveal at least three such gene-conversion events, one in each of the branches that have T/T chromosomes (Fig. 2b). In one of these branches, we also observed C/C chromosomes alongside C/T and T/T chromosomes (Fig. 2b). Here we surmise that gene conversion in a C/T chromosome replaced the T in one arm of P1 with the C in the other arm. Thus, during recent human history, gene conversion in C/T chromosomes has used either the C copy or the T copy as template.
In addition, we investigated two other duplicated sites of sequence variation, and at both sites we found evidence of recurrent gene conversion during recent human history (Supplementary Figs 2 and 3). How frequently does gene conversion occur in the MSY palindromes? Near uniformity of arm-to-arm sequence divergence in both human and chimpanzee palindromes (Table 2 in ref. 1 and Fig. 1b) suggests a steady-state balance between new mutations that create differences between arms, and gene-conversion events that erase these differences. Accordingly, we can calculate the rate of gene conversion needed to maintain the observed divergence in the face of new mutations. Let µ be the human MSY mutation rate, 1.6 × 10⁻⁹ substitutions per nucleotide per year (see Methods). Let d be the observed divergence between human MSY palindrome arms (3 × 10⁻⁴ substitutions per duplicated nucleotide), and let c be the (unknown) rate of gene conversion (in both directions combined) per duplicated nucleotide per year. Differences between arms are introduced at a rate of 2µ (as a mutation in either arm creates a difference between arms), and homogenized at a rate of cd. Thus, at steady state, cd = 2µ. Then c = 2µ/d = (2 × 1.6 × 10⁻⁹)/(3 × 10⁻⁴) = 1.1 × 10⁻⁵ gene conversions per duplicated nucleotide per year. For a 20-year human generation, this corresponds to a rate of 2.2 × 10⁻⁴ conversions per duplicated nucleotide per generation, comparable to rates estimated directly in a mouse transgenic system10. Over the 5.4 Mb in human MSY palindromes (2.7 × 10⁶ duplicated nucleotides), then, an average of about 600 duplicated nucleotides have undergone arm-to-arm gene conversion for every son born in recent human evolution. Most of these conversions would have involved two identical DNA sequences, and thus their products would be unobservable.
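The steady-state calculation above can be reproduced with a few lines of arithmetic. The sketch below uses only the values stated in the text; the variable names are illustrative, and the small differences from the quoted figures (1.1 × 10⁻⁵, 2.2 × 10⁻⁴, about 600) arise because the text rounds intermediate results.

```python
# Values taken from the text; names are illustrative.
MU = 1.6e-9            # mutation rate: substitutions per nucleotide per year
D = 3e-4               # observed arm-to-arm divergence per duplicated nucleotide
GENERATION_YEARS = 20  # assumed human generation time used in the text
DUPLICATED_NT = 2.7e6  # duplicated nucleotides (5.4 Mb of palindrome / 2 arms)

# Steady state: differences arise at rate 2*mu and are erased at rate c*d,
# so c*d = 2*mu and therefore c = 2*mu / d.
c_per_year = 2 * MU / D
c_per_generation = c_per_year * GENERATION_YEARS
converted_nt_per_generation = c_per_generation * DUPLICATED_NT

print(f"c per year:        {c_per_year:.2e}")
print(f"c per generation:  {c_per_generation:.2e}")
print(f"converted nt/gen:  {converted_nt_per_generation:.0f}")
```

Carrying full precision through gives a value somewhat below 600 converted nucleotides per generation, which matches the text's "about 600" to one significant figure.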
The inferred kinetics of gene conversion in MSY palindromes is especially striking because the MSY was previously viewed as recombinationally inert under normal circumstances: it was known previously as the non-recombining region, or NRY. At present, we do not know whether gene conversion in MSY palindromes occurs during meiosis, mitosis, or both. It may involve homology-directed double-strand break repair, as in gene conversion between homologous chromosomes or sister chromatids11. An interesting observation is that human–chimpanzee divergence is significantly reduced in MSY palindrome arms as compared with other MSY sequences examined (Table 2). This reduction is evident even when comparing Alu and other interspersed repeat sequences that are presumed to be of little functional consequence (Supplementary Table 1). Thus, the reduced rate of evolution in palindrome arms does not seem to be due to selective constraints. A weak directional bias in gene conversion, favouring restoration of the original sequence, might account for these observations. Our finding of abundant gene conversion in MSY palindromes raises questions about the molecular-clock dating of other segmental duplications in the human genome12. Some of these were interpreted as being of recent origin based on low copy-to-copy divergence13. In other cases, however, analysis by Southern blots14, 15 or quantitative PCR16 indicated that these duplications exist in great apes as well as in humans. Thus, these duplications might well represent conserved genomic organizations subject to gene conversion and concerted evolution. In the case of human X-chromosomal colour vision genes, 2 kb of comparative sequence data confirm concerted evolution 17, 18. Our current findings, taken together with these previous results, raise the possibility that gene conversion in primate genomes could be much more pervasive than previously thought. 
Finally, we note a strong association between gene conversion and MSY testis genes. In humans, all genes in MSY palindromes seem to be expressed predominantly or exclusively in testes, and most MSY genes with this expression pattern occur in palindromes1. Given the abundance of gene conversion in palindromes, we infer that Y–Y gene conversion has accompanied and shaped the evolution of multi-copy testis gene families in the MSY. Perhaps some selective advantage stemmed from the palindromic duplication of MSY testis genes during human evolution. If so, has Y–Y gene conversion had a role in that advantage? Has it allowed genes in palindromes to resist, or at least retard, the evolutionary decay that is a hallmark of Y chromosome evolution19? This could explain the observation, as reported in the accompanying paper, that intact testis-specific genes tend to be located in palindrome arms whereas non-functional copies of these genes seem to be distributed randomly (see Table 4 in ref. 1). A full understanding of the functional and evolutionary significance of our findings will require further study in primates and other mammals.
Methods
Estimating the MSY mutation rate
We estimated the MSY mutation rate in the human lineage based on the data and analysis in ref. 20, and an estimate of 5.5 million years ago for the most recent common ancestor of humans and chimpanzees21. The result is $1.6 \times 10^{-9}$ substitutions per nucleotide per year (Supplementary Fig. 5).
PCR amplification and sequencing of palindrome boundaries
Supplementary Table 2 lists the PCR primers and conditions used to amplify palindrome boundaries. Supplementary Table 3 provides GenBank accession numbers for the chimpanzee, bonobo and gorilla sequences obtained.
Identification and sequencing of chimpanzee BACs
We screened high-density filters from the RPCI-43 male chimpanzee BAC library22 (BACPAC resources) using hybridization probes designed to detect sequences (1) near the inner boundaries of palindromes P1–P6 and P8; (2) near P7; and (3) from a non-ampliconic region of the human MSY. STS content and BAC-end sequences confirmed that, among the candidate BACs identified by hybridization, six contained the central portions of orthologues to human MSY palindromes. The BACs were sequenced as previously described2. Supplementary Table 4 provides descriptions of the sequenced BACs and their GenBank accession numbers.
Sequence analysis
Sequences were aligned with CLUSTAL W using default parameters23. In a few cases, the resulting alignments were adjusted manually. All alignments are provided as Supplementary Information.
Typing nucleotide variants in palindrome arms
The sites studied were CDY1 + 381 (Fig. 2 and Supplementary Fig. 1), CDY1 - 84 (Supplementary Fig. 2), and sY586 (Supplementary Fig. 3). sY586 was genotyped as previously described24. PCR primers and conditions for amplifying CDY1 + 381 (sY1313) and CDY1 - 84 (sY1314) have been deposited in GenBank (accession numbers G73596 and G73597, respectively). When typing CDY1 + 381 by sequencing, 'primer A' in GenBank G73596 served as the sequencing primer. CDY1 - 84 was typed by sequencing using 'primer B' in GenBank G73597. For the samples that showed evidence of gene conversion (Fig. 2 and Supplementary Figs 1–3), we excluded the possibility of deletion of one copy of the variant site as discussed in Supplementary Note 1.
Steady-state balance between mutations and gene conversion
To show that the combined action of mutation and gene conversion results in a steady-state level of arm-to-arm divergence, we use the following recursion: $d_{n+1} = (1 - c_g)\,d_n + 2\mu_g$, where $d_n$ is the sequence divergence between repeat copies at generation $n$, $\mu_g$ is the mutation rate per nucleotide per generation, and $c_g$ is the gene conversion rate per duplicated nucleotide per generation. We presume that $d_0 = 0$, corresponding to no differences between sequence copies immediately after the initial duplication event. However, as $1 - c_g < 1$, $\lim_{n \to \infty} d_n = 2\mu_g/c_g$, for any value of $d_0$ small enough to support $c_g$. Because $\mu_g$ and $d$ are very small, mutations almost never occur at sites that already differ between the two palindrome arms, and this possibility can be ignored. As shown in Supplementary Note 2, our analysis is a special case of Ohta's analysis25.
Supplementary information accompanies this paper.
Received 10 March 2003; accepted 7 April 2003
References
1. Skaletsky, H. et al. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 423, 825-837 (2003)
2. Kuroda-Kawaguchi, T. et al. The AZFc region of the Y chromosome features massive palindromes and uniform recurrent deletions in infertile men. Nature Genet. 29, 279-286 (2001)
3. Agulnik, A. I. et al. Evolution of the DAZ gene family suggests that Y-linked DAZ plays little, or a limited, role in spermatogenesis but underlines a recent African origin for human populations. Hum. Mol. Genet. 7, 1371-1377 (1998)
4. Reijo, R. et al. Diverse spermatogenic defects in humans caused by Y chromosome deletions encompassing a novel RNA-binding protein gene. Nature Genet. 10, 383-393 (1995)
5. Glaser, B. et al. Simian Y chromosomes: species-specific rearrangements of DAZ, RBM, and TSPY versus contiguity of PAR and SRY. Mamm.
Genome 9, 226-231 (1998)
6. Makova, K. D. & Li, W. H. Strong male-driven evolution of DNA sequences in humans and apes. Nature 416, 624-626 (2002)
7. Szostak, J. W., Orr-Weaver, T. L., Rothstein, R. J. & Stahl, F. W. The double-strand-break repair model for recombination. Cell 33, 25-35 (1983)
8. Jackson, J. A. & Fink, G. R. Gene conversion between duplicated genetic elements in yeast. Nature 292, 306-311 (1981)
9. Underhill, P. A. et al. The phylogeography of Y chromosome binary haplotypes and the origins of modern human populations. Ann. Hum. Genet. 65, 43-62 (2001)
10. Murti, J. R., Bumbulis, M. & Schimenti, J. C. High-frequency germ line gene conversion in transgenic mice. Mol. Cell. Biol. 12, 2545-2552 (1992)
11. Johnson, R. D. & Jasin, M. Sister chromatid gene conversion is a prominent double-strand break repair pathway in mammalian cells. EMBO J. 19, 3398-3407 (2000)
12. Bailey, J. A. et al. Recent segmental duplications in the human genome. Science 297, 1003-1007 (2002)
13. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860-921 (2001)
14. Small, K., Iber, J. & Warren, S. T. Emerin deletion reveals a common X-chromosome inversion mediated by inverted repeats. Nature Genet. 16, 95-99 (1997)
15. Aradhya, S. et al. Multiple pathogenic and benign genomic rearrangements occur at a 35 kb duplication involving the NEMO and LAGE2 genes. Hum. Mol. Genet. 10, 2557-2567 (2001)
16. Rochette, C. F., Gilbert, N. & Simard, L. R. SMN gene duplication and the emergence of the SMN2 gene occurred in distinct hominids: SMN2 is unique to Homo sapiens. Hum. Genet.
108, 255-266 (2001)
17. Deeb, S. S., Jorgensen, A. L., Battisti, L., Iwasaki, L. & Motulsky, A. G. Sequence divergence of the red and green visual pigments in great apes and humans. Proc. Natl Acad. Sci. USA 91, 7262-7266 (1994)
18. Zhou, Y.-H. & Li, W.-H. Gene conversion and natural selection in the evolution of X-linked color vision genes in higher primates. Mol. Biol. Evol. 18, 780-783 (1996)
19. Charlesworth, B. & Charlesworth, D. The degeneration of Y chromosomes. Phil. Trans. R. Soc. Lond. B 355, 1563-1572 (2000)
20. Bohossian, H. B., Skaletsky, H. & Page, D. C. Unexpectedly similar rates of nucleotide substitution found in male and female hominids. Nature 406, 622-625 (2000)
21. Kumar, S. & Hedges, S. B. A molecular timescale for vertebrate evolution. Nature 392, 917-920 (1998)
22. Fujiyama, A. et al. Construction and analysis of a human-chimpanzee comparative clone map. Science 295, 131-134 (2002)
23. Thompson, J. D., Higgins, D. G. & Gibson, T. J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673-4680 (1994)
24. Saxena, R. et al. Four DAZ genes in two clusters found in AZFc region of human Y chromosome. Genomics 67, 256-267 (2000)
25. Ohta, T. Allelic and nonallelic homology of a supergene family. Proc. Natl Acad. Sci. USA 79, 3251-3254 (1982)
26. Casanova, M. et al. A human Y-linked DNA polymorphism and its potential for estimating genetic and evolutionary distance. Science 230, 1403-1406 (1985)
27. Underhill, P. A. et al. Detection of numerous Y chromosome biallelic polymorphisms by denaturing high-performance liquid chromatography. Genome Res. 7, 996-1005 (1997)
28. Shen, P. et al.
Population genetic implications from sequence variation in four Y chromosome genes. Proc. Natl Acad. Sci. USA 97, 7354-7359 (2000)
Acknowledgements. We thank R. K. Alagappan and L. G. Brown for technical contributions; N. A. Ellis, M. F. Hammer, T. Jenkins and P. A. Underhill for assistance with genealogical studies; H. M. McClure and Yerkes Regional Primate Research Center for samples; C. Disteche, A. E. Donnenfeld, J. H. Hersh, T. Jenkins, P. G. McDonough, B. McGillivray, R. D. Oates, P. Patrizio, R. Rosenfield, L. Shapiro, S. Silber, M. C. Summers, J. Weissenbach, B. Whitmire and S. Yang for patient samples; and J. E. Alfoldi, B. Charlesworth, A. G. Clark, J. Koubova, J. Lange, B. Levy, T. L. Orr-Weaver, S. Repping, W. R. Rice and J. Saionz for comments on the manuscript. This work was supported by the National Institutes of Health and the Howard Hughes Medical Institute.
Competing interests statement. The authors declare that they have no competing financial interests.
Figure 1 Sequence comparison of human and ape MSY palindromes. a, Nucleotide sequences of inner boundaries of palindrome P6 in human and apes. Dots represent identity to human sequence. Full interspecific alignments of this and other palindromes' boundaries are in Supplementary Information. b, Overview of sequence divergence between human and chimpanzee palindromes, and between palindrome arms within each species. Each palindrome is shown to scale, folded about the centre of the spacer. For palindromes P1/P2 and P6, only the central portions are contained in sequenced chimpanzee BACs, and the palindromes are not perfectly centred within the BACs. Therefore, more sequence from one arm is available than from the other. For P1/P2 we include the 5' and 3' DAZ exons but exclude the central, intragenically duplicated regions of the gene24. The CDY1 genes are not in the portions of P1 shown1.
For palindrome P7, the entire sequence of both arms is represented in sequenced chimpanzee BACs, as is extensive flanking, non-ampliconic sequence. Supplementary Table 8 provides confidence intervals, calculations and links to sequence alignments.
Figure 2 Site in CDY1 showing evidence of multiple independent gene conversion events. This site, named CDY1 + 381, occurs in each arm of palindrome P1. a, Sequence traces for samples PD365, PD335 and PD207 with C/C, C/T and T/T chromosomes, respectively. b, Distribution of C/C, C/T and T/T chromosomes in the MSY genealogical tree, focusing on the cluster of related branches to which C/T and T/T chromosomes are confined. M92, M67, M12, M172, p12f: biallelic polymorphisms that define branch points in the part of the tree shown26–28. See Supplementary Fig. 1 for the full tree and inference of ancestral genotypes.
Y chromosome sequence completed
DNA readout reveals genetic palindromes safeguard male-defining chromosome.
19 June 2003
JOHN WHITFIELD
The Y chromosome has sex with itself to guard against mutation. © GettyImages
Reports of the demise of the Y chromosome and an impending extinction of men may have been exaggerated. The Y's full genome sequence reveals that we have underestimated its powers of self-preservation. Instead of doubling up to protect its genetic cargo like other chromosomes, the lone Y safeguards its genes by having sex with itself, an international consortium has found. "We're on a quest to bring respectability to the Y chromosome," says geneticist David Page of the Massachusetts Institute of Technology, leader of the sequencing team. The male-defining chromosome was previously thought of as a wasteland where genes go to die. The Y's defences are double-edged, however, sometimes leading to infertility. The sequence should help us to diagnose and treat such genetic mishaps.
Two-way street
Human chromosome pairs swap genes to minimize bad mutations.
Y, which has no partner, faces being whittled away by mutation. Some estimate that the chromosome could be complete junk in about ten million years. The finished sequence shows that the chromosome fights entropy with palindromes. About six million of its 50 million DNA letters reside in sequences that read the same, in opposite directions, on both strands of the double helix. The longest is nearly three million letters long1. "The Y chromosome is a hall of mirrors," says Page. These palindromes house many genes, which means that there is a copy at each end of the palindromic sequence. These provide back-ups should harmful mutations arise. The mirror-image structure also allows the arms to swap position when DNA divides. Genes are shuffled and bad copies are purged. Page's team has calculated the amount of swapping needed in each generation to produce the near-perfect palindromes of the human Y. They estimate that every man's Y contains 600 DNA letters that differ from his father's2. This is thousands of times more than the normal mutation rate.
There are 50 million letters in Y's finished sequence. Source: Nature
"No one had contemplated that there would be this level of gene conversion in our own genome," says Huntington Willard of Duke University, Durham, North Carolina. "It gives us a glimpse of how the Y has protected itself." Other researchers see swapping as an evolutionary accident, not a safeguard. "It's a daring suggestion, but I find it a bit difficult to believe," says geneticist Mark Jobling of the University of Leicester, UK. Jobling is sceptical because the trick has a high cost: good genes are just as liable to be lost as bad. This is a major cause of male infertility, as most of the genes within the palindromes control testes development. One in every few thousand men is infertile because key genes have been deleted.
Y files
Genetic testing is already used to diagnose male infertility.
A fuller understanding of the Y's make-up will help refine these tests, and improve doctors' advice to couples. "We have a greater knowledge of where the Y tends to break," says Page. "Testing needs to be updated to reflect our better understanding from the finished sequence." The palindromes, and other forms of repeated DNA, made the Y chromosome very tricky to sequence. So the finished sequence comes from just one man's Y. Getting more sequences is essential, says Jobling, as the chromosome's structure, and hence biology, varies greatly around the world. "We have a beautiful snapshot of the Y chromosome," he says. "Now we need to look in other lineages to build up a photo album of its diversity."
References
1. Skaletsky, H. et al. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 423, 825-837 (2003).
2. Rozen, S. et al. Abundant gene conversion between arms of palindromes in human and ape Y chromosomes. Nature 423, 873-876 (2003).
Y chromosomes rewrite British history
Anglo-Saxons' genetic stamp weaker than historians suspected
19 June 2003
HANNAH HOAG
A new survey of Y chromosomes in the British Isles suggests that the Anglo-Saxons failed to leave as much of a genetic stamp on the UK as history books imply1.
Some Scottish men's Y's are remarkably similar to those of southern England. © GettyImages
Romans, Anglo-Saxons, Danes, Vikings and Normans invaded Britain repeatedly between 50 BC and AD 1050. Many historians ascribe much of the British ancestry to the Anglo-Saxons because their written legacy overshadows that of the Celts. But the Y chromosomes of the regions tell a different story. "The Celts weren't pushed to the fringes of Scotland and Wales; a lot of them remained in England and central Ireland," says study team member David Goldstein, of University College London. This is surprising: the Anglo-Saxons reputedly colonized southern England heavily.
The Anglo-Saxons and Danes left their mark in central and eastern England, and mainland Scotland, the survey says, and the biological traces of Norwegian invaders show up in the northern British Isles, including Orkney. Similar studies, including one by the same team, have looked at differences in mitochondrial DNA, which we inherit from our mothers. They found little regional variation because females tended to move to their husbands. But the Y chromosome shows sharper differences from one geographic region to the next, says geneticist Luca Cavalli-Sforza, of Stanford University, California. "The Y chromosome has a lower mutation rate than mitochondrial DNA." Goldstein's team collected DNA samples from more than 1,700 men living in towns across England, Ireland, Scotland and Wales. They took a further 400 DNA samples from continental Europeans, including Germans and Basques. Only men whose paternal grandfathers had dwelt within 20 miles of their current home were eligible. The Y chromosomes of men from Wales and Ireland resemble those of the Basques. Some believe that the Basques, from the border of France and Spain, are the original Europeans. The new survey is an example of how archaeologists, prehistorians and geneticists are beginning to collaborate, comments Chris Tyler-Smith of the University of Oxford, UK, who tracks human evolution using the Y chromosome. "It would be nice to see the whole world surveyed in this kind of detail, but it's expensive and there are other priorities."
References
1. Capelli, C. et al. A Y chromosome census of the British Isles. Current Biology 13, 979-984 (2003).
28 February 2002
Nature 415, 963 (2002); doi:10.1038/415963a
Human spermatozoa: The future of sex
R. JOHN AITKEN1 AND JENNIFER A. MARSHALL GRAVES2
1 R. John Aitken is at the Hunter Medical Research Institute and is in the Discipline of Biological Sciences, University of Newcastle, Callaghan, New South Wales 2308, Australia. 2 Jennifer A.
Marshall Graves is in the Research School of Biological Sciences, Australian National University, Canberra, ACT 2601, Australia. The desperate plight of the human spermatozoon is clearly reflected by the poor fecundity of our species. Human spermatozoa stand apart from the gametes of virtually all other mammals in the paucity of their phenotype, the inadequacy of their function, and the sensitivity to fragmentation of their mitochondrial and nuclear genomes. Roughly one in seven Western couples seek treatment for infertility, mostly because of problems with semen quality. Even when a human spermatozoon achieves the fertilization of an oocyte, damage can crop up in the next generation. All dominant mutations in our species (such as those that cause achondroplasia, multiple endocrine neoplasia and Apert's syndrome) seem to arise in the male germ line. Spermatozoa are also important mediators of the environmental contribution to cancer in young adults and children; for example, heavy smoking in fathers is reported to confer a fourfold increase in cancer risk to their children. The impact of environmental toxicants and the innate inadequacy of human spermatozoa are compounded by the advent of effective contraception and the introduction of assisted-conception technologies. This lifting of the selection pressure on fertility means that those endowed with genes for high fecundity have lost their advantage over those without. As a result, future generations are bound to experience a further decline in semen quality and, ultimately, human fertility. What mechanisms are responsible for the poor fertilizing potential and genetic damage shown by human spermatozoa? Two main causes of germ-cell dysfunction have recently been discovered: gene deletions on the long arm of the male sex-determining Y chromosome, and oxidative stress. We believe that these aetiologies may be associated. 
Swimming against the tide: sperm quality seems set to decline still further in future generations.
The Y chromosome is particularly vulnerable to gene deletions because it is not a matching partner for the X chromosome, so it cannot retrieve lost genetic information by homologous recombination. Over the past 300 million years, the mammalian Y chromosome has been reduced from a pairing partner to the X chromosome to a shadow of its former self, rescued only by a large addition from a non-sex-determining chromosome in 'placental' mammals. Many of the remaining genes have acquired functions essential for sex determination and spermatogenesis. The original Y chromosome contained around 1,500 genes, but during the ensuing 300 million years all but about 50 were inactivated or lost. Overall, this gives an inactivation rate of five genes per million years. The presence of many genes that have lost their function (pseudogenes) on the Y chromosome indicates that this process of attrition is continuing, so that even these key genes will be lost. At the present rate of decay, the Y chromosome will self-destruct in around 10 million years. This has already occurred in the mole vole, in which the Y chromosome (together with all of its genes) has been completely lost from the genome. Accelerated degeneration of the Y chromosome is found in the 5–15% of severely infertile men whose infertility is caused by wholesale deletions of parts of this chromosome. Because mutations that cause infertility cannot be inherited, the relative abundance of Y-chromosome deletions in male patients suggests an extremely high rate of spontaneous DNA damage. Even microdeletions on the Y chromosome destabilize its transmission, frequently causing it to be lost during gamete production. One important mechanism by which DNA damage is induced in the male germ line is oxidative stress.
Spermatozoa are particularly vulnerable to this because they generate reactive oxygen species and are rich in targets for oxidative attack. Moreover, because they are transcriptionally inactive and have little cytoplasm, spermatozoa are deficient in both antioxidants and DNA-repair systems. Oxidative stress is thus a major cause of male infertility, and contributes to the high rate of DNA fragmentation in spermatozoa. Such DNA fragmentation probably predisposes the cell to mutagenic change, which would become fixed as a deletion in the embryo by aberrant recombination. The Y chromosome is susceptible to such recombination because of its high frequency of repetitive elements. Fragmentation induced by free radicals in the Y chromosome's DNA might also cause other post-fertilization genetic changes, such as insertions and amplifications. Because mutations that originate in this way precede the embryo's first cleavage division, they will enter the germ line and contribute to infertility and morbidity, including cancer, in the offspring. At present, we have no idea what causes oxidative stress, DNA fragmentation and functional incompetence in human spermatozoa. But we do know for certain that such events put pressure on the vulnerable Y chromosome. In the long term, absolute selection against males with deletions that confer sex reversal or sterility will create strong pressure either to retain (and amplify) fertility genes, or for any fertile variant that replaces it. Could the present race of humans eventually be replaced by a new variant (or several independent variants that cannot cross-hybridize) with an alternative sex-determining/differentiation system? Such a new hominid race could differ from present humans in many other characteristics, depending on the gene pool of the new variant's handful of founders. There is evidence of rapid 'selective sweeps' in the Y chromosome's evolution. 
These take place when a Y chromosome with an allele that confers a big selective advantage rapidly replaces other Y chromosomes in the population. As the Y chromosome is never broken up by recombination, whatever alleles lie at other genetic loci, even if they are deleterious, will 'hitch-hike' to fixation along with the advantageous variant.
FURTHER READING
Aitken, R. J. J. Reprod. Fertil. 115, 1–7 (1999).
Marshall Graves, J. A. Biol. Reprod. 63, 667–676 (2000).
Just, W. et al. Nature Genet. 11, 117–118 (1995).
Kuroda-Kawaguchi, T. et al. Nature Genet. 29, 279–286 (2001).
Kamp, C. et al. Mol. Hum. Reprod. 7, 987–994 (2001).
10 August 2000
Nature 406, 622 - 625 (2000); doi:10.1038/35020557
Unexpectedly similar rates of nucleotide substitution found in male and female hominids
HACHO B. BOHOSSIAN, HELEN SKALETSKY & DAVID C. PAGE
Howard Hughes Medical Institute, Whitehead Institute, and Department of Biology, Massachusetts Institute of Technology, 9 Cambridge Center, Cambridge, Massachusetts 02142, USA
Correspondence and requests for materials should be addressed to D.C.P. (e-mail: dcpage@wi.mit.edu).
In 1947, it was suggested that, in humans, the mutation rate is dramatically higher in the male germ line than in the female germ line1. This hypothesis has been supported by the observation that, among primates, Y-linked genes evolved more rapidly than homologous X-linked genes2-6. Based on these evolutionary studies, the ratio ($\alpha_m$) of male to female mutation rates in primates was estimated to be about 5. However, selection could have skewed sequence evolution in introns and exons7-10. In addition, some of the X–Y gene pairs studied lie within chromosomal regions with substantially divergent nucleotide sequences7, 11, 12. Here we directly compare human X and Y sequences within a large region with no known genes. Here the two chromosomes are 99% identical, and X–Y divergence began only three or four million years ago, during hominid evolution13-15.
In apes, homologous sequences exist only on the X chromosome. We sequenced and compared 38.6 kb of this region from human X, human Y, chimpanzee X and gorilla X chromosomes. We calculated $\alpha_m$ to be 1.7 (95% confidence interval 1.15–2.87), significantly lower than previous estimates in primates. We infer that, in humans and their immediate ancestors, male and female mutation rates were far more similar than previously supposed. Li et al. have suggested that the most accurate and precise estimates of $\alpha_m$ should emerge from comparison of lengthy DNA segments that are non-functional but highly similar in sequence7. We therefore focused upon a large region of 99% nucleotide identity between the long arm of the human X chromosome (Xq) and the short arm of the human Y chromosome (Yp). These sequences are present on human Yp because of a massive X-to-Y transposition that occurred about three or four million years ago, after divergence of the human and chimpanzee lineages13-15. This Xq–Yp region is poor in genes (T. Kawaguchi et al., unpublished results). The 1% divergence between the human X- and Y-linked sequences presumably reflects the random accumulation of new mutations on both the X and Y chromosomes during the last three to four million years of hominid evolution. Given this evolutionary history and the paucity of genes, the Xq–Yp region offers a nearly ideal substrate for estimation of $\alpha_m$ in hominids. From within this Xq–Yp region we selected a 38.6-kb segment for detailed study. Human X and Y-chromosomal bacterial artificial chromosomes (BACs) containing this segment had been isolated in our laboratory and sequenced at the Whitehead Institute/MIT Center for Genome Research. This segment has a total content of guanine and cytosine (G+C content) of 35%; Alus, LINEs, and other interspersed repeat elements account for 61% of the total sequence on both X and Y.
We found no known or electronically predicted genes within this segment or within 100 kb to either side of the segment. We compared the sequences of this 38.6-kb segment as found on the human X, human Y, chimpanzee X and gorilla X chromosomes. For the chimpanzee and gorilla X chromosomes, we generated sequencing templates by polymerase chain reaction (PCR) amplification using female genomic DNAs as starting material. As controls, we resequenced the corresponding portions of the human X and Y BACs, again using PCR-generated templates. In this manner, we were able to assemble 38.6 kb of virtually continuous sequence from all four chromosomes. Assembly and sequencing of genomic PCR products across the entire 38.6-kb segment were straightforward because the region's interspersed repeat elements, though numerous, were ancient and thus did not impede selection of locus-specific oligonucleotide primers. This characteristic of the region, together with the absence of genes, had led us to select it for study. We discovered that, within this segment, the human X and Y chromosome differed at 441 nucleotides (Table 1); the two human chromosomes were 98.86% identical, in good agreement with previous estimates for the larger Xq–Yp region15. Of these 441 nucleotide substitutions, we were able to infer in 413 cases whether the mutation had occurred on the hominid X chromosome (175 cases) or on the hominid Y chromosome (238 cases). This was done by examining, for each substitution, the corresponding nucleotide position on the chimpanzee and gorilla X chromosomes. For example, if a T nucleotide was present on the human Y chromosome, but an A was present at the corresponding site on human, chimpanzee, and gorilla X chromosomes, we inferred that the primitive or ancestral state was A, and that an A-to-T substitution had occurred on the hominid Y chromosome.
Conversely, if the human Y, gorilla X, and chimpanzee X chromosomes were identical to each other but differed from the human X chromosome at a particular nucleotide, we inferred that a mutation had arisen on the hominid X chromosome. We were unable to infer the chromosomal origin of the substitution at only 28 nucleotide sites; in most such cases, we observed nucleotide differences between chimpanzee and gorilla. (We also traced the chromosomal origins of small insertions and deletions, 15 on the hominid X chromosome and 23 on the hominid Y chromosome, but these events were too few to merit detailed analysis.) Using the inferred numbers of nucleotide substitutions on the hominid X and Y chromosomes, and ignoring the 28 unresolved differences, we estimated m by Miyata's formula: Y/X = 3m/(2 + m), where Y/X is the ratio of mutation rates on the two sex chromosomes2. This formula is based on the expectation that, in any generation, two-thirds of X chromosomes are transmitted through the female germ line, the remaining one-third being transmitted through the male germ line. By contrast, all Y chromosomes are transmitted through the male germ line. Our observed Y/X ratio of (238/175) = 1.36 (95% confidence interval 1.12–1.65) implies that m = 1.66 (95% confidence interval 1.19–2.45). Restricting the analysis to either transitions or transversions does not alter the estimate of m (Table 1). Our calculations of Y/X and m are based on direct comparison of human X and human Y chromosomal sequences with a readily inferred ancestral sequence. All previous estimates of Y/X (and thus of m) in primates involved X–Y gene pairs that were too diverged to allow accurate reconstruction of ancestral sequences. Previous estimates of Y/X required construction and comparison of two trees of evolutionary distances: one tree for orthologous Y-linked sequences in several species and a second tree for orthologous X-linked sequences in the same species3-6.
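The direct calculation described above can be reproduced in a few lines. What follows is a minimal sketch, not the authors' code. The function names are hypothetical. The interval is the standard log-scale interval for a ratio of two proportions, which is assumed here to be what the Methods mean by the formula for relative risk. The number of aligned sites (38,600) is an assumption taken from the rounded segment length.

```python
import math

def alpha_from_ratio(yx):
    """Invert Miyata's relation Y/X = 3m/(2 + m), giving m = 2(Y/X) / (3 - Y/X)."""
    if not 0 < yx < 3:
        raise ValueError("Y/X must lie strictly between 0 and 3")
    return 2 * yx / (3 - yx)

def ratio_with_ci(y_subs, x_subs, n_sites, z=1.96):
    """Ratio of Y to X substitution counts with a log-scale (relative-risk) interval.

    n_sites is the number of aligned sites on each chromosome (assumed equal).
    """
    ratio = y_subs / x_subs
    se_log = math.sqrt(1 / y_subs - 1 / n_sites + 1 / x_subs - 1 / n_sites)
    lower = ratio * math.exp(-z * se_log)
    upper = ratio * math.exp(z * se_log)
    return ratio, lower, upper

# Counts reported in the text: 238 substitutions on Y, 175 on X.
# 38_600 aligned sites is an assumed round figure for the 38.6-kb segment.
yx, yx_lo, yx_hi = ratio_with_ci(238, 175, 38_600)
m, m_lo, m_hi = (alpha_from_ratio(v) for v in (yx, yx_lo, yx_hi))
print(f"Y/X = {yx:.2f} ({yx_lo:.2f}-{yx_hi:.2f}); m = {m:.2f} ({m_lo:.2f}-{m_hi:.2f})")
```

Under these assumptions the sketch gives values close to those reported in the text (Y/X of about 1.36 and m of about 1.66). Because m is a monotonic function of Y/X on the interval from 0 to 3, the bounds on m follow by transforming the bounds on Y/X.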
To ensure that discrepancies between past and present findings were not attributable to different methods of calculation, we re-analysed our sequence data using the traditional method. We first estimated evolutionary distances between the human Y, human X, chimpanzee X and gorilla X sequences using the method of ref. 16 (Table 2). As expected, the resulting phylogenetic tree (Fig. 1) casts chimpanzee and gorilla as outgroups to the human X and Y sequences. The tree yields a Y/X value of (0.67/0.51) = 1.31 (95% confidence interval 1.05–1.55), implying that m = 1.55 (95% confidence interval 1.08–2.14). Thus, recalculating Y/X and m by the traditional phylogenetic method yields essentially the same results as our direct calculation. The phylogenetic tree also suggests that X–Y gene conversion was not a major factor in the evolution of these sequences in hominids. (Even if some X–Y gene conversion had occurred, it would not affect our estimates of m.) Finally, the phylogenetic tree corroborates the conclusion15 that the X-to-Y transposition occurred within one to two million years after the divergence (roughly four to six million years ago) of the human and chimpanzee lineages.
Figure 1 Phylogenetic tree of nucleotide sequences.
Although our direct calculation indicated that m ≈ 1.66, this may be a slight underestimate. Our calculation assumed that the 1.1% divergence observed between the human X- and Y-linked sequences was entirely attributable to mutations that arose after X-to-Y transposition. However, the 38.6-kb sequence analysed may have been polymorphic (on the X chromosome) at the time of transposition, three to four million years ago. Such ancient polymorphism could account for part of the observed X–Y divergence and could result in our underestimating both Y/X and m.
Nonetheless, recent studies of sequence diversity on X chromosomes in modern humans and chimpanzees suggest that the correction for ancient polymorphism should be modest. Assuming that X-linked sequence diversity in ancient hominids approximated that in modern human populations (4 × 10^-4 per base pair, bp)17, our estimate of m is essentially unchanged. Alternatively, if X-linked sequence diversity in ancient hominids was as high as that in modern chimpanzee populations (1.3 × 10^-3 per bp)18, then our estimate of m should be corrected to about 1.8 (95% confidence interval 1.15–2.87). In any case, our estimate of m in hominids is much lower than previous estimates in primates (Table 3). Several factors may account for this discrepancy. First, our study addressed the last three to four million years of hominid evolution. By contrast, previous experimental designs required comparisons among divergent primate lineages and thus yielded estimates of m averaged across broad swaths of primate evolution3-6. These studies could not have detected whether m was lower in hominids than in other primates. Second, our study examined a much larger DNA segment and yielded more precise estimates of m. Large standard errors in previous estimates (Table 3) may explain, in part, the apparent disparity with present findings. Third, as has been pointed out, nucleotide sequence context may influence or bias substitution rates7, 11, 12. This may have affected previous estimates of m, as these were based on relative substitution rates in substantially diverged portions of the X and Y chromosomes3-6. Such contextual bias should be negligible in our present examination of X and Y-chromosomal regions whose DNA sequences are 99% identical. Finally, and perhaps most importantly, the present analysis involved sequences which appear to be physically distant from any gene; selective neutrality can reasonably be assumed. By contrast, previous estimates of m were based on analyses of introns or exons.
The higher estimates of m in past studies could reflect: (1) relaxed selective constraints on Y-linked introns and exons as compared with their X-linked homologues7-9; (2) diminished mutation rates in X-linked genes10, 19, or both. Future studies of genes within the region of 99% X–Y identity may allow investigators to document the effects, if any, of mutation and selection bias on estimates of m. Our findings suggest that substitution rates were only modestly higher in males than females during the last three to four million years of hominid evolution. If these inferences extend to modern humans, as seems likely, then they have ramifications for medical genetics. Many individuals are afflicted by autosomal dominant or X-linked recessive disorders because of new mutations that appeared in their parents' or grandparents' germlines1, 20, 21. In several such disorders, and especially those caused by recurrence of specific substitutions at particular nucleotide positions in one gene (such as achondroplasia22), nearly all new mutations arise in the male germline. Bolstered by studies of X–Y gene evolution in primates2-6, this dramatic sex bias at a small number of extraordinarily mutable nucleotides has been taken as evidence that substitution rates across the human genome are much higher in males than in females20, 21. Our results suggest a reinterpretation of these medically ascertained hot spots for mutation. Our estimate of m ≈ 1.7 is based on the complete, diverse set of germline substitutions that accumulated within a large, selectively neutral region. Our data may provide the best global estimate to date of substitutional sex ratios in the human genome. From this point of reference, the large sex biases at some medically important hot spots appear as marked departures from global norms, underscoring the importance of unexplored interactions between sequence context and sex at these unusual sites.
Our findings also challenge the model that human mutation rates are directly proportional to the number of cell divisions, regardless of sex. Beginning with Haldane1, high m values have been attributed to the much greater number of germline cell divisions in males (with mitotically active spermatogonial stem cells) than in females (where germ cells cease dividing during foetal development)2, 3. Our results, however, suggest that sexual asymmetry in substitution rates is far less striking than sexual asymmetry in numbers of cell divisions, at least in hominids. We suggest two possibilities for further investigation. Perhaps errors in mitotic DNA replication and repair account for a minority of germline substitutions in human genes. Alternatively, perhaps DNA replication and repair are unusually accurate in spermatogonial stem cells and their prospermatogonial precursors, which account for most of the excess cell divisions in the male germ line. Methods Reference DNA sequences We studied a 38.6-kb X–Y homologous segment which corresponds both to nucleotides 28,842–67,422 of a human X-chromosomal BAC (GenBank AC002488) and to nucleotides 88,369–127,049 of a human Y-chromosomal BAC (GenBank AC002509). These BACs, isolated in our laboratory from the California Institute of Technology A (CTA) library23, had been sequenced at the Whitehead Institute/MIT Center for Genome Research. Interspersed repetitive elements within the 38.6-kb segment were identified electronically using RepeatMasker (http://ftp.genome.washington.edu/RM/RepeatMasker.html); all repeats were included in the analysis of mutations. Electronic searches employing GenScan24, Grail25 and BLAST26 failed to identify any genes or exons within the 38.6-kb segment. Electronic searches also failed to identify any genes or exons in three adjoining Y-chromosomal BACs (GenBank AC012078, AC010094 and AC010737), all sequenced at the Washington University Genome Sequencing Centre. 
Resequencing
A series of overlapping fragments, each about 1 kb in length, that collectively spanned the 38.6-kb region was generated by PCR using each of four DNAs as starting material: the human X-chromosomal BAC, the human Y-chromosomal BAC, chimpanzee female genomic DNA, and gorilla female genomic DNA. PCR primers were selected using Primer3 (ref. 27) and are available upon request. PCR products were purified on Sephacryl-S300 columns and sequenced using fluorescent-dye-terminator cycle sequencing protocols (ThermoSequenase kit; Amersham). Primers used in PCR generation of sequencing templates were also used as sequencing primers; additional sequencing primers were selected at sites internal to the PCR-generated templates. The sequences of each of the four chromosomes (human X, human Y, chimpanzee X, gorilla X) were assembled using Sequencher 3.1 (Gene Codes Corp.). The four chromosomes were aligned using MegAlign (DNASTAR, Inc.), and the alignment was edited manually (alignment available upon request). There was only one discrepancy (T→C at nucleotide 56,776 in X-chromosomal sequence) between our PCR-generated human X and Y-chromosomal sequences and the corresponding, GenBank-deposited reference sequences.
Statistical calculations
In the direct method, m was calculated directly from the inferred numbers of Y and X substitutions via the formula Y/X = 3m/(2 + m) (ref. 2). We calculated confidence intervals for ratios of substitution rates using the formula for relative risk28. We also analysed our sequence data using the method of ref. 3. We began by calculating, for each pairwise sequence comparison, the number of substitutions per 100 nucleotides16. From there we estimated branch lengths and their variances29, and these values in turn enabled us to estimate Y/X and m (ref. 3).
When correcting estimates of m for ancient polymorphism, we calculated means and variances for the numbers of substitutions after X-to-Y transposition, and then calculated means and standard deviations for Y/X using the delta method30.
GenBank accession numbers
Gorilla: AF190869, AF190870 and AF190871. Chimpanzee: AF190865, AF190866, AF190867 and AF190868.
Received 19 April 2000; accepted 2 June 2000
References
1. Haldane, J. B. S. The mutation rate of the gene for haemophilia, and its segregation ratios in males and females. Ann. Eugen. 13, 262-271 (1947).
2. Miyata, T., Hayashida, H., Kuma, K., Mitsuyasu, K. & Yasunaga, T. Male-driven molecular evolution: A model and nucleotide sequence analysis. Cold Spring Harbor Symp. Quant. Biol. 52, 863-867 (1987).
3. Shimmin, L. C., Chang, B. H.-J. & Li, W.-H. Male-driven evolution of DNA sequences. Nature 362, 745-747 (1993).
4. Shimmin, L. C., Chang, B. H.-J. & Li, W.-H. Contrasting rates of nucleotide substitution in the X-linked and Y-linked zinc finger genes. J. Mol. Evol. 39, 569-578 (1994).
5. Chang, B. H.-J., Hewett-Emmett, D. & Li, W.-H. Male-to-female ratios of mutation rate in higher primates estimated from intron sequences. Zool. Stud. 35, 36-48 (1996).
6. Huang, W., Chang, B. H.-J., Gu, X., Hewett-Emmett, D. & Li, W.-H. Sex differences in mutation rate in higher primates estimated from AMG intron sequences. J. Mol. Evol. 44, 463-465 (1997).
7. Shimmin, L. C., Chang, B. H.-J., Hewett-Emmett, D. & Li, W.-H. Potential problems in estimating the male-to-female mutation rate ratio from DNA sequence data. J. Mol. Evol. 37, 160-166 (1993).
8. Charlesworth, B. The effect of background selection against deleterious alleles on weakly selected, linked variants. Genet. Res. 63, 213-227 (1994).
9. Li, W.-H. Molecular Evolution (Sinauer, Sunderland, 1997).
10. McVean, G. T. & Hurst, L. D. Evidence for a selectively favourable reduction in the mutation rate of the X chromosome. Nature 386, 388-392 (1997).
11. Bulmer, M. Neighboring base effects on substitution rates in pseudogenes. Mol. Biol. Evol. 3, 322-329 (1986).
12. Wolfe, K. H., Sharp, P. M. & Li, W.-H. Mutation rates differ among regions of the mammalian genome. Nature 337, 283-285 (1989).
13. Page, D. C., Harper, M. E., Love, J. & Botstein, D. Occurrence of a transposition from the X-chromosome long arm to the Y-chromosome short arm during human evolution. Nature 311, 119-123 (1984).
14. Mumm, S., Molini, B., Terrell, J., Srivastava, A. & Schlessinger, D. Evolutionary features of the 4-Mb Xq21.3 XY homology region revealed by a map at 60-kb resolution. Genome Res. 7, 307-314 (1997).
15. Schwartz, A. et al. Reconstructing hominid Y evolution: X-homologous block, created by X-Y transposition, was disrupted by Yp inversion through LINE-LINE recombination. Hum. Mol. Genet. 7, 1-11 (1998).
16. Tajima, F. & Nei, M. Estimation of evolutionary distance between nucleotide sequences. Mol. Biol. Evol. 1, 269-285 (1984).
17. Kaessmann, H., Heissig, F., von Haeseler, A. & Paabo, S. DNA sequence variation in a noncoding region of low recombination on the human X chromosome. Nature Genet. 22, 78-81 (1999).
18. Kaessmann, H., Wiebe, V. & Paabo, S. Extensive nuclear DNA sequence diversity among chimpanzees. Science 286, 1159-1162 (1999).
19. Ellegren, H. & Fridolfsson, A. K. Male-driven evolution of DNA sequences in birds. Nature Genet. 17, 182-184 (1997).
20. Vogel, F. & Motulsky, A. G. Human Genetics (Springer, Berlin, 1997).
21. Crow, J. The high spontaneous mutation rate: Is it a health risk? Proc. Natl Acad. Sci. USA 94, 8380-8386 (1997).
22. Wilkin, D. J. et al. Mutations in fibroblast growth-factor receptor 3 in sporadic cases of achondroplasia occur exclusively on the paternally derived chromosome. Am. J. Hum. Genet. 63, 711-716 (1998).
23. Shizuya, H. B. et al. Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc. Natl Acad. Sci. USA 89, 8794-8797 (1992).
24. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78-94 (1997).
25. Uberbacher, E. C. & Mural, R. J. Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. Proc. Natl Acad. Sci. USA 88, 11261-11265 (1991).
26. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403-410 (1990).
27. Rozen, S. & Skaletsky, H. in Bioinformatics Methods and Protocols (eds Misener, S. & Krawetz, S. A.) (Humana, Totowa, 1999).
28. Agresti, A. Categorical Data Analysis (Wiley, New York, 1990).
29. Li, W.-H. A statistical test of phylogenies estimated from sequence data. Mol. Biol. Evol. 6, 424-435 (1989).
30. Bishop, Y. V. V., Feinberg, S. E. & Holland, P. W. Discrete Multivariate Analysis (MIT Press, Cambridge, Massachusetts, 1975).
Acknowledgements. We thank A. Schwartz for identifying homologous X and Y-chromosomal BACs, colleagues at the Whitehead Institute/MIT Center for Genome Research for sequencing those BACs, and J. Bradley, A. Chakravarti, B. Charlesworth, A. Clark, D. Haig, T. Kawaguchi, L. Kruglyak, F. Lewitter, Y.-F. Lim, D. Reich, W.
Rice, S. Rozen, C. Tilford and J. Wang for comments on the manuscript. Supported in part by the NIH.
Figure 1 Phylogenetic tree of nucleotide sequences. Branch lengths were estimated from the pairwise evolutionary distances (substitutions per 100 sites) in Table 2.
23 March 2000 Nature 404, 351 - 352 (2000); doi:10.1038/35006158
Y-chromosome variation and Irish origins
A pre-Neolithic gene gradation starts in the Near East and culminates in western Ireland.
Ireland's position on the western edge of Europe suggests that the genetics of its population should have been relatively undisturbed by the demographic movements that have shaped variation on the mainland. We have typed 221 Y chromosomes from Irish males for seven (slowly evolving) biallelic and six (quickly evolving) simple tandem-repeat markers. When these samples are partitioned by surname, we find significant differences in genetic frequency between those of Irish Gaelic and of foreign origin, and also between those of eastern and western Irish origin. Connaught, the westernmost Irish province, lies at the geographical and genetic extreme of a Europe-wide cline. Surnames have been used in Ireland from about AD 950 as markers of complex local kinship systems. As both surnames and Y chromosomes are paternally inherited, we divided our Irish sample into seven surname cohorts for which ancient geographical information is known, with some error. Four are of prehistoric, Gaelic origin (Ulster, Munster, Leinster and Connaught) and three are diagnostic of historical influx (Scottish, Norman/Norse and English)1. The biallelic markers (SRY-1532, M9, YAP, SRY-2627 (ref. 2); SRY-8299 (ref. 3); sY81 (ref. 4); and 92R7 (ref. 5)) define nine haplogroups (clusters of genetic variants) which are highly non-randomly distributed among human populations6, including our samples. In particular, haplogroup 1 (hg 1) has a very high frequency in Ireland (78.1% in the island as a whole).
Surname subdivision reveals a cline in Irish samples, with exogenous samples clearly showing lower frequencies (English, 62.5%; Scottish, 52.9%; Norman/Norse, 83.0%) than Gaelic Irish samples (Leinster, 73.3%; Ulster, 81.1%; Munster, 94.6%), which almost reach fixation in the westernmost province (Connaught, 98.3%). These highly significant differences in the frequency of hg 1 between Irish Gaelic and non-Gaelic Y chromosomes (P < 0.001) and between eastern and western Gaelic Y chromosomes (P < 0.001) persist when duplicated surnames are removed. Eighty per cent (n = 26; ref. 7) of European hg 1 Y chromosomes belong to 'haplotype 15', defined by using the complex p49f/TaqI polymorphic system8. Using this relationship, we estimated that hg 1 frequencies follow a cline within Europe9, extending from the Near East (1.8% in Turkey) to a peak in the Spanish Basque country (89%; ref. 10) in the west (Fig. 1). This cline mirrors other genetic gradients in Europe and is best explained by the migration of Neolithic farmers from the Near East9. When the surname-divided Irish data are appended to this cline, it continues to the western edge of Europe, with hg 1, the putative pre-Neolithic western European variant, reaching its highest frequency in Connaught (98.3%).
Figure 1 Distribution of observed and estimated haplogroup 1 Y-chromosome haplotypes in Europe.
In a maximum-parsimony phylogenetic analysis of both biallelic and simple tandem-repeat (STR) variation between Irish Gaelic haplotypes (Fig. 2), the hg 1 chromosomes cluster together tightly, with the highest-frequency haplotypes occupying central positions, suggesting a coherent common ancestry. The smaller number of non-hg 1 haplotypes shows no such coherence, consistent with their being immigrants.
Their concentration in the eastern Gaelic cohorts may be indicative of a prehistoric influx or of later gene flow across the linguistic barrier from historical migrant groups.
Figure 2 Consensus maximum parsimony networks of Irish Gaelic haplotypes summarizing both simple tandem-repeat (DYS19, DYS389I, DYS390, DYS391, DYS392 and DYS393; ref. 14) and biallelic variation.
These findings suggest that hg 1 is the earlier, indigenous Irish variant. By taking the ancestral haplotype as that with the most common allele for each STR and calculating the average squared distance11 (assuming a generation time of 27 years and a mutation rate of 0.21%; ref. 12) between it and all variants (Fig. 2), we estimate a date for Irish hg 1 coalescence of 4,200 BP (95% c.i. 1,800–14,800 BP). This relatively recent date (a global estimate of hg 1 coalescence is 30,000 BP; ref. 13) falls well within Ireland's 9,000-year history of human habitation. Although error margins are considerable and include uncertainty related to method, this also provides an upper bound for any agriculturally facilitated population expansion, which, at the fringe of Europe, may have taken place in an insular Mesolithic population of hg 1 genotype.
EMMELINE W. HILL*, MARK A. JOBLING† & DANIEL G. BRADLEY*
* Department of Genetics, Trinity College, Dublin 2, Ireland
† Department of Genetics, University of Leicester, University Road, Leicester LE1 7RH, UK
e-mail: dbradley@mail.tcd.ie
References
1. MacLysaght, E. The Surnames of Ireland (Irish Academic, Dublin, 1997).
2. Hurles, M. E. et al. Am. J. Hum. Genet. 63, 1793-1806 (1998).
3. Whitfield, L. S. et al. Nature 378, 379-380 (1995).
4. Seielstad, M. T. et al. Hum. Mol. Genet. 3, 2159-2161 (1994).
5. Hurles, M. E. et al. Am. J. Hum. Genet. 65, 1437-1448 (1999).
6. Jobling, M. A.
& Tyler-Smith, C. Trends Genet. 11, 449-456 (1995).
7. Jobling, M. A. Hum. Mol. Genet. 3, 107-114 (1994).
8. Ngo, K. Y. et al. Am. J. Hum. Genet. 38, 407-418 (1986).
9. Semino, O. et al. Am. J. Hum. Genet. 59, 964-968 (1996).
10. Lucotte, G. & Hazout, S. J. Mol. Evol. 42, 472-475 (1996).
11. Goldstein, D. B. et al. Proc. Natl Acad. Sci. USA 92, 6723-6727 (1995).
12. Heyer, E. et al. Hum. Mol. Genet. 6, 799-803 (1997).
13. Hammer, M. F. et al. Mol. Biol. Evol. 15, 427-441 (1998).
14. Kayser, M. et al. Int. J. Legal Med. 110, 125-133 (1997).
Figure 1 Distribution of observed and estimated haplogroup 1 Y-chromosome haplotypes in Europe. A cline stretches from a frequency of 1.8% in Turkey to peaks in the Basque country (89%) and the west of Ireland (98% in Connaught, the westernmost marker).
Figure 2 Consensus maximum parsimony networks of Irish Gaelic haplotypes summarizing both simple tandem-repeat (DYS19, DYS389I, DYS390, DYS391, DYS392 and DYS393; ref. 14) and biallelic variation. Branch lengths are proportional to the number of mutational steps and node areas are proportional to haplotype frequencies. Haplogroup (hg) 1, blue; hg 2, yellow; hg 21, green; hg 26, white. Asterisk, estimated ancestral haplotype. The separate clustering of haplogroups and the tight clustering of the Irish hg 1 haplotypes around a few numerous, central variants are constant through all most-parsimonious trees and were resistant to repetition of the analysis with randomized inputs.
doi:10.1038/ng1113 volume 33 supplement pp 266 - 275
The application of molecular genetic approaches to the study of human evolution
L. Luca Cavalli-Sforza1 & Marcus W. Feldman2
1.
Department of Genetics, Stanford Medical School, Stanford University, Stanford, California 94305-5120, USA. 2. Department of Biological Sciences, Stanford University, Stanford, California 94305-5020, USA. Correspondence should be addressed to M W Feldman. e-mail: marc@charles.stanford.edu The past decade of advances in molecular genetic technology has heralded a new era for all evolutionary studies, but especially the science of human evolution. Data on various kinds of DNA variation in human populations have rapidly accumulated. There is increasing recognition of the importance of this variation for medicine and developmental biology and for understanding the history of our species. Haploid markers from mitochondrial DNA and the Y chromosome have proven invaluable for generating a standard model for evolution of modern humans. Conclusions from earlier research on protein polymorphisms have been generally supported by more sophisticated DNA analysis. Co-evolution of genes with language and some slowly evolving cultural traits, together with the genetic evolution of commensals and parasites that have accompanied modern humans in their expansion from Africa to the other continents, supports and supplements the standard model of genetic evolution. The advances in our understanding of the evolutionary history of humans attest to the advantages of multidisciplinary research. Reconstructing human evolution requires both historical and statistical research. Although conclusions are not experimentally verifiable because the process cannot be repeated, various disciplines such as physical and social anthropology, archaeology, demography and linguistics provide complementary approaches to researching questions of human evolution. The existence of molecular genetic variation among human populations was first demonstrated by Hirszfeld and Hirszfeld1 in a classic study published in 1919 of the first human gene to be described, ABO, which determines ABO blood groups.
The subsequent identification of blood group protein markers, such as MNS and Rh expanded the repertoire of polymorphic markers that could be analyzed using antibodies. R.A. Fisher showed that evolution could be reconstructed by analyzing the multilocus genotypes on a chromosome observed in populations and their inheritance within families2. The term 'haplotype' for the multilocus combination of alleles on a chromosome was introduced by Ceppellini et al.3 during early research on the major histocompatibility complex. Immunological methods remained the only satisfactory technique for detecting genetic variation until Pauling et al.4 introduced electrophoresis to separate different mutants of hemoglobin, a technique that was rapidly adapted to analyze variation in other blood proteins. It was soon obvious that genetic variation was not rare but, on the contrary, that almost every protein had genetic variants5, 6. These variants became useful markers for population studies. The first book of allele frequencies in populations, published in 1954, was limited almost completely to serological variation7, and books listing genetic variation increased rapidly in size and number8-10. In 1980, a method for studying variation in DNA11 identified mutants of restriction sites by using radioisotopes and generated several new markers. But it was only with the development of PCR in 1986 that the study of more general DNA variation became possible. The development of automated DNA sequencing in the early 1990s paved the way for the application of systematic study of genome variation to human evolutionary biology. Data from protein markers (sometimes called 'classical' markers) are still more abundant than are data from DNA, although this situation is rapidly changing. 
For example, Rosenberg et al.12 studied 377 autosomal microsatellite polymorphisms in 1,056 individuals from 52 populations producing a total of 4,199 different alleles, about half of which were found in all principal continental regions. Another study13 of 3,899 single-nucleotide polymorphisms (SNPs) in 313 genes sampled in 82 Americans self-identified as African American, Asian, European or Hispanic Latino found that only 21% of the sites were polymorphic in all four groups, a fraction that would be expected to increase with more sampled individuals. It is interesting to note, however, that so far no conclusions derived from the earlier studies of classical polymorphisms14 have been found to be in disagreement with those obtained with DNA markers. Nonetheless, molecular genetic markers have provided previously unavailable resolution into questions of human evolution, migration and the historical relationship of separated human populations. In this review we discuss the evolutionary and historical forces that have shaped genomic variation and how its interpretation has led to a deeper understanding of the evolution of our species. Evolutionary events affecting genomic variation All genetic variation is caused by mutations, of which there are many different types. The most common and most useful for many purposes are SNPs, which can be detected by DNA sequencing and other recently developed methods, such as denaturing high performance liquid chromatography15, mass spectrometry16 and array-based resequencing17. Allelic frequencies change in populations owing to two factors: natural selection, which is the result of population variation among individual genotypes in their probabilities of survival and/or reproduction, and random genetic drift, which is due to a finite number of individuals participating in the formation of the next generation. Both natural selection and genetic drift can ultimately lead to the elimination or fixation of a particular allele.
In the presence of mutation and in the absence of selection (that is, under neutral conditions), the rate of neutral evolution of a finite population is equal to the mutation rate18. The earliest evidence of selection acting on a human gene was the discovery that heterozygotes of the hemoglobin A/S polymorphism have greater resistance to malaria than do AA or SS homozygotes. In malarial environments, this results in a balanced polymorphism that maintains the S allele even though SS individuals are severely ill with sickle-cell anemia. Recent studies of DNA variation have focused on detecting signatures of selection, either balancing or directional19. This has produced many different statistical tests using DNA diversity20, 21 and comparisons of nucleotide substitutions that do or do not affect the amino acid sequence of proteins22, 23. Strong molecular evidence of balancing selection, also in malarial environments, has been found for the G6PD locus, the low-activity alleles of which seem to confer resistance to malaria24, 25. Other analyses26 have found evidence for positive selection at both G6PD and another gene TNFSF5, which is also implicated in the response to infectious agents. Strong directional selection has also been proposed27 for FOXP2, which shows a two amino-acid difference between the human protein and the monomorphic form in primates. It has been suggested that these changes may have been selectively important for the evolution of speech and language in modern humans27. In other genes, however, the agent of selection is not at all obvious; for example, the CCR5 gene28 seems to be related to HIV resistance, and mutations in the BRCA1 gene29 produce an increased risk of female breast cancer. In such cases it is often very difficult to disentangle the effects of population dynamics or structure from selective pressures.
These complications can be clearly observed in a thorough analysis of the HFE locus30, mutations of which result in hemochromatosis. In this study no evidence of selection on single SNPs or on haplotypes was detected, but significant between-continent variation was found. Unlike in other studies12, 13, African samples showed only slightly more rare SNPs than Europeans or Asians. This suggests the possibility that different evolutionary models are relevant to the different continents. Genetic statistics of the substructure underlying human populations may also suggest which genes are candidates for having been under selection. The idea, originally proposed by Cavalli-Sforza31 and expanded by Lewontin and Krakauer32, is to compare the expected and observed values of FST statistics (a measure of the relative amount of genetic diversity among populations)33 for a large enough number of genes and focus on those loci that produce extreme values. In a recent study of 8,862 SNPs mapped to gene-associated regions34, 156 genes for which the FST value was exceptionally high and 18 for which it was exceptionally low were identified, suggesting that these 174 genes are candidates for having been under selection. Similar approaches have been applied to specific genes such as G6PD24, the Duffy blood group locus35, lactase haplotypes36, MAOA37 and skin pigmentation38; in each case, unusually high variation among populations has been invoked as a signature for the action of selection. The interactions among population substructure, demography and phenotypic variation are discussed in a recent review39. Migration is another important factor in human evolution that can profoundly affect genomic variation within a population. Most populations, however, are relatively isolated, although rare exchange of marriage partners between groups does occur.
An average of one immigrant per generation in a population is sufficient to keep drift partially in check and to avoid complete fixation of alleles. Sometimes a whole population (or a fraction of it) migrates and settles elsewhere. If the migrant group is initially small but subsequently expands, by chance alone the frequencies of alleles among the founders of the new population will differ from those of the original population and even more so from those among which it settles. In this situation, group migration has an effect that in some respects is opposite to that of individual migration among neighboring populations: it creates more chances for drift and therefore divergence40. The effect will be intergroup variation in allele frequencies. Genome structure and population history A complete description of human genetic variation requires more than just properties of isolated genes, microsatellites or SNPs. How these vary simultaneously within a part or whole chromosome requires statistics of correlation between the variation at different positions, and these are usually described by patterns of linkage disequilibrium (LD). The stronger the LD, the more likely that alleles at each of two positions will be found in association with one another. Using studies of protein variants in the 1970s and 1980s it was rare to identify strong LD in populations of outcrossing diploids. However, as more details become available on variation in human DNA across populations, LD between polymorphic DNA sites is increasingly being detected. Standard population genetic theory suggests that LD between pairs of genetic markers should decrease as the recombination between them increases. But early studies of short segments of DNA did not show this relationship for SNPs41. 
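The expectation that LD falls as recombination increases can be made concrete with the standard textbook recursion D_t = D_0 (1 - r)^t, where D is the LD coefficient between two sites, r is the recombination fraction between them and t is the number of generations. The sketch below assumes a large, randomly mating population with no selection or drift; the function names are hypothetical and the numbers are purely illustrative.

```python
import math


def ld_after(d0, recomb_rate, generations):
    """Expected LD coefficient D after the given number of generations.

    Assumes random mating, no selection and no drift, so that D decays
    by a factor (1 - r) each generation.
    """
    return d0 * (1.0 - recomb_rate) ** generations


def generations_to_halve(recomb_rate):
    """Number of generations needed for D to fall to half its value."""
    return math.log(0.5) / math.log(1.0 - recomb_rate)


if __name__ == "__main__":
    for r in (0.5, 0.01, 0.0001):
        print(r, ld_after(0.25, r, 100), generations_to_halve(r))
```

Under this simple model, tightly linked sites (small r) retain LD for many generations while loosely linked sites lose it quickly, which is the relationship the early SNP studies failed to show.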
Because the pattern of LD is expected to vary both with local effects, such as the extent of selection, the degree to which pairs of sites interact in response to selection (epistasis), and with population-scale forces, such as drift (reviewed in ref. 42), migration and non-random mating43, genomic patterns of LD can be expected to be fairly complex. Recent studies of relatively long (200–500 kb) stretches of DNA, however, have produced a picture of blocks of high LD interspersed by short intervals of low LD. Within the blocks of high LD there is evidence of lack of recombination, whereas the regions between the blocks seem to be 'hot spots' in which recombination occurs frequently44-47. It has been therefore suggested that the next phase of research into human variation should focus on these blocks of high LD, for which haplotypes, rather than single markers, will become the unit of variation44. Although it has been known for many years that the extent of LD among specific sets of genes shows great variation around the world48—for example, it is usually much weaker in African than in European populations49— genome-wide studies covering representative worldwide populations remain to be done50, 51. Interpreting evolutionary history The history of population differentiations using genetic data was initially inferred from phylogenetic trees52-54 and from multivariate statistical methods such as principal components53, 55 (of which multidimensional scaling is a derivative) that use allele frequencies. Population trees are especially useful for reconstructing history if population differences can be assumed to result from fissions that occur randomly in time, with a constant rate of neutral evolution in each population between fissions. This is likely to be roughly true for data on several autosomal genes from large populations that are geographically and genetically distant, as illustrated in Figure 1, which shows nine such groups from around the world. 
Completely different types of DNA variation provide the same basic conclusion regarding the relationships between these populations (refs. 56, 57; and L.A. Zhivotovsky, N.A. Rosenberg and M.W. Feldman, manuscript in preparation). Violation of the above assumptions, such as the presence of migration or selection, affects the interpretation of population trees. However, when migration between geographic neighbors is frequent, principal components displayed in two dimensions reflect the geographical distribution of populations. Under the simple evolutionary model described above, trees and principal components give similar results58. For populations that are geographically close, genetic and geographic distances are often highly correlated (Fig 2), with an asymptote for the genetic distance at about 1,000–2,600 miles on average (but higher for Asia and the world, which are not at equilibrium). Recent statistical developments in detecting clustering among populations based on highly polymorphic autosomal markers59 have been valuable for analyzing very large population genetic data sets12. It is important that this completely different approach produces the same primary continental clusters as the earlier methods. In its application to data sets with numerous polymorphic loci, however, it does seem to be more sensitive in detecting and assessing individual ancestry. Early studies showed that genetic differences between populations are relatively small as compared with those within populations60, 61. Subsequent analyses, including molecular polymorphisms of 14 populations representing all continents, confirmed that the within-population variance was about 85% of the total (Table 1)62. A recent analysis of 377 autosomal microsatellite markers12 in 1,065 individuals from 52 worldwide populations found that only 5–7% of the variation was between populations. 
It is the remaining 5–15% (the between-population component) that can be used to reconstruct the evolutionary history of populations.

Dating the origin of our species using genetic data

Archeological evidence is generally considered to support the initial spread of humans within Africa from an East African origin during the first half of the last 100 ky and the spread from the same origin to all the world in the last 50–60 ky. Analyses of numerous classical markers under this assumption have estimated the dates of first occupation by anatomically modern humans of Asia, Europe and Oceania at 60–40 kya, in agreement with archeological and fossil data. Dates for the first occupation of America are estimated at 15–35 kya. Thus, genetically derived dates are consistent with evidence from physical anthropology, providing support for the use of population trees63. Below we discuss how recent analysis of DNA polymorphisms supports this timing of the earliest split between Africans and non-Africans. Studies of variation in DNA became possible in the early 1980s (refs. 64, 65). Subsequent estimates for the emergence of modern humans from Africa using autosomal restriction fragment length polymorphisms were consistent with earlier estimates56, 66. From the analysis of several mitochondrial DNA (mtDNA) polymorphisms, Cann et al.67 derived two important conclusions: the first major separation in the evolutionary tree of modern humans was between Africans and non-Africans; and the time back to the most recent common ancestor (TMRCA) of modern human mtDNA was 190,000 years (however, with a large error). After early doubts about the statistical validity of these interpretations of the data68, the order of magnitude was confirmed69-71. It is important to note that TMRCA is usually significantly earlier than the first archeologically observable divergence among a set of populations72, 73. Also, TMRCA does not necessarily coincide with the onset of population expansion.
The 'mismatch' method74 for mtDNA, which analyzes the distribution of between-sequence differences, gives estimates that are more compatible with the beginning of expansions inferred from archeology. Because mitochondria are transmitted along only female lineages and mtDNA is genetically haploid, the effective size of a population of mtDNAs is a quarter of that of the corresponding autosomes. The mutation rate of the mitochondrial genome is about ten times higher than that of nuclear DNA75, which provides an abundance of polymorphic sites, but creates difficulties in reconstructing genealogies owing to repeated and reverse mutations. As with the nonrecombining part of the Y chromosome (NRY), there is no evidence for recombination in mtDNA, although low-frequency rearrangements of somatic mtDNA have been observed in heart muscle76. The mutation rate of the NRY is comparable to that of nuclear DNA, which means that polymorphisms are more difficult to find but genealogies are easier to reconstruct. The greater length of DNA on the NRY (perhaps 30 million bases of euchromatic DNA) relative to mtDNA compensates in data analyses for its lower mutation rate. Even though the NRY behaves effectively as a single locus, which is usually insufficient for evolutionary analyses, it has provided results that are consistent across many studies and in agreement with many archeological findings. In fact, the NRY genealogy constructed from 167 mutations77 has been replicated with a totally independent set of 114 mutations75 and confirmed independently using mostly different population samples78, 79. Statistical analyses of Y chromosome data have been carried out using coalescent theory devised by Kingman80. Coalescent-based techniques using numerical methods to study complex likelihood functions derived from Bayesian analyses were developed subsequently81, 82 and have facilitated estimation of key parameters in the Y chromosome genealogy (ref. 83; and H.
Tang et al., manuscript in preparation) under specific assumptions about demographic history. Tang et al.75 have shown that important evolutionary properties of the Y chromosome TMRCA, which is close to 100 kya, can be derived under few demographic assumptions. Two recent estimates of TMRCA from mtDNA have been made using different methods. From complete mtDNA sequences (excluding the D loop) in a sample of 53 individuals, 516 segregating sites were seen and a TMRCA was estimated at 171 ± 50 kya70. From a sample of 179 individuals with 971 SNPs, the TMRCA was estimated at 200–281 kya using a generation time of 25 years, and 160–225 kya using a generation time of 20 years75. Corresponding estimates for the NRY-based TMRCA are 60–130 kya and 72–156 kya, with generation times of 25 and 30 years, respectively75. It is important to stress that such estimates of TMRCAs do not imply that the human population contained only one woman at 230 kya (the time of the mtDNA-based TMRCA, assuming constant mutation rates) or only one man at 100 kya (the time of the NRY-based TMRCA). The only implication is that all human mitochondria existing today descend from that of a single woman living 230 kya, and all NRYs descend from that of a single man living 100 kya. In both cases, it is likely that there were many more human individuals alive at the TMRCA; whether they were of the same species as Homo sapiens is hard to determine, but descendants of other species are either absent or extremely rare. Although the reconstructed genealogies of mtDNA and NRY are broadly similar, there are some notable differences, probably owing to social differences in migration customs. For example, patrilocal marriage has historically been more common than matrilocal84, which can explain differences in mtDNA and Y chromosome data in a number of populations85-93.
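The paired TMRCA ranges quoted above differ only in the assumed generation time. A minimal sketch of that conversion follows; it assumes the underlying estimate is a fixed number of generations that is scaled linearly into years, which is an assumption of this illustration and not a description of the methods used in the cited studies. The function name is hypothetical.

```python
def rescale_tmrca(tmrca_kya, old_generation_years, new_generation_years):
    """Convert a TMRCA (in kya) from one assumed generation time to another.

    Assumes the estimate is a fixed number of generations, so the date
    in years scales linearly with the generation time.
    """
    generations = tmrca_kya * 1000 / old_generation_years
    return generations * new_generation_years / 1000


if __name__ == "__main__":
    # mtDNA range quoted at 25 years per generation, rescaled to 20 years
    print(rescale_tmrca(200, 25, 20), rescale_tmrca(281, 25, 20))
    # NRY range quoted at 25 years per generation, rescaled to 30 years
    print(rescale_tmrca(60, 25, 30), rescale_tmrca(130, 25, 30))
```

Under this assumption the rescaled values agree, after rounding, with the ranges given in the text for the alternative generation times.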
Demographic differences between the sexes, such as greater male than female mortality, the greater variance in reproductive success of males than females and possibly the greater frequency of polygyny than polyandry, may explain the discrepancy between the NRY and mtDNA dates. These factors reduce the effective number of males and may explain the more than twofold difference between the NRY-based and the mtDNA-based TMRCA. Another attractive alternative explanation is that mutation rates in mtDNA are very variable, and when this variation is taken into account TMRCA of mtDNA could become closer to that of NRY. Estimates of TMRCAs from autosomal genes are higher than those from mtDNA or NRY. In theory, they should be higher by a factor of four and the estimates are in this direction, although the number of autosomal genes studied is small and estimates of TMRCAs vary considerably94. For analyses of autosomal and X chromosomes, recombination can complicate genealogies and make TMRCAs impossible to estimate. There is also the possibility of heterozygote advantage, which has the potential to increase estimates of TMRCA. Heterozygote advantage may be widespread throughout the human genome but has been very difficult to show unequivocally, and the only fully confirmed example is sickle cell anemia, for which very large samples were required. There is some optimism, however, that the development of techniques that can detect heterosis for some genes in yeast95 may lead to greater success in other organisms, including humans. Tracking migrations of our species using DNA A recent synthesis of Y chromosome phylogeography, paleoanthropological and paleoclimatological evidence suggests a possible hypothesis for the evolution of human diversity96-98. Around 100 kya or shortly after, a small population of about 1,000 individuals (that is, a tribe), most probably from East Africa, expanded throughout much of Africa. 
Then, between 60 and 40 kya there was a second expansion, most probably from a descendant population, into Asia and from there to the other continents (Fig. 3). This may be referred to as the 'standard model of modern human evolution'; it is also called 'out of Africa 2' in recognition of an earlier expansion of Homo erectus from Africa into Eurasia around 1.7 million years ago and assumes that anatomically modern humans (also called Homo sapiens sapiens) replaced earlier poorly known species of Homo that descended from the first migrants of H. erectus98. Genetic data provide some indication that the spread of humans into Asia occurred through two routes. The first was a southern route, perhaps along the coast to south and southeast Asia, from where it bifurcated north and south99. In the south, these modern humans reached Oceania between 60 and 40 kya, whereas the northern expansion later reached China, Japan and eventually America (this might represent the second migration to America, associated with the NaDene languages, postulated by Greenberg100). The second was a central route through the Middle East, Arabia or Persia to central Asia, from where migration occurred in all directions reaching Europe, east and northeast Asia about 40 kya, after which the first and principal migration to America suggested by Greenberg occurred not later than 15 kya101. It is still unresolved whether the divergence between these two expansion routes occurred in Africa or after entry into west Asia, and, if the latter, where it happened. Most literature accepts without discussion that the entry to Europe and central Asia was through the Levant. It is not at all certain that this was the only or the earliest route. These two initially divergent routes converged later, especially in the extreme East and America. 
An alternative to the out of Africa 2 hypothesis, originated by Weidenreich102 and expanded and called 'multiregional' by Wolpoff103, maintains that all human populations living today originated in their various continents and evolved in parallel into modern humans. The main basis of this hypothesis is the claim that most ancient fossils (essentially those from Europe and Asia but not Oceania and America, where the human fossils found are all very recent and of modern human type) show a continuous morphological transition to modern humans. An extreme example of parallel evolution that included the doubling of brain volume is invoked to explain this scenario. In later versions of the multiregional model, parallelism is claimed to be the result of substantial intermigration100, 104. Recent quantitative anthropological research on several human skulls has shown no morphological continuity in the various continents85. In addition, in the only part of the world where there existed a human type with some clear similarity to modern humans—namely Neandertals in Europe and west Asia —this purported ancestor of modern Europeans disappeared shortly after the appearance of modern humans (40–30 kya). MtDNA analysis of three Neandertals from Germany105, 106, Croatia107 and the Caucasus108 detected no similarity with modern humans and indicated that the evolutionary separation of Neandertal from modern humans took place at least 500 kya. It has been claimed that the age of TMRCA derived from the few human autosomal genes examined (between 500 and 1,000 kya) is proof of early expansions that have not been detected in NRY and mtDNA but are compatible with the multiregional hypothesis109, 110. Templeton100 proposes that this ancient TMRCA of autosomal genes is due to multiple migrations from Asia of H. erectus types before out of Africa 2 and the origin of modern humans. 
There is no evidence for such early migrations; even small populations tend to maintain high genetic variation (Table 1), and the amount of variation observed between human populations today is so small relative to the average variation within populations that it could have easily accumulated in the 100–200 ky before the present. Recent simulation-based tests of the nested-clade method used by Templeton have found that it may produce an inference of long-term recurrent gene flow where this is specifically excluded from the simulation111. It is also important that the extent of LD in autosomal genes is much lower in African than non-African populations48, 49, 112, 113, suggesting that non-African populations represent a small genetic subset of the Africans. LD has had a long time to dissipate in Africa, and the polymorphisms of the autosomal genes from which their (expected) long TMRCAs are calculated are much more likely to have arisen in Africa than in Asia. High resolution history using haploid markers The identification in recent years of a large number of SNPs on the NRY and mtDNA has afforded higher resolution of population history through the reconstruction of the phylogenetic relationships of extant Y chromosomes and mtDNA (Fig. 4). Using the nomenclature developed by the Y Chromosome Consortium114, the first two haplogroups (Fig. 4a; A and B) are almost completely African and even today represent mostly hunter-gatherers or their descendants, who have never reached high population densities or undergone high rates of increase. Slow growth is indicated by the accumulation of many mutations within a branch, as in most descendants of haplogroup A and B and in those of the earliest branches of haplogroups C, D, E and F. By contrast, when there are many branches (called a starburst) after a specific mutation or group of mutations, we can infer rapid growth115, 116. 
The major expansions are those of haplogroup F (seven branches) after an initial lag in population growth, and even more remarkable is the later expansion of haplogroup K (nine branches). These began in the last 40 kya and led to the major settlement of all continents from Africa, first to Asia, and from Asia to the other three continents. The tree of mtDNA (Fig. 4b) is more bushy, but there are more haplogroups because of the higher mutation rate. The general structures of the male and female genealogies in Figure 4 are the same. The earliest branches all remain in Africa; in both trees they clearly refer to the slowly growing hunter-gatherers. In both trees the major growth in Africa is due to a late branch, taking place in the second part of the last 100,000 years and clearly connected with the expansion to Asia. The M, N branches of the mtDNA phylogeny indicate the separation of the expansion from Africa to Asia into a southern and a northern branch. In the NRY genealogy, the southern branch is on average earlier than the northern, and includes mostly haplogroups C, D, H, M and L. Of these, H and L remained in India and part of C went to Oceania, the rest to Mongolia, Siberia, and eventually to northwest America (Na-Dene speakers). D went as far as southeast Asia and Japan. The northern Asian expansion remains mostly in East Asia (haplogroup O; its branch N has a major propagule to N.E. Europe, among Uralic speakers). Haplogroup I from north Asia generates what is probably the first major Paleolithic expansion to central Europe. G and J are found today in the Middle East and from there expanded to Europe, mostly in the south and probably with Neolithic farmers. R is found in Europe, India, Pakistan, and America, but an early branch seems to have returned to the central part of the Sahel in North Africa. Haplogroup Q generates most Amerinds, except for Na-Dene speakers and Eskimos.
Haplogroup I is also found in north and central Europe, where it probably originated around 20 kya. A few indigenous individuals in America and Australia probably inherited European Y chromosomes.

Parallel developments to human evolution

What were the causes of the expansions that increased the number of modern humans by a million times or more over the past 100 kyr? Many capabilities distinguish modern humans from our predecessors (especially our closest relative, Neandertal): sophistication of stone tools, art, religion and, above all, language. We cannot totally exclude art or religion among Neandertals, but it is usually claimed that modern humans showed a very early, sudden development of art, with common themes related to magic, religion and an afterlife linked to the making of tombs117, 118, although there is evidence that many of these aspects of modern human behavior have a long history in Africa119. It has been argued on anatomical grounds that Neandertals could not speak languages like ours, but the evidence offered is considered inconclusive120, 121. Modern human languages are mutually incomprehensible and superficially unrelated to each other. A general classification based on 12 language families has been suggested by Greenberg (Fig. 5)88, 122-125. For geneticists like us, it seems natural to think that modern languages derive mostly or completely from a single language spoken in East Africa around 100 kya, given that today's genes also derive from that population. This does not mean that this was the only language in existence at the time; in parallel with genetic TMRCAs, it was the only language then existing that survived and evolved with rapid differentiation and transformation. Evidence supporting the existence of a common single language includes the shared lexicon, sounds and grammar of present-day languages. Language, like many other forms of cooperation, must have originated as intrafamilial communication126.
The expansion of modern humans may have been stimulated by the development of a new, more sophisticated culture of stone tools (called Aurignacian), which developed at the time of the expansion127. It is also very likely that navigation became available (or else the passage from southeast Asia to Oceania would have been impossible) and may even have been used earlier, such as in coastal south Asia87, or later along the Pacific American coast. Innovations that increased food availability may have then allowed groups to remain in the same area and to increase in size. This apparently happened in many parts of the world on a massive scale starting 10–13 kya with the adoption of agriculture and pastoralism. From the beginning of food production to the present, there must have been a thousandfold population increase. Demographic growth in the well identified, specific areas of origin of agriculture must have stimulated a continuous peripheral population expansion wherever the new technologies were successful. 'Demic expansion' is the name given to the phenomenon (that is, farming spread by farmers themselves) as contrasted with 'cultural diffusion' (that is, the spread of farming technique without movement of people). Innovations favoring demographic growth would be expected to determine both demic and cultural diffusion55, 128, 129. Recent research suggests a roughly equal importance of demic and cultural diffusion of agriculture from the Near East into Europe in the Neolithic period130, 131. Demic diffusion also results in the spread of the language of the initiators of the expansion. This probably occurred for Indo-European languages spreading from the Middle East to Europe and India132, or for Austronesian languages spreading to Polynesia133. There is generally a strong correlation between linguistic families and the genetic tree of major populations14, 63, with some important exceptions. 
Despite this correlation of genetic tree clusters with language families63, 134, there are also clear examples of historically dated language replacements. It is likely that these language shifts have become more common recently, with massive colonizations made possible by development of transportation and military technology. Knowledge, which forms the basis of human behavior, is accumulated by 'cultural transmission' over generations and is subject to rapid change within generations. We have developed a theory of cultural transmission, in which the most important feature is 'duality': culture is transmitted either 'vertically' from parents to children or 'horizontally' between people with no particular age or genetic relationship135. Evolution under vertical transmission is slow, although faster than genetic evolution, and its time unit of one generation is the same. In assessing the importance of vertical transmission, we note that children are more prone to accept parental education because of specific susceptibilities during 'critical periods' of maturation135, 136. For example, most 'mother tongues' are learned without accent only in the first 4–5 years. But under coercion or other special circumstances, the language of a whole population can be fully replaced in 3–4 generations. Although complete rapid replacement of languages may occur, such events are probably rare. Evolution under horizontal cultural transmission is usually much faster than under vertical transmission, and modern means of communication have made it exceptionally fast. Present-day humans are a 'cultural animal', but even today old customs may persist because some vertical cultural transmission remains important. Humans carry many parasites or commensal organisms, some of which began their relationship with humans more than 100 kya.
If their transmission is even partly vertical—as it is for hepatitis B virus—then their evolution is similar to that of humans, with origins in Africa and a spread first to Asia and then, independently, from Asia to the other three continents. It has been suggested that this is true of other viruses, such as polyomavirus137, and also of the bacterium Helicobacter pylori138, which was recently found to be the causative agent of gastric ulcer. It is likely that the same evolutionary properties will be detected for other commensals and parasites, indicating that at least part of their transmission is vertical. Summary and outlook Late twentieth century population genetic research was marked by a significant expansion in the available research tools through a greater appreciation of the level of polymorphism in the human genome. The development of assays for loci that allowed inferences about female (mtDNA)– or male (Y chromosome)–specific histories yielded new insights into human history. A growing appreciation of the importance of the genetic structure of human populations has seen the scope and application of population genetic studies expand. In many ways we are currently hampered by the limited range of populations from which samples are available for detailed analysis. The World Cell Line Collection of 1,064 individuals from 52 populations is a beginning, but at least 5,000–10,000 from a more representative sampling of all continents would be preferable. Inferences about human history from small samples13, 17 are invariably fallible. Most published analyses concern genes chosen because of a putative relation to some phenotype, but sampling of DNA variation should be random with respect both to coding and non-coding regions71. Current statistical procedures to estimate the extent of migration or to measure the strength of selection from patterns of nucleotide variation are still primitive. 
New computational and analytical methods are needed for both if we are to increase our confidence in the calculation of ages of mutations and TMRCAs. A key requirement here is the ability to separate selection from demographic effects. Comparative sequencing of primates may facilitate the detection and estimation of selection. For haplotype determination, large samples of trios—father, mother, son—would be useful but expensive to obtain on a worldwide scale. Thus, improved algorithms for estimating haplotypes are required. Systems that combine SNPs and microsatellites may provide a way to map haplotypes more finely, to assess erosion of LD and to reconstruct the evolutionary history of gene regions139. Construction of somatic cell hybrids might, in the future, enable individual chromosomes to be isolated and made available for haplotypic analysis. There is great scope for more interaction among anthropologists and population geneticists. Recent work by Hewlett et al.140 suggests that correlation of microcultural variation and genetic variation in the same groups can be very informative about population interactions on various timescales. In the same vein, there are still few studies that compare patterns of variation in representative populations of human pathogens with those in their hosts. Perhaps this is a symptom of our focus on the genetics and diseases of developed countries and of the tiny fraction of available resources allocated to studying genetic variation in those populations about whom we have the least knowledge. REFERENCES 1. Hirszfeld, L. & Hirszfeld, H. Essai d'application des methods au probléme des races. Anthropologie 29, 505537 (1919). 2. Race, R.R. & Sanger, R. Blood Groups in Man (Blackwell Scientific, Oxford, 1975). 3. Ceppellini, R. et al. Genetics of leukocyte antigens. A family study of segregation and linkage. In Histocompatibility Testing (eds. Curtoni, E.S., Mattiuz, P.L. & Tosi, R.M.) (Munksgaard, Copenhagen, 1967). 4. 
Pauling, L., Itano, H.A., Singer, S.J. & Wells, I.C. Sickle cell anemia, a molecular disease. Science 110, 543-548 (1949).
5. Harris, H. Enzyme polymorphisms in man. Proc. R. Soc. Lond. B 164, 298-310 (1966).
6. Lewontin, R.C. & Hubby, J.L. A molecular approach to the study of genetic heterozygosity in natural populations. II. Amount of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura. Genetics 54, 595-609 (1966).
7. Mourant, A.E. The Distribution of Human Blood Groups (Blackwell Scientific, Oxford, 1954).
8. Mourant, A.E., Kopec, A.C. & Domaniewska-Sobczak, K. The Distribution of the Human Blood Groups and Other Polymorphisms (Oxford Univ. Press, London, 1976).
9. Mourant, A.E., Kopec, A.C. & Domaniewska-Sobczak, K. Blood Groups and Diseases (Oxford Univ. Press, Oxford, 1978).
10. Nei, M. & Roychoudhury, A.K. Human Polymorphic Genes: World Distribution (Oxford Univ. Press, New York, 1988).
11. Botstein, D., White, R.L., Skolnick, M. & Davis, R.W. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet. 32, 314-331 (1980).
12. Rosenberg, N.A. et al. Genetic structure of human populations. Science 298, 2381-2385 (2002).
13. Stephens, J.C. et al. Haplotype variation and linkage disequilibrium in 313 human genes. Science 293, 489-493 (2001).
14. Cavalli-Sforza, L.L., Menozzi, P. & Piazza, A. The History and Geography of Human Genes (Princeton Univ. Press, Princeton, NJ, 1994).
15. Xiao, W. & Oefner, P.J. Denaturing high-performance liquid chromatography. Hum. Mutat. 17, 439-474 (2001).
16. Oberacher, H. et al. Re-sequencing of multiple single nucleotide polymorphisms by liquid chromatography-electrospray ionization mass spectrometry. Nucleic Acids Res. 30, e67 (2002).
17. Patil, N. et al.
Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294, 1719-1723 (2001).
18. Kimura, M. Evolutionary rate at the molecular level. Nature 217, 624-626 (1968).
19. Przeworski, M., Hudson, R.R. & Di Rienzo, A. Adjusting the focus on human variation. Trends Genet. 16, 296-302 (2000).
20. Hudson, R.R., Kreitman, M. & Aguadé, M. A test of neutral molecular evolution based on nucleotide data. Genetics 116, 153-159 (1987).
21. Tajima, F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123, 585-595 (1989).
22. Muse, S.V. & Gaut, B.S. A likelihood approach for comparing synonymous and non-synonymous nucleotide substitutions. Mol. Biol. Evol. 11, 715-724 (1994).
23. Yang, Z. & Nielsen, R. Synonymous and non-synonymous rate variation in nuclear genes of mammals. J. Mol. Evol. 46, 409-418 (1998).
24. Tishkoff, S.A. et al. Haplotype diversity and linkage disequilibrium at human G6PD: recent origin of alleles that confer malarial resistance. Science 293, 455-462 (2001).
25. Verrelli, B.C. et al. Evidence for balancing selection from nucleotide sequence analyses of human G6PD. Am. J. Hum. Genet. 71, 1112-1128 (2002).
26. Sabeti, P.C. et al. Detecting recent positive selection in the human genome from haplotype structure. Nature 419, 832-837 (2002).
27. Enard, W. et al. Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418, 869-872 (2002).
28. Bamshad, M.J. et al. A strong signature of balancing selection in the 5' cis-regulatory region of CCR5. Proc. Natl. Acad. Sci. USA 99, 10539-10544 (2002).
29. Huttley, G.A. et al.
Adaptive evolution of the tumour suppressor BRCA1 in humans and chimpanzees. Nat. Genet. 25, 410-413 (2000).
30. Toomajian, C. & Kreitman, M. Sequence variation and haplotype structure at the human HFE locus. Genetics 161, 1609-1623 (2002).
31. Cavalli-Sforza, L.L. Population structure and human evolution. Proc. R. Soc. Lond. B 164, 362-379 (1966).
32. Lewontin, R.C. & Krakauer, J. Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74, 175-195 (1973).
33. Weir, B.S. Genetic Data Analysis II (Sinauer, Sunderland, MA, 1996).
34. Akey, J.M., Zhang, G., Zhang, K., Jin, L. & Shriver, M.D. Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 12, 1805-1814 (2002).
35. Hamblin, M.T., Thompson, E.E. & Di Rienzo, A. Complex signatures of natural selection at the Duffy blood group locus. Am. J. Hum. Genet. 70, 369-383 (2002).
36. Hollox, E.J. et al. Lactase haplotype diversity in the Old World. Am. J. Hum. Genet. 68, 160-172 (2001).
37. Gilad, Y., Rosenberg, S., Przeworski, M., Lancet, D. & Skorecki, K. Evidence for positive selection and population structure at the human MAO-A gene. Proc. Natl. Acad. Sci. USA 99, 862-867 (2002).
38. Rana, B.K. et al. High polymorphism at the human melanocortin 2 receptor locus. Genetics 151, 1547-1557 (1999).
39. Goldstein, D.B. & Chikhi, L. Human migrations and population structure: what we know and why it matters. Annu. Rev. Genom. Hum. Genet. 3, 129-152 (2002).
40. Cavalli-Sforza, L.L. Some current problems in human population genetics. Am. J. Hum. Genet. 25, 82-104 (1973).
41. Clark, A.G. et al.
Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase. Am. J. Hum. Genet. 63, 595-612 (1998).
42. Ewens, W.J. in Mathematical Population Genetics 98-104 (Springer, Berlin, 1979).
43. Feldman, M.W. & Christiansen, F.B. The effect of population subdivision on two loci without selection. Genet. Res. 24, 151-162 (1974).
44. Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J. & Lander, E.S. High-resolution haplotype structure in the human genome. Nat. Genet. 29, 229-232 (2001).
45. Jeffreys, A.J., Kauppi, L. & Neumann, R. Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat. Genet. 29, 217-222 (2001).
46. Goldstein, D.B. Islands of linkage disequilibrium. Nat. Genet. 29, 109-111 (2001).
47. Reich, D.E. et al. Human genome sequence variation and the influence of gene history, mutation and recombination. Nat. Genet. 32, 135-142 (2002).
48. Payne, R., Feldman, M.W., Cann, H. & Bodmer, J.G. A comparison of HLA data of the North American black with African black and North American caucasoid populations. Tissue Antigens 9, 135-147 (1977).
49. Kidd, K.K. et al. A global survey of haplotype frequencies and linkage disequilibrium at the DRD2 locus. Hum. Genet. 103, 211-227 (1998).
50. Reich, D.E. et al. Linkage disequilibrium in the human genome. Nature 411, 199-204 (2001).
51. Gabriel, S.B. et al. The structure of haplotype blocks in the human genome. Science 296, 2225-2229 (2002).
52. Edwards, A.W.F. & Cavalli-Sforza, L.L. Reconstruction of evolutionary trees. In Phenetic and Phylogenetic Classification (eds. Heywood, V.E. & McNeill, J.)
67-76 (The Systematics Association, London, 1964).
53. Cavalli-Sforza, L.L. & Edwards, A.W.F. Analysis of human evolution. Proc. 11th Int. Congr. Genet. 2, 923-933 (1964).
54. Cavalli-Sforza, L.L. & Edwards, A.W.F. Phylogenetic analysis: models and estimation procedures. Am. J. Hum. Genet. 19, 223-257 (1967).
55. Menozzi, P., Piazza, A. & Cavalli-Sforza, L.L. Synthetic maps of human gene frequencies in Europe. Science 201, 786-792 (1978).
56. Bowcock, A.M. et al. Drift, admixture, and selection in human evolution: A study with DNA polymorphisms. Proc. Natl. Acad. Sci. USA 88, 839-843 (1991).
57. Bowcock, A.M. et al. High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368, 455-457 (1994).
58. Cavalli-Sforza, L.L. & Piazza, A. Analysis of evolution: evolutionary rates, independence, and treeness. Theor. Popul. Biol. 8, 127-165 (1975).
59. Pritchard, J.K., Stephens, M. & Donnelly, P.J. Inference of population structure using multilocus genotype data. Genetics 155, 945-959 (2000).
60. Lewontin, R.C. The apportionment of human diversity. In Evolutionary Biology Vol. 6 (eds. Dobzhansky, T.H., Hecht, M.K. & Steere, W.C.) 381-398 (Appleton-Century-Crofts, New York, 1972).
61. Nei, M. & Roychoudhury, A.K. Genic variation within and between the three major races of Man, Caucasoids, Negroids, and Mongoloids. Am. J. Hum. Genet. 26, 421-443 (1974).
62. Barbujani, G., Magagni, A., Minch, E. & Cavalli-Sforza, L.L. An apportionment of human DNA diversity. Proc. Natl. Acad. Sci. USA 94, 4516-4519 (1997).
63. Cavalli-Sforza, L.L., Piazza, A., Menozzi, P. & Mountain, J. Reconstruction of human evolution: bringing together genetic, archaeological, and linguistic data. Proc. Natl. Acad. Sci. USA 85, 6002-6006 (1988).
64. Brown, W.M., George, M. Jr. & Wilson, A.C.
Rapid evolution of animal mitochondrial DNA. Proc. Natl. Acad. Sci. USA 76, 1967-1971 (1979).
65. Johnson, M.J., Wallace, D.C., Ferris, S.D., Rattazzi, M.C. & Cavalli-Sforza, L.L. Radiation of human mitochondrial DNA types analyzed by restriction endonuclease cleavage patterns. J. Mol. Evol. 19, 255-271 (1983).
66. Mountain, J.L., Lin, A.A., Bowcock, A.M. & Cavalli-Sforza, L.L. Evolution of modern humans: evidence from nuclear DNA polymorphisms. Phil. Trans. R. Soc. Lond. B 337, 159-165 (1992).
67. Cann, R.L., Stoneking, M. & Wilson, A.C. Mitochondrial DNA and human evolution. Nature 325, 31-36 (1987).
68. Templeton, A.R. Human origins and analysis of mitochondrial DNA sequences. Science 255, 737 (1992).
69. Chen, Y.-S., Torroni, A., Excoffier, L., Santachiara-Benerecetti, A.S. & Wallace, D.C. Analysis of mtDNA variation in African populations reveals the most ancient of all human continent-specific haplogroups. Am. J. Hum. Genet. 57, 133-149 (1995).
70. Ingman, M., Kaessmann, H., Pääbo, S. & Gyllensten, U. Mitochondrial genome variation and the origin of modern humans. Nature 408, 708-713 (2000).
71. Shen, P. et al. Population genetic implications from DNA polymorphism in random human genomic sequences. Hum. Mutat. 20, 209-217 (2002).
72. Satta, Y., Klein, J. & Takahata, N. DNA archives and our nearest relative: the trichotomy problem revisited. Mol. Phylogenet. Evol. 14, 259-275 (2000).
73. Rosenberg, N.A. & Feldman, M.W. The relationship between coalescent times and population divergence times. In Modern Developments in Theoretical Population Genetics (eds. Slatkin, M. & Veuille, M.) 130-164 (Oxford Univ. Press, Oxford, 2002).
74. Rogers, A.R. & Harpending, H. Population growth makes waves in the distribution of pairwise genetic differences. Mol.
Biol. Evol. 9, 552-569 (1992).
75. Tang, H., Siegmund, D.O., Shen, P., Oefner, P.J. & Feldman, M.W. Frequentist estimation of coalescence times from nucleotide sequence data using a tree-based partition. Genetics 161, 447-459 (2002).
76. Kajander, O.A., Karhunen, P.J., Holt, I.J. & Jacobs, H.T. Prominent mitochondrial DNA recombination intermediates in human heart muscle. EMBO Rep. 2, 1007-1012 (2001).
77. Underhill, P.A. et al. Y chromosome sequence variation and the history of human populations. Nat. Genet. 26, 358-361 (2000).
78. Hammer, M.F. et al. Hierarchical patterns of global human Y-chromosome diversity. Mol. Biol. Evol. 18, 1189-1203 (2001).
79. Paracchini, S., Arredi, B., Chalk, R. & Tyler-Smith, C. Hierarchical high-throughput SNP genotyping of the human Y chromosome using MALDI-TOF mass spectrometry. Nucleic Acids Res. 30, e27 (2002).
80. Kingman, J.F.C. The coalescent. Stochastic Processes and their Applications 13, 235-248 (1982).
81. Hudson, R.R. Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol. 7, 203-217 (1990).
82. Griffiths, R.C. & Tavaré, S. Ancestral inference in population genetics. Stat. Sci. 9, 307-319 (1994).
83. Thomson, R., Pritchard, J.K., Shen, P., Oefner, P.J. & Feldman, M.W. Recent common ancestry of human Y chromosomes: evidence from DNA sequence data. Proc. Natl. Acad. Sci. USA 97, 7360-7365 (2000).
84. Seielstad, M.T., Minch, E. & Cavalli-Sforza, L.L. Genetic evidence for a higher female migration rate in humans. Nat. Genet. 20, 278-280 (1998).
85. Salem, A.H., Badr, F.M., Gaballah, M.F. & Pääbo, S. The genetics of traditional living: Y-chromosomal and mitochondrial lineages in the Sinai Peninsula. Am. J. Hum. Genet. 59, 741-743 (1996).
86. Sajantila, A. et al.
Paternal and maternal DNA lineages reveal a bottleneck in the founding of the Finnish population. Proc. Natl. Acad. Sci. USA 93, 12035-12039 (1996).
87. Finnilä, S., Hassinen, I.E., Ala-Kokko, L. & Majamaa, K. Phylogenetic network of the mtDNA haplogroup U in northern Finland based on sequence analysis of the complete coding region by conformation-sensitive gel electrophoresis. Am. J. Hum. Genet. 66, 1017-1026 (2000).
88. Richards, M. et al. Tracing European founder lineages in the near eastern mtDNA pool. Am. J. Hum. Genet. 67, 1251-1276 (2000).
89. Zerjal, T. et al. Genetic relationships of Asians and northern Europeans, revealed by Y-chromosomal DNA analysis. Am. J. Hum. Genet. 60, 1174-1183 (1997).
90. Richards, M., Oppenheimer, S. & Sykes, B. mtDNA suggests Polynesian origins in eastern Indonesia. Am. J. Hum. Genet. 62, 1234-1236 (1998).
91. Kayser, M. et al. Melanesian origin of Polynesian Y chromosomes. Curr. Biol. 10, 1237-1246 (2000).
92. Underhill, P.A. et al. Maori origins, Y chromosome haplotypes and implications for human history in the Pacific. Hum. Mutat. 17, 271-280 (2001).
93. Oota, H., Settheetham-Ishida, W., Tiwawech, D., Ishida, T. & Stoneking, M. Human mtDNA and Y-chromosome variation is correlated with matrilocal versus patrilocal residence. Nat. Genet. 29, 20-21 (2001).
94. Templeton, A.R. Out of Africa again and again. Nature 416, 45-51 (2002).
95. Steinmetz, L.M. et al. Dissecting the architecture of a quantitative trait locus in yeast. Nature 416, 326-330 (2002).
96. Underhill, P. et al. The phylogeography of Y chromosome binary haplotypes and the origins of modern human populations. Ann. Hum. Genet. 65, 43-62 (2001).
97. Lahr, M.M. & Foley, R.
Multiple dispersals and modern human origins. Evol. Anthropol. 3, 48-60 (1994).
98. Lahr, M.M. & Foley, R.A. Towards a theory of modern human origins: geography, demography, and diversity in recent human evolution. Am. J. Phys. Anthropol. 27, 137-176 (1998).
99. Stringer, C. Coasting out of Africa. Nature 405, 24-27 (2000).
100. Greenberg, J. Language in the Americas (Stanford Univ. Press, Stanford, CA, 1987).
101. Fagan, B.M. The Great Journey: The Peopling of Ancient America (Thames and Hudson, London, 1987).
102. Weidenreich, F. Apes, Giants, and Man (Univ. Chicago Press, Chicago, IL, 1946).
103. Wolpoff, M.H. Multiregional evolution: The fossil alternative to Eden. In The Human Revolution: Behavioural and Biological Perspectives on the Origins of Modern Humans (eds. Mellars, P. & Stringer, C.) 62-108 (Princeton Univ. Press, Princeton, NJ, 1989).
104. Weiss, K.M. & Maruyama, T. Archeology, population genetics and studies of human racial ancestry. Am. J. Phys. Anthropol. 44, 31-50 (1976).
105. Krings, M. et al. Neandertal DNA sequences and the origin of modern humans. Cell 90, 19-30 (1997).
106. Krings, M., Geisert, H., Schmitz, R.W., Krainitzki, H. & Pääbo, S. DNA sequence of the mitochondrial hypervariable region II from the Neandertal type specimen. Proc. Natl. Acad. Sci. USA 96, 5581-5585 (1999).
107. Krings, M. et al. A view of Neandertal genetic diversity. Nat. Genet. 26, 144-146 (2000).
108. Ovchinnikov, I.V. et al. Molecular analysis of Neanderthal DNA from the northern Caucasus. Nature 404, 490-493 (2000).
109. Harding, R.M. et al. Archaic African and Asian lineages in the genetic ancestry of modern humans. Am. J. Hum. Genet. 60, 772-789 (1997).
110. Harris, E.E. & Hey, J. X-chromosome evidence for ancient human histories. Proc. Natl. Acad. Sci.
USA 96, 3320-3324 (1999).
111. Knowles, L.L. & Maddison, W.P. Statistical phylogeography. Mol. Ecol. 11, 2623-2635 (2002).
112. Kidd, J.R. et al. Haplotypes and linkage disequilibrium at the phenylalanine hydroxylase locus, PAH, in a global representation of populations. Am. J. Hum. Genet. 66, 1882-1899 (2000).
113. Osier, M.V. et al. A global perspective on genetic variation at the ADH genes reveals unusual patterns of linkage disequilibrium and diversity. Am. J. Hum. Genet. 71, 84-99 (2002).
114. The Y Chromosome Consortium. A nomenclature system for the tree of human Y-chromosomal binary haplogroups. Genome Res. 12, 339-348 (2002).
115. Slatkin, M. & Hudson, R.R. Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics 129, 555-562 (1991).
116. Donnelly, P. Interpreting genetic variability: the effects of shared evolutionary history. In Variation in the Human Genome, Ciba Foundation Symposium No. 197 (Wiley, Chichester, UK, 1996).
117. Anati, E. The Intellectual Expressions of Prehistoric Man: Art and Religion. Acts of the Valcamonica Symposium '79 (Centro Camuno di Studi Preistorici, Capo di Ponte, Brescia, Italy, and Editoriale Jaca Book SpA, Milano, Italy, 1983).
118. Conkey, M.W., Soffer, O., Stratmann, D. & Jablonski, N.G. (eds.) Beyond Art: Pleistocene Image and Symbol, Watts Symposium Series in Anthropology: Memoirs of the California Academy of Sciences No. 23 (California Academy of Sciences, San Francisco, CA, 1997).
119. McBrearty, S. & Brooks, A.S. The revolution that wasn't: a new interpretation of the origin of modern human behavior. J. Hum. Evol. 39, 453-563 (2000).
120. Lieberman, P. & Crelin, E.S. On the speech of Neanderthal man. Linguistic Inquiry 2, 203-222 (1971).
121.
Arensburg, B., Schepartz, L.A., Tillier, A.M., Vandermeersch, B. & Rak, Y. A reappraisal of the anatomical basis for human speech in Middle Paleolithic hominids. Am. J. Phys. Anthropol. 83, 137-146 (1990).
122. Greenberg, J.H. The Languages of Africa (Bloomington, Indiana, 1963).
123. Greenberg, J.H. The Indo-Pacific Hypothesis. Current Trends in Linguistics, Volume 8, 809-871 (1971).
124. Greenberg, J.H. Indo-European and Its Closest Relatives: The Eurasiatic Language Family: Grammar (Stanford Univ. Press, Stanford, California, 2000).
125. Ruhlen, M. On the Origin of Languages: Studies in Linguistic Taxonomy (Stanford Univ. Press, Stanford, California, 1994).
126. Eshel, I. & Cavalli-Sforza, L.L. Assortment of encounters and evolution of cooperativeness. Proc. Natl. Acad. Sci. USA 79, 1331-1335 (1982).
127. Klein, R.G. The Human Career 2nd edn (Univ. of Chicago Press, Chicago, IL, 1999).
128. Ammerman, A.J. & Cavalli-Sforza, L.L. The Neolithic Transition and the Genetics of Populations in Europe (Princeton Univ. Press, Princeton, New Jersey, 1984).
129. King, R. & Underhill, P.A. Congruent distribution of Neolithic painted pottery and ceramic figurines with Y-chromosome lineages. Antiquity 76, 707-714 (2002).
130. Chikhi, L., Destro-Bisol, G., Bertorelle, G., Pascali, V. & Barbujani, G. Clines of nuclear DNA markers suggest a largely Neolithic ancestry of the European gene pool. Proc. Natl. Acad. Sci. USA 95, 9053-9058 (1998).
131. Chikhi, L., Nichols, R.A., Barbujani, G. & Beaumont, M.A. Y genetic data support the Neolithic demic diffusion model. Proc. Natl. Acad. Sci. USA 99, 11008-11013 (2002).
132. Renfrew, C. Archaeology and Language: The Puzzle of Indo-European Origins (Jonathan Cape, London, 1987).
133. Bellwood, P.S. The colonization of the Pacific: some current hypotheses. In The Colonization of the Pacific: A Genetic Trail (eds. Hill, A.V.S. & Serjeantson, S.W.)
1-59 (Oxford Univ. Press, New York, 1989).
134. Cavalli-Sforza, L.L., Minch, E. & Mountain, J. Coevolution of genes and languages revisited. Proc. Natl. Acad. Sci. USA 89, 5620-5624 (1992).
135. Cavalli-Sforza, L.L. & Feldman, M.W. Cultural Transmission and Evolution: A Quantitative Approach (Princeton Univ. Press, Princeton, New Jersey, 1981).
136. Cavalli-Sforza, L.L. Genes, Peoples and Languages (North Point Press, New York, 2000).
137. Sugimoto, C. et al. Typing of urinary JC virus DNA offers a novel means of tracing human migrations. Proc. Natl. Acad. Sci. USA 94, 9191-9196 (1997).
138. Covacci, A., Telford, J.L., Del Giudice, G., Parsonnet, J. & Rappuoli, R. Helicobacter pylori virulence and genetic geography. Science 284, 1328-1333 (1999).
139. Mountain, J.L. et al. SNPSTRs: Empirically derived, rapidly typed, autosomal haplotypes for inference of population history and mutational processes. Genome Res. 12, 1766-1772 (2002).
140. Hewlett, B.S., De Silvestri, A. & Guglielmino, C.R. Semes and genes in Africa. Curr. Anthropol. 43, 313-321 (2002).
141. Latter, B.D.H. Genetic differences within and between populations of the major human subgroups. Am. Nat. 116, 220-237 (1980).
142. Ryman, N., Chakraborty, R. & Nei, M. Differences in the relative distribution of human gene diversity between electrophoretic, and red and white cell antigen loci. Hum. Hered. 33, 93-102 (1983).
143. Kivisild, T. et al. The genetic heritage of earliest settlers persist in both the Indian tribal and caste populations. Am. J. Hum. Genet. (in the press).
144. Thangaraj, K. et al. Genetic affinities of the Andaman islanders, a vanishing human population. Curr. Biol. (in the press).
145. Semino, O., Santachiara-Benerecetti, A.S., Falaschi, F., Cavalli-Sforza, L.L. & Underhill, P.A.
Ethiopians and Khoisan share the deepest clades of the human Y-chromosome phylogeny. Am. J. Hum. Genet. 70, 265-268 (2002).
146. Cruciani, F. et al. An Asia to sub-Saharan Africa back migration is supported by high-resolution analysis of human Y chromosome haplotypes. Am. J. Hum. Genet. 70, 1197-1214 (2002).

Figure 1: Summary tree of world populations. Phylogenetic tree based on polymorphisms of 120 protein genes in 1,915 populations grouped by continental sub-areas and Fst genetic distances14. Root placed assuming a constant rate of evolution.

Figure 2: Relationship between genetic and geographic distance. Genetic distance of population pairs measured by Fst as a function of geographic distance between members of the pairs14. Only samples from indigenous people were included. Continents where primitive economies predominate (hunting-gathering or tropical gardening) show highest asymptotes. Asia and the world do not asymptote within the range shown.

Figure 3: The migration of modern Homo sapiens. The scheme outlined above begins with a radiation from East Africa to the rest of Africa about 100 kya and is followed by an expansion from the same area to Asia, probably by two routes, southern and northern, between 60 and 40 kya. Oceania, Europe and America were settled from Asia in that order.

Figure 4: High resolution molecular phylogeny to study human history. a, Phylogeny of human mtDNA haplogroups and their continental affiliation composed from resequencing of 277 individuals (refs 143, 144). The length of the branches corresponds approximately to the number of mutations. b, Phylogeny of human Y chromosome haplogroups and the continental affiliation of their most frequent occurrence, composed from population genotyping of over 1,000 individuals and resequencing of over 100 (refs 145, 146). The length of the branches corresponds approximately to the number of mutations.

Figure 5: Language families of the world.
The 12 families of the Greenberg classification100, 122-125. The Eurasiatic superfamily includes six families (most of which are recognized by most linguists) and an isolate, Gilyak, listed in the central column. The oldest family is the Khoisan, which includes Bushmen and Hottentots, many of whom also belong genetically to the oldest haplogroups of both mtDNA and NRY. Australian and Indopacific are also old families. Other African languages are Niger-Kordofanian (mostly west Africa), Nilo-Saharan and Afroasiatic (which includes Semitic languages such as Arabic and Hebrew). American languages belong to three families. Amerinds were the first to migrate from Asia, according to some (Fagan, ref. 101) as late as 15 kya, and Amerind shows affinities with Eurasiatic. One of the other two American families is Na-Dene (belonging to Dene-Caucasian), a family that probably spread to Eurasia before Eurasiatic and includes Sinotibetan, spoken in almost all of China, as well as some isolated, probably relic, languages (Basque, a few Caucasian languages and Burushaski, spoken in N. Pakistan) that all survived the later spread of Eurasiatic languages. The third American family is Eskimo-Aleut, the last to spread to America from N.E. Siberia. The Austric family is very large and is spoken in S.E. Asia, Indonesia, all of Polynesia to the east and Madagascar to the west.

Table 1: Variation components within and between populations

Nature Reviews Genetics 2, 207-216 (2001); doi:10.1038/35056058

THE HUMAN Y CHROMOSOME, IN THE LIGHT OF EVOLUTION

Bruce T. Lahn1, Nathaniel M. Pearson1, 2 & Karin Jegalian3

1 Howard Hughes Medical Institute, Department of Human Genetics, University of Chicago, 920 East 58th Street, Chicago, Illinois 60637, USA. blahn@genetics.uchicago.edu
2 Department of Ecology and Evolution, University of Chicago, 920 East 58th Street, Chicago, Illinois 60637, USA.
npearson@uchicago.edu
3 National Human Genome Research Institute, National Institutes of Health, 9000 Rockville Pike, Bethesda, Maryland 20892, USA.

Most eukaryotic chromosomes, akin to messy toolboxes, store jumbles of genes with diverse biological uses. The linkage of a gene to a particular chromosome therefore rarely hints strongly at that gene's function. One striking exception to this pattern of gene distribution is the human Y chromosome. Far from being random and diverse, known human Y-chromosome genes show just a few distinct expression profiles. Their relative functional conformity reflects evolutionary factors inherent to sex-specific chromosomes.

In many dioecious taxa, karyotype determines sex. To mediate this developmental decision, sex-chromosome pairs have arisen independently among such lineages from separate pairs of ordinary autosomes1. The sex chromosomes of one taxon can, therefore, differ phylogenetically and structurally from those of another. The mammalian sex chromosomes, for example, are not specifically related to those of birds, insects or plants.

Despite their many origins, the sex chromosomes of diverse life forms are strikingly alike. Ever-hemizygous chromosomes (that is, the Y chromosome (hereafter the Y) in XY or the W chromosome in ZW systems) tend to be small, gene-poor and rich in repetitive sequence. Their non-sex-specific partners, the X chromosome (hereafter the X) and Z chromosome, tend to be more autosome-like in form and content, and in many cases undergo dosage compensation to equalize gene activity between the sexes. This gross convergence of sex chromosomes among disparate lineages hints that common factors drive their evolution. Such factors are increasingly well understood, thanks largely to studies of the mammalian sex chromosomes and of the human Y in particular.
Here, we review how studies of the human Y have already cast a spotlight on the role of evolution in moulding the distinctive biological properties of sex chromosomes.

Classes of human Y-chromosome genes

A typical eukaryotic chromosome encodes a motley assortment of gene products; functionally related genes do not tend to jointly occupy particular chromosomes. It is curious, then, that one of the shortest human chromosomes, the Y, might contain the longest human genomic region in which genes show only a few distinct expression profiles. To the extent that tissue specificity reflects functionality, the human Y thus harbours remarkably low gene-functional diversity. In fact, if classified jointly by location and apparent function, known human Y genes boil down to pseudoautosomal loci and three basic classes of non-recombining, male-specific loci.

The pseudoautosomal regions (PARs) at the ends of the human Y comprise 5% of its sequence (this fraction, consistently small, varies among mammals)2, 3. In male meiosis, the PARs of the X and Y recombine with each other at high, if subregionally varied, rates4, 5. Accordingly, PAR genes, like autosomal genes, are shared freely between the sexes. Although highly recombinogenic relative to the human genome as a whole, the human PARs generally resemble autosomes in base composition, and in gene density and diversity. About a dozen pseudoautosomal genes, most of them on the short arm, have been identified. Most of these genes elude X inactivation, as would be expected of genes with sex-uniform dosage. Curiously, two genes on the long-arm human PAR, SYBL1 (synaptobrevin-like 1) and HSPRY3 (sprouty (Drosophila) homologue 3), reportedly undergo X and Y inactivation in females and males, respectively, which indicates that this region might have a complex evolutionary history that involves recent X-to-Y translocation6.

Most of the remainder of the human Y recombines with neither the X nor any other chromosome.
This non-recombining region of the Y (NRY) consists largely of highly repetitive sequences that are rich in transposons and other elements whose replication and/or expression is unlikely to directly benefit the human host7. Of the 60-megabase (Mb) human NRY, 35 Mb are euchromatic. Most of the remainder is a block of heterochromatin on the long arm. Nearly one-half of the euchromatic portion of the NRY has been sequenced through the publicly funded Human Genome Project. Representative sequencing of the entire euchromatic NRY is expected to be completed within this year.

So far, 21 distinct genes or gene families that are expressed in healthy tissues have been identified in the human NRY. These group into three salient classes (classes 1, 2 and 3), largely on the basis of expression profile and homology to the X. The eight known class 1 genes are single copy, are expressed widely in the body and have like-functioning X-linked homologues. Class 2 also has eight known members, each of which is multicopy, expressed only in the testis and without an active X homologue (Fig. 1, Table 1).

Class 3 contains the human NRY genes that blur an otherwise sharp bipartition defined by classes 1 and 2. Most prominent among these is the SRY (sex-determining region Y) gene, the master trigger of male embryonic differentiation. The single-copy SRY gene is expressed in the embryonic bipotential gonad, where it initiates the development of the testis, and also in the adult testis. The X carries the SOX3 (SRY-box 3) gene, an active homologue of SRY8, 9. Two other notable class 3 NRY genes are AMELY (amelogenin Y) and PCDHY (protocadherin Y). Unlike the widely expressed class 1 and testis-specific class 2 genes, AMELY and its X homologue, AMELX (amelogenin X), are expressed only in developing tooth buds10. Similarly, PCDHY and its X homologue, PCDHX (protocadherin X), are expressed mainly in the brain11, 12.
The remaining class 3 NRY genes are RBMY (RNA-binding motif protein Y) and VCY (variable charge Y, previously called BPY1), which have features of both classes 1 and 2. Like class 1 genes, they have active X homologues (named RBMX (RNA-binding motif protein X) and VCX (variable charge X), respectively); like class 2 genes, they are expressed from multiple copies, but in the testis only. The single-copy X homologue of RBMY is widely expressed and dosage compensated8, 9, whereas the many X homologues of VCY are expressed only in the testis (and so are inactive in females)13.

Figure 1 | Active genes on the human Y chromosome. Yellow bar, euchromatic portion of the non-recombining region of the Y chromosome (NRY); black bar, heterochromatic portion of the NRY; grey bar, centromere; red bars, pseudoautosomal regions (genes omitted). Genes named to the right of the chromosome have active X-chromosome homologues. Genes named to the left of the chromosome lack known X homologues. Genes in red are widely expressed housekeeping genes; genes in black are expressed in the testis only; and genes in green are expressed neither widely, nor testis specifically (AMELY (amelogenin Y) is expressed in developing tooth buds, whereas PCDHY (protocadherin Y) is expressed in the brain). With the exception of the SRY (sex-determining region Y) gene, all the testis-specific Y genes are multicopy. Some multicopy gene families form dense clusters, the constituent loci of which are indistinguishable at the resolution of this map. Three regions often found deleted in infertile men, AZFa, b, c (azoospermia factor region a, b, c), are indicated.

Table 1 | Classification of human Y-chromosome genes

Converging theoretical and empirical evidence shows how and why the gene content of the NRY reflects the region's distinctive history. Altogether, the three gene classes of the region show markedly limited functional themes, in stark contrast to the genic miscellany of other human chromosomes.
This remarkable functional specialization highlights two evolutionary processes inherent to Ys: genetic decay and the accumulation of genes that specifically benefit male fitness.

Degeneration of the Y chromosome

The mammalian sex chromosomes are thought to have arisen from an ordinary pair of autosomes 300 million years ago14. Until then, ambient temperature during embryonic development might have determined the sex of mammalian ancestors, as in many modern reptiles and other descendants of bony fish15. The foremost sex-chromosome bearers in this CLADE are, notably, birds and mammals, both HOMEOTHERMS, for whom temperature might have ceased to be useful as a signal for developmental switching. In mammals, sex chromosomes probably arose with the differentiation of SRY from its homologue, SOX3, which persists on the mammalian X8, 9. Sequence and expression comparisons indicate that SRY and SOX3 descend from a specific progenitor gene, with the more derived SRY having gained and kept the male-determining function9. The emergence of a dominant and PENETRANT sex-determining allele of the proto-SOX3/SRY gene would have effectively rendered an autosome pair into sex chromosomes, starting a long and dramatic evolutionary process.

Over aeons, the mammalian X and Y diverged, with the gross structure of the X changing remarkably little, while the Y rapidly degenerated1, 14, 16, 17. The rampant attrition of gene activity from evolving Ys has long been noted. In fact, MEROHAPLODIPLOID sex determination (for example, XX:XO) is thought to represent a relatively stable endgame in sex-chromosome evolution18. Potential causes and mechanisms of Y-specific degeneration have drawn heated speculation. Why and how have large X and Y regions stopped recombining with each other? And why might Y genes tend to decay once they stop recombining with their X counterparts?
Recent results indicate that, on the evolutionary lineage leading to humans, the mutually non-recombining portions of the human Xs and Ys greatly expanded several times, each time converting a block of previously freely recombining sequence into X- and Y-specific regions14. The striking similarity in gene order seen among disparate mammalian Xs, compared with the relative scrambling of genes seen among mammalian Ys (Fig. 2), indicates that such coarse blockwise (versus smooth) consolidation of Y-haplotype linkage was probably caused by serial, large-scale inversion of much of the Y itself. Such inversions would have disrupted alignment, and thus recombination, between progressively larger regions of the Xs and Ys. At least four multigene inversions seem to mark the human Y lineage: the first 300 million years ago and the last 30 million years ago14 (Fig. 3). Consolidating linkage across wide swathes of the chromosome, such inversions might have swept to fixation in ancestral populations either by GENETIC DRIFT, or by selection if they bound together alleles that conferred benefit only in the presence of the sex-determining gene. That gene, SRY, seems to have been the first active gene on the Y to cease recombination with the X, as ranked by silent divergence between X and Y homologues14. The history of Y gene rearrangement (as well as gain and loss) varies among mammalian lineages (Fig. 2); such variation will prove phylogenetically informative as more non-human mammalian Y sequences become available.

Figure 2 | Sex chromosomes in mammals. The radiation hybrid maps show a | conservation of locus order in disparate mammalian X chromosomes (cat and human) compared with b | the relative rearrangement of Y chromosomes in the same taxa. A similar comparison of the human Y to those of other primates (omitted for simplicity) reveals more recent taxon-specific rearrangements108. Adapted from Ref. 109.

Figure 3 | Human sex-chromosome evolution.
The figure shows the overall shrinkage of the Y chromosome and the blockwise expansion of its non-recombining region (NRY), probably mediated by serial large-scale inversion as posited by Lahn and Page14. Main events are noted and roughly dated (Myr ago, millions of years ago), with new NRY genes placed in parentheses, and phylogenetic branches indicated by arrows. Blue regions are freely recombining. Yellow regions are X-chromosome specific. Red regions are Y-specific (NRY). The green region represents PCDHX/Y (protocadherin X/Y)-containing sequence that has translocated from the X to the NRY (some other likely translocations are omitted for simplicity). The diagram is not drawn to scale and centromeres are omitted, as their locations are uncertain for many evolutionary stages. (PARp, short arm pseudoautosomal region.)

But why do NRY genes tend to decay? Several models point to their lack of recombination as a key factor. Edmund Wilson, and later Hermann Muller, proposed that the NRY accumulates null alleles because intact X homologues shelter them19, 20; such defunct loci are not selectively purged as they would be if rendered homozygous by recombination. A more general theory by Muller, dubbed "Muller's ratchet" (and extended by Brian Charlesworth and others), holds that, in the face of largely harmful mutations, only recombination can adequately regenerate highly fit alleles (that is, crossover between harmful variants that occupy different sites in a locus can yield a repaired allele)21, 22. William Rice invoked Muller's ratchet in considering tight linkage across multiple loci, not all of which carry beneficial alleles; he gave the name "genetic hitchhiking" to the spread of potentially harmful alleles that are linked to selectively favoured alleles, with a concomitant reduction in local nucleotide diversity23.
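The ratchet-like behaviour described above can be illustrated with a toy simulation. The following is a minimal sketch, not a model taken from the cited work: it assumes a fixed-size, non-recombining population in which each harmful mutation multiplies fitness by a constant factor, with no back mutation. The function name `simulate_ratchet` and every parameter value are hypothetical choices made only for illustration.

```python
import random


def simulate_ratchet(pop_size=200, generations=300, mut_rate=0.3,
                     sel_coeff=0.02, seed=1):
    """Track the least-loaded class in a non-recombining population.

    All parameter values are illustrative assumptions, not estimates
    for any real Y chromosome.
    """
    rng = random.Random(seed)
    loads = [0] * pop_size  # harmful mutations carried by each haplotype
    min_load_history = []
    for _ in range(generations):
        # Selection: each harmful mutation multiplies fitness by (1 - sel_coeff).
        weights = [(1.0 - sel_coeff) ** k for k in loads]
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Mutation: simplified to at most one new harmful mutation per offspring.
        # Without recombination or back mutation, a lineage's load never shrinks.
        loads = [k + (1 if rng.random() < mut_rate else 0) for k in parents]
        min_load_history.append(min(loads))
    return min_load_history


if __name__ == "__main__":
    history = simulate_ratchet()
    print("least-loaded class at generations 1, 100 and 300:",
          history[0], history[99], history[-1])
```

Because offspring inherit at least as many harmful mutations as their parent, the smallest load present in the population can only stay the same or rise from one generation to the next. In this sketch, restoring a lost mutation-free class would require recombination between differently damaged haplotypes, which the model deliberately leaves out.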
Human NRY haplotypes are, as predicted by such models, nearly static, strikingly poor in variation (despite relatively frequent mutation, apparently owing to greater male than female germ-cell turnover in mammals) and greatly eroded in function relative to other genomic regions24. By recombination, such other regions can maintain diverse, highly fit haplotypes that readily spread by selection, thanks to the greater, and thus less genetic-drift-prone, EFFECTIVE POPULATION SIZE of diploid versus haploid regions. Although the details of relevant models spur debate, most evolutionary biologists agree that recombination shuffles alleles so that well-adapted haplotypes can readily replace ill-adapted ones. Indeed, experimentally restricting local recombination in laboratory fruitfly populations has been shown to threaten their long-term genetic integrity25. The functional blight of NRYs might also explain their characteristic shrinkage and/or accumulation of non-essential (perhaps even parasitic) retroviral and heterochromatic sequences.

Many gene-like NRY loci are not expressed in humans, as in many other taxa with XY systems, whereas their X counterparts remain active. This observation reflects the pervasive decay that is associated with overly robust linkage. Nevertheless, a handful of non-recombining homologue pairs remain active on both chromosomes. Bucking the decay trend, these genes attest to the common ancestry of the Xs and Ys. Two alternative scenarios might account for their persistence in the NRY.

Persistence of XY-chromosome homologues

In the first scenario, X-homologous NRY genes might have functions crucial to both sexes. Such genes persist, with little differentiation, if proper development requires their double dosage (two X copies in females, or X and Y copies in males)26. In that case, X and Y homologues should function roughly equivalently, and, to maintain sex-uniform dosage, the former should elude X inactivation.
Class 1 human NRY genes meet these conditions. They and their X homologues encode widely expressed housekeeping proteins, many of which are crucial to viability26. The observed ratio of protein-to-nucleotide divergence between such XY homologues is significantly lower than that for other neighbouring loci, consistent with the idea that selection has conserved the functional similarity of X and Y copies26. Finally, the X homologues of nearly all these class 1 NRY genes elude X inactivation26-28.

In the second scenario, NRY genes persist because they have specialized in male-specific functions, such as somatic masculinization or spermatogenesis. As such, they differ significantly in function from their X homologues (which presumably preserve ancestral functions). An exemplar is SRY, which apparently differentiated from its widely expressed X homologue, SOX3, to gain and maintain a key function in male development8, 9. Another is the testis-specific class 3 gene RBMY, the X homologue of which, RBMX, is expressed in diverse tissues29, 30. Presumably, in both cases, the progenitor of the XY-homologue pair was widely expressed. During subsequent evolution, the X homologue (SOX3 or RBMX) maintained this expression status, whereas the activity of the Y homologue (SRY or RBMY) became testis-specific (and thus male-specific). Other examples of NRY genes that have adopted specialized male functions are reported in the mouse. Three mouse NRY genes, Zfy (zinc-finger protein), Ube1y (ubiquitin-activating enzyme E1) and Usp9y (ubiquitin-specific protease 9), show testis-specific expression, whereas their X homologues are expressed in many other tissues31-33.

Accumulation of spermatogenic genes

Although NRY genes with X homologues clearly attest to ancestral XY homology, the evolutionary origins of class 2 NRY genes (which lack X homologues) are less obvious.
Early clues to the history of these testis-specific genes came from studies of the CDY (chromo-domain protein Y) and DAZ (deleted in azoospermia) genes. Both have specific autosomal PARALOGUES: CDYL (chromodomain protein Y-like) and DAZL (deleted in azoospermia-like), respectively. These autosomal genes are found throughout mammals, whereas CDY and DAZ are found only on primate Ys. These observations indicate that early mammals might have had only DAZL and CDYL, the paralogues of which arose de novo at some point and were maintained on the primate Y lineage26, 34-36. DAZ and DAZL are spliced alike, which indicates that DAZ might have reached the Y by inter-chromosomal transposition of DAZL35. CDY is an intronless version of CDYL, which indicates that CDY might have arisen by retroposition of CDYL mRNA36.

The gain and retention of genes that specifically benefit male fecundity (and promote spermatogenesis in particular) seems to be a global theme in Y evolution. Biologists have long suspected, and sometimes confirmed, the great importance of male-specific chromosomes in spermatogenesis34, 37-45. Male fruitflies that lack a Y, for example, produce no fertile sperm39, 42.

Factors that potentially drive the accumulation of spermatogenic function in Ys have drawn much speculation. Ronald Fisher posited a selective advantage in sequestering, within a male-specific portion of the genome, any genes that benefit males but harm females46. This sexual antagonism model was invoked to account for the Y linkage of ornamentation genes in guppies47 (Fig. 4); these genes probably enhance male attractiveness and fecundity, but would reduce fecundity in female carriers, as female ornamentation increases predation risk without effectively boosting mating chances.

Figure 4 | Example of a Y-chromosome-linked trait. Male (top) and female (bottom) guppies (Poecilia reticulata).
Colourful male ornamentation, which enhances both sexual attractiveness to females and visibility to would-be predators, reflects the expression of Y-chromosome-linked genes. Photo courtesy of N.M.P.

Sexual antagonism might plausibly explain the accumulation of spermatogenic genes on Ys, because such genes clearly benefit males but might harm females. Indeed, women who carry Y fragments are especially prone to gonadoblastoma, a form of ovarian tumour48, 49. However, impairment of female fitness by spermatogenic genes could alternatively be mitigated, potentially at low metabolic cost, by transcriptionally silencing these genes in females, instead of moving them to the Y. This possibility makes the sexual antagonism model less generally compelling.

Accordingly, we invoke an additional argument, "constant selection", to further explain the preferential accumulation on the Y of any spermatogenesis genes that might be nearly neutral in females. Studies in several taxa indicate that genes that drive sperm production evolve unusually rapidly, presumably owing to fierce rivalry among sperm from one or multiple males, whose fecundity tends to vary more than that of females50-52. Under such stringent selection for winning strategies in the race for fertilization, alleles that enhance sperm success might readily spread in a population. Their actual selective advantage, however, is likely to vary with chromosomal linkage. If they are Y linked, such alleles are always favoured, because they are expressed in each generation; if they are autosomal or X linked, they can selectively spread only when male-transmitted: roughly every other generation for autosomal loci and every third generation for X loci. So, generation-invariant selection on spermatogenic genes might intensify the overall selective advantage for their gain, retention and adaptive change on the male-specific NRY.
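The "every other generation" and "every third generation" figures follow from counting chromosome copies in the two sexes. The sketch below works through that arithmetic under stated assumptions: an equal sex ratio, one XX female and one XY male per breeding pair, and an allele that is favoured in males but strictly neutral in females. The names `COPIES`, `male_fraction` and `average_male_selection` are hypothetical and exist only for this illustration.

```python
from fractions import Fraction

# Copies carried by one breeding pair (XX female, XY male), assuming an
# equal sex ratio in the population.
COPIES = {
    "Y": {"female": 0, "male": 1},
    "X": {"female": 2, "male": 1},
    "autosome": {"female": 2, "male": 2},
}


def male_fraction(linkage):
    """Fraction of copies (and so, on average, of generations) spent in males."""
    counts = COPIES[linkage]
    return Fraction(counts["male"], counts["female"] + counts["male"])


def average_male_selection(linkage, s_male):
    """Long-run average advantage of an allele favoured by s_male in males
    and assumed to be neutral in females."""
    return male_fraction(linkage) * s_male


if __name__ == "__main__":
    for linkage in COPIES:
        print(linkage, male_fraction(linkage),
              average_male_selection(linkage, Fraction(1, 100)))
```

Under these assumptions a Y-linked allele is exposed to male-specific selection in every generation, an autosomal allele in half of them and an X-linked allele in one-third of them. The same copy counts show that a breeding pair carries one Y for every three Xs and four copies of each autosome.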
Whether such intensified advantage actually makes allelic fixation significantly more likely on the NRY than elsewhere in the genome remains to be fully modelled. The several-fold lower effective population size of the NRY than of the X or an autosome, for example, might diminish the advantage of constant selection, because small populations allow non-advantageous alleles a greater chance to drift to fixation in place of advantageous alleles53.

Amplification of gene copy number might be a second counter to the decay of NRY genes. Most class 2 genes exist in multiple copies on the Y, although current counts are inexact26. Gene amplification might buffer against harmful mutations: although mutations accumulate to impair the function of single copies, other intact copies might carry out a gene family's spermatogenic duties and, also, seed further amplification. Notably, the great density of long-repeat sequences throughout the NRY might mediate frequent amplification of repeat-flanked genic regions26, 54, 55.

Altogether, NRY genes have two distinct origins and three distinct evolutionary fates. Their origins are: descent from the proto-Y, which was extensively homologous with the X, or specific recruitment to the Y from elsewhere in the genome. The three evolutionary fates of NRY genes are: functional decay, preservation in ancestral (typically housekeeping) form, or specialization in male-specific function.

Despite degeneration, some Ys (for example, that of the fruitfly Drosophila miranda) seem to have ballooned in size through large translocations from autosomes56. Such a translocation apparently occurred in an early placental mammal ancestor, shortly after the placental–MARSUPIAL split57, 58 (Fig. 3). This translocation generated new XY-homologous sequence, which then encountered the factors that drive ongoing XY differentiation.
Recombination was eventually suppressed in much of the new Y-linked portion; most genes in the region then decayed, and their X homologues became subject to inactivation in females.

Y-chromosome genes and disease

A striking feature of the human NRY is that its two largest gene classes correspond to two disorders: Turner syndrome (TS) and male infertility.

Turner syndrome results from a 45,XO karyotype59-61. Most such embryos die in utero, accounting for roughly one-tenth of recognized human foetal deaths. TS is detected in about 1 out of 3,000 human live-births62. Short stature, failure of gonadal development and diverse macroanatomic anomalies typify the syndrome59-61. The TS karyotype can be seen as the lack of either an X, relative to XX females, or a Y, relative to XY males. Recognizing this, Malcolm Ferguson-Smith argued in 1965 that the syndrome reflects the haploinsufficiency of "TS genes", which he predicted would be common to the Xs and Ys and would elude X inactivation60. Class 1 human NRY genes and their X homologues meet these conditions and are considered to be TS candidates. Their widespread expression is consistent with the broad range of symptoms observed in TS patients. Pseudoautosomal genes might also contribute to TS, as they occupy both the Xs and Ys and typically elude female X inactivation2. Indeed, a gene called short stature homeobox (SHOX), identified recently in the freely recombining region of the human sex chromosomes, seems to contribute to the short stature of TS individuals63, 64. The syndrome highlights the crucial importance of the Y in body-wide housekeeping functions and underscores the incompleteness of human Y degeneration. In the mouse, whose Y degeneration seems relatively more advanced, XO individuals reportedly show no salient phenotype.

The second common Y-associated disorder is male infertility. About 1 out of 1,000 human males is infertile, owing to spermatogenic failure65.
Remarkably, newly arisen Y deletions account for 10% of such cases34, 44, which is consistent with a rate of de novo partial Y deletion of at least 10^-4 (that is, 1 in 1,000 multiplied by 10%). Class 2 genes, which are testis-specific in expression and male-specific in the genome, are probably important for spermatogenesis. Deletion mapping in infertile men has defined particular Y regions that are involved in fertility. Three such regions, AZFa, b and c (azoospermia factor region a, b and c), are well characterized (Fig. 1); deletion within any one region might severely impair spermatogenesis34, 43, 44. Among the three, AZFc deletion is by far the most common. The need for an intact Y for spermatogenesis might largely reflect the presence of testis-specific genes in these regions. Still, the possibility cannot be ruled out that the more widely expressed Y genes might also be required for male fertility. For example, lesions of USP9Y (previously known as DFFRY) or DBY (DEAD/H (Asp-Glu-Ala-Asp/His)-box polypeptide Y) genes, which are both widely expressed class 1 genes in AZFa, have been linked to spermatogenic failure66, 67.

In summary, two principal Y-associated disorders reflect the two most salient functional themes of the human Y, again highlighting the two main gene classes therein.

Class 3 genes

The active human NRY genes that fit neither class 1 nor 2 have provoked considerable curiosity, and some functional and phylogenetic inquiry. Five such genes are known; perhaps other putative coding sequences on the NRY will, upon more thorough expression assay in a broad range of tissues, prove to be additional class 3 genes. In general, these genes seem to be in various states of evolutionary limbo. Some (for example, RBMY and SRY) clearly reflect the evolutionary trend of the Y for male-specific fitness and, thus, most resemble class 2 genes; in some rodents, Sry is multicopy68, as are human class 2 genes and RBMY.
Other class 3 genes, especially those that recombined recently, might still decay and join the ranks of evolutionarily informative, if functionally inert, NRY pseudogenes. Some such genes, however, might reflect the influence of additional evolutionary factors at work on the NRY. Here, within the broad context of mammalian Y history, we speculate on potential biological roles and evolutionary histories of the most intriguing class 3 genes.

Amelogenin X/Y genes. Amelogenin proteins aggregate to scaffold the accretion of tooth enamel, which is the most densely mineralized vertebrate tissue69, 70. Placental mammals express these proteins from an X locus and, in some taxa (for example, primate, cat, cow, deer and horse, but not murid or pig), more weakly from a Y locus71-73. In humans, some AMELX (but not AMELY) alleles reportedly segregate with enamel defects, although studies on the X inactivation status of the gene are inconclusive74, 75.

Given the expression profile of amelogenin, its active expression from Ys is puzzling. In the light of basic trends of Y-gene evolution, such conservation might reflect chance long-term persistence or, perhaps, adaptive evolution for some function specifically benefiting males. The latter possibility is particularly intriguing in the human case. Human AMELX and AMELY probably stopped recombining with each other between 30 and 50 million years ago, ample evolutionary time for Y-gene decay, as attested by the fact that all other known human X genes that ceased X–Y recombination during that time now lack active Y homologues14. Moreover, when aligned with one another, human AMELX and AMELY show, in addition to a single-codon gap, the most amino-acid replacements per synonymous nucleotide divergence of known human XY homologues, including those whose Y copies are pseudogenes. Likewise, partially sequenced deer amelogenin homologues show 3 frame-preserving gaps and 11 amino-acid differences, but no synonymous differences76.
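The contrast drawn above between amino-acid replacements and synonymous differences can be made concrete with a crude codon-by-codon tally. The sketch below is an assumption-laden illustration, not the method used in the cited studies: it expects two aligned, gap-free, in-frame sequences in upper-case DNA letters, applies the standard genetic code, and counts differing codons rather than estimating per-site substitution rates. The function name and the two short test sequences are hypothetical and are not real amelogenin sequence.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codon_differences(seq_a, seq_b):
    """Return (replacement, synonymous) counts of differing codons for two
    aligned, gap-free coding sequences of equal length."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, in frame and equal in length")
    replacement = synonymous = 0
    for pos in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[pos:pos + 3], seq_b[pos:pos + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1   # same amino acid, different codon
        else:
            replacement += 1  # amino acid changed
    return replacement, synonymous


if __name__ == "__main__":
    # Toy sequences (hypothetical): CAG->CAA keeps glutamine, AAA->AGA swaps
    # lysine for arginine.
    print(count_codon_differences("CAGCTGAAA", "CAACTGAGA"))
```

A pair of homologues with many replacement differences but few or no synonymous ones, as reported for the deer loci, would give a tally weighted heavily towards the first number. A proper analysis would correct for multiple hits and for the number of available sites of each kind, which this tally does not attempt.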
Such sequence divergence might be more consistent with differential adaptive protein evolution by the homologues than with chance persistence of functionally unconstrained AMELY loci.

If AMELY has persisted by adaptive evolution in the mode of other NRY genes, what male-specific benefit might it confer? Notably, to explain the evolution of genomic imprinting, David Haig, Laurence Hurst and others have modelled sexual antagonism as mediated through differences between maternal and paternal epigenetic regulation of early growth. They posit that promiscuous, nurturing mothers prefer (in the evolutionary sense) equitable offspring growth, whereas fathers prefer resource-intensive offspring growth at the expense of rival-fathered half-siblings77, 78. Imprinting research has largely targeted systemic growth modifiers as candidates for such parental antagonism, but one could also predict localized processes such as mammalian tooth development as relevant to such conflict. Namely, promiscuous mammal mothers might prefer relatively early teething of offspring in order to speed weaning and regain fertility. By contrast, fathers might prefer later teething, relative to other growth, in order to monopolize maternal resources. Indeed, first molar eruption age in HAPLORHINE primates reportedly correlates tightly with both weaning age and the inter-birth interval of the mother79. Furthermore, the delay typical of marsupial primary incisor eruption is widely deemed adaptive for prolonged suckling (K. Smith, personal communication).

Intriguingly, females in many primate populations teethe earlier overall than males80-83 (albeit that females outpace males on other development fronts too). Moreover, there is anecdotal evidence of delayed tooth eruption in XYY males84. Such observations are grossly consistent with Y-linked tooth eruption delay, which might simply reflect systemic sex-differential growth.
Alternatively, might AMELY, acting as a parentally antagonistic gene, delay male tooth eruption in at least some of the taxa that preserve it? Yoh Iwasa, Hurst and others have noted that sex-linkage, like imprinting, can mark alleles by parentage85-87. Although any inhibition of tooth eruption by AMELY would be manifest only in males, a Y harbouring such a parentally antagonistic gene would still be predicted to spread at the expense of other Y variants in some populations, perhaps as defined by the degree of POLYANDRY, distribution of litter size and other factors.

But how might AMELY actively delay teething? Recent work shows that a well-attested short amelogenin splice product might strongly promote bone and/or cartilage growth, rather than enamel formation, indicating a previously unsuspected regulatory function for the gene88. Intriguingly, a splice junction crucial to this product has been eliminated by separate mutations in both the human and cow AMELY loci, leaving them able to encode only the long transcripts generally associated with enamel-forming, but not osteogenic, function (Fig. 5). Notably, regulatory signals from the enamel organ are implicated in the early stages of tooth eruption, which is thought to involve programmatic turnover in local bone and cartilage tissues89. These observations are consistent with, if not clearly supportive of, our speculation that AMELY of some mammals might have diverged in function from AMELX in a manner benefiting males through teething delay.

Figure 5 | Amelogenin gene-splicing patterns. Comparative alignment of cow X- (GenBank accession number M63499), cow Y- (M63500), human X- (M86932) and human Y- (M86933) chromosome-encoded amelogenins, excluding cow alternatively spliced exon 3 for simplicity. Dots indicate identity to cow X-derived sequence; hyphens indicate relative gaps. Purple regions, linked by lines to indicate mRNA splicing, are homologous to a highly osteogenic splice product in rat88.
Blue boxes show inferred parallel mutations in the cow and human Y loci, which destroy an exonic splice site (ancestral CAG glutamine codon) that is crucial to the osteogenic transcript. Green regions (notably excluded from the osteogenic product) are homologous to the glyco-binding motif that is crucial for enamel formation in rat, as reported by Ravindranath et al. in Ref. 110. Yellow sites have known variants associated with human X-linked enamel defects, as in Ref. 75. Note that relative sequence similarities indicate that the cow and human AMELY (amelogenin Y) loci became non-recombining separately after cow–human divergence, consistent with the model posited in Fig. 3.

Rare human Y lineages that lack AMELY have been reported90. In the context of our model, it will be of great interest to learn more about tooth eruption timing in these lineages.

Variable charge X/Y genes. These genes are the only known active human XY homologues that are both expressed exclusively in the testis. They form a large family: two reported Y-linked loci, which encode identical proteins, and roughly a dozen X-linked loci, the protein products of which vary mainly in the tandem iteration of an acidic ten-amino-acid motif present singly in the Y homologues13. The predicted VCX/Y proteins are 125–206 amino acids long, with an invariant highly basic amino-terminal segment. So, with predicted isoelectric points ranging from 4.3 to 9.4, these proteins probably vary greatly in net charge at living pH, prompting their name: variable charge, X and Y13. Human VCX/Y-derived probes hybridize well only in anthropoids, among those mammals assayed. The gene family thus seems to have arisen recently and/or evolved rapidly in the anthropoid lineage13.

The cellular function(s) of VCX and VCY proteins are unknown.
But in size, absolute charge and superficial structural features, VCX and VCY resemble chromatin-associated proteins such as histones and protamines; the latter mediate condensed DNA packaging in sperm52. More strikingly, however, the testis-specific expression, multiple copies of sex-linked homologues, variable motif iteration and phylogenetic novelty of VCX/Y recall the fruitfly X-linked Stellate (Ste) and Y-linked crystal (Su(Ste)) loci91. These genes, confined to Drosophila melanogaster and close relatives, are contentiously viewed as MEIOTIC DRIVE antagonists, with Stellate expression putatively hindering transmission of Y-bearing sperm in a dosage-dependent manner and crystal expression putatively suppressing such bias92-94. Sex-chromosome drive is theoretically predicted to arise readily and is generally well attested in the heterogametic sexes of flies, lepidopterans, birds and mammals95. Such drive, however, carries an unusual cost in skewing the sex ratio; this is predicted to favour the genome-wide emergence of drive modifiers.

Human VCX copies cluster near the Xp telomere, in the X region that most recently ceased to recombine with the Y14. There, two VCX clusters flank the STS (steroid sulphatase) gene96, 97. Deletion-induced STS deficiency, seen mostly in males as the skin anomaly called ichthyosis, might mark VCX-deficient individuals, because whole-gene deletions often reflect unbalanced recombination among flanking VCX repeat clusters13, 98, 99. If VCX acts analogously to Stellate, males should overabound among offspring of VCX-/VCY+ men. Sex-ratio assessment in X-linked ichthyotic pedigrees might, therefore, reveal any resulting meiotic drive. Several such pedigrees are at least partially reported98, 100-102. Interestingly, before knowledge of VCX/Y, there was speculation of male-bias among offspring of ichthyosis-carrier females103 (rather than of affected males, as expected in spermatogenic X versus Y drive).
However, such speculation was disputed on the grounds of male-biased ascertainment and reporting102. Perhaps more concerted study of VCX/ Y will ultimately provide a new window on human sex-linked meiotic drive — a phenomenon so far only cursorily studied103, 104. Protocadherin X/Y genes. The recently characterized hominid PCDHX/Y loci encode protocadherins expressed mainly in the brain11, 12. The X- and Y-derived protein sequences have diverged slightly from one another and show different cellular expression distributions, leading Patricia Blanco et al to suggest that PCDHY might have gained a male-specific function in brain morphogenesis12; the nature of such a hypothetical function is unclear, although large-scale sexual dimorphism of the adult human brain is well attested105. Alternatively, considering that the PCDHY region is thought to have transposed to the Y from the X only 3–4 million years ago106, the gene might simply be in an early stage of functional degeneration. Conclusion Theodosius Dobzhansky's claim that "nothing in biology makes sense except in the light of evolution" is a mantra of the field107. Viewed practically, it might be an overstatement: much coherent insight into the functioning of living systems has been gained without explicitly invoking evolutionary arguments. However, reference to evolution is crucial to a working understanding of Y functionality. As discussed here, gross classification of the genes of the human Y elucidates much of its unusual history. And in turn, such evolutionary insight helps to elucidate the functional ranges of the molecules that those genes encode. Links DATABASE LINKS SYBL1 | HSPRY3 | SRY | SOX3 | AMELY | PCDHY | AMELX | PCDHX | RBMY | VCY | RBMX | VCX | Zfy | Ube1y | Usp9y | CDY | DAZ | CDYL | DAZL | Turner syndrome | male infertility | SHOX | AZF | DBY | Stellate | crystal FURTHER INFORMATION Human Genome Project References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. Bull, J. J. 
Evolution of Sex Determining Mechanisms (Benjamin Cummings, Menlo Park, California, 1983).
2. Rappold, G. A. The pseudoautosomal regions of the human sex chromosomes. Hum. Genet. 92, 315-324 (1993).
3. Graves, J. A. M., Wakefield, M. J. & Toder, R. The origin and evolution of the pseudoautosomal regions of human sex chromosomes. Hum. Mol. Genet. 7, 1991-1996 (1998).
4. Henke, A., Fischer, C. & Rappold, G. A. Genetic map of the human pseudoautosomal region reveals a high rate of recombination in female meiosis at the Xp telomere. Genomics 18, 478-485 (1993).
5. Lien, S., Szyda, J., Schechinger, B., Rappold, G. & Arnheim, N. Evidence for heterogeneity in recombination in the human pseudoautosomal region: high resolution analysis by sperm typing and radiation-hybrid mapping. Am. J. Hum. Genet. 66, 557-566 (2000).
6. Ciccodicola, A. et al. Differentially regulated and evolved genes in the fully sequenced Xq/Yq pseudoautosomal region. Hum. Mol. Genet. 9, 395-401 (2000).
7. Erlandsson, R., Wilson, J. F. & Paabo, S. Sex chromosomal transposable element accumulation and male-driven substitutional evolution in humans. Mol. Biol. Evol. 17, 804-812 (2000).
8. Stevanovic, M., Lovell-Badge, R., Collignon, J. & Goodfellow, P. N. SOX3 is an X-linked gene related to SRY. Hum. Mol. Genet. 2, 2013-2018 (1993).
9. Foster, J. W. & Graves, J. A. An SRY-related sequence on the marsupial X chromosome: implications for the evolution of the mammalian testis-determining gene. Proc. Natl Acad. Sci. USA 91, 1927-1931 (1994). References 8 and 9 together identify SOX3 as the X-chromosome homologue of the male-determining gene SRY, and show that the SRY-SOX3 split probably initiated the X-Y divergence in mammalian ancestors.
10. Salido, E. C., Yen, P.
H., Koprivnikar, K., Yu, L. C. & Shapiro, L. J. The human enamel protein gene amelogenin is expressed from both the X and the Y chromosomes. Am. J. Hum. Genet. 50, 303-316 (1992).
11. Yoshida, K. & Sugano, S. Identification of a novel protocadherin gene (PCDH11) on the human XY homology region in Xq21.3. Genomics 62, 540-543 (1999).
12. Blanco, P., Sargent, C. A., Boucher, C. A., Mitchell, M. & Affara, N. A. Conservation of PCDHX in mammals; expression of human X/Y genes predominantly in brain. Mamm. Genome 11, 906-914 (2000).
13. Lahn, B. T. & Page, D. C. A human sex-chromosomal gene family expressed in male germ cells and encoding variably charged proteins. Hum. Mol. Genet. 9, 311-319 (2000).
14. Lahn, B. T. & Page, D. C. Four evolutionary strata on the human X chromosome. Science 286, 964-967 (1999). This paper shows that X and Y chromosomes in the human lineage ceased to recombine with each other in progressive blocks during evolution.
15. Korpelainen, H. Sex ratios and conditions required for environmental sex determination in animals. Biol. Rev. Camb. Phil. Soc. 65, 147-184 (1990).
16. Barlow, D. P. Imprinting: a gamete's point of view. Trends Genet. 10, 194-199 (1994).
17. Graves, J. A. The origin and function of the mammalian Y chromosome and Y-borne genes -- an evolving understanding. Bioessays 17, 311-320 (1995).
18. Rice, W. R. Evolution of the Y sex chromosome in animals. BioScience 46, 331-343 (1996). A highly readable synopsis of how the lack of recombination can foster functional decay of the Y chromosome, including a thorough explanation of concepts such as Muller's ratchet.
19. Wilson, E. B. Studies on chromosomes. III.
The sexual difference of chromosome-groups in Hemiptera, with some consideration on the determination and inheritance of sex. J. Exp. Zool. 2, 507-545 (1906).
20. Muller, H. J. A gene for the fourth chromosome of Drosophila. J. Exp. Zool. 17, 325-336 (1914).
21. Muller, H. J. The relation of recombination to mutational advance. Mutat. Res. 1, 2-9 (1964).
22. Charlesworth, B. Model for evolution of Y chromosomes and dosage compensation. Proc. Natl Acad. Sci. USA 75, 5618-5622 (1978).
23. Rice, W. R. Genetic hitchhiking and the evolution of reduced genetic activity of the Y sex chromosome. Genetics 116, 161-167 (1987).
24. Shen, P. et al. Population genetic implications from sequence variation in four Y chromosome genes. Proc. Natl Acad. Sci. USA 97, 7354-7359 (2000). This paper, from a research group which has made considerable use of non-genic non-recombining Y region (NRY) sequence variation to infer aspects of human population history, summarizes the current understanding of human Y-chromosome population genetics using new data on sequence variation in NRY genes themselves.
25. Rice, W. R. Degeneration of a nonrecombining chromosome. Science 263, 230-232 (1994). Experimental evidence that the suppression of recombination is detrimental to the functional integrity of genes in diploid organisms.
26. Lahn, B. T. & Page, D. C. Functional coherence of the human Y chromosome. Science 278, 675-680 (1997). Genes in the non-recombining region of the human Y chromosome conform to a small number of functional themes.
27. Jegalian, K. & Page, D. C. A proposed path by which genes common to mammalian X and Y chromosomes evolve to become X inactivated. Nature 394, 776-780 (1998).
The degeneration of genes on the Y chromosome is probably what drives the corresponding X-chromosome homologues to become subject to X-chromosome inactivation.
28. Brown, C. J., Carrel, L. & Willard, H. F. Expression of genes from the human active and inactive X chromosomes. Am. J. Hum. Genet. 60, 1333-1343 (1997).
29. Delbridge, M. L., Lingenfelter, P. A., Disteche, C. M. & Graves, J. A. M. The candidate spermatogenesis gene RBMY has a homologue on the human X chromosome. Nature Genet. 22, 223-224 (1999).
30. Mazeyrat, S., Saut, N., Mattei, M. & Mitchell, M. J. RBMY evolved on the Y chromosome from a ubiquitously transcribed X-Y identical gene. Nature Genet. 22, 224-226 (1999). References 29 and 30 show that the testis-specific human Y-chromosome gene RBMY evolved from a widely expressed X-Y-chromosome homologous gene, demonstrating that Y-chromosome genes can acquire a testis-specific expression pattern during evolution.
31. Mardon, G. & Page, D. C. The sex-determining region of the mouse Y chromosome encodes a protein with a highly acidic domain and 13 zinc fingers. Cell 56, 765-770 (1989).
32. Odorisio, T., Mahadevaiah, S. K., McCarrey, J. R. & Burgoyne, P. S. Transcriptional analysis of the candidate spermatogenesis gene Ube1y and of the closely related Ube1x shows that they are co-expressed in spermatogonia and spermatids but are repressed in pachytene spermatocytes. Dev. Biol. 180, 336-343 (1996).
33. Brown, G. M. et al. Characterisation of the coding sequence and fine mapping of the human DFFRY gene and comparative expression analysis and mapping to the Sxrb interval of the mouse Y chromosome of the Dffry gene. Hum. Mol. Genet. 7, 97-107 (1998).
34. Reijo, R. et al.
Diverse spermatogenic defects in humans caused by Y chromosome deletions encompassing a novel RNA-binding protein gene. Nature Genet. 10, 383-393 (1995).
35. Saxena, R. et al. The DAZ gene cluster on the human Y chromosome arose from an autosomal gene that was transposed, repeatedly amplified and pruned. Nature Genet. 14, 292-299 (1996).
36. Lahn, B. T. & Page, D. C. Retroposition of autosomal mRNA yielded testis-specific gene family on human Y chromosome. Nature Genet. 21, 429-433 (1999). References 35 and 36 show that some testis-specific genes on the human Y chromosome are transposed copies of autosomal genes.
37. Bridges, C. B. Non-disjunction as proof of the chromosome theory of heredity (concluded). Genetics 1, 107-163 (1916).
38. Tiepolo, L. & Zuffardi, O. Localization of factors controlling spermatogenesis in the nonfluorescent portion of the human Y chromosome long arm. Hum. Genet. 34, 119-124 (1976).
39. Hardy, R. W., Tokuyasu, K. T. & Lindsley, D. L. Analysis of spermatogenesis in Drosophila melanogaster bearing deletions for Y-chromosome fertility genes. Chromosoma 83, 593-617 (1981).
40. Levy, E. R. & Burgoyne, P. S. The fate of XO germ cells in the testes of XO/XY and XO/XY/XYY mouse mosaics: evidence for a spermatogenesis gene on the mouse Y chromosome. Cytogenet. Cell Genet. 42, 208-213 (1986).
41. Bishop, C. E. Mouse Y chromosome. Mamm. Genome 3, S289-S293 (1992).
42. Gepner, J. & Hays, T. S. A fertility region on the Y chromosome of Drosophila melanogaster encodes a dynein microtubule motor. Proc. Natl Acad. Sci. USA 90, 11132-11136 (1993).
43. Ma, K. et al.
A Y chromosome gene family with RNA-binding protein homology: candidates for the azoospermia factor AZF controlling human spermatogenesis. Cell 75, 1287-1295 (1993).
44. Vogt, P. H. et al. Human Y chromosome azoospermia factor (AZF) mapped to different subregions in Yq11. Hum. Mol. Genet. 5, 933-943 (1996). This paper describes several regions of the human Y chromosome that are critically involved in spermatogenesis.
45. Zhang, P. & Stankiewicz, R. L. Y-Linked male sterile mutations induced by P element in Drosophila melanogaster. Genetics 150, 735-744 (1998).
46. Fisher, R. A. The evolution of dominance. Biol. Rev. 6, 345-368 (1931).
47. Brooks, R. Negative genetic correlation between male sexual attractiveness and survival. Nature 406, 67-70 (2000).
48. Salo, P. et al. Molecular mapping of the putative gonadoblastoma locus on the Y chromosome. Genes Chromosomes Cancer 14, 210-214 (1995).
49. Tsuchiya, K., Reijo, R., Page, D. C. & Disteche, C. M. Gonadoblastoma: molecular definition of the susceptibility region on the Y chromosome. Am. J. Hum. Genet. 57, 1400-1407 (1995).
50. Metz, E. C. & Palumbi, S. R. Positive selection and sequence rearrangements generate extensive polymorphism in the gamete recognition protein bindin. Mol. Biol. Evol. 13, 397-406 (1996).
51. Ting, C. T., Tsaur, S. C., Wu, M. L. & Wu, C. I. A rapidly evolving homeobox at the site of a hybrid sterility gene. Science 282, 1501-1504 (1998).
52. Wyckoff, G. J., Wang, W. & Wu, C. I. Rapid evolution of male reproductive genes in the descent of man. Nature 403, 304-309 (2000). References 50-52 provide intriguing evidence from diverse taxa that male reproductive proteins might evolve relatively rapidly compared with most other proteins.
53. Gillespie, J.
H. Population Genetics: A Concise Guide (Johns Hopkins Univ. Press, Baltimore, Maryland, 1998).
54. Foote, S., Vollrath, D., Hilton, A. & Page, D. C. The human Y chromosome: overlapping DNA clones spanning the euchromatic region. Science 258, 60-66 (1992).
55. Vollrath, D. et al. The human Y chromosome: a 43-interval map based on naturally occurring deletions. Science 258, 52-59 (1992).
56. Yi, S. & Charlesworth, B. Contrasting patterns of molecular evolution of the genes on the new and old sex chromosomes of Drosophila miranda. Mol. Biol. Evol. 17, 703-717 (2000).
57. Spencer, J. A., Sinclair, A. H., Watson, J. M. & Graves, J. A. Genes on the short arm of the human X chromosome are not shared with the marsupial X. Genomics 11, 339-345 (1991).
58. Watson, J. M., Spencer, J. A., Riggs, A. D. & Graves, J. A. Sex chromosome evolution: platypus gene mapping suggests that part of the human X chromosome was originally autosomal. Proc. Natl Acad. Sci. USA 88, 11256-11260 (1991). This paper shows that there was a large translocation from autosome to sex chromosome in an ancestor of placental mammals, resulting in great enlargement of the placental sex chromosomes.
59. Turner, H. H. A syndrome of infantilism, congenital webbed neck, and cubitus valgus. Endocrinology 23, 566-574 (1938).
60. Ferguson-Smith, M. A. Karyotype-phenotype correlations in gonadal dysgenesis and their bearing on the pathogenesis of malformations. J. Med. Genet. 2, 142-155 (1965). A classic paper that puts forward the hypothesis that Turner syndrome is due to haploinsufficiency of XY-chromosome-common genes.
61. Zinn, A. R., Page, D. C. & Fisher, E. M. Turner syndrome: the case of the missing sex chromosome. Trends Genet. 9, 90-93 (1993).
62. Hook, E. B. & Warburton, D.
The distribution of chromosomal genotypes associated with Turner's syndrome: livebirth prevalence rates and evidence for diminished fetal mortality and severity in genotypes associated with structural X abnormalities or mosaicism. Hum. Genet. 64, 24-27 (1983).
63. Rao, E. et al. Pseudoautosomal deletions encompassing a novel homeobox gene cause growth failure in idiopathic short stature and Turner syndrome. Nature Genet. 16, 54-63 (1997).
64. Ellison, J. W. et al. PHOG, a candidate gene for involvement in the short stature of Turner syndrome. Hum. Mol. Genet. 6, 1341-1347 (1997).
65. Hull, M. G. et al. Population study of causes, treatment, and outcome of infertility. Br. Med. J. (Clin. Res. Ed.) 291, 1693-1697 (1985).
66. Sun, C. et al. An azoospermic man with a de novo point mutation in the Y-chromosomal gene USP9Y. Nature Genet. 23, 429-432 (1999).
67. Foresta, C., Ferlin, A. & Moro, E. Deletion and expression analysis of AZFa genes on the human Y chromosome revealed a major role for DBY in male infertility. Hum. Mol. Genet. 9, 1161-1169 (2000).
68. Bullejos, M., Sanchez, A., Burgos, M., Jimenez, R. & Diaz De La Guardia, R. Multiple mono- and polymorphic Y-linked copies of the SRY HMG-box in microtidae. Cytogenet. Cell Genet. 86, 46-50 (1999).
69. Moradian-Oldak, J. et al. A review of the aggregation properties of a recombinant amelogenin. Connect. Tissue Res. 32, 125-130 (1995).
70. Fincham, A. G., Moradian-Oldak, J. & Simmer, J. P. The structural biology of the developing dental enamel matrix. J. Struct. Biol. 126, 270-299 (1999).
71. Nakahori, Y., Takenaka, O. & Nakagome, Y. A human X-Y homologous region encodes 'amelogenin'. Genomics 9, 264-269 (1991).
72. Watson, J.
M., Spencer, J. A., Graves, J. A., Snead, M. L. & Lau, E. C. Autosomal localization of the amelogenin gene in monotremes and marsupials: implications for mammalian sex chromosome evolution. Genomics 14, 785-789 (1992).
73. Samonte, R. V., Conte, R. A. & Verma, R. S. Molecular phylogenetics of the hominoid Y chromosome. J. Hum. Genet. 43, 185-186 (1998).
74. Alvesalo, L. Sex chromosomes and human growth. A dental approach. Hum. Genet. 101, 1-5 (1997).
75. Ravassipour, D. B. et al. Unique enamel phenotype associated with amelogenin gene (AMELX) codon 41 point mutation. J. Dent. Res. 79, 1476-1481 (2000).
76. Yamauchi, K. et al. Sex determination based on fecal DNA analysis of the amelogenin gene in sika deer (Cervus nippon). J. Vet. Med. Sci. 62, 669-671 (2000).
77. Haig, D. Intragenomic conflict and the evolution of eusociality. J. Theor. Biol. 156, 401-403 (1992).
78. Hurst, L. D. Is multiple paternity necessary for the evolution of genomic imprinting? Genetics 153, 509-512 (1999). References 77 and 78 represent, respectively, a statement of the original sexual antagonism model of genomic imprinting, and thoughtful secondary theoretical work exploring that model's assumptions and implications.
79. Lee, P. C. in Comparative Primate Socioecology (ed. Lee, P. C.) (Cambridge Univ. Press, Cambridge, UK, 1999).
80. Mooney, M. P., Siegel, M. I., Eichberg, J. W., Lee, D. R. & Swan, J. Deciduous dentition eruption sequence of the laboratory-reared chimpanzee (Pan troglodytes). J. Med. Primatol. 20, 138-139 (1991).
81. Kuykendall, K. L., Mahoney, C. J. & Conroy, G. C. Probit and survival analysis of tooth emergence ages in a mixed-longitudinal sample of chimpanzees (Pan troglodytes). Am. J.
Phys. Anthropol. 89, 379-399 (1992).
82. Kaul, S. S., Pathak, R. K. & Santosh. Emergence of deciduous teeth in Punjabi children, north India. Z. Morphol. Anthropol. 79, 25-34 (1992).
83. Rajic, Z., Rajic Mestrovic, S. & Vukusic, N. Chronology, dynamics and period of primary tooth eruption in children from Zagreb, Croatia. Coll. Antropol. 23, 659-663 (1999).
84. Alvesalo, L., Osborne, R. H. & Kari, M. The 47,XYY male, Y chromosome, and tooth size. Am. J. Hum. Genet. 27, 53-61 (1975).
85. Iwasa, Y. & Pomiankowski, A. Sex specific X chromosome expression caused by genomic imprinting. J. Theor. Biol. 197, 487-495 (1999).
86. Hurst, L. D. Embryonic growth and the evolution of the mammalian Y chromosome. I. The Y as an attractor for selfish growth factors. Heredity 73, 223-232 (1994).
87. Hurst, L. D. Embryonic growth and the evolution of the mammalian Y chromosome. II. Suppression of selfish Y-linked growth factors may explain escape from X-inactivation and rapid evolution of Sry. Heredity 73, 233-243 (1994).
88. Veis, A. et al. Specific amelogenin gene splice products have signaling effects on cells in culture and in implants in vivo. J. Biol. Chem. 275, 41263-41272 (2000).
89. Marks, S. C. Jr. The basic and applied biology of tooth eruption. Connect. Tissue Res. 32, 149-157 (1995).
90. Santos, F. R., Pandya, A. & Tyler-Smith, C. Reliability of DNA-based sex tests. Nature Genet. 18, 103 (1998).
91. Kogan, G. L., Epstein, V. N., Aravin, A. A. & Gvozdev, V. A. Molecular evolution of two paralogous tandemly repeated heterochromatic gene clusters linked to the X and Y chromosomes of Drosophila melanogaster. Mol. Biol. Evol. 17, 697-702 (2000).
92. Hurst, L.
D. Is Stellate a relict meiotic driver? Genetics 130, 229-230 (1992).
93. Palumbo, G., Bonaccorsi, S., Robbins, L. G. & Pimpinelli, S. Genetic analysis of Stellate elements of Drosophila melanogaster. Genetics 138, 1181-1197 (1994).
94. Hurst, L. D. Further evidence consistent with Stellate's involvement in meiotic drive. Genetics 142, 641-643 (1996).
95. Hurst, L. D. Evolution. Sex, slime and selfish genes. Nature 354, 23-24 (1991).
96. Yen, P. H. et al. Cloning and expression of steroid sulfatase cDNA and the frequent occurrence of deletions in STS deficiency: implications for X-Y interchange. Cell 49, 443-454 (1987).
97. del Castillo, I., Cohen-Salmon, M., Blanchard, S., Lutfalla, G. & Petit, C. Structure of the X-linked Kallmann syndrome gene and its homologous pseudogene on the Y chromosome. Nature Genet. 2, 305-310 (1992).
98. Ballabio, A. et al. Deletions of the steroid sulphatase gene in 'classical' X-linked ichthyosis and in X-linked ichthyosis associated with Kallmann syndrome. Hum. Genet. 77, 338-341 (1987).
99. Li, X. M., Yen, P. H. & Shapiro, L. J. Characterization of a low copy repetitive element S232 involved in the generation of frequent deletions of the distal short arm of the human X chromosome. Nucleic Acids Res. 20, 1117-1122 (1992).
100. Filippi, G. & Meera Khan, P. Linkage studies on X-linked ichthyosis in Sardinia. Am. J. Hum. Genet. 20, 564-569 (1968).
101. Went, L. N., De Groot, W. P., Sanger, R., Tippett, P. & Gavin, J. X-linked ichthyosis: linkage relationship with the Xg blood groups and other studies in a large Dutch kindred. Ann. Hum. Genet. 32, 333-345 (1969).
102. Adam, A. Sex ratio in families with X-linked ichthyosis. Am. J. Hum. Genet. 32, 763-764 (1980).
103. Gladstien, K., Shapiro, L. J. & Spence, M. A. Estimating sex ratio biases in X-linked disorders: is there an excess of males in families with X-linked ichthyosis? Am. J. Hum. Genet. 31, 741-746 (1979).
104. Naumova, A. & Sapienza, C. The genetics of retinoblastoma, revisited. Am. J. Hum. Genet. 54, 264-273 (1994).
105. Nopoulos, P., Flaum, M., O'Leary, D. & Andreasen, N. C. Sexual dimorphism in the human brain: evaluation of tissue volume, tissue composition and surface anatomy using magnetic resonance imaging. Psychiatry Res. 98, 1-13 (2000).
106. Schwartz, A. et al. Reconstructing hominid Y evolution: X-homologous block, created by X-Y transposition, was disrupted by Yp inversion through LINE-LINE recombination. Hum. Mol. Genet. 7, 1-11 (1998).
107. Dobzhansky, T. G. Genetic Diversity and Human Equality (Basic Books, New York, New York, 1973).
108. Archidiacono, N. et al. Evolution of chromosome Y in primates. Chromosoma 107, 241-246 (1998).
109. Murphy, W. J. et al. Extensive conservation of sex chromosome organization between cat and human revealed by parallel radiation hybrid mapping. Genome Res. 9, 1223-1230 (1999).
110. Ravindranath, R. M., Tam, W. Y., Nguyen, P. & Fincham, A. G. The enamel protein amelogenin binds to the N-acetyl-D-glucosamine-mimicking peptide motif of cytokeratins. J. Biol. Chem. 275, 39654-39661 (2000).
Acknowledgements
We thank B. Charlesworth, S. Dorus, R. Hudson, M. Kreitman, E. Stahl, A. Veis, G. Wyckoff and S. Yi for stimulating discussion; G. Wyckoff for computational support; and C. Andrews and J. Socha for help with the guppies.
Glossary
BIPOTENTIAL GONAD The last embryonic tissue precursor that can differentiate into either the ovary or the testis.
CLADE An organismal lineage comprising an ancestor and all its descendants.
DIOECIOUS Having separate male and female organisms.
EFFECTIVE POPULATION SIZE (Ne) The theoretical number of organisms or copies of a locus for which the genetic variation in a given sample of the organisms or copies can be explained solely by mutation and genetic drift; Ne is related to, but never exceeds, the actual population size (N).
GENETIC DRIFT The random fluctuation of allele frequencies across generations in a finite population.
HAPLORHINE A member of the clade comprising apes, monkeys and tarsiers only.
HOMEOTHERM An organism that uses cellular metabolism specifically to stabilize its own body temperature.
MARSUPIAL Non-placental mammal whose liveborn young suckle in maternal pouches.
MEIOTIC DRIVE Preferential transmission of one gamete genotype over another genotype, in which the genotypes in question might derive from the same meiosis.
MEROHAPLODIPLOID Characterized by one sex lacking part, but less than half, of the diploid chromosome set typical of the other sex.
PARALOGUE A locus that is homologous to another within the same haploid genome.
PENETRANCE The frequency of affected individuals among the carriers of a particular genotype.
POLYANDRY A population mating structure in which a female might mate with multiple males during her lifetime.

30 May 2002
Nature 417, 559-563 (2002); doi:10.1038/nature751
Nature AOP, published online 12 May 2002
DMY is a Y-specific DM-domain gene required for male development in the medaka fish
MASARU MATSUDA*, YOSHITAKA NAGAHAMA*, AI SHINOMIYA†, TADASHI SATO†, CHIKA MATSUDA*, TOHRU KOBAYASHI*, CRAIG E.
MORREY*, NAOKI SHIBATA‡, SHUICHI ASAKAWA§, NOBUYOSHI SHIMIZU§, HIROSHI HORI‖, SATOSHI HAMAGUCHI† & MITSURU SAKAIZUMI†
* Laboratory of Reproductive Biology, National Institute for Basic Biology, Okazaki 444-8585, Japan
† Graduate School of Science and Technology, Niigata University, Ikarashi, Niigata 950-2181, Japan
‡ Department of Biology, Faculty of Science, Shinshu University, Asahi 3-1-1, Matsumoto, Nagano 390-8621, Japan
§ Department of Molecular Biology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo 160-8582, Japan
‖ Graduate School of Science, Nagoya University, Chikusa-ku, Nagoya 464-8602, Japan
Correspondence and requests for materials should be addressed to Y.N. (e-mail: nagahama@nibb.ac.jp). The DNA Data Bank of Japan (DDBJ) accession number of the medaka DMY cDNA sequence is AB071534.
Although the sex-determining gene Sry has been identified in mammals1, no comparable genes have been found in non-mammalian vertebrates. Here, we used recombinant breakpoint analysis to restrict the sex-determining region in medaka fish (Oryzias latipes) to a 530-kilobase (kb) stretch of the Y chromosome. Deletion analysis of the Y chromosome of a congenic XY female further shortened the region to 250 kb. Shotgun sequencing of this region predicted 27 genes. Three of these genes were expressed during sexual differentiation. However, only the DM-related2 PG17 was Y specific; we thus named it DMY. Two naturally occurring mutations establish DMY's critical role in male development. The first heritable mutant, a single insertion in exon 3 and the subsequent truncation of DMY, resulted in all XY female offspring. Similarly, the second XY mutant female showed reduced DMY expression with a high proportion of XY female offspring. During normal development, DMY is expressed only in somatic cells of XY gonads.
These findings strongly suggest that the sex-specific DMY is required for testicular development and is a prime candidate for the medaka sex-determining gene.
The medaka has two major advantages for genetic research: a large genetic diversity within the species3, 4 and the existence of several inbred strains5. As in mammals, sex determination in medaka is male heterogametic6, although the Y chromosome is not cytogenetically distinct7. Alteration of phenotypic sex with no reproductive consequences, and recombination over the entire sex chromosome pair8-10, suggest that there are no major differences, other than a sex-determining gene, between the X and Y chromosomes.
To positionally clone the sex-determining region, we generated a Y congenic strain to highlight the genetic differences between the X and Y chromosomes from inbred strains of medaka11. The Y congenic strain has a sex-determining region derived from the HNI-strain Y chromosome on the genetic background of an Hd-rR strain. In this congenic strain, the wild-type allele (R) of the r locus (a sex-linked pigment gene) is located only on the Y chromosome. Therefore, the female XrXr results in a white body colour, and the male XrYR results in an orange-red body colour. Using this strain, we had previously constructed a genetic map of the medaka sex chromosome9.
In this study, we first performed chromosome walking using several recombinants to map the sex-determining region of the Y congenic strain (Fig. 1a). Two recombinants (Hd-rR.YHNI(R1) and (R2)) between the sex-determining (SD) locus and a sex-linked marker (SL1) (centromere side of SD) were obtained from an oestrogen-induced XY female of the Y congenic strain9. To obtain recombinants between SD and r (a body-colour gene), we rescreened progeny from an oestrogen-induced XY female of the recombinant Hd-rR.YHNI(R1) strain. We subsequently found one white male (Hd-rR.YHNIrr) and one orange-red female, which were recombinants between the SD and r loci.
After confirming their genotypes from fin clippings, these recombinants were maintained as strains.
Figure 1 Positional cloning strategy of the sex-determining region and subsequent identification of PG17/DMY.
For chromosome walking from SL1, we constructed a bacterial artificial chromosome (BAC) genomic library from the Y congenic strain. By this approach, we obtained 47 sex-linked BAC clones, sequenced their end fragments, and designed polymerase chain reaction (PCR) primers for sequence tag site (STS) markers on the Y chromosome. Analyses of recombinant genotypes with these STS markers indicated that the sex-determining region was located between 135D12.F and 51H7.F (Fig. 1a). This stretch of the Y chromosome was encompassed by four BAC clones (Fig. 1b), although 19 other BAC clones included some portion of this region.
We then determined the location of the sex-determining region on the Y chromosome by fluorescence in situ hybridization (FISH) using one of the BAC clones as a probe. The sex-determining region is located on the centromere side of the long arms of the sex chromosomes (Fig. 1c, d). Centromere and DNA marker locations on the sex chromosomes, determined using triploid hybrid12 and gynogenetic diploids10, confirmed the findings of the FISH analysis.
We used shotgun sequencing to determine the sequence covered by the four BAC clones. The entire sequence of the two centromere-side BAC clones (mCON089P3 and mCON144M14) was determined; however, the remaining two, telomere-side, BAC clones (mCON104P2 and mCON137M1) could not be completely sequenced, mainly owing to numerous repetitive sequences. Consequently, we sequenced 422,202 nucleotides and estimated that the four BAC clones covered about 530 kb (Fig. 1b). The gene-predicting program Genscan (Version 1.0, http://genes.mit.edu/GENSCAN.html) predicted 52 genes in this region.
We also found an orange-red female in our congenic progeny (Fig. 2).
Mating this female with a sex-reversed (androgen-induced) XX Hd-rR male resulted in all female offspring with a 50:50 (white:orange-red) body colour ratio. Assuming this female's X chromosome was derived from a recombination event between the SD and r loci, sex-linked DNA markers flanking the SD locus should be homozygous (Hd-rR/Hd-rR type). Instead, the markers (for example, SL1 and 51H7.F) flanking the SD were heterozygous (Hd-rR/HNI type), indicating that this particular orange-red female had the congenic sex-determining region on the Y chromosome (Fig. 2b), a conclusion confirmed by other DNA markers within the sex-determining region. From these observations, this XY female was determined to lack 250 kb within the sex-determining region of the Y chromosome (Fig. 1e). Consequently, the corresponding region of a normal Y chromosome should contain the sex-determining gene.
Figure 2 Characteristics of medaka lacking a part of the Y chromosome.
Genscan analysis of the deleted 250-kb region predicted 27 genes. To identify which predicted genes were expressed, we designed specific primers for each, and examined their expression in medaka embryos during sexual differentiation (before hatching to 10 days after hatching, d.a.h.). PCR with reverse transcription (RT–PCR) detected expression of three of these 27 genes in embryos (Fig. 1f). Furthermore, only one of these three genes, PG17 (predicted gene 17)/DMY, was expressed exclusively in XY embryos. In contrast, the remaining two genes (PG21, PG30) were expressed in both XY and XX embryos. BAC clones derived from the X and Y chromosomes confirmed that PG17 is specific to the Y chromosome whereas the other two genes are present on both the X and Y chromosomes.
The full-length complementary DNA sequence (1,320 base pairs) of DMY was obtained by 5' and 3' rapid amplification of cDNA ends (RACE).
The longest open reading frame spans six exons and encodes a putative protein of 267 amino acids, including the highly conserved DM domain (Fig. 1g, h). The DM domain was originally described as a DNA-binding motif shared between doublesex (dsx) in Drosophila melanogaster and mab-3 in Caenorhabditis elegans (ref. 2). Since the initial characterizations of dsx and mab-3, DM-related genes have been identified from virtually all species examined, including medaka (refs 13-19). In vertebrate species, DMRT1 (DM-related transcription factor 1), the DM-related gene most homologous to DMY (about 80%; data not shown), correlates with male development (refs 13-18, 20). Combined with its Y chromosome specificity, this finding suggests that DMY is pivotal in testicular differentiation.

To establish a role for DMY during sexual differentiation, we screened wild medaka populations for naturally occurring DMY mutants. Two XY females with distinct mutations in DMY were found in separate populations (Awara and Shirone). The XYwAwr mutant from Awara contains a single nucleotide insertion in exon 3 of DMY. Although the DM domain remains intact, this insertion causes a frame shift from residue 110 and premature termination at residue 139 (Fig. 1h). Offspring obtained by mating the XYwAwr female with an Hd-rR male (XY, DMY on Y chromosome) showed the typical mendelian genotype combinations (XX, XY, XYwAwr, YYwAwr); their phenotypes, however, were female, male, female and male, respectively (Table 1). Despite the presence of pseudo-DMY messenger RNA (which does not encode full-length DMY) in offspring containing the PG17wAwr mutant allele (Fig. 1h), the absence of about two-thirds of the protein presumably renders DMY nonfunctional, thus resulting in XY sex reversal (female phenotype). Although the mechanism apparently differs, the second mutant also results in XY sex reversal.
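The XYwAwr cross described above can be enumerated as a small Punnett square. The following is only an illustrative sketch: the gamete labels (`X`, `Y`, `Y_wAwr`) and the rule that a single wild-type Y supplies functional DMY are simplifying assumptions made for the example, not part of the original analysis.

```python
from itertools import product

# Gametes produced by each parent; labels are hypothetical shorthand.
# "Y_wAwr" stands for the Y chromosome carrying the frameshifted DMY allele.
mother = ["X", "Y_wAwr"]  # XYwAwr female (sex-reversed)
father = ["X", "Y"]       # Hd-rR male with wild-type DMY on the Y

# Assumption: only the wild-type Y provides functional DMY.
FUNCTIONAL_DMY = {"Y"}


def phenotype(genotype):
    """Return 'male' if any chromosome carries functional DMY, else 'female'."""
    return "male" if any(c in FUNCTIONAL_DMY for c in genotype) else "female"


def cross(mother_gametes, father_gametes):
    """Enumerate offspring genotypes and their predicted phenotypes."""
    offspring = {}
    for egg, sperm in product(mother_gametes, father_gametes):
        genotype = tuple(sorted((egg, sperm)))
        offspring[genotype] = phenotype(genotype)
    return offspring


result = cross(mother, father)
for genotype, sex in sorted(result.items()):
    print("/".join(genotype), "->", sex)
```

Under these assumptions the four genotype classes map to female, male, female and male, the same pattern reported for XX, XY, XYwAwr and YYwAwr offspring.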
The entire DMY coding region of the XYwSrn mutant is intact; however, an unknown transcriptional anomaly severely depresses or eliminates DMY expression in embryos (Fig. 3). In addition to the original XYwSrn female mutant, 60% of XYwSrn offspring developed as phenotypic females (Table 1), suggesting that a threshold level of DMY expression is required for male development. Taken together, the loss-of-function mutant (XYwAwr) and the depressed expression mutant (XYwSrn) strongly suggest that DMY is required for normal testicular differentiation.

Figure 3 cDNA PCR of progeny from XYwAwr and XYwSrn female × Hd-rR male crosses.

To confirm the role of DMY during normal development, in situ hybridization was used to determine its spatial expression during gonadal differentiation. Because DMRT1 and DMY are very similar, the in situ hybridization probe was not able to discriminate between the two mRNAs. We therefore determined the temporal expression of both DMY and DMRT1 during sexual differentiation using specific RT–PCR. At hatching and 5 d.a.h., DMY mRNA was present in XY embryos, but not in XX embryos. In contrast, there was no DMRT1 expression in either XX or XY embryos at either of these time points (data not shown). Because expression of DMY and DMRT1 does not appear to overlap during this period, we assumed that the in situ hybridization signal represented DMY expression. As predicted from our PCR studies, DMY signal was detected only in the somatic cells surrounding germ cells in XY embryos (Fig. 4). Although its function in the pre-Sertoli cells remains unclear, these data further indicate that DMY has a critical role in testicular differentiation.

Figure 4 PG17/DMY mRNA expression in fry gonads shown by digoxigenin-labelled in situ hybridization of larval sections.
Although evidence of sufficiency is required to definitively identify DMY as a sex-determining gene, we have shown that it is necessary for normal male development and falls within the sex-determining region of the Y chromosome. Given the absence of other reasonable candidate genes within the region, DMY is certainly the leading candidate for the medaka sex-determining gene. Interestingly, phylogenetic analyses indicate that DMY was probably derived from DMRT1, suggesting an evolutionary pattern similar to one of the proposed origins of Sry. Sry is thought to have arisen either from an autosomal Sox gene duplication event and subsequent formation of the Y chromosome or, more probably, by divergence from Sox3 after formation of the sex (X) chromosomes (ref. 21). Regardless, the linkage of Sry to the Y chromosome renders the entire male pathway dependent on Sry. Further evidence concerning the ability of DMY to trigger male development and its evolutionary relationship to DMRT1 is needed to confirm this hypothesis, but the linkage of DMY to the Y chromosome and its requirement for testicular differentiation strongly suggest that DMY represents a non-mammalian vertebrate equivalent of Sry.

Methods

Fish
Two recombinant strains between SL1 and SD, Hd-rR.YHNI (R1) and Hd-rR.YHNI (R2), were established from recombinant offspring of sex-reversed XY females of the described Hd-rR.YHNI strain (ref. 9). Another recombinant strain, Hd-rR.YHNIrr, was established from a recombinant between SD and r. This recombinant was obtained from offspring of a sex-reversed Hd-rR.YHNI (R1) XY female crossed with a sex-reversed Hd-rR XX male. Sex reversal in medaka was accomplished by oestrogen (XY females) or androgen (XX males) treatment during early development, according to previously published methods (ref. 9).
Naturally occurring mutants XYwAwr and XYwSrn were found in wild populations near Awara (Fukui prefecture) and Shirone (Niigata prefecture), Japan, respectively.

RNA and DNA extraction
Total RNA and genomic DNA were extracted from each hatched embryo after homogenization in a 1.5-ml tube with 350 µl RLT buffer supplied with the RNeasy Mini Kit (Qiagen). The homogenized lysates were centrifuged and supernatants were used for RNA extraction using the RNeasy Mini Kit with the RNase-Free DNase set protocol (Qiagen). Precipitated material was used for DNA extraction using the DNeasy tissue Kit (Qiagen) according to the manufacturer's protocol.

Chromosome walking
A BAC genomic library was constructed from the Y congenic Hd-rR.YHNI strain as described (ref. 22). High-molecular-mass genomic DNA was extracted from sperm, partially digested with HindIII, and selected for a size range of 150–250 kb. The size-selected DNA fragments were ligated to pBAC-lac (ref. 23) and used to transform DH10B. A total of 55,292 BAC clones was picked and arrayed to 144 microtitre plates each with 384 wells. The library was gridded at high density on Hybond-N+ nylon membranes (Amersham Pharmacia Biotech). Chromosome walking started at SL1. Two-thirds of the library was usually screened. Inserted end fragments of positive BAC clones were amplified by vectorette PCR (ref. 24) and used for assembling the positive clones. An amplified end fragment at the far end of the SD side was used in subsequent screening of the BAC library.

RT–PCR
First-strand cDNA was synthesized from 0.5 µg total RNA in 25 µl using PowerScript (Clontech) with oligo-dT primers. PCR was carried out in a 25 µl reaction mixture containing 0.25 µl of the first-strand cDNA. PCR conditions were 5 min at 95 °C; 20 s at 96 °C, 30 s at 55 °C, 30 s at 72 °C for 30 cycles; and 5 min at 72 °C. For PG17, 0.01 µl of the initial PCR products was re-amplified under the same conditions.
Specific primers for each gene were as follows. PG17, first round: PG17.11, GAGTCGGAGCCAAGCGGGTACAACATTC; PG17.12, GACCATCTCATTTTTTATTCTTGATTTT. PG17, second round: PG17.5, CCGGGTGCCCAAGTGCTCCCGCTG; PG17.6, GATCGTCCCTCCACAGAGAAGAGA. PG21: PG21.1, TGTGATTCTGAAGGGGGAGTTTGTAA; PG21.3, GACCTCCAGAGTCATCTTGCACAC. PG30: PG30.1, GGAGGAAAGTGTCAGGAGTGTTGTGT; PG30.2, GCCGTCCCTCTGATGTACTCGTTCCT.

FISH mapping
Metaphase cells from Hd-rR embryos were prepared by standard cytogenetic methods (ref. 25). FISH was performed using an SL2 fragment labelled by PCR (ref. 9) with digoxigenin-11-dUTP (Boehringer) and BAC clones mCON072N5 (containing SL1) and mCON049E13 (SD region) labelled by nick translation with fluorescein and biotin, respectively. Cells were co-hybridized with SL2 and SL1 or SL2, SL1 and SD, and counterstained with 4',6-diamidino-2-phenylindole (DAPI). Probes were detected with rhodamine-labelled anti-digoxigenin antibodies (Boehringer), Alexa Fluor 488-labelled anti-fluorescein, and Alexa Fluor 660-labelled streptavidin (Molecular Probes). Metaphase cells were examined using four filters (A4, L5, N3 and Y5) and images were captured using a CoolSNAP charge-coupled device camera (Nippon Roper) and Openlab software (Improvision). Hybridization was detected on the identical sex chromosome.

Shotgun sequencing
BAC DNA was hydrodynamically sheared to average sizes of 1.5 and 4.5 kb, and the DNA was ligated into a pUC18 vector. We sequenced each BAC to 13× coverage using dye-terminator chemistry. Individual BACs were assembled from the shotgun sequences using phred Version 0.000925.c, crossmatch Version 0.990319 and phrap Version 0.990319 (Codon Code), and PCP Version 2.1.6 and CAP4 Version 2.1.6 (Paracel). The gaps in each BAC were closed using a combination of BAC walking, directed PCR and re-sequencing of individual clones.
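As a quick sanity check on primer lists such as the RT–PCR primers given above, the following sketch computes length and GC fraction and flags any characters that are not A, C, G or T (for example, stray spaces introduced by line wrapping). The sequences are copied from the Methods with whitespace removed; the helper names are hypothetical and the checks are illustrative, not part of the published protocol.

```python
# Primer sequences taken from the Methods (second-round PG17 pair, PG21, PG30).
PRIMERS = {
    "PG17.5": "CCGGGTGCCCAAGTGCTCCCGCTG",
    "PG17.6": "GATCGTCCCTCCACAGAGAAGAGA",
    "PG21.1": "TGTGATTCTGAAGGGGGAGTTTGTAA",
    "PG21.3": "GACCTCCAGAGTCATCTTGCACAC",
    "PG30.1": "GGAGGAAAGTGTCAGGAGTGTTGTGT",
    "PG30.2": "GCCGTCCCTCTGATGTACTCGTTCCT",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_valid_dna(seq):
    """True if the sequence contains only unambiguous DNA bases."""
    return len(seq) > 0 and set(seq) <= set("ACGT")


def gc_fraction(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


for name, seq in PRIMERS.items():
    print(name, len(seq), round(gc_fraction(seq), 3), is_valid_dna(seq))
```

A primer that fails `is_valid_dna` usually indicates a transcription or formatting problem rather than a biological one, so it is worth checking against the original source before ordering oligonucleotides.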
Analyses of wild mutants
We crossed wild mutant females (XY; -/PG17wAwr or XY; -/PG17wSrn) with Hd-rR males (XY; -/PG17Hd-rR) to obtain the following sex chromosomes and PG17 genotypes in the offspring: XX; -/-, XY; -/PG17wAwr, XY; -/PG17Hd-rR, and YY; PG17wAwr/PG17Hd-rR. Genotypes of these offspring were determined by genomic PCR of PG17 and SL1. PG17 expression was examined by RT–PCR (see above). Sexes were confirmed by histological examination of gonads (fry and adult fish) and secondary sex characteristics (adult fish). At hatching, germ cell numbers were counted to determine gonadal phenotype (ref. 26). Exon sequences of the mutant PG17 genes (PG17wAwr and PG17wSrn) were determined using DNA extracted from the mutants and their progeny.

In situ hybridization
Gonads of 0–5 d.a.h. Hd-rR fry were dissected together with the trunk of the body and fixed in 4% paraformaldehyde in 0.1 M phosphate buffer (pH 7.4) at 4 °C overnight. After fixation, gonads were embedded in paraffin and cross-sectioned at 5 µm. In situ hybridization was performed using published methods (ref. 27). Genetic sex of fry was confirmed by PCR using the SL1 genotype (ref. 11).

Received 22 February 2002; accepted 24 April 2002

References
1. Sinclair, A. H. et al. A gene from the human sex-determining region encodes a protein with homology to a conserved DNA-binding motif. Nature 346, 240-244 (1990)
2. Raymond, C. S. et al. Evidence for evolutionary conservation of sex-determining genes. Nature 391, 691-695 (1998)
3. Sakaizumi, M., Moriwaki, K. & Egami, N. Allozymic variation and regional differentiation in wild populations of the fish Oryzias latipes. Copeia 1983, 311-318 (1983)
4. Matsuda, M., Yonekawa, H., Hamaguchi, S. & Sakaizumi, M. Geographic variation and diversity in the mitochondrial DNA of the medaka, Oryzias latipes, as determined by restriction endonuclease analysis. Zool. Sci. 14, 517-526 (1997)
5. Hyodo-Taguchi, Y. & Sakaizumi, M. List of inbred strains of the medaka, Oryzias latipes, maintained in the Division of Biology, National Institute of Radiological Sciences. Fish Biol. J. MEDAKA 5, 29-30 (1993)
6. Aida, T. On the inheritance of colour in a freshwater fish, Aplocheilus latipes Temminck and Schlegel, with special reference to sex-linked inheritance. Genetics 6, 554-573 (1921)
7. Uwa, H. & Ojima, Y. Detailed and banding karyotype analysis of the medaka, Oryzias latipes, in cultured cells. Proc. Jpn Acad. B 57, 39-43 (1981)
8. Yamamoto, T. Progenies of sex-reversal females mated with sex-reversal males in the medaka, Oryzias latipes. J. Exp. Zool. 146, 163-179 (1961)
9. Matsuda, M., Sotoyama, S., Hamaguchi, S. & Sakaizumi, M. Male-specific restriction of recombination frequency in the sex chromosomes of the medaka, Oryzias latipes. Genet. Res. 73, 225-231 (1999)
10. Kondo, M., Nagao, E., Mitani, H. & Shima, A. Differences in recombination frequencies during female and male meioses of the sex chromosomes of the medaka, Oryzias latipes. Genet. Res. 78, 23-30 (2001)
11. Matsuda, M. et al. Isolation of a sex chromosome-specific DNA sequence in the medaka, Oryzias latipes. Genes Genet. Syst. 72, 263-268 (1997)
12. Sato, T., Yokomizo, S., Matsuda, M., Hamaguchi, S. & Sakaizumi, M. Gene-centromere mapping of medaka sex chromosomes using triploid hybrids between Oryzias latipes and O. luzonensis. Genetica 111, 71-75 (2001)
13. Raymond, C. S., Kettlewell, J. R., Hirsch, B., Bardwell, V. J. & Zarkower, D. Expression of Dmrt1 in the genital ridge of mouse and chicken embryos suggests a role in vertebrate sexual development. Dev. Biol. 215, 208-220 (1999)
14. Smith, C. A., McClive, P. J., Western, P. S., Reed, K. J. & Sinclair, A. H. Conservation of a sex-determining gene. Nature 402, 601-602 (1999)
15. De Grandi, A. et al. The expression pattern of a mouse doublesex-related gene is consistent with a role in gonadal differentiation. Mech. Dev. 90, 323-326 (2000)
16. Kettlewell, J. R., Raymond, C. S. & Zarkower, D. Temperature-dependent expression of turtle Dmrt1 prior to sexual differentiation. Genesis 26, 174-178 (2000)
17. Marchand, O. et al. DMRT1 expression during gonadal differentiation and spermatogenesis in the rainbow trout, Oncorhynchus mykiss. Biochim. Biophys. Acta 1493, 180-187 (2000)
18. Guan, G., Kobayashi, T. & Nagahama, Y. Sexually dimorphic expression of two types of DM (Doublesex/Mab-3)-domain genes in a teleost fish, the tilapia (Oreochromis niloticus). Biochem. Biophys. Res. Commun. 272, 662-666 (2000)
19. Brunner, B. et al. Genomic organization and expression of the doublesex-related gene cluster in vertebrates and detection of putative regulatory regions for dmrt1. Genomics 1, 8-17 (2001)
20. Moniot, B., Berta, P., Scherer, G., Sudbeck, P. & Poulat, F. Male specific expression suggests role of DMRT1 in human sex determination. Mech. Dev. 91, 323-325 (2000)
21. Schepers, G. & Koopman, P. Phylogeny of the SOX family of developmental transcription factors based on sequence and structural indicators. Dev. Biol. 227, 239-255 (2000)
22. Matsuda, M. et al. Construction of a BAC library derived from the inbred Hd-rR strain of the teleost fish, Oryzias latipes. Genes Genet. Syst. 76, 61-63 (2001)
23. Asakawa, S. et al. Human BAC library: construction and rapid screening. Gene 191, 69-79 (1997)
24. Ragoussis, J. & Olavesen, M. G. in Genome Mapping: A Practical Approach (ed. Dear, P. H.) 253-260 (Oxford Univ. Press, New York, 1997)
25. Matsuda, M., Matsuda, C., Hamaguchi, S. & Sakaizumi, M. Identification of the sex chromosomes of the medaka, Oryzias latipes, by fluorescence in situ hybridization. Cytogenet. Cell Genet. 82, 257-262 (1998)
26. Hamaguchi, S. A light- and electron-microscopic study on the migration of primordial germ cells in the teleost, Oryzias latipes. Cell Tissue Res. 227, 139-151 (1982)
27. Kobayashi, T., Kajiura-Kobayashi, H. & Nagahama, Y. Differential expression of vasa homologue gene in the germ cells during oogenesis and spermatogenesis in a teleost fish, tilapia, Oreochromis niloticus. Mech. Dev. 99, 139-142 (2000)

Acknowledgements. We are grateful to P. Koopman for advice; G. Young for critical reading of the manuscript; and M. Takeda, E. Uno and R. Hayakawa for technical assistance. This work was supported in part by Grants-in-Aid for Research for the Future from the Japan Society for the Promotion of Science, Scientific Research of Priority Area, Environmental Endocrine Disrupter Studies from the Ministry of the Environment, Bio Design from the Ministry of Agriculture, Forestry and Fisheries, and Japan Society for the Promotion of Science Research Fellowships for Young Scientists.

Competing interests statement.
The authors declare that they have no competing financial interests.

Figure 1 Positional cloning strategy of the sex-determining region and subsequent identification of PG17/DMY. a, Genetic maps of the sex-determining regions and Y chromosomes of the recombinant strains. Pink indicates regions derived from an HNI Y chromosome; yellow represents Hd-rR X-chromosome-derived regions. b, BAC contigs and a physical map of the sex-determining region. Horizontal bars indicate BAC clones. Blue blocks show the sequenced regions. c, d, Cytogenetic mapping of the sex-determining region of medaka. c, FISH of metaphase chromosomes. SL2 (red) localizes on the short arms of the sex chromosomes, whereas a BAC clone containing SL1 (mCON072N5, yellow) localizes on the long arms. Arrowheads indicate sex chromosomes. d, FISH of one sex chromosome with three different probes (SL2, SL1, SD). Signals of a BAC clone containing the sex-determining region (mCON049E13) are light blue. The arrowhead shows the centromere region. e, The Y chromosome of medaka lacking a part of the sex-determining region. DNA markers were not detected between PG31C and A314482, but were detected on the centromere side of PG04 and the telomere side of PG31T. The respective locations of PG17, PG21 and PG30 within the sex-determining region between PG04 and PG31T are shown. f, RT–PCR analysis for PG17, PG21 and PG30 mRNA expression in whole Hd-rR.YHNI embryos at hatching. Sex chromosome types were determined by SL1 PCR. PG17 expression was detected by nested PCR (see Methods). g, h, Structure and sequences of PG17/DMY. g, Structural analysis of PG17/DMY showing exons (open boxes), the DM domain (grey boxes) and introns (horizontal lines) of PG17. Translation start and stop sites are indicated by ATG and STOP, respectively. Numbers represent nucleotide sequence length (base pairs, bp). h, Amino-acid sequence of wild-type PG17/DMY and the PG17wAwr mutation. The DM domain is boxed.
A frame shift (italicized) occurs in the region from amino acid 110 to premature termination (amino acid 139).

Figure 2 Characteristics of medaka lacking a part of the Y chromosome. a, Phenotypes of the congenic strain (XX, XY) and medaka lacking a part of the Y chromosome (XY-). XX (top), XY- and XY (bottom) with their secondary sex characters. Males (XY) have larger anal and dorsal fins than females (XX, XY-). XX have white bodies (XrXr), whereas the others are orange-red (XrYR). b, DNA types (SL1, PG17 and 51H7.F) of sex chromosomes. PCR products of SL1 and PG17 were electrophoresed in a 1% agarose gel with a 1-kb DNA ladder. AluI-digested PCR products of 51H7.F were electrophoresed in a 3% agarose gel with a 100-bp DNA ladder. SL1 and 51H7.F are homozygous in XX and YY (Hd-rR or HNI type), but are heterozygous in XY- and XY (Hd-rR/HNI type). PG17 is present in XY and YY but absent in XX and XY-. These results indicate that PG17 is specific to the Y chromosome but absent from the Y chromosome of XY-. Specific primers for PG17 and 51H7.F were as follows: PG17: PG17.19, GAACCACAGCTTGAAGACCCCGCTGA; PG17.20, GCATCTGCTGGTACTGCTGGTAGTTG; and 51H7.F: 51H7.F2, CAGGCCTTGAAGATCAACGAGT; 51H7.F3, AGTGCATCTAGTGTACATGGGT. c, Histological cross-sections of medaka fry sampled 30 d.a.h. Sex chromosome types were determined by PCR using DNA extracted from the head. Black arrowheads indicate gonads. XX and XY- individuals have ovaries with several oocytes, whereas XY specimens have testes with spermatogonia. Scale bars, 50 µm.

Figure 3 cDNA PCR of progeny from XYwAwr and XYwSrn female × Hd-rR male crosses. A PG17/DMY band was detected in XYHd-rR and XYwAwr embryos but not in XYwSrn embryos. Total RNA (150 ng each) was extracted from whole embryos (at hatching) and used for cDNA synthesis and amplification using the SMART PCR cDNA Synthesis Kit (Clontech) according to the manufacturer's protocol.
Amplified cDNA and genomic DNA were used as PCR templates for PG04 and PG17 (PG17.19–20 primer set). Specific primers for PG04 were as follows: PG04.1, CCAGCGGTTTGAGGATAGGTTTG; PG04.2, GAGCTTTCTGCAGGGCGACTTTC. Products were electrophoresed in a 2% agarose gel with a 100-bp DNA ladder.

Figure 4 PG17/DMY mRNA expression in fry gonads shown by digoxigenin-labelled in situ hybridization of larval sections. a, XY gonads on hatching day; antisense probe. Strong signals for PG17/DMY were seen in cells surrounding the germ cells. G, germ cell; CE, coelomic epithelium; ND, nephric duct. b, XX gonad on hatching day; antisense probe. PG17/DMY signal was undetectable in XX individuals. c, XY gonad on hatching day; sense probe. Control hybridization showed no signal. Scale bar, 20 µm.

doi:10.1038/90046
volume 28 no. 3 pp 216 - 217

Sox9 induces testis development in XX transgenic mice

Valerie P.I. Vidal1, 4, Marie-Christine Chaboissier1, 4, Dirk G. de Rooij2 & Andreas Schedl1, 3

1. MDC for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany.
2. Department of Cell Biology, University Medical Center Utrecht, Utrecht, Netherlands.
3. Present address: Institute of Human Genetics, University of Newcastle upon Tyne, UK.
4. These authors contributed equally to this work.
Correspondence should be addressed to A Schedl. e-mail: andreas.schedl@ncl.ac.uk

Mutations in SOX9 are associated with male-to-female sex reversal in humans (refs 1, 2). To analyze Sox9 function during sex determination, we ectopically expressed this gene in XX gonads. Here, we show that Sox9 is sufficient to induce testis formation in mice, indicating that it can substitute for the sex-determining gene Sry.

Sex determination in mice is initiated at embryonic day 10.5 (E10.5) with expression of the Y chromosome-linked gene Sry (ref. 3). One of the first genes induced in male gonads after Sry expression is the Sry homolog Sox9 (refs. 4,5).
The DNA-binding domains of both proteins are highly conserved and can functionally substitute for each other (ref. 6). SOX9/Sox9 is required for expression of the Müllerian inhibiting substance (MIS/Mis; refs. 7,8), but additional functions during sex determination remain elusive. To clarify the role of Sox9 during sex determination, we ectopically expressed it in undifferentiated gonads of transgenic mice. The mouse gene Sox9 (including introns) is expressed under control of regulatory regions of the Wilms' tumor suppressor gene Wt1, which is expressed at E10.5 in the urogenital ridge of both XX and XY animals. Because regulatory regions of Wt1 are poorly characterized, we used a yeast artificial chromosome (YAC) 'knock-in' approach (ref. 9), in which Sox9 was fused with the start codon of the Wt1 locus encoded by a 620-kilobase mouse YAC (ref. 10; Fig. 1a) and used the resulting construct to generate transgenic animals (ref. 11). Two XX animals showed a female-to-male sex reversal and were infertile. Moreover, founder 92, a fertile XY male, transmitted the transgene to its offspring (Fig. 1b). All XX transgenic animals (21/21) were phenotypically male (Fig. 2a–d) and had normal mounting behavior. In situ hybridization and reverse transcription polymerase chain reaction (PCR) analysis from E10.5 to adulthood demonstrate expression of Sox9 from the transgene in the developing gonads (Fig. 2e–h; data not shown). Expression of Mis at E13.5 (data not shown) and E16.5 (Fig. 2i–l) is seen within the developing sex cords. Moreover, male reproductive ducts develop normally, which indicates the presence of functional Sertoli cells. The slightly less organized pattern of the seminiferous tubules is probably caused by ectopic expression of Sox9 driven by the Wt1 promoter. At E13.5, the testes of XX transgenic animals are histologically indistinguishable from those of XY wild-type littermates. Adult testes contain seminiferous tubules showing a lumen lined by apparently normal Sertoli cells (Fig.
2m–p). Many Leydig cells are present in the interstitial tissue. As expected from a male with two X chromosomes, germ cells are absent (ref. 12), which explains the reduced testis size.

A mouse line with a transgene inserted 1 Mb upstream of Sox9 shows a comparable female-to-male sex-reversed phenotype (ref. 13). Sox9 is expressed in XX gonads, but it is impossible to conclude whether the expression of Sox9 triggers the sex reversal or whether Sox9 is activated as part of the sex determination cascade. Our data clearly demonstrate that Sox9 is sufficient to induce male development and indicate that it can substitute for Sry function. Hence, the duplication of the region containing SOX9, as found in a sex-reversed human (ref. 14), may be caused by ectopic activation of the duplicated SOX9 in XX gonads. Therefore, SRY may act only as a molecular switch (ref. 15) to activate the evolutionarily more conserved SOX9, which in turn initiates the male differentiation program.

Received 20 March 2001; Accepted 25 May 2001.

REFERENCES
1. Wagner, T. et al. Cell 79, 1111-1120 (1994).
2. Foster, J.W. et al. Nature 372, 525-530 (1994).
3. Swain, A. & Lovell Badge, R. Genes Dev. 13, 755-767 (1999).
4. Morais da Silva, S. et al. Nature Genet. 14, 62-68 (1996).
5. Kent, J., Wheatley, S.C., Andrews, J.E., Sinclair, A.H. & Koopman, P. Development 122, 2813-2822 (1996).
6. Bergstrom, D.E., Young, M., Albrecht, K.H. & Eicher, E.M. Genesis 28, 111-124 (2000).
7. De Santa Barbara, P. et al. Mol. Cell Biol. 18, 6653-6665 (1998).
8. Arango, N.A., Lovell-Badge, R. & Behringer, R.R. Cell 99, 409-419 (1999).
9. Moore, A.W. et al. Mech. Dev. 79, 169-184 (1998).
10. Scholz, H. et al. J. Biol. Chem. 272, 32836-32846 (1997).
11. Schedl, A., Montoliu, L., Kelsey, G. & Schutz, G. Nature 362, 258-261 (1993).
12. Hunt, P.A. et al. Mol. Reprod. Dev. 49, 101-111 (1998).
13. Bishop, C.E. et al. Nature Genet. 26, 490-494 (2000).
14. Huang, B., Wang, S., Ning, Y., Lamb, A.N. & Bartley, J. Am. J. Med. Genet. 87, 349-353 (1999).
15. Pask, A. & Graves, J.A. Cell Mol. Life Sci. 55, 864-875 (1999).

ACKNOWLEDGMENTS
We thank U. Ziegler, S. Schmidt, S. Lützkendorf and D. Landrock for excellent technical assistance. This work was supported by the Volkswagen Stiftung and EC-grant QLRT00741.

Figure 1: Construct design and analysis of transgenic animals. a, Schematic of the transgene. The Sox9 genomic locus (exons and introns) was fused in-frame with the major ATG of Wt1 and introduced into the YAC620mWt1 by homologous recombination in yeast. E1, exon 1; IVS-pA, intervening sequence fused to polyadenylation signal; LYS2, lysine 2 yeast marker; 5'UTR, 5' untranslated region. b, Molecular analysis of transgenic animals and comparison with their phenotypes. Wt1-Sox9, a region overlapping the Wt1 promoter and Sox9 open reading frame, was amplified by PCR to identify transgenic animals (primers sWTp 5'–CATCCGAGCCGCACCTCATG–3', SS2 5'–GCTGGAGCCGTTGACGCG–3'). Zfy/Pax6, the Y chromosome-linked Zfy, was amplified by PCR (oligonucleotides ZFY5 5'–GACTAGACATGTCTTAACATCTGTCC–3' and ZFY3 5'–CCTATTGCATGGACTGCAGCTTATG–3'); as an internal control, oligonucleotides specific for Pax6 (H499 5'–CTTTCTCCAGAGCCTCAATCTG–3' and H500 5'–GCAACAGGAAGGAGGGGGAGA–3') were added. F0, founders; F1, offspring of founder 92.

Figure 2: Analysis of gonads in wild-type and transgenic animals. a–d, Urogenital systems of 2-day-old animals. In contrast with wild-type females (c), XX mice carrying the transgene (d) have descended testes (arrow).
B, bladder; T, testis; O, ovary; K, kidney; A, adrenal. Whole-mount in situ hybridization with antisense Sox9 (e–h) and Mis (i–l) probes demonstrates a male-specific expression pattern in sex-reversed animals. m–p, Histological analysis of adult gonads (2-µm sections; hematoxylin and eosin staining). Testes of sex-reversed animals (p) contain Sertoli and Leydig cells but lack germ cells because of the presence of two X chromosomes.
doi:10.1038/82652 volume 26 no. 4 pp 490 - 494
A transgenic insertion upstream of Sox9 is associated with dominant XX sex reversal in the mouse
Colin E. Bishop1, 5, Deanne J. Whitworth2, Yanjun Qin1, Alexander I. Agoulnik1, Irina U. Agoulnik1, 4, Wilbur R. Harrison3, 4, Richard R. Behringer2 & Paul A. Overbeek4, 5
1. Department of Obstetrics & Gynecology, Baylor College of Medicine, Houston, Texas, USA.
2. Department of Molecular Genetics, University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA.
3. Department of Pathology and Laboratory Medicine, University of Texas, Houston Health Science Center, Houston, Texas, USA.
4. Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA.
5. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA.
Correspondence should be addressed to C E Bishop. e-mail: bishop@bcm.tmc.edu
In most mammals, male development is triggered by the transient expression of the Y-chromosome gene, Sry, which initiates a cascade of gene interactions ultimately leading to the formation of a testis from the indifferent fetal gonad (refs 1-4). Several genes (refs 5-8), in particular Sox9, have a crucial role in this pathway (refs 9-14). Despite this, the direct downstream targets of Sry and the nature of the pathway itself remain to be clearly established (refs 15, 16). We report here a new dominant insertional mutation, Odsex (Ods), in which XX mice carrying a 150-kb deletion (approximately 1 Mb upstream of Sox9) develop as sterile XX males lacking Sry.
During embryogenesis, wild-type XX fetal gonads downregulate Sox9 expression, whereas XY and XX Ods/+ fetal gonads upregulate and maintain its expression (refs 13, 14). We propose that Ods has removed a long-range, gonad-specific regulatory element that mediates the repression of Sox9 expression in XX fetal gonads. This repression would normally be antagonized by Sry protein in XY embryos. Our data are consistent with Sox9 being a direct downstream target of Sry and provide genetic evidence to support a general repressor model of sex determination in mammals (refs 17, 18).
We generated transgenic mice on the albino, inbred FVB genetic background by microinjection of a Dct-tyrosinase (Tyr) minigene construct to rescue the albinism (ref. 19). The transgenic founder male, OVE1220, had light pigmentation and microphthalmia. When this founder was bred to FVB females, only male progeny inherited the transgene and the eye phenotype (Fig. 1a,b). Among the first 140 mice born, there was an excess of male progeny: 102 (73%) were male and 38 (27%) were female. We found that 77 of the males were pigmented with microphthalmia, indicating that they carried the transgene, and 25 males were albino and non-transgenic, as were all of the female progeny. This sex-ratio distortion could be explained if a proportion of the transgenic males were actually sex-reversed XX males. To investigate this possibility, we typed transgenic males with ten sequence-tagged site (STS) markers spanning the Y chromosome, including primers specific for the genes RbmY1a1, Smcy, Ube1y1, Zfy1, Usp9y and Sry located on Yp (ref. 20). Transgenic males were either positive for all STS markers, indicating the presence of an intact Y chromosome, or negative for all the markers, indicating its complete absence (Fig. 1c, and data not shown). These results indicated that the transgene insertion was causing dominant microphthalmia with XX sex reversal, and the mutation was designated Odsex (Ods, ocular degeneration with sex reversal).
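The sex-ratio argument above rests on simple arithmetic that can be checked directly. The following Python sketch is illustrative only: it uses the counts reported in the text (102 males and 38 females among the first 140 progeny, 77 of the males transgenic), while the function name and the choice of an exact two-sided binomial test against a 1:1 expectation are assumptions made here, not part of the reported analysis.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial P value for k successes in n trials at p = 0.5."""
    # At p = 0.5 the distribution is symmetric, so the two-sided P value
    # is twice the probability of the tail at least as extreme as k.
    tail_start = max(k, n - k)
    upper_tail = sum(comb(n, i) for i in range(tail_start, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Counts reported in the text for the first 140 progeny.
total, males, females = 140, 102, 38
transgenic_males = 77
nontransgenic_males = males - transgenic_males  # albino, non-transgenic males

pct_male = round(100 * males / total)
pct_female = round(100 * females / total)

# Probability of a deviation this large if males and females were equally likely.
p_value = binomial_two_sided_p(males, total)

print(f"males: {pct_male}%, females: {pct_female}%")
print(f"non-transgenic males: {nontransgenic_males}")
print(f"two-sided binomial P value against 1:1: {p_value:.2e}")
```

A very small P value here says only that the excess of males is unlikely to be a sampling accident; showing that the excess consists of XX animals still requires the Y-chromosome marker typing described in the text.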
Except in testes and eyes (unpublished data), no obvious histological differences were found in the adult visceral organs or newborn skeletons of XY, XY Ods/+ and sex-reversed XX Ods/+ males compared with controls (Fig. 1d, and data not shown). Adult XX Ods/+ testes were approximately one-third the size of those of their XY or XY Ods/+ littermates. Histological analysis showed the presence of seminiferous tubules and Sertoli cells, but no evidence of spermatogenesis (Fig. 1d). Sterility was expected in XX Ods/+ males because they lack fertility genes located on the Y chromosome and the presence of two X chromosomes in the germ line has been shown to be incompatible with the mitotic proliferation of spermatogonia (refs 21, 22). XY Ods/+ males showed normal testis histology and were fertile. As Sox9 is known to have a role in chondrogenesis (ref. 23), we examined skeletal development in more than 30 XX and XX Ods/+ newborns. No differences were found, indicating that Sox9 was correctly regulated in this tissue (data not shown). Restriction and Southern-blot analysis showed that a single insertion of two copies of the transgene had occurred in a typical head-to-tail pattern (Fig. 2a,b). The left end sequence flanking the transgene showed approximately 90% identity over 550 bp to a sequenced human BAC located on chromosome 17. This allowed us to extend an extensive, well-characterized human chromosome 17 contig (ref. 24) by approximately 250 kb, physically placing the homologous human sequences approximately 1.3 Mb upstream of SOX9. Using primers internal to the left end breakpoint (5sqendF/R), we showed that the predicted 180-bp band is present in DNA from normal and Ods/+ mice, but absent from Ods/Ods DNA, indicating that the transgene has caused a deletion (Fig. 2c). Preliminary data using pulsed-field gel analysis of a single, 200-kb BAC (RP23 455N13), which spans both transgenic integration sites, indicate that the size of the deletion is approximately 150 kb.
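The deletion assay described above amounts to a small decision rule. The Python sketch below is a hypothetical illustration of it: it assumes two yes/no PCR readouts per animal, a transgene-specific product and the 180-bp product from the 5sqendF/R primers, and the function name, argument names and the "inconclusive" label are introduced here for illustration and are not the authors'.

```python
def call_ods_genotype(transgene_band: bool, breakpoint_band_180bp: bool) -> str:
    """Infer an Ods genotype from two PCR readouts (illustrative rule only)."""
    if not transgene_band:
        # Without the transgene both chromosomes should carry the intact
        # region, so a missing 180-bp product points to a failed reaction.
        return "+/+" if breakpoint_band_180bp else "inconclusive"
    # With the transgene present, the 180-bp product can only be amplified
    # from a chromosome that has not suffered the deletion.
    return "Ods/+" if breakpoint_band_180bp else "Ods/Ods"


# Each tuple is (transgene product seen, 180-bp product seen).
for readout in [(False, True), (True, True), (True, False), (False, False)]:
    print(readout, "->", call_ods_genotype(*readout))
```

The rule uses the absence of a band as evidence for homozygosity, so in practice it would depend on a positive control showing that each reaction worked.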
Fluorescence in situ hybridization (FISH) analysis revealed a single integration site on distal chromosome 11, band E2 (Fig. 3a, top), near Sox9. Ods and Sox9 were resolved only in interphase cells (Fig. 3a, bottom left and right), indicating that Ods maps approximately 1–2 Mb from Sox9. Backcross analysis placed Ods approximately 1 cM proximal to Sox9 on chromosome 11 (Fig. 3b), consistent with the 1.3-Mb physical distance determined in human. The sex-reversal phenotype of XX Ods/+ mice and the location of the Ods deletion upstream of Sox9 indicated that the transgenic mice might have altered regulation of Sox9 expression during embryogenesis. XX Ods/+ gonads are histologically indistinguishable from those of normal XY littermates at 14.5 days post coitum (d.p.c.; Fig. 4a,d,g) and 11.5 d.p.c. (Fig. 4j,m,p). There was no evidence for ovotestis at these stages. We did not detect Sox9 expression in normal XX gonads at either 14.5 d.p.c. (Fig. 4e,f) or 11.5 d.p.c. (Fig. 4n,o), but we did detect Sox9 expression in XX Ods/+ gonads at levels comparable to those found in their normal XY littermates (Fig. 4h,i,q,r). Anti-Müllerian hormone (AMH) was correctly expressed, suggesting that functional Sertoli cells were present (data not shown). Abnormal expression of Sox9 was not seen in any other tissue at either embryonic stage. Preliminary data indicate that both alleles of Sox9 are expressed in newborn XX Ods/+ testes. Because of a proposed auto-regulatory loop (ref. 15), however, expression data from the early embryonic gonad will be needed to assess the significance of this point. One explanation for the sex reversal is that the transgene has deleted a novel gene acting between Sry and Sox9 in the sex-determination pathway; however, we have found no evidence for such a deleted gene in the draft sequence data of BAC 455N13, or in the homologous human sequence (unpublished data).
Another possibility is that the deletion has directly altered the expression of Sox9 by some form of long-range effect involving chromatin structure (ref. 25). Although we cannot rule out this possibility at present, the map position of the Ods deletion upstream of Sox9 and the absence of detectable skeletal malformations indicate that the deletion of a gonad-specific regulator of Sox9 expression may underlie the sex-reversal phenotype. We interpret our results in the context of a repressor model of mammalian sex determination (ref. 18). In our model, Sox9 would induce Sertoli-cell differentiation and represent a possible direct downstream target of Sry action (Fig. 5). Normally, XX females would synthesize repressor molecules that specifically extinguish Sox9 expression in the XX fetal gonad, by binding to cis-acting regulatory elements located upstream of the Sox9 coding sequences. These regulatory elements are predicted to influence the activity of a separate and distinct gonad-specific enhancer. In XY males, Sry protein would interfere with the binding or activity of the repressor molecule. Sox9 would therefore be expressed, inducing Sertoli-cell differentiation and consequent male development. In the Ods mutant, the transgene insertion has deleted the regulatory element; thus the repressor complex can no longer exert its long-range activity on the gonad-specific enhancer. This would permit specific upregulation of Sox9 in the gonad in the absence of Sry, causing dominant sex reversal in XX Ods/+ mice. Ods represents the first example of XX (Sry−) sex reversal reported in the mouse. These data, although derived from a single mutant, provide strong support for a general repressor system of sex determination in mammals. They identify Sox9 as a possible direct downstream target of Sry action and suggest that specific gene expression is mediated by repressor binding to long-range, gonad-specific regulatory sequences.
Methods
Transgenic mice.
We generated transgenic founder mice by microinjection of FVB fertilized eggs with a 3-kb tyrosinase minigene construct (Dct-Tyr) containing a mouse tyrosinase (Tyr) cDNA under the control of a tyrosinase-related protein-2 (Dct) promoter (ref. 26). We identified seven pigmented founders. One male (a founder for family OVE1220) had microphthalmia. This male and his transgenic progeny were mated to FVB females and the progeny analysed for inheritance of the transgene by eye phenotype, coat pigmentation and PCR using primers specific for the transgenic construct (TyEx1, 5'–CTGTCCAGTGCACCATCTGGACC–3', and TyEx2, 5'–GATTACGTAATAGTGGTCCCTCAG–3'). In more than 1,000 FVB mice tested so far, the pigmented mice have all been males with microphthalmia.
Cloning the transgene insertion site. A phage library was constructed from XY Ods/+ DNA cloned into phage EMBL3 (Promega). We screened the library by hybridization with the Dct-Tyr minigene construct. Plaques representing endogenous Tyr were distinguished from the minigene by interexonic PCR using primers TyEx1/TyEx2. A restriction map of the insert was then generated according to standard procedures and the mouse DNA flanking the transgene insertion was sequenced. A single BAC (455N13) spanning the transgene integration sites was identified in the reference RPCI-23 mouse library by screening with probes flanking the insertion breakpoints.
Generation of Ods/Ods mice. In more than 1,000 mice maintained on the FVB background, no transgenic females have been observed. When an FVB XY Ods/+ male was crossed to a normal female of the outbred ICR strain, however, approximately 10% of the XX Ods/+ F1 progeny developed as adult females (with microphthalmia). The other 90% developed as typical XX Ods/+ males. XX Ods/+ F1 females were backcrossed to XY Ods/+ FVB males and found to be fertile, giving normal litter sizes. In this way, homozygous XX Ods/Ods and XY Ods/Ods mice were produced.
All of the Ods/Ods mice produced so far have been males. This phenomenon shows that genetic background is critical to Ods sex reversal and is currently being investigated further.
Mapping of Sox9 and Ods. We typed DNA from the Jackson Laboratory interspecific backcross panel BSS (ref. 27) for Sox9 using primers Sox9pA (5'–TTCACCATCCCAGCCAAG–3') and Sox9pB (5'–CCAGTCGGCCAGGTAATC–3'), located 20 kb proximal to Sox9 (ref. 28). The 374-bp amplified product was digested with BanII, yielding the following sizes: C57BL/6 (B6), 20 bp, 154 bp and 200 bp; Mus spretus, 20 bp and 354 bp. The Ods flanking sequences were mapped using the SQA1F (5'–ATTCCAGCCTTCACTGCTTC–3') and SQA2R (5'–GGGGCTGGATAAGAACATT–3') primers, which generate a 500-bp product. After restriction with HaeII, the M. spretus allele remains uncut, whereas the C57BL/6 (B6) allele is cleaved to yield 200-bp and 300-bp fragments. All products were separated on a 3.5% agarose gel.
FISH. We prepared chromosome spreads according to standard protocols. Digoxigenin-labelled (Boehringer) Tyr cDNA was used to probe G-banded XX Ods/+ metaphase spreads. Similarly, digoxigenin-labelled phage DNA spanning Sox9 and biotin-labelled Ods BAC 455N13 DNA were used to probe wild-type female mouse metaphase spreads. Digoxigenin-labelled DNA was detected with an FITC-conjugated anti-digoxigenin antibody (Boehringer) and biotin-labelled DNA with Cy3-conjugated streptavidin (Amersham, Pharmacia). After counterstaining (0.2 µg/ml DAPI or 0.2 µg/ml propidium iodide), images were captured using a PowerGene probe analysis system (Perceptive Scientific Instruments).
RNA in situ hybridization. Timed matings of FVB females with XY Ods/+ FVB males were used to generate fetuses at 11.5 and 14.5 d.p.c. Fetuses were dissected from the uterus, a portion of the head was taken for DNA extraction and genotyping, and the body was fixed in 4% paraformaldehyde in PBS.
The presence of the transgene was indicated by eye pigmentation and confirmed using Dct-Tyr minigene primers TyEx1/TyEx2. The presence of the Y chromosome was assessed using Y-specific primers 207F (5'–TGTAGACAGTCTTTCTGT–3') and 207R (5'–CACAGGCTCTCCTGATTT–3'; ref. 20). Fetuses were embedded in paraffin and serially sectioned (8 µm). We stained a subset of sections from each fetus with haematoxylin and eosin according to standard protocols. [35S]-UTP-labelled riboprobes were transcribed from a 255-bp fragment of Sox9 as described (ref. 29). Sections were exposed to emulsion for 1 week, developed and counterstained with haematoxylin.
Accession numbers. Draft sequence data for mouse BAC 455N13, AC069019; human BAC containing homologous sequence data, 005771.
Received 7 June 2000; Accepted 11 September 2000.
REFERENCES
1. Sinclair, A.H. et al. A gene from the human sex-determining region encodes a protein with homology to a conserved DNA-binding motif. Nature 346, 240-244 (1990).
2. Gubbay, J. et al. A gene mapping to the sex-determining region of the mouse Y chromosome is a member of a novel family of embryonically expressed genes. Nature 346, 245-250 (1990).
3. Koopman, P., Gubbay, J., Vivian, N., Goodfellow, P. & Lovell-Badge, R. Male development of chromosomally female mice transgenic for Sry. Nature 351, 117-121 (1991).
4. Berta, P. et al. Genetic evidence equating SRY and the testis-determining factor. Nature 348, 448-450 (1990).
5. Foster, J.W. et al. Evolution of sex determination and the Y chromosome: SRY-related sequences in marsupials. Nature 359, 531-533 (1992).
6. Moniot, B., Berta, P., Scherer, G., Sudbeck, P. & Poulat, F. Male specific expression suggests role of DMRT1 in human sex determination. Mech. Dev. 91, 323-325 (2000).
7. De Grandi, A. et al. The expression pattern of a mouse doublesex-related gene is consistent with a role in gonadal differentiation. Mech. Dev. 90, 323-326 (2000).
8. Raymond, C.S. et al. Evidence for evolutionary conservation of sex-determining genes. Nature 391, 691-695 (1998).
9. Foster, J.W. et al. Campomelic dysplasia and autosomal sex reversal caused by mutations in an SRY-related gene. Nature 372, 525-530 (1994).
10. Wagner, T. et al. Autosomal sex reversal and campomelic dysplasia are caused by mutations in and around the SRY-related gene SOX9. Cell 79, 1111-1120 (1994).
11. Mansour, S., Hall, C.M., Pembrey, M.E. & Young, I.D. A clinical and genetic study of campomelic dysplasia. J. Med. Genet. 32, 415-420 (1995).
12. Huang, B., Wang, S., Ning, Y., Lamb, A. & Bartley, J. Autosomal XX sex reversal caused by duplication of SOX9. Am. J. Med. Genet. 87, 349-353 (1999).
13. Kent, J., Wheatley, S.C., Andrews, J.E., Sinclair, A.H. & Koopman, P. A male-specific role for SOX9 in vertebrate sex determination. Development 122, 2813-2822 (1996).
14. Morais da Silva, S. et al. Sox9 expression during gonadal development implies a conserved role for the gene in testis differentiation in mammals and birds. Nature Genet. 14, 62-68 (1996).
15. Swain, A. & Lovell-Badge, R. Mammalian sex determination: a molecular drama. Genes Dev. 13, 755-767 (1999).
16. Capel, B. Sex in the 90s: SRY and the switch to the male pathway. Annu. Rev. Physiol. 60, 497-523 (1998).
17. Graves, J.A. Interactions between SRY and SOX genes in mammalian sex determination. Bioessays 20, 264-269 (1998).
18. McElreavey, K., Vilain, E., Abbas, N., Herskowitz, I. & Fellous, M. A regulatory cascade hypothesis for mammalian sex determination: SRY represses a negative regulator of male development. Proc. Natl Acad. Sci. USA 90, 3368-3372 (1993).
19. Yokoyama, T. et al. Conserved cysteine to serine mutation in tyrosinase is responsible for the classical albino mutation in laboratory mice. Nucleic Acids Res. 18, 7293-7298 (1990).
20. Bishop, C.E. & Mitchell, M.J. Encyclopedia of the mouse genome VII. Mouse chromosome Y. Mamm. Genome 8, S378-381 (1998).
21. Burgoyne, P.S. The mammalian Y chromosome: a new perspective. Bioessays 20, 363-366 (1998).
22. Sutcliffe, M.J. & Burgoyne, P.S. Analysis of the testes of H-Y negative XOSxrb mice suggests that the spermatogenesis gene (Spy) acts during the differentiation of the A spermatogonia. Development 107, 373-380 (1989).
23. Bi, W., Deng, J.M., Zhang, Z., Behringer, R.R. & de Crombrugghe, B. Sox9 is required for cartilage formation. Nature Genet. 22, 85-89 (1999).
24. Pfeifer, D. et al. Campomelic dysplasia translocation breakpoints are scattered over 1 Mb proximal to SOX9: evidence for an extended control region. Am. J. Hum. Genet. 65, 111-124 (1999).
25. Capel, B. et al. Deletion of Y chromosome sequences located outside the testis determining region can cause XY female sex reversal. Nature Genet. 5, 301-307 (1993).
26. Zhao, S. & Overbeek, P.A. Tyrosinase-related protein 2 promoter targets transgene expression to ocular and neural crest-derived tissues. Dev. Biol. 216, 154-163 (1999).
27. Rowe, L.B. et al. Maps from two interspecific backcross DNA panels available as a community genetic mapping resource. Mamm. Genome 5, 253-274 (1994).
28. Hustert, E., Scherer, G., Olowson, M., Guenet, J.L. & Balling, R. Rbt (Rabo torcido), a new mouse skeletal mutation involved in anteroposterior patterning of the axial skeleton, maps close to the Ts (tail-short) locus and distal to the Sox9 locus on chromosome 11. Mamm. Genome 7, 881-885 (1996).
29. Zhao, Q., Eberspaecher, H., Lefebvre, V. & De Crombrugghe, B. Parallel expression of Sox9 and Col2a1 in cells undergoing chondrogenesis. Dev. Dyn. 209, 377-386 (1997).
ACKNOWLEDGMENTS
We thank G. Schuster for microinjections; L. Vien for animal husbandry and PCR assays; B. de Crombrugghe for the Sox9 in situ hybridization probe; H. Boettger-Tong for advice on several aspects of the work; and B. Capel and P. Koopman for discussion on the manuscript. Supported by grants from the National Institutes of Health (to C.E.B., R.R.B. and P.A.O.).
Figure 1: Phenotype of Ods-mutant mice. a, A four-week-old wild-type FVB XX female (left) and an XX Ods/+ male (right) are shown. The Ods mouse has pigmented ears, light coat pigmentation, pigmented eyes with cataracts and male external genitalia. b, Pedigree chart of the OVE1220 transgenic mouse line. All of the transgenic mice showed co-inheritance of the Dct-Tyr minigene, coat colour, microphthalmia and sexual phenotype. Squares, males; circles, females; diamonds, died at birth; filled symbols, transgenic mice. Mice in the second generation were tested by PCR for the presence (XY) or absence (XX) of a Y chromosome. c, PCR genotyping for Sry, RbmY1a1 and the Dct-Tyr (Tyr) minigene. Results for six pigmented, microphthalmic males (1–6), along with wild-type FVB male (M) and female (F) controls, are shown. The pigmented, microphthalmic males were either positive for all Y-chromosome markers (2–5) or negative for all Y markers (1, 6). All were positive for the transgene. d, Histology of 6-week adult testes of an XY Ods/+ male (left) and an XX Ods/+ male (right).
The XY Ods/+ testis is histologically normal, showing all stages of spermatogenesis, whereas the XX Ods/+ testis contains Sertoli cells, but is devoid of germ cells. Vacuolated Sertoli cells are present in many tubules.
Figure 2: Ods has created a deletion upstream of Sox9. a, Map of the transgenic insertion site on mouse chromosome 11. Restriction sites for BamHI (B) and SalI (S) (from the vector) are indicated. Two copies of the Dct-Tyr minigene were found to have integrated in a head-to-tail alignment. Hatching shows the region and orientation of flanking DNA homologous to human chromosome 17 BAC. The human chromosome 17 BAC contig upstream of SOX9 is also shown. b, Ods contains a single transgene insertion. Southern-blot analysis of EcoRI-restricted DNA from wild-type XY males, XX Ods/+ and XY Ods/+ males probed with a 700-bp Tyr probe. In addition to the expected 5.5-kb endogenous Tyr locus, a single 17-kb hybridizing band was detected, consistent with the transgene having inserted at a single locus. c, PCR analysis of homozygous XX and XY Ods/Ods DNA. The 180-bp product is present in normal and heterozygous DNA, but is absent from Ods/Ods mice, indicating that the transgene insertion has created a deletion.
Figure 3: Mapping the Ods insertion site. a, FISH mapping shows that the Dct-Tyr transgene has integrated on distal chromosome 11, band E2 (top left and right). Probes for Ods (red) and Sox9 (green) were not resolved on metaphase chromosome spreads (bottom left), but can be resolved in interphase nuclei (bottom right), indicating that they map 1–2 Mb apart. b, Interspecific backcross mapping places Ods and Sox9 1 cM apart, at 69 cM on mouse chromosome 11 (full data can be obtained at http://www.informatics.jax.org).
Figure 4: Histological analysis and Sox9 in situ hybridizations of fetal gonads. a, Normal testicular development in a wild-type XY embryo at 14.5 d.p.c. showing seminiferous cords (arrow) and the testis-specific blood vessel (arrowhead).
b,c, Sox9 transcripts are localized to Sertoli cells (arrow). d, Normal ovarian development in a wild-type XX embryo at 14.5 d.p.c. e,f, Sox9 is not expressed by the developing ovary. g, Testicular development in an XX Ods/+ embryo at 14.5 d.p.c. Seminiferous cords (arrow) and the testis-specific blood vessel (arrowhead) are present. h,i, Sox9 is expressed by the Sertoli cells (arrow) of the XX Ods/+ gonad. j, The wild-type XY male gonad (g) at 11.5 d.p.c. is at the indifferent stage (m, mesonephros). k,l, Strong Sox9 expression is localized to the wild-type XY male gonad. m, Wild-type XX female indifferent gonad at 11.5 d.p.c. n,o, Sox9 transcripts are not detectable above background in the indifferent female gonad. p, The gonad of an XX Ods/+ embryo at 11.5 d.p.c. is similarly morphologically indifferent. q,r, Sox9 is expressed in the XX Ods/+ 11.5 d.p.c. gonad. Scale bar, 100 µm. All panels are shown at the same scale.
Figure 5: A double-repressor model of mammalian sex determination. At 10.5 d.p.c., Sox9 is expressed in the genital ridges of both male and female embryos. This expression is mediated by a genital ridge-specific enhancer located upstream or downstream of Sox9 (not shown). In wild-type XX gonads at 11.5 d.p.c. (left), Sox9 expression is downregulated by the binding of a repressor or repressor complex to gonad-specific regulatory elements (filled box) located 1.3 Mb upstream of Sox9. In wild-type XY gonads at 11.5 d.p.c. (middle), this repressor binding is predicted to be antagonized by Sry protein, leading to upregulation of Sox9 expression, followed by Sertoli-cell differentiation and testis formation. In the XX Ods/+ gonads at 11.5 d.p.c. (right), Sox9 cannot be repressed, as the gonad-specific elements for repressor binding have been deleted by the transgene insertion. As a result, Sox9 is expressed at sufficient levels to induce testis formation and male development.
Published online: 20 August 2001, doi:10.1038/ng711 volume 29 no.
1 pp 20 - 21
Human mtDNA and Y-chromosome variation is correlated with matrilocal versus patrilocal residence
Hiroki Oota1, 2, Wannapa Settheetham-Ishida3, Danai Tiwawech4, Takafumi Ishida5 & Mark Stoneking1
1. Max Planck Institute for Evolutionary Anthropology, Inselstrasse 22, D-04103 Leipzig, Germany.
2. Present address: Department of Genetics, Yale University School of Medicine, New Haven, Connecticut, USA.
3. Khon Kaen University, Khon Kaen, Thailand.
4. National Cancer Institute, Bangkok, Thailand.
5. Department of Biological Sciences, School of Science, University of Tokyo, Tokyo, Japan.
Correspondence should be addressed to M Stoneking. e-mail: stoneking@eva.mpg.de
Genetic differences among human populations are usually larger for the Y chromosome than for mtDNA (refs 1-3). One possible explanation is the higher rate of female versus male migration due to the widespread phenomenon of patrilocality, in which the woman moves to her mate's residence after marriage. To test this hypothesis, we compare mtDNA and Y-chromosome variation in three matrilocal (in which the man moves to his mate's residence after marriage) and three patrilocal groups among the hill tribes of northern Thailand. Genetic diversity in these groups shows a striking correlation with residence pattern, supporting the role of sex-specific migration in influencing human genetic variation.
Patrilocality, in which men stay in their birthplace and women move, occurs in about 70% of human societies (ref. 4). Patrilocality has been invoked to explain the usual patterns observed in human populations: high mtDNA and low Y-chromosome diversity within groups, large between-group differences for the Y chromosome, and small between-group differences for mtDNA.
If patrilocality is responsible for these patterns, then matrilocal groups (in which the women stay in their birthplace and the men move) might show the opposite patterns: high Y-chromosome and low mtDNA diversity within groups, large between-group differences for mtDNA, and small between-group differences for the Y chromosome. Here we compare Y-chromosome and mtDNA diversity in three matrilocal groups (Lahu, Red Karen and White Karen; the two Karen groups were sampled from multiple villages that were 5–25 km apart) and three patrilocal groups (Akha and two groups of Lisu, one sampled near Chiang Rai and one sampled about 220 km away, near Mae Hong Son) from northern Thailand. We obtained blood samples between 1996 and 1998, with informed consent. From these we prepared transformed cell lines from which we subsequently extracted DNA (ref. 5). We analyzed 360 bp of the first hypervariable segment (HV1) of the mtDNA control region, corresponding to positions 16024–16383 (ref. 6), using standard methods (Oota, unpublished data). To assess Y-chromosome variation, we carried out multiplex typing of nine short tandem repeat (STR) loci (DYS385a/b, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393 and DYS394) as previously described (http://www.ystr.org/usa). The HV1 sequences have been submitted to the HVRbase database (ref. 7) and are also available from the authors, as are the Y-STR haplotypes. We estimated haplotype diversity within groups (ref. 8) for the HV1 sequences and the Y-STR haplotypes, and calculated the genetic distances between groups based on dA values (ref. 8) for the HV1 sequences and Rst values (ref. 9) for the Y-STR haplotypes. The haplotype diversity for mtDNA was higher in all of the patrilocal groups than in any of the matrilocal groups (Fig. 1). The mean mtDNA diversity in the patrilocal groups was 0.937, which is significantly greater than the mean mtDNA diversity of 0.860 in the matrilocal groups (Mann-Whitney U-test, P<0.05).
Conversely, the Y-STR haplotype diversity was higher in all of the matrilocal groups than in any of the patrilocal groups (Fig. 1). The mean Y-STR diversity was 0.965 in the matrilocal groups, significantly greater than the mean Y-STR diversity of 0.863 in the patrilocal groups (Mann-Whitney U-test, P<0.05). In addition, the average genetic distance based on mtDNA HV1 sequences was significantly higher among the matrilocal groups than among the patrilocal groups, while the average genetic distance based on Y-STR haplotypes was significantly higher among the patrilocal groups than among the matrilocal groups (Table 1). Genetic variation in the north Thailand hill tribes thus shows a striking correlation with residence pattern. Matrilocal groups have high within-group diversity for the Y chromosome and large between-group distances for mtDNA, whereas patrilocal groups have high within-group diversity for mtDNA and large between-group distances for the Y chromosome. All of the groups studied come from the same geographic region, speak related Sino-Tibetan languages and practice similar subsistence modes of agriculture. These shared attributes make it unlikely that the correlation is accounted for by some other factor that differs between the matrilocal and patrilocal groups, such as reproductive success. Moreover, although our results do not rule out a role for variance in male reproductive success in influencing patterns of genetic variation, theoretical considerations suggest at best a minor effect of such variance (ref. 1). We conclude that patrilocality does appear to be primarily responsible for the higher between-population genetic differences consistently observed for the Y chromosome as opposed to mtDNA or autosomal loci. Our results also provide evidence for the importance of social structure in influencing human genetic diversity (refs 10-12).
Received 1 May 2001; Accepted 25 July 2001; Published online 20 August 2001.
REFERENCES
1. Seielstad, M.T., Minch, E. & Cavalli-Sforza, L.L. Nature Genet. 20, 278-280 (1998).
2. Perez-Lezaun, A. et al. Am. J. Hum. Genet. 65, 208-219 (1999).
3. Jorde, L.B. et al. Am. J. Hum. Genet. 66, 979-988 (2000).
4. Burton, M.L., Moore, C.C., Whiting, J.W.M. & Romney, A.K. Curr. Anthropol. 37, 87-123 (1996).
5. Kimura, M. et al. Hum. Biol. 70, 993-1000 (1998).
6. Anderson, S. et al. Nature 290, 457-465 (1981).
7. Burckhardt, F., von Haeseler, A. & Meyer, S. Nucleic Acids Res. 27, 138-142 (1999).
8. Nei, M. Molecular Evolutionary Genetics (Columbia Univ. Press, New York, 1987).
9. Slatkin, M. Genetics 139, 457-462 (1995).
10. Salem, A.H., Badr, F.M., Gaballah, M.F. & Pääbo, S. Am. J. Hum. Genet. 59, 741-743 (1996).
11. Bamshad, M.J. et al. Nature 395, 651-652 (1998).
12. Bamshad, M. et al. Genome Res. 11, 994-1004 (2001).
ACKNOWLEDGMENTS
We thank S. Brauer and H. Schädlich for technical assistance, and B. Pakendorf, R. Cordaux, M. Kayser, I. Nasidze, S. Pääbo and A. Ryan for helpful discussion. H.O. was supported by a fellowship from the Japanese Society for the Promotion of Science (JSPS). Sample collection was supported by funds from the JSPS and the Ministry of Education, Science, and Culture, Japan; research was supported by funds from the Max Planck Society, Germany.
Figure 1: Diversity in mtDNA (red) and Y-STR (green) haplotypes in three matrilocal and three patrilocal groups from northern Thailand.
From left to right, the matrilocal groups (mtDNA sample size, Y-STR sample size) are Lahu (39, 17), Red Karen (39, 30), and White Karen (40, 20); the patrilocal groups are Akha (91, 21), Lisu from near Chiang Rai (53, 9), and Lisu from near Mae Hong Son (42, 22). The fourth (shaded) bar in each group indicates the mean diversity (and standard error) for the group.

Table 1: Average genetic distances and standard errors based on mtDNA HV1 sequences and Y-STR haplotypes among matrilocal and patrilocal groups

doi:10.1038/7771 volume 21 no. 4 pp 429 - 433
There is an erratum (June 1999) associated with this Letter.

Retroposition of autosomal mRNA yielded testis-specific gene family on human Y chromosome
Bruce T Lahn & David C Page
Howard Hughes Medical Institute, Whitehead Institute and Department of Biology, Massachusetts Institute of Technology, 9 Cambridge Center, Cambridge, Massachusetts 02142, USA. Correspondence should be addressed to D C Page. e-mail: dcpage@wi.mit.edu

Most genes in the human NRY (non-recombining portion of the Y chromosome) can be assigned to one of two groups: X-homologous genes or testis-specific gene families with no obvious X-chromosomal homologues1, 2. The CDY genes have been localized to the human Y chromosome1, and we report here that they are derivatives of a conventional single-copy gene, CDYL (CDY-like), located on human chromosome 6 and mouse chromosome 13. CDY genes retain CDYL exonic sequences but lack its introns. In mice, whose evolutionary lineage diverged before the appearance of the Y-linked derivatives, the autosomal Cdyl gene produces two transcripts; one is expressed ubiquitously and the other is expressed in testes only. In humans, autosomal CDYL produces only the ubiquitous transcript; the testis-specific transcript is the province of the Y-borne CDY genes. Our data indicate that CDY genes arose during primate evolution by retroposition of a CDYL mRNA and amplification of the retroposed gene.
Retroposition contributed to the gene content of the human Y chromosome, together with two other molecular evolutionary processes: persistence of a subset of genes shared with the X chromosome3, 4 and transposition of genomic DNA harbouring intact transcription units5. We had previously identified a single full-length cDNA clone from CDY but had mapped homologous sequences to two different locations on the human Y chromosome1. We explored the possibility that there might be multiple functional CDY genes by isolating 25 additional cDNA clones from a human testis library. Sequence analysis of the 25 clones revealed two distinct species, which we designated CDY1 (17 clones) and CDY2 (8 clones). The sequence of CDY1 was as previously reported1. Like CDY1, CDY2 appears to encode a protein with a combination of chromatin-binding and catalytic domains ( Fig. 1a). The predicted coding regions of CDY1 and CDY2 were 99% identical in nucleotide sequence, and the amino acid sequences of the predicted proteins were 98% identical. We observed an alternative 3´ region in 4 of 17 CDY1 cDNA clones. Most of the putative protein encoded by this minor CDY1 transcript is identical to that encoded by the major transcript; its carboxy terminus is divergent (Fig. 1a,b). We had previously localized CDY-homologous sequences to Y chromosome deletion intervals 5L and 6F (ref. 1), but it was not apparent which of these map locations corresponded to which gene. Exploiting sequence differences between CDY1 and CDY2, we designed PCR assays specific to each of the two genes. Using genomic DNAs from individuals carrying partial Y chromosomes as templates6, we mapped CDY1 to interval 6F, and CDY2 to interval 5L (data not shown). We conclude that at least two distinct functional CDY genes exist on the human Y chromosome, and that they encode protein isoforms. 
We cannot exclude the possibility that there are multiple CDY1 genes in interval 6F, multiple CDY2 genes in interval 5L, or additional CDY species that we failed to identify. To determine the intron/exon structures of CDY1 and CDY2, we partially sequenced genomic BAC clones corresponding to each of the genes. CDY1 and CDY2 genomic sequences were found to be collinear with the major CDY1 and CDY2 transcripts as captured in cDNA clones, with no evidence that any intronic sequences had been excised. In the case of the minor CDY1 transcript, it appeared that a single intron had been excised; the splice donor site was located within, and near the 3´ end of, the single exon of the major transcript (Fig. 1 b). Further evidence of the atypical nature of CDY genes arose from examining their polyadenylation sites. For 13 of 17 CDY1 and 5 of 8 CDY2 cDNA clones, the poly(A) tract was located immediately 3´ of the predicted stop codon (Fig. 1b). CDY1 and CDY2 may be the only mammalian nuclear genes in which polyadenylation is known to occur so close to the stop codon. To address the evolutionary origin of CDY on the human Y chromosome, we sought homologues that might exist elsewhere in the genomes of humans or mice. We identified an autosomal homologue in both humans and mice by incubating a CDY-derived probe, at low stringency, with testis cDNA libraries from the two species. We will refer to the human and mouse autosomal homologues as CDYL and Cdyl, respectively. We determined the sequence of the human CDYL transcript by merging the sequences of four incomplete but substantially overlapping cDNA clones; we found no sequence differences among the four cDNA clones in regions of overlap. Similarly, we derived the sequence of mouse Cdyl by merging the sequences of ten incomplete but overlapping cDNA clones. In both species, the autosomal transcript, similar to human CDY, appears to encode a protein with an amino-terminal chromo domain and a C-terminal catalytic domain (Fig. 2). 
The predicted mouse and human CDYL proteins have 93% amino acid identity overall, with even greater similarity evident in the chromo and catalytic domains (Fig. 2). In contrast, the predicted human CDYL and CDY proteins have only 63% amino acid identity; again, greater similarity is evident in the chromo and catalytic domains (Fig. 2). These results suggest that human CDYL and mouse Cdyl are orthologues, and human CDYL and CDY are paralogues. They also suggest that evolutionary constraints operating on the chromo and catalytic domains of the CDYL and CDY proteins account for the relatively high amino acid identities observed there. Chromosomal mapping of the human CDYL and mouse Cdyl genes indicates that they are indeed orthologues. In humans, we localized CDYL to the distal short arm of chromosome 6 (distal to marker NIB1876 and proximal to WI-4489) by radiation hybrid mapping. In mice, we localized Cdyl to the proximal portion of chromosome 13 (distal to Fim1 and proximal to D13Mit18) by meiotic linkage mapping. These portions of human chromosome 6 and mouse chromosome 13 were previously shown to be syntenic7. We conclude that a gene much like human CDYL and mouse Cdyl existed in the common ancestors of mice and humans. Our experiments to this point left another evolutionary question unanswered: which came first, CDY or CDYL? The first indication came from studies in mice, in which we were unable to identify a Y-linked homologue of human CDY by either Southern-blot analysis (Fig. 3a) or screening of testis cDNA libraries. These findings are consistent with CDYL having given rise to CDY sometime after divergence of the mouse and human lineages. Additional evidence that CDYL gave rise to CDY came from analysis of the human CDYL and mouse Cdyl gene structures. We found that each of these autosomal genes contains eight introns (Fig. 2). As one would expect for orthologues, most introns were positioned at corresponding sites in human CDYL and mouse Cdyl.
The presence of numerous introns in the CDYL/Cdyl orthologues, contrasted with the paucity of introns in the human CDY genes, led us to consider whether CDY had arisen by reverse transcription of a spliced CDYL mRNA, with subsequent integration into the Y chromosome. Closer examination of the sequence data corroborated this retroposition model. Excluding a few hundred nucleotides at the 5´ terminus, the nucleotide sequence of the human CDY1 (or CDY2) genomic locus is essentially collinear with that of the mature human CDYL (or mouse Cdyl) transcripts. Even the single intron of the minor CDY1 transcript is collinear with, and homologous to, CDYL exon 9 (Fig. 2). Thus, with respect to evolutionary origins, all but the most 5´ portions of human CDY1 and CDY2 genomic loci can be accounted for by mature CDYL/Cdyl transcripts. These findings are as one would predict if CDYL/Cdyl intronic sequences had been excised before retroposition. How long ago did the retroposition event occur? To address this question, we incubated probes derived from human CDYL and CDY with Southern blots of male and female genomic DNAs from various mammalian species. Two conclusions emerged from this analysis (Fig. 3a): (i) the autosomal (male-female common) CDYL gene appears to be widely conserved among the mammals tested, including opossum, a marsupial; and (ii) the Y-chromosomal (male-specific) CDY family appears to be restricted to simian primates, including apes, Old World monkeys (for example macaques) and New World monkeys (for example squirrel monkeys, whose sequences appear to have undergone extraordinary amplification on the Y). We observed no male-specific hybridization fragments in prosimians (lemurs) or any of the non-primate mammals tested. The simplest interpretation of these findings is that the retroposition event occurred in the simian lineage after its divergence from prosimians but before the split between Old and New World monkeys, perhaps 40-50 million years ago (Fig. 3b).
Given the age of the retroposed CDY genes, it is not unexpected that certain hallmarks of retroposition are absent: we found no evidence of target site duplication or of a poly(A) tract in CDY genomic sequences. The observed 99% identity between the coding sequences of human CDY1 and CDY2 (Fig. 1a; as contrasted with 73% identity between human CDY and human CDYL) suggests that their coexistence is the result of subsequent amplification on the Y chromosome. Do all members of the CDY/CDYL gene family serve male-specific functions? We had previously demonstrated by northern-blot analysis that human CDY genes are abundantly transcribed in adult testis but are not detectable in other adult tissues1 (Fig. 4). In contrast, human CDYL appears to be expressed at a modest and comparable level in all tissues examined (Fig. 4), indicating that it may perform a housekeeping function. In mice, Cdyl is characterized by two transcripts: (i) a ubiquitous one that is similar in length to the ubiquitous transcript of human CDYL; and (ii) a testis-specific one that is similar in length to the testis-specific transcript of human CDY (Fig. 4). These expression studies in humans and mice suggest that a genomic partitioning of housekeeping and testis-specific functions evolved in primates after retroposition. We interpret the mouse, which possesses no CDY homologue on its Y chromosome, as representing the evolutionarily ancestral condition. In humans, the autosomal gene CDYL appears to have retained the housekeeping function but relinquished the testis-specific function, which seems to have been transferred to the Y-linked CDY genes. We propose, on the basis of these and other findings, that the gene content of the human NRY was forged during evolution through the interplay of three molecular processes: retroposition, transposition and persistence (Fig. 5).
In certain respects, the process of CDY retroposition is reminiscent of the transposition process by which the DAZ gene family arose on the Y chromosome5. The net result of both evolutionary processes was the duplicative transfer of an autosomal gene to the Y chromosome, with retention of the progenitor autosomal gene. Subsequent to retroposition (as exemplified by CDY) and transposition (as exemplified by DAZ), the genes transferred to the Y chromosome were amplified there. The expression pattern of the Y-linked CDY genes, however, is very different from that of their autosomal progenitor, Cdyl/CDYL (Fig. 4), whereas expression of the Y-linked DAZ genes is quite similar to that of their autosomal progenitor, DAZL (also known as DAZLA, DAZH or SPGYLA; refs 5,8-11). In the case of CDY, the nascent Y-linked gene was fashioned from a fully processed, reverse-transcribed cDNA which, we speculate, fortuitously integrated near an existing promoter or into an otherwise transcriptionally permissive locale on the Y chromosome. In contrast, in the case of DAZ, a segment of genomic DNA containing an entire transcription unit was duplicated from an autosome onto the Y chromosome5. The transposed genomic DNA of DAZ probably included not only introns and exons but also the promoter of the ancestral gene. Retroposition has long been suspected to figure prominently in the evolution of animal Y chromosomes, but solely as a means of marking the decay of genes previously shared with the X chromosome. In Drosophila miranda, molecular studies have suggested that insertional mutagenesis via retroposition was a major driving force in Y gene decay12-14. An accumulation of retroposed elements on the mouse Y chromosome15 suggested that this paradigm might extend to mammals16. Thus, it is reasonable to speculate that retroposition, especially of highly transposable elements, may have had an important role in Y gene decay in many animal lineages.
As illustrated here by CDY, however, retroposition has also provided a mechanism for gene building during the evolution of the human Y chromosome.

Methods

Identification of cDNA clones. The cDNA insert of plasmid pDP1660 contains the entire ORF of human CDY1, as previously reported1. To identify additional CDY clones, we labelled the plasmid insert with 32P-dCTP by random priming and used this probe to screen a human adult testis cDNA library (Clontech). We incubated blots overnight at 65 °C in sodium phosphate (0.5 M), 7% SDS, followed by three washes of 15 min each at 65 °C in 0.1× SSC, 0.1% SDS. We identified human CDYL clones by rescreening the same library with the same probe at lower stringency (incubation at 60 °C, as above; washing at 55 °C in 0.1× SSC, 0.1% SDS). We also used this probe and these conditions to screen a mouse adult testis cDNA library (Clontech), resulting in the identification of mouse Cdyl clones.

PCR analysis of human genomic DNAs. We tested human genomic DNAs for the presence or absence of CDY1 and CDY2 using PCR assays specific to each of the genes. PCR primers for CDY1 were 5'-GGCGAAAGCTGACAGCAA-3' and 5'-GGGTGAAAGTTCCAGTCAA-3'. PCR primers for CDY2 were 5'-GACCACAAGAAAACTGTGAG-3' and 5'-GATCTGCTGCAATAGGGTC-3'. Thermocycling conditions were 30 cycles of 1 min at 94 °C, 45 s at 60 °C and 45 s at 72 °C.

Characterization of intron/exon structures. We identified BAC clones corresponding to the human CDY1, human CDY2, human CDYL and mouse Cdyl genomic loci by hybridization screening on high-density filters (Research Genetics) of CITB BAC libraries prepared from human or mouse genomic DNA. We then used these BAC clones to characterize the intron/exon structures of the genes. We designed a series of PCR assays, based on the cDNA sequence of each gene, each yielding a cDNA product of approximately 50 bp and collectively encompassing the entire cDNA sequence. We then repeated each of these PCR assays using the corresponding BAC as template.
In some cases, we obtained PCR products of identical length using cDNA or genomic BAC as template, indicating that no introns were located between the two primers. In other cases, a larger PCR product was obtained using the BAC as template (in some cases long-range PCR employing TaqExtender enzyme (Stratagene) was required), indicating the presence of an intron. We sequenced all intron-containing PCR products to identify intron/exon boundaries precisely.

Radiation hybrid mapping of human CDYL. Using PCR, we tested DNAs from the 93 hybrid cell lines of the GeneBridge 4 panel17 (Research Genetics) for the presence of CDYL. PCR primers were 5'-CTGAGCAGGAGAACATCACC-3' and 5'-GCTACGGGTGAGCTTGTTTC-3'. Thermocycling conditions were as listed above. Analysis of the results positioned CDYL with respect to the radiation hybrid map constructed at the Whitehead/MIT Center for Genome Research18 (http://www-genome.wi.mit.edu/cgi-bin/contig/phys_map).

Genetic mapping of mouse Cdyl. We typed genomic DNA from the BSS ([C57BL/6JEi × SPRET/Ei]F1 female × SPRET/Ei male) backcross panel (Jackson Laboratory) for a Cdyl polymorphism by PCR. PCR primers were 5'-AAAAATGACCCTCACTAAGTTAC-3' and 5'-AGGCTCCCTGCAGTAAGTA-3'. Thermocycling conditions were as listed above. This PCR assay yielded a 53-bp product when C57BL/6 genomic DNA was used as template, but no product with Mus spretus DNA (because of sequence polymorphism at priming sites). Analysis of the results positioned Cdyl with respect to the genetic linkage map maintained at the Jackson Laboratory (http://www.informatics.jax.org/map.html).

Southern- and northern-blot analyses. Each Southern-blot lane contained genomic DNA (7 µg). We incubated blots overnight at 65 °C in sodium phosphate (0.5 M), 7% SDS, followed by three washes of 15 min each at 60 °C in 1× SSC, 0.1% SDS. For northern-blot analysis, commercial filters (Clontech) doped with poly(A)+ RNA (2 µg/lane) were incubated under the same conditions, then washed at 65 °C in 0.1× SSC, 0.1% SDS.
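As a quick sanity check on the primer pairs listed in the Methods above, the following minimal sketch tabulates primer length, GC fraction and a rough melting temperature. It assumes the Wallace rule of thumb (2 °C per A/T and 4 °C per G/C), which is only a crude approximation for short oligonucleotides and is not a calculation described by the authors; the function names `gc_fraction` and `wallace_tm` are hypothetical.

```python
# Primer sequences as listed in the Methods (5' to 3').
PRIMERS = {
    "CDY1": ("GGCGAAAGCTGACAGCAA", "GGGTGAAAGTTCCAGTCAA"),
    "CDY2": ("GACCACAAGAAAACTGTGAG", "GATCTGCTGCAATAGGGTC"),
    "CDYL": ("CTGAGCAGGAGAACATCACC", "GCTACGGGTGAGCTTGTTTC"),
    "Cdyl": ("AAAAATGACCCTCACTAAGTTAC", "AGGCTCCCTGCAGTAAGTA"),
}


def gc_fraction(primer: str) -> float:
    """Fraction of bases in the primer that are G or C."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for gene, pair in PRIMERS.items():
    for primer in pair:
        print(f"{gene:5s} {primer:24s} len={len(primer):2d} "
              f"GC={gc_fraction(primer):.2f} Tm~{wallace_tm(primer)} C")
```

Estimates of this kind are useful only for spotting gross mismatches between the two primers of a pair; they say nothing about the specificity of the assays, which depends on the sequence differences exploited in primer design.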
GenBank accession numbers. Human CDY1 major transcript, AF080597; human CDY1 minor transcript, AF000981 (ref. 1); human CDY2, AF080598; human CDYL, AF081258 and AF081259; mouse Cdyl, AF081260 and AF081261.

Received 9 November 1998; Accepted 25 February 1999.

REFERENCES
1. Lahn, B.T. & Page, D.C. Functional coherence of the human Y chromosome. Science 278, 675-680 (1997).
2. Vogt, P.H. et al. Report of the Third International Workshop on Y chromosome mapping 1997. Cytogenet. Cell Genet. 79, 1-20 (1997).
3. Graves, J.A.M. The origin and function of the mammalian Y chromosome and Y-borne genes--an evolving understanding. BioEssays 17, 311-321 (1995).
4. Jegalian, K. & Page, D.C. A proposed path by which genes common to mammalian X and Y chromosomes evolve to become X inactivated. Nature 394, 776-780 (1998).
5. Saxena, R. et al. The DAZ gene cluster on the human Y chromosome arose from an autosomal gene that was transposed, repeatedly amplified and pruned. Nature Genet. 14, 292-299 (1996).
6. Vollrath, D. et al. The human Y chromosome: a 43-interval map based on naturally occurring deletions. Science 258, 52-59 (1992).
7. De Bry, R.W. & Seldin, M.F. Human/mouse homology relationships. Genomics 33, 337-351 (1996).
8. Yen, P.H., Chai, N.N. & Salido, E.C. The human autosomal gene DAZLA: testis specificity and a candidate for male infertility. Hum. Mol. Genet. 5, 2013-2017 (1996).
9. Shan, Z. et al. A SPGY copy homologous to the mouse gene Dazla and the Drosophila gene boule is autosomal and expressed only in the human male gonad. Hum. Mol. Genet. 5, 2005-2011 (1996).
10. Cooke, H.J., Lee, M., Kerr, S. & Ruggiu, M.
A murine homologue of the human DAZ gene is autosomal and expressed only in male and female gonads. Hum. Mol. Genet. 5, 513-516 (1996).
11. Reijo, R. et al. Mouse autosomal homolog of DAZ, a candidate male sterility gene in humans, is expressed in male germ cells before and after puberty. Genomics 35, 346-352 (1996).
12. Ganguly, R., Swanson, K.D., Ray, K. & Krishnan, R. A BamHI repeat element is predominantly associated with the degenerating neo-Y chromosome of Drosophila miranda but absent in the Drosophila melanogaster genome. Proc. Natl Acad. Sci. USA 89, 1340-1344 (1992).
13. Steinemann, M. & Steinemann, S. Degenerating Y chromosome of Drosophila miranda: a trap for retroposons. Proc. Natl Acad. Sci. USA 89, 7591-7595 (1992).
14. Steinemann, M. & Steinemann, S. The enigma of Y chromosome degeneration: TRAM, a novel retrotransposon is preferentially located on the Neo-Y chromosome of Drosophila miranda. Genetics 145, 261-266 (1997).
15. Eicher, E.M., Hutchison, K.W., Phillips, S.J., Tucker, P.K. & Lee, B.K. A repeated segment on the mouse Y chromosome is composed of retroviral-related, Y-enriched and Y-specific sequences. Genetics 122, 181-192 (1989).
16. Charlesworth, B., Sniegowski, P. & Stephan, W. The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371, 215-220 (1994).
17. Gyapay, G. et al. A radiation hybrid map of the human genome. Hum. Mol. Genet. 5, 339-346 (1996).
18. Hudson, T.J. et al. An STS-based map of the human genome. Science 270, 1945-1954 (1995).
19. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78-94 (1997).
20. Novacek, M.J.
Mammalian phylogeny: shaking the tree. Nature 356, 121-125 (1992).
21. Kumar, S. & Hedges, S.B. A molecular timescale for vertebrate evolution. Nature 392, 917-920 (1998).
22. Pilbeam, D. The descent of hominoids and hominids. Sci. Am. 250, 84-96 (1984).
23. Ohno, S. Sex Chromosomes and Sex-linked Genes (Springer-Verlag, Berlin, 1967).
24. Watson, J.M., Spencer, J.A., Riggs, A.D. & Graves, J.A. Sex chromosome evolution: platypus gene mapping suggests that part of the human X chromosome was originally autosomal. Proc. Natl Acad. Sci. USA 88, 11256-11260 (1991).
25. Foster, J.W. & Graves, J.A. An SRY-related sequence on the marsupial X chromosome: implications for the evolution of the mammalian testis-determining gene. Proc. Natl Acad. Sci. USA 91, 1927-1931 (1994).

ACKNOWLEDGMENTS
We thank H. Skaletsky and F. Lewitter for assistance with sequence analysis; the San Diego Zoo and the Duke Primate Center for animal specimens; and P. Bain, A. Bortvin, L. Brown, C. Burge, B. Charlesworth, A. Chess, S. Gilbert, R. Jaenisch, T. Kawaguchi, K. Kleene, D. Menke, R. Saxena, C. Sun, C. Tilford and J. Wang for helpful discussions and comments on the manuscript.

Figure 1: Human CDY proteins and transcripts. a, Schematic representation of predicted CDY1 and CDY2 proteins. Positions of chromo and putative catalytic domains1 are indicated, as are protein lengths in amino acid residues. CDY1 major and minor isoforms differ only near their C termini (grey bar in C region of CDY1 minor isoform). b, 3´ portions of CDY1 (major and minor) and CDY2 transcripts. Shown are partial cDNA and predicted amino acid sequences; coding sequence is placed against a black background (grey in the case of differential portion of CDY1 minor transcript); terminal residues of predicted proteins are numbered.
Part of the intron (genomic) sequence of the CDY1 minor transcript is shown; this intron evolved from what had been exon 9 of CDYL (Fig. 2). The fifth nucleotide of this intron is circled; the appearance of a G at this position (where 84% of introns have a G (ref. 19), but where human CDYL and mouse Cdyl have a T) may have contributed to the evolution of this novel intron. Alternative poly(A) tracts observed in some CDY1 minor and CDY2 cDNA clones are indicated. Below polyadenylation signals (underlined) are human CDYL sequences at corresponding positions; polyadenylation sites employed by human CDY genes apparently arose only after retroposition, through mutations that generated appropriate signal sequences.

Figure 2: Comparison of transcripts and encoded proteins from mouse Cdyl, human CDYL and human CDY1 genes. Exons are numbered; positions of introns are indicated. Black and grey bars depict translated regions. Percentage identities are shown in several regions for both DNA and predicted amino acid sequences; na, not applicable. Note: 64% nucleotide identity was observed between the 3´ portion of human CDYL transcript and the intron/second exon of human CDY1 minor transcript. Sequence similarity between CDY1 genomic locus and CDYL falls abruptly (to less than 50% nucleotide identity) at both the 5´ and 3´ ends: immediately 5´ of CDYL exon 4, and 3´ of the polyadenylation site of CDY1 minor transcript. Poly(A) tracts, including alternative locations observed in some cDNA clones, are indicated.

Figure 3: Homologues of CDYL and CDY in diverse mammalian species. a, Southern blots of male and female genomic DNAs from 13 mammalian species hybridized with cDNA fragments corresponding to the entire coding sequence of either human CDYL (top, TaqI digest) or human CDY1 (bottom, BglII digest). At low stringency, CDYL probe hybridized with male-female common (presumably autosomal) homologues in all 13 species.
It also cross-hybridized with Y-encoded homologues in some primates, especially orangutan and squirrel monkey. CDY probe identified Y-encoded homologues in all primates except lemurs. It also cross-hybridized weakly to the autosomal homologue in some species. Positions of size markers are shown at right. The apparent smears in the squirrel monkey lanes are composed of intense, discrete fragments, as revealed by shorter exposures. b, Summary and interpretation of Southern-blot data. The 13 species tested are arranged phylogenetically20-22.

Figure 4: Tissue distributions of mouse Cdyl, human CDYL and human CDY transcripts. Northern blots incubated with cDNA fragments corresponding to the entire coding sequence of either mouse Cdyl (top), human CDYL (middle), or human CDY1 (bottom) reveal tissue expression patterns; the phylogenetic relationship of these genes (illustrated on left) is based on data presented in Fig. 3.

Figure 5: Schematic representation of three molecular evolutionary processes that contributed genes to human NRY. Persistence (top): an autosomal pair gave rise to the neo-Y and neo-X (subsequently enlarged by fusion with other autosomes or autosomal segments; data not shown; refs 23,24). SRY, RPS4Y and several other genes derived from these ancestral autosomes persist as X-homologous genes in the human NRY (refs 1,2,3,4,25). Transposition: the DAZ genes arose by transposition (and subsequent amplification; data not shown) of autosomal genomic DNA containing the entire DAZL transcription unit5. Retroposition: the CDY genes arose by integration (and subsequent amplification; data not shown) of a reverse-transcribed copy of a processed mRNA derived from the autosomal CDYL gene. Gene sizes are not to scale.
21 March 2002 Nature 416, 323 - 326 (2002); doi:10.1038/416323a Reduced adaptation of a non-recombining neo-Y chromosome DORIS BACHTROG AND BRIAN CHARLESWORTH Institute of Cell, Animal and Population Biology, University of Edinburgh, West Mains Road, Edinburgh EH9 3JT, UK Correspondence and requests for materials should be addressed to D.B. (e-mail: doris.bachtrog@ed.ac.uk). Sex chromosomes are generally believed to have descended from a pair of homologous autosomes. Suppression of recombination between the ancestral sex chromosomes led to the genetic degeneration of the Y chromosome1. In response, the X chromosome may become dosage-compensated1, 2. Most proposed mechanisms for the degeneration of Y chromosomes involve the rapid fixation of deleterious mutations on the Y1. Alternatively, Y-chromosome degeneration might be a response to a slower rate of adaptive evolution, caused by its lack of recombination3. Here we report patterns of DNA polymorphism and divergence at four genes located on the neo-sex chromosomes of Drosophila miranda. We show that a higher rate of protein sequence evolution of the neo-X-linked copy of Cyclin B relative to the neo-Y copy is driven by positive selection, which is consistent with the adaptive hypothesis for the evolution of the Y chromosome3. In contrast, the neo-Y-linked copies of even-skipped and roundabout show an elevated rate of protein evolution relative to their neo-X homologues, probably reflecting the reduced effectiveness of selection against deleterious mutations in a non-recombining genome1. Our results provide evidence for the importance of sexual recombination for increasing and maintaining the level of adaptation of a population. Well-studied sex chromosome systems, such as those of humans or Drosophila melanogaster, are very ancient and show few signs of their evolutionary origins4, 5. In D. 
miranda, a neo-sex chromosome system has resulted from a recent chromosome fusion between the Y chromosome and an autosome (Muller's element C (ref. 6); Fig. 1). The fused autosome, the neo-Y chromosome, is transmitted patrilineally. Its homologue segregates with the X chromosome, forming the neo-X chromosome. The closest relatives of D. miranda, D. pseudoobscura and D. persimilis (from which it diverged about 2 million years ago7), lack the fusion, setting an upper limit to the age of the neo-sex chromosomes. Because male Drosophila have achiasmate meiosis8, the neo-Y never recombines, and so is subject to the same evolutionary forces responsible for the degeneration of 'true' Y chromosomes. Indeed, the neo-sex chromosomes of D. miranda show many characteristics of the much older true sex chromosomes. The neo-Y has degenerated substantially9, whereas the neo-X chromosome has evolved partial dosage compensation2. There is compelling evidence that much of the human Y chromosome is derived from a single autosomal region added to the original Y chromosome10; that is, it is also a degenerate neo-Y chromosome.

Figure 1 Phylogenetic relationships between the species investigated, constructed from the coding region of CycB.

The different evolutionary processes that might be involved in Y-chromosome degeneration leave different footprints of variation and evolution at the molecular level1, and so can be examined empirically. Here we report levels of polymorphism and divergence at four genes, Cyclin B (CycB), roundabout (robo), engrailed (eng) and even-skipped (eve), located on both the neo-X and neo-Y chromosomes of D. miranda. The use of degenerate primers for PCR, and screening of a genomic library, allowed the isolation of these genes from D. miranda (see Methods). In situ hybridization of probes to the polytene chromosomes confirmed their expected position on the neo-sex chromosomes of D.
miranda (their positions on the polytene map11 are: CycB, 32A; robo, 33C; eng, 31D; eve, 23A). For CycB, robo and eve, both coding and non-coding sequence data were obtained; for eng, only the intron sequence was investigated. The use of reverse-transcriptase-mediated polymerase chain reaction (RT-PCR) with primers spanning an intron confirmed that, for CycB, eve and robo, both alleles are transcribed and spliced in male flies, indicating that both the neo-X and the neo-Y copies are probably under selective constraints. For outgroup comparisons, all genes were isolated by PCR from D. pseudoobscura, in which Muller's element C is autosomal (Fig. 1). We investigated nucleotide variation in 12 wild-derived lines of D. miranda (Table 1). A total of 5.2 kilobases per individual were surveyed for the homologous regions on the neo-X and neo-Y chromosomes. Between the neo-X-linked and the neo-Y-linked genes investigated, there are 106 fixed differences and no shared polymorphisms, which is consistent with a lack of recombination between the neo-sex chromosomes. Net silent-site divergence12 between the neo-X and neo-Y alleles is 2.8%; combining this with previous data13, and assuming a silent substitution rate of 1.2 × 10^-8 per site per year for Drosophila14, this yields a total divergence time of 2.2 million years, giving an age of about 1.1 million years for the neo-sex chromosome system. On the neo-X chromosome, 61 segregating sites were observed (27 silent single-nucleotide polymorphisms, 4 replacements, 25 non-coding sites and 5 insertion or deletion events). In contrast, the homologous neo-Y-linked regions almost completely lacked variability: only two singleton variants (one synonymous, one non-coding) were detected. Under the standard neutral model, genetic diversity is proportional to the product of the effective population size, Ne, and the mutation rate15.
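The age estimate given above for the neo-sex chromosomes follows from simple molecular-clock arithmetic. The following is a minimal sketch, assuming that the total divergence time (the sum of both branches) is the net silent-site divergence divided by the substitution rate per site per year, and that the age of the split is half of that total; the function names are hypothetical. Using only the 2.8% divergence reported here gives a slightly larger figure than the 2.2 million years quoted in the text, which also draws on previous data13.

```python
def total_divergence_time(silent_divergence: float, rate_per_site_per_year: float) -> float:
    """Total time (years) separating two lineages, summed over both branches."""
    return silent_divergence / rate_per_site_per_year


def split_age(silent_divergence: float, rate_per_site_per_year: float) -> float:
    """Age (years) of the split, assuming both lineages diverge at the same rate."""
    return total_divergence_time(silent_divergence, rate_per_site_per_year) / 2.0


# Values taken from the text: 2.8% net silent-site divergence between neo-X
# and neo-Y, and a silent substitution rate of 1.2e-8 per site per year.
total = total_divergence_time(0.028, 1.2e-8)
age = split_age(0.028, 1.2e-8)
print(f"total divergence time: {total / 1e6:.2f} million years")
print(f"age of neo-sex chromosomes: {age / 1e6:.2f} million years")
```

The halving step reflects the fact that differences between the neo-X and neo-Y accumulate along both lineages since the fusion.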
If no selective forces are operating, Ne for the neo-Y chromosome should be one-third that of the neo-X chromosome, assuming random variation in offspring number for each sex16. Our data show that the diversity level for the neo-Y chromosome is reduced 30-fold compared with that for the neo-X. This result is consistent with our previous finding of a 25-fold reduction in microsatellite variability on the neo-Y chromosome of D. miranda17. The overall rate of silent substitution on the neo-Y branch of the tree connecting the neo-X and neo-Y of D. miranda and D. pseudoobscura (Fig. 1) is similar to that for the neo-X branch, showing that the difference in diversity is not caused by a lower mutation rate of the neo-Y-linked genes (Table 1). A likelihood method was used to quantify the reduction in diversity of the neo-Y chromosome, combining both the microsatellite and the sequence data. Coalescent simulations yield a maximum-likelihood estimate of the ratio of Ne for the neo-Y chromosome to that for the neo-X of 0.04 (Fig. 2). This is consistent with the expectation of a substantial reduction in Ne under various forms of selection acting on a non-recombining genome1. A similar magnitude of reduction in nucleotide polymorphism has also been observed on an evolving plant Y chromosome18.

Figure 2 Plots of the relative values of the log-likelihood functions ln L(k | V), ln L(k | S) and ln L(k | V, S) as a function of k, the reduction in the effective population size of the neo-Y chromosome relative to the neo-X.

A large reduction in Ne for the neo-Y chromosome greatly reduces the rate of fixation of beneficial mutations on the neo-Y chromosome3, and increases the rate of fixation of deleterious mutations1. If loci on the neo-X chromosome continue to adapt and their homologues on the neo-Y fail to do so, it is advantageous to upregulate X-linked genes and downregulate their maladapted Y-linked homologues3.
Similarly, an accumulation of deleterious alleles on the neo-Y favours increased activity of neo-X genes relative to their neo-Y homologues1. Both processes can therefore lead to the observed inactivation of neo-Y-linked genes9 and to dosage compensation2. Comparisons with the outgroup species reveal that the neo-X-linked copy of CycB has a very high rate of protein evolution (Table 1 and Fig. 1). Twenty-three amino-acid replacement substitutions were observed on the branch of the phylogeny leading to the neo-X, but only nine occurred on the neo-Y branch. This difference is not the result of an increased mutation rate on the neo-X, as the proportion of synonymous substitutions per synonymous site, Ks, is very similar on the neo-X and neo-Y lineages (Ks ≈ 0.06 for both branches (Table 1)). For phylogenetic analysis of the nucleotide substitutions, CycB was also amplified from D. persimilis, D. affinis and D. subobscura, which are all members of the obscura subgroup of Drosophila, to which D. miranda belongs6. Likelihood analysis19 indicates that the ratio of the non-synonymous to synonymous substitution rate (Ka/Ks) differs significantly between branches (2ΔL = 31.9, d.f. = 8, P < 0.0001 (Fig. 1)). A model of a single Ka/Ks ratio for all branches was compared with a model of a specific ratio for the neo-X branch plus a second rate common to all other branches (two-rate model). The latter fitted the data significantly better (2ΔL = 18.4, d.f. = 1, P < 0.00002), whereas a model with a different Ka/Ks ratio for each branch did not significantly improve the likelihood over the two-rate model (2ΔL = 13.4, d.f. = 7, P ≈ 0.06). This suggests that most of the heterogeneity in the Ka/Ks ratio between branches of the phylogeny reflects an elevated rate on the branch leading to the neo-X chromosome (Fig. 1). Heterogeneity in the Ka/Ks ratio between lineages might be caused either by positive selection or by relaxed selective constraints in some lineages12.
If the elevated rate of amino-acid replacements on the neo-X were solely a consequence of reduced functional constraints, the ratio of synonymous to replacement substitutions would be similar for polymorphisms within D. miranda and for fixed differences between D. miranda and its relatives20. However, the ratio of replacement to silent changes was significantly higher for fixed differences than for polymorphisms in comparisons involving the neo-X-linked copy of CycB (Table 2), suggesting that directional darwinian selection has driven the rapid evolution of CycB on the neo-X chromosome20. In addition, the recent fixation of an advantageous mutation reduces neutral polymorphism around the site that is the target of selection21. Consistent with the selection hypothesis is a decrease in variability at CycB (Table 1). The level of polymorphism is about 4-fold lower than estimates for the other neo-X-linked genes investigated (see Table 1). By the HKA test22, the level of neo-X silent polymorphism relative to divergence from D. pseudoobscura at CycB is significantly lower than that for all three other genes (eve, eng and robo (P < 0.01)). This result is consistent with the idea that the neo-X chromosome experiences a faster rate of adaptive evolution than the non-recombining neo-Y. In contrast, eve and robo both show a higher rate of amino-acid replacements on the neo-Y chromosome than their neo-X homologues (Table 1). Maximum-likelihood analysis of the combined sequence data set for eve and robo was used to compare a model in which Ka/Ks was assumed to be the same on the neo-X and neo-Y branches with a model in which it was allowed to differ; the latter fitted the data significantly better (2ΔL = 6.1, d.f. = 1, P = 0.01). In addition, Ka/Ks on the neo-Y branch of CycB was higher than on the other branches of the phylogeny (except for the neo-X branch; see Fig. 1).
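The likelihood ratio tests reported above can be checked with a few lines of arithmetic. The Python sketch below converts a 2ΔL statistic and its degrees of freedom into an upper-tail P value, assuming (as the Methods state) that 2ΔL is approximately χ²-distributed. It is an illustration only, not the codeml procedure used in the paper: the function name `chi2_sf` is hypothetical, and the finite series for integer degrees of freedom stands in for a statistics library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1.

    Uses the exact finite series that exists for integer degrees of freedom.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus
    # sqrt(2x/pi) * exp(-x/2) * sum over k < (df-1)/2 of x^k / (1*3*...*(2k+1))
    tail = math.erfc(math.sqrt(half))
    m = (df - 1) // 2
    if m > 0:
        term, total = 1.0, 1.0
        for k in range(1, m):
            term *= x / (2 * k + 1)
            total += term
        tail += math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total
    return tail

# (description, 2*deltaL, d.f.) as quoted in the text
reported_tests = [
    ("free ratio per branch vs single ratio (CycB)", 31.9, 8),
    ("two-rate model vs single ratio (CycB)", 18.4, 1),
    ("free ratio per branch vs two-rate model (CycB)", 13.4, 7),
    ("neo-X and neo-Y ratios differ (eve + robo)", 6.1, 1),
]

for label, stat, df in reported_tests:
    print(f"{label}: 2dL = {stat}, d.f. = {df}, P = {chi2_sf(stat, df):.2g}")
```

The printed values can be compared with the significance thresholds quoted in the text; small differences from the published P values would be expected from rounding of the reported statistics.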
Several lines of evidence suggest that the acceleration of the rate of amino-acid substitution on the neo-Y chromosome is not due to a lack of functional constraints on the coding region of eve and robo. Using RT–PCR, we demonstrated that the neo-Y-linked copies of both genes are transcribed. The entire coding sequences of eve and robo were analysed, and contained no stop codons or frame-shift mutations. A ratio of Ka/Ks of 1 is expected for a gene that is evolving entirely neutrally, as was found for the Lcp genes on the neo-Y chromosome of D. miranda13, whereas a lower ratio is expected for genes subject to functional constraints12. Both eve and robo display a very low Ka/Ks ratio on the neo-Y chromosome (Table 1), suggesting that both genes are under selective constraints. These results, together with the large reduction in Ne for the neo-Y chromosome, suggest that many amino-acid substitutions in these genes are slightly deleterious and that natural selection has not been able to prevent their accumulation on the neo-Y branch1. A similar effect has been reported for some other non-recombining genomes23, 24. The adaptive significance of sexual recombination is one of the most puzzling problems in evolutionary biology25, 26. Most asexual lineages of eukaryotes seem to become extinct at a higher rate than their sexual relatives25, 27. Similarly, the Y chromosome is prone to degeneration once it stops recombining with the X1. The dismal fates of clonally transmitted species and genomes suggest that genetic recombination is necessary for their persistence over long periods of evolutionary time. Theoretical models predict that sexual populations should be more effective in incorporating new beneficial mutations and preventing the accumulation of deleterious alleles25, 26.
Our evidence for both faster adaptation on the recombining X chromosome and the accumulation of deleterious mutations on the non-recombining Y chromosome suggests that both processes may be important in conferring an evolutionary advantage to sex and recombination.

Methods

Isolation of genes, sequencing and RT–PCR

CycB and robo were isolated from a genomic library constructed from D. miranda males (strain 0101.3 (ref. 13)) using the Lambda FixII Library kit (Stratagene). eve and eng were isolated by using degenerate primers designed from regions conserved between D. melanogaster and D. virilis. Allele-specific primers were used for PCR amplification of the neo-X-linked and neo-Y-linked copies of CycB and robo from male genomic DNA, followed by direct sequencing of both strands of the PCR products. The neo-Y-specific primers amplify a product only in males, whereas the neo-X-specific primers amplify in both sexes. For eng and eve, PCR primers amplifying both copies were used, and the amplified product was cloned with the TOPO TA cloning kit (Invitrogen). Clones containing either the neo-X or neo-Y copy were distinguished by a size difference in the cloned PCR products in the case of eng, or by a HaeII restriction site for eve. To exclude PCR errors, at least three independent clones per allele and individual were analysed (the PCR error rate was found to be 0.00092 per base pair). For the population survey, 12 strains of D. miranda were sequenced (see ref. 13 for description of the strains), using the ABI Prism BigDye Chemistry (Perkin-Elmer) on an ABI 377 automated sequencer. Total RNA from male D. miranda (strain MSH 38 (ref. 13)) was extracted with the RNA STAT-60 kit, and first-strand cDNA synthesis was performed with random or dT primers and the cDNA Cycle kit (Invitrogen). Primers spanning at least one intron were used to PCR-amplify the neo-X and the neo-Y copies. Amplification products were cloned with the TOPO TA cloning vector and sequenced.
Sequence analysis

Sequences of D. miranda and D. pseudoobscura were aligned manually. Nucleotide diversity was calculated as described28. To align the coding region of CycB between the more diverged members of the obscura species group, the translated proteins were aligned by using CLUSTAL X (ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX/) and the nucleotide alignment was constructed from the protein alignment. The phylogeny based on the coding region of CycB was derived by using PUZZLE (ftp://ftp.ebi.ac.uk/pub/software/mac/puzzle/) and is shown in Fig. 1. Ka and Ks were calculated with a maximum-likelihood method, which accounts for unequal transition and transversion rates and unequal base and codon frequencies19. A likelihood ratio test was used to test different models of evolution by application of the codeml program within the PAML software package19. A model assuming the same Ka/Ks ratio for all branches was compared with a model assuming a different Ka/Ks ratio for all or some branches of the phylogeny. Twice the difference in log-likelihood between models (2ΔL) is assumed to be distributed approximately as χ², with degrees of freedom equal to the difference in the numbers of parameters.

Coalescent process simulations

Coalescent simulations with the standard algorithm of Hudson29 were performed to obtain a maximum-likelihood estimate of the reduction in Ne for the neo-Y chromosome compared with that of the neo-X, combining both the nucleotide polymorphism data set presented here and data on microsatellite variability17. Whereas all loci on the neo-Y share one genealogy, the loci on the neo-X are probably independent. We therefore generated 11 independent trees for a data set of 12 chromosomes for the neo-X (7 microsatellite loci and 4 genes), whereas a single tree with 11 completely linked loci was computed for the neo-Y. Microsatellite mutations were superimposed on the trees following the single stepwise mutation model described in ref. 17.
In short, seven random estimates of θ = 4Neµ (where µ is the mutation rate per microsatellite locus) were drawn from a gamma distribution for each run, and mutations using the resulting θ values were laid down on the trees in accordance with Poisson distributions, using the same values of θ for homologous loci. The mean variance in repeat number per locus, V̄, across the seven loci was then calculated. After each run, Vsim = (V̄X − V̄Y)/V̄X for the simulated data set was computed. For each gene, the total number of segregating mutations (CycB, 6; robo, 20; eve, 28; eng, 9) was assigned to the simulated neo-X and neo-Y trees in proportion to their lengths. After each run, the total numbers of segregating sites on the neo-X (SX) and on the neo-Y (SY) were determined and the quantity Ssim = (SX − SY)/SX was computed. To estimate the reduction in Ne for the neo-Y loci, the tree length (in units of 2Ne generations) was multiplied by a scaling factor k before mutations were laid down on the neo-Y tree. The simulations can be used to determine the likelihood of a pair (Vsim, Ssim) for a given value of k. For each value of k, n replicates are generated. The number of runs M for which |Vsim − Vobs| ≤ δ and |Ssim − Sobs| ≤ δ was determined, where δ is a preassigned mesh size for the continuous variables V and S. Following ref. 30, the likelihood of the sample is approximated by L(k | V, S) ≈ M/n. Here, n is 10⁵ and δ is 0.02.

Received 21 September 2001; accepted 18 December 2001

References

1. Charlesworth, B. & Charlesworth, D. The degeneration of Y chromosomes. Phil. Trans. R. Soc. Lond. B 355, 1563-1572 (2000).
2. Marin, I., Siegal, M. L. & Baker, B. S. The evolution of dosage-compensation mechanisms. BioEssays 22, 1106-1114 (2000).
3. Orr, H. A. & Kim, Y. An adaptive hypothesis for the evolution of the Y chromosome. Genetics 150, 1693-1698 (1998).
4. Lahn, B. T., Pearson, N. M. & Jegalian, K. The human Y chromosome, in the light of evolution. Nature Rev. Genet. 2, 207-216 (2001).
5. Carvalho, A. B., Lazzaro, B. P. & Clark, A. G. Y chromosomal fertility factors kl-2 and kl-3 of Drosophila melanogaster encode dynein heavy chain polypeptides. Proc. Natl Acad. Sci. USA 97, 13239-13244 (2000).
6. Powell, J. R. Progress and Prospects in Evolutionary Biology: The Drosophila Model (Oxford Univ. Press, New York, 1997).
7. Schaeffer, S. W. & Miller, E. L. Molecular population genetics of an electrophoretically monomorphic protein in the alcohol dehydrogenase region of Drosophila pseudoobscura. Genetics 132, 163-178 (1992).
8. Gethmann, R. C. Crossing over in males of higher Diptera (Brachycera). J. Hered. 79, 344-350 (1988).
9. Steinemann, M. & Steinemann, S. Enigma of Y chromosome degeneration: neo-Y and neo-X chromosomes of Drosophila miranda a model for sex chromosome evolution. Genetica 102-103, 409-420 (1998).
10. Waters, P. D., Duffy, B., Frost, C. J., Delbridge, M. L. & Graves, J. A. The human Y chromosome derives largely from a single autosomal region added to the sex chromosomes 80-130 million years ago. Cytogenet. Cell. Genet. 92, 74-79 (2001).
11. Das, M., Mutsuddi, D., Duttagupta, A. K. & Mukherjee, A. S. Segmental heterogeneity in replication and transcription of the X2 chromosome of Drosophila miranda and conservativeness in the evolution of dosage compensation. Chromosoma 87, 373-388 (1982).
12. Li, W. Molecular Evolution (Sinauer Associates, Sunderland, Massachusetts, 1997).
13. Yi, S. & Charlesworth, B. Contrasting patterns of molecular evolution of the genes on the new and old sex chromosomes of Drosophila miranda. Mol. Biol. Evol. 17, 703-717 (2000).
14. Wang, R. L. & Hey, J. The speciation history of Drosophila pseudoobscura and close relatives: inferences from DNA sequence variation at the period locus. Genetics 144, 1113-1126 (1996).
15. Kimura, M. The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, Cambridge, 1983).
16. Wright, S. Evolution and the Genetics of Populations (Univ. of Chicago Press, Chicago, 1969).
17. Bachtrog, D. & Charlesworth, B. Reduced levels of microsatellite variability on the neo-Y chromosome of Drosophila miranda. Curr. Biol. 10, 1025-1031 (2000).
18. Filatov, D. A., Moneger, F., Negrutiu, I. & Charlesworth, D. Low variability in a Y-linked plant gene and its implications for Y-chromosome evolution. Nature 404, 388-390 (2000).
19. Yang, Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555-556 (1997).
20. McDonald, J. H. & Kreitman, M. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351, 652-654 (1991).
21. Barton, N. H. Genetic hitchhiking. Phil. Trans. R. Soc. Lond. B 355, 1553-1562 (2000).
22. Hudson, R. R., Kreitman, M. & Aguade, M. A test of neutral molecular evolution based on nucleotide data. Genetics 116, 153-159 (1987).
23. Lynch, M. & Blanchard, J. L. Deleterious mutation accumulation in organelle genomes. Genetica 102-103, 29-39 (1998).
24. Fridolfsson, A. K. & Ellegren, H. Molecular evolution of the avian CHD1 genes on the Z and W sex chromosomes. Genetics 155, 1903-1912 (2000).
25. Maynard Smith, J. The Evolution of Sex (Cambridge Univ. Press, Cambridge, 1978).
26. Barton, N. H. & Charlesworth, B. Why sex and recombination? Science 281, 1986-1990 (1998).
27. Bell, G. The Masterpiece of Nature (Univ. of California, Berkeley, 1982).
28. Tajima, F. Statistical analysis of DNA polymorphism. Jpn J. Genet. 68, 567-595 (1993).
29. Hudson, R. R. in Oxford Surveys in Evolutionary Biology Vol. 7 (eds Futuyma, D. & Antonovics, J.) 1-44 (Oxford Univ. Press, Oxford, 1990).
30. Weiss, G. & von Haeseler, A. Inference of population history using a likelihood approach. Genetics 149, 1539-1546 (1998).

Acknowledgements. We thank P. Andolfatto, N. Barton, D. Charlesworth, I. Gordo, P. Keightley and S. Wright for helpful comments on the manuscript. D.B. is supported by a Marie Curie fellowship and B.C. by the Royal Society.

Competing interests statement. The authors declare that they have no competing financial interests.
Figure 1 Phylogenetic relationships between the species investigated, constructed from the coding region of CycB. The numbers shown above the lines give the estimated Ka values (percentages) for each lineage obtained by maximum likelihood19; the corresponding Ks values are given below the lines. The karyotypes of the species are also illustrated. The letters A–E indicate the five major chromosomes of the basic Drosophila karyotype6. In D. miranda, element C is fused to the true Y chromosome (neo-Y, shown in grey), and its unfused homologue, the neo-X, segregates with the X chromosome. The relatives of D. miranda lack the Y–C fusion. (Note that D. subobscura lacks the A–D fusion found in the other species6.)

Figure 2 Plots of the relative values of the log-likelihood functions ln L(k | V), ln L(k | S) and ln L(k | V, S) as a function of k, the reduction in the effective population size of the neo-Y chromosome relative to the neo-X. All three curves have been normalized so that their maxima are at 0. The horizontal line indicates the two-unit support interval.

11 April 2002
Nature 416, 624 - 626 (2002); doi:10.1038/416624a

Strong male-driven evolution of DNA sequences in humans and apes

KATERYNA D. MAKOVA AND WEN-HSIUNG LI

Department of Ecology and Evolution, University of Chicago, 1101 East 57th Street, Chicago, Illinois 60637, USA

Correspondence and requests for materials should be addressed to W.-H.L. (e-mail: whli@uchicago.edu).

Studies of human genetic diseases have suggested a higher mutation rate in males than in females1 and the male-to-female ratio (α) of mutation rate has been estimated from DNA sequence and microsatellite data to be about 4–6 in higher primates2-5. Two recent studies, however, claim that α is only about 2 in humans6, 7. This is even smaller than the estimates (α > 4) for carnivores and birds8, 9; humans should have a higher α than carnivores and birds because of a longer generation time and a larger sex difference in the number of germ cell cycles.
To resolve this issue, we sequenced a noncoding fragment on Y of about 10.4 kilobases (kb) and a homologous region on chromosome 3 in humans, greater apes, and lesser apes. Here we show that our estimate of α from the internal branches of the phylogeny is 5.25 (95% confidence interval (CI) 2.44 to ∞), similar to the previous estimates2-5, but significantly higher than the two recent ones6, 7. In contrast, for the external (short, species-specific) branches, α is only 2.23 (95% CI: 1.47–3.84). We suggest that closely related species are not suitable for estimating α, because of ancient polymorphism and other factors. Moreover, we provide an explanation for the small estimate of α in a previous study12. Our study reinstates a high α in hominoids and supports the view that DNA replication errors are the primary source of germline mutation. An effective way to estimate α is to use highly similar noncoding sequences on different types of chromosomes10. Noting that the DAZ locus was translocated from chromosome 3 to Y after the split between Old and New World monkeys11, we sequenced three noncoding segments (about 10.4 kb total) on chromosomes Y and 3 in the human, bonobo, gorilla, siamang and gibbon. When we estimate the α-values separately for the external and the internal branches of the tree (Fig. 1) an intriguing pattern emerges (Table 1). The α-values vary enormously among the external branches, but their average is low. The sum of the external branch lengths is 4.43% for the Y sequences (Y) and 3.22% for the chromosome 3 sequences (A), leading to Y/A = 1.38. According to a previous study12, Y/A = 2α/(1 + α), so α = 2.23 (95% CI: 1.47–3.84), not significantly different from the estimate (1.55) in ref. 6. In contrast, the sum of the internal branch lengths is 3.77% for the Y sequences and 2.25% for the chromosome 3 sequences, leading to Y/A = 1.68 and α = 5.25 (95% CI: 2.44 to ∞). This is similar to the earlier estimates of α in primates2-5, but significantly higher than those in refs 6 and 7.
Thus, estimates of α obtained from closely related species tend to be lower than those obtained from more distantly related species.

Figure 1 Phylogenetic tree of nucleotide sequences.

We wondered how to explain the above discrepancy. The level of polymorphism on chromosome Y is usually extremely low because of selective sweeps and background selection13, 14 and because the effective population size of Y is only one-quarter of that of an autosome. The average divergence between two species is equal to π + 2ut, where t is the divergence time, u is the mutation rate, and π is the average divergence between two sequences in the ancestral population (ancient nucleotide diversity) at the time of speciation15. The ratio of Y divergence (Y) to autosome divergence (A) is Y/A = (y + πy)/(a + πa), where y is the number of mutations on Y, a is the number of mutations on an autosome, and πy and πa are the ancient nucleotide diversities at the point of speciation on Y and an autosome, respectively. As πy is usually lower than πa, (y + πy)/(a + πa) can be substantially lower than y/a for closely related species. For instance, πa between the two gorilla alleles at the chromosome 3 locus is 0.19% (Fig. 1). If we assume πa = 0.19% and πy ≈ 0 in the common ancestor of the bonobo and the gorilla, then Y/A between the bonobo and the gorilla increases from 2.11%/1.49% = 1.42 to 2.11%/(1.49% − 0.19%) = 1.62, and α increases from 2.45 to 4.26. Clearly, ancient polymorphism can considerably reduce the estimate of α for closely related species. If A is small, even a small error in the estimate of A can have a strong effect on Y/A. A more serious problem in ref. 6 is that their phylogeny (Fig. 2a) assumes that the Y-linked sequence, which was derived from the transposition of an X-linked sequence in an ancestral human population, has always evolved as a Y-linked sequence since its separation from the X-linked sequence sampled in their study. This is unlikely.
Rather, one of the schemes in Fig. 2b–d should be true. In Fig. 2b, when chimpanzee X is used as a reference, we have X = x1 and Y = x2 + y. If y < x2, Y can be close to X and α can be seriously underestimated. In Fig. 2c, X= 1 + x1, Y= 2 + x2 + y and 1> 2, so if y is very small, Y can even be smaller than X; that is, both Y/X and α can be smaller than 1. In Fig. 2d, 1 < 2, so Y/X is greater than 1. However, Y/X can be close to 1, if y is small and if 2 is close to 1. Among the three schemes, Fig. 2b is most probable because the observed human X–human Y divergence (1.18%) is smaller than the observed human X–chimpanzee X divergence (1.56%)6. In any case, α is underestimated if Fig. 2a is used as the phylogeny.

Figure 2 Phylogenetic schemes for the sequences of ref. 6.

As to the estimate of α = 2.1 in ref. 7, corrections for multiple substitutions were not made in the data analysis. This could have led to an underestimate of α for old repetitive elements. For young elements, the low estimate of α could be in part due to low polymorphism on Y. One additional problem was the false assumption that the repetitive elements of the same subfamily were inserted into the genome at the same time16, introducing errors into the estimation of α. We have provided a resolution for the controversy on the magnitude of α in primates. Most previous studies compared X- and Y-linked sequences (see ref. 2), and some have argued that the high α values could be due to a reduction in mutation rate in X rather than an elevated rate in Y17. To avoid this possibility, we compared an autosomal and a Y-linked locus. The fact that this study and all previous studies, with the exception of comparisons between closely related species, give consistent estimates of α ≈ 4–6 provides conclusive evidence for strong male-driven evolution in hominoids.
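The α estimates quoted above follow from inverting the relation Y/A = 2α/(1 + α), which gives α = (Y/A)/(2 − Y/A). The Python sketch below reproduces that arithmetic from the branch-length sums given in the text, including the ancient-polymorphism correction for the bonobo–gorilla comparison. It is an illustration under stated assumptions rather than the authors' analysis: the function names are hypothetical, and Y/A is rounded to two decimals before inversion because that appears to be how the figures in the text were obtained.

```python
def alpha_from_ratio(y_over_a):
    """Invert Y/A = 2*alpha / (1 + alpha) for a Y-versus-autosome comparison."""
    if y_over_a >= 2.0:
        # Y/A approaches 2 as alpha grows without bound, so no finite alpha exists.
        return float("inf")
    return y_over_a / (2.0 - y_over_a)

def divergence_ratio(y_div, a_div, pi_y=0.0, pi_a=0.0):
    """Y/A after subtracting assumed ancestral diversity from each divergence.

    Rounded to two decimals (an assumption made to match the text).
    """
    return round((y_div - pi_y) / (a_div - pi_a), 2)

# Branch-length sums in percent, as quoted in the text.
alpha_external = alpha_from_ratio(divergence_ratio(4.43, 3.22))
alpha_internal = alpha_from_ratio(divergence_ratio(3.77, 2.25))

# Bonobo-gorilla comparison, before and after removing an assumed
# ancestral autosomal diversity of 0.19% (with Y diversity taken as ~0).
alpha_uncorrected = alpha_from_ratio(divergence_ratio(2.11, 1.49))
alpha_corrected = alpha_from_ratio(divergence_ratio(2.11, 1.49, pi_a=0.19))

print(f"external branches:  alpha = {alpha_external:.2f}")
print(f"internal branches:  alpha = {alpha_internal:.2f}")
print(f"bonobo-gorilla:     alpha = {alpha_uncorrected:.2f} -> {alpha_corrected:.2f}")
```

Because the denominator 2 − Y/A shrinks rapidly as Y/A approaches 2, small changes in the divergence ratio translate into large changes in α, which is consistent with the unbounded upper confidence limit reported for the internal branches.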
Methods

DNA sequencing

We studied three homologous fragments on Y and chromosome 3 that correspond to nucleotides 36,887–40,238, 49,037–54,082, and 55,171–57,328 of human Y contig AC006983.4 and nucleotides 59,378–56,044, 47,226–42,245, and 40,284–38,159 of human chromosome 3 contig AC010139.4. The 5' ends of the fragments are located 1.5, 13.7 and 20.6 kb, respectively, downstream from the 3' untranslated region of the DAZ gene. The fragments have no homology with any expressed sequence tag and contain no known or predicted gene. Interspersed and simple repeats were identified using RepeatMasker (http://ftp.genome.washington.edu/RM/RepeatMasker.html). Searches employing GenScan and BLAST failed to identify any genes or exons within these fragments. Genomic DNAs of a female and a male human, bonobo (Pan paniscus), gorilla (Gorilla gorilla), gibbon (Hylobates lar) and siamang (Hylobates syndactylus) were used separately as templates in polymerase chain reaction (PCR). The Expand High Fidelity PCR system (Roche) was used to minimize errors in PCR amplification. The PCR primers were designed from the human sequence (available as Supplementary Information). Female DNA was used to amplify the chromosome 3 locus and male DNA to amplify the chromosome Y locus. The sequences were deposited in GenBank under accession numbers AF483550–AF483579.

Statistical analyses

The sequences of the fragments studied were concatenated and aligned using the MegAlign module of DNAStar (Lasergene), and the alignment was adjusted manually. As the divergence between the sequences studied is low (at most 11%), the alignment was rather simple. We found evidence of at least two copies of the DAZ locus on chromosome Y in human (there were five 'polymorphic' sites in a 10.4-kb sequence), gorilla (two polymorphic sites), and siamang (four polymorphic sites). Thus, the copies were almost identical, suggesting that the duplications happened recently.
To take the presence of polymorphisms on chromosome 3 into account, two pseudosequences were generated for each of the individuals studied. For example, if the site was heterozygous for A and G, one of the pseudosequences was assigned 'A' and the other was assigned 'G'. This assignment was done randomly by DAMBE18. Insertion/deletion polymorphisms (indels) in the alignment, as well as heterozygous indel sites, were not included in the calculations. These indels included one deletion of 106 base pairs (bp) in gibbon Y and one insertion of 33 bp in human Y, although most of the others were short indels, many of which were in mononucleotide microsatellites. One segment of 121 bp in gorilla chromosome 3 was difficult to sequence and was excluded from analysis. So, in total, 1,148 nucleotide sites were excluded from analysis. Tajima and Nei's genetic distances19 were estimated using DAMBE18. The distances were apportioned among the branches of the phylogeny using the FITCH module of PHYLIP20. The ratio of substitution rates ($Y/A$) was calculated from the sum of the branches on Y ($Y$) and chromosome 3 ($A$). The average of the two values for the chromosome 3 pseudosequences was used in the calculations. This procedure was also used for treating the 'polymorphic' sites in Y-linked sequences. The formula from ref. 12, $Y/A = 2\alpha/(1 + \alpha)$, was used to calculate $\alpha$. To calculate the 95% CI for $\alpha$, we used two methods. First, we derived the variance of $Y/A$. If $L$ is the length of the sequence, $V(Y) = Y(1 - Y)/[L(1 - 4Y/3)^2]$, $V(A) = A(1 - A)/[L(1 - 4A/3)^2]$, and $V(Y/A) = V(Y)/E(A)^2 + E(Y)^2 V(A)/E(A)^4$. Second, we used the bootstrap. The results for the two methods were similar. The second method was used in the text.

Supplementary information accompanies this paper.

Received 8 October 2001; accepted 25 January 2002

References
1. Crow, J. F. The origins, patterns and implications of human spontaneous mutation. Nature Rev. Genet. 1, 40-47 (2000).
2. Huang, W., Chang, B. H.-J., Gu, X., Hewett-Emmett, D. & Li, W.-H. Sex differences in mutation rate in higher primates estimated from AMG intron sequences. J. Mol. Evol. 44, 463-465 (1997).
3. Agulnik, A. I. et al. Evolution of the DAZ gene family suggests that Y-linked DAZ plays little, or a limited, role in spermatogenesis but underlines a recent African origin for human populations. Hum. Mol. Genet. 7, 1371-1377 (1998).
4. Nachman, M. W. & Crowell, S. L. Estimate of the mutation rate per nucleotide in humans. Genetics 156, 297-304 (2000).
5. Ellegren, H. Heterogeneous mutation processes in human microsatellite DNA sequences. Nature Genet. 24, 400-402 (2000).
6. Bohossian, H. B., Skaletsky, H. & Page, D. C. Unexpectedly similar rates of nucleotide substitution found in male and female hominids. Nature 406, 622-625 (2000).
7. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860-921 (2001).
8. Pecon Slattery, J. & O'Brien, S. J. Patterns of Y and X chromosome DNA sequence divergence during the Felidae radiation. Genetics 148, 1245-1255 (1998).
9. Ellegren, H. & Fridolfsson, A. K. Male-driven evolution of DNA sequences in birds. Nature Genet. 17, 182-184 (1997).
10. Shimmin, L. C., Chang, B. H.-J., Hewett-Emmett, D. & Li, W.-H. Potential problems in estimating the male-to-female mutation rate ratio from DNA sequence data. J. Mol. Evol. 37, 160-166 (1993).
11. Saxena, R. et al. Four DAZ genes in two clusters found in the AZFc region of the human Y chromosome. Genomics 67, 256-267 (2000).
12. Miyata, T., Hayashida, H., Kuma, K., Mitsuyasu, K. & Yasunaga, T. Male-driven molecular evolution: a model and nucleotide sequence analysis. Cold Spring Harbor Symp. Quant. Biol. 52, 863-867 (1987).
13. Begun, D. J. & Aquadro, C. F. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356, 519-520 (1992).
14. Charlesworth, B. & Charlesworth, D. The degeneration of Y chromosomes. Phil. Trans. R. Soc. Lond. B 355, 1563-1572 (2000).
15. Li, W.-H. Distribution of nucleotide differences between two randomly chosen cistrons in a finite population. Genetics 85, 331-337 (1977).
16. Erlandsson, R., Wilson, J. F. & Paabo, S. Sex chromosomal transposable element accumulation and male-driven substitutional evolution in humans. Mol. Biol. Evol. 17, 804-812 (2000).
17. McVean, G. T. & Hurst, L. D. Evidence for a selectively favourable reduction in the mutation rate of the X chromosome. Nature 386, 388-392 (1997).
18. Xia, X. Data Analysis in Molecular Biology and Evolution (Kluwer Academic, Boston, 2000).
19. Tajima, F. & Nei, M. Estimation of evolutionary distance between nucleotide sequences. Mol. Biol. Evol. 1, 269-285 (1984).
20. Felsenstein, J. PHYLIP--Phylogeny Inference Package (version 3.2). Cladistics 5, 164-166 (1989).

Acknowledgements. The DNA samples were purchased from San Diego Zoological Society and the gibbon sample was given by M. Jensen-Seaman. We thank J. Crow and D. Page for comments. This study was supported by NIH grants.

Competing interests statement. The authors declare that they have no competing financial interests.

Figure 1 Phylogenetic tree of nucleotide sequences. Branch lengths were estimated from the pairwise evolutionary distances (substitutions per 100 sites). The pseudosequences of the two alleles at the chromosome 3 locus in each species are labelled a and b.
Figure 2 Phylogenetic schemes for the sequences of ref. 6. a, The original scheme (modified from ref. 6). b–d, Three possible schemes: one of them is likely to be true. X1 was sampled in ref. 6 and part of X2 was transposed to Y. $\pi_1$, the ancient nucleotide diversity between the ancestral human X1 and chimpanzee X chromosomes; $\pi_2$, the ancient nucleotide diversity between the ancestral human X2 and chimpanzee X chromosomes; $x_1$, the number of nucleotide substitutions on human X1 since the common ancestor of X1 and X2 or since the speciation; $x_2$, the number of nucleotide substitutions on human X2 between the common ancestor of X1 and X2 (or the speciation) and the transposition; $c_x$, the number of nucleotide substitutions on chimpanzee X chromosome; $y$, the number of nucleotide substitutions on the branch for the human Y chromosome locus from the time of transposition to present. Note that scheme a assumes X2 = X1.

15 February 2001
Nature 409, 943 - 945 (2001); doi:10.1038/35057170

A physical map of the human Y chromosome

CHARLES A. TILFORD*, TOMOKO KURODA-KAWAGUCHI*, HELEN SKALETSKY*, STEVE ROZEN*, LAURA G. BROWN*, MICHAEL ROSENBERG*, JOHN D. MCPHERSON†, KRISTINE WYLIE†, MANDEEP SEKHON†, TAMARA A. KUCABA†, ROBERT H. WATERSTON† & DAVID C. PAGE*

* Howard Hughes Medical Institute, Whitehead Institute, and Department of Biology, Massachusetts Institute of Technology, 9 Cambridge Center, Cambridge, Massachusetts 02142, USA
† Genome Sequencing Center, Department of Genetics, Washington University School of Medicine, 4444 Forest Park Boulevard, St. Louis, Missouri 63108, USA

Correspondence and requests for materials should be addressed to D.C.P. (e-mail: dcpage@wi.mit.edu).

The non-recombining region of the human Y chromosome (NRY), which comprises 95% of the chromosome, does not undergo sexual recombination and is present only in males.
An understanding of its biological functions has begun to emerge from DNA studies of individuals with partial Y chromosomes, coupled with molecular characterization of genes implicated in gonadal sex reversal, Turner syndrome, graft rejection and spermatogenic failure1, 2. But mapping strategies applied successfully elsewhere in the genome have faltered in the NRY, where there is no meiotic recombination map and intrachromosomal repetitive sequences are abundant3. Here we report a high-resolution physical map of the euchromatic, centromeric and heterochromatic regions of the NRY and its construction by unusual methods, including genomic clone subtraction4 and dissection of sequence family variants5. Of the map's 758 DNA markers, 136 have multiple locations in the NRY, reflecting its unusually repetitive sequence composition. The markers anchor 1,038 bacterial artificial chromosome clones, 199 of which form a tiling path for sequencing. A low-resolution physical map of the Y chromosome was previously assembled by testing naturally occurring deletions and yeast artificial chromosome (YAC) clones for the presence or absence of 182 Y-chromosomal sequence-tagged sites (STSs)3, 6. These STS markers were generated from Y DNA sequences selected at random, which promoted representative sampling of the entire chromosome6. Nonetheless, most randomly selected sequences proved unusable as map landmarks because they corresponded either to interspersed repetitive elements found throughout the genome or to male-specific repetitive sequences dispersed to many locations in the NRY. To construct a high-resolution map, we generated additional STSs in a directed manner. To enrich for the single-copy sequences most useful as map landmarks, we systematically applied genomic clone subtraction, whereby a 'tracer' clone's DNA is depleted of sequences shared with a set of 'driver' clones4. 
We identified a tiling path of 57 YACs that collectively spanned the euchromatic NRY3, and then carried out, in parallel, 57 subtractions, each employing one YAC as tracer and the remaining YACs (minus those overlapping the tracer YAC) as drivers. We sequenced a random sample of products from each of the 57 subtractions, identifying 308 additional STSs that proved useful in map assembly.

We used radiation hybrid mapping to integrate and order the random and subtraction-derived STSs. Large-fragment radiation hybrid panels offering long-range connectivity have been used to assemble human genome maps at 500–1,000-kilobase (kb) resolution7-10. To obtain greater resolution in the NRY, where the number of STSs appeared sufficient to cover the euchromatic region at an average spacing of 50 kb, we used a small-fragment radiation hybrid panel that had been used to build detailed maps of limited autosomal segments11. We tested this panel for all random and subtraction-derived NRY STSs. Nascent radiation hybrid linkage groups were ordered and oriented with respect to the centromere by positioning selected STSs on the existing map of natural deletions6. Additional STSs generated in our project's later phases were also tested. Ultimately, 513 STSs were positioned, at a resolution of around 50 kb, on a radiation hybrid map encompassing nearly the entire euchromatic NRY.

To prepare for sequencing the NRY, we used the radiation hybrid map as a scaffold for assembling contigs of bacterial artificial chromosome (BAC) clones. We screened a BAC library of human male genomic DNA with hybridization probes derived from NRY STSs. Through subsequent polymerase chain reaction tests of STS content, we assembled 1,038 BACs into contigs that, except for four small gaps, represented the whole NRY (see Fig. 1 in Supplementary Information). Many portions of this BAC map could be assembled only after the sequence of selected BACs had been determined and compared.
Also, many NRY genes and extragenic sequences are known to have closely related counterparts on the X chromosome12. In many cases, it was initially unclear whether BACs identified using X-homologous NRY STSs, especially those from a 4-Mb region of 99% X–Y identity13, 14, derived from the NRY or from the X chromosome. We resolved these ambiguities by resequencing STSs from the BACs in question and comparing them to X- or Y-derived reference sequences. The resulting map of overlapping BACs and ordered STSs (Fig. 1 in Supplementary Information) was extensively cross-checked against the radiation hybrid map and was further reinforced by restriction fingerprinting of all mapped BACs15.

Figure 1 Repetitive structure of euchromatic NRY.

The greatest challenges were posed by massive, NRY-specific amplified regions (or amplicons), which comprise about one-third of the euchromatic NRY. Of the 758 STSs on which the map is built, 136 are present at two or more locations in the NRY. Although we avoided such repetitive STSs in favour of single-copy STSs wherever possible, substantial portions of the euchromatic NRY contained little or no single-copy sequence. For many such amplicons, BACs derived from different copies could not be distinguished by STS content or restriction fingerprinting15. In many cases, we distinguished among amplicon copies (and the BACs corresponding to them) by typing 'sequence family variants' (SFVs)5. SFVs are subtle differences (for example, single-nucleotide substitutions or dinucleotide repeat length alterations) between closely related but non-allelic sequences. We were analysing BACs from only one male's Y chromosome, so these subtle sequence differences could not represent allelic variants. In general, we identified SFVs only after comparing the DNA sequences of BACs that originated from distinct copies, despite having similar STS content.
Thus, mapping and sequencing were inseparable, iterative activities in amplicon-rich regions. The euchromatic NRY amplicons are diverse in composition, size, copy number and orientation (Fig. 1), with some occurring as tandem repeats, others as inverted repeats, and still others dispersed throughout both arms of the chromosome. The euchromatic amplicons are well populated with testis-specific gene families that may be critical in spermatogenesis (see Fig. 1 in Supplementary Information)1, 2. One pair of amplicons is of particular interest in the context of human variation. Highlighted in Fig. 1 (arrows) are two units, each 300 kb long, that exist in opposite orientations on the short arm. These inverted repeats bound a region of around 3.5 Mb that occurs in one orientation (Fig. 1 in Supplementary Information) in the male from whom the BAC library was constructed, but in the opposite orientation in the existing map of naturally occurring deletions6. This may reflect variation among men for a 3.5-Mb inversion1, 16, perhaps the result of homologous recombination between the 300-kb inverted repeats flanking the inverted segment. Large Y-chromosome inversions are postulated to have been crucial in the evolution of the human sex chromosomes12, and this 3.5-Mb inversion may be one of many massive NRY variants that exist in modern populations.

Supplementary information accompanies this paper.

Received 16 November 2000; accepted 21 December 2000

References
1. Vogt, P. H. et al. Report of the Third International Workshop on Y Chromosome Mapping 1997. Cytogenet. Cell Genet. 79, 1-20 (1997).
2. Lahn, B. T. & Page, D. C. Functional coherence of the human Y chromosome. Science 278, 675-680 (1997).
3. Foote, S., Vollrath, D., Hilton, A. & Page, D. C. The human Y chromosome: Overlapping DNA clones spanning the euchromatic region. Science 258, 60-66 (1992).
4. Reijo, R. et al. Diverse spermatogenic defects in humans caused by Y chromosome deletions encompassing a novel RNA-binding protein gene. Nature Genet. 10, 383-393 (1995).
5. Saxena, R. et al. Four DAZ genes in two clusters found in the AZFc region of the human Y chromosome. Genomics 67, 256-267 (2000).
6. Vollrath, D. et al. The human Y chromosome: A 43-interval map based on naturally occurring deletions. Science 258, 52-59 (1992).
7. Hudson, T. J. et al. An STS-based map of the human genome. Science 270, 1945-1954 (1995).
8. Gyapay, G. et al. A radiation hybrid map of the human genome. Hum. Mol. Genet. 5, 339-346 (1996).
9. Stewart, E. A. et al. An STS-based radiation hybrid map of the human genome. Genome Res. 7, 422-433 (1997).
10. Deloukas, P. et al. A physical map of 30,000 human genes. Science 282, 744-746 (1998).
11. Lunetta, K. L., Boehnke, M., Lange, K. & Cox, D. R. Selected locus and multiple panel models for radiation hybrid mapping. Am. J. Hum. Genet. 59, 717-725 (1996).
12. Lahn, B. T. & Page, D. C. Four evolutionary strata on the human X chromosome. Science 286, 964-967 (1999).
13. Mumm, S., Molini, B., Terrell, J., Srivastava, A. & Schlessinger, D. Evolutionary features of the 4-Mb Xq21.3 XY homology region revealed by a map at 60-kb resolution. Genome Res. 7, 307-314 (1997).
14. Schwartz, A. et al. Reconstructing hominid Y evolution: X-homologous block, created by X-Y transposition, was disrupted by Yp inversion through LINE-LINE recombination. Hum. Mol. Genet. 7, 1-11 (1998).
15. The International Human Genome Mapping Consortium. A physical map of the human genome. Nature 409, 934-941 (2001).
16. Jobling, M. A. et al. A selective difference between human Y-chromosomal DNA haplotypes. Curr. Biol. 8, 1391-1394 (1998).

Acknowledgements. We thank C. Nusbaum and T. Hudson for materials and advice on radiation hybrid mapping; J. Crockett, J. Fedele, N. Florence, H. Grover, C. McCabe, N. Mudd, S. Sasso, D. Scheer, R. Seim and P. Shelby for technical contributions; and D. Berry, J. Bradley, Y. Lim, A. Lin, D. Menke, M. Royce-Tolland, J. Saionz and J. Wang for comments on the manuscript. Supported in part by NIH.

Figure 1 Repetitive structure of euchromatic NRY. Bottom, schematic of the Y chromosome, comprising large NRY flanked by pseudoautosomal regions (yellow). NRY is divided into euchromatic and heterochromatic (tan, shown truncated) portions, roughly 24 and 30 Mb, respectively. pter, short-arm telomere; cen, centromere; qter, long-arm telomere. Within euchromatic NRY, regions rich in NRY-specific amplicons (blue) or sequence similarity to X chromosome (red) are shown. Above chromosome schematic are positions of some NRY genes; most are found in amplicons (blue) or have X-linked homologues (red). Above genes is a plot of the average number of NRY BACs that contain each of the 758 STSs mapped (136 of these STSs at two or more locations) along euchromatic NRY. As expected, STSs in amplicon regions tended to be present in more BACs than STSs in X-homologous or unshaded regions. (Plotted values are local averages within sliding window of five consecutive STSs; values reflect all NRY BACs containing those STSs, not just BACs assigned to site indicated.) Some amplicon regions were under-represented in the BAC library; four gaps remain (red diamonds; 100 kb each) in BAC coverage of NRY. Top, STS-based dot plot of euchromatic NRY. Each dot reflects occurrence of a particular STS at two points in map (complete map shown in Fig. 1 in Supplementary Information).
Dots fall almost exclusively within amplicon regions. Many repeats of entire groups of STSs are apparent, with lines parallel to light grey diagonal indicating direct repeats and lines perpendicular to light grey diagonal indicating inverted repeats. Green arrows, inverted repeats flanking 3.5-Mb inversion (see text). Pale red lines, centromere.
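The two plots described in this legend can be illustrated with a minimal sketch. It assumes the map is available as an ordered list of STS names, with a name repeated wherever that STS occurs. The marker names, the toy map and the handling of window edges are all hypothetical, and this is not the authors' plotting code. A dot is placed at every pair of map positions that share an STS. Runs of dots parallel to the diagonal are labelled direct repeats and runs perpendicular to it are labelled inverted repeats.

```python
from collections import defaultdict
from itertools import combinations


def dot_plot(sts_order):
    """Return sorted (i, j) pairs, i < j, of map positions that share an STS."""
    positions = defaultdict(list)
    for idx, name in enumerate(sts_order):
        positions[name].append(idx)
    dots = []
    for idxs in positions.values():
        dots.extend(combinations(idxs, 2))
    return sorted(dots)


def repeat_orientation(dots):
    """Label each dot by the direction of its neighbouring dots."""
    dot_set = set(dots)
    labels = {}
    for i, j in dots:
        if (i + 1, j + 1) in dot_set or (i - 1, j - 1) in dot_set:
            labels[(i, j)] = "direct"      # run parallel to the diagonal
        elif (i + 1, j - 1) in dot_set or (i - 1, j + 1) in dot_set:
            labels[(i, j)] = "inverted"    # run perpendicular to the diagonal
        else:
            labels[(i, j)] = "isolated"
    return labels


def sliding_mean(values, window=5):
    """Local averages over consecutive markers (full windows only, an assumption)."""
    return [sum(values[k:k + window]) / window
            for k in range(len(values) - window + 1)]


# Hypothetical map: s1-s3 recur in the same order, p1-p2 recur in reverse order.
toy_map = ["s1", "s2", "s3", "u1", "s1", "s2", "s3", "u2",
           "p1", "p2", "u3", "p2", "p1"]
dots = dot_plot(toy_map)
labels = repeat_orientation(dots)

# Hypothetical BAC counts per STS, smoothed over five consecutive STSs.
smoothed = sliding_mean([1, 1, 1, 6, 6, 6, 1])
```

In this toy map the three dots from s1 to s3 are labelled direct and the two dots from p1 and p2 are labelled inverted, which mirrors how tandem and inverted amplicon copies would appear in an STS-based dot plot.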