Xenopus tropicalis Genome Project Advisory Board Memorandum
March 28th 2002

The overarching goal of the Xenopus tropicalis Genome Project is to produce high quality sequence and annotation that meets the needs of the research community. To further this goal, a project advisory board was formed to foster communication between the research community and the Joint Genome Institute and to help coordinate the efforts of all collaborators. The advisory board will also monitor the progress of the project and provide feedback to the JGI and OBER. Issues the board will deal with include assessing community awareness and expectations, disseminating information, and maintaining open lines of communication with all interested parties. A steering committee will meet regularly (in person or by teleconference) to discuss aspects of the project and ensure all concerns are addressed in a timely manner. The first meeting of the Advisory Board occurred on March 12th 2002 at JGI.

1. Sequencing and assembly: goals, strategy, and timetable

1.1 Goals

A high quality draft genome is desired, which should meet two minimal criteria.

Large-scale contiguity (on a scale of tens of kilobases) to ensure that a large fraction of the features of interest (exons, promoters, regulatory regions, etc.) are uninterrupted by gaps and are covered by high quality sequence with an average error rate below a few parts in ten thousand. This can typically be achieved by 5-6X coverage of such regions, although it is recognized that statistical fluctuations and inevitable cloning biases mean that higher total shotgun coverage will be needed to ensure that most features of interest are covered at that level. The total coverage to be obtained will be approximately 8X, or a total of about 13 billion bases for a 1.7 Gb genome.

Long-range linking of contigs such that each gene is contained within long scaffolds that encompass tens of neighboring genes and/or a megabase-sized region of genome.
Such long-range linking information is directly useful for positional cloning, and makes a map-anchored assembly practical.

It is recognized that heterochromatic regions are intrinsically difficult to clone and sequence, and that such regions will inevitably remain in fragmentary assemblies until considerable additional effort is brought to bear on them.

In the long run, it will be desirable to have large portions of the Xenopus tropicalis genome at finished quality, but it is recognized that this is a costly proposition. In the short term, such finishing projects will no doubt be carried out on a BAC-by-BAC basis in individual labs, and arrangements should be made to incorporate this finished sequence into the reference X. tropicalis genome in a timely manner. In the long run, such a distributed effort does not take advantage of the substantial cost advantages of a focussed and centralized finishing effort. As the X. tropicalis genome project proceeds, further consideration of a large scale finishing strategy can be entertained.

1.2 Sequencing strategy

DNA from a single, 6th generation inbred Nigerian individual will be used for all libraries to minimize complications from polymorphisms, except for certain BAC libraries as noted below. Preliminary studies of AFLP and isozyme data (D. Morizot, personal communication) suggest that this individual will have an allelic polymorphism rate comparable to fugu, and is therefore suitable for a whole genome shotgun approach. Additional studies will be undertaken at JGI in spring 2002 to measure the quantity and quality of polymorphisms in the inbred individual nominated for sequencing.

Shotgun sequencing of three groups of libraries will be undertaken:

Small insert sequence: end-sequences from 3-4 Kbp insert plasmids will provide the bulk of the sequence coverage. These can be produced at high throughput at the JGI.
Medium insert sequence: To achieve the desired degree of scaffolding, it is essential that sequence from 8-12 Kbp insert plasmid end-sequences be available to at least 6X clone coverage. Assuming 85% pair pass rate and 10 Kbp clones, this amounts to 1.2 million clones = 3,125 384-well plates of clones = 25 days of JGI capacity. Paired end sequence would then result in 0.7X sequence coverage from these libraries. If 15 kb insert phage libraries could be sequenced, the amount of sequence needed from such a library would be cut by one third.

Large insert sequences: To achieve long range scaffolding and to anchor the assembly to a map, substantial BAC-end sequence is required. Approximately 10X clone coverage from a 150 Kbp insert BAC library should be available by the end of 2003 from an NIH funded project. End sequences from these 115,000 clones will total 98 Mbp (assuming 85% pass, 500 bp read length), resulting in one end roughly every 9 kb.

Initially, a total of 6X shotgun sequence will be produced from these three sets of libraries. Regular assemblies will be monitored for contiguity and linkage. When 5-6X coverage is reached, a decision will be made regarding the nature of the remaining sequencing. Options are to continue with random whole genome shotgun sampling or switch to a directed BAC- or BAC-pool-based approach guided by the map and the 5-6X assembly. The directed approaches would focus the remaining 2-3X worth of sequence on poorly assembled regions to ensure a uniformly high quality sequence. Some of this sequence may be derived from an Ivory Coast strain to facilitate SNP detection for genetic mapping purposes.

1.3 Timetable

Library QC and initial sequencing will commence in the summer of 2002. Pending final approval of the project from DoE, the JGI will aim to produce approximately 5X raw sequence coverage of the X. tropicalis genome by the middle of calendar year 2004 from small and medium insert plasmids.
The total sequence produced for the project at that point will be on the order of 8.5 billion bases. Thus the 5-6X decision point would be reached in mid 2004. The remaining sequencing would be carried out with the intent of completing the data collection phase of the project by the end of 2004. An initial publication of the draft genome would be expected in late 2004, with the final sequence available early in 2005.

2. Additional resources: BACs, maps, and ESTs

Several additional resources are critical to the success of the project. These are available from a variety of sources, and are points of dependency for the success of the project.

2.1 BAC resources and maps

An essential need is the construction of a large (>150 Kbp) insert BAC library, and its subsequent fingerprinting and end-sequencing to high (>10X) clone coverage to aid in both mapping and assembly. Such a large insert BAC library is currently under construction in Chris Amemiya’s group (Seattle, WA) using DNA from a sister of the inbred individual whose DNA will be used for shotgun sequencing. The library is expected to become available in fall of 2002, and will be distributed by Pieter deJong (Oakland, CA). The NIH plans to fund and arrange the fingerprinting and end-sequencing of these BACs to 10X clone coverage (approximately 115,000 clones). This end sequence will be rapidly deposited in GenBank and made freely available to the community. To get the most synergy with the genome sequencing project, the mapping project should be completed by early to mid 2004.

A smaller insert (75 kb) X. tropicalis BAC library has been constructed by Shi Qin (Seattle, WA), and will also be distributed by deJong et al. DNA for this library came from another individual? The library has been carefully checked, and encompasses 6.5X clone coverage of the X. tropicalis genome. Any additional end sequence information obtained from this library will greatly aid the assembly.
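The clone counts and coverage figures quoted in sections 1.2, 1.3, and 2.1 can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the project plan: the function and constant names are illustrative, and the only inputs are the values stated in this memorandum (1.7 Gb genome, 500 bp reads, 85% pass rate).

```python
GENOME_BP = 1.7e9   # genome size assumed in this memorandum
READ_BP = 500       # read length assumed in this memorandum
PASS_RATE = 0.85    # pass rate assumed in this memorandum


def clones_needed(clone_coverage, insert_bp, pass_rate=1.0, genome_bp=GENOME_BP):
    """Clones to attempt so that passing clones span the genome clone_coverage times."""
    return clone_coverage * genome_bp / insert_bp / pass_rate


def end_sequence_bp(n_clones, pass_rate=PASS_RATE, read_bp=READ_BP):
    """Total bases from two end reads per clone, after applying the pass rate."""
    return n_clones * 2 * pass_rate * read_bp


def end_spacing_bp(n_clones, pass_rate=PASS_RATE, genome_bp=GENOME_BP):
    """Average distance between successful end reads along the genome."""
    return genome_bp / (n_clones * 2 * pass_rate)


# Medium insert plasmids: 6X clone coverage, 10 Kbp inserts, 85% pair pass rate.
medium_clones = clones_needed(6, 10_000, PASS_RATE)
print(f"medium insert clones: {medium_clones:,.0f}")
print(f"384-well plates:      {medium_clones / 384:,.0f}")

# BAC library: 10X clone coverage, 150 Kbp inserts (pass rate applied to reads only).
print(f"BAC clones for 10X:   {clones_needed(10, 150_000):,.0f}")
print(f"BAC end sequence:     {end_sequence_bp(115_000) / 1e6:.1f} Mbp")
print(f"BAC end spacing:      {end_spacing_bp(115_000) / 1e3:.1f} kb")

# Total raw sequence at 5X and 8X coverage.
for x in (5, 8):
    print(f"{x}X total sequence:    {x * GENOME_BP / 1e9:.1f} billion bases")
```

The 10X BAC calculation gives slightly fewer clones than the approximately 115,000 quoted above, which is consistent with that figure being a rounded planning number.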
2.2 Expressed sequences and full-length cDNAs

Several large scale EST and full-length cDNA sequencing projects are contemplated or underway. These include libraries from Xenopus laevis and tropicalis. The committee will endeavor to engage these groups in the X. tropicalis genome project and will make every effort to include these groups as collaborators. This data will be essential for the analysis and annotation of the assemblies.

NIH: approximately 50 cDNA libraries from a wide range of tissues and developmental stages have been constructed for both X. laevis and X. tropicalis. Most are un-normalized, and most have not taken any steps to recover full-length cDNAs.

Sanger Centre (Jane Rogers): A project to generate 200,000 ESTs (from both 5’ and 3’ ends) from X. tropicalis fertilized egg, gastrula, and neurula stages is ongoing.

France (Nicolas Pollet): 30,000 X. tropicalis cDNA clones with both 5’ and 3’ sequences from stage 25/35 head and brain/spinal cord of metamorphic tadpoles.

Japan (Naoto Ueno): 70,000 X. laevis ESTs (including 5’ and 3’ sequences) have been determined from normalized cDNA libraries constructed from stage 15 and 25 embryos. 25,000 contigs have been found, and 12,000 singlets. By the end of summer 2002, an additional 70-80,000 ESTs will be generated from a normalized stage 10.5 library.

Other EST projects will likely be identified.

3. Data release policy, intermediate results, and publication

The JGI will rapidly deposit raw sequence fragments (on a weekly basis) into NCBI Trace Archive to accelerate biological discovery prior to publication of the full genome sequence. As with other genome projects participating in the Trace Archive, deposited sequences are released as "private communications" to the research community.
These are subject to the restriction that the JGI and the Xenopus tropicalis Genome Consortium reserve the right to publish the first whole genome assembly and genome-wide analysis of the Xenopus tropicalis sequence, as described in more detail in the Trace Archive Data Release Policies (web link).

The JGI and the XTGC will endeavor to publish the genome in a timely fashion when sufficient data has been produced to permit a largely complete catalog of intact X. tropicalis genes. Based on prior experience with shotgun genome sequencing of a variety of organisms, this is expected to occur at approximately 6X total sequence coverage, although as noted above, additional sequencing will be needed to achieve the stated goals of the project. The genome would be submitted for publication within six months of achieving the 6X landmark, at which point the raw data and genome sequence would become freely available for all uses. Based on the projected timetable described above, this would occur by the end of 2004. The entire project (8X total sequence, achieving the goals stated at the top of this document) would be complete by mid 2005.

Fragmentary assemblies become feasible and worthwhile at sequence coverages above 3-4X, at which point short (several kilobase) contigs emerge. Since these partial assemblies can be useful for biological discovery, the JGI will provide them at regular intervals (e.g., each integral increment in sequence coverage).

As part of the Xenopus tropicalis project, the JGI will maintain an up-to-date Genome Browser containing current annotations of the various assemblies. During 2004-05, the JGI will host several Annotation Festivals to take advantage of community expertise to annotate the X. tropicalis genome, as planned for other JGI genomes. The JGI Browser will aim to be a robust central resource for X. tropicalis annotation and expression data as it relates to the genome sequence.
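The relationship between shotgun coverage and contig size that underlies the coverage milestones in this section can be given a rough quantitative feel with the idealized Lander-Waterman model of random shotgun sequencing. The sketch below is an illustration only and is not part of the project plan: it assumes uniformly random 500 bp reads on a 1.7 Gb genome, treats any overlap as sufficient to join two reads, and ignores repeats, polymorphism, and cloning bias. The function names are illustrative.

```python
import math

GENOME_BP = 1.7e9  # genome size assumed in this memorandum
READ_BP = 500      # read length assumed in this memorandum


def fraction_covered(coverage):
    """Expected fraction of the genome covered by at least one read."""
    return 1.0 - math.exp(-coverage)


def expected_contigs(coverage, read_bp=READ_BP, genome_bp=GENOME_BP):
    """Expected number of contigs when any overlap joins two reads."""
    n_reads = coverage * genome_bp / read_bp
    return n_reads * math.exp(-coverage)


def mean_contig_bp(coverage, read_bp=READ_BP):
    """Expected contig length under the same idealization."""
    return read_bp * (math.exp(coverage) - 1.0) / coverage


for c in (1, 2, 3, 4, 5, 6, 8):
    print(f"{c}X: {fraction_covered(c):.4f} covered, "
          f"{expected_contigs(c):,.0f} contigs, "
          f"mean contig {mean_contig_bp(c):,.0f} bp")
```

Under these assumptions the mean contig length is a few kilobases at 3X to 4X and reaches tens of kilobases at 6X. Real assemblies will be more fragmented than this model suggests, which is one reason total coverage beyond 6X is planned.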
Xenopus tropicalis Genome Project Advisory Board

* Marvin Frazier, DOE Office of Science
* Bruce Blumberg, University of California, Irvine
* Rob Grainger, University of Virginia
* Richard Harland, University of California, Berkeley
* Richard Meyers, Stanford University
* Paul Richardson, Joint Genome Institute
* Dan Rokhsar, Joint Genome Institute
* Eddy Rubin, Lawrence Berkeley Laboratory, Joint Genome Institute
Janet Heasman, University of Cincinnati Medical School
Marc Kirschner, Harvard Medical School
Steve Klein, National Institute of Child Health & Human Development, NIH
Nicolas Pollet, Division of Molecular Embryology, German Cancer Research Center
Enrique Amaya, Wellcome/CRC Institute of Cancer and Developmental Biology, University of Cambridge
Jim Smith, Wellcome/CRC Institute of Cancer and Developmental Biology, University of Cambridge
Naoto Ueno, National Institute for Basic Biology, Okazaki, Japan
Peter Vize, University of Calgary
Lyle Zimmerman, National Institute for Medical Research, London
Len Zon, Harvard Medical School
Aaron Zorn, University of Cincinnati Medical School

* Steering Committee Members

Letters of Support

From: "Klein, Steven (NICHD)" <kleins@exchange.nih.gov>
To: "'Paul Richardson'" <pmrichardson@lbl.gov>
Cc: "Alexander, Duane (NICHD)" <alexandd@exchange.nih.gov>, "Willoughby, Anne (NICHD)" <willouga@mail.nih.gov>, "Hewitt, Tyl (NICHD)" <th119v@mail.nih.gov>, "'Richard Harland'" <harland@socrates.berkeley.edu>, "Peterson, Jane (NHGRI)" <petersoj@exchange.nih.gov>, "Guyer, Mark (NHGRI)" <guyerm@exchange.nih.gov>, "Felsenfeld, Adam (NHGRI)" <felsenfa@exchange.nih.gov>
Subject: Xenopus BAC Resources
Date: Thu, 28 Mar 2002 10:23:17 -0500

March 28, 2002

Paul M. Richardson, Ph.D.
Manager of Functional Genomics
US Department of Energy Joint Genome Institute
2800 Mitchell Drive
Walnut Creek, CA 94598

Dear Dr. Richardson, Paul

I'm writing to express our enthusiastic support for your project to sequence the genome of Xenopus tropicalis.
The National Institute of Child Health and Human Development (NICHD) is interested in, and intends to support and arrange the production of BAC-related resources that will aid in the project. These resources include:

1) BAC Libraries
NICHD is funding the production of the two Xenopus tropicalis BAC libraries that are being made by Chris Amemiya. These libraries will have inserts of approximately 150kb and 10X depth. We expect that they will be available soon.

2) BAC-End Sequencing and Fingerprint Maps
The director of NICHD has approved funds to sequence BAC-ends and to generate fingerprint maps of the 150kb BAC libraries. NICHD plans to contribute up to $3 million to this project (up to $1.5 million per year for fiscal years 2003 and 2004). NICHD and the National Human Genome Research Institute (NHGRI) are arranging for the BAC-end sequencing and fingerprinting to be done by John McPherson at Washington University. We sent him your requirements for these resources and asked him for an estimate of their cost. We are awaiting his response. We expect that NICHD's approved funds will be sufficient to generate the required resources. However, because we don't yet know the project's exact cost, we can't yet guarantee that our funds will be enough. If additional funds are required, we expect that we will be able to obtain them from other NIH sources.

NICHD is deeply committed to providing the BAC resources that you require. I hope that the information presented herein provides an adequate demonstration of that commitment. Please let me know if you would like any additional information.

Sincerely,
Steve

Steven L. Klein, Ph.D.
Program Official & Chair, Trans-NIH Xenopus Working Group
Developmental Biology, Genetics & Teratology Branch
National Institute of Child Health & Human Development, NIH
6100 Executive Boulevard
Rockville, MD 20852

Aaron M Zorn Ph D
Assistant Professor of Pediatrics
Division of Developmental Biology
Children's Hospital Medical Center
3333 Burnet Avenue
Cincinnati, OH 45229-3039
tel: (513) 636-3770, fax: (513) 636-4317
aaron.zorn@chmcc.org

25 March 2002

Richard Harland
University of California, Berkeley
Department of Molecular & Cell Biology
401 Barker Hall #3204
Berkeley, CA 94720-3204
harland@socrates.berkeley.edu

Dear Dr. Harland,

My colleagues and I are extremely enthusiastic about the DOE’s proposal to sequence the Xenopus tropicalis genome at their Joint Genome Institute. I think that the entire Xenopus community would agree that having the tropicalis genome sequence would be enormously valuable, both in accelerating current Xenopus research and in opening up new avenues and approaches to Xenopus research. With the genome sequence in hand Xenopus tropicalis will very likely become the premier vertebrate model for functional genomics. Furthermore, from the point of view of biology in general, it is clear that the Xenopus genome really is the best vertebrate genome to be sequenced next!

I am pleased that such an undertaking might be done by the JGI who have a proven track record in whole genome sequencing and data annotation. I am happy to collaborate with the JGI on this project in any way that I can. In particular, I see our ongoing efforts with the Sanger Centre (in the UK) to generate Xenopus tropicalis EST sequences as a possible area in which we would interact closely with the JGI. As you know, we and the Sanger Centre are committed to making the data from our EST efforts public in a timely fashion as it becomes available.
It appears that the JGI shares this philosophy and therefore I would be keen to participate in periodic "annotation jamborees" when the draft sequence reaches the stage where this makes sense. Linking our publicly available EST data to the JGI produced genomic scaffold will undoubtedly provide a valuable resource.

In summary the JGI’s proposal to sequence the tropicalis genome is a wonderful opportunity for the Xenopus research community. For my part I completely lend my support to this endeavor and I will be happy to collaborate with the JGI however I can.

Best wishes,

Aaron M Zorn Ph D
Assistant Professor of Pediatrics
Children's Hospital Medical Center, Cincinnati

University of California, Irvine
Developmental and Cell Biology
School of Biological Sciences
Bruce Blumberg, Ph.D.
Assistant Professor

March 23, 2002

Richard Harland
University of California, Berkeley
Department of Molecular & Cell Biology
401 Barker Hall #3204
Berkeley, CA 94720-3204

Dear Richard,

I am writing to express my strong enthusiasm for the Xenopus tropicalis genome project being considered by the DOE Joint Genome Institute. This is a project of great importance for the Xenopus community and for my own research; hence I offer my fullest support for this effort.

As you know, my laboratory is currently engaged in constructing normalized, full-length cDNA libraries from X. tropicalis. The libraries will be normalized, arrayed in 384-well plates and constructed in a pCS2 derivative that is optimized for genomic and functional studies. We plan to make these libraries available to the community without restriction and hope that they will serve as a valuable resource for jump-starting the use of X. tropicalis in many laboratories. These libraries will be made available to the Xenopus EST project and for full-length cDNA sequencing if funding ever becomes available for such an effort.
I would also like to express my willingness to participate in the annotation jamborees that will begin when the sequence assembly stage begins. Please let me know if I can help the effort in any other way.

Best wishes,

Bruce Blumberg

Professor Richard Harland
University of California, Berkeley
Department of Molecular & Cell Biology
401 Barker Hall #3204
Berkeley, CA 94720-3204

Dear Richard:

I am pleased to write this letter to encourage the Joint Genome Institute to initiate Xenopus tropicalis genome sequencing. Given the diploid nature of Xenopus tropicalis and the future availability of genetics, sequencing its genome is extremely important for two reasons.

First, its output will facilitate developmental genetics using Xenopus. Combined with efficient functional assays using Xenopus, the genome sequence data will provide the most useful platform for functional genomics of vertebrates. Our independent effort on EST sequencing of Xenopus laevis has generated 70,000 ESTs, which have been disclosed just recently as a database. However, we still need to refine the EST sequence set to a unigene set of full length cDNAs eventually. To accomplish this, genome sequence data are essential even if they are from Xenopus tropicalis.

Second, the sequencing project is also important from the aspect of evolutionary biology. It is going to be the first challenge to reveal an amphibian genome sequence. The genome sequence information will tell us how aquatic animals evolved to terrestrial animals during the long history of animal evolution.

Finally, as a member of the Xenopus research community, I am happy to contribute to annotation meetings if our experience of Xenopus EST sequencing projects helps.

Sincerely,

Naoto Ueno, Professor
Department of Developmental Biology
National Institute for Basic Biology
38 Nishigonaka, Myodaiji
Okazaki 444-8585, JAPAN
TEL: 81-564-55-7570
FAX: 81-564-55-7571
nueno@nibb.ac.jp