Xenopus tropicalis Genome Project Advisory Board

Xenopus tropicalis Genome Project
Advisory Board Memorandum
March 28th 2002
The overarching goal of the Xenopus tropicalis Genome Project is to produce high
quality sequence and annotation that meets the needs of the research community. To
further this goal, a project advisory board was formed to foster communication between
the research community and the Joint Genome Institute and to help coordinate the efforts
of all collaborators. The advisory board will also monitor the progress of the project and
provide feedback to the JGI and to the DOE Office of Biological and Environmental Research (OBER). The board's responsibilities include assessing community awareness and expectations, disseminating information, and maintaining open lines of communication with all interested parties. A steering committee will meet regularly (in person or by teleconference) to discuss aspects of the project and ensure that all concerns are addressed in a timely manner.
The first meeting of the Advisory Board was held at the JGI on March 12, 2002.
1. Sequencing and assembly: goals, strategy, and timetable
1.1 Goals
The goal is a high-quality draft genome that meets two minimal criteria:
• Large-scale contiguity (on a scale of tens of kilobases), so that a large fraction of the features of interest (exons, promoters, regulatory regions, etc.) are uninterrupted by gaps and are covered by high-quality sequence with an average error rate below a few parts in ten thousand. This can typically be achieved with 5-6X coverage of such regions, although statistical fluctuations and inevitable cloning biases mean that higher total shotgun coverage will be needed to ensure that most features of interest are covered at that level. The total coverage to be obtained will be approximately 8X, or about 13.6 billion bases for a 1.7 Gb genome.
• Long-range linking of contigs, such that each gene lies within a long scaffold that encompasses tens of neighboring genes and/or a megabase-sized region of the genome. Such long-range linking information is directly useful for positional cloning and makes a map-anchored assembly practical. Heterochromatic regions are intrinsically difficult to clone and sequence, and such regions will inevitably remain in fragmentary assemblies until considerable additional effort is brought to bear on them.
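The coverage figures above can be checked with a short calculation. The Python sketch below multiplies fold coverage by the 1.7 Gb genome size and, to illustrate why more than 5-6X total coverage is needed, estimates the fraction of bases covered fewer than five times under an idealized Poisson (Lander-Waterman) model of read depth. The Poisson assumption and the function names are illustrative choices, not taken from this memo; real shotgun data, with cloning bias, will do worse than this idealized estimate.

```python
import math

GENOME_SIZE_BP = 1.7e9  # 1.7 Gb genome size used in the memo


def total_bases(coverage: float, genome_size: float = GENOME_SIZE_BP) -> float:
    """Raw sequence needed to reach a given mean shotgun coverage."""
    return coverage * genome_size


def fraction_below(mean_depth: float, min_depth: int) -> float:
    """Fraction of positions covered fewer than min_depth times,
    ASSUMING depth is Poisson-distributed with the given mean
    (idealized model: no cloning bias, no repeats)."""
    return sum(
        math.exp(-mean_depth) * mean_depth**k / math.factorial(k)
        for k in range(min_depth)
    )


if __name__ == "__main__":
    print(f"8X total: {total_bases(8) / 1e9:.1f} Gb")
    print(f"5X total: {total_bases(5) / 1e9:.1f} Gb")
    for depth in (5, 6, 8):
        print(f"mean {depth}X: {fraction_below(depth, 5):.1%} of bases below 5X")
```

Under these assumptions roughly 44% of bases fall below 5X when the mean is 5X, versus about 10% at a mean of 8X, which is the qualitative point made above about statistical fluctuations.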
In the long run, it will be desirable to have large portions of the Xenopus tropicalis genome at finished quality, but this is a costly proposition. In the short term, such finishing projects will no doubt be carried out on a BAC-by-BAC basis in individual labs, and arrangements should be made to incorporate this finished sequence into the reference X. tropicalis genome in a timely manner. However, such a distributed effort forgoes the substantial cost advantages of a focused, centralized finishing effort. As the X. tropicalis genome project proceeds, a large-scale finishing strategy can be reconsidered.
1.2 Sequencing strategy
DNA from a single sixth-generation inbred Nigerian individual will be used for all libraries, except for certain BAC libraries as noted below, to minimize complications from polymorphisms. Preliminary AFLP and isozyme data (D. Morizot, personal communication) suggest that this individual will have an allelic polymorphism rate comparable to that of fugu, and is therefore suitable for a whole genome shotgun approach. Additional studies will be undertaken at JGI in spring 2002 to measure the quantity and quality of polymorphisms in the inbred individual nominated for sequencing.
Shotgun sequence will be generated from three groups of libraries:

Small insert sequence: end-sequences from 3-4 Kbp insert plasmids will provide the
bulk of the sequence coverage. These can be produced at high throughput at the JGI.

Medium insert sequence: To achieve the desired degree of scaffolding, it is essential that end sequences from 8-12 Kbp insert plasmids be available to at least 6X clone coverage. Assuming an 85% pair pass rate and 10 Kbp clones, this amounts to 1.2 million clones, or 3,125 384-well plates, or 25 days of JGI capacity. Paired end sequence would then provide 0.7X sequence coverage from these libraries. If 15 Kbp insert phage libraries could be sequenced, the amount of sequence needed from such a library would be cut by one third.

Large insert sequence: To achieve long-range scaffolding and to anchor the assembly to a map, substantial BAC-end sequence is required. Approximately 10X clone coverage from a 150 Kbp insert BAC library should be available by the end of 2003 from an NIH-funded project. End sequences from these 115,000 clones will total 98 Mbp (assuming an 85% pass rate and 500 bp read length), resulting in one end sequence roughly every 9 kb.
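The clone, plate, and end-sequence figures above follow from simple arithmetic. This Python sketch reproduces them from the stated assumptions (1.7 Gb genome, 85% pass rate, 10 kb plasmid and 150 kb BAC inserts, 500 bp BAC-end reads). The function names are hypothetical helpers, and the plate count assumes one clone per well of a 384-well plate.

```python
import math

GENOME_SIZE_BP = 1.7e9  # genome size used throughout the memo
PLATE_WELLS = 384       # one clone per well of a 384-well plate (assumption)


def clones_needed(clone_coverage: float, insert_bp: float, pass_rate: float,
                  genome_size: float = GENOME_SIZE_BP) -> float:
    """Clones to pick so that successfully end-sequenced pairs give the
    requested clone (physical) coverage of the genome."""
    return clone_coverage * genome_size / insert_bp / pass_rate


def end_sequence_bp(n_clones: float, read_len_bp: float, pass_rate: float) -> float:
    """Passing end sequence obtained by reading both ends of every clone."""
    return n_clones * 2 * read_len_bp * pass_rate


def mean_end_spacing_bp(n_clones: float, pass_rate: float,
                        genome_size: float = GENOME_SIZE_BP) -> float:
    """Average genomic distance between successive passing end reads."""
    return genome_size / (n_clones * 2 * pass_rate)


if __name__ == "__main__":
    # Medium-insert plasmids: 6X clone coverage, 10 kb inserts, 85% pair pass rate.
    medium = clones_needed(6, 10_000, 0.85)
    plates = math.ceil(round(medium) / PLATE_WELLS)
    print(f"medium-insert clones: {medium:,.0f} -> {plates:,} plates")

    # Same clone coverage with 15 kb phage inserts needs 10/15 = 2/3 as many clones.
    phage = clones_needed(6, 15_000, 0.85)
    print(f"15 kb phage clones: {phage:,.0f} ({phage / medium:.2f} of plasmid total)")

    # BAC ends: 115,000 clones, 500 bp reads, 85% pass rate.
    print(f"BAC-end sequence: {end_sequence_bp(115_000, 500, 0.85) / 1e6:.1f} Mbp")
    print(f"one BAC end every {mean_end_spacing_bp(115_000, 0.85) / 1e3:.1f} kb")
```

With these inputs the sketch gives 1.2 million medium-insert clones on 3,125 plates, two thirds as many clones for 15 kb phage inserts, about 97.8 Mbp of BAC-end sequence, and one end read roughly every 8.7 kb, consistent with the rounded figures in the text.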
Initially, a total of 6X shotgun sequence will be produced from these three sets of
libraries. Regular assemblies will be monitored for contiguity and linkage. When 5-6X
coverage is reached, a decision will be made regarding the nature of the remaining
sequencing. Options are to continue with random whole genome shotgun sampling or
switch to a directed BAC- or BAC-pool-based approach guided by the map and the 5-6X
assembly. The directed approaches would focus the remaining 2-3X worth of sequence
on poorly assembled regions to ensure a uniformly high quality sequence. Some of this
sequence may be derived from an Ivory Coast strain to facilitate SNP detection for
genetic mapping purposes.
1.3 Timetable
Library construction, library QC, and initial sequencing will begin in the summer of 2002. Pending final approval of the project from DOE, the JGI will aim to produce approximately 5X raw sequence coverage of the X. tropicalis genome from small and medium insert plasmids by the middle of calendar year 2004. The total sequence produced for the project at that point will be on the order of 8.5 billion bases (5 × 1.7 Gb). Thus the 5-6X decision point would be reached in mid 2004. The remaining sequencing would be carried out with the intent of completing the data collection phase of the project by the end of 2004. An initial publication of the draft genome would be expected in late 2004, with the final sequence available early in 2005.
2. Additional resources: BACs, maps, and ESTs
Several additional resources are critical to the success of the project. They come from a variety of sources, and the project depends on their timely availability.
2.1 BAC resources and maps
An essential need is the construction of a large (>150 Kbp) insert BAC library, and its
subsequent fingerprinting and end-sequencing to high (>10X) clone coverage to aid in
both mapping and assembly.
• Such a large-insert BAC library is currently under construction in Chris Amemiya’s group (Seattle, WA) using DNA from a sister of the inbred individual whose DNA will be used for shotgun sequencing. The library is expected to become available in fall 2002 and will be distributed by Pieter deJong (Oakland, CA).
• The NIH plans to fund and arrange the fingerprinting and end sequencing of these BACs to 10X clone coverage (approximately 115,000 clones). This end sequence will be rapidly deposited in GenBank and made freely available to the community. To get the most synergy with the genome sequencing project, the mapping project should be completed by early to mid 2004.
• A smaller-insert (75 kb) X. tropicalis BAC library has been constructed by Shi Qin (Seattle, WA) and will also be distributed by deJong et al. Whether the DNA for this library came from a different individual remains to be confirmed. The library has been carefully checked and encompasses 6.5X clone coverage of the X. tropicalis genome. Any additional end sequence information obtained from this library will greatly aid the assembly.
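Clone coverage and clone count are related through the genome size and the mean insert size. The Python sketch below, assuming the memo's 1.7 Gb genome size and nominal insert sizes, shows that 10X coverage with 150 kb BACs corresponds to roughly 113,000 clones (close to the approximately 115,000 cited), and gives the clone count implied by 6.5X coverage with 75 kb inserts. The helper names are hypothetical, and real libraries have a spread of insert sizes, so these are nominal figures only.

```python
GENOME_SIZE_BP = 1.7e9  # genome size assumed elsewhere in the memo


def clone_count(clone_coverage: float, mean_insert_bp: float,
                genome_size: float = GENOME_SIZE_BP) -> float:
    """Number of clones implied by a clone coverage and a mean insert size."""
    return clone_coverage * genome_size / mean_insert_bp


def clone_coverage(n_clones: float, mean_insert_bp: float,
                   genome_size: float = GENOME_SIZE_BP) -> float:
    """Inverse: genome equivalents represented by n_clones."""
    return n_clones * mean_insert_bp / genome_size


if __name__ == "__main__":
    print(f"10X of 150 kb BACs: {clone_count(10, 150_000):,.0f} clones")
    print(f"6.5X of 75 kb BACs: {clone_count(6.5, 75_000):,.0f} clones")
    print(f"115,000 x 150 kb BACs: {clone_coverage(115_000, 150_000):.1f}X")
```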
2.2 Expressed sequences and full-length cDNAs
Several large-scale EST and full-length cDNA sequencing projects are planned or underway, using libraries from both Xenopus laevis and X. tropicalis. The committee will endeavor to engage these groups in the X. tropicalis genome project and will make every effort to include them as collaborators. These data will be essential for the analysis and annotation of the assemblies.

NIH: approximately 50 cDNA libraries from a wide range of tissues and developmental stages have been constructed for both X. laevis and X. tropicalis. Most are un-normalized, and most were not constructed with any steps to enrich for full-length cDNAs.

Sanger Centre (Jane Rogers): A project to generate 200,000 ESTs (from both 5’ and
3’ ends) from X. tropicalis fertilized egg, gastrula, and neurula stages is ongoing.

France (Nicolas Pollet): 30,000 X. tropicalis cDNA clones with both 5’ and 3’
sequences from stage 25/35 head and brain/spinal cord of metamorphic tadpoles.

Japan (Naoto Ueno): 70,000 X. laevis ESTs (including 5’ and 3’ sequences) have been determined from normalized cDNA libraries constructed from stage 15 and 25 embryos. Clustering has yielded 25,000 contigs and 12,000 singlets. By the end of summer 2002, an additional 70,000-80,000 ESTs will be generated from a normalized stage 10.5 library.

Other EST projects will likely be identified.
3. Data release policy, intermediate results, and publication
The JGI will rapidly deposit raw sequence fragments (on a weekly basis) into the NCBI Trace Archive to accelerate biological discovery prior to publication of the full genome sequence. As with other genome projects participating in the Trace Archive, deposited sequences are released as "private communications" to the research community. They are subject to the restriction that the JGI and the Xenopus tropicalis Genome Consortium (XTGC) reserve the right to publish the first whole genome assembly and genome-wide analysis of the Xenopus tropicalis sequence, as described in more detail in the Trace Archive Data Release Policies (web link).
The JGI and the XTGC will endeavor to publish the genome in a timely fashion once sufficient data have been produced to permit a largely complete catalog of intact X. tropicalis genes. Based on prior experience with shotgun sequencing of a variety of organisms, this is expected to occur at approximately 6X total sequence coverage, although, as noted above, additional sequencing will be needed to achieve the stated goals of the project. The genome would be submitted for publication within six months of reaching the 6X landmark, at which point the raw data and genome sequence would become freely available for all uses. Based on the projected timetable described above, this would occur by the end of 2004. The entire project (8X total sequence, achieving the goals stated in Section 1.1) would be complete by mid 2005.
Fragmentary assemblies become feasible and worthwhile at sequence coverages above 3-4X, at which point short (several kilobase) contigs emerge. Since these partial assemblies can be useful for biological discovery, the JGI will provide them at regular intervals (e.g., at each integral increment in sequence coverage). As part of the Xenopus tropicalis project, the JGI will maintain an up-to-date Genome Browser containing current annotations of the various assemblies. During 2004-05, the JGI will host several Annotation Festivals to draw on community expertise in annotating the X. tropicalis genome, as planned for other JGI genomes. The JGI Browser will aim to be a robust central resource for X. tropicalis annotation and expression data as they relate to the genome sequence.
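Why only short contigs appear at low (roughly 3-4X) coverage can be illustrated with the classic Lander-Waterman model; this is an idealized textbook model chosen for illustration, not a method described in this memo. The Python sketch below computes its expected contig length for reads of length L at coverage c, L[(e^(cσ) - 1)/c + (1 - σ)] with σ = 1 - θ, where θ is the minimum fractional overlap needed to detect a join. The 500 bp read length is an assumption borrowed from the BAC-end estimate, and the model ignores repeats and cloning bias, so real assemblies fall well short of these numbers at high coverage.

```python
import math


def expected_contig_length(read_len: float, coverage: float,
                           theta: float = 0.0) -> float:
    """Lander-Waterman expected contig length in bp.

    theta is the minimum fractional overlap needed to detect a join
    (theta = 0 means any overlap is detected). Idealized model:
    uniform random reads, no repeats, no cloning bias."""
    sigma = 1.0 - theta
    return read_len * ((math.exp(coverage * sigma) - 1.0) / coverage + (1.0 - sigma))


if __name__ == "__main__":
    READ_LEN = 500  # ASSUMED read length, as in the BAC-end estimate
    for c in (1, 2, 3, 4, 6, 8):
        print(f"{c}X: ~{expected_contig_length(READ_LEN, c):,.0f} bp per contig")
```

With θ = 0 the sketch gives contigs of roughly 3 kb at 3X and 7 kb at 4X, in line with the several-kilobase contigs expected at that stage; a nonzero θ shortens them.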
Xenopus tropicalis Genome Project Advisory Board
* Marvin Frazier, DOE Office of Science
* Bruce Blumberg, University of California, Irvine
* Rob Grainger, University of Virginia
* Richard Harland, University of California, Berkeley
* Richard Myers, Stanford University
* Paul Richardson, Joint Genome Institute
* Dan Rokhsar, Joint Genome Institute
* Eddy Rubin, Lawrence Berkeley Laboratory, Joint Genome Institute
Janet Heasman, University of Cincinnati Medical School
Marc Kirschner, Harvard Medical School
Steve Klein, National Institute of Child Health & Human Development, NIH
Nicolas Pollet, Division of Molecular Embryology, German Cancer Research Center
Enrique Amaya, Wellcome/CRC Institute of Cancer and Developmental Biology, University of Cambridge
Jim Smith, Wellcome/CRC Institute of Cancer and Developmental Biology, University of Cambridge
Naoto Ueno, National Institute for Basic Biology, Okazaki, Japan
Peter Vize, University of Calgary
Lyle Zimmerman, National Institute for Medical Research, London
Len Zon, Harvard Medical School
Aaron Zorn, University of Cincinnati Medical School
* Steering Committee Members
Letters of Support
From: "Klein, Steven (NICHD)" <kleins@exchange.nih.gov>
To: "'Paul Richardson'" <pmrichardson@lbl.gov>
Cc: "Alexander, Duane (NICHD)" <alexandd@exchange.nih.gov>,
"Willoughby, Anne (NICHD)" <willouga@mail.nih.gov>,
"Hewitt, Tyl (NICHD)" <th119v@mail.nih.gov>,
"'Richard Harland'" <harland@socrates.berkeley.edu>,
"Peterson, Jane (NHGRI)" <petersoj@exchange.nih.gov>,
"Guyer, Mark (NHGRI)" <guyerm@exchange.nih.gov>,
"Felsenfeld, Adam (NHGRI)" <felsenfa@exchange.nih.gov>
Subject: Xenopus BAC Resources
Date: Thu, 28 Mar 2002 10:23:17 -0500
March 28, 2002
Paul M. Richardson, Ph.D.
Manager of Functional Genomics
US Department of Energy Joint Genome Institute
2800 Mitchell Drive
Walnut Creek, CA 94598
Dear Dr. Richardson, Paul
I'm writing to express our enthusiastic support for your project to sequence
the genome of Xenopus tropicalis. The National Institute of Child Health
and Human Development (NICHD) is interested in, and intends to support and
arrange, the production of BAC-related resources that will aid in the
project. These resources include:
1) BAC Libraries
NICHD is funding the production of the two Xenopus tropicalis BAC libraries
that are being made by Chris Amemiya. These libraries will have inserts of
approximately 150 kb and 10X depth. We expect that they will be available
soon.
2) BAC-End Sequencing and Fingerprint Maps
The director of NICHD has approved funds to sequence BAC-ends and to
generate fingerprint maps of the 150 kb BAC libraries. NICHD plans to
contribute up to $3 million to this project (up to $1.5 million per year for
fiscal years 2003 and 2004).
NICHD and the National Human Genome Research Institute (NHGRI) are arranging
for the BAC-end sequencing and fingerprinting to be done by John McPherson
at Washington University. We sent him your requirements for these resources
and asked him for an estimate of their cost. We are awaiting his response.
We expect that NICHD's approved funds will be sufficient to generate the
required resources. However, because we don't yet know the project's exact
cost, we can't yet guarantee that our funds will be enough. If additional
funds are required, we expect that we will be able to obtain them from other
NIH sources.
NICHD is deeply committed to providing the BAC resources that you require.
I hope that the information presented herein provides an adequate
demonstration of that commitment. Please let me know if you would like any
additional information.
Sincerely,
Steve
Steven L. Klein, Ph.D.
Program Official &
Chair, Trans-NIH Xenopus Working Group
Developmental Biology, Genetics &
Teratology Branch
National Institute of Child Health
& Human Development, NIH
6100 Executive Boulevard
Rockville, MD 20852
Aaron M. Zorn, Ph.D.
Assistant Professor of Pediatrics
Division of Developmental Biology
Children's Hospital Medical Center
3333 Burnet Avenue
Cincinnati, OH 45229-3039
tel: (513) 636-3770, fax: (513) 636-4317
aaron.zorn@chmcc.org
25 March 2002
Richard Harland
University of California, Berkeley
Department of Molecular & Cell Biology
401 Barker Hall #3204
Berkeley, CA 94720-3204
harland@socrates.berkeley.edu
Dear Dr. Harland,
My colleagues and I are extremely enthusiastic about the DOE’s proposal to sequence the
Xenopus tropicalis genome at their Joint Genome Institute. I think that the entire
Xenopus community would agree that having the tropicalis genome sequence would be
enormously valuable, in both accelerating current Xenopus research as well as opening up
new avenues and approaches to Xenopus research. With the genome sequence in hand
Xenopus tropicalis will very likely become the premier vertebrate model for functional
genomics. Furthermore, from the point of view of biology in general, it is clear that the
Xenopus genome really is the best vertebrate genome to be sequenced next! I am pleased
that such an undertaking might be done by the JGI, which has a proven track record in
whole genome sequencing and data annotation.
I am happy to collaborate with the JGI on this project in any way that I can. In particular,
I see our ongoing efforts with the Sanger Centre (in the UK) to generate Xenopus
tropicalis EST sequences as an area in which we could interact closely with the JGI.
As you know, we and the Sanger Centre are committed to making the data from our EST
efforts public in a timely fashion as they become available. It appears that the JGI shares this
philosophy, and therefore I would be keen to participate in periodic "annotation
jamborees" when the draft sequence reaches the stage where this makes sense. Linking
our publicly available EST data to the JGI-produced genomic scaffolds will undoubtedly
provide a valuable resource.
In summary, the JGI’s proposal to sequence the tropicalis genome is a wonderful
opportunity for the Xenopus research community. For my part I completely lend my
support to this endeavor and I will be happy to collaborate with the JGI however I can.
Best wishes,
Aaron M. Zorn, Ph.D.
Assistant Professor of Pediatrics
Children's Hospital Medical Center, Cincinnati
University of California, Irvine
Developmental and Cell Biology
School of Biological Sciences
Bruce Blumberg, Ph.D.
Assistant Professor
March 23, 2002
Richard Harland
University of California, Berkeley
Department of Molecular & Cell Biology
401 Barker Hall #3204
Berkeley, CA 94720-3204
Dear Richard,
I am writing to express my strong enthusiasm for the Xenopus tropicalis genome project
being considered by the DOE Joint Genome Institute. This is a project of great
importance for the Xenopus community and for my own research; hence I offer my fullest
support for this effort.
As you know, my laboratory is currently constructing normalized, full-length
cDNA libraries from X. tropicalis. The libraries will be arrayed in 384-well
plates and constructed in a pCS2 derivative that is optimized for genomic and functional
studies. We plan to make these libraries available to the community without restriction
and hope that they will serve as a valuable resource for jump-starting the use of X.
tropicalis in many laboratories. These libraries will be made available to the Xenopus
EST project, and for full-length cDNA sequencing if funding ever becomes available for
such an effort.
I would also like to express my willingness to participate in the annotation jamborees that
will begin once the sequence reaches the assembly stage.
Please let me know if I can help the effort in any other way.
Best wishes,
Bruce Blumberg
Professor Richard Harland
University of California, Berkeley
Department of Molecular & Cell Biology
401 Barker Hall #3204
Berkeley, CA 94720-3204
Dear Richard:
I am pleased to write this letter to encourage the Joint Genome Institute to initiate Xenopus tropicalis
genome sequencing.
Given its diploid genome and the prospect of genetic approaches, sequencing the Xenopus
tropicalis genome is extremely important for two reasons. First, its output will facilitate
developmental genetics in Xenopus. Combined with the efficient functional assays available in Xenopus,
the genome sequence data will provide the most useful platform for functional genomics in
vertebrates. Our independent EST sequencing effort in Xenopus laevis has generated
70,000 ESTs, which were recently released as a database. However, we still need to
refine this EST set into a unigene set of full-length cDNAs. To accomplish this,
genome sequence data are essential, even if they come from Xenopus tropicalis. Second, the
sequencing project is also important from the standpoint of evolutionary biology. It will be the
first amphibian genome to be sequenced, and the sequence information will tell
us how aquatic animals evolved into terrestrial animals during the long history of animal
evolution.
Finally, as a member of the Xenopus research community, I am happy to contribute to annotation
meetings if our experience with Xenopus EST sequencing projects would help.
Sincerely,
Naoto Ueno, Professor
Department of Developmental Biology
National Institute for Basic Biology
38 Nishigonaka, Myodaiji
Okazaki 444-8585, JAPAN
TEL: 81-564-55-7570
FAX: 81-564-55-7571
nueno@nibb.ac.jp