SOLiD Sequencing and the Thousand Genomes Project

The Thousand Genomes Project (www.1000genomes.org) is a large-scale sequencing
project in which an international collaboration of scientists aims to provide the
scientific community with a catalog of variants present at a frequency of 1% (or
greater) in the human population across most of the genome, and down to 0.5% (or
lower) within genes. This high-resolution map of variation will include single nucleotide
polymorphisms as well as structural variants. Once complete, this project will produce
the clearest picture yet of the spectrum of normal human genetic variation, and will likely
require the sequencing of at least 1000 individual human genomes.
At the Human Genome Sequencing Center (Baylor College of Medicine, Houston, TX)
we are working to produce over 200 mappable gigabases of sequence for the Thousand
Genomes pilot projects over the course of a 5-month period using six ABI SOLiD
sequencers. We are in the process of producing 140 mappable gigabases of SOLiD data
on 24 HapMap samples (as part of the 2x “light sequencing” pilot project), and we have
already produced over 60 mappable gigabases of SOLiD data on a single HapMap
sample (as part of the 20x “deep sequencing” pilot project). Our deep sequencing data
consists of about 14x coverage using 25bp tag paired-end libraries and about 9x coverage
using 35bp fragments.
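
As a rough check on how the yields above relate to fold coverage, the following sketch divides mappable gigabases by genome size. The genome size of about 3.0 gigabases and the function names are assumptions made for illustration; they are not taken from the abstract.

```python
# Assumption: a haploid human genome of roughly 3.0 gigabases.
# This figure is not stated in the abstract and is used only for illustration.
GENOME_SIZE_GB = 3.0


def fold_coverage(mappable_gigabases, genome_size_gb=GENOME_SIZE_GB):
    """Average fold coverage obtained from a given yield of mappable sequence."""
    return mappable_gigabases / genome_size_gb


def gigabases_for_coverage(coverage, genome_size_gb=GENOME_SIZE_GB):
    """Mappable gigabases needed to reach a given average fold coverage."""
    return coverage * genome_size_gb


# Light sequencing pilot: 140 mappable gigabases spread over 24 samples.
light_per_sample = fold_coverage(140 / 24)

# Deep sequencing pilot: about 14x paired-end plus about 9x fragment coverage.
deep_total_coverage = 14 + 9
deep_gigabases = gigabases_for_coverage(deep_total_coverage)

print(f"light pilot: ~{light_per_sample:.1f}x per sample")
print(f"deep pilot: ~{deep_total_coverage}x, ~{deep_gigabases:.0f} mappable gigabases")
```

Under this assumed genome size, the light pilot works out to a little under 2x per sample, and the combined deep coverage corresponds to somewhat more than 60 mappable gigabases, in line with the figures quoted above.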
The details of our SOLiD sequencing program for the Thousand Genomes Pilot Projects
will be presented here, along with a report on our experiences with SNP calling using
deep sequencing data and the vendor-supplied mapping/SNP-calling software, and our
progress on developing software to identify structural variants by looking for clusters of
paired-end reads that show characteristic rearrangement signals.
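
The general idea of clustering paired-end reads with rearrangement signals can be sketched as follows. This is a minimal illustration, not the software described above: the `ReadPair` type, the span thresholds, the clustering window, and the support cutoff are all hypothetical, and read orientation is ignored for brevity. A pair whose tags map farther apart than the library allows suggests a deletion in the sample relative to the reference; a pair that maps closer together suggests an insertion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical record for one mapped paired-end read."""
    chrom: str
    left: int   # mapped start of the leftmost tag
    right: int  # mapped start of the rightmost tag


def classify(pair, min_span, max_span):
    """Label a pair by how its mapped span compares with the expected range."""
    span = pair.right - pair.left
    if span > max_span:
        return "deletion"   # tags map farther apart than the library allows
    if span < min_span:
        return "insertion"  # tags map closer together than the library allows
    return None             # concordant pair, no rearrangement signal


def cluster_discordant(pairs, min_span, max_span, window, min_support=2):
    """Group nearby discordant pairs of the same type into candidate calls."""
    labelled = sorted(
        ((p.chrom, classify(p, min_span, max_span), p.left, p)
         for p in pairs
         if classify(p, min_span, max_span) is not None),
        key=lambda t: t[:3],
    )
    clusters, current = [], []
    for item in labelled:
        same_group = bool(current) and current[-1][:2] == item[:2]
        if same_group and item[2] - current[-1][2] <= window:
            current.append(item)
        else:
            if len(current) >= min_support:
                clusters.append(current)
            current = [item]
    if len(current) >= min_support:
        clusters.append(current)
    # Each call: (chromosome, signal type, supporting pairs)
    return [(c[0][0], c[0][1], [t[3] for t in c]) for c in clusters]


pairs = [
    ReadPair("chr1", 1000, 6000),
    ReadPair("chr1", 1050, 6100),
    ReadPair("chr1", 1120, 6150),
    ReadPair("chr1", 9000, 11000),  # concordant under the thresholds below
    ReadPair("chr2", 500, 5600),    # discordant but unsupported by other pairs
]
calls = cluster_discordant(pairs, min_span=1000, max_span=3000, window=500)
for chrom, kind, support in calls:
    print(chrom, kind, len(support))
```

Requiring several independent pairs per cluster is one simple way to reduce false calls from isolated mismapped reads; a real pipeline would also need to account for tag orientation, mapping quality, and the measured insert-size distribution of each library.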