Bioinformatics
and Comparative Genome Analysis
Monday, March 19th 2007
Tunis
Molecular biology story:
DNA "the Queen molecule"
Odile Ozier-Kalogeropoulos
Institut Pasteur
Université Pierre et Marie Curie
E-mail: odozier@pasteur.fr
Introduction
Genomes: two views
http://www.pasteur.fr/externe
http://genetique.snv.jussieu.fr
View of genomes
for biologists
View of genomes
for computer scientists
Pasteur Genopole® Île-de-France, Plate-forme technologique 4
DNA molecule: two views
View 1
James Watson and Francis Crick (1953)
View 2
[Figure: the two strands of the double helix, one running 5' to 3' and the other 3' to 5']
DNA sequence: one view
Sequencing DNA,
"the Queen molecule"
Most sequencing methods are based on the mechanisms that
living systems naturally use to copy and repair
their own genomes
Reminder!
Cell DNA synthesis
The main role of DNA polymerase
Cell DNA synthesis
[Figure, steps 1 to 4: DNA polymerase synthesizing a new strand on a template, with the 5' and 3' ends of each strand labelled]
http://www.snv.jussieu.fr/vie/dossiers/sequencage/sequence.htm
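A minimal sketch of what the DNA polymerase does in the figure, assuming standard Watson-Crick pairing: the template is read from its 3' end and the new strand grows at its own 3' end, so the product is the reverse complement of the template. The function name and the example sequence are illustrative only.

```python
# Watson-Crick base pairing rules.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesize_new_strand(template_5to3: str) -> str:
    """Return the newly synthesized strand, written 5' to 3'.

    The polymerase walks along the template from its 3' end and adds one
    complementary nucleotide at a time to the 3' end of the growing strand.
    """
    new_strand = []
    for base in reversed(template_5to3):   # start at the template's 3' end
        new_strand.append(PAIR[base])      # extend the new strand at its 3' end
    return "".join(new_strand)

print(synthesize_new_strand("ATGGCA"))  # prints TGCCAT
```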
1 Foundation of the current state-of-the-art
production genome sequencing
The Sanger method
30th year celebration!
1977
The Sanger method
1. DNA isolation
2. Sample preparation
3. Sequence production
4. Assembly and analysis
The Sanger method
Focus on
Sequence
production
The Sanger method
[Figures: the sequencing reaction carried out by DNA polymerase]
http://www.snv.jussieu.fr/vie/dossiers/sequencage/sequence.htm
The Sanger method
Reading progression
Fragment separation by
electrophoresis on acrylamide gel
(resolution: 1 base)
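How reading the gel gives the sequence can be sketched as follows. This assumes the usual description of the Sanger reaction, which the slides do not spell out: chain-terminating nucleotides stop synthesis at every possible position, so the reaction contains one fragment per length, and each fragment is labelled by its last base. Function names are illustrative.

```python
import random

def terminated_fragments(new_strand: str) -> list[str]:
    """One chain-terminated product per possible stop position."""
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]

def read_gel(fragments: list[str]) -> str:
    """Separate fragments by length (1-base resolution) and read the
    terminal base of each, from the shortest to the longest."""
    ordered = sorted(fragments, key=len)   # electrophoresis sorts by size
    return "".join(fragment[-1] for fragment in ordered)

fragments = terminated_fragments("TGCCAT")
random.shuffle(fragments)                  # in the tube, fragments are mixed
print(read_gel(fragments))                 # prints TGCCAT
```

The 1-base resolution matters: if two fragment lengths could not be told apart, the reading order of their terminal bases would be lost.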
2 Current state-of-the-art production genome sequencing
in high-throughput sequencing centers
Sanger production-scale genome sequencing requires
4 successive steps:
1. DNA isolation (Laboratory)
2. Sample preparation (Laboratory)
3. Sequence production (Robots)
4. Assembly and analysis (Computers, Humans)
Chan E.Y. (2005), Mutation Res., 573, 13-40
2 Current state-of-the-art production genome sequencing
in high-throughput sequencing centers
DNA isolation and sample preparation: Laboratory
Sequence production: Sequencing robots
[Photo: Lab technician working with sequencing machines. Courtesy of Celera Genomics]
[Photo: Room filled with sequencing machines. Courtesy of Celera Genomics]
[Photo: Lab with sequencing machines. Courtesy of Celera Genomics]
[Photo: Close-up of capillaries from a capillary sequencing machine. Courtesy of Celera Genomics]
Assembly and analysis: Computers
[Photo: Plate-forme Génomique, Institut Pasteur]
3
Sequencing statistics
http://www.genomesonline.org
[Chart: sequencing projects by group: Bacteria, Archaea, Eukarya, Metagenomes]
[Chart: high-throughput sequencing centers by country; labels include USA, UK, F and others]
4 Why continue sequencing?
-Comparative genomics
-Impact on biomedical research
-The personal genome project
4 Why continue sequencing?
-Comparative genomics
Figure 1 | Evolutionary relationship between metazoans that are sequenced or due for sequencing.
The simplified phylogenetic relationships between the metazoans for which the complete,
or nearly complete, genome sequences are available or will be available soon.
Evolutionary distances (in million years)
Abel Ureta-Vidal, Laurence Ettwiller & Ewan Birney (2003), Nature rev. genet., 4, pp251-262
- International sequence databases:
sequence fragments from 100 000 species
- Estimated number of species:
at least 14 million...
Shendure, 2004 and Wikipedia
[Figure: number of sequences in GenBank (log scale) per metazoan group, e.g. molluscs, worms...]
The phylogenetic sequence deficit for the Metazoa
Mark Blaxter, 2002
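The size of this deficit can be put into one number. The sketch below only uses the two figures quoted above; since 14 million is a lower bound, the result is an upper bound on the sampled fraction.

```python
species_with_sequence = 100_000     # species with at least a sequence fragment
estimated_species = 14_000_000      # lower bound quoted on the slide

fraction = species_with_sequence / estimated_species
print(f"{fraction:.2%} of species have at least one sequence fragment")
# prints: 0.71% of species have at least one sequence fragment
```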
4 Why continue sequencing?
-Impact on biomedical research
Single Nucleotide Polymorphism (SNP)
HapMap Project
A freely-available public resource
to increase the power and efficiency
of genetic association studies to medical traits
High-density SNP genotyping across the genome
provides information about:
– SNP validation, frequency, assay conditions
– correlation structure of alleles in the genome
Mark J. Daly, PhD
[Figure: associated alleles reported by different studies (Straub 2002; Schwab 2003; Van den Oord 2003; Van den Bogaert 2003; Kirov 2004; Williams 2004; Funke 2004; Bray 2005), with tag SNPs and the haplotypes AGGCCA, AAGCCT, AGGCCT, AGATTA, GGATCA]
Mark J. Daly, PhD
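The "correlation structure of alleles" is commonly summarized by the r^2 statistic between pairs of SNPs; a SNP can serve as a tag for another when their r^2 is close to 1. The sketch below is a generic illustration, not the HapMap pipeline: the haplotypes are hypothetical, and the function assumes both sites are biallelic and polymorphic in the sample (otherwise it divides by zero).

```python
def r_squared(haplotypes: list[str], i: int, j: int) -> float:
    """r^2 between biallelic sites i and j across phased haplotypes."""
    n = len(haplotypes)
    a = haplotypes[0][i]          # allele used as reference at site i
    b = haplotypes[0][j]          # allele used as reference at site j
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b          # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical phased haplotypes over three SNP positions.
haps = ["AGT", "AGT", "CTT", "CTA"]
print(r_squared(haps, 0, 1))  # sites 0 and 1 always vary together: 1.0
print(r_squared(haps, 0, 2))  # weaker correlation, about 0.33
```

In this toy sample, genotyping site 0 alone would predict site 1 perfectly, which is the idea behind a tag SNP.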
4 Why continue sequencing?
-The personal genome project
Sequencing of individual human genomes
as a component of preventative medicine
The National Human Genome Research Institute (NHGRI) solicits grant applications
to develop novel technologies that will enable extremely low-cost genomic DNA sequencing.
(2005-2006)
Revolutionary Genome Sequencing Technologies:
The $1000 Genome
Target: a genome for $1000, for 2015
[Figure: cost scale marked US$ 0.001, US$ 1 and US$ 10 000, with today's position indicated]
Chan E.Y. (2005), Mutation res, 573, 13-40
5 Improvements of the Sanger method during
these 30 years
Steps: DNA isolation, sample preparation, sequence production, assembly and analysis
- Production of template DNA
- Labelling: radioactivity / fluorescent dyes
- Analysis of the DNA fragments produced:
radioactivity detection / laser within an automated DNA sequencing machine
- Electrophoresis: acrylamide gel / capillaries
DNA isolation
- Production of template DNA, around 1985
Need for single-stranded DNA for sequencing
M13 is a filamentous bacteriophage specific to Escherichia coli
[Figure: replication of bacteriophage M13 DNA in infected bacteria. The (+) single strand (SS) is converted into the (+/-) replicative form (RF); a nick at a specific site in the (+) strand allows synthesis of new (+) single strands by rolling circle replication]
- Sequencing of pure single-stranded DNA from recombinant M13 particles
DNA
isolation
- Production of template DNA
around 1990
Double-stranded DNA from recombinant plasmids or PCR products,
denatured by heat or alkali for sequencing
DNA
isolation
- Recent improvement of
template DNA production
Multiple displacement amplification
Phi29 DNA Polymerase is the replicative polymerase
from the Bacillus subtilis phage phi29
DNA templates can be amplified 10 000-fold in a few hours
(Blanco, L. and Salas, M. (1984) Proc. Natl. Acad. Sci. USA, 81, 5325-5329)
Recent improvement of template DNA production
Principle:
Scheme for multiply-primed rolling circle amplification (Dean et al, 2001)
- Random oligonucleotide primers complementary to the amplification target circle
- DNA polymerase and deoxynucleoside triphosphates (dNTPs)
- Strand displacement DNA synthesis for more than 70 000 nucleotides
without dissociating from the template
- Error rate: 1 in 10^6 to 10^7 nucleotides
(in contrast to 3 x 10^4 for PCR with Taq DNA polymerase)
Recent improvement of template DNA production
Principle:
Primers
Blanco, PNAS,1989
DNA
isolation
Applications
of the multiple displacement amplification
1. Whole human genome amplification using this method
2. Sequencing the genome of a single cell
1. Whole human genome amplification using this method
Phi29 DNA polymerase is able to amplify linear DNA
[Figure: cascading strand displacement on circular DNA and on linear DNA]
(Dean et al, PNAS, 2002)
1. Whole human genome amplification using this method
Phi29 DNA polymerase is able to amplify linear DNA
DNA amplification yield after MDA:
20-30 µg of product from 1-10 copies of human genomic DNA
after 18 hours at 30°C
(Dean et al, PNAS, 2002)
1. Whole human genome amplification using this method
For:
• Genome sequencing
• Genetic analysis on blood, microdissected tissues...
• Prenatal diagnosis
• Anthropological samples...
(Dean et al, PNAS, 2002)
2. Sequencing the genome of a single cell
(Zhang et al, Nature Biotech, 2006)
Single-cell genomics
Clyde A Hutchison III & J Craig Venter
Nature Biotechnology 24, 657-658 (2006)
doi:10.1038/nbt0606-657
Phi29 DNA polymerase is the replicative polymerase from the Bacillus subtilis phage phi29. This polymerase
has exceptional strand displacement and processive synthesis properties. The polymerase has an inherent
3'>5' proofreading exonuclease activity (Blanco, L. and Salas, M. (1984) Proc. Natl. Acad. Sci. USA, 81, 5325-5329)
Figure 1. Sequencing the genome of a single cell.
A single cell is isolated by dilution or by cell sorting. The cell is lysed and the chromosome is denatured
by alkaline treatment. The cellular DNA is amplified >10^9-fold by multiple displacement amplification (MDA)
using random primers. The hyperbranched DNA product is resolved by shearing and enzymatic treatments,
then cloned and shotgun sequenced. Ideally, a complete genome sequence could be assembled from
the data and then annotated.
2. Sequencing the genome of a single cell
A pioneering work and a new world:
Polymerase cloning, or "ploning"
The authors refer to the DNA populations amplified
from a single cell as polymerase clones, or "plones"
Two limitations in these first experiments:
- Bias in "plonable" amplification
- Chimeric plones (about 6%)
(Zhang et al, Nature Biotech, 2006)
2. Sequencing the genome of a single cell
Most of the diversity of the biosphere remains unsampled.
The ability to sequence an entire genome
from a single uncultured cell should make it possible to reveal
this enormous biodiversity.
Metagenomics
(Zhang et al, Nature Biotech, 2006)
6
Alternatives to the Sanger method
Sequencing single molecules of DNA
Reminder!
The Sanger method is based on the analysis of populations
of DNA molecules
Sequence
production
- Analysis of the DNA fragments produced:
Radioactivity detection/
Laser within an automated DNA sequencing machine
Cycle extension method on single molecules
1- Template DNA is arrayed on a surface or in wells
2- Sequencing reaction steps, including nucleotide incorporation
and washes, are performed to identify each base pair.
3- The extended base pair is detected by fluorescence
or luminescence.
[Figure: sequential base incorporation steps on a primed template attached to a surface]
Chan E.Y. (2005), Mutation Res., 573, 13-40
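The three steps above can be sketched as a small simulation. It is a toy model under simplifying assumptions that are not in the source: each well holds one template, exactly one base is incorporated per cycle, and detection is error-free. Templates are written in the order in which the polymerase meets their bases; all names are illustrative.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sequence_array(templates: list[str], n_cycles: int) -> list[str]:
    """Read every arrayed template in parallel, one base per cycle."""
    reads = ["" for _ in templates]
    for cycle in range(n_cycles):
        for nucleotide in "ACGT":              # deliver one nucleotide, then wash
            for well, template in enumerate(templates):
                pos = len(reads[well])
                extended_this_cycle = pos > cycle
                if (pos < len(template) and not extended_this_cycle
                        and PAIR[template[pos]] == nucleotide):
                    reads[well] += nucleotide  # incorporation gives a signal
    return reads

print(sequence_array(["TACG", "GGA"], n_cycles=3))  # prints ['ATG', 'CCT']
```

All wells are processed within the same cycle, which is where the massive parallelism comes from; the number of cycles sets the read length.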
Main features of cycle extension methods
compared to Sanger:
• Massive parallelism
• Short read lengths
• Potential for cost reduction
Pyrosequencing is the best-known
cycle extension method
Pyrosequencing
[Figures: principle of pyrosequencing. From Biotage, http://www.pyrosequencing.com]
The two main problems of
pyrosequencing
a, Read length distribution for the 306,178 high-quality reads of the M. genitalium sequencing run.
This distribution reflects the base composition of individual sequencing templates.
b, Average read accuracy, at the single read level, as a function of base position for the 238,066 mapped reads
of the same run
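Why read length depends on base composition can be seen in a toy flowgram. The sketch assumes the usual description of pyrosequencing: nucleotides are flowed one at a time in a fixed order, and the light signal of a flow grows with the number of identical bases incorporated. The flow order, the number of flows and the sequences are hypothetical; templates are written in the order in which the polymerase meets their bases.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def flowgram(template: str, flow_order: str = "TACG", n_flows: int = 8):
    """Return one (nucleotide, signal) pair per flow.

    The signal is the number of bases incorporated during that flow, so a
    homopolymer run produces a single, stronger signal.
    """
    pos = 0
    signals = []
    for k in range(n_flows):
        nucleotide = flow_order[k % len(flow_order)]
        run = 0
        while pos < len(template) and PAIR[template[pos]] == nucleotide:
            run += 1
            pos += 1
        signals.append((nucleotide, run))
    return signals

def read_from_flowgram(signals) -> str:
    """Convert signals back into bases (ideal, noise-free detection)."""
    return "".join(nucleotide * run for nucleotide, run in signals)

print(flowgram("AAGCT"))
# Same number of flows, very different read lengths:
print(read_from_flowgram(flowgram("AAGCT", n_flows=4)))  # prints TTCG
print(read_from_flowgram(flowgram("CGTA", n_flows=4)))   # prints G
```

With real signals, telling a run of 5 identical bases from a run of 6 is harder than telling 1 from 2, which is one reason accuracy is a concern alongside read length.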
Pyrosequencing: massive parallelism
Genome sequencing in microfabricated
high-density picolitre reactors
Margulies et al, 2005
Genomic DNA is fragmented,
ligated to adapters and separated
into single strands.
Fragments are bound to beads
under conditions that favour one fragment per bead.
The beads are captured in droplets
of a PCR-reaction-mixture-in-oil emulsion.
PCR amplification occurs within each
droplet.
At the end of the PCR reaction, each bead carries 10 million
copies of a unique DNA template.
Margulies et al. (2005),
Nature, 437, pp376-380
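Two quick calculations help to read this slide. Both rest on assumptions that are not taken from Margulies et al.: fragments are assumed to land on beads at random (Poisson statistics), the mean of 0.1 fragment per bead is a hypothetical value, and PCR is assumed to double the copy number perfectly at each cycle.

```python
import math

def poisson(k: int, mean: float) -> float:
    """Probability of observing exactly k events for a given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def bead_loading(mean_fragments_per_bead: float):
    """Fractions of beads carrying 0, exactly 1, or more than 1 fragment."""
    empty = poisson(0, mean_fragments_per_bead)
    single = poisson(1, mean_fragments_per_bead)
    multiple = 1.0 - empty - single
    return empty, single, multiple

empty, single, multiple = bead_loading(0.1)   # hypothetical dilution
print(f"empty {empty:.1%}, single {single:.1%}, multiple {multiple:.1%}")

# Minimum number of perfect doublings to reach 10 million copies from one molecule.
cycles = math.ceil(math.log2(10_000_000))
print(cycles)  # prints 24
```

The trade-off is visible in the numbers: a strong dilution keeps mixed beads rare, at the price of many empty beads.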
Margulies et al, 2005
The emulsion is broken, the DNA
strands denatured and the beads
carrying single stranded DNA clones
are deposited into wells of a
fibre-optic slide.
Smaller beads carrying immobilized
enzymes required for pyrosequencing
are deposited into each well.
Margulies et al, 2005
Sequencing instrument
a) Fluidic assembly
b) The well-containing
fibre-optic slide
c) Computer providing
the user interface and
the instrument control
Margulies et al, 2005
De novo assembly of bacterial genomes
Test on Mycoplasma genitalium (580 000 bp)
Density of wells: 480 per mm^2
Total number of wells on a slide: 1.6 million!
14 hours!
Margulies et al, 2005
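The figures quoted for this run can be combined into an order-of-magnitude estimate of sequencing depth, using coverage = number of reads x read length / genome size. The read count comes from the figure caption shown earlier; the average read length of 100 bases is an assumption made for this sketch, not a value from the slides.

```python
genome_size = 580_000        # Mycoplasma genitalium, bp (from the slide)
n_reads = 306_178            # high-quality reads (from the figure caption)
read_length = 100            # ASSUMPTION: average read length in bases

coverage = n_reads * read_length / genome_size
print(f"about {coverage:.0f}-fold coverage")

wells = 1_600_000            # wells per slide (from the slide)
density_per_mm2 = 480        # wells per mm^2 (from the slide)
slide_area_mm2 = wells / density_per_mm2
print(f"well area of about {slide_area_mm2:.0f} mm^2")
```

Changing read_length scales the coverage estimate proportionally, so the result should be read as a rough figure only.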
7 Sequencing or resequencing?
-Sequencing: for studies of genomes of unknown species
needing long read length
- Resequencing: for individual studies using a known genome
as guide
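What "using a known genome as guide" means can be sketched with a naive read mapper: each short read is placed where it fits the reference best, and the remaining mismatches are reported as candidate differences for the individual. This is a toy illustration with hypothetical sequences; real tools use indexed, gap-aware alignment, and the functions below assume the read is no longer than the reference.

```python
def best_position(reference: str, read: str) -> tuple[int, int]:
    """Offset with the fewest mismatches (first one wins in case of a tie)."""
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(window, read))
        if best is None or mismatches < best[1]:
            best = (start, mismatches)
    return best

def call_differences(reference: str, read: str):
    """List (position, reference base, read base) for each mismatch."""
    start, _ = best_position(reference, read)
    return [(start + i, reference[start + i], base)
            for i, base in enumerate(read)
            if reference[start + i] != base]

reference = "ACGTTGCATGCCGATA"   # hypothetical known genome
read = "GCATGACG"                # hypothetical short read from an individual
print(best_position(reference, read))     # prints (5, 1)
print(call_differences(reference, read))  # prints [(10, 'C', 'A')]
```

Short reads are enough here because the reference provides the long-range order; without a reference, that order has to come from overlaps between reads, which is why de novo sequencing benefits from long reads.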
Comparison of sequencing methods
Sanger method
ABI 3730xl
454 technology
Adapted from Chan E.Y. (2005), Mutation res, 573, 13-40
Choice of sequencing method
Example of Neanderthal DNA
DNA from a fragment of 38 000-year-old Neanderthal
fossil found in 1980 in Vindija cave (Croatia)
Neanderthal DNA constraints:
- Rare, short DNA fragments
- Many contaminants
Advantages of Pyrosequencing
- No bacterial cloning
- No template competition for amplification
- Read length about 200 bp
- Each sequenced product stems from just one
original single stranded template molecule of
known orientation (difference with PCR)
Green R.E. et al, 2006
Principle
Lambert and Millar (2006),
Green et al, (2006)
http://WWW.454.COM/
Results
Analysis of one million base pairs of Neanderthal DNA
Location on the human
karyotype of Neanderthal DNA
Schematic tree illustrating the
number of nucleotide changes
inferred to have occurred on
hominoid lineages
Green et al, (2006)
Conclusions
- Sequencing today
is performed in big centers
- The number of sequences is growing exponentially...
But the bottleneck remains the analysis of sequences...
Precisely, the goal of the present course "Bioinformatics
and Comparative Genome Analysis" is to give you the tools to
contribute to advances in this field...
So...
Good work on the Queen molecule!
Thanks to the organizers!
And thanks for your attention!
Plan of the course
5 Improvements of the Sanger method during these 30 years
A. Generalities
- DNA isolation: Production of template DNA
- Sequence production:
- Labelling: Radioactivity/Fluorescent dyes
- Analysis of the DNA fragments produced:
Radioactivity detection/Laser within an automated DNA sequencing machine
- Electrophoresis: acrylamide gel/capillaries
B. Details
1. DNA isolation: Production of template DNA around 1985
Need for single-stranded DNA for sequencing
- M13 is a filamentous bacteriophage specific to Escherichia coli
2. DNA isolation: Production of template DNA around 1990
Double-stranded DNA from recombinant plasmids or PCR products, denatured by
heat or alkali for sequencing
3. DNA isolation: Recent improvement of template DNA production
- Multiple displacement amplification
Phi29 DNA polymerase is the replicative polymerase from the Bacillus subtilis phage phi29
- Applications of the multiple displacement amplification
- Whole human genome amplification using this method
For: genome sequencing, genetic analysis on blood, microdissected tissues..., prenatal
diagnosis, anthropological samples...
- Sequencing the genome of a single cell: polymerase cloning, or "ploning"
Most of the diversity of the biosphere remains unsampled.
The ability to sequence an entire genome from a single uncultured cell should make it
possible to reveal this enormous biodiversity. Metagenomics
Plan of the course (cont'd)
6 Alternatives to the Sanger method: Sequencing single molecules of DNA
Reminder! The Sanger method is based on the analysis of populations of DNA molecules
1. Cycle extension method on single molecules
1- Template DNA is arrayed on a surface or in wells
2- Sequencing reaction steps, including nucleotide incorporation
and washes, are performed to identify each base pair.
3- The extended base pair is detected by fluorescence or luminescence.
- Main features of cycle extension methods compared to Sanger:
- Massive parallelism
- Short read lengths
- Potential for cost reduction
- Pyrosequencing is the best-known cycle extension method
- Principle
- Two main difficulties
- Pyrosequencing: massive parallelism
Genome sequencing in microfabricated high-density picolitre reactors
- Instrumentation
- Example: Mycoplasma genitalium (580 000 bp)
7 Sequencing or resequencing?
1. Sequencing: for studies of genomes of unknown species needing long read length
2. Resequencing: for individual studies using a known genome as guide
3. Comparison of sequencing methods
4. Choice of sequencing method: Example of Neanderthal DNA
Conclusions
- Sequencing today is performed in big centers
- The number of sequences is growing exponentially...
- But the bottleneck remains the analysis of sequences...
Precisely, the goal of the present course "Bioinformatics and Comparative Genome
Analysis" is to give you the tools to contribute to advances in this field...
So... Good work on the Queen molecule!
Thanks to the organizers!
And thanks for your attention!