Oxford Nanopore sequencing of complex plant genomes

Sequencing of complex plant genomes: big data … big deal?
Applications and Challenges of Oxford Nanopore Sequencing in the Life Science Industry
Raymond Hulzink, Ph.D
Wageningen, April 14, 2016
Rhu@keygene.com
Genome assembly
The challenge
Long-read sequencing technologies have accelerated whole genome (re-)sequencing approaches and reduced costs dramatically,
but de novo construction of highly accurate draft genome sequences in complex organisms is still a challenge and costly.
Therefore, high-quality ultra-long reads are needed.
‘Repeats longer than read length cannot be resolved!’
The crop innovation company
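The rule of thumb on this slide can be made concrete with a minimal sketch. It assumes that a read resolves a repeat only if it covers the whole repeat plus some unique flanking sequence on both sides. The function names and the 500-base flank default are illustrative assumptions, not values from the talk.

```python
def spans_repeat(read_start, read_end, repeat_start, repeat_end, min_flank=500):
    """Return True if the read covers the repeat plus min_flank unique bases on each side.

    Coordinates are 0-based, end-exclusive positions on the genome.
    min_flank is a hypothetical anchoring requirement, not a value from the slides.
    """
    return (read_start <= repeat_start - min_flank
            and read_end >= repeat_end + min_flank)


def max_resolvable_repeat(read_length, min_flank=500):
    """Longest repeat a read of this length can span under the same assumption."""
    return max(0, read_length - 2 * min_flank)


# A 10 kb repeat located at positions 20,000-30,000:
print(spans_repeat(19_000, 31_000, 20_000, 30_000))  # 12 kb read with both flanks: True
print(spans_repeat(21_000, 29_000, 20_000, 30_000))  # 8 kb read inside the repeat: False
print(max_resolvable_repeat(17_900))                 # 16900
```

Under this assumption, reads that fall entirely inside a repeat carry no information about which copy they came from, which is why read length sets a hard limit.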
Plant genomes
Size
Crops, e.g.:
• Melon: 400 Mb
• Tomato: 800 Mb
• Lettuce: 1,200 Mb
• Pepper: 3,200 Mb
• Barley: 5,000 Mb
Other examples:
• Large bitter cress: 54 Mb
• Snake's head: 124,852 Mb
• Japanese canopy plant: 149,000 Mb
Data source: http://data.kew.org/cvalues/CvalServlet?querytype=1
Image sources:
• Japanese canopy plant: Alpsdake, own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons, http://commons.wikimedia.org/wiki/File:Paris_japonica_Kinugasasou_in_Hakusan_2010_7_18.jpg
• Large bitter cress: Walter Siegmund, own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons, http://commons.wikimedia.org/wiki/File:Cardamine_amara_eF.jpg
• Snake's head: Michael Apel, own work. Licensed under CC BY 3.0 via Wikimedia Commons, http://commons.wikimedia.org/wiki/File:Fritillaria_meleagris_MichaD.jpg#/media/File:Fritillaria_meleagris_MichaD.jpg
Plant genomes
Complexity
• Repetitive DNA
  º Medium
    - Tandem repeats (rRNA, tRNA)
    - Gene families (paralogs)
    - Transposable elements (e.g. retrotransposons)
  º High
    - Tandemly arranged SSRs
    - Centromeric tandem repeats
  º e.g. pepper genome: ~81% repetitive sequences
    Qin et al. (2014) Whole-genome sequencing of cultivated and wild peppers provides insights into Capsicum domestication and specialization. PNAS 111: 5135-5140
• Heterozygosity, polyploidy
MAP @ KeyGene
• Phase 1 (2014): set-up system, testing software, and sequencing ONT reference DNA (λ genome)
• Phase 2 (2015): sequencing experimental DNA (plant BAC clones)
“I have just been looking at some QC metrics that the software has sent back to us and see that your flow cell is running hotter than I would have expected ….” (Oxford Nanopore)
BAC sequencing
Read alignment against reference
• Alignment with MarginAlign against PacBio (PB) references
• Despite a low number of 2D pass reads (<10%), BAC references were completely covered (8-20x depth)
• Read accuracy was ~83%
[Figure: depth of coverage (# of reads) against map position on reference; MinION / FLO-MAP003]
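A read accuracy figure such as the ~83% above is usually derived from the read-to-reference alignments. The slides do not say which definition was used, so the sketch below shows one common choice as an assumption: identity as matches divided by all alignment columns (matches, mismatches, insertions, deletions), read from an extended CIGAR string. The function name is illustrative.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_identity(cigar):
    """Identity = matches / (matches + mismatches + insertions + deletions).

    Expects an extended CIGAR string that uses '=' for matches and 'X' for
    mismatches. Clips (S/H), skips (N) and padding (P) are ignored.
    """
    counts = {"=": 0, "X": 0, "I": 0, "D": 0}
    for length, op in CIGAR_OP.findall(cigar):
        if op == "M":
            # 'M' mixes matches and mismatches, so identity cannot be derived.
            raise ValueError("ambiguous 'M' operation; need '=' / 'X' CIGAR")
        if op in counts:
            counts[op] += int(length)
    columns = sum(counts.values())
    if columns == 0:
        raise ValueError("no aligned columns in CIGAR")
    return counts["="] / columns


# Toy alignment: 83 matches, 5 mismatches, 4 inserted and 8 deleted bases.
print(f"{alignment_identity('10S40=3X4I43=2X8D'):.2f}")  # 0.83
```

Other definitions (for example, ignoring indels) give higher numbers for the same alignment, so the definition should be stated alongside any accuracy value.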
BAC sequencing
De novo assembly
• De novo assembly with Celera Assembler after one or two rounds of error correction (NanoCorrect)
• Alignment against PB reference using MUMmer, with the dnadiff tool for estimation of per-base accuracy
• Successful de novo assembly for two BAC clones with 10-15 fold read depth
• High quality assemblies with a small number of substitution errors and a moderate amount of insertion/deletion errors
[Figure: alignment plots of nanopore assembly (bases) against PacBio reference (bases) for BAC H049 – Assemb 2 and BAC H032 – Assemb 2]
Genome sequencing
Plant pathogen Rhizoctonia solani
• Soil-borne plant pathogenic fungus
• Causes a wide range of commercially significant plant diseases
• Estimated genome size ~50-55 Mb
  o heterokaryotic (≥ 2 distinct nuclear genomes)
  o 10% repetitive sequences
  o duplicated genomic regions
• Draft genome sequences available from different subgroups
• High level of sequence differences between different subgroups (~21% shared core genes)
• Generate draft genome assembly of Rhizoctonia solani
  o MinION MK1 sequencer with MAP006 chemistry and FLO-MAP103 flow cells
Genome sequencing
Extraction of ultra-pure (u)HMW DNA
• DNA quality and integrity are essential for obtaining high-quality long reads
• Extraction of ultra-pure HMW and uHMW from plants has unique challenges that require specific expertise to deal with carbohydrates, phenolics, and other compounds abundant in plant tissues
• KeyGene has developed protocols for extraction, purification, analysis, and quantitation of DNA from a variety of (difficult) plant and pathogen species.
R. solani sequencing
Extraction and sizing of fungal HMW DNA
Sample    Nanodrop [ng/uL]  260/280  260/230  Qubit BR [ng/uL]  TapeStation [ng/uL]
Crude     1,372             1.87     0.92     53                -
Purified  210               2.22     2.18     190               162
Sized     -                 -        -        265               257
Slide annotation: ~45%
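The absorbance ratios and the Nanodrop versus Qubit readings on this slide act as a quick purity check. The sketch below flags samples with commonly quoted rules of thumb (A260/280 of about 1.8 or higher and A260/230 of about 2.0 or higher for clean DNA). These cutoffs and the function name are assumptions for illustration, not criteria stated in the talk.

```python
# Values for the crude and purified samples, copied from the slide.
samples = {
    "Crude":    {"nanodrop": 1372.0, "a260_280": 1.87, "a260_230": 0.92, "qubit": 53.0},
    "Purified": {"nanodrop": 210.0,  "a260_280": 2.22, "a260_230": 2.18, "qubit": 190.0},
}

# Rule-of-thumb cutoffs (assumptions, not from the slides).
MIN_260_280 = 1.8
MIN_260_230 = 2.0


def purity_report(sample):
    """Return pass/fail flags and the fraction of the Nanodrop reading confirmed by Qubit."""
    return {
        "ok_260_280": sample["a260_280"] >= MIN_260_280,
        "ok_260_230": sample["a260_230"] >= MIN_260_230,
        "qubit_over_nanodrop": round(sample["qubit"] / sample["nanodrop"], 2),
    }


for name, sample in samples.items():
    print(name, purity_report(sample))
```

For the crude sample, the low 260/230 ratio and the large gap between the Nanodrop and Qubit concentrations both point to contaminants that absorb UV light, which fits the slide's emphasis on purification before library preparation.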
R. solani sequencing
Library preparation
MAP006 work flow
[Figure: DNA recovery (%) through the library preparation steps for Lib002, Lib003 and Lib004. Fragment size labels on the slide: ~12.5 K (9K hydropore S), ~18.8 K (10K hydropore L), >60 K]
R. solani sequencing
Library and read size distribution
[Figure: size distributions for Lib002, Lib003 and Lib004. Panels: library size; read length (MinKnow); 2D read length (Metrichor); 2D pass read length (sequence length 2D). Size labels on the slide: 8.5 K, 19 K, 17.9 K, 11.3 K, 23 K, 21.3 K, 34 K, 56.6 K, 15.3 K]
R. solani sequencing
2D pass read summary
Remarks                            # 2D pass reads  Total length (Mb)  Max read length (Kb)  Median 2D quality score
53.5 ng (6 uL), air bubble         2,900            26                 15.8                  9.4
53.5 ng (6 uL), heat sink ~40°C    4,204            36                 29.0                  8.8
89.2 ng (10 uL)                    25,346           223                25.9                  10.0
37.8 ng (6 uL)                     13,068           152                34.2                  8.9
125.2 ng (20 uL)                   23,806           269                43.7                  8.6
7.8 ng (6 uL)                      3,414            53                 61.4                  9.5
28.6 ng (22 uL)                    5,931            89                 80.4                  9.4
[Figure: cumulative length distribution by library; % reads against read length (bases); size labels 17.9 K, 21.3 K, 56.6 K]
Genome assembly
Miniasm and Canu assembly summary
• ~54 Mb draft genome sequence with Canu consisting of 679 contigs with an N50 value of ~170 K and a maximum contig length of more than 2 Mb
• Longer reads produce more contiguous assemblies
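For readers unfamiliar with the N50 statistic quoted above, a minimal sketch of the usual definition follows: the contig length L such that contigs of length L or longer contain at least half of the assembled bases. The contig lengths are toy values, not the KeyGene assembly.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Toy contig lengths (not from the KeyGene assembly): total 300, half 150.
print(n50([100, 80, 60, 40, 20]))  # 80
```

Because N50 depends only on the length distribution, it measures contiguity and says nothing about correctness, which is why the deck also aligns assemblies against references.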
Genome assembly
Comparison between genome assemblies
Reference                       Platform    Sequence yield (Mb)  Sum contigs (Mb)  # scaffolds  Scaffold N50 (Kb)  # contigs  Contig N50 (Kb)
Zheng et al. Nat Commun 2013    GAII        5,604                36.9              2,648        ~475               6,452      ~20.3
Cubeta et al. Genome Ann 2014   Sanger/FLX  -                    51.7              326          ~7,444             6,040      ~25.9
Hane et al. PLOS Gen 2014       HiSeq       -                    39.8              857          ~161               7,606      ~7.2
Wibberg et al. J Biotech 2015   FLX/MiSeq   2,200/2,000          42.8              879          -                  3,793      ~35.1
Wibberg et al. BMC Gen 2016     MiSeq       2,800                52                2,065        ~81.2              5,826      ~15.2
KeyGene Canu                    MK1         848.6                54.1              -            -                  679        ~170
• With only 5 flow cells, about 15X coverage
• T.b.d.: detailed read coverage analysis to determine the level of genome duplication and the
estimated heterokaryotic genome size
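The coverage estimate can be reproduced from numbers given elsewhere in the deck. The sketch below assumes average depth is simply total 2D pass read length divided by the size of the Canu assembly; it uses the rounded per-run totals from the 2D pass read summary, and the function name is illustrative.

```python
# Total 2D pass read length per run in Mb, from the 2D pass read summary.
run_totals_mb = [26, 36, 223, 152, 269, 53, 89]
assembly_size_mb = 54.1  # sum of contigs of the KeyGene Canu assembly


def fold_coverage(total_bases_mb, genome_size_mb):
    """Average depth = sequenced bases / genome size (same units)."""
    return total_bases_mb / genome_size_mb


total_mb = sum(run_totals_mb)
print(total_mb)                                             # 848
print(round(fold_coverage(total_mb, assembly_size_mb), 1))  # 15.7
```

This is a genome-wide average; the read coverage analysis listed as to be done would be needed to see how depth varies across duplicated regions.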
Genome alignment
Comparison between two public assemblies
• Alignment of public assemblies (MUMmer)
• Comparative genome analysis reveals considerable genetic differences between different isolates (i.e. genome size, gene number and composition)
• The R. solani draft genomes show sequence similarity, but with an overall low level of co-linearity
[Figure: alignment plot of the Zheng et al. 2013 assembly (bases) against the Cubeta et al. 2014 assembly (bases)]
Genome alignment
KeyGene assembly vs. Cubeta et al. 2014
• Considerable sequence diversity exists between the KeyGene strain and public Rhizoctonia strains
[Figure: alignment plot of the KeyGene Canu assembly (bases) against the Cubeta et al. 2014 assembly (bases)]
Conclusions
• Plant BAC DNA sequencing
  o De novo assembly for two BAC clones with 10-15 fold read depth
  o High quality assemblies using a low number of 2D pass reads
• Rhizoctonia solani genome sequencing
  o Large number of high-quality 2D pass reads in 24 hour runs
  o Direct sequencing of HMW DNA positively affects read length
  o Generated a ~54 Mb draft genome sequence with an estimated read depth of 10x
  o Low level of co-linearity between nanopore assembly and published draft genomes
• Sequencing of complex plant genomes: big data … big deal!
What’s next ….?
• Rhizoctonia solani genome sequencing
  o Improving the synthesis of long fragment libraries (yield, size)
  o Sequencing additional flow cells
  o Testing more tools and parameters
• Plant genome sequencing
  o KeyGene joined PromethION Early Access Programme (PEAP)
  o Draft genome sequence of a melon variety using the PromethION
• Meet us at …
Acknowledgements
Erwin Datema
Alex Boshoven
Koen Cuelenaere
Lisanne Blommers
Alexander Wittenberg
Nathalie van Orsouw
Michiel van Eijk
The KeyBase®, KeyPoint® Mutation Breeding, WGP™, Sequenced Based Genotyping and KeyGene® SNPSelect technologies are protected by patents and/or patent applications owned by Keygene N.V. KeyGene, KeyBase, KeyPoint and KeySeeQ are registered trademarks of Keygene N.V. in one or
more territories in the world. All other products names, brand names or company names are used for identification purposes only, and may be (registered) trademarks of their respective owners.