Signal Generation and Imaging
[Figure: PicoTiterPlate wells (1600K field of addressable wells) under reagent flow; sequencing by synthesis generates photons, which are detected by the camera.]
• Spectral Instruments Series 800 Camera with Fairchild Imaging LM485 CCD (4096x4096,
15 μm pixels), directly bonded to a 1:1 imaging fiber bundle; cooled to -25 °C
• PTP in direct contact with imaging fiber bundle (no alignment or focusing issues); NA ~ 0.75
• Full-frame imaging mode; read-out during wash (dark portion of flow cycle)
Proprietary
Image Processing
[Figure: the raw data is a series of images, one per nucleotide flow (T, C, G, A, T, ...); the data are converted into "flowgrams".]
Signal-Processing & Basecalling
Image Data → Signal processing Pipeline → Flowgrams → Quality Filtering
→ HQ Reads → Basecalling → HQ Bases → Mapping & Assembly
[Figure: example flowgram for the sequence TTCTGCGAA; signal levels are labelled 1-mer to 4-mer; flow order T, A, C, G.]
Key sequence = TCAG for well identification and calibration
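The step from flowgram to bases can be sketched as follows. This is a minimal illustration, not the proprietary basecaller: it assumes the flow order T, A, C, G shown in the figure, assumes signals are already normalized so that a 1-mer reads about 1.0, and simply rounds each flow signal to the nearest integer homopolymer length. The function name `call_bases` and the noisy signal values are hypothetical.

```python
FLOW_ORDER = "TACG"  # flow order taken from the slide


def call_bases(flowgram, flow_order=FLOW_ORDER):
    """Naive basecall: round each flow signal to a homopolymer length.

    Assumes signals are normalized so that one incorporated base ~ 1.0.
    """
    bases = []
    for k, signal in enumerate(flowgram):
        # Nearest non-negative integer; halves round up.
        n = max(0, int(signal + 0.5))
        bases.append(flow_order[k % len(flow_order)] * n)
    return "".join(bases)


# Hypothetical noisy flowgram for the slide's example sequence TTCTGCGAA.
noisy = [2.1, 0.1, 0.9, 0.05, 1.1, 0.0, 0.2, 0.95,
         0.1, 0.0, 1.0, 1.2, 0.1, 1.8]
print(call_bases(noisy))
```

A real pipeline would also have to handle signal droop and phasing, which this sketch ignores.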
Mapping in Flowgram Space
Reference Chromosome Flowgram: …, 1, 3, 1, 0, 0, 2, 2, 0, 0, 1, 2, 3, 0, 1, 0, …
Fragment Flowgrams (RN): …, 1.00, 3.14, 0.15, 0.20, 0.21, 1.84, 1.95, …
Re-sequencing: find variants relative to the reference genome
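One way to picture mapping in flowgram space is sketched below. The slide does not describe the actual mapping algorithm, so everything here is an assumption: the reference is converted to an ideal integer flowgram starting from each candidate position, and the measured fragment flowgram is placed where the sum of squared differences is smallest. The names `ideal_flows` and `map_fragment`, the flow order default, and the toy reference and signal values are all hypothetical.

```python
def ideal_flows(seq, start, n_flows, flow_order="TACG"):
    """Ideal integer flowgram of seq[start:], limited to n_flows flows."""
    flows = []
    i = start
    for k in range(n_flows):
        base = flow_order[k % len(flow_order)]
        run = 0
        # Count the homopolymer consumed by this flow (0 if no match).
        while i < len(seq) and seq[i] == base:
            run += 1
            i += 1
        flows.append(run)
    return flows


def map_fragment(reference, fragment_flows, flow_order="TACG"):
    """Return (position, distance) of the best-matching start position."""
    best_pos, best_dist = None, float("inf")
    for pos in range(len(reference)):
        ideal = ideal_flows(reference, pos, len(fragment_flows), flow_order)
        dist = sum((m - e) ** 2 for m, e in zip(fragment_flows, ideal))
        if dist < best_dist:
            best_pos, best_dist = pos, dist
    return best_pos, best_dist


# Toy data: the fragment covers reference[5:14] ("CAGGCTTAA").
reference = "GATTACAGGCTTAACCGT"
measured = [0.1, 0.0, 1.1, 0.2, 0.0, 0.9, 0.1, 1.8,
            0.0, 0.1, 1.0, 0.1, 2.2, 1.9]
print(map_fragment(reference, measured))
```

In this picture, a flow whose measured signal stays far from the reference value at the best position would be a candidate variant. The exhaustive scan is only for clarity; a genome-scale mapper would need an index.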
De Novo Assembly in Flowgram Space
Overlap to form contigs
Fragment Flowgrams (RN)
Draft sequences of new genomes (species that
have not been sequenced before)
Potential Sources of Error
Optical Cross-Talk
• Light penetrates to adjacent fibers; leads to signal contamination
[Figure: a DNA well (true signal) next to a non-DNA well (false signal).]
Chemical Cross-Talk
• PPi / ATP produced in a DNA well appears in downstream non-DNA wells due to
convection / diffusion; leads to signal contamination
[Figure: flow carries PPi from a DNA well (true signal) to a downstream non-DNA well (false signal).]
Signal-Processing Solution
[Figure: 2D intensity contours (pixel vs. pixel), original and after filtering with a (51x51) kernel.]
Filtering removes signal cross-contamination
Typical GS20 Run Results
42 Flow Cycles
~ 30 Mb per run
~ 300,000 reads
~ 100 bp per read
[Figure: histogram of # of reads (0 to 15000); x-axis ticks run from 41 to 131.]
[Figure: % of reads (0% to 60%) vs. # of errors per read (0 to 10+).]
> 50% wells error-free
~ 1% individual read error
Consensus Accuracy > 99.99+% with 10x Over-Sampling
Example
Ref / assembled genome:
2
-GTGCGCGCGCGGGACTAATCCCGGTTCGCGCGTCGGGCATGACACGCAAC-
10 reads aligned to this position
Read #:     FlowSignal
Read 1:     2.52
Read 2:     1.95
Read 3:     2.11
Read 4:     1.53
Read 5:     1.32
Read 6:     2.14
Read 7:     2.06
Read 8:     1.85
Read 9:     2.21
Read 10:    2.17
Consensus (mean): 1.99
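The arithmetic behind this example can be checked with a short sketch. It uses the ten flow signals listed above and assumes that a call is made by rounding a signal to the nearest integer, which is a simplification and not necessarily what the production basecaller does. The helper name `call_homopolymer` is hypothetical.

```python
# Flow signals of the 10 aligned reads, as listed on the slide.
flow_signals = [2.52, 1.95, 2.11, 1.53, 1.32, 2.14, 2.06, 1.85, 2.21, 2.17]


def call_homopolymer(signal):
    """Round a flow signal to the nearest non-negative integer length."""
    return max(0, int(signal + 0.5))


# Calling each read on its own: two of the ten reads disagree with the rest.
individual_calls = [call_homopolymer(s) for s in flow_signals]

# Averaging in flowgram space first, then calling once.
consensus_signal = sum(flow_signals) / len(flow_signals)
consensus_call = call_homopolymer(consensus_signal)

print(individual_calls)            # per-read calls
print(round(consensus_signal, 2))  # 1.99, matching the slide
print(consensus_call)
```

Under this rounding rule, Read 1 (2.52) would be called as a 3-mer and Read 5 (1.32) as a 1-mer, while the averaged signal calls a 2-mer. That is the point of over-sampling: averaging before calling suppresses single-read noise.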
De Novo Assembly Results
4 runs of GS20 (E. coli, 4,639,675 bp)
Each data point represents 1/2 GS20 run
[Figure: # of contigs (left axis, 0 to 2000) and % coverage (right axis, 30.00% to 100.00%) vs. oversampling; x-axis ticks at 4.5, 9, 13, 16.5, 20, 23, 26.5 and 30, spanning 1 to 4 runs.]
Genome Sequencer FLX
Next-generation sequencing platform, recently launched (Q1 2007) in collaboration with Roche Diagnostics
Values in parentheses are GS20 performance:
At least 400K (200K) reads of avg. 250 bp (100 bp)
8 hours (5 hours) run time
Single-read avg. accuracy > 99.5% (99%) over 200 (100) bases
Consensus read accuracy > 99.99%
Avg. yield from a single run ~ 100 Mb (20-30 Mb)
Improved fluidics for faster reagent delivery
Firmware control of reagent delivery & camera timing
On-board reagent dilution
Optimized biochemistry
Improved algorithms with corrections for
– Crosstalk (for higher densities)
– Signal droop & Phasing
Numerical filter for improved rejection of low quality reads
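The headline yield figures on this slide follow from read count times average read length, as the short check below shows. The fold-coverage figure is a nominal estimate only: it assumes every base is usable and sampling is uniform, which real runs do not achieve. The function names are hypothetical; the E. coli genome size is the one quoted elsewhere in this deck.

```python
E_COLI_GENOME_BP = 4_639_675  # genome size quoted in this deck


def nominal_yield_bp(reads, avg_read_length_bp):
    """Nominal run yield in bases: read count times average read length."""
    return reads * avg_read_length_bp


def fold_coverage(yield_bp, genome_size_bp):
    """Nominal oversampling: total bases divided by genome size."""
    return yield_bp / genome_size_bp


flx_yield = nominal_yield_bp(400_000, 250)   # GS FLX figures
gs20_yield = nominal_yield_bp(200_000, 100)  # GS20 figures (in parentheses)

print(flx_yield / 1e6, "Mb per GS FLX run")
print(gs20_yield / 1e6, "Mb per GS20 run")
print(round(fold_coverage(flx_yield, E_COLI_GENOME_BP), 1), "x on E. coli")
```

The results (100 Mb and 20 Mb) agree with the "~ 100 Mb (20-30 Mb)" yield line, since the read counts are stated as lower bounds.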
Fluidics Modification – Air Plug Insertion
[Figure: fluidics schematic showing the camera cover, the G / A / C reagent lines, an air plug in the tube, and the debubbler, where air bubbles are removed.]
[Figure: PTP inlet nucleotide concentration (non-dim, 0 to 1) vs. time (0 to 80 s) for a 21 s injection; panel and legend labels include "Standard", "Air Bubble Insertion" and "Pre-dilution with Reagent & w/o air bubble".]
Whole Genome Sequencing Results from GS FLX
E. coli (50% GC)
C. jejuni (35% GC)
T. thermophilus (71% GC)
Observed Individual Read Accuracy
[Figure: cumulative read error (0.0% to 4.0%) vs. base position (0 to 250). Series: E. coli runs #1 to #4 (run labels 09_29A, 09_29B, 09_14 + 09_18A, 09_18B + 09_25), T. thermophilus and C. jejuni. Comparison curves: "Reported in Nature 2005" and "GS20 Q2 2006".]
All blind-filtered reads (no reference genome required)
Newbler™ Assembly Results from GS FLX
                     C. jejuni    T. thermophilus    E. coli      E. coli (GS20)
Genome Size (bp):    1,641,481    2,127,575          4,639,675    4,639,675
Number of Runs:      ½            ½                  1            3
Assembly Contigs:    25           56                 105          140
Assembly Cover:      98.31%       98.15%             97.61%       97.46%
Overall Accuracy:    99.996%      99.991%            99.998%      99.998%
Avg. Contig Size:    64.6 kb      39.8 kb            43.3 kb      32.4 kb
N50 Contig Size:     116 kb       82.1 kb            105.5 kb     67.2 kb
Largest Contig:      481 kb       383.0 kb           204.7 kb     164 kb
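For readers unfamiliar with the N50 row, the sketch below computes it using the usual definition: the length of the contig at which the running total, taken from the largest contig down, first reaches half of the total assembled length. The contig lengths are made up for illustration and are not taken from any of the assemblies above.

```python
def n50(contig_lengths):
    """Contig length at which the cumulative sum (largest first) reaches
    at least half of the total assembled length."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in kb.
contigs_kb = [80, 70, 50, 40, 30, 20]
print("N50:", n50(contigs_kb), "kb")
print("Average:", round(sum(contigs_kb) / len(contigs_kb), 1), "kb")
```

N50 sits above the average contig size whenever a few long contigs carry most of the assembly, which is the pattern in every column of the table.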