Signal Generation and Imaging

[Figure labels: PicoTiterPlate Wells; Photons Detected by Camera; 1600K field of addressable wells; Reagent Flow; Sequencing By Synthesis; Photons Generated]

• Spectral Instruments Series 800 Camera with Fairchild Imaging LM485 CCD (4096x4096, 15 μm pixels), directly bonded to a 1:1 imaging fiber bundle; cooled to -25 °C
• PTP in direct contact with imaging fiber bundle (no alignment or focusing issues); NA ~ 0.75
• Full-frame imaging mode; read-out during wash (dark portion of flow cycle)

Proprietary

Image Processing

[Figure: raw data as a series of images, one per flow (T, C, G, A, T); data converted into “flowgrams”]

Signal-Processing & Basecalling

Image Data → Signal processing Pipeline → Flowgrams → Quality Filtering → HQ Reads → Basecalling → HQ Bases → Mapping & Assembly

[Figure: example flowgram; flow order T A C G; signal levels 1-mer, 2-mer, 3-mer, 4-mer; example sequence TTCTGCGAA]

Key sequence = TCAG for well identification and calibration

Mapping in Flowgram Space

Reference Chromosome Flowgram: …,1, 3, 1, 0, 0, 2, 2, 0, 0, 1, 2, 3, 0, 1, 0…
Fragment Flowgrams (RN): …, 1.00, 3.14, 0.15, 0.20, 0.21, 1.84, 1.95,…

Re-sequencing and find variants to the reference genome

De Novo Assembly in Flowgram Space

[Figure: Fragment Flowgrams (RN); overlap to form contigs]

Draft sequences of new genomes (species that have not been sequenced before)

Potential Sources of Error

Optical Cross-Talk
• Light penetrates to adjacent fibers; leads to signal contamination
[Figure: DNA well (true signal) next to non-DNA well (false signal)]

Chemical Cross-Talk
• PPi / ATP produced in DNA well appears in down-stream non-DNA wells due to convection / diffusion; leads to signal contamination
[Figure: flow carries PPi from DNA well (true signal) to non-DNA well (false signal)]

Signal-Processing Solution

[Figure: 2D intensity contours (axes: Pixel, Pixel); panels: Original; Filtered (51x51) kernel]

Filtering removes signal cross-contamination

Typical GS20 Run Results

• 42 Flow Cycles
• ~ 30MB per run
• ~ 300,000 reads
• ~ 100 bp per read
• > 50% wells error-free
• ~ 1 % individual read error

[Figure: histogram of # of reads (y-axis 0 to 15000; x-axis ticks 41 to 131)]
[Figure: histogram of % of reads (y-axis 0% to 60%) vs. # of errors per read (0 to 10+*)]

Consensus Accuracy > 99.99+% when 10x over-sampling

Example
Ref / assembled genome: 2 -GTGCGCGCGCGGGACTAATCCCGGTTCGCGCGTCGGGCATGACACGCAAC-
10 reads aligned to this position

Read #             FlowSignal
Read 1:            2.52
Read 2:            1.95
Read 3:            2.11
Read 4:            1.53
Read 5:            1.32
Read 6:            2.14
Read 7:            2.06
Read 8:            1.85
Read 9:            2.21
Read 10:           2.17
Consensus (mean):  1.99

De Novo Assembly Results

4 runs of GS20 (E. coli 4,639,675 bp)
Each data point represents 1/2 GS20 run

[Figure: # of contigs (left axis, 0 to 2000) and % coverage (right axis, 30.00% to 100.00%) vs. oversampling (ticks 4.5, 9, 13, 16.5, 20, 23, 26.5, 30; marked 1 run, 2 runs, 3 runs, 4 runs)]

Genome Sequencer FLX

Recently Launched Next Generation Sequencing Platform in Q1, 2007 in collaboration with Roche Diagnostics

(values in parentheses: GS20 performance)
• At least 400K (200K) reads of avg. 250 bp (100 bp)
• 8 hours (5 hours) run time
• Single read avg accuracy >99.5% (99%) over 200 (100) bases
• Consensus read accuracy > 99.99%
• Avg yield from single run ~ 100 Mb (20-30 Mb)

• Improved fluidics for faster reagent delivery
• Firmware control of reagent delivery & camera timing
• On-board reagent dilution
• Optimized biochemistry
• Improved algorithms with corrections for
  – Crosstalk (for higher densities)
  – Signal droop & Phasing
• Numerical filter for improved rejection of low quality reads

Fluidics Modification – Air Plug Insertion

[Figure: fluidics schematic (camera cover, PTP, debubbler, air plug in tube; air bubbles removed at debubbler) and concentration-profile plots of PTP inlet nucleotide concentration (non-dim) vs. time (s) for a 21 s injection; modes: Standard, Air Bubble Insertion; pre-dilution with reagent]

Whole Genome Sequencing Results from GS FLX

E. coli (50% GC)
C. jejuni (35% GC)
T. thermophilus (71% GC)

Observed Individual Read Accuracy

[Figure: cumulative read error (0.0% to 4.0%) vs. base position (0 to 250). Series: E. coli runs #1 to #4, T. thermophilus, C. jejuni; run labels 09_29A, 09_29B, 09_14 + 09_18A, 09_18B+09_25; reference curves: Reported in Nature 2005, GS20 Q2 2006]

All blind-filtered reads (no reference genome required)

Newbler™ Assembly Results from GS FLX

                    C. jejuni    T. thermophilus    E. coli      E. coli (GS20)
Genome Size:        1,641,481    2,127,575          4,639,675    4,639,675
Number of Runs:     ½            ½                  1            3
Assembly Contigs:   25           56                 105          140
Assembly Cover:     98.31%       98.15%             97.61%       97.46%
Overall Accuracy:   99.996%      99.991%            99.998%      99.998%
Avg. Contig Size:   64.6 kb      39.8 kb            43.3 kb      32.4 kb
N50 Contig Size:    116 kb       82.1 kb            105.5 kb     67.2 kb
Largest Contig:     481 kb       383.0 kb           204.7 kb     164 kb
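
Worked example (illustrative only): the flowgram idea from the basecalling slide and the 10-read consensus example can be sketched in a few lines of Python. This is a minimal sketch, not the pipeline described in the slides. It assumes the flow order T A C G shown on the basecalling slide and plain nearest-integer rounding of each flow signal. It ignores the TCAG key calibration, cross-talk filtering, and the droop and phasing corrections. The function names (encode_sequence, decode_flowgram, consensus_signal) are hypothetical and chosen only for this illustration. The ten flow signals are copied from the consensus slide.

```python
from statistics import mean

# Assumption: flow order as read from the basecalling slide (T, A, C, G).
FLOW_ORDER = "TACG"

# Flow signals of the 10 reads aligned to one position (consensus slide).
READ_SIGNALS = [2.52, 1.95, 2.11, 1.53, 1.32, 2.14, 2.06, 1.85, 2.21, 2.17]


def encode_sequence(seq, flow_order=FLOW_ORDER):
    """Turn a base sequence into an ideal (integer) flowgram.

    Each flow reports how many consecutive copies of its nucleotide
    are incorporated; 0 means the nucleotide did not match.
    """
    if any(base not in flow_order for base in seq):
        raise ValueError("sequence contains a base outside the flow order")
    flows, pos, flow_index = [], 0, 0
    while pos < len(seq):
        nucleotide = flow_order[flow_index % len(flow_order)]
        run = 0
        while pos < len(seq) and seq[pos] == nucleotide:
            run += 1
            pos += 1
        flows.append(run)
        flow_index += 1
    return flows


def decode_flowgram(signals, flow_order=FLOW_ORDER):
    """Round each flow signal to a homopolymer length and emit bases."""
    bases = []
    for i, signal in enumerate(signals):
        length = max(0, int(signal + 0.5))  # nearest integer, never negative
        bases.append(flow_order[i % len(flow_order)] * length)
    return "".join(bases)


def consensus_signal(signals):
    """Average the flow signals of all reads covering one position."""
    return mean(signals)


if __name__ == "__main__":
    ideal = encode_sequence("TTCTGCGAA")
    print(ideal)
    print(decode_flowgram(ideal))
    consensus = consensus_signal(READ_SIGNALS)
    print(round(consensus, 2), "->", int(consensus + 0.5), "bases")
```

Called alone, reads 4 (1.53) and 5 (1.32) sit close to or below the 1.5 rounding threshold, yet the mean over all ten reads is 1.99 and rounds cleanly to a 2-mer. That is the point of the consensus slide: averaging in flowgram space before rounding recovers calls that individual reads get wrong.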