The GS FLX Sequencer. What is it and what can we do with it?
Jo Stanton
Anatomy and Structural Biology, University of Otago

• The Equipment
• The Technology
• What can you do with the GS FLX?

PicoTiterPlate device
• Multiple optical fibers are fused to form an optical array.
• Proprietary etching method produces wells that serve as picoliter reaction vessels.
• Each well is only able to accept a single DNA bead.
• Reactions in the wells are measured by the CCD camera.

[Workflow figure: DNA Library Preparation and Titration (4.5 h and 10.5 h), emPCR, Sequencing; further time labels: 8 h, 4.5 h]

Well diameter: average of 44 µm
200,000 reads obtained in parallel
A single clonally amplified sstDNA bead is deposited per well.

DNA Library Preparation and Titration
emPCR
• DNA capture bead containing millions of copies of a single clonal fragment
• Amplified sstDNA library beads
Sequencing
• 4 bases (TACG) cycled 42 times
• Chemiluminescent signal generation
• Signal processing to determine base sequence and quality score
• Quality filtered bases

1. Raw data is processed from a series of individual images.
2. Each well’s data is extracted, quantified, and normalized.
3. Read data is converted into flowgrams.
Metric and image viewing software

Signal output from a single well (flowgram)
Signal strength is determined by homopolymer length.
[Flowgram figure: signal levels labelled 1-mer to 4-mer; flow order T A C G]

Sequence Specifications
• Generate over 100 million bases per 7.5 hour run.
• Achieve longer reads, averaging 200-300 bases.
• Attain higher throughput with over 400,000 reads per run.
• Generate single-read accuracy that is greater than 99.5% over 200+ base pair reads.
• Benefit from consensus accuracy that is greater than 99.99%.
*from Roche Genome Sequencer FLX System brochure.

Some Applications

Genome Sequencing
• Mammoth found in the permafrost near Lake Taimyr, Russia (27,740±220 years before present).
• 0.73 µg DNA extracted from 1 g of mammoth edentulous mandible.
• Library made and sequenced using the GS20 system.
• Results: 302,692 reads, average length 95 bp; 28,000,000 bp of sequence in total.
• Sequences aligned to African elephant, human and dog genomes.
• Mammoth was female.

• Two Neanderthal genome projects, using the same individual: a 38,000 year old male bone from Croatia. Genome size 3 billion base pairs.
• 454 sequencing has yielded 1 million base pairs to date. Sanger/pyrosequencing yielded 65,000 base pairs.
• Humans and Neanderthals diverged roughly 600,000 years ago. Arose from a population of 3000 individuals.
• <0.5% difference between humans and Neanderthals.
• Projects aim to finish genome sequencing in two years.
• The hunt is on for more Neanderthals to sequence!

• Speciation
• Comparative genomics
• Process of domestication
• Species relationships
• Population genetics
• Responses to climate change
• Cause of extinctions

Other Genome Projects
• Pine
• Barley
• Campylobacter jejuni
• Helicobacter pylori
• Vibrio cholerae
• Acinetobacter baumannii
• Mycobacterium tuberculosis
• Corynebacterium urealyticum
• Myxococcus xanthus

Environmental Sequencing
• Characterization of Environments
• Deep mine microbial environments
• Tanzanian soil
• Sea water

• Bulk DNA prep from gut of ob/ob and +/+ litter mates.
• Combination of Sanger sequencing and GS20 technology to produce EGTs (environmental gene tags).
• EGT breakdown: 94% bacterial, 3.6% eukaryotic (0.29% mouse & 0.36% fungal), 1.5% archaeal, 0.61% viral.
• Increase in ratio of Firmicutes vs Bacteroidetes in obese mice. Increase in Archaea in obese mice.
• Ob/ob microbiome is enriched for genes for initial breakdown of dietary polysaccharides. Metabolic pathways: starch/sucrose, galactose, butanoate.
• Transplantation of ob/ob or +/+ gut flora to germ free mice. Ob/ob microbiome recipients gained significantly more weight than +/+ recipients.

Genome Methylation
• Treat DNA sample with sodium bisulfite. This deaminates unmethylated cytosine to uracil.
Methylated cytosine is unchanged.
• Amplify regions of interest using PCR.
• Sequence and compare to untreated sample.
• Example shows methylation of the CpG island in the p16 promoter from two cell lines.

Mutation Detection/Population Genetics
• PCR amplify defined regions of genome from individuals.
• Sequence and compare.
• Ultra Deep Sequencing
• Example: EGFR gene mutations in a cancer patient.

Ultra-Deep Sequencing of EGFR from Lung Carcinoma Patients
Matthew Meyerson, Dana Farber Institute and 454 Life Sciences
A total of 11 PCR amplicons, ranging in size between 85 and 156 base pairs, were generated to cover exons 18-22 of EGFR. Each target region was individually amplified and quantified. Before emulsion-based PCR amplification and sequencing, all amplicons were pooled in equimolar ratios.
[Figure: amplicons and coverage across Exon 18 to Exon 22]

Sequencing By Synthesis
[Workflow figure: Biopsy, PCR, emPCR, Sequencing, Comparison of single sequences, Identification of mutations]
Instant cloning: separation of single molecules during emPCR
454 = Single Molecule Sequencing

• A lung adenocarcinoma sample previously shown by Sanger sequencing to contain mutation G719S was sequenced on the GS20 platform.
• Two mutations easily seen at 16-18% with 800-fold coverage.
• Only one is seen in the Sanger sequence.

Following 12.5 months of erlotinib treatment, patient 12.3 relapsed with a massive pleural effusion. Pathological examination of a fibrin clot from the pleural effusion fluid showed very low tumor content in the isolated sample.
• Del-4 mutation at 3% abundance (Exon 19)
• In addition, T790M mutation at 2% (Exon 20), shown in previous studies to confer resistance to tyrosine kinase inhibitors (TKIs)

Transcriptome analysis
• Isolate RNA from cell/tissues and convert to cDNA. Sequence.
• GIS-PET - 462,626 in 1 run.
1476 novel genes identified (PET = paired-end ditagging).
• SAGE > 800,000
• EST approach
• Example: Arabidopsis transcriptome

Arabidopsis transcriptome from 8 day old seedlings
• 541,852 ESTs
• 17,449 gene loci. Close to complete transcriptome coverage.
• Small, medium and long transcripts detected equally.
• No sequencing bias to either 3’ or 5’ ends of transcripts.
• ESTs not contaminated by genomic DNA; intron/exon boundaries clearly preserved.
• 16,698 ESTs not in dbEST.
• Many of these are the first evidence that predicted genes are transcribed.
• 60 previously unidentified loci shown as possible protein-coding sequences.

• Traditional sequencing relies on bacterial cloning. Therefore cloned products toxic to E. coli will not be detected. Not a problem for 454 technology.
• 454 chemistry is not hampered by traditionally difficult-to-clone sequence.
• Gene expression profiling possible using this approach. Digital Northerns and an open system.
• Ideal for non-model systems

Micro RNA Analysis

sequence.otago.ac.nz
• GS FLX Brochure
• Publications list
• First point of call for information regarding the Anatomy GS FLX facility.
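
Worked example: reading a flowgram
The flowgram slide earlier in the talk states that signal strength is determined by homopolymer length, with the four bases flowed in the order T, A, C, G. The sketch below illustrates that idea only. It assumes signals already normalized so that about 1.0 corresponds to one incorporated base, and it assumes simple rounding to the nearest whole number; the instrument's actual signal processing is not described in these slides. The function name call_bases and the example values are hypothetical.

```python
def call_bases(signals, flow_order="TACG"):
    """Convert a normalized flowgram into a base sequence.

    Assumption (not from the slides): a signal near 1.0 means one base was
    incorporated, near 2.0 means a 2-mer, and so on; near 0 means none.
    """
    bases = []
    for i, signal in enumerate(signals):
        # Flows repeat in the fixed order T, A, C, G.
        nucleotide = flow_order[i % len(flow_order)]
        # Round to the nearest whole homopolymer length, never below zero.
        homopolymer_length = max(0, int(signal + 0.5))
        bases.append(nucleotide * homopolymer_length)
    return "".join(bases)


# Hypothetical flowgram covering two cycles of the T, A, C, G flow order.
example = [1.02, 0.05, 2.10, 0.93, 0.04, 3.05, 0.97, 0.02]
print(call_bases(example))  # TCCGAAAC
```

With 4 bases cycled 42 times, one well would supply 168 such signals per run. Rounding becomes less reliable as homopolymers get longer, because neighbouring signal levels sit closer together in relative terms.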