Next-generation sequencing and PBRC

Obesity, Cell Biology, Neuroscience, Physiology, Nutrition, Development, Immunology, Diabetes

Initial Consultation: Experiment Design, Cost - Timeline
PI: Biology, Experimental Paradigm, Sample Collection, Sample Preparation
Genomics Core: Quality Control, Sample Analysis, AB SOLiD v4 Instrument Runs, Raw Data to Pipeline

Data Analysis Pipelines

Gene Expression: SAGE, RNA-Seq
Read Mapping and Annotation · Gene and Summary Statistics · Cluster Analyses · Visualization · Molecular Functions · Biological Pathways · Links to NCBI · Literature

Small RNA
Mapping · Known miRNA · Novel miRNA · Expression · Sequence & Location · Links to miRBase · Known gene interactions

Epigenome: Methylation, ChIP Seq
Mapping · Peak calling · Statistics · Visualization · Comparative Epigenomics

Resequencing
Mapping · SNP calling · Small indel · Large indel · CNV · Inversions

Next Generation Sequencer Applications
• De novo sequencing
• Resequencing, comparative genomics
• Global SNP analysis
• Gene expression analysis
• Methylation studies
• ChIP sequencing: transcription factors, histones, polymerases
• Transcriptome analysis: splicing, UTRs, cSNPs, nested transcripts
• MicroRNA discovery and quantitation
• Metagenomics, microbial diversity
• Copy number variation
• Chromosomal aberrations
• Gene regulation studies

AB SOLiD Ligation sequencing

[Figure: observed color sequence, with reads color_1 through color_5 over base positions 0-35; base 0 = A]

Color dinucleotide space
R (red): AT, CG, GC, TA
G (green): AC, CA, GT, TG
B (blue): AA, CC, GG, TT
Y (yellow): GA, TC, AG, CT

[Figure: color to DNA sequence]

How many sequence tags* do I need for my gene expression application?
• SAGE/CAGE – 2-5 million mappable
• miRNA – 10 million mappable
• ChIP Seq – 10-20 million mappable
• Whole Transcriptome from polyA RNA – 40-50 million mappable
• Whole Transcriptome from rRNA depleted – >50 million mappable
• Whole Transcriptome for Allele Specific Expression – >>50 million mappable

SOLiD™ 4 generates >1.4 billion mappable sequences/run (2 slides).
Libraries can be multiplexed to decrease the cost per sample, according to the application and the number of sequences needed.

*For human/mouse-sized genomes; smaller organisms require fewer sequence tags.

SAGE Sequencing vs. Microarray

                           | SOLiD v4 | Microarray-Illumina Ref 8 | Microarray-Illumina Ref 6
                           | 3.6 million | 25,600 | 45,200
                           | Known and novel transcripts | Known transcripts | Known transcripts
Sensitivity                | 6 logs | 3 logs | 3 logs
Technical Reproducibility  | >0.99-0.999 | 0.9 | 0.9
Correlation to TaqMan      | 0.9 | 0.7-0.8 | 0.7-0.8
Multiplexing/Barcoding     | Yes, up to 48 RNA or 96 DNA samples | No | No
Data Points                | No background: better for low-abundance transcript detection | Hybridization process creates background signal | Hybridization process creates background signal
RNA quantity               | 5-10 µg | 750 ng | 750 ng
16 Sample Experiment Cost  | $7200 (full service); $6100 (PI creates library) | $3600 | $5200

Bioinformatics: Geospiza

Run Quality
Primary Data Analysis – Images to bases
Secondary Data Analysis – Bases to alignments/contigs

Applications
• Tag Profiling
• Small RNA Analysis
• Transcriptome seq.
• ChIP-Seq
• Methylation Analysis
• Resequencing
• De novo assembly

One or more Data sets
Discovery
Ref Seq + Alignment
Assembly, De Novo
Sample/Library Quality
Instrument-specific
Sequences + Quality values

Algorithms
• Eland
• Maq
• SOAP
• Velvet
• Newbler
• Mapreads
• Others …

Tertiary Data Analysis – Experiment Specific
• Differential expression
• Methylation sites
• Binding sites
• Gene association
• Genomic structure

Next-gen sequencing: applications
– Genome analysis: basic and translational research
  • Genetics of disease – new frontiers
  • Exome resequencing: confirmation of GWAS
  • Genome sequence as diagnostic tool
  • Genetic counseling
– Epigenome analysis: basic research; biomarkers
  • Analyses of DNA methylation, transcription factors, histone modifications, non-coding RNA
  • Epigenomic biomarkers of disease
– Gene expression analysis: basic research; diagnostics & biomarkers
  • Whole transcriptome: all transcribed sequences in a cell
  • SAGE analysis: expression of known genes
  • Small RNA: microRNA as regulators of biology
– Genotype to phenotype: a new frontier
  • Pathology: systems biology
  • Diagnosis: data filtering
  • Personalized Genomic Medicine: treatment recommendations

Next-gen sequencing: challenges
– Rapid growth in methodology
  • Technology and equipment changes & upgrades
– High demands on informatics:
  • Staff
  • Software
  • Computational resources
– New ways of handling data needed:
  • Interpretation
  • Publication
  • Storage
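
Worked example: color space to DNA sequence

The AB SOLiD ligation sequencing slide shows that each color stands for a pair of adjacent bases, so a color read can only be turned into bases once the first base is known. The sketch below illustrates that idea under stated assumptions: the numeric codes (0 = blue, 1 = green, 2 = yellow, 3 = red) and the function names `encode` and `decode` are illustrative choices made here, not taken from the slides or from any vendor software. It is a minimal illustration, not a production decoder.

```python
# Assumed 2-bit codes for the bases; with these codes the color of a
# dinucleotide is the XOR of its two bases, which reproduces the
# color dinucleotide table (e.g. AA/CC/GG/TT -> blue, AT/TA/CG/GC -> red).
BASES = "ACGT"
COLOR_NAMES = {0: "blue", 1: "green", 2: "yellow", 3: "red"}


def encode(seq):
    """Convert a DNA string into a list of color codes (one per adjacent pair)."""
    codes = [BASES.index(base) for base in seq]
    return [left ^ right for left, right in zip(codes, codes[1:])]


def decode(first_base, colors):
    """Rebuild the DNA string from a known first base and a list of color codes."""
    current = BASES.index(first_base)
    out = [first_base]
    for color in colors:
        current ^= color  # the next base is determined by the previous base and the color
        out.append(BASES[current])
    return "".join(out)


if __name__ == "__main__":
    colors = encode("ATGGCA")
    print([COLOR_NAMES[c] for c in colors])
    print(decode("A", colors))
```

Because every base depends on the one before it, a single miscalled color changes all decoded bases downstream of it, whereas a real single-base change in the sample alters two adjacent colors. That is why color-space reads are usually compared against a reference in color space rather than decoded first.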
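
Worked example: how many libraries fit in one run?

The sequence-tag slide gives a per-application read requirement and a run yield of more than 1.4 billion mappable sequences, and the comparison table lists barcoding for up to 48 RNA or 96 DNA samples. A rough planning calculation follows from those numbers. The sketch below makes several assumptions that are not in the slides: reads are split evenly across barcoded libraries, the run yields exactly the 1.4 billion lower bound, and the barcode limit applies to a whole run. The name `max_samples_per_run` is hypothetical.

```python
# Lower bound on mappable sequences per SOLiD 4 run (2 slides), from the slide.
RUN_MAPPABLE_READS = 1_400_000_000


def max_samples_per_run(reads_per_sample, barcode_limit, run_reads=RUN_MAPPABLE_READS):
    """Estimate how many libraries can share one run.

    reads_per_sample: mappable reads wanted per library for the application.
    barcode_limit:    number of distinct barcodes available (assumed per run).
    """
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    by_depth = run_reads // reads_per_sample  # limit set by sequencing depth
    return min(by_depth, barcode_limit)       # limit set by available barcodes


if __name__ == "__main__":
    # Upper end of each range on the slide, with the RNA/DNA barcode limits.
    print("SAGE/CAGE:", max_samples_per_run(5_000_000, 48))
    print("polyA transcriptome:", max_samples_per_run(50_000_000, 48))
    print("ChIP-Seq:", max_samples_per_run(20_000_000, 96))
```

Under these assumptions, shallow applications such as SAGE are limited by the number of barcodes, while whole-transcriptome libraries are limited by read depth.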