CBI Tech. Workshop - NGS Special Session
Lesson 5: Genetic Variant Annotation
Linlin Yan (颜林林)
Center for Bioinformatics, Peking University
Jun 13, 2011

Outline
- Review & Overview
- Thoughts & Methods
  - Variant Browsing
  - Variant Annotation
  - Association Study
  - More Beyond
- Demos & Exercises

Part I: Review & Overview

Workshop Schedule
#   Title                             Speaker  Date
0   Warm-up and Introduction          GaoG     4-25
1   File Format & Reads Mapping       YanLL    5-9
2   Solexa Pipeline                   CaiT     5-16
3   Alignment File Manipulate         YeYX     5-23
4   Genetic Variant Caller            LiuH     5-30
5   Genetic Variant Annotation        YanLL    6-13
6   Genome Assembling                 LiZ      6-20
7                                     CaiT     6-27
8   Transcript Mapping                ZhaoHQ   7-4
9   Transcript Assembling             LiuXQ    7-11
10  Differential Expression Caller    ChenWB   7-18
11  ChIP-Seq Peak Caller              TangX    7-25
Topic groups (in order): Warm-up, Basic, Genetics, Transcriptome (RNA-Seq), ...

NGS Analysis Workflow
Sequencer -> Short Reads -> Mapping -> Alignments, then:
- Alignments -> Call Variants -> SNV / CNV / SV -> Annotation
- Alignments -> Calculate Expression -> Expression Profile
- Alignments -> Call Peaks -> Peaks / Regions
- Short Reads -> Assembling -> Contigs / Scaffolds

Genetic Variant Analysis Workflow
Sequencer -> Short Reads -> Mapping -> Alignments -> Call Variants -> SNV / CNV / SV -> Annotation
- Solexa Pipeline (Lesson 2)
- File Format (Lesson 1): FASTQ / Quality / SAM / ...
- Reads Mapping (Lesson 1): Maq / Bowtie / BWA
- Alignment File Manipulate (Lesson 3): Samtools / BEDTools / FASTX-Toolkit
- Genetic Variant Caller (Lesson 4): GATK
- Genetic Variant Annotation (Lesson 5): PolyPhen / SIFT / ANNOVAR / PLINK / ...

Part II: Thoughts & Methods

What Could Be Inferred from Variants
SNV / CNV / SV + Genome Annotation -> Genetic Variants -> Mutation Effects -> Disease Phenotype
- What is at the positions? => Genome Browser
- How do they affect function? => Variant Annotation
- What is related to the phenotype? => Association Study
- More beyond ... => Disease: CDCV vs. CDRV

Genome Browser
Online browsers:
- UCSC Genome Browser
  http://genome.ucsc.edu/
- Ensembl Genome Browser
  http://www.ensembl.org/
- DNAnexus
  https://dnanexus.com/genomes/hg18/public_browse
Local browsers:
- IGV (Integrative Genomics Viewer)
  http://www.broadinstitute.org/igv/

UCSC Genome Browser
[Screenshot: http://genome.ucsc.edu/cgi-bin/hgTracks?clade=mammal&org=Human&db=hg19]

UCSC Genome Browser (cont.)
Supported formats (http://genome.ucsc.edu/):
- BED / bigBed
- bedGraph
- GFF
- GTF
- WIG / bigWig
- MAF (Multiple Alignment Format)
- BAM
- BED detail
- Personal Genome SNP
- PSL

IGV (Integrative Genomics Viewer)
[Screenshot: http://www.broadinstitute.org/igv/]

UCSC: Table Browser & Public DB
- Retrieve track data in batch
- Retrieve sequences in specific regions
- Combine regions and/or annotations
- Query track data in the public MySQL database
(http://genome.ucsc.edu/cgi-bin/hgTables)

These are KNOWN variants. How about UNKNOWN variants?

Mutation Effects Prediction
- SIFT (Sorting Intolerant From Tolerant)
  http://sift.jcvi.org/
- PolyPhen (Polymorphism Phenotyping)
  http://genetics.bwh.harvard.edu/pph/
- MAPP (Multivariate Analysis of Protein Polymorphism)
  http://mendel.stanford.edu/SidowLab/downloads/MAPP/index.html
- SNPs3D
  http://www.snps3d.org/

Automatic Variant Annotation
ANNOVAR (ANNOtate VARiation)
http://www.openbioinformatics.org/annovar/
- Gene-based annotation: whether SNPs/CNVs affect protein coding
- Region-based annotation: variants falling in specific regions
- Filter-based annotation: variants already reported in dbSNP or the 1000 Genomes Project; filtering by SIFT score
- Others: retrieve sequences or candidate gene lists in batch

Between Patients and Normals
- Too many variants are detected
- Most variants are not related to the target disease
- Comparing the MAF (Minor Allele Frequency) between patients and normals can point to disease-related variants

        MAF (Patients)   MAF (Normals)   Related?
SNP1    5%               5%              No
SNP2    40%              10%             Yes

Association Study Tools
- PLINK
  http://pngu.mgh.harvard.edu/~purcell/plink/
- gPLINK
  http://pngu.mgh.harvard.edu/~purcell/plink/gplink.shtml
- Haploview
  http://www.broadinstitute.org/scientific-community/science/programs/medical-and-population-genetics/haploview/haploview

More Beyond: Finding the Causal Gene
Two disease hypothesis models:
- CDCV: Common Disease, Common Variant
- CDRV: Common Disease, Rare Variant
To find rare variants:
- Move from GWAS (microarray) to sequencing
- Use more samples
- Use pool-up analysis methods

Rare Variant Analysis
Gene-Based Method (PMID:17660818)

Pooling Up the Rare Variants
- Fixed-Threshold Method (Li et al., 2008)
- Weighted Approach (Madsen et al., 2009)
- Variable-Threshold Method (VT-Test) (Price et al., 2010)
  http://genetics.bwh.harvard.edu/rare_variants/

Part III: Demos & Exercises

Demos
- Data Preparation
- Reads Mapping
- Variant Calling
- BED/Wig Generation

Demos (cont.)
- UCSC Genome Browser: uploading BAM/BED/Wig
- IGV Genome Browser: loading BAM/BED/Wig
- UCSC Table Browser: retrieving track data; retrieving coding sequences
- UCSC Public Database

Demos (cont.)
- SIFT & PolyPhen
- ANNOVAR
- PLINK
- VT-Test

Thanks for your attention!
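The "Between Patients and Normals" comparison can be made quantitative with a basic allelic test, similar in spirit to the case/control allelic test that association tools such as PLINK report. The sketch below is an illustration, not PLINK's implementation: it builds a 2x2 table of minor/major allele counts and computes a Pearson chi-square with 1 degree of freedom. The function name `allelic_chi2` and the sample size of 100 patients and 100 normals (200 alleles each) are hypothetical; only the MAFs come from the slide.

```python
import math


def allelic_chi2(case_minor, case_total, ctrl_minor, ctrl_total):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    case_total / ctrl_total are allele counts (2 x individuals for diploids).
    Returns (chi2, p_value). Hypothetical helper for illustration only.
    """
    table = [
        [case_minor, case_total - case_minor],  # patients: minor, major
        [ctrl_minor, ctrl_total - ctrl_minor],  # normals:  minor, major
    ]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_sums)
    if 0 in col_sums or 0 in row_sums:
        return 0.0, 1.0  # monomorphic site or empty group: nothing to test
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical: 100 patients and 100 normals, i.e. 200 alleles per group.
snp1 = allelic_chi2(10, 200, 10, 200)  # MAF 5% vs 5%
snp2 = allelic_chi2(80, 200, 20, 200)  # MAF 40% vs 10%
print("SNP1 chi2=%.2f p=%.3g" % snp1)
print("SNP2 chi2=%.2f p=%.3g" % snp2)
```

With equal MAFs the statistic is zero (no evidence of association), while the 40% vs 10% difference gives a large statistic. In practice the p-value must still be corrected for the very large number of variants tested.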
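The idea behind pooling up rare variants with a fixed threshold can be sketched as follows: inside one gene, every variant whose MAF is below a chosen cutoff is collapsed into a single "carrier" indicator per individual, and carrier proportions are then compared between patients and normals. This is a simplified illustration in the spirit of the fixed-threshold method, not the exact published procedure of Li et al. (2008). The 1% threshold, the function names, and the toy genotypes are all hypothetical.

```python
import math


def collapse_carriers(genotypes, mafs, threshold=0.01):
    """Fixed-threshold collapsing (simplified, hypothetical helper).

    genotypes: one list per individual of minor-allele counts (0, 1, 2),
               one entry per variant in the gene.
    mafs: MAF of each variant (e.g. estimated from normals or a reference panel).
    Returns 1 for individuals carrying any variant with MAF < threshold, else 0.
    """
    rare = [k for k, f in enumerate(mafs) if f < threshold]
    return [1 if any(g[k] > 0 for k in rare) else 0 for g in genotypes]


def carrier_test(case_flags, ctrl_flags):
    """Pearson chi-square (1 df) comparing carrier proportions, no continuity correction."""
    a = sum(case_flags)
    b = len(case_flags) - a
    c = sum(ctrl_flags)
    d = len(ctrl_flags) - c
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # no variation to test
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy gene with three variants; the third (MAF 20%) is common and is ignored.
mafs = [0.005, 0.002, 0.2]
cases = [[0, 1, 0], [1, 0, 0], [0, 0, 0], [0, 0, 1]]
ctrls = [[0, 0, 1], [0, 0, 0], [0, 0, 0], [1, 0, 0]]
case_flags = collapse_carriers(cases, mafs)
ctrl_flags = collapse_carriers(ctrls, mafs)
chi2, p = carrier_test(case_flags, ctrl_flags)
print(case_flags, ctrl_flags, round(chi2, 4), round(p, 4))
```

Collapsing trades per-variant resolution for power: each rare variant alone is seen in too few people to test, but their combined burden can differ between groups. The weighted and variable-threshold approaches listed on the slide refine exactly this step, either by weighting variants by frequency or by not fixing the cutoff in advance.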
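For the BED/Wig generation demo, the main pitfall is the coordinate system: BED intervals are 0-based and half-open, whereas variant callers such as GATK report 1-based positions in VCF. A minimal sketch, assuming VCF-style input (1-based POS, and a REF allele that spans the affected reference bases); the function name `variant_to_bed` is hypothetical.

```python
def variant_to_bed(chrom, pos, ref, name=None):
    """Convert a 1-based, VCF-style variant position to a BED line.

    BED intervals are 0-based and half-open, [start, end), so a SNV at
    POS covers [POS - 1, POS) and a REF allele of length L covers
    [POS - 1, POS - 1 + L). Hypothetical helper for illustration.
    """
    start = pos - 1
    end = start + len(ref)
    fields = [chrom, str(start), str(end)]
    if name is not None:
        fields.append(name)  # optional 4th BED column (name)
    return "\t".join(fields)


print(variant_to_bed("chr1", 12345, "A", "SNV1"))
print(variant_to_bed("chr2", 100, "ACG"))
```

The resulting lines can be saved to a `.bed` file and uploaded as a custom track to the UCSC Genome Browser or loaded into IGV, both of which read BED.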