Genetic Variant Annotation

CBI Tech. Workshop - NGS Special Session
Lesson 5
Genetic Variant Annotation
Linlin Yan (颜林林)
Center for Bioinformatics, Peking University
Jun 13, 2011
Outline
 Review & Overview
 Thoughts & Methods
   Variant Browsing
   Variant Annotation
   Association Study
   More Beyond
 Demos & Exercises
Part I: Review & Overview
Workshop Schedule
Tracks: Warm-up, Basic, Genetics, Transcriptome (RNA-Seq), ...
Topic | Title | Speaker | Date
0 | Warm-up and Introduction | GaoG | 4-25
1 | File Format & Reads Mapping | YanLL | 5-9
2 | Solexa Pipeline | CaiT | 5-16
3 | Alignment File Manipulation | YeYX | 5-23
4 | Genetic Variant Caller | LiuH | 5-30
5 | Genetic Variant Annotation | YanLL | 6-13
6 | Genome Assembling | LiZ | 6-20
7-11 | Transcript Mapping / Transcript Assembling / Differential Expression Caller / ChIP-Seq Peak Caller / ... | CaiT, ZhaoHQ, LiuXQ, ChenWB, TangX | 6-27, 7-4, 7-11, 7-18, 7-25
NGS Analysis Workflow
Sequencer -> Short Reads
Short Reads -> Assembling -> Contigs / Scaffolds
Short Reads -> Mapping -> Alignments
Alignments -> Call Variants -> SNV / CNV / SV -> Annotation
Alignments -> Calculate Expression -> Expression Profile
Alignments -> Call Peaks -> Peaks / Regions
Genetic Variant Analysis Workflow
Sequencer -> Short Reads -> Mapping -> Alignments -> Call Variants -> SNV / CNV / SV -> Annotation
 Solexa Pipeline (Lesson 2)
 File Format (Lesson 1)
   FASTQ / Quality / SAM / ...
 Reads Mapping (Lesson 1)
   Maq / Bowtie / BWA
 Alignment File Manipulation (Lesson 3)
   Samtools / BEDTools / FASTX-Toolkit
 Genetic Variant Caller (Lesson 4)
   GATK
 Genetic Variant Annotation (Lesson 5)
   PolyPhen / SIFT / ANNOVAR / PLINK / ...
Part II: Thoughts & Methods
What Can Be Inferred from Variants
(Figure: SNV / CNV / SV, Genome Annotation, Genetic Variants, Mutation Effects, Disease, Phenotype)
 What lies at the variant positions?
   => Genome Browser
 How do the variants affect function?
   => Variant Annotation
 Which variants are related to the phenotype?
   => Association Study
 Beyond that ...
   => Disease models: CDCV vs. CDRV
Genome Browser
Online Browsers:
 UCSC Genome Browser
 http://genome.ucsc.edu/
 Ensembl Genome Browser
 http://www.ensembl.org/
 DNAnexus
 https://dnanexus.com/genomes/hg18/public_browse
Local Browsers:
 IGV (Integrative Genomics Viewer)
 http://www.broadinstitute.org/igv/
UCSC Genome Browser
(http://genome.ucsc.edu/cgi-bin/hgTracks?clade=mammal&org=Human&db=hg19)
UCSC Genome Browser (cont.)
 Supported Formats:
   BED / bigBed
   BED detail
   bedGraph
   GFF / GTF
   WIG / bigWig
   MAF (Multiple Alignment Format)
   BAM
   Personal Genome SNP
   PSL
(http://genome.ucsc.edu/)
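When uploading your own calls as a BED track, note that BED intervals are 0-based and half-open, while VCF and pileup positions are 1-based. A minimal Python sketch of the conversion follows; the `Variant` record and the track name are hypothetical, and putting `REF>ALT` in the BED name column is just one convenient choice.

```python
# Sketch: write called SNVs/indels as a BED track for upload to a genome browser.
# Assumption: input positions are 1-based (VCF/pileup style); BED is 0-based, half-open.
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    chrom: str  # chromosome name, e.g. "chr1"
    pos: int    # 1-based position of the first reference base
    ref: str    # reference allele
    alt: str    # alternative allele


def to_bed_lines(variants: Iterable[Variant], track_name: str = "my_variants") -> List[str]:
    """Return a BED track line followed by one BED record per variant."""
    lines = [f'track name="{track_name}" description="Called variants"']
    for v in variants:
        start = v.pos - 1          # 1-based -> 0-based start
        end = start + len(v.ref)   # half-open end spans the whole reference allele
        lines.append(f"{v.chrom}\t{start}\t{end}\t{v.ref}>{v.alt}")
    return lines


if __name__ == "__main__":
    calls = [Variant("chr1", 1000, "A", "G"), Variant("chr2", 500, "CT", "C")]  # hypothetical calls
    print("\n".join(to_bed_lines(calls)))
```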
IGV (Integrative Genomics Viewer)
(http://www.broadinstitute.org/igv/)
UCSC: Table Browser & Public DB
 Retrieve track data in batch
 Retrieve sequences in specific regions
 Combine regions and/or annotations
 Query track data in public MySQL database
(http://genome.ucsc.edu/cgi-bin/hgTables)
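Combining regions with annotations comes down to an interval-overlap query. The sketch below shows such a query against a table laid out like UCSC's refGene (columns `name`, `name2`, `chrom`, `txStart`, `txEnd`), using an in-memory SQLite database with made-up rows as a local stand-in for the public MySQL server. With a MySQL client library the same SQL should apply, though its parameter placeholder syntax may differ.

```python
# Sketch: region-overlap query on a refGene-style table.
# A local in-memory SQLite table stands in for the public MySQL database;
# the rows are hypothetical toy values, not real gene coordinates.
import sqlite3
from typing import List, Tuple

# txStart/txEnd are 0-based, half-open, so two intervals overlap exactly
# when each one starts before the other ends.
OVERLAP_SQL = (
    "SELECT name2, name, chrom, txStart, txEnd FROM refGene "
    "WHERE chrom = ? AND txStart < ? AND txEnd > ? ORDER BY txStart"
)


def genes_overlapping(conn: sqlite3.Connection, chrom: str, start: int, end: int) -> List[Tuple]:
    """Return transcripts overlapping the 0-based half-open region [start, end)."""
    return conn.execute(OVERLAP_SQL, (chrom, end, start)).fetchall()


def build_toy_db() -> sqlite3.Connection:
    """Create a tiny stand-in refGene table with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE refGene (name TEXT, name2 TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER)"
    )
    conn.executemany(
        "INSERT INTO refGene VALUES (?, ?, ?, ?, ?)",
        [
            ("NM_TOY1", "GENE_A", "chr1", 1000, 5000),
            ("NM_TOY2", "GENE_B", "chr1", 8000, 9000),
            ("NM_TOY3", "GENE_C", "chr2", 1000, 5000),
        ],
    )
    return conn


if __name__ == "__main__":
    db = build_toy_db()
    print(genes_overlapping(db, "chr1", 4000, 8500))
```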
The browsers and databases above show KNOWN variants.
What about UNKNOWN (novel) variants?
Mutation Effects Prediction
 SIFT (Sorting Intolerant From Tolerant)
 http://sift.jcvi.org/
 PolyPhen (Polymorphism Phenotyping)
 http://genetics.bwh.harvard.edu/pph/
 MAPP (Multivariate Analysis of Protein Polymorphism)
 http://mendel.stanford.edu/SidowLab/downloads/MAPP/index.html
 SNPs3D
 http://www.snps3d.org/
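SIFT's core idea is that positions conserved across homologous proteins tolerate few substitutions. The Python sketch below is a simplified illustration, not SIFT itself. It takes one alignment column as given, scores the new amino acid by its frequency relative to the most frequent residue, and applies the 0.05 cutoff below which SIFT calls a substitution deleterious. Real SIFT gathers the homologs itself and adds pseudocounts, which are omitted here; the example column is hypothetical.

```python
# Simplified sketch of SIFT's idea: score a substitution by how often the new
# amino acid appears at that position among homologous sequences.
# Assumption: the alignment column is given directly; no pseudocounts are added.
from collections import Counter
from typing import Dict, Tuple


def scaled_probabilities(column: str) -> Dict[str, float]:
    """Frequency of each residue in an alignment column, divided by the top frequency."""
    residues = [aa for aa in column if aa != "-"]  # ignore gap characters
    if not residues:
        raise ValueError("alignment column has no residues")
    counts = Counter(residues)
    top = max(counts.values())
    return {aa: n / top for aa, n in counts.items()}


def predict(column: str, new_aa: str, cutoff: float = 0.05) -> Tuple[float, str]:
    """Return (score, label); scores below the cutoff are called deleterious."""
    score = scaled_probabilities(column).get(new_aa, 0.0)  # unseen residue -> 0
    return score, ("deleterious" if score < cutoff else "tolerated")


if __name__ == "__main__":
    column = "LLLLLLLLLI"          # hypothetical column: 9 x Leu, 1 x Ile
    print(predict(column, "I"))    # Ile has been observed at this position
    print(predict(column, "P"))    # Pro never observed here
```

Without pseudocounts, any residue never seen in the column scores 0, which is why the real tool smooths the frequencies.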
Automatic Variant Annotation
ANNOVAR (ANNOtate VARiation)
 http://www.openbioinformatics.org/annovar/
 Gene-based annotation
   Whether SNPs/CNVs affect protein-coding genes
 Region-based annotation
   Variants in specific genomic regions
 Filter-based annotation
   Variants reported in dbSNP or the 1000 Genomes Project
   Filtering by SIFT score
 Others
   Retrieving sequences or candidate gene lists in batch
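The gene-based and filter-based modes can be illustrated with a small sketch. This is not ANNOVAR's code or output format. The `Transcript` record, function names, and coordinates are hypothetical. Transcripts follow the refGene convention (0-based starts, exclusive ends), and only exonic/intronic/intergenic are reported, whereas ANNOVAR distinguishes finer classes such as splicing and UTR variants.

```python
# Sketch of two ANNOVAR-style ideas: gene-based (which region a variant hits)
# and filter-based (drop variants already present in a known database).
# Simplification: only exonic / intronic / intergenic are reported.
from typing import Iterable, List, NamedTuple, Optional, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, 1-based pos, ref, alt)


class Transcript(NamedTuple):
    name: str
    chrom: str
    tx_start: int                  # 0-based start, like refGene txStart
    tx_end: int                    # exclusive end, like refGene txEnd
    exons: List[Tuple[int, int]]   # 0-based half-open exon intervals


def classify(chrom: str, pos: int, transcripts: Iterable[Transcript]) -> Tuple[str, Optional[str]]:
    """Classify a 1-based position as exonic, intronic or intergenic."""
    p = pos - 1  # convert to 0-based to compare with transcript coordinates
    for tx in transcripts:
        if tx.chrom != chrom or not (tx.tx_start <= p < tx.tx_end):
            continue
        if any(s <= p < e for s, e in tx.exons):
            return "exonic", tx.name
        return "intronic", tx.name
    return "intergenic", None


def filter_known(variants: Iterable[VariantKey], known: Set[VariantKey]) -> List[VariantKey]:
    """Keep only variants absent from a known set (e.g. one loaded from dbSNP)."""
    return [v for v in variants if v not in known]


if __name__ == "__main__":
    genes = [Transcript("GENE_A", "chr1", 1000, 5000, [(1000, 1200), (3000, 3500)])]  # hypothetical
    for pos in (1100, 2000, 9000):
        print(pos, classify("chr1", pos, genes))
    calls = [("chr1", 1100, "A", "G"), ("chr1", 2000, "C", "T")]
    print(filter_known(calls, known={("chr1", 2000, "C", "T")}))
```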
Between Patients and Normals
 Too many variants are detected
 Most variants are not related to the target disease
 Comparing the MAF (minor allele frequency) between patients and normal controls can point to disease-related variants
SNP | MAF in Patients | MAF in Normals | Related?
SNP1 | 5% | 5% | No
SNP2 | 40% | 10% | Yes
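A common way to decide whether a MAF difference like SNP2's is more than chance is an allelic 2x2 chi-square test, the basic case/control test PLINK's `--assoc` reports. The sketch assumes genotypes are already coded as minor-allele counts per individual; the cohort sizes in the example are hypothetical, since the table only gives percentages.

```python
# Sketch of a basic allelic association test for one SNP
# (2x2 Pearson chi-square, 1 degree of freedom, no continuity correction).
import math
from typing import Sequence, Tuple


def allele_counts(genotypes: Sequence[int]) -> Tuple[int, int]:
    """genotypes: minor-allele count per individual (0, 1 or 2)."""
    minor = sum(genotypes)
    return minor, 2 * len(genotypes) - minor


def allelic_test(cases: Sequence[int], controls: Sequence[int]) -> Tuple[float, float, float, float]:
    """Return (MAF in cases, MAF in controls, chi-square, p-value)."""
    a, b = allele_counts(cases)     # minor / major alleles in patients
    c, d = allele_counts(controls)  # minor / major alleles in normals
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the 2x2 table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return a / (a + b), c / (c + d), chi2, p


if __name__ == "__main__":
    # Hypothetical cohort of 100 patients and 100 normals mirroring SNP2 above.
    patients = [2] * 20 + [1] * 40 + [0] * 40  # 80 minor alleles -> MAF 40%
    normals = [1] * 20 + [0] * 80              # 20 minor alleles -> MAF 10%
    print(allelic_test(patients, normals))
```

With these made-up counts (80 of 200 vs. 20 of 200 minor alleles) the statistic is 48 on 1 df, far beyond usual significance levels; identical frequencies, as for SNP1, give a statistic of 0.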
Association Study Tools
 PLINK
 http://pngu.mgh.harvard.edu/~purcell/plink/
 gPLINK
 http://pngu.mgh.harvard.edu/~purcell/plink/gplink.shtml
 Haploview
 http://www.broadinstitute.org/scientific-community/science/programs/medical-and-population-genetics/haploview/haploview
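PLINK's text input is a pair of files: a PED file (one line per individual: family ID, individual ID, father, mother, sex, phenotype, then two alleles per SNP) and a MAP file (chromosome, SNP ID, genetic distance, base-pair position). The sketch below produces both from Python. The SNP IDs, sample IDs, function names, and the `mystudy` prefix are hypothetical.

```python
# Sketch: turn genotype calls into PLINK's text PED/MAP input files.
from typing import Dict, List, Sequence, Tuple

Snp = Tuple[str, str, int]  # (chromosome, SNP id, base-pair position)


def format_plink(snps: Sequence[Snp], samples: Sequence[Dict]) -> Tuple[List[str], List[str]]:
    """Return (ped_lines, map_lines).

    Each sample dict holds: family, id, sex (1=male, 2=female, 0=unknown),
    pheno (1=unaffected, 2=affected) and genotypes, one two-letter string per
    SNP such as "AG"; "00" marks a missing genotype.
    """
    # MAP columns: chromosome, SNP id, genetic distance (0 = unknown), position
    map_lines = [f"{chrom}\t{snp_id}\t0\t{bp}" for chrom, snp_id, bp in snps]
    ped_lines = []
    for s in samples:
        if len(s["genotypes"]) != len(snps):
            raise ValueError(f"sample {s['id']}: genotype count does not match SNP count")
        alleles = " ".join(f"{g[0]} {g[1]}" for g in s["genotypes"])
        # PED columns: family, individual, father, mother (0 = not in file), sex, phenotype
        ped_lines.append(f"{s['family']} {s['id']} 0 0 {s['sex']} {s['pheno']} {alleles}")
    return ped_lines, map_lines


def write_plink_files(prefix: str, snps: Sequence[Snp], samples: Sequence[Dict]) -> None:
    """Write <prefix>.ped and <prefix>.map."""
    ped_lines, map_lines = format_plink(snps, samples)
    with open(f"{prefix}.ped", "w") as fh:
        fh.write("\n".join(ped_lines) + "\n")
    with open(f"{prefix}.map", "w") as fh:
        fh.write("\n".join(map_lines) + "\n")


if __name__ == "__main__":
    snps = [("1", "rs0001", 1000), ("1", "rs0002", 2000)]  # hypothetical SNPs
    samples = [
        {"family": "FAM1", "id": "P1", "sex": 1, "pheno": 2, "genotypes": ["AG", "CC"]},
        {"family": "FAM2", "id": "N1", "sex": 2, "pheno": 1, "genotypes": ["AA", "00"]},
    ]
    ped, mp = format_plink(snps, samples)
    print("\n".join(ped))
    print("\n".join(mp))
```

After `write_plink_files("mystudy", snps, samples)`, running `plink --file mystudy --assoc` would apply the basic case/control test to the resulting files.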
More Beyond: Finding the Causal Gene
 Two disease hypothesis models:
   CDCV: Common Disease, Common Variant
   CDRV: Common Disease, Rare Variant
 To find rare variants:
   From GWAS (microarray) to sequencing
   More samples
   Pooling (collapsing) analysis methods
Rare Variant Analysis
 Gene-Based Method
(PMID:17660818)
Pooling the Rare Variants
 Fixed-Threshold Method (Li and Leal, 2008)
 Weighted Approach (Madsen and Browning, 2009)
 Variable-Threshold Method (VT-Test) (Price et al., 2010)
 http://genetics.bwh.harvard.edu/rare_variants/
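The pooled tests can be sketched in a few lines. This follows our reading of the VT paper (Price et al., 2010). For a frequency threshold T, rare alleles at or below T are pooled into z(T) = Σ_i Σ_j (φ_j − φ̄) g_ij / sqrt(Σ_i Σ_j g_ij²). The VT statistic is the maximum of z(T) over thresholds placed at each observed frequency, and significance comes from permuting phenotypes. The sketch is unweighted (no PolyPhen weights). Its fixed-threshold statistic is this regression-style z-score, not Li and Leal's multivariate CMC test or Madsen and Browning's weighted sum. The data and function names are hypothetical.

```python
# Sketch of pooled rare-variant tests for one gene: a fixed-threshold z-score
# and the variable-threshold (VT) maximum, with a permutation p-value.
# Assumptions: additive coding (0/1/2 minor alleles), unweighted variants,
# allele frequencies estimated from all samples.
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[int]]  # genotypes[j][i]: individual j, variant i


def allele_freqs(genotypes: Matrix) -> List[float]:
    """Minor-allele frequency of each variant across all individuals."""
    n = len(genotypes)
    return [sum(row[i] for row in genotypes) / (2 * n) for i in range(len(genotypes[0]))]


def fixed_threshold_z(genotypes: Matrix, phenos: Sequence[float],
                      freqs: Sequence[float], t: float) -> float:
    """z(T): pool all variants whose frequency is <= t."""
    mean = sum(phenos) / len(phenos)
    num = den = 0.0
    for i, f in enumerate(freqs):
        if f > t:
            continue
        for j, phi in enumerate(phenos):
            g = genotypes[j][i]
            num += (phi - mean) * g
            den += g * g
    return num / math.sqrt(den) if den > 0 else 0.0


def vt_statistic(genotypes: Matrix, phenos: Sequence[float], freqs: Sequence[float]) -> float:
    """Maximise z(T) over thresholds placed at each observed frequency."""
    return max(fixed_threshold_z(genotypes, phenos, freqs, t) for t in sorted(set(freqs)))


def vt_test(genotypes: Matrix, phenos: Sequence[float],
            n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Return (observed VT statistic, permutation p-value)."""
    freqs = allele_freqs(genotypes)  # unchanged when phenotypes are permuted
    observed = vt_statistic(genotypes, phenos, freqs)
    rng = random.Random(seed)
    shuffled = list(phenos)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if vt_statistic(genotypes, shuffled, freqs) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical gene with 3 variants in 8 individuals (first 4 are patients).
    genotypes = [
        [1, 0, 0], [1, 0, 1], [0, 1, 0], [1, 0, 0],
        [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 1],
    ]
    phenotypes = [1, 1, 1, 1, 0, 0, 0, 0]
    print(vt_test(genotypes, phenotypes, n_perm=999))
```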
Part III: Demos & Exercises
Demos
 Data Preparation
 Reads Mapping
 Variant Calling
 BED/Wig generation
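For the BED/Wig generation step, per-base coverage can be turned into a wiggle track. The sketch assumes the three-column, tab-separated output of `samtools depth` (chromosome, 1-based position, depth) for a single BAM, e.g. from `samtools depth sample.sorted.bam` (file name hypothetical). It writes variableStep WIG, whose positions are also 1-based.

```python
# Sketch: convert per-base depth lines (chrom, 1-based position, depth; the
# tab-separated layout printed by `samtools depth`) into a variableStep WIG track.
from typing import Iterable, List, Optional


def depth_to_wig(lines: Iterable[str], track_name: str = "coverage") -> List[str]:
    out = [f'track type=wiggle_0 name="{track_name}"']
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chrom, pos, depth = line.split("\t")[:3]  # extra columns (more BAMs) are ignored
        if chrom != current:                       # each chromosome needs its own declaration
            out.append(f"variableStep chrom={chrom}")
            current = chrom
        out.append(f"{pos} {depth}")               # WIG positions are 1-based, like the input
    return out


if __name__ == "__main__":
    demo = ["chr1\t100\t5", "chr1\t101\t7", "chr2\t50\t3"]  # hypothetical depth output
    print("\n".join(depth_to_wig(demo)))
```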
Demos (cont.)
 UCSC Genome Browser
 Uploading BAM/BED/Wig
 IGV Genome Browser
 Loading BAM/BED/Wig
 UCSC Table Browser
 Retrieve track data
 Retrieve coding sequences
 UCSC Public Database
Demos (cont.)
 SIFT & PolyPhen
 ANNOVAR
 PLINK
 VT-Test
Thanks for your attention!