Bioinformatics pipeline for detection of immunogenic cancer

Towards Personalized
Genomics-Guided Cancer
Immunotherapy
Ion Mandoiu
Department of Computer Science & Engineering
Joint work with
Sahar Al Seesi (CSE)
Jorge Duitama (CIAT)
Fei Duan, Tatiana Blanchard, Pramod K. Srivastava (UCHC)
Mandoiu Lab
Main Research Areas:
• Bioinformatics Algorithms
• Development of Computational Methods for Next-Gen Sequencing Data Analysis
Ongoing Projects
• RNA-Seq Analysis (NSF, NIH, Life Technologies)
- Novel transcript reconstruction
- Allele-specific isoform expression
- Computational deconvolution of heterogeneous samples
• Viral quasispecies reconstruction (USDA)
- IBV evolution and vaccine optimization
• Genome assembly and scaffolding, LD-based genotype calling, local ancestry
inference, metabolomics, …
- More info & software at http://dna.engr.uconn.edu
Genomics-Guided Cancer
Immunotherapy
[Figure: workflow from mRNA sequencing (reads) to tumor-specific epitopes (e.g., SYFPEITHI, ISETDLSLL, CALRRNESL, …), peptide synthesis, immune system stimulation, T-cell response, and tumor remission.]
Mouse Image Source: http://www.clker.com/clipart-simple-cartoon-mouse-2.html
Bioinformatics Pipeline
• Read Alignment: hybrid alignment strategy (HardMerge)
• Data Cleaning: clipping alignments & removal of PCR artifacts
• Variant Detection: Bayesian model based on quality scores (SNVQ)
• Haplotyping: Max-Cut algorithm (RefHap)
• Epitope Prediction: PWM and ANN algorithms (NetMHC)
Hybrid Read Alignment Approach
[Figure: mRNA reads are mapped both to the transcript library and to the genome; the transcript-mapped reads and genome-mapped reads are then combined by read merging into the final set of mapped reads.]
http://en.wikipedia.org/wiki/File:RNA-Seq-alignment.png
• More efficient compared to spliced
alignment onto genome
• Stringent filtering: reads with multiple
alignments are discarded
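The merging step can be illustrated with a minimal sketch. It assumes that transcript alignments have already been converted to genomic coordinates, and that a read is kept only when it has a single alignment that the two mappings do not contradict; the function name `hard_merge` and the input layout are hypothetical and are not taken from the HardMerge software.

```python
def hard_merge(genome_hits, transcript_hits):
    """Merge two alignment sets with stringent filtering (illustrative sketch).

    Both arguments map a read id to a list of (chromosome, position) tuples.
    Assumption: transcript hits are already expressed in genome coordinates.
    """
    merged = {}
    for read in set(genome_hits) | set(transcript_hits):
        g = genome_hits.get(read, [])
        t = transcript_hits.get(read, [])
        if len(g) > 1 or len(t) > 1:
            continue  # multiple alignments in either mapping: discard
        if g and t:
            if g[0] == t[0]:
                merged[read] = g[0]  # both mappings agree
            # disagreeing unique alignments are discarded (assumed rule)
        elif g or t:
            merged[read] = (g or t)[0]  # unique in one mapping, unmapped in the other
    return merged


genome = {"r1": [("chr1", 100)], "r2": [("chr1", 5), ("chr2", 9)],
          "r3": [("chr3", 7)], "r5": [("chr1", 50)]}
transcripts = {"r1": [("chr1", 100)], "r3": [("chr3", 8)], "r4": [("chr2", 42)]}
kept = hard_merge(genome, transcripts)
```

In this toy input, r2 is dropped because it has two genome alignments and r3 is dropped because the two mappings disagree.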
Clipping Alignments
[Figure: percentage of reads with mismatches (y-axis, 0 to 2.5) by read position (x-axis, 1 to 73) for Lane 1, Lane 2, and Lane 3.]
Removal of PCR Artifacts
Variant Detection and Genotyping
[Figure: reads Ri aligned to the reference genome around locus i.]
Reference genome:
AACGCGGCCAGCCGGCTTCTGTCGGCCAGCAGCCAGGAATCTGGAAACAATGGCTACAGCGTGC
Reads Ri:
AACGCGGCCAGCCGGCTTCTGTCGGCCAGCCGGCAG
  CGCGGCCAGCCGGCTTCTGTCGGCCAGCAGCCCGGA
   GCGGCCAGCCGGCTTCTGTCGGCCAGCCGGCAGGGA
      GCCAGCCGGCTTCTGTCGGCCAGCAGCCAGGAATCT
          GCCGGCTTCTGTCGGCCAGCAGCCAGGAATCTGGAA
               CTTCTGTCGGCCAGCCGGCAGGAATCTGGAAACAAT
                      CGGCCAGCAGCCAGGAATCTGGAAACAATGGCTACA
                         CCAGCAGCCAGGAATCTGGAAACAATGGCTACAGCG
                         CAAGCAGCCAGGAATCTGGAAACAATGGCTACAGCG
                            GCAGCCAGGAATCTGGAAACAATGGCTACAGCGTGC
Variant Detection and Genotyping
• Pick genotype with the largest posterior probability
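As a rough illustration of picking the genotype with the largest posterior probability, the sketch below uses a generic quality-score-based Bayesian model. The uniform genotype prior, the even split of a sequencing error across the three other bases, and the function names are assumptions made for this example; they are not taken from SNVQ.

```python
import math
from itertools import combinations_with_replacement


def genotype_posteriors(pileup):
    """pileup: list of (base, phred_quality) pairs for reads covering one locus."""
    genotypes = list(combinations_with_replacement("ACGT", 2))
    log_prior = math.log(1.0 / len(genotypes))  # assumed uniform prior
    log_post = {}
    for g in genotypes:
        total = log_prior
        for base, q in pileup:
            # Phred score -> error probability, capped at 0.75 (a random base)
            e = min(10 ** (-q / 10), 0.75)
            # A read samples either allele of the diploid genotype with prob. 1/2
            p = sum((1 - e) if base == a else e / 3 for a in g) / 2
            total += math.log(p)
        log_post[g] = total
    # Normalise in log space for numerical stability
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {g: math.exp(v - m) / z for g, v in log_post.items()}


def call_genotype(pileup):
    post = genotype_posteriors(pileup)
    return max(post, key=post.get)


het = call_genotype([("A", 30)] * 5 + [("C", 30)] * 5)
hom = call_genotype([("A", 30)] * 10)
```

With five high-quality A reads and five high-quality C reads, the heterozygous genotype dominates; with ten A reads, the homozygous genotype does.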
Accuracy as Function of Coverage
Haplotyping
• Somatic cells are diploid, containing two nearly identical copies of
each autosomal chromosome
– Novel mutations are present on only one chromosome copy
– For epitope prediction we need to know if nearby mutations appear in
phase
Unphased mutations:
Locus  Mutation   Alleles
1      SNV        C,T
2      Deletion   C,-
3      SNV        A,G
4      Insertion  -,GC

Phased mutations:
Locus  Mutation   Haplotype 1  Haplotype 2
1      SNV        T            C
2      Deletion   C            -
3      SNV        A            G
4      Insertion  -            GC
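To see why phase matters for epitope prediction, consider two nearby SNVs in adjacent codons. The sketch below translates both haplotypes when the mutations are in phase (on the same chromosome copy) and when they are not. The coding sequence, positions, and alleles are made up for illustration, and the helper names are hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence whose length is a multiple of 3."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def apply_snvs(cds, snvs):
    """Return cds with each (0-based position, alternate base) substitution applied."""
    seq = list(cds)
    for pos, alt in snvs:
        seq[pos] = alt
    return "".join(seq)


def haplotype_peptides(cds, snvs, in_phase):
    """Peptides encoded by the two haplotypes for exactly two heterozygous SNVs."""
    if in_phase:
        hap1, hap2 = snvs, []            # both mutations on one copy
    else:
        hap1, hap2 = snvs[:1], snvs[1:]  # one mutation on each copy
    return {translate(apply_snvs(cds, hap1)), translate(apply_snvs(cds, hap2))}


reference = "ATGGCTGAACTG"        # hypothetical CDS, translates to MAEL
snvs = [(4, "A"), (7, "T")]       # hypothetical SNVs: GCT->GAT (A->D), GAA->GTA (E->V)
cis = haplotype_peptides(reference, snvs, in_phase=True)
trans = haplotype_peptides(reference, snvs, in_phase=False)
```

The two phasings yield different sets of mutant peptides, so candidate epitopes spanning both loci cannot be enumerated correctly without haplotype information.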
RefHap Algorithm
• Reduce the problem to Max-Cut
• Solve Max-Cut
• Build haplotypes according to the cut
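Fragment matrix (rows: fragments; columns: loci; * = locus not covered):
Locus  1 2 3 4 5
f1     * 0 1 1 0
f2     1 1 0 * 1
f3     1 * * 0 *
f4     * 0 0 * 1
Fragment graph (edge weights): f1-f2: 3, f1-f3: 1, f1-f4: 1, f2-f3: -1, f2-f4: -1
Resulting haplotypes:
h1 00110
h2 11001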
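The three steps can be sketched on the four-fragment example. Two assumptions are made here: an edge weight is taken to be the number of mismatches minus the number of matches over loci covered by both fragments (which reproduces the weights 3, 1, 1, -1, -1 of the example), and Max-Cut is solved by exhaustive search, which is only feasible for toy inputs and stands in for whatever heuristic RefHap actually uses.

```python
from itertools import combinations, product

# Fragment matrix from the example; '*' marks a locus the fragment does not cover
FRAGMENTS = {"f1": "*0110", "f2": "110*1", "f3": "1**0*", "f4": "*00*1"}


def edge_weight(a, b):
    """Mismatches minus matches over loci covered by both fragments (assumed scoring)."""
    w = 0
    for x, y in zip(a, b):
        if x != "*" and y != "*":
            w += 1 if x != y else -1
    return w


def max_cut(fragments):
    """Exhaustive Max-Cut; the first fragment is fixed on side 0 to break symmetry."""
    names = sorted(fragments)
    best_score, best_side = None, None
    for bits in product((0, 1), repeat=len(names) - 1):
        side = dict(zip(names, (0,) + bits))
        score = sum(edge_weight(fragments[u], fragments[v])
                    for u, v in combinations(names, 2) if side[u] != side[v])
        if best_score is None or score > best_score:
            best_score, best_side = score, side
    return best_score, best_side


def build_haplotypes(fragments, side):
    """Majority vote per locus; fragments on side 1 vote with complemented alleles."""
    n = len(next(iter(fragments.values())))
    h1 = []
    for j in range(n):
        votes = 0  # positive favours allele 1 on h1; ties default to 0
        for name, frag in fragments.items():
            if frag[j] == "*":
                continue
            allele = int(frag[j])
            if side[name] == 1:
                allele = 1 - allele
            votes += 1 if allele == 1 else -1
        h1.append("1" if votes > 0 else "0")
    h1 = "".join(h1)
    h2 = "".join("1" if c == "0" else "0" for c in h1)
    return h1, h2


score, side = max_cut(FRAGMENTS)
h1, h2 = build_haplotypes(FRAGMENTS, side)
```

On this input the best cut separates f1 from f2, f3, and f4 with total weight 5, and the consensus step recovers h1 = 00110 and h2 = 11001.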
Epitope Prediction
Profile weight matrix (PWM) model
C. Lundegaard et al. MHC Class I Epitope Binding Prediction Trained on Small Data
Sets. In Lecture Notes in Computer Science, 3239:217-225, 2004
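A PWM scores a peptide by summing one position-specific weight per residue. The sketch below builds log-odds weights from a toy set of peptides with a pseudocount and a uniform background, then ranks the 9-mer windows of a sequence. The training set (the three example peptides from the overview slide), the pseudocount, the background, and the query sequence are assumptions for illustration only; this is not the NetMHC model.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOY_BINDERS = ["SYFPEITHI", "ISETDLSLL", "CALRRNESL"]  # toy set, not a curated binder list


def build_pwm(binders, pseudocount=1.0):
    """One dict of log2-odds weights per peptide position (uniform background assumed)."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(binders) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(len(binders[0])):
        counts = Counter(p[pos] for p in binders)
        pwm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                    for aa in AMINO_ACIDS})
    return pwm


def score(pwm, peptide):
    """Sum of position-specific weights; peptide length must match the PWM."""
    return sum(column[aa] for column, aa in zip(pwm, peptide))


def scan(pwm, protein):
    """Score every window of PWM length in a protein, best first."""
    k = len(pwm)
    windows = [protein[i:i + k] for i in range(len(protein) - k + 1)]
    return sorted(((score(pwm, w), w) for w in windows), reverse=True)


pwm = build_pwm(TOY_BINDERS)
ranked = scan(pwm, "MKGGSYFPEITHIGGKR")  # hypothetical protein fragment
```

The highest-ranked window in this toy scan is the embedded training peptide, as expected for a matrix built from so few sequences.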
[Figure: scatter plot of SYFPEITHI score versus NetMHC score for H2-Kd; R² = 0.5333.]
J.W. Yewdell, E. Reits and J. Neefjes. Making sense of mass destruction: quantitating MHC class I
antigen presentation. Nature Reviews Immunology, 3:952-961, 2003
Results on Tumor Data
Tumor Type                                     MethA    CMS5
RNA-Seq Reads (Million)                        105.8    23.4
Genome Mapped                                  75%      54%
Transcriptome Mapped                           83%      59%
HardMerge Mapped                               50%      36%
HardMerge Mapped Bases (Gb)                    3.18     0.41
High-Quality Heterozygous SNVs in CCDS Exons   1,504    232
Non-synonymous                                 1,160    182
Missense                                       1,096    178
Nonsense                                       63       4
No-stop                                        1        -
NetMHC Predicted Epitopes                      836      142
Tnpo3
[Figure: mean tumor diameter (mm) versus days after tumor challenge, and AUC (mm2), for the Naive and Tnpo3 groups; P < 0.0001.]
• Tumor rejection potential of identified epitopes currently evaluated
experimentally in the Srivastava lab
Ongoing Work
• Sequencing of spontaneous tumors (TRAMP mice)
• Detecting other forms of variation: indels, gene
fusions, novel transcripts
• Incorporating predictions of TAP transport efficiency
and proteasomal cleavage in epitope prediction
• Integration of mass-spectrometry data
• Monitoring immune response by TCR sequencing