IMMUNOINFORMATICS: How structural bioinformatics impacts immunology Julia Ponomarenko

jpon@sdsc.edu
PHARM 201
November 18, 2011
Julia Ponomarenko, Ph.D.
San Diego Supercomputer Center, SSPPS

M.Sc., Physics, Novosibirsk, Russia

Ph.D., Biology, Novosibirsk, Russia

2001 – GlaxoSmithKline Pharmaceuticals, USA

2002 – Research Scientist, Prof. Bourne’s Lab, UCSD

2004 – Lead of the structural group at UCSD developing the Immune
Epitope Database (IEDB) (PI: Alex Sette, LIAI)

2008 – PI for NIH:
 Transcriptional regulation of stimulus-responsive gene expression programs
(with Alex Hoffmann, Biochemistry)
 Data integration and tools for systems biology (with M. Baitaluk, SDSC)
Swine-origin H1N1 influenza virus (S-OIV)
in mid-April 2009

2 cases of a unique combination of North American and Eurasian
swine-lineage H1N1 influenza virus occurred in California in
people not exposed to pigs

Neutralizing antibodies against S-OIV were
found exclusively in persons born before
1957

That raised the concern that little protective
immune memory exists in the general
human population
The New York Times, 2009

April, 2009: Examined pre-existing immunity against S-OIV: are epitopes that
were present in the seasonal H1N1 flu strains of 1988-2008 also present
in the S-OIV strains?

May, 2009: Found that 69% (54/78) of the epitopes recognized by CD8+ T-cells were completely invariant

August, 2009: Confirmed experimentally that memory T-cell immunity
against S-OIV is present in the adult population and is of similar magnitude as
the pre-existing memory against seasonal H1N1 influenza
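A minimal sketch of the kind of conservation check described above, assuming an epitope counts as "completely invariant" when its exact sequence occurs in a protein of the new strain. The function name and the toy protein strings are hypothetical; the peptides are reused from other slides purely as examples, and the real analysis may have used different criteria.

```python
def conserved_fraction(epitopes, strain_proteins):
    """Return (n_conserved, n_total, fraction) of epitopes found unchanged.

    An epitope is counted as invariant if its exact sequence is a
    substring of at least one protein of the new strain (an assumption).
    """
    conserved = [e for e in epitopes
                 if any(e in protein for protein in strain_proteins)]
    n, total = len(conserved), len(epitopes)
    return n, total, (n / total if total else 0.0)


# Toy data: made-up "proteins", not real influenza sequences.
epitopes = ["GILGFVFTL", "VYGFVRACL", "ALWGFFPVLS"]
new_strain = ["MKTAAGILGFVFTLQRS", "MNPLLVYGFVRACLGGA"]

n, total, frac = conserved_fraction(epitopes, new_strain)
print(f"{n}/{total} epitopes invariant ({frac:.0%})")

# The figure reported on the slide is the same ratio for 54 of 78 epitopes.
print(f"{54 / 78:.0%}")
```

Exact substring matching is the strictest definition of conservation; a looser variant could allow a fixed number of mismatches.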
Immunoinformatics is the application of
(bio)informatics techniques
to study the immune system
National Institutes of Health, FY 2009
Total budget $30.3B
Institute                                               Budget   Share
National Cancer Institute                               $4.97B   16%
National Institute of Allergy and Infectious Diseases   $4.70B   16%
National Heart, Lung, and Blood Institute               $3.02B   10%
National Institute of General Medical Sciences          $2.00B   7%
[Figure labels: BIOINFORMATICS, IMMUNOINFORMATICS]
Immunoinformatics goals
Understanding and modeling the immune system at the levels of cells, tissues, whole organisms, and populations
Designing medical diagnostics and vaccines for cancers, allergies, infectious and autoimmune diseases
Immunoinformatics: Areas of Study

Immunological Databases

Epitope discovery: antigen recognition

Evolution of the immune system

Evolution of pathogens and co-evolution of host and pathogen

Modeling of host-pathogen interactions

Regulatory networks in the cells of the immune system

Mathematical modeling of immunological memory

Computational models of the immune system
Roadmap
Vaccines
Immune
system
Therapeutic
Vaccines
Epitope
Discovery
Roadmap
Vaccines
Vaccine types

ATTENUATED: Live but weakened whole virus or
bacterium. Minimal reproduction extends immune cells’
exposure to antigen without causing disease: Measles,
Mycobacterium tuberculosis

INACTIVATED: Whole but “killed” and unable to
reproduce or to cause disease: Rabies, Flu

SUBUNIT and RECOMBINANT: Fragments of the pathogen.
 Toxoids - Inactivated pathogen toxins: Tetanus, Diphtheria
 Recombinant viral capsid proteins: Hepatitis B, HPV
 Purified bacterial polysaccharides: Meningococcal meningitis
 Pathogen antigens conjugated with toxoid: Haemophilus
influenzae type b meningitis
Forbes.com
Plasmodium falciparum malaria vaccine

RTS,S/AS is a recombinant vaccine based on the Hepatitis B surface antigen virus-like
particle (VLP) platform, genetically engineered to include the carboxy terminus (amino
acids 207-395) of the P. falciparum circumsporozoite (CS) antigen.

CS covers the entire surface of sporozoites, the form of the malaria parasite inoculated
into humans by female anopheline mosquitoes.

The Asparagine-Alanine-Asparagine-Proline (NANP) amino acid repeat sequence forms the
immunodominant B-cell epitope of the P. falciparum CS antigen. This sequence is
species-specific, but highly conserved among isolates of each species.

RTS,S/AS01 induces very high IgG concentrations against the NANP CS repeat in
vaccinated humans. In addition, this vaccine induces moderate to high CD4+ Th1
responses against flanking region peptides.
Vaccines have been made for 36 of >1,400 human pathogens, plus HPV
Emerg Infect Dis. 2005;11(12):1842
Alternative vaccines

Peptide-based
 Quimi-Hib (Cuba, 2003) -- The first human vaccine against
Haemophilus influenzae type B (or Hib), a bacterium that
causes meningitis and pneumonia in children; in the USA a
conjugate vaccine is used (1988; very expensive)
 Peptide vaccine against canine parvovirus (causes enteritis
and myocarditis in dogs and minks)

Experimental technologies
 Recombinant vector
 DNA vaccines
Roadmap
Vaccines
Immune
system
Vaccines mimic infection to avert it
How many lymph nodes do humans have?
[Figure: Xenoreactive complex of AHIII 12.2 TCR bound to P1049 (ALWGFFPVLS)/HLA-A2.1, PDB: 1lp9. Labels: T-cell receptor (Vα, Vβ), MHC class I, β2-microglobulin]
MHC class I pathway
Intracellular pathogen (virus, mycobacteria) → cytosolic protein → proteasome → peptides → TAP → ER → MHC I → CD8 epitope presented on any cell
The presented epitope is recognized by the TCR and CD8 of a CTL (TCD8+)
[Figure: Xenoreactive complex of AHIII 12.2 TCR bound to P1049 (ALWGFFPVLS)/HLA-A2.1, PDB: 1lp9. Labels: T-cell receptor (Vα, Vβ), MHC class I, β2-microglobulin]
[Figure: Complex of a human TCR, influenza HA antigen peptide (PKYVKQNTLKLAT), and MHC class II. Labels: T-cell receptor (Vα, Vβ), MHC class II α, MHC class II β]
MHC class I pathway: intracellular pathogen (virus, mycobacteria) → cytosolic protein → proteasome → peptides → TAP → ER → MHC I → CD8 epitope presented on any cell → TCR and CD8 of a CTL (TCD8+)
MHC class II pathway: extracellular protein → endosome → peptides → MHC II → CD4 epitope presented on a B-cell, macrophage, or dendritic cell → TCR and CD4 of a TCD4+ cell
[The diagram marks one step between the two pathways with "?"]
Vaccines mimic infection to avert it
During embryonic development, regions of V genes combine with
D, J, and C genes to produce ≈ 10^15 different antibodies
[Figure: IgG2a intact mouse antibody Mab231 (PDB ID 1igt). Labels: Fab fragment, Fv fragment, Fc fragment, light chain (VL, CL), heavy chain (VH, CH)]
Interaction between APC and Th

http://www.youtube.com/watch?v=M48qu5c7Cfg&NR=1
Antibody affinity maturation (great video)

http://www.youtube.com/watch?v=qGsyBwDVnTU&feature=related
Vaccines mimic infection to avert it
[Figure: HIV-1 envelope protein gp120 (core fragment) in complex with CD4 (N-terminal two-domain fragment) and antibody 17b (Fab fragment), PDB: 1gc1. Labels: epitope, 17b epitope]
Roadmap
Vaccines
Immune
system
Therapeutic
Vaccines
Immunotherapy: Monoclonal Antibodies

Alemtuzumab: For leukemia

Infliximab: For Crohn’s disease and rheumatoid arthritis

Rituximab: For non-Hodgkin’s lymphoma

Trastuzumab (Herceptin): For breast cancer

Basiliximab and daclizumab: Block the IL-2 receptor; immunosuppressives for
transplants

Video on how rituximab works:
http://www.youtube.com/watch?v=UtNeImBmQCM&feature=related
3m 40 sec
Cancer immunotherapy
A therapeutic patient-targeted prostate cancer vaccine
Provenge

Approved by FDA in 2010

The median survival time was 25.8 months for treated patients compared with 21.7
months for placebo-treated patients.

Video: http://www.provenge.com/how-provenge-works.aspx (3 min)
Re-engineered T-cells kill B-cells affected by chronic
lymphocytic leukemia
Tiny magnetic beads force the larger T-cells to divide before they are infused into the patient.
http://www.nytimes.com/2011/09/13/health/13gene.html?pagewanted=all

To survive without B-cells, the patients need periodic infusions of IVIG (intravenous
immunoglobulin) - the pooled IgG antibodies extracted from the plasma of over one
thousand blood donors. IVIG's effects last between 2 weeks and 3 months.

IVIG is an infusion of IgG antibodies only. Therefore, peripheral tissues that are defended
mainly by IgA antibodies, such as the eyes, lungs, gut and urinary tract are not fully
protected by the IVIG treatment.

http://www.nytimes.com/2011/09/13/health/13gene.html?pagewanted=all

More to read about cancer immunotherapies:
http://www.ncbi.nlm.nih.gov/pubmed/20706612
http://www.ncbi.nlm.nih.gov/pubmed?term=20187092
Roadmap
Vaccines
Immune
system
Therapeutic
Vaccines
Epitope
Discovery
Three types of epitopes
 T-cell MHC class I
[Figure: Xenoreactive complex of AHIII 12.2 TCR bound to P1049 (ALWGFFPVLS)/HLA-A2.1, PDB: 1lp9. Labels: T-cell receptor (Vα, Vβ), MHC class I, β2-microglobulin]
 T-cell MHC class II
[Figure: Complex of a human TCR, influenza HA antigen peptide (PKYVKQNTLKLAT), and MHC class II. Labels: T-cell receptor (Vα, Vβ), MHC class II α, MHC class II β]
 B-cell or antibody epitopes
[Figure: HIV-1 envelope protein gp120 (core fragment) in complex with CD4 (N-terminal two-domain fragment) and antibody 17b (Fab fragment), PDB: 1gc1. Labels: epitope, 17b epitope]
[Figure: B cell (magenta, orange) and T cell epitopes (blue, green, red) of lysozyme, PDB: 1dpx]
Why know epitopes?

Vaccines - the epitope should elicit a T-cell
response and/or the production of antibodies
that neutralize the pathogen

Diagnostics - the epitope should bind in vitro the
antibody being tested for
 Early diagnostics of infectious diseases: SARS (2004),
malaria, Chagas' disease, leishmaniasis (2003), Lyme
disease (2005)
 Autoimmune diseases: lupus, rheumatoid arthritis
 Allergic reactions
Data for epitope discovery
Pathogen Databases

HIV databases

Pathogen Database at Los Alamos National Laboratory - sequences of oral pathogens (18 bacteria and 5 viruses)

Influenza Virus Resource at NCBI

NMPDR – National Microbial Pathogen Data Resource - sequences of 670 bacterial, 44 archaeal, and 29
eukaryotic genomes

Airborne Pathogen Database - 27 pathogens

Six Bioinformatics Resource Centers (BRCs)

Virulence Factor Databases
Data for epitope discovery
Immune Genes and Diseases

IMGT/GENE-DB - IG and TR genes from human, mouse, rat
and rabbit

IMGT/LIGM-DB - IG and TR genes, > 250 species

IPD databases @EBI – other genes

IPD-MHC (include IMGT/MHC-NHP) (@EBI) – MHC alleles
for non-human species

IMGT/HLA (@EBI) – HLA (human MHC) class I and II alleles

HPTAA - Potential tumor-associated antigens

Allele Frequency Database
MHC (Major Histocompatibility Complex), aka
HLA (Human Leukocyte Antigen) in humans

The HLA complex contains more than 220 genes

Most humans are heterozygous and express two variants of each of three MHC
class I molecules (two alleles each of the HLA-A, -B, and -C genes) and of three MHC class II
molecules (HLA-DR, HLA-DP, HLA-DQ), one inherited from each parent

Different species have different numbers of active MHC genes;
e.g., the rhesus macaque has 22 MHC class I genes
HLA genes are the most polymorphic in the
human genome
 How many HLA alleles are known?
 7,059 (5,674 a year ago; 4,161 two years ago)
 5,468 HLA class I (4,383 a year ago; 3,007 two years ago)
 1,591 HLA class II (1,291 a year ago; 1,154 two years ago)
Data from IMGT/HLA @EBI
Populations differ by allele frequencies
Why?
MHC polymorphism confers a population susceptibility to
a wide range of diseases and pathogens
Why know allele frequencies by population?

Population-optimized diagnostic tests:
 Designing reagents for HLA-typing, such as primers or probes

Population-optimized epitope-based vaccines:
 A vaccine should be effective for a sufficiently large
percentage of a given population
 At the same time, it should contain a minimum number of
epitopes to keep down the cost of approval, quality control,
production, etc.
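The trade-off between population coverage and the number of epitopes can be sketched as a greedy selection. Everything below is an illustrative assumption: a single HLA locus, Hardy-Weinberg proportions (an individual is missed only if both chromosomes carry uncovered alleles), made-up allele frequencies, and made-up epitope-to-allele restrictions.

```python
def coverage(covered_alleles, allele_freq):
    """Fraction of a population carrying at least one covered allele.

    Assumes one locus in Hardy-Weinberg proportions.
    """
    f = sum(allele_freq[a] for a in covered_alleles)
    return 1.0 - (1.0 - f) ** 2


def greedy_epitope_set(epitope_alleles, allele_freq, target):
    """Add, one at a time, the epitope that raises coverage the most."""
    chosen, covered = [], set()
    while coverage(covered, allele_freq) < target:
        best = max(epitope_alleles,
                   key=lambda e: coverage(covered | epitope_alleles[e],
                                          allele_freq))
        if epitope_alleles[best] <= covered:  # nothing new can be added
            break
        chosen.append(best)
        covered |= epitope_alleles[best]
    return chosen, coverage(covered, allele_freq)


# Hypothetical frequencies and restrictions, for illustration only.
allele_freq = {"A*02": 0.25, "A*24": 0.15, "A*01": 0.10, "A*03": 0.10}
epitope_alleles = {
    "ep1": {"A*02"},
    "ep2": {"A*24", "A*01"},
    "ep3": {"A*03"},
    "ep4": {"A*02", "A*03"},
}

chosen, cov = greedy_epitope_set(epitope_alleles, allele_freq, target=0.7)
print(chosen, round(cov, 2))
```

A greedy choice is not guaranteed to give the smallest possible epitope set, but it is simple and usually close.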
Most of the highly polymorphic MHC residues are in the peptide binding pocket
A*02 vs A*24: 14 of 26 polymorphic residues bind peptide
Phe9 in A*02 interacts with Ile of GILGFVFTL
Ser9 in A*24 interacts with Tyr of VYGFVRACL
The mutation from bulky Phe9 in A*02 to small Ser9 in A*24 makes the HLA binding pocket deeper and able to accommodate bulky Tyr
Data for epitope discovery
Immune Epitopes

IEDB database

AntiJen database

HIV Los Alamos database
Rotation Student Project:
Further Development of EpitopeViewer
(Beaver J., Bourne P., Ponomarenko J. BMC Immunome Research 2008)

For the structures of TCR-MHC-peptide complexes, visualize
interactions between TCR-MHC, TCR-peptide, and MHC-peptide

Visualize CDR regions of antibodies and TCR

Visualize the user’s submitted data through the web

Make it in JMOL
Prediction of MHC class I epitopes
Each step of the MHC class I pathway (intracellular pathogen → cytosolic protein → proteasome → peptides → TAP → ER → MHC I → TCR and CD8 of a CTL) is a prediction target:
• Proteasomal cleavage sites (several methods exist, based on a small amount of in vitro data)
• Peptide-TAP binding (ibid.)
• Peptide-MHC binding
• Prediction of pMHC-TCR binding
Measuring and predicting MHC class I binding peptides
Sequence     IC50
QIVTMFEAL    3.6
LKGPDIYKG    308
NFCNLTSAF    50,000
AQSQCRTFR    38,000
CTYAGPFGM    143
CFGNTAVAK    50,000
...
Predicting binding peptides means finding a function Fi such that
Fi(Sequence) ≈ Affinity
log(IC50) ~ binding free energy
low IC50 = high affinity
The half maximal inhibitory concentration (IC50) is a measure of
the effectiveness of a compound (peptide) in inhibiting a biological
or biochemical function (binding to MHC). It indicates how much of the
compound is needed to inhibit MHC binding by half.
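As a small illustration of the table above, the sketch below converts each IC50 to log10 and labels peptides as binders or non-binders. It assumes the IC50 values are in nM and uses the IC50 < 500 nM cutoff that appears on the later "Performance evaluation" slide; the variable names are hypothetical.

```python
import math

# (peptide, measured IC50) pairs from the slide; units assumed to be nM.
measurements = [
    ("QIVTMFEAL", 3.6), ("LKGPDIYKG", 308), ("NFCNLTSAF", 50_000),
    ("AQSQCRTFR", 38_000), ("CTYAGPFGM", 143), ("CFGNTAVAK", 50_000),
]
BINDER_THRESHOLD_NM = 500  # cutoff taken from the ROC slide


def log_ic50(ic50_nm):
    """log10 of IC50; lower values mean higher affinity."""
    return math.log10(ic50_nm)


# Print the strongest binders first.
for pep, ic50 in sorted(measurements, key=lambda m: m[1]):
    label = "binder" if ic50 < BINDER_THRESHOLD_NM else "non-binder"
    print(f"{pep}  IC50={ic50:>8}  log10={log_ic50(ic50):5.2f}  {label}")

binders = [p for p, ic50 in measurements if ic50 < BINDER_THRESHOLD_NM]
```

Working on the log scale is what lets a sum of per-residue contributions approximate affinity, since log(IC50) is roughly proportional to binding free energy.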
Calculate scoring matrix from affinities
Function F is a matrix
F(sequence) = sum of the matrix entries selected by the residues of 'sequence' S
→ Find the matrix that minimizes the differences F(S) – Affinity(S)
log (IC50)   Peptide
0.50         FQPQNGSFI
0.72         ISVANKIYM
2.37         RVYEALYYV
3.42         FQPQSGQFI
3.46         LYEKVKSQL
4.07         FKSVEFDMS
4.18         FQPQNGQFH
4.24         VLMLPVWFL
4.39         YMTLGQVVF
4.40         EDVKNAVGV
4.90         VFYEQMKRF
…
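The fitting step can be sketched as an ordinary least-squares problem: one weight per (position, amino acid) pair plus an offset, adjusted until F(S) matches the measured log(IC50). The sketch below uses plain stochastic gradient descent and pairs the eleven peptides with the eleven log(IC50) values in the order listed, which is an assumption about the slide layout. Eleven peptides cannot pin down a full 20 x 9 matrix, so this only shows the mechanics; practical methods train on far more data and add regularisation.

```python
# Training data from the slide: (peptide, log IC50), paired in listed order.
data = [
    ("FQPQNGSFI", 0.50), ("ISVANKIYM", 0.72), ("RVYEALYYV", 2.37),
    ("FQPQSGQFI", 3.42), ("LYEKVKSQL", 3.46), ("FKSVEFDMS", 4.07),
    ("FQPQNGQFH", 4.18), ("VLMLPVWFL", 4.24), ("YMTLGQVVF", 4.39),
    ("EDVKNAVGV", 4.40), ("VFYEQMKRF", 4.90),
]


def predict(weights, offset, peptide):
    """F(peptide) = offset + sum of matrix entries picked by the sequence."""
    return offset + sum(weights.get((pos, aa), 0.0)
                        for pos, aa in enumerate(peptide, start=1))


def fit(data, epochs=2000, lr=0.05):
    """Minimise sum (F(S) - Affinity(S))^2 by stochastic gradient descent."""
    weights, offset = {}, 0.0
    for _ in range(epochs):
        for peptide, target in data:
            err = predict(weights, offset, peptide) - target
            offset -= lr * err
            for pos, aa in enumerate(peptide, start=1):
                key = (pos, aa)
                weights[key] = weights.get(key, 0.0) - lr * err
    return weights, offset


weights, offset = fit(data)
worst = max(abs(predict(weights, offset, p) - y) for p, y in data)
print(f"largest training error: {worst:.4f}")
```

A near-zero training error here reflects overfitting to eleven points, not predictive power; held-out peptides are needed to judge a real matrix.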
HLA A*0201 scoring matrix (rows: amino acid; columns: peptide position)
      1     2     3     4     5     6     7     8     9
A  -0.3   0.8  -0.3  -0.3  -0.2  -0.3   0.0   0.0  -0.9
C   0.2   0.9   0.0   0.3  -0.5  -0.1   0.1   0.2   0.4
D   0.8   0.9  -0.4  -0.3   0.3   0.2   0.4   0.3   0.6
E   0.6  -0.4   0.7  -0.2   0.1  -0.4  -0.2  -0.2  -0.5
F  -1.3   0.5  -0.5   0.1  -0.1   0.0  -0.3  -0.4  -0.8
G  -0.2   0.1   0.3  -0.1   0.0   0.4   0.3  -0.1   0.2
H   1.1   0.9  -0.1   0.4   0.1   0.2   0.0   0.2   0.8
I  -0.4  -0.7  -0.4   0.1  -0.1  -0.4  -0.5   0.5  -1.4
K  -0.3   0.0   1.1   0.1   0.1   0.6   0.9   0.2   0.9
L   0.0  -1.9  -0.4  -0.2   0.0  -0.2   0.0  -0.1  -1.1
M  -0.7  -1.2  -0.7   0.2  -0.6   0.0   0.0   0.0  -0.8
N  -0.1   0.3   0.1  -0.3  -0.1  -0.3   0.0   0.2   0.7
P   1.2   0.5   0.6  -0.3   0.4   0.0  -0.4  -0.5   0.7
Q   0.4  -1.1   0.0  -0.1   0.4  -0.2  -0.3   0.2   0.7
R  -0.2   0.9   1.0   0.3   0.1   0.4   0.7   0.0   0.9
S  -0.3   0.1   0.1  -0.4   0.1   0.3  -0.2  -0.1   0.2
T  -0.2  -0.5   0.1   0.4   0.1  -0.5   0.2   0.0  -0.1
V  -0.1  -0.9  -0.1   0.2   0.0  -0.3   0.1   0.1  -1.9
W   0.0   0.7  -0.5  -0.2  -0.1   0.2  -0.3  -0.1   0.4
Y  -0.3   0.2  -0.6   0.2   0.0   0.4  -0.4  -0.3   0.8

Second matrix (only positions 1-3 are present in the source; no allele label)
      1     2     3
A  -0.3  -0.2   0.1
C   0.3   0.4   0.0
D   0.8   0.4   0.6
E   0.3   0.3   0.4
F   0.4   0.7  -0.5
G   0.2   0.4   0.2
H  -0.3   0.1   0.1
I  -0.4  -0.7  -0.3
K  -0.7   0.9   0.5
L  -0.4  -0.7  -0.3
M  -0.6  -1.0  -0.5
N   0.2   0.4  -0.4
P   0.6   0.5   0.4
Q   0.0  -0.7  -0.1
R  -0.7   0.9   0.2
S  -0.3  -0.5   0.0
T   0.3  -1.2   0.3
V  -0.1  -0.5   0.0
W   0.5   0.3  -0.5
Y   0.2   0.3  -0.5
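Applying a scoring matrix is a table lookup and a sum. The sketch below uses the nine-column matrix from this slide, read as rows A to Y and columns for positions 1 to 9, with lower scores meaning lower predicted log(IC50). That layout and the absence of an offset term are assumptions, so the scores are only meaningful relative to one another.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# One row per amino acid (in AMINO_ACIDS order), one column per position 1-9.
MATRIX_ROWS = """
-0.3  0.8 -0.3 -0.3 -0.2 -0.3  0.0  0.0 -0.9
 0.2  0.9  0.0  0.3 -0.5 -0.1  0.1  0.2  0.4
 0.8  0.9 -0.4 -0.3  0.3  0.2  0.4  0.3  0.6
 0.6 -0.4  0.7 -0.2  0.1 -0.4 -0.2 -0.2 -0.5
-1.3  0.5 -0.5  0.1 -0.1  0.0 -0.3 -0.4 -0.8
-0.2  0.1  0.3 -0.1  0.0  0.4  0.3 -0.1  0.2
 1.1  0.9 -0.1  0.4  0.1  0.2  0.0  0.2  0.8
-0.4 -0.7 -0.4  0.1 -0.1 -0.4 -0.5  0.5 -1.4
-0.3  0.0  1.1  0.1  0.1  0.6  0.9  0.2  0.9
 0.0 -1.9 -0.4 -0.2  0.0 -0.2  0.0 -0.1 -1.1
-0.7 -1.2 -0.7  0.2 -0.6  0.0  0.0  0.0 -0.8
-0.1  0.3  0.1 -0.3 -0.1 -0.3  0.0  0.2  0.7
 1.2  0.5  0.6 -0.3  0.4  0.0 -0.4 -0.5  0.7
 0.4 -1.1  0.0 -0.1  0.4 -0.2 -0.3  0.2  0.7
-0.2  0.9  1.0  0.3  0.1  0.4  0.7  0.0  0.9
-0.3  0.1  0.1 -0.4  0.1  0.3 -0.2 -0.1  0.2
-0.2 -0.5  0.1  0.4  0.1 -0.5  0.2  0.0 -0.1
-0.1 -0.9 -0.1  0.2  0.0 -0.3  0.1  0.1 -1.9
 0.0  0.7 -0.5 -0.2 -0.1  0.2 -0.3 -0.1  0.4
-0.3  0.2 -0.6  0.2  0.0  0.4 -0.4 -0.3  0.8
"""

MATRIX = {
    aa: [float(x) for x in row.split()]
    for aa, row in zip(AMINO_ACIDS, MATRIX_ROWS.strip().splitlines())
}


def score(peptide):
    """Sum the matrix entry for each residue at its position (9-mers only)."""
    if len(peptide) != 9:
        raise ValueError("this matrix scores 9-mer peptides only")
    return sum(MATRIX[aa][pos] for pos, aa in enumerate(peptide))


print(round(score("GILGFVFTL"), 1))  # -3.2
```

The strongly negative entries for L and M at position 2 and for V, I, and L at position 9 are what drive a low score for a peptide such as GILGFVFTL.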
Performance measure for prediction methods
The predicted score (binding affinity value) is compared with a score threshold:
TP + FN = actual binders (based on a defined threshold on binding affinity values)
TN + FP = actual non-binders (ibid.)
Sensitivity = TP / (TP + FN) = 6/7 = 0.86
Specificity = TN / (TN + FP) = 6/8 = 0.75
[Figure: ROC curve. x-axis: false positive rate, FP / (FP + TN); y-axis: true positive rate, TP / (TP + FN); the area under the curve is the AUC (or AROC)]
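A minimal sketch of these measures follows. The scores and labels are invented so that one threshold reproduces the slide's counts (6 of 7 binders and 6 of 8 non-binders classified correctly). The AUC is computed as the probability that a randomly chosen binder outscores a randomly chosen non-binder, which equals the area under the ROC curve.

```python
def confusion(scores, is_binder, threshold):
    """Count TP, FP, FN, TN when 'score >= threshold' predicts a binder."""
    pairs = list(zip(scores, is_binder))
    tp = sum(s >= threshold and b for s, b in pairs)
    fp = sum(s >= threshold and not b for s, b in pairs)
    fn = sum(s < threshold and b for s, b in pairs)
    tn = sum(s < threshold and not b for s, b in pairs)
    return tp, fp, fn, tn


def auc(scores, is_binder):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for s, b in zip(scores, is_binder) if b]
    neg = [s for s, b in zip(scores, is_binder) if not b]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted scores: 7 actual binders, then 8 actual non-binders.
scores = [9, 8, 7, 6, 5.5, 5.2, 3,   6.5, 5.1, 4, 3.5, 2.5, 2, 1.5, 1]
is_binder = [True] * 7 + [False] * 8

tp, fp, fn, tn = confusion(scores, is_binder, threshold=5)
print("sensitivity:", round(tp / (tp + fn), 2))
print("specificity:", round(tn / (tn + fp), 2))
print("AUC:", auc(scores, is_binder))
```

Sensitivity and specificity depend on the chosen threshold, while the AUC summarises performance over all thresholds.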
Benchmarking predictions of peptide binding to MHC I
(Peters et al. PLoS Comput Biol. 2006 Jun 9;2(6):e65)

48 MHC class I alleles

Length of peptides 8 – 11 aa

48,828 data points {peptide – affinity value}

20 different methods were evaluated
Performance evaluation
[Figure: measured IC50 [nM] (log scale, 1 to 100,000) vs. predicted score for syfpeithi (r2 = 0.29) and bimas (r2 = 0.48)]
ROC: Predict binders with IC50 < 500 nM
[Figure: ROC curves, true positive rate vs. false positive rate: bimas (AUC = 0.920), syfpeithi (AUC = 0.871), random (AUC = 0.5)]
Consensus Rank
             IC50          Rank
Peptide      ann    smm    ann   smm   consensus
LTDLGLLYT    2      3      1     1     1
CSANNSHHY    20     33     2     2     2
LSIRGNSNY    80     137    3     4     3.5
FSDQIEQEA    189    89     4     3     3.5
QSSINISGY    200    4920   5     6     5.5
LSDSSGVEN    1400   403    6     5     5.5
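The consensus column in this table can be reproduced by ranking the peptides separately under each method and averaging the two ranks. The sketch below does that with the slide's own IC50 values; the function names are hypothetical.

```python
def ranks(values):
    """Rank 1 = lowest IC50 (strongest predicted binder); ties share a mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


peptides = ["LTDLGLLYT", "CSANNSHHY", "LSIRGNSNY",
            "FSDQIEQEA", "QSSINISGY", "LSDSSGVEN"]
ic50_ann = [2, 20, 80, 189, 200, 1400]
ic50_smm = [3, 33, 137, 89, 4920, 403]

consensus = [(a + s) / 2
             for a, s in zip(ranks(ic50_ann), ranks(ic50_smm))]
for pep, c in zip(peptides, consensus):
    print(pep, c)
```

Averaging ranks rather than raw IC50 values keeps one method's outliers (such as 4920 for QSSINISGY) from dominating the combined ordering.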
Consensus works best
MHC          SMM     ANN     ANN+SMM
H-2_Db       0.912   0.933   0.933
H-2_Dd       0.853   0.925   0.910
H-2_Kd       0.936   0.939   0.949
H-2_Kk       0.770   0.790   0.796
HLA_A-0201   0.952   0.957   0.956
HLA_A-3001   0.941   0.947   0.952
HLA_A-6802   0.898   0.899   0.903
HLA_B-0702   0.964   0.965   0.966
HLA_B-0801   0.943   0.955   0.959
HLA_B-1501   0.952   0.941   0.952
HLA_B-2705   0.940   0.938   0.944
HLA_B-3501   0.889   0.875   0.889
HLA_B-5101   0.868   0.886   0.888
HLA_B-5301   0.882   0.899   0.902
HLA_B-5401   0.921   0.903   0.921
HLA_B-5801   0.964   0.961   0.966
Summary on peptide-MHC class I binding

Large, quantitative peptide-MHC binding datasets
available

Consensus approach gives AUC > 0.90

Top 1% predicted binders are actual MHC class I
epitopes (based on testing of predicted epitope binding
against CD8+ T-cell response in mice infected by
vaccinia virus)
MHC class II epitope prediction: Challenges

The epitope length is 9-37 aa

The peptide may have a nonlinear conformation

The MHC binding groove is open at both ends, and residues outside
the groove are known to affect peptide binding
[Figure: Complex of a human TCR, influenza HA antigen peptide (PKYVKQNTLKLAT), and MHC class II. Labels: T-cell receptor (Vα, Vβ), MHC class II α, MHC class II β]
Benchmarking predictions of peptide binding to MHC II
(Wang et al. PLoS Comput Biol. 2007)

16 alleles

10,017 data points {peptide – affinity value}

9 different methods were evaluated: 6 matrix-based,
2 SVM, 1 QSAR-based

AUC values varied from 0.5 to 0.83

Comparison with 29 X-ray structures of peptide-MHC
II complexes (14 different alleles):
 The success of the binding core recognition was 21%-62%
Ab initio structure-based prediction of peptide-MHC class II binding
(Zhang et al., PLOS One 2010)

Statistical pair potential (Zhang, DTU)

Molecular dynamics (Wang, LIAI)

Contact maps (Nikitas Papangelopoulos, UCSD)

Benchmarked on 3,882 experimentally measured peptide-HLA DRB1*0101 binding affinities

The low performance may stem from the complex nature of
peptide-MHC class II interactions:

Long peptides

Contribution of flanking amino acids to binding

Antigen processing
MD for peptide-MHC class II binding

See review
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2981876

Example: HLA-DP2-peptide binding
(http://www.ncbi.nlm.nih.gov/pubmed/21898654): 247 peptide-HLA complexes (all possible mutations for 13 positions) were
simulated to obtain the parameters of the binding matrix (20 aa
x 13 peptide positions); applied to 457 peptides with known binding
affinities to HLA-DP2, the method gave better predictions than
existing sequence-based methods
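The count of 247 complexes is consistent with 13 positions times 19 possible substitutions per position; that reading is an assumption, not something the slide states. The sketch below enumerates such single-substitution variants for a 13-mer. The template is the influenza HA peptide from an earlier slide, used only as an example and not necessarily the peptide simulated in the cited study.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_substitutions(peptide):
    """Yield every peptide that differs from `peptide` at exactly one position."""
    for i, original in enumerate(peptide):
        for aa in AMINO_ACIDS:
            if aa != original:
                yield peptide[:i] + aa + peptide[i + 1:]


template = "PKYVKQNTLKLAT"  # example 13-mer, not the study's peptide
variants = list(single_substitutions(template))
print(len(variants))  # 13 positions x 19 substitutions = 247
```

Each variant would correspond to one simulated complex, and the resulting binding estimates would fill in a 20 x 13 matrix one column at a time.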
Summary on peptide-MHC class II binding

Large, quantitative peptide-MHC binding datasets available

But prediction is still poor

New methods relying on both structural and peptide-MHC binding
data should improve the prediction.
Prediction of peptide-MHC I binding from
sequence, binding, and structural data
(Jojic et al., Bioinformatics, 2006)

Method: threading of a peptide sequence onto 3D-structure of a
complex of other peptide with the same or similar (by sequence)
HLA molecule combined with machine learning on binding data

Results:
• The method outperformed all other sequence-based methods,
except ANN method (Nielsen et al. 2003) for some alleles.
• The method outperformed ANN when the available training
data for an allele was small; e.g., for B*4002 allele (119 data
points) it gave AUC of 0.82 vs. 0.75 (ANN).
Methods for antibody epitope prediction

Sequence-based (suitable for linear epitopes only): maximum sensitivity of sequence-based methods is
59%; maximum AUC is ~0.60

Structure-based (antibody binding site prediction for a
protein of a given 3D structure)

Epitope mapping using peptide libraries with following
reconstruction of the epitope on the surface of protein
3D structure (if known or can be modeled)
Prediction methods performance measures
TP = 6, FN = 10, TN = 127, FP = 13
sensitivity = TP / (TP + FN) = 6/16 = 0.38
specificity = 1 – FP / (TN + FP) = 1 – 13/140 = 0.91
[Figure: ROC curve. x-axis: false positive rate, FP / (FP + TN); y-axis: true positive rate, TP / (TP + FN); the area under the curve is the AUC]
For a single operating point, Area Under ROC Curve (AUC) =
0.5*(sensitivity + specificity) = 0.64
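The arithmetic on this slide can be checked with a few lines. The sketch below takes the four counts shown (TP = 6, FN = 10, TN = 127, FP = 13) and computes sensitivity, specificity, and their average, which is the AUC of a classifier evaluated at a single threshold. The function name is hypothetical.

```python
def single_point_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and their mean (AUC for one operating point)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, 0.5 * (sensitivity + specificity)


sens, spec, auc_point = single_point_metrics(tp=6, fn=10, tn=127, fp=13)
print(f"sensitivity = {sens:.3f}")
print(f"specificity = {spec:.3f}")
print(f"AUC         = {auc_point:.3f}")
```

Averaging the two rates gives equal weight to epitope and non-epitope residues even though non-epitope residues are far more numerous.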
Benchmark of the methods on 42 X-ray structures
of antibody-protein complexes
Method                      AUC
Random method               0.50
PatchDock 1st model         0.58
DOT 1st model               0.59
CEP average                 0.54
DiscoTope                   0.60
PEPITO                      0.63
Rubinstein et al., 2008     0.65
ElliPro average             0.53
ElliPro best                0.73
new method                  0.75
"Ideal" method              (no value given; the AUC axis runs from 0 to 1)
Ponomarenko et al., BMC Bioinformatics, 2008
ElliPro prediction for Plasmodium vivax ookinete surface
protein Pvs25 [PDB:1Z3G, chain A]
The method basics

Actual epitope from the structure of
antibody-protein complex

Generated Epitope - surface residues
inside the sphere of radius R with the
center at the actual epitope

Non-epitopes are generated
randomly on the rest of protein
surface with the sphere of radius R
Propensity of polar residues
discriminated epitopes versus non-epitopes
[Figure: chart comparing epitopes and non-epitopes; both axes run from 0 to 0.35]
[number of A in the epitope]/[number of all residues in the epitope]
[number of A on the surface]/[number of all residues on the surface]
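One way to read the two bracketed expressions is as the numerator and denominator of a propensity: the frequency of residue type A in the epitope divided by its frequency on the whole protein surface. The sketch below computes that ratio under this assumption, using made-up residue lists.

```python
from collections import Counter


def frequency(residues, aa):
    """[number of aa in residues] / [number of all residues]."""
    return Counter(residues)[aa] / len(residues)


def propensity(epitope_residues, surface_residues, aa):
    """Epitope frequency of aa relative to its surface frequency."""
    return frequency(epitope_residues, aa) / frequency(surface_residues, aa)


# Hypothetical one-letter residue lists, for illustration only.
epitope = list("NKDNSTKN")
surface = list("NKDNSTKNAGLVAGELSTPA")

print(round(propensity(epitope, surface, "N"), 2))
```

A value above 1 means the residue type is enriched in the epitope relative to the rest of the surface; a value below 1 means it is depleted.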
Naïve Bayes classifier
Low success rate of epitope predictions:
Reasons & Perspectives
The assumption “The epitope is a property of the antigen” is wrong
Epitopes cover ~75% of a lysozyme surface
The epitope needs to be considered in the context of a specific antibody
There are not enough data to carry out statistical analysis
of protein residue - antibody residue preferences
[Figure: the number of X-ray structures of protein-antibody complexes in PDB, 1997-2012; y-axis from 0 to 350]
Slide annotations: "< 100 representative structures" (first build), "< 200 representative structures" (second build)
Framework for predicting epitopes in the
context of specific antibodies

Select a pathogen(s)

Obtain the panel of mAbs targeting the pathogen

For each mAb,
 Identify antigens
 Measure affinity
 Determine epitope(s), utilizing functional and structural assays
 Determine neutralization capacity in vitro and in vivo

Analyze data and develop algorithms for predicting
epitope and paratope for a given antigen-antibody pair
Converging on an HIV Vaccine
Critical Ab-antigen interactions are similar
Summary

Knowledge of epitopes is essential for the development of
vaccines and diagnostics

The problem of epitope prediction is far from solution
Supplement slides
Videos
1. http://www.youtube.com/watch?v=M48qu5c7Cfg&NR=1 (2 min)
2. http://www.youtube.com/watch?v=qGsyBwDVnTU&feature=related (3m 30s)
3. http://www.youtube.com/watch?v=UtNeImBmQCM&feature=related (3m 30s)
4. http://www.provenge.com/how-provenge-works.aspx (~3 min)
Recommended Books

Immunological Bioinformatics, Ole Lund et al., MIT Press, 2005

Immunoinformatics: Predicting Immunogenicity In Silico, Ed.:
Darren Flower, Humana Press, 2007

In Silico Immunology, Eds.: Darren Flower & Jon Timmis,
Springer, 2007

Bioinformatics for Vaccinology, Darren Flower, Wiley-Blackwell,
2008
Recommended Journals

Immunome Research

Nucleic Acids Research

BMC Immunology

Journal of Molecular Recognition

Immunogenetics

Vaccine

Journal of Immunology

Molecular Immunology

Bioinformatics

Drug Discovery Today

BMC Bioinformatics

Applied Bioinformatics

BMC Structural Biology

In Silico Biology

PLoS Computational Biology

International Journal of Immunogenetics

Immunity

Methods in Molecular Biology

PLoS One

Biosystems
Immunological synapse
T-cell-antigen recognition and the immunological synapse
Johannes B. Huppa & Mark M. Davis
Nature Reviews Immunology 3, 973-983 (December 2003)
Immunological synapse: 25–30 peptide–MHC complexes were
required at the interface to induce T cells
Immunology
Volume 133, Issue 4, pages 420-425, 1 JUN 2011 DOI: 10.1111/j.1365-2567.2011.03458.x
http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2567.2011.03458.x/full#f1
Large-scale molecular dynamics simulation of a membrane-embedded TCR–pMHC–CD4 complex at the atomistic level

The TCR–pMHC–CD4 complex is composed of
two crystal structures: the CD4 four domain
molecule [PDB:1WIO] and the TCR–pMHCII
complex [PDB:1FYT].

The CHARMM and VMD packages were used
for preparing the initial molecular models and
analyzing the simulation data. Modeller was
used to build the transmembrane and
extracellular loops missing in the X-ray
structures. Explicit solvent molecular dynamics
simulations were performed using NAMD.

The computed structural and thermodynamic
properties were in fair agreement with
experiment.
http://www.ncbi.nlm.nih.gov/pubmed/17980430