Integrative models of the hepatitis C virus infection

Integrative models of the hepatitis C virus infection: Modeling wicked problems
Presenter:
James Lara, Ph.D.
Centers for Disease Control and Prevention
Division of Viral Hepatitis
1600 Clifton Road
Atlanta, GA 30333
jlara@cdc.gov
History of Epidemiology*
John Snow
Broadwick Street cholera outbreak, London 1854
• Founding event for Computational Epidemiology.
• Ability to abstractly recognize a pattern without bias.
• Predicting the daily weather is easier than predicting disease.
• Public Health Science has greatly impacted life expectancy.
Average lifespan (years)‡: worldwide 37.5 (1900) and 67.2 (2010); USA 48.3 (1900) and 78.2 (2010).
‡ Sources: Am J Clin Nutr, 1992; 55: 1196S-1202S; and CIA World Factbook.
* Chris Lynberg; www.ipdps.org/ipdps2010/ipdps2010-slides/ipdps-presentations.org (with permission)
History of CDC
1942: Office of Malaria Control during WWII.
1947: CDC employees purchase campus from Emory for $10 with Robert Woodruff gift.
1957: Inclusion of STD prevention.
1960: Inclusion of TB prevention.
1963: Immunization program is established.
1980: Renamed the Centers for Disease Control (CDC).
1992: Renamed the Centers for Disease Control and Prevention.
2010: Total workforce of 15,000 (8,500 FTEs); FY budget of $6.8B; presence in 50 states and 45 countries.
Source: www.cdc.gov
CDC Organization Chart (2010)
CDC's primary goals: prevention of illness, disability, and death
Model of long-term national productivity benefits from reduced daily intake of calories & sodium in the US.†
• Comorbidities increase the probability of limitations that prevent work.
• The estimated long-term benefit of reduced sodium intake is $108.5B.
• Facilitates planning by federal agencies.
• Helps inform public health policy and the business case.
• For every $1 spent on wellness programs, the return is $4.56–$4.73*.
† Source: Dall et al., Am J Health Promot. 2009 Jul/Aug; 23(6): 423-430.
* Source: Ozminkowski et al., Am J Health Promot. 1999 Sep/Oct; 14(1): 31-43.
Viral Hepatitis
• Viral hepatitis is liver inflammation caused by viruses.
• Viral hepatitis is the leading cause of liver cancer and the most common reason for liver transplantation.
• Specific hepatitis viruses have been labeled A, B, C, D, E, F, and G.
• The most common types are Hepatitis A, Hepatitis B, and Hepatitis C.
• Hepatitis C is the major cause of chronic liver disease and cirrhosis in the US.
Viral Hepatitis C
• Viral hepatitis C is caused by infection with the hepatitis C virus (HCV).
• Clinical manifestations: acute and chronic infection.
• Six HCV genotypes (1–6).
• Evolves as a quasispecies (QS).
• Combination therapy: interferon and ribavirin.
• Treatment efficacy varies by HCV genotype and by the patient's tolerance.
• No vaccine is available for Hepatitis C.
Hepatitis C Virus (HCV)
• RNA genome: ~9,600 bases
• Polyprotein: 3,011 amino acids
• Mechanisms of persistent HCV infection are not well understood:
  - insufficient immune response
  - virus–host interactions
  - high genetic variability
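The two lengths on this slide can be related with simple codon arithmetic. The sketch below assumes (our assumption, not stated on the slide) that the polyprotein is encoded by a single open reading frame of 3,011 sense codons plus one stop codon; because the genome length is given only approximately, the non-coding remainder is approximate as well.

```python
# Back-of-the-envelope check relating the genome and polyprotein lengths.
# Assumptions (ours): a single ORF of 3,011 sense codons plus one stop codon;
# the ~9,600-base genome length is approximate, so the remainder is approximate.

GENOME_LENGTH_NT = 9600        # approximate HCV RNA genome length (slide)
POLYPROTEIN_LENGTH_AA = 3011   # HCV polyprotein length (slide)
NT_PER_CODON = 3


def orf_length_nt(protein_length_aa: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode a protein of the given length."""
    codons = protein_length_aa + (1 if include_stop else 0)
    return codons * NT_PER_CODON


coding_nt = orf_length_nt(POLYPROTEIN_LENGTH_AA)   # 9036
noncoding_nt = GENOME_LENGTH_NT - coding_nt        # ~564, roughly the 5' and 3' UTRs
coding_fraction = coding_nt / GENOME_LENGTH_NT     # ~0.94

print(f"ORF: {coding_nt} nt; untranslated regions: ~{noncoding_nt} nt")
print(f"Coding fraction: {coding_fraction:.1%}")
```

Under these assumptions roughly 94% of the genome is protein-coding, and only a few hundred bases are left for the untranslated regions.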
Hepatitis C virus (HCV) infection is the most common chronic bloodborne infection and a major public health problem in the US
Figure: Disease burden from HCV in the US (2002–2007)*: no. of acute clinical cases, estimated no. of acute clinical cases, and estimated no. of new infections, by year.
Figure: Clinical characteristics of acute HCV (2007)*: percentage of patients who had jaundice, were hospitalized for hepatitis, or died from hepatitis, by age group.
No. of chronically infected persons: 2.7–3.9 million
Annual no. of chronic liver disease deaths: 12,000
Chronic infection develops in 70%–85% of HCV-infected persons; 60%–70% of chronically infected persons have evidence of active liver disease.
*http://www.cdc.gov/hepatitis/HCV/StatisticsHCV.htm
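The two percentages in the last statement are conditional on each other, so they can be chained. A minimal sketch, assuming the low ends of both ranges go together (and likewise the high ends); the cohort size of 100 is arbitrary:

```python
# Chain the two conditional ranges quoted on the slide.
# Assumption (ours): low ends go with low ends and high ends with high ends.

P_CHRONIC = (0.70, 0.85)                 # P(chronic | infected)
P_ACTIVE_GIVEN_CHRONIC = (0.60, 0.70)    # P(active liver disease | chronic)


def expected_counts(n_infected: int) -> dict:
    """Expected (low, high) number of persons at each stage in a cohort."""
    chronic = tuple(n_infected * p for p in P_CHRONIC)
    active = tuple(c * p for c, p in zip(chronic, P_ACTIVE_GIVEN_CHRONIC))
    return {"chronic": chronic, "active_liver_disease": active}


for stage, (low, high) in expected_counts(100).items():
    print(f"{stage}: {low:.1f}-{high:.1f} per 100 infected persons")
```

Under that assumption, roughly 42 to 60 of every 100 infected persons would have chronic infection with active liver disease.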
Intravenous drug use (IDU) and multiple sex partners are the major risk factors associated with HCV infection
Trends in epidemiology among patients with acute HCV in the US (2001-2007)*
*http://www.cdc.gov/hepatitis/HCV/StatisticsHCV.htm
The clinical prognosis and treatment outcome of HCV infection depend on many viral and host factors.
Distribution of genotypes according to demographic characteristics among chronically HCV-infected patients in the US (1988–1994)†*
Figure: weighted percentages of genotypes 1, 2, and 3 by age group (6–29, 30–39, 40–49, 50–59, >60 years), gender (male, female), and ethnicity (Caucasian, Afro-AM, Mex-AM).
HCV RNA concentrations among chronically infected patients by genotype and demographic characteristics (1988–1994)‡*
Figure: weighted GMC of HCV RNA (IU/ml × 10^6) by genotype (1a, 1b, 2&3), age (<40, ≥40 years), gender (male, female), and ethnicity (Caucasian, Afro-AM, Mex-AM).
† Weighted percentages by genotype; ‡ weighted geometric mean concentrations (GMC); * Source: O.V. Nainan et al., Gastroenterology 2006; 131: 478-484.
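The weighted geometric mean concentration in footnote ‡ averages on the log scale, which suits viral loads that span several orders of magnitude. Assuming the standard definition, GMC = exp(Σ wᵢ ln cᵢ / Σ wᵢ), where wᵢ is the sampling weight of person i and cᵢ their HCV RNA concentration, a minimal sketch follows; the function name and example values are hypothetical and are not data from Nainan et al.

```python
import math
from typing import Sequence


def weighted_gmc(concentrations: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted geometric mean concentration: exp(sum(w * ln c) / sum(w)).

    concentrations: HCV RNA levels (e.g. IU/ml); must all be > 0.
    weights: non-negative sampling weights (e.g. survey weights).
    """
    if len(concentrations) != len(weights):
        raise ValueError("concentrations and weights must have the same length")
    if any(c <= 0 for c in concentrations):
        raise ValueError("geometric means require positive concentrations")
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive number")
    log_mean = sum(w * math.log(c) for c, w in zip(concentrations, weights)) / total_w
    return math.exp(log_mean)


# Hypothetical values (not from the study): the middle person counts twice.
levels = [1e5, 1e6, 1e7]   # IU/ml
weights = [1.0, 2.0, 1.0]
print(f"{weighted_gmc(levels, weights):.3e}")  # 1.000e+06
```

Because the mean is taken over logarithms, a single very high viral load shifts the GMC far less than it would shift an arithmetic mean.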
Integrative Molecular Epidemiology Concept
Historical approach: viral factors (phylogenetics, mutation rates, molecular determinants, genotype, etc.) and host factors (immunological, demographic, genetic, and other risk factors) are each linked separately to HCV infection (pathogenicity, virulence, clinical outcome, therapy response, etc.), yielding an assessment of risk factors.
Integrative epidemiology: virus factors (genome, quasispecies) and host factors (demographic, immunological, genetic), together with their interactions, are integrated (SARs) for outcome prediction of HCV infection (predisposition, susceptibility, prognosis, therapy outcome).
• Ultimate goal: accurate quantitative models for outcome prediction.
Historical approach
• Accounts for trends within a population.
• Does not take into account:
  - genetic variability of individuals within a population
  - genetic variability of viral strains within an individual
• Unsuitable for individual outcome prediction:
  - How will a patient respond to a medication?
Towards individualized & tailored care and prevention
Integrative epidemiology:
• Takes into account:
  - genetic variability of an individual within a population
  - genetic variability of viral strains within an individual
• Takes advantage of high-throughput technologies (molecular profiling, proteomics, genetic testing, etc.).
• Is suitable for outcome prediction.
• The right treatment for the right person at the right time.
• Is required for effective public health intervention (disease eradication).
Public Health Intervention: “A double-edged sword”
• 1910s: Mass vaccination to eradicate sleeping sickness (using 5 syringes).
• 1966: A programme to eradicate smallpox began in West and Central Africa (using jet injectors).
• 1970: The last case of smallpox in the region is reported.
• 1966–1972: >28M children (1–6 years of age) received measles vaccination.
• 1997: The use of jet injectors is stopped.
• 2010: Models indicate that the prevalence of HBV genotype E is due to these interventions.
Public Health Intervention: “A double-edged sword”
• Egypt has the highest prevalence of HCV in the world.
• It also has the highest morbidity and mortality from chronic liver disease, cirrhosis, and hepatocellular carcinoma.
• The high degree of homogeneity of HCV subtypes (4a) is probably due to mass parenteral treatment campaigns against schistosomiasis.
Figure: Schistosomiasis life cycle. Source: World Health Organization (WHO).
Public Health Intervention: “A double-edged sword”
• Intervention may lead to the selection of more resistant and virulent strains.
• Disproportionate decreases in incidence and deaths.
• Increase in the morbidity and mortality of the disease.
• Accurate models (e.g., probabilistic models) can estimate the long-term effects of intervention on disease burden and support the design of optimal strategies for eradication.
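To make the first bullet concrete, here is a deliberately simple, deterministic haploid-selection recurrence (our choice of illustration; the slide calls for probabilistic models, which would add stochasticity on top of this). The resistant-strain frequency p and all fitness values are hypothetical, and the intervention is modeled only as lowering the relative fitness of the sensitive strain.

```python
# Toy haploid selection model (an illustration, not a model from this work):
# p is the frequency of a hypothetical intervention-resistant strain.


def next_frequency(p: float, w_resistant: float, w_sensitive: float) -> float:
    """One generation of selection: p' = p*w_r / (p*w_r + (1-p)*w_s)."""
    mean_fitness = p * w_resistant + (1.0 - p) * w_sensitive
    return p * w_resistant / mean_fitness


def simulate(p0: float, generations: int, w_resistant: float, w_sensitive: float) -> list[float]:
    """Resistant-strain frequency trajectory, starting at p0."""
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(next_frequency(trajectory[-1], w_resistant, w_sensitive))
    return trajectory


# Hypothetical parameters: the resistant strain starts at 1%; the intervention
# lowers sensitive-strain fitness to 0.8 relative to the resistant strain.
no_intervention = simulate(0.01, 50, w_resistant=1.0, w_sensitive=1.0)
with_intervention = simulate(0.01, 50, w_resistant=1.0, w_sensitive=0.8)
print(f"after 50 generations: {no_intervention[-1]:.3f} vs {with_intervention[-1]:.3f}")
```

Even a modest fitness penalty on the sensitive strain drives the resistant strain close to fixation within 50 generations, while without intervention its frequency stays at 1%.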
Modeling HCV Infection
• Assessing relationships among a copious number of features runs into the “curse of dimensionality”.
• Modeling HCV virulence, susceptibility to various factors, and predisposition to infection or therapy failure is difficult because:
  - the underlying mechanisms are not understood;
  - experts disagree;
  - these features change over time.
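One way to see the curse of dimensionality is to count candidate interactions. Assuming, for illustration, that each amino-acid position of the 3,011-residue polyprotein is a single feature, the number of site pairs and triples grows combinatorially, before any host factors are added:

```python
import math

POLYPROTEIN_LENGTH_AA = 3011  # from the HCV genome slide


def n_site_combinations(n_sites: int, order: int) -> int:
    """Distinct combinations of `order` sites (1 = single sites, 2 = pairs, ...)."""
    return math.comb(n_sites, order)


for order in (1, 2, 3):
    print(f"order {order}: {n_site_combinations(POLYPROTEIN_LENGTH_AA, order):,}")
# order 1: 3,011
# order 2: 4,531,555
# order 3: 4,545,149,665
```

Going from single sites to pairs multiplies the search space by about 1,500, which is why interaction searches quickly become a computing problem.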
Genome Sequencing for Public Health
• Molecular Evolution of Pathogenicity (study of evolutionary changes)
• Total Viral Population Analysis (disease and outbreak surveillance)
• Genome Data Mining (factors of virulence)
• Discovery of new hepatitis viruses
• Biomarker Discovery (polymorphisms of therapy resistance)
Figure labels: Genome Sequencing; Genome Assembly; Comparative Genomics; Molecular Evolution.
Viral RNA Mass Spectrometry
Figure: spectra for whole serum and for fractions 1–6.
Genome sequencing of HCV generates large volumes of data and has special computing requirements
• HPC (High Performance Computing): systems composed of very fast resources, typically hundreds or thousands of processors, with very fast memory, networks, and storage.
• Computational Science: science done by computation rather than by theory and experiment alone, which typically requires HPC resources.
Requirements for coherent integrative computational epidemiology
• Science (theory; experiment):
  - metrics, data collection, analysis.
• Computational Science (algorithms):
  - performing science computationally;
  - matching the algorithm to the computer architecture.
• Computer Science (O/S, programming):
  - how to accelerate computational science;
  - how to reduce barriers to parallelization.
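As an illustration of matching the algorithm to the computer architecture, the sketch below decomposes an all-against-all Hamming distance matrix into independent rows and lets the caller choose how rows are mapped: serially with the built-in `map`, or concurrently with an executor's `map` from Python's `concurrent.futures`. The function names and sequences are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_row(task: Tuple[int, Sequence[str]]) -> List[int]:
    """One row of the distance matrix: an independent unit of work."""
    i, seqs = task
    return [hamming(seqs[i], s) for s in seqs]


def distance_matrix(seqs: Sequence[str], mapper: Callable = map) -> List[List[int]]:
    """Map distance_row over all rows with any map-like callable.

    Passing the built-in `map` runs serially; passing an executor's `.map`
    runs rows concurrently without changing the decomposition itself.
    """
    tasks = [(i, seqs) for i in range(len(seqs))]
    return list(mapper(distance_row, tasks))


variants = ["ACGTAC", "ACGTTC", "TCGTAC"]
serial = distance_matrix(variants)
with ThreadPoolExecutor(max_workers=2) as pool:
    threaded = distance_matrix(variants, pool.map)
assert serial == threaded == [[0, 1, 1], [1, 0, 2], [1, 2, 0]]
```

A thread pool keeps the example simple; for CPU-bound pure-Python work, `ProcessPoolExecutor` (used under an `if __name__ == "__main__":` guard) is the usual choice, and computing only the upper triangle would halve the work because the matrix is symmetric.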
Study example:
THE HCV GENOME: IN SEARCH OF EPISTATIC INTERRELATIONSHIPS
Coordinated Evolution of HCV
• The complex network of coordinated substitutions is an emergent property of genetic systems, with implications for evolution, vaccine research, and drug development.
• Like properties such as polymorphism or strength of selection, the epistatic connectivity mapped in the network is important for typing individual sites, proteins, or entire genetic systems.
• The network can help devise molecular intervention strategies for disrupting viral functions or impeding compensatory changes that accompany vaccine escape or drug resistance mutations.
• It may be used to find new therapeutic targets, as suggested in this study for the NS4A protein, which plays an important role in the network.
Source: David Campo et al. PNAS 2008, 105(28): 9685-9690.
Coordinated Evolution of HCV
• An algorithm for identifying coordinated mutations that evolve in HCV was developed in MATLAB (Zoya Dimitrova).
• Multiple computational architectures are used to find the optimal solution.
• Challenge: having a library of parallelized algorithms for the right computer architecture.
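The slides do not describe the MATLAB algorithm itself. As a hedged stand-in, the Python sketch below scores coordinated substitutions with mutual information (MI) between pairs of alignment columns, a common covariation statistic; it is not necessarily the method of Campo et al. The threshold and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def column(alignment: Sequence[str], j: int) -> List[str]:
    """Residues at position j across all aligned sequences."""
    return [seq[j] for seq in alignment]


def mutual_information(x: Sequence[str], y: Sequence[str]) -> float:
    """MI in bits: sum over (a, b) of p(a,b) * log2(p(a,b) / (p(a) * p(b)))."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), n_ab in pxy.items():
        p_ab = n_ab / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def coordinated_pairs(alignment: Sequence[str], threshold: float) -> List[Tuple[int, int, float]]:
    """(i, j, MI) for every column pair whose MI exceeds a (hypothetical) threshold."""
    length = len(alignment[0])
    hits = []
    for i in range(length):
        for j in range(i + 1, length):
            mi = mutual_information(column(alignment, i), column(alignment, j))
            if mi > threshold:
                hits.append((i, j, mi))
    return hits


# Toy alignment: sites 0 and 2 change together; site 1 is invariant.
toy = ["AKD", "AKD", "GKE", "GKE"]
print(coordinated_pairs(toy, threshold=0.5))  # [(0, 2, 1.0)]
```

Raw MI is inflated by shared ancestry and by low-diversity sites, so real analyses add corrections and significance testing; the nested loop over site pairs is also the part that benefits most from parallel hardware.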
Study example:
LINKING HEPATITIS C VIRUS QUASISPECIES GENETIC DIVERSITY TO FEATURES OF VIRAL INFECTION
Sequence of HCV HVR1 quasispecies is linked to virological factors
Figure labels: HCV sequence; host; genomic structure; viral titer (VT); number of quasispecies (NQS); selection (dN/dS).
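The number of quasispecies (NQS) appears on the slides in nucleotide (NQSnt) and amino-acid (NQSaa) forms. Assuming NQS simply counts distinct variants among in-frame HVR1 sequences (the study may have applied additional filtering), both counts can be computed with the standard genetic code; the helper names are hypothetical.

```python
from itertools import product
from typing import Iterable, Tuple

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nt: str) -> str:
    """Translate an in-frame sequence; ambiguous codons become 'X',
    and a trailing partial codon is ignored."""
    nt = nt.upper()
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3))


def nqs(reads: Iterable[str]) -> Tuple[int, int]:
    """Return (NQSnt, NQSaa): counts of distinct nucleotide and amino-acid variants."""
    reads = [r.upper() for r in reads]
    return len(set(reads)), len({translate(r) for r in reads})


# Hypothetical reads: GCT and GCC both encode alanine.
print(nqs(["ATGGCT", "ATGGCC", "ATGGCT", "ATGAAA"]))  # (3, 2)
```

Synonymous variants collapse at the amino-acid level, which is why NQSaa can be smaller than NQSnt for the same sample.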
Linking Sequences of HCV HVR1 Quasispecies to Viral Parameters
Figure: Bayesian network model (sequence of HCV HVR1 quasispecies is linked to virological factors).
Evaluation of Models
Predictions: Classification Modeling
Target classes | 10-fold-CV‡ accuracy (%) | randTest† (10-fold-CV‡)
Genotype | 99.9 | 0.3286
dN/dS^^ (3-bin) | 94.4 | 0.4020
dN/dS^^ (2-bin) | 92.2 | 0.5120
NQSaa | 88.0 | 0.3887
NQSnt | 87.7 | 0.3978
Viral Titer | 97.2 | 0.6031
‡ Avg. accuracies
† Random assignment of class labels
^^ Based on dN/dS 3-class or 2-class grouping
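The two columns of the table correspond to a standard pattern: estimate accuracy by 10-fold cross-validation, then repeat the procedure with randomly reassigned class labels to obtain a chance-level baseline. The slides do not name the classifier used here, so the sketch below uses a 1-nearest-neighbor rule on Hamming distance as a stand-in; the toy sequences and function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def predict_1nn(train_x: Sequence[str], train_y: Sequence[str], query: str) -> str:
    """Label of the nearest training sequence by Hamming distance (first wins ties)."""
    best = min(range(len(train_x)), key=lambda i: hamming(train_x[i], query))
    return train_y[best]


def kfold_accuracy(x: Sequence[str], y: Sequence[str], k: int = 10, seed: int = 0) -> float:
    """Average accuracy over k folds; every sample is held out exactly once."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    accuracies = []
    for test in folds:
        if not test:
            continue
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        tx, ty = [x[i] for i in train], [y[i] for i in train]
        correct = sum(predict_1nn(tx, ty, x[t]) == y[t] for t in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


def permuted_label_accuracy(x, y, k: int = 10, n_perm: int = 20, seed: int = 0) -> float:
    """Mean k-fold accuracy after randomly reassigning class labels (chance baseline)."""
    rng = random.Random(seed)
    scores = []
    for p in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)
        scores.append(kfold_accuracy(x, shuffled, k, seed=p))
    return sum(scores) / len(scores)


def make_toy_data(n_per_class: int = 20, length: int = 8, seed: int = 1) -> Tuple[List[str], List[str]]:
    """Hypothetical two-class data: poly-A vs poly-C, one random substitution each."""
    rng = random.Random(seed)
    x, y = [], []
    for label, base in (("class_a", "A"), ("class_c", "C")):
        for _ in range(n_per_class):
            seq = [base] * length
            seq[rng.randrange(length)] = rng.choice("GT")
            x.append("".join(seq))
            y.append(label)
    return x, y


x, y = make_toy_data()
cv_acc = kfold_accuracy(x, y)
chance_acc = permuted_label_accuracy(x, y)
print(f"10-fold CV accuracy: {cv_acc:.3f}; permuted-label accuracy: {chance_acc:.3f}")
```

A model whose cross-validated accuracy is far above its permuted-label accuracy is learning real structure linking sequences to classes rather than artifacts of the data split.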
Validation of Models
Predictions: Classification Modeling
Target classes | 10-fold-CV‡ accuracy (%) | TestSet** accuracy (%)
Genotype | 99.9 | 100
dN/dS^^ (3-bin) | 94.4 | 70.3
dN/dS^^ (2-bin) | 92.2 | 82.7
NQSaa | 88.0 | 70.3
NQSnt | 87.7 | 72.4
Viral Titer | 97.2 | 52.4
‡ Avg. accuracies
† Random assignment of class labels
** 10 NHANES-3 patients; 5M and 5F; genotypes 1a and 1b; 185nt/96aa HVR1 QS
^^ Based on dN/dS 3-class or 2-class grouping
Study example:
PREDICTIVE MODELS OF DRUG THERAPY OUTCOMES
Coevolution among Genomic Sites of the Hepatitis C Virus during Interferon–Ribavirin Therapy
• Only 50% of chronically HCV-infected patients achieve a sustained virological response (SVR) to interferon/ribavirin therapy.
• Patients who do not achieve SVR show either a complete absence of response (NR) or an unsustained response (UR).
• UR presents in two forms: patients who relapse (R) and patients who break through (BT).
• BT is a special case in which drug resistance evolves during treatment.
Coevolution among Genomic Sites of the Hepatitis C Virus during Interferon–Ribavirin Therapy
Importance of the probabilistic relationships between HCV proteins and therapy outcome
Figure labels: Total Forces; Linear Projections of Physicochemical Properties.
Therapy outcome prediction
Features of HCV infection are imprinted in the viral genome.
NS5A model
Classifier | Evaluation | Validation: overall (% accuracy) | Validation: NR class (% accuracy) | Validation: BT class (% accuracy)
DTNB | 97.5† | 72.2 | 75.0 | 66.7
Linear Projection | 95.2* | 83.3 | 83.3 | 83.3
Ongoing research related to therapy outcome
• Beth Israel Deaconess Medical Center collaboration:
  - deep sequencing of HCV 1a QS sequences
  - approx. 13–15 samples/patient, collected over a time span of 48 hrs
  - 10,000–25,000 sequence reads/sample
• Atlanta Medical Center collaboration:
  - deep sequencing of HCV 1a variants
  - approx. 15–20 samples/patient during and after treatment
  - 5,000–10,000 sequence reads/sample
Continuing challenges to support prevention and control of HCV
Case Study – Hepatitis C Virus
• 454 sequencing and alignment of hundreds of thousands (>400,000) of sequence variants using exact or heuristic algorithms requires high performance computing.
• 3D structure templates are not available for the rational design of peptides and proteins to aid in the development of diagnostics.
• Bayesian networks for molecular epidemiological studies are compute-bound.
• New computational technologies and services, and the development and application of faster algorithms, will be necessary in the very near future to analyze and process these huge amounts of data.
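Exact pairwise alignment is typically done with dynamic programming, and its cost explains the HPC requirement: each pair costs time proportional to the product of the two sequence lengths, and all-against-all comparison of 400,000 variants means about 8 × 10^10 pairs. Below is a minimal Needleman-Wunsch (global alignment) score function, with hypothetical scoring parameters and a linear gap penalty.

```python
import math


def needleman_wunsch_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score (linear gap penalty).

    Time O(len(a) * len(b)); memory O(len(b)) because only the previous
    row of the dynamic-programming matrix is kept.
    """
    prev = [j * gap for j in range(len(b) + 1)]   # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("ACGT", "AGT"))  # 1: one gap (-2) plus three matches (+3)

# All-against-all alignment of 400,000 variants:
print(f"{math.comb(400_000, 2):,} pairwise alignments")  # 79,999,800,000
```

Production pipelines use affine gap penalties, heuristics, or alignment to a reference instead of all-against-all comparison, which is where the heuristic algorithms mentioned above come in.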
Let's say:
A & C are dependent on each other regardless of B and/or D.
C & D are dependent on each other regardless of A and/or B.
Three BN models graphically describe the above model.
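One reading of this slide (ours, not stated explicitly): if B is independent of the other variables, and A and D are additionally independent given C, then the chains A → C → D and D → C → A and the fork A ← C → D encode the same independencies, while the collider A → C ← D would instead make A and D marginally independent. The sketch below builds a joint distribution from hypothetical conditional probability tables for A → C → D, checks the stated dependencies exactly, and shows that the same joint refactorizes as D → C → A.

```python
from itertools import product

# Hypothetical CPTs for binary variables in the chain A -> C -> D.
P_A = {0: 0.6, 1: 0.4}
P_C_GIVEN_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # P_C_GIVEN_A[a][c]
P_D_GIVEN_C = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}   # P_D_GIVEN_C[c][d]
VARS = ("A", "C", "D")

# Joint distribution from the chain factorization P(A) P(C|A) P(D|C).
JOINT = {(a, c, d): P_A[a] * P_C_GIVEN_A[a][c] * P_D_GIVEN_C[c][d]
         for a, c, d in product((0, 1), repeat=3)}


def prob(assignment: dict) -> float:
    """Marginal probability of a partial assignment such as {"A": 1, "C": 0}."""
    return sum(p for values, p in JOINT.items()
               if all(values[VARS.index(v)] == x for v, x in assignment.items()))


def dependent(x: str, y: str, given: tuple = (), tol: float = 1e-12) -> bool:
    """True unless P(x, y, z) P(z) == P(x, z) P(y, z) for every value combination."""
    for z_values in product((0, 1), repeat=len(given)):
        z = dict(zip(given, z_values))
        for xv, yv in product((0, 1), repeat=2):
            lhs = prob({**z, x: xv, y: yv}) * prob(z)
            rhs = prob({**z, x: xv}) * prob({**z, y: yv})
            if abs(lhs - rhs) > tol:
                return True
    return False


def rebuild_as_reversed_chain() -> dict:
    """Refactorize the same joint as D -> C -> A: P(D) P(C|D) P(A|C)."""
    return {(a, c, d): prob({"D": d})
                       * (prob({"C": c, "D": d}) / prob({"D": d}))
                       * (prob({"A": a, "C": c}) / prob({"C": c}))
            for a, c, d in product((0, 1), repeat=3)}


print(dependent("A", "C", ("D",)), dependent("C", "D", ("A",)))  # True True
print(dependent("A", "D"), dependent("A", "D", ("C",)))          # True False
```

Because the reversed factorization reproduces the joint exactly, data alone cannot distinguish these structures; only the collider would leave a different independence signature.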
Disclaimer
"The findings and conclusions in this presentation have
not been formally disseminated by [the Centers for
Disease Control and Prevention/the Agency for Toxic
Substances and Disease Registry] and should not be
construed to represent any agency determination or
policy."
Acknowledgements
Division of Viral Hepatitis
Bioinformatics and Molecular
Epidemiology Laboratory
-David Campo
-Zoya Dimitrova
-Mike Purdy
-Guoliang Xia
-Gilberto Vaughan
-Sumathi Ramachandran
-Lydia Ganova-Raeva
-Joseph Forbi
-Hong Thai
-Yulin Lin
-Livia Rossi
-Johnny Yokosawa
-YURY KHUDYAKOV
CDC
IT Research & Development
-Christopher A. Lynberg
CDC
DSR/BCFB Scientific Computing Activity
-Elizabeth B. Neuhaus
Corporate R&D
-Accelereyes
-NVIDIA
Collaborators
-Atlanta Medical Center, Georgia, USA
-Beth Israel Deaconess Medical Center, Boston, USA
-Saint Louis University School of Medicine, Missouri, USA
-UT Southwestern Medical Center, TX, USA
QUESTIONS?