Applying advanced computational approaches for data analytics of electronic health records
Nicholas Luscombe (Crick) and Harry Hemingway (UCL)
Apply to: UCL
Summary:
Clinical cardiovascular risk prediction tools such as Framingham and QRISK2 assist in the clinical
management of patients by aggregating a range of measurable biomarkers (e.g., smoking, blood
pressure, BMI), medical history and treatment information to predict disease risk and guide
subsequent therapeutic management. However, conventional epidemiological approaches are
limited to predictors defined a priori; further, models developed using regression-based
methods predict individual outcomes only (e.g., all-cause mortality, coronary death, non-fatal
myocardial infarction) and cannot easily incorporate new data types (e.g., genome sequences or
novel risk factors from emerging data sources). Curated resources like CALIBER (CArdiovascular
disease research using Linked BEspoke studies and Electronic Health Records by Hemingway
and Denaxas) provide access to contemporary linked national electronic health records from
primary to secondary care, disease registries and mortality registers.
These datasets are large, with many patients and measurement types, but generally sparse; thus
they present new opportunities for applying probabilistic modelling and unsupervised machine
learning methods as exploratory complements to classical models.
Machine learning can enable automatic identification of new candidate prognostic factors that
could be trialled as predictors in clinical care. Conditional graphical models or structured
output learning can then reveal conditional dependencies between outcome variables, which are
themselves functions of the independent variables, and such models can be updated in real time.
This allows users to compare the probabilities of multiple clinical outcomes and to perform more
complex, conditional queries (e.g., what is the probability of a set of outcomes given an
intervention and/or further observations?). Since many cardiovascular risk factors are
well-characterised and classical models generally perform well (C-index ~0.8), cardiovascular
diseases provide an ideal context for testing modelling approaches and for providing
state-of-the-art analysis and models to predict heart failure.
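As a minimal sketch of the kind of conditional query described above, the following Python example builds a toy graphical model over four binary variables and answers "what is the probability of a set of outcomes given some observations" by brute-force enumeration. The variable names, the dependency structure and every probability are hypothetical values chosen for illustration only; they are not taken from CALIBER or from any clinical model. The sketch conditions on observations only; modelling an intervention would need additional causal assumptions that are not shown here.

```python
from itertools import product

# Hypothetical binary variables (illustrative only, not clinical estimates).
VARS = ("smoker", "hypertension", "heart_failure", "mi")

# Toy conditional probability tables; all numbers are made up.
P_SMOKER = 0.2
P_HYPERTENSION = {True: 0.4, False: 0.2}      # P(hypertension | smoker)
P_HEART_FAILURE = {True: 0.10, False: 0.02}   # P(heart_failure | hypertension)
P_MI = {                                      # P(mi | smoker, hypertension)
    (True, True): 0.15,
    (True, False): 0.08,
    (False, True): 0.06,
    (False, False): 0.02,
}


def bern(p, value):
    """Probability that a Bernoulli(p) variable takes the given value."""
    return p if value else 1.0 - p


def joint(a):
    """Joint probability of one full assignment under the toy model."""
    return (
        bern(P_SMOKER, a["smoker"])
        * bern(P_HYPERTENSION[a["smoker"]], a["hypertension"])
        * bern(P_HEART_FAILURE[a["hypertension"]], a["heart_failure"])
        * bern(P_MI[(a["smoker"], a["hypertension"])], a["mi"])
    )


def query(targets, evidence):
    """Return P(targets | evidence) by summing over all assignments."""
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(VARS)):
        a = dict(zip(VARS, values))
        if any(a[k] != v for k, v in evidence.items()):
            continue  # assignment contradicts the observations
        p = joint(a)
        denominator += p
        if all(a[k] == v for k, v in targets.items()):
            numerator += p
    return numerator / denominator


# Probability of both outcomes for a smoker with hypertension.
both = query({"heart_failure": True, "mi": True},
             {"smoker": True, "hypertension": True})
# Probability of heart failure with no observations at all.
marginal = query({"heart_failure": True}, {})
```

Enumeration scales exponentially with the number of variables, so it is only suitable for a toy model of this size; a realistic model over health-record data would rely on a dedicated inference method.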
Early tests of the most basic machine-learning methods have shown that we can achieve
prediction levels of at least 80% (similar to traditional methods). We expect to improve these
predictions with more advanced methods, and moreover to generalise these applications to
include data from the UK Biobank and Genomics England across additional clinical areas.
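For readers unfamiliar with the C-index quoted for classical models, the sketch below shows one common way to compute it (Harrell's concordance index) for right-censored follow-up data: among all comparable pairs of patients, it is the fraction in which the patient with the earlier observed event was also given the higher predicted risk. The function name, its arguments and the example data are hypothetical, and tied event times are simply skipped, which is a simplification.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (simplified sketch).

    times       -- follow-up time for each patient
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event
        for j in range(n):
            # The pair is comparable when patient i had an observed
            # event strictly before patient j's follow-up time.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up example: four patients, the third is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.2, 0.3, 0.1]
c_index = concordance_index(times, events, risk_scores)  # 4 of 5 pairs agree: 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a C-index of about 0.8 means the model orders roughly four out of five comparable patient pairs correctly.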
References:
1. Narasimhan VM, Hunt KA, Mason D … Hemingway H … van Heel DA. Health and
population effects of rare gene knockouts in adult humans with related parents. Science
2016 (in press).
2. Mifsud B, Tavares-Cadete F, Young AN, Sugar R, Schoenfelder S, Ferreira L, Wingett SW,
Andrews S, Grey W, Ewels PA, Herman B, Happe S, Higgs A, LeProust E, Follows GA,
Fraser P, Luscombe NM, Osborne CS. Mapping long-range promoter contacts in human
cells with high-resolution capture Hi-C. Nat Genet. 2015 Jun;47(6):598-606.
3. Sugimoto Y, Vigilante A, Darbo E, Zirra A, Militti C, D'Ambrogio A, Luscombe NM, Ule J.
hiCLIP reveals the in vivo atlas of mRNA secondary structures recognized by Staufen 1.
Nature. 2015 Mar 26;519(7544):491-4.
4. Ilsley GR, Fisher J, Apweiler R, De Pace AH, Luscombe NM. Cellular resolution models
for even skipped regulation in the entire Drosophila embryo. Elife. 2013 Aug
6;2:e00522.
5. Rapsomaniki E, Timmis A, George J, Pujades-Rodriguez M, Shah AD, Denaxas S, White
IR, Caulfield MJ, Deanfield JE, Smeeth L, Williams B, Hingorani A, Hemingway H. Blood
pressure and incidence of twelve cardiovascular diseases: lifetime risks, healthy
life-years lost, and age-specific associations in 1·25 million people. The Lancet 2014;
383(9932):1899–911.