Biomedical Informatics
NCBO Forum: Ontology of Clinical Phenotypes
Dallas, Sept 2008
Christopher G Chute, MD DrPH
Professor Biomedical Informatics
Mayo Clinic
ICD-11 Revision Steering Group Chair
World Health Organization
Biomedical Informatics
The Historical Center of the
Health Data Universe
Clinical Data
NCBO Phenotype; © 2008, Mayo Clinic
2
Biomedical Informatics
Copernican Healthcare
(Niklas Koppernigk)
Billable Diagnoses Clinical Guidelines
Medical Literature
Scientific Literature
NCBO Phenotype; © 2008, Mayo Clinic
3
Biomedical Informatics
Mayo: A Century-Long Tradition of
Studying Patient Outcomes
Demographics
Diagnoses
Procedures
Narratives
Laboratories
Pathology…
NCBO Phenotype; © 2008, Mayo Clinic
High-Volume
Data Storage
4
Biomedical Informatics
Early
Phenotype
NCBO Phenotype; © 2008, Mayo Clinic
5
Biomedical Informatics
From Practice-based Evidence to Evidence-based Practice
Data
Clinical &
Genomic
Registries
Data
Warehouses
& Marts
Patient
Encounters
Biomedical
Knowledge
Expert
Systems
Clinical
Guidelines
Knowledge
Management
6
NCBO Phenotype; © 2008, Mayo Clinic
Biomedical Informatics
Mayo Clinic
Data Governance
• Enterprise-wide (Eight States)
• Data Modeling, Metadata, and Vocabulary
• High-level representation
• Senior institutional Leaders – Physician led
• Two-tier organizations
• Steering Committee, Stewardship groups
• Multidisciplinary
• Care Providers, Informatics, IT, domain experts
NCBO Phenotype; © 2008, Mayo Clinic
7
Biomedical Informatics
Mayo Enterprise Vocabulary Organization
Reference
Vocabularies
Mayo Thesauri Value Sets
Cross Mapping
Tables
External
SNOMED
LOINC
GO
ICDs
CPTs
… ~200
Mayo Internal
Table 22
Table 61
SNOMED mods
Mayo CPTs
WARS
… ~50
UMLS
External
WHO FIC
… ~3
Mayo Internal
Drugs
Disease
Symptoms
Pt Functioning
EDT Aggregation
… ~8
External
JACHO
NACCR Ca
NIH/NCI
… ~1000
Mayo Internal
Flow sheets
MICS apps/screens
Dept systems
Registry screens
Form questions
Inf. for your Phys.
… ~10,000
External
SNOMED↔ICD
CPT↔ICD
LOINC↔SNOMED
FDB↔NDC
… ~500
Mayo Internal
All thesauri
All value sets
Tab 22↔ICD
FDB↔Fomulary
…
… >10,000
LexGrid: Data Model, Data Store and Machinery
NCBO Phenotype; © 2008, Mayo Clinic
8
Biomedical Informatics
•
•
•
NCBO Phenotype; © 2008, Mayo Clinic
9
Biomedical Informatics
The Continuum Of Health Classification
Biology meets Clinical Medicine
Biology Medicine
10
9
5
4
3
2
1
0
8
7
6
Chasm of Semantic Despair
G ic s en om
P ro
M et ic s te om
P at hw ay s ar
M od el in g ab ol ic
M ol ec ul ol ec ul
M ar
S im
C el ul at io n lu la r
M
M ol ec od el s ul ar
A
G en ss ay s om ic
T es tin g
B io sp ec im en s
La b
D
D at
Tr is ea ia a ls se
D at
&
S a yn dr
P om
M ed ic e al
Im at ie nt
R g ag in ec or d
D
E
H
R at a
S tr uc tu re s
NCBO Phenotype; © 2008, Mayo Clinic
10
Biomedical Informatics
NCBO – A Bridge Across the Chasm
NCBO Phenotype; © 2008, Mayo Clinic
11
Biomedical Informatics
Feudal Cognition:
Intellectual Semantic Baronies
• Genetic variation – Genomics
• Haplotypes – Statistical Genomics
• Molecular – Metabolomics, Proteomics
• Binding – Molecular simulation
• Pathways – Physiology and Systems Biology
• Symptoms – Consumer Health
• Rx and Px – Clinical Medicine
• Risk – Public Health, Epidemiology
• Social impact – Sociology, Health Economics
NCBO Phenotype; © 2008, Mayo Clinic
12
Biomedical Informatics
Whither Phenotype?
Spans spectrum from enzymes to disease
• Pharmacogenomics – enzyme functionality
• Physiologist – cellular function
• Systems biologist – pathway circuit flow
• Sub-specialist – organ functioning
• Patient/Clinician – disease manifestation
• Public Health – population characteristics
Highly specific to use-case context
NCBO Phenotype; © 2008, Mayo Clinic
13
Biomedical Informatics
Can the Genotype be the Phenotype?
• If the “disease” is defined by Genotype…
• Lesch-Nyhan syndrome
• Huntington's disease
• Gene sequence as “sign” of disease
• What is the distinction between them
• Consequences for genotype to phenotype research
NCBO Phenotype; © 2008, Mayo Clinic
14
Biomedical Informatics
Whither Disease as Phenotype?
Notions of disease are inextricably bound to an underpinning Knowledge Base (rabies)
Disease manifestation occurs across multiple planes of human understanding
Boundaries among clinical observational elements * are relativistic, rapidly changing, and probably serve little practical purpose
Observations, findings, syndromes, diseases, genotype, haplotype, epigenomic change…
NCBO Phenotype; © 2008, Mayo Clinic
15
Biomedical Informatics
Biomedical Informatics eMERGE: e lectronic M edical R ecord and GE nomics
• The eMERGE Network is a national consortium formed to develop, disseminate, and apply approaches to research that combine DNA biorepositories with electronic medical record
(EMR) systems for large-scale, high-throughput genetic research.
• Reproducible phenotype from EMRs
17
NCBO Phenotype; © 2008, Mayo Clinic
Biomedical Informatics
EMR Phenotypes and
Community Engaged Genomic Associations
• NHGRI U01 HG004599; Mayo Clinic
• CG Chute, PI
• Genome Wide Association Study (GWAS)
• Two Clinical “conditions”
• Myocardial Infarction
• Peripheral Arterial Disease (PAD)
• Balance with ethical and community engagement studies
18
NCBO Phenotype; © 2008, Mayo Clinic
Biomedical Informatics
Biomarker ECG
Abnormal Evolving Diag
Equivocal Evolving Diag
Incomplete Evolving Diag
Diagnostic
Evolving ST-T
Diagnostic
Evolving ST-T
Equivocal
Absent
Suspect MI
Suspect MI
No MI
No MI
No MI
No MI
NCBO Phenotype; © 2008, Mayo Clinic
Diagnostic
Evolving ST-T
Diagnostic
Evolving ST-T
Equivocal
Absent
No MI
No MI
No MI
No MI
No MI
No MI
19
Biomedical Informatics
• Increased troponin
Troponin
• a measurement exceeding 0.05 ng/ml
• Raising or falling pattern for serial troponin
• 0.05 ng/ml
• Historical overlap with CK-MB biomarker
• false positive issues - exclusions
• Ablation, cardiac surgery, CHF-acute, CPR, critically ill, crush/ contusion/ bruising, drug toxicity, embolectomy, endomyocardial biopsy, external defibrillation, hyperthermia, hypotension, hypothyroidism, infiltrative disease, inflamm. dis-cardiac, internal defibrillation, neurological disease, pericarditis, pulmonary embolism, transplant vasculopthy, uncontrolled hypertension, rhabdomyolysis, sepsis
20
NCBO Phenotype; © 2008, Mayo Clinic
CHF-acute
CPR critically ill crush contusion bruising drug toxicity hembolectomy endomyocardial biopsy external defibrillation hyperthermia hypotension hypothyroidism infiltrative disease inflammatory dis-cardiac internal defibrillation neurological disease
Biomedical Informatics
Exclusion Criteria for Troponin ablation cardiac surgery
33250, 33251, 33254, 33255, 33256, 33257, 33258, 33259, 33261, 33265, 33266,
33120, 33130, 33300, 33305, 33310, 33315, 33320, 33321, 33322, 33330, 33332, 33400, 33401, 33403, 33404, 33405, 33406, 33410, 33411, 33412, 33413, 33414, 33415, 33416, 33417, 33420, 33422, 33425,
33426, 33427, 33430, 33460, 33463, 33464, 33465, 33468, 33470, 33471, 33472, 33474, 33475, 33476, 33478, 33496, 33600, 33602, 33860, 33861, 33863, 33508, 33510, 33511, 33512, 33513, 33514, 33516,
33517, 33518, 33519, 33521, 33522, 33523, 33533, 33534, 33535, 33536, 33542, 33545, 33572, 33600, 33602, 33606, 33608, 33610, 33611, 33612, 33615, 33617, 33619, 33641, 33645, 33647, 33660, 33665,
33670, 33675, 33676, 33677, 33681, 33684, 33688, 33690, 33692, 33694,33692, 33694, 33697, 33702, 33710, 33720, 33722, 33724, 33726, 33730, 33732, 33735, 33736, 33737, 33750, 33755, 33762, 33764,
33766, 33767, 33768, 33770, 33771, 33774, 33775, 33776, 33777, 33778, 33779, 33780, 33781, 33786, 33788, 33800, 33802, 33803, 33813, 33814, 33820, 33822, 33824, 33840, 33845, 33851, 33852, 33853,
33860, 33861, 33863, 33864, 33945, 93501, 93503, 93505, 93508, 93510, 93511, 93514, 93524, 93526, 93527, 93528, 93529, 93530, 93531, 93532, 93533,
428.0, 428.1, 402.01, 402.11, 402.91, 402.91
92950
411.81, 411.89, 434.11, 434.91, 570, 584.5, 584.6, 584.7, 584.8, 584.9
861.00, 861.01, 929.9
861.01, 924.9
861.01, 924.9
977.9
33910, 33915, 33916, 33917, 33920, 35875, 35876, 92973,
93505
92960
780.6
458.9
244.9
135 [425.8], 164.1, 212.7, 429.1, 277.39, 429.71, 413.1, 425.1, 425.4, 428.9, 429.0, 427.89, 429.89
429.89, 429.0, 422.90, 422.92, 422.93, 422.89, 424.90, 424.99, 424.91, 423.1, 423.2, 423.8, 423.9
92961
340, 341.0, 341.1, 341.20, 341.21, 341.22, 341.8, 341.9, 349.9 342.00, 342.01, 342.01, 342.10, 342.11, 342.12, 342.80, 342.81 , 342.82, 342.90, 342.91, 342.92, 343.0, 343.1, 343.2, 343.3, 343.4, 343.8, 343.9,
344.01, 344.02, 344.03, 344.04, 344.09, 344.1, 344.2, 344.30, 344.31, 344.32, 344.40, 344.41, 344.42, 344.5, 344.60, 344.61, 344.81, 344.9, 345.00, 345.01, 345.10, 345.11, 345.20, 345.21 345.3, 345.40,
345.41, 345.50, 345.51, 345.60, 345.61, 345.70, 345.71, 345.80, 345.90, 345.91,346.00, 346.01, 346.00, 346.01, 346.10, 346.11, 346.20, 346.21, 346.80, 346.81, 346.90, 346.91 pericarditis pulmonary embolism
423.9, 420.90
415.19 transplant vasculopthy 996.80, 996.83
uncontrolled hypertension 401.9 rhabdomyolysis 728.88 sepsis
21
Biomedical Informatics
Cardiac Pain
• NLP of the text from EMR
• Can distinguish negated terms
• Search Term:
• Chest pain, thoracic pain, precordial pain, pain chest, chest discomfort, atypical chest pain, exertional chest pain, chest pain exertion, anginal chest pain, anginal pain, chest distress, acute chest pain, chest aching, aching chest, typical chest pain, chest pain, jaw pain, pain jaw, throat pain, pain throat, neck pain and pain neck.
• HPI in 72 hour window around presentation
NCBO Phenotype; © 2008, Mayo Clinic
22
Biomedical Informatics
Positive
Predictive
Value
Concordance MI Symptoms
Human vs. NLP
0.90
Nurse Abstractor
Yes No
NLP
Yes 590
No 63
NCBO Phenotype; © 2008, Mayo Clinic
35
80
23
Biomedical Informatics
Mayo Clinic Cancer Center
• Informatics Infrastructure
• $3.2M caBIG funding in 2008 (Chute, PI)
• LexBIG development
• Vocabulary Knowledge Center
• BRIDG development and leadership
• Clinical Trials infrastructure
• Common Data Elements
• Consistent and comparable
• Inclusion, exclusion criteria
• Co-morbidities
NCBO Phenotype; © 2008, Mayo Clinic
24
Biomedical Informatics
Cancer Phenotype
• Increasingly dependent on genomic characteristics
• Blend with pathology, imaging, laboratory
• Extent of disease measures are crucial
• Comparable and consistent data ultimately depends upon common ontologies
• BiomedGT
• SNOMED CT
• ICD-O
• HUGO
• GO
• Adverse
Events
NCBO Phenotype; © 2008, Mayo Clinic
• Drugs Rx
• Radiation RT
• Surgury Px
25
Biomedical Informatics
BRIDG Data Models
• BRIDG is the HL7 RCRIM domain model
• Collaboration among CDISC, HL7, FDA, NCI
• Historically expressed as UML model
• Evolving translation into OWL
• Preserve OWL and UML modes interoperably
• Assert that UML and OWL are isomorphic
• Span the information-terminology model boundary
• Enable reasoner support to create coherent model
• Preserve domain expert access to model in UML
26
NCBO Phenotype; © 2008, Mayo Clinic
Biomedical Informatics
Content vs. Structure
Contest or Synergy? Computer Equivalent?
Family History of Breast Cancer
Family History of Heart Disease
Family History of Stroke
27
NCBO Phenotype; © 2008, Mayo Clinic
Biomedical Informatics
Mayo CTSA
• Catalog ongoing projects, grants, studies
• Scientific description of research
• Administrative categories (human, animal, …)
• Regulatory (ClinTrials.gov, FDA, etc)
• Patient cohort identification
• Clinical trial recruitment
• Case-report form generation
NCBO Phenotype; © 2008, Mayo Clinic
28
Biomedical Informatics
So what does Mayo want?
• Ability to record patient data using common models and vocabulary content
• 100 year tradition
• Approximating reality in electronic modes
• Ability to create overarching warehouse
• Now on 3 rd generation “data trust”
• To support “understanding”
• Research, knowledge discovery, practice insight
• Practice improvement, quality, safety
• Decision support, guidelines, best evidence
NCBO Phenotype; © 2008, Mayo Clinic
29
Biomedical Informatics
How are we doing it?
• Department of Health Sciences Research
• Epi, Biostat, Informatics, Health Policy
• Enterprise wide (3 campus presence)
• About 4,000 studies, reports, retrievals per year
• Turnkey “phenotype” templates
• Diabetes (Type I, Type II, metabolic syndrome)
• CAD, MI, PAD,
• Stroke (embolic, aneurysm)
• Asthma, COPD,
• Cancers
NCBO Phenotype; © 2008, Mayo Clinic
30
Biomedical Informatics
NLP
• Six year operation of fully coded clinical notes
• About 24M clinical notes
• Added Pathology, Radiology in process
• 18-stage UMIA pipeline of annotators
• Tokenization, part of speech, entity recognition
• Code noun phrases into
• Sign and symptoms – SNOMED
• Diagnoses – SNOMED
• Drugs – RxNorm
• Primary source for Phenotype retrieval
NCBO Phenotype; © 2008, Mayo Clinic
31
Biomedical Informatics
From Local to Global
The ICD-11 Project
• 150 year legacy of the ICDs
• World Health Organization leadership
• Explicit definitions of classifications from
Terminology
• Scientific consensus of clinical phenotype
• Distributed authoring model
NCBO Phenotype; © 2008, Mayo Clinic
32
Biomedical Informatics
•
ICD11 Use Cases
•
• Mortality
• Public Health Morbidity
•
• Metrics of clinical activity
• Quality management
• Patient Safety
• Financial administration
• Case mix
• Resource allocation
33
Biomedical Informatics
Some Premises for ICD-11 development
• Rubrics defined by Aggregation Logics from terminologies
• Human language definitions will be explicit
• “core” representation will be in description logic based ontology
• A linear serialization will be derived as a view to maintain longitudinal consistency
• May require corresponding “rules” for practical use
34
NCBO Phenotype; © 2008, Mayo Clinic
Biomedical Informatics
Familiar Points Along Continuum
Modern Health Vocabularies
• Nomenclature – Highly Detailed Descriptions (SNOMED)
• Classification – Organized Aggregation of Descriptions into a Rubric
(ICDs)
• Groupings – High Level Categories of Rubrics (DRGs)
Aggregation Groupers
Nomenclature
Detailed
Classification Groups
Grouped
35
NCBO Phenotype; © 2008, Mayo Clinic
Biomedical Informatics
Traditional Hierarchical System
ICD-10 and family
NCBO Phenotype; © 2008, Mayo Clinic
36
Biomedical Informatics
Addition of structured attributes to concepts
Concept name
Definition
Language translations
Preferred string
Language translations
Synonyms
Language translations
Index Terms
37
NCBO Phenotype; © 2008, Mayo Clinic
Biomedical Informatics
Addition of semantic arcs - Ontology
Relationships
Logical Definitions
Etiology
Genomic
Location
Laterality
Histology
Severity
Acuity
NCBO Phenotype; © 2008, Mayo Clinic
38
Biomedical Informatics
Serialization of “the cloud”
Algorithmic Derivation
NCBO Phenotype; © 2008, Mayo Clinic
39
Biomedical Informatics
Linear views may serve multiple use-cases
Morbidity, Mortality, Quality, …
NCBO Phenotype; © 2008, Mayo Clinic
40
Biomedical Informatics
Proposed Hierarchy of Wiki Authority
by ICD Domain (not implemented)
0 Revision Steering Committee
1 Revision Domain/Topic Working Groups &
WHO-FIC Network
2 Accredited Experts
• Designated by Working Group Members &
WHO-FIC Network
3 Accredited Persons
• Designated by Experts
4 Registered Interested Persons (Public)
NCBO Phenotype; © 2008, Mayo Clinic
41
Biomedical Informatics
Discussions with IHTSDO
International Health Terminology (IHT)
• IHT (SNOMED) will require high-level nodes that aggregate more granular data
• Use-cases include mutually exclusive, exhaustive,…
• Sounds a lot like ICD
• ICD-11 will require lower level terminology for aggregation logic definitions
• Detailed terminological underpinning
• Sounds a lot like SNOMED
42
NCBO Phenotype; © 2008, Mayo Clinic
Biomedical Informatics
ICD-11
Potential Future States
SNOMED
Ghost ICD
Ghost SNOMED
NCBO Phenotype; © 2008, Mayo Clinic
43
Biomedical Informatics
ICD-11
ICD-
Joint
Alternate Future
Effort SNOMED
NCBO Phenotype; © 2008, Mayo Clinic
44
Biomedical Informatics
Advantages to Collaboration
• Both organizations avoid “Ghost” emulations
• Both organizations leverage expertise and content
• More resources brought to the table
• Both organizations retain independent intellectual property and derivatives (e.g. Linear formats of ICD-11)
• Mappings become moot
• Aggregation of SNOMED is definitional to ICD
45
NCBO Phenotype; © 2008, Mayo Clinic
Biomedical Informatics
Caveat
ICD and IHTSDO
• No agreements have been finalized
• Intellectual property sharing is expected
• Shared tooling is being discussed
• Harmonization Board has been proposed
NCBO Phenotype; © 2008, Mayo Clinic
46