Biomedical Informatics Data Governance and Normalization with the Mayo Clinic Enterprise Christopher G Chute MD DrPH, Mayo Clinic Chair, ICD11 Revision, World Health Organization Chair, ISO TC215 on Health Informatics CTSA Ontology Workshop Baltimore 25 April 2012 1 Biomedical Informatics From Practice-based Evidence to Evidence-based Practice Data Patient Encounters Decision support Clinical Databases Registries et al. Inference Shared Semantics Standards Medical Knowledge Vocabularies & Terminologies Expert Systems Clinical Guidelines Knowledge Management 2 Biomedical Informatics Mayo Clinic – Normalization Needs Variations on Secondary Use • Clinical Decision Support • Quality measurement and improvement • Best practice discovery and application • New knowledge discovery (research) • Genotype to phenotype association • Outcomes Research • Comparative Effectiveness Analyses © 2007 Mayo Clinic College of Medicine 3 Biomedical Informatics Comparable and Consistent Data • Inferencing from data to information requires sorting information into categories • Statistical bins • Machine learning features • Accurate and reproducible categorization depends upon semantic consistency • Semantic consistency is the vocabulary problem • Almost always manifest as the “value set” problem © 2007 Mayo Clinic College of Medicine 4 Biomedical Informatics LOINC, RxNorm, and SNOMED We’re not done yet… • Picking high-level, Meaningful Use conformant, and fashionable vocabularies is not hard • Finding which subsets (value sets) map to • For each application and interface • For each data element and variable • Implies thousands of value sets for the average enterprise • Imposes a normalization and mapping challenge © 2007 Mayo Clinic College of Medicine 5 Biomedical Informatics Data Governance Policies Data Standards Data Quality Data Architecture and Infrastructure Data Governance Support Services Biomedical Informatics Terminology Vision Mayo Clinic Production Terminology Support • To incrementally standardize Mayo Clinic terminology • To access and use standardized terminology in applications, interfaces, warehouses, and processes across Mayo Clinic • To reduce cost and effort of developing and maintaining redundant and disparate terminologies and mappings for operations, analytics, and decision support • To support business needs and strategies that require or benefit from standardized terminology (e.g., ICD-10 conversion, Meaningful Use, Health Information Exchange) Biomedical Informatics Principles • Leverage external standards • Support Mayo needs that are not yet part of an external standard and promote them to the industry • Align with the Enterprise Data Model and other types of Mayo Clinic Data Standards • Maintain a balance between short-term and strategic needs and solutions Biomedical Informatics High-Level Context External Terminologies LOINC, CPT, ICD-9, ICD-O, SNOMED-CT, UMLS, other Enterprise Managed Metadata Environment Enterprise Data Model Enterprise Terminology Analytic Systems Operational Systems EDT, Admin, Nursing, DSS, other EMRs. Lab, Radiology, other Research and Collaborating Organizations Biomedical Informatics Solution Vision Tools and Services: Load, Author, Map, Store, Search, Browse, Publish, Access • Code Systems •Mayo Clinic Backbone Terminology, ICD-9 CM, ICD-10 CM, SNOMED CT, RxNorm, LOINC, etc. Enterprise Terminology • Mappings •Clinical Problems to ICD-9 CM, ICD-10 CM, and SNOMED CT, Lab Orders/Results to LOINC • Value Sets •Clinical Problem, Body Structure, Body Side, Administrative Gender, Unit of Measure • Picklists •Admitting Diagnosis, Lab Unit of Measure Biomedical Informatics Value Set and Mappings Example Code Systems Clinical Problems Hypertension Hyperlipidemia Depression Hypothyroidism Obesity Osteoporosis Coronary artery disease Anemia Diabetes mellitus Obstructive sleep apnea Clinical Facing Maps SNOMED CT Administrative Billing Health Information Exchanges ICD-9 ICD-10 Payers Biomedical Informatics Problem List Terminology Example Develop and approve Provide in an accessible format Use in clinical and admin processes Distribute and Load EMR’s, Scheduling, Orders, etc. Biomedical Informatics Terminology Management Offerings • Methodologies and Best Practices • Terminology Standardization Facilitation • External Standards Expertise • Terminology Mapping (and 3rd party recommendations) • External Terminology Management • Mayo Clinic Terminology Standards Management Biomedical Informatics Tools and Terminology Services • Health Language Inc. • • • • Central authoring tool (LExScape) Mapping assistance tool (LEAP) Web-based browser (LExPlorer) Importer/Exporter and Web Services • Mayo Developed & Supported • Exports • Web Services (as needed) Biomedical Informatics Terminology Resources • Data Governance Support Services Terminology Management • Technical support: IT Reference Repositories Operational Systems • Data Governance Committee Biomedical Informatics Approved Data Standards and Policy • Approved Data Standards on Web • Mayo Clinic Data Standards • Enterprise Data Model • Data Standards Policy • Data Governance Policy for Data Standards Biomedical Informatics Data Granularity Dissonance Practice vs. Research vs. Public Health • Clinical data is impressionistic • Not protocol driven – quaint rambling text… • Detail and consistency is a happy coincidence • However, can be holistic; intuit the unanticipated • Research data is rigid though rigorous • Detailed, structured, complete (ideally), coherent • May miss the forest for the chlorophyll • Public Health comprehends broader interests • Metrics of life-style, high risk behavior, fitness • Non-clinical circumstances, settings, environment • Can there be reconciliation among use-cases? 17 Biomedical Informatics electronic MEdical Records and GEnomics NHGRI eMERGE (U01) Goals • GWAS – Genome Wide Association Study • 600k Affy chip • High-throughput Phenotyping • Disease algorithm scans across EMRs • “catch up” with high throughput genomics • Generalize Phenotypes across the Consortium • Measure reproducibility of algorithms among members • Vanderbilt, Northwestern, Marshfield, Group Health Seattle, Mayo 11/16/10 18 Biomedical Informatics SHARP: Area 4: Secondary Use of EHR Data A $15M National Consortium • 16 academic and industry partners • Develop tools and resources that influence and extend secondary uses of clinical data • Cross-integrated suite of project and products • Clinical Data Normalization • Natural Language Processing (NLP) • Phenotyping (cohorts and eligibility) • Common pipeline tooling (UIMA) and scaling • Data Quality (metrics, missing value management) • Evaluation Framework (population networks) 19 Biomedical Informatics SHARP Area 4: Secondary Use of EHR Data • Agilex Technologies • Harvard Univ. • CDISC (Clinical Data Interchange• Intermountain Healthcare Standards Consortium) • Mayo Clinic • Centerphase Solutions • Mirth Corporation, Inc. • Deloitte • MIT • Group Health, Seattle • MITRE Corp. • IBM Watson Research Labs • Regenstrief Institute, Inc. • University of Utah • SUNY • University of Pittsburgh • University of Colorado Biomedical Informatics Themes & Projects Biomedical Informatics Clinical Data Normalization • Data Normalization • Clinical data comes in all different forms even for the same kind of information. • Comparable and consistent data is foundational to secondary use • Clinical Data Models – Clinical Element Models (CEMs) • Basis for retaining computable meaning when data is exchanged between heterogeneous computer systems. • Basis for shared computable meaning when clinical data is referenced in decision support logic. Biomedical Informatics A diagram of a simple clinical model Clinical Element Model for Systolic Blood Pressure SystolicBPObs data 138 mmHg quals BodyLocation data Right Arm PatientPosition data # 23 Sitting Biomedical Informatics Biomedical Informatics Data Element Harmonization • Stan Huff – CIMI • Clinical Information Model Initiative • NHS Clinical Statement • CEN TC251/OpenEHR Archetypes • HL7 Templates • ISO TC215 Detailed Clinical Models • CDISC Common Clinical Elements • Intermountain/GE CEMs Biomedical Informatics Normalization Pipelines • Input heterogeneous clinical data • HL7, CDA/CCD, structured feeds • Output Normalized CEMs • Create logical structures within UIMA CAS • Serialize to a persistence layer • SQL, RDF, “PCAST like”, XML • Robust Prototypes Q1 2012 • Early version production Q3 2010 Biomedical Informatics This slide is obvious 1. Physical Quantity 2. Coded Values 3. Text field Determine payload Is laboratory data Biomedical Informatics Natural Language Processing • Information extraction (IE): transformation of unstructured text into structured representations and merging clinical data extracted from free text with structured data • Entity and Event discovery • Relation discovery • Normalization template: Clinical Element Model • Improving the functionality, interoperability, and usability of a clinical NLP system(s), namely the Clinical Text Analysis and Knowledge Extraction System (cTAKES) Biomedical Informatics High-Throughput Phenotyping • Phenotype - identifying a set of characteristics of about a patient, such as: • A diagnosis • Demographics • A set of lab results • Phenotyping – overload of terms • Originally for research cohorts from EMRs • Obvious extension to clinical trial eligibility • Note relevance to quality metrics • Numerator and denominator constitute phenotypes • Clinical decision support • Trigger criteria Biomedical Informatics EMR Phenotype Algorithms • Typical components • Billing and diagnoses codes; Procedure codes • Labs; Medications • Phenotype-specific co-variates (e.g., Demographics, Vitals, Smoking Status, CASI scores) • Pathology; Imaging? • Organized into inclusion and exclusion criteria • Experience from eMERGE Electronic Medical Records and Genomics Network (http://www.gwas.net) Biomedical Informatics SHARP Area 4: More information… http://informatics.mayo.edu/sharp http://sharpn.org Biomedical Informatics Where is This Going? • Standards and interoperability are crucial to the operation and success of academic medical centers • The boundaries among clinical and research standards are eroding • Information standards, especially vocabularies, are the foundation for scientific synergies • Data Governance, at an enterprise (and arguably a national) level is critical 32