Computable Semantics and Probabilistic Graphical Models
Where Probabilistic Systems and Semantics Rub Elbows

Peter Haug, MD
Homer Warner Center for Informatics Research
Intermountain Healthcare

First of all: Thanks
This work has many contributors:
• Dominik Aronsky, MD, PhD
• Jeffrey Ferraro, PhD
• Stan Huff, MD
• Scott Evans, PhD
• Robert Hausam, MD
• Lee Pierce
• Xinzu Wu, PhD
• Matthew Ebert
• Kumar Mynam
• And many more!
Please ask questions …

Agenda
• Why Decision Support?
• Introduction: Bayesian Diagnostic Networks
• Bayesian Systems
  • A Few Bayesian Tools
  • Diagnostic Systems
• Representing the Semantics of Diagnosis
  • A Framework for Computable Models
  • Diagnostic Modeling with Ontologies
  • Ontologies -> Bayesian Network
• Clinical Data
  • A Brief Look at Medical Data Forms

Computerized Decision Support: Core Assumptions
‘... man is not perfectible. There are limits to man’s capabilities as an information processor that assure the occurrence of random errors in his activities.’ ~ Clement J. McDonald, MD (1976)
‘The complexity of modern medicine exceeds the inherent limitations of the unaided human mind.’ ~ David M. Eddy, MD, Ph.D. (1990)

Underlying principle: We are designing the system so that the computer is an active part of patient care, not just a way of getting data to people to read.

The Reverend Thomas Bayes (1702 to 1761)
Bayes’s theory of probability was published posthumously, in 1764, after Richard Price, a friend of Bayes, discovered two unpublished essays among Bayes’s papers and forwarded them to the Royal Society.
A Way to Think about Probabilistic Systems (and an introduction to some terminology)

Learning from Data
• The data comes from health care encounters
• It is captured in Electronic Health Records (EHRs)
• It is aggregated and organized in Enterprise Data Warehouses (EDWs)
• It includes the diagnoses and the data that support them

Bayesian Networks
• Model the joint probability distribution of the data and diagnoses
• Use directed graphs to structure these models

Re-Using Healthcare Data
[Diagram: Episodes of Care -> Medical Information System -> Enterprise Data Warehouse, which supports Quality Improvement, Measures of Care, Clinical Research, and Medical Decision Support]

Example: Patients with Symptoms of Heart Disease
[Diagram: Patient Population -> Data Collected in a Care Setting]

Original Data
Patient ID | Myocardial Infarction | Chest Pain | ST Segment
1          | Present               | Present    | Elevated
2          | Absent                | Absent     | Normal
3          | Present               | Absent     | Depressed
4          | Absent                | Absent     | Normal
5          | Absent                | Absent     | Normal
6          | Absent                | Absent     | Normal
7          | Absent                | Absent     | Normal
….         | ….                    | ….         | ….
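Before any probabilities are computed, raw records like these are condensed into counts. The sketch below does that for the seven rows shown, assuming each record is a (patient_id, mi, chest_pain, st_segment) tuple; the tuple layout and the function name `cross_tab` are illustrative, and the counts are not the 1000-case totals used on the next slides.

```python
from collections import Counter

# The seven rows shown in the "Original Data" table (the full data set has 1000 cases).
records = [
    (1, "Present", "Present", "Elevated"),
    (2, "Absent", "Absent", "Normal"),
    (3, "Present", "Absent", "Depressed"),
    (4, "Absent", "Absent", "Normal"),
    (5, "Absent", "Absent", "Normal"),
    (6, "Absent", "Absent", "Normal"),
    (7, "Absent", "Absent", "Normal"),
]

def cross_tab(rows):
    """Count patients in each (MI, chest pain) cell of a 2x2 table."""
    return Counter((mi, chest_pain) for _, mi, chest_pain, _ in rows)

table = cross_tab(records)
for mi in ("Present", "Absent"):
    for cp in ("Present", "Absent"):
        # Counter returns 0 for cells that never occur.
        print(f"MI {mi:7} / Chest Pain {cp:7}: {table[(mi, cp)]}")
```

The same pattern extends to more findings by adding fields to the key tuple, for example the ST segment value.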
Summarizing the Data: The Numbers
A Condensed Look at 1000 Cases (20 with MI, 980 without)

              | MI | No MI | Total
Chest Pain    | 15 | 80    | 95
No Chest Pain | 5  | 900   | 905
Total         | 20 | 980   | 1000

Summarizing the Data: Proportions
              | MI        | No MI       | Total
Chest Pain    | 15 (1.5%) | 80 (8.0%)   | 95 (9.5%)
No Chest Pain | 5 (0.5%)  | 900 (90.0%) | 905 (90.5%)
Total         | 20 (2%)   | 980 (98%)   | 1000 (100%)
And the “Marginal Probabilities” (the Total row and column)

Another Summary: The Joint Probability Distribution
Another view of the 2x2 table: each interior cell, as a fraction of all 1000 cases, is the joint probability of that disease/finding combination.

Dividing by the Column Marginals
              | MI                                   | No MI
Chest Pain    | 75% (Sensitivity: P(F|D))            | 8% (False Positive Rate: P(F|no D))
No Chest Pain | 25% (False Negative Rate: P(no F|D)) | 92% (Specificity: P(no F|no D))
Total         | 100%                                 | 100%

Bayes Equation
Inferring the probability of a disease (D) from a finding (F):
$$P(D \mid F) = \frac{P(D)\,P(F \mid D)}{P(F)}$$
where P(D|F) is the posterior disease probability, P(D) the prior disease probability, P(F|D) the sensitivity, and P(F) the probability of the finding.

Probability Updating
The disease is myocardial infarction; the finding is chest pain.
$$P(MI \mid \text{Chest Pain}) = \frac{P(MI)\,P(\text{Chest Pain} \mid MI)}{P(\text{Chest Pain})}$$
P(MI) = 2.0% (0.02)
P(Chest Pain | MI) = 75% (0.75)
P(Chest Pain) = ?
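The column and row marginals above can be computed directly from the four cell counts. A minimal sketch, assuming a small `TwoByTwo` container of our own (not from any library); it also shows that when P(F) is read from the same table (95/1000), the Bayes equation reproduces the positive predictive value exactly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 table of disease (D) versus finding (F)."""
    d_f: int      # disease present, finding present
    nod_f: int    # disease absent, finding present
    d_nof: int    # disease present, finding absent
    nod_nof: int  # disease absent, finding absent

    @property
    def total(self) -> int:
        return self.d_f + self.nod_f + self.d_nof + self.nod_nof

    # Dividing by the column marginals (number with / without disease)
    def sensitivity(self) -> float:   # P(F | D)
        return self.d_f / (self.d_f + self.d_nof)

    def specificity(self) -> float:   # P(no F | no D)
        return self.nod_nof / (self.nod_f + self.nod_nof)

    # Dividing by the row marginals (number with / without finding)
    def ppv(self) -> float:           # P(D | F)
        return self.d_f / (self.d_f + self.nod_f)

    def npv(self) -> float:           # P(no D | no F)
        return self.nod_nof / (self.d_nof + self.nod_nof)

    def prevalence(self) -> float:    # P(D)
        return (self.d_f + self.d_nof) / self.total

    def p_finding(self) -> float:     # P(F)
        return (self.d_f + self.nod_f) / self.total


def bayes(prior: float, sensitivity: float, p_finding: float) -> float:
    """Bayes equation: P(D|F) = P(D) P(F|D) / P(F)."""
    return prior * sensitivity / p_finding


mi = TwoByTwo(d_f=15, nod_f=80, d_nof=5, nod_nof=900)
print(f"Sensitivity {mi.sensitivity():.2f}, specificity {mi.specificity():.3f}")
print(f"PPV {mi.ppv():.3f}, NPV {mi.npv():.3f}")
# With P(Chest Pain) taken from the same table, Bayes gives the PPV.
print(f"P(MI | Chest Pain) = {bayes(mi.prevalence(), mi.sensitivity(), mi.p_finding()):.3f}")
```

Here P(F) is observed directly; the open question on the slide is how to obtain P(F) when only priors and conditional probabilities are known.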
The Question of P(F)
Simple Bayes
• Patient has one and only one disease
Multi-Membership Bayes
• Patient has any group of diseases
• Each disease is evaluated independently
Bayesian Networks
• Patient has any group of diseases
• Diseases are evaluated according to their collective (joint) behavior

Simple Bayes: add all of the probabilities of having both the finding and a disease.
$$P(F) = \sum_i P(F \text{ and } D_i) = \sum_i P(D_i)\,P(F \mid D_i)$$

Multi-Membership Bayes: two states apply for each disease, with and without the disease.
$$P(F) = P(D_i)\,P(F \mid D_i) + P(\neg D_i)\,P(F \mid \neg D_i)$$

Bayesian Networks: P(F) is determined from the joint effect of child nodes on their parents.
[Diagram: Disease -> Intermediate Concept -> Findings 1 to 4]

Probability Updating Using the Multi-Membership Model
The disease is myocardial infarction; the finding is chest pain.
$$P(MI \mid \text{Chest Pain}) = \frac{P(MI)\,P(\text{Chest Pain} \mid MI)}{P(MI)\,P(\text{Chest Pain} \mid MI) + P(\neg MI)\,P(\text{Chest Pain} \mid \neg MI)} = \frac{0.02 \times 0.75}{0.02 \times 0.75 + 0.98 \times 0.08} \approx 0.16$$
P(MI) = 2.0% (0.02)
P(Chest Pain | MI) = 75% (0.75)
P(Chest Pain) = 0.02 × 0.75 + 0.98 × 0.08 = 0.0934

Diagnostic Bayesian Networks (Demonstrating Different Characteristics)
Simple Bayes
• Patient has one disease
• All findings are conditionally independent
Multi-Membership Bayes
• Patient can have multiple diseases
• All diseases are evaluated independently
Bayesian Networks
• Any relationship among diseases and findings
• Can represent any of the other models
• Multilayered models
• Graphical/probabilistic representation of knowledge

Using a Bayesian Network
Examples of Bayesian Diagnostics in Netica (www.Norsys.com)
[Netica screenshot: A Simple Bayesian Network (One Finding): Myocardial Infarction (Present 2.00%) -> Chest Pain (Present 9.34%)]
[Netica screenshot: A Simple Bayesian Network (Several Findings): Myocardial Infarction (Present 2.00%) -> Chest Pain, ST Elevation, Troponin Increase]

More Diagnostic Examples Using Pulmonary Diseases
• Pneumonia
• Asthma
• COPD
• Pulmonary Embolism
With Increasingly Complex Models
• Simple Bayes
• Multi-Membership Bayes
• Complex Relationships

Bayesian Diagnostic Models (Naïve Bayes)
[Netica screenshot: a single Disease node (Pneumonia, Asthma, Chronic Bronchitis, Other; set here to Pneumonia = 100%) with child findings Elevated WBC, Dyspnea, Wheezing, Cough, and Fever]

Bayesian Diagnostic Models (Multi-Membership Bayes)
[Netica screenshot: separate Pneumonia and Asthma nodes, each with its own set of finding nodes (Dyspnea, Wheezing, Elevated WBC, Fever, Cough)]
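The Multi-Membership calculation above is the law of total probability over two states (disease present, disease absent). A minimal sketch, with function names of our own choosing; the same `p_finding` covers Simple Bayes if the lists hold every disease in a mutually exclusive, exhaustive set.

```python
def p_finding(priors, likelihoods):
    """Total probability: P(F) = sum_i P(D_i) P(F | D_i).

    The states D_i must be mutually exclusive and exhaustive: the full
    disease list in Simple Bayes, or {disease, no disease} for a single
    disease in the Multi-Membership model.
    """
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1")
    return sum(p * l for p, l in zip(priors, likelihoods))


def posterior(priors, likelihoods, index=0):
    """Bayes equation for state `index`: P(D_i | F) = P(D_i) P(F | D_i) / P(F)."""
    return priors[index] * likelihoods[index] / p_finding(priors, likelihoods)


# Multi-Membership view of MI: two states, MI and no MI.
priors = [0.02, 0.98]        # P(MI), P(no MI)
likelihoods = [0.75, 0.08]   # P(Chest Pain | MI), P(Chest Pain | no MI)

print(f"P(Chest Pain)      = {p_finding(priors, likelihoods):.4f}")
print(f"P(MI | Chest Pain) = {posterior(priors, likelihoods):.2f}")
```

The computed P(Chest Pain) of 0.0934 is the same 9.34% that the one-finding Netica network displays for the Chest Pain node.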
Bayesian Diagnostic Models (Bayesian Network: Two-Layer)
[Netica screenshot: Pneumonia and Asthma in one layer; the shared findings Dyspnea, Wheezing, Cough, Fever, and Elevated WBC in a second layer]

Bayesian Diagnostic Models (Multi-Layer Bayesian Network)
[Netica screenshot: Pneumonia and Asthma, an intermediate Systemic Inflammation node, and the findings Dyspnea, Wheezing, Cough, Fever, and Elevated WBC]

Bayesian Diagnostic Models (Multi-Layer with Continuous Variables)
[Netica screenshot: the multi-layer model with a continuous Temperature node (35 to 45 °C in 0.5-degree intervals; 37.9 ± 1.2) and a continuous WBC node (0 to 40 in intervals of 5; 8.26 ± 2.3)]

Bayesian Diagnostic Models (Multi-Layer with Added Associations)
[Netica screenshot: adds Chronic Bronchitis, Pulmonary Embolus, and a Chest Pain finding to the Pneumonia, Asthma, Dyspnea, Cough, Wheezing, Temperature, and WBC nodes]
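Netica displays a continuous variable such as Temperature as a set of intervals, each with a probability. A minimal sketch of one way to fill such intervals, assuming (hypothetically) that the variable is normally distributed with the mean and SD shown on the slide; the slide's own interval probabilities are visibly skewed and were presumably learned from data, so this does not reproduce them.

```python
import math

def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def bin_probabilities(edges, mean, sd):
    """Probability of a normal(mean, sd) variable in each interval [edges[i], edges[i+1]).

    Mass below the first edge and above the last edge is folded into the
    end intervals so the discretized distribution still sums to 1.
    """
    cdf = [normal_cdf(e, mean, sd) for e in edges]
    probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
    probs[0] += cdf[0]            # lower tail
    probs[-1] += 1.0 - cdf[-1]    # upper tail
    return probs

# Temperature in 0.5-degree intervals from 35 to 45; the normal shape is an assumption.
edges = [35 + 0.5 * i for i in range(21)]
probs = bin_probabilities(edges, mean=37.9, sd=1.2)
for lo, p in zip(edges, probs):
    print(f"{lo:4.1f} to {lo + 0.5:4.1f}: {100 * p:5.2f}%")
```

Folding the tails into the end intervals mirrors how a discretized node must still account for values outside its displayed range.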
Using Bayesian Diagnostic Systems in Care
Example: Diagnosing Pneumonia

Protocols: Computers Intervene in the Workflow (an example from the ED)
Goal:
• Rapidly screen for pneumonia patients in the ED
• Assess risk of death
• Apply a pneumonia care protocol
Approach:
• Use a probabilistic system to identify patients
  • Diagnostic Bayesian networks
  • Supported with natural language processing*
• Suggest enrollment in the pneumonia protocol
• Provide therapeutic suggestions
*Extracts data from the X-ray report

Advanced CDS (Diagnostic Models)
Example: Community-Acquired Pneumonia
[Diagram: Chest X-ray Reports -> Chest X-ray Report Processing (Structured Data Extraction) -> Data Supporting Pneumonia Assessment -> Pneumonia Screening Tool ("Does the patient have pneumonia?") -> Pneumonia Protocol Enrollment ("Should we use the protocol?") -> Pneumonia Treatment Protocol ("Apply pneumonia care protocol."), all drawing on a Computable Medical Knowledge Repository]
[Diagram, continued: Clinical Data Repository]

The Emergency Department Workflow
• Embed logic and orders into the process of care
• Alerting for pneumonia in the patient tracking system
• The system watches the data flow in the ED
• The treatment protocol uses data from the EHR combined with manually input data

[Netica screenshot: ED pneumonia screening network. PNEUMONIA (Present 5.09%, Absent 94.9%) with evidence nodes ChiefComplaint (many categories, led by Respiratory Complaint 32.4%, Fever 6.96%, Abd Pain 6.05%), BPSystolic, BPDiastolic, MeanBP, HeartRate, RespRate, TempC, SpO2, Sodium, Chloride, Creatinine, WBC, Age, and NLP_FINDING (Positive 25.9%). Continuous values are split into a few intervals, e.g. TempC < 36.75, 36.75 to 37.45, 37.45 to 38.05, >= 38.05; RespRate < 19.5, 19.5 to 21.5, 21.5 to 27.5, >= 27.5; WBC < 11.85, 11.85 to 18.75, >= 18.75]

Implemented Using:
• Diagnostic System
  • Bayesian Network
  • Model Trained Using EDW Data
• NLP System
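Before raw EHR values can be entered as evidence, they have to be mapped onto the network's discrete states. A minimal sketch using the interval cut points listed in the screenshot; how the production system actually performed this mapping is not shown, and the `CUTPOINTS` table, function names, and patient values are hypothetical.

```python
from bisect import bisect_right

# Interval cut points copied from the ED pneumonia network screenshot.
CUTPOINTS = {
    "TempC": [36.75, 37.45, 38.05],
    "RespRate": [19.5, 21.5, 27.5],
    "HeartRate": [85.5, 99.5, 110.5],
    "SpO2": [92.1, 95.3, 98.4],
    "WBC": [11.85, 18.75],
}

def state_labels(cuts):
    """Build Netica-style labels: '< a', 'a to b', ..., '>= z'."""
    labels = [f"< {cuts[0]}"]
    labels += [f"{lo} to {hi}" for lo, hi in zip(cuts, cuts[1:])]
    labels.append(f">= {cuts[-1]}")
    return labels

def discretize(node, value):
    """Return the state label for a raw value; a value equal to a cut point
    falls into the interval that starts at that cut point."""
    cuts = CUTPOINTS[node]
    return state_labels(cuts)[bisect_right(cuts, value)]

# Hypothetical ED observations for one patient.
observations = {"TempC": 38.4, "RespRate": 24, "HeartRate": 104, "SpO2": 93.0, "WBC": 14.2}
evidence = {node: discretize(node, v) for node, v in observations.items()}
print(evidence)
```

`bisect_right` matches the ">=" convention of the labels: a temperature of exactly 36.75 lands in "36.75 to 37.45", not "< 36.75".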
NLP System: Random Forests-Based Concept Identification
• Trained with documents in the EDW
[Screenshot: breath-sound concepts identified in clinical text, each a Yes/No node, e.g. BS_CLEAR, BS_CRACKLES, BS_WHEEZES, BS_RALES, BS_RHONCHI, BS_STRIDOR, BS_DECREASED, BS_ABSENT, BS_PRODUCTIVE_CO..., BS_NON_PRODUCTIVE_CO...]

The Process of Data-Based Research (finding the right data)
• Identify research problem: clinical researcher
• Determine subject availability (query database): clinical researcher + data analyst + terminologist
• Determine data availability (query database): clinical researcher + data analyst + terminologist
• Collect/analyze data (query database; data review/analysis): clinical researcher + data analyst + terminologist + statistician
• Review results: clinical researcher
Data discovery and extraction takes 80-90% of the time.
Building a System to Automate Predictive Modeling
• Build a system that can:
  • Identify the target patients
  • Identify relevant data elements
  • Extract patients and data from the EDW/AHR
  • Provide initial analyses
  • Support refinement
• The key is teaching the system a certain amount of medical knowledge
  • Ontologies: tools for capturing complex medical knowledge

Ontology-Driven Model Discovery
• Can we use knowledge embedded in ontologies to drive research?
• The ontology would:
  • Help select research patients
  • Identify and extract relevant data
  • Provide preliminary analysis of the data

[Diagram: A tool to support medical data mining. Disease Ontology -> Design Utility (structural knowledge retrieval; relevant ontologic concepts) -> concept translation to EDW representation -> retrieval from the Analytic Health Repository (with a Natural Language Processing subsystem) -> Analytic Workbench (screening models, model comparisons, prediction algorithm $P(d \mid f) = \frac{P(d)\,P(f \mid d)}{\sum_i P(d_i)\,P(f \mid d_i)}$) -> analysis results, with model explanation by reference to the ontology; output analytic data; return data and results to the user for further study; allow visualization of this data]

Ontologies Describe How Diseases Are Related (according to ICD9)
Pneumonia
• Viral Pneumonia: Viral pneumonia (ICD9: 480)
• Bacterial Pneumonia
  • Pneumococcal Pneumonia: Pneumococcal pneumonia (ICD9: 481)
  • Other Bacterial Pneumonia: Other bacterial pneumonia (ICD9: 482)
    • Pseudomonas Pneumonia: Pneumonia due to Pseudomonas (ICD9: 482.1)
    • Hemophilus Pneumonia: Pneumonia due to Hemophilus influenzae (ICD9: 482.2)
    • Streptococcal Pneumonia: Pneumonia due to other Streptococcus (ICD9: 482.3)
    • Staphylococcal Pneumonia: Pneumonia due to Staphylococcus (ICD9: 482.4)
      • Staph Aureus Pneumonia: Pneumonia due to Staphylococcus, unspecified (ICD9: 482.40)
      • MSSA Staph Pneumonia: Methicillin Susceptible Staph Aureus (MSSA) Pneumonia (ICD9: 482.41)
      • MRSA Staph Pneumonia: Methicillin Resistant Staph Aureus (MRSA) Pneumonia (ICD9: 482.42)
      • Other Staph Pneumonia: Other Staphylococcus pneumonia (ICD9: 482.49)
    • More Bacterial Pneumonias
• Bronchopneumonia: Bronchopneumonia, organism unspecified (ICD9: 485)
• More Pneumonias

Ontologies Describe How Clinical Data Are Related to Diseases
Pneumonia (Pneumonia, organism unspecified; ICD9: 486), with subtypes such as Bacterial Pneumonia -> Pneumococcal pneumonia (ICD9: 481) and Other bacterial pneumonia (ICD9: 482)
• has_Altered_VS -> Temperature (Vital Signs: Temperature; LOINC: 8310-5)
• has_Altered_Lab_Value -> White Blood Count (Hematology: White Blood Count; LOINC: 62239-9)
• has_Sign -> Pulmonary Rales (Signs: Chest Auscultation-Rales; PTXT: 28.1.3.22.34.2.1.32)
• has_X-ray_Manifestation -> Localized Infiltrate (X-ray Finding: Localized Infiltrate; SNOMED: 128309002)
• has_Micro_Manifestation -> + Sputum Culture (Sputum Culture: Positive; SNOMED: 442773002)
• has_??_Manifestation -> More Manifestations

Visualizing the Results
• Comparing two models using the ROC curves
• Inspecting the tradeoffs in accuracy

Extensions of Diagnostic Modeling
• Large Models
• Redundant Data
• Equations and Logic
• Temporal Models

Simple Temporal Model
[Netica screenshot: Time Slices 1 to 3, with linked PNEUMONIA nodes, each having TEMP, WBC, and NLP_FINDING children, plus AGE, chief complaint (CC), and an Admit Dx: Pneumonia node; captions "Following Disease Over Time" and "Summarized Data as Features"]
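The ontology-driven idea (use the ontology to select research patients and identify relevant data) can be sketched with a hand-built excerpt of the ICD9 pneumonia hierarchy shown above. The dictionaries and function below are illustrative assumptions, not a real ontology service or API; a "Bacterial Pneumonia" query returns every ICD9 code that would be used to pull such patients from the EDW.

```python
from collections import deque

# Hand-built excerpt of the pneumonia hierarchy (illustrative only).
ICD9 = {
    "Viral Pneumonia": "480",
    "Pneumococcal Pneumonia": "481",
    "Other Bacterial Pneumonia": "482",
    "Pseudomonas Pneumonia": "482.1",
    "Hemophilus Pneumonia": "482.2",
    "Streptococcal Pneumonia": "482.3",
    "Staphylococcal Pneumonia": "482.4",
    "Staph Aureus Pneumonia": "482.40",
    "MSSA Staph Pneumonia": "482.41",
    "MRSA Staph Pneumonia": "482.42",
    "Other Staph Pneumonia": "482.49",
    "Bronchopneumonia": "485",
}
CHILDREN = {
    "Pneumonia": ["Viral Pneumonia", "Bacterial Pneumonia", "Bronchopneumonia"],
    "Bacterial Pneumonia": ["Pneumococcal Pneumonia", "Other Bacterial Pneumonia"],
    "Other Bacterial Pneumonia": ["Pseudomonas Pneumonia", "Hemophilus Pneumonia",
                                  "Streptococcal Pneumonia", "Staphylococcal Pneumonia"],
    "Staphylococcal Pneumonia": ["Staph Aureus Pneumonia", "MSSA Staph Pneumonia",
                                 "MRSA Staph Pneumonia", "Other Staph Pneumonia"],
}
# Disease-to-data relationships from the second ontology slide (excerpt).
MANIFESTATIONS = {
    "Pneumonia": {
        "has_Altered_VS": "LOINC 8310-5 (Temperature)",
        "has_Altered_Lab_Value": "LOINC 62239-9 (White Blood Count)",
        "has_X-ray_Manifestation": "SNOMED 128309002 (Localized Infiltrate)",
    },
}

def descendant_codes(concept):
    """Breadth-first walk collecting the ICD9 codes of a concept and all its descendants."""
    codes, queue = [], deque([concept])
    while queue:
        node = queue.popleft()
        if node in ICD9:          # grouping concepts may have no code of their own
            codes.append(ICD9[node])
        queue.extend(CHILDREN.get(node, []))
    return codes

# Patient selection codes, then the data elements to extract for those patients.
print(descendant_codes("Bacterial Pneumonia"))
print(MANIFESTATIONS["Pneumonia"])
```

A real ontology would be queried through its own tooling; the point here is only that subsumption links turn one clinical question into a complete code set.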
A Diagram of a Simple Clinical Model (A Data Object)
Clinical Element Model for White Blood Count
[Diagram: White Blood Count (WBCLabObs), data: 9.6 x 10^3, Units: Cells per CC; qualifiers: Specimen Type (SpecimenType), data: Whole Blood; Comment, data: Specimen Hemolyzed]

What Does a Medical Concept Look Like (in probability space)?
Concepts vary based on source, goals, and usage.
• Pneumonia: Present, Absent
• White Blood Count: Specimen Type, Units, Value
• Cough: Present, Absent, Unknown
• Pulmonary Infiltrate (Chest X-ray Report): Present, Possible, Absent, Unknown

What Does a Concept Look Like?
Some concepts have subconcepts.
• Pulmonary Infiltrate (Chest X-ray Report): Present, Possible, Absent, Unknown
• White Blood Count: Specimen Type, Units, Value
  • Specimen Type: Blood, Pleural Fluid, Ascitic Fluid, …
  • Units: Mg per Deciliter, Grams, Cells per CC, …
  • Value: Real Number

What Does a Concept Look Like?
Concepts Can Be Modeled Probabilistically
[Netica screenshot: Pneumonia (Present 1.50%); White_Blood_Count_Specimen (Blood, Pleural Fluid, Ascitic Fluid, Urine); White_Blood_Count_Units (mg per deciliter, kilograms, grams, cells per cc, etc); White_Blood_Count_Value (0 to >= 13000 in intervals of 1000; 6010 ± 2000); Cough (Present, Absent, Unknown); CBC_White_Blood_Count (Unavailable 95.4%, otherwise binned values); Pulmonary Infiltrate (Chest X-Ray Report) (Present, Possible, Absent, Unknown)]

What Does a Concept Look Like?
Concepts are (in part) defined by their relationships.
[Diagram: Pneumonia (Present, Absent) -Causes-> Pulmonary Infiltrate (Present, Absent) -Reported As-> Pulmonary Infiltrate (Chest X-ray Report) (Present, Possible, Absent, Unknown); White Blood Count (Specimen Type, Units, Value), using Specimen: Blood, Units: Cells/CC, and Value Thresholds High 9,000 / Low 2,000, becomes White Blood Count (Elevated, Normal, Reduced, Unavailable)]

What Does a Concept Look Like?
And there are a number of ways to compute concepts.
[Netica screenshot: Pneumonia -Causes-> Pulmonary Infiltrate (Present 5.41%) -Reported As-> Pulmonary Infiltrate (Chest X-Ray Report); the White_Blood_Count_Specimen, _Units, and _Value nodes and CBC_White_Blood_Count feed a computed White_Blood_Count node (Elevated 0.30%, Normal 4.15%, Reduced .098%, Unavailable 95.4%) defined by Specimen: Blood, Units: Cells/CC, and Value Thresholds High 9,000 / Low 2,000]

Conclusion
• Graphical probabilistic models can capture the semantics of medical diagnosis.
• These models can be manufactured using data collected during the course of care.
• Probabilistic models can participate in clinical care.
• Medical terminologies, embedded in ontologies, can help to develop these models.

Comments and Questions
Questions???

Probability and Semantics
One way to think of semantics: a set of relationships between concepts
• Disease -> Finding: Pneumonia -> Cough
• Concept -> Word: Mammal -> Mouse
• Whole -> Part: Hand -> Thumb
The arrows provide links across which we can reason: P(A), P(B|A)

A Diagram of a Simple Clinical Model
Clinical Element Model for Systolic Blood Pressure
[Diagram: SystolicBP (SystolicBPObs), data: 138 mmHg; qualifiers: BodyLocation, data: Right Arm; PatientPosition, data: Sitting]

What if there is no model?
• Site #1: Dry Weight: 70 kg
• Site #2: Weight: 70 kg, with a weight type (Dry, Wet, Ideal)

Too Many Ways to Say the Same Thing
• A single name/code and value
  • Dry Weight is 70 kg
• A combination of two names/codes and values
  • Weight is 70 kg
  • Weight type is dry

Terminology
• Probability
  • P(D): probability of disease
  • Implies a ratio or rate
  • Names: prevalence, prior probability
  • Location specific: a population from a specific setting
$$P(D) = \frac{\text{Number with Disease}}{\text{Number in Population}}$$

More Terminology
Conditional probability
• Probability of a finding in a patient with a disease:
$$P(F \mid D) = \frac{\text{Number with Disease and Finding}}{\text{Number with Disease}}$$
• Probability of a disease in a patient with a finding:
$$P(D \mid F) = \frac{\text{Number with Disease and Finding}}{\text{Number with Finding}}$$
• Probability of disease in a patient with finding 1, finding 2, no finding 3, finding 4, no finding 5, etc.:
$$P(D \mid F_1, F_2, \neg F_3, F_4, \neg F_5, \ldots) = \frac{\text{Number with Disease and the Group of Findings}}{\text{Number with the Group of Findings}}$$

Names for the Numbers
P(D), here MI 2% and No MI 98% (total 100%), is called the prevalence or prior probability.

Yet Another View: Dividing by the Row Marginals
              | MI                                      | No MI                                         | Total
Chest Pain    | 16% (Positive Predictive Value: P(D|F)) | 84%                                           | 100%
No Chest Pain | 0.6%                                    | 99% (Negative Predictive Value: P(no D|no F)) | 100%

From Data to Probabilities
[Diagram: Data -> Bayesian Calculation -> diagnoses with probabilities, e.g. Pneumonia 92%, Asthma 14%, Chronic Bronchitis 12%, Acute Bronchitis 8%]

Bayes Equation
Probability of disease when the finding is present: the probability of both the disease and the finding, divided by the probability of the finding.
$$P(D \mid F) = \frac{P(F \text{ and } D)}{P(F)}$$
From probability theory, $P(F \text{ and } D) = P(D)\,P(F \mid D)$, so
$$P(D \mid F) = \frac{P(D)\,P(F \mid D)}{P(F)}$$
(posterior disease probability = prior disease probability × sensitivity ÷ probability of finding)

What about More Findings?
• The joy of recursion!
  • F1 = Chest Pain
  • F2 = ST Elevation
  • F3 = CK Increased
  • …
$$P(D \mid F_1) = \frac{P(D)\,P(F_1 \mid D)}{P(F_1)}$$
$$P(D' \mid F_2) = \frac{P(D')\,P(F_2 \mid D')}{P(F_2)}$$
$$P(D'' \mid F_3) = \frac{P(D'')\,P(F_3 \mid D'')}{P(F_3)}$$
etc., where P(D') = P(D | F1) and P(D'') = P(D' | F2) are the posteriors from the previous step. Each posterior becomes the next prior; this chaining assumes the findings are conditionally independent given the disease.

Modeling Medical Phenomena
Examples of Some of the Things that can be Modeled

The Effect of Noise on the Diagnosis of Pneumonia
• Noise/bias modeled
  • Noisy lab (continuous) data
  • Noisy physical exam (categorical) data
• Types of noise
  • Bias
  • Imprecision
[Netica screenshot: Pneumonia -> Real White Blood Count (8.15 ± 2.4) -> Measured White Blood Count (8.17 ± 3.5), with a Source of Result node (Small SD, Big SD, Bias High, Bias Low); Pneumonia -> Rales Really There! (Present 6.70%) -> Auscultated Rales (Present 18.6%), with a Reported By node (Medical Student, Resident, Attending, Pulmonologist, Over Sensitive Med Student); captions "Normal and LogNormal Distributions" and "Simple Discrete Distributions of Different Types: Normal Noise/Bias"]

Boolean Logic
• Four variables: A (Present 20%), B (Present 1%), E (Present 5%), H (High 20%, Medium 60%, Low 20%)
• Five interconnected rules:
  • C: If A and B then C (Present 0.20%)
  • D: If A or B then D (Present 20.8%)
  • F: If A or B or E then F (Present 24.8%)
  • G: If (C and D) or (E and F) then G (Present 5.19%)
  • I: If B and F and H = High then I
Probabilistic Logic
• If A and B then C: P(C) = P(A and B) = P(A) × P(B|A)
• If A or B then D: P(D) = P(A or B) = P(A) + P(B) - P(A and B)
In a Bayesian network, the resolution of linked rules occurs automatically.

Temporal Phenomena
Several Approaches to Temporal Modeling Have Been Proposed
• Markov and hidden Markov models are most common
  • Called dynamic or temporal Bayesian networks
  • Can model complex disease behavior
  • Trained from data organized in “time slices”
• Can be extended to include decisions and utilities
  • (they become “partially observable Markov decision processes”)

The Dynamic Bayesian Network
• Can model changing medical phenomena
  • Changes in the status of a disease
  • The state or findings caused by the disease in its various states
• Can be used for explanation, prediction, and diagnosis
[Netica screenshot: first time slice Disease_Status (Absent 90.0%, Mild 4.00%, Moderate 3.00%, Severe 2.00%, Dead 1.00%) -> second time slice Disease_Status1 (Absent 81.6%, Mild 7.54%, Moderate 4.60%, Severe 3.12%, Dead 3.13%); each slice has a Test node (Normal, Mildly Abnormal, Severely Abnormal, Patient Deceased)]

Pancreatitis Over Time
[Netica screenshot: two time slices, each with a Pancreatitis node (Acute, Recovering, Dischargeable; 63.0% / 8.23% / 28.8% in the first slice, 39.7% / 28.9% / 31.4% in the second) and the findings Abdominal Pain, Amylase, Lipase, Glucose, and WBC]
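The "joy of recursion" (each posterior becomes the next prior) and the time slices of a dynamic Bayesian network come together in forward filtering for a hidden Markov model. A minimal sketch, assuming one hidden Disease_Status node and one Test node per slice as on the DBN slide. Only the first-slice distribution comes from the slide; the transition and emission tables are made up for illustration and are not the network's actual tables.

```python
STATES = ["Absent", "Mild", "Moderate", "Severe", "Dead"]
TESTS = ["Normal", "Mildly Abnormal", "Severely Abnormal", "Patient Deceased"]

# First-time-slice distribution from the Disease_Status node on the slide.
initial = [0.90, 0.04, 0.03, 0.02, 0.01]

# HYPOTHETICAL P(status at t+1 | status at t). Rows sum to 1.
transition = [
    [0.90, 0.06, 0.02, 0.01, 0.01],  # from Absent
    [0.20, 0.60, 0.15, 0.04, 0.01],  # from Mild
    [0.05, 0.25, 0.50, 0.15, 0.05],  # from Moderate
    [0.00, 0.05, 0.25, 0.55, 0.15],  # from Severe
    [0.00, 0.00, 0.00, 0.00, 1.00],  # Dead is absorbing
]

# HYPOTHETICAL P(test result | status). Rows sum to 1.
emission = [
    [0.90, 0.08, 0.02, 0.00],  # Absent
    [0.50, 0.40, 0.10, 0.00],  # Mild
    [0.20, 0.50, 0.30, 0.00],  # Moderate
    [0.05, 0.35, 0.60, 0.00],  # Severe
    [0.00, 0.00, 0.00, 1.00],  # Dead
]

def predict(belief):
    """Push the belief one time slice forward through the transition model."""
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]

def update(belief, test_result):
    """Bayes update with one observed test result; the normalizer is P(test result)."""
    k = TESTS.index(test_result)
    unnormalized = [b * emission[s][k] for s, b in enumerate(belief)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def filter_sequence(observations):
    """Forward filtering: update with the first result, then predict and update per slice."""
    belief = update(initial, observations[0])
    history = [belief]
    for result in observations[1:]:
        belief = update(predict(belief), result)
        history.append(belief)
    return history

for t, belief in enumerate(filter_sequence(["Mildly Abnormal", "Severely Abnormal", "Severely Abnormal"])):
    print(t, {s: round(p, 3) for s, p in zip(STATES, belief)})
```

The `update` step is the same Bayes equation used throughout the talk, with P(F) computed as the sum over states; `predict` is what the second time slice adds.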
Bayesian Networks and Diagnosis
Re-Purposing Clinical Data

Strategic Goals
• Minimum goal: Be able to share applications, reports, alerts, protocols, and decision support with ALL customers of our same vendor
• Maximum goal: Be able to share applications, reports, alerts, protocols, and decision support with anyone in the WORLD

Why do we need detailed clinical models?

How are the models used in an EMR?
• Data entry screens, flow sheets, reports, ad hoc queries
  • Basis for application access to clinical data
• Computer-to-computer interfaces
  • Creation of maps from departmental/external system models to the standard database model
• Core data storage services
  • Validation of data as it is stored in the database
• Decision logic
  • Basis for referencing data in decision support logic
• Does NOT dictate physical storage strategy

Ontologies, Concepts, and Probabilities
The way from medical concepts to diagnostic models

Relational Database Implications
A single name/code for the observation type:
Patient Identifier | Date and Time | Observation Type | Observation Value | Units
123456789          | 7/4/2005      | Dry Weight       | 70                | kg
123456789          | 7/19/2005     | Current Weight   | 73                | kg

A combination of two names/codes (observation type plus weight type):
Patient Identifier | Date and Time | Observation Type | Weight Type | Observation Value | Units
123456789          | 7/4/2005      | Weight           | Dry         | 70                | kg
123456789          | 7/19/2005     | Weight           | Current     | 73                | kg

How would you calculate the desired weight loss during the hospital stay?
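One possible answer to the closing question, as a sketch: map both representations onto a single model (weight plus a weight-type qualifier), and the calculation becomes one query against that model. The mapping table `SINGLE_CODE_MAP`, the function names, and the reading of "desired weight loss" as latest current weight minus latest dry weight are all assumptions for illustration.

```python
from datetime import date

# Rows as in the two tables above.
single_code_rows = [
    ("123456789", date(2005, 7, 4), "Dry Weight", 70, "kg"),
    ("123456789", date(2005, 7, 19), "Current Weight", 73, "kg"),
]
two_code_rows = [
    ("123456789", date(2005, 7, 4), "Weight", "Dry", 70, "kg"),
    ("123456789", date(2005, 7, 19), "Weight", "Current", 73, "kg"),
]

# Hypothetical mapping: single codes that mean "Weight" plus a weight type.
SINGLE_CODE_MAP = {"Dry Weight": "Dry", "Current Weight": "Current"}

def normalize(single_rows, two_rows):
    """Map both representations onto one model: (patient, when, weight_type, kg)."""
    out = []
    for pid, when, obs_type, value, units in single_rows:
        if obs_type in SINGLE_CODE_MAP and units == "kg":
            out.append((pid, when, SINGLE_CODE_MAP[obs_type], value))
    for pid, when, obs_type, wtype, value, units in two_rows:
        if obs_type == "Weight" and units == "kg":
            out.append((pid, when, wtype, value))
    return out

def desired_weight_loss(rows, patient_id):
    """Most recent current weight minus the most recent dry weight, in kg."""
    def latest(wtype):
        matches = [r for r in rows if r[0] == patient_id and r[2] == wtype]
        return max(matches, key=lambda r: r[1])[3]
    return latest("Current") - latest("Dry")

print(desired_weight_loss(normalize(single_code_rows, []), "123456789"))  # 3
print(desired_weight_loss(normalize([], two_code_rows), "123456789"))     # 3
```

Without a shared model, every query (and every decision rule) would need to know about both encodings; after normalization, `desired_weight_loss` does not care which site the data came from.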