Healthcare Informatics: From Data Mining to Business Intelligence
Hsinchun Chen, Ph.D.
Director, Artificial Intelligence Lab
BioPortal Program; Caduceus Intelligence Program
University of Arizona
Acknowledgements: NSF, NIH, NLM, CDC, NCI, DOD, LOC, DHS, DOJ, FBI, CIA
© 2005

Business Intelligence and Data Mining

Business Intelligence and Analytics
• $3B BI revenue in 2009 (Gartner, 2006)
• The Data Deluge (The Economist, March 2010); internet traffic of 667 exabytes by 2013 (Cisco); total amount of information in 2010: 1.2 zettabytes (KB-MB-GB-TB-PB-EB-ZB-YB)
• $9.4B BI software spending in 2010 and $14.1B by 2014 (Forrester)
• IBM spent $14B on BI in five years; $9B BI revenue in 2010 (USA Today, November 2010); 24 acquisitions, 10,000 BI software developers, 8,000 BI consultants, 200 BI mathematicians

Business Intelligence and Analytics
• BI: “skills, technologies, applications, and practices used to help an enterprise better understand its business and market.”
• Technologies: data warehousing; Extraction, Transformation, and Load (ETL); Business Performance Management (BPM); visual dashboards; and advanced knowledge discovery using data and text mining
• BI 2.0: web intelligence, web analytics, web 2.0, social media analytics, opinion mining; cloud computing and web services; real-time monitoring and mining; enterprise performance (marketing/accounting/finance/healthcare)

BIG and FAST
• Business data from TBs to PBs
• The Big Data Era; “speed to insight”
• Barnes & Noble + Aster Data + MapReduce; McAfee + Datameer + Hadoop; AdKnowledge + Greenplum + Hadoop + Amazon EC2
• MapReduce/Hadoop vs.
Parallel DBMSs: ETL and “read once” data sets; complex and powerful analytics; semi-structured data; quick-and-dirty analyses; limited-budget operations; fault tolerance; performance

CS Ecosystem and Impacts

Data, Text, and Web Mining
• Data Mining: ID3, neural networks, genetic algorithms, SVM; Weka, SPSS, SAS, Microsoft SQL Server data mining, IBM Intelligent Miner, IBM Cognos
• World Wide Web: ftp, http/html, browser, digital library, search engines; Mosaic, Alta Vista, Lycos, Yahoo, Google
• Social Media: collaboration, participation, filtering, multimedia, social networks; Facebook, YouTube, Twitter, Second Life

Data Mining Models and Methods
• Predictive Modeling: classification; value prediction
• Database Segmentation: demographic clustering; neural clustering
• Link Analysis: associations discovery; sequential pattern discovery; similar time sequence discovery
• Deviation Detection: visualization; statistics

Data Mining: A KDD Process

Grand Challenges for Engineering (NAE), 2010
• Advance health informatics
  – “The acquisition, management, and use of information in health”
  – Sharing information over regional, national, or global networks
  – Offering relevant decision support to clinicians and patients (Caduceus Intelligence)
  – Alleviating doctors’ information overload
  – “Just in time, just for me” medical advice at the point of care; support personalized medicine
  – Accessing medical research information (HelpfulMed & GeneScene)
  – Improving response in public health emergencies (BioPortal)
• Secure cyberspace
• Prevent nuclear terror
• Engineer the tools for scientific discovery ….
Healthcare Informatics Projects at AI Lab:
(1) HelpfulMed and GeneScene: medical literature, ontology, concept maps, text mining, visualization (NSF, NIH, NCI)
(2) BioPortal: infectious disease information sharing, knowledge portal, spatio-temporal analysis and visualization, sequence visualization (NSF, CIA, CDC)
(3) Caduceus Intelligence: HIS, LIS, PACS, clinical decision support, health data mining, personalized medicine (NSF, Taiwan NSC)

Hsinchun Chen et al., 2005
Hsinchun Chen et al., 2010

Biomedical Informatics: Biomedical literature, biomedical ontologies, linguistic phrasing, categorization, text mining

HelpfulMED Search of Medical Websites

HelpfulMED Search of Evidence-based Databases
What does database cover?
Search which databases?
How many documents?
Enter search term

Consulting HelpfulMED Cancer Space (Thesaurus)
Enter search term
Select relevant search terms
New terms are posted
Search again...
Or find relevant webpages © 2005 19 Browsing HelpfulMED Cancer Map 1 Visual Site Browser Top level map 2 3 Diagnosis, Differential 4 Brain Neoplasms 5 © 2005 Brain Tumors 20 © 2005 21 Genescene Overview Knowledge Base Integrate gene relations from literature and outside databases and provide knowledge for learning and evaluation in data mining Text Mining Process Medline abstracts and extract gene relations automatically from the text Data Mining Process gene expression data (and existing knowledge) and use different algorithms to extract regulatory networks Interface & Visualization Allow searching for keywords, display a map of the relations extracted from the text and/or from the microarray © 2005 22 Genescene Overview JIF Ontologies External Databases HUGO Publications Medline XML Parser Publications & GO Meta Information UMLS Knowledge Base Titles & Abstracts GeneScene Text Mart Relation Parsers Lexical lookup AZ Noun Phraser POS Tagging Adjuster & Tagger Full Parser FSA Relation Grammar UMLS © 2005 Relations in flat files Concept Space Relations in flat files Co-occurrence relations Feature Structures GeneScene Data Mart Text Mining GeneScene Information Retrieval Visualization Data Mining Spring Algorithm Micro Array Data Bayesian Networks Association Rule Mining 23 Problem: Gene Pathway •Title Key roles for E2F1 in signaling p53- dependent apoptosis and in cell division within developing tumors. •Abstract: Apoptosis induced by the p53 tumor suppressor can attenuate cancer growth in preclinical animal models. Inactivation of the pRb proteins in mouse brain epithelium by the T121 oncogene induces aberrant proliferation and p53-dependent apoptosis. p53 inactivation causes aggressive tumor growth due to an 85% reduction in apoptosis. Here, we show that E2F1 signals p53-dependent apoptosis since E2F1 deficiency causes an 80% apoptosis reduction. E2F1 acts upstream of p53 since transcriptional activation of p53 target genes is also impaired. 
Yet, E2F1 deficiency does not accelerate tumor growth. Unlike normal cells, tumor cell proliferation is impaired without E2F1, counterbalancing the effect of apoptosis reduction. These studies may explain the apparent paradox that E2F1 can act as both an oncogene and a tumor suppressor in experimental systems © 2005 Action Protocols Graphic Representation p53 reads "E2F1 signals p53-dependent apoptosis" E2F1 apoptosis p53 infers So, I'm assuming... a straight line pathway... E2F1 apoptosis Expert errs and corrects E2F1 reads "E2F1 acts upstream of p53" p53 apoptosis E2F1 p53 reads "E2F1 deficiency does not accelerate tumor growth" apoptosis Final graph tumor growth 24 Prepositions: OF/BY/IN OF BY IN q0 Nominalization (-ion) q5 Adjective, noun, verb (-ed) Adjective, Noun, verb (-ed) Nominalization (-ion) Nominalization (-ion) Negation q4 NP, 5: str1 NP q1 Aux, 1: tr13 OF q6 OF Nominalization (-ion) q7 Aux mod Negation q2 Adjective, noun, verb (-ed) OF q15 q18 q13 NP verb aux verb verb q14 verb Nominalization (-ion) q3 mod OF q8 BY q9 NP © 2005 mod q11 BY q10 q12 NP IN IN NP NP BY IN q16 IN NP q17 25 Example Map (one abstract) © 2005 26 Select interesting relations to visualize © 2005 27 Overview © 2005 Double click to expand 28 Expanded node © 2005 29 Finding the truth: p38 acts as a negative feedback for Ras signaling © 2005 30 BioPortal: infectious disease information sharing, knowledge portal, spatio-temporal analysis and visualization, sequence visualization © 2005 31 Syndromic Surveillance • A syndrome is a set of symptoms or conditions that occur together and suggest the presence of a certain disease or an increased chance of developing the disease (from NIH/NLM) • Syndromic surveillance is based on health-related data that precede diagnosis and signals a sufficient probability of a case or an outbreak to warrant further public health response (from CDC) – Targeting investigation of potential cases – Detecting outbreaks associated with bioterrorism 2016/7/12 © 
2005 32 32 Syndromic Surveillance Data Sources in Different Stages of Developing a Disease Reproduced from Mandl et. al. (2004) 2016/7/12 © 2005 33 33 Syndromic Surveillance System Survey Projects User population Stakeholders RODS -Pennsylvania, Utah, Ohio, New Jersey, Michigan etc RODS laboratory, -418 facilities connected to RODS U of Pittsburgh STEM N/A IBM ESSENCE II 300 world wide DOD medical facilities DoD EARS -Various city, county, and state public health officials in the United States and abroad of US CDC BioSense Various city, county, and state public health officials in the United States and abroad of US CDC RSVP Rapid Syndrome Validation Project; Kansas, NM Sandia NL, NM BioPortal NY, CA, Kansas, AZ, Taiwan U of Arizona 2016/7/12 © 2005 34 34 Sample Systems and Data Sources Utilized Projects Data sources/Techniques RODS - Chief complaints (CC); OTC medication sales - Free-text Bayesian disease classification STEM - Simulated disease data - Disease modeling and visualization, SIR ESSENCE II - Military ambulatory visits; CC; Absenteeism data EARS - 911 calls; CC; Absenteeism; OTC drug sales - Human-developed CC classification rules BioSense - City/state generated geocoded clinical data - Graphing/mapping displays RSVP - Clinical and demographic data - PDA entry and access BioPortal - Geo-coded clinical data; Gemonic sequences; Multilingual CC - Real-time access and visualization; Web based hotspot analysis; Sequence visualization; Multilingual ontologybased CC classification © 2005 35 Information Sharing Infrastructure Design © 2005 Data Ingest Control Module Cleansing / Normalization Adaptor Adaptor SSL/RSA Adaptor SSL/RSA Info-Sharing Infrastructure Portal Data Store (MS SQL 2000) PHINMS Network XML/HL7 Network NYSDOH CADHS New 36 Data Access Infrastructure Design Public health professionals, researchers, policy makers, law enforcement agencies & other users WNV-BOT Portal Browser (IE/Mozilla/…) SpatialTemporal Data Search Visualand Query ization SSL 
connection Analysis / Prediction HAN or Personal Alert Management Web Server (Tomcat 4.21 / Struts 1.2) Data Store User Access Control API (Java) 2016/7/12 © 2005 Dataset Privileges Management Data Store (MS SQL 2000) Access Privilege Def. 37 37 BioPortal © 2005 38 38 Dataset name Advanced Spatial / Temporal Search criteria Select background maps Results listedlist in table Available dataset User main page Positive cases Time range Select NY / CA population, river and lakes County / State Choose WNV disease data Select CA dead bird, chicken and NY dead bird data Positive cases User Login Positive cases Start STV 2016/7/12 © 2005 Specify bird species 39 39 NYSpatial dead bird distribution temporal distribution pattern pattern GIS Timeline Close Zoom in NY Zoom in Periodic Pattern Year 2001 data Control panel 2016/7/12 © 2005 Move time slider, year 3 2 2 weeks View1 all year 3 year window window indata 3 year span Concentrated Similar time Overall pattern in May pattern / Jun 40 40 Spatial distribution Overlay population map pattern Dead bird cases Dead birdlong cases migrate from island distribute along Into upstate NY populated areas near Hudson river Enable population map Season end Move time slider 2016/7/12 © 2005 41 41 BioPortal HotSpot Analysis: RSVC, SaTScan, and CrimeStat Integrated (first visual, real-time hotspot analysis system for disease surveillance) • West Nile virus in California 2016/7/12 © 2005 42 42 Hotspot Analysis-Enabled STV Select hotspot to Regular STV highlight case points Select algorithms Hotspots found! Select baseline and case periods Select Select baseline targetand geographic case periods area 2016/7/12 © 2005 43 43 Taiwan Hospital Surveillance: Chief Complaints © 2005 44 44 Grouped by Hospital © 2005 45 45 Taiwan SARS Network Visualization (Cont.) 
Social network visualization with patients and geographical locations
Scroll bar on time dimension to see the evolution of a network

Taiwan SARS Network Evolution – Hospital Outbreak
The index patient of Heping Hospital began to have symptoms.

Security Informatics: law enforcement information sharing, crime data mining, counter-terrorism surveillance, multilingual text mining, intelligence web mining

• Intelligence and Security Informatics (ISI): “Development of advanced information technologies, systems, algorithms, and databases for national security related applications, through an integrated technological, organizational, and policy-based approach” (Chen et al., 2003a)
• Data, text, and web mining
• From COPLINK to Dark Web
H. Chen, computer scientist, artificial intelligence, U. of Arizona (2006)

COPLINK
• 1996-, DOJ, NIJ, NSF, ITIC, DHS
• Connect
• Detect
• Agent
• STV (Spatio-Temporal Visualization)
• CAN (Criminal Activity Network)
• BorderSafe (Mutual Information)
• AI Lab; Knowledge Computing Corporation (KCC)
• Tucson, Phoenix AZ; 3500 agencies, 20 states
• The largest public safety information sharing and data mining system!
© 2005 50 © 2005 51 51 •The New York Times November 2, 2002 •ABC News April 15, 2003 •Newsweek Magazine March3, 2003 © 2005 52 Private Equity Firm Buys MIS Spinoff Company for $40M…July 2009 © 2005 53 Dark Web • 2002-, ITIC, NSF, LOC • Discussions: FBI, DOD/Dept of Army, NSA, DHS • Collection: – Web site spidering – Forum spidering – Video spidering • Analysis and Visualization: – Link and content analysis (web sites) – Web metrics analysis (web sites sophistication) – Authorship analysis (forums; CyberGate) – Sentiment analysis (forums; CyberGate) – Video coding and analysis (videos; MCT) • 50,000 terrorist web sites, 1B documents/files about terrorists • The largest collection of terrorist-generated contents: 10 TBs (~LOC) • The most advanced multilingual Web 2.0 intelligence analysis system © 2005 54 The Dark Web project in the Press Project Seeks to Track Terror Web Posts, 11/11/2007 Researchers say tool could trace online posts to terrorists, 11/11/2007 Mathematicians Work to Help Track Terrorist Activity, 9/14/2007 Team from the University of Arizona identifies and tracks terrorists on the Web, 9/10/2007 © 2005 55 Web Site Example: Links to Multimedia and Manuals Link to “The General of Islam” Radio Station Azzam Speeches Berg beheading others videos of Zarqawi Source: http://www.al-ghazawat.110mb.com/, © 2005 French and Arabic Web Site Complete 65 pages manual of a 50 caliber rifle in pdf 56 Dark Web Forum Tools © 2005 57 Caduceus Intelligence: HIS, LIS, PACS, clinical decision support, health data mining, virtual patients, personalized medicine, healthcare 2.0 (physicians and patients like me) © 2005 58 Outline • Background & Related Literature – Healthcare issues, healthcare systems, and healthcare IT • Research Testbed: Taiwan MS Hospital Data Overview – Tablespace, statistics, and examples • Techniques for Symptom-Disease-Treatment (SDT) Associations – Rationale, related work, method, and results – Dashboards for Physician and Manager © 2005 59 The 
Healthcare Systems
• Healthcare has been one of the largest and fastest-growing industries in the United States over the past decades.
[Figure: Annual Healthcare Expenditure in the United States, 1960-2008; axes: per capita US$ and % of GDP. Data source: OECD Health Data 2010]
• Behind this multi-billion-dollar industry are practices that need to be transformed to be more effective.
  – An estimated 100,000 people die each year in the U.S. from preventable medical errors (Institute of Medicine, 2001).

Effective Healthcare
• Healthcare in the 21st Century
  – Quality metrics: safe, effective, patient-centered, timely, efficient, and equitable (Institute of Medicine, 2001).
• Along with medical professionals, computer scientists and information systems scholars have also been engaged in bringing about effective healthcare, especially from an information technology (IT) angle (Stead & Lin, 2009).
• Studies show that health IT (HIT) plays a significant role in transforming healthcare practice (Agarwal et al., 2010).
  – President Obama’s Health Information Technology for Economic and Clinical Health (HITECH) initiatives encourage healthcare providers to adopt digital solutions to enhance quality of care.
    • Financial incentives for “meaningful” use of HIT
    • Interoperable data standard for electronic health records (EHR)

Four Domains of HIT
• In their study, Stead and Lin (2009: 29) categorize HIT into four domains:
  – automation,
  – connectivity,
  – decision support, and
  – data-mining capabilities
• They indicate that, despite their direct and significant impact on delivering effective healthcare, decision support and data-mining capabilities still receive little support in today’s HIT design and investment.
Health Informatics
• Although often used interchangeably with medical informatics and e-health, health informatics places simultaneous emphasis on decision support, data mining, and visualization across the broad healthcare provision process (Nykänen, 2000; Norris, 2002; Brender et al., 2000).
[Figure: Health Informatics, comprising Decision Support, Data Mining, and Visualization]

Healthcare Decision Support
• Advanced data analysis and decision support have been featured as a key success factor of HIT (Lau et al., 2010) as well as a promising IS research area (Sarnikar et al., 2010; Agarwal et al., 2010).
• “Decision technology does not merely facilitate or augment decision-making rather it reorganizes decision-making practices” (Patel et al., 2002).
• Decision support in the healthcare process may reside in various managerial, administrative, and clinical scenarios (Norris, 2002).
• Challenges (Sittig et al., 2008)
  – Data integration and management
  – Complicated workflows
  – Reasoning in a high-dimensional space
[Figure: Key Pillars for Clinical Decision Support Systems (adapted from Osheroff et al., 2007)
  Clinical perspectives: Best Knowledge Available When Needed; High Adoption & Effective Use of Clinical Decision Support Systems; Continuous Improvement of Clinical Knowledge & Supporting Tools
  Pillars: Knowledge Availability; Effective IS Use; Extending Knowledge & Methods
  IS perspectives: Knowledge & Semantic Data Management; Technology Acceptance & Adoption; Medical Informatics & Healthcare Business Intelligence]

Healthcare Data Mining
• Research indicates the need for domain-driven data mining to enable actionable knowledge discovery and delivery (Cao, 2010). Health informatics is distinct from other data mining tasks because of its voluminous and heterogeneous data as well as its inherent privacy and ethical issues (Cios & William Moore, 2002).
• Previous studies have applied data mining to disease pattern recognition (Rao et al., 2002; Wroblewski et al., 2009), risk assessment (Austin et al., 2010; Hou et al., 2010), and treatment support (Dahlström et al., 2006; Toussi et al., 2009) in various medical and healthcare contexts.

Healthcare Visualization
• Users’ preferences and cognitive processes should be considered in the complex clinical decision process (Fieschi et al., 2003; Kushniruk & Patel, 2004; Horsky et al., 2003).
• Research shows that information representation significantly influences human sensemaking and analytical reasoning (Gotz & Zhou, 2009; Qu & Furnas, 2005).
• However, the human-computer interface has also been ranked as the top challenge in clinical decision support (Sittig et al., 2008).

Research Testbed: Taiwan MS Hospital
Current Database Status
ATTRIBUTE | VALUE
Data Storage Size | 39.6 GB
Number of Tables | 297*
Number of Registered Patients | 894,061
Number of Unique Outpatients | 387,014
Time Span of Outpatient Records (from outpatient master table PTOPD) | 2002/01/03-2010/07/12**
Average Number of Daily Outpatients | 2409.95***
Number of Unique Inpatients | 91,717
Time Span of Inpatient Records (from inpatient master table PTIPD) | 2003/11/17-2010/07/13
Average Number of Daily Inpatients | 60.93
* The tables encompass data from Min-Sheng’s Hospital Information System (HIS), Laboratory Information System (LIS), and Picture Archiving and Communication Systems (PACS).
** Records before Dec. 2004 are incomplete. The average number of daily outpatients before Dec. 2004 is 21.49.
*** The calculation of average daily outpatients excludes Sundays and records before Dec. 2004.
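Main Tables in the Tablespace

Outpatient Modules
PTER (emergency registration file); PTOPD (outpatient/emergency patient file); CODINGOPDA (outpatient/emergency disease classification diagnosis file); CODINGOPD (outpatient/emergency disease classification data file); CODINGOPDP (outpatient/emergency disease classification procedure file); ACNTOPD (outpatient order detail file); PRICE (fee schedule file); PTCOURSE (patient same-course-of-treatment record file); ORDAOPD (outpatient diagnosis file); MCHRONIC (chronic disease refill prescription file); HRECOPD1 (historical outpatient receipt file, header); HRECOPD2C (historical outpatient receipt file, credit); HRECOPD2D (historical outpatient receipt file, debit); HORDERA (outpatient diagnosis file 2); FREQUENCY (frequency code file); ORDSOOPD (outpatient S.O. file)

Patient background:
CHART (medical record basic data file); AGEGROUP1 (age stratification master file); AGEGROUP2 (age stratification detail file); PTTYPE (patient type code file)

Diagnosis (Symptoms and Diseases); Registration
PTIPD (inpatient basic data file); IPDINDEX (inpatient index file); IPDTRANS (inpatient history file)

Treatment (Procedures and Orders); Transaction / Receipt
CODING (disease classification header file); CODINGA (disease classification diagnosis file); CODINGP (disease classification procedure file); ACNTIPD (inpatient order detail file); PRICE (fee schedule file); ORDFB (inpatient National Health Insurance medical fee order list file); DTLFB (inpatient National Health Insurance medical fee list file); DIAGDOCA (admission summary master file); DIAGDOCAX (admission summary content file); DIAGDOCI (discharge summary master file); DIAGDOCIX (discharge summary content file); FREQUENCY (frequency code file); RECIPD1 (inpatient receipt file, header); RECIPD2C (inpatient receipt file, credit); RECIPD2D (inpatient receipt file, debit); RECIPDNH (inpatient receipt file, National Health Insurance); ORDAIPD1 (inpatient diagnosis history file, header); ORDAIPD2 (inpatient diagnosis history file, detail)

Inpatient Modules

Hospital: Disease: Operation: LIS: PACS:
BED (bed master file); BEDGRADE1 (bed grade code file, header); BEDGRADE2 (bed grade code file, detail); BEDSTATUS (bed status file); APDRGD (DRG disease code mapping file); DRGICD9 (DRG disease code mapping file); PTOR (surgery patient master file); PTORDRPT (surgery patient report record detail file); ORSAMPLE (surgery content template master file); PTORALLLOG (surgery change record log file); PTORDIAG (surgery patient postoperative diagnosis); PTORDRPT (surgery patient report record detail file); PTORLOG (surgery patient master file log); PTORSTAFF (surgery staff file); LABITEM1 (lab test item master file, header); LABITEM2 (lab test item master file, detail); LABGROUP (lab group code file); LABP1 (lab pathology header file); LABP2 (lab pathology detail file); LABS1 (lab pathology histology file 1); LABS2 (lab pathology histology file 2); LABX1 (lab transaction header file); LABX2 (lab transaction detail file); EXAMITEM (examination item file); PTEXAM (request form master file); PTEXAMINDEX (request form index file); PTEXAMITEM (request form examination item file); PTEXAMRPT (request form report file); DEPT (department code file); DIV (division code file); DOCTOR (physician code file); HOSPITAL (hospital code file); ICDGROUP1 (ICD classification master file); ICDGROUP2 (ICD classification detail file); ICD (International Classification of Diseases code file)

Note: Tables with underlines contain free-text data.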
Top 10 Tables (Ranked by Number of Records)
TABLE | DESCRIPTION (translated from the Chinese name) | NUMBER OF RECORDS | TIME SPAN* | CATEGORY
ACNTOPD | outpatient order detail file | 23,404,233 | 2004/07/22-2010/07/12 | Outpatient - Order
ACNTIPD | inpatient order detail file | 19,822,382 | 2004/05/10-2010/07/13 | Inpatient - Order
HRECOPD2C | historical outpatient receipt file (credit) | 9,306,611 | 2004/06/30-2010/07/12 | Outpatient - Transaction
CODINGOPDA | outpatient/emergency disease classification diagnosis file | 6,983,919 | 2004/07/01-2010/04/30 | Outpatient - Diagnosis
DIAGDOCXLOG | summary content change log file | 6,908,559 | 2004/06/25-2010/07/13 | Inpatient - Diagnosis
ORDFB | inpatient National Health Insurance medical fee order list file | 6,118,311 | 2006/06-2010/07 | Inpatient - Order
HRECOPD1 | historical outpatient receipt file (header) | 5,549,295 | 2004/06/30-2010/07/12 | Outpatient - Transaction
LABX2 | lab transaction detail file | 5,267,898 | 2008/07/09-2010/12/29 | Lab
ORDAOPD | outpatient diagnosis file | 5,117,329 | 2000/01/03-2010/07/12** | Outpatient - Diagnosis
HRECOPD2D | historical outpatient receipt file (debit) | 4,046,316 | 2004/06/30-2010/07/12 | Outpatient - Transaction

Important MS Hospital Tables and Their Potential Applications
HIS
TABLE | CONTENT
ORDAOPD & ORDAIPD2 | Signs, symptoms, and diseases
ACNTOPD & ACNTIPD | Orders for examination, procedure, and medication; payment information
DIAGDOCAX & DIAGDOCIX | Chief complaints, personal history of illness and drug, primary diagnosis, plan of examination and treatment, complication, and discharge instructions
APPLICATION
• Disease prediction
• Disease clustering
• Order generation
• Treatment recommendation
• Physician’s clinical preferences
• Clinical workflow
• Information extraction
• Problem verification
• Treatment recommendation
LIS
TABLE | CONTENT
LABX1 & LABX2 | Lab examination results (with safe upper & lower limits) and disease code for the lab exam
LABS2 | Physician’s comments for the lab results
APPLICATION
• Patient safety alert
• Lab suggestion & interpretation
• Disease severity assessment
• Treatment outcome assessment
PACS
TABLE | CONTENT
EXAMITEM | Medical image name and position of body
PTEXAMRPT | Physician’s comments for the PACS results
APPLICATION
• PACS exam suggestion
• Disease severity assessment
• Treatment outcome assessment

Sample
Data
Background Information
Name: Jack
Gender: Male
Age: 55
Division: Pulmonary Medicine
Doctor Name: Dr. John
Admission Date: 2009 Aug 14th
……
Admission Comment
Chief Complaints: cough, white sputum, poor intake & loss of BW (about 9 kg) for 3 months; hemoptysis for 2 weeks; mild SOB
History of present illness: ……
Family History: ……

Sample Data (cont.): Hospital Information System (HIS)
DIAGNOSIS & INTERVENTION categories: Symptom, Disease, Procedure, Order
ICD/PRICE CODE | DESCRIPTION
786 | Symptoms involving respiratory system and other chest symptoms
786.3 | Hemoptysis
162.9 | Malignant neoplasm of bronchus and lung, unspecified
197.2 | Secondary malignant neoplasm of pleura
33.24 | Closed (endoscopic) biopsy of bronchus
87.41 | Computerized axial tomography of thorax
OGEFI2 | Gefitinib F.C. (Iressa)
67051 | Thoracoscopic wedge or partial resection of the lung
IETOP1 | Etoposide injection (Etoposide-Teva)

Sample Data (cont.): Laboratory Information System (LIS)
LAB NAME: Surgical pathology Level IV
REPORT TYPE | INTERPRETATION
Diagnosis | Bronchus, left, transbronchial biopsy --- Squamous cell carcinoma. Results of immunohistochemical stains: P63: tumor cell positive. TTF-1: tumor cell negative.
Gross | The specimen consists of 3 pieces of gray tan and soft tissue, up to 0.2 cm. All for section.
Microscopic | Section of the bronchial biopsy shows clusters of tumor cells with large pleomorphic nuclei and abundant cytoplasm. Immunohistochemical stains show that the tumor cells are positive for P63 but negative for TTF-1. Squamous cell carcinoma is considered.

Sample Data (cont.): Picture Archiving and Communication Systems (PACS)
EXAM NAME INTERPRETATION CT with/without contrast PROCEDURE: Clinical information: left hilar mass Technique: Chest CT scan with and without contrast enhancement performed by 64MDCT (LightSpeed VCT, GE) Limitations: None Comparison: No comparison study available Interpretation: Any further question, please feel free to contact with the radiologist, Dr. Mark, who interpreted this study. FINDINGS: The result shows focal areas of decreased opacity (lung destruction) with or without visible walls in … -There are multiple, small, centrilobular lucencies and patchy appearance. They are predominantly upper lobe distribution. It is suggestive of centrilobular emphysema. A large oval shaped mass like lesion, measuring over 5 cm in size, is localized at the left perihilar region with direct invasion to the mediasitnum. Wedge shape consolidation of the left lower lobe is disclosed down to the lung base. Significant adenopathy is positive at the AP window, pericarinal and subcarinal spaces. ====== [Conclusion] ====== 1. Lung cancer (left hilar) with partial collapse of LLL, stage IIIA 2. Mediastinal LN metastases 3. Emphysema is suggested. Chest PA view © 2005 CXR showed normal heart size with slightly increased lung markings at hila and lower lung. Prominent left hilar shadow is found. Suggest F/U. 74 A Clinical Decision Process DIAGNOSIS & ICD/PRICE DESCRIPTION INTERVENTION CODE Symptom 786 Symptoms involving respiratory system and other chest symptoms Disease Procedure Order 786.3 Hemoptysis 162.9 Malignant neoplasm of bronchus and lung, unspecified 197.2 Secondary malignant neoplasm of pleura 33.24 Closed (endoscopic) biopsy of bronchus 87.41 Computerized axial tomography of thorax OGEFI2 Gefitinib F.C. 
(Iressa) 67051 Thoracoscopic wedge or partial resection of the lung IETOP1 Etoposide injection (Etoposide-Teva)
• Generally, physicians employ their medical knowledge to perform healthcare tasks in an iterative sequence of symptoms, diseases, and treatments (and outcomes).

A Clinical Decision Process (cont.)
• One of the difficulties in providing effective healthcare is that symptoms, diseases, and treatments constitute a high-dimensional, complicated, yet interrelated space (Sittig et al., 2008).
• Diagnostic errors are common and cannot be mitigated by EHRs without integrating the digital evidence with physicians’ diagnostic thinking (Schiff & Bates, 2010).
• Techniques that facilitate physicians’ diagnostic reasoning can be very useful.

Association Rule Mining (ARM) in Medicine
• Diagnostic reasoning, planning, and patient management are three generic medical reasoning tasks (Long, 2001). Diagnostic reasoning is a disease prediction process based on symptoms, signs, lab results, and/or medical images, whereas planning is a set of interventions taken to verify and resolve a patient’s illness.
• The past decade has shown increasing interest in applying association rule mining (ARM) to support the diagnostic reasoning and planning process. Specifically, ARM has been applied to disease prediction, problem verification, comorbidity analysis, disease clustering, automatic order generation, treatment suggestion, and many other medical scenarios.
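To make the idea of an association rule between clinical codes concrete, the following is a minimal sketch, not the authors' implementation. It assumes each visit is represented as a set of code strings and, for one target code, ranks co-occurring codes by confidence, also reporting support, interest (confidence divided by the baseline rate of the co-occurring code, often called lift), and conviction. The function name `rank_associations`, the thresholds, and the six toy visits are all hypothetical; only the code strings themselves (786.3, 162.9, 87.41, 486) are borrowed from the slides.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Rule(NamedTuple):
    antecedent: str    # target code c
    consequent: str    # co-occurring code o
    support: int       # number of visits containing both c and o
    confidence: float  # P(o | c)
    interest: float    # P(o | c) / P(o), also called lift
    conviction: float  # (1 - P(o)) / (1 - P(o | c))


def rank_associations(visits: Iterable[Iterable[str]], target: str,
                      min_support: int = 2,
                      min_confidence: float = 0.05) -> List[Rule]:
    """Rank codes that co-occur with `target`, highest confidence first."""
    visit_sets = [set(v) for v in visits]
    total = len(visit_sets)
    with_target = [v for v in visit_sets if target in v]
    if not with_target:
        return []
    # How often each code appears overall, and together with the target.
    code_counts = Counter(code for v in visit_sets for code in v)
    co_counts = Counter(code for v in with_target for code in v
                        if code != target)
    rules = []
    for code, both in co_counts.items():
        confidence = both / len(with_target)
        if both < min_support or confidence < min_confidence:
            continue
        p_code = code_counts[code] / total
        interest = confidence / p_code
        # Conviction is unbounded when the rule never fails.
        conviction = (float("inf") if confidence == 1.0
                      else (1 - p_code) / (1 - confidence))
        rules.append(Rule(target, code, both, confidence, interest, conviction))
    return sorted(rules, key=lambda r: r.confidence, reverse=True)


# Hypothetical toy visits (made-up data, for illustration only).
visits = [
    {"786.3", "162.9", "87.41"},
    {"786.3", "162.9"},
    {"786.3", "486"},
    {"162.9", "87.41"},
    {"486"},
    {"786.3", "162.9", "87.41"},
]

for rule in rank_associations(visits, "786.3"):
    print(rule)
```

On real records the thresholds would need tuning, and a rule with high confidence but interest near 1 says little beyond the consequent's baseline rate, which is why interest and conviction are reported alongside confidence.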
© 2005 77 ARM in Medicine: Symptoms, Diseases, and Treatments 0.05 < Confidence <=0.2 0.2 < Confidence <=0.5 0.5 < Confidence Symptoms Hemoptysis (786.3) Other dyspnea and respiratory abnormalities (786.09) 0.0640 0.0689 Unspecified pulmonary tuberculosis confirmation unspecified (011.90) Diseases 0.4525 Pneumonia (486) 0.2502 Malignant neoplasm of bronchus and lung, unspecified (162.9) 0.1456 0.2097 0.0640 Treatments 0.5562 Terbutaline sulphate 5mg/2ml/vial (ETERBUS) 0.7615 0.5496 Thoracentesis Chest PA view Pyridoxine Hcl (34.91) (320011) Tablets 50mg (OVTB6) © 2005 0.1158 0.4882 0.6777 0.4194 0.4646 Computerized axial tomography 0.2707 of thorax (87.41) Injection or infusion of Direct smear by cancer Gram Stain chemotherapeutic Aerobic Culture (130062) substance (13007) (99.25) 78 ARM Research in Medicine Disease prediction Problem verification Symptom, sign, or lab result Automatic order generation Treatment suggestion Disease Treatment Problem verification Comorbidity analysis Disease clustering STUDY DIRECTION OF ASSOCIATIONS APPLICATION Wright et al. (2010) Lab result to Disease Treatment to Disease Problem verification Hanauer et al. (2009) Disease to Disease Disease clustering Klann et al. (2009) Disease to Treatment Automatic order generation Tai & Chiu (2009) Disease to Disease Comorbidity analysis Ordonez (2006) Symptom to Disease Disease prediction Wright & Sittig (2006) Disease to Treatment Outpatient - Transaction Cao et al. (2005) Disease to Disease Comorbidity analysis Imberman (2002) Symptom to Disease Disease prediction Treatment to Disease Problem verification Doddi et al. (2001) © 2005 79 Hanauer, D. A., Rhodes, D. R., & Chinnaiyan, A. M. (2009). Exploring Clinical Associations Using ‘-Omics’ Based Enrichment Analyses. PLoS ONE, 4(4), e5203. © 2005 80 Wright, A., Chen, E. S., & Maloney, F. L. (2010). An automated technique for identifying associations between medications, laboratory results and problems. JBI, 43(6), 891-901. 
Medication | Problem | Support | Confidence | Chi square | Interest | Conviction
Cyclosporine micro (Neoral) | Cardiac transplant | 72 | 47.37% | 15974.05 | 222.76 | 1.9
Ritonavir | HIV/AIDS^b | 108 | 87.10% | 13584.49 | 126.62 | 7.7
Tenofovir/emtricitabine^a | HIV/AIDS^b | 117 | 74.05% | 12484.95 | 107.66 | 3.83
Multivitamin (vitamins A, D, E, K) | Cystic fibrosis | 13 | 76.47% | 12206.84 | 939.93 | 4.25
Atazanavir | HIV/AIDS^b | 91 | 87.50% | 11495.76 | 127.21 | 7.94
Efavirenz/emtricitabine/tenofovir^a | HIV/AIDS^b | 77 | 95.06% | 10576.62 | 138.2 | 20.11
Efavirenz/emtricitabine/tenofovir^a | HIV positive | 73 | 90.12% | 10525.03 | 145.06 | 10.06
Ritonavir | HIV positive | 90 | 72.58% | 10423.49 | 116.82 | 3.62
Cyclosporine micro (Neoral) | Stress test | 63 | 41.45% | 10390.04 | 166.04 | 1.7
Tenofovir/emtricitabine^a | HIV positive | 101 | 63.92% | 10284.74 | 102.89 | 2.75

Laboratory Result | Problem | Support | Confidence | Chi square | Interest | Conviction
Bethesda inhibitor assay | Hemophilia | 7 | 25.00% | 5906.03 | 845.03 | 1.33
vWF multimers | von Willebrand’s disease | 8 | 53.33% | 3711.07 | 465.22 | 2.14
Fetal hemoglobin | Sickle cell anemia | 18 | 25.35% | 6647.5 | 370.57 | 1.34
Cotinine | Lung Transplant | 9 | 18.75% | 2452.46 | 274.06 | 1.23
Cotinine | Cystic Fibrosis | 10 | 20.83% | 2545.07 | 256.07 | 1.26
Cotinine | Pulmonary Fibrosis | 9 | 27.27% | 1997.01 | 223.48 | 1.37
Vitamin K | Cystic Fibrosis | 8 | 17.39% | 1696.96 | 213.76 | 1.21
Cyclosporine level | Cardiac Transplant | 101 | 42.98% | 20344.05 | 202.12 | 1.75
Tobramycin level | Cystic Fibrosis | 10 | 16.13% | 1966.38 | 198.25 | 1.19
Cotinine | Pulmonary Fibrosis | 11 | 22.92% | 2048.01 | 187.78 | 1.3

Imberman, S. P., Domanski, B., & Thompson, H. W. (2002). Using dependency/association rules to find indications for computed tomography in a head trauma dataset. AI in Medicine, 26(1-2), 55-68.

SDT ARM Analysis
• Goal: To find diagnosing and prescribing associations among symptoms, diseases, and treatments (medical procedures and medicines).
• Process:
  1. Input a target code (c) that represents a symptom, disease, or treatment.
  2. Identify the set of clinical visits (V) from inpatient/outpatient records that contain c.
  3. Identify the set of codes (O) that are also assigned in V.
  4. For each code o in O, calculate the probability of o given the occurrence of c, that is, $P(o \mid c) = \frac{P(o \cap c)}{P(c)}$.
  5. Rank and return the codes in O that meet thresholds such as minimum support, confidence, and interest.
[Figure: Venn diagram of entire patient visits, patient visits with code c, and patient visits with both codes c and o]

Study Case: Lung Cancer
• To exemplify the SDT association, we proceed with lung cancer subjects. The reason for this selection is two-fold:
  – Cancer continues to be the leading cause of death around the world, accounting for one fourth of deaths in the United States (Jemal et al., 2010) and 27.3% of deaths in Taiwan (Department of Health, Taiwan, 2010).
  – Among all kinds of cancer, lung cancer ranks highest in patient population, new cases, and deaths (Jemal et al., 2010). Lung cancer (ICD-9 code: 162) is also one of the most common types of cancer in our dataset (534 unique patients and 1131 distinct inpatient visits).
• Because lung cancer has various subtypes, subjects are selected when their diseases are coded as 162.9 (ICD-9 code for malignant neoplasm of bronchus and lung, unspecified), resulting in a total of 484 subjects and 975 inpatient visits.

Descriptive Statistics of the Data
[Figure: Departments (bar chart, axis 0 to 800): Pulmonary medicine 622; Pulmonary medicine and critical illness 122; Cardiology 48; Thoracic surgery 43; Gastroenterology 30; General surgery 21; Other 11 departments (each < 20) 89]
[Figure: Physicians (bar chart, axis 0 to 500) over M1158, M1584, M0031, and other 64 physicians (each < 50); bar values in the source: 386, 175, 312, 102]

Descriptive Statistics of the Data (cont.)
[Chart: Patient Genders] M 617; F 358.
[Chart: Patient Age Groups] Categories: 25 to 44; 45 to 64; > 65. Bar values as extracted: 291, 33, 651.
[Chart: Frequent Co-occurring Diagnoses] Pneumonia organism unspecified 244; Secondary malignant neoplasm of bone and… 155; Unspecified essential hypertension 135; Acute respiratory failure 131; Unspecified pleural effusion 130; Secondary malignant neoplasm of brain and… 110; Secondary malignant neoplasm of pleura 100; Diabetes mellitus without complication type ii or… 91; Secondary malignant neoplasm of lung 83; Secondary malignant neoplasm of liver 77.

Top 10 Identified Symptom & Disease Associations for Lung Cancer (162.9)
Symptom
Rank | ICD Code | Name
1 | 792.1 | Nonspecific abnormal findings in stool contents
2 | 799.4 | Cachexia
3 | 780.71 | Chronic fatigue syndrome
4 | 786.3 | Hemoptysis
5 | 785.9 | Other symptoms involving cardiovascular system
6 | 786 | Symptoms involving respiratory system and other chest symptoms
7 | 786.5 | Chest pain
8 | 786.2 | Cough
9 | 786.9 | Other symptoms involving respiratory system and chest
10 | 799.0 | Asphyxia
Disease
Rank | ICD Code | Name
1 | 162.9 | Malignant neoplasm of bronchus and lung, unspecified
2 | 162 | Malignant neoplasm of trachea, bronchus and lung
3 | 733.11 | Pathologic fracture of humerus
4 | 197.1 | Secondary malignant neoplasm of mediastinum
5 | 162.5 | Malignant neoplasm of lower lobe, bronchus or lung
6 | 163.9 | Malignant neoplasm of pleura, unspecified
7 | 162.0 | Malignant neoplasm of trachea
8 | 197.2 | Secondary malignant neoplasm of pleura
9 | 198.3 | Secondary malignant neoplasm of brain and spinal cord
10 | 162.8 | Malignant neoplasm of other parts of bronchus or lung
Verification columns (WebMD, Other Sources*): 8 entries are marked V.
* Note: other sources include lungcancerbookandnewsletter.com and the Japanese Journal of Clinical Oncology.

Top 10 Identified Procedure & Order Associations for Lung Cancer (162.9)
Treatment -- Procedure
Rank | ICD Code | Name
1 | 33.26 | Closed (percutaneous) (needle) biopsy of lung
2 | 33.27 | Closed endoscopic biopsy of lung
3 | 32.4 | Lobectomy of lung
4 | 33.24 | Closed (endoscopic) biopsy of bronchus
5 | 33.93 | Puncture of lung
6 | 34.24 | Pleural biopsy
7 | 92.18 | Total body radioisotope scan
8 | 92.24 | Teleradiotherapy using photons
9 | 40.11 | Biopsy of lymphatic structure
10 | 40.24 | Excision of inguinal lymph node
Verification columns (WebMD, Other Sources*): 9 entries are marked V.
Treatment -- Orders
Rank | Order ID | Name
1 | OIRES2 | Gefitinib F.C. (Iressa)
2 | IPEME5 | Pemetrexed Disodium Heptahydrate (Alimta)
3 | OERLOT | Erlotinib (Tarceva)
4 | OGEFI2 | Gefitinib F.C. (Iressa)
5 | IETOP1 | Etoposide injection (Etoposide-Teva)
6 | IDOCE8 | Docetaxel (Taxotere)
7 | 70213 | Radical lymphadenectomy
8 | SMEGEOS | Megestrol acetate suspension (Megest)
9 | 37038 | Intravenous chemotherapy <=1 hours
10 | 33103 | CT Guide biopsy
Verification columns (WebMD, Other Sources*): 10 entries are marked V.
* Note: other sources include mdguidelines.com, Wikipedia, PubMed, and Journal of Clinical Oncology.

Scenario-based SDT Association (Personalized Medicine)
• SDT associations can be further extended to illuminate associations in a specific (personalized) scenario, such as:
– Physician-centered SDT association
• Physician’s treatment propensity -> education and dissemination of best practice
– Patient-centered SDT association
• Influences of patient’s demographic background and medical status -> patient-centered care
– SDT association for multiple target diseases (complex disease scenarios)
[Figure: two nested-set diagrams. Aggregated: entire patient visits; patient visits with code c; patient visits with both codes c and o. Scenario-based: patient visits in the scenario; patient visits with code c; patient visits with both codes c and o.]

Consistency of Top Treatment Orders
[Table: columns are the top 10 treatment orders from the aggregated group (Iressa; Alimta; Tarceva; Iressa; Etoposide-Teva; Taxotere; Radical lymphadenectomy; Megest; Intravenous chemotherapy <=1 hours; CT Guide biopsy). Rows are subgroups: Physician (M0031, M1158, M1584); Department (Pulmonary medicine (PM), PM and critical illness, Cardiology); Gender (F, M); Age group (25 to 44, 45 to 64, > 65); Co-occurring disease (198.5, 486, 518.81). Cells are marked V.]
• When ranked by confidence, the top 10 treatment orders in each subgroup show low consistency with those of the aggregated group.
– The result indicates that context has a high impact on SDT associations, which motivates our further development of scenario-based SDT techniques.

Correlation Analysis
Correlation Matrix for All Treatment Orders
By physician:
| AGGREGATED | M0031 | M1158 | M1584
AGGREGATED | 1 | 0.852 | 0.951 | 0.963
M0031 | 0.852 | 1 | 0.686 | 0.812
M1158 | 0.951 | 0.686 | 1 | 0.904
M1584 | 0.963 | 0.812 | 0.904 | 1
By co-occurring disease:
| AGGREGATED | 198.5 | 486 | 518.81
AGGREGATED | 1 | 0.94 | 0.922 | 0.812
198.5 | 0.94 | 1 | 0.822 | 0.711
486 | 0.922 | 0.822 | 1 | 0.943
518.81 | 0.812 | 0.711 | 0.943 | 1
By age group:
| AGGREGATED | 25 to 44 | 45 to 64 | > 65
AGGREGATED | 1 | 0.878 | 0.944 | 0.987
25 to 44 | 0.878 | 1 | 0.862 | 0.84
45 to 64 | 0.944 | 0.862 | 1 | 0.879
> 65 | 0.987 | 0.84 | 0.879 | 1
By department:
| AGGREGATED | Pulmonary medicine (PM) | PM and critical illness | Cardiology
AGGREGATED | 1 | 0.752 | 0.985 | 0.904
Pulmonary medicine (PM) | 0.752 | 1 | 0.669 | 0.771
PM and critical illness | 0.985 | 0.669 | 1 | 0.843
Cardiology | 0.904 | 0.771 | 0.843 | 1
By gender:
| AGGREGATED | F | M
AGGREGATED | 1 | 0.983 | 0.995
F | 0.983 | 1 | 0.959
M | 0.995 | 0.959 | 1
• Each treatment order in a subgroup is weighted by its [support × confidence]; the correlation matrices are then calculated from these weights to compare treatment orders across subgroups.
• While the correlations are generally high (> 0.6), several subgroups show relatively low correlations with others, such as physician M0031, age group 25 to 44, and the pulmonary medicine department.

Manager Dashboard (Hospital Performance Indicators)
• Goal: To understand various hospital performance indicators across several categories, including income, physician, and patient.
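The SDT process, its scenario-based extension, and the subgroup correlation analysis described on the preceding slides can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the study: the visit layout (a dict with a "codes" set plus attributes such as "physician"), the function names, and the toy data are all assumptions. Interest is taken here as the standard ARM lift, P(o | c) / P(o), which the slides do not define explicitly.

```python
from collections import Counter
from math import sqrt


def sdt_associations(visits, target, scenario=None, min_support=1):
    """Return {code: (support, confidence, interest)} for codes seen with `target`.

    visits   -- list of dicts with a "codes" set (hypothetical layout)
    scenario -- optional predicate that restricts the visit population,
                e.g. to one physician, department, or age group
    """
    population = [v for v in visits if scenario is None or scenario(v)]
    with_target = [v for v in population if target in v["codes"]]
    if not with_target:
        return {}
    n, n_c = len(population), len(with_target)
    overall = Counter(code for v in population for code in v["codes"])
    joint = Counter(
        code for v in with_target for code in v["codes"] if code != target
    )
    rules = {}
    for code, n_oc in joint.items():
        if n_oc < min_support:
            continue
        confidence = n_oc / n_c                      # P(o | c)
        interest = confidence / (overall[code] / n)  # lift: P(o | c) / P(o)
        rules[code] = (n_oc, confidence, interest)
    return rules


def pearson(xs, ys):
    """Plain Pearson correlation; returns NaN if either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else float("nan")


def weight_correlation(rules_a, rules_b):
    """Correlate two rule sets using support x confidence as each code's weight."""
    codes = sorted(set(rules_a) | set(rules_b))

    def weights(rules):
        # A code absent from a rule set gets weight 0 (an assumption).
        return [rules[c][0] * rules[c][1] if c in rules else 0.0 for c in codes]

    return pearson(weights(rules_a), weights(rules_b))


# Toy data for illustration only; not taken from the study dataset.
visits = [
    {"codes": {"162.9", "OIRES2", "786.3"}, "physician": "M1158"},
    {"codes": {"162.9", "OIRES2"}, "physician": "M1158"},
    {"codes": {"162.9", "IPEME5"}, "physician": "M0031"},
    {"codes": {"486", "786.3"}, "physician": "M0031"},
]
aggregated = sdt_associations(visits, "162.9")
m1158 = sdt_associations(
    visits, "162.9", scenario=lambda v: v["physician"] == "M1158"
)
ranked = sorted(aggregated.items(), key=lambda kv: kv[1][1], reverse=True)
print(ranked[0][0], weight_correlation(aggregated, m1158))
```

Passing a different `scenario` predicate (department, gender, age group, or a co-occurring disease code) yields the other subgroup rule sets, and calling `weight_correlation` on each pair fills in one matrix of the kind shown in the correlation analysis.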
Physician Dashboard (Summarized Patient Profile)

Physician Dashboard (Scenario-based SDT Associations Visualization)
[Figure panels: Target; Symptoms; Diseases; Treatment (Procedures); Treatment (Orders)]

Preliminary Findings
• Drawing on data integration and data mining methods, the proposed SDT association techniques aim to facilitate diagnostic reasoning and planning tasks.
• Qualitative verification of the top SDT associations shows a high level of consistency with expert knowledge.
• Scenario-based SDT associations can provide fine-grained associations for specific subgroups (patients or physicians).
• Physician dashboards can potentially help alleviate clinical information and cognitive overload.

Healthcare Data Mining and Business Intelligence Research Road Maps
• Clinical data mining and decision support
– SDT association rule mining; patient cross-disease analysis; patient clustering and visualization; patient and treatment anomaly detection; patient disease progression analysis
– Physician dashboard visualization; patient information aggregation and progression visualization
– Research support and care information from literature, medical ontology and web
– Physician support social media and network (PhysiciansLikeMe)
• Patient care and support
– Patient information portal; patient-centered management
– Capture of regular physical exams and lab reports; progression analysis
– Wireless sensors; home care monitoring and reporting
– Information capture from patients and family members
– Patient support social media and network; PatientsLikeMe, Cancer Survivor Network
• Executive information systems
– Resource and facility utilization; department and disease analysis
– Cost analysis and assessment

Reference
• Agarwal, R., Gao, G., DesRoches, C., & Jha, A. K. (2010). Research Commentary--The Digital Transformation of Healthcare: Current Status and the Road Ahead. Information Systems Research, 21(4), 796-809.
• Austin, R. M., Onisko, A., & Druzdzel, M. J. (2010). The Pittsburgh Cervical Cancer Screening Model: a risk assessment tool. Archives of Pathology & Laboratory Medicine, 134(5), 744-750.
• Brender, J., Nøhr, C., & McNair, P. (2000). Research needs and priorities in health informatics. International Journal of Medical Informatics, 58-59, 257-289.
• Cao, L. (2010). Domain-Driven Data Mining: Challenges and Prospects. IEEE Transactions on Knowledge and Data Engineering, 22(6), 755-769.
• Cios, K. J., & William Moore, G. (2002). Uniqueness of medical data mining. Artificial Intelligence in Medicine, 26(1-2), 1-24.
• Cao, H., Markatou, M., Melton, G. B., Chiang, M. F., & Hripcsak, G. (2005). Mining a clinical data warehouse to discover disease-finding associations using co-occurrence statistics, 2005, 106-110.
• Dahlström, O., Thyberg, I., Hass, U., Skogh, T., & Timpka, T. (2006). Designing a decision support system for existing clinical organizational structures: considerations from a rheumatology clinic. Journal of Medical Systems, 30(5), 325-331.
• Department of Health, Taiwan. (2010, December 13). Twenty leading causes of death. Retrieved February 8, 2011, from http://www.doh.gov.tw/CHT2006/DisplayStatisticFile.aspx?d=78276 (in Chinese)
• Doddi, S., Marathe, A., Ravi, S. S., & Torney, D. C. (2001). Discovery of association rules in medical data. Medical Informatics & the Internet in Medicine, 26(1), 25-33.
• Fieschi, M., Dufour, J. C., Staccini, P., Gouvernet, J., & Bouhaddou, O. (2003). Medical decision support systems: old dilemmas and new paradigms. Methods of Information in Medicine, 42(3), 190-198.
• Gotz, D., & Zhou, M. X. (2009). Characterizing users’ visual analytic activity for insight provenance. Information Visualization, 8(1), 42-55.
• Hanauer, D. A., Rhodes, D. R., & Chinnaiyan, A. M. (2009). Exploring Clinical Associations Using ‘-Omics’ Based Enrichment Analyses. PLoS ONE, 4(4), e5203.

Reference (cont.)
• Horsky, J., Kaufman, D. R., Oppenheim, M. I., & Patel, V. L. (2003). A framework for analyzing the cognitive complexity of computer-assisted clinical ordering. Journal of Biomedical Informatics, 36(1-2), 4-22.
• Hou, Q., Lin, Z., Dusing, R. W., Gajewski, B. J., & McCallum, R. W. (2010). A Bayesian hierarchical assessment of gastric emptying with the linear, power exponential and modified power exponential models. Neurogastroenterology and Motility: The Official Journal of the European Gastrointestinal Motility Society.
• Imberman, S. P., Domanski, B., & Thompson, H. W. (2002). Using dependency/association rules to find indications for computed tomography in a head trauma dataset. Artificial Intelligence in Medicine, 26(1-2), 55-68.
• Jemal, A., Siegel, R., Xu, J., & Ward, E. (2010). Cancer Statistics, 2010. CA Cancer J Clin, 60(5), 277-300.
• Institute of Medicine. (2001). Crossing the Quality Chasm: A New Health System for the 21st Century. Washington, D.C.: The National Academies Press.
• Klann, J., Schadow, G., & McCoy, J. M. (2009). A recommendation algorithm for automating corollary order generation. AMIA Annual Symposium Proceedings, 2009, 333-337.
• Kushniruk, A. W., & Patel, V. L. (2004). Cognitive and usability engineering methods for the evaluation of clinical information systems. Journal of Biomedical Informatics, 37(1), 56-76.
• Lau, F., Kuziemsky, C., Price, M., & Gardner, J. (2010). A review on systematic reviews of health information system studies. Journal of the American Medical Informatics Association, 17(6), 637-645.
• Long, W. J. (2001). Medical informatics: reasoning methods. Artificial Intelligence in Medicine, 23(1), 71-87.
• Norris, A. C. (2002). Current trends and challenges in health informatics. Health Informatics Journal, 8(4), 205-213.
• Nykänen, P. (2000). Decision Support Systems from a Health Informatics Perspective.
• Ordonez, C. (2006). Association rule discovery with the train and test approach for heart disease prediction. IEEE Transactions on Information Technology in Biomedicine, 10(2), 334-343.
• Patel, V. L., Shortliffe, E. H., Stefanelli, M., Szolovits, P., Berthold, M. R., Bellazzi, R., & Abu-Hanna, A. (2009). The coming of age of artificial intelligence in medicine. Artificial Intelligence in Medicine, 46(1), 5-17.

Reference (cont.)
• Qu, Y., & Furnas, G. W. (2005). Sources of structure in sensemaking. In CHI '05 extended abstracts on Human factors in computing systems, CHI '05 (pp. 1989-1992). New York, NY, USA: ACM.
• Ramakrishnan, N., Hanauer, D., & Keller, B. (2010). Mining Electronic Health Records. Computer, 43(10), 77-81.
• Rao, B. R., Sandilya, S., Niculescu, R., Germond, C., & Goel, A. (2002). Mining time-dependent patient outcomes from hospital patient records. Proceedings / AMIA ... Annual Symposium. AMIA Symposium, 632-636.
• Stead, W. W., & Lin, H. (Eds.). (2009). Computational technology for effective health care: immediate steps and strategic directions. National Academies Press.
• Sittig, D. F., Wright, A., Osheroff, J. A., Middleton, B., Teich, J. M., Ash, J. S., Campbell, E., & Bates, D. W. (2008). Grand challenges in clinical decision support. Journal of Biomedical Informatics, 41(2), 387-392.
• Tai, Y., & Chiu, H. (2009). Comorbidity study of ADHD: Applying association rule mining (ARM) to National Health Insurance Database of Taiwan. International Journal of Medical Informatics, 78(12), e75-e83.
• Toussi, M., Lamy, J., Le Toumelin, P., & Venot, A. (2009). Using data mining techniques to explore physicians' therapeutic decisions when clinical guidelines do not provide recommendations: methods and example for type 2 diabetes. BMC Medical Informatics and Decision Making, 9, 28.
• Wright, A., Chen, E. S., & Maloney, F. L. (2010). An automated technique for identifying associations between medications, laboratory results and problems. Journal of Biomedical Informatics, 43(6), 891-901.
• Wright, A., & Sittig, D. F. (2006). Automated development of order sets and corollary orders by data mining in an ambulatory computerized physician order entry system. AMIA ... Annual Symposium Proceedings / AMIA Symposium. AMIA Symposium, 819-823.
• Wroblewski, D., Francis, B. A., Chopra, V., Kawji, A. S., Quiros, P., Dustin, L., & Massengill, R. K. (2009). Glaucoma detection and evaluation through pattern recognition in standard automated perimetry data. Graefe's Archive for Clinical and Experimental Ophthalmology = Albrecht Von Graefes Archiv Für Klinische Und Experimentelle Ophthalmologie, 247(11), 1517-1530.