Loyola University Chicago Health Sciences Division
Stritch School of Medicine (SSOM)

The Clinical Research Database (CRDB)
January 8, 2014

(C) 2010, © Copyright 2013 – Loyola University Chicago Stritch School of Medicine – All Rights Reserved

Speaker: Ron Price
Associate Dean, Office of Information Systems
Loyola University Chicago Stritch School of Medicine
Maywood, Illinois
&
Associate Vice President, Informatics and Systems Development
Loyola University Chicago Health Sciences Division

What is the Clinical Research Database (CRDB)?
• Large-scale, de-identified clinical data warehouse structured to support a wide range of clinical analytics
• Runs on Hadoop
• CRDB data are accessible via a web-based front-end for casual users (e.g., faculty, housestaff and students) and via a wide range of tools for advanced users (e.g., analysts, bioinformatics staff, etc.)
• Initial target data loads for the CRDB are from Epic (1/1/2007-9/30/2013)

Why use Hadoop?
• Developed largely at Yahoo in the mid-2000s and used extensively by “big-data” internet companies (and the NSA) to process large amounts (petabytes) of structured and unstructured data
• Hadoop is a data management/processing framework that distributes data storage and processing over clusters of inexpensive computers
• Hadoop’s strengths are its ability to scale and to efficiently handle unstructured data (e.g., text reports, images, BLOBs, etc.)
• SSOM’s Hadoop environment
  – Development and Production environments
  – Production environment
    • 12-node cluster (2 namenodes, 1 MySQL server, and 9 datanodes)
    • 178 TB of storage (the current core Epic EMR is 4 TB)

Why use Hadoop?
• Hadoop’s strengths are its ability to scale and to efficiently handle unstructured data (e.g., text reports, images, BLOBs, etc.)

“Of the 1.2 billion clinical documents produced in the United States each year, approximately 60 percent contain valuable information trapped in unstructured documents that are unavailable for clinical use, quality measurement and data mining.”*

• Some estimates put this number closer to 80%

* Health Management Technology – June 2012

Why not just use Epic?
• Epic is LUMC’s EMR; however, most data originate and are stored in their native (e.g., granular or structured) formats in local ancillary systems (e.g., Clinical Labs, RIS/PACS, EPS, etc.)
• Epic is optimized for healthcare operations and not for research or population studies
• Activity related to large-scale analytics impacts system performance
• The “10,000 table” issue (actually 11,964 tables!)
• Systems supporting research and population studies need
  – Flexibility to handle “foreign” (e.g., external, multi-center) data
  – Flexibility to handle unstructured data
  – Ability to scale to “big data” levels

CRDB Version 1.0 (July 2013)
• Current data
  – De-identified, with keys held in the Epic Clarity data warehouse
  – Data source: Epic Clarity (updated nightly)
  – Data period: 1/1/2007 through 9/30/2013
  – Updated quarterly (next update mid-March 2014)
  – Data tables
    • Demographics
    • Encounters (Inpatient, Outpatient, ED, Obs and home health)
    • Procedures and clinical lab values
    • Flowsheet measures (vitals, physical findings, etc.)
    • Medications
    • Payor information at encounter level
• CRDB application is widely available on the portal

Demonstration of the CRDB

CRDB Version 1.0 - Future
• Website development activities
  – Request for expedited IRB
  – Refinement of “groupings” for ICD9s, CPTs and providers
• Capture of additional data (current calendar year)
  – Microbiology results and other report text blobs
• End-user Query Tool
  – Additional query parameters and analysis modules
    – Enhanced logic functions (January 2014)
    – CPTs (March 2014)
    – Labs (June 2014)
    – Flowsheet measures (August 2014)
    – Units (October 2014)

Current Usage (July 2013 – Dec 2013)
• Unique CRDB Users – 213
• Query Tool CRDB Cohort identifications – 302
• CRDB Data Extracts (since August)
  – 5 large clinical extracts for a recent PCORI grant
  – Large data extract for Chicago Health Atlas project
  – 2 QI projects
  – 6 Medical Student/Resident clinical research projects

Questions and Answers