Loyola University Chicago Health Sciences Division
Stritch School of Medicine (SSOM)
The Clinical Research Database
(CRDB)
January 8, 2014
(C) 2010 - Loyola University Chicago Stritch School of Medicine, All Rights Reserved
© Copyright 2013 – Loyola University Chicago Stritch School of Medicine – All Rights Reserved
Speaker:
Ron Price
Associate Dean, Office of Information Systems
Loyola University Chicago Stritch School of
Medicine
Maywood, Illinois
&
Associate Vice President, Informatics and Systems
Development
Loyola University Chicago Health Sciences Division
What is the Clinical Research
Database (CRDB)?
• Large-scale, de-identified clinical data warehouse
structured to support a wide range of clinical analytics
• Operates on advanced Hadoop technology
• CRDB data are accessible via a web-based front-end
for casual users (e.g., faculty, housestaff and students)
and via a wide range of tools for advanced users (e.g.,
analysts, bioinformatics staff, etc.)
• Initial target data loads for the CRDB are from Epic
(1/1/2007-9/30/2013)
Why use Hadoop?
• Developed largely at Yahoo in the mid-2000s and used extensively by
“big-data” internet companies (and the NSA) to process large
amounts (petabytes) of structured and unstructured data
• Hadoop is a data management/processing framework that distributes
data storage and processing over clusters of inexpensive computers
• Hadoop’s strengths are its ability to scale and to efficiently handle
unstructured data (e.g., text reports, images, BLOBs, etc.)
• SSOM’s Hadoop environment
– Development and Production environments
– Production environment
• 12-node cluster (2 namenodes, 1 MySQL server, and 9 datanodes)
• 178 TB of storage (the current core Epic EMR is 4 TB)
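The idea of distributing processing across a cluster can be illustrated with a minimal, single-machine sketch of the map/reduce pattern. This is not the CRDB's actual code: the report snippets, the three-way partitioning, and the function names `map_partition` and `merge_counts` are all hypothetical, and a real Hadoop job would run the map step in parallel on the nodes that hold each block of data.

```python
from collections import Counter
from functools import reduce

# Hypothetical de-identified report snippets, split into partitions as if
# each partition were stored on a different data node.
partitions = [
    ["chest pain resolved", "no acute distress"],
    ["chest xray normal", "pain controlled"],
    ["no chest pain reported"],
]

def map_partition(reports):
    """Map step: count words within one partition, independently of the others."""
    counts = Counter()
    for text in reports:
        counts.update(text.lower().split())
    return counts

def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right

# On a cluster, each call to map_partition would run where its data lives.
partial_counts = [map_partition(p) for p in partitions]
total_counts = reduce(merge_counts, partial_counts, Counter())

print(total_counts.most_common(3))
```

Because each partition is counted on its own and only the small partial results are merged, adding more nodes lets the same job cover more data without moving the raw text around.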
Why use Hadoop?
• Hadoop’s strengths are its ability to scale and to efficiently handle
unstructured data (e.g., text reports, images, BLOBs, etc.)
“Of the 1.2 billion clinical documents produced in the United States
each year, approximately 60 percent contain valuable information
trapped in unstructured documents that are unavailable for clinical
use, quality measurement and data mining.”*
• Some estimates put this number closer to 80%
* Health Management Technology – June 2012
Why not just use Epic?
• Epic is LUMC’s EMR; however, most data originate and are stored in
their native (e.g., granular or structured) formats in local ancillary
systems (e.g., Clinical Labs, RIS/PACS, EPS, etc.)
• Epic is optimized for healthcare operations and not for research or
population studies
• Activity related to large-scale analytics impacts system performance
• The “10,000 table” issue (actually 11,964 tables!)
• Systems supporting research and population studies need
– Flexibility to handle “foreign” (e.g., external, multi-center) data
– Flexibility to handle unstructured data
– Ability to scale to “big data” levels
CRDB Version 1.0 (July 2013)
• Current data
– De-identified with keys held in Epic Clarity data warehouse
– Data source of Epic Clarity (updated nightly)
– Data period of 1/1/2007 through 09/30/2013
– Updated quarterly (next update mid-March 2014)
– Data tables
• Demographics
• Encounters (Inpatient, Outpatient, ED, Obs and home health)
• Procedures and clinical lab values
• Flowsheet measures (vitals, physical findings, etc.)
• Medications
• Payor information at encounter level
• CRDB application is widely available on the portal
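One common way to keep de-identified data re-linkable only from the source side is a crosswalk table that stays with the source system. The sketch below illustrates that general pattern under stated assumptions; it is not a description of how the CRDB or Epic Clarity actually implements de-identification. The field names (`mrn`, `name`, `encounter_type`), the sample records, and the functions `build_crosswalk` and `deidentify` are hypothetical.

```python
import secrets

def build_crosswalk(mrns):
    """Assign each source identifier a random study ID unrelated to the original."""
    crosswalk = {}
    for mrn in mrns:
        if mrn not in crosswalk:
            crosswalk[mrn] = secrets.token_hex(8)
    return crosswalk

def deidentify(records, crosswalk):
    """Swap the identifier for its study ID and drop direct identifiers."""
    out = []
    for rec in records:
        out.append({
            "study_id": crosswalk[rec["mrn"]],
            "encounter_type": rec["encounter_type"],
        })
    return out

# Hypothetical source rows; real data would come from the source warehouse.
source_records = [
    {"mrn": "A001", "name": "Example Patient", "encounter_type": "Inpatient"},
    {"mrn": "A002", "name": "Another Patient", "encounter_type": "ED"},
    {"mrn": "A001", "name": "Example Patient", "encounter_type": "Outpatient"},
]

# The crosswalk (the "key") stays on the source side.
crosswalk = build_crosswalk(r["mrn"] for r in source_records)
# Only these rows would be loaded into a research warehouse.
research_rows = deidentify(source_records, crosswalk)
```

In this arrangement the same patient keeps the same study ID across encounters, so longitudinal analysis still works, while re-identification requires access to the crosswalk held outside the research environment.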
Demonstration of the CRDB
CRDB Version 1.0 - Future
• Website development activities
– Request for expedited IRB
– Refinement of “groupings” for ICD9s, CPTs and providers
• Capture of additional data (Current calendar year)
– Microbiology results and other report text blobs
• End-user Query Tool – Additional query parameters and analysis modules
– Enhanced logic functions (January 2014)
– CPTs (March 2014)
– Labs (June 2014)
– Flowsheet measures (August 2014)
– Units (October 2014)
Current Usage (July 2013 – Dec 2013)
• Unique CRDB Users – 213
• Query Tool CRDB Cohort identifications – 302
• CRDB Data Extracts (since August)
– 5 large clinical extracts for a recent PCORI grant
– Large data extract for Chicago Health Atlas project
– 2 QI projects
– 6 Medical Student/Resident clinical research projects
Questions and Answers