Enabling Medical Experts to navigate clinical text for cohort identification (meTAKES)
Stephen Wu, Mayo Clinic
SHARPn Summit 2012
June 12, 2012

Outline
• Motivation
• Methods (current)
  • System architecture
  • Data retrieval
  • Search
  • Cohort management
• Conclusion & Future Work

Motivation
• Clinical NLP out-of-the-box
• Comprehensive knowledge
• Customize? Collaborate!
• Diverse requirements
• Physician/Researcher tasks
• Enroll patients in study
• Define retrospective cohort
• Case abstraction
[Callouts: Somali patients (unique terms); Drug-induced liver injury (rel’ns); Pediatric asthma (temporal)]

“Medical expert”-driven NLP
• Use case-agnostic        • Use case-specific
• Comprehensive            • Streamlined
• Pre-computed NLP         • On-the-fly NLP
Diverse requirements       Known requirements
• Interactive interface
• Delivery mechanism
• Available data vs. expert knowledge
[Diagram labels: source text, semantics, expert criteria]
• Web interface (GWT)
[Diagram labels: Client, Server, data pool, parameters, EHR records, GUI, query, query parser, NLP (MedTagger), cohort mgmt, Lucene, records, cohort manipulation, ranked records]

Data retrieval
• Parameters (current)
  • Patient ID
  • Date
• Sources (current)
  • Enterprise Data Trust (EDT) @ Mayo Clinic
  • Text files on server

Search
• Parameters (current)
  • Term lists
  • Logic
  • Expansion
• Techniques (current)
  • Dictionary (Lucene)
  • NLP results (e.g., negation)

Cohort Management
• Parameters:
• Cohort name/tag
• Selecting patients
• Export
• Iterative refinement

Conclusion and Future Work
• NLP / search
• Text characteristics
• Semantic search
• Relationships
• HCI / cohort management
• Learning
• Collaboration
• Interoperability
• Structured data
• API
• Mayo delivery: DDQB Clinical Notes Search Tool
• Evaluation framework

meTAKES team:
Stephen Wu
Ravikumar K.E.
Hongfang Liu
https://sites.google.com/site/stephentzeinnwu
wu.stephen@mayo.edu

THANK YOU.
Special thanks to: Siddhartha Jonnalagadda, James Masanz, Vinod Kaggal, Sean Murphy, Tom Suther, Erik Voldal, Carlos Garcia, Melissa Gregg

This work was supported in part by the SHARPn (Strategic Health IT Advanced Research Projects) Area 4: Secondary Use of EHR Data Cooperative Agreement from the HHS Office of the National Coordinator, Washington, DC. DHHS 90TR000201.
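
Supplementary sketch: the Search slide lists term lists, logic, and expansion as parameters, with dictionary lookup and NLP results such as negation as techniques. The Python below is a minimal illustration of how those pieces might fit together, not the meTAKES implementation. It assumes plain regular-expression matching in place of Lucene and a crude cue-word check in place of MedTagger's negation output. All names (`TermList`, `NEGATION_CUES`, `search`) and the sample records are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical cue words; a real system would take negation from NLP results.
NEGATION_CUES = ("no", "denies", "without", "negative for")


@dataclass
class TermList:
    """A named list of search terms with optional per-term expansions."""
    name: str
    terms: list
    expansions: dict = field(default_factory=dict)  # term -> list of synonyms

    def expanded(self):
        """Return the lowercased terms plus any expansions."""
        out = []
        for term in self.terms:
            out.append(term.lower())
            out.extend(s.lower() for s in self.expansions.get(term, []))
        return out


def sentences(text):
    """Split text into rough sentences on punctuation and newlines."""
    return [s.strip() for s in re.split(r"[.!?\n]+", text) if s.strip()]


def is_negated(sentence, start):
    """True if a negation cue occurs in the sentence before position start."""
    prefix = sentence[:start].lower()
    return any(
        re.search(r"\b" + re.escape(cue) + r"\b", prefix)
        for cue in NEGATION_CUES
    )


def matches(term_list, text):
    """True if any expanded term occurs un-negated somewhere in the text."""
    for sent in sentences(text):
        low = sent.lower()
        for term in term_list.expanded():
            for m in re.finditer(r"\b" + re.escape(term) + r"\b", low):
                if not is_negated(low, m.start()):
                    return True
    return False


def search(records, all_of=(), none_of=()):
    """Return sorted patient IDs matching every list in all_of and none in none_of."""
    hits = []
    for patient_id, text in records.items():
        if all(matches(tl, text) for tl in all_of) and not any(
            matches(tl, text) for tl in none_of
        ):
            hits.append(patient_id)
    return sorted(hits)
```

In this sketch, a term list for asthma expanded with "reactive airway disease" would match a note saying "History of reactive airway disease" but skip one saying "Patient denies asthma". The resulting patient IDs could then be tagged as a cohort and refined by rerunning `search` with adjusted term lists.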