Text Mining of Medical Documents
Michael Elhadad - Raphael Cohen
Dept of Computer Science

Natural Language Processing
• Analyze free text to extract “information”
• Key challenges:
  – Ambiguity: heart, ברק
  – Variability: diabetes, dm, diab.
• Applications:
  – Search
  – Text Mining: information extraction, relations
  – Summarization

NLP for Medical Domain
Opportunity
• Availability of online textual documents
  – EHR: mostly textual (release notes)
  – Scientific literature (PubMed)
Challenge
• Methods developed on “regular language” fail on “medical language”

Specific Interest
• EHR
  – Exploit rich textual data in EHR.
  – In Hebrew!
• Hebrew NLP
  – Complex morphology, no dictionaries, no UMLS
• Domain Adaptation
  – Machine learning methods to port NLP models from one domain to the medical domain.

Recent Work in Domain
• Raphael Cohen, Michael Elhadad and Ohad S Birk, Analysis of free online physician advice services, PLOS ONE, 2013
• Raphael Cohen, Noemie Elhadad, Michael Elhadad, Redundancy in Electronic Health Record Corpora: Analysis, Impact on Text Mining Performance and Mitigation Strategies, BMC Bioinformatics, 2013
• Raphael Cohen and Michael Elhadad, Syntactic Dependency Parsers for Biomedical-NLP, AMIA Proceedings 2012, pp. 121-128
• Raphael Cohen, Yoav Goldberg and Michael Elhadad, Domain Adaptation of a Dependency Parser with a Class-Class Selectional Preference Model, ACL 2012, SRW
• Raphael Cohen, Avitan Gefen, Michael Elhadad and Ohad S Birk, CSIOMIM - Clinical Synopsis Search in OMIM, BMC Bioinformatics 2011, 12:65