Beyond “Bag of Words”: Towards a Framework for Conceptual Retrieval
Jimmy Lin
College of Information Studies, University of Maryland
Thursday, October 4, 2007
IPAM Workshop, UCLA

Beyond “Bag of Words”
IR is fundamentally based on counting words
  Different ways of “bookkeeping”: vector space, probabilistic, LM, DFR, etc.
So…
  Words aren’t enough to capture meaning
  Term statistics aren’t enough to capture meaning
Thus…
  IR systems should go beyond term statistics: concepts, relations, etc.
  Hypothesis: IR based on concepts, relations, etc. >> IR based on words
However…
  A reasonable hypothesis?
  Where’s the empirical support?

Outline
Previous attempts to go beyond BoW
Slightly different approach
  Start with specialized applications
  Generalize
Case study in the medical domain
  A clinical question answering system in support of evidence-based medicine (EBM)
Broader applicability?

Previous Work
Beyond “bags”
  Indexing phrases, e.g., (Fagan, 1987; Smeaton et al., 1994; etc.)
  Modeling term dependencies, e.g., (Gao et al., 2004; Liu et al., 2004; Metzler and Croft, 2005; Cui et al., 2005; etc.)
Beyond “words”
  Query expansion, e.g., (Voorhees, 1993; 1994)
  Word sense disambiguation, e.g., (Sanderson, 1994; Mihalcea and Moldovan, 2000)
Results? Mixed

A Different Approach
Previous work focuses on the general domain
  Broad but (relatively) shallow
  Hampered by the commonsense problem
  Difficult to acquire large amounts of knowledge
Our approach:
  Develop a general framework
  Instantiate in domain-specific applications
  Leverage lessons learned to refine the framework
  Rinse, repeat

“Conceptual Retrieval”
[Diagram components: Questions; Conceptual representation; Knowledge Extractor; Semantic Matcher; Conceptual representation; Collection; Answers]

What type of knowledge?
Knowledge about the problem structure
  What representations are useful for capturing the information need?
Knowledge about user tasks
  Why is this information needed? How will it be further used?
Knowledge about the domain
  What background knowledge is needed to reason about the information need?

K1: Problem Structure
Knowledge representations are important!
  Help experts reason about problems
  Form the basis for tractable computational structures
GOFAI
  Frames (Minsky)
  Scripts (Schank)
  Semantic networks (attribution less clear)

K2: User Tasks
The user is important!
Users are different
  High school student vs. intelligence analyst
Different types of relevance
  Topical, situational, etc.

K3: Domain
Why is the sky blue?
“To really learn something, you basically have to already know it.”
Users bring a tremendous amount of knowledge to bear when asking questions
  Specialized, technical knowledge
  Commonsense

K4 … Kn?
More types of knowledge needed?
Working hypothesis: {K1, K2, K3} comprise a necessary set

Introductions
Dr. Dr. Dr. Dina Demner-Fushman, M.D., Ph.D., Ph.D.

Why the Medical Domain?
Evidence-Based Medicine = a paradigm of medical practice that emphasizes decision support from high-quality clinical research
  Provides a basis for K1, K2, and K3
  Need for retrieval systems is well documented, e.g., (Gorman et al., 1994; Chambliss and Conley, 1996; Cogdill and Moore, 1997; Ely et al., 2005; Sutton et al., 2005)
Clinical QA: “ready-made” domain for exploring conceptual retrieval
  Availability of corpora, resources, etc.
  Important and potentially high-impact application

K1: Problem Structure
EBM identifies four components of a question
  Originally developed as a clinical tool
  Can serve as a knowledge representation
“In children with an acute febrile illness, what is the efficacy of single-medication therapy with acetaminophen or ibuprofen in reducing fever?”
  Population/Problem: children / acute febrile illness
  Intervention: acetaminophen
  Comparison: ibuprofen
  Outcome: reducing fever
= PICO frame

K2: User Tasks
Clinical tasks
  Therapy: selecting effective treatments, taking into account other factors such as risk and cost
  Diagnosis: selecting and interpreting diagnostic tests, while considering factors such as precision and safety
  Prognosis: estimating the patient’s likely course over time and anticipating likely complications
  Etiology: identifying risk factors and the causes for a patient’s disease
Considerations for strength of evidence
  Strength of Recommendation Taxonomy (SORT): three evidence grades

K3: Domain
The Unified Medical Language System (UMLS)
  2004 version: 1+ million biomedical concepts, > 5 million concept names
Software for leveraging this resource: MetaMap and SemRep, for identifying concepts and relations
[Figure: example UMLS concept hierarchy. Node labels: Anti-infective agent; Antifungal; Antibacterial drugs; Disinfectants and cleansers; Quinolone; ofloxacin; Borate product; Mucous membrane antifungal agent; Ciclopirox; boric acid]

Re: Conceptual Retrieval
Question: In children with an acute febrile illness, what is the efficacy of single-medication therapy with acetaminophen or ibuprofen in reducing fever?
Query frame
  Task: therapy
  P: children / acute febrile illness
  I: acetaminophen
  C: ibuprofen
  O: reducing fever
Matching frame from MEDLINE (NLM’s authoritative repository of 17 million+ abstracts)
  P: children / acute febrile illness
  I: acetaminophen
  C: ibuprofen
  O: reducing fever
Answer: Ibuprofen provided greater temperature decrement and longer duration of antipyresis than acetaminophen when the two drugs were administered in approximately equal doses.

System Architecture
[Diagram: pipeline with components Question (query frame), Query Formulator, PubMed, Knowledge Extractors, Semantic Matcher, Answer Generator, Answers; data passed between them: query frame, search query, abstracts, annotated abstracts, scored citations]

Test Collection
Manually gathered 50 clinical questions from FPIN and the Parkhurst Exchange
  Reflects distribution of real-world questions
  Divided into development and test collections

Task        Count   Example question
Therapy     22      Does quinine reduce leg cramps for young athletes?
Diagnosis   12      How often is coughing the presenting complaint in patients with gastroesophageal reflux disease?
Prognosis    6      What’s the prognosis of lupoid sclerosis?
Etiology    10      What are the causes of hypomagnesemia?
Total       50

Gathering Judgments
Manually formulated PubMed queries
  ~40 minutes per question; gathered top 50 hits
  Question: What is the best treatment for analgesic rebound headaches?
  PubMed Query: (((“analgesics”[TIAB] NOT Medline[SB]) OR “analgesics”[MeSH Terms] OR “analgesics”[Pharmacological Action] OR analgesic[TextWord]) AND ((“headache”[TIAB] NOT Medline[SB]) OR “headache”[MeSH Terms] OR headaches[TextWord]) AND (“adverse effects”[Subheading] OR side effects[Text Word])) AND hasabstract[text] AND English[Lang] AND “humans”[MeSH Terms]
Manually evaluated all retrieved citations
  ~2 hours per question

Knowledge Extraction Example
Antipyretic efficacy of ibuprofen vs acetaminophen.
OBJECTIVE--To compare the antipyretic efficacy of ibuprofen, placebo, and acetaminophen. DESIGN--Double-dummy, double-blind, randomized, placebo-controlled trial.
SETTING--Emergency department and inpatient units of a large, metropolitan, university-based, children's hospital in Michigan. PARTICIPANTS--37 otherwise healthy children aged 2 to 12 years with acute, intercurrent, febrile illness. INTERVENTIONS--Each child was randomly assigned to receive a single dose of acetaminophen (10 mg/kg), ibuprofen (7.5 or 10 mg/kg), or placebo. MEASUREMENTS/MAIN RESULTS--Oral temperature was measured before dosing, 30 minutes after dosing, and hourly thereafter for 8 hours after the dose. Patients were monitored for adverse effects during the study and 24 hours after administration of the assigned drug. All three active treatments produced significant antipyresis compared with placebo. Ibuprofen provided greater temperature decrement and longer duration of antipyresis than acetaminophen when the two drugs were administered in approximately equal doses. No adverse effects were observed in any treatment group. CONCLUSION--Ibuprofen is a potent antipyretic agent and is a safe alternative for the selected febrile child who may benefit from antipyretic medication but who either cannot take or does not achieve satisfactory antipyresis with acetaminophen.
Am J Dis Child. 1992 May; 146(5):622-5
[Highlighted in the abstract: Population, Problem, Interventions, Outcome]

Knowledge Extractors
Population, Problem, Intervention: IE task
  Exploited coverage of medical concepts in UMLS
  Additional candidate ranking based on a few features
Outcome: sentence-level classification task
  “Kitchen sink” approach, ensemble of classifiers
  Features:
    Manually defined cue words
    N-grams
    Position in abstract
    Presence of certain UMLS concepts
    …
Semantics helps!

Knowledge Extractors
Extraction results per frame element (the three column labels were symbols lost in extraction; each row sums to 100%)

Element        ?      ?      ?
Problem        90%    5%     5%
Population     80%    13%    7%
Intervention   80%    0%     20%
Outcome        95%    0%     5%

Details: Dina Demner-Fushman and Jimmy Lin. Answering Clinical Questions with Knowledge-Based and Statistical Techniques.
Computational Linguistics, 33(1):63-103, 2007

Semantic Matching
Three score components:
  $S_{EBM} = S_{PICO} + S_{SoE} + S_{MeSH}$
  $S_{PICO}$: matching PICO frame elements (Problem Structure)
  $S_{SoE}$: strength of evidence considerations (User Tasks)
  $S_{MeSH}$: MeSH indicators for each clinical task (User Tasks)
Details: Dina Demner-Fushman and Jimmy Lin. Answering Clinical Questions with Knowledge-Based and Statistical Techniques. Computational Linguistics, 33(1):63-103, 2007

Semantic Matching: Evaluation
Research Questions
  Does it work?
  What are the relative contributions of each component?
  What is the interaction between knowledge-based and statistical techniques?
Approach
  Reranking experiments with test collection
  Ablation studies

Evaluation: Abstract Reranking
Question: What is the best treatment for analgesic rebound headaches?
PubMed Query: (((“analgesics”[TIAB] NOT Medline[SB]) OR “analgesics”[MeSH Terms] OR “analgesics”[Pharmacological Action] OR analgesic[TextWord]) AND ((“headache”[TIAB] NOT Medline[SB]) OR “headache”[MeSH Terms] OR headaches[TextWord]) AND (“adverse effects”[Subheading] OR side effects[Text Word])) AND hasabstract[text] AND English[Lang] AND “humans”[MeSH Terms]
[Diagram components: clinical task and PICO frame (P, I, C, O); MEDLINE; Knowledge Extractor; Semantic Matcher]
Reranked output compared
  vs. original PubMed ordering
  vs. Indri baseline (state-of-the-art LM)

Results: Complete Model
Performance on held-out blind test set (percentages are relative change vs. the Indri baseline):

Precision at 10 (P10)
          Therapy       Diagnosis     Prognosis     Etiology      All
PubMed    .350 (–39%)   .150 (–70%)   .200 (–46%)   .320 (–20%)   .281 (–44%)
Indri     .575          .500          .367          .400          .500
EBM       .783 (+36%)   .583 (+17%)   .467 (+27%)   .660 (+65%)   .677 (+35%)

Mean Average Precision (MAP)
          Therapy       Diagnosis     Prognosis     Etiology      All
PubMed    .421 (–29%)   .279 (–48%)   .235 (–56%)   .364 (–17%)   .356 (–35%)
Indri     .595          .534          .533          .439          .544
EBM       .765 (+29%)   .637 (+19%)   .722 (+35%)   .701 (+60%)   .718 (+32%)

Results are statistically significant
Details: Jimmy Lin and Dina Demner-Fushman. The Role of Knowledge in Conceptual Retrieval: A Study in the Domain of Clinical Medicine. SIGIR 2006.

Results: Parameter Settings
Tuning each component
  $S_{EBM} = \lambda_1 S_{PICO} + \lambda_2 S_{SoE} + (1 - \lambda_1 - \lambda_2) S_{MeSH}$
  No statistically significant difference
Combining EBM + Indri
  $S_{EBM+Indri} = \lambda S_{EBM} + (1 - \lambda) S_{Indri}$
  Better performance, but not statistically significant
Details: Jimmy Lin and Dina Demner-Fushman. The Role of Knowledge in Conceptual Retrieval: A Study in the Domain of Clinical Medicine. SIGIR 2006.

Results: Contributions
What’s the contribution of each EBM facet?

Component                   MAP    vs. EBM   vs. Indri   Knowledge type
$S_{PICO}$                  .646   –10%**    +19%*       Problem Structure
$S_{SoE} + S_{MeSH}$        .538   –25%**    –1%         User Tasks

Component                   P10    vs. EBM   vs. Indri   Knowledge type
$S_{PICO}$                  .627   –7%       +25%**      Problem Structure
$S_{SoE} + S_{MeSH}$        .485   –28%**    –3%         User Tasks

** = sig. at 99%, * = sig. at 95%
What types of knowledge are important?
  Problem structure (K1) helps a lot
  User tasks (K2) help, but not as much
Details: Jimmy Lin and Dina Demner-Fushman. The Role of Knowledge in Conceptual Retrieval: A Study in the Domain of Clinical Medicine. SIGIR 2006.
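The weighted score combinations on the parameter-settings slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Citation` container, the function names, the default weights, and the idea that all component scores are on comparable scales are hypothetical and not taken from the talk or the cited papers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Citation:
    """Hypothetical container: one retrieved abstract and its component scores."""
    pmid: str
    s_pico: float          # PICO frame match (problem structure)
    s_soe: float           # strength of evidence (user tasks)
    s_mesh: float          # MeSH task indicators (user tasks)
    s_indri: float = 0.0   # term-based baseline score


def ebm_score(c: Citation, lam1: float = 1 / 3, lam2: float = 1 / 3) -> float:
    """S_EBM = lam1 * S_PICO + lam2 * S_SoE + (1 - lam1 - lam2) * S_MeSH."""
    if lam1 < 0 or lam2 < 0 or lam1 + lam2 > 1:
        raise ValueError("weights must be non-negative and sum to at most 1")
    return lam1 * c.s_pico + lam2 * c.s_soe + (1 - lam1 - lam2) * c.s_mesh


def combined_score(c: Citation, lam: float = 0.5) -> float:
    """S_EBM+Indri = lam * S_EBM + (1 - lam) * S_Indri."""
    if not 0 <= lam <= 1:
        raise ValueError("lam must lie in [0, 1]")
    return lam * ebm_score(c) + (1 - lam) * c.s_indri


def rerank(citations: List[Citation], lam: float = 1.0) -> List[Citation]:
    """Sort citations by the interpolated score, best first."""
    return sorted(citations, key=lambda c: combined_score(c, lam), reverse=True)


if __name__ == "__main__":
    # Made-up scores, purely to exercise the functions.
    hits = [
        Citation("a", s_pico=0.9, s_soe=0.2, s_mesh=0.1, s_indri=0.1),
        Citation("b", s_pico=0.1, s_soe=0.1, s_mesh=0.1, s_indri=0.9),
    ]
    print([c.pmid for c in rerank(hits, lam=1.0)])  # EBM score only
    print([c.pmid for c in rerank(hits, lam=0.0)])  # term-based score only
```

With equal weights of 1/3 the ranking is the same as under the unweighted sum of the three components, since the two differ only by a constant factor. Setting `lam=1.0` ranks by the EBM score alone and `lam=0.0` by the term-based score alone.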
Results: Partial Models
Can we use limited knowledge to improve term-based methods?
Term statistics: $\lambda S_{Indri}$
  MAP .544, P10 .500
+ Problem structure: $\lambda S_{Indri} + (1 - \lambda) S_{PICO}$, with $\lambda = .46$
  MAP .668 (+23%)**, P10 .627 (+25%)**
+ User tasks: $\lambda S_{Indri} + (1 - \lambda)(.5 S_{SoE} + .5 S_{MeSH})$, with $\lambda = .55$
  MAP .620 (+14%)**, P10 .565 (+13%)*
** = sig. at 99%, * = sig. at 95%
Any knowledge helps!
Details: Jimmy Lin and Dina Demner-Fushman. The Role of Knowledge in Conceptual Retrieval: A Study in the Domain of Clinical Medicine. SIGIR 2006.

Answer Generation
Physicians are most interested in outcomes
Approach: identify outcome sentences
Generate an answer from each citation: abstract title and three highest-scoring outcome sentences
Question: Does combining aspirin and warfarin decrease the risk of stroke for patients with nonvalvular atrial fibrillation?
Answer:
  [abstract title] Prevention of thromboembolic events in atrial fibrillation:
  [outcome 1] The results from the SPAF III study demonstrated that a combination of mini-intensity warfarin plus aspirin was insufficient for stroke prevention in atrial fibrillation.
  [outcome 2] Other trials now indicate, that oral anticoagulation at INR-values below 2.0 is not effective for stroke prevention in these patients.
  [outcome 3] The present clinical challenge is to ensure effective and safe oral anticoagulation to patients with atrial fibrillation at high risk of stroke.

Evidence Synthesis
Integrate findings from multiple citations
Question: What is the best treatment for chronic prostatitis?
► Anti-microbial
  [temafloxacin] Treatment of chronic bacterial prostatitis with temafloxacin. Temafloxacin 400 mg b.i.d. administered orally for 28 days represents a safe and effective treatment for chronic bacterial prostatitis.
  [ofloxacin] Ofloxacin in the management of complicated urinary tract infections, including prostatitis. In chronic bacterial prostatitis, results to date suggest that ofloxacin may be more effective clinically and as effective microbiologically as carbenicillin.
  ...
► Alpha-adrenergic blocking agent
  [terazosine] Terazosin therapy for chronic prostatitis/chronic pelvic pain syndrome: a randomized, placebo controlled trial. CONCLUSIONS: Terazosin proved superior to placebo for patients with chronic prostatitis/chronic pelvic pain syndrome who had not received alpha-blockers previously.
  ...

Semantic Clustering
[Diagram components: relevant citations; Answer Extraction; Semantic Clustering (Cluster 1, Cluster 2, Cluster 3); Interactive Presentation]

Evaluation: Evidence Synthesis
What is the best treatment of X?
Compare
  Top three answers from PubMed
  First answer in three largest semantic clusters
Evaluation by a physician:

                        “Good”   “Okay”   “Bad”
PubMed                  0.600    0.227    0.173
Semantic Clustering     0.827    0.133    0.040

Details: Dina Demner-Fushman and Jimmy Lin. Answer Extraction, Semantic Clustering, and Extractive Summarization for Clinical Question Answering. ACL 2006.

Findings
K1 + K2 + K3 → “conceptual retrieval”
Knowledge helps a lot!
But here’s the catch:
  Limited domain: “narrow but deep”
  Dependent on availability of existing resources
Beyond “bag of words”:
  Develop a general framework
  Instantiate in domain-specific applications
  Leverage lessons learned to refine the framework
  Rinse, repeat

Re: Re: Conceptual Retrieval
Question: In children with an acute febrile illness, what is the efficacy of single-medication therapy with acetaminophen or ibuprofen in reducing fever?
Query frame (each element labeled as a facet)
  Task: therapy
  P: children / acute febrile illness
  I: acetaminophen
  C: ibuprofen
  O: reducing fever
Matching frame from MEDLINE (NLM’s authoritative repository of 17 million+ abstracts)
  P: children / acute febrile illness
  I: acetaminophen
  C: ibuprofen
  O: reducing fever
Answer: Ibuprofen provided greater temperature decrement and longer duration of antipyresis than acetaminophen when the two drugs were administered in approximately equal doses.
= faceted query!

Conceptual Retrieval
“Building blocks” strategy in library science
  Decompose information need into conceptual facets
  Identify terms that represent those facets
  Instantiate in a structured query
    (A1 OR A2 OR …) AND (B1 OR B2 OR …) AND (C1 OR C2 OR …) AND (D1 OR D2 OR …) …
    with the four blocks corresponding to P, I, C, O
EBM-based retrieval is a specific case of facet analysis and structured querying!

A General Framework?
For a domain
  1. Identify prototypical information needs
  2. Develop a frame-based representation
  3. Build extractor for frame elements
  4. Instantiate semantic matcher
  5. Watch performance go up!
The subject of ongoing work…

What comes next?
Retrieval in the biomedical domain
  Information describing the role(s) of a [gene] involved in a [disease].
    gene: Interferon-beta; disease: Multiple Sclerosis
  Information describing the role of a [gene] in a specific [biological process].
    gene: nucleoside diphosphate kinase (NM23); biological process: tumor progression
Complex question answering
  What evidence is there for transport of [art looted by the Nazis in WWII] from [Germany] to [France]?
  What [familial ties] exist between [Neanderthals] and [humans]?
  What [common interests] exist between [Network Solutions] and [the Internet Corporation for Assigned Names and Numbers (ICANN)]?

Acknowledgments
Dina Demner-Fushman (Ph.D., 2006)
This work was funded in part by NLM

References
Dina Demner-Fushman and Jimmy Lin. Answering Clinical Questions with Knowledge-Based and Statistical Techniques. Computational Linguistics, 33(1):63-103, 2007.
Jimmy Lin and Dina Demner-Fushman. The Role of Knowledge in Conceptual Retrieval: A Study in the Domain of Clinical Medicine. Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2006), 2006, pp. 99-106.
Dina Demner-Fushman and Jimmy Lin. Answer Extraction, Semantic Clustering, and Extractive Summarization for Clinical Question Answering. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL 2006), 2006, pp. 841-848.
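
Appendix sketch: the “building blocks” strategy from the Conceptual Retrieval slide (alternative terms joined by OR within a facet, facets joined by AND) can be illustrated with a small Python helper. This is a minimal sketch under assumptions: the function name `build_faceted_query`, the generic Boolean syntax, and the example terms are hypothetical, and the output is not meant to reproduce PubMed's field-tag syntax or the system described in the talk.

```python
from typing import Mapping, Sequence


def build_faceted_query(facets: Mapping[str, Sequence[str]]) -> str:
    """Build a Boolean query: OR within each facet, AND across facets.

    Facets with no usable terms are skipped; multi-word terms are quoted.
    """
    blocks = []
    for _name, terms in facets.items():
        cleaned = [t.strip() for t in terms if t.strip()]
        if not cleaned:
            continue  # an empty facet contributes no constraint
        quoted = [f'"{t}"' if " " in t else t for t in cleaned]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)


if __name__ == "__main__":
    # Illustrative PICO facets for the febrile-children question;
    # the synonym lists are assumptions, not the system's actual expansions.
    pico = {
        "P": ["children", "acute febrile illness"],
        "I": ["acetaminophen", "paracetamol"],
        "C": ["ibuprofen"],
        "O": ["fever", "antipyresis"],
    }
    print(build_faceted_query(pico))
```

Keeping the facets in a mapping mirrors the frame representation: each PICO slot becomes one block of the query, and an unfilled slot is simply left out.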