Beyond “Bag of Words”: Towards a Framework for Conceptual Retrieval Jimmy Lin

Beyond “Bag of Words”: Towards a
Framework for Conceptual Retrieval
Jimmy Lin
College of Information Studies
University of Maryland
Thursday, October 4, 2007
IPAM Workshop, UCLA
Beyond “Bag of Words”

IR is fundamentally based on counting words


Different ways of “bookkeeping”: vector space,
probabilistic, LM, DFR, etc.
So…


Words aren’t enough to capture meaning
Term statistics aren’t enough to capture meaning
Thus…

IR systems should go beyond term statistics:
concepts, relations, etc.

Hypothesis:
IR based on concepts, relations, etc. >> IR based on words

However…


A reasonable hypothesis?
Where’s the empirical support?
Outline

Previous attempts to go beyond BoW

Slightly different approach



Case study in the medical domain


Start with specialized applications
Generalize
A clinical question answering system in support of
evidence-based medicine (EBM)
Broader applicability?
Previous Work

Beyond “bags”

Indexing phrases
e.g., (Fagan, 1987; Smeaton et al., 1994; etc.)

Modeling term dependencies
e.g., (Gao et al., 2004; Liu et al., 2004; Metzler and Croft, 2005;
Cui et al., 2005; etc.)

Beyond “words”

Query expansion:
e.g., (Voorhees, 1993; 1994)

Word Sense Disambiguation
e.g., (Sanderson, 1994; Mihalcea and Moldovan, 2000)

Results? Mixed
A Different Approach

Previous work focuses on the general domain




Broad but (relatively) shallow
Hampered by commonsense problem
Difficult to acquire large amounts of knowledge
Our approach:




Develop a general framework
Instantiate in domain-specific applications
Leverage lessons learned to refine the framework
Rinse, repeat
“Conceptual Retrieval”
[Figure: Questions and the Collection are each mapped to a conceptual representation by a Knowledge Extractor; a Semantic Matcher compares the two representations to produce Answers.]
What type of knowledge?

Knowledge about the problem structure


Knowledge about user tasks



What representations are useful for capturing the
information need?
Why is this information needed?
How will it be further used?
Knowledge about the domain

What background knowledge is needed to reason about
the information need?
K1: Problem Structure

Knowledge representations are important!



Helps experts reason about problems
Form the basis for tractable computational structures
GOFAI



Frames (Minsky)
Scripts (Schank)
Semantic networks (attribution less clear)
Knowledge about problem structure
Knowledge about user tasks
Knowledge about the domain
K2: User Tasks

The user is important!

Users are different


High school student vs. intelligence analyst
Different types of relevance

Topical, situational, etc.
K3: Domain

Why is the sky blue?
“To really learn something, you basically have to already know it.”

Users bring a tremendous amount of knowledge
to bear when asking questions


Specialized, technical knowledge
Commonsense
K4 … Kn?

More types of knowledge needed?

Working hypothesis:

{K1, K2, K3} comprise a necessary set
Introductions
Dr. Dr. Dr. Dina Demner-Fushman, M.D., Ph.D., Ph.D.
Why the Medical Domain?

Evidence-Based Medicine



= A paradigm of medical practice that emphasizes
decision-support from high-quality clinical research
Provides a basis for K1, K2, and K3
Need for retrieval systems is well documented:
e.g., (Gorman et al., 1994; Chambliss and Conley, 1996;
Cogdill and Moore, 1997; Ely et al., 2005; Sutton et al., 2005)

Clinical QA:



“Ready-made” domain for exploring conceptual retrieval
Availability of corpora, resources, etc.
Important and potentially high-impact application
K1: Problem Structure

EBM identifies four components of a question


Originally developed as a clinical tool
Can serve as a knowledge representation
“In children with an acute febrile illness, what is the efficacy of
single-medication therapy with acetaminophen or ibuprofen in
reducing fever?”
Population/Problem: children / acute febrile illness
Intervention: acetaminophen
Comparison: ibuprofen
Outcome: reducing fever
= PICO frame
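A PICO frame maps naturally onto a small record type. The following is a minimal sketch only; the class name `PicoFrame` and its field names are illustrative assumptions, not the representation used in the system described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PicoFrame:
    """One clinical question expressed as the four EBM components (hypothetical type)."""
    population: str
    problem: str
    intervention: str
    comparison: Optional[str]  # not every question names a comparison
    outcome: str


# The example question from the slide, filled in by hand.
fever_question = PicoFrame(
    population="children",
    problem="acute febrile illness",
    intervention="acetaminophen",
    comparison="ibuprofen",
    outcome="reducing fever",
)
```

Keeping Population and Problem as separate fields is a design choice made for this sketch; the slide groups them under a single "Population/Problem" slot.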
K2: User Tasks

Clinical tasks
Therapy: Selecting effective treatments, taking into account other factors such as risk and cost
Diagnosis: Selecting and interpreting diagnostic tests, while considering factors such as precision and safety
Prognosis: Estimating the patient’s likely course over time and anticipating likely complications
Etiology: Identifying risk factors and the causes for a patient’s disease

Considerations for strength of evidence

Strength of Recommendations Taxonomy (SORT): three
evidence grades
K3: Domain

The Unified Medical Language System (UMLS)


2004 version: 1+ million biomedical concepts, > 5
million concept names
Software for leveraging this resource:

MetaMap, SemRep for identifying concepts, relations
[Figure: fragment of the UMLS concept hierarchy. Node labels: Anti-infective agent; Antifungal; Antibacterial drugs; Disinfectants and cleansers; Quinolone; ofloxacin; Borate product; Mucous membrane antifungal agent; Ciclopirox; boric acid]
Re: Conceptual Retrieval
Question: In children with an acute febrile illness, what is the efficacy of single-medication therapy with acetaminophen or ibuprofen in reducing fever?
Question frame:
Task: therapy
P: children/acute febrile illness
I: acetaminophen
C: ibuprofen
O: reducing fever
Matching frame from a MEDLINE abstract:
P: children/acute febrile illness
I: acetaminophen
C: ibuprofen
O: reducing fever
Answer: Ibuprofen provided greater temperature decrement and longer duration of antipyresis than acetaminophen when the two drugs were administered in approximately equal doses.
MEDLINE: NLM’s authoritative repository of 17 million+ abstracts
System Architecture
Question (query frame) → Query Formulator → search query → PubMed → abstracts
abstracts → Knowledge Extractors → annotated abstracts
query frame + annotated abstracts → Semantic Matcher → scored citations
scored citations → Answer Generator → Answers
Test Collection

Manually gathered 50 clinical questions from
FPIN and the Parkhurst Exchange


Reflects distribution of real-world questions
Divided into development and test collections
Task        Questions  Example
Therapy     22         Does quinine reduce leg cramps for young athletes?
Diagnosis   12         How often is coughing the presenting complaint in patients with gastroesophageal reflux disease?
Prognosis   6          What’s the prognosis of lupoid sclerosis?
Etiology    10         What are the causes of hypomagnesemia?
Total       50
Gathering Judgments

Manually formulated PubMed queries

~40 minutes per question; gathered top 50 hits
Question: What is the best treatment for analgesic rebound
headaches?
PubMed Query: (((“analgesics”[TIAB] NOT Medline[SB]) OR
“analgesics”[MeSH Terms] OR “analgesics”[Pharmacological Action]
OR analgesic[TextWord]) AND ((“headache”[TIAB] NOT Medline[SB])
OR “headache”[MeSH Terms] OR headaches[TextWord]) AND
(“adverse effects”[Subheading] OR side effects[Text Word])) AND
hasabstract[text] AND English[Lang] AND “humans”[MeSH Terms]

Manually evaluated all retrieved citations

~2 hours per question
Knowledge Extraction Example
Antipyretic efficacy of ibuprofen vs acetaminophen.
OBJECTIVE--To compare the antipyretic efficacy of ibuprofen, placebo, and acetaminophen. DESIGN--Double-dummy, double-blind, randomized, placebo-controlled trial. SETTING--Emergency department
and inpatient units of a large, metropolitan, university-based, children's hospital in Michigan.
PARTICIPANTS--37 otherwise healthy children aged 2 to 12 years with acute, intercurrent, febrile
illness. INTERVENTIONS--Each child was randomly assigned to receive a single dose of
acetaminophen (10 mg/kg), ibuprofen (7.5 or 10 mg/kg), or placebo. MEASUREMENTS/MAIN
RESULTS--Oral temperature was measured before dosing, 30 minutes after dosing, and hourly
thereafter for 8 hours after the dose. Patients were monitored for adverse effects during the study and
24 hours after administration of the assigned drug. All three active treatments produced significant
antipyresis compared with placebo. Ibuprofen provided greater temperature decrement and longer
duration of antipyresis than acetaminophen when the two drugs were administered in approximately
equal doses. No adverse effects were observed in any treatment group. CONCLUSION--Ibuprofen is a
potent antipyretic agent and is a safe alternative for the selected febrile child who may benefit from
antipyretic medication but who either cannot take or does not achieve satisfactory antipyresis with
acetaminophen.
Am J Dis Child. 1992 May; 146(5):622-5
[Annotations highlighted in the abstract: Population, Problem, Interventions, Outcome]
Knowledge Extractors

Population, Problem, Intervention: IE task



Exploited coverage of medical concepts in UMLS
Additional candidate ranking based on a few features
Outcome: sentence-level classification task


“Kitchen sink approach”, ensemble of classifiers
Features:
• Manually-defined cue words
• N-grams
• Position in abstract
• Presence of certain UMLS concepts
• …
Semantics helps!
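To make the idea of combining several weak signals concrete, here is a toy sketch of an outcome-sentence scorer. Everything in it is an assumption for illustration: the cue phrases, the position heuristic (that outcomes tend to sit late in an abstract), and the unweighted average are invented here and are not the classifiers or features actually used.

```python
def cue_word_score(sentence, cues=("significant", "greater", "compared with")):
    """Fraction of the (hypothetical) cue phrases that occur in the sentence."""
    text = sentence.lower()
    return sum(cue in text for cue in cues) / len(cues)


def position_score(index, total):
    """Score rises toward the end of the abstract (an assumed heuristic)."""
    return (index + 1) / total


def ensemble_score(sentence, index, total):
    """Unweighted average of the individual scorers."""
    scores = [cue_word_score(sentence), position_score(index, total)]
    return sum(scores) / len(scores)
```

A real ensemble would typically learn the component classifiers and their weights from annotated abstracts rather than fix them by hand.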
Knowledge Extractors
Antipyretic efficacy of ibuprofen vs acetaminophen.
OBJECTIVE--To compare the antipyretic efficacy of ibuprofen, placebo, and acetaminophen.
DESIGN--Double-dummy, double-blind, randomized, placebo-controlled trial. SETTING--Emergency department and inpatient units of a large, metropolitan, university-based,
children's hospital in Michigan. PARTICIPANTS--37 otherwise healthy children aged 2 to 12
years with acute, intercurrent, febrile illness. INTERVENTIONS--Each child was randomly
assigned to receive a single dose of acetaminophen (10 mg/kg), ibuprofen (7.5 or 10 mg/kg),
or placebo. MEASUREMENTS/MAIN RESULTS--Oral temperature was measured before
dosing, 30 minutes after dosing, and hourly thereafter for 8 hours after the dose. Patients
were monitored for adverse effects during the study and 24 hours after administration of the
assigned drug. All three active treatments produced significant antipyresis compared with
placebo. Ibuprofen provided greater temperature decrement and longer duration of
antipyresis than acetaminophen when the two drugs were administered in approximately
equal doses. No adverse effects were observed in any treatment group. CONCLUSION--Ibuprofen is a potent antipyretic agent and is a safe alternative for the selected febrile child
who may benefit from antipyretic medication but who either cannot take or does not achieve
satisfactory antipyresis with acetaminophen.
Am J Dis Child. 1992 May; 146(5):622-5
Problem
Population
Intervention
Outcome
?
?
?
?
90% 5% 5%
80% 13% 7%
80% 0% 20%
95% 0% 5%
Details: Dina Demner-Fushman and Jimmy Lin. Answering
Clinical Questions with Knowledge-Based and Statistical
Techniques. Computational Linguistics, 33(1):63-103, 2007
Semantic Matching

Three score components:
S_EBM = S_PICO + S_SoE + S_MeSH
S_PICO: Matching PICO frame elements (Problem Structure)
S_SoE: Strength of evidence considerations (User Tasks)
S_MeSH: MeSH indicators for each clinical task (User Tasks)
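The additive score can be sketched in a few lines. This is a hedged illustration only: the slide gives the sum but not how each component is computed, so the component values below are treated as given numbers, and the names `ComponentScores`, `ebm_score` and `rerank` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ComponentScores:
    pico: float  # S_PICO: match between question frame and abstract frame
    soe: float   # S_SoE: strength-of-evidence considerations
    mesh: float  # S_MeSH: MeSH indicators for the clinical task


def ebm_score(c: ComponentScores) -> float:
    """Unweighted sum, as on the slide: S_EBM = S_PICO + S_SoE + S_MeSH."""
    return c.pico + c.soe + c.mesh


def rerank(citations: Dict[str, ComponentScores]) -> List[str]:
    """Return citation ids sorted by descending S_EBM."""
    return sorted(citations, key=lambda cid: ebm_score(citations[cid]), reverse=True)
```

For example, `rerank({"a": ComponentScores(0.2, 0.1, 0.0), "b": ComponentScores(0.5, 0.3, 0.1)})` places `"b"` ahead of `"a"`.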
Details: Dina Demner-Fushman and Jimmy Lin. Answering
Clinical Questions with Knowledge-Based and Statistical
Techniques. Computational Linguistics, 33(1):63-103, 2007
Semantic Matching: Evaluation

Research Questions




Does it work?
What are the relative contributions of each component?
What is the interaction between knowledge-based and
statistical techniques?
Approach


Reranking experiments with test collection
Ablation studies
Evaluation: Abstract Reranking
Question: What is the best treatment for analgesic rebound headaches?
(((“analgesics”[TIAB] NOT Medline[SB]) OR “analgesics”[MeSH
Terms] OR “analgesics”[Pharmacological Action] OR
analgesic[TextWord]) AND ((“headache”[TIAB] NOT Medline[SB])
OR “headache”[MeSH Terms] OR headaches[TextWord]) AND
(“adverse effects”[Subheading] OR side effects[Text Word])) AND
hasabstract[text] AND English[Lang] AND “humans”[MeSH Terms]
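[Figure: abstracts retrieved from MEDLINE pass through the Knowledge Extractor (P, I, C, O) and are reranked by the Semantic Matcher against the question’s clinical task and PICO frame]
vs. original PubMed ordering
vs. Indri baseline (state-of-the-art LM)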
Results: Complete Model

Performance on held-out blind test set:
Precision at 10 (P10)
          Therapy       Diagnosis     Prognosis     Etiology      All
PubMed    .350 (–39%)   .150 (–70%)   .200 (–46%)   .320 (–20%)   .281 (–44%)
Indri     .575          .500          .367          .400          .500
EBM       .783 (+36%)   .583 (+17%)   .467 (+27%)   .660 (+65%)   .677 (+35%)

Mean Average Precision (MAP)
          Therapy       Diagnosis     Prognosis     Etiology      All
PubMed    .421 (–29%)   .279 (–48%)   .235 (–56%)   .364 (–17%)   .356 (–35%)
Indri     .595          .534          .533          .439          .544
EBM       .765 (+29%)   .637 (+19%)   .722 (+35%)   .701 (+60%)   .718 (+32%)
Results are statistically significant
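For readers unfamiliar with the two metrics, the sketch below gives their textbook definitions. It is not the evaluation code behind these numbers, and the talk does not say which exact variant of average precision was used; here it is normalised by the total number of relevant documents.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k results that are relevant."""
    top = ranked_ids[:k]
    return sum(1 for doc in top if doc in relevant_ids) / k


def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per question."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

With the ranking `["d1", "d2", "d3", "d4"]` and relevant set `{"d1", "d3"}`, average precision is (1/1 + 2/3) / 2, and precision at 2 is 0.5.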
Details: Jimmy Lin and Dina Demner-Fushman. The Role of
Knowledge in Conceptual Retrieval: A Study in the Domain of
Clinical Medicine. SIGIR 2006.
Results: Parameter Settings

Tuning each component
S_EBM = λ1 S_PICO + λ2 S_SoE + (1 - λ1 - λ2) S_MeSH


No statistically significant difference
Combining EBM + Indri
S_EBM+Indri = λ S_EBM + (1 - λ) S_Indri

Better performance, but not statistically significant
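Both formulas are convex combinations of component scores. The sketch below writes them out and adds a simple grid sweep over λ; the sweep is an assumption for illustration, since the slides do not say how the weights were tuned, and the `evaluate` callback is a hypothetical stand-in for "run the reranker with this λ and return MAP".

```python
def combine_ebm(lam1, lam2, s_pico, s_soe, s_mesh):
    """Weighted EBM score: lam1*S_PICO + lam2*S_SoE + (1 - lam1 - lam2)*S_MeSH."""
    if lam1 < 0 or lam2 < 0 or lam1 + lam2 > 1:
        raise ValueError("weights must be non-negative and sum to at most 1")
    return lam1 * s_pico + lam2 * s_soe + (1 - lam1 - lam2) * s_mesh


def interpolate(lam, s_a, s_b):
    """Two-way interpolation: lam*s_a + (1 - lam)*s_b."""
    return lam * s_a + (1 - lam) * s_b


def sweep(evaluate, steps=10):
    """Return (best_lambda, best_value) over an evenly spaced grid on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(((lam, evaluate(lam)) for lam in grid), key=lambda pair: pair[1])
```

In practice the sweep would be run on the development questions only, so that the held-out test set stays blind.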
Details: Jimmy Lin and Dina Demner-Fushman. The Role of
Knowledge in Conceptual Retrieval: A Study in the Domain of
Clinical Medicine. SIGIR 2006.
Results: Contributions

What’s the contribution of each EBM facet?
                   MAP    vs. EBM   vs. Indri
S_PICO             .646   –10%**    +19%*       (Problem Structure)
S_SoE + S_MeSH     .538   –25%**    –1%         (User Tasks)

                   P10    vs. EBM   vs. Indri
S_PICO             .627   –7%       +25%**      (Problem Structure)
S_SoE + S_MeSH     .485   –28%**    –3%         (User Tasks)
** = sig. at 99%, * = sig. at 95%

What types of knowledge are important?


Problem structure (K1) helps a lot
User tasks (K2) help, but not as much
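The "vs. EBM" and "vs. Indri" columns are relative changes against the full EBM model and the Indri baseline. A short worked check, using only MAP values reported on these slides (the function name is illustrative):

```python
def relative_change(value, baseline):
    """Percentage change of value with respect to baseline."""
    return (value - baseline) / baseline * 100.0


# MAP figures taken from the slides.
map_ebm, map_indri, map_pico = 0.718, 0.544, 0.646

print(round(relative_change(map_pico, map_ebm)))    # -10
print(round(relative_change(map_pico, map_indri)))  # 19
```

Because the reported scores are themselves rounded to three decimals, recomputed percentages can differ from the slide by a point in some cells.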
Details: Jimmy Lin and Dina Demner-Fushman. The Role of
Knowledge in Conceptual Retrieval: A Study in the Domain of
Clinical Medicine. SIGIR 2006.
Results: Partial Models

Can we use limited knowledge to improve term-based methods?
Model                                                        λ     MAP             P10
S_Indri (Term Statistics)                                          .544            .500
λ S_Indri + (1 - λ) S_PICO (+ Problem Structure)             .46   .668 (+23%)**   .627 (+25%)**
λ S_Indri + (1 - λ)(.5 S_SoE + .5 S_MeSH) (+ User Tasks)     .55   .620 (+14%)**   .565 (+13%)*
** = sig. at 99%, * = sig. at 95%

Any knowledge helps!
Details: Jimmy Lin and Dina Demner-Fushman. The Role of
Knowledge in Conceptual Retrieval: A Study in the Domain of
Clinical Medicine. SIGIR 2006.
Answer Generation

Physicians are most interested in outcomes

Approach: identify outcome sentences

Generate an answer from each citation: abstract title
and three highest scoring outcome sentences
Question: Does combining aspirin and warfarin decrease the risk of stroke for
patients with nonvalvular atrial fibrillation?
Answer: Prevention of thromboembolic events in atrial fibrillation: The results from
the SPAF III study demonstrated that a combination of mini-intensity warfarin plus
aspirin was insufficient for stroke prevention in atrial fibrillation. Other trials now
indicate, that oral anticoagulation at INR-values below 2.0 is not effective for stroke
prevention in these patients. The present clinical challenge is to ensure effective
and safe oral anticoagulation to patients with atrial fibrillation at high risk of stroke.
[Annotations on the answer: abstract title, outcome 1, outcome 2, outcome 3]
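The answer-generation step described above (title plus the three highest-scoring outcome sentences) can be sketched as follows. The outcome scores are assumed to come from an upstream classifier, and presenting the chosen sentences in their original abstract order is an assumption of this sketch rather than something the slide states.

```python
def generate_answer(title, scored_sentences, n=3):
    """Build an answer from a citation.

    scored_sentences: list of (sentence, outcome_score) pairs in abstract order.
    """
    by_score = sorted(
        range(len(scored_sentences)),
        key=lambda i: scored_sentences[i][1],
        reverse=True,
    )[:n]
    # Restore abstract order so the answer reads naturally (assumed behaviour).
    chosen = [scored_sentences[i][0] for i in sorted(by_score)]
    return title + ": " + " ".join(chosen)
```

Abstracts with fewer than three sentences simply yield a shorter answer.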
Evidence Synthesis

Integrate findings from multiple citations
Question: What is the best treatment for chronic prostatitis?
► anti-microbial
[temafloxacin] Treatment of chronic bacterial prostatitis with temafloxacin.
Temafloxacin 400 mg b.i.d. administered orally for 28 days represents a safe
and effective treatment for chronic bacterial prostatitis.
[ofloxacin] Ofloxacin in the management of complicated urinary tract infections,
including prostatitis. In chronic bacterial prostatitis, results to date suggest that
ofloxacin may be more effective clinically and as effective microbiologically as
carbenicillin.
...
► Alpha-adrenergic blocking agent
[terazosine] Terazosin therapy for chronic prostatitis/chronic pelvic pain
syndrome: a randomized, placebo controlled trial. CONCLUSIONS: Terazosin
proved superior to placebo for patients with chronic prostatitis/chronic pelvic
pain syndrome who had not received alpha-blockers previously.
...
Semantic Clustering
[Figure: relevant citations → Answer Extraction → Semantic Clustering (Cluster 1, Cluster 2, Cluster 3) → Interactive Presentation]
Evaluation: Evidence Synthesis

What is the best treatment of X?

Compare



Top three answers from PubMed
First answer in three largest semantic clusters
Evaluation by a physician:
                      “Good”   “Okay”   “Bad”
PubMed                0.600    0.227    0.173
Semantic Clustering   0.827    0.133    0.040
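The clustering condition in this comparison, "first answer in three largest semantic clusters", amounts to a small selection routine. The sketch below assumes each answer already carries a cluster label (for instance an intervention category) and arrives in ranked order; how the clusters are formed is outside its scope, and the function name is hypothetical.

```python
from collections import defaultdict


def first_answers_from_largest_clusters(ranked_answers, n_clusters=3):
    """Pick the top-ranked answer from each of the largest clusters.

    ranked_answers: list of (cluster_label, answer_text) pairs in ranked order.
    Ties in cluster size are broken by which cluster appears first in the ranking.
    """
    clusters = defaultdict(list)
    for label, answer in ranked_answers:
        clusters[label].append(answer)
    largest = sorted(clusters, key=lambda label: len(clusters[label]), reverse=True)
    return [(label, clusters[label][0]) for label in largest[:n_clusters]]
```

The PubMed condition, by contrast, is just the first three items of the ranked list, which can all come from the same cluster.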
Details: Dina Demner-Fushman and Jimmy Lin. Answer
Extraction, Semantic Clustering, and Extractive Summarization for
Clinical Question Answering. ACL 2006.
Findings

K1 + K2 + K3 → “conceptual retrieval”

Knowledge helps a lot!

But here’s the catch:



Limited domain: “narrow but deep”
Dependent on availability of existing resources
Beyond “bag of words”:




Develop a general framework
Instantiate in domain-specific applications
Leverage lessons learned to refine the framework
Rinse, repeat
Re: Re: Conceptual Retrieval
Question: In children with an acute febrile illness, what is the efficacy of single-medication therapy with acetaminophen or ibuprofen in reducing fever?
Question frame (each slot is a facet):
Task: therapy
P: children/acute febrile illness
I: acetaminophen
C: ibuprofen
O: reducing fever
MEDLINE: NLM’s authoritative repository of 17 million+ abstracts
Answer: Ibuprofen provided greater temperature decrement and longer duration of antipyresis than acetaminophen when the two drugs were administered in approximately equal doses.
= faceted query!
Conceptual Retrieval

“Building blocks” strategy in library science



Decompose information need into conceptual facets
Identify terms that represent those facets
Instantiate in a structured query
(A1 OR A2 …) AND (B1 OR B2 …) AND (C1 OR C2 …) AND (D1 OR D2 …) …
where the four groups correspond to P, I, C, and O
EBM-based retrieval is a specific case of facet analysis and structured querying!
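The building-blocks strategy translates directly into a query builder: OR together the terms within a facet, then AND the facets. In this minimal sketch the function name is hypothetical, and the synonyms "child" and "paracetamol" are added here purely to show a multi-term facet; they do not come from the slides.

```python
def build_faceted_query(facets):
    """OR the terms inside each facet, then AND the facets together."""
    groups = []
    for terms in facets:
        # Quote multi-word terms so they are treated as phrases.
        quoted = ['"%s"' % t if " " in t else t for t in terms if t]
        if quoted:
            groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


pico_facets = [
    ["children", "child"],             # P (second term is an illustrative synonym)
    ["acetaminophen", "paracetamol"],  # I (second term is an illustrative synonym)
    ["ibuprofen"],                     # C
    ["reducing fever"],                # O
]

print(build_faceted_query(pico_facets))
# (children OR child) AND (acetaminophen OR paracetamol) AND (ibuprofen) AND ("reducing fever")
```

Empty facets are skipped, so a question without a comparison still yields a well-formed query.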
A General Framework?

For a domain
1. Identify prototypical information needs
2. Develop a frame-based representation
3. Build extractor for frame elements
4. Instantiate semantic matcher
5. Watch performance go up!
The subject of ongoing work…
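One way to picture the domain-independent part of this recipe is a retrieval loop parameterised by a frame extractor and a semantic matcher. This is a speculative sketch: the `Frame` alias, the function name, and the two callbacks are all hypothetical, and a real instantiation would supply domain-specific implementations of both.

```python
from typing import Callable, Dict, List, Tuple

Frame = Dict[str, str]  # slot name -> filler, e.g. {"intervention": "ibuprofen"}


def conceptual_retrieve(
    question_frame: Frame,
    documents: Dict[str, str],
    extract: Callable[[str], Frame],
    match: Callable[[Frame, Frame], float],
) -> List[Tuple[str, float]]:
    """Extract a frame from each document, score it against the question, rank."""
    scored = [
        (doc_id, match(question_frame, extract(text)))
        for doc_id, text in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Swapping in a different frame definition, extractor and matcher is then all that changes from one domain to the next, which is the sense in which the clinical system is one instance of the framework.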
What comes next?

Retrieval in the biomedical domain
Information describing the role(s) of a [gene] involved in a [disease].
gene: Interferon-beta
disease: Multiple Sclerosis
Information describing the role of a [gene] in a specific [biological process].
gene: nucleoside diphosphate kinase (NM23)
biological process: tumor progression

Complex question answering
What evidence is there for transport of [art looted by the Nazis in WWII]
from [Germany] to [France]?
What [familial ties] exist between [Neanderthals] and [humans]?
What [common interests] exist between [Network Solutions] and [the
Internet Corporation for Assigned Names and Numbers (ICANN)]?
Acknowledgments

Dina Demner-Fushman (Ph.D., 2006)

This work was funded in part by NLM
References
Dina Demner-Fushman and Jimmy Lin. Answering Clinical Questions with Knowledge-Based and Statistical Techniques. Computational Linguistics, 33(1):63-103, 2007.
Jimmy Lin and Dina Demner-Fushman. The Role of Knowledge in Conceptual
Retrieval: A Study in the Domain of Clinical Medicine. Proceedings of the 29th Annual
International ACM SIGIR Conference on Research and Development in Information
Retrieval (SIGIR 2006), 2006, pp. 99-106.
Dina Demner-Fushman and Jimmy Lin. Answer Extraction, Semantic Clustering, and
Extractive Summarization for Clinical Question Answering. Proceedings of the 21st
International Conference on Computational Linguistics and 44th Annual Meeting of the
Association for Computational Linguistics (COLING/ACL 2006), 2006, pp. 841-848.