Resolving Healthcare Forum
Posts via Similar Thread Retrieval
Jason H.D. Cho¹,², Parikshit Sondhi¹, Chengxiang Zhai¹, Bruce R. Schatz¹,²,³
¹Department of Computer Science,
²Institute of Genomic Biology,
³Department of Medical Information Science,
University of Illinois at Urbana-Champaign, Urbana, IL
Motivation
• 72% of internet users looked online for health information within the past year
• 18% of internet users have gone online to find others who might have health concerns similar to theirs
• Improving health information retrieval and similar case retrieval will improve search quality for the vast majority of users
• Not many posts are answered in a timely manner!
* Pew Research http://www.pewinternet.org/
Envisioned Response
The following threads discuss similar problems:
• Doritos Allergy Very Severe and New
• Certain Foods + Beer = Flushing and Head Pounding…Help!
• Peanut/Food Allergies
……………………
Case Retrieval Task
• Traditionally defined as retrieving relevant cases doctors may be interested in
  • Doctors may want to compare cases that are similar to the current patient
• In the online domain, we define this as retrieving forum posts written by patients
• We tackled cases where we do not know the user's background
Query Characteristics
• Queries are meant for human experts, not automated systems
• Simple, non-technical language
• Presence of emotional statements
Document Characteristics
Our Goal
• How can we improve the case retrieval search task?
  • How should we represent queries?
  • Entity-based search, or context-based search?
  • Which posts are most informative in a given thread?
  • Can we utilize forum categories?
Evaluation via Pooling
• 350K threads and 20 queries from HealthBoards
• 2 judges first judged 100 query-thread pairs
  • 88% agreement (κ=0.76)
• 730 total judged query-thread pairs
  • 324 relevant
  • 406 irrelevant
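The agreement figure above pairs raw agreement with Cohen's kappa, which discounts the agreement two judges would reach by chance. A minimal sketch of that computation follows, assuming binary relevant/irrelevant labels. The judge labels are made-up illustration data, not the study's judgments.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges who labeled the same items.

    Undefined (division by zero) when chance agreement is exactly 1.
    """
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each judge's label frequencies, summed over labels.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels for ten query-thread pairs (1 = relevant, 0 = irrelevant).
judge_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
judge_2 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
kappa = cohens_kappa(judge_1, judge_2)
```

With these toy labels the judges agree on 8 of 10 pairs while chance agreement is 0.5, so kappa comes out well below the raw agreement rate.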
Method Summary
• Baseline weighting
  • First Post BM-25
  • Thread BM-25
  Q: How should we represent queries?
• Semantic weighting
  • Medical term extraction
  • Shallow Information Extraction
• Post weighting
  • Monotonic weighting
  • Parabolic weighting
• Forum Category weighting
  • Uniform weighting (FCUW)
  • Feedback weighting (FCFW)
State of the Art Baseline
• Baseline BM-25 formula: [equation not recovered]
• c(w,t): Count of word w in thread t
• c(w,q): Count of word w in query q
• FPBM-25: Consider only the content of the first post to represent the thread document
• TBM-25: Consider the content of the entire thread to represent the thread document
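The two baselines differ only in which text stands in for the thread. The sketch below illustrates that difference with a common Okapi BM25 variant; the slide's own formula was not recovered, so the exact form, the parameters `k1`, `b`, `k3`, the smoothed IDF, and the whitespace tokenizer are all assumptions, and the toy threads are invented.

```python
import math
from collections import Counter

def bm25_score(query_tokens, doc_tokens, doc_freq, n_docs, avg_len,
               k1=1.2, b=0.75, k3=1000.0):
    """One common Okapi BM25 variant (assumed form, not the slide's formula)."""
    c_q = Counter(query_tokens)  # c(w, q)
    c_t = Counter(doc_tokens)    # c(w, t)
    score = 0.0
    for w, qf in c_q.items():
        tf = c_t.get(w, 0)
        if tf == 0:
            continue
        df = doc_freq.get(w, 0)
        # Smoothed IDF (the +1 keeps it non-negative).
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        length_norm = 1 - b + b * len(doc_tokens) / avg_len
        doc_part = tf * (k1 + 1) / (tf + k1 * length_norm)
        query_part = qf * (k3 + 1) / (qf + k3)
        score += idf * doc_part * query_part
    return score

def thread_tokens(thread, first_post_only):
    """FPBM-25 uses only the first post; TBM-25 uses every post."""
    posts = thread[:1] if first_post_only else thread
    return [tok for post in posts for tok in post.lower().split()]

def rank(query, threads, first_post_only):
    """Return thread indices sorted by descending BM25 score."""
    docs = [thread_tokens(t, first_post_only) for t in threads]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(w for d in docs for w in set(d))
    q = query.lower().split()
    scores = [bm25_score(q, d, doc_freq, n_docs, avg_len) for d in docs]
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)

# Invented toy threads; each thread is a list of posts.
threads = [
    ["allergic reaction after eating chips", "try an antihistamine"],
    ["knee pain after running", "i get stomach cramps from chips too"],
    ["peanut allergy question", "see an allergist"],
]
query = "stomach cramps after eating chips"
fp_ranking = rank(query, threads, first_post_only=True)
thread_ranking = rank(query, threads, first_post_only=False)
```

In this toy collection the off-topic reply in the second thread pulls it to the top under the whole-thread representation, while the first-post representation keeps the thread that opened with a matching problem in first place.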
Results: Query Representation Comparison
Run  Method             P@5              Recall@30        MAP
B1   Baseline TBM-25    0.3000           0.2846           0.1977
B2   Baseline FPBM-25   0.4700 (56.6%)   0.4975 (74.8%)   0.3316 (67.7%)
Representing the first post as the query is better than using all of the posts
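The result tables report P@5, Recall@30, and MAP. A minimal sketch of these metrics under their textbook definitions follows; how the original evaluation handled unjudged threads in the pool is not stated on the slides, so that detail is an assumption here, and the ranking and relevance set are invented.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for position, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Invented example: thread IDs returned for one query and its relevant set.
ranked = ["t3", "t1", "t7", "t2", "t9"]
relevant = {"t1", "t2", "t4"}
p5 = precision_at_k(ranked, relevant, 5)
r5 = recall_at_k(ranked, relevant, 5)
ap = average_precision(ranked, relevant)
```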
Method Summary
• Semantic weighting
  • Medical term extraction
  • Shallow Information Extraction
Q: Which one works better? Entity-based search, or context-based search?
Medical Entity Extraction
• Applied ADEPT toolkit (MacLean and Heer 2013)
• High precision but low recall
MedicalEx: Relevance Scoring
[Equation not recovered. Annotated terms: modified query frequency; count of occurrences labeled as medical entity; count of occurrences not labeled as medical entity]
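The annotations above suggest a query term frequency that treats medical-entity occurrences differently from the rest. One way such a modified count could look is sketched below. The linear mix and the parameter `alpha` are assumptions, not the slide's formula, and the entity flags stand in for the output of an entity tagger.

```python
from collections import Counter

def modified_query_counts(tokens, is_medical, alpha=0.8):
    """Hypothetical modified query frequency.

    tokens: query tokens
    is_medical: parallel booleans, True where a tagger marked a medical entity
    alpha: assumed weight on medical-entity occurrences (1 - alpha on the rest)
    """
    med, other = Counter(), Counter()
    for tok, flag in zip(tokens, is_medical):
        (med if flag else other)[tok] += 1
    return {w: alpha * med[w] + (1 - alpha) * other[w]
            for w in set(med) | set(other)}

# Invented query with hand-assigned entity flags.
tokens = ["severe", "stomach", "cramps", "after", "chips", "cramps"]
flags = [False, True, True, False, False, True]
counts = modified_query_counts(tokens, flags)
```

The resulting counts would replace c(w,q) in the baseline scoring, so terms tagged as medical entities contribute more to the match.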
Shallow Information Extraction
I am severly allergic to some product that is found in
both Tostitos and Doritos, as well as random other
types of chips. I know the solution is "don't eat chips"
but what could the product be? I don't
want to accidentally consume it. When I eat this, I get
very bad stomach cramps and it ruins the rest of my
day/night - the only solution is to go to sleep so I can't
feel it. Help! Any ideas on this?
Physical Examination (PE): Disease, Symptoms
Medication (MED): Treatment, Prevention
Background (BKG): Neither PE nor MED
Sondhi, 2010
ShallowEx: Relevance Scoring
[Equation not recovered. Annotated terms: modified query count; word count in PE sentences; word count in MED sentences; word count in BKG sentences]
Give higher importance to PE and MED sentences
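The annotated terms point to a query count built from per-class word counts, with PE and MED sentences weighted above BKG. A minimal sketch under that reading follows. The weight values are hypothetical, and the sentence labels are assigned by hand here rather than by the classifier.

```python
from collections import Counter

# Hypothetical weights: PE and MED sentences count more than BKG sentences.
DEFAULT_WEIGHTS = {"PE": 1.0, "MED": 1.0, "BKG": 0.3}

def shallow_query_counts(labeled_sentences, weights=DEFAULT_WEIGHTS):
    """Weighted sum of word counts across PE, MED, and BKG sentences.

    labeled_sentences: list of (label, sentence) pairs, label in weights.
    """
    counts = Counter()
    for label, sentence in labeled_sentences:
        for tok in sentence.lower().split():
            counts[tok] += weights[label]
    return dict(counts)

# Sentences adapted from the example post; the labels are illustrative guesses.
labeled = [
    ("PE", "I get very bad stomach cramps"),
    ("BKG", "I know the solution is don't eat chips"),
    ("MED", "the only solution is to go to sleep"),
]
query_counts = shallow_query_counts(labeled)
```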
Results: Semantic Methods
Run  Method             P@5            Recall@30        MAP
B2   Baseline FPBM-25   0.4700         0.4975           0.3316
S1   B2+MedEx           0.4600         0.4283           0.2918
S2   B2+ShallowEx       0.53 (12.7%)   0.4847 (-2.5%)   0.3481 (4.9%)
Shallow extraction is better than medical entity extraction
Method Summary
• Post weighting
  • Monotonic weighting
  • Parabolic weighting
Q: Which posts are most informative in a given thread?
Post Weighting
Not all posts are equally representative (Sondhi, 2013)
$c'(w,t) = \sum_{i=1}^{K} f(i,K)\, c(w,p_i)$
Example for K = 3: $c'(w,t) = f(1,3)\,c(w,p_1) + f(2,3)\,c(w,p_2) + f(3,3)\,c(w,p_3)$
f(i, K): gives the weight of post i in a thread with K posts
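The weighted count can be sketched as a sum over posts with a pluggable weight function f(i, K). In the sketch below, `uniform_weight` reproduces plain thread counts, while `decaying_weight` is a hypothetical decreasing function chosen only for illustration. It is not the monotonic or parabolic function from the slides, whose formulas were not recovered.

```python
from collections import Counter

def uniform_weight(i, K):
    """Every post counts equally (equivalent to plain thread counts)."""
    return 1.0

def decaying_weight(i, K, m=2):
    """Hypothetical decreasing weight: 1 for the first post, smaller for later posts."""
    return ((K - i + 1) / K) ** m

def weighted_thread_counts(posts, f):
    """c'(w, t) = sum over i of f(i, K) * c(w, p_i), with posts indexed from 1."""
    K = len(posts)
    counts = Counter()
    for i, post in enumerate(posts, start=1):
        weight = f(i, K)
        for tok, c in Counter(post.lower().split()).items():
            counts[tok] += weight * c
    return dict(counts)

# Invented three-post thread.
posts = ["chips allergy", "allergy test", "chips"]
uniform_counts = weighted_thread_counts(posts, uniform_weight)
decayed_counts = weighted_thread_counts(posts, decaying_weight)
```

Swapping the weight function changes only how much later posts contribute, so the rest of the scoring can stay as in the baseline.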
Monotonic Post Weighting
[Figure: relative post weight versus post position i for K = 10, with curves for m = 1, m = 2, and m = 3]
Parabolic Post Weighting
Post Weighting Methods Evaluation
[Figure: accuracy (0.4 to 0.8) of Uniform, Monotonic, and Parabolic post weighting by forum used: FF, UF, LQ, and Cross Forum]
Results: Post Weighting
Run  Method             P@5             Recall@30       MAP
B2   Baseline FPBM-25   0.4700          0.4975          0.3316
P1   Monotonic          0.5100 (8.5%)   0.5240 (5.3%)   0.3631 (9.5%)
P2   Parabolic          0.5100 (8.5%)   0.5040          0.3494
Both post weighting schemes outperform the baseline
Method Summary
• Forum Category weighting
  • Uniform weighting (FCUW)
  • Feedback weighting (FCFW)
Q: Can we utilize forum categories?
Forum Categories
Forum Category Weighting
• Relevance feedback based on top k retrieved categories
  • Forum Category Uniform weighting (FCUW): randomly selecting a forum ID
  • Forum Category Feedback weighting (FCFW): ratio of the current forum ID amongst retrieved documents
Forum Category Weighting Scoring
[Equation not recovered. Annotated terms: new score; weights for forum category weighting; Forum Category Feedback weighting]
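The feedback weight is described as the ratio of a forum ID among the retrieved documents. The sketch below computes that ratio over the top k results and folds it back into the score. The multiplicative combination and the parameter `lam` are assumptions, since the slide's scoring equation was not recovered, and the scores and forum names are invented.

```python
from collections import Counter

def forum_feedback_weights(ranked, forum_of, k):
    """Ratio of each forum ID among the top-k retrieved threads."""
    top = ranked[:k]
    counts = Counter(forum_of[t] for t in top)
    return {forum: c / len(top) for forum, c in counts.items()}

def rescore(scores, forum_of, k=5, lam=0.5):
    """Hypothetical combination: boost each score by its forum's feedback weight."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    fw = forum_feedback_weights(ranked, forum_of, k)
    return {t: s * (1.0 + lam * fw.get(forum_of[t], 0.0))
            for t, s in scores.items()}

# Invented first-pass scores and forum assignments.
scores = {"t1": 3.0, "t2": 2.5, "t3": 2.0, "t4": 1.0}
forum_of = {"t1": "allergies", "t2": "allergies", "t3": "digestive", "t4": "allergies"}
weights = forum_feedback_weights(["t1", "t2", "t3", "t4"], forum_of, k=3)
new_scores = rescore(scores, forum_of, k=3)
```

Threads from the forum that dominates the first-pass results get the largest boost, which is the relevance feedback idea named on the slide.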
Results: Forum Category Weighting
Run  Method               P@5              Recall@30        MAP
B2   Baseline FPBM-25     0.4700           0.4975           0.3316
P1   Uniform weighting    0.5200 (10.6%)   0.4678 (-7.0%)   0.3334 (0.5%)
P2   Feedback weighting   0.5100 (8.5%)    0.4610 (-7.3%)   0.3389 (2.2%)
Uniform weighting and Feedback weighting perform similarly, but FCFW has fewer parameters to tune.
Results: Method Combinations
Run  Method                         P@5              Recall@30        MAP
B2   Baseline FPBM-25               0.4700           0.4975           0.3316
S2   Baseline FPBM-25 + ShallowEx   0.53             0.4847           0.3481
C2   Monotonic + ShallowEx          0.5400 (14.9%)   0.5354 (7.6%)    0.3745 (12.9%)
C3   Parabolic + ShallowEx          0.5100           0.5155           0.3573
C4   Monotonic + ShallowEx + FCFW   0.5200           0.5625 (13.1%)   0.3702
Monotonic + ShallowEx performs the best
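The parenthesized percentages in the result tables appear to be relative changes over the B2 baseline. A short check under that assumption, using the B2 and C2 values from the table above:

```python
def relative_change(value, baseline):
    """Percentage change of value relative to baseline."""
    return 100.0 * (value - baseline) / baseline

# Values copied from the Method Combinations table (runs B2 and C2).
baseline = {"P@5": 0.4700, "Recall@30": 0.4975, "MAP": 0.3316}
c2 = {"P@5": 0.5400, "Recall@30": 0.5354, "MAP": 0.3745}
changes = {metric: round(relative_change(c2[metric], baseline[metric]), 1)
           for metric in baseline}
```

The rounded values match the 14.9%, 7.6%, and 12.9% shown for run C2, which supports reading the percentages as improvement over FPBM-25.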
Conclusion
• Fairly high P@5 accuracy is achievable
• Treating the first post as the query performed better than using all posts in the thread
• Shallow information extraction is better for query understanding
  • Incorporates contextual information
• Utility of posts drops steadily with position
• Easy extension of the baseline method
Future Work
• Recommending relevant forum posts for doctors
  • Various online forums have an ‘ask a doctor’ section
  • Doctors will save time by recommending forum posts
• Intent-based case retrieval
  • Identifying intents for both the end user and the existing posts will improve search quality
  • Examples: cause of symptom, managing disease, adverse effects
Acknowledgements
• This work is supported in part by the National Science Foundation under Grant Number CNS-1027965. We would also like to thank the anonymous reviewers for their invaluable feedback, and the Institute of Genomic Biology for their computing resources.
Questions?
Thank you!
References
• J. H. D. Cho, V. Q. Liao, Y. Jiang, and B. Schatz. Aggregating Personal Health Messages for Scalable Comparative Effectiveness Research. ACM BCB, 2013.
• J. H. D. Cho, P. Sondhi, C. Zhai, and B. Schatz. Resolving Healthcare Forum Posts via Similar Thread Retrieval. ACM BCB, 2014.
• K. Pattabiraman, P. Sondhi, and C. Zhai. Exploiting Forum Thread Structures to Improve Thread Clustering. ICTIR, 2013.
• P. Sondhi, M. Gupta, C. Zhai, and J. Hockenmaier. Shallow Information Extraction from Medical Forum Data. COLING, 2010.
• B. W. Chee, R. Berlin, and B. Schatz. Predicting Adverse Drug Events from Personal Health Messages. AMIA, 2011.
• D. L. MacLean and J. Heer. Identifying medical terms in patient-authored text: a crowdsourcing-based approach. Journal of the American Medical Informatics Association, pages amiajnl–2012–001110+, May 2013.
Features & Performance of Shallow
Information Extraction Method
ShallowEx: Extraction Model
[Figure: performance results for different feature sets; percentage accuracy (60 to 76) of Order-1 CRF and SVM by feature set]
We use the best-performing SVM-based classifier
(Posts: 175, Sentences: 1494)