Understanding User Intents in Online Health Forums

Thomas Zhang, Jason H.D. Cho, Chengxiang Zhai
Department of Computer Science
University of Illinois at Urbana-Champaign
5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
Newport Beach, California
22nd September 2014
1
Online Health Forums
• Purpose: To provide a convenient platform to facilitate
discussion among patients and professionals
• Huge user base, and still growing!
• In 2011, 80% of all web users searched for health
information online, of whom 6% participated in
health-related discussions
• Forums contain valuable information
– Contain rich, often first hand experiences
2
Deficiencies of Forums
• Threads are scattered
• Similar questions are asked again and again
• Keyword search is inadequate
– Finding several keyword matches in a thread does
not necessarily mean that the thread is relevant
3
[Figure: two example forum threads]
• Post about cholinergic urticaria in April 2004: received 3rd and final reply a week later
• Post from March 2012: no replies as of July 2014
4
4
Applications of Intents
• Improving thread retrieval
– e.g. A thread whose original post matches both the
keywords and the intent specified by the user is
more likely to be helpful
• Filtering threads
– e.g. To treat a condition, only look at posts asking
about treatment
• Understanding user behavior in forums
– i.e. users of different forums have different intents
5
This Paper
• Introduces problem of identifying user intents in
health forums as a classification problem
• Derives the first taxonomy of user intents
• Designs a set of novel features for use with
machine learning to solve the problem
• Creates the first dataset for evaluation and
conducts experiments to obtain empirical
findings
6
Roadmap
1. Problem formulation
2. Intent taxonomy derivation
3. Methodology
– Support vector machines
– Hierarchical classification
– Feature design
4. Evaluation
– Dataset
– Experiments
– Results
5. Intents in MedHelp forums
6. Wrap-up
7
Problem Formulation
Given $O$, an original thread post from our dataset
$D$, with intent $c_i$ from a taxonomy of user intents
$C = \{c_1, \ldots, c_k\}$. Denote $S = \{s_1, \ldots, s_n\}$ as the
sentence representation of $O$.
Classify $O$ as some $c_j \in C$ using $S$ as evidence.
$O$ is correctly classified if and only if $j = i$.
8
Taxonomy Derivation
• No taxonomy exists for health forum intents
• Solution: Create our own!
• First, reduce the ten generic questions most commonly
asked by doctors (Ely et al, 2000) to three intent
classes
– Classes match the intents of users who search for health
information online (Choudhury et al, 2014)
• Next introduce two additional intent classes that are
specific to health forum posts
9
Taxonomy
• Manage: How should I manage or treat condition X?
• Cause: What is the cause of symptom/physical/test
finding X?
• Adverse: Can drug or treatment X cause adverse
finding Y?
• Combo: Combination (at least two of first three)
• Story: Story telling, news, sharing or asking about
experience, soliciting support, or others
10
Where are we?
1. Problem formulation
2. Intent taxonomy derivation
3. Methodology
– Support vector machines
– Feature Selection
– Hierarchical classification
4. Evaluation
– Dataset
– Experiments
– Results
5. Intents in MedHelp forums
6. Wrap-up
11
Support Vector Machines (SVM)
• Main idea: Learn a
hyperplane from examples
to separate them into two
classes
• Use learned hyperplane to
classify unseen examples
• Capable of non-linear and
multiclass classification
• Shown to have good
performance on high
dimensional data
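The hyperplane idea on this slide can be sketched from scratch. The block below is only an illustration of the linear, two-class case, trained by subgradient descent on the regularized hinge loss; the function names, the toy data, and the hyperparameters (`epochs`, `lr`, `reg`) are assumptions made for the example, and the slides do not say which SVM implementation or kernel was actually used.

```python
def train_linear_svm(examples, labels, epochs=200, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularized hinge loss; labels are +1 or -1."""
    dim = len(examples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularization term shrinks the weights at every step.
            w = [wi * (1 - lr * reg) for wi in w]
            if margin < 1:
                # The example violates the margin: move the hyperplane toward it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Classify x by the side of the learned hyperplane it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data (made up for illustration).
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -2], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

Non-linear and multiclass variants, which the slide mentions, would need a kernel and a one-vs-rest or similar scheme on top of this.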
12
Post Representation
• How should we represent posts?
– SVMs require examples to be represented as a
vector of features
• What are features?
– Some measurable property of the observed data
• How should we select them?
13
Feature Selection
A good feature should be:
1. Generic enough to be found in many posts
2. Sufficiently discriminative for different
intents
14
Solution: Patterns!
• Sequence of (possibly non-contiguous) tokens
that represent recurring text patterns in
sentences
• Very generic
– Lowercasing, stemming
– POS tagging
– UMLS semantic group tagging
• Very discriminative
– “What could X be…?” signifies Cause intent, but
“What does X do…?” signifies Manage intent
15
Pattern Types
Each pattern falls under one of four types:
• LSP: Lowercased + stemmed tokens only
– E.g. “…what can caus…”
• POSP: LSP + POS tags
– E.g. “…how to <VERB>…”
• SGP: LSP + semantic group tags
– E.g. “…if <CHEM> works…”
• ALL: All types of tokens and tags
– E.g. “…<CHEM> make <PRP> feel…”
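A small sketch may help make the pattern idea concrete. It assumes a pattern is an ordered tuple of tokens that must occur in that order within a normalized sentence, with gaps allowed. The `crude_stem` helper and the tiny `TAGS` lookup are hypothetical stand-ins for a real stemmer, a POS tagger, and MetaMap; this is not the authors' implementation.

```python
import re

# Hypothetical tag lookup standing in for a POS tagger and MetaMap.
TAGS = {"benadryl": "<CHEM>", "me": "<PRP>", "i": "<PRP>"}


def crude_stem(token):
    """Very rough stand-in for a real stemmer: strip one common suffix."""
    for suffix in ("ing", "es", "ed", "e", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(sentence):
    """Lowercase, tokenize, then replace each token by its tag or its stem."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [TAGS.get(t, crude_stem(t)) for t in tokens]


def matches(pattern, tokens):
    """True if the pattern tokens occur in order in tokens, gaps allowed."""
    remaining = iter(tokens)
    return all(p in remaining for p in pattern)


def pattern_features(sentences, patterns):
    """Binary vector: 1 if any sentence of the post matches the pattern."""
    normalized = [normalize(s) for s in sentences]
    return [int(any(matches(p, toks) for toks in normalized)) for p in patterns]


# Patterns are written in the stemmed form this toy stemmer produces.
patterns = [("what", "can", "caus"), ("<CHEM>", "mak", "<PRP>", "feel")]
post = ["Benadryl always makes me feel drowsy.", "What can possibly cause this?"]
features = pattern_features(post, patterns)
```

The gap tolerance is what lets "what can possibly cause" still match the "what can caus" pattern.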
16
UMLS Semantic Groups
• MetaMap labels
text phrases with
semantic group
labels from the
UMLS
Metathesaurus
17
Caveat
• Patterns possess limitations
– Difficult to achieve good coverage without
sacrificing discriminative properties
– Impossible to extract for posts with large content
variations (e.g. Story posts)
• However, we still want complete coverage of
our dataset!
18
Solution: Hierarchical Classification!
• Two cascading SVM classifiers
– The first uses binary pattern
features (Pattern SVM)
– The second uses unigram
features with TF-IDF weighting
(Word SVM)
• Complete coverage allows
comparison with unigram
baseline
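For the unigram features of the Word SVM, a minimal TF-IDF sketch is shown here. The slides do not specify the weighting variant, so the formula used (raw term frequency times log(N / document frequency)), the whitespace tokenizer, and the name `tfidf_vectors` are all assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(posts):
    """Return one {term: weight} dict per post, using tf * log(N / df)."""
    tokenized = [post.lower().split() for post in posts]
    n_posts = len(tokenized)
    # Document frequency: number of posts that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_posts / df[t]) for t in tf})
    return vectors


# Toy posts, made up for illustration.
vectors = tfidf_vectors(["rash after new drug", "new rash rash"])
```

Terms that occur in every post, such as "rash" here, get weight zero under this variant, which is the intended down-weighting of uninformative words.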
[Flowchart: Input Post → Match ≥ 1 pattern? → Yes: Pattern SVM / No: Word SVM → Output Class]
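The routing in this cascade can be sketched as follows. The two classifiers are passed in as plain callables; `toy_pattern_clf` and `toy_word_clf` are hypothetical stand-ins for the trained Pattern SVM and Word SVM, and the two patterns are invented for the example.

```python
def contains_in_order(pattern, sentence):
    """True if the pattern tokens appear in order in the sentence, gaps allowed."""
    tokens = iter(sentence.lower().split())
    return all(p in tokens for p in pattern)


def classify_post(sentences, patterns, pattern_clf, word_clf):
    """Use the pattern classifier if at least one pattern matches, else the word classifier."""
    vec = [int(any(contains_in_order(p, s) for s in sentences)) for p in patterns]
    if any(vec):
        return pattern_clf(vec)
    return word_clf(" ".join(sentences))


# Hypothetical stand-ins for the two trained SVMs.
def toy_pattern_clf(vec):
    return "Cause" if vec[0] else "Manage"


def toy_word_clf(text):
    return "Story"


PATTERNS = [("what", "cause"), ("how", "treat")]
first = classify_post(["What could cause this rash"], PATTERNS, toy_pattern_clf, toy_word_clf)
second = classify_post(["Just sharing my journey"], PATTERNS, toy_pattern_clf, toy_word_clf)
```

Because every post ends up in one of the two branches, the cascade always produces a label, which is what makes the comparison with a unigram-only baseline possible.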
19
Where are we?
1. Problem formulation
2. Intent taxonomy derivation
3. Methodology
– Support vector machines
– Hierarchical classification
– Feature design
4. Evaluation
– Dataset
– Experiments
– Results
5. Intents in MedHelp forums
6. Wrap-up
20
Dataset
• No labeled dataset exists, since this is a new
problem
• So we create our own!
– 1,192 original HealthBoards posts, evenly divided
among four topics: allergies, breast cancer,
depression, and heart disease
• Ideally want more posts, but labeling is expensive
• Why the four topics?
21
Dataset Labeling
• Labeling done by two CS students
– Substantial* agreement with medical students
(κ = 0.67)
– Substantial* agreement between themselves (κ =
0.665, 74.67% of labels match)
• Combo posts labeled by a third CS student
according to their underlying classes
– A Combo post is predicted correctly if a classifier
outputs one of its class labels
*Per Landis and Koch, 1977
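The agreement figures on this slide are kappa values. Assuming they are Cohen's kappa for two annotators (the slide does not name the variant), the statistic can be computed as in the sketch below. The second helper illustrates the Combo rule, under which a prediction counts as correct if it is any one of the post's underlying labels. The label lists are made up and are not the study's data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two annotators."""
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def is_correct(predicted, gold_labels):
    """Combo rule: correct if the prediction is one of the post's labels."""
    return predicted in gold_labels


annotator_1 = ["Manage", "Cause", "Story", "Story"]
annotator_2 = ["Manage", "Cause", "Cause", "Story"]
kappa = cohen_kappa(annotator_1, annotator_2)
combo_ok = is_correct("Cause", {"Cause", "Adverse"})
```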
22
Experiments
• What is the best performing set of patterns?
– Try different type combinations of patterns
• How does hierarchical compare with baseline?
– Five-fold cross validation (CV)
• Does performance suffer if we train on posts
from three topics and test on the fourth?
– Four-fold forum CV
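The four-fold forum CV amounts to holding out one topic at a time. A minimal sketch, assuming each post is stored as a `(topic, text)` pair (a representation chosen for the example, with placeholder post texts):

```python
def forum_cv_splits(posts):
    """Yield (held_out_topic, train, test), holding out each topic once."""
    topics = sorted({topic for topic, _ in posts})
    for held_out in topics:
        train = [p for p in posts if p[0] != held_out]
        test = [p for p in posts if p[0] == held_out]
        yield held_out, train, test


posts = [
    ("allergies", "post a"), ("breast cancer", "post b"),
    ("depression", "post c"), ("heart disease", "post d"),
    ("allergies", "post e"),
]
splits = list(forum_cv_splits(posts))
```

With four topics this gives four folds, and no post from the test topic ever appears in the training set of its fold.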
23
Selecting a Pattern Set
$$P = \frac{\mathrm{Cor.}}{\mathrm{Tot.}}, \quad R = \frac{\mathrm{Cor.}}{M + C + |A|}, \quad F_1 = \frac{2PR}{P + R}$$
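The three measures on this slide can be computed from counts as sketched below. The reading of the symbols is an assumption, since the slide does not define them: `correct` stands for the number of correctly classified posts, `total_predicted` for the number of posts the pattern classifier labeled, and `total_relevant` for the recall denominator. The numbers in the example are invented and are not results from the paper.

```python
def precision_recall_f1(correct, total_predicted, total_relevant):
    """Precision, recall, and their harmonic mean F1, guarding against zero division."""
    p = correct / total_predicted if total_predicted else 0.0
    r = correct / total_relevant if total_relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Invented counts, for illustration only.
p, r, f1 = precision_recall_f1(correct=60, total_predicted=80, total_relevant=200)
```

A pattern set that labels few posts but labels them accurately shows up here as high `p` and low `r`, and F1 balances the two.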
24
CV Takeaways
• Patterns reach labeling agreement upper bound
• Patterns give high precision but low recall
– Why is this acceptable?
• Patterns generalize well across forum topics
• Overall improvement is underwhelming, why?
[Figures: Hierarchical Classification Performance; Word Classifier (Baseline) Performance]
25
Intents in MedHelp Forums
We applied our
Pattern SVM to
61,225 MedHelp
posts split across
allergies, breast
cancer, depression,
and heart disease
26
Concluding Remarks
• Introduced the new problem of forum post intent
analysis
• Designed the first taxonomy and dataset for
classification
• Proposed a novel set of pattern features for SVMs
• Showed empirically that patterns give high classification
precision while generalizing well across forums
27
Future Work
• Administer study of health forum user intents
• Expand pattern feature set to improve recall
• Handle classification of Story posts
• Identify all intents from Combo posts
• Further evaluation with larger datasets
28
Thank you!
Questions? Comments?
29