Syndromic Classification of Twitter
Messages
Nigel Collier and Son Doan
National Institute of Informatics, Tokyo
collier@nii.ac.jp, sondoan@gmail.com
E-HEALTH, November 2011
[Figure: health information sources plotted against two axes, Time and Certainty. Sources shown: rumours, sentinel networks, GP reports, field workers, laboratory reports.]
Blog rumour> “Ahh! Really bad throat.”
News report> “Influenza starts early this year.”
Blog rumour> “Still getting worse. Staying at home temp is up to 39.5.”
Blog rumour> “I’m sick with a chest infection”
News report> “Mystery illness causes concern.”
Overview
• Research context
• Method
• Results
• Significance and limitations
RESEARCH CONTEXT
Alerting real world events
What signals should we be looking for?
1. Personal event: seeking medical intervention
2. Web microblog response: “i’ve been waitin at the docs all morning with flu”
3. Text mining on unstructured blogs
4. Detecting unusual events
5. Issue alert
[Figure: news volume (y-axis, 0 to 400) plotted over time, with the alert level marked.]
‘See what the world is doing right now’ – microblogs versus newswire
• Newswire:
– Event based reports
– Near real time
– Reporting bias (focus on reader’s concerns)
– Editorial quality control, so a low level of noise
– Good for health event alerting; unknown for case counting
• Social media microblogs:
– Personal reports and event based re-reporting
– Real time
– Reporting bias (focus on writer’s concerns)
– Large-scale and independent
– Little quality control, so a high level of noise
– Unknown for health event alerting; probably good for case counting
Twitter characteristics [1]
• Twitter posts (tweets) are limited to 140 characters
– Low user investment in time and thought for content generation (Java et al. 2007)
– High use of abbreviations and aliases
– Dynamic lexicon of semantic tags (hashtags)
• Very high volume of data:
– 55 million tweets per day
– Hundreds of micro-blogs each second for major events (Petrovic et al. 2010)
– Compared to ~0.1 news reports each second for newswire
– Surge capacity requires highly efficient algorithms
• High numbers of users
– 106 million announced at Twitter developer’s conference 2010
Twitter characteristics [2]
• Typical tweet contents (Nardi et al. 2004)
– Daily experience (All about me)
– Share opinions
– Commentary on events
– Spam
• Metadata:
– Geo-tagging
– Time stamping
– User profile
• Event reports are sometimes ahead of newswire, e.g. Iranian presidential protests, swine flu outbreak reports from CDC, deaths of famous people (Petrovic et al. 2010)
Previous work on online personal signal analysis
• Google Flu Trends (Ginsberg et al. 2009, Valdivia et al. 2010)
• Ushahidi (Okolloh 2009)
• Flutracker (http://flutracker.rhizalabs.com/)
• Twitter earthquake detector (Guy et al. 2010)
• First story detection (Petrovic et al. 2010)
• Maximum story coverage (Saha and Getoor, 2009)
• New study:
– GP consultation correlation for ILI in the UK (Lampos et al. 2010)
METHOD
Schema development
• Syndromic categories
– A syndrome is a collection of symptoms (specific and non-specific) that are indicative of a class of diseases;
– Six syndrome categories were chosen: constitutional, respiratory, gastrointestinal, hemorrhagic, rash and neurological;
– Syndromes and symptoms were based on those in the BioCaster ontology, developed by experts in computational linguistics, public health, genetics and anthropology;
– Symptom lists were expanded to include informal synonyms found in Twitter data, e.g. ‘stomach ache’, ‘belly ache’, ‘belly pain’, ‘stomach hurt’;
– Case descriptions for each syndrome were then developed with positive and negative examples;
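The keyword expansion step above can be pictured as a small lookup from informal symptom phrases to syndrome categories. The following is a minimal sketch only: the function name `candidate_syndromes` and the dictionary are hypothetical, the gastrointestinal phrases are the ones quoted on the slide, and the respiratory entries are illustrative placeholders rather than terms taken from the BioCaster ontology.

```python
import re

# Hypothetical lexicon. Only the gastrointestinal synonyms come from the slide;
# the respiratory entries are illustrative placeholders.
SYNDROME_KEYWORDS = {
    "gastrointestinal": ["stomach ache", "belly ache", "belly pain", "stomach hurt"],
    "respiratory": ["sore throat", "cough"],
}

def candidate_syndromes(tweet):
    """Return the sorted syndrome categories whose keywords occur in the tweet."""
    # Lower-case and collapse runs of whitespace so multi-word phrases match.
    text = re.sub(r"\s+", " ", tweet.lower())
    found = set()
    for syndrome, keywords in SYNDROME_KEYWORDS.items():
        if any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in keywords):
            found.add(syndrome)
    return sorted(found)

print(candidate_syndromes("Ugh, belly  pain and a cough all night"))
```

A match here only makes a tweet a candidate for a category; deciding whether it is a genuine positive report is the job of the annotators and the classifiers.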
Gold standard data [1]
• Three students annotated 2000 tweets per syndrome as positive or negative;
• Data was sourced from Twitter between 9th and 24th July 2010 using symptom keywords and removing duplicate messages;
• We then chose messages where all 3 annotators agreed on the classification to train the classifiers;
• Pairwise kappa ranged from 0.42 (Neurological) to 0.92 (Hemorrhagic).
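For readers unfamiliar with the agreement measure, the sketch below shows one way pairwise agreement and the unanimous-agreement filter could be computed. It assumes Cohen's kappa, which the slides do not name explicitly, and the function names and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cohen_kappa(a, b):
    """Cohen's kappa for two equally long label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's own label distribution.
    expected = sum(count_a[l] * count_b[l] for l in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)

def unanimous(annotations):
    """Indices of the items on which every annotator gave the same label."""
    return [i for i, labels in enumerate(zip(*annotations)) if len(set(labels)) == 1]

# Toy labels for three annotators over six tweets (illustrative only).
annotations = [
    ["pos", "pos", "neg", "neg", "pos", "neg"],
    ["pos", "neg", "neg", "neg", "pos", "neg"],
    ["pos", "pos", "neg", "pos", "pos", "neg"],
]
for (i, a), (j, b) in combinations(enumerate(annotations), 2):
    print(i, j, round(cohen_kappa(a, b), 2))
print(unanimous(annotations))
```

Only the items returned by `unanimous` would be passed on as training data under the selection rule described above.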
Gold standard data [2]
• Messages were tagged positive only when the subject was the user or a close family member;
• Hypothetical reports are negative;
• User opinions about other people are negative;
• Reported conditions must fall within one week of the posting time;
• Reported syndromes can belong to more than one category;
Gold standard data [3]
Features and Models
• Features: Bag of words including hashtags but excluding links
• Models: Naïve Bayes (McCallum’s Rainbow), SVM (SVM Light) with polynomial kernel (p=1,2,3) and radial basis function kernel (RBF)
• 10-fold cross validation
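The feature and model choices above can be illustrated with a small standard-library sketch. This is not the Rainbow or SVM Light setup used in the study: it is a stand-in multinomial Naive Bayes with add-one smoothing, and the regular expressions, class name and training tweets are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

URL = re.compile(r"https?://\S+|www\.\S+")   # links are excluded from the features
TOKEN = re.compile(r"#?\w+")                 # words, with hashtags kept as tokens

def bag_of_words(tweet):
    """Lower-cased token counts; hashtags are kept, links are dropped."""
    return Counter(TOKEN.findall(URL.sub(" ", tweet.lower())))

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, tweets, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for tweet, label in zip(tweets, labels):
            self.word_counts[label].update(bag_of_words(tweet))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, tweet):
        total = sum(self.class_counts.values())
        scores = {}
        for c, n in self.class_counts.items():
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for word, count in bag_of_words(tweet).items():
                if word in self.vocab:   # unseen words are ignored
                    score += count * math.log((self.word_counts[c][word] + 1) / denom)
            scores[c] = score
        return max(scores, key=scores.get)

# Hypothetical training tweets, loosely modelled on examples in the slides.
model = NaiveBayes().fit(
    ["i have a sore throat and fever", "my chest infection is worse #sick",
     "cabin fever at the office", "great game tonight"],
    ["pos", "pos", "neg", "neg"],
)
print(model.predict("sore throat today http://example.com"))
```

In a 10-fold cross-validation, a model like this would be refitted ten times, each time holding out a different tenth of the gold standard data for testing.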
RESULTS
Classifying Twitter messages for syndromes
• The SVM with a degree 1 polynomial kernel performed best;
• Precision ranged from 82.0 to 83.8 (SVM degree 1);
• Recall ranged from 58.3 to 96.2 (SVM degree 1);
• Performance correlates moderately with the P/N (positive to negative) ratio;
• Noticeably weak performance for Hemorrhagic and Gastrointestinal, where positive data was scarce and Kappa was lower.
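As a reminder of how the two reported measures are defined, here is a minimal sketch of precision and recall for a binary classifier. The function name and the toy labels are hypothetical, and the function returns fractions between 0 and 1, whereas the figures on the slide appear to be percentages.

```python
def precision_recall(gold, predicted, positive="pos"):
    """Precision and recall of the positive class, as fractions in [0, 1]."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)  # true positives
    fp = sum(g != positive and p == positive for g, p in pairs)  # false positives
    fn = sum(g == positive and p != positive for g, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example: the classifier finds two of four positives and raises no false alarms.
gold = ["pos", "pos", "pos", "pos", "neg"]
pred = ["pos", "pos", "neg", "neg", "neg"]
print(precision_recall(gold, pred))
```

When positive examples are scarce, a handful of missed positives moves recall sharply, which is one way a low P/N ratio can depress the scores.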
Difficult cases
• Metaphoric symptoms
– Cabin fever setting in right now.
• Wide range of common meanings
– Exhausted after days of housework.
• Interrogative sentences
– wonder how long u get off work with swine flu?
• Hypothetical sentences
– I can ignore this sore throat no longer. And, um, maybe I should have gotten that H1N1 vaccine.
– It's a mask I use with spray paint, but if I did have swine flu, why would I need a mask?
• Others
– Too much lemonade. My throat is burning.
BioCaster: early alerting for public health events
[Figure: BioCaster system overview. Inputs: up to date news in multiple languages and real time Twitter analysis. Services: ontology browsing, email/GeoRSS alerting, Watchboard, trend graphs, event database search, event maps. Event alerts are shown going to WHO and GHSAG partners (IT, JP, CA, US, UK, FR, DE).]
SIGNIFICANCE AND LIMITATIONS
Discussion
• Twitter offers unique challenges and opportunities for epidemic surveillance;
• Very challenging environment for automated classification but evidence from several studies points to close correlation between ILI keywords and laboratory data. No studies yet on correlating other syndromes.
• The 6 classifiers are available as part of the experimental DIZIE project online at the BioCaster portal.
• Future work will look into change point detection and integrating social media reports with evidence from news events for situational awareness.
Funding
2010 NII, internship grant and a grand challenge grant
PLEASE SEE:
http://born.nii.ac.jp
The landscape of online health event monitoring
GPHIN (Ginsberg et al. 2009)
EpiSpider (Tolentino et al. 2007)
MiTaP (Damianos et al. 2002)
BioCaster (Collier et al. 2008)
Argus (Wilson et al. 2008)
Medisys (Yangarber et al. 2007)
HealthMap (Freifeld et al. 2008)
ProMed-mail (Madoff 2004)
[Figure: map of online signals (newswire, radio, SMS/microblog, query, social networks, lifestream, livecast; share, discuss) with example systems: MiTaP (?) (Damianos et al. 2002), Ushahidi (Okolloh et al. 2009), Twitter Earthquake Detector (Guy et al. 2010), HealthMap (Freifeld et al. 2008), BioCaster (Collier et al. 2008), Google Flu Trends (Ginsberg et al. 2009).]
Discussion [2]
• Limitations of Twitter
– Representation of population by country, city and age group
[Figure: Twitter user age distribution (source: sysomos.com)]
[Figure: Twitter usage by major city (source: sysomos.com)]
[Figure: New Twitter users by country (source: sysomos.com)]