Syndromic Classification of Twitter Messages
Nigel Collier and Son Doan
National Institute of Informatics, Tokyo
collier@nii.ac.jp, sondoan@gmail.com
E-HEALTH, November 2011

[Figure: surveillance signal sources arranged along axes of time and certainty: rumours, sentinel networks, GP reports, field workers, laboratory reports]
Blog rumour> “Ahh! Really bad throat.”
News report> “Influenza starts early this year.”
Blog rumour> “Still getting worse. Staying at home temp is up to 39.5.”
Blog rumour> “I’m sick with a chest infection”
News report> “Mystery illness causes concern.”

Overview
• Research context
• Method
• Results
• Significance and limitations

RESEARCH CONTEXT

Alerting real world events
What signals should we be looking for?
1. Personal event (seeking medical intervention)
2. Web microblog response, e.g. “i’ve been waitin at the docs all morning with flu”
3. Text mining on unstructured blogs
4. Detecting unusual events
5. Issue alert
[Chart: news volume and alert level over time]

‘See what the world is doing right now’ – microblogs versus newswire
• Newswire:
  – Event based reports
  – Near real time
  – Reporting bias (focus on reader’s concerns)
  – Editorial quality control
  – Low level of noise
  Good for health event alerting; unknown for case counting
• Social media microblogs:
  – Personal reports and event based re-reporting
  – Real time
  – Reporting bias (focus on writer’s concerns)
  – Large-scale and independent
  – Little quality control
  – High level of noise
  Unknown for health event alerting; probably good for case counting

Twitter characteristics [1]
• Twitter posts (tweets) are limited to 140 characters
  – Low user investment in time and thought for content generation (Java et al. 2007)
  – High use of abbreviations and aliases
  – Dynamic lexicon of semantic tags (hashtags)
• Very high volume of data:
  – 55 million tweets per day
  – Hundreds of micro-blogs each second for major events (Petrovic et al.
2010)
  – Compared to ~0.1 news reports each second for newswire
  – Surge capacity requires highly efficient algorithms
• High numbers of users
  – 106 million announced at Twitter's 2010 developer conference

Twitter characteristics [2]
• Typical tweet contents (Nardi et al. 2004)
  – Daily experience (All about me)
  – Sharing opinions
  – Commentary on events
  – Spam
• Meta data:
  – Geo-tagging
  – Time stamping
  – User profile
• Event reports are sometimes ahead of newswire, e.g. Iranian presidential protests, swine flu outbreak reports from CDC, deaths of famous people (Petrovic et al. 2010)

Previous work on online personal signal analysis
• Google flu trends (Ginsberg et al. 2009, Valdivia et al. 2010)
• Ushahidi (Okolloh 2009)
• Flutracker (http://flutracker.rhizalabs.com/)
• Twitter earthquake detector (Guy et al. 2010)
• First story detection (Petrovic et al. 2010)
• Maximum story coverage (Saha and Getoor, 2009)
• New study:
  – GP consultation correlation for ILI in the UK (Lampos et al. 2010)

METHOD

Schema development
• Syndromic categories
  – A syndrome is a collection of symptoms (specific and non-specific) that are indicative of a class of diseases;
  – Six syndrome categories were chosen: constitutional, respiratory, gastrointestinal, hemorrhagic, neurological, rash;
  – Syndromes and symptoms were based on those in the BioCaster ontology, developed by experts in computational linguistics, public health, genetics and anthropology;
  – Symptom lists were expanded to include informal synonyms found in Twitter data, e.g. ‘stomach ache’, ‘belly ache’, ‘belly pain’, ‘stomach hurt’.
  – Case descriptions for each syndrome were then developed with positive and negative examples.

Gold standard data [1]
• Three students annotated 2000 tweets per syndrome as positive or negative;
• Data was sourced from Twitter between 9th and 24th July 2010 using symptom keywords and removing duplicate messages;
• We then chose messages where all 3 annotators agreed on the classification to train the classifiers;
• Pairwise kappa ranged from 0.42 (Neurological) to 0.92 (Hemorrhagic).

Gold standard data [2]
• Messages were tagged positive only when the subject was the user or a close family member;
• Hypothetical reports are negative;
• User opinions about other people are negative;
• Reports of conditions must be within one week of the posting time;
• Reported syndromes can belong to more than one category.

Gold standard data [3]

Features and Models
• Features: Bag of words including hashtags but excluding links
• Models: Naïve Bayes (McCallum’s Rainbow), SVM (SVM Light) with polynomial kernel (degree p = 1, 2, 3) and radial basis function (RBF) kernel
• 10-fold cross validation

RESULTS

Classifying Twitter messages for syndromes
• SVM with degree 1 kernel performed the best;
• Precision ranged from 82.0 to 83.8 (SVM degree 1);
• Recall ranged from 58.3 to 96.2 (SVM degree 1);
• Performance moderately correlates with the ratio of positive to negative examples (P/N ratio);
• Noticeably weak performance for Hemorrhagic and Gastrointestinal where positive data was scarce and Kappa was lower.

Difficult cases
• Metaphoric symptoms
  – Cabin fever setting in right now.
• Wide range of common meanings
  – Exhausted after days of housework.
• Interrogative sentences
  – wonder how long u get off work with swine flu?
• Hypothetical sentences
  – I can ignore this sore throat no longer. And, um, maybe I should have gotten that H1N1 vaccine.
  – It's a mask I use with spray paint, but if I did have swine flu, why would I need a mask?
• Others
  – Too much lemonade. My throat is burning.
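
The feature set named under Features and Models (bag of words, hashtags kept, links dropped) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the slides do not say how tweets were tokenized, so the lowercasing, the two regular expressions and the function name `bag_of_words` are all hypothetical choices, not the authors' implementation.

```python
import re
from collections import Counter

# Assumed patterns: the slides do not specify the tokenizer that was used.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")
TOKEN_PATTERN = re.compile(r"#?\w+(?:'\w+)*")


def bag_of_words(tweet: str) -> Counter:
    """Count lowercased tokens in a tweet, keeping hashtags and dropping links."""
    without_links = URL_PATTERN.sub(" ", tweet.lower())
    return Counter(TOKEN_PATTERN.findall(without_links))


# Hypothetical example message, not taken from the gold standard data.
features = bag_of_words("Sore throat again :( #flu http://example.com/x #Flu")
print(features)
```

Counts like these could then be passed to a Naïve Bayes or SVM learner. Because the representation ignores word order, “cabin fever” contributes the same “fever” count as a literal symptom report, which may be one reason the difficult cases listed above are hard to separate.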

BioCaster: early alerting for public health events
[Figure: BioCaster portal overview. Labels: ontology browsing; email/GeoRSS alerting; watchboard, etc.; trend graphs; event database search; event maps; up to date news in multiple languages; event alerts; real time Twitter analysis; GHSAG partners (WHO, IT, JP, CA, US, UK, FR, DE)]

SIGNIFICANCE AND LIMITATIONS

Discussion
• Twitter offers unique challenges and opportunities for epidemic surveillance;
• Automated classification is very challenging in this environment, but evidence from several studies points to a close correlation between ILI keywords and laboratory data. No studies have yet correlated other syndromes;
• The 6 classifiers are available as part of the experimental DIZIE project online at the BioCaster portal;
• Future work will look into change point detection and integrating social media reports with evidence from news events for situational awareness.

Funding
2010 NII internship grant and a grand challenge grant
PLEASE SEE: http://born.nii.ac.jp

The landscape of online health event monitoring
[Figure: monitoring systems placed by the type of online signal they use. Signal labels: newswire, radio, SMS/microblog, query, social networks, lifestream, share, discuss, livecast]
• GPHIN (Mawudeku and Blench 2006)
• EpiSpider (Tolentino et al. 2007)
• MiTaP (Damianos et al. 2002)
• BioCaster (Collier et al. 2008)
• Argus (Wilson et al. 2008)
• Medisys (Yangarber et al. 2007)
• HealthMap (Freifeld et al. 2008)
• ProMed-mail (Madoff 2004)
• Ushahidi (Okolloh et al. 2009)
• Twitter Earthquake Detector (Guy et al. 2010)
• Google Flu Trends (Ginsberg et al. 2009)

Discussion [2]
• Limitations of Twitter
  – Representation of population by country, city and age group
[Figures: Twitter user age distribution; Twitter usage by major city; new Twitter users by country. Source: sysomos.com]
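
Supplementary note on the gold standard slides: pairwise agreement between three annotators, and the selection of unanimously labelled messages for training, can be sketched as below. The slides report "pairwise kappa" without naming the variant, so Cohen's kappa is an assumption here. The annotator names and the eight labels per annotator are invented toy data, not the study's annotations.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


# Toy labels (1 = positive, 0 = negative) for eight hypothetical tweets.
annotations = {
    "ann1": [1, 1, 0, 0, 1, 0, 0, 0],
    "ann2": [1, 1, 0, 0, 0, 0, 0, 0],
    "ann3": [1, 0, 0, 0, 1, 0, 0, 1],
}

# Kappa for every pair of annotators.
pairwise = {
    (a, b): cohen_kappa(annotations[a], annotations[b])
    for a, b in combinations(annotations, 2)
}

# Indices of tweets on which all three annotators agree.
unanimous = [
    i for i, labels in enumerate(zip(*annotations.values()))
    if len(set(labels)) == 1
]

print(pairwise)
print(unanimous)
```

Keeping only unanimous messages gives cleaner training labels at the cost of discarding the ambiguous messages, so the retained set may be easier than the full stream of tweets.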