The second KYOTO workshop, Gifu 2011

Detecting Sub-events in Environmental Disaster using Plurks
Mei-Yu Chen, Qian-Rong Zhang, Li-Chuan Ku (1), Jessie Chiao-Shan Lo, Jia-Fei Hong (2) and Shu-Kai Hsieh (3)
1. National Taiwan Normal University
2. Academia Sinica
3. National Taiwan University
2011/1/27

Outline
• Introduction
• (Sub)-event: What and Why?
• Our experiment
• Plurk Corpus
• Methods
• Evaluation and Result
• Conclusion

Introduction
• Recently, Internet social networking tools have been utilized as an alternative solution in emergency response during disasters (Huang et al., 2010).
• Twitter and Plurk share several features:
  – a real-time nature (instantaneity)
  – a limit on the length of each post
  – a timeline along which each post flows past

Introduction
• Sakaki et al. (2010) propose an algorithm that monitors tweets to detect a target event in the environmental domain, such as an earthquake, a typhoon or a rainbow.
• By treating each Twitter user as a sensor, earthquake notifications are delivered much faster than the announcements broadcast by the Japan Meteorological Agency.

[Figure labels: Japan, S. Korea, Philippines]

Introduction
Motivation
• In the environmental domain, the cause of an event (i.e., its etiology) and its aftermath have a great impact on disaster management, yet they have not been treated in fine detail.
Objective
• Centered on the theme of TYPHOON, we aim to propose automatic ways to find the event-related cliques in the social network, using spatio-temporal information to help gather and disseminate real-time sub-event information.

What is an Event?
• “Event” has the commonly understood definition of “a happening” or “a noteworthy occurrence” (Bedrosian, 2008).
• An event has the properties .who, .what, .when, .where and .how.
• Example: Armstrong descended to the lunar surface on the Apollo 11 moon landing mission on July 20, 1969.

What is an Event?
Other properties
• .agency -- natural force or human agency?
• .impact -- on humans, animals, nature
• .why -- causality
• .frequency and .predictability -- [Time]
• .parents and .children -- [Precedents and Consequences]

What and Why is Sub-Event?
• What is a sub-event?
  – Any kind of damage or inconvenience that affects residents’ lives.
  – The aftermath of the event: real-time notifications, emergency response, instant damage reports.
• Why sub-events?
  – They are closely related to daily life.
  – They can easily be embedded in a mobile application.

Our Experiment
• Detect the “sub-events of a typhoon” in the Plurk Social Web-As-Corpus.
• Main event: Typhoon Morakot on August 8, 2009.
• Sub-events: flood, wind, collapse, blackout, etc.

What and Why is Plurk?
• What is Plurk?
  – Plurk is a free social networking and micro-blogging service that allows users to send updates through short messages or links of up to 140 text characters. (Wikipedia)
  – Plurk has a timeline view that integrates video and picture sharing. Updates are received in chronological order.
• Why Plurk?
  – According to Alexa (The Web Information Company), 44.6% of Plurk's traffic comes from Taiwan (as of November 21, 2010).

[Example plurks: “It is flooding!” / “The flood is two stories high.”]
• By searching the Plurk content from August 8, 2009, we aim to automatically detect instant events that are related to the typhoon.

Data Source: Plurk Corpus
• The Plurk Corpus has been developed by NTU.
• Using the official Plurk API, both meta-information and linguistic information about the users were collected.
• The corpus currently includes 20,448 users, 17,973 of them from Taiwan.
• It contains 5,889,033 sentences and over 20 million words.

Data Source: Plurk Corpus
• Preprocessing of the corpus data
  – The corpus data was filtered by user information (restricted to users in Taiwan) and by time of utterance (August 8, 2009, the date of Typhoon Morakot).
  – All the data were segmented with the CKIP Chinese word segmenter.

Methods
• Step 1: Bootstrapping the seed terms
  Manually build a set of terms that are highly associated with typhoons, to serve as filtering keywords.
  1. Collecting typhoon-related seed terms
  2. Generating lexical patterns based on the seed terms
  3. Yielding typhoon-related terms analogically

[Figure: seed terms, e.g. Typhoon, Flood, Natural disaster, Damage, Weather Front, …]

Methods
Step 1: Bootstrapping the seed terms (continued)
• 2. Generating lexical patterns based on the seed terms. We adopted an analogical similarity algorithm that rests on two hypotheses:
  – Lexical patterns that co-occur with similar pairs tend to have similar meanings (Lin & Pantel, 2001; Turney, 2006)
  – Words are semantically associated if they tend to co-occur frequently (e.g. bee and honey) (Chiarello, Burgess, Richard, & Pollock, 1990; Turney, 2010)
• Based on these two hypotheses, we sent queries to the Google search engine and obtained about 60 of the most frequent lexical patterns.
  Example:
    typhoon         patterns           seed terms
    颱風 typhoon    帶來了 brought     損害 damage
    颱風 typhoon    造成了 caused      損害 damage

Methods
Step 1: Bootstrapping the seed terms (continued)
3. Yielding typhoon-related terms analogically
  – The 60 lexical patterns are further used to bootstrap 200 more typhoon-related terms from the Web.
  Typhoon-related terms: 颱風 (typhoon), 風災 (wind damage), 來襲 (hit), 災害 (damage), 莫拉克 (Morakot), 災情 (disaster), 警報 (alarm), 氣象局 (bureau of meteorology), 豪雨 (heavy rain), 淹水 (flooded), etc.
• Using these 200 typhoon-related terms, we can automatically extract plurks such as:
  「天阿!竟然連嘉義都淹水了!」 “My God! Even ChiaYi county is flooded!”
  「分享剛剛看到一篇關於淹水的好文章」 “I want to share a well-written article about floods.”

Methods
Step 2: Classifying the plurks
• Recall the four main properties of an event: .who, .when, .what, .where
  Event.who => the user who posts the utterance on Plurk.
  Event.when => the real-time nature represented by the timeline.
  Event.what => detected by the 200 typhoon-related keywords.
  What about Event.where?

Methods
Step 2: Classifying the plurks (continued)
• For the .where property, there is an observed pattern in Chinese, especially in descriptions of emergent or instant events (TT = typhoon-related term, $ = punctuation):
    $ Location + … + TT + … $
    “Ba-De Road is flooded.”
• “Location” includes all the place names listed by the Taiwan national post office, as well as place deictic expressions such as here, there, home, … Rd., etc.

Methods
Step 2: Classifying the plurks (continued)
• To reduce inconsistency, we excluded the following patterns:
    $ Location + … + $ + … + TT + … $
    $ Loc $ ET $
    “The weather is clear in Taipei, it seems that the flood is gone.”
• Interrogative forms are also excluded:
    $ Location + … + TT ?
    $ Loc TT ?
    “Is Taipei really flooded?”

Evaluation and Result
Evaluation
• Evaluation is based not on subjective judgment but on authoritative media.
• We use the Liberty Times newspaper of August 9-10, 2009 as the gold standard.
• Candidate sub-events that are reported in the newspaper on the following days are counted as matches.

Evaluation and Result
Result
• Of the 11,514 plurks posted on August 8, 2009, approximately 550 are classified as candidates carrying a typhoon-related sub-event.
• Precision: 73.45%

Evaluation and Result
• The emergency reports match the reports announced by the Central Weather Bureau or the media, and the notification is delivered faster than that of the government or the media.
• The location is reported more specifically in plurks.
  “The flood waters near Er-Ren stream are knee high.”
  南科大門口的樹都被吹倒了,大家別從那裡經過。 “The trees in front of the Tainan college were blown down; don’t pass by there.”

[Figure: precision rate of sub-event detection]
[Figure: rainfall alarm issued by the Central Weather Bureau on August 8]

Evaluation and Result
Error analysis
• Irrelevant plurks
  祈禱南部受風災的朋友們一切平安。 “I hope everyone suffering from the damage in southern Taiwan is safe.”
  看新聞才知到屏東淹水好嚴重,雨量已經累積到 2146mm… “I learned from the news that the flood caused serious damage in PinTung; the accumulated rainfall had reached 2146 mm…”

Local disaster alert map in real time
[Figure: local disaster alert map]

Conclusion
• Detecting local emergencies for early response during disasters also demands attention.
• We use Plurk, focusing on a case study of the Typhoon Morakot disaster in Taiwan.
• Our proposed method combines a bootstrapping approach with linguistic patterns of instant events to perform the task, and it yields promising results.

Conclusion
• With more data, we expect to be able to automatically construct an event ontology for disaster information management.
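
Supplementary sketch: Step 2 pattern matching
The Step 2 pattern from the Methods slides ($ Location + … + TT + … $, with cross-punctuation and interrogative forms excluded) might be approximated as follows. This is a minimal sketch and not the authors' implementation: the function name sub_event_clauses, the tiny LOCATIONS and TYPHOON_TERMS lists, and the punctuation set standing in for "$" are all illustrative assumptions, and plain substring matching is used in place of CKIP segmentation.

```python
import re

# Hypothetical mini-lexicons for illustration only. The slides describe
# post-office place names plus place deictics, and about 200 bootstrapped
# typhoon-related terms (TT).
LOCATIONS = ["嘉義", "台北", "八德路", "這裡", "那裡"]
TYPHOON_TERMS = ["淹水", "颱風", "風災", "豪雨"]

# "$" in the slides stands for punctuation; this particular set is an assumption.
PUNCT = ",,。.!!??;;、…"
CLAUSE_RE = re.compile("([^{0}]+)([{0}]?)".format(re.escape(PUNCT)))


def sub_event_clauses(plurk):
    """Return the clauses of a plurk that match $ Location + ... + TT + ... $.

    A clause is a run of text between punctuation marks. Requiring the
    location and the typhoon term inside the same clause also rules out
    the excluded form $ Location + ... + $ + ... + TT + ... $.
    """
    hits = []
    for match in CLAUSE_RE.finditer(plurk):
        clause, end_mark = match.group(1), match.group(2)
        # Interrogative forms are excluded.
        if end_mark in ("?", "?"):
            continue
        loc_positions = [clause.find(loc) for loc in LOCATIONS if loc in clause]
        tt_positions = [clause.rfind(tt) for tt in TYPHOON_TERMS if tt in clause]
        # Keep the clause only if some location precedes some typhoon term.
        if loc_positions and tt_positions and min(loc_positions) < max(tt_positions):
            hits.append(clause)
    return hits


if __name__ == "__main__":
    for text in ["天阿!竟然連嘉義都淹水了!", "台北真的淹水了嗎?", "台北天氣放晴,看來淹水退了"]:
        print(text, "->", sub_event_clauses(text))
```

Only the first of the three sample plurks yields a candidate clause; the second is a question and the third puts the location and the typhoon term in different clauses. A real run would replace the two lists with the full location gazetteer and the 200 bootstrapped terms, and would match on segmented tokens rather than raw substrings.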