Detecting Sub-events in Environmental Disaster using Plurks

The second KYOTO workshop, Gifu 2011
Mei-Yu Chen, Qian-Rong Zhang, Li-Chuan Ku (1)
Jessie Chiao-Shan Lo, Jia-Fei Hong (2)
and Shu-Kai Hsieh (3)
(1) National Taiwan Normal University
(2) Academia Sinica
(3) National Taiwan University
2011/1/27
Outline
• Introduction
• (Sub-)event: What and Why?
• Our Experiment
• Plurk Corpus
• Methods
• Evaluation and Result
• Conclusion
Introduction
• Recently, Internet social networking tools
have been used as an alternative channel for
emergency response during disasters
(Huang et al., 2010).
• Twitter and Plurk are real-time in nature (instantaneity):
– posts are limited in length;
– each post passes by as the timeline flows.
Introduction
• Sakaki et al. (2010) propose an algorithm
that monitors tweets to detect a target
event in the environmental domain, such
as an earthquake, typhoon, or rainbow.
• When each Twitter user is treated as a
sensor, earthquake notifications are
delivered much faster than the
announcements broadcast by the
Japan Meteorological Agency.
[Figure labels: Japan, S. Korea, Philippines]
Introduction
Motivation
• In the environmental domain, the cause of an
event (i.e., its etiology) and its aftermath
have a great impact on disaster management,
yet neither has been treated in fine detail.
Objective
• Centered on the theme of TYPHOON, we aim to
propose automatic ways to find
event-related cliques in the social network whose
spatio-temporal information helps in gathering and
disseminating real-time sub-event information.
What is an Event?
• An event is commonly understood as "a
happening" or "a noteworthy occurrence" (Bedrosian,
2008).
• An event has the properties .who, .what, .when, .where, and .how.
• Example:
Armstrong descended to the lunar surface on the Apollo
11 moon landing mission on July 20, 1969.
What is an Event?
Other properties
• .agency -- Natural force or Human agency?
• .impact -- on: humans, animals, nature
• .why -- causality
• .frequency and .predictability – [Time]
• .parents and .children – [Precedents and
Consequences]
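The event properties listed on these two slides can be pictured as a simple record. The following is only an illustrative sketch: the class name `Event`, the field types, and the way the Apollo 11 example is filled in are assumptions made here, not something defined on the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    """Illustrative record of the event properties named on the slides."""
    who: str
    what: str
    when: str
    where: str
    how: Optional[str] = None
    agency: Optional[str] = None          # natural force or human agency
    impact: List[str] = field(default_factory=list)  # humans, animals, nature
    why: Optional[str] = None             # causality
    frequency: Optional[str] = None       # time-related property
    predictability: Optional[str] = None  # time-related property
    parents: List["Event"] = field(default_factory=list)   # precedents
    children: List["Event"] = field(default_factory=list)  # consequences


# The slide's example, mapped onto the record (the mapping itself is an assumption).
landing = Event(
    who="Armstrong",
    what="descended to the lunar surface",
    when="1969-07-20",
    where="lunar surface",
    how="Apollo 11 moon landing mission",
    agency="human",
)
```

Under this reading, a sub-event such as a flood would be stored in the `children` list of the main typhoon event.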
Sub-Event: What and Why?
• What is a sub-event?
– Any kind of damage or inconvenience that
affects residents’ lives.
– The aftermath of the event:
real-time notifications, emergency response,
instant damage reports.
• Why sub-events?
– They are closely related to daily life.
– Their detection can easily be embedded in a mobile application.
Our Experiment
• Detect the “sub-events of a typhoon” in the Plurk
Social Web-As-Corpus.
• Main event: Typhoon Morakot on August 8, 2009.
• Sub-events: flood, wind, collapse, blackout, etc.
Plurk: What and Why?
• What is Plurk?
Plurk is a free social networking and micro-blogging
service that allows users to send updates through
short messages or links, which can be up to 140 text
characters in length. (Wikipedia)
– Plurk has a timeline view integrating video and picture
sharing. Updates are received in chronological order.
• Why Plurk?
– According to Alexa (The Web Information Company),
44.6% of Plurk's traffic comes from Taiwan (as of
November 21, 2010).
“It is flooding!”
“The flood is two stories high.”
• By searching Plurk content from August
8, 2009, we aim to automatically detect
instant events that are related to the typhoon.
Data Source: Plurk Corpus
• The Plurk Corpus has been developed by NTU.
• Using the official Plurk API, both meta- and linguistic information about the
users was collected.
• Currently, the corpus includes a total of 20,448 users,
17,973 of them from Taiwan.
• It contains 5,889,033 sentences and over 20 million words.
The Demographic Information
• Total: 20,448 users, 17,973 of them from Taiwan
• 5,889,033 sentences
• Over 20 million words
Data Source: Plurk Corpus
• Preprocessing of the corpus data
• The corpus data were filtered by user information
(restricted to users in Taiwan) and by time of utterance
(August 8, 2009, the date of Typhoon Morakot).
• All the data were segmented with the CKIP Chinese
word segmenter.
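The filtering step can be sketched roughly as follows. The record layout (`user_location`, `posted`, `text`), the function name `filter_plurks`, and the sample records are all hypothetical; the slides do not describe the corpus schema. Word segmentation with CKIP is left out because its interface is not described here.

```python
from datetime import date, datetime
from typing import Dict, Iterable, List

TARGET_DAY = date(2009, 8, 8)  # date of Typhoon Morakot, as given on the slides


def filter_plurks(plurks: Iterable[Dict[str, str]],
                  country: str = "Taiwan",
                  day: date = TARGET_DAY) -> List[Dict[str, str]]:
    """Keep plurks written by users in `country` and posted on `day`."""
    kept = []
    for plurk in plurks:
        posted = datetime.fromisoformat(plurk["posted"])
        if plurk["user_location"] == country and posted.date() == day:
            kept.append(plurk)
    return kept


# Hypothetical toy records, not taken from the Plurk Corpus.
sample = [
    {"user_location": "Taiwan", "posted": "2009-08-08T09:15:00", "text": "淹水了"},
    {"user_location": "Taiwan", "posted": "2009-08-07T23:50:00", "text": "下雨"},
    {"user_location": "Japan", "posted": "2009-08-08T10:00:00", "text": "hello"},
]
filtered = filter_plurks(sample)
```

Only the first toy record passes both the location and the date filter.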
Methods
• Step 1: Bootstrapping the seed terms
Manually build a set of terms that are highly
associated with typhoons, to serve as filtering keywords.
1. Collecting typhoon-related seed terms
2. Generating lexical patterns based on the seed terms
3. Yielding typhoon-related terms analogically
Seed terms: Typhoon, Flood, Natural disaster, Damage, Weather front, …
Methods
Step 1: Bootstrapping the seed terms (continued)
• 2. Generating lexical patterns based on the seed terms. We adopted
an analogical similarity approach that rests on two hypotheses:
– Lexical patterns that co-occur with similar pairs tend to have similar meanings (Lin
& Pantel, 2001; Turney, 2006)
– Words are semantically associated if they tend to co-occur frequently (e.g. bee
and honey) (Chiarello, Burgess, Richards, & Pollock, 1990; Turney, 2010)
• Based on these two hypotheses, we sent queries to the Google search engine
and obtained about 60 of the most frequent lexical patterns.
Example:
typhoon            patterns            seed terms
颱風 (typhoon)     帶來了 (brought)    損害 (damage)
颱風 (typhoon)     造成了 (caused)     損害 (damage)
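The idea of learning patterns from a seed pair and then reusing them to find new terms can be sketched as below. This is a simplified illustration, not the authors' implementation: the slides obtained patterns through Google queries, whereas this sketch substitutes a tiny in-memory list of pre-segmented sentences. The function names, the `max_gap` parameter, and the toy sentences are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def extract_patterns(sentences: Iterable[Sequence[str]],
                     seed_pairs: Iterable[Tuple[str, str]],
                     max_gap: int = 3) -> Counter:
    """Count token sequences found between the two terms of a seed pair."""
    pairs = list(seed_pairs)
    counts: Counter = Counter()
    for tokens in sentences:
        for head, tail in pairs:
            for i, tok in enumerate(tokens):
                if tok != head:
                    continue
                # Look for the tail term 1..max_gap tokens after the head.
                for j in range(i + 2, min(i + 2 + max_gap, len(tokens))):
                    if tokens[j] == tail:
                        counts[tuple(tokens[i + 1:j])] += 1
    return counts


def harvest_terms(sentences: Iterable[Sequence[str]], head: str,
                  patterns: Iterable[Pattern]) -> Set[str]:
    """Collect the token that directly follows `head` + a known pattern."""
    pats: List[Pattern] = [tuple(p) for p in patterns]
    found: Set[str] = set()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != head:
                continue
            for pat in pats:
                end = i + 1 + len(pat)
                if end < len(tokens) and tuple(tokens[i + 1:end]) == pat:
                    found.add(tokens[end])
    return found


# Toy, pre-segmented sentences (hypothetical; not from the Web or the corpus).
corpus = [
    ["颱風", "帶來了", "損害"],
    ["颱風", "造成了", "損害"],
    ["颱風", "造成了", "淹水"],
]
patterns = extract_patterns(corpus, [("颱風", "損害")])
top_patterns = [p for p, _ in patterns.most_common(60)]
terms = harvest_terms(corpus, "颱風", top_patterns)
```

In this toy run, the seed pair yields the two patterns from the example table, and reusing them picks up 淹水 as an additional candidate term.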
Methods
Step 1: Bootstrapping the seed terms (continued)
3. Yielding typhoon-related terms analogically
– The 60 lexical patterns were then used to bootstrap 200 more
typhoon-related terms from the Web.
Typhoon-related terms
颱風 (typhoon), 風災 (wind damage), 來襲 (hit), 災害 (damage),
莫拉克 (Morakot), 災情 (disaster), 警報 (alarm), 氣象局 (bureau
of meteorology), 豪雨 (heavy rain), 淹水 (flooded), … etc.
• Using these 200 typhoon-related terms, we can automatically
extract plurks such as:
– 「天阿!竟然連嘉義都淹水了!」
“My God! Even Chiayi County is flooded!”
– 「分享剛剛看到一篇關於淹水的好文章」
“I want to share a well-written article about flooding.”
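Keyword-based extraction of this kind can be sketched with a plain substring test. The list below holds only the ten terms shown on the slide, not the full set of about 200; the function name `extract_candidates` and the third, unrelated plurk are hypothetical.

```python
from typing import Iterable, List, Sequence

# The ten typhoon-related terms shown on the slide (the full list has about 200).
TYPHOON_TERMS = ["颱風", "風災", "來襲", "災害", "莫拉克",
                 "災情", "警報", "氣象局", "豪雨", "淹水"]


def extract_candidates(plurks: Iterable[str],
                       terms: Sequence[str] = TYPHOON_TERMS) -> List[str]:
    """Return the plurks that contain at least one typhoon-related term."""
    return [p for p in plurks if any(t in p for t in terms)]


plurks = [
    "天阿!竟然連嘉義都淹水了!",
    "分享剛剛看到一篇關於淹水的好文章",
    "今天晚餐吃什麼好呢",  # hypothetical unrelated plurk
]
candidates = extract_candidates(plurks)
```

Both example plurks from the slide pass this filter, although only the first one reports something happening at a specific place, which is why a keyword match alone is not enough to identify a local sub-event.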
Methods
Step 2: Classifying the plurks
• Recall the four main properties of an event:
.who, .when, .what, .where
Event.who => the user who posts an utterance on Plurk.
Event.when => the real-time nature represented by the
timeline.
Event.what => detected by the 200 typhoon-related
keywords.
What about Event.where?
Methods
Step 2: Classifying the plurks (continued)
• For the .where property, there is an observed pattern in
Chinese, especially in descriptions of emergencies or instant
events (TT = typhoon-related term, $ = punctuation):
$ Location + …… + TT + ……. $
“Ba-De Road is flooded.”
• “Location” includes all the place names listed by the
Taiwan National Post Office, as well as place-deictic
expressions such as here, there, home, … Rd., etc.
Methods
Step 2: Classifying the plurks (continued)
• To reduce inconsistency, we excluded patterns in which
punctuation separates the location from the typhoon-related term:
$ Location + … + $ + … + TT + … $
“The weather is clear in Taipei, it seems that the flood is gone.”
(Location = Taipei, $ = the comma, TT = flood)
• Interrogative forms are also excluded:
$ Location + … + TT ?
“Is Taipei really flooded?”
(Location = Taipei, TT = flooded)
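The location pattern and its two exclusions can be sketched as a clause-level check. This is one possible reading of the slides, not the authors' code: the tiny `LOCATIONS` and `TYPHOON_TERMS` lists stand in for the post-office gazetteer and the 200 terms, the set of punctuation marks is an assumption, and the Chinese test sentences for the Ba-De Road and Taipei examples are back-translations made here.

```python
import re
from typing import Iterator, Sequence, Tuple

# Hypothetical, tiny stand-ins for the real resources described on the slides.
LOCATIONS = ["嘉義", "台北", "八德路", "這裡", "那裡"]
TYPHOON_TERMS = ["淹水", "颱風", "豪雨"]

# Punctuation treated as the "$" boundary; question marks are tracked
# separately so that interrogative clauses can be recognised.
BOUNDARY = r"[,。!;、,.!;]"
QUESTION = r"[??]"


def _clauses(text: str) -> Iterator[Tuple[str, bool]]:
    """Yield (clause, ends_with_question_mark) pairs split at punctuation."""
    parts = re.split(r"(%s|%s)" % (BOUNDARY, QUESTION), text)
    # With a capturing group, re.split alternates clause, delimiter, clause, ...
    for k in range(0, len(parts), 2):
        clause = parts[k].strip()
        delim = parts[k + 1] if k + 1 < len(parts) else ""
        if clause:
            yield clause, bool(re.fullmatch(QUESTION, delim))


def has_sub_event(text: str,
                  locations: Sequence[str] = LOCATIONS,
                  terms: Sequence[str] = TYPHOON_TERMS) -> bool:
    """True if some non-interrogative clause matches Location + ... + TT."""
    for clause, is_question in _clauses(text):
        if is_question:
            continue  # interrogative forms are excluded
        for loc in locations:
            pos = clause.find(loc)
            if pos == -1:
                continue
            # The term must follow the location inside the same clause,
            # so no punctuation can intervene.
            rest = clause[pos + len(loc):]
            if any(t in rest for t in terms):
                return True
    return False
```

Splitting at punctuation first means the "Location … $ … TT" exclusion falls out naturally, because the location and the term then end up in different clauses.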
Evaluation and Result
Evaluation
• Evaluation is based not on subjective
judgment but on authoritative media reports.
• We use the Liberty Times newspaper of
August 9-10, 2009 as the gold standard.
• Candidate sub-events that are reported in
the newspaper on the following days are
counted as matches.
Evaluation and Result
Result
• Out of a total of 11,514 plurks on August 8, 2009,
approximately 550 were classified as
candidates carrying a typhoon-related sub-event.
• Precision: 73.45%
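Precision here is the share of candidate sub-events that are matched by the gold standard. A minimal sketch of that calculation follows; the function name and the labels in the toy sets are hypothetical and are not the data behind the reported figure.

```python
from typing import Iterable


def precision(candidates: Iterable[str], gold: Iterable[str]) -> float:
    """Fraction of detected candidates that also appear in the gold standard."""
    detected = set(candidates)
    if not detected:
        return 0.0
    return len(detected & set(gold)) / len(detected)


# Hypothetical labels, for illustration only.
detected = {"Chiayi flood", "Pingtung flood", "Tainan blackout", "Taipei flood"}
reported = {"Chiayi flood", "Pingtung flood", "Tainan blackout", "Kaohsiung collapse"}
p = precision(detected, reported)
```

In this toy case three of the four detected sub-events are matched, giving a precision of 0.75; sub-events reported only in the newspaper do not affect precision.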
Evaluation and Result
• The emergency reports match those
announced by the Central Weather
Bureau or the media, and notification is delivered
faster than by the government or the media.
• Locations are reported more specifically in plurks:
“The flood waters near Er-Ren stream are knee high.”
南科大門口的樹都被吹倒了,大家別從那裡經過。
“The trees in front of the Tainan college were blown down; don’t pass by there.”
[Figure: precision rate on sub-event detection]
[Figure: rainfall alarm given by the Central Weather Bureau on August 8]
Evaluation and Result
Error analysis
• Irrelevant plurks
祈禱南部受風災的朋友們一切平安。
“I pray that friends in southern Taiwan affected by the typhoon damage are all safe.”
看新聞才知到屏東淹水好嚴重,雨量已經累積到2146mm…
“I only learned from the news that the flooding in Pingtung is very serious; the
accumulated rainfall has reached 2146 mm…”
Local Disaster Alert Map in Real Time
Conclusion
• Early detection of local emergencies for rapid response
during disasters also demands attention.
• We use Plurk in a case study of the
Typhoon Morakot disaster in Taiwan.
• Our proposed method combines a bootstrapping
approach with linguistic patterns of instant events to
perform the task, and it yields promising results.
Conclusion
• With more data, we expect to be able to
construct an event ontology automatically
for disaster information management.