
>> Andres Monroy-Hernandez: Welcome to this presentation. I have the pleasure of introducing Axel Schulz. He is a PhD student and works as a research associate in the Telecooperation Lab at the Technical University of Darmstadt. Is that how you pronounce it? He also works at SAP Research. I met him at [indiscernible] USM earlier this year in the summer, and I thought he was doing really interesting work analyzing Twitter using machine learning and other methods to understand small-scale incidents like car accidents and so on. He is using a lot of data from Twitter, so that's also connected to some of their work [indiscernible]. So.
>> Axel Schulz: Thanks. First, thank you for inviting me here to share my research with all of you. The topic for this talk will be microblogging during small-scale incidents. As said before, my name is Axel Schulz. I work part-time for the Technical University of Darmstadt and I am also employed by SAP, which is more or less famous for its business software. There is a program that allows PhD students to stay at SAP while working with the university, and that's what I am doing. I am in the last months of my thesis; I just have to write it down, but as most of you know, writing it down is not that easy. You want to write the next paper and go on and on researching, especially if you have an interesting research topic. But everything has to come to an end. So much about me.
First I want to tell you something about Darmstadt. Darmstadt is located near Frankfurt in the Rhine-Main area, where a lot of computer science and research companies are located. The university itself is the only technical university in our state, not in Germany but in Hesse. We have two main campuses, one for social sciences and the other for computer sciences or technical sciences, with over 25,000 students. Around 2,000 students are studying computer science. SAP founded a research lab there in 2006 because of the location close to the university, with a lot of collaborations with the technical university. So that's why SAP is there.
At SAP, where am I working? It's the HCI Research Group. We're doing more or less human-computer interaction research focused on smart interactions and flexible collaboration, using for example surfaces for crisis management applications. We can talk about this later if you like. We also work on information exploration, that is, exploring large datasets, and the main topic currently is context-awareness. Bringing context-awareness to business applications is very important, especially for SAP, and we look at how to infer the context and find the information that the user really needs. My research is also related to this, because finding the relevant information in the large amounts of information that is out there is more or less what I am doing.
On the other hand, the Telecooperation Lab, with Professor Max Muhlhauser, is also related to HCI topics. We are doing telecooperation research in the peer-to-peer networks area, smart area networks, and smart sensing, and also interaction topics like Talk'n'Touch or tangible interaction. We are also involved in smart privacy and trust research in the so-called CASED research department, which is a special department focusing on privacy and trust issues.
So this is where I am from and a bit of background on both of my employers. Now let's have a look at my talk. What is the motivation of this talk? As decision makers, for example here in Seattle, you have to decide what to do when an incident occurs. For example, if there is a fire in a building or a car crash happening, then county emergency management staff get their information from on-site rescue squads. They are communicating by radio and telling them: there is a fire, there are two persons injured, we need more rescue squads or anything else to help. On the other hand, we have traditional [indiscernible] systems, traditional emergency management systems that are very specialized for decision making in crisis management. All the information that is out there is published via 911 calls into these systems, and decision makers know how to handle all of these information sources. So they have their own situational picture of what is going on out there. On the other hand, we have bystanders, citizens who are reporting information about what is going on in the city. For example, they publish microblogs around neighborhood topics, or they publish microblogs related to incidents, on the one hand incidents that are already known to the decision makers, and on the other hand previously unknown incidents. Currently we have the situation that this additional information is not usable for decision making, and this results in a fragmented situational picture. Because if a tweet with information about an incident is shared that was not reported before by the on-site rescue squad, then the decision maker misses important information.
So summing up: currently, valuable user-generated content is not used for decision making in emergency management. Because if you look at social media, it is unstructured, completely unstructured. It is wherever [inaudible]. We have microblogs with 140 characters. We have Facebook posts. We have many other different types of shared information. Also, we have large amounts of information, and for real-time decision making in crisis management there is absolutely no time to look at all of the tweets that are out there.
So this is the situation that we have. And my vision, the vision of my PhD thesis, is to identify the relevant information in the large amounts of information that is out there, and to make it usable for decision making in crisis management. The result is that a decision maker gets an enhanced situational picture, because he has new information that was not there before. So to sum up: before, user-generated content was not usable, and with the things that I present here we make user-generated content usable for decision making in crisis management.
From this we derive three research questions. The first one is how to classify user-generated content: if we take events, for example, we have to classify user-generated content based on the spatial, the temporal, and the thematic dimension. Second, we have to identify the relevant information: which information is the information that I need? And third, how do we get more information out of user-generated content? For example, if we have multiple microblogs talking about the same incident, then we should aggregate this information into one event, so the decision maker only has to click one event cluster and sees: oh, there are five microblogs related to this event.
During this talk we will focus on everyday small-scale incidents like car crashes, fires in buildings, and shootings that happen every day. They have limited spatial and temporal extent: a car crash, for example, may be over after thirty minutes, and only a few people are affected. Compared to large-scale events, the amount of available information is rather low, because for example only one or two tweets are published per small-scale incident, which makes it much more difficult to detect the important information that is out there.
The agenda for this talk, after the motivation and vision that we have seen: I want to introduce to you a general pipeline for processing microblogs for incident detection. I want to show you how microblogs are preprocessed, then how we classify and how we refine our classifier for detecting incident types. And then I want to show you what we can do with the information that is out there based on our classifiers. We can do a lot of interesting things.
First I want to show you a very general overview of what a pipeline for incident detection looks like. This is the pipeline that I developed for my PhD thesis. The first step is that we collect different information about the things that are going on. For example, we assume that humans act as so-called soft sensors. They share information as bystanders and use their smartphones, for example, to share texts, videos, and pictures about the ongoing incident. The problem here is that we don't know which person to trust; we have credibility issues. People are just telling us there is an incident, but they don't tell us where the incident is. So we have a lot of problems when this information comes in.
Furthermore, we might have some semi-structured information. For example, we have specialized mobile applications that were developed in the so-called participatory sensing environment. We ourselves developed the so-called Incident Reporter app, a specialized mobile application that can be used by people to share incident reports. We are also working on noise mapping: you walk around a city, collect noise measurements, and this helps us infer noise sources, which is very important for different management [inaudible] to understand when something is loud and why something is loud in the city. On the other hand, as mentioned before, we have very unstructured information like social media, and in my talk I am going to focus on microblogs as one example.
As the next step, when all this information comes in, we preprocess it in an automatic preprocessing and filtering step. In this case the definition of an event is important. An event can be defined as something that happens at a particular time and place. So we have three dimensions: the type of the incident or event, the spatial extent, and the temporal extent. This brings us back to our initial question. For differentiating certain events, we have to identify the spatial, temporal, and thematic dimension of each tweet to differentiate noise from the relevant information. This is done in the automatic preprocessing and filtering step.
In the next step, when all the information is collected and inferred, we can use machine learning to classify this user-generated content, for example for differentiating incident types. In our case we differentiate three incident types, fire, shooting, and car crash, plus a class for no incident. Good training data is needed for this, and one way to obtain good training data is so-called crowdsourcing. Crowdsourcing means that the crowd can provide labeled information; we can use the crowd to label tweets for training our initial classifier. The crowd might also be used for relabeling wrong information, because the classifier is not 100% [inaudible], so we can use crowdsourcing for refining. But crowdsourcing is also limited, because during time-critical situations we don't have the time to call on the crowd and say: help us now, we have an incident here, label this information. This is not possible. So our approach is to combine the wisdom of the crowd that is out there with the power of algorithms, for example machine learning.
As a last step in our pipeline, we can provision the now structured information via a so-called virtual sensor. We call this a virtual sensor because you can just say: deploy a sensor in the city of Seattle that detects small-scale incidents. The sensor collects the information that is out there, processes it, and presents it as a structured information base. In our case, the structured information base might help us to improve the situational picture of the decision maker. So this was the general pipeline, and we will now have a closer look at all of these steps. The first step is how to thematically, temporally, and spatially classify user-generated content.
Take this tweet: "RT @people onoe friday afternoon heavy traffic accident on I90 right lane closed." These are parts of tweets that are really out there, but I constructed some additional parts because it's easier to show what happens based on a pre-constructed example. The first thing we do is preprocess this tweet with textual processing. First we remove the retweet marker and the @-mentions, that is, the person the tweet is addressed to, and we remove special characters that might be present.
Later on we resolve abbreviations. For example, the "onoe" can be resolved to "oh no", which is helpful because people might use different abbreviations for different things, and this way we can use the real words behind these abbreviations.
In the next step, we annotate and replace spatial mentions. For example, the I90 is a spatial mention, and we just replace it with the common @LOC token. This is important because if we want to infer textual similarity, then comparing I90 to I80 or I75, which might also appear, is not easy or not possible. But using a common token like @LOC, we are able to find some textual similarity. The same holds for temporal mentions: we detect "Friday afternoon" in our tweet as a temporal mention and replace it with a @DATE token. We can do the same with time mentions that might be present in microblogs. Then we apply standard text processing steps like the Stanford lemmatization function or the POS tagger. The POS tagger is important because in our case we found out that only nouns are valuable for incident detection; for different use cases other word categories might be beneficial.
As mentioned before, text similarity is not always sufficient. For example, if we have the two messages "traffic accident @LOC lane closed" and "car collision @LOC", then we find textual similarity based on the @LOC token, but the textual similarity based on the rest of the tweets is not high. On the other hand, we might detect some higher-level concepts behind the words. For example, accident and collision are somehow related. In our case we use linked open data, which is a source of a lot of interlinked information, extracted from Wikipedia for example and annotated with categories and types, to infer more common concepts. For example, accident and collision share the category Accidents, and if we use this more general category, then we can detect not only textual similarity but also conceptual similarity between two tweets.
This is done using the so-called FeGeLOD framework. FeGeLOD stands for feature generation based on linked open data. In this framework, DBpedia Spotlight is used for detecting entities like traffic, accident, I90, and lane. And here are some examples: there are, I think, more than 50 links that are extracted. Here, types like accident and road or categories like causes of death are extracted, and we can use these higher-level concepts for finding conceptual similarity to other tweets, for example.
So our idea is to make use of the named entities, and of the higher-level concepts behind these named entities, later on for machine learning, as additional machine learning features.
As mentioned before, temporal filtering is important, though it is not obvious why. In our case we assume that the creation date of a tweet is not necessarily the time when the incident occurred. For example, I can say: oh, my brother was involved in a car accident last year. And this is not relevant for us now. So we extract, or try to extract, the real incident date based on the content of the tweet. Also, as shown before, we use this mechanism to extract or replace the temporal concepts in tweets.
For this we use the HeidelTime temporal tagger, which was developed for documents like Wikipedia articles; we customized it to be usable on microblogs. If we have a tweet like the one from before, created, say, on Tuesday the 19th of February, with the temporal mention "Friday afternoon", then we can infer a time, namely the Friday before this Tuesday, and say the incident was more likely on that last Friday and not on the Tuesday when the tweet was created. Also, as shown before, we can use this approach to annotate the message and replace the temporal mention.
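The core of this resolution step can be illustrated in a few lines. This is a simplifying sketch that only handles bare weekday mentions, resolving them to the most recent such day before the tweet's creation date; HeidelTime handles far more expression types.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def resolve_weekday(mention: str, created: date) -> date:
    """Resolve a bare weekday mention to the most recent such day
    strictly before the tweet's creation date."""
    target = WEEKDAYS.index(mention.lower())
    delta = (created.weekday() - target) % 7
    return created - timedelta(days=delta or 7)

# Tweet created on Tuesday, 19 February 2013, mentioning "Friday":
print(resolve_weekday("friday", date(2013, 2, 19)))  # -> 2013-02-15
```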
There's another important step. Spatial filtering is necessary because we want to know where the incident occurs. And when we use microblogs, we have the major issue that only 1% or around 2% of all tweets are explicitly geotagged. Thus a lot of tweets cannot directly be used for our use case.
Currently no simple approaches are applicable, like using the IP address of the user, because we don't have this information, or just using the Twitter search API, because the search API is also incomplete and error-prone. When inferring locations of tweets, we have to find good mechanisms to know where the incident occurred, or where the user was when he sent the tweet. And if we work on this research topic, then we have to cope with toponym resolution. Toponym resolution means that tweets contain location information in the form of proper names in text, and we have to disambiguate them. Here we differentiate two problems. The first is the geo/geo disambiguation problem. For example, if we extract the proper name Paris from a text, then we don't know which city is meant, because there are 23 cities called Paris in the USA alone. So which city is referred to by the user if he posts the tweet "the car crash happened in Paris last year"? We don't know. Furthermore, we have to cope with the geo/non-geo disambiguation problem: if a proper name like Vienna is used, then it can refer to a city, but it can also refer to a person's name.
This whole topic of spatial processing is one of my main research topics, so I want to show you how this is done in our research. I find it very, very interesting, because except for only a few, microblogs have no geotags, and inferring geotags for microblogs is a very interesting research topic.
We found out that for microblogs we have a lot of spatial mentions. We call these spatial indicators. For example, we have the tweet message. The message might contain toponyms, or it might be a check-in like we have in Foursquare, a check-in at a certain location. There we have a link; we follow the link and then we know at which venue the person checked in. The user may have filled in the location field of his user profile to say: this is my location. He can enter a website. We can use, for example, the top-level domain to infer which country the user is most likely from, or we can use the time zone. And for all of these spatial indicators that are out there, different means of processing are necessary. What we do is combine the information that we can extract from one tweet, which is a lot, to infer the location where the tweet was sent. The general idea is that for every spatial indicator we can extract a polygon in the world. Here is one polygon, say for France. Then we have another one and another one, maybe for Paris, and another one for a small district of Paris. What we do is intersect all of these polygons, and the resulting polygon with the greatest height in the end is our estimate. For example, here we have three polygons, and if we intersect the three polygons, then we have a resulting polygon with height 3, which is much more than the two polygons which might have a height of 1. So you can say this is the location that is most likely to be the location where the tweet was sent.
So summing up this approach: we extract spatial indicators from microblogs and map them to polygons around the world, which we built using SQL spatial extensions. We built up a large collection of polygons describing administrative areas at six levels of accuracy. For example, we have Manhattan as a polygon, we have the states, we have the USA, we have the time zones. All these polygons are in our database, and then we can infer which polygons might be related to France or to the city of Paris.
Then we assign a height to each polygon. The height allows us to model the certainty of a spatial indicator. For example, if we know that a person checked in at a certain point, then this location is very accurate. If we have the time zone, which is a very large polygon, then we know this spatial indicator is very inaccurate. We might also get quality measures provided by external services as confidence scores: for example, if we extract locations for the city Paris and obtain 20 results, then these APIs might provide quality or confidence scores for each estimation. And as I said before, we stack the resulting polygons on top of each other so we have 3D shapes with a height, and the highest area of the intersection of our polygons is our final estimation of the location.
So coming back to our example: we have the Twitter check-in where a city is mentioned, and we have the user profile. We extract the different spatial indicators using different APIs. For example, we call the Foursquare web page; we use DBpedia Spotlight for annotating the entities; we use a gazetteer called GeoNames for inferring location mentions in the location field; and so on. Then we infer the coordinates related to these spatial indicators and map them to polygons. Then we provide them with quality measures: for example, as said before, a Foursquare check-in should count as very high quality, while the time zone has a much lower quality in the end. Then we stack all of the resulting polygons on top of each other, and the highest polygon is our assumed position, with a certain confidence. This is the whole approach.
I did not mention this before, but if you have questions, I will stop at that point; maybe later on we will have too much to discuss. And we evaluated this approach with 1 million tweets. Yes?
>>: It seems like another clue you could use is the fact that people can only move so fast. So if I tweeted an hour ago, not that long ago, then I can't be that far away. Could that be another clue?
>> Axel Schulz: Yes, we could infer so-called mobility patterns of people. But the important point here is that we make use of only one tweet by one person; we don't need all the other tweets. It's not that easy to collect tweets. We can collect tweets of certain user profiles, we can say: give us the tweets of this user. But doing this for all the users that are around is not feasible in the end. I know the idea and it's very good, and it is done in other research, but for real-time detection it's not applicable, I would say.
As I said, we evaluated our approach with 1 million tweets that have a device location, so the device location is our ground truth. We found out that with our approach we are able to estimate the real location of a tweet with a median accuracy of 30km, which is quite good at the city level. We could do this for around 92% of all tweets contained in our set.
When we talk about incident detection, you might directly say: 30km is not enough. Knowing that the incident occurred somewhere in Seattle is nice, but we want to know at which intersection the incident occurred. Still, it's important to mention that 30km is much better than knowing nothing about the location. So this is the first step, to infer the city where something might have happened. Then we have to cope with the problem of street-level geolocalization, because we want to know where the incident actually occurred.
>>: [indiscernible] the tweets that you trained on all have geotags [indiscernible]. Have you compared the set of features in those tweets with non-geotagged tweets? Because you might expect that people who know their tweets are being geotagged don't put locations in the text.
>> Axel Schulz: Actually no. Because how would you get a ground truth for these tweets?
>>: Well I’m just saying so you compare a set of tweets that have been geo tagged and say like okay, ten
of them have the location …
>> Axel Schulz: I know, I know what you mean. I haven't tried this before. I will check it, but it would be interesting to see if the things that people report about differ, yeah. I will do this. Yeah, I will write this down later. Another question? No.
So street-level geolocalization is the next step. We retrained a special model for Seattle using the Stanford Named Entity Recognition toolkit to identify named entities, and then we geocoded these named entities at street or building level. For example, if we have this tweet, we can detect I90 as a spatial mention, a proper name, and then we can geocode the location of the I90, which is quite easy because there are a lot of APIs around that give us the location or the polygon surrounding this highway. Additionally, we can use this approach to detect named entities and replace them with the [indiscernible] location mention.
For Seattle we were able to detect location mentions with an accuracy of around 90%, which was because we trained a model specifically for Seattle. I don't know how the same model might perform on different cities. There are similarities between cities, but in contrast to the more general approach I showed before for inferring the city, such a street-level approach has to be developed for every city on its own, because the way people talk about locations might differ across cities. So for the automatic preprocessing and filtering step, we have seen how to temporally and spatially classify user-generated content and how to thematically preprocess it.
In the next step we want to know what the actual type of the incident is. This is done in the automatic classification step for incident types. Yes?
>>: Before you move on, I have a quick question. So you said you removed all [indiscernible]?
>> Axel Schulz: Yes.
>>: Did you look at [indiscernible]? Because you might expect, and actually I've done some stuff in my own work that shows that incident-related tweets are much less likely to have mentions of other people in them. So that might actually be valuable information. Just the fact that you are talking about a certain other person might mean you're not talking about an event.
>> Axel Schulz: Okay, yes. [indiscernible]
>>: Or perhaps the mention of a specific thing, like the Seattle PD or something like that.
>> Axel Schulz: Yes. No, actually we don't use those features, but we will now come to the next slide and then we can discuss these ideas. What we do is extract features for our machine learning problem, so we have to transform our set of tweets into features that might be valuable for machine learning. In our case we experimented with word unigrams and character n-grams, because the Twitter guys told us that character [indiscernible]-grams are most valuable for doing machine learning, which is not true for our set, but maybe for their datasets. We used syntactic features like the number of words, the number of characters, the number of "!" and the number of "?". We experimented with sentiment features. We used text similarity scores. We used the spatial and temporal features that we extracted before, and we used the linked open data features, that is, the concepts. And as you said, maybe we could extend this with the absence of @-mentions or anything else; it would be nice to test this.
>>: [indiscernible] Retweet attributions might not be as useful, but mentions of specific emergency management organizations, or [indiscernible] when directing something at someone, might mean it's not about that incident.
>> Axel Schulz: Yes. But if we used, for example, the mention of a certain emergency management organization, it could result in the problem that our model is overfitted to this mention.
>>: [indiscernible] city is so big anyway, so maybe it …
>> Axel Schulz: Yes. You could try this out. [indiscernible]
>>: [indiscernible]
>> Axel Schulz: Yes. We tested all the different combinations of these features and different machine learning methods, using the [indiscernible] Meka toolkit and [indiscernible] vector machines, and evaluated our approach with a training dataset collected in the city center of Seattle and a test set from the city center of Memphis, Tennessee. Here are the numbers: the first set consists of, I think, 1,200 tweets, and the test set consists of, I think, 1,608 tweets, related to certain incident types or not related to incidents at all. We found out that the best combination of features is to use word 3-grams plus [indiscernible] filtering, where for example only nouns are valuable in our case, plus concept features, syntactic features, and TF-IDF scores. Then we can achieve an accuracy of around 82% on the test set.
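As an illustration of combining such feature groups, here is a minimal sketch with scikit-learn rather than the Meka toolkit actually used; the toy training tweets, the feature choices, and the linear SVM are assumptions for the example.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC

def syntactic(texts):
    """Syntactic features: counts of words, characters, '!' and '?'."""
    return np.array([[len(t.split()), len(t), t.count("!"), t.count("?")]
                     for t in texts])

model = Pipeline([
    ("features", FeatureUnion([
        # word n-grams up to 3-grams, weighted by TF-IDF
        ("ngrams", TfidfVectorizer(ngram_range=(1, 3))),
        ("syntactic", FunctionTransformer(syntactic, validate=False)),
    ])),
    ("svm", LinearSVC()),
])

# Toy training data; the real sets had ~1,200 / ~1,600 labeled tweets.
X = ["heavy traffic accident on @LOC right lane closed",
     "fire in apartment building @LOC",
     "shots fired near @LOC",
     "great coffee this morning"]
y = ["crash", "fire", "shooting", "none"]
model.fit(X, y)
print(model.predict(["two cars collided on @LOC"]))
```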
>>: What did you say is LOD?
>> Axel Schulz: We add the concepts as features, the linked open data features. So, for example, the replaced concepts accident or road. That 82% is, yeah, quite a good score, I would say. The dataset is not that large, but as mentioned before, it's not easy to find good incident-related tweets and to label them. It's no fun to collect 6 million tweets and to identify the 100, 200, 300 tweets that are valuable for incident detection. So, yes sir?
>>: You had some measure of how often each type you were able to [indiscernible]. You still get like 70%, maybe more, because your test dataset is 70% [indiscernible]. So I'm just curious, do you have some rough indicator of, you know, other than the car accidents, how many were classified correctly of the fires?
>> Axel Schulz: How many are there in the real world?
>>: [indiscernible] If your machine learning just said everything is one class, [indiscernible] you'd get 75% or something, how long …
>> Axel Schulz: I had the baseline [indiscernible]
>>: No, no. I’m just curious, so [indiscernible].
>>: A [indiscernible] I think is what you want.
>>: Yeah.
>> Axel Schulz: Okay, I don’t have it here but we can maybe look [indiscernible]
>>: I was generally curious if [indiscernible]
>> Axel Schulz: It was. The confusion matrix looked good, but the issue, well, not the main problem, was that mostly the decision was between incident and not an incident, not between car crash and fire. There is other research we are currently doing, because we found out that, I think, 90% of all incident-related tweets indicate only one incident type. But for the 10% of tweets that state two types, for example a car crash where the car burns, this is a different problem. In this case this kind of classifier is not applicable, so we are working with multi-label learning to infer multiple types. Multi-label learning is also beneficial because if we know that there is an incident, we might infer whether the tweet states something about injuries. So we can detect the incident and find out that there are no injuries, or that there are two persons injured. So with multi-label learning you can do a lot of other things. Yes?
>>: Take the car crash where the car is on fire. Do you count that as two incidents or one incident?
>> Axel Schulz: No. In this case the classifier decides for one. Most likely for the car crash in this case I
would say.
>>: [indiscernible] is that like something that might be coming in, like streaming, or is it …
>> Axel Schulz: Whoa, whoa, whoa, let me come back to that. [indiscernible] we used the search API, because we found that with the search API we get a sufficient sample for the cities. In our case, for the training set, we collected around 6 million tweets in December 2012.
>>: Just geo samples?
>> Axel Schulz: No, not just geo …
>>: Sort of like key word samples or how did …
>> Axel Schulz: Then we did keyword sampling based on incident-related keywords, which are [indiscernible], and then we selected another sample set and manually annotated it, which [indiscernible] around, I don't know, 40,000 tweets.
>>: So like the proportion of the [indiscernible] in your datasets is probably higher than [indiscernible].
>> Axel Schulz: Because of the key word search, yes. [indiscernible]
>>: [indiscernible]
>> Axel Schulz: Okay. These are the results for the simple classifier. The first question that comes to mind is: are these results transferable to a different city? Can we use the same classifier for a different city? I prepared these results two weeks ago to show you how our classifier performs. I collected 90,000 tweets in New York City over one hour and applied our classifier that was trained for Seattle. In this one-hour period I could detect only 15 incident-related tweets where the probability that an incident is mentioned was higher than 75%. So 15 tweets is fine for a city like New York City, for one hour, I would say. And here's the proof: just one tweet stating there's a fire in apartment 3A and so on. It was sent by, I don't know, maybe an official emergency management organization, but it was detected by our classifier that was trained for Seattle. So the classifier performs well on a different city without adapting it to New York City. But it also found this tweet: "AlGoesHard causing fires on Instagram." Yeah, it might be related to a fire incident, and it was identified in this incident-related tweet set. This shows us that a refinement of the classifier is needed and should be done before using it on a different dataset.
This is what we are currently experimenting with. We use the crowd to reclassify tweets that were created in a different city, relabeling these incident-related tweets. For this we first developed, not a mobile application, but a web application as a prototype, listing a lot of different incidents, for example, motor vehicle accident, freeway, at a certain place. It shows the tweets and the pictures extracted from tweets related to this, and then users can go to the platform and vote: yes, this is correct, or this is not correct. Currently I am extending this so that users can say what the real type of incident mentioned in the tweet is. We can use this mechanism to refine our classifier. I did this with 693 instances and retrained the model, and yeah, the accuracy improved for a training set with 1,700 instances. So retraining is very valuable in our case, and combining the power of algorithms with the wisdom of the crowd should be done.
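The refinement step itself is conceptually simple. Here is a minimal sketch, reusing the hypothetical model, X, and y names from the classifier sketch above; since LinearSVC has no incremental-update method, refinement here just retrains on the enlarged labeled set.

```python
# Hypothetical crowd-relabeled tweets from the new city.
crowd_X = ["fire on 5th ave, two trucks on scene"]
crowd_y = ["fire"]

# Retrain on the original training data plus the crowd labels
# (693 relabeled instances, ~1,700 total, in the real experiment).
model.fit(X + crowd_X, y + crowd_y)
```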
So we have seen that we can classify incident types with machine learning, that a classifier built for one city can be applied to a different city, and I have shown you how a model can be refined for a different city. The model might also be refined for different incident types, because we are only covering three very generic incident types.
Finally, I want to show you how to infer new information based on these pre-classified tweets. The first idea is that we aggregate incident-related tweets into events based on the spatial, temporal, and thematic dimension. In our case we used a plain, simple rule-based aggregation algorithm that works like this: if two people report the same event type, like a car crash, at the same location or within a certain radius, for example a 50m radius, and within the same temporal extent, for example 30 minutes, then we assume it must be the same car crash. It could be that one car crash is over here and the next crash happened over there, but in our case we assume that it is the same event. And so we can aggregate microblogs and information that is out there [indiscernible] three dimensions into one event, clustering all the information.
All the slides I will show you now are based on this approach. We used a tweet dataset collected in March. We applied our classifier to 600,000 unfiltered tweets and could detect 347 incidents. This is also a very good point to explain why we used tweets from Seattle, which is not that obvious: we used the Seattle real-time fire calls, which can be obtained from data.seattle.gov. This is why I like Seattle. Seattle is the only city in the world that publishes incident information, 15 minutes after the incident occurs or after the information is in the emergency management system, on a webpage where everyone can get it. And we collected 141 real-world incidents for the same time period.
So now we have the incidents that were detected in the tweets and the incidents from the real world, and we can compare them to each other. First we analyzed the user types: which users are reporting about incidents? We differentiated five user groups: emergency management organizations, organizations not related to emergencies, traffic reporters, journalists, and citizens as individual users. Here is a graph where we can see the number of users and the number of tweets for every event type and every user group. We found that most of the tweets, 56%, are shared by official organizations like emergency management organizations or other organizations. But 33% of all tweets were shared by individuals. It is interesting to see that more tweets are shared for shootings; this might be because people are more affected somehow by shootings. In our data there was a shooting at a library at the university here in Seattle. Okay, you haven't heard of this, but lots of people shared information about that shooting.
>>: [indiscernible]
>> Axel Schulz: Yeah. And so we had more tweets around this shooting which happened in the city. We also found out that when individual users contribute, they contribute only one or at most two tweets per incident type, which is also very important, as I mentioned at the beginning of the talk. Yes?
>>: About the 33% that individual users shared, do you have a sense of how many of those contain mentions of organizations?
>> Axel Schulz: No.
>>: [indiscernible] some of the tweets [indiscernible] like they might be retweeting the emergency
organization. Even though …
>> Axel Schulz: Yes, yes, I know what you mean. We haven't checked [indiscernible] official sources, but we checked, and this is the next slide, who is reporting first, which is [indiscernible] covered in this problem. We found out that for 65% of all incidents the organizational users reported first, and only 23% of all incidents were first reported by individual users. This does not necessarily mean that this is correct, because by using the Twitter search API we get a biased sample. So these are just numbers that are valid for our dataset, maybe not for the real world.
But what is interesting to see in our case is that when individual users report about incidents, they report much faster: in our case, 24 minutes faster compared to the other user categories. This means that if we could detect tweets posted by individual users, we would get more recent, and hopefully more valuable, information.
As mentioned before, we also analyzed the correlation between real-world incidents and incidents detected in our tweet set. Using a manual comparison based on the three dimensions, we found that we could detect 81% of all car accidents in the Seattle [indiscernible] set, 68% of the fire incidents, and 75% of the shooting incidents, summing up to 73% of all real-world incidents, which is quite good. So we can say we are detecting three quarters of all the incidents that are in the emergency management system just by using microblogs.
So, finally, I have shown you that a variety of individual users report about incidents, and these users report faster than official sources. Also, the correlation between information in tweets and real-world incidents is quite high. So I can say that incident detection is valuable and can contribute to the situational awareness of decision makers.
Summing up my whole talk: I started with the first research question, how to thematically, temporally, and spatially classify user-generated content, and I have shown you how to preprocess and filter microblogs based on the spatial, temporal, and thematic dimensions. I showed you how to identify the relevant information using machine learning techniques and how to refine an existing classifier. And I've shown you how to infer new information, like aggregating single tweets into events, and how to understand what the value of microblogs is for emergency management, using our dataset.
So I conclude that small-scale incident detection is feasible. We can do this. And hopefully I have also shown you that it is valuable for decision makers. I hope that someone continues with this work; if not me, maybe you are interested in this. Maybe we can collaborate on this; I think it's really valuable and really interesting, especially to see whether the things that we learned from Seattle are applicable to different cities. And I have heard different people talking about this; I've talked to decision makers in New York City and they want such an application that makes user-generated content usable. So this research topic is highly interesting and a lot of things can be done. Thank you very much for your attention. [applause] Yes?
>>: Do you think you could set up any kind of incentive to get people to tweet more often about
incidents like this?
>> Axel Schulz: Um…
>>: Here's a suggestion for one. Maybe if you published all the incident tweets in a central location that anyone could come to, right, then I might feel proud that one of mine showed up there.
>> Axel Schulz: Okay. It could be done. I'm not sure if this is possible with the existing social networks. I think it can be done with specialized applications that were developed for incident reporting. The idea is nice and it could be realized, but how do you identify the user that sent a microblog? It may be that the user does not want the rest to know that he is the one. But I can follow your idea, or your vision.
>>: I can also read your response.
>>: [indiscernible]
>> Axel Schulz: This is what we are also investigating, to understand the [indiscernible] aspect of these [indiscernible]. We want to find out how to make incident management applications viral. When are people using this kind of application? When are they sharing incident-related microblogs? Our answer is that in most cases it is plain altruism of the people out there. Maybe you can incentivize people to send in valuable information, but it shouldn't have to be the case that people are incentivized.
>>: So, one of the things we talked about centers around how you classify the tweets into overlapping events. You made the comment that if you have an event of the same type, at the same location, and at the same time, you just sort of assume it's the same event. And given the time intervals and space intervals involved, you might see all the events in a city and say okay, this is an accident in Seattle. Have you given any thought to being finer-grained than that? Or, alternatively, doing event detection where events are more common: say, I don't know, [indiscernible] a police officer or [indiscernible] a red light, and you know these people are at the same red light or the same bridge crossing. That kind of thing, where it is much more ambiguous which event a report corresponds to, and whether or not your system can be extended to handle that.
>> Axel Schulz: So the question is if I have looked at this research?
>>: Yeah. If you have looked at this, you know, problem as an extension to your [indiscernible].
>> Axel Schulz: I think the main problem is the classifier part. Extracting the temporal and the spatial dimensions can be done, but building some automatic classifier for this type of event may be very difficult. I don't know; I think what you want to do is very general. So for building a classifier, you should think of what I presented with this refinement approach: an interactive classifier that starts with some general types and then becomes more complex the more labels I have. So we get new labels, and from a very high-level type I can extract more fine-grained types, and through all these iterations this might be possible. I can imagine such a system. The main problem is the effort: you need a lot of people to re-label the things, and I don't know if this is even possible.
>>: So, and this is sort of a larger question. Say this system turns out to be really accurate and it gets implemented, and decision makers in Seattle are using it actively in the [indiscernible] situation. Have you thought at all about the social consequences of that, with some events or areas being over-reported and some under-reported? Because in your dataset you could look at this very easily with the events that you didn't identify: do they come from specific areas that have specific demographic characteristics?
>> Axel Schulz: Well, I haven't looked at this, but it is also very interesting. Yeah, no, I'm not a social scientist, guys, so these questions are more or less out of my scope. I just investigated some situational features, so the main question remains: are these tweets valuable in the end? We just say these tweets state or have some incident types, some temporal and spatial mentions about what is written there.
>>: So with your test, basically, you didn't check whether the events that you did capture had specific characteristics or were from specific areas?
>> Axel Schulz: No. Not from specific areas. We just looked at the content based on these three
dimensions here.
>>: [indiscernible]
>> Axel Schulz: [indiscernible] This would be very interesting. The next paper, for next year. Okay.
>>: Are you familiar with Tweak the Tweet?
>> Axel Schulz: Yes.
>>: That sounds like a great example of [indiscernible] how to reclassify tweets through translation.
>> Axel Schulz: Tweak the Tweet. The main problem is that you have to, you have to train the crowd to use this special [indiscernible].
>>: [indiscernible]
>> Axel Schulz: Yes, but ….
>>: [indiscernible] But yeah, I was thinking about Tweak the Tweet at the beginning of the presentation …
>> Axel Schulz: But my idea is that if people would use a special [indiscernible] syntax, then they would also use a specialized mobile application that was designed for the same purpose. And this is much more valuable, because we get a lot of the information that is needed; we can ask the people to provide us more information. I have seen the value of Tweak the Tweet, but as I said, if I have to train the crowd, I can use much more sophisticated mechanisms, and without training the crowd, finding this information is, yeah, a very different problem. But yeah, I am familiar with [indiscernible].
>>: So you talked about ambiguity around temporal mentions.
>> Axel Schulz: Yeah.
>>: Temporal mentions or spatial mentions, excuse me, spatial mentions. So Paris could be [indiscernible], and it didn't seem like you were resolving ambiguity over abbreviations and temporal mentions. So for example, EMS could be Emergency Management Services or Eastern Mountain Sports, and that has a very different impact on this particular problem. Also, as you dive into ambiguity here, you end up with segmentation problems where you don't know if Eastern Mountain Sports refers to "Eastern Mountain" and "sports", or "Eastern Mountain Sports", or "Eastern" and "Mountain sports".
>> Axel Schulz: Yes.
>>: So I am curious, how do you decide on the version to use? Do you just take the top abbreviation translation, and how would you use multiple versions if you did get them?
>> Axel Schulz: Actually, that's the advantage of this approach. As we said, if [indiscernible] of a text has several spatial meanings, or meanings in the real world, like New York City and NYC [indiscernible] city or whatever else, then we can use the polygon for each alternative.
>>: Right so this is how you deal with the other types of the ambiguity. So over topics and over …
>> Axel Schulz: [indiscernible] the question is not related to the spatial but to …
>>: But how do you use the same kind of things that apply to topic and time?
>> Axel Schulz: For the topic [indiscernible] we tried this with the common concepts. If for NYC we detect, for example, several concepts, one related to a city and the other related to some movie or whatever, then we use both concepts for our training problem. So we hope that these features, in combination with the other features, might help us to differentiate what's valuable and what's not. But you are absolutely right, this is also …
>>: So you sort of take a [indiscernible] and weigh them based on [indiscernible].
>> Axel Schulz: Yes, but actually I think this is one source of errors, and as for the temporal mentions, it is not that easy.
>>: Sure. [indiscernible]
>> Axel Schulz: Yes. Much more difficult in this case.
>>: [indiscernible] But also it was interesting when you were talking about geo [indiscernible]
>> Axel Schulz: No, we haven't investigated this, nor [indiscernible] case, but take the Syria case: a lot of people are reporting from outside and just retweeting. This is a very important aspect to cope with, but in this case I think different means are necessary. You have to find some reputation model for users. You have to look at which users are more likely from Syria, which are affected somehow, and which are from outside. We can do a lot of things with this on the user level. Yeah.
>> Andres Monroy-Hernandez: Well, let’s thank the speaker. If you will hear me out for a few minutes
here.
[applause]