32964
>> Asela Gunawardana: So thank you for coming. I'm pleased to welcome
Adrian Dobra from University of Washington where he studies mobility and
applies it to very important problems such as studying HIV transmission.
Adrian.
>> Adrian Dobra: Thank you, Asela, and thank you, Chris, and thank you,
John, for inviting me to give this talk. My group does a lot of work with
mobility. And today I'd like to share some of the things we are working on.
There's quite a bit, and the structure of our group essentially gives away
that this is such an interdisciplinary effort, and that is really key. So
Matt Dunbar is a GIS specialist. GIS plays such an important role in this
work. Tim Thomas is also a GIS specialist and a Ph.D. student in sociology.
Nathalie Williams is a sociologist and she also works a lot with migration
data and has a special interest in modeling conflicts. And, well, I'm a
statistician and so what we work on are really big data sources and we'll get
to talk quite a bit about that. But historically, mobility has been recorded
in quite a few ways. And a special interest has been migration. So in the
past, bigger movements were of key interest, where by migration, we mean
people moving from one country to another and remaining there for a good bit
of time, maybe forever. Contrasting migration and mobility, by
mobility we mean movements that are short in duration and also short
spatially. And in the past, because there weren't a lot of opportunities and
resources for people to move, there was quite a bit of differentiation
between mobility and migration. But now, people have more resources, they
tend to move more. It's a lot easier to move. And so nowadays, the
distinction between mobility and migration is quite blurred. And so
migration now tends to play a lesser role. Mobility seems to be key, and
really this is what we're going to talk about today.
Why is mobility important? Mobility explains quite a bit of what society
looks like. Mobility is part of understanding social behavior and part
of understanding why societies change, because the concentration of people,
culturally and with respect to workforce, is all related to mobility. Moreover,
if we want to look at the spread of infectious diseases, then mobility plays
a key role because how people move in space and in time is a key factor in
how disease spreads. Also, health behaviors are explained quite a bit from
mobility because by understanding the different context in which people are
active, then we can understand their health behavior and why some people
get -- are more likely to be diseased than others.
And also, if we talk about economic, social, and political well-being, these
are all, in a sense, key effects of mobility. So for that reason, in the past,
a lot of efforts have been dedicated to capturing migration and mobility.
And that has been done through censuses and surveys. And these are wonderful
sources of information. However, actually performing a survey
involves a lot of preparation, so for that reason, only events that were
somehow planned or known of could be captured, and also
surveys and censuses are expensive. There are biases related to recall,
biases related to people not reporting their actual movements. And so while
there is a great value in censuses and surveys, there are lots of questions
that simply cannot be captured through those more traditional sources of
mobility information. Making the step back to today, right, we
have access to call detail records; at least we know companies collect a lot
of information related to, for example, phone billing, and so through call data
records, one can capture the movement of millions of people at country level.
And so because companies collect that kind of information on a regular basis,
all types of events, including unplanned events, are being captured and so call
data records represent an incredibly powerful source of information. But at the
same time, there are biases that exist in the data. There are biases with
respect to phone ownership, phone usage. And then because of disclosure
reasons, it is really not possible to know anything else but the movement
of the people who are represented in call data records. And so for that
reason, somehow connecting censuses and surveys with CDRs represents an
avenue of research that would bring a lot of value. However, as we are going to
see, this is a not so trivial effort.
And so what is the overall goal? Well, the overall goal is to really make
use of these huge sources of mobility information, combine them
together in some meaningful way, link them in order to understand how people
move. So to that end, developing measures of mobility and developing proper
statistical models that will capture mobility is really, really important. So
because mobility and the information contained in that is so huge, the
development of models involves a lot of effort both from a methodological
perspective, but also from a computational perspective, and all this mobility
information from surveys and CDRs needs also to be combined with GIS
information. And we'll talk about that later on.
So if all this effort is there, then we can obtain incredible results
because these results would allow us to give better answers to old
questions, but they also allow us to ask new questions, capture aspects of
mobility, capture aspects of the population at country level that
were not obtainable before.
And so just to give you an example, this is a data set I'm going to be using
throughout the talk, so I have acquired access to a data set that comes from
a major telecom provider in Rwanda. So the data spans January 2005 through
May 2009. So what I know is for each person, the complete information about
all the calls they made within that period of time. So I know the time of
the call, the duration of the call, I have a unique identifier that allows me
to connect all the records corresponding with a particular caller and then
coupled with the known location of the cell towers, now I have a way to
obtain approximate location of where the calls have been made. And so this
is what I'm going to mean by CDR. So from this really short description, you
can see that while CDRs allow me to capture movement, I do not know anything
else about these people. Of course the frequency of calls could be an
indication of their wealth, and the literature has been using information from
CDRs as proxies for social variables. However, the information is quite limited.
And so for that reason, I try to combine this information at the ecological,
group level with survey data. And so from the same time period, Demographic and
Health Surveys, DHS, data is available for three different years, 2005, 2007,
and 2010. So DHS uses a double stratified sampling. So in the first stage,
a couple of locations where surveys take place are sampled and then given
those locations, people in the immediate vicinity are being surveyed.
They release the approximate location of where the surveys took place after a
perturbation and so that gives me information about the spatial location of
the DHS clusters. And so putting everything together, let's see what we
deal -- yes, John?
>> I just wanted to ask about the CDRs. What are the barriers to getting
realtime CDR data?
>> Adrian Dobra: Real time CDR data. By that you mean that I'm going to see
a record immediately after it has been generated. So right now in the U.S.,
this is being done. And so there are companies that actually sell CDR data
and so you can check them out. AirSage.com is one example. So AirSage has
their own network within the networks of pretty much I think the top four
providers in the country. And so CDRs are being generated. AirSage gets in
their database pretty much all the records of the people making calls and so
technically, this is being done. However, purchasing that kind of data, I
tried to do it and it's extremely expensive. Companies such as Walmart,
interested in where people come from in the vicinity of one of their stores,
can do that. But for people like us in academia, actually seeing that data is
not possible. So yes, I've been down that route. Yes?
>> [Indiscernible] now that now it's kind of moved into [indiscernible]
which report their position at [indiscernible] Intel by which much less by
[indiscernible]. We talk about all the biases to CDR. [Indiscernible] kind
of reporting that's [indiscernible] does is much less biased in several
aspects, but then it's biased [indiscernible].
>> Adrian Dobra: I mean, in all fairness, the companies have a lot more
location -- I mean, telecom companies have a lot more information than CDRs
because if somebody doesn't make calls, then they are going to be invisible
in the CDR database. However, telecom companies have what's called RFID
data. So I think every 30 seconds, our cell phones get a ping so through
triangulation, companies know where our cell phones are and those databases
exist.
>> The GSM send-out is not designed for that. It's not -- so the phone does
that. The phone knows [indiscernible]. Addition of software which measure
qualities and are reading them and send them [indiscernible] which is in
smartphones so it's [indiscernible]. But it's not -- I mean, it's
[indiscernible] let people think that their phone is reporting their position
and it's not -- the phone where it is [indiscernible] triangulate but sent
report [indiscernible].
>> Adrian Dobra: I mean, I know about [indiscernible] data only anecdotally,
really, because I haven't seen it. I don't think I will see. However, I've
been talking to people who have analyzed [indiscernible] data and that's what
they told me. You know, they look at, so --
>> [Indiscernible] widespread things like [indiscernible].
>> Adrian Dobra: Yeah, yeah.
>> [Indiscernible] do you also have temporal information, [indiscernible]
information?
>> Adrian Dobra: So I mean what I know is when the survey took place, and so
I just know year.
>> Year.
>> Adrian Dobra: Right. That's all about it because surveys take quite -- I
mean, months to really be --
>> [Indiscernible].
>> Adrian Dobra: Right, right. And so I mean, there are many limitations.
I mean, the point I was trying to make here is that a lot of mobility
information about our society exists. It's a huge amount of it, you know.
What I got access to is just a little tiny glimpse of it. So trying to
connect survey and CDR data in this effort, what kind of map are we going to
look at. And so this is a map of Rwanda. Rwanda is this very small country,
and what I plotted here are the location of DHS clusters and as you can see,
so the color blue, light blue, and green indicate the surveys from 2010,
2007, and 2005. And so if you follow these colors then you're going to see
the locations that are being reported are not the same. Right? Because
of the double stratified sampling, they sample different
locations and then the perturbation was also different. And so even to
connect DHSs from several years is difficult. Right.
In red, you see the locations of the cell phone towers I know of
and you can kind of see their spread varies. So Rwanda has the capital
Kigali pretty much in the middle of it, and that's where the largest
concentration of cell phone towers you see right here and DHS clusters occurs
and then the rest of the country has a few cities right around the border
areas mostly, but as far as I can tell, it is mostly rural. And so we would
expect a lot of information to come from this area and so -- and then you
know, there's quite a bit of information coming from where the cities are but
then in the rest of the country, the spread is quite large and actually this
picture is a bit misleading because it's a picture of an entire country and
there are so many pixels, but if you zoom in, you'll see the sparsity is
quite high in pretty much all the areas other than Kigali. And so the first
challenge is quite significant and it is also compounded by the fact that not
all the cell phone tower locations you see here represent locations of
active towers. So these represent all the towers I know of, but if you
look in time, you'll see that the -- and we are going to look at that plot.
We are going to see that the number of towers and location of the towers, the
concentration varies quite a bit in the country.
And so connecting CDRs with survey data involves a lot of modeling, and
that's one of the things, the to-do things on my list. You know. However,
so far the results that I have just from a first look are pretty much full of
questions that are yet to be addressed.
Another country I'm very interested in is South Africa. In South Africa,
because of the apartheid regime, the phenomenon of circular migration is
widespread. So men especially had to travel for work. They essentially had
to leave their hometowns and go out to work in mines or in factories located
quite far from where they lived. And the families of these men didn't
have the permit to move with them, you know, so families were essentially
separated for six months and maybe one year. So after the apartheid regime
was abolished, of course those travel permits disappeared but the way of life
has not disappeared. And so because of the separation in the families, HIV
is so widespread, right, because, you know, essentially men looked for
different partners while away and maybe the women were doing the same
thing.
So I'm very interested in this country because there are known mobility
patterns that are strong and are still present nowadays. And so I established a
collaboration with the Africa Centre, and the Africa Centre manages a study area
that is located north of Durban. This is Johannesburg. This is the location
we're looking at. It's mostly quite a rural location. This is a map of the
surveillance area, about 100,000 people live there. And what you see here,
the small dots represent the location of the homesteads. So they have
complete [indiscernible] information about everything that takes place in the
area. So they have just information about the roads, about location of the
health clinics. They map all the homesteads. They map pretty much
everything that can be mapped in that area. And then every year, they try to
do HIV testing for all the people in the area. Of course they might not get
consent; about 30 percent of the people give their consent to be tested,
and every six months they collect [indiscernible] demographic information.
And so that represents a really, really incredible source of information.
Actually, this is state of the art. And so what do they do with all that
location information? Well, they create maps like this. So one map that you
see here, right, represents one year. So they start in 2005, '6, '7, '8, '9,
'10, '11, and so what you see here in redder colors represent how HIV has
spread in the surveillance area. So you see that it starts off with the
higher concentration here where a township is and then it pretty much spreads
more and more while, you know, in 2011, there is -- HIV seems to be quite
widespread in the area.
Of course this is not necessarily -- yeah, please go ahead.
>> [Indiscernible] each year.
>> Adrian Dobra: Each year, right. 2005, '6, up to, right. So that's when
they collect their information. Right. And so of course higher prevalence
of HIV is not necessarily something negative because it could also be an
indication of people living longer. So people follow treatments, right,
so they are on ART, the follow-up and the adherence are quite good for that
surveillance area so people live longer with the disease and for that reason,
you know, we see higher HIV prevalence, right.
And so what would be the key negative of this map? This map really
represents -- this is a Science paper published in 2013. So top journal,
really high-end data. But what would be the problem with this map? What
does this kind of map assume without us really having to spell it out? So
the dot -- yeah? Any guesses?
>> [Indiscernible].
>> Adrian Dobra: Exactly. So this kind of map is based on the location of
the homesteads. Right. So these maps assume that people exist where their
homesteads are, but this is such an inappropriate assumption for this area
because of circular migration. You see? So while this is state of the art --
I mean, this is one of the few surveillance areas that has such detailed GIS
information -- what you would really want to capture is movement. Right. And
that is exactly the missing part in the picture.
>> There's other dimensions that it misses. I mean, you talked about the
introduction of ART treatment or --
>> Adrian Dobra: Right.
>> -- when was that introduced and what was the prevalence of that.
>> Adrian Dobra: Right.
>> So I think that's one dimension. But then there's also demographic
changes in populations. So has there been population changes here. So I
think there's a lot of different [indiscernible] where they're missing in any
kind of static view.
>> Adrian Dobra: Absolutely. You know, but say, you know, quite a few of
these dimensions are related to mobility because adherence to long-term
treatment is related to mobility, right? If a person lives in the same
area, they are going to be aware of where the health clinics are. They're
going to know where to get their pills. But if somebody is traveling around
the country and is faced with environments they're not familiar with, perhaps
they are unlikely to go and see a doctor when they need. They're unlikely to
get their pills when they need it and so on.
And so pretty much all the factors I think you mentioned can be --
>> [Indiscernible]. So if you have a new treatment, that's not
[indiscernible] missing here by population density that I think -- so is
there a sense in which the hot spots correlate to population density?
>> Adrian Dobra: So they have adjusted for that. So they have adjusted for
the changes in the population as far as I can tell.
>> I expect that the hot spots correlate to population density so the red
areas [indiscernible] correspond to --
>> Adrian Dobra: Well, I mean, the hot spots correspond more to distance
from major roads. And we are going to get to that. Actually there's a major
road right here going through that area and then you see how HIV has spread
pretty much along that road, right. So there's township here and that's
where the major road is and then you see how, you know, population density
but also HIV kind of spreads, right. Yes?
>> Do you have a sense of sort of the changing and consenting to testing?
[Indiscernible] treatment more efficient or if you know sick people
[indiscernible] errors and the treatment is more efficient, the more I need
to be consenting to testing right?
>> Adrian Dobra: Right. So I think initially they got I think a 60 percent
consent rate and then it pretty much dropped to 30 percent for the
subsequent years you know, but it's not the same people who give their
consent each year.
>> [Indiscernible] expected the opposite. As treatment becomes more
efficient, you are more likely to be willing to be tested. I mean, I don't
know, maybe there are other facts.
>> Adrian Dobra: Right. But there is still the mobility factor. So these
are a couple of pictures I took back in January when I went there. So this
is what a school looks like over there. This is a typical homestead in that
area. So it's pretty much a one-room type of dwelling and everything takes
place in there. And what you see here are two of my friends, two
fieldworkers that took me out with them while they were knocking from door to
door. And so what you see here are stacks of files they were carrying with
them and you see maybe there are 10 or 15 files they have there and
each of those files corresponds with one household. So they had about 10 or
15 people who are recorded as living in one of these dwellings. So I asked
them, how is it possible that so many people would ever fit in that homestead,
and they said, well, people are never there. Essentially, most
people who are recorded as belonging to that homestead are somewhere out to
work and they periodically come back and there's some sort of understanding of
who takes over and who leaves and so on, and so that is the reality of their
life. There is a lot of mobility. And residential location has very little
meaning in their life. It has a lot more meaning in our lives, right. But
in their lives, it has very little meaning.
So really, capturing mobility is a lot more important in a sense in the South
African context than it is in the Seattle context, if you like.
>> What does consent rate now mean in the context of is it for percentage of
people, percentage of households?
>> Adrian Dobra: Right. So that's something I need -- right. So that's a
fair question and that is something I need to talk a bit more with my friends
in the Africa Centre about because from their papers, obviously controlling
all the details, but a lot of questions come up once you really understand
what their life looks like and that to me was one aspect I was really
intrigued about.
And so this is just an example -- well, it's a fake one, but you know, I
tried to make it as real as possible. You know, so this is an example of what
a life of a person could look like. So this person, a man, John, say, lives
in the surveillance area but really, he has to work in Durban. So he
periodically comes back to the surveillance area to see his family but then
he really spends most of his time in Durban. And so he's exposed to a
multitude of contexts that vary quite a bit when he is in the surveillance
area where his home is as opposed to when he is in Durban. In Durban, he is
away from his family, so he might be more prone to risky behavior. In
Durban, he cannot afford to live in a good area, and so he lives in a less than
desirable part of the city, in which there are a lot of bars and sex workers
and so on. And so the temptation might be there for this person really not
to follow the kind of life he would have if he had been with his
family.
If John happens to be on ART, if he happens to have acquired HIV, then he needs
to take his pills regularly. When he is near his home, he's going to know
where the health clinics are. In Durban, maybe he just doesn't know there is
a health clinic a couple of blocks away. And so while the possibility is
there for him to get his pills, maybe because of the different environmental
context he's exposed to, he just doesn't know where to go. Maybe there is
social stigma associated with going to the nearest health clinics in his
hometown. And so maybe he's more likely to obtain his pills when he's away
as opposed to when he's in his hometown. We just don't know that. Yes?
>> [Indiscernible]?
>> Adrian Dobra: So I do not know about that. No. So the only point I'm
trying to make here is that moving from one
point location to looking at mobility patterns, figuring out what are the
relevant locations in somebody's life, is really, really important for
explaining health behavior, social aspects and so on. And so the information
that is present in CDR data is part of the picture you know, but really that
is the kind of data I would like to look at. And a project that will hopefully
get funded will collect GPS data from about 1200 people living in this area.
So hopefully I will get to do that and hopefully next time I get to come here
and give a talk I will have real data instead of this schematic. And so
dynamic contexts, what do they represent? And so if we look at the
space-time trajectory of a person, so the plane here represents
latitude and longitude, X and Y, and then Z, that represents time. And so you
see how this person seems to stay at this key location for quite a long period
of time but then he periodically moves to this other location and so
identifying when a person is present in a certain location and the number and
spatial spread of these locations is certainly important, right. And so that
is precisely the kind of information we would need to capture.
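As a minimal sketch of what "identifying when a person is present in a certain location" could look like, assume a trajectory is a time-ordered list of (time in hours, location label) observations. The function name `dwell_time_by_location`, the labels, and the toy numbers are illustrative assumptions and do not come from the talk or its data.

```python
from collections import defaultdict

def dwell_time_by_location(trajectory):
    """Sum the time spent at each location label.

    trajectory: time-ordered list of (time_in_hours, location) pairs.
    Assumption of this sketch: the interval between two consecutive
    observations is credited to the location of the earlier one.
    """
    totals = defaultdict(float)
    for (t0, loc), (t1, _) in zip(trajectory, trajectory[1:]):
        totals[loc] += t1 - t0
    return dict(totals)

# Hypothetical trajectory: at home, away in Durban, then home again.
trajectory = [(0, "home"), (48, "durban"), (200, "home"), (230, "home")]
dwell = dwell_time_by_location(trajectory)
```

The number of distinct keys in `dwell` gives the number of locations, and the values give how long the person stayed at each; a spatial spread would additionally need coordinates for each label.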
At the same time, knowing just the location, latitude and longitude, doesn't
have much meaning. Right? What we would like to do is have layers, GIS maps
associated with different risk factors, say accessibility, road networks or
concentration of healthcare facilities, or access to food stores and so on.
So each of these GIS layers needs to be produced and each of these GIS
layers can be connected with spatiotemporal trajectories and everything put
together will give an accurate picture and will make us able to produce
locations that also have meaning attached to them. And that meaning can come
along so many different dimensions. And so this is why GIS work is really,
really important together with the effort of collecting good spatial
information.
And so UNAIDS actually released back in January a report called location,
location, location. In this report they stressed the importance of going
from country-level estimates of disease prevalence to the local level. And so
moving from country level to local level can be done for example by using the
location of the homesteads and that kind of information would be available
countrywide. That would represent a huge step forward. However, as we have
just seen, that kind of information has only a limited relevance, say, for
countries in which mobility plays such a key role -- there's a
lot of volatility of movement. And so capturing multiple spatial contexts is
really, really important. The residential context, which is really dominant
in the demography literature, is by all means not sufficient, although
knowing the residential context at the country level is still something that
has yet to be obtained, especially for disadvantaged countries.
And so what is also important is spatial colocation of individuals. So being
able to perform a realtime census, being able to know all the time where
people actually are, is quite important because if you think about the spread
of infectious diseases, spatial colocation is a key indicator of that. And so
for all these reasons, geolocated data that can come from quite a few
sources becomes really, really important. And so ideally, what we would like
to have for each person is an indication of the areas in which an individual
exists, the times when that individual exists in those places, attaching
meaning to those places, knowing the kind of activity an individual is likely
to take part in in each of those key activity areas, and also determining
the associations between activity areas and various contextual measures. So
overall, these personal activity maps should replace residential information
and so based on these maps we can have a comprehensive picture of somebody's
well-being and health and the effects spatial and temporal behavior has on
the individual's well-being. And so what kind of information can we look at,
what kind of location information can we look at? The first source is
individual GPS tracers. GPS tracers are quite common in the literature
dedicated to obesity. So they use GPS tracers and [indiscernible] meters to
capture really dense location information. However, being able to
use those devices for a longer period of time, where long means two weeks or
one month, becomes questionable, especially when you want to increase your
sample size. So ideally, this is the kind of information you would want to
have. However, obtaining it is quite difficult if you also want to determine
activity types, so what kind of physical activity a person does.
So another source of information is individual cell phone records. So if it
would be possible, if somebody could just sign a disclosure
agreement that lets us access all his cell phone records, that would be
amazing because we already know -- we already -- I mean, his approximate
location information already exists in the databases of the phone companies
that individual is using. However, actually turning something like that into
reality constitutes a challenge because not only does somebody signing such a
disclosure agreement represent a difficulty, but obtaining the
individual records from the cell phone companies is also another level of
difficulty.
And so the third source of information is these collective cell phone
records, which are incredible if those databases are released. However, the
problem with these collective cell phone records is that there's no other
information available for those individuals and so any inferences can be made
only at the group level. And so if individual cell phone traces were
available -- so this is real data from an obesity study done at UW, or actually
these are GPS traces over a period of two weeks -- then you know,
from those GPS traces we could extract key locations and also patterns of
activity, patterns of mobility at the day level or weekend level, and
also we can slice this kind of information temporally and so having access to
that kind of information would have an incredible value if that information
existed.
And so say given the mobility trace of an individual, how
can we actually quantify his movement? And so last year, I started looking
at a couple of mobility measures and some of these measures are known in the
literature and so these are the six measures I've been looking at. I will
describe them here and so what you get to see here are maps that represent
the locations from which individuals that were highly mobile with respect to
each mobility measure made calls, so this is the data that I have from Rwanda
2005 to 2009. I evaluated mobility for all the individuals over a period of one
month and then I extracted the most mobile ones.
And so with respect to the number of towers used, this is the profile of the
individual that has the largest number of towers used. So you see this
individual has been pretty much everywhere in the country. This is the
profile of the individual that has the maximum distance between two towers,
two used towers. This is the profile of an individual that has the maximum
radius of gyration. Radius of gyration is some sort of variance of all the
positions somebody was recorded in. And so you see, it starts by
calculating the centroid of all the known locations and then from there,
distances from all the known locations to the centroid and computing the
square root of the mean of the squares of those. Anyway, so that's the radius
of gyration, area of the convex hull and so on. So what is interesting here
is that the different measures of mobility bring up different profiles. But
soon I was really turned off by using all of these measures. And the reason
why I was turned off by all of them is related to the fact that through those
measures, I wasn't capturing the frequency of movement. So I was
able to capture the spatial aspect of the movement but I wasn't able to
capture trips. Right. How frequent movement is, is important. And so I said,
well, I must find a way to capture spatial spread but also frequency
of movement. What I also didn't capture here is how movement occurs because
what I had were the locations of the cell phone towers from which each
person made calls, and so through those mobility measures, I had no way to
say, okay, this person must have followed a road in order to travel from one
location to the next one. Right. So seeing the tower locations as points in
space is less than desirable. So I said I must find a way to also capture
how movement might have occurred. I mean, traveling by air is less likely in
Rwanda. They must have followed the road. And so building road structure into
this seemed to be important.
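The four measures named above (number of towers used, maximum distance between two used towers, radius of gyration, area of the convex hull) can be sketched as follows. This is only an illustration under stated assumptions: tower positions are taken to be already projected to planar coordinates in kilometers, one point per recorded call, and all function names and the toy trace are hypothetical. The talk does not say how the measures were actually implemented.

```python
import math
from itertools import combinations

def num_towers_used(points):
    """Number of distinct tower locations in the trace."""
    return len(set(points))

def max_distance(points):
    """Largest straight-line distance between any two used towers."""
    distinct = set(points)
    return max((math.dist(p, q) for p, q in combinations(distinct, 2)),
               default=0.0)

def radius_of_gyration(points):
    """Root mean squared distance of the recorded positions (assumed
    non-empty) from their centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)

def convex_hull_area(points):
    """Area of the convex hull (monotone chain hull, shoelace formula)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq):
        hull = []
        for p in seq:
            # Drop points that would make a clockwise or straight turn.
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull[:-1]

    hull = half_hull(pts) + half_hull(reversed(pts))
    twice_area = sum(x0 * y1 - x1 * y0
                     for (x0, y0), (x1, y1) in zip(hull, hull[1:] + hull[:1]))
    return abs(twice_area) / 2.0

# Hypothetical trace: calls made from the four corners of a 4 km x 3 km box.
trace = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
n_towers = num_towers_used(trace)
d_max = max_distance(trace)
r_gyr = radius_of_gyration(trace)
hull_area = convex_hull_area(trace)
```

All four numbers describe spatial spread only. None of them changes if the same towers are visited once or a hundred times, which is the limitation raised above about not capturing the frequency of movement.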
What I also had to capture was the fact that towers did not exist for all the
periods of time, all the 53 months I was looking at. And so this is why I
had to use a grid structure. So I mapped -- I used the five kilometer square
grid structure and I mapped all the towers that belonged to each grid cell
and I considered that to be a location. Some of the grid cells did not
contain any towers but the grid cells that contain at least one tower were
called places. And so I used places instead of the location of the towers
per se because that gave me a little bit more stability in time with respect
to the reference of location.
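The grid step described above can be sketched as follows, assuming tower coordinates are already in a planar projection in metres. The tower ids, coordinates, and function names are hypothetical.

```python
from collections import defaultdict

CELL_SIZE_M = 5000  # 5 km square grid cells, as described in the talk


def cell_of(x, y, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the grid cell containing (x, y)."""
    return (int(x // cell_size), int(y // cell_size))


def places_from_towers(towers):
    """Group towers by grid cell; a cell with at least one tower is a place."""
    places = defaultdict(list)
    for tower_id, (x, y) in towers.items():
        places[cell_of(x, y)].append(tower_id)
    return dict(places)


# Hypothetical towers: T1 and T2 fall in the same cell, T3 in another one.
towers = {"T1": (1200.0, 800.0), "T2": (4900.0, 100.0), "T3": (7500.0, 9000.0)}
places = places_from_towers(towers)
```

Because a place stays active as long as any one of its towers is active, adding or removing a single tower changes the set of places less often than it changes the set of towers.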
So looking -- so those two plots illustrate the dynamics of the CDRs
from 2005 to 2009. The top plot shows the number of callers in this
provider's network, so you see the number of callers go from about I think
250,000 to about a million as we move from January 2005 to May 2009.
You see here there are two drops, so those are two months, I think May 2005
and April 2009 -- February 2009, for which there was very little information
so essentially I dropped those two months because there wasn't much data
there.
And so if we look at the plot over time of the number of active towers in red
and the number of active places, then you see how the network of this
provider has expanded and there were huge increases from 2008 to 2009. So
by using this grid structure, by replacing tower level location
information with places, grid level information, I stabilized this effect
a little bit so the changes were not that huge.
And so the next plot shows you how the -- what were the places, the
active places from 2005 to 2009. So in blue here you see the grid cells that
contain active towers in January 2005 and there were 49 of them. In red, you
see which grid cells, which places have been added from the previous plot.
So you see how the network has expanded and so you see that there is quite
a huge change, right, you know, when they installed new towers but you also
see that going from January 2009 to May 2009, some places have disappeared.
See. So comparing these two months, there were no places added but two places
shown in green have disappeared.
And so looking at the dynamics of the CDRs is a little bit more difficult
because of the different concentration of location information that I have.
Another aspect that was very important for me to capture was related to how
the movement occurred. So on the left side, you see a plot of the towers.
On the right-hand side, we see the major and secondary network of roads in
Rwanda superimposed with location of towers. And so I thought that I must
find a way, I must develop more [indiscernible] measures that capture the
road network structure because this is how movement actually occurred.
And so I came up with six mobility measures and really, there are three groups
of mobility measures. So I have trips, right, so I look at movement as it is
captured by a person placing calls from one place to another place and then
to another place and so on and I measure that movement through road distance,
travel time, so that comes from GIS information, and also from the number of
grid cells between two places. And so you see, with the previous
version of the mobility measures, see, the distance was quantified as the
crow flies and that could be very different than distance on the road network
[indiscernible]. And so these three measures captured the frequency of
movement and also the road network structure so they pretty much achieved
what I wanted.
>> I don't understand the distance versus grid cells very well. They seem
like they'd be relatively close.
>> Adrian Dobra: They are all very close, right. So essentially, the only
difference between the -- these three mobility measures is the unit of
measurement.
>> Time might be very different.
>> Adrian Dobra: In the context of Rwanda, it really is not. I was expecting
that time would be different. That was my intuition, but this is not what I
get to see. In a context of a more advanced society, then I would indeed
expect this measure to be different, you know, but in the context of Rwanda,
maybe traffic there is not that complicated as it is in Germany for example
or here. Right. And so having all three measures seems important to me,
although they kind of capture overlapping information in Rwanda.
>> Sort of estimated drive time?
>> Adrian Dobra: Estimated travel time. Right. Yeah. So how long it would
take to travel from here to here using the road network. Yes?
>> [Indiscernible] from the road.
>> Adrian Dobra: Yes, yes.
>> [Indiscernible] seen someone [indiscernible].
>> Adrian Dobra: Right. We wouldn't have that kind of information, no. So
it's all --
>> [Indiscernible], right? All you have, you have XYZ, right, so you
[indiscernible].
>> Adrian Dobra: Correct. But somebody -- I mean, but there are several
aspects of that. Somebody might not be placing calls as they move and --
>> Travel from the road. [Indiscernible].
>> Adrian Dobra: Yeah, yeah, yeah. Right. Yeah. It's not trivial to
estimate actual travel time from CDRs. From GPS data is absolutely possible,
but from -- yeah. John, did you --
>> I had the same question, actually.
>> Adrian Dobra: Right. So --
>> [Indiscernible]. By trips you're not talking about identifying some
subsequence of the CDR. You are calling it a trip, but you're saying taking
some period of time and computing the distance between successive pairs of
towers and summing those distances?
>> Adrian Dobra: It's simpler than that. So what I just looked at is when
somebody placed a call from this grid cell, and a trip would be whether the
next call was placed from another grid cell. So that's a trip. Moving from
one grid cell to another grid cell that contains -- where both grid cells
contain a tower. That's what I call a trip.
>> [Indiscernible] your CDR, what is the measure? So is it measured over
some period of time?
>> Adrian Dobra: No, no, the travel can be -- can occur in 30 minutes or it
can occur in the next day.
>> Right. I'm not trying to [indiscernible]. So I give you your
[indiscernible] for the past month, what is the measure of mobility for you
for distance?
>> Adrian Dobra: So I look at the -- so, see, take, for example, this grid
cell, that's a place. It has a cell phone tower and take this other grid
cell that's another place that has a cell phone tower, right? And so if I
place a call here and then the next call I place was here, then that's going
to be a trip.
>> So you have a statistic that's a function of pairs of successive CDRs and
the statistic is zero if they didn't move out of the cell or something like
that.
>> Adrian Dobra: So if I place two consecutive calls from this -- from the
same place, I'm not counting that because I didn't move.
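The trip definition in this exchange can be sketched as a function of successive calls: a trip is counted whenever two consecutive calls come from different places, and the three trip-based measures sum a per-trip cost (road distance, travel time, or grid cells crossed). The place labels and road distances below are hypothetical; in the talk those costs come from GIS information.

```python
def count_trips(place_sequence):
    """Count consecutive calls placed from two different places."""
    pairs = zip(place_sequence, place_sequence[1:])
    return sum(1 for a, b in pairs if a != b)


def trip_total(place_sequence, cost):
    """Sum a per-trip cost over all trips in the call sequence.

    `cost(a, b)` is any function giving road distance, travel time, or the
    number of grid cells between places a and b.
    """
    pairs = zip(place_sequence, place_sequence[1:])
    return sum(cost(a, b) for a, b in pairs if a != b)


# Places of one caller's calls, in time order (hypothetical labels).
calls = ["A", "A", "B", "B", "A", "C"]

# Hypothetical symmetric road distances in kilometres.
road_km = {frozenset(("A", "B")): 12.0, frozenset(("A", "C")): 30.0}


def road_distance(a, b):
    return road_km[frozenset((a, b))]


n_trips = count_trips(calls)                 # A->B, B->A, A->C
total_km = trip_total(calls, road_distance)  # 12 + 12 + 30
```

Two consecutive calls from the same place contribute nothing, which matches the answer given above.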
>> Okay.
>> Adrian Dobra: Yeah. Please.
>> [Indiscernible] edge of two towers and just switching between them and
[indiscernible] a lot of trips [indiscernible]?
>> Adrian Dobra: Absolutely. Right. So there are limited -- absolutely.
There are limitations to using a grid cell structure, and I agree with you.
Those borderline cell phone towers could be indicative of more movement
than there actually is, but because of the varying density of cell phone
towers in their country and because of towers appearing and then disappearing
we considered that we gain, you know, we create more stability by using a
grid structure versus not using it.
And so the number of trips -- so these three measures capture both space and
frequency of movement. This measure just counts how many trips somebody
makes, so this is -- there's no spatial aspect to it. It's just frequency of
movement and then the other two measures, number of visited places, number of
visited grid cells, capture just the spatial aspect of it.
Something, this is a key aspect I didn't talk about, and that relates to
figuring out where somebody might have been even if they didn't place calls
from those places. So for example, if somebody placed calls from here and
then the next call was placed from here, then -- and if the shortest road is
this one in red, then we also count it as a visit to this place, this grid cell
here and also this grid cell right here because those are places somebody
must have traveled through. And so we balance the fact that somebody might
not have placed calls by inferring places and grid cells somebody might have
visited based on their own [indiscernible] and that's an aspect that was
really important because you've seen that individual that was [indiscernible]
with respect to the radius of gyration that placed calls from all the three
sides of the country and no other calls from anywhere else, that individual
must have traveled through some place. So that essentially motivated this.
Yeah?
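A minimal sketch of inferring the places a person must have traveled through, assuming the road network has been reduced to a weighted graph whose nodes are grid cells. Dijkstra's algorithm is used here as one standard way to get a shortest road path; the talk does not say which routing method was used, and the cell names and road lengths are hypothetical.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra over {node: {neighbor: length}}; returns the list of nodes."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, length in graph[node].items():
            nd = d + length
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return None  # no road connection
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical road graph between grid cells, lengths in kilometres.
roads = {
    "c1": {"c2": 5.0, "c4": 20.0},
    "c2": {"c1": 5.0, "c3": 5.0},
    "c3": {"c2": 5.0, "c4": 20.0},
    "c4": {"c1": 20.0, "c3": 20.0},
}

# Calls were placed from c1 and then c3; cells strictly between are inferred.
path = shortest_path(roads, "c1", "c3")
inferred = path[1:-1]
```

The inferred cells are added to the visited grid cells even though no call was placed from them.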
>> It seems like if you were trying to find the correlation between, say,
HIV and sex workers or obesity and McDonald's, then you want to look for
places where people stay for a little while so they have [indiscernible] to
the risk, right? So just traveling through a grid, I wonder if that's really
very indicative of risk.
>> Adrian Dobra: Right, but we can -- well, it's not. I agree. And so this
is only one version of mobility measures that take into account frequency of
movement. And also are a bit more stable with respect to the type of
locations that are known to us. But I also agree with you that there are
other measures that can be developed from here on that could be -- I mean,
these measures need to be geared for certain applications, right. And so,
yeah, I mean, sky is the limit with respect to how these measures can be
changed or replaced but for us, going from the first set of measures to
this set of measures was a big step because they -- yeah?
>> You also have a [indiscernible] so this is like a [indiscernible].
>> Adrian Dobra: Right. There's so much -- so many variables, so much
additional information one can create based on that data once we throw in the
road network structure and other GIS information. I absolutely agree with
you. So this was information that was relatively straightforward to understand
in terms of [indiscernible] can be augmented in so many ways so it is actually
really valuable and you know, it can be exploited in so many ways.
Absolutely. We are just starting to realize now, okay, how much we can do
with it. So I -- how much time do I have?
>> [Indiscernible].
>> Adrian Dobra: Yeah but how much time? When do you have to leave?
Because there are two things which I want to --
>> [Indiscernible].
>> Adrian Dobra: Okay. Yeah, okay. Great. So I have -- so I just show
here mobility profiles for the highest mobile people with respect to each of
these measures and as it happens, the highest mobile person with respect to
trips distance is also the highest mobile person with respect to trips
time and trips grid cells. I also looked at the association of these
measures and in the context of Rwanda, I absolutely agree the unit of
measurement plays very little role. And so what you see here in red are the
places, the grid cells from which this person placed calls. In blue, you see
all the grid cells this person must have traveled through, okay. And so you
see the total distance traveled in meters, total time traveled, right, number
of grid cells visited. So you see he's at the extreme with respect to
all those measures and also with respect to the number of trips, how many
transitions this person made.
With respect to the visited places here, 0.45, so I used a ratio between
number of visited places and number of places that were active in that month
so I can compare from month to month because not the same number of places,
not the same number of towers were active from month to month, and 215 is the
number of grid cells. So that's one extreme profile.
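The normalization described above can be sketched as a ratio of visited places to places active in the same month. Restricting the visited set to active places is an assumption, and the place ids are hypothetical.

```python
def visited_places_ratio(visited_places, active_places):
    """Share of the month's active places that a person visited."""
    active = set(active_places)
    if not active:
        raise ValueError("no active places in this month")
    return len(set(visited_places) & active) / len(active)


# Hypothetical month with four active places, two of them visited.
ratio = visited_places_ratio({360, 361}, {360, 361, 467, 70})
```

Dividing by the number of active places keeps the measure comparable between an early month with few places and a later month with many.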
Another extreme profile is this one. This is the highest mobile person with
respect to the number of trips and we see that this person places calls from
just five grid cells and there are just six grid cells all in Kigali this
person placed calls from. We can all guess that this guy was possibly a taxi
driver moving around Kigali and so you know, but the point is, you know, the
profile of this person that is extreme with respect to the spatial spread and
frequency of movement is quite different than the profile of this person that
disregards the spatial aspect of it, right.
And so another profile, the most extreme person with respect to the number of
visited places, you see, this person has visited all those places, all the
grid cells that contain towers shown in red here and then in blue are all the
other grid cells this person must have traveled through. And so based on
these six dimensions of mobility, now we can come up with mobility profiles
that are quite accurate and can be tailored for different applications.
What I've also been looking at is the association of mobility measures across
time and so in this one plot, which I chose to show here, you see the
correlation between the five mobility measures that capture space with
respect to number of trips. And so you see that trips distance,
trips time, and trips grid cells, see, are quite -- have a high correlation
with number of trips and that makes sense because those four measures capture
the frequency of movement and then you see the number of trips has a much
lower association with visited places and visited grid cells, two measures
that do not capture the frequency of movement, right. And then there are
corresponding plots for correlation with respect to the other mobility
measures and from those association measures, we clearly see that the three
groups of mobility measures are there and are quite distinct.
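A minimal sketch of how the association between two mobility measures could be computed across people, assuming the Pearson correlation. The talk says only "correlation", so the choice of coefficient is an assumption, and the numbers are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up per-person values for one month.
number_of_trips = [2, 10, 4, 30, 7]
trips_distance_km = [15.0, 80.0, 30.0, 260.0, 50.0]
visited_places = [2, 3, 3, 2, 6]

r_distance = pearson(number_of_trips, trips_distance_km)
r_places = pearson(number_of_trips, visited_places)
```

Repeating this for every month and every pair of measures would give curves over time like the ones described above.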
So we've been talking quite a bit about capturing mobility. But what about
identifying the people that not move -- that do not move. This is such an
important aspect because people that are immobile are people that could, for
example, suffer from a chronic disease, handicapped people, poor
people, basically people in need. There could be also people that perceive
movement as something adverse to them. So for example, when a disaster
occurs, say if we look at our coastline area, when a hurricane comes, knowing
where the immobile people are makes us know what are the areas that should be
searched first because those areas are likely to contain people that refuse
to leave when the warning came. And so being able to identify immobile
people at different levels is very, very important and here, I show the
concentration of immobile people, people that just didn't move, people that
placed calls from only one grid cell across the 51 months of data that I
have.
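The immobility criterion described above (all calls placed from a single grid cell) can be sketched as follows. The record layout, caller ids, and cell ids are hypothetical.

```python
from collections import Counter, defaultdict


def immobile_callers(records):
    """Map each caller seen in exactly one grid cell to that cell.

    `records` is an iterable of (caller_id, grid_cell) pairs, one per call.
    """
    cells_by_caller = defaultdict(set)
    for caller, cell in records:
        cells_by_caller[caller].add(cell)
    return {
        caller: next(iter(cells))
        for caller, cells in cells_by_caller.items()
        if len(cells) == 1
    }


def immobile_concentration(records):
    """Count immobile callers per grid cell."""
    return Counter(immobile_callers(records).values())


# Hypothetical call records.
records = [
    ("u1", 360), ("u1", 360),
    ("u2", 360), ("u2", 361),
    ("u3", 361),
    ("u4", 360),
]
concentration = immobile_concentration(records)
```

A caller with a single call counts as immobile under this rule, which is one of the limitations raised in the discussion that follows.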
And so you see here that in red, you see where the highest concentration of
immobile people are at the country level, and then you see that there are two
grid cells that are right in Kigali and then there are two other grid cells
here and here that come up as being relevant throughout this time period.
These are all associated with two townships right on the border of Congo.
And especially these two grid cells right here, their numbers are 360 and
361. I'm mentioning them because these are two grid cells that were
affected by an earthquake that took place in 2008. And so we will come back
to that.
But the point is here that we are able to capture and locate immobile people
and that is important for so many reasons. Yes, please.
>> [Indiscernible] correlate with like the frequency of [indiscernible].
>> Adrian Dobra: I didn't look at that. Right. So I absolutely agree that
there could be people who maybe have four cell phones and maybe they chose to
use that one cell phone I have information about only when they're at home.
And those people would be perceived as being immobile. So no, I didn't look
at frequency of calls. But of course, there are so many filters one can
apply going from here.
>> [Indiscernible] something which [indiscernible].
>> Adrian Dobra: Right, right. You can look at some sort of ability of
movement across time and I just took periods of one month and if somebody
just didn't -- even if somebody placed just one call within one month, you
know, it didn't matter versus placing 1000 calls. Correct. So yes, there
are filters that can be applied. And the point is, CDR data can be used to
create maps like this, which are so important for so many reasons with
respect to planning, figuring out where resources should be concentrated and
maybe where rescue efforts should be concentrated.
And so the last thing I'm going to show you is related to the ability of
CDR data to capture extreme events. Right. And so I took the example of
this earthquake, 5.9 magnitude, that took place on February 3rd, 2008. It
took place right here, somewhere in -- so this is Lake Kivu. And this is the
border between Congo and Rwanda. Rwanda is over here. There are two major
cities. This one right here and the other one up there. They are the same
ones that showed up as containing quite a few immobile people and so what I
tried to look at -- so this was a relatively major event. And so what I
tried to figure out is whether it is possible for me to see that event,
to identify that event from CDR data.
So this is the location, the epicenter of the earthquake. And so in red you
see all the towers that were active in February 2008 that were within 25
kilometers of the epicenter of the earthquake. And so I looked at the CDRs
of all the people that placed calls from those towers. When I tried to look
at mobility levels, using month as a unit, I really didn't see any change as
it went from January to February to March, there wasn't much change. So
being able to zoom in at the daily level and also zoom in at the
corresponding spatial spread was very important. And so once I looked at
daily mobility levels, then I started seeing patterns that were really,
really interesting.
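Selecting the towers within 25 kilometers of the epicenter can be sketched with the haversine great-circle distance. The coordinates below are placeholders, not the actual epicenter or tower locations, and the function names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def towers_within(towers, center, radius_km=25.0):
    """Return sorted ids of towers within `radius_km` of `center`."""
    lat0, lon0 = center
    return sorted(
        tower_id
        for tower_id, (lat, lon) in towers.items()
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    )


# Placeholder coordinates (latitude, longitude) for illustration only.
epicenter = (-2.30, 28.90)
towers = {"near": (-2.40, 29.00), "far": (-1.95, 30.06)}
affected = towers_within(towers, epicenter)
```

The callers of interest are then everyone who placed at least one call from a tower in the returned list.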
And so there are five places to which those towers within 25 kilometers of
the epicenter belonged. This is place 360. It's right here at the border
with Congo. And so what you see here are mobility, daily mobility levels for
ten days before the event in red and ten days after the event. The day of the
event is shown in blue. And so the four graphs correspond to four mobility
measures that capture the frequency of movement. And so this is trips
distance, trips time, trips grid cells and number of trips. And so you see
here how the day of the event becomes quite clear because it is -- it
corresponds to higher number of trips. Right. And then you also see how the
event itself has changed the mobility patterns in the area.
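The before and after comparison described above can be sketched by summarizing a daily mobility measure in a window of ten days on each side of the event. The daily values below are synthetic and only illustrate the shape of the computation; the function name is an assumption.

```python
from datetime import date, timedelta


def window_summary(daily, event_day, days=10):
    """Compare a daily measure before, on, and after the event day.

    `daily` maps a date to the value of one mobility measure for one place
    (for example number of trips); missing days count as zero.
    """
    before = [daily.get(event_day - timedelta(days=d), 0)
              for d in range(days, 0, -1)]
    after = [daily.get(event_day + timedelta(days=d), 0)
             for d in range(1, days + 1)]
    return {
        "before_mean": sum(before) / days,
        "event": daily.get(event_day, 0),
        "after_mean": sum(after) / days,
    }


# Synthetic daily trip counts around the date of the earthquake in the talk.
event_day = date(2008, 2, 3)
daily_trips = {event_day: 40}
for d in range(1, 11):
    daily_trips[event_day - timedelta(days=d)] = 10
    daily_trips[event_day + timedelta(days=d)] = 14

summary = window_summary(daily_trips, event_day)
```

A monthly aggregate of the same synthetic series would hide the one-day spike, which is the reason given above for zooming in to the daily level.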
So what is interesting here is to be able to figure out in real time
whether an event such as a disease outbreak takes place and be able to
predict the spatial spread and the existence of an event we are not aware of.
Here, obviously we are aware of the event. We kind of try to reverse
engineer, but the interesting application would be the development of
predictive models that would allow us to see in real time, and that goes back
to your question, right, can we see the answer in real time and really be able
to predict when something takes place and where that something takes place.
And so what we see here is that, see, mobility levels clearly seem to go up
the day of the event and that is clearly visible but after the event for this
particular grid cell, they seem not to come back to the levels that were
there before. So you see there is a clear change between the days before the
event and the days after the event.
>> [Indiscernible] other way around. [Indiscernible] after.
>> Adrian Dobra: Oh, I'm sorry. Okay. Fine, right. So it's -- fine. So I
reverse my story. Right. Okay. Yes, you're right. Red is after,
green is before, right. So there is less mobility before the event. There
is a spike the day of the event and then mobility slows down but not to the
levels that were there before. The point is we can clearly see the event in
the data and when we talk about disaster recovery, it is quite important to
figure out when an area -- the activity in one area has resumed at its normal
level before the event. So that kind of information is clearly there in the
data. So this is the grid cell 361 right next to the grid cell we just looked
at and here we clearly see the number of trips the day of the event
stands out. There is a little bit more variability in this grid cell in
terms of mobility before and after the event but the patterns are still
there. Yes?
>> I think in this case [indiscernible] because the number of trips might
increase because [indiscernible] same mobile but [indiscernible]. So then my
number of trips increase, my number of [indiscernible] increase, all those
metrics increase [indiscernible] not changed at all, right?
>> Adrian Dobra: I absolutely agree with you in this context it makes sense
to look at things like frequency of calls and the literature has examples in
which, say, sporting events have been captured from CDRs. I mean, you can --
if you look at the work again, you see when a team scored because people
started calling right after a goal is scored and so you can clearly see the
up and down so you can see the half time, you can see the end of the game and
so on. So that is people being in the stadium. Then obviously there is not
going to be much change in their mobility level but the frequency of calls is
going to capture that. And so I absolutely agree in this context it makes
sense to look at that aspect.
What I tried to show is that using the same mobility measures I defined, I am
able to see the event quite clearly. I mean, see, if you are expecting
something to happen, then monitoring frequency of calls is something you can
look into. But if you want to identify events that are about to happen, the
frequency of calls at least in my opinion has a lot more volatility and so
for example, if a certain campus is affected by a disease outbreak, then
what's going to happen is that people might not come to work as they usually
do and then if you look at mobility patterns in that area, that is going to be
reflected. The frequency of calls might not be, might not play a role -- yeah.
>> [Indiscernible] mobility metrics are not immune to change in frequency of
calls and then [indiscernible] reliable to look at them if there's been a
sudden switch in the frequency pattern.
>> Adrian Dobra: I absolutely agree. But for developing countries, looking
at Rwanda, the frequency of calls is not something I trust because in that
context, people might not make all the calls they want to make and they might
not make all the calls they want to make from all the cell phones -- from the
same cell phone. And so here, I absolutely agree with you. If you assume
somebody is using their cell phone and it's the only cell phone they are
using, every time they want, then you would want to look at frequency of
calls but in the context of Rwanda, I do not think the frequency of calls is
so much indicative of behavior because they might not have the means to do it
and they might share several cell phones and so on. I mean, I heard that
people have their own sim cards but share the actual phone and they just
replace sim cards to place their own calls and so on. So actually these
records indicate sim cards, not the cell phones per se.
Yeah, so essentially -- and the pattern is still there, right. If you look
at place 467 that is north of Lake Kivu, so spatially, you pretty much get to
see the impact that earthquake had and you can clearly see that that impact
is very, very visible from the CDR information that we have. So that's place
70, same story. [Indiscernible] strong and it is there.
And so I will conclude here, you know, part of what I'm interested in is HIV
spread but this is only part of the picture. There are so many ways this
research can grow: models for identifying relevant places and the dynamics of
those places are very relevant, predictive models for catastrophic events, for
extreme events, are relevant, and capturing mobility and also building models
that predict health and well-being are really relevant.
The key aspect of all this is looking at GPS geo-located data and combining
that data with GIS information and so you know, the avenues one could go from
as far as this line of research is concerned are really endless and you know,
I guess we can only see the beginning of this research taking off as we start
to have access and collect GPS information. So I will stop here. Thank you.
[Applause]
>> Adrian Dobra: Any other questions? All right. Thank you then.