>> Asela Gunawardana: So thank you for coming. I'm pleased to welcome Adrian Dobra from the University of Washington, where he studies mobility and applies it to very important problems such as HIV transmission. Adrian.

>> Adrian Dobra: Thank you, Asela, and thank you, Chris, and thank you, John, for inviting me to give this talk. My group does a lot of work with mobility, and today I'd like to share some of the things we are working on. There's quite a bit, and the structure of our group reflects how interdisciplinary this effort is, which is really key. Matt Dunbar is a GIS specialist; GIS plays such an important role in this work. Tim Thomas is also a GIS specialist and a Ph.D. student in sociology. Nathalie Williams is a sociologist who also works a lot with migration data and has a special interest in modeling conflicts. And I'm a statistician, so what we work on are really big data sources, and we'll talk quite a bit about that. Historically, mobility has been recorded in quite a few ways, and migration has been of special interest. In the past, bigger movements were of key interest, where by migration we mean people moving from one country to another and remaining there for a good bit of time, maybe forever. By contrast, by mobility we mean movements that are short both in duration and in spatial extent. In the past, because people had few opportunities and resources to move, there was a clear distinction between mobility and migration. But now people have more resources and tend to move more; it's a lot easier to move. So nowadays the distinction between mobility and migration is quite blurred. Migration now tends to play a smaller role, mobility seems to be key, and that is really what we're going to talk about today. Why is mobility important? Mobility explains quite a bit of what society looks like.
Mobility is part of understanding social behavior and part of understanding why societies change, because the concentration of people, culturally and with respect to the workforce, is all related to mobility. Moreover, if we want to look at the spread of infectious diseases, mobility plays a key role, because how people move in space and time is a key factor in how disease spreads. Health behaviors are also explained quite a bit by mobility: by understanding the different contexts in which people are active, we can understand their health behavior and why some people are more likely to get sick than others. And if we talk about economic, social, and political well-being, these are all, in a sense, key effects of mobility. For that reason, a lot of effort has been dedicated in the past to capturing migration and mobility, through censuses and surveys. These are wonderful sources of information. However, actually performing a survey involves a lot of preparation, so only events that were somehow planned or known of could be captured, and surveys and censuses are expensive. There are biases related to recall and biases related to people not reporting their actual movements. So while there is great value in censuses and surveys, there are lots of questions that simply cannot be answered through these more traditional sources of mobility information. Coming back to today, we have access to call detail records (CDRs), or at least we know that companies collect a lot of information related to, for example, phone billing, and through call detail records one can capture the movement of millions of people at the country level. Because companies collect that kind of information on a regular basis, all types of events, including unplanned ones, are being captured, so call detail records represent an incredibly powerful source of information.
But at the same time, there are biases in the data, with respect to phone ownership and phone usage. And because of disclosure restrictions, it is really not possible to know anything about the people represented in call detail records other than their movement. For that reason, connecting censuses and surveys with CDRs represents an avenue of research that would bring a lot of value. However, as we are going to see, this is far from a trivial effort. So what is the overall goal? The overall goal is to really make use of these huge sources of mobility information, combine them in some meaningful way, and link them in order to understand how people move. To that end, developing measures of mobility and developing proper statistical models that capture mobility is really, really important. Because the amount of mobility information is so huge, developing models involves a lot of effort, both from a methodological perspective and from a computational perspective, and all this mobility information from surveys and CDRs also needs to be combined with GIS information. We'll talk about that later on. If all this effort is made, then we can obtain incredible results, because these results would allow us to give better answers to existing questions, but also to ask new questions and to capture aspects of mobility and of the population at the country level that were not obtainable before. Just to give you an example, here is a data set I'm going to use throughout the talk: I have acquired access to a data set that comes from a major telecom provider in Rwanda. The data spans January 2005 through May 2009. For each person, I know the complete information about all the calls they made within that period of time.
I know the time of the call and the duration of the call, and I have a unique identifier that allows me to connect all the records corresponding to a particular caller. Coupled with the known locations of the cell towers, this gives me a way to obtain the approximate location where each call was made. This is what I'm going to mean by CDR. From this short description, you can see that while CDRs allow me to capture movement, I do not know anything else about these people. Of course, the frequency of calls could be an indication of their wealth, and the literature has been using information from CDRs as proxies for social variables. However, the information is quite limited. For that reason, I try to combine this information with survey data at the ecological, group level. From the same time period, Demographic and Health Surveys (DHS) data is available for three different years: 2005, 2007, and 2010. DHS uses two-stage stratified sampling. In the first stage, a number of locations where surveys take place are sampled, and then, given those locations, people in the immediate vicinity are surveyed. DHS releases the approximate locations where the surveys took place after a random perturbation, which gives me information about the spatial location of the DHS clusters. So, putting everything together, let's see what we are dealing with. Yes, John?

>> I just wanted to ask about the CDRs. What are the barriers to getting real-time CDR data?

>> Adrian Dobra: Real-time CDR data, meaning that I would see a record immediately after it has been generated. Right now, in the U.S., this is being done. There are companies that actually sell CDR data, so you can check them out; AirSage (airsage.com) is one example. AirSage has its own network within the networks of, I think, pretty much the top four providers in the country. So CDRs are being generated.
AirSage gets pretty much all the records of the people making calls into its database, so technically this is being done. However, purchasing that kind of data, which I tried to do, is extremely expensive. Companies such as Walmart, interested in where people come from in the vicinity of one of their stores, can do that. But for people in academia, actually seeing that data is not possible. So yes, I've been down that route. Yes?

>> [Indiscernible] now it's kind of moved into [indiscernible], which report their position at [indiscernible] Intel by which much less by [indiscernible]. We talked about all the biases of CDRs. [Indiscernible] kind of reporting that [indiscernible] does is much less biased in several aspects, but then it's biased [indiscernible].

>> Adrian Dobra: I mean, in all fairness, telecom companies have a lot more location information than CDRs, because if somebody doesn't make calls, they are going to be invisible in the CDR database. However, telecom companies have what's called RFID data. I think every 30 seconds our cell phones get a ping, so through triangulation, companies know where our cell phones are, and those databases exist.

>> The GSM send-out is not designed for that. It's not -- so the phone does that. The phone knows [indiscernible]. The addition of software which measures quality, reads it, and sends it [indiscernible], which is in smartphones, so it's [indiscernible]. But it's not -- I mean, it's [indiscernible] let people think that their phone is reporting its position, and it's not -- the phone, where it is [indiscernible] triangulate but sent report [indiscernible].

>> Adrian Dobra: I mean, I know about [indiscernible] data only anecdotally, really, because I haven't seen it, and I don't think I will. However, I've been talking to people who have analyzed [indiscernible] data, and that's what they told me.
You know, they look at, so --

>> [Indiscernible] widespread things like [indiscernible].

>> Adrian Dobra: Yeah, yeah.

>> [Indiscernible] do you also have temporal information, [indiscernible] information?

>> Adrian Dobra: What I know is when the survey took place, and I just know the year.

>> Year.

>> Adrian Dobra: Right. That's about it, because surveys take months to really be --

>> [Indiscernible].

>> Adrian Dobra: Right, right. So there are many limitations. The point I was trying to make here is that a lot of mobility information about our society exists. There's a huge amount of it, and what I got access to is just a tiny glimpse of it. So in this effort to connect survey and CDR data, what kind of map are we going to look at? This is a map of Rwanda. Rwanda is a very small country, and what I plotted here are the locations of the DHS clusters: blue, light blue, and green indicate the surveys from 2010, 2007, and 2005. If you follow these colors, you're going to see that the reported locations are not the same, because of the two-stage stratified sampling: they sampled different locations, and the perturbation was also different. So even connecting DHS surveys from several years is difficult. In red, you see the locations of the cell phone towers I know of, and you can see that their spread varies. Rwanda has its capital, Kigali, pretty much in the middle, and that's where the largest concentration of cell phone towers and DHS clusters occurs, right here. The rest of the country has a few cities, mostly around the border areas, but as far as I can tell, it is mostly rural.
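To make the CDR structure concrete, here is a minimal sketch that groups calls by caller and attaches to each call its approximate location, namely the position of the tower that handled it. It assumes a hypothetical record layout holding the four fields mentioned in the talk (caller identifier, call time, duration, tower) and a hypothetical tower table giving each tower's position in projected kilometre coordinates. Nothing about the actual format of the Rwanda data is implied.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical record layout: one row per call, with the fields named in the talk.
@dataclass(frozen=True)
class CallRecord:
    caller_id: str       # anonymized, unique per caller
    timestamp: datetime  # time the call was placed
    duration_s: int      # call duration in seconds
    tower_id: str        # tower that handled the call

Point = tuple[float, float]  # (x_km, y_km) in an assumed projected coordinate system


def locate_calls(records: list[CallRecord],
                 towers: dict[str, Point]) -> dict[str, list[tuple[datetime, Point]]]:
    """Build a time-ordered trace of approximate call locations for each caller.

    A call's location is approximated by the position of its tower; calls whose
    tower is missing from the table are skipped.
    """
    traces: dict[str, list[tuple[datetime, Point]]] = {}
    for rec in records:
        pos = towers.get(rec.tower_id)
        if pos is None:
            continue  # no known position for this tower
        traces.setdefault(rec.caller_id, []).append((rec.timestamp, pos))
    for trace in traces.values():
        trace.sort(key=lambda item: item[0])  # chronological order per caller
    return traces
```

Per-caller traces like these are the kind of input from which the mobility measures discussed later in the talk would be computed.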
So we would expect a lot of information to come from this area, and there's quite a bit of information coming from where the cities are, but in the rest of the country the spread is quite large. Actually, this picture is a bit misleading, because it shows an entire country in relatively few pixels; if you zoom in, you'll see that the sparsity is quite high in pretty much every area other than Kigali. So the first challenge is quite significant, and it is compounded by the fact that not all the towers you see here were active at the same time. These are all the towers I know of, but if you look over time (and we are going to look at that plot), you'll see that the number, locations, and concentration of towers vary quite a bit across the country. So connecting CDRs with survey data involves a lot of modeling, and that's one of the to-do items on my list. So far, the results I have from a first look are pretty much full of questions that are yet to be addressed. Another country I'm very interested in is South Africa. In South Africa, because of the apartheid regime, the phenomenon of circular migration is widespread. Many men especially had to travel for work: they had to leave their hometowns and go to work in mines or factories located quite far from where they lived. The families of these men didn't have permits to move with them, so families were separated for six months or maybe a year at a time. After the apartheid regime was abolished, those travel permits of course disappeared, but the way of life has not. And because of this separation of families, HIV is so widespread: men may look for other partners while away, and the women may be doing the same thing.
So I'm very interested in this country because it has known mobility patterns that are strong and still present nowadays. I established a collaboration with the Africa Centre, which manages a study area located north of Durban. This is Johannesburg, and this is the location we're looking at. It's mostly rural. This is a map of the surveillance area; about 100,000 people live there. The small dots represent the locations of the homesteads. They have complete [indiscernible] information about everything that takes place in the area: the roads, the locations of the health clinics. They map all the homesteads; they map pretty much everything that can be mapped in that area. Every year, they try to do HIV testing for all the people in the area. Of course, not everyone consents; about 30 percent of the people give their consent to be tested. And every six months they collect [indiscernible] demographic information. So that represents a really incredible source of information; this is actually state of the art. So what do they do with all that location information? They create maps like this. Each map you see here represents one year: 2005, '6, '7, '8, '9, '10, '11. The redder colors show how HIV has spread in the surveillance area. You see that it starts off with a higher concentration here, where a township is, and then it spreads more and more, so that by 2011 HIV seems to be quite widespread in the area. Of course, this is not necessarily -- yeah, please go ahead.

>> [Indiscernible] each year.

>> Adrian Dobra: Each year, right. 2005, '6, and so on. That's when they collect their information. Right.
Of course, higher HIV prevalence is not necessarily something negative, because it could also be an indication of people living longer. People follow treatments, they are on ART, and follow-up and adherence are quite good in that surveillance area, so people live longer with the disease, and for that reason we see higher HIV prevalence. So what would be the key weakness of this map? This map comes from a Science paper published in 2013, so a top journal and really high-end data. But what would be the problem with this map? What does this kind of map assume without really having to spell it out? Any guesses?

>> [Indiscernible].

>> Adrian Dobra: Exactly. This kind of map is based on the locations of the homesteads. These maps assume that people are where their homesteads are, but this is such an inappropriate assumption for this area because of circular migration. So while this is state of the art (this is one of the few surveillance areas with such detailed GIS information), what you would really want to capture is movement, and that is exactly the missing part of the picture.

>> There are other dimensions that it misses. I mean, you talked about the introduction of ART treatment --

>> Adrian Dobra: Right.

>> -- when was that introduced, and what was the prevalence of that?

>> Adrian Dobra: Right.

>> So I think that's one dimension. But then there are also demographic changes in the population: have there been population changes here? So I think there are a lot of different [indiscernible] that are missing in any kind of static view.

>> Adrian Dobra: Absolutely. But quite a few of these dimensions are related to mobility, because adherence to long-term treatment is related to mobility, right? A person who lives in the same area is going to be aware of where the health clinics are.
They're going to know where to get their pills. But if somebody is traveling around the country and faces environments they're not familiar with, perhaps they are unlikely to go and see a doctor when they need to, unlikely to get their pills when they need them, and so on. So pretty much all the factors I think you mentioned can be --

>> [Indiscernible]. So if you have a new treatment, that's not [indiscernible] missing here by population density that I think -- so is there a sense in which the hot spots correlate with population density?

>> Adrian Dobra: They have adjusted for that. As far as I can tell, they have adjusted for the changes in the population.

>> I expect that the hot spots correlate with population density, so the red areas [indiscernible] correspond to --

>> Adrian Dobra: Well, the hot spots correspond more to distance from major roads, and we are going to get to that. There's a major road right here going through that area, and you see how HIV has spread pretty much along that road. There's a township here, which is where the major road is, and you see how population density, but also HIV, kind of spreads along it. Yes?

>> Do you have a sense of the changes in consenting to testing? [Indiscernible] treatment more efficient or if you know sick people [indiscernible] errors, and the more efficient the treatment is, the more people would consent to testing, right?

>> Adrian Dobra: Right. I think initially they got about a 60 percent consent rate, and then it pretty much dropped to 30 percent in subsequent years, but it's not the same people who give their consent each year.

>> [Indiscernible] expected the opposite. As treatment becomes more efficient, you are more likely to be willing to be tested. I mean, I don't know, maybe there are other factors.

>> Adrian Dobra: Right. But there is still the mobility factor.
These are a couple of pictures I took back in January when I went there. This is what a school looks like over there. This is a typical homestead in that area: pretty much a one-room dwelling where everything takes place. And here are two of my friends, two fieldworkers who took me along while they were going from door to door. What you see here are the stacks of files they were carrying; there are maybe 10 or 15 files there, and together they correspond to one household. So there were about 10 or 15 people recorded as living in one of these dwellings. I asked them how it was possible for so many people to fit in that homestead, and they said, well, people are never there. Most people recorded as belonging to that homestead are out somewhere working, and they periodically come back; there's some sort of understanding of who comes and who leaves, and so on. That is the reality of their life: there is a lot of mobility, and residential location has very little meaning in their lives. It has a lot more meaning in our lives, but in theirs it has very little. So capturing mobility is, in a sense, a lot more important in the South African context than in the Seattle context, if you like.

>> What does consent rate mean in this context? Is it a percentage of people or a percentage of households?

>> Adrian Dobra: Right. That's a fair question, and it is something I need to talk about a bit more with my friends at the Africa Centre, because their papers obviously don't convey all the details, and a lot of questions come up once you really understand what their life looks like. That to me was one aspect I was really intrigued about. So this is just an example. Well, it's a fake one, but I tried to make it as realistic as possible.
This is an example of what the life of a person could look like. This person, a man, say John, lives in the surveillance area, but he has to work in Durban. He periodically comes back to the surveillance area to see his family, but he spends most of his time in Durban. So he is exposed to multiple contexts that differ quite a bit between when he is in the surveillance area, where his home is, and when he is in Durban. In Durban he is away from his family, so he might be more prone to risky behavior. In Durban he cannot afford to live in a good area, so he lives in a less desirable part of the city, where there are a lot of bars, sex workers, and so on. So the temptation might be there for him not to follow the kind of life he would have if he were with his family. If John happens to have acquired HIV and is on ART, he needs to take his pills regularly. When he is near his home, he's going to know where the health clinics are. In Durban, maybe he just doesn't know there is a health clinic a couple of blocks away. So while the possibility is there for him to get his pills, maybe because of the different environmental context he's exposed to, he just doesn't know where to go. Or maybe there is social stigma associated with going to the nearest health clinic in his hometown, so he's more likely to obtain his pills when he's away than when he's in his hometown. We just don't know. Yes?

>> [Indiscernible]?

>> Adrian Dobra: I do not know about that, no. The only point I'm trying to make here is that moving from a single point location to looking at mobility patterns, and figuring out what the relevant locations in somebody's life are, is really, really important for explaining health behavior, social aspects, and so on.
The information present in CDR data is part of the picture, but really this is the kind of data I would like to look at. The project, which will hopefully get funded, will collect GPS data from about 1,200 people living in this area. So hopefully I will get to do that, and next time I come here to give a talk I will have real data instead of this schematic. So what do dynamic contexts represent? Look at the space-time trajectory of a person: the plane here represents latitude and longitude, X and Y, and Z represents time. You see how this person seems to stay at this key location for quite a long period of time but periodically moves to this other location. So identifying when a person is present at certain locations, and the number and spatial spread of these locations, is certainly important, and that is precisely the kind of information we would need to capture. At the same time, knowing just the location, latitude and longitude, doesn't have much meaning. What we would like is layers of GIS maps associated with different risk factors, say accessibility, road networks, concentration of healthcare facilities, access to food stores, and so on. Each of these GIS layers needs to be produced, and each can be connected with spatio-temporal trajectories. Put together, everything will give an accurate picture and let us produce locations that also have meaning attached to them, and that meaning can come from many different dimensions. This is why GIS work is really, really important, together with the effort of collecting good spatial information. Back in January, UNAIDS actually released a report called "Location, Location, Location." In this report, they stressed the importance of going from country-level estimates of disease prevalence to the local level.
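The idea of identifying when a person is present at key locations can be illustrated with a small sketch. It assumes a hypothetical trace of (timestamp, location label) pairs, such as GPS fixes already snapped to named places, and credits the time between consecutive observations to the earlier location. That crediting rule is a simplifying assumption, not the method described in the talk, and the 10% threshold in the hypothetical `key_locations` helper is arbitrary.

```python
from collections import defaultdict
from datetime import datetime


def dwell_shares(trace: list[tuple[datetime, str]]) -> dict[str, float]:
    """Share of observed time spent at each location in a trace.

    Assumption: a person stays where they were last observed until the next
    observation, so each gap is credited to the earlier location.
    """
    trace = sorted(trace)
    if len(trace) < 2:
        return {trace[0][1]: 1.0} if trace else {}
    seconds: dict[str, float] = defaultdict(float)
    for (t0, loc), (t1, _) in zip(trace, trace[1:]):
        seconds[loc] += (t1 - t0).total_seconds()
    total = sum(seconds.values())
    if total == 0:
        return {}  # all observations share one timestamp: no duration to split
    return {loc: s / total for loc, s in seconds.items()}


def key_locations(trace: list[tuple[datetime, str]], min_share: float = 0.1) -> list[str]:
    """Locations holding at least `min_share` of observed time, most-visited first."""
    shares = dwell_shares(trace)
    return sorted((loc for loc, s in shares.items() if s >= min_share),
                  key=lambda loc: -shares[loc])
```

For a day observed at home from midnight, at work from 08:00, and at home again from 18:00 until the next midnight, `dwell_shares` gives home 14/24 and work 10/24 of the time.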
Moving from the country level to the local level could be done, for example, by using the locations of homesteads, if that kind of information were available countrywide. That would represent a huge step forward. However, as we have just seen, that kind of information has only limited relevance in countries where mobility plays such a key role and there is a lot of volatility of movement. So capturing multiple spatial contexts is really, really important. The residential context, which is dominant in the demography literature, is by no means sufficient, although knowing the residential context at the country level is still something that has yet to be achieved, especially for disadvantaged countries. What is also important is the spatial colocation of individuals. Being able to perform a real-time census, knowing at all times where people actually are, is quite important, because if you think about the spread of infectious diseases, spatial colocation is a key indicator. For all these reasons, geolocated data, which can come from quite a few sources, becomes really, really important. Ideally, what we would like to have for each person is an indication of the areas in which the individual spends time, the times when the individual is in those places, the meaning attached to those places (the kind of activity the individual is likely to engage in at each of those key activity areas), and the associations between activity areas and various contextual measures. Overall, these personal activity maps should replace residential information, and based on them we can have a comprehensive picture of somebody's well-being and health and of the effect their spatial and temporal behavior has on that well-being. So what kind of location information can we look at? The first source is individual GPS traces.
GPS traces are quite common in the literature dedicated to obesity. Studies there use GPS trackers and accelerometers to capture really dense location information. However, using those devices over a longer period of time, where long means two weeks or one month, becomes questionable, especially when you want to increase your sample size. Ideally, this is the kind of information you would want to have. However, obtaining it is quite difficult if you also want to determine activity types, that is, what kind of physical activity a person does. Another source of information is individual cell phone records. If somebody would just sign a disclosure agreement that lets us access all of their cell phone records, that would be amazing, because their approximate location information already exists in the databases of the phone company they use. However, making something like that a reality is a challenge: not only is getting somebody to sign such a disclosure agreement difficult, but obtaining the individual records from the cell phone companies is another level of difficulty. The third source of information is collective cell phone records, which are incredible if those databases are released. However, the problem with collective cell phone records is that no other information is available about those individuals, so inferences can be made only at the group level.
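Spatial colocation, raised above as a key indicator for infectious disease spread, can be operationalized in many ways. Below is a minimal sketch assuming hypothetical (person, timestamp, place) observations, in which two people count as colocated when they are observed at the same place within the same fixed one-hour window. Both the windowing scheme and the window length are illustrative assumptions, not choices made in the talk.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations

EPOCH = datetime(1970, 1, 1)  # reference point for fixed time windows (naive datetimes)


def colocated_pairs(observations, window=timedelta(hours=1)):
    """Count how often each pair of people is seen at the same place in the same window.

    observations: iterable of (person_id, timestamp, place_id) tuples.
    Windows are fixed, non-overlapping bins counted from EPOCH (an assumption).
    Returns {(person_a, person_b): number of shared (place, window) bins}.
    """
    present = defaultdict(set)  # (place, window index) -> people observed there
    for person, ts, place in observations:
        present[(place, (ts - EPOCH) // window)].add(person)
    counts = defaultdict(int)
    for people in present.values():
        for a, b in combinations(sorted(people), 2):
            counts[(a, b)] += 1
    return dict(counts)
```

Fixed bins are the simplest choice. A sliding window would avoid missing two people observed a few minutes apart on either side of a bin boundary, at the cost of more bookkeeping.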
If individual traces were available (this is real data from an obesity study done at UW, GPS traces over a period of two weeks), then from those traces we could extract key locations, and also patterns of activity and mobility at the day level or the weekend level, and we could slice this kind of information temporally. Having access to that kind of information would have incredible value. So, given the mobility trace of an individual, how can we actually quantify their movement? Last year I started looking at a couple of mobility measures, some of which are known in the literature; these are the six measures I've been looking at, and I will describe them here. What you see here are maps of the locations of the individuals who were most mobile with respect to each mobility measure. This is the data I have from Rwanda, 2005 to 2009. I evaluated mobility for all individuals over a period of one month and then extracted the most mobile ones. First, the number of towers used: this is the profile of the individual with the largest number of towers used, and you see this individual has been pretty much everywhere in the country. This is the profile of the individual with the maximum distance between two used towers. This is the profile of the individual with the maximum radius of gyration. The radius of gyration is a sort of variance of all the positions at which somebody was recorded: you start by calculating the centroid of all the known locations, then compute the distance from each known location to the centroid, and take the square root of the mean of the squared distances. So that's the radius of gyration; then there's the area of the convex hull, and so on.
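Four of the measures named above (number of towers used, maximum distance between two used towers, radius of gyration, and convex hull area) can be sketched as follows. This assumes each call's tower position has already been projected to planar kilometre coordinates (the raw data would be latitude and longitude), and it computes the radius of gyration over every recorded position, so frequently used towers weigh more. The remaining two of the six measures are not named in this part of the talk and are not reproduced. The convex hull uses Andrew's monotone chain algorithm and the shoelace formula, which are standard choices rather than anything the talk specifies.

```python
import math
from itertools import combinations

Point = tuple[float, float]  # (x_km, y_km), assumed projected coordinates


def radius_of_gyration(points: list[Point]) -> float:
    """sqrt of the mean squared distance from each recorded position to their centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)


def max_pairwise_distance(points: list[Point]) -> float:
    """Largest straight-line distance between two distinct used towers."""
    return max((math.dist(p, q) for p, q in combinations(set(points), 2)), default=0.0)


def convex_hull(points: list[Point]) -> list[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: list[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: list[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_area(points: list[Point]) -> float:
    """Shoelace formula over the convex hull; 0 for fewer than three non-collinear points."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    s = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]))
    return abs(s) / 2.0


def mobility_summary(points: list[Point]) -> dict[str, float]:
    """points: one position per call over the period (e.g., one month)."""
    return {
        "towers_used": len(set(points)),
        "max_distance_km": max_pairwise_distance(points),
        "radius_of_gyration_km": radius_of_gyration(points),
        "hull_area_km2": hull_area(points),
    }
```

As the talk goes on to point out, all four measures describe spatial spread only: two callers who visit the same set of towers get identical values whether they travel between them once or every day.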
What is interesting here is that the different mobility measures bring up different profiles. But I soon became dissatisfied with all of these measures, because through them I wasn't capturing the frequency of movement. I was able to capture the spatial aspect of movement, but I wasn't able to capture trips, and how frequently people move is important. So I said, well, I must find a way to capture spatial spread but also frequency of movement. What I also didn't capture here is how movement occurs, because what I had were the locations of the cell phone towers from which each person made calls, and through those mobility measures I had no way to say, okay, this person must have followed a road in order to travel from one location to the next. Treating tower locations as isolated points in space is less than desirable, so I had to find a way to also capture how movement might have occurred. Traveling by air is unlikely in Rwanda; people must have followed the roads. So building the road structure into this seemed important. What I also had to capture was the fact that towers did not exist for the whole period, all 53 months I was looking at. This is why I used a grid structure: I used a grid of five-kilometer squares, mapped each tower to the grid cell it belongs to, and considered each cell to be a location. Some grid cells did not contain any towers, and the grid cells that contain at least one tower were called places. I used places instead of the tower locations per se because that gave me a bit more stability over time with respect to the reference locations.
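The grid construction can be sketched as below, assuming tower coordinates have been projected to kilometres relative to some arbitrary origin. The 5 km cell size comes from the talk; the function names and coordinate handling are hypothetical and for illustration only. Towers falling in the same cell collapse into a single place, which is what makes places more stable over time than individual towers.

```python
import math

CELL_KM = 5.0  # side length of the square grid cells described in the talk

Cell = tuple[int, int]


def cell_of(x_km: float, y_km: float, cell_km: float = CELL_KM) -> Cell:
    """Index of the grid cell containing a point given in projected km coordinates."""
    return (math.floor(x_km / cell_km), math.floor(y_km / cell_km))


def tower_places(towers: dict[str, tuple[float, float]]) -> dict[str, Cell]:
    """Map every tower to its grid cell; cells that receive at least one tower are 'places'."""
    return {tid: cell_of(x, y) for tid, (x, y) in towers.items()}


def place_sequence(tower_ids: list[str], towers: dict[str, tuple[float, float]]) -> list[Cell]:
    """Replace a caller's sequence of towers with the corresponding sequence of places.

    Towers with no known position are dropped.
    """
    lookup = tower_places(towers)
    return [lookup[t] for t in tower_ids if t in lookup]
```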
So those two plots illustrate the dynamics of the CDRs from 2005 to 2009. The top plot shows the number of callers in this provider's network, so you see the number of callers go from about, I think, 250,000 to about a million as we move from January 2005 to May 2009. You see here there are two drops; those are two months, I think May 2005 and April 2009 -- February 2009, for which there was very little information, so essentially I dropped those two months because there wasn't much data there. And if we look at the plot over time of the number of active towers in red and the number of active places, then you see how the network of this provider has expanded, and there were huge increases from 2008 to 2009. By replacing tower-level location information with place-level (grid-level) information, I stabilized this effect a little bit, so the changes were not that huge. The next plot shows which places were active from 2005 to 2009. In blue here you see the grid cells that contained active towers in January 2005, and there were 49 of them. In red, you see which grid cells, which places, have been added relative to the previous plot. So you see how the network has expanded, and you see that there is quite a huge change, right, when they installed new towers, but you also see that going from January 2009 to May 2009, some places have disappeared. See, comparing these two months, there were no places added, but two places, shown in green, have disappeared. And so looking at the dynamics of the CDRs is a little bit more difficult because of the different concentration of location information that I have. Another aspect that was very important for me to capture was related to how the movement occurred. On the left side, you see a plot of the towers. On the right-hand side, we see the major and secondary network of roads in Rwanda superimposed with the locations of towers.
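The comparison of active towers with active places, and of places added or lost between two months, can be sketched as below. The input layout (one (month, tower_id) pair per call) and the helper names are assumptions for illustration, not the authors' pipeline.

```python
from collections import defaultdict

def monthly_activity(calls, tower_place):
    """Active towers and active places per month.

    calls: iterable of (month, tower_id) pairs, e.g. ("2009-01", "t17").
    tower_place: dict tower_id -> place (grid cell).
    Returns dict month -> (set of active towers, set of active places).
    """
    towers, places = defaultdict(set), defaultdict(set)
    for month, tower_id in calls:
        towers[month].add(tower_id)
        places[month].add(tower_place[tower_id])
    return {m: (towers[m], places[m]) for m in sorted(towers)}

def place_changes(activity, prev_month, month):
    """(places that appeared, places that disappeared) between two months."""
    before, after = activity[prev_month][1], activity[month][1]
    return after - before, before - after
```

Counting `len(...)` of each set per month gives the two curves (active towers, active places), and `place_changes` gives the added and disappeared places shown on the maps.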
And so I thought that I must find a way, I must develop more [indiscernible] measures that capture the road network structure, because this is how movement actually occurred. And so I came up with six mobility measures, and really there are three groups of mobility measures. So I have trips, right: I look at movement as captured by a person placing calls from one place, then from another place, then from another place and so on, and I measure that movement through road distance and travel time, which come from GIS information, and also through the number of grid cells between two places. And so you see, in the previous version of the mobility measures, distance was quantified as the crow flies, and that could be very different from distance on the road network [indiscernible]. And so these three measures capture the frequency of movement and also the road network structure, so they pretty much achieved what I wanted. >> I don't understand the distance versus grid cells very well. They seem like they'd be relatively close. >> Adrian Dobra: They are all very close, right. So essentially, the only difference between these three mobility measures is the unit of measurement. >> Time might be very different. >> Adrian Dobra: In the context of Rwanda, it really is not. I was expecting that time would be different. That was my intuition, but this is not what I get to see. In the context of a more advanced society, I would indeed expect this measure to be different, but in the context of Rwanda, maybe traffic there is not as complicated as it is in Germany, for example, or here. Right. And so having all three measures seems important to me, although they capture overlapping information in Rwanda. >> Sort of estimated drive time? >> Adrian Dobra: Estimated travel time. Right. Yeah. So how long it would take to travel from here to here using the road network. Yes? >> [Indiscernible] from the road. >> Adrian Dobra: Yes, yes.
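Road distance and travel time between two places can be sketched as a shortest-path search over a road graph whose nodes are grid cells. This is a generic Dijkstra sketch on a hypothetical graph, not the GIS tooling used in the talk; edge lengths and travel times would have to come from real road data, and counting grid cells as the number of steps along the route is an assumption.

```python
import heapq
import math

def add_road(graph, a, b, km, minutes):
    """Add an undirected road segment between two grid cells (hypothetical data)."""
    graph.setdefault(a, []).append((b, km, minutes))
    graph.setdefault(b, []).append((a, km, minutes))

def shortest_route(graph, src, dst, weight="km"):
    """Dijkstra over a road graph whose nodes are grid cells.

    graph: dict cell -> list of (neighbor_cell, length_km, minutes).
    weight: "km" for road distance or "min" for travel time.
    Returns (total_cost, path_of_cells), or (math.inf, []) if unreachable.
    The number of grid cells crossed can be taken as len(path) - 1 (assumption).
    """
    idx = 1 if weight == "km" else 2
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [u]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist.get(u, math.inf):
            continue  # stale heap entry, a shorter route to u was already found
        for edge in graph.get(u, []):
            v, w = edge[0], edge[idx]
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return math.inf, []
```

With a graph in which the shorter road is slower, minimizing kilometers and minimizing minutes pick different routes, which is the situation the audience question anticipates; according to the talk, in Rwanda the three units ended up carrying largely overlapping information.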
>> [Indiscernible] seen someone [indiscernible]. >> Adrian Dobra: It's all -- right. We wouldn't have that kind of information, no. >> [Indiscernible], right? [Indiscernible] travel from the road. So all you have, you have XYZ, right, so you [indiscernible]. >> Adrian Dobra: Correct. But somebody -- I mean, but there are several aspects of that. Somebody might not be placing calls as they move and -- >> [Indiscernible]. >> Adrian Dobra: Yeah, yeah, yeah. Right. Yeah. It's not trivial to estimate actual travel time from CDRs. From GPS data it is absolutely possible, but from -- yeah. John, did you -- >> I had the same question, actually. >> Adrian Dobra: Right. So -- >> [Indiscernible]. By trips you're not talking about identifying some subsequence of the CDR. You are calling it a trip, but you're saying taking some period of time and computing the distance between successive pairs of towers and summing those distances? >> Adrian Dobra: It's simpler than that. What I looked at is this: when somebody placed a call from this grid cell, was the next call placed from another grid cell? If so, that's a trip. Moving from one grid cell to another grid cell, where both grid cells contain a tower, is what I call a trip. >> [Indiscernible] your CDR, what is the measure? So is it measured over some period of time? >> Adrian Dobra: No, no, the trip can occur in 30 minutes or it can occur the next day. >> Right. I'm not trying to [indiscernible]. So if I give you your [indiscernible] for the past month, what is the measure of mobility for you for distance? >> Adrian Dobra: So take, for example, this grid cell; that's a place. It has a cell phone tower. And take this other grid cell; that's another place that has a cell phone tower, right? And so if I place a call here and then the next call I place is here, then that's going to be a trip.
>> So you have a statistic that's a function of pairs of successive CDRs, and the statistic is zero if they didn't move out of the cell or something like that. >> Adrian Dobra: So if I place two consecutive calls from the same place, I'm not counting that, because I didn't move. >> Okay. >> Adrian Dobra: Yeah. Please. >> [Indiscernible] edge of two towers and just switching between them and [indiscernible] a lot of trips [indiscernible]? >> Adrian Dobra: Absolutely. Right. There are limitations to using a grid cell structure, and I agree with you. Borderline cell phone towers could indicate more movement than there actually is. But because of the varying density of cell phone towers in the country, and because of towers appearing and then disappearing, we considered that we gain, you know, we create more stability by using a grid structure versus not using it. And so the trip measures, these three measures, capture both space and frequency of movement. The number of trips just counts how many trips somebody makes, so there's no spatial aspect to it; it's just frequency of movement. And then the other two measures, number of visited places and number of visited grid cells, capture just the spatial aspect. Now, this is a key aspect I didn't talk about, and it relates to figuring out where somebody might have been even if they didn't place calls from those places. So for example, if somebody placed a call from here and the next call was placed from here, and if the shortest road is this one in red, then we also count it as a visit to this grid cell here and also this grid cell right here, because those are places somebody must have traveled through.
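The trip definition from this exchange (a trip is a pair of consecutive calls placed from two different places, whatever the time gap) and the inference of cells traveled through can be sketched as follows. The `route` callable is a hypothetical stand-in for a road-network shortest-path lookup, and the output keys are illustrative names, not the authors' code.

```python
def trip_measures(call_places, route, place_cells):
    """Trip-based measures for one person over one period (e.g. a month).

    call_places: time-ordered places (grid cells) from which calls were placed.
    route: hypothetical callable (a, b) -> (km, minutes, cells_on_route), where
        cells_on_route lists every grid cell from a to b inclusive, e.g. backed
        by a shortest-path search on the road network.
    place_cells: set of grid cells that contain at least one tower.
    """
    trips, km, minutes, cells = 0, 0.0, 0.0, 0
    visited = set(call_places)
    for a, b in zip(call_places, call_places[1:]):
        if a == b:  # consecutive calls from the same place: no movement, no trip
            continue
        d, t, path = route(a, b)
        trips += 1
        km += d
        minutes += t
        cells += len(path) - 1
        visited.update(path)  # cells the person must have traveled through
    return {
        "trips": trips,
        "trips_distance_km": km,
        "trips_time_min": minutes,
        "trips_grid_cells": cells,
        "visited_places": len(visited & place_cells),  # cells with towers only
        "visited_grid_cells": len(visited),             # all cells, incl. inferred
    }
```

Consecutive calls from the same place contribute nothing, matching the answer above, and cells on the route between two calls are added to the visited set even though no call was placed from them.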
And so we compensate for the fact that somebody might not have placed calls by inferring places and grid cells somebody might have visited based on their own [indiscernible], and that aspect was really important. You've seen that individual who was [indiscernible] with respect to the radius of gyration, who placed calls from all three sides of the country and no calls from anywhere else; that individual must have traveled through some places. So that essentially motivated this. Yeah? >> It seems like if you were trying to find the correlation between, say, HIV and sex workers or obesity and McDonald's, then you want to look for places where people stay for a little while so they have [indiscernible] to the risk, right? So just traveling through a grid cell, I wonder if that's really very indicative of risk. >> Adrian Dobra: Right, but we can -- well, it's not. I agree. And so this is only one version of mobility measures that take into account frequency of movement and are also a bit more stable with respect to the type of locations that are known to us. But I also agree with you that there are other measures that can be developed from here on. I mean, these measures need to be geared toward certain applications, right. And so, yeah, the sky is the limit with respect to how these measures can be changed or replaced, but for us, going from the first set of measures to this set of measures was a big step because they -- yeah? >> You also have a [indiscernible] so this is like a [indiscernible]. >> Adrian Dobra: Right. There are so many variables, so much additional information one can create based on that data once we throw in the road network structure and other GIS information. I absolutely agree with you.
So this was information that was relatively straightforward to understand in terms of [indiscernible], and it can be augmented in so many ways, so it is actually really valuable and, you know, it can be exploited in so many ways. Absolutely. We are just starting to realize how much we can do with it. So I -- how much time do I have? >> [Indiscernible]. >> Adrian Dobra: Yeah, but how much time? When do you have to leave? Because there are two things which I want to -- >> [Indiscernible]. >> Adrian Dobra: Okay. Yeah, okay. Great. So here I just show mobility profiles for the most mobile people with respect to each of these measures. As it happens, the most mobile person with respect to trips distance is also the most mobile person with respect to trips time and trips grid cells. I also looked at the association of these measures, and in the context of Rwanda, I absolutely agree, the unit of measurement plays very little role. What you see here in red are the places, the grid cells from which this person placed calls. In blue, you see all the grid cells this person must have traveled through, okay. And so you see the total distance traveled in meters, the total travel time, right, and the number of grid cells visited. So you see he's at the extreme on all those measures, and also with respect to the number of trips, how many transitions this person made. With respect to visited places, it's 0.45: I used the ratio between the number of visited places and the number of places that were active in that month, so I can compare from month to month, because not the same number of places, not the same number of towers, were active from month to month. And 215 is the number of grid cells. So that's one extreme profile. Another extreme profile is this one. This is the most mobile person with respect to the number of trips, and we see that this person placed calls from just five grid cells, and there are just six grid cells, all in Kigali, that this person visited.
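The month-to-month normalization of visited places, and picking out the most mobile person for a given measure, might look like this. The function names are hypothetical; only the ratio (visited places over places active that month) comes from the talk.

```python
def visited_places_ratio(visited_places, active_places):
    """Visited places divided by the places active that month, so months with
    different numbers of active towers can be compared (0.0 if none active)."""
    return visited_places / active_places if active_places else 0.0

def most_mobile(measures_by_person, key):
    """(person_id, value) of the person with the largest value of one measure.

    measures_by_person: dict person_id -> dict of measures for one month.
    """
    person = max(measures_by_person, key=lambda p: measures_by_person[p][key])
    return person, measures_by_person[person][key]
```

Calling `most_mobile` once per measure yields the extreme profiles compared above; as the talk shows, the winner for the number of trips need not be the winner for the spatial measures.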
We can all guess that this guy was possibly a taxi driver moving around Kigali. But the point is, the profile of the person who is extreme with respect to spatial spread and frequency of movement is quite different from the profile of this person, which disregards the spatial aspect, right. And here is another profile, the most extreme person with respect to the number of visited places. You see this person has visited all those places, all the grid cells that contain towers, shown in red here, and then in blue are all the other grid cells this person must have traveled through. And so based on these six dimensions of mobility, we can now come up with mobility profiles that are quite accurate and can be tailored for different applications. What I've also been looking at is the association of mobility measures across time. In this one plot, which I chose to show here, you see the correlation between each of the five other mobility measures and the number of trips. You see that trips distance, trips time, and trips grid cells have a high correlation with the number of trips, and that makes sense because those measures capture the frequency of movement. And then you see that the number of trips has a much lower association with visited places and visited grid cells; those two measures do not capture the frequency of movement, right. There are corresponding plots for the correlations with respect to the other mobility measures, and from those association measures we clearly see that the three groups of mobility measures are there and are quite distinct. So we've been talking quite a bit about capturing mobility. But what about identifying the people that do not move? This is such an important aspect, because people who are immobile could be, for example, people suffering from a chronic disease, handicapped people, poor people, basically people in need.
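The per-month association between the trip count and the other measures can be sketched with a Pearson correlation computed across people. The talk does not say which correlation coefficient was used, so Pearson here is an assumption, as is the input layout.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)

def correlations_over_time(monthly_measures, target="trips"):
    """For each month, correlate every other measure with `target` across people.

    monthly_measures: dict month -> list of per-person dicts of measures
        (assumed layout; every dict has the same keys).
    Returns dict month -> {measure: r}.
    """
    result = {}
    for month, people in sorted(monthly_measures.items()):
        ys = [p[target] for p in people]
        others = [m for m in people[0] if m != target]
        result[month] = {m: pearson([p[m] for p in people], ys) for m in others}
    return result
```

Plotting each measure's `r` against the month gives one curve per measure; the grouping described above would show up as the trip-based measures staying close to 1 while the visited-place measures sit lower.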
There could also be people who perceive movement as something adverse to them. So for example, when a disaster occurs, say if we look at our coastline area, when a hurricane comes, knowing where the immobile people are tells us which areas should be searched first, because those areas are likely to contain people who refused to leave when the warning came. And so being able to identify immobile people at different levels is very, very important. Here I show the concentration of immobile people, people that just didn't move, people that placed calls from only one grid cell across the 51 months of data that I have. In red, you see where the highest concentrations of immobile people are at the country level, and you see that there are two grid cells right in Kigali, and then there are two other grid cells, here and here, that come up as being relevant throughout this time period. These are associated with two townships right on the border with Congo. And especially these two grid cells right here, their numbers are 360 and 361. I'm mentioning them because these are two grid cells that were affected by an earthquake that took place in 2008, and we will come back to that. But the point here is that we are able to capture and locate immobile people, and that is important for so many reasons. Yes, please. >> [Indiscernible] correlate with like the frequency of [indiscernible]. >> Adrian Dobra: I didn't look at that. Right. So I absolutely agree that there could be people who maybe have four cell phones, and maybe they chose to use the one cell phone I have information about only when they're at home. Those people would be perceived as being immobile. So no, I didn't look at frequency of calls. But of course, there are so many filters one can apply going from here. >> [Indiscernible] something which [indiscernible]. >> Adrian Dobra: Right, right.
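Locating immobile people as described above (people who placed calls from only one grid cell over the whole period) reduces to a group-by. The `min_calls` threshold is a hypothetical example of the kind of filter mentioned in the Q&A and is not part of the original analysis.

```python
from collections import Counter, defaultdict

def immobile_concentration(calls, min_calls=1):
    """Count immobile people per place.

    calls: iterable of (person_id, place) pairs covering the whole study period.
    A person is immobile if every call came from a single place. min_calls is a
    hypothetical filter: people with fewer calls than this are ignored.
    Returns Counter place -> number of immobile people in it.
    """
    places = defaultdict(set)
    n_calls = Counter()
    for person, place in calls:
        places[person].add(place)
        n_calls[person] += 1
    return Counter(
        next(iter(ps))  # the single place this person ever called from
        for person, ps in places.items()
        if len(ps) == 1 and n_calls[person] >= min_calls
    )
```

Mapping the resulting counts per grid cell gives a concentration map; raising `min_calls` is one way to avoid labeling someone immobile on the strength of a single call.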
You can look at some sort of ability of movement across time. I just took periods of one month, and if somebody placed just one call within one month, you know, that counted the same as placing 1,000 calls. Correct. So yes, there are filters that can be applied. And the point is, CDR data can be used to create maps like this, which are so important for so many reasons with respect to planning, figuring out where resources should be concentrated, and maybe where rescue efforts should be concentrated. The last thing I'm going to show you is related to the ability of CDR data to capture extreme events. Right. I took the example of this earthquake, magnitude 5.9, that took place on February 3rd, 2008. It took place right here; this is Lake Kivu, and this is the border between Congo and Rwanda. Rwanda is over here. There are two major cities, this one right here and the other one up there. They are the same ones that showed up as containing quite a few immobile people. So this was a relatively major event, and what I tried to figure out is whether it is possible for me to see that event, to identify that event, from CDR data. So this is the location, the epicenter of the earthquake. In red you see all the towers that were active in February 2008 that were within 25 kilometers of the epicenter, and I looked at the CDRs of all the people that placed calls from those towers. When I looked at mobility levels using a month as the unit, I really didn't see any change; going from January to February to March, there wasn't much change. So being able to zoom in at the daily level, and also zoom in on the corresponding spatial spread, was very important. Once I looked at daily mobility levels, I started seeing patterns that were really, really interesting.
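Selecting the towers within 25 kilometers of the epicenter needs a great-circle distance; a haversine sketch is below. The coordinates passed in would be real tower and epicenter latitudes and longitudes, which are not given here, and the helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def towers_near(towers, lat, lon, radius_km=25.0):
    """Ids of towers within radius_km of (lat, lon).

    towers: dict tower_id -> (lat, lon); restrict it beforehand to the towers
    active in the month of interest (hypothetical input layout).
    """
    return {
        tower_id
        for tower_id, (t_lat, t_lon) in towers.items()
        if haversine_km(lat, lon, t_lat, t_lon) <= radius_km
    }
```

The people of interest are then everyone with at least one call through a tower in the returned set.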
Those towers within 25 kilometers of the epicenter belonged to five places. This is place 360; it's right here at the border with Congo. What you see here are daily mobility levels for ten days before the event in red and ten days after the event; the day of the event is shown in blue. The four graphs correspond to the four mobility measures that capture the frequency of movement: trips distance, trips time, trips grid cells, and number of trips. And you see here how the day of the event becomes quite clear, because it corresponds to a higher number of trips. Right. And then you also see how the event itself has changed the mobility patterns in the area. What is interesting here is to be able to figure out in real time whether an event such as a disease outbreak takes place, and to be able to predict the spatial spread and the existence of an event we are not aware of. Here, obviously, we are aware of the event; we kind of try to reverse engineer it. But the interesting application would be the development of predictive models that would allow us to see in real time, and that goes back to your question, right: can we see the answer in real time, and really be able to predict when something takes place and where that something takes place? And so what we see here is that mobility levels clearly seem to go up on the day of the event, and that is clearly visible, but after the event, for this particular grid cell, they seem not to come back to the levels that were there before. So you see there is a clear change between the days before the event and the days after the event. >> [Indiscernible] other way around. [Indiscernible] after. >> Adrian Dobra: Oh, I'm sorry. Okay. Fine, right. So I reverse my story. Right. Okay. Yes, you're right. Red is after, green is before, right. So there is less mobility before the event.
There is a spike on the day of the event, and then mobility comes down, but not to the levels that were there before. The point is we can clearly see the event in the data, and when we talk about disaster recovery, it is quite important to figure out when the activity in an area has resumed its pre-event level. That kind of information is clearly there in the data. This is grid cell 361, right next to the grid cell we just looked at, and here we clearly see that the number of trips on the day of the event stands out. There is a little bit more variability in this grid cell in terms of mobility before and after the event, but the patterns are still there. Yes? >> I think in this case [indiscernible], because the number of trips might increase because [indiscernible] same mobile but [indiscernible]. So then my number of trips increases, my number of [indiscernible] increases, all those metrics increase [indiscernible] not changed at all, right? >> Adrian Dobra: I absolutely agree with you that in this context it makes sense to look at things like frequency of calls, and the literature has examples in which, say, sporting events have been captured from CDRs. I mean, you can -- if you look at the work again, you see when a team scored, because people started calling right after a goal was scored, and so you can clearly see the ups and downs: you can see half time, you can see the end of the game, and so on. Those are people in the stadium, so obviously there is not going to be much change in their mobility level, but the frequency of calls is going to capture that. And so I absolutely agree that in this context it makes sense to look at that aspect. What I tried to show is that, using the same mobility measures I defined, I am able to see the event quite clearly. I mean, if you are expecting something to happen, then monitoring frequency of calls is something you can look into.
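The before/after comparison described above can be quantified in many ways. The sketch below uses a baseline from the pre-event window, a z-score for the event day, and the first post-event day back within k standard deviations of the baseline. This scoring rule is an illustrative choice, not the method used in the talk, which compared daily plots.

```python
import statistics

def event_profile(daily, event_day, window=10, k=2.0):
    """Compare a daily mobility series around a known event.

    daily: dict day_index -> mobility value for one place (e.g. number of trips),
        with entries for event_day - window .. event_day + window.
    Illustrative scoring (an assumption, not the talk's method): z-score of the
    event day against the pre-event window, post-event mean, and the first
    post-event day whose value is back within k standard deviations of baseline.
    """
    before = [daily[d] for d in range(event_day - window, event_day)]
    after_days = range(event_day + 1, event_day + window + 1)
    mu = statistics.mean(before)
    sd = statistics.pstdev(before)
    z_event = (daily[event_day] - mu) / sd if sd > 0 else None
    first_normal = next((d for d in after_days if abs(daily[d] - mu) <= k * sd), None)
    return {
        "baseline_mean": mu,
        "event_z": z_event,
        "after_mean": statistics.mean(daily[d] for d in after_days),
        "first_normal_day": first_normal,  # None: not back to normal in the window
    }
```

A large `event_z` flags the spike, while an `after_mean` that stays away from `baseline_mean`, or a `first_normal_day` of `None`, corresponds to activity that has not yet resumed its pre-event level.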
But if you want to identify events that are about to happen, the frequency of calls, at least in my opinion, has a lot more volatility. For example, if a certain campus is affected by a disease outbreak, then what's going to happen is that people might not come to work as they usually do, and if you look at mobility patterns in that area, that is going to be reflected. The frequency of calls might not play a role -- yeah. >> [Indiscernible] mobility metrics are not immune to changes in frequency of calls, and then [indiscernible] reliable to look at them if there's been a sudden switch in the frequency pattern. >> Adrian Dobra: I absolutely agree. But for developing countries, looking at Rwanda, the frequency of calls is not something I trust, because in that context people might not make all the calls they want to make, and they might not make all their calls from the same cell phone. And so, I absolutely agree with you: if you assume somebody is using their cell phone, and it's the only cell phone they are using, every time they want, then you would want to look at frequency of calls. But in the context of Rwanda, I do not think the frequency of calls is so indicative of behavior, because people might not have the means to call, and they might share several cell phones and so on. I mean, I heard that people have their own SIM cards but share the actual phone, and they just swap SIM cards to place their own calls. So actually these records indicate SIM cards, not the cell phones per se. Yeah, so essentially -- and the patterns are still there, right. If you look at place 467, which is north of Lake Kivu, spatially you pretty much get to see the impact that earthquake had, and you can clearly see that the impact is very, very visible from the CDR information that we have. So that's place 70, same, sorry. [Indiscernible] strong and it is there.
And so I will conclude here. Part of what I'm interested in is HIV spread, but this is only part of the picture. There are so many ways this research can grow: models for identifying relevant places and the dynamics of those places are very relevant, predictive models for catastrophic events, for extreme events, are relevant, and capturing mobility and building models that predict health and well-being are really relevant. The key aspect, as far as data is concerned, I guess, is to have access to GPS geo-located data and to combine it with GIS information. As far as this line of research is concerned, the avenues one could take are really endless, and we are only seeing the beginning of this research taking off as we start to collect GPS information. So I will stop here. Thank you. [Applause] >> Adrian Dobra: Any other questions? All right. Thank you then.