>> Jie Liu: So I'm happy to introduce Dan Work from the Department of Civil and Environmental Engineering at the University of California at Berkeley. Dan Work is a Ph.D. student there. He's graduating really soon, in a couple of months. And after that he's going to be joining UIUC as a faculty member as a civil engineer in the department there. He'll describe his work on the Mobile Millennium project, a project estimating traffic based on smart phone data. Go ahead. >> Dan Work: Thank you for the introduction, and thank you for having me here today. I'm going to talk today about real time estimation of distributed parameter systems, and specifically with application to traffic monitoring, based on some work we've been doing at Berkeley through the Mobile Millennium project, which is a large deployment we have of users in the Bay Area who have downloaded mobile applications, collecting GPS data and sending it into our system, and then we use that information to estimate traffic conditions and send it back to them. This is part of my Ph.D. research at U.C. Berkeley where I work with Professor Alexandre Bayen in the Department of Civil and Environmental Engineering. Parts of the talk are also joint work with Nokia Research Center in Palo Alto. I spent a couple of years both as an intern and visiting researcher working on parts of this problem. So I seem to have some problem with the slides. Let's see. Let's see if that helps. So before I get into talking about traffic monitoring in particular, I want to give some context about distributed parameter systems and why they're important, especially as we're interested in managing and monitoring both the built infrastructure and the natural environment. So a distributed parameter system is just a system in which the spatial variation of the system plays an important role in describing the evolution of the system in time. So this is true, for example, in the context of monitoring air quality, looking at how contaminants propagate in rivers, even describing how buildings respond to wind or seismic loads, and of course what I'll talk about today is how congestion propagates on roadways. So the commonality amongst each of these systems is the fact that again they have a spatial component which is very important that we'd like to characterize so we can understand how the system will evolve forward in time. Basically the game that we play here is: we have the physical world, and we want to come up with some mathematical abstraction for our distributed parameter system. One way to represent a distributed parameter system is in the form of a partial differential equation. You take the physical world, you build your mathematical model. If you describe the appropriate initial conditions and boundary conditions and model parameters you can completely characterize how the system will evolve. Okay. This is really useful if you want to understand, for example, how to evacuate an urban corridor, or what's going to happen if you change the average sea temperature of the world by say two degrees. The problem is you're only loosely coupled with the world, though, because in practice initial conditions are almost never available. The boundary conditions are unknown. And model parameters have a lot of uncertainty in them. 
So in the direct problem, where we basically specify this stuff initially, and then run our forward simulations, we lose this coupling again with the physical world. Sometimes the problems I'm interested in are estimation problems. Where basically we try to augment the fact that we have some uncertainty in our mathematical model with additional sensor data with the physical world. If we can basically augment that information with the sensor data, we solve estimation problems which we try to get a better estimate of the state of the system than either the data alone would give us or the mathematical model would give us. And in this way we can create a more tight coupling between both the computational or cyber side of our modeling infrastructure as well as the physical side, which is the actual world that we're interested in monitoring or controlling. If we can create this tight coupling it allows us to feedback based on our estimate what's happening in the world into our system to control it in direct control in the context of changing traffic like in traffic signals or information giving that information to the users and let them respond based on the new information. So basically the game to play with these estimation problems either you estimate the state of the system which in some communities it's called data assimilation or you can estimate parameters which is also known as inverse modeling. But if you can correctly estimate these things, and again you can feed that back into the system. Okay. The major problem with doing estimation on distributed parameter systems is going out and collecting sensor data. It's extremely expensive to sense in large distributed areas. And I think there's really two things that are starting to change how we do sensing on a lot of these things. And the first is the mobile Internet. The fact that by 2013 it's expected to be like 1.3 billion smart phones worldwide. I'm guessing that's probably the world's largest sensor network that has communication, computation, and some form of sensing embedded in it. Now that platforms are starting to open up we can actually have access to the applications that run on mobile phones, that gives us a platform for developing a bunch of rich applications that connect our physical environment and people that cell phones are embedded on in the physical world with the cloud. The other thing is sensor 2.0 is an emerging paradigm basically based out of some of the work done at MSR where instead of having sort of deploy a sensor network and then hiding the sensors in your proprietary application, make that available sort of to the outside world. Let the sensors be a platform as a whole. And so although you may have a sensor, for example, that's used for traffic, it may be useful, someone else may want to use that information for estimating something totally different. In order to get that information having a common platform in order to access a sensor data becomes useful especially in building these large distributed systems. Okay. So what I want to talk about today in the context of traffic is really how to combine this modeling and this sensing for an estimation problem, both online and in real time. Online basically I want to do this as data becomes available, I want to be able to continue to do my estimation piece by piece as the information is available and in real time I want to be able to produce an estimate fast enough that the physical system hasn't changed before I make that information available. 
And sort of one common approach to solving these problems is basically I look at the model that I have of the physical world and the estimation problem that I want to solve, and I go out and I specify what type of sensor I should design to solve this problem, or where I should place these sensors so I get a good estimate of what's happening in this distributed system. In today's talk I'm going to do the exact opposite, specifically because of the availability of GPS data. As I'll show later, most of the mathematical models for traffic are density-based and the sensing that comes from GPS smart phones is velocity information. So there's some incompatibility here, but there's so much GPS data that's going to become available soon we'd like to take advantage of that. And to make the estimation problem easier, rather than specifying and designing the sensor that I'd like, what I'm going to do is change the mathematical representation of the model so it becomes easier to integrate the GPS data into the estimation problem. So I'll show how that's done. That's really the core idea: to recast the model in a way that's still physically consistent with the work that's been done in the transportation community for 50 years, in a way that can take advantage of this new data that's becoming available from smart phones. So a little bit more concretely, the things I'm going to talk about. First I'll provide background on traffic monitoring technologies: what does the existing sensing infrastructure look like, what are the issues when you start using GPS data from smart phones. Then I'll talk a little bit about the mathematical models of traffic. So first, what do these historic models look like and why do transportation practitioners like these models -- I mean, what do they model and measure. And then I'll talk about how I can transform those models from the density as the state that these models work on -- how many cars are stacked on a unit stretch of roadway -- into a velocity evolution equation. I'll talk about how you can expand that to networks. Finally I'll talk about the real time estimation problem: how do you solve this integration of uncertain measurements into an uncertain model in real time. And it's solved using a variant of Kalman filtering known as ensemble Kalman filtering. I'll show experimental work we've done to validate that the algorithm works well in practice. I'll just conclude with a few summarizing remarks and some future directions for where we'd like to take this work. Okay. So let's talk about traffic monitoring technologies. If you've ever been stuck in traffic there's no real need to motivate the problem. I think the average driver in the United States right now wastes something like a week stuck in traffic every year. It's a huge annoyance but it also has a huge impact on the economy. Departments of transportation have been very active and they're trying to build monitoring systems. We know we can't build our way out of the congestion problems. We need to start managing this infrastructure better. So they built systems that either face inward for transportation practitioners, that basically record any of the count data that becomes available -- how many cars are using a roadway during the day; they can record that and store it in a database and throw it on a map and use this for planning. And public facing tools like changeable message signs that may tell you how long it takes to get to an airport based on real time traffic data. 
So the sensors that feed this -- basically the vast majority of traffic sensing technology relies on inductive loop detectors. It's a sensor where basically you put a coil of wire in the ground and as cars drive over it you register a signal and you can count the number of vehicles. And there's been magnetometers that have been developed that are wireless and do the same basic idea of counting the number of cars. The problem with magnetometers and inductive loops is, if you put a sensor in the roadway and you have several thousand vehicles cross over it every day, over time those sensors fail. The only way to replace them is to shut down a lane of traffic and then dig it up and replace it. This is very expensive. In some areas it's not feasible to shut down the traffic to replace these sensors. >>: Does the loop give you just the presence of the vehicle, or also the speed? >> Dan Work: So it depends on the type of loop. In this case, if you see these here, there's actually two loops. There's one small loop here and one here. They just use the time difference between when the signals register; they know how far apart they're spaced and they can -- what's that? >>: Single loop? >> Dan Work: There's some people that have done some work to try to get estimates of the speeds from the single loop based on how long the vehicle was sitting over the sensor. But, of course, this depends on how long the vehicle is. So you can get some speed measurements out of them. They're not the most accurate. Double loops are more accurate but they still have some problems as well. So a lot of the departments of transportation have been saying let's stop putting sensors in the roadway, this is too expensive to install and maintain, it's not practical, let's get them off the road. Either they use radar detection or video detection to identify license plates or just to show people the images of whether traffic is congested or not. These systems are -- I mean, if you know NAVTEQ, they own traffic.com. This is the sensor they deploy: a radar sensor on freeways in the United States. People started to say let's go one step beyond that, let's put the sensors on the vehicles themselves. Either taking advantage of the GPS units that are in fleet vehicles like taxis or FedEx trucks or UPS, or in consumer vehicles with toll tags -- on the East Coast, it's E-ZPass; on the West Coast, or at least in the Bay Area, it's FasTrak. So these have an RFID transponder; you use it to pay your tolls, but they also deploy readers in the transportation network and they record basically when you passed one sensor and when you passed the other, and it gives them an idea how long it took you to travel on that stretch of roadway. But what everybody is really excited about now is the GPS data. And why is that? Well, to see what fleet GPS data is making available to monitor traffic right now, just look at these images. So this is the San Francisco Bay Area. This is one day of taxi data. There's 500 taxis, roughly, that we have access to in Mobile Millennium. Each taxi reports its position once a minute. And so each red dot in this image -- this is the San Francisco Bay Area, this is zoomed into a particular area -- corresponds to one measurement. You can already start to make out -- we talked a little bit earlier, can you identify the roads -- I think it's pretty obvious, it wouldn't take much work to be able to identify what the network topology looks like there. 
There's opportunities to even potentially build the network from this data. If you're really familiar with the area you might recognize this as the San Francisco airport, and if you go to Terminal A or B or C, you can drive through there. Okay. So fleet data is already available. It's already being used by several commercial companies to try to estimate traffic conditions. And what's got people really excited is fleet data is relatively small. You're talking about -- I think these numbers are from NAVTEQ -- basically 100 million points worldwide in a year. But in a couple of years, based on just GPS data from cell phones, it's going to explode to more than a billion points. It's kind of funny, really early when we started this project, I had to convince people that GPS was going to be a feature on your phone. And the only analogy I could come up with would be looking back five years and saying we're going to have cameras on phones. This is just preposterous. But now it's sort of obvious this is where things are going based on location-based search, location-based advertising. There's too much inertia and too many services that can be built on this stuff. Every phone is going to have it. Most cars will have it soon. And this is really what we want to start taking advantage of. How are we going to build models and estimation algorithms that can take advantage of this GPS data? Because it's just going to be so much cheaper to acquire than the fixed sensing infrastructure that we currently go out and deploy. So the first real problem I worked on at Nokia and as part of my Ph.D. was really how do we collect GPS data in a way that manages privacy of users. And so the paradigm that we came up with, which was really led by Baik Hoh, a researcher at Nokia Research Center in Palo Alto, was the notion of a virtual trip line. I'll explain a little bit later what that is. To motivate the privacy problem, okay, this is a small video of me in 2008 driving in Berkeley, California. My car, with the first prototype application on a cell phone just recording GPS trajectory data. As you saw at the very beginning of the video, there's a stop right here. I'm picking up another Ph.D. researcher, Juan Carlos Herrera, now a professor in Chile. It's easy to identify which home he lives in based on the trajectory alone. You can also tell we left Berkeley, if you know the area well. Even though the data is anonymous, even though it doesn't say anything about the car, it's pretty easy to reidentify who might be in that car based on just where the trip started, where the trip stopped and where there might have been anomalies along the trajectory. Of course, these green bars correspond to the GPS measurements -- where did the phone actually send a measurement and how fast was it going. And we did this initially because we wanted to see -- can you get lane resolution? Do you have problems with identifying whether or not you're on a freeway or a frontage road or nearby road? So over time we came to realize that with GPS data, even in dense urban areas, you can get good resolution most of the time, even with these low quality GPS units that are put in cell phones. Okay. Temporal sampling with smart phones has some problems. And there's been a number of researchers -- actually John Krumm here at Microsoft has done a lot of work on really highlighting this problem and trying to propose some ways to prevent reidentification of users who are sending information from vehicles. 
So you can do things like basically adding noise to the data that's sent from the cell phone, or anonymizing it, or preventing certain areas from sending measurements. The approach that we took on this project is a paradigm known as a virtual trip line. Okay. A virtual trip line, the way to think of it is just a virtual sensor. It tells the cell phone where it should send measurements. It can be used as a trigger for the phone to say it's okay for me to send a measurement here. I want to explain exactly what this looks like. The first step: as the phone is running its traffic application, it's downloading map tiles, downloading traffic data. It's also going to download these virtual trip lines. It makes a request to the database stored at Nokia and says give me the virtual trip lines for the road I'm on. You can see what a virtual trip line is: two latitude/longitude coordinates and a line segment between those two latitude/longitude coordinates. They lie across the roadway. As the vehicle drives down the road, the phone checks: does my local GPS trajectory that I have stored on my phone cross one of the trip lines? If it crosses one, you can send a measurement: the virtual trip line ID, at a particular time, and here's the speed I went. As the vehicle travels down the road it continues to send measurements. The important thing here is that, okay, the game in terms of understanding does this preserve my privacy or not is basically: can I look at this virtual trip line ID, time stamp, and the speed and identify that these measurements came from the same user or not? And based on posing the problem in this way, we can identify how far apart these virtual triggers need to be. And the phone can run local rules that say basically I will only probabilistically send measurements on this virtual trip line. And the important piece is that it's really a framework from which you can better manage the information that's being collected into the system. You don't put virtual trip lines in an area that's like a residential area, because most of the information that you would collect there from the beginning would be something that might be extremely privacy invasive. But even if you're on a major freeway, say I-90 out of Seattle, if you're the only person that sent a measurement on a virtual trip line, it doesn't matter how anonymous you are -- at the next virtual trip line we're going to know with 100 percent certainty that that measurement came from you. >>: So how often do you sample the GPS? Because I assume maybe you realize you crossed the virtual trip wire only after you actually crossed it, maybe you're 500 meters -- >> Dan Work: So the current client, or the client that we used in this work, wasn't designed to be the most energy efficient client. We were basically looking at let's get the data and see if it's going to work, proof of concept. So there the phone was running, basically it was polling the GPS at either a one second interval or a three second interval, something like this. And from there you're right, at 60 miles an hour there will still be a period where you're not hitting the measurement directly on the line. But you can interpolate between them or, again, sort of viewed in the broader context, I can say send me the measurement right before or send me the measurement right after, assuming that it's relatively close to the trip line. And in the same way, you know, it doesn't -- in the first version of this work, and in all the stuff we've done for highways, basically we only send a point measurement. 
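To make that trigger concrete, here is a minimal sketch of the crossing check that could run on the phone; the field names are hypothetical, and the real client also applies the probabilistic sending rules and spacing constraints described above.

```python
from dataclasses import dataclass

@dataclass
class TripLine:
    vtl_id: int
    p1: tuple  # (lat, lon) of one endpoint of the virtual trip line
    p2: tuple  # (lat, lon) of the other endpoint

def _ccw(a, b, c):
    # Orientation test (counterclockwise) used for segment intersection.
    return (c[1] - a[1]) * (b[0] - a[0]) > (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(a1, a2, b1, b2):
    # True if segment a1-a2 properly intersects segment b1-b2.
    return (_ccw(a1, b1, b2) != _ccw(a2, b1, b2)
            and _ccw(a1, a2, b1) != _ccw(a1, a2, b2))

def check_trip_lines(prev_fix, curr_fix, speed, timestamp, trip_lines):
    """If the step between two consecutive GPS fixes crosses a virtual trip
    line, return the (vtl_id, timestamp, speed) tuple to send; else None."""
    for vtl in trip_lines:
        if segments_cross(prev_fix, curr_fix, vtl.p1, vtl.p2):
            return (vtl.vtl_id, timestamp, speed)
    return None
```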
If you cross a virtual trip line you send a measurement there. It turns out on surface streets that's pretty impractical, because if you send me a stop measurement I don't know if it's because the traffic stopped or because the traffic light is red. There it makes sense to put a virtual trip line on either side of the intersection and measure the time that it takes you to go between these virtual trip lines. And I can still use these virtual sensors as sort of markers, and as a framework on which I can design an architecture to preserve the privacy or anonymity of the users that are sending the measurements in this way. Did I answer the question? So that's a little bit about the sensing on the cell phone side with respect to traffic. I want to talk now a little bit about the mathematical models. Okay. The seminal model in the transportation community for traffic is known as the Lighthill-Whitham-Richards partial differential equation. The state of the system is denoted by ρ(x, t). That's the density of vehicles: take a stretch of roadway and count how many cars there are on that stretch of roadway, divide it by the length. That gives you the density of vehicles. Q is just a flux function that describes the flux; the flux, again, is: take a point in space and count how many cars are crossing that point in space per unit time. That's the flux. Okay. So the LWR PDE is a conservation law. It conserves the number of cars on the roadway, and it relates basically how the density changes in time with how the flux changes in space. So how does this work? If I take a stretch of roadway with vehicles entering at A and leaving at B, I look at the rate at which they're entering at A and leaving at B. Based on the spatial variation of these rates, I can say either the density should be going up -- more cars are entering than leaving -- or the density should be going down -- more are exiting than entering. When you take this to the limit, that gives you the LWR PDE. So all it describes is mass conservation. It's very simple. But yet this is quite rich in the features that it can capture. And I'll describe those in a second. So first basically we have the LWR PDE. You specify an initial condition and the boundary conditions -- on a minor technical note, boundary conditions aren't implemented as simply as I've shown here. They have to be implemented in what's known as a weak sense. I'll avoid the mathematical details, but it's been well studied in the literature; there are conditions for when the boundary conditions will apply and when they won't, in order to have a unique, well-posed problem here. Okay. So we have a flux function, which still needs to be defined. Okay. And the flux function is given by the constitutive relationship: the density times the velocity. That gives you the flux. You can check the units to make sure it works. In order to have a flux function that's a function of density only, that means we have to embed a relationship between the velocity and density. And this is the fundamental assumption of the LWR PDE: that I can describe the velocity as a function of density only. This guy Greenshields, in 1935, basically went and studied two separate roads in Ohio, and on two different days came up with this relation that says basically, okay, let's assume that velocity is linearly decreasing as a function of the density. So when there's no cars on the roadway, traffic is moving fast; when there is the maximal number of cars that fit on the roadway, nobody moves anywhere. 
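Written out, the pieces just described are the conservation law, the flux, and the Greenshields velocity function; here $v_{\max}$ denotes the free-flow speed and $\rho_{\max}$ the jam density (standard symbols, not the slide notation):

$$\frac{\partial \rho}{\partial t} + \frac{\partial Q(\rho)}{\partial x} = 0, \qquad Q(\rho) = \rho\,v(\rho), \qquad v(\rho) = v_{\max}\left(1 - \frac{\rho}{\rho_{\max}}\right) \;\Longrightarrow\; Q(\rho) = v_{\max}\,\rho\left(1 - \frac{\rho}{\rho_{\max}}\right).$$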
It's fairly simple, yet gives us this quadratic flux function, and this causes all the nonlinearities in traffic evolution. This is exactly what we want to capture. Basically when you're free flowing, when you're at low speed -- this is the vehicle density here and the vehicle flux is on this axis. At low densities you don't have many people around you, you increase the number of cars on the roadway and everybody seems to continue at roughly the same speed. The slope of this line is the velocity. So at low densities, lots of cars moving, you increase the number of cars on the roadway. You just increase the flux because there's more people moving at the same speed. Then you reach some critical point where once you start adding more people on the roadway, something bad happens. The flux starts to decrease. The throughput of the roadway starts to go down. That's bad news, because now you're at the position where more people on the roadway just simply cause the flux to decrease because the speed -- that's basically this line here coming through -- starts to decrease as well. Okay. And the problem is if you know -- if you study transport laws at all, this PDE basically is just a transport law that says that information is propagating at different speeds, the speed is the slope of this line. Well, you can have information propagating forward or backward. The fact that information can propagate forward or backward creates shockwaves which then cause all the mathematical difficulties with this model. Why are shockwaves important, though? They're interesting from a mathematical standpoint, but they're actually practical in studying traffic. That's because the shockwaves are exactly what we want to try to track. They're the congestion waves, when you have a bunch of people piling into a queue, the rate at which that queue backs up is the shockwave. You have a bunch of very dense vehicles, and basically a stretch of roadway that's not dense at all. And you want to know, is that shockwave going to propagate back down the roadway or is it going to start clearing? And this very simple model, why it's so powerful, is it tells you exactly when the queue should be building and decreasing. If there's enough cars feeding into the bottleneck, then that queue is going to keep piling back. But if you're only adding a few vehicles and more vehicles are actually being able to get put through this bottleneck, you actually start to see the shockwave clear. That model is very useful, for example, if I want to understand, I've got some measurements of congested traffic now. I want to know basically is it reasonable to expect that I should be getting additional slow measurements in the future, or is it possible that it could clear quite quickly? This can tell you if you've got the whole freeway, I don't know if you saw on the news, there was a 60-mile traffic jam in China. This model will tell you how long after you remove the bottleneck those cars will clear. >>: Interesting. >> Dan Work: Yeah. So the problem is you have this model that describes the evolution of these shocks quite well. That's useful. But because of these shocks, it makes the estimation problem really difficult. Okay. The first problem is that basically because of these discontinuities, the partial differential equation doesn't exist anymore. You have a PDE that assumes differentiability of the states, yet the shocks are points of nondifferentiability. That causes a problem. 
The mathematical story is that instead of treating this and looking for classical solutions to the PDE, you actually have to solve a more general form of the problem, which basically relaxes some condition on the smoothness of the solutions that you seek; you look for what's known as a weak solution. That solves the existence problem -- the solution can still exist -- but now you have too many solutions to the PDE. And the only way to get around this is basically to further embed some condition that says: I'm going to look for a more general solution to the problem because I know the smooth solutions don't exist, but then I need another condition to say not all of these more general solutions will work. I need some entropy condition to isolate a physically meaningful solution to this PDE. So the entropy condition, all it means in practice is that it picks out the solution of this PDE, this mass conservation law, which actually corresponds to what you see in practice for traffic. What it basically says is if there's a shockwave in the traffic, the only way that this is possible is there better be congestion downstream and free flowing traffic upstream. It prevents the opposite from occurring, where you would have congestion upstream and free flowing traffic downstream. You wouldn't expect to see a shock there, because people would start to leave that shockwave and it would start to smooth out the density profile. The weak solution would permit this in the first place. The entropy condition basically says, no, only shockwaves that have congestion downstream and free flow upstream are physically permissible. So that's the summary of the mathematical formulation in terms of the density evolution model. The important thing there is the entire evolution of the traffic is described purely in terms of how many vehicles are stacked along the roadway. As I mentioned before, we're looking at integrating all this velocity data from cell phones. There's not really a clear way to do this. So in order to solve this problem, the approach that I proposed is basically to transform this density-based partial differential equation into a velocity equation, and this will simplify the estimation problem because we'll directly have as the state the velocity, which is the same quantity that our measurements are in. Okay. So in the general case, it's not possible to always transform this density PDE into a velocity partial differential equation; you must go into discrete space to solve this problem. What does it look like? We start with the LWR PDE, this partial differential equation. And we want to get to discrete velocity for our estimation problem. So we have this relationship that relates density to velocity. So presumably we can just take the density partial differential equation and use this substitution, right -- the velocity as a function of the density -- substitute it in and get a velocity partial differential equation. Then we can discretize it. Here, again, here's the velocity function from Greenshields, this fellow in Ohio that did two different roads on two different days and came up with this relation. You substitute it anywhere you have a density: you solve for the density and substitute it back into your PDE. You can do some manipulations and write a new conservation law where velocity is conserved. Instead of having a flux function Q, you have a flux function R, which is the velocity flux. The problem is it only works for linear velocity functions. 
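For the Greenshields case the substitution can be carried out explicitly; a sketch of the calculation, in the same notation as above, is:

$$\rho = \rho_{\max}\left(1 - \frac{v}{v_{\max}}\right) \;\Longrightarrow\; \frac{\partial \rho}{\partial t} + \frac{\partial (\rho v)}{\partial x} = -\frac{\rho_{\max}}{v_{\max}}\left[\frac{\partial v}{\partial t} + \frac{\partial}{\partial x}\left(v^{2} - v_{\max}\,v\right)\right] = 0,$$

so the velocity satisfies its own conservation law $v_t + R(v)_x = 0$ with velocity flux $R(v) = v^{2} - v_{\max}\,v$. For a nonlinear velocity function the same manipulation does not yield a conservation law in $v$, which is why the transformation is instead done in discrete space.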
Not surprisingly, people have shown that nonlinear velocity functions tend to work better than the linear one from the pioneering work of Greenshields. Basically, if you want to use a nonlinear velocity function, you have to discretize the PDE. And the discretization that you use is a scheme known as the Godunov discretization scheme. This discretization scheme is important because it embeds the entropy condition into the discretization. You start with the PDE; the Godunov finite difference approximation basically isolates, in the discrete space, the entropy solution to that PDE. So I have a discrete density model which completely characterizes the density evolution in the discrete space with the entropy rules embedded in it. It isolates the physically meaningful solution to the PDE. I take that discrete model and apply my velocity transformation. There I've already got the physically meaningful solution I want in the density domain, and now I'll map it into the velocity space. If I had tried to do it the other way, I would have lost the consistency with the weak entropy solution of the density problem. Okay. So basically, in terms of what it looks like on the roadway: you take your stretch of the network and you discretize it into discrete space steps of length delta x and into discrete time steps of length delta t. And you build a big velocity vector v^n, where v_i^n is the velocity in cell i at time n. You assume the velocity is constant in each of these cells. So you build a long velocity vector that just describes the discrete state, the speed in each of these cells. And you have an evolution equation which basically takes the velocity at time n to time n plus 1, and it's denoted here on this slide by M_e. The M is the model, this velocity evolution equation, which I'll describe in a second. The e just corresponds to the fact that it's only for a single edge in the network. And basically it's given by this velocity evolution equation. The important piece is that you can see the velocity at time n plus 1 in cell i is just a function of the velocity at time n in the same cell i and in the cells one upstream and one downstream: v_i, v_i minus 1, and v_i plus 1. There's details I'm not going to go into precisely; I want to point out a couple of things. Basically, in this velocity evolution equation we have this function G tilde. This is the numerical velocity flux in the discretization scheme. The only important piece is that basically this is where the entropy condition is embedded. It determines all the properties of which way the shocks move. And there's a minimization term here. The min function makes the evolution equation nonlinear and nondifferentiable. That's important for the estimation algorithms that you can apply on top of it, because that nondifferentiability eliminates a class of algorithms that you would like to use. Of course, we have the inverse velocity function which maps the velocity to the density, and it's given by a nonlinear model here, which is hyperbolic in one regime and linear in the other. And of course the velocity function as well. Again, the important piece here: it's a velocity evolution equation that describes basically the velocity at the next time step using only the velocity at the previous time step in the cell you're at, plus the immediate stretch of roadway upstream of you and immediately downstream of you. And it's nonlinear and nondifferentiable. The network problem is slightly more involved. 
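Before the network case, a minimal sketch of that single-edge update, written for the Greenshields relationship; the parameter values and function names here are illustrative, and the model actually used has a hyperbolic-linear velocity function and more careful boundary handling:

```python
import numpy as np

V_MAX = 30.0              # free-flow speed [m/s] (illustrative value)
RHO_MAX = 0.125           # jam density [veh/m] (illustrative value)
RHO_CRIT = RHO_MAX / 2.0  # density at which the Greenshields flux peaks

def vel(rho):             # velocity function v(rho)
    return V_MAX * (1.0 - rho / RHO_MAX)

def rho_of_v(v):          # inverse velocity function
    return RHO_MAX * (1.0 - v / V_MAX)

def flux(rho):            # flux Q(rho) = rho * v(rho)
    return rho * vel(rho)

def godunov_flux(rho_up, rho_down):
    # Godunov numerical flux: min of upstream demand and downstream supply.
    # This min is where the entropy condition (and the nondifferentiability) lives.
    demand = flux(min(rho_up, RHO_CRIT))
    supply = flux(max(rho_down, RHO_CRIT))
    return min(demand, supply)

def edge_step(v, v_in, v_out, dt, dx):
    """One step of the single-edge velocity model: velocities at time n -> n+1.
    v_in / v_out are the upstream / downstream boundary velocities supplied by
    the vertex problems.  Requires the CFL-type condition dt <= dx / V_MAX."""
    padded = np.concatenate(([v_in], v, [v_out]))
    rho = rho_of_v(padded)
    v_next = np.empty_like(v)
    for i in range(1, len(padded) - 1):
        rho_next = rho[i] - dt / dx * (
            godunov_flux(rho[i], rho[i + 1]) - godunov_flux(rho[i - 1], rho[i]))
        v_next[i - 1] = vel(rho_next)   # map back into velocity space
    return v_next
```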
I'll summarize the important aspects. For looking at modeling, the traffic across a city or state or country, basically we have to take the road network and model it as a directed graph. Each edge in the network corresponds to a stretch of roadway and each vertex in the network corresponds to a point where you have like vehicles merging or diverging. And the problem with expanding this model to a network is basically there's lots of complications of the boundary conditions for each edge in the first place. Well, now you have points on the roadway where you have to have consistency. This point has a shared boundary condition with two incoming edges and one outgoing edge. And in order to solve this consistency problem, you have to actually solve a linear program that basically looks at more or less how many vehicles can be accepted by this downstream stretch of roadway versus how many are upstream and would like to go into this downstream stretch from this link and how many are available to be sent from this link. So you can't have obviously more cars go through the intersection than are available. And you can't have more cars go through the intersection than can be held by the other side of the intersection. So that's sort of in words what this vertex linear program solves for. It solves for the boundary conditions for each edge, such that it's consistent with what's actually happening on each edge. So to fully evolve the velocity model from one time step to the next, basically you start with the initial velocity everywhere on the network. And at each vertex you solve one of these linear problems to solve for the appropriate boundary conditions. Once you have the boundary conditions for each edge, then you solve the edge evolution equation, which is this nonlinear, nondifferentiable finite scheme which I skimmed across in the previous slide. That evolves velocity from time N to time N plus 1. Yeah? >>: I was -- for California you have thousands and thousands of linear systems that are solving. >> Dan Work: That's a good question. I think I have a slide that has the actual numbers. But I think in terms of -- I think we've got maybe in northern California, which is the current network that we run, I think there was about 15,000 states. I think there's about 7,000 vertices and roughly five or 6,000 edges. And the edges are further discretized. And the linear programs, it's actually part of the reason I show this slide is because that's actually the slowest part of the algorithm, is solving all these optimization problems. It turns out that the commercial codes to solve these linear programs have so much overhead in terms of the precomputation that they do that it's actually faster, I implemented my own linear program algorithm to solve these things. Basically because if you think about most linear program solvers they're used for solving really, really large problems quickly. And this I have an exact opposite problem. I have really, really small problems. But I have thousands of them. So it was actually faster to build that out. It made the code run something like two orders of magnitude faster. So in terms of real time, it was something like 100 seconds for evolving a 15,000 state network from one time step to the next using commercial and open source linear program codes and some custom stuff that we implemented basically you're able to do it in a matter of a fraction of a second. >>: This process is run periodically or [inaudible]. >> Dan Work: Yes, so basically every time step. 
Every time you want to evolve the state from time n to n plus 1 -- in practice we're talking about six seconds, based on some particular properties of the discretization scheme which I didn't really discuss -- every six seconds you have to solve every one of these linear programs in your network at all the vertices and then evolve the PDE from one time step to the next. But it's actually -- I mean, it's fast enough that this can be done on a laptop computer in a fraction of a second. >>: Solve the space [inaudible]. >> Dan Work: In fact, for the estimation algorithm, which I'll talk about in a minute, instead of doing it just once every time step we have to do it several times, because we're using a Monte Carlo estimation algorithm. So this has to be done hundreds of times every six seconds just to move the state forward, but these problems are very fast. If you look at the structure -- the other reason I want to put this here is because of the structure: at each vertex you have to solve this linear program, but this linear program doesn't depend on this one. It doesn't depend on this one, doesn't depend on this one. They can be nicely decoupled. Once you have the linear programs solved, you have the boundaries for this edge and this edge, and so on. So each of the edge updates can be solved independently. It has a nice structure for basically either multi-threading, if you've got multiple cores, or distributing across multiple machines if needed. But, in practice, you know, the schemes are fast enough that you can run them in real time on very cheap commodity computers. Okay. So that's a good segue into -- so now you have this model, this nonlinear, nondifferentiable model where at least the state of the system is velocity. That's exactly what you're going to measure. How do you solve the real time estimation problem? And so this is the third thing that I worked on as part of my Ph.D.: basically how can we solve this estimation problem accurately, but in a way that's fast enough that it can run in real time? And without necessarily relying on tons and tons of computational infrastructure. Okay. So in order to fully specify the problem, I have to give one more piece of information, and that's the network observation model. Basically, what are the measurements, what does the model for our measurements look like. And all the work that I just described in terms of transforming this density model into a velocity model, the payoff is right here in the network observation equation. So the measurements that we get from cell phones are stacked in this vector of observations, y^n at time n. v^n is the state of the system. And H is just an operator that maps the state to the measurements. And H is now linear. Because the velocity vector is just the velocity at each discrete point everywhere on the network, H just basically says which cells in the network did you get a measurement from. If you got a measurement, then H -- it's just a matrix of 1s and 0s -- picks off the locations where you got measurements from your cell phones. The cell phones send velocity. So it's just a linear mapping. Of course you pick up some noise from the fact that the GPS has errors, and you're assuming that the velocity is constant in the cell for all of space and all of time. There's some other more subtle issues with the fact that spatial sampling induces some bias, in terms of you sense faster vehicles more often. 
You treat all of that in the noise term for your observation operator. Okay. And the recursive state update equation, to summarize again, is this network velocity equation: you take the full velocity state, you break it up into each of the different edges, solve the optimization problems at the vertices, get boundary conditions, and evolve each edge forward independently. And, of course, there's noise involved in this process as well, from the fact that, okay, well, there's parameters in the model which aren't completely specified. The model doesn't completely capture every possible detail. The model is an approximation of reality. It doesn't capture things like accidents. So there's uncertainty that's introduced into the model because of the fact that there's imprecision in the boundary conditions and things like this. Okay. So you have the observation equation. You have the recursive state update. There's lots of recursive estimation methods that are available to solve these problems. Particle filtering is sort of the, when nothing else works, pull out your particle filter. Fully nonlinear. Monte Carlo method. It's really useful for solving highly nonlinear problems, especially ones that are nondifferentiable. But the problem is, for large scale systems, these become very computationally intensive and very difficult to run in real time in practice. And there's been some applications where there's been some success in this. But in others, it's really quite difficult. And extended Kalman filtering -- if you're familiar with Kalman filtering, extended Kalman filtering basically means you have a nonlinear system, you linearize it, and apply Kalman filtering to this problem. This is the generic approach when you have a nonlinear system and you want to apply Kalman filtering. I point out we're dealing with such a difficult model in terms of the nonlinearities and nondifferentiabilities, which are in fact caused by the physics of the system, that extended Kalman filtering can't be applied to this problem. Instead what I use is a technique called ensemble Kalman filtering; it combines particle filtering and Kalman filtering together. Basically what you do is you come up with a Monte Carlo method for integrating the state through your model, and you compute, statistically, the mean and the covariance from these samples, and then you use that mean and covariance that you derive from your samples to do a standard Kalman update. Okay. So, again, what this looks like: basically you have two steps to the Kalman filter algorithm. The first step is basically use your model to predict what the mean of the system looks like and what the covariance looks like. If it were Kalman filtering or extended Kalman filtering, you'd linearize your system and you'd have an analytic description of how the mean and covariance evolve. For a nonlinear system you basically generate lots and lots of samples of your velocity state from a distribution that has the mean and covariance of whatever it was at the previous time step. You run those samples through your model. And that gives you a distribution of what the velocity looks like at the next time step. And you can compute a mean and covariance from that. Then you get your measurements, and you compute the Kalman gain. The Kalman gain is a minimum variance estimator that combines the information that's contained in the estimate from the model with the information that's contained in the measurements. 
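A rough sketch of one cycle of that procedure, with the forecast done by pushing samples through the model and the update done with the sample statistics, is below; the helper names are hypothetical, and details such as covariance inflation and the exact noise models used in Mobile Millennium are omitted:

```python
import numpy as np

def enkf_step(ensemble, model_step, y, H, R, Q):
    """One ensemble Kalman filter cycle.
    ensemble   : (K, n) array of velocity-state samples at time n
    model_step : function evolving one state vector through the network model
    y          : stacked velocity measurements, H : 0/1 selection matrix
    R, Q       : measurement and model noise covariances (assumed given)"""
    K, n = ensemble.shape
    # Forecast: run every sample through the nonlinear, nondifferentiable model.
    forecast = np.array([model_step(v) for v in ensemble])
    forecast += np.random.multivariate_normal(np.zeros(n), Q, size=K)
    # Sample mean and covariance of the forecast ensemble.
    mean = forecast.mean(axis=0)
    A = forecast - mean
    P = A.T @ A / (K - 1)
    # Kalman gain computed from the sample covariance.
    G = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    # Analysis: correct each member with (perturbed) measurements.
    m = len(y)
    analyzed = np.array([
        v + G @ (y + np.random.multivariate_normal(np.zeros(m), R) - H @ v)
        for v in forecast])
    return analyzed
```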
Then you use that to correct or update the estimate based on this additional information. Then you feed that updated state back into the model. So, again, just to go into a little bit more of the details of the algorithm. First you initialize. You come up with a distribution with a mean speed, say, v bar a 0 -- the a denotes the fact that it's from the previous analyzed state, or measurement update state -- with the covariance P. And you just generate K samples from this. K is the number of samples that you want to use. In ensemble Kalman filtering, the samples or draws from this distribution are called ensemble members. You take each of those ensemble members and run it through your nonlinear, nondifferentiable model, to get as many samples for your next time step prediction. You can compute the mean of that ensemble. You can compute the covariance of that ensemble. And then you use that to do your Kalman gain computation. Again, it's just a minimum variance estimator. You can then use that to update the velocity at time n based on what the model forecasted compared to what new information was contained in the measurements. The difference here between the measurement y and this observation operator applied to the forecast is the new information that's contained in those measurements. So you use that to update the state. Okay. So I'll switch now to talk a little bit about some of the experimental work that we've done to test out just how well this works in practice. So the first experiment that we ran was an experiment titled Mobile Century. It got the name Century because we used 100 vehicles. Basically we hired something like 165 grad students and had them drive these cars around a ten-mile stretch of the Bay Area, if you're familiar with this. It's right between the Dumbarton and San Mateo bridges in California. And they drove these cars for like eight hours sending measurements on the virtual trip lines. And it was a huge operation. We had tons of support staff that were doing things to make sure that -- I mean, when someone offers you a car, an $800 cell phone and not much supervision, we didn't want people driving off to the beach. So we had infrastructure to make sure that for the purposes of this experiment we could track where people were, in addition to having it feed into our virtual trip line infrastructure. The site location is important. Because anyone can estimate traffic conditions in free flow. That's pretty trivial. You already have a pretty good notion if it's an empty road what the conditions will look like. But this site in particular has recurring congestion in the Bay Area. It gets lots of periods of free flowing traffic, congested traffic and accidents. And all the data that we collected from this experiment, which is like 100 vehicles, three-second GPS measurements on the stretch of roadway, it's all public. It's all open. If you go to traffic.berkeley.edu, you can download it and use it for whatever you like. >>: [inaudible]. >> Dan Work: That's a great question. Yes. There were. There were accidents. >>: Rental cars? >> Dan Work: In fact, this is a big issue. In terms of making things work, one of the big challenges was getting the research approved by the university. And the recommendation was basically hire all the students as employees and have them drive rental cars, and the rental car companies will manage the insurance and you won't have to deal with it. So that's the approach that we took. 
We had people renting the vehicles. Fortunately, none of the drivers that we hired were actually involved in any of the accidents that occurred that day. But this is sort of -- this is a time space diagram of the traffic. You've got time on the X axis. This starts at 10:00 in the morning and goes to about 4:00 in the afternoon. And the post mile is just a mile marker along the roadway. People are driving up here. It starts about mile marker 21 and goes to a little bit beyond mile marker 27. And the blue dots correspond to traffic that's moving fast, 70 miles an hour. And the red or yellow correspond to slower moving vehicles. You can see in the afternoon there's lots of recurring congestion. We see a lot of back-up here. But in the morning, 11:00 in the morning, this is unusual. This was the one thing we weren't expecting to see. That's caused by an accident just up at the top side here. And it caused this huge shockwave to propagate down the roadway. This is exactly what these models that we're working with are supposed to be good at capturing. That's why you can see this really sharp change from the free flowing traffic to the congested traffic. Okay. This is all the raw data that was stored locally on the phones. In terms of the size of the network, it's pretty modest. It's something like 13 edges, 14 vertices and I think about a 70 dimensional state. So pretty small. But this accident, as I mentioned, in the morning was sort of the anomaly we weren't expecting. I mentioned it was a big experiment. There was a press event and there were several people up talking about how important this was for the future of traffic. And we had these real time displays showing the output of this ensemble Kalman filtering algorithm on a monitor. And in the morning we were assuming free flowing traffic, no real risk here that the algorithm could do something catastrophically wrong and embarrass us. And sure enough it started showing this bright red spot in the middle of the press event. And everybody here is a traffic expert. So they start calling up their departments of transportation or their traffic monitoring systems to say, you know, what's going on with these guys, there's all this red stuff here. And fortunately we were able to show that this was caused by like a five car accident that morning, and we were able to redeem ourselves by saying in fact that, you know, this is precisely why this type of technology and this type of monitoring is useful. If you could give people that hadn't left yet the fact that this accident occurred, this would be quite useful information. And part of the reason that we also chose this site is that -- I mentioned these inductive loop detectors which are quite widely deployed in California -- this site also has some of the densest coverage of these inductive loop detectors anywhere in the state. There's something like 17 of these inductive loop detectors, at more or less a quarter of a mile spacing. Quarter mile, half mile, something like this, spacing, so you have extremely dense coverage of this existing fixed sensing infrastructure. That's what we're hoping to show: that this might be a candidate way for some traffic monitoring applications to not have to use this stuff. And you can see at this level that at least some of the main features are retained. 
The morning accident, of course, is captured by these inductive loop detectors, and you can see the recurring congestion; this is much higher resolution data. When you look at these things you have to keep in mind these sensors have errors, as they do, and sampling bias, things like this. The data that we actually used to run our estimation algorithms is much, much less dense than what you saw in those GPS phone logs that were stored locally on the phone. In fact, for the simulations I've shown we used ten virtual trip lines. So it's a little bit harder to reconstruct the picture of what happened that day for traffic using only this information. So the goal of these estimation algorithms is to sort of fill in the gaps: what happened in the places where we're not seeing information. And in order to assess how well we do when reconstructing the velocity -- it's really hard to get sort of the true state of traffic, what is the velocity everywhere on the roadway. Even though we had 100 vehicles, that was something like anywhere from 2 to 5 percent of the traffic on that day. It doesn't really give us an accurate measurement of what the rest of the traffic looks like. So we had a couple of teams sit on some bridges and film the traffic with HD video cameras, so we could identify the license plates of the vehicles that were not participating in the study and get their travel times from one end of the experiment site to the other. So in the video I'm going to show next, basically we have travel times reidentified from this video data, from license plate reidentification, and we'll compare that with the travel times that we compute using the velocity field that we've estimated. Okay. So here's the setup. Basically the experiment site is in the Bay Area. It zooms in here. And you'll see sort of the standard -- the side that we're doing the estimation on is going north, so up the screen. You'll see that the congestion starts to occur from this accident in the morning right away; you get on your map interface a dark red, and the screens on the right are actually showing what the estimation algorithm is doing. So on the bottom here, basically the green lines correspond to the estimated travel time, computed from this velocity field, plus or minus three standard deviations on that estimate. And the blue curve corresponds to the mean travel time that was collected from the video data. So I mentioned we had these video cameras. Each pink cross hair here corresponds to one vehicle that was reidentified, their travel time across the stretch of the experiment. The blue curve here is just the mean of these pink marks, the individual data measurements. So we do like a five-minute moving average window of this data to get the mean estimate of travel time. And that's what we try to track with our estimate from our velocity field. So right now in the video you can see the congestion clearing from the morning. I told you these models were good for understanding how traffic builds and clears. We do a pretty good job of tracking the mean clearing time as congestion clears. By 2:00 in the afternoon, travel times are starting to increase here. At the low point here you're looking at travel times of about eight minutes across the stretch of roadway. Now it's about 3:00 in the afternoon. You can see the congestion is building again. It's taking almost double that. It's about 18 minute travel times that are showing here. 
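For reference, the green travel time curve is computed by tracing a virtual vehicle through the estimated velocity field; a minimal sketch of that computation, with hypothetical names and a coarse cell-by-cell approximation, is:

```python
def travel_time(v_fields, dx, dt, t_start):
    """Travel time across the study stretch for a virtual vehicle entering at
    t_start.  v_fields[k][i] is the estimated speed (m/s) in cell i at model
    step k; dx is the cell length (m) and dt the model time step (s)."""
    t = t_start
    for i in range(len(v_fields[0])):
        k = min(int(t / dt), len(v_fields) - 1)  # hold the last field if we run past it
        v = max(v_fields[k][i], 0.1)             # floor the speed to avoid division by zero
        t += dx / v                              # time to traverse cell i at that speed
    return t - t_start
```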
In congestion the variance of the estimated travel time starts to increase, because of some of the nonlinearities associated with the model, but overall we're able to quite accurately reconstruct the travel time of the vehicles that were not participating in the study, using the velocity field that we estimated from the GPS data sent at these virtual trip lines. >>: [inaudible] faster than. >> Dan Work: Yes, there's an HOV lane. That's one of the problems. Because the model doesn't describe the evolution of different lanes of traffic, it just says that everybody's going the same speed in all lanes. And in most cases that's a good approximation, but in the case where you have an HOV lane, that adds quite a bit of uncertainty or error into the model. So the next big thing that we did was try to scale this up and say, that was nice for a small stretch of roadway, but what would this actually look like to deploy at a large scale? So for a year, basically, we opened up our system and allowed people in the Bay Area to download an application onto their mobile phone. They could get the real time traffic estimates that were coming out of this velocity Kalman filtering algorithm shown on their phone, and in return for that traffic information, we'd collect GPS measurements from these virtual trip lines. We had about 5,000 users that downloaded the application during the experiment. And basically it concluded last November. The real challenges were scaling this up to show that this could actually not just be a fun academic experiment, but that on networks of a meaningful size you could still actually get these algorithms to run in real time. So basically I mentioned earlier some of the features but I'll just recap. For the Bay Area, basically we have an automated algorithm that builds the network topology for us, using a database of underlying road topology that's constructed by NAVTEQ. We have access to their map database. We can basically generate it automatically: using an interface we just draw the area we want to do traffic estimation on, it will build the network topology for us and allow us to run this algorithm on top of that infrastructure. The network that we ran on, okay, so here are the exact numbers: 4,000 edges, 3,000 vertices, the state dimension is over 15,000. And again, because we're using an ensemble Kalman filtering algorithm, we have to solve that every six seconds, hundreds of times each step, once for each sample realization for the estimation algorithm. But these algorithms are fast enough that even the entire Bay Area algorithm runs on my ThinkPad, which now has finally died, but it was a three-year-old ThinkPad by the time that I was actually having this production code run on my system. So there's obviously some other pieces of the system that didn't run locally on the laptop, such as the map database, which wasn't all stored on my machine. But the core estimation algorithm that takes the data, runs the Kalman filtering algorithm, solves all the PDEs, solves all the linear programs and puts that out, runs in real time. I think it takes about three seconds to do a six-second update. You have plenty of time even for machine hiccups and things like this. In terms of scalability, it's highly scalable. With something like a rack of servers you could cover most of the freeways in the continental United States. Okay. 
I just want to take a few minutes to summarize what I think some of the interesting extensions to this work are as we move away from traffic and into other areas. I focused a lot today on the traffic problem -- again, how the availability of GPS data coming from smart phones is changing how we do traffic estimation by increasing the coverage a lot, and how we have a problem reconciling this information with density-based models. So I showed how to transform the density model into a velocity evolution equation and solve that problem using ensemble Kalman filtering. In terms of next steps, I think where this stuff gets really powerful is when you start to look at how to combine information from traffic with other cyberphysical systems. Just like we built Mobile Millennium, this traffic information system that takes cell phone data from all the cars, sends it back to the cloud, does all kinds of computation, and feeds that information back, you can do the same thing for air quality: you go out and deploy your air quality sensors all over the place -- a nice environmental engineering application -- and you build a system that solves high-end contaminant transport models for air quality. But these physical problems are obviously related: congested roadways create emissions, which degrade air quality. If the physical systems are coupled, there's no reason we shouldn't also couple the computational infrastructure. In addition to just making the sensor data available, if you've got a robust algorithm for estimating the traffic conditions, making that estimated state available for other services on the Web to take advantage of starts to make these cyberphysical systems much more powerful in terms of what they can estimate and infer about the physical environment. So I think getting to that level is something that's going to be really interesting. Okay. As for the main challenges, and some of the things I'm interested in working on: obviously this platform-based design, designing these systems to easily share both raw real time sensor data and the real time best estimates coming from the state-of-the-art algorithms in different areas, is definitely one of the challenges. And dealing with authentication, security, and privacy as you try to merge these things is going to open up a host of problems that are interesting to study. There are also lots of interesting problems in estimation, specifically related to when you move the sensors around: anything from trying to integrate this moving sensor data into models that, instead of tracking the users, track some aggregate quantity, which you may want to do for computational efficiency or for privacy; to trying to understand how to move your sensors efficiently, especially on directed graphs where on the edges you have partial differential equations to solve, which makes the sensor planning problems quite difficult, especially with uncertainty in the state; to designing real time estimation algorithms for streaming data that can handle both the mathematical models and the huge volumes of data when you start to take advantage of a lot of this crowd-sourced information.
And then, more on the mathematical side, there's deciding whether a problem is even well posed when you're dealing with boundary condition data and trajectories in the data streams: how to identify whether the problem can be solved at all, or whether it's overdetermined or underdetermined. And the applications, those are just a few of them. Obviously I'm interested in traffic based on my previous work at Berkeley, but there are lots of extensions of the same ideas to smart buildings and air quality monitoring: understanding how people move, how to do participatory sensing of buildings using smart phones, using sensors that we deploy, using mobile sensors. I mentioned before how to integrate real time traffic estimates into air quality. These are a few applications that I'm motivated by in terms of where I think the next steps might be in solving these distributed parameter systems estimation problems. And with that, I'll conclude and take any questions that you have. Thank you. [applause] >>: I have a question. Do you have a sense of how much more accurate your model is, using the sensors at these virtual trip lines, than if you just took, say, the velocities you record at each of them, assumed that would be the velocity of the whole link, and computed the total time it takes? >> Dan Work: Yeah, it really depends. At some point, as you increase the data volumes, the model becomes increasingly less useful, which is to say if you have enough data, you don't need any model at all; the data tells you everything. It contains all the information that a model would have and then some. >>: [inaudible] like you had an experiment with 5,000 people, but how do you manage the accuracy when it's only like a thousand or 500? >> Dan Work: Yeah, so these are great questions. The real challenge with a lot of these systems is validation. We developed a well-defined small experiment specifically so that we could validate on one test segment, but validating these systems in practice over large networks is very difficult. You can obviously send test drivers out to assess how well you're performing here and there, and over time build up some reliability estimates of what you've done, but the larger and more time-varying your system becomes, the harder it becomes to even validate how well these things are performing in practice. And I'd say, even in the case where you have too much data, models are still useful in the sense that they can help you identify things that are wrong with the data. To put it in perspective, a lot of companies right now are saying: I want to build a traffic monitoring application, or a new crowd-sourcing application, or a new mobile service, but I don't have any of the data, so I'm going to buy it from somebody else -- from FedEx, from Verizon, from whoever. So you buy this data and you have no contract and no guarantees about the authenticity of the data, except for someone's word that they didn't make it up. And by having a model in the background, even if the data alone could tell you all you need to know about what you're interested in, the model may still help you identify whether or not you're being given information which is just physically inconsistent.
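As a small illustration of that last point, using a model in the background to screen purchased data, a consistency check might flag reported speeds that are physically impossible or that disagree wildly with the model-based estimate for the same location. The thresholds and function name below are illustrative assumptions, not part of the system described in the talk.

```python
import numpy as np

def flag_inconsistent_measurements(reported_speeds_mps, model_speeds_mps,
                                   v_max_mps=40.0, residual_tol_mps=15.0):
    # Crude consistency screen for third-party speed data: flag readings that
    # are physically impossible (negative, or above a free-flow ceiling) or
    # that disagree strongly with the model-based estimate for the same cell.
    reported = np.asarray(reported_speeds_mps, dtype=float)
    model = np.asarray(model_speeds_mps, dtype=float)
    impossible = (reported < 0.0) | (reported > v_max_mps)
    implausible = np.abs(reported - model) > residual_tol_mps
    return impossible | implausible
```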
And that may still allow you to detect things like a sensor malfunctioning. Whether it's malicious, someone deliberately giving you bad information, or whether the sensor just happens to be behaving poorly, a mathematical model will still help you identify or classify which sensors are junk, which sensor data is junk, and which are physically consistent with what's happening. So I don't know if that answers your question. >>: So you talked about the pedestrian monitoring. Maybe you covered this in the talk, but what's the challenge there? How is that different from the vehicles? I guess you don't have the same models. >> Dan Work: You know, there are some similarities between the problem of traffic estimation and pedestrian modeling at the right scale. One thing that I didn't touch on during the traffic monitoring part of the talk is that the boundary conditions are hard to estimate. You can deploy sensors and try to measure them, but what you're really after is where people are starting their trips and where they want to go. If you can generate the trip demands, then you can pretty easily estimate what the boundary conditions need to be based on what all those demands look like. In the same way, for pedestrian modeling, a lot of understanding how people want to move through a building or through an urban area again relates to: where are they? Where are they starting from? And what are they trying to do? So again it's an origin-destination estimation problem; you need to know what they've been doing and what they're going to do in the future. At that level the estimation problems are very similar. Once you break it down and try to identify how an individual will move from one place in a room to the next, you can still work on transition models that describe how a person or groups of people move from one part to the next, but you don't have the transition laws. It's not as obvious to say that just because there's space in front of you, that's the direction you're going to take in your next steps. So there I think there's a lot more work that has to be done in terms of both model identification and parameter identification for those types of problems. All right. Well, thank you. >> Jie Liu: Thank you.