>> Lucas Joppa: Right, well welcome. Thanks for the few of you in here in person and those of you
online on Resonant. It’s a great privilege and honor to be able to introduce and host Dr. Milind Tambe
from University of Southern California. Thankfully he has his title up here on the slide because he has
quite possibly the longest professorial title in the USC system for sure. He’s the Helen N. and Emmett H.
Jones Professor in Engineering.
I’m not going to run through his long list of accomplishments. But what strikes me from his bio is the
sheer number of awards that he’s won, and also the diversity of domains he’s won them in, everything
from engineering research to community service and societal impact. Also teaching, an award that
professors rarely get but should be rightfully honored for when they do.
Dr. Tambe has had a long and illustrious career, primarily in the world of security games and game
theory applied to security. He’s worked with all sorts of government agencies and municipalities to
help make ports, oceans, and cities safer. But recently, and the reason I was really excited to have him here at
Microsoft Research, is his thinking around Green Security Games: the way that we can take game theory
and traditional security game approaches and start thinking about how we can deploy people in the real
world to help protect natural resources, whether that’s fisheries, wildlife, forestry, things like that.
That’s going to be the topic of Milind’s talk today. I’m looking forward to hearing what he has to say.
Thanks.
>> Milind Tambe: Great, well thank you to the people who are here and the people who may be watching
online. But we’ll keep this informal. Please feel free to interrupt me. We have what, about till eleven
thirty, something like that?
>> Lucas Joppa: Yeah.
>> Milind Tambe: Okay, so the work I’m going to describe, overall security games, is joint work with a
number of current and former PhD students and postdocs. Two of the current students are here, Fei
Fang, whom I’m showing over here, and Thanh Nguyen. This is a picture of our current group.
I’m going to jump right into the topic, which is security. We know that we face global challenges of
security: protecting our ports and airports, interdicting the illegal flow of drugs, weapons, and money,
suppressing urban crime. Now in all of these cases we have limited security resources. How to optimize
the use of limited resources is something that we’ve been building decision aids for, for security
agencies.
More recently we’ve been taking some of these techniques and applying them for security of forests,
fish, and wildlife. This is what we’ve been referring to as Green Security. The commonality in all of
these domains, in all of these challenges is that we have limited resources, a lot of things to protect, and
a watchful adversary who can monitor our defenses and exploit any patterns.
How do you schedule or plan, or allocate limited resources taking into account this watchful adversary?
We’ve been appealing to game theory and in particular this notion of Stackelberg Security Games. In
this talk I will briefly introduce Stackelberg Security Games. I’ll then start talking about infrastructure
security. This is mainly to give an introduction to some of the work that we’ve done in the past. The
first ten minutes for Stackelberg Security Games, next five minutes or so for Infrastructure security.
The bulk of the talk will be focused on introducing Green Security. This is an exciting and important area
for research. I’ll outline some of the applications as well as some of the challenges we face. Then finally
PAWS, which is a Protection Assistant for Wildlife Security, a decision aid that we’ve been building.
I know Fei and Thanh who are here have heard this talk like a hundred times before. You know if you
want to check your email or something that’s fine. But hopefully you’ll be able to answer questions
from the audience that is here.
Lucas, we were talking yesterday about AAAI and IJCAI papers and all this sort of thing. I went back and
checked how many papers on security games we’ve published just as our own group. Depending on
how you count, there are something like sixty to seventy papers over the last eight years in security games. I’m
going to give highlights of some of this. Green security is a very new topic for us and we haven’t
published as much there yet.
Let’s begin by talking about Stackelberg Security Games. I’m going to introduce this game using a two
by two example where we have the US Coast Guard trying to protect a toy port of two targets, targets
one and two. Target one happens to be more important than target two. If, as a result, the Coast Guard
is always at target one, an adversary conducting surveillance will attack target two. The adversary gets
a positive reward of one. The Coast Guard gets a negative reward, minus one. If, as a result, the Coast
Guard were to switch and always protect target two, now an adversary conducting surveillance will
attack target one. The adversary again gets a positive reward, five. The Coast Guard again gets a
negative reward, minus five.
Any deterministic strategy an adversary could defeat. If the Coast Guard were to use a mixed strategy,
a randomized strategy, so that sixty percent of the time they’re at target one and forty percent of the
time they’re at target two, an adversary conducting surveillance will only know that the Coast Guard
is here sixty percent of the time, there forty percent. But what they will do tomorrow remains unpredictable.
The goal here is to come up with a mixed strategy, a randomized policy that increases the cost and
uncertainty to an attacker in coming up with a plan of attack. We are not guaranteeing a hundred
percent security because there is no such thing in the real world. We are optimizing the use of our
limited security resources.
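As a rough illustration, here is a minimal sketch in Python of the expected utility calculation behind this toy example. The adversary rewards of five and one come from the talk; the penalty values for attacking a covered target and the sixty/forty coverage split are assumptions for illustration only.

```python
# Minimal sketch of the two-target Coast Guard example from the talk.
# The adversary gets +5 (target 1) or +1 (target 2) if the attacked target is
# unprotected; the defender gets the negative of that. The penalties when the
# adversary is caught are assumed values, not given in the talk.

def adversary_eu(coverage, reward_uncovered, penalty_covered):
    """Adversary expected utility for attacking a target with given coverage."""
    return coverage * penalty_covered + (1 - coverage) * reward_uncovered

coverage = {1: 0.6, 2: 0.4}   # illustrative 60/40 mixed strategy from the talk
reward = {1: 5, 2: 1}         # adversary reward if the target is unprotected
penalty = {1: -5, 2: -1}      # assumed adversary penalty if caught

eus = {t: adversary_eu(coverage[t], reward[t], penalty[t]) for t in coverage}
target_attacked = max(eus, key=eus.get)  # a rational attacker picks the best target
print(eus, "-> attacker chooses target", target_attacked)
```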
Now these kinds of games are called Stackelberg games because the defender, the security forces,
commits first. The adversary observes and then responds. What we are looking for is a Stackelberg
equilibrium. For a two by two game we can solve this by hand. You know, should it be a sixty/forty
allocation? Should it be a seventy/thirty allocation? But when there are trillions of possible patrolling
strategies and a large number of targets, solving these problems by hand becomes difficult.
That’s where our research comes in. Our research has been in solving massive-scale games under
uncertainty. This has led to a number of applications in infrastructure security. The work is in use, for
example, by the US Coast Guard in generating patrols in ports such as
Boston, New York, and Los Angeles, as well as generating patrols around the Staten Island Ferry. In
all of these cases the Stackelberg game formulation is used to generate these strategies.
Our work is also used by the TSA. They’ll assign Air Marshals to flights on a randomized basis, again
using a Stackelberg game model to generate these mixed strategies. All of this work started with earlier
work at LAX Airport, where the idea was generating randomized patrols of terminals and checkpoints.
Our work has also been tested by the LA Sheriff’s Department. This is for trying to suppress crime on
trains.
This was the past work, which I will not focus on much today. Where we are going with this is Green
Security. You’ve arrived at exactly the right time.
>>: Milind [indiscernible] Goldberg.
[laughter]
>> Milind Tambe: Here with Green Security we are focused on protection of forests, fish, and wildlife.
For example, with the Coast Guard our focus is generating patrols in the Gulf of Mexico to try to deter
illegal fishing. With IBM it is for cleaner rivers in India; this is for inspections of factories that pollute
water. With Panthera and WWF, two NGOs, and this will be primarily the focus of my talk
today, it’s for optimally generating patrols based on past poaching data and so forth to
protect endangered species.
What’s new here is the fact that there is a large amount of adversary behavior data. Unfortunately,
there’s a lot of poaching, a lot of illegal fishing, which provides us the data on the basis of which we can
learn adversary models. This then becomes a repeated game. We can learn these models to improve
our game playing strategies. When they attack again we can continually improve our strategies.
Here we are also able to incorporate, or need to incorporate, insights from conservation biology.
Collaboration from somebody like Lucas or other conservation biologists would be, or rather is,
very valuable. Similarly for criminology; in fact, there’s a conference that we’ve been organizing, the C4
conference for conservation, criminology, and computation, which tries to bring together these
interdisciplinary insights.
There is a newer domain that we’ve also worked on, which is urban crime. Here there is a lot of data.
This is the area around USC, for example, where the problem is more low level crime, as in people stealing
laptops and cell phones, and so on. Here we can start from a clean slate and learn an adversary model
from scratch from data, because there’s a lot of data; people report crime.
We go from infrastructure security games, where we fortunately have limited numbers of attacks and limited
data, and where we have to rely more on expert knowledge, to green security, where there is a medium amount of
data, let’s say, because we have a silent victim problem. Animals are not calling in to say there are poaching
attacks going on.
This leads to some fundamental challenges from the fact that the data is biased and the fact that we have
a tradeoff between exploration and exploitation. Opportunistic crime then of course extends this to
situations where there’s a lot more data. More recently, I guess we, and I’m sure others too, face a lot of
pressure from funding agencies to move all these techniques to work on cyber security. That’s one of the
newer areas of research.
This gives an overview of the work that we’ve been doing in this area of security games. It’s a highly
interdisciplinary space. USC was kind enough to do a magazine article outlining our work. One of the
things it produced was this map showing all the places where our work is deployed. Particularly with
the collaboration of non-governmental organizations, you can see that some of this work
now is getting deployed outside the US. Some of our collaborators are deploying this work in other
parts of the world as well.
>>: [inaudible] the monitor there. The [indiscernible] based armor and its many iterations.
[laughter]
>> Milind Tambe: But this is only possible because of the interdisciplinary collaboration with law
enforcement agencies. We’ve really had to embed ourselves. You know, Fei and Thanh, and everybody
else, have gone out trying to understand problems locally. Then based on that they try to embed those
constraints in the solutions. Along the way, for example, they’ve had to learn new kinds of cutting plane
algorithms that they didn’t imagine they would learn as part of their computer science training.
>>: Can you comment a little bit about domain specificity in terms of the learnings and the intensity of
effort versus the general principles that you rely upon that you can share among the applications?
>> Milind Tambe: Yes, so, I mean we are very much focused on this use-inspired research. There are so
many instances where the domain provides us a new kind of problem, and it then generalizes to many
other domains. But when you encounter that kind of a domain for the first time it throws open a
completely new type of problem that we hadn’t encountered before. Then you generalize.
It’s very much, we like to think of it very much as the classic use-inspired research where you
start from a problem. You try to find a solution and in that you recognize a general principle that applies
to many other domains, but that also solves the problem at hand. Not everything is like that; clearly
sometimes it’s just a solution that works in this domain. But this has been a repeated occurrence.
There are sometimes theorems that people are able to prove that would be wonderful if only the
domain constraints could just go away. For example, here’s a theorem that would apply and
would speed up all our computation if the Air Marshals would only fly even numbered cycles, that is
to say they would not intervene in a hijacking on [indiscernible] cycled flights, or something like that. You
know, okay, returning, going from New York to…
>>: [inaudible] are open to solutions.
[laughter]
Add the flights [indiscernible].
[laughter]
>> Milind Tambe: It’s very interesting. I think we’ve learned a lot in terms of really understanding that
the domain really constrains the solution space and the problem space.
To put all our cards on the table, we also have a startup, ARMORWAY, which is now run by
several of my former PhD students. It’s growing. It’s a very
exciting time in terms of where all of this research and these applications are going.
>>: You [indiscernible] something that [indiscernible] domains before you move on the [indiscernible]
staying [indiscernible] for a little bit. Have you looked at phishing attacks and online cyber security
issues from the point of view of your methods? Because, you know, while you’re here it would be relevant
to meet with some of the phishing experts and so on.
>> Milind Tambe: I would love to…
>>: [indiscernible] it’s not the wildlife fishing.
[laughter]
>> Milind Tambe: Yes, I would really love to do that. I would be really, really…
>>: I’m curious if you could formulate it as a serial…
>> Milind Tambe: Yes, this, so I haven’t read it but one of my former, or two of my former postdocs
actually you know they’ve formulated some of these problems in this framework. They’re going after
these problems. I mean I would love to, I would love to really go out, go after this because…
>>: [inaudible].
>> Lucas Joppa: Yeah, so I was going to see what…
>>: Well, yeah, we can talk about that at the end before you run out of, if you have some white space
on your schedule so you could meet with some of these folks that do this kind of thing.
>> Milind Tambe: I would love to. As I was saying, we’re actually getting pushed to
say we’ve got to move to cyber if we want more funding, essentially, in some of these cases. That’s
the way it’s been working. That’s…
>>: Well, I’m thinking about it as a question, as opposed to you going there because that’s where the
interest is, really: whether we can formulate those challenges in a way that would make sense for the
methods.
>> Milind Tambe: That’s right. I think, I mean obviously we are interested in the fundamental
algorithmic advances that we can make. We are interested in the domains as well. But you know
hopefully what we will find is that the principles will apply. Please?
>>: Can I just get a little context. I apologize if this came up already. But it seems to me that this whole
model is dependent on kind of the stupidity of your adversaries. It’s in their interest to be as random as
possible within their domain. If they’re random then you can’t do anything better than be random…
>> Milind Tambe: No, no, so I think, the way I explained it in the beginning, at least in the early part of
this work, is that we’re assuming the adversary is rational. They are very
observant. They know exactly the patrolling strategies, the mixed strategies that you’re playing. Then
they choose, from their perspective, what’s in their best interest to attack.
They are just picking, so it’s like, I mean what we are doing is we are playing some kind of randomized
strategy. They’re observing what you’re doing. They’ll pick that one spot which is the best for them
which happens usually to be the worst for us. That’s the way the game gets played. They’re not playing
a randomized strategy.
>>: But isn’t equilibrium for both to be random here?
>> Milind Tambe: No, in the Stackelberg game the defender plays a mixed strategy. The adversary’s
pure strategy response is optimal. The adversary’s job is really very simple in this model: look at
the patrolling strategy of the defender, find that one place where you can attack
and get the best possible result for yourself. It’s a fairly different model compared to a simultaneous
move game model. The computational complexity here is polynomial time, and there’s a
unique equilibrium as opposed to having, you know, multiple equilibria. This model offers
some advantages over the normal sort of simultaneous move game model.
>>: Going to this domain of airport [indiscernible] security. You can’t define the problem in the scope
of the LAX airport alone. You know, from a terrorist’s point of view it’s actually a bigger game, right. If LAX is really
well protected you can try another airport. Have you looked into this kind of interaction between
these agencies and how making one thing more secure may divert attacks to another one?
>> Milind Tambe: Right, right, right, so I mean I completely agree that really there is a
bigger game that is being played. I mean, we thought about it on a national scale, how you would go
about doing these things. It’s not something that we’ve had access to internally, saying here, you know, our
algorithms are actually deployed on a national scale for all of the airports or something like that. That’s
something that would be interesting to do. There’s no opposition from us to thinking about it in that way.
But it’s just something that hasn’t been given to us as a problem to think about.
Let me move on to Infrastructure Security. I know that, I mean it’s great we are having an interactive
session. I’ll try to just speed things along a little bit because I do have a lot to cover.
>>: You have until noon, right.
>> Milind Tambe: Oh, okay, that’s good.
>>: That’s why the [indiscernible].
[laughter]
>> Milind Tambe: That’s good, that’s good.
>>: We’re just going to keep you here all day, right.
>> Milind Tambe: That’s fine too, that’s fine too. I guess the longest lecture I’ve given is nine hours.
[laughter]
This was a summer speech, summer school in Beijing. They kept us for nine hours each, so morning nine
to evening six lecture.
>>: [inaudible]
[laughter]
>> Milind Tambe: That’s true, that’s true, that’s true. Let me start with infrastructure security. Here I’ll
just highlight one piece of work that we’ve done, just to give you a flavor for how this work gets done:
PROTECT, which we’ve built for the US Coast Guard. PROTECT is an acronym for the system. You can
see we spend a lot of time trying to come up with good acronyms.
This is a decision aid built for the Coast Guard to generate randomized patrols in ports taking into
account the values of different targets. It’s deployed in Boston and New York, and Los Angeles, and
maybe going towards other ports in the country as well.
What the Coast Guard are worried about are attacks of this type, but also less serious attacks that
might happen. In two thousand eleven the system was initially deployed in Boston, at which point they
made a nice video for us after they evaluated the implementation to be a success. There were some criteria
for how they evaluated it.
They made a nice video. I’ll play a thirty second clip from that video.
[video]
Most focused upon providing effective presence, reducing predictability, and enhancing the safety and
security within our ports. PROTECT guided patrols became a source of pride for Station Boston crew. It
doesn’t get much better than that. PROTECT has also had other positive side effects such as the
development of better tactics, techniques, and procedures. The results have been exceptional. We
can…
[video ended]
>> Milind Tambe: Beyond the first application of port patrols, the next was generating patrols around the Staten
Island Ferry. This is work of Fei Fang as part of her PhD thesis. The ferry carries sixty thousand passengers a day,
and the threat is somebody ramming a boat into the ferry. The Coast Guard runs these patrols around the
ferry. These are actually our algorithms at work, actually part of Fei’s thesis.
It has completely and radically changed how they used to run these patrols before. Sometimes, you
know, the public appreciates the patrols and so forth. It’s good for the Coast Guard and also good for us.
I’m going to explain how we generate these patrols, just to give a flavor for how this works. There’s a
discrete space-time representation that I’m going to use. There’s actually a continuous time version of
this which is published in the Journal of AI Research, two thousand thirteen. But I’ll use the discrete
space-time version because it’s easier to explain in the time that we have available.
Here we have three locations A, B, C, and three time points five, ten, and fifteen minutes. We have this
ferry which is jumping from C at five minutes to B at ten minutes, to A at fifteen minutes. Each of these
red box locations is a place where an adversary could attack.
To protect this ferry we can run different patrols. We can run a green patrol going from B at five
minutes to C at ten minutes, to B at fifteen minutes for example. A patrol boat can protect the ferry
right next to it as shown by the red arrow.
Now we can try to run a minimax algorithm. This is a linear program that tries to minimize the
adversary’s maximum expected utility by generating probabilities for each of the R routes that we have.
We can solve this problem and get a solution which gives a probability distribution over all of the
different routes. It’s saying, well, the probability of the brown route is point one seven, green is point one three,
etcetera, etcetera. This could work, but there are N to the power T variables here, N being the number
of locations and T being the total number of time slices, because there are exponential numbers of patrol routes if
you look at it this way.
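To make that formulation concrete, here is a hedged sketch of the route-based minimax linear program described above, written with scipy. All numbers (route coverage, adversary rewards and penalties) are made up for illustration; the deployed systems use different, larger models, and the compact marginal formulation discussed next.

```python
# Hedged sketch of the naive route-based minimax LP: choose a probability p_r
# for each full patrol route so as to minimize the adversary's maximum
# expected utility over all (location, time) targets. Toy numbers only.
import numpy as np
from scipy.optimize import linprog

# coverage[r][t] = 1 if route r protects target t at that time step, else 0.
coverage = np.array([[1, 0, 0],
                     [0, 1, 0],
                     [0, 0, 1]])
adv_reward = np.array([5.0, 3.0, 1.0])     # adversary utility if uncovered
adv_penalty = np.array([-1.0, -1.0, -1.0])  # adversary utility if caught (assumed)

R, T = coverage.shape
# Variables: x = [p_1, ..., p_R, v]; objective: minimize v.
c = np.zeros(R + 1)
c[-1] = 1.0
# For each target t: sum_r p_r * cov[r,t] * (penalty_t - reward_t) - v <= -reward_t
A_ub = np.hstack([coverage.T * (adv_penalty - adv_reward)[:, None],
                  -np.ones((T, 1))])
b_ub = -adv_reward
A_eq = np.hstack([np.ones((1, R)), np.zeros((1, 1))])  # probabilities sum to 1
b_eq = np.array([1.0])
bounds = [(0, 1)] * R + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, method="highs")
print("route probabilities:", res.x[:R], "adversary max EU:", res.x[-1])
```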
>>: In these solutions do you make the assumption that the sensing action is not observable by the
adversary? In other words, let’s say preparation to sense was observable. Therefore an adversary with
continual monitoring ability could make decisions based on the protector’s ability to sense, or preparation
to sense. Would that disrupt the intent of the algorithm?
>> Milind Tambe: The algorithm’s intent, I guess the assumption, is that the adversary has
complete knowledge of the overall probabilities of patrols. They have past surveillance. They have
done all of this. They know generally how your patrol probabilities work. They
have complete information about the exact probabilities, but what they don’t have is online information saying,
right now, where is your patrol boat?
>>: Right, I guess I’m asking you to characterize the real-world assumptions that are made in
terms of the adversary’s information gathering ability, in terms of the details of thwarting versus executing
an attack. In light of the dynamics of preparation and the temporal properties of the protection, one could
imagine that the adversary could do online observation of some aspects of what it takes to act. You
know, nothing’s instantaneous, right.
>> Milind Tambe: Right, right, right.
>>: The temporal progression of a protection action. How that would…
>> Milind Tambe: The idea here is not so much, so we are not trying to disrupt the adversary online by
observing where the adversary is at this point in time. What we are trying…
>>: How about the other way around. The adversary has the ability to observe the temporal aspects of, like,
say, the protecting Coast Guard boat coming out of its garage where it’s housed and starting out to
do the surveillance. They can just watch continually for that action and wait until it doesn’t happen
before they engage their attack, given that we’re talking about windows of time, intervals,
preparatory phases.
Whenever I see this kind of solution I always wonder about the richness of the real world giving
adversaries all sorts of rich surfaces to attack that thwart the assumptions of analytical approaches,
beautiful as they are in terms of how…
>> Milind Tambe: Right, so I guess, I mean, that’s a great question, something that we
repeatedly have to ask. The basic assumption here is that we are optimizing limited resources.
The assumption also is that the adversary has perfect information about your probabilities of patrols
beforehand, but not online, right, so they planned. The idea is something like this. They
are observing everything; they have perfect knowledge. They’re going to plan an attack at a certain
time, a certain place, because it’s not something that they can sort of do dynamic planning on.
>>: [inaudible] is that whatever they do, they act on a plan they can’t abort easily. You know, so they
actually plan. They commit ahead of time…
>> Milind Tambe: Right, right.
>>: That it’s going to happen on nine eleven at nine thirteen a.m. on this plane. They have the same
problems we do with like you know committing to a tug boat going out at time T.
>> Milind Tambe: If obviously they abort their attack that’s good, but if you know but…
>>: Yeah, but not if they aborted every single time at low cost. I think it’s best to wait until an online…
>> Milind Tambe: The online thing is the more interesting aspect which is what you know more recent
papers have started to go after. To say okay, if you now allow the adversary to have some online
sensing can you improve your plans? That’s something also we’ve been thinking about and modifying it.
But at some point…
>>: It seems much harder.
>> Milind Tambe: Yes it’s, I mean if you allow for the adversary to also have perfect online sensing.
Where here we’re saying well the adversary has perfect offline sensing. If you now say well there’s also
an addition of…
>>: Even noisy online sensing and some resources to expend on monitoring and aborting.
>> Milind Tambe: So if you allow for some partial online sensing it’s okay. I guess obviously the trouble
becomes if you have perfect online sensing and then an instantaneous attack. Then basically you can’t
really do anything. Then, you know, so you need to have…
>>: More iron shields for your boat.
>> Milind Tambe: Right, right, so there need to be some limitations on the
adversary’s capabilities for us to be able to protect based on what we have. Otherwise there’s really
nothing you can do.
>>: Yeah, but again, I don’t want to stop the talk and just say it’s a hard problem. But in many of
these situations the adversary figures out when stuff is deployed and can wait until stuff is not
deployed to do their action, without a lot of planning, without a lot of infrastructure, which typifies
terrorism: sort of lightweight, nasty attacks.
>> Milind Tambe: But I guess that depends, right. I mean, if you are thinking about something that’s
preplanned, that’s carried out as a big attack, then normally, the way I’ve read incidents and so
forth, there’s careful planning. Or it can be something where you sort of do it on the fly opportunistically.
>>: Bigger methods might create pressure to evolve those [indiscernible]…
>> Milind Tambe: Right, right, right, and the idea is, right in the beginning as I said, we’re
not guaranteeing a hundred percent. All we are trying to do is to increase the cost and
uncertainty to an attacker in coming up with a plan of attack.
If we now put pressure on them so that they can’t plan so easily, so that they have to have much more
coordination and all of this, then that’s what we’ve achieved. Because given limited resources you
really cannot guarantee a hundred percent.
Going back to this, the main idea here is that there are exponential numbers of routes. Now you can
focus on individual segments of these routes. Take the probabilities on these segments, say the green
segment and the brown segments, and put them together into a single
probability variable. Take the blue segments and put those into a single probability variable.
Now you are generating probabilities over these marginals, marginal probability flows. You
have N squared multiplied by T variables rather than the N to the power T variables. We can now scale up
and generate the probabilities that you want by sampling from these marginals directly.
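A hedged sketch of the sampling step just mentioned, with made-up numbers: given marginal probability flows over individual segments (edges of the space-time graph) that already satisfy flow conservation, a concrete patrol route can be drawn one segment at a time.

```python
# Hedged sketch: instead of a distribution over N**T full routes, keep a
# marginal flow f[t][(a, b)] for moving from location a to location b between
# time steps t and t+1 (N*N*T variables). Assuming the marginals satisfy flow
# conservation, sample a route edge by edge, conditioning on the current location.
import random

def sample_route(start_probs, edge_flow, T):
    """start_probs: {loc: prob}; edge_flow[t][(a, b)]: marginal flow on edge a->b."""
    locs = list(start_probs)
    route = [random.choices(locs, weights=[start_probs[l] for l in locs])[0]]
    for t in range(T - 1):
        here = route[-1]
        # Normalize the outgoing flows from the current location.
        out = {b: f for (a, b), f in edge_flow[t].items() if a == here and f > 0}
        route.append(random.choices(list(out), weights=list(out.values()))[0])
    return route

# Toy numbers only, chosen to satisfy flow conservation.
start = {"A": 0.5, "B": 0.5}
flows = [{("A", "B"): 0.5, ("B", "A"): 0.3, ("B", "B"): 0.2},
         {("B", "A"): 0.5, ("A", "A"): 0.3, ("B", "B"): 0.2}]
print(sample_route(start, flows, T=3))
```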
This is a way in which we can scale up to solve these big problems. There’s one technique. There are
many other techniques. If I have time I’ll come to some of these later on.
This is a small clip from…
[video]
Southern California to utilize game theory as a way of optimizing and scheduling our patrols makes it
harder for somebody to anticipate where the patrols will be. Even though we’re having to scale back
operations a bit, what we’re working on is ideas that will allow scheduling that gives the appearance
that we’re out there a lot more than what we are, because it puts the boat in the right place.
[video ends]
>> Milind Tambe: This is just showing off that you know some of our work has been mentioned in
Congressional testimonies. We’ve spent a lot of time. Obviously, as I mentioned earlier there’s a, you
know papers at AAAI on each [indiscernible]. They do their normal evaluations of algorithms and so
forth. Several of you are very familiar with that.
But given that the work is deployed in the real world, it’s important for us to also evaluate how it has
performed in the real world. The focus of our evaluation there is: does game theory lead to improved
performance over how things used to be done before, in terms of optimizing these limited resources?
We’ve certainly done evaluations in the lab to that end, bringing in human subjects and simulations.
We’ve gone out in the field and looked at patrol quality before and after our system was deployed,
running human versus game theory comparisons: who does a better job producing unpredictable
patrols? Who does a better job covering the right kinds of targets? We’ve shown that game theoretic
approaches have led to better performance.
Then, to the extent possible, field evaluation against real adversaries, and this usually is only possible
when the consequences of an attack are low. For example, we’ve done tests on LA’s Metro Rail system
aimed at catching fare evaders.
So this was done over twenty-one days of patrol, keeping conditions as identical as possible. On the
one side we used game theoretic approaches; on the other side, humans were given uniform random
schedules, but they could interrupt them and go after fare evaders wherever they felt they had the
highest chance of catching fare evaders. We showed that, if you look at the capture rate per thirty minutes,
the game theoretic schedules led to a higher rate of captures.
>>: Did you try to pure random? That didn’t have the human…
>> Milind Tambe: We couldn’t, because essentially these are real world trials. You know, these are real
officers doing real tests. There’s just no way to say, well, you have to exactly follow this and this only.
Because they know they can catch more fare evaders, they just say, well, we are going to go
after that.
>>: [inaudible] on the game theory side you’re taught to follow the strategy only?
>> Milind Tambe: What we tell them is that even there you can interrupt the schedule if you don’t
like what’s in hand and go anywhere you want. What we found is, and actually they didn’t know
what was what, they just got a schedule on a phone. They had to follow it, or if they didn’t like it, go
somewhere else and catch more fare evaders.
>>: [inaudible]
>> Milind Tambe: Right, right, right. But it’s just that they always interrupted the uniform random
one. They never interrupted the game theory one. That’s just the way it was run. Everything was kept
the same; they didn’t know what was what. They just had a schedule on the phone.
>>: They didn’t interrupt the game theory?
>> Milind Tambe: They did not interrupt the game theory. They always…
>>: On their own volition, like of their own.
>> Milind Tambe: Right, right, so we just had you know they had the schedule on the phone. They had
to go and just you know follow or if they didn’t follow then go somewhere else. The phone would note
wherever they went and you know got more fare evaders. That was the whole experiment. It was not
possible to say well you must follow exactly what we tell you to do.
This is just what was done previously and originally my plan was this would get done by ten fifty, but
that’s okay. We’re just slightly off schedule. I’ll move on to Green Security Games which is the main
topic of today’s talk. Now, so we’ll get as much as we can get through. You can tell me when you’ve
had enough and we can stop there. I can go for nine hours as I mentioned.
[laughter]
We’ll start with talking about the work we did with US Coast Guard. This is as I mentioned earlier work
with the US Coast Guard. This is generating patrols against illegal fishing. We generated these patrols.
The software is with the Coast Guard. It’s under evaluation.
More recently we’ve started working on forest protection. This is with our colleagues at
Michigan State University and an NGO in Madagascar. The idea here is again trying to protect forests
from illegal logging.
Another project that we’re working on is inspecting factories that pollute rivers, in particular important
rivers like the Ganga River. Here, with IBM, the idea is: can we inform the inspectors, using
past data and so forth, where to do these randomized inspections?
But the work I’ll really focus on for today’s talk is the work for protecting endangered wildlife. We have
many collaborations around the world. One, for example, is with the Wildlife Conservation Society in
Uganda. This is Murchison Falls National Park in Uganda. Perhaps some of you have been there. Maybe
Lucas, you’ve been there. Absolutely wonderful place; I was there last fall. Really, really
breathtaking in terms of the animals and so forth that you see there.
But there’s a threat to the wildlife. The picture in the middle there is a snare. The way this works is that it
opens up and is buried in the ground. When an animal steps on it, it closes and traps the animal.
Then the poacher comes and kills it. We were told that there are elephants who walk around in the park
with their trunks cut off and so forth because of this. To the right are pictures of wire snares. There are
thousands of snares, and that’s just from two thousand fourteen.
The picture on the left is me with the head of security at the park. The way they conduct patrols there is
that they’ll send out patrols for some months, collect data on wildlife and on snares that they can find,
bring all the data back to headquarters, analyze the data, and send out patrols again.
Inspired by this sort of idea, we’ve cast the problem as a
repeated Stackelberg game. You can look at the park in terms of a grid structure where
each grid cell is a target; targets where there is more water and more animals have higher weight, higher value.
We start by the defender calculating a randomized patrol strategy on this game board.
When they execute randomized patrols for several months and poachers attack targets, we get crime
data. From this crime data we can learn something about the adversary’s decision making, because we
are assuming now the adversaries are not acting with perfect rationality. We’re trying to learn their
decision making model, their bounded rationality model, which is then going to feed into our calculation
of a mixed strategy.
This is quite different from the earlier work on Infrastructure security. Because now we have attack
data which allows us to learn something about our adversaries and improve our decision making in
terms of calculating randomized patrols.
I’m going to now outline some of the research challenges and our solutions. This is as I said new work,
work that’s ongoing. There’s lots more to do. I’ll start by looking at the bounded rationality models and
algorithms that need to be generated for that.
Obviously we are getting data from the field. I’ll share some results from that data. But to speed up
data collection we have also constructed a poaching game. This is an online game where people play for
real money. This is played on MTurk. This is a forest area where we’ve divided up the forest into grid
cells. Grid cells with higher levels of green are patrolled less frequently; adversaries can attack there
without a high probability of being captured. Red areas are patrolled more
frequently.
The number of animals, the number of hippos in a cell, is related to the value of that cell. We chose hippos
because they’re a non-charismatic species; people will not feel as bad poaching them. People are acting as
poachers, and the algorithm is generating the different patrolling strategies here.
People can go to different locations. If they’re successful
they get a certain reward; if they fail they get a certain penalty. They can see the probability of success
and failure. If they go to a location and succeed, they get real money. If they go to a location and their
snaring fails, they lose money.
We’ve been playing these sorts of games with people for many years now, collecting lots of data. Now
normally, if we assume a game theoretically rational player, what we will assume is that everybody is
calculating expected utility: the capture probability multiplied by a penalty,
plus the non-capture probability multiplied by a reward. Everybody would go to one cell and we’d find
all our attacks in that one cell, or maybe just around it.
But what happens in reality is that the attacks are spread all over. The model that we found better
fits how people are acting is this Quantal Response Model. This is a stochastic choice
model: cells with higher expected utility, people are more likely to attack; cells with lower expected utility,
people are less likely to attack, but the probability is not zero.
If you use this model, of course, that leads to a nonconvex, nonlinear optimization problem when
solving for the defender strategy. Once you get past that, you come up with a defender strategy
assuming that the adversaries are acting according to this Quantal Response Model.
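As a small illustration, the quantal response choice rule referred to here is commonly written in logit form, with the attack probability of a target proportional to the exponential of the adversary's expected utility there. A minimal sketch, with the precision parameter and all numbers assumed for illustration:

```python
# Minimal sketch of a quantal response (logit) choice model: targets with
# higher expected utility are attacked with higher probability, but targets
# with lower expected utility still have nonzero probability.
# EU_a(t) = x_t * penalty_t + (1 - x_t) * reward_t, as stated in the talk.
# The precision parameter lam and all numbers below are illustrative assumptions.
import math

def quantal_response(coverage, reward, penalty, lam=1.0):
    eu = [x * p + (1 - x) * r for x, r, p in zip(coverage, reward, penalty)]
    weights = [math.exp(lam * u) for u in eu]
    z = sum(weights)
    return [w / z for w in weights]  # attack probability per target

coverage = [0.6, 0.4, 0.2]     # defender coverage probability per cell
reward = [5.0, 3.0, 1.0]       # adversary reward if not captured
penalty = [-1.0, -1.0, -1.0]   # adversary penalty if captured
print(quantal_response(coverage, reward, penalty))
```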
>>: Where did the Quantal Response Model come from? The Quantal Response, I guess you have to
read McFadden’s article on that.
>> Milind Tambe: Yeah, so there is a long history that goes back to Luce in nineteen sixty
and so forth. But, you know, McFadden got a Nobel Prize for some of this work. This is
where it’s rooted. Now it’s sort of invaded game theory. There’s a lot more work on different kinds of
Quantal Response Models and so forth. But this is one point of origin. Some people actually cite Luce and
go back even further to random utility models and so forth.
If you assume that the adversary is acting according to Quantal Response, generate our strategies
according to that, and play against humans, it turns out that we perform better. These are just four of
the games that we’ve played, with defender expected reward on the Y axis and the four different
games on the X axis. Lower is worse, higher is better.
If you assume adversaries are acting perfectly rationally, we get the blue
bar, which is the worst performing bar. Epsilon rationality is green; Quantal Response leads to the red bar.
Assuming that the adversaries are acting according to Quantal Response leads to better performance for
the defender.
We can do, go on.
>>: So the adversaries are dynamic but you have to be fixed.
>> Milind Tambe: We have to be fixed. The adversaries then choose a particular location to attack
knowing the probability of where we are patrolling.
>>: Then come back the next day if they think things have changed.
>> Milind Tambe: In this case it’s a one shot game. We are just playing once and that’s it. In
Thanh’s PhD thesis she figured out that there’s a better way, an improvement of Quantal Response that’s
possible, which is Subjective Utility Quantal Response. It seems like we can
model people as though they’re looking at each target, with the capture probability, reward, and penalty
as features of that target. Then you take a weighted sum. The weights are learned from
past attack data.
Then we use that subjective utility to generate a Subjective Utility Quantal Response Model. In human
subject experiments, playing against humans, assuming they’re playing Subjective Utility Quantal
Response leads to better performance. But this is all lab data.
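A minimal sketch of the SUQR idea just described, assuming a logit choice over a weighted sum of target features; the weights below are hand-picked for illustration, whereas in the actual work they are learned from past attack data.

```python
# Hedged sketch of SUQR: each target's "subjective utility" is a weighted sum
# of its features (coverage probability, reward, penalty), and the attack
# distribution is a logit over that. Illustrative weights only; the real
# weights are fit from attack data (e.g. by maximum likelihood).
import math

def suqr_attack_probs(features, weights):
    """features: list of (coverage, reward, penalty) tuples, one per target."""
    su = [sum(w * f for w, f in zip(weights, feat)) for feat in features]
    expd = [math.exp(u) for u in su]
    z = sum(expd)
    return [e / z for e in expd]

features = [(0.6, 5.0, -1.0), (0.4, 3.0, -1.0), (0.2, 1.0, -1.0)]
weights = (-8.0, 0.5, 0.3)   # illustrative: strong aversion to coverage
print(suqr_attack_probs(features, weights))
```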
More recently we got data from Uganda, seventeen years of data. This is showing the fit of the model that
we have to the data from Uganda on poaching attacks. These are ROC curves. The main thing to
know here is that being closer to this corner is better, as we know. You can see that SUQR, which is shown in
blue, is significantly better than Quantal Response and the other models that we could think of.
Now Thanh has been working on improving the model. Go ahead.
>>: Which one is the actual data? These are predicted. The other ones a human better picture.
>> Milind Tambe: Basically we’ve learned from, let’s say, past years. Then we are trying to predict for
the next year. So in training we have learned our model’s
weights; then in testing we are predicting.
>>: If you were to fit the data that you had using the like you know the Quantal Response or the SUQR
response, which one’s a better fit? I’m just trying to understand.
>> Milind Tambe: Blue, the blue is a better fit. The more you push towards this corner the better it is.
>>: [indiscernible] people that are caught versus that get through, right?
>> Milind Tambe: Sorry?
>>: In this model the true positives are catching folks.
>> Milind Tambe: Right and so we are trying to predict poaching attacks.
>>: It would be really nice to get the model. I mean, I’m sorry I don’t mean to…
>>: No, I don’t have any more.
>>: Oh, I was just interested to know, do you use a Bayesian model to get this prediction?
How do you generate this subjective utility to predict the probability of an attack? You could imagine,
for example, really rich models that use features like availability bias: my cousin was caught in
that region, that really sticks in my mind and makes it loom larger. You can imagine people amplify recent
[indiscernible] for example.
>> Milind Tambe: That’s, I’m going to that. I’m going there. That’s wonderful, that’s wonderful. This is
partly what Thanh is working on in our collaboration with Lucas. But so far this is
assuming a single adversary type, meaning everybody has the same weights.
>>: But is it capturing motivations like [indiscernible], like, for a bigger reward I’m willing to take
on risk, versus a very conservative poacher that is trying to [inaudible]?
>> Milind Tambe: We haven’t yet gone there. I mean, what we have done is take all the animals to be
the same value right now and just learn the weights from the data.
>>: What’s the penalty for being caught for like three weeks in jail or something?
>> Milind Tambe: Right, and so there’s a lot more to be done in terms of all of this,
and there are all kinds of features in the domain: slope of the land, the vegetation cover, all kinds of things like
that. What Thanh is doing right now, to give away some of where she’s going, is: well, the observations
are not perfect, so we really need to model observational uncertainty. All of that is yet to be done.
>>: But I can see that. I can imagine a really rich dissertation maybe it’s the top [indiscernible] on just
using the basic standard Quantal Response Model. Looking at inducing the human probabilities based
on sets of features. Like notions like the actually logs of what is a model of what somebody has heard
about. Then use about what someone’s caught for example.
Did somebody, can you compute whether somebody’s family member was, somebody they would know
on a social graph would know about them being caught on grid cell X, I, J. Actually showing how these
you know what the actual landscape of the cognitive assessments are versus the actual probability
which also have an influence. [indiscernible] if they’re ideal they would learn the actual probabilities.
But it’s not going to do that they’ll have their own model of this. Things like [indiscernible] are very
important I think.
>> Milind Tambe: Yes.
>>: Biases also.
>>: Yeah, so…
>>: One other thing, oh, go ahead.
>>: This model is assuming that they are observing…
>>: Perfectly.
>>: What the guards are doing.
>> Milind Tambe: Perfectly.
>>: Perfectly.
>>: Yeah.
>>: Whereas people are going to have some kind of a local or bias now which means [indiscernible].
>> Milind Tambe: Well that is true.
>>: If you ask them, like, you know, what’s the probability; it would actually be kind of cool to get one of these
poachers and show them the grid.
[laughter]
Show them the land and say give me a probability and go out in these places. Then post a news story
about like real, real story the person X was caught combining with a tiger you know in this area. It’s like
okay now we assess and see how the, things you can sense about things that people could see could be
used to model what people would think.
>> Milind Tambe: You can see why we are so, I mean this is an important area and why we are so
excited about all the different ways in which this can be extended. One way in which we thought it
could be extended is by going from SUQR to Bayesian SUQR where we allow for different groups of
people to have different weights.
>>: Oh, yeah absolutely, well that raises a question. I mean I assume this SUQR…
>> Milind Tambe: SUQR.
>>: Was Bayesian. What do you mean by non-Bayesian versions of this?
>> Milind Tambe: I mean in the sense that a single adversary type. In the sense that…
>>: Yeah, but how do you model? How do you come up with data with that estimate?
>> Milind Tambe: Of the weights, from past attack data. We look at, let’s say you may have, you know
we have seventeen years of data in this…
>>: No, no not data. You actually use past attack data to induce a probability…
>> Milind Tambe: To look at the weights.
>>: The W.
>> Milind Tambe: The W.
>>: The weights.
>> Milind Tambe: Based on those weights then we get the probability of attack.
>>: I see, so the weights are basically ways to weight the probability based on what actually what they
did, I see. Yeah, I like the Bayesian idea.
>> Milind Tambe: Okay, so that’s what we did…
>>: That would be great, yeah.
>> Milind Tambe: That’s what we did next. Now we say there’s a heterogeneous population of
poachers. We learn different groups of weights. Now we want to go from one shot games to repeated
games. Right now we’re playing this on Amazon, on MTurk again. Our thought was: SUQR is good, it
clearly beat people in the single shot game, so Bayesian SUQR must be even better.
We played repeated games. This is not easy on MTurk. We have to chase people, so for every game we
recruited forty people and chased them by sending many, many emails: please come back and play this
game, please come back and play this game. This is what we call a longitudinal study. In a sense there are
thirty-five weeks where we chase forty people to play this game again and again, and again.
My student Debarun sent ten thousand emails in the process to get people to come back and play this
game again. This is what it took. At the end of it, so we wanted to show that a learning SUQR model,
one where we learn, would do significantly better than a non-learning model.
These are rounds of the game on the X axis, defender expected utility on the Y axis. Lower is worse,
higher is better. Maximin is the simplest strategy we could think of: no learning, nothing. What we
found is that our SUQR model performed worse than maximin. Bayesian SUQR performed far worse and
never recovered. Eventually SUQR kind of learns and settles and improves over
maximin, but in round two it is clearly much worse.
There’s something missing in the process. We were trying to understand: why is it that in one shot games we
were doing so well, but as soon as we went to repeated games things started falling apart? What we
found out is that people do what we’ll call superstitious learning, as it is known in the
psychological literature.
Essentially, they may have made a really good choice, a great place to attack
in terms of expected utility. They go there and, because this is randomized patrolling, by random chance they get
caught. Then they say this was not a good place to attack, let me go somewhere else. Or they may have
made an error and gone to a wrong place, but by random chance they succeeded. They say, well,
this is a really great place, let me come back here again.
In a sense, they get this kind of reinforcement, which we can model by essentially increasing or
decreasing subjective utility from the last round. We are now bumping this subjective utility up or down based
on whether they succeeded or failed, rather than just where they attacked, which is what we had used in the
past.
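As a rough sketch of the adjustment being described, one could bump each attacked target's subjective utility up after a success and down after a capture; the additive rule and step size here are illustrative assumptions, not the exact published SHARP update.

```python
# Hedged sketch of the success/failure reinforcement described here: after a
# round, raise the subjective utility of targets where attackers succeeded and
# lower it where they were caught. The additive update and the step size are
# illustrative only, not the exact published SHARP formulation.

def adjust_subjective_utility(subjective_utility, round_outcomes, step=0.5):
    """round_outcomes: {target: +1 for a successful attack, -1 for a capture}."""
    adjusted = dict(subjective_utility)
    for target, outcome in round_outcomes.items():
        adjusted[target] += step * outcome
    return adjusted

su = {"cell_1": 1.2, "cell_2": 0.4, "cell_3": -0.3}
outcomes = {"cell_1": -1, "cell_3": +1}   # caught in cell_1, succeeded in cell_3
print(adjust_subjective_utility(su, outcomes))
```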
>>: A true Bayesian subjective utility would actually have models of
these strange biases. You’d actually put distinctions in about things like recency and superstitiousness.
This is what I mean by Bayesian, I could have said…
>> Milind Tambe: That’s okay.
>>: You want a rich Bayesian model of what people are doing. It might be a very, very deep model that
gets them to things like this idea of fallibility and bias by experience. I’m surprised that that’s doing
worse for defender utility.
>> Milind Tambe: I should say that our terminology of Bayesian SUQR is basically, you know, Bayesian
as in Bayesian games. We have adversary types. We are saying there are, you know,
different kinds of adversary types.
>>: Yeah.
>> Milind Tambe: So what this is saying is that there’s an additional effect
over and above that simple characterization of types as simply being characterized by different
weights for different populations, or something like that. We need an additional phenomenon
that we have not captured before.
>>: Sounds like you’re not clear about how you get the Bayesian. Where do the weights come from?
>> Milind Tambe: We’re learning from data.
>>: Okay, so you actually get, by Bayesian you mean you don’t marginalize all data to the single
adversary type. You actually allow given experiences. But it’s still being done coherently versus this
idea of we’ll call it for now given that you’re [indiscernible] Bayesian, human cognitive SUQR.
>> Milind Tambe: Yes.
>>: Like…
>>: I know it basically referred to it but that is the Bayesian approach. Anyway, but you see you meant
probabilistically sound with separate groups based on keyhole views into where things have happened,
yeah.
>> Milind Tambe: Right and go on.
>>: It is like the overfitting problem in machine learning. Given that you are trying to represent some
complex phenomenon with simpler representations, you overfit to that and then the performance gets
worse after some point. I wonder if it’s that kind of…
>> Milind Tambe: I guess one thing we are saying is that there’s an
additional phenomenon that we had not taken into account, and which, as far as we are aware, the
behavioral game theory literature does not take into account either.
I mean obviously people are getting the inference. I mean that we know. But this is saying this is
embedded on top of the Quantal Response Model. That, once we do that then it improves
performance. In a sense we are saying yes there is some truth to the Quantal Response Model,
everything is good.
>>: What is [indiscernible]?
>> Milind Tambe: That’s the one where we take the SUQR and add this bumping up and down of
subjective utilities based on the proportion of people who succeeded or…
>>: [inaudible]. That’s like off of Bayesian but interesting.
>> Milind Tambe: I mean we do…
>>: No, what I meant by [indiscernible] I meant like that to understand the actual foundations of the
cognitive model based on repeat exposures to evidence would be a really rich, beautiful piece of work.
In the SHARP model…
>> Milind Tambe: Which I mean it’s an acronym. We love acronyms so it’s not, there’s a more spelled
out name to it.
>>: Oh, okay.
>> Milind Tambe: But in any event the point is that it is taking the Subjective Utility model and saying,
well we’re going to increase and decrease it based on the experiences people had.
>>: Right, so it would be really nice to featurize the space from the point of view of a lot of
observational features [indiscernible] and various events, how they might be scaled, notions of counts
of experiences, what would be cognitively salient, and build a Bayes net and see what would happen
beyond weights going up and down. But it sounds heuristic right now.
>> Milind Tambe: Yes and, and…
>>: I just think it’s this really great opportunity to make those black lines even higher there.
>> Milind Tambe: Some of it, not all of it; some of it is in Thanh’s work that we are doing with Lucas now.
>>: Yeah.
>> Milind Tambe: But there’s a lot more. I completely agree there’s a lot more. There’s a big…
>>: What’s exciting about it is that, you know, I’m not sure what the literature is like these days on
modern versions of probability assessment by humans, like the almanac games we can play. But, you
know, for the kind of repeated game situation built in a static frame like poaching in these
grid cells, it would be really nice to see the models you can build with just lots of data on how
people form beliefs by doing [indiscernible]. Maybe we actually have data we can play with from the
MTurk games and see if we can build a Bayes net that complements the work we’ve done.
>> Milind Tambe: Yeah, so let me push ahead because I do…
>>: [indiscernible] also is that you learn about people and how they overreact, and what their biases
are to events in the world, the most memorable ones. A lot of it would be kind of a Bayesian approach to the
[indiscernible] literature. Say, if we have these distinctions, now build a model that predicts probabilities.
>> Milind Tambe: Now I want to tell you something more [indiscernible]. With this model, SHARP, in our
experiments we found people’s probability perception is the exact opposite of how it’s stated in prospect
theory. We are not sure what’s going on completely. What the data shows is that in their model
basically people overweight low probabilities and underweight high probabilities. In our experiments it’s
flipped. People underweight low probabilities, so it’s like: point two probability, I don’t care, I’m going in;
point six, oh, that’s a lot, I’m not going in.
>>: But it also could be a couple of things, right, for the sake of discussion and a deep
dive. It could be how you’re representing risk aversion. You might have people with a strict timeline.
If they’re taking risks they’re not being risk neutral, so that could explain some of those
data effects.
Prospect theory also gets into not just the [indiscernible] handle, but in my mind more about
overweighting versus underweighting utility, where loss, losing utility, is much more painful.
>> Milind Tambe: Yes, yes, yes.
>>: But with a scaler scale [indiscernible] scale then gains in and out back, so it’s…
>>: I also wonder, you know, this game is more complex than most games that psychologists
work with in their experiments.
>> Milind Tambe: Yeah, yeah.
>>: I know it’s really rich that’s what I was saying, yeah.
>>: Yeah, it is big. It has a lot of properties. It’s sequential, so it’s coming back. I wonder if people
are following some strategies just to make the problem simpler in their minds. Kind of parcel up the space,
like: that probability is small, I’m not going to worry about it; that probability is high, I’m going to
focus there; to kind of make the problem manageable for them.
>> Milind Tambe: Yeah.
>>: If the problem was simpler.
>>: Right.
>>: Maybe this effect on probabilities would be significantly different.
>> Milind Tambe: I think this is very interesting…
>>: Yeah, yeah.
>> Milind Tambe: Because, you know, the context of how experiments were done in the past versus this
may differ. I mean, this is what our psychology friends tell us. That might be the source of the explanation.
>>: You could imagine following up on AJ’s, you know, sort of ideas and having enough data to say here’s a rich new cognitive model about how people deal with bounded memory and state space pruning. By building a structural model you can actually generate human-like probabilities given exposure. This would be like, we should do this for a year, let’s just do this.
>>: Yeah.
[laughter]
>> Milind Tambe: That would be great, great.
>>: [inaudible] game. Make it simpler…
>>: Yeah, yeah.
>> Milind Tambe: Yeah.
>>: [inaudible] by [indiscernible] of probability at this way…
>>: This is like…
>>: [indiscernible] it again and again.
>> Milind Tambe: We allow for that.
>>: This is like the kind of thing that like big Josh Tenenbaum [indiscernible]…
[laughter]
>> Milind Tambe: Yeah.
>>: This kind of model, human [indiscernible], bounded rationality of probability assessment.
>> Milind Tambe: Yeah.
>>: Also, some playing with how we present information: you can say point two, but you can also do something like have music play while they play the game [indiscernible]. Then someone just got caught here.
[laughter]
It’s not really a high probability but they just…
>>: It’s [indiscernible].
>>: Yeah.
>>: What a good idea, yeah.
>> Milind Tambe: Spatial reasoning I’ll try to walk through this somewhat more quickly.
>>: We’re excited.
>> Milind Tambe: Oh, good, good, good, good, good. That’s the main point.
[laughter]
That’s the main point. This is Andrew Lemieux, our collaborator. He tried out this PAWS system, which generates patrols, in Uganda. He came back and said you need to pay more attention to geography, because we were asking people to walk over water; water bodies had emerged and we didn’t know about this.
Then we started working with Panthera. This is, you know, part of Malaysia. It’s a wonderful place. They started patrolling, so this is finding a tiger footprint, a poacher’s camp along the patrols. But they also came back and said the paths are difficult to follow. They are in Malaysia. We are in LA. We’re having Skype calls. They’re saying that the shortest distance between two points is not a straight line.
We were very confused by this. We went to Malaysia to see what was so special in Malaysia. We did an
eight hour patrol ourselves in the forest. This is at the beginning of the patrol. We’re so happy. This is
me, Fei, and my former…
>>: [indiscernible]
>> Milind Tambe: Yeah, former postdoc Bo, eight hour patrol ready to go…
>>: [indiscernible]
[laughter]
>> Milind Tambe: You can see at the end of the eight hours, completely gone. This is, you know, so this is us in the, I guess it’s not playing. But anyway, this is just supposed to show you patrolling, us actually patrolling. But in the process we figured out what was going wrong. Fei figured out something very important.
If you look at it in terms of a 3D view of what was going on, it’ll turn now, I guess it’s a little bit dark here. But you can see here there are mountains and, well you can’t quite see, but it’s a very mountainous region. If you ask people to just walk in a straight line, you’re asking them to go up and down and they lose a lot of energy.
What Fei figured out is that inside the forest there’s actually a hidden street map that people are following. They’re following ridgelines. They’re following water, you know, riverbeds. This hidden street map is what the patrollers are actually following. We need to grab that and work with that.
If you look at it in terms of how this works, we can now build a hierarchical game model. There is a grid structure, but the defender is actually not playing on a grid. They’re playing along these patrol paths. This street map is made of ridgelines and streams, and from this street map we generate actual patrol routes.
Defenders are choosing paths among the patrol routes; the adversaries, however, are only looking at individual grid cells and attacking on the grid. The adversary’s action is local. They’re not trying to figure out which patrol route you’re taking with what probability. They’re saying, in my area of interest, how frequently do you come; based on that I’m going to choose my action. This is the kind of hierarchical model that we’ve built and used in PAWS.
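As a rough sketch of that structure (hypothetical data and names throughout, not the actual PAWS code): the street map is a graph of ridgeline and stream cells, candidate patrol routes are paths in that graph, the defender mixes over routes, and the local adversary only ever sees the resulting per-cell coverage frequency.

```python
import networkx as nx
import numpy as np

# Hypothetical "street map": nodes are grid cells lying on ridgelines or streams,
# edges connect cells a patroller can realistically walk between.
street_map = nx.Graph()
street_map.add_edges_from([((0, 0), (0, 1)), ((0, 1), (1, 1)),
                           ((1, 1), (1, 2)), ((0, 1), (0, 2))])

base = (0, 0)  # patrols start from here (assumption)

# Candidate patrol routes = simple paths in the street map from the base (illustrative cutoff).
routes = [p for target in street_map.nodes if target != base
          for p in nx.all_simple_paths(street_map, base, target, cutoff=3)]

# A defender mixed strategy is a distribution over routes; the adversary only sees
# how often each grid cell ends up covered, not which route produced the coverage.
x = np.full(len(routes), 1.0 / len(routes))          # uniform mixture for illustration
coverage = {cell: 0.0 for cell in street_map.nodes}
for prob, route in zip(x, routes):
    for cell in set(route):
        coverage[cell] += prob

print(coverage)   # per-cell coverage frequency the local adversary responds to
```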
>>: Can we back it up a little bit here again.
>> Milind Tambe: Sure.
>>: I’m surprised that the attacker/poacher actions wouldn’t also be guided by efficiencies of access the
same way that the defender/patroller is.
>> Milind Tambe: They should be. This is a computational convenience right now for us. Meaning that,
because if we also modeled our attacker as having this it’s just…
>>: The same route space basically.
>> Milind Tambe: Yeah, then it just, it’s, now we are talking about path intersection and so forth.
>>: Yeah.
>> Milind Tambe: I mean it’s already complex; I was just going to tell you that this next part, which is the handling of uncertainty, shows you some of the complexities. This is part of Fei’s thesis. Even with this model that we have…
>>: But even if you use the convenience of the grid cell to the attacker. You could take the probability
mass of what they’re going to do and spread it over the actual paths, right.
>> Milind Tambe: That’s a possibility.
>>: That way you would do the inference. Then it’s going to at least concentrate the probability on the
paths.
>> Milind Tambe: That’s good, that’s good, that’s good. I have fifteen minutes. Let me walk through
some of this and…
>>: You can go longer.
[laughter]
>> Milind Tambe: I would love…
>>: This is our lunch, right, so we can keep on going.
>> Milind Tambe: I would love for this talk to continue.
[laughter]
>>: We can’t, we have to see everything.
[laughter]
That you had intended at least.
>> Milind Tambe: One of the challenges is that we don’t have exact information on where the animals are. There’s uncertainty, which means that the payoffs have to be modeled with uncertainty. One way to model this is as interval uncertainty. There may be other ways too.
But here, for example, we’re saying that the adversary’s payoff is between minus four and minus two. We don’t know exactly what it is. Now, to solve these problems, one approach is behavioral minimax regret. Minimax regret is a technique in AI that some of you may know very well.
I’m just going to walk you through how this would work and notionally explain how the algorithm works. The regret of a particular strategy X can be computed in this fashion. We take a sample of the game, so we’re just looking at one sample of the payoffs. X is the defender strategy. This Q represents the adversary’s response, because they’re playing with quantal response.
Now we get a defender expected utility of minus point five. We’ve played a strategy, the adversary has responded, and we got a defender expected utility. The regret of this strategy X is: if we had played the optimal strategy X star, we would have actually gotten a defender expected utility of point two. Our regret here is point seven, because we got minus point five and we should have gotten point two. That’s the regret of this strategy X for this payoff instance.
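That calculation can be sketched numerically as follows; the payoff numbers, the quantal-response parameter, and the helper names are all invented for illustration, and only the structure (a quantal-responding adversary, regret equal to the best achievable utility minus the achieved utility on one payoff sample) follows the slide.

```python
import numpy as np

def quantal_response(att_utils, lam=1.0):
    """Attacker picks each target with probability proportional to exp(lam * utility)."""
    w = np.exp(lam * np.asarray(att_utils))
    return w / w.sum()

def defender_eu(x, att_reward, att_penalty, def_reward, def_penalty, lam=1.0):
    """Expected defender utility when the attacker quantal-responds to coverage x."""
    att_utils = x * att_penalty + (1 - x) * att_reward   # attacker's expected utility per target
    q = quantal_response(att_utils, lam)                 # attack probabilities
    def_utils = x * def_reward + (1 - x) * def_penalty   # defender's utility per target
    return float(q @ def_utils)

# One sampled payoff instance (all numbers illustrative, two targets).
att_reward, att_penalty = np.array([4.0, 2.0]), np.array([-3.0, -2.0])
def_reward, def_penalty = np.array([1.0, 1.0]), np.array([-4.0, -2.0])

x = np.array([0.3, 0.7])                                 # strategy being evaluated
u_x = defender_eu(x, att_reward, att_penalty, def_reward, def_penalty)

# Brute-force the best defender strategy for this payoff sample to get u(x*).
grid = [np.array([c, 1.0 - c]) for c in np.linspace(0, 1, 101)]
u_star = max(defender_eu(g, att_reward, att_penalty, def_reward, def_penalty) for g in grid)

print("regret of x on this sample:", u_star - u_x)
```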
Now if we were to play the strategy X on another game instance, which is shown here, the same strategy but on another game instance from the same interval uncertainty that we had, it turns out that the regret has gone up to one point two instead of point seven, because here the optimal strategy, the defender’s strategies, would have given us the utility minus point nine.
Going over all possible instances, over the entire interval, we find that one instance that gives us the maximum regret. That is the maximum regret of playing the strategy X over all possible game instances.
We want to find the one strategy X which minimizes our maximum regret over all possible game instances. That’s what we are after. We’re trying to solve this optimization problem.
We’re trying to minimize our regret, where X and R are decision variables. This is the utility loss to the defender for playing a strategy X, over all possible payoffs that could be sampled. The only problem is that there is an infinite number of constraints, because the payoff space is continuous.
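Written out, the optimization just described is roughly the following; the notation here is my own (x the defender mixed strategy, Θ the set of payoff instances consistent with the intervals, q_θ the quantal response under payoff instance θ, U_d the defender expected utility), so treat it as a notional restatement rather than the exact formulation from the slides.

```latex
\min_{x \in X,\; r} \; r
\quad \text{s.t.} \quad
r \;\ge\; \max_{x' \in X} U_d\bigl(x', q_\theta(x');\, \theta\bigr) \;-\; U_d\bigl(x, q_\theta(x);\, \theta\bigr)
\qquad \forall\, \theta \in \Theta .
```

There is one constraint per payoff instance θ, and Θ is a continuous box of intervals, which is exactly where the infinitely many constraints come from.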
How do we solve these problems? One way to do this is by iterative constraint generation. The idea here is that we start with a master problem which solves our minimax regret for some sample payoffs. It’s going to find the one strategy X which minimizes maximum regret for, say, three sample payoffs.
It finds this optimal strategy. Using that, we now find an upper bound, which is the payoff that gives us the highest regret given this optimal strategy. We feed that back into the master. Now we’ve kind of created a lower bound and an upper bound. We can keep iterating until the lower bound and upper bound collapse, and we get the optimum, the one that minimizes our maximum regret.
In pictures: we start with a payoff sample and get an optimal strategy. Now it turns out that this other payoff maximizes our regret given that strategy. Now we have these two instances. We generate the optimal strategy; this is our lower bound. But now, given this as our strategy, we can generate a payoff that maximizes our regret again, so this is an upper bound. Now we generate a strategy which generates a lower bound to that. We keep iterating in this fashion until we converge and there can be no more improvements. That’s our minimax regret.
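Here is a sketch of that loop, under the simplifying assumption that both the defender’s candidate strategies and the payoff instances are finite sets (the real problem has a continuous payoff space and would use an optimization, not enumeration, for the slave step). It reuses defender_eu from the earlier sketch; everything else is a placeholder.

```python
def max_regret(x, payoffs, strategies):
    """Slave: worst-case regret of strategy x over a set of payoff instances."""
    best = None
    for theta in payoffs:                                  # theta = one sampled payoff instance
        u_x = defender_eu(x, *theta)
        u_star = max(defender_eu(s, *theta) for s in strategies)
        r = u_star - u_x
        if best is None or r > best[0]:
            best = (r, theta)
    return best                                            # (regret, regret-maximizing payoff)

def master(samples, strategies):
    """Master: strategy from the candidate set minimizing max regret over sampled payoffs."""
    return min(strategies, key=lambda s: max_regret(s, samples, strategies)[0])

def constraint_generation(all_payoffs, strategies, tol=1e-3):
    samples = [all_payoffs[0]]                             # start from one sampled payoff instance
    while True:
        x = master(samples, strategies)                    # optimal for the current samples
        lb = max_regret(x, samples, strategies)[0]         # lower bound on true minimax regret
        ub, worst = max_regret(x, all_payoffs, strategies) # upper bound: x's true max regret
        if ub - lb <= tol:
            return x, ub
        samples.append(worst)                              # feed the violating payoff back to the master
```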
Now, unfortunately, the assumption here is that we have only two rows, two defender strategies. This works as long as the number of defender strategies is small. But the number of defender strategies becomes very, very large, because there are many, many possible routes in the forest.
We need to embed this iterative process in a bigger iterative process. This ARROW algorithm, which is the behavioral minimax regret, is now embedded in another process where we’re incrementally adding new routes. You generate a few routes. You run this minimax regret. Then you add more routes. You run minimax regret again. You keep doing it until the whole process converges.
It’s sort of this double iteration of convergence that leads to the optimum. Initially it’ll start with just two routes and do minimax regret. Then add one more route, do minimax regret, until there can be no more improvement in the process. That’s…
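And a sketch of the outer loop around it: start from a couple of routes, run the inner minimax-regret procedure on the restricted game, add a route, and repeat until the regret stops improving. The route representation (a per-target coverage vector per route) and the mixture grid are again illustrative stand-ins, not how PAWS generates routes; it reuses constraint_generation and defender_eu from the sketches above.

```python
import numpy as np
from itertools import product

def mixtures_over(route_coverages, steps=11):
    """Coarse grid of mixed strategies over the current routes, expressed as coverage vectors."""
    weights = [w for w in product(np.linspace(0, 1, steps), repeat=len(route_coverages))
               if abs(sum(w) - 1.0) < 1e-9]
    return [sum(wi * np.asarray(c) for wi, c in zip(w, route_coverages)) for w in weights]

def incremental_route_generation(candidate_routes, all_payoffs, tol=1e-3):
    routes = candidate_routes[:2]                              # start with just two routes
    x, regret = constraint_generation(all_payoffs, mixtures_over(routes))
    for new_route in candidate_routes[2:]:
        routes.append(new_route)                               # enlarge the defender's route set
        x_new, regret_new = constraint_generation(all_payoffs, mixtures_over(routes))
        if regret - regret_new <= tol:                         # adding routes no longer helps: stop
            break
        x, regret = x_new, regret_new
    return x, regret

# Illustrative routes as per-target coverage vectors, and two payoff instances from the intervals.
candidate_routes = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
all_payoffs = [(np.array([4.0, 2.0]), np.array([-3.0, -2.0]),
                np.array([1.0, 1.0]), np.array([-4.0, -2.0])),
               (np.array([3.0, 2.5]), np.array([-4.0, -2.0]),
                np.array([1.0, 1.0]), np.array([-3.0, -2.5]))]
print(incremental_route_generation(candidate_routes, all_payoffs))
```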
>>: So you’re assuming that if someone’s on a route in a grid cell they could inspect and also scare off a poacher anywhere in that square?
>> Milind Tambe: We are assuming that they have to cover some fraction of the cell, but other than that, yes. That’s what we are assuming, exactly. This is PAWS. This is what we have built as a system and handed over in Malaysia. So PAWS has hierarchical modeling, cutting planes, and minimax regret.
There’s more: Fei wrote a paper showing we can do more with planning and learning to kind of deceive attackers and so forth, but this is not in the implemented system. The system is in regular use in the sense that patrols are going out by Panthera. These are some pictures from those patrols.
This is showing you one patrol that was planned and how closely it was followed, showing that they are able to follow these patrol routes. This is basic information about the patrols: average trip length, number of patrollers, number of hours per day that they travel. These are some pictures from the patrols; they found a poacher’s camp there.
Here we are trying to compare how well these patrols are performing. It’s not a perfect comparison, but you can look at how many animal signs per kilometer we’re finding and how many human signs per kilometer. Blue and red are our patrols. Green is the previous patrols. But these are not exactly for the same purpose; they were trying to do a tiger survey. Maybe the animal signs per kilometer found is a more reasonable comparison; the other may not be.
>>: Which animal signs?
>> Milind Tambe: This is a footprint of a tiger, or a footprint of something.
>>: This is where animals have to be walking around.
>> Milind Tambe: Are walking around…
>>: Human signs are assumed…
>> Milind Tambe: Poachers making marks, poachers’ camps, you know, the things poachers leave behind; they would choose the [indiscernible].
>>: Why are animal signs important? Because poachers will go there eventually?
>> Milind Tambe: When…
>> Lucas Joppa: But also part of it, part of the input that they’re optimizing over is a distribution model, an animal distribution model that’s built off of animal observations…
>>: Well certainly the animal distribution model, and possibly the ease of access, should be in the adversary’s probability model, you’d think. You’d think that they learn about animal availability also.
>> Milind Tambe: That is in the SUQR model; it has a weight given to animal density per grid cell.
>>: Okay…
>> Milind Tambe: That’s where that’s covered.
>>: [inaudible] ready, yeah, good.
>> Lucas Joppa: One of the things that’s coming in here, that will eventually be coming in here, are things like GPS tracks of animals in the park, right. The thing that’s interesting is that these street maps are kind of a proxy for animal movement patterns as well, but not a perfect enough proxy. There’s an animal street map and a human street map, and where those two things intersect is where poachers set snares, basically.
One of the things I’ve been thinking about is saying, okay, well we have uncertainty in the species distribution model of the animals. I mean there’s uncertainty in the mental model that poachers and rangers have about the true underlying distribution of animals, but that model itself we have uncertainty about.
We can ask the question: we have a potential maximum reward for catching poachers now, based off an animal distribution map with its current uncertainty. If we take a little bit off of that expected return on poaching to collect better data on animal signs, that will improve that observation map. Then five steps down the line maybe we’ll be doing better than we would five steps down if we just went for currently maximized anti-poaching efforts, right. It’s this kind of short-term versus long-term, data versus poaching observation trade-off that’s interesting.
>> Milind Tambe: Last couple of slides. What we are hoping for is collaboration with Panthera to go to many different sites that they’re active in; this is under active negotiation and discussion with them. We will see how that goes.
We have also started working with WWF. This is a workshop in Indonesia that Thanh went to. This is Nicole, one of the members of our group. It’s a workshop with the rangers and so forth, with WWF and so on in Indonesia. It was a PAWS workshop; Nicole got a sign, and everybody got a PAWS certificate for attending the workshop.
As part of the workshop Thanh and Nicole went on elephant rides. I mean, this is because they are trying to protect elephants. But this is another collaboration that’s budding, that’s moving forward. We hope also to collaborate with WWF to deploy in Indonesia in the near future.
That’s it. I guess to summarize we’re moving from Infrastructure security games which is what we were
doing in the past to Green security. This opens up a whole new set of research challenges because from
single shot games we are going to repeated games. There’s larger amounts of repeated attacks which
gives us data, which allows us to build models of human behavior.
This is exciting because we’re from behavioral game theory sum models. But here now we have real
data from poaching from the field. All of the insights from conservational biology and so forth can now
be brought in. There’s a very interesting intersection of conservation, criminology, and computation
that’s possible here. Modeling attacker bounded rationality the fact they have limited surveillance.
They have limited planning. All kinds of things like that are important. Planning you know forest paths
and all kinds of, we already discussed many, many other research challenges here.
From our perspective, even though there are global efforts in security games, it’s just the beginning. There are many applications, from testing in MOOCs, to audit games, to software testing, to modeling and combating diseases that people are working on; the field is seeing security games being applied in all these different directions. We are working with the TSA on a newer project for improving airport screening.
I want to end by thanking our sponsors: our CREATE Homeland Security Center, the Army Research Office, TSA, and the US Coast Guard. Thank you.
[applause]
Go ahead.
>>: On the surveillance side, especially in the forest, is it feasible to have drones?
>> Milind Tambe: It is. The thing is though I mean Lucas may have far more information about drones.
What I heard in Uganda, in Malaysia, and so forth is that there just is not the infrastructure to support
drones. But Lucas may have far more information.
>> Lucas Joppa: Yeah, it depends. I mean a lot of it is, they’re flying a lot of these missions in savannah systems, where drones make sense because you can actually see stuff. In a lot of places like Malaysia you’re looking down at a green canopy; there’s not much to see.
There are a lot of problems with drones, some of which hopefully we might solve over the next couple of years. But there are a lot of problems with drones and deploying them in real-world situations with people who aren’t drone experts and engineers.
There’s a guy, I mean you guys probably know him, Tom Snitch at Maryland, who’s been in the news a lot for working with the Lindbergh Foundation on deploying these kinds of optimal predicted patrols for drone routing. But again, the way they work, there’s basically a van that follows the drone. You know, I mean it’s pretty kludgy still, anyway.
>>: What are your thoughts on this: in any of these areas we looked at, you could always invest more in X, Y, or Z. More people, increasing the number of patrols, increasing the number of boats, adding a new sensor like a drone. You could take a really nice opportunity to turn a crank given the best method you have available: stick in a new sensor, turn the crank with the probability distribution of what you’d see if you used that new sensor, then compute the value of adding that new sensor, or policy, or both…
>>: Combine it.
>>: Where patrols change. We could call this maybe the value of enhanced policy. Here’s the dollar value, here’s what the crank with my methods gives as a base level, and here’s what that addition would tell us. I think that would be a really nice way to provide guidance to these agencies.
>> Milind Tambe: Right, and we’re doing some of that with AVG and so forth in Madagascar with our colleagues at Michigan State: given a fixed number of dollars, what’s the best you can buy in terms of equipment.
>>: Well that’s one thing. Then you say, but with N more dollars…
>> Milind Tambe: Incrementally how much more can you get? That’s…
>>: Yeah, and the cool thing is you have the base apparatus for doing each baseline. It could be value of information or value of sensing on top of the current policy.
>> Milind Tambe: That’s a very good point.
>>: You think it’d be fun to try that out?
>> Milind Tambe: Yes, I agree, I agree with that. Okay, go ahead.
>>: [inaudible] do the approaches generalize [indiscernible]…
>> Milind Tambe: We worked with a conservation drone company in the Netherlands, I do not remember the name of it, just for a while, to generate routes for them, for preference elicitation: you know, where would you fly in order to reduce uncertainty in the forest the most, things like that.
We were really hoping for that collaboration to move forward, but it’s tough. I mean it just seems like the infrastructure in many of these countries is just not there. We heard in some places from people that they are not ready, you know, for such things. Maybe it’s a couple of years; I don’t know how many more years. But right now it’s not there yet.
>> Lucas Joppa: There are cultural sensitivities, obviously. Then there’s topography and regulation; Nepal, for example, just banned the use of drones full stop.
>>: For privacy issues?
>> Lucas Joppa: Yeah, I mean they take the welfare of their animals and nature very seriously. They
don’t want anything flying around.
>>: Just in time to prevent their use for helping with the earthquake over there, right.
>> Lucas Joppa: Yeah, yeah, exactly, yeah so yeah that’s…
>>: Yeah, there’s the other side of the problem that the poachers can use the drones as well.
>> Lucas Joppa: Yeah, so I mean that’s, yeah.
>>: Yeah.
>> Lucas Joppa: I mean, the drone issue in conservation is something that we’ve been interested in for a long time. It’s kind of blowing up at the moment, but the gap between reality and hype is still significant. I think this kind of work is coming this way, drones are maturing over here, and probably in a couple of years there will be sustained flights in, you know, certain national parks in South Africa for instance, daily running optimized patrols.
Those sorts of things, I think we’re within one to two years of. But then large deployments are going to be…
>> Milind Tambe: There’s, I mean there’s no doubt so many interesting questions that those things
open up, lots of things to think about there too.
>> Lucas Joppa: Like Vulcan, Paul Allen’s organization, they have this Great Elephant Census where they’re basically trying to count all the elephants in Africa by flying missions in manned aircraft over the continent, right. Of course they have sampling strategies for trying to infer.
But obviously they’re desperately interested in how to do this without actually putting people in the air. Everything from image recognition of elephants on the ground, to long flight times, and things like that, but also being able to build out those routes based off of, well, encounter probabilities basically, if they want to be able to fly. There are people driving this forward; it’s just a bit brutal.
>> Milind Tambe: Okay, it’s past twelve. Thank you.
[applause]
Thank you for coming. Thank you for a very interesting set of questions, thank you.