>> Eric Horvitz: Okay, we’ll get started. It’s great having Tuomas here from CMU. Tuomas is a Professor of Computer Science. He has affiliate appointments in the Machine Learning Department at CMU, as well as the Program in Algorithms, Combinatorics, and Optimization. He’s also part of the newish CMU/UPitt Joint Ph.D. Program in Computational Biology.
I’ve known Tuomas for a number of years. He’s been working in areas of interest to both of us, areas that are dear to my heart and soul: bounded rationality, decision theory, game theory, some of the core challenges we face in AI where we have limited resources and information. He developed, per the topic today, some leading algorithms for several general classes of games. These algorithms, which we’ll hear more about today, won one of the most recent world championships in computer Heads-Up No-Limit Texas Hold’em.
They also have lots of interesting implications for other kinds of problem solving in machine intelligence more broadly. Tuomas did his Ph.D. work at UMass Amherst working with our mutual colleague Victor Lesser, before coming out to CMU. He probably remembers, on his way to becoming a professor, stopping off at Microsoft Research, where Jack Breese and I worked very hard to recruit him, including a boat trip in a rainstorm.
[laughter]
Hope it’s a different jacket than you’re wearing today.
>> Tuomas Sandholm: I was thinking it’s exactly the same-looking jacket. But I swear it’s a different jacket.
[laughter]
>> Eric Horvitz: We tried to recruit Tuomas to come to MSR as a full-time researcher back then. I think he actually considered us seriously, rainstorm aside. But literally he was dressed like this in a pouring rainstorm on a boat in the middle of Lake Washington. He was very good spirited about it.
In two thousand three Tuomas won the Computers and Thought Award given out by the IJCAI folks.
He’s a Fellow of ACM, AAAI, and INFORMS. He’s published widely; I’ve often reflected with some colleagues that Tuomas is like a publishing giant in terms of the breadth and depth of his publications in several different areas of work. Beyond that he’s managed to start, run over a number of years, and sell a very interesting startup. He has helped on the boards of other kinds of startups and venture-oriented entrepreneurial projects.
He has algorithms running in the real world including I think a very interesting set of procedures that are
now being used in kidney exchanges. He came here about three or four years ago and just gave a talk
on that work which I think has such incredible societal benefit, so with that come on up Tuomas.
>> Tuomas Sandholm: Okay, thanks a lot for the very kind introduction. Thanks a lot all for coming
here. I was asked to talk about kind of the state of the art in solving poker. I wanted to generalize it a
little bit out into games beyond poker and kind of the technology capability that the community has
built over the last ten years which is a huge leap.
Ten years ago there was this kind of perception that people who work on game theory can only solve toy problems. Well, that’s no longer true. I couldn’t help myself; I put some new content in here as well.
It’s not just an overview talk. If you have heard all of my talks in the past, there’s about thirty percent new material here. This is joint work with a number of Ph.D. students and collaborators. I’ll show those names on the slides as we go through each piece.
The game formalism that we’ll be using is going to be the absolutely standard extensive-form incomplete-information game, which is this. There’s a game tree much like in chess, except that there are these white nodes which represent nature’s moves. You can model nature as a random player that moves stochastically not strategically, and has some probabilities for its actions.
Also there are information sets which represent incomplete information. For example, when the red player is in this state or this state he doesn’t know which one of those two states he’s in. He knows that he’s in one or the other but doesn’t know which. Similarly the blue player doesn’t know which one of these nodes he’s in when it’s his turn to move. Any questions on that formalism?
Okay, what’s the strategy here? Well it’s a mapping from information sets to actions. It would say for
the red player which action will you take here, which action will you take in this information set? Which
actions will you take in this information set and so forth?
The strategy can be probabilistic. It might say okay, here you go forty percent here, sixty percent here.
Clearly the strategy should depend on where you believe you are in the information set. For example the blue guy might want to do a different thing here and here. But he can’t because he doesn’t know where he is. But he can actually derive beliefs as to what’s the probability that he’s here versus here. Once you have the strategies you can just use Bayes’ rule to derive those probabilities.
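As a worked form of that Bayes’ rule step (this is the standard textbook formulation, not something from the slides): if $\pi^{\sigma}(h)$ denotes the probability that node $h$ is reached under the strategy profile $\sigma$, including nature’s moves, then the belief the player should assign to being at node $h$ within information set $I$ is

$$
P(h \mid I) \;=\; \frac{\pi^{\sigma}(h)}{\sum_{h' \in I} \pi^{\sigma}(h')}.
$$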
Okay, come on in there’s plenty of seats, don’t be shy.
[laughter]
>> Eric Horvitz: [indiscernible] don’t go together.
[laughter]
>> Tuomas Sandholm: Okay, so I’m going to be talking about domain-independent techniques. The application area is going to be poker but there’s nothing poker-specific here. Techniques for complete-information games like chess or checkers don’t apply here. You have to have completely different techniques.
Challenges here include unknown state, and uncertainty about what others and nature will do. In my opinion most importantly, interpreting the signals that the other players send: when the other players take actions it signals to me about their private information. Conversely, whenever I take actions it signals to the other guy about my private information. How do I take those into account?
Well, the beauty of Nash equilibrium, which is a solution concept from John Nash from 1950, is that it gives a solid definition of how those signals prescriptively should be taken into account. But of course the Nash equilibrium solution concept is just a definition. To operationalize it you have to have algorithms for computing Nash equilibria or approximations thereof.
Alright, so a Nash equilibrium is a strategy for each player and beliefs for each player, such that no agent benefits from using a different strategy. No agent can unilaterally benefit from deviating. This is the solution concept that we’ll be using throughout the talk. In one place I’m going to talk about a refinement as well and I’ll make that clear.
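In symbols, a standard way to write this, and the epsilon-Nash version that the abstraction bounds later refer to (textbook formulation, not copied from the slides): a strategy profile $\sigma^{*} = (\sigma_1^{*}, \dots, \sigma_n^{*})$ is a Nash equilibrium if for every player $i$ and every alternative strategy $\sigma_i$,

$$
u_i(\sigma_i^{*}, \sigma_{-i}^{*}) \;\ge\; u_i(\sigma_i, \sigma_{-i}^{*}),
$$

and it is an $\epsilon$-Nash equilibrium if no unilateral deviation gains more than $\epsilon$:

$$
u_i(\sigma_i, \sigma_{-i}^{*}) - u_i(\sigma_i^{*}, \sigma_{-i}^{*}) \;\le\; \epsilon \quad \text{for all } i \text{ and } \sigma_i.
$$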
Okay, most real-world games are actually like this. By this I mean incomplete-information games that fit into the general extensive-form game model. Negotiation in various forms, military settings, cyber security; we’ve done some work on wireless jamming games, and I have some ideas for operating system security applications as well. Medical treatment planning, this is something that I’m super excited about. We have a big new grant proposal pending on that, where you think about the world as there being a treater that treats a patient and a disease, a two-player zero-sum game.
For some nodes of the game you have some probabilistic information that’s good. For other nodes you
don’t so you have this kind of mixed stochastic versus game theory situation. Then you can use game
solving techniques and opponent exploitation techniques for making sequential treatment plans.
Biological opponents furthermore have the limitation that they can’t look ahead. They have like one
step look ahead and you can actually exploit that as well.
We’re actually, with a biology collaborator, proposing to use this for steering the adaptation of the T-cell population in an individual, so as to drive their own T-cells to battle cancer or to battle autoimmune hepatitis. If we get the funding we’ll actually be doing wet lab work, in vitro and then in vivo in mice.
Alright, enough of that, oh by the way, thank you Microsoft we got a little seed grant to get that work
started a couple of years ago.
Okay, poker, well that’s a benchmark. I don’t really view it as an application although you can view it as such. But I view it as a benchmark. It’s been a challenge problem in the AI community since about 1992.
There’s hidden information which is the other player’s cards, uncertainty about future events.
Deceptive strategies are needed in a good player. You can’t always play the good hands aggressively
and the bad hands weakly because the opponent will pick up on that, and do very bad things to you.
The game trees are very large. I’ll talk about how large.
Some of the techniques I’m going to be talking about here apply to general-sum multiplayer games. Some apply to just two-player zero-sum games. If something is just for two-player zero-sum I’ll mention that on the slides. I should mention that two-player poker is a zero-sum game. It’s actually a very popular form of poker. It’s not like we just looked at that because that’s the only thing we can handle, although that’s also true. But it’s high stakes. Some of it is on TV but most of it is actually online, so real nosebleed-level high-stakes play, two player, mano a mano. A lot of professional gamblers prefer that form.
It’s very interesting and they are super good, unbelievably good at that. How quickly they can adapt.
How sophisticated their strategies are. Some of these people are without college degrees yet they are
so smart. It’s just unbelievable. It’s kind of humbling.
Alright, so here’s our approach, which is basically now used by all the leading poker research groups. This was foreshadowed of course by other things. The idea of automated abstraction was already there. But then there was custom equilibrium-finding with manual abstraction and so forth.
But here’s the idea: you have the original game. In the case of two-player, or in other words heads-up, no-limit Texas Hold’em, the number of information sets in the game is about ten to the one hundred and sixty-one. That’s bigger than the number of atoms in the universe; even if for every atom in the universe you had a sub-universe and counted those atoms, it would still be less than this number. It’s a big game, you can’t write it down.
We run some automated abstraction algorithm that takes as input some compact representation of the game. Think about a rule sheet printed on one piece of paper, if you will. It produces an abstracted game that is hopefully similar to the original game; we’ll talk about that. Then you run a custom equilibrium-finding algorithm to find a Nash equilibrium of that abstract game. Then you use a reverse model, or reverse mapping, to map it back to an approximate Nash equilibrium of the original game. Any questions on this framework?
>>: Can you explain an abstraction at [indiscernible]?
>> Tuomas Sandholm: No, I can’t explain it a little bit more because I have like thirty slides on it. I’m
going to say a lot more about it, yeah. I’ll talk about it in detail now.
>> Tuomas Sandholm: Okay, so lossless abstraction, which almost sounds like an oxymoron. It’s more like finding isomorphisms, if you will. The observation there is that we can make games smaller by filtering the information a player receives. For example, instead of observing a specific signal exactly, a player instead observes a filtered set of signals.
For example, if the player is receiving an Ace of Hearts we’ll just say okay, he received an Ace. Sometimes some of the other detail doesn’t matter. This form of abstraction is just merging information sets. If there are two information sets that the player can distinguish between, we’re going to say that in the abstract game he can’t. If we do it losslessly we only remove redundant or irrelevant information from the model. Does that answer the question? Yeah?
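A minimal illustration of that information-filtering idea, in code: canonicalizing the suits of a preflop holding so that strategically identical hands land in the same bucket. This is only an illustrative sketch of merging information sets, not the GameShrink algorithm itself, which discovers such merges automatically.

```python
from itertools import permutations

SUITS = "cdhs"

def canonical_preflop(cards):
    """Map a two-card preflop holding to a canonical form so that
    suit-isomorphic holdings (e.g. KhKs and KhKc) land in the same bucket.
    Illustration only; GameShrink finds such lossless merges automatically."""
    best = None
    # Try every relabeling of the suits and keep the lexicographically smallest result.
    for perm in permutations(SUITS):
        relabel = dict(zip(SUITS, perm))
        key = tuple(sorted((rank, relabel[suit]) for rank, suit in cards))
        if best is None or key < best:
            best = key
    return best

# KhKs and KhKc are distinct signals but the same canonical bucket preflop.
print(canonical_preflop([("K", "h"), ("K", "s")]) == canonical_preflop([("K", "h"), ("K", "c")]))  # True
# AhKh (suited) stays distinct from AhKs (offsuit), since suitedness still matters.
print(canonical_preflop([("A", "h"), ("K", "h")]) == canonical_preflop([("A", "h"), ("K", "s")]))  # False
```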
>> Eric Horvitz: Tuomas, has it been helpful, and do we know what kind of abstractions human experts use?
>> Tuomas Sandholm: Okay, so this has been very helpful. I’ll talk about that on the next slide. We
don’t really know what the abstractions humans use.
>>: This suggests you’re not going for a flush, right, so...
>> Tuomas Sandholm: I’m not, this is an example. I’m not saying that you can ever actually pull this abstraction off in a lossless way. This is an example of what this might be. We are not programming in the abstractions. We created this algorithm called GameShrink which automatically identifies all abstractions like this and makes them. We don’t have to know in advance what kind of abstraction you could actually do and still be lossless. The algorithm will smell them out itself.
>>: But in many information sets this abstraction should work, shouldn’t it?
>> Tuomas Sandholm: Yeah, well I’m not going to argue this particular one. But I’ll argue another one
which is easy to understand. Let’s say that you get your first two cards in Texas Hold’em. Whether it’s
an Ace, let’s say you get two Kings, King of Hearts, King of Spades versus King of Hearts, King of Clubs,
same thing at that point. The suits only become important later when the flush consideration is relevant.
That’s captured, so if you bundle something it doesn’t mean that you’re going to bundle it forever.
>>: By that do you mean that you’re encoding some like [indiscernible] information that the game probably like…
>> Tuomas Sandholm: No, you’re not encoding it.
>>: Yeah.
>> Tuomas Sandholm: You’re algorithmically identifying it.
>>: You’re putting it like in your set of candidate abstractions that can be considered as a
homomorphism.
>> Tuomas Sandholm: I’m with you except for the putting it in part. It’s actually considering all of them
itself.
>>: Okay.
>> Tuomas Sandholm: You know we’re not putting in candidates. It’s figuring them all out.
>> Eric Horvitz: He’s got twenty-nine more slides coming so maybe.
>> Tuomas Sandholm: Well, twenty-nine of this. That’s the first topic of many. Okay, so with this we solved a game called Rhode Island Hold’em poker, which was an AI challenge problem introduced by Shi and Littman to be kind of bigger than the Kuhn poker that John Nash solved in the fifties, which was by the way the only game in his thesis.
The connection between poker and game theory actually goes way back. Rhode Island Hold’em is smaller than Texas Hold’em, because it was viewed that Texas Hold’em is so big that we can’t ever get real traction on it. That actually turned out to be wrong.
But anyway, three billion nodes in the game tree, without abstraction the sequence form linear program
which can be used for equilibrium finding has ninety million rows and columns. It’s unsolvable.
GameShrink which is our abstraction algorithm ran in one second and collapsed the game down to one
percent of its original size. Ninety-nine percent of the game was actually irrelevant.
After that the LP had one point two million rows and columns. At that point, with the best LP solver, which was the CPLEX barrier method, it took eight days on a small supercomputer in my lab. We could just crank out the exact answer. We solved for the exact Nash equilibrium with this. That was the largest incomplete-information game solved, by over four orders of magnitude at the time. That really showed
the power of abstraction.
>> Eric Horvitz: Tuomas, when you say solve, you’re solving an abstracted version of the game?
>> Tuomas Sandholm: Lossless, so these Nash…
>> Eric Horvitz: [Indiscernible]…
>> Tuomas Sandholm: Of the original game.
>> Eric Horvitz: You proved losslessness?
>> Tuomas Sandholm: You prove, the GameShrink algorithm proves losslessness. It finds the exact Nash
equilibrium. Not close but all the way to machine precision.
>>: You said the word “the”; of course in general we know there can be multiple…
>> Tuomas Sandholm: Yeah, I meant to say [indiscernible].
[laughter]
That’s right, there could be multiple Nash equilibria, yeah, and this found one.
>>: Of course some would be better than others to have a way to…
>> Tuomas Sandholm: Oh, no, no they couldn’t. In two-player zero-sum games…
>>: [indiscernible]…
>> Tuomas Sandholm: You have a nice swapping property: if you play any one of the Nash equilibrium strategies and the opponent plays any one of his, they all pair up equally well against each other. You’ll get the same values.
>>: Zero-sum?
>> Tuomas Sandholm: Yeah. Okay, so sometimes, even though lossless abstraction gets about ninety-nine percent of the game in poker to go away, if you’re left with one percent of ten to the one sixty-one, that’s still a big number.
[laughter]
In Texas Hold’em you have to do lossy abstraction to get anywhere. Now I’m going to talk about, first, the leading practical approaches and the history of that, then some new theory about how to tie abstraction to the Nash equilibrium quality in the original game.
I’m not going to, in the interest of time, talk about everything. But from two thousand seven to two thousand thirteen the main ideas for practical abstraction were the following. One was integer programming to decide how many children each node at each level gets to have. You don’t want to have a uniform abstraction. Your computing resources for Nash equilibrium finding tell you how large the abstraction can be. You want to use that size smartly where it matters.
Secondly, potential-aware, I’ll talk about that. Then imperfect recall; imperfect recall is the idea that you may want to forget something that you knew in the past, in order to make your abstraction smaller and buy yourself more space to refine the present more finely. That used to be kind of a weird notion in game theory. There were just these obscure things, but now it’s actually a very practical tool in solving games.
Now I’m going to jump to the currently best abstraction algorithm, which combines the ideas of potential-aware and imperfect-recall abstraction, and earth mover’s distance. It obviates the need for integer programming.
Alright, and I’m going to do kind of a progression through how that literature went. First, Expected Hand Strength is the goodness of your hand assuming that nature and the opponent roll out cards uniformly from then on. Early poker abstractions used that as the measure for clustering hands, clustering the information that the players get. But that doesn’t really work that well. Here’s an example where you have Expected Hand Strength being basically equal but the hands are very different in the middle of the game.
Let’s say we started with a pair of fours, or we started with ten-Jack suited. Very different hands; both have Expected Hand Strength point fifty-seven. Your algorithm might bucket those together, but they should be played completely differently.
Why are they different? Well, this hand is pretty much usually mediocre, pretty good, not great. If you get a triple, yeah that’s great. But it has very little mass here and it has a lot of mass here. This one on the other hand has quite a bit of mass here and no mass there. Basically this hand is going to end up really good or terrible.
Okay, so those should be played differently. That can be captured in distribution-aware abstraction. You look at the full distribution of hand strength, basically the histogram on the previous slide. Then you use, for example, earth mover’s distance as a distance metric between the histograms. It turns out earth mover’s distance is much better than L1 or L2.
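A minimal sketch of that distribution-aware clustering idea. This is not the actual abstraction code; the bucket count, the random histograms, and the greedy medoid loop are just illustrative assumptions, but it shows hand-strength histograms being grouped under earth mover’s distance.

```python
import numpy as np
from scipy.stats import wasserstein_distance  # 1-D earth mover's distance

def emd(h1, h2, support):
    """Earth mover's distance between two hand-strength histograms on a common support."""
    return wasserstein_distance(support, support, u_weights=h1, v_weights=h2)

def cluster_hands(histograms, k, iters=20, seed=0):
    """Toy k-medoid-style clustering of hand-strength histograms under EMD."""
    rng = np.random.default_rng(seed)
    support = np.linspace(0.0, 1.0, histograms.shape[1])  # hand-strength bins
    medoids = histograms[rng.choice(len(histograms), size=k, replace=False)]
    for _ in range(iters):
        # Assign each hand to its nearest medoid under EMD.
        labels = np.array([
            np.argmin([emd(h, m, support) for m in medoids]) for h in histograms
        ])
        # Recompute each medoid as the member minimizing total EMD to its cluster.
        for c in range(k):
            members = histograms[labels == c]
            if len(members) == 0:
                continue
            costs = [sum(emd(m, o, support) for o in members) for m in members]
            medoids[c] = members[int(np.argmin(costs))]
    return labels

# Example: 100 random hand-strength histograms over 10 bins, grouped into 5 buckets.
hists = np.random.default_rng(1).dirichlet(np.ones(10), size=100)
print(cluster_hands(hists, k=5)[:10])
```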
Come on in. Don’t be shy. Do you want to open the door I guess they’re being shy. Yeah?
>>: Sorry, [indiscernible] going back to the last slide.
>> Tuomas Sandholm: Yeah.
>>: Yes, thank you. You said that this expected hand strength, I interpret it as some sort of value function, is that right? Or…
>> Tuomas Sandholm: Yeah, I guess you could call it a value function, yeah.
>>: Value…
>> Tuomas Sandholm: But it’s not really a value function because it’s not based on the strategicness or
what the other player, what the player and others are going to do from then on. It’s assuming a uniform
roll out.
>>: Right, yes, but what if they, imagine that your opponents are going to roll out in a strategic and [indiscernible] real way.
>> Tuomas Sandholm: Right.
>>: Would the clustering criterion according to that expected hand strength be perfect or at least [indiscernible]?
>> Tuomas Sandholm: No, I’m saying that even if they don’t do that it’s going to be imperfect. Now, I’m
doing this kind of transformation into better and better things.
Okay, so before the twenty fourteen paper, the prior best approach used this distribution-aware abstraction with imperfect recall. But that doesn’t really take the potential into account.
Potential is something you read about in the poker literature. But nobody’s really been able to define it.
We actually define it operationally using kind of a recursive formulation.
Let me show you an example of this first. Let’s say we have two situations. This is the game with private signal x one. This is the game with private signal x two. Here, with probability one, you get no information in the next step and the resolution comes in the second step. Here I’m showing the resolution in the first step and you get no more information in the second step.
They have the same distribution over the last round but very different potential. What we do then is, instead of thinking of histograms over the last round, we think about histograms over transitions to the next round, where the base of the histogram is the states of the next round, which we have already abstracted by moving bottom-up in the game tree to do the abstractions. Did this make sense? You’re not so sure.
>> Eric Horvitz: Say it one more time.
>> Tuomas Sandholm: Okay, the algorithm starts from the bottom of the game, from the leaves. There’s no potential left there. We can use your favorite metric like expected hand strength to cluster those. Now you have clusters there. At the previous level you can now look at the probability distribution of transitioning to those next-level clusters. Of course the earth mover’s distance now is in a multi-dimensional space. We cluster based on that and that’s how we move up the game. Again, we use imperfect recall throughout.
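Written out, that bottom-up construction looks roughly like this (generic notation, not the exact notation from the paper): at the last round, cluster states with any ground metric; at an earlier round $r$, represent each state $x$ by its transition histogram over the already-computed round-$(r+1)$ clusters $C_1, \dots, C_m$, and cluster round $r$ by earth mover’s distance between those histograms,

$$
d_r(x, y) \;=\; \mathrm{EMD}\bigl(\,(P(x \to C_1), \dots, P(x \to C_m)),\; (P(y \to C_1), \dots, P(y \to C_m))\,\bigr),
$$

where the ground distance between clusters $C_i$ and $C_j$ is the round-$(r+1)$ distance already computed between them.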
>> Eric Horvitz: In the earlier versions it was just the final state only?
>> Tuomas Sandholm: The earlier ones, yeah, these ones kind of look at the transition from a current state to the final-state histogram. Now we’re looking at the transition level by level, step by step in the game. But that also means that you don’t have this type of x-axis. You have this multi-dimensional thing where you have to do the earth mover’s distance.
We do that and we developed a custom algorithm to approximate it, because the normal earth mover algorithms don’t scale, and this led to the best abstractions evaluated experimentally.
>> Eric Horvitz: One more thing, stop me for a second, you’d be losing information with that abstraction step, that particular one?
>> Tuomas Sandholm: Yeah, all of the abstractions I’m now talking about are lossy. You have to lose information.
>> Eric Horvitz: Right.
>> Tuomas Sandholm: Otherwise the game ends up being too big.
>> Eric Horvitz: As you go on you talk about lossy abstraction.
>> Tuomas Sandholm: Yeah, the first one…
>> Eric Horvitz: I can imagine you’ll come back to say there’s no loss later.
>> Tuomas Sandholm: I’m going to. So the thread on this part of the talk is going to be: we started with lossless, which gets rid of ninety-nine percent. That’s not enough; you have to abstract more. Here’s the practical stuff. Then we’re going to talk about theory that actually gives you bounds, so that even if you abstract in this lossy way you’re still bounded with respect to the original game.
>> Eric Horvitz: Yeah.
>> Tuomas Sandholm: Yeah and that’s a thread here.
>> Eric Horvitz: Then…
>> Tuomas Sandholm: I’m not there yet.
>> Eric Horvitz: Michael Bowling’s group also did similar kinds of bounding.
>> Tuomas Sandholm: Yeah, actually I’m talking about not only our work but throughout this. Like here
this was Michael Bowling’s group. This was Alberta group before Michael Bowling joined, Michael
Bowling’s work, Michael Bowling’s works, yeah.
>> Eric Horvitz: Yeah.
>> Tuomas Sandholm: Yeah, so I’m trying to do a little bit of an overview of the field, not just a presentation of our work. Okay, so Tartanian seven was our program at one of the most recent annual computer poker competitions in the no-limit heads-up category. It uses an abstraction similar to what I was just talking about, except computed in a distributed way. We can run on clusters now; we were running on a cache-coherent non-uniform memory access supercomputer for that competition.
The abstraction just uses your favorite abstraction algorithm, like the one from the previous slide, at the top of the game. You can define the top any way you want. We defined it to be the flop round in poker, actually sort of the pre-flop. Then the rest of the game is split into equal-sized disjoint pieces based on public signals. You can put different computers to work on the different pieces. It’s important that you do it based on the public signals because that guarantees that the information sets don’t cut across computers.
Alright, and how do you do that? Well, you have a base abstraction generated with the algorithm on the previous slide and you can look at transitions into that to have a well-defined algorithm. Then for equilibrium finding we used External Sampling Monte Carlo Counterfactual Regret Minimization, or a variant of that, which is from the University of Alberta. It starts at the top of the tree and then, when it gets to the rest of the game, it samples a flop from each public cluster. Then you continue the iteration on a separate blade for each public cluster. Then you return the values.
There are some details as to how you actually make it work in this distributed context, which we could talk about, but you have to handle them, otherwise it won’t converge. Then you can do multiple samples into one of those continuations if you’re worried about the communication overhead; it becomes minor.
Okay, now to Eric’s bound.
>>: Just an idea is how much time does this…
>> Tuomas Sandholm: Oh, as much as you can give it. As many cores as you can give it. As much time
as you can give it.
>>: But what are you…
>> Tuomas Sandholm: We were running, this spring we were running. Oh, sorry this is the previous
spring, about a thousand cores for about three months. We’d like to take it to an order or two more
cores in the future. Yeah?
>>: Maybe a technical detail, but are you maintaining your belief state at each node in the tree? Or do you have some particle-like, sample-based representation perhaps?
>> Tuomas Sandholm: The beliefs are maintained explicitly. For each information set in the abstraction
for each action CFR maintains the probability and one number for the regret. Yeah?
>>: I’m a little confused. You run this thing for three months. Then you have a representation that you
can play any game with no further computation.
>> Tuomas Sandholm: No, no.
>>: This is some particular instance of one game.
>> Tuomas Sandholm: Good question, so this is, if we go back to that framework slide. It starts by
getting the description of a particular game. Then the abstraction algorithm is run on that description,
spits out the abstraction for that game. Then the equilibrium is computed for that game. The algorithm
is general but the run is specific to that game, to that input.
>>: Especially in poker.
>> Tuomas Sandholm: Not even all poker, Heads-up no-limit Texas Hold’em poker. Yeah?
>>: But, so you have a solution for that particular poker game that you can now take to a tournament
and run in real time.
>> Tuomas Sandholm: Yes, right.
>>: How big is that representation?
>> Tuomas Sandholm: I don’t remember how big it was here. For the next program that we developed
this spring which is Claudico it was one point four terabytes.
>>: That’s a big table or…
>> Tuomas Sandholm: Big table, one point four terabyte table of action probabilities.
>> Eric Horvitz: Does it ever make sense as the game evolves to try to get ahead of it with real time
computation?
>> Tuomas Sandholm: Yes, I was going to get to that. Yeah, you’re so smart you know you should tape
your mouth because you’re jumping me ahead.
[laughter]
Okay, good, so lossy game abstraction with bounds. This is actually tricky due to a known fact in games which is called abstraction pathology or abstraction non-monotonicity, again from the University of Alberta here. Basically, in single-agent settings, be it in planning or MDPs or what have you, if you make an abstraction that’s finer grained your solution quality can’t go down.
In games that’s not true. You can come up with a finer-grained abstraction, even a strict refinement of your original abstraction, and your solution quality can actually go down. For a while that kind of threw the whole framework into question. If that’s true, why are we coming up with these finer and finer abstractions? Maybe we’re actually taking steps backward.
But then we started looking at Lossy game abstraction with bounds first for stochastic games. Then for
general extensive-form games and I’ll show you a few results on that in the next couple of slides. The
abstraction is performed in the game tree not in what’s called the game of ordered signals and the
signal representation. These are now general purpose unlike the GameShrink algorithm which was for
that game of ordered signals class of games.
It’s for both action and state abstraction. So far we’ve talked about state abstraction, where you bucket the information that you get. But you can also do action abstraction, which is really important in games with large or continuous action spaces. You choose some of the actions as the action prototypes and pretend that the rest of the actions don’t exist.
We’ll talk about that in detail. Here’s a detail: more general abstraction operations are enabled here by allowing not only many-to-one mappings of the states, but also the other way around, one-to-many mappings, and you can get some leverage from that.
Okay, so here’s the main theorem. This is joint work with my student Christian Kroer. For any Nash
equilibrium in the abstract game any undivided lifted strategy is an epsilon-Nash equilibrium in the
original game. Where epsilon is defined like this. What is an undivided lifted strategy? Well, lifted
strategy just is something that works in the original game in the obvious way you’d think about.
Undivided is a constraint on how we reverse map. It’s not a restriction on games. It’s just a constraint
on how we reverse map the answer back to the original game.
Now what is this? This is kind of where the action is. It’s looking at measurable things in the difference
between the abstraction and the real game and then tying it into the epsilon in the epsilon-Nash
equilibrium which is a game theoretic notion.
Okay, so what is this two times epsilon R? This is a utility error and it’s defined recursively. At the leaves it’s just the error between the model, the abstraction, and the actual game. At interior nodes, if it’s a player node it’s a max over what the players can do; if it’s a nature node it’s just the probability-weighted sum over what nature can do. You maximize over agents and you maximize over information sets and that gives you this value.
Then there’s a maximum over players of the sum, over the heights where it’s player i’s turn to move, of epsilon j zero times this W. That is the nature distribution error at that height: how wrong is your nature model at worst compared to the real game? You add to that the sum, over heights where it’s nature’s turn to move, of two times this epsilon times this W, where epsilon j zero is the nature distribution error at height j, and W is the maximum utility of a player in the abstract game.
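Assembled from that spoken description (so the notation here is a reconstruction, not copied from the slide), the bound has roughly the form

$$
\epsilon \;=\; 2\,\epsilon^{R} \;+\; \max_{i}\Bigl[\; \sum_{j \in H_i} \epsilon_j^{0}\, W \;+\; \sum_{j \in H_0} 2\,\epsilon_j^{0}\, W \Bigr],
$$

where $\epsilon^{R}$ is the recursively defined utility error, $H_i$ is the set of heights at which it is player $i$’s turn to move, $H_0$ the heights at which nature moves, $\epsilon_j^{0}$ the nature distribution error at height $j$, and $W$ the maximum utility of a player in the abstract game.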
That’s, I didn’t expect that we’d walk through the proof or anything. But this gives you a very concrete
thing where you can measure everything on the right hand side by just looking at the abstraction and
looking at the game. Then it ties it to saying okay if you abstract it this way you’d solve the abstraction.
Your error in the original game in Nash equilibrium is at most epsilon. Yeah?
>>: Yeah, so you mentioned the pathologies, that when we get finer and finer abstractions the actual solution can actually get worse, right?
>> Tuomas Sandholm: Yeah, for…
>>: But I guess the hope is that with these transition and reward error bounds you would hope that the
upper bound would behave in a better way. But actually we know that they don’t, right, because when
you…
>> Tuomas Sandholm: Good question, not quite, not quite. But somebody’s being eagle eyed here,
that’s good. How is it possible that I’m giving you a theorem? That says that as I’m getting closer in the
abstraction, finer grain in the abstraction to the original game. My epsilon is going down and I just told
you its well known that there are abstraction pathologies where it actually goes up. What gives?
>>: [indiscernible] bound proof.
>> Tuomas Sandholm: Exactly, this is a bound and it leaves some room for non-monotonicity as we’re approaching the real game with the abstraction. Alright, so the utility error side of the bound is tight. The nature distribution error bound is tight up to a factor of six.
Hardness results: well, determining whether two subtrees are what’s called extensive-form game-tree isomorphic is actually graph isomorphism complete. This is something that you need to check even for lossless abstraction. It is not obvious, because the graphs have special structure, so you might think that this might be easier than graph isomorphism, but it’s not.
Beyond that, computing the minimum-size abstraction given a bound is NP-complete, and the other way around as well: minimizing the bound given a maximum size of the abstraction is NP-complete.
Now you might ask wait a second, as a pre-processor you’re solving some NP-complete problem. Then
you’re doing the equilibrium finding which in two player zero-sum is actually polynomial time.
How does that make sense? But these are of course worst case results. In practice it’s not only helpful
it’s necessary to do the abstraction. You don’t have to do it optimally.
Okay, this is showing an impossibility of level-by-level abstraction that shows that you have to actually
consider the whole tree. Or at least you can’t focus your attention level-by-level as all of the prior
abstraction algorithms have done if you want to have bounds. Even if you want to have a lossless
abstraction you can’t do that. But in the interest of time let me not walk through that example.
Okay, extension to imperfect recall. That theorem was for perfect recall. We have extensions to imperfect recall as well. There’s a paper from Alberta by Lanctot et al. Here we get exponentially stronger bounds than that. We get bounds for a broader class of games, where abstraction can actually introduce nature error as well, which is something that they precluded from consideration.
Furthermore, our theorem is for any, sorry, any equilibrium-finding algorithm. Theirs is just for the counterfactual regret minimization algorithm.
Okay, so, now as I thought about this abstraction theorem. It actually brought up an interesting other
connection which is if you think about modeling. Models are never the real world. Modeling is a form
of abstraction. Typically in game theory when we take the model and solve it we actually take the
answer as if that’s somehow applicable to the real world.
But we had no connection that says that, how does that answer actually relate to the real game? Now
these are the first results that actually tie that gap as well. If you can measure the gap between your
model and the reality, or at least bound it.
Okay, action abstraction typically has been done manually. It’s still often done manually; there’s been some automation. Again, this is from a different group, from the University of Alberta. For stochastic games, the theory that I just talked about applies.
Then with my Ph.D. student Noam Brown we developed the first algorithm for parameter optimization
for one player and two player zero-sum games. Where you can actually have one player control some
parameter like for example the bet size. Then as you change the bet size you don’t have to restart the
whole equilibrium finding. You can warm start with some really clever stuff that Noam did here. That
allows you to move bet sizes. You can actually move multiple bet sizes at once. As long as the payouts
are convex in the bet size vector this is actually guaranteed to converge.
But I’m going to show you something cooler later today. I’m going to skip that part. Alright, so that
gives us, that’s all I was going to say about automated abstraction for now.
Next is custom equilibrium-finding. How do you solve the abstract game? Now I’m going to really look
at two-player zero-sum only. Okay, this is kind of giving a perspective on the field. On the X axis we
have year, and on the y axis we have, on a log scale, the number of nodes in the game tree that have actually been solved or near-optimally solved.
You can see that when the annual computer poker competition was announced around here it really spurred a lot of interest in this. People started building on each other’s work. We saw this super-exponential jump in the technology capability. I’ll talk about the best algorithms in detail in a little bit.
Now, especially when you’re in the CFR family, you want to measure complexity not in the number of nodes, but in the number of information sets. Here’s a newer graph that I made, again with years on the x axis and the number of information sets on a log scale on the y axis. You can see that this exponential growth has continued to this day. You can actually solve games now with about, what is that?
[laughter]
Six, twelve, thirteen, ten to the thirteen, a little bit more than ten to the thirteen nodes in the game
tree. I don’t have the number for this one. But for this one the number of nodes was already five times
ten to the fifteen.
>> Eric Horvitz: That’s interesting. With people I often get questions about advances in AI: how much of our advances are due to the power of machines getting better and memory getting cheaper? I often say, well, it really comes down to innovation in the problem-solving space. It busts out of the kind of constraints that we see from the power of the computation alone. You can imagine plotting Moore’s Law against this graph; it would level out about here. It can go flat while the AI innovations create…
>> Tuomas Sandholm: That’s right. This is almost all algorithmic innovation, right, or AI innovation. I
like that term.
>> Eric Horvitz: No charge for that.
>> Tuomas Sandholm: Yeah.
>>: But, okay, but these dramatic improvements. You’re exploiting the structure of the problem, right. If I come up with a new game like Texas Hold’em and I’m going to add a…
>> Tuomas Sandholm: I wouldn’t say that. I wouldn’t say that.
>>: …an actual integer on each card, I can say that my search tree explodes dramatically because I have more information. But because it’s independent of the game I can just decide to ignore it.
>> Tuomas Sandholm: This is not the size of the original game. This is the size of the abstraction. Now if you go back to this picture, this was ten to the one sixty, oh, ten to the one sixty-one. Here we’re measuring how much comes into here: what is the size of the abstract game that gets fed into the equilibrium-finding algorithm. That’s what’s on the y axis now.
>>: I see, okay.
>> Eric Horvitz: I wonder if you could actually [indiscernible]; a cool graph would also be the bound, or not so much a bound, on the optimality on this graph here.
>> Tuomas Sandholm: Bound on the optimality in the real game?
>> Eric Horvitz: Right, the function of the size and study…
>> Tuomas Sandholm: Yeah, for limit Texas Hold’em there’s been some of that. I actually know the answer for this one. It’s one milli-big-blind per hand; it’s so, so close to optimal that a human playing for a lifetime at human speed could not tell with statistical significance whether they’re winning or losing, even if they’re playing optimally.
>> Eric Horvitz: What was the measure you used, what’s the word again?
>> Tuomas Sandholm: Milli-big-blind per hand, one thousandth of a big blind. For this one it’s not; this is for no-limit, and no bounds are known for no-limit because you cannot even run the best-response computation. You can’t even check ex post how close to optimal you are in no-limit. It’s a whole different beast.
Okay, best equilibrium-finding algorithms counterfactual regret from Alberta…
>> Eric Horvitz: Sorry, Tuomas can we go back to that last slide?
>> Tuomas Sandholm: Yeah.
>> Eric Horvitz: But if you used the same algorithms and went back down in size of the abstraction, it would be interesting to just understand, as a function of the constraints on the richness of [indiscernible], what the error is. Current best algorithms, but just reduce the constraints, so maybe vary the size and richness of the abstraction [inaudible]?
>> Tuomas Sandholm: Yeah, so how this went is that the practice went way ahead of the theory, as usual, many years before the theory. The theory is relatively new and we haven’t actually tried to tie the theory to any one of these things, the abstraction algorithms that were used before you got to these numbers. That hasn’t been done.
But what’s been done in limit Texas Hold’em is you can actually compute a best response ex post and measure how exploitable you are in the original game. I know the answer for that one.
>> Eric Horvitz: Okay.
>> Tuomas Sandholm: Okay, the best algorithms were counterfactual regret from Alberta and Scalable EGT from my group, completely different algorithms. It’s amazing that they are completely different algorithms and they have selective superiority, which is kind of weird. The first is based on no-regret learning. The second is based on Nesterov’s Excessive Gap Technique.
The most powerful innovations here: well, number one is that each information set has its own separate no-regret learner. If you think about doing no-regret learning in the whole strategy space you’re totally dead in the water; it’s way too big. But here you can actually isolate it to each information set separately, which is a brilliant innovation. Then sampling: you can sample the tree on each iteration so you don’t have to walk through the whole tree.
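A minimal sketch of that per-information-set no-regret learner, the regret-matching update used inside CFR. This is illustrative only; a full CFR implementation also needs the tree walk that computes the counterfactual action values fed in here, and the made-up values in the usage example are just placeholders.

```python
import numpy as np

class InfoSetLearner:
    """One regret-matching learner per information set, as in CFR.
    Keeps one cumulative regret and one cumulative strategy weight per action."""

    def __init__(self, num_actions):
        self.cum_regret = np.zeros(num_actions)
        self.cum_strategy = np.zeros(num_actions)

    def current_strategy(self):
        # Regret matching: play in proportion to positive cumulative regret.
        positive = np.maximum(self.cum_regret, 0.0)
        total = positive.sum()
        if total > 0:
            return positive / total
        return np.full(len(self.cum_regret), 1.0 / len(self.cum_regret))

    def update(self, action_values, reach_weight=1.0):
        """action_values[a] = counterfactual value of taking action a here
        (computed by the tree walk); update regrets and the average strategy."""
        strategy = self.current_strategy()
        node_value = float(np.dot(strategy, action_values))
        self.cum_regret += action_values - node_value
        self.cum_strategy += reach_weight * strategy

    def average_strategy(self):
        # The average strategy over iterations is what converges in CFR.
        total = self.cum_strategy.sum()
        if total > 0:
            return self.cum_strategy / total
        return self.current_strategy()

# Tiny usage example with made-up counterfactual values for a 3-action information set.
learner = InfoSetLearner(3)
for values in ([1.0, 0.0, -1.0], [0.5, 0.2, 0.1], [0.0, 1.0, 0.0]):
    learner.update(np.array(values))
print(learner.average_strategy())
```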
Here the most powerful innovations were, first of all, smoothing functions for the Excessive Gap Technique that satisfy the conditions of that technique for sequential games, which enabled this idea to be used for sequential games in the first place. More aggressive smoothing helped by an order of magnitude, and so did balanced smoothing between the primal and the dual, also about an order of magnitude there. Then you can get memory scalability by taking the memory down to the square root of the original, if the actions don’t depend on chance, which is the case in poker, so fortunately this can take your memory to the square root of the original.
This iteration complexity is one over epsilon and each iteration is slow. Here both of these parallelize.
Here the iteration complexity is much worse one over epsilon squared but each iteration is fast. These
are totally different. You think about this doing billions and billions of iterations to solve, each iteration
running in less than a second. Here each iteration is like a day and maybe you do two hundred
iterations, so totally different in that sense as well.
Selective superiority: one can be faster than the other depending on the game and the abstraction. One thing that’s nice about this one is that you can run it on imperfect-recall abstractions. It’s not guaranteed to converge to an equilibrium, but at least you can run it. For a while this one couldn’t even be run on that, so we kind of abandoned it for a while.
Also with some condition numbers on the matrix you can get log one over epsilon which is the best
possible. That’s the same as interior point techniques. But interior point techniques aren’t scalable for
memory.
Alright, one slide on a new paper here: a new prox function for first-order methods such as the Excessive Gap Technique and Mirror Prox. It gives the first explicit convergence-rate bound for general zero-sum extensive-form games without requiring the condition number, the log one over epsilon. Basically you’re getting this complexity but much faster, and for a much more general setting than our original paper.
>> Eric Horvitz: Tuomas, while the slide is up I was going to ask you, in a few seconds on this, how this kind of semi-parallel pursuit at Alberta was influencing your team in terms of learnings or directions, or in contrast to what the Alberta team was doing over the years?
>> Tuomas Sandholm: Okay, great, over the years. Abstraction we’ve certainly been building on each
other’s work a lot. On the equilibrium finding we went into exact opposite directions. This is coming
from kind of a Machine Learning, no-regret learning tradition. We came from the optimization tradition.
These had very little interplay. Except that when imperfect recall became the abstraction of choice we
had abandoned this because it didn’t do imperfect recall. This one at least although it doesn’t have any
guarantees you can run it. You can press the button and see what happens. That actually ended up
being the best approach for awhile.
Then because of that we actually went and said okay can we improve this? For the last couple of years
we’ve mostly been coming up with better and better things here. Now, we’ve been building on that.
Now the next slide I’m going to show is actually an improvement on this thread. Now we’re actually
pursuing parallel threads in my group, this thread and that thread.
Here there’s a lot of interplay and building on each other’s work with the Alberta group and other groups. It’s not just us and Alberta, although they’re the leading groups. But Eric Jackson from California, Team [indiscernible] from California, Oscar [indiscernible] from Finland. There’s a Czech group that’s very strong, a French group that’s very strong. There’s been a lot of building on each other’s work.
Yeah, okay, so a new prox function, better prox function for these optimization based techniques. That
introduces gradient sampling schemes. In particular it enables the first stochastic first-order approach
with convergence guarantees for extensive-form games. Now you can start to do sampling. We did
some game sampling before but now you can actually do gradient sampling in this optimization
framework as well which was one of the big reasons we moved away into the kind of no-regret space.
It introduces the first first-order method for imperfect-recall abstractions, which was the second reason we moved away from that. Now, I would say that both threads are alive again.
Okay, this is kind of a weird post-processing deal; actually let me skip that. Endgame solving, coming back to your real-time question. So far I talked about the game being solved up front: a huge strategy vector and then just lookup at run time. But you can actually do endgame solving. This has been very powerful in complete-information games like chess. In fact, for solving checkers that was the whole thing. There was a big dynamic program that treated the whole game as the endgame. That allowed Jonathan Schaeffer to solve checkers.
In imperfect-information games endgame solving is totally different due to the information sets.
Benefits: first of all, finer-grained information and action abstraction, because you’re in a specific context and you can afford to do finer-grained abstraction. You can dynamically select the coarseness of your action abstraction based on the size of your endgame. This is actually something that threw the humans off really badly in the man/machine Brains versus AI match that I organized this spring. Humans usually think a lot when there’s a lot of money in the pot in no-limit Texas Hold’em.
This does the opposite. If there’s little money in the pot the endgame’s actually bigger because there’s
more raises that are possible still, so it actually has to think more. The smaller the pot was the more the
computer thought. That really rubbed the humans the wrong way.
[laughter]
Anyway, new information abstraction algorithms take into account the relevant distributions, the players’ type distributions entering the endgame. By the time we get to the endgame we can use Bayes’ rule on what’s been played so far to get those distributions. We can now decide where we need more resolution in the abstraction versus not.
We can compute exact equilibrium strategies rather than approximate ones, because now we have a much smaller game. We can use an LP instead of these iterative methods. We can compute equilibrium refinements and solve the off-tree problem.
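As a minimal sketch of what “use an LP” means, here is the generic linear program for a small zero-sum game, solved with scipy. This is not the actual endgame solver, which works on the sequence form of the endgame; it just shows the exact-solution idea on a matrix game.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(payoff):
    """Solve a two-player zero-sum matrix game for the row player via an LP.
    payoff[i, j] is the row player's payoff for row i against column j.
    Variables are the row mixed strategy x and the game value v; we maximize v
    subject to x achieving at least v against every column."""
    m, n = payoff.shape
    # Objective: minimize -v (i.e. maximize v). Variable vector is [x_1..x_m, v].
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every column j:  v - sum_i payoff[i, j] * x_i <= 0
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Probabilities sum to one.
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# Rock-paper-scissors as a sanity check: the equilibrium is uniform with value 0.
rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
strategy, value = solve_zero_sum(rps)
print(np.round(strategy, 3), round(value, 3))
```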
If we’ve action abstracted before, we have been rounding what the opponent has done back into the abstraction. Our model, our notion of where we are in the game in terms of pot size, might be off. Now we can start again with the real pot size and fix that problem right on the spot.
>> Eric Horvitz: All the solutions are made to fit within the tractability of what’s considered a normal response time?
>> Tuomas Sandholm: Exactly, exactly, that’s exactly right. You get control of the coarseness of the
abstraction to accomplish exactly that. Alright, so…
>> Eric Horvitz: But I guess comfortably or is it the idea that if you had a little bit more time you could
go, get a sense for the actual profile you’re on…
>> Tuomas Sandholm: Yeah, I don’t have a graph on that but you can develop that sense very easily as to what the tradeoff is. One of the tradeoffs we made in the man/machine match is that we said look, we don’t want to take more than twenty seconds on the last betting round, because that’s quicker than what humans on average do and quicker than the pros did on average. But it’s still kind of annoying if you think about it: each pro gave two days of their life waiting for the computer to respond over two weeks.
You know, that twenty seconds was the right number. If we had had, say, two minutes, we could have had a much more refined endgame abstraction and played better regarding card removal, things like that, that the pros actually picked up on. But it’s the same algorithm, just giving you the finer-grained abstraction.
>>: I wanted to get back to the question earlier and ask if it relates to always having, always starting
every hand with the same pot, with the same chip stacks?
>> Tuomas Sandholm: Yes, so if you wanted to, the game with different chip stacks is a different game,
we’d go through the whole loop again.
>>: Right.
>> Tuomas Sandholm: That’s right.
>>: In that that would be an example of an issue that we raised earlier because you would in essence
have to recompute the entire, you’d have to compute many different games…
>> Tuomas Sandholm: Yeah, and people have done that. If you want to play a poker game where you have different starting stacks in different hands, people have done that. We’ve always focused on the annual computer poker competition style where it’s always the same chip stacks that you start out with. That’s how we played with the humans as well. Yeah, let me leave it at that.
>>: Yeah.
>> Tuomas Sandholm: Okay, it’s not perfect though. Think of rock, paper, scissors. If we are in this yellow endgame, where the first guy has moved one third, one third, one third, here we could conclude that, because he is moving randomly one third, one third, one third, we might as well always move rock. The endgame solver could conclude that always playing rock is just fine. Of course that’s a disaster.
It does have its perils. We have some theory that ties the size of the endgame to the rest of the game. But that’s largely an open research question: how do you tie the endgame solving into the game so that it’s not very exploitable? Alberta has also done some work on that and they have some guarantees. But in practice our method seems to be doing better than theirs. It’s pretty much still open how that can be done in at least a semi-safe way while still practically playing well.
Okay, experimentally it helps. We did a test in twenty twelve with no-limit against all of the top competitors. Tartanian five was our bot and adding the endgame solver improved performance against all of the competitors, including itself. Then you can also look at removing weakly dominated strategies first and looking for an equilibrium on the remaining set, which is a refinement of Nash equilibrium, and that helped even more. This you can solve with an LP. That you can solve with two LPs and you’re done.
Okay, here’s another idea that Sam Ganzfried, my student, came up with. The idea was: what if we knew some domain knowledge about the game that we’re solving? Now we’re, by the way, in limit Texas Hold’em, not no-limit. Maybe we’re right, maybe we’re wrong, but we have some gut feeling that there are some regions in the endgame.
For example, as we go from stronger hands to weaker hands, in this region we should bet-fold, here we should check-call, and so forth. This is what the opponent should do. We have some gut feel that that’s how it’s going to be. We’ve seen humans play like that, but we’re making a guess.
But now we can write an integer program that will actually find an equilibrium that matches this qualitative structure, if one exists. The idea is that basically the integer program is trying to place these thresholds in the right places. The leverage that allows us to do the integer program is that at a threshold I have to be indifferent between doing this and doing that. That’s the short of it.
You can actually make multiple guesses and you can test each one of them. If you’re right you’re getting a Nash equilibrium. That really speeds up endgame solving if you want to use this idea. It also sometimes allows us to prove existence of equilibrium in games where it hadn’t been proven or couldn’t be proven before, and to solve games for which no algorithms existed, including multiplayer games.
Of course you have to have a guess. The good thing is you can be wrong about it. If you’re wrong about it the integer program is going to tell you, you’re an idiot.
>>: When you say [indiscernible] equilibrium what is the setting…
>> Tuomas Sandholm: Let me in the interest of time take that offline. It’s kind of a long story. That’s
not poker. There are weird kind of continuous games that don’t fit Nash’s original theorem.
Okay, so we talked about that, custom equilibrium-finding. Now the reverse model. The reverse model is this problem: let’s say that we have a continuous action space or a large action space and we have the red action prototypes. The opponent could play outside of those actions; what do you do? Of course you yourself decide to play onto those, so you never get yourself off track, but the opponent you can’t control.
Let’s say f of x is the probability we map down to A and one minus f is the probability we map up to B. We came up with this axiomatic approach: what would be the desiderata for f? Of course you might want to have more, but these seem to be at least what you want: if you’re at A map to A, if you’re at B map to B.
Monotonicity as you get closer to B the probability of going to B shouldn’t go down. Scale invariance
whether they’re playing for a dollar or a hundred dollars the reverse mapping should be the same.
Small change in x shouldn’t lead to a large change in f. Small change in A or B shouldn’t lead to a large
change in f.
Here’s the pseudo-harmonic mapping that we put together; it actually satisfies these desiderata. It’s derived from the Nash equilibrium of simplified no-limit poker. It’s much less exploitable than prior mappings in simplified domains where we can evaluate that. It performs well in practice in no-limit Texas Hold’em. In particular it significantly outperforms the randomized geometric mapping, which uses the geometric average and randomizes according to that.
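A small sketch of such a mapping, following the published pseudo-harmonic form as I recall it, so treat the exact formula as an assumption: f(x) is the probability of mapping an observed bet x down to the lower prototype A, with B the upper prototype.

```python
def pseudo_harmonic(x, a, b):
    """Probability of mapping an off-abstraction bet x down to the lower
    prototype a (the remaining probability goes to the upper prototype b).
    Form follows the published pseudo-harmonic mapping; a <= x <= b assumed."""
    return ((b - x) * (1.0 + a)) / ((b - a) * (1.0 + x))

# Boundary checks from the desiderata: map A to A and B to B with certainty.
print(pseudo_harmonic(0.5, 0.5, 2.0))            # 1.0  (x == a)
print(pseudo_harmonic(2.0, 0.5, 2.0))            # 0.0  (x == b)
# A bet in between is randomized between the two prototypes.
print(round(pseudo_harmonic(1.0, 0.5, 2.0), 3))  # 0.5
```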
Alright, now comes something that I couldn’t help putting in because I’m so excited about this. This is
not an overview part. This is kind of the bleeding edge of what we’re doing, or one of the bleeding
edges of what we’re doing.
Joint work with my student Noam Brown. The idea is that the action abstraction we talked about could have a large or infinite branching factor. We pick the prototypes, let’s say those three. But the opponent can move outside of those, so we have to map back.
Alright, the problems. Well, it’s a chicken-and-egg thing, really: how you should abstract depends on the equilibrium, because you should abstract together things that are played similarly, but you can’t start your equilibrium finding before you have the abstraction.
And if the abstraction changes, you have to start the equilibrium finding from scratch. That sucks.
Also, the abstraction size must be tuned to the available run time. If I know that, okay, this spring I have a thousand cores for three months, then based on my practical experience I know roughly what the abstraction size should be.
But what if somebody donated us another month of computing, or another nine months of computing? Then we would have wasted our time running on a coarser abstraction than we could have used. And finer abstractions are not always better, as we talked about, and you cannot feasibly calculate exploitability in the full game for large games like no-limit.
The new idea is this: instead of going through the old sequence, we’re going to collapse all of that into one process. We call it Simultaneous Abstraction and Equilibrium Finding.
Okay, so how it works: let’s say that we have the original game like that; again, blue player, red player. We want to add an extra action for the blue player. The idea is that we assume the action was always there but was just never played by the counterfactual regret algorithm. So this is actually tied to the counterfactual regret (CFR) algorithm that we talked about.
Two challenges. First, what happened in that branch on iterations one through T, which we never actually ran on that branch because it wasn’t really there? Second, this may violate the CFR algorithm, so the regret bounds might not apply.
Now we’re going to solve both. The first thing is, we’re going to fill in the iterations. We generate this thing called the auxiliary game, where we put all of the rest, which we had already computed, into a special node, kind of an outside option that the player can take. Or the player can go into this new piece.
Then we compute CFR in just this game, which is much smaller. Furthermore, you don’t have to compute all T iterations of it, because you wouldn’t actually have reached this node on all iterations of CFR. You can just weight it based on the reach; this kind of fills in what happened in those iterations.
Alright, then we copy the strategy back here and voila, we can continue. One fly in the ointment is that in imperfect-information games an action may originate from multiple information sets. But we can solve that by putting an extra chance node in there, which plays according to the same probabilities with which those information sets were reached over the T iterations of CFR.
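A minimal sketch of the "weight it based on the reach" point, with hypothetical names and a deliberately simplified rule (the paper's exact weighting is not reproduced in the talk):

```python
def auxiliary_game_iterations(reach_probs):
    """reach_probs[t] is the recorded probability that play would have
    reached the newly added node on CFR iteration t (t = 1..T). Rather than
    replaying all T iterations in the new branch, run CFR in the small
    auxiliary game for roughly as many iterations as the node would actually
    have been reached. Illustrative simplification only."""
    return max(1, round(sum(reach_probs)))
```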
Okay, an alternative to the auxiliary game is called regret transfer, something Noam and [indiscernible] put together. It doesn’t always work; it works for special cases where the payoffs are a function of some theta, which parameterizes the action. Poker typically has this flavor, where you can say, okay, I’m going to raise the stakes by a factor of three.
Then this new subgame has a structure identical to another subgame. We store the regret as a function of theta. When adding a new action theta-two, we copy over the regret function and replace theta-one with theta-two. This runs in constant time, so we can add the action in constant time instead of the order-T time that the auxiliary game required.
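A minimal sketch of the regret-transfer bookkeeping, under the simplifying assumption that per-iteration utilities are affine in the bet-size parameter theta; both the names and the affine assumption are illustrative rather than taken from the paper:

```python
class ParametricRegret:
    """Accumulated regret kept as a function of theta rather than a number,
    so a brand-new bet size theta2 can be evaluated in O(1) instead of
    replaying the T iterations the auxiliary game would need."""

    def __init__(self):
        self.const = 0.0   # sum over iterations of the theta-independent part
        self.slope = 0.0   # sum over iterations of the coefficient on theta

    def accumulate(self, c, d):
        # One iteration contributed instantaneous regret c + d * theta.
        self.const += c
        self.slope += d

    def regret_at(self, theta):
        # Constant-time evaluation at any theta, including a newly added one.
        return self.const + self.slope * theta
```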
Okay, now, regret discounting applies to both the auxiliary game and regret transfer. As I mentioned, the second problem was that if the new action was never played, we may have violated the CFR execution and the regret bounds don’t hold. We have to fix this.
We can do that by de-weighting the old iterations: zero means we give them no weight, one means we give them full weight. We have a theorem that says how much weight you can give them and still satisfy the CFR regret bound, and it depends only on measurable quantities that you’ve already computed.
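A minimal sketch of the de-weighting step itself; the admissible weight w comes from the theorem just mentioned (not reproduced here), and this helper only applies a given w. The names are hypothetical:

```python
def deweight_old_iterations(cumulative_regret, cumulative_strategy, w):
    """Scale the quantities accumulated over the pre-addition iterations by
    w in [0, 1] (0 = discard them, 1 = keep them at full weight) before
    continuing CFR with the new action included. Dictionary keys stand for
    information-set/action pairs."""
    assert 0.0 <= w <= 1.0
    return ({k: w * v for k, v in cumulative_regret.items()},
            {k: w * v for k, v in cumulative_strategy.items()})
```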
Eric you look unhappy or puzzled.
>> Eric Horvitz: At best I was thinking about a question, which is this; maybe you can answer it, or maybe you can just say it’s a Nash equilibrium so don’t worry about it. If you had two machines playing each other, doing this, with knowledge of each other, are you still in a world where you can assume... I was trying to sort of work out what that would mean at the level of a reflective policy, knowing about this algorithm, on both sides.
>> Tuomas Sandholm: Okay, so that’s, sorry Eric I’ll have to take that offline. I’m going to think about
that.
>> Eric Horvitz: That goes with my expression.
>> Tuomas Sandholm: I’ll have to think about that. That’s a good question, because of the way I’m thinking about this now. The way Noam and I have been thinking about it, it’s an algorithm that you run before any play happens. But another thing that I want to do is this: when I see the pros, be they human or computer, play some actions that are out of my abstraction, I want to throw those into my abstraction and do this somehow online.
>> Eric Horvitz: I guess I don’t know if all your assumptions hold in that situation?
>> Tuomas Sandholm: I think they do because I’m just ignoring what the other guys are actually
thinking.
>> Eric Horvitz: Okay.
>> Tuomas Sandholm: One thing that I’m thinking here is that I’m going to let them play out of the abstraction and not do this if, according to a best response, they’re actually shooting themselves in the foot. We could actually tell that for the human pros we were playing this spring, some of their manipulations were actually hurting them. Just let those go. Then the ones that actually hurt us, we throw back into the abstraction.
>>: One interesting thing to see, if it’s feasible: not just for the human pros, but give them a copy of this where they can express their beliefs about what hand they think the computer holds. Run the algorithm and ask, if I knew what the computer was holding, according to my beliefs, what would the algorithm do in this situation? How would they change their behavior in response?
In other words, they still don’t know what the computer is doing, but they’re able to simulate the computer given their own beliefs about what the computer holds. I put you on Ace-King, you know, unsuited; what would you play in this situation? I can actually run the algorithm and see what you’d play, and I can adapt to how I think you’d play.
>> Tuomas Sandholm: Okay, interesting, I’ll have to think about that. Okay, good. And it’s not always best to go all the way down to the least weighting. Here’s something that worked well in practice, within the [indiscernible] of the theorem. Again, these are things you’ve already computed, so no extra effort there.
Where and when to add actions? Well, we talked about doing this online, based on what the humans do, but we can also do it automatically offline. You’d want to add to the abstraction actions that are exploitative in the original game. One way to do that is to compute a full-game best response and see which actions exploit.
The idea that we experimented with is that we add in actions when the derivative of the average regret with respect to time is more negative with the action than without it. There’s a formula for it; let’s not go into the details. The key here is that we can add any number of actions at once, and one action can be added at multiple information sets at the same time. You don’t have to add them piece by piece.
Also, you can of course start from some manual abstraction and then add on top of that. You can also use stronger conditions to be more conservative about what you add; the theory still goes through. The claim, which is actually fairly obvious, is that eventually this will add every action that has Omega(T) regret growth, which guarantees convergence to an equilibrium in the original unabstracted game. It also avoids the abstraction pathology.
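A rough sketch of the trigger described above, using a finite-difference estimate of the regret derivative in place of the exact formula from the paper; all names are hypothetical:

```python
def regret_slope(avg_regret_history, window=100):
    """Finite-difference estimate of d(average regret)/dt over a recent window."""
    if len(avg_regret_history) <= window:
        return 0.0
    return (avg_regret_history[-1] - avg_regret_history[-1 - window]) / window

def should_add_action(history_with_action, history_without_action, window=100):
    """Add the candidate action when average regret falls faster (a more
    negative slope) with the action included than without it."""
    return (regret_slope(history_with_action, window)
            < regret_slope(history_without_action, window))
```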
Alright, the full-game best response is kind of a subroutine here; actually, let me skip that because it’s kind of a detail. Removing actions: some actions may be added early but turn out to be useless. We can also remove them so they don’t keep dragging our computation down.
In two-player zero-sum games, if an action is not part of a best response to a Nash equilibrium strategy, then its regret goes down to minus infinity, and the action only needs to be explored in CFR for a constant number of iterations in the beginning. This could be a large constant, but still a constant.
Furthermore, some of these iterations can be skipped. The idea here is that with negative regret we can project: how many times would I have to hit that part of the game in CFR to get back to zero regret? Only then would it start to get revisited. So I can project ahead that I can skip that many iterations, and I don’t have to go into that subtree during those iterations.
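A minimal sketch of that projection, assuming regret matching and a known bound on how much regret can grow per visit; this illustrates the idea rather than the paper's exact pruning rule:

```python
import math

def visits_safe_to_skip(cumulative_regret, payoff_range):
    """Under regret matching, an action with negative cumulative regret is
    played with probability zero, and each visit can raise its regret by at
    most the payoff range. So the subtree below it can be skipped for roughly
    ceil(-R / range) visits before the regret could climb back to zero."""
    if cumulative_regret >= 0:
        return 0
    return math.ceil(-cumulative_regret / payoff_range)
```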
Experiments: we tested on continuous Leduc Hold’em. Leduc Hold’em is one of the standard benchmarks in the field, and continuous just means that we have continuous bet sizes. The initial abstraction contains just the minimum and maximum bet sizes; we’re not putting any handcrafted information in. We are testing against fixed abstractions with branching factors two, three, and five, whose actions are placed not just uniformly but smartly, using the pseudo-harmonic mapping.
Here’s what we have for the benchmarks. If you look here at full-game exploitability, as we compute more, the exploitability goes down. The smaller the abstraction, the quicker it converges in the beginning, but then it ends up higher and doesn’t do so well in the long run, as you would expect.
Now we put our approach on here, and you can see that we actually do better than the small abstractions and better than the large abstractions. It really accomplishes what you wanted: it reaches zero, while any fixed abstraction that’s not the full game will cap out and not reach zero. In fact, it will overfit to the abstraction and start going up in the end.
Okay, so let me just skip that. Now we can talk about two more pieces here. One is opponent exploitation, and then the actual state of poker. I can get both of them in within the ninety minutes, or we can skip one or the other in the interest of time.
>> Eric Horvitz: You might want to leave some time for questions since the session’s so long. Why don’t you just pick.
>> Tuomas Sandholm: Okay, I’m going to try to zoom through both then.
[laughter]
More is more.
>> Eric Horvitz: I thought of it as applying to one or the other. But that’s okay, “or” is also mathematically the inclusive or, so I…
>> Tuomas Sandholm: Okay, well, I’ll skip this part then. I’ll skip this part and just tell you what it is. One is a hybrid between playing equilibrium and using opponent exploitation. The machine learning techniques for opponent exploitation have been tried in poker, and they really can’t hold a candle to the game theory stuff.
But the game theory stuff doesn’t fully exploit the opponents. We have a paper on a hybrid [indiscernible] that starts from the game theory stuff, and as we get more and more evidence that the opponent is making mistakes, we adjust our strategy to exploit.
The second paper I was going to cover is what we call safe exploitation. If you start to deviate from equilibrium, from all equilibria, so you’re playing something that’s not part of any equilibrium, you can exploit the opponent more, but you open yourself up to counter-exploitation. The folk wisdom was that that’s kind of an inherent problem you can’t get around.
But now we can actually ask: is that really true? Or can we exploit more than any game-theoretic strategy would and still be completely safe ourselves? The answer, surprisingly, is that you can. Why?
Well, the high-level idea is this, and the first part of it turns out to be wrong: you would like to bankroll your further exploitation with the winnings you’ve made so far. But you can’t quite do that, because if you risk your upside you still have the full downside, and your expected value drops below your actual game value.
What you do instead is tease out the role of luck from the role of the mistakes the opponent may have made that gave you money. You measure, or at least lower-bound, the part of what your opponent gave you that is due to mistakes. That is the amount with which you can bankroll your exploitation going forward and still be fully safe.
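A minimal sketch of the bookkeeping implied by that idea; the names are hypothetical, and computing the per-hand lower bound on opponent-mistake profit is the hard part that this sketch simply takes as given:

```python
class SafeExploitationBudget:
    """Track only the profit provably attributable to opponent mistakes
    (not to luck), and allow off-equilibrium deviations only when their
    worst-case extra loss is covered by that budget."""

    def __init__(self):
        self.budget = 0.0

    def record_hand(self, mistake_profit_lower_bound):
        # A per-hand lower bound on how much the opponent gave up by mistake;
        # it can be negative if the opponent played well on that hand.
        self.budget = max(0.0, self.budget + mistake_profit_lower_bound)

    def can_deviate(self, worst_case_extra_loss):
        return worst_case_extra_loss <= self.budget
```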
That’s it; that tradeoff doesn’t start at zero. So now let’s move to poker, and this is the fun part; oh, I guess it depends on who you are. How many of you play poker? Okay, great. The state of poker: as I mentioned, Rhode Island Hold’em has three point one billion nodes in the game tree, and it has been exactly solved. The key was a lossless abstraction and a standard LP solver; nothing fancy in the equilibrium finding.
Heads-up limit Texas Hold’em: the bots surpassed the pros in two thousand eight. The University of Alberta organized a man-machine match against two pros in two thousand seven and lost it; in two thousand eight they did it again and won. So in two thousand eight, in limit Texas Hold’em, which has ten to the fourteenth information sets in the game tree, bots surpassed humans. Now it’s what the Alberta guys call essentially solved: solved to within a very close bound of optimal, one milli-big-blind per hand.
What was key there was a new variant of CFR. They used a standard lossless abstraction methodology; they actually hardcoded in a pseudo-isomorphism as a preprocessing step, and then they used a new variant of CFR on that.
Heads-up no-limit has ten to the one hundred sixty-first information sets. Much bigger, a whole other can of worms; you currently can’t even measure exploitability, how close to optimal you are. Tartanian7, our bot, won the Annual Computer Poker Competition.
What we did then: over the next few months we made an even better bot called [indiscernible], sorry, called Claudico. Then I organized this man-machine match, where I got four of the top ten pros in the world in heads-up no-limit Texas Hold’em to come and play eighty thousand hands of heads-up no-limit Texas Hold’em at the casino in Pittsburgh. Each one was playing alone against Claudico.
Here are the four pros: Jason Les, Doug Polk, Bjorn Li, and Dong Kim. Here’s the supercomputer we used to compute the Nash equilibrium; it’s the Blacklight supercomputer. And here’s our team: Noam Brown and Sam Ganzfried, who are my Ph.D. students.
Alright…
>> Eric Horvitz: Can’t get having this…
>> Tuomas Sandholm: What?
>> Eric Horvitz: I like how you displaced it and something’s sitting there, right?
>> Tuomas Sandholm: Say again?
>> Eric Horvitz: It’s like a display sitting there.
>> Tuomas Sandholm: Oh, I know, I know. I thought it was funny that they actually made a custom table edge and all of these things. Then they made like a chair; it had a chair, like there’s something there, but there’s nothing there.
[laughter]
I thought the casino did a fantastic job with the setting; it was just awesome. We were actually playing duplicate poker to reduce the role of variance. Two of them were playing in public with the reversed cards, and the other pair of them were playing in a private room with armed guards upstairs, so there was no cheating.
We tried to really reduce the role of luck here to try to get statistical significance, but we failed: eighty thousand hands, although that’s right at the upper limit of what the humans could do in terms of their time, was still not enough. The humans won more chips than the computer, but it was so close that we couldn’t get ninety-five percent statistical significance to say who’s better. But oh, well.
>>: How do they rate the pros?
>> Tuomas Sandholm: Yeah, these are four of the top ten pros in the world. Doug Polk is considered number one; he came in number two against the computer. Bjorn Li actually came in as number one and won twice as much as Doug against the computer. Dong Kim, oftentimes considered number two or three in the world, barely beat the computer. Jason Les lost to the computer. These guys carried the day.
>> Eric Horvitz: It really is fascinating, looking at these methods, to think about, you know, what on earth humans are doing.
>> Tuomas Sandholm: Oh, my goodness.
[laughter]
Oh my goodness, and these guys don’t have the game theory vocabulary. I mean, one has a math and economics bachelor’s from the University of Chicago, one has a computer science bachelor’s, and one has no college degree, I don’t think. He was drawing graphs for me about the endgame where the y-axis is a defense probability and the x-axis has this other quantity, and he had these curves. It was spot on.
Like, if I had to teach that stuff, I could take this guy’s graphs, and the guy doesn’t have a college degree. It’s so amazing. When we…
>> Eric Horvitz: But even with the idea that the obvious noisy abstractions they must be using are
getting so close to beating your…
>> Tuomas Sandholm: Oh, they are beating, well yeah.
>> Eric Horvitz: Yeah, something you said about whatever these abstractions are and then plus the
amount of noise and all the craziness, without theorems and proofs.
>> Tuomas Sandholm: Unbelievable, unbelievable. And somebody might say, hey, well, they’ve played a lot of poker; they’ve read books and played a lot of poker. Wait a second: we play more poker in self-play on the supercomputer every spring than mankind has ever played.
[laughter]
>>: Okay, anyway it’s just a reflection.
>> Tuomas Sandholm: Yeah, it is very impressive. Also, when we changed our bots, like on some days we were turning the endgame solver on versus off, it would actually have different flavors. We would change the pseudo-harmonic mapping, whether we’d randomize it or not, you know, to try to throw curve balls at these guys. Within a hundred and fifty hands they picked up on everything. It’s just unbelievable.
Anyway this is what it looked like. This is on Twitch and YouTube. You can look at all of the hands. This
is really like a university for poker if you want to study this. You can look at two weeks of poker.
>> Eric Horvitz: You have a comment about Microsoft [indiscernible], how we, why our name is printed there?
>>: Well, in the State of Pennsylvania in order for this to be legal there needs to be real cash prize
money put up. We, MSR provided the cash awards for…
>> Tuomas Sandholm: Yeah, so the pros needed to get paid to do this, and we couldn’t gamble for real money; the Pennsylvania Gaming Board didn’t allow that. In hindsight that’s probably a good thing; CMU would have lost part of its endowment, I guess.
[laughter]
Generously, Microsoft sponsored half of the prize and the Rivers Casino in Pittsburgh sponsored the other half. In addition, the Pittsburgh Supercomputing Center was sponsoring the supercomputing, and the AI Journal was sponsoring the laptops and so forth. Thank you; it wouldn’t have been possible without you.
I mean, quite literally, we were like two weeks out before the commitments were firm. You know, the casino would not have run this event if that hadn’t happened.
>>: For some reason our Purchasing Department had a hard time issuing a purchase order.
[laughter]
It took a while.
>> Tuomas Sandholm: Yeah, thank you. Okay, so these pros took it very seriously: two weeks of poker, one day of break. These are not the cigar-smoking, scotch-drinking, Stetson-hat-wearing, all-American kind of guys. These are international pros who study all the time. They have computational tools. They flew in a guy from Florida to help them do computational analysis during the day, and during the nights they were doing computational analysis…
>> Eric Horvitz: Is that allowed, that’s allowed?
>> Tuomas Sandholm: Well, we allowed them to do anything.
>> Eric Horvitz: But I guess you’re saying that one day there might be human versus machine where
there’s no support, right. But…
>> Tuomas Sandholm: Yeah, we allowed them to have support.
>> Eric Horvitz: Yeah.
>> Tuomas Sandholm: Yeah, well you know we…
>> Eric Horvitz: Except for your computer.
>> Tuomas Sandholm: What?
>> Eric Horvitz: Except access to your computer.
>> Tuomas Sandholm: Except access to ours.
[laughter]
We gave them the logs every night so they could analyze them. They had all of the tools; they could use computers, so they were kind of a human-computer hybrid, if you will. And they took it very seriously. They were stretching here in the morning; they were eating oatmeal at the casino for breakfast. I saw them drink one glass of red wine during those two weeks. I hoped that it would jinx their team, but I guess not.
Okay, multiplayer poker: well, the bots aren’t very strong. In special cases, like programs for jam/fold tournaments, we’ve solved it near-optimally, but by and large it’s not even clear that Nash equilibrium is the right thing to play there. There are some really interesting results from the University of Alberta on three-player Kuhn poker. Even there, whatever strategy I pick, I can’t really help myself or hurt myself, but I can allocate the money between Max and Eric radically differently. Max wants it, yeah.
So it’s not even clear whether Nash is the right thing. Then, what can we learn from the bots? How do humans learn poker? Well, they read books and they play poker. Who wrote the books? Well, humans, so it’s kind of a recursive thing; it kind of folds in on itself. There’s no ground truth there.
In contrast, the bots are working only from the definition of the game and the definition of Nash equilibrium, so they’re sitting on ground truth. The bots actually learn to play very different kinds of strategies than the humans have evolved to play. The problem is that the bot strategy is a big vector of probabilities, one point four terabytes of them in the case of Claudico.
It’s hard for a human to understand anything from that, but I’ll mention a few things that it does differently. First, the action to limp: limping in poker is when it’s your move and you’re the first mover. Typically you want to raise or fold; limping means that you just call, like, okay, I’ll just play along. It’s considered a weak move.
Here is what this poker book says about it: “Limping is for Losers. This is the most important
fundamental in poker-for every game, for every tournament, every stake: If you are the first player to
voluntarily commit chips to the pot, open for a raise. Limping is inevitably a losing play. If you see a
person at the table limping, you can be fairly sure he is a bad player. Bottom line: If your hand is worth
playing, it is worth raising.”
Similarly, Daniel Cates, who’s one of the other top ten players, not one of the ones in the event, verifies that in two-player heads-up no-limit, limping is a bad idea. Well, our bot limps.
[laughter]
It’s not just this bot. Every bot we’ve computed for this game has always limped between eight and
twelve percent of the time. That’s an indication that limping might not be a bad idea. In fact the name
Claudico is Latin for “I limp”.
[laughter]
We named it after its signature move. Alright, the donk bet. A common sequence in the first betting round is that the first mover raises and the second mover calls: the first mover is representing strength, the second mover is not. By the rules, the latter has to move first in the second betting round. If he then bets, that is called a “donk bet”, or donkey bet, a bad-player bet.
It’s like, you represented that you’re weak and now you’re representing that you’re strong; eh, something’s rotten in Denmark, you’re not really that credible. It’s considered a bad move. Our bot donk bets, and with various sizes as well.
>> Eric Horvitz: There’s a Latin word for donkey in there somewhere.
[laughter]
>>: By the way, when watching it, when you’ve witnessed that, as a human you’re sure you’ve misremembered.
>> Tuomas Sandholm: Oh, yeah you can start to doubt yourself.
>>: Yeah.
>> Tuomas Sandholm: As a human you’re like, whoa, that is so weird, I must have misremembered. But there’s actually a string there that encodes the whole hand history, and we gave that to the humans as well. We didn’t try to take advantage of the humans’ bad memory. Not that they have a bad memory; they can remember these hand sequences from days ago in full detail. But if a layman like me plays, it’s nice to have the sequence there. Okay, no, he didn’t misremember; he did make that donk bet.
Okay, using more than one bet size in a given situation risks signaling too much; remember, we talked about the signaling. Most pros use one bet size, some use two, and this is a little bit of an old bullet; nowadays some pros have started to vary the bet size a little more. Our bot uses a wide range of bet sizes and randomizes across them.
What the humans said is that it’s perfectly balanced: it will bluff and it will value bet in the same type of situation with the same types of bet sizes, including huge ones and tiny ones. It will make a ten percent bet on the river to open. Or it will go all in on top of, one fortieth of the pot, sorry, forty times the pot, or thirty-seven times the pot, and so forth.
Alright, conclusions: domain-independent techniques, a combination of abstraction, equilibrium finding, reverse mapping, and then opponent exploitation. In Claudico we turned opponent exploitation off completely, so Claudico never actually saw how the humans played poker. We just did it in a purely game-theoretic way. Let me leave it at that.
>>: Can you say why?
>> Tuomas Sandholm: Why? The opponent exploitation techniques really risk a lot, depending on the technique, because they’re not safe; except for the one that is safe, but that one doesn’t exploit much either. We thought that was a risk, and we thought that there was very little to exploit in these top pros, so we just didn’t go there.
For the next time I have my own ideas as to what we’re going to do. We’re going to do some of that in a
very different way than we’ve done in the literature so far.
>>: That’s an assumption that’s important to test, because, you know, maybe there is something that you can exploit.
>> Tuomas Sandholm: Yeah, maybe there is something. I’m sure that there’s something to exploit; the top players say it themselves. Okay, I believe that there’s a lot to be exploited. It’s just hard to find those exploitations in a reasonable number of hands, before you’ve lost a whole bunch of money trying.
One thing that suggests there is a lot to exploit is that human poker play in no-limit Texas Hold’em has changed quite radically over the last ten years. It was actually kind of a soft game ten years ago; if you put in half a year of study you could actually make a lot of money. That’s not the case anymore.
Nowadays the top pros are very good and they’re randomizing. They’re using these notions of balance, card removal, very sophisticated things. They’re learning from each other; they actually have these schools. Doug Polk is actually the trainer of two of these other pros, Dong and Jason, because he’s so good. People are so scared of him online that he doesn’t get any action; he’s so good nobody wants to play him.
So what he does is take these younger guys he calls students, just like professors call them students, and he trains them. They come in with no name and will play, maybe not quite as well as him, but really well, using his strategies, and they will get action until they become too famous and nobody wants to play them either.
But that is how the ecosystem works. Thank you.
[applause]
>> Eric Horvitz: Since we had the discussion along the way, and it’s already noon, if there’s a burning question we’ll take it; but otherwise, thanks everybody. Great, thanks.
[applause]
>> Tuomas Sandholm: Thank you.