>> Li Deng: Welcome to the lecture by Professor Ray Liu. Let me give you a very brief introduction to our speaker today. Professor Liu is a chair professor at the University of Maryland. He has been extremely active in a wide variety of research in signal processing, information processing, and many areas of decision learning. Today we have the opportunity to have him come over and teach us how he is trying to go beyond traditional machine learning by incorporating decision making into the learning loop. A few [indiscernible] about Professor Ray Liu are as follows. He is a leader in our Signal Processing Society in IEEE, and he has done a tremendous amount of leadership work, including running some of our society's best publications, such as the Signal Processing Magazine, which he started [indiscernible] and which really excited our entire community and some communities related to signal processing. So without further ado, I will let Professor Liu teach us more about decision learning. Okay. Professor Liu? >> K.J. Ray Liu: Okay. Thank you, Li. I would like to share with you something that we have been working on recently, where we try to combine learning and strategic decision making together. And I'm going to use a few examples to illustrate why it is important to combine them. First, we are in the era of big user-generated data, and "user-generated" is a very important keyword here. The data is not just collected from nature. More and more of the decisions and activities in our daily life are being recorded, tracked, and shared. So there is a lot of data [indiscernible]. And that is all because we have many more mobile devices, sensors, social networks, and the global cloud. 
And this abundant and still growing real-life big data presents a tremendous opportunity for us and for many problems we can study. Just to illustrate, we can study behavior, sentiment, propagation modeling, traffic, and many other things. Up to this moment, machine learning has been a very popular tool for studying this; it tries to use reasoning to find new and relevant information given some background knowledge. There are many applications, which I believe the audience all knows, and a machine learning algorithm has basically three elements: representation, evaluation, and optimization. Representation is for extracting features from the data. Evaluation is the objective function that we build in from our knowledge. Optimization is the method we use to optimize the objective function. So there are three very basic elements here, and they seem to be sufficient for many problems we solve. However, there are limitations and constraints. The generalization assumption may not hold. What does the generalization assumption mean? For a machine learning algorithm to work, the training dataset has to be consistent with the testing set, okay. That may not be true for much user-generated data; in particular, user behavior may change at different times under different settings. Also, most machine learning algorithms are built around a single objective function, okay, from the system designer's point of view, and a single objective cannot cover users' different interests. In most of these applications in the modern era, in media, in social media and so on, there are many users. They all have different objectives, they interact, they make different decisions, and these decisions affect each other's decisions. Okay. 
So users have different [indiscernible] and different objectives, and users are also rational and therefore naturally selfish. They want to [indiscernible] their own objective. And the knowledge contained in this data is difficult to fully explore, because this data is the outcome of users' interactions. So this is such a different element: the users' interactions will [indiscernible], each user has to make their own decision, and that cannot be [indiscernible] by traditional machine learning algorithms. That's why we need strategic decision making here. And game theory is an ideal tool, which started as the study of strategic decision making. First, it has multiple objective functions. Every individual, every element, every machine has its own objective function, okay. And it naturally involves the concept of equilibrium. That would be the best for everybody. Just like in our society: we have rules and policies, and under these rules and policies we all individually decide what to do, we reach a common equilibrium, and users' local individual interests are taken into account. So this basically tries to explicitly take users' interactions into consideration, and we need to come up with an optimal decision, especially an optimal decision for smart and connected users. Why smart and connected users? We can say that they are intelligent; they will learn and [indiscernible] from the environment. And especially, many computing paradigms have become distributed. Each one is different, each has its own individual learning, basically for making an optimal decision. And learning and decision making, in fact, are often coupled together. They cannot be separated. Due to what? Due to network externality. So what does network externality mean? Network externality, which you will see again and again here, is the influence of others' behavior on one's reward. 
Somebody else's decision may affect my decision, okay. For example, suppose we are going to a restaurant. If on that day many people decide to go to the same restaurant, then you have to wait in a long line. They affect each other. So it is not the case that each individual makes a decision and that's it. In many modern social media environments and in many applications, the influences affect each other through this network externality. So we try to come up with something new. Up to this moment, machine learning has been an item by itself; many things have been developed there. Strategic decision making is an item by itself as well, using game theory to study all these different principles. We have been using it for more than a decade in communication, in security, in behavior. Recently we found that in many applications we see, these two items need a bridge in between. This bridge is something we are trying to build, and we call it decision learning: learning with strategic decision making. Today, I'd like to use three examples to illustrate that. We have developed some tools, as you will see later, and given the interaction with Li, I tried to pull out three particular items to illustrate. One is online marketing, where we will see how information diffuses over online social networks. You may ask, how can that have anything to do with decision making? I want to illustrate it with some examples, although I may not get to the detailed formulation. We have some papers [indiscernible] all the material. Second, we'll use Groupon. We developed a family of new games called the Chinese restaurant game, which combines the Chinese restaurant process from machine learning with decision making, so that we can study how customers learn and choose the best deal, and how their interactions affect each other. 
And also, from a crowdsourcing point of view: how can we design a mechanism in such a way that we can achieve the desirable outcome, okay? That is something we are going to see. Okay. So I will start from this one. Because there is a lot of detail, I will try to point out the conceptual ideas so that we can understand, and if we need to get into detail, we can discuss offline. So first, motivation: information diffusion. This is Twitter, the hashtags for the 2008 U.S. presidential election. You can see Sarah Palin talking about this "lipstick on a pig." When she said that, there was a start moment, and then everybody tweeted or posted or forwarded it, and then there was a peak intensity in the study [indiscernible]. And people track different phrases at different times; they always propagate and then spread. So now our question is: can we understand this phenomenon? How can we model this phenomenon? This matters for online advertisement and for many applications, I believe [indiscernible]. Yeah, I asked my student to make sure we put the correct one here, not an Apple one. So online advertisement, and another motivation is the suppression of malicious or erroneous information. As you can see, some right words eventually become completely wrong words just because something happened in the middle. So we have to understand what this process is, okay? The information diffusion problem: how does information diffuse as users exchange it over a social network? Compare this with a physical phenomenon: we simply drop water into a pond and see how the water diffuses. That is a physical phenomenon. But with human beings' interactions, it is not a physical phenomenon. If I tell you something, then before you post it or tell others, you have to make a decision first. 
Is this interesting to you? Is it right to you? Do you think others would be interested if you say so? Would it enhance your reputation or get you [indiscernible]? Only after you make the decision do you do so; it is a decision process. So why do we need to study this? So that we can grasp the dynamics of the information process and predict and control when the start and the peak times will be. Can we do so? For example, you have a new device and you want to advertise it, or it's a presidential election and this is election day. If you don't design this information [indiscernible] well, then the peak of the message will happen after the election. Then why bother to do so? Okay. So we can study this. Now, most existing work uses machine learning to study information diffusion. This is no surprise; it is a well-established tool. One aspect is to analyze the characteristics of information diffusion. The second is to model the dynamic diffusion process. The constraints and limitations are very similar to what I just illustrated. We rely on the dataset and its structure, okay, because in machine learning you have to be given a dataset and a structure, and from that we learn. It totally ignores the users' actions and decision making. We do have decision making here, and it is difficult to involve mechanism design to achieve a desired output; you will see this in the other example that I will not talk about here. So why game theory? As already illustrated, there is strategic decision making here. For information to diffuse, it relies on other users to forward the information, and so it relies on them to make a decision. We have seen that it depends on whether the information is exciting, whether friends will be interested or not, and so on. So we cannot ignore this aspect. This is the micro aspect of the user interaction. 
If we want to model this well, we need to also be able to model that behavior, and also how we can achieve mechanism design. And this is [indiscernible], and we use an evolutionary game to model it. Why? Because information diffusion is something that evolves. It's just like an evolution process. So we are using one branch of this: [indiscernible] information diffusion is whether to forward or not. Somebody has to decide. And in an evolution process there is a mutant, okay: our gene suddenly changes one day, and the question is whether the population takes this mutant or not. And whether to take this mutant or not is sometimes not a choice; sometimes it's nature's choice. On a good day, everything [indiscernible]; on a bad day, [indiscernible] can come, okay. So that will decide. And a social network has a graph structure, so we also use a graphical version of the evolutionary game to study it. But before we talk about the graphical structure, let me explain what an evolutionary game is. It has been used a lot in biology and ecological studies to study the population shift, okay, of a certain mutant gene. In fact, we can also use it here; let me explain. This is first of all the equation. You can view this as a total population. The population change of people moving to Seattle, okay, is in proportion to the utility you get if you live in Seattle minus the average utility you can get across cities. If Seattle's utility is much higher, many more people will move in. So it's a dynamic. If it's lower, people will move out of Seattle. That is what is called the replicator dynamics, which is used a lot in evolutionary games. And it does have a stable condition, which we call the evolutionarily stable strategy. 
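In symbols (the notation here is mine, not from the slides), the population dynamic just described is the standard replicator equation:

```latex
\dot{x} \;=\; x\,\bigl(u_{\mathrm{Seattle}}(x) - \bar{u}(x)\bigr),
\qquad
\bar{u}(x) \;=\; x\,u_{\mathrm{Seattle}}(x) + (1-x)\,u_{\mathrm{other}}(x),
```

where \(x\) is the fraction of the population living in Seattle and \(\bar{u}\) is the population-average utility. The population stops shifting exactly where \(\dot{x}=0\), which is the equilibrium the speaker refers to.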
At the end, there will be a stable population in Seattle, okay. We have to understand that, and we are going to use this concept later on to study information diffusion. And here is an example of that: the evolution of deer with antlers. Yes? >>: I haven't seen this before, so this may be a naive question. Why is it the average utility? That suggests I have no choice where I move next. For example, I wouldn't move to a city I really don't want to, but I would move to a city I do want to. It should be a max. >> K.J. Ray Liu: I think it is this: this studies the move to this particular city against the average of the rest, okay. You can also compare against one particular city. That is also fine. >>: Against the maximum utility. >>: Against [indiscernible]. >> K.J. Ray Liu: Yes, so this studies how the population may shift to Seattle, okay, in the average sense. Depending on which one you want to study, you can always change that, okay? So now let's take a look at the deer antler example. Suddenly there is a mutant, and some deer have antlers and some deer don't. If two with antlers come together, they fight for females, and they can damage each other. If one with antlers meets one without, the one with antlers has the advantage, so that payoff would be larger; having antlers is better there. The payoffs A and B really depend; without against without may be the best, because they don't hurt each other, and A, the with-against-with payoff, may be very bad. So anyway, the replicator dynamics basically says that the population share x1 of deer with antlers changes depending on the differential of the advantage one may get. And the equilibrium point is when this becomes zero. Zero means it is not changing, okay. That is the equilibrium point. Okay. So this is just a very small example to explain the replicator dynamics. And we will see this result later on. 
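The antler example above can be simulated in a few lines. This is a minimal sketch of two-strategy replicator dynamics; the payoff numbers below are my own illustrative choices (a hawk-dove-style matrix where antlers win one-on-one but two antlered deer damage each other), not values from the talk.

```python
# Two-strategy replicator dynamics for the deer-antler game.
# payoff[i][j] is the payoff to strategy i against strategy j;
# strategy 0 = "with antlers", strategy 1 = "without".
def replicator(payoff, x0, step=0.01, iters=20000):
    """Iterate x += step * x * (u0 - avg) until (approximate) convergence.

    x is the fraction of the population playing strategy 0."""
    x = x0
    for _ in range(iters):
        u0 = x * payoff[0][0] + (1 - x) * payoff[0][1]   # fitness of "with"
        u1 = x * payoff[1][0] + (1 - x) * payoff[1][1]   # fitness of "without"
        avg = x * u0 + (1 - x) * u1                      # population average
        x += step * x * (u0 - avg)                       # replicator update
    return x

# Illustrative (assumed) payoffs: antlers beat no-antlers one-on-one,
# but two antlered deer fighting do worse than two peaceful ones.
antler = [[0.2, 1.0],
          [0.4, 0.6]]
print(round(replicator(antler, 0.5), 3))  # → 0.667, the mixed ESS x* = 2/3
```

The stable point is exactly where the two strategies' fitnesses are equal, which is the "not changing" equilibrium the speaker describes.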
In a graphical evolutionary game, instead of a well-mixed population, I now have a structure. I'm here, I know [indiscernible], I know someone else, and we newly met today, so we have a graph structure, and with a graph structure we have to consider things from a [indiscernible] concept. It's no longer random mixing. So the graphical evolutionary game says: now we have this structured population. There is a notion of fitness, and those who fit well may have some advantage, may survive, or can [indiscernible]. We will see some examples. A user's fitness depends on its own fitness as a baseline, plus the interaction with the environment, okay. With the environment there is selection; selection from the environment determines what my fitness will be. So everybody has a notion of fitness. And a graphical evolutionary game is usually analyzed under three update rules: birth-death, death-birth, and imitation. I will show you a few. >>: Why [indiscernible] evolutionary game theory and not other kinds of game theory? What kind of examples, like [inaudible]? >> K.J. Ray Liu: This is the one in place here because, as you will see, we try to model how things evolve. There are other games, such as cooperative games, noncooperative games, potential games, and also [indiscernible] games. There are many different games, okay, depending on the scenario. Okay. Now, this is the strategy update. You can see we have a population, and we are going to use a few of these rules to model it. I will not get into the details; I just want you to get some feeling for what the birth-death update means. In this evolution, there has to be randomness. So here we have an initial population. In birth-death, "birth" means we select a node proportional to fitness. We calculate fitness; I will not get into the details of how, okay? I have a paper if you want to see it. We select one with very high fitness. 
And then [indiscernible] we are going to select somebody to die: within this node's neighbors, we randomly select one to be dead. What does "dead" mean? The dead node is basically replaced by a copy of this particular high-fitness one. That means the population will eventually evolve; it is like a mutant spreading. In death-birth, [indiscernible] the node to die is selected randomly; we select it at random. Once it is dead, to decide which neighbor to copy from, it looks at its neighbors, selects one according to fitness, and then copies from that one. There is always randomness and there is always selection. It is a randomness-and-selection process, just like the natural evolution process. In the imitation rule, we randomly select one node to imitate; then this node selects whom to imitate from among its neighbors, with probability proportional to [indiscernible] fitness, to determine whom it is going to imitate. We use these rules to model; in fact, in our paper we use both of these to calculate, and I will show you the results. Maybe before that: the graphical evolutionary game formulation of a social network. So now we have seen the graphical evolutionary game. A social network is like this: the only difference from a traditional communication network is that all these links are abstract links; there is no direct physical link among them, okay. But we can see there is a graph structure here. Information is going to diffuse over it, and every neighbor is going to decide: am I going to forward or post this for you or not? Fitness becomes a notion of whether this information is something I like and whether my neighbor is interested; you have to quantify that. And from there you decide whether you want to forward or not. So these two have a similarity. 
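One round of the death-birth update just described can be sketched as follows. The assumption here (mine) is that the replacing neighbor is chosen with probability proportional to fitness, which is one common variant of the rule; the graph, strategies, and fitness values are toy examples, not data from the talk.

```python
import random

# One death-birth step on a graph: a random node "dies" and is replaced
# by a copy of a neighbor chosen proportionally to fitness.
# Strategies: 1 = "forward/post the information", 0 = "don't".
def death_birth_step(graph, strategy, fitness, rng=random):
    """graph: dict node -> list of neighbor nodes (mutates strategy in place)."""
    dead = rng.choice(list(graph))              # randomness: who is replaced
    nbrs = graph[dead]
    total = sum(fitness[n] for n in nbrs)
    r = rng.uniform(0, total)                   # selection: fitter neighbors
    for n in nbrs:                              # are copied more often
        r -= fitness[n]
        if r <= 0:
            strategy[dead] = strategy[n]
            break
    return strategy

# Toy 4-node ring; the high-fitness forwarders tend to spread their strategy
# as this step is repeated many times.
g = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
s = {0: 1, 1: 0, 2: 1, 3: 0}
f = {0: 2.0, 1: 0.5, 2: 2.0, 3: 0.5}
death_birth_step(g, s, f)
```

Iterating this step (or the birth-death and imitation variants) over the whole graph is what produces the diffusion dynamics and steady-state posting fractions shown later in the talk.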
The graphical evolutionary game and the social network both have a graph structure. This is a [indiscernible]; this is a user in a social network. The strategy here in the social network is: if I forward you the information, if I post the information, do you also post? What is the utility? The utility determines that: if I post and you also post, you and I gain such a utility; if I post and you don't, that is the utility between you and me; if I don't post and you post, or if neither of us posts, those are utilities again. So this is the payoff function [indiscernible] of the interaction between you and me. And as we said: utility, fitness, and [indiscernible] point. >>: So would it be fair to say that in this case, the neighbors of a node are the people that you have in your [indiscernible]? >> K.J. Ray Liu: Yes, it could be that I have Facebook, right, and all the people that I >>: But now, according to this formula, the way you have it here, you don't model how this list is created and how you decide whether you want to ask somebody [indiscernible]. >> K.J. Ray Liu: That's something I did not model, okay. I skip that; I take the graph structure as given, and in my study here I use the Facebook [indiscernible] structure, all the details. The other one is this [indiscernible] with some information; you will see the information here, yes. >>: [indiscernible] If I don't forward information, how can my neighbor forward it? She doesn't have it? [indiscernible]. >> K.J. Ray Liu: If you forward >>: But if I don't forward, okay. If I don't forward. >> K.J. Ray Liu: If you don't post, okay. If you don't post, your neighbor may still learn it, just not from you, because everybody has connections. Your friend is not just me; you also have other friends. Okay. So I skip all the details. To calculate the fitness, everybody calculates from its own neighbors. Okay. 
So I will just show you some of the results and interpret them; the details you can find in our paper. We can calculate the dynamics of the information diffusion. Also, we can find the final steady state: the percentage of the population that agrees to post. So what does that mean? Okay. Let's look at this example first. This is the one we used from Facebook. Here we have about [indiscernible] users and many edges. In total there are ten big circles, okay, ten big subgroups, okay. And this network is scale-free, okay. Scale-free means it satisfies a power law, and satisfying a power law means that the number of users with a given degree of connectivity decreases, as the degree increases, like the degree raised to a negative exponent gamma of about two or three. So when you look at it on a log-log plot, it will be a straight line; you can see this is almost a straight line, right? So this is the power law: fewer and fewer users have high connectivity. Having verified that, now we use our game to model it. We have four different cases. Let's look at them. Case one: if I post and you also post, we get a very high benefit from each other. The last case: if I post and you also post, we both may not get a good reputation, because it may be something very bad, okay, that we don't want to talk about, okay. In case two, if I post and you don't post it's very high, but if I forward and you also forward, it's 0.6. In case three, forward-and-forward is 0.4. Now let's see what happens: the dynamics of this situation. You will see more. Eventually it goes up and reaches the steady state. The [indiscernible] one is case one. In case one, eventually you find that almost 100 percent of the population posts, because if you post and I post, we have a very high advantage. In the last case, the percentage is almost zero, because the utility is not as good as the others. 
That is the worst case, so people choose not to post. And case two is the red one: it still has an advantage, but not as good as case one, so a certain population, almost 70 percent, posts, and in this other one, almost 30 percent of the people post and follow each other. That's the steady state. >>: But in this case, the utility [indiscernible]. >> K.J. Ray Liu: Yeah, you quantify that to determine what it is, okay. Yeah, we have to determine what it is. Now, this is the one where we use the Facebook information. We had ten subgroups, and for each subgroup, one to ten, we used this utility, and we found that they all match the simulation. One is our theoretical result; one is the simulation result based on this real data, okay. So yes, if we are able to quantify this utility, and how you quantify it is a different issue, we are able to verify that the result is similar to the model we have, from the actual data we can obtain. >>: So you chose this zero [indiscernible]. >> K.J. Ray Liu: We use all of this, yes, yes. >>: [inaudible]. >>: [indiscernible]. >> K.J. Ray Liu: I'm sorry? >>: Can you have [indiscernible] information flow just [indiscernible]? >> K.J. Ray Liu: You will see, okay. I'm going to talk about that. >>: How do you know? I mean, can you relate somehow the actual data has [indiscernible] using this as sort of [indiscernible], so how do you know that [indiscernible]? >> K.J. Ray Liu: I think they are very close, okay, because this is about how much data you use on average, okay. I will talk about that. This is just one example for now. The second dataset we use is the MemeTracker data. This is the phrase-cluster dataset to track; you can see this is a larger amount of data. Let me explain what MemeTracker is. It tracks, when a news organization posts something, say "we are not commenting on the story, I'm afraid," how other news organizations or websites pick it up. 
They'll say the same thing at different times: "we are not commenting on that," you see. This one drops part of the phrase, but it's there. So we have all of this, and then others pick up a different variant, "we are not commenting on that story," and so on. So this is a tracker that tracks how sentences and phrases change and propagate, and who propagates them at what time. Now what do we use this for? We want to verify the dynamics, because now we are able to calculate and model the dynamics. This one tracks the word "GoogleWare," okay, and this one David Archuleta, and this one NIN, and this one Tehran. You can see that the gray color is the actual data and the red curve is the one produced through our model. Through our model. And you may ask, how do we know the utility function? We don't know. So we do [indiscernible], okay? We do [indiscernible]. From a [indiscernible], we are able to determine it. The blue one uses this [indiscernible] framework. As you can see, we can produce a very accurate result describing this dynamic. >>: The idea is that you use this [indiscernible], which I'd call training data, and then [indiscernible] whether the same kind of parameters [indiscernible] that you learn from this data [indiscernible]. >> K.J. Ray Liu: I think that >>: Is that the training data or the test data? >> K.J. Ray Liu: This one. This one, I'd say, only one had to be used as training data, but I don't remember which one it was, okay. And the rest of them were just used to reproduce the other different [indiscernible]. >>: Just a quick question about this. What do you assume is the network structure here? >> K.J. Ray Liu: This is a uniform structure. >>: Everything >> K.J. Ray Liu: Yes. In fact, it is [indiscernible], so it's uniform, but it's a scale-free network. 
We assume it's a scale-free network, and the details of the equations we derive are written in the paper, yeah. >>: I thought the physical data came from Facebook. So MemeTracker is not >> K.J. Ray Liu: MemeTracker, okay? >>: [indiscernible] The Facebook data probably gives you the structure. >> K.J. Ray Liu: Yes, yes. And this is from MemeTracker, okay. It's popular. Now, using this, and this is the question you asked, we do the experiment in a different way. We find five groups of sites in the database; each group has 500 sites. We estimate the equilibrium for each group, and once we find this equilibrium, we recover the corresponding payoff matrix, okay; we go back from the equilibrium to estimate the payoff matrix. And then from there we do the experiment, okay, based on that data. So what do we find here? You can see that in group number one, if somebody posts a message, about 19 percent of the people in that group will post it. In group number five, if somebody posts a message in that particular group, 81 percent of the rest will eventually post the same message. What does that mean? It means people in this particular group share major common interests with each other, while the other group may not. Therefore, what does this mean? If you want to target [indiscernible] some political message or some marketing effort, this group is more cohesive, so if you can identify a group like that, you can be sure the message will propagate more among the users there. So using our method, we estimate the equilibrium state and then estimate the corresponding payoff matrix. You can see that the black one is the simulated result from our model; for the red one, we estimate the payoff matrix and then do the experiment based on the data. And this is the average and the variance; we also plot the variance. 
So as you can see, it's quite consistent with what we predicted using our [indiscernible] model. So this is one example showing that even for information diffusion, there is interaction between users where decisions happen, and if we take that into account, we can have a very good model describing it. Next, I want to talk about Groupon, used as an example to illustrate how customers can learn from each other and choose the best deal. Okay. So what does this mean? This is actual Groupon data that we collected. You can see this is the data we have and what we found; I don't know whether to call it a surprise or not. You know Groupon, right? Sometimes we get a Groupon deal, a good deal. If it's a restaurant you go to very often and the price drops 50 percent, you will buy; many people will buy it. Sometimes it will tell you how many. So for successful Groupon deals, we collected lots of data, did some averaging, and calculated the Yelp ranking, okay? We related it to Yelp. We found that the Yelp ranking declines. Is that a surprise? It was a successful Groupon deal. You know why? What happens is this: you think it's a good deal, he thinks it's a good deal, everybody thinks it's a good deal, so many people buy it. Now suddenly quite a lot of people have the deal, and in the next few weeks these people show up. So many people show up at once that the quality of the restaurant declines, because of such sudden demand. >>: If the price is low, they may say that, well, they are willing to wait a bit longer. So what do they give the rating on? >> K.J. Ray Liu: The rating reflects all of that. I think many people buy it, then many people show up in this period, and therefore the quality during this particular time, before you can actually [indiscernible], goes down. But you know what? 
The restaurant doesn't want to see its rating go down, because the reason many people want to go is that it has a good rating. So that is why. This is a typical phenomenon with so-called negative externality, meaning you and I both show up at the restaurant, we both have to wait for one hour, and we never come back again. So that is what happens. Okay. So there is learning and there is decision making. We want to formulate this in terms of a Chinese restaurant game. The Chinese restaurant process is very standard in machine learning, used a lot, basically in [indiscernible] processes: there are infinite tables, each with infinite seats, and each new customer who comes in decides to take a seat at one of the open tables or to open a new table. Okay. If you have not been to a Chinese restaurant, this is more difficult to explain, but especially in the old [indiscernible] Chinese restaurants, the tables are all round tables, okay. These are all nonstrategic. Now we formulate a game called the Chinese restaurant game, where we try to introduce strategic behavior with decision making. So what is this? Let's take a look at this first. Hm. >>: [inaudible]. >> K.J. Ray Liu: Okay. I don't have [indiscernible]. We put it into these. [indiscernible] The first customer comes and can choose a table to sit at. Then the second customer comes in and can also choose a table to sit at. And what's the problem with that? Our model is the same as the example I used before: if you sit at a table with lots of space, you are very comfortable, okay; if there are too many people there, you do not enjoy your meal. So when you come, you decide: do you want to open a new table, do you want to sit at this big table, or do you want to choose a small table? And what happens is, when you come, you only see two people there. 
Therefore, you have to be able to predict, through learning or whatever, how many people are eventually going to sit where, and therefore, at this moment, what would be your best decision of which table to choose, okay. So here we have a table size X, and theta is the system state, the restaurant state — meaning, say, this guy has spent two hundred something dollars to [indiscernible] the restaurant, that one only spent one dollar. That is the system state. Each customer has a signal. The signal is all the [indiscernible] happening there; you have all this signal that you know, okay. And some customers may request the same table; of course, then there would be negative externality. Meaning what? When we have more customers at one table, there is less space for each of them. That is a negative effect. So the question now is: what will be our best decision? It is the same as a later example: you go to an airport and you turn on your iPhone or Microsoft phone, Windows Phone, whatever, and there are so many network selections. Everybody selects the one with the strongest signal, right? You know what? Nobody can get connected, although it is the [indiscernible] one. That is the negative externality: when you all come together, you cannot. So what is the best? This is a typical example where your decision affects my decision. My decision is not independent of yours. If, when we make decisions, we each just look at our own [indiscernible] and do not consider how we mutually affect each other, it will not be the best decision. Okay. Sequential decision making: everybody comes sequentially. Therefore, we have the information observation space — the actions of previous players — where N_ij is the number of customers at table j when customer i comes. So that means when customer i comes, there are so many customers at table one, so many customers at table K, okay?
And also, we already know how many customers arrived previously, so when we come in, we can see who is there. Okay. So the first one arrives, the second one chooses, and maybe number three chooses. Okay. So that is what happens. Now we have this signal. If the signal is perfect — meaning I can see everything, I know everything, everything is precise — then it becomes very simple. We have an equilibrium theorem showing that we do have an equilibrium, okay? Given a set of customers and a set of tables, this is the Nash equilibrium: the utility of the users choosing table X, given the size of table X and the number choosing it, is at least what any one of them would get by moving to another table Y. Moving to table Y gives no advantage, even if it is bigger. That is the equilibrium. We show that this is [indiscernible] very simple. Okay. We can study it from the first customer: the first group is put at table one, the next at table two, and so on. That is the equilibrium when the signal is perfect. However, in real life, the signal is not perfect. When it is not perfect, learning has to be involved; now we have decision making, and we have to combine it with learning. So we have the signal, with rumor, conditioned on the state of the system. We can use [indiscernible], whatever we can, to learn the belief. How do we learn the belief? Now we don't see the signal clearly. There is a belief we have to estimate, and everybody, based on the belief, makes decisions. That's a typical learning problem, and then we also have to couple it with the decision making. Okay. So here we do have the best response in the Nash equilibrium.
The best response is this: when a user i comes in and chooses table j, then given all the previous users, the signal he has, and the history, what is the best expectation he can get? And because the belief is a distribution, we have to average over this belief; then that gives our best result. And how do we calculate this? Very simple. We have to do backward induction, because everybody depends on future information. When the last one arrives, there is no future, so from there we know the result, and then we do backward induction to calculate everybody's best decision. Okay. So now I want to show an example here. This is a typical setting. We have two restaurants, one high-quality restaurant and one low-quality restaurant. For the low-quality restaurant, the quality factor increases from here to here. This is the average utility of one customer, and that's the other customer. So you can see that the best response is the one that we propose; we always have the best. And this may not be the best example, okay — my student just gave me this one before I came, because I wanted to use Groupon as an example. In this particular one, the average utility of the best response is almost the same as the myopic one. The myopic one is based only on what you know currently. In many situations there is [indiscernible]; however, this myopic one is not a Nash equilibrium, meaning people would be able to change the state, okay, they are going to change. And learning — this is the case where we only learn the signal, meaning we try to learn the belief, okay, we estimate a belief, but we don't take into account the negative externality. We are not considering the negative externality. So that is the worst.
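The backward-induction computation just described can be sketched in a few lines. This is a minimal toy version, assuming a simple utility of the form table quality divided by final occupancy; the actual model in the talk is richer (imperfect signals, learned beliefs), so the qualities and arrival count below are purely illustrative:

```python
from functools import lru_cache

Q = (100.0, 60.0)   # hypothetical table qualities
N = 5               # customers arriving sequentially

@lru_cache(maxsize=None)
def final_counts(i, counts):
    """Final occupancy if customers i..N-1 each best-respond, by backward induction."""
    if i == N:
        return counts
    best, best_u = None, float("-inf")
    for j in range(len(Q)):
        nxt = list(counts)
        nxt[j] += 1                      # customer i tentatively sits at table j
        fc = final_counts(i + 1, tuple(nxt))
        u = Q[j] / fc[j]                 # utility shrinks as the table fills up
        if u > best_u:
            best_u, best = u, fc
    return best

print(final_counts(0, (0, 0)))           # subgame-perfect final occupancy
```

With qualities (100, 60) and five sequential customers, the subgame-perfect outcome splits them three and two, whereas a customer who ignores occupancy entirely always picks the seemingly better table, overcrowding it and driving everyone's utility down.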
That is basically what we observe with Groupon: we see the rating going down. That is because of this. It is the worst. Why? Because it didn't take into consideration the other users' effect, and therefore the overall performance can be very bad, compared to taking all these effects into consideration together. Okay. So this one shows, based on what we have formulated, what the best strategy would be for a new restaurant. For a new restaurant, we can see that this is the deal price. If the deal price goes down, the number of customers goes up. That is for a low-quality restaurant. For a high-quality restaurant, if the deal price goes down, yes, the number of customers also goes up. That's all the same, and this part doesn't change much. Now, for a low-quality restaurant, if we look at the revenue: when the signal quality — meaning the advertisement or the rumor for why people come — tends to be not as good, okay, if the signal quality is not that good, a low-quality restaurant may have a higher revenue, and the best pricing would be somewhere here. However, if a high-quality restaurant wants a better revenue, then what? Then the signal quality has to be good, and along this line you can have very high revenue. Okay. So depending on whether you are running a low-quality restaurant or a high-quality restaurant, there are different strategies you can use to maximize your own — >>: Why is that for the low quality restaurant [indiscernible]? >> K.J. Ray Liu: Because for a low-quality restaurant, if the signal is too good, customers may choose not to come. They may know it is a low-quality restaurant. >>: Oh, I see. So the quality tells. >> K.J. Ray Liu: If it's a high-quality restaurant, you want many people to know that it's true. The more who know — >>: [indiscernible] quality actually is the rating? >> K.J.
Ray Liu: The rating, yes. >>: [indiscernible]. >> K.J. Ray Liu: Yes. Okay. So this is a summary of what I have just said. And the whole point I want to show with this one: it's highly nonlinear. You can see it's highly nonlinear, okay. Indeed, the whole outcome is highly nonlinear when we combine decision and learning together. It's very difficult to describe using simple equations. With this, we developed a family of decision learning games, starting from this Chinese restaurant game. The dynamic Chinese restaurant game we use to model network selection, just like the one I mentioned: you come in, you are going to select which network to join, many people come and go, and you have many options — one with very high energy, very high power, and some with less. Which one to choose? The Indian buffet game is when you can choose not just one but many. This is [indiscernible]. So we have a family of these that we have developed and can use. What applications can we have? This is what we can see — typical applications in social computing. One is the kind of site such as Amazon and Yelp, where people sequentially write reviews, okay, and these reviews may affect each other's decisions. Another is the typical question-and-answer site, where people post questions and write answers. Beyond these, we also have YouTube and other video sites. Across all of these, we find something in common in these social computing systems. Users arrive sequentially and then make decisions: whether to produce a piece of content, and if so, at what quality; and also whether to [indiscernible] or not. That is the system that we see. So, first, it's sequential. Second, there are externalities among users' decisions, meaning your decision affects my decision.
My decision affects others' decisions. Okay. So with this, the family of tools we developed here can be used to model and analyze these systems. You can analyze the business model of a social computing system, and also design effective incentive mechanisms to induce desirable behavior. Okay. Now I want to focus on the last one. These are some other applications that we have; the last one is crowdsourcing. With this one I want to emphasize that with this decision making tool, we can specifically design a mechanism to achieve a desirable goal, rather than just learning the outcome. Okay. So this is about crowdsourcing. Let's see what problem we have here. Large-scale labeled datasets are very important for learning, because more data can mean more accuracy, and large-scale annotation is very expensive. Therefore, this often becomes a bottleneck. So what solution do we have? Microtask crowdsourcing. We can use microtask crowdsourcing to get many people to come in and help us: very large volume, short time, and low cost. What is the problem? The problem is that the collected data can be low quality, because the users may not have the right incentive. And you may say, let's give them incentive, let's pay them. That's a problem: most of the time, we don't have a very high budget to do so. The reason we want to do microtask crowdsourcing is that we don't have a high budget — we may have no budget, or a very low budget. If we pay for more incentive, it no longer serves the purpose. So what to do? I will show that we can design a mechanism to do so. What is the machine learning solution? Okay, knowing this problem, we add a data curation phase to filter out low-quality data, or modify the learning algorithm to accept noisy labels. That's what people have done. But these are basically [indiscernible] problems.
What is the best we can do? We can do some trial and error. Now, with this decision learning solution, what we propose is: how about incentivizing high-quality data in the first place, by devising a mechanism to do so, and paying a very, very minimal cost to achieve that? Okay. So what to do? Crowdsourcing is for something repetitive and tedious. [indiscernible], correct? No? It is not a very complex task — that's why we can do this [indiscernible] crowdsourcing. And for each task, the reward is often very small. Now think about it: if you are somebody with low-quality skills, you can only do this. But because the reward is very small, how do you maximize your profit? You can only use quantity to maximize profit. And in order to maximize your quantity, the quality of your work may not be too good, because the reward is too small and who cares, okay. So you want to increase quantity. The reward is small and there's no competition among the users. So microtask crowdsourcing lacks proper incentive, and it's profitable for workers to submit poor solutions, because nobody knows. As long as nobody knows, they will submit them. We need to provide incentive. But if we provide incentive, then the mechanism becomes costly, okay. So the question is: what kind of incentive mechanism should a requester employ to collect high-quality solutions in a cost-effective way? Can we come up with that? Yes? >>: Just as a point of reference, for Amazon Mechanical Turk, there is an incentive system. There are two. The first is that the person that requests the work can reject the answer that they get back and not pay. >> K.J. Ray Liu: I know. >>: And the other is that each worker has a statistic, which is what fraction of their work has been rejected. So there is one — it does exist. I'm not saying that's an optimal mechanism. >> K.J.
Ray Liu: Yes, I will talk about that. People have shown — and in fact we have also derived — that there is a minimum cost there, and the minimum cost can be very high. We want to propose something different, so I will talk about that. Okay. So the model is this. Strategic worker model. Actions: produce a solution of quality from zero to one, one being the highest quality. Gain: there's a reward given by the requester. And the cost: the cost should be convex. Why? Because it is basically more costly to improve higher-quality solutions, so it should be a [indiscernible] function. It is differentiable, for [indiscernible], and its first-order derivative is larger than zero, because higher-quality answers are more costly to produce. Okay. Each worker acts to maximize his own utility. So that is the worker's model. Requester's model: a single requester, who publishes tasks and solicits solutions, decides whether a submitted solution should be accepted or not — that's a decision — and designs the mechanism, which specifies the rules for evaluating submitted solutions and for rewarding workers. Okay. So now the problem formulation. The solution concept has to be a symmetric Nash equilibrium. Symmetric means we assume everybody is the same, okay: they all have the same utility and the same conditions. And we are interested in the desirable Nash equilibrium where workers choose Q equal to 1 as their equilibrium, because we assume the task is so simple — you just check whether this is correct and that is not — that everybody can easily achieve very high quality. Now, there is a mechanism cost, CM. This mechanism cost can be the reward paid to workers or the cost to evaluate the submitted solutions — that is the most important one.
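The strategic worker model just described — quality in [0, 1], a fixed reward, a convex increasing cost — can be made concrete with a small sketch. The quadratic cost and the acceptance model below are illustrative assumptions, not the exact functions from the talk:

```python
# Hypothetical parameters: reward r per accepted task,
# convex effort cost c(q) = k * q**2 (so c'(q) = 2*k*q > 0 for q > 0).
r, k = 10.0, 4.0

def best_quality(p):
    """Worker's utility-maximizing quality when solutions are audited
    with probability p.

    Assume a solution is accepted unless it is sampled (prob. p) and
    found wrong, and a quality-q solution passes the check with
    probability q, so u(q) = r*((1 - p) + p*q) - k*q**2.
    Setting du/dq = r*p - 2*k*q = 0 gives q* = r*p / (2*k), capped at 1.
    """
    return min(1.0, r * p / (2 * k))

for p in (0.1, 0.3, 0.8, 1.0):
    print(f"p = {p:.1f} -> q* = {best_quality(p):.3f}")
```

In this toy model the chosen quality scales with the sampling probability, so under a pure reward-accuracy mechanism, forcing quality-one work requires heavy sampling — which is exactly the cost constraint discussed next.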
So the question now is: what is the minimum mechanism cost with which we can guarantee the existence of such an equilibrium — the equilibrium that is [indiscernible] best for the requester and best for the workers? There are two basic mechanisms we knew before, just like what you mentioned. One is the reward-consensus mechanism: how do we evaluate? Based on everybody's results — the answer with the majority is taken as the right answer. We have to do these evaluations. The other is the reward-accuracy mechanism: we have to evaluate the accuracy, okay. These two mechanisms have minimum costs — the cost of this one is this and the cost of that one is that. Why is there a minimum cost? Because we have to evaluate. And the problem is that with the minimum-cost constraint in these two basic mechanisms, the requester has no control. Those minimum costs are something we have no control over: the better we can evaluate a result, the higher the cost. So this trade-off may negate the low-cost advantage of microtask crowdsourcing. So what can we do here? I would like to illustrate one example that we proposed that can overcome this problem in a very nice way. We propose a mechanism that employs quality-aware worker training to reduce the mechanism cost. Why is this? Instead of the usual scheme of providing some kind of training for workers at the beginning, we don't do so — because this is a very simple task. Instead, we assign training to a worker when? When they perform poorly. You can view this as what? As a punishment state. I mean, all the time they have, if they use it to produce results, they can make money. But if they perform poorly, then they get into the training state, and that basically is a punishment — more like a punishment state. Okay.
So there are two states in this proposed mechanism. One is the working state, where workers can produce results and make money, and the other is the training state, where you get N training tasks to regain qualification for the working state. Okay. So now let's see how this works. We have to evaluate. We have a training state and a working state, and there are transition probabilities; a worker's action right now affects not only the immediate utility but also the future utility. Therefore, each worker will choose his action based on what? The long-term expected utility, okay? Basically, this problem can be formulated as an MDP. However, this MDP has one special thing: it is not each worker individually evaluating his own MDP — the MDP faced by each worker also depends on the other workers. They depend on each other. Therefore, this is a kind of game: an MDP process in which the workers affect each other. So our result is this. Let me summarize and interpret it. In this proposed mechanism, if the number of training tasks, N, is large enough — meaning if you don't do well, you go to the punishment stage and spend a lot of hard work for nothing — then there is a symmetric Nash equilibrium that can force you to produce quality-one results. [indiscernible]. What does this mean? It means that given any parameters in this working state, we can always guarantee the existence of such a desirable Nash equilibrium. At what cost? This is very interesting: at a cost that, asymptotically, can be zero. This is the sampling probability for testing; this is the reward. The cost can go to zero. So let's interpret what that means. We can collect high-quality solutions at an arbitrarily low cost if we choose the right parameters, and the requester always has a pretty [indiscernible] budget. >>: But the [indiscernible] cost, that's a training cost, not the reward cost. >> K.J.
Ray Liu: This is the total cost that you want to pay them, yeah. There are two costs: one is the reward to the workers, and the other is your evaluation. The budget — this budget is influenced by many things, okay. So how do we interpret this? Given any budget, the proposed mechanism enables any requester to collect high-quality solutions while still staying within the budget. Because this goes to zero, it can be very small, okay; we can do so. I want to show you one example. This is an experiment we actually performed in our department. The task is very simple: calculating the sum of two randomly generated double-digit numbers. Middle school, even elementary school students can do that. Ten points for each accepted solution; the goal of a participant is to maximize the accumulated points. Very simple. The tasks were assigned to participants in three different sets, each set lasting three minutes. We need to control the time because it helps participants quantify their cost of solving a task at various quality levels in terms of time. If they want to maximize their points within these three minutes, they have to decide whether to do very high-quality work or very low-quality work — they have to perform the trade-off. So these are the mechanisms we actually set. Set one: the reward-accuracy mechanism with sampling probability one, meaning with probability one we will examine every solution. It's very costly to [indiscernible]. Set two: sampling probability 0.3 — I will only randomly select 30 percent of the solutions to examine. And set three is the same, but with the proposed mechanism built on top of it. So what is the result? This one. With probability one, we examine every solution everybody produces. We collected results from 41 participants, mostly engineering graduate students, okay, and we use the accuracy of each participant to indicate the effectiveness of the mechanism. So this is for set number one.
We examine every solution; then almost everybody produces very high-quality results. Set two: only 30 percent of the solutions will be examined. With very high probability, in order to make a profit, they produce very bad results. Only one guy here — because this guy has a very high work ethic, okay. Maybe Microsoft wants to hire people like that. Why? Because you see, there is only a 30 percent chance of being evaluated; if I want to maximize my own utility, that's the best way to maximize it. Not with set three: there we impose this training. It's more like a deterrent that you actually implement. With that, you can see the result is very similar to set one, and the cost of this may not be that high, because with this, users may choose not to get into that kind of behavior. Okay. So, conclusion and future work. Decision learning is a tool that combines learning and strategic decision making. We can analyze users' optimal behavior from the users' perspective, okay. We argue that this cannot be ignored and plays a very important role; in fact, we can also design optimal mechanisms from the system designer's point of view. So it is not simply from the users' perspective, but also from the system designer's perspective, depending on how you want to look into the problem. And here we used three examples to show the effectiveness of this decision learning. One, we used online marketing to demonstrate what? That we can learn users' utility functions for strategic decision making. The second one is Groupon. We demonstrated that users can learn from each other's interactions, okay — they all affect each other — for better decision making. Otherwise the value of Groupon would just decline; it wouldn't serve its purpose. And for crowdsourcing, we argued that we can take into account users' strategic behavior to obtain better-quality data for better training. So the final thought is this. This is what we see in the current big data world.
In the current big data world, especially in social media, there's lots of data — big user-generated data. One user takes this data, does the modeling and study, then performs decision making and takes sequential actions; the outcome comes back to affect the data. A second user will again do the same thing, and his own decisions may eventually come back to affect the data too. Therefore, for big data, the data is not just steady, unchanged data sitting there. It is data that changes with time as it is collected, and keeps changing depending on each user's interactions and decision making. All of this has to be taken into consideration so that we can make optimal decisions through the best learning that we can. Okay. This is the topic I wanted to talk about today. Thank you very much. >>: I just have a question on the crowdsourcing. At the end you propose that, according to this decision methodology, you have two states of process: the training state and the working state. So I just wonder, how does this proposal [indiscernible] — how does it depend upon your analysis of using [indiscernible]? >> K.J. Ray Liu: Okay. It will be this, right? So you are in the working state. With this probability, this is the quality you produce; this is the quality that other people produce, okay — the worker's action at the working state. With this probability you keep staying in the working state; with that probability you get into the training state. And in the training state, the worker's action in training, okay, also decides how long you will stay over here before you go back. >>: Yes, okay. So now all these parameters here are fixed in this case? >> K.J. Ray Liu: Yeah, in this example, it's fixed. >>: So the learning part is [indiscernible]. >> K.J. Ray Liu: We're learning the [indiscernible] probability. >>: So how difficult is it? >> K.J.
Ray Liu: Given the simple example we had at the university, it's not that difficult. Yeah. However, if we want to apply this to a real problem, then we may have to take into account, you know, keeping a model. >>: So in order to estimate that, this is actually the key. If you don't have that information, nothing — >> K.J. Ray Liu: Yes, we need to have that. >>: Then what kind of labels do you have in order to figure out this — >> K.J. Ray Liu: The labeled data in our experiment is very simple, because it is basically calculating the sum of two randomly generated double-digit numbers, right. We don't have to [indiscernible]. We check it ourselves. >>: Okay. >> K.J. Ray Liu: Yeah, otherwise you need to evaluate every one, or evaluate 30 percent, you know. >>: I got the impression that on set three, even if you had set the probability of sampling much lower, you would still perform very well. >> K.J. Ray Liu: You mean this one? >>: Yeah, set three would be much better than two, even if the probability of sampling was like 0.1. >> K.J. Ray Liu: This one. >>: Because all you need is a small probability of sampling to change the person's behavior. It's like the train ticket problem, right. People get on the train, but if inspectors check, you know, it only needs a small probability to change the behavior. >> K.J. Ray Liu: Yeah. In fact, you can view this more like — if you get caught, this is your punishment, okay? Even at 0.1 percent, if your punishment is very severe, where you cannot earn money during a period of time, people may — so this is more of a deterrent, okay. Actually, how often you may go here, you don't know, because if you see this, you know what — [indiscernible] translates into the punishment being severe enough. >>: Then the probability could be very small and it still works well, right? >> K.J. Ray Liu: Yes. Yes.
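The point made in this exchange — that a small sampling probability deters shirking once the training penalty N is large enough — can be checked with a back-of-the-envelope expected-utility comparison. All parameters here are hypothetical, not taken from the experiment:

```python
def per_task_utility(honest, p, N, r=10.0, c_high=3.0, c_low=0.5, c_train=3.0):
    """Expected utility of one task under the training-state mechanism.

    Hypothetical parameters: reward r, effort cost c_high (honest) or
    c_low (shirking), and cost c_train per unpaid training task. A
    low-quality submission is caught with probability p, forfeiting the
    reward and triggering N unpaid training tasks.
    """
    if honest:
        return r - c_high
    return (1 - p) * r - c_low - p * N * c_train

# Even a 5% audit rate deters shirking once the training penalty is long enough:
for N in (1, 10, 50):
    print(f"N = {N:2d}: honest = {per_task_utility(True, 0.05, N):.2f}, "
          f"shirk = {per_task_utility(False, 0.05, N):.2f}")
```

In this toy setting, shirking still pays at a 5 percent audit rate for small N, but becomes strictly worse than honest work once N grows to a few dozen tasks — severity of punishment substitutes for frequency of inspection.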
>>: An interesting extension of this, which would bring in externalities from the rest of the work requesters, is to look at throughput of jobs on crowdsourcing platforms. Because it's often the case that paying more actually doesn't change the quality. This is totally bizarre, but I've measured it, and other people have measured it. If you pay more, what I saw is that the quality went down slightly. What goes up is the speed. So if you pay more, you can get the jobs done a lot faster. It would be interesting to see what the interaction is between this kind of mechanism design and throughput. Because if people perceive this as something that's slowing them down, how does that affect the throughput of your job? >> K.J. Ray Liu: Yes. >>: And there, it depends on what all the other requesters are doing. If you're the only requester that's policing in this way, you might find that no one wants to do your jobs. >> K.J. Ray Liu: Um hmm, yes, that's correct. Yeah. >>: In this case, do you announce the mechanism to the worker? >> K.J. Ray Liu: Yes, at the beginning they all know. They have to know. So then they take it into account. >>: In practice, you don't need to tell them whether it's 30 percent or 10 percent; they just need to be checked at the [indiscernible]. >> K.J. Ray Liu: No, we let people know, okay. We let people know all these parameters, okay. [Multiple people speaking.] >> K.J. Ray Liu: The message, basically, is that for something like this, it's not just that we keep learning, keep learning, keep learning. We can incorporate strategic decision making as a tool to design a mechanism that doesn't incur much cost and can really improve the system performance. It's just a different perspective to look into that. >>: [indiscernible]. >> K.J. Ray Liu: This is simply for set one: with probability one, [indiscernible] every solution. >>: [indiscernible] and you have an answer for every visit. >> K.J.
Ray Liu: I had to evaluate every answer. >>: Okay, okay. >> K.J. Ray Liu: This is the 30 percent chance — [indiscernible] probability. Maybe three out of ten answers I will evaluate. >>: Okay. >> K.J. Ray Liu: Because this is very low, many people will choose to do lousy work, and they get the reward because the chance of being caught is small. >>: So in some case, if a worker finishes ten tasks, and if they get [indiscernible]. >> K.J. Ray Liu: No, no. When they get caught, they have to do N — we can determine the N. Well, it's not here; there, we set this N, right here, the training of N tasks. So if you get caught, you have to train on, say, 20 problems, okay, and those 20 problems don't count toward your reward, okay. After that you continue. So one view of this mechanism is as a training process to increase workers' skill, for them to improve whenever they need to improve. >>: On the other hand, if they know that the chance of being caught is high and they lose everything, they [indiscernible] may decide not to do this job but do something else. >> K.J. Ray Liu: Yes. >>: There's a [indiscernible]. >> K.J. Ray Liu: Yes. >>: This doesn't assume there are competing strategies. The same worker can choose between the two. >> K.J. Ray Liu: Exactly, exactly. >>: There are different layers. >> K.J. Ray Liu: They can choose which task, yes. >>: That's all strategic. >> K.J. Ray Liu: It is also strategic. There are many different ways; this is just to show that we can develop some kind of mechanism to achieve a desirable goal. Certainly, there are many other competing factors. >>: [indiscernible] the worker will have to be trained again, because accuracy is under [indiscernible]? >> K.J. Ray Liu: No, no, no. >>: Or just any single mistake, they'll have to be trained again? >> K.J. Ray Liu: Okay. There is a [indiscernible], okay, which I didn't indicate here.
In assigning training to a worker when they perform poorly, we have to decide what "poorly" means, okay. It could be when they miss one problem, or when they miss two consecutive problems, something like that. >>: So [indiscernible] — you mean in this particular experiment? For the students, right? >> K.J. Ray Liu: Right. In fact, I can tell you. We don't actually — because this is the result, right? — we don't actually implement this punishment state; we just collect the result. This means that a student, under this scenario, produces this high-quality result. We can say that if you perform poorly — but in what we collected, I don't remember whether we actually told them it's one problem or two consecutive problems, I don't recall that, but this is what we collected from them. Once they understand there is such a mechanism there, this is the quality they produce. >>: It's almost like a train ticket. >> K.J. Ray Liu: Yes. >>: Once I know there is something there, I will try my best to perform. >> K.J. Ray Liu: Yes, exactly. >> Li Deng: Thank you very much. >> K.J. Ray Liu: Okay. Thank you.