>> Jin Li: Today it's our great pleasure to have Professor Mihaela van der Schaar from UCLA with us to give a talk, a brief synopsis of her recent research at the Multimedia Communications and Systems Laboratory at UCLA. We will first hear Mihaela give a brief overview of her lab, and then hear detailed talks from two of her students, Yi Su and Fangwen Fu. Professor Mihaela van der Schaar has won numerous awards during her career. She won the 2004 NSF CAREER Award, the 2005 Best Paper Award from the IEEE Transactions on Circuits and Systems for Video Technology, and the 2006 Okawa Foundation Award. She is an IBM Faculty Award winner for 2005, 2007, and 2008, and won the 2008 Exploration Stream Analytics Innovation Award from IBM Watson. In 2006, she won the Most Cited Paper Award from the EURASIP journal Image Communication. She has been an associate editor of a number of IEEE publications, including the IEEE Transactions on Multimedia, IEEE Signal Processing Letters, IEEE Transactions on Circuits and Systems for Video Technology, and IEEE Signal Processing Magazine. She holds 32 US patents and received three ISO awards for her contributions to MPEG video compression and streaming. Her research interests span multimedia communications, networking, processing, and systems. Without further ado, let's hear about the exciting work she has done in her lab. [applause]. >> Mihaela van der Schaar: Thank you, Jin, for the introduction. My plan today is to first give you a brief synopsis of our recent research in my lab at UCLA, mainly an overview, without getting into an awful lot of detail, in view of time. However, two of my students will give more detailed talks after that. What I hope for this presentation is to catalyze enough interest from you that maybe when you come to UCLA and visit us you can find out more about our research, or maybe you will invite us to give more detailed talks on any of the topics you find of interest here. Okay. So I'm going to start with a brief highlight of what our research is all about. Our focus in the last couple of years has mainly been to come up with a rigorous framework to analyze, design, and optimize, and I'm going to stress this, dynamic and heterogeneous multi-user environments and applications. Starting in 2004, when I moved from industry to academia, we started in our group to look for a new theory, together with algorithms and designs, for architecting next-generation wired and wireless networks, as well as the distributed systems interconnected through such networks, which are able to support media applications. In this research a key theme is multi-user communication and networking. What I'm going to do today is go through a couple of the topics highlighted here in red. I'm going to talk a little bit about some of our new designs for multi-user communication in wireless networks, about how we can reduce power significantly in wireless multimedia systems, and briefly about our work on peer-to-peer networks as well as some more recent work on distributed stream mining systems. There are also some other topics we have been working on, but I'm not going to talk about those today. However, one of my students, Fangwen Fu, will touch on the topic of wireless media communication.
So if you look at multi-user communication today, most of the protocols we have are really kind of passive, and they really adhere to some predetermined, rigid protocol rules. Another type of design that has emerged in recent years is to model transceivers as rational agents competing for resources, [inaudible] wireless resources, for example. And that's what game theory tries to do. However, if you look at the current solutions in game theory today, they can be kind of classified into [inaudible] algorithms. On one hand we have cooperative games, which most of the time either require a lot of message exchanges to ensure that the users are going to be able to operate on the Pareto boundary. And here I have two users, just for illustration, with utilities U1 and U2. If they would like to operate on this Pareto boundary, most of the time either a moderator may be needed, for example an access point or a base station, or this may be done in a distributed fashion but through heavy message exchange. If you don't want to have this type of infrastructure or message exchange overhead, what we have are solutions based on non-cooperative approaches, which perform a myopic best response. And what we know about these types of games is that for some of them we can prove that a Nash equilibrium exists, but this Nash equilibrium, which is based only on myopic best response and local information, is for most games very inefficient. So for most communication games of interest, the Nash equilibrium is not on the Pareto boundary, but rather very far away from it, and at times it may even lie at zero-zero utility, as I'm going to show you in an example a little bit later. So the first thing I would like to point out is that most of the work has focused either on fully collaborative information exchange or on an almost passive, myopic response based on local information. Also, the focus has really been on equilibrium characterization rather than on constructing new solutions that would allow us to operate, maybe based only on local information, but close to the Pareto boundary. So I would like to have a constructive design that would allow us to move this point towards the Pareto boundary but without any message exchange. And that should be possible, mainly because in reality the devices we talk about are increasingly smarter. We have smart phones and smart gateways and smart devices which are heterogeneous and increasingly strategic. They may not want to just comply with existing protocol rules; they may want, within the framework of a particular protocol, to maximize their own utility. They are really capable of proactively acquiring information, learning [inaudible], and also selecting desired equilibria. So then they work together to achieve what I'm going to call here coopetition: there may be non-cooperative users aiming to maximize their own utility, but which may discover in this process that it is better for them to move towards cooperation. And I'm going to talk about that in just one second. So coopetition stands for cooperative approaches adopted by non-cooperative users, and what we would like to see is when, in which cases, these users are going to be able to cooperate or coordinate to achieve better outcomes.
And what we see in the literature is that there is very little work in the communication area on trying to find better equilibrium concepts and better solutions to achieve these better equilibria. So why would we like coopetition? Because strategic devices really compete for available resources, and that may often result in inefficient outcomes, for example inefficient Nash equilibria. However, if you have coordination and cooperation, then not only may the overall efficiency improve, but in many cases, if the network is well designed, the performance of all individual users may also be increased. What I'm going to call coordination is the design of protocols that better utilize available information to coordinate spectrum access or resource access, but without explicit message passing. So there would be no message exchanges involved. On the other hand, I'm going to call cooperation the design of protocols that shape the incentives of users, agents in this case (I'm going to call the wireless or network devices agents), in such a way that their selfish behavior results in cooperative outcomes. It becomes in the self-interest of users to arrive at cooperative outcomes. So what we would like to have are devices that are able to learn to determine when cooperation or coordination may be beneficial, and if that's the case, to coordinate among themselves to select the desired equilibria, which are beneficial in terms of efficiency, fairness, as well as stability. So for example, what we show here is a wireless device that interacts with an environment that may change dynamically over time. It makes observations, and based on this information it may form beliefs about its coupling with the other users, about how its actions affect their performance in the short term but also in the long term. And based on that, it can determine the policy based on which it would like to interact with the environment in the future. So that is based not only on information about, let's say, traffic, or the private information of the user, the private valuation of the user for a particular resource, but also on how this particular user is coupled with other users in the network, how he is affected by and can affect other users. So coupling among users is key in this type of interaction. Also, we are going to do a lot on trying not only to characterize an equilibrium, but to select equilibria. And that may depend on the devices' abilities, maybe heterogeneous abilities, to acquire information and learn, but also on the performance achieved at a particular equilibrium, both from a system perspective and from the individual user's perspective. And finally, we will be interested in convergence properties, whether we can achieve local stability or global stability. My student, Yi Su, will talk much more about that in a particular setting. So to finalize this introduction, our goal is really to look at different transceivers having different private knowledge that dynamically coordinate, maybe cooperate, or maybe compete for resources such that they maximize their utilities. And what we have seen in our recent work in the last couple of years is that this can really achieve unprecedented improvements as opposed to existing solutions for protocol design, which rely on rigid, predetermined rules for interaction.
So let me start by giving a brief overview of some work that we did in this area of multi-user communication and networking. I'm showing here a couple of the students that have worked in this area, one of whom will give a presentation later. The types of problems of interest in this area are really new theory and protocol design to achieve this type of cooperation and coordination in multi-user access networks. And the vision is to better understand the coupling existing among the decisions made by the users in this type of network, and to design next-generation MAC protocols that can help us improve not only network efficiency but also fairness and stability. And last but not least, protocols which may be strategy-proof, meaning that it is really in the self-interest of the users to comply with the designed protocols rather than manipulate them. So I'm going to start with exactly this last topic. I'm going to give you an example of how we went about addressing selfish behavior in wireless networks. We know that network protocols can be easily manipulated by selfish users, which may not necessarily want to destroy the network; they may just want to maximize their own utility. So they are self-interested. And this very often leads to a result called the tragedy of the commons, which has been studied by many people, including in the CSMA/CA scenario by Cagalj and others. We have actually looked at a similar scenario in the TDMA case, but I'm going to focus today, in view of time, just on the CSMA scenario, because it's faster to explain. So really what we would like to do with this work is to come up with solutions that help us avoid these inefficient outcomes. And we would like the network designer to have a method that helps these selfish users choose cooperative outcomes. Now, you are going to say, well, we have solutions for that: pricing schemes have been designed to do just that. A widely discussed method to incentivize users to arrive at cooperative outcomes in a competitive scenario is pricing, which has been studied in microeconomics for decades. And it is known that if you do that, the outcome can achieve the system optimum. The way it works is by having prices that capture the external effects of a user's behavior on others' welfare. Hence, facing these suitably chosen, and I want to emphasize suitably chosen, prices is equivalent to taking the effect of one's behavior on others into account. So for example, what we have here is a user i who maximizes over its action set, for example its transmission probabilities in a wireless network, a payoff which is the utility of the user, for example the throughput he is going to gain, minus a cost, a price he needs to pay for using these particular resources. So by introducing this taxation, we are hoping to account for the inconvenience caused to others.
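For concreteness, the pricing scheme just described has each user solve roughly the following problem (the notation here is mine, not from the slides): user i picks an action a_i, for example a transmission probability, to maximize its utility minus a payment, where \pi_i is the per-unit price the designer sets.

```latex
\max_{a_i \,\in\, \mathcal{A}_i} \;\; u_i(a_i, a_{-i}) \;-\; \pi_i \, a_i
```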
However, how can we interpret these prices in a network scenario? For a long time, while looking at prices and taxation, I was not very comfortable with it. I didn't feel it was a good solution for wireless networks. And the main reason I started to ask this is because of how we can really interpret these prices and payments in wireless scenarios. So for example, one way to interpret them would be to view these prices as coordination signals, as done in the work by Kelly, for example, or Low. In this case, the signals are really coordination signals in decentralized methods to find a global solution to an optimization problem. However, for a pricing scheme to be effective, these users need to follow the prescribed scheme. They still need to align themselves with this prescribed rule, and hence this becomes inconsistent with selfish behavior. It also has the disadvantage that it requires users to exchange messages to determine the desired prices, which for a wireless network would be disadvantageous. Another interpretation of these prices in networks is as real payments for some good. And that's, for example, the work of Varian, or other work of Kelly, where they tried to view these payments as some form of monetary payment exchanged among users. However, the question becomes, in the networking scenario, what good do users really care about? Should it really be money? And why do users agree to pay prices? Because in order to have enforceable payments, we would really need some form of contracts. If we don't have that, then users may free-ride. And having contracts may require a lot of infrastructure additions to current networks which we might not want to add. Another problem is that if these users do indeed care about these goods, and we assume they should care, otherwise they would have an incentive to free-ride, then these payments or rewards will reduce the welfare of the users or increase the welfare of the network provider. So it becomes a difficult problem to compute the right prices for these different types of users and to come up with the right infrastructure to implement taxation and pricing. And especially if private information of users is involved, such as their valuation of resources or goods, this becomes an even more difficult problem. Last but not least, in many situations it has been shown, for example by Johari and Tsitsiklis, that price-taking behavior is not really consistent with selfish and strategic behavior. Most users will not be price-taking but rather price-anticipating: they understand how their actions will affect future prices, and based on that, they are going to derail this process of implementing prices to achieve network utility maximization. So what we searched for, for a long time, was an alternative to prices, schemes which do not suffer from the disadvantages mentioned before. So what we propose is an alternative to pricing to sustain cooperation among selfish users, for example in the case of wireless networks. What I'm going to give you today is just a simple example where we implemented this type of scenario for a slotted Aloha protocol. However, the solution that I'm going to describe next can be implemented in scenarios that go way beyond just Aloha. So again, just for illustration, I'm going to use a very simple slotted Aloha scenario, where I model the different wireless users transmitting their data as a non-cooperative game: we have a number of users; a strategy for each user, which is its probability of transmitting; and a payoff for each user, which involves an internal valuation of the resource, k_i, let's say, of how much he values transmitting a particular amount of traffic, times the probability of transmitting, times the probability of the other users not transmitting. So if you like, this is the goodput of a particular user multiplied by the valuation k_i.
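Written out (in my notation), the payoff of user i in this contention game is its valuation times its goodput, that is, the probability that its own transmission goes through while everyone else stays silent:

```latex
u_i(p_1, \ldots, p_n) \;=\; k_i \, p_i \prod_{j \neq i} \left(1 - p_j\right)
```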
>>: Can I ask [inaudible]? >> Mihaela van der Schaar: Yes, please. >>: You used the word transmission. Can we replace it with reception in this model, like the YouTube nightmare scenario where everybody wants, you know, I want my video and I don't care about him? >> Mihaela van der Schaar: Exactly. Yes. >>: Thank you. >> Mihaela van der Schaar: So it could work in a transmitter-driven scenario, but it could also be viewed from a receiver's point of view. >>: Terrific. Thank you. >> Mihaela van der Schaar: And as I'm going to show briefly a little bit later, it can even be a relay scenario. It could be a gateway somewhere in the network as well. So there could be different levels here of both competition as well as coordination among different entities: transmitters, receivers, gateways. Okay. But I chose this scenario because it's usefully simple for explaining the key idea. So in such a scenario, let's assume we have two users, but the result holds for many users as well. So I have user 1 and user 2. And what is well known in such a scenario is that the Nash equilibria of such slotted Aloha games lie on this orange line here. And as a matter of fact, the most likely outcome is the zero-zero outcome. So nobody gets any traffic through. This result is known: collisions always occur and the users actually get no traffic transmitted. So what we want is to find a method to support efficient outcomes without message exchanges or message passing, and we would like to do that without pricing. So one way to do that is through the implementation of an intervention function. So this is the currently -- yeah? >>: So that blue region is the -- >> Mihaela van der Schaar: Oh, I have to explain a little bit better. This blue region here would be all the possible outcomes, all the way to the Pareto boundary, which would be achievable if users were able to coordinate. Okay? So they would be willing to time-share the network. >>: Why is it like that? >> Mihaela van der Schaar: So in this particular case we are going to use this type of utility function, yeah. Unlike the TDMA case, where I would have something like this, where users time-share the network, in this particular case I'm not going to have coordination between users through an explicit, let's say outside, entity, but rather I'm going to have users accessing the network in a distributed, decentralized manner. >>: [inaudible] can't time share? >> Mihaela van der Schaar: Yes, I cannot time-share. Ideally, and I'm going to talk a little bit about that later, I'd like to have a way for them to coordinate among each other such that the network can operate like in the TDMA scenario and time-share. However, here everything is done like in the current 802.11a scenario, kind of a DCF scenario. It's a little bit simplified here, the slotted Aloha, I don't have sensing, but it would be quite similar to that. So what I have is users not sharing any information, just determining their transmission probabilities. So this is the current contention game, as used in current wireless networks like 802.11 networks.
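To see why the plain contention game collapses to the zero-zero outcome, here is a minimal sketch (the valuations and starting probabilities are made-up numbers, not from the talk): since u_i is increasing in p_i, myopic best responses drive every user to transmit always, and every payoff goes to zero.

```python
import numpy as np

# Minimal sketch of the plain slotted-Aloha contention game; the valuations
# and starting probabilities below are assumptions for illustration.
k = np.array([1.0, 1.0])            # internal valuations k_i
p = np.array([0.3, 0.6])            # initial transmission probabilities

def payoff(i, p):
    """u_i = k_i * p_i * prod_{j != i}(1 - p_j): valuation times goodput."""
    return k[i] * p[i] * np.prod(np.delete(1.0 - p, i))

# u_i is linear and increasing in p_i whenever the others' p_j < 1, so each
# user's myopic best response is p_i = 1: everyone transmits in every slot.
for i in range(len(p)):
    p[i] = 1.0

print(p, [payoff(i, p) for i in range(len(p))])   # -> [1. 1.] and [0.0, 0.0]
```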
And what we would like to do is to introduce a new type of game, an intervention game. I'm going to call it for now a Stackelberg contention game, where the following will happen. I'm going to augment this type of game with a manager, or policer, and what this policer does as a first step is announce an intervention function, or this intervention function could be known a priori. So he will say, I would like you to behave according to this rule g. I'm going to go into more detail about how this intervention function can be defined in just one second. But let's assume that I'd like the users to, for example, behave cooperatively such that they can operate on the Pareto boundary, and I'm announcing this rule. Then the different users, knowing this particular rule g, will determine their transmission probabilities. However, only if they do not comply with my rule g will this policer intervene and transmit. So what I'm going to introduce is a method to really tell users what is desired from them, and in case they do not perform according to my desired rule, I'm going to punish them. Let us go and understand this a little bit better. So I'm going to give an example. For this particular game I chose a total relative deviation. So I would like to have an intervention function where the total relative deviation of all the users from their prescribed levels is minimized, for example. And what I'm showing here is the utility of a particular user, user i. Ideally, what he would like to do to maximize his utility, if there is no intervention function, is to always transmit. So the best thing for a user is to always transmit, as we have shown on the previous slide. However, what I'd like the user to do is to transmit only a certain amount, such that the users actually time-share this network and nobody cannibalizes the network by transmitting all the time. The way to do that is by implementing an intervention function which shapes the utility function of the user. Conventionally, if I don't have an intervention, the user's utility will just increase as his probability of transmission increases. However, if I'd like the user to operate at this particular point, such that he is time-sharing the network, then what will happen is that the payoff is shaped by this intervention function. So this red curve here indicates the new, shaped payoff that results from the intervening user. It shapes the utility function of the users such that it becomes in their own self-interest, when they try to maximize their utility, to operate at the desired operating point. What is interesting to see, and this is important, is that the level of intervention is zero at equilibrium. There really will be no intervention. The intervening user will not intervene; he will only do that when a user misbehaves, when he starts to transmit above his prescribed transmission probability. So [inaudible] it is in the self-interest of the user, maximizing his utility, to operate at the prescribed level, and hence this intervention will not take place but will only serve as a threat.
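A minimal sketch of how such a total-relative-deviation intervention can shape the payoff follows; the target probabilities, valuations, and the steepness constant are my assumptions, and the exact intervention function used in the actual work may differ. The point it illustrates is that the shaped payoff peaks at the prescribed probability and the intervention is zero there.

```python
import numpy as np

p_target = np.array([0.5, 0.5])   # prescribed probabilities (assumed targets)
k = np.array([1.0, 1.0])          # valuations, made up for illustration
c = 4.0                           # intervention steepness, a free design knob

def intervention(p):
    """Policer's transmission probability, driven by the users' total
    relative deviation above their prescribed levels (zero at the target)."""
    dev = np.sum(np.maximum(0.0, (p - p_target) / p_target))
    return min(1.0, c * dev)

def shaped_payoff(i, p):
    others = np.prod(np.delete(1.0 - p, i))
    # the policer's transmission collides with user i's packets
    return k[i] * p[i] * others * (1.0 - intervention(p))

# Sweep user 0's probability while user 1 stays at its target: the shaped
# payoff now peaks at p_target[0] instead of increasing all the way to 1.
for p0 in [0.3, 0.4, 0.5, 0.6, 0.7]:
    print(p0, round(shaped_payoff(0, np.array([p0, p_target[1]])), 3))
```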
So if you look again at the differences between this type of work and conventional work done by [inaudible] people, it is really quite different from conventional Stackelberg games, and after we named it a Stackelberg game we almost regretted calling it Stackelberg, because it may be confusing. In conventional Stackelberg games, what people do is the following. They have a leader, and this leader takes a fixed action. And given this fixed action, the users, the followers, respond. The manager is really the leader, and he will use resources. So if you look at existing games using Stackelberg strategies, the manager uses resources to implement the Stackelberg strategy. However, in our case what happens is quite different. First, what the leader, the manager, does is follow a contingent plan depending on the followers' choices. He may decide whether to intervene, and how much to intervene will depend on how much the users access the network, on their probabilities of access. Also, the manager may not intervene at equilibrium; only a threat will exist. So resources are not actually used, and there is no overhead involved in putting in this manager. Also, I want to say that here, for simplicity, I assume there is a manager, but this can also be implemented in a distributed way. You don't need to have a manager. So alternatively, this contention game, as opposed to conventional games, can be viewed as a Stackelberg game with multiple leaders. Conventionally, we have one leader and multiple followers. Here we have the reverse: multiple leaders and a single follower, the manager. And the manager decides how he will eventually punish the users depending on what the leaders, which are the users, have done. So this actually gives users incentives to behave appropriately. And this type of method can be used well beyond the simple scenario I described here. We have looked at ad-hoc networks, multi-hop networks, and more recently, for example, even at mitigation of attacks by malicious users. And it can serve the purpose of coordination as well as providing incentives for cooperation, but that was not shown here. However, I'm going to talk briefly about coordination now. So let's assume that we are not concerned with users manipulating their transmission probabilities, but rather we would like to have them coordinate, to address the question that Phil just asked before. What happens if we would like them to operate like in the TDMA scenario, to time-share the network, but without having a controller and without having message exchanges? How can users coordinate in a distributed manner to access the spectrum and share it efficiently? There is a tradeoff between the degree of coordination and the amount of communication overhead. If you look at the current solutions, we have on one hand TDMA, where coordination is achieved by a central scheduler. So we have both message exchanges and an infrastructure. On the other hand, if you look at slotted Aloha and CSMA protocols, then everything is distributed, everything is decentralized, and there is much less coordination, which means users will very often collide. So the performance will be less than in the TDMA case. What we would ideally like to have are protocols that are distributed and decentralized like these, but which achieve that type of coordination. And we are able to do that. The way to do it is through protocols with memory. So in the course of the interaction, users receive different types of feedback. I'm going to give here just one example.
They may know whether the channel was idle, whether they sent a packet and it was successful, or whether there was a collision. Of course, you could have more refined feedback, and it could be taken into consideration to further improve the network, but for now we are going to assume just such a ternary feedback. The probability of a user transmitting is then determined by both its recent transmission decisions and the feedback information. When designing the protocols, we looked at two things. On one hand, we would like to maximize the throughput, to be as good as TDMA in terms of throughput, but we would also like a low delay. And we define delay as the average number of slots that a user needs to wait, from a randomly chosen slot, until its first successful transmission. So coordination can be achieved using memory in our method by correlating the successful users in consecutive slots. What I'm going to do is something very different from what current MACs, for example the DCF, do. We are going to have a successful user, a user which has just successfully transmitted data, transmit with a higher probability, while the other users, which are not currently transmitting, transmit with a lower probability. And of course, as we allow a longer capture of the channel by one user, the total throughput will increase while the delay will also increase. So we have a tradeoff between, on one hand, efficiency, and on the other hand, delay, or if you like, equity among the users, because they are not able to access the channel as fast. Also, as I'm going to show briefly on the next slide, with proper design, a fully distributed MAC protocol with memory N minus 1, where N is the number of users, can achieve outcomes in a decentralized way which are as good as a centralized solution. So let us consider protocols with memory 1, just for simplicity. What I show here is, on one hand, the total throughput, as a percentage, so it's the total channel utilization, and on the other axis the expected average delay, and here is TDMA. TDMA fully utilizes the channel and has minimum delay for the users. However, again, as we said before, it involves some form of message exchange as well as an infrastructure. What we have here in red are the protocols currently used, without memory; for example, what is drawn here is an 802.11 DCF protocol. In green I'm showing a state-of-the-art solution from Columbia University which uses some heuristic methods to exploit some form of memory. And what I'm showing here in blue is our first algorithm, which uses just a memory of 1. What you can see is a tradeoff between having, on one hand, a low delay but a limited throughput, versus having almost as good a throughput as TDMA but with a higher delay. Here I'm showing only the protocol with memory 1, but we now have protocols that are able to bring this curve toward the TDMA point. Now, why is the curve shaped like this? By the way, this is for five users, again with this ternary feedback. For this part of the curve, the reason it is going down is that a success lasts for only one slot.
Hence, if there are many idle and collision slots between successful transmissions, then as the total throughput increases, the idle and collision slots are reduced, and as a result the delay will decrease. So that's the reason why the delay decreases here. On the other side, however, I'm going to have more successes and fewer collisions, but a success of a user lasts for several slots. So there is a small number of idle or collision slots between two streaks of success. This leads to a higher throughput, but also to more delay. So you can see there is this tradeoff between increasing the throughput and, beyond a point, increasing the delay. >>: [inaudible] you will give a more [inaudible]. >> Mihaela van der Schaar: No. This is work done by another student, Joe Park. >>: Okay. So maybe. >> Mihaela van der Schaar: We can refer you to the details of all of this in some of our publications. >>: Okay. >> Mihaela van der Schaar: Love to. >>: Maybe you can basically just briefly explain one of the collaboration protocols. So you basically remember, in the last slot when you collide, whether the channel is [inaudible] or something, right? >> Mihaela van der Schaar: Yeah. So I know the following. I know whether the spectrum is idle, whether nobody is transmitting. I will know whether I had a successful transmission in the previous time slot. And I will know whether there was a collision. >>: Okay. >> Mihaela van der Schaar: Very similar to the current protocols we have today, like CSMA. >>: [inaudible] which is facing a channel that's occupied by someone else. >> Mihaela van der Schaar: I could -- okay. So what I have right now is idle or not. So I know whether the channel is occupied or not. Yeah. >>: Okay. >> Mihaela van der Schaar: And if I transmit, I know whether I was transmitting successfully or not. >>: [inaudible]. >> Mihaela van der Schaar: Yeah. So in the idle case I know whether it is used or not. >>: Okay. And what are the decisions -- >> Mihaela van der Schaar: Okay. Okay. So the decisions are as follows: if a user successfully transmitted in the previous time slot, then it will have a higher probability to transmit again. >>: Okay. >> Mihaela van der Schaar: So it captures the channel. On the other hand, users which are not transmitting at the current moment in time will transmit with a low probability. >>: [inaudible]. >> Mihaela van der Schaar: And note that this is quite different. This is quite different from current protocols. Yeah, yeah. Sorry. Again, [inaudible] time for all of the details, but we would love to show you more of them. Okay. So if you look at this set of protocols with memory, we have the advantages of both worlds. On one hand, performance improves if you increase the memory. I just showed you memory 1, but if you increase the memory, we are going to achieve performance as good as TDMA without message passing. So the message overhead will still be kept small. On the other hand, if you are concerned about the amount of memory, then you may get performance which is only in this region, as I showed here. With memory 1, you will need to make a tradeoff between, on one hand, a throughput which may be as good as TDMA, versus delay. However, if you increase the memory, this curve moves towards TDMA.
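A minimal simulation sketch of the memory-1 rule just described follows; the number of users and the two probabilities are assumptions for illustration, not the optimized values from the actual design. A user that just succeeded keeps the channel with high probability, and everyone else backs off.

```python
import numpy as np

N, SLOTS = 5, 200_000            # five users, long horizon (both assumed)
P_HIGH, P_LOW = 0.95, 0.05       # capture vs. back-off probabilities (assumed)
rng = np.random.default_rng(1)

last_success = -1                # who succeeded in the previous slot (-1: nobody)
successes = 0
for t in range(SLOTS):
    # memory 1: the previous slot's successful user transmits with P_HIGH,
    # everyone else with P_LOW (ternary feedback: idle/success/collision)
    probs = np.where(np.arange(N) == last_success, P_HIGH, P_LOW)
    tx = rng.random(N) < probs
    if tx.sum() == 1:            # exactly one transmitter: a successful slot
        last_success = int(np.argmax(tx))
        successes += 1
    else:                        # idle slot or collision: nobody holds the channel
        last_success = -1

print("channel utilization:", successes / SLOTS)
```

Raising P_HIGH lengthens the capture streaks, which pushes utilization toward the TDMA point at the cost of longer waits for the other users, which is exactly the throughput/delay tradeoff on the slide.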
So we introduced these memory-based transmissions in MAC protocols, and they require much less overhead, actually no message overhead, compared to TDMA-type scheduling. And by varying the amount of memory, we can obtain a variety of performances in both delay and throughput. These protocols with memory can be applied to various scenarios besides the example I just gave; for example, we have used them for event networking as well. Now, to go back to a question I got before from the audience, some of these methods can be implemented not only at a transmitter side or a receiver side, but even within a network. For example, another student of mine looked at how we can do distributed spectrum management in relay networks. So you may have a base station here, and we may have multiple users transmitting through each other to that base station. The goal in this project, which is a kind of new project for us, was to have users autonomously determine their power. And what we would like them to do is spatially reuse the spectrum, so we allow simultaneous relay transmissions by the users. Also, we use amplify-and-forward, which is low cost, unlike, for example, the decode-and-forward methods used by other people. We let the nodes smartly interact with each other, meaning that they optimize their use of the spectrum in response to interference. Hence, in order to optimize such a setting, we again rely on these non-cooperative approaches: we model this as a game where the relays are the players. They try to determine their power allocation on the various channels in order to maximize their achievable rates. So every user i will try to determine the power allocation which maximizes his achievable rate given the interference of the other users, given the power allocated by the other users. What we have proved for this type of scenario is that in such settings at least one equilibrium exists; we determined the conditions for convergence in such settings and designed low-cost protocols that are able to converge. And the performance of these protocols was shown to be better than conventional approaches such as equal power or TDMA. Also, if you compare with other existing solutions for spectrum management in networks, for example work that does spatial reuse of the relay slot, or auction-based power control, or simple two-user settings where people have used much more complex schemes such as decode-and-forward, what we can see is that in terms of spectral efficiency, signaling overhead, and cost, our method has the best performance-versus-cost tradeoff. We achieve a high spectral efficiency at low signaling cost, actually we don't have any signaling cost, so it is a fully decentralized framework, and it is very low cost because we rely on just amplify-and-forward techniques.
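In the game just described, each relay i, taking the others' powers as given, solves something like the following best-response problem (my notation; SINR stands for the signal-to-interference-plus-noise ratio on frequency f, whose exact form for amplify-and-forward relaying is more involved than written here):

```latex
\max_{\; p_i^f \,\ge\, 0,\;\; \sum_f p_i^f \,\le\, P_i^{\max}} \;\; \sum_f \log_2\!\Big(1 + \mathrm{SINR}_i^f\big(p_i^f,\, p_{-i}^f\big)\Big)
```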
Now, let me move to a different topic. I talked quite a lot about multi-user communication, but what I want to do now is talk briefly about power-constrained transmission in wireless media networks. The work here was done by a set of different students. The main goal was to drastically reduce power in media systems, and to be both application-aware and to come up with designs that reduce both the transmit and the processing power. And the solution was really, in order to deal with the dynamics introduced both by the application and by the dynamics in the power consumption or the wireless channel, to learn these dynamics online and come up with a run-time stochastic optimization that tries to maximize the quality of the video data, for example, subject to whatever power constraint we may have. The overall goal here was to come up with a shift in the current design of media systems, where layers are now able to interact with each other. It's a different type of interaction than in the previous slides. There, users were competing with each other; here we have different layers of the protocol stack, so if you like, the different agents are cooperative. They really try to cooperate with each other. They take into consideration the dynamics at the different layers, and based on that information, they reason and interact. So the idea is to have, based on this interaction, learning methods that lead to smarter designs at the different layers of the system stack. Just to give an example, here the layers are no longer the application layer, the MAC layer, and the physical layer; the layers we talk about are the application layer, the OS layer, and the hardware layer. And our focus is again on trying to maximize, for example, the media quality under the traffic dynamics, given whatever power constraints you may have. The types of problems we have looked at were both rate-distortion and power control, as well as power management, for example voltage scaling, and also resource allocation and scheduling in the case where we have multiple tasks. We have also looked at the case where the hardware is faulty, so we may have errors in the hardware. In that case, what we would like to do is still be able to utilize these ICs, rather than throwing them away, and try to compensate for the errors at the higher layers. If you look at the common solutions used to deal with this system problem, the key idea for us was to no longer act based just on current information, but rather to consider how the current dynamics will impact future dynamics as well as future decisions. How, for example, a decision to switch to a different power level, or to switch off my RF at this moment in time, will impact not only the current performance but both the future performance and the future costs. This is what we call foresighted decision making. To deal with these dynamics online, we rely on reinforcement learning solutions. And this is different from the conventional approach. When people talked about learning in the past, they mainly talked about model-based learning, learning based on, for example, maximum likelihood estimation, or even our own work on adaptive linear prediction. So most of the time what we have is a layer, for example the application layer or the OS layer, looking at a particular type of dynamics, for example traffic dynamics or CPU dynamics, estimating the current conditions, and based on that determining a local policy, local to this layer, and deciding the best strategy to act upon at this particular layer. Rather, the proposed solution relies on repeated interactions among these agents and the unknown dynamics at the different layers of the protocol stack. Also, we rely on model-free learning techniques, reinforcement learning techniques.
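As an illustration of what model-free, foresighted learning can look like, here is a minimal Q-learning sketch; the state space, the toy environment, and every constant are my assumptions, not the actual system model from the talk.

```python
import numpy as np

N_STATES, N_ACTIONS = 4, 3          # e.g. buffer levels x power modes (assumed)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1   # learning rate, discount, exploration
rng = np.random.default_rng(2)
Q = np.zeros((N_STATES, N_ACTIONS))

def environment(s, a):
    """Toy stand-in for the unknown dynamics: the state is a buffer level,
    the action a power mode that can serve up to `a` packets per slot."""
    served = min(s, a)                 # media packets actually sent
    reward = served - 0.3 * a          # quality gained minus a power cost
    arrival = int(rng.random() < 0.6)  # random traffic arrival
    return reward, min(N_STATES - 1, s - served + arrival)

s = 0
for t in range(20_000):
    a = rng.integers(N_ACTIONS) if rng.random() < EPS else int(np.argmax(Q[s]))
    r, s_next = environment(s, a)
    # foresighted update: the current action is valued by its immediate
    # reward plus the discounted value of the state it leads to
    Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])
    s = s_next

print(np.argmax(Q, axis=1))          # learned power-mode policy per buffer level
```

The discount factor GAMMA is what makes the policy foresighted; setting it to zero recovers the myopic behavior that reacts only to current information.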
So we are going to have interactive learning; we are going to try, without having a model, to learn these different dynamics online, and based on that, the different layers are also going to be able to interact with each other and exchange messages. Our focus in this work is on the minimum amount of message exchange needed among layers to maximize the utility across the different layers of the protocol stack. The reason this is challenging is that the different layers of the protocol stack operate at different time scales. For example, the application layer operates at a very different scale than the OS. Hence, what we would like to do is deal with this kind of asynchronous behavior and have a distributed control which is able to achieve the maximum performance. Fangwen will briefly talk about some of this work, but not in the context of this problem, rather in the context of media transmission. Let us look at how good these methods really are. What I compare against here is mainly work done by Professor [inaudible] and others at UIUC, where they adapt all the different layers of the protocol stack, but they do that myopically. They just look at what happens and react based on that. If you look at what happens in the particular scenario considered here, the performance of the transmitted video is only 31 dB, so it's really quite, quite low. On the other hand, if you look at the performance we can achieve by applying these foresighted decisions, the performance is significantly improved. And the reason is as follows: we are able to capitalize on the fact that the actions we are making now will impact future decisions, and these future decisions will have implications at other layers of the protocol stack as well. So this becomes very important. It's not only about cross-layer optimization, but also about understanding how decisions at one layer impact the decisions at the same layer over time, but also the decisions at other layers over time. Finally, let me move to another topic, peer-to-peer networks. We did quite some work here in recent times with a student who just graduated. The main idea there was not necessarily to look at media, but rather to come up with better resource reciprocation strategies: how users can determine their upload and download rates in dynamically changing environments. If you like, coming up with a better BitTorrent-type algorithm. So what we wanted was an optimal solution for resource reciprocation among peers that are interested in the same content. We would like a method that is both rigorous and analytical, not only simulation-based, so that we can predict the outcomes we are going to get. And it should be able to deal with dynamic interactions, peers coming and going, as well as with heterogeneous peers, peers having different upload and download rates, that are self-interested, try to maximize their own utilities, and act autonomously. So it is a distributed scenario. The solution there was to model this type of reciprocation as a stochastic game where the peers are the players. The actions of the users playing the game are how much of their resources they will reciprocate and to which users, and also which users they should choke or unchoke. And they determine that, in the framework of the stochastic game, by making foresighted decisions. They not only look at the current impact on their performance, but also at how their selection of actions, which peers to choke or unchoke, to whom they should transmit more or less information, will have an impact not only in the short term but also in the long term. So the goal is again to maximize the utility among these repeatedly interacting peers. And of course, in order to do this long-term optimization, you really would like to have an efficient and robust method to model the peers' behavior. We do that again using online learning.
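One hedged way to write down the foresighted reciprocation just described (my notation, not necessarily the paper's exact model): each peer i solves a discounted dynamic program over states s that summarize the observed reciprocation behavior of its associated peers, with actions a_i being its upload allocation and choking/unchoking decisions, r_i the immediate download benefit, and gamma the discount factor that makes the decision foresighted:

```latex
V_i(s) \;=\; \max_{a_i \,\in\, \mathcal{A}_i(s)} \Big[ \, r_i(s, a_i) \;+\; \gamma \sum_{s'} \Pr\big(s' \mid s, a_i\big)\, V_i(s') \, \Big]
```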
The good news about this is that more recently the computer science department became interested in implementing some of these frameworks, so they have implemented this on PlanetLab for us, and the current results we have are very promising: significant improvements compared to methods such as BitTorrent, BitTyrant, and other existing methods. Okay. So this is really the last topic I'm going to talk about. This is again a more recent topic. The idea here is to come up with some form of content discovery, if you like, in the case of distributed network processing units. You can think of having multiple gateways, for example, everybody in the home having a gateway, each containing a particular amount of video data. And you would like to determine, say, that you're interested in skating. So you would go through a set of queries: whether it's a team sport, whether it's a winter sport, and all these different queries are linked together. What we have shown is that these complex classifiers can be decomposed into cascaded topologies of such simple binary classifiers. Even more interestingly, these classifiers may be located at particular processing nodes. For example, one entity is specialized in answering one type of query, while another entity is specialized in answering a different type of query, and these nodes are not co-located; they may be at different locations in the network. Then what we have are processing operators that can be instantiated on distributed, remote devices, and these devices may have their own resource constraints on answering a particular query or on processing a large number of queries in a certain amount of time. And you can think of not only one query being sourced at a particular moment in time, but maybe multiple queries being sourced simultaneously. Hence, these processing nodes may be congested. The problem then becomes: can we maximize the joint classification quality across a large number of queries subject to the resource constraints of the processing nodes, delay constraints, since I may want to answer a particular query within a certain amount of time, maybe a couple of seconds or a minute, and the dynamics of the traffic coming through the different classifiers? And also, since these classifiers may be located at different places, how am I going to deal with these distributed decisions? Because at a particular node I have decisions to make locally. I can decide, for example, to operate at a higher accuracy, to come up with a better classification of the data, but this may take a longer amount of time for processing. So it's a tradeoff between the accuracy of processing the data and the time associated with processing it.
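A toy sketch of that per-node accuracy/delay decision for a fixed chain of binary classifiers follows; all the operating points, processing times, and the delay budget below are made up for illustration.

```python
from itertools import product

# Each node in the cascade offers a few operating points:
# (probability of a correct binary decision, processing time in ms).
node_operating_points = [
    [(0.90, 5.0), (0.95, 12.0), (0.99, 30.0)],   # e.g. "team sport?"
    [(0.85, 4.0), (0.93, 10.0), (0.98, 25.0)],   # e.g. "winter sport?"
    [(0.88, 6.0), (0.96, 15.0)],                 # e.g. "skating?"
]
DELAY_BUDGET = 40.0     # answer the query within 40 ms (an assumption)

best = None
for choice in product(*node_operating_points):
    quality, delay = 1.0, 0.0
    for acc, t in choice:
        quality *= acc      # end-to-end quality: all stages must be right
        delay += t          # end-to-end delay: stages run in sequence
    if delay <= DELAY_BUDGET and (best is None or quality > best[0]):
        best = (quality, delay, choice)

print("best configuration under the delay budget:", best)
```

The real problem adds the routing/topology decision and the traffic dynamics on top of this per-chain choice, which is why the brute-force enumeration above is only an illustration of the tradeoff, not of the actual distributed algorithm.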
Moreover, for certain types of queries, even the order of these classification decisions can be changed. So what we have is a joint decision: on one hand a topology, so if you like a routing problem, but one which is combined at this stage not only with routing, a certain topological mapping of these different classifiers, but also with the processing performed at every classifier. As opposed to conventional routing problems, the problem is not only how to route the data, but also how I should instantiate the operating point at every classifier, because each one of those leads to a particular processing delay, and hence to a buffering delay across the chain. So we came up here with new solutions for stream mining: distributed routing and processing algorithms, as well as optimization and online learning. Some of this work has been done in collaboration with IBM Research Watson. Now, there are other topics that I didn't discuss here. For example, we did quite a lot of work together with Intel on multi-hop wireless networks, mesh networks, mainly for campuses and larger homes. We did some work on cognitive radio networks, by Fangwen here, but he will not talk about that. We looked also at wireless media communication; Fangwen will talk about this in his talk. Also, we looked at new cross-layer designs, methods to do cross-layer design, as well as, more recently, at some new solutions for doing reconfigurable coding. I think Gary [inaudible] talked a little bit in his talk about this, a new paradigm for video compression. So to finalize: many people are asking, well, how can we come up with a new clean-slate design for the Internet, for example, or for networking, and how can we really catalyze a new generation of algorithms, systems, and designs? It is our view that we don't necessarily want to throw away protocols and throw away designs, but rather to reengineer the existing protocols and make them operate as markets, where devices, based on the private knowledge they have, can dynamically coordinate, cooperate, and compete. So we don't have these dumb devices, but rather smart devices which can cooperate and compete, and maybe we even design the objective functions of these users such that they are aligned with whatever objective a network provider may have. Based on our results, we have shown that this really improves the performance of the network significantly. And again, Yi will go into a little bit more of how this is possible. Our hope is really that these devices will no longer be passive transmitters or receivers or relays, but rather they will evolve and become smarter, because they need to compete against other users or coordinate with other users, and there is an online learning process taking place. So the hope is that this will lead to theory as well as practical designs and add new dimensions to both communication theory and network theory, and, we increasingly believe, to game theory as well, because there the focus has been much less on equilibrium selection and much more on equilibrium characterization. Also, the work there has not considered constraints in terms of memory or in terms of message exchanges, which are very important for communication problems.
Finally, I know I skipped through a lot of the material, even though I talked way too long. If you're interested, I'd like to ask you to please go to our website, and from there you can find a link to our publications. And if you have any questions or comments about any of our work, we will be delighted to answer them. >>: You really have too much material. [applause]. >> Mihaela van der Schaar: Too long. >>: I'm afraid to ask you questions because I think that will stretch the talk way too long. Actually, I would like to learn more about the [inaudible] especially in the [inaudible]. But I mean, I realize [inaudible]. >> Mihaela van der Schaar: Too long. >>: For me to interrupt. >> Mihaela van der Schaar: Maybe we can have the discussion after the lunch. >>: [inaudible]. >> Mihaela van der Schaar: So I guess you either can come and visit us or invite us -- if you are interested in -- >>: I will find out -- >> Mihaela van der Schaar: If there is a particular topic. So my purpose was to show you different topics, but if you are interested in a particular topic, either I, or I together with a student, could come back here and give you more details. Also, Yi will talk now about some other types of protocols than the ones shown here, which are also able to improve the performance in wireless networks. So that may be interesting, and it will go into much more detail. Thank you. >> Yi Su: Good morning, everyone. My name is Yi Su, and I'm a graduate student in the electrical engineering department at UCLA, working in Professor van der Schaar's group. It is my great pleasure today to present my research work at Microsoft Research. The topic of today's presentation is new perspectives on multi-user communications, which summarizes our most recent progress in understanding how heterogeneous communication devices' information availability, decision making, and learning capabilities impact the performance of communication systems. So we start from a very high-level abstraction of the existing literature in multi-user communication. As Professor van der Schaar just mentioned, there are two main categories of existing approaches. The first is called cooperative: sometimes you have a centralized manager which collects the information from all the communication devices and performs some centralized optimization to maximize a certain system-wide objective function. And sometimes we can also have a non-cooperative approach, where the outcome of such a multi-user [inaudible] corresponds to the Nash equilibrium, and it is well known that the Nash equilibrium might not be efficient. So here is the information availability dimension. The existing research, however, focuses on two extremes of information availability: either local information or global information. In reality, we know that these communication devices might be built based on different standards or algorithms, and based on their hardware constraints, the ability to make decisions can also be different for different devices. So our question is: what if this happens, and what will the performance of the communication system be like? In this talk we will use two illustrative examples, including multi-user wideband power control and also wireless random access. So here is the structure of today's talk. We will discuss two kinds of communication devices.
We classify the communication devices into myopic users and foresighted users, and we will give the definition of this myopia and foresightedness later. We also have different assumptions about the information that these foresighted users possess, and we will see how, when the information changes, the system performance varies. For the first example, we use a multi-user power control scenario. Basically, we investigate this problem in the frequency-selective interference channel setting, which is a common setting in both wired DSL systems and wireless OFDM systems. In this problem, the individual communication devices want to optimize their transmit power spectral density functions in order to increase their achievable rates in the system. And we are going to use a game-theoretic approach to study the performance. So here is the system diagram. It is basically a frequency-selective interference channel, so X_i^f represents the transmitted signal from transmitter i at frequency f. H_ij^f represents the cross-channel gain from transmitter i to receiver j. It's frequency-selective, meaning that this H_ij^f is not a constant; it's a function of the superscript f, so it varies from frequency to frequency. The received signal Y_i^f at receiver i is basically composed of the desired signal from its own transmitter, plus the interference signals from the other transmitters, plus this sigma_i^f here, where sigma_i^f is the noise spectral density at the receiver side. So if we look at user K's received spectrum, it consists of several parts. The first part is the noise spectral density at the receiver, indicated by the red curve. The second part is the interference the other transmitters cause to him; basically it's the white space between the blue curve and the red curve. And he also receives his desired signal from his own transmitter. So each individual user needs to determine how much power to allocate in the different frequencies; basically, he needs to determine this P_K^f here, which represents user K's allocated power in frequency f. And each user is subject to a total power constraint: the summation of the allocated power over the different frequencies needs to be less than or equal to this total power budget. >>: [inaudible] can you assume the message is delivered to one user or you can send multiple messages, multiple [inaudible]; let's say I have 10 messages in my queue, which is how it [inaudible]; are you assuming in the allocation talking to multiple of them, or are you basically just assuming [inaudible] one [inaudible]? >> Yi Su: Oh, he just needs to talk to his own receiver. >>: Okay. So basically that's one receiver. So basically here he always has one sender and one receiver? >> Yi Su: Right. But they are transmitting simultaneously, so they are creating interference to each other. So for the utility we define here, we assume that each user simply treats the interference from the other users as noise, so the total achievable rate it can successfully decode is actually given by this formula.
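Reconstructed from the definitions just given (treating interference as noise), the formula on the slide is presumably of this form, together with the per-user power budget:

```latex
R_k \;=\; \sum_f \log_2\!\left(1 + \frac{|H_{kk}^f|^2 \, P_k^f}{\sigma_k^f + \sum_{j \neq k} |H_{jk}^f|^2 \, P_j^f}\right),
\qquad \sum_f P_k^f \;\le\; P_k^{\max}
```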
We have a number of players; the action set is given by this expression, meaning that each user needs to choose a transmit power spectral density function satisfying the total power constraint; and each user's payoff is given by his achievable rate. Now we briefly review the existing solutions to this wideband power control scenario. The first kind of approach is the non-cooperative approach, and the most famous algorithm is the iterative water-filling algorithm, proposed by people from Stanford. From a single user's perspective, given the noise and interference power spectral density, the best thing he can do is simply perform water-filling. Iterative water-filling is just the multi-user version of this single-user water-filling. The basic procedure is the following: every user randomly picks some feasible power allocation; initially this allocation may not be the best response to the interference the other users cause to him. At each iteration, each user updates his power allocation to the water-filling solution with respect to the interference the other users cause to him. Once he updates his power allocation, he creates a new pattern of interference to the other users, and they perform this iteration over time. People have shown that under mild channel conditions this iterative procedure converges to a unique Nash equilibrium: this is the so-called iterative water-filling solution. In this scenario, each transmitter only needs to gather feedback from its own receiver about the interference pattern the other users cause to him, so no information is exchanged between different transceiver pairs. Okay. The other type of approach is the cooperative approach. In this case they need a system manager, which collects all the cross-channel coefficients from the different transceiver pairs, and this centralized controller then solves a weighted sum-rate maximization problem. Please note that this problem is actually a non-convex problem, which is very hard to solve, but people use a dual approach to handle it. This figure shows the performance comparison between the non-cooperative approach and the cooperative approach. We can see that the cooperative approach needs a lot of information exchange, but its performance is much better than the best known solution in the non-cooperative scenario today. So our question is: if we only focus on the non-cooperative setting, where users only have local information about the interference pattern the other users create for them, can we do better than iterative water-filling? Now we will assume that some user gets smarter: he has more information about the water-filling game and about how he should play in such a non-cooperative setting. This is the basic question.
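A minimal sketch of the iterative water-filling procedure just described -- not the authors' implementation; the bisection on the water level and the helper names are my own:

```python
import numpy as np

def waterfill(floor, p_total):
    """Single-user water-filling against a fixed noise-plus-interference
    floor (already normalized by the direct channel gain)."""
    lo, hi = 0.0, floor.max() + p_total          # bracket the water level
    for _ in range(60):                          # bisect on the level mu
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - floor, 0.0).sum() > p_total:
            hi = mu
        else:
            lo = mu
    return np.maximum(lo - floor, 0.0)

def iterative_waterfilling(H, sigma, p_total, iters=50):
    """Every user repeatedly water-fills against the interference the
    others currently cause to him (no message exchange needed)."""
    N, F = sigma.shape
    P = np.full((N, F), p_total / F)             # arbitrary feasible start
    for _ in range(iters):
        for k in range(N):
            interf = sigma[k] + sum(np.abs(H[i, k]) ** 2 * P[i]
                                    for i in range(N) if i != k)
            P[k] = waterfill(interf / np.abs(H[k, k]) ** 2, p_total)
    return P
```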
To build some intuition, consider a very simple two-action game with a row player and a column player: the row player can choose to play up or down, and the column player can choose to play left or right. In each box, the first element represents the payoff of the row player and the second element the payoff of the column player. As we can see here, 2 is larger than 1 and 4 is larger than 3, so if the row player wants to play the Nash strategy, he will definitely choose to play down. But if he plays down, then the column player, because 1 is larger than 0, will choose to play left, so they end up at the down-left play, which yields the joint payoff (2, 1). However, as we can see, up-right actually results in a strictly better payoff for both users, if only the row player chooses to play up. So from this simple game we see that down-left is the Nash equilibrium; yet if the row player gets smarter and chooses to play up, both users end up with a better outcome. Which means: if a user is myopic, he just takes the actions of the other users as given and plays the Nash strategy, and the outcome will not be efficient. However, if the user becomes foresighted and knows the actual structure of the game being played, he might choose another action that benefits himself. This is the basic intuition from this simple game. Okay. Then we go back to the original wideband power control scenario, and to investigate this foresightedness we first need to define the so-called Stackelberg equilibrium. For a Stackelberg equilibrium we have one foresighted user and multiple myopic followers. The foresighted user determines his action by the following rule: once he picks an action, he is aware of the responses of all the myopic followers, and he chooses the action a_K* that leaves him a utility no worse than any other feasible action in his action set. Meaning: the foresighted user is aware of the reactions of the myopic users, and he simply chooses the action that brings the best performance for himself. Okay. We prove that a Stackelberg equilibrium always exists in the power control game. This is simple, because the followers' response mapping is continuous and the utility is upper bounded. The upper bound is the single-user water-filling bound, corresponding to the case where the other users create no interference to the foresighted user: the best rate the foresighted user can get in an interference-free environment. As long as the other users create interference to him, his achievable rate is less than this upper bound. Okay. The next question is how we can find the Stackelberg equilibrium, and how much better it is than the Nash equilibrium -- intuitively, it should not be worse. We start from a very simple two-user scenario; it can be extended to the multi-user version. We can formulate it as a bi-level programming problem. Here user 1 is the foresighted user: he needs to determine his power allocation, which is a vector, to maximize his own achievable rate. Please note that when user 1 forms this program, he needs to consider the reaction of user 2, because user 2 will simply choose the water-filling solution. So basically this bi-level program consists of two subproblems: an upper-level problem and a lower-level problem. In the original iterative water-filling solution, the user only solves the upper-level program and totally ignores the lower-level problem.
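In the notation used earlier -- my own rendering, with WF(·) standing for the follower's water-filling reaction -- the two-user bi-level program has roughly this shape:

```latex
\max_{\mathbf{p}_1 \,\ge\, 0,\ \sum_f p_1^f \,\le\, P_1^{\max}}
  \ \sum_f \log_2\!\left(1 + \frac{|H_{11}^f|^2\, p_1^f}
                                  {\sigma_1^f + |H_{21}^f|^2\, p_2^f}\right)
\quad \text{s.t.} \quad \mathbf{p}_2 = \mathrm{WF}(\mathbf{p}_1),
```

where the constraint is the lower-level problem: the follower simply water-fills against whatever interference the leader's allocation creates for him.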
So if the user becomes foresighted, he will form such a program, and the question is: given such a bi-level program, how can we compute the Stackelberg equilibrium? Okay. We tried to understand the computational complexity of this Stackelberg equilibrium, and based on that understanding we propose a low-complexity algorithm to calculate it. To handle the original bi-level program, we need to reformulate it into a single-level problem, so we introduce a function called the water-filling function. Basically, this water-filling function determines user 2's allocated power in the different frequency bins, given the power allocation of the foresighted user and user 2's own noise spectral density, cross-channel gain, and power budget. Here is the closed-form expression for this water-filling function. It looks complicated, but the intuition is very simple; look at the illustration here. Say user 1 has chosen some power allocation, and this is the spectrum seen at user 2's receiver. Before user 2 determines his power allocation, he needs to do the following. First, he ranks his channels by channel condition, from the best channel to the worst channel. Second, based on his own power budget, he determines how many channels to occupy, simply raising his water level until he has used up his entire power budget. And that is exactly what the closed-form expression of the water-filling function says: first there is a permutation which ranks user 2's channels by their channel conditions, and second we determine the number of channels such that user 2 will access only those channels. As we can see from this closed-form expression, whenever user 1's water level crosses the boundary between different frequency bins, the left derivative and the right derivative of this function become unequal, which makes the water-filling function non-differentiable. At the same time, the power term P_1^f also appears in the denominator of the objective function, which makes the objective function non-convex. These properties make the optimization problem very hard to solve. However, we noticed that there is some literature applying dual algorithms to non-convex power allocation problems, and we tried to apply this method to our Stackelberg equilibrium computation to see whether it is suitable for our scenario. To apply the dual algorithm: we have the primal problem here; from the primal problem we first form the Lagrangian, and by maximizing the Lagrangian for a given value of the dual variable we obtain the dual function; then by minimizing the dual function we obtain the dual optimum. First we show that for the Stackelberg equilibrium the duality gap may not be zero; however, even when the duality gap is not zero, this dual optimum is still a strictly tighter bound than the original single-user water-filling bound.
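To make the single-level reformulation concrete, here is a two-user sketch, reusing the waterfill helper from the earlier sketch (names again my own). The function below is exactly the kind of non-convex, non-differentiable objective being discussed -- the leader's rate with the follower's reaction substituted in:

```python
import numpy as np

def leader_rate(p1, H, sigma, p2_total):
    """Leader's achievable rate with the follower's water-filling reaction
    substituted in (the single-level reformulation of the bi-level program)."""
    # Follower water-fills against the noise-plus-interference floor that
    # the leader's allocation p1 creates for him.
    floor2 = (sigma[1] + np.abs(H[0, 1]) ** 2 * p1) / np.abs(H[1, 1]) ** 2
    p2 = waterfill(floor2, p2_total)          # the "water-filling function"
    # Leader's rate given the follower's induced allocation p2.
    sinr1 = np.abs(H[0, 0]) ** 2 * p1 / (sigma[0] + np.abs(H[1, 0]) ** 2 * p2)
    return float(np.sum(np.log2(1.0 + sinr1)))
```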
We also prove a monotonicity property of the dual function in the dual variable, meaning that if I increase the penalty term on the dual variable, the dual function decreases. The key question, though, is the complexity of solving this problem in the dual domain. Unfortunately, it still has the same complexity in the dual domain, because the problem cannot be decomposed into per-frequency subproblems. Why can't the complexity be reduced? Because, in the water-filling function on this slide, user 2's allocated power depends not only on user 1's allocated power in this particular frequency bin; it also involves the allocated power in the other frequency bins. This coupling makes the problem very hard to handle. So the key idea behind our proposed low-complexity solution is that, instead of maximizing the Lagrangian globally, we use a local maximum to approximate the dual function: we just need to find a local maximum for a given dual variable and use it to update the dual function. The advantage is a very low-complexity solution that is not purely based on heuristics, and we also allow an arbitrary number of myopic users. Okay. Applying this low-complexity solution, we simulated a wide range of channel realizations, and something interesting happens. We find that by introducing such a foresighted user, using this low-complexity algorithm, the foresighted user always achieves better performance than under the iterative water-filling algorithm, meaning that if the user gets global information and chooses another action, he definitely benefits himself. On the other hand, surprisingly, in most realizations the myopic followers also get strictly better performance. So even though the foresighted user is only trying to maximize his own utility, the myopic followers end up better off as well. This is quite unique, because in the general game-theoretic literature there is no general result about how the followers fare in a Stackelberg game; it turns out that in the water-filling game the followers' performance also improves. Okay, how is this achieved? Let's look at the action domain. These two figures show the power allocations of both users; the upper figure shows the allocations under the iterative water-filling algorithm. If both users apply water-filling, they just play this best response. However, if user 1 is foresighted, he chooses not to water-fill: he occupies only certain channels in which he has very good channel conditions. By putting a lot of power in those channels he keeps the follower away from them, and by doing so he achieves much better performance than the water-filling solution. He could gain an immediate increase by water-filling across all the channels; however, it turns out that if he did, the other user would in return occupy those channels, which would result in worse performance for himself. >>: So [inaudible] has some kind of different channel transfer function. >> Yi Su: Right, right, right, exactly.
>>: So that's why you put power on channel 3 and channels 7 to 10, the more favorable channels -- >> Yi Su: That's the basic idea of frequency selectivity, yeah. >>: How achievable is that in a real-world case? >> Yi Su: Okay. So basically in the DSL system, the channel coefficients are fixed; they are fixed over time. >>: But you need to discover -- >> Yi Su: Right. So -- >>: The channel condition, which is fixed over time, I mean, may not be -- I mean the channel condition [inaudible] change -- >> Yi Su: Yeah. The original power control problem, this water-filling game, was proposed for the wired system, for the ADSL and VDSL systems. >>: [inaudible]. >> Yi Su: Okay. For that case. >>: Which -- yeah. >> Yi Su: It's not a problem. >>: Yeah. >> Yi Su: So in the first part of this talk we assume that they know this information. >>: Okay. >> Yi Su: And in the next part we will relax this assumption and they will learn this information. >>: Okay. >> Yi Su: Okay. So the conclusion so far is that we have a non-cooperative setting; however, if some of the users become foresighted and are able to learn the structure of the game they are playing, they can potentially improve the performance of all the players. Okay. As I just mentioned, in order to play the Stackelberg equilibrium we assumed that the foresighted user has global information, which might not be very realistic in real systems. A more realistic assumption is that the foresighted user only knows the interference the other users cause to him -- the aggregate effect the others have on him -- not the detailed cross-channel transfer functions of the actual game he plays. So the next problem we investigate is: if the foresighted user only knows this aggregate effect, how should he exploit this structure and still attain performance similar to the Stackelberg equilibrium? The intuition is that the foresighted user should somehow model the cumulative interference the other users cause to him. Okay. Now we relax the global-information assumption and see how this user should react. In order to model the mutual coupling, we need to reformulate the game we originally designed, and first we introduce a new concept called the state. The state is a quantity that directly relates to the users' utility functions: given user k's state and user k's action, we can directly determine user k's utility. In particular, in the power control game the state is the interference the other users cause to him. Having defined the state, we need a state determination function, which captures the actual play of the game: it captures the aggregate effect of the other users' actions, and it is a joint mapping from the joint action space to the individual users' states. On the other hand, we need some internal belief. We have a belief function, which is a mapping from an individual's own action space to its own state, meaning that each belief function represents an individual user's internal model of how its own action will impact its own state. It is a purely internal belief, from the individual user's own perspective.
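In symbols -- my own shorthand for the two mappings just described:

```latex
s_k \;=\; g_k(a_1, \dots, a_N) \quad \text{(state determination: the actual coupling)},
\qquad
\tilde{s}_k \;=\; \tilde{g}_k(a_k) \quad \text{(belief: user $k$'s internal model of that coupling)}.
```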
Now we are ready to define the conjectural equilibrium. What does a conjectural equilibrium mean? Basically, it is a configuration of belief functions and a joint action profile. At a conjectural equilibrium, each user first finds that his beliefs are realized: he believes that taking a certain action will bring him a certain state, and that state is confirmed by the actual play. On the other hand, he maximizes his own utility function given his belief: believing that playing this action will bring him that state, this action maximizes his utility. So now we have decoupled the multi-user interaction: each user just has its own internal belief, and it plays a best response with respect to that internal belief. Now we are ready to apply this conjectural equilibrium to the water-filling scenario. Okay. In the power control game, the state is simply defined as the interference the other users cause to a user, and the utility function is the same as in the previous case. The state determination function is the actual play: the users' allocated powers directly determine the received interference. The belief function in this case is chosen to have a very simple linear form, and we will explain why this makes sense. Each individual user simply believes that if he increases his allocated power in frequency bin n, the interference he experiences will decrease by a certain amount; this is his internal belief. Okay. Here is the main result we prove. First, we show that the aggregate interference the other user causes to a particular user is a piecewise linear function of that user's allocated power in the given channel, and that the first-order derivative with respect to the allocated power in the other frequency bins goes to zero if the number of frequency bins is sufficiently large. This means that first-order information is sufficient to capture the multi-user coupling in such multi-user power control games. Let me explain the intuition. This is user 2's spectrum; he originally performs water-filling. Suppose user 1 increases his allocated power in this frequency bin by a certain amount. Then user 2's water level will certainly increase; however, if the number of frequency bins is sufficiently large, the increase in the water level is negligible. That means the change in the allocated power in the other frequency bins is essentially zero, while user 2's allocated power in this particular frequency bin is reduced as a linear function of user 1's increased power. In return, the interference user 1 experiences in this particular channel is simply a linear function of his own allocated power in this frequency bin. That is the basic intuition. Second, we show that both the Nash equilibrium solution and the Stackelberg equilibrium solution of the previous cases are just special cases of the conjectural equilibrium, corresponding to different parameter settings in the belief function. In the Nash equilibrium case, the foresighted user simply sets the first-order term to zero, while in the Stackelberg case the foresighted user sets his parameter gamma to the negative of the first-order derivative of the interference with respect to his own allocated power.
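Collecting this in one place -- again in my own notation, and the sign conventions may differ from the slides -- the linear belief and the two defining conditions of a conjectural equilibrium read:

```latex
\tilde{I}_k^f(p_k^f) \;=\; \beta_k^f - \gamma_k^f\, p_k^f, \qquad
\tilde{I}_k^f(p_k^{*f}) \;=\; I_k^f(\mathbf{p}^*) \ \ \text{(beliefs realized)}, \qquad
\mathbf{p}_k^* \in \arg\max_{\mathbf{p}_k} \ u_k\!\big(\mathbf{p}_k, \tilde{I}_k(\mathbf{p}_k)\big) \ \ \text{(best response to beliefs)}.
```

Setting $\gamma_k^f = 0$ recovers the myopic water-filling (Nash) solution, while choosing $\gamma_k^f$ to match the actual first-order reaction of the aggregate interference recovers the Stackelberg solution, as stated above.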
A further result is that an open set of conjectural equilibria exists in the belief domain. Since we are mostly interested in achieving the Stackelberg equilibrium while only having local information, the next step is simply: how should we set the belief parameters beta and gamma so as to approach the Stackelberg equilibrium using only local information? We propose a dynamic update. In each iteration, the foresighted user first estimates the first-order information: the first-order derivative of the aggregate interference the other users cause to him, as a function of his own allocated power in the different frequency bins. Having estimated this information, he incorporates it into his own optimization program and updates his allocated power. Then he re-estimates the first-order information, and this is repeated until no further rate improvement can be achieved. This gives us the conjecture-based rate maximization algorithm, whose essence is a local approximation of the bi-level program. Because we exploit the structure of the water-filling game, a simple linear belief makes sense in this scenario, so we can improve the performance of the multi-user power control scenario with only very simple linear beliefs and only local information. The simulation results show that it achieves performance comparable to the previous case, where the foresighted user possesses global information. Okay. So the conclusion so far is: if a foresighted user possesses only local information but is able to form correct beliefs, he can potentially still improve not only his own performance but also that of the other players.
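A schematic of this update loop, under heavy assumptions of my own: respond(P) stands in for whatever lets the myopic users water-fill against the current allocation (in a real system the foresighted user simply measures the interference after the others react), the finite-difference probing and the projected-gradient step are my choices, and the constants are illustrative:

```python
import numpy as np

def interference(k, P, H, sigma):
    """Aggregate interference user k actually measures in each bin."""
    N = P.shape[0]
    return sigma[k] + sum(np.abs(H[i, k]) ** 2 * P[i]
                          for i in range(N) if i != k)

def project(p, total):
    """Project onto {p >= 0, sum(p) = total} by bisecting on a shift t
    (the power budget is assumed fully used)."""
    lo, hi = p.min() - total, p.max()
    for _ in range(60):
        t = 0.5 * (lo + hi)
        if np.maximum(p - t, 0.0).sum() > total:
            lo = t
        else:
            hi = t
    return np.maximum(p - hi, 0.0)

def crm_step(k, P, H, sigma, p_total, respond, delta=1e-4, lr=0.05):
    """One conjecture-based rate-maximization step for foresighted user k:
    estimate the first-order reaction gamma per bin, then ascend the
    conjectured rate."""
    F = P.shape[1]
    base = interference(k, respond(P), H, sigma)
    gamma = np.zeros(F)
    for f in range(F):                        # probe each bin's reaction
        Pp = P.copy()
        Pp[k, f] += delta
        gamma[f] = -(interference(k, respond(Pp), H, sigma)[f] - base[f]) / delta
    g = np.abs(H[k, k]) ** 2
    sinr = g * P[k] / base
    # Gradient of sum_f log2(1 + g*p/I(p)) under the linear conjecture
    # I(p) = base - gamma * (p - P[k]), evaluated at p = P[k].
    grad = (g / base + sinr * gamma / base) / (np.log(2) * (1.0 + sinr))
    return project(P[k] + lr * grad, p_total)
```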
The next question then is: what if we have multiple foresighted users, each trying to learn and adapt? It turns out this might be difficult at this stage for the power control game, because the action set is a set of vectors and is not that easy to handle. However, we are able to study this problem in a different setting: wireless random access. Here we allow multiple foresighted users, each having only local information, and we again apply the conjecture-based approach to see how it shapes the multi-user interaction in this application scenario. Basically, we investigate multiple nodes in a single cell. The action of each node is to determine its transmission probability, and the payoff is given by the throughput, just as the professor mentioned before: a node's own transmission probability times the probability that no other user transmits. The key issues people care about in this multi-user interaction are the stability of the algorithm, the convergence of the algorithm, and the throughput efficiency and fairness of the interaction. As we have seen before, if every user is purely myopic, he sets this probability to one; then the network collapses and nobody gets through. Okay. Then we investigate the conjecture-based approach in this scenario. Very simply, we define the state to be the probability that a user sees a contention-free channel; this is the probability that this particular user k has a free channel. This state requires only local information: the user just monitors the aggregate effect the other users have on him. Each user again develops a linear belief: in each iteration, a user believes that by deviating from its current action by a certain amount, it will in return reduce its probability of having a free channel by a proportional amount. We are mostly interested in the limiting behavior of this dynamic process, and we investigate two different update mechanisms: in the first, the best-response scenario, each user updates its action as the best response; in the second, users update their actions along the gradient direction. Okay. So here is the actual play, and here is the conjectured play. We have four major results. The first result is that all the operating points in the action space are actually conjectural equilibria; this says nothing about stability yet. The next result is that we provide sufficient conditions for the stability and convergence of these two dynamic update algorithms. Using these sufficient conditions, we show that the operating points spanning from the original Nash equilibrium all the way to the Pareto boundary are stable conjectural equilibria, each with respect to a different belief setup. And the last result is that at these different operating points the multiple users actually obey a weighted fairness rule. Together, these four results mean that if we design the conjecture-based approach properly, we simultaneously obtain stability, convergence, efficiency, and fairness. Okay. This slide explains why the users can achieve weighted fairness. The definition of weighted fairness is as follows: this formula gives the probability that user i's packet gets through, and weighted fairness means that each user's probability of a successful transmission is proportionally weighted. By simple manipulation we obtain this relation among the different users, and at the equilibrium of the conjecture-based protocol we find a similar relation given by this formula. We know that if the number of users in the network is large enough, the transmission probability of each user will be close to zero, and then the two equalities are approximately the same. So the next question is how we should set the belief functions to obtain Pareto efficiency, and the key idea is that the beliefs should adapt. They can, because the users all have access to the same observations: from the outcome they know exactly which packets got through. So they simply adapt the parameters of the belief function until they find they cannot further improve the system performance. Okay. The advantage of this solution is that we do not need a centralized solver; it operates based only on local information, the resulting operating point is throughput-efficient and fair, and it can autonomously adapt to traffic fluctuations.
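A toy sketch of the best-response dynamics under the linear belief -- the parameter names and the quadratic best response are my own reading; the gamma values are design parameters, and stability requires the sufficient conditions mentioned above:

```python
import numpy as np

def conjecture_mac(N, gamma, steps=500, p0=0.1):
    """Conjecture-based random access: node k believes that raising its
    transmission probability p_k lowers its contention-free probability
    s_k = prod_{i != k} (1 - p_i) at slope gamma[k] (its linear belief)."""
    p = np.full(N, p0)
    for _ in range(steps):
        s = np.array([np.prod(1.0 - np.delete(p, k)) for k in range(N)])
        # Conjectured throughput of playing q: q * (s_k - gamma_k * (q - p_k)),
        # a concave quadratic in q whose maximizer is:
        p = np.clip((s + gamma * p) / (2.0 * gamma), 0.0, 1.0)
    return p
```

For instance, conjecture_mac(10, np.full(10, 2.0)) settles at a symmetric operating point, while unequal gamma values tilt the fixed point between users, which is where the weighted fairness shows up.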
On these slides we present an engineering interpretation of the conjecture-based approach, to explain how it differs from the DCF protocol. This is the original update rule for the best-response scenario, and we can decompose it into several different terms. These different terms describe, given the transmission probabilities of all the users and the observation of the previous transmission, how the probabilities of the different users evolve in the next time slot; basically, they describe the deterministic trajectory of the multi-user interaction. Then we just fill in this table, which represents conjecture-based random access. We can see that if user 1 observes another user making a successful transmission, then in the best-response scenario he reduces his transmission probability by a factor of 2, and we can similarly fill in the blanks for the other observations. This gives the protocol for the conjecture-based approach. We can also write the currently existing DCF protocol in the same form. Basically, what DCF does is: if I did not transmit in the previous time slot, I do not change my transmission probability; I simply reduce my contention counter by 1. If I observe a collision, I reduce my transmission probability, and if I make a successful transmission, I increase my transmission probability. As we can see, our proposed approach is similar to DCF in the case where I made a transmission attempt. What is different is the case where I did not transmit: even then, I still modify my transmission probability for the next time slot. Why is this interesting? Because even though I did not transmit in the previous time slot, I still observe something from the joint play: I know whether or not the channel is congested, so I should also be able to adapt my next transmission action to track the network fluctuations in a timely way. That is the basic idea. It means the conjecture-based random access makes use of four pieces of observation information, compared with the DCF case, where only two are exploited. Okay. Here are the simulation results. We can see that DCF is not efficient as the number of nodes in the network increases. The P-MAC protocol was proposed by people from the University of Michigan at IWCOS; in that case they need to know exactly the number of nodes in the network, and they develop an approximate expression for the transmission probability that achieves a very high throughput. It turns out that with the conjecture-based algorithm we can also beat the P-MAC protocol while having no information about the number of nodes in the network, just adapting based on the actual outcomes. Moreover, since P-MAC needs the number of nodes, in an actual implementation people must estimate that number online, which causes the protocol to be unstable because this effect is overlooked; there is no [inaudible]. Because of our design, we showed that our equilibria are stable: even when we change the number of nodes in the system over time, the conjecture-based algorithm can still sustain the stability of the operating point.
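To make the contrast concrete, here is a stylized sketch -- the halving/doubling factors are illustrative placeholders, not the values from the slide:

```python
def dcf_update(p, transmitted, success):
    """Stylized DCF: only a node's own transmission outcome drives adaptation."""
    if not transmitted:
        return p                               # silent slot -> no change
    return min(1.0, 2.0 * p) if success else 0.5 * p

def conjecture_update(p, transmitted, outcome):
    """Stylized conjecture-based rule: the node adapts in every observation
    case, including slots in which it stayed silent."""
    if transmitted:
        return min(1.0, 2.0 * p) if outcome == "success" else 0.5 * p
    if outcome == "idle":
        return min(1.0, 2.0 * p)               # channel underused -> push harder
    return 0.5 * p                             # busy or collided -> back off
```

The extra branches for observed-only slots are exactly the additional observation cases the talk refers to.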
So the conclusion so far is that even if everybody has only local information, if we design the belief functions correctly and properly, we are still able to improve the system performance; the key idea is, based on the structure of the scenario you investigate, how to design your belief function and how to update it. As future directions, we are trying to understand for which types of multi-user interactions this conjecture-based approach is suitable; and, based on the information availability in different application scenarios, how we should choose the appropriate equilibrium concept, how we can compute it, and what its stability, convergence, and fairness properties are. We are also looking forward to applying this conjecture-based approach to different applications. Based on the presented work, we have two accepted journal papers and one more pending -- these are all transactions papers -- and we have also published two conference papers. Okay. Thank you. >> Jin Li: Thank you.