>> Jin Li: Today it's our great pleasure to have Professor Mihaela van der
Schaar from UCLA with us to give us a talk on a brief synopsis of her recent
research at the multimedia communications and systems lab at UCLA.
We will first hear Mihaela give us a brief overview of her lab, and then hear
detailed talks from two of her students, Yi Su and Fangwen Fu.
Professor Mihaela van der Schaar has won numerous awards during her career:
the 2004 NSF CAREER Award, the 2005 Best Paper Award from the IEEE
Transactions on CSVT, and the 2006 Okawa Foundation Award. She is an IBM
Faculty Award winner for the years 2005, 2007, and 2008, and won the 2008
Exploration Stream Analytics Innovation Award from IBM Watson.
In 2006, she won the Most Cited Paper Award from the EURASIP journal Image
Communication. She has been an associate editor of a number of IEEE
journals, including the IEEE Transactions on Multimedia, Signal Processing
Letters, Circuits and Systems for Video Technology, and Signal Processing
Magazine. She holds 32 US patents and three ISO awards for her contributions
to MPEG video compression and streaming. Her research interests span
multimedia communications, networking, processing, and systems.
Without further ado, let's hear about the exciting work she has done in her lab.
[applause].
>> Mihaela van der Schaar: Thank you, Jin, for the introduction. So my plan
today is to give you first a brief synopsis of our recent research at UCLA in
my lab -- mainly a synopsis of what we are doing, without getting into an
awful lot of detail, in view of time. However, I'm going to have two students
give more detailed talks after that.
And actually what I'm trying -- what I hope for with this presentation is to
catalyze enough interest from you such that maybe, when you come to UCLA and
are visiting us, you can find out more details about our research, or maybe
you will invite us to give more detailed talks on any of the topics that you
may find of interest here.
Okay. So I'm going to try to start with a brief highlight of what our
research is all about. Our focus in the last couple of years has mainly been
to come up with a rigorous framework to analyze, design, and optimize -- and
I'm going to stress -- dynamic and heterogeneous multi-user environments and
applications.
So actually, starting in 2004 when I moved from industry to academia, our
group started to look for a new theory together with algorithms and designs.
The main focus was on trying to find a new theory for architecting
next-generation wired and wireless networks, as well as distributed systems
which are interconnected through such networks and which are able to support
media applications.
So in this research, a main key theme is multi-user communication and
networking. And what I'm going to do today is go through a couple of the
topics highlighted here in red. So I'm going to talk a little bit about some of
our new designs for multi-user communication in wireless networks. I'm going to
talk a little bit about how we can reduce power significantly in wireless multimedia
systems. I'm going to briefly talk about our work on peer-to-peer networks as
well as some more recent work on distributed stream mining systems.
There are also some other topics we have been working on, but I'm not going to
talk about those today. However, one of my students, Fangwen Fu, will touch on
the topic of wireless media communication.
So if you look at multi-user communication today, most of the protocols we
have are really kind of passive, and they really adhere to some predetermined
rigid protocol rules. Another type of design that has emerged in recent years
is to model transceivers as rational agents competing for resources,
[inaudible] wireless resources, for example. And that's what game theory
tries to do.
However, if you look at the current solutions in game theory today, they can
be kind of classified into [inaudible] algorithms. On one hand we have
cooperative games, which most of the time either require a lot of message
exchanges to ensure that the users are going to be able to operate on the
Pareto boundary -- and here I'm showing two users, just two users for
illustration, and they have utilities U1 and U2. If they would like to
operate on this Pareto boundary, most of the time either a moderator may be
needed, like for example an access point or a base station, or this may be
done in a distributed fashion but through heavy message exchange. If you
don't want to have this type of infrastructure or message exchange overhead,
what we have are solutions based on non-cooperative approaches, which are
doing a myopic best response.
And what we know about these types of games is that for some of them we can
prove that a Nash equilibrium exists, but this Nash equilibrium, which is
based only on myopic best responses and local information, is for most games
very inefficient. So for most communication games of interest, the Nash
equilibrium is not on the Pareto boundary, but rather very far away from it,
and at times it may even lie at zero-zero utility, as I'm going to show you
in an example a little bit later.
So the first thing I would like to point out is that most of the work is
focused either on fully collaborative information exchange or on a kind of
almost passive, myopic response based on local information.
Also, the focus is really on equilibrium characterization rather than on
constructing new solutions that would allow us to operate, maybe based on
local information, but close to the Pareto boundary. So I'd like to have a
constructed design that would allow us to move this point towards the Pareto
boundary, but without any message exchange.
So that should be possible mainly because, in reality, the devices that we
talk about are increasingly smarter. We have smart phones and smart gateways
and smart devices which are heterogeneous and increasingly strategic. They
may not want to just comply with existing protocol rules; they may want,
within the framework of a particular protocol, to maximize their own utility.
They are really capable of proactively acquiring information and also
learning [inaudible] and also selecting desired equilibria. So then they work
together to achieve what I'm going to call here coopetition: there may be
non-cooperative users aiming to maximize their own utility, but which may
discover in this process that it is better for them to move towards
cooperation. And I'm going to talk about that in just one second.
So coopetition stands for cooperative approaches by non-cooperative users.
So the users are non-cooperative, but what we'd like to see is when, in which
cases, these users are going to be able to cooperate or coordinate to achieve
better outcomes. And what we see in the literature is that there is very
little work in the communication area on trying to find better equilibrium
concepts and better solutions to achieve these better equilibria.
So the reason we would like coopetition is that strategic devices really
compete for available resources, and that may often result in inefficient
outcomes -- for example, inefficient Nash equilibria. However, if you have
coordination and cooperation, then not only may the efficiency improve, but
in many cases, if the network is well designed, the performance of individual
users -- of all individual users -- may also be increased.
So what I'm going to call coordination is the design of protocols that better utilize
available information to coordinate spectrum access or resource access but
without explicit message passing. So there would be no message exchanges
involved.
On the other hand, I'm going to call cooperation the design of protocols
that shape the incentives of users -- agents, in this case; I'm going to call
the wireless devices or network devices agents -- in such a way that their
selfish behavior results in cooperative outcomes. That is, it becomes in the
self-interest of users to arrive at cooperative outcomes.
So what we'd like to have is devices that are able to learn, to determine
when cooperation or coordination may be beneficial, and, if that's the case,
to coordinate among themselves to select the desired equilibria, which are
beneficial in terms of efficiency, fairness, as well as stability.
So for example, what we show here is a wireless device, or a device that
interacts with an environment that may be dynamically changing over time. It
is making observations, and based on this information it may form beliefs
about its coupling with the other users -- how its actions affect their
performance in the short term but also in the long term. And based on that,
it can determine the policy based on which it would like to interact with the
environment in the future.
So that is based not only on information about, let's say, the traffic or
the private information of the user -- the private valuation by the user of a
particular resource -- but also on how this particular user is coupled with
other users in the network, and how he is affected by and can affect other
users. So the coupling among users is key in this type of interaction.
Also, we are going to do a lot of work on trying not only to characterize an
equilibrium, but to select equilibria. And that may depend on the devices'
abilities -- heterogeneous abilities, maybe -- to acquire information and
learn, but also on the achievable performance at a particular equilibrium,
both from a system performance perspective and from an individual user's
performance perspective.
And finally, we are interested in convergence properties -- whether we can
achieve local stability or global stability. And my student, Yi Su, will talk
much more about that in a particular setting.
So, to finalize this introduction, our goal is really to look at different
transceivers having different private knowledge that try to dynamically
coordinate maybe, cooperate maybe, or maybe compete to gather resources such
that they maximize their utilities. And what we have seen in our recent work
over the last couple of years is that this really can achieve unprecedented
improvements compared to currently existing protocol designs, which rely on
rigid, predetermined rules for interaction.
So let me start by giving a brief overview of some work that we did in this
area of multi-user communication and networking. I'm showing here a couple of
the students that have worked in this area, including one who will give a
presentation later. And the types of problems of interest in this area are
really new theory and protocol design to achieve this type of cooperation and
coordination in multi-user access networks.
And really the vision is to better understand the coupling existing among
the decisions made by the users in these types of networks, and to design
next-generation MAC protocols that can help us improve not only network
efficiency but also fairness and stability. And, last but not least,
protocols which may be strategy-proof, meaning that it is really in the
self-interest of the users to comply with the designed protocols rather than
manipulate them.
So I'm going to start with exactly this last topic. I'm going to talk first
-- just to give you an example -- about how we went about addressing selfish
behavior in wireless networks. We know that network protocols can be easily
manipulated by selfish users, which may not necessarily want to destroy the
network; they may just want to maximize their utility. So they are
self-interested. And this very often leads to a result called the tragedy of
the commons, which has been studied by many people, including in the CSMA/CA
scenario by Cagalj and others. We have actually looked at a similar scenario
in the TDMA case. But in view of time I'm going to focus today just on the
CSMA scenario, because it's faster.
So really, what we would like to do with this work is to come up with
solutions that would help us avoid these inefficient outcomes. And we would
like to have a network designer design methods that would help these selfish
users choose cooperative outcomes.
Now, you are going to say: well, we have solutions for that. Pricing schemes
have been designed to do just that. A widely discussed method to incentivize
users to come to cooperative outcomes in competitive scenarios is pricing,
and that has been studied in microeconomics for decades. It is known that if
you do that, the outcome can achieve the system optimum. The way it works is
by having prices that capture the external effects of a user's behavior on
others' welfare.
Hence, facing these suitably chosen -- and I want to emphasize suitably
chosen -- prices is really equivalent to taking the effect of one's behavior
on others into account.
So for example, what we have here is a user, user i, who may try to maximize
over its action set -- for example, its transmission probabilities in a
wireless network -- its payoff, which will be the utility of the user, for
example the throughput that he is going to gain, minus a cost: a price he
needs to pay for using these particular resources.
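A minimal rendering of the pricing formulation just described, with the
notation assumed since the slide itself is not reproduced in the transcript:

```latex
\max_{a_i \in \mathcal{A}_i} \; u_i(a_i, a_{-i}) \;-\; \pi_i \, a_i
```

Here a_i is user i's action (for example, its transmission probability), u_i
its utility given the other users' actions a_{-i}, and pi_i the per-unit
price that is supposed to capture the externality user i imposes on the
others.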
So by introducing this taxation, we are hoping to account for the
inconvenience caused to others. However, how can we interpret these prices in
a network scenario? For a long time, while looking at prices and taxation, I
was not very comfortable with it. I didn't feel it was a good solution for
wireless networks. And the main reason is really the question of how we can
interpret these prices and payments in wireless scenarios.
So for example, one way would be to view these prices as coordination
signals, as done in the work by Kelly, for example, or Low. In this case, the
signals are really for coordination in decentralized methods to find a global
solution to this optimization problem. However, for a pricing scheme to be
effective, these users need to follow the prescribed scheme. They still need
to align themselves with this prescribed rule, so this becomes inconsistent
with selfish behavior. It also has the disadvantage of requiring users to
exchange messages to determine the desired prices, which for wireless
networks is disadvantageous.
Another interpretation of these prices in networks is as real payments of
some good. That's, for example, the work of Varian, or other work of Kelly,
where they tried to view these payments as some form of monetary payment
that's exchanged among users.
However, the question becomes, in the networking scenario: what good do
users really care about? Should it really be money? And why would users agree
to pay these prices? In order to have enforceable payments, we really would
need some form of contracts; if we don't have that, then users may free-ride.
And having contracts may require a lot of infrastructure additions to current
networks, which we might not want to add.
Another problem is that if these users do indeed care about these goods --
and we assume they should care, otherwise they would have an incentive to
free-ride -- then these payments or rewards will reduce or increase the
welfare of the users or of the network provider.
So it really becomes a difficult problem to compute the right prices for
these different types of users and to come up with the right infrastructure
to implement taxation and pricing -- especially if private information of the
users is involved, such as their valuation of the resources or goods. Then
this becomes an even more difficult problem.
Last but not least, in many situations it has been shown, for example by
Johari and Tsitsiklis, that price-taking behavior is not really consistent
with selfish and strategic behavior. Most users will not be price-taking but
rather price-anticipating: they understand how their actions will affect
future prices. And based on that, they are going to derail this process of
implementing prices to achieve network utility maximization.
So what we searched for, for a long time, was really an alternative to
prices -- schemes which do not suffer from the disadvantages mentioned
before. So what we propose is an alternative to pricing to sustain
cooperation among selfish users, for example in the case of wireless
networks. And what I'm going to give you today is just a simple example where
we implemented this type of scenario for a slotted Aloha protocol. However,
the solution that I'm going to describe next can be implemented in other
scenarios that go way beyond just Aloha.
So again, just for illustration, I'm going to use a very simple slotted
Aloha scenario where I model the different wireless users transmitting their
data as a non-cooperative game. We have a number of users; the strategy of a
user is its probability of transmitting; and the payoff of a user is its
internal valuation of the resources -- k_i, let's say, capturing how much he
values transmitting a particular type of traffic -- times the probability of
transmitting, times the probability of the other users not transmitting.
So, if you like, this is the goodput of a particular user multiplied by a
valuation k_i.
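In code, the payoff structure just described looks roughly as follows (the
valuations and probabilities are made-up illustrative inputs):

```python
import numpy as np

def aloha_payoffs(p, k):
    """Slotted Aloha payoffs as described above:
    u_i = k_i * p_i * prod_{j != i} (1 - p_j)."""
    p, k = np.asarray(p, float), np.asarray(k, float)
    return np.array([k[i] * p[i] * np.prod(np.delete(1.0 - p, i))
                     for i in range(len(p))])

# u_i is increasing in p_i, so the myopic best response is always to
# transmit (p_i = 1); if both users reason this way, every slot collides
# and both payoffs collapse to zero -- the (0,0) outcome on the slide.
print(aloha_payoffs([0.5, 0.5], [1.0, 1.0]))  # time-sharing-like point
print(aloha_payoffs([1.0, 1.0], [1.0, 1.0]))  # all-transmit outcome: zeros
```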
>>: Can I ask [inaudible].
>> Mihaela van der Schaar: Yes, please.
>>: You used the word transmission. Can we replace it with reception in this
model -- like the YouTube nightmare scenario where everybody wants, you know,
I want my video and I don't care about him?
>> Mihaela van der Schaar: Exactly. Yes.
>>: Thank you.
>> Mihaela van der Schaar: So it works in a transmitter-driven scenario, but
it could also be viewed from a receiver's point of view.
>>: Terrific. Thank you.
>> Mihaela van der Schaar: And as I'm going to show briefly a little bit
later, it can even be a relay scenario -- it could be a gateway somewhere in
the network as well. So there could be different levels here of both
competition and coordination among different entities: transmitters,
receivers, gateways.
Okay. But I chose this scenario because it's usefully simple for explaining
the key idea. So in such a scenario, let's assume we have two users -- but
the result will hold for many users as well. So I have user 1 and user 2. And
what is well known in such a scenario is that the Nash equilibria of such
slotted Aloha games lie on this orange line here. As a matter of fact, the
most likely outcome is the zero-zero outcome: this result is known --
collisions always occur and the users will get no traffic transmitted.
So what we want is to find a method to support efficient outcomes without
message exchanges or message passing, and we would like to do that without
pricing. One way to do that is through the implementation of an intervention
function. So this is the currently -- yeah?
>>: So that blue region is the --
>> Mihaela van der Schaar: Oh, I have to explain a little bit better. So this
blue region here would be all the possible outcomes, all the way to the
Pareto boundary, which would be achievable if the users were able to
coordinate. Okay? So they would be willing to time share the network.
>>: Why is it like that?
>> Mihaela van der Schaar: So in this particular case we are going to get
this type of region, yeah. Unlike the TDMA case, where I'm going to have
something like this, where users time share the network, in this particular
case I'm not going to have coordination between users through an explicit --
through an explicit, let's say, outside entity; rather, I'm going to have
users accessing the network in a distributed, decentralized manner.
>>: [inaudible] can't time share?
>> Mihaela van der Schaar: Yes, I cannot time share. Ideally, and I'm going
to talk a little bit about that later, I'd like to have a way for them to
coordinate among each other such that I can operate as in the TDMA scenario
and could time share. However, here everything is done -- that's it, it would
be like the current 802.11 [phonetic] scenario, kind of a DCF scenario. It's
a little bit simplified here -- slotted Aloha, I don't have sensing -- but it
would be quite similar to that.
So what I have is users not sharing any information, just determining their
transmission probabilities. So these are the current contention games, as
used in current wireless networks like 802.11 networks. And what we would
like to do is introduce a new type of game, an intervention game -- I'm going
to call it for now a Stackelberg contention game -- where the following will
happen.
I'm going to augment this type of game with a manager, or policer. What this
policer is going to do as a first step is announce an intervention function
-- or this intervention function could be known a priori. So he will say: I
would like you to behave according to this rule g. I'm going to go into more
detail about how this intervention function can be defined in just one
second. But let's assume that I'd like the users to, for example, behave
cooperatively such that they can operate on the Pareto boundary, and I'm
announcing this rule.
Then the different users, knowing this particular rule g, will determine
their transmission probabilities. However, only if they do not comply with my
rule g will this policer intervene and transmit.
So what I'm going to introduce is a method to really tell users what is
desired from them, and in case they do not perform according to my desired
rule, I'm going to punish them.
Let us go and understand this a little bit better. I'm going to give an
example. For this particular game I chose the total relative deviation. So
I'd like to have an intervention function under which the total relative
deviation of all the users is minimized, for example. And what I'm showing
here is the utility of a particular user, user i. Ideally, what he would like
to do to maximize his utility, if I don't have an intervention function, is
to always transmit. So the best for a user is always to transmit, as we have
shown on the previous slide. However, what I'd like the user to do is to
transmit only a certain amount, such that the users actually time share this
network and hence nobody is cannibalizing the network by transmitting all the
time.
So the way to do that is by implementing an intervention function which
shapes the utility function of the user. Conventionally, if I don't have an
intervention, the user's utility just increases as his probability of
transmission increases. However, if I'd like the user to operate at this
particular point, such that he is time sharing the network, then what will
happen is that the payoff gets shaped by this intervention function. So this
red curve here indicates the newly shaped payoff, through the action of this
intervening user. It shapes the utility function of the users such that it
becomes in their own self-interest, when they try to maximize their utility,
to operate at the desired operating point.
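One concrete way this shaping can work -- a minimal sketch assuming a
particular total-relative-deviation rule, since the talk only outlines the
idea -- is for the policer to transmit with a probability equal to the users'
total relative deviation from the prescribed point:

```python
import numpy as np

def intervention_level(p, p_star):
    """Hypothetical intervention rule g: the policer transmits with
    probability equal to the users' total relative deviation above the
    prescribed probabilities p_star, clipped to [0, 1].  g is zero when
    everyone complies, so intervention is only a threat at equilibrium."""
    p, p_star = np.asarray(p, float), np.asarray(p_star, float)
    return min(1.0, np.maximum(0.0, (p - p_star) / p_star).sum())

def shaped_payoff(i, p, p_star, k):
    """User i's payoff once the policer is in the game: his packet must
    now also survive the intervention traffic (factor 1 - g)."""
    g = intervention_level(p, p_star)
    others = np.prod(np.delete(1.0 - np.asarray(p, float), i))
    return k[i] * p[i] * (1.0 - g) * others

# With p_star = [0.5, 0.5], user 0's shaped payoff peaks exactly at 0.5:
# deviating upward triggers punishment, so self-interested maximization
# lands on the prescribed operating point.
for p0 in (0.3, 0.5, 0.7, 1.0):
    print(p0, round(shaped_payoff(0, [p0, 0.5], [0.5, 0.5], [1.0, 1.0]), 3))
```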
What is interesting to see -- and this is important -- is that the level of
intervention is zero at equilibrium. There really will be no intervention:
the intervening user will not intervene; he will only do that when a user
misbehaves, when he starts to transmit above his prescribed transmission
probability. So [inaudible] it is in the self-interest of a user maximizing
his utility to operate at the prescribed level, and hence this intervention
will not take place -- it only serves as a threat. So if you look again at
the differences between this type of work and conventional work done by
[inaudible] people, it's really quite different from conventional Stackelberg
games -- and after we named it a Stackelberg game, we almost regretted
calling it Stackelberg because it may be confusing.
So in conventional Stackelberg games, what people do is the following. They
have a leader, and this leader will take a fixed action. Given this fixed
action, the users, the followers, are going to respond. The manager is really
the leader, and he will use resources. So if you look at the currently
existing games using Stackelberg strategies, the manager uses resources to
implement the Stackelberg strategy.
However, in our case what happens is quite different. First, what the leader
-- what the manager -- does is follow a contingent plan depending on the
followers' choices. Whether he decides to intervene, and how much to
intervene, will depend on how much the users access the network -- on their
probabilities of accessing the network.
Also, the manager may not intervene at equilibrium; only the threat will
exist. So resources are not actually used: there is no overhead involved in
putting in this manager. Also, I want to say that here, for simplicity, I
assume there is a manager, but this can also be implemented in a distributed
way. You don't need to have a manager.
So alternatively, this contention game, as opposed to conventional games,
can be viewed as a Stackelberg game with multiple leaders. Conventionally we
have one leader and multiple followers; here we have the reverse: multiple
leaders and a single follower, the manager. And the manager will decide how,
eventually, he will punish the users, depending on what the leaders -- which
are the users -- have done. So this actually gives users incentives to behave
appropriately.
And this type of method can be used well beyond the simple scenario I
described here. We have looked at ad-hoc networks and multi-hop networks,
and, more recently, for example, even at the mitigation of attacks by
malicious users. And it can serve the purpose of coordination as well as of
providing incentives for cooperation. But that is not shown here.
However, I'm going to talk briefly about coordination now. So let's assume
that we are not concerned with users manipulating their transmission
probabilities. Rather, we would like to have them coordinate, which addresses
the question that Phil just asked before: what happens if we would like them
to operate as in the TDMA scenario -- we would like them to time share the
network -- but without having a controller and without having message
exchanges? How can users coordinate in a distributed manner to access the
spectrum and share it efficiently?
So there will be a tradeoff between the degree of coordination and the
amount of communication overhead. If you look at the current solutions, we
have on one hand TDMA, where coordination is achieved by a central scheduler:
we have both message exchanges and an infrastructure. On the other hand, if
you look at slotted Aloha or CSMA protocols, everything is distributed,
everything is decentralized, and there is much less coordination, which means
that users will very often collide. So the performance will be lower in this
case.
So what we would ideally like to have are protocols that are distributed and
decentralized like these, but which achieve that type of coordination. And we
are able to do that; the way to do it is through utilizing protocols with
memory. In the course of their interaction, users receive different types of
feedback. I'm going to give here just one example: they may know whether the
channel was idle, whether they sent a packet and it was successful, or
whether there was a collision. Of course, you could have more refined
feedback, and it could be taken into consideration to further improve the
network, but for now we are going to assume just such a ternary feedback.
The probability of a user transmitting is then determined by both its recent
transmission decisions and the feedback information. When designing the
protocols, we looked at two things. On one hand, we would like to maximize
the throughput -- we would like to be as good as TDMA in terms of throughput
-- but we would also like a low delay. And we define delay as the average
number of slots that a user needs to wait, starting from a randomly chosen
slot, until its first successful transmission.
So in our method, coordination is achieved using memory, by correlating the
successful users in consecutive slots. What I'm going to do is something very
different from what current MACs do, for example in the DCF scenario. We are
going to have a successful user transmit with a high probability -- so a user
which was already successfully transmitting data will transmit with a higher
probability -- while the other users, which are not currently transmitting,
will transmit with a lower probability.
And of course, as we allow a longer capture of the channel by one user, the
total throughput will increase, but the delay will also increase. So we will
have a tradeoff between efficiency on one hand, and on the other hand delay
-- or, if you like, equity among the users, because they are not able to
access the channel as fast.
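A toy simulation of this memory-1 idea; the specific probabilities are
illustrative assumptions, not the designed protocol from the paper:

```python
import random

def memory1_mac(n_users=5, slots=100_000, p_keep=0.95, p_new=0.05):
    """Toy memory-1 contention protocol: the previous slot's successful
    user transmits again with high probability p_keep (capturing the
    channel), everyone else contends with low probability p_new.
    Returns the fraction of slots carrying a successful transmission."""
    winner, successes = None, 0
    for _ in range(slots):
        tx = [i for i in range(n_users)
              if random.random() < (p_keep if i == winner else p_new)]
        if len(tx) == 1:        # exactly one transmitter: success
            winner = tx[0]
            successes += 1
        elif len(tx) > 1:       # collision: the channel capture is lost
            winner = None
    return successes / slots

# Raising p_keep lengthens each capture streak: throughput moves toward
# TDMA while the other users' waiting time (delay) grows.
print(memory1_mac())              # with memory: well above ...
print(memory1_mac(p_keep=0.05))   # ... the memoryless baseline
```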
Also, as I'm going to show briefly on the next slide, with proper design, a
fully distributed MAC protocol with a memory of N minus 1, where N is the
number of users, can achieve outcomes in a decentralized way which are as
good as those of a centralized scheduler.
So let us consider protocols with memory 1, just for simplicity. What I show
here is, on one axis, the total throughput -- this is a percentage, so it's
the total channel utilization -- and, on the other, the expected average
delay; and here is TDMA. TDMA fully utilizes the channel and has the minimum
delay for the users. However, again, as we said before, it involves some form
of message exchange as well as an infrastructure.
What we have here in red are the protocols currently used, without memory --
for example, what is drawn here is an 802.11 DCF protocol. In green, I'm
showing a state-of-the-art solution from Columbia University, which uses some
heuristic methods to exploit some form of memory. And what I'm showing here
in blue is our first algorithm, which uses just a memory of 1. What you can
see is a tradeoff between having, on one hand, a low delay but a limited
throughput, versus having almost as good a throughput as TDMA but with a
higher delay.
Here I'm showing only the protocol with memory 1, but we now have protocols
that are able to bring this curve toward the TDMA point. Now, why is it
shaped like this? Well -- oh, by the way, this is for five users, again with
the ternary feedback signal. For this particular scenario, the reason the
curve goes down is that a success lasts for only one slot. If there are many
idle and collision slots between successful transmissions, then as the total
throughput increases, the idle and collision slots become fewer, and as a
result the delay decreases. So that's the reason why the delay decreases
here.
On this other side, however, I'm going to have more successes and fewer
collisions, but a success of a user lasts for several slots. Hence there is
only a small number of idle or collision slots between two streaks of
success. This leads to a higher throughput but also to more delay. So you can
see there is this tradeoff between, on one hand, increasing the throughput,
and on the other hand, the delay increasing beyond a certain point.
>>: [inaudible] you will give a more [inaudible].
>> Mihaela van der Schaar: No. This is work done by another student, Joe
Park.
>>: Okay. So maybe.
>> Mihaela van der Schaar: We can refer you to the details of all of this in
some of our publications.
>>: Okay.
>> Mihaela van der Schaar: Love to.
>>: Maybe you can basically just briefly explain one of the coordination
protocols. So you basically remember, in the last slot when you collide,
whether the channel is [inaudible] or something, right?
>> Mihaela van der Schaar: Yeah. So I now know the following: I know whether
the spectrum is idle, whether nobody is transmitting; I know whether I had a
successful transmission in the previous time slot; and I know whether there
was a collision.
>>: Okay.
>> Mihaela van der Schaar: Kind of very similar to the current protocols we
have today, like CSMA.
>>: [inaudible] which is facing a channel that's occupied by someone else.
>> Mihaela van der Schaar: I could -- okay. So what I have right now is idle
or not. So I know whether the channel is occupied or not. Yeah. So I know
it's occupied or not.
>>: Okay.
>> Mihaela van der Schaar: And if I transmit, I know whether I was
successful in transmitting or not.
>>: [inaudible].
>> Mihaela van der Schaar: Yeah. So in the idle case I know whether it is used or not.
>>: Okay. And what are the decisions --
>> Mihaela van der Schaar: Okay. Okay. So the decisions are as follows: if a
user successfully transmits in the previous time slot, then it will have a
higher probability of transmitting again.
>>: Okay.
>> Mihaela van der Schaar: So it captures the channel. On the other hand,
users which are not transmitting at the current moment in time will transmit
with a low probability.
>>: [inaudible].
>> Mihaela van der Schaar: And note that this is quite different -- this is
quite different from current protocols, which -- yeah, yeah. Sorry. Again,
[inaudible] time for all of the details, but we would love to show you more
of the details.
Okay. So if you look at this set of protocols with memory, we have the
advantages of both worlds. On one hand, if you increase the memory -- I just
showed you memory 1, but if you increase the memory -- we are going to
achieve performance as good as TDMA without message passing, so the message
overhead will still be kept small.
On the other hand, if you are concerned about the amount of memory, then you
may have performance which is only in this region, as I showed here. With
memory 1, you will need to make a tradeoff between, on one hand, throughput
which may be as good as TDMA, versus delay. However, if you increase the
memory, this curve moves towards TDMA.
So we introduced these memory-based transmissions in MAC protocols. They
require much less overhead -- actually, no overhead -- compared to TDMA-type
scheduling. And by varying the amount of memory, we can obtain a variety of
performances in both delay and throughput. These protocols with memory can be
applied to various scenarios besides the example I just gave; for example, we
use them for event networking as well.
Now, to go back to a question I got before from the audience: some of these
methods can be implemented not only at the transmitter side or the receiver
side, but even within the network. For example, another student of mine
looked at how we can do distributed spectrum management in relay networks.
You may have a base station here, and multiple users transmitting through
each other to that base station. The goal in this project, which is a kind of
new project for us, was to have users autonomously determine their power. And
we would like them to spatially reuse the spectrum, so we allow simultaneous
relay transmissions by the users. Also, we are going to use
amplify-and-forward, which is low cost, unlike, for example, the
decode-and-forward methods used by other people. We let the nodes smartly
interact with each other, meaning that they optimize their use of the
spectrum in response to interference. Hence, in order to optimize such a
setting, we again rely on these non-cooperative approaches: we model this as
a game where the relays are the players, and they try to determine their
power allocations on the various channels in order to maximize their
achievable rates.
So every user i will try to determine the power allocation which maximizes
its achievable rate given the interference from the other users -- given the
power allocated by the other users. What we have proved for this type of
scenario is that in such settings at least one equilibrium exists; we
determined the conditions for convergence in such settings, and we designed
low-cost protocols that are able to converge.
And in doing that, the performance of these protocols was shown to be better
than that of conventional approaches such as equal power allocation or TDMA.
Also, if you compare with other existing solutions for spectrum management in
relay networks -- for example, work that does spatial reuse of the relay
slot, or auction-based power control, or simple two-user settings where
people have used much more complex schemes such as decode-and-forward -- what
we see is that in terms of spectrum efficiency, signaling overhead and cost,
our method has the best performance-versus-cost tradeoff. We achieve a high
spectral efficiency at low signaling cost -- actually, we don't have any
signaling cost, so it is a fully decentralized framework -- and at a very low
cost, because we rely on just amplify-and-forward techniques.
Now, let me move to a different topic. I talked quite a lot about multi-user
communication, but what I want to do now is talk briefly about
power-constrained transmission in wireless media networks. The work here was
done by a set of different students. The main goal was to drastically reduce
the power consumed by media systems -- to be application-aware and to come up
with designs which reduce both the transmit power and the processing power.
And the solution was really to deal with the dynamics introduced both by the
application and by the dynamics of, maybe, the power consumption or the
wireless channel: to learn these dynamics online and come up with a run-time
stochastic optimization that tries to maximize the quality of the video data,
for example, subject to whatever power constraint we may have.
And really, the overall goal here was to come up with a shift in the current
design of media systems, where the layers are now able to interact with each
other. It's a different type of interaction than in the previous slides: in
the past, users were competing with each other; here we have different layers
of the protocol stack, so, if you like, the different agents are cooperative.
They really try to cooperate with each other. They would like to take into
consideration the dynamics at the different layers and, based on that
information, reason and interact. So the idea is to have, based on this
interaction, learning methods that lead to smarter designs at the different
layers of the system stack.
So, just to give an example: here the layers are no longer, for example, the
application layer, the MAC layer and the physical layer; the layers we talk
about are the application layer, the OS layer and the hardware layer. Our
focus is again on trying to maximize, for example, the media quality, given
the traffic dynamics and whatever power constraints we may have. And the
types of problems that we have looked at were rate-distortion and power
control, as well as power management -- like, for example, voltage scaling --
and also things such as resource allocation and scheduling in the case where
we have multiple tasks.
We have also looked at the case where the hardware is faulty, so we may have
errors in the hardware. In that case, what we would like to do is still be
able to utilize these ICs rather than throwing them away, and try to
compensate for the errors at the higher layers.
Compared with the common solutions used to deal with this system problem,
the key idea for us was to no longer act based only on current information,
but rather to consider how the current dynamics will impact future dynamics
as well as future decisions -- how, for example, a decision to switch to a
different power level, or to switch off my RF transmission at this moment in
time, will impact not only the current performance but also the future
performance as well as future costs.
This is what we call foresighted decision making. To deal with these
dynamics online, we rely on reinforcement learning solutions. And this is
kind of different from the conventional approach. When people talked about
learning in the past, they mainly talked about model-based learning --
learning based on, for example, maximum likelihood estimation, or even our
own work on adaptive linear prediction. So most of the time what we have is a
layer -- for example, the application layer or the OS layer -- looking at a
particular type of dynamics, for example traffic dynamics or CPU dynamics,
estimating the current conditions, and based on that determining a local
policy, local to this layer, and deciding what is the best strategy to act
upon at this particular layer.
Rather, the proposed solution relies on repeated interactions among these
agents and on the unknown dynamics at the different layers of the protocol
stack. Also, we are going to rely on model-free learning techniques --
reinforcement learning techniques. So we are going to have interactive
learning: we are going to try, without having a model, to learn these
different dynamics online, and based on that, the different layers are going
to be able to interact with each other and exchange messages.
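A minimal sketch of the model-free, foresighted flavor being described here;
the state, action, and reward structures below are placeholder assumptions,
not the actual formulation used in the work:

```python
import numpy as np

def q_learning_power(env_step, n_states, n_actions,
                     steps=50_000, alpha=0.1, gamma=0.95, eps=0.1):
    """Model-free (Q-learning) sketch: env_step(s, a) -> (s2, reward) is
    the unknown environment.  States could encode buffer occupancy and
    channel/CPU conditions, actions could be power or frequency levels,
    and the reward could be video quality minus a power cost.  A discount
    gamma > 0 is what makes decisions foresighted rather than myopic."""
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(steps):
        a = np.random.randint(n_actions) if np.random.rand() < eps \
            else int(Q[s].argmax())
        s2, r = env_step(s, a)
        # Move Q(s, a) toward reward + discounted value of the next state.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
    return Q  # act greedily: argmax over actions in each state

# Hypothetical toy environment: 4 buffer states, 2 power levels; higher
# power drains the buffer (better quality) but costs energy.
def toy_env(s, a):
    s2 = max(0, min(3, s - a + np.random.randint(0, 2)))
    return s2, (3 - s2) - 0.5 * a

Q = q_learning_power(toy_env, n_states=4, n_actions=2)
```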
So our focus in this work is: what is the minimum amount of message exchange
needed among the layers to be able to maximize the utility across the
different layers of the protocol stack? The reason this is challenging is
that the different layers of the protocol stack operate at different time
scales: for example, the application layer operates at a very different scale
than, for example, the OS. Hence, what we would like to do is deal with this
kind of asynchronous behavior and have a distributed control which is able to
achieve the maximum performance. And Fangwen will briefly talk about some of
this work, though not in the context of this problem but rather in the
context of media transmission.
Let us look at how good these methods really are. What I'm comparing against
here is mainly work done by Professor [inaudible] and others at UIUC, where
they adapt all the different layers of the protocol stack, but they do that
myopically: they just look at what happens and react based on that.
So if you look at what happens for the particular scenario considered here,
the quality of the video transmitted is only 31 dB, which is really quite
low. On the other hand, if you look at the performance we can achieve by
applying these foresighted decisions, the performance is significantly
improved. And the reason is as follows: we are able to capitalize on the fact
that the actions we take now will impact future decisions, and these future
decisions will have implications at other layers of the protocol stack as
well. So this is becoming very important. It's not only about cross-layer
optimization, but also about understanding how the decisions made at
different layers impact the decisions at the same layer over time, but also
the decisions at other layers over time.
Finally, let me move to another topic: peer-to-peer networks. We did quite
some work here in recent times with a student who just graduated. The main
idea was not necessarily to look at media, but rather to come up with better
resource reciprocation strategies -- how users can determine their upload and
download rates in dynamically changing environments. So, if you like, coming
up with a better BitTorrent-type algorithm.
So what we wanted was an optimal solution for resource reciprocation among
peers that are interested in the same content. We would like a method that's
both rigorous and analytical -- so not only simulation-based, but one where
we can predict the outcomes we will get -- and that is able to deal with
dynamic interactions (peers coming and going) as well as with heterogeneous
peers: peers having different upload and download rates, which are
self-interested, try to maximize their own utilities, and act autonomously.
So it is a distributed scenario.
And the solution there was to model this type of reciprocation as a
stochastic game, where the peers again play a game. The actions of the users
playing the game are how much resource to reciprocate and to which users --
also, which users they should choke or unchoke. And they determine that, in
the framework of the stochastic game, by making foresighted decisions. So
they will not only look at the current impact on their performance, but also
at how their selection of actions -- which peers to choke or unchoke, to whom
they should transmit more information or less information -- will have an
impact not only in the short term but also in the long term.
So the goal is again to maximize utility among these repeatedly interacting
peers. And for that, in order to do this long-term optimization, you really
need an efficient and robust method to model the peers' behavior. We do that
again using online learning.
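A toy rendering of this foresighted reciprocation idea; the state and
transition structure here are invented for illustration, and the actual
formulation in the work is richer:

```python
import numpy as np

def foresighted_reciprocation(P, R, gamma=0.9, iters=500):
    """Toy value iteration for a stochastic reciprocation game.
    P[a] is a state-transition matrix when we play action a (e.g. which
    peers to unchoke / how much upload to give them), and R[s, a] is the
    download rate we then expect; the peer picks actions for long-term
    payoff rather than immediate rate."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        # Q[s, a] = immediate rate + discounted value of where the other
        # peers' reciprocation state is likely to move next.
        Q = R + gamma * np.stack([P[a] @ V for a in range(n_actions)], axis=1)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)  # foresighted policy: one action per state

# Hypothetical 3-state, 2-action example with random reciprocation dynamics:
rng = np.random.default_rng(0)
P = rng.random((2, 3, 3)); P /= P.sum(axis=2, keepdims=True)
R = rng.random((3, 2))
print(foresighted_reciprocation(P, R))
```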
The good news is that, more recently, the computer science department became
interested in implementing some of these frameworks; they have implemented
this on PlanetLab for us, and the current results are very promising:
significant improvements over methods such as BitTorrent or BitTyrant and
other existing methods.
Okay. So this is really the last topic I'm going to talk about. This is
again a more recent topic. The idea here is to come up with some form of
knowledge discovery, if you like, in the case of distributed network
processing units. You can think about having multiple gateways, for example
-- everybody in the home having a gateway -- each containing a particular
amount of video data. And what you'd like to determine is -- say you're
interested, for example, in skating. So you'd like to go through a set of
queries: whether it's a team sport, whether it's a winter sport, and all
these different queries are linked together. What we have shown is that these
complex classifiers can be decomposed into cascaded topologies of such simple
binary classifiers. But even more interestingly, these classifiers can be
located at particular processing nodes: for example, one particular entity is
specialized in answering this type of query, while another entity is
specialized in answering a different type of query, and these nodes are not
co-located -- they may be at different locations in the network. Then what we
have are processing operators that can be instantiated on distributed, remote
devices, and these devices may have their own resource constraints for
answering a particular query, or for processing a large number of queries in
a certain amount of time.
So you can think of not only one query being sourced at a particular moment
in time, but maybe multiple queries being sourced simultaneously. Hence,
these processing nodes may be congested.
The problem then becomes: can we, for example, maximize the joint
classification quality across a large number of queries subject to, maybe,
resource constraints at the processing nodes; delay constraints -- I may want
to answer a particular query within a certain amount of time, maybe a couple
of seconds or a minute; the dynamics of the traffic coming through the
different classifiers; and also, since these classifiers may at times be
located at different locations, how am I going to deal with these distributed
decisions? At a particular node I have local decisions to make: I can decide,
for example, to operate at a higher accuracy to come up with a better
classification of the data, but this may take a longer amount of time for
processing. So there is a tradeoff between the accuracy of processing the
data and the time associated with processing it.
Moreover, for certain types of queries, even the order of these
classification decisions can be changed. So what we have is a joint decision:
on one hand about the topology -- so, if you like, it's a routing problem,
but one which at this stage is combined not only with routing, with a certain
topological mapping of these different classifiers, but also with the
processing involved at every classifier. So as opposed to conventional, let's
say, routing problems, the problem is not only how to route the data, but
also how I should instantiate -- what should be the operating point of --
every classifier, because each one of those leads to a particular processing
delay and hence buffering delay across the chain.
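For a simple chain of classifiers, the operating-point half of this joint
decision can be sketched as below; the accuracy and delay numbers are
hypothetical, and multiplying stage accuracies into an end-to-end accuracy is
a simplifying assumption:

```python
from itertools import product

def best_operating_points(classifiers, delay_budget):
    """Brute-force sketch: each classifier in the chain offers operating
    points (accuracy, processing_delay); pick one point per stage to
    maximize end-to-end accuracy within the delay budget.  The real
    system handles general topologies and distributed decisions."""
    best, best_acc = None, -1.0
    for choice in product(*classifiers):
        acc, delay = 1.0, 0.0
        for a, d in choice:
            acc *= a
            delay += d
        if delay <= delay_budget and acc > best_acc:
            best, best_acc = choice, acc
    return best, best_acc

# Hypothetical two-stage query ("team sport?" -> "winter sport?"):
stages = [[(0.99, 8.0), (0.90, 2.0)],   # (accuracy, delay) per point
          [(0.97, 5.0), (0.85, 1.0)]]
print(best_operating_points(stages, delay_budget=10.0))
```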
So we came up with new solutions for stream mining here, covering both
distributed routing and processing algorithms, as well as optimization and
online learning. Some of this work has been done in collaboration with IBM
Research Watson.
Now, there are other topics that I didn't discuss here. For example, we did
quite a lot of work together with Intel on multi-hop wireless networks --
mesh networks -- mainly for campuses and larger homes.
We did some work on cognitive radio networks, by Fangwen here, but he will
not talk about that. We also looked at wireless media communication; Fangwen
will talk about this in his talk. And we looked at new cross-layer designs --
methods to do cross-layer design -- as well as, more recently, at some new
solutions for doing reconfigurable coding. I think Gary [inaudible] talked a
little bit in his talk about this new paradigm for video compression.
So, to finalize: many people are asking, well, how can we come up with a new
clean-slate design for the Internet, for example, or for networking, and how
can we really catalyze a new generation of algorithms, systems and designs?
It is our view that we don't necessarily want to throw away protocols and
throw away designs, but rather just reengineer the existing protocols and
make them operate as markets, where devices, based on the private knowledge
that they have, can dynamically coordinate, cooperate and compete. So we
don't have these dumb devices, but rather smart devices which can cooperate
and compete -- and maybe we even design the objective functions of these
users such that they are aligned with whatever objective a network provider
may have.
Based on our results, we show that this really improves the performance of
the network significantly. And again, Yi will go into a little more detail on
how this is possible.
So our hope is really that these devices will no longer be passive
transmitters or receivers or relays, but rather will evolve and become
smarter, because they need to compete against other users or coordinate with
other users, and there is an online learning process taking place. The hope
is that this will lead to theory as well as practical designs, and add some
new dimensions to both communication theory and networking theory -- and, we
increasingly believe, to game theory as well, because there the focus has
been much less on equilibrium selection and much more on equilibrium
characterization.
Also, the work there has not considered constraints in terms of memory and
in terms of message exchanges, which are very important for communication
problems.
Finally, I know I skipped through a lot of the material, even though I talked
way too long. If you're interested, please go to our website; from there you
can find a link to our publications. And if you have any questions or
comments about any of our work, we will be delighted to answer them.
>>: You really have too much material.
[applause].
>> Mihaela van der Schaar: Too long.
>>: I'm afraid to ask you questions because I think that will stretch the
talk way too long. Actually, I would like to learn more about the
[inaudible], especially in the [inaudible]. But I mean, I realize
[inaudible].
>> Mihaela van der Schaar: Too long.
>>: For me to interrupt.
>> Mihaela van der Schaar: Maybe we can have the discussion after lunch.
>>: [inaudible].
>> Mihaela van der Schaar: So I guess you either can come and visit us or invite
us -- if you are interested in --
>>: I will find out --
>> Mihaela van der Schaar: If there is a particular topic. So my purpose was
to tell you about different topics, but if you are interested in a particular
topic, either I alone or I together with a student could come back here and
give you more details.
Also, Yi will now talk about some other types of protocols than the ones
shown here, which are also able to improve the performance of wireless
networks. So that may be interesting. And he will go into much more detail.
Thank you.
>> Yi Su: Good morning, everyone. My name is Yi Su, and I'm a graduate
student in the electrical engineering department at UCLA, working in
Professor van der Schaar's group. It is my great pleasure today to present my
research work at Microsoft Research.
The topic of today's presentation is new perspectives on multi-user
communications, which summarizes our most recent progress in understanding
how heterogeneous communication devices' information availability, decision
making and learning capabilities impact the performance of communication
systems.
We start from a very high-level abstraction of the currently existing
literature in multi-user communication. As Professor van der Schaar just
mentioned, there are two main categories of existing approaches. The first is
the cooperative one: sometimes we have a centralized manager who collects
information from all the communication devices and performs some centralized
optimization to maximize a certain system-wide objective function. And
sometimes we have a non-cooperative approach, where the outcome of such
multi-user [inaudible] corresponds to the Nash equilibrium -- and it's well
known that the Nash equilibrium might not be efficient.
So here is the information availability axis. The existing research,
however, focuses on two extremes of information availability: either local
information or global information. In reality, we know that these
communication devices might be built based on different standards or
algorithms, and, given their hardware constraints, the ability to make
decisions can also differ from device to device. So our question is: what if
this happens, and what will the performance of the communication system be
like? In this talk we will use two illustrative examples: multi-user wideband
power control, and wireless random access.
So here's the structure of today's talk. We will discuss two kinds of
communication devices: we classify the communication devices into myopic
users and foresighted users, and we will give the definitions of this myopia
and foresightedness later.
We also make different assumptions about the information that these
foresighted users possess, and we will see how the system performance varies
as the available information changes.
basically we investigate this problem in the frequency-selective interference
channel setting. And it's a common setting in both the wired DSL system and
wireless OFDM system. So in this problem, the individual communication
devices want to optimize their transmit power spectrum density function in order
to increase the achievable rate in such system.
And we are going to use the game theoretic design to study the performance. So
here is the system diagram. And XIF represents -- so it's basically it's a
frequency-selective interference channel, so XIF represents the transmitter signal
from transmitter I at frequency F. So HIJF represents the cross channel game
from transmitter I to receiver J. So it's frequency selective, meaning that this
HIJF is not a constant, it's a function in superscript F. So it will vary from
frequency to frequency.
So it receives YIF represents a receive signal at the receiver tray. It basically is
composed of the -- it's designed signal from its own transmitter plus their
interference signal from the other transmitter and also plus the sigma IF here.
Sigma -- I have sigma IF here rather than noise spectra density at the receiver
side.
So if we look at user k's received spectrum, it basically consists of several parts. The first part is the noise power spectral density present at the receiver, indicated by the red curve. The second part is the interference the other transmitters cause to him: basically the white space between the blue curve and the red curve. And on top of that, he receives the desired signal from his own transmitter.
So each individual user needs to determine how much power to allocate in the different frequencies; basically, he needs to determine this P_k^f here, which represents user k's allocated power in frequency f. Each user is subject to a total power constraint: the summation of the allocated power over the different frequency [inaudible] needs to be less than or equal to this total power [inaudible].
>>: [inaudible] can you assume the message is delivered to one user, or can you send multiple messages, multiple [inaudible]? Let's say I have 10 messages in my queue, which is how it [inaudible]. Are you assuming in the allocation talking to multiple of them, or are you basically just assuming [inaudible] one [inaudible]?
>> Yi Su: Oh, he just needs to talk to his own receiver.
>>: Okay. So basically that's one receiver. So basically here he always has one sender and one receiver?
>> Yi Su: Right. But they are transmitting simultaneously, so they are creating interference to each other. So for the utility we define here, we actually assume that each user simply treats the interference from the other users as noise, so the total achievable rate it can successfully decode is given by this formula.
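For reference, here is a reconstruction of the signal model and rate expression described above; this is the standard form for a Gaussian interference channel where interference is treated as noise, so the exact notation on the slides may differ:

    Y_i^f = H_{ii}^f X_i^f + \sum_{j \neq i} H_{ji}^f X_j^f + N_i^f

    R_k = \sum_f \log_2\left(1 + \frac{|H_{kk}^f|^2 P_k^f}{\sigma_k^f + \sum_{j \neq k} |H_{jk}^f|^2 P_j^f}\right), \qquad \text{subject to } \sum_f P_k^f \le P_k^{\max}.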
Okay. So now that we have the basic structure of the multi-user interaction, we can define a game. Basically, we have a number of players; the action set is given by this one, meaning that each user needs to choose some transmit power spectral density function that satisfies the total power constraint; and each user's payoff is given by his achievable rate.
So now we briefly review the existing solutions for this wideband power control scenario. The first kind of approach is the non-cooperative approach, and the most famous algorithm is the iterative water-filling algorithm, proposed by people from Stanford.
From a single user's perspective, given the noise-plus-interference power spectral density, the best thing he can do is simply perform water-filling. Iterative water-filling is just a multi-user version of the single-user water-filling. The basic procedure is the following: every user just randomly picks some feasible power allocation, and initially this power allocation may not be the best response with respect to the interference that the other users cause to him. At each iteration, each user updates his power allocation to the water-filling solution with respect to the interference the other users cause to him. And once he updates his power allocation, he creates a new pattern of interference to the other users. So they perform this iteration over time, and people have shown that under mild channel conditions this iterative procedure will converge to a unique Nash equilibrium. This is the so-called iterative water-filling solution. In this scenario, each transmitter only needs to gather feedback from its own receiver about the interference pattern the other users cause to it, so there is no information exchanged between different transceiver pairs.
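To make the procedure concrete, here is a minimal Python sketch of iterative water-filling. The bisection-based water-filling, the channel indexing convention, and the equal power budgets are my own illustrative choices, not the exact implementation from the Stanford paper:

    import numpy as np

    def waterfill(noise_plus_interf, gain, p_budget):
        # Single-user water-filling: allocate p_budget across bins given the
        # effective noise floor n^f / |h^f|^2, by bisecting on the water level.
        floor = noise_plus_interf / gain
        lo, hi = 0.0, floor.max() + p_budget
        for _ in range(60):
            mu = 0.5 * (lo + hi)
            if np.maximum(mu - floor, 0.0).sum() > p_budget:
                hi = mu
            else:
                lo = mu
        return np.maximum(0.5 * (lo + hi) - floor, 0.0)

    def iterative_waterfilling(H, sigma, p_budget, iters=100):
        # H[i, j, f]: gain from transmitter i to receiver j at frequency f.
        K, _, F = H.shape
        P = np.full((K, F), p_budget / F)   # any feasible starting allocation
        for _ in range(iters):
            for k in range(K):              # each user best-responds in turn
                interf = sigma[k] + sum(H[j, k] * P[j]
                                        for j in range(K) if j != k)
                P[k] = waterfill(interf, H[k, k], p_budget)
        return P

    # Example: 3 users, 16 bins, random channels with strong direct links.
    rng = np.random.default_rng(1)
    H = rng.uniform(0.05, 1.0, (3, 3, 16))
    for k in range(3):
        H[k, k] += 1.0
    P = iterative_waterfilling(H, np.full((3, 16), 0.1), p_budget=10.0)

Each pass of the inner loop is exactly the best response described above; convergence of P corresponds to reaching the Nash equilibrium of the water-filling game.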
Okay. The other type of approach is the cooperative approach. In this case, they need to have a system manager which collects all the cross-channel coefficients from the different transceiver pairs. This centralized controller then solves some weighted [inaudible] maximization problem. Please note that this problem is actually a non-convex problem which is very hard to solve, but people use some dual approach to handle it.
So this figure shows the performance comparison between the non-cooperative approach and the cooperative approach. We can see that the cooperative approach needs a lot of information exchange, but its performance is much better than the best known solution in the non-cooperative scenario.
So our question is: if we only focus on the non-cooperative setting, where users only have their local information about the interference pattern the other users create for them, can we do better than iterative water-filling?
So now we will assume that some user gets smarter: he has information about the water-filling game and about how he should perform in such a non-cooperative setting. This is the basic question. We want to get some intuition from a very simple two-action game here. We have a row player and a column player. The row player can choose to play either up or down, and the column player can choose to play left or right.
In each box, the first element represents the payoff of the row player and the second element represents the payoff of the column player. As we can see here, 2 is larger than 1 and 4 is larger than 3, so if the row player wants to play the Nash strategy, he will definitely choose to play down. However, if he chooses to play down, then the column player, because 1 is larger than 0, will choose to play left, so they end up at the down-left play, which gives a joint payoff of (2, 1) to the two users. However, as we can see here, the up-right play would actually result in a strictly better payoff for both users, if the row player chooses to play up.
So from this simple game we can see that the down-left play is actually the Nash equilibrium; however, if the row player gets smarter and chooses to play up, then both users end up with a better play. Which means: if a user is myopic, he just takes the actions of the other users as given and plays the Nash strategy, and this outcome will not be efficient. However, if this user gets smarter, he becomes foresighted, and if he knows the actual structure of the game they are playing, he might choose another action which might benefit himself. This is the basic intuition that we get from this simple game. Okay.
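One payoff assignment consistent with the numbers quoted above is the following (the column player's payoffs of 0 are my assumption for the cells not explicitly mentioned in the talk):

                 Left      Right
    Up          (1, 0)    (3, 2)
    Down        (2, 1)    (4, 0)

Here down strictly dominates up for a myopic row player, giving the Nash outcome (down, left) with payoffs (2, 1); but a row player who can commit to up induces the column player to play right, yielding (3, 2) for both.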
Then we go back to the original wideband power control scenario, and to investigate this foresightedness issue we first need to define the so-called Stackelberg equilibrium. For the Stackelberg equilibrium, we have one foresighted user and multiple myopic followers. The foresighted user determines his action based on the following rule: once he picks an action, he is aware of the responses of all the other myopic followers, and he will choose this action a_k star here. This a_k star gives him a utility that is not worse than any other feasible action in his own action set. Meaning, this foresighted user is aware of the reactions of the other myopic users.
He then simply chooses the action that brings the best performance for himself. Okay. Then we prove the existence of the Stackelberg equilibrium: we always have a Stackelberg equilibrium in this power control game. This is simple, because the mapping is continuous and the utility is upper bounded. The upper bound is the single-user water-filling bound, corresponding to the case where the other users create no interference to this foresighted user; it is the best rate this foresighted user can get in an interference-free environment. As long as the other users create interference to him, his achievable rate will be less than this upper bound.
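Written out, the condition described here is (my reconstruction of the notation):

    u_k(a_k^*, BR_{-k}(a_k^*)) \ge u_k(a_k, BR_{-k}(a_k)) \quad \text{for all } a_k \in A_k,

where BR_{-k}(a_k) denotes the myopic followers' water-filling best response to the leader's action, and the single-user water-filling bound is the rate obtained by water-filling against \sigma_k^f alone.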
Okay. Then the next question is how we can find the Stackelberg equilibrium, and how much better this Stackelberg equilibrium is compared with the Nash equilibrium, because intuitively it should not be worse than the Nash equilibrium. We start from a very simple two-user scenario; it can be extended to the multi-user version. We can formulate it as a bi-level programming problem. Here we let user 1 be the foresighted user. User 1 needs to determine his power allocation, which is a vector, such that this power allocation maximizes his own achievable rate. Please note that when user 1 forms this optimization program, he needs to consider the reaction of user 2, because user 2 will simply choose the water-filling solution. So basically this bi-level program consists of two sub-problems. The first is the upper-level problem. Actually, in the original iterative water-filling solution, the user only solves the upper-level problem; he totally ignores the lower-level problem.
So if the user becomes foresighted, he will form such a program. And the question is: having such a bi-level program, how can we compute the Stackelberg equilibrium? Okay.
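The bi-level program for the two-user case, as described, can be reconstructed as:

    \max_{p_1} \; R_1(p_1, p_2) \quad \text{s.t. } \sum_f p_1^f \le P_1, \; p_1^f \ge 0,

    \text{where } p_2 = \mathrm{WF}_2(p_1) \text{ solves } \max_{p_2} R_2(p_1, p_2) \text{ s.t. } \sum_f p_2^f \le P_2.

The lower-level problem is user 2's water-filling best response; iterative water-filling corresponds to ignoring the dependence of p_2 on p_1.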
So we try to understand the computational complexity of this Stackelberg equilibrium, and based on that understanding we propose a low-complexity algorithm to compute it. In order to handle the original bi-level program, we need to reformulate it into a single-level problem. So we introduce a function called the water-filling function. Basically, this water-filling function determines user 2's allocated power in the different frequency bins, given the power allocation of the foresighted user and given user 2's own noise power spectral density, cross-channel gain, and power budget.
So here is the closed-form expression of this water-filling function. It looks complicated; however, the intuition is very simple. We can look at the illustration here. Let's say user 1 has chosen some power allocation, and this is the spectrum that user 2's receiver sees. Before user 2 determines his power allocation, he needs to do the following things. First, he needs to rank his channels based on the channel conditions, from the best channel to the worst channel. Second, based on his own power budget, he needs to determine how many channels to occupy, and he simply increases his water level until he uses up his whole power budget. So what we see in the closed-form expression of the water-filling function is that, first, we need a permutation which ranks user 2's channels based on his own channel conditions, and second, we need to determine the number of channels such that user 2 will only access those channels.
As we can see from this closed-form expression, whenever user 1's power allocation pushes user 2's water level across the boundary between different frequency bins, the left derivative and the right derivative of this function become unequal, which makes the water-filling function non-differentiable. At the same time, this power term P_1^f also appears in the denominator of the objective function, which makes the objective function non-convex.
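For the two-user case, the per-bin form of this best response is the familiar water-filling expression (my reconstruction; the talk's closed form additionally spells out the channel ranking and the active channel set explicitly):

    \mathrm{WF}_2(p_1)^f = \left[\mu_2 - \frac{\sigma_2^f + |H_{12}^f|^2 p_1^f}{|H_{22}^f|^2}\right]_+, \qquad \mu_2 \text{ chosen so that } \sum_f \mathrm{WF}_2(p_1)^f = P_2.

The set of bins where the bracket is positive changes as p_1 varies, which is exactly where the left and right derivatives disagree.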
These two facts make the optimization problem very hard to solve. However, we noticed that there exists some literature applying dual algorithms to non-convex power allocation problems, so we tried to apply this method to our Stackelberg equilibrium computation and see whether or not it is suitable for our scenario.
So to apply the dual algorithm: basically, we have the primal problem here. From the primal problem we first form the Lagrangian, and by maximizing the Lagrangian for a given dual variable we obtain the dual function. Then by minimizing the dual function we obtain the dual optimum.
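In symbols, the dual construction described here is the standard one (my notation):

    g(\lambda) = \max_{p_1 \ge 0} \; R_1(p_1, \mathrm{WF}_2(p_1)) + \lambda\left(P_1 - \sum_f p_1^f\right), \qquad d^* = \min_{\lambda \ge 0} g(\lambda),

and by weak duality d^* upper-bounds the optimal value of the non-convex primal, possibly with a nonzero duality gap.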
First, we show that for the Stackelberg equilibrium the duality gap may not be zero; however, even if the duality gap is not zero, the dual optimum is still strictly tighter than the original single-user water-filling bound. We also show a monotonicity property of the dual function in terms of the dual variable, meaning that the dual function decreases as I increase the penalty term of the dual variable.
And the key question is: what is the complexity of solving this problem in the dual domain? Unfortunately, it still has the same complexity as in the primal domain, because this problem cannot be decomposed into independent per-frequency [inaudible] problems.
Why can't the complexity be reduced? Because, as shown in the water-filling function on this slide, user 2's allocated power depends not only on user 1's allocated power in this particular frequency bin; it also involves user 1's allocated power in the other frequency bins. That is what makes this problem very hard to handle. So the key idea of our proposed low-complexity solution is that, instead of solving the [inaudible] globally, we propose to use a local maximum to approximate the dual function. Then we just need to find a local maximum for each value of the dual variable and use it to update the dual function.
The advantage is that we get a very low-complexity solution that is not purely based on heuristics, and we also allow an arbitrary number of myopic users. Okay. Then, applying this low-complexity solution, we simulated a wide range of channel realizations, and here something interesting happens. We find that by introducing such a foresighted user, using this low-complexity algorithm, the foresighted user always achieves better performance compared with the iterative water-filling algorithm, meaning that if the user gets the global information and chooses another action, he will definitely benefit himself. On the other hand, surprisingly, in most channel realizations the myopic followers also get strictly better performance. So even though the foresighted user is trying to maximize his own utility, the other myopic followers also get better performance. This is quite unique, because in the general game-theoretic literature people do not have a general result about the follower's performance in a Stackelberg game; it turns out that in the water-filling game the followers' performance also gets better.
Okay. How is this achieved? Let's look at the action domain. These two figures show the power allocations of both users; the upper figure shows the power allocation under the iterative water-filling algorithm. We can see that if both users apply the water-filling algorithm, they just play this best response. However, if user 1 is foresighted, he will choose not to water-fill his power: he will occupy only some channels in which he has very good channel conditions, putting a lot of power in those channels, and by doing so he keeps the followers away from them. This way he can achieve much better performance compared with the water-filling solution. Even though he could gain an immediate increase by water-filling into the remaining channels, it turns out that if he water-fills into those channels, in return the other user will occupy them, which results in worse performance for himself.
>>: So [inaudible] has some kind of different channel transfer function?
>> Yi Su: Right, right, right, exactly.
>>: So that's why you put power on channels 3 and 7 to 10, the more favorable channels?
>> Yi Su: That's the basic idea of frequency selectivity, yeah.
>>: How achievable is that in the real world case?
>> Yi Su: Okay. So basically, in the DSL system these channel coefficients are fixed. They are fixed over time.
>>: But you need to discover the -->> Yi Su: Right. So -->>: The channel condition, which is fixed over time, I mean, may not be -- I mean, the channel condition [inaudible] change -->> Yi Su: Yeah. So the original power control, this water-filling game, was proposed for the wired system, for the ADSL and VDSL systems.
>>: [inaudible].
>> Yi Su: Okay. For that case.
>>: Which -- yeah.
>> Yi Su: It's not a problem.
>>: Yeah.
>> Yi Su: So here we -- in the first part of this talk we assume that they know this
information.
>>: Okay.
>> Yi Su: And in the next part we will relax this assumption and they will learn
this information.
>>: Okay.
>> Yi Su: Okay. So the conclusion so far is that we have the non-cooperative setting; however, if some user gets foresighted and is able to learn the structure of the game he is playing, he can potentially improve the performance of all the players. Okay.
As I just mentioned, in order to play this Stackelberg equilibrium we assumed that the foresighted user has global information, which might not be very realistic in real systems. A more realistic assumption is that the foresighted user only knows the interference the other users cause to him, the aggregate effect, not the detailed cross-channel transfer functions of the actual game he plays. So the next problem we investigate is: if the foresighted user only knows this aggregate effect, how should he explore this structure and still attain desirable performance similar to the Stackelberg equilibrium? The intuition here is that the foresighted user should somehow model the cumulative interference the other users cause to him. Okay, so now we relax this assumption and see how this user should react.
Okay. In order to model the mutual coupling, we need to reformulate the game we originally defined, and first we introduce a new concept called the state. The state is some value that directly relates to each user's utility function: given user k's state and user k's action, we can directly determine user k's utility. In particular, in the power control game, the state is the interference the other users cause to him. Having defined the state, we need a state determination function: this state determination function captures the actual play of the game, that is, the aggregate effect of the other users' actions. It is a mapping from the joint action space to the individual users' states.
On the other hand, we need some internal belief. Each user has a belief function, which is a mapping from his own action space to his own state. Each belief function represents the individual user's internal model of how his own action impacts his own state. So it is a purely internal belief from the individual user's own perspective.
Now we are ready to define the conjectural equilibrium. What does conjectural equilibrium mean? Basically, it is a configuration of belief functions and a joint action set. At a conjectural equilibrium, each user first finds that his beliefs are realized: he believes that if he takes such an action it will bring him a certain state, and this state is confirmed by the actual play. On the other hand, he maximizes his own utility function, believing that playing this certain action will bring him such a state. So now we have decoupled the multi-user interaction: each user just has his own internal belief, and he performs a best response with respect to his own internal belief.
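In symbols, the two conditions just described can be written as (my reconstruction):

    \tilde{s}_k(a_k^*) = s_k(a^*) \quad \text{(beliefs are realized by the actual play)},

    a_k^* \in \arg\max_{a_k} u_k\bigl(a_k, \tilde{s}_k(a_k)\bigr) \quad \text{(best response to one's own belief)},

where s_k is the state determination function and \tilde{s}_k is user k's belief function.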
Now we are ready to apply this conjectural equilibrium idea to the water-filling scenario. Okay. In the power control game, the state is simply defined as the interference the other users cause to a user, and the utility function is still the same as in the previous case. The state determination function is the actual play: the users' allocated powers directly determine the received interference. For the belief function in this case we choose a very simple linear form, and we will explain why this makes sense. Each individual user simply believes that if he increases his allocated power in frequency bin n, the interference he is going to experience there will be decreased by a certain amount; this is exactly his internal belief. Okay.
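In this notation, the linear belief has the form (the symbols beta and gamma follow the talk; the exact per-bin indexing is my assumption):

    \tilde{I}_k^f(P_k^f) = \beta_k^f - \gamma_k^f P_k^f,

so gamma_k^f captures how strongly user k believes that adding power in bin f will drive the other users, and hence their interference, out of that bin.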
Here is the main result we prove. First, we show that the aggregate interference the other users cause to this particular user is a piecewise linear function of this user's allocated power in that channel, and the first-order derivative with respect to the allocated power in the other frequency bins goes to zero if the number of frequency bins is sufficiently large.
This means the first-order information is sufficient to capture the multi-user coupling in such a multi-user power control game. The intuition is the following. This is user 2's spectrum; he originally performs water-filling. Suppose user 1 increases his allocated power in this frequency bin by a certain amount. User 2's water level will then definitely increase; however, if the number of frequency bins is sufficiently large, the increase of the water level will be negligible. So the change of user 2's allocated power in the other frequency bins is essentially zero, while user 2's allocated power in this particular frequency bin is reduced as a linear function of user 1's increased power. In return, the interference user 1 experiences in this particular channel is simply a linear function of his own allocated power in this frequency bin. That is the basic intuition.
Second, we show that both the Nash equilibrium solution and the Stackelberg equilibrium solution of the previous cases are just special cases of the conjectural equilibrium, corresponding to different parameter settings in the belief function. In the Nash equilibrium case, the foresighted user simply sets the first-order term to zero, while in the Stackelberg case the foresighted user needs to set his parameter gamma to the negative of the first-order derivative of the interference with respect to his own allocated power.
Another result we have is that an entire set of conjectural equilibria actually exists in the belief domain. Since we are mostly interested in achieving the Stackelberg equilibrium while only having local information, the next step is simply: how should we properly set the belief parameters beta and gamma in order to approach the Stackelberg equilibrium with only local information?
Then we propose a dynamic update. During each iteration, the foresighted user first needs to estimate the first-order information: the first-order derivative of the aggregate interference the other users cause to him as a function of his own allocated power in the different frequency bins. Having estimated this information, he includes this first-order information in his own optimization program and updates his allocated power. Then he re-estimates the first-order information, and this is done iteratively until no further rate improvement can be achieved.
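As a toy illustration of this loop, here is a two-user Python sketch (the water-filling helper is repeated so the snippet is self-contained). The finite-difference derivative estimate and the projected-gradient update are my own simplifications; the actual algorithm plugs the estimated first-order term into a water-filling-like program:

    import numpy as np

    F, P1, P2 = 16, 10.0, 10.0
    rng = np.random.default_rng(0)
    g11, g22 = rng.uniform(0.5, 2.0, F), rng.uniform(0.5, 2.0, F)    # direct gains
    g21, g12 = rng.uniform(0.05, 0.4, F), rng.uniform(0.05, 0.4, F)  # cross gains
    sigma = np.full(F, 0.1)

    def waterfill(floor, budget):
        # floor[f] = effective noise level; bisect on the water level mu.
        lo, hi = 0.0, floor.max() + budget
        for _ in range(60):
            mu = 0.5 * (lo + hi)
            if np.maximum(mu - floor, 0.0).sum() > budget:
                hi = mu
            else:
                lo = mu
        return np.maximum(0.5 * (lo + hi) - floor, 0.0)

    def interference_at_1(p1):
        # User 2 myopically water-fills against user 1's allocation; user 1
        # only ever observes the resulting aggregate interference.
        p2 = waterfill((sigma + g12 * p1) / g22, P2)
        return g21 * p2

    def rate1(p1):
        return np.log2(1.0 + g11 * p1 / (sigma + interference_at_1(p1))).sum()

    p1 = np.full(F, P1 / F)
    for _ in range(200):
        base = interference_at_1(p1)
        eps = 1e-3
        # Per-bin finite-difference estimate of dI_1^f / dp_1^f, using only
        # locally observable interference (the "first-order information").
        dI = np.array([(interference_at_1(p1 + eps * np.eye(F)[f])[f]
                        - base[f]) / eps for f in range(F)])
        # Gradient of R_1 including the conjectured linear effect of p_1 on I_1.
        s = sigma + base
        grad = (g11 - dI * g11 * p1 / s) / ((s + g11 * p1) * np.log(2.0))
        p1 = np.maximum(p1 + 0.05 * grad, 0.0)
        p1 *= P1 / p1.sum()   # crude projection back onto the power budget
    print("conjecture-based rate for user 1:", rate1(p1))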
Then we have this conjecture-based rate maximization algorithm. The essence of this algorithm is actually a local approximation of the bi-level program: we exploit the structure of the water-filling game, because the simple linear belief makes sense in this scenario. So we can improve the performance in this multi-user power control scenario with only very simple linear beliefs and only local information. The simulation results show that it achieves performance comparable to the previous case, where the foresighted user possesses global information.
Okay. So the conclusion so far is that if the foresighted user possesses only local information, and if he is able to form correct beliefs, then potentially he is still able to improve the performance not only of himself but also of the other players. Okay, then the next question is: what if we have multiple foresighted users, all of them trying to learn and adapt? It turns out this might be difficult at this stage, because in the power control game the action set is actually a set of vectors, which is not that easy to handle. However, we are able to discuss this problem in a different setting, which is wireless random access.
So here we allow multiple foresighted users, all of them having only local information, and we again apply this conjecture-based approach to see how it shapes the multi-user interaction in such an application scenario. Basically, we investigate multiple nodes in a single cell. The action of each user is just to determine its transmission probability, and the payoff is given by the throughput, just as Professor van der Schaar mentioned before: it is given by its own transmission probability times the probability that the other users do not transmit.
The key issues people are most interested in for this kind of multi-user interaction are the stability of the algorithm, the convergence of the algorithm, and the throughput efficiency and fairness of the interaction.
As we have seen before, if a user is purely myopic, he will set his transmission probability to one; then the network collapses and nobody can get through.
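With \theta_k denoting user k's transmission probability (my notation), the throughput payoff just described is:

    u_k(\theta) = \theta_k \prod_{j \neq k} (1 - \theta_j),

and the myopic best response is \theta_k = 1 regardless of the others, which drives every u_k to zero.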
Okay. Then we investigate the conjecture-based approach in this scenario. Very simply, we can define the state to be the probability that a user sees a contention-free channel: this is the probability that this particular user k has a free channel. This state [inaudible] local information; the user just monitors the aggregate effect the other users cause to him. Then each user again develops a linear belief: at each iteration, each user believes that by deviating from my current action by a certain amount, I will in return reduce my probability of having a free channel by a proportional amount.
We are mostly interested in the limiting behavior of this dynamic process, and we investigate two different update mechanisms. The first is the best response scenario: each user just updates his action as the best response. In the second scenario, users just update their actions along the gradient direction.
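Here is a toy simulation of the best-response variant under linear beliefs. The specific belief parameterization, the damping, and the value of gamma are my assumptions for illustration, not the paper's exact protocol:

    import numpy as np

    K, iters = 10, 500
    gamma = np.full(K, 2.0)   # assumed belief slopes; tuning them shifts the outcome
    theta = np.full(K, 0.5)   # initial transmission probabilities

    for _ in range(iters):
        new_theta = np.empty(K)
        for k in range(K):
            # Locally observed state: probability the channel is free of others.
            s_k = np.prod(np.delete(1.0 - theta, k))
            # Linear belief around the current point:
            #   s_tilde(t) = s_k - gamma_k * (t - theta_k)
            # The best response maximizes t * s_tilde(t), a concave quadratic.
            new_theta[k] = np.clip((s_k + gamma[k] * theta[k])
                                   / (2.0 * gamma[k]), 0.0, 1.0)
        theta = 0.5 * theta + 0.5 * new_theta  # damping to keep dynamics stable

    success = np.array([theta[k] * np.prod(np.delete(1.0 - theta, k))
                        for k in range(K)])
    print("equilibrium transmission probabilities:", np.round(theta, 3))
    print("total throughput:", round(success.sum(), 3))

In this sketch, a very small gamma pushes users back toward the myopic collapse (theta clipped at 1), while a larger gamma settles them onto less aggressive, more efficient operating points.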
Okay. So here is the actual play, and here is the conceived play. We have four major results. The first result is that all the operating points in the action space are actually conjectural equilibria; this by itself says nothing about stability. The next result is about stability: we provide sufficient conditions for the stability and convergence of these two dynamic update algorithms. Using these sufficient conditions, we are able to show that the entire region spanning from the original Nash equilibrium all the way to the Pareto boundary consists of stable conjectural equilibria with respect to different belief setups. And the fourth result is that at these different operating points, the multiple users actually obey a weighted fairness rule. Taken together, these four results mean that if we design this conjecture-based approach properly, we simultaneously get stability, convergence, efficiency, and fairness.
Okay. These slides explain why the users can achieve weighted fairness. The definition of weighted fairness is as follows: this formula gives the probability that user i gets its packet through, and weighted fairness just means that each user's probability of a successful transmission is proportional to its weight. By simple manipulation we obtain this form relating different users. At the equilibrium of this conjecture-based protocol we find a similar formula, and we know that if the number of users in the network is large enough, each user's transmission probability will be close to zero, so the two equalities become approximately the same.
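Spelling out the manipulation (my notation): weighted fairness requires

    \frac{u_i}{w_i} = \frac{u_j}{w_j} \quad \text{with } u_i = \theta_i \prod_{k \neq i} (1 - \theta_k),

and dividing both sides by the common product gives

    \frac{\theta_i}{w_i (1 - \theta_i)} = \frac{\theta_j}{w_j (1 - \theta_j)};

for small \theta this reduces to \theta_i / w_i \approx \theta_j / w_j, which is the form the equilibrium condition approximately matches.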
So the next question is how we should set up the belief function in order to obtain Pareto efficiency. The key idea is that these users should adapt, and they can, because they all have access to the same observation of the outcome: they know exactly which packets get through. So they simply adapt the parameters of the belief function until they find that they cannot further improve the system performance.
Okay. The advantage of this solution is that we don't need a centralized solver; it operates based only on local information. The resulting operating point is throughput efficient and also fair, and it can autonomously adapt to traffic fluctuations.
In these slides we present an engineering interpretation of this conjecture-based approach to explain how it differs from the DCF protocol. This is the original update rule for the best response scenario, and we can actually decompose it into several different terms. These terms describe, given the transmission probabilities of all the users and the observation of the previous transmission, how the probabilities of the different users evolve in the next time slot. So basically it describes the deterministic trajectory of the multi-user interaction. Then we just fill in this table, which represents the conjecture-based random access. We can see here that if user 1 observes the other user make a successful transmission, then in the best response scenario he will choose to reduce his transmission probability by a factor of 2. Similarly, we can fill in the blanks for the other observations; then we have the protocol of this conjecture-based approach here. We can also put the existing DCF protocol here. Basically, what DCF does is the following: if I did not transmit in the previous time slot, I do not change my transmission probability, I just reduce my contention window counter by 1; if I observe a collision, then I reduce my transmission probability; and if I make a successful transmission, I increase my transmission probability. As we can see, our proposed approach is actually similar to the DCF protocol in the cases where I made a transmission attempt. What is different is the case where I did not make a transmission: even then, I still make some modification to my transmission probability for the next time slot.
Why is this interesting? Because even though I did not transmit in the previous time slot, I still observe something from the joint play: I know whether the channel is congested or not. So I should also be able to adapt my next transmission action, in order to track the network fluctuations in a timely manner. This is the basic idea. It means that the conjecture-based random access is actually making use of four distinct observation outcomes, compared with the DCF case, where only two are exploited.
Okay. So here are the simulation results. We can see here that DCF is not efficient as the number of nodes in the network increases. The P-MAC protocol was proposed by people from the University of Michigan at IWCOS; in that case they need to know exactly the number of nodes in the network, and they develop some approximate [inaudible] expression for the transmission probability that enables a very high throughput. It turns out that with this conjecture-based algorithm we can also beat this P-MAC protocol, having no information about the number of nodes in the network, just adapting based on the actual outcomes.
However, the P-MAC protocol needs to know the number of nodes, and in an actual implementation people need to estimate the number of nodes online, which can cause the protocol to be unstable, because it is open loop; there is no [inaudible]. Because of our [inaudible] design, we showed that this equilibrium is stable. So even when we change the number of nodes in the system over time, we can see here that the conjecture-based algorithm can still sustain the stability of the operating point over time.
So the conclusion so far is that even if everybody has only local information, if we design the belief functions correctly and properly, we are still able to improve the system performance. The key idea is, based on the particular scenario you investigate, to decide how you should define your belief function and update it.
So the future direction is that we are trying to understand for which types of multi-user interactions this conjecture-based approach is suitable; and also, based on the information availability in different application scenarios, how we should choose the appropriate equilibrium concept, how we can compute it, and what its stability, convergence, and fairness properties are. We are also looking forward to applying this conjecture-based approach in different applications. Based on the presented work, we have two accepted journal papers and one more pending, and we have also published two conference papers. The accepted ones are all actually transactions papers. Okay. Thank you.
>> Jin Li: Thank you.