>> John Dunagan: So it's my pleasure to introduce Kashi Vishwanath, who will
be talking to us today about demystifying Internet traffic. Take it away.
>> Kashi V. Vishwanath: Do I need to stand here?
>> John Dunagan: No, it will record you there.
>> Kashi V. Vishwanath: All right. Thanks, John. Thanks for the introduction.
It's a pleasure to be here and to talk to you today about my experiences trying
to unravel some of the mysteries surrounding Internet traffic, at least in the
context that I'm looking at it in.
So traffic is an important phenomenon that, as you may know, is useful in a
variety of different settings. For example, Internet Service Providers, ISPs, are
interested in understanding what traffic demands look like at specific points in
the network, not just for today but also projected into the future, before they
take capacity planning measures.
Similarly, people like you and us, who are trying to do measurement studies and
trying to evaluate services, are interested in understanding what kind of traffic
patterns exist out there.
So in this talk I'm going to be focusing on yet another example, which I'll keep
coming back to, which is the following. Let us say that we are interested in
designing the next generation Internet architecture, protocols, applications,
services, one of these. So we want to evaluate, when we make certain design
choices on the whiteboard and they eventually translate to a real running service
on the Internet, what the performance looks like when individual clients are
placing requests for the data it contains.
Now, the Internet obviously has a large number of users running a number of
different applications, protocols, et cetera. So it is clear that the performance of
this application will be governed by what kind of traffic exists out there.
What is not clear, however, is what is the extent of this impact. So in this talk,
towards the end, hopefully I will have convinced you the following things.
Obviously the first one is that existing Internet traffic is extremely rich in its
structure. It has a number of interesting statistical properties that one might
wish to capture, for example its burstiness patterns at different time scales.
Similarly, the impact it will have on individual applications is extremely difficult to
predict a priori, which in turn means that you need a more systematic method. I
will then tell you about one method I have developed for reasoning about what
kind of impact this traffic will have on the individual applications and protocols
that you are trying to evaluate.
So with that being the general theme of the talk, let me now give you the specific
problem that I've tried to solve and what is the solution I've reached.
So let us go back to this problem. We are trying to evaluate something that we
have deployed: how does it interact with Internet traffic and what is the impact?
So one obvious solution, one way to approach it, is that we would get access to
a number of different machines on the Internet. We would then deploy the
prototype we are trying to evaluate on each of these machines.
Now, since we are running on the real Internet, there is no question about how
realistic the experiment is. The big problem, as you can imagine, is how do you
get access to these machines. Even if you get access to a number of different
machines, how do you ensure that they are representative? And finally,
experiments on the Internet are not quite reproducible, as you might know. So
to get around all of these challenges, we researchers would somehow like to get
away from running experiments on the Internet; we would like to run an
equivalent experiment by configuring machines that we have complete
ownership and control of, in a local testbed, a local set of machines in our own
racks.
So now we take the prototype that we are trying to evaluate and run it on the
local cluster. Similarly, we reproduce the set of clients that were accessing
content on the Internet on the local cluster, and in this fashion we have
replicated the communication characteristics on a set of machines that we
completely own.
At this point, we might naively declare that the evaluation of the prototype is
complete. But as you can see, the picture on the right is missing a key thing: it
does not quite capture the complexity of the Internet. So a high-level motivation
for my research in the recent past and hopefully in the near future is the following
question.
So what would it take to create meaningful, simplified snapshots of the Internet
so that you could try your best to reproduce that snapshot in a local cluster?
Now, once you do that by appropriately configuring machines that can send data
to each other, when you now evaluate this new service in the local cluster there
is some hope that the numbers that you draw would have some correlation to the
actual numbers on the Internet.
So in short I want to expose individual application evaluations in the local cluster
to realistic Internet like settings.
So indeed I started my research by trying to build a better Internet, but pretty
soon, when you start reading the evaluation sections of a number of papers, you
see that the evaluation methodology lies somewhere on this spectrum. Either
you could do an analytical study of what the Internet looks like and what the
individual components look like, or you could run experiments on the real
Internet. But most often these decisions are governed by how much time do I
have, what kind of expert intuition do I and my colleagues have, et cetera.
So I have the desire to make experiments in a local testbed more realistic. And
as you can imagine, there are a number of different ingredients that would go into
it. Some of these I have been fortunate to work on, but a lot of them still remain.
In this talk I'll be focusing on one of these: what would it take to reproduce
Internet-like traffic conditions in a local testbed? And more importantly, I'll try to
argue that the individual applications that you are trying to evaluate indeed care
about that traffic, and if you expose individual applications to a different kind of
traffic, you could reach conflicting conclusions about what the performance
looks like.
So let us go back to the same problem again. Yes?
>>: (Inaudible).
>> Kashi V. Vishwanath: I'm sorry?
>>: (Inaudible).
>> Kashi V. Vishwanath: What is the Internet?
>>: Yes, as defined (inaudible).
>> Kashi V. Vishwanath: So the Internet is defined by the problem for which you
need a definition. For example, if you're interested in doing a certain kind of
analysis on how better to route all traffic within an organization to Websites
located outside, your view of the Internet in that case would be the border link of
Microsoft Research; everything outside would be represented as a cloud, and
everything inside would be represented by a number of clients.
In the same way, if you are an ISP and you want to take capacity planning
measures, then the Internet again is based on your view of the world. So in that
case it would be: how do I best manage traffic within my organization, and how
do I route things which are coming from outside, which I can model using a
black box. So it's really a definition based on what problem you have at hand.
So the kind of argument I'm trying to make here is that even if you have a crisp
definition of what is the problem you have at hand, how do you then ensure that
for that Internet, that definition of Internet you reproduce Internet-like conditions
in a local testbed and what can be done about it.
So I was saying that we have now somewhat reduced the problem definition to
trying to understand Internet traffic at every single link on the Internet, and this
somewhat goes back to (phonetic) question. That is a seemingly impossible and
daunting task. So before we get there, there's a much simpler, tractable version
of the problem, which we still do not have a complete handle on, and which is
what I'm going to be focusing on in this talk.
So as I said, let us say there are specific links on the Internet that you're
interested in, for example the border link of MSR. How do you understand what
kind of traffic flows across that border link, and how does that influence the
flows of the individual applications that I'm trying to evaluate which share that
link? That is the focus of this talk: trying to understand traffic at specific links on
the Internet and then trying to argue how that impacts performance.
For example in this case, I've configured a local set of machines to generate
background traffic for that specific link. I've then configured individual prototypes
that I'm trying to evaluate, which are generating foreground traffic for the same
link.
So in this fashion I hope to understand traffic at this link and the impact it has on
the application I'm trying to evaluate all in a controlled setting.
So with that being said, there are two specific goals that I think any traffic
generator in this setting should have, which are realism and responsiveness. By
realism I basically mean this: if I'm trying to reproduce traffic for a specific link,
the generated traffic should at least look like the original traffic for whatever
metrics of success you have.
More importantly, it should be responsive, and I mean two things by this. First of
all, the generated traffic and the application traffic should really interact with
each other, exactly like they would on the Internet, and not simply be oblivious to
each other. More importantly, I want to be in a position where, by turning
meaningful knobs based on your intelligent estimates of, let's say, what the
network will look like in the future, I can translate that into what the traffic on this
link will look like in the future. For example, what does everything look like after
an upgrade of the access link by my ISP? There should be a meaningful way in
which I could express changes like that.
>>: (Inaudible) talking about a single physical link or are you talking about a
logical channel which is going to have the properties of the communication path
between two hosts that might -
>> Kashi V. Vishwanath: That is the general goal. But for this talk, yes, I am
actually focusing on a single physical link.
>>: So everything else that would affect that logical communication path, such
as routing change and DNS, everything that's how the scope of this talk
(inaudible).
>> Kashi V. Vishwanath: That is out of the scope of this talk. So anything that
exists in the end-to-end path will be something I'll be abstracting. So if there are
a thousand hosts on one side of the link and 10,000 hosts on the other side of
the link, I would try my best to capture the distributions of what is going on on
the end-to-end paths across these 1,000 × 10,000 pairs. But everything else,
that external influence, would be something I would be looking into in the future.
But time permitting, I could tell you a little later on how I approach getting around
that problem.
So there are a number of challenges in trying to achieve this goal. First of all is
the fact that individual applications themselves are changing over time. For
example, even if we look at something very simple like Web traffic: back in the
day, Web pages had simple content, limited layout, et cetera. You visit the
same Web page today and it's inundated with text, media, ads, portfolios, et
cetera. So the traffic generator should be able to understand and reproduce
this phenomenon.
Then again, individual applications' popularity is constantly changing over time.
For example, if you look at this link, a transpacific link which runs between
Japan and the United States, I'm trying to show here, across a five-year window,
sample data of the popularity of three different applications measured in bytes.
As you can see, the popularity of individual applications is constantly changing
over time. So even if you had this information, there should be a way in which
you could express it in the generated traffic that you're coming up with.
Finally, I've been talking about this rich structure that is present in Internet traffic
and how does it influence individual application behavior? Yes?
>>: (Inaudible).
>> Kashi V. Vishwanath: So perhaps they were using that protocol to tunnel
P-to-P traffic. That's my guess there. I don't have access to the payload, so I
cannot really tell. Yeah.
So here I'm showing a sample of the same trace that I was looking at earlier, but
this is for a specific day. And I'm looking at throughput which is measured in
megabits per second for one second intervals across a 15-minute window and
I'm showing two kinds of traffic here, Web traffic and Napster, which used to be a
popular peer-to-peer file sharing service.
So one thing you can see here is Napster, which is the bottom of the two curves
is relatively smooth compared to Web traffic when I'm looking at one-second time
bins. Now, if I zoom in to the first few time bins and look at traffic at a much finer
time scale, at hundred millisecond intervals, now we can see that Napster traffic
is no longer as smooth as it once appeared to be.
It is still way smoother than Web traffic. If I further focus on the first few time
bins and look at a much finer level of granularity, then at 10-millisecond time
intervals there is hardly any difference between Napster traffic and Web traffic.
So to the already existing list of challenges in trying to come up with a traffic
generator, we add one more, which is the fact that Internet traffic exhibits this
rich burstiness property, and the traffic generator should be able to understand
and reproduce it. Yes?
>>: (Inaudible) the users?
>> Kashi V. Vishwanath: There's a single link, a single aggregated link, the
transpacific link between Japan and the United States. Yes.
More importantly, the problem is exacerbated by the fact that the burstiness of
Internet traffic is actually a function of what time scale you're observing it at. So
there is no single answer you could arrive at which says Internet traffic is bursty;
it has to be a much finer definition, and the traffic generator should be able to
both understand and reproduce this.
So with that being said, let me give you an overview of the solution I've reached
to this problem before giving more details. I start by observing all packets that
are entering and exiting a specific link that I'm interested in. Starting from this
complex structure, I extract key properties that one needs to understand if one
has to reason about Internet traffic.
Starting from that, I will build parsimonious models which will explain these
properties. Armed with these models, I will then configure individual hosts in a
local testbed that understand these models and use that to communicate with
each other.
I have built all of this into a single tool which I call Swing. So Swing starts from
observing packet traces for a specific link, builds all of these models, extracts
distributions, and configures hosts appropriately to generate real traffic in a local
testbed. At that point I can do a sanity check and compare whether the
generated traffic looks similar to the original traffic that I started with.
But more importantly, I can now envision the kind of things I was bringing out in
the motivation, which is deploying the service in the original scenario.
Corresponding to that, I can get an analog in the generated testbed here so I can
introduce the service and evaluate a system in the presence of realistic traffic.
Yes?
>>: (Inaudible) responsive (inaudible).
>> Kashi V. Vishwanath: That's based on -- so I'll get to that in a minute, but the
quick answer is first of all use TCP to get congestion responsiveness, then you
build a complex feedback loop one on top of the other, exactly like it would on the
Internet. But I would have to give more details on that.
But more importantly, instead of just evaluating individual systems against a
specific kind of traffic, you could now turn meaningful knobs on the generated
traffic to project traffic demands into alternate scenarios, and again evaluate
your application in the presence of such future scenarios.
So with that being said, here are the key contributions of this work. Starting
from this complex phenomenon, I tried to extract the key properties that one
needs to get a handle on in order to understand what Internet traffic really looks
like at specific links. I then built all of this into an automated tool called Swing
which can look at Internet traces and reproduce that traffic in a local testbed.
I then argue in my work how individual applications really care about such
burstiness in Internet traffic, and that if you do not expose individual applications
to realistic Internet-like traffic patterns, you could reach conflicting conclusions
about what the performance of individual applications should look like. Yes?
>>: (Inaudible) what are your metrics for whether you get (inaudible).
>> Kashi V. Vishwanath: Metrics for getting the traffic correct?
>>: Yeah. How will you know that it's indistinguishable? So like a cryptographer
would have their definition of whether two messages are -- or whether a
message is indistinguishable -
>> Kashi V. Vishwanath: I see.
>>: -- random, you want to have some level of confidence of whether you got
this right.
>> Kashi V. Vishwanath: I see.
>>: What is the metric that tells us that's going to be used?
>> Kashi V. Vishwanath: I see. So that is actually a great question. So at the
very least, when you generate traffic to compare against the original traffic, the
first thing you will do of course is look at first-order metrics. So you could ask
what the mean and variance of the traffic process are, and whether they look
similar to what I saw earlier.
As you can imagine, however, few measurement studies would care about only
that factor. So you could go one level higher and look at traffic at individual time
scales, because I've observed in my experiments that that property actually
impacts individual application behavior. So the natural answer here is: for the
individual applications that you are exposing to Internet-like traffic patterns, what
properties of that traffic do those applications actually see in the evaluation?
So if the generated traffic has similar effects for those applications, then the
generated traffic is meaningful enough. If I can reproduce the results of Internet
experiments in a controlled Swing-like setting, then the generated Swing trace is
meaningful and good enough to go there. That's the answer.
>>: (Inaudible) you're saying: I have a model of what traffic looks like, and then I
reproduce it; if it matches my model then I succeed.
>> Kashi V. Vishwanath: Yes.
>>: But what you haven't captured is the whole point of trying to make things
more realistic is that your controlled environment captures somehow everything,
there's not going to be some feature of the real Internet traffic that you haven't
captured that would affect applications in a way you don't expect, and the only
way to validate that is to try your application out on the real Internet.
So I don't understand how you can -- it seems like your metric of success is
equal to the thing you're modelling. You're saying, well, we know it's right if the
traffic fits a particular distribution, which is the distribution we used to generate
the traffic.
>> Kashi V. Vishwanath: Okay. So I apologize if that was not clear early in the
talk.
So you start with a given link, and in a controlled setting, based on a number of
measurements and heuristics, you decide whether the application that you ran
on the Internet is, within sufficient bounds, meaningful or not.
Then you try your best to reproduce that scenario in the local testbed. That's
just a sanity check. Now, in the Swing setting that you have in the local testbed,
you can tune meaningful knobs. For example, you could say: everything else
being fixed, if the application protocol changes in a certain way, how would that
affect the traffic that is being generated, and how would that impact the
applications that I'm trying to evaluate? So if I did not do a meaningful sanity
check at the first step, there is no hope that any numbers I draw out of the
changed and projected traffic would carry any meaningful information.
>>: (Inaudible) I agree you've done a necessary condition and you want to
achieve hope, but there's more to hope -- there's more to the solution than -
>> Kashi V. Vishwanath: So can you give me one specific example so that I can
better understand what the scenario is?
>>: So I guess one way to validate this would be if you projected three months
into the future if these changes happen, this is what we get, make 10 predictions,
see which way Internet traffic actually changes and then see whether you match.
>> Kashi V. Vishwanath: That would be -- I would be happy to talk about some
of those results. Yeah, that's a great question. That is exactly the kind of
validation I have done.
So what I was saying is that before you get there you have to validate a specific
scenario, before you get there.
>>: Let's come back to that later.
>> Kashi V. Vishwanath: Yeah. So hopefully I've convinced you that
understanding Internet traffic is a complex and challenging problem as you guys
are also uncovering some of the mystery surrounding it.
So now I'm going to give you the main insights I've tried to use, which will
answer some of your questions on how I attack responsiveness, too. I've
already been alluding to the first of the four insights that I have, which is that we
need a hybrid approach to understanding Internet traffic. By that I mean we
should be able to run the real code of the prototype that we are trying to
evaluate, interacting with realistic packets, real packets exchanged with each
other in a controlled setting, but somehow the traffic itself needs to be modelled
for better control.
So once you have that, what would some of the initial simple traffic generators
look like? For example, you could say: I look at traffic on an existing link, say
the border link of MSR, and I simply try to reproduce the timing of those packets
using tcpreplay, unlike most of the applications on the Internet, which actually
react to network conditions.
So if I do that, you can see that, depending upon what level of granularity I
choose to reproduce, I can be fairly realistic in the kind of traffic that I generate.
However, since Internet traffic is not quite dominated by UDP, and even if it
were, since we do not have meaningful observations from which to derive the
high-level application behavior, there is no hope that the generated traffic will be
as responsive as you want it to be.
So in order to fix that, what you would do is go back to what most applications
that run on the Internet actually do: they use TCP. So the argument I'll make is
that if it is TCP traffic we are trying to model, we should use TCP as the
underlying mechanism. In this fashion, you can see that the generated traffic
will really interact with the application traffic, similar to what it would have done
on the Internet. Yes?
>>: (Inaudible).
>> Kashi V. Vishwanath: Oh, yeah, sure.
>>: (Inaudible).
>> Kashi V. Vishwanath: Exactly. Exactly.
>>: That depends upon how much research -- how many researchers you have
in your testbed.
>> Kashi V. Vishwanath: True.
>>: So how is it that so far you haven't described one requirement you have on
the testbed, and if your testbed has only (inaudible) no matter what you have.
>> Kashi V. Vishwanath: So that's a good question. So there are two real
answers to this. The honest answer is I'm kind of presuming that you are not
starved of resources for the kind of experiments that you're trying to do. But
even if you are, let's say there is no way you could get the number of hosts that
you wanted, then there are other approaches you could try. For example, one of
the projects that I worked on talks about how you could configure individual
virtual machines in a local testbed and run them appropriately dilated at a much
slower timeframe, so that when 10 real-world clock seconds pass, the VM only
thinks that one second has passed. So if you exchange data at a slower rate
within that virtual machine, you could perhaps get, in the relative timeframe, a
much higher throughput. So those are some techniques I could think about. If
you really did not have the physical resources and that was the only limiting
factor, then perhaps you might not mind paying in terms of time to run the
experiment.
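The dilation arithmetic in that answer can be sketched in two lines (the function names are mine; the factor-of-10 numbers are the ones from the answer):

```python
def perceived_bandwidth_mbps(physical_mbps, tdf):
    """With a time dilation factor (TDF) of tdf, tdf wall-clock seconds
    appear to the VM as one second, so bytes arriving at a fixed physical
    rate look tdf times faster in the VM's slowed-down frame."""
    return physical_mbps * tdf

def wallclock_secs(virtual_secs, tdf):
    """The price paid: an experiment lasting virtual_secs of VM time
    takes tdf times as long in real time."""
    return virtual_secs * tdf
```

So with a TDF of 10, a physical 1 Gb/s link appears to the guest as 10 Gb/s, and a one-minute experiment costs ten real minutes.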
So as you can see, if I use TCP instead, then yes, I could be fairly responsive, I
could have it interact with the applications that I'm trying to evaluate, et cetera.
It is not clear whether I can still project things into the future, but let us not worry
about that for the moment. Now, clearly, once I have delegated the task of
putting individual packets on the wire, it is no longer clear, as (inaudible) was
pointing out, that unlike tcpreplay there is any hope that you can reproduce
these packet timings, for example.
So that really is the second insight. So the fact that you should perhaps leverage
TCP that is available in the number of hosts that you are trying to run the
experiment on. Yes?
>>: (Inaudible) the TCP is going to adapt to your controlled setting. How do you
know how much to offer (inaudible)? When you're doing tcpreplay, you know,
you send a packet when the trace tells you to send the packet. With TCP, if the
link is suddenly more available you should send more data.
>> Kashi V. Vishwanath: That's exactly the point I was trying to bring up. It is
not clear that just because you use TCP how can you be realistic in the way you
are reproducing packets?
>>: (Inaudible) TCP is exactly no more powerful than tcpreplay.
>> Kashi V. Vishwanath: That is not quite true. So the fact of the matter is that
TCP is a fairly complex (inaudible) diagram to try to simulate in any other setting.
So the observation I have here is that -- and this will be much clearer in the
results I show, that since TCP plays a very critical role in determining what kind
of burstiness properties exist on Internet traffic if you try your best to simulate
that in any other world you will be hard pressed to reproduce that behavior. So
the best hope you have is to leverage TCP that's actually running on individual
hosts that are available for experimentation.
>>: (Inaudible) a little bit depending on what else is going on around it.
>> Kashi V. Vishwanath: Yes.
>>: How does TCP make the decision between: I should send a packet now
because my congestion window says so, which is what you'd expect it to do,
and I should maybe not send a packet now because I actually don't have any
data available from the application?
>> Kashi V. Vishwanath: Oh, so that is another aspect of the work that I will get
into very soon. So if you'll hold your question for like five minutes, maybe I can
answer it; otherwise, I can come back.
So once you have the second observation, you're absolutely right: just because
you have TCP and you layer some sort of a simple model on top of it, pretty
soon you will realize that any of those simple models is not going to satisfy the
goals, exactly the point you're trying to make. Either it will not be realistic or it
will not be responsive; you cannot meet both of these.
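The "use the host's real TCP" insight can be illustrated with a toy request/response exchange on loopback: the kernel's TCP stack, not the tool, decides the actual packet timing and reacts to whatever else is on the path. This is a minimal sketch under assumed sizes and names, not Swing's actual generator:

```python
import socket
import threading

def run_exchange(request_size, response_size):
    """Replay one request/response pair over a real TCP connection on
    loopback, delegating packetization and pacing to the kernel's TCP."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 0))          # ephemeral port
    srv.listen(1)
    port = srv.getsockname()[1]

    def responder():
        conn, _ = srv.accept()
        got = 0
        while got < request_size:       # drain the "request"
            data = conn.recv(65536)
            if not data:
                break
            got += len(data)
        conn.sendall(b"x" * response_size)  # send the "response"
        conn.close()

    t = threading.Thread(target=responder)
    t.start()
    cli = socket.create_connection(("127.0.0.1", port))
    cli.sendall(b"y" * request_size)    # the request side of the exchange
    received = 0
    while received < response_size:     # read the full response
        data = cli.recv(65536)
        if not data:
            break
        received += len(data)
    cli.close()
    t.join()
    srv.close()
    return received
```

Because the bytes ride a real TCP connection, the generated flow backs off under congestion and competes fairly with the application under test, which a timed replay cannot do.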
That brings us to the third, the second-to-last insight. We ask again: what does
Internet traffic really depend upon? To understand this, note that there is a
complex interaction at multiple layers. At the topmost layer you've got users
engaging in a variety of different activities. Then you've got the applications that
the users are running below them, and finally the network, which is responsible
for carrying the large number of packets that the applications are generating.
So the hope here is that if you somehow capture these three layers
independently and then model the interaction between them, you would be able
to reproduce that in a much more controlled setting. So for individual users, I'm
interested in understanding what are the periods of activity, what applications do
they prefer, et cetera.
For individual applications, what do the semantics look like, how do they differ
from each other, how are they similar to each other? And in the same way, what
is the fate of individual packets when applications put them on the wire? Do
they get delayed indefinitely, do they go all the way to the other end, exactly like
the questions that you were raising.
So the hope here is that you somehow capture the interaction between these
three layers; but importantly, it is not going to be super accurate, given that
you're doing this solely based on the observation of a single link, the view of the
world from that single link.
So that really is the third insight that goes into this work: if you are to have any
hope, you should leverage the recommendations from structural modelling and
generate traffic as a three-level hierarchy.
That leaves the final insight. Let me go back to this picture again. In trying to
reproduce traffic using this three-level hierarchy, it turns out that there is a
complex feedback loop going on. For example, at the topmost layer, it's only
when you get the entire HTML page and parse the contents that you decide to
click on individual URLs that are embedded within. Similarly for TCP, only when
previous data is acknowledged by the other side do you decide, based again on
your state diagram, how much more data to put on the wire.
So in order to generate traffic, we should be able to capture and model this
closed-loop traffic generation process at these three layers. So with these four
insights, let me now describe how to generate traffic.
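The closed-loop structure at the user layer can be sketched as follows. The distributions here are hypothetical stand-ins for the empirical ones extracted from a trace, and real Swing drives live TCP connections rather than a simulated clock:

```python
def generate_session(num_rres, think_time, run_exchange):
    """Closed-loop sketch of the three-layer hierarchy: each new
    request/response exchange (RRE) starts only after the previous one
    has completed AND the user's think time has elapsed, so the offered
    load reacts to how long exchanges actually take (the feedback loop).

    num_rres and think_time are callables sampling (hypothetical)
    empirical distributions; run_exchange performs one RRE and returns
    its duration in seconds."""
    clock = 0.0
    for _ in range(num_rres()):
        clock += run_exchange()   # wait for the exchange to finish...
        clock += think_time()     # ...then pause before the next click
    return clock
```

The contrast with open-loop replay is exactly the point above: if exchanges slow down under congestion, subsequent requests shift later instead of piling onto the network at trace-recorded times.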
So again, going back to this picture, I'm going to extract all of these properties,
build the models, and then tell you how I configure individual hosts to generate
traffic.
So look at this similar picture again. Start by observing all packets entering and
exiting a given link. At that point, the first step is to identify the individual
applications that the packets in the original trace belong to. This is a step where
I'm hoping to leverage existing work out there; for example, some of the people
here are involved in doing it: how do you look at individual packets and extract
what applications they belong to? After this, my research starts. I'm going to
focus on individual applications, but the method mostly generalizes. I'm going to
tell you how I extract user, application, and network properties based on a single
application.
So I look at these packets, and based on source and destination IP addresses
and port numbers, I extract flow information from these packets, so there's a
high-level description -- yes?
>>: (Inaudible).
>> Kashi V. Vishwanath: No. But what I did not say here is that I'm not
assuming anything about the underlying protocol at a fine level of detail. So if
you have TCP-level headers on those packets, I'll try my best to
reverse-engineer (inaudible) the protocol that was going on.
>>: (Inaudible).
>> Kashi V. Vishwanath: So I can infer the request-response patterns that are
going on. For example, I can infer that a certain number of bytes are going to
the other end. You're absolutely right: if there is much higher-level behavior
which is human-dominated and there is a complex feedback control loop there, I
might not be able to capture that. But at the very least, at the lowest level, I may
be able to capture how much data is going across, what happens to the data,
how much response comes back, and how long I wait after that. So I'll be able
to extract patterns like that. That is my hope for this work.
So then I look at individual flows and based on TCP sequence numbers and
(inaudible) numbers and timestamps, I try to extract the information on a
per-connection basis. For example in this case, what is the request response
exchange, what is the total number of request response pairs, what is the timing
separation between them, et cetera.
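The per-flow extraction step just described can be sketched as follows. This is an illustration, not the actual Swing code; the record format (dicts with `ts`, `src`, `sport`, `dst`, `dport`, `length`) is a hypothetical stand-in for parsed tcpdump records.

```python
from collections import defaultdict

def extract_flows(packets):
    """Group packets into flows keyed by the TCP/IP 4-tuple.

    Both directions of a connection map to the same canonical flow
    key; per-flow we track start/end times and bytes per direction.
    """
    flows = defaultdict(lambda: {"start": None, "end": None,
                                 "bytes_fwd": 0, "bytes_rev": 0})
    for p in packets:
        fwd = (p["src"], p["sport"], p["dst"], p["dport"])
        rev = (p["dst"], p["dport"], p["src"], p["sport"])
        # canonicalize the key so both directions land in one flow record
        key, direction = (fwd, "bytes_fwd") if fwd <= rev else (rev, "bytes_rev")
        f = flows[key]
        f["start"] = p["ts"] if f["start"] is None else min(f["start"], p["ts"])
        f["end"] = p["ts"] if f["end"] is None else max(f["end"], p["ts"])
        f[direction] += p["length"]
    return dict(flows)
```

From records like these one can then read off connection times and request/response byte counts per direction.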
Then I look to group together individual flows into a single cluster, a quantity
which is known as an RRE, or request response exchange. So think of this as
analogous to all the requests that go out for the (inaudible) objects in an HTML
page.
At the next step I attempt to group together individual RREs into what is known
as a session. So this is again analogous to all the Web pages, let's say, you might
browse in a single visit to the Web site, or all the movies that you would
download in a single P2P session, for example.
At this step, at a high level, I've described how I extract the user and
application properties. So the only thing that remains is the network. To do that
I go back to individual flows that I've extracted earlier.
And now I will use variants of past TCP trace measurement techniques and not
necessarily innovate in this space.
The aim of these techniques is to look at the individual IP addresses present in
the original trace, and if you use a single logical link to connect each IP
address to the link under observation, ask what static values of capacity, loss
rate and latency I can attribute to it. So at this step I've described at a high
level how I build these models.
So the key thing to note here is that I need not be super accurate in any of these
techniques, although it will not hurt to do that. The main observation here is the
fact that can I extract sufficient information for all of these models that I can start
explaining why Internet traffic looks the way it does in the original trace and
how I can use it to drive my simulation studies.
So to summarize, I looked at packets, grouped them into applications, extracted
flow information, RRE information, session information, and then used the packet
data to extract capacity, latency and loss rate.
So I should again note here that much of this is work in progress and it will
continue to evolve as we come up with more applications. In particular the last
one for instance has (inaudible) for a lot of people to provide a rich source of
(inaudible) pieces in the past.
So let me go into a little more detail on some of these. So how do you extract
individual flows from packets? That's relatively straightforward. If you look at
the TCP dump information present in the original trace and without going into too
much detail it is information on where the packet came from, where it went to,
what was the timing, what are the TCP sequence headers, et cetera.
So based on that, I will try to reverse engineer the conversation that is going
on in the trace. For example, what did the TCP (inaudible) look like, what was
the request data that was sent across, how long did the server on the other side
wait, when was the response sent back, et cetera.
So at the lowest level, I will extract flow establishment and termination times and
then also more detailed information. For example, what is the request response
semantics. So once I do that as a basic building block, I could then try to figure
out RRE and session information. And this is how it works at the high level.
So I look at all flows and sort them in increasing order of connection
initiation. I look at the first flow and mark it as the beginning of the first
session and the first RRE in that session.
I will then attempt to take more flows that are concurrent or overlapping with this
flow and attribute them to the same RRE. For example, I might get one new flow
which is separated in time by a certain amount, and this is the inter-connection
time for this particular flow.
I will then attempt to group more flows which are also concurrent with this flow.
Once in a while, I will get a flow which is sufficiently separated from this cluster
by a certain amount. And at this point, I will declare that the previous RRE
has ended, a new one has started, and this is the inter-RRE time.
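The clustering just described can be sketched roughly like this. The two gap thresholds are hypothetical knobs for illustration, not the values Swing actually extracts:

```python
def cluster_flows(flows, rre_gap=2.0, session_gap=30.0):
    """Cluster flows, given as (start, end) time tuples, into RREs
    and sessions using two time thresholds (in seconds).

    Returns a list of sessions; each session is a list of RREs;
    each RRE is a list of flows.
    """
    flows = sorted(flows)          # increasing order of connection initiation
    sessions, cur_session, cur_rre = [], [], []
    last_end = None
    for f in flows:
        start, end = f
        if last_end is None:
            cur_rre = [f]                      # very first flow
            cur_session = [cur_rre]
        elif start - last_end > session_gap:
            sessions.append(cur_session)       # previous session has ended
            cur_rre = [f]
            cur_session = [cur_rre]
        elif start - last_end > rre_gap:
            cur_rre = [f]                      # previous RRE has ended
            cur_session.append(cur_rre)
        else:
            cur_rre.append(f)                  # concurrent/overlapping flow
        last_end = end if last_end is None else max(last_end, end)
    if cur_session:
        sessions.append(cur_session)
    return sessions
```

Concurrent or nearly back-to-back flows land in the same RRE; a larger gap starts a new RRE; a much larger gap starts a new session.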
I will then -- yes?
>>: (Inaudible).
>> Kashi V. Vishwanath: Yes.
>>: (Inaudible).
>> Kashi V. Vishwanath: Sorry. For a specific IP address. So for a pair of
destinations that the exchange is going on between. So let's say I'm sending some
data to a Web server and I'm trying to download a number of HTML pages -- a
number of (inaudible) objects within that. But if I'm talking to some other Web
server which is not causally related to this, then that will not be a part of the
same --
>>: (Inaudible).
>> Kashi V. Vishwanath: Yes, for now. Correct.
>>: (Inaudible).
>> Kashi V. Vishwanath: (Inaudible).
>>: If the Web server at same time I'm looking at other stuff, right. (Inaudible)
same servers.
>> Kashi V. Vishwanath: So the source is the same and the destination is the
same?
>>: Yes.
>> Kashi V. Vishwanath: So that's a great question. So let's say every time you
access the Web server, every single time you are actually watching the video,
then it makes sense that when I extract this RRE information I embed that as
some sort of high-level behavior that the user itself is expressing. So if every
time the Web site is accessed, a video is watched in parallel, it would not hurt
to group them together into one single cluster.
However, if it happened a very low percentage of the time, then based on the
distributions I extract, it will get weeded out.
>>: (Inaudible) sometimes like there are certain -- for example the dependency
between this traffic, and the dependency may occur or may not occur.
>> Kashi V. Vishwanath: That's a great question. So the answer in that case
would be what is the level of granularity that you are interested in. So if you
believe that you want to characterize individual users who may or may not
access multiple data on the same Web server, then you might be interested in
categorizing them into different classes. So it really depends upon whether the
kind of fidelity you get from the traffic generator is a bottleneck to the kind of
application studies that you're carrying out.
So if a large percentage of users start exhibiting that behavior, then there is
some reason to believe that it will influence the application behavior. And at that
step I might want to characterize the fact that there are two different
conversations going on but semantically they are different in nature.
So the other question is, what would you do instead? So there is no better option
other than either figuring out whether to put them into different categories or
grouping them into a single one. At the end of the day, I just want to capture
how two hosts talk to each other.
If a large number of times it happens to be an HTML page and a video page, that
is what I want to capture in this interaction. Similarly, at some point of time
there will be a new flow which will be so far separated in time that I don't even
want to put it into the same cluster that I'm working on. So at this step I
will say that the session, quote/unquote, has ended and a new one has started.
So the point here is I will try to extract distributions for all of these in the
underlying model. Then in the end, I will extract the network data using variants
of well-known techniques, so not necessarily innovating in this space. For
example, to extract a latency estimate for this link, I will look at the time
separation between a data packet and the corresponding ACK which is
coming from the other end. Similarly, I'll use variants of packet-pair
measurement techniques to attribute a capacity value to this link. And finally,
extract loss rates in a similar fashion.
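A rough sketch of the passive latency idea (the time separation between a data segment and the ACK that covers it). This ignores retransmissions and delayed-ACK subtleties; it is an illustration of the principle, not the actual measurement code:

```python
def estimate_rtt_samples(data_pkts, ack_pkts):
    """Passive RTT estimation sketch.

    data_pkts: list of (send_time, seq, length) tuples.
    ack_pkts:  list of (ack_time, ack_no) tuples.
    For each data segment, find the earliest ACK whose cumulative
    ack number covers seq + length and record the time separation.
    """
    samples = []
    acks = sorted(ack_pkts)
    for send_time, seq, length in data_pkts:
        for ack_time, ack_no in acks:
            if ack_no >= seq + length and ack_time >= send_time:
                samples.append(ack_time - send_time)
                break
    return samples
```

The distribution of such samples per host is what gets attributed as latency to that host's logical access link.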
So the point is that at the end of this, I would hopefully have extracted meaningful
distributions corresponding to the parameters of the model that I've built. So the
next step -- yes?
>>: (Inaudible) packet size?
>> Kashi V. Vishwanath: Yes, I do. So in fact in some of the experiments I've
found that if you do not do it then you could reach lopsided results. So in the
beginning I wanted to have a parsimonious model, which was sufficient for
explaining what the traffic looks like. But in the end it was driven by studies of
what applications cared about, and it turned out that packet size was a
particular one, so I ended up (inaudible) yes.
So I start with a network emulator that talks to a number of different hosts to
carry out the emulation framework, and in this case I use Modelnet. So the
network emulator basically knows what the Internet looks like based on what you
tell it. So in this case we tell it that it's the border link that we are
interested in, and accordingly it would know the characteristics of this link.
The emulator then talks to a number of different hosts to carry out the
emulation.
We again divide the task of carrying out the emulation based on the popularity
in the original trace.
So if a third of the hosts in the original trace belonged to HTTP, a third of
these would be responsible for emulating the HTTP part of the background traffic.
So the next step would be to take these individual hosts and connect them on
either side of the dumbbell link. Again, that is based on the popularity in the
original trace. So if two-thirds of the bytes ended up flowing from left to
right, or, if you're interested in hosts, if two-thirds of the hosts were on the
left side of this dumbbell link, you would take two of these and place them on
the left side.
You do this for all of the hosts. At this step in the emulator you now want to
connect the host to the dumbbell link. To get values for the link properties you
go back to the distributions that you extracted earlier. For example, this link
here might get a 10 megabits per second capacity, 10 millisecond delay and two
percent loss rate.
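Assigning link parameters by drawing from the extracted distributions might look like this minimal sketch, where an empirical distribution is represented simply as a list of observed values (a hypothetical simplification of whatever representation Swing actually uses):

```python
import random

def sample_link_params(capacity_dist, delay_dist, loss_dist, rng=None):
    """Draw one (capacity_bps, delay_s, loss_fraction) triple for an
    access link by resampling uniformly from empirical distributions,
    each given as a list of measured values."""
    rng = rng or random.Random()
    return (rng.choice(capacity_dist),
            rng.choice(delay_dist),
            rng.choice(loss_dist))
```

Repeating this per host yields the configuration of every edge link hanging off the dumbbell.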
You do this for all links and connect them to the dumbbell link. At this step,
configuring the topology is complete. The only thing remaining is telling the
hosts about the user and application properties. Yes?
>>: So if I understand, are you assuming that two flows, that the only shared link
is this one link, and that a subset of the flows don't share some other link and
might interact with each other --
>> Kashi V. Vishwanath: Yes, yes.
>>: Because of whatever (inaudible).
>> Kashi V. Vishwanath: Yes. You're absolutely right. So in reality that's going
to happen, but as an approximation in this case I'm assuming that there is no
shared path (inaudible). But you can imagine extending this using variants of
packet-pair extensions that (inaudible) for example has worked on, which try to
detect congestion on shared paths and use that to understand how I can safely
attribute this to a different level of hierarchy. For example, there could be a
tree spanning out of here, and there could be multiple hosts; let's say if I
could extract information on what prefixes they belong to, I could put them
all into a single cluster and feed them through a shared link. But you're
absolutely right, for these experiments, I'm not modelling shared paths; I'm
modelling the links independently.
>>: (Inaudible) model of loss rate (inaudible) so as long as you don't change that
assumption no matter of (inaudible) is going to get you interaction that's
happening?
>> Kashi V. Vishwanath: I see. So that's a great point. So the loss rate
assumption here is for flows that I do not have visibility into. For example, if
you're behind the border link of MSR and you're trying to connect to me and I'm
outside the network, then on the link that you use to connect to me, let's say, I
cannot observe other traffic that goes through that link.
For example, there might be hosts here which share this link and I cannot
observe that interaction. In order to approximate that interaction I'm assigning
loss-rate values. But if there are multiple flows, all of which flow through that link
and that are part of my controlled experiment, then any induced loss rate would
be captured. So the loss-rate setting is for capturing something that I cannot
passively observe because I do not have visibility.
And in a similar way, if I have visibility into multiple routers within the same
organization, maybe I can augment the methodology. They are like two different
aspects of loss rate.
>>: (Inaudible) you need a big picture. (Inaudible).
>> Kashi V. Vishwanath: Sure. But hopefully if you are designing a network and
if you have complete control over the network, there is slightly more information
that you would have than I had to work with. For example, I worked with
individual traces, and I had no control over how they were collected.
If you are within a network, not only would you perhaps have more traces at
different points in the network which you could correlate, but you would also have
like routing information that you could leverage to augment that how does traffic
go from one point to the other.
So again, quickly: look at a single application and, for example, figure out based
on the probability distributions how many sessions to generate; for each session,
how many RREs; and for each RRE, the total number of connections.
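The hierarchical generation just outlined (sessions, then RREs, then connections) can be sketched as follows; the distribution representation and parameter names are hypothetical:

```python
import random

def generate_structure(n_hosts, session_dist, rre_dist, conn_dist, seed=0):
    """For each emulated host, draw the number of sessions, the
    number of RREs per session, and the number of connections per
    RRE from empirical count distributions (lists of observed
    counts, resampled uniformly)."""
    rng = random.Random(seed)
    plan = []
    for _ in range(n_hosts):
        sessions = []
        for _ in range(rng.choice(session_dist)):
            # each RRE is summarized here by its connection count
            rres = [rng.choice(conn_dist)
                    for _ in range(rng.choice(rre_dist))]
            sessions.append(rres)
        plan.append(sessions)
    return plan
```

The resulting plan is then handed to each host, which replays it with real connections.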
So let us look more closely at what an individual connection emulation might look
like. So as you recall, it has rich information on what are the request response
exchanges that are going on. So for this connection again based on a
distribution I would pick a pair of hosts in the original topology and establish a
real TCP connection in between them.
The sender then would take a request size, create a dummy packet -- a real TCP
packet -- and send it across.
The server on the other end would generate a response which would come back.
The client would wait for a certain amount of time, generate a new request, send
it across and wait for this final response on this connection.
So upon receiving this last response, the sender would then finish the connection
and terminate it. So at this point, at the lowest level we have finished emulating
a single TCP connection. So we proceed by doing the other connections in this
RRE, other RREs and other sessions.
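A minimal sketch of emulating one connection's request/response exchanges over a real loopback TCP connection (think times omitted for brevity; this is an illustration, not Modelnet or Swing code):

```python
import socket
import threading

def emulate_connection(exchanges, host="127.0.0.1"):
    """Replay one connection's exchanges over a real TCP connection.

    `exchanges` is a list of (request_bytes, response_bytes) pairs.
    Returns the total number of response bytes the client received.
    """
    def _recv_exact(sock, n):
        got = 0
        while got < n:
            got += len(sock.recv(min(65536, n - got)))

    srv = socket.socket()
    srv.bind((host, 0))
    srv.listen(1)
    port = srv.getsockname()[1]

    def server():
        conn, _ = srv.accept()
        for req, resp in exchanges:
            _recv_exact(conn, req)      # read the dummy request
            conn.sendall(b"x" * resp)   # send a dummy response of the right size
        conn.close()

    t = threading.Thread(target=server)
    t.start()

    cli = socket.socket()
    cli.connect((host, port))
    received = 0
    for req, resp in exchanges:
        cli.sendall(b"x" * req)         # dummy request of the recorded size
        got = 0
        while got < resp:
            got += len(cli.recv(min(65536, resp - got)))
        received += got
    cli.close()
    t.join()
    srv.close()
    return received
```

Only the sizes and ordering are scripted; the actual pacing and interleaving on the wire come from real TCP dynamics, which is the point of closed-loop generation.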
So when all of this real TCP data is flowing back and forth across this link, we
can now imagine introducing the service that we are trying to evaluate. At this
step, the flows belonging to this service would interact with the flows belonging
to the background traffic, and then we could draw numbers on how the flows are
influenced by background traffic on that specific link.
So let me now try to give you details of some of the evaluation that I carried
out. So basically I'm interested in answering two questions. Can Swing look at
Internet traffic traces and reproduce them as faithfully as possible in a local
testbed, and if it does that, whether or not the traffic influences the individual
applications that I'm trying to evaluate.
So to answer the first question -- yes?
>>: (Inaudible) can you do something far simpler than Swing such as simply, you
know, characterize let's say some property to the (inaudible).
>> Kashi V. Vishwanath: That's a good question.
>>: And just (inaudible) traffic and (inaudible).
>> Kashi V. Vishwanath: That's a great question. Yes. So I actually answered
some of that question. So this question here is really trying to capture that fact.
So when I say background traffic I'm hiding because I didn't want to mess up the
slide, I'm hiding the fact that I also mean realistic and responsive traffic. So that's
a good question.
So for the first question I'm looking at existing traffic traces. Part of the problem
is that I don't control these traffic repositories. So they come from different parts
of the world, have different properties, different link speed, different applications,
et cetera, and I experimented with all three of these.
So basically I'm interested in going back to my goals and seeing whether the
traffic that I generate based on the (inaudible) approach that I described is
realistic and responsive.
So in the interest of time, let me skip some of the initial results which basically
ask can you match the coarse (inaudible) behavior. So as you can imagine,
even simple techniques would be fairly well equipped to do that.
Instead I will focus on reproducing a certain aspect of Internet traffic that I
did not explicitly model, which is a much harder test: burstiness in Internet
traffic.
So in the literature, there's a large body of work on understanding why Internet
traffic is bursty and what you can do about it. In this talk I'm going to use
one of the many metrics that exist out there, firstly because of its popularity
but also because of its visual appeal, and that is wavelet-based multi-resolution
analysis. So I'll give you a quick one-minute overview of what this means for
the sake of completeness, but then I'll follow it up with the visual appeal I'm
talking about.
So imagine looking at all traffic entering a specific link and observing how
many bytes arrive in a single millisecond. So for every (inaudible) millisecond
you calculate the total number of bytes. So at the smallest time scale you can
imagine computing interesting metrics of interest, for example mean and
variance of the traffic process.
You can then start coarsening the time scale in powers of two, so you combine
adjacent time bins, and try to understand how many bytes arrive in those time
bins for the duration of the trace.
So again at the next level of granularity, you can again compute mean and
variance for example. So keep doing that so on and so forth.
Now in addition to computing the mean and variance, I will compute a quantity
which is known as the energy of the traffic at a particular time scale. And for
sake of completeness, this is computed using the detail coefficients of the
wavelet transform of the original traffic process.
Intuitively speaking, this captures how bursty the traffic is at a particular
time scale, but I'll also give you more visual appeal a little later.
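The energy computation just described can be sketched with a Haar wavelet, one common choice; exact normalizations vary across papers, so treat this as illustrative:

```python
import math

def wavelet_energy(byte_counts, max_scale=8):
    """Wavelet energy-plot sketch.

    Given per-millisecond byte counts, return log2(energy) per time
    scale, where energy at a scale is the mean squared Haar detail
    coefficient at that level of the transform.
    """
    x = list(byte_counts)
    energies = []
    for _ in range(max_scale):
        if len(x) < 2:
            break
        # Haar detail coefficients at this scale
        details = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2)
                   for i in range(len(x) // 2)]
        energy = sum(d * d for d in details) / len(details)
        energies.append(math.log2(energy + 1e-12))  # epsilon avoids log2(0)
        # coarsen: approximation coefficients become the next level's signal
        x = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2)
             for i in range(len(x) // 2)]
    return energies
```

A signal that alternates sharply millisecond to millisecond shows high energy at the finest scale and low energy once coarsened, which is exactly the kind of shape the energy plots in the talk display.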
So once I compute the energy at all of the time scales that I'm interested in, I can
plot that in a simple diagram. So on the X axis, I'm showing the powers of two
that I talked about, so as you go from left to right the time scale gets coarser.
Similarly, on the Y axis I'm showing the log of the energy, so as you go from
bottom to top traffic becomes more bursty. So if, of two traffic plots I'm
showing, one is higher than the other at a particular time scale, then at that
time scale its traffic is more bursty. Yes?
>>: (Inaudible) milliseconds.
>> Kashi V. Vishwanath: Yes, exactly.
>>: (Inaudible).
>> Kashi V. Vishwanath: Minus one because there's -- absolutely. Yes.
>>: So (inaudible) time (inaudible) of how the (inaudible) you just give a full
number for the --
>> Kashi V. Vishwanath: Oh, because then I would have to show 15 different
time series, and visually looking at them might not be that meaningful. So this
condenses the exact fact that you're trying to observe.
This synthesizes all of those series into individual numbers for a particular time
scale. You're absolutely right, we want to look at that, but we would be hard
pressed to like visually compare and conclude whether one is better than the
other.
>>: (Inaudible) mean standard deviation variance I can come up with many
different shapes that have the same (inaudible).
>> Kashi V. Vishwanath: That's absolutely right, which is why I'm computing
the energy of the traffic process. It is not exactly the mean and variance.
>>: You answered that by saying okay so (inaudible).
>> Kashi V. Vishwanath: So I did not pick this. So this is something --
>>: (Inaudible) capturing and you wouldn't know it because you are using it
around the parameters you happen to be actually capturing.
>> Kashi V. Vishwanath: I'm sorry. Please repeat the question.
>>: The concern was that you're condensing this essentially real function into a
scalar.
>> Kashi V. Vishwanath: Okay.
>>: And you addressed that concern by saying I didn't just condense it into those
two scalars, I condensed it into this third scalar called energy. How do you know
you've got the right scalars?
>> Kashi V. Vishwanath: So what I was trying to say is Internet traffic does
exhibit burstiness. So in the networking literature a lot of researchers have
focused on understanding the burstiness properties of Internet traffic, and they
use this as one means to capture how bursty Internet traffic is. And there are a
number of visual appeals to the energy plot which I'll describe in a couple of
minutes. Which is why I decided that, without explicitly modelling burstiness,
can I reproduce these salient properties that are present; so that is why I chose
this metric.
So if there were another metric where the signatures of two different traffic
processes differed, and individual applications interacting with this traffic
showed sensitivity to the difference, then yes, I would have had to reproduce
that metric as well.
>>: (Inaudible).
>> Kashi V. Vishwanath: By running the experiments.
>>: (Inaudible).
>> Kashi V. Vishwanath: So for example, if you come up with a new --
>>: (Inaudible).
>> Kashi V. Vishwanath: Okay. So I'm going to give you a one minute --
>>: I think actually you have more experiments about validating your work later.
>> Kashi V. Vishwanath: Yes.
>>: Let's get to the end of the talk about that.
>> Kashi V. Vishwanath: Okay. So here is the energy plot corresponding to one of
the original traffic traces that I downloaded, for example, the (inaudible), and
exactly as you were saying, a time scale of one is one millisecond, and going up
in powers of two, a time scale of nine corresponds to 256-millisecond bins. So
now there are some well-known properties of this energy plot which will come up
a little later which may or may not have been apparent if you just looked at the
time series or some other metrics, which is why researchers prefer to go with it.
For example, this dip that you're seeing here in the energy plot has commonly
been attributed to the self-clocking nature of TCP. For example, as you would
know, TCP works in this complex feedback control loop where you place data on
the wire and wait for it to be acknowledged before deciding when to put more
data on the wire.
So this in turn means that there is some amount of periodicity associated with
how long it takes to get data all the way across and get it back.
This periodicity is nothing but the converse of burstiness or the variance in the
traffic process. So in some sense you might expect intuitively to see some
periodicity around the round trip time of flows.
So indeed if you look at the original (inaudible) trace a majority of flows show a
round trip in the vicinity of 200, 250 milliseconds, which explains why I see a dip
in the energy at the particular time scale. This in turn means that if I came up
with a new trace and if indeed there was some regular pattern and I just plotted
the trace and looked at the dip, that would give me a first order approximation of
what might be the majority of round-trip times in the original flow. So this is one
of the many reasons, for example, why I picked energy plot and why other
researchers have used it in the past.
There are other reasons. For example, people have observed that Internet traffic
has a self-similar nature, and they use the slope of the plot at large time
scales to come up with an approximation of self-similarity in terms of the
so-called Hurst parameter. So those are some of the visual appeals that the
energy plot has.
So in my work, instead of trying to understand and reproduce those properties, I
ask myself a very simple question. Can I capture Internet traffic's properties
using the first-principles approach to sufficient accuracy that, for a handful of
traces, without really knowing what this means, I can reproduce the trace,
extract the energy plot of the reproduced trace, and match it to the original
energy plot, without explicitly modelling what self-similarity is, for example.
So here is what I do. So in the first experiment, instead of generating traffic
using the full power of Swing, I will cripple it, also to put a little more
perspective on related work and what other people have done.
So in this case I'm extracting the properties of users and applications, but I'm
completely omitting the network, which means I'm running this experiment on a
(inaudible) Internet; I'm not using the extracted capacity and loss-rate values,
whether or not the extraction was accurate.
So the first thing I need to say here is that even in this case, exactly as you
pointed out, the coarse-bin behavior basically matches, so there is not much
deviation. But if you look at the energy plot, it has nothing to do with the
original traces, which means that there is scope for improvement.
Now I will relax the assumption that I made, that I do not know about these
properties, because indeed I extracted them. So if I add in the capacity
estimates, as you can see, the generated energy plot now moves a little closer
to the original plot, especially in the first few time scales.
If I relax this assumption further and add in the latency estimates, I can get a
much better match. And if you bought the argument that I gave a little while
back about the round-trip time and the dip at the corresponding time scale, I
should be able to reproduce that, because I'm now reproducing the distribution
of latency values, and indeed I can do that.
There is still a lot of scope for improvement in the first few time scales. And
as you guessed, to get that, I add in the loss-rate estimates.
And now, as shown by the green pair of curves, intentionally chosen to be of the
same color to confuse you (laughter), we can now match the burstiness plot of the
original traffic process by generating traffic in this fashion. Yes?
>>: (Inaudible).
>> Kashi V. Vishwanath: Concurrent?
>>: Yes.
>> Kashi V. Vishwanath: So this was around a thousand flows concurrent at any
given time.
>>: (Inaudible).
>> Kashi V. Vishwanath: Which is why you're seeing this. So actually as the
number of flows increases, and I've experimented with such a trace, you do not
quite see this rich structure in the first few time scales, but you see it much
later, so it is relatively flat.
>>: Are you going (inaudible).
>> Kashi V. Vishwanath: Yes.
>>: Okay. I'll wait.
>>: (Inaudible).
>> Kashi V. Vishwanath: I see.
>>: (Inaudible).
>> Kashi V. Vishwanath: Yeah.
>>: Generate (inaudible).
>> Kashi V. Vishwanath: I see.
>>: Traffic or through traffic.
>> Kashi V. Vishwanath: True.
>>: Or (inaudible).
>> Kashi V. Vishwanath: True.
>>: So why not create a number of (inaudible) and run some of those tools
assuming that (inaudible) a good job and it was all right and why not do it that
way (inaudible).
>> Kashi V. Vishwanath: So this is where Harpoon is. That's a great question.
So this is my assumption of where Harpoon sits. So I assume that Harpoon is
equally accurate in extracting the user and application properties using a
first-principles approach, but it does nothing about the network, or
superimposing any of these, or establishing a complex feedback control loop
between these.
>>: (Inaudible) try to achieve (inaudible). It's not great that you actually get it
completely right (inaudible) I think experts (inaudible) you try to send traffic and
modelling that and you say I fit that better.
>> Kashi V. Vishwanath: As a sanity check, yes.
>>: But really your (inaudible) isn't about that (inaudible).
>> Kashi V. Vishwanath: Yes.
>>: It's (inaudible).
>> Kashi V. Vishwanath: Yes.
>>: (Inaudible).
>> Kashi V. Vishwanath: Exactly.
>>: (Inaudible).
>> Kashi V. Vishwanath: Exactly.
>>: And the other (inaudible).
>> Kashi V. Vishwanath: I see.
>>: And things (inaudible).
>> Kashi V. Vishwanath: I see.
>>: So if you put all of them together.
>> Kashi V. Vishwanath: I see.
>>: You approximately get the same behavior --
>> Kashi V. Vishwanath: I see.
>>: (Inaudible).
>> Kashi V. Vishwanath: Okay. Okay. So that's a great question. And I think
you're absolutely right. It may actually turn out that some of the accuracy that
I'm trying to shoot for does not matter in some sense. But it turns out, and I
actually know this answer because I did a number of different experiments, that
there are specific cases where even small deviations in this plot actually
matter to the application that you're trying to evaluate.
So if I did not have a good match starting out, then how do I scientifically
answer the question of whether, despite not matching, I still captured the impact
it had on individual applications accurately? So to get closer to that question,
there is some desire that if I run a controlled version of the Internet, then at
least for that I should be able to reproduce some of the properties that I saw.
>>: (Inaudible) on this trafficking (inaudible).
>> Kashi V. Vishwanath: I see.
>>: (Inaudible).
>> Kashi V. Vishwanath: Yeah.
>>: (Inaudible).
>> Kashi V. Vishwanath: Right.
>>: And the environment is a lot more dynamic.
>> Kashi V. Vishwanath: Sure.
>>: (Inaudible).
>> Kashi V. Vishwanath: And --
>>: I mean whatever you do in the lab, for example, you take it out in the real
world it doesn't work (laughter) (inaudible) it doesn't work so --
>> Kashi V. Vishwanath: So time permitting I'll give you an example of a
controlled experiment which I did where I knew exactly what are the sources that
are generating background traffic. Then I completely ignored that, looked at the
TCP dump trace, and reproduced the traffic using Swing. I was trying to reverse
engineer what I already should have known.
I ran some black-box experiments on top of this; they were interacting with the
original traffic in one case and with the generated traffic in the other. And I
showed that the numbers matched. Then I played out the scenario that you're
describing. I said the behavior of this application is going to evolve over time.
And I emulated the evolution of behavior by tuning the Swing parameters without
changing the extraction process. And then I ran the black box again in this new
scenario, and then I showed that the application behavior matches, which gives
some hope basically that at least for the kinds of things I did it makes sense.
But you're absolutely right, in the general case you really don't know what
aspects are important to capture. So in this work, I'm trying to argue that some
of the aspects that I tried to capture at least make sense for common
applications. But the general question is, I think, up for debate.
>>: (Inaudible) of the link?
>> Kashi V. Vishwanath: So this link is around five and a half megabits per
second. So this is fairly underutilized -- so they said it's an OC3 link, but the
university capped it at around 20, 25 megabits per second, so 15 to 20 percent
utilization, not more than that.
So were you going to say that the performance numbers look like a heavily
utilized link, what --
>>: Yeah, (inaudible) model application (inaudible) really matter what the traffic
is doing?
is doing?
>> Kashi V. Vishwanath: Yes, actually the answer is yes. So it depends on the
application that you're trying to evaluate. For example, if you're trying to build
an available (inaudible) tool, even for a relatively underutilized link with
bursty traffic patterns, it can reach potentially conflicting results based on...
(End of audio)