>> Srikanth Kandula: It's a pleasure to introduce Minlan Yu from Princeton. She graduated just five years back from Peking University in China, so she is straight out of grad school.
Minlan has interned in the past with us, with Albert, Dave, and Lee in Bing. She built a measurement infrastructure that allowed her to look at TCP-level stacks and collect a fair amount of integrated network and host measurements, which led to discovering TCP anomalies.
In the same vein she has done a bunch of work, and she's perhaps best known for designing cute algorithms in routers and switches that enable better measurement and better access control policies, that kind of stuff. All right. I'll let her [inaudible].
>> Minlan Yu: Okay. Thank you for the introduction. It's my great pleasure to give the talk here. The talk is about edge networks: the networks in companies, campuses, data centers and homes.
Recently there has been growing interest in these edge networks, but mostly people focus on how to make these networks bigger, faster and more efficient. What I'm going to focus on today is the management of these edge networks. According to a Yankee Group report, in most enterprises today network management takes about 80 percent of the IT budget but is still responsible for more than 60 percent of the outages. So management is an important topic, but it is still not well explored in the data center and enterprise network area.
In these kinds of networks, ideally we hope to make the users and applications the first priority and the network only secondary. We want to make the network as transparent as the air we breathe, so that it can fully support a diversity of applications and different requirements from users.
But of course today's networks are far from this goal. So instead of working on how to make today's networks easier to manage, we focus on how to redesign these networks so that they are easier and cheaper to manage.
There are three challenges in managing these edge networks. The first is that these networks are growing larger every day, with lots of hosts, switches, and applications. On the other hand, people hope to enforce a lot of flexible policies for different management tasks like routing, security and measurement. Finally, we only have simple switches in hand, and these switches have constraints on cost and energy. So how do we use these simple switches to support both large networks and flexible policies?
In this talk I'm going to present a system that addresses all three challenges at the same time.
First let's look at two examples of large networks. The first is large enterprise networks. In these networks we usually have tens or hundreds of thousands of hosts and thousands of switches connecting them. On these hosts people may run a diversity of applications, and these applications may have different traffic characteristics and different performance requirements. Data center networks are much larger than those enterprise networks. In data centers we usually have virtual machines running on the servers, so in total there could be millions of virtual machines, and these virtual machines can move from one server to another.
We also have different layers of switches connecting these servers together, and on these servers people may run different applications. Yes. Question?
>>: Why does the number of applications matter? If these were like different flows, is that what's going on, or is it just the presence of different applications that somehow --
>> Minlan Yu: So today people not only want to manage the switches and servers but also the applications. They want to provide different routing for these applications, different QoS and access control for these applications, or improve the network performance for these applications. So if you have a huge number of applications, each with different traffic characteristics or different performance requirements, it's a challenging problem.
And it's even more challenging in the data center than in the enterprise environment, because data centers have more diverse traffic patterns and much higher performance requirements. And in addition to supporting these larger networks, operators today also want to enforce flexible policies.
Today's network operators have many considerations. They want to improve network performance, improve security, support host mobility, save energy, save cost, and make the network easier to debug and maintain.
In order to achieve all these management goals, operators usually have different kinds of policies. The first example is that operators may want to provide customized routing for different applications; for example, shorter-latency paths for real-time applications.
They may also want to enforce access control rules in the switches to block the traffic from malicious users. Finally, for counting and measurement, they may want to measure the amount of traffic from different users and different applications.
But on the other hand, we only have simple switches. Today's switches face ever-increasing link speeds; data centers already have 40 Gbps links or even more. That means we can only spend tens of nanoseconds processing each packet. In order to process packets at that speed we can only use a very small on-chip memory. This memory is very different from the memory we have in our laptops, because it is very expensive and power hungry, and as a result we can only afford a small amount of it.
On the other hand, operators want to store lots of state in this small memory. Just to support forwarding, with millions of virtual machines in data centers we may need to install one forwarding rule for each host in each switch, which could consume a lot of memory. At the same time we also need to use this memory to enforce access control rules and quality of service for different applications and users.
Finally, we may want to maintain different counters for specific flows, which also requires memory.
So how can we address this challenge of using this small memory to store this large amount of state? We built a system to manage these edge networks. In this system the operators first specify the policies in the management system, and then the management system configures the devices, both the switches and the end hosts, to express these policies. The management system also collects measurement data from the network to understand the network conditions.
So we built two systems. The first system works on the switches; it basically focuses on how to enforce flexible policies on the switches. The second system, called SNAP, focuses on the end hosts: how to scalably diagnose performance problems there.
A common approach I take in both DIFANE and SNAP, and in much of my other work, is to first identify the practically challenging problems, and then design new algorithms and data structures. For example, in the DIFANE work I provide new data structures that make effective use of the switch memory, and in the SNAP work we provide new efficient data collection and analysis algorithms.
These algorithms and data structures wouldn't be enough if we couldn't validate them with real prototypes. So in the DIFANE work we actually built prototypes with OpenFlow switches, and in the SNAP work we built prototypes on both Windows and Linux operating systems.
Finally, we hope to make real-world impact with these prototypes, so we actually collaborate with industry. For the DIFANE work we collaborated with AT&T and evaluated our prototype using its network configuration data. And for the SNAP work, we worked with the Bing group at Microsoft, and the tool is already deployed in a production data center in Bing.
So first let me talk about DIFANE. DIFANE basically addresses the problem of how we scalably enforce flexible policies on the switches.
If we look at a traditional network, when we buy a switch we usually get two functions bundled together: the data plane, which is the hardware part of the switch that actually forwards the packets, and the control plane, which is the brain of the switch that runs protocols to decide where the traffic should go. Since these two parts are bundled together in the switch, they are both limited. For example, in the data plane, different vendors may provide different hardware that only supports a limited set of policies. And the control plane, since it's already designed when we buy the switch, is very hard to manage and to configure. So in order to manage a traditional network today, operators only manage the network in an offline fashion, and sometimes they go through each switch independently and try to manage and configure it.
In order to improve the management of these traditional networks there have been two new trends. The first trend is in the data plane: flow-based switches that support more flexible policies.
The second trend is in the control plane: a logically centralized controller to make management easier. I will talk about these two new trends in more detail.
The first trend is flow-based switches, called OpenFlow switches. These switches have already been implemented by different vendors like HP and NEC, and they have already been deployed in different campus and enterprise networks. What these switches do is perform simple actions based on rules. The rules basically match on certain bits in the packet header, and the actions could be to drop packets from a malicious user, forward a packet to a particular port, or count the packets that belong to Web traffic.
All these rules can be represented in the flow space I draw here. Here I'm showing a two-dimensional flow space: the X axis is the source and the Y axis is the destination. This red horizontal rule shows that I want to block traffic to a particular destination, and this blue line shows that I want to count packets from a particular source. In real switches these wildcard rules are stored in a specific memory called TCAM.
There are two features of TCAM. The first feature is that TCAM supports wildcards. For example, the first rule here shows that X can be a wildcard and Y should be the exact value one, and that describes this red rule here.
The second feature of TCAM is that it has priorities. If a packet matches multiple rules in TCAM, only the rule with the highest priority takes effect.
In today's switches there is only a limited amount of TCAM, because TCAM is on chip and very power hungry, so it's very important for us to reduce the number of rules we store in this TCAM memory.
The second trend is that we want a logically centralized control plane to make management easier. Now we have this big centralized brain, and operators can easily configure their policies in a central place. This brain then takes the job of deploying the policies to all the switches. This makes management easier, but it's not scalable, and if the controller fails, the whole network may be out of control.
In order to solve this problem, we present DIFANE. It's a scalable way to enforce fine-grained policies in the switches. DIFANE's idea is basically to pull some of the lower-level functions of the brain back into the switches, and by doing that achieve better scalability.
Before I talk about how we enforce policies, we can think about the most intuitive ways to enforce policies from the brain onto the switches. The first solution is to just pre-install the rules in the switches. The controller pre-installs the rules, and when a packet arrives at the switch, the switch matches the packet against the rules and processes it.
But the problem here is that it doesn't support host mobility well: when a host moves to a different place, all the rules associated with that host need to be installed in all the switches that the host may connect to.
And usually a switch doesn't have enough TCAM memory to store all the rules in the network.
A different solution was proposed by the Ethane work. Its basic idea is to install the rules on demand. When the first packet arrives at the switch, the switch doesn't have any rules, so it buffers the packet and sends the packet header to the controller. The controller then looks up its rule database, finds the rules, and installs them in the switch, so that the following packets can match these rules and get processed.
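As a rough illustration of this reactive approach (this is not Ethane's actual code; the class names, the policy table, and the flow keys are made up), the control flow looks roughly like this:

```python
# Minimal sketch of on-demand rule installation: on a rule miss the switch buffers
# the packet and asks the controller, which pushes the rule back down.

class Controller:
    def __init__(self, policy):
        self.policy = policy                       # flow -> action rule database

    def handle_miss(self, switch, flow):
        action = self.policy.get(flow, "forward")  # look up the rule for this flow
        switch.install_rule(flow, action)          # install it in the asking switch

class Switch:
    def __init__(self, controller):
        self.rules = {}
        self.buffer = []
        self.controller = controller

    def install_rule(self, flow, action):
        self.rules[flow] = action
        # release any buffered packets that now match the newly installed rule
        self.buffer = [p for p in self.buffer if not self._try_process(p)]

    def _try_process(self, pkt):
        if pkt["flow"] in self.rules:
            print(pkt["id"], "->", self.rules[pkt["flow"]])
            return True
        return False

    def receive(self, pkt):
        if not self._try_process(pkt):
            self.buffer.append(pkt)                          # buffer the packet...
            self.controller.handle_miss(self, pkt["flow"])   # ...and ask the controller

ctrl = Controller(policy={("10.0.0.1", "10.0.0.2"): "forward port 3"})
sw = Switch(ctrl)
sw.receive({"id": "pkt1", "flow": ("10.0.0.1", "10.0.0.2")})  # miss: controller installs the rule
sw.receive({"id": "pkt2", "flow": ("10.0.0.1", "10.0.0.2")})  # hit: processed locally
```

The buffering and the extra round trip to the controller are exactly the sources of delay discussed next.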
But the problem here is that the packet goes an extra long way, through the controller and back. And based on our experiments, even when the controller is near the switch and the controller only has a small number of rules, it still takes about 10 milliseconds to go from the switch through the controller and come back. There are two --
>>: Did you say milliseconds?
>> Minlan Yu: 10 milliseconds.
>>: Really?
>> Minlan Yu: There are two --
>>: That's really a long time.
>> Minlan Yu: Yes, that's really long. So there are two contributors to this long delay. The first is the switch. The switch only has a very small CPU, so it can't do this kind of buffering and sending at a very fast speed, so there is a lot of delay there.
And the other delay is from the controller: the controller needs to look up the rules, find them, and then install them.
>>: [inaudible]. This sounds like a disk [inaudible] latency. Is the controller [inaudible].
>> Minlan Yu: And in our experiment, since we made the controller really simple, most of the delay comes from the switches.
>>: Comes from the switches?
>> Minlan Yu: Yes.
>>: Okay.
>> Minlan Yu: Yeah. So if the controller gets complex it will add even more delay.
We also found that it's hard to implement this in today's switches because it increases their complexity. The switch can't just use a first-in-first-out buffer, because it has to buffer the packet while continuing to handle other packets that match rules already in the switch. And once it receives the rules from the controller, it has to find a way to retrieve the packet from the buffer.
And finally, if there are malicious users who send traffic to a variety of destinations through a variety of ports, the switch will keep missing rules, so all the packets arriving at the switch experience this extra long delay, and it also increases the overhead at the controller.
So in order to find a new solution that addresses the problems we've seen with these two approaches, we hope that DIFANE, the system we built, can scale with the network growth naturally. It should handle the limited TCAM entries in the switches and the limited computing resources at the controller.
We also hope to improve the per-packet performance, so that we always keep the packets in the data plane and reduce the delay of each packet.
Finally, we hope to require minimal modifications at the switches, so that we don't need to change the data plane hardware and it's easy to deploy DIFANE.
And our idea is very simple. We just want to combine the proactive approach and the reactive approach together so that we get better scalability, higher throughput and lower delay.
DIFANE's design has two stages. In the first stage, the controller proactively generates the rules and distributes them to some authority switches. In our design we still have a centralized controller, so it's still easy for operators to configure the policies there. This controller represents the rules in the flow space, then partitions this flow space into multiple parts and assigns each part to one authority switch.
At the same time, the controller distributes some partitioning information to all the switches in the network, so each switch knows how to handle a packet: when a switch receives a packet, it knows which authority switch to ask for the rules.
In the second stage, since the rules are already pre-installed in the authority switches, we can use the authority switches to keep the packets always in the data plane and reactively cache the rules.
Here's how it works. When the first packet arrives at the ingress switch, the ingress switch redirects the packet to the authority switch, and the authority switch processes the packet according to the rules. At the same time, the authority switch installs some cache rules so that the following packets can get processed locally at the ingress switch and forwarded directly.
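A minimal sketch of this two-stage handling, with hypothetical switch and flow names and plain dictionaries standing in for TCAM entries, might look like the following; the real system does all of this in the switch data plane rather than in software.

```python
# Sketch of DIFANE's reactive caching: a miss at the ingress switch is redirected
# in the data plane to the authority switch, which processes the packet and
# installs a cache rule back at the ingress switch.

class AuthoritySwitch:
    def __init__(self, authority_rules):
        self.authority_rules = authority_rules     # flow -> action, preinstalled by the controller

    def process_and_cache(self, ingress, flow):
        action = self.authority_rules[flow]
        print("authority switch:", flow, "->", action)   # the packet never leaves the data plane
        ingress.cache_rules[flow] = action                # reactively install a cache rule

class IngressSwitch:
    def __init__(self, authority):
        self.cache_rules = {}
        self.authority = authority                 # chosen via the partition rules

    def receive(self, flow):
        if flow in self.cache_rules:               # cache hit: process locally
            print("ingress switch:", flow, "->", self.cache_rules[flow])
        else:                                      # cache miss: redirect to the authority switch
            self.authority.process_and_cache(self, flow)

auth = AuthoritySwitch({("web", "serverA"): "forward"})
ing = IngressSwitch(auth)
ing.receive(("web", "serverA"))   # first packet takes the slightly longer data-plane path
ing.receive(("web", "serverA"))   # following packets hit the cached rule at the ingress switch
```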
So we can see that the key difference between DIFANE and the previous reactive approach is that now the packets traverse a slightly longer path in the data plane instead of being sent through the slow control plane. So -- yeah?
>>: [inaudible] if you have no policy for the first packet it automatically redirects
to the authority switch?
>> Minlan Yu: Yes. If there are no cached rules to handle the first packet, then the ingress switch will redirect the packets to the authority switch. Yes.
>>: This is a detailed question, but are you doing like IP encapsulation here or
redirection?
>> Minlan Yu: So there are different solutions. The first solution is you can just use IP encapsulation to do this kind of tunnelling from the ingress switch to the authority switch and on to the egress switch.
>>: And doing a lot of that still tends to be faster?
>>: 10 milliseconds. She [inaudible] budget.
>> Minlan Yu: It's much faster than that. But today's switches don't do this kind of encapsulation in hardware, so we hope to use VLAN tags to improve the performance. Yes.
>>: [inaudible] you said that the reason you had 10 millisecond delay [inaudible]
came from the switch.
>> Minlan Yu: That's in our evaluation because we have a really simple
controller.
>>: Okay.
>> Minlan Yu: Yeah.
>>: So suppose you also had a similar kind of -- so you had a switch which was
not -- which did have a powerful CPU.
>> Minlan Yu: Uh-huh.
>>: Now you're going to ask that switch to do a lot more. You're going to ask it not just to make a query to the controller but to do MAC-in-MAC or IP encapsulations.
>> Minlan Yu: Uh-huh.
>>: And a rule lookup. And that's going to take less than 10 milliseconds?
>> Minlan Yu: So the rule lookup the switch really has to do anyway.
>>: Okay.
>> Minlan Yu: If it has --
>>: That's fine. So you're saying that encapsulation takes less time than sending a query to the controller?
>> Minlan Yu: That I'm not sure about, but based on the software evaluation it's faster than buffering the packet and sending it to the controller. And we have a solution that uses VLAN tags, which the hardware can do much faster than full encapsulation, to further improve that.
But the challenge of using VLAN tags is that the authority switch no longer knows the ingress switch's address. So although we can implement the tunnel with VLAN tags, we can't implement the rule caching scheme with VLAN tags. Instead, we can move the caching scheme back to the controller.
>>: So the thing that you're improving on [inaudible] is just the first-packet delay.
>> Minlan Yu: Yes.
>>: Nothing else?
>> Minlan Yu: So basically the first-packet delay benefits from moving from software to hardware. And the second thing we improve is scalability: it's distributed instead of centralized, so it scales better with larger networks. As networks grow much larger in the long run, a single centralized controller can't keep up.
>>: [inaudible].
>> Minlan Yu: For now we are looking at [inaudible] networks with thousands of switches. And for the delay, in the evaluation you will see that a single centralized controller can only process about 50K flows per second. So that means for a thousand --
>>: [inaudible] new flows per second, right?
>> Minlan Yu: New flows per second.
>>: Yeah.
>> Minlan Yu: And for a network with a thousand switches, you can only support about 50 new flows per second from each switch.
>>: That's a different bottleneck. The 10 millisecond.
>> Minlan Yu: Yes, that's a different bottleneck. It's more of a throughput bottleneck rather than a delay bottleneck. So I hope DIFANE's solution can address both the delay and the throughput bottlenecks.
>>: So how much faster is it then to encapsulate and send it -- I mean, how
much time does it take?
>> Minlan Yu: So based on the software evaluation, where we used software Click to implement this kind of buffering, the difference between that and encapsulation is about 10 milliseconds versus one millisecond.
>>: So what is the purpose of the encapsulation, just to know where the packet came from?
>> Minlan Yu: The purpose of the encapsulation is to create a tunnel from the ingress switch to the authority switch.
>>: Yeah, but why --
>> Minlan Yu: So all the switches in the middle [inaudible] will know how to send the packet and won't do another redirection.
>>: These aren't visible links.
>>: Why is it bad to do another redirection? I mean, the redirection is what happens -- the switch looked up the packet in its TCAM and saw that the default rule said send it that way on that link. What would be the harm if all the switches did that on the way to the authority switch?
>> Minlan Yu: So here, when the packet arrives at the switch, the switch knows from the partition rules which direction the authority switch is. And that would mean all these switches in the middle also have to know that the ingress switch hasn't processed the packet yet and that it needs to be redirected. Right? If the ingress switch already matched the packet with a cached rule, then it can be sent directly, so all the switches in the middle should not do anything.
So all the switches on this path would need to know that they need to redirect, and all the switches here should not -- so it adds to the complexity and [inaudible].
>>: [inaudible] switches saying it's [inaudible].
>> Minlan Yu: Yeah. VLAN caching.
>>: So in terms of the expressivity of the policies that you can have now, this is similar to just dealing with the controller bottleneck by using multiple controllers?
>>: Yeah.
>> Minlan Yu: But if you have multiple controllers, where are you going to place them? Do you send all the packets through the actual network to this distributed set of controller nodes, and how do you distribute the state among these multiple controllers? One thing I can say is that you can use DIFANE's technique to distribute the state among these controllers. But if we move these kinds of state back into the switches, then you no longer need these extra hops through the controller.
>>: Okay. So I hear you saying there's a trick to distributing these rules. When you say you can use DIFANE's strategy to distribute -- to partition the set of rules among multiple controllers.
>> Minlan Yu: Yes.
>>: So that's not something that's straightforward.
>> Minlan Yu: Yes. Like how do you -- so we chose to do partitioning instead of replication, right? A naive solution would be to let all the controllers store the whole set of rules, so that whenever a packet arrives, whichever controller gets it can process it. But the problem there is how you direct the packet to the right controller and how this controller maintains consistent state with the other controllers.
>>: Do you want to tell us what happens if, after you partition the rules, let's say the authority switch goes down?
>> Minlan Yu: Yes, I'm going to talk about that.
Okay. So in summary, the key benefits we get are that the packet processing now happens entirely in the hardware path instead of going through the controller path, and also that it's distributed among many authority switches. An extreme example: if the rule caching action is really slow, then in our design the packets don't need to wait for the rule caching scheme; all the following packets will simply keep getting redirected through the authority switches.
Okay. So the question left is, since we have a group of authority switches, how does the ingress switch know which authority switch to ask for the rules? If we look at this example, we partition this flow space into three parts, and to describe this partition we just need three wildcard rules; each rule describes the boundary of one partition. By doing that, we can easily describe this partition in the TCAM of the switches.
So all these authority switches together are actually hosting a distributed directory of rules. It's not a DHT, because hashing doesn't work here; but by customizing the partition we are able to use the TCAM hardware that already exists in the switches to support this partition.
>>: It's not an accident that [inaudible] right? You're just matching bits here?
>> Minlan Yu: Yes.
>>: So the fact that that's two and four is not an accident. You couldn't do three
and five?
>> Minlan Yu: Yes. So basically you need -- this is conceptually one rule, but in TCAM we need several entries to describe it.
>>: Right. So you divided up -- you just matched the bits in the packet headers for --
>> Minlan Yu: Yeah, it's a traditional trick in TCAM: if you have a wildcard rule over a range, you implement it with power-of-two boundaries. Yeah.
Okay. So now we know that this partition information is just some partition rules that we can install in all the switches. And what are these authority switches? They are just normal switches that store a different set of rules, called authority rules. In general, one switch can be both an ingress switch and an authority switch at the same time. So in general a switch has three sets of rules in its TCAM, and by storing these three sets of rules in the TCAM we can easily support DIFANE.
The first set of rules are the cache rules, which exist in the ingress switches and are actually installed by the authority switches. The second set are the authority rules, which exist in the authority switches and are proactively installed by the controller. Finally, there are the partition rules, which exist in all the switches and are proactively installed by the controller.
And there are different priorities among these rules. If a packet matches a cache rule it can be processed directly, so cache rules have the highest priority. The partition rules have the lowest priority, because if a packet doesn't match any of the cache rules or the authority rules, it will certainly match one of the partition rules. And with these partition rules we can always keep the packets in the data plane.
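The priority ordering among the three rule sets can be sketched as follows; the dictionaries and flow tuples are hypothetical stand-ins for TCAM entries, and a real TCAM matches all entries in parallel rather than checking the sets one by one.

```python
# Emulate one DIFANE switch's lookup order: cache rules, then authority rules,
# then partition rules, which in DIFANE cover the whole flow space.

def lookup(flow, cache_rules, authority_rules, partition_rules):
    if flow in cache_rules:                            # highest priority: process locally
        return ("local", cache_rules[flow])
    if flow in authority_rules:                        # this switch is the authority for the flow
        return ("authority", authority_rules[flow])    # process it and trigger rule caching
    for region, authority_switch in partition_rules:   # lowest priority: some region always
        if flow in region:                             # matches, so the packet stays in the
            return ("redirect", authority_switch)      # data plane and is redirected

cache = {}
authority = {("hostA", "hostB"): "forward port 2"}
partition = [({("hostA", "hostB"), ("hostA", "hostC")}, "authority switch 1"),
             ({("hostD", "hostE")}, "authority switch 2")]

print(lookup(("hostA", "hostC"), cache, authority, partition))  # ('redirect', 'authority switch 1')
print(lookup(("hostA", "hostB"), cache, authority, partition))  # ('authority', 'forward port 2')
```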
So we actually implemented DIFANE with OpenFlow switches, and we don't need any change to the data plane; we just install the three sets of rules in the data plane. Then, if a packet matches the authority rules, we send a notification to the control plane, and in the control plane we implement a cache manager that sends cache updates to the ingress switch. At the ingress switch, when it receives cache updates, it changes its cache rules accordingly.
So we can see that the authority rules and the cache manager only exist in the authority switches, and we just need a software modification in the authority switches to support DIFANE.
So the next question is how can we implement this cache manager? Yes?
>>: Just from a security perspective, now what's going on is that each authority switch has the ability to change the rules of every other switch in the network?
>> Minlan Yu: Uh-huh.
>>: For its subsection.
>> Minlan Yu: Yes. And we assume that all the switches in the network are trusted and are controlled by a centralized controller, so we don't handle malicious switches. Yeah.
>>: Okay.
>> Minlan Yu: Okay. So the next question is how we can implement this cache manager. The challenge here is that we have a multi-dimensional flow space, and there may be wildcard rules in different dimensions. Here I'm still showing a simple example of a two-dimensional flow space, and there are several rules. These rules overlap with each other, so they have different priorities: rule one has the highest priority, then rule two, then rule three, then rule four.
So if a packet matches rule three, we can't simply cache rule three, because another packet that matches both rule two and rule three should take rule two's action. Instead, the authority switch should generate a new rule that covers this blue rectangle of the space and cache only this new rule.
A different case is that if a packet matches rule one, then we can just cache rule one directly. This kind of caching problem becomes even more challenging when we have multiple authority switches. In this figure, when we have two authority switches, each taking half of the flow space, where do we store rule one, rule three and rule four?
For example, for rule one, if we store it in both authority switches, then these authority switches may make independent decisions about what they want to cache in the same ingress switch, so there may be cache conflicts. So instead of storing rule one in both, we split it into two independent rules and store each of them in one authority switch.
So we can see that with multiple authority switches we actually increase the total number of rules in the whole flow space. This raises the challenge of how we partition the flow space so that we minimize the total number of TCAM entries required across all the switches. This problem is NP-hard for flow spaces of two or more dimensions, so we propose a new solution: a decision-tree based rule partition algorithm.
The basic idea of this algorithm is that cut B is better than cut A, because cut B follows the rule boundaries and doesn't generate any new rules.
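A toy version of that intuition, assuming rules are axis-aligned rectangles in a two-dimensional flow space (the coordinates below are made up), is to count how many rules a candidate cut would split and prefer cuts that split none:

```python
def rules_split_by_cut(rules, axis, position):
    """Count how many rules a cut at `position` along `axis` (0 = x, 1 = y) splits in two."""
    split = 0
    for x_lo, x_hi, y_lo, y_hi in rules:
        lo, hi = (x_lo, x_hi) if axis == 0 else (y_lo, y_hi)
        if lo < position < hi:      # the cut falls strictly inside the rule's range
            split += 1
    return split

# Hypothetical rules as (x_lo, x_hi, y_lo, y_hi) rectangles in the flow space.
rules = [(0, 4, 2, 3), (2, 3, 0, 8), (5, 7, 1, 6)]

print("cut A splits", rules_split_by_cut(rules, axis=0, position=2.5), "rules")  # cuts through two rules
print("cut B splits", rules_split_by_cut(rules, axis=0, position=4.0), "rules")  # follows a boundary, splits none
```

Every split rule becomes extra TCAM entries, so a partitioning that follows rule boundaries keeps the total number of entries close to the original rule count.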
Okay. There are further questions about how we handle network dynamics, because the operators may change the policy at the centralized controller, an authority switch may fail, and a host may move to a different place. Here I'm going to talk about how we handle authority switch failure.
We actually duplicate the same part of the flow space onto two or three authority switches that are scattered in different places in the network. This kind of duplication first helps to reduce the extra delay of redirection. For example, here we have two authority switches, and this ingress switch can choose the nearer authority switch to redirect a packet to. To implement that, we actually install two different partition rules for the same flow space but direct them to different authority switches.
In the normal case, for this ingress switch the first rule, with the higher priority, takes effect. That means the packets get redirected to the nearer authority switch, A1. Similarly, the other ingress switch redirects packets to its nearer authority switch.
If authority switch A1 fails, we run OSPF across the whole network, so the ingress switch easily gets the switch failure notification. This ingress switch can then invalidate that partition rule, so that the second partition rule naturally takes effect and redirects the packets to the next nearest authority switch in the network.
So in summary, by duplicating authority switches we can reduce the extra delay of redirection, do better load balancing, and react quickly to authority switch failures.
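A small sketch of this failover behavior, with hypothetical switch names, shows how the backup partition rule takes over once the rule pointing at the failed authority switch is invalidated:

```python
# Two partition rules for the same flow-space region, pointing at different
# authority switches; the higher-priority one points at the nearer switch.
partition_rules = [
    {"priority": 2, "region": "src in 0-7", "authority": "A1", "valid": True},   # nearer
    {"priority": 1, "region": "src in 0-7", "authority": "A2", "valid": True},   # backup
]

def pick_authority(rules):
    live = [r for r in rules if r["valid"]]
    return max(live, key=lambda r: r["priority"])["authority"]

print(pick_authority(partition_rules))        # 'A1' in the normal case

# OSPF-style failure notification for A1: the ingress switch invalidates its rule.
for r in partition_rules:
    if r["authority"] == "A1":
        r["valid"] = False

print(pick_authority(partition_rules))        # the backup rule for 'A2' takes effect
```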
We actually validated our implementation of DIFANE on a testbed of around 40 computers, and we implemented two different solutions: the first is Ethane and the second is DIFANE.
In the Ethane setup, the packets get redirected to the controller, and then the controller installs the rules so that the packets can get processed in the switch. In DIFANE's architecture, the controller pre-installs the rules in the authority switch, so that the packets get redirected through the authority switch and processed directly.
In order to be fair to Ethane, we only evaluate our prototype with a single authority switch. And since the key difference between the two architectures is in the first packet of each flow, we only count the first packet of each flow.
Now we increase the number of ingress switches to test the throughput of the two systems. Here I'm showing the throughput evaluation: the X axis is the sending rate and the Y axis is the throughput, and both are in log scale.
Question?
>>: Yeah. I'm trying to understand [inaudible]. You started out with wanting to make networks easy to manage, right?
>> Minlan Yu: Uh-huh.
>>: So in what sense does DIFANE achieve that? It seems you only added more complexity on top of what Ethane had.
>> Minlan Yu: So DIFANE has the same level of ease of management compared to Ethane, because operators only need to care about the centralized controller: they only need to say, for instance, whether this group of students has access to this group of servers, and they only need to consider these high-level policies in one central place at the centralized controller. This is the same in both architectures. And then DIFANE adds an extra layer below this controller to make the policy enforcement more scalable, and operators don't need to worry about this layer.
>>: So okay. So let me ask that. So you're essentially saying it's not like you've made management easy, you made --
>> Minlan Yu: It's how to scale management while still keeping it easy. Today's network is totally distributed across all the switches; it's hard to manage, but it's pretty scalable. So we hope to find the right spot that achieves both ease of management and scalability.
>>: And you don't -- okay. Just from a management perspective, you don't
expect operators to pick which switches should be authority switches and things
like that? They don't need to get involved in decisions at all?
>> Minlan Yu: No. Operators don't need to worry about that; we can just randomly pick a group of switches to do that. And our partitioning algorithm can help operators decide, given the amount of rules they want to install and the switch capabilities they have in the network, how many authority switches and how many partitions they need.
>>: But is that because the worst case [inaudible] authority switch in your architecture is similar to the worst case known in the [inaudible] architecture? Because if switches are doing more, I would want the better switches in my network to be picked as authority switches rather than some random small switch that connects to one computer.
>> Minlan Yu: So if the operator wants to control where to install the authority switches, they can certainly do that -- give it as input to the configuration.
>>: [inaudible] I imagine the switches would be heterogeneous and you would want operators to control it.
>>: Yeah, but just basic [inaudible] wouldn't matter if --
>> Minlan Yu: Yeah, the input to our partitioning algorithm includes the different memory sizes of the different switches, and then we can make sure the amount of rules we put in each switch fits its size.
And this is actually a good question. We have some ongoing work on how we can hide this heterogeneity across the switches entirely from the operators. We hope to build something like a virtual memory manager layer in the centralized controller that manages the physical memory in the different switches. Then operators only need to work with this virtual memory abstraction to write their programs and decide where they hope to install the rules.
Okay. So let's look at this throughput evaluation. Here I'm showing this figure of the sending rate versus the throughput; both are in log scale. We find that with just one ingress switch DIFANE already performs better than Ethane. This is because in Ethane's architecture the local ingress switch is a bottleneck at 20K flows per second: the ingress switch can only buffer packets and send the packet headers to the centralized controller at a rate of 20K flows per second. Yes?
>>: The Y axis is not [inaudible]. The Y axis is flows-I-have-never-seen-before per second.
>> Minlan Yu: Yes. So it's like --
>>: [inaudible].
>> Minlan Yu: The -- yeah, you're right. It's the number of packets per second -- the first packets of each flow per second.
>>: First packet of a flow I've never seen before, per second.
>> Minlan Yu: Yes.
>>: So how [inaudible].
[brief talking over].
>>: New flows.
>> Minlan Yu: New flows per second. So in the evaluation we are only sending new flows; each new flow has a single packet, so we eliminate all the following packets.
>>: Well, okay. So how [inaudible] is this in terms of [inaudible] workloads. How
often does this happen that you get 100K new flows at a switch per second?
>> Minlan Yu: I don't actually have real data on data center networks. I know
that for Web traffic it's really like 30 to 40 packets per flow.
>>: Web traffic. But this is traffic [inaudible] to the data center. So Web traffic is
not an issue here.
>> Minlan Yu: Yeah. It also depends on how you define the flow. It's not necessarily a per-[inaudible] flow; it's basically how fine-grained the control is that you want for different flows. If you only want control on the source address, then it's only about how many new sources are coming to this switch. But if you want very fine-grained control over each application, then it depends on which applications are sending the new flows.
>>: [inaudible] to go to packet expression which is [inaudible].
>> Minlan Yu: So today's OpenFlow supports something like a 10-tuple, so within these 10 tuples you can --
>>: [inaudible].
>> Minlan Yu: Sorry?
>>: At what rate can you do ten tuple inspection?
>> Minlan Yu: It's like a normal switch today; OpenFlow doesn't add any hardware support. So if a switch can process one gigabit of packets per second then it can still --
>>: No. Processing one gigabit of packets per second is different from how many new flows are added per second, correct?
>> Minlan Yu: Yes.
>>: So that's your Y axis here. Your Y axis is not packets per second?
>> Minlan Yu: Yes.
>>: Your Y axis is number of new flows per second.
>> Minlan Yu: Yes.
>>: That's what it says is per second.
>>: No, it's new flows.
>> Minlan Yu: Yes, it's new flows per second.
>>: Yeah.
>>: I'm just wondering whether you actually get those new flows per second.
>> Minlan Yu: So it depends on the flow definition. I don't actually have real traces showing what rate of new flows each switch may get in today's data center traffic.
>>: And so the difference between the two of them is primarily because you run
out of buffer space for the packets in Ethane. Is that what's going on?
>> Minlan Yu: The difference is that the switch has to do some software processing to buffer the packet and send the packet header to the controller. This is done in the control plane path, not the data plane path. So the control plane path throughput is only about 20K flows per second.
>>: It appears to change sharply. [inaudible] what is the bottleneck resource
that you're hitting at [inaudible] in the curve?
>> Minlan Yu: For the ingress switch it's the CPU overhead, the CPU bottleneck.
>>: But you're generating almost the same amount of control plane traffic as in DIFANE, because you're still getting the --
>> Minlan Yu: Yes. So the key difference is that we don't need to buffer the packets anymore.
>>: So it is a packet buffer?
>> Minlan Yu: Yes.
>>: Okay.
>> Minlan Yu: Yes. Sorry. Okay. So if we increase the number of ingress switches, both systems get better throughput, but the centralized controller can only handle 50K flows per second. With DIFANE, since we are doing everything in the fast data plane, we can achieve 800K flows per second with a single authority switch.
Another benefit of DIFANE is that it scales with the network growth: if we want larger networks with higher throughput, we can just install more authority switches in the network.
Another evaluation is about how we can scale to a large number of rules. We actually collected the access control rule sets from one campus network and three different networks from AT&T. For all these networks we collected the access control rule data from all the switches and derived the network-wide rules describing what kind of policy the operators hope to enforce.
>>: This is a [inaudible] question. So if a flow goes through, say, six switches from coast to coast, and the rules are installed in the way [inaudible] works, the rules will be installed on each of these six switches, right?
>> Minlan Yu: Ethane only installs rules at the ingress switch.
>>: And the others were just forward?
>> Minlan Yu: It depends on which rule you mean. For this access control rule, it's only at the ingress switch.
>>: I see. So but for the second switch in that chain of six like it's a new flow for
that switch, right?
>> Minlan Yu: Yes. So for forwarding rules in Ethane, all the switches need to know how to forward the packets. So when the centralized controller gets a new flow from the first ingress switch, it tells all the switches to install the forwarding rule. It's kind of a small trick to speed it up, yeah.
>>: So you do the same?
>> Minlan Yu: For us, we use tunnelling so we don't have this issue.
>>: But the tunnelling only works for the first packet. Does the authority switch install the rules on all six switches or only --
>> Minlan Yu: So the tunnelling also happens from the ingress switch to the egress switch -- a direct tunnel between these two switches. So the forwarding rule in DIFANE only says, for the destination host, which switch is the egress router, rather than what the next outgoing interface is.
>>: Okay. We may want to take this offline, but you're not tunnelling all the packets, are you?
>> Minlan Yu: I'm tunnelling all the packets.
>>: Oh.
>>: [inaudible].
>> Minlan Yu: Yes.
>>: Clear, and from then on everybody --
>> Minlan Yu: Yes.
>>: All the other switches [inaudible].
>>: So the first packet of flow gets a tunnel -- that gets a tunnel label that takes it
directly to the authority switch?
>> Minlan Yu: Yes.
>>: And every subsequent packet in that flow gets a tunnel label that takes it
directly to the egress switch?
>> Minlan Yu: Yes.
>>: Is this [inaudible] uncertain?
>> Minlan Yu: Ethane installs the forwarding rules in all the switches; it doesn't use tunnelling.
Okay. So one evaluation result is that for this IPTV network we got from AT&T, there are about 3,000 switches and 5 million rules. The number of authority switches we need depends on the network size, the TCAM size and the number of rules, but in general we only need about 0.3 percent to 3 percent of the network switches to be authority switches.
So in summary, let me use this slide to summarize what DIFANE does. Traditional networks are totally distributed, but they're hard to manage. The new architecture from OpenFlow and Ethane uses a logically centralized controller, so it's easy to manage, but it's not scalable.
What DIFANE hopes to do is pull back this trend a little bit: in DIFANE the controller is still in charge, so it's still easy to manage, but all the switches are hosting a distributed directory of rules, so it's more scalable.
And when we talked to Cisco, they seemed interested in this trend from traditional networks toward DIFANE, because in their traditional networks they also have this kind of limited TCAM space in the routers, which is not enough to handle all the access control rules. So they view DIFANE as a distributed way to handle a large amount of [inaudible] rules among a group of routers, to address the limited memory in a single router.
>>: So when you have a large number of authority switches -- if you have 3 percent of 3,000 switches then it's 90, [inaudible] that's a fairly large number. And if it turns out that the rules you have tend to cut in both directions, then in general the number of sub-rules that you have to create is going to go up with the product of the number of authority switches and the number of rules. Right? So now we're looking at hundreds of millions of sub-rules.
>> Minlan Yu: Yes.
>>: Which is going to cause you to eat TCAM memory faster than in Ethane, which is going to cause cache misses to go up. How much --
>> Minlan Yu: So basically you need a smart partition algorithm that can reduce the number of rules. Especially if there are many overlapping rules in one area, I want to not cut that area but keep those rules in a single switch.
>>: I don't have a good feel for what these rules look like, but my guess would be --
>> Minlan Yu: So actually for the AT&T data our algorithm works --
>>: How localized are --
>> Minlan Yu: The reason is that it's mostly access control rules that want to make sure the right set of customers has the right access to the resources. So basically these customers are pretty independent, and there's not much overlap among these rules.
>>: So rules don't tend to be -- block off traffic from one particular source or block off traffic to a particular destination --
>> Minlan Yu: Yes, they're mostly independent.
>>: [inaudible] small. Okay.
>> Minlan Yu: Yeah. But for rule sets with more overlapping rules, it's much harder to break them apart. Yeah.
>>: [inaudible] particular [inaudible]. Let's say, you know, if your rule set looks like half block-destination and half block-source, then no way you cut it is really going to help, because [inaudible] rules, and the consequence of that is that if you try to scale this thing you're going to eat up TCAM nonlinearly [inaudible].
>> Minlan Yu: Yes.
>>: And --
>> Minlan Yu: So for that, if you have authority switches with a larger TCAM memory size, then you need fewer cuts, and that will help a lot.
>>: All right.
>>: One small question. How are these cached rules invalidated when the policies change?
>> Minlan Yu: For non-security rules we rely on timeouts. For those rules that are really important, you may need to remember them and explicitly modify them [inaudible].
>>: But if you have cache eviction, you need to evict something from the cache. You've got this problem you pointed out earlier that the rules interact. Is it --
>> Minlan Yu: Yes. So the worst case is that all the traffic always goes through those authority switches.
>>: Is it always safe to evict a cache line or are they sometimes interdependent
from the [inaudible] switch?
>> Minlan Yu: So in the Ethane work, when you evict a rule, all that traffic needs to go through the controller again, so there is potentially an attack on the controller: if you send a lot of traffic it will overload the controller. But in DIFANE's design, all that traffic goes through an authority switch, which is much harder to overload.
>>: I guess -- my question wasn't about [inaudible].
>> Minlan Yu: Okay.
>>: It was that the rules have these funny interactions you showed.
>> Minlan Yu: Yes.
>>: Where the rules overlap.
>> Minlan Yu: Yes.
>>: And you described a technique for why it's safe to send a part of a rule to an
ingress switch.
>> Minlan Yu: So --
>>: Does that same property -- does the way you cut up the rules have the property that it's always okay for an ingress switch to throw away a single rule, or are the rules sometimes interdependent, where throwing away one rule will --
>> Minlan Yu: Yes. So when we cache a rule, we make sure it's independent, so that [inaudible].
Okay. So let me quickly talk about the second project, which is SNAP: how do we perform scalable performance diagnosis for data centers? Remember, in DIFANE we achieved scalability by rethinking the division of labor between the centralized controller and the switches.
Now we are hoping to rethink the division of labor between the network and the hosts so that we can achieve better scalability. If we look at the search application in data centers, a request arrives at a front-end server. This front-end server distributes the request layer by layer to a group of workers, the workers generate responses, and these responses are aggregated layer by layer back to the front-end server. So we can see that even a single request to a single search application already generates a lot of traffic in the data center, and, in fact, most of the servers in the data center are shared and run different storage, MapReduce jobs and cloud applications. So it's really a very messy environment, even when everything works fine.
So if something goes wrong -- a switch may fail, a link may fail, or there may be some performance problem or bug in a software component of these applications -- it's really hard to identify.
So it's essentially a challenging problem to diagnose performance problems in data centers, because these applications usually involve hundreds of application components and run on tens of thousands of servers.
Another problem is that new performance problems come out every day, because developers keep changing their code to add new features or fix bugs.
And old performance problems keep coming back too, because of human factors. Even the performance problems that we already know about, developers without a networking background may not fully understand. For example, in networking we have detailed protocol behaviors like Nagle's algorithm and delayed ACK [inaudible]; these can be a mystery to developers without a networking background, and for them working with the network stack can be a disaster.
So how do people do diagnosis in today's data centers? They first look at the application logs, which tell the number of requests processed per second or the response time for each request. Developers can identify performance problems using these logs; they may find that one percent of the requests have an extra long delay, more than 200 milliseconds.
Since this only happens for a small portion of the requests, they may think that it's not a problem in their application code but in the network. But on the network side we only have very coarse-grained switch logs that only tell the number of bytes and the number of packets each port processes per second.
So to identify what really happened to these requests, the switch logs don't help much. Instead, the developers have to install very expensive packet sniffers and capture a large amount of packet traces. Then they have to filter out the traces that correspond to this one percent of requests, and, either manually or with some tools, analyze these packet traces to find out what really happened and find the root cause of these problems.
But most of the time we find that the problem comes neither from the application code alone nor from the network alone, but from the interactions between the network and the applications. So we hope to build a tool that sits in the operating system layer, at the network stack on the end host, so that it can directly tell us about the interactions between the applications and the network.
We don't want to use the application logs because they are too application specific. We don't want to use the switch logs because they are too coarse-grained, and packet traces are too expensive to capture all the time. Instead, if we can build a generic tool that's lightweight enough to run all the time in the network stack layer on the end hosts and fine-grained enough to capture all the performance problems, that would be great.
So we actually built a tool called SNAP, a Scalable Network-Application Profiler, that runs everywhere, all the time, in the data center to capture the performance problems.
In SNAP, the first step is to collect some network stack data from all the connections on all the end hosts. There are basically two types of stack data we collect. The first type, which is relatively easy to handle, is cumulative counters. These counters include the number of packet losses, the number of fast retransmissions and timeouts that each connection has experienced, the round-trip time estimation that TCP uses for congestion control, and the amount of time the connection is receiver window limited. For these counters we can simply read them periodically and calculate the difference between two reads.
The second type of data is instantaneous snapshots. These are relatively harder to capture, because if you read them periodically you may miss some important values, so we use Poisson sampling to make sure the snapshots we read are statistically correct. This data includes the number of bytes in the send buffer, the congestion window size, the receiver window size and so on.
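As a rough sketch of these two collection styles -- the intervals, field names, and the fake stat source below are all illustrative; real data would come from something like TCP ESTATS on Windows or tcp_info on Linux -- the collector might look like this:

```python
import random
import time

class FakeStack:
    """Stand-in for reading per-connection stats from the OS network stack;
    the values are made up just to exercise the collection logic."""
    def __init__(self):
        self.timeouts = 0
    def read(self):
        self.timeouts += random.randint(0, 2)        # cumulative counter only grows
        return {"timeouts": self.timeouts,
                "bytes_in_send_buffer": random.randint(0, 65536)}  # instantaneous value

def collect(stack, mean_interval=0.2, rounds=5):
    last = stack.read()["timeouts"]
    for _ in range(rounds):
        time.sleep(random.expovariate(1.0 / mean_interval))   # Poisson-spaced snapshots
        stats = stack.read()
        delta = stats["timeouts"] - last                      # counter: difference of two reads
        snapshot = stats["bytes_in_send_buffer"]              # snapshot: keep the sampled value
        print(f"timeouts +{delta}, send buffer {snapshot} bytes")
        last = stats["timeouts"]

collect(FakeStack())
```

The exponential gaps between reads give Poisson-distributed sampling times, which avoids the bias of reading on a fixed schedule.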
With the data we collect from the network stack, we hope to do the performance classification. Yeah?
>>: I have a question. I know that a lot [inaudible] don't they provide these
counters for [inaudible].
>> Minlan Yu: Yes. The good thing is that the Windows operating system already provides these counters, and we just need to read them.
>>: I don't mean [inaudible].
>> Minlan Yu: The NIC doesn't have this kind of per-flow, TCP-level information; it only has per-packet information.
>>: Even the ones that actually offload, they sort of you can offload TCP.
>>: TCP is never fully offloaded.
>>: Whatever you offloaded, I mean presumably the sample RTT you imagine
that --
>> Minlan Yu: Yes, RTT is easier to handle, but other stack information, like how we do congestion control and what the receiver window size is on the [inaudible], yeah, that's harder to get.
So we think the stack information at the operating system layer is only a small amount of information, but it's useful for [inaudible].
>>: I guess my question is I note that parts of this stack information are being
offloaded on the NIC, so the question would be sort of how much could NIC help
you with collecting this information without you involving the operating system
which presumably is quite busy doing other things?
>> Minlan Yu: So I think if you have a way to combine the NIC and the OS together, it would be even better. For example, one thing we don't really get here is that if the packets get delayed, we don't really know whether they're delayed in the NIC card or in the network. So if you have more information [inaudible].
Okay. So with this data we collect, we hope to do some classification, but it's a hard job because these performance problems may come from different application code and we can't really enumerate all the root causes. So instead of classifying by the root causes of the problems, we classify the problems based on the life of the data transfer.
Here is an example: the packets start from the sender application, go into the send buffer, through the network to the receiver, and then the receiver acknowledges the data.
At each part there may be problems: the send buffer may be too small so it limits the throughput, the network part may have different types of packet loss events, and the receiver may either not read the data fast enough or not ACK the received data fast enough.
Luckily, with the counters we collect, we can easily tell the problems at different stages. For example, for the network we already have the number of fast retransmissions and the number of timeouts, which tell us about network problems directly. At the receiver end, we have the receiver window limited time and the round-trip time estimation to tell us about the delayed ACK problem. And we have the number of bytes in the send buffer to tell us about send buffer problems. If it's not a problem at any of these other stages, then we classify it as a sender application problem.
So we find that most of the problems can be directly measured using this stack data. Some rely on sampling and others rely on inference, so there's some error rate, which is evaluated in the paper.
>>: So is this classification going on at realtime or is it data --
>> Minlan Yu: Yes, one of our goals is to keep the classification simple
enough that it can be done in realtime, so it can directly tell the problems
for each connection in realtime.
>>: And is this done in the root VM or in the guest VM on a -- in a data center
[inaudible].
>> Minlan Yu: So if you have this kind of visibility into the counters in the
guest VM, then you can understand --
>>: [inaudible].
>> Minlan Yu: Yeah. If you have the operating system support there, then you
can get it.
>>: [inaudible].
>> Minlan Yu: So after we get all this data and the performance
classification for all the connections on all the hosts, we collect the data
at a centralized management system. And here, as data center operators, we
usually have full knowledge of the data center, like the topology, the
routing, and the mappings from connections to processes and applications. So
using all this data, we can do cross-connection correlation. We can find the
shared application code or the shared resources, like the host, the link or
the switch, that actually cause the correlated performance problems.
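As one way to picture this correlation step -- not SNAP's actual algorithm --
here is a minimal sketch that groups per-connection diagnoses by the resources
they share and flags resources where problems concentrate. The input format
and the threshold are assumptions for illustration.

```python
from collections import Counter, defaultdict

def correlate(diagnoses, conn_to_resources, min_count=10):
    """diagnoses: {conn_id: [problem, ...]}
    conn_to_resources: {conn_id: [app, host, link, switch, ...]}
    Returns resources shared by many problematic connections."""
    hits = defaultdict(Counter)
    for conn_id, problems in diagnoses.items():
        for resource in conn_to_resources.get(conn_id, []):
            for problem in problems:
                hits[resource][problem] += 1
    # Keep only resources implicated in at least min_count problem reports.
    return {
        resource: counts.most_common()
        for resource, counts in hits.items()
        if sum(counts.values()) >= min_count
    }
```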
So in summary, SNAP has two parts. On each host we have an online,
lightweight processing and diagnosis piece of code, and we also have an
offline, cross-connection diagnosis part to identify these correlated
problems.
And we actually ran SNAP in the real world in one of bing's data centers.
That data center has about 8,000 machines and 700 applications. We ran SNAP
for a week and collected terabytes of data. Using this data we were able to
identify 15 major performance problems, and during this week about 21 percent
of applications suffered network performance problems at some point.
And here's a summary of the performance results we found. First, we find one
application that has the send buffer problem for more than half of the time;
it's because the developers did not set the send buffer large enough. Also,
there are about six applications that have lots of packet loss events at the
network layer for more than half of the time. There are about eight
applications that are not reading data fast enough at the receiver end. And
the most interesting thing is that there are about 144 applications that are
not ACKing data fast enough -- the delayed ACK problem.
>>: Is the analysis straightforward once you have the data?
>> Minlan Yu: Uh-huh.
>>: It is?
>> Minlan Yu: So the classification part is really straightforward. It's our
goal to keep it really simple so we can do it online. And then the correlation
part involves a large amount of data, like topology and routing information,
and working out how to use it.
>>: [inaudible] analysis for one TCP flow because I'm thinking back to all the
work on trying to -- like critical path analysis on TCP or [inaudible].
>> Minlan Yu: So for one connection, most of the identification is really
simple. Compared to packet traces -- people have developed a lot of techniques
to infer from a packet trace what kind of problem it is -- here, since you are
getting the data directly from the stack, you can directly tell what the
problem is.
>>: Okay. And if there are multiple problems, do you have a sense of what the
bottleneck is, which one is the real one?
>> Minlan Yu: So basically we look at the stages of the data transfer. A
connection is always throughput-limited by something, so we want to know which
stage is the cause of the throughput limitation. If it's the application,
that's a good sign for us, because it means the application is simply not
generating data fast enough. And since we have these counters from the stack,
we know which stage is causing the throughput limitation.
>>: It's not clear to me how you can automatically decide when there is a
bottleneck -- whether it's the send buffer, a timeout, a fast retransmission.
What would [inaudible] look at the log and decide [inaudible].
>>: You can do that.
>> Minlan Yu: Because you already have the counters for the number of
time-outs and the number of packet losses. So using these counters, you can
directly tell whether this connection is limited or not. The key trick is
that, since we're observing in the TCP stack layer, it's the stack that
handles these kinds of bottlenecks. For example, if the receiver end is too
slow, the stack needs to send less. So the stack already has the information
about whether the receiver is slow or not. Yes?
>>: How could this information get represented to the application?
>>: [inaudible].
>>: So the application might --
>> Minlan Yu: So here we use --
>>: [inaudible].
>> Minlan Yu: Basically [inaudible].
>>: [inaudible].
>> Minlan Yu: They have mappings from each connection to a process
[inaudible] they use that information to [inaudible] the applications. But we
can use that information to do the mapping too, between connections and
applications. So we can find out, for one application, what connections it is
using.
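As one illustration of building such a connection-to-application mapping on a
single host -- SNAP itself relies on the data center's own mapping data, so
this is just an assumed stand-in -- a sketch using the psutil library might
look like this.

```python
from collections import defaultdict

import psutil  # third-party: pip install psutil

def connections_by_application():
    """Group live TCP connections by the name of the owning process."""
    conns = defaultdict(list)
    for c in psutil.net_connections(kind="tcp"):
        if c.pid is None or not c.raddr:
            continue  # skip listeners and sockets with no known owner
        try:
            app = psutil.Process(c.pid).name()
        except psutil.NoSuchProcess:
            continue  # the process exited between the two calls
        conns[app].append((c.laddr, c.raddr, c.status))
    return conns
```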
>>: [inaudible] finish up and then we can [inaudible] questions.
>> Srikanth Kandula: You have like five minutes.
>> Minlan Yu: Yeah. I'm almost finished. So basically we find that delayed
ACK is a very important problem in this data center. One example is an
application that can only send about five records per second at some times and
1,000 records per second at other times. The reason is that if the data has an
odd number of packets, the receiver may need to wait for 200 milliseconds
before it sends the ACK, and if the sender is waiting for that ACK it will
significantly reduce its throughput. Delayed ACK was introduced in the
Internet environment to reduce bandwidth usage and interrupts. But in the data
center environment we hope that people can disable it, because today we have
much higher bandwidth and more powerful servers in the data centers; we no
longer need this feature there, and it's causing a lot of problems.
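For concreteness, here is one way a receiver can suppress delayed ACKs on a
Linux socket using TCP_QUICKACK; this is an illustration only, not the fix
used in that data center (on Windows the equivalent knob is a per-interface
TcpAckFrequency registry setting rather than a socket option).

```python
import socket

# TCP_QUICKACK is Linux-specific; fall back to its Linux value if this
# Python build does not expose the constant.
TCP_QUICKACK = getattr(socket, "TCP_QUICKACK", 12)

def recv_with_quick_ack(sock, bufsize=65536):
    """Receive data while asking the kernel to ACK immediately.

    Quick-ACK mode is a hint the kernel can clear, so it is re-armed
    around every receive call.
    """
    sock.setsockopt(socket.IPPROTO_TCP, TCP_QUICKACK, 1)
    data = sock.recv(bufsize)
    sock.setsockopt(socket.IPPROTO_TCP, TCP_QUICKACK, 1)
    return data
```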
Interestingly, when we talked with Google people, they also found this
delayed ACK problem in their data centers, but they found it with expensive
packet sniffers installed on certain racks. So they don't have a sense of how
common this problem is across the entire data center.
So let me use the delayed ACK example to summarize SNAP. Basically, we first
monitor at the right place: we monitor the TCP stack layer so we understand
the network-application interactions, and we monitor at the end host so it's
more scalable and more lightweight.
We also have algorithms to identify these performance problems for each
connection -- for example, how to infer delayed ACK problems.
And then we have correlation algorithms that can find problems across
connections. For example, we can identify those applications that have
significant delayed ACK problems.
Finally, we can fix the problems: we hope to work with operators and
developers to fix these problems, like disabling delayed ACK in data centers.
So in summary, we built two systems. The first system is called DIFANE; it
focuses on how to scalably configure devices to express policies. The second
system is called SNAP; it focuses on how to scalably collect management data
and perform diagnosis on the end host.
For future work, I hope to continue working in this edge network area, but
focusing on the interactions between the applications and the network
platform.
The first aspect is how to make the platform better support applications:
making it more flexible using software-defined networking techniques and
making it more secure by combining host and network defenses.
On the other hand, I want to look at how to make applications use the network
better. I want to look at new types of applications and see how to fit them
better with these emerging types of networks, like the cloud and home
networks.
I have also done other research in two main streams: how to support flexible
network policies, and how to build networks that support applications better.
And I'm happy to talk about any of this work offline.
So in summary, my research style is to look at practically challenging
problems and to combine new data structures with real prototyping, so that we
can solve problems in the Internet and data center networks. Thank you.
[applause].
>>: Any questions?
>>: I'll pass.
>>: [inaudible].
>>: [inaudible].
>>: I'll ask you a question. [laughter].
>>: [inaudible] the current apps, the number of apps [inaudible] average
problems. So, yeah, this one. I must have missed it. The six apps that are
having the network problem -- what is the network problem that you --
>> Minlan Yu: So all these problems are packet loss [inaudible] have different
types of packet loss events. So we think these applications may not be written
in a good way, so that [inaudible] they issue synchronized writes at the same
time, which always cause packet loss.
>>: Okay.
[applause]