>> Jin Li: Hello. It's my great pleasure to welcome Haiying Shen to come to Microsoft
and to give a talk.
Dr. Shen received her Ph.D. from Wayne State University in 2006.
Currently she is an assistant professor at Clemson University, with a focus on peer-to-peer
systems, distributed computing, and related solutions.
She has published more than 60 research papers in top journals and conferences
and has served on the program committees of many international conferences.
Without further ado, let's hear what Professor Shen has to say about efficient and
effective file replication and consistency maintenance in P2P systems.
>> Haiying Shen: Thank you, Jin. Good afternoon, everyone. It's my great pleasure to
give a presentation in Microsoft. I would like to thank Jin for the invitation.
My name is Haiying Shen. I'm an assistant professor at Clemson University in the ECE
department.
The topic of my presentation is efficient and effective file replication and consistency
maintenance in P2P systems.
Now, first I would like to briefly introduce my research areas. My research is mainly
focused on distributed and parallel computer networks and systems, including
P2P and content delivery networks, high-performance grid computing, mobile
computing, and so on.
I have been conducting research in peer-to-peer systems since I was a Ph.D. student in
2002. In this area, I have been working on topology for scalable lookups, load balancing,
congestion control, file replication and consistency maintenance, and so on. Recently I
have been working on video streaming.
In mobile computing, I have been working on routing, reputation systems, energy
efficiency, social networks, and so on.
This presentation is based on two publications in IEEE Transactions on Parallel and
Distributed Systems in 2009.
Now, let's see the outline. First is the introduction, and then I will cover the background
and related work on file replication and consistency maintenance. After that
I'll present our methods, including IRM, integrated file replication and consistency
maintenance, and GeWave, geographically aware wave for file consistency
maintenance. And then I'll present our evaluation results, and finally I conclude my
presentation.
All right. As we know, in contrast to the client/server architecture, peer-to-peer is a
completely decentralized system. It's well known for its high scalability, reliability,
dynamism resilience, self-organization, and so on.
BitTorrent and Overnet applications are peer-to-peer networks, which are widely used
nowadays.
And this talk is focused on structured peer-to-peer systems, but the methods can also be
applied to unstructured P2P systems.
Now, in a P2P file sharing system, every node has a routing table recording its neighbors.
So when a node requests a file, the request will be forwarded to the file's
destination, and then the file will be sent back to the requester.
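The lookup just described can be sketched as follows. This is a minimal illustration only, not the talk's exact protocol: the numeric ID space, the toy topology, and the greedy "closest ID" rule are all assumptions for the sketch.

```python
# A minimal sketch of P2P file lookup: each node knows only its
# neighbors (its routing table), and a request is forwarded greedily
# toward the node whose ID is closest to the file's key; that node
# then returns the file. IDs and topology below are made up.

def lookup(neighbors, files, start, key):
    """Forward a request hop by hop until no neighbor is closer to key.

    neighbors: node ID -> list of neighbor IDs (the routing table)
    files:     node ID -> file stored at that node
    Returns (file or None, list of nodes the query traversed).
    """
    current, path = start, [start]
    while True:
        candidates = [current] + neighbors[current]
        closest = min(candidates, key=lambda n: abs(n - key))
        if closest == current:          # no neighbor is closer: we arrived
            return files.get(current), path
        current = closest
        path.append(current)

# A 4-node toy overlay: node 12 holds the file keyed 12.
neighbors = {1: [4, 8], 4: [1, 8], 8: [4, 12], 12: [8]}
files = {12: "song.mp3"}
print(lookup(neighbors, files, start=1, key=12))  # ('song.mp3', [1, 8, 12])
```

The `path` returned by the sketch is exactly the chain of relay nodes on which the path-based replication methods discussed later operate.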
In P2P file sharing systems, file access is highly repetitive and skewed towards the
most popular files. So if a node becomes a hotspot, there will be delayed
responses.
File replication is one solution to deal with such problems. It replicates a file to some
other nodes in order to distribute the query load among a number of nodes and to avoid a
hotspot so that the file query efficiency can be enhanced.
File replication requires consistency maintenance in order to keep the consistency
between a file and its replicas: if a file is changed, all its replicas
should be updated correspondingly. For example, in eBay transactions, in shared
calendars, in banking, and in flight control systems, we need consistency maintenance to
keep the replicas consistent with the file.
Now, Randy here is eBay's architect. He proposed eBay's five
commandments, and one of the five is "embrace inconsistency": 99 percent of
eBay's operations are based on inconsistency, and only 1 percent of transactions are based
on consistency.
This time I came to Seattle to attend the P2P conference, and the presenter of this
morning's keynote, Dr. Ken Birman from Cornell University, mentioned
that today's embrace of inconsistency has given us scalable services that we just
cannot trust.
So he mentioned that consistency maintenance is really very important in our daily life or
in the realm of business.
Now, let me introduce the related work on file replication. Previous work on file
replication can be classified into three categories. The first method replicates a file at
nodes close to the file owner. The second method replicates a file at the requester.
And the third method replicates a file along the nodes on the query path.
So we can see that the requester may no longer request the file later on, but the replica is
still there. Likewise, the nodes along the path may not receive requests later on.
So the drawback of the previous methods is that they cannot adapt to time-varying
replica utilization. A replica may be heavily utilized now but only lightly
utilized later on, and these methods do not adaptively adjust to that time-varying utilization.
Second, it's difficult to ensure that all replicas are fully utilized. For example, a
replica at the requester is not shared by other nodes; only when the requester
requests the file again can it get the file from itself.
And also those methods generate high overhead for unnecessary replicas and consistency
maintenance, because the more replicas there are, the higher the overhead for consistency
maintenance. Right?
So let's see some representative works on file replication. Now, this work replicates a
file to nodes close to the file owner. The advantage of this method is that it can
enhance the replica hit rate: when requests are forwarded toward the server,
they always encounter the replica nodes around the server. And it can also
enhance the lookup efficiency.
But the disadvantage is that it produces overloaded nodes, because the replica nodes are
crowded around the server. There are only a few such options, and that's why those nodes
may be overloaded.
And also it cannot significantly improve the query efficiency. Why? Because a
request encounters the replica nodes only when it is close to the server, so it
cannot significantly reduce the path length. Now, that is one method.
Another method replicates a file to requesters. The advantage is that it can enhance the
lookup efficiency, because when the requester requests the file again it can get the file
from itself. However, it has a low replica hit rate because the replica is not shared by
others; it can only be used by the requester itself.
Another method is to replicate a file to nodes along the query path. Of
course it can enhance the replica hit rate, because there are a lot more replicas along the
path, and it can also significantly enhance lookup efficiency.
The drawback is that it creates significantly more replicas and thus generates high
overhead.
Now, let's see the related work on consistency maintenance. Previous work on
consistency maintenance can be classified into two groups: one is structure-based methods,
the other is message-spreading methods.
In structure-based methods, all nodes constitute a structure, such as a hierarchical
structure or a tree structure. But we know that P2P is characterized by churn: nodes join
and leave continuously and frequently.
So for structure-based methods, the dynamism leads to unsuccessful update
propagation. That means, in a tree, if a node fails, all of its children cannot get the
update successfully.
Okay. And also dynamism leads to high structure maintenance overhead: because there is
a structure, there is extra overhead to maintain it.
Message-spreading methods spread messages by, for example, broadcasting, gossip, or
random walk. Of course they lead to tremendously redundant messages, one node may
receive more than one update, and they still cannot guarantee that every replica node gets
an update. So those are the drawbacks of the two kinds of methods.
So basically the drawbacks of the previous work are that it cannot guarantee successful
updates and it generates high overhead.
So the essential problem is that passively accepting update messages makes it difficult
to avoid unnecessary updates. Why? Because replica nodes just wait for the update.
They cannot control whether they should ask for an update or when they ask for it.
So that's why they have those drawbacks.
And also we observe that, for some files, the query frequency is less than the update
frequency. A file may be updated very frequently but requested less frequently,
so every time the file is updated there is an update, even though the file is requested less often.
So in previous work, they tried to guarantee that every time a file is changed, there is an
update. But we loosen that strict requirement: we only want to guarantee that when a requester
receives a file, it is the updated file, no matter how many updates have been executed.
All right. Now, let's see representative work. This work was published in the
2008 IEEE Transactions on Parallel and Distributed Systems.
So they build a hierarchical structure. High-capacity nodes and
physically close nodes constitute a tree, so a replica update is forwarded among the
high-capacity nodes and then further forwarded along the tree.
And this is another structure-based work, proposed in 2005. Basically they build a tree
over all the nodes in the system, and every node has an index indicating whether its
children have the replica or not. If its children don't have the replica, the propagation
stops there; otherwise the propagation is further forwarded to the children.
Now, this work is a message-spreading method, push/pull. Normally nodes use rumor
spreading to push updates, and only when a new node joins the system does it poll the
replica nodes for the update.
So we found that there is interdependency between file replication and consistency
maintenance. File replication needs to minimize the number of replicas in order to
minimize the overhead of consistency maintenance.
On the other hand, consistency maintenance needs to help file replication keep the
consistency between a file and its replicas under dynamism.
So integrating these two techniques can enhance their mutual interactions, can avoid their
conflicting behaviors, and can ensure that the two techniques can be exploited to their
fullest capacities.
Therefore, we propose IRM, Integrated File Replication and Consistency Maintenance,
which works in a harmonious and coordinated manner.
Unfortunately, previous work addresses these two issues separately; that's why we
propose a method that combines them in a coordinated manner.
The features of IRM include high efficiency, effectiveness, and low cost.
So basically, each node in the system actively decides whether to create or delete a replica
and actively polls for updates rather than passively waiting for them.
A node that replicates highly queried files polls at a high frequency for files that are
frequently updated and queried.
So IRM avoids unnecessary file replications and updates by dynamically adapting to
time-varying file query and update rates, and in doing so it improves the replica hit rate,
replica utilization, file query efficiency, and consistency fidelity.
All right. Now, let's see the issues addressed in IRM.
For file replication: where to replicate files so that file queries can be significantly
expedited and the replicas can be fully utilized? We don't want to see underutilized
replicas because they are a waste of overhead. And how to remove underutilized file
replicas so that the overhead for consistency maintenance is minimized?
Now, in consistency maintenance there are also two issues. One is how to determine the
frequency at which a replica node probes the file owner in order to guarantee timely file
updates. If it probes too fast, it wastes resources; but if it probes too slowly, the file may
not be updated in a timely manner. Okay.
The next issue is how to reduce the number of polling operations to save cost while still
providing the fidelity of consistency maintenance.
Overview of IRM. So in P2P file sharing systems, some nodes have much higher traffic
load than other nodes due to three reasons.
First, node interests are different. There is a lot more traffic between the interested
nodes and the file owner than for a node that is not interested in the file.
Second, file popularity is nonuniform and time-varying. So a file may be popular at this
time, but it may not be popular at that time.
Third, nodes are located in different places and may have different numbers of neighbors
in the P2P overlay network.
Now, let's see the overview of IRM. For file replication, we can see that traffic junction
nodes carry more traffic load, so IRM replicates files at frequent
requesters and at traffic junction nodes in order to improve the utilization of replicas.
That is IRM for file replication. For consistency maintenance, the replica
nodes actively poll the file owner for updates.
Now, let's see the details. For the case when a file requester requests a file, we define the
query initiating rate of a file F, denoted QF, as the number of queries for F sent by the
requester during a unit of time.
So we set a threshold TQ on that query initiating rate. When a file requester's QF is
greater than TQ, it replicates the file.
And for the case where a traffic junction node replicates a file, we define the query
passing rate of a file F, denoted LF, as the number of queries for F received and
forwarded by the node during a unit of time.
So we set a threshold TL. When LF is greater than TL, the node adds a replica request to
the file query, and then the file owner will send the file to the replica requesters.
Now, let's see the process.
The nodes along the path check whether their LF is greater than TL. If yes, the node
includes a file replica request in the file query. After the server receives the query, the
server replicates the file to the replica requesters and also sends the file back to the
original requester. Then the requester checks its QF; if QF is greater than TQ,
the requester replicates the file.
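The two replication triggers can be sketched as follows. The thresholds TQ and TL come from the talk; the concrete numbers and the dict-based bookkeeping are assumptions for illustration only.

```python
# Sketch of IRM's two replication triggers: junction nodes whose
# query-passing rate LF exceeds TL attach a replica request to the
# query, and the requester itself replicates when its query-initiating
# rate QF exceeds TQ. Threshold values are made up.

T_Q = 5.0   # threshold on a requester's query-initiating rate (QF)
T_L = 20.0  # threshold on a junction node's query-passing rate (LF)

def replica_targets(path_passing_rates, requester_rate):
    """Decide who gets a replica for one query.

    path_passing_rates: node -> LF, for the nodes that forwarded it.
    """
    targets = [node for node, lf in path_passing_rates.items() if lf > T_L]
    if requester_rate > T_Q:
        targets.append("requester")
    return targets

print(replica_targets({"A": 25.0, "B": 3.0}, requester_rate=7.0))
# ['A', 'requester']
```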
Now, another problem is that the server may receive a lot of replica requests. How can
the server determine whether it should replicate a file to a requester or not? We define a
server as the original file owner or a replica node.
A server's query load L is defined as its visit rate during time T, and its capacity C is
represented by the number of queries it can respond to during T.
Now, there is a factor alpha. Server i periodically checks its load Li. If its load over
capacity is greater than alpha, it releases the extra load. If load over capacity is
less than 1 over alpha, the server determines whether to replicate the file: if the benefit
is greater than the cost, it replicates the file; otherwise it does nothing.
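The server's periodic decision can be sketched like this. The comparison of load over capacity against alpha follows the description above; how "benefit" and "cost" are estimated is not specified in the talk, so they are plain inputs here.

```python
# Sketch of a server's periodic check: compare load/capacity against
# the factor alpha. Benefit and cost are opaque inputs (the talk does
# not say how they are computed).

def periodic_check(load, capacity, alpha, benefit, cost):
    ratio = load / capacity
    if ratio > alpha:
        return "shed-load"      # overloaded: release the extra load
    if ratio < 1.0 / alpha:
        # lightly loaded: replicate only if it pays off
        return "replicate" if benefit > cost else "no-op"
    return "no-op"

print(periodic_check(load=90, capacity=50, alpha=1.5, benefit=0, cost=0))  # shed-load
print(periodic_check(load=10, capacity=50, alpha=1.5, benefit=2, cost=1))  # replicate
```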
All right. Now, file replicas may become underutilized. In this case, the replica
nodes periodically check their load; if the load is less than TL, they remove the file
replica. In this way, IRM ensures that all replicas are fully utilized, and there is no wasted
overhead for maintaining the replicas or for their consistency maintenance.
Okay. Next is how to appropriately determine the values of QF and LF. For example,
maybe the requester is very interested in a file now, loses interest in the next time period,
and then becomes interested in the file again later on, so there would be a fluctuation of
replica creation, replica removal, replica creation.
So in order to avoid the resources wasted by this fluctuation, IRM employs the
exponential moving average technique to reasonably determine the file query rate over time T.
Basically there is a weighting factor, so it does not discard the old observations while still
using the new observation. This is the formula for determining the query rate.
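The formula on the slide is not reproduced in the transcript, so the following is a standard exponential-moving-average sketch with an assumed weight `a`: the new rate estimate blends the previous estimate with the latest observation, so a one-period dip in interest does not immediately trigger replica removal.

```python
# Exponential moving average of the query rate (weight 'a' is an
# assumed parameter): old observations are discounted, not discarded.

def ema_query_rate(prev_estimate, observed_rate, a=0.5):
    return a * observed_rate + (1.0 - a) * prev_estimate

rate = 10.0
for observed in (0.0, 0.0):        # interest briefly disappears...
    rate = ema_query_rate(rate, observed)
print(rate)  # 2.5 -- the estimate decays gradually instead of dropping to 0
```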
Okay. Now, recall that in IRM replica nodes poll the file owner for updates. Let's
see the benefits of polling.
It can reduce redundant updates. A node does not necessarily need to update its replica
after each update. As I mentioned, a file may be updated frequently while requests come
less frequently, and we just want to make sure the requester receives the updated
file. So every time the file is updated, the node does not have to update its replica until,
you know, there is a request.
>>: Question. When the request actually comes in, do we need to check and verify if the
file is of the latest version?
>> Haiying Shen: Yes.
>>: Okay. So at the time the request actually comes in, you still need to do a
verification with the original server?
>> Haiying Shen: No, it does not do verification on purpose. For example, when I
receive the request, if I'm still in the process of polling, then I just wait; I hold the
request until I get the updated file. But if I'm sure this is the updated file so far, then I
just respond.
>>: So even if this is nothing but true --
>> Haiying Shen: Updated file. Yes.
>>: [inaudible] version, right?
>> Haiying Shen: Yes.
>>: Okay. So in that case you can still have some inconsistency.
>> Haiying Shen: Right. Right. Yes. Relatively. I will introduce the details later.
Okay. Next, polling can enhance the fidelity of consistency guarantees for query results.
That's what Jin mentioned: when a node is in the process of polling and a request comes,
the node will hold the request, and after it gets the up-to-date file, it will
respond to the requester.
Third, it enhances the consistency guarantee under churn. Instead of passively waiting for
updates, which may lead to update failures due to node failures or departures, a node
can actively poll to get timely updates.
For example, in a structure, if a node leaves, its children cannot get the update
successfully. But with polling, a node just polls the file owner for updates, so there are no
unsuccessful updates. Yes.
>>: So but that assumes that there is an owner for every file and this owner is
[inaudible].
>> Haiying Shen: Right.
>>: So how do you -- [inaudible] and you can just put the file into the P2P system;
usually you use replicas to guarantee availability. In your case it practically means that if
the owner is down, the whole system is down, right?
>> Haiying Shen: Right. Yes. Actually, that is a question I was thinking about on my
way here. So actually, if there's no replica -- if there's no replica and the server is down,
the file is not available anymore, right?
>>: If there is a replica -- oh --
>> Haiying Shen: Yeah, but there is no replica. So, yes, that is really a very good
question.
So in our work, we just assume, you know, that the replicas help the server to serve the
clients. But if the server is down, all replicas are invalid. We assume all the replicas are
down, they are not available, and the original file is gone.
But, you know, this problem actually goes back to node failure in P2P, right?
P2P can handle node departures, but P2P cannot handle node failures.
If a node fails, then all its files are lost, right?
Okay. So, yes, this problem I think should actually be addressed by P2P itself, by the
traditional research work, rather than by replication and consistency
maintenance. Good point.
Okay. Let's go on. Polling also helps to guarantee timely updates for all replicas. In
traditional update propagation, the update may be forwarded by a number of relay nodes
before it arrives at the replica node. But with polling, the transmission is directly between
the file server and the replica node, so there's no relay delay. Yes.
>>: So what's the reason to have relay nodes in general in P2P? My understanding
was that you use relay nodes because you usually cannot reach other nodes directly, and
so you create the tree.
>> Haiying Shen: Right.
>>: So there is -- in some situations you just want to have a direct route from the node to
the [inaudible].
>> Haiying Shen: The reason is to conduct the consistency maintenance in a distributed
manner. For example, here: this is a structure. The replica update is forwarded, or
propagated, among the parents first, and then it is further forwarded along the
tree. Or this structure. The reason to build a structure for propagation is to do
the propagation in a decentralized way.
>>: But then that assumes that -- but it doesn't guarantee that there is a direct route --
a direct way to connect to any node, right?
>> Haiying Shen: It's easy.
>>: In your case [inaudible] any node can talk to any node, right? And in P2P usually
you don't do that. Because then you check for updates directly, right? You check
through the [inaudible].
>> Haiying Shen: No.
>>: I think here there are basically different philosophies. So, I mean, on the Internet --
actually, any two nodes can try to establish a connection. I mean, not actually any two
nodes can establish a connection, because, yes, [inaudible] was basically
[inaudible]. And establishing a connection is actually an expensive operation, even if the
two nodes, I mean, can basically do [inaudible] traversal operations. It takes time
because, I mean, basically [inaudible] you need to go through.
In this space, there are two [inaudible]: one is the topology view, which I don't see
covered here. So it's basically like this -- I mean, let's say we have a huge popular social
network, like Facebook or something, and we all participate. From time to time I want
content. So one [inaudible] basically potential operation is that each time I want new
content from [inaudible].
So when I, let's say, look at this set of photos, my node is going to join your cluster, open
established connections among themselves, and try to pull [inaudible].
The work discussed here assumes a more stable network. So it's like, for example, when
you boot your machine, the machine just joins a big cloud, a [inaudible] or something;
then, I mean, you basically request content over an overlay route in this cloud.
This way you do not pay the connection-establishment cost for each peer connection, but
you do incur routing cost because the content is going to be routed along the way.
And I would argue that a better solution is probably a combination of the two.
Sometimes you need to evolve the topology, and sometimes you probably need to do
routing.
>>: Yeah. But my only question is, though, you [inaudible] because you practically say
that to poll from this you need to [inaudible].
>> Haiying Shen: Right.
>>: [inaudible] to that file will actually go to the owner --
>> Haiying Shen: Right.
>>: -- and do direct connection to the owner.
>> Haiying Shen: Yes. Again, good point. This is the work so far, and later I'll
introduce a structure that can conduct the consistency maintenance in a decentralized,
distributed way. It is a good point. Okay.
Because if all nodes polled the server, the server would become a hotspot; it would be
overloaded. And also, I agree with you, it is against the P2P principle of distribution, of
operating in a decentralized manner.
All right. Let's go on. TTR is the time to refresh: the nodes poll for updates every TTR.
So the question is how to determine TTR so that polling is conducted at approximately
the same rate as the file update rate. We don't want to specify a fixed rate, because if it's
too fast, it wastes resources, and if it's too slow, it cannot guarantee the files are
up-to-date.
So how do we determine it? This is very interesting. IRM relies on a linear-increase
multiplicative-decrease algorithm.
Now, a file's maximum update rate is one over delta T, and TTR is initially set to delta T.
When a replica node polls the file owner, if the file has not changed between
successive polls, the new TTR equals the old TTR plus alpha; otherwise, it equals the old
TTR divided by beta.
Now, in order to keep TTR within a certain range, we specify TTR-min and TTR-max
and use this formula to clamp TTR to that range, so that polling is
conducted at approximately the same rate as the file updates.
As a result, this consistency maintenance algorithm can avoid unsuccessful update in
dynamism with less overhead.
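The linear-increase multiplicative-decrease rule, with the clamp to [TTR-min, TTR-max], can be sketched as one step function. The symbols alpha, beta, and the bounds are from the talk; the concrete default values are assumptions.

```python
# One TTR adaptation step: grow TTR slowly while the file is stable,
# shrink it quickly when an update is observed, then clamp.

def next_ttr(ttr, file_changed, alpha=1.0, beta=2.0,
             ttr_min=1.0, ttr_max=60.0):
    ttr = ttr / beta if file_changed else ttr + alpha
    return max(ttr_min, min(ttr_max, ttr))

print(next_ttr(10.0, file_changed=False))  # 11.0 (stable: poll less often)
print(next_ttr(10.0, file_changed=True))   # 5.0  (changed: poll more often)
print(next_ttr(60.0, file_changed=False))  # 60.0 (clamped at TTR_max)
```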
Okay. Another observation is that when the query frequency is less than the polling
frequency, it is a waste of overhead to poll the file's owner that often. I have mentioned
this several times.
Now, let's see an example of why. If a file replica node polls twice per second but the
replica is requested only once per second, it does not have to update twice per second; it
only needs to update once per second, because our goal is to ensure the file is up-to-date
when it is requested, not to guarantee that the file is always up-to-date.
So we define that if the TTR is smaller than the file query interval, it is set equal
to the file query interval.
Do I make myself clear here? This is a very important observation, because no previous
work deals with this problem; this is how the polling overhead can be reduced.
>>: That assumes that the files change periodically at a given frequency, [inaudible].
>> Haiying Shen: No. Unfortunately, no. But we really do want to, you know, ground
the observation in the real world, in real applications.
Okay. So now let's go back to your question. Yes, if all nodes poll the file server,
the server will be overloaded, and it's also against the P2P principle.
All right. So we developed GeWave for file consistency maintenance. It has two goals. One
is to enable propagation to be conducted among geographically close nodes; in the P2P area
we call this topology aware, locality aware, or proximity aware.
And the second is to operate in a decentralized way.
Now, before I introduce GeWave, let me introduce the Hilbert number. We have a
number of landmark nodes. We measure the distances between each node and the
landmark nodes, and each node then has a landmark vector.
So physically close nodes have similar landmark vectors. And then we use the Hilbert
curve to transform the d-dimensional vector into a one-dimensional ID.
And we can see that if, here in the d-dimensional space, U is close to U prime but
farther away from V, then after the d-dimensional space is transformed into the
one-dimensional space, U is still close to U prime but farther away from V.
So that's the feature of a Hilbert curve: it preserves the closeness of nodes through the
dimension reduction, so similar landmark vectors are transformed into close
Hilbert numbers.
And then, since physically close nodes have similar landmark vectors, they will
have similar Hilbert numbers. If they have the same Hilbert number, they are
physically close; if they have totally different Hilbert numbers, they are physically far
away from each other.
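The Hilbert mapping can be illustrated with the classic two-dimensional coordinate-to-index conversion below (real landmark vectors are d-dimensional; two dimensions keep the sketch short, and `n` is the grid side, a power of two).

```python
# 2-D sketch of the Hilbert mapping: convert cell (x, y) on an n-by-n
# grid to its distance along the Hilbert curve. Nearby cells tend to
# get nearby indices, which is the locality property GeWave exploits.

def xy2d(n, x, y):
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate quadrant to canonical form
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Nearby points get nearby Hilbert numbers; far points get far numbers.
print(xy2d(4, 0, 0), xy2d(4, 0, 1), xy2d(4, 3, 3))  # 0 3 10
```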
Okay. So this is GeWave. The GeWave tree is built dynamically based on node
geographical location and polling rate. We can see that it takes the file
owner as the tree root and organizes nodes into different levels based on their polling
rates; nodes in the upper levels have higher polling frequencies than those in the lower
levels.
So this ensures that the parents always have the up-to-date file earlier than the children,
and the children then poll their parents for updates.
And also those nodes are physically close nodes. DEF, DEF, DEFG. Okay. So they
have the same Hilbert number, 5. So the GeWave connects the geographically close
nodes together.
And an update is propagated in the fashion of top-down wave between geographically
close nodes.
Okay. So it is updated in this way: because the upper-level nodes have a higher polling
frequency than the lower-level nodes, the children can poll their parents for updates. And
we can also see that the polling is between physically close nodes, so the overhead is
reduced because the transmission covers only a short distance.
Therefore, the GeWave achieves the two goals. One is the propagation or polling is
conducted in a decentralized manner, and also the polling is conducted between
physically close nodes.
Any questions?
Okay. Basically, here the children -- the nodes in level four -- poll the nodes in
level three, and it proceeds from the top to the bottom: nodes in level one poll the file
owner, nodes in level two poll nodes in level one, and nodes in level three poll nodes in
level two. Okay. And physically close nodes poll each other.
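A heavily simplified layout sketch of that structure follows. It is an assumption-level simplification: replica nodes are grouped by Hilbert number (geographic closeness) and each group is ordered by TTR, so faster-polling nodes sit in upper levels and each node polls its predecessor. The real tree also hangs the groups under the file owner and manages fan-out, which is omitted here.

```python
# Sketch of GeWave's layout: group replicas by Hilbert number, then
# order each group by ascending TTR. List position is the level; each
# node polls the one before it, and the first node polls the file owner.

from collections import defaultdict

def build_gewave(replicas):
    """replicas: iterable of (node_id, ttr, hilbert_number)."""
    groups = defaultdict(list)
    for node, ttr, h in replicas:
        groups[h].append((ttr, node))
    return {h: [n for _, n in sorted(members)]
            for h, members in groups.items()}

print(build_gewave([("a", 2.0, 5), ("b", 1.0, 5), ("c", 3.0, 7)]))
# {5: ['b', 'a'], 7: ['c']}
```

So in group 5, node "b" (smallest TTR, freshest file) polls the owner and node "a" polls "b", keeping the polling traffic between geographically close nodes.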
Now, let's see the GeWave structure maintenance. The file owner actually coordinates
the nodes: it collects information, including the TTR and Hilbert number of its replica
nodes, and then it places each replica node in the tree.
For replica node departure, the node notifies its neighbors -- its predecessor and its
successor -- that it is leaving, and the leaving node transfers its children to its neighbors.
For example, if node 2 leaves the tree, it transfers its children to its neighbor here.
Okay. And then replica node failure. Failure is easy to handle: if a node polls its parent
but does not get a response within a certain time, it just falls back to the original method
and polls the file owner directly. That's why the scheme is resilient to dynamism.
Replica node mobility. A node's TTR may change. When the TTR is increased, the node
traverses down the tree until it reaches its level and then traverses in the horizontal
direction, because all those values are ordered in ascending order. Okay. So
that is replica node mobility.
>>: Can this structure deal with a change in the polling pattern of a file? Let's say I'm
currently interested in this particular [inaudible]. After a while I'm not interested in it
anymore, so I stop requesting that content. How does this structure
[inaudible]?
>> Haiying Shen: All those nodes are replica nodes.
>>: Okay. And how do -- I mean, but the -- I mean, the replicas nodes they are
requesting is determined by the [inaudible], right?
>> Haiying Shen: Right. Right.
>>: What I'm asking is that if the unit request frequency changes, then let's say maybe in
this period of time [inaudible]. Later, I mean, I'm not interested [inaudible] that changes.
Is this structure [inaudible]?
>> Haiying Shen: If the user is interested -- yes. When a replica is underutilized, that is,
if the user is not interested in the file any longer, the replica will be removed, right?
>>: Okay.
>> Haiying Shen: In order to guarantee that every replica has a high utilization. So that
is replica node departure.
>>: Departure. Okay. So in that case the replica node will depart --
>> Haiying Shen: Right.
>>: -- from that [inaudible].
>> Haiying Shen: Yes.
>>: What if a node has an increase in terms of [inaudible]?
>> Haiying Shen: Then there will be more replica nodes. And then this is replica node
creation.
>>: Okay. Let's say I already have a replica node, and in the past that frequency was
[inaudible]. Later the frequency at that node increases. How is that going to -- is that
node going to move up?
>> Haiying Shen: If the frequency increases -- we have a threshold, so if the frequency is
higher than the threshold... right now there is only one replica, but the other nodes will
definitely also have higher request frequencies, so the other nodes will also get replicas.
>>: I notice in this -- because, I mean, [inaudible] you're starting to change topology.
>> Haiying Shen: [inaudible] another topology.
>>: You said you're starting to build another topology based on usage patterns. When
will you start to build a topology? When will you just use the existing topology
[inaudible]?
>> Haiying Shen: When? We start to build the topology when the polling starts. So as
soon as there is a node -- even if there's only one node -- this topology can be built. It's
like P2P: initially there's only one node, and then when the second node joins the system,
they're connected; when the third node joins the system, they are connected. Finally
they become maybe --
>>: So what's the use case? I don't understand that. Because for me these systems
usually have a large number of files.
>> Haiying Shen: Right.
>>: And these files are usually -- there is not very good figurative [inaudible] of those
files. And it looks like practically if I have a million files on my hard drive [inaudible]
which I'm sharing and I have people requesting them, it will mean that I will have
high [inaudible].
And some of this -- and my nodes can [inaudible] servers just because the topology is
very different. And also I just don't -- what's the use case for requesting the same file
[inaudible]? It looks more like a messaging system with a limited number of topics.
>> Haiying Shen: Okay.
>>: A poll-based messaging system rather than file storage or something like that.
>> Haiying Shen: Okay. So it's true, because for most files -- most files, the file
hardly changes. But currently there are more and more applications that need consistency
maintenance. For example, a shared calendar. I change my calendar,
some others change their calendars, we need to see each other's changes.
>>: [inaudible] the messaging system, the queuing system.
>> Haiying Shen: Yes. Messaging, that is. Or, you know --
>>: [inaudible] queuing system, but it's not a file sharing system, it's more like -- I mean,
maybe it's the right thing. I just mean that the use case is not file sharing [inaudible].
>> Haiying Shen: The use case can be applied to applications that have frequently
updated contents.
>>: I see. A limited number of those files.
>> Haiying Shen: Limited. Yes, yes. Well, banking, stock market.
>>: Small number of [inaudible].
>>: I think in the real cases, what you have is a large number of files, and only a small
amount of them change.
>>: So then [inaudible]. So my point is that it's a -- a different use case than file
sharing. It's more like messaging system and you have frequent updates with small
number of data content, right?
>> Haiying Shen: We can also apply it to file sharing. Why can we not?
>>: You need to deal with those polled files. You need to deal with those files.
>>: [inaudible] an example. You have 100k files, right, and [inaudible] I don't know,
100k node [inaudible] 1000 node [inaudible] 10k files and they're sharing these files
[inaudible]. I just -- this will create practically a mess because it will [inaudible] very
dynamic structure [inaudible].
>> Haiying Shen: Right.
>>: So your topology is per file and not -- I don't understand how you [inaudible]
function in real life.
>> Haiying Shen: It's true. But, you know, as the keynote this morning mentioned, there's
a conflict between scalability and consistency maintenance.
In order to achieve high scalability, we need to resort to such a structure. We cannot
use broadcast messages or gossip spreading.
But, on the other hand, you're right. Maintaining, you know, such a structure for each
file may lead to high overhead. But there is a tradeoff.
>>: [inaudible] consistency can be sought in different ways. A way I see consistency
solved is that you do not guarantee the consistency; you guarantee the latest version.
>> Haiying Shen: Right, right.
>>: [inaudible] make sure, I mean, basically [inaudible] you have a way to resolve
conflict. So, I mean, Live Mesh is basically such a system. You -- basically each
file has attached to it a version that basically is the change history of those files.
Now, I mean, the server can go down. And you can have replicas which are older
replicas. That means when I retrieve a file, I may retrieve an older version of the file, but
that's fine anyway.
The problem is that when later that server comes up again and rejoins the system, the
system will figure out which one is later and deliver the newer version. And that way
you can solve consistency to some degree. You can still have inconsistency, basically
branches. [inaudible] the server, I mean, can go off and we each make our modification.
>>: But the inconsistency is already solved. The doctor says if file owner is down, we
don't serve the file.
[multiple people speaking at once]
>>: It's a way to solve it. It's expensive. It's not a good experience. That means when
the server is down, that file becomes unavailable. I mean, usually we're arguing you
want more availability [inaudible] consistency.
>>: Yes. But --
>>: [inaudible] you and I co-edit a document. You want the document to be editable.
Not to say, okay, when you, the owner, are gone, well, I will not be able to edit the
document.
>>: Then we go into [inaudible].
>> Haiying Shen: Okay.
>>: [inaudible] but OneNote does that. OneNote allows disconnected editing of a shared
document. Because it basically -- if you look at the document structure on disk, it's a
change history with branches. And they have an automated mechanism [inaudible].
>>: Basically [inaudible] the branch.
>>: [inaudible]
>> Haiying Shen: So. Yeah. The -- actually the application depends on the system.
Some applications need very strict consistency maintenance, like the one presented this morning.
He gave us an example: like an airplane choosing its turn. You know, if there's no
consistency, this system says, okay, there's no plane, go up, and that system says there's no
plane, go down.
>>: [inaudible] but what we [inaudible] I think nowadays there are more and more
applications that [inaudible] consistency.
It is true if you have systems [inaudible] but those examples serve basically pretty well
[inaudible]. So basically -- or the structure is in one place. Every time you want
[inaudible] go into that particular server.
>> Haiying Shen: Yes. Yes.
>>: Is that server okay?
>> Haiying Shen: Right.
>>: I mean, in the majority of cases when we're dealing with scalability, we need to have
some consistency. I mean, think about Facebook, basically, MySpace [inaudible]
applications where you have a huge amount of data [inaudible] consistency, how are you
to survive without consistency?
>> Haiying Shen: Yeah. The client-server model sometimes works much better than
maybe decentralized.
>>: If you [inaudible].
>> Haiying Shen: Like Google search. But then let's -- we should discuss the problem.
Does P2P really, you know, have -- does P2P really have more advantages than the
client-server model?
So as P2P researchers, we always say, okay, P2P is decentralized, has high scalability, it's
better than client-server. But is it really true --
>>: It is. I'm not arguing peer-to-peer is good for every application or every single
case, it's good in scenarios where you demand scalability. I mean, if your demand is to have
some consistency, airline [inaudible] don't bother [inaudible].
>> Haiying Shen: [inaudible]
>>: [inaudible]
>> Haiying Shen: Okay. Yeah, we can discuss it offline. Because, you know, I'm
always in favor of peer-to-peer. I think when there are a lot of users, a lot of nodes, it's
better to use peer-to-peer.
>>: I would say peer-to-peer is not for every application. It's not [inaudible].
>> Haiying Shen: Okay. All right. So let's go on. Let me present the performance
evaluation.
So we compared IRM with three categories of related work: ServerSide, ClientSide and
Path. Recall that ServerSide replicates files near the server, ClientSide near
the client, and Path along the nodes in the path.
So let's see the performance. Here this metric is replica hit rate. We hope that the
replication method can have a higher hit rate, so there are fewer underutilized replicas.
So from the figure we can see that Path has higher hit rate than IRM. And then IRM has
higher hit rate than ServerSide and ClientSide.
It's understandable: these three methods have the same number of replicas because
every time a server is overloaded, it creates a replica. But Path replicates the file at all
nodes along the routing path, so it has many more replicas. That's why it has a much
higher replica hit rate.
Now, this is node utilization. I didn't introduce the load balancing idea in my
presentation. Basically, when IRM generates replicas it considers nodes' available
capacity, but other methods do not. That's why IRM can always restrict the node
utilization around the [inaudible], but others, like ClientSide, have high
node utilization. Some nodes may be overloaded.
So, overhead of file replication. The Y axis is the number of file replicas. It's very easy
to understand this figure, because Path replicates at all the nodes in a routing path. That's
why it has a lot of replicas. But --
>>: [inaudible]
>> Haiying Shen: Could you please say it again?
>>: This Path, is it the one which actively pushes the updates?
>> Haiying Shen: No, no, no. Here we just compare the file replication -- file replication
methods. Those three methods are all file replication methods. That's why I said previous
work addresses file replication and consistency separately. So we hope, you know, to
combine them in a coordinated manner.
So the reason IRM generates a smaller number of file replicas is that it removes
underutilized replicas and keeps only highly utilized replicas. Okay. So it has a small
number of file replicas.
And we also compared IRM with and without adaptation. From the left figure we can see
that IRM with and without adaptation have almost the same average routing path
length and replica hit rate. But IRM without adaptation has less -- has more
replicas than with adaptation. That means adaptation is good; it helps to reduce the
number of replicas.
All right. We also measured the average path length in churn. And we can see that
IRM has a slightly longer path length than Path. We all know the reason, right?
Path has many more replicas. And ServerSide and ClientSide lead to significantly
longer path lengths.
The reason ClientSide has longer path lengths is that if the requester is down, the
request has to be forwarded using the original routing algorithm.
Now, let's see the maintenance. So we compare the GeWave with SCOPE, UMPT and
Push/Poll. Those are the experiment settings.
This figure shows the proximity-aware performance. The X axis means the physical
distance by hops and the Y axis means the CDF of percent of messages.
So from this figure we can see that GeWave and UMPT can always transmit almost 100
percent of messages within 10 hops.
But in Push/Poll and SCOPE, only 30 percent of messages are transmitted within 10
hops. So this means in the first two methods the messages are transmitted between
physically close nodes, but the other two methods don't consider
proximity. That's why those two methods have low efficiency.
Overhead. Sorry, the letters may not be clear. The Y axis is the average number of
messages, and the X axis means the number of replica nodes. From this figure we can
see that GeWave leads to much less overhead than the others.
This number of messages also includes the messages for structure maintenance.
GeWave does not need structure maintenance, but these two methods do.
And also Push/Poll. It uses message spreading, right, and thus generates a lot
more messages than the others. So from this figure we can conclude that GeWave incurs
much lower cost than the other methods.
Communication cost in churn. We define the communication cost as the product of the
message size and the physical path length. And, again, GeWave incurs much lower
communication cost than the others.
There are two reasons: first, the messages propagate between physically close
nodes; second, it has fewer messages than the others.
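The communication-cost metric just defined -- message size times physical path length, summed over all messages -- is simple to state directly. A minimal sketch; the function name and the example numbers are illustrative assumptions, not measurements from the talk:

```python
# Sketch of the communication-cost metric: each message costs its size
# multiplied by the physical path length it travels; total cost is the sum.
def communication_cost(messages):
    """messages: iterable of (size_bytes, physical_hops) tuples."""
    return sum(size * hops for size, hops in messages)

# A proximity-aware scheme (short physical paths) beats a proximity-unaware
# one even when both send the same number of equally sized messages:
proximity_aware = [(100, 2), (100, 3)]     # messages travel few physical hops
proximity_unaware = [(100, 12), (100, 15)]
assert communication_cost(proximity_aware) < communication_cost(proximity_unaware)
```

This captures why GeWave wins on this metric by both factors at once: shorter physical paths and fewer messages.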
Effectiveness of consistency maintenance. We hope that with a consistency
maintenance mechanism a requester can always get the up-to-date file. So this Y axis
means the number of up-to-date files received.
Now, from this figure we can see that with GeWave and Push/Poll requesters can always
get the updated files. But SCOPE and UMPT cannot guarantee that the
requester receives up-to-date files.
Why? Because they are based on a structure. So in the structure, if a node fails or
departs, before the structure is fixed, its children cannot get the updates successfully.
Push/Poll uses broadcasting, so it is more resilient to dynamism.
All right. Conclusions. So file replication and consistency maintenance are intimately
connected. And the previous work just addressed these two issues separately. So we
propose integrated file replication and consistency maintenance.
Basically nodes actively determine if they needed to replicate a file or not and they
actively poll for the update. And also we propose GeWave to conduct the consistency
maintenance in a decentralized manner.
So briefly, they have two main features: one is high effectiveness, the second is high
efficiency. Efficiency means low cost; effectiveness means they guarantee the file
consistency and the file query success. Efficiency can be improved.
Thank you. Any questions and comments? Yes.
>>: So you're saying that [inaudible] model is more efficient than [inaudible] model,
right?
>> Haiying Shen: Yes.
>>: But that assumes [inaudible] very small number of files. In your example
[inaudible] but if you have scenario that like if you [inaudible] files or whatever, this is
information, and you need to poll for every [inaudible] change them out, even if you don't
change any of those, you will have [inaudible], correct?
>> Haiying Shen: Push. Every time when a file is updated --
>>: [inaudible]
>> Haiying Shen: Huh?
>>: If files don't --
>> Haiying Shen: Right. Right.
>>: For example, you have [inaudible] files but update one file per second.
>> Haiying Shen: Okay.
>>: [inaudible] message per second. You could have two million files. If you don't
update daily, you still have a million messages [inaudible] update frequency.
>> Haiying Shen: Good question. And personally, actually, I believe, you know, push and
poll have their own advantages and disadvantages. They're just
suitable in different kinds of situations.
But in this method -- let me see. Okay. Here. It's not that the file is always pushed --
polled for updates. Using this algorithm, the file owner will be polled at
approximately the same rate as the file update rate. So if the file --
>>: Usually you don't know that update rate.
>> Haiying Shen: This is the feature of that algorithm. But it's not 100 percent accurate,
I believe. Approximately. That's why I put approximately. So in the example you
described, if there's no update, there's no polling.
>>: But if you ask for the file, you still need to do [inaudible].
>> Haiying Shen: If I ask --
>>: Not the update rate, but like the client request rate. For example, I want to see the new
content of the file every second. So you -- as a client. Will you poll the owner every second?
>> Haiying Shen: Could you please --
>>: So for each client, it wants to see the file. So you said there is a rate at which a client
requests the file.
>> Haiying Shen: Yes.
>>: So is it related to update rate? Or is it how you [inaudible]?
>> Haiying Shen: When a client wants a file, it just sends in the request, right? And then the
request will be routed based on the P2P routing algorithm. Why is it related to [inaudible]?
>>: Yeah. Because how do you know if [inaudible] client requested the file. I don't download
it. I have it in the local cache.
>> Haiying Shen: Right.
>>: So next time I ask for it in a second.
>> Haiying Shen: Yes.
>>: What happens in your case? Do you go and recheck with the owner if it's current?
Or do you just return the copy?
>> Haiying Shen: Okay. I see. I got it. So the client has a cache -- I think this is not
related to this work. If there's a replication algorithm, say the client replicates on
purpose, then later on when the client wants the file, it can [inaudible] the previously
retrieved file. Right?
But if it's -- it's the normal case. Without file replication, I just file a request, I get the
file, and it's usually stored in the cache, right? So whether it will get the file from the
cache or not is an operating system problem.
>>: Cache, I mean [inaudible].
>> Haiying Shen: Okay. Okay. Okay. Then it needs to poll the file owner for the
update.
>>: So if you have a large number of clients, do they all really have to constantly poll?
>> Haiying Shen: Right. Right. The file --
>>: In a push model -- in a push model they will always get the local [inaudible] and only poll
the server, and they get [inaudible] when they know the update happened, right?
>>: [inaudible] you really don't know the file update rate. It's really -- I mean -- it's not a
deterministic process. Sometimes the owner updates the file frequently, sometimes he will
just leave it there.
Think about basically when you work on a file: sometimes you basically make
revisions quite frequently, I mean, then maybe go to sleep, you leave the file there, I
mean, you don't update at all, and then you start to update again.
So, I mean, this process in the poll model, the polling -- the update rate is actually not
known until you basically -- somehow the server somehow tells the client how frequently
[inaudible].
>>: If the update rate is very low, then the chance of getting [inaudible] I mean
[inaudible].
>> Jin Li: I think we can move the discussion, I mean, to basically a more -- we can
move to a conference room rather than use this room, basically for more friendly discussions.
Let's thank Professor Shen for the interesting talk.
>> Haiying Shen: Thank you very much.
[applause]