>> Jin Li: Hello. It's my great pleasure to welcome Haiying Shen to Microsoft to give a talk. Ms. Shen got her Ph.D. degree from Wayne State University in 2006. Currently she is an assistant professor at Clemson University, focusing on peer-to-peer systems, distributed computing, and their various solutions. She has published more than 60 research papers in top journals and conferences and has served on the program committees of many international conferences. Without further ado, let's hear what Professor Shen has to say about efficient and effective file replication and consistency maintenance in P2P systems.
>> Haiying Shen: Thank you, Jin. Good afternoon, everyone. It's my great pleasure to give a presentation at Microsoft. I would like to thank Jin for the invitation. My name is Haiying Shen. I'm an assistant professor at Clemson University in the ECE department. The topic of my presentation is efficient and effective file replication and consistency maintenance in P2P systems.
First I would like to briefly introduce my research areas. My research is mainly focused on distributed and parallel computer networks and systems, including P2P and content delivery networks, high-performance grid computing, mobile computing, and so on. I have been conducting research in peer-to-peer systems since I was a Ph.D. student in 2002. In this area, I have been working on topologies for scalable lookups, load balancing, congestion control, file replication and consistency maintenance, and so on. Recently I have been working on video streaming. In mobile computing, I have been working on routing, reputation systems, energy efficiency, social networks, and so on. This presentation is based on two publications in IEEE Transactions on Parallel and Distributed Systems in 2009.
Now, let's see the outline. First is the introduction, and then I will introduce the background and related work on file replication and consistency maintenance. After that I'll present our methods, including IRM, integrated file replication and consistency maintenance, and GeWave, geographically aware wave for file consistency maintenance. Then I'll present our evaluation results, and finally I conclude my presentation.
All right. In contrast to the client/server architecture, peer-to-peer is a completely decentralized system. It's well known for its high scalability, reliability, dynamism resilience, self-organization, and so on. BitTorrent and Overnet are peer-to-peer applications that are widely used nowadays. This talk is focused on structured peer-to-peer systems, but the methods can also be applied to unstructured P2P.
Now, in a P2P file sharing system, every node has a routing table recording its neighbors. When a node requests a file, the request is forwarded to the file's owner, and then the file is sent back to the requester. In P2P file sharing systems, file access is highly repetitive and skewed towards the most popular files. So if a node becomes a hotspot, there will be delayed responses. File replication is one solution to this problem. It replicates a file to some other nodes in order to distribute the query load among a number of nodes and avoid hotspots, so that file query efficiency is enhanced. File replication in turn needs consistency maintenance in order to keep the consistency between a file and its replicas. For example, if a file is changed,
all its replicas should be updated correspondingly. For example, in eBay transactions, in shared calendars, in banking, in flight control systems, we all need consistency maintenance to maintain the consistency between the replicas and the file. Now, Randy is a principal architect at eBay, and he proposed eBay's five commandments. One of the five commandments is to embrace inconsistency: 99 percent of eBay's operations tolerate inconsistency, and only 1 percent of transactions require consistency. This time I came to Seattle to attend the P2P conference, and the keynote speaker this morning, Dr. Ken Birman from Cornell University, mentioned that today's embrace of inconsistency has given us scalable services that we just cannot trust. So he emphasized that consistency maintenance is really very important in our daily life and in the realm of business.
Next let me introduce the related work on file replication. Previous work on file replication can be classified into three categories. The first method replicates a file at nodes close to the file owner. The second method replicates a file at the requester. And the third method replicates a file at the nodes along the query path. But maybe the requester no longer requests the file later on, yet the replica is still there; maybe the nodes along the path will not receive requests later on. So the first drawback of the previous methods is that they cannot adapt to time-varying replica utilization. A replica may be heavily utilized now but only lightly utilized later on, and these methods do not adaptively adjust to the time-varying utilization. Second, it's difficult to ensure that all replicas are fully utilized. For example, a replica at a requester is not shared by other nodes; only when the requester requests the file again later can it get the file from itself. Third, those methods generate high overhead for unnecessary replicas and their consistency maintenance, because the more replicas, the higher the overhead for consistency maintenance. Right?
So let's see some representative works on file replication. This work replicates a file to nodes close to the file owner. The advantage of this method is that it can enhance the replica hit rate, because when requests are forwarded towards the server, they always encounter the replica nodes around the server; it can also enhance lookup efficiency. But the disadvantage is that it produces overloaded nodes: the replica nodes are all around the server, so there are only a few candidates, and those nodes may be overloaded. Also, it cannot significantly improve query efficiency. Why? Because a request encounters the replica nodes only when it is already close to the server, so it cannot significantly reduce the path length.
Another method replicates a file at the requesters. The advantage is that it can enhance lookup efficiency, because when the requester requests the file again it can get the file from itself. However, it has a low replica hit rate, because the replica is not shared by others; it can only be used by the requester itself.
Another method replicates a file at the nodes along the query path. Of course it can enhance the replica hit rate, because there are a lot more replicas along the path, and it can also significantly enhance lookup efficiency.
The drawback is that it creates significantly more replicas, so it generates high overhead.
Now, let's see the related work on consistency maintenance. Previous work on consistency maintenance can be classified into two groups: structure-based methods and message-spreading methods. In structure-based methods, all nodes constitute a structure, such as a hierarchical structure or a tree structure. But we know that P2P is characterized by churn: nodes join and leave continuously and frequently. So for structure-based methods, the dynamism leads to unsuccessful update propagation; that means in a tree, if a node fails, all of its children cannot get the update successfully. The dynamism also leads to high structure maintenance overhead; that is, there is extra overhead for maintaining the structure. Message-spreading methods spread messages by, for example, broadcasting, gossip, or random walk. Of course they lead to tremendously redundant messages, and one node may receive more than one copy of an update. They also cannot guarantee that every replica node gets an update. So those are the drawbacks of the two kinds of methods.
Basically, the drawbacks of previous work are that they cannot guarantee successful updates and they generate high overhead. The essential problem is that passively accepting update messages makes it difficult to avoid unnecessary updates. Why? Because the replica nodes are just waiting for updates; they cannot control whether they should ask for an update or when to ask for it. That's why they have those drawbacks. We also observe that for some files the query frequency is less than the update frequency: a file may be updated very frequently but requested less frequently. Previous work tried to guarantee that every time a file is changed there is an update. But we loosen that strict requirement: we only want to guarantee that when a requester receives a file, it is the up-to-date file, no matter how many updates have been executed.
All right. Now, let's see representative work. This work was published in the 2008 IEEE Transactions on Parallel and Distributed Systems. They build a hierarchical structure: high-capacity nodes and the physically close nodes under them constitute a tree, so a replica update is forwarded along the high-capacity nodes and then further forwarded along the tree. This is another structure-based work, proposed in 2005. Basically they build a tree over all the nodes in the system, and every node has an index indicating whether its children have the replica or not. If its children don't have the replica, the propagation stops there; otherwise the propagation is further forwarded to its children. Now, this work is a message-spreading method, push/pull. Normally nodes use rumor spreading for updates, and only when a new node joins the system does it poll the replica nodes for the update.
So we found that there is an interdependency between file replication and consistency maintenance. File replication needs to minimize the number of replicas in order to minimize the overhead of consistency maintenance. On the other hand, consistency maintenance needs to help file replication keep the consistency between a file and its replicas under dynamism.
So integrating these two techniques can enhance their mutual interactions, avoid their conflicting behaviors, and ensure that the two techniques are exploited to their fullest capacities. Therefore, we propose IRM, Integrated File Replication and Consistency Maintenance, which handles the two in a harmonic and coordinated manner. Previous work addresses these two issues separately; that's why we propose a method that combines them in a coordinated manner. The features of IRM are high effectiveness and low cost. Basically, each node in the system actively decides whether to create or delete a replica and actively polls for updates rather than passively waiting for them. A node that replicates highly queried files polls at a high frequency for frequently updated and queried files. So IRM avoids unnecessary file replications and updates by dynamically adapting to time-varying file query and update rates, and thereby improves the replica hit rate, replica utilization, file query efficiency, and consistency fidelity.
All right. Now, let's see the issues addressed in IRM. In file replication: where to replicate files so that file queries can be significantly expedited and the replicas can be fully utilized? We don't want to see underutilized replicas, because they are a waste of resources. And how to remove underutilized file replicas so that the overhead for consistency maintenance is minimized? In consistency maintenance, there are also two issues. One is how to determine the frequency at which a replica node probes a file owner in order to guarantee timely file updates. If it probes too frequently, it wastes resources; if it probes too infrequently, the file may not be updated in a timely manner. The other issue is how to reduce the number of polling operations to save cost while still providing fidelity of consistency maintenance.
Overview of IRM. In P2P file sharing systems, some nodes have much higher traffic load than others, for three reasons. First, node interests differ, so there is a lot more traffic between interested nodes and the file owner than for nodes not interested in the file. Second, file popularity is nonuniform and time-varying; a file may be popular at one time but not at another. Third, nodes are located in different places and may have different numbers of neighbors in the P2P overlay network.
Now, the overview of IRM. For file replication, we can see that traffic junction nodes carry more traffic load, so IRM replicates files at frequent requesters and at traffic junction nodes in order to improve replica utilization. That is IRM for file replication. For consistency maintenance, replica nodes actively poll the file owner for updates.
Now, let's see the details. For the case when a file requester requests a file, we define the query initiating rate of a file f as the number of queries for f sent by the requester during a unit of time, denoted by Q_f. We set a threshold T_Q on that query initiating rate: when a file requester's Q_f is greater than T_Q, it replicates the file. For the case when a traffic junction node replicates a file, we define the query passing rate of a file f as the number of queries for f received and forwarded by the node during a unit of time, denoted by L_f. We set a threshold T_L: when L_f is greater than T_L, the node attaches a replica request to the file request.
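To make the triggers concrete, here is a minimal sketch in Python of the two threshold checks, together with the exponential-moving-average smoothing of the observed rates that the talk describes shortly; the function names and numeric values are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of IRM's two replication triggers (hypothetical names and
# values; the talk defines Q_f, T_Q, L_f, T_L but not these numbers).

T_Q = 5.0     # threshold on a requester's query initiating rate Q_f
T_L = 20.0    # threshold on a junction node's query passing rate L_f
GAMMA = 0.25  # EMA weight: how much the newest observation counts

def smooth_rate(old_rate, observed, gamma=GAMMA):
    """Exponential moving average of a per-file rate, so one bursty time
    unit does not trigger replica creation/removal fluctuation."""
    return gamma * observed + (1.0 - gamma) * old_rate

def requester_should_replicate(q_f):
    """A requester replicates file f when its smoothed Q_f exceeds T_Q."""
    return q_f > T_Q

def junction_should_request_replica(l_f):
    """A traffic junction node piggybacks a replica request on the file
    request when its smoothed L_f exceeds T_L."""
    return l_f > T_L
```

In this reading, each node would feed its per-file counters through smooth_rate once per time unit and act on the smoothed value, which is what lets the thresholds track time-varying query rates.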
Now, the file owner will send the file to the replica requesters. Let's see the whole process. The junction nodes check whether L_f is greater than T_L; if yes, they include a replica request in the file request. After the server receives the request, it replicates the file to the replica requesters and also sends the file back to the original requester. Then the requester checks its Q_f; if Q_f is greater than T_Q, the requester replicates the file.
Now, another problem is that the server may receive a lot of replica requests. How can the server determine whether it should replicate the file to a requester or not? We define a server as the original file owner or a replica node. A server's query load L is defined as its visit rate during a time period T, and its capacity C is represented by the number of queries it can respond to during T. There is a factor alpha. Server i periodically checks its load L_i. If its load over capacity is greater than alpha, it releases extra load. If it is less than 1 over alpha, it determines whether to replicate the file: if the benefit is greater than the cost, it replicates the file; otherwise it does nothing.
All right. Now, file replicas may become underutilized. In this case, the replica nodes periodically check their load; if the load is less than T_L, they remove the file replica. So IRM ensures that all replicas are fully utilized, and there is no wasted overhead for maintaining the replicas or for their consistency maintenance.
Okay. Next is how to appropriately determine the values of Q_f and L_f. For example, maybe a requester is very interested in a file at this time, loses interest in the next time period, and then becomes interested in the file again, so there would be a fluctuation of replica creation, replica removal, replica creation. In order to avoid the resources wasted by this fluctuation, IRM employs the exponential moving average technique to reasonably determine the file query rate over time T. Basically there is a weighting factor, so it does not discard the old observation but still uses the new observation. That is the formula for determining the query rate, as in the sketch above.
Okay. Now, recall that in IRM replica nodes poll the file owner for updates. Let's see the benefits of polling. First, it can reduce redundant updates: a node does not necessarily need to update its replica after each update. As I mentioned, a file may be updated frequently while requests come less frequently. We just want to make sure the requester receives the updated file, so every time the file is updated, the node does not have to update its replica until there is a request.
>>: Question. When the request actually comes in, do we need to check and verify if the file is of the latest version?
>> Haiying Shen: Yes.
>>: Okay. So when the request actually comes in, you still need to do a verification with the original server?
>> Haiying Shen: No, it does not do verification on purpose. For example, if I receive a request while I'm still in the process of polling, then I just wait; I hold the request until I get the updated file. But if I'm sure this is the updated file so far, then I just respond.
>>: So even if this is nothing but the --
>> Haiying Shen: -- updated file. Yes.
>>: [inaudible] version, right?
>> Haiying Shen: Yes.
>>: Okay. So in that case you can still have some inconsistency.
>> Haiying Shen: Right. Right. Yes. Relatively.
I will introduce the details later. Okay. Next, polling can enhance the fidelity of consistency guarantees for query results. That's what was just mentioned: when a node is in the process of polling and a request comes in, the node holds the request, and after it gets the up-to-date file it responds to the requester. Third, polling enhances the consistency guarantee under churn. Instead of passively waiting for updates, which may lead to update failures due to node failures or departures, a node can actively poll to get a timely update. For example, in a structure, if a node has left, its children cannot get the update successfully. With polling, a node just polls the file owner for the update, so there are no unsuccessful updates. Yes.
>>: So but that assumes that there is an owner for every file and this owner is [inaudible].
>> Haiying Shen: Right.
>>: So how do you -- [inaudible] and you can just put the file into the P2P system; usually you use replicas to guarantee availability. In your case it practically means that if the owner is down, the whole system is down, right?
>> Haiying Shen: Right. Yes. Actually that is a question I was thinking about on the way here. If there's no replica and the server is down, the file is not available anymore, right?
>>: If there is a replica -- oh --
>> Haiying Shen: Yeah, but there is no replica. So, yes, that is really a very good question. In our work, we just assume the replicas help the server to serve the clients. But if the server is down, all replicas are invalid; we assume they are not available, and the original file is gone. But this problem actually goes back to node failure in P2P, right? P2P can handle node [inaudible] node departures, but P2P cannot handle node failures. If a node fails, then all its files are lost, right? So I think this problem should be addressed by P2P itself, by the traditional research work, rather than by replication and consistency maintenance. Good point.
Okay. Let's go on. Polling also helps to guarantee timely updates for all replicas. In traditional update propagation, the update may be forwarded by a number of relay nodes before it arrives at the replica node; in polling, the transmission is directly between the file server and the replica node, so there is no relay delay. Yes.
>>: So what's the reason to have relay nodes in general in P2P? My understanding was that you use relay nodes because you usually cannot reach other nodes directly, and you create the tree.
>> Haiying Shen: Right.
>>: So there is -- in some situations you just want to have a direct route from the node to the [inaudible].
>> Haiying Shen: The reason is to conduct the consistency maintenance in a distributed manner. For example, here, this is a structure. The replica update is forwarded or propagated among the parents first, and then it is further forwarded along the tree. Or this structure. The reason to build a structure for propagation is to do the propagation in a decentralized way.
>>: But then that assumes that -- but it doesn't guarantee that there is a direct route -- a direct way to connect to any node, right?
>> Haiying Shen: It's easy.
>>: In your case [inaudible] any node can talk to any node, right? And in P2P usually you don't do that. Because then you check for updates directly, right?
You check through the [inaudible].
>> Haiying Shen: No.
>>: I think here there are basically different philosophies. On the Internet, actually, any two nodes can try to establish a connection. I mean, not actually any two nodes can establish a connection, because, yes, [inaudible] was basically [inaudible]. And establishing a connection is actually an expensive operation. Even if two nodes can basically do [inaudible] traversal operations, it takes time, because basically [inaudible] you need to go through. In our state, there are two [inaudible]: one is the topology view, which I don't see covered here. So it's basically like the same -- I mean, let's say we have a huge popular social network, like Facebook or something, and we all participate. From time to time I want content. So one [inaudible] basically potential operation is that each time I want new content from [inaudible]. So when I, let's say, look at this set of photos, my node is going to join your cluster, open established connections among themselves, and try to poll [inaudible]. The work discussed here assumes a more stable network. So it's like, for example, when you boot your machine, the machine just joins a big cloud, a [inaudible] or something; then you request content over an overlay route in this cloud. This way you do not have the establishment cost of a peer connection, but you do incur routing cost, because the content is routed along the way. And in our state, I would argue that a better solution is probably a combination of the two: sometimes you need to evolve the topology, and sometimes you probably need to do routing.
>>: Yeah. But my only question is, though, you [inaudible] because you practically say that to poll from this you need to [inaudible].
>> Haiying Shen: Right.
>>: [inaudible] to that file will actually go to the owner --
>> Haiying Shen: Right.
>>: -- and do a direct connection to the owner.
>> Haiying Shen: Yes. Again, good point. This is the work so far, and later I'll introduce a structure that conducts the consistency maintenance in a decentralized, distributed way. It is a good point, because if all nodes polled the server, the server would become a hotspot; it would be overloaded. And I also agree with you that it is against the P2P principle of operating in a distributed, decentralized manner.
All right. Let's go on. TTR is the time to refresh. The nodes poll for updates every TTR. The question is how to determine the TTR so that polling is conducted at approximately the same rate as the file update rate. We don't want to specify a fixed rate, because if it's too fast it wastes resources, and if it's too slow it cannot guarantee the files are up to date. So how to determine it? This is very interesting. IRM relies on a linear-increase multiplicative-decrease algorithm. A file's maximum update rate is 1 over delta t, and the TTR is set to delta t initially. When a replica node polls the file owner, if the file did not change between successive polls, the new TTR equals the old TTR plus alpha; otherwise, it equals the old TTR divided by beta. In order to limit the TTR to a certain range, we specify TTR_min and TTR_max and clamp the TTR to that range, so that polling is conducted at approximately the same rate as file updates. As a result, this consistency maintenance algorithm can avoid unsuccessful updates under dynamism with less overhead. Okay.
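A minimal sketch of this adaptation rule, assuming illustrative parameter values (the talk gives the rule itself, not these numbers):

```python
# Sketch of the linear-increase multiplicative-decrease TTR adaptation.
# The rule comes from the talk; the parameter values are illustrative.

ALPHA = 1.0      # additive increase step (seconds)
BETA = 2.0       # multiplicative decrease factor
TTR_MIN = 1.0    # lower bound on the polling interval (seconds)
TTR_MAX = 600.0  # upper bound on the polling interval (seconds)

def next_ttr(ttr_old, file_changed):
    """No change between successive polls -> poll less often (add ALPHA);
    a change was seen -> poll more often (divide by BETA)."""
    ttr = ttr_old / BETA if file_changed else ttr_old + ALPHA
    # Clamp to [TTR_MIN, TTR_MAX] so polling stays within a sane range.
    return max(TTR_MIN, min(TTR_MAX, ttr))
```

The initial TTR would be delta t, the minimum inter-update time; the refinement described next additionally floors the interval at the observed inter-query time.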
Another observation is that when the query frequency is less than the polling frequency, it is a waste of overhead to poll the file owner at the full rate. I have mentioned this several times; now let's see an example why. If a file replica node polls twice per second but the replica is requested only once per second, it does not have to update twice per second; it only needs to update once per second, because our goal is to ensure the file is up to date when it is requested, not to guarantee the file is always up to date. So we define that if the TTR is smaller than the file query interval, it is set to the query interval. Do I make myself clear here? This is a very important observation, because no previous work deals with this problem, and it allows the polling overhead to be reduced.
>>: It assumes that the files change periodically at a given frequency, [inaudible].
>> Haiying Shen: No. Unfortunately, no. But we really do want to have that kind of observation of the real world, of real applications.
Okay. So now let's go back to your question. If all nodes poll the file server, the server will be overloaded, and it is also against the P2P principle. So we developed GeWave for file consistency maintenance. It has two goals. One is to enable propagation to be conducted among geographically close nodes; in the P2P area we call this topology-aware, locality-aware, or proximity-aware. The second is to operate in a decentralized way.
Now, before I introduce GeWave, let me introduce the Hilbert number. We have a number of landmark nodes. We measure the distances between each node and the landmark nodes, so each node has a landmark vector, and physically close nodes have similar landmark vectors. Then we use the Hilbert curve to transform the d-dimensional vector to a one-dimensional ID. We can see that here, in the d-dimensional space, U is close to U' but far away from V; after the d-dimensional space is transformed to the one-dimensional space, U is still close to U' and far away from V. That's the feature of a Hilbert curve: it preserves the closeness of nodes in the dimension reduction, so similar landmark vectors are transformed to close Hilbert numbers. And since physically close nodes have similar landmark vectors, they will have similar Hilbert numbers: nodes with the same Hilbert number are physically close, and nodes with totally different Hilbert numbers are physically far away from each other. (A small sketch of this mapping appears below, after the GeWave overview.)
Okay. So this is GeWave. The GeWave structure is built dynamically based on node geographical location and polling rate. It takes the file owner as the tree root and organizes nodes into levels based on their polling rates; nodes in the upper levels have higher polling frequencies than those in the lower levels, which ensures that parents always have the up-to-date file earlier than their children. The children then poll their parents for updates. And those nodes are physically close nodes -- D, E, F; D, E, F, G here -- they have the same Hilbert number, 5. So GeWave connects the geographically close nodes together, and an update is propagated in the fashion of a top-down wave between geographically close nodes. Because the upper-level nodes poll at higher frequency than the lower-level nodes, the children can poll their parents for the update.
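As a rough illustration of the landmark-to-Hilbert mapping described above, here is a 2-D sketch; the two-landmark grid and the 50 ms quantization step are simplifying assumptions for the sketch, since GeWave's actual landmark vectors are d-dimensional.

```python
# Rough illustration of mapping a landmark vector to a Hilbert number.

def hilbert_index(n, x, y):
    """Classic iterative Hilbert-curve index of cell (x, y) on an n-by-n
    grid (n a power of two); nearby cells get nearby indices."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:           # rotate/reflect the quadrant so the
            if rx == 1:       # curve stays continuous
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_number(rtt_to_landmark1_ms, rtt_to_landmark2_ms, n=16):
    """Quantize a node's two-landmark RTT vector into grid coordinates and
    take its Hilbert index. Physically close nodes have similar landmark
    vectors, hence similar (often equal) Hilbert numbers."""
    x = min(n - 1, int(rtt_to_landmark1_ms // 50))
    y = min(n - 1, int(rtt_to_landmark2_ms // 50))
    return hilbert_index(n, x, y)
```

Two nodes whose RTT vectors fall in the same or neighboring grid cells end up with the same or nearby Hilbert numbers, which is the closeness-preservation property the talk relies on.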
And we can also see that the polling is between physically close nodes, so the overhead is reduced, because the transmission covers short distances. Therefore, GeWave achieves the two goals: the propagation, or polling, is conducted in a decentralized manner, and the polling is conducted between physically close nodes. Any questions? Okay. Basically, the nodes in level four poll the nodes in level three, and it actually goes from the top to the bottom: nodes in level one poll the file owner, nodes in level two poll nodes in level one, nodes in level three poll nodes in level two. Okay. And physically close nodes poll each other.
Now, let's see the GeWave structure maintenance. Actually, the file owner is controlling the nodes. It collects information including the TTR and Hilbert number of its replica nodes, and then it locates each replica node in the tree. For replica node departure, the departing node notifies its neighbors, its predecessor and its successor, that it is leaving, and it transfers its children to its neighbors. For example, if this node leaves the tree, it transfers its children to its neighbor here. Then, replica node failure. Failure is easy: if a node polls its parent but does not get a response within a certain time, it just falls back to the original method and polls the file owner. That's why it is dynamism resilient. Replica node mobility: a node's TTR may change. When the TTR increases, the node traverses down the tree until it reaches its level and then traverses in the horizontal direction, because all those values are ordered in ascending order. Okay. So that is replica node mobility.
>>: Can this structure deal with a change in the polling pattern of a file? Let's say I'm currently interested in this particular [inaudible]. After a while I'm not interested in this anymore, so I stop requesting that content. How does this structure [inaudible]?
>> Haiying Shen: All those nodes are replica nodes.
>>: Okay. And how do -- I mean, but the -- I mean, the replica nodes they are requesting is determined by the [inaudible], right?
>> Haiying Shen: Right. Right.
>>: What I'm asking is that if the unit request frequency changes -- let's say in this period of time [inaudible], later I'm not interested [inaudible] -- that changes. Is this structure [inaudible]?
>> Haiying Shen: If the user is interested -- yes. When a replica is underutilized, if the user is not interested in the file any longer, the replica will be removed, right?
>>: Okay.
>> Haiying Shen: In order to guarantee every replica has a high utilization. So that is replica node departure.
>>: Departure. Okay. So in that case the replica node will depart --
>> Haiying Shen: Right.
>>: -- from that [inaudible].
>> Haiying Shen: Yes.
>>: If a node has an increase in terms of [inaudible].
>> Haiying Shen: Then there will be more replica nodes, and that is replica node creation.
>>: Okay. Let's say I already have a replica node, and in the past that frequency [inaudible]. Later the frequency of that node increases. How is that going to -- is that node going to move up?
>> Haiying Shen: If the frequency increases -- we have a threshold, so if the frequency is higher than the threshold, there is only one replica. But the other nodes will likely also have higher request frequencies, so the other nodes will also have replicas.
>>: I notice in this -- because, I mean, [inaudible] you're starting to change topology.
>> Haiying Shen: [inaudible] another topology.
>>: You said you're starting to build another topology based on usage patterns. When will you start to build a topology? When will you just use the existing topology [inaudible]?
>> Haiying Shen: When? We start to build the topology when polling starts. Even if there is only one node, this topology can be built. It's like P2P: initially there's only one node, and then when the second node joins the system, they're connected; when the third node joins the system, they are connected; and so on.
>>: So what's the use case? I don't understand that. Because for me these systems usually have a large number of files.
>> Haiying Shen: Right.
>>: And these files are usually -- there is not a very good [inaudible] of those files. And it looks like, practically, if I have a million files on my hard drive [inaudible] which I'm sharing and I have people requesting them, it will mean that I will have high [inaudible]. And some of this -- and my nodes can [inaudible] servers just because the topology is very different. And also I just don't -- what's the use case of requesting the same file [inaudible]? It looks more like a messaging system with a limited number of topics.
>> Haiying Shen: Okay.
>>: A poll-based messaging system rather than file storage or something like that.
>> Haiying Shen: Okay. It's true that for most files, the file hardly changes. But currently there are more and more applications that need consistency maintenance. For example, a shared calendar: I change my calendar, others change the calendar, and we need to see each other's changes.
>>: [inaudible] the messaging system, the queuing system.
>> Haiying Shen: Yes. Messaging, that is. Or, you know --
>>: [inaudible] queuing system, but it's not a file sharing system. It's more like -- I mean, maybe it's the right thing. I just mean that the use case is not file sharing [inaudible].
>> Haiying Shen: The use case can be the applications that have frequently updated contents.
>>: I see. A limited number of those files.
>> Haiying Shen: Limited. Yes, yes. Well, banking, the stock market.
>>: A small number of [inaudible].
>>: I think in the real cases, what you have is a large number of files, and only a small amount changes.
>>: So then [inaudible]. So my point is that it's a different use case than file sharing. It's more like a messaging system where you have frequent updates on a small number of data contents, right?
>> Haiying Shen: We can also apply it to file sharing. Why not?
>>: You need to deal with those polled files. You need to deal with those files.
>>: [inaudible] an example. You have 100k files, right, and [inaudible] I don't know, 100k nodes [inaudible] 1000 nodes [inaudible] 10k files and they're sharing these files [inaudible]. This will practically create a mess, because it will [inaudible] very dynamic structure [inaudible].
>> Haiying Shen: Right.
>>: So your topology is per file and not -- I don't understand how you [inaudible] function in real life.
>> Haiying Shen: It's true. But, you know, as the keynote this morning mentioned, there's a conflict between scalability and consistency maintenance. In order to achieve high scalability, we need to resort to such a structure. We cannot use broadcast messages or gossip spreading.
But, on the other hand, you're right: maintaining such a structure for each file may lead to high overhead. There is a tradeoff.
>>: [inaudible] consistency can be sought in different ways. One way I see consistency solved is that you do not guarantee consistency; you guarantee the latest version.
>> Haiying Shen: Right, right.
>>: [inaudible] make sure, I mean, basically [inaudible] you have a way to resolve conflicts. So, I mean, Live Mesh is basically such a system. Each file has attached to it a version that is basically the change history of the file. Now, the server can go down, and you still have the replicas. That means when I retrieve a file, I may retrieve an older version of the file, but that's fine anyway. The point is that when later that server comes up again and rejoins the system, the system will figure out which version is later and deliver the newer version. That way you can solve consistency to some degree. You can still have inconsistency, basically branches: [inaudible] the server, I mean, can go off and we each make our modifications.
>>: But the inconsistency is already solved. Dr. Shen said that if the file owner is down, we don't serve the file.
[multiple people speaking at once]
>>: It's a way to solve it. It's expensive. It's not a good experience. That means when the server is down, that file becomes unavailable. I mean, usually we argue you want more availability [inaudible] consistency.
>>: Yes. But --
>>: [inaudible] you and I co-edit a document. You want the document to be editable. Not to say, okay, when you, the owner, are gone, well, I will not be able to edit the document.
>>: Then we go into [inaudible].
>> Haiying Shen: Okay.
>>: [inaudible] but OneNote does that. OneNote allows for disconnected editing of a shared document, because basically, if you look at the document structure on disk, it's a change history with branches, and they have an automated mechanism [inaudible].
>>: Basically [inaudible] the branch.
>>: [inaudible]
>> Haiying Shen: So, yeah. Actually, the requirements depend on the application. Some applications need very strict consistency maintenance, like the keynote speaker presented this morning. He gave us an example, like air traffic control: if they're not consistent, this system says, okay, there's no plane, go up, and that system says there's no plane, go down.
>>: [inaudible] but what we [inaudible] I think nowadays there are more and more applications that [inaudible] consistency. It is true if you have systems [inaudible] but those examples are served basically pretty well [inaudible]. So basically -- or the structure is in one place; every time you want [inaudible] go to that particular server.
>> Haiying Shen: Yes. Yes.
>>: Is that server okay?
>> Haiying Shen: Right.
>>: I mean, in the majority of cases when we're dealing with scalability, we need to have some consistency. I mean, think about Facebook, basically, MySpace [inaudible] applications where you have a huge amount of data [inaudible] consistency. How are you to survive without consistency?
>> Haiying Shen: Yeah. The client/server model sometimes works much better than maybe a decentralized one.
>>: If you [inaudible].
>> Haiying Shen: Like Google search. But then we should discuss the problem: does P2P really have more advantages than the client/server model? For P2P researchers, we always say, okay, P2P is decentralized and has high scalability.
It's better than client/server. But is it really true --
>>: It is. I'm not arguing peer-to-peer is good for every application or every single case; it's good in scenarios where you demand scalability. I mean, if your demand is to have some consistency, airline [inaudible] don't bother [inaudible].
>> Haiying Shen: [inaudible]
>>: [inaudible]
>> Haiying Shen: Okay. Yeah, we can discuss it offline. Because, you know, I'm always in favor of peer-to-peer. I think when there are a lot of users, a lot of nodes, it's better to use peer-to-peer.
>>: I would say peer-to-peer is not for every application. It's not [inaudible].
>> Haiying Shen: Okay. All right. So let's go on to the performance evaluation. We compared IRM with the three categories of work: ServerSide, ClientSide, and Path. Recall that ServerSide replicates files near the server, ClientSide near the client, and Path along the nodes on the path.
So let's see the performance. Here the metric is replica hit rate. We hope that a replication method has a higher hit rate, so there are fewer underutilized replicas. From the figure we can see that Path has a higher hit rate than IRM, and IRM has a higher hit rate than ServerSide and ClientSide. It's understandable that IRM, ServerSide, and ClientSide have the same number of replicas, because every time a server is overloaded, it creates a replica. But Path replicates the file at all nodes along the routing path, so it has many more replicas; that's why it has a much higher replica hit rate.
Now, this is node utilization. I didn't introduce the load balancing idea in my presentation, but basically, when IRM generates replicas it considers whether nodes have available capacity, while the other methods do not. That's why IRM can always restrict node utilization to around the threshold, while with the others, like ClientSide, some nodes have high utilization and may be overloaded.
Next, the overhead of file replication. The Y axis is the number of file replicas. It's very easy to understand this figure, because Path replicates at all the nodes on a routing path; that's why it has a lot of replicas.
>>: [inaudible]
>> Haiying Shen: Could you please say it again?
>>: This Path, is it the one which actively pushes the updates?
>> Haiying Shen: No, no, no. Here we just compare file replication methods; those three methods are all file replication methods. That's why I said previous work addresses file replication and consistency separately, and we combine them in a coordinated manner. The reason IRM generates a smaller number of file replicas is that it removes underutilized replicas, keeping only utilized ones, so it has a small number of file replicas.
We also compared IRM with and without adaptation. From the left figure we can see that IRM with and without adaptation have almost the same average routing path lengths and replica hit rates, but IRM without adaptation has more replicas than with adaptation. That means the adaptation is good: it helps to reduce the number of replicas.
All right. We also measured the average path length under churn. We can see that IRM has a slightly longer path length than Path. We all know the reason, right? Path has many more replicas. And ServerSide and ClientSide lead to significantly longer path lengths.
The reason ClientSide has longer path lengths is that if the requester holding the replica is down, the request has to be forwarded by the original routing algorithm.
Now, let's see the consistency maintenance. We compare GeWave with SCOPE, UMPT, and Push/Poll. Those are the experiment settings. This figure shows the proximity-aware performance: the X axis is the physical distance in hops and the Y axis is the CDF of the percentage of messages. From this figure we can see that GeWave and UMPT always transmit almost 100 percent of messages within 10 hops, but in Push/Poll and SCOPE, only 30 percent of messages are transmitted within 10 hops. This means that in the first two methods the messages are transmitted between physically close nodes, while the other two methods don't consider proximity; that's why they are less efficient.
Overhead. Sorry, the letters may not be clear. The Y axis is the average number of messages and the X axis is the number of replica nodes. From this figure we can see that GeWave incurs much less overhead than the others. The number of messages also includes the messages for structure maintenance; GeWave does not need maintenance messages, while these two structure-based methods do. And Push/Poll, being message spreading, generates a lot more messages than the others. So from this figure we can conclude that GeWave incurs much lower cost than the other methods.
Communication cost under churn. We define the communication cost as the product of the message size and the physical path length. Again, GeWave generates much less communication cost than the others, for two reasons: first, its messages propagate between physically close nodes; second, it has fewer messages than the others.
Effectiveness of consistency maintenance. We hope that with a consistency maintenance mechanism a request can always get the up-to-date file. The Y axis is the number of up-to-date files received. From this figure we can see that GeWave and Push/Poll can always get the updated files, but SCOPE and UMPT cannot guarantee that the requester receives up-to-date files. Why? Because they are structure-based, and in a structure there are node failures and node departures; before the structure is repaired, the children cannot get the update successfully. Push/Poll uses broadcasting, so it is more dynamism resilient.
All right. Conclusions. File replication and consistency maintenance are intimately connected, but previous work addressed these two issues separately. So we propose integrated file replication and consistency maintenance: basically, nodes actively determine whether they need to replicate a file or not, and they actively poll for updates. We also propose GeWave to conduct the consistency maintenance in a decentralized manner. Briefly, they have two main features: one is high effectiveness, the second is high efficiency. Efficiency means low cost; effectiveness means they guarantee file consistency and improve file query efficiency. Thank you. Any questions and comments? Yes.
>>: So you're saying that the [inaudible] model is more efficient than the [inaudible] model, right?
>> Haiying Shen: Yes.
>>: But that assumes [inaudible] a very small number of files.
>>: In your example [inaudible] but if you have a scenario like, if you [inaudible] files or whatever, this is information, and you need to poll for every [inaudible], even if you don't change any of those, you will have [inaudible], correct?
>> Haiying Shen: Push. Every time when a file is updated --
>>: [inaudible]
>> Haiying Shen: Huh?
>>: If files don't --
>> Haiying Shen: Right. Right.
>>: For example, you have [inaudible] files but update one file per second.
>> Haiying Shen: Okay.
>>: [inaudible] message per second. You could have two million files. If you don't update daily, you still have a million messages [inaudible] update frequency.
>> Haiying Shen: Good question. Personally, I believe push and poll each have their own advantages and disadvantages; they are suitable in different kinds of situations. But in this method -- let me see. Okay. Here. A replica node does not always poll for the update. Using this algorithm, the file owner will be polled at approximately the same rate as the file update rate. So if the file --
>>: Usually you don't know that update rate.
>> Haiying Shen: This is the feature of that algorithm. But it's not 100 percent accurate, I believe; it's approximate. That's why I said approximately. So in the example you described, if there's no update, there's no polling.
>>: But if you ask for the file, you still need to do [inaudible].
>> Haiying Shen: If I ask --
>>: Not the update rate, but like the client request rate. For example, I want to see the new content of a file every second, as a client. Will you poll the owner every second?
>> Haiying Shen: Could you please --
>>: So for each client, it wants to see the file. You said there is a rate at which a client requests the file.
>> Haiying Shen: Yes.
>>: So is it related to the update rate? Or is it how you [inaudible]?
>> Haiying Shen: When a client wants a file, it just sends in the request, right? And then the request will be routed based on the P2P routing algorithm. Why is it related to [inaudible]?
>>: Yeah. Because how do you know if [inaudible] client requested the file? I don't download it; I have it in the local cache.
>> Haiying Shen: Right.
>>: So next time I ask for it in a second.
>> Haiying Shen: Yes.
>>: What happens in your case? Do you go and recheck with the owner if it's current? Or do you just return the copy?
>> Haiying Shen: Okay. I see. I got it. So the client has a cache. I think this is not related to this work. If there's a replication algorithm, say the client replicates on purpose, then later on, when the client wants the file, it can use the previously retrieved file, right? But in the normal case, without file replication, I just send a request, I get the file, and it's stored in the cache. So whether it gets the file from the cache or not is an operating system problem.
>>: Cache, I mean [inaudible].
>> Haiying Shen: Okay. Okay. Okay. Then it needs to poll the file owner for the update.
>>: So if you have a large number of clients, all readily available, they constantly poll?
>> Haiying Shen: Right. Right. The file --
>>: In a push model they will always get the local [inaudible] only poll the server, and they get [inaudible] they know the update happened, right?
>>: [inaudible] you really don't know the file update rate. It's really -- I mean, it's not a deterministic process. Sometimes the owner updates the file frequently; sometimes he will just leave it there.
Think about when you work on a file: sometimes you do revisions quite frequently, then maybe you go to sleep and leave the file there, not updating it at all, and then you start to update it again. So in the poll model, the update rate is actually not known until the server somehow tells the client how frequently [inaudible].
>>: If the update rate is very low, then the chance of getting [inaudible] I mean [inaudible].
>> Jin Li: I think we can move the discussion to a conference room rather than use this room, for more friendly discussions. Let's thank Professor Shen for the interesting talk.
>> Haiying Shen: Thank you very much.
[applause]