2011 International Conference on Software and Computer Applications
IPCSIT vol.9 (2011) © (2011) IACSIT Press, Singapore

An Efficient Cache Strategy in Structured Peer-to-Peer Networks

Shin-Yi Chou and Yu-Wei Chen +
Graduate Institute of Information and Logistics Management
National Taipei University of Technology
Taipei, Taiwan

Abstract. In this paper, we propose a cache strategy that is suitable for decentralized structured peer-to-peer (P2P) networks. In our cache strategy, each peer maintains a list of its previous search results. By consulting this list, each peer can reduce its search time and improve query efficiency.

Keywords: Peer-to-Peer, decentralized structure, cache

1. Introduction

The use of P2P applications is growing dramatically, particularly for sharing files and software. With the recent widespread deployment of P2P technologies, P2P computing is attracting increasing attention. Many P2P systems have emerged recently as platforms for users to search and share information over the Internet.

P2P systems can be classified into three categories: centralized, decentralized-unstructured, and decentralized-structured systems. A centralized P2P system has a central server that maintains a directory containing content information for the whole P2P system. A popular example of a centralized system is Napster [1]. However, centralized systems are prone to the single-point-of-failure problem.

A decentralized P2P system has no central server. Instead, queries are distributed among the nodes of the system, which are connected in an ad hoc network topology, and each node can send messages to other nodes. Generally speaking, decentralized P2P systems can be divided into two types: unstructured and structured. In an unstructured system such as Gnutella [2], the search mechanism floods query messages to neighboring computers, bounded by a time-to-live (TTL) value, so it is not scalable. These problems have been studied extensively, for example in [3].
Sripanidkulchai et al. [3] proposed a content location solution in which peers are loosely organized to form an interest-based structure on top of the Gnutella network. Chen et al. [4] developed optimal all-to-all broadcast schemes for one-port communication that not only require the minimal number of communication steps but also incur the minimal number of messages.

A structured system uses a Distributed Hash Table (DHT). Each node maintains part of the information and shares it with the other nodes, which increases fault tolerance and scalability. Chord [5], Tapestry [6], Pastry [7], and CAN [8] are well-known structured P2P networks. These systems focus on providing a fundamental lookup service. In addition, their explicit network topology bounds the logical distance between any two computers.

This paper proposes a strategy to improve search efficiency in structured P2P systems. We propose a cache strategy that is suitable for decentralized structured peer-to-peer networks. In our cache strategy, each peer maintains a list of its previous search results. By consulting this list, each peer can reduce its search time. On the system side, the strategy reduces the bandwidth load of the whole system; on the user side, it enhances query efficiency.

The remainder of this paper is organized as follows. Related work is introduced in Section 2. The design of the cache list is presented in Section 3. Conclusions are presented in Section 4.

+ Corresponding author. Tel.: +(886-2) 2771-2171 #2364; fax: +(886-2) 8772-6946.
E-mail addresses: t8938012@ntut.edu.tw (Shin-Yi Chou), ywchen@ntut.edu.tw (Yu-Wei Chen).

2. Related Work

Recently, many researchers have designed strategies on top of existing P2P networks such as Chord [5], Tapestry [6], and Pastry [7] to enhance search efficiency.
A popular strategy is to distribute part of the information to each node in order to increase scalability and fault tolerance and to search more efficiently. The Cooperative File System (CFS) [9] is a P2P storage system that provides provable guarantees for the efficiency and load balance of file storage. CFS servers provide a distributed hash table (DHash) for block storage. DHash distributes and caches blocks at a fine granularity to achieve load balance, and it decreases latency with server selection. DHash finds blocks using the Chord location protocol, which operates in time logarithmic in the number of servers. In PAST [10], a large-scale P2P persistent storage utility, Rowstron and Druschel present and evaluate storage management and caching. In the PAST system, storage nodes and files are each assigned uniformly distributed identifiers, and replicas of a file are stored at the nodes whose identifiers most closely match the file's identifier.

Cache technology has also been studied extensively in the context of P2P traffic. Several measurement studies [11, 12] examine user download and upload behavior. Gummadi et al. [11] analyzed the traffic of modern P2P file-sharing systems in order to understand the nature of the file-sharing workload. Their results show that user behavior causes the workload distribution of P2P systems to deviate substantially from Zipf curves. Sen and Wang [12] characterize P2P traffic and observe a highly skewed traffic distribution across P2P file-sharing systems.

3. Proposed Cache Strategy

Cache strategies have been widely used to reduce search latency and network traffic. In this section, we present a client-side caching strategy. When a client issues a request, it connects directly to a frequently contacted node according to the caching strategy, which reduces the search time.

3.1. Node Definition

We define two kinds of nodes to make the following sections clearer.
(Local) node: A general user that joins the P2P environment is called a local node.
Cache node: A node that the local node has stored in its cache list is called a cache node.

Fig. 1: The Cache List in the System.

Figure 1 shows an example of these definitions. Taking N2 as an example, N2 is a local node, and N8, N20, and N24 are cache nodes since they are listed in the cache of N2.

3.2. Cache List

We design a cache list in each node. The cache list contains two types of information:
• the key value of the cache node
• the cache content list

We need to ensure that the cache path is correct and to determine whether a cache node matches the request. When a node finds a matching cache node in the list, it uses the key value to locate the cache node directly. This effectively reduces the routing time.

3.3. Join and Leave

In a P2P file system, nodes join and leave frequently. A node that joins a structured P2P file system must first learn how the system is distributed. When leaving the system, a node should update the node information so that the system continues to operate. In our cache strategy, the join process is similar to that of the underlying structured P2P system: each node dynamically maintains its DHT information. In addition, the local node allocates buffer space to store the cache and creates a cache list and a timer.

When a node leaves the system, the structured network is not affected, since the cache list is stored in the local node. The node simply sends a "leave message" to its cache nodes so that they know it has left. Sometimes the leave message is not delivered because of incorrect user behavior or a network bandwidth problem. We call this an "accident leave". In this case, the nodes in the cache list are not notified, so they do not hold up-to-date information about the node that left.
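As a minimal sketch of the cache list in Section 3.2 and of the leave handling described above, the following Python fragment keeps, for each cache node, its key value and the list of contents known to be stored there. The names `CacheEntry`, `CacheList`, `record`, `find_node_for`, and `handle_leave`, as well as the file names, are illustrative assumptions and are not part of the original design; keys are modelled as integers in the style of N8, N20, and N24 in Fig. 1.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class CacheEntry:
    """One cache node: its key value plus the contents known to be stored there."""
    key: int
    contents: Set[str] = field(default_factory=set)


class CacheList:
    """Per-node cache list (hypothetical layout, assumed for illustration)."""

    def __init__(self) -> None:
        self._entries: Dict[int, CacheEntry] = {}

    def record(self, key: int, content: str) -> None:
        """Remember that a previous search found `content` at node `key`."""
        entry = self._entries.setdefault(key, CacheEntry(key))
        entry.contents.add(content)

    def find_node_for(self, content: str) -> Optional[int]:
        """Return the key of a cache node holding `content`, or None on a miss."""
        for entry in self._entries.values():
            if content in entry.contents:
                return entry.key
        return None

    def handle_leave(self, key: int) -> None:
        """Drop a cache node after its "leave message" arrives."""
        self._entries.pop(key, None)

    def keys(self) -> Set[int]:
        """Return the key values of all nodes currently cached."""
        return set(self._entries)


# Local node N2 caches N8, N20, and N24, as in Fig. 1.
n2_cache = CacheList()
n2_cache.record(8, "song.mp3")
n2_cache.record(20, "paper.pdf")
n2_cache.record(24, "video.avi")
n2_cache.handle_leave(20)  # N20 announces that it is leaving
```

Indexing the entries by key value is one possible choice: it makes removing a node after a leave message a single dictionary operation, while a content lookup scans the content lists of the cached nodes.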
To cope with accident leaves, each node periodically sends a "confirm message" to the cache nodes in its cache list in order to confirm their status. The node immediately updates its cache information when a cache node joins or leaves.

3.4. Search Protocol

Figure 2 shows the cache strategy in the P2P file system. Each node in the system is configured with a cache list, and we define the routing protocol as follows:

Step 1: Search the cache list and check whether the desired contents exist in the cache list.
Step 2: If the desired contents exist in the cache, the node connects directly to the cache node.
Step 3: If the desired contents do not exist in the cache, the original search protocol is executed.

Fig. 2: The Cache Strategy in the System.

4. Conclusion

In this paper, we proposed a new cache strategy for decentralized P2P systems. The cache strategy can be embedded in any decentralized P2P structure. In our cache strategy, each peer maintains a list of its previous search results. User behaviour, e.g. locality, increases the cache hit ratio. Each peer can reduce search times and enhance query efficiency by exploring its list. In future work, we will design cache strategies on top of different structures.

5. References

[1] The Napster homepage, http://www.napster.com
[2] Clip2.com, The Gnutella Protocol Specification V0.4, http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf, Mar. 2001.
[3] K. Sripanidkulchai, B. Maggs, and H. Zhang, Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems, Proc. IEEE INFOCOM '03, 2003.
[4] M. S. Chen, P. S. Yu, and K. L. Wu, Optimal NODUP All-to-All Broadcasting Schemes in Distributed Computing Systems, IEEE Trans. Parallel and Distributed Systems, vol. 5, pp. 1275-1285, 1994.
[5] I. Stoica, R. Morris, D. R.
Karger, M. F. Kaashoek, and H. Balakrishnan, Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications, Proc. ACM SIGCOMM, 2001.
[6] B. Y. Zhao, J. Kubiatowicz, and A. D. Joseph, Tapestry: A Fault-Tolerant Wide-Area Application Infrastructure, vol. 32, 2002.
[7] A. I. T. Rowstron and P. Druschel, Pastry: Scalable Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Storage Utility, Proc. SOSP, 2001.
[8] S. Ratnasamy, P. Francis, M. Handley, R. M. Karp, and S. Shenker, A Scalable Content-Addressable Network, Proc. ACM SIGCOMM, 2001.
[9] F. Dabek, M. F. Kaashoek, D. R. Karger, R. Morris, and I. Stoica, Wide-Area Cooperative Storage with CFS, Proc. SOSP, 2001.
[10] A. I. T. Rowstron and P. Druschel, Storage Management and Caching in PAST, a Large-Scale, Persistent Peer-to-Peer Storage Utility, Proc. SOSP, 2001.
[11] K. Gummadi, R. Dunn, S. Saroiu, S. Gribble, H. Levy, and J. Zahorjan, Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload, Proc. ACM Symp. Operating Systems Principles (SOSP '03), pp. 314-329, Oct. 2003.
[12] S. Sen and J. Wang, Analyzing Peer-to-Peer Traffic across Large Networks, IEEE/ACM Trans. Networking, vol. 12, no. 2, pp. 219-232, Apr. 2004.
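The three-step search protocol of Section 3.4 can be illustrated with a short Python sketch. It is written under stated assumptions and is not the authors' implementation: the function `search`, the dictionary-based cache list, and the `dht_lookup` callback (a stand-in for the original lookup of the underlying structured network, such as Chord) are hypothetical names, and storing the result of a miss in the cache list is our reading of "a list regarding the previous search results".

```python
from typing import Callable, Dict, List, Set, Tuple


def search(
    content: str,
    cache_list: Dict[int, Set[str]],
    dht_lookup: Callable[[str], int],
) -> Tuple[int, bool]:
    """Return (key of the node to contact, whether the cache list was hit)."""
    # Step 1: search the cache list for the desired content.
    for key, contents in cache_list.items():
        if content in contents:
            # Step 2: cache hit, so connect directly to the cache node.
            return key, True
    # Step 3: cache miss, so run the original search protocol.
    key = dht_lookup(content)
    # Assumption: the result is stored so that later searches can hit the cache.
    cache_list.setdefault(key, set()).add(content)
    return key, False


# Hypothetical stand-in for the structured network's own lookup.
lookups: List[str] = []


def fake_dht_lookup(content: str) -> int:
    lookups.append(content)  # count how often the fallback is used
    return 20


cache: Dict[int, Set[str]] = {8: {"song.mp3"}}
first = search("paper.pdf", cache, fake_dht_lookup)   # miss: falls back to the DHT
second = search("paper.pdf", cache, fake_dht_lookup)  # hit: answered from the cache
```

In this sketch the second request for the same content never reaches the fallback lookup, which is the saving in search time and bandwidth that the cache strategy aims for.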