International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 3 - Apr 2014

Improved Metadata Management & Scalability in Dynamic Distributed Web Caching

Mukesh Dawar#1, Charanjit Singh#2
#1 Research Scholar, CSE Deptt., RIMT-IET, India
#2 Assistant Professor, CSE Deptt., RIMT-IET, India

Abstract: The World Wide Web can be considered a large distributed information system that provides access to shared data objects. As one of the most popular applications currently running on the Internet, the World Wide Web is growing exponentially, which results in network congestion and server overloading. Web caching has been recognized as one of the effective schemes to alleviate the server bottleneck and reduce network traffic, thereby minimizing user access latencies. In this paper, we first describe the elements of a Web caching system and its desirable properties. We then implement techniques that have been used in Web caching systems. Clustering improves retrieval latency and also helps to provide load balancing in a distributed environment, but clustering alone cannot resolve the scalability issues, the handling of frequent proxy-server disconnections, and the metadata management issues in the network.

Keywords: Metadata, Metadata Server, Load balancing, Distributed Web Caching, Clustering, Latency, Robustness, Scalability, Disconnection Handling, Proxy server, Clients.

I. INTRODUCTION

Users often face frustrating delays while accessing a web page, congestion at servers, and frequent disconnections of servers. Because use of the Web is growing exponentially, WWW traffic on national and international networks can also be expected to grow exponentially, with rising latency. Nevertheless, users expect a high quality of service with modest response times, so all of these latencies must be kept within tolerable limits. That is why upgrades to networks and servers are constantly required to provide high-speed, continuous service to users. One solution is to store multiple copies of the same document, but this increases storage and maintenance costs. Another solution is to cache only the frequently accessed documents, since most documents are rather static. This reduces retrieval latency and network traffic as well.

The quality of service and the response times can be improved by decreasing the network load. One way to achieve this is to install a Web caching service. Caching effectively migrates copies of popular documents from Web servers to locations closer to the Web clients. In general, Web client users see shorter delays when requesting a URL, network managers see less traffic, and Web servers see lower request rates. An origin Web server not only sees lower request rates but, more importantly, experiences a lower server load, because files are fetched with conditional (If-Modified-Since) GET HTTP requests.

Web clients request documents from Web servers, either directly or through a Web cache server (proxy). A Web cache server has the same functionality as a Web server when seen from the client, and the same functionality as a client when seen from a Web server. The primary function of a Web cache server is to store Web documents close to the user, to avoid pulling the same document several times over the same connection, to reduce download time, and to place less load on remote servers.

Fig 1: General Web Caching Approach
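To make the conditional-request behaviour mentioned above concrete, the following minimal Python sketch issues an If-Modified-Since GET so that the origin server can answer 304 Not Modified instead of resending the full document. It is our illustration only, not part of the paper's system; the URL, the cached body, and the timestamp value are placeholder assumptions.

```python
# Sketch: revalidating a cached document with a conditional GET (illustrative only).
import urllib.request
import urllib.error

def revalidate(url, cached_body, cached_last_modified):
    """Return an up-to-date copy, transferring the body only if the origin copy changed."""
    request = urllib.request.Request(
        url, headers={"If-Modified-Since": cached_last_modified})
    try:
        with urllib.request.urlopen(request) as response:
            # 200 OK: the document changed, so replace the cached copy.
            return response.read(), response.headers.get(
                "Last-Modified", cached_last_modified)
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # 304 Not Modified: the cached copy is still fresh; no body was transferred.
            return cached_body, cached_last_modified
        raise

# Hypothetical usage:
# body, stamp = revalidate("http://example.com/index.html",
#                          old_body, "Tue, 01 Apr 2014 10:00:00 GMT")
```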
II. WEB CACHING TECHNIQUES

A number of techniques have previously been defined for Web caching. Having described the attributes of an ideal Web caching system, we now survey some schemes described in the literature and point out their inadequacies. The performance of a Web cache system depends on the size of its client community: the bigger the user community, the higher the probability that a previously requested (and cached) document will soon be requested again. Caches sharing mutual trust may assist each other to increase the hit rate. A caching architecture should provide the paradigm for proxies to cooperate efficiently with each other.

2.1 Hierarchical Caching Architectures

With hierarchical caching, caches are placed at multiple levels of the network. For the sake of simplicity, we assume that there are four levels of caches: bottom, institutional, regional, and national. At the bottom level of the hierarchy are the client/browser caches. When a request is not satisfied by the client cache, it is redirected to the institutional cache. If the document is not found at the institutional level, the request is forwarded to the regional-level cache, which in turn forwards unsatisfied requests to the national-level cache. If the document is not found at any cache level, the national-level cache contacts the origin server directly. When the document is found, either at a cache or at the origin server, it travels down the hierarchy, leaving a copy at each of the intermediate caches along its path. Further requests for the same document travel up the caching hierarchy until the document is hit at some cache level. A hierarchical architecture is more bandwidth-efficient, particularly when some cooperating cache servers do not have high-speed connectivity. In such a structure, popular Web pages can be efficiently diffused towards the demand. However, there are several problems associated with a caching hierarchy:

a) To set up such a hierarchy, cache servers often need to be placed at key access points in the network, which requires significant coordination among the participating cache servers.
b) Every hierarchy level may introduce additional delays.
c) High-level caches may become bottlenecks and have long queuing delays.
d) Multiple copies of the same document are stored at different cache levels.

Figure 1.2: Hierarchical Web Caching Architecture

2.2 Distributed Caching Architectures

In a distributed caching architecture, no intermediate caches are set up; there are only institutional caches at the edge of the network, which cooperate to serve each other's misses. Since there are no intermediate caches that store and centralize all documents requested by lower-level caches, institutional caches need some other mechanism to share the documents they contain. Institutional caches can query the other cooperating institutional caches for documents that resulted in local misses. However, a query-based approach may significantly increase the bandwidth consumption and the latency experienced by the client, since a cache needs to poll all cooperating caches and wait for the slowest one to answer (see the sketch following Figure 1.3).

Figure 1.3: Distributed Web Caching Architecture
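The sketch below illustrates this query-based cooperation. It is our own illustration, not code from the paper: each cooperating cache is assumed to expose a hypothetical lookup(url) method that returns the document on a hit and None on a miss, and fetch_from_origin(url) stands in for contacting the origin server.

```python
# Sketch of query-based cooperation between institutional caches (illustrative).
from concurrent.futures import ThreadPoolExecutor, as_completed

def resolve_miss(url, siblings, fetch_from_origin):
    """Poll every cooperating cache for `url`, then fall back to the origin server."""
    if siblings:
        with ThreadPoolExecutor(max_workers=len(siblings)) as pool:
            futures = [pool.submit(sibling.lookup, url) for sibling in siblings]
            for future in as_completed(futures):
                document = future.result()
                if document is not None:
                    return document            # first sibling hit wins
    # All siblings reported a miss, which means we waited for the slowest
    # responder before contacting the origin server: the latency cost noted above.
    return fetch_from_origin(url)
```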
2.3 Hybrid Caching Architecture

In a hybrid scheme, caches may cooperate with other caches at the same level or at a higher level using distributed caching. ICP is a typical example: the document is fetched from the parent or neighbor cache that has the lowest RTT. Rabinovich proposed limiting the cooperation between neighbor caches to avoid obtaining documents from distant or slower caches that could have been retrieved directly from the origin server at a lower cost. Pablo and Christian have proposed a mathematical model to analyze some important performance parameters for all three of the above schemes. They find that a hierarchical caching system has a lower connection time while a distributed caching system has a lower transmission time, and that hierarchical caching has lower bandwidth usage, while distributed caching distributes the traffic better because it uses more bandwidth at the lower network levels. Distributed caching also requires much less disk space, only a few Gbytes at an institutional cache, whereas in a hierarchical caching system the top-level cache needs hundreds of Gbytes. Moreover, a distributed caching system shares the total load of the system very well and does not generate hot spots with high load. In a hybrid scheme, the latency varies greatly depending on the number of caches that cooperate at every network level.

Figure 1.4: Hybrid Web Caching Architecture

III. IMPLEMENTATION

The proposed strategy includes origin servers, clusters of proxy servers, and clients, as shown in Figure 1.5. One extra node, the Metadata Server (MDS), is added to every cluster. The MDS's task is to maintain the metadata of all proxy servers within its own cluster and the metadata of the neighboring clusters. In the previous strategy, every proxy server itself maintained the metadata of its own cluster as well as of its neighboring clusters, so this strategy reduces the effort and time spent by the proxy servers.

Figure 1.5: The Proposed Scheme's Architecture

3.1 IMPLEMENTATION PHASES:

Every time a client makes a request to a proxy server, the queue length of the cluster is checked; if it is under the limit, the queue length of the proxy server is checked, and if the client limit of the proxy server has not been exceeded, the client is served; otherwise the request is forwarded to a less loaded proxy server. This provides efficient load balancing in the network for proper handling of all client requests. The strategy works in the following phases whenever a client requests some page from a proxy server (PSi) of cluster n (CSn); a code sketch of the complete flow is given at the end of this section.

Step 1. After receiving the request from the client, PSi checks its own metadata mdi for the relevant page. If the page is found, it is counted as a Hit and the page is sent back to the client immediately. Otherwise PSi forwards the request to the MDS for further search.

Step 2. In case of a Miss in Step 1, PSi forwards the request to the Metadata Server (MDSn) of the same cluster. MDSn checks the metadata of all the other proxy servers belonging to the same cluster (CSn) for the requested page. If the page is found, the request is forwarded to that proxy server and the response is transmitted back to the client.

Step 3. If the page is not found in the other proxy servers of CSn, MDSn checks its database for the metadata MDn-1 and MDn+1 of the neighboring clusters CSn-1 and CSn+1, respectively, for the requested page. If there is a Hit, the request is forwarded to that PS of the neighboring cluster and the reply is transmitted back to the client.

Step 4. If the requested page is not found even in the neighboring clusters, the request is forwarded directly to the next neighboring clusters with a factor of 2, that is, clusters CSn-2 and CSn+2. MDSn does not hold the metadata of these clusters, so the request is sent to both of them. If the requested page is found in any of these clusters, the page is sent to the client; otherwise they send a negative response to MDSn.

Step 5. If the requested page is still not found, the request is forwarded directly to the origin server (OS). If there is a Hit at the origin server, the page is returned to the client, with a copy of it retained at the proxy server PSi as well.

Step 6. If the requested page is not present even at the origin server and MDSn gets a negative response from the origin server, a "Page Not Found" message is flashed back to the client.

In case the requested page is found at another PS, a copy of that page is also stored at PSi before responding back to the client, and the updated metadata (umd) is transmitted to MDSn at the next s/m time interval. The next time the client requests the same page, the PS can send the page immediately if it is not stale; otherwise the PS looks for a fresh copy of the page to send to the client.
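The Python sketch below restates Steps 1-6 as a single control flow. It is our illustrative summary, not the authors' implementation: all object and method names (metadata, fetch, store, local_md, query, receive_updated_metadata, and so on) are assumptions, and the queue-length/load-balancing checks and the freshness test are omitted for brevity.

```python
# Illustrative restatement of Steps 1-6 (sketch only, hypothetical interfaces).
# ps_i is the contacted proxy server; mds_n is the Metadata Server of cluster CSn.
# local_md maps a page to its owning proxy within CSn; neighbor_prev_md and
# neighbor_next_md hold MDn-1 / MDn+1; CSn-2 / CSn+2 are queried without metadata.

def handle_request(ps_i, mds_n, page):
    # Step 1: PSi checks its own metadata mdi (local hit).
    if page in ps_i.metadata:
        return ps_i.fetch(page)

    # Step 2: MDSn checks the metadata of the other proxies of CSn.
    owner = mds_n.local_md.get(page)
    if owner is not None:
        return copy_and_reply(ps_i, mds_n, owner.fetch(page))

    # Step 3: metadata MDn-1 and MDn+1 of the neighboring clusters.
    for neighbor_md in (mds_n.neighbor_prev_md, mds_n.neighbor_next_md):
        owner = neighbor_md.get(page)
        if owner is not None:
            return copy_and_reply(ps_i, mds_n, owner.fetch(page))

    # Step 4: no metadata is held for CSn-2 / CSn+2, so both are asked directly.
    for far_cluster in (mds_n.cluster_minus_2, mds_n.cluster_plus_2):
        document = far_cluster.query(page)   # None models a negative response
        if document is not None:
            return copy_and_reply(ps_i, mds_n, document)

    # Step 5: fall back to the origin server, retaining a copy at PSi.
    document = mds_n.origin_server.fetch(page)
    if document is not None:
        return copy_and_reply(ps_i, mds_n, document)

    # Step 6: the page does not exist anywhere.
    return "Page Not Found"

def copy_and_reply(ps_i, mds_n, document):
    # A copy is stored at PSi before replying; the updated metadata (umd)
    # would be pushed to MDSn at the next s/m interval (modelled here as immediate).
    ps_i.store(document)
    mds_n.receive_updated_metadata(ps_i)
    return document
```

In practice the two Step 4 queries to CSn-2 and CSn+2 would presumably be issued in parallel, since MDSn holds no metadata for them and would otherwise have to wait for two sequential negative responses.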
IV. RESULTS

Figure 1.6: Graph-1 by simulator for Hit Ratio

Figure 1.7: Graph-2 by simulator showing the hit ratio of the proxy server, own cluster, and neighbor cluster.

V. CONCLUSIONS

Web services have become very popular today, but server overloading, scalability, disconnections, network congestion, and similar issues degrade the performance of the Web. Web caching has emerged as a good solution to these problems. We have discussed some of the problems affecting the performance of Web caching and the major issues related to distributed Web caching. In this work, we have proposed a strategy called "Improved Metadata Management & Scalability in Dynamic Distributed Web Caching" that can easily be deployed in the future. It is based on DWCRLD to enhance scalability, to alleviate the extra overhead of metadata management at the proxy servers, and to reduce network traffic. The scheme also makes it easy to handle frequent disconnections in the network; even if the number of proxy servers in the network grows, metadata management will not become an issue. It further reduces the delays incurred in replies, enhances the Hit ratio, and decreases the searching time.

REFERENCES
[1] V. Valloppillil and K. W. Ross, "Cache Array Routing Protocol v1.0", Internet Draft, pp. 2-8, Feb. 1998.
[2] J. Gwertzman and M. Seltzer, "World Wide Web cache consistency", Proceedings of the 1996 Usenix Technical Conference, Boston, MA, pp. 141-152, 1996.
[3] K. Worrell, "Invalidation in large scale network object caches", Master's Thesis, University of Colorado, Boulder, pp. 63-76, 1994.
[4] M. R. Korupolu and M. Dahlin, "Coordinated placement and replacement for large-scale distributed caches", Proceedings of the IEEE Workshop on Internet Applications, July 1999; Technical Report TR-98-30, December 1998.
[5] P. Krishnan and B. Sugla, "Utility of cooperating Web proxy caches", Computer Networks and ISDN Systems, pp. 195-203, April 1998.
[6] A. Feldmann, R. Caceres, F. Douglis, G. Glass, and M. Rabinovich, "Performance of Web proxy caching in heterogeneous bandwidth environments", Proceedings of Infocom '99, vol. 6, no. 6, March 1999.
[7] D. Wessels and K. Claffy, "Internet Cache Protocol (ICP), version 2", RFC 2186, September 1997.
[8] S. Hosseini-Khayat, "Improving object cache performance through selective placement", Proceedings of the 24th IASTED International Conference on Parallel and Distributed Computing and Networks, pp. 262-265, 2006.
[9] A. Balamash, M. Krunz, and P. Nain, "Performance analysis of a client-side caching/prefetching system for Web traffic", Computer and Telecommunications Networks, vol. 52, issue 13, pp. 3673-3692, 2007.
[10] D. A. Menasce and V. Akula, "Improving the performance of online auctions through server-side activity-based caching", World Wide Web, Kluwer Academic Publishers, Hingham, MA, USA, vol. 10, issue 2, pp. 181-204, 2007.
[11] Z. Duan and Z. Gu, "Dynamic load balancing in Web cache cluster", Seventh International Conference on Grid and Cooperative Computing, pp. 147-150, 2008.
[12] Jin-Ha Kim, Gyu Sang Choi, and Chita R. Das, "Distributed Web servers on a system area network", Journal of Parallel and Distributed Computing, vol. 68, no. 8, pp. 1033-1043, 2008.
[13] S. Jeyanthi and N. U. Maheswari, "QoS assertion in distributed systems based on content delivery network", International Conference on Computing, Communication and Networking (ICCCn 2008), 18-20 Dec. 2008, pp. 1-6.
[14] W. Stallings, "SSL: Foundation for Web Security", The Internet Protocol Journal, vol. 1, no. 1, pp. 20-29, 1998.
[15] M. Cieslak and D. Foster, "Web Cache Coordination Protocol V1.0", Internet Draft, work in progress, draft-ietf-wrec-web-pro-00.txt, June 1999.
[16] D. Wessels and K. Claffy, "Application of Internet Cache Protocol (ICP), version 2", RFC 2187, Informational RFC, September 1997.
[17] I. Melve, "Inter Cache Communications Protocols", Internet Draft, work in progress, draft-melve-intercache-comproto-00.txt, November 1998.
[18] R. Malpani, J. Lorch, and D. Berger, "Making World Wide Web caching servers cooperate", Proceedings of the 4th International WWW Conference, Boston, MA, pp. 107-117, Dec. 1995.
[19] A. Chankhunthod, P. B. Danzig, C. Neerdaels, M. F. Schwartz, and K. J. Worrell, "A hierarchical Internet object cache", Usenix '96, January 1996.
[20] S. Michel, K. Nguyen, A. Rosenstein, L. Zhang, S. Floyd, and V. Jacobson, "Adaptive Web caching: towards a new caching architecture", Computer Networks and ISDN Systems, pp. 107-117, November 1998.
[21] D. Povey and J. Harrison, "A distributed Internet cache", Proceedings of the 20th Australian Computer Science Conference, Sydney, vol. 38, issue 6, pp. 779-794, 2002.
[22] Z. Wang, "Cachemesh: a distributed cache system for the World Wide Web", Web Cache Workshop, pp. 1-10, 2003.
[23] U. Legedza and J. Guttag, "Using network-level support to improve cache routing", Computer Networks and ISDN Systems, vol. 30, no. 22-23, pp. 2193-2201, Nov. 1998.
[24] V. Valloppillil and K. W. Ross, "Cache Array Routing Protocol v1.0", Internet Draft draft-vinod-carp-v1-03.txt.
[25] L. Fan, P. Cao, J. Almeida, and A. Z. Broder, "Summary cache: a scalable wide-area Web cache sharing protocol", Proceedings of Sigcomm '98; vol. 8, issue 3, pp. 281-293, June 2000.
[26] B. Bloom, "Space/time trade-offs in hash coding with allowable errors", Communications of the ACM, vol. 13, no. 7, pp. 422-426, July 1970.
[27] D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin, and R. Panigrahy, "Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web", STOC 1997, pp. 654-663, 1997.
[28] F. Douglis, A. Feldmann, B. Krishnamurthy, and J. Mogul, "Rate of change and other metrics: a live study of the World-Wide Web", Proceedings of the 1997 Usenix Symposium on Internet Technologies and Systems (USITS-97), pp. 147-158, Dec. 1997.
[29] T. M. Kroeger, D. D. E. Long, and J. C. Mogul, "Exploring the bounds of Web latency reduction from caching and prefetching", Proceedings of the 1997 Usenix Symposium on Internet Technologies and Systems, Monterey, CA, Dec. 1997.
[30] V. N. Padmanabhan and J. C. Mogul, "Using predictive prefetching to improve World Wide Web latency", Proceedings of Sigcomm '96, vol. 26, pp. 22-36, July 1996.