
International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 3 - Apr 2014
Improved Metadata Management & Scalability in Dynamic Distributed Web Caching

Mukesh Dawar#1, Charanjit Singh#2
#1 Research Scholar, CSE Deptt, RIMT-IET, India
#2 Assistant Professor, CSE Deptt, RIMT-IET, India
Abstract: The World Wide Web can be considered as a large distributed information system that provides access to shared data objects. As one of the most popular applications currently running on the Internet, the World Wide Web is growing exponentially, which results in network congestion and server overloading. Web caching has been recognized as one of the effective schemes to alleviate the server bottleneck and reduce network traffic, thereby minimizing user access latencies. In this paper we first describe the elements of a Web caching system and its desirable properties, and then we implement the techniques that have been used in Web caching systems. Clustering improves retrieval latency and also helps to provide load balancing in a distributed environment, but on its own it does not resolve scalability, the handling of frequent proxy-server disconnections, or metadata management in the network.
Keywords: Metadata, Metadata Server, Load balancing,
Distributed Web Caching, Clustering, Latency, Robustness,
Scalability, Disconnection Handling, Proxy server, clients.
I. INTRODUCTION
Users often face frustrating delays while accessing a web page, congestion at servers and frequent disconnections of servers. Because the use of the Web is growing exponentially, it is to be expected that WWW traffic on national and international networks will also grow exponentially, with rising latency. Nevertheless, the user expects a high quality of service with modest response times, so all these latencies must be kept within tolerable limits. That is why upgrades in the networks and servers are always required to provide high-speed and continuous service to users. One solution is to store multiple copies of the same document, but this increases the storage and maintenance cost. Another solution is to cache only the frequently accessed documents, as most documents are rather static. This reduces retrieval latency and network traffic as well. The quality of service and the response times can be improved by decreasing the network load, and one way to achieve this is to install a Web caching service. Caching effectively migrates copies of popular documents from Web servers to points closer to the Web clients. In general, Web client users see shorter delays when requesting a URL, network managers see less traffic, and Web servers see lower request rates. An origin Web server not only sees lower request rates but, more importantly, experiences a lower server load, because files are fetched with a conditional GET request carrying an If-Modified-Since header. Web clients request documents from Web servers either directly or through a Web cache server, or proxy. A Web cache server has the same functionality as a Web server when seen from the client, and the same functionality as a client when seen from a Web server. The primary function of a Web cache server is to store Web documents close to the user, to avoid pulling the same document several times over the same connection, to reduce download time and to create less load on remote servers.
Fig 1: General Web Caching Approach
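As a concrete illustration of the conditional request mentioned above, the short Python sketch below (an editor's example; the URL and date are placeholders, not values from this paper) issues a GET carrying an If-Modified-Since header. A 304 Not Modified reply tells the cache that its stored copy is still valid, so the origin server never has to transfer the document body again.

    # Minimal sketch of a conditional GET; the URL and timestamp are placeholders for illustration only.
    import urllib.error
    import urllib.request

    req = urllib.request.Request(
        "http://example.org/index.html",
        headers={"If-Modified-Since": "Sat, 01 Mar 2014 00:00:00 GMT"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            body = resp.read()                               # 200 OK: the origin sent a fresh copy to re-cache
            print("fetched", len(body), "bytes")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            print("not modified: serve the cached copy")     # stored copy is still valid
        else:
            raise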
II. WEB CACHING TECHNIQUES
There are a number of techniques defined previously for Web caching. Having described the attributes of an ideal Web caching system, we now survey some schemes described in the literature and point out their inadequacies. The performance of a Web cache system depends on the size of its client community:
the bigger the user community, the higher the probability that a cached (previously requested) document will soon be requested again. Caches sharing mutual trust may assist each other to increase the hit rate, and a caching architecture should provide a paradigm for proxies to cooperate efficiently with each other.
2.1 Hierarchical Caching Architectures
With hierarchical caching, caches are placed at multiple levels of the network. For the sake of simplicity, we assume four levels of caches: bottom, institutional, regional, and national. At the bottom level of the hierarchy are the client/browser caches. When a request is not satisfied by the client cache, it is redirected to the institutional cache. If the document is not found at the institutional level, the request is forwarded to the regional-level cache, which in turn forwards unsatisfied requests to the national-level cache. If the document is not found at any cache level, the national-level cache contacts the origin server directly. When the document is found, either at a cache or at the origin server, it travels down the hierarchy, leaving a copy at each of the intermediate caches along its path. Further requests for the same document travel up the caching hierarchy until the document is hit at some cache level (this lookup path is sketched after the list of problems below). A hierarchical architecture is more bandwidth efficient, particularly when some cooperating cache servers do not have high-speed connectivity; in such a structure, popular Web pages are efficiently diffused towards areas of demand.

Figure 1.2: Hierarchical Web Caching Architecture

However, there are several problems associated with a caching hierarchy:

a) To set up such a hierarchy, cache servers often need to be placed at key access points in the network, which requires significant coordination among the participating cache servers.

b) Every hierarchy level may introduce additional delays.

c) High-level caches may become bottlenecks and suffer long queuing delays.

d) Multiple copies of the same document are stored at different cache levels.
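As referenced above, the lookup path of a caching hierarchy can be condensed into a short Python-style sketch. This is an editor's illustration under assumed cache objects (get/put methods and a fetch_from_origin callable), not code taken from the paper.

    # Sketch of hierarchical lookup: browser -> institutional -> regional -> national -> origin.
    def hierarchical_get(url, levels, fetch_from_origin):
        """levels is ordered bottom-up, e.g. [browser, institutional, regional, national]."""
        missed = []
        for cache in levels:                   # climb the hierarchy until the document is hit
            doc = cache.get(url)
            if doc is not None:
                break
            missed.append(cache)
        else:
            doc = fetch_from_origin(url)       # not found at any cache level: contact the origin server
        for cache in missed:                   # on the way down, leave a copy at each cache on the path
            cache.put(url, doc)
        return doc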
2.2 Distributed Caching Architectures
In a distributed caching architecture no intermediate caches are set up; there are only institutional caches at the edge of the network, which cooperate to serve each other's misses. Since there are no intermediate caches that store and centralize all the documents requested by lower-level caches, institutional caches need some other mechanism to share the documents they contain. An institutional cache can query the other cooperating institutional caches for documents that resulted in local misses. However, such a query-based approach may significantly increase the bandwidth consumption and the latency experienced by the client, since a cache needs to poll all cooperating caches and wait for the slowest one to answer.
Figure 1.3: Distributed Web Caching Architecture
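The query-based cooperation just described can be sketched as follows; the peer and cache objects are assumptions made for illustration, not part of the paper. The sketch also makes the drawback visible: on a local miss, every cooperating cache is polled, and in the worst case the client waits for the slowest peer before falling back to the origin server.

    # Sketch of query-based cooperation between institutional caches (editor's illustration).
    from concurrent.futures import ThreadPoolExecutor

    def distributed_get(url, local_cache, peers, fetch_from_origin):
        doc = local_cache.get(url)
        if doc is not None:
            return doc                                         # local hit
        with ThreadPoolExecutor(max_workers=max(1, len(peers))) as pool:
            answers = pool.map(lambda peer: peer.query(url), peers)   # poll all cooperating caches
            doc = next((d for d in answers if d is not None), None)
        if doc is None:
            doc = fetch_from_origin(url)                       # every peer missed
        local_cache.put(url, doc)
        return doc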
2.3 Hybrid Caching Architecture
In a hybrid scheme, caches may cooperate with other caches at the same level or at a higher level using distributed caching; ICP is a typical example. The document is fetched from the parent or neighbor cache that has the lowest RTT. Rabinovich proposed limiting the cooperation between neighbor caches to avoid obtaining documents from distant or slower caches when they could be retrieved directly from the origin server at a lower cost. Pablo and Christian have proposed a mathematical model to analyze some important performance parameters of all three of the above schemes.
They find that hierarchical caching systems have a lower connection time while distributed caching systems have a lower transmission time, and that hierarchical caching has lower bandwidth usage, while distributed caching distributes the traffic better because it uses more bandwidth at the lower network levels. Distributed caching also needs much less disk space, only a few gigabytes per institutional cache, whereas a hierarchical caching system needs hundreds of gigabytes at the top-level cache. Moreover, a distributed caching system shares the total load of the system very well and does not generate hot spots with high load. In a hybrid scheme the latency varies greatly depending on the number of caches that cooperate at every network level.

Figure 1.4: Hybrid Web Caching Architecture
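A minimal sketch of the hybrid fetch decision described in this subsection, assuming each candidate parent or neighbor cache exposes a hit test, a fetch method and a measured RTT (these names are the editor's, not the authors'): the document is taken from the responding cache with the lowest RTT, unless even that cache is expected to be slower than fetching directly from the origin server.

    # Sketch of the hybrid fetch decision (editor's illustration, not ICP itself).
    def hybrid_get(url, candidates, origin_rtt_estimate, fetch_from_origin):
        """candidates: list of (cache, measured_rtt) pairs for cooperating parents/neighbors."""
        hits = [(cache, rtt) for cache, rtt in candidates if cache.has(url)]
        if hits:
            best_cache, best_rtt = min(hits, key=lambda pair: pair[1])   # closest cache holding the page
            if best_rtt < origin_rtt_estimate:                           # only use a cache that is actually cheaper
                return best_cache.fetch(url)
        return fetch_from_origin(url)                                    # otherwise go straight to the origin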
III. IMPLEMENTATION

The proposed strategy includes origin servers, clusters of proxy servers and clients, as shown in Figure 1.5. One extra node, the Metadata Server (MDS), is added to every cluster. The MDS's task is to maintain the metadata of all proxy servers within its own cluster and the metadata of the neighboring clusters. In the previous strategy every proxy server itself maintains the metadata of its own cluster as well as of the neighboring clusters, so the proposed strategy reduces the effort and time spent by the proxy servers.

Figure 1.5: The Proposed Scheme's Architecture

3.1 IMPLEMENTATION PHASES:

Every time a client makes a request to a proxy server, the queue length of the cluster is checked; if it is under the limit, the queue length of the proxy server is checked, and if the client limit of the proxy server has not been exceeded, the client is served; otherwise the request is forwarded to a less loaded proxy server. This balances the load in the network so that all client requests are handled properly. The strategy works in the following phases whenever a client requests some page from a proxy server (PSi) of cluster n (CSn):

Step 1. After receiving the request from the client, PSi checks its own metadata mdi for the relevant page. If the page is found, it is counted as a Hit and the page is immediately returned to the client; otherwise PSi forwards the request to the MDS for further search.

Step 2. In case of a Miss in Step 1, PSi forwards the request to the Metadata Server (MDSn) of the same cluster. MDSn checks the metadata of all the other proxy servers that fall into the same cluster (CSn) for the requested page. If the page is found, the request is forwarded to that proxy server and the response is transmitted back to the client.
Step 3. If the page is not found in the other proxy servers of CSn, MDSn checks its database for the metadata MDn-1 and MDn+1 of the neighboring clusters CSn-1 and CSn+1, respectively, for the requested page. If there is a Hit, the request is forwarded to that proxy server of the neighboring cluster and the reply is transmitted back to the client.

Step 4. If the requested page is not found even in the neighboring clusters, the request is forwarded directly to the next neighboring clusters at a distance of two, that is, clusters CSn-2 and CSn+2. MDSn does not hold the metadata of these clusters, so the request is sent to both of them. If the requested page is found in either of these clusters it is sent to the client; otherwise they send a negative response to MDSn.

Step 5. If the requested page is still not found, the request is forwarded directly to the origin server (OS). If there is a Hit at the origin server, the page is returned to the client and a copy of it is retained at the proxy server PSi as well.

Step 6. If the requested page is not present even at the origin server and MDSn receives a negative response from it, a "Page Not Found" message is flashed back to the client.
Whenever the requested page is found at another proxy server, a copy of that page is also stored at PSi before responding to the client, and the updated metadata (umd) is transmitted to MDSn at the next s/m time interval. The next time the client requests the same page, PSi can send it immediately if it is not stale; otherwise PSi looks for a fresh copy of the page to send to the client.
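The admission check and the six lookup steps above can be condensed into the following Python-style sketch. All object names and helper calls (the PSi and MDSn methods, fetch_from_origin) are assumptions made purely for illustration; the paper itself does not supply an implementation.

    # Condensed sketch of the proposed lookup cascade (Steps 1-6) including the admission check;
    # every object name and helper method here is an editor's assumption, not the authors' code.
    PAGE_NOT_FOUND = None

    def handle_request(url, ps_i, mds_n, cluster_n, fetch_from_origin):
        # Load balancing: if PSi has reached its client limit, redirect to a less loaded proxy.
        if ps_i.queue_full():
            ps_i = cluster_n.least_loaded_proxy()

        # Step 1: PSi checks its own metadata md(i); a hit is answered immediately.
        page = ps_i.lookup_local(url)

        # Step 2: on a miss, MDS(n) checks the metadata of the other proxies of cluster CS(n).
        if page is None:
            page = mds_n.lookup_in_cluster(url)

        # Step 3: MDS(n) consults its stored metadata MD(n-1) and MD(n+1) of the neighboring clusters.
        if page is None:
            page = mds_n.lookup_neighbour_metadata(url)

        # Step 4: clusters CS(n-2) and CS(n+2) are queried directly, since MDS(n) holds no metadata for them.
        if page is None:
            page = mds_n.query_distant_clusters(url, distance=2)

        # Step 5: fall back to the origin server.
        if page is None:
            page = fetch_from_origin(url)

        # Step 6: the origin server also returned a negative response.
        if page is None:
            return PAGE_NOT_FOUND                  # "Page Not Found" is flashed back to the client

        ps_i.store_copy(url, page)                 # retain a copy at PSi for future requests
        ps_i.schedule_metadata_update(mds_n)       # updated metadata (umd) sent at the next s/m interval
        return page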
IV. RESULTS

Figure 1.6: Graph-1 by simulator for Hit Ratio

Figure 1.7: Graph-2 by simulator to show hit ratio of Proxy server, Own cluster, and Neighbor cluster
V. CONCLUSIONS
Web services have become very popular today, but server overloading, scalability, disconnections and network congestion have become a trade-off against the performance of the Web. Web caching has emerged as an effective solution to these problems. We have discussed some of the problems affecting the performance of Web caching and the major issues related to distributed Web caching. In this work we have proposed a strategy called "Improved Metadata Management & Scalability in Dynamic Distributed Web Caching" that can easily be deployed in the future. It is based on DWCRLD and enhances scalability, alleviates the extra overhead of metadata management at the proxy servers, and also reduces network traffic. The scheme also makes it easy to handle frequent disconnections in the network: even if the number of proxy servers in the network grows, metadata management will not become an issue. It further reduces the delays incurred in the replies, enhances the Hit ratio and decreases the searching time.
REFERENCES
[1] Vinod Valloppillil and Keith W. Ross, "Cache Array Routing Protocol v1.0", Internet Draft, pp. 2-8, Feb. 1998.
[2] J. Gwertzman and M. Seltzer, "World Wide Web cache consistency", Proceedings of the 1996 Usenix Technical Conference, Boston, MA, pp. 141-152, 1996.
[3] K. Worrell, "Invalidation in large scale network object caches", Master's Thesis, University of Colorado, Boulder, pp. 63-76, 1994.
[4] M. R. Korupolu and M. Dahlin, "Coordinated placement and replacement for large-scale distributed caches", Proceedings of the IEEE Workshop on Internet Applications, July 1999; Technical Report TR-98-30, December 1998.
[5] P. Krishnan and B. Sugla, "Utility of cooperating Web proxy caches", Computer Networks and ISDN Systems, pp. 195-203, April 1998.
[6] A. Feldmann, R. Caceres, F. Douglis, G. Glass, and M. Rabinovich, "Performance of Web proxy caching in heterogeneous bandwidth environments", Proceedings of Infocom '99, vol. 6, no. 6, March 1999.
[7] D. Wessels and K. Claffy, "Internet Cache Protocol (ICP), version 2", RFC 2186, September 1997.
[8] S. Hosseini-Khayat, "Improving object cache performance through selective placement", Proceedings of the 24th IASTED International Conference on Parallel and Distributed Computing and Networks, pp. 262-265, 2006.
[9] A. Balamash, M. Krunz, and P. Nain, "Performance analysis of a client-side caching/prefetching system for Web traffic", Computer Networks, vol. 52, issue 13, pp. 3673-3692, 2007.
[10] Daniel A. Menasce and Vasudeva Akula, "Improving the performance of online auctions through server-side activity-based caching", World Wide Web, Kluwer Academic Publishers, Hingham, MA, USA, vol. 10, issue 2, pp. 181-204, 2007.
[11] Z. Duan and Zhimin Gu, "Dynamic load balancing in Web cache cluster", Seventh International Conference on Grid and Cooperative Computing, pp. 147-150, 2008.
[12] Jin-Ha Kim, Gyu Sang Choi, and Chita R. Das, "Distributed Web servers on a system area network", Journal of Parallel and Distributed Computing, vol. 68(8), pp. 1033-1043, 2008.
[13] S. Jeyanthi and N. U. Maheswari, "QoS assertion in distributed systems based on content delivery network", International Conference on Computing, Communication and Networking (ICCCn 2008), 18-20 Dec. 2008, pp. 1-6.
[14] W. Stallings, "SSL: Foundation for Web Security", The Internet Protocol Journal, vol. 1, no. 1, pp. 20-29, 1998.
[15] M. Cieslak and D. Foster, "Web Cache Coordination Protocol V1.0", Internet Draft, work in progress, draft-ietf-wrec-web-pro-00.txt, June 1999.
[16] D. Wessels and K. Claffy, "Application of Internet Cache Protocol (ICP), Version 2", RFC 2187, Informational, September 1997.
[17] I. Melve, "Inter Cache Communications Protocols", Internet Draft, work in progress, draft-melve-intercache-comproto-00.txt, November 1998.
[18] R. Malpani, J. Lorch, and D. Berger, "Making World Wide Web caching servers cooperate", Proceedings of the 4th International WWW Conference, Boston, MA, pp. 107-117, Dec. 1995.
[19] A. Chankhunthod, P. B. Danzig, C. Neerdaels, M. F. Schwartz, and K. J. Worrell, "A hierarchical Internet object cache", Usenix '96, January 1996.
[20] S. Michel, K. Nguyen, A. Rosenstein, L. Zhang, S. Floyd, and V. Jacobson, "Adaptive Web caching: towards a new caching architecture", Computer Networks and ISDN Systems, pp. 107-117, November 1998.
[21] D. Povey and J. Harrison, "A distributed Internet cache", Proceedings of the 20th Australian Computer Science Conference, Sydney, vol. 38, issue 6, pp. 779-794, 2002.
[22] Z. Wang, "Cachemesh: a distributed cache system for the World Wide Web", Web Cache Workshop, pp. 1-10, 2003.
[23] U. Legedza and J. Guttag, "Using network-level support to improve cache routing", Computer Networks and ISDN Systems, vol. 30, no. 22-23, pp. 2193-2201, Nov. 1998.
[24] V. Valloppillil and K. W. Ross, "Cache Array Routing Protocol v1.0", Internet Draft draft-vinod-carp-v1-03.txt.
[25] L. Fan, P. Cao, J. Almeida, and A. Z. Broder, "Summary cache: a scalable wide-area Web cache sharing protocol", Proceedings of Sigcomm '98, vol. 8, issue 3, pp. 281-293, 2000.
[26] B. Bloom, "Space/time trade-offs in hash coding with allowable errors", Communications of the ACM, 13(7), pp. 422-426, July 1970.
[27] D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin, and R. Panigrahy, "Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web", STOC 1997, pp. 654-663, 1997.
[28] F. Douglis, A. Feldmann, B. Krishnamurthy, and J. Mogul, "Rate of change and other metrics: a live study of the World-Wide Web", Proceedings of the 1997 Usenix Symposium on Internet Technologies and Systems (USITS-97), pp. 147-158, Dec. 1997.
[29] T. M. Kroeger, D. D. E. Long, and J. C. Mogul, "Exploring the bounds of Web latency reduction from caching and prefetching", Proceedings of the 1997 Usenix Symposium on Internet Technologies and Systems, Monterey, CA, Dec. 1997.
[30] V. N. Padmanabhan and J. C. Mogul, "Using predictive prefetching to improve World Wide Web latency", Proceedings of Sigcomm '96, vol. 26, pp. 22-36, July 1996.