Karpagam JCS Vol. 2 Issue 3 Mar. - Apr. 2008

Trace Driven Simulation of GDSF# and Existing Caching Algorithms for Internet Web Servers

1 J B Patil, 2 B.V. Pawar

1 Department of Computer Engineering, R. C. Patel Institute of Technology, Shirpur (M.S.), India. E-mail: jbpatil@hotmail.com
2 Department of Computer Science, North Maharashtra University, Jalgaon (M.S.), India. E-mail: bvpawar@hotmail.com

ABSTRACT

Web caching is used to improve the performance of Internet Web servers. Document caching reduces the time a Web server takes to respond to client requests by keeping Web objects that are likely to be used in the near future in the main memory of the Web server and reusing them, and by reducing the volume of data transferred between the Web server and secondary storage. The heart of a caching system is its page replacement policy, which must make good replacement decisions when the cache is full and a new document needs to be stored. The latest and most popular replacement policies, such as GDSF, use the file size, access frequency, and age in the decision process.

The effectiveness of any replacement policy can be evaluated using two metrics: hit ratio (HR) and byte hit ratio (BHR). There is always a trade-off between HR and BHR [1]. In this paper, we apply trace-driven analysis to three different Web server logs to evaluate the effects of different replacement policies on the performance of a Web server. We propose a modification of the GDSF policy, GDSF#, which allows augmenting or weakening the impact of size or frequency or both on HR and BHR. Our simulation results show that the proposed replacement policy GDSF# gives close to perfect performance in both of the important metrics: HR and BHR.

Keywords: Web caching, replacement policy, hit ratio, byte hit ratio, trace-driven simulation

1. INTRODUCTION

The enormous popularity of the World Wide Web has caused a tremendous increase in network traffic due to HTTP requests. This has given rise to problems such as user-perceived latency, Web server overload, and backbone link congestion. Web caching is one of the ways to alleviate these problems [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. Web caches can be deployed throughout the Internet, from browser caches, through proxy caches, backbone caches, and reverse proxy caches, to Web server caches.

In our work, we use trace-driven simulation to evaluate the performance of different caching policies for Internet Web servers. Our study uses Web server traces from three different sites on the Internet.

Cao and Irani surveyed ten different policies and proposed a new algorithm, Greedy-Dual-Size (GDS), in [5]. The GDS algorithm uses document size, cost, and age in the replacement decision, and shows better performance than previous caching algorithms. In [4] and [12], frequency was incorporated into GDS, resulting in Greedy-Dual-Size-Frequency (GDSF) and Greedy-Dual-Frequency (GDF). While GDSF is credited with the best hit ratio (HR), it has only a modest byte hit ratio (BHR). Conversely, GDF yields the best BHR at the cost of the worst HR [12].

In this paper, we propose a new algorithm, called Greedy-Dual-Size-Frequency# (GDSF#), which allows augmenting or weakening the impact of size or frequency or both on HR and BHR. We compare GDSF# with GDS family algorithms: GDS(1), GDS(P), GDSF(1), and GDSF(P). Our simulation study shows that GDSF# gives close to perfect performance in both of the important metrics: HR and BHR.

The remainder of this paper is organized as follows. Section 2 introduces GDSF#, a new algorithm for Web cache replacement. Section 3 describes the simulation model for the experiment. Section 4 describes the experimental design of our simulation, while Section 5 presents the simulation results. We present our conclusions in Section 6.

2. GDSF# Algorithm (Our Proposed Algorithm)

In GDSF, the key value of document i is computed as follows [4] [12]:

$$K_i = L + \frac{f_i \times c_i}{s_i}$$

where $f_i$ is the access frequency of document i, $c_i$ is its cost, and $s_i$ is its size.

The inflation factor L is updated on every eviction to the priority of the evicted document i. In this way, L increases monotonically. However, the rate of increase is very slow. If a faster mechanism for increasing L is designed, it will lead to a replacement algorithm with features closer to LRU. We can apply similar reasoning to $s_i$ and $f_i$. If we augment the frequency by using powers such as $f_i^2$, $f_i^3$, …, etc. instead of $f_i$, then the impact of frequency is more pronounced than that of size. Similarly, if we use fractional powers such as $s_i^{0.1}$, $s_i^{0.2}$, …, etc., or use $\log(s_i)$ instead of the file size $s_i$, then the impact of size is less than that of frequency [4].

Extending this logic further, we propose an extension to GDSF, called GDSF#, where the key value of a document is computed as

$$K_i = L + \frac{f_i^{\lambda} \times c_i}{s_i^{\delta}}$$

where λ and δ are rational numbers. If we set λ or δ above 1, it augments the role of the corresponding parameter. Conversely, if we set λ or δ below 1, it weakens the role of the corresponding parameter.

Therefore, we present the GDSF# algorithm as shown below:

    begin
        Initialize L = 0
        Process each request document in turn:
            let current requested document be i
            if i is already in cache
                K_i = L + f_i^λ × c_i / s_i^δ
            else
                while there is not enough room in cache for i
                begin
                    let L = min(K_j), for all j in cache
                    evict j such that K_j = L
                end
                load i into cache
                K_i = L + f_i^λ × c_i / s_i^δ
    end

3. SIMULATION MODEL FOR THE EXPERIMENT

For Web servers, we assume a very simple Web server with a single-level file cache. When the Web server receives a request, it looks for the requested file in its file cache. A cache hit occurs if a copy of the requested document is found in the file cache. If the document is not found in the file cache (a cache miss), the document must be retrieved from the local disk or from secondary storage. On getting the file, the server stores a copy in its file cache so that further requests for the same document can be serviced from the cache. If the cache is already full when a file needs to be stored, it triggers the replacement policy.

Our model also assumes file-level caching. Only complete documents are cached; when a file is added to the cache, the whole file is added, and when a file is removed from the cache, the entire file is removed.

For simplicity, our simulation model completely ignores the issue of cache consistency (i.e., making sure that the cache has the most up-to-date version of the document, compared to the master copy at the original Web server, which may change at any time). Lastly, caching works only with static files; dynamic files, which have become more and more popular within the past few years, cannot be cached.

3.1 Workload Traces

In this study, logs from three different Web servers are used: a Web server from an academic institute, Symbiosis Institute of Management Studies, Pune; a Web server from a manufacturing company, Thermax, Pune; and a Web server for an E-Shopping site in the UK, www.wonderfulbuys.co.uk.

4. EXPERIMENTAL DESIGN

This section describes the design of the performance study of cache replacement policies. The discussion begins with the factors and levels used for the simulation. Next, we present the performance metrics used to evaluate each replacement policy in the study. Lastly, we discuss other design issues regarding the simulation study.

4.1 Factors and Levels

There are two main factors in the trace-driven simulation experiments: cache size and cache replacement policy. This section describes each of these factors and the associated levels.

Cache Size

The first factor in this study is the size of the cache. For the Web server logs, we have used seven levels from 1 MB to 64 MB. The upper bounds of cache size are chosen to represent an infinite cache size for the respective traces. An infinite cache is one that is so large that no file in the given trace, once brought into the cache, need ever be evicted. It allows us to determine the maximum achievable hit ratio and byte hit ratio, and to compare the performance of a smaller cache with that of an infinite cache.

Replacement Policy

In our research, we examine the following previously proposed replacement policies: GDS(1), GDS(P), GDSF(1), and GDSF(P). Our proposed policy GDSF# is also examined and evaluated against these policies.

Greedy-Dual-Size (GDS): GDS [5] maintains for each object a characteristic value $H_i$. A request for object i (new request or hit) requires a recalculation of $H_i$. $H_i$ is calculated as

$$H_i = L + \frac{c_i}{s_i}$$

L is a running aging factor, which is initialized to zero, $c_i$ is the cost to fetch object i from its origin server, and $s_i$ is the size of object i. GDS evicts the object with the smallest $H_i$ value. The value of this object is assigned to L. If the cost is set to 1, the policy becomes GDS(1), and when the cost is set to p = 2 + size/536, it becomes GDS(P).

Greedy-Dual-Size-Frequency (GDSF): GDSF [4] [12] calculates $H_i$ as

$$H_i = L + \frac{f_i \times c_i}{s_i}$$

It takes into account the frequency of reference in addition to size. Similar to GDS, we have GDSF(1) and GDSF(P).
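The replacement policies described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulator: it assumes the GDSF# key has the form K = L + f^λ × c / s^δ, the class name `GDSFSharpCache` and its fields are hypothetical, document sizes are assumed to be positive, and documents larger than the whole cache are simply not cached (an assumption the text does not address).

```python
from dataclasses import dataclass, field


@dataclass
class GDSFSharpCache:
    """Toy file cache using the key K = L + f**lam * c / s**delta."""
    capacity: int               # cache size in bytes
    lam: float = 1.0            # exponent on frequency (lambda)
    delta: float = 1.0          # exponent on size (delta)
    packet_cost: bool = False   # False: cost = 1; True: cost = 2 + size/536
    L: float = 0.0              # inflation (aging) factor
    used: int = 0               # bytes currently cached
    entries: dict = field(default_factory=dict)  # doc -> [key, freq, size]

    def _key(self, freq, size):
        cost = 2 + size / 536 if self.packet_cost else 1.0
        return self.L + (freq ** self.lam) * cost / (size ** self.delta)

    def request(self, doc, size):
        """Process one request; return True on a hit, False on a miss."""
        if doc in self.entries:
            entry = self.entries[doc]
            entry[1] += 1                          # one more reference
            entry[0] = self._key(entry[1], entry[2])
            return True
        if size > self.capacity:
            return False  # assumption: oversized documents are never cached
        while self.used + size > self.capacity:
            # Evict the document with the smallest key and raise L to it.
            victim = min(self.entries, key=lambda d: self.entries[d][0])
            self.L = self.entries[victim][0]
            self.used -= self.entries[victim][2]
            del self.entries[victim]
        self.entries[doc] = [self._key(1, size), 1, size]
        self.used += size
        return False
```

With `lam=1, delta=1` the key reduces to the GDSF form, and with `lam=0` the frequency term becomes 1, which gives the GDS form; `packet_cost` switches between the (1) and (P) cost variants. The linear scan for the minimum key keeps the sketch short; a priority queue would be the usual choice for large traces.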
4.2 Performance Metrics

The performance metrics used to evaluate the various replacement policies in this simulation are Hit Rate and Byte Hit Rate.

Hit Rate (HR): Hit rate (HR) is the ratio of the number of requests met in the cache to the total number of requests.

Byte Hit Rate (BHR): Byte hit rate (BHR) is concerned with how many bytes are saved. It is the ratio of the number of bytes satisfied from the cache to the total bytes requested.

5. SIMULATION RESULTS

This section presents the simulation results for the comparison of different file caching strategies. Section 5.1 gives the simulation results for the GDSF# algorithm. Section 5.2 shows the results for Web servers.

We show the simulation results of GDS(1), GDS(P), GDSF(1), GDSF(P), GDSF#(1), and GDSF#(P) for the Web server traces for hit rate and byte hit rate. The graph for Infinite indicates the performance for the infinite cache size.

FIFO is chosen as a representative of strategies that do not exploit any particular access pattern characteristics, and hence its performance can be used to gauge the benefits of exploiting any such characteristics.

5.1 Simulation Results for GDSF# Algorithm

In this section, we experiment with various values of λ and δ in the equation for computing the key value, $K_i = L + \frac{f_i^{\lambda} \times c_i}{s_i^{\delta}}$, to augment or weaken the impact of frequency and size in GDSF#.

Effect of Augmenting Frequency in GDSF#

If we add frequency to GDS to make it GDSF, it improves BHR considerably and HR slightly. To check whether we can further improve the performance, we set λ = 2, 5, 10 with δ = 1 in the equation for $K_i$. Figure 1 shows a comparison of GDSF(1) and GDSF#(1) with λ = 2, 5, 10 and δ = 1 for the three Web server traces. The results indicate that augmenting frequency in GDSF# improves BHR in all three traces, but the improvement comes at the cost of HR. Again, we find that λ = 2 gives the best results for BHR.

Figure 1: HR and BHR for GDSF# algorithm using Web server traces (λ = 2, 5, 10)

Effect of De-Augmenting Size in GDSF#

We have seen that emphasizing frequency in GDSF# improved BHR. Now let us check the effect of de-augmenting or weakening the size. For this, we set δ = 0.3, 0.6, 0.9 with λ = 1 in the equation for $K_i$. Figure 2 shows a comparison of GDSF(1) and GDSF#(1) with δ = 0.3, 0.6, 0.9 and λ = 1 for the three Web server traces. The results are along expected lines. Decreasing the impact of file size improves BHR across all three traces. Again, it is at the cost of HR. Specifically, we get the best BHR at δ = 0.3, and the best HR at δ = 0.9.

Figure 2: HR and BHR for GDSF# algorithm using Web server traces (δ = 0.3, 0.6, 0.9)

Effect of Augmenting Frequency & De-Augmenting Size in GDSF#

We have seen that emphasizing frequency and de-emphasizing size in GDSF# results in improved BHR, at the cost of a slight reduction in HR. The question is then whether we can achieve still better results by combining both augmenting frequency and de-augmenting size. For this, we try different combinations of λ and δ.

We find that we get the best results for both HR and BHR with the combination λ = 2 and δ = 0.9 for all three Web server traces. This combination shows close to perfect performance for both of the important metrics: HR and BHR.

This is an important result because, as noted earlier, there is always a trade-off between HR and BHR [2]. Replacement policies that try to improve HR do so at the cost of BHR, and vice versa [5]. Often, a high HR is preferable because it allows a greater number of requests to be serviced out of the cache, thereby minimizing the average request latency as perceived by the user. However, it is also desirable to maximize BHR to minimize disk accesses or outward network traffic.

In the next sections, we use the best combination of λ = 2 and δ = 0.9 in the equation for $K_i$ for GDSF# to compare the performance of GDSF# with GDS(1), GDS(P), GDSF(1), and GDSF(P). So, instead of denoting it as GDSF#(λ=2, δ=0.9), we will denote it simply as GDSF#.

5.2 Simulation Results for Web Servers

In this section, we present and discuss simulation results for the Thermax, Wonderfulbuys, and Symbiosis Web servers.

Simulation Results for Thermax

Figures 3a and 3b give the comparison of GDSF# with other algorithms.

Figure 3a: Hit rate of Thermax trace

Figure 3b: Byte Hit rate of Thermax trace

The results indicate that the HR achieved with an infinite sized cache is 98.71% while the BHR is 94.19% for the Thermax trace. Of the algorithms shown in Figure 3, GDSF(1) and GDSF#(1) had the highest and almost similar HRs. In case of BHRs, GDSF(P), GDSF#(1), and GDSF#(P) had the highest BHRs. However, GDSF(P) had a lower HR. GDSF# is thus optimized for both HR and BHR in case of the Thermax trace.

Simulation Results for Wonderfulbuys

Figures 4a and 4b show the performance graphically.

Figure 4a: Hit rate of Wonderfulbuys trace

Figure 4b: Byte Hit rate of Wonderfulbuys trace

The results indicate that the HR achieved with an infinite sized cache is 99.66% while the BHR is 99.27% for the Wonderfulbuys trace. Of the algorithms shown in Figure 4, GDSF(1) had the highest HR followed by GDSF#(1). GDS(1) and GDS(P) had lower HRs.

In case of BHRs, GDSF#(P) had the highest BHR followed by GDSF(P), GDSF#(1), and GDSF(1). However, GDSF# scores over the others with a better HR in case of the Wonderfulbuys trace.

Simulation Results for Symbiosis

Figures 5a and 5b show the performance graphically.

Figure 5a: Hit rate of Symbiosis trace

Figure 5b: Byte Hit rate of Symbiosis trace

The results indicate that the HR achieved with an infinite sized cache is 98.16% while the BHR is 95.87% for the Symbiosis trace. Of the algorithms shown in Figure 5, GDS(1) and GDSF(1) had the highest HR followed by GDSF#(1). GDS(1) had a lower HR.

In case of BHRs, GDSF#(1) and GDSF#(P) had the highest BHRs followed by GDSF(1) and GDSF(P). However, because of its comparatively better HR, GDSF# scores over most of the other algorithms in case of the Symbiosis trace.
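As a small illustration of the two metrics defined in Section 4.2, the sketch below computes HR and BHR from a list of per-request outcomes. The function name `hit_ratios` and the sample log are hypothetical and are not taken from the traces used in this study; the log is constructed only to show how the two metrics can diverge.

```python
def hit_ratios(events):
    """Return (HR, BHR) for an iterable of (size_in_bytes, was_hit) pairs."""
    requests = hits = total_bytes = hit_bytes = 0
    for size, was_hit in events:
        requests += 1
        total_bytes += size
        if was_hit:
            hits += 1
            hit_bytes += size
    if requests == 0 or total_bytes == 0:
        return 0.0, 0.0
    return hits / requests, hit_bytes / total_bytes


# Hypothetical log: three hits on small files and one miss on a large file.
log = [(1_000, True), (1_000, True), (1_000, True), (97_000, False)]
hr, bhr = hit_ratios(log)
```

In this made-up log three of the four requests are hits, so HR is 0.75, but the hits cover only 3,000 of the 100,000 requested bytes, so BHR is 0.03. A policy that favors small files can therefore score well on HR while scoring poorly on BHR.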
6. CONCLUSION

In this paper, we proposed a Web cache algorithm called GDSF#, which tries to maximize both HR and BHR. It incorporates the most important parameters of the Web traces: size, frequency of access, and age (using the inflation value, L) in a simple way.

We have compared GDSF# with some popular cache replacement policies for Web servers using a trace-driven simulation approach. We conducted several experiments using three Web server traces. The replacement policies examined were GDS(1), GDS(P), GDSF(1), GDSF(P), GDSF#(1), and GDSF#(P). We used the metrics Hit Ratio (HR) and Byte Hit Ratio (BHR) to measure and compare the performance of these algorithms. Our experimental results show that:

• As pointed out by Williams et al. in [11], the observed HRs can range from 20% to as high as 98%, with the majority ranging around 50%. The workload with a hit rate of 98% comes from a Web server cache, rather than a proxy cache. Our results are consistent with this finding.

• The results also indicate that it is more difficult to achieve high BHRs than high HRs. For example, in all three traces, the maximum BHR is always less than the maximum HR.

• The results are consistent across all three traces. GDSF# and GDSF show the best HR and BHR, significantly outperforming baseline algorithms like LRU and LFU for these metrics.

• Replacement policies emphasizing the document size yield better HR, but typically show poor BHR. The explanation is that in size-based policies, large files are always the potential candidates for eviction, and the inflation factor is advanced very slowly, so that even if a large file is accessed on a regular basis, it is likely to be evicted repeatedly. GDSF and GDSF# use frequency as a parameter in their decision-making, so popular large files have a better chance of staying in the cache. In addition, the inflation or ageing factor, L, is now advanced faster. GDSF and GDSF# show substantially improved BHR across all traces.

• Similarly, replacement policies giving importance to frequency yield better BHR because they do not discriminate against large files. These policies also retain popular objects (both small and large) longer than recency-based policies like LRU. However, these policies normally show poor HR because they do not take into account the file size, which results in a higher file miss penalty.

• We analyzed the performance of the GDSF# policy, which allows augmenting or weakening the impact of size or frequency or both on HR and BHR. Our results show that our proposed replacement policy gives close to perfect performance in both of the important metrics: HR and BHR.

7. REFERENCES

1. M. Arlitt, R. Friedrich, and T. Jin, "Workload Characterization of Web Proxy Cache Replacement Policies", In ACM SIGMETRICS Performance Evaluation Review, August 1999.

2. M. Abrams, C. R. Standridge, G. Abdulla, S. Williams, and E. A. Fox, "Caching Proxies: Limitations and Potentials", In Proceedings of the Fourth International World Wide Web Conference, PP 119-133, Boston, MA, December 1995.

3. M. Arlitt and C. Williamson, "Trace Driven Simulation of Document Caching Strategies for Internet Web Servers", Simulation Journal, Vol. 68, No. 1, PP 23-33, January 1997.

4. L. Cherkasova, "Improving WWW Proxies Performance with Greedy-Dual-Size-Frequency Caching Policy", In HP Technical Report HPL-98-69(R.1), November 1998.

5. P. Cao and S. Irani, "Cost-Aware WWW Proxy Caching Algorithms", In Proceedings of the USENIX Symposium on Internet Technology and Systems, PP 193-206, December 1997.

6. S. Jin and A. Bestavros, "GreedyDual*: Web Caching Algorithms Exploiting the Two Sources of Temporal Locality in Web Request Streams", In Proceedings of the 5th International Web Caching and Content Delivery Workshop, 2000.

7. S. Podlipnig and L. Boszormenyi, "A Survey of Web Cache Replacement Strategies", ACM Computing Surveys, Vol. 35, No. 4, PP 374-398, December 2003.

8. L. Rizzo and L. Vicisano, "Replacement Policies for a Proxy Cache", IEEE/ACM Transactions on Networking, Vol. 8, No. 2, PP 158-170, April 2000.

9. A. Vakali, "LRU-based Algorithms for Web Cache Replacement", In International Conference on Electronic Commerce and Web Technologies, Lecture Notes in Computer Science, Vol. 1875, PP 409-418, Springer-Verlag, Berlin, Germany, 2000.

10. R. P. Wooster and M. Abrams, "Proxy Caching that Estimates Page Load Delays", In Proceedings of the Sixth International World Wide Web Conference, PP 325-334, Santa Clara, CA, April 1997.

11. S. Williams, M. Abrams, C. R. Standridge, G. Abdulla, and E. A. Fox, "Removal Policies in Network Caches for World-Wide-Web Documents", In Proceedings of ACM SIGCOMM, PP 293-305, Stanford, CA, 1996, Revised March 1997.

12. M. F. Arlitt, L. Cherkasova, J. Dilley, R. J. Friedrich, and T. Y. Jin, "Evaluating Content Management Techniques for Web Proxy Caches", ACM SIGMETRICS Performance Evaluation Review, Vol. 27, No. 4, PP 3-11, March 2000.

Author's Biography

Prof. J. B. Patil received his BE (Electronics) degree from the SGGS College of Engineering & Technology, Nanded in 1986; M.Tech. (Computer Science & Data Processing) from Indian Institute of Technology, Kharagpur in 1993; and has submitted his Ph.D. thesis in Computer Science to the North Maharashtra University, Jalgaon in January 2008. He has been working as Principal and Professor in the Department of Computer Engineering, R. C. Patel Institute of Technology, Shirpur since 2001. His current research interest is in the area of Internet Web caching and prefetching. He has authored and co-authored over 20 papers in refereed academic journals and national/international conference proceedings. He is a life member of Computer Society of India (CSI), Institute of Engineers (India), and Indian Society for Technical Education (ISTE).

Prof. B. V. Pawar received his B.E. degree in Production Engineering from the V.J.T.I., Mumbai, in 1986; M.Sc. degree in Computer Science from the Mumbai University, Mumbai, in 1988; and Ph.D. degree in Computer Science from the North Maharashtra University, Jalgaon in 2000. He is presently Professor in the Department of Computer Science, North Maharashtra University, Jalgaon, where he has been involved in teaching and research since 1991. His current research interests are in the areas of natural language processing and information retrieval. He has authored and co-authored over 60 papers in refereed academic journals and national/international conference proceedings. He is a life member of Computer Society of India (CSI), India and Linguistic Society of India (LSI).
An Extended MD5 (ExMD5) Hashing Algorithm For Better Data Integrity

S. Karthikeyan

Reader and Head, Department of Computer Science, Karpagam Arts and Science College (Autonomous), Coimbatore-21.

ABSTRACT

Recent developments in the communication of scientific and engineering applications are playing a major role in online transactions. All transactions across the globe are shared over networks. Integrity ensures that only authorized parties are able to modify the transmitted information. Modification includes writing, changing, changing status, and delaying or replaying transmitted messages. The message digest functions MD2, MD4 and MD5 provide one-way integrity functions; MD2 operates on octets, while MD4 and MD5 operate on 32-bit words. The extended MD5 (ExMD5) was designed to be somewhat more conservative than MD5 in terms of being more concerned with security. The extended MD5 hashing algorithm makes five passes, whereas MD5 makes only four. In the proposed approach, the message is processed in 512-bit blocks and the message digest is a 128-bit quantity. Each stage consists of computing a function based on the 512-bit message chunk and the message digest to produce a new intermediate value. The value of the message digest is the output of the final block of the message. The extended MD5 hashing provides better data integrity, and the resulting values are compared against existing hashing algorithms such as MD4 and MD5.

Key Words: Hashing Algorithm, one-way function, message digest and integrity.

1. INTRODUCTION

The Trusted Network Interpretation [4] points out that integrity ensures that computerized data are the same as those in the source documents and have not been exposed to accidental or malicious alteration or destruction. Integrity ensures that the data are precise, accurate, unmodified, meaningful and usable. The operating system, the database management system and the network enforce integrity.

The integrity of data is as important as its confidentiality. In some situations, such as passing authentication data, integrity is paramount. In other cases, the need for integrity is less obvious. Some of the threats to message integrity are falsification of messages and noise. Hackers may falsify messages during transmission using an active wiretap. Signals sent over communication media are subject to interference from other traffic on the same media, as well as from natural sources such as lightning and electric motors. Such unintentional interference is called noise. These forms of noise are inevitable, and they can threaten the integrity of the data in a message.

Integrity relies on a one-way hash function that generates a checksum of the message [2]. When the receiver gets the data, it hashes the data as well and compares the two sums; if they match, the data is unaltered. Integrity is implemented through the use of a Message Authentication Code (MAC) and hash or message digest functions. The MAC is a cryptographically generated fixed-length quantity that is associated with a message to reassure the recipient that the message is genuine. Examples of message digest functions are MD2, MD4 and MD5; MD2 operates on octets, while MD4 and MD5 operate on 32-bit words. MD5 was designed to be somewhat more conservative than MD4 in terms of being more concerned with security.

1.1. Hash Function

A hash function maps a message of any length into a fixed-length hash value or message digest. The hash value depends on the input value. It provides an error detection capability [3]. A cryptographic function, such as DES or AES, is especially appropriate for sealing values, since an outsider will not know the key and thus will not be able to modify the stored value. A change in one or more bits of the message results in a change in the hash code. A hash function is an easily computable map ƒ: x → h from a very long input 'x' to a much shorter output 'h'. For a hash function, it is not computationally feasible to find two different inputs x and x1 with the same hash value, that is, such that ƒ(x) = ƒ(x1). The most widely used cryptographic hash functions are MD4, MD5 and the Secure Hash Algorithm (SHA). The authors of [5] describe the following properties that are important for a hash function 'h'.

• The function 'h' can be applied to a block of data of any size.

• The function 'h' produces a fixed-length output.

• The value h(x) is relatively easy to compute for any given 'x'.

• For any given hash value 'v', it is computationally infeasible to find 'x' such that h(x) = v.

• For any given block 'x', it is computationally infeasible to find y ≠ x such that h(y) = h(x).

The main usage of a hash function 'h' is to maintain confidentiality and authentication between sender and receiver. The simplest hash function is the bit-by-bit exclusive-OR (XOR) of every block. This operation can be expressed as follows:

$$M_i = C_{i1} \oplus C_{i2} \oplus \dots \oplus C_{im}$$

where

$M_i$ = ith bit of the hash code, 1 ≤ i ≤ n
m = number of n-bit blocks in the input
$C_{ij}$ = ith bit in the jth block
⊕ = XOR operation

This operation produces a simple parity for each bit position. This process is known as a longitudinal redundancy check.
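The longitudinal redundancy check described above can be sketched in Python as follows. This is an illustration only: the function name `xor_hash`, the 4-octet default block size, and the zero-padding of a short final block are assumptions made for the example, not details given in the text.

```python
def xor_hash(message: bytes, block_size: int = 4) -> bytes:
    """XOR all block_size-octet blocks of the message together."""
    # Assumption: a short final block is padded with zero octets.
    if len(message) % block_size:
        message += b"\x00" * (block_size - len(message) % block_size)
    digest = bytes(block_size)  # all-zero starting value
    for start in range(0, len(message), block_size):
        block = message[start:start + block_size]
        # Bit i of the digest is the parity of bit i across all blocks.
        digest = bytes(a ^ b for a, b in zip(digest, block))
    return digest
```

Because XOR is commutative, reordering the blocks of a message leaves this hash unchanged: for example, `xor_hash(b"ABCDEFGH")` equals `xor_hash(b"EFGHABCD")`. It is therefore easy to find two different inputs with the same value, which is why this simple parity check detects random errors but does not satisfy the infeasibility properties listed for a cryptographic hash function.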
2. LITERATURE REVIEW

The following hash functions and their usages are discussed by the authors of [9].

The scheme S → R : M||E(PRs,H(M)) provides authentication and a digital signature [7]. The hash code of the message 'M' is encrypted using the private key of the sender, PRs. Thus, E(PRs,H(M)) is a function of a variable-length message 'M' and the private key PRs, and it produces a fixed-size output. The receiver decrypts the hash value received from the sender using the public key of the sender, PUs, and compares it with the hash of the received message M. If they are equal, the integrity is proved.

The scheme S → R : E(K,(M||H(M||SV))) provides authentication and confidentiality. The technique assumes that the two communicating parties share a common secret value (SV) [11]. The sender 'S' computes the hash value over the concatenation of 'M' and 'SV' and appends the resulting hash value to 'M'. Then the entire message with the hash code is encrypted using the shared key 'K'. The receiver decrypts the entire encoded message using the shared secret key 'K'. The receiver extracts the original message 'M', concatenates it with the secret value, calculates the hash value for 'M||SV', and compares it with the received hash value. In this approach, both authentication and confidentiality are provided. Various authors discuss different approaches related to hashing algorithms.

Anderson and Math [9] provide information on the classification of hash functions. There are many applications for which one-way hash functions are required. Digital signatures are one example; it is usually not practical to sign the whole message, as public key algorithms are rather slow, so the normal practice is to hash a message to a digest of between 128 and 160 bits, and sign this instead. The paper describes the real requirements of hash functions, such as the collision freedom, complementation freedom, addition freedom and multiplication freedom properties. These freedom properties are central to controlling interactions between cryptographic algorithms, and have the potential to be useful in algorithm design as well.

Anderson and Biham [8] present a new hash function, called Tiger, which is designed for 64-bit processors and to be more secure than MD4, SHA-1 and Snefru-8. The next generation of processors has 64-bit words, and the older hash functions could not be implemented efficiently on them. The Tiger hash function is stronger and faster than SHA-1 on 32-bit processors and about three times faster on 64-bit processors. It outputs a 192-bit hash value, which can be truncated to 128 bits or 160 bits for compatibility with existing applications. However, the Tiger hash function has more passes than existing hash functions.

Bellare, Canetti and Krawczyk [6] present new, simple and practical constructions of message authentication schemes based on a cryptographic hash function. The paper describes basic properties of cryptographic hash functions and keyed hash functions. It also describes various attacks on message authentication codes and provides solutions.

Bosselaers [1] presents a performance improvement of 15% for the MD4 family of hash functions. The improvement is obtained by substituting n-cycle instructions with (n-1)-cycle instructions and by reducing the number of instructions. The paper also compares the performance improvement on MD4, MD5, and SHA-1. The comparison is done on 90 MHz Pentium processors.

Challal, Bouabdallah and Bettahar [12] describe a hybrid hash-chaining scheme in conjunction with an adaptive and efficient data source authentication protocol, which tolerates packet loss and guarantees non-repudiation of media-streaming origin. The hybrid hash chaining uses the following terminology: if a packet Pj contains the hash of a packet Pi, a hash link connects Pi to Pj. The target of Pi is Pj. A signature packet is a sequence of packet hashes, which are signed using a conventional digital signature scheme. A hash link relates the packet to the signature packets. This protocol saves bandwidth and improves the probability that a packet is verifiable even if some packets are lost.

Cao, Lin and Xue [10] provide information on a secure randomized RSA-based blind signature scheme. The blind signature scheme can yield a signature and message pair whose information does not leak to the signer. When blind signatures are used to design e-cash schemes, there are two problems: double spending and accuracy of signer information. The scheme proposed in that paper satisfies the blindness and unforgeability properties. The computation cost of the scheme is six modular exponentiations, six modular multiplications, three hashing operations and two random number generations performed by the user to obtain and verify a signature. The paper uses one-way public hash functions such as MD4 and SHA-1.
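The shared-secret scheme M||H(M||SV) discussed in this review can be sketched with Python's standard `hashlib` and `hmac` modules. This is a hedged illustration under stated assumptions: SHA-256 stands in for the hash function H, the function names `seal` and `verify` are hypothetical, and the outer encryption step E(K, ·) is omitted because the standard library provides no cipher, so only the integrity and authentication check is shown.

```python
import hashlib
import hmac

DIGEST_LEN = hashlib.sha256().digest_size  # 32 octets for SHA-256


def seal(message: bytes, secret: bytes) -> bytes:
    """Sender side: append H(M || SV) to the message M."""
    return message + hashlib.sha256(message + secret).digest()


def verify(sealed: bytes, secret: bytes):
    """Receiver side: return M if the appended hash matches, else None."""
    message, tag = sealed[:-DIGEST_LEN], sealed[-DIGEST_LEN:]
    expected = hashlib.sha256(message + secret).digest()
    # compare_digest performs the comparison in constant time.
    return message if hmac.compare_digest(tag, expected) else None
```

A receiver that holds the same secret value recovers the message from `verify(seal(m, sv), sv)`, while a message altered in transit or checked with a different secret value yields `None`.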
3. EXTENDED MD5 HASHING ALGORITHM

The Extended MD5 algorithm (ExMD5) takes as input a message of arbitrary length and produces as output a 128-bit message digest of the input. The algorithm is designed so that it is computationally infeasible to produce two messages having the same message digest, or to produce any message having a given pre-specified target message digest. The extended MD5 algorithm is an extension of the MD5 message digest algorithm. The proposed algorithm is more secure and conservative in design than MD5. It is intended for digital signature applications, where it identifies the correct sender.

The extended MD5 algorithm is designed to be quite secure compared to existing hash algorithms like MD2, MD4, and MD5. In addition, it does not require any large substitution tables; the algorithm can be coded quite compactly. The extended MD5 algorithm is similar to MD4 and MD5. The major differences are:

• MD4 makes three passes over each 16-word (512-bit) chunk of the message and MD5 makes four passes over every 16-word chunk of the message. The extended MD5 makes five passes over each 16-word chunk.

• The functions are slightly different in the number of bits in the shifts.

• MD4 has one constant, which is used for each message word in pass 2, and a different constant used for all of the 16 message words in pass 3. No constant is used in pass 1.

• The extended MD5 uses a different constant for each message word on each pass.

3.1. Algorithm Description

The task is to find the message digest for a 'b'-bit message input. Here 'b' is an arbitrary non-negative integer; 'b' may be zero; it need not be a multiple of eight, and it may be arbitrarily large. The bits of the message are written down as follows:

m_0 m_1 ... m_{b-1}

Append Padding Bits

The message fed into the message digest computation must be a multiple of 512 bits in length. The following steps are performed for message padding:

Step 1: The original message is padded by adding a 1-bit, followed by enough 0-bits to leave the message 64 bits less than a multiple of 512 bits.

Step 2: Then a 64-bit quantity representing the number of bits in the unpadded message is appended to the message.

The following Figure 1 represents the padding process.

Original Message | 1000…000 (1-512 bits) | Original length in bits (64 bits)

Figure 1: Extended MD5 message padding

Within each octet, the bits are ordered from the most significant bit to the least significant bit, and the octets are ordered from the least significant to the most significant.
MD4 has one constant, which is used for each message word in pass 2, and different constant used Overview of extended MD5 Message Digest Computation for the entire 16 message words in pass 3. No constant In extended MD5, the message is processed in 512-bit is used in pass 1. blocks (sixteen 32-bit words). The following figure2 will 587 Karpagam JCS Vol. 2 Issue 3 Mar. - Apr. 2008 illustrate this process. The message digest is a 128-bit Where S1 (i) = 7+5i, so the ‘S’ cycle over the values quantity (four 32-bit words). Each stage consists of 7,12,17,22. This is different ‘S1’ from that in MD4. The computing a function based on the 512-bit message first few steps of the pass are as follows: chunk and the message digest to produce a new d0 = d1 + (d0 + F(d1, d2, d3) + m0 + T1) = 7 intermediate value for the message digest. The value of d3 = d0 + (d3 + F(d0, d1, d2) + m1 + T2) = 12 the message digest is the result of the output of the final block of the message. d2 = d3 + (d2 + F(d3, d0, d1) + m2 + T3) = 17 Each stage in extended MD5 takes five passes over the d1 = d2 + (d1 + F(d2, d3, d0) + m3 + T4) = 22 message block (as opposed to four for MD5). As with d0 = d1 + (d0 + F(d1, d2, d3) + m4 + T5) = 7 MD4, at the end of the stage, each word of the modified message digest is added to the corresponding pre-stage Step 2: Extended MD5 Message Digest Pass 2 message digest value. In MD4, before the first stage, the message digest is initialized to d0 = 6745230116, d1 = ∧z) v (y∧ ∧~z). Whereas, A function G(x, y, z) is defined as (x∧ efcdab8916, d2 = 98badcfe16, and d3 = 1032547616. As with the function ‘G’ is different in extended MD5 to the ‘G’ extended MD5, each pass modifies d0, d1, d2, d3 using m0, function in MD4. A separate step is done for each of the m1, m2,…m15. The following five steps are performed to 16 words of the message. For each integer ‘i’ from 0 compute the message digest of the message. 
through 15, d(-i) ∧3=d(1-i)∧3+(d(-i)∧3+H(d(1-i)∧3,d(2-i)∧3,d(3-i)∧3) + m(3i + 5)∧15 +Ti+33) Constant Padded Message Where S2 (i) = i(i+7)/2 + 5; so, the ‘S’ cycle over the values 5, 9, 14, 20. This is a different S2 from that in MD4. The Dig first est few steps of the pass are as follows: … 512 bits d0 = d1 + (d0 + G(d1, d2, d3) + m1 + T17) = 5 Dig d3 est = d0 + (d3 + G(d0, d1, d2) + m6 + T18) = 9 d2 = d3 + (d2 + G(d3, d0, d1) + m11 + T19) = 17 512 bits Dig d1est = d2 + (d1 + G(d2, d3, d0) + m0 + T20) = 22 Figure2 : Entire processes for extended MD5 d0 = d1 + (d0 + G(d1, d2, d3) + m5 +T21) = 5 Message Step 1: Extended MD5 Message Digest Pass 1 Step 3: Extended MD5 Message Digest Pass 3 For each integer ‘i’ from 0 through 15, ⊕y⊕ ⊕z. A separate step A function H(x, y, z) is defined as x⊕ d (-i)∧3 = d (i-i)∧3 + (d (-i)∧3 + F(d (1-i)∧3 ,d (2-i)∧3 ,d (3-i)∧3 ) + is done for each of the 16 words of the message. For each mi + Ti+1) integer ‘i’ from 0 through 15, 588 An Extended MD5 (ExMD5) Hashing Algorithm For Better Data Integrity d(-i)∧3= d(1-i)∧3+ (d(-i)∧3+ H(d(1-i)∧3,d(2-i)∧3,d(3-i)∧3) + m(3i + 5)∧15 + S5= ((bit(∑S4&63))+1) Ti+33) The S4 is calculated by bit(∑S4) di, where i = 0 to 15. For Where S3(0)=4, S3(1)=11, S3(2)=16, S3(0)=23; so, the ‘S’ example, the cycle over the values 4, 11, 16, 23. This is a different S3 is 110100002 and binary value of 63 is 1111112. from that of the MD4. The first few steps of the pass are Step 1: Extract 6-LSB bits from 110100002. as follows: Step 2: Bit-wise AND operation between 1111112 d0=d1+(d0+H(d1,d2,d3)+m5+T33) = 4 and 0100002. d3=d0+(d3+H(d0,d1,d2)+m8+T34)= 11 Step 3: The result is 0100002 is added with d2=d3+(d2+H(d3,d0,d1)+m11+T35)= 16 0000012 get 0100012. d1=d2+(d1+H(d2,d3,d0)+m14+T36)= 23 Step 4: The value 0100012 is XOR with 0100002. The output will be 0000012. 
d0=d1+(d0+H(d1,d2,d3)+m1+T37) = 4 Step 5: The output value 0000012 is converted Step 4: Extended MD5 Message Digest Pass 4 into decimal value 110 and this value is used as an array A function I(x, y, z) is defined as y⊕(x v ~z). A separate index in Ti and the final value got is ‘d76aa478’. step is done for each of the 16 words of the message. For The extended hash function uses five passes and each integer ‘i’ from 0 through 15, d(-i) ∧3= d(1-i)∧3+ (d(l-i)∧3+ (d(-i)∧3+I(d(l-i)∧3,d(2-i)∧3, d(3-i)∧3) + m(7i)∧15 + Ti+49) S4 = 208 means, the binary value for 20810 provides the better hash values for input messages. ∑ 4. RESULT AND DISCUSSION The extended MD5 for unique hash value has been Where S4(i)=(i+3)(i+4)/2; so, the ↵s cycle over the 6, 10, implemented and the performance has been compared 15, 21. The first few steps of the pass are as follows: with MD5 and MD4 message digest hashing algorithms. d0= d1+( d0+I(d1, d2, d3)+m0+T49) = 6 The results were significant in terms of speed and accuracy. The extended MD5 algorithm has been d3= d0+( d3+I(d0, d1, d2)+m7+T50) = 10 designed and implemented on JAVA under Windows XP d2= d3+( d2+I(d3, d0, d1)+m14+T51) = 15 operating system using synthetic data. The extended MD5 quality is measured in terms of speed, which differs d1= d2+( d1+I(d2, d3, d0)+m5+T52) = 21 from algorithm to algorithm. d0= d1+( d0+I(d1, d2, d3)+m12+T53) = 6 Step 5: Extended MD5 Message Digest Pass 5 A separate step is done for this phase. The stage ‘S5’ is calculated by Figure3: Screen view of extended MD5 Algorithm 589 Karpagam JCS Vol. 2 Issue 3 Mar. - Apr. 2008 The above figure3 illustrates the screen view of proposed It also noted that there is a difference of values for certain extended MD5 algorithm. The user enters the two source runs between extended MD5 and MD5. The time values or selects the file, which is used as an input for difference between Extended MD5 and MD4 is very high. 
extended MD5 maximum 128-bit with the help of browse The following table 2 shows the mean and standard button. The hash value calculation button calculating deviation of time for Extended MD5 and the MD5. the unique hash value. Table 2: Summary statistics of extended MD5 in terms of time The following table1 shows the overall performance results obtained for various runs using the following Statistics Function hardware configuration: ExMD5 (in Seconds) MD5 (in Seconds) Processor : Intel Pentium IV 3.06 GHz Mean 0.0045 0.0059 RAM Standard Deviation 0.002799 0.003178 : 512 MB RAM Hard Disk : 80 GB HDD Table 1: Performance comparison of extended MD5 It is observed from the above table2 that the mean of in terms of time extended MD5 is low and standard deviation of extended MD5 is less than the MD5 and therefore, the extended Extended MD5 (in Seconds) MD5 (in Seconds) MD4 (in Seconds) 1 0.001 0.001 0.010 2 0.001 0.003 0.011 3 0.005 0.004 0.018 4 0.004 0.006 0.025 5 0.004 0.009 0.032 6 0.007 0.009 0.037 7 0.008 0.010 0.044 8 0.009 0.003 0.057 9 0.004 0.005 0.062 MD5 is more consistent. The following figure4 shows the graph of the time for extended MD5, MD5 and MD4 message digest algorithms in hash value calculation. Performance Graph of Extended MD5 in terms of Time 0.08 0.07 Time in Seconds Run Extended MD5 0.06 0.05 MD5 0.04 MD4 0.03 0.02 0.01 0 1 2 3 4 5 6 7 8 9 10 Run 10 0.002 0.009 0.073 Figure4: Performance Graph of extended MD5 in The performance with respect to the hash value terms of time calculation time is noted for up to 10 runs for same inputs. This table is used to analyze results about the speed and The pictorial representation of the above figure4 clearly performance of the proposed extended MD5 algorithm. shows that the speed of extended MD5 is greater than the other two hashing algorithms. 
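As a cross-check, the summary statistics of Table 2 can be recomputed from the ten runs listed in Table 1. The following Python sketch is illustrative only: the variable and function names are not from the paper, and it assumes the sample standard deviation (n - 1 in the denominator), which is the convention that appears to match the published figures.

```python
from statistics import mean, stdev

# Hash calculation times in seconds for runs 1-10, copied from Table 1.
exmd5_times = [0.001, 0.001, 0.005, 0.004, 0.004, 0.007, 0.008, 0.009, 0.004, 0.002]
md5_times = [0.001, 0.003, 0.004, 0.006, 0.009, 0.009, 0.010, 0.003, 0.005, 0.009]


def summarize(times):
    """Return (mean, sample standard deviation) of a list of timings."""
    return mean(times), stdev(times)


exmd5_mean, exmd5_sd = summarize(exmd5_times)
md5_mean, md5_sd = summarize(md5_times)

print(f"ExMD5: mean={exmd5_mean:.4f} s, sd={exmd5_sd:.6f} s")
print(f"MD5:   mean={md5_mean:.4f} s, sd={md5_sd:.6f} s")
```

Rounded to the precision used in Table 2, these values agree with the published mean and standard deviation for both algorithms.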
It is observed from Table 1 that the time taken by extended MD5 and MD5 is nearly the same for various runs. From the accuracy point of view, the proposed extended MD5 has one more pass than MD5 and provides higher security than MD5 and MD4.

5. CONCLUSION

In network communication, data transmission faces a great threat from attackers. Even though the attackers cannot read the content of a message, they are capable of changing the content. The economic liability of cyber crimes is also expected to increase two-to-three fold every year. Most cyber crimes go undetected. Security still remains at risk. The focus of this research paper is mainly on integrity using the extended MD5 hashing algorithm. The extended MD5 hashing algorithm executes five passes and produces a unique hash value for each message. This algorithm has been implemented in Java on the Windows XP operating system. The proposed extended MD5 algorithm has been applied in various online transactions to ensure better integrity than MD5.

6. REFERENCES

[1] Antoon Bosselaers, "Even faster hashing on Pentium", Proceedings of Eurocrypt'97, Lecture Notes in Computer Science, Springer-Verlag, Vol. 1233, PP 16-17, 1997.
[2] Jie Liang, and Xue-jie Lai, "Improved collision attack on hash function MD5", Journal of Science and Technology, Springer-Verlag, Vol. 22, No. 1, PP 79-87, 2007.
[3] Jose L Munoz, Jordi Forne, Oscar Esparza, and Miguel Soriano, "Certificate revocation system implementation based on the Merkle hash tree", International Journal of Information Security, Springer-Verlag, Vol. 2, No. 2, PP 110-124, 2004.
[4] Karl Krukow, and Mogens Nielsen, "Trust structures – denotational and operational semantics", International Journal of Information Security, Springer-Verlag, Vol. 6, No. 3, PP 153-181, 2007.
[5] Lars R. Knudsen, Xuejia Lai, and Bart Preneel, "Attacks on fast double block length hash functions", Journal of Cryptology, Springer-Verlag, Vol. 11, No. 1, PP 59-72, 1998.
[6] Mihir Bellare, Ran Canetti and Hugo Krawczyk, "Keying hash functions for message authentication", Proceedings of the 16th Annual International Cryptology Conference on Advances in Cryptology, Lecture Notes in Computer Science, Vol. 1109, PP 1-15, 1996.
[7] Phong Q. Nguyen, and Igor E. Shparlinski, "The insecurity of digital signature algorithms with partially known nonces", Journal of Cryptology, Springer-Verlag, Vol. 15, No. 3, PP 151-176, 2002.
[8] Ross Anderson and Eli Biham, "Tiger: A fast new hash function", citeseer.ist.psu.edu/anderson96tiger.html, PP 1-13, 1996.
[9] Ross Anderson and Math .C, "The classifications of hash functions", citeseer.ist.psu.edu/33025.html, PP 1-11, 1993.
[10] Tinjie Cao, Dongdai Lin, and Rui Xue, "A randomized RSA-based blind signature scheme for electronic cash", Computers and Security, Elsevier, Vol. 24, No. 1, PP 44-49, 2005.
[11] Wen-Ai Jackson, Keith M. Martin, and Christine M. O'Keefe, "Mutually trusted authority-free secret sharing schemes", Journal of Design Codes and Cryptography, Springer-Verlag, Vol. 10, No. 4, PP 261-289, 1997.
[12] Yacine Challal, Abdelmadjid Bouabdallah, and Hatem Bettahar, "Hybrid hash chaining scheme for adaptive multicast source authentication of media-streaming", Computers and Security, Elsevier, Vol. 24, No. 1, PP 57-68, 2005.
[13] Theodosios Tsiakis, and George Sthephanides, "The concept of security and trust in electronic payment systems", Computers and Security, Elsevier, Vol. 24, No. 1, PP 10-15, 2005.
[14] Thomas Wu, "The secure remote password protocol", IEEE Journal on Selected Areas in Communication, IEEE Press, Vol. 11, No. 5, PP 648-656, 1993.
[15] Tian-Fu Lee, Chai-Chaw Chang, and Tzonelih Hwang, "Private authentication techniques for global mobility networks", Journal of Wireless Personal Communications, Springer-Verlag, Vol. 35, No. 4, PP 329-336, 2005.
[16] Stephanie Alt, "Authenticated hybrid encryption for multiple recipients", IEEE Journal on Selected Areas in Communications, IEEE Press, Vol. 11, No. 5, PP 156-182, 2006.
[17] Steven M Bellovin, and Michael Merritt, "Encrypted key exchange: password-based protocols secure against dictionary attacks", Proceedings of the IEEE Symposium on Research in Security and Privacy, Oakland, PP 48-56, 1992.

Author's Biography

Dr. S. Karthikeyan received the Doctorate Degree in Computer Science and Engineering from the Alagappa University, Karaikudi in 2008. He is currently working as a Reader and Head in the Department of Computer Science, Karpagam Arts and Science College (Autonomous), Coimbatore. His research interests include Network Security using Cryptography.
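The message padding of Section 3.1 (a 1-bit, then 0-bits up to 64 bits short of a multiple of 512, then the 64-bit original length) can be sketched in Python as follows. This is a minimal sketch under two assumptions that the paper does not state explicitly: the message is a whole number of octets (the paper allows arbitrary bit lengths), and the 64-bit length is stored least significant octet first, as in MD5. The name `pad_message` is hypothetical.

```python
def pad_message(message: bytes) -> bytes:
    """Pad a byte-aligned message to a multiple of 512 bits (64 octets).

    Assumptions (not taken from the paper): whole-octet input and a
    little-endian 64-bit length field, as in MD5.
    """
    # Length of the unpadded message in bits, reduced modulo 2**64.
    bit_length = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF

    # Step 1: a single 1-bit (0x80 = 10000000 in binary) followed by 0-bits
    # until the length is 64 bits (8 octets) short of a multiple of 512 bits.
    padded = message + b"\x80"
    while len(padded) % 64 != 56:
        padded += b"\x00"

    # Step 2: append the original length in bits as a 64-bit quantity.
    padded += bit_length.to_bytes(8, "little")
    return padded


if __name__ == "__main__":
    for sample in (b"", b"abc", b"a" * 55, b"a" * 56):
        print(len(sample), "octets ->", len(pad_message(sample)), "octets")
```

Because at least one padding bit is always added, a message whose length is already 448 bits modulo 512 receives a full extra block, which is consistent with the 1 to 512 padding bits shown in Figure 1.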
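The per-pass schedules of Section 3.1 (rotation amounts, message word order) and the worked pass 5 example can be checked with a short Python sketch. Everything named here (`shift`, `word_index`, `T`, `pass5_index`) is hypothetical and not from the paper. Two assumptions are made: the constants follow the MD5 convention T_i = floor(2^32 * |sin i|), which is consistent with the value 'd76aa478' quoted for index 1, and the pass 5 index is computed exactly as in the five-step worked example (mask with 63, add 1, XOR with the masked value). This is one reading of the text, not a definitive implementation of ExMD5.

```python
import math

MASK32 = 0xFFFFFFFF  # all arithmetic is on 32-bit words


def rotl32(x: int, s: int) -> int:
    """Rotate the 32-bit word x left by s bits."""
    x &= MASK32
    return ((x << s) | (x >> (32 - s))) & MASK32


def G(x: int, y: int, z: int) -> int:
    """Pass 2 function: (x AND z) OR (y AND NOT z)."""
    return ((x & z) | (y & ~z)) & MASK32


def H(x: int, y: int, z: int) -> int:
    """Pass 3 function: x XOR y XOR z."""
    return (x ^ y ^ z) & MASK32


def I(x: int, y: int, z: int) -> int:
    """Pass 4 function: y XOR (x OR NOT z)."""
    return (y ^ (x | (~z & MASK32))) & MASK32


def shift(pass_no: int, i: int) -> int:
    """Rotation amount S_pass(i); it depends only on i AND 3."""
    j = i & 3
    if pass_no == 1:
        return 7 + 5 * j
    if pass_no == 2:
        return j * (j + 7) // 2 + 5
    if pass_no == 3:
        return (4, 11, 16, 23)[j]
    if pass_no == 4:
        return (j + 3) * (j + 4) // 2
    raise ValueError("shift schedules are only given for passes 1-4")


def word_index(pass_no: int, i: int) -> int:
    """Index of the message word used in step i of passes 1-4."""
    return {1: i, 2: (5 * i + 1) & 15, 3: (3 * i + 5) & 15, 4: (7 * i) & 15}[pass_no]


def T(i: int) -> int:
    """Constant T_i, assuming the MD5 convention floor(2**32 * |sin(i)|)."""
    return int(abs(math.sin(i)) * 2**32) & MASK32


def pass5_index(s4_sum: int) -> int:
    """Index into T following the worked example for pass 5."""
    low6 = s4_sum & 63          # Steps 1-2: keep the 6 least significant bits
    return (low6 + 1) ^ low6    # Steps 3-4: add 1, then XOR with the masked value


if __name__ == "__main__":
    for p in (1, 2, 3, 4):
        print("pass", p, "shifts", [shift(p, i) for i in range(4)],
              "words", [word_index(p, i) for i in range(5)])
    idx = pass5_index(208)
    print("pass 5 index for 208:", idx, "constant:", format(T(idx), "08x"))
```

The sketch only reproduces the schedules and the table lookup; it deliberately does not attempt the full digest, since the paper does not specify how the pass 5 constant is combined with the digest words.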