International Journal of Engineering Trends and Technology (IJETT) – Volume 35, Number 3, May 2016

Proficient Adaptation to Non-Critical Failure System with Consistency Check in Cloud Storage

Poornashree B.R., Dept. of ISE, BNM Institute of Technology, Bengaluru, India
S. Srividhya, Assistant Professor, Dept. of ISE, BNM Institute of Technology, Bengaluru, India

Abstract— Organizations are opting to outsource data to remote cloud service providers (CSPs). Customers can rent the CSP's storage infrastructure to store and retrieve data by paying fees. For an increased level of scalability, availability, and durability, some customers may want their data to be replicated on multiple servers. The more copies the CSP is asked to store, the more fees the customers are charged. Therefore, customers need a strong guarantee that the CSP is storing all data copies agreed upon in the service contract, and that all these copies are consistent with the most recent modifications issued by the customers. Fault tolerance is critical in distributed systems and has to be achieved in order to make the system highly available. This paper proposes a scheme with the following features: 1) it provides evidence to the customers that the CSP is not cheating by storing fewer copies; 2) it supports outsourcing of dynamic data, i.e., it supports insertion, deletion, and append operations; 3) it allows authorized users to seamlessly access the file copies stored by the CSP; and 4) fault tolerance is achieved by using the Byzantine fault tolerance method.

Keywords— Cloud computing; data replication; outsourcing data storage; dynamic environment; fault tolerance.

I. INTRODUCTION

Rather than storing data on their private computer systems, organizations can outsource data to a remote cloud service provider (CSP). Such outsourcing of data storage relieves the burden of constantly updating servers and of other computation issues, and enables organizations to concentrate more on innovation. Moreover, by storing data in the cloud, authorized users can access the remotely stored data from different geographic locations, making the information more convenient to reach [1]. Customers can rent the CSP's storage infrastructure to store and retrieve an almost unlimited amount of data by paying fees metered in gigabytes per month. For an increased level of scalability, availability, and durability, some customers may want their data to be replicated on multiple servers across multiple data centers [2]. The more copies the CSP is asked to store, the more fees the customers are charged.

Cloud service providers are not fully trustworthy: for economic reasons, a provider might delete part of the data or tamper with it. Apart from intentional dishonesty, the provider may be exposed to administration errors such as faulty backup and restore or migration of data to new systems, and may be vulnerable to latent faults, correlated faults, and recovery faults [3]. Therefore, customers need a strong guarantee that the CSP is storing all data copies agreed upon in the service contract, and that all these copies are consistent with the most recent modifications issued by the customers [4].

Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of (or one or more faults within) some of its components.
A fault-tolerant design enables a system to continue its intended operation, possibly at a reduced level, rather than failing completely, when some part of the system fails. Within the scope of an individual system, fault tolerance can be achieved by anticipating exceptional conditions and building the system to cope with them, and, in general, by aiming for self-stabilization so that the system converges towards an error-free state [5]. A fault-tolerant machine uses replicated elements operating in parallel; at any time, all the replicas of each element should be in the same state.

The objective of Byzantine fault tolerance is to defend against Byzantine failures, in which components of a system fail with symptoms that prevent some components from reaching agreement among themselves, where such agreement is needed for the correct operation of the system. Correctly functioning components of a Byzantine fault tolerant system are still able to provide the system's service, assuming there are not too many faulty components [6].

II. LITERATURE SURVEY

Provable data possession (PDP) is a technique for validating data integrity over remote servers. In a typical PDP model, the data owner generates some metadata for a data file to be used later for verification purposes through a challenge-response protocol with the remote/cloud server. The owner sends the file to be stored on a remote server, which may be untrusted. As proof that the server still possesses the data file in its original form, it needs to correctly compute a response to a challenge vector sent from a verifier, who can be the original data owner or a trusted entity that shares some information with the owner. Researchers have proposed different variations of PDP schemes under different cryptographic assumptions [7].

One of the core design principles of outsourcing data is to support dynamic behavior of the data for various applications. This means that the remotely stored data can be not only accessed by the authorized users, but also updated and scaled (through block-level operations) by the data owner. The PDP schemes presented in [8]–[14] focus only on static or warehoused data, where the outsourced data is kept unchanged on the remote servers. Examples of PDP constructions that deal with dynamic data are [15]–[18]; the latter, however, handle only a single copy of the data file. PDP schemes have also been presented for multiple copies of static data [19], [20].
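To make the challenge-response pattern described above concrete, the following Python sketch models a simplified, hash-based variant: the owner precomputes one digest per file block, the verifier challenges a random subset of block indices, and the server answers for exactly those blocks. This is only an illustrative stand-in for the cryptographic tags used in real PDP schemes [7]; all function and variable names are hypothetical.

import hashlib
import random

def owner_precompute(blocks):
    # Owner keeps one digest per block as lightweight verification metadata.
    return [hashlib.md5(b).hexdigest() for b in blocks]

def verifier_challenge(num_blocks, sample_size=3):
    # Verifier picks a random subset of block indices to challenge.
    return random.sample(range(num_blocks), min(sample_size, num_blocks))

def server_respond(stored_blocks, challenge):
    # An honest server returns digests of exactly the challenged blocks.
    return [hashlib.md5(stored_blocks[i]).hexdigest() for i in challenge]

def verifier_check(metadata, challenge, response):
    # The proof is accepted only if every returned digest matches the stored one.
    return all(metadata[i] == r for i, r in zip(challenge, response))

# Example: a 4-block file; the check fails if the server drops or alters a block.
blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
meta = owner_precompute(blocks)
chal = verifier_challenge(len(blocks))
print(verifier_check(meta, chal, server_respond(blocks, chal)))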
III. PROBLEM STATEMENT

Once an organization outsources its data to a remote CSP, which may not be trustworthy, the data owner loses direct control over the sensitive data. This lack of control raises formidable new challenges related to data confidentiality and integrity protection in cloud computing. The customer should therefore have a strong guarantee that the service provider is storing all the outsourced data and that all the data copies are consistent with the most recent modifications. Another important issue is that replicating data across multiple servers increases scalability and availability but introduces a consistency problem: the data copies stored on different servers might not be consistent and therefore have to be verified.

IV. SYSTEM AND ASSUMPTIONS

A. System Architecture

Fig. 1. Data Storage System Model in Cloud Computing

The system model in Fig. 1 represents the data storage process in cloud computing, where the data owner or a trusted third party uploads files to multiple servers in the cloud. The files are encrypted before being offloaded to the cloud, and the encrypted files are stored across multiple servers. A single secret key is shared between the data owner and the authorized users; a user can request data from the cloud, receive a copy, and decrypt it using this shared key.

B. System Components

The cloud computing storage model considered in this work consists of four main components, as illustrated in Fig. 1: (i) a data owner, typically an organization originally possessing sensitive data to be stored in the cloud; (ii) a CSP, who manages cloud servers (CSs) and provides paid storage space on its infrastructure to store the owner's files; (iii) authorized users, a set of the owner's clients who have the right to access the remote data; and (iv) a verifier, who verifies the integrity of the files stored in the cloud upon a user's request.

C. Outsourcing, Updating and Accessing

The data owner has a file F, and the CSP offers to store n copies {F1, F2, ..., Fn} of the owner's file on different servers, preventing simultaneous failure of all copies, in exchange for pre-specified fees metered in GB/month. The number of copies depends on the nature of the data: more copies are needed for critical data that cannot easily be reproduced and for which a higher level of scalability is required, and such critical data should be replicated on multiple servers across multiple data centers. Non-critical, reproducible data, on the other hand, is stored at reduced levels of redundancy. The number of copies to be stored is decided using the Byzantine fault tolerance method: the fault probability value is considered and the number of servers required to store the file is calculated. The CSP pricing model is related to the number of data copies.

After outsourcing all n copies of the file, the owner may interact with the CSP to perform block-level operations on all copies. These operations include modify, insert, append, and delete on the outsourced data copies. An authorized user of the outsourced data sends a data access request to the CSP and receives a file copy. The data access request is directed to an active server, so the user can receive the data file without interruption or data loss.
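As an illustration of the shared-key workflow in Fig. 1, the sketch below uses Fernet symmetric encryption from the Python cryptography package as one possible realization of the single shared secret key. The paper does not prescribe a specific cipher, so the library choice and the helper names are assumptions, and the list of copies is a local stand-in for the servers that would actually hold them.

from cryptography.fernet import Fernet

def prepare_copies(plaintext, shared_key, num_copies):
    # Encrypt once with the key shared between the owner and authorized users,
    # then hand the same ciphertext to each cloud server as a separate copy.
    ciphertext = Fernet(shared_key).encrypt(plaintext)
    return [ciphertext for _ in range(num_copies)]

def read_copy(ciphertext, shared_key):
    # An authorized user decrypts a retrieved copy with the same shared key.
    return Fernet(shared_key).decrypt(ciphertext)

shared_key = Fernet.generate_key()  # distributed out of band to authorized users
copies = prepare_copies(b"sensitive report", shared_key, num_copies=3)
assert read_copy(copies[0], shared_key) == b"sensitive report"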
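The block-level operations listed above can also be pictured with a small local model in which each copy is a list of blocks and every owner-issued operation is applied to all copies, keeping the replicas mutually consistent. In the actual system these operations are update requests sent to the CSP, so the data structures and function below are illustrative only.

def apply_update(copies, op, index=None, block=None):
    # Apply one owner-issued block-level operation to every stored copy,
    # so that all replicas stay consistent with the latest modification.
    for copy in copies:
        if op == "modify":
            copy[index] = block
        elif op == "insert":
            copy.insert(index, block)
        elif op == "append":
            copy.append(block)
        elif op == "delete":
            del copy[index]
        else:
            raise ValueError("unknown operation: " + op)

# Three replicas of a two-block file; all of them receive the same updates.
copies = [[b"b0", b"b1"] for _ in range(3)]
apply_update(copies, "append", block=b"b2")
apply_update(copies, "modify", index=0, block=b"b0-new")
assert all(c == [b"b0-new", b"b1", b"b2"] for c in copies)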
D. Threat Model

The integrity of customers' data in the cloud may be at risk for the following reasons. First, the CSP, whose goal is to make a profit and maintain a reputation, has an incentive to hide data loss (due to hardware failure, management errors, or various attacks) or to reclaim storage by discarding data that has not been or is rarely accessed. Second, a dishonest CSP may store fewer copies than what has been agreed upon in the service contract with the data owner and try to convince the owner that all copies are stored intact. Third, to save computational resources, the CSP may ignore the data-update requests issued by the owner, or not execute them on all copies, leading to inconsistency between the file copies. The goal of the proposed scheme is to detect (with high probability) such CSP misbehavior by validating the number and integrity of the file copies.

E. Underlying Algorithms

The proposed scheme builds on the MD5 algorithm and the Byzantine fault tolerance method.

1) MD5 Algorithm: The MD5 message digest algorithm is a widely used cryptographic hash function producing a 128-bit (16-byte) hash value, typically expressed in text format as a 32-digit hexadecimal number. MD5 has been utilized in a wide variety of cryptographic applications and is commonly used to verify data integrity. The MD5 algorithm consists of five steps:

Step 1. Appending Padding Bits. The original message is "padded" (extended) so that its length (in bits) is congruent to 448, modulo 512. The padding rules are: the original message is always padded with one "1" bit first; then zero or more "0" bits are padded to bring the length of the message up to 64 bits fewer than a multiple of 512.

Step 2. Appending Length. 64 bits are appended to the end of the padded message to indicate the length of the original message in bits. The rules of appending the length are: the length of the original message in bits is converted to its binary format of 64 bits; if overflow happens, only the low-order 64 bits are used. The 64-bit length is broken into two words (32 bits each); the low-order word is appended first, followed by the high-order word.

Step 3. Initializing the MD Buffer. The MD5 algorithm requires a 128-bit buffer with a specific initial value. The rules of initializing the buffer are: the buffer is divided into four words (32 bits each), named A, B, C, and D. Word A is initialized to 0x67452301, word B to 0xEFCDAB89, word C to 0x98BADCFE, and word D to 0x10325476.

Step 4. Processing the Message in 512-bit Blocks. This is the main step of the MD5 algorithm, which loops through the padded and appended message in blocks of 512 bits each. For each input block, 4 rounds of operations are performed, with 16 operations in each round.

Step 5. Output. The contents of buffer words A, B, C, and D are returned in sequence with the low-order byte first.

2) Byzantine Fault Tolerance Method: The fault tolerance formula is derived from the Byzantine algorithm, which is based on the DepSky architecture. The DepSky architecture consists of four clouds, and each cloud uses its own particular interface. The DepSky algorithm exists on the client's machine as a software library that communicates with each cloud.

Fault Tolerance = 1 + (n - 1)f

where f is the fault probability and n is the number of systems used. The steps involved in calculating the total number of storage servers required to store the files are:

Step 1. The total number of storage servers available is considered, n = 4.
Step 2. The fault probability value is calculated based on the total number of servers available and the number of servers that are down.
Step 3. The fault tolerance formula is applied using the available values.
Step 4. The obtained value is rounded off to obtain the total number of storage servers required to store the files.
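The four steps above can be written directly from the formula; the Python sketch below does so for illustration. Rounding the result up to a whole number of servers is an assumption about what "rounded off" means here, and the helper name is hypothetical.

import math

def required_servers(total_servers, servers_down):
    # Steps 1-2: fault probability from the servers available and the servers down.
    f = servers_down / total_servers
    # Step 3: the fault tolerance formula, 1 + (n - 1) * f.
    fault_tolerance = 1 + (total_servers - 1) * f
    # Step 4: round up to obtain a whole number of storage servers.
    return math.ceil(fault_tolerance)

# With n = 4 available servers and one of them down: 1 + 3 * 0.25 = 1.75 -> 2 servers.
print(required_servers(4, 1))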
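The five MD5 steps described in E.1 (padding, length, buffer initialization, 512-bit rounds, output) are implemented by standard libraries, so in practice the 128-bit digest used for integrity checking can be obtained as a 32-digit hexadecimal string as in the following sketch. The chunked reading is only a memory-friendly detail and the function name is illustrative.

import hashlib

def file_digest(path, chunk_size=8192):
    # Compute the 128-bit MD5 digest of a file, returned as 32 hex digits.
    md = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md.update(chunk)
    return md.hexdigest()

# Example: digests of identical copies match; any modification changes the digest.
# print(file_digest("report.pdf"))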
V. IMPLEMENTATION

The proposed method is implemented on top of the DriveHQ cloud. Through DriveHQ, customers can launch and manage Linux/Unix/Windows server instances (virtual servers) in DriveHQ's infrastructure, and the number of instances can be automatically scaled up and down according to customers' needs. DriveHQ is a web storage service used to store and retrieve an almost unlimited amount of data.

The implementation of the presented scheme consists of three modules:

Data Owner Module - The data owner manages both the authorized users and the cloud servers, checks the fault tolerance value, and advises the user accordingly.

Authorized User Module - The authorized user maintains his files on the cloud by uploading and downloading them, can check the consistency of the data files, and can make dynamic updates to the files.

Verifier Module - The user sends a request to the trusted third party (the verifier) to verify the consistency of the data files in the cloud; the verifier sends the result back to the user.

A. File Upload Process

The implementation of the file upload process is as follows. The file is selected from the user's local system. Using the MD5 algorithm, a message digest is created for each uploaded file. The message digest is then stored in the user's database, and the file is transferred to cloud storage based on the fault tolerance level.

B. Request Verification

The implementation of the request verification process is as follows. The files that are available for verification are displayed. The file to be verified is chosen and a verification request is sent to the verifier. After verification, the result is sent back to the authorized user through email.

C. Verify Files

The implementation of the file verification process is as follows. A request is received from the user to verify the files. The file is fetched from cloud storage and a message digest (MD1) is generated for it. The stored digest (MD2) of the file is retrieved from the database, MD1 is compared with MD2, and the result is sent back to the user.
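The upload and verification processes in A and C boil down to storing an MD5 digest at upload time and recomputing it at verification time. In the sketch below the in-memory dictionaries stand in for the user's database and the cloud storage, and all names are illustrative rather than taken from the actual implementation.

import hashlib

digest_db = {}       # stands in for the user's database of message digests (MD2)
cloud_storage = {}   # stands in for the file copies held by the CSP

def upload(file_name, data):
    # A. File Upload: create the MD5 digest, store it, then transfer the file.
    digest_db[file_name] = hashlib.md5(data).hexdigest()
    cloud_storage[file_name] = data

def verify(file_name):
    # C. Verify Files: fetch the file, compute MD1, compare with the stored MD2.
    md1 = hashlib.md5(cloud_storage[file_name]).hexdigest()
    md2 = digest_db[file_name]
    return md1 == md2

upload("report.txt", b"quarterly figures")
print(verify("report.txt"))                     # True while the copy is intact
cloud_storage["report.txt"] = b"tampered data"  # a misbehaving CSP alters the copy
print(verify("report.txt"))                     # False: the inconsistency is detected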
VI. SUMMARY AND CONCLUDING REMARKS

Outsourcing data to remote servers has become a growing trend for many organizations seeking to alleviate the burden of local data storage and maintenance. The proposed work addresses the problem of creating multiple copies of a dynamic data file and verifying those copies stored on untrusted cloud servers, where the data owner is capable of not only archiving and accessing the data copies stored by the CSP, but also updating and scaling these copies on the remote servers. Moreover, the proposed scheme supports public verifiability, enables an arbitrary number of audits, and allows possession-free verification, where the verifier is able to check the data integrity of the files held by the server.

The demand for outsourcing data to the cloud is increasing tremendously, so there is a need for an efficient method to check consistency among data copies spread across multiple servers. Fault tolerance is critical and has to be ensured in case of server failures. The proposed method provides an efficient consistency check for the data copies outsourced to the cloud and fault tolerance for the servers. In this work, the trusted third party or verifier can verify the integrity of the files stored on the server to ensure consistency among multiple copies. Fault tolerance is achieved by replicating data among multiple servers across multiple data centers. Customers can have a strong guarantee that all the data copies are stored as per the service agreement and that all the copies are consistent with the most recent modifications.

REFERENCES

[1] G. Ateniese et al., "Provable data possession at untrusted stores," in Proc. 14th ACM Conf. Comput. Commun. Secur. (CCS), New York, NY, USA, 2007, pp. 598–609.
[2] C. Erway, A. Küpçü, C. Papamanthou, and R. Tamassia, "Dynamic provable data possession," in Proc. 16th ACM Conf. Comput. Commun. Secur. (CCS), New York, NY, USA, 2009, pp. 213–222.
[3] R. Curtmola, O. Khan, R. Burns, and G. Ateniese, "MR-PDP: Multiple-replica provable data possession," in Proc. 28th IEEE ICDCS, Jun. 2008, pp. 411–420.
[4] Y. Deswarte, J.-J. Quisquater, and A. Saïdane, "Remote integrity checking," in Proc. 6th Working Conf. Integr. Internal Control Inf. Syst. (IICIS), 2003, pp. 1–11.
[5] L. P. Saikia and K. Kr. Medhi, "Distributed fault tolerance system in real time environments," IJARCSSE, vol. 3, no. 11, pp. 575–580, Nov. 2013, ISSN: 2277-128X.
[6] K. Driscoll, B. Hall, H. Sivencrona, and P. Zumsteg, "Byzantine fault tolerance, from theory to reality," Honeywell International, Technology Drive, Minneapolis, MN 55418, 2002.
[7] Poornashree B. R. and S. Srividhya, "A survey on provable data possession in cloud computing systems," IJERT, vol. 5, no. 03, pp. 77–79, Mar. 2016, ISSN: 2278-0181.
[8] K. Zeng, "Publicly verifiable remote data integrity," in Proc. 10th Int. Conf. Inf. Commun. Secur. (ICICS), 2008, pp. 419–434.
[9] D. L. G. Filho and P. S. L. M. Barreto, "Demonstrating data possession and uncheatable data transfer," IACR Cryptology ePrint Archive, Tech. Rep. 2006/150, 2006.
[10] F. Sebé, J. Domingo-Ferrer, A. Martinez-Balleste, Y. Deswarte, and J.-J. Quisquater, "Efficient remote data possession checking in critical information infrastructures," IEEE Trans. Knowl. Data Eng., vol. 20, no. 8, pp. 1034–1038, Aug. 2008.
[11] P. Golle, S. Jarecki, and I. Mironov, "Cryptographic primitives enforcing communication and storage complexity," in Proc. 6th Int. Conf. Financial Cryptography (FC), Berlin, Germany, 2003, pp. 120–135.
[12] M. A. Shah, M. Baker, J. C. Mogul, and R. Swaminathan, "Auditing to keep online storage services honest," in Proc. 11th USENIX Workshop Hot Topics Oper. Syst. (HOTOS), Berkeley, CA, USA, 2007, pp. 1–6.
[13] M. A. Shah, R. Swaminathan, and M. Baker, "Privacy-preserving audit and extraction of digital contents," IACR Cryptology ePrint Archive, Tech. Rep. 2008/186, 2008.
[14] E. Mykletun, M. Narasimha, and G. Tsudik, "Authentication and integrity in outsourced databases," ACM Trans. Storage, vol. 2, no. 2, pp. 107–138, 2006.
[15] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, "Scalable and efficient provable data possession," in Proc. 4th Int. Conf. Secur. Privacy Commun. Netw. (SecureComm), New York, NY, USA, 2008, Art. ID 9.
[16] C. Wang, Q. Wang, K. Ren, and W. Lou, "Ensuring data storage security in cloud computing," IACR Cryptology ePrint Archive, Tech. Rep. 2009/081, 2009. [Online]. Available: http://eprint.iacr.org/
[17] Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, "Enabling public verifiability and data dynamics for storage security in cloud computing," in Proc. 14th Eur. Symp. Res. Comput. Secur. (ESORICS), Berlin, Germany, 2009, pp. 355–370.
[18] Z. Hao, S. Zhong, and N. Yu, "A privacy-preserving remote data integrity checking protocol with data dynamics and public verifiability," IEEE Trans. Knowl. Data Eng., vol. 23, no. 9, pp. 1432–1437, Sep. 2011.
[19] M. A. Hasan and A. F. Barsoum, "Provable possession and replication of data over cloud servers," Centre for Applied Cryptographic Research, Univ. of Waterloo, Waterloo, ON, Canada, Tech. Rep. 2010/32, 2010.
[20] Z. Hao and N. Yu, "A multiple-replica remote data possession checking protocol with public verifiability," in Proc. 2nd Int. Symp. Data, Privacy, E-Commerce, Sep. 2010, pp. 84–89.