Proficient Adaptation to Non-Critical Failure System with Consistency Check in Cloud Storage

International Journal of Engineering Trends and Technology (IJETT) – Volume 35 Number 3 – May 2016, ISSN: 2231-5381
Poornashree B.R., Dept of ISE, BNM Institute of Technology, Bengaluru, India.
S. Srividhya, Assistant Professor, Dept of ISE, BNM Institute of Technology, Bengaluru, India.
Abstract— Organizations are increasingly outsourcing data to remote cloud service providers (CSPs). Customers can rent the CSP's storage infrastructure to store and retrieve data by paying fees. For an increased level of scalability, availability, and durability, some customers may want their data replicated on multiple servers. The more copies the CSP is asked to store, the more fees the customers are charged. Customers therefore need a strong guarantee that the CSP is storing all data copies agreed upon in the service contract, and that all these copies are consistent with the most recent modifications issued by the customers. Fault tolerance is critical in distributed systems and has to be achieved in order to make the system highly available. The scheme proposed in this paper: 1) provides evidence to the customers that the CSP is not cheating by storing fewer copies; 2) supports outsourcing of dynamic data, i.e., insertion, deletion, and append; 3) allows authorized users to seamlessly access the file copies stored by the CSP; and 4) achieves fault tolerance using the Byzantine fault tolerance method.
Keywords— Cloud computing; data replication; outsourcing data storage; dynamic environment; fault tolerance.
I. INTRODUCTION
Rather than storing data on their private computer systems, organizations can outsource data to a remote cloud service provider (CSP). Such outsourcing of data storage relieves the burden of constant server updating and other computation issues and enables organizations to concentrate more on innovation. Moreover, by storing data in the cloud, authorized users can access the remotely stored data from different geographic locations, making it more convenient for them to access the information [1]. Customers can rent the CSP's storage infrastructure to store and retrieve an almost unlimited amount of data by paying fees metered in gigabytes per month. For an increased level of scalability, availability, and durability, some customers may want their data replicated on multiple servers across multiple data centers [2]. The more copies the CSP is asked to store, the more fees the customers are charged. Cloud service providers are not fully trustworthy: for economic reasons, a provider might delete part of the data or tamper with it. Apart from intentional dishonesty, the CSP may be exposed to administration errors such as backup and restore or migration of data to new systems, and may be vulnerable to latent faults, correlated faults, and recovery faults [3]. Therefore, customers need to
have a strong guarantee that the CSP is storing all
data copies that are agreed upon in the service
contract, and all these copies are consistent with the
most recent modifications issued by the customers
[4]. Fault tolerance is the property that enables
a system to continue operating properly in the event
of the failure of (or one or more faults within) some
of its components. A fault-tolerant design enables a
system to continue its intended operation, possibly at
a reduced level, rather than failing completely, when
some part of the system fails. Within the scope of
an individual system, fault tolerance can be achieved
by anticipating exceptional conditions and building
the system to cope with them, and, in general, aiming
for self-stabilization so that the system converges
towards an error-free state [5]. A fault-tolerant
machine uses replicated elements operating in
parallel. At any time, all the replications of each
element should be in the same state. The objective of
Byzantine fault tolerance is to be able to defend
against Byzantine failures, in which components of a
system fail with symptoms that prevent some
components of the system from reaching agreement
among themselves, where such agreement is needed
for the correct operation of the system. Correctly
functioning components of a Byzantine fault tolerant
system will be able to provide the system's service,
assuming there are not too many faulty components
[6].
II. LITERATURE SURVEY
Provable data possession (PDP) is a technique for
validating data integrity over remote servers. In a
typical PDP model, the data owner generates some
metadata/information for a data file to be used later
for verification purposes through a challenge-response protocol with the remote/cloud server. The
owner sends the file to be stored on a remote server
which may be untrusted. As a proof that the server is
still possessing the data file in its original form, it
needs to correctly compute a response to a challenge
vector sent from a verifier who can be the original
data owner or a trusted entity that shares some
information with the owner. Researchers have
proposed different variations of PDP schemes under
different cryptographic assumptions [7]. One of the
core design principles of outsourcing data is to
provide dynamic behavior of data for various
applications. This means that the remotely stored data
can be not only accessed by the authorized users, but
also updated and scaled (through block level
operations) by the data owner. PDP schemes
presented in [8]–[14] focus on only static or
warehoused data, where the outsourced data is kept
unchanged over remote servers. Examples of PDP
constructions that deal with dynamic data are [15]–
[18]. The latter are however for a single copy of the
data file. PDP schemes are also presented for multiple
copies of static data [19] [20].
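The challenge-response interaction underlying these PDP schemes can be illustrated with a deliberately simplified, hash-based sketch. The published schemes in [8]–[20] use homomorphic tags and other cryptographic machinery; the function names below are illustrative assumptions, not any particular scheme:

```python
import hashlib
import os

def precompute_token(file_data: bytes, nonce: bytes) -> bytes:
    # Owner side, before outsourcing: store H(nonce || file) locally,
    # one token per future challenge nonce.
    return hashlib.sha256(nonce + file_data).digest()

def server_response(stored_file: bytes, challenge_nonce: bytes) -> bytes:
    # Server side: computing the response requires possessing the whole file.
    return hashlib.sha256(challenge_nonce + stored_file).digest()

data = b"outsourced file contents"
nonce = os.urandom(16)                      # fresh challenge vector
expected = precompute_token(data, nonce)

assert server_response(data, nonce) == expected       # intact copy passes
assert server_response(data[:-1], nonce) != expected  # truncated copy fails
```

Each precomputed token can be used for only one challenge, which is one reason practical PDP constructions move to homomorphic authenticators that support an unbounded number of audits.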
III. PROBLEM STATEMENT
Once the organization outsources its data to a
remote CSP which may not be trustworthy, the data
owners lose the direct control over their sensitive
data. This lack of control raises new formidable and
challenging tasks related to data confidentiality and
integrity protection in cloud computing. So the
customer should have a strong guarantee that the
service provider is storing all the outsourced data and
all the data copies are consistent with the most recent
modifications. Another important issue is that replicating data across multiple servers increases scalability and availability but leads to a consistency problem: the data copies stored across multiple servers might not be consistent and therefore have to be verified.
IV. SYSTEM AND ASSUMPTIONS
A. System Architecture
Fig. 1 Data storage system model in cloud computing
The system model in Fig. 1 represents the data storage process in cloud computing: the data owner (or a trusted third party) uploads files to multiple servers in the cloud. The files are encrypted before being offloaded to the cloud, and the encrypted files are stored across multiple servers. A single shared secret key is shared between the data owner and his authorized users; users can request data from the cloud, and after receiving the data copies they can decrypt the files using the shared secret key.
B. System Components
The cloud computing storage model considered in
this work consists of four main components as
illustrated in Fig. 1: (i) a data owner that can be an
organization originally possessing sensitive data to be
stored in the cloud; (ii) a CSP who manages cloud
servers (CSs) and provides paid storage space on its
infrastructure to store the owner’s files; (iii)
authorized users — a set of owner’s clients who have
the right to access the remote data; and (iv) a verifier, an individual who verifies the integrity of the files stored in the cloud upon a user's request.
C. Outsourcing, Updating and Accessing
The data owner has a file F and the CSP offers to
store n copies {F1, F2,..., Fn} of the owner’s file on
different servers to prevent simultaneous failure of all
copies in exchange for pre-specified fees metered in
GB/month. The number of copies depends on the
nature of data; more copies are needed for critical
data that cannot easily be reproduced, and to achieve
a higher level of scalability. This critical data should
be replicated on multiple servers across multiple data
centers. On the other hand, non-critical, reproducible
data are stored at reduced levels of redundancy. The number of copies to be stored is decided using the Byzantine fault tolerance method: a fault probability value is considered, and from it the number of servers required to store the file is calculated. The CSP pricing model is related to the number of data copies.
After outsourcing all n copies of the file, the owner
may interact with the CSP to perform block-level
operations on all copies. These operations include
modify, insert, append, and delete of the outsourced
data copies. An authorized user of the outsourced
data sends a data access request to the CSP and
receives a file copy. The data access request is
directed to the server which is active, and thus the
user can receive the data file without any interruption
and data loss.
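The block-level update flow described above can be sketched as follows. The operation names mirror the text (modify, insert, append, delete), while the class and method names are illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ReplicatedFile:
    copies: list = field(default_factory=list)  # one block list per copy

    def apply(self, op: str, index: int = 0, block: bytes = b"") -> None:
        # The CSP must run every block-level operation on all n copies;
        # skipping any copy would leave the replicas inconsistent.
        for blocks in self.copies:
            if op == "modify":
                blocks[index] = block
            elif op == "insert":
                blocks.insert(index, block)
            elif op == "append":
                blocks.append(block)
            elif op == "delete":
                del blocks[index]
            else:
                raise ValueError(f"unknown operation: {op}")

n = 3  # number of copies agreed in the service contract
rf = ReplicatedFile(copies=[[b"b0", b"b1"] for _ in range(n)])
rf.apply("append", block=b"b2")
rf.apply("modify", 0, b"B0")
assert all(blocks == [b"B0", b"b1", b"b2"] for blocks in rf.copies)
```

The dishonest behavior described in the threat model below corresponds to the CSP applying an update to fewer than all entries of `copies`, which is exactly what the consistency check is meant to detect.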
D. Threat Model
The integrity of customers' data in the cloud may be at risk for the following reasons. First, a CSP whose goal is to make a profit and maintain a reputation has an incentive to hide data loss (due to hardware failures, management errors, or various attacks) or to reclaim storage by discarding data that has not been or is rarely accessed. Second, a dishonest CSP may store fewer copies than what has been agreed upon in the service contract with the data owner, and try to convince the owner that all copies are stored correctly and intact. Third, to save
computational resources, the CSP may totally ignore
the data-update requests issued by the owner, or not
execute them on all copies leading to inconsistency
between the file copies. The goal of the proposed
scheme is to detect (with high probability) the CSP
misbehavior by validating the number and integrity of
file copies.
E. Underlying Algorithms
The proposed scheme uses two underlying algorithms: the MD5 algorithm and the Byzantine fault tolerance method.
1) MD5 Algorithm: The MD5 message-digest algorithm is a widely used cryptographic hash function producing a 128-bit (16-byte) hash value, typically expressed in text as a 32-digit hexadecimal number. MD5 has been utilized in a wide variety of cryptographic applications and is also commonly used to verify data integrity.
The MD5 algorithm consists of five steps:
Step 1. Appending Padding Bits. The original
message is "padded" (extended) so that its length (in
bits) is congruent to 448, modulo 512. The padding
rules are:
The original message is always padded with
one bit "1" first.
Then zero or more bits "0" are padded to
bring the length of the message up to 64 bits
fewer than a multiple of 512.
Step 2. Appending Length. 64 bits are appended to the end of the padded message to indicate the length of the original message in bits. The rules of appending length are: the length of the original message in bits is represented as a 64-bit value; if the length exceeds 64 bits, only the low-order 64 bits are used. The 64-bit length is broken into two 32-bit words, and the low-order word is appended first, followed by the high-order word.
Step 3. Initializing MD Buffer. MD5 algorithm
requires a 128-bit buffer with a specific initial value.
The rules of initializing buffer are:
The buffer is divided into 4 words (32 bits
each), named as A, B, C, and D.
Word A is initialized to: 0x67452301.
Word B is initialized to: 0xEFCDAB89.
Word C is initialized to: 0x98BADCFE.
Word D is initialized to: 0x10325476.
Step 4. Processing Message in 512-bit Blocks. This is the main step of the MD5 algorithm, which loops through the padded and appended message in blocks of 512 bits each. For each input block, 4 rounds of operations are performed, with 16 operations in each round.
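The padding and length-appending rules of Steps 1 and 2 can be checked with a short sketch; `md5_pad` is an illustrative helper (Python's `hashlib` performs the full digest computation of Steps 3–5 internally):

```python
import hashlib

def md5_pad(message: bytes) -> bytes:
    # Step 1: append a single "1" bit (the 0x80 byte), then "0" bits
    # until the length is 64 bits (8 bytes) short of a multiple of 512 bits.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Step 2: append the original length in bits as a 64-bit
    # little-endian value (low-order word first).
    padded += (len(message) * 8 % 2**64).to_bytes(8, "little")
    return padded

msg = b"hello"
block = md5_pad(msg)
assert len(block) % 64 == 0          # a whole number of 512-bit blocks
assert block[-8:] == (40).to_bytes(8, "little")  # 5 bytes = 40 bits
print(hashlib.md5(msg).hexdigest())  # 32 hex digits = 128-bit digest
```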
2) Byzantine Fault Tolerance Method: The fault tolerance formula is derived from the Byzantine algorithm used in the DepSky architecture. DepSky uses four clouds, each with its own particular interface; the DepSky algorithm runs on the client's machine as a software library that communicates with each cloud.
Fault Tolerance = 1 + (n - 1)f
where f = fault probability and n = number of systems used.
The steps involved in calculating the total number
of storage servers required to store the files are:
Step 1. The total number of storage servers
available is considered, n=4.
Step 2. The fault probability value is calculated
based on the total number of servers available and
number of servers down.
Step 3. Fault tolerance formula is applied by using
the available values.
Step 4. The obtained value is rounded off to
obtain the total number of storage servers required
to store the files.
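Steps 1–4 can be sketched as follows, taking the formula exactly as stated above. The function name is an illustrative assumption, and rounding up is one reading of "rounded off" (it ensures a fractional result never under-provisions storage):

```python
import math

def servers_required(n: int, servers_down: int) -> int:
    # Step 2: fault probability from servers available vs. servers down.
    f = servers_down / n
    # Step 3: apply the paper's formula, Fault Tolerance = 1 + (n - 1) * f.
    ft = 1 + (n - 1) * f
    # Step 4: round off to get the number of servers needed for the file.
    return math.ceil(ft)

# Step 1: n = 4 storage servers available.
assert servers_required(4, 1) == 2  # f = 0.25 -> 1 + 3 * 0.25 = 1.75 -> 2
assert servers_required(4, 0) == 1  # no faults: a single copy suffices
```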
V. IMPLEMENTATION
The proposed method is implemented on top of
Drivehq cloud. Through Drivehq customers can
launch and manage Linux/Unix/Windows server
instances (virtual servers) in Drivehq’s infrastructure.
The number of instances can be automatically scaled
up and down according to customers’ needs. Drivehq is a web storage service for storing and retrieving an almost unlimited amount of data.
The implementation of the presented scheme
consists of three modules: Data Owner Module - Data
Owner manages both the authorized users and the
cloud servers and also checks for the fault tolerance
value and advises the user accordingly. Authorized User Module - In this module the authorized user can maintain his files on the cloud by uploading and downloading them, can check the consistency of the data files, and can make dynamic updates to the files. Verifier Module - In this module the user sends a request to the trusted third party, which verifies the consistency of the data files in the cloud and sends the response back to the user.
Step 5. Output. The contents of buffer words A, B, C, and D are returned in sequence, with the low-order byte first.
A. File Upload Process - The implementation of the file upload process is as follows. The file is selected from the user’s local system. Using the MD5 algorithm, a message digest is created for every uploaded file. The message digest is then stored in the user’s database, and the file is transferred to cloud storage based on the fault tolerance level.
B. Request Verification - The implementation of the request verification process is as follows. The files which are available for verification are displayed. The file to be verified is chosen and a verification request is sent to the verifier. After verification, the result is sent back to the authorized user by email.
C. Verify Files - The implementation of the file verification process is as follows. A request is received from the user to verify the files. The file is fetched from cloud storage and a message digest (MD1) is generated for it. The digest stored at upload time (MD2) is retrieved from the database, MD1 is compared with MD2, and the result is sent back to the user.
VI. SUMMARY AND CONCLUDING REMARKS
Outsourcing data to remote servers has become a
growing trend for many organizations to alleviate the
burden of local data storage and maintenance. The proposed
work addresses the problem of creating multiple copies of a dynamic data file and verifying those copies stored on untrusted cloud servers, where the data owner is capable of not only archiving and accessing the data copies stored by the CSP, but also updating and scaling these copies on the remote servers. Moreover, the proposed scheme supports
public verifiability, enables an arbitrary number of audits,
and allows possession-free verification where the verifier
has the ability to verify the data integrity of the files from
the server.
The demand for outsourcing the data to the cloud is
tremendously increasing. So there is a need for an efficient
method to check the consistency among various data copies
across multiple servers. Fault tolerance is critical and has to
be ensured in case of any server failures. The proposed
method provides efficient consistency check for the data
copies which are outsourced to the cloud and fault tolerance
for servers. In this work the trusted third party or the
verifier can verify the integrity of the files stored on the
server to ensure consistency among multiple copies. Fault
tolerance is achieved by replicating data among multiple
servers across multiple data centers. Customers can have a
strong guarantee that all the data copies are stored as per the
service agreement and that all the copies are consistent with the most recent modifications.
REFERENCES
[1]. G. Ateniese et al., “Provable data possession at untrusted stores,” in Proc. 14th ACM Conf. Comput. Commun. Secur. (CCS), New York, NY, USA, 2007, pp. 598–609.
[2]. C. Erway, A. Küpçü, C. Papamanthou, and R. Tamassia, “Dynamic provable data possession,” in Proc. 16th ACM Conf. Comput. Commun. Secur. (CCS), New York, NY, USA, 2009, pp. 213–222.
[3]. R. Curtmola, O. Khan, R. Burns, and G. Ateniese, “MR-PDP: Multiple-replica provable data possession,” in Proc. 28th IEEE ICDCS, Jun. 2008, pp. 411–420.
[4]. Y. Deswarte, J.-J. Quisquater, and A. Saïdane, “Remote integrity checking,” in Proc. 6th Working Conf. Integr. Internal Control Inf. Syst. (IICIS), 2003, pp. 1–11.
[5]. Lakshmi Prasad Saikia and Kundal Kr. Medhi, “Distributed Fault Tolerance System in Real Time Environments,” IJARCSSE, vol. 3, issue 11, Nov. 2013, pp. 575–580, ISSN: 2277-128X.
[6]. Kevin Driscoll, Brendan Hall, Håkan Sivencrona, and Phil Zumsteg, “Byzantine Fault Tolerance, from Theory to Reality,” Honeywell International, Technology Drive, Minneapolis, MN 55418, 2002.
[7]. Poornashree B. R. and S. Srividhya, “A Survey on Provable Data Possession in Cloud Computing Systems,” IJERT, vol. 5, issue 03, Mar. 2016, pp. 77–79, ISSN: 2278-0181.
[8]. K. Zeng, “Publicly verifiable remote data integrity,” in Proc. 10th Int. Conf. Inf. Commun. Secur. (ICICS), 2008, pp. 419–434.
[9]. D. L. G. Filho and P. S. L. M. Barreto, “Demonstrating data possession and uncheatable data transfer,” IACR Cryptology ePrint Archive, Tech. Rep. 2006/150, 2006.
[10]. F. Sebé, J. Domingo-Ferrer, A. Martinez-Balleste, Y. Deswarte, and J.-J. Quisquater, “Efficient remote data possession checking in critical information infrastructures,” IEEE Trans. Knowl. Data Eng., vol. 20, no. 8, pp. 1034–1038, Aug. 2008.
[11]. P. Golle, S. Jarecki, and I. Mironov, “Cryptographic primitives enforcing communication and storage complexity,” in Proc. 6th Int. Conf. Financial Cryptograph. (FC), Berlin, Germany, 2003, pp. 120–135.
[12]. M. A. Shah, M. Baker, J. C. Mogul, and R. Swaminathan, “Auditing to keep online storage services honest,” in Proc. 11th USENIX Workshop Hot Topics Oper. Syst. (HOTOS), Berkeley, CA, USA, 2007, pp. 1–6.
[13]. M. A. Shah, R. Swaminathan, and M. Baker, “Privacy-preserving audit and extraction of digital contents,” IACR Cryptology ePrint Archive, Tech. Rep. 2008/186, 2008.
[14]. E. Mykletun, M. Narasimha, and G. Tsudik, “Authentication and integrity in outsourced databases,” ACM Trans. Storage, vol. 2, no. 2, pp. 107–138, 2006.
[15]. G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, “Scalable and efficient provable data possession,” in Proc. 4th Int. Conf. Secur. Privacy Commun. Netw. (SecureComm), New York, NY, USA, 2008, Art. ID 9.
[16]. C. Wang, Q. Wang, K. Ren, and W. Lou, “Ensuring data storage security in cloud computing,” IACR Cryptology ePrint Archive, Tech. Rep. 2009/081, 2009. [Online]. Available: http://eprint.iacr.org/
[17]. Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, “Enabling public verifiability and data dynamics for storage security in cloud computing,” in Proc. 14th Eur. Symp. Res. Comput. Secur. (ESORICS), Berlin, Germany, 2009, pp. 355–370.
[18]. Z. Hao, S. Zhong, and N. Yu, “A privacy-preserving remote data integrity checking protocol with data dynamics and public verifiability,” IEEE Trans. Knowl. Data Eng., vol. 23, no. 9, pp. 1432–1437, Sep. 2011.
[19]. M. A. Hasan and A. F. Barsoum, “Provable possession and replication of data over cloud servers,” Centre Appl. Cryptograph. Res., Univ. Waterloo, Waterloo, ON, Canada, Tech. Rep. 2010/32, 2010.
[20]. Z. Hao and N. Yu, “A multiple-replica remote data possession checking protocol with public verifiability,” in Proc. 2nd Int. Symp. Data, Privacy, E-Commerce, Sep. 2010, pp. 84–89.