International Journal of Engineering Trends and Technology (IJETT) – Volume 27 Number 3 - September 2015
Secrecy for Anonymity Users of Crowd sourcing Database
PylaJyothi (1), Ch. NagabhushanaRao (2)
(1) Final M.Tech Student, (2) Associate Professor
(1,2) Department of Computer Science & Engineering, Dadi Institute of Engineering & Technology, Anakapalle-531 002, A.P.
Abstract: Privacy is the major concern when storing data in, and retrieving data from, a crowdsourcing database. When crowdsourcing techniques are integrated into the database engine, a crowdsourcing job usually requires publishing some sensitive data to anonymous human workers.
In this paper we propose a novel model for authentication, group key generation and data confidentiality. Authentication and group key generation are implemented by an authentication group key protocol, and data confidentiality is maintained by a cryptographic algorithm. We claim that the proposed approach is more efficient and has lower time complexity than traditional approaches.
I. INTRODUCTION
k-anonymity is a property possessed by certain anonymized data. The underlying problem is: given person-specific, field-structured data, produce a release of the data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified while the data remain practically useful.[1][2] A release of data is said to have the k-anonymity property if the information for each person contained in the release cannot be distinguished from that of at least k-1 other individuals whose information also appears in the release.
Society is experiencing exponential growth in the number and variety of data collections containing person-specific information as computer technology, network connectivity and disk storage space become increasingly affordable. Data holders, operating on their own and with limited knowledge, are left with the difficulty of releasing information that does not compromise privacy, confidentiality or national interests. In many cases the survival of the database itself depends on the data holder's ability to produce anonymous data: not releasing such information at all may diminish the need for the data, while failing to provide proper protection within a release may create circumstances that harm the public or others.[3]
Previous work presented basic protection models termed null-map, k-map and wrong-map, which provide protection by ensuring that released information maps to no, k or incorrect entities, respectively. Determining how many individuals each released tuple actually matches requires combining the released data with externally available data and analyzing other possible attacks. Making such a determination directly can be an extremely difficult task for the data holder who releases the information.
ISSN: 2231-5381
Although it can be assumed that the data holder knows which data in the private table PT also appear externally, and therefore what constitutes a quasi-identifier, the specific values contained in external data cannot be assumed. We therefore seek to protect the information in this work by satisfying a slightly different constraint on released data, termed the k-anonymity requirement. This is a special case of k-map protection where k is enforced on the released data.[4]
The ideas of k-anonymity and of a quasi-identifier are straightforward. Nevertheless, care must be taken to precisely state what is meant. Reference [5] provides a detailed discussion of k-anonymity. A brief summary is provided in this section as background for the upcoming presentations on generalization and suppression. Unless otherwise stated, the term data refers to person-specific information that is conceptually organized as a table of rows (or records) and columns (or fields). Each row is termed a tuple.
Tuples within a table are not necessarily unique. Each
column is called an attribute and denotes a semantic
category of information that is a set of possible
values; therefore, an attribute is also a domain.
Attributes within a table are unique. So by observing
a table, each row is an ordered n-tuple of values <d1,
d2, …,dn> such that each value dj is in the domain of
the j-th column, for j=1, 2, …, n where n is the
number of columns. This corresponds to relational
database concepts [7].
II. RELATED WORK
The concept of k-anonymity [6] tries to capture, on the private table PT to be released, one of the main requirements followed by the community and by companies releasing data: the released data should be indistinguishably related to no fewer than a certain number of users. The set of attributes included in the private table that are also available externally, and are therefore exploitable for linking, is called the quasi-identifier. This requirement is translated in [8] into the k-anonymity requirement below, which states that every released tuple cannot be related to fewer than k users.
Requirement 1: Each release of data must be such that every combination of values of quasi-identifiers can be indistinctly matched to at least k respondents.
http://www.ijettjournal.org
Page 167
Since it is impossible, or highly impractical and limiting, to make assumptions about which datasets are available for linking to external intruders or curious data recipients, k-anonymity takes a safe approach: it requires that, in the released table itself, the respondents be indistinguishable with respect to the set of quasi-identifier attributes. To guarantee the k-anonymity requirement, each quasi-identifier value in the released table must have at least k occurrences.
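The occurrence-count condition above can be checked mechanically. The following is a minimal Python sketch, not the authors' implementation; the function name `is_k_anonymous`, the attribute names and the sample rows are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifier, k):
    """Return True if every combination of quasi-identifier values
    appearing in `rows` occurs at least k times."""
    counts = Counter(
        tuple(row[attr] for attr in quasi_identifier) for row in rows
    )
    return all(count >= k for count in counts.values())

# Hypothetical released table: "zip" and "age" form the quasi-identifier,
# "disease" is the sensitive attribute.
released = [
    {"zip": "531**", "age": "20-29", "disease": "flu"},
    {"zip": "531**", "age": "20-29", "disease": "asthma"},
    {"zip": "530**", "age": "30-39", "disease": "flu"},
    {"zip": "530**", "age": "30-39", "disease": "diabetes"},
]

print(is_k_anonymous(released, ["zip", "age"], 2))  # each combination occurs twice
print(is_k_anonymous(released, ["zip", "age"], 3))  # no combination occurs three times
```

The check only counts occurrences inside the released table, which matches the "safe approach" described above: it makes no assumption about what external data an attacker holds.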
Enforcing k-anonymity requires the preliminary identification of the quasi-identifier. The quasi-identifier depends on the external information available to the recipient, since this determines the recipient's linking ability. Not all possible external tables are available to every possible data recipient, so different quasi-identifiers can exist for a given table. For simplicity, the original k-anonymity proposal [6] assumes that the private table PT has a single quasi-identifier composed of all attributes in PT that can be externally available, and that PT contains at most one tuple for each respondent. Therefore, although identifying the correct quasi-identifier for a private table can be a difficult task, it is assumed that the quasi-identifier has been properly recognized and defined.[7,8]
Among the techniques proposed for providing anonymity in the release of microdata, the k-anonymity proposal focuses on two techniques in particular: generalization and suppression, which, unlike other existing techniques such as scrambling or swapping, preserve the truthfulness of the information. We now illustrate their specific definition and use in the context of k-anonymity. Generalization consists in substituting the values of a given attribute with more general values. To this purpose, the notion of domain (i.e., the set of values that an attribute can assume) is extended to capture the generalization process by assuming the existence of a set of generalized domains. The set of original domains together with their generalizations is referred to as Dom. Each generalized domain contains generalized values, and there exists a mapping between each domain and its generalizations.
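As an illustration of the mapping between a domain and its generalizations, the sketch below generalizes postal codes by masking trailing digits and suppresses tuples that match a condition. It is a hedged example only: the helper names `generalize_zip` and `suppress`, the masking hierarchy and the sample values are assumptions made for this illustration and do not come from the paper.

```python
def generalize_zip(zip_code, level):
    """Map a postal code to a more general value by replacing its
    last `level` digits with '*' (level 0 keeps the original value)."""
    if level <= 0:
        return zip_code
    level = min(level, len(zip_code))
    return zip_code[:len(zip_code) - level] + "*" * level

def suppress(rows, should_suppress):
    """Suppression: drop every tuple for which `should_suppress` is true."""
    return [row for row in rows if not should_suppress(row)]

# Hypothetical private table with one quasi-identifier attribute.
rows = [{"zip": "531002"}, {"zip": "531001"}, {"zip": "530016"}]

# Generalize two levels up the hierarchy: 531002 -> 5310**.
generalized = [{"zip": generalize_zip(r["zip"], 2)} for r in rows]

# Suppress the outlier tuple that remains unique after generalization.
kept = suppress(generalized, lambda r: r["zip"] == "5300**")

print(generalized)
print(kept)
```

Both operations keep the released values truthful: a generalized value is less precise but never false, and a suppressed tuple is simply absent.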
Existing techniques do not protect the privacy of the crowdsourcing database, so anonymous human workers can easily access its contents. They also do not provide adequate security or efficiency for crowdsourcing databases. To overcome these problems we propose the following system.
III. PROPOSED SYSTEM
In this paper we propose an empirical model for privacy-preserving data retrieval and storage in crowdsourcing databases, which must ensure that sensitive data stays private from anonymous human workers. The architecture is divided into two stages: the first performs authentication and group key generation, and the second maintains data confidentiality using a block cipher encryption algorithm. Before encrypting and decrypting data, the users generate a group key.
The following algorithm shows the authentication process, the generation of the group key and the implementation of the protocol.
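[Figure: message flow between the node users and the group key manager. 1. Send (nonce); 2. Response (points); 3. Share points; 4. Subset of P points and signature; 5. Authentication and secret key; 6. Encryption and decryption of the crowd database.]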
User Authentication Scheme and Key Generation:
1. Each user sends a request to the group key manager.
2. The group key manager responds with an id for each user.
3. Each user sends a nonce to the group key manager. The nonce value (Ni) is generated randomly.
4. After receiving the nonces of all users, the group key manager generates a point (Xi, Yi) for each user based on a random challenge, and sends each point to the corresponding user.
5. Each user receives the point and generates a share point (Xi, Yi^Ni) from the point and the random challenge. Each user then sends the generated share point to the group key manager.
6. Using the share point, the group key manager generates a signature with a hash function. The signatures are sent to the individual users.
7. After receiving the signature, each user generates the signature again and compares the two. Users whose signatures match are verified users.
8. After the verification process completes, the group key manager generates the secret key.
9. After generating the secret key, the group key manager divides the secret into n parts such that any subset of sufficient size (three parts for the degree-2 polynomial below) reconstructs the polynomial function. Before generating the polynomial function, the group key manager chooses random numbers as its coefficients. The polynomial function is:
$F(x) = \text{secret} + bx + ax^2$
10. The group key manager then sends the parts to the individual users.
11. The users receive the parts, reconstruct the polynomial function, and each obtain the same secret.
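Steps 5 to 7 of the list above (share point, hash-based signature, comparison) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' protocol: the paper does not name the hash function or the encoding, so SHA-256 and a simple string encoding are assumed here, "Yi^Ni" is read as a bitwise XOR, and the names `share_point` and `signature` are hypothetical. A bare hash is used because the text says "signature using hash function"; it is not a digital signature in the cryptographic sense.

```python
import hashlib
import hmac
import secrets

def share_point(x, y, nonce):
    # Assumption: the paper's "(Xi, Yi^Ni)" is interpreted as XOR of Yi and Ni.
    return (x, y ^ nonce)

def signature(point):
    # Assumption: SHA-256 over a simple "x:y" text encoding of the point.
    x, y = point
    return hashlib.sha256(f"{x}:{y}".encode()).hexdigest()

# Step 3: the user picks a random nonce.
nonce = secrets.randbits(32)
# Step 4: the group key manager issues a point to the user (hypothetical values).
manager_point = (7, secrets.randbits(32))

# Step 5: the user derives the share point and sends it to the manager.
user_share = share_point(manager_point[0], manager_point[1], nonce)
# Step 6: the manager hashes the share point and returns the result.
manager_sig = signature(user_share)
# Step 7: the user recomputes the hash and compares in constant time.
user_sig = signature(share_point(manager_point[0], manager_point[1], nonce))
verified = hmac.compare_digest(manager_sig, user_sig)
print(verified)
```

`hmac.compare_digest` is used for the comparison so that the check does not leak timing information about where two digests differ.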
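Example:
• Let the secret be S = 1234.
• Let n = 6 and k = 3, and choose random integers a1 = 166 and a2 = 94, so that
$f(x) = 1234 + 166x + 94x^2$
• Secret share points: (1, 1494), (2, 1942), (3, 2578), (4, 3402), (5, 4414), (6, 5614)
• We give each participant a different single point (both x and f(x)).
Reconstruction:
• In order to reconstruct the secret, any 3 points are enough.
• Let us consider (x0, y0) = (2, 1942), (x1, y1) = (4, 3402), (x2, y2) = (5, 4414).
The Lagrange basis polynomials are:
$L_0(x) = \frac{x - x_1}{x_0 - x_1} \cdot \frac{x - x_2}{x_0 - x_2} = \frac{x - 4}{2 - 4} \cdot \frac{x - 5}{2 - 5} = \frac{1}{6}x^2 - \frac{3}{2}x + \frac{10}{3}$
$L_1(x) = \frac{x - x_0}{x_1 - x_0} \cdot \frac{x - x_2}{x_1 - x_2} = \frac{x - 2}{4 - 2} \cdot \frac{x - 5}{4 - 5} = -\frac{1}{2}x^2 + \frac{7}{2}x - 5$
$L_2(x) = \frac{x - x_0}{x_2 - x_0} \cdot \frac{x - x_1}{x_2 - x_1} = \frac{x - 2}{5 - 2} \cdot \frac{x - 4}{5 - 4} = \frac{1}{3}x^2 - 2x + \frac{8}{3}$
Therefore:
$f(x) = \sum_{j=0}^{2} y_j L_j(x) = 1942\left(\frac{1}{6}x^2 - \frac{3}{2}x + \frac{10}{3}\right) + 3402\left(-\frac{1}{2}x^2 + \frac{7}{2}x - 5\right) + 4414\left(\frac{1}{3}x^2 - 2x + \frac{8}{3}\right)$
$f(x) = 1234 + 166x + 94x^2$
The secret is the constant term, f(0) = 1234.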
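The sharing and reconstruction steps of this example can be reproduced with a short Python sketch. It follows the example's plain integer arithmetic and uses exact fractions for the Lagrange interpolation; practical secret sharing schemes normally work over a finite field instead, which is not shown here. The function names `make_shares` and `reconstruct` are hypothetical.

```python
from fractions import Fraction

def make_shares(secret, coefficients, n):
    """Evaluate f(x) = secret + c1*x + c2*x^2 + ... at x = 1..n."""
    def f(x):
        return secret + sum(c * x ** (i + 1) for i, c in enumerate(coefficients))
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(points):
    """Recover f(0) by Lagrange interpolation over the given points."""
    secret = Fraction(0)
    for j, (xj, yj) in enumerate(points):
        basis = Fraction(1)
        for m, (xm, _) in enumerate(points):
            if m != j:
                # Basis polynomial L_j evaluated at x = 0.
                basis *= Fraction(0 - xm, xj - xm)
        secret += yj * basis
    return int(secret)

# Values from the example: S = 1234, a1 = 166, a2 = 94, n = 6, k = 3.
shares = make_shares(1234, [166, 94], 6)
print(shares)

# Any three shares suffice; here the shares with x = 2, 4, 5 are used.
print(reconstruct([shares[1], shares[3], shares[4]]))
```

Using `Fraction` keeps the intermediate values such as 10/3 and 8/3 exact, so the recovered constant term is an integer without rounding error.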
The group key manager receives the registration requests from all users, generates a verification share for each and forwards it to the requesting user for authentication. It then generates the key using the key generation process and forwards to the users the points needed to extract the key from the equation defined by the verification points.
For the key generation protocol, the group key manager takes the verification shares and the key as input and constructs the Lagrange polynomial f(x) that passes through (0, key) and the verification points. It then forwards the points to the data owners. The data owners reconstruct the key from the verification points and check the authentication code sent by the group key manager.
After authentication and secret key generation are complete, each user stores data in the database engine. Before storing data, each user encrypts it using a block cipher encryption algorithm, which proceeds as follows. The block cipher used, Blowfish, is a symmetric block cipher with a 64-bit block size and a variable-length key. The algorithm has two parts:
i) key expansion part
ii) data encryption part.
The key expansion part converts a key of at most 448 bits into several subkey arrays totaling 4168 bytes. Data encryption occurs via a 16-round Feistel network. The algorithm is only suitable for applications where the key does not change often, such as a communications link or an automatic file encryptor. It is significantly faster than most encryption algorithms when implemented on 32-bit microprocessors with large data caches. The nature of encryption algorithms is that, once any significant amount of security analysis is done, it is very undesirable to change the algorithm for performance reasons, thereby invalidating the results of the analysis. Thus, it is imperative to consider both security and performance together during the design phase. While it is impossible to take all future computer architectures into consideration, an understanding of general optimization guidelines, combined with exploratory software implementation on existing architectures to calibrate performance, should help achieve higher speed in future encryption algorithms.
Subkey Expansion:
The block cipher encryption algorithm (Blowfish) uses a large number of subkeys. These keys must be precomputed before any data encryption or decryption. The P-array consists of 18 32-bit subkeys: P1, P2, ..., P18. There are four 32-bit S-boxes with 256 entries each:
S1,0, S1,1, ..., S1,255;
S2,0, S2,1, ..., S2,255;
S3,0, S3,1, ..., S3,255;
S4,0, S4,1, ..., S4,255.
Pseudocode of the Blowfish (BF) algorithm:
The block cipher encryption algorithm has 16 rounds. The input is a 64-bit data element, x.
Divide x into two 32-bit halves: xL, xR.
Then, for i = 1 to 16:
    xL = xL XOR Pi
    xR = F(xL) XOR xR
    Swap xL and xR
After the sixteenth round, swap xL and xR again to undo the last swap.
Then, xR = xR XOR P17 and xL = xL XOR P18.
Finally, recombine xL and xR to get the ciphertext.
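The round structure in the pseudocode can be sketched in Python as below. This is a structural illustration only and must not be used as a cipher: the P-array and S-boxes here are filled with seeded pseudo-random values (an assumption for the demo) rather than derived by the real Blowfish key expansion, so the output does not match real Blowfish ciphertext. The round function F follows the published Blowfish design (split the 32-bit input into four bytes, combine S-box lookups with addition modulo 2^32 and XOR). The function names are hypothetical.

```python
import random

MASK32 = 0xFFFFFFFF

def make_toy_subkeys(seed=0):
    """Toy stand-in for key expansion: 18 P-array words and
    four S-boxes of 256 words each, filled with pseudo-random values."""
    rng = random.Random(seed)
    p_array = [rng.getrandbits(32) for _ in range(18)]
    s_boxes = [[rng.getrandbits(32) for _ in range(256)] for _ in range(4)]
    return p_array, s_boxes

def f_function(x, s):
    """Blowfish round function: ((S1[a] + S2[b]) XOR S3[c]) + S4[d] mod 2^32."""
    a, b, c, d = (x >> 24) & 0xFF, (x >> 16) & 0xFF, (x >> 8) & 0xFF, x & 0xFF
    return ((((s[0][a] + s[1][b]) & MASK32) ^ s[2][c]) + s[3][d]) & MASK32

def crypt_block(block, p_array, s_boxes):
    """Run the 16-round Feistel network over one 64-bit block."""
    xl, xr = (block >> 32) & MASK32, block & MASK32
    for i in range(16):
        xl ^= p_array[i]
        xr ^= f_function(xl, s_boxes)
        xl, xr = xr, xl
    xl, xr = xr, xl          # undo the last swap
    xr ^= p_array[16]        # P17 in the paper's 1-based numbering
    xl ^= p_array[17]        # P18 in the paper's 1-based numbering
    return (xl << 32) | xr

def encrypt_block(block, p_array, s_boxes):
    return crypt_block(block, p_array, s_boxes)

def decrypt_block(block, p_array, s_boxes):
    # Same network with the P-array applied in reverse order.
    return crypt_block(block, p_array[::-1], s_boxes)

p_array, s_boxes = make_toy_subkeys()
plaintext = 0x0123456789ABCDEF
ciphertext = encrypt_block(plaintext, p_array, s_boxes)
print(hex(ciphertext))
print(decrypt_block(ciphertext, p_array, s_boxes) == plaintext)
```

The toy tables occupy 18 × 4 + 4 × 256 × 4 = 4168 bytes, which agrees with the subkey array size quoted in the paper.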
Decryption is exactly the same as encryption, except that P1, P2, ..., P18 are used in the reverse order. Implementations of Blowfish that require the fastest speeds should unroll the loop and ensure that all subkeys are stored in cache.
IV. CONCLUSION
We conclude our current research work with an efficient authentication and privacy-preserving technique for crowdsourcing databases. The group key protocol handles authentication and group key generation. After key generation, an individual user can encrypt or decrypt data with the Blowfish cryptographic algorithm, and the encrypted data can be stored in the database. Any user who retrieves the data and decrypts it with the secret key obtains the original data. Our implementation results show better performance than other traditional approaches.
REFERENCES
[1] A. Feng et al., “CrowdDB: Query processing with the VLDB crowd,” Proc. VLDB, vol. 4, no. 12, pp. 1387–1390, 2011.
[2] A. Marcus, E. Wu, S. Madden, and R. C. Miller, “Crowdsourced databases: Query processing with people,” in Proc. 5th CIDR, 2011, pp. 211–214.
[3] A. Marcus, E. Wu, D. R. Karger, S. Madden, and R. C. Miller, “Demonstration of Qurk: A query processor for human operators,” in Proc. ACM SIGMOD, Athens, Greece, 2011, pp. 1315–1318.
[4] X. Liu et al., “CDAS: A crowdsourcing data analytics system,” vol. 5, no. 10, pp. 1040–1051, 2012.
[5] L. Sweeney, “k-anonymity: A model for protecting privacy,” Int. J. Uncertain. Fuzz. Knowl. Based Syst., vol. 10, no. 5, pp. 557–570, 2002.
[6] K. LeFevre, D. J. DeWitt, and R. Ramakrishnan, “Incognito: Efficient full-domain k-anonymity,” in Proc. SIGMOD Conf., Baltimore, MD, USA, 2005, pp. 49–60.
[7] A. Meyerson and R. Williams, “On the complexity of optimal k-anonymity,” in Proc. 23rd ACM PODS, New York, NY, USA, 2004, pp. 223–228.
[8] K. LeFevre, D. J. DeWitt, and R. Ramakrishnan, “Mondrian multidimensional k-anonymity,” in Proc. 22nd ICDE, Washington, DC, USA, 2006, p. 25.
[9] R. J. Bayardo, Jr. and R. Agrawal, “Data privacy through optimal k-anonymization,” in Proc. 21st ICDE, Washington, DC, USA, 2005, pp. 217–228.
[10] V. S. Iyengar, “Transforming data to satisfy privacy constraints,” in Proc. 8th ACM SIGKDD Int. Conf. KDD, Edmonton, AB, Canada, 2002, pp. 279–288.
BIOGRAPHIES
PylaJyothi is currently pursuing an M.Tech. in the Department of Computer Science and Engineering, DIET College, Visakhapatnam, Andhra Pradesh. Areas of interest are Computer Networks, DWDM and Artificial Intelligence.
Ch. NagabhushanaRao is working as an Associate Professor in the Department of Computer Science & Engineering, Dadi Institute of Engineering & Technology, Anakapalle-531 002, A.P. Areas of interest are Computer Networks, DWDM and Artificial Intelligence.