International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 6 – Oct 2014
An Improved Privacy Preserving Mining over
Centralized Databases
Ch. Ajay Kumar 1, K. Prasada Rao 2
1 M.Tech Student, 2 Sr. Assistant Professor,
1,2 Department of CSE, Aditya Institute of Technology and Management (AITAM), Tekkali, Srikakulam, Andhra Pradesh
Abstract: Secure mining of association rules over horizontally partitioned databases is a long-standing research issue in the field of knowledge and data engineering. In horizontal partitioning, databases are integrated from various data holders or players so that association rule mining can be applied over the integrated database. In this paper we propose a privacy preserving mining approach that uses an improved Lagrange's polynomial equation for secure key generation together with a Binary Matrix approach.
Index Terms: Association rule mining, Binary Matrix, Lagrange's polynomial.
I. INTRODUCTION
In view of this brief description, it can be seen that all of these protocols for secure multiparty function evaluation run in unbounded "distributed time," that is, using an unbounded number of rounds of communication [1]. Even though the interaction for each gate can be implemented in a way that requires only a constant number of rounds, the total number of rounds will still be linear in the depth of the underlying circuit. For many concrete computations, the resulting number of rounds would be prohibitive; in distributed computation, the number of rounds is generally the most valuable resource.
Secure function evaluation consists of distributively evaluating a function so as to satisfy both correctness and privacy constraints. This task is made particularly difficult by the fact that some of the players may be maliciously faulty and may cooperate in order to disrupt the correctness and the privacy of the computation. Secure function evaluation arises in two main settings. The first is fault-tolerant computation [2], where correctness is the main issue: we insist that the values a distributed system returns are correct, no matter how some components in the system fail. However, even if one is solely interested in correctness, privacy helps to achieve it most strongly: if one wants to maliciously influence the outcome of an election, say, it is helpful to know who plans to vote for whom. Second, secure function computation is central to protocol design, as the correctness and privacy of any protocol can be reduced to it; here, as people may be behind their computers, both correctness and privacy matter.
Secure function evaluation [3]: assume we have n parties 1, ..., n, where each party i has a private input xi known only to itself. The parties want to correctly evaluate a given function f on their inputs, that is, to compute y = f(x1, ..., xn),
ISSN: 2231-5381
while maintaining the privacy of their own inputs; that is, they do not want to reveal more than what the value y itself implicitly reveals.
Bar-Ilan and Beaver were the first to investigate reducing the round complexity of secure function evaluation. They exhibited a non-cryptographic method that always saves a logarithmic factor of rounds (logarithmic in the total length of the players' inputs), while the total amount of communication grows only by a polynomial factor. Alternatively, they show that the number of rounds can be reduced to a constant, but at the expense of an exponential blowup in the message sizes. We insist that the total amount of communication be polynomially bounded. While their result shows that the depth of a circuit is not a lower bound on the number of rounds necessary for securely evaluating it, the savings are far from substantial in a general setting.
II. RELATED WORK
In traditional association rule mining, companies give their data to an analyst to find the patterns or association rules that exist between the items. Although it is advantageous to perform sophisticated analysis on tremendous volumes of data in a cost-effective way, there are several serious security issues with the data-mining-as-a-service paradigm. One of the main issues is that the server has access to the valuable data of the owner and may learn sensitive information from it; there is a loss of corporate privacy. Traditional distributed algorithms are based on Apriori, whose main disadvantages are multiple database scans and candidate set generation.
Association rule mining is one of the most essential and well-researched methods of data mining. It aims to extract interesting correlations, frequent patterns, associations, or casual structures among sets of items in transaction databases or other data repositories. Association rules are widely used in a range of areas such as telecommunication networks, market and risk management, and inventory control [1]. Different association mining methods and algorithms are briefly introduced and compared below. The goal of association rule mining is to find the association rules that satisfy the predefined minimum support and confidence in a database [3].
The problem is decomposed into two subproblems. One is to discover those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are known as frequent or large itemsets. The second is to generate association rules from those large itemsets subject to a minimum confidence constraint [2]. The two most important approaches for utilizing multiple processors have emerged as distributed memory, in which each processor has a private memory [6], and shared memory, in which all processors access a common memory. The shared memory architecture has many attractive properties: each processor has direct and equal access to all memory in the system [4].
In the distributed memory architecture, each processor has its own local memory that can only be accessed directly by that processor. A parallel application can be divided into a number of subtasks and executed in parallel on separate processors in the system, though the performance of a parallel application on a distributed system depends mostly on the allocation of the tasks comprising the application onto the available processors [5].
Among the many data mining models, including association rule, clustering, and classification models, the association rule mining model is the most widely applied. The Apriori algorithm is the most representative algorithm for association rule mining, and plenty of modified algorithms focus on improving its efficiency and accuracy.
III. PROPOSED WORK
In this approach we propose a privacy preserving mining method based on a Binary Matrix, which reduces the problem of multiple database scans and candidate set generation by constructing the Binary Matrix. Data is integrated from multiple data holders or players; for secure transmission over the distributed partitions, we implement an improved Lagrange's polynomial approach for secure key generation, and the data holders encrypt their data with the Triple DES algorithm.
Fig 1: Horizontal partitioning architecture (Data Holders 1-3 send their cipher patterns to the encoder/decoder at the centralized server).
Every individual data holder or player maintains its own transactions or patterns. In horizontal partitioning, every data holder encrypts the patterns at its own end and forwards them to the centralized server. At the centralized server, the received patterns are decrypted by the decoder and forwarded to the Binary Matrix step, which extracts the frequent patterns from the received patterns.
For experimental purposes we establish a connection between the nodes and the central location (the key generation center) through network or socket programming. The key is generated by using the improved Lagrange's polynomial equation and is then distributed to the users.
Every individual node participates in the key generation process and retrieves the key by reconstruction. Each node encrypts its datasets using Triple DES with the key generated from the Lagrange's polynomial equation. All encrypted datasets are forwarded to the centralized location, decrypted with the same symmetric key, and passed on to the mining process.
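The encryption step above can be sketched with the JDK's built-in DESede (Triple DES) cipher. How the numeric Lagrange secret is stretched into the 24 key bytes DESede expects is not specified in the paper, so the `deriveKeyBytes` hash-based derivation below is an assumption for illustration only:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.DESedeKeySpec;

public class TripleDesDemo {

    // Hypothetical key derivation (not from the paper): hash the shared
    // secret and keep the first 24 bytes, as DESede needs a 24-byte key.
    static byte[] deriveKeyBytes(long secret) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(Long.toString(secret).getBytes(StandardCharsets.UTF_8));
            return Arrays.copyOf(digest, 24);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static byte[] crypt(int mode, byte[] keyBytes, byte[] data) {
        try {
            SecretKey key = SecretKeyFactory.getInstance("DESede")
                    .generateSecret(new DESedeKeySpec(keyBytes));
            Cipher cipher = Cipher.getInstance("DESede/ECB/PKCS5Padding");
            cipher.init(mode, key);
            return cipher.doFinal(data);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static byte[] encrypt(byte[] keyBytes, byte[] plain)  { return crypt(Cipher.ENCRYPT_MODE, keyBytes, plain); }
    static byte[] decrypt(byte[] keyBytes, byte[] cipher) { return crypt(Cipher.DECRYPT_MODE, keyBytes, cipher); }

    public static void main(String[] args) {
        byte[] key = deriveKeyBytes(1234L);           // secret from the key-generation phase
        byte[] pattern = "a,b,c,d".getBytes(StandardCharsets.UTF_8);
        byte[] cipherPattern = encrypt(key, pattern); // what a data holder sends
        byte[] decoded = decrypt(key, cipherPattern); // what the centralized server recovers
        System.out.println(new String(decoded, StandardCharsets.UTF_8));
    }
}
```

Since the same symmetric key is reconstructed at the centralized server, the decoder simply calls `decrypt` with it before passing the patterns to the mining process.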
The group key manager receives the registration requests from all the users, generates a verification share, and forwards it to all the requesting users for authentication purposes; it then generates the key using the key generation process and forwards the points needed to extract the key from the equation generated by the verification points.
The key generation protocol receives the verification shares and the key as input to construct the Lagrange polynomial equation f(x), which passes through (0, key) and the verification points; after that the group key manager forwards the points to the data owners. The data owners then reconstruct the key from the verification points and check the authentication code sent by the group key manager.
When a new user tries to download a file, the new user need not contact the other data owners to decrypt the file. The user connects to the group key manager, which updates the group key, decrypts the files with the previous key, re-encrypts them with the new key, and distributes the new key to all the data owners.
Key generation process
The goal is to divide a secret S (e.g., a safe combination) into n pieces of data D1, ..., Dn in such a way that:
1. Knowledge of any k or more Di pieces makes S easily computable.
2. Knowledge of any k-1 or fewer Di pieces leaves S completely undetermined (in the sense that all its possible values are equally likely).
This scheme is called a (k, n) threshold scheme. If k = n, then all participants are required to reconstruct the secret.
Example: let the secret be S = 1234. Consider n = 6 and k = 3, and pick the random integers a1 = 166 and a2 = 94, giving
f(x) = 1234 + 166x + 94x^2
The secret share points are D0 = (1, 1494), D1 = (2, 1942), D2 = (3, 2578), D3 = (4, 3402), D4 = (5, 4414), D5 = (6, 5614).
We give each participant a different single point (both x and f(x)). Because we use D(x-1) instead of Dx, the points start from (1, f(1)) and not (0, f(0)). This is necessary because anyone holding (0, f(0)) would also know the secret (S = f(0)).
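The share generation above can be sketched in a few lines of Java. Note that this mirrors the paper's worked example over plain integers; a production threshold scheme would evaluate the polynomial modulo a prime to avoid leaking information:

```java
public class ShareGeneration {

    // The example polynomial f(x) = 1234 + 166x + 94x^2, whose free
    // coefficient is the secret S = 1234.
    static long f(long x) {
        return 1234 + 166 * x + 94 * x * x;
    }

    public static void main(String[] args) {
        // One share per participant: D(x-1) = (x, f(x)) for x = 1..6
        for (long x = 1; x <= 6; x++) {
            System.out.println("D" + (x - 1) + " = (" + x + ", " + f(x) + ")");
        }
    }
}
```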
Reconstruction
In order to reconstruct the secret, any 3 points are enough. Let us consider
(x0, y0) = (2, 1942), (x1, y1) = (4, 3402), (x2, y2) = (5, 4414).
Using Lagrange basis polynomials:
L0 = (x - x1)/(x0 - x1) * (x - x2)/(x0 - x2) = (x - 4)/(2 - 4) * (x - 5)/(2 - 5) = (1/6)x^2 - (3/2)x + 10/3
L1 = (x - x0)/(x1 - x0) * (x - x2)/(x1 - x2) = (x - 2)/(4 - 2) * (x - 5)/(4 - 5) = -(1/2)x^2 + (7/2)x - 5
L2 = (x - x0)/(x2 - x0) * (x - x1)/(x2 - x1) = (x - 2)/(5 - 2) * (x - 4)/(5 - 4) = (1/3)x^2 - 2x + 8/3
f(x) = Σj yj * Lj(x) = 1942((1/6)x^2 - (3/2)x + 10/3) + 3402(-(1/2)x^2 + (7/2)x - 5) + 4414((1/3)x^2 - 2x + 8/3)
f(x) = 1234 + 166x + 94x^2
Recall that the secret is the free coefficient, which means that S = 1234.
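Since only the free coefficient f(0) is needed, the reconstruction can evaluate the Lagrange interpolant directly at x = 0 instead of expanding the whole polynomial. A minimal sketch, using the same three shares as above:

```java
public class SecretReconstruction {

    // Lagrange interpolation at x = 0:
    // S = sum_j y_j * prod_{m != j} x_m / (x_m - x_j)
    static long reconstruct(long[] xs, long[] ys) {
        double secret = 0;
        for (int j = 0; j < xs.length; j++) {
            double term = ys[j];
            for (int m = 0; m < xs.length; m++) {
                if (m != j) term *= (double) xs[m] / (xs[m] - xs[j]);
            }
            secret += term;
        }
        return Math.round(secret);  // sum is an integer; rounding absorbs float error
    }

    public static void main(String[] args) {
        // Any k = 3 shares suffice; here (2, 1942), (4, 3402), (5, 4414)
        long s = reconstruct(new long[]{2, 4, 5}, new long[]{1942, 3402, 4414});
        System.out.println("S = " + s); // recovers 1234
    }
}
```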
The data owner initiates the request by sending a random challenge to the group key manager; as a response, the group key manager sends a secret share. The data owner authenticates and forwards the verification share; on receiving the verification shares, the key is generated using the Lagrange's polynomial equation and the points are forwarded to the data owners for regenerating the key.
Fig 2: Key exchange between the node users and the group key manager: (1) Request (Rch), (2) Response (Sshare), (3) Vshare, (4) Points (a subset of the P points), where Rch is a random challenge, Sshare is a secret share, Vshare is a verification share, and P = {p1, p2, ..., pn} are the points for the construction of the Lagrange equation.

Binary Matrix:
The server or service provider performs association rule mining on the cipher database to find the maximum frequent itemsets. This research therefore presents a new algorithm for mining maximum frequent itemsets based on the Binary Matrix of frequent length-1 itemsets. The main idea of the algorithm is to create a Binary Matrix with the frequent length-1 itemsets as row headings and the transaction records' IDs as column headings. The matrix contains only two types of values, '1' and '0', indicating whether or not the transaction record contains the corresponding frequent length-1 itemset. It is then necessary to calculate the number of 1-values in each column and the count of the columns with the same number of 1-values.
Algorithm for frequent pattern extraction:
Step1: Read the item set {I1, I2, ..., In} and initialize counter := 0, final counter := 0
Step2: for i := 0; i < n; i++
           for j := 0; j < trans_size; j++
               if intersection(i, j) == 1 then counter := counter + 1
           next
           if counter == Ii.size() then add Ii to the item list
       next
Step3: Set the minimum threshold value (t)
Step4: for k := 0; k < itemlist_size; k++
           if item_list[k].count >= t then add item_list[k] to the frequent item list
       next
Step5: Return the frequent pattern list
Algorithm for Binary Matrix construction:
Step1: While patterns are available
Step2: Read the individual pattern Pi, separated by a special character.
Step3: Construct an empty matrix with i rows and j columns, where 'i' indexes the items and 'j' the transaction IDs.
Step4: Set intersection(i, j) = 1 if the corresponding item 'i' is available in transaction ID 'j', else set it to 0.
Step5: Repeat steps 2 to 4.
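The construction steps above can be sketched in Java for the sample transactions used later in the experimental analysis (the item universe {a, b, c, d, e} and the comma as the separating character are taken from that example):

```java
public class BinaryMatrixBuilder {

    static final String[] ITEMS = {"a", "b", "c", "d", "e"};

    // One row per item, one column per transaction;
    // cell (i, j) = 1 iff item i occurs in transaction j.
    static int[][] build(String[] transactions) {
        int[][] matrix = new int[ITEMS.length][transactions.length];
        for (int j = 0; j < transactions.length; j++) {
            // patterns are separated by a special character (here, a comma)
            for (String item : transactions[j].split(",")) {
                for (int i = 0; i < ITEMS.length; i++) {
                    if (ITEMS[i].equals(item.trim())) matrix[i][j] = 1;
                }
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        String[] txns = {"a,b,c,d", "a,c,e", "b,d", "b,c,e", "c,d,e", "a,b,c,d,e"};
        int[][] m = build(txns);
        for (int i = 0; i < ITEMS.length; i++) {
            System.out.println(ITEMS[i] + " -> " + java.util.Arrays.toString(m[i]));
        }
    }
}
```

Each pattern is read once, so the matrix is built in a single scan of the database.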
Now we can extract frequent patterns from the matrix. To extract the frequent 1-itemsets, first count the number of ones in the vertical column of each item; if the count meets the minimum threshold value, treat the item as frequent, else ignore it. Continue the same process for 2-itemsets: check whether the two items both have '1' in their corresponding vertical columns, increment the count if so, and continue until all transactions are verified. If the total count is greater than the threshold value, treat the itemset as frequent.
EXPERIMENTAL ANALYSIS
For the experimental analysis we implemented the approach in Java and considered a set of transactions for mining frequent patterns. Let us consider the following sample transactions:

Transaction   Pattern
T1            a,b,c,d
T2            a,c,e
T3            b,d
T4            b,c,e
T5            c,d,e
T6            a,b,c,d,e

Fig 3: Transaction Table
The binary matrix is constructed based on the availability of each item in each transaction. The first transaction contains "a,b,c,d", so the corresponding item positions are set to '1' in the first transaction's column and the rest to '0'. For the second transaction, "a,c,e", the corresponding item positions are set to '1' in the second column. The process continues until all transactions are completed.
Frequent patterns generation
Initially the frequent 1-itemsets are generated by counting the occurrences of each individual item across all transactions. Consider item 'a': counting the number of '1's in the row for 'a' gives a total count of 3, because 'a' is available in transactions 1, 2, and 6. If the count is equal to or greater than the minimum threshold value or support count (2 in our example), the item is treated as frequent.
Frequent 'n' itemsets
To find the frequent 2-itemsets, 3-itemsets, or n-itemsets, we follow the same procedure until no more frequent itemsets are found. Consider the 2-itemset {a,b}: check the columns in which both 'a' and 'b' are set to '1'. In the Binary Matrix, transactions 1 and 6 contain '1' in the positions of both a and b, so the count is 2. Since our minimum support count is 2, {a,b} is a frequent itemset; the remaining frequent patterns can be found by the same process.
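The counting procedure above generalizes to any candidate itemset: a transaction supports the set only if every member's row holds a '1' in that transaction's column. A minimal sketch over the example matrix (rows a..e, columns T1..T6):

```java
public class FrequentItemsets {

    // Count transactions whose columns are '1' for every item in the candidate set.
    static int support(int[][] matrix, int[] itemRows) {
        int count = 0;
        for (int j = 0; j < matrix[0].length; j++) {
            boolean allPresent = true;
            for (int row : itemRows) {
                if (matrix[row][j] == 0) { allPresent = false; break; }
            }
            if (allPresent) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Binary matrix for T1..T6 (rows: a, b, c, d, e)
        int[][] m = {
            {1, 1, 0, 0, 0, 1},  // a
            {1, 0, 1, 1, 0, 1},  // b
            {1, 1, 0, 1, 1, 1},  // c
            {1, 0, 1, 0, 1, 1},  // d
            {0, 1, 0, 1, 1, 1},  // e
        };
        int minSupport = 2;
        System.out.println("support(a)   = " + support(m, new int[]{0}));     // 3
        System.out.println("support(a,b) = " + support(m, new int[]{0, 1}));  // 2
        System.out.println("{a,b} frequent? " + (support(m, new int[]{0, 1}) >= minSupport));
    }
}
```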
Itemset   Transaction IDs
          1   2   3   4   5   6
a         1   1   0   0   0   1
b         1   0   1   1   0   1
c         1   1   0   1   1   1
d         1   0   1   0   1   1
e         0   1   0   1   1   1
Association rules generation:
Frequent itemsets are not the same as association rules; one more step is required to find the association rules. In order to obtain an association between entities or items, i.e. A → B, we need support(A → B) and support(A). All the information required for the confidence computation has already been recorded during itemset generation.
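Since the supports are already recorded, confidence is a single division. A sketch with the counts from the sample matrix (support(a) = 3, support(b) = 4, support({a,b}) = 2); the `SUPPORT` lookup table is an illustrative stand-in for whatever structure the implementation records during itemset generation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class RuleGeneration {

    // Support counts recorded during itemset generation (from the example matrix).
    static final Map<Set<String>, Integer> SUPPORT = new HashMap<>();
    static {
        SUPPORT.put(Set.of("a"), 3);
        SUPPORT.put(Set.of("b"), 4);
        SUPPORT.put(Set.of("a", "b"), 2);
    }

    // confidence(A -> B) = support(A ∪ B) / support(A), where X = A ∪ B
    static double confidence(Set<String> a, Set<String> x) {
        return (double) SUPPORT.get(x) / SUPPORT.get(a);
    }

    public static void main(String[] args) {
        double conf = confidence(Set.of("a"), Set.of("a", "b"));  // 2/3
        System.out.printf("confidence(a -> b) = %.3f%n", conf);
        System.out.println("rule holds at minconf 0.5: " + (conf >= 0.5));
    }
}
```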
For each frequent itemset X:
    For each proper nonempty subset A of X:
        Let B = X - A
        A → B is an association rule if confidence(A → B) ≥ minconf, where
            support(A → B) = support(A ∪ B) = support(X)
            confidence(A → B) = support(A → B) / support(A)

IV. CONCLUSION
We conclude our research work with an efficient frequent pattern mining approach that operates in a secure manner over horizontal databases: a secure key is generated through an efficient and improved Lagrange's polynomial equation, and the cipher data is received and decrypted by the centralized server, which then finds the frequent patterns at its end in an accurate and efficient manner.

REFERENCES
[1] D. Beaver. The round complexity of secure protocols. Harvard University.
[2] D.W.L. Cheung, V.T.Y. Ng, A.W.C. Fu, and Y. Fu. Efficient mining of association rules in distributed databases. IEEE Trans. Knowl. Data Eng., 8(6):911–922, 1996.
[3] R. Agrawal and R. Srikant. Privacy-preserving data mining. In SIGMOD Conference, pages 439–450, 2000.
[4] M. Bellare, R. Canetti, and H. Krawczyk. Keying hash functions for message authentication. In CRYPTO, pages 1–15, 1996.
[5] A. Ben-David, N. Nisan, and B. Pinkas. FairplayMP - a system for secure multi-party computation. In CCS, pages 257–266, 2008.
[6] J.C. Benaloh. Secret sharing homomorphisms: keeping shares of a secret. In CRYPTO, pages 251–260, 1986.
[7] J. Brickell and V. Shmatikov. Privacy-preserving graph algorithms in the semi-honest model. In ASIACRYPT, pages 236–252, 2005.
[8] D.W.L. Cheung, J. Han, V.T.Y. Ng, A.W.C. Fu, and Y. Fu. A fast distributed algorithm for mining association rules. In PDIS, pages 31–42, 1996.
[9] D.W.L. Cheung, V.T.Y. Ng, A.W.C. Fu, and Y. Fu. Efficient mining of association rules in distributed databases. IEEE Trans. Knowl. Data Eng., 8(6):911–922, 1996.
[10] T. ElGamal. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory, 31:469–472, 1985.

BIOGRAPHIES
Chintada Ajay Kumar completed his B.Tech degree in Computer Science and Engineering and is pursuing the M.Tech degree in the Department of Computer Science and Engineering at Aditya Institute of Technology and Management (AITAM), Tekkali, A.P., India. His areas of interest are Data Mining and Computer Networks.
K. Prasada Rao completed his B.Tech in 2004 and his M.Tech in 2009. He is pursuing a Ph.D. from Acharya Nagarjuna University and is working as a Sr. Assistant Professor in the Department of Computer Science and Engineering at Aditya Institute of Technology and Management (AITAM), Tekkali, A.P., India. His areas of interest are Data Mining and Computer Networks.