International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 7 – Oct 2014
Privacy Mining of Association Rules in Transversal
Shared Databases
Tammineni Krushnam Raju1, Ramesh Kumar Behara2
1Final M.Tech Student, 2Assistant Professor
1,2Dept of CSE, Sarada Institute of Science, Technology And Management (SISTAM), Srikakulam, Andhra Pradesh
Abstract: Association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using different measures of interestingness. In this paper we propose an approach for horizontally distributed databases that protects the mined patterns. To secure the transaction item sets, we use the AES cryptographic algorithm for encryption and decryption of the item sets. The shared key is generated using the Shamir secret sharing technique, and frequent item sets are found with the Apriori algorithm. Together, these techniques provide security and reduce the time complexity of finding frequent items.
I. INTRODUCTION
In distributed networks or open environments, nodes communicate with each other openly to transmit data, and secure mining is an area of rapid research. This research addresses privacy-preserving techniques for data mining tasks such as classification, association rule mining and clustering.
Randomization and perturbation approaches are available for privacy preservation, and privacy can be maintained in two ways. The first is the cryptographic approach, in which real data sets are converted to unrealized data sets by encoding them. The second is imputation, in which fake values are inserted into the real data set and extracted during mining according to some rules [1][2].
Clustering is the process of grouping similar objects based on the distance (for numerical data) or similarity (for categorical data) between data objects. In a distributed environment, data holders or players maintain individual data sets, and every node or vertex is connected to the others by an edge along with their quasi-identifiers [3].
In association rule mining approaches, frequent patterns are first generated by following the procedural steps of the algorithm, based on support count, until no further frequent item set is available. The algorithm then finds the association rules between entities that satisfy the minimum support and confidence thresholds with respect to all frequent item sets.
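The support and confidence measures mentioned above can be illustrated with a minimal Python sketch. The function names and the toy transactions are hypothetical and are not taken from the paper; support is computed here as a fraction of all transactions, which is one common convention.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent:
    support(antecedent and consequent together) / support(antecedent)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(both, transactions) / support(a, transactions)


# Hypothetical toy transaction database (illustration only).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

sup = support({"bread", "milk"}, transactions)
conf = confidence({"bread"}, {"milk"}, transactions)
```

A rule is reported only when both values reach the user-chosen minimum support and minimum confidence.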
Most existing text clustering algorithms are designed for central execution. They require that clustering is performed on a dedicated node, and are not suitable for deployment over large-scale distributed networks. Therefore, specialized algorithms for distributed and P2P clustering have been developed, such as [6], [7], [8], [9]. However, these approaches are either limited to a small number of nodes, or they focus on low-dimensional data only. In a distributed environment, nodes are represented by …
A privacy-preserving distributed clustering algorithm was proposed by S. Jha, L. Kruger, and P. McDaniel, in which data is clustered by grouping similar objects and transmitted securely through protocols [4]. A perturbation method based on string transformation that uses geometric techniques has also been proposed for privacy-preserving clustering [10].
II. RELATED WORK
In traditional association rule mining, companies give their data to an analyst, who finds the patterns or association rules that exist between the items. Although this makes it possible to perform sophisticated analysis on tremendous volumes of data in a cost-effective way, the data-mining-as-a-service paradigm raises several serious security issues. One of the main security issues is that the server has access to valuable data of the owner and may learn sensitive information from it, resulting in a loss of corporate privacy. Traditional distributed algorithms are based on Apriori; the main disadvantage of this approach is that it requires multiple database scans and candidate set generation.
Association rule mining is one of the most important and well-researched methods of data mining. It aims to extract interesting correlations, frequent patterns, associations or informal structures among sets of objects in transaction databases or other data repositories. Association rules are widely used in areas such as telecommunication networks, market and risk management, and inventory control [1]. Different association mining methods and algorithms are briefly introduced and compared later. Association rule mining finds the association rules that satisfy the predefined minimum support and confidence from a database [3].
The problem is decomposed into two subproblems. The first is to find the item sets whose occurrences exceed a predefined threshold in the database; these item sets are known as frequent or large item sets. The second is to generate association rules from those large item sets under the constraint of minimal confidence [2].
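The second subproblem can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: `generate_rules` and the toy support counts are hypothetical, and the sketch assumes that the support count of every frequent item set and of each of its subsets is already known from the first subproblem.

```python
from itertools import combinations


def generate_rules(support_counts, min_conf):
    """Return (antecedent, consequent, confidence) for every rule that
    reaches min_conf. `support_counts` maps frozenset -> support count."""
    rules = []
    for itemset, count in support_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                conf = count / support_counts[a]
                if conf >= min_conf:
                    rules.append((a, itemset - a, conf))
    return rules


# Hypothetical support counts (illustration only).
counts = {
    frozenset({"bread"}): 3,
    frozenset({"milk"}): 3,
    frozenset({"bread", "milk"}): 2,
}

rules = generate_rules(counts, min_conf=0.6)
```

With these counts both bread -> milk and milk -> bread have confidence 2/3, so both pass a 0.6 threshold and neither passes 0.7.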
Two main approaches to using multiple processors have emerged: distributed memory, in which each processor has a private memory [6], and shared memory, in which all processors access a common memory. The shared memory architecture has many desirable properties. Each processor has direct and equal access to all memory in the system [4].
In the distributed memory architecture, each processor has its own local memory that can be accessed directly only by that processor. A parallel application can be divided into a number of subtasks that are executed in parallel on separate processors in the system. However, the performance of a parallel application on a distributed system depends mainly on how the tasks comprising the application are allocated to the available processors in the system [5].
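Applied to the horizontally distributed setting of this paper, the idea of dividing work into subtasks over private memory can be sketched as each site counting items in its own partition and a coordinator summing the local counts. The sketch below is only an illustration under that assumption; `local_counts`, `global_counts` and the two partitions are hypothetical, and the sites are simulated sequentially rather than on separate processors.

```python
from collections import Counter


def local_counts(partition):
    """Subtask run at one site: count the items in its own transactions."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts


def global_counts(partitions):
    """Coordinator step: add up the per-site counts."""
    total = Counter()
    for partition in partitions:
        total.update(local_counts(partition))
    return total


# Hypothetical horizontal partitions held by two sites (illustration only).
site_a = [{"a", "b"}, {"a"}]
site_b = [{"b", "c"}, {"a", "b"}]

totals = global_counts([site_a, site_b])
```

Only the small count tables cross site boundaries in this arrangement; the raw transactions stay in each site's local memory.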
Categorization models are the most widely applied method. The Apriori algorithm is the most representative algorithm for association rule mining, and many modified versions of it focus on improving its efficiency and accuracy.
III. PROPOSED SYSTEM
The proposed system finds frequent items and also secures the transaction item sets through a cryptographic algorithm. In the proposed system we use a horizontally distributed database for finding the frequent items across the different databases. We use the Apriori algorithm to find frequent item sets. Before the frequent item sets are found, users send their item sets to a third party in the form of cipher item sets, which are produced with the AES algorithm. The AES algorithm requires one secret key, which is generated using Shamir's shared key technique. By combining these two techniques, the proposed system is more secure and also reduces the time needed to find frequent item sets.
Authentication of Users:
In this module we authenticate the users before they send the transaction database. Users are authenticated using an authenticated group key transfer protocol. The authentication process is as follows.
• The initiator sends a key generation request to the key generation center (KGC) with a list of group members M = {m1, m2, …, mn}.
• The KGC broadcasts the list of all participating members M = {m1, m2, …, mn} as a response.
• Each group member mi generates a random challenge Rcha_i and sends it to the KGC.
• The KGC randomly selects a group key k and generates an interpolated polynomial f(x) of degree t that passes through the t+1 points (0, k) and (ai, bi + Rcha_i) for i = 1, 2, …, t.
[Figure: architecture of the proposed system. Data Holder 1, Data Holder 2, …, Data Holder N each perform (1) authentication and key generation, (2) read item sets and encrypt them with the AES algorithm, and (3) send the transaction item sets to the centralized server, which (4) finds frequent item sets with the Apriori algorithm.]
• The KGC also generates t additional points P1, …, Pt on f(x) and computes auth = h(k; m1, m2, …, mn; Rcha_1, …, Rcha_t; P1, …, Pt), where h is a hash function.
• After generating the authentication code, the KGC sends it to all group members.
• Each group member calculates its shared secret, retrieves the additional points from the KGC and computes the polynomial function.
• After computing the polynomial function, the group member recovers the group key.
• Before accepting the recovered group key, every group member checks the authentication code by recomputing auth = h(k; m1, m2, …, mn; Rcha_1, …, Rcha_t; P1, …, Pt). If both authentication codes are equal, the group member is authenticated.
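The steps of the group key transfer can be sketched in Python as follows. This is a simplified illustration under several assumptions that the paper does not state: arithmetic is done modulo the prime 2^127 − 1, SHA-256 stands in for the hash function h, each member is assumed to already share a secret pair (a, b) with the KGC, and all function names are hypothetical.

```python
import hashlib
import secrets

P = 2**127 - 1  # prime modulus; an assumption of this sketch


def lagrange_at(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def auth_code(k, members, challenges, extra_points):
    """auth = h(k; members; challenges; additional points), h = SHA-256 here."""
    data = repr((k, members, challenges, extra_points)).encode()
    return hashlib.sha256(data).hexdigest()


def kgc_distribute(members, shared_ab, challenges):
    """KGC side: pick a group key k, interpolate f through (0, k) and
    (a_i, b_i + Rcha_i), then publish t additional points and auth."""
    k = secrets.randbelow(P)
    points = [(0, k)] + [
        (shared_ab[m][0], (shared_ab[m][1] + challenges[m]) % P) for m in members
    ]
    used = {x for x, _ in points}
    extra, x = [], 1
    while len(extra) < len(members):
        if x not in used:
            extra.append((x, lagrange_at(points, x)))
        x += 1
    ch_list = [challenges[m] for m in members]
    return k, extra, auth_code(k, members, ch_list, extra)


def member_recover(own_ab, own_challenge, members, ch_list, extra, auth):
    """Member side: own point plus the t public points give t+1 points,
    enough to recover f(0) = k; the key is accepted only if auth matches."""
    a, b = own_ab
    k = lagrange_at([(a, (b + own_challenge) % P)] + extra, 0)
    if auth_code(k, members, ch_list, extra) != auth:
        raise ValueError("authentication failed")
    return k


# Hypothetical group of three members (illustration only).
members = ["m1", "m2", "m3"]
shared_ab = {"m1": (101, 11), "m2": (202, 22), "m3": (303, 33)}
challenges = {m: secrets.randbelow(P) for m in members}
group_key, extra, auth = kgc_distribute(members, shared_ab, challenges)
ch_list = [challenges[m] for m in members]
recovered = [
    member_recover(shared_ab[m], challenges[m], members, ch_list, extra, auth)
    for m in members
]
```

A party that does not hold a valid shared pair interpolates a different polynomial, obtains a different key, and fails the authentication check.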
Shared secret key:
In this module, once a user completes authentication, he or she recovers the shared secret key from the polynomial function. The shared secret key is generated using Shamir's secret sharing and Lagrange's polynomial interpolation. After the secret key is generated, each user encrypts and decrypts the transaction item sets using this key and any cryptographic technique.
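Share generation and Lagrange reconstruction can be sketched in Python as follows. The secret 1234, the coefficients 166 and 94, n = 6 and the threshold of 3 are the values from this paper's key generation example; the function names are hypothetical. The sketch uses plain integer and rational arithmetic to mirror that example, whereas practical Shamir implementations normally work over a finite field.

```python
from fractions import Fraction


def make_shares(secret, coeffs, n):
    """Evaluate f(x) = secret + coeffs[0]*x + coeffs[1]*x^2 + ... at x = 1..n."""
    poly = [secret] + list(coeffs)
    return [(x, sum(c * x**i for i, c in enumerate(poly))) for x in range(1, n + 1)]


def reconstruct(points):
    """Lagrange interpolation evaluated at x = 0, which yields the
    constant term of f, i.e. the secret."""
    secret = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= Fraction(0 - xj, xi - xj)
        secret += term
    return int(secret)


shares = make_shares(1234, [166, 94], 6)
# Any three shares suffice; here the ones at x = 2, 4 and 5 are used.
recovered = reconstruct([shares[1], shares[3], shares[4]])
```

Because the polynomial has degree 2, any three distinct shares determine it completely, while two shares leave the constant term undetermined.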
Key Generation Process:
Example:
• Let us consider a secret key S = 1234.
• Let n = 6 be the total number of points and k = 3 the minimum number of secret shares needed for reconstruction. Choosing the two random numbers a = 166 and b = 94 gives
$$f(x) = 1234 + 166x + 94x^2$$
• The points that satisfy this equation, i.e. the secret shares, are
D0 = (1, 1494), D1 = (2, 1942), D2 = (3, 2578), D3 = (4, 3402), D4 = (5, 4414), D5 = (6, 5614)
The centralized server forwards one specific point (both x and y) to each participant. Because share Di is taken at x = i + 1, the points start from (1, f(1)) and not (0, f(0)). This is required because anyone who held (0, f(0)) would also know the secret key (S = f(0)).
Re-construction of secret key:
• To reconstruct the secret key, any three points are enough.
• Let us consider (x0, y0) = (2, 1942), (x1, y1) = (4, 3402), (x2, y2) = (5, 4414).
Using Lagrange polynomials:
$$L_0(x) = \frac{x - x_1}{x_0 - x_1} \cdot \frac{x - x_2}{x_0 - x_2} = \frac{x - 4}{2 - 4} \cdot \frac{x - 5}{2 - 5} = \frac{1}{6}x^2 - \frac{3}{2}x + \frac{10}{3}$$
$$L_1(x) = \frac{x - x_0}{x_1 - x_0} \cdot \frac{x - x_2}{x_1 - x_2} = \frac{x - 2}{4 - 2} \cdot \frac{x - 5}{4 - 5} = -\frac{1}{2}x^2 + \frac{7}{2}x - 5$$
$$L_2(x) = \frac{x - x_0}{x_2 - x_0} \cdot \frac{x - x_1}{x_2 - x_1} = \frac{x - 2}{5 - 2} \cdot \frac{x - 4}{5 - 4} = \frac{1}{3}x^2 - 2x + \frac{8}{3}$$
$$f(x) = \sum_{j=0}^{2} y_j L_j(x) = 1942\left(\frac{1}{6}x^2 - \frac{3}{2}x + \frac{10}{3}\right) + 3402\left(-\frac{1}{2}x^2 + \frac{7}{2}x - 5\right) + 4414\left(\frac{1}{3}x^2 - 2x + \frac{8}{3}\right)$$
$$f(x) = 1234 + 166x + 94x^2$$
Here the constant term $a_0$ is the secret key, which means that the secret key S is 1234.
Encryption and decryption of transaction item sets:
In this module each user collects all transaction item sets from the transaction database. After retrieving the item sets, the user converts the plain transaction item sets into cipher text using a cryptographic technique. In this paper we use the AES algorithm for encryption and decryption of transaction item sets. The user encrypts the transaction item sets using the shared secret key and the AES algorithm, and then sends the encrypted transaction item sets to the analyst.
Apriori Algorithm:
The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent itemset properties. It is an iterative approach in which k-itemsets are used to find (k+1)-itemsets. To improve efficiency, an important property called the Apriori property is used to reduce the search space.
Algorithm:
1) L1 = {large 1-itemsets};
2) for (k = 2; Lk-1 ≠ Ø; k++) do begin
3) Ck = apriori-gen(Lk-1); // New candidates
4) forall transactions t ∈ D do begin
5) Ct = subset(Ck, t); // Candidates contained in t
6) forall candidates c ∈ Ct do
7) c.count++;
8) end
9) Lk = {c ∈ Ck | c.count ≥ minsup}
10) end
11) Answer = ∪k Lk;
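The Apriori listing above can be sketched in Python as follows. This is a minimal, unoptimized illustration: `minsup` is taken as an absolute count, the candidate generation joins any two frequent (k−1)-itemsets whose union has k items and then prunes with the Apriori property, and the toy transactions are hypothetical.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Build k-item candidates from frequent (k-1)-itemsets and keep only
    those whose (k-1)-subsets are all frequent (the Apriori property)."""
    prev = set(prev_frequent)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == k and all(
                frozenset(s) in prev for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def apriori(transactions, minsup):
    """Return {itemset: count} for every itemset with count >= minsup."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {c: n for c, n in counts.items() if n >= minsup}  # L1
    answer = dict(current)
    k = 2
    while current:  # loop while L(k-1) is non-empty
        candidates = apriori_gen(current, k)  # Ck
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:  # candidate contained in t
                    counts[c] += 1
        current = {c: n for c, n in counts.items() if n >= minsup}  # Lk
        answer.update(current)
        k += 1
    return answer


# Hypothetical toy transaction database (illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(transactions, minsup=2)
```

In this example the 3-item candidate is pruned without a database scan, because its subset {milk, butter} is not frequent.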
In this module the analyst retrieves all cipher transaction item sets from the users. After retrieving them, the analyst decrypts the transaction item sets using the AES algorithm. After the decryption process is complete, the analyst finds the frequent item sets using a pattern mining technique. In this paper we use the Apriori algorithm for finding frequent item sets.
IV. CONCLUSION
In this paper we proposed an approach for finding frequent item sets in transaction databases. Many pattern mining algorithms are available for finding frequent item sets; in this paper we use the Apriori algorithm. Before the frequent item sets are found, each data holder is verified as an authenticated user. After authentication is complete, each user generates the shared secret key for the encryption and decryption of transaction item sets. The secret key is generated using Shamir's shared key technique and Lagrange's polynomial equation. The transaction item sets are encrypted and decrypted using the AES algorithm. By combining these concepts we provide more security and find frequent patterns more effectively.
REFERENCES
[1] R. Agrawal, C. Faloutsos, and A. Swami. Efficient similarity search in sequence databases. In Proc. of the Fourth International Conference on Foundations of Data Organization and Algorithms, Chicago, October 1993.
[2] R. Agrawal, S. Ghosh, T. Imielinski, B. Iyer, and A. Swami. An interval classifier for database mining applications. In Proc. of the VLDB Conference, pages 560–573, Vancouver, British Columbia, Canada, 1992.
[3] R. Agrawal, T. Imielinski, and A. Swami. Database mining: A performance perspective. IEEE Transactions on Knowledge and Data Engineering, 5(6):914–925, December 1993. Special Issue on Learning and Discovery in Knowledge-Based Databases.
[4] M. Bellare, R. Canetti, and H. Krawczyk. Keying hash functions for message authentication. In Crypto, pages 1–15, 1996.
[5] A. Ben-David, N. Nisan, and B. Pinkas. FairplayMP - A system for secure multi-party computation. In CCS, pages 257–266, 2008.
[6] J.C. Benaloh. Secret sharing homomorphisms: Keeping shares of a secret secret. In Crypto, pages 251–260, 1986.
[7] J. Brickell and V. Shmatikov. Privacy-preserving graph algorithms in the semi-honest model. In ASIACRYPT, pages 236–252, 2005.
[8] D.W.L. Cheung, J. Han, V.T.Y. Ng, A.W.C. Fu, and Y. Fu. A fast distributed algorithm for mining association rules. In PDIS, pages 31–42, 1996.
[9] D.W.L. Cheung, V.T.Y. Ng, A.W.C. Fu, and Y. Fu. Efficient mining of association rules in distributed databases. IEEE Trans. Knowl. Data Eng., 8(6):911–922, 1996.
[10] T. ElGamal. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory, 31:469–472, 1985.
BIOGRAPHIES
Tammineni Krushnam Raju is an M.Tech (CSE) student at Sarada Institute of Science, Technology And Management, Srikakulam. He received his B.Tech (IT) from Sri Venkateswara College of Engineering & Technology, Etcherla, Srikakulam. His areas of interest are data warehousing, Java and Oracle database.
Ramesh Kumar Behara is working as an Assistant Professor at Sarada Institute of Science, Technology And Management, Srikakulam, Andhra Pradesh. He received his M.Tech (CSE) from Sarada Institute of Science, Technology And Management, Srikakulam, Andhra Pradesh, JNTU Kakinada, Andhra Pradesh. His research areas include network security.
ISSN: 2231-5381
http://www.ijettjournal.org