An Efficient and Secure Horizontal Partitioning Technique between Centralized Data holders

International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 8 – Oct 2014
An Efficient and Secure Horizontal Partitioning
Technique between Centralized Data holders
T. Padmavathy 1, P. Suresh Babu 2
1 Final M.Tech Student, 2 Associate Professor
1,2 Department of Computer Science and Engineering, Kaushik College of Engineering, JNTUK University.
Abstract: Secure association rule mining over horizontally
partitioned databases is a long-standing research issue in the
field of knowledge and data engineering. In horizontal
partitioning, databases from various data holders (players) are
integrated so that association rule mining can be applied over
the integrated database. In this paper we propose a
privacy-preserving mining approach that uses improved Shamir
secret sharing for secure key generation and a Bit Wise Matrix
approach for mining.
Index Terms: Association Rule Mining, Bit Wise Matrix,
Lagrange polynomial.
I. INTRODUCTION
Association rule mining is one of the most important and most
thoroughly researched methods of data mining. Its main aim is to
find frequent patterns, associations and informal structures
among large sets of objects in transaction databases or other
data repositories. Association rules are widely used in areas
such as communication networks, marketing, risk management and
inventory control [1]. Many association mining algorithms have
been introduced, studied and compared. An association rule
mining algorithm finds the association rules in a database that
satisfy predefined minimum support and confidence thresholds [3].
The problem is usually divided into two subproblems. The first
is to find the item sets whose occurrences exceed a predefined
threshold in the database; such item sets are known as frequent
or large item sets. The second is to generate association rules
from those large item sets under a minimum confidence
constraint [2]. Two main approaches for utilizing multiple
processors have evolved: distributed storage, in which each
processor has private storage [6], and shared storage, in which
all processors have access to common storage.
A shared storage architecture has many desirable properties.
Every processor has equal access to all storage in the
system [4]. In a distributed storage architecture, each
processor has its own local storage that can be accessed
directly only by that processor.
ISSN: 2231-5381
A parallel application can be divided into a number of subtasks
that are executed in parallel on separate processors in the
system. However, the performance of a parallel application on a
distributed system depends largely on how the tasks comprising
the application are allocated to the available processors in the
system [5].
Among the many data mining models, including association rules,
clustering and classification, association rule mining is the
most widely applied. The Apriori algorithm is the most
representative algorithm for association rule mining. Many
modified versions of it exist, which concentrate mainly on
improving its efficiency and accuracy. Association rule learning
is a basic method for finding associations among many variables.
It is used by retailers and by many others with large
transactional databases [7]. Association rules are conditional
(if/then) statements that help to find relationships between
seemingly unrelated data in a relational database or any other
information repository. An example of an association rule is
"If a customer buys a dozen loaves of bread, he is 80% likely to
also purchase butter or jam."
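The "80% likely" figure in such a rule is its confidence. A minimal sketch of how support and confidence could be computed over a list of transactions is given below; the function names and the basket data are illustrative assumptions, not taken from the paper.

```python
from typing import FrozenSet, Iterable, List


def count(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t)


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of all transactions that contain `itemset`."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Fraction of transactions containing the antecedent that also
    contain the consequent."""
    a = frozenset(antecedent)
    a_count = count(transactions, a)
    if a_count == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return count(transactions, a | frozenset(consequent)) / a_count


# Hypothetical shopping baskets, for illustration only.
baskets = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "jam"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread"}),
    frozenset({"milk"}),
]

print(support(baskets, {"bread", "butter"}))
print(confidence(baskets, {"bread"}, {"butter"}))
```

Here four of the five baskets that contain bread also contain butter, so the rule "bread implies butter" holds with the 80% confidence used in the example above.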
Association rules are created by analysing the data for frequent
if/then patterns and using the criteria of support and
confidence to identify the most important associations [9].
Support is a measure of how frequently the items appear in the
database. Confidence indicates how often the if/then statement
has been found to be true. In data mining, association rules are
helpful for analysing and predicting customer behaviour [8][9].
They play a significant role in shopping basket data analysis,
item clustering and catalog design. Programmers use association
rules to build programs capable of machine learning [5]. Machine
learning is a type of artificial intelligence that seeks to build
programs with the ability to become more efficient without being
explicitly programmed.
Algorithms for mining association rules from relational data
have been developed, and numerous query languages have been
proposed to assist association rule mining. In contrast, the
issue of mining XML data has received very little attention, as
the data mining community has focused on the development of
techniques for extracting common structure from heterogeneous
XML data.
II. RELATED WORK
An association rule implies a definite association relationship
among a set of objects in a database. An association rule is an
expression of the form A ⇒ B, where A and B are sets of
items [10]. The intuitive meaning of such a rule is that
transactions of the database which contain A tend to also
contain B. Association rule mining is one of the data mining
techniques used to extract hidden information from large
datasets, which an organization's decision makers can use to
improve overall profit.
http://www.ijettjournal.org
Page 358
Apriori is an algorithm for frequent item set mining and
association rule learning over transactional databases. It
proceeds by identifying the frequent individual items in the
database and extending them to larger item sets as long as those
item sets appear sufficiently often in the database. The
frequent item sets determined by Apriori can be used to derive
association rules which highlight general trends in the
database. Apriori is designed to operate on databases containing
transactions (e.g. collections of items bought by
customers) [11][12].
Other algorithms are designed for finding association rules in
data having no transactions or no timestamps. Each transaction
is seen as a set of items.
Apriori uses a "bottom up" approach, where frequent subsets are
extended one item at a time (candidate generation) and groups of
candidates are tested against the data. Frequent itemset
generation therefore proceeds level by level: candidates of
length k are built from the frequent item sets of length k-1,
and the algorithm terminates when no further frequent item sets
are found.
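The frequent itemset generation step of Apriori can be sketched as follows. This is a minimal illustration of the textbook join, prune and count loop, not the authors' implementation; the function name `apriori`, the absolute threshold `min_count` and the sample transactions are assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def apriori(transactions: Iterable[Iterable[str]],
            min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset that occurs in at least `min_count`
    transactions, mapped to its occurrence count."""
    data: List[FrozenSet[str]] = [frozenset(t) for t in transactions]

    # Level 1: count the individual items.
    counts: Dict[FrozenSet[str], int] = {}
    for t in data:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one pass over the data for this level.
        counts = {c: sum(1 for t in data if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


# Hypothetical transactions, for illustration only.
sample = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
found = apriori(sample, min_count=3)
for itemset, n in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Each level requires a fresh scan of the transactions, which is the cost that the Bit Wise Matrix approach of Section III aims to reduce.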
Distributed/Parallel Algorithms
Databases may store an enormous quantity of data to be mined.
Mining association rules in such databases may require
significant processing power [13].
III. PROPOSED WORK
In this work we propose a privacy-preserving mining approach
based on a Bit Wise Matrix, which reduces the problem of
multiple database scans and candidate set generation. Data can
be integrated from multiple data holders (players). For secure
transmission in the distributed, horizontally partitioned
setting, we implement an improved Lagrange polynomial approach
that generates a secure key, which is used to encrypt the data
from the data holders with the triple DES algorithm.
[Figure labels: Data Holder1, Cipher Pattern, Encoder/Decoder, Centralized Server, Binary Matrix]
Fig 1: Horizontal partitioning architecture
Every individual data holder or player maintains its own
transactions or patterns. In horizontal partitioning, every data
holder encrypts its patterns locally and forwards them to the
centralized server. At the centralized server, the received
patterns are decrypted with the decoder and forwarded to the Bit
Wise Matrix stage, which extracts the frequent patterns from
them.
For experimental purposes we establish connections between the
nodes and the central location (key generation center) through
network (socket) programming. The key is generated using the
improved Lagrange polynomial equation and distributed to the
users. Every individual node participates in the key generation
process and retrieves the key by reconstruction. Each node then
encrypts its dataset using triple DES with the key generated
from the Lagrange polynomial equation. All encrypted datasets
are forwarded to the centralized location, decrypted with the
same symmetric key and forwarded to the mining process.
The group key manager receives the registration requests from
all the users, generates a verification share and forwards it to
all the requesting users for authentication purposes. It then
generates the key using the key generation process and forwards
the points needed to extract the key from the equation generated
by the verification points.
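The idea of hiding a key in a polynomial and recovering it from points by Lagrange interpolation can be sketched as below. This is plain Shamir-style secret sharing over a prime field, shown only as an illustration; the paper's "improved" variant, its verification shares and its authentication code are not specified in enough detail to reproduce. The names `make_points` and `reconstruct_key`, the choice of prime and the sample key are assumptions.

```python
import secrets
from typing import List, Tuple

# Assumption: arithmetic is done modulo the Mersenne prime 2**127 - 1.
PRIME = 2**127 - 1

Point = Tuple[int, int]


def make_points(key: int, threshold: int, count: int,
                prime: int = PRIME) -> List[Point]:
    """Build a random polynomial f of degree threshold-1 with f(0) = key
    and return the points (x, f(x)) for x = 1..count."""
    if not 0 <= key < prime:
        raise ValueError("key must lie in [0, prime)")
    if not 1 <= threshold <= count:
        raise ValueError("need 1 <= threshold <= count")
    coeffs = [key] + [secrets.randbelow(prime) for _ in range(threshold - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation modulo prime
            acc = (acc * x + c) % prime
        return acc

    return [(x, f(x)) for x in range(1, count + 1)]


def reconstruct_key(points: List[Point], prime: int = PRIME) -> int:
    """Evaluate the Lagrange interpolating polynomial at x = 0."""
    key = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # Modular inverse of den via Fermat's little theorem.
        key = (key + yi * num * pow(den, prime - 2, prime)) % prime
    return key


# Hypothetical key and parameters, for illustration only.
group_key = 123456789
points = make_points(group_key, threshold=3, count=5)
print(reconstruct_key(points[:3]) == group_key)
```

Any `threshold` distinct points determine the polynomial and therefore the value at x = 0, while fewer points leave the key undetermined. Deriving a triple DES key from the recovered field element is outside the scope of this sketch.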
The key generation protocol receives the verification shares and
the key as input and constructs the Lagrange polynomial equation
f(x), which passes through (0, key) and the verification points.
After that, the group key manager forwards the points to the
data owners. The data owners again reconstruct the key from the
verification points and check the authentication code which is
sent by the group key manager.
When a new user tries to download the file, the new user need
not connect to another data owner to decrypt the file. The user
connects to the group key manager, which updates the group key,
decrypts the files with the previous key, encrypts them again
with the new key and distributes the new key to all the data
owners.
The data owner initiates the request by sending a random
challenge to the group key manager. As a response, the group key
manager sends a secret share; the data owner authenticates it
and forwards the verification share. The group key manager
receives the verification shares, generates the key using the
Lagrange polynomial equation and forwards the points to the data
owners for regeneration of the key.
[Figure: messages exchanged between the node users and the group key manager]
1. Request (Rch)
2. Response (Sshare)
3. Vshare
4. Points (subset of P points)
Fig 2: Authentication and Key Generation
Rch: random challenge
Sshare: secret share
Vshare: verification share
P = {p1, p2, ..., pn}: points for construction of Lagrange's equation
Bit Wise Matrix:
The server or service provider performs association rule mining
on the cipher database to find the maximum frequent item sets.
This work presents an algorithm for mining maximum frequent
itemsets based on the Bit Wise Matrix of frequent length-1
itemsets. The main idea of the algorithm is to create a Bit Wise
Matrix with the frequent length-1 itemsets as row headings and
the transaction records' IDs as column headings. The matrix
contains only two values, '1' and '0', which indicate whether or
not the transaction record contains the corresponding frequent
length-1 itemset. It is then necessary to calculate the number
of 1 values in each column and the count of the columns with the
same number of 1 values.
                  Transaction Records ID
Itemset    1   2   3   4   5   6   7   8   9  10
Item1      1   1   0   1   0   0   1   0   1   1
Item2      0   1   1   0   1   0   1   0   1   1
Item3      1   0   0   1   1   1   0   1   0   0
Item4      1   0   0   1   0   1   0   1   0   0
Fig 3: Bit Wise Matrix
Frequent patterns can now be extracted from the matrix. To
extract the frequent 1-itemsets, count the number of ones
recorded for each item across the transaction records; if the
count meets the minimum threshold value, treat the item as
frequent, otherwise ignore it. Continue the same process for
2-itemsets: for each transaction, check whether both items have
'1' in the corresponding position and, if so, increment the
count; continue until all transactions are verified. If the
total count is greater than the threshold value, treat the
itemset as frequent.
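The counting procedure described above can be sketched with one integer bitmask per item, so that the co-occurrence count of two items is the number of set bits in the bitwise AND of their masks. This is an illustrative sketch rather than the authors' implementation: the function names, the transaction data and the reading of the threshold as "count at least `min_count`" are assumptions.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def build_bit_matrix(transactions: List[Set[str]],
                     items: Iterable[str]) -> Dict[str, int]:
    """One bitmask per item: bit t is set when transaction t contains it."""
    matrix = {item: 0 for item in items}
    for t, record in enumerate(transactions):
        for item in record:
            if item in matrix:
                matrix[item] |= 1 << t
    return matrix


def ones(mask: int) -> int:
    """Number of 1 bits in a bitmask."""
    return bin(mask).count("1")


def frequent_itemsets(matrix: Dict[str, int], min_count: int,
                      max_size: int = 2) -> Dict[Tuple[str, ...], int]:
    """Frequent itemsets of size 1..max_size, found without rescanning
    the transactions: only the bitmasks are combined."""
    singles = sorted(i for i in matrix if ones(matrix[i]) >= min_count)
    frequent: Dict[Tuple[str, ...], int] = {}
    for size in range(1, max_size + 1):
        for combo in combinations(singles, size):
            mask = matrix[combo[0]]
            for item in combo[1:]:
                mask &= matrix[item]  # keep transactions holding all items
            if ones(mask) >= min_count:
                frequent[combo] = ones(mask)
    return frequent


# Illustrative transactions over four items (assumed data).
records = [
    {"Item1", "Item3", "Item4"}, {"Item1", "Item2"}, {"Item2"},
    {"Item1", "Item3", "Item4"}, {"Item2", "Item3"}, {"Item3", "Item4"},
    {"Item1", "Item2"}, {"Item3", "Item4"}, {"Item1", "Item2"},
    {"Item1", "Item2"},
]
bit_matrix = build_bit_matrix(records, ["Item1", "Item2", "Item3", "Item4"])
result = frequent_itemsets(bit_matrix, min_count=4)
for itemset, n in result.items():
    print(itemset, n)
```

The transactions are read once to build the masks; every later count is a bitwise AND followed by a bit count, which is how the matrix avoids repeated database scans.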
IV. CONCLUSION
We conclude our research work with an efficient approach for
mining frequent patterns in a secure manner over horizontally
partitioned databases. A secure key is generated through an
efficient, improved Lagrange polynomial equation; the cipher
data is received and decrypted by the centralized server, which
then finds the frequent patterns accurately and efficiently.
REFERENCES
[1] Dr. Sujni Paul, Associate Professor, Department of
Computer Science, Karunya University, Coimbatore 641,
Tamil Nadu, India
[2] R. Agrawal and R. Srikant, "Fast Algorithms for
Mining Association Rules in Large Distributed Database,"
Conf. Very Large Databases (VLDB 94),
[3] R.J Agrawal and J.C. Shafer, "Parallel Mining of
Association Rules," Distributed Systems Online, March
2005
[4] D.W.K Cheung, "Efficient Mining of Association Rules
in Distributed Databases," IEEE Trans. Knowledge and
Data Eng., vol. 9,
[5] D.W.K Cheung, "A Fast Dis Distributed Information
Systems, IEEE CS Press, 1997,
[6] Albert Y.N Zomaya, Tarek El-Ghazawi, Ophir Frieder,
"Distributed Computing for Data Mining", IEEE
Concurrency, 1999. International Journal of Computer
Science and Information Technology, Volume 3, Number
3, April
[7] A. Prodromidis, P. Chan, and S. Stolfo. Chapter Meta
learning in Parallel distributed data mining systems: Issues
and approaches. AAAI/MIT Press, 2001.
[8] Morgan Kaufmann, 1996, pp. 432 Proc. ACM
SIGMOD 1-12. 2010- 99 Proc. 20th Int'l 16 IEEE tributed
Proc. Parallel and 432-444.
[9] International Journal of Computer Science and
Information Technology, Volume 3, Number 3, April 2010
[10] M.J. Zaki et al., "Parallel Data Mining for Association
Rules on Shared-Storage," tech. report TR 618, Computer
Science Dept., Univ. of Rochester, 1997.
[11] D.W. Cheung et al., "Efficient Mining of Association
Rules in Distributed Databases," IEEE Trans. Knowledge
and Data Eng., vol. 8, no. 6, 1996, pp. 911-922.
[12] A. S and R. Wolff, "Communication-Efficient
Distributed Mining of Association Rules," Proc. ACM
SIGMOD Int'l Conf. Management of Data, ACM Press,
2001, pp. 47-48.
[13] T.K Imielinski and A.M Virmani. MSQL: A query
language for database mining. 1999.
[14] H. Kargupta, I. Hamzaoglu, and Brian Stafford.
Scalable, distributed data mining-agent architecture. In
Heckerman et al. [8], page 21.
[15] R. Meo, G. Psaila, and S. Ceri. A new SQL like
operator for mining association rules. In The VLDB
Journal, pages 156–161.
BIOGRAPHIES
T. Padmavathy completed her B.Tech in Computer Science and
Engineering from Godavari Institute of Engineering and
Technology. She is pursuing her M.Tech in the Department of
Computer Science and Engineering at Kaushik College of
Engineering, JNTUK University.
P. Suresh Babu completed his B.Tech and M.E. in Computer
Science and Engineering. He is currently working as Associate
Professor in the Department of Computer Science and Engineering
at Kaushik College of Engineering, JNTUK University. He has 4
years of industrial experience and 15 years of teaching
experience. His areas of interest include Artificial
Intelligence, Neural Networks, Cryptography & Network Security,
Compiler Design and Advanced Data Structures.