A Privacy Preserving Data mining over Distributed Network for Data holders

International Journal of Engineering Trends and Technology (IJETT) – Volume 17 Number 2 – Nov 2014
P. Ramesh¹, Ch. Swapna Priya²
¹Final M.Tech Student, ²Assistant Professor
¹,²Dept of CSE, Pydah Engineering College, Boyapalem, Visakhapatnam.
Abstract: Secure mining over horizontally partitioned databases is a
research issue in the field of data engineering. In horizontal
partitioning, databases from various data holders are integrated
in order to apply association rule mining. In this paper we
propose a privacy-preserving mining approach that combines an
improved Lagrange polynomial equation for secure key
generation with a Boolean Matrix approach.
Index Terms: Association Rule mining, Boolean Matrix,
Lagrange polynomial.
I. INTRODUCTION
With an unbounded number of rounds of communication,
each gate can be implemented in a way that requires only a
constant number of rounds; the total number of rounds will
still be linear in the depth of the underlying circuit. For
many concrete computations the resulting number of rounds
would be prohibitive, because in distributed computation the
number of rounds is generally the most valuable resource.
Secure function evaluation consists of distributively
evaluating a function so as to satisfy both the correctness
and privacy constraints. This task is made particularly
difficult by the fact that some of the players may be
maliciously faulty and may cooperate in order to disrupt the
correctness and the privacy of the computation.
Secure function evaluation arises in two main settings.
The first is fault-tolerant computation [2]. In this setting
correctness is the main issue: we insist that the values a
distributed system returns are correct. Privacy still helps
here: if one wants to maliciously influence the outcome of
an election, it is helpful to know who plans to vote for
whom. Second, secure function computation is central to
protocol design, as the correctness and privacy of any
protocol can be reduced to it. Here, since people may be
behind their computers, correctness and privacy both
matter [3].
Assume we have n parties, 1, ..., n; each party i has a
private input x_i known only to that party. The parties want
to correctly evaluate a given function f on their inputs, that
is, to compute y = f(x_1, ..., x_n), while maintaining the
privacy of their own inputs. That is, they do not want to
reveal more than the value y implicitly reveals.
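As an illustration of the definition above, the sketch below evaluates one simple choice of f, the sum of the private inputs, using additive secret sharing. This is not the method proposed in this paper; it assumes semi-honest parties, and the names `MODULUS`, `share` and `secure_sum` are hypothetical.

```python
import secrets

# Assumed public modulus; it must exceed any possible sum of the inputs.
MODULUS = 2**61 - 1


def share(value, n, m=MODULUS):
    """Split value into n additive shares that sum to value modulo m."""
    shares = [secrets.randbelow(m) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % m)
    return shares


def secure_sum(private_inputs, m=MODULUS):
    """Compute y = x_1 + ... + x_n from shares only (semi-honest sketch)."""
    n = len(private_inputs)
    # Party i splits its input and hands share j to party j.
    dealt = [share(x, n, m) for x in private_inputs]
    # Party j adds the shares it received and publishes only that partial sum.
    partial = [sum(dealt[i][j] for i in range(n)) % m for j in range(n)]
    # The published partial sums add up to y without exposing any single x_i.
    return sum(partial) % m


total = secure_sum([3, 10, 29])
```

Each individual share is uniformly random, so a party that sees one share of another party's input learns nothing about that input beyond what the final sum reveals.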
Bar-Ilan and Beaver were the first to investigate
reducing the round complexity of secure function
evaluation. They exhibited a non-cryptographic method
that always saves a logarithmic factor of rounds
(logarithmic in the total length of the players' inputs),
while the total amount of communication grows only by a
polynomial factor. Alternatively, they show that the
number of rounds can be reduced to a constant, but at the
expense of an exponential blowup in the message sizes. We
insist that the total amount of communication be
polynomially bounded. While their result shows that the
depth of a circuit is not a lower bound on the number of
rounds necessary for securely evaluating it, the savings are
far from substantial in a general setting.
ISSN: 2231-5381
II. RELATED WORK
In traditional association rule mining, companies
give their data to an analyst, who finds the patterns or
association rules that exist between the items.
Although it is advantageous to achieve sophisticated
analysis on tremendous volumes of data in a cost-effective
way, the data-mining-as-a-service paradigm raises several
serious security issues. One of the main ones is that the
server has access to the owner's valuable data and may
learn sensitive information from it. Traditional distributed
algorithms are based on Apriori; the main disadvantages of
this approach are multiple database scans and candidate set
generation.
Association rule mining is one of the most
important and well-researched methods of data mining. It
aims to extract interesting correlations, frequent patterns
and associations among sets of objects in transaction
databases or other data repositories. Association rules are
widely used in areas such as telecommunication networks,
market and risk management, and inventory control [1].
Different association mining methods and algorithms will be
briefly introduced and compared afterwards. The goal of
association rule mining is to find the association rules that
satisfy the predefined minimum support and confidence in a
database [3].
The problem is decomposed into two subproblems.
The first is to discover the item sets whose occurrences
exceed a predefined threshold; these are called frequent or
large item sets. The second is to generate association rules
from those large item sets under the constraint of minimal
confidence.
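The two subproblems rest on the standard support and confidence measures. The following minimal sketch shows how they could be computed over a list of transactions; the function names and the three sample transactions are hypothetical and only serve as illustration.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item of itemset."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t))


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    base = support_count(transactions, antecedent)
    # A rule whose antecedent never occurs has no meaningful confidence.
    return both / base if base else 0.0


# Hypothetical sample data for illustration only.
sample = [{"a", "b", "c", "d"}, {"a", "c", "e"}, {"b", "d"}]
ac_support = support_count(sample, {"a", "c"})
a_to_c = confidence(sample, {"a"}, {"c"})
c_to_e = confidence(sample, {"c"}, {"e"})
```

An item set is frequent when its support count reaches the chosen threshold, and a rule is kept when its confidence reaches the minimum confidence.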
The two most important approaches for utilizing
multiple processors are distributed memory, in which
each processor has a private memory [6], and shared
memory, in which all processors access a common
memory. In a shared-memory scheme each processor has
direct and equal access to all memory in the system [4].
http://www.ijettjournal.org
In a distributed-memory architecture each
processor has its own local memory, which can be accessed
directly only by that processor. A parallel application can
be divided into a number of subtasks that are executed in
parallel on separate processors in the system. However, the
performance of a parallel application on a distributed
system depends largely on the allocation of the tasks
comprising the application onto the available processors
in the system [5]. Association rule mining is the most
widely applied method, and the Apriori algorithm is its
most representative algorithm. Many modified algorithms
build on Apriori and focus on improving its efficiency and
accuracy.
III. PROPOSED WORK
In this paper we propose a privacy-preserving mining
approach based on a Boolean Matrix. Constructing the
Boolean Matrix reduces the problem of multiple database
scans and candidate set generation. Data can be integrated
from multiple data holders or players. For secure
transmission under distributed partitioning we implement
an improved Lagrange polynomial approach for secure key
generation, and the data from the data holders is encrypted
with the Triple DES algorithm.
Fig1: Horizontal partitioning Architecture (components: Data Holder1, Data Holder2, Cipher Pattern, Encoder/Decoder, Centralized Server, Boolean Matrix)
Every individual data holder or player maintains
its own transactions or patterns. In horizontal partitioning,
every data holder encrypts its patterns at its own end and
forwards them to the centralized server. At the centralized
server, each received pattern is decrypted with the decoder
and forwarded to the Boolean Matrix to extract frequent
patterns from the received patterns.
For experimental purposes we establish
connections between the nodes and the central location (key
generation center) through network or socket
programming. The key is generated using the improved
Lagrange polynomial equation and is distributed to the
users.
Every individual node participates in the key generation
process and retrieves the key by reconstruction. The
datasets are encrypted using Triple DES with the key
generated by the Lagrange polynomial equation. All
encrypted datasets are forwarded to the centralized
location, decrypted with the same symmetric key, and
passed on to the mining process.
The group key manager receives the registration requests
from all the users, generates a verification share, and
forwards it to all the requesting users for authentication. It
then generates the key using the key generation process
and forwards the points, so that the key can be extracted
from the equation generated by the verification points.
The key generation protocol receives the
verification shares and the key as input and constructs the
Lagrange polynomial equation f(x), which passes
through (0, key) and the verification points. The group
key manager then forwards the points to the data owners.
The data owners reconstruct the key from the verification
points and check the authentication code sent by
the group key manager.
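The key generation step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes arithmetic modulo a public prime, one verification point per user with the user ID as x-coordinate, and as many published points as there are users. The names `PRIME`, `lagrange_eval`, `manager_points` and `recover_key` are hypothetical.

```python
import secrets

# Assumed public prime modulus for all polynomial arithmetic.
PRIME = 2**127 - 1


def lagrange_eval(points, x, p=PRIME):
    """Evaluate at x the unique polynomial through points, modulo p."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        # pow(den, -1, p) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def manager_points(key, verification_points, p=PRIME):
    """Manager side: f passes through (0, key) and every verification point;
    publish as many further points on f as there are users."""
    defining = [(0, key)] + list(verification_points)
    start = max(x for x, _ in defining) + 1
    return [(x, lagrange_eval(defining, x, p))
            for x in range(start, start + len(verification_points))]


def recover_key(own_point, public_points, p=PRIME):
    """User side: own verification point plus the published points give f(0)."""
    return lagrange_eval([own_point] + list(public_points), 0, p)


key = secrets.randbelow(PRIME)
vshares = [(uid, secrets.randbelow(PRIME)) for uid in (1, 2, 3)]
public = manager_points(key, vshares)
recovered = [recover_key(v, public) for v in vshares]
```

With n users the polynomial has degree at most n, so the n published points alone leave f(0) undetermined, while any user who adds a private verification point holds n + 1 points and can interpolate the key.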
When a new user tries to download a file, the new user
need not connect to the other data owners to decrypt it.
The user connects to the group key manager, which updates
the group key: it decrypts the files with the previous key,
encrypts them again with the new key, and distributes the
new key to all the data owners.
A data owner initiates the request by sending a random
challenge to the group key manager. In response, the group
key manager sends a secret share; the data owner
authenticates it and forwards the verification share. The
group key manager receives the verification shares,
generates the key using the Lagrange polynomial equation,
and forwards the points to the data owners so that they can
regenerate the key.
Fig2 : Authentication and Key Generation (messages between the node users and the group key manager)
1. Request (Rch)
2. Response (Sshare)
3. Vshare
4. Points (subset of P points)
Rch: random challenge
Sshare: secret share
Vshare: verification share
P = {p1, p2, ..., pn}: points for construction of the Lagrange
equation
Boolean Matrix:
The server or service provider performs
association rule mining on the cipher database to find the
maximum frequent item sets. This work presents an
algorithm for mining maximum frequent item sets based on
the Boolean Matrix of frequent length-1 item sets.
The main idea of the algorithm is to create a Boolean
Matrix with frequent length-1 item sets as row headings
and transaction record IDs as column headings. The matrix
holds only two values, '1' and '0', which indicate whether
or not the transaction record contains the corresponding
frequent length-1 item set. It is then necessary to count
the number of 1s in each column and the number of
columns that share the same count of 1s.
Algorithm for Boolean Matrix construction:
Step1: While (true) // patterns available
Step2: Read the individual pattern Pi, separated by a special
character.
Step3: Construct an empty matrix with i rows and j
columns,
where 'i' is the item and 'j' is the transaction ID.
Step4: Set intersection (i, j) = 1 if the corresponding item 'i'
is available in transaction ID 'j'; else set it to 0.
Step5: Repeat steps 2 to 4 until no patterns remain.
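The construction steps above can be sketched in a few lines. This is a minimal illustration under the assumption that each decrypted pattern arrives as a string whose items are separated by a comma; the function name `build_boolean_matrix` and the `sep` parameter are hypothetical.

```python
def build_boolean_matrix(patterns, sep=","):
    """Return (items, matrix): one row per item, one column per transaction.

    matrix[item][j] is 1 when the item occurs in transaction j, else 0.
    """
    # Step 2: read each pattern and split it on the separator character.
    transactions = [
        {token.strip() for token in pattern.split(sep) if token.strip()}
        for pattern in patterns
    ]
    # Step 3: rows are the distinct items, columns the transaction IDs.
    items = sorted(set().union(*transactions))
    # Step 4: set cell (i, j) to 1 if item i is in transaction j, else 0.
    matrix = {
        item: [1 if item in t else 0 for t in transactions]
        for item in items
    }
    return items, matrix


# Two sample transactions used purely as an illustration.
items, matrix = build_boolean_matrix(["a,b,c,d", "a,c,e"])
```

Because the matrix is built in a single pass over the received patterns, later support counting works on the matrix alone and does not rescan the original data.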
We can now extract frequent patterns from the
matrix. To extract the frequent 1-item sets, count the number
of ones recorded for each item across all transaction
columns; if the count meets the minimum threshold value,
treat the item as frequent, otherwise ignore it. Continue the
same process for 2-item sets: for each transaction, check
whether both items have '1' in the corresponding column
and, if so, increment the count; continue until all
transactions are verified. If the total count is greater than or
equal to the threshold value, treat the item set as frequent.
Algorithm for frequent pattern generation from the Boolean
Matrix:
Step1: Read the item sets {I1, I2, ..., In} and initialize counter := 0
Step2: For i := 0; i < n; i++
    Ii.count := 0
    For j := 0; j < trans_size; j++
        counter := 0
        For each item x in Ii
            If intersection of (x, j) == 1 Then counter := counter + 1
        Next
        // transaction j contains every item of Ii
        If counter == Ii.size() Then Ii.count := Ii.count + 1
    Next
    Add Ii to item_list
Next
Step3: Set minimum threshold value (t)
Step4: For k := 0; k < itemlist_size; k++
    If item_list[k].count >= t Then
        add to frequentitemlist
    End if
Next
Step5: Return frequent pattern list
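The five steps above can be sketched as runnable code. This is an illustration, not the authors' implementation: the function names are hypothetical, candidates are enumerated by brute force, and the sample matrix is assumed. Only row 'a', the first two columns, and the presence of {a, b} in transactions 1 and 6 follow the text; every other cell is made up for the example.

```python
from itertools import combinations


def support(matrix, itemset):
    """Count the transactions (columns) where every item of itemset has a 1."""
    rows = [matrix[item] for item in itemset]
    return sum(1 for column in zip(*rows) if all(column))


def frequent_itemsets(matrix, threshold, max_size=None):
    """Return {itemset: count} for all item sets with count >= threshold."""
    items = sorted(matrix)
    max_size = max_size or len(items)
    frequent = {}
    for size in range(1, max_size + 1):
        found = False
        for candidate in combinations(items, size):
            count = support(matrix, candidate)
            if count >= threshold:
                frequent[candidate] = count
                found = True
        if not found:
            # No frequent set of this size, so no larger one can be frequent.
            break
    return frequent


# Assumed sample matrix: rows are items, columns are transactions 1 to 6.
sample_matrix = {
    "a": [1, 1, 0, 0, 0, 1],
    "b": [1, 0, 1, 1, 0, 1],
    "c": [1, 1, 0, 1, 1, 1],
    "d": [1, 0, 1, 0, 1, 1],
    "e": [0, 1, 0, 1, 1, 1],
}
result = frequent_itemsets(sample_matrix, threshold=2)
```

Enumerating every combination is exponential in the number of items; it is used here only to keep the sketch short, and a candidate-pruning strategy would be the natural refinement.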
The Boolean Matrix is constructed from the
availability of each item in each transaction. The first
transaction contains "a,b,c,d", so the positions of those
items are set to '1' for the first transaction and all other
positions to '0'. The second transaction is "a,c,e", so the
corresponding item positions are set to '1' for the second
transaction. The process continues until all transactions
have been processed.
Itemset    Transaction IDs
           1   2   3   4   5   6
a          1   1   0   0   0   1
b          1   0   -   -   -   1
c          1   1   -   -   -   -
d          1   0   -   -   -   -
e          0   1   -   -   -   -
Fig 4: Boolean Matrix (- : value not recoverable from the source)
Frequent patterns generation:
Initially, the frequent 1-item sets are generated by
counting the occurrences of each individual item across all
transactions. Consider item 'a': count the number of '1's
opposite item 'a' in all transactions. The total count of a is 3,
because a is available in transactions 1, 2 and 6. If the count
is equal to or greater than the minimum threshold value or
support count (2 in our example), the item is treated as a
frequent item.
To find the frequent two-item, three-item or n-item
sets, we follow the same procedure until no more frequent
items are found.
Consider the two-item set {a,b}. Check the
corresponding ones opposite "a" and "b"; both must be set
to "1" in a transaction for that transaction to add 1 to the
count. In the above table, transactions 1 and 6 contain "1" in
both places of a and b, so the count is 2. {a,b} is therefore a
frequent item set, because our minimum support count is 2.
By the same process you can find the remaining frequent
patterns.
Frequent 'n' item set:
IV. CONCLUSION
We conclude our research work with an efficient approach to
frequent pattern mining in a secure manner over
horizontal databases. A secure key is generated through an
efficient and improved Lagrange polynomial equation; the
cipher data is received and decrypted by the centralized
server, which finds the frequent patterns at its end in an
accurate and efficient manner.
REFERENCES
[1] D. Beaver. The round complexity of secure protocols. Harvard
University.
[2] D.W.L. Cheung, V.T.Y. Ng, A.W.C. Fu, and Y. Fu. Efficient mining of
association rules in distributed databases. IEEE Trans. Knowl. Data Eng.,
8(6):911–922, 1996.
[3] R. Agrawal and R. Srikant. Privacy-preserving data mining. In
SIGMOD Conference, pages 439–450, 2000.
[4] M. Bellare, R. Canetti, and H. Krawczyk. Keying hash functions for
message authentication. In Crypto, pages 1–15, 1996.
[5] A. Ben-David, N. Nisan, and B. Pinkas. FairplayMP - A system
for secure multi-party computation. In CCS, pages 257–266, 2008.
[6] J.C. Benaloh. Secret sharing homomorphisms: Keeping shares of a
secret secret. In Crypto, pages 251–260, 1986.
[7] J. Brickell and V. Shmatikov. Privacy-preserving graph algorithms
in the semi-honest model. In ASIACRYPT, pages 236–252, 2005.
[8] D.W.L. Cheung, J. Han, V.T.Y. Ng, A.W.C. Fu, and Y. Fu. A
fast distributed algorithm for mining association rules. In PDIS, pages
31–42, 1996.
[9] D.W.L. Cheung, V.T.Y. Ng, A.W.C. Fu, and Y. Fu. Efficient mining of
association rules in distributed databases. IEEE Trans. Knowl. Data Eng.,
8(6):911–922, 1996.
[10] T. ElGamal. A public key cryptosystem and a signature scheme
based on discrete logarithms. IEEE Transactions on Information Theory,
31:469–472, 1985.
BIOGRAPHIES
P. Ramesh completed his Master of Computer
Applications (M.C.A) at Avanthi
Engineering College, Narsipatnam. He is
pursuing an M.Tech (CSE) at Pydah
Engineering College, Boyapalem,
Visakhapatnam. His areas of interest are
data mining and network security.
Ch. Swapna Priya completed her M.Tech. She is
working as Assistant Professor and M.Tech
Coordinator at Pydah College of Engineering
and Technology and has 8 years of experience.
Her areas of interest are Data Mining, Computer
Networks and FLAT.