International Journal of Engineering Trends and Technology (IJETT) – Volume 15 Number 7 – Sep 2014
A Lagrange’s Polynomial Based Secure Mining
Over Distributed Databases
Koppala V Satya Surya Anusha (1), R. Suneel Kumar (2)
(1) M.Tech Scholar, (2) Assistant Professor
(1,2) Dept of CSE, Maharaj Vijayaram Gajapathiraj College of Engineering
Chintalavalasa, Vizianagaram District, A.P
Abstract: Secure association rule mining over horizontally partitioned databases is a long-standing research issue in the field of knowledge and data engineering. In horizontal partitioning, databases from various data holders or players are integrated so that association rule mining can be applied to the integrated database. In this paper we propose a privacy-preserving mining approach that combines an improved Lagrange's polynomial equation for secure key generation with a Boolean Matrix approach.
Index Terms: Association Rule Mining, Boolean Matrix, Lagrange's polynomial.
I. INTRODUCTION
In view of this brief description, it can be seen that all of these protocols for secure multiparty function evaluation run in unbounded "distributed time," that is, using an unbounded number of rounds of communication [1]. Even though the interaction for each gate can be implemented in a way that requires only a constant number of rounds, the total number of rounds will still be linear in the depth of the underlying circuit. For many concrete computations, the resulting number of rounds would be prohibitive; in distributed computation, the number of rounds is generally the most valuable resource.
Secure function evaluation consists of distributively evaluating a function so as to satisfy both the correctness and privacy constraints. This task is made particularly difficult by the fact that some of the players may be maliciously faulty and try to cooperate in order to disrupt the correctness and the privacy of the computation. Secure function evaluation arises in two main settings. The first is fault-tolerant computation [2]. In this setting correctness is the main issue: we insist that the values a distributed system returns are correct, no matter how some components in the system fail. However, even if one is solely interested in correctness, privacy helps to achieve it most strongly: if one wants to maliciously influence the outcome of an election, say, it is helpful to know who plans to vote for whom. Second, secure function computation is central to protocol design, as the correctness and privacy of
ISSN: 2231-5381
any protocol can be reduced to it. Here, as people may be behind their computers, correctness and privacy are equally important.
Secure function evaluation [3]. Assume we have $n$ parties, $1, \ldots, n$; each party $i$ has a private input $x_i$ known only to that party. The parties want to correctly evaluate a given function $f$ on their inputs, that is, to compute $y = f(x_1, \ldots, x_n)$, while maintaining the privacy of their own inputs. That is, they do not want to reveal more than the value $y$ implicitly reveals.
Bar-Ilan and Beaver were the first to investigate
reducing the round complexity for secure function
evaluation. They exhibited a non-cryptographic method
that always saves a logarithmic factor of rounds
(logarithmic in the total length of the players' inputs), while
the total amount of communication grows only by a
polynomial factor. Alternatively, they show that the
number of rounds can be reduced to a constant, but at the
expense of an exponential blowup in the message sizes. We
insist that the total amount of communication be
polynomially bounded. While their result shows that the
depth of a circuit is not a lower bound for the number of
rounds necessary for securely evaluating it, the savings are
far from substantial in a general setting.
II. RELATED WORK
In traditional association rule mining, companies give their data to an analyst, who finds the patterns or association rules that exist between the items. Although this makes sophisticated analysis of tremendous volumes of data possible in a cost-effective way, the data-mining-as-a-service paradigm raises several serious security issues. One of the main issues is that the server has access to the owner's valuable data and may learn sensitive information from it, so corporate privacy is lost. Traditional distributed algorithms are based on Apriori; the main disadvantages of this approach are multiple database scans and candidate set generation.
Association rule mining is one of the most important and well-researched methods of data mining. It aims to extract interesting correlations, frequent patterns, associations or informal structures among sets of items in transaction databases or other data repositories. Association rules are widely used in a range of areas such as telecommunication networks, market and risk
http://www.ijettjournal.org
management, inventory control, etc. [1]. Different association mining methods and algorithms will be briefly introduced and compared afterwards. Association rule mining aims to find the association rules that satisfy the predefined minimum support and confidence from a database [3].
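As a minimal sketch of these two measures, the Python snippet below computes the support count of an itemset and the confidence of a rule over a small list of transactions. The transactions and the function names `support_count` and `confidence` are hypothetical and serve only as an illustration; they are not taken from the experiments of this paper.

```python
# Illustrative only: these transactions are hypothetical, not data from the paper.
TRANSACTIONS = [
    {"a", "c", "d", "e"},
    {"a", "c", "e"},
    {"b", "c"},
    {"a", "b"},
]


def support_count(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    Defined as support(antecedent ∪ consequent) / support(antecedent);
    returns 0.0 when the antecedent never occurs.
    """
    both = support_count(transactions, set(antecedent) | set(consequent))
    base = support_count(transactions, antecedent)
    return both / base if base else 0.0


if __name__ == "__main__":
    print(support_count(TRANSACTIONS, {"a", "c"}))
    print(confidence(TRANSACTIONS, {"a"}, {"c"}))
```

A rule would be reported only if its itemset reaches the minimum support and the rule reaches the minimum confidence; both thresholds are chosen by the analyst.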
The problem is decomposed into two subproblems. The first is to discover those itemsets whose occurrences exceed a predefined threshold in the database; these itemsets are known as frequent or large itemsets. The second is to generate association rules from those large itemsets under the constraint of minimal confidence [2]. Two main approaches for utilizing multiple processors have emerged: distributed memory, in which each processor has a private memory [6], and shared memory, in which all processors access a common memory. The shared memory architecture has many desirable properties. Each processor has direct and equal access to all memory in the system [4].
In the distributed memory architecture, each processor has its own local memory that can only be accessed directly by that processor. A parallel application can be divided into a number of subtasks that are executed in parallel on separate processors in the system. However, the performance of a parallel application on a distributed system depends mainly on the allocation of the tasks comprising the application onto the available processors.
III. PROPOSED WORK
In this paper we propose a privacy-preserving mining approach based on a Boolean Matrix; constructing the Boolean Matrix reduces the problem of multiple database scans and candidate set generation. Data can be integrated from multiple data holders or players. For secure transmission over the distributed partitions, we implement an improved Lagrange's polynomial approach to generate a secure key, which the data holders use to encrypt their data with the Triple DES algorithm.
Every individual data holder or player maintains its own transactions or patterns. In horizontal partitioning, every data holder encrypts its patterns locally and forwards them to the centralized server. At the centralized server, the received patterns are decrypted with the decoder and forwarded to the Boolean Matrix stage, which extracts the frequent patterns from them. For experimental purposes we establish the connection between the nodes and the central location (key generation center) through network (socket) programming. The key is generated using the improved Lagrange's polynomial equation and distributed to the users.
Every individual node participates in the key generation process and retrieves the key by reconstruction. It encrypts its datasets using Triple DES with the key generated by the Lagrange's polynomial equation. All encrypted datasets are forwarded to the centralized location, decrypted with the same symmetric key, and passed on to the mining process. The group key manager receives the registration requests from all the users, generates a verification share and forwards it to each requesting user for authentication, generates the key using the key generation process, and forwards the points needed to extract the key from the equation generated by the verification points.

[Figure: architecture diagram with the labels Data Holder1, Cipher Pattern, Encoder/Decoder, Centralized Server, Boolean Matrix]

... in the scheme [5]. Categorization models are the most widely applied method. The Apriori algorithm is the most representative algorithm for association rule mining, and many modified algorithms focus on improving its efficiency and accuracy.
The key generation protocol takes the verification shares and the key as input and constructs the Lagrange's polynomial f(x) that passes through (0, key) and the verification points. The group key manager then forwards the points to the data owners. The data owners reconstruct the key from the verification points and check the authentication code sent by the group key manager.
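The following Python sketch illustrates one way this exchange could look. It is not the authors' implementation: the prime modulus `PRIME`, the use of HMAC-SHA256 as the authentication code, the choice of public x-coordinates, and the names `manager_publish` and `owner_recover` are all assumptions made for illustration. The manager interpolates f(x) through (0, key) and every verification share, publishes as many extra points on f as there are owners, and each owner combines its own share with the public points to evaluate f(0).

```python
import hashlib
import hmac

# Assumed field modulus (a Mersenne prime); the paper does not specify one.
PRIME = 2**127 - 1


def interpolate_at(points, x, p=PRIME):
    """Evaluate at x the unique polynomial through `points`, modulo p."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x - xm) % p
                den = den * (xj - xm) % p
        # pow(den, -1, p) is the modular inverse (Python 3.8+)
        total = (total + yj * num * pow(den, -1, p)) % p
    return total


def auth_code(key):
    """Assumed authentication code: HMAC-SHA256 keyed with the group key."""
    return hmac.new(key.to_bytes(16, "big"), b"group-key", hashlib.sha256).hexdigest()


def manager_publish(key, vshares):
    """Group key manager side.

    f passes through (0, key) and every verification share (x_i, v_i).
    Publishes len(vshares) further points of f plus an authentication code.
    """
    defining = [(0, key)] + list(vshares)
    start = max(x for x, _ in vshares) + 1  # keep public x values distinct
    public = [(x, interpolate_at(defining, x))
              for x in range(start, start + len(vshares))]
    return public, auth_code(key)


def owner_recover(own_vshare, public, auth):
    """Data owner side: own share + public points determine f, so f(0) = key."""
    key = interpolate_at([own_vshare] + list(public), 0)
    if not hmac.compare_digest(auth_code(key), auth):
        raise ValueError("authentication code mismatch")
    return key


if __name__ == "__main__":
    shares = [(1, 1111), (2, 2222), (3, 3333)]  # hypothetical verification shares
    public_points, code = manager_publish(1234, shares)
    print([owner_recover(s, public_points, code) for s in shares])
```

With t owners, f has degree at most t, so the t public points alone are one point short of determining it; an owner's private verification share supplies the missing point.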
When a new user tries to download a file, the new user does not need to contact the other data owners to decrypt it. The user connects to the group key manager, which updates the group key, decrypts the files with the previous key, encrypts them again with the new key, and sends the new key to all the data owners.
The data owner initiates the request by sending a random challenge to the group key manager. In response, the group key manager sends a secret share; the data owner authenticates it and forwards the verification share. The group key manager receives the verification shares, generates the key using the Lagrange's polynomial equation, and forwards the points to the data owners so that they can regenerate the key.
[Figure: message exchange between Node users and Group Key manager: 1. Request (Rch), 2. Response (Sshare), 3. Vshare, 4. Points (subset of P points)]
Fig 2: Authentication and Key Generation
Rch: random challenge
Sshare: secret share
Vshare: verification share
P = {p1, p2, ..., pn}: points for construction of Lagrange's equation

Key Generation Process:
Example:
• Let us consider a secret key S = 1234.
• Let n = 6 be the total number of points and k = 3 the minimum number of secret shares, and consider any two random numbers a = 166 and b = 94. Then
$$f(x) = 1234 + 166x + 94x^2$$
• The points that satisfy the equation, that is, the secret shares, are
$D_0 = (1, 1494)$, $D_1 = (2, 1942)$, $D_2 = (3, 2578)$, $D_3 = (4, 3402)$, $D_4 = (5, 4414)$, $D_5 = (6, 5614)$

The centralized server forwards each participant a specific point, both x and y. The shares are indexed from $D_0 = (1, f(1))$, so the points start from (1, f(1)) and not (0, f(0)). This is required because anyone who held (0, f(0)) would also know the secret key, S = f(0).

Re-construction of secret key:
• To reconstruct the secret key, any three points are enough.
• Let us consider $(x_0, y_0) = (2, 1942)$, $(x_1, y_1) = (4, 3402)$, $(x_2, y_2) = (5, 4414)$.
Using Lagrange polynomials:
$$\ell_0(x) = \frac{x - x_1}{x_0 - x_1} \cdot \frac{x - x_2}{x_0 - x_2} = \frac{x - 4}{2 - 4} \cdot \frac{x - 5}{2 - 5} = \frac{1}{6}x^2 - \frac{3}{2}x + \frac{10}{3}$$
$$\ell_1(x) = \frac{x - x_0}{x_1 - x_0} \cdot \frac{x - x_2}{x_1 - x_2} = \frac{x - 2}{4 - 2} \cdot \frac{x - 5}{4 - 5} = -\frac{1}{2}x^2 + \frac{7}{2}x - 5$$
$$\ell_2(x) = \frac{x - x_0}{x_2 - x_0} \cdot \frac{x - x_1}{x_2 - x_1} = \frac{x - 2}{5 - 2} \cdot \frac{x - 4}{5 - 4} = \frac{1}{3}x^2 - 2x + \frac{8}{3}$$
$$f(x) = \sum_{j=0}^{2} y_j \, \ell_j(x) = 1942\left(\frac{1}{6}x^2 - \frac{3}{2}x + \frac{10}{3}\right) + 3402\left(-\frac{1}{2}x^2 + \frac{7}{2}x - 5\right) + 4414\left(\frac{1}{3}x^2 - 2x + \frac{8}{3}\right)$$
$$f(x) = 1234 + 166x + 94x^2$$
Here the constant coefficient $a_0$ is the secret key, which means that the secret key S is 1234.
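The worked example of key generation and reconstruction can be checked with a short Python sketch. The function names `make_shares` and `lagrange_eval` are hypothetical, and exact rational arithmetic is used here to mirror the hand calculation; a deployed scheme would more likely work modulo a prime, which the paper does not specify.

```python
from fractions import Fraction


def make_shares(secret, coeffs, n):
    """Return the n points (x, f(x)) for x = 1..n.

    f(x) = secret + coeffs[0]*x + coeffs[1]*x**2 + ...
    The point at x = 0 is never issued because f(0) is the secret itself.
    """
    def f(x):
        return secret + sum(c * x ** (i + 1) for i, c in enumerate(coeffs))
    return [(x, f(x)) for x in range(1, n + 1)]


def lagrange_eval(points, x):
    """Evaluate at x the Lagrange interpolating polynomial through `points`."""
    total = Fraction(0)
    for j, (xj, yj) in enumerate(points):
        term = Fraction(yj)
        for m, (xm, _) in enumerate(points):
            if m != j:
                term *= Fraction(x - xm, xj - xm)
        total += term
    return total


if __name__ == "__main__":
    # Parameters of the example: S = 1234, a = 166, b = 94, n = 6, k = 3
    shares = make_shares(1234, [166, 94], 6)
    print(shares)
    # Any three shares suffice; here the ones at x = 2, 4 and 5
    subset = [shares[1], shares[3], shares[4]]
    print(lagrange_eval(subset, 0))
```

Evaluating the interpolant directly at x = 0 avoids expanding the basis polynomials symbolically, since only the constant term is needed to recover the key.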
Boolean Matrix:
The centralized server or service provider performs association rule mining, or frequent pattern generation, on the horizontally partitioned data from the different data holders or players. Our research therefore proposes a novel algorithm for mining maximal frequent itemsets, based on generating a Boolean Matrix from the input transactions. The main objective of the technique is to create a Boolean Matrix from the set of items and the transactions; transactions are placed in columns and items in the corresponding rows.

Two values are possible, depending on availability: if an item is present in a particular transaction, the entry is set to '1', otherwise it is set to '0'. The number of 1s in the row of each item is then counted to obtain the frequency of the item; if it reaches the minimum threshold value, the item is treated as a frequent item.

Algorithm for Boolean Matrix:
Step1: While (pattern size != 0)
Step2: Read one pattern per iteration and separate it into individual items.
Step3: Generate a matrix with i rows and j columns, where 'i' indexes the items and 'j' indexes the transactions.
Step4: The intersection of an item and a transaction is denoted (i, j); set (i, j) = 1 if item 'i' is present in transaction 'j', else set (i, j) = 0.
Step5: pattern size := pattern size - 1
Step6: Repeat steps 2 to 5

Boolean Matrix construction is based on the availability of each item in specific transactions. Consider a first transaction that contains "a, c, d, e": the positions of these items are set to '1' in the first transaction and the remaining positions to '0'. For a second transaction "a, c, e", the corresponding item positions are set to '1' in the second transaction. The process continues until all transactions are completed.

Now we can extract frequent patterns from the matrix. To extract the frequent 1-itemsets, first count the number of ones in the row of each item; if the count meets the minimum threshold value, treat the item as frequent, else ignore it. Continue the same process for 2-itemsets: for each transaction, check whether both items have '1' in the corresponding column and, if so, increment the count, until all transactions have been checked. If the total count reaches the threshold value, treat the itemset as frequent.

1: Read dataset {I1, I2, ..., In} and initialize count := 0, final_count := 0
2: for j := 0; j < number_of_patterns; j++
       for k := 0; k < trans_length; k++
           if (j, k) == 1 then
               count := count + 1
       next
       if count == Ii.length() then add item to items array
   next
3: Set minimum support value (t0)
4: for i := 0; i < itemlist_length; i++
       if item_array[i].count >= t0 then
           add to frequent_item_list
       end if
   next
5: return frequent pattern list
Itemset   Transaction IDs
          1   2   3   4   5   6
a         1   1   0   0   0   1
b         0   0   1   1   0   1
c         1   1   0   1   1   1
d         1   0   1   0   1   1
e         0   1   0   1   1   1
Fig 4: Boolean Matrix
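A Boolean matrix of this shape, with items as rows and transactions as columns, can be built from a list of transactions as in the Python sketch below. The function name `build_boolean_matrix` is hypothetical; the first two transactions are the ones named in the text and the third is made up for illustration.

```python
def build_boolean_matrix(transactions):
    """Return (items, matrix).

    items  : sorted list of all distinct items
    matrix : matrix[i][j] is 1 if items[i] occurs in transaction j, else 0
    """
    items = sorted({item for t in transactions for item in t})
    matrix = [[1 if item in t else 0 for t in transactions] for item in items]
    return items, matrix


if __name__ == "__main__":
    transactions = [
        ["a", "c", "d", "e"],  # first transaction from the text
        ["a", "c", "e"],       # second transaction from the text
        ["b", "d"],            # hypothetical third transaction
    ]
    items, matrix = build_boolean_matrix(transactions)
    for item, row in zip(items, matrix):
        print(item, row)
```

The matrix is filled in a single pass over the transactions, which is the property the approach relies on to avoid repeated database scans.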
Frequent 1 item set generation:
A frequent 1-itemset is extracted by summing the number of 1s recorded for an item across the individual transactions. Consider item 'a': counting the 1s in the row of the item gives a total count of 3, because 'a' is present in transactions 1, 2 and 6. If the count is equal to or greater than the minimum threshold or support value (2 in our example), the item is treated as frequent; otherwise it is infrequent.
Frequent 'n' item set extraction:
To mine frequent 2-itemsets, 3-itemsets or n-itemsets, we apply the same counting to a set of items. Consider the 2-itemset {a, b}: the counter is incremented for every transaction in which both 'a' and 'b' are set to '1'. In the above table, transactions 1 and 6 contain '1' for both a and b, so the counter is 2 and {a, b} is treated as a frequent itemset. The same process is applied to the other itemsets to find the remaining frequent patterns.
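The counting procedure for 1-itemsets and n-itemsets can be sketched in Python as follows. This is a brute-force illustration that enumerates every combination up to a given size, without the candidate pruning a production miner would use. The function name `frequent_itemsets` is hypothetical, and the matrix values are illustrative, chosen so that 'a' occurs in transactions 1, 2 and 6 and {a, b} co-occur in transactions 1 and 6 as described in the text; they are not a transcription of Fig 4.

```python
from itertools import combinations

# Illustrative rows (item -> one 0/1 entry per transaction 1..6).
MATRIX = {
    "a": [1, 1, 0, 0, 0, 1],
    "b": [1, 0, 1, 1, 0, 1],
    "c": [1, 1, 0, 1, 1, 1],
}


def frequent_itemsets(matrix, min_support, max_size):
    """Return {itemset tuple: count} for itemsets with count >= min_support.

    An itemset is counted in a transaction only if every one of its
    items has a 1 in that transaction's column.
    """
    items = sorted(matrix)
    n_tx = len(next(iter(matrix.values())))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            count = sum(
                1 for t in range(n_tx) if all(matrix[i][t] == 1 for i in combo)
            )
            if count >= min_support:
                result[combo] = count
    return result


if __name__ == "__main__":
    for itemset, count in frequent_itemsets(MATRIX, min_support=2, max_size=3).items():
        print(itemset, count)
```

Because each count is computed from the in-memory matrix, no further pass over the original transaction database is needed once the matrix has been built.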
Conclusion:
We conclude our research work with an efficient approach to frequent pattern mining in a secure manner over horizontally partitioned databases. A secure key is generated through an efficient and improved Lagrange's polynomial equation, and the cipher data is received and decrypted by the centralized server, which then finds the frequent patterns accurately and efficiently.
References:
[1] Donald Beaver (Harvard University). The Round Complexity of Secure Protocols.
[2] D.W.L. Cheung, V.T.Y. Ng, A.W.C. Fu, and Y. Fu. Efficient mining of association rules in distributed databases. IEEE Trans. Knowl. Data Eng., 8(6):911–922, 1996.
[3] R. Agrawal and R. Srikant. Privacy-preserving data mining. In SIGMOD Conference, pages 439–450, 2000.
[4] M. Bellare, R. Canetti, and H. Krawczyk. Keying hash functions for message authentication. In Crypto, pages 1–15, 1996.
[5] A. Ben-David, N. Nisan, and B. Pinkas. FairplayMP - A system for secure multi-party computation. In CCS, pages 257–266, 2008.
[6] J.C. Benaloh. Secret sharing homomorphisms: Keeping shares of a secret secret. In Crypto, pages 251–260, 1986.
[7] J. Brickell and V. Shmatikov. Privacy-preserving graph algorithms in the semi-honest model. In ASIACRYPT, pages 236–252, 2005.
[8] D.W.L. Cheung, J. Han, V.T.Y. Ng, A.W.C. Fu, and Y. Fu. A fast distributed algorithm for mining association rules. In PDIS, pages 31–42, 1996.
[9] D.W.L. Cheung, V.T.Y. Ng, A.W.C. Fu, and Y. Fu. Efficient mining of association rules in distributed databases. IEEE Trans. Knowl. Data Eng., 8(6):911–922, 1996.
[10] T. ElGamal. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory, 31:469–472, 1985.
BIOGRAPHIES
R. Suneel Kumar received his M.Tech in Computer Science and Engineering in 2012 from Jawaharlal Nehru Technological University, Kakinada. He has two years of teaching experience. He is currently employed as an Assistant Professor in the CSE department, MVGR College of Engineering.
Koppala V Satya Surya Anusha is pursuing an M.Tech in the CSE department, MVGR College of Engineering. Her areas of interest are data mining and network security.