A Novel Model of Secure Mining with Decision Matrix Technique

International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 4 - Apr 2014
Prasanthi Kolluri*, Satyanarayana Mummana#
*Final M.Tech Student, #Assistant Professor
Department of CSE, Avanthi Institute of Engineering & Technology, Visakhapatnam, Andhra Pradesh
Abstract: Security in data mining is an important research issue
nowadays. In this paper we propose a novel and efficient model of
privacy-preserving association rule mining that uses a Decision matrix
approach for mining and the RSA algorithm for secure data
transmission. The Decision matrix reduces the time complexity of
finding the patterns, and communication is carried out with cipher
datasets instead of plain datasets.
I. INTRODUCTION
Pattern mining is the process of finding sequences of events in a
large set of patterns. Various pattern mining algorithms are available
for finding association rules between events or data items. Apriori is
the basic algorithm: it finds the frequent patterns by generating
candidate sets for the frequent itemsets. The main drawbacks of the
Apriori algorithm are its multiple database scans and the number of
candidate sets it generates.
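The two drawbacks named above can be made concrete with a minimal Apriori sketch. It is an illustration under stated assumptions, not code from this paper: the function name `apriori`, the sample transactions and the threshold `min_support=3` are hypothetical. The counter `scans` records one full pass over the database per candidate level.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return ({itemset: support}, number of full database scans)."""
    transactions = [frozenset(t) for t in transactions]
    # Level-1 candidates: every distinct item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent, scans, k = {}, 0, 1
    while candidates:
        scans += 1  # each level needs one full pass over all transactions
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, scans


# Hypothetical sample data, for illustration only.
sample = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
frequent, scans = apriori(sample, min_support=3)
print(len(frequent), "frequent itemsets found in", scans, "database scans")
```

Even on this tiny input the database is scanned once per itemset length, which is the cost the Decision matrix approach aims to avoid.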
FP-growth is another association rule mining algorithm. It initially
constructs the FP-tree and then finds the frequent patterns through
the tree by generating the suffix trees for each individual data item
or event from bottom to top. The main drawback of the FP-tree approach
is its memory consumption and complexity when the number of data items
or events is large.
This paper deals with the outsourcing of data: the data owner places
the data on the cloud so that the analyst can mine it in a secure
manner without loss of data integrity.
Association rule mining aims at the discovery of itemsets that
co-occur frequently in transactional data. Centralized mining has been
well studied in the past. The problem has a large worst-case
complexity, a fact that motivates businesses to outsource the mining
process to service providers who have developed efficient, specialized
solutions. The data owner, apart from the relief from the mining cost,
has additional motives for outsourcing. First, it requires minimal
computational resources, since the owner is only required to produce
and send the transactions to the miner [1]. [...] global rules for the
whole organization. Therefore, the cost of transferring transactions
among the sources and performing the global mining in a distributed
manner is saved.
ISSN: 2231-5381
Another view is corporate privacy – the release of
information about a collection of data rather than an
individual data item. I may not be concerned about
someone knowing my birthdate, mother’s maiden name, or
social security number; but knowing all of them enables
identity theft. This collected information problem scales to
large, multi-individual collections as well. A technique that
guarantees no individual data is revealed may still release
information describing the collection as a whole. Such
“corporate information” is generally the goal of data
mining, but some results may still lead to concerns (often
termed a secrecy, rather than privacy, issue) [4].
II. RELATED WORK
In recent years, privacy has become the primary concern when mining
data over networks. Two main types of approaches are available for
privacy preservation: the first is randomization and perturbation, and
the second is the cryptographic approach.
In the first approach, fake values are injected into the real dataset,
converting it into an unrealized dataset without disturbing the
integrity of the real dataset. The second approach uses either
symmetric or asymmetric encryption of the real datasets: at the
receiver end the real datasets are decrypted and mined, the mined
results are converted to cipher mining results and forwarded, and the
data owner decrypts the results.
In this paper we propose the cryptographic approach for secure
transmission of patterns to the analyst. Initially, the data owner
prepares the real dataset, encrypts it with a cryptographic algorithm
and forwards it to the analyst. The analyst decrypts the dataset,
applies the decision making approach, generates the patterns in an
optimal manner, converts them to cipher and forwards the cipher
results to the data owner. The analyst need not know the semantics of
the data items, and the data owner can decrypt the cipher mined
results.
http://www.ijettjournal.org
The decision making system initially reads the patterns and generates
the decision matrix row by row: if an event or item is present in a
row, the entry is ‘1’, otherwise it is ‘0’. After the complete matrix
has been constructed, frequent patterns can be extracted from the
decision system efficiently.
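The row-wise construction described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each transaction is available as a set of item names; the function name `build_decision_matrix` and the sample items are hypothetical and do not come from the paper.

```python
def build_decision_matrix(transactions, items):
    """Return one 0/1 row per transaction and one column per item."""
    return [
        [1 if item in transaction else 0 for item in items]
        for transaction in transactions
    ]


# Hypothetical transactions; the item names are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "jam"},
    {"milk"},
]
items = sorted(set().union(*transactions))  # fixed column order
matrix = build_decision_matrix(transactions, items)

print(items)
for row in matrix:
    print(row)
```

Each transaction is read exactly once while its row is filled in, so the matrix is complete after a single pass over the data.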
There are several fields where related work is occurring. We first
describe other work in privacy-preserving data mining, then go into
detail on specific background work on which this paper builds.
Previous work in privacy-preserving data mining has addressed two
issues. In one, the aim is preserving customer privacy by distorting
the data values [4]. The idea is that the distorted data does not
reveal private information, and thus is “safe” to use for mining. The
key result is that the distorted data, and information on the
distribution of the random data used to distort the data, can be used
to generate an approximation to the original data distribution,
without revealing the original data values. The distribution is used
to improve mining results over mining the distorted data directly,
primarily through selection of split points to “bin” continuous data.
Later refinement of this approach tightened the bounds on what private
information is disclosed, by showing that the ability to reconstruct
the distribution can be used to tighten estimates of original values
based on the distorted data [5].
We instead assume that some parties are allowed to see some of the
data, but that no one is allowed to see all of the data. In return, we
are able to get exact, rather than approximate, results.
III. PROPOSED APPROACH
We propose an efficient and empirical model of privacy-preserving data
mining for finding the frequent patterns with the Decision matrix
approach. The proposed system provides privacy-preserving mining of
association rules from an outsourced transaction database, and
cryptography is used to secure the transaction database. The proposed
system mainly contains two modules, i.e. data owner and service
provider.
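[Figure: workflow diagram connecting Data Owner, RSA, Analyst and DM mining. Labelled flows: 1. Transactional dataset; 2. Cipher transactions; 3. Cipher transactions; 4. Mined cipher patterns; 5. Mined cipher patterns; 6. Mined results; 7. Patterns.]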
The main task of the data owner is to convert the plain database to a
cipher database. Here we use the RSA algorithm for this conversion.
The RSA algorithm involves three steps: key generation, encryption and
decryption. In this module the data for association rule mining is
gathered; it contains different transactions with respect to itemsets.
The administrator sends the data to the analyst for generating the
association rules between the data items. The analyst finds the
interesting patterns between the items.
Key generation:
In this approach we use asymmetric cryptography, which involves a
public key and a private key. The public key is used for encrypting
the plain text into cipher, and the private key is used for decrypting
the cipher. RSA generates the keys as follows:
1. Select any two distinct prime numbers m and n. For security, select
   the prime numbers randomly rather than using fixed values; suitable
   primes can be found efficiently with a primality test.
2. Calculate prod = m*n.
   prod is used as the modulus for both the public and private keys,
   and its length indicates the key length.
3. Compute φ(prod) = φ(m)φ(n) = (m − 1)(n − 1), where φ is Euler's
   totient function.
4. Select an integer a such that 1 < a < φ(prod) and
   gcd(a, φ(prod)) = 1, i.e. a and φ(prod) are coprime.
   • a is the public key exponent.
   • An a with a short bit-length and a small Hamming weight results
     in more efficient encryption, most commonly 2^16 + 1 = 65,537.
     However, much smaller values of a (such as 3) have been shown to
     be less secure in some settings [5].
5. Calculate g as g ≡ a^−1 (mod φ(prod)), i.e., g is the
   multiplicative inverse of a (modulo φ(prod)).
   • Stated more clearly: solve for g given g⋅a ≡ 1 (mod φ(prod)).
   • g is most commonly computed with the (extended) Euclidean
     algorithm.
   • g is kept as the private key exponent.
By construction, g⋅a ≡ 1 (mod φ(prod)). The public key consists of the
modulus prod and the public (or encryption) exponent a. The private
key consists of the modulus prod and the private (or decryption)
exponent g, which must be kept secret. m, n, and φ(prod) must also be
kept secret because they can be used to calculate g.

TABLE I
Transaction    Frequent 1-itemset
Record ID      Item1  Item2  Item3  Item4
1              1      0      1      1
2              1      1      0      0
3              0      1      0      0
4              1      0      0      1
5              1      1      1      0
6              0      0      1      1
7              1      1      0      0
8              0      0      1      1
9              1      1      0      0
10             1      1      0      0

Encryption:
Alice transmits her public key (prod, a) to Bob and keeps the private
key secret. When Bob wishes to send a message M to Alice, he first
converts M, using a padding scheme, into an integer m such that
0 ≤ m < prod (here m denotes the message integer, not the prime m of
step 1). He then computes the ciphertext c as
c = m^a (mod prod)
This cipher information is then forwarded to Alice.
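The key generation and encryption steps above can be sketched as follows. This is a textbook illustration under stated assumptions, not the authors' implementation: the function names (`is_probable_prime`, `random_prime`, `generate_keys`, `encrypt`, `decrypt`) are hypothetical, Miller-Rabin is assumed as the primality test, no padding scheme is applied, and the message integer is named `message_int` to avoid the clash with the prime m. The inverse operation with the private exponent g is included only to check the round trip.

```python
import math
import secrets


def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin probabilistic primality test (assumed choice of test)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:  # write n - 1 as d * 2**r with d odd
        d //= 2
        r += 1
    for _ in range(rounds):
        witness = 2 + secrets.randbelow(n - 3)  # uniform in [2, n - 2]
        x = pow(witness, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # this witness proves n composite
    return True


def random_prime(bits: int) -> int:
    """Step 1: draw random odd candidates of the given size until one is prime."""
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate


def generate_keys(bits: int = 512, a: int = 65537):
    """Steps 1-5: return the pair ((prod, a), (prod, g))."""
    while True:
        m, n = random_prime(bits), random_prime(bits)
        phi = (m - 1) * (n - 1)                  # step 3
        if m != n and 1 < a < phi and math.gcd(a, phi) == 1:  # step 4
            break
    prod = m * n                                 # step 2
    g = pow(a, -1, phi)                          # step 5 (needs Python 3.8+)
    return (prod, a), (prod, g)


def encrypt(message_int: int, public_key) -> int:
    prod, a = public_key
    if not 0 <= message_int < prod:
        raise ValueError("message integer must satisfy 0 <= m < prod")
    return pow(message_int, a, prod)             # c = m^a (mod prod)


def decrypt(cipher_int: int, private_key) -> int:
    prod, g = private_key
    return pow(cipher_int, g, prod)              # m = c^g (mod prod)


# Hypothetical usage: encrypt the item identifiers of one transaction.
public_key, private_key = generate_keys(bits=32)  # toy key size, not secure
transaction = [2, 17, 42]
cipher = [encrypt(item, public_key) for item in transaction]
recovered = [decrypt(c, private_key) for c in cipher]
```

Without padding the same item identifier always maps to the same ciphertext, so a real deployment would need a randomized padding scheme and far larger primes than the 32-bit toy values used in the usage lines.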
Decryption:
Alice can recover m from c by using her private key exponent g via
computing
m = c^g (mod prod)
Given m, she can recover the original message M by reversing the
padding scheme.
The server or service provider performs association rule mining on the
cipher database to find the maximum frequent itemsets. Thus the
research presents a new algorithm for mining maximum frequent
itemsets, based first on the Decision matrix of frequent length-1
itemsets. The main idea of the algorithm is to create a Decision
matrix with frequent length-1 itemsets as row headings and transaction
records’ IDs as column headings (TABLE I). In the matrix there are
only two types of values, ‘1’ and ‘0’, which indicate whether or not
the transaction record contains the corresponding frequent length-1
itemset. It is then necessary to calculate the number of 1 values in
each column and the count of the columns with the same number of 1
values.
After the Decision matrix has been constructed, transactions can be
read off as sequences of items, and there is no need to construct the
frequent 1-itemsets again. Every possible pattern can then be checked
against the matrix for the presence of its items through the indicator
values ‘0’ and ‘1’, so there is no need to perform multiple database
scans or to generate a large number of candidate sets.
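The matrix lookup described above can be sketched with the values of TABLE I. The sketch is an illustration under assumptions rather than the authors' maximum-frequent-itemset algorithm: it stores the matrix with transaction record IDs as rows, packs each item column into an integer bitmap, and simply enumerates every item combination. The function names and the threshold `min_support=3` are hypothetical.

```python
from itertools import combinations

ITEMS = ["Item1", "Item2", "Item3", "Item4"]
TABLE_I = {  # transaction record ID -> 0/1 flags in ITEMS order
    1: [1, 0, 1, 1],
    2: [1, 1, 0, 0],
    3: [0, 1, 0, 0],
    4: [1, 0, 0, 1],
    5: [1, 1, 1, 0],
    6: [0, 0, 1, 1],
    7: [1, 1, 0, 0],
    8: [0, 0, 1, 1],
    9: [1, 1, 0, 0],
    10: [1, 1, 0, 0],
}


def item_bitmaps(matrix, n_items):
    """One integer per item; bit k is set if the k-th record contains it."""
    bitmaps = [0] * n_items
    for position, row in enumerate(matrix.values()):
        for j, flag in enumerate(row):
            if flag:
                bitmaps[j] |= 1 << position
    return bitmaps


def support(pattern, bitmaps):
    """Count the records that contain every item index in `pattern`."""
    mask = bitmaps[pattern[0]]
    for j in pattern[1:]:
        mask &= bitmaps[j]
    return bin(mask).count("1")


def frequent_patterns(matrix, items, min_support):
    bitmaps = item_bitmaps(matrix, len(items))  # the only pass over the data
    found = {}
    for size in range(1, len(items) + 1):
        for pattern in combinations(range(len(items)), size):
            count = support(pattern, bitmaps)
            if count >= min_support:
                found[tuple(items[j] for j in pattern)] = count
    return found


patterns = frequent_patterns(TABLE_I, ITEMS, min_support=3)
for pattern, count in patterns.items():
    print(pattern, count)
```

The transaction data is read once to build the bitmaps; every later support count is a bitwise AND on the matrix columns. Enumerating all combinations is exponential in the number of items, so this form is only practical for a small set of frequent length-1 itemsets.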
IV. CONCLUSION
We conclude with an integrated approach that combines the Decision
matrix for association rule mining with RSA for secure transmission of
data over the network: data can be transmitted over the network
securely, and the patterns are obtained in an efficient manner.
REFERENCES
[1] R. Buyya, C. S. Yeo, and S. Venugopal, “Market-oriented cloud
computing: Vision, hype, and reality for delivering IT services as
computing utilities,” in Proc. IEEE Conf. High Performance Comput.
Commun., Sep. 2008, pp. 5–13.
[2] W. K. Wong, D. W. Cheung, E. Hung, B. Kao, and N. Mamoulis,
“Security in outsourcing of association rule mining,” in Proc. Int.
Conf. Very Large Data Bases, 2007, pp. 111–122.
[3] L. Qiu, Y. Li, and X. Wu, “Protecting business intelligence and
customer privacy while outsourcing data mining tasks,” Knowledge
Inform. Syst., vol. 17, no. 1, pp. 99–120, 2008.
[4] C. Clifton, M. Kantarcioglu, and J. Vaidya, “Defining privacy for
data mining,” in Proc. Nat. Sci. Found. Workshop Next Generation Data
Mining, 2002, pp. 126–133.
[5] I. Molloy, N. Li, and T. Li, “On the (in)security and
(im)practicality of outsourcing precise association rule mining,” in
Proc. IEEE Int. Conf. Data Mining, Dec. 2009, pp. 872–877.
[6] F. Giannotti, L. V. Lakshmanan, A. Monreale, D. Pedreschi, and H.
Wang, “Privacy-preserving data mining from outsourced databases,” in
Proc. SPCC2010 Conjunction with CPDP, 2010, pp. 411–426.
[7] R. Agrawal and R. Srikant, “Privacy-preserving data mining,” in
Proc. ACM SIGMOD Int. Conf. Manage. Data, 2000, pp. 439–450.
[8] S. J. Rizvi and J. R. Haritsa, “Maintaining data privacy in
association rule mining,” in Proc. Int. Conf. Very Large Data Bases,
2002, pp. 682–693.
[9] M. Kantarcioglu and C. Clifton, “Privacy-preserving distributed
mining of association rules on horizontally partitioned data,” IEEE
Trans. Knowledge Data Eng., vol. 16, no. 9, pp. 1026–1037, Sep. 2004.
[10] B. Gilburd, A. Schuster, and R. Wolff, “k-ttp: A new privacy
model for large scale distributed environments,” in Proc. Int. Conf.
Very Large Data Bases, 2005, pp. 563–568.
Bibliography:
Satyanarayana Mummana is working as an Asst. Professor in Avanthi
Institute of Engineering & Technology, Visakhapatnam, Andhra Pradesh.
He received his Master's degree (MCA) from Gandhi Institute of
Technology and Management (GITAM), Visakhapatnam and his M.Tech (CSE)
from Avanthi Institute of Engineering & Technology, Visakhapatnam,
Andhra Pradesh. His research areas include Image Processing, Computer
Networks, Data Mining, Distributed Systems and Cloud Computing.
I am Prasanthi Kolluri. I completed my B.Tech at Lakireddy Balireddy
College of Engineering (LBRCE), Mylavaram, Krishna Dist., and am
currently pursuing an M.Tech at Avanthi Institute of Engineering &
Technology, Narsipatnam, Visakhapatnam.