International Journal of Engineering Trends and Technology (IJETT) – Volume 5 Number 4 - Nov 2013
A Privacy Preserving Association Rule Mining Over
Unrealized Datasets
Sunil Kumar Chintada (1), Jayanthi Rao Madina (2)
(1) Final M.Tech student, Department of Software Engineering, SISTAM College, Srikakulam, Andhra Pradesh
(2) Assistant Professor, Dept. of CSE, SISTAM College, Srikakulam, Andhra Pradesh
Abstract: In this paper we propose an efficient, empirical
model for privacy-preserving association rule mining. The
model uses a Boolean matrix approach for the mining step
and the RSA algorithm for secure data transmission. The
Boolean matrix reduces the time needed to find the
patterns, and all communication is carried out with cipher
datasets instead of plain datasets.
I. INTRODUCTION
Association rule mining aims at the discovery of
itemsets that co-occur frequently in transactional data.
Centralized mining has been well studied in the past. The
problem has a large worst-case complexity, a fact that
motivates businesses to outsource the mining process to
service providers, who have developed efficient,
specialized solutions. The data owner, apart from the
mining cost relief, has additional motives for outsourcing.
First, it requires minimal computational resources, since
the owner is only required to produce and to send the
transactions to the miner.[1]
This also makes the outsourcing model attractive
to applications in which data owners produce transactions
as streams and have limited resources to maintain
them. Second, assume that the owner has multiple
production sources of transactions, e.g., consider a chain of
supermarkets which generate transactions at different
locations. All transactions can be sent to a single provider
for mining association rules. The provider could compute
association rules that are local to the individual stores or
global rules for the whole organization. Therefore, the cost
of transferring transactions among the sources and
performing the global mining in a distributed manner is
saved.
Generally when people talk of privacy, they say
“keep information about me from being available to
others”. However, their real concern is that their
information not be misused. The fear is that once
information is released, it will be impossible to prevent
misuse. Utilizing this distinction, ensuring that a data
mining project won’t enable misuse of personal
information, opens opportunities that “complete privacy”
would prevent. To do this, we need technical and social
solutions that ensure data will not be released.
ISSN: 2231-5381
Another view is corporate privacy – the release of
information about a collection of data rather than an
individual data item. I may not be concerned about
someone knowing my birthdate, mother’s maiden name, or
social security number; but knowing all of them enables
identity theft. This collected information problem scales to
large, multi-individual collections as well. A technique that
guarantees no individual data is revealed may still release
information describing the collection as a whole. Such
“corporate information” is generally the goal of data
mining, but some results may still lead to concerns (often
termed a secrecy, rather than privacy, issue) [4].
II. RELATED WORK
There are several fields where related work is
occurring. We first describe other work in privacy-preserving data mining, then go into detail on specific
background work on which this paper builds. Previous
work in privacy-preserving data mining has addressed two
issues. In one, the aim is preserving customer privacy by
distorting the data values [4]. The idea is that the distorted
data does not reveal private information, and thus is “safe”
to use for mining. The key result is that the distorted data,
and information on the distribution of the random data used
to distort the data, can be used to generate an
approximation to the original data distribution, without
revealing the original data values. The distribution is used
to improve mining results over mining the distorted data
directly, primarily through selection of split points to “bin”
continuous data. Later refinement of this approach
tightened the bounds on what private information is
disclosed, by showing that the ability to reconstruct the
distribution can be used to tighten estimates of original
values based on the distorted data [5].
More recently, the data distortion approach has
been applied to Boolean association rules [6], [7]. Again,
the idea is to modify data values such that reconstruction of
the values for any individual transaction is difficult, but the
rules learned on the distorted data are still valid. One
interesting feature of this work is a flexible definition of
privacy; e.g., the ability to correctly guess a value of ‘1’
from the distorted data can be considered a greater threat to
privacy than correctly learning a ‘0’. The data distortion
approach addresses a different problem from our work. The
assumption with distortion is that the values must be kept
private from whoever is doing the mining [9].
http://www.ijettjournal.org
We instead assume that some parties are allowed
to see some of the data, just that no one is allowed to see
all the data. In return, we are able to get exact, rather than
approximate, results.
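To make the distortion idea discussed above concrete, the following is a minimal Python sketch of one simple form of Boolean distortion. It assumes, purely for illustration, that each 0/1 value is kept with a publicly known probability p and flipped otherwise; the function names `distort` and `estimate_support` and the sample column are hypothetical and are not taken from the cited work. The sketch shows how the miner can estimate the original fraction of 1s from the distorted column and the known distortion probability, without seeing the original values.

```python
import random

def distort(bits, p, rng):
    """Keep each 0/1 value with probability p, flip it with probability 1 - p."""
    return [b if rng.random() < p else 1 - b for b in bits]

def estimate_support(distorted, p):
    """Estimate the original fraction of 1s from a distorted column.

    If t is the true fraction of 1s, the expected observed fraction is
    p * t + (1 - p) * (1 - t); this function solves that equation for t.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 leaves no information to reconstruct")
    observed = sum(distorted) / len(distorted)
    return (observed - (1 - p)) / (2 * p - 1)

rng = random.Random(0)                  # fixed seed so the run is repeatable
column = [1] * 3000 + [0] * 7000        # hypothetical item column, true support 0.30
noisy = distort(column, p=0.9, rng=rng)
estimate = estimate_support(noisy, p=0.9)
print(f"estimated support: {estimate:.3f}")
```

The estimate is only approximate, which is the trade-off noted above: distortion hides individual values from the miner, whereas the approach in this paper aims at exact results.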
III. PROPOSED APPROACH
We propose an efficient, empirical privacy-preserving
data mining technique that finds frequent patterns with
a Boolean matrix approach. The proposed system provides
privacy-preserving mining of association rules from an
outsourced transaction database, and it uses cryptography
to secure the transaction database. The proposed system
mainly contains two modules: the data owner and the
service provider.
[Figure: blocks labeled Data Owner, RSA, Analyst and BM mining, connected by flows labeled 1. Transactional dataset, 2. Cipher transactions, 3. Cipher transactions, 4. Mined cipher patterns, 5. Mined cipher patterns, 6. Mined results, 7. Patterns]
The main task of the data owner is to convert the plain
database into a cipher database. Here we use the RSA
algorithm to convert the plain database into a cipher database.
The RSA algorithm involves three steps: key generation,
encryption and decryption. In this module the data are
gathered for association rule mining as a set of
transactions over item sets. The administrator sends the
data to the analyst, who generates the association rules
between the data items by finding the interesting patterns
between the items.
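As a rough illustration of the Boolean matrix mining the analyst performs, the Python sketch below builds one bit-row per item in a single pass over the transactions and then computes supports with bitwise AND only. It is a simplified sketch under stated assumptions, not the exact algorithm of this paper: the function names, the toy transactions and the brute-force search over item combinations are all hypothetical, and the item labels are plain strings purely for readability.

```python
from itertools import combinations

def build_boolean_matrix(transactions, min_support):
    """Single pass over the data: one bit-row per item, one bit per transaction."""
    rows = {}
    for tid, items in enumerate(transactions):
        for item in items:
            rows[item] = rows.get(item, 0) | (1 << tid)
    # Keep only frequent length-1 item sets as row headings.
    return {i: r for i, r in rows.items() if bin(r).count("1") >= min_support}

def support(matrix, itemset):
    """AND the rows of the items and count the columns that are still 1."""
    items = iter(itemset)
    acc = matrix[next(items)]
    for item in items:
        acc &= matrix[item]
    return bin(acc).count("1")

def maximal_frequent_itemsets(matrix, min_support):
    """Try larger item sets first; keep frequent ones not contained in a kept set."""
    items = sorted(matrix)
    found = []
    for size in range(len(items), 0, -1):
        for combo in combinations(items, size):
            if any(set(combo) <= kept for kept in found):
                continue
            if support(matrix, combo) >= min_support:
                found.append(set(combo))
    return found

# Toy data for illustration only (not the data of TABLE I).
transactions = [{"A", "C", "D"}, {"A", "B"}, {"B"}, {"A", "B"}, {"C", "D"}, {"A", "B"}]
matrix = build_boolean_matrix(transactions, min_support=2)
maximal = maximal_frequent_itemsets(matrix, min_support=2)
print(maximal)
```

Once the matrix is built, the original transactions are never scanned again; every support value comes from the stored rows. The exhaustive loop over combinations is exponential in the number of items and is acceptable only for a small example like this one.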
Key generation:
In this approach we use asymmetric cryptography, which
involves a public key and a private key. The public key is
used to encrypt the plain text into cipher text, and the
private key is used to decrypt the cipher text. RSA
generates the keys as follows:
1. Select any two distinct prime numbers m and n. For
security, select the primes at random rather than using
fixed values; they can be found efficiently with a
primality test.
2. Calculate prod = m*n. prod is the modulus for both the
public and private keys, and its length in bits is the key
length.
3. Compute φ(prod) = φ(m)φ(n) = (m − 1)(n − 1), where φ is
Euler's totient function.
4. Select an integer a such that 1 < a < φ(prod) and
gcd(a, φ(prod)) = 1; i.e., a and φ(prod) are coprime. Here
a is the public key exponent. An a with a short bit-length
and a small Hamming weight results in more efficient
encryption, most commonly 2^16 + 1 = 65,537. However, much
smaller values of a (such as 3) have been shown to be less
secure in some settings [5].
5. Calculate g as g ≡ a^(−1) (mod φ(prod)), i.e., g is the
multiplicative inverse of a (modulo φ(prod)). This is more
clearly stated as: solve for g given g⋅a ≡ 1 (mod φ(prod)).
g is kept as the private key exponent. It is most commonly
computed with the extended Euclidean algorithm.
By construction, g⋅a ≡ 1 (mod φ(prod)). The public key
consists of the modulus prod and the public (or encryption)
exponent a. The private key consists of the modulus prod
and the private (or decryption) exponent g, which must be
kept secret. m, n, and φ(prod) must also be kept secret
because they can be used to calculate g.
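The key generation steps above, together with the modular exponentiation used for encryption and decryption, can be sketched in Python as follows. This is a textbook illustration under stated assumptions and not a production implementation: the helper names (`is_probable_prime`, `random_prime`, `generate_keys`, `encrypt`, `decrypt`) are hypothetical, the key size is kept small for speed, a Miller-Rabin test stands in for the primality test, and no padding scheme is applied.

```python
import math
import secrets

def is_probable_prime(k, rounds=20):
    """Miller-Rabin primality test (probabilistic for large k)."""
    if k < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if k % p == 0:
            return k == p
    d, r = k - 1, 0
    while d % 2 == 0:                        # write k - 1 as d * 2^r with d odd
        d //= 2
        r += 1
    for _ in range(rounds):
        w = secrets.randbelow(k - 3) + 2     # random witness in [2, k - 2]
        x = pow(w, d, k)
        if x in (1, k - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, k)
            if x == k - 1:
                break
        else:
            return False                     # w proves that k is composite
    return True

def random_prime(bits):
    """Step 1: draw random odd candidates of the given size until one is prime."""
    while True:
        cand = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(cand):
            return cand

def generate_keys(bits=256, a=65537):
    """Steps 1-5: return ((prod, a), (prod, g)) as public and private keys."""
    while True:
        m, n = random_prime(bits), random_prime(bits)
        phi = (m - 1) * (n - 1)              # step 3: Euler's totient of prod
        if m != n and math.gcd(a, phi) == 1: # step 4: a coprime to phi
            break
    prod = m * n                             # step 2: shared modulus
    g = pow(a, -1, phi)                      # step 5: modular inverse (Python 3.8+)
    return (prod, a), (prod, g)

def encrypt(x, public_key):
    prod, a = public_key
    return pow(x, a, prod)                   # c = x^a mod prod

def decrypt(c, private_key):
    prod, g = private_key
    return pow(c, g, prod)                   # x = c^g mod prod

public_key, private_key = generate_keys(bits=256)
cipher = encrypt(12345, public_key)          # 12345 is an arbitrary sample value
assert decrypt(cipher, private_key) == 12345
```

Python's built-in three-argument `pow` computes the modular inverse directly, so the extended Euclidean algorithm does not need to be written out by hand in this sketch.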
Encryption:
Alice transmits her public key (prod, a) to Bob and keeps
the private key secret. When Bob wishes to send a message M
to Alice, he first turns M into an integer x such that
0 ≤ x < prod by using a padding scheme, and then computes
the cipher text c as follows:
c ≡ x^a (mod prod)
Bob then forwards this cipher information c to Alice.
Decryption:
Alice can recover x from c by using her private key
exponent g via computing
x ≡ c^g (mod prod)
Given x, she can recover the original message M by
reversing the padding scheme.
The server or service provider performs association rule
mining on the cipher database to find the maximum frequent
item sets. Thus the research presents a new algorithm that
mines maximum frequent item sets first, based on the
Boolean matrix of frequent length-1 item sets. The main
idea of the algorithm is to create a Boolean matrix with
frequent length-1 item sets as row headings and transaction
records’ IDs as column headings (TABLE I). In the matrix
there are only two types of values, ‘1’ and ‘0’, which
indicate whether or not the transaction record contains the
corresponding frequent length-1 item set. It is then
necessary to calculate the number of 1 values in each
column and the count of the columns with the same number of
1 values.
After the Boolean matrix is constructed, transactions can
be acquired from it by the sequence of items, with no need
to construct the frequent one-item set. Every possible
pattern can then be checked for the presence of its items
using the ‘0’ or ‘1’ indicator values of the Boolean
matrix, so there is no need to perform multiple database
scans or to generate a large number of candidate sets.

TABLE I
Frequent         Transaction Records ID
1-Itemset        1   2   3   4   5   6   7   8   9   10
Item1            1   1   0   1   0   0   1   0   1   1
Item2            0   1   1   0   1   0   1   0   1   1
Item3            1   0   0   1   1   1   0   1   0   0
Item4            1   0   0   1   0   1   0   1   0   0

IV. CONCLUSION
We conclude with an integrated approach that combines the
Boolean matrix for association rule mining with RSA for
secure transmission of data over the network. Data can be
transmitted over the network securely, and the patterns are
obtained in an efficient manner.
REFERENCES
[1] R. Buyya, C. S. Yeo, and S. Venugopal, “Market-oriented cloud computing: Vision, hype, and reality for delivering IT services as computing utilities,” in Proc. IEEE Conf. High Performance Comput. Commun., Sep. 2008, pp. 5–13.
[2] W. K. Wong, D. W. Cheung, E. Hung, B. Kao, and N. Mamoulis, “Security in outsourcing of association rule mining,” in Proc. Int. Conf. Very Large Data Bases, 2007, pp. 111–122.
[3] L. Qiu, Y. Li, and X. Wu, “Protecting business intelligence and customer privacy while outsourcing data mining tasks,” Knowledge Inform. Syst., vol. 17, no. 1, pp. 99–120, 2008.
[4] C. Clifton, M. Kantarcioglu, and J. Vaidya, “Defining privacy for data mining,” in Proc. Nat. Sci. Found. Workshop Next Generation Data Mining, 2002, pp. 126–133.
[5] I. Molloy, N. Li, and T. Li, “On the (in)security and (im)practicality of outsourcing precise association rule mining,” in Proc. IEEE Int. Conf. Data Mining, Dec. 2009, pp. 872–877.
[6] F. Giannotti, L. V. Lakshmanan, A. Monreale, D. Pedreschi, and H. Wang, “Privacy-preserving data mining from outsourced databases,” in Proc. SPCC2010 Conjunction with CPDP, 2010, pp. 411–426.
[7] R. Agrawal and R. Srikant, “Privacy-preserving data mining,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2000, pp. 439–450.
[8] S. J. Rizvi and J. R. Haritsa, “Maintaining data privacy in association rule mining,” in Proc. Int. Conf. Very Large Data Bases, 2002, pp. 682–693.
[9] M. Kantarcioglu and C. Clifton, “Privacy-preserving distributed mining of association rules on horizontally partitioned data,” IEEE Trans. Knowledge Data Eng., vol. 16, no. 9, pp. 1026–1037, Sep. 2004.
[10] B. Gilburd, A. Schuster, and R. Wolff, “k-ttp: A new privacy model for large scale distributed environments,” in Proc. Int. Conf. Very Large Data Bases, 2005, pp. 563–568.
BIBLIOGRAPHY
Sunil Kumar Chintada is working as a software developer at
E-centric Solutions Pvt. Ltd., Vizag. He received his B.Tech
from Sarada Institute of Science, Technology and Management,
Srikakulam. He is pursuing his M.Tech at Sarada Institute of
Science, Technology and Management, Srikakulam, Andhra
Pradesh. His areas of interest are Data Structures, Java and
Oracle database.
Jayanthi Rao Madina is working as a HOD in Sarada Institute
of Science, Technology and Management, Srikakulam, Andhra
Pradesh. He received his M.Tech (CSE) from Aditya Institute
of Technology and Management, Tekkali, Andhra Pradesh. His
research areas include Image Processing, Computer Networks,
Data Mining and Distributed Systems.