Secrecy Conserving of Association Rule Mining from Unrealized Transaction Data Base

International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 7 – Oct 2014
Secrecy Conserving of Association Rule Mining
from Unrealized Transaction Data Base
Using Bipartite Matrix
Gangu. Dharma Raju¹, Jayanthi Rao Madina²
¹Final M.Tech Student, ²Head of the Department
¹,²Dept of CSE, Sarada Institute of Science, Technology And Management (SISTAM), Srikakulam, Andhra Pradesh
Abstract: Cloud computing has encouraged the storage of data
in the cloud, and there has been recent interest in data
mining as a service for data owners. In this paper we
propose an empirical model of privacy-preserving
association rule mining that combines an efficient
cryptographic algorithm with a bipartite matrix. The data
owner converts plain transactions to cipher text and
forwards them to the service provider; the service provider
decrypts the transactions, converts them to plain text, and
applies a bipartite matrix to generate association rules.
We use the S-DES algorithm for the privacy of item sets and
a bipartite matrix for finding frequent item sets. Together,
these techniques are intended to provide more privacy and
more efficient mining of frequent item sets.
I. INTRODUCTION
Association rule mining identifies the itemsets that
occur frequently in data. It has been studied mainly in the
centralized setting, where its limitation is a high
computational complexity; this has motivated efficient,
specialized solutions offered as services. Mining needs
substantial computing resources, so a data owner who lacks
them relies on a data miner and sends the data to that
miner. This outsourcing model is efficient and attractive,
but as data owners generate more results it becomes very
difficult to maintain them. [1, 2]
Association rule mining is a hard task on large
amounts of data, and outsourcing it is also hard. On
third-party servers the mined data is in an insecure
position, because it is increasingly used by other data
owners and by a growing number of end users. For flexible
access and utility, the data to be mined is stored in a
centralized location. To address these issues, association
rule mining is combined with encryption techniques, so
researchers focus on data mining together with
cryptographic techniques for outsourcing data to a cloud
server. [4]
The service providers calculate association rules
on their own and store them on a global server as global
rules. Both the complexity of sending the data and the
performance of central data mining have to be considered.
On the other hand, the service provider itself becomes a
point of malicious attack. If the service provider is not
trusted, data access is limited and the data must not be
associated with private information [6, 7]. Both the
initial data and the rules returned by the service provider
have to be protected, so that security is maintained for
the outsourced data in data mining.
ISSN: 2231-5381
There are two types of methods that can provide
security for sensitive information. The first method is to
apply an encoding function that converts the original data
to a new format. The second method is to apply data
perturbation, which modifies the original data randomly. The
perturbation method is less attractive since it can only
provide approximate results; on the other hand, the use of
encryption allows the exact rules to be recovered. In this
paper we propose and evaluate appropriate encryption
techniques for outsourcing of association rules mining. In
order for an encryption to be appropriate for the problem,
the following conditions should be satisfied. First, there
should be a correct, complete, and deterministic decryption
method that transforms the association rules found in the
encrypted database to the true association rules in the
original database. Second, the encryption and decryption
processes must be reasonably fast; otherwise, owners may
choose to apply association rules mining locally (if cost is
the only concern). Third, the encryption method must be
secure enough to prevent the service provider (or an
attacker) from recovering the original transactions and the
true association rules among the actual items by processing
the encrypted data.
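The first condition can be illustrated with a minimal sketch. It assumes a hypothetical one-to-one item encoding (the `encode` table below is invented for illustration and is not the scheme used in this paper) and shows that counts computed on the encoded transactions decode exactly to the counts on the original transactions. It says nothing about the third condition, since a plain substitution table is not a secure cipher.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions and a hypothetical one-to-one item encoding.
transactions = [
    {"milk", "bread"},
    {"butter"},
    {"milk", "bread", "butter"},
    {"bread"},
]
encode = {"milk": "x7", "bread": "q2", "butter": "k9"}
decode = {code: item for item, code in encode.items()}


def pair_counts(db):
    """Count how many transactions contain each 2-item set."""
    counts = Counter()
    for t in db:
        for pair in combinations(sorted(t), 2):
            counts[frozenset(pair)] += 1
    return counts


# The owner encodes; the provider only ever sees the encoded database.
encoded_db = [{encode[i] for i in t} for t in transactions]
encoded_counts = pair_counts(encoded_db)

# The owner maps the provider's result back to the real item names.
decoded_counts = {
    frozenset(decode[c] for c in pair): n for pair, n in encoded_counts.items()
}
```

Because the encoding is deterministic and invertible, `decoded_counts` equals the counts obtained by mining the plain transactions directly.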
Secure multiparty computation enables this kind of mining
without a trusted third party. There may be considerable
communication between the parties to get the final result,
but the parties don’t learn anything from this
communication. The computation is secure if given just
one party’s input and output from those runs, we can
simulate what would be seen by the party. In this case, to
simulate means that the distribution of what is actually
seen and the distribution of the simulated view over many
runs are computationally indistinguishable. We may not be
able to exactly simulate every run, but over time we cannot
tell the simulation from the real runs.
II. RELATED WORK
Association rule mining [5] is a well-known and widely
researched method for discovering interesting relations
between variables in large databases. It is intended to
identify strong rules discovered in databases using
different measures of interestingness. Based on the concept
of strong rules, association rules were introduced for
discovering regularities between products in large-scale
transaction data recorded by point-of-sale (POS) systems in
supermarkets. For example, the rule {onions, potatoes} =>
{burger} found in the sales data of a supermarket would
indicate that a customer who buys onions and potatoes
together is also likely to buy hamburger meat. Such
information can be used as the basis for decisions about
marketing activities such as promotional pricing or product
placements. [4]
http://www.ijettjournal.org
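The support sup(X) of an itemset X is defined as the
proportion of transactions in the data set that contain the
itemset. In the example database, the itemset {milk, bread,
butter} has a support of 1/5 = 0.2, since it occurs in 20%
of all transactions (1 out of 5 transactions).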

The confidence of a rule is defined as
conf(X => Y) = sup(X ∪ Y) / sup(X). For example, the rule
{bread, butter} => {milk} has a confidence of
0.2/0.2 = 1.0 in the database, which means that for 100% of
the transactions containing butter and bread the rule is
correct (100% of the times a customer buys butter and
bread, milk is bought as well). Be careful when
reading the expression: here sup(X ∪ Y) means
"support for occurrences of transactions where X and
Y both appear", not "support for occurrences of
transactions where either X or Y appears", the latter
interpretation arising because set union is equivalent
to logical disjunction. The argument of sup() is a set of
preconditions, and thus becomes more restrictive as it
grows (instead of more inclusive).

Confidence can be interpreted as an estimate of the
probability P(Y|X), the probability of finding the RHS
of the rule in transactions under the condition that
these transactions also contain the LHS.[4,9]
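The support and confidence measures can be sketched in a few lines of Python. The example database of the paper is not reproduced in this text, so the five transactions below are a hypothetical stand-in, chosen only so that they agree with the quoted figures (support 0.2 for {milk, bread, butter}, confidence 1.0 for {bread, butter} => {milk}). The function names are likewise illustrative.

```python
# Hypothetical 5-transaction database, consistent with the figures in the text.
DATABASE = [
    {"milk", "bread"},
    {"butter"},
    {"beer"},
    {"milk", "bread", "butter"},
    {"bread"},
]


def support(itemset, database=DATABASE):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in database if itemset <= t)
    return hits / len(database)


def confidence(lhs, rhs, database=DATABASE):
    """conf(X => Y) = sup(X ∪ Y) / sup(X); assumes X occurs at least once."""
    return support(set(lhs) | set(rhs), database) / support(lhs, database)


sup_all = support({"milk", "bread", "butter"})   # 1 of 5 transactions
conf_rule = confidence({"bread", "butter"}, {"milk"})
```

Note that `support` tests `itemset <= t` (subset), which is the "X and Y both appear" reading of sup(X ∪ Y) described above.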
In addition to the above example from market basket
analysis, association rules are now used in many more
application areas, including Web usage mining, intrusion
detection, frequent pattern generation, and bioinformatics.
In contrast to sequential pattern mining, association rule
mining does not consider the order of items within
transactions. [8]
Let I = {i1, i2, ..., in} be a set of n binary
attributes called items. Let D = {t1, t2, ..., tm} be a set
of transactions called the database. Each transaction in D
has a unique transaction ID and contains a subset of the
items in I. An association rule is defined as an
implication of the form X => Y, where X, Y ⊆ I and
X ∩ Y = ∅. The itemsets X and Y are called the left-hand
side (LHS) and the right-hand side (RHS) of the rule,
respectively. As an example, let the set of items be
I = {milk, bread, butter, beer}, and consider a small
database in which the presence of an item in a transaction
is represented as 1 and its absence as 0. An example rule
is {bread, butter} => {milk}, meaning that a customer who
buys bread and butter is also likely to buy milk.
Note: this example is extremely small. In practical
applications, a rule needs a support of several hundred
transactions before it can be considered statistically
significant, and datasets often contain thousands or
millions of transactions.[5]
To select interesting rules from the set of all possible
rules, constraints on various measures of significance and
interest can be used. The best-known constraints are
minimum thresholds on support and confidence.

Finding all frequent itemsets in a database is difficult since
it involves searching all possible itemsets (item
combinations). The set of possible itemsets is the power
set over I and has size 2^n − 1 (excluding the empty set, which
is not a valid itemset). Although the size of the power set
grows exponentially in the number of items n in I, efficient
search is possible using the downward-closure property of
support (also called anti-monotonicity) which guarantees
that for a frequent itemset, all its subsets are also frequent
and thus for an infrequent itemset, all its supersets must
also be infrequent. Exploiting this property, efficient
algorithms can find all frequent itemsets.[10]
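The downward-closure property can be sketched as a level-wise search in the style of the classical Apriori algorithm. This is a generic illustration, not the bipartite-matrix method proposed in this paper; the function name, the small database, and the use of an absolute minimum count instead of a relative support are assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(database, min_count):
    """Level-wise search that prunes candidates via downward closure."""
    items = sorted({i for t in database for i in t})
    level = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while level:
        # Count every size-k candidate against the database.
        counts = {c: sum(1 for t in database if c <= t) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        # Join frequent size-k sets into size-(k+1) candidates and keep only
        # those whose size-k subsets are all frequent (downward closure).
        k += 1
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        level = [
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        ]
    return frequent


# Hypothetical database; an itemset is frequent if it occurs at least twice.
db = [
    {"milk", "bread"},
    {"butter"},
    {"beer"},
    {"milk", "bread", "butter"},
    {"bread"},
]
result = frequent_itemsets(db, min_count=2)
```

Here {beer} is infrequent, so no superset of it is ever generated or counted, which is exactly the saving the property provides.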
III. PROPOSED SYSTEM
In the proposed system we provide privacy-preserving
mining of association rules from an outsourced transaction
database. By combining cryptography with association rule
mining, we provide security for the transaction database.
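Here the support sup(X) of an itemset X is the proportion of
transactions in the data set that contain X; an itemset is
treated as frequent when its support meets the minimum
support threshold min_sup.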
[Figure: proposed system, showing the Data owner (S-DES algorithm) and the Service provider (bipartite matrix)]
The data owner sends the transaction item sets to the
service provider in cipher format. The service provider
then finds the frequent item sets and sends them back to
the data owner, who performs the decryption and obtains the
plain frequent item sets.
Data Owner:
The data owner collects all the transaction item sets and
forwards them to the service provider.
Before sending, the data owner converts the transaction
items into an unreadable (cipher) format in order to
provide privacy over the transaction item sets; a
cryptographic technique is used for this purpose. The
conversion is done by the data owner. The data owner also
converts the cipher-format frequent item sets back into
plain format after receiving the cipher patterns from the
service provider.
Privacy of transaction item sets:
The data owner collects all the transaction item sets and
converts them to cipher format using the S-DES algorithm.
Converting to cipher format provides privacy for the
transaction items. The data owner performs all encryption
and decryption of transaction items, which secures the item
sets. After encrypting the item sets, the data owner sends
them to the service provider in cipher format.
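A minimal sketch of this step is given below, using the standard Simplified DES tables (10-bit key, 8-bit block, two rounds). The paper does not say how item names are mapped to 8-bit blocks, so encrypting each byte of the UTF-8 item name independently is an assumption, as are the helper names `encrypt_item` and `decrypt_item` and the example key.

```python
P10 = [3, 5, 2, 7, 4, 10, 1, 9, 8, 6]
P8 = [6, 3, 7, 4, 8, 5, 10, 9]
IP = [2, 6, 3, 1, 4, 8, 5, 7]
IP_INV = [4, 1, 3, 5, 7, 2, 8, 6]
EP = [4, 1, 2, 3, 2, 3, 4, 1]
P4 = [2, 4, 3, 1]
S0 = [[1, 0, 3, 2], [3, 2, 1, 0], [0, 2, 1, 3], [3, 1, 3, 2]]
S1 = [[0, 1, 2, 3], [2, 0, 1, 3], [3, 0, 1, 0], [2, 1, 0, 3]]


def permute(bits, table):
    """Apply a 1-indexed permutation/expansion table to a list of bits."""
    return [bits[i - 1] for i in table]


def rotate(bits, n):
    """Circular left shift by n positions."""
    return bits[n:] + bits[:n]


def subkeys(key_bits):
    """Derive the two 8-bit round keys from a 10-bit key."""
    k = permute(key_bits, P10)
    left, right = rotate(k[:5], 1), rotate(k[5:], 1)
    k1 = permute(left + right, P8)
    left, right = rotate(left, 2), rotate(right, 2)
    return k1, permute(left + right, P8)


def sbox(bits, box):
    """Row from bits 1 and 4, column from bits 2 and 3; 2-bit output."""
    value = box[bits[0] * 2 + bits[3]][bits[1] * 2 + bits[2]]
    return [value >> 1, value & 1]


def round_fn(bits, key):
    """One Feistel round: the left half is XORed with F(right, key)."""
    left, right = bits[:4], bits[4:]
    x = [a ^ b for a, b in zip(permute(right, EP), key)]
    f = permute(sbox(x[:4], S0) + sbox(x[4:], S1), P4)
    return [a ^ b for a, b in zip(left, f)] + right


def crypt_byte(byte, keys):
    """Encrypt with keys (k1, k2); decrypt with keys (k2, k1)."""
    bits = [(byte >> (7 - i)) & 1 for i in range(8)]
    bits = round_fn(permute(bits, IP), keys[0])
    bits = round_fn(bits[4:] + bits[:4], keys[1])  # swap halves between rounds
    return int("".join(map(str, permute(bits, IP_INV))), 2)


def encrypt_item(item, key_bits):
    """Assumed encoding: each UTF-8 byte of the item name is one block."""
    keys = subkeys(key_bits)
    return bytes(crypt_byte(b, keys) for b in item.encode("utf-8"))


def decrypt_item(cipher, key_bits):
    k1, k2 = subkeys(key_bits)
    return bytes(crypt_byte(b, (k2, k1)) for b in cipher).decode("utf-8")


KEY = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]  # example 10-bit key
cipher_milk = encrypt_item("milk", KEY)
```

Under this assumed encoding the cipher is deterministic: the same item always yields the same cipher value, so identical items remain identical in the cipher transactions and can still be counted there.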
Service provider:
The service provider is a third party that provides the
mining service to all the company members.
All the company members send their transaction item sets to
the service provider. Before sending, each data owner
encrypts the items into cipher format. The service provider
then retrieves all the cipher transaction item sets and
applies the association rule technique to find the frequent
item sets. In this paper we propose a bipartite matrix
technique for finding frequent item sets. The process is as
follows.
I) Creating bipartite matrix according to frequent item sets:
The service provider generates the bipartite matrix from
the transaction item sets. The procedure for generating the
bipartite matrix is as follows.
D = transaction database
S = item sets
T = items
Input: transaction database D
Output: frequent length-1 item sets Si
Process:
Begin
  Find all frequent length-1 item sets Si from D
  If Si not null
    For each Sx in Si
      For each transaction ti in D
        If ti contains Sx
          Sx|ti| = 1
        Else
          Sx|ti| = 0
        Return tw|N|
      End for
    End for
  End if
  tw|N| = {t1|N|, t2|N|, ..., tn|N|}
End
II) Extracting maximum frequent item sets from bipartite matrix:
After generating the bipartite matrix, the service provider
finds the frequent item sets from it, as follows.
Input: the bipartite matrix, the frequent length-1 item sets Si,
and the minimum support min_sup
Output: maximum frequent item sets
Begin
  For each column in the bipartite matrix
    compute the number of value 1 in the current column
  End for
  return max[n]
  sort(max[n])
  For each one in max[n]
    compute the number of columns with the same number of value 1
    If number > min_sup
      generate maximum-length candidate itemsets from S
      For each itemset in the candidate itemsets
        calculateSupport(itemset)
        If support(itemset) > min_sup
          itemset is frequent
        End if
      End for
    End if
    If maximum frequent itemsets is not null
      break
    End if
  End for
End
After finding the maximum frequent item sets, the service
provider sends them to the data owner. The data owner
retrieves the frequent item sets and converts them into
plain format.
Frequent patterns generation:
[Figure: Frequent 'n' item set]
Initially, the frequent one-item sets are generated by
counting the occurrences of each individual item over all
transactions. Consider item 'a': count the number of 1s in
the column for item 'a' across all transactions. The total
count of a is 3, because a is present in transactions 1, 2
and 6. If the count is equal to or greater than the minimum
support count (2 in our example), the item is treated as a
frequent item.
To find the frequent two-item sets, three-item sets, or
n-item sets, we follow the same procedure until no more
frequent items are found. Consider the two-item set {a, b}:
check the entries for both a and b in each transaction
(both must be set to 1), and add 1 to the count for each
such transaction. In the above table, transactions 1 and 6
contain 1 for both a and b, so the count is 2. Hence {a, b}
is a frequent item set, because our minimum support count
is 2. The remaining frequent patterns can be found by the
same process.
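The counting procedure can be sketched as follows. The table referred to above is not reproduced in this text, so the six transactions below are hypothetical, chosen only to agree with the stated counts (item a in transactions 1, 2 and 6; a and b together in transactions 1 and 6). Plain letters are used for readability, whereas in the proposed system the labels would be cipher values. `maximum_frequent` is a loose reading of procedure II and not a line-by-line transcription.

```python
from itertools import combinations

ITEMS = ["a", "b", "c"]
# Hypothetical transactions 1..6, consistent with the counts in the text.
TRANSACTIONS = [
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"c"},
    {"b"},
    {"a", "b", "c"},
]

# 0/1 matrix: one row per transaction, one column per item.
MATRIX = [[1 if item in t else 0 for item in ITEMS] for t in TRANSACTIONS]


def count(itemset):
    """Number of rows with a 1 in every column of `itemset`."""
    cols = [ITEMS.index(i) for i in itemset]
    return sum(1 for row in MATRIX if all(row[c] == 1 for c in cols))


def frequent(size, min_count):
    """All itemsets of the given size whose count reaches min_count."""
    return {
        combo: count(combo)
        for combo in combinations(ITEMS, size)
        if count(combo) >= min_count
    }


def maximum_frequent(min_count):
    """Try the longest candidates first and stop at the first frequent size."""
    for size in range(len(ITEMS), 0, -1):
        found = frequent(size, min_count)
        if found:
            return found
    return {}


one_item_sets = frequent(1, min_count=2)
two_item_sets = frequent(2, min_count=2)
longest = maximum_frequent(min_count=2)
```

With a minimum support count of 2, the only three-item candidate {a, b, c} occurs once and is rejected, so the search falls back to the two-item sets.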
IV. CONCLUSION
In cloud computing there has been recent interest in
the paradigm of mining as a service for companies.
We use the association rule technique to mine frequent item
sets from the data set. In this paper we proposed two main
concepts, one for the privacy of the transaction database
and one for finding the frequent item sets. First, we use
the S-DES algorithm to provide privacy for the transaction
database. Second, we use a bipartite matrix to find
frequent item sets. With these concepts we aim to provide
more efficiency and security for transaction item sets, and
also security for the frequent item sets.
REFERENCES
[1] D. Agrawal and C. C. Aggarwal. On the design and
quantification of privacy preserving data mining
algorithms. In Proceedings of the Twentieth ACM
SIGACT-SIGMOD-SIGART Symposium on Principles of
Database Systems, pages 247–255, Santa Barbara,
California, USA, May 21-23 2001. ACM.
[2] R. Agrawal and R. Srikant. Privacy-preserving data
mining. In Proceedings of the 2000 ACM SIGMOD
Conference on Management of Data, pages 439–450,
Dallas, TX, May 14-19 2000. ACM.
[3] M. Atallah, E. Bertino, A. Elmagarmid, M. Ibrahim,
and V. Verykios. Disclosure limitation of sensitive rules.
In Knowledge and Data Engineering Exchange Workshop
(KDEX’99), pages 25–32, Chicago, Illinois, Nov. 8 1999.
[4] Special issue on constraints in data mining. SIGKDD
Explorations, 4(1), June 2002.
[5] C. Clifton. Using sample size to limit exposure to data
mining. Journal of Computer Security, 8(4):281–307, Nov.
2000.
[6] H. S. Delugach and T. H. Hinke. Wizard: A database
inference analysis and detection system. IEEE Transactions
on Knowledge and Data Engineering, 8(1), Feb. 1996.
[7] W. Du and M. J. Atallah. Privacy-preserving
cooperative scientific computations. In 14th IEEE
Computer Security Foundations Workshop, pages 273–282,
Nova Scotia, Canada, June 11-13 2001.
[8] W. Du and M. J. Atallah. Privacy-preserving statistical
analysis. In Proceeding of the 17th Annual Computer
Security Applications Conference, New Orleans,
Louisiana, USA, December 10-14 2001.
[9] Directive 95/46/EC of the European Parliament and of
the Council of 24 October 1995 on the protection of
individuals with regard to the processing of personal data
and on the free movement of such data. Official Journal of
the European Communities, No I.(281):31–50, Oct. 24
1995.
[10] A. Eisenberg. With false numbers, data crunchers try to
mine the truth. New York Times, July 18 2002.
BIOGRAPHIES
Gangu. Dharma Raju is an M.Tech (CSE) student at Sarada
Institute of Science, Technology and Management,
Srikakulam. He received his B.Tech (IT) from Sri Sivani
College of Engineering (SSCE), Srikakulam. His areas of
interest are Data Mining and Networking.
Jayanthi Rao Madina is working as HOD at Sarada Institute
of Science, Technology And Management (SISTAM),
Srikakulam, Andhra Pradesh. He received his M.Tech (CSE)
from Aditya Institute of Technology And Management
(AITAM), Tekkali, Andhra Pradesh. His research areas
include Data Mining, Image Processing, Computer Networks,
and Distributed Systems. He has published six papers in
international journals and attended three conferences.