International Journal of Engineering Trends and Technology (IJETT) – Volume 31 Number 1- January 2016
Effective Positive and Negative Association Rules Using Bit Vector Matrix in Data Mining
Harish Relangi (1), Behara Vineela (2)
(1) Final M.Tech Student, (2) Asst. Professor
(1, 2) Dept. of CSE, Sarada Institute of Science, Technology and Management (SISTAM), Srikakulam, Andhra Pradesh, India
Abstract:
Association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases. Most algorithms for mining quantitative association rules find frequent item sets of positive items only and pay no attention to negative dependencies. Moreover, the algorithms used to extract such rules usually consider only one evaluation criterion in measuring the quality of the generated rules, which is a drawback when identifying negative quantitative association rules. In this paper we propose bit vector (BV) generation for finding quantitative association rules over positive and negative item sets. Using this concept, we can reduce the time complexity of finding the frequent item sets of positive and negative association rules, and we can also provide more flexibility in the generation of frequent item sets.
Keywords: Association Rule mining, frequent item
sets, negative association rule mining, data mining.
I. INTRODUCTION
Data mining has been vital for discovering patterns in large databases. It has been described as an intermediate step in the process of knowledge discovery in databases, and it blends many processes for extracting reports according to what the target user finds interesting. Data mining can be defined as an interdisciplinary approach to mapping data sets and visualizing them: it transforms raw data into an understandable format for future use. Data mining aims at organizing data into groups, termed clusters; into sets of unusual dependencies, termed outliers; or into sets of dependencies. The dependencies that hold between sets of data are termed associations. Association analysis studies the dependency of one data object on another, and it is generally measured by means of support and confidence. Association analysis has become prominent in many disciplines, such as market basket analysis, fraud detection in password management, decision support systems, intrusion detection, and telecommunication. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Although data mining is a relatively new term, the technology is not; however, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost. Association rule mining is a key task of knowledge discovery in databases: among sets of items in transaction databases, it aims at discovering implicative tendencies that can be valuable information for the decision maker.
ISSN: 2231-5381
Association rule mining derives frequently co-occurring features, termed frequent item sets. Positive association rules are generated for visualizing and predicting outcomes by analyzing their support and confidence. Many algorithms have been devised for analyzing and generating rules according to the level of association that holds between data objects. The generated rules are maintained for future prediction analysis, and for many years item sets with minimal support were simply ignored or pruned, as they form negative associations.
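Support and confidence can be made concrete with a minimal Python sketch over a toy transaction list (the items and transactions are hypothetical, chosen only for illustration): supp(X) is the fraction of transactions that contain every item of X, and conf(X ⇒ Y) = supp(X ∪ Y) / supp(X).

```python
from typing import FrozenSet, List

# Toy transaction database (hypothetical data, for illustration only).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(itemset: FrozenSet[str], db: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               db: List[FrozenSet[str]]) -> float:
    """conf(X => Y) = supp(X u Y) / supp(X); assumes supp(X) > 0."""
    return support(antecedent | consequent, db) / support(antecedent, db)


x, y = frozenset({"bread"}), frozenset({"milk"})
print(support(x | y, TRANSACTIONS))    # 0.5 (2 of 4 transactions)
print(confidence(x, y, TRANSACTIONS))  # 0.6666666666666666 (0.5 / 0.75)
```

A rule is kept only when both values reach the user's minimum support and minimum confidence thresholds.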
Decision support systems are traditionally built only on the positive association rules that are generated. Recent research has shown that negative associations, which deal with infrequency in item set generation, are also important for analyzing the robustness of a system and for building a reliable system. Mining positive and negative associations has therefore become important for studying frequent and infrequent item sets, and considerable effort is needed to analyze negative associations. These associations are used to extract frequent items from infrequent item sets, and vice versa, by lowering the threshold levels of the associations, so that highly correlated data objects can be analyzed with ease. Many frameworks have been designed for handling such infrequency in item set generation. Additional interestingness measures are generally added to reduce the number of negative associations, or the same measures are reused within a framework for deriving robust association rules.
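To make negative rules concrete: under the definitions commonly used in the negative-association literature (an assumption here, since the paper does not state them), the support of a rule involving a negated item set follows directly from positive supports, so no extra counting pass is needed. In the identities below, supp is the fraction of transactions, X and Y are disjoint item sets, ¬Y means that a transaction does not contain all of Y, and supp(X) > 0 for the confidence:

```latex
\begin{aligned}
\operatorname{supp}(X \cup \neg Y) &= \operatorname{supp}(X) - \operatorname{supp}(X \cup Y)\\
\operatorname{supp}(\neg X \cup Y) &= \operatorname{supp}(Y) - \operatorname{supp}(X \cup Y)\\
\operatorname{supp}(\neg X \cup \neg Y) &= 1 - \operatorname{supp}(X) - \operatorname{supp}(Y) + \operatorname{supp}(X \cup Y)\\
\operatorname{conf}(X \Rightarrow \neg Y) &= \frac{\operatorname{supp}(X \cup \neg Y)}{\operatorname{supp}(X)} = 1 - \operatorname{conf}(X \Rightarrow Y)
\end{aligned}
```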
Most association rule mining algorithms rely on a single evaluation criterion; they are termed mono-objective algorithms and are limited in their ability to optimize multiple interestingness measures for easy understanding and good coverage of the data set. Recent research has proposed extracting association rules as a multi-objective problem, considering several objectives, depending on interestingness, in the process of extracting associations.
http://www.ijettjournal.org
Bit vector generation is proposed here for generating quantitative association rules, both positive and negative, with much reduced time complexity and more flexibility. A bit vector is derived for each data item from its occurrences in the transactions. Positive and negative Boolean associations are analyzed with thresholds defined on the support and confidence measures. Association rules are regenerated according to the scale of the data sets and the threshold levels of the interestingness measures.
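The following Python sketch illustrates why bit vectors suit both positive and negative associations. It assumes, for illustration only, that each item's bit vector is stored as a Python integer with one bit per transaction (transaction 0 in the least significant bit); the items A and B and their vectors are hypothetical. The support count of {A, B} is a bitwise AND followed by a count of 1 bits, and a negated item is obtained by complementing its vector within the number of transactions.

```python
# Hypothetical bit vectors over 6 transactions: bit t is 1 if transaction t
# contains the item (transaction 0 is the least significant bit).
N_TRANSACTIONS = 6
MASK = (1 << N_TRANSACTIONS) - 1  # restricts complements to real transactions

BV = {
    "A": 0b110110,  # A occurs in transactions 1, 2, 4, 5
    "B": 0b100101,  # B occurs in transactions 0, 2, 5
}


def count(bits: int) -> int:
    """Number of transactions whose bit is set (population count)."""
    return bin(bits).count("1")


def negate(bits: int) -> int:
    """Bit vector of the negated item: transactions that do NOT contain it."""
    return ~bits & MASK


a, b = BV["A"], BV["B"]
print(count(a & b))                  # {A, B}:   2 transactions
print(count(a & negate(b)))          # {A, ¬B}:  2 transactions
print(count(negate(a) & negate(b)))  # {¬A, ¬B}: 1 transaction
# Agrees with supp(X ∪ ¬Y) = supp(X) − supp(X ∪ Y) on raw counts:
assert count(a & negate(b)) == count(a) - count(a & b)
```

The mask matters: Python's `~` on an integer yields a negative number, so without `& MASK` the complement would not be a valid bit vector over the transactions.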
Apriori is the first algorithm proposed in the association rule mining field, and many other algorithms were derived from it. Starting from a database, it extracts all association rules satisfying minimum thresholds of support and confidence. It is well known that mining algorithms can discover a prohibitive number of association rules; for instance, thousands of rules are extracted from a database of several dozen attributes and several hundred transactions. Valuable information is often represented by rare (low-support) and unexpected association rules, which are surprising to the user. The higher the support threshold, the more efficient the algorithms are, but the more obvious the discovered rules become and hence the less interesting they are for the user. As a result, the support threshold must be set low enough to extract valuable information. Rule mining is considered one of the most important tasks in knowledge discovery in databases, yet experiments show that rules become almost impossible to use when their number exceeds 100. Thus, it is crucial to help the decision maker with an efficient technique for reducing the number of rules.
To overcome this drawback, several methods have been proposed in the literature. On the one hand, different algorithms were introduced to reduce the number of item sets by generating closed, maximal, or optimal item sets, and several algorithms reduce the number of rules by using non-redundant rules or pruning techniques. On the other hand, post-processing methods can improve the selection of discovered rules. Different complementary post-processing methods may be used, such as pruning, summarizing, grouping, or visualization. Pruning consists of removing uninteresting or redundant rules. In summarizing, concise sets of rules are generated. Groups of rules are produced in the grouping process, and visualization improves the readability of a large number of rules by using adapted graphical representations. However, most existing post-processing methods are based on statistical information in the database. Since rule interestingness strongly depends on user knowledge and goals, these methods do not guarantee that interesting rules will be extracted. For instance, if the user looks for unexpected rules, all the already known rules should be pruned; or, if the user wants to focus on specific schemas of rules, only that subset of rules should be selected. Moreover, as has been suggested, rule post-processing methods should be based on strong interaction with the user.
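Pruning and user-driven selection can be illustrated with a short Python sketch. The redundancy criterion used here (a rule is dropped when another rule with the same consequent, a strictly smaller antecedent, and at least the same confidence exists) is only one common choice, not the paper's; the `Rule` type, the helper names, and the example rules are hypothetical, and the template filter stands in for a user focusing on one schema of rules.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float


def prune_redundant(rules: List[Rule]) -> List[Rule]:
    """Pruning: drop a rule when another rule with the same consequent, a
    strictly smaller antecedent and at least the same confidence exists."""
    return [
        r for r in rules
        if not any(
            o.consequent == r.consequent
            and o.antecedent < r.antecedent  # proper subset
            and o.confidence >= r.confidence
            for o in rules
        )
    ]


def select_by_template(rules: List[Rule], item: str) -> List[Rule]:
    """User-driven selection: keep only rules whose consequent contains item."""
    return [r for r in rules if item in r.consequent]


rules = [
    Rule(frozenset({"bread"}), frozenset({"milk"}), 0.70),
    Rule(frozenset({"bread", "butter"}), frozenset({"milk"}), 0.65),
    Rule(frozenset({"milk"}), frozenset({"bread"}), 0.60),
]
kept = prune_redundant(rules)
print(len(kept))                               # 2
print(len(select_by_template(kept, "bread")))  # 1
```

The second rule is pruned because {bread} ⇒ {milk} says the same thing with a shorter antecedent and higher confidence.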
II. RELATED WORK
Many evolutionary algorithms (EAs) [13] have been proposed in the literature for extracting a set of quantitative association rules (QARs) from datasets [14]–[16]. EAs, particularly genetic algorithms (GAs) [17], are considered to be one of the most successful search techniques for complex problems and have proved to be an important technique for learning and knowledge extraction. These algorithms usually consider only one evaluation criterion in measuring the quality of the generated rules. Recently, some researchers have framed the extraction of association rules as a multi-objective (rather than a single-objective) problem, taking into account several objectives in the process of extracting association rules [18], [19]. This approach removes some of the limitations of the mono-objective algorithms and allows several measures to be optimized jointly in order to mine a set of rules that are interesting, easy to understand, and with good coverage of the dataset.
Multi-objective evolutionary algorithms (MOEAs) [20], [21] provide an interesting way to approach problems of a multi-objective nature, as they generate a family of equally valid solutions, in which each solution tends to satisfy one criterion to a greater extent than another. For this reason, some MOEAs have been applied to mine QARs (by considering several measures as objectives) [22], [23], where each solution in the Pareto front represents a QAR with a different degree of trade-off between the measures. Recent MOEAs are based on decomposition (MOEA/D [24] and MOEA/D-DE [25]), which explicitly decomposes the multi-objective optimization problem into N scalar optimization subproblems and optimizes them simultaneously. These approaches have shown some advantages over other MOEAs, presenting lower computational complexity and better performance on three-objective continuous test instances. Note that MOEA/D [24] won the CEC2009 competition. These reasons have given rise to a growing interest in these approaches within the MOEA research community.
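The Pareto-front idea, and the scalarization that MOEA/D applies to each subproblem, can be sketched in Python as follows. Each rule is represented only by a tuple of measure values, all assumed to be maximized (here support and confidence of four hypothetical rules); the function names are illustrative. `tchebycheff` is the Tchebycheff scalarizing function commonly used in MOEA/D, where one weight vector defines one scalar subproblem and smaller values are better; this is not a full MOEA/D.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every measure and strictly better on
    at least one (all measures are maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Indices of the rules whose score vectors no other rule dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def tchebycheff(scores: Sequence[float], weights: Sequence[float],
                ideal: Sequence[float]) -> float:
    """MOEA/D Tchebycheff scalarisation: max_i w_i * |z_i* - f_i| (minimised)."""
    return max(w * abs(z - f) for f, w, z in zip(scores, weights, ideal))


# (support, confidence) of four hypothetical rules.
rules = [(0.4, 0.9), (0.6, 0.7), (0.3, 0.8), (0.6, 0.6)]
print(pareto_front(rules))  # [0, 1]

# Best value seen per measure serves as the ideal point z*.
ideal = tuple(max(col) for col in zip(*rules))
weights = (0.7, 0.3)  # this subproblem favours support
best = min(range(len(rules)), key=lambda i: tchebycheff(rules[i], weights, ideal))
print(best)  # 1
```

Rules 0 and 1 form the front because each beats the other on one measure; a support-heavy weight vector then picks rule 1 within that front.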
III. PROPOSED SYSTEM
Most of the methods proposed for mining positive and negative association rules maintain both frequent and infrequent item sets and hence suffer from poor scalability. To keep the execution time within users' expectations, it is necessary to design an efficient approach to mine both positive and negative association rules. Given a database of transactions DB and user-defined minimum support (ms) and minimum confidence (mc) values, the problem is to extract all interesting positive and negative Boolean association rules. We propose the BV process for finding frequent and infrequent item sets in a given transaction database. The BV process contains two phases.
Phase 1: Generation of BV Vectors
1. Read all item sets from the transactional database.
2. Scan the transactional database and store an item-based vector BV for each item.
3. For each item that occurs in a transaction, put a one in that item's BV vector.
4. If the item does not occur in the transaction, put a zero in its BV vector.
5. Repeat this process until the end of the transactional database.
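A minimal Python sketch of the two-phase BV process follows, under several assumptions that are not from the paper: bit vectors are Python integers (bit t stands for transaction t); candidate k-item sets are formed Apriori-style as unions of two frequent (k−1)-item sets; and Corr(X), which the paper does not define, is taken to be the lift-style ratio of supp(X) to the product of its items' supports. The function names (`build_bit_vectors`, `support`, `corr`, `bv_mine`) are illustrative, and the source's "x1 ^ x2" is read as bitwise AND, not Python's XOR operator `^`. The sketch covers Phase 1 and the Phase 2 split into frequent and infrequent item sets so that it runs end to end.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def build_bit_vectors(db: List[Set[str]]) -> Dict[str, int]:
    """Phase 1: a single scan of the database. Bit t of bv[item] is 1 if
    transaction t contains the item; all other bits stay 0."""
    bv: Dict[str, int] = {}
    for t, transaction in enumerate(db):
        for item in transaction:
            bv[item] = bv.get(item, 0) | (1 << t)
    return bv


def support(itemset: Itemset, bv: Dict[str, int], n: int) -> float:
    """Fraction of transactions containing itemset: bitwise AND of the item
    vectors, then a count of the 1 bits."""
    bits = (1 << n) - 1  # start from "every transaction"
    for item in itemset:
        bits &= bv.get(item, 0)
    return bin(bits).count("1") / n


def corr(itemset: Itemset, bv: Dict[str, int], n: int) -> float:
    """ASSUMPTION: Corr(X) = supp(X) / product of single-item supports."""
    expected = 1.0
    for item in itemset:
        expected *= support(frozenset([item]), bv, n)
    return support(itemset, bv, n) / expected


def bv_mine(db: List[Set[str]], min_supp: float
            ) -> Tuple[Dict[int, List[Itemset]], Dict[int, List[Itemset]]]:
    """Phase 2: split candidates level by level into FL_k and IFL_k."""
    n = len(db)
    bv = build_bit_vectors(db)
    singles = [frozenset([i]) for i in bv]
    freq_index = {1: sorted((x for x in singles if support(x, bv, n) >= min_supp), key=sorted)}
    infreq_index = {1: sorted((x for x in singles if support(x, bv, n) < min_supp), key=sorted)}
    k = 2
    while freq_index[k - 1]:
        # Candidate k-item sets: unions of two frequent (k-1)-item sets.
        candidates = {a | b for a, b in combinations(freq_index[k - 1], 2) if len(a | b) == k}
        fl: List[Itemset] = []
        ifl: List[Itemset] = []
        for x in candidates:
            if support(x, bv, n) >= min_supp and corr(x, bv, n) > 1:
                fl.append(x)
            else:
                ifl.append(x)
        freq_index[k], infreq_index[k] = sorted(fl, key=sorted), sorted(ifl, key=sorted)
        k += 1
    return freq_index, infreq_index


db = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "D"}, {"A", "B", "D"}]
frequent, infrequent = bv_mine(db, min_supp=0.4)
print([sorted(x) for x in frequent[2]])  # [['A', 'C'], ['B', 'D']]
```

With these five transactions and a minimum support of 0.4, {A, B} passes the support test (0.6) but lands in the infrequent list because its Corr value (0.9375) is below 1; such item sets are the raw material for negative rules.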
Phase 2: Generation of frequent and infrequent item sets from the transactional database
1. Insert the frequent 1-items one by one and assign them to an index FreqIndex.
2. Generate candidate k-item sets Ck from the frequent item sets, and examine each item set X in Ck as follows.
3. If supp(X) ≥ min_supp and Corr(X) > 1, assign X to the frequent k-item set list (FLk); otherwise, assign X to the infrequent k-item set list (IFLk).
4. The support of X used in step 3 is calculated by a bitwise AND operation between the bit vectors (BV) of its items: if X = {x1, x2}, then supp(X) is obtained by counting the 1 bits in BV(x1) AND BV(x2).
5. Assign FLk and IFLk to FreqIndex k and InfreqIndex k, respectively.
By implementing this process, we can find the frequent and infrequent item sets of the given transactional database. The process also reduces time complexity and improves efficiency.
IV. CONCLUSIONS
In this paper, we have designed a new BV structure to store both the frequent and infrequent item sets for mining both positive and negative association rules. In the proposed method the database is scanned only once for mining positive and negative association rules, which reduces the number of I/O operations. Another advantage of the structure is its flexibility: if new frequent 1-item sets are mined because the user lowers the threshold value (minimum support, ms), the proposed method allows the new items to be appended to the given item sets and the mining of positive and negative association rules to be re-performed.
V. REFERENCES
[1] A. Eiben and J. Smith, Introduction to Evolutionary Computing. Berlin, Germany: Springer-Verlag, 2003.
[2] J. Mata, J. Alvarez, and J. Riquelme, “Mining numeric association rules with genetic algorithms,” in Proc. 5th Int. Conf. Artif. Neural Netw. Genetic Algorithms, Apr. 2001, pp. 264–267.
[3] J. Alcala-Fdez, N. Flugy-Pape, A. Bonarini, and F. Herrera, “Analysis of the effectiveness of the genetic algorithms based on extraction of association rules,” Fund. Inf., vol. 98, no. 1, pp. 1001–1014, 2010.
[4] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA, USA/White Plains, NY, USA: Addison-Wesley/Longman, 1989.
[5] B. Alatas and E. Akin, “MODENAR: Multi-objective differential evolution algorithm for mining numeric association rules,” Appl. Soft Comput., vol. 8, no. 1, p. 646, 2008.
[6] A. Ghosh and B. Nath, “Multi-objective rule mining using genetic algorithms,” Inf. Sci., vol. 163, nos. 1–3, pp. 123–133, 2004.
[7] C. Coello, G. Lamont, and D. V. Veldhuizen, Evolutionary Algorithms for Solving Multi-Objective Problems. Norwell, MA, USA: Kluwer Academic, 2002.
[8] K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms. Norwell, MA, USA: Kluwer Academic, 2001.
[9] D. Martin, A. Rosete, J. Alcala-Fdez, and F. Herrera, “A multiobjective evolutionary algorithm for mining quantitative association rules,” in Proc. 11th Int. Conf. Intell. Syst. Design Applicat., Nov. 2011, pp. 1397–1402.
[10] H. Qodmanan, M. Nasiri, and B. Minaei-Bidgoli, “Multi-objective association rule mining with genetic algorithm without specifying minimum support and minimum confidence,” Expert Syst. Applicat., vol. 38, no. 1, pp. 288–298, 2011.
[11] Q. Zhang and H. Li, “MOEA/D: A multiobjective evolutionary algorithm based on decomposition,” IEEE Trans. Evol. Comput., vol. 11, no. 6, pp. 712–731, Dec. 2007.
[12] H. Li and Q. Zhang, “Multiobjective optimization problems with complicated Pareto sets, MOEA/D and NSGA-II,” IEEE Trans. Evol. Comput., vol. 13, no. 2, pp. 284–302, Apr. 2009.
BIOGRAPHIES:
Harish Relangi is an M.Tech (SE) student at Sarada Institute of Science, Technology and Management, Srikakulam. He received his B.Tech (IT) from Thandra Paparaya Institute of Science and Technology, Bobbili, Vijayanagaram. His areas of interest are network security and web technologies.
Behara Vineela is working as an Asst. Professor at Sarada Institute of Science, Technology and Management, Srikakulam, Andhra Pradesh. She received her M.Tech (CSE) from AITAM, Tekkali, Srikakulam, Andhra Pradesh (JNTU Kakinada, Andhra Pradesh). Her research areas include network security.