International Journal of Engineering Trends and Technology (IJETT), Volume 31, Number 1, January 2016

Effective Positive and Negative Association Rules Using Bit Vector Matrix in Data Mining

Harish Relangi1, Behara Vineela2
1 Final M.Tech Student, 2 Asst. Professor
1,2 Dept. of CSE, Sarada Institute of Science, Technology and Management (SISTAM), Srikakulam, Andhra Pradesh, India

Abstract: Association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases. Most algorithms for mining quantitative association rules find only frequent item sets of positive items and pay no attention to negative dependencies. Moreover, the algorithms used to extract such rules usually consider only one evaluation criterion when measuring the quality of the generated rules. This is a drawback when identifying negative quantitative association rules. In this paper we propose bit vector (BV) generation for finding quantitative association rules over positive and negative item sets. Using this concept, we can reduce the time complexity of finding the frequent item sets of positive and negative association rules. The process also provides more flexibility in generating frequent item sets.

Keywords: Association rule mining, frequent item sets, negative association rule mining, data mining.

I. INTRODUCTION

Data mining is vital for discovering patterns in large databases. It is regarded as an intermediate step of knowledge discovery in data, and it blends many processes for extracting reports that are interesting to the target user. Data mining can be defined as an interdisciplinary approach to mapping data sets and visualizing them. It transforms raw data into an understandable format for future use.
Data mining aims to analyze data as a set of data groups, termed clusters; as a set of unusual dependencies, termed outliers; or as a set of dependencies. The dependencies that occur between sets of data are termed associations. Association analysis examines the dependency of one data object on another, which is generally measured by means of support and confidence. Association study has become prominent in many disciplines, such as market basket analysis, fraud detection in password management, decision support systems, intrusion detection, and telecommunication. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Although data mining is a relatively new term, the technology is not. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.

ISSN: 2231-5381

Association discovery operates on sets of items in transaction databases and aims at discovering implicative tendencies that can be valuable information for the decision maker. Association rule mining derives frequently occurring features, termed frequent item sets. Positive association rules are generated to visualize and predict outcomes by analyzing the support and confidence factors. Many algorithms have been devised for analyzing and generating rules depending on the level of association that holds between data objects. The generated rules are retained for future prediction analysis, and for many years the data sets with minimal support were simply ignored or pruned because they form negative associations.
Decision support systems are built only on the basis of the positive association rules generated. Recent research has shown that negative associations, which deal with infrequent item sets, are also important for analyzing the robustness of a system and for building a reliable one. Mining both positive and negative associations is therefore in demand for studying frequent and infrequent item sets. Much effort is still required for analyzing negative associations. These associations are used for extracting frequent items from infrequent item sets and vice versa by lowering the threshold levels of association. Highly correlated data objects are analyzed with ease. Many frameworks have been designed for handling such infrequency in item set generation. Additional interestingness measures are generally added to reduce the number of negative associations, or the same measures are reused within the framework to derive robust association rules.

Most association rule mining algorithms rely on a single evaluation criterion. Such mono-objective algorithms are limited because they cannot jointly optimize several interestingness measures to obtain rules that are easy to understand and that cover the data set well. Recent research proposes framing the extraction of association rules as a multi-objective problem, considering several objectives in the process of extracting associations depending on interestingness.

http://www.ijettjournal.org

In this paper, bit vector generation is proposed for generating quantitative association rules, both positive and negative, with much reduced time complexity and with more flexibility. Bit vectors are derived from the occurrence of data items. Care is taken to analyze positive and negative Boolean associations with thresholds defined on the support and confidence measures.
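As an illustration of the support and confidence measures mentioned above, the following Python sketch computes them for one positive rule and one negative rule. It assumes the standard textbook definitions, which this paper does not state explicitly: supp(X) is the fraction of transactions containing X, conf(X -> Y) = supp(X u Y) / supp(X), and conf(X -> not Y) = (supp(X) - supp(X u Y)) / supp(X). The toy transactions and all function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical toy transaction database, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in db if itemset <= t)
    return Fraction(hits, len(db))

def confidence_positive(x, y, db):
    """conf(X -> Y) = supp(X u Y) / supp(X) (assumed standard definition)."""
    return support(x | y, db) / support(x, db)

def confidence_negative(x, y, db):
    """conf(X -> not Y) = (supp(X) - supp(X u Y)) / supp(X) (assumed)."""
    sx = support(x, db)
    return (sx - support(x | y, db)) / sx

# milk occurs in 3 of 5 transactions; milk and butter co-occur in 1.
pos = confidence_positive({"milk"}, {"butter"}, transactions)  # 1/3
neg = confidence_negative({"milk"}, {"butter"}, transactions)  # 2/3
```

Under these definitions the two confidences of a positive rule and its negated-consequent counterpart always sum to one, which is why a rule that is weak as a positive association can be strong as a negative one.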
Association rules are reformed according to the scalability of the data sets and to threshold levels that depend on interestingness.

Apriori is one of the first algorithms proposed in the association rule mining field, and many other algorithms were derived from it. Starting from a database, it extracts all association rules satisfying minimum thresholds of support and confidence. It is well known that mining algorithms can discover a prohibitive number of association rules; for instance, thousands of rules are extracted from a database of several dozen attributes and several hundred transactions. Valuable information is often represented by those rare (low support) and unexpected association rules that are surprising to the user. The more we increase the support threshold, the more efficient the algorithms are, but the more obvious the discovered rules become and hence the less interesting they are for the user. As a result, it is necessary to bring the support threshold low enough to extract valuable information. Lowering the threshold, however, increases the number of rules extracted.

Rule mining is considered one of the most important tasks in knowledge discovery. Experiments show that rules become almost impossible to use when their number exceeds 100. Thus, it is crucial to help the decision maker with an efficient technique for reducing the number of rules. To overcome this drawback, several methods have been proposed in the literature. On the one hand, different algorithms were introduced to reduce the number of item sets by generating closed, maximal or optimal item sets, and several algorithms reduce the number of rules by using non-redundant rules or pruning techniques. On the other hand, post-processing methods can improve the selection of discovered rules. Different complementary post-processing methods may be used, such as pruning, summarizing, grouping, or visualization. Pruning consists in removing uninteresting or redundant rules.
In summarizing, concise sets of rules are generated. Groups of rules are produced in the grouping process, and visualization improves the readability of a large number of rules by using adapted graphical representations. However, most of the existing post-processing methods are based on statistical information in the database. Since rule interestingness strongly depends on user knowledge and goals, these methods do not guarantee that interesting rules will be extracted. For instance, if the user looks for unexpected rules, all the already known rules should be pruned. Or, if the user wants to focus on specific schemas of rules, only this subset of rules should be selected. Moreover, rule post-processing methods should imperatively be based on strong interactivity with the user.

II. RELATED WORK

Many evolutionary algorithms (EAs) [13] have been proposed in the literature for extracting a set of quantitative association rules (QARs) from datasets [14]–[16]. EAs, particularly genetic algorithms (GAs) [17], are considered to be among the most successful search techniques for complex problems and have proved to be an important technique for learning and knowledge extraction. These algorithms usually consider only one evaluation criterion in measuring the quality of the generated rules. Recently, some researchers have framed the extraction of association rules as a multi-objective (rather than a single-objective) problem, taking into account several objectives in the process of extracting association rules [18], [19]. This approach removes some of the limitations of the mono-objective algorithms and allows several measures to be optimized jointly in order to mine a set of rules that are interesting, easy to understand, and with good coverage of the dataset.
Multi-objective evolutionary algorithms (MOEAs) [20], [21] provide an interesting method with which to approach problems of a multi-objective nature, as they generate a family of equally valid solutions, in which each solution tends to satisfy one criterion to a greater extent than another. For this reason, some MOEAs have been applied to mine QARs (by considering several measures as objectives) [22], [23], where each solution in the Pareto front represents a QAR with a different degree of tradeoff between the different measures. Recent MOEAs are based on decomposition (MOEA/D [24] and MOEA/D-DE [25]), which explicitly decomposes the multi-objective optimization problem into N scalar optimization subproblems and optimizes them simultaneously. These approaches have shown some advantages over other MOEAs, presenting lower computational complexity and better performance on three-objective continuous test instances. Note that MOEA/D [24] won the CEC2009 competition. These reasons have given rise to a growing interest in these approaches within the MOEA research community.

III. PROPOSED SYSTEM

Most of the methods proposed for mining positive and negative association rules maintain both frequent and infrequent item sets and hence suffer from poor scalability. To keep the execution time within the user's expectations, it is necessary to design an efficient approach to mine both positive and negative association rules. Given a database of transactions DB and user-defined minimum support (ms) and minimum confidence (mc) values, the problem is to extract all interesting positive and negative Boolean association rules. We propose the BV process for finding frequent and infrequent item sets in a given transaction database. The BV process contains two phases.

Phase 1: Generation of BV Vector
1. Read all item sets from the transactional database.
2. Scan the transactional database and store the item-based bit vectors (BV).
3. For each item that is present in a transaction, put a one in the corresponding position of its BV vector.
4. If the item is not present in the transaction, put a zero in the BV vector.
5. Repeat this process until the end of the transactional database.

Phase 2: Generation of frequent and infrequent item sets from the transactional database
1. Insert the frequent 1-items one by one and assign them to an index FreqIndex.
2. Generate candidate k-item sets from the frequent item sets.
3. For each item set X in the candidate k-item sets Ck: if supp(X) ≥ min_supp and Corr(X) > 1, assign X to the frequent k-item set list (FLk); otherwise assign X to the infrequent k-item set list (IFLk).
4. Calculate the support of X by performing a bitwise AND operation between the bit vectors (BV) of its items: for X = {x1, x2}, supp(X) is obtained by counting the ones in BV(x1) AND BV(x2).
5. Assign FLk and IFLk to FreqIndex k and InfreqIndex k respectively.

By implementing this process we can find the frequent and infrequent item sets of the given transactional database. The proposed process thus reduces complexity and improves efficiency.

IV. CONCLUSIONS

In this paper, we have designed a new BV structure that stores both the frequent and the infrequent item sets for mining both positive and negative association rules. In the proposed method the database is scanned only once for mining positive and negative association rules, which reduces the number of I/O operations. A further advantage of the structure is its flexibility: if new frequent 1-item sets are obtained by lowering the user-defined threshold (minimum support, ms), the proposed method allows the new items to be appended to the existing item sets and the process to be re-run to find positive and negative association rules.

V. REFERENCES

[1] A. Eiben and J. Smith, Introduction to Evolutionary Computing. Berlin, Germany: Springer-Verlag, 2003.
[2] J. Mata, J. Alvarez, and J. Riquelme, “Mining numeric association rules with genetic algorithms,” in Proc. 5th Int. Conf. Artif. Neural Netw. Genetic Algorithms, Apr. 2001, pp. 264–267.
[3] J. Alcala-Fdez, N. Flugy-Pape, A. Bonarini, and F. Herrera, “Analysis of the effectiveness of the genetic algorithms based on extraction of association rules,” Fund. Inf., vol. 98, no. 1, pp. 1001–1014, 2010.
[4] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA, USA/White Plains, NY, USA: Addison-Wesley/Longman, 1989.
[5] B. Alatas and E. Akin, “MODENAR: Multi-objective differential evolution algorithm for mining numeric association rules,” Appl. Soft Comput., vol. 8, no. 1, p. 646, 2008.
[6] A. Ghosh and B. Nath, “Multi-objective rule mining using genetic algorithms,” Inf. Sci., vol. 163, nos. 1–3, pp. 123–133, 2004.
[7] C. Coello, G. Lamont, and D. V. Veldhuizen, Evolutionary Algorithms for Solving Multi-Objective Problems. Norwell, MA, USA: Kluwer Academic, 2002.
[8] K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms. Norwell, MA, USA: Kluwer Academic, 2001.
[9] D. Martin, A. Rosete, J. Alcala-Fdez, and F. Herrera, “A multiobjective evolutionary algorithm for mining quantitative association rules,” in Proc. 11th Int. Conf. Intell. Syst. Design Applicat., Nov. 2011, pp. 1397–1402.
[10] H. Qodmanan, M. Nasiri, and B. Minaei-Bidgoli, “Multiobjective association rule mining with genetic algorithm without specifying minimum support and minimum confidence,” Expert Syst. Applicat., vol. 38, no. 1, pp. 288–298, 2011.
[11] Q. Zhang and H. Li, “MOEA/D: A multiobjective evolutionary algorithm based on decomposition,” IEEE Trans. Evol. Comput., vol. 11, no. 6, pp. 712–731, Dec. 2007.
[12] H. Li and Q. Zhang, “Multiobjective optimization problems with complicated Pareto sets, MOEA/D and NSGA-II,” IEEE Trans. Evol. Comput., vol. 13, no. 2, pp. 284–302, Apr. 2009.
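The two phases of the BV process in Section III can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each bit vector is held in a Python integer, Corr(X) is assumed to be supp(X) divided by the product of the supports of the individual items (the paper does not define it), and candidates are generated naively as all k-combinations of the items that remain frequent. The names build_bit_vectors, support, corr and mine, and the toy database, are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def build_bit_vectors(db):
    """Phase 1: a single scan; bit t of bv[item] is 1 when transaction t
    contains the item and stays 0 otherwise."""
    bv = {}
    for t, transaction in enumerate(db):
        for item in transaction:
            bv[item] = bv.get(item, 0) | (1 << t)
    return bv

def support(itemset, bv, n):
    """AND the bit vectors of the items, then count the ones."""
    acc = (1 << n) - 1  # all-ones vector over the n transactions
    for item in itemset:
        acc &= bv[item]
    return Fraction(bin(acc).count("1"), n)

def corr(itemset, bv, n):
    """ASSUMPTION: supp(X) / product of single-item supports."""
    denom = Fraction(1)
    for item in itemset:
        denom *= support((item,), bv, n)
    return support(itemset, bv, n) / denom

def mine(db, min_supp, max_k=2):
    """Phase 2: split candidate k-item sets into FLk and IFLk."""
    n = len(db)
    bv = build_bit_vectors(db)
    freq_index = {1: sorted((i,) for i in bv
                            if support((i,), bv, n) >= min_supp)}
    infreq_index = {}
    for k in range(2, max_k + 1):
        items = sorted({i for s in freq_index[k - 1] for i in s})
        fl_k, ifl_k = [], []
        for cand in combinations(items, k):  # naive generation (assumption)
            if support(cand, bv, n) >= min_supp and corr(cand, bv, n) > 1:
                fl_k.append(cand)
            else:
                ifl_k.append(cand)
        freq_index[k], infreq_index[k] = fl_k, ifl_k
        if not fl_k:
            break
    return freq_index, infreq_index

# Hypothetical toy database of six transactions.
db = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c"}, {"c", "d"}, {"a"}]
freq, infreq = mine(db, Fraction(1, 3))
# freq[2] is [("a", "b")]; infreq[2] is [("a", "c"), ("b", "c")]
```

Because the database is read only while the bit vectors are built, every later support count is a bitwise AND over in-memory integers, which is consistent with the single-scan claim in the conclusions. Lowering min_supp only requires calling mine again on the same vectors.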
BIOGRAPHIES:

Harish Relangi is an M.Tech (SE) student at Sarada Institute of Science, Technology and Management, Srikakulam. He received his B.Tech (IT) from Thandra Paparaya Institute of Science and Technology, Bobbili, Vijayanagaram. His areas of interest are network security and web technologies.

Behara Vineela is working as an Asst. Professor at Sarada Institute of Science, Technology and Management, Srikakulam, Andhra Pradesh. She received her M.Tech (CSE) from AITAM, Tekkali, Srikakulam, Andhra Pradesh, JNTU Kakinada, Andhra Pradesh. Her research areas include network security.