International Journal of Application or Innovation in Engineering & Management (IJAIEM) Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com Volume 2, Issue 8, August 2013 ISSN 2319 - 4847

AN EFFICIENT ALGORITHM (FUFM) FOR MINING FREQUENT ITEM SETS

Nazeer.Shaik 1, N.L. Prasanna 2
1 Pursuing M.Tech in CSE at Vignan's LARA Institute Of Technology and Science, Vadlamudi, Guntur Dist., A.P., India.
2 Asst.Prof, Department of CSE, Vignan's LARA Institute Of Technology & Science, Vadlamudi, Guntur Dist., A.P., India.

ABSTRACT
As technology develops, data mining is moving toward more advanced problems. This paper addresses itemset mining. Frequent itemsets are the itemsets that occur frequently in a transactional database. Utility-based data mining is a new research area concerned with all types of utility factors in data mining processes and aimed at incorporating utility considerations into data mining tasks. An advanced area in this field is fast utility mining, which gives accurate results. Frequent Utility Frequent Mining (FUFM) is the algorithm presented here for retrieving itemsets quickly from a transactional database. The main aim of this paper is to retrieve the frequent utility itemsets and to cluster those itemsets by keyword or by number assignment. The results are displayed without any loss of data.

Keywords: Frequent Utility Frequent Mining (FUFM), UMining, Knowledge Discovery in Databases (KDD), UP-Growth.

1. INTRODUCTION
Data mining and knowledge discovery from databases have received much attention in recent years. Data mining, the extraction of hidden predictive information from large databases, is a powerful technology with great potential to help companies focus on the most important information in their data warehouses. Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, previously unknown and potentially useful patterns in data.
These patterns are used to make predictions or classifications about new data, explain existing data, summarize the contents of a large database to support decision making, and provide graphical data visualization that helps humans discover deeper patterns. The main aim of this paper is to identify and group the frequently used itemsets in a transactional database: when a database is audited, finding the items that are purchased or collected frequently and clustering those frequent items makes for a better mining process.

2. BACKGROUND WORK
KDD: The KDD process comprises a few steps leading from raw data to some form of new knowledge. The volume of data contained in a database often exceeds the ability to analyze it efficiently, resulting in a gap between the collection of data and its understanding. A new concept is proposed for generating four kinds of itemsets, namely high utility and high frequent itemsets (HUHF), high utility and low frequent itemsets (HULF), low utility and high frequent itemsets (LUHF), and low utility and low frequent itemsets (LULF). These itemsets are generated using the basic framework of the FUM and FUFM algorithms. Customer Relationship Management (CRM) is incorporated into the system by generating a list of the customers who are frequent buyers of each of these four kinds of itemsets.

3. OVERVIEW OF EXISTING SYSTEM
1. Traditional association rule mining (ARM) is used to identify frequently occurring patterns of itemsets.
2. The ARM model treats all items in the database equally, considering only whether an item is present in a transaction or not.
3. However, frequency of occurrence may not express the semantics of an application, because the user's interest may be related to other factors, such as cost, profit, or aesthetic value.
4. For example, a sales manager may not be interested in frequent itemsets that do not generate significant profit.
The frequent itemset mining approach may therefore not satisfy a sales manager's goal. The support measure reflects the statistical correlation of items, but not their semantic significance. In other words, statistical correlation may not measure how useful an itemset is according to a user's preferences (i.e., profit). The profit of an itemset depends not only on its support but also on the prices of the items it contains.

Frequent Utility Frequent Mining (FUFM) consists of different methods, as follows:
a. HUHF
b. HULF
c. LUHF
d. LULF
Volume 2, Issue 8, August 2013 Page 81
Frequent Utility Frequent Mining (FUFM) finds all utility-frequent itemsets that satisfy the given utility and support thresholds. Utility-frequent itemsets are a special form of high utility itemsets based on Selective Item Replication. Two divisions are maintained:
1. High Utility High Frequency (HUHF) and High Utility Low Frequency (HULF)
2. Low Utility High Frequency (LUHF) and Low Utility Low Frequency (LULF)

a. HUHF: High utility and high frequent itemsets are obtained by incorporating support into the FUM algorithm.
• In the first phase, the set H of high utility itemsets is generated.
• In the second phase, the support value is calculated for each itemset in H; the itemsets in H that meet the minimum support are the HUHF itemsets.
b. HULF: High utility and low frequent itemsets are generated using both the FUM and FUFM algorithms.
• In the first phase, the high utility itemsets HU are generated using the FUM algorithm.
• In the second phase, the high utility high frequent itemsets are generated using FUFM(HU).
• The HUHF itemsets are then removed from HU; the remaining itemsets are the HULF itemsets.
c. LUHF: Low utility and high frequent itemsets are generated following the basic framework of the FUFM algorithm.
d. LULF: Low utility and low frequent itemsets are generated in two phases.
• In the first phase, the low utility itemsets LU are determined by exhaustive search.
• In the second phase, the LULF itemsets are generated from LU and LUHF using a set difference function.

4. ALGORITHM FUFM
Task: Discovery of utility-frequent itemsets
Input: database DB; constraints minUtil and minSup
Output: High Utility High Frequent itemsets (HUHF)
[1] L = 1
[2] Find the set of candidates of length L with support >= minSup
[3] Compute the extended support for all candidates and output the utility-frequent itemsets
[4] L += 1
[5] Use the frequent itemset mining algorithm to obtain a new set of frequent candidates of length L from the old set of frequent candidates
[6] Stop if the new set is empty; otherwise go to step [3]

Algorithm Working process
The above steps proved successful in finding the frequently occurring high utility itemsets. The process depends entirely on the assumed threshold values, and every stage is compared against them. The first step sets the candidate length L to 1 and keeps the candidates of that length whose support is greater than or equal to the minimum support. The next step computes the extended support of these candidates, outputs the utility-frequent itemsets and arranges them in ascending order. After L is incremented, the frequent itemset mining algorithm derives the new frequent candidates of length L from the old set of frequent candidates. This cycle repeats until the new set of candidates is empty, at which point the process stops and the results are recorded.

5. DATA FLOW DIAGRAM
The above diagram depicts the complete chain of calculating and displaying the frequent itemsets. Comparing the itemsets with the threshold values yields the frequent utility itemsets as results.
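To make the utility and support measures and the six FUFM steps more concrete, here is a minimal Python sketch under stated assumptions; it is an illustration, not the authors' implementation. The toy database, the unit profits in PROFIT and all function names (contains, utility_in, support, extended_support, fufm, total_utility, four_way_split) are hypothetical. The paper does not define extended support, so the sketch assumes the common utility-frequent reading: the fraction of transactions that contain the itemset with a per-transaction utility of at least minUtil. An Apriori-style join is swapped in for "the frequent itemset mining algorithm" of step [5]. four_way_split uses brute-force enumeration in place of FUM and exhaustive search and labels each itemset HUHF, HULF, LUHF or LULF directly, with "high utility" assumed to mean total utility over the database of at least a separate threshold min_total_util; labelling directly yields the same groups as the set-difference steps.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its purchased
# quantity; PROFIT holds an assumed unit profit for every item.
PROFIT = {"a": 5, "b": 1, "c": 2, "d": 10}
DB = [
    {"a": 1, "b": 4, "c": 1},
    {"b": 6, "c": 2},
    {"a": 2, "c": 1, "d": 1},
    {"b": 3, "d": 1},
    {"a": 1, "b": 2, "c": 3},
]


def contains(itemset, t):
    """True if transaction t contains every item of the itemset."""
    return all(i in t for i in itemset)


def utility_in(itemset, t):
    """Utility of the itemset in one transaction: sum of quantity * unit profit."""
    return sum(t[i] * PROFIT[i] for i in itemset)


def support(itemset, db):
    """Fraction of transactions that contain the itemset."""
    return sum(contains(itemset, t) for t in db) / len(db)


def extended_support(itemset, db, min_util):
    """Assumed definition: fraction of transactions that contain the itemset
    with a per-transaction utility of at least min_util."""
    hits = sum(contains(itemset, t) and utility_in(itemset, t) >= min_util
               for t in db)
    return hits / len(db)


def fufm(db, min_util, min_sup):
    """Levelwise search in the spirit of steps [1]-[6]; returns
    {itemset: extended support} for the utility-frequent itemsets."""
    items = sorted({i for t in db for i in t})
    # Steps [1]-[2]: L = 1, keep candidates with support >= minSup.
    level = [frozenset([i]) for i in items if support([i], db) >= min_sup]
    found = {}
    while level:  # Step [6]: stop once the new set is empty.
        # Step [3]: compute extended support, output utility-frequent itemsets.
        for s in level:
            ext = extended_support(s, db, min_util)
            if ext >= min_sup:
                found[s] = ext
        # Steps [4]-[5]: Apriori-style join of length-L frequent itemsets into
        # length-(L+1) candidates, pruned by subset and support checks.
        prev = set(level)
        size = len(level[0]) + 1
        joined = {x | y for x in level for y in level if len(x | y) == size}
        level = [
            c for c in joined
            if all(frozenset(sub) in prev for sub in combinations(c, size - 1))
            and support(c, db) >= min_sup
        ]
    return found


def total_utility(itemset, db):
    """Utility of the itemset summed over all transactions that contain it."""
    return sum(utility_in(itemset, t) for t in db if contains(itemset, t))


def four_way_split(db, min_total_util, min_sup, max_len=3):
    """Brute-force stand-in for FUM / exhaustive search: label every itemset
    (up to max_len items) that occurs at least once as HUHF, HULF, LUHF or LULF."""
    items = sorted({i for t in db for i in t})
    groups = {"HUHF": [], "HULF": [], "LUHF": [], "LULF": []}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            sup = support(combo, db)
            if sup == 0:
                continue  # the itemset never occurs in the database
            u = "HU" if total_utility(combo, db) >= min_total_util else "LU"
            f = "HF" if sup >= min_sup else "LF"
            groups[u + f].append(combo)
    return groups


if __name__ == "__main__":
    print(sorted("".join(sorted(s)) for s in fufm(DB, min_util=8, min_sup=0.4)))
    # prints ['abc', 'ac', 'bc', 'd']
    print(four_way_split(DB, min_total_util=20, min_sup=0.4))
```

With min_util = 8 and min_sup = 0.4, fufm reports {d}, {a, c}, {b, c} and {a, b, c}, each with extended support 0.4. Neither {b} nor {c} is utility-frequent on its own, yet {b, c} is, so extended support cannot be used to prune the search. Because extended support never exceeds ordinary support, every utility-frequent itemset is also frequent; this is why the sketch enumerates all frequent itemsets and only filters on extended support at step [3].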
6. RESULTS
Step 1: Entries in the UMining algorithm.
Step 2: The FUM-F algorithm is opened.
Step 3: Four different mining algorithms are opened:
1. HUHF - High Utility High Frequent Mining. To view customer details, press the Customer Detail button.
2. HULF - High Utility Low Frequent Mining. To view customer details, press the Customer Detail button.
3. LUHF - Low Utility High Frequent Mining. To view customer details, press the Customer Detail button.
4. LULF - Low Utility Low Frequent Mining. To view customer details, press the Customer Detail button.

7. CONCLUSION
The UMining and FUM algorithms mine all high utility itemsets. The FUFM and FUM-F algorithms use both statistical and utility measures. From the basic framework of these algorithms, four kinds of itemsets are generated: high utility high frequent, high utility low frequent, low utility high frequent and low utility low frequent. Customer Relationship Management (CRM) is then incorporated into the system by tracking the customers who are frequent buyers of each kind of itemset.

AUTHOR PROFILE
Nazeer.Shaik is pursuing an M.Tech in Computer Science Engineering at Vignan's LARA Institute Of Technology and Science, Vadlamudi, Guntur Dist., A.P., India. His research interests are Image Processing, Pattern Recognition and Data Mining. E-mail id: nazeer723@gmail.com.
N.L.Prasanna is an Asst.Prof in the Department of CSE, Vignan's LARA Institute Of Technology & Science, Vadlamudi, Guntur Dist., A.P., India. Her research interests are Data Mining, Data Warehousing and Image Processing. Email id: prasanna.manu@gmail.com.