AN EFFICIENT ALGORITHM (FUFM) FOR MINING FREQUENT ITEM SETS

International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 8, August 2013
ISSN 2319 - 4847
Nazeer Shaik¹, N.L. Prasanna²
¹ Pursuing M.Tech in CSE at Vignan's LARA Institute of Technology and Science, Vadlamudi, Guntur Dist., A.P., India.
² Asst. Prof., Department of CSE, Vignan's LARA Institute of Technology & Science, Vadlamudi, Guntur Dist., A.P., India.
ABSTRACT
As technology trends evolve, data mining is turning to more advanced problems. This paper addresses item set
mining. Frequent item sets are the item sets that occur often in a transactional database. Utility-based data mining is
a new research area that considers all types of utility factors in data mining processes and aims to incorporate utility
considerations into data mining tasks. An advanced topic in this field is fast utility mining, which aims to give accurate
results quickly. Frequent Utility Frequent Mining (FUFM) is the algorithm introduced here to retrieve item sets quickly
from a transactional database. The main aim of this paper is to retrieve the frequent utility item sets and to cluster
those item sets by keyword or by number assignment. The results are displayed without any loss of data.
Keywords: Frequent Utility Frequent Mining (FUFM), UMining, Knowledge Discovery in Databases (KDD), UP-Growth.
1. INTRODUCTION
Data mining and knowledge discovery from databases have received much attention in recent years. Data mining, the
extraction of hidden predictive information from large databases, is a powerful new technology with great potential to
help companies focus on the most important information in their data warehouses. Knowledge Discovery in Databases
(KDD) is the non-trivial process of identifying valid, previously unknown and potentially useful patterns in data. These
patterns are used to make predictions or classifications about new data, explain existing data, summarize the contents of a
large database to support decision making and provide graphical data visualization to aid humans in discovering deeper
patterns. The main aim of this paper is to identify and group the frequently used item sets in a transactional
database. When such a database is audited, identifying the items that are purchased or collected frequently and
clustering those frequent items yields a better mining process.
2. BACKGROUND WORK
KDD: The KDD process comprises a few steps leading from raw data to some form of new knowledge. The volume of
data contained in a database often exceeds the ability to analyze it efficiently, resulting in a gap between the collection of
data and its understanding.
A new concept is proposed for generating four kinds of itemsets: high utility and high frequent itemsets
(HUHF), high utility and low frequent itemsets (HULF), low utility and high frequent itemsets (LUHF), and low utility
and low frequent itemsets (LULF). These itemsets are generated using the basic framework of the FUM and FUFM
algorithms. Customer Relationship Management (CRM) is incorporated into the system by generating a list of customers
who are frequent buyers of these four kinds of itemsets.
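The CRM step can be illustrated with a minimal sketch. The paper does not specify how purchases are stored or what counts as a "frequent buyer", so the record layout (customer id paired with a basket), the function name `frequent_buyers`, and the `min_purchases` cut-off are all assumptions made for illustration only.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple


def frequent_buyers(
    purchases: Iterable[Tuple[str, FrozenSet[str]]],
    itemset: FrozenSet[str],
    min_purchases: int,
) -> List[str]:
    """Return customers who bought the whole itemset at least min_purchases times.

    purchases holds hypothetical (customer_id, basket) records; the
    min_purchases cut-off is an assumed definition of "frequent buyer".
    """
    # Count, per customer, the baskets that contain every item of the itemset.
    counts = Counter(
        customer for customer, basket in purchases if itemset.issubset(basket)
    )
    # Keep only the customers that reach the cut-off, in a stable order.
    return sorted(c for c, n in counts.items() if n >= min_purchases)
```

Running such a function once per itemset in each of the four categories would give one customer list per category, which is one plausible reading of the CRM feature described above.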
3. OVERVIEW OF EXISTING SYSTEM
1. Traditional association rule mining (ARM) is used to identify frequently occurring patterns of item sets.
2. The ARM model treats all items in the database equally, considering only whether an item is present in a
transaction or not.
3. However, frequency of occurrence may not express the semantics of an application, because the user's interest may
be related to other factors, such as cost, profit, or aesthetic value.
4. For example, a sales manager may not be interested in frequent item sets that do not generate significant profit. The
frequent item set mining approach may not satisfy a sales manager's goal. The support measure reflects the statistical
correlation of items, but it does not reflect their semantic significance. In other words, statistical correlation may not
measure how useful an item set is in accordance with a user's preferences (e.g., profit). The profit of an item set
depends not only on the support of the item set, but also on the prices of the items in that item set.
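The difference between support and utility can be made concrete with a small sketch. The transactions, item names and unit profits below are invented toy values, and utility is taken here to mean quantity times unit profit summed over the transactions that contain the item set, which is a common convention assumed for illustration rather than a definition given in this paper.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions: List[Dict[str, int]] = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "milk": 1},
    {"bread": 1, "milk": 2},
    {"tv": 1},
]
# Hypothetical unit profit per item, in arbitrary integer units.
profit: Dict[str, int] = {"bread": 5, "milk": 4, "tv": 1200}


def support(itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)


def utility(itemset: FrozenSet[str]) -> int:
    """Total profit earned by the itemset in the transactions containing it."""
    return sum(
        sum(t[item] * profit[item] for item in itemset)
        for t in transactions
        if itemset.issubset(t)
    )
```

With this data, {bread, milk} has support 0.75 but utility 36, while {tv} has support 0.25 but utility 1200: the frequent item set earns little and the rare one earns a lot, which is the situation the sales manager example describes.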
Frequent Utility Frequent Mining (FUFM) consists of four methods:
a. HUHF
b. HULF
c. LUHF
d. LULF
Frequent Utility-Frequent Mining (FUFM) finds all utility-frequent itemsets that satisfy the given utility and support
thresholds. Utility-frequent itemsets are a special form of high utility itemsets using Selective Item Replication.
The methods are maintained in two divisions:
1. High Utility High Frequency (HUHF), High Utility Low Frequency (HULF)
2. Low Utility High Frequency (LUHF), Low Utility Low Frequency (LULF)
a. HUHF: High utility and high frequency itemsets are generated by incorporating support into the FUM algorithm.
• In the first phase, the algorithm generates the high utility itemsets H.
• In the second phase, the support value is calculated for each itemset in H.
b. HULF: High utility and low frequent itemsets are generated using both the FUM and FUFM algorithms.
• In the first phase, the high utility itemsets (HU) are generated using the FUM algorithm.
• In the second phase, the high utility high frequent (HUHF) itemsets are generated using FUFM(HU). The HULF
itemsets are the itemsets in HU that are not in HUHF.
c. LUHF: Low utility and high frequent itemsets are generated following the basic framework of the FUFM algorithm.
d. LULF: Low utility and low frequent itemsets.
• In the first phase, the low utility itemsets (LU) are determined using exhaustive search.
• In the second phase, the low utility low frequent itemsets are generated from LU and LUHF using the set difference
function.
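The four categories can be sketched with plain set operations. This is an illustrative simplification, not the authors' implementation: it assumes the utility and support of every itemset have already been computed, treats both thresholds as inclusive, and uses the hypothetical names `partition_itemsets`, `stats`, `min_util` and `min_sup`.

```python
from typing import Dict, FrozenSet, Set, Tuple

Itemset = FrozenSet[str]


def partition_itemsets(
    stats: Dict[Itemset, Tuple[float, float]],
    min_util: float,
    min_sup: float,
) -> Dict[str, Set[Itemset]]:
    """Split itemsets into HUHF, HULF, LUHF and LULF.

    stats maps each itemset to a precomputed (utility, support) pair.
    """
    all_sets = set(stats)
    # High utility itemsets (HU) and their complement (LU).
    hu = {s for s, (util, _) in stats.items() if util >= min_util}
    lu = all_sets - hu
    # Itemsets that meet the support threshold.
    hf = {s for s, (_, sup) in stats.items() if sup >= min_sup}
    huhf = hu & hf
    luhf = lu & hf
    return {
        "HUHF": huhf,
        "HULF": hu - huhf,  # set difference of HU and HUHF
        "LUHF": luhf,
        "LULF": lu - luhf,  # set difference of LU and LUHF
    }
```

The two set differences mirror the second phase described for LULF, and the same operation applied to HU and HUHF gives HULF.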
4. ALGORITHM FUFM
Task: Discovery of Utility Frequent Itemsets
Input
Database DB
Constraints minUtil and minSup
Output
High Utility High Frequent itemsets (HUHF)
[1] L = 1
[2] Find the set of candidates of length L with support >= minSup
[3] Compute extended support for all candidates and output utility frequent itemsets
[4] L += 1
[5] Use the frequent itemset mining algorithm to obtain new set of frequent candidates of length L from the old set of
frequent candidates
[6] Stop if the new set is empty; otherwise go to step [3]
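The steps above can be sketched in Python as follows. Several details are assumptions, because the listing does not define them: extended support is taken to be the fraction of transactions that contain the itemset and in which the itemset alone yields a utility of at least minUtil, step [5] is implemented as an Apriori-style join with subset pruning, and the names `fufm_sketch`, `db` and `profit` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]
Transaction = Dict[str, int]  # item -> purchased quantity


def fufm_sketch(
    db: List[Transaction],
    profit: Dict[str, int],
    min_util: int,
    min_sup: float,
) -> Set[Itemset]:
    """Level-wise search for utility frequent itemsets (illustrative only)."""
    n = len(db)

    def support(s: Itemset) -> float:
        return sum(1 for t in db if s.issubset(t)) / n

    def extended_support(s: Itemset) -> float:
        # Assumed definition: count only the transactions in which the
        # itemset by itself earns a utility of at least min_util.
        hits = sum(
            1
            for t in db
            if s.issubset(t) and sum(t[i] * profit[i] for i in s) >= min_util
        )
        return hits / n

    # Steps [1]-[2]: frequent candidates of length 1.
    items = {i for t in db for i in t}
    frequent = {frozenset([i]) for i in items if support(frozenset([i])) >= min_sup}
    result: Set[Itemset] = set()
    length = 1
    while frequent:  # step [6]: stop when the new candidate set is empty
        # Step [3]: keep candidates whose extended support meets min_sup.
        result |= {s for s in frequent if extended_support(s) >= min_sup}
        length += 1  # step [4]
        # Step [5]: join frequent itemsets into candidates one item longer,
        # then prune those with an infrequent subset or too little support.
        joined = {a | b for a in frequent for b in frequent if len(a | b) == length}
        frequent = {
            c
            for c in joined
            if all(frozenset(sub) in frequent for sub in combinations(c, length - 1))
            and support(c) >= min_sup
        }
    return result
```

Candidates are pruned by ordinary support rather than extended support because, under the assumed definition, an itemset can reach the utility threshold in a transaction even when none of its subsets does.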
Algorithm Working Process
The steps above find the frequently occurring high utility itemsets. The outcome depends entirely on the chosen
threshold values, minUtil and minSup, against which the candidates are compared at every stage. The first step sets
the candidate length L to 1, and the candidates of that length whose support is greater than or equal to the minimum
support are retained. The next step computes the extended support of every candidate and outputs the utility frequent
itemsets, arranged in ascending order. The length is then incremented, and the frequent itemset mining algorithm
derives the frequent candidates of the new length L from the old set of frequent candidates. If the new set is empty,
the process stops and the results are recorded; otherwise the computation is repeated from the extended support step
until an empty set is obtained.
5. DATA FLOW DIAGRAM
The data flow diagram depicts the complete chain of calculating and displaying the frequent itemsets. Comparing each
candidate with the threshold values yields the frequent utility item sets as the result.
6. RESULTS
Step 1: Make the entries in the UMining algorithm.
Step 2: Open the FUM-F algorithm.
Step 3: Open the four different mining algorithms.
1. HUHF - High Utility High Frequent Mining
To view customer details, press the customer detail button.
2. HULF - High Utility Low Frequent Mining
To view customer details, press the customer detail button.
3. LUHF - Low Utility High Frequent Mining
To view customer details, press the customer detail button.
4. LULF - Low Utility Low Frequent Mining
To view customer details, press the customer detail button.
7. CONCLUSION
The UMining and FUM algorithms mine all high utility item sets, whereas the FUFM and FUM-F algorithms use both
statistical and utility measures. From the basic framework of these algorithms, four kinds of item sets are generated:
high utility high frequent, high utility low frequent, low utility high frequent and low utility low frequent.
Customer Relationship Management (CRM) is then incorporated into the system by tracking the customers who are
frequent buyers of each kind of item set.
AUTHOR PROFILE
Nazeer Shaik is pursuing an M.Tech in Computer Science Engineering at Vignan's LARA Institute of Technology
and Science, Vadlamudi, Guntur Dist., A.P., India. His research interests are Image Processing, Pattern
Recognition and Data Mining. E-mail: nazeer723@gmail.com.
N.L. Prasanna is an Asst. Prof. in the Department of CSE, Vignan's LARA Institute of Technology & Science, Vadlamudi,
Guntur Dist., A.P., India. Her research interests are Data Mining, Data Warehousing and Image Processing. E-mail: prasanna.manu@gmail.com.