International Journal of Engineering Trends and Technology (IJETT) – Volume 20 Number 4 – Feb 2015

Predicting Missing Items in a Shopping Cart using Apriori Algorithm

Nilesha Dalvi#1, Vinit Erangale#2, Amol Chavhan#3, Asst. Prof. Alka Srivastava#4
Department of Computer Engineering, Atharva College of Engineering, Marve Rd, Malad (West), Mumbai-95, Maharashtra, India

Abstract: In today's ever-growing market it is essential to keep track of customers' interests and to keep customers updated about trends and products in the market. In this project we aim to create a shopping portal with a database that records the itemsets that are frequently bought together, using the Apriori algorithm. This information will be used to display advertisements and offers on products that interest the customers, to promote new products related to their requirements, and mainly to suggest products that are often bought along with the products already present in their shopping cart. The frequently co-occurring groups of items are determined by frequent pattern mining in the databases. The major contribution here is expediting the mining of frequent itemsets by proposing a technique that uses the minimal data available in the shopping cart to predict what other items the customer may choose to buy [2].

Keywords: Apriori Algorithm, Association rule mining, Data mining

I. INTRODUCTION

A. Data Mining

Data mining is the process of discovering hidden and interesting patterns in massive amounts of data, where the data is stored in a data warehouse [4]. It analyses data from different perspectives and summarizes it into useful information [3], information that can be used to increase revenue, cut costs, or both. It is a powerful technology with great potential to help companies focus on the most important information in their data warehouses [1].
Data mining tools predict future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions. This task is computationally expensive, especially when a large number of patterns exist [2]. The large number of patterns mined by the various approaches makes it difficult for the user to identify the ones that are truly interesting. Data mining can be regarded as an algorithmic process that takes data as input and yields patterns, such as classification rules, itemsets, association rules, or summaries, as output. The data may exceed terabytes in size [4]. Data mining is also called knowledge discovery in databases (KDD) [4],[7], and it integrates techniques from many disciplines such as statistics, neural networks, database technology, machine learning and information retrieval [4],[9]. KDD techniques extract interesting patterns in reasonable time [4],[6]. The KDD process has several steps that are performed to extract patterns for the user: data cleaning, data selection, data transformation, data pre-processing, data mining and pattern evaluation [4],[8].

ISSN: 2231-5381

B. Association Rule Mining

Association rule mining (ARM) is a very important task within the area of data mining [2]. It is perhaps the most common form of local-pattern discovery in unsupervised learning systems [7]. Association rules are statements of the form {X1, X2, …, Xn} => Y, meaning that if all of X1, X2, …, Xn are found in the market basket, then we have a good chance of finding Y. The probability of finding Y given that X1, X2, …, Xn are present is called the confidence of the rule. Normally, only rules whose confidence is above a certain threshold are searched for. In many situations, association rules involve sets of items that appear frequently [1]. The technique is likely to be very practical in applications that use the similarity in customer buying behaviour to make peer recommendations [7].
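As a minimal illustration of the confidence measure described above, the following Python sketch computes the support and confidence of a rule X => Y over a handful of transactions. The basket data and the function names `support` and `confidence` are hypothetical and chosen only for illustration; they are not taken from the system described in this paper.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from the transactions."""
    both = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    # A rule whose antecedent never occurs has no evidence behind it.
    return both / base if base > 0 else 0.0


# Hypothetical market baskets (illustrative data only).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

rule_conf = confidence({"bread"}, {"butter"}, baskets)
print(rule_conf)
```

In this toy data, two of the three baskets that contain bread also contain butter, so the rule {bread} => {butter} has confidence 2/3. A rule would be kept only if this value exceeds the chosen confidence threshold.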
Association rule mining is intended to identify strong rules discovered in databases using different measures of interestingness [3].

II. IMPLEMENTATION

A. Apriori Algorithm

The Apriori algorithm (Agrawal et al. 1993) is simple and easy to execute, and is used to mine all frequent itemsets in a database [4]. The following definitions are needed to describe it [4]:

Definition 1: Suppose T = {T_1, T_2, …, T_m} (m ≥ 1) is a set of transactions, T_i = {I_1, I_2, …, I_n} (n ≥ 1) is the set of items, and k-itemset = {i_1, i_2, …, i_k} (k ≥ 1) is a set of k items, with k-itemset ⊆ I.

Definition 2: Suppose σ(itemset) is the support count of an itemset, i.e. the frequency of occurrence of the itemset in the transactions.

Definition 3: Suppose C_k is the candidate itemset of size k, and L_k is the frequent itemset of size k.

The key idea of the Apriori algorithm is to make multiple passes over the database. It employs an iterative, level-wise (breadth-first) search through the search space, where k-itemsets are used to explore (k+1)-itemsets [7]. It generates all frequent itemsets, i.e. itemsets whose support is greater than some user-specified minimum support (denoted L_k, where k is the size of the itemset) [7]. A candidate itemset is a potentially frequent itemset (denoted C_k, where k is the size of the itemset) [7].

The passes are as follows:

1. Pass 1
   1. Generate the candidate itemsets in C_1.
   2. Save the frequent itemsets in L_1.
2. Pass k
   (i). Generate the candidate itemsets in C_k from the frequent itemsets in L_{k-1}.
        Join L_{k-1} p with L_{k-1} q, as follows:
            insert into C_k
            select p.item_1, p.item_2, …, p.item_{k-1}, q.item_{k-1}
            from L_{k-1} p, L_{k-1} q
            where p.item_1 = q.item_1, …, p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1}
        • Generate all (k-1)-subsets from the candidate itemsets in C_k.
        • Prune all candidate itemsets from C_k where some (k-1)-subset of the candidate itemset is not in the frequent itemset L_{k-1}.
   (ii). Scan the transaction database to determine the support for each candidate itemset in C_k.
   (iii). Save the frequent itemsets in L_k [7].

B. Limitations

Despite being simple and clear, the Apriori algorithm has some weaknesses. The main limitation is the time wasted handling a huge number of candidate sets when there are many frequent itemsets, a low minimum support, or large itemsets [4]. For example, if there are 10^4 frequent 1-itemsets, the Apriori algorithm will need to generate more than 10^7 candidate 2-itemsets and accumulate and test their occurrence frequencies [7]. Furthermore, to detect a frequent pattern of size 100, e.g. {v_1, v_2, …, v_100}, it is required to generate 2^100 candidate itemsets, which makes candidate generation costly and time-consuming [7], no matter what implementation technique is applied [4]. In addition, the algorithm checks many candidate sets and scans the database repeatedly to count the candidate itemsets [4]. When the database stores a large amount of data, limited memory capacity and system I/O load mean that scanning the database takes a very long time, so efficiency is very low [7].

C. Improvements

The improved Apriori algorithm [4] first scans all transactions to generate L_1, which contains the items, their support counts and the IDs of the transactions in which the items are found. L_1 is then used as a reference to generate L_2, L_3, …, L_k. When C_2 is to be generated, the algorithm makes a self-join L_1 * L_1 to construct the 2-itemsets C(x, y), where x and y are the items of C_2. Before scanning all transaction records to count the support of each candidate, it uses L_1 to get the transaction IDs of whichever of x and y has the minimum support count, and thus scans for C_2 only in these specific transactions. The same applies to C_3: construct the 3-itemsets C(x, y, z), where x, y and z are the items of C_3, use L_1 to get the transaction IDs of the item with the minimum support count among x, y and z, and then scan for C_3 only in these specific transactions. These steps are repeated until no new frequent itemsets are identified [4], [9]. The process is illustrated in Fig. 1.

[Fig. 1 Steps for C_k generation [4]. Boxes, in order: Scan all transactions to generate the L_1 table; L_1 (items, their support, their transaction IDs); Construct C_k by self-join; Use L_1 to identify the target transactions for C_k; Scan the target transactions to generate C_k.]

The improvement of the algorithm can be described as follows [4]:

    // Generate items, their support, and their transaction IDs
    L_1 = find_frequent_1_itemsets(T);
    For (k = 2; L_{k-1} ≠ ∅; k++) {
        // Generate C_k from L_{k-1}
        C_k = candidates generated from L_{k-1};
        // Get the item I_w with minimum support in C_k using L_1 (1 ≤ w ≤ k)
        x = Get_item_min_sup(C_k, L_1);
        // Get the target transaction IDs that contain item x
        Tgt = get_Transaction_ID(x);
        For each transaction t in Tgt Do
            Increment the count of all items in C_k that are found in Tgt;
        L_k = items in C_k ≥ min_support;
        End;
    }

III. CONCLUSION

Thus, by using the Apriori algorithm with these improvements, we aim to achieve successful predictions of missing items in an online shopping cart.

REFERENCES

[1] Venkateswara, Sri. "Predicting Missing Items in Shopping Carts using Fast Algorithm." (2011).
[2] Nirmala, M., and V. Palanisamy. "An Enhanced Prediction Technique for Missing Itemset in."
[3] Kollipara Anuradha, K. Anand Kumar. "An E-Commerce application for Presuming Missing Items." International Journal of Computer Trends and Technology (IJCTT), V4(8):2636-2640, August Issue 2013. Published by Seventh Sense Research Group.
[4] Al-Maolegi, Mohammed, and Bassam Arkok. "An Improved Apriori Algorithm for Association Rules."
[5] Rao, Sanjeev, and Priyanka Gupta. "Implementing Improved Algorithm Over APRIORI Data Mining Association Rule Algorithm 1." (2012).
[6] S. Rao, R. Gupta. "Implementing Improved Algorithm Over APRIORI Data Mining Association Rule"
[7] H. H. O. Nasereddin. "Stream data mining." International Journal of Web Applications, vol. 1, no. 4, pp. 183–190, 2009.
[8] F. Crespo and R. Weber. "A methodology for dynamic data mining based on fuzzy clustering." Fuzzy Sets and Systems, vol. 150, no. 2, pp. 267–284, Mar. 2005.
[9] J. Han, M. Kamber. "Data Mining: Concepts and Techniques." Morgan Kaufmann Publishers, Book, 2000.