Class Based Rule Mining using Ant Colony Optimization

Bijaya Kumar Nanda¹ & Gyanesh Das²
¹ Department of ICT, F.M. University, Balasore, Odisha
² Department of ENTC, DRIEMS

ISSN (Print) : 2319 – 2526, Volume-2, Issue-1, 2013
Special Issue of International Journal on Advanced Computer Theory and Engineering (IJACTE)

Abstract – Ant colony optimization (ACO) can be applied in the data mining field to extract rule-based classifiers. In this paper we give an overview of two important data mining tasks, association rule mining (ARM) and classification, and then briefly describe a classifier that uses ant colony optimization to combine association rule mining with supervised classification.

I. INTRODUCTION

Classification rule discovery and association rule mining are two important data mining techniques. Association rule mining discovers all the rules in the training set that satisfy minimum support and confidence thresholds, while classification rule mining discovers a set of rules for predicting the class of unseen data.

In this paper we discuss a classification algorithm that combines the ideas of association rule mining and supervised classification using ACO. The approach is class based association rule mining, or associative classification, in which the consequent of an association rule is always a class label. The technique integrates classification with association rule mining to discover high quality rules and thereby improve the performance of the resulting classifier. ACO is used to mine only an appropriate subset of the class association rules instead of exhaustively searching for all possible rules; the mining process stops when the discovered rule set achieves a minimum coverage threshold. Strong association rules are discovered on the basis of confidence and support, and these rules are then used to classify unseen data. This integration yields more accurate and compact rules from the training set.

The rest of this paper is organized as follows. Section II presents the basic ideas of association rule mining and classification. Section III describes the integration of association rule mining and classification using ACO. Finally, Section IV concludes the paper.

II. ASSOCIATION RULES MINING AND ASSOCIATIVE CLASSIFICATION

There are several data mining techniques, including supervised classification, association rule mining (market basket analysis), unsupervised clustering, web data mining, and regression.

One data mining technique is classification. The goal of classification is to build a model of the training data that can correctly predict the class of unseen or test objects. The input to this model learning process is a set of objects together with their classes (supervised training data). Once a predictive model has been built, it can be used to predict the class of test objects whose class is not known. To measure the accuracy of the model, the available dataset is divided into a training set, used to build the model, and a test set, used to measure its accuracy. Problems from a wide range of domains can be cast as classification problems, so there is a continuing need for algorithms that build comprehensible and accurate classifiers.

Association rule mining (ARM) is another important data mining technique. It is used to find strong and interesting relationships among the data items present in a set. A typical example of ARM is market basket analysis, in which each record contains the list of items purchased by a customer and we want to find the sets of items that are frequently purchased together, that is, the interesting buying habits of customers. Sets of items that occur together can be written as association rules, expressed as "IF ... THEN" statements: the IF part is called the antecedent of the rule and the THEN part contains its consequent. In ARM both the antecedent and the consequent are sets of data items called itemsets; an itemset that contains k items is called a k-itemset. An association rule is written as A => B, where A and B are itemsets. ARM has many real-world applications, including market basket analysis, customer segmentation, electronic commerce, medicine, web mining, finance, and bioinformatics.

Class based rule mining poses several challenges. The first problem is that rule generation is based on a frequent itemset mining process, which for large databases takes a long time because of the large number of items and samples. Second, it generates many rules; redundant rules included in the classifier increase the time cost of classifying objects.

Different approaches have been developed for associative classification. The algorithm proposed by B. Liu et al. [2] has three main steps: rule discovery, rule selection, and classification. The rule discovery process mines from the training dataset all rules whose consequent is a class label; these rules are called class association rules. The rule selection process selects a subset of the discovered rules, on the basis of their predictive accuracy, to form the classifier; the confidence measure is used for selection, since higher-confidence rules usually give higher predictive accuracy. Finally, the classification process classifies unseen data samples: an unseen sample is assigned the class of the rule that has the highest confidence value among the rules that match the sample. The basic problem with this approach is that it mines all possible rules that satisfy the minimum support and confidence thresholds, a computation that is very expensive for large databases.
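To make the itemset terminology above concrete, the following sketch enumerates frequent k-itemsets from a toy market-basket dataset. The transactions, item names, and threshold are purely illustrative; this is not the authors' implementation, just a minimal brute-force example (a real miner would use Apriori-style pruning).

```python
from itertools import combinations

# Toy market-basket data: each transaction is the set of items a customer bought.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"bread", "butter"},
    {"beer", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def frequent_itemsets(transactions, k, min_support):
    """Enumerate all k-itemsets whose support meets the threshold."""
    items = sorted(set().union(*transactions))
    return {
        frozenset(c): support(frozenset(c), transactions)
        for c in combinations(items, k)
        if support(frozenset(c), transactions) >= min_support
    }

print(frequent_itemsets(transactions, 2, 0.4))
```

With min_support = 0.4, only the pairs {bread, milk} and {bread, butter} survive; each appears in 2 of the 5 transactions.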
In ARM two factors are used to measure the importance of a rule. The first is support: the ratio (or percentage) of the transactions in which an itemset appears to the total number of transactions. The second is confidence: the percentage of transactions that contain all the items in both the antecedent and the consequent, among the transactions that contain all the items in the antecedent. The aim of ARM is to find all rules whose support and confidence exceed the minimum support and confidence thresholds specified by the user. The support and confidence of a rule X => Y are calculated according to Equations (1.1) and (1.2):

Support(X => Y) = P(X ∪ Y)    (1.1)
Confidence(X => Y) = P(Y | X)    (1.2)

where P(X ∪ Y) is the probability that a transaction contains both X and Y, and P(Y | X) is the probability of Y given X. In other words, support is the probability that a randomly selected transaction holds all the items in the antecedent and the consequent, whereas confidence is the probability that a randomly selected transaction contains all the items in the consequent given that it contains all the items in the antecedent.

Another class based ARM algorithm, "classification based on multiple class association rules" (CMAR), was proposed by W. Li et al. [8]. It uses multiple rules to classify an unseen data sample: to classify a test sample, the algorithm collects a small set of high-confidence rules that match the sample and analyzes the correlation among these rules to assign the class label. It also stores the rules in a tree structure to make rule retrieval during classification more efficient. Like the previous approach, the algorithm generates all possible association rules.

Class based rule mining is a specific kind of ARM in which we are interested in finding class based association rules.
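The support and confidence measures of Equations (1.1) and (1.2) can be computed directly from a transaction list, as in the following sketch. The data and function names are illustrative assumptions, not part of the original algorithm.

```python
def rule_support(antecedent, consequent, transactions):
    # Support(X => Y) = P(X ∪ Y): fraction of transactions containing all
    # items of both the antecedent and the consequent (Equation 1.1).
    both = antecedent | consequent
    return sum(1 for t in transactions if both <= t) / len(transactions)

def rule_confidence(antecedent, consequent, transactions):
    # Confidence(X => Y) = P(Y | X): among transactions containing the
    # antecedent, the fraction that also contain the consequent (Equation 1.2).
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        return 0.0
    return sum(1 for t in covered if consequent <= t) / len(covered)

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"bread", "butter"},
]
print(rule_support({"bread"}, {"milk"}, transactions))     # 2/4 = 0.5
print(rule_confidence({"bread"}, {"milk"}, transactions))  # 2/4 = 0.5
```

Here the rule {bread} => {milk} holds in 2 of the 4 transactions (support 0.5), and 2 of the 4 bread-containing transactions also contain milk (confidence 0.5).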
A class based association rule is a rule whose consequent is always a class label. This takes advantage of ARM for finding interesting relationships among the items in the dataset, with the support and confidence measures used to identify important rules: we are interested in those association rules that satisfy the minimum support and confidence thresholds specified by the user. The basic problem is that of mining association rules from large amounts of data. The dataset used to build the class association rules contains a set of transactions described by a set of attributes, and each transaction belongs to a predetermined class. A class association rule is represented as X => C, where X is a list of items and C is the class label.

The class based rule mining algorithm described here uses ACO to find interesting relationships among data items. It uses its evolutionary capability to efficiently find the more interesting subsets of association rules; it does not exhaustively search for all possible association rules as conventional ARM approaches do. In each generation of the algorithm a number of rules that satisfy the minimum support and confidence thresholds are selected for the final classifier. After each generation the pheromone values are updated in such a way that better rules can be extracted in subsequent generations. The final discovered rule set is the predictive model and is used to classify unseen test samples.

A general association rule mining approach can predict any attribute, not just the class attribute, and can predict the values of more than one attribute. Another difference is that class based association rules are normally used together, as a set, to classify unseen test cases. A further factor used alongside the support and confidence measures is coverage: the percentage of the dataset that is (correctly) covered by a set of rules. Coverage is also specified by the user.

The ACO-based algorithm shown below discovers unordered rule lists and has a different rule construction process, a different pheromone update formula, and a different way of classifying unseen test cases.

III. OVERVIEW OF THE ALGORITHM

In this section we describe the steps of the ACO-based approach in detail.

3.1 General Description

The approach finds a set of association rules from a training set to form a classifier. It does not mine all possible association rules but only a subset of them; conventional association rule mining algorithms mine all possible rules, which is computationally expensive for large databases. The rules are selected on the basis of support and confidence. Each rule has the form:

IF (item1 AND item2 AND ...) THEN class

Each item is an attribute-value pair. An example of an item is "weather = cold": the attribute's name is "weather" and "cold" is one of its possible values. The consequent of each association rule is a class label from the set of classes present in the training dataset. We use only the "=" operator, as our algorithm deals only with categorical attributes.

The search for the rules is ACO based. The search space is defined in the form of a graph, where each node of the graph represents a possible value of an attribute. Rules are discovered for each class separately. A temporary set of rules is discovered during each generation of the algorithm and inserted into the set of rules reserved for the selected class label. This process continues until the coverage of the rule set of the selected class is greater than or equal to the minimum coverage threshold specified by the user. When the rule set of the selected class has enough rules to satisfy the minimum coverage threshold, rules are generated for another class. The algorithm stops when the rules of all classes have been generated. The final classifier contains the rules of all classes. The algorithm is outlined below:

1  Discovered_RuleList = {}; /* initialize the rule list with empty set */
2  TrainingSet = {all training samples};
3  Initialize min_support, min_confidence, min_coverage;
4  Initialize no_ants; /* initialize the maximum number of ants */
5  FOR EACH CLASS C IN THE TRAINING SET
6    Rule_Set_Class = {}; /* initialize the rule set of the selected class with empty set */
7    Initialize pheromone values of all trails;
8    Initialize the heuristic values;
9    Calculate the support of all 1-itemsets (item => C) of the training set;
10   IF (support(item) < min_support)
11     Set the pheromone value of all those items to 0;
12   END IF
13   g = 1; /* generation count */
14   WHILE (g != no_attributes && coverage < min_coverage)
15     Temp_Rule_Set_Class = {};
16     t = 1; /* counter for ants */
17     DO
18       Ant t constructs a class based association rule with a maximum of g items in the rule;
19       t = t + 1;
20     WHILE (t <= no_ants);
21     FOR EACH RULE CONSTRUCTED BY THE ANTS
22       IF (support(Rule) >= min_support AND confidence(Rule) >= min_confidence)
23         Insert the rule in Temp_Rule_Set_Class;
24       END IF
25     END FOR
26     Sort all the rules in Temp_Rule_Set_Class according to confidence and then support;
27     Insert the rules one by one from Temp_Rule_Set_Class into Rule_Set_Class until coverage of Rule_Set_Class is greater than or equal to min_coverage;
28     Update pheromones;
29     g = g + 1; /* increment generation count */
30   END WHILE
31   Insert Rule_Set_Class in Discovered_RuleList;
32 END FOR
33 Prune the discovered rule set;
34 Output: Final classifier;
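The coverage measure used as the stopping criterion of the main loop can be sketched as follows. The rule representation (an antecedent as a set of (attribute, value) items plus a class label) and all sample data are our own illustrative assumptions.

```python
# Hedged sketch: a rule is (antecedent, class), where the antecedent is a set
# of (attribute, value) items; a sample is (attribute->value dict, class label).
def matches(antecedent, sample):
    """True if every item of the antecedent holds in the sample."""
    return all(sample.get(attr) == val for attr, val in antecedent)

def coverage(rule_set, samples):
    """Fraction of samples correctly covered by at least one rule."""
    covered = sum(
        1 for sample, label in samples
        if any(matches(ant, sample) and cls == label for ant, cls in rule_set)
    )
    return covered / len(samples)

samples = [
    ({"weather": "cold", "wind": "high"}, "stay_in"),
    ({"weather": "cold", "wind": "low"}, "stay_in"),
    ({"weather": "warm", "wind": "low"}, "go_out"),
]
rules = [({("weather", "cold")}, "stay_in")]
min_coverage = 0.9
print(coverage(rules, samples))  # 2/3 ≈ 0.67, below the threshold: keep mining
```

The single rule correctly covers two of the three samples, so with min_coverage = 0.9 the WHILE loop would continue into the next generation.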
At the start of the algorithm the discovered rule set is empty, and the user defined parameters are initialized: minimum support, minimum confidence, minimum coverage, and the number of ants used by the algorithm. As the association rules of each class are mined separately, the first step is to select a class from the set of remaining classes. The pheromone values and heuristic values on the links between items (attribute-value pairs) are initialized. The pheromone values on the incoming links of all items that do not satisfy the minimum support threshold are set to zero, so that ants are not able to choose those items. The generation count g is set to 1.

The generation count controls the maximum number of items an ant can add to the antecedent part of the rule it is constructing. For example, when g = 2 an ant can add at most two items to its rule antecedent. This means that in the first generation we mine one-item association rules only. In the second generation we try to build two-item rules, although in some cases we may not reach two items because the support of all candidate items is below the minimum threshold. The third, fourth, and subsequent generations behave similarly. The maximum value of the generation count is the number of attributes in the dataset, excluding the class attribute.

3.4 Selection of an Item

An ant incrementally adds items to the antecedent part of the rule it is constructing. Once an item (i.e. an attribute-value pair) has been included in the rule, no other value of that attribute can be considered. The probability of selecting an item for the current partial rule is given by Equation (1.4):

Pij(g) = τij(g) · ηij(c) / ( Σ(i=1..a) xi · Σ(j=1..bi) τij(g) · ηij(c) )    (1.4)

where τij(g) is the amount of pheromone associated with the link between itemi and itemj in the current generation, and ηij(c) is the value of the heuristic function on the link between itemi and itemj for the currently selected class.
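The selection rule of Equation (1.4), together with the correlation-based heuristic of Section 3.5, might be sketched as below. The data layout (samples as attribute-value dicts, items as (attribute, value) pairs) and all names are our own illustrative assumptions, not the authors' implementation.

```python
import random

def has(sample, item):
    attr, val = item
    return sample.get(attr) == val

def heuristic(item_i, item_j, cls, samples):
    """Correlation-based heuristic (Section 3.5):
    (|item_i, item_j, class_k| / |item_i, class_k|) * (|item_j, class_k| / |item_j|),
    with counts taken over the (still uncovered) training samples."""
    ij_k = sum(1 for s, c in samples if has(s, item_i) and has(s, item_j) and c == cls)
    i_k  = sum(1 for s, c in samples if has(s, item_i) and c == cls)
    j_k  = sum(1 for s, c in samples if has(s, item_j) and c == cls)
    j    = sum(1 for s, c in samples if has(s, item_j))
    if i_k == 0 or j == 0:
        return 0.0  # the items never occur together for this class
    return (ij_k / i_k) * (j_k / j)

def select_item(last_item, candidates, cls, samples, pheromone, used_attrs,
                rng=random):
    """Roulette-wheel choice per Equation (1.4): the probability of an item is
    proportional to pheromone * heuristic, over attributes not yet used."""
    allowed = [it for it in candidates if it[0] not in used_attrs]
    weights = [pheromone[it] * heuristic(last_item, it, cls, samples)
               for it in allowed]
    total = sum(weights)
    if total == 0:
        return None  # nothing selectable (e.g. all pheromones zeroed)
    r = rng.uniform(0, total)
    acc = 0.0
    for item, w in zip(allowed, weights):
        acc += w
        if r <= acc:
            return item
    return allowed[-1]

samples = [
    ({"weather": "cold", "wind": "high"}, "stay_in"),
    ({"weather": "cold", "wind": "high"}, "stay_in"),
    ({"weather": "cold", "wind": "low"}, "go_out"),
    ({"weather": "warm", "wind": "low"}, "go_out"),
]
candidates = [("wind", "high"), ("wind", "low")]
pheromone = {("wind", "high"): 0.5, ("wind", "low"): 0.5}
print(select_item(("weather", "cold"), candidates, "stay_in",
                  samples, pheromone, used_attrs={"weather"}))
```

In this toy case "wind = low" never co-occurs with the class "stay_in", so its heuristic (and hence its selection probability) is zero and the ant always picks "wind = high".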
In Equation (1.4), a is the total number of attributes in the training dataset; xi is a binary variable that is set to 1 if attribute Ai has not yet been used by the current ant and to 0 otherwise; and bi is the number of possible values in the domain of attribute Ai. The denominator normalizes the τij(g) ηij(c) value of each possible choice by the sum of the τij(g) ηij(c) values of all possible choices. Items with higher pheromone and heuristic values are therefore more likely to be selected.

3.2 Rule Construction

Each ant constructs a single-item rule in the first generation. In the second generation each ant tries to construct a rule with two items, in the third generation with three items, and so on. Rules with at most k items are generated in the kth generation, where k is the number of attributes in the training set excluding the class attribute.

3.3 Pheromone Initialization

The pheromone values on all edges are initialized before the start of the WHILE loop for each new class. The edges between all items are initialized with the same amount of pheromone:

τij(t=1) = 1 / ( Σ(i=1..a) bi )    (1.3)

where a is the total number of attributes in the training set excluding the class attribute and bi is the number of possible values in the domain of attribute Ai. The pheromone values of all items that do not satisfy the minimum support threshold are then set to zero; this value of zero ensures that those items cannot be selected by the ants during the rule construction process.

The pheromone values are subsequently updated after every generation (Section 3.8). In that update, τij(g) is the pheromone value between itemi and itemj in the current generation, ρ represents the pheromone evaporation rate, and Q is the quality of the rule constructed by an ant. The pheromones of the updated rules are increased so that in the next generation ants can explore more of the search space instead of searching around the rules already inserted in the discovered rule set; this pheromone strategy increases the diversity of the search by focusing it on new, unexplored areas. The pheromone update on the items occurring in rules that are rejected due to low confidence but have sufficient support is done in two steps: first a percentage of the pheromone value is evaporated, and then a percentage of pheromone (depending on the quality of the rule) is added. If the rule is good, the items of that rule become more attractive in the next generation and more likely to be chosen by ants. Pheromones are evaporated to encourage exploration and to avoid early convergence. The pheromone values of the other rules are updated by normalization: each pheromone value is divided by the sum of the pheromone values of its competing items. If the quality of a rule is good, so that there is a pheromone increase on the items used in the rule, the competing items become less attractive in the next generation due to the normalization; the reverse is true if the quality of the rule is not good.

3.5 Heuristic Function

The heuristic value of an item indicates the quality or attractiveness of that item and is used to guide the process of item selection. We use a correlation-based heuristic function that calculates the correlation of a candidate item with the last item (attribute-value pair) chosen by the current ant:

ηij = ( |itemi, itemj, classk| / |itemi, classk| ) · ( |itemj, classk| / |itemj| )    (1.5)

The most recently chosen item is itemi and itemj is the item being considered for addition to the rule. The component |itemi, itemj, classk| is the number of uncovered training samples having itemi and itemj with class label k, for which the ants are constructing rules; it is divided by |itemi, classk|, the number of uncovered training samples that have itemi with classk, to obtain the correlation between itemi and itemj. The other component of the heuristic function indicates the overall importance of itemj in determining classk: the factor |itemj, classk| is the number of uncovered training samples having itemj with classk, and it is divided by the factor |itemj|, the number of uncovered training samples having itemj.

The heuristic function thus considers the relationship between the items to be combined in the rule as well as the overall distribution of the item to be added. As rules are built for a specific class label, our heuristic function depends on the class chosen by the ant. It reduces the irrelevant search space during the rule construction process in order to better guide the ant in choosing the next item of its rule antecedent: it assigns a zero value to combinations of items that do not occur together for a given class, thus efficiently restricting the search space for the ants. It can therefore be very useful for high-dimensional search spaces.

3.6 Rule Construction Stoppage

An ant continues to add items to its rule in every generation; for example, if the generation counter is three, it can add at most three items to the rule antecedent it is constructing. The rule construction process stops in two cases: when the value of the generation counter equals the total number of attributes present in the dataset (excluding the class attribute), or when, in some generation, the coverage of the rule set of the particular class reaches the minimum coverage threshold.

3.9 Rule Selection Process

After all the ants have constructed their rules during a generation, these rules are placed in a temporary set. The rules are checked against the minimum support and confidence criteria, and those that do not fulfill them are removed. The next step is to insert the remaining rules into the rule set reserved for the discovered rules of the current class. A rule is moved from the temporary rule set to the rule set of the current class only if it is found to enhance the quality of the latter set.
For this purpose the top rule of the temporary rule set, called R1, is removed. R1 is compared, one by one, with all the rules already present in the discovered rule set of the selected class. The comparison continues until a rule from the discovered rule set satisfies a criterion described below, or until there are no rules left in the discovered rule set with which R1 can be compared. In the latter case, when no rule in the discovered rule set fulfills the criterion, R1 is inserted into the discovered rule set. If a rule in the discovered rule set fulfills the criterion, R1 is rejected and further comparison of R1 is stopped. The criterion is as follows. Let the compared rule of the discovered rule set be called R2. If R2 is more general than R1 and the confidence of R2 is higher than or equal to that of R1, then R2 satisfies the criterion and the inclusion of R1 is rejected. The criterion is also satisfied if R2 is exactly the same as R1.

3.7 Quality of a Rule

The quality of a rule is calculated on the basis of its confidence:

Q = TP / Covered    (1.6)

Here Covered is the number of training samples that match the antecedent part of the rule, and TP is the number of training samples that match the antecedent of the rule and whose consequent is also the same as the consequent of the rule. If the confidence value is high, the rule is considered more accurate. This value is also used for updating the pheromone values.

3.8 Pheromone Update

The pheromone values are updated after each generation so that in the next generation the ants can make use of this information in their search.
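The quality measure Q = TP/Covered and the evaporation-plus-deposit update τ ← (1−ρ)τ + (1 − 1/(1+Q))τ used by the algorithm can be sketched as follows. The rule representation, sample data, and ρ value are our own illustrative assumptions.

```python
def rule_quality(rule, samples):
    """Q = TP / Covered: the confidence of the rule on the training samples."""
    antecedent, cls = rule
    covered = [(s, c) for s, c in samples
               if all(s.get(a) == v for a, v in antecedent)]
    if not covered:
        return 0.0
    tp = sum(1 for s, c in covered if c == cls)
    return tp / len(covered)

def update_pheromone(tau, q, rho=0.1):
    """Evaporate a fraction rho, then deposit in proportion to rule quality."""
    return (1 - rho) * tau + (1 - 1 / (1 + q)) * tau

samples = [
    ({"weather": "cold"}, "stay_in"),
    ({"weather": "cold"}, "stay_in"),
    ({"weather": "cold"}, "go_out"),
    ({"weather": "warm"}, "go_out"),
]
rule = ({("weather", "cold")}, "stay_in")
q = rule_quality(rule, samples)   # 2 of 3 covered samples are correct: Q = 2/3
print(update_pheromone(1.0, q))   # pheromone rises when Q/(1+Q) > rho
```

Note that the deposit term 1 − 1/(1+Q) equals Q/(1+Q), so a high-quality rule adds up to half of the current pheromone back, while a zero-quality rule only evaporates.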
The amount of pheromone on the links between items occurring in rules that satisfy the minimum support threshold but whose confidence is below the minimum required confidence (and which were therefore removed from the temporary rule set) is updated according to Equation (1.7):

τij(g+1) = (1 − ρ) · τij(g) + (1 − 1/(1+Q)) · τij(g)    (1.7)

where ρ is the pheromone evaporation rate and Q is the quality of the rule constructed by the ant.

The logic of the rule selection criterion of Section 3.9 is that, since R2 is already in the rule set, any data sample that matches R1 also matches R2; and since we assign the class label of the highest-confidence rule, the data sample will always be classified by R2, so R1 would not increase the coverage of the rule set.

3.10 Discovered Rule Set

When the coverage of the discovered rule set of the selected class reaches the coverage threshold, we stop the rule discovery process for that class. This process is repeated for all classes. The final discovered rule set (or list) contains the discovered rules of all classes. A new test case, unseen during training, is assigned the class label of the rule that covers the test sample and has the highest confidence among the rules covering it. This is implemented by keeping the rules in sorted order (from highest to lowest) on the basis of their confidence. For a test case the rules are checked one by one in the order of their sorting, and the first rule whose antecedent matches the new test sample is fired; the class predicted by that rule's consequent is assigned to the sample. If none of the discovered rules fires, the sample is assigned the majority class of the training set, which is the default class of the classifier.

IV. CONCLUSION

In this paper class based rule mining using ACO was discussed, which combines two primary data mining paradigms: classification and association rule mining. It is a supervised learning approach for discovering association rules. ACO is used to find the most suitable set of association rules: it searches only a subset of the association rules to form an accurate classifier, instead of massively searching all possible association rules in the dataset. The set of discovered rules is evaluated after each generation of the algorithm, and better rules are generated in subsequent generations by adjusting the pheromone values.

V. REFERENCES

[1] G. Chen, H. Liu, L. Yu, Q. Wei, and X. Zhang, "A new approach to classification based on association rule mining," Decision Support Systems, Vol. 42, No. 2, pp. 674-689, 2006.
[2] B. Liu, W. Hsu, and Y. Ma, "Integrating classification and association rule mining," in Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, pp. 80–86, 1998.
[3] W. Li, "Classification based on multiple association rules," MSc Thesis, Simon Fraser University, April 2001.
[4] R.S. Parpinelli, H.S. Lopes, and A.A. Freitas, "Data mining with an ant colony optimization algorithm," IEEE Transactions on Evolutionary Computation, Vol. 6, No. 4, pp. 321–332, Aug. 2002.
[5] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd ed., Morgan Kaufmann Publishers, 2006.
[6] M. Dorigo and T. Stützle, Ant Colony Optimization, Cambridge, MA: MIT Press, 2004.
[7] A. Freitas, "A survey of evolutionary algorithms for data mining and knowledge discovery," in A. Ghosh and S. Tsutsui (Eds.), Advances in Evolutionary Computation, Springer-Verlag, pp. 151-160, 2001.
[8] W. Li, J. Han, and J. Pei, "CMAR: Accurate and efficient classification based on multiple class-association rules," in Proceedings of the IEEE International Conference on Data Mining (ICDM '01), pp. 369–376, 2001.
[9] M. Dorigo, G. Di Caro, and L.M. Gambardella, "Ant algorithms for discrete optimization."