International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 3, March 2013 ISSN 2319 - 4847

Attribute Grouping Based Classification for Knowledge Discovery in Databases

Nikhat Khan1, Dr. Fozia H. Khan2 and Dr. G.S. Thakur3
1,2,3 Maulana Azad National Institute of Technology, Bhopal

ABSTRACT
The huge number of marketing campaigns has diminished their effect on the general public. Technologies such as data warehousing, data mining and campaign management software have created new opportunities through which firms can gain competitive advantage. Through data mining, firms can identify valuable customers, which in turn enables them to make practical, knowledge-driven decisions. Many different approaches abound in the field of data mining. In this paper we propose grouping the data as part of pre-processing, before the classification methods are applied. We test this approach on several classifiers to evaluate its performance. Classification performance and interpretability are of major importance, and the intention is to keep the resulting rule bases small and comprehensible. A primary model is obtained from the data, and attribute grouping is subsequently applied to improve performance and reduce the cost of the model.

Keywords: Attribute Grouping, Classification, Data Mining, Neural Networks

1. INTRODUCTION
Direct marketing campaigns conducted over a period of time produce data and information which managers analyze to support decision making. Analysis becomes difficult because the volume of data is very large, and the resolution of this problem led to the development of business intelligence. The global economic crisis paved the way for new ideas about financial management and new points of view from different people to support management decisions. This paper starts with a brief review of some background on the concepts of business intelligence, followed by the methodological approach, after which the data availability is described. The classification techniques are then applied and evaluated, the knowledge discovered is studied and analyzed in the results section, and finally we draw conclusions.

Organizations and enterprises try to promote services and/or products through mass campaigns, or by targeting specific sets of contacts [1]. In the present scenario, due to massive competition, positive responses to mass campaigns are comparatively very low. Alternatively, focusing on specific targets, and thus on specific marketing campaigns, is becoming more popular due to its efficiency. In the study by Page and Luding it is observed that directed marketing has some drawbacks, as it may give rise to a negative attitude towards banks when customers feel that it intrudes on their privacy [2].

Data mining is the process in which huge amounts of data are collected and relevant information is derived from them. Frawley, Piatetsky-Shapiro, and Matheus (1992) have described data mining as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" [3]. Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential for helping companies focus on the valuable information in their data warehouses. Data mining tools are used to predict future trends, allowing businesses to make knowledge-driven decisions.
Data mining tools can quickly resolve business queries that were previously too time-consuming to answer.

2. METHODOLOGY
In our experiments we decided to use various classification methods such as decision trees and neural networks. The source data relates to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls made in order to determine whether a bank term deposit would be subscribed or not. The data was obtained from: http://archive.ics.uci.edu/ml/datasets/Bank+Marketing [4] [5].

Picture 1 describes the standard process for knowledge discovery in databases (data mining). It begins with selection, followed by pre-processing of the data, continued by data transformation, then the data mining task itself, and ends with evaluation and application of the system.

Picture 1: Knowledge Discovery in Databases

Source Data: The data set contains the following attributes:
1 - Age
2 - Job: type of job
3 - Marital: marital status
4 - Education
5 - Default: has credit in default?
6 - Balance: average yearly balance, in Euros
7 - Housing: has housing loan?
8 - Loan: has personal loan?
9 - Contact: contact communication type
10 - Day: last contact day of the month
11 - Month: last contact month of year
12 - Duration: last contact duration, in seconds
13 - Campaign: number of contacts performed during this campaign and for this client
14 - Pdays: number of days that passed after the client was last contacted from a previous campaign
15 - Previous: number of contacts performed before this campaign and for this client
16 - Poutcome: outcome of the previous marketing campaign
17 - Subscription

[1] Decision Trees
Decision trees classify instances by sorting them based on feature values. Each node of a decision tree represents a feature of the instance to be classified [6], and each branch represents a value that the node can assume. Instances are classified beginning at the root node and are then sorted based on their feature values. The major approaches available are breadth-first and depth-first greedy searches. There are different types of decision trees; those whose performance we measured in our experiments are briefly described here.

AD Tree: An alternating decision tree is one which has decision nodes and prediction nodes [7]. Decision nodes describe a predicate condition, while prediction nodes contain a single number. AD Trees necessarily have prediction nodes as both root and leaves. An instance is classified by an AD Tree by following all paths for which all decision nodes are true and summing any prediction nodes which are traversed.
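As an illustration of this scoring scheme, the following minimal sketch (not the authors' implementation; the node structure, predicates and weights are invented purely for the example) sums the prediction values along every path an instance follows through an alternating decision tree:

# Minimal sketch of alternating-decision-tree scoring (illustrative only).

class PredictionNode:
    def __init__(self, value, splitters=()):
        self.value = value            # real-valued contribution added when this node is reached
        self.splitters = splitters    # zero or more decision (splitter) nodes hanging below it

class SplitterNode:
    def __init__(self, condition, if_true, if_false):
        self.condition = condition    # predicate over the instance
        self.if_true = if_true        # prediction node followed when the condition holds
        self.if_false = if_false      # prediction node followed otherwise

def ad_tree_score(node, instance):
    """Sum the prediction values along every path the instance follows."""
    total = node.value
    for split in node.splitters:
        child = split.if_true if split.condition(instance) else split.if_false
        total += ad_tree_score(child, instance)
    return total

# Tiny illustrative tree; the weights are invented, not taken from the paper's model.
root = PredictionNode(-0.1, splitters=[
    SplitterNode(lambda x: x["duration"] > 180,
                 PredictionNode(+0.6), PredictionNode(-0.4)),
    SplitterNode(lambda x: x["balance"] > 1500,
                 PredictionNode(+0.3), PredictionNode(-0.1)),
])

print(ad_tree_score(root, {"duration": 250, "balance": 900}))  # positive sum => predicted "yes"

The sign of the accumulated total serves as the class label, while its magnitude can be read as a measure of confidence.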
[2] Neural Networks
A neural network is also one of the well-known representatives of learning algorithms, and it is categorized into two basic classes: unsupervised and supervised networks. Unsupervised networks are generally represented by competitive layers, which automatically find relationships within the input vectors [8]. Since the processed data is labelled and we already have prior knowledge of the output classes, we conducted our experiments on supervised networks, which provide better performance in such cases [9]. There are many types of supervised networks; the most common is still the Multilayer Perceptron (MLP) network with the back-propagation learning algorithm.

[3] Attribute-Grouping Approach
In this classification approach the data is grouped by converting the actual numeric values into linguistic terms, and the resulting data is then subjected to alternating decision tree classification. This will reduce the dimensionality of the data. Although the boundaries of the discretized intervals are determined by the IF-THEN conditions below so as to minimize the uncertainty of the target attribute, the user may be more interested in the linguistic descriptions of these intervals than in their precise numeric boundaries. Thus, we start by expressing "linguistic ranges" of the continuous attributes as lists of terms that the attributes can take ("high", "low", etc.). In our approach the attribute values are converted into range values, or groups, and then the various classification techniques are applied. The data set is divided into a number of groups based on these ranges. The following are the IF-condition formulas used to group the data (a code sketch of the same grouping is given after the list):

1. =IF(F2>1500, "Very High", IF(F2>1000, "High", IF(F2>500, "Medium", IF(F2>100, "Low", "Very Low"))))
   a. This formula is used to group the balance attribute.
   b. If balance is above 1500, then the result is very high balance.
   c. If balance is between 1000 and 1500, then high balance.
   d. If balance is between 500 and 1000, then medium balance.
   e. If balance is between 100 and 500, then low balance.
   f. If balance is below 100, then very low balance.
2. =IF(J2<15, "Early", "Late")
   a. If contact is made in the first half of the month (day below 15), then Early, otherwise Late.
3. =IF(60<A2, "Senior Citizen", IF(25<A2, "Middle Aged", "Youth"))
   a. If age is above 60, then the result is senior citizen.
   b. If age is between 25 and 60, then middle-aged.
   c. If age is 25 or below, then youth.
4. =IF(L2>180, IF(360>L2, "Medium", "Long"), "Short")
   a. Classifies the call duration as Long (above 360 seconds), Medium (181 to 360 seconds) or Short (180 seconds or less).
5. =IF(N2>0, IF(N2>720, "2 years", IF(N2>360, "1 year", "Short Time")), "Not contacted")
   a. Classifies the time since the client was last contacted as roughly 2 years, 1 year, a short time, or not contacted (the value has been rounded to 10).
6. =IF(O2=0, "Not Contacted", IF(O2>20, "Often", "Average"))
   a. Classifies the number of times the client was contacted in previous campaigns; the classes are Not Contacted, Often and Average.
7. =IF(M2=1, "Once", IF(M2>10, "Very Frequent", IF(M2>5, "Often", "Average")))
   a. Same as above, but for the current campaign; the classes are Once, Very Frequent, Often and Average.
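The same grouping can be expressed in code. The sketch below is an illustration only, assuming the semicolon-separated bank-full.csv file from the UCI repository and its standard column names (age, balance, day, duration, campaign, pdays, previous); it mirrors the spreadsheet formulas above.

import pandas as pd

# Assumption: the UCI Bank Marketing data in semicolon-separated form (bank-full.csv).
df = pd.read_csv("bank-full.csv", sep=";")

def group_balance(b):
    # Formula 1: very high / high / medium / low / very low yearly balance.
    if b > 1500: return "Very High"
    if b > 1000: return "High"
    if b > 500:  return "Medium"
    if b > 100:  return "Low"
    return "Very Low"

def group_age(a):
    # Formula 3: senior citizen / middle aged / youth.
    if a > 60: return "Senior Citizen"
    if a > 25: return "Middle Aged"
    return "Youth"

def group_duration(d):
    # Formula 4: short (<=180 s), medium (181-360 s), long (>360 s) calls.
    if d > 180:
        return "Medium" if d < 360 else "Long"
    return "Short"

def group_pdays(p):
    # Formula 5: days since last contact in a previous campaign.
    if p > 0:
        if p > 720: return "2 years"
        if p > 360: return "1 year"
        return "Short Time"
    return "Not contacted"

def group_previous(n):
    # Formula 6: contacts made in previous campaigns.
    if n == 0: return "Not Contacted"
    return "Often" if n > 20 else "Average"

def group_campaign(c):
    # Formula 7: contacts made during the current campaign.
    if c == 1: return "Once"
    if c > 10: return "Very Frequent"
    return "Often" if c > 5 else "Average"

grouped = df.copy()
grouped["balance"]  = df["balance"].apply(group_balance)
grouped["day"]      = df["day"].apply(lambda d: "Early" if d < 15 else "Late")  # formula 2
grouped["age"]      = df["age"].apply(group_age)
grouped["duration"] = df["duration"].apply(group_duration)
grouped["pdays"]    = df["pdays"].apply(group_pdays)
grouped["previous"] = df["previous"].apply(group_previous)
grouped["campaign"] = df["campaign"].apply(group_campaign)

grouped.to_csv("bank-grouped.csv", index=False)  # grouped data set used for classification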
3. RESULT
We picked some algorithms to use as a testing base on the grouped data and on the non-grouped data. The algorithms used were:
1. AD Tree
2. Decision Stump [10]
3. Random Forest
4. Random Tree
5. REP Tree
6. Multilayer Perceptron [11]
We first performed the classification task on the grouped data; the results obtained using all the classifiers are shown in Table 1. Subsequently, we performed the mining on the ungrouped data to obtain the results shown in Table 2.

Table 1: Results (Grouped)
Algorithm               Classification Accuracy   Time Required [s]   ROC Area
AD Tree                 89.7075%                  4.25                0.863
Decision Stump          88.26%                    0.09                0.708
Random Forest           87.6948%                  3.78                0.838
Random Tree             85.94%                    0.28                0.685
REP Tree                88.3559%                  1.48                0.776
Multilayer Perceptron   86.9747%                  3486.7              0.839

Table 2: Results (Ungrouped)
Algorithm               Classification Accuracy   Time Required [s]   ROC Area
AD Tree                 88.7073%                  29.91               0.865
Decision Stump          88.26%                    0.72                0.702
Random Forest           89.0563%                  11.34               0.871
Random Tree             86.8027%                  1.34                0.700
REP Tree                89.0243%                  3.16                0.782
Multilayer Perceptron   87.7415%                  2187.72             0.833
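As an illustration only (not the authors' experimental setup), one way to reproduce this kind of grouped-versus-ungrouped comparison of accuracy, training time and ROC area is sketched below. scikit-learn's RandomForestClassifier and MLPClassifier stand in for the classifiers listed above (the exact tree variants used in the paper are not part of scikit-learn), and the outcome attribute is assumed to be stored in a column named "y" as in the UCI files, so the numbers produced will differ from Tables 1 and 2.

import time
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

def evaluate(csv_path, sep=";"):
    """Train stand-in classifiers and report accuracy, training time and ROC area."""
    df = pd.read_csv(csv_path, sep=sep)
    X = df.drop(columns=["y"])                 # "y" is the subscription outcome (assumption)
    y = (df["y"] == "yes").astype(int)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # One-hot encode every non-numeric column; after grouping, most columns are categorical.
    cat_cols = X.select_dtypes(include="object").columns.tolist()
    encode = ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols)],
        remainder="passthrough",
    )

    for name, clf in [("Random Forest", RandomForestClassifier(random_state=0)),
                      ("Multilayer Perceptron", MLPClassifier(max_iter=300, random_state=0))]:
        model = make_pipeline(encode, clf)
        start = time.time()
        model.fit(X_tr, y_tr)
        elapsed = time.time() - start
        acc = accuracy_score(y_te, model.predict(X_te))
        roc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
        print(f"{csv_path} / {name}: accuracy={acc:.4f}, time={elapsed:.2f}s, ROC area={roc:.3f}")

evaluate("bank-full.csv")              # ungrouped data
evaluate("bank-grouped.csv", sep=",")  # output of the grouping sketch above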
4. DISCUSSION/ANALYSIS
In our experiments we focused on methods which provide information about the exact classification rules. We have observed that classification of this data is generally best obtained from decision trees; decision trees also provide classification rules in a format which is easy to visualize and understand. Thus the experiments with decision trees can be considered highly successful. An easy-to-understand set of rules is obtained from the alternating decision tree (AD Tree). In terms of success, the attribute grouping method proved very effective, as it saved a great deal of time; this is evident from the results. We discuss the results of the classification using the different classifiers below.

4.1 AD Tree
The most successful classifier is perhaps the AD Tree. It provided high accuracy while also saving time, and it is the only algorithm that showed improvement in both time and accuracy. Moreover, it proved more accurate when mining the grouped data than the ungrouped data. The time spent classifying the ungrouped data is almost 7 times the time spent on the grouped data. The result of classification using this algorithm is shown in Picture 2.

Picture 2: Alternating Decision Tree, Visualized Form

4.2 Decision Stump
The accuracy of this algorithm remained the same on both datasets. What changed was the time: the time taken with the grouped data was nearly one-eighth of the time taken on the original data set.

4.3 Random Forest
This algorithm showed a decrease in accuracy on the grouped data, although it saved considerably on time. The decrease in accuracy is small compared to the time saved, which matters especially when organizations have large numbers of records and very little time to perform data mining.

4.4 Random Tree
This algorithm, like Random Forest, showed a decline in accuracy, but the loss is small compared to the time saved.

4.5 REP Tree
This tree, like the previous two, saved time while showing only a slight reduction in classification accuracy and ROC area, a difference that is small compared to the difference in time.

Picture 3 (a): REP Tree Visualized Form, Complete
Picture 3 (b): REP Tree Visualized Form, In Part

4.6 Multilayer Perceptron
This was the only neural network, and the only classifier that was not a decision tree, with which we performed our classification. It was also the only classifier that showed an increase in time when run on the grouped data, and it also showed a decrease in accuracy and ROC area.

Overall, using this approach we obtained very good results, with a large improvement in the performance of the classifiers. For the AD Tree we found that, after using the attribute grouping approach, the performance improvement in terms of time was almost 90%, while the classification accuracy did not change much for most classifiers. This is a notable success of our experimental investigation.

5. CONCLUSION
From our experiments we conclude that using our attribute grouping approach during pre-processing is very effective in improving the performance of most classifiers. This is especially true for organizations having a huge number of records, which would otherwise require a great deal of processing time, and for organizations with limited budgets. The method cuts time considerably while accuracy remains almost stable or shows only minor fluctuation. It is therefore very useful to perform classification using this approach. The algorithm that remains the most successful is the alternating decision tree; hence this approach is best used together with the alternating decision tree algorithm. However, the multilayer perceptron neural network did not show improvement in either time or accuracy, so it is not advisable to use this approach with neural networks. The amount of time spent in pre-processing is negligible in comparison to the amount saved by this technique. This approach therefore provides organizations with a good way to save time and the cost of processing information that is full of numeric data. Further work on this topic could include a comparison of a larger variety of algorithms.

References
[1] Ling, X. and Li, C., 1998. "Data Mining for Direct Marketing: Problems and Solutions". In Proceedings of the 4th KDD Conference, AAAI Press, pp. 73-79.
[2] Page, C. and Luding, Y., 2003. "Bank managers' direct marketing dilemmas - customers' attitudes and purchase intention". International Journal of Bank Marketing, 21(3), pp. 147-163.
[3] Frawley, W., Piatetsky-Shapiro, G. and Matheus, C., 1992. "Knowledge discovery in databases: An overview". AI Magazine, Fall, pp. 213-228.
[4] UCI Machine Learning Repository, "Bank Marketing Data Set". Available: http://archive.ics.uci.edu/ml/datasets/Bank+Marketing
[5] Moro, S., Laureano, R. and Cortez, P., 2011. "Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology". In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference (ESM'2011), pp. 117-121, Guimarães, Portugal, October 2011. EUROSIS.
[6] Quinlan, J. R., 1986. "Induction of Decision Trees". Machine Learning, 1, pp. 81-106, Kluwer Academic Publishers.
[7] Freund, Y. and Mason, L., 1999. "The Alternating Decision Tree Algorithm". In Proceedings of the 16th International Conference on Machine Learning, pp. 124-133.
[8] Haykin, S. O., 2009. Neural Networks and Learning Machines, 3rd edition, Prentice Hall, ISBN 9780131471399.
[9] Fejfar, J., Šťastný, J. and Cepl, M. "Time series classification using k-Nearest Neighbours, Multilayer Perceptron and Learning Vector Quantization algorithms". In: Acta Universitatis Agriculturae et Silviculturae
[10] Iba, W. and Langley, P., 1992. "Induction of One-Level Decision Trees". In ML92: Proceedings of the Ninth International Conference on Machine Learning, Aberdeen, Scotland, 1-3 July 1992, San Francisco, CA: Morgan Kaufmann, pp. 233-240.
[11] Rosenblatt, F., 1961. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington DC.

AUTHOR
Mrs. Nikhat Khan is an IT professional with over 15 years of experience. She has worked in multiple national and international organizations. Currently she is working as a Delivery Project Manager and has visited the US, the UK and Germany to manage and execute IT projects for renowned international clients. She is currently pursuing a PhD at Maulana Azad National Institute of Technology, Bhopal (MANIT) and is eager to learn about advances in business intelligence, data mining and fuzzy logic.

Dr. Fozia Haque Khan is Assistant Professor and coordinator of M.Tech (Nanotechnology) at MANIT Bhopal. Author of a book on quantum mechanics, Dr. Fozia has guided 26 M.Tech theses and is currently guiding 7 PhD students working on high-application materials such as oxide nanomaterials for solar cells, gas sensors and other nanomaterials. With over 80 publications in national and international journals and conferences of repute, she has delivered talks at several national and international conferences, and has visited King Abdul Aziz University, Jeddah, Saudi Arabia; Harvard University, Boston, USA; and the Naval Research Laboratory, Washington DC, USA.

Dr. Ghanshyam Singh Thakur received his BSc degree from Dr. Hari Singh Gour University, Sagar, M.P. in 2000, his MCA degree from Pt. Ravishankar Shukla University, Raipur, C.G. in 2003, and his PhD degree from Barkatullah University, Bhopal, M.P. in 2009. He is Assistant Professor in the Department of Computer Applications, Maulana Azad National Institute of Technology, Bhopal, M.P., India. He has eight years of teaching and research experience and 26 publications in national and international journals. His research interests include text mining, document clustering, information retrieval and data warehousing. He is a member of the CSI, IAENG and IACSIT.