SAMPLE ENCYCLOPEDIA SUBMISSION Artificial Neural Networks and Data Mining Techniques Juan R. Rabuñal, juanra@udc.es Julián Dorado, julian@udc.es Alejandro Pazos, apazos@udc.es Dept. of Information & Communications Technologies, University of A Coruña, Spain INTRODUCTION The world of Data Mining (Cios, Pedrycz & Swiniarrski, 1998) is in constant expansion. New information is obtained from databases thanks to a wide range of techniques, which are all applicable to a determined set of domains and count with a series of advantages and inconveniences. The Artificial Neural Networks (ANNs) technique (McCulloch & Pitts, 1943) (Orchad, 1993) (Haykin, 1999) allows us to resolve complex problems in many disciplines (classification, clustering, regression, etc.), and presents a series of advantages that convert it into a very powerful technique that is easily adapted to any environment. The main inconvenience of ANNs, however, is that they can not explain what they learn and what reasoning was followed to obtain the outputs. This implies that they can not be used in many environments in which this reasoning is essential. This article presents a hybrid technique that not only benefits from the advantages of ANNs in the data-mining field, but also counteracts their inconveniences by using other knowledge extraction techniques. Firstly we extract the requested information by applying an ANN, then we apply other Data Mining techniques to the ANN in order to explain the information that is contained inside the network. We thus obtain a twolevelled system that offers the advantages of the ANNs and compensates for its shortcomings with other Data Mining techniques. BACKGROUND Ever since artificial intelligence appeared, ANNs have been widely studied. Their generalisation capacity, and their inductive learning convert them into a very robust technique that can be used in almost any domain. An ANN is an information processing technique that is inspired on neuronal biology and consists of a large amount of interconnected computational units (neurons), usually in different layers. When an input vector is presented to the input neurons, this vector is propagated and processed in the network until it becomes an output vector in the output neurons. The ANNs have proven to be a very powerful tool in an array of applications, but they present a big problem: their reasoning process cannot be explained, i.e. there is no clear relationship between the inputs that are presented to the network and the outputs it produces. This means that ANNs cannot be used in certain domains, even though several approaches and attempts to explain their behaviour have tried to solve this problem. ………………………………………….. ………………………………………….. ………………………………………….. The most recent works in rules extraction from ANNs are presented by Rivero et al and Rabuñal et al (Rivero, Rabuñal, Dorado, Pazos & Pedreira, 2004) (Rabuñal, Dorado, Pazos & Rivero, 2003). They extract rules by applying a symbolic regression system, based on Genetic Programming (GP) (Koza, Keane, Streeter, Mydlowec, Yu & Lanza, 2003) (Wong & Leung, 2000) (Engelbrecht, Rouwhorst & Schoeman, 2001), to a set of inputs / outputs produced by the ANN. The set of network inputs / produced outputs is dynamically modified, as explained on this paper. MAIN FOCUS OF THE CHAPTER This article presents an architecture in two levels for the extraction of knowledge from databases. In a first level, we apply an ANN as Data Mining technique; in the second level, we apply a knowledge extraction technique to this network. Data Mining with ANNs Artificial Neural Networks constitute a Data Mining technique that has been widely used as a technique for the extraction of knowledge from databases. Their training process is based on examples, and presents several advantages that other models do not offer. ………………………………………….. ………………………………………….. ………………………………………….. The purpose of this article is to correct this defect through the use of another knowledge extraction technique. This technique is applied to the ANN in order to extract its internal knowledge and express it in terms that are proper of the used technique (i.e. decision trees, semantic networks, IF-THEN-ELSE rules, etc.). Extracting knowledge from ANNs In general, the second knowledge extraction technique is applied to the network to explain its behaviour when it produces a set of outputs from a set of inputs, without considering its internal functioning. ………………………………………….. ………………………………………….. ………………………………………….. In this way, we create a closed system in which the patterns set is constantly updated and the search continues for new areas that are not yet covered by the knowledge that we have of the network. A detailed description of the method and of its application to concrete problems can be found in Rabuñal et al (Rabuñal, Dorado, Pazos & Rivero, 2003) (Rabuñal, Dorado, Pazos, Gestal, Rivero & Pedreira, 2004) (Rabuñal, Dorado, Pazos, Pereira & Rivero, 2004). FUTURE TRENDS The proposed architecture is based on the application of a knowledge extraction technique to an ANN, whereas the technique that will be used depends on the type of knowledge that we wish to obtain. Since there is a great variety of techniques and algorithms that generate information of the same kind (IF-THEN rules, trees, etc.), we need to study them and carry out experiments to test their functioning in the proposed system, and in particular their adequacy for the new patterns generation system. Also, this system for the generation of new patterns involves a large number of parameters, such as the percentage of patterns change, the number of new patterns, or the maximum error with which we consider that a pattern is represented by a rule. Since there are so many parameters, we need to study not only each parameter separately but also the influence of all the parameters on the final result. CONCLUSION This article proposes a system architecture that makes good use of the advantages of the ANNs, such as Data Mining, and avoids its inconveniences. On a first level, we apply an ANN to extract and model a set of data. The resulting model offers all the advantages of ANNs, such as noise tolerance and generalisation capacity. On a second level, we apply another knowledge extraction technique to the ANN, and thus obtain the knowledge of the ANN, which is the generalisation of the knowledge that was used for its learning; this knowledge is expressed in the shape that is decided by the user. It is obvious that the union of various techniques in a hybrid system conveys a series of advantages that are associated to them. REFERENCES Cios, K., Pedrycz, W., & Swiniarrski, R. (1998). Data Mining Methods for Knowledge Discovery. Kluwer Academic Publishers. Engelbrecht, A.P., Rouwhorst, S.E., & Schoeman L. (2001). A Building Block Approach to Genetic Programming for Rule Discovery. Data Mining: A Heuristic Approach, Abbass, R. Sarkar, C. Newton editors, Information Science Reference (formerly Idea Group Reference) Publishing. Haykin, S. (1999). Neural Networks (2nd ed.). Englewood Cliffs, NJ: Prentice Hall. Koza, J. R., Keane, M. A., Streeter, M. J., Mydlowec, W., Yu, J., & Lanza, G. (Editors) (2003). Genetic Programming IV: Routine Human-Competitive Machine Intelligence. Dordrecht, The Netherlands: Kluwer Academic Publishers. McCulloch, W.S., & Pitts, W. (1943). A Logical Calculus of Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics. (5) 115-133. Orchad, G. (1993). Neural Computing. Research and Applications. Ed. Institute of Physics Publishing, Londres. Rabuñal, J.R., Dorado, J., Pazos, A., & Rivero, D. (2003). Rules and Generalization Capacity Extraction from ANN with GP. Lecture Notes in Computer Science. 606613. Rabuñal, J.R., Dorado, J., Pazos, A., Gestal, M., Rivero, D., & Pedreira, N. (2004 a). Search the Optimal RANN Architecture, Reduce the Training Set and Make the Training Process by a Distribute Genetic Algorithm. Artificial Intelligence and Aplications 1, 415-420. Rabuñal, J.R., Dorado, J., Pazos, A., Pereira, J., & Rivero, D. (2004 b). A New Approach to the Extraction of ANN Rules and to Their Generalization Capacity Through GP. Neural Computation, (16) 7, 1483-1523. Rivero, D., Rabuñal, J.R., Dorado, J., Pazos, A., & Pedreira, N. (2004). Extracting Knowledge from Databases and ANNs with Genetic Programming: Iris Flower Classification Problem. Intelligent Agents for Data Mining and Information Retrieval. 136-152. Wong, M.L., & Leung, K.S. (2000). Data Mining using Grammar Based Genetic Programming and Applications, Kluwer Academic Publishers. TERMS AND DEFINITIONS Area of the Search Space: Set of specific ranges or values of the input variables that constitute a subset of the search space. Artificial Neural Networks: A network of many simple processors (“units” or “neurons”) that imitates a biological neural network. The units are connected by unidirectional communication channels, which carry numeric data. Neural networks can be trained to find nonlinear relationships in data, and are used in applications such as robotics, speech recognition, signal processing or medical diagnosis. Backpropagation algorithm: Learning algorithm of ANNs, based on minimising the error obtained from the comparison between the outputs that the network gives after the application of a set of network inputs and the outputs it should give (the desired outputs). Data Mining: The application of analytical methods and tools to data for the purpose of identifying patterns, relationships or obtaining systems that perform useful tasks such as classification, prediction, estimation, or affinity grouping. Evolutionary Computation: Solution approach guided by biological evolution, which begins with potential solution models, then iteratively applies algorithms to find the fittest models from the set to serve as inputs to the next iteration, ultimately leading to a model that best represents the data. Knowledge Extraction: Explicitation of the internal knowledge of a system or set of data in a way that is easily interpretable by the user. Rule Induction: Process of learning, from cases or instances, if-then rule relationships that consist of an antecedent (if-part, defining the preconditions or coverage of the rule) and a consequent (then-part, stating a classification, prediction, or other expression of a property that holds for cases defined in the antecedent). Search Space: Set of all possible situations of the problem that we want to solve could ever be in.