From: AAAI Technical Report SS-97-01. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.

Providing User-Support in Performing Knowledge Discovery in Databases

Robert Engels, Michael Erdmann, Rainer Perkuhn, Rudi Studer
Institut für Angewandte Informatik und Formale Beschreibungsverfahren
University of Karlsruhe (TH)
D-76128 Karlsruhe (Germany)
e-mail: {engels | erdmann | perkuhn | studer}@aifb.uni-karlsruhe.de

Statement of Interest for the AAAI Spring Symposium "Artificial Intelligence in Knowledge Management", Stanford University, March 1997.

Knowledge Management (KM) is becoming a success factor for industrial organisations. Gaining control over data and extracting information from it helps to achieve the organisation’s goals more effectively. Thus knowledge (or information) becomes a very important resource. This resource must be adequately procured, stored, processed and communicated. These tasks are central to Knowledge (and Information) Management, which embraces key issues such as knowledge acquisition, data warehouses, data mining, database management systems, knowledge representation, case-based reasoning, hypermedia, workflow management, and decision support systems. As one can see, besides the information systems aspect, AI methods and concepts play a significant role in Knowledge Management, especially in procuring (knowledge acquisition), storing (knowledge representation), and processing (e.g. data mining, decision support systems) knowledge.

One important source of information and knowledge is the growing number of large databases that are built up and maintained in large organizations and, by now, also in many small and medium-sized enterprises (SMEs). As a consequence, the research and application area 'Knowledge Discovery in Databases' (KDD) (cf. [Frawley et al. 91], [Fayyad and Uthurusamy 95], [Simoudis et al. 96], [Fayyad et al. 96]) has gained importance in recent years.
Successful KDD applications in marketing, financial investment, or network management indicate that the development of KDD applications may result in strategic advantages in performing business tasks (cf. e.g. [Brachman et al. 96]). However, experience shows that the development of (successful) KDD applications is a complex and error-prone process [Brachman and Anand 96]. In general, the KDD process consists of a task analysis step (for identifying the real application problem), a pre-processing step (for identifying and selecting relevant data), a data mining step, and a post-processing step (for evaluating the data mining results). At the moment, only first steps towards a comprehensive methodology that supports such complex and iterative KDD processes have been proposed (cf. e.g. [Wirth and Reinartz 96], [Engels 96]). On the other hand, there is a clear indication that such a methodology is needed since:

• especially in SMEs, KDD applications will be developed by application specialists and not by KDD specialists,
• task analysis has to be supported in a systematic way in order to come up with a KDD problem specification which really meets the application needs,
• there exist so many dependencies between different KDD algorithms (e.g. for pre-processing and data mining), as well as between data characteristics and applicable algorithms, that a kind of planning support is required for achieving a well-defined and consistent KDD process,
• the KDD process is highly iterative, which means that results of later steps (e.g. evaluation) will have to be fed back to earlier steps (e.g. pre-processing).

Our approach for providing user-support in performing Knowledge Discovery in Databases [Engels 96] aims at developing a methodology to support the user when performing entire KDD processes and to support the reuse of previously successfully applied KDD processes.
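To make the iterative character of the four-step KDD process described above more concrete, the following minimal Python sketch runs task analysis, pre-processing, data mining, and post-processing in a loop and feeds the evaluation result back into pre-processing. It is only an illustration under stated assumptions: all function names, the toy "mining" step (a mean), the quality measure, and the threshold of 0.9 are hypothetical and are not taken from the methodology proposed here.

```python
from statistics import mean

# All names below are hypothetical stand-ins, not taken from the paper.


def task_analysis(problem):
    """Task analysis: derive a (toy) KDD problem specification."""
    return {"problem": problem, "target_quality": 0.9}


def pre_process(data, max_value):
    """Pre-processing: select relevant data (here: values up to max_value)."""
    return [x for x in data if x <= max_value]


def data_mining(selected):
    """Data mining: stand-in algorithm that summarises the data by its mean."""
    return mean(selected)


def post_process(result, selected):
    """Post-processing: evaluate the result as the share of values within 1.0 of it."""
    close = sum(1 for x in selected if abs(x - result) <= 1.0)
    return close / len(selected)


def run_kdd(problem, data, max_iterations=5):
    """Run the four steps; feed the evaluation back into pre-processing."""
    spec = task_analysis(problem)
    max_value = max(data)  # first pass: all data counts as relevant
    history = []
    for _ in range(max_iterations):
        selected = pre_process(data, max_value)
        result = data_mining(selected)
        quality = post_process(result, selected)
        history.append((max_value, result, quality))
        if quality >= spec["target_quality"]:
            break
        # Feedback to the earlier step: exclude the largest remaining value.
        remaining = sorted(set(selected))[:-1]
        if not remaining:
            break
        max_value = remaining[-1]
    return history
```

With the toy input `run_kdd("toy problem", [1.0, 1.2, 0.8, 1.1, 9.0, 15.0])`, the loop needs three passes: the two outliers are dropped one at a time before the evaluation step accepts the result. A real KDD process would replace each stand-in with a choice among many algorithms, which is exactly where planning support becomes necessary.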
The methodology should provide a user guidance module that enables application specialists (not necessarily KDD experts) to develop a KDD process by adequately (re-)using successfully applied KDD processes or parts of them, data mining algorithms, and pre- and post-processing algorithms. Such a reuse-oriented approach is expected to decrease the development time of new applications while yielding higher-quality solutions. The user guidance module includes a repository which contains previously applied KDD processes, data mining algorithms, and pre-/post-processing algorithms. To be able to retrieve these processes and algorithms (e.g. by applying case-based reasoning methods), there must exist a uniform description that highlights the differences in their functionality. In addition, the initial specification of the KDD task must be decomposed into appropriate subtasks in order to make the complex KDD process more tractable and to be able to associate adequate algorithms with the identified subtasks (cf. [Engels 96], [Engels and Perkuhn 96]). Furthermore, an appropriate specification of data characteristics is a prerequisite for selecting applicable analysis algorithms (cf. e.g. [Hoppe 96], [Michie et al. 94]).

Such a user guidance module integrates concepts from the Knowledge Acquisition community and the KDD community, e.g. structured knowledge descriptions on several levels ([Schreiber et al. 94], [Angele et al. 96a]), task decomposition [Angele et al. 96b], and the definition of repositories of reusable processes and algorithms. Since our approach is developed in cooperation with an industrial partner, emphasis is also put on evaluating the developed methodology in real-world domains such as the quality assessment of cars.

As mentioned at the beginning, KDD is only one facet of Knowledge Management.
However, because KDD can extract relevant information from large amounts of data and the discovered knowledge can be used strategically, we consider KDD an important facet of Knowledge Management.

Acknowledgements

Part of this work is supported by Daimler-Benz AG, Research and Development.

References

J. Angele, D. Fensel, and R. Studer (1996a): Domain and Task Modelling in MIKE. In: A.G. Sutcliffe, D. Benyon, and F. van Assche (eds.): Domain Knowledge for Interactive System Design. Proceedings of the TC8/WG8.2 Conference on Domain Knowledge in Interactive System Design, Geneva, Switzerland, May 1996. Chapman & Hall, London, 1996.

J. Angele, S. Decker, R. Perkuhn, and R. Studer (1996b): Modelling Problem-Solving Methods in New KARL. In: Proceedings of the 10th Knowledge Acquisition Workshop (KAW'96), Banff, Canada, 1996.

R. Brachman and T. Anand (1996): The Process of Knowledge Discovery in Databases: A Human-Centered Approach. In: U. Fayyad et al. (eds.): Advances in Knowledge Discovery and Data Mining. AAAI Press, Menlo Park, California, 1996.

R. Brachman, T. Khabaza, W. Kloesgen, G. Piatetsky-Shapiro, and E. Simoudis (1996): Mining Business Databases. Communications of the ACM 39, 11 (November 1996), pp. 42-48.

R. Engels (1996): Planning Tasks for Knowledge Discovery in Databases. Performing Task-Oriented User-Guidance. In: Proceedings of the 2nd International Conference on Knowledge Discovery in Databases (KDD'96), Portland, Oregon, August 1996.

R. Engels and R. Perkuhn (1996): Describing and Integrating Competence Theories for Problem-Solving Components and Machine Learning Algorithms. In: Position Paper Collection of the European Knowledge Acquisition Workshop (EKAW'96), Matlock Bath, U.K., May 1996. (ftp://ftp.aifb.uni-karlsruhe.de/pub/ren/ekaw96.ps.Z)

U.M. Fayyad and R. Uthurusamy (eds.) (1995): Proceedings of the First International Conference on Knowledge Discovery & Data Mining. AAAI Press, Menlo Park, 1995.

U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (eds.) (1996): Advances in Knowledge Discovery and Data Mining. AAAI Press, Menlo Park, California, 1996.

W. Frawley and G. Piatetsky-Shapiro (1991): Knowledge Discovery in Databases. Cambridge, Mass., 1991.

D. Michie, D.J. Spiegelhalter, and C.C. Taylor (1994): Machine Learning, Neural and Statistical Classification. Ellis Horwood, 1994.

Th. Hoppe (1996): Kriterien zur Auswahl maschineller Lernverfahren (Criteria for selecting machine learning algorithms). Informatik Spectrum, 19(1), pp. 12-19, 1996.

A. Th. Schreiber, B. Wielinga, R. de Hoog, H. Akkermans, and W. van de Velde (1994): CommonKADS: A Comprehensive Methodology for KBS Development. IEEE Expert, December 1994, pp. 28-37.

E. Simoudis, J. Han, and U. Fayyad (eds.) (1996): Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. AAAI Press, Menlo Park, 1996.

R. Wirth and Th. Reinartz (1996): Detecting Early Indicator Cars in an Automotive Database: A Multi-Strategy Approach. In: E. Simoudis, J. Han, and U. Fayyad (eds.): Proceedings of the 2nd Int. Conference on Knowledge Discovery in Databases, Portland, Oregon, 1996.