Providing User-Support in Performing Knowledge Discovery in Databases

advertisement
From: AAAI Technical Report SS-97-01. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.
Providing User-Support in Performing
KnowledgeDiscovery in Databases
Robert Engels,
Michael
Erdmann, Rainer Perkuhn,
Rudi Studer
Institut far AngewandteInformatik und Formale Beschreibungsverfahren
University of Karlsruhe (TH)
I9-76128 Karlsruhe (Germany)
e-mail: {engels I erdmannI perkuhnI studer} @aifb.uni-karlsruhe.de
Statement of Interest for the AAAISpring Symposium"Artificial Intelligence in KnowledgeManagement",
Stanford University, March1997.
Knowledge Management (KM) is becoming a success
factor for industrial organisations. Obtaining control over
and gaining information out of data helps to achieve the
organisation’s goals more effectively. Thus knowledge(or
information) becomes a very important resource. This
resource must be adequately procured, stored, processed
and communicated. These tasks are central points of
Knowledge (and Information)
Management which
embraces key issues such as knowledgeacquisition, data
warehouses, data mining, data base managementsystems,
knowledge representation, case-based reasoning, hyper
media, workflow management, and decision support
systems.
As one can see, besides the information systems aspect AI
methodsand concepts play a significant role in Knowledge
Management, esp. concerning procurement (knowledge
acquisition), storing (knowledge representation),
processing (e.g. data mining, decision support systems)
knowledge.
One important source of information and knowledgeis the
growing numberof large databases which are built up and
maintainedin large organizations, but in the meantimealso
in a lot of small and mediumsize enterprises (SMEs).As
consequence, the research and application area ’Knowledge
Discovery in Databases’ (cf. [Frawley et al. 91], [Fayyad
and Uthurusamy95], [Simoudis et al. 96], [Fayyad et al.
96]) gained some importance during the last years.
Successful KDDapplications in marketing, financial
investment, or network managementindicate that the
development of KDDapplications mayresult in strategic
advantages in performing the business tasks (cf. e.g.
[Brachmanet al. 96]).
However, all experiences show that the development of
(successful) KDDapplications is a complex and errorprone process [Brachmanand Anand96]. In general, the
KDDprocess consists of a task analysis step (for
identifying the real application problem), a pre-processing
step (for identifying and selecting relevant data), a data
miningstep, and a post-processing step (for evaluating the
38
data mining results). At the moment, only first steps
towards a comprehensive methodologythat supports such
complex and iterative KDDprocesses have been proposed
(cf. e.g. [Wirth and Reinartz 96], [Engels 96]). Onthe other
hand, there is a clear indication that such a methodologyis
needed since:
¯ especially in SMEsKDDapplications will be developed
by application specialists and not by KDD
specialists,
¯ task analysis has to be supported in a systematic way in
order to come up with a KDDproblem specification
whichreally meets the application needs,
¯ there exist so manydependencies between different KDD
algorithms (e.g. for pre-processing and data mining)
well as between data characteristics and applicable
algorithms that a kind of planning support is required for
achieving a well-defined and consistent KDDprocess,
¯ the KDDprocess is highly iterative which means that
results of later steps (like e.g. evaluation) will haveto
fed backto earlier steps (like e.g. pre-processing).
Our approach for providing user-support in performing
KnowledgeDiscovery in Databases [Engels 96] aims at
developing a methodology to support the user when
performing entire KDDprocesses and to support the reuse
of previously successfully applied KDDprocesses.
The methodology should provide a user guidance module
to enable application specialists (not necessarily KDD
experts) to develop a KDDprocess by (re-)using
successfully applied (parts of) KDDprocesses, data mining
algorithms, and pre- and post-processing algorithms
adequately. By supporting such a reuse-oriented approach
the development time of new applications decreases while
higher quality solutions are achieved. The user guidance
module includes a repository which contains previously
applied KDDprocesses, data mining algorithms, and
pre/post-processing algorithms. To be able to retrieve these
processes and algorithms (e.g. by applying case-based
reasoning methods) there must exist a uniform description
that highlights the differences in the functionality of these
processes and algorithms. On the other hand, the initial
specification of the KDDtask must be decomposedinto
appropriate subtasks in order to make the complex KDD
process moretractable and in order to be able to associate
adequate algorithms with the identified subtasks (cf.
[Engels 96], [Engels and Perkuhn 96]). Furthermore, an
appropriate specification of data characteristics is also a
prerequisite for selecting applicable analysis algorithms(cf.
e.g.[Hoppe96], [Michieet al. 94]).
Such a user guidance moduleintegrates concepts from the
Knowledge Acquisition
community and the KDD
community, e.g. structured knowledge descriptions on
several levels ([Schreiber et al. 94], [Angeleet al. 96a]),
task decomposition[Angele et al. 96b], and the definition
of repositories of reusable processes and algorithms.
Since our approach is developed in cooperation with an
industrial partner emphasis is also put on evaluating the
developed methodologyin real world domains such as the
quality assessmentof cars.
As we have mentioned in the beginning KDDis only one
facet of Knowledge Management. Because of KDD’s
capability to extract relevant information from a large
amountof data and to use the discovered knowledgein a
strategic way we think KDDis an important facet of
Knowledge Management.
Acknowledgements
Part of this work is supported by Daimler-Benz AG,
Research and Development.
References
J. Angele, D. Fensel, and R. Studer (1996a): Domainand
Task Modelling in MIKE. In: A.G. Sutcliffe,
D.
Benyon, and F. van Assche (eds.): DomainKnowledge
for Interactive System Design. Proceedings of the
TC8/WG8.2 Conference on Domain Knowledge in
Interactive System Design, Geneva, Switzerland, May
1996. Chapman& Hall, London, 1996.
J. Angele, S. Decker, R. Perkuhn, and R. Studer (1996b):
Modelling Problem-Solving Methods in New KARL.
In: Proceedings of the 10th KnowledgeAcquisition
Workshop(KAW’96),Banff, Canada, 1996.
R. Brachman and T. Anand (1996): The Process
Knowledge Discovery in Databases: A HumanCentered Approach. In: U. Fayyad et al. (eds.):
Advances in Knowledge Discovery and Data Mining.
AAAIPress, MenloPark, California, 1996.
R. Brachman, T. Khabaza, W. Kloesgen, G. PiatetskyShapiro, and E. Simoudis (1996): Mining Business
Databases. Communications of the ACM39, 11
(November1996), pp. 42-48.
R. Engels (1996): Planning Tasks for Knowledge
Discovery in Databases. Performing Task-Oriented
User-Guidance.
In: Proceedings
of the 2nd
39
International Conference on KnowledgeDiscovery in
Databases (KDD’96).Portland, Oregon, August 1996.
R. Engels and R. Perkuhn (1996): Describing and
Integrating CompetenceTheories for Problem-Solving
Components and Machine Learning Algorithms. In:
Position Paper Collection of the European Knowledge
Acquisition Workshop (EKAW’96), Matlock Bath,
U.K., May1996.
(ftp://ftp.aifb.uni-karlsruhe.de/pub/ren/ekaw96.ps.Z)
U.M. Fayyad and R. Uthurusamy (Eds.) (1995):
Proceedings of the First International Conference on
Knowledge Discovery & Data Mining. AAAI Press,
MenloPark, 1995.
U.M. Fayyad, G. Piatetsky-Shapiro,
P. Smyth, and R.
Uthurusamy (eds.) (1996): Advances in Knowledge
Discovery and Data Mining. AAAIPress, Menlo Park,
California, 1996.
W. Frawley and G. Piatetsky-Shapiro. (1991) Knowledge
Discovery in Databases. Cambridge, Mass. 1991.
D. Michie, D.J. Spiegelhalter, and C.C. Taylor (1994).
Machine Learning,
Neural and Statistical
Classification. Ellis Horwood.1994.
Th. Hoppe (1996): Kriterien zur Auswahl maschineller
Lernverfahren (Criteria for selecting machinelearning
algorithms). Informatik Spectrum,19(1), 12-19, 1996.
A. Th. Schreiber, B. Wielinga, R. de Hoog, H. Akkermans,
and W. van de Velde (1994): CommonKADS:
Comprehensive Methodology for KBSDevelopment.
IEEEExpert, December1994, pp. 28-37.
E. Simoudis, J. Han and U. Fayyad (Eds.) (1996):
Proceedings of the Second International Conference on
KnowledgeDiscovery and Data Mining. AAAIPress,
MenloPark, 1996.
R. Wirth and Th. Reinartz (1996): Detecting Early
Indicator Cars in an Automotive Database: A MultiStrategy Approach. In: E. Simoudis, J. Han and U.
Fayyad (eds.): Proceedings of the 2nd Int. Conference
on Knowledge Discovery in Databases, Portland,
Oregon. 1996.
Download