The HySpirit Retrieval Platform Thomas Rölleke1,2 roelleke@HySpirit.com thor@dcs.qmw.ac.uk Ralf Lübeck1 luebeck@HySpirit.com Gabriella Kazai2 gabs@dcs.qmw.ac.uk 1 2 HySpirit GbR, Postfach 300258, D-44232 Dortmund, Germany Queen Mary University of London, Department of Computer Science, London, E1 4NS, England Between 1993-1999, the foundations of the retrieval platform HySpirit (HYpermedia System with Probabilistic Inference for the Retrieval of InformaTion) were developed at the University of Dortmund. Since 1999, HySpirit has been used as a research tool and as a commercial product in selected applications. The main purposes of the HySpirit technology are: Provide flexible abstraction layers (relational, logical and object-oriented) for modelling hypermedia and knowledge retrieval. Exploit the logical structure of documents for retrieving best entry points into documents. Support knowledge-oriented retrieval in semi-structured and heterogeneous data sources. Integrate fact-oriented and content-oriented searching and browsing. Support retrieval strategies exploiting relationships (spatial, temporal, semantic etc.). Consider intrinsic features of knowledge such as uncertainty, incompleteness and inconsistency. Allow for easy parameter setting to reflect user preferences (profiles) and support retrieval experiments and evaluation. Extend database models with probability theory in order to create a retrieval platform possessing the advantages and expressiveness of database models, and the aggregation of uncertain evidence as defined by probability theory. Currently, the platform is applied in research projects on structured document retrieval (XML) and interactive TV (MPEG-7). Also, the system is installed in a commercial environment, where various information needs occur in a highly heterogeneous and complex information environment. The HySpirit system consists of a number of modules including the core retrieval engine, knowledge modelling languages, indexers, analysers, statistical tools and data management connectors. This modular structure supports distributed development and a “plug-in” architecture. HySpirit has four main abstraction layers: POOL: Probabilistic Object-Oriented Logic ([3]) FVPD: Four-Valued Probabilistic Datalog ([2]) PD: Probabilistic Datalog ([4]) PRA: Probabilistic Relational Algebra ([1]) The POOL layer provides object-oriented concepts such as classification, association, and aggregation for modelling knowledge for the purpose of retrieval. In addition, POOL includes the content dimension in combination with objectoriented modelling. FVPD and PD are the logical layers providing rules and recursion, where FVPD comes with special features for representing incomplete and inconsistent knowledge. The core of the HySpirit system is its PRA retrieval engine. This layer provides the widely known relational paradigm, which recently has been extended with an SQL interface. All abstraction layers provide the key concept of HySpirit: representation of knowledge and its intrinsic uncertainty with database models and probability theory to achieve a descriptional approach for modelling complex information retrieval tasks such as hypermedia and knowledge retrieval. Further information about HySpirit can www.hyspirit.com and qmir.dcs.qmw.ac.uk. be found at REFERENCES [1] N. Fuhr and T. Rölleke. A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems. ACM Transactions on Information Systems, 14(1): 32–66, 1997. [2] N. Fuhr and T. Rölleke. HySpirit - a Probabilistic Inference Engine for Hypermedia Retrieval in Large Databases. Proceedings of the 6th International Conference on Extending Database Technology (EDBT), Valencia, Spain, Lecture Notes in Computer Science, 24–38, Berlin et al., 1998. Springer. [3] T. Rölleke. POOL: Probabilistic Object-Oriented Logical Representation and Retrieval of Complex Objects. Shaker Verlag, Aachen, 1999. Dissertation. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Conference ’00, Month 1-2, 2000, City, State. Copyright 2000 ACM 1-58113-000-0/00/0000…$5.00. [4] T. Rölleke and N. Fuhr. Information Retrieval with Probabilistic Datalog. In F. Crestani, M Lalmas and C.J. Rijsbergen, eds, Uncertainty and Logics - Advanced Models for the Representation and Retrieval of Information. Kluwer Academic Publishers, 1998.