Designing Semantic CMS – Part I
Semantic CMS Community
Lecturer, Organization, Date of presentation
Co-funded by the European Union
Copyright IKS Consortium
www.iks-project.eu

Course outline
Part I: Foundations
- (1) Introduction of Content Management
- (2) Foundations of Semantic Web Technologies
Part II: Semantic Content Management
- (3) Knowledge Interaction and Presentation
- (4) Knowledge Representation and Reasoning
- (5) Semantic Lifting
- (6) Storing and Accessing Semantic Data
Part III: Methodologies
- (7) Requirements Engineering for Semantic CMS
- (8) Designing Semantic CMS
- (9) Semantifying your CMS
- (10) Designing Interactive Ubiquitous IS

What is this Lecture about?
We have seen ...
- ... how requirements for semantic content management are defined in a systematic way.
- ... a list of industry needs.
What is missing?
- An efficient way to design an architecture for a semantic CMS that meets the defined requirements.

How to design a semantic CMS?
- What does the architecture of a semantic CMS look like? Conceptual Reference Architecture (Part 1: IKS Reference Architecture)
- How can a semantic CMS be realized? Technical Architectural Style (Part 2: REST Architecture)

Towards Semantic Content Management
[Figure: Content Management holds Content only. Semantic Content Management extracts knowledge from content and holds Content plus Knowledge.]

How to build a Semantic CMS?
Requirements from industry
- Easy integration with existing CMS
- Reuse features of existing CMS
- Use RESTful interfaces
- Semantic features as optional components
Functional requirements
- Automatic extraction of entities from text
- Automatic extraction of relations between entities
- Automatic categorization of content
- Automatic linking of content
- ...

What are semantic CMS?
A Semantic CMS is a CMS with the capability of
- interacting with semantic metadata (Presentation and Interaction Layer),
- extracting semantic metadata (Semantic Lifting Layer),
- managing semantic metadata (Knowledge Representation and Reasoning Layer),
- and storing semantic metadata (Persistence Layer)
about content.

Traditional CMS Architecture for Content
- Presentation Layer: User Interface
- Business Logic Layer: Content Access, Content Management, Content Administration
- Data Representation Layer: Content Data Model
- Persistence Layer: Content Repository

Reference Architecture for Semantic CMS
- Layers: Presentation & Interaction Layer, Semantic Lifting Layer, Knowledge Representation and Reasoning Layer, Persistence Layer
- Components: Semantic User Interaction, Knowledge Access, Knowledge Extraction Pipelines, Reasoning, Knowledge Models, Knowledge Repository, Knowledge Administration

Semantic User Interaction
Dealing with knowledge in a semantic CMS raises the need for an additional user interface level that allows interaction with the knowledge extracted from content.
Example: “A user writes an article and the SCMS recognizes the brand of a car in that article. The SCMS includes a reference to an object representing that car manufacturer, not only the brand name. The user can interact with the car manufacturer object and see, e.g., the location of its headquarters.”

Knowledge Access
- Access to inferred and extracted knowledge is encapsulated in a Knowledge Access layer.
- It provides the access to knowledge for Semantic User Interaction.

Knowledge Extraction Pipelines
- The main challenge for a semantic CMS is the ability to extract knowledge, in terms of semantic metadata, from the stored content.
- A separate layer for Knowledge Extraction Pipelines encapsulates the algorithms for semantic metadata extraction.
- Typically, knowledge extraction is a multistage process [FL04] that applies different IE/IR algorithms in sequence.

Pipeline Processing - Example
- Stages: Content Extraction, Pre-Processing, Entity Extraction, Relation Extraction
- Input sentence: John Miller has bought a Jaguar car this year.
- Annotation types shown on the slide: Person, Car Manufacturer, Time, Relation

Reasoning
- After content has been lifted to a semantic level, the extracted information can be used as input for reasoning techniques in the Reasoning layer.
- Logical reasoning is a well-known artificial intelligence technique that uses semantic relations to derive knowledge about the content that was not explicitly stated before.
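
The pipeline and reasoning slides above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not part of the IKS reference implementation: the gazetteer lookup stands in for real IE/IR algorithms, the function names are hypothetical, the slide's verb is read as "bought", and the `customerOf` rule is an invented example of deriving a fact that was not stated in the text.

```python
import re

# Hypothetical gazetteer; a real pipeline would use trained IE/IR components.
GAZETTEER = {
    "John Miller": "Person",
    "Jaguar": "CarManufacturer",
    "this year": "Time",
}

def preprocess(text):
    """Stage 1: normalize whitespace and drop the trailing full stop."""
    return re.sub(r"\s+", " ", text).strip().rstrip(".")

def extract_entities(text):
    """Stage 2: gazetteer lookup, returning (surface form, type) in text order."""
    found = [(text.index(name), name, kind)
             for name, kind in GAZETTEER.items() if name in text]
    return [(name, kind) for _, name, kind in sorted(found)]

def extract_relations(text, entities):
    """Stage 3: emit (person, 'bought', manufacturer) when the verb links them."""
    persons = [n for n, k in entities if k == "Person"]
    makers = [n for n, k in entities if k == "CarManufacturer"]
    triples = set()
    for person in persons:
        for maker in makers:
            pattern = re.escape(person) + r".*\bbought\b.*" + re.escape(maker)
            if re.search(pattern, text):
                triples.add((person, "bought", maker))
    return triples

def infer(triples):
    """Reasoning step with one invented rule: bought(x, y) implies customerOf(x, y)."""
    derived = {(s, "customerOf", o) for s, p, o in triples if p == "bought"}
    return triples | derived

def run_pipeline(content):
    """Chain the stages: each stage consumes the output of the previous one."""
    text = preprocess(content)
    entities = extract_entities(text)
    return entities, infer(extract_relations(text, entities))

entities, knowledge = run_pipeline("John Miller has bought a Jaguar car this year.")
for triple in sorted(knowledge):
    print(triple)
```

Keeping each stage as a separate function mirrors the idea of encapsulating extraction algorithms in their own layer: a stage can be replaced without touching the others.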

Knowledge Models
- Knowledge (representation) Models that define the semantic metadata are used to express knowledge.
- Ontologies can be used to define semantic metadata by specifying so-called concepts and their semantic relations.

Knowledge Repository
- Knowledge is stored in a Knowledge Repository that defines the fundamental data structure for knowledge.
- State-of-the-art knowledge repositories implement a triple store, where a triple is formed by a subject, a predicate, and an object.
- A triple can be used to express any relation between a subject and an object.

Knowledge Administration
Knowledge Administration ranges from the management of
- Semantic User Interaction templates,
- Knowledge Extraction Pipelines,
- and Reasoning
to the administration of Knowledge Models and Repositories.
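
The subject-predicate-object structure described on the Knowledge Repository slide can be sketched as a small in-memory store. This is an illustrative assumption, not the repository used by IKS: the class name, the `ex:` identifiers, and the placeholder city are all hypothetical, and a production triple store would add persistence, more indexes, and a query language.

```python
from collections import defaultdict

class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()
        self._by_subject = defaultdict(set)  # simple index for subject lookups

    def add(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        self._triples.add(triple)
        self._by_subject[subject].add(triple)

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        if subject is not None:
            candidates = self._by_subject.get(subject, set())
        else:
            candidates = self._triples
        return sorted(
            t for t in candidates
            if (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )

store = TripleStore()
# Hypothetical identifiers, loosely following the car manufacturer example.
store.add("ex:Jaguar", "rdf:type", "ex:CarManufacturer")
store.add("ex:Jaguar", "ex:headquarters", "ex:ExampleCity")
store.add("ex:article42", "ex:mentions", "ex:Jaguar")

print(store.match(subject="ex:Jaguar"))  # everything known about the manufacturer
print(store.match(obj="ex:Jaguar"))      # content that refers to the manufacturer
```

The second query shows why a triple store suits the semantic user interaction example: the article points at the manufacturer object, so the object's other properties are reachable from the content.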

Integration
[Figure: the traditional CMS architecture and the semantic reference architecture side by side as two columns under a common Semantic User Interface.]
- Content column: User Interface, Content Access, Content Management, Content Administration, Content Data Model, Content Repository
- Knowledge column: Semantic User Interaction, Knowledge Access, Knowledge Extraction Pipelines, Reasoning, Knowledge Administration, Knowledge Models, Knowledge Repository

Implementation of the Reference Architecture
- Reference implementation within the IKS project
- IKS: An open source community to bring semantic technologies to CMS platforms
- New incubating project at the Apache Software Foundation
- http://incubator.apache.org/stanbol

Implementation of the Reference Architecture
- One-year student project
- Information-Driven Software Engineering (ID|SE)
- Extract knowledge from unstructured software specification documents
- Case study: 10,000-page specification of the German Health Card system

Breathing life into the Reference Architecture
[Figure: the integrated two-column architecture from the Integration slide, realized by a Content Management system and the ID|SE Platform.]

Problem Statement
[Figure: development phases Requirements Engineering, Analysis & Design, Implementation & Test, marked with a question mark.]

Problem Statement
Documents and artifacts created in the software development process contain implicit information:
- Type of the document (e.g. requirements specification)
- Named Entities (e.g.
actor “User”)
Relations between the different documents are not obvious:
- Thematically similar
- Duplicates

ID|SE Demo
http://idse.cs.upb.de:8082/opencms/opencms/idse

ID|SE-Platform – Architecture
[Component diagram]
- <<OpenCMS>> ContentManagementSystem: ContentManagement, Document-ContentStorage
- ID|SE-Service-Platform: IE/IR-Service-Orchestrators, EvaluationServices, MetaDataSearch, IE/IR-Services, Meta-Data-Model, Meta-Data-Storage

Mapping with Reference Architecture
[Figure: mapping of the ID|SE-Platform components onto the reference architecture.]

ID|SE-Platform
1. Send Request to the ID|SE Platform
[Component diagram]
- <<OpenCMS>> Content Management System: <<OpenCMS-Module>> GUI, Webservice
- ID|SE-Service Platform: IEIR-ServiceOrchestrators, DefaultMetaDataCreator, Webservice
- Interface: IDefaultMetaDataCreator

ID|SE-Platform
2. Providing Documents
[Component diagram]
- ID|SE-Service Platform: IEIR-ServiceOrchestrators, DefaultMetaDataCreator, OpenCMSDocumentProviderProxy
- <<OpenCMS>> Content Management System: <<component>> DocumentProvider, Content-Management, DocumentContentStorage, Webservice
- Interface: IProvideDocuments

ID|SE-Platform
3. Generation of Meta-Data
[Component diagram]
- IE/IR-ServiceOrchestrators: DefaultMetaDataCreator
- IE/IR-Services: ContentExtraction, Preprocessors, Classifier, Clusterer, NamedEntityRecognizer, InformationAggregator
- Evaluation Services
- MetaDataModel, MetaDataStorage

ID|SE-Platform
4.
Providing/Presenting Meta-Data
[Component diagram]
- <<OpenCMS>> Content Management System: <<OpenCMS-Module>> ArtifactSearchGUI, Webservice
- Meta-Data-Search: MetaDataSearchEngine, Webservice
- IEIR-Services, MetaDataModel, MetaDataStorage

ID|SE Features
- Clustering of artefacts: “Which artefacts are about ‘XYZ’?”
- Classification of artefacts
- Named entity recognition
- Duplicate Check: no redundancy in software specification documents
- Facetted Search: efficient way of browsing through content

Evaluation Criteria
- Recall
- Precision
- F-Measure

Evaluation of Semantic Features
[Figure: three bar charts (Clustering, Classification, Entity Recognition), each showing F-Measure, Precision and Recall on a scale from 0% to 100%. Bar labels that survive extraction, without a recoverable assignment to chart and metric: 88%, 84%, 77%, 74%, 72%, 64%, 58%, 56%.]

Lessons Learned ...
Now you should know ...
- ... the architectural requirements for a semantic CMS.
- ... the integration concept of two loosely coupled columns.
- ... the components of the reference architecture.
- ... how the reference architecture model can be used to build a semantic CMS from scratch and how an existing system can be extended.
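
The three criteria named on the Evaluation Criteria slide can be computed as in the sketch below. The slide's own formulas did not survive extraction, so the standard set-based definitions are assumed here (precision = correct results / returned results, recall = correct results / expected results, F-measure = harmonic mean of both). The function name and the sample entity sets are hypothetical and are not the ID|SE evaluation data.

```python
def precision_recall_f1(predicted, relevant):
    """Return (precision, recall, F-measure) for two collections of items."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: entities found by a recognizer vs. a hand-made gold standard.
found = {"User", "Admin", "Card", "Terminal"}
gold = {"User", "Admin", "Card", "Server", "Insurer"}

p, r, f = precision_recall_f1(found, gold)
print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

Reporting all three values, as the evaluation slide does, matters because a feature can reach high precision by returning very little, which only the recall and F-measure reveal.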