Reference Architecture for Semantic CMS

Designing Semantic CMS – Part I
Semantic CMS Community
Lecturer
Organization
Date of presentation
Co-funded by the European Union
Copyright IKS Consortium
Part I: Foundations
(1) Introduction of Content Management
(2) Foundations of Semantic Web Technologies
Part II: Semantic Content Management
(3) Knowledge Interaction and Presentation
(4) Knowledge Representation and Reasoning
(5) Semantic Lifting
(6) Storing and Accessing Semantic Data
Part III: Methodologies
(7) Requirements Engineering for Semantic CMS
(8) Designing Semantic CMS
(9) Semantifying your CMS
(10) Designing Interactive Ubiquitous IS
www.iks-project.eu
What is this Lecture about?
• We have seen ...
  - ... how requirements for semantic content management are defined in a systematic way.
  - ... a list of industry needs.
• What is missing?
  - An efficient way to design an architecture for a semantic CMS that meets the defined requirements.
How to design a semantic CMS?
• Part 1 (conceptual): What does the architecture of a semantic CMS look like?
  - Reference Architecture: IKS Reference Architecture
• Part 2 (technical): How can a semantic CMS be realized?
  - Architectural Style: REST Architecture
Towards Semantic Content Management
• Content Management: Content
• Semantic Content Management: Content and Knowledge
• Step between the two: extract knowledge from content
How to build a Semantic CMS?
• Requirements from industry
  - Easy integration with existing CMS
  - Reuse features of existing CMS
  - Use RESTful interfaces
  - Semantic features as optional components
• Functional requirements
  - Automatic extraction of entities from text
  - Automatic extraction of relations between entities
  - Automatic categorization of content
  - Automatic linking of content
  - ...
What are semantic CMS?
A Semantic CMS is a CMS with the capability of
• interacting with semantic metadata (Presentation and Interaction Layer),
• extracting semantic metadata (Semantic Lifting Layer),
• managing semantic metadata (Knowledge Representation and Reasoning Layer),
• and storing semantic metadata (Persistence Layer)
about content.
Traditional CMS Architecture for Content
• Layers: Presentation Layer, Business Logic Layer, Data Representation Layer, Persistence Layer
• Components: User Interface, Content Access, Content Management, Content Administration, Content Data Model, Content Repository
Reference Architecture for Semantic CMS
• Layers: Presentation & Interaction Layer, Semantic Lifting Layer, Knowledge Representation and Reasoning Layer, Persistence Layer
• Components: Semantic User Interaction, Knowledge Access, Knowledge Extraction Pipelines, Reasoning, Knowledge Models, Knowledge Repository, Knowledge Administration
Semantic User Interaction
• Dealing with knowledge in a semantic CMS raises the need for an additional user interface level that allows interaction with content.
• Example: “A user writes an article and the SCMS recognizes the brand of a car in that article. The SCMS includes a reference to an object representing that car manufacturer, not only the brand name. The user can interact with the car manufacturer object and see, e.g., the location of its headquarters.”
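The difference between storing a brand name and storing a reference to a manufacturer object can be sketched in a few lines of Python. This is only an illustration, not part of the IKS software: the `Entity` and `Mention` types, the `link_mentions` helper, and the example manufacturer "ExampleMotors" with its headquarters are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A knowledge object the user can interact with (hypothetical type)."""
    uri: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Mention:
    """A span of article text that refers to an entity object."""
    start: int
    end: int
    entity: Entity


def link_mentions(text: str, known_entities: list[Entity]) -> list[Mention]:
    """Attach entity objects to the first occurrence of each known label."""
    mentions = []
    for entity in known_entities:
        start = text.find(entity.label)
        if start != -1:
            mentions.append(Mention(start, start + len(entity.label), entity))
    return mentions


# Made-up manufacturer and headquarters, used only for illustration.
maker = Entity(
    uri="urn:example:manufacturer/1",
    label="ExampleMotors",
    properties={"headquarters": "ExampleCity"},
)
article = "I test-drove the new ExampleMotors coupe."
for mention in link_mentions(article, [maker]):
    # The UI can now show properties of the object, not just the matched string.
    print(article[mention.start:mention.end], mention.entity.properties)
```

Because the article keeps a reference to the object, a user interface could offer further interaction (for example showing the headquarters) without re-analysing the text.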
Knowledge Access
• Access to inferred and extracted knowledge is encapsulated by a Knowledge Access layer.
• It provides access to knowledge for Semantic User Interaction.
Knowledge Extraction Pipelines
• The main challenge for a semantic CMS is to extract knowledge, in the form of semantic metadata, from the stored content.
• A separate layer for Knowledge Extraction Pipelines encapsulates the algorithms for semantic metadata extraction.
• Typically, knowledge extraction is a multistage process [FL04] that applies different information extraction (IE) and information retrieval (IR) algorithms.
Pipeline Processing - Example
• Stages: Content → Pre-Processing → Entity Extraction → Relation Extraction
• Example sentence: “John Miller has bought a Jaguar car this year.”
• Extracted annotations: Person, Car Manufacturer, Time, Relation
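The stages of such a pipeline can be sketched as a chain of plain Python functions. This is a minimal illustration under stated assumptions, not the IKS implementation: the gazetteer lookup stands in for real entity recognizers, the "bought" trigger rule stands in for a real relation extractor, and the names `Annotation`, `GAZETTEER`, and `run_pipeline` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    text: str
    label: str


# Hypothetical gazetteer: surface form -> entity type (assumption for this sketch).
GAZETTEER = {
    "John Miller": "Person",
    "Jaguar": "CarManufacturer",
    "this year": "Time",
}


def preprocess(content: str) -> str:
    """Pre-processing stage: normalize whitespace."""
    return " ".join(content.split())


def extract_entities(text: str) -> list[Annotation]:
    """Entity extraction stage: look up known surface forms, in text order."""
    found = [(text.find(form), Annotation(form, label))
             for form, label in GAZETTEER.items() if form in text]
    return [annotation for _, annotation in sorted(found, key=lambda pair: pair[0])]


def extract_relations(text: str, entities: list[Annotation]) -> list[tuple[str, str, str]]:
    """Relation extraction stage: toy rule triggered by the word 'bought'."""
    if " bought " not in text:
        return []
    persons = [e.text for e in entities if e.label == "Person"]
    makers = [e.text for e in entities if e.label == "CarManufacturer"]
    return [(person, "bought", maker) for person in persons for maker in makers]


def run_pipeline(content: str):
    """Run all stages in order; each stage consumes the previous stage's output."""
    text = preprocess(content)
    entities = extract_entities(text)
    relations = extract_relations(text, entities)
    return entities, relations


entities, relations = run_pipeline("John Miller has bought a Jaguar car this year.")
print(entities)
print(relations)
```

Keeping each stage as a separate function mirrors the idea of a pipeline: a stage can be replaced by a better algorithm without touching the others.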
Reasoning
• After content has been lifted to a semantic level, the extracted information may be used as input for reasoning techniques in the Reasoning layer.
• Logical reasoning is a well-known artificial intelligence technique that uses semantic relations to derive knowledge about the content that was not explicitly known before.
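How reasoning derives knowledge that was not stated explicitly can be illustrated with a small forward-chaining sketch in Python. It assumes two RDFS-style rules (transitivity of `subClassOf`, and propagation of `type` along `subClassOf`); the predicate names and the example facts are assumptions chosen for this sketch, not taken from the lecture.

```python
Triple = tuple[str, str, str]


def infer(facts: set[Triple]) -> set[Triple]:
    """Apply two rules repeatedly until no new triple is produced.

    Rule 1: (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
    Rule 2: (x type B),       (B subClassOf C)  =>  (x type C)
    """
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            for s2, p2, o2 in known:
                # Both rules need a subClassOf triple that starts where (s, p, o) ends.
                if p2 != "subClassOf" or s2 != o:
                    continue
                if p in ("subClassOf", "type"):
                    new.add((s, p, o2))
        if new <= known:  # fixed point reached
            return known
        known |= new


facts = {
    ("Jaguar", "type", "CarManufacturer"),
    ("CarManufacturer", "subClassOf", "Company"),
    ("Company", "subClassOf", "Organization"),
}
derived = infer(facts) - facts
print(sorted(derived))
```

The derived triples, such as Jaguar being an Organization, were never entered explicitly; they follow from the relations defined between the concepts.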
Knowledge Models
• Knowledge (representation) models define the semantic metadata that is used to express knowledge.
• Ontologies can be used to define semantic metadata by specifying so-called concepts and their semantic relations.
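As a rough illustration of concepts and their semantic relations, the following Python sketch models a tiny ontology and checks whether a statement conforms to it. It is deliberately simplified and does not use an ontology language such as OWL; the concept names, relation names, and the `is_valid` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    name: str
    domain: str   # concept allowed as subject
    range_: str   # concept allowed as object


# Hypothetical concepts and relations, chosen only for this sketch.
CONCEPTS = {"Person", "CarManufacturer", "City"}
RELATIONS = {
    "boughtCarFrom": Relation("boughtCarFrom", "Person", "CarManufacturer"),
    "headquarteredIn": Relation("headquarteredIn", "CarManufacturer", "City"),
}


def is_valid(subject_concept: str, relation_name: str, object_concept: str) -> bool:
    """Check a statement against the model: known relation, matching domain and range."""
    relation = RELATIONS.get(relation_name)
    if relation is None:
        return False
    if subject_concept not in CONCEPTS or object_concept not in CONCEPTS:
        return False
    return relation.domain == subject_concept and relation.range_ == object_concept


print(is_valid("CarManufacturer", "headquarteredIn", "City"))
print(is_valid("Person", "headquarteredIn", "City"))
```

The model fixes which kinds of statements are meaningful, so extraction and reasoning components can share one vocabulary.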
Knowledge Repository
• Knowledge is stored in a Knowledge Repository that defines the fundamental data structure for knowledge.
• State-of-the-art knowledge repositories implement a triple store, where a triple is formed by a subject, a predicate, and an object.
• A triple can express any relation between a subject and an object.
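The triple data structure can be sketched as a minimal in-memory store in Python. Real triple stores add indexing, persistence, and a query language; the `TripleStore` class and the example identifiers below are assumptions made for illustration only.

```python
from typing import Optional


class TripleStore:
    """Minimal in-memory triple store with wildcard pattern matching."""

    def __init__(self) -> None:
        self._triples: set[tuple[str, str, str]] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> list[tuple[str, str, str]]:
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add("JohnMiller", "bought", "Car42")
store.add("Car42", "manufacturedBy", "Jaguar")
store.add("JohnMiller", "type", "Person")

# Everything known about JohnMiller:
print(store.match(subject="JohnMiller"))
# Who or what is related to Jaguar, by any predicate:
print(store.match(obj="Jaguar"))
```

Because every statement has the same three-part shape, new kinds of relations can be stored without changing the schema of the repository.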
Knowledge Administration
• Knowledge Administration ranges from the management of
  - Semantic User Interaction templates,
  - Knowledge Extraction Pipelines, and
  - Reasoning
  to the administration of Knowledge Models and Repositories.
Integration
• Semantic User Interface
• Content column: User Interface, Content Access, Content Management, Content Data Model, Content Repository, Content Administration
• Knowledge column: Semantic User Interaction, Knowledge Access, Knowledge Extraction Pipelines, Reasoning, Knowledge Models, Knowledge Repository, Knowledge Administration
Implementation of the Reference Architecture
• Reference implementation within the IKS project
  - IKS: An open source community to bring semantic technologies to CMS platforms
  - New incubating project at the Apache Software Foundation: http://incubator.apache.org/stanbol
Implementation of the Reference Architecture
• One-year student project: Information-Driven Software Engineering (ID|SE)
  - Extract knowledge from unstructured software specification documents
  - Case study: 10,000-page specification of the German Health Card system
Breathing Life into the Reference Architecture
[Figure: the integrated reference architecture from the Integration slide, annotated with "Content Management" and "ID|SE Platform"]
Problem Statement
Requirements Engineering → Analysis & Design → Implementation & Test
?
Problem Statement
• Documents and artifacts created in the software development process contain implicit information:
  - Type of the document (e.g. requirements specification)
  - Named entities (e.g. actor “User”)
• Relations between the different documents are not obvious:
  - Thematically similar
  - Duplicates
ID|SE Demo
http://idse.cs.upb.de:8082/opencms/opencms/idse
ID|SE-Platform – Architecture
• <<OpenCMS>> ContentManagementSystem: ContentManagement, Document-ContentStorage
• ID|SE-Service-Platform: IE/IR-Service-Orchestrators, EvaluationServices, MetaDataSearch, IE/IR-Services, Meta-Data-Model, Meta-Data-Storage
Mapping with Reference Architecture
ID|SE-Platform – 1. Send Request to the ID|SE Platform
• Components involved: <<OpenCMS>> Content Management System, <<OpenCMS-Module>> GUI, Webservice, ID|SE-Service Platform, IEIR-ServiceOrchestrators, DefaultMetaDataCreator, IDefaultMetaDataCreator
ID|SE-Platform – 2. Providing Documents
• Components involved: ID|SE-Service Platform, <<OpenCMS>> Content Management System, IEIR-ServiceOrchestrators, DefaultMetaDataCreator, <<component>> DocumentProvider, Content-Management, IProvideDocuments, Webservice, DocumentContentStorage, OpenCMSDocumentProviderProxy
ID|SE-Platform – 3. Generation of Meta-Data
• Components involved: IE/IR-ServiceOrchestrators, DefaultMetaDataCreator, IE/IR-Services, EvaluationServices, ContentExtraction, Preprocessors, Classifier, Clusterer, NamedEntityRecognizer, InformationAggregator, MetaDataModel, MetaDataStorage
ID|SE-Platform – 4. Providing/Presenting Meta-Data
• Components involved: <<OpenCMS>> Content Management System, <<OpenCMS-Module>> ArtifactSearchGUI, Webservice, Meta-Data-Search, IEIR-Services, MetaDataSearchEngine, MetaDataModel, MetaDataStorage
ID|SE Features
• Clustering of artefacts
• Classification of artefacts
• Named entity recognition
• Duplicate Check
• Faceted Search
• “Which artefacts are about ‘XYZ’?”
• No redundancy in software specification documents
• Efficient way of browsing through content
Evaluation Criteria
Recall
Precision
F-Measure
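The three criteria can be computed from a set of predicted items and a set of relevant (gold-standard) items. The sketch below uses the standard definitions and assumes the balanced F-measure (F1), since the slides do not state which weighting was used; the function name and example sets are hypothetical.

```python
def precision_recall_f1(predicted: set, relevant: set) -> tuple[float, float, float]:
    """Precision = TP / |predicted|, Recall = TP / |relevant|,
    F1 = harmonic mean of precision and recall (0.0 when undefined)."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Example: 4 entities were recognized, 3 are in the gold standard, 2 overlap.
predicted = {"John Miller", "Jaguar", "car", "year"}
relevant = {"John Miller", "Jaguar", "this year"}
precision, recall, f1 = precision_recall_f1(predicted, relevant)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision penalizes wrong results, recall penalizes missed results, and the F-measure combines both into a single number.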
Evaluation of Semantic Features
[Figure: three bar charts (Clustering, Classification, Entity Recognition), each showing F-Measure, Precision, and Recall on an axis from 0% to 100%. Data labels in the charts: 88%, 84%, 77%, 74%, 72%, 64%, 58%, 56%.]
Lessons Learned ...
• Now you should know ...
  - ... the architectural requirements for a semantic CMS.
  - ... the integration concept of two loosely coupled columns.
  - ... the components of the reference architecture.
  - ... how the reference architecture model can be used to build a semantic CMS from scratch and how an existing system can be extended.