Knowledge Publishing and Access on the Semantic Web

Next Generation Semantic Web Applications
Prof. Enrico Motta
Director, Knowledge Media Institute
The Open University
Milton Keynes, UK
Structure of the Talk
• Quick Recap: What is the Semantic Web?
• State of the art: 1st Generation SW Applications
– Emphasis on ontology-driven data aggregation
– Limited with respect to their ability to exploit large
scale, heterogeneous semantic markup
• Key research issues
– What needs to be done to enable the effective
development of the next generation of SW Applications
– Need for a different approach to some key research areas
– How the SW itself can be exploited to address such key
research issues
Quick Recap: What is the Semantic Web?
The Semantic Web
A large scale, heterogeneous collection
of formal, machine processable,
ontology-based statements (semantic
metadata) about web resources and
other entities in the world, expressed
in an XML-based syntax
[Figure: Ontology and Metadata layers, with the metadata shown as many <RDF triple> statements, and the UoD]
[Figure: ontology fragment with classes Person, Organization, Organization-Unit and String, and properties hasAffiliation, worksInOrgUnit, hasJobTitle and partOf]
<akt:Person rdf:about="akt:EnricoMotta">
  <rdfs:label>Enrico Motta</rdfs:label>
  <akt:hasAffiliation rdf:resource="akt:TheOpenUniversity"/>
  <akt:hasJobTitle>kmi director</akt:hasJobTitle>
  <akt:worksInOrgUnit rdf:resource="akt:KnowledgeMediaInstitute"/>
  <akt:hasGivenName>enrico</akt:hasGivenName>
  <akt:hasFamilyName>motta</akt:hasFamilyName>
  <akt:worksInProject rdf:resource="akt:Neon"/>
  <akt:worksInProject rdf:resource="akt:X-Media"/>
  <akt:hasPrettyName>Enrico Motta</akt:hasPrettyName>
  <akt:hasPostalAddress rdf:resource="akt:KmiPostalAddress"/>
  <akt:hasEmailAddress>e.motta@open.ac.uk</akt:hasEmailAddress>
  <akt:hasHomePage rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
</akt:Person>
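To make the link between such a description and RDF triples concrete, here is a minimal sketch that reads a shortened copy of the description with Python's standard XML parser and lists the (subject, property, value) statements it contains. The wrapping rdf:RDF element, the akt namespace URI, and the helper names `local` and `triples` are assumptions made for illustration; a real application would use a proper RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
AKT_NS = "http://example.org/akt#"  # placeholder namespace URI (assumption)

# Shortened copy of the slide's description, wrapped so that it parses.
DOC = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:akt="{AKT_NS}">
  <akt:Person rdf:about="akt:EnricoMotta">
    <akt:hasAffiliation rdf:resource="akt:TheOpenUniversity"/>
    <akt:hasJobTitle>kmi director</akt:hasJobTitle>
    <akt:worksInOrgUnit rdf:resource="akt:KnowledgeMediaInstitute"/>
  </akt:Person>
</rdf:RDF>
"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.split("}", 1)[1]


def triples(xml_text):
    """Return the (subject, property, value) statements in the document."""
    root = ET.fromstring(xml_text)
    statements = []
    for node in root:
        subject = node.get(f"{{{RDF_NS}}}about")
        # A typed node element states the class of its subject.
        statements.append((subject, "rdf:type", local(node.tag)))
        for prop in node:
            resource = prop.get(f"{{{RDF_NS}}}resource")
            value = resource if resource is not None else (prop.text or "").strip()
            statements.append((subject, local(prop.tag), value))
    return statements


for statement in triples(DOC):
    print(statement)
```

Each property element becomes one statement about akt:EnricoMotta, which is the sense in which the markup is a collection of RDF triples.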
SW = A Conceptual Layer
over the web
SW is Heterogeneous!
Generating semantic markup
[Figure: many <RDF triple> statements]
Key aspects of the SW
• Size (= Huge)
– Semantic markup is expected to eventually reach the same order of magnitude
as the web
• Conceptual Heterogeneity (= Big)
– Semantic markup based on many different ontologies
• Rate of change (= Very High)
– Data generated all the time by human and artificial agents...
• Provenance (= Very Heterogeneous)
– ...hence provenance itself is extremely heterogeneous
• Trust (= very variable and subjective)
– A side-effect of heterogeneous provenance
• Data Quality (= very variable)
– No guarantee of correctness
• Intelligence (= by-product of size and heterogeneity)
– Rather than a by-product of sophisticated problem solving
Compare with traditional KBS
• Size (= Small or Medium)
– KBS normally small to medium size
• Conceptual Heterogeneity (= Not an issue)
– KBS normally based on a single conceptual model
• Rate of change (= Very Low)
– Change rate under developers' control (hence, low)
• Provenance (= Not an issue)
– KBS are normally created ad hoc for an application by a
centralised team of developers
• Trust (= not a major issue)
– Centralisation of the development process implies no significant trust issues
• Data Quality (= not a major issue)
– Again, centralisation guarantees data quality across the board
• Intelligence (= by-product of complex, task-centric reasoning)
– E.g., sophisticated diagnostic, planning systems…
The Semantic Web today
1st Generation SW Applications
Bibliographic Data
CS Dept Data
AKT Reference Ontology
<rdf:Description rdf:about="http://www.ecs.soton.ac.uk/info/#person-01269">
  <ns0:family-name>Gibbins</ns0:family-name>
  <ns0:full-name>Nicholas Gibbins</ns0:full-name>
  <ns0:given-name>Nicholas</ns0:given-name>
  <ns0:has-email-address>nmg@ecs.soton.ac.uk</ns0:has-email-address>
  <ns0:has-affiliation-to-unit rdf:resource="http://194.66.183.26/WEBSITE/GOW/ViewDepartment.aspx?Department=750"/>
</rdf:Description>
</rdf:RDF>
RDF Data
Features of 1st generation
SW Applications
• Typically use a single ontology
– Usually providing a homogeneous view over
heterogeneous data sources
• Limited use of existing SW data
• Closed to semantic resources
• Limited interactivity
– In contrast with typical web 2.0 applications
Hence: current SW applications are far more similar
to traditional KBS (closed semantic systems) than to
'real' SW applications (open semantic systems)
It is still early days...
[Figure: two images labelled 1895 and 2006]
Next Generation SW Applications
Next generation SW
applications
NG SW Application
• Able to exploit the SW at large
– Hence: Multi-Ontology
• Supporting interactivity
– E.g., allowing users to add semantic data
– Hence, open with respect to SW resources
• Ideally also able to exploit non-SW data
– E.g., folksonomies
– Hence, embedding powerful information
extraction engines
Two systems we have built
Magpie
AquaLog
Magpie Components
[Figure: Magpie architecture, with the labels Ontology cache (Lexicon), Magpie Hub, Ontology-based Proxy Server, Jabber Server, Semantic Log, Problem Domain & Resources, Web Page and Enriched Web Page]
Sample message shown in the figure:
(found-item 3275578832 localhost #u"http://localhost/people/motta/" john-domingue john-domingue)
AquaLog: Ontology-Driven
Question Answering
[Figure: AquaLog pipeline]
• INPUT (NL SENTENCE): Which is the capital of Spain?
• Linguistic Analysis produces the QUERY TRIPLES: (?, capital, Spain)
• Mapping Engine produces the RESULT TRIPLES: <Spain, has-capital-city, Madrid>
• NL Generation produces the ANSWER: Madrid
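The AquaLog example can be traced end to end with a toy sketch. It is not AquaLog's implementation: the question pattern, the `LEXICON` table and the one-triple knowledge base are all assumptions chosen to reproduce the slide's example only.

```python
import re

# Toy knowledge base and lexicon (both hypothetical).
KB = {("Spain", "has-capital-city", "Madrid")}
LEXICON = {"capital": "has-capital-city"}  # user term -> ontology relation


def linguistic_analysis(question):
    """Turn 'Which is the X of Y?' into the query triple (?, X, Y)."""
    match = re.fullmatch(r"Which is the (\w+) of (\w+)\?", question.strip())
    if match is None:
        raise ValueError("unsupported question form")
    return ("?", match.group(1), match.group(2))


def mapping_engine(query_triple, kb, lexicon):
    """Map the user's term to an ontology relation and fetch matching triples."""
    _, term, entity = query_triple
    relation = lexicon.get(term, term)
    return [(s, p, o) for (s, p, o) in kb if s == entity and p == relation]


def answer(question):
    """Run the whole pipeline and return the answer values."""
    results = mapping_engine(linguistic_analysis(question), KB, LEXICON)
    return [o for (_, _, o) in results]


print(linguistic_analysis("Which is the capital of Spain?"))
print(answer("Which is the capital of Spain?"))
```

The point of the sketch is the separation of concerns: the linguistic step knows nothing about the ontology, and the mapping step is the only place where user terminology meets ontology terminology.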
PowerMagpie: Semantic
browsing on the 'open' SW
Need for mechanisms for automatically
identifying semantic markup relevant to
the current page, user, browsing session,
etc.
PowerAqua: QA on the 'open'
semantic web
Need for mechanisms for automatically
locating ontologies relevant to the current
query, mapping user terminology to ontologies,
integrating information from different ontologies, etc.
What needs to be done to facilitate the
development of such 2nd generation SW
applications?
Dynamic Ontology Selection
• First: powerful support for ontology selection
• Both PowerAqua and PowerMagpie heavily rely
on ontology selection to locate possibly
relevant knowledge in response to
– User queries (PowerAqua)
– Accessing web pages (PowerMagpie)
• Hence, ontology selection is a crucial task for
both systems
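One simple way to picture run-time ontology selection is to rank candidate ontologies by how many of the query terms they cover. The sketch below does only that; the coverage score, the function name `select_ontologies` and the toy vocabularies are assumptions for illustration, not the selection method used by PowerAqua or PowerMagpie.

```python
def select_ontologies(terms, ontologies):
    """Rank ontologies by the fraction of query terms found in their vocabulary.

    `ontologies` maps an ontology name to an iterable of term labels.
    Ontologies covering no term are dropped; ties are broken by name.
    """
    wanted = {t.lower() for t in terms}
    scored = []
    for name, vocabulary in ontologies.items():
        covered = wanted & {v.lower() for v in vocabulary}
        if covered:
            scored.append((len(covered) / len(wanted), name))
    return [name for score, name in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical candidate ontologies.
CANDIDATES = {
    "O1": ["Ham", "Meat", "Food"],
    "O2": ["Wine", "Seafood"],
    "O3": ["Ham", "Seafood", "Meat"],
    "O4": ["Person", "Organization"],
}

print(select_ontologies(["Ham", "Seafood"], CANDIDATES))
```

Exact label matching is the weakest possible notion of coverage; it is used here only to keep the example short.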
Current support for ontology
selection
Limitations of Swoogle
• Query/Search
– Only keyword search; we need more powerful query methods
(e.g., ability to pose formal queries)
• Repository structure
– Very weak in Swoogle; not even duplicates are dealt with
– Need for automatic derivation of relations between ontologies
• E.g., same-ontology-as, ontology-extends, ontology-incompatible-with, etc.
– We need these relations to structure the repository and to
support more powerful ranking methods (see next bullet point)
• Ontology ranking
– Swoogle only uses a 'popularity-based' ranking; we need other
methods as well
We also need:
• Methods for fast extraction of ontology
modules
– Typically we only want the part of the ontology
relevant to our current needs
• Methods for the integration of information
derived from different ontologies
– In the context of QA this problem typically reduces
to that of deciding whether two instances denote the
same entity
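As an illustration of extracting only the relevant part of an ontology, the sketch below keeps the terms of interest together with all their ancestors in a subclass hierarchy. This "upward closure" is just one simple, assumed criterion for what counts as relevant; the function name `extract_module` and the toy hierarchy are hypothetical.

```python
def extract_module(subclass_of, seeds):
    """Return the seed terms plus every class reachable via subclass links.

    `subclass_of` maps a class name to a list of its direct parents.
    """
    module, stack = set(), list(seeds)
    while stack:
        term = stack.pop()
        if term in module:
            continue
        module.add(term)
        stack.extend(subclass_of.get(term, ()))
    return module


# Toy hierarchy (hypothetical content).
HIERARCHY = {
    "Beef": ["RedMeat"],
    "RedMeat": ["MeatOrPoultry"],
    "MeatOrPoultry": ["Food"],
    "Wine": ["Drink"],
}

print(sorted(extract_module(HIERARCHY, ["Beef"])))
```

Everything unrelated to the seed terms (here, Wine and Drink) is left out, which is what makes the module cheaper to process at run time.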
Even more importantly...
• Need to look at a number of key research
issues in the context provided by NG-SW
applications
– Example: Ontology Mapping
• Current work focuses on design-time mapping of
complete ontologies
– Example: Ontology Selection
• Current work focuses on user-mediated ontology
selection
– Example: Ontology Modularization
• Current work by and large assumes that the user is in
the loop
A new application scenario
• NG-SW applications require algorithms able to
perform tasks such as selecting, modularizing,
and mapping ontologies at run time
• Moreover, in such a context, mapping is
concerned with mapping ontology fragments,
rather than complete ontologies
So What?
• Time to go beyond 1st generation applications
• 2nd generation SW applications will exploit
much more fully the large scale semantic
markup provided by the SW
• Many issues to be addressed:
– Better ontology crawling, indexing, retrieving and
ranking support
– Mapping, selection, and modularization methods
appropriate for NG-SW applications
– Further acceleration needed in the generation of
semantic markup
Exploiting the SW itself to
tackle its heterogeneity
• Interestingly, an NG-SW-based approach can
also be used to tackle key SW tasks, such
as Ontology Mapping
– Based on the use of the SW itself as background
knowledge
Exploiting Large-Scale Semantics
Case Study:
Using the Semantic Web as background
knowledge in Ontology Mapping
Ontology Mapping: State of
the Art
• State-of-the-art methods rely on a
combination of:
– Label similarity methods
• e.g., Full_Professor = FullProfessor
– Structure similarity methods
• Using taxonomic information or information about
domain and range of associated properties
• However, as pointed out by Aleksovski et al.
(EKAW, 2006):
– In many cases there is insufficient lexical overlap
– In many cases the source and target ontologies do not have
sufficient structure to allow effective structure-based
mapping
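As a small illustration of the label similarity methods listed above (e.g., Full_Professor = FullProfessor), the sketch below compares labels after splitting them into lowercase tokens. The tokenisation rules and the function names `tokens` and `labels_match` are assumptions; real matchers typically add stemming, synonyms and string distance measures.

```python
import re


def tokens(label):
    """Split a label on underscores, hyphens, spaces and camelCase boundaries."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    return [t.lower() for t in re.split(r"[\s_\-]+", spaced) if t]


def labels_match(a, b):
    """True when both labels consist of the same token sequence."""
    return tokens(a) == tokens(b)


print(tokens("Full_Professor"), tokens("FullProfessor"))
print(labels_match("Full_Professor", "FullProfessor"))
print(labels_match("Full_Professor", "Researcher"))
```

The last call shows the limitation noted above: when two labels share no tokens, a purely lexical method has nothing to work with, whatever the actual relation between the concepts.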
Use of bkg. knowledge for
ontology mapping
[Figure: Background Knowledge used to determine the unknown relation (?) between A and B]
External Source = One Ontology
Aleksovski et al. EKAW’06
• Map candidate terms into concepts from a richly axiomatized domain
ontology (anchors)
• Derive a mapping based on the relation of the anchor terms
[Figure: A and B are anchored to A’ and B’ in the domain ontology; the relation rel between A’ and B’ gives the relation rel between A and B]
Advantages:
• Handles dissimilar ontologies
• Returns semantic mappings
Disadvantages:
• Assumes that a suitable domain
ontology is available
• Approach only suitable for closed
domains
External Source = Web
van Hage et al. ISWC’05
• Rely on Google and an online dictionary in the food domain to extract
semantic relations between candidate mappings using IR techniques
[Figure: IR Methods applied to the Web + OnlineDictionary give the relation rel between A and B]
Advantages:
• General purpose
Disadvantages:
• IR Methods introduce noise
External Source = WordNet
Lopez et al. ESWC’05
• Use WordNet to map queries expressed in the user's
terminology to a domain ontology to support question
answering
[Figure: WordNet gives the relation rel between A and B]
Advantages:
• General purpose
Disadvantages:
• Knowledge sparseness
• Works best with concepts, not
so useful with relations
• WordNet is not an ontology!
Knowledge-poor ontology
mapping
• Actually, isn’t it a bit strange that such complex
and knowledge-poor methods are devised
when the SW already provides so much
background knowledge?
External Source = SW
Proposal:
• Rely on online ontologies (Semantic Web) to derive mappings
• Ontologies are dynamically discovered and combined
[Figure: the Semantic Web gives the relation rel between A and B]
Advantages:
• General purpose
• Does not introduce noise
• Works with any kind of domain
entities (concepts, relations,
instances)
Strategy 1 - Definition
Find ontologies that contain classes equivalent to A and B, and use their
relationship in those ontologies to derive the mapping.
[Figure: on the Semantic Web, A and B correspond to A1’ and B1’ in ontology O1, A2’ and B2’ in O2, …, An’ and Bn’ in On; the relation found there gives the relation rel between A and B]
For each ontology use these rules:
• A' ≡ B' ⇒ A ≡ B
• A' ⊑ B' ⇒ A ⊑ B
• A' ⊒ B' ⇒ A ⊒ B
• A' ⊥ B' ⇒ A ⊥ B
These rules can be extended to take into
account indirect relations between A’ and
B’, e.g., between parents of A’ and B’:
A' ⊑ C ∧ C ⊥ B' ⇒ A' ⊥ B'
Strategy 1 - Variants
Quick variant: Stop as soon as a relation is found
[Figure: the relation between A1’ and B1’ in a single ontology O1 on the Semantic Web gives the relation between A and B]
Strategy 1 - Variants
Precise variant: Derive all possible mappings from all ontologies
and combine them into a final mapping.
[Figure: the relations between A1’ and B1’ in O1 and between A2’ and B2’ in O2 are combined into the relation between A and B]
Dealing with Contradictions:
• Return all mappings even if contradictory
• Return a mapping only when there is no
contradiction
• Return the most frequent mapping (i.e., the
mapping derived from most ontologies)
• Return the mappings with 'higher authority'
(based on metrics of ontology evaluation or
trust)
• Try to combine mappings, e.g., A ⊑ B ∧ B ⊑ A ⇒ A ≡ B
Strategy 1 - Examples
[Figure: two mappings derived from ontologies found on the Semantic Web]
• Beef (SR-16) vs. Food (FAO_Agrovoc): Beef ⊑ RedMeat ⊑ MeatOrPoultry ⊑ Food (Tap), hence Beef ⊑ Food
• Researcher (ISWC) vs. AcademicStaff (SWRC): Researcher ⊑ AcademicStaff (ka2.rdf)
Strategy 2 - Definition
Principle: If no ontologies are found that contain the two terms then
combine information from multiple ontologies to find a mapping.
Details:
(1) Select all ontologies containing A’ equivalent to A
(2) For each ontology containing A’:
(a) if A' ⊑ C, find the relation between C and B.
(b) if A' ⊒ C, find the relation between C and B.
Rules:
(r1) A' ⊑ C ∧ C ⊑ B ⇒ A ⊑ B
(r2) A' ⊑ C ∧ C ≡ B ⇒ A ⊑ B
(r3) A' ⊑ C ∧ C ⊥ B ⇒ A ⊥ B
(r4) A' ⊒ C ∧ C ⊒ B ⇒ A ⊒ B
(r5) A' ⊒ C ∧ C ≡ B ⇒ A ⊒ B
[Figure: on the Semantic Web, A corresponds to A’, which is related to C’; the relation rel between C and B then gives the relation rel between A and B]
Strategy 2 - Examples
Ex1: Chicken vs. Food
• Chicken ⊑ Poultry and Poultry ⊑ Food (from midlevel-onto and Tap)
• By (r1): Chicken ⊑ Food
• Same results for Duck, Goose, Turkey
Ex2: Ham vs. Food
• Ham ⊑ Meat (pizza-to-go) and Meat ⊑ Food (SUMO)
• By (r1): Ham ⊑ Food
Ex3: Ham vs. Seafood
• Ham ⊑ Meat (pizza-to-go) and Meat ⊥ Seafood (wine.owl)
• By (r3): Ham ⊥ Seafood
Conclusions
• Using the SW as background knowledge for
ontology mapping has several benefits
– Suitable for our NG-SW scenario as there is no need
for design-time selection of background knowledge
– Even when design-time selection is feasible, it helps
in those cases where a suitable domain
ontology cannot be found
– Reduces noise by exploiting only ontologies
– Can be tailored to handle multiple solutions
– Can be integrated with other approaches, based on
lexical and structural analysis
If you would like to find out
more...
• 'Vision' papers
– Motta, E., Sabou, M. (2006). "Next Generation Semantic
Web Applications". 1st Asian Semantic Web Conference,
Beijing.
– Motta, E., Sabou, M. (2006). "Language Technologies and
the Evolution of the Semantic Web". LREC 2006, Genoa,
Italy.
– Motta, E. (2006). "Knowledge Publishing and Access on
the Semantic Web: A Socio-Technological Analysis".
IEEE Intelligent Systems, Vol.21, 3, (88-90).
• Ontology Modularization
– D'Aquin, M., Sabou, M., Motta, E. (2006). "Modularization:
A key for the dynamic selection of relevant knowledge
components". ISWC 2006 Workshop on Ontology
Modularization.
• Ontology Mapping
– Lopez, V., Sabou, M., Motta, E. (2006). "Mapping the
real semantic web on the fly". International
Semantic Web Conference, Athens, Georgia.
– Sabou, M., D'Aquin, M., Motta, E. (2006). "Using the
semantic web as background knowledge for
ontology mapping". ISWC 2006 Workshop on
Ontology Mapping.
• Ontology Selection
– Sabou, M., Lopez, V., Motta, E. (2006). "Ontology
Selection for the Real Semantic Web: How to Cover
the Queen’s Birthday Dinner?". Proceedings of EKAW
2006, Podebrady, Czech Republic.