Semantic Web Application

Dr Abdulrahman Altahhan
Course Director for MSc in Data Science
Coventry University,
ab8556@coventry.ac.uk
Computers
• Originally: computers were used for numerical calculation
• Currently: computers are used for information processing
– Database
– Text Processing
– Games
The Web
• Seeking, using or logging information
• Socialising
• Business: buying and selling
• What is the main entry point to the Web world?
• Search engines
Search Engine
• Search engines play a game called prediction
• The user provides some clues about what s/he wants
• The search engine finds what it believes the user wants
Search Engines
• Search engines (SE) are really doing prediction (classification)
– relevant/non-relevant documents
• Do they have high recall but low precision?
Predicted relevance vs. actual relevance:
                        Predicted relevant      Predicted non-relevant
Actually relevant       True Positive (TP)      False Negative (FN)      Recall = TP/(TP+FN)
Actually non-relevant   False Positive (FP)     True Negative (TN)
                        Precision = TP/(TP+FP)
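The two formulas can be checked on a toy query. This is a minimal sketch: the document ids and the judgement of which documents are relevant are made up for illustration, and the function name precision_recall is hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved vs. truly relevant document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)   # relevant documents that were returned
    fp = len(retrieved - relevant)   # returned documents that are not relevant
    fn = len(relevant - retrieved)   # relevant documents that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical query: 8 documents returned, 5 documents relevant overall.
retrieved = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
relevant = ["d1", "d2", "d3", "d4", "d9"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here most relevant documents are found (high recall) but half of what is returned is noise (low precision), which is the situation the slide describes.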
Search Engines
• Despite improvements, search engines still have problems:
– High recall, low precision
– Sensitivity to vocabulary (keywords)
– Semantically similar queries do not return the same results
– Each result is a single Web page…
Solution…?
• Either come up with more sophisticated retrieval techniques
• Or cheat…!
– Then we need to change the way we store things!
– We need to re-engineer the Web to be more suitable for machines
• Or both: the Semantic Web
Web story
• HTML: concerns the look only
– <H1> Title: Professor </H1>
• XML: concerns structure, using your own local tags
– <Title> Professor </Title>
• RDF: concerns relationships, with a user-defined schema
– Makes no assumption about the domain
– All resources have a URI
– <“Mozart”, composed, “The Magic Flute”>
• RDFS: extends RDF with a standard ontology vocabulary
– Class, Property; type, subClassOf; domain, range
• Ontology: extends RDFS with the capability of constructing classes, as well as agreeing on a domain's terms
Semantic Web Vision
Machine-processable, global Web standards:
• Assigning unambiguous names (URI)
• Expressing data, including metadata (RDF)
• Capturing ontologies (OWL)
• Query, rules, transformations, deployment, application spaces, logic, proofs, trust (in progress)
[Source: Emerging Web Technologies to Watch, Steve Bratt, W3C]
XML
User definable and domain specific markup
HTML:
<H1>Internet and World Wide Web</H1>
<UL>
<LI>Code: G52IWW
<LI>Students: Undergraduate
</UL>
XML:
<module>
<title>Internet and World Wide Web</title>
<code>G52IWW</code>
<students>Undergraduate</students>
</module>
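Because the XML tags name the fields, a program can pull them out without guessing from layout. A minimal sketch using Python's standard xml.etree.ElementTree module; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

xml_doc = """
<module>
  <title>Internet and World Wide Web</title>
  <code>G52IWW</code>
  <students>Undergraduate</students>
</module>
""".strip()

module = ET.fromstring(xml_doc)

# Each child element becomes a named field; no screen-scraping of <LI> text needed.
record = {child.tag: child.text for child in module}
print(record["code"])
```

With the HTML version, the same program would have to split strings such as "Code: G52IWW" and hope the author always writes them the same way.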
RDF
• Resources
– Books, persons, etc.
• Properties
– “written by”, “title”, “married to”, etc.
• Statements
– object-attribute-value triples
RDF for semantic annotation
• RDF provides metadata about Web resources
• Object -> Attribute -> Value triples
• It has an XML syntax
• Chained triples form a graph
[Figure: RDF graph around the resource #Payam. Node labels: http://sepang.nottingham.edu.my/~bpayam/#Payam, http://sepang.nottingham.edu.my/~bpayam/images/payam-barnaghi.png, UNiM, payam@nottingham, http://www.nottingham.edu.my/CSIT/G53ELC. Edge labels: has_image, has_owner, has_email, has_teaching.]
<rdf:Description rdf:about="#Payam">
<has_email>payam@nottingham</has_email>
</rdf:Description>
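The idea that chained triples form a graph can be sketched with plain Python tuples. Only the has_email statement is taken from the XML snippet; the direction of has_teaching is an assumption, and the has_title triple is entirely hypothetical, added so there is a second hop to follow.

```python
PAYAM = "http://sepang.nottingham.edu.my/~bpayam/#Payam"
MODULE = "http://www.nottingham.edu.my/CSIT/G53ELC"

# Each statement is an (object, attribute, value) triple.
triples = [
    (PAYAM, "has_email", "payam@nottingham"),            # from the XML snippet
    (PAYAM, "has_teaching", MODULE),                     # assumed direction
    (MODULE, "has_title", "hypothetical module title"),  # hypothetical triple
]

def values(subject, attribute):
    """All values v such that (subject, attribute, v) is a statement."""
    return [v for s, a, v in triples if s == subject and a == attribute]

def follow(subject, *path):
    """Follow a chain of attributes through the graph."""
    frontier = [subject]
    for attribute in path:
        frontier = [v for node in frontier for v in values(node, attribute)]
    return frontier

print(values(PAYAM, "has_email"))
print(follow(PAYAM, "has_teaching", "has_title"))
```

The value of one triple (the module URI) is the object of another, which is what lets a program walk from a person to the title of what they teach.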
RDF Example
Source: http://www.w3.org/TR/swbp-skos-core-guide/
What does RDF Schema add?
• Defines vocabulary for RDF
• Organizes this vocabulary in a typed hierarchy
• Class, subClassOf, type
• Property, subPropertyOf
• domain, range
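[Figure, adapted from Studer et al, 04: two layers. Schema (RDFS): classes Staff, Lecturer and Research Assistant connected by subClassOf, and a property supervisedBy with a domain and a range. Data (RDF): instances Tom and Alan, attached to classes by type and related by supervisedBy.]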
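What the schema layer buys us can be sketched as a small type-inference routine. The exact arrows of the figure did not survive, so the following is an assumption: Lecturer and Research Assistant are subclasses of Staff, supervisedBy has domain Research Assistant and range Lecturer, and Alan is supervisedBy Tom. The function name rdfs_types is hypothetical.

```python
schema = {
    ("Lecturer", "subClassOf", "Staff"),
    ("ResearchAssistant", "subClassOf", "Staff"),
    ("supervisedBy", "domain", "ResearchAssistant"),
    ("supervisedBy", "range", "Lecturer"),
}
data = {("Alan", "supervisedBy", "Tom")}

def rdfs_types(schema, data):
    """Infer (instance, class) pairs from domain, range and subClassOf."""
    types = {(s, o) for s, p, o in data if p == "type"}
    # domain/range: using a property tells us the class of its two ends
    for s, p, o in data:
        for prop, kind, cls in schema:
            if prop == p and kind == "domain":
                types.add((s, cls))
            if prop == p and kind == "range":
                types.add((o, cls))
    # subClassOf: an instance of a class is an instance of its superclasses
    changed = True
    while changed:
        changed = False
        for inst, cls in list(types):
            for sub, kind, sup in schema:
                if kind == "subClassOf" and sub == cls and (inst, sup) not in types:
                    types.add((inst, sup))
                    changed = True
    return types

inferred = rdfs_types(schema, data)
print(sorted(inferred))
```

From a single data triple, the schema lets a program conclude that Alan is a Research Assistant, Tom is a Lecturer, and both are Staff.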
Basic Queries: Example
select X, Y
from {X} writtenBy {Y}
X and Y are variables; {X} writtenBy {Y} represents a resource-property-value triple
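The query above amounts to pattern matching over triples. A minimal sketch in Python; the identifiers such as book:SemanticWebPrimer are made-up names for illustration, and select is a hypothetical helper, not part of any query language.

```python
triples = [
    ("book:SemanticWebPrimer", "writtenBy", "person:Antoniou"),
    ("book:SemanticWebPrimer", "writtenBy", "person:vanHarmelen"),
    ("book:SemanticWebPrimer", "title", "A Semantic Web Primer"),
]

def select(triples, prop):
    """Bind X and Y for every triple {X} prop {Y}."""
    return [(x, y) for x, p, y in triples if p == prop]

rows = select(triples, "writtenBy")
for x, y in rows:
    print(x, y)
```

The title triple is skipped because its property does not match the pattern.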
Conclusions about RDF(S)
• Next step up from plain XML:
– (small) ontological commitment to modeling
primitives
– possible to define vocabulary
• However:
– no precisely described meaning
– no inference model
[Davies, 03]
Ontologies
• The term originates from philosophy
• For the purposes of the Semantic Web:
– “An ontology is an explicit and formal specification of a conceptualisation”.
(R. Studer)
Ontologies and Semantic Web
• Provide a shared understanding of a domain
• Consist of a finite list of
– Terms (classes or properties)
• Ex.: staff members, students, courses, modules, lecture theatres and schools are important concepts (terms) in a university domain
– Relationships between those terms
• Ontologies are useful for improving the accuracy of Web searches
• Web searches can exploit generalization/specialization information
Ontologies and Semantic Web (cont’d)
• In the context of the Web, ontologies provide a
shared understanding of a domain.
• Such a shared understanding is necessary to overcome differences in terminology.
A Sample Ontology
[Figure: sample ontology graph. Node labels include Object, Person, Student, Researcher, PhD Student, Topic, Document, Affiliation, Ontology, F-Logic, Siggi and AIFB; edge labels include is_a, knows, writes, described_in, is_about, similar, subTopicOf and instance_of; a rules layer relates persons (P), documents (D) and topics (T).]
• Major Paradigms: Logic Programming, Description Logic
• Standards: RDF(S); OWL
[Studer et al, 04]
Ontologies (OWL)
• RDFS does not provide ways to:
– state the similarity and/or differences of terms
– construct classes, not just name them
– let a program reason about some terms. E.g.:
• “if «Person» resources «A» and «B» have the same «foaf:email» property, then «A» and «B» are identical”
– etc.
• This led to the development of OWL (Web Ontology Language)
• Basically, we would like to engineer the Web in a form similar to a set of domain databases
source: Introduction to the Semantic Web, Ivan Herman, W3C
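The "same foaf:email means identical" rule can be sketched as a grouping step over triples. This is an illustration of the kind of reasoning meant, not how an OWL reasoner is implemented; the resources ex:A, ex:B, ex:C and the addresses are hypothetical.

```python
from collections import defaultdict

people = [
    ("ex:A", "foaf:email", "mailto:alice@example.org"),
    ("ex:B", "foaf:email", "mailto:alice@example.org"),
    ("ex:C", "foaf:email", "mailto:carol@example.org"),
]

def same_as(triples, key_property="foaf:email"):
    """Pairs of resources that share a value for key_property."""
    by_value = defaultdict(list)
    for s, p, o in triples:
        if p == key_property:
            by_value[o].append(s)
    pairs = set()
    for resources in by_value.values():
        for i, a in enumerate(resources):
            for b in resources[i + 1:]:
                pairs.add((a, b))
    return pairs

identical = same_as(people)
print(identical)
```

In OWL, the same effect is obtained declaratively by stating that the property is inverse-functional, so the rule lives in the ontology rather than in application code.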
Classes in OWL
• In RDFS, you can subclass existing classes…
that’s all.
• In OWL, you can construct classes from
existing ones:
– enumerate its content
– through intersection, union, complement
– through property restrictions
source: Introduction to the Semantic Web, Ivan Herman, W3C
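The class constructors listed above behave like set operations on the instances of each class. A minimal sketch with Python sets, using hypothetical individuals; note that it assumes a closed universe, whereas OWL itself uses open-world semantics, so the complement here is only an approximation of the idea.

```python
# Hypothetical individuals and classes, for illustration only.
universe = {"Tom", "Alan", "Mary", "Sue"}
Lecturer = {"Tom", "Mary"}            # enumeration: members listed explicitly
Student = {"Alan", "Mary"}
supervises = {("Tom", "Alan"), ("Mary", "Sue")}

teaching_student = Lecturer & Student         # intersection
lecturer_or_student = Lecturer | Student      # union
non_lecturer = universe - Lecturer            # complement (closed-world assumption)

# Property restriction: everything that supervises at least one Student.
supervisor_of_student = {s for s, o in supervises if o in Student}

print(teaching_student, lecturer_or_student, non_lecturer, supervisor_of_student)
```

In RDFS, none of these derived classes could be defined; each would have to be named and populated by hand.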
OWL classes can be “enumerated”
The OWL solution, where possible content is
explicitly listed:
source: Introduction to the Semantic Web, Ivan Herman, W3C
Why develop an ontology?
• To define Web resources more precisely and make them more amenable to machine processing
• To make domain assumptions explicit
– Easier to change domain assumptions
– Easier to understand and update legacy data
• To separate domain knowledge from operational knowledge
– Re-use domain and operational knowledge separately
• A community reference for applications
• To share a consistent understanding of what information means
[Davies, 03]
Inference Example
Given the rules:
prof(X) → faculty(X)
faculty(X) → staff(X)
and the fact:
prof(michael)
we can deduce:
faculty(michael)
staff(michael)
prof(X) → staff(X)
source: A Semantic Web Primer, Grigoris Antoniou and Frank van Harmelen, MIT Press
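The deduction can be reproduced with a few lines of forward chaining. A minimal sketch, assuming rules of the simple one-premise form shown on the slide; the function name forward_chain is hypothetical.

```python
# Each rule (premise, conclusion) stands for premise(X) -> conclusion(X).
rules = [("prof", "faculty"), ("faculty", "staff")]
facts = {("prof", "michael")}

def forward_chain(rules, facts):
    """Apply the rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, individual in list(derived):
                if predicate == premise and (conclusion, individual) not in derived:
                    derived.add((conclusion, individual))
                    changed = True
    return derived

closure = forward_chain(rules, facts)
print(sorted(closure))
```

Only prof(michael) was stated; faculty(michael) and staff(michael) are new knowledge produced by the rules.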
Semantic Web and AI?
• No human-level intelligence claims
• As with today’s WWW
– large, inconsistent, distributed
• Requirements
– scalable, robust, decentralised
– tolerant, mediated
• The Semantic Web will make extensive use of current AI:
– any advancement in AI will lead to a better Semantic Web
– current AI is already sufficient to go towards realizing the Semantic Web vision
• As with the WWW, the Semantic Web will (need to) adapt fast
[Davies, 03]
Semantic Web & Knowledge Management
• Organising knowledge in conceptual spaces according to its meaning.
• Enabling automated tools to check for inconsistencies and extract new knowledge.
• Replacing query-based search with query answering.
• Defining who may view certain parts of the information.
Elsevier: Horizontal Information
Products
• Elsevier is a leading scientific publisher
• Its papers are organized according to journals
(vertical)
• Different types of journals
Problem
• Sometimes subscribers are interested in getting everything related to a certain topic that is spread across traditional disciplines
• Example: Alzheimer disease (biology, medicine, chemistry, etc.)
Elsevier: Horizontal Information
Products
Problem
• Topics cut across disciplines
– Alzheimer disease
– biology, medicine, chemistry, etc.
– The same topic has different names
Solution
– A thesaurus (a lightweight ontology)
– Each domain has its own (e.g. MeSH for medicine)
– Used to access information sources such as EMBASE and Science Direct
– Still not a full ontological approach
– But it is a start
– Why?
Elsevier: Horizontal Information
Products (cont’d)
Solution
– Elsevier uses EMTREE as a single underlying ontology against which all vertical information sources are indexed
• The Semantic Web plays the following roles:
– RDF is used as an interoperability format between heterogeneous data sources
– The EMTREE ontology is represented in RDF (not the best thing to do!)
– Each separate data source is mapped onto EMTREE
– The ontology is used as the entry point for all data sources
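Indexing every vertical source against one thesaurus can be sketched as follows. All terms, synonyms, source names and paper ids are hypothetical and are not real EMTREE or MeSH entries; index_by_concept is an illustrative helper.

```python
# Thesaurus: every local name for a topic maps to one preferred concept.
thesaurus = {
    "alzheimer disease": "Alzheimer disease",
    "alzheimer's disease": "Alzheimer disease",
    "senile dementia of the alzheimer type": "Alzheimer disease",
}

# Vertical sources: papers organised by journal, each with its own keyword.
sources = {
    "biology_journal": [("paper-1", "Alzheimer's disease")],
    "medicine_journal": [("paper-2", "senile dementia of the Alzheimer type")],
    "chemistry_journal": [("paper-3", "polymer synthesis")],
}

def index_by_concept(sources, thesaurus):
    """Build a horizontal index: concept -> [(source, paper_id), ...]."""
    index = {}
    for source, papers in sources.items():
        for paper_id, keyword in papers:
            concept = thesaurus.get(keyword.lower())
            if concept is not None:
                index.setdefault(concept, []).append((source, paper_id))
    return index

index = index_by_concept(sources, thesaurus)
print(index["Alzheimer disease"])
```

A subscriber asking for one concept now gets papers from both the biology and the medicine journals, even though each used a different name for the topic.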
Audi: Data Integration
Problem
• A similar problem to Elsevier's, but internal
• Data integration: the highest cost factor (IT-wise) for large companies
• Audi:
– 51,000 employees
– 22 billion in revenue
– 700,000 cars annually
– 1,000 databases: opportunities are missed because the data sources are not interconnected
– The databases cannot be queried with one simple query that returns reliable, timely information that could be used for decision making
– Audi relies on costly manual code generation and point-to-point translation scripts for data integration
Audi: Data Integration
Solution
• One option is to create a gigantic data warehouse or a big data analysis platform, which entails a lot of changes and migration issues
Or ontologies:
• Rationalize disparate data sources into one body of information
• Create an ontology for the data and content sources
• Add generic domain information
• Integration can be done without disturbing existing applications
• The ontology is mapped to the data sources (fields, records, files, documents)
• This gives applications access to the data through the ontology
Audi: Data Integration
Problem
Application A
• A is using one encoding
Application B
• B is using a different encoding
However, there is no way for a computer to know that both talk about the same thing and that Olympus-OM-10 is a type of SLR
Audi: Data Integration
Solution
We can provide ad hoc translation for these data sources; however, it is not portable
Instead, we might write a simple camera ontology in OWL
Audi: Data Integration
Application A
• A is using the first encoding •
•
•
•
•
Ontology
• I now the solution!
•
•
Application B
B is using second encoding
It receives data form B
B parses the XML doc form A
B encounters SLR (A ?)
B consults camera ontology
What do you know about SLR
Ontology returns “SLR is a type
of camera”
• The point her is that syntactic divergence is no longer a hindrance.
• In fact it is desirable because each app can use its own that suits it locally
• The ontology provides a single integration rather than n2 individual mappings
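The exchange between application B and the ontology, and the mapping count, can be sketched in a few lines. The two is-a facts come from the slides; the dictionary representation and the function names are assumptions made for illustration, not an OWL implementation.

```python
# Camera ontology as "is a type of" links (child -> parent).
camera_ontology = {
    "SLR": "Camera",
    "Olympus-OM-10": "SLR",
}

def is_a(term, target, ontology):
    """Walk up the hierarchy from term and report whether target is reached."""
    seen = set()
    while term in ontology and term not in seen:
        seen.add(term)
        term = ontology[term]
        if term == target:
            return True
    return False

def point_to_point_mappings(n):
    """Translators needed when each of n sources maps to every other one."""
    return n * (n - 1)

def ontology_mappings(n):
    """Mappings needed when each source maps once onto a shared ontology."""
    return n

print(is_a("Olympus-OM-10", "Camera", camera_ontology))
print(point_to_point_mappings(1000), ontology_mappings(1000))
```

For the 1,000 databases mentioned earlier, the difference is between roughly a million pairwise translators and a thousand mappings onto one ontology.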
Skill Finding at Swiss Life
• The Swiss Life Group is the largest life insurance company in Switzerland
– 11,000 employees
– $14 billion of written premiums
– Subsidiaries, branches and offices in 50 countries
Problem
– It is distributed over wide geographical and culturally diverse areas
– The construction of a company-wide skills repository is difficult
Skill Finding at Swiss Life
Problem
– The construction of a company-wide skills repository is difficult
• Solution
– Swiss Life used a hand-built ontology to cover skills in three organizational units:
• IT
• Private Insurance
• HR
– It consisted of 700 concepts + 180 educational concepts
– 130 job function concepts divided across the 3 units
Conclusion
• Interoperation and mapping are the main obstacles to realizing the Semantic Web
• Understanding what is available is a necessary prerequisite to the specific application
• The solutions are not always complete or optimal
• We have described a number of case studies of applying Semantic Web technologies to support interoperation
Questions ?
• Discussions
Sources
• A Semantic Web Primer, Grigoris Antoniou and Frank van Harmelen, MIT Press, 2004, ISBN 0-262-01210-3.
• W3C Semantic Web
http://www.w3.org/2001/sw/
• The Semantic Web Community Portal,
http://www.semanticweb.org
• The European Semantic Web Conference