Why do we need the Semantic Web? Requirements of the WWW

Semantic Web
Dr. Alexandra I. Cristea
http://www.dcs.warwick.ac.uk/~acristea/
Why do we need the Semantic Web?
Requirements of the WWW
I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. ...the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines.
Tim Berners-Lee (1999) Weaving the Web
• The internet - already there
• HTML programmers
• Search engines
• Core weight of interest
Scientific American, May 2001:
Where we (still) are Today: the Syntactic Web
• Realising the complete “vision” is too hard for now (probably)
• But we can make a start by adding semantic annotation to web resources
[Hendler & Miller 02]
Hard Work using the Syntactic Web…
Find images of Peter Patel-Schneider, Frank van Harmelen and Alan Rector…
– e.g., a search returns “Rev. Alan M. Gates, Associate Rector of the Church of the Holy Spirit, Lake Forest, Illinois”
The Syntactic Web is…
• A hypermedia, a digital library
– A library of documents (called web pages) interconnected by a hypermedia of links
• A database, an application platform
– A common portal to applications accessible through web pages, and presenting their results as web pages
• A platform for multimedia
– BBC Radio 4 anywhere in the world! Terminator 3 trailers!
• A naming scheme
– Unique identity for those documents
A place where computers do the presentation (easy) and people do the linking and interpreting (hard).
Why not get computers to do more of the hard work?
[Goble 03]
Impossible(?) via the Syntactic Web…
• Complex queries involving background knowledge
– Find information about “animals that use sonar but are not either bats or dolphins”, e.g., Barn Owl
• Locating information in data repositories
– Travel enquiries
– Prices of goods and services
– Results of human genome experiments
• Finding and using “web services”
– Visualise surface interactions between two proteins
• Delegating complex tasks to web “agents”
– Book me a holiday next weekend somewhere warm, not too far away, and where they speak French or English
What is the Problem?
• Consider a typical web page:
• Markup consists of:
– rendering information (e.g., font size and colour)
– Hyper-links to related content
• Semantic content is accessible to humans but not (easily) to computers…
What information can we see…
WWW2002
The eleventh international world wide web conference
Sheraton waikiki hotel
Honolulu, hawaii, USA
7-11 may 2002
1 location 5 days learn interact
Registered participants coming from
australia, canada, chile denmark, france, germany, ghana, hong kong,
india, ireland, italy, japan, malta, new zealand, the netherlands, norway,
singapore, switzerland, the united kingdom, the united states, vietnam,
zaire
Register now
On the 7th May Honolulu will provide the backdrop of the eleventh
international world wide web conference. This prestigious event …
Speakers confirmed
Tim berners-lee
What information can a machine see…
[Figure: the same page as a machine sees it, rendered as meaningless symbols]
Solution: XML markup with “meaningful” tags?
[Figure: the conference page annotated with tags such as <name>, <location>, <date>, <slogan> and <participants>]
But What About…
[Figure: the same content with a different tag structure, e.g., wrapped in <conf> with <place> nested inside <location>]
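A minimal Python sketch (standard library only) makes this concrete. The two documents, their tag names and the `find_location` helper are hypothetical and are loosely modelled on the conference page. A program written against one set of “meaningful” tags finds nothing when another publisher chooses different tags for the same information, because tag names mean nothing to the machine.

```python
import xml.etree.ElementTree as ET

# Two hypothetical markups of the same conference information.
DOC_A = """<conf>
  <name>WWW2002</name>
  <location>Sheraton Waikiki Hotel, Honolulu, Hawaii, USA</location>
  <date>7-11 May 2002</date>
</conf>"""

DOC_B = """<event>
  <title>WWW2002</title>
  <place>Sheraton Waikiki Hotel, Honolulu, Hawaii, USA</place>
  <when>7-11 May 2002</when>
</event>"""


def find_location(xml_text):
    """Return the text of a child <location> element, or None if there is none.

    The program only 'knows' the tag name it was written for: to the parser,
    <location> and <place> are just different strings.
    """
    root = ET.fromstring(xml_text)
    elem = root.find("location")
    return elem.text if elem is not None else None


print(find_location(DOC_A))  # Sheraton Waikiki Hotel, Honolulu, Hawaii, USA
print(find_location(DOC_B))  # None: same information, different tag
```

Bridging the two documents would need a hand-written mapping between tag sets. Making that kind of agreement on meaning explicit and shareable is the job of ontologies.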
Machine sees…
[Figure: the XML-tagged page as a machine sees it, with tag names and content alike rendered as meaningless symbols]
A more current scenario
• What are you doing on Burns night?
– Google “burns”
– Wikipedia articles on Robert Burns
– Amazon listing of books by Burns
– Google Maps to look at birthplace of Burns
Google Maps
Combining Information
Combining one source with a service from another
Web APIs
Limitations of Web APIs
• A large and growing number of web data sources provide program-accessible interfaces (APIs).
• The web site http://www.programmableweb.com currently (October 2015) lists 14,123 APIs.
• Most popular Web APIs are:
• The interfaces are non-uniform: REST, RPC (e.g., SOAP) and hybrid
• The results are returned in a variety of formats: XML, JSON, Atom
• The data schemas tend to be provider-specific
• This militates against the development of portable, generic methods of accessing and using data.
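The cost of non-uniform APIs can be sketched as follows. Assume two hypothetical providers return the same kind of record (a book and its author), one as JSON and one as XML, each with its own field names. Every source then needs its own adapter before the data can be combined. None of the payloads or field names below belong to a real API.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads: same kind of record, different formats and field names.
PROVIDER_A_JSON = '{"bookTitle": "Poems, Chiefly in the Scottish Dialect", "writer": "Robert Burns"}'
PROVIDER_B_XML = (
    "<item><name>Poems, Chiefly in the Scottish Dialect</name>"
    "<creator>Robert Burns</creator></item>"
)


def from_provider_a(payload):
    """Adapter for the (hypothetical) JSON provider."""
    data = json.loads(payload)
    return {"title": data["bookTitle"], "author": data["writer"]}


def from_provider_b(payload):
    """Adapter for the (hypothetical) XML provider."""
    root = ET.fromstring(payload)
    return {"title": root.findtext("name"), "author": root.findtext("creator")}


records = [from_provider_a(PROVIDER_A_JSON), from_provider_b(PROVIDER_B_XML)]
print(records[0] == records[1])  # True, but only after per-provider mapping
```

Each new provider means yet another adapter, which is why provider-specific schemas work against generic access methods.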
History of the (Semantic) Web
• Web was “invented” by Tim Berners-Lee (amongst others), a physicist working at CERN
• TBL’s original vision of the Web was much more ambitious than the reality of the existing (syntactic) Web:
“... a goal of the Web was that, if the interaction between person and hypertext could be so intuitive that the machine-readable information space gave an accurate representation of the state of people's thoughts, interactions, and work patterns, then machine analysis could become a very powerful management tool, seeing patterns in our work and facilitating our working together through the typical problems which beset the management of large organizations.”
• TBL (and others) have since been working towards realising this vision, which has become known as the Semantic Web
– E.g., article in May 2001 issue of Scientific American…
The semantic web
• Invented by Tim Berners-Lee and others. The W3C is the driving organisation.
– Web of machine-readable data
• What are the main aims of the SW?
– Automated query-answering
– Automated use of the data (reasoning, planning, acting, etc.)
WWW v Semantic Web
• WWW is a web of documents
• SW is a web of data
• WWW documents are human readable
• SW data is machine readable (in theory at least)
• Shared AAA principle: Anyone can say Anything, Anywhere.
Why the Semantic Web?
I don’t think [the Semantic Web is] a very good name but we’re stuck with it now. The word semantics is used by different groups to mean different things ...I think we could have called it the Data Web. ...it connects all applications together or gives [people] access to data across the company ...
Tim Berners-Lee (2007), Interview in Business Week
What can the Semantic Web actually do?
• Query answering:
– IBM’s Watson: beats human competitors at Jeopardy
– but:
• specifically trained for this task (including looking at a decade’s worth of past Jeopardy answers)
• sort of cheating (its reaction times mean it always gets first go!)
Why the Semantic Web?
• Syntax / semantics distinction: long history in philosophy of language, linguistics, formal logic
• Syntax is concerned with the arrangement of symbols
• Semantics is concerned with the relation between symbol strings and the world: what things actually mean.
What can the Semantic Web actually do?
• Query answering:
– Wolfram Alpha: does complex query-answering and solves mathematical problems
– but:
• hand-curated database, not the Semantic Web
• hugely labour-intensive to develop and cannot take advantage of new knowledge
What can the Semantic Web actually do?
• Query answering:
– Other systems:
• considerable progress
• current state-of-the-art is extremely useful
– but:
• the general case is hard!
What can the Semantic Web actually do?
• Automated use of data:
– works well in constrained circumstances:
• for example: Google Maps can automatically combine information about maps, speed limits, current road usage, etc., to get estimates of journey time
– very hard in unconstrained circumstances:
• classic SW example of an automated travel agent still far from achievable
What are the requirements of the Semantic Web?
• Large numbers of users to make their data:
– available
– in an appropriate machine-readable format
This is happening now: open government data (esp. in UK and US) and many other organisations and individuals: https://www.data.gov.uk/ https://www.data.gov/
>> find more open data repositories as homework!
• Good query-answering systems
• The ability to automatically interpret and use data
Need to Add “Semantics”
• External agreement on meaning of annotations
– E.g., Dublin Core
• Agree on the meaning of a set of annotation tags
– Problems with this approach
• Inflexible
• Limited number of things can be expressed
• Use Ontologies to specify meaning of annotations
– Ontologies provide a vocabulary of terms
– New terms can be formed by combining existing ones
– Meaning (semantics) of such terms is formally specified
– Can also specify relationships between terms in multiple ontologies
Ontology Languages for the Semantic Web
What is an ontology?
• As homework, check other definitions of the word ‘ontology’ via Google.
Same world-view?
• Originally: a definitive account of what exists (derived from metaphysics).
• Therefore, we can create a single ontology that describes the world,
– maybe dividing it into smaller sub-ontologies as necessary.
• But this is completely misconceived!
• Hence ‘Ontology merging’ is a hot research area!
Why Semantic Web ontologies?
• data integration
Ontologies in the SW
• A way of encoding domain knowledge, linking the knowledge, which allows for reasoning with the data
• Dictionary/Vocabulary → Taxonomy → Ontology
• Ontologies allow for data integration and inference, for automated query-answering and automated use of data
Why Semantic Web ontologies?
• data integration
• inference
William Burnes is the father of Robert Burns.
…
Father is a subclass of parent.
…
Why Semantic Web ontologies?
• data integration
• inference
William Burnes is the parent of Robert Burns.
• Automated query-answering
• Automated use of data
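A few lines of Python can sketch the inference step. “Father of” relates two individuals, so the sketch models it as a sub-property of “parent of” (in RDFS terms, `rdfs:subPropertyOf`). Short names such as `fatherOf` are hypothetical stand-ins for full URIs. The rule “if p is a sub-property of q and x p y, then x q y” is applied until nothing new is derived. This yields “William Burnes is the parent of Robert Burns”, which was never stated explicitly.

```python
# Facts as (subject, predicate, object) triples; names are illustrative only.
facts = {("WilliamBurnes", "fatherOf", "RobertBurns")}
# Schema knowledge: fatherOf is a more specific relation than parentOf.
sub_property_of = {("fatherOf", "parentOf")}


def infer(facts, sub_property_of):
    """Apply the sub-property rule repeatedly until no new triple appears."""
    derived = set(facts)
    while True:
        new = {
            (s, q, o)
            for (s, p, o) in derived
            for (p2, q) in sub_property_of
            if p == p2 and (s, q, o) not in derived
        }
        if not new:
            return derived
        derived |= new


closure = infer(facts, sub_property_of)
print(("WilliamBurnes", "parentOf", "RobertBurns") in closure)  # True
```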
Example Ontologies
Ontology | Language | Swoogle hits
Dublin Core | RDF | 1,364,337
FOAF | OWL DL | 1,194,871
TrackBack | RDF | 502,401
MetaVocab | RDF | 441,790
Basic Geo Vocabulary | RDF Schema | 248,130
BIO | RDF | 220,228
RSS 1.0 | RDF Schema | 201,786
VCard RDF | RDF | 181,962
Creative Commons metadata | RDF Schema | 112,216
WOT | OWL DL | 97,292
SIOC | OWL DL | 42,911
GoodRelations | OWL DL | 5,000
DOAP | RDF Schema | 1,442
Programmes Ontology | OWL 2 | 943
Music Ontology | OWL 2 | 646
OpenGUID | RDF Schema | 1
Provenance Vocabulary | OWL DL | 1
Pedagogical diagnosis | OWL DL | 1
DILIGENT Argumentation Ontology | OWL 2 | 1
Revised dates listed: 28 October 2006; 27 July 2005; 16 February 2002; 1 February 2006; 5 March 2004; 6 December 2000; 22 February 2001; 23 February 2004; 11 April 2008; 1 October 2011; 5 November 2005; 7 September 2009; 14 February 2010; 24 September 2008; 25 August 2009; 1 April 2012; 13 September 2006
Source: http://semanticweb.org/
Structure of an Ontology
Ontologies typically have two distinct components:
• Names for important concepts in the domain
– Elephant is a concept whose members are a kind of animal
– Herbivore is a concept whose members are exactly those animals who eat only plants or parts of plants
– Adult_Elephant is a concept whose members are exactly those elephants whose age is greater than 20 years
• Background knowledge/constraints on the domain
– Adult_Elephants weigh at least 2,000 kg
– All Elephants are either African_Elephants or Indian_Elephants
– No individual can be both a Herbivore and a Carnivore
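The same statements can be written as Description Logic axioms, the formalism underlying OWL DL. This is a sketch only. The role names eats, partOf, hasAge and hasWeight are assumptions introduced for the illustration, and Animal, Plant and Carnivore are treated as primitive concepts.

```latex
\begin{align*}
\mathit{Elephant} &\sqsubseteq \mathit{Animal}\\
\mathit{Herbivore} &\equiv \mathit{Animal} \sqcap \forall \mathit{eats}.(\mathit{Plant} \sqcup \exists \mathit{partOf}.\mathit{Plant})\\
\mathit{Adult\_Elephant} &\equiv \mathit{Elephant} \sqcap \exists \mathit{hasAge}.{>}20\\
\mathit{Adult\_Elephant} &\sqsubseteq \exists \mathit{hasWeight}.{\geq}2000 \quad (\text{kg})\\
\mathit{Elephant} &\sqsubseteq \mathit{African\_Elephant} \sqcup \mathit{Indian\_Elephant}\\
\mathit{Herbivore} \sqcap \mathit{Carnivore} &\sqsubseteq \bot
\end{align*}
```

The first three axioms name concepts and the last three encode background constraints. The symbol ≡ gives necessary and sufficient conditions (“exactly those”), while ⊑ gives only necessary ones (“a kind of”).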
Example Ontology
A Semantic Web — First Steps
Make web resources more accessible to automated processes
• Extend existing rendering markup with semantic
markup
– Metadata annotations that describe content/function of web
accessible resources
• Use Ontologies to provide vocabulary for annotations
– “Formal specification” is accessible to machines
• A prerequisite is a standard web ontology language
– Need to agree common syntax before we can share
semantics
– Syntactic web based on standards such as HTTP and HTML
Ontology Design and Deployment
• Given the key role of ontologies in the Semantic Web, it is essential to provide tools and services to help users:
– Design and maintain high quality ontologies, e.g.:
• Meaningful: all named classes can have instances
• Correct: captured intuitions of domain experts
• Minimally redundant: no unintended synonyms
• Richly axiomatised: (sufficiently) detailed descriptions
– Store (large numbers of) instances of ontology classes, e.g.:
• Annotations from web pages
– Answer queries over ontology classes and instances, e.g.:
• Find more general/specific classes
• Retrieve annotations/pages matching a given description
– Integrate and align multiple ontologies (merging)
The Semantic Web
Shared ontologies help to exchange data and meaning between web-based services
(Image by Jim Hendler)
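A query such as “find more general/specific classes” amounts to a transitive walk over the class hierarchy. The sketch below assumes a hypothetical hierarchy stored as direct (subclass, superclass) pairs and reuses class names from the elephant example. A real ontology reasoner would also derive subsumptions that are only implied by class definitions, which this simple traversal does not do.

```python
from collections import defaultdict

# Hypothetical asserted hierarchy: (subclass, direct superclass).
SUBCLASS_OF = [
    ("Adult_Elephant", "Elephant"),
    ("African_Elephant", "Elephant"),
    ("Indian_Elephant", "Elephant"),
    ("Elephant", "Animal"),
    ("Herbivore", "Animal"),
]

parents = defaultdict(set)
children = defaultdict(set)
for sub, sup in SUBCLASS_OF:
    parents[sub].add(sup)
    children[sup].add(sub)


def reachable(start, edges):
    """All classes reachable from `start` by following `edges` (depth-first)."""
    seen, stack = set(), [start]
    while stack:
        for nxt in edges[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def more_general(cls):
    """All (asserted and transitive) superclasses of `cls`."""
    return reachable(cls, parents)


def more_specific(cls):
    """All (asserted and transitive) subclasses of `cls`."""
    return reachable(cls, children)


print(sorted(more_general("Adult_Elephant")))  # ['Animal', 'Elephant']
print(sorted(more_specific("Elephant")))  # ['Adult_Elephant', 'African_Elephant', 'Indian_Elephant']
```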
Wine Example Scenario
Tell me what wines I should buy to serve with each course of the following menu.
Books Agent
Wine Agent
Grocery Agent
I recommend Chardonnay or Dry Riesling
Ontologies in the Semantic Web
• Provide shared data structures to exchange information between agents
• Can be explicitly used as annotations in web sites
• Can be used for knowledge-based services using other web resources
• Can help to structure knowledge to build domain models (for other purposes)
Ontology Languages
• Wide variety of languages for “Explicit Specification”
– Graphical notations
• Semantic networks
• Topic Maps (see http://www.topicmaps.org/)
• UML
• RDF
– Logic based
• Description Logics (e.g., OIL, DAML+OIL, OWL)
• Rules (e.g., RuleML, Prolog)
• First Order Logic (e.g., KIF)
• Conceptual graphs
• (Syntactically) higher order logics (e.g., LBase)
• Non-classical logics (e.g., F-logic, Non-Mon, modalities)
– Probabilistic/fuzzy
• Degree of formality varies widely
– Increased formality makes languages more amenable to machine processing (e.g., automated reasoning)
Many languages use “OO” model based on:
• Objects/Instances/Individuals
– Elements of the domain of discourse
– Equivalent to constants in FOL
• Types/Classes/Concepts
– Sets of objects sharing certain characteristics
– Equivalent to unary predicates in FOL
• Relations/Properties/Roles
– Sets of pairs (tuples) of objects
– Equivalent to binary predicates in FOL
• Such languages are/can be:
– Well understood
– Formally specified
– (Relatively) easy to use
– Amenable to machine processing
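A tiny interpretation in Python can make the correspondence with first-order logic concrete. It is sketched over a hand-made, hypothetical domain. Individuals play the role of constants, a class is the set of individuals satisfying a unary predicate, and a property is a set of pairs satisfying a binary predicate. Checking a statement such as ∀x (Elephant(x) → Animal(x)) then means checking it over that finite domain.

```python
# Individuals (constants) in a small, hypothetical domain of discourse.
domain = {"dumbo", "jumbo", "grass"}

# Classes (unary predicates) as sets of individuals.
Elephant = {"dumbo", "jumbo"}
Animal = {"dumbo", "jumbo"}
Plant = {"grass"}

# Property (binary predicate) as a set of pairs.
eats = {("dumbo", "grass"), ("jumbo", "grass")}


def forall(pred):
    """True if `pred` holds for every individual in the domain."""
    return all(pred(x) for x in domain)


# ∀x (Elephant(x) → Animal(x))
every_elephant_is_animal = forall(lambda x: x not in Elephant or x in Animal)

# ∀x (Elephant(x) → ∀y (eats(x, y) → Plant(y)))
elephants_eat_only_plants = forall(
    lambda x: x not in Elephant or all(y in Plant for (a, y) in eats if a == x)
)

print(every_elephant_is_animal, elephants_eat_only_plants)  # True True
```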
Protégé
Web “Schema” Languages
• Existing Web languages extended to facilitate content description
– XML → XML Schema (XMLS)
– RDF → RDF Schema (RDFS)
• XMLS not an ontology language
– Changes format of DTDs (document schemas) to be XML
– Adds an extensible type hierarchy
• Integers, Strings, etc.
• Can define sub-types, e.g., positive integers
• RDFS is recognisable as an ontology language
– Classes and properties
– Sub/super-classes (and properties)
– Range and domain (of properties)
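RDFS is “recognisable as an ontology language” because its class and property statements license new triples. The sketch below implements a few of the RDFS entailment rules (rdfs2 for domain, rdfs3 for range, rdfs9 and rdfs11 for subclasses) by naive forward chaining over triples. The remaining RDFS rules are omitted, and prefixed names such as `ex:Elephant` are hypothetical.

```python
TYPE, DOMAIN, RANGE, SUBCLASS = "rdf:type", "rdfs:domain", "rdfs:range", "rdfs:subClassOf"

# Hypothetical schema and data, as (subject, predicate, object) triples.
triples = {
    ("ex:eats", DOMAIN, "ex:Animal"),
    ("ex:eats", RANGE, "ex:Food"),
    ("ex:Elephant", SUBCLASS, "ex:Animal"),
    ("ex:Animal", SUBCLASS, "ex:LivingThing"),
    ("ex:dumbo", TYPE, "ex:Elephant"),
    ("ex:dumbo", "ex:eats", "ex:grass"),
}


def rdfs_closure(graph):
    """Forward-chain rdfs2, rdfs3, rdfs9 and rdfs11 until a fixpoint is reached."""
    g = set(graph)
    while True:
        new = set()
        for (s, p, o) in g:
            for (s2, p2, o2) in g:
                if p2 == DOMAIN and s2 == p:  # rdfs2: subject gets the domain class
                    new.add((s, TYPE, o2))
                if p2 == RANGE and s2 == p:  # rdfs3: object gets the range class
                    new.add((o, TYPE, o2))
                if p == TYPE and p2 == SUBCLASS and s2 == o:  # rdfs9: inherit superclass
                    new.add((s, TYPE, o2))
                if p == SUBCLASS and p2 == SUBCLASS and s2 == o:  # rdfs11: transitivity
                    new.add((s, SUBCLASS, o2))
        if new <= g:
            return g
        g |= new


closure = rdfs_closure(triples)
print(("ex:dumbo", TYPE, "ex:LivingThing") in closure)  # True
print(("ex:grass", TYPE, "ex:Food") in closure)  # True
```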
(In)famous “Layer Cake”
[Figure: the Semantic Web “layer cake”, annotated “Data Exchange”, “Relational Data” and “Semantics+reasoning”, with question marks against several layers]
• Relationship between layers is not clear
• OWL DL extends “DL subset” of RDF
Linked Data
Semantic web: Linked Data
• Isn’t just about putting data on the Web
• It’s about making links
• Web of Hypertext -> Web of Data
Linked Data: The four rules
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
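The four rules can be illustrated with a toy, in-memory “web”. Each URI (rules 1 and 2) dereferences to a small set of triples describing it (rule 3). Following the URIs that appear in those triples (rule 4) then discovers further resources. The URIs under http://example.org/ and the `dereference` function are hypothetical. A real client would issue HTTP GET requests and parse RDF instead of reading a dictionary.

```python
from collections import deque

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical stand-in for the Web: URI -> triples returned when it is looked up.
WEB = {
    "http://example.org/people/alice": [
        ("http://example.org/people/alice", FOAF + "name", "Alice"),
        ("http://example.org/people/alice", FOAF + "knows", "http://example.org/people/bob"),
    ],
    "http://example.org/people/bob": [
        ("http://example.org/people/bob", FOAF + "name", "Bob"),
        ("http://example.org/people/bob", FOAF + "based_near", "http://example.org/places/edinburgh"),
    ],
    "http://example.org/places/edinburgh": [
        ("http://example.org/places/edinburgh", FOAF + "name", "Edinburgh"),
    ],
}


def dereference(uri):
    """Rule 3: looking up a URI yields useful information (here, a list of triples)."""
    return WEB.get(uri, [])


def crawl(start):
    """Rule 4: follow URIs found in the data to discover more things (breadth-first)."""
    seen, queue, graph = {start}, deque([start]), []
    while queue:
        uri = queue.popleft()
        for triple in dereference(uri):
            graph.append(triple)
            obj = triple[2]
            if obj.startswith("http://") and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen, graph


discovered, graph = crawl("http://example.org/people/alice")
print(len(discovered), len(graph))  # 3 5
```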
Why HTTP URIs?
• Globally unique names
– can be created in a decentralised fashion by domain name owners;
– no central naming authority is required.
• Not just a name, but a means of accessing information describing the identified entity (URL).
URIs
Homepage of the Department of Computer Science
http://www.dcs.warwick.ac.uk/
Homepage of Alexandra Cristea
http://www2.warwick.ac.uk/fac/sci/dcs/people/Alexandra_Cristea
• These URIs point to web documents or, in the terminology of WebArch, information resources.
– by definition, all the essential characteristics of an information resource can be conveyed in a message
• Web clients request a representation of a resource
• One and the same resource might have different representations; e.g., text in English, Greek, Chinese, etc.
Content Negotiation
• HTTP clients send HTTP headers with each request to indicate what kinds of documents they prefer.
• A client can say it prefers language X over Y.
• Or that it prefers RDF over HTML.
• Servers inspect the headers and select an appropriate response.
Header of GET Requests
GET /fac/sci/dcs/people/Alexandra_Cristea HTTP/1.1
Host: www2.warwick.ac.uk
Accept: text/html, application/xhtml+xml
Accept-Language: en, el, zh
Server's Response
HTTP/1.1 200 OK
Content-Type: text/html
Content-Language: en
URIs for Things
• We need mechanisms to ensure that when URIs are dereferenced,
– real-world objects are not confused with documents that describe them, and
– humans as well as machines can retrieve appropriate representations.
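The server-side selection step of content negotiation can be sketched as follows. The server parses the `Accept` (or `Accept-Language`) header into ranges with quality values (`q`, defaulting to 1) and returns the available representation the client rates highest. This is a deliberately simplified sketch, not a complete implementation of the HTTP rules. It takes the highest matching `q` rather than letting more specific ranges override wildcards, and it ignores language-prefix matching such as `en` versus `en-GB`. The function names are hypothetical.

```python
def parse_accept(header):
    """Parse e.g. 'text/html, application/rdf+xml;q=0.9' into [(range, q), ...]."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        if not pieces[0]:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        ranges.append((pieces[0].lower(), q))
    return ranges


def matches(media_range, offered):
    """Does a range such as 'text/*', '*/*' or '*' cover the offered value?"""
    if media_range in ("*", "*/*"):
        return True
    if media_range.endswith("/*"):
        return offered.split("/")[0] == media_range[:-2]
    return media_range == offered


def negotiate(header, available):
    """Return the available representation with the highest q, or None if none is acceptable."""
    ranges = parse_accept(header)
    best, best_q = None, 0.0
    for offered in available:  # on ties, the earlier (server-preferred) entry wins
        q = max((rq for r, rq in ranges if matches(r, offered)), default=0.0)
        if q > best_q:
            best, best_q = offered, q
    return best


print(negotiate("text/html, application/xhtml+xml", ["application/rdf+xml", "text/html"]))  # text/html
print(negotiate("en, el;q=0.8", ["el", "en"]))  # en
```

If nothing matches, the function returns None. A real server would then reply with a default representation or a 406 (Not Acceptable) status.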
RDF for Linked Data
• RDF is standardly used for Linked Data. Advantages include:
– Easy to insert RDF links between data from different sources.
– Information from different sources can be combined by graph merging.
– Information using different schemas can be expressed in a single graph, i.e., by mixing different vocabularies.
– Data can be tightly or loosely structured.
• Features of RDF that are avoided:
– Reification (hard to query with SPARQL)
– Collections and containers (ditto). Use multiple triples with the same predicate instead.
– Blank nodes: they make merging less effective.
Kinds of Links
• Relationship Links
– related things in other data sources.
≈ hyperlinks in a web document.
– e.g. foaf:based_near dbpedia:Edinburgh
• Identity Links
– URI aliases of other data sources for the same (real-world/abstract) object.
• Vocabulary Links
– definitions of vocabulary terms used to represent the data.
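Graph merging, and the trouble with blank nodes, can be sketched with graphs represented as Python sets of triples. Nodes named by URIs merge automatically under set union. Blank nodes (written `_:b0` here) are local to their graph, so they must be renamed apart before the union. As a result, descriptions of what may be the same thing in two graphs stay separate. The `ex:` names and the helper functions are hypothetical. `foaf:based_near` and `dbpedia:Edinburgh` follow the relationship-link example.

```python
from itertools import count

_fresh = count()


def rename_blank_nodes(graph):
    """Give every blank node (a term starting with '_:') a globally fresh label."""
    mapping = {}

    def rename(term):
        if term.startswith("_:"):
            if term not in mapping:
                mapping[term] = f"_:n{next(_fresh)}"
            return mapping[term]
        return term

    return {tuple(rename(t) for t in triple) for triple in graph}


def merge(*graphs):
    """Merge RDF-style graphs: standardise blank nodes apart, then take the union."""
    merged = set()
    for g in graphs:
        merged |= rename_blank_nodes(g)
    return merged


# Two hypothetical sources; both describe the same URI-named person.
source_a = {
    ("ex:alice", "foaf:based_near", "dbpedia:Edinburgh"),
    ("ex:alice", "foaf:knows", "_:b0"),
    ("_:b0", "foaf:name", "Bob"),
}
source_b = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "_:b0"),  # same label, but a different local node
}

merged = merge(source_a, source_b)
print(len(merged))  # 5: ex:alice is shared, the two blank nodes stay distinct
print(len(source_a | source_b))  # 4: naive union wrongly conflates the blank nodes
```

If Bob had an HTTP URI instead of a blank node, both sources could refer to him directly and the merge would connect the two descriptions.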
Identity Links
• Different URIs may refer to the same real-world object.
– Standard for equivalence: http://www.w3.org/2002/07/owl#sameAs.
• Motivations for this approach:
– Different aliases can be dereferenced to different descriptions of the same resource (AAA principle).
– Supports provenance: trace back to the publisher of a URI.
– Canonical names would need a centralised naming authority, which would be a barrier to the spread of the web of data.
• Potential problems:
– Identity may be context dependent
– Facts vs. opinions
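`owl:sameAs` is symmetric and transitive, so a Linked Data client can group aliases into equivalence classes and consolidate what different publishers say about each object. Below is a minimal union-find sketch with hypothetical alias URIs and `ex:` property names; only the `owl:sameAs` URI is real.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical data from two publishers using different URIs for one person.
triples = [
    ("http://a.example/id/burns", OWL_SAME_AS, "http://b.example/people/rburns"),
    ("http://a.example/id/burns", "ex:born", "1759"),
    ("http://b.example/people/rburns", "ex:occupation", "poet"),
]

parent = {}


def find(x):
    """Return the representative of x's equivalence class (with path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(a, b):
    """Put a and b into the same equivalence class."""
    parent[find(a)] = find(b)


for s, p, o in triples:
    if p == OWL_SAME_AS:
        union(s, o)

# Consolidate: file every non-sameAs fact under its subject's class representative.
consolidated = {}
for s, p, o in triples:
    if p != OWL_SAME_AS:
        consolidated.setdefault(find(s), set()).add((p, o))

print(len(consolidated))  # 1: both aliases describe one resource
```

This treats every `owl:sameAs` link as trustworthy. Because identity may be context dependent, a real client might record provenance or restrict which links it follows.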
Reflecting on Linked Data
• Advantages of Linked Data:
– A unifying data model (RDF)
– A standardised data access mechanism (HTTP)
– Hyperlink-based data discovery: links connect all Linked Data into a single global data space and enable Linked Data applications to discover new data sources at run-time.
– Self-descriptive data: vocabulary definitions are recoverable like other data, and vocabulary terms can be linked to one another.
Is Your Data 5-★?
• Structured data
– available on the web (i.e., open) in many formats:
– CSV, Excel, HTML Microdata (e.g., http://schema.org/), web APIs, PDF tables (shudder), ...
Reflecting on Linked Data
• Linked data adopts the perspective of data integration.
– Not (necessarily) interested in the reasoning aspect of the Semantic Web.
• http://blog.paulwalk.net/2009/11/11/linked-open-semantic/:
– Data can be open, while not being linked.
– Data can be linked, while not being open.
– Data which is both open and linked is increasingly viable.
– The Semantic Web can only function with data which is both open and linked.
Web of Data (Linked Data)
Summary Linked Data
• Linked Data principles
– Naming things with URIs
– Making URIs dereferenceable
– Providing useful RDF information
– Including links to other things
Acknowledgements
Thanks to various people from whom I “borrowed” material:
– Jeen Broekstra
– Carole Goble
– Frank van Harmelen
– Austin Tate
– Raphael Volz
And thanks to all the people from whom they borrowed it
Finding out more on SW
• Course website and recommended reading
• Do your homework!
• There is lots of relevant literature online – try to explore it
• Also a lot of informal discussion on Twitter, newsgroups, YouTube, etc.