What is the Semantic Web?

“The Semantic Web is an extension of the current web in which information
is given a well-defined meaning, better enabling computers and people to
work in cooperation.” -- Tim Berners-Lee et al. [The Semantic Web, Scientific
American, 2001.]
“A set of standards and best practices for sharing data and the semantics of
that data over the Web for use by applications” -- Bob DuCharme [Learning
SPARQL, 2013.]
Standards:
1. RDF data model
2. SPARQL query language
3. RDFS and OWL standards for storing vocabularies and ontologies.
Best practices include using URIs (IRIs) to refer to entities on the web
and adhering to the standards listed above.
Semantic Web Layer Cake
Source: http://www.semanticfocus.com/blog/entry/title/introduction-to-the-semantic-web-vision-and-technologies-part-2foundations/
The Semantic Web is a REALITY
Currently, the Semantic Web encompasses almost 10000 databases, >85
billion facts, > 800 million links. These are publicly available data, identifiable
via URI and accessible via HTTP.
Example: DBpedia, the Wikipedia of the Semantic Web, which can be used by
both humans and computers. For humans, information is returned as an
HTML document; for computers, it is returned in machine-understandable
RDF format. The resource link
http://dbpedia.org/resource/Central_Connecticut_State_University
resolves to one of two views:
http://dbpedia.org/page/Central_Connecticut_State_University
(returns the web page)
http://dbpedia.org/data/Central_Connecticut_State_University
(returns machine-understandable representation)
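The relationship between the three links can be sketched in a few lines of Python. This is only an illustration: the /page/ and /data/ patterns are taken from the links above, the helper names dbpedia_views and rdf_request are hypothetical, and the assumption that the server honors an Accept header of application/rdf+xml is not stated in the slides. The request object is built but never sent.

```python
from urllib.request import Request

RESOURCE_PREFIX = "http://dbpedia.org/resource/"

def dbpedia_views(resource_uri):
    """Derive the human-readable and machine-readable links for a resource URI."""
    if not resource_uri.startswith(RESOURCE_PREFIX):
        raise ValueError("not a DBpedia resource URI: " + resource_uri)
    name = resource_uri[len(RESOURCE_PREFIX):]
    return {
        "page": "http://dbpedia.org/page/" + name,  # HTML for humans
        "data": "http://dbpedia.org/data/" + name,  # RDF for programs
    }

def rdf_request(resource_uri):
    """Build (but do not send) a request that asks for RDF/XML instead of HTML.
    Assumption: the server performs content negotiation on the Accept header."""
    return Request(resource_uri, headers={"Accept": "application/rdf+xml"})

ccsu = RESOURCE_PREFIX + "Central_Connecticut_State_University"
views = dbpedia_views(ccsu)
print(views["page"])
print(views["data"])
print(rdf_request(ccsu).get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the actual lookup; that step is left out so the sketch runs without network access.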
About: Central Connecticut State University
An Entity of Type: Public university, from Named Graph: http://dbpedia.org, within Data Space: dbpedia.org
Property
Value
dbo:abstract
• Central Connecticut State University is a regional,
comprehensive public university in New Britain,
Connecticut. Founded in 1849 as Connecticut Normal
School, CCSU is Connecticut's oldest publicly funded
university. CCSU is made up of four schools: the
Ammon School of Arts & Science, the School of
Business, the School of Education & Professional
Studies, and the School of Engineering & Technology.
Attended by over 11,000 students, 9,200 are
undergraduates, and 2,000 are graduate students. It is
part of the Connecticut State University System,
which also oversees Eastern, Western, and Southern
Connecticut State Universities. Together they have a
student body of over 34,000. As a commuter school,
more than half of students live off campus and ninety
percent are in-state students.
dbo:affiliation
• dbr:Connecticut_State_University_System
dbo:athletics
• dbr:National_Collegiate_Athletic_Association
dbo:campus
• dbr:Suburb
dbo:city
• dbr:New_Britain,_Connecticut
dbo:country
• dbr:United_States
dbo:endowment
• 4.7E7
dbo:formerName
• Central Connecticut State College
• Connecticut Normal School
• New Britain Normal School
• Teachers College of Connecticut
dbo:mascot
• Blue Devil
dbo:numberOfPostgraduateStudents
• 2094 (xsd:integer)
dbo:numberOfUndergraduateStudents
• 9771 (xsd:integer)
dbo:officialSchoolColour
• Blue and White
CCSU info as a machine-understandable document
<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:owl="http://www.w3.org/2002/07/owl#"
  xmlns:georss="http://www.georss.org/georss/"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:dbp="http://dbpedia.org/property/"
  xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
  xmlns:ns7="http://www.w3.org/ns/prov#"
  xmlns:dbo="http://dbpedia.org/ontology/"
  xmlns:dct="http://purl.org/dc/terms/" >
  <rdf:Description rdf:about="http://dbpedia.org/resource/Yan_Klukowski__3">
    <dbo:team rdf:resource="http://dbpedia.org/resource/Central_Connecticut_State_University" />
  </rdf:Description>
  <rdf:Description rdf:about="http://dbpedia.org/resource/Dan_Gaspar__8">
    <dbo:team rdf:resource="http://dbpedia.org/resource/Central_Connecticut_State_University" />
  </rdf:Description>
  <rdf:Description rdf:about="http://dbpedia.org/resource/1999%E2%80%932000_Los_Angeles_Clippers_season">
    <dbp:college rdf:resource="http://dbpedia.org/resource/Central_Connecticut_State_University" />
  </rdf:Description>
  <rdf:Description rdf:about="http://dbpedia.org/resource/List_of_Phi_Beta_Sigma_chapters">
    <dbp:school rdf:resource="http://dbpedia.org/resource/Central_Connecticut_State_University" />
  </rdf:Description>
……..
XML (eXtensible Markup Language)
XML is a flexible text format used to structure, store, and transport data
over the Web. Unlike HTML, which is about displaying data, XML is about
describing data, BUT there is no single standard way to describe the same data.
Example: Consider the concept COURSE, and its instance CS462.

HTML description:
<H1> CS462: AI</H1>
<UL>
<LI> CRN: 4185
<LI> Level: undergrad/grad
<LI> Professor: NZ, office hours …
<LI> Website: www.cs.ccsu.edu/~neli
</UL>

XML description:
<course>
  <title> CS462: AI </title>
  <CRN> 4185 </CRN>
  <level> undergrad/grad </level>
  <Professor>
    <name> NZ </name>
    <office_hours> … </office_hours>
    <Website> … </Website>
  </Professor>
</course>
XML documents are labeled trees
Course
  Title
  CRN
  Level
  Professor
    Name
    Web site
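To see the labeled tree behind an XML document, the course description can be parsed with Python's standard xml.etree.ElementTree module. This is a minimal sketch: the element names follow the course example, the web site value is copied from the HTML version of the same example, and the helper labeled_tree is a hypothetical name introduced here.

```python
import xml.etree.ElementTree as ET

COURSE_XML = """
<course>
  <title>CS462: AI</title>
  <CRN>4185</CRN>
  <level>undergrad/grad</level>
  <Professor>
    <name>NZ</name>
    <Website>www.cs.ccsu.edu/~neli</Website>
  </Professor>
</course>
"""

def labeled_tree(element, depth=0):
    """Return (depth, tag) pairs for the element and its descendants, in document order."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(labeled_tree(child, depth + 1))
    return rows

root = ET.fromstring(COURSE_XML)
tree_rows = labeled_tree(root)
for depth, tag in tree_rows:
    # Indentation mirrors the nesting of the tags.
    print("  " * depth + tag)
```

The parser recovers the nesting of the tags but attaches no meaning to them: to the program, course and Professor are just labels.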
XML (contd.)
XML documents are easily readable and understandable by humans, because
their tags are familiar terms, but
• XML lacks semantics, and
• XML makes no commitment to an ontological vocabulary, nor to ontological
modelling, i.e. it cannot serve as a knowledge representation language.
Because XML is a universal meta markup language, the same term can be given
different meanings by different sources (for example, title can mean “book title”
or “person title”). To resolve such ambiguities, so-called namespaces
are used. For example:
xmlns:dc="http://purl.org/dc/elements/1.1/" defines the namespace dc
(Dublin Core), and <dc:title>Artificial Intelligence</dc:title> suggests that the term
title refers to a book.
xmlns:v="http://www.w3.org/2006/vcard/" describes people, and
<v:title>Doctor</v:title> suggests that the term title refers to a person.
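The effect of namespaces can be checked with Python's standard XML parser, which expands each prefixed tag into a name qualified by its namespace URI. This is a sketch under assumptions: the wrapping <record> element is hypothetical and exists only so that both titles fit in one document; the two namespace URIs are the ones given above.

```python
import xml.etree.ElementTree as ET

DOC = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:v="http://www.w3.org/2006/vcard/">
  <dc:title>Artificial Intelligence</dc:title>
  <v:title>Doctor</v:title>
</record>
"""

# Prefix-to-URI map used when searching; the prefixes are local shorthand only.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "v": "http://www.w3.org/2006/vcard/",
}

root = ET.fromstring(DOC)
book_title = root.find("dc:title", NS)
person_title = root.find("v:title", NS)

# The parser stores tags as {namespace-URI}local-name, so the two "title"
# elements are distinct names even though the local part is identical.
print(book_title.tag, "->", book_title.text)
print(person_title.tag, "->", person_title.text)
```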
RDF (Resource Description Framework)
• RDF is the foundation for representing and processing knowledge on the web. It is a
graph-based data model, where knowledge is represented as a list of statements called
triples.
• Each triple has the form “subject, predicate, object”. Example: “Jones TEACHES
Math101”
• Each element of a triple (the resource) is identified by a URI; an object may also be
a literal value. Example, in N-Triples format:
<http://myUniv.edu/people/Jones> <http://myUniv.edu/terms/teaches>
<http://myUniv.edu/courses/Math101> .
RDF can be written in various concrete syntaxes (called serializations), one of which has an
XML-based syntax to support syntactic interoperability. Example:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:myUniv="http://myUniv.edu/terms/">
  <rdf:Description rdf:about="http://myUniv.edu/people/Jones">
    <myUniv:teaches>
      <rdf:Description
          rdf:about="http://myUniv.edu/courses/Math101">
      </rdf:Description>
    </myUniv:teaches>
  </rdf:Description>
</rdf:RDF>
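The triple model itself is independent of any serialization. As a rough sketch, a triple can be held as a three-field tuple in Python and written out as an N-Triples line. The Triple class and the to_ntriples helper are hypothetical names, and the helper handles URI terms only (no literals or blank nodes).

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

def to_ntriples(triple):
    """Serialize a triple whose three terms are all URIs as one N-Triples line."""
    return " ".join("<" + term + ">" for term in triple) + " ."

jones_teaches = Triple(
    "http://myUniv.edu/people/Jones",
    "http://myUniv.edu/terms/teaches",
    "http://myUniv.edu/courses/Math101",
)
line = to_ntriples(jones_teaches)
print(line)
```

An RDF graph is then simply a set of such tuples, which is the representation used by rule-based inference over triples.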
RDF domain example -- Friend of a Friend (FOAF) domain
RDF class Person
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Neli Zlatareva</foaf:name>
    <foaf:title>Dr</foaf:title>
    <foaf:givenname>Neli</foaf:givenname>
    <foaf:family_name>Zlatareva</foaf:family_name>
    <foaf:homepage rdf:resource="http://www.cs.ccsu.edu/~neli"/>
  </foaf:Person>
</rdf:RDF>
RDF statements are directed labeled graphs
<http://www.math.ccsu/jones>
<http://www.cs.ccsu.edu/~neli/univ.owl#teaches>
<http://www.ccsu.edu/catalog/Math101>
• RDF is provided with a model-theoretic semantics that defines
the notion of entailment between two RDF statements.
• RDF graphs are finite sets of RDF triples.
• These types of graphs are very similar to semantic nets.
Another example
Consider the following set of triples:
{ <?p1 foaf:name “Jones”>, <?p1 foaf:knows ?p2>,
<?p1 myUniv:teaches ?c1>, <?p2 myUniv:studies ?c1>,
<?p2 foaf:name “Bob”>, <?p2 foaf:mbox “Bob@mygmail.com”>,
<?c1 rdf:type myUniv:course>, <?c1 foaf:name “Math101”>}
where foaf: <http://xmlns.com/foaf/0.1/>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
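[Figure: the same triples drawn as a directed labeled graph]
_:p1 --foaf:name--> “Jones”
_:p1 --foaf:knows--> _:p2
_:p1 --myUniv:teaches--> _:c1
_:p2 --myUniv:studies--> _:c1
_:p2 --foaf:name--> “Bob”
_:p2 --foaf:mbox--> “Bob@mygmail.com”
_:c1 --foaf:name--> “Math101”
_:c1 --rdf:type--> myUniv:course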
RDF Schema (RDF Vocabulary Description Language)
• RDF is a universal language that allows users to describe their own
domains, but it does not make assumptions about any particular domain.
• RDF Schema defines the vocabulary, specifies object properties and their
values, and describes the relations between objects.
• RDF Schema organizes this vocabulary in a typed class hierarchy.
Example (for short, in N3 format, which is a superset of N-Triples; it allows us
to define URI prefixes at the beginning of the document and to write entity
URIs relative to those prefixes)
@prefix univ: <http://www.cs.ccsu.edu/~neli/univ.owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
univ:professor rdfs:subClassOf univ:staff .
univ:staff rdf:type rdfs:Class .
univ:professor rdf:type rdfs:Class .
univ:Jones rdf:type univ:professor .
RDF/RDFS Example
[Figure: two-layer diagram. RDF layer: Jones --teaches--> Math101. RDFS layer: the classes Professor, TeachingAss, and Staff.]
RDF and RDFS Axiomatic Semantics
• All language primitives are represented by constants, such as Resource,
Class, Property, subClassOf, Literal, etc.
• A few predefined predicates are used to represent relations between
constants, such as:
– An RDF triple is represented as PropVal (P, R, V), where P is a property,
R is a resource, and V is a value;
– Predicate Type (R, T) states that resource R has the type T, and it is
equivalent to PropVal (type, R, T). All classes are instances of Class
and have the type Class, i.e. Type (Class, Class), Type (Property, Class),
Type (Resource, Class), etc.
• Resource is the most general class – every class and every property is a
resource.
• Predicates in RDF statements are properties.
RDF and RDFS Semantics (contd)
In RDFS, we also have subclasses, subproperties, and constraints.
• subClassOf is a property, i.e. Type(subClassOf, Property).
• If class C is a subclass of class C’, then all instances of C are also instances of C’,
i.e. PropVal (subClassOf, ?c, ?c’) ↔ (Type(?c, Class) & Type(?c’, Class) &
∀?x (Type (?x, ?c) → Type (?x, ?c’)))
• Property P is a subproperty of property P’, if P’(x, y) whenever P (x, y), i.e.
Type (subPropertyOf, Property)
PropVal (subPropertyOf, ?p, ?p’) ↔ (Type(?p, Property) & Type(?p’, Property)
& ∀?r ∀?v (PropVal (?p, ?r, ?v) → PropVal (?p’, ?r, ?v)))
• Every constraint resource is a resource, i.e.
PropVal (subClassOf, ConstraintResource, Resource)
• Constraint properties are all properties that are also constraint resources, i.e.
Type (?cp, ConstraintProperty) ↔ (Type (?cp, ConstraintResource) &
Type (?cp, Property))
RDF and RDFS Semantics (contd)
• domain and range are constraint properties, i.e.
Type (domain, ConstraintProperty)
Type (range, ConstraintProperty).
• The domain of property P is the set of all objects to which P applies, i.e.
PropVal (domain, ?p, ?d) → ∀?x ∀?y (PropVal (?p, ?x, ?y) → Type (?x, ?d))
• The range of property P is the set of all values that P can take, i.e.
PropVal (range, ?p, ?r) → ∀?x ∀?y (PropVal (?p, ?x, ?y) → Type (?y, ?r))
Given all these axioms, we can derive the following formulas:
PropVal (domain, range, Property)
PropVal (range, range, Class)
PropVal (domain, domain, Property)
PropVal (range, domain, Class)
Example. Given PropVal (subClassOf, Professor, Staff),
PropVal (domain, teaches, Professor), PropVal (teaches, Jones, Math1)
we can derive Type (Jones, Staff).
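The derivation in this example can be traced mechanically. The sketch below stores each fact as a PropVal (P, R, V) tuple and applies only three of the axioms (domain, range, and subClassOf) until nothing new follows; the function name derive_types is hypothetical, and this is an illustration of the axioms, not a complete reasoner.

```python
# Facts are written as PropVal(P, R, V) tuples: (property, resource, value).
FACTS = {
    ("subClassOf", "Professor", "Staff"),
    ("domain", "teaches", "Professor"),
    ("teaches", "Jones", "Math1"),
}

def derive_types(facts):
    """Return every Type(x, c) pair that follows from the domain, range and
    subClassOf axioms. Only these three axioms are modeled."""
    types = {(r, v) for (p, r, v) in facts if p == "type"}
    while True:
        new = set()
        for (p, x, y) in facts:
            for (q, prop, cls) in facts:
                if prop != p:
                    continue
                if q == "domain":    # PropVal(domain, p, d) & PropVal(p, x, y) -> Type(x, d)
                    new.add((x, cls))
                elif q == "range":   # PropVal(range, p, r) & PropVal(p, x, y) -> Type(y, r)
                    new.add((y, cls))
        for (x, c) in types:
            for (q, sub, sup) in facts:
                if q == "subClassOf" and sub == c:  # Type(x, c) -> Type(x, c')
                    new.add((x, sup))
        if new <= types:             # fixpoint reached: nothing new was derived
            return types
        types |= new

derived = derive_types(FACTS)
print(sorted(derived))  # prints [('Jones', 'Professor'), ('Jones', 'Staff')]
```

The first pass derives Type (Jones, Professor) from the domain of teaches; the second pass lifts it to Type (Jones, Staff) through the subclass axiom.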
A Direct Inference System for RDF and RDFS
• Based on rules of the form
  If E contains certain triples
  Then add to E certain triples,
  where E is a set of RDF triples.
• Example rules (from W3C RDF recommendations):
  1. If E contains the triple (?x, ?p, ?y)
     Then E also contains the triple (?p, rdf:type, rdf:Property)
  2. If E contains the triples (?u, rdfs:subClassOf, ?v)
     and (?v, rdfs:subClassOf, ?w)
     Then E also contains the triple (?u, rdfs:subClassOf, ?w)
  3. If E contains the triples (?x, rdf:type, ?u)
     and (?u, rdfs:subClassOf, ?v)
     Then E also contains the triple (?x, rdf:type, ?v)
  4. If E contains the triples (?x, ?p, ?y)
     and (?p, rdfs:range, ?u)
     Then E also contains the triple (?y, rdf:type, ?u)
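The four example rules can be run as a small forward-chaining loop: apply every rule to E, add what is new, and repeat until E stops growing. The sketch below is a naive illustration with prefixed names as plain strings; the myUniv:Person class and the function names step and closure are assumptions introduced here, and no attempt is made at efficiency.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
RANGE = "rdfs:range"

def step(E):
    """Apply the four example rules once and return only the new triples."""
    new = set()
    for (x, p, y) in E:
        new.add((p, RDF_TYPE, "rdf:Property"))            # every predicate is a property
        for (u, q, v) in E:
            if p == SUBCLASS and q == SUBCLASS and y == u:
                new.add((x, SUBCLASS, v))                 # subClassOf is transitive
            if p == RDF_TYPE and q == SUBCLASS and y == u:
                new.add((x, RDF_TYPE, v))                 # type propagation
            if q == RANGE and u == p:
                new.add((y, RDF_TYPE, v))                 # range typing
    return new - E

def closure(triples):
    """Add derived triples until a fixpoint is reached."""
    E = set(triples)
    while True:
        new = step(E)
        if not new:
            return E
        E |= new

E0 = {
    ("myUniv:Jones", RDF_TYPE, "myUniv:Professor"),
    ("myUniv:Professor", SUBCLASS, "myUniv:Staff"),
    ("myUniv:Staff", SUBCLASS, "myUniv:Person"),          # hypothetical extra class
    ("myUniv:teaches", RANGE, "myUniv:Course"),
    ("myUniv:Jones", "myUniv:teaches", "myUniv:Math101"),
}
E = closure(E0)
print(("myUniv:Jones", RDF_TYPE, "myUniv:Person") in E)   # prints True
```

The loop terminates because the rules only combine terms that already occur in E (plus rdf:Property), so the set of possible triples is finite.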
How is inference in RDF different from inference in RDFS?
Consider the following triples:
• myUniv:Student1 rdf:type myUniv:TeachingAssistant .
• myUniv:TeachingAssistant rdfs:subClassOf myUniv:Staff .
RDF inference will not return an answer to the query that retrieves all staff
members, i.e.
(?x, rdf:type, myUniv:Staff)
because there is no triple matching this pattern.
RDFS inference will return the instances of the TeachingAssistant class using the rule
(called the “type propagation” rule)
If E contains the triples (?x, rdf:type, ?u)
and (?u, rdfs:subClassOf, ?v)
Then E also contains the triple (?x, rdf:type, ?v)
Types of inferences in SW applications
1. Class membership: if x is an instance of class C, and C is a subclass of D,
we want to infer that x is an instance of D.
2. Equivalence of classes: if class A is equivalent to class B, and class B is
equivalent to class C, then A is equivalent to C.
3. Classification: if a property-value pair is declared to be a sufficient
condition for membership in class A, then if individual x satisfies this
condition, x must be an instance of A.
4. Consistency: if x is declared to be an instance of class A, where A ⊑ B ⊓ C,
A ⊑ D, and B ⊓ D = ∅, then the ontology is inconsistent because class
A must be empty but instead x ∈ A.
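The consistency case can be checked with a few lines of Python. The sketch encodes A ⊑ B ⊓ C and A ⊑ D as a superclass map and B ⊓ D = ∅ as a disjointness pair, then looks for an individual that lands in two disjoint classes. The data structures and function names are hypothetical, and this is a toy check, not how a description logic reasoner works.

```python
# A is a subclass of B, C and D; B and D are declared disjoint; x is asserted to be in A.
SUBCLASS_OF = {"A": {"B", "C", "D"}}
DISJOINT = [("B", "D")]
ASSERTED = {"x": {"A"}}

def all_types(individual):
    """Asserted classes of the individual plus every superclass reachable from them."""
    seen = set()
    stack = list(ASSERTED.get(individual, ()))
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.add(cls)
            stack.extend(SUBCLASS_OF.get(cls, ()))
    return seen

def inconsistencies():
    """List (individual, class1, class2) where the individual belongs to two disjoint classes."""
    return [
        (individual, a, b)
        for individual in ASSERTED
        for (a, b) in DISJOINT
        if {a, b} <= all_types(individual)
    ]

problems = inconsistencies()
print(problems)  # prints [('x', 'B', 'D')]
```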
Multiple inheritance and RDFS
Consider the rule
If E contains the triples (?x, rdf:type, ?u)
and (?u, rdfs:subClassOf, ?v)
Then E also contains the triple (?x, rdf:type, ?v)
Assume the following triples
a. myUniv:Bob rdf:type myUniv:TeachingAssistant .
b. myUniv:Bob rdf:type myUniv:Student .
c. myUniv:TeachingAssistant rdfs:subClassOf myUniv:Staff .
From a. and c. and the above rule we can derive
myUniv:Bob rdf:type myUniv:Staff .
The latter may be inconsistent with b. if Staff and Student are supposed to be disjoint
classes. But disjointness of classes cannot be expressed in RDFS.
In RDFS, if ?A is subClassOf ?B, and ?A is subClassOf ?C, then any individual ?x that is
a member of ?A will also be a member of ?B and ?C. Likewise, the range definitions in
RDFS are not used to restrict the range of a property, but to infer membership in
the range.
What can we deduce in RDFS?
In summary, the inference capabilities of RDFS are limited to the following:
1. Given the domain and the range of a property, we can deduce:
   • Class membership from the domain of a property.
   • Class membership from the range of a property.
   Example: given that “Course isTaughtBy Professor” and “Math101
   isTaughtBy Jones”, we can derive that Math101 ∈ Course and Jones
   ∈ Professor.
2. Given a class hierarchy, we can deduce superclass membership.
   Example: given that Professor ⊑ Staff and Jones ∈ Professor, we can
   derive that Jones ∈ Staff.
3. Given a property hierarchy, we can deduce new facts from
   subproperty relationships. Example: from teachAt ⊑ employedBy and
   “Jones teachAt CCSU” we can derive that “Jones employedBy CCSU”.
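The domain, range, and subproperty deductions from this summary can be reproduced with a short fixpoint loop over triples. The sketch uses unprefixed names from the examples (isTaughtBy, teachAt, employedBy); reading “Course isTaughtBy Professor” as a domain and a range declaration is an interpretation, and the function name rdfs_deductions is hypothetical.

```python
def rdfs_deductions(triples):
    """Apply the domain, range and subPropertyOf rules until nothing new is derived."""
    E = set(triples)
    while True:
        new = set()
        for (s, p, o) in E:
            for (a, q, b) in E:
                if a != p:
                    continue                      # (a, q, b) must describe the predicate p
                if q == "rdfs:domain":
                    new.add((s, "rdf:type", b))   # subject belongs to the domain class
                elif q == "rdfs:range":
                    new.add((o, "rdf:type", b))   # object belongs to the range class
                elif q == "rdfs:subPropertyOf":
                    new.add((s, b, o))            # fact also holds for the superproperty
        if new <= E:
            return E
        E |= new

GIVEN = {
    ("isTaughtBy", "rdfs:domain", "Course"),
    ("isTaughtBy", "rdfs:range", "Professor"),
    ("Math101", "isTaughtBy", "Jones"),
    ("teachAt", "rdfs:subPropertyOf", "employedBy"),
    ("Jones", "teachAt", "CCSU"),
}
derived_only = rdfs_deductions(GIVEN) - GIVEN
for triple in sorted(derived_only):
    print(triple)
```

Running it yields exactly the three conclusions stated in the examples: Math101 is a Course, Jones is a Professor, and Jones is employedBy CCSU.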
What can we not deduce in RDFS?
1. We can’t say that two classes are disjoint, i.e. we can define Student and
Staff as subclasses of the Person class, but cannot say that they are disjoint.
2. Property range is defined globally for all classes; we can’t declare range
restrictions that apply to some classes only, i.e. exceptions are not
allowed.
3. We can’t build Boolean combinations of classes. For example, we may
want to declare a new class, person, which is the disjoint union of classes
male and female.
4. Cardinality restrictions are not allowed. For example, we can’t say that a
person has exactly two parents, or that a class has exactly one instructor.
5. We can’t declare a property to be the inverse of another property, transitive,
functional, etc.
SPARQL: a query language for RDF and RDFS
SPARQL is based on the RDF Turtle serialization and a basic graph pattern matching
algorithm.
Example:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?c
WHERE
{
  ?c rdf:type rdfs:Class .
}
This query retrieves all classes.
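Basic graph pattern matching can be sketched in Python: each pattern is a triple in which terms starting with "?" are variables, and a query succeeds for every variable binding under which all patterns occur in the data. This is a simplified illustration of the idea, not the SPARQL algebra; the function names match and select are hypothetical, and the data reuses the univ: triples from the RDF Schema example.

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None if impossible."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against an earlier binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended

def select(patterns, triples):
    """Return all bindings that satisfy every pattern (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

DATA = [
    ("univ:professor", "rdfs:subClassOf", "univ:staff"),
    ("univ:staff", "rdf:type", "rdfs:Class"),
    ("univ:professor", "rdf:type", "rdfs:Class"),
    ("univ:Jones", "rdf:type", "univ:professor"),
]

# Counterpart of: SELECT ?c WHERE { ?c rdf:type rdfs:Class . }
classes = sorted(b["?c"] for b in select([("?c", "rdf:type", "rdfs:Class")], DATA))
print(classes)  # prints ['univ:professor', 'univ:staff']

# Two patterns sharing ?c: instances of direct subclasses of univ:staff.
staff = select([("?x", "rdf:type", "?c"), ("?c", "rdfs:subClassOf", "univ:staff")], DATA)
print(staff)
```

Note that the second query joins patterns explicitly; it matches only what is stated in the data, with no RDFS inference applied.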
Using SPARQL
What can we do with SPARQL?
1. Extract data as RDF subgraphs, URIs, blank nodes, etc.
2. Explore data by querying for unknown relations.
3. Transform RDF data from one vocabulary into another.
4. Construct new RDF graphs based on RDF query graphs.
5. Update RDF graphs.
6. Do logical entailment for RDF, RDFS, and OWL.
7. Do federated queries over different SPARQL endpoints.
[Figures: a basic query and its result; another example query and its result]
Ontologies and the Semantic Web
“An ontology is an explicit, formal specification of a shared conceptualization.
The term is borrowed from philosophy, where an ontology is a systematic
account of existence. For AI systems, what ‘exists’ is that which can be
represented.” T. Gruber, 1993.
Ontologies are represented via Classes, Relationships, and Instances.
Constraints can be imposed on relationships to define allowed values.
There are several types of ontologies (see textbook, pp. 462 – 468: Internet
shopping world example)
• Upper (top-level) ontologies, representing concepts such as space, time,
event, etc. that are universally valid.
• Domain ontologies, representing concepts in a generic domain.
• Task ontologies, representing concepts related to a particular task.
• Application ontologies, representing specific application task-oriented
domains.
An Ontology Example
• Visit http://protege.stanford.edu/ to learn
about creating ontologies.
Source: http://www.sei.cmu.edu/isis/guide/gifs/fruit-ontology.gif
OWL: The Web Ontology Language
The original OWL language, OWL 1, was intended to provide richer expressiveness
than RDFS, which is why it was based on the SHOIN(D) logic. More expressive power,
however, may lead to undesirable computational properties, which is why OWL 1 was
designed in 3 different flavors to address different knowledge representation needs:
1. OWL Full: fully compatible with RDFS, which is further extended with cardinality
constraints and other means for maximum expressivity, BUT the language is
undecidable.
2. OWL DL: a subset of OWL Full that allows for efficient reasoning, but is not fully
compatible with RDFS.
3. OWL Lite: a subset of OWL DL that does not allow for enumerated classes,
disjointness, or arbitrary cardinality.
The latest version of OWL, OWL 2, is based on the SROIQ(D) logic. It also comes in different
flavors: OWL EL, OWL RL, OWL QL, which are all subsets of OWL 2 DL, which in turn is a
subset of OWL 2 Full.
OWL is based on the Open World Assumption, which states that the absence of
information is not a reason to assume that this information is false.
OWL does not rely on the Unique Name Assumption (unlike databases, which do),
i.e. if two names are not explicitly stated to be different, they may refer to the same
individual.
OWL RDF/RDFS relation
rdfs:Resource
  rdfs:Class
    owl:Class
  rdf:Property
    owl:ObjectProperty
    owl:DatatypeProperty
OWL uses RDF syntax. owl:Class is a specialization of rdfs:Class, and
owl:DatatypeProperty and owl:ObjectProperty are specializations of
rdf:Property.
OWL 1 Syntax
OWL 1 is based on the SHOIN(D) logic, which provides for the following
expressiveness:
• The TBox defines subsumption relationships between classes (ex. C ⊑ D)
• The ABox contains facts about class membership (ex. C(a), C(b)), property
relations (ex. R(a, b)), and equality and difference relations between individuals (ex.
a = b, a ≠ b)
• The RBox defines subsumption/inclusion relationships between properties (ex.
R ⊑ S), inverse properties (ex. R⁻), and transitivity of properties (ex. R⁺ ⊑ R)
• Class constructors: conjunction (C ⊓ D), disjunction (C ⊔ D), negation (¬C)
• Property restrictions: universal (∀R.C) and existential (∃R.C)
• Number restrictions: ≥ n R and ≤ n R
• Closed classes (nominals): {a}
• Datatypes
OWL 2 Syntax
OWL 2 is based on the SROIQ(D) logic, which provides for the following additional
expressiveness compared to OWL 1:
• The TBox also allows for equivalence relationships between classes (ex. C ≡ D)
and for a special class expression Self: ∃S.Self. We can also state ≥ n S.C and
≤ n S.C
• The ABox also allows for negated property relations (ex. ¬R(a, b))
• The RBox may contain, in addition to simple properties, inverse properties
(ex. R⁻) and universal properties (U), and also allows for general inclusion
(ex. R1 ∘ R2 ⊑ S), symmetry, reflexivity, irreflexivity and disjointness of
properties.
There are different syntax versions of OWL 2:
• Functional syntax (substitutes the abstract syntax of OWL 1)
• RDF/XML syntax (extends the existing OWL/RDF syntax)
• OWL/XML syntax (new XML serialization)
• Manchester syntax (compact and human readable, used in ontology editors)
• Turtle syntax (human readable)
Want to build an OWL ontology yourself?
Although some of the OWL serializations are not very hard to use in an
application-development setting, ontology editor applications are
easy to learn and are much more efficient ontology development
tools.
My #1 choice is PROTÉGÉ, developed at Stanford University and freely
available at http://protege.stanford.edu. It comes in two versions:
• Web application
• Desktop application
Both come with extensive documentation, including Ontology
Development 101: A Guide to Creating Your First Ontology @
http://protegewiki.stanford.edu/wiki/Ontology101