Intro to Semantic Web Programming

advertisement
Intro to Semantic Web
Programming
From Semantic Web Programming
by J. Hebeler et al.,
Semantic Web
• Represents data in formats amenable to automated
processing, integration, and reasoning.
• Data is king and it provides even greater value when
connected with other data sources to create a linked data
web.
• Sem Web standards include RDF, OWL, and SPARQL
• The Linking Open Data Initiative:
3/09 and 2/10
http://www.ted.com/talks/tim_berners_lee_on_the_next_web.
html
http://blog.ted.com/2010/03/08/the_year_open_d/
• 2008 Article: http://tomheath.com/papers/bizer-heath-bernerslee-ijswis-linked-data.pdf
Introduction
• Semantics offer the leverage to make more information better
and not overwhelmingly worse
• Offers a new approach to extremely tough but lucrative
challenges that employ vast amount of information and
services
• Awareness of Sem Web is required for any solution that
depends on dynamic information and service resources
• Part 1: Chapters 1 and 2: “Introducing Semantic Web
Programming” - Chapter 2 implements “Hello Semantic Web
World”
• Two main areas drive a semantic web App: Knowledge
representation and Application Integration.
Introduction (cont.d)
• Part 2: Chapters 3 to 7: “Foundations of Semantic Web
Programming” - details Knowledge Representation
– Ch. 3: “Modeling Information” shows data modeling with RDF;
– Ch. 4: “Incorporating Semantics” – creates an Ontology knowledge
model with RDFS and OWL;
– Ch. 5: “Modeling Knowledge in the Real World” – exercises an
ontology via Appl. Frameworks and reasoners , such as Pellet
– Ch. 6: “Discovering Information” extracts useful information from a
knowledge model with SPARQL
– Ch. 7: “Adding Rules” : Shows the use of SWRL, a W3C standard.
Introduction (Cont.d)
• Part 3: “Building Semantic Web Applications” – Chapters 8-11
– Integrates the knowledge base with an application that acts upon it
– Solid programming base for the semantic web
– Ch. 8: “Applying a Programming Framework” – explores Jena semantic
web framework
– Ch. 9: “Combining Information” – ports info from sources such as
relational databases, web services, and other formats
– Ch. 10: ”Aligning Information” – aligning data with ontological
concepts to unify disparate information
– Ch. 11: “Sharing Information” – outputs the info in many formats, viz.,
RDFa, microformats, SPARQL endpoints, etc.
– Uses FriendTracker App to directly show the programming concepts
Introduction (Continued)
• Part 4: “Expanding Semantic Web Programming” – Ch. 12-15
– Ch. 12: “Developing and Using Semantic Services” – adds semantics to
services to allow them to participate in the Semantic Web
– Ch. 13: “Managing Space and Time” – adds space and time
considerations to the knowledge representation
– Ch. 14: “Applying Patterns and Best Practices” – shows architectural
patterns for constructing various Semantic Web Apps
– Ch. 15: “Moving Forward” – discussed the future – focuses on four
critical evolving areas: ontology management, advanced integration
and distribution, advanced reasoning, and visualization.
• Website: http://semwebprogramming.org
Chapter 1: Preparing to Program a Semantic Web of Data
• Make useable sense out of large, distributed
information found throughout the WWW
• Objectives of the chapter:
– Form a useful, pragmatic definition of Sem Web
– ID the major components and relate them
– Outline how Sem Web impacts Apps
– Discuss myths and hype
– Explore the origin and foundation of Sem Web
– Find out real world applications
Semantic Web Programming: Chapter 1 Map
• Semantic Web (SW)
– Origin
• Foundation – DL and Graph Theory
– Components
• SW – Statements, Ontology, Instance, and Language
• SW Tools – Frameworks, IDE, Reasoner, and KB
– Features – Expressiveness, Inference, Integration, and Unique
• Programming
– Examples
– Impacts – Data centric, Sharing Data, Dynamic Data, Expressive Data
– Roadblocks – Myths and Hype
Definition: Sem Web
• A web of data described and linked in ways to
establish context or semantics that adhere to
defined grammar and language constructs
• No formal standard for such programmed
semantics; Also, aggregation, sharing, and
validation are not easy. E.g., Building
• SW – addresses semantics through
standardized connections to related info by
labeling data uniquely and addressable
• Figure 1-2: Isolation Vs the Semantic Web
Feature
WWW
Semantic Web
Fundamental Component
Unstructured Content
Formal Statements
Primary Audience
Humans
Applications
Links
Indicate Location
Indicate Location and
Meaning
Primary Vocabulary
Formatting Instructions
Semantics and Logic
Logic
Informal/nonstandard
Description Logic
Sem Web
• SW statements allow the definition and
organization of info to form rich expressions,
simplify integration and sharing, enable
inference, and allow meaningful information
extractions
• While the info remains distributed, dynamic,
and diverse
• In summary, SW improves your App’s ability to
effectively utilize large amounts of diverse info
on the scale of the WWW
SW Relationships
• Include definitions, associations, aggregations, and
restrictions
• Figure 1.3 – Example Graph
• Your own “good” personal secretary? – Family, Friends,
Associates, Suppliers, Distributors, Employees, Bank, ..
• Establish both concepts (e.g., a Person has a birth date)
and instances (e.g., John is a friend of Bill)
• The former defines an ontology; the latter – instance
data
• Statements can be asserted or inferred; the former is
created directly, while the later needs a reasoner to
infer additional statements logically (dashed lines)
SW
• Sw statements employ a SW vocabulary and language
to identify different types of stmnts and relationships.
• SW offers several languages; Choice to balance user’s
needs for performance, integration, and
expressiveness.
• The Stmnts are in two forms: Knowledgebases and
files; KB offer dynamic, extensible storage like
relational DB; Files contain static stmnts.
• Example: http://www.geonames.org/ontology
• http://en.wikipedia.org/wiki/GeoNames
• Can enhance existing data sources (relational DB, web
page, web service) and applications (standalone
desktop App, mission-critical enterprise App, and large
scale web app/svc).
The FOAF Project: Pg. 29-30
Source:
http://www.geonames.org/ontology/ontology_v2.2.1.rdf
• <?xml version="1.0" encoding="UTF-8" ?>
• - <rdf:RDF
xml:base="http://www.geonames.org/ontology"
xmlns:skos="http://www.w3.org/2004/02/skos/core#"
xmlns:gn="http://www.geonames.org/ontology#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
Same: in Turtle Format
Links to Discuss
• Geonames Feature Codes:
http://www.geonames.org/statistics/total.html
• Linked data: http://linkeddata.org/
• Sem Web Blog:
http://www.geospatialsemanticweb.com/2006/10/14/ge
onames-ontology-in-owl
• Web Services Overview:
http://www.geonames.org/export/ws-overview.html
• Collective Intelligence: Algorithms of the Intelligent Web,
Marmanis and Babenko, 2009 (Thanks to Luis Atencio)
Source:
http://www.geonames.org/ontology/ontology_v2.2.1.rdf
- <gn:Code
rdf:about="http://www.geonames.org/ontology#T.RK">
<skos:definition xml:lang="en">a conspicuous, isolated
rocky mass</skos:definition>
<skos:inScheme
rdf:resource="http://www.geonames.org/ontology#T" />
<skos:prefLabel xml:lang="en">rock</skos:prefLabel>
</gn:Code>
Source:
http://www.geonames.org/ontology/ontology_v2.2.1.rdf
- <gn:Code
rdf:about="http://www.geonames.org/ontology#S.HTL">
<skos:definition xml:lang="en">a building providing lodging and/or
meals for the public</skos:definition>
<skos:inScheme
rdf:resource="http://www.geonames.org/ontology#S" />
<skos:prefLabel xml:lang="en">hotel</skos:prefLabel>
</gn:Code>
Source:
http://www.geonames.org/ontology/ontology_v2.2.1.rdf
- <gn:Class rdf:about="http://www.geonames.org/ontology#S">
<rdfs:comment xml:lang="en">spot, building, farm,
...</rdfs:comment>
</gn:Class>
- <gn:Class
rdf:about="http://www.geonames.org/ontology#T">
<rdfs:comment xml:lang="en">mountain, hill, rock,
...</rdfs:comment>
</gn:Class>
Source: http://en.wikipedia.org/wiki/GeoNames
•
GeoNames is a geographical data base available and accessible through various Web services, under
a Creative Commons attribution license.
• Database and web services
The GeoNames database contains over 10,000,000 geographical names corresponding to over 7,500,000
unique features.[1] All features are categorized into one out of nine feature classes and further
subcategorized into one out of 645 feature codes. Beyond names of places in various languages, data
stored include latitude, longitude, elevation, population, administrative subdivision and postal codes.
All coordinates use the World Geodetic System 1984 (WGS84).
Those data are accessible free of charge through a number of Web services and a daily database
export.[2] The Web services include direct and reverse geocoding, finding places through postal
codes, finding places next to a given place, and finding Wikipedia articles about neighbouring places.
• Wiki interface
The core of GeoNames data base is provided by official public sources, of which quality may vary.
Through a wiki interface, users are invited to manually edit and improve the data base by adding or
correcting names, move existing features, add new features, etc.
• Semantic Web integration
Each GeoNames feature is represented as a Web resource identified by a stable URI. This URI provides
access, through content negotiation, either to the HTML wiki page, or to a RDF description of the
feature, using elements of the GeoNames ontology.[3] This ontology describes the GeoNames
features properties using the Web Ontology Language, the feature classes and codes being described
in the SKOS language. Through Wikipedia articles URL linked in the RDF descriptions, GeoNames data
are linked to DBpedia data and other RDF Linked Data.
Comparison of Relational DB and KB
Feature
Relational Database
Knowledge Base
Structure
Schema
Ontology Statements
Data
Rows
Instance Statements
Admin Language
DDL
Ontology Statements
Query Language
SQL
SPARQL
Relationships
Foreign Keys
Multidimensional
Logic
External to DB, with
triggers
Formal Logic Statements
Uniqueness
Key for table
URI
Major Programming Components
• SW Statement: A triple (subject-Predicate-Object) –
thousands of these simple 3-tuples are combined in an
App.
• URI- unique name across the entire Internet. Each SW
Statement has an URI. Could include a URL, an abstract
URN, or IRI (Intl Resrc ID)
• SW Languages – Has a set of keywords to instruct the SW
tools.
• An Ontology – Has statements that define concepts,
relationships, and constraints. Similar to a DB schema or
OO class diagram. Use the many existing Ontologies, for
better quality and speed.
• Instance Data – Statements about specific instances; not a
generic concept. This forms the bulk of the SW – an
ontology containing the concept ‘person’ may be used by
millions of instances of ‘person’
• Example: The nces site: http://nces.ed.gov/
Programming Concepts (Contd)
• Construction tools: To construct and integrate a SW
through ontology/instance creation/import. GUI-based
SW editor to see and explore; APIs
• Interrogation tools: Navigate thru the SW to return a
requested response. Query language
• Reasoners: Add inference to SW; logical additions to gain
classification and realization (same as relationship).
Leverage assertions
• Rule Engines: Support inference typically beyond that can
be deduced from description logic. Merge ontologies,
count and string searches.
• SW Frameworks: package all the tools above to have an
integrated flow. We use open source alternatives for both
GUI and API. So, you can start right away.
Impacts on Programming
• Web Data-Centric: Place data at its center;
Data is key
• Semantic Data: Place meaning directly within
the data Vs in programming instructions
• Data Integration/Sharing: Access and share
rich info resources throughout the WWW
• Dynamic Data: Enable dynamic, run-time
changes to the structure and contents of your
info
• Add comments on each later on
Roadblocks, Myths, and Hype
• Roadblocks: web-centric development; accepting of
dirty, conflicting data; and dynamic addition of new
data.
• SW perspective: data-centric programming and
distributed information programming
• Traditional approach: ETL (extract, transform, and load)
does not scale; has a single point of failure
• WWW is full of data- good, bad, and wrong data. To fix,
we need to know the truth; not easy. SW can define
reputation and manage conflicting info
• Remaining open to new data is not an easy
perspective. SW maintains a nimble, agile view of data.
Myths
• Pursuit of one big info model – SW supports a
multitude of distributed info sources with a
multitude of perspectives. Your solution and
you need to have that perspective.
• One view – Allow any view for info analysis.
Get started quickly and adapt/evolve.
Encourages agility, and decoupled, modular
design. Look for existing SW sources and use
• Acceptance – New technologies scare people,
as they should. Change also scares people.
Hype
• Hype doomed AI; SW, does incorporate excellent AI
research of the past; and offers a useful improvement
in leveraging info. SW is an evolutionary step in making
info work harder for us
• Hype may make complex technology look too simple.
Tough, challenging problems demand complex
technical solutions. SW reduces unnecessary
complexity to focus on necessary complexity, that of
managing info and knowledge we produce
• Hype can overpromise the inherent challenges in using
a new technology. Do not expect a perfect world of
tools, frameworks, and the SW itself. You may hit sum
bumps, but hang in there, it will be worth it.
SW Origins
• Scientific American article by Tim Berners-Lee et al.,
discussed SW for machine readability, easy info
integration, info inference, unique naming, and rich
representations, …
• Graph Theory – 1736/Euler - nodes and relationships.
Graphs imply the answer mathematically; brute force
exhaustive analysis otherwise.
• Description Logic – 1980s; rules to construct valid,
useful knowledge representations, knowledge
representations that are decidable and can actually
produce an answer. From first order logic. From AI
research. Info not tacit (If else). The externalized form
reveals the info for verification, integration, reasoning,
and interrogation. Relationships beyond inheritance to
seek dependable logic.
Links
• Haiti http://topics.nytimes.com/top/news/internati
onal/countriesandterritories/haiti/index.html
• http://nces.ed.gov/
• SEC and XBRL:
http://www.cpa2biz.com/Content/media/PRO
DUCER_CONTENT/Newsletters/Articles_2010/
CorpFin/SEC_XBRL.jsp
Download