Intro to Semantic Web Programming From Semantic Web Programming by J. Hebeler et al., Semantic Web • Represents data in formats amenable to automated processing, integration, and reasoning. • Data is king and it provides even greater value when connected with other data sources to create a linked data web. • Sem Web standards include RDF, OWL, and SPARQL • The Linking Open Data Initiative: 3/09 and 2/10 http://www.ted.com/talks/tim_berners_lee_on_the_next_web. html http://blog.ted.com/2010/03/08/the_year_open_d/ • 2008 Article: http://tomheath.com/papers/bizer-heath-bernerslee-ijswis-linked-data.pdf Introduction • Semantics offer the leverage to make more information better and not overwhelmingly worse • Offers a new approach to extremely tough but lucrative challenges that employ vast amount of information and services • Awareness of Sem Web is required for any solution that depends on dynamic information and service resources • Part 1: Chapters 1 and 2: “Introducing Semantic Web Programming” - Chapter 2 implements “Hello Semantic Web World” • Two main areas drive a semantic web App: Knowledge representation and Application Integration. Introduction (cont.d) • Part 2: Chapters 3 to 7: “Foundations of Semantic Web Programming” - details Knowledge Representation – Ch. 3: “Modeling Information” shows data modeling with RDF; – Ch. 4: “Incorporating Semantics” – creates an Ontology knowledge model with RDFS and OWL; – Ch. 5: “Modeling Knowledge in the Real World” – exercises an ontology via Appl. Frameworks and reasoners , such as Pellet – Ch. 6: “Discovering Information” extracts useful information from a knowledge model with SPARQL – Ch. 7: “Adding Rules” : Shows the use of SWRL, a W3C standard. Introduction (Cont.d) • Part 3: “Building Semantic Web Applications” – Chapters 8-11 – Integrates the knowledge base with an application that acts upon it – Solid programming base for the semantic web – Ch. 8: “Applying a Programming Framework” – explores Jena semantic web framework – Ch. 9: “Combining Information” – ports info from sources such as relational databases, web services, and other formats – Ch. 10: ”Aligning Information” – aligning data with ontological concepts to unify disparate information – Ch. 11: “Sharing Information” – outputs the info in many formats, viz., RDFa, microformats, SPARQL endpoints, etc. – Uses FriendTracker App to directly show the programming concepts Introduction (Continued) • Part 4: “Expanding Semantic Web Programming” – Ch. 12-15 – Ch. 12: “Developing and Using Semantic Services” – adds semantics to services to allow them to participate in the Semantic Web – Ch. 13: “Managing Space and Time” – adds space and time considerations to the knowledge representation – Ch. 14: “Applying Patterns and Best Practices” – shows architectural patterns for constructing various Semantic Web Apps – Ch. 15: “Moving Forward” – discussed the future – focuses on four critical evolving areas: ontology management, advanced integration and distribution, advanced reasoning, and visualization. • Website: http://semwebprogramming.org Chapter 1: Preparing to Program a Semantic Web of Data • Make useable sense out of large, distributed information found throughout the WWW • Objectives of the chapter: – Form a useful, pragmatic definition of Sem Web – ID the major components and relate them – Outline how Sem Web impacts Apps – Discuss myths and hype – Explore the origin and foundation of Sem Web – Find out real world applications Semantic Web Programming: Chapter 1 Map • Semantic Web (SW) – Origin • Foundation – DL and Graph Theory – Components • SW – Statements, Ontology, Instance, and Language • SW Tools – Frameworks, IDE, Reasoner, and KB – Features – Expressiveness, Inference, Integration, and Unique • Programming – Examples – Impacts – Data centric, Sharing Data, Dynamic Data, Expressive Data – Roadblocks – Myths and Hype Definition: Sem Web • A web of data described and linked in ways to establish context or semantics that adhere to defined grammar and language constructs • No formal standard for such programmed semantics; Also, aggregation, sharing, and validation are not easy. E.g., Building • SW – addresses semantics through standardized connections to related info by labeling data uniquely and addressable • Figure 1-2: Isolation Vs the Semantic Web Feature WWW Semantic Web Fundamental Component Unstructured Content Formal Statements Primary Audience Humans Applications Links Indicate Location Indicate Location and Meaning Primary Vocabulary Formatting Instructions Semantics and Logic Logic Informal/nonstandard Description Logic Sem Web • SW statements allow the definition and organization of info to form rich expressions, simplify integration and sharing, enable inference, and allow meaningful information extractions • While the info remains distributed, dynamic, and diverse • In summary, SW improves your App’s ability to effectively utilize large amounts of diverse info on the scale of the WWW SW Relationships • Include definitions, associations, aggregations, and restrictions • Figure 1.3 – Example Graph • Your own “good” personal secretary? – Family, Friends, Associates, Suppliers, Distributors, Employees, Bank, .. • Establish both concepts (e.g., a Person has a birth date) and instances (e.g., John is a friend of Bill) • The former defines an ontology; the latter – instance data • Statements can be asserted or inferred; the former is created directly, while the later needs a reasoner to infer additional statements logically (dashed lines) SW • Sw statements employ a SW vocabulary and language to identify different types of stmnts and relationships. • SW offers several languages; Choice to balance user’s needs for performance, integration, and expressiveness. • The Stmnts are in two forms: Knowledgebases and files; KB offer dynamic, extensible storage like relational DB; Files contain static stmnts. • Example: http://www.geonames.org/ontology • http://en.wikipedia.org/wiki/GeoNames • Can enhance existing data sources (relational DB, web page, web service) and applications (standalone desktop App, mission-critical enterprise App, and large scale web app/svc). The FOAF Project: Pg. 29-30 Source: http://www.geonames.org/ontology/ontology_v2.2.1.rdf • <?xml version="1.0" encoding="UTF-8" ?> • - <rdf:RDF xml:base="http://www.geonames.org/ontology" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:gn="http://www.geonames.org/ontology#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:foaf="http://xmlns.com/foaf/0.1/"> Same: in Turtle Format Links to Discuss • Geonames Feature Codes: http://www.geonames.org/statistics/total.html • Linked data: http://linkeddata.org/ • Sem Web Blog: http://www.geospatialsemanticweb.com/2006/10/14/ge onames-ontology-in-owl • Web Services Overview: http://www.geonames.org/export/ws-overview.html • Collective Intelligence: Algorithms of the Intelligent Web, Marmanis and Babenko, 2009 (Thanks to Luis Atencio) Source: http://www.geonames.org/ontology/ontology_v2.2.1.rdf - <gn:Code rdf:about="http://www.geonames.org/ontology#T.RK"> <skos:definition xml:lang="en">a conspicuous, isolated rocky mass</skos:definition> <skos:inScheme rdf:resource="http://www.geonames.org/ontology#T" /> <skos:prefLabel xml:lang="en">rock</skos:prefLabel> </gn:Code> Source: http://www.geonames.org/ontology/ontology_v2.2.1.rdf - <gn:Code rdf:about="http://www.geonames.org/ontology#S.HTL"> <skos:definition xml:lang="en">a building providing lodging and/or meals for the public</skos:definition> <skos:inScheme rdf:resource="http://www.geonames.org/ontology#S" /> <skos:prefLabel xml:lang="en">hotel</skos:prefLabel> </gn:Code> Source: http://www.geonames.org/ontology/ontology_v2.2.1.rdf - <gn:Class rdf:about="http://www.geonames.org/ontology#S"> <rdfs:comment xml:lang="en">spot, building, farm, ...</rdfs:comment> </gn:Class> - <gn:Class rdf:about="http://www.geonames.org/ontology#T"> <rdfs:comment xml:lang="en">mountain, hill, rock, ...</rdfs:comment> </gn:Class> Source: http://en.wikipedia.org/wiki/GeoNames • GeoNames is a geographical data base available and accessible through various Web services, under a Creative Commons attribution license. • Database and web services The GeoNames database contains over 10,000,000 geographical names corresponding to over 7,500,000 unique features.[1] All features are categorized into one out of nine feature classes and further subcategorized into one out of 645 feature codes. Beyond names of places in various languages, data stored include latitude, longitude, elevation, population, administrative subdivision and postal codes. All coordinates use the World Geodetic System 1984 (WGS84). Those data are accessible free of charge through a number of Web services and a daily database export.[2] The Web services include direct and reverse geocoding, finding places through postal codes, finding places next to a given place, and finding Wikipedia articles about neighbouring places. • Wiki interface The core of GeoNames data base is provided by official public sources, of which quality may vary. Through a wiki interface, users are invited to manually edit and improve the data base by adding or correcting names, move existing features, add new features, etc. • Semantic Web integration Each GeoNames feature is represented as a Web resource identified by a stable URI. This URI provides access, through content negotiation, either to the HTML wiki page, or to a RDF description of the feature, using elements of the GeoNames ontology.[3] This ontology describes the GeoNames features properties using the Web Ontology Language, the feature classes and codes being described in the SKOS language. Through Wikipedia articles URL linked in the RDF descriptions, GeoNames data are linked to DBpedia data and other RDF Linked Data. Comparison of Relational DB and KB Feature Relational Database Knowledge Base Structure Schema Ontology Statements Data Rows Instance Statements Admin Language DDL Ontology Statements Query Language SQL SPARQL Relationships Foreign Keys Multidimensional Logic External to DB, with triggers Formal Logic Statements Uniqueness Key for table URI Major Programming Components • SW Statement: A triple (subject-Predicate-Object) – thousands of these simple 3-tuples are combined in an App. • URI- unique name across the entire Internet. Each SW Statement has an URI. Could include a URL, an abstract URN, or IRI (Intl Resrc ID) • SW Languages – Has a set of keywords to instruct the SW tools. • An Ontology – Has statements that define concepts, relationships, and constraints. Similar to a DB schema or OO class diagram. Use the many existing Ontologies, for better quality and speed. • Instance Data – Statements about specific instances; not a generic concept. This forms the bulk of the SW – an ontology containing the concept ‘person’ may be used by millions of instances of ‘person’ • Example: The nces site: http://nces.ed.gov/ Programming Concepts (Contd) • Construction tools: To construct and integrate a SW through ontology/instance creation/import. GUI-based SW editor to see and explore; APIs • Interrogation tools: Navigate thru the SW to return a requested response. Query language • Reasoners: Add inference to SW; logical additions to gain classification and realization (same as relationship). Leverage assertions • Rule Engines: Support inference typically beyond that can be deduced from description logic. Merge ontologies, count and string searches. • SW Frameworks: package all the tools above to have an integrated flow. We use open source alternatives for both GUI and API. So, you can start right away. Impacts on Programming • Web Data-Centric: Place data at its center; Data is key • Semantic Data: Place meaning directly within the data Vs in programming instructions • Data Integration/Sharing: Access and share rich info resources throughout the WWW • Dynamic Data: Enable dynamic, run-time changes to the structure and contents of your info • Add comments on each later on Roadblocks, Myths, and Hype • Roadblocks: web-centric development; accepting of dirty, conflicting data; and dynamic addition of new data. • SW perspective: data-centric programming and distributed information programming • Traditional approach: ETL (extract, transform, and load) does not scale; has a single point of failure • WWW is full of data- good, bad, and wrong data. To fix, we need to know the truth; not easy. SW can define reputation and manage conflicting info • Remaining open to new data is not an easy perspective. SW maintains a nimble, agile view of data. Myths • Pursuit of one big info model – SW supports a multitude of distributed info sources with a multitude of perspectives. Your solution and you need to have that perspective. • One view – Allow any view for info analysis. Get started quickly and adapt/evolve. Encourages agility, and decoupled, modular design. Look for existing SW sources and use • Acceptance – New technologies scare people, as they should. Change also scares people. Hype • Hype doomed AI; SW, does incorporate excellent AI research of the past; and offers a useful improvement in leveraging info. SW is an evolutionary step in making info work harder for us • Hype may make complex technology look too simple. Tough, challenging problems demand complex technical solutions. SW reduces unnecessary complexity to focus on necessary complexity, that of managing info and knowledge we produce • Hype can overpromise the inherent challenges in using a new technology. Do not expect a perfect world of tools, frameworks, and the SW itself. You may hit sum bumps, but hang in there, it will be worth it. SW Origins • Scientific American article by Tim Berners-Lee et al., discussed SW for machine readability, easy info integration, info inference, unique naming, and rich representations, … • Graph Theory – 1736/Euler - nodes and relationships. Graphs imply the answer mathematically; brute force exhaustive analysis otherwise. • Description Logic – 1980s; rules to construct valid, useful knowledge representations, knowledge representations that are decidable and can actually produce an answer. From first order logic. From AI research. Info not tacit (If else). The externalized form reveals the info for verification, integration, reasoning, and interrogation. Relationships beyond inheritance to seek dependable logic. Links • Haiti http://topics.nytimes.com/top/news/internati onal/countriesandterritories/haiti/index.html • http://nces.ed.gov/ • SEC and XBRL: http://www.cpa2biz.com/Content/media/PRO DUCER_CONTENT/Newsletters/Articles_2010/ CorpFin/SEC_XBRL.jsp