Semantic Web Technologies: A Tutorial
Li Ding, University of Maryland Baltimore County
Joint work with Deborah McGuinness, Tim Finin and Anupam Joshi
Presented at Kodak Research Laboratories, Rochester, New York, 18 July 2006

@ 2 The Web has made people smarter
  craigslist, del.icio.us

@ 3 But what about machines?
  [Figure labels: tell, register]
  Machines still have a very minimal understanding of text and images.

@ 4 Motivation: machine-friendly data
  Natural language
    as seen by a person: Li Ding is a person
    as seen by a machine: an unreadable string of symbols
  XML - represent structures
    as seen by a person: <person>Li Ding</person>
    as seen by a machine: a tagged element whose tag name and content are unreadable symbols
  Semantic Web - represent more semantics
    represent structures
    enable common vocabulary
    associate symbols with logic interpretation for inference

@ Semantic Web Technologies

@ 6 Semantic Web Layers
  [Figure: W3C Semantic Web layer stack; labels: Semantic Aspect, Web Aspect, HTTP]
  "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." - Berners-Lee, Hendler & Lassila, Scientific American, 2001
  Image source: http://en.wikipedia.org/wiki/Image:W3c_semantic_web_stack.jpg

@ 7 The Semantic Web is simple
  Each URI denotes a concept
    Don't say "colour", say <http://example.com/2002/std6#col>
  URIs are connected by triples
    Relational database, RDF (Resource Description Framework)
  Machines read data as a directed RDF graph
  Source: Tim Berners-Lee, Putting the Web back into Semantic Web, ISWC2005 Keynote

@ 8 Example: RDF graph and syntax
  [Figure: RDF graph with two triples, t1 and t2, leaving one blank node: http://xmlns.com/foaf/0.1/name pointing to the literal "Li Ding", and http://www.w3.org/1999/02/22-rdf-syntax-ns#type pointing to http://xmlns.com/foaf/0.1/Person. Legend: RDF Graph; URI, Literal, BNode; Triple]
  The entire graph means: there exists a person whose name is "Li Ding".
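  A minimal sketch of the same graph as data, assuming a toy encoding in which every term is a (kind, value) tuple. The helper names (uri, bnode, literal, to_ntriples) and the blank node label b0 are illustrative assumptions, not the API of any RDF library; the escaping covers only backslashes and double quotes. It prints the two triples in N-Triples form.

```python
# Toy encoding of the slide's RDF graph: two triples sharing one blank node.
# All helper names here are hypothetical; this is not an RDF library API.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"


def uri(value):
    """A node or arc identified by a URI."""
    return ("uri", value)


def bnode(label):
    """An anonymous node; the label only has meaning inside this graph."""
    return ("bnode", label)


def literal(text):
    """A plain string value."""
    return ("literal", text)


def term_to_nt(term):
    """Render one term in N-Triples syntax (simplified escaping)."""
    kind, value = term
    if kind == "uri":
        return "<" + value + ">"
    if kind == "bnode":
        return "_:" + value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_ntriples(graph):
    """Render a list of (subject, predicate, object) triples, one per line."""
    return "\n".join(
        " ".join(term_to_nt(t) for t in triple) + " ." for triple in graph
    )


person = bnode("b0")  # "there exists a person": no URI is given for it
graph = [
    (person, uri(RDF_TYPE), uri(FOAF_PERSON)),
    (person, uri(FOAF_NAME), literal("Li Ding")),
]

print(to_ntriples(graph))
```

  Using a blank node rather than a URI for the subject is what makes the graph read as an existential statement: some person has this name, without saying which resource that person is.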
  Data encoded in RDF/XML syntax:
    <?xml version="1.0" encoding="utf-8"?>
    <rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/"
             xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <foaf:Person>
        <foaf:name>Li Ding</foaf:name>
      </foaf:Person>
    </rdf:RDF>
  Callouts: XML, unicode, Namespace, URI as tag
  Alternative RDF syntax languages: N3 (Notation 3), N-Triples, Turtle

@ 9 Example: Surfing RDF graphs
  [Figure: three linked RDF graphs]
  G1: http://cs.umbc.edu/~dingli1/foaf.rdf
    node http://cs.umbc.edu/~dingli1/foaf.rdf#dingli
    terms shown: foaf:name (Li Ding), rdf:type (foaf:Person), foaf:knows, foaf:mbox (mailto:finin@umbc.edu), rdfs:seeAlso (http://cs.umbc.edu/~finin/foaf.rdf)
  G2: http://cs.umbc.edu/~finin/foaf.rdf
    terms shown: foaf:mbox (mailto:finin@umbc.edu), foaf:firstName (Tim), foaf:surname (Finin)
  G3: http://xmlns.com/foaf/0.1/
    terms shown: foaf:Person, wordNet:Agent, rdfs:subClassOf, rdf:type rdfs:Class, foaf:mbox, rdfs:domain, rdf:type rdf:Property
  Surf to definition: from G1 to G3
  Surf to another instance: from G1 to G2
  Prefixes
    rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
    rdfs: http://www.w3.org/2000/01/rdf-schema#
    foaf: http://xmlns.com/foaf/0.1/

@ 10 Example: Serving human & machine
  The original RDF/XML is for machines
  The HTML is generated by applying XSLT on RDF/XML

@ 11 Ontology Spectrum
  [Figure: spectrum running from simple taxonomies to expressive ontologies]
    Catalog/ID
    Terms/glossary
    Thesauri, "narrower term" relation
    Informal is-a
    Formal is-a
    Formal instance
    Frames (properties)
    Value Restriction
    Disjointness, Inverse, part of...
    General Logical constraints
  Examples placed along the spectrum: DB Schema, UMLS, RDF, Wordnet, OO, RDFS, DAML, CYC, OWL, IEEE SUO
  Source: Originally by Deborah L. McGuinness (KSL, Stanford), modified by Tim Finin

@ 12 Ontology Languages: RDFS and OWL
  RDFS
    Set theory: rdfs:Class
    Relation: rdf:Property, rdfs:domain, rdfs:range
    Hierarchy: rdfs:subClassOf, rdfs:subPropertyOf
    Built-in datatype: xsd:string, xsd:dateTime
  OWL
    Description Logic
    Class axioms
      Class, Thing, Nothing
      oneOf, disjointWith, unionOf, complementOf, intersectionOf ...
      Restriction, onProperty, cardinality, hasValue ...
    Property axioms
      DatatypeProperty, ObjectProperty, AnnotationProperty, ...
      inverseOf, TransitiveProperty, SymmetricProperty
      FunctionalProperty, InverseFunctionalProperty
    Equality: equivalentClass, sameAs, differentFrom ...
    Ontology annotation: Ontology, imports, versionInfo

@ 13 Example: Inference using ontologies
  Ontology languages (RDFS, OWL) have formal foundations that allow us to infer additional (implicit) statements
    RDFS provides basic ones, e.g. sub-class, sub-property, domain
    OWL adds many more axioms, e.g. inverse-property, equality
  SWRL (Semantic Web Rule Language) enables a general-purpose solution
    Supports rule representation
    But also requires inference support beyond RDFS and OWL
  [Figure: example graph over the individuals #Joe, #Louise and #Deborah, with arcs hasSibling, hasBrother, hasUncle, hasParent, hasChild]
    hasBrother rdfs:subPropertyOf hasSibling
    hasChild owl:inverseOf hasParent
    SWRL: (x hasParent y) (y hasBrother z) => (x hasUncle z)
  Source: Semantic Web tutorial (AAAI 2005) by Deborah L. McGuinness

@ 14 More languages and more ontologies
  Languages (require special inference engine)
    [Trust/Uncertainty] BayesOWL
    [Proof] PML (Proof Markup Language)
    [Query/Data Access] SPARQL Query Language for RDF
    [Rule] SWRL (Semantic Web Rule Language)
    [Policy] REI: A Policy Specification Language
    [Service] OWL-S by DAML (1.2 preview available)
    [Service] SAWSDL (Semantic Annotations for WSDL)
    [Thesauri] SKOS (Simple Knowledge Organization System)
  Ontologies (only need RDFS and/or OWL inference)
    Upper ontologies: OpenCyc, WordNet, OntoSem, SUO
    Specialized common ontologies: FOAF, Dublin Core, RSS
    Domain ontologies: bibtex, biology, and many...
  Li Ding, Pranam Kolari, Zhongli Ding, and Sasikanth Avancha, "Using Ontologies in the Semantic Web: A Survey", in Ontologies in the Context of Information Systems (book chapter), 2005. http://ebiquity.umbc.edu/paper/html/id/257/

@ 15 Semantic Web Tools
  [Figure: tools arranged around the activities create, publish, browse and inference, and around managing ontologies: instance, update, extend, integrate]
  Editor: Protégé, Swoop
  Online Registry: DAML Ontology Library, Schema Web
  Search Engine: Swoogle Semantic Web Search
  Browser: Tabulator, IsaViz, Piggybank, Arago, Horus, Mspace, Magpie
  Reasoner: Pellet (DL), Racer (DL), FACT++ (DL), Jena, JTP, F-OWL, Euler, CWM
  Triple store / RDF store: Jena (SPARQL), KAON, Kowari, Sesame, OWLIM, 3store, Instance store, Redland, Tap, Yars, IBM IODT, RDFLib, RDF gateway, allegro, Oracle 10
  Mapping Tools: ONION, PROMPT, OntoMapper, Glue, OntoMerge, Ontomorph
  source1: http://ebiquity.umbc.edu/paper/html/id/257/Using-Ontologies-in-the-Semantic-Web-A-Survey
  source2: http://www.wiwiss.fu-berlin.de/suhl/bizer/toolkits/

@ Semantic Web data

@ 17 Semantic Web data sources
  Text editor: I write RDF/XML manually.
  Semantic Web editors: Protégé, Swoop
  Information extraction (consumer side)
    NLP (hard), e.g. SemNews
    heuristic scraping (regular expr.), e.g. Semagix Freedom
  Wrapped database content (publisher side)
    blog, social network websites, e.g. livejournal.com
    academic interests: http://www.mindswap.org/, http://ebiquity.umbc.edu
  Generated by software
    creative commons license embedded in HTML
    embedded metadata: JPEG, PDF (XMP)
    agent communication message
    ...

@ 18 The Scale of the Semantic Web
  Statistics based on Semantic Web data indexed by Swoogle
    Year  Terms      Documents  Individuals  Triples    Bytes
          (million)  (million)  (million)    (million)  (billion)
    2004  0.15       0.33       7.3          48         4.3
    2006  1.9        1.6        16           276        47
    2008  10         100        1000         20,000     3000
  Estimated number of documents based on Google query
    Estimate      Docs  Corresponding Google query
    Optimistic    10^9  rdf OR inurl:rss OR inurl:foaf -filetype:html
    Conservative  10^5  rdf filetype:rdf

@ 19 Where the data comes from
  "com" has contributed the largest portion of websites (71%) and pure SWDs (39%), because industry has adopted virtual hosting technology as well as ontologies such as RSS and FOAF
  most SWOs are from "org" (46%, e.g. www.w3.org) and "edu" (14%, e.g. spire.umbc.edu), because of the deep interest of academia and non-profit organizations in developing ontologies
  SWDs: Semantic Web documents; SWOs: Semantic Web ontologies; pure SWD: not embedded
  note: Statistics of top-level domains are also used in characterizing the Web (Henzinger and Lawrence 2004)

@ 20 Source websites of SWD
  [Figure: two log-log plots; x axis m: # of SWDs, y axis: # of websites hosting >= m SWDs]
    Jan 2005 - Aug 2005: fit y = 6236.7 x^-0.6629, R^2 = 0.9622
    Jan 2005 - Mar 2006: fit y = 6598.8 x^-0.7305, R^2 = 0.9649
    Labeled points: (1, 125911), (2, 17474), (3, 5200), (80401, 2), (100517, 1)
  Invariant found!
    The number of websites hosting more than m SWDs follows a power-law distribution
    Similar to the Web
    Head: virtual hosting
    Tail: crawling strategy

@ 21 Size of SWD
  [Figure axis: Number of SWDs]
  Embedded SWDs are small
    69% have 3 triples
    96% have <10 triples
  Pure SWDs
    60% have 5 to 1000 triples.
  Special size of RSS: 130
    17 triples for channel
    7 triples for each of the 15 items
  SWOs
    [Figure axes: Number of SWOs, # of triples]
    Biased by PML
    Small ones from RDF test
    Largest is 1M

@ 22 Age of SWD
  Measured by the last-modified time of SWD
  PSWD: exponential distribution
  SWO: flat tail -- ontology development interests decrease?
  [Figure: number of documents by last-modified date, 7/20/1995 to 7/2/2006, log scale from 1 to 1000000; series: pswd, swo (pml filtered); exponential fit Expon. (pswd): y = 2E-48 e^(0.0032x)]

@ 23 How are Semantic Web terms used?
  All usage distributions follow a power-law distribution
  Few SWTs have been well populated
    371 have >100 class instances
    1208 have >100 property instances

@ 24 Swoogle Rank (citation based)
  [Figure: citation graph of the ten top-ranked namespaces, numbered 1 to 10; each node is annotated with its indegree and mean(inflow), each edge with a weight]
  Namespaces shown
    http://www.w3.org/2000/01/rdf-schema
    http://www.w3.org/1999/02/22-rdf-syntax-ns
    http://www.w3.org/2002/07/owl
    http://purl.org/rss/1.0
    http://web.resource.org/cc
    http://purl.org/dc/elements/1.1
    http://www.hackcraft.net/bookrdf/vocab/0_1/
    http://www.w3.org/2001/vcard-rdf/3.0
    http://purl.org/dc/terms
    http://xmlns.com/foaf/0.1/index.rdf
  Computed using Swoogle metadata by May 2006

@ Semantic Web Applications

@ 26 TAGA: Travel Agent Game in Agentcities
  Motivation
    Market dynamics
    Auction theory (TAC)
    Semantic web
    Agent collaboration (FIPA & Agentcities)
  Features
    Open Market Framework
    Auction Services
    OWL message content
    OWL Ontologies
    Global Agent Community
  Technologies
    FIPA (JADE, April Agent Platform)
    Semantic Web (RDF, OWL)
    Web (SOAP, WSDL, DAML-S)
    Internet (Java Web Start)
  Ontologies: http://taga.umbc.edu/ontologies/
    travel.owl: travel concepts
    fipaowl.owl: FIPA content lang.
    auction.owl: auction services
    tagaql.owl: query language
  Uses of OWL
    OWL as a content language
    OWL for protocol description
    OWL for representation and reasoning
    OWL for service descriptions
  [Figure: agents and the messages between them. Agents: Market Oversight Agent, Bulletin Board Agent, Customer Agent, Auction Service Agent, Travel Agents, Web Service Agents. Messages: Report Contract, Report Direct Buy Transactions, Report Auction Transactions, Report Travel Package, Proposal, Direct Buy]
  FIPA platform infrastructure services, including directory facilitators enhanced to use OWL-S for service discovery
  http://taga.umbc.edu (offline now)

@ 27 Semantic Content Publishing
  data stored in database
  PHP generates both HTML and OWL
  HTML pages link to corresponding OWL
  no more web scraping
  http://ebiquity.umbc.edu/person/html/Li/Ding/
  http://ebiquity.umbc.edu/person/foaf/Li/Ding/foaf.rdf
  [Figure labels: MySQL database, PHP, PHP, FOAF]
  http://ebiquity.umbc.edu/ -- ebiquity group website

@ 28 Rei Policy Language
  Rei is a declarative policy language for describing policies over actions
  Reasons over domain dependent information
  Currently represented in OWL + logical variables
  Based on deontic concepts
    Permission, Prohibition, Obligation, Dispensation
  Models speech acts
    Delegation, Revocation, Request, Cancel
  Meta policies
    Priority, modality preference
  Policy engineering tools
    Reasoner, IDE for Rei policies in Eclipse
  http://rei.umbc.edu/

@ 29 Example: enforcing privacy policy
  The speaker doesn't want others to know the specific room that he's in, but is willing for others to know he's on campus
  He defines the following privacy policy
    Share my location with a granularity >= "State"
  The broker
    isLocated(US) => Yes!
    isLocated(Maryland) => Yes!
    isLocated(UMBC) => Uncertain..
    isLocated(ITE-RM210) => Uncertain..
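  The broker's answers above can be reproduced with a small sketch. This is plain Python, not Rei or the Cobra broker; the granularity levels (room < campus < state < country), the containment table and the function names are assumptions made for illustration, and the "No" answer for a permitted but non-matching place is an added case that the slide does not show.

```python
from enum import IntEnum


class Granularity(IntEnum):
    """Assumed ordering: a larger value means a coarser location."""
    ROOM = 0
    CAMPUS = 1
    STATE = 2
    COUNTRY = 3


# Hypothetical containment table: place -> (granularity, enclosing place).
PLACES = {
    "ITE-RM210": (Granularity.ROOM, "UMBC"),
    "UMBC": (Granularity.CAMPUS, "Maryland"),
    "Maryland": (Granularity.STATE, "US"),
    "US": (Granularity.COUNTRY, None),
}


def enclosing_places(place):
    """Return the place and every place that contains it, innermost first."""
    chain = []
    while place is not None:
        chain.append(place)
        place = PLACES[place][1]
    return chain


def is_located(actual_place, asked_place, min_granularity):
    """Answer a location query without revealing more than the policy allows."""
    asked_level = PLACES[asked_place][0]
    if asked_level < min_granularity:
        # Finer than the policy permits: neither confirm nor deny.
        return "Uncertain"
    if asked_place in enclosing_places(actual_place):
        return "Yes"
    return "No"  # assumed answer; not shown on the slide


# Policy from the slide: share location with a granularity >= "State".
for asked in ["US", "Maryland", "UMBC", "ITE-RM210"]:
    print(asked, "=>", is_located("ITE-RM210", asked, Granularity.STATE))
```

  Returning "Uncertain" for every query finer than the policy threshold, whether or not it matches, keeps a denial from leaking information about the exact room.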
@ 30 Cobra: Context Broker Architecture
  [Figure labels: Ontology, Agents, Service, Inference, Policy]
  http://cobra.umbc.edu/

@ 31 Web-scale semantic web data access
  [Figure: message flow between an agent, a data access service, and the Web]
  Data access service: Index RDF data (from the Web)
  Search vocabulary
    agent: ask ("person")
    service: Search URIrefs in SW vocabulary
    service: inform ("foaf:Person")
  Compose query
    agent: ask ("?x rdf:type foaf:Person")
    service: Search URLs in SWD index
    service: inform (doc URLs)
  Populate RDF database
    agent: Fetch docs (from the Web)
  Query local RDF database

@ 32 Swoogle Semantic Web Search Engine
  Harvesting Semantic Web data from the Web
  Provide search/navigation services
    for machines (via REST + RDF/XML)
    Digest: doc, term, namespace
    Links
    Also serves human users
  Status
    Running since summer 2004
    1.6M RDF documents, 300M RDF triples, 10K ontologies
  http://swoogle.umbc.edu/

@ 33 Ontology Dictionary
  From web of documents to web of data
  Aggregate from multiple sources
  Inductively learned definition
  [Figure: definitions of foaf:Person and foaf:name aggregated from three sources, Onto 1, Onto 2 and SWD3. Terms shown: foaf:name, foaf:Person, foaf:Agent, rdf:type owl:Class, rdfs:domain, rdfs:subClassOf, wob:hasInstanceDomain, dc:title; instance values shown: Tim Finin, Dr.]
  http://swoogle.umbc.edu/2005/modules.php?name=Ontology_Dictionary

@ 34 Semantic Web Challenges - Winners
  2003: CS AKTive Space (CAS) is an integrated Semantic Web application which provides a way to explore the UK Computer Science Research domain across multiple dimensions for multiple stakeholders, from funding agencies to individual researchers.
  2004: Flink itself is also likely to be unique as a crossover between a social experiment and a semantic application.
  2005: CONFOTO is a browsing and annotation service for conference photos.
  http://challenge.semanticweb.org/

@ 35 Triple Shop: SPARQL dataset finder
  Who knows Anupam Joshi? Show me their names, email addresses and pictures
  1. Compose a SPARQL query without FROM clause
  2. Parse SPARQL query, search Swoogle for related URLs, and compose a dataset
  3. Run SPARQL query on dataset
  http://sparql.cs.umbc.edu/tripleshop2/

@ 36 Integrating Social Networks data
  [Figure: several social networks over the same people, combined for trust computation]
  Sources and formats: FOAF (RDF, RDF/XML), DBLP Coauthor Database (HTML), Citeseer Rank, Google PageRank
  Networks: FOAF Network (knows), Golbeck's Trust Network (Trust), DBLP Coauthor Network (co-author; edge weights shown: 28, 6, 1, 1, 5), Reputation Systems (Reputation)
  Node roles: source, hub, sink, island
  Trust network computation: Entity mapping (sameName), Tie strength, Trust aggregation
  People shown: J. Golbeck, L. Ding, H. Chen, P. Kolari, J. Hendler, F. Perich, T. Finin, L. Kagal, A. Joshi, Y. Peng, A. Sheth, M. P. Singh

@ 37 Inference Web Infrastructure
  [Figure: question answerers on the WWW publish proofs in PML, which the Inference Web toolkit consumes]
  Question answerers, languages and projects shown: SDS, OWL-S/BPEL (DAML/SNRC), CWM (TAMI), JTP (DAML/NIMD), SPARK (CALO), N3, KIF, SPARK-L, UIMA, Text Analytics (NIMD/Exp Agg)
  Proof Markup Language (PML): Trust, Justification, Provenance
  Toolkit
    IWTrust: trust computation
    IW Explainer/Abstractor: end-user friendly visualization
    IWBrowser: expert friendly visualization
    IWSearch: search engine based publishing
    IWBase: provenance registration
  [Inference Web] Framework for explaining question answering tasks by abstracting, storing, exchanging, combining, annotating, filtering, segmenting, comparing, and rendering proofs and proof fragments provided by question answerers.
@ 38 PML: Proof Markup Language
  [Figure: PML concepts and the properties linking them, labeled Justification Trace]
  Examples
    Question foo:question1 (what is Tony's Specialty)
    Query foo:query1 (type TonysSpecialty ?x)
    NodeSet foo:ns1 (hasConclusion ...)
    NodeSet foo:ns2 (hasConclusion ...)
  Concepts: Question, Query, NodeSet, InferenceStep, InferenceRule, InferenceEngine, Language, Source, SourceUsage, Mapping
  Properties: isQueryFor, hasAnswer, hasLanguage, fromQuery, fromAnswer, isConsequentOf, hasInferenceEngine, hasRule, hasAntecedent, hasVariableMapping, hasSourceUsage, hasSource, usageTime, ...
  Registry: IWBase

@ 39 IWBrowser - Justification and Provenance

@ 40 Tracking Provenance via RDF Molecule
  [Figure: an RDF graph G is decomposed into the graph's RDF molecules, which are then matched as sub-graphs against Web pages containing one or more molecules discovered by Swoogle]
  Graph G: four triples, t1 to t4, over http://www.cs.umbc.edu/~dingli1, using foaf:name (Li Ding), foaf:knows, foaf:name (Tim Finin) and foaf:mbox (mailto:finin@umbc.edu)
  Ding, L.; Finin, T.; Peng, Y.; Pinheiro da Silva, P.; McGuinness, D.L. Tracking RDF Graph Provenance using RDF Molecules. Proceedings of the Fourth International Semantic Web Conference (poster), November 2005. http://www-ksl.stanford.edu/KSL_Abstracts/KSL-05-06.html

@ 41 Conclusion
  The Semantic Web is simple but powerful
    Standardized by W3C: RDF, RDFS, OWL
    Current focuses
      Query -- SPARQL
      Rules -- SWRL, RIF
      Web services -- OWL-S, WSDL-S, SAWSDL
      Best practice and deployment
  but cannot do everything
    Open questions
      Business model, industry adoption?
      Privacy?

@ 42 Recommended Readings
  Tutorials
    Semantic Web Road map (since 1998), Tim Berners-Lee
    The Semantic Web, Scientific American, May 2001, Tim Berners-Lee, James Hendler and Ora Lassila
    Ontology Development 101: A Guide to Creating Your First Ontology, 2001, Natalya F. Noy and Deborah L. McGuinness
    Semantic Web Tutorials, http://www.w3.org/2001/sw/BestPractices/Tutorials
  Starting points
    W3C Semantic Web activity, http://www.w3.org/2001/sw/
    W3C Semantic Web Interest Group, http://www.w3.org/2001/sw/interest/
    W3C Semantic Web News, http://www.w3.org/2001/sw/news
    Planet RDF - aggregated blogs, http://planetrdf.com/
    Dave Beckett's Resource Description Framework (RDF) Resource Guide
    Swoogle Semantic Web Search Engine, http://swoogle.umbc.edu
    Semantic Web reference card, http://ebiquity.umbc.edu/resource/html/id/94/
  Conferences and Journals
    International Semantic Web Conference (ISWC)
    European Semantic Web Conference (ESWC)
    Semantic Technology Conference (SemTech)
    Journal of Web Semantics

@ 43 Ongoing W3C's Semantic Web Activity
  RDF Data Access Working Group
    RDQL... => SPARQL
  Rules Interchange Working Group
    RuleML => SWRL => RIF
  Best Practices Working Group
    Vocabulary management, e.g. WordNet
    Thesauri: SKOS (Simple Knowledge Organization System)
    Image Annotation
    DOAP (Description of a Project)
    Many tutorials and demos
  Semantic Annotations for Web Services Description Language Working Group
    OWL-S and WSDL-S
    WSDL 2.0