Tools for the Semantic Web
Jim Hendler
http://www.mindswap.org

Sem Web: What it’s all about
Knowledge representation, as this technology is often called, is currently in a state comparable to that of hypertext before the advent of the web: it is clearly a good idea, and some very nice demonstrations exist, but it has not yet changed the world. It contains the seeds of important applications, but to unleash its full power it must be linked into a single global system.
-- Tim Berners-Lee, inventor of the WWW, 2001.

Kyoto U, Oct 2002
www.mindswap.org

Part 1: Review of the Semantic Web

On the Web -- links are critical!
[Diagram: a Web page points to any Web resource through <a href=URI>; in HTML, <a href="http://…">]
On the Semantic Web -- links are critical!
[Diagram: URI, URI, URI joined by RDF]
RDF is like the web!

Sem Web models start from RDF…
DOC1:
<mind:Person rdf:ID="Hendler">
  <mind:title jobs:Professor>
  <jobs:placeOfWork http://www.cs.umd.edu>
</mind:Person>
[Diagram: in DOC1, Hendler has a Mind:title arc to Jobs:Professor and a Jobs:placeOfWork arc to the Web page http://www…]

XML is NOT semantics
XML Schema is DOCUMENT checking
  photo has multiple subject fields
  photo has one physical location
  etc.
WHICH SAYS NOTHING ABOUT TALKS, SUBJECTS, PEOPLE, EVENTS, etc.
<photo>
  <subject> http://www.w3.org/~timbl </subject>
  <name> Tim Berners-Lee </name>
  …
</photo>

The SEMANTICS is in the links (e.g. to ontologies)!
[Diagram labels: Event:title, Event:WebPage]

<daml:ObjectProperty rdf:ID="photograph">
  <rdfs:domain rdf:resource="#Picture"/>
  <rdfs:range rdf:resource="…#person"/>
</daml:ObjectProperty>

<> rdf:type photo:Photograph,
   Photo:File http://…/images#image1,
   Photo:topic :event1#event:speaker.
Event1 a Event:event;
   date “May 7-11”,
   speaker http://…#timbl.html
   Title “WWW 2002…”
TimBL rdf:type w3c-ont:person;
   name “Tim Berners-Lee” …

<s:Class rdf:about="http://www.semanticweb.org/ontologies/swrc-onto-2000-0910.daml#Conference">
  <s:comment>describes a generic concept about events</s:comment>
  <s:subClassOf rdf:resource="http://www.semanticweb.org/ontologies/swrc-onto-2000-0910.daml#Event"/>
  <a:disjointFrom rdf:resource="http://www.semanticweb.org/ontologies/swrc-onto-2000-0910.daml#Workshop"/>
  <a:restrictedBy rdf:resource="http://www.semanticweb.org/ontologies/swrc-onto-2000-0910.daml#genid18"/>
  …
</s:Class>

<rdf:Description rdf:about="http://www.w3.org/2001/03/earl/0.95#Person">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  <rdfs:subClassOf rdf:resource="http://www.w3.org/2001/03/earl/0.95#Assertor"/>
</rdf:Description>

Semantic Web Ontologies are “models”
[Diagram: CV documents with fields such as name, work, private, and educ]
• New SW languages add models to provide mappings and structure.
• XML necessary, not sufficient.

Semantics on the Web
Web ontologies, like the WWW itself, are not “separable”
Thinking about the ontologies without considering
  the links to other ontologies
  the instances that link to them
  the crawling and collecting of ontological terminologies
is like thinking about the Web without the links!!
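
A minimal Python sketch of the point that the semantics lives in the links: each statement is a (subject, predicate, object) triple, and a question about a photo is answered by following links from node to node. The triples are transcribed loosely from the photo/event slide; the short node names (photo1, event1, timbl) stand in for the elided URIs, and the function names are assumptions made for illustration, not part of any Mindswap tool.

```python
# Triples loosely transcribed from the slide; node names are placeholders
# for the elided URIs, not real identifiers.
TRIPLES = [
    ("photo1", "rdf:type", "photo:Photograph"),
    ("photo1", "photo:topic", "event1"),
    ("event1", "rdf:type", "event:Event"),
    ("event1", "event:date", "May 7-11"),
    ("event1", "event:speaker", "timbl"),
    ("timbl", "rdf:type", "w3c-ont:person"),
    ("timbl", "name", "Tim Berners-Lee"),
]


def describe(node, triples):
    """Return {predicate: [objects]} for every triple whose subject is node."""
    out = {}
    for subject, predicate, obj in triples:
        if subject == node:
            out.setdefault(predicate, []).append(obj)
    return out


def speaker_names(photo, triples):
    """Follow photo -> topic -> speaker -> name, one link at a time."""
    names = []
    for event in describe(photo, triples).get("photo:topic", []):
        for person in describe(event, triples).get("event:speaker", []):
            names.extend(describe(person, triples).get("name", []))
    return names


if __name__ == "__main__":
    print(speaker_names("photo1", TRIPLES))
```

Nothing in the photo record itself says who is involved; the answer only appears once the links to the event and to the person are followed.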
[Diagram: the DOC1 graph (Hendler, Mind:title, Jobs:Professor, Jobs:placeOfWork, Web Page http://www…) linked outward to other titles, other URIs, other Professors, other pages, and other descriptions]

Part 2: OWL - The “Web Ontology Language”

OWL extends RDF…
RDF Schema
  Class, subclass
  Property, subproperty
+ Restrictions
  Range, domain
  Local, global
  Existential
  Cardinality
+ Combinators
  Union, Intersection
  Complement
  Symmetric, transitive
+ Mapping
  Equivalence
  Inverse

<rdfs:Class rdf:ID="Meeting">
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#MeetingName"/>
      <daml:toClass rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
      <daml:cardinality>1</daml:cardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#uri"/>
      <daml:toClass rdf:resource="http://www.w3.org/2000/10/XMLSchema#uriReference"/>
      <daml:maxCardinality>1</daml:maxCardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#location"/>
      <daml:toClass rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
      <daml:cardinality>1</daml:cardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#Issues"/>
      <daml:toClass rdf:resource="#Issue"/>
      <daml:minCardinality>0</daml:minCardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
</rdfs:Class>

…into a usable “Modeling” language
In science, models provide interoperability across jargons
  Mathematical models: equations of a system
  Physical models: “sticks and balls” of the atom
  Virtual models: the visualization of a complex data set
  INFORMATION MODELS: taxonomies and thesauri
Ontologies extend thesaurus information models to provide
  Semantic restrictions on property relations
    Must have vs. may have vs. doesn’t have
    Has some vs. has N vs. has 1
    Some vs. all property restrictions
  Formal underpinnings
    Logical entailments
Note: rules, logics, proofs are parts of ontologies, but not yet at a “consensus” level for standardization
  Should build as add-ons to OWL to take advantage of “terminology features”

OWL is NOT…
… A knowledge representation language per se
  Definitely not “the standard” for KR
… A “Description Logic” per se
  It does support DL “idioms”
    E.g. “Lymphoma” is restricted to be a subClassOf those things whose “disease” property is “Cancer”
  It will include a “subset” which is complete, decidable, and in a DL complexity class
  But it will allow uses that DLs do not
    Maybe outside the “semantics” of the model theory
… The right thing to use in KR/KA research per se
  But do use it to distribute your results
  But do use it to test your theories

OWL is a WEB ontology language
OWL is
  WEB-BASED
  DISTRIBUTED
  MACHINE-PROCESSABLE
  BASED ON DAML+OIL
By charter!
It may become a Web recommendation
  Same “language status” as HTML, XML, XML Schema
  And SMIL, P3P: Standard ≠ Use
  A starting place for further evolution

Part 3: KA in the (OWL supported) Sem Web
The good news:
  DAML+OIL is already the most used ontology language in history
    Sept 30, 02: Crawler finds 5M+ DAML statements on 20,000+ web pages
    Doesn’t include many instance KBs tied to ontologies
    Doesn’t include many very large RDFS-based KBs that include some OWL
  OWL is being supported by large corporation labs
    Web tool developers: IBM, HP, Sun, Intel, Fujitsu
    Content providers: Daimler-Chrysler, Nokia, Motorola, EDS, Agfa
  OWL is starting to be used by thesaurus distributors
    Cf. National Cancer Institute metathesaurus to be released in OWL
The bad news:
  On the web it is a statistical blip -- the web is HUGE (HUMONGOUS!!)
  The big players are still on the sidelines
  We could become the next XML or the next SMIL

Do we need KA?
Tom Mitchell made an interesting point
  He says “users are lazy”: they won’t do mark-up
  He says we should use NLP + machine learning (primarily)
He’s WRONG
  Greatest impact likely to be non-textual, non-document content
[Timeline diagram: 1990, 2000, 2010; labels IMAGES AND DOCUMENTS and DATA AND PROGRAMS]

So who is going to mark it up?
There are not now, and never will be, enough knowledge engineers to support the important, critical applications of our technology
  Government applications: NASA, US DoD …
  Health Care applications: Open Health, Swiss hospitals …
  Genomics/Bioinformatics: NCI metathesaurus, Gene Ontology …
  ...
  Historians: Freedman’s project
Let alone the really important stuff out there
  MY information
    My photo archives, my home page, my daughter’s home page, my project pages, my favorite hobby pages, etc. etc. etc.
Personal information created the Web!!!

Then a miracle occurs
THE WEB!!
[Diagram labels: Tool: Mosaic; Language: HTML; Market: Users; MANY USERS; BETTER TOOLS: Netscape, IE, Altavista, etc.]

Key: The Value Proposition
Tools must consider work vs. value
  People will NOT use tools that require a lot of work and have little (perceived) value
  People WILL use tools that save them work and/or provide high (perceived) value
“Perceived” value ≠ “real” value in many cases
  Creating Web pages (ca. 1993) was “cool”
  No study has yet shown a positive work value for the Web as a whole
    But it has changed the way we live
  Viral: My friend sees it, wants one.
  My competitor sees it, needs one
TBL’s “secret” advice: Start small but viral and you can change many things (July, 02)

Value Proposition 1: Semantic Page Creation
The personal info killer application?
[Diagram: a form asks “Tell me about your: Important Person, Hobby, Job”; a query answers “I know about: Scuba shop, Scuba vacation 1, Scuba vacation 2, Scuba instructor”; boxes labeled Ont Library, Marked Up Pages, classes, Choice, XHTML+OWL]
• Many people don’t have home pages
Value:
  Hints for useful properties (using ontology classes)
  Help create content (using ontology instances)
• Note: Useful libraries (lots of stuff) already exist (see daml.org)

Value Proposition 2: Semantic Web Portals
The MOSAIC of the Semantic Web?
[Diagram labels: <XSLT/>, KB]
<Oncogene rdf:ID="Oncogene, MYB">
  <code>C3682</code>
  <id>3683</id>
  <Found_In_Organism rdf:ID="Human"></Found_In_Organism>
  <Gene_Has_Function rdf:ID="Gene Transcription"></Gene_Has_Function>
  <Gene_Has_Function rdf:ID="Transcriptional Regulation"></Gene_Has_Function>
  <In_Chromosomal_Location rdf:ID="6q22q23"/>
</Oncogene>
<Oncogene rdf:ID="Oncogene NMYC">
  <code>C17656</code>
  <id>17657</id>
  <Found_In_Organism rdf:ID="Human"></Found_In_Organism>
  <In_Chromosomal_Location rdf:ID="2p24.1"/>
  <Gene_Has_Function rdf:ID="Transcriptional Regulation"></Gene_Has_Function>
  <Gene_Associated_With_Disease rdf:ID="Neuroblastoma"></Gene_Associated_With_Disease>
</Oncogene>
• Combine browsing, search, and authoring
Value: As I link to concepts, I find useful resources
  Pages, databases, programs, etc.
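
As a rough illustration of how a portal could turn records like the oncogene markup above into something browsable and searchable, the Python sketch below indexes each record by its properties and answers a simple "which genes have this function?" query. It reads the markup as plain XML with the standard library, not as RDF, and the records are simplified from the slide. The namespace http://example.org/nci# and the function names are hypothetical placeholders; the slide does not give the real namespace.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NCI = "http://example.org/nci#"  # hypothetical namespace, assumed for the sketch

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns="{NCI}">
  <Oncogene rdf:ID="Oncogene, MYB">
    <code>C3682</code>
    <Found_In_Organism rdf:ID="Human"/>
    <Gene_Has_Function rdf:ID="Gene Transcription"/>
    <Gene_Has_Function rdf:ID="Transcriptional Regulation"/>
    <In_Chromosomal_Location rdf:ID="6q22q23"/>
  </Oncogene>
  <Oncogene rdf:ID="Oncogene NMYC">
    <code>C17656</code>
    <Found_In_Organism rdf:ID="Human"/>
    <In_Chromosomal_Location rdf:ID="2p24.1"/>
    <Gene_Has_Function rdf:ID="Transcriptional Regulation"/>
    <Gene_Associated_With_Disease rdf:ID="Neuroblastoma"/>
  </Oncogene>
</rdf:RDF>"""


def load_genes(xml_text):
    """Index each Oncogene by its rdf:ID, mapping property name -> values."""
    genes = {}
    root = ET.fromstring(xml_text)
    for gene in root.findall(f"{{{NCI}}}Oncogene"):
        props = {}
        for child in gene:
            name = child.tag.split("}", 1)[1]  # drop the namespace part
            # Prefer the rdf:ID attribute; fall back to element text.
            value = child.get(f"{{{RDF}}}ID") or (child.text or "").strip()
            props.setdefault(name, []).append(value)
        genes[gene.get(f"{{{RDF}}}ID")] = props
    return genes


def genes_with(genes, prop, value):
    """Return the sorted IDs of genes whose property prop includes value."""
    return sorted(g for g, props in genes.items() if value in props.get(prop, []))


if __name__ == "__main__":
    index = load_genes(DOC)
    print(genes_with(index, "Gene_Has_Function", "Transcriptional Regulation"))
```

A real portal would more likely sit on an RDF store than on an in-memory dictionary; the dictionary just keeps the example self-contained.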
Value Proposition 3: Semantic Web Services

Value Proposition 3: And service composition
Buy the French version of a book from amazon.fr and have it sent to my mother

Semantic Web Knowledge Acquisition
Virtually no one will create ontologies from scratch
  High-end ontology developers will be a tiny percentage (10,000 high-end Web designers = 1/10,000 of users)
  It is easier to read than to create ontologies
    Expect “cut and paste” (HTML analogy)
    Most used OWL editor to date is Emacs
  Can bootstrap from existing content
    HTML screen scrapers, structured data, Excel spreadsheets, …
No training allowed
  Motivated users will skim the docs on occasion
  Most users want to use it now
  “Everyone” has a browser - deploy tools through that
  Common metaphors must be used: form fill, menu, search
Note: No formal justification for any of these - but it worked before!

Adding power via Semantic Web
Tools can be domain independent
  Your tool should be usable in lots of contexts!
  Use the standards: OWL and its successors crucial
Tools should assume multiple ontologies
  “It’s the links, stupid”
  Ontology search, collection, “integration” crucial
    Check out the DAML crawler (http://www.daml.org/crawler)
Back-end technologies must be scalable
  Can co-evolve with Semantic Web size
  But remember, the Web is HUGE

Allow extensibility
Users MUST be able to add their own concepts
  Semantic Web (and OWL) allow this
Advanced users will become ontology providers
  It will be “cool” to have yours be the ontology of choice in a domain
Consistency CANNOT be maintained on the web
  May be a useful heuristic
  Insist on consistency and the Semantic Web fails!

GIVE IT AWAY!!!!!
There is, and will be, no market for any of this unless we create it!
  No one will make money selling their tools until we have MANY more users
  Make a small, cheap, easy-to-download version of your tools available
Give it away
  The big winners on the web made it available for free:
    Browsers: Mosaic, Netscape, IE
    Plug-ins: Flash, RealPlayer, QuickTime
    Tools: Adobe, Real Media

Part 4: Mindswap tools
Maryland Information and Network Dynamics Laboratory
Semantic Web Agents Project
http://www.mindswap.org/

Practicing what I preach
Open source tools at http://www.mindswap.org
  Described in proceedings
    But out of date - open source moves fast
  Based on the principles outlined in this talk
RIC: Ontologies make it EASIER to enter knowledge
  Turn properties into forms, use restrictions to check form filling
  Creates a KB of the results that can be used for search
  Coming soon: create a nice web page (using SXMLT)
SMORE: Create content and markup as you go
  Multiple ontology
ConvertToRDF: Dump spreadsheets to RDF using mapping ontology
RDFScreenScraper: turn semi-structured web pages into
ParkaSW: Scalable, data-based KB back-end
  Some built-in inferencing
  Pulled from the patent system to become open source!

Conclusions
The Semantic Web is real, and it is moving fast
  Two years ago you hadn’t heard of it, now it’s on the cover of your proceedings
We’ll win if we remember the “rules of the web”
  Berners-Lee Principle: Build small but viral
  Hendler’s Rule: On the web there is no “THE”
    Yours is ONE of the ways of doing it
  Consensus is hard, but critical
    We did it once and created DAML+OIL, the most-used AI language ever
  Everyone’s application is needed
    YOUR (not “THE”) work is important!
  Value proposition: Make it fun, cool, and useful and people will kill to do the markup (the Web proves this)
  Give it away: Create the markets and we’ll all win
This time it could be for real!
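
The RIC idea from the Mindswap tools slide ("turn properties into forms, use restrictions to check form filling") can be sketched in a few lines of Python. This is an illustration under stated assumptions, not RIC's actual implementation: the cardinality bounds are transcribed by hand from the Meeting class shown in Part 2, and the dictionary layout and the function name check_form are invented for the example.

```python
# Cardinality bounds hand-transcribed from the Meeting class restrictions:
# cardinality 1 -> min 1, max 1; maxCardinality 1 -> max 1; minCardinality 0.
# None means "no upper bound". The layout is an assumption for this sketch.
MEETING_RESTRICTIONS = {
    "MeetingName": {"min": 1, "max": 1},
    "uri": {"min": 0, "max": 1},
    "location": {"min": 1, "max": 1},
    "Issues": {"min": 0, "max": None},
}


def check_form(form, restrictions):
    """Return a list of problems; an empty list means the form passes.

    form maps each property name to the list of values the user entered.
    Properties the restrictions do not mention are left alone, since users
    must be able to add their own concepts.
    """
    problems = []
    for prop, rule in restrictions.items():
        count = len(form.get(prop, []))
        if count < rule["min"]:
            problems.append(f"{prop}: needs at least {rule['min']}, got {count}")
        if rule["max"] is not None and count > rule["max"]:
            problems.append(f"{prop}: allows at most {rule['max']}, got {count}")
    return problems


if __name__ == "__main__":
    filled = {"MeetingName": ["WWW 2002"], "location": ["Kyoto"]}
    print(check_form(filled, MEETING_RESTRICTIONS))
```

The same table could also drive form generation: one required field for each property with min 1, and a repeatable field wherever max is unbounded.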