The Semantic Web
Barry Smith
http://ontologist.com

The problem of ontology
- human beings can integrate highly heterogeneous information
- Consider how the human mind copes with complex phenomena in the social realm (e.g. speech acts of promising), which involve:
  - experiences (speaking, perceiving)
  - intentions
  - language
  - action (and tendencies to action)
  - deontic powers, obligations, claims, authority …
  - background habits
  - mental competences
  - records and representations
- understanding how computers can effect the same sort of integration is a difficult problem

A new silver bullet
The Semantic Web
- designed to integrate the vast amounts of heterogeneous online data and services via dramatically better support at the level of metadata
- designed to yield the ability to query and integrate across different conceptual systems

Tim Berners-Lee, inventor of the World Wide Web,
‘sees a more powerful Web emerging, one where documents and data will be annotated with special codes allowing computers to search and analyze the Web automatically. The codes … are designed to add meaning to the global network in ways that make sense to computers’

hyperlinked vocabularies, called ‘ontologies’, will be used by Web authors ‘to explicitly define their words and concepts as they post their stuff online.’

‘The idea is the codes would let software "agents" analyze the Web on our behalf, making smart inferences that go far beyond the simple linguistic analyses performed by today's search engines.’

Exploiting tools such as:
- XML
- OWL (Web Ontology Language)
- RDF (Resource Description Framework)
- DAML+OIL (DARPA Agent Markup Language + Ontology Inference Layer)
(Confusing syntactic integration with semantic integration?)

University Ontology
- Person*: Employee*, Faculty, Professor, AssistantProfessor, AssociateProfessor, FullProfessor, VisitingProfessor, Lecturer, PostDoc, Assistant, ResearchAssistant, TeachingAssistant, AdministrativeStaff, Director, Chair, Dean, ClericalStaff, SystemsStaff, Student, Undergraduate, GraduateStudent
- Organization*: Department, Institute, Program, ResearchGroup, School, University
- Publication*: Article*, BookArticle*, ConferencePaper*, JournalArticle*, WorkshopPaper*, Book*, Periodical*, Journal*, Magazine*, Proceedings*, Thesis*

University Ontology Relations
- advisor(Student, Professor)
- affiliateOf(Organization, Person)*
- affiliatedOrganization(Organization, Organization)*
- alumnus(Organization, Person)*
- containedIn(Document, Document)*
- doctoralDegreeFrom(Person, University)
- emailAddress(Person, .STRING)*
- head(Organization, Person)*
- listedCourse(Schedule, Course)
- mastersDegreeFrom(Person, University)
- member(SocialGroup, Person)*
- offers(University, Course)
- publicationAuthor(Document, Person)*
- publicationDate(Document, .DATE)*
- publicationOrg(Document, Organization)*
- publicationResearch(Publication, Research)
- publisher(Document, Organization)*
- researchInterest(Person, Research)
- researchProject(ResearchGroup, Research)
- subOrganizationOf(Organization:"suborganization", Organization:"superorganization")*
- takesCourse(Student, Course)

Defining ‘gene’
- GDB: a gene is a DNA fragment that can be transcribed and translated into a protein
- GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype

Example: The Enterprise Ontology
- A Sale is an agreement between two Legal-Entities for the exchange of a Product for a Sale-Price.
- A Strategy is a Plan to Achieve a high-level Purpose.
- A Market is all Sales and Potential Sales within a scope of interest.
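A minimal sketch of how relation signatures like those in the University Ontology Relations slides can be checked mechanically. The three signatures are copied from the slides; the subclass links in SUBCLASS_OF and the helper names is_a and conforms are assumptions made for illustration, since the slides list the classes without stating their hierarchy.

```python
# Relation signatures copied from the University Ontology Relations slides:
# relation name -> (class of first argument, class of second argument).
SIGNATURES = {
    "advisor": ("Student", "Professor"),
    "takesCourse": ("Student", "Course"),
    "offers": ("University", "Course"),
}

# ASSUMPTION: these subclass links are guessed from the class names;
# the slides do not state the hierarchy.
SUBCLASS_OF = {
    "GraduateStudent": "Student",
    "Undergraduate": "Student",
    "FullProfessor": "Professor",
    "AssistantProfessor": "Professor",
}


def is_a(cls, ancestor):
    """Return True if cls equals ancestor or reaches it via SUBCLASS_OF."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def conforms(relation, first_class, second_class):
    """Check that both argument classes fit the relation's declared signature."""
    expected_first, expected_second = SIGNATURES[relation]
    return is_a(first_class, expected_first) and is_a(second_class, expected_second)


print(conforms("advisor", "GraduateStudent", "FullProfessor"))
print(conforms("advisor", "FullProfessor", "GraduateStudent"))
```

A check of this kind only confirms that a statement fits a declared signature. It says nothing about whether two sources mean the same thing by ‘Student’ or ‘gene’, which is the distinction between syntactic and semantic integration raised above.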

Example: Statements of Accounts
- Company financial statements may be prepared under either the (US) GAAP or the (European) IASC standards
- These allocate cost items to different categories depending on the laws of the countries involved.
- Job: to develop an algorithm for the automatic conversion of income statements and balance sheets between the two systems.
- Not even this relatively simple problem has been satisfactorily resolved … why not?
- Because the very same terms mean different things and are applied in different ways in different cultures

Verizon
The promise of Web Services, augmented with the Semantic Web, is to provide THE major solution for integration, the largest IT cost / sector, at $ 500 BN/year.
The Web Services and Semantic Web trends are heading for a major failure (i.e., the most recent Silver Bullet).
In reality, Web Services, as a technology, is in its infancy. ... There is no technical solution (i.e., no basis) other than fantasy for the rest of the Web Services story.
Analyst claims of maturity and adoption (...) are already false. ... Verizon must understand it so as not to invest too heavily in technologies that will fail or that will not produce a reasonable ROI.
Dr. Michael L. Brodie, Chief Scientist, Verizon IT
OntoWeb Meeting, Innsbruck, December 16-18, 2002

Assumptions
- Communication / compatibility problems should be solved automatically (by machine)
- Hence ontologies must be applications running in real time

Application ontology:
- Ontologies are inside the computer
- thus subject to severe constraints on expressive power (effectively the expressive power of Description Logic)

The Semantic Web Initiative
- The Web is a vast edifice of heterogeneous data sources
- Needs the ability to query and integrate across different conceptual systems

How to resolve incompatibilities?
enforce terminological compatibility via standardized term hierarchies, with standardized definitions of terms, which
1. satisfy the constraints of a description logic (DL)
2. are applied as meta-tags to the content of websites

Clay Shirky
The Semantic Web is a machine for creating syllogisms.
- Humans are mortal
- Greeks are human
- Therefore, Greeks are mortal

Lewis Carroll
- No interesting poems are unpopular among people of real taste
- No modern poetry is free from affectation
- All your poems are on the subject of soapbubbles
- No affected poetry is popular among people of real taste
- No ancient poetry is on the subject of soapbubbles
Therefore: All your poems are bad.

the promise of the Semantic Web
- it will improve all the areas of your life where you currently use syllogisms
- most of the data we use is not amenable to recombination in syllogistic form because it is partial, inconclusive, context-sensitive
- So we guess, extrapolate, intuit, we do what we did last time, we do what we think our friends would do … but we almost never use syllogistic logic.

We Describe the World in Generalities
- People who live in Brooklyn speak with a Brooklyn accent
- People who live in France speak French

Merging Databases
Merging databases simply becomes a matter of recording in RDF somewhere that "Person Name" in your database is equivalent to "Name" in my database, and then throwing all of the information together and getting a processor to think about it.
[http://infomesh.net/2001/swintro/]

Is your "Person Name = John Smith" the same person as my "Name = John Q. Smith"?
Who knows?
Not the Semantic Web

XML-syntax does not help

  <BUSINESS-CARD>
    <FIRSTNAME>Jules</FIRSTNAME>
    <LASTNAME>Deryck</LASTNAME>
    <COMPANY>Newco</COMPANY>
    <MEMBEROF>XTC Group</MEMBEROF>
    <JOBTITLE>Business Manager</JOBTITLE>
    <TEL>+32(0)3.471.99.60</TEL>
    <FAX>+32(0)3.891.99.65</FAX>
    <GSM>+32(0)465.23.04.34</GSM>
    <WEBSITE>www.newco.com</WEBSITE>
    <ADDRESS>
      <STREET>Dendersesteenweg 17</STREET>
      <ZIP>2630</ZIP>
      <CITY>Aartselaar</CITY>
      <COUNTRY>Belgium</COUNTRY>
    </ADDRESS>
  </BUSINESS-CARD>

and with correct XML-syntax, questions remain:
- Is "Jules" the first name of the person, or of the business-card?
- Is Jules or Newco the member of XTC Group?
- Do the phone numbers and address belong to Jules or to the business?

Metadata: the new Silver Bullet
- agree on a metadata standard for washing machines as concerns size, price, etc.
- create machine-readable databases and put them on the net
- consumers can query multiple sites simultaneously and search for highly specific, reliable, context-sensitive results

Shirky:
The Semantic Web's philosophical argument -- the world should make more sense than it does -- is hard to argue with. The Semantic Web, with its neat ontologies and its syllogistic logic, is a nice vision. However, like many visions that project future benefits but ignore present costs, it requires too much coordination and too much energy to be effective in the real world …

Shirky:
Much of the proposed value of the Semantic Web is coming, but it is not coming because of the Semantic Web. The amount of meta-data we generate is increasing dramatically, and it is being exposed for consumption by machines as well as, or instead of, people. But it is being designed a bit at a time, out of self-interest and without regard for global ontology.
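
Returning to the business-card example: a minimal sketch, using Python's standard xml.etree.ElementTree module, of what a parser actually recovers from well-formed XML. The card is abbreviated from the slide; the variable names CARD, root and parent_of are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the business card shown on the slide.
CARD = """
<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <LASTNAME>Deryck</LASTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <TEL>+32(0)3.471.99.60</TEL>
</BUSINESS-CARD>
"""

root = ET.fromstring(CARD)

# Map each element's tag to the tag of its parent.
# This nesting is the only relationship the parser recovers.
parent_of = {child.tag: parent.tag for parent in root.iter() for child in parent}

for tag in ("FIRSTNAME", "COMPANY", "MEMBEROF", "TEL"):
    print(tag, "is a child of", parent_of[tag])
```

Every field turns out to be a sibling under BUSINESS-CARD. Nothing in the parsed tree says whether MEMBEROF applies to Jules or to Newco, or whose telephone number TEL is; a human reader supplies those answers, not the syntax.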

Semantic Web effort thus far
- devoted primarily to developing systems for standardized representation of web pages and web processes (= ontology of web typography)
- not to the harder task of developing ontologies (term hierarchies) for the content of such web pages

Cory Doctorow
A world of exhaustive, reliable metadata would be a utopia.

Problem 1: People lie
- Meta-utopia is a world of reliable metadata.
- But poisoning the well can confer benefits to the poisoners
- Metadata exists in a competitive world.
- Some people are crooks.
- Some people are cranks.
- Some people are French philosophers.
- Practical problems of the semantic web: who will police the coding?

Problem 2: People are lazy
- Half the pages on Geocities are called “Please title this page”

Problem 3: People are stupid
- The vast majority of the Internet's users (even those who are native speakers of English) cannot spell or punctuate
- Will internet users learn to accurately tag their information with whatever DL hierarchy they're supposed to be using?

Problem 4: Multiple descriptions
“Requiring everyone to use the same vocabulary denudes the cognitive landscape, enforces homogeneity in ideas.” (Cory Doctorow)

Problem 5: Ontology Impedance
= semantic mismatch between ontologies being merged
This problem is recognized in the Semantic Web literature:
http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf

Solution 1: treat it as (inevitable) ‘impedance’ and learn to find ways to cope with the disturbance which it brings
Suggested here:
http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf

Solution 2: resolve the impedance problem on a case-by-case basis
Suppose two databases are put on the web. Someone notices that "where" in the friends table and "zip" in the places table mean the same thing.
http://www.w3.org/DesignIssues/Semantic.html

We can use the Semantic Web to prove that Joe loves Mary
we found two documents on a trusted site, one of which said that ":Joe :loves :MJS", and another of which said that ":MJS daml:equivalentTo :Mary". We also got the checksums of the files in person from the maintainer of the site.
To check this information, we can list the checksums in a local file, and then set up some FOPL rules that say "if file 'a' contains the information Joe loves mary and has the checksum md5:0qrhf8q3hfh, then record SuccessA", "if file 'b' contains the information MJS is equivalent to Mary, and has the checksum md5:0892t925h, then record SuccessB", and "if SuccessA and SuccessB, then Joe loves Mary".
[http://infomesh.net/2001/swintro/]

Both solutions fail
1. treating mismatches as ‘impedance’ ignores the problem of error propagation (and is inappropriate in an area like medicine)
2. resolving impedance on a case-by-case basis defeats the very purpose of the Semantic Web

Clinicians often do not use category systems at all; they use unstructured text from which usable data has to be extracted in a further step
Why? Because every case is different, much patient data is context-dependent

Problem 5: Ontology Impedance
= semantic mismatch between ontologies
‘gene’ used in websites issued by
- biotech companies involved in gene patenting
- medical researchers interested in role of genes in predisposition to smoking
- insurance companies

Other problems with DL-based ontologies
- DL poor when dealing with context-dependent information/usages of terms, e.g. Severe Acute Respiratory Syndrome
- and when it comes to dealing with time
- and when it comes to dealing with information about instances (rather than concepts or classes)

SARS is NOT Severe Acute Respiratory Syndrome
it is THIS collection of instances of Severe Acute Respiratory Syndrome associated with THIS coronavirus and ITS mutations

Experience shows that there can be no mechanical solution to the problems of data integration in domains like medicine or genetics, or in the domain of really existing commercial transactions

The problem in every case is one of finding an overarching framework for good definitions, definitions which will be adequate to the nuances of the domain under investigation

For DL
Ontologies are software tools, thus limited in their expressive power and in their effectiveness as quality controls

IFOMIS idea: distinguish two separate tasks:
- developing computer applications capable of running in real time
- developing an expressively rich ontology of a sort which will allow sophisticated quality control

Problem 4: Multiple descriptions
Requiring everyone to use the same vocabulary to describe their material is not always practicable, and this is especially so in the medical domain

Basic Formal Ontology
BFO
The Vampire Slayer

BFO
- ontology not the ‘standardization’ or ‘specification’ of concepts (not a branch of knowledge or concept engineering)
- but an inventory of the types of entities existing in reality

BFO goal: to remove ontological impedance by constraining terminology systems with good ontology

BFO
- not a computer application
- but a reference ontology in the sense of Aristotelian philosophy
- it sacrifices tractability for the sake of expressive power

Defining ‘gene’
- GDB: a gene is a DNA fragment that can be transcribed and translated into a protein
- GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype

Ontology
‘fragment’, ‘region’, ‘name’, ‘carry’, ‘trait’, ‘type’ ...
‘part’, ‘whole’, ‘function’, ‘inhere’, ‘substance’ …
are ontological terms in the sense of traditional (philosophical) ontology

Two basic BFO oppositions
- Granularity (of molecules, genes, cells, organs, organisms ...)
- SNAP vs. SPAN: getting time right of crucial importance for medical informatics

MedO: medical domain ontology
theory of granularity relations between
- molecule ontology
- gene ontology
- cell ontology
- anatomical ontology
- etc.
Will serve as basis for new, validated Medical WordNet

BFO
- not just a system of categories
- but a formal theory with definitions, axioms, theorems
- designed to provide formal resources for the building of reference ontologies for specific domains
- the latter should be of sufficient richness that terminological incompatibilities can be resolved intelligently rather than by brute force

The Reference Ontology Community
- IFOMIS (Leipzig)
- Laboratories for Applied Ontology (Trento/Rome, Turin)
- Foundational Ontology Project (Leeds)
- Ontology Works (Baltimore)
- Ontek Corporation (Buffalo/Leeds)
- Language and Computing (L&C) (Belgium/Philadelphia)

Domains of Current Work
- IFOMIS Leipzig: Medicine, Bioinformatics
- Laboratories for Applied Ontology
  - Trento/Rome: Ontology of Cognition/Language
  - Turin: Law
- Foundational Ontology Project: Space, Physics
- Ontology Works: Genetics, Molecular Biology
- Ontek Corporation: Biological Systematics
- Language and Computing: Natural Language Understanding
- MOG (Melbourne Ontology Group)(?)

The End