Taxonomies & Classifications for Organizing Content

What do we know about taxonomies?
Ontology comes from the Greek roots of ontologia:
Onto = being, that which exists
Logia = study of, discourse about
Together: the science of existence, talking about being.

Who gets credit for taxonomies?
Aristotle is the founder of taxonomy.
• His ideas represent the foundation for object-oriented systems
• He introduced a number of inference rules (syllogisms) used in modern logic-based reasoning systems

Why is it that in the last decade (more than 2,000 years after Aristotle) knowledge representations and ontologies have gained importance?
• Agent communication (automated data mining)
• Artificial Intelligence (Cyc)
• Description of content to facilitate its retrieval (intelligent searches)
• Ecommerce (Amazon)
• E-science experiments
• E-learning systems
• Information integration (personalized newspapers and journals)
• Intelligent devices (management of remote equipment)
• Knowledge management (corporate intranet)
• Speech and natural language understanding
• Web Service discovery (mobile devices)
• Etc., etc., etc., whatever humankind concocts (the MATRIX)

What do all of these things have in common?
• Automated data mining
• Artificial Intelligence
• Intelligent searches
• Amazon
• E-science experiments
• E-learning systems
• Personalized newspapers and journals
• Intelligent devices
• Knowledge management
• Speech and natural language understanding
• Web Service discovery
Through the use of ONTOLOGIES, they attempt to represent knowledge in such a way that a computer can understand it and use it in real time.

What are the ontological challenges?
• Multiple groups of people are conceptualizing different ways to represent knowledge, and the programs they write come from different conceptual backgrounds: learning theory, psychology, philosophy, logic, computer science
• Ontologies can differ depending on the needs and conventions of the producers and the consumers of the knowledge being represented
• The word ontology is used to describe different degrees of structure

Ontologies can differ depending on the needs and conventions of the producers and the consumers of the knowledge being represented.
For example, the word APPLIANCE has many different meanings. An ontology about the domain of APPLIANCE could model:
• Household appliances (small and major): blenders, espresso machines, stoves, washers/dryers, etc.
• Computer appliances: 1U, software, virtual, etc.
• Orthodontic appliances: braces, retainers, etc.
Domain ontologies represent concepts in very specific and often eclectic ways, so they are often incompatible. Furthermore, different ontologies in the same domain can arise from different perceptions of the domain based on cultural background, education, or ideology, or because a different representation language was chosen.

The word ontology has been used to describe artifacts with different degrees of structure:
• Simple taxonomies (Yahoo)
• Metadata schemes (Dublin Core)
• Logical theories (Cyc)

Abbreviations
AI        Artificial Intelligence
DAML      DARPA Agent Markup Language
GL        Generative Lexicon
HAC       Hierarchical Agglomerative Clustering
HTML      HyperText Markup Language
IE        Information Extraction
ILP       Inductive Logic Programming
IR        Information Retrieval
JS        Jensen-Shannon divergence
KB        Knowledge Base
KM        Knowledge Management
KR        Knowledge Representation
LSI       Latent Semantic Indexing
LSA       Latent Semantic Analysis (=LSI)
MRD       Machine Readable Dictionary
MT        Machine Translation
MUC       Message Understanding Conferences
NER       Named Entity Recognition
NLP       Natural Language Processing
NP        Noun Phrase
OIL       Ontology Inference Layer
OWL       Web Ontology Language
PLSI      Probabilistic Latent Semantic Indexing
PMI       Pointwise Mutual Information
POS       Part Of Speech
PP        Prepositional Phrases
RDF(S)    Resource Description Framework (Schema)
SVMs      Support Vector Machines
VP        Verb Phrase
QA        Question Answering
UML       Unified Modeling Language
XML       eXtensible Markup Language
XML-DTD   XML-Document Type Definition
WSD       Word Sense Disambiguation

Regardless of these differences, in one way or another an ontology looks at a domain in terms of:
• Classes (general things) in the many domains of interest
• The relationships that can exist among things
• The properties (or attributes) those things may have

Cyc
A project started in Austin, Texas by Doug Lenat as part of the Microelectronics and Computer Technology Corporation (MCC). It is an AI project that attempts to assemble a comprehensive ontology and database of everyday common-sense knowledge, with the goal of enabling AI applications to perform human-like reasoning. The original knowledge base is proprietary, but there is now an open version.

WordNet
A semantic lexicon for the English language. Its purpose is twofold:
• to produce a combination of dictionary and thesaurus that is more intuitively usable
• to support automatic text analysis and AI applications
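Returning to the three ingredients every ontology shares (classes, relationships, properties), a small sketch may make them more concrete. The Python below is only an illustration under assumed names: `OntClass`, its `parent` link, and the appliance classes and property values are hypothetical and are not taken from Cyc, WordNet, or any published ontology.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntClass:
    """A class (general thing) in a toy ontology; all names are illustrative."""
    name: str
    parent: Optional["OntClass"] = None              # the "is-a" relationship
    properties: dict = field(default_factory=dict)   # attributes of the class

    def ancestors(self):
        """Return the names of all superclasses, nearest first."""
        found = []
        node = self.parent
        while node is not None:
            found.append(node.name)
            node = node.parent
        return found

    def lookup(self, prop):
        """Find a property on this class, or inherit it from a superclass."""
        node = self
        while node is not None:
            if prop in node.properties:
                return node.properties[prop]
            node = node.parent
        return None


# Hypothetical slice of the household-appliance domain.
appliance = OntClass("Appliance")
household = OntClass("HouseholdAppliance", appliance, {"used_in": "home"})
small = OntClass("SmallAppliance", household, {"portable": True})
blender = OntClass("Blender", small, {"function": "blending"})

print(blender.ancestors())        # ['SmallAppliance', 'HouseholdAppliance', 'Appliance']
print(blender.lookup("used_in"))  # home
```

An orthodontic ontology could define its own `Appliance` class with a completely different subtree, which is one way to picture why domain ontologies built for different communities tend to be incompatible.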
The Dublin Core
The Dublin Core metadata element set is a standard for cross-domain information resource description. It provides a simple and standardized set of conventions for describing things online in ways that make them easier to find. Dublin Core is widely used to describe digital materials such as:
• Video
• Sound
• Image
• Text
• Composite media like web pages

Suggested Upper Merged Ontology or SUMO
It was originally developed by the Teknowledge Corporation and is now maintained by Articulate Software. SUMO originally concerned itself with meta-level concepts and thereby would lead naturally to a categorization scheme for encyclopedias. It has since been considerably expanded to include a mid-level ontology and dozens of domain ontologies. SUMO was first released in December 2000.

Web Ontology Language or OWL
The W3C is trying to define an ontology language that can be used across all domains and applications:
• Agent communication
• Artificial Intelligence
• Description of content to facilitate its retrieval
• Ecommerce
• E-science experiments
• E-learning systems
• Information integration
• Intelligent devices
• Knowledge management
• Speech and natural language understanding
• Web Service discovery

The General Formal Ontology (GFO)
Developed by Heinrich Herre, Barbara Heller and collaborators (the Onto-Med research group in Leipzig). Primarily, the ontology GFO:
• includes objects as well as processes, and integrates both into one coherent system
• includes levels of reality
• is designed to support interoperability by principles of ontological mapping and reduction
• contains several novel ontological modules, in particular a module for functions and a module for roles
• is designed for applications, firstly in medical, biological, and biomedical areas, but also in the fields of economics and sociology
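Returning to the Dublin Core element set described above, the following sketch shows roughly how a resource description might be assembled and embedded in a web page. The fifteen element names are those of the Dublin Core Metadata Element Set; everything else is an assumption made for illustration: the helper `to_meta_tags` is hypothetical, the record values are made up, and the `DC.` prefix follows the common convention for expressing Dublin Core in HTML meta tags.

```python
from html import escape

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_meta_tags(record):
    """Render a Dublin Core record as HTML <meta> tags (hypothetical helper)."""
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    lines = []
    for element, value in record.items():
        lines.append(
            f'<meta name="DC.{element}" content="{escape(value, quote=True)}">'
        )
    return "\n".join(lines)


# Hypothetical record for a course web page; the values are illustrative only.
page = {
    "title": "Information Architecture and Design",
    "type": "Text",
    "format": "text/html",
    "language": "en",
    "date": "2007",
}

print(to_meta_tags(page))
```

Restricting the keys to a fixed element set is what makes such descriptions comparable across collections, which is the point of a cross-domain metadata scheme.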
EXAMPLES of ONTOLOGIES IN ACTION: Web Portals
A web portal defines an ontology for its community. An ontology for an information science portal includes the terms "journal paper," "publication," "person," and "author." This ontology could include definitions that state things such as "all journal papers are publications" or "the authors of all publications are people." When combined with facts, these definitions allow other facts that are necessarily true to be inferred. These inferences can, in turn, allow users to obtain search results from the portal that are impossible to obtain from conventional retrieval systems. Such a technique relies on content providers using the web ontology language to capture high-quality ontology relationships.

EXAMPLES of ONTOLOGIES IN ACTION: Multimedia Collection
If an indexer selects the value "Late Georgian" for the style/period of an antique chest of drawers, it should be possible to infer that the data element "date.created" should have a value between 1760 and 1811 A.D. and that the "culture" is British. Availability of this type of background knowledge significantly increases the support that can be given for indexing as well as for search. Another feature that could be useful is support for the representation of default knowledge. An example of such knowledge would be that a "Late Georgian chest of drawers," in the absence of other information, would be assumed to be made of mahogany. This knowledge is crucial for real semantic queries; for example, a user query for "antique mahogany storage furniture" could match images of Late Georgian chests of drawers even if nothing is said about wood type in the image annotation.
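The Late Georgian example combines two kinds of background knowledge: facts that follow from the chosen style, and defaults that apply only when the annotation is silent. A minimal sketch of that behavior is shown below. The date range, culture, and mahogany default come from the example; the dictionary layout, the `CATEGORY` mapping, and the functions `enrich` and `matches` are assumptions introduced for illustration, not part of any particular indexing system.

```python
# Background knowledge keyed by style/period (values from the example above).
STYLE_FACTS = {
    "Late Georgian": {"date_created": (1760, 1811), "culture": "British"},
}

# Default knowledge: used only when the annotation gives no other value.
STYLE_DEFAULTS = {
    ("Late Georgian", "chest of drawers"): {"material": "mahogany"},
}

# Assumed "is-a" mapping from object type to a broader category.
CATEGORY = {"chest of drawers": "storage furniture"}


def enrich(annotation):
    """Return a copy of the annotation with inferred and default values added."""
    result = dict(annotation)
    style = annotation.get("style")
    obj = annotation.get("object")
    # Values implied by the style are filled in where the indexer left them out.
    for key, value in STYLE_FACTS.get(style, {}).items():
        result.setdefault(key, value)
    # Defaults never override what the indexer actually recorded.
    for key, value in STYLE_DEFAULTS.get((style, obj), {}).items():
        result.setdefault(key, value)
    return result


def matches(annotation, material, category):
    """Check a query such as 'mahogany storage furniture' against an image."""
    enriched = enrich(annotation)
    return (
        enriched.get("material") == material
        and CATEGORY.get(enriched.get("object")) == category
    )


image = {"object": "chest of drawers", "style": "Late Georgian"}
oak_image = {"object": "chest of drawers", "style": "Late Georgian", "material": "oak"}

print(enrich(image))
print(matches(image, "mahogany", "storage furniture"))      # True
print(matches(oak_image, "mahogany", "storage furniture"))  # False
```

The first image matches the query even though its annotation never mentions wood type, while the explicitly annotated oak chest does not, because a recorded value takes precedence over the default.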
EXAMPLES of ONTOLOGIES IN ACTION: Corporate Website Management
An ontology-enabled web site may be used by:
• A salesperson looking for sales collateral relevant to a sales pursuit
• A technical person looking for pockets of specific technical expertise and detailed past experience
• A project leader looking for past experience and templates to support a complex, multi-phase project, both during the proposal phase and during execution
A typical problem for each of these types of users is that they may not share terminology with the authors of the desired content. The salesperson may not know the technical name for a desired feature, or technical people in different fields might use different terms for the same concept. For such problems, it would be useful for each class of user to have a different ontology of terms, with the ontologies interrelated so that translations can be performed automatically.

Moving from the World Wide Web to the Semantic Web
Ontologies figure prominently in the emerging Semantic Web as a way of representing the semantics of documents and enabling those semantics to be used by web applications and intelligent agents. There are studies on generalized techniques for merging ontologies, but this area of research is still largely theoretical.

Information versus Knowledge
The World Wide Web is based mainly on documents written in HyperText Markup Language (HTML).
When you enter a search query:
“Information Architecture and Design - Fall 2007 and UT Austin”
the search engine is programmed to pull relevant documents based on an algorithm that factors in metadata relevant to your query words:
• number of keywords in the page
• name of images
• number of hyperlinks entering and exiting the page
• etc.

Language
- expandable
- language independent
- machine understandable
- understood by humans
- ambiguous

Knowledge
- changes rapidly
- may be local to an entity

Information versus Knowledge
The search program pulls INFORMATION. It has no real understanding of the page: NO KNOWLEDGE.
Page source (excerpt):
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Information Architecture and Design Fall 2007</title>
<meta name="keywords" content="Information Architecture, Information Design, Information Architecture and Design, School of Information, Web Information Seeking, Web Site Design, Information Seeking, Information Retrieval, Fall 2006">
<meta name="description" content="Course Web Site: Information Architecture and Design, Fall 2006">
<meta content="text/html; charset=iso-8859-1">
<link href="Web_files/iabeta.css" rel="stylesheet" type="text/css">
<link rel="stylesheet" type="text/css" href="Web_files/iaprint.css" media="print">
</head>
<body>
<!--Begin page header logo and search box. This header contains an image file that will change with each class.-->
<div id="headerlogo">School of Information, The University of Texas at Austin<br>
<span class="logoL2">385E Information Architecture and Design l</span><br>
<p class="logoL1"> <span class="logoL3">Fall 2007</span></p></div>
<div id="headersearch" class="noprint">
<form method="get" action="http://www.google.com/univ/utexas"><input name="q" size="30" maxlength="255" value="" align="top" type="text"><br>
<input name="btnG" value="Search" align="center" type="submit">
<a href="http://www.google.com"><img src="Web_files/GoogleLogo.gif" border="0" height="27" width="64"></a></form>
</div>
<!--Begin top navigation including primary (folder) and secondary nav subline. -->
<ul id="topnavfolders" class="noprint">
<li><a href="index.html" class="selected">Overview</a></li>
<li><a href="policies.html">Policies</a></li>
<li><a href="schedule.html">Schedule</a></li>
<li><a href="assignments.html">Assignments</a></li>
<li><a href="resources.html">Resources</a></li>
</ul>
<div id="topnavsub" class="noprint"><a href="#1" class="overview">General Info</a>&nbsp;<a href="#2" class="overview">Description</a>&nbsp;<a href="#3" class="overview">Objectives</a>&nbsp;<a href="#4" class="overview">Textbooks</a>&nbsp;<a href="#5" class="overview">Mailing List</a>&nbsp;</div>
<div id="content"><a name="1"></a><h1>General Information:</h1>
<p>Instructor: A. Fleming Seay, PhD <br>Email: <a href="mailto:Fleming_Seay@Dell.com">Fleming_Seay@Dell.com</a><br>Phone: (412) 3341682<br>Office Hours: by appointment</p>
<p>Class Meeting Time: Tuesday 6:30&ndash;9:30pm <br>Classroom: SZB 546<br>Course Website: <a href="http://www.ischool.utexas.edu/%7Ei385e/index.html">http://www.ischool.utexas.edu/~i385e</a><br>TA: Jade Anderson<br>Email: <a href="mailto:jade@ischool.utexas.edu">jade@ischool.utexas.edu</a>

Information versus Knowledge
FACTS: what exists on the Web at the present time
INTERPRETATION OF FACTS in light of:
• Truths
• Beliefs
• Perspectives
• Judgments
• Methodologies
• Know-how
ontology =
Bibliography
Cimiano, Philipp. Ontology Learning and Population from Text: Algorithms, Evaluation and Applications. 2006. (New York: Springer Science & Business Media, LLC).
Heflin, Jeff (editor). “OWL Web Ontology Language Use Cases and Requirements: W3C Recommendation 10 February 2004.” http://www.w3.org/TR/webont-req/ . 2004. World Wide Web Consortium. Retrieved August 21, 2007.
Hillmann, Diane. “Using Dublin Core.” http://dublincore.org/documents/usageguide/ . 1995-2007. Dublin Core Metadata Initiative. Retrieved July 25, 2007.
Hillmann, Diane. “Using Dublin Core - The Elements.” http://dublincore.org/documents/usageguide/elements.shtml . 1995-2007. Dublin Core Metadata Initiative. Retrieved July 25, 2007.
Walton, Christopher D. Agency and the Semantic Web. 2007. (New York: Oxford University Press).
“About Cycorp.” http://www.cyc.com/cyc/company . 2002-2007. Cycorp, Inc. Retrieved September 29, 2007.
“About WordNet.” http://wordnet.princeton.edu/ . 2006. Princeton University. Retrieved September 29, 2007.
“General Formal Ontology.” http://www.ontomed.de/en/theories/gfo/index.html . 2007. University of Leipzig: Department of Formal Concepts. Retrieved September 29, 2007.
“MODS: Metadata Object Description Schema: Official Web Site.” http://www.loc.gov/standards/mods/ . August 27, 2007. Library of Congress. Retrieved September 29, 2007.

A core glossary is a simple glossary or defining dictionary that enables the definition of other concepts, especially for newcomers to a language or field of study. It contains a small working vocabulary and definitions for important or frequently encountered concepts, usually including idioms or metaphors useful in a culture. In computer science, a core glossary is a prerequisite to a core ontology.
An example of this is seen in SUMO.
The search engine Google provides a service that searches only web pages belonging to a glossary, thereby providing access to a kind of compound glossary of glossary entries found on the web.

An upper ontology (or foundation ontology) is a model of the common objects that are generally applicable across a wide range of domain ontologies. It contains a core glossary in whose terms objects in a set of domains can be described. There are several standardized upper ontologies available for use, including Dublin Core, GFO, OpenCyc/ResearchCyc, SUMO, and DOLCE. WordNet, while considered an upper ontology by some, is not an ontology: it is a unique combination of a taxonomy and a controlled vocabulary.

RDF (XML-based syntax)
RDFS
OWL (Web Ontology Language)
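To make the RDF and RDFS layers a little more concrete, the information science portal example ("all journal papers are publications", "the authors of all publications are people") can be sketched as subject-predicate-object triples with two RDFS-style inference rules. This is a simplified illustration in plain Python, not an RDF library: the `ex:` names, the data, and the function `infer` are hypothetical, and the second definition is approximated here as a range statement on the author property.

```python
SUBCLASS_OF = "rdfs:subClassOf"
RANGE = "rdfs:range"
TYPE = "rdf:type"


def infer(triples):
    """Apply two RDFS-style rules repeatedly until no new triples appear."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # Rule 1: x is a C, and C is a subclass of D  =>  x is a D
                if p == TYPE and p2 == SUBCLASS_OF and s2 == o:
                    new.add((s, TYPE, o2))
                # Rule 2: property p has range C, and x p y  =>  y is a C
                if p2 == RANGE and s2 == p:
                    new.add((o, TYPE, o2))
        if new <= facts:
            return facts
        facts |= new


# Definitions (the ontology) plus two facts about a hypothetical paper.
triples = {
    ("ex:JournalPaper", SUBCLASS_OF, "ex:Publication"),
    ("ex:author", RANGE, "ex:Person"),
    ("ex:paper1", TYPE, "ex:JournalPaper"),
    ("ex:paper1", "ex:author", "ex:alice"),
}

for triple in sorted(infer(triples) - triples):
    print(triple)
# ('ex:alice', 'rdf:type', 'ex:Person')
# ('ex:paper1', 'rdf:type', 'ex:Publication')
```

Neither derived fact was stated directly. A portal that runs this kind of inference could answer a search for publications with a journal paper, which a purely keyword-based retrieval system would not do.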