SWEET: Upper-Level Ontologies for Earth System Science
OPeNDAP Meeting, Feb 2007
Rob Raskin
PO.DAAC, Jet Propulsion Laboratory

Data to Knowledge
Columns: Data → Information → Knowledge
Basic elements: Bytes, Numbers, Models, Facts
Services: Ingest, Archive, Visualize, Infer, Understand, Predict
Storage: File, Database, HDF-EOS, GIS/MIS, Ontology, Mind
Interoperability: Syntactic (OPeNDAP, WMS/WCS) → Semantic
Volume/Density: High/Low → Low/High
Statistics: Checksum, Moments, Descriptive, Inferential
Analysis methodology: Fourier, Wavelet, EOF, SSA, Exploratory analysis, Model-based mining
Syntax → Semantics

Semantics: Shared Understanding of Concepts
Provides a namespace for scientific terms, plus descriptions of how terms relate to one another
Example tags in markup language: subclass, subproperty, part of, same as, transitive property, cardinality, etc.
Enables an object in “data space” to be associated formally with an object in “science concept space”
“Shared understanding” enables software tools to find “meaning” in resources

Ontology Representation
W3C has adopted four XML-based standard ontology languages: RDF, OWL-Lite, OWL-DL, OWL Full
Basic building blocks: class, subclass, property, subproperty, sameAs
A standard language enables anyone to extend an ontology
Knowledge is built up incrementally

Why an Upper-Level Ontology for Earth System Science?
Many common concepts are used across Earth science disciplines (such as properties of the Earth)
Provides common definitions for terms used in multiple disciplines or communities
Provides a common language in support of community and multidisciplinary activities
Reduces the burden (and barrier to entry) on creators of specialized domain ontologies
Only need to create ontologies for incremental knowledge

Semantic Web for Earth & Environmental Terminology (SWEET)
Ontology of Earth system science and data concepts
Provides a common semantic framework (or namespace) for describing Earth science information and knowledge
Emphasis on improving search for NASA Earth science data resources
Represented in OWL-DL

SWEET Ontologies
Integrative ontologies: Natural Phenomena, Human Activities
Faceted ontologies: Earth Realm, Physical Properties, Physical Processes, Living Substances, Non-Living Substances, Data, Space, Time, Numerics, Units

SWEET Supports Knowledge Reuse
SWEET is a concept space
Enables scalable classification of Earth science and data-related concepts
Enables an object in data space to be mapped to science concept space
Concept space is translatable into other languages/cultures using “sameAs” notions

SWEET Science Ontologies
Earth Realms: Atmosphere, SolidEarth, Ocean, LandSurface, …
Physical Properties: temperature, composition, area, albedo, …
Substances: CO2, water, lava, salt, hydrogen, pollutants, …
Living Substances: Humans, fish, …

SWEET Conceptual Ontologies
Phenomena: ElNino, Volcano, Thunderstorm, Deforestation, Terrorism, physical processes (e.g., convection)
  Each has associated EarthRealms, PhysicalProperties, spatial/temporal extent, etc.
  Specific instances included, e.g., 1997-98 ElNino
Human Activities: Fisheries, IndustrialProcessing, Economics, …

SWEET Numerical Ontologies
SpatialEntities
  Extents: country, Antarctica, equator, inlet, …
  Relations: above, northOf, …
TemporalEntities
  Extents: duration, century, season, …
  Relations: after, before, …
Numerics
  Extents: interval, point, 0, positiveIntegers, …
  Relations: lessThan, greaterThan, …

Units
Extracted from Unidata’s UDUnits
Added SI prefixes
Multiplication of two quantities carries units

Numerical Ontologies
Numeric concepts are defined in OWL only through the standard XML Schema (XSD) datatypes
Added in SWEET:
  Intervals defined as restrictions on the real line
  Numerical relations (lessThan, max, …)
  Cartesian product (multidimensional spaces)
Numeric ontologies are used to define spatial and temporal concepts

XSD: Datatypes
Numeric: boolean, decimal, float, double, integer, nonNegativeInteger, positiveInteger, nonPositiveInteger, negativeInteger, long, int, short, unsignedLong, unsignedInt, unsignedShort, unsignedByte, hexBinary, base64Binary
String: string, normalizedString, anyURI, token, language, NMTOKEN, Name, NCName
Date: dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, gMonth

Data and Services Ontology
Formats
Data models
Data Structures
Special values: Missing, land, sea, ice, etc.
Parameters: Scale factors, offsets, algorithms
Data Services: Subset, reproject

Example: AIRS Level 2 Dataset
Subset of Dataset where:
  DataModel = Level 2
  Instrument = AIRS
  HorizontalDimension = 2
  VerticalDimension = 1
  Format = HDF-EOS
  Property = Temperature
  Substance = Air

Fragment of SWEET
[Diagram: classes 3DLayer, PlanetaryLayer, AtmosphereLayer, Troposphere, Stratosphere, Tropopause, and Atmosphere, linked by subClassOf, partOf, isUpperBoundaryOf, and isLowerBoundaryOf; annotations sameAs = “Lower Atmosphere” and primarySubstance = “air”; Stratosphere upperBoundary = 50 km, lowerBoundary = 15 km]

How SWEET was Initially Populated
Initial sources:
GCMD
  Over 10,000 datasets
  Over 1000 keywords
  Data providers submit additional terms for “free-text” search
CF
  Over 700 keywords
  Very long term names, e.g., surface_downwelling_photon_spherical_irradiance_in_sea_water
  Decomposed into facets: Property = spherical_irradiance, Substance = sea_water, Space = surface, Direction = down

Collaboration Web Site
Discussion tools
Version Control/Configuration Management
Trace dependencies on external ontologies
Tools to search for existing concepts in registered ontologies
Ontology Validation Procedure
  Blog, wiki, moderated discussion board
  W3C note is formal submission method
Registry/discovery of ontologies
Support workflows/services for ontology development

Community Issues
Content Standards and Conventions
  Agreement on standards for use of OWL
  Fuzzy representation conventions
  Submit as standard to NASA Standards & Processes Working Group
Review Board
  Maintain alignment given expansion of classes and properties
  Who will oversee and maintain for perpetuity (or at least through the next funding cycle)? ESIP Federation? A new consortium?
Global Support
  Provide tools to visualize and appreciate the big picture

Update/Matching Issues
No removal of terms except for spelling or factual errors
Must avoid contradictions
Additions can create redundancy if sameAs is not used
Humans must oversee “matching”
Subscription service to notify affected ontologies when changes are made
CF has established a moderator to carry out analogous additions
OWL “import” imports the entire file
Associate community with ontology terms
Community tagging

Best Practices
Keep ontologies small and modular
Be careful: “owl:imports” imports everything
Use higher-level ontologies where possible
Identify hierarchy of concept spaces
Model schemas
Try to keep dependencies unidirectional

Web Sites
http://sweet.jpl.nasa.gov
http://PlanetOnt.org
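The Semantics and Ontology Representation slides say that subclass and sameAs relations let software tools find “meaning” in resources. A minimal Python sketch of that idea, assuming an in-memory dictionary instead of a real OWL-DL reasoner or the actual SWEET files: a search for a general class also finds data tagged with a more specific or alternately named class. The subClassOf edges are an assumed reading of the “Fragment of SWEET” diagram, and `SUBCLASS_OF`, `SAME_AS`, `resolve`, `superclasses`, and `matches` are hypothetical names.

```python
from collections import deque

# Hypothetical subClassOf edges: an assumed reading of the "Fragment of SWEET"
# diagram, not loaded from the SWEET OWL files.
SUBCLASS_OF = {
    "PlanetaryLayer": {"3DLayer"},
    "AtmosphereLayer": {"PlanetaryLayer"},
    "Troposphere": {"AtmosphereLayer"},
    "Stratosphere": {"AtmosphereLayer"},
}

# Hypothetical sameAs mapping from an alternate label to a canonical class name.
SAME_AS = {"Lower Atmosphere": "Troposphere"}


def resolve(term: str) -> str:
    """Map a term to its canonical class name via sameAs, if one exists."""
    return SAME_AS.get(term, term)


def superclasses(cls: str) -> set[str]:
    """Return every ancestor of cls by following subClassOf transitively."""
    seen: set[str] = set()
    queue = deque(SUBCLASS_OF.get(cls, ()))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(SUBCLASS_OF.get(parent, ()))
    return seen


def matches(query: str, tag: str) -> bool:
    """True if a resource tagged with `tag` satisfies a search for `query`."""
    query, tag = resolve(query), resolve(tag)
    return query == tag or query in superclasses(tag)
```

For example, `matches("PlanetaryLayer", "Lower Atmosphere")` is true because “Lower Atmosphere” resolves to Troposphere, whose ancestors include PlanetaryLayer. In practice this inference would be delegated to an OWL-DL reasoner rather than hand-written traversal.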
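The Units slide states that multiplication of two quantities carries units and that SI prefixes were added to the UDUnits-derived terms. A minimal Python sketch of that behavior, assuming units are tracked as a dictionary of base-unit exponents; it does not use UDUnits or SWEET's unit ontology, and `Quantity`, `with_prefix`, and `SI_PREFIXES` are hypothetical names.

```python
from dataclasses import dataclass, field

# Illustrative subset of SI prefixes (not SWEET's own list); "u" stands in for micro.
SI_PREFIXES = {"M": 1e6, "k": 1e3, "c": 1e-2, "m": 1e-3, "u": 1e-6}


@dataclass
class Quantity:
    value: float
    # Base unit symbol -> integer exponent, e.g. m/s is {"m": 1, "s": -1}.
    units: dict[str, int] = field(default_factory=dict)

    def __mul__(self, other: "Quantity") -> "Quantity":
        # Multiply the values and add unit exponents, dropping units that cancel.
        units = dict(self.units)
        for symbol, exponent in other.units.items():
            total = units.get(symbol, 0) + exponent
            if total == 0:
                units.pop(symbol, None)
            else:
                units[symbol] = total
        return Quantity(self.value * other.value, units)


def with_prefix(value: float, prefix: str, unit: str) -> Quantity:
    """Convert a prefixed value such as 15 km into base units (15000 m)."""
    return Quantity(value * SI_PREFIXES[prefix], {unit: 1})
```

Multiplying a speed of 10 m/s by 5 s yields 50 with units `{"m": 1}`, since the seconds cancel. Exponent dictionaries keep multiplication closed without enumerating every derived unit in advance; conversion between incompatible units is out of scope for this sketch.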
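The Numerical Ontologies slide says SWEET defines intervals as restrictions on the real line, with numerical relations such as lessThan. A minimal Python sketch of that description, assuming an interval is a pair of bounds that may each be open or closed; `Interval`, `contains`, and `less_than` are hypothetical names, not SWEET's OWL classes or properties.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A subset of the real line, defined by a lower and an upper restriction."""
    lower: float = -math.inf
    upper: float = math.inf
    lower_closed: bool = True
    upper_closed: bool = True

    def contains(self, x: float) -> bool:
        # x must satisfy both restrictions; closed ends admit the endpoint itself.
        above = x > self.lower or (self.lower_closed and x == self.lower)
        below = x < self.upper or (self.upper_closed and x == self.upper)
        return above and below

    def less_than(self, other: "Interval") -> bool:
        """True if every point of self is strictly less than every point of other."""
        if self.upper < other.lower:
            return True
        # Touching endpoints qualify only if at least one of them is open.
        return self.upper == other.lower and not (self.upper_closed and other.lower_closed)
```

Using the Stratosphere bounds from the Fragment of SWEET slide (lowerBoundary 15 km, upperBoundary 50 km, values taken here in km), `Interval(15.0, 50.0).contains(30.0)` is true. An interval ending at 15 but open at that end is lessThan the stratosphere interval, while a closed one is not, because both would contain the point 15. `Interval(0.0, lower_closed=False)` models the positive reals.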
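The “Example: AIRS Level 2 Dataset” slide defines a dataset class as a subset of Dataset restricted by facet values. A minimal Python sketch, assuming catalog records are plain dictionaries keyed by those facets; the granule ids, the catalog, and the `members` helper are hypothetical, and an OWL encoding would instead express each facet as an owl:hasValue restriction and let a reasoner classify the records.

```python
# The class definition from the slide: Dataset restricted by facet values.
AIRS_LEVEL_2 = {
    "DataModel": "Level 2",
    "Instrument": "AIRS",
    "HorizontalDimension": 2,
    "VerticalDimension": 1,
    "Format": "HDF-EOS",
    "Property": "Temperature",
    "Substance": "Air",
}

# Hypothetical catalog: one matching record and two that differ in a single facet.
CATALOG = [
    {"id": "granule-a", **AIRS_LEVEL_2},
    {"id": "granule-b", **AIRS_LEVEL_2, "DataModel": "Level 3"},
    {"id": "granule-c", **AIRS_LEVEL_2, "Instrument": "MODIS"},
]


def members(restrictions: dict, catalog: list[dict]) -> list[str]:
    """Return ids of records that satisfy every facet restriction."""
    return [
        record["id"]
        for record in catalog
        if all(record.get(key) == value for key, value in restrictions.items())
    ]
```

`members(AIRS_LEVEL_2, CATALOG)` returns only `granule-a`, while a looser restriction such as `{"Instrument": "AIRS"}` also admits the Level 3 record.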
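The How SWEET was Initially Populated slide decomposes the CF name surface_downwelling_photon_spherical_irradiance_in_sea_water into Property, Substance, Space, and Direction facets. A minimal rule-based Python sketch of such a decomposition, assuming small hand-written term tables (`SPACE_TERMS`, `DIRECTION_TERMS`, `PROPERTY_TERMS`, all hypothetical) and the convention that the text after the first “_in_” names the substance. The slide does not say which facet “photon” belongs to, so the sketch reports leftover tokens rather than guessing.

```python
# Hypothetical term tables; a real mapping would be curated by people.
SPACE_TERMS = {"surface"}
DIRECTION_TERMS = {"downwelling": "down", "upwelling": "up"}
PROPERTY_TERMS = {"spherical_irradiance", "irradiance", "temperature"}


def decompose(cf_name: str) -> dict:
    """Split a CF-style name into facets, keeping unrecognized tokens."""
    head, sep, substance = cf_name.partition("_in_")
    facets = {"Substance": substance if sep else None}
    tokens = head.split("_")
    # Leading space and direction qualifiers.
    if tokens and tokens[0] in SPACE_TERMS:
        facets["Space"] = tokens.pop(0)
    if tokens and tokens[0] in DIRECTION_TERMS:
        facets["Direction"] = DIRECTION_TERMS[tokens.pop(0)]
    # Longest trailing run of tokens that names a known property.
    for start in range(len(tokens)):
        candidate = "_".join(tokens[start:])
        if candidate in PROPERTY_TERMS:
            facets["Property"] = candidate
            tokens = tokens[:start]
            break
    facets["Unmatched"] = tokens
    return facets
```

Applied to the slide's example, this yields Property = spherical_irradiance, Substance = sea_water, Space = surface, Direction = down, with “photon” left unmatched. Splitting on the first “_in_” is a simplification that would mis-handle names containing that token more than once.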