DRAFT – The Rosetta Model: Can the Different Physical Science Data Models be Reconciled? – DRAFT

Todd A King1 (tking@igpp.ucla.edu)
Deborah L McGuinness2,3 (dlm@ksl.stanford.edu)
Raymond J Walker1 (rwalker@igpp.ucla.edu)
Peter Fox4 (pfox@ucar.edu)
D Aaron Roberts5 (aaron.roberts@nasa.gov)
Christopher Harvey6 (christopher.harvey@cesr.fr)

1 Institute of Geophysics and Planetary Physics/UCLA, 2835 Slichter Hall, Los Angeles, CA 90095-1567, United States
2 Rensselaer Polytechnic Institute
3 Stanford University, 353 Serra Hall, Stanford, CA 94305, United States
4 UCAR, 1850 Table Mesa Drive, Boulder, CO 80305, United States
5 NASA/NSSDC, Code 692 NASA Goddard Space Flight Center, Greenbelt, MD 20771
6 Centre de Données de la Physique des Plasmas (CDPP), 18 avenue Edouard Belin, TOULOUSE 31 401, France

1. Abstract
There are a variety of data models in the physical sciences, some of which cover overlapping domains. Each of these data models has been derived in a different way: some are based on formal ontologies, others on informal ontologies, and others on relational schemas. An additional complication is that different international agencies have divided the physical science domains into different sub-domains, leading to some confusion as to which data model to adopt. The most prevalent data models in use today are those of the Planetary Data System (PDS), Space Physics Archive Search and Extract (SPASE), Virtual Solar Terrestrial Observatory (VSTO), the International Virtual Observatory Alliance (IVOA) and the Global Change Master Directory (GCMD). We take a comparative look at the various data models and ask: Can they be reconciled? Is it possible to have a Rosetta Model to translate between the models? What role can ontologies play in defining a Rosetta Model?

2. Descriptions and Metadata
There are many different information models and classification ontologies in use today. Each is designed for a particular application.
Some are very general and others are tailored for a specific discipline. Some of the most widely used are:

CAA: Cluster Active Archive. Designed to support the archiving and distribution of high-quality calibrated data products from ESA's Cluster mission, using an approach general enough to be applicable to other environments. It has a Mission, Observatory, Instrument hierarchy. The recovered data and metadata are adequate for API use. Scope: 480 terms (198 classes and 282 enumeration elements). Purpose: Resource Discovery, Resource Sharing, Archiving, Content Classification. Specification: Narrative and XML Schema.

Dublin Core: Originally designed for information resources (documents) and since expanded to include data, images, movies, and other types of resources. Scope: 27 terms (15 core, 12 element types). Purpose: Resource Discovery (published works). Specification: Narrative.

IVOA: The International Virtual Observatory Alliance (IVOA) defines a set of standards to "facilitate the international coordination" of the "utilization of astronomical archives as an integrated and interoperating virtual observatory." Standards set by the IVOA include VOTable, VOResource, and Unified Content Descriptor (UCD). Scope: 63 terms (6 categories, 57 terms) and 486 UCD terms for data classification. Purpose: Resource Discovery (data, collections, services, and curation) and Content Classification. Specification: Narrative and XML Schema.

OAI-ORE: The Object Reuse and Exchange (ORE) activity of the Open Archives Initiative (OAI) is developing specifications that allow distributed repositories to exchange information about their constituent digital objects. The first release of the ORE specifications is scheduled for March 8, 2008. OAI-ORE is distinct from OAI-PMH (a protocol for exchanging metadata). Scope: Conceptual only. Purpose: Compound Object Description.
PDS3: The Planetary Data System (PDS) is a data set nomenclature, designed to be consistent across discipline boundaries, together with standards for labeling data files. Its intent is to archive planetary science data and supporting information to enable effective use and interpretation. Scope: 14,458 terms (1,643 elements and 81 objects; 12,734 standard values, including 2,848 target names, 144 volume sets, 1,966 volumes and 1,370 data set IDs). Purpose: Archiving. Specification: Narrative, ODL with PDS vocabulary.

SPASE: The Space Physics Archive Search and Extract (SPASE) is a data model designed for the Solar and Space Physics communities to unify the data environment and to facilitate finding, retrieving, formatting, and obtaining basic information about data essential for research. Scope: 340 terms (10 resource types, 35 entities (containers), 30 enumerations, 55 attributes, and 265 items which are values used in enumerations (controlled lists)). Purpose: Resource Discovery, Resource Sharing and Content Classification. Specification: Narrative, XML Schema and XMI.

SWEET: Semantic Web for Earth and Environmental Terminology (SWEET) provides a common semantic framework for various Earth science initiatives. There are 17 ontologies, including biosphere, human_activities, process, substance, data_center, material_thing, property, sunrealm, data, numerics, sensor, time, earthrealm, phenomena, space, and units. Scope: 3,940 terms (17 ontologies). Purpose: Reference Model. Specification: OWL.

VSTO: Virtual Solar Terrestrial Observatory. Originally designed as a set of ontologies for organizing and integrating information spanning upper atmospheric terrestrial physics to solar physics. Fundamental classes include instrument, observatory, data, and services. Its upper level has been reused in other science areas, including volcanology and plate tectonics. Scope: 407 terms (one ontology with 35 top-level classes). Purpose: Resource Discovery, Resource Sharing, and Content Classification. Specification: OWL.

3. The Rosetta Model

Participants
• Mission
• Observatory
• Instrument
   o Detector
• Person
• Reference
• Target

Product
• Sample (Physical)
• Data Structure (Digital)
   o Catalog (record collection)
   o Table (row, column)
   o Image (x, y, z)
   o Movie (x, y, z, t)
   o n-Array
   o Compound Structure (?)
• Documents

Resource
• Repository
• Registry
• Web Link
• Service

Collection
• Dataset
• Event
• Campaign

Annotation
• Notes
• Terms
• Associations

A. Cluster Active Archive
Designed to support the archiving and distribution of high-quality calibrated data products from ESA's Cluster mission, using an approach general enough to be applicable to other environments. It has a Mission, Observatory, Instrument hierarchy. The recovered data and metadata are adequate for API use.

From the Cluster Metadata Dictionary, Issue 2, Rev. 2, May 4, 2006:
Metadata is information which describes a dataset. It should be complete, that is, contain all the information required to read and interpret the bits (syntactic description), and to understand what the resulting numerical values (or bit strings) represent (semantic description), including how the data was obtained; the latter information impacts upon the scientific significance of the data. The purpose of the CAA Metadata Dictionary is to describe fully the required CAA metadata information, and to explain how that information must be formatted so as to be exploitable by the generic software of the Cluster Active Archive.

There are 6 top-level CAA concepts or classes:
• Mission: This level contains information relevant to the whole mission.
• Observatory: The Cluster mission consists of 4 observatories: Cluster-1, Cluster-2, Cluster-3, and Cluster-4.
• Experiment: The Cluster mission has 11 experiments, each identified by its Principal Investigator, plus the auxiliary data.
• Instrument: The Cluster instruments are identified by Observatory and Experiment.
• Dataset: Each instrument produces one or more datasets; this level of metadata is common to the whole of each dataset.
• Parameter: A dataset contains one or more parameters, each of which has its own metadata.
• File: Each dataset is composed of files, the number of which will grow regularly with time during the CAA.

For the CAA, there will be: one block of metadata at the mission level (for the Cluster mission), four blocks at the observatory level (Cluster-1, Cluster-2, Cluster-3, Cluster-4), eleven blocks at the experiment level (one for each of the eleven instruments), sixty blocks of metadata (listed on page 32) at the instrument level, plus a further six blocks of metadata for the various auxiliary data products. To recover all the metadata relating to any one dataset, it is necessary to know the relation between these blocks of metadata. For example, when looking at the metadata associated with the CIS-1 instrument (the CIS instrument on Spacecraft 1), it is necessary to know that this is associated with metadata concerning the Experiment CIS and the Observatory Spacecraft-1, and that these are associated with the Mission Cluster. Linkage between the different levels (illustrated by the arrows in Fig. 1) is provided at each level by concept keywords included specially for this purpose.

Overall Characteristics
Scope: 480 terms (198 classes, 282 enumeration elements)
Purpose: Resource Discovery, Resource Sharing, Archiving, Content Classification.
Specification: XML Schema

References
[CAA] Cluster Metadata Dictionary. http://caa.estec.esa.int/documents/DataD_V22.pdf

B. Dublin Core
Originally designed for information resources (documents) and since expanded to include data, images, movies, and other types of resources.

From Wikipedia:
The Dublin Core standard includes two levels: Simple and Qualified.
Simple Dublin Core comprises fifteen elements; Qualified Dublin Core includes three additional elements (Audience, Provenance and RightsHolder), as well as a group of element refinements (also called qualifiers) that refine the semantics of the elements in ways that may be useful in resource discovery.

Simple Dublin Core
The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements:
1. Title
2. Creator
3. Subject
4. Description
5. Publisher
6. Contributor
7. Date
8. Type
9. Format
10. Identifier
11. Source
12. Language
13. Relation
14. Coverage
15. Rights

Each Dublin Core element is optional and may be repeated. The Dublin Core Metadata Initiative (DCMI) has established standard ways to refine elements and encourages the use of encoding and vocabulary schemes. There is no prescribed order in Dublin Core for presenting or using the elements. Full information on element definitions and term relationships can be found in the Dublin Core Metadata Registry [DCMR].

Qualified Dublin Core
Subsequent to the specification of the original 15 elements, an ongoing process to develop exemplary terms extending or refining the DCMES was begun. The additional terms were identified, generally in working groups of the DCMI, and judged by the DCMI Usage Board to be in conformance with principles of good practice for the qualification of Dublin Core metadata elements.

Element refinements make the meaning of an element narrower or more specific. A refined element shares the meaning of the unqualified element, but with a more restricted scope. The guiding principle for the qualification of Dublin Core elements, colloquially known as the Dumb-Down Principle, states that an application that does not understand a specific element refinement term should be able to ignore the qualifier and treat the metadata value as if it were an unqualified (broader) element.
While this may result in some loss of specificity, the remaining element value (without the qualifier) should continue to be generally correct and useful for discovery.

DCMI also maintains a small, general vocabulary recommended for use within the element Type. This vocabulary currently consists of 12 terms:
1. Collection
2. Dataset
3. Event
4. Image
5. InteractiveResource
6. MovingImage
7. PhysicalObject
8. Service
9. Software
10. Sound
11. StillImage
12. Text

In addition to element refinements, Qualified Dublin Core includes a set of recommended encoding schemes, designed to aid in the interpretation of an element value. These schemes include controlled vocabularies and formal notations or parsing rules [DCENC]. A value expressed using an encoding scheme may thus be a token selected from a controlled vocabulary (e.g., a term from a classification system or set of subject headings) or a string formatted in accordance with a formal notation (e.g., "2000-12-31" as the standard expression of a date). If an encoding scheme is not understood by an application, the value may still be useful to a human reader.

Overall Characteristics
Scope: 27 terms (15 core, 12 element types) [5].
Purpose: Resource Discovery (published works).
Specification: Narrative

References
[DCMR] Dublin Core Official web site. http://dublincore.org/dcregistry/
[DCENC] Dublin Core Encoding Guidelines. http://dublincore.org/resources/expressions/
[DCXML] Guidelines for implementing Dublin Core in XML. http://dublincore.org/documents/abstract-model/

C. IVOA
The International Virtual Observatory Alliance (IVOA) defines a set of standards to "facilitate the international coordination" of the "utilization of astronomical archives as an integrated and interoperating virtual observatory." Standards set by the IVOA include VOTable, VOResource, and Unified Content Descriptor (UCD).
Excerpts from various IVOA documents [IVOA]

VOResource
The IVOA Resource Metadata specification (VOResource) permits describing the following attributes of a resource [VORES]:
• Identity metadata: Title, ShortName, Identifier
• Curation metadata: Publisher, PublisherID, Creator, Creator.Logo, Contributor, Date, Version, Contact.Name, Contact.Address, Contact.Email, Contact.Telephone
• General content metadata: Subject, Description, Source, ReferenceURL, Type, ContentLevel, Relationship, RelationshipID
• Collection and service content metadata: Facility, Instrument, Coverage.Spatial, Coverage.RegionOfRegard, Coverage.Spectral, Coverage.Spectral.Bandpass, Coverage.Spectral.MinimumWavelength, Coverage.Spectral.MaximumWavelength, Coverage.Temporal.StartTime, Coverage.Temporal.StopTime, Coverage.Depth, Coverage.ObjectDensity, Coverage.ObjectCount, Coverage.SkyFraction, Resolution.Spatial, Resolution.Spectral, Resolution.Temporal, UCD, Format, Rights
• Data quality metadata: DataQuality, ResourceValidationLevel, ResourceValidatedBy, Uncertainty.Photometric, Uncertainty.Spatial, Uncertainty.Spectral, Uncertainty.Temporal
• Service metadata: Service.AccessURL, Service.InterfaceURL, Service.BaseURL, Service.HTTPResultsMIMEType, Service.StandardID, Service.MaxSearchRadius, Service.MaxReturnRecords, Service.MaxReturnSize

Unified Content Descriptors
The Unified Content Descriptor (UCD) vocabulary is a formal vocabulary for astronomical data that is controlled by the IVOA. The vocabulary is restricted in order to avoid proliferation of terms and synonyms, and controlled in order to avoid ambiguities. A UCD is used to classify a token of information. For example, it may be used to identify the type of information in a field of a table or a tagged value in a metadata description [VOUCD].

All existing UCD1+ words are grouped into 12 main categories. These categories are expressed by the first atom of the word, whose possible values are:
1. arith (arithmetics)
2. em (electromagnetic spectrum)
3. instr (instrument)
4. meta (metadata)
5. obs (observation)
6. phot (photometry)
7. phys (physics)
8. pos (positional data)
9. spect (spectral data)
10. src (source)
11. stat (statistics)
12. time (time)

VOTable
The VOTable format is an XML standard for the interchange of data represented as a set of tables [VOTAB]. It extends the HTML Table specification by adding metadata to describe the contents of the table, including the data type, units and classification of the contents of each field. The VOTable format also permits encoded binary data to be included in the table, or references to external streams of binary data.

Overall Characteristics
Scope: 63 terms (6 categories, 57 terms) and 486 UCD terms for data classification.
Purpose: Resource Discovery (data, collections, services, and curation) and Content Classification.
Specification: Narrative and XML Schema

References
[IVOA] IVOA Web Site. http://www.ivoa.net/
[VORES] Resource Metadata for the Virtual Observatory, Version 1.12, IVOA Recommendation, 2 March 2007. http://www.ivoa.net/Documents/latest/RM.html
[VOUCD] The UCD1+ controlled vocabulary, Version 1.23, IVOA Recommendation, 2 April 2007. http://www.ivoa.net/Documents/latest/UCDlist.html
[VOTAB] VOTable Format Definition, Version 1.1, IVOA Recommendation, 11 August 2004. http://www.ivoa.net/Documents/latest/VOT.html
[VOASTR] Ontology of Astronomical Object Types, Version 1.0, IVOA Working Draft, 19 February 2007. http://www.ivoa.net/Documents/WD/Semantics/AstrObjectOntology20070219.pdf

D. OAI-ORE
The Object Reuse and Exchange (ORE) activity of the Open Archives Initiative (OAI) is developing specifications that allow distributed repositories to exchange information about their constituent digital objects. The first release of the ORE specifications is scheduled for March 8, 2008.
OAI-ORE is distinct from OAI-PMH (a protocol for exchanging metadata).

Excerpts from the Object Reuse and Exchange white paper [OAIORE]
Compound information objects are aggregations of distinct information units that, when combined, form a logical whole. Some examples of these are: a digitized book that is an aggregation of chapters, where each chapter is an aggregation of scanned pages; a music album that is the aggregation of several audio tracks; an image object that is the aggregation of a high-quality master, a medium-quality derivative and a low-quality thumbnail; a scholarly publication that is an aggregation of text and supporting materials such as datasets, software tools, and video recordings of an experiment; and a multi-page web document with an HTML table of contents that points to multiple interlinked individual HTML pages. If we consider all information objects reusable in multiple contexts (a notable feature of networked information), then the aggregation of a specific information unit into a compound object is not due to the inherent nature of the information unit, but the result of the intention of the human author or machine agent that composed the compound object.

Research in the Semantic Web community has introduced the notion of named graphs [5], which are essentially a set of RDF assertions, forming a graph, to which a URI is assigned. The graph as a whole can then be treated as a web resource, and assertions such as metadata statements, authority, etc. can be associated with that resource. These ideas are very promising as an approach to expressing the notion of a compound object on the web. However, they remain in a research phase, and need further specification in order to become adoptable as part of an implementable interoperability specification. Our proposals described later in this document build on this notion of a named graph.
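To make the named-graph idea concrete, the following is a minimal sketch, assuming RDF assertions can be modeled as plain (subject, predicate, object) string tuples. A named graph is then a URI paired with a frozen set of such assertions, so further statements (here, a creator) can be made about the graph's URI itself. The example.org URIs, the `ex:hasPart` and `ex:creator` predicates, and the `NamedGraph` and `components` names are hypothetical placeholders; this is neither the OAI-ORE vocabulary nor the API of any RDF library.

```python
from dataclasses import dataclass

# A triple is (subject, predicate, object); all terms are plain strings here.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class NamedGraph:
    """A set of RDF-style assertions identified as a whole by one URI."""
    uri: str
    triples: frozenset[Triple]


def components(graph: NamedGraph, aggregate: str, predicate: str = "ex:hasPart") -> list[str]:
    """Return, sorted, the objects linked to `aggregate` by `predicate` in this graph."""
    return sorted(o for s, p, o in graph.triples if s == aggregate and p == predicate)


# Hypothetical compound object: an image with derivatives of different quality.
image = "http://example.org/image/42"
graph = NamedGraph(
    uri="http://example.org/graphs/image-42",
    triples=frozenset({
        (image, "ex:hasPart", "http://example.org/image/42/master.tiff"),
        (image, "ex:hasPart", "http://example.org/image/42/medium.jpg"),
        (image, "ex:hasPart", "http://example.org/image/42/thumb.png"),
    }),
)

# Because the graph has its own URI, metadata can be asserted about the graph itself.
graph_metadata = {(graph.uri, "ex:creator", "http://example.org/agents/archivist")}

if __name__ == "__main__":
    for part in components(graph, image):
        print(part)
    print(graph_metadata)
```

Keeping the assertions in a `frozenset` makes the graph immutable and hashable, which loosely mirrors the idea that a URI names one fixed set of assertions; a production system would more likely use an RDF store that supports named graphs directly.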
A core goal of OAI-ORE – Object Reuse and Exchange – is to develop standardized, interoperable, and machine-readable mechanisms to express compound object information on the web. The OAI-ORE standards will make it possible for web clients and applications to reconstruct the logical boundaries of compound objects, the relationships among their internal components, and their relationships to other resources in the web information space. This will provide the foundation for the development of value-adding services for analysis, reuse, and re-composition of compound objects, especially in the areas of e-Science, e-Scholarship, and scholarly communication, which are the target applications of ORE.

To enable widespread adoption of the standards developed by OAI-ORE, we have determined that they must be congruent with and leverage the Web Architecture. This architecture essentially consists of URIs that identify resources, which are "items of interest", that, when accessed through standard protocols such as HTTP, return representations of current resource state and which are linked via URI references. The combination of nodes, which denote resources, and arcs, which assert the relationships among those resources, forms the web graph, and HTTP access to this graph is the basis for services (e.g., robot-based search engines) and data mining (e.g., link analysis) from which new information and knowledge is derived.

[Figure: An illustration of publishing a compound object on the web.]

Overall Characteristics
In development

References
[OREPROJ] Open Archives Initiative - Object Reuse and Exchange Project. http://www.openarchives.org/ore/
[OAIORE] Open Archives Initiative – Object Reuse and Exchange, Compound Information Objects: The OAI-ORE Perspective, May 28, 2007. http://www.openarchives.org/ore/documents/CompoundObjects-200705.html

E. PDS3
The Planetary Data System (PDS) is a data set nomenclature, designed to be consistent across discipline boundaries, together with standards for labeling data files.
Its intent is to archive planetary science data and supporting information to enable effective use and interpretation.

Excerpts from PDS documents [PDS]
A mission archive should contain sufficient documentation of the mission, the instrument(s), and calibration procedures necessary for members of the current and future science community to effectively use and, if appropriate, recalibrate the data. An archive includes complete information about the geometry relevant to the observations (e.g., spacecraft position and orientation relative to the target). It also includes catalog files that may be ingested into the PDS database along with the raw data and higher-order data products.

PDS defines a data set as a logical grouping of data products [PDSAPG]. Data sets may be combined into data set collections. For example, all of the data sets from one instrument from a given mission could be considered a data set collection. Each product is assigned a Product ID which is unique within a data set. In turn, each data set and each data set collection has a unique ID. It is possible to refer to a specific product using a combination of Data Set ID and Product ID.

All data submitted to PDS must be accompanied by a set of catalog files which briefly describe the mission, instrument host (that is, the spacecraft or other facility within which the instrument operates), instrument, and data set. Additional catalog files identify key personnel and references cited in other catalog files. The set of 6 core catalog files is:
• MISSION.CAT: Mission description.
• INSTHOST.CAT: Instrument host (spacecraft) description.
• INST.CAT: Instrument description.
• DATASET.CAT: Dataset description.
• PERSON.CAT: Person description.
• REF.CAT: Reference description.

Descriptions of targets may also be provided in a Target Catalog file (TARGET.CAT). Descriptions of products are stored in a "label" file which is stored with the data.
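The PDS catalog files line up closely with the participant and collection classes proposed for the Rosetta Model, which suggests what a crosswalk between the two might look like. The following is a minimal sketch under stated assumptions: the mapping (in particular INSTHOST.CAT to Observatory) is an illustrative reading, not a published PDS or Rosetta Model table, and `PDS_TO_ROSETTA`, `rosetta_class`, and the data set and product identifiers are hypothetical. It also shows the PDS rule that a product is addressed by the pair (Data Set ID, Product ID), since a Product ID is only unique within its data set.

```python
from typing import Optional

# Hypothetical crosswalk from PDS3 catalog files to (Rosetta group, Rosetta class).
PDS_TO_ROSETTA = {
    "MISSION.CAT": ("Participants", "Mission"),
    "INSTHOST.CAT": ("Participants", "Observatory"),  # assumption: instrument host ~ observatory
    "INST.CAT": ("Participants", "Instrument"),
    "DATASET.CAT": ("Collection", "Dataset"),
    "PERSON.CAT": ("Participants", "Person"),
    "REF.CAT": ("Participants", "Reference"),
    "TARGET.CAT": ("Participants", "Target"),  # optional catalog file
}


def rosetta_class(catalog_file: str) -> Optional[tuple[str, str]]:
    """Look up the Rosetta class for a catalog file name (case-insensitive); None if unmapped."""
    return PDS_TO_ROSETTA.get(catalog_file.upper())


# A Product ID is unique only within its data set, so the lookup key is the pair.
ProductKey = tuple[str, str]  # (DATA_SET_ID, PRODUCT_ID); identifiers below are invented

products: dict[ProductKey, str] = {
    ("EXAMPLE-DS-1", "P001"): "label for product P001 in data set 1",
    ("EXAMPLE-DS-2", "P001"): "label for product P001 in data set 2",
}

if __name__ == "__main__":
    for name in ("MISSION.CAT", "insthost.cat", "VOLDESC.CAT"):
        print(name, "->", rosetta_class(name))
    print(products[("EXAMPLE-DS-2", "P001")])
```

Unmapped files return `None` rather than raising, so a crosswalk built this way can report which parts of a model have no Rosetta counterpart yet.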
A label may be included at the beginning of the data file it describes (an attached label) or stored in a separate file (a detached label). References to files within a label cannot contain paths, so the label must exist at the same location (folder/directory) in the file system as the data. The contents of a label conform to the Object Definition Language (ODL) specification and the current release of the PDS data dictionary.

The data dictionary describes each object and its allowed elements [PDSSD]. A PDS label can describe the detailed structure and format of a data file, including the binary representation of the data, the record structure and the bit-level location of information. There are currently 1,643 elements and 81 objects in the PDS data dictionary. Many elements in PDS have standard value lists (a controlled list of possible values). PDS defines allowed standard values as part of an element definition (not as a discrete element in the dictionary). There are a total of 12,734 standard values. In the PDS system each target (planet, moon, asteroid, etc.) has a standard value, as does each volume and data set. There are 2,848 target names, 144 volume sets, 1,966 volumes and 1,370 data set IDs.

Overall Characteristics
Scope: 14,458 terms (1,643 elements and 81 objects; 12,734 standard values, including 2,848 target names, 144 volume sets, 1,966 volumes and 1,370 data set IDs)
Purpose: Archiving
Specification: Narrative and ODL with PDS vocabulary

References
[PDS] PDS Documentation. http://pds.nasa.gov/documents/
[PDSAPG] Planetary Data System - Archive Preparation Guide (APG), Version 1.1, Aug 29, 2006. http://pds.nasa.gov/documents/apg/apg_Aug_29h.pdf
[PDSSR] PDS Standards Reference. http://pds.nasa.gov/documents/sr/index.html

F. SPASE
The Space Physics Archive Search and Extract (SPASE) is a data model designed for the Solar and Space Physics communities to unify the data environment and to facilitate finding, retrieving, formatting, and obtaining basic information about data essential for research.

Excerpts from SPASE Documentation [SPASE-DM]
The Solar and Space Physics communities need a unified data environment to facilitate finding, retrieving, formatting, and obtaining basic information about data essential for their research. With the increasing requirement for data from multiple sources, this need has become acute. A unified method to describe data and other resources is the key to achieving this unified environment. The SPASE Data Model provides a basic set of terms and values organized in a simple and homogeneous way, to facilitate access to Solar and Space Physics resources. The SPASE Model will provide the detailed information at the parameter level required for Solar and Space Physics applications.

The Data Model provides enough detail to allow a scientist to understand the content of Data Products (e.g., a set of files for 3-second resolution Geotail magnetic field data for 1992 to 2005), together with essential retrieval and contact information. A typical use would be to have a collection of descriptions stored in one or more related internet-based registries of products; these could be queried with specifically designed search engines which link users to the data they need. The Data Model also provides constructs for describing components of a data delivery system, including repositories, registries and services.

Resources
At the top level of the Data Model is the Resource Type. Each type of resource has a tailored set of attributes to describe the resource.
For data products the resource types are Numerical Data, Display Data, and Catalog; the resource types that support these are Observatory, Instrument, Registry, Repository, Service, Granule and Person. Each resource is assigned a unique resource identifier (URI) so that it can be referenced by other resources, within publications, by a user or by a service. There are currently 10 resource types in SPASE. The data dictionary contains 378 terms consisting of 35 entities (containers), 30 enumerations, 55 attributes and 265 items which are values used in controlled lists (enumerations).

Overall Characteristics
Scope: 340 terms (10 resource types, 35 entities (containers), 30 enumerations, 55 attributes, and 265 items which are values used in enumerations (controlled lists))
Purpose: Resource Discovery, Resource Sharing and Content Classification
Specification: Narrative, XML Schema and XMI

References
[SPASE] SPASE web site. http://www.spase-group.org/
[SPASE-DM] A Space and Solar Physics Data Model from the SPASE Consortium, Version 1.2.0, Release Date: 2007-05-22. http://www.spase-group.org/data/doc/spase-1_2_0.pdf

G. SWEET
Semantic Web for Earth and Environmental Terminology (SWEET) provides a common semantic framework for various Earth science initiatives. There are 17 ontologies, including biosphere, human_activities, process, substance, data_center, material_thing, property, sunrealm, data, numerics, sensor, time, earthrealm, phenomena, space, and units.

Excerpts from SWEET Documents [SWEETGUIDE]
The ontologies within the Semantic Web for Earth and Environmental Terminology (SWEET) provide an upper-level ontology for Earth system science. The SWEET ontologies include several thousand terms, spanning a broad extent of Earth system science and related concepts (such as data characteristics), expressed in the OWL language. The ontologies can be downloaded from http://sweet.jpl.nasa.gov/sweet.
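Because SWEET is distributed as OWL, term counts such as those quoted for it can in principle be derived by inspecting the ontology files. As a hedged sketch, the following parses a tiny, hypothetical OWL fragment in RDF/XML syntax with Python's standard `xml.etree.ElementTree` and lists each named `owl:Class` with its first `rdfs:subClassOf` parent. The class names, the base URI and the `list_classes` helper are invented for illustration and are not taken from the SWEET files; a real ontology may also use constructs (anonymous restrictions, imports, other serializations) that this sketch skips or does not handle.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical OWL fragment in RDF/XML; not taken from SWEET.
SAMPLE_OWL = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xml:base="http://example.org/phenomena">
  <owl:Class rdf:ID="Phenomena"/>
  <owl:Class rdf:ID="Storm">
    <rdfs:subClassOf rdf:resource="#Phenomena"/>
  </owl:Class>
  <owl:Class rdf:about="#SolarFlare">
    <rdfs:subClassOf rdf:resource="#Phenomena"/>
  </owl:Class>
</rdf:RDF>"""


def local_name(ref: str) -> str:
    """Strip a base URI or leading '#' from a class reference."""
    return ref.rsplit("#", 1)[-1]


def list_classes(owl_xml: str) -> dict[str, Optional[str]]:
    """Map each named owl:Class to the local name of its first rdfs:subClassOf parent."""
    root = ET.fromstring(owl_xml)
    classes: dict[str, Optional[str]] = {}
    for cls in root.iter(f"{{{OWL}}}Class"):
        ref = cls.get(f"{{{RDF}}}ID") or cls.get(f"{{{RDF}}}about")
        if ref is None:
            continue  # anonymous class expression: skipped in this sketch
        parent = cls.find(f"{{{RDFS}}}subClassOf")
        parent_ref = parent.get(f"{{{RDF}}}resource") if parent is not None else None
        classes[local_name(ref)] = local_name(parent_ref) if parent_ref else None
    return classes


if __name__ == "__main__":
    terms = list_classes(SAMPLE_OWL)
    print(len(terms), "classes:", terms)
```

Counting only named classes is one possible definition of a "term"; the figures quoted in this paper may also include properties or individuals, so any count produced this way should be compared with care.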
To support such a large collection and adhere to the guiding principles, the concepts are divided, where possible, into orthogonal dimensions or facets in support of reductionism. The primary ontologies are shown in the figure below. Each box represents a separate ontology, and a connecting line indicates where major properties are used to define concepts across ontology spaces.

[Figure: SWEET primary ontologies (revised and validated Jan 26, 2006): Earth Realm, Physical Phenomena, Physical Process, Physical Property, Physical Substance, Sun Realm, Biosphere, Data, Data Center, Human Activity, Material Thing, Numerics, Sensor, Space, Time, Units.]

Overall Characteristics
Scope: 3,940 terms (17 ontologies)
Purpose: Reference Model
Specification: OWL

References
[SWEET] SWEET Web site. http://sweet.jpl.nasa.gov/
[SWEETGUIDE] Guide to SWEET Ontologies, Rob Raskin, NASA/Jet Propulsion Laboratory. http://sweet.jpl.nasa.gov/guide.doc

H. VSTO
Virtual Solar Terrestrial Observatory. Originally designed as a set of ontologies for organizing and integrating information spanning upper atmospheric terrestrial physics to solar physics. Fundamental classes include instrument, observatory, data, and services. Its upper level has been reused in other science areas, including volcanology and plate tectonics.

Excerpts from the VSTO documents [VSTO]
The Virtual Solar Terrestrial Observatory (VSTO) project provides an electronic repository of observational data spanning the solar-terrestrial physics domain. VSTO is a distributed, scalable education and research environment for searching, integrating, and analyzing observational, experimental and model databases in the fields of solar, solar-terrestrial and space physics (SSTSP), and it utilizes semantic web technologies. We are also implementing tools and infrastructure for accessing and using the data.
Our main contributions include the repository, infrastructure, and tools for the solar-terrestrial physics domain, as well as a design and infrastructure that may be broadened to cover more diverse science areas and communities of use.

Overall Characteristics
Scope: 407 terms (one ontology with 35 top-level classes)
Purpose: Resource Discovery, Resource Sharing, and Content Classification.
Specification: OWL

References
[VSTO] VSTO Project web site. http://vsto.hao.ucar.edu/