GEON
Geoinformatics and Ontology-based Discovery, Integration and Analysis of Geoscience Data

SESDI: Discovery, Ontology, Analysis, Integration

A. Krishna Sinha, Department of Geology, Virginia Tech, Blacksburg, pitlab@vt.edu

Examples of geoscience datasets: extreme heterogeneity in syntax, structure and semantics
[Figure: stacked example data layers: Earthquakes, Aquifers, Tectonics, Moho depth, Geology, Gravity, Sediment thickness, Focal mechanisms, Topography, Mines, Faults, Magnetics]
A.K.Sinha, 2007 (modified from Seber, 2005)

What do geoscientists want from cyberinfrastructure?
• Data interoperability
• Access to high-performance computation: grid based
• Visualization to aid interpretation
• Web-based tools for computation and modeling
• Archive and preserve legacy data
• Advancing geoscience education
• Ease of registration, discovery and access to data (static and streaming) from distributed systems = smart search
• And most importantly: query-based integration of diverse disciplinary data = knowledge discovery, hypothesis evaluation and hazard assessment

Integration: a buzzword, but with complex solutions
• What is integration?
  – Relationships in information contained in heterogeneous and multidisciplinary databases
• What are our choices?
  – Layering of data (commonly used): integration done by the user
  – Schema-based integration: merging of schemas, but the user must be knowledgeable about the organization (e.g. the semantics of each schema), an unlikely activity for most geoscientists
  – Ontology-based semantic integration utilizing web services
Data registration is important for integration!
A.K.Sinha and Kai Lin, Geoinformatics 2006

Information Integration Scenario
What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry using gravity data?
[Figure: numbered workflow for the scenario. Panels: Find geologic maps; Discover data sets; Find tools; Create a data product. Ontology concepts used: Location (States: Virginia); Classification System (Rock Classifiers: Igneous, Pluton, A-type); Mineral (Zircon); Geologic Time (Dating Methods: U-Pb Zircon Methods). Tool panel: discrimination diagram of Zr against 10^4 Ga/Al separating I & S type from A type.]

[Figure: map with pluton ages labeled 729, 722, 745, 730, 706, 724, 735]
Distribution and ages of A-type plutons based on integration of multiple databases: a data product that geologists would want through cyberinfrastructure

Cyber-solutions for geoscientists
• How do we get there?
  – Discovery, Integration, and Analysis
• What are our choices?

Discovery and Integration of Geoscience Data is a two-step process
• Discovery of earth science data (step 1), leading to
• Integration of earth science data (step 2)

How do Ontologies help with Smart Discovery and Semantic Integration?

What is Ontology? Why use Ontology?
An explicit formal specification of the terms in the domain and relations among them (Gruber 1993).
For earth scientists, it simply means making information about known relationships between concepts and associated data (for example, those that exist between rocks, density and seismic velocity) available in a language that computers can work with.

Motivations for Using Ontologies
• A better way to discover datasets
  – Use the knowledge in ontologies to find datasets
• A better way to query datasets
  – Query through ontologies without knowing the details of the schemas
• A better way to integrate multiple datasets
  – Integrate multiple datasets on the fly if they are registered to ontologies
• A better way to segment large databases
  – Transfer only the parts of databases required for integration

Framework for discovery and integration
Data Discovery
• Level 1: Data Registration at the Index Level
  – Discovery of data resources (e.g., geophysics, geologic maps)
• Level 2: Data Registration at the Item Level
  – Registration at data-level ontologies (e.g. bulk rock geochemistry, gravity database)
Data Integration
• Level 3: Data Registration at the Item Detail Level: DATA ONTOLOGY (e.g., the column in a geochemical database that represents the SiO2 measurement)
  – Requirement for semantic integration
• Earth Sciences Virtual Database: a data warehouse where the schema heterogeneity problem is solved
• Data integration using the DIA Engine

Level 3: Registration at the Item Detail Level (Example 1)
[Figure: UML class from a section of the Planetary Material Ontology, with association multiplicities 1 and 0..n]
AnalyticalOxideConcentration
  analyticalOxide : AnalyticalOxide
  concentration : ValueWithUnit
  errorOfConcentration : ValueWithUnit
The GEON approach of registering data to concepts removes structural (format) and semantic heterogeneity.

Level 3: Registration at the Item Detail Level (Example 2)
Data Ontology: Subject, Object, Unit, and Value

Subject             Object   Value   Unit
Mineral (Biotite)   TiO2     1.88    wt%
Mineral (Biotite)   SiO2     37.09   wt%

[Figure: the same AnalyticalOxideConcentration class (analyticalOxide, concentration, errorOfConcentration) from a section of the Planetary Material Ontology]

High Level Ontology Packages
[Figure: package diagram. Packages: Geologic Time, Planetary Structure, Planetary Material, GeoImage, Physical Properties, PlanetaryLocation, Mathematical & Statistical Functions, Planetary Phenomenon. Imports: NASA Semantic Web for Earth Science Numerics Ontology, Units Ontology, Physical Property Ontology, Space Ontology and Physical Phenomena Ontology; http://www.isi.edu/~pan/OWLTime.html]

Planetary Materials: Example of Data Ontology
[Figure: Planetary Materials ontology diagram]

Elements and isotopes
[Figure: ontology diagram]

Rocks
[Figure: ontology diagram]

Ontology Building is an Intensive Process
1. IBM Rational Rose: develop class diagrams, then deploy the class diagrams to Protégé
2. Protégé: convert the Protégé project to an OWL file
3. Web Ontology Language (OWL): run consistency checks
4. Description Logic Reasoners
– Rational Rose is used to develop UML class diagrams that describe the types of objects in the system and the relationships between them.
– Protégé is an ontology editor.
– OWL is useful when information has to be processed by applications rather than only presented to people.
– A Description Logic Reasoner performs various inferencing services, such as computing the inferred superclasses of a class, determining whether or not a class is consistent, and deciding whether or not one class is subsumed by another.
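
The Level 3 (item detail) registration described in the examples above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GEON implementation: the source names (`lab_a`, `lab_b`), the column names, the `REGISTRY` mapping and the `lab_b` values are all hypothetical; only the biotite values 37.09 and 1.88 wt% come from Example 2. The point is that once each column is registered to an ontology concept, a query can be phrased against the concept without knowing either schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    subject: str   # what was analysed, e.g. a mineral
    obj: str       # ontology concept the value refers to, e.g. an oxide
    value: float
    unit: str


# Hypothetical registrations: (source, column) -> (ontology concept, unit).
REGISTRY = {
    ("lab_a", "SiO2_wt"): ("SiO2", "wt%"),
    ("lab_a", "TiO2_wt"): ("TiO2", "wt%"),
    ("lab_b", "silica"): ("SiO2", "wt%"),
    ("lab_b", "titania"): ("TiO2", "wt%"),
}


def to_measurements(source, subject, row):
    """Translate one source-specific row into ontology-level measurements."""
    out = []
    for column, value in row.items():
        key = (source, column)
        if key not in REGISTRY:
            continue  # unregistered columns cannot take part in integration
        concept, unit = REGISTRY[key]
        out.append(Measurement(subject, concept, float(value), unit))
    return out


def query(measurements, concept):
    """Select measurements by ontology concept, independent of source schema."""
    return [m for m in measurements if m.obj == concept]


# Values for lab_a are from Example 2; values for lab_b are made up.
rows = to_measurements("lab_a", "Biotite", {"SiO2_wt": 37.09, "TiO2_wt": 1.88})
rows += to_measurements(
    "lab_b", "Biotite", {"silica": 36.5, "titania": 2.1, "notes": "n/a"}
)
sio2 = query(rows, "SiO2")
```

Here `sio2` holds the SiO2 measurements from both sources even though the two tables use different column names, while the unregistered `notes` column is ignored.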

Ontology-based Engine: from concepts to data products

The DIA Engine
– Discovery: access to data
– Integration of data: structural and semantic data heterogeneity
– Analysis of data: verify hypothesis

DIA Portal
[Figure: portal screenshot]

Geologic Map-Based Interface
[Figure: interface screenshot]

Data Product
• A-type bodies identified

Integrated Data Product
• Gravity data with an overlay of a plutonic body in Virginia

DIA Internal Structure
• User Interface: displays the data / data product in a visually enhanced manner
  – Java / VB Script
  – Microsoft ASP .Net
  – Microsoft VB .Net
• Back-End: the major technologies used "behind the scenes" to generate data products
  – Microsoft .NET
  – Java Web Services
  – ESRI's ArcGIS Server 9.1
  – ESRI's ArcSDE 9.1 (spatial database)
  – Microsoft SQL Server (geochemical database)
• Functionality Coding:
  – Microsoft Visual Basic (to code the discrimination filters)
  – Java
[Figure: architecture diagram. Web query "Show all A-type bodies in Virginia" sent to the DIA Engine; connected components: Geological Map Services Server / Gazetteer, GEONResources Web Service, SDSC GEON Data Server, registered U/Pb zircon ages, registered geochemical data, A-Type Filter (Web service), SoqlToSql Web Service]

Query Specification
• In a selected "region of interest" the user is provided with a number of options (the menu)
[Figure: menu sequence. Labels: Bounding Box Selection; Sub-Menu #1 to Sub-Menu #5 with entries Igneous, GeoChemical, Magma Class, A-Type, Discriminant Diagram; Analysis Type Selection]
• User clicks through the different menus to build an exact query

Data Integration and Discovery
• Semantic integration of data products in DIA requires:
  – Ontologies
    • a data ontology to interpret data from different sources
    • a service ontology
  – Data sharing
    • requires data registration
[Figure: integration diagram. Labels: Geospatial Query; Aspatial Query; Integration Between Different Ontologic Classes; Within Same Ontologic Class; Geochemical; Geophysics; Geologic Time; A-Type Identification; ontologically registered data for VA, WY and TX; Data Product; Data Product Analysis]

DIA's Solution for Current Tool Sharing Approaches
• Each research group develops its own tools
• Tools developed by a research group are rarely used by other groups
• Redundancy of development efforts
• Little interoperability amongst tools
  – Interaction amongst different tools is often not possible or requires extensive (re)coding
• Solution: wrap tools as Web services accessible to the scientific community worldwide

Geoinformatics and the Semantic Web
From Data Ontology to Service Ontology

Geoinformatics and the Semantic Web
• The World Wide Web is primarily a repository of documents written in HTML
• Human involvement is necessary for data interoperability and integration
  – Cumbersome tasks: the large size of data sets and the number of options are a major impediment
• Machine-understandable information can facilitate exchange and processing
  – Meaning needs to be associated with the information, including data and services
  – This is the primary objective of the semantic Web

Semantic Web Services
• Semantic Web services are the result of the evolution of the syntactic definition of Web services and the semantic Web (Bussler et al. 2003: A Conceptual Architecture for Semantic Web Enabled Web Services)
• As with data, mapping concepts of Web services to service ontologies is required

Semantic Interoperability: Bringing Data and Services Together
Example Query: Find the chemical composition of a liquid derived by 30% partial melting based on the average abundances of REE of A-type plutons in Virginia.
[Figure: architecture diagram. GEON Portal: Query Input; Display Data Products, Associations, etc.; Report Generation. Processing: Query Translation, Query Engine, Workflow Generation, Process Execution. Services: Data Services, Computational Services, Tool Services, Service Registries. Ontologies: Semantic Tool Ontologies, Semantic Data Ontologies. Geoscience disciplinary databases: Petrology Database, Geophysics Database, Images, Maps.]
[Figure: workflow for the example query. Query; Data Discovery Web Services; Space Data Filtering Tool; Data Product; Computational Tool (Partial Melting Tool); Data Product]

Summary
[Figure: disciplines: Geophysics, Stratigraphy, Tectonics, Petrology, Hydrology, Geochemistry, Paleontology, Structure]
1. Semantic technologies are key to changing the culture of geoscientists. Semantic capabilities provide an increase in efficiency (they may not make us smarter scientists!)
2. Proof of concept beyond theory is required to bring the community on board
3. Funding agencies should be encouraged to support science-driven needs and goals
4. Promote semantic Web-based education
5. Would we wish to consider a global geoscience semantic Web consortium?
6. Would we wish to consider establishing an Ontology center for coordinating research? Funding?? Formalize partnerships between Societies, Agencies??

http://www.geosociety.org/meetings/07geoInfo/index.htm
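
The example query above (a liquid derived by 30% partial melting) implies a computational tool wrapped as a service. The slides do not say which melting model the Partial Melting Tool uses, so the sketch below assumes equilibrium (batch) melting, C_L = C_0 / (D + F(1 - D)), where C_0 is the source concentration, D the bulk partition coefficient and F the melt fraction. The function name, the element list and all concentrations and partition coefficients are hypothetical placeholders, not data from the talk.

```python
def batch_melt(c0, d, f):
    """Liquid concentration for equilibrium (batch) melting.

    c0: concentration in the source (any unit; the result has the same unit)
    d:  bulk partition coefficient (solid/liquid)
    f:  melt fraction, 0 < f <= 1
    """
    if not 0.0 < f <= 1.0:
        raise ValueError("melt fraction f must be in (0, 1]")
    if d < 0.0:
        raise ValueError("partition coefficient d must be non-negative")
    return c0 / (d + f * (1.0 - d))


# Hypothetical average source abundances (ppm) and bulk partition coefficients.
source = {
    "La": (50.0, 0.1),
    "Yb": (5.0, 0.5),
}

MELT_FRACTION = 0.30  # the 30% partial melting named in the example query

liquid = {
    element: batch_melt(c0, d, MELT_FRACTION)
    for element, (c0, d) in source.items()
}
```

In a workflow like the one in the slides, the averages in `source` would come from the data discovery and filtering steps, and `batch_melt` would sit behind a Web service so that other groups could call it without recoding it.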