Geoinformatics and Ontology based Discovery, Integration and Analysis of Geoscience data Ontology

GEON
Geoinformatics and Ontology based Discovery, Integration and Analysis of Geoscience data
SESDI
Discovery
Ontology
Analysis
Integration
A.Krishna Sinha, Department of Geology
Virginia Tech, Blacksburg, pitlab@vt.edu
A.K.Sinha, 2007
Examples of geoscience datasets: extreme heterogeneity in syntax, structure and semantics (modified from Seber, 2005)
[Figure: earthquakes, aquifers, tectonics, Moho depth, geology, gravity, sediment thickness, focal mechanisms, topography, mines, faults, magnetics]
What do Geo-scientists want from cyberinfrastructure?
• Data inter-operability
• Access to High Performance Computation: grid based
• Visualization to aid interpretation
• Web based tools for computations, modeling
• Archive and preserve legacy data
• Advancing geoscience education
• Ease of registration, discovery and access to data (static and streaming) from distributed systems = smart search
• And most importantly: query based integration of diverse disciplinary data = knowledge discovery, hypothesis evaluation and hazard assessment
Integration: a buzz word but with complex solutions
• What is Integration?
– Relationships in information contained in heterogeneous and multi-disciplinary databases
• What are our choices?
– Layering of data (commonly used): integration done by the user
– Schema based integration (merging of schemas; the user must be knowledgeable about their organization, e.g. the semantics of each schema, which is an unlikely activity for most geoscientists)
– Ontology based semantic integration utilizing web services
Data Registration is Important for integration!
A.K.Sinha and Kai Lin, Geoinformatics 2006
Information Integration Scenario
What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry using gravity data?
Steps shown in the figure (numbered 1 to 6):
• Find geologic maps
• Ontology concepts:
– Location: States: Virginia
– Classification System: Rock Classifiers: Igneous: Pluton: A-type
– Mineral: Zircon
– Geologic Time: Dating Methods: U-Pb Zircon Methods
• Find tools [discrimination diagram: Zr versus 10^4 Ga/Al, separating I & S type from A type]
• Discover data sets
• Create a data product
Distribution and ages of A-Type plutons based on integration of multiple databases: a data product that geologists would want through cyberinfrastructure
[Map: plutons labeled with ages 729, 722, 745, 730, 706, 724, 735]
Cyber-solutions for geoscientists
• How do we get there?
– Discovery, Integration, and Analysis
• What are our choices?
Discovery and Integration of Geoscience Data is a two step process
• Discovery of earth science data (step 1), leading to
• Integration of earth science data (step 2)
How do Ontologies help with Smart Discovery and Semantic Integration?
What is Ontology? Why use Ontology?
An explicit formal specification of the terms in the domain and relations among them (Gruber 1993)
For earth scientists, it simply means making information about known relationships between concepts and associated data (for example, those that exist between rocks, density and seismic velocity) available in a language that computers can work with.
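As a minimal sketch of what "relationships in a language that computers can work with" can look like, the rock, density and seismic velocity example can be written as subject-predicate-object statements. The concept names, predicate names and helper functions here are illustrative assumptions, not terms taken from the GEON ontologies.

```python
# Toy knowledge base of (subject, predicate, object) statements.
# All names are illustrative placeholders, not GEON ontology terms.
TRIPLES = [
    ("Granite", "is_a", "IgneousRock"),
    ("Basalt", "is_a", "IgneousRock"),
    ("IgneousRock", "is_a", "Rock"),
    ("Rock", "has_property", "Density"),
    ("Rock", "has_property", "SeismicVelocity"),
    ("Density", "related_to", "SeismicVelocity"),
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def ancestors(concept):
    """Follow is_a links upward and collect all broader concepts."""
    found = []
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for parent in objects(current, "is_a"):
            if parent not in found:
                found.append(parent)
                frontier.append(parent)
    return found


def properties(concept):
    """Properties stated for the concept or inherited from broader concepts."""
    result = []
    for c in [concept] + ancestors(concept):
        for prop in objects(c, "has_property"):
            if prop not in result:
                result.append(prop)
    return result


print(properties("Granite"))
```

Because the relationship between Rock and its properties is stated once, a program can work out that a granite sample may carry density and seismic velocity data without that fact being repeated for every rock type.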
Motivations for Using Ontologies
• A better way to discover datasets
– Use the knowledge in ontologies to find datasets
• A better way to query datasets
– Query through ontologies without knowing the details of the schemas
• A better way to integrate multiple datasets
– Integrate multiple datasets on-the-fly if they are registered to ontologies
• A better way to segment large databases
– Transfer only parts of databases required for integration
Framework for discovery and integration
Data Discovery
• Level 1: Data Registration at the Index Level
– Discovery of data resources (e.g., geophysics, geologic maps, etc.)
• Level 2: Data Registration at the Item Level
– Registration at data level ontologies (e.g., bulk rock geochemistry, gravity database)
Data Integration
• Level 3: Data Registration at the Item Detail Level: DATA ONTOLOGY
– e.g., column in geochemical database that represents SiO2 measurement
– Requirement for Semantic Integration
Earth Sciences Virtual Database: a data warehouse where the schema heterogeneity problem is solved
Data Integration Using DIA Engine
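The item detail level can be illustrated with a small sketch: two tables store the same measurement under different column names, and a registration map ties each column to a shared concept so that a query names the concept rather than the column. The table contents, dataset names and the `query_concept` helper are hypothetical and only show the idea; they are not the DIA engine's actual data structures.

```python
# Two hypothetical geochemical tables with different schemas.
# Sample ids and values are made-up placeholders.
DATASET_A = [{"sample": "A1", "SiO2_wt": 72.1}, {"sample": "A2", "SiO2_wt": 68.4}]
DATASET_B = [{"id": "B7", "silica": 70.3}]

# Item detail registration: each dataset declares which column holds which concept.
REGISTRATION = {
    "dataset_a": {"rows": DATASET_A, "columns": {"SampleID": "sample", "SiO2": "SiO2_wt"}},
    "dataset_b": {"rows": DATASET_B, "columns": {"SampleID": "id", "SiO2": "silica"}},
}


def query_concept(concept):
    """Collect (dataset, sample id, value) for a concept from every registered dataset."""
    results = []
    for name, entry in REGISTRATION.items():
        columns = entry["columns"]
        if concept not in columns:
            continue  # this dataset has no column registered to the concept
        for row in entry["rows"]:
            results.append((name, row[columns["SampleID"]], row[columns[concept]]))
    return results


for record in query_concept("SiO2"):
    print(record)
```

The caller never sees `SiO2_wt` or `silica`; the registration absorbs the schema differences, which is the sense in which the heterogeneity problem is "solved" for the user.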
Level 3: Registration at the Item Detail Level (Example 1)
A Section from Planetary Material Ontology (class attached with multiplicity 1 to 0..n):
AnalyticalOxideConcentration
– analyticalOxide : AnalyticalOxide
– concentration : ValueWithUnit
– errorOfConcentration : ValueWithUnit
The GEON approach of registering data to concepts removes structural (format) and semantic heterogeneity.
Level 3: Registration at the Item Detail Level (Example 2)
Data Ontology: Subject, Object, Unit, and Value
Subject: Mineral (Biotite)
The Subject has Objects; each Object has a Value and a Unit:
1. Object TiO2, Value 1.88, Unit wt%
2. Object SiO2, Value 37.09, Unit wt%
AnalyticalOxideConcentration
– analyticalOxide : AnalyticalOxide
– concentration : ValueWithUnit
– errorOfConcentration : ValueWithUnit
A Section from Planetary Material Ontology
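The class fragment and the biotite values above can be mirrored in a few lines of code. This is a sketch under assumptions: the attribute names follow the slide, but the `Mineral` container, the use of a plain string for the oxide, and the reading of the 1 to 0..n multiplicity as "one mineral, many oxide records" are interpretations, not part of the ontology as published.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ValueWithUnit:
    value: float
    unit: str


@dataclass(frozen=True)
class AnalyticalOxideConcentration:
    analytical_oxide: str  # stands in for the AnalyticalOxide class
    concentration: ValueWithUnit
    error_of_concentration: Optional[ValueWithUnit] = None  # not given on the slide


@dataclass
class Mineral:
    """Hypothetical subject class holding zero or more oxide records."""
    name: str
    oxides: List[AnalyticalOxideConcentration] = field(default_factory=list)


biotite = Mineral("Biotite", [
    AnalyticalOxideConcentration("TiO2", ValueWithUnit(1.88, "wt%")),
    AnalyticalOxideConcentration("SiO2", ValueWithUnit(37.09, "wt%")),
])


def concentration_of(mineral, oxide):
    """Return the ValueWithUnit for an oxide, or None if it was not analysed."""
    for record in mineral.oxides:
        if record.analytical_oxide == oxide:
            return record.concentration
    return None


print(concentration_of(biotite, "SiO2"))
```

Keeping the unit next to the value is what lets two datasets reporting the same oxide in different units be reconciled later.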
High Level Ontology Packages
Packages: Geologic Time, Planetary Structure, Planetary Material, GeoImage, Physical Properties, PlanetaryLocation, Mathematical & Statistical Functions, Planetary Phenomenon
Imports:
• Numerics Ontology (NASA: Semantic Web for Earth Science)
• Units Ontology (NASA: Semantic Web for Earth Science)
• Physical Property Ontology (NASA: Semantic Web for Earth Science)
• Space Ontology (NASA: Semantic Web for Earth Science)
• Physical Phenomena Ontology (NASA: Semantic Web for Earth Science)
• http://www.isi.edu/~pan/OWLTime.html
Planetary Materials: Example of Data Ontology
Elements and isotopes
Rocks
Ontology Building is an Intensive Process
[Workflow: IBM Rational Rose -> (deploy class diagrams to Protégé) -> Protégé -> (convert Protégé project to an OWL file) -> Web Ontology Language (OWL) -> (run consistency checks) -> Description Logic Reasoners]
- Rational Rose is used to develop UML class diagrams that describe the types of objects in the system and the relationships between them.
- Protégé is an ontology editor.
- OWL is designed for machine processing of information, as opposed to only presenting it to people.
- A Description Logic Reasoner performs various inferencing services, such as computing the inferred superclasses of a class, determining whether or not a class is consistent, deciding whether or not one class is subsumed by another, etc.
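The three reasoner services named above can be illustrated on a toy class hierarchy. This is a deliberately simplified sketch, assuming only asserted subclass links and pairwise disjointness; a real description logic reasoner works on full OWL axioms and does far more. The class names and the `DISJOINT` list are made up for the example.

```python
# Asserted subclass links (child -> direct parents). Illustrative names only.
SUBCLASS_OF = {
    "ATypePluton": {"Pluton"},
    "Pluton": {"IgneousBody"},
    "IgneousBody": {"GeologicBody"},
    "GeologicBody": set(),
}

# Pairs of classes declared to share no members.
DISJOINT = [("IgneousBody", "SedimentaryBody")]


def inferred_superclasses(cls):
    """All classes reachable by following subclass links upward."""
    seen = set()
    stack = list(SUBCLASS_OF.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, ()))
    return seen


def is_subsumed_by(cls, candidate):
    """True if every member of `cls` must also be a member of `candidate`."""
    return cls == candidate or candidate in inferred_superclasses(cls)


def is_consistent(cls):
    """A class is flagged inconsistent if it falls under two disjoint classes."""
    supers = inferred_superclasses(cls) | {cls}
    return not any(a in supers and b in supers for a, b in DISJOINT)


print(sorted(inferred_superclasses("ATypePluton")))
print(is_subsumed_by("Pluton", "GeologicBody"))
```

Only the direct link from ATypePluton to Pluton is asserted; the links to IgneousBody and GeologicBody are inferred, which is the kind of result a consistency check run would report back to the ontology builder.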
Ontology based Engine: from concepts to data products
The DIA Engine
– Discovery: Access to Data
– Integration of Data: Structural and Semantic Data Heterogeneity
– Analysis of Data: Verify Hypothesis
DIA Portal
Geologic Map-Based Interface
Data Product
• A-Type Bodies Identified
Integrated Data Product
• Gravity Data with an Overlay of a Plutonic Body in Virginia
DIA Internal Structure
• User Interface: displays the data / data product in a visually enhanced manner
– Java / VB Script
– Microsoft ASP .Net
– Microsoft VB .Net
• Back-End: the major technologies that are used "behind the scenes" to generate data products
– Microsoft .NET Web Services
– Java Web Services
– ESRI's ArcGIS Server 9.1
– ESRI's ArcSDE 9.1 (Spatial Database)
– Microsoft SQL Server (Geo-Chemical Database)
• Functionality Coding:
– Microsoft Visual Basic (to code the discrimination filters)
– Java
[Diagram: the query "Show all A-type bodies in Virginia" enters the DIA Engine; components shown are the Geological Map Server / Gazetteer, the GEONResources Web Service, the SDSC GEON Server, the SoqlToSql Web Service, the A-Type Filter (Web service), Registered U/Pb Zircon Ages Data and Registered Geochemical Data]
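A discrimination filter of the kind mentioned above can be sketched as a simple threshold test on a Zr versus Ga/Al diagram like the one in the integration scenario. The deck's own filters are coded in Visual Basic and are not shown, so this Python version is only an illustration: the cutoff values are placeholder assumptions to be replaced by those of whichever published diagram is used, and the sample data and field names are invented.

```python
def ga_al_index(ga_ppm, al_wt_percent):
    """Return 10^4 * Ga/Al with both elements expressed in ppm."""
    al_ppm = al_wt_percent * 10_000.0  # 1 wt% equals 10,000 ppm
    return 1e4 * ga_ppm / al_ppm


def is_a_type(sample, zr_min=250.0, ga_al_min=2.6):
    """Threshold test; zr_min and ga_al_min are assumed, adjustable cutoffs."""
    index = ga_al_index(sample["Ga_ppm"], sample["Al_wt"])
    return sample["Zr_ppm"] > zr_min and index > ga_al_min


def filter_a_type(samples, **cutoffs):
    """Keep only the samples that pass the discrimination test."""
    return [s for s in samples if is_a_type(s, **cutoffs)]


# Invented samples: elemental Al in wt%, Ga and Zr in ppm.
SAMPLES = [
    {"id": "P1", "Zr_ppm": 400.0, "Ga_ppm": 25.0, "Al_wt": 7.0},
    {"id": "P2", "Zr_ppm": 150.0, "Ga_ppm": 18.0, "Al_wt": 8.0},
]

print([s["id"] for s in filter_a_type(SAMPLES)])
```

Note that the function expects elemental Al; data reported as Al2O3 would need converting first. Exposing the cutoffs as parameters keeps the filter reusable when a different diagram or boundary is preferred.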
Query Specification
• In a selected "region of interest" the user is provided with a number of options (the menu)
[Figure: Bounding Box Selection, followed by Analysis Type Selection through Sub-Menus #1 to #5; options shown: Igneous, GeoChemical, Magma Class, A-Type, Discriminant Diagram]
• User clicks through different menus to build an exact query
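The click-through process can be pictured as accumulating one choice per menu on top of a region of interest. The sketch below assumes a hypothetical `QuerySpec` class and an approximate bounding box for Virginia; neither comes from the DIA portal itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class QuerySpec:
    """Hypothetical container for a region of interest plus menu choices."""
    bounding_box: Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)
    selections: List[str] = field(default_factory=list)

    def choose(self, option: str) -> "QuerySpec":
        """Record one menu click and return self so that clicks can be chained."""
        self.selections.append(option)
        return self

    def contains(self, lon: float, lat: float) -> bool:
        """True if a point lies inside the selected region of interest."""
        min_lon, min_lat, max_lon, max_lat = self.bounding_box
        return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

    def describe(self) -> str:
        return " > ".join(self.selections)


# Approximate box around Virginia, for illustration only.
VIRGINIA_BOX = (-83.7, 36.5, -75.2, 39.5)

spec = (
    QuerySpec(VIRGINIA_BOX)
    .choose("Igneous")
    .choose("GeoChemical")
    .choose("Magma Class")
    .choose("A-Type")
    .choose("Discriminant Diagram")
)
print(spec.describe())
```

Each click narrows the request, so by the last sub-menu the engine has an unambiguous query to run against the registered data.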
Data Integration
• Semantic integration of data products in DIA requires:
– Ontologies
• a data ontology to interpret data from different sources
• a service ontology
– Data sharing
• requires data registration
[Diagram: Discovery (Geospatial Query, Aspatial Query); Integration (Between Different Ontologic Classes: Geochemical, Geophysics; Within Same Ontologic Class: Geochemical); Analysis (A-Type Identification, Geologic Time); Data Product. Sources: VA, WY and TX Ontologically Registered Data]
DIA's Solution for Current Tool Sharing Approaches
• Each research group develops its own tools
• Tools developed by a research group are rarely used by other groups
• Redundancy of development efforts
• Little interoperability amongst tools
– Interaction amongst different tools is often not possible or requires extensive (re)coding
• Solution: Wrap Tools as Web Services Accessible to the Scientific Community Worldwide
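Wrapping a tool as a web service can be as small as putting an HTTP front end on an existing function. The sketch below uses Python's standard WSGI interface rather than the .NET and Java services DIA is built on; the `mean_value` tool and the `values` query parameter are stand-ins chosen for illustration.

```python
import json
from urllib.parse import parse_qs


def mean_value(values):
    """The 'tool': any existing analysis routine could stand here."""
    return sum(values) / len(values)


def tool_service(environ, start_response):
    """Minimal WSGI application exposing mean_value via ?values=1,2,3."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        values = [float(v) for v in params["values"][0].split(",")]
        body = json.dumps({"mean": mean_value(values)})
        status = "200 OK"
    except (KeyError, ValueError, ZeroDivisionError):
        body = json.dumps({"error": "expected ?values=<comma-separated numbers>"})
        status = "400 Bad Request"
    payload = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]


# To serve it locally (not run here):
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8000, tool_service).serve_forever()
```

Once served, any group can call the tool over HTTP from its own language and platform, which removes the need to recode it.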
Geoinformatics and the Semantic Web
From Data Ontology to Service Ontology
Geoinformatics and the Semantic Web
• World Wide Web is primarily a repository of documents written in HTML
• Human involvement is necessary for data interoperability and integration
– Cumbersome tasks: the large size of data sets and the number of options are a major impediment
• Machine-understandable information can facilitate exchange and processing
– Meaning needs to be associated with the information, including data and services
– This is the primary objective of the semantic Web.
Semantic Web Services
• Semantic Web services are the result of the evolution of the syntactic definition of Web services and the semantic Web (Bussler et al 2003: A Conceptual Architecture for Semantic Web Enabled Web Services)
• As with data, mapping concepts of Web services to service ontologies is required
Semantic Interoperability: Bringing Data and Services Together
Example Query: Find the chemical composition of a liquid derived by 30% partial melting based on the
average abundances of REE of A-Type plutons in Virginia.
[Diagram components:
– GEON Portal: Query Input; Display Data Products, Associations, etc.
– Processing: Query Translation, Query Engine, Workflow Generation, Process Execution, Report Generation
– Service Registries: Data Services, Computational Services, Tool Services
– Ontologies: Semantic Tool Ontologies, Semantic Data Ontologies
– Geoscience Disciplinary Databases: Petrology Database, Geophysics Database, Images, Maps
– Web Services Space: Query, Data Discovery, Data Filtering Tool, Computational Tool, Partial Melting Tool, Data Products]
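The computational step of the example query can be sketched as follows. The slide does not say which melting model its Partial Melting Tool uses, so this assumes the standard batch (equilibrium) melting relation C_L = C_0 / (D + F(1 - D)), where C_0 is the source concentration, D the bulk partition coefficient and F the melt fraction. The REE abundances and D values below are invented placeholders, not data from the registered databases.

```python
def batch_melt_concentration(c0, d, f):
    """Liquid concentration for batch melting: C_L = C_0 / (D + F * (1 - D))."""
    if not 0.0 < f <= 1.0:
        raise ValueError("melt fraction F must be in (0, 1]")
    return c0 / (d + f * (1.0 - d))


def average(values):
    return sum(values) / len(values)


def melt_composition(samples, partition_coefficients, f):
    """Average each element over the samples, then apply batch melting."""
    liquid = {}
    for element, d in partition_coefficients.items():
        c0 = average([s[element] for s in samples])
        liquid[element] = batch_melt_concentration(c0, d, f)
    return liquid


# Invented REE abundances (ppm) for two plutons and assumed bulk D values.
PLUTON_SAMPLES = [{"La": 40.0, "Yb": 4.0}, {"La": 60.0, "Yb": 6.0}]
BULK_D = {"La": 0.1, "Yb": 1.0}

liquid = melt_composition(PLUTON_SAMPLES, BULK_D, f=0.30)
for element, ppm in liquid.items():
    print(element, round(ppm, 2))
```

In the workflow, the averaging input would come from the data filtering step (A-type plutons in Virginia) and the result would be returned as the data product.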
Summary
1. Semantic technologies are key to changing the culture of geoscientists. Semantic capabilities provide an increase in efficiency (may not make us smarter scientists!)
2. Proof of concept beyond theory is required to bring the community on board
3. Funding agencies should be encouraged to support science driven needs & goals
4. Promote semantic Web based education
5. Would we wish to consider a global geoscience semantic Web consortium?
6. Would we wish to consider establishing an Ontology center for coordinating research? Funding? Formalize partnerships between Societies, Agencies?
[Figure labels: Geophysics, Stratigraphy, Tectonics, Petrology, Hydrology, Geochemistry, Paleontology, Structure]
http://www.geosociety.org/meetings/07geoInfo/index.htm