What is Cyberinfrastructure?

Cyberinfrastructure and EarthScope Science goals:
A GEON perspective
What is Cyberinfrastructure?
What is GEON?
How will GEON research facilitate discovery and
integration of earth science data?
What are the benefits of such a research initiative for
EarthScope?
How can earth scientists participate in
Cyberinfrastructure research opportunities?
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES
A.K.Sinha, Virginia Tech, 2005
Cyberinfrastructure
• Cyberinfrastructure is the organized
aggregate of technologies enabling
access and coordination of
information technology resources
to facilitate science, engineering,
and societal goals.
– Data access from distributed systems
– Data inter-operability
– Computation: grid based and workflows
– Visualization
– Tools
National Science Foundation’s
Cyberinfrastructure
NSF Blue Ribbon Panel
(Atkins) Report
provided a compelling
and comprehensive
vision of an integrated
Cyberinfrastructure
– Integration: highlighted today
Modified from Berman,
SDSC, 2005
A KEY OBSERVATION IN SUPPORT OF
CYBERINFRASTRUCTURE RESEARCH IN GEOSCIENCES
“Large team** efforts are required to build a
federation of data and tools; but smaller groups or
individuals working independently and given
access to these data and tools can (and likely will)
make fundamental discoveries”
MODIFIED FROM BLUE RIBBON ADVISORY PANEL ON CYBERINFRASTRUCTURE
REPORT, NSF
** such as GEON and other projects
EarthScope Instrumentation and Data
USArray
PBO
SAFOD
InSAR
Plus semantic integration of other earth science data
Cyberinfrastructure
Resources
Science Investigators
Towards an
Integrated
Earth Science
data and
knowledge base
to achieve
EarthScope
Science and
education goals
Educators and the Public
Adapted from
D.Seber,SDSC
Three dimensional view of the lithosphere-asthenosphere boundary and surface topography of
the northern Appalachians. Base of lithosphere interpolated from migrated Ps waveform
images at 6 labeled stations. (From Rychert et al. 2005)
New knowledge about evolution of continents requires complex integration of
geophysical data with those associated with sub-crustal lithosphere ages, its
composition and physical properties (seismic, thermal etc), surface geology and
associated events chronology
What is the
geologic and
geophysical
record of SuperContinent
assembly and
dispersal?
EarthScope Science Targets: Examples from eastern North
America
• What is the geologic and geophysical record of Super-Continent
assembly and dispersal?
• What are the architectures of terrane boundaries at depth?
• How do composition, temperature and strain fabrics vary within the
lithosphere and asthenosphere? Are lithospheric and asthenospheric
strain coupled?
• How sharp is the lithosphere-asthenosphere boundary? What defines
it?
DATA NEEDED TO ADDRESS THESE QUESTIONS ARE DISTRIBUTED
ACROSS THE COUNTRY, IN DIFFERENT FORMATS AND CANNOT
BE INTEGRATED IN A WEB ENVIRONMENT WITH EXISTING
TECHNOLOGIES
—overcoming heterogeneity is a priority cyberinfrastructure
challenge
Outline
• Data integration problem and solutions
• GEON data integration solution: ontology
enabled semantic mediation
• What is ontology
• Registering data to ontologies
• Discovering data and using workflows in a web
environment to go from queries to questions
GEON Architecture addresses problems of:
1. Variety of data sources and types
2. Discovery and relevance
3. Addressing needs of different communities
• Platform heterogeneity: different OS platforms
• DBMS heterogeneity: different database systems, e.g.
SQLServer, mySQL, DB2
• Data type heterogeneity
• Schema heterogeneity
• Heterogeneity in units, accuracy, resolution
• Semantic heterogeneity
( modified from Baru, SDSC, 2005)
What is GEON ? How can GEON help
integrate heterogeneous and distributed
data?
GEON: The Geosciences
Network
www.geongrid.org
• GEON is an NSF-funded collaborative
research project between IT and Earth Science
researchers with the goal of developing
cyberinfrastructure to enable new integrative
modes of geosciences research
• GEON is developing a pioneering system
that uses knowledge-based techniques to
discover, query, and integrate data in the
Geosciences
• Project participants include 14 PI
institutions, as well as partners from other
projects, agencies, and industry.
• GEON has deployed a Web services-based,
distributed computing infrastructure,
called the GEONgrid, across the PI and
partner sites.
• GEONgrid provides access to distributed
data collections, tools, and applications
Research and Education Products and Results:
• Technologies for “Smart Search”, On-the-fly
Data Integration, GIS Map Integration, Distributed
Portals, and 4D Visualization
• Earth Science Research within GEON on
– 3D Lithospheric Structure
– Integrated Geoscience Modeling
– Geologic Evolution of North America
– Ontologic Framework for the Geo-sphere
• Cyberinfrastructure Summer Institute for
Geoscientists and Graduate Courses in
Geoinformatics
GEON and Cyberinfrastructure
• Develop cyberinfrastructure that enables
interlinking and sharing multidisciplinary
Earth Science data resources, software
and tools
• Create a scientist-friendly portal to access
data, software for analysis , modeling, and
visualization
• Create the GEONgrid to enable seamless
data integration and analysis environment
GEON: GEOsciences Network
[Architecture diagram; recoverable labels:]
• Portal (login, myGEON): Registration, GEONsearch, GEONworkbench
• Services: Registration, Data Integration, Indexing, Workflow,
Visualization & Mapping
• Core Grid Services: GT3, OGSA-DAI, GSI, CAS, gridFTP, SRB, PostGIS, mySQL, DB2
• Physical Grid: RedHat Linux, ROCKS, Internet, I2, OptIPuter (planned)
• Modeling Environment: Data, Physical model, Model results, HPCC
Discovering, sharing and using data in a web
environment: GEON style
• Discovery of data resources
(e.g., gravity, geologic maps, etc)
requires registration through
use of high level index terms
• GEON has deployed an extension
of the AGI index terms; these will be
cross-indexed to others such as
GCMD and AGU
• Discovering Item level content of
databases requires registration
through data level ontologies
(e.g. column in geochemical
database that represents SiO2
measurement) and is a
requirement for semantic
integration
• Item detail level registration
through ontologies reduces
schema based data
heterogeneities
• Computation and modeling
tools can be registered for use
by community
• Visualization capabilities
• Easy access to data through
GEON Portal
• Individual workbench built into
GEON Portal
• Scientific Workflow Systems
provide computational and
query capabilities in a web
environment
GEON Index
Ontology
AGI Index Terms
Index terms from AGI used for identifying type of data
Integration: a buzz word but with complex solutions
• What is Integration?
– Relationships in information contained in
heterogeneous and multi-disciplinary databases
• What are our choices?
– Layering of data (commonly used)
– View based techniques (create a virtual schema)
– Schema based integration (merging of schema, but
user must be knowledgeable about the organization,
e.g. semantics of schema)
– Ontology based semantic integration utilizing
workflows (favored by GEON)
Data Registration is Important for integration!
What is Ontology? Why use Ontology?
• Ontology: An explicit formal specification of the terms in the
domain (e.g. Geology) and relations among them (Gruber 1993)
• Why use ontology?
– To share and reuse domain knowledge
– To make domain assumptions explicit
– To separate domain knowledge from operational knowledge
– To analyze domain knowledge
• Ontology Languages:
– RDF and RDFS
– OIL
– DAML+OIL
– OWL: Web Ontology Language, from W3C
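A minimal sketch of what "terms and relations among them" can look like in code, using plain Python tuples instead of RDF or OWL tooling. The class names and the helper functions are illustrative assumptions, not part of any GEON ontology.

```python
# Toy ontology stored as (subject, predicate, object) triples.
# All terms below are illustrative, not taken from a GEON ontology.
TRIPLES = {
    ("Granite", "subClassOf", "IgneousRock"),
    ("Basalt", "subClassOf", "IgneousRock"),
    ("IgneousRock", "subClassOf", "Rock"),
    ("Rock", "composedOf", "Mineral"),
    ("Zircon", "subClassOf", "Mineral"),
}


def superclasses(term):
    """Return every class reachable from `term` via subClassOf links."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in TRIPLES:
            if subj == current and pred == "subClassOf" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def is_a(term, cls):
    """True if `term` is `cls` or a (transitive) subclass of it."""
    return term == cls or cls in superclasses(term)


print(is_a("Granite", "Rock"))  # Granite -> IgneousRock -> Rock
print(is_a("Zircon", "Rock"))   # Zircon is a Mineral, not a Rock
```

Because the relations are explicit data, a program can follow them without the domain knowledge being hard-coded into the query logic.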
Motivations for Using Ontologies in GEON
• A better way to discover and understand datasets
Use the knowledge in ontologies to find datasets
• A better way to query datasets
Query through ontologies without knowing the details of the schemas
• A better way to integrate multiple datasets
Integrate multiple datasets on-the-fly if they are registered to ontologies
• A Better way to segment large data bases
Transfer only parts of data bases required for integration
An emerging research frontier- Geo-Ontology
Modified From Kai Lin, SDSC, 2005
Class Diagrams - The Basic Building Block for Semantic
Integration
Earth Scientists create disciplinary ontologies!!!
Earth Science research : stages in developing ontologies
Napkin Stage
GEON Planetary Structure concepts
GEON formal ontology
Concept Map Stage
High Level Ontology: integrated GEON,
SWEET and NADM stage
High Level Ontology Packages: representing relationships
[Package diagram; recoverable labels:]
• Packages: Planetary Material, Planetary Structure, PhysicalProperty, Location
• Imported ontologies: Numerics, Time, Units, Physical Property, Space
Planetary Material
State of Matter
Element
Rocks
Minerals
Data Types
GEON Cyberinfrastructure …
More than just about the data,
GEON is about going from simple
Queries to complex Questions
A Home Buyer’s Information Integration Problem
What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms,
a nearby school ranking in the upper third, in a neighborhood
with below-average crime rate and diverse population?
Information
Integration
“Multiple-Worlds”
Mediation
Bertram Ludäscher, SDSC
Realtor
Crime Stats
School Rankings
Demographics
A query example: Use SQL to ask a database to
show you all white wines from California of 2003
vintage….
A question: "Tell me what wines I should buy to
serve with each course of the following menu. And,
by the way, I don't like Sauternes." … from W3C
This requires two databases (e.g. food and wine) and
prescribed relationships between them that are
defined for computers as Ontologies
Bertram Ludäscher, SDSC
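The query example above can be sketched with Python's built-in sqlite3 module. The table name, column names, and rows are hypothetical and exist only to make the sketch runnable.

```python
import sqlite3

# In-memory database with a hypothetical `wines` table; rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wines (name TEXT, color TEXT, region TEXT, vintage INTEGER)"
)
conn.executemany(
    "INSERT INTO wines VALUES (?, ?, ?, ?)",
    [
        ("Wine A", "white", "California", 2003),
        ("Wine B", "red", "California", 2003),
        ("Wine C", "white", "Oregon", 2003),
        ("Wine D", "white", "California", 2001),
    ],
)

# The "query": every constraint names a column of this one schema.
rows = conn.execute(
    "SELECT name FROM wines "
    "WHERE color = ? AND region = ? AND vintage = ? ORDER BY name",
    ("white", "California", 2003),
).fetchall()
white_ca_2003 = [name for (name,) in rows]
print(white_ca_2003)
conn.close()
```

The menu question cannot be phrased this way: no column in this schema links a wine to a course, and that missing link is what the ontology is meant to supply.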
The Problem: Scientific Data Integration
or: … from Queries to Questions
Data Registration: key to integration
Click on Submission
to register a dataset
Input a data set name
Select a zipped
shapefile
Choose an ontology class
Registration at the item detail level using data ontology: working with data
SiO2 is an instance of class
AnalyticalOxideConcentration and has all
information about the element Si
Planetary Material Ontology
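A rough sketch of item-level registration, assuming two geochemical tables that store the same measurement under different column names. The dataset contents, the registry format, and the `values_for` helper are all hypothetical; only the class name AnalyticalOxideConcentration comes from the slide.

```python
# Two hypothetical geochemical tables that store the same measurement
# under different column names (schema heterogeneity).
dataset_a = {
    "columns": ["sample", "SiO2_wt_pct"],
    "rows": [("VA-01", 72.4), ("VA-02", 68.9)],
}
dataset_b = {
    "columns": ["id", "silica"],
    "rows": [("NC-17", 75.1)],
}

# Item-level registration: (dataset name, column) -> ontology term.
# The key format "Class/instance" is an assumption made for this sketch.
REGISTRY = {
    ("A", "SiO2_wt_pct"): "AnalyticalOxideConcentration/SiO2",
    ("A", "sample"): "SampleIdentifier",
    ("B", "silica"): "AnalyticalOxideConcentration/SiO2",
    ("B", "id"): "SampleIdentifier",
}
DATASETS = {"A": dataset_a, "B": dataset_b}


def values_for(term):
    """Collect (sample id, value) pairs for `term` across all datasets."""
    out = []
    for name, data in DATASETS.items():
        cols = data["columns"]
        # Map each registered ontology term to its column position.
        by_term = {REGISTRY[(name, c)]: cols.index(c) for c in cols}
        if term not in by_term:
            continue
        id_idx = by_term["SampleIdentifier"]
        val_idx = by_term[term]
        out.extend((row[id_idx], row[val_idx]) for row in data["rows"])
    return out


sio2 = values_for("AnalyticalOxideConcentration/SiO2")
print(sio2)
```

The caller asks for a concept and never mentions `SiO2_wt_pct` or `silica`, which is the sense in which registration reduces schema-based heterogeneity.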
GEONsearch: building on data registration
Choose subject (from a “base” ontology)
Choose location (from a gazetteer Web service)
Choose a time (numeric range or from a time
ontology Web service)
Choose concepts from ontologies
Kai Lin, SDSC, 2005
Ontology Enabled Map Integration: A Case Study
• Geologic Data sets
Arizona, Idaho, Montana, Utah, Nevada, Colorado, Wyoming, New Mexico
• Ontologies
– Geologic Time Scale
– Multihierarchical Rock Classification from the Geological Survey of Canada
– British Rock Classification Scheme
Snapshot after querying “Paleozoic”
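One way a "Paleozoic" query might reach map units labeled at different levels of the time scale is to expand the term through the hierarchy before matching. The sketch below assumes a small era-to-period table and invented map units; it is not GEON's implementation.

```python
# Fragment of a geologic time hierarchy: era -> periods.
# Only the Paleozoic and Mesozoic are listed.
TIME_SCALE = {
    "Paleozoic": ["Cambrian", "Ordovician", "Silurian",
                  "Devonian", "Carboniferous", "Permian"],
    "Mesozoic": ["Triassic", "Jurassic", "Cretaceous"],
}

# Hypothetical map units from two state maps that label ages differently.
MAP_UNITS = [
    {"state": "Arizona", "unit": "unit-1", "age": "Devonian"},
    {"state": "Arizona", "unit": "unit-2", "age": "Jurassic"},
    {"state": "Utah", "unit": "unit-3", "age": "Paleozoic"},
    {"state": "Utah", "unit": "unit-4", "age": "Permian"},
]


def expand(term):
    """Return `term` plus every narrower term beneath it."""
    terms = {term}
    for child in TIME_SCALE.get(term, []):
        terms |= expand(child)
    return terms


def query(term):
    """Return the units whose age label falls under `term`."""
    wanted = expand(term)
    return [u["unit"] for u in MAP_UNITS if u["age"] in wanted]


print(query("Paleozoic"))
```

A plain string match on "Paleozoic" would return only unit-3; expanding through the ontology also picks up the units labeled by period.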
Scientific Workflow Systems in GEON
• Adding computational capability in a web environment
– Promote “scientific discovery” by providing tools and
methods to generate scientific workflows
– Support computational infrastructure for
modeling, classification, computation
– Design frameworks which define efficient ways to connect to
the existing data and integrate heterogeneous data from
multiple resources
Workflow layout
for rock
classification, but
can be used for
any query that
requires a
classifier
Find data on
the basis of
ontologic
registration
PointInPolygon
algorithm
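The slide names a PointInPolygon step without saying how it is implemented. A common approach is ray casting, sketched below under that assumption; the function name and the test polygon are illustrative.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside `polygon`?

    `polygon` is a list of (x, y) vertices in order; the last vertex is
    implicitly joined to the first. Points exactly on an edge are not
    handled specially.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical map polygon (a unit square) and two sample locations.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(point_in_polygon(0.5, 0.5, square))
print(point_in_polygon(1.5, 0.5, square))
```

In a workflow like the one on the slide, a test of this kind would let sample locations from a relational database be matched to polygons from a shapefile.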
Integration Scenario: A-type pluton query
• Classifying A-types from an Igneous rock database
• Integrating between Relational and Spatial (shapefiles) databases to query and interactively
display GIS results
Integration Scenario: Stages for access to data and tools in a
workflow environment
The integration scenario: What are the distribution and U/Pb zircon ages of A-type plutons in Virginia?
[Workflow diagram with numbered stages 1-6; recoverable labels:]
• Ontology System
– Location: States: Virginia
– Classification System: Rock Classifiers: Igneous: Pluton: A-type
– Mineral: Zircon
– Geologic Time: Dating Methods: U-Pb Zircon Methods
• Classifier plot: Zr vs. 10^4 Ga/Al, separating I&S type from A type
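The scenario chains a location filter, a rock classifier, and an age lookup. The sketch below shows that chain using the Zr and Ga/Al labels from the diagram; the sample records and both cutoff values are placeholders invented for illustration, since the slides give no numbers.

```python
# Hypothetical igneous-rock records; values are invented for illustration.
SAMPLES = [
    {"id": "P1", "state": "Virginia", "zr_ppm": 420.0,
     "ga_ppm": 26.0, "al_ppm": 68000.0, "upb_zircon_age_ma": 730.0},
    {"id": "P2", "state": "Virginia", "zr_ppm": 150.0,
     "ga_ppm": 17.0, "al_ppm": 80000.0, "upb_zircon_age_ma": 450.0},
    {"id": "P3", "state": "Maryland", "zr_ppm": 500.0,
     "ga_ppm": 30.0, "al_ppm": 65000.0, "upb_zircon_age_ma": 700.0},
]

# Placeholder cutoffs (assumptions, not from the slides).
ZR_MIN_PPM = 250.0
GA_AL_MIN = 2.6


def ga_al_index(sample):
    """Return 10^4 * Ga/Al, with both elements in ppm."""
    return 1.0e4 * sample["ga_ppm"] / sample["al_ppm"]


def is_a_type(sample):
    """Classify a sample as A-type if it exceeds both cutoffs."""
    return sample["zr_ppm"] > ZR_MIN_PPM and ga_al_index(sample) > GA_AL_MIN


def a_type_ages(state):
    """Select by location, classify, then report U-Pb zircon ages (Ma)."""
    return {
        s["id"]: s["upb_zircon_age_ma"]
        for s in SAMPLES
        if s["state"] == state and is_a_type(s)
    }


print(a_type_ages("Virginia"))
```

Keeping the classifier as a separate function mirrors the workflow idea: the same chain could run with a different classifier swapped in.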
Distribution and ages of A-Type plutons
based on integration of
multiple databases
How do earth scientists participate in Cyberinfrastructure research?
• Know your data……its content and definitions
• Think more broadly…..integration is between databases that are
different from yours
• Learn more about how to use IT through summer workshop at SDSC, as
well as others sponsored by Societies
• Register your data using Index Terms through GEON Portal to facilitate
discovery of databases; use data ontology for discovery of data
• Build and Share tools and services for use in a web environment
• Construct concept maps in your discipline….leads to formal ontologies
required for semantic integration……remember Geo-Ontology
• EarthScope requires integrative capabilities
[Diagram: objects, processes, transitions, and parts of components in crustal melting]
• Objects: melt, residual solids, cumulate phases, partly-molten crust,
“fertile” crustal source, mixed source, metagraywacke, metapelite, amphibolite
• Processes: melting, dehydration melting, wet melting, decompression melting,
crustal melting, underplating, intraplating, heating, radioactive heating
• Annotation on the diagram: now where?
From Objects to Processes: just the beginning of a new integrative world
Some processes and objects typically involved in crustal melting.
From Cal Barnes,
Texas Tech, 2005
Two important events at GSA
DIVISION OF GEOINFORMATICS
GEON and EarthScope Reception
Data to Knowledge
FIRST Business Meeting
will take place during the
upcoming National GSA
meeting, Salt Lake City
Tuesday, 18 October,
Ballroom D, 5.45-7.45pm
Monday, 17 October,
Hilton Salt Lake City Center
Alpine West Ballroom
5.00-7.00pm