GEON:
The Geosciences Network
& The National Laboratory for Advanced Data Research
(NLADR)
Chaitan Baru
Division Director, Science R&D
San Diego Supercomputer Center
e-Science Seminar, June 28th, 2004, Edinburgh, Scotland
SAN DIEGO SUPERCOMPUTER CENTER
Outline
• About SDSC
• Cyberinfrastructure projects
• E.g., TeraGrid, BIRN, SCEC/CME, GEON, SEEK,
NEES, …
• GEON
• NLADR
SDSC Organization Chart
• Director: Fran Berman; Executive Director: Vijay Samalam
• Administration & Operations
• Strategic Partnerships & External Relations
• User Services & Development (Anke Kamrath)
– Consulting, Training, Documentation, User Portals, Outreach & Education, User Services
• Production Systems (Richard Moore)
– Allocated Systems, Production Servers, Networking Ops, SAN/Storage Ops, Servers/Integration, Security Ops, TeraGrid Operations
• Technology R&D (Vijay Samalam)
– Advanced Cyberinfrastructure Lab, SRB Lab, Networking Research, HPC Research, Tech Watch Group, SDSC/Cal-IT2 Synthesis Center
• Science R&D (Chaitan Baru)
– Data & Knowledge Labs
– Science projects: bio-, neuro-, eco-, geoinformatics, …
– NLADR
An emphasis on end-to-end
“Cyberinfrastructure” (CI)
• Development of broad infrastructure, including services,
not just computational cycles
• Referred to as “e-science” in the UK
• A major emphasis at SDSC on data, information, and
knowledge
• Increased focus on:
• Strategic applications, and “strategic” communities
• Training and Outreach, e.g., Summer Institutes
• Community codes, but also data collections, databases
• “Researcher-level” services, e.g., Linux cluster management software, to ease the transition from a local environment to a large-scale computing environment
SDSC and CI Projects
• SDSC is involved in several NSF- and NIH-funded, community-based CI projects
• TeraGrid – Providing access to high-end, national-scale, physical computing infrastructure
• BIRN – Biomedical Informatics Research Network, funded by NIH. Integrating distributed brain image data
• GEON – Geosciences Network. Integrating distributed Earth Sciences data
• SCEC/CME – Southern California Earthquake Center Community Modeling Environment
• SEEK – Scientific Environment for Ecological Knowledge. Integrating
distributed biodiversity data along with tools
• OptIPuter – Distributed computing environment using
Lambda Grids
• NEES – Network for Earthquake Engineering Simulation.
Integrating distributed earthquake simulation and sensor data
• ROADNet – Realtime Observatories, Applications, and Data
management Network
• TeraBridge – Health Monitoring of Civil Infrastructure…
The TeraGrid:
High-end Grid Infrastructure
[Map of TeraGrid sites: PSC, Purdue, Indiana, Oak Ridge, UT Austin]
Typical Characteristics of CI Projects
• Close collaboration between science and IT researchers
• Need to provide data and information management…
• Storage management, archiving
• Data modeling, semantic modeling—spatial, temporal, topic, process
• Data and information visualization
• Semantic integration of data
• Logic-based formalisms to represent knowledge and map between ontologies
• … as well as high-end computing
• BIRN, SCEC, GEON, TeraBridge – all have allocations on the
TeraGrid
• Convert community codes into Web/Grid services
• Enable scientists to access much larger computing capability from
local cluster/desktop
• Provide support for scientific workflow systems (visual programming
environments for Web services)
Biomedical Informatics Research Network:
Example of a “community” Grid
PI of BIRN CC: Mark Ellisman
Co-Is of BIRN CC: Chaitan Baru, Phil Papadopoulos, Amarnath Gupta, Bertram Ludaescher
The GEONgrid:
Another “community” Grid
Sites and partners shown: Geological Survey of Canada, Rocky Mountain Testbed, Chronos, Mid-Atlantic Coast Testbed, OptIPuter, Livermore, NASA, USGS, KGS, Navdat, ESRI, SCEC, CUAHSI
Legend: PoP node, Partner Projects, Compute cluster, Data Cluster, Partner services, 1TF cluster
www.geongrid.org
Project Overview
• Close collaboration between geoscientists and IT researchers to interlink databases and Grid-enable applications
• “Deep” data modeling of 4D data
• Situating 4D data in context—spatial, temporal, topic, process
• Semantic integration of Geosciences data
• Logic-based formalisms to represent knowledge and map between
ontologies
• Grid computing
• Deploy a prototype GEON Grid: heterogeneous networks, compute
nodes, storage capabilities. Enable sharing of data, tools, expertise.
Specify and execute workflows
• Interaction environments
• Information visualization. Visualization of concept maps
• Remote data visualization via high-speed networks
• Augmented reality in the field
• Linkage to BIRN
Funding Sources
• National Science Foundation ITR Project, 2002-2007, $11.6M
• Also, $900K for Chronos, $1M for CUAHSI-HIS (NSF)
PI Institutions
• Arizona State University
• Bryn Mawr College
• Penn State University
• Rice University
• San Diego State University
• San Diego Supercomputer Center/UCSD
• University of Arizona
• University of Idaho
• University of Missouri, Columbia
• University of Texas at El Paso
• University of Utah
• Virginia Tech
• UNAVCO
• Digital Library for Earth System Education (DLESE)
Partners
• California Institute for Telecommunications and Information Technology, Cal-(IT)2
• Chronos
• CUAHSI-HIS
• ESRI
• Geological Survey of Canada
• Georeference Online
• HP
• IBM
• IRIS
• Kansas Geological Survey
• Lawrence Livermore National Laboratory
• NASA Goddard, Earth System Division
• Southern California Earthquake Center (SCEC)
• U.S. Geological Survey (USGS)
Affiliated Project
• EarthScope
Science Drivers (1)
DYSCERN: DYnamics, Structure, and Cenozoic Evolution of the Rocky Mountains
• The Rocky Mountain region is at the apex of a broad, dynamic orogenic plateau between the stable interior of North America and the active plate margin along the west coast.
• For the past 1.8 billion years, the region has been the focus of repeated tectonic activity…
• …and has experienced complex intra-plate deformation for the past 300 million years.
• The deformation processes involved are the subject of considerable debate…
• GEON is undertaking an ambitious project to map the lithospheric structure of the Rocky Mountain region in a highly integrated analysis and feed the result into a 3-D geodynamic model…
• …to improve our understanding of the Cenozoic evolution of this region.
Science Drivers (2)
CREATOR: Crustal Evolution—Anatomy of
an Orogen
• The Appalachian Orogen is a continental scale mountain belt that
provides a geologic template to examine the growth and break up of
continents through plate tectonic processes. The record spans a
period in excess of 1000 million years.
• Focus on developing an integrated view of collisional processes
represented by Siluro-Devonian Acadian Orogeny. Integration
scenarios will require IT-based solutions, including design of
ontologies and new tools
• Research activities include
• Organization of a geologic and petrologic database for the mid-Atlantic testbed
• Development of an ontologic framework to facilitate Web-based analysis of data
• Registration of geologic and terrane maps, and data for igneous rocks
• Application of data-mining techniques for discovering similarities in geologic databases
• Design of workflows for Web-based navigation and analysis of maps and igneous rock databases
• Development of Web services for mineral and rock classification, including use of SVG-based graphics
GEONgrid Service Layers
• Portal (login, myGEON): GeonSearch, Registration Services, GeoWorkbench
• Data Mediation Services, Indexing Services, Visualization & Mapping Services, Workflow Services
• Core Grid Services: authentication, monitoring, scheduling, catalog, data transfer, replication, collection management, databases
• Physical Grid: RedHat Linux, ROCKS, Internet, I2, OptIPuter
GEON Workbench: Registration
• Uploadable:
• OWL ontologies
• OWL inter-ontology mappings
(“articulations”)
• Data sets (shape files)
• “Semantic Registration”
• Link data set D with ontology O1 (with an instance-based heuristic)
• Query D using ontology O2
• (e.g., rock classification: O1 = GSC, O2 = BGS)
• Ontology-Enabled Application
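The semantic-registration flow above (link data set D to ontology O1, then query D through ontology O2 via an articulation mapping) can be sketched schematically. This is an illustrative toy, not GEON's actual implementation: the class names, the articulation mapping, and the data records are all invented.

```python
# Toy rock-classification "ontologies" as child -> parent maps.
O1 = {"granite": "plutonic", "plutonic": "igneous", "basalt": "volcanic",
      "volcanic": "igneous", "igneous": "rock"}
O2 = {"intrusive": "igneous_rock", "extrusive": "igneous_rock",
      "igneous_rock": "rock"}

# Articulation (inter-ontology mapping): O2 term -> equivalent O1 term.
articulation = {"intrusive": "plutonic", "extrusive": "volcanic"}

# Data set D, semantically registered against O1.
D = [{"unit": "A", "class": "granite"}, {"unit": "B", "class": "basalt"}]

def is_a(term, ancestor, onto):
    """True if `term` equals `ancestor` or lies below it in `onto`."""
    while term is not None:
        if term == ancestor:
            return True
        term = onto.get(term)
    return False

def query_via_o2(records, o2_term):
    """Query O1-registered records using an O2 term."""
    o1_term = articulation.get(o2_term, o2_term)
    return [r for r in records if is_a(r["class"], o1_term, O1)]

print(query_via_o2(D, "intrusive"))  # unit A only: granite is_a plutonic
```

The point of the articulation is that the user never sees O1: a query phrased in O2 terms ("intrusive") still retrieves the matching O1-registered data.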
A Multi-Hierarchical Rock Classification
“Ontology” (GSC)
Classification facets: Genesis, Fabric, Composition, Texture
Kai Lin, SDSC
Boyan Brodaric, GSC
Geology Workbench: Uploading Ontologies
• Choose an OWL file to upload
• Click to check its detail
• Name space: can be used to import this ontology into others
Geology Workbench: Data Registration
• Enter the data set name
• Select a shapefile
• Choose an ontology class
• Click on Submission
Geology Workbench: Data Registration
Step 2: Map data to the selected ontology
• One of the attributes contains information about geologic age
• Shapefile attributes: AREA, PERIMETER, AZ_1000, AZ_1000_ID, GEO, PERIOD, ABBREV, DESCR, D_SYMBOL, P_SYMBOL
Geology Workbench: Data Registration
Step 3: Resolve mismatches
• Two terms are not matched to any ontology term
• Manually map “algonkian” into the ontology
Geology Workbench: Ontology-enabled Map
Integrator
• Choose the classes of interest
• E.g., all areas with the age Paleozoic
Geology Workbench: Change Ontology
• Submit a mapping
• Ontology mapping between the British and Canadian rock classifications
GEON Ontology Development
Workshops
• Workshop format
• Led by GEON PI’s
• Involves small group of domain experts from community
• Participation by a few IT experts in data modeling and knowledge
representation
• Igneous Petrology, led by Prof. Krishna Sinha, VaTech, 2003
• Seismology, led by Prof. Randy Keller, UT El Paso, Feb 24-25, 2004
• Aqueous Geochemistry, led by Dr. William Glassley, Livermore Labs,
March 2-3, 2004
• Structural Geology, led by Prof. John Oldow, Univ. of Idaho, 2004
• Metamorphic Petrology, led by Prof. Maria Crawford, Bryn Mawr, in planning
• Chronos and CUAHSI are planning ontology efforts
• Also, on-going ontology work in SCEC
• Discussion with Steve Bratt, COO, W3C
Community-Based Ontology
Development
• Draft of an aqueous geochemistry ontology developed by scientists
Bill Glassley (LLNL),
Bertram Ludaescher, Kai Lin (SDSC),
et al.
Levels of Knowledge Representation
• Controlled vocabularies
• Database schemas (relational, XML, …)
• Conceptual schemas (ER, UML, …)
• Thesauri (synonyms, broader term/narrower term)
• Taxonomies
• Informal/semi-formal representations
• “Concept spaces”, “concept maps”
• Labeled graphs / semantic networks (RDF)
• Formal ontologies, e.g., in [Description] Logic (OWL)
• “formalization of a specification” constrains the possible interpretations of terms
Use of Knowledge Structures
• Conceptual models of a domain or application,
(communication means, system design, …)
• Classification of …
• concepts (taxonomy) and
• data/object instances through classes
• Analysis of ontologies e.g.
• Graph queries (reachability, path queries, …)
• Reasoning (concept subsumption, consistency checking, …)
• Targets for semantic data registration
• Conceptual indexes and views for searching, browsing, querying, and integration of registered data
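One way a conceptual index like the one named above might work is to index each registered data instance under every ancestor concept in the taxonomy, so that a query for a broad term also finds data registered to narrower terms. The taxonomy and sample registrations below are invented for illustration.

```python
from collections import defaultdict

# Toy taxonomy: child concept -> parent concept.
parent = {"basalt": "volcanic", "volcanic": "igneous", "granite": "plutonic",
          "plutonic": "igneous", "igneous": "rock"}

def ancestors(term):
    """The term itself plus every broader term above it (reachability)."""
    out = []
    while term is not None:
        out.append(term)
        term = parent.get(term)
    return out

# Registered data: instance id -> taxonomy class it was registered under.
registrations = {"sample-17": "basalt", "sample-42": "granite"}

# Conceptual index: concept -> instances registered at or below it.
index = defaultdict(set)
for inst, cls in registrations.items():
    for c in ancestors(cls):
        index[c].add(inst)

print(sorted(index["igneous"]))   # ['sample-17', 'sample-42']
print(sorted(index["volcanic"]))  # ['sample-17']
```

The `ancestors` walk is the simplest form of the graph queries listed above (reachability); subsumption reasoning in a Description Logic generalizes this to inferred, not just asserted, class hierarchies.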
Example of a Large Data Problem
Ramon Arrowsmith, Chris Crosby
Arizona State University
• E.g., manipulation, analysis, and use of LIDAR (LIght Detection And Ranging) data…
LIght Detection And Ranging
• Airborne scanning laser rangefinder
• Differential GPS
• Inertial Navigation System
• 30,000 points per second at ~15 cm accuracy
• $400–$1000/mi², 10⁶ points/mi², or 0.04–0.1 cents/point
• Extensive filtering to remove tree canopy (“virtual deforestation”)
Figure from R. Haugerud, U.S.G.S - http://duff.geology.washington.edu/data/raster/lidar/About_LIDAR.html
Northern San Andreas LIDAR: fault geomorphology
Full Feature DEM
Bare Earth DEM
Processing LiDAR data: the problems
• Huge datasets (e.g., the Fort Ross, CA 7.5-minute quad):
• 1 GB of point return (.txt) data
• 150 MB of point return (.txt) data
• 5.5 MB after filtering for ground returns
• How do we grid these data?
• ArcGIS can’t handle it
• Expensive commercial software is not an option for most data consumers
GRASS as a processing tool for LiDAR
• GRASS: Open source GIS
• Interpolation commands designed for large data sets
• Splines use local point density to segment data into rectangular areas for interpolation
• Can control spline tension and smoothness
• A modular configuration could easily be implemented within the GEON workflow
• E.g., the user uploads point data to a remote site, where a GRASS interpolation module runs on a supercomputer and returns a raster file to the user
• Host the large LIDAR data sets on the GEON data node at SDSC, with access to large cluster computers
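The gridding step itself can be sketched in a few lines. This toy uses simple per-cell averaging of point returns (similar in spirit to GRASS's `r.in.xyz` binning; the spline interpolation of `v.surf.rst` mentioned above is far more sophisticated), and the points and grid extents are invented.

```python
def grid_points(points, x0, y0, cell, ncols, nrows, nodata=-9999.0):
    """Average the z values of the (x, y, z) points falling in each cell."""
    sums = [[0.0] * ncols for _ in range(nrows)]
    counts = [[0] * ncols for _ in range(nrows)]
    for x, y, z in points:
        col = int((x - x0) / cell)
        row = int((y - y0) / cell)
        if 0 <= col < ncols and 0 <= row < nrows:
            sums[row][col] += z
            counts[row][col] += 1
    # Cells with no returns get the nodata value.
    return [[sums[r][c] / counts[r][c] if counts[r][c] else nodata
             for c in range(ncols)] for r in range(nrows)]

# Three invented LiDAR returns on a 2x1 grid of 1-unit cells.
pts = [(0.5, 0.5, 10.0), (0.7, 0.4, 12.0), (1.5, 0.5, 20.0)]
dem = grid_points(pts, x0=0.0, y0=0.0, cell=1.0, ncols=2, nrows=1)
print(dem)  # [[11.0, 20.0]]
```

Even this naive pass over the points is linear in the data size, which is why shipping the points to a well-provisioned remote node, as proposed above, is attractive for gigabyte-scale returns.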
Accessing data from more than one information source: Federated Metadata Query
• Metadata queries use GSIDs, à la LSIDs (Life Sciences Identifiers): gsid:dlese:…, gsid:gn:…, gsid:iris:…
• Metadata querying middleware provides a search API and a result format (XML, URIs)
• Query & result wrappers (returning URIs), one per source:
• DLESE: THREDDS, XML Schema
• Geography Network: XML Schema, ArcCatalog, ArcXML
• IRIS: SRB, MCAT, CORBA, Web services, SRB API
• Grid Metadata Catalog: Grid services
16th Annual IRIS Workshop, June 9-13, 2004, Tucson, AZ
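The wrapper/middleware pattern above can be sketched as a fan-out search that merges per-source results. Everything here is a placeholder: the wrapper functions, the keyword, and the GSID values are invented, not real GEON services.

```python
def dlese_wrapper(keyword):
    # Would translate the query to the DLESE/THREDDS interface.
    return ["gsid:dlese:record-001"] if keyword == "basalt" else []

def iris_wrapper(keyword):
    # Would translate the query to the IRIS SRB/MCAT interface.
    return ["gsid:iris:event-77"] if keyword == "basalt" else []

WRAPPERS = [dlese_wrapper, iris_wrapper]

def federated_search(keyword):
    """Fan the query out to every registered wrapper; merge the GSIDs."""
    results = []
    for wrapper in WRAPPERS:
        results.extend(wrapper(keyword))
    return results

print(federated_search("basalt"))
# ['gsid:dlese:record-001', 'gsid:iris:event-77']
```

Because every wrapper returns the same thing, a list of GSIDs, the middleware can stay ignorant of each source's native schema and protocol.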
Federated GSID-based Data Access
• GSID-based requests: gsid:srb:…, gsid:odbc:…
• Data access middleware maps URIs to local access protocols
• Access protocols: SRB, ArcXML, GML, OpenDAP, http, ftp, ODBC, GridFTP, scp, JDBC
• Data is described by item-level and collection-level metadata
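The URI-to-protocol dispatch in the data access middleware can be illustrated as follows. The handler stubs are stand-ins; real handlers would speak SRB, ODBC, GridFTP, and so on.

```python
def srb_fetch(ident):
    # Stand-in for an SRB protocol handler.
    return f"<bytes of SRB object {ident}>"

def odbc_fetch(ident):
    # Stand-in for an ODBC protocol handler.
    return f"<rows for ODBC query {ident}>"

# Source name (second GSID component) -> local access protocol handler.
HANDLERS = {"srb": srb_fetch, "odbc": odbc_fetch}

def resolve(gsid):
    """Dispatch gsid:<source>:<id> to the matching protocol handler."""
    scheme, source, ident = gsid.split(":", 2)
    if scheme != "gsid":
        raise ValueError("not a GSID: " + gsid)
    return HANDLERS[source](ident)

print(resolve("gsid:srb:collectionA/file1"))
# <bytes of SRB object collectionA/file1>
```

The identifier thus stays stable even if a collection moves behind a different protocol; only the handler table changes.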
iGEON – International Cooperation:
Experiences to date
• Canada
• Geological Survey of Canada (Ottawa, Vancouver): Dr. Boyan Brodaric is one of the original team members of GEON
• Contributing important data sets by setting up a WMS (Web Mapping Services) server at WestGrid in Vancouver, BC
• 1 Gbps link from Vancouver to the GEON portal node at SDSC
• China
• The Computational Geodynamics Lab will host a GEON PoP node for iGEON in China
• Australia
• Interactions between GEON and EON (Earth and Ocean Network)
• Work with Dietmar Mueller to help run mantle convection codes on Linux clusters and provide them as a Web service in GEON
• Russia, Kyrgyzstan
• Held discussions with scientists from the Russian Academy on data integration and use of Grid computing for geodynamics codes
International Cooperation:
Planned
• Australia
• Collaboration planned with ACCESS (www.access.edu.au), Australian
computational earth systems simulator. Install a GEON node.
• Mexico
• Meeting planned between CICESE earth scientists and GEON regarding connectivity into Mexico
• Japan
• Sending an invitation to the Earth Simulator visualization group to attend the GEON visualization workshop
• UK
• Visit to the UK e-Science Center, June 28/29, 2004
• Targeted
• iGEON in Asia-Pacific could collaborate with the PRAGMA effort (Peter Arzberger, PI)
• GEON will participate in the next PRAGMA meeting as one of the featured applications
Opportunities
• Define common standards, e.g.
• Global Geosciences Identifiers (URI…)
• Ontologies (Semantic Web standards)
• Web services definitions, and other standards
• Work towards linking GEON with other related efforts
• Travel funds to attend each other’s science and IT workshops and individual meetings
• Sabbatical and training visits
• Share computing capabilities for GeoScience
applications
• Technologies for 3D and 4D visualizations, on-demand
computing, …
FYI
Cyberinfrastructure Summer Institute
for the Geosciences
August 16-20, 2004, San Diego
See www.geongrid.org/summerinstitute for more information
National Laboratory for Advanced Data
Research (NLADR)
An SDSC/NCSA Data Collaboration
Co-Directors:
Chaitan Baru, Data and Knowledge System (DAKS)
SDSC
Michael Welge, Automated Learning Group (ALG)
NCSA
National Laboratory for Advanced Data Research
NLADR Vision
• Collaborative R&D activity between NCSA
(Illinois) and SDSC in advanced data
technologies
• …guided by real applications from science communities
• …to develop broad data architecture framework
• …within which to develop, deploy, and test data-related
technologies
• …in the context of a national-scale physical infrastructure
(Internet-D)
NLADR Focus
• Solving the data needs of real applications
• Initially focused on some Geoscience applications (GEON,
LEAD)
• Also, looking into environmental science applications
(LTER, NEON, CLEANER)
• NLADR Fellows program—enable postdocs, faculty, staff
from domain sciences to partner with NLADR staff
Core Activities
• Internet-D: fielding a distributed data testbed
• Core technologies and reference
implementations of “data
cyberinfrastructure”
• Standards activities
• Evaluation: usability and performance
Internet-D
• Distributed data testbed
• Initially, within a networked environment between SDSC and NCSA
• Open to the community
• …for testing new data management and data mining approaches,
protocols, middleware, and technologies.
• A minimum configuration will include
• Distributed infrastructure, e.g., cluster systems at each end point with maximum memory and adequate disk capability, and high-speed network connectivity across the end points
• A high-end configuration will include
• A prototype environment representing very high-end, “extreme” capability
• The highest possible end-to-end bandwidth, from disk to disk
• Very large main memory and very large disk arrays
NLADR Core Technologies
• Core data services
• Caching, replication, prefetching, multiple transfer streams
• Integration of distributed data
• Integrate independently-created, distributed,
heterogeneous databases
• Mining complex data
• Data mining of distributed, complex scientific data,
including exploratory analysis and visualization
• Long-term Data Preservation
• Developing tools to preserve data for long periods of time
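One of the core data services listed above, caching, can be illustrated with a toy read-through cache: repeated accesses to the same remote object avoid the wide-area transfer. The fetch function is a stand-in for a real SRB/GridFTP transfer, and the object names are invented.

```python
from functools import lru_cache

# Counter to observe how many "wide-area transfers" actually happen.
TRANSFERS = {"count": 0}

@lru_cache(maxsize=128)
def fetch(object_id):
    """Simulate an expensive wide-area data transfer."""
    TRANSFERS["count"] += 1
    return f"contents of {object_id}"

fetch("dem/fort_ross")
fetch("dem/fort_ross")     # served from cache, no second transfer
print(TRANSFERS["count"])  # 1
```

Replication and prefetching extend the same idea: place copies near consumers before (replication) or just ahead of (prefetching) the access, rather than after the first miss.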
NLADR Evaluation Activities
• Data Grid benchmarking efforts
• Functionality and performance
• In multi-user, concurrent access environments
• Online, on-demand
• Evaluate parallel filesystems, parallel database
systems
• Develop “data experts” for various modalities of
data
• Investigate and characterize architectures and capabilities for long-term preservation
Joining NLADR
• No formal process yet
• Contact me (baru@sdsc.edu), if interested
• Should be willing to contribute one or more of…
• Interesting applications
• People’s time, to work on NLADR objectives
• Infrastructure (servers, storage, networking) towards
Internet-D
Thank You!
• Visit www.geongrid.org
• Stay tuned for: www.nladr.net
• My email: baru@sdsc.edu