Data Grids
Reagan W. Moore
San Diego Supercomputer Center
9500 Gilman Drive, La Jolla, CA 92093-0505
Phone: 858 534-5073 FAX: 858 534-5152
E-mail: moore@sdsc.edu
http://www.npaci.edu/DICE/
National Partnership for Advanced Computational Infrastructure
San Diego Supercomputer Center
Topics
• Data Grid Requirements
– Data management
– Automation
– Latency hiding
• Current technology
– Distributed collections / digital libraries / data grids
• State-of-the-art systems
– Virtual data grids / persistent archives
– Emerging Standards
Data Management Environments
• Code development
– Collaboration, check-out, versioning
• Run-time execution
– High performance access, locking, latency hiding,
automation, archival storage
• Publication
– Discovery, consistency, persistent archives
• Are the capabilities required by all three
environments compatible?
Data Requirements are Met by
Collection Technology
• Provide three levels of abstraction for data,
information, and knowledge management
(bits, tagged attributes, relationships)
• Automate access through use of information
discovery on logical collections that span storage
systems
• Manage latency by streaming, caching,
replication, aggregation, remote proxies, staging
• Provide a persistent environment by building a
consistent environment over evolving technology
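A minimal sketch of the first three bullets, assuming a hypothetical in-memory catalog (the class names, attribute names, paths, and latency figures are illustrative and are not SRB or MCAT interfaces). Objects are found by tagged attributes on a logical collection, and the lowest-latency replica is chosen regardless of which storage system holds it.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    storage_system: str   # illustrative label, e.g. "archive" or "unix-fs"
    physical_path: str    # illustrative path, not a real location
    latency_ms: float     # assumed access cost used to choose a replica


@dataclass
class LogicalObject:
    logical_name: str
    attributes: dict                      # tagged attributes (information level)
    replicas: list = field(default_factory=list)


class LogicalCollection:
    """Hypothetical catalog keyed by logical name, not by physical path."""

    def __init__(self):
        self._objects = {}

    def register(self, obj):
        self._objects[obj.logical_name] = obj

    def discover(self, **query):
        """Return logical names whose attributes match every query term."""
        return sorted(
            name for name, obj in self._objects.items()
            if all(obj.attributes.get(k) == v for k, v in query.items())
        )

    def best_replica(self, logical_name):
        """Pick the lowest-latency replica, wherever it is stored."""
        return min(self._objects[logical_name].replicas,
                   key=lambda r: r.latency_ms)


collection = LogicalCollection()
collection.register(LogicalObject(
    "sky/tile-001", {"survey": "2MASS", "band": "J"},
    [Replica("archive", "/archive/sky/t001", 4000.0),
     Replica("unix-fs", "/cache/sky/t001", 5.0)]))
collection.register(LogicalObject(
    "sky/tile-002", {"survey": "2MASS", "band": "K"},
    [Replica("archive", "/archive/sky/t002", 4000.0)]))

matches = collection.discover(survey="2MASS", band="J")
chosen = collection.best_replica(matches[0])
```

The caller never names a storage system: discovery works on attributes, and replica choice is a policy inside the collection.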
Current Technology
• Logical data collections
– Storage Resource Broker / Metadata Catalog
• Abstract data management by building a data
handling system that interoperates with storage
systems (file systems, archives, databases)
• Abstract information management by building
information catalog management that interoperates
with information repositories (databases)
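The data handling abstraction can be sketched as one interface with a driver per storage system. This is an illustration only: `StorageDriver`, `DataHandler`, and the two in-memory drivers are hypothetical stand-ins, not the SRB API.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Uniform operations that every storage system must provide."""

    @abstractmethod
    def put(self, path, data): ...

    @abstractmethod
    def get(self, path): ...


class InMemoryFileSystem(StorageDriver):
    """Stand-in for a file system: path -> bytes."""

    def __init__(self):
        self._files = {}

    def put(self, path, data):
        self._files[path] = bytes(data)

    def get(self, path):
        return self._files[path]


class InMemoryBlobTable(StorageDriver):
    """Stand-in for a database that stores objects as (key, blob) rows."""

    def __init__(self):
        self._rows = []

    def put(self, path, data):
        self._rows = [row for row in self._rows if row[0] != path]
        self._rows.append((path, bytes(data)))

    def get(self, path):
        for key, blob in self._rows:
            if key == path:
                return blob
        raise KeyError(path)


class DataHandler:
    """Routes each request to the driver registered for a named resource."""

    def __init__(self, drivers):
        self._drivers = drivers   # resource name -> StorageDriver

    def write(self, resource, path, data):
        self._drivers[resource].put(path, data)

    def read(self, resource, path):
        return self._drivers[resource].get(path)

    def copy(self, source, target, path):
        self.write(target, path, self.read(source, path))


handler = DataHandler({"fs": InMemoryFileSystem(), "db": InMemoryBlobTable()})
handler.write("fs", "results/run1.dat", b"payload")
handler.copy("fs", "db", "results/run1.dat")
```

Applications call `read`, `write`, and `copy` in the same way whether the resource is a file system, an archive, or a database.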
SDSC Storage Resource Broker
& Meta-data Catalog
[Figure: SRB / MCAT architecture]
• Application
– Access interfaces: C, C++, Linux I/O; Unix Shell; Java, NT Browsers; Prolog Predicate; Web
• SRB
– MCAT: Dublin Core; Resource, User; User Defined; Application Meta-data
– DataCutter; Third-party copy; Remote Proxies
• Storage systems
– Archives: HPSS, ADSM, UniTree, DMF; HRM
– File Systems: Unix, NT, Mac OSX
– Databases: DB2, Oracle, Postgres
Information Management Projects
• Digital Libraries
– NSF Digital Library Initiative, Phase II - UCSB, Stanford
– NLM Digital Embryo digital library - GMU
– NPACI Digital Sky - Caltech 2MASS sky survey
– California Digital Library - AMICO
– NSF National SMETE Digital Library - UCAR / DLESE
• Grid Environments
– NASA Information Power Grid - NASA Ames
– DOE Data Visualization Corridor - LLNL
– DOE Particle Physics Data Grid - Babar
– NSF Grid Physics Network - U Fl
• Persistent Archives
– NARA Persistent Archive
– NHPRC - Scalable archives
Data Grids
Data Grid - links multiple data collections
Separate name spaces
Separate administration domains
Heterogeneous database instances
Stage data from collection into the data grid
[Figure: Database A, Data grid, Database B]
The data grid is itself a collection that provides mechanisms
to hide latency and provide a global namespace
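A minimal sketch of a data grid as a collection over collections, assuming hypothetical `Collection` and `DataGrid` classes (not an SRB interface). Two collections keep their own name spaces, the grid adds a global namespace, and staged copies hide the latency of repeated access.

```python
class Collection:
    """An independently administered collection with its own name space."""

    def __init__(self, name, objects):
        self.name = name
        self._objects = dict(objects)
        self.fetch_count = 0          # counts remote (slow) accesses

    def fetch(self, local_name):
        self.fetch_count += 1
        return self._objects[local_name]


class DataGrid:
    """Global namespace over several collections, with staged copies."""

    def __init__(self):
        self._collections = {}
        self._namespace = {}   # global name -> (collection name, local name)
        self._staged = {}      # global name -> data staged into the grid

    def link(self, collection):
        self._collections[collection.name] = collection

    def register(self, global_name, collection_name, local_name):
        self._namespace[global_name] = (collection_name, local_name)

    def stage(self, global_name):
        """Copy an object from its home collection into the grid, once."""
        if global_name not in self._staged:
            collection_name, local_name = self._namespace[global_name]
            source = self._collections[collection_name]
            self._staged[global_name] = source.fetch(local_name)
        return self._staged[global_name]

    def read(self, global_name):
        return self.stage(global_name)


# The same local name exists in both collections; global names stay distinct.
database_a = Collection("A", {"run1.dat": b"alpha"})
database_b = Collection("B", {"run1.dat": b"beta"})

grid = DataGrid()
grid.link(database_a)
grid.link(database_b)
grid.register("/grid/a/run1.dat", "A", "run1.dat")
grid.register("/grid/b/run1.dat", "B", "run1.dat")

first = grid.read("/grid/a/run1.dat")
second = grid.read("/grid/a/run1.dat")   # served from the staged copy
other = grid.read("/grid/b/run1.dat")
```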
State-of-the-art Data Management
• Provide knowledge management abstraction
– Abstract the processes that create the derived
data product (Virtual data grid)
– Abstract the collection formation used to
organize the derived data products (Persistent
Archive)
• A persistent archive is a virtual data grid in
which the derived data products are data
collections
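One way to picture a virtual data grid is a catalog that records the process behind each derived product, so a product can be re-created on demand instead of stored. The sketch below assumes a hypothetical `VirtualDataCatalog` class and toy data; it is not a description of any existing system.

```python
class VirtualDataCatalog:
    """Stores raw data plus the recipe (process) for each derived product."""

    def __init__(self):
        self._recipes = {}        # product name -> (function, input names)
        self._materialized = {}   # product name -> data currently stored

    def add_raw(self, name, data):
        self._materialized[name] = data

    def define(self, name, func, inputs):
        """Record how a derived product is created from its inputs."""
        self._recipes[name] = (func, tuple(inputs))

    def get(self, name):
        """Return a product, deriving (and storing) it if it is not stored."""
        if name in self._materialized:
            return self._materialized[name]
        func, inputs = self._recipes[name]
        result = func(*(self.get(input_name) for input_name in inputs))
        self._materialized[name] = result
        return result

    def evict(self, name):
        """Drop a derived product; its recipe allows it to be re-created."""
        if name in self._recipes:
            self._materialized.pop(name, None)

    def is_materialized(self, name):
        return name in self._materialized


catalog = VirtualDataCatalog()
catalog.add_raw("samples", [3, 1, 2])
catalog.define("sorted_samples", sorted, ["samples"])
catalog.define("largest", lambda values: values[-1], ["sorted_samples"])

largest = catalog.get("largest")      # derives sorted_samples, then largest
catalog.evict("largest")
largest_again = catalog.get("largest")
```

In the persistent archive case described above, the recorded process would be the collection formation step and the derived product would itself be a collection.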
Standards
• Object Management Group - OMG
– Model Driven Architecture for platform
independent models of services
• Platform dependent models transform an abstract
representation into CORBA, Java, C, ….
• Builds upon Unified Modeling Language (UML)
• Manages life cycle for software services
– Common Warehouse Metamodel
• Provides abstract representation for collections that
can be used to migrate collections to alternate
databases
• Builds upon a subset of UML
Standards
• World Wide Web Consortium - W3C
– Semantic Web for natural language queries to
collections.
– Builds upon the DARPA Agent Markup
Language for services, and logic manipulation
languages (DAML-L, OIL)
– Uses Resource Description Framework and
XML
Standards
• ISO
– Topic maps manage relationships between
concept spaces and collection attributes
– Provide mechanisms to manage semantic
interoperability
• Global Grid Forum
– Provides authentication systems, data handling
systems, execution environments
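The topic map idea, relating a concept space to collection attributes, can be sketched with a plain dictionary. The concept names, collection names, attribute names, and records below are all hypothetical, and no topic map syntax (such as XTM) is used.

```python
# Hypothetical concept map: one concept, a different attribute name
# in each collection.
CONCEPT_TO_ATTRIBUTE = {
    "creator": {"library_a": "dc_creator", "library_b": "author_name"},
    "date": {"library_a": "dc_date", "library_b": "pub_year"},
}


def translate_query(concept_query, collection):
    """Rewrite a concept-level query into one collection's attribute names."""
    return {
        CONCEPT_TO_ATTRIBUTE[concept][collection]: value
        for concept, value in concept_query.items()
    }


def search(records, attribute_query):
    """Return records whose attributes match every query term."""
    return [
        record for record in records
        if all(record.get(k) == v for k, v in attribute_query.items())
    ]


# Toy records; the values are placeholders, not real catalog entries.
library_a = [{"dc_creator": "A. Author", "dc_date": "year-1"}]
library_b = [
    {"author_name": "A. Author", "pub_year": "year-1"},
    {"author_name": "B. Writer", "pub_year": "year-2"},
]

query = {"creator": "A. Author"}
hits_a = search(library_a, translate_query(query, "library_a"))
hits_b = search(library_b, translate_query(query, "library_b"))
```

One concept-level query reaches both collections even though they share no attribute names.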
Knowledge Based Data Grids
[Figure: three levels (Knowledge, Information, Data) crossed with three columns (Ingest Services, Management, Access Services)]
• Knowledge: Relationships Between Concepts; Knowledge Repository for Rules; Knowledge or Topic-Based Query / Browse
• Information: Attributes, Semantics; Information Repository; Attribute-based Query
• Data: Fields, Containers, Folders; Storage (Replicas, Persistent IDs); Grids; Feature-based Query
• Other labels in the figure: Rules - KQL; XTM DTD; SDLIP; XML DTD; (Model-based Access); MCAT/HDF; (Data Handling System - SRB)
Data Intensive Computing Environment Group
Staff and Students - GSRA
• Reagan Moore
• Chaitan Baru
• Sheau Yen Chen
• Charles Cowart
• Amarnath Gupta
• George Kremenek
• Bertram Ludäscher
• Richard Marciano
• Arcot Rajasekar
• Abe Singer
• Michael Wan
• Ilya Zaslavsky
• Bing Zhu
• Martin Kuhl
• Liying Sui
• Yang Yu
• Valter Crescenzi
Students - Undergrad Interns
• Peter Shin
• Roman Olshanowsky
• Shabbar Tambawala
• Pratik Mukhopadhyay
• +/- NN
Further Information
http://www.npaci.edu/DICE