PowerPoint - CUAHSI-HIS

iRODS: Integrated Rule-Oriented Data System
Ray Idaszak
Director, Collaborative Environments
RENCI
University of North Carolina at Chapel Hill
iRODS
• Integrated Rule-Oriented Data System
– What It Is
• Origins, How it works, What’s different about it
– Why It Is
• Context, Role it serves
– Where It’s Going (Today, Future)
• Funding, Key efforts
iRODS Talk Outline
• Integrated Rule-Oriented Data System
– What is the Integrated Rule-Oriented Data System?
• Origins, Technology, How it works
– Why It Is
• Context, Role it serves
– Where It’s Going (Today, Future)
• Funding, Key efforts
What’s Different about iRODS?
• iRODS lets you manage your data with your
rules and in your way (via Policies), against a
backdrop of federatable community data worldwide
iRODS Background
• Integrated Rule-Oriented Data System
– Open-source initiative that represents 12+ years of
development and over $10M of NSF grant funding
– Another $8M+ funding pending (via NSF DataNet)
• Collaboration between
– UNC Chapel Hill
• Data Intensive Cyber Environments group (DICE)
– RENCI
• State-funded Cyberinfrastructure Institute at UNC Chapel Hill
– San Diego Supercomputer Center
iRODS Data and Policy Virtualization
[Figure: A User Client views and manages data through the Data Grid. The user sees a single “Virtual Collection” (/cuahsi/catalog, /cuahsi/modeling, /cuahsi/terrain), while the underlying data is stored at RENCI, Utah State Univ, and SDSC.]
The iRODS Data Grid installs in a “layer” over storage systems, so you can view, manage, access,
add, and share part or all of your data in a unified Collection.
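The "layer" idea can be sketched in a few lines of Python. This is only an illustration, not iRODS code: the `Replica` class, the `CATALOG` table, the physical paths, and the assignment of each collection to a site are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str            # storage site that holds the bytes
    physical_path: str   # hypothetical location on that site's storage


# Hypothetical catalog: logical collection -> physical location.
# Site assignments and physical paths are illustrative assumptions.
CATALOG = {
    "/cuahsi/catalog": Replica("RENCI", "/vault/a1/catalog"),
    "/cuahsi/modeling": Replica("Utah State Univ", "/data/store7/modeling"),
    "/cuahsi/terrain": Replica("SDSC", "/archive/t3/terrain"),
}


def list_collection(prefix):
    """Return the logical paths under prefix, as the user would see them."""
    return sorted(path for path in CATALOG if path.startswith(prefix))


def resolve(logical_path):
    """Map a logical path to its physical replica (KeyError if unknown)."""
    return CATALOG[logical_path]


if __name__ == "__main__":
    for path in list_collection("/cuahsi"):
        replica = resolve(path)
        print(f"{path} -> {replica.site}:{replica.physical_path}")
```

The user only ever handles the logical paths; where the bytes live is a lookup detail hidden behind `resolve`.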
Using a Data Grid - Details
[Figure: iRODS Servers, each with a Rule Engine, at RENCI, SDSC, and USU, together with the iCAT Metadata Catalog.]
• User asks for data using logical properties (client-server)
• Data request goes to 1st Server
• Server looks up information in Catalog (applies rules)
• Catalog responds that the 3rd Server has the data
• 1st Server peer-to-peer asks 3rd Server to serve up data
• 3rd Server applies rules and serves data
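The request flow above can be traced with a small simulation. It is a sketch under stated assumptions: the `Catalog` and `Server` classes, the server names, and the object name `/cuahsi/terrain/dem.tif` are hypothetical, and the "rules" are plain Python callables rather than anything from the iRODS rule language.

```python
class Catalog:
    """Stand-in for the metadata catalog: logical name -> holding server."""

    def __init__(self):
        self._locations = {}

    def register(self, logical_name, server_name):
        self._locations[logical_name] = server_name

    def lookup(self, logical_name):
        return self._locations[logical_name]


class Server:
    """Hypothetical grid server with its own rules and local storage."""

    def __init__(self, name, catalog, trace):
        self.name = name
        self.catalog = catalog
        self.trace = trace   # shared list recording each step
        self.peers = {}      # server name -> Server
        self.storage = {}    # logical name -> bytes
        self.rules = []      # callables taking (server, logical_name)

    def apply_rules(self, logical_name):
        for rule in self.rules:
            rule(self, logical_name)
        self.trace.append(f"{self.name}: applied rules")

    def handle_client_request(self, logical_name):
        """First server: apply rules, consult the catalog, ask the holder."""
        self.trace.append(f"{self.name}: received request for {logical_name}")
        self.apply_rules(logical_name)
        holder = self.catalog.lookup(logical_name)
        self.trace.append(f"catalog: {holder} has {logical_name}")
        if holder == self.name:
            return self.storage[logical_name]
        return self.peers[holder].serve(logical_name)

    def serve(self, logical_name):
        """Holding server: apply its own rules, then serve the data."""
        self.apply_rules(logical_name)
        self.trace.append(f"{self.name}: served {logical_name}")
        return self.storage[logical_name]


def build_grid():
    """Create three peer servers; server3 holds the example object."""
    trace = []
    catalog = Catalog()
    names = ("server1", "server2", "server3")
    servers = {n: Server(n, catalog, trace) for n in names}
    for server in servers.values():
        server.peers = {n: p for n, p in servers.items() if p is not server}
    servers["server3"].storage["/cuahsi/terrain/dem.tif"] = b"elevation-bytes"
    catalog.register("/cuahsi/terrain/dem.tif", "server3")
    return servers, trace


if __name__ == "__main__":
    servers, trace = build_grid()
    servers["server1"].handle_client_request("/cuahsi/terrain/dem.tif")
    print("\n".join(trace))
```

The client never needs to know that the third server holds the data; it asks by logical name and the first server does the routing.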
Using a Data Grid – NEAR FUTURE (DB Resource)
[Figure: iRODS Servers, each with a Rule Engine, at RENCI, SDSC, and USU, together with the iCAT Metadata Catalog and a database resource (MySQL, PostgreSQL, Oracle).]
• A user who is not running a SQL server locally makes a query
• Query goes to 1st Server
• Server looks up information in Catalog (applies rules)
• Catalog responds that 3rd Server has SQL db
• 1st Server sends 3rd Server SQL query
• 3rd Server applies rules and serves query result
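A rough sketch of the database-resource idea follows. Python's built-in `sqlite3` stands in for the MySQL/PostgreSQL/Oracle back ends named on the slide, and everything else is assumed for illustration: the `DbServer` and `FrontServer` classes, the `gauges` table with its made-up rows, and the "SELECT only" rule.

```python
import sqlite3


class DbServer:
    """Hypothetical server that owns a database resource."""

    def __init__(self):
        # In-memory SQLite database as a stand-in for a real DB resource.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE gauges (site TEXT, flow REAL)")
        self.db.executemany(
            "INSERT INTO gauges VALUES (?, ?)",
            [("site-a", 120.5), ("site-b", 310.0)],  # made-up sample rows
        )

    def check_rules(self, sql):
        """Example rule (an assumption): only read-only queries are served."""
        if not sql.lstrip().lower().startswith("select"):
            raise PermissionError("only SELECT queries are allowed")

    def run_query(self, sql, params=()):
        self.check_rules(sql)
        return self.db.execute(sql, params).fetchall()


class FrontServer:
    """First server: finds which server holds the database and forwards."""

    def __init__(self, catalog):
        self.catalog = catalog  # database name -> DbServer

    def query(self, db_name, sql, params=()):
        holder = self.catalog[db_name]
        return holder.run_query(sql, params)


if __name__ == "__main__":
    front = FrontServer({"hydro": DbServer()})
    print(front.query("hydro", "SELECT site FROM gauges WHERE flow > ?", (200,)))
```

The user needs no local database installation; the query travels to the server that owns the resource, and that server decides whether to run it.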
Example Clients & Client Interfaces
(i.e. iRODS is client agnostic)
• C library calls - Application level
• .NET - Windows client API
• Unix shell commands - Scripting languages
• Java I/O class library (JARGON) - Web services
• SAGA - Grid API
• Web browser (Java-python) - Web interface
• Windows browser - Windows interface
• WebDAV - iPhone interface
• Fedora digital library middleware - Digital library middleware
• DSpace digital library - Digital library services
• Parrot - Unification interface
• Kepler workflow - Grid workflow
• FUSE user-level file system - Unix file system
iDrop
- Drag and drop GUI
- User actions can be mapped to policies
iRODS Policies
• iRODS is described as a “Policy-based” data
management system
• Policy definition: A proposed or adopted course of action
– ergo iRODS associates a “course of action” with all data
• Pre- and Post- “Policy Enforcement Points” (PEP)
– Pre: Course of action for data coming into iRODS
– Post: Course of action for data going out of iRODS
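Following the slide's description of pre- and post-enforcement points, a minimal sketch might look like this. The `PolicyStore` class and the two example policies are hypothetical names invented for the illustration; they are not part of iRODS.

```python
class PolicyStore:
    """Toy store that runs policies on data coming in and going out."""

    def __init__(self):
        self.pre_policies = []    # applied to data coming into the store
        self.post_policies = []   # applied to data going out of the store
        self._objects = {}

    def put(self, name, data):
        for policy in self.pre_policies:
            data = policy(name, data)
        self._objects[name] = data

    def get(self, name):
        data = self._objects[name]
        for policy in self.post_policies:
            data = policy(name, data)
        return data


def reject_empty(name, data):
    """Example pre policy (assumption): refuse zero-length objects."""
    if not data:
        raise ValueError(f"{name}: empty objects are not accepted")
    return data


def make_audit_policy(audit_log):
    """Example post policy (assumption): record every read in an audit log."""
    def audit(name, data):
        audit_log.append(f"read {name} ({len(data)} bytes)")
        return data
    return audit


if __name__ == "__main__":
    log = []
    store = PolicyStore()
    store.pre_policies.append(reject_empty)
    store.post_policies.append(make_audit_policy(log))
    store.put("a.csv", b"1,2,3")
    store.get("a.csv")
    print(log)
```

Because policies are just entries in a list, a community can add or swap them without touching the code that stores and retrieves data.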
iRODS Policies
• Retention, disposition, distribution, arrangement
• Authenticity, provenance, description
• Integrity, replication, synchronization
• Deletion, trash cans, versioning
• Archiving, staging, caching
• Authentication, authorization, redaction
• Access, approval, IRB, audit trails, report generation
• Assessment criteria, validation
• Derived data product generation, format parsing
• Federation
iRODS Rule Engine, Workflows
• iRODS has its own built-in imperative interpreted
programming language called the Rule Engine
• The iRODS Rule Engine executes Microservices
• An iRODS “program” is called a Workflow
– A Microservice is one “step” of an iRODS Workflow
– iRODS Workflows are executed on the iRODS Server
– Arbitrary external web services can be one “step” of
an iRODS Workflow
• Encapsulated as a microservice
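The relationship between workflows and microservices can be sketched in Python (real microservices are written in C, and real workflows in the iRODS rule language). All names below are hypothetical, including the `msi_` prefixes, and the "web service" is a local stand-in function so the example runs without a network.

```python
def msi_extract_metadata(session):
    """Step: record simple metadata about the content."""
    lines = session["content"].splitlines()
    session["metadata"] = {"line_count": len(lines)}


def msi_filter_comments(session):
    """Step: drop lines that start with '#'."""
    lines = session["content"].splitlines()
    kept = [line for line in lines if not line.startswith("#")]
    session["content"] = "\n".join(kept)


def make_web_service_step(call_service):
    """Wrap an external service call so it can be used as one step."""
    def msi_call_web_service(session):
        session["service_reply"] = call_service(session["content"])
    return msi_call_web_service


def run_workflow(steps, session):
    """Toy 'rule engine': run each step in order over shared session state."""
    for step in steps:
        step(session)
    return session


def fake_service(text):
    """Stand-in for a remote web service (assumption: no network here)."""
    return {"status": "ok", "bytes": len(text)}


if __name__ == "__main__":
    workflow = [
        msi_extract_metadata,
        msi_filter_comments,
        make_web_service_step(fake_service),
    ]
    print(run_workflow(workflow, {"content": "# header\n1.0\n2.5"}))
```

Each step reads and writes the shared `session` dictionary, which loosely mirrors microservices passing values through session variables.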
iRODS Microservices
• Microservices are written in C and provide:
Well, really anything that can be done in C, and that’s in part what
makes iRODS so extensible, but typically:
– Standard operations; e.g. file or format conversion
– Queries on metadata catalog
– Interaction with web services
– Triggering external HPC workflows
– Remote and delayed execution control
• Microservices communicate through
– Arguments, session variables, user space variables, etc.
Differentiating Workflows
• iRODS data grid workflows
– Low-complexity, a small number of operations
compared to the number of bytes in the file
– Server-side workflows
– Data sub-setting, filtering, metadata extraction
• Grid workflows
– High-complexity, a large number of operations
compared to the number of bytes in the file
– Client-side workflows
– Computer simulations, pixel re-projection
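The distinction above comes down to a ratio of operations to bytes. A minimal sketch, assuming a cutoff of one operation per byte (the threshold and function names are illustrative choices, not values from iRODS):

```python
def complexity(operations, file_bytes):
    """Operations performed per byte of the file."""
    if file_bytes <= 0:
        raise ValueError("file_bytes must be positive")
    return operations / file_bytes


def placement(operations, file_bytes, threshold=1.0):
    """Suggest where to run a workflow; threshold is an assumed cutoff."""
    if complexity(operations, file_bytes) < threshold:
        return "server-side"   # low complexity: run next to the data
    return "client-side"       # high complexity: move data to the compute


if __name__ == "__main__":
    # Metadata extraction: few operations over a large file.
    print(placement(operations=1e6, file_bytes=10**9))
    # Simulation: many operations over a small input.
    print(placement(operations=1e12, file_bytes=10**6))
```

The intuition is that when the work per byte is small, moving the computation to the data is cheaper than moving the data to the computation.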
A few more iRODS notes…
• Authentication
– GSI (PKI), Kerberos, Shibboleth, Challenge-response
• Authorization
– Roles, user groups, resource groups, policy constraints, ACLs
• Transport
– TCP/IP (parallel I/O streams), Reliable Blast UDP
• Metadata catalog
– PostgreSQL, MySQL, Oracle
• Distributed rule engine
– Scheduler, messaging system, execution engine, rule base
iRODS Talk Outline
• Integrated Rule-Oriented Data System
– What is the Integrated Rule-Oriented Data System?
• Origins, Technology, How it works
– Why is there an Integrated Rule-Oriented Data System?
• Context, Role it serves
– Where It’s Going (Today, Future)
• Funding, Key efforts
Entire Data Life Cycle: The iRODS Vision
Each data life cycle stage increases the value and usability of the original collection
• Project Collection (Private, Local Policy): Jeff gets data from a sensor
• Data Grid (Shared, Distribution Policy): Jeff shares data with colleagues
• Data Processing Pipeline (Analyzed, Service Policy): Together w/ colleagues, analyzes data and produces results
• Digital Library (Published, Description Policy): Results peer-reviewed and published
• Reference Collection (Preserved, Representation Policy): Jeff et al. hit jackpot: collection now accepted as ref collection for decades
• Federation (Sustained, Re-purposing Policy): Hydrology Datagrid grows in value to ecology and biology and federated
iRODS Talk Outline
• Integrated Rule-Oriented Data System
– What is the Integrated Rule-Oriented Data System?
• Origins, Technology, How it works
– Why is there an Integrated Rule-Oriented Data System?
• Context, Role it serves
– Where Is iRODS going Today and in the Future?
• Funding, Key efforts
iRODS: Future
• Pending 2011 NSF DataNet
– DataNet Federation Consortium (DFC)
• Includes CUAHSI!! (and several others)
• RENCI: Creating an “Enterprise” version of iRODS
– http://iren-web.renci.org/irods-meeting/irods@renci2011UserMeeting-contribution.pdf
Summary
• iRODS fills an important niche
– Differentiation: It’s a Policy-driven distributed data management
system formally supporting the entire Data Life Cycle
• E.g. an iRODS DataGrid is a vehicle for fulfilling NSF’s Data
Management Plan requirement at the community scale
– Classification: Middleware
• iRODS is not intended to be all-encompassing, but rather to
work with other DataNets, Workflow Engines, systems like
CUAHSI HIS, etc. in canvasing a National Cyberinfrastructure
– i.e. falls primarily in the “Data Services/Storage” portion of NSF’s
Data Enabled Science description
• With iRODS, the community is still responsible for:
– Schema, data formats, defining policies, defining web interfaces,
building analysis and knowledge tools, etc.
iRODS Credits
Principal Investigators
Richard Marciano, Reagan Moore (PI), Arcot Rajasekar
Additional Contributors
William Sims Bainbridge, Leesa Brieger, Luis Carriço, Sheau-Yen Chen, Michael
Conway, Jason Coposky, Vijay Dantuluri, Antoine de Torcy, Wei Ding, Kevin Gamiel,
Lucas Gilbert, Nuno Guimarães, Chien-Yi Hou, Bernard J. (Jim) Jansen, Oleg
Kapeljushnik, Mounia Lalmas, Christopher A. Lee, Xia Lin, Gary Marchionini, Cathy
Marshall, Jason Reilly, Meredith Ringel Morris, Stefan Rüger, Wayne Schroeder,
Michael Stealey, Lisa Stilwell, Jaime Teevan, Paul Tooby, Michael Wan, Bing Zhu
iRODS Credits
Research Supported By
• NSF ITR 0427196, Constraint-Based Knowledge Systems for Grids,
Digital Libraries, and Persistent Archives (2004–2007)
• NARA supplement to NSF SCI 0438741, Cyberinfrastructure; From
Vision to Reality—Developing Scalable Data Management
Infrastructure in a Data Grid-Enabled Digital
• NARA supplement to NSF SCI 0438741, Cyberinfrastructure; From
Vision to Reality—Research Prototype Persistent Archive Extension
(2006–2007)
• NSF SDCI 0721400, SDCI Data Improvement: Data Grids for
Community Driven Applications (2007–2010)
• NSF/NARA OCI-0848296, NARA Transcontinental Persistent Archive
Prototype (2008–2012)
iRODS Credits
For More Information
http://www.irods.org
http://diceresearch.org/
http://dice.unc.edu/
http://www.renci.org/news/releases/renci-teams-with-dice
Thank You.
http://www.renci.org