Overview of the SDSC Storage Resource Broker

advertisement
Overview of the SDSC
Storage Resource Broker
Wayne Schroeder
(and other SRB team members)
May, 2004
San Diego Supercomputer Center,
University of California San Diego
SDSC/UCSD/NPACI
What is SRB? (1 of 3)
• The SDSC Storage Resource Broker (SRB) is clientserver middleware that provides a uniform interface
for connecting to heterogeneous data resources over a
network and accessing unique or replicated data
objects.
• SRB, in conjunction with the Metadata Catalog
(MCAT), provides a way to access data sets and
resources based on their logical names or attributes
rather than their names and physical locations.
What is SRB? (2 of 3)
• The SDSC SRB system is a comprehensive
distributed data management solution, with features
to support the management, collaborative (and
controlled) sharing, publication, and preservation of
distributed data collections.
• The SRB also serves as middleware via a rich set of
APIs available to higher-level applications and by
providing a management layer on top of a wide
variety of storage systems.
What is SRB? (3 of 3)
The SRB is an integrated solution which includes:
–
–
–
–
–
–
–
–
–
–
a logical namespace,
interfaces to a wide variety of storage systems,
high performance data movement (including parallel I/O),
fault-tolerance and fail-over,
WAN-aware performance enhancements (bulk operations),
storage-system-aware performance enhancements ('containers' to aggregate files),
metadata ingestion and queries (a MetaData Catalog (MCAT)),
user accounts, groups, access control, audit trails, GUI administration tool
data management features, replication
user tools (including a Windows GUI tool (inQ), a set of SRB Unix commands, and Web
(mySRB)), and APIs (including C, C++, Java, and Python).
SRB Scales Well (many millions of files, terabytes)
Supports Multiple Administrative Domains / MCATs (srbZones)
And includes SDSC Matrix: SRB-based data grid workflow management system to
create, access and manage workflow process pipelines.
SRB Projects
• Digital Libraries
– UCB, Umich, UCSB, Stanford,CDL
– NSF NSDL - UCAR / DLESE
• NASA Information Power Grid
• Astronomy
– National Virtual Observatory
– 2MASS Project (2 Micron All Sky Survey)
• Particle Physics
– Particle Physics Data Grid (DOE)
– GriPhyN
– SLAC Synchrotron Data Repository
• Medicine
– Digital Embryo (NLM)
• Earth Systems Sciences
– ESIPS
– LTER
• Persistent Archives
– NARA
– LOC
• Neuro Science & Molecular Science
– TeleScience/NCMIR, BIRN
– SLAC, AfCS, …
Over 90 Tera Bytes in 16 million files
SRB Scalability
Storage Resource Broker (SRB)
Data brokered by SDSC instances of SRB**
As of 7/24/2003
Data_size (in
Project Instance
Count (files)
GB)
NPACI
6,050.00
2,317,368
Digsky
46,100.00
5,719,025
DigEmbryo
720.00
45,365
HyperLter
215.00
5,097
Hayden
7,078.00
59,399
Portal
968.00
27,250
SLAC
1,790.00
254,974
NARA/Collection
52.80
79,195
NSDL/SIO Exp
232.00
15,809
TRA
90.60
2,385
LDAS/SALK
498.00
9,858
BIRN
121.00
237,283
AfCS
95.30
18,762
UCSDLib
1,084.00
138,415
NSDL/CI
278.00
993,886
SCEC
12.60
18,660
TeraGrid
623.00
36,508
TOTAL
66,008.30
9,979,239
Users
367
68
23
27
142
316
43
51
23
25
60
138
20
29
113
38
1,978
3,461
As of 9/12/2003
Data_size (in
Count (files)
GB)
8,350.00
2,903,386
46,100.00
5,719,025
720.00
45,365
215.00
5,097
7,130.00
59,781
1,133.00
31,717
1,790.00
254,974
53.10
79,112
315.00
27,170
92.00
2,378
737.00
12,898
273.00
675,531
99.00
19,714
1,085.00
138,421
379.00
2,596,090
7,561.00
1,249,144
1,664.00
47,644
77,696.10
13,867,447
Users
372
68
23
28
158
339
43
55
23
26
66
145
20
29
114
39
1,942
As of 10/01/2003
As of 11/14/2003
Data_size (in
Data_size (in
Count
Count (files)
Users
GB)
GB)
(files)
8,570.00
2,946,526
375
8,822.00 2,995,432
46,100.00
5,719,025
68
42,786.00 6,076,982
720.00
45,365
23
720.00
45,365
215.00
5,097
28
215.00
5,097
7,830.00
59,983
158
7,835.00
60,001
1,141.00
32,396
342
1,244.00
34,094
1,790.00
254,974
43
2,108.00
294,149
53.90
79,118
55
67.00
82,031
418.00
39,253
23
603.00
87,191
92.00
2,387
26
92.00
2,387
767.00
12,906
66
824.00
13,016
382.00
2,048,132
156
389.00 1,084,749
102.00
20,260
20
107.00
21,295
1,085.00
138,421
29
1,085.00
138,421
379.00
2,596,090
114
465.00 2,948,903
9,680.00
1,561,396
40
12,274.00 1,721,241
1,745.00
49,106
2,073
10,603.00
433,938
3,490
81,069.90
66 TB
9.97 million 3 thousand
77 TB
13.9 million 3 thousand
** Does not cover data brokered by SRB spaces administered outside SDSC.
Does not cover databases; covers only files stored in file systems and archival storage systems
Does not
cover shadow-
81 TB
15,610,435
3,639
15.6 million3 thousand
90,239.00 16,044,292
90 TB
Users
377
69
23
28
168
352
43
56
26
26
66
167
21
29
114
43
2,229
3,837
16 million 3 thousand
Case Study: SRB in BIRN
BIRN Toolkit
Data Management
Queries/Results
Mediator
GridPort
Database
Scheduler
Globus
Database
SRB
File
System
Distributed Resources
MCAT
HPSS
Data Grid
Grid Management
Viewing/Visualization
Data Access
NMI
Applications
Data Model
Computational Grid
Collaboration
SRB History
• A DataGrid since SRB 1.0, Production 1997
• SDSC Started by General Atomics, 1985
– GA/UCSD Staff
– On UCSD Campus
– SRB by GA Employees
• Today, SDSC no longer GA, all UCSD
– All staff UCSD employees
• GA Commercial SRB Version (Nirvana)
–
–
–
–
Based on SRB 1.1.8 (2001)
Nirvana and SDSC versions diverged
SDSC SRB free to academic organizations
License from Nirvana for commercial
SRB – A Data Grid Solution
• Storage Resource Broker
– Collaborative client-server system that federates
distributed heterogeneous resources using uniform
interfaces and metadata
– Provides a simple tool to integrate data and metadata
handling – attribute-based access
– Blends browsing and searching
– Developed at SDSC
-
Operational for 5+ years;
Under continual development since 1997;
Customer-driven;
Brokering over 90 TeraBytes in over 16 million files at SDSC
Using a Data Grid - Details
DB
SRB
MCAT
SRB
SRB
SRB
SRB
SRB
•Data Grid has arbitrary number of servers
•Complexity is hidden from users
SDSC Storage Resource Broker
& Meta-data Catalog
Application
Resource,
User
User
Defined
C, C++,
Linux I/O
Unix
Shell
Java, NT
Browsers
Prolog
Web
Predicate
SRB
MCAT
Dublin
Core
Application
Meta-data
Archives
HPSS, ADSM, HRM
UniTree, DMF
File Systems Databases
Unix, NT,
Mac OSX
Third-party
copy
Remote
Proxies
DB2, Oracle,
Sybase
DataCutter
Federated SRB Operation
Peer-to-peer
Brokering
Read Application
in Boston
Logical Name
Or
Attribute Condition
Parallel Data
Access
1
6
SRB
server
SRB
server
3
4
SRB
agent
5
SRB
agent
Durham
2
San Diego
1.Logical-to-Physical mapping
2. Identification of Replicas
3.Access & Audit Control
5/6
R1
Server(s)
Spawning
MCAT
Data
Access
R2
R2
inQ Windows GUI
Virtual Hierarchical Collection Management
Attributes
• SRB metadata
–
–
–
–
–
Location, protocol
Unix semantics
Authorization, authentication
Latency management
Container aggregation
• Administrative
– Dublin core, provenance
– Annotations, comments
• Discipline specific attributes
– Collection
– User defined
Authentication Management




Grid Security Infrastructure (GSI)
Encrypted Password
GSS-API for Kerberos or DCE
Collection-owned Data




Collection ID installed at each storage system
Users authenticate themselves to the SRB
SRB authenticates to local server
Or GSI Delegation (Ananta Manandhar,
CCLRC)
Logical File Name
One of the major functions of SRB is the
mapping between a logical file name and its
physical file. The mapped info of a logical
filename includes:








Location of name in collection hierarchy
Physical file location: host name and path
Protocol: for fetching ‘local’ file
Unix semantics for file manipulation
Location in container
Audit trail
Access control list
Locking status
Replica Management
 Files can be replicated into any valid physical storage
resource registered in SRB.
 Each replica is managed by the same logical filename as
the original one and a unique replication number. Each
replica can have unique metadata.
 1-to-many Replication: A logical resource can contain
several physical storage resources.
 Multiple replicas can be made to the same storage resource
 Many Modes of Replication:
 Synchronous Replication; Sput to a logical resource
 Asynchronous Replication; Sput then later Sreplicate
 Out of Band Replication; Outside SRB, then register
Containers
• Physical Grouping of Objects
• Similar to tar but has significant differences
• Multiple Uses:
–
–
–
–
–
To take advantage of resource characteristics
To aid access patterns
Move data sets together
Tie together logically different files
Automatic Archiving/Caching
• Chaining of Containers
• Sharing of metadata
• Containers for Collections
Proxy Operation
• Proxy operation –
–
–
–
–
server performs operations on behalf of client
performs operations where the data are located
subset and filter operations – datacutter
Metadata extraction and ingestion checks
srbExecCommand() API and Spcommand
utility • request a specific server to execute a specific
command and stream the result to stdin
• used by the NVO(national virtual observatory)
cutout service
SRB – More Features
• Client Support
–
–
–
–
–
Pure Java Client
Web Services - WSDL, Matrix workflow system
Web Support - MySRB Extensions
Pure Java Client & Browser
inQ Version 3.1 and more Windows Support
• Administrative Support
– GUI-based Administration
– More Features - Resource, User, Method Management
– User-friendly Installation Procedures
Metadata Management




Metadata Insertion Through User Interfaces
Bulk Metadata Insertion
Template Based Metadata Extraction
Metadata Search
 system data
 user-defined metadata
 File Content Search: Key words are pre-extracted by a
template and saved as user-defined metadata.
Storage Resource Broker
• SRB wears many hats:
–
–
–
–
–
–
It is a distributed but unified file system
It is a database access interface
It is a digital library
It is a semantic web
It is a data grid system
It is an advanced archival system
Criticisms of SRB
• Not completely open source
– But semi-open and available to academics
• Not standards-based
– But internal protocols need not be
• Monolithic
– Integrated
– And well partitioned
Some SRB Weaknesses (my view)
• Difficult to explain and understand
– SRB does so much, people tend to learn subsets and
are often unaware of useful features
– Different groups are interested in different sets of
features
– An “elevator speech” is either vague or incomplete
• Not completely open source
• Collaborations difficult
– Need to expand
• Limited Staff
– Feature-focused projects (+/-): docs, error messages
Some SRB Strengths
• Integrated solution
– High performance
– Highly functional
– Relatively easy to enhance
•
•
•
•
•
•
Middle-ware and Complete-ware
Customer driven
Sound architecture
Mature, but also being actively developed
Growing user base
Highly coordinated centralized team
TeamSRB, San Diego
•
•
•
•
•
•
•
•
•
•
•
•
Reagan Moore (Program Director, DAKS)
Arcot Rajasekar (Director)
Michael Wan (Chief Architect)
Wayne Schroeder
George Kremenek
Bing Zhu
Sheau-Yen Chen
Charles Cowart
Arun Jagatheesan (GriPhyN)
Lucas Gilbert
Roman Olsachnowsky (BIRN)
Tim Warnock (BIRN)
Contacts
• For Additional Information:
Web: http://www.npaci.edu/dice/srb
Mail: srb@sdsc.edu
Mailing-list: srb-chat@sdsc.edu
Download