High Performance Storage System
Harry Hulen
281-488-2473
hulen@us.ibm.com
HSM: Hierarchical storage management
• Purposes of HSM:
– Extend disk space
– Back up disk files to tape
– Managed permanent archive
File System
• User sees virtually unlimited file system
– Data migrates “down” the hierarchy
– Migrated files may be asynchronously purged
from the higher level (e.g. disk) to free up space (see the sketch below)
• Multiple classes of service in a single
name space
– Disk to tape
– Tape only (SLAC approach)
– Complex, e.g. striped disk to mirrored tape
(Diagram: data moves from the file system down the hierarchy: disk → robotic tape → shelf tape.)
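To make the idea concrete, here is a toy Python model of the hierarchy sketched above: one name space, data migrating down from disk toward shelf tape, and disk copies purged asynchronously without the file ever leaving the user's view. The class, levels, and method names are illustrative only, not HPSS interfaces.

HIERARCHY = ["disk", "robotic tape", "shelf tape"]   # levels from the diagram above

class HsmFile:
    """One entry in the HSM name space; the name never disappears."""
    def __init__(self, name: str, size: int):
        self.name, self.size = name, size
        self.copies = {"disk"}                        # new data lands at the top level

    def migrate_down(self) -> None:
        """Copy the data one level down the hierarchy."""
        lowest = max(HIERARCHY.index(level) for level in self.copies)
        if lowest + 1 < len(HIERARCHY):
            self.copies.add(HIERARCHY[lowest + 1])

    def purge_from_disk(self) -> None:
        """Free disk space once a lower-level copy exists; the file stays visible."""
        if len(self.copies) > 1:
            self.copies.discard("disk")

f = HsmFile("/hpss/u/alice/run.dat", size=2**30)
f.migrate_down()          # disk -> robotic tape
f.purge_from_disk()       # disk copy released, name space unchanged
print(f.name, f.copies)   # {'robotic tape'}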
Big storage, like big computing, is
fundamentally an aggregation problem
A typical commercial SAN allocates a few high-function disk arrays among many non-shared file systems and databases on many computers.
Our large shared-data SANs must aggregate many disk arrays among a few very large file systems and databases shared by many computers.
(Diagrams: in the typical commercial SAN the administrator manages spare capacity as a reserve; in the shared-data SAN the SAN file system manages spare capacity.)
HPSS architecture
• Shared, secure global file system
• Aggregate disks, tapes, and
bandwidth
• SAN and/or LAN connected
• Metadata-mediated via database
based on IBM DB2
• Highly distributed with multiple
data movers and subsystems for
scalability
• API for maximum control and performance (e.g. “hints”; see the sketch after this list)
• Parallel FTP (PFTP)
• Multi-petabyte capability in a
single name space (e.g. SLAC,
LLNL, BNL, ECMWF, DOD)
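The “hints” bullet above deserves a concrete illustration. HPSS's real client API is C-based and its names differ; the Python sketch below is purely hypothetical, showing the kind of control a hinted create gives a client: the expected size and stripe width let the core server pick a class of service rather than a default.

from dataclasses import dataclass
from typing import Optional

@dataclass
class CosHints:
    """Hypothetical hint record a client could pass at file-creation time."""
    expected_size_bytes: int                # helps pick disk-to-tape vs. tape-only
    stripe_width: int = 1                   # e.g. 2-way or 4-way striping for large files
    class_of_service: Optional[str] = None  # or name a class of service outright

def open_for_write(path: str, hints: CosHints):
    """Stand-in for a hinted create/open call; not the actual HPSS API."""
    cos = hints.class_of_service or (
        "striped disk to mirrored tape" if hints.expected_size_bytes >= 2**30
        else "disk to tape")
    print(f"create {path}: class of service = {cos}, stripe = {hints.stripe_width}")
    return path  # placeholder for a real file handle

# A large output file gets a striped class of service.
open_for_write("/hpss/projects/run42/output.dat",
               CosHints(expected_size_bytes=8 * 2**30, stripe_width=4))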
(Architecture diagram, based on HPSS 6: core server with metadata disks and a backup core server; client computers; tape-disk movers; disk arrays; robotic tape libraries; connected by LAN and SAN.)
The HPSS Collaboration
• U.S. Department of Energy Laboratories are Co-Developers
– Lawrence Livermore National Laboratory
– Sandia National Laboratories
– Los Alamos National Laboratory
– Oak Ridge National Laboratory
– Lawrence Berkeley National Laboratory
• IBM Global Services in Houston, Texas
– Access to IBM technology (DB2, for example)
– Project management
– Quality assurance and testing (SEI CMM Level 3)
– Outreach: commercial sales and service
• Advantages of Collaborative Development
– Developers are users: focus on what is needed and what works
– Keeps focus on the high end: the largest data stores
– A limited “open source” model for collaboration members and users
• “Since 1993”
HPSS performance trivia
• Capacity
– Largest HPSS installation (BNL) has 2 petabytes in a single
address space with no indications of an upper bound
– Calculations show ability to handle 100s of millions of files in a name
space
• File Access Rate (recent data with DB2, not tuned)
– 50 create-writes per second with 6-processor Power4 and AIX
(ECMWF)
– 20 create-writes per sec with 4-processor XEON with Linux (Test
lab)
– Hope to achieve 100 create-writes/sec with optimization and newer hardware
• Data Bandwidth
– Data rate benchmark: 1 gigabyte per second to 16 movers with 16
disks each (4-year-old data; see the arithmetic sketch below)
– 2-way and 4-way striping of disk arrays and tapes for higher single-file transfers
– Concurrent transfers among many clients, disk arrays, and tape
libraries for very high aggregate transfer capability
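A quick back-of-the-envelope on the benchmark above, assuming the load was spread evenly across movers and disks (the slide gives only the aggregate figure), shows why this is an aggregation story rather than a fast-device story:

aggregate_mb_s = 1024          # roughly 1 gigabyte per second
movers = 16
disks_per_mover = 16

per_mover = aggregate_mb_s / movers          # 64 MB/s per mover
per_disk = per_mover / disks_per_mover       # 4 MB/s per disk
print(f"{per_mover:.0f} MB/s per mover, {per_disk:.0f} MB/s per disk")
# The aggregate rate comes from many modest concurrent streams.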
Disaster Recovery:
Difficulty grows with size
• For most cluster file systems, loss of disk corrupts entire file system
– Entire file system must be rebuilt or restored from backup
– Disk array availability is about 0.9998 to 0.9999
• HPSS keeps metadata separate from data
– Metadata kept in a DB2 database
– HPSS disk files and tape files use the same metadata
– Loss of an entire disk array causes only loss of data not yet migrated to
tape (or to another disk); HPSS continues to run
– Restoration of system = reloading metadata
• Recovery Performance
– Capable of recovery in minutes from loss of any or all disk data (hours
to days in other large systems)
– Capable of recovery in hours from loss of all metadata (hours to days in
other large systems)
HPSS Plans
• 2004
– New HPSS infrastructure based on DB2 and eliminating DCE
(transparent to users)
– HPSS for Linux and “HPSS Light”
– LAN-less data transfers (SAN capability)
– Include support for HTAR and HSI utility packages
– Stand-alone PFTP offering and push protocol
• 2005
– ASCI Parallel Local File Movers for Lustre archive
– Globus Grid GridFTP capability
– True VFS interface (initially Linux)
– Additional small-file performance improvements
– Exploit multilevel hierarchy (e.g. MAID)
– Better integration with application agents (e.g. Objectivity)
• 2006
– Object-based disk technology
– Exploit DB2 metadata engine for content management
HPSS for Linux will make HPSS
more widely available
(Chart: a pyramid of storage tiers from HPSS at the top through HPSS for Linux and other HSMs down to D2D and D2D2T backup, with site counts growing from 10s to 100s to 1000s and capacity markers at ~1.5 PB and ~0.5 PB.)
• HPSS serves 8 of the top 20 HPC sites
• HPSS for Linux will enable HPSS to extend down from XXL and XL to L later this year
• HPSS for Linux will be offered in lower-cost, pre-configured packages
ASCI Purple Parallel Local File Movers
(Diagram: an application on a capability or capacity platform writes Lustre files; an archive agent on the client drives migration through HPSS Parallel Local File Movers into HPSS.)
• HPSS Parallel Local File Movers open, read, and write Lustre files using Unix semantics
• Lustre is a shared global file system in development by DOE, HP, and others
• A site-provided agent controls migration based on file content and not on empirical data (see the sketch below)
• Benefits: simplicity (configuration, equipment expenditures, networking), performance potential, minimized disk cache
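A minimal sketch of such a site-provided agent follows, assuming the site marks completed files with a footer tag. The mount point, the completion marker, and the archive call are all hypothetical stand-ins; neither the real Lustre nor the real HPSS interface is shown.

import os

LUSTRE_ROOT = "/lustre/scratch/run42"        # assumed Lustre mount point

def should_archive(path: str) -> bool:
    """Content-based policy: archive only files that carry a completion footer."""
    try:
        with open(path, "rb") as f:
            f.seek(-8, os.SEEK_END)
            return f.read() == b"CKPT-OK\n"  # hypothetical 8-byte completion marker
    except OSError:
        return False

def archive_to_hpss(path: str) -> None:
    """Stand-in for handing the file to the HPSS parallel local file movers."""
    print(f"migrate {path} -> HPSS")

for dirpath, _dirs, files in os.walk(LUSTRE_ROOT):
    for name in files:
        full = os.path.join(dirpath, name)
        if should_archive(full):
            archive_to_hpss(full)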
HTAR: Use Of Containers
saves metadata overhead
(Diagram: on local disks, 10 records stored as 10 files need 10 metadata entries; packed into containers on global disks, 30 records stored as 3 files need only 3 metadata entries. A file with multiple records is a container. The data is mirrored; the format is not.)
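The container idea itself is easy to demonstrate with ordinary tools. The sketch below is not the HTAR utility, just an illustration of why containers help: ten small member files become one object in the archive name space, so the core server records one metadata entry instead of ten. The file names are hypothetical.

import tarfile
from pathlib import Path

def build_container(members, container: Path) -> None:
    """Bundle small files into one container file before it is archived."""
    with tarfile.open(container, "w") as tar:
        for m in members:
            if m.is_file():
                tar.add(m, arcname=m.name)   # the member index lives inside the container

# Ten small result files -> one archived object -> one metadata entry.
small_files = [Path(f"part{i:02d}.dat") for i in range(10)]
build_container(small_files, Path("part00-09.tar"))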
A multi-level hierarchy
• Example 3-level hierarchy:
– Disk Arrays
– Massive Arrays of Idle Disks (MAID)
– Tape Libraries
• MAID will fill
the “big middle”
between disk and
tape
• HPSS supports
multilevel
hierarchies today
(Architecture diagram, based on HPSS 6: core server with metadata disks and a backup core server; client computers; tape-disk movers; disk arrays; a Massive Array of Idle Disks (MAID); robotic tape libraries; connected by LAN and SAN.)
HPSS Grid support
• Short-term plans include GSI FTP
– LBL/NERSC has prototyped a GSI-enabled
HPSS PFTP client and daemon
– Tested by KEK lab (Japan) and U of Tokyo
• Long-term plans include
HPSS-compatible GridFTP
– Argonne National Lab is designing and implementing it
– Fully Globus compatible
– Target is later this year (2004)
How to build a really large system
to ingest and process data
(Diagram: ingest flows into multiple primary file systems (e.g. GPFS, SGFS); Process jobs read the primary disk while a database engine holds institutional metadata; data is batched into containers and moved into a single hierarchical archive file system with secondary and tertiary levels (e.g. HPSS).)
• Writing concurrently does not interfere with processing access to primary disk
• Institutional metadata can direct a Process to Secondary disk in case of loss of Primary (see the sketch below)
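The “institutional metadata can direct Process to Secondary” callout is essentially a catalog lookup with a fallback. The sketch below is hypothetical throughout: the catalog layout and both path schemes are invented for illustration, not taken from any of the products named above.

import os

CATALOG = {
    # dataset id -> (primary file system copy, secondary archive copy)
    "run42/events.h5": ("/gpfs/primary1/run42/events.h5",
                        "/hpss/archive/run42/events.h5"),
}

def resolve(dataset: str) -> str:
    """Prefer the primary copy; fall back to the archive copy if the primary is lost."""
    primary, secondary = CATALOG[dataset]
    if os.path.exists(primary):
        return primary
    # Primary disk lost or purged: point the processing job at the
    # secondary (e.g. HPSS) copy instead.
    return secondary

print(resolve("run42/events.h5"))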