Storage Systems in HPC
John A. Chandy
Department of Electrical and Computer Engineering
University of Connecticut

Research Summary
• Storage Systems
  – Active Storage
  – Parallel File Systems
  – Reliable Data Storage
  – Active Storage Networks

Storage Systems
• Parallel Computing
  – Building parallel file systems to support HPC
  – Computation at the storage node
  – Data organization methods to improve performance
• Reliable Data Storage
  – Customizable and extensible storage for reliability
  – Backup strategies using personal storage devices
  – Data security, trust, and reliability in the cloud

Parallel File Systems
• Network Attached Storage
  – Put the storage on the network with a computer (server) acting as the go-between
(Diagram label: Network)

Parallel File Systems
• Separate the metadata from the storage
(Diagram labels: Metadata, Network)

Parallel File Systems
• How do you improve metadata performance?
  – Distribute metadata services on data nodes
  – Use active storage and object services

Active Storage
• Allows us to run applications on storage nodes
• Can dramatically reduce data traffic
  – Eliminate large network latencies
• Take advantage of fast RAID arrays and SSDs
  – Drives bottlenecked by slow networks
• Run applications in parallel across multiple nodes
• Make use of unused processor time

Programming Model
• Based on object storage
• RPC based
  – Executable objects
  – RPC calls have full access to all object functions
    – read, write, create, set attribute, etc.
• Functions can be synchronous or async
• Supports multiple languages (C, Java, Python)

Programming Model
• Based on work by Acharya, Riedel
  o Stream based
• Our model is Remote Procedure Call (RPC) based
  o Use executable objects
  o Added command to begin execution
  o Allow full access to all OSD functions
• Functions can be run sync or async
  o Due to the iSCSI 30-second timeout
  o Working to allow queries for async
• Allow parallel execution using async
• Support multiple languages (C, Java, Python)

Security
• Multiprocess implementation
  – Limits AS functions from directly accessing objects
  – Limits access to the object services library
  – Enforces use of object security mechanisms
• chroot sandboxing
  – C/Java engines run in a chroot directory
  – Allows limited system libraries
    – e.g. libc

Security
• Multiprocess Implementation
  o Limits AS functions from directly accessing objects
  o Limits access to the OSD services library
    – Forces the use of RPC
  o Enforces the use of OSD security mechanisms
• Chroot Sandboxing
  o Applied to engines
  o Limits engines inside a single directory
  o Allows limiting of libraries
    – AS versions of libraries possible

Active Storage Code Example

Results: AES Local vs. Active Storage

Results: Scaling with Multiple OSDs

Results: C vs. Java

High Performance Computing
• Active storage network
  – Computing in the network
  – SIMD-like processing of data in motion
  – Adaptive computing network elements
  – Application optimizations for database queries, scientific applications, data mining, sort, etc.

Active Storage Networks

Data Sort

BECAT Collaboration
• Large Data Problems
• Parallel File Systems Implementation
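
Illustrative sketch (not from the original slides): the Active Storage and Programming Model slides claim that running a function at the storage node reduces data traffic and that asynchronous calls allow parallel execution across nodes. The Python below is a minimal simulation of that idea under stated assumptions. `StorageNode`, `execute`, `count_newlines`, and the byte accounting are hypothetical names invented for this example; they are not the OSD command set or the authors' API, and no real RPC or iSCSI transport is modeled.

```python
from concurrent.futures import ThreadPoolExecutor


class StorageNode:
    """Hypothetical in-memory stand-in for an object storage device (OSD)."""

    def __init__(self, name):
        self.name = name
        self.objects = {}      # object id -> bytes
        self.bytes_sent = 0    # bytes that left this node over the "network"

    def write(self, oid, data):
        self.objects[oid] = bytes(data)

    def read(self, oid):
        # Conventional path: the whole object crosses the network.
        data = self.objects[oid]
        self.bytes_sent += len(data)
        return data

    def execute(self, func, oid):
        # Active-storage path: run func next to the data, ship only the result.
        result = func(self.objects[oid])
        self.bytes_sent += len(repr(result))
        return result


def count_newlines(data):
    """Example function to run at the storage node (assumed workload)."""
    return data.count(b"\n")


def run_async(pool, nodes, func, oid):
    # One asynchronous call per node; the client collects the futures later.
    return [pool.submit(node.execute, func, oid) for node in nodes]


# Four simulated nodes, each holding a "log" object of a different size.
nodes = [StorageNode(f"osd{i}") for i in range(4)]
for i, node in enumerate(nodes):
    node.write("log", b"line\n" * (1000 * (i + 1)))

# Conventional: pull every object to the client, then compute there.
client_side_total = sum(count_newlines(n.read("log")) for n in nodes)
bytes_conventional = sum(n.bytes_sent for n in nodes)

# Reset the counters before measuring the active-storage path.
for n in nodes:
    n.bytes_sent = 0

# Active storage: run the function on all nodes in parallel, gather results.
with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
    futures = run_async(pool, nodes, count_newlines, "log")
    active_total = sum(f.result() for f in futures)
bytes_active = sum(n.bytes_sent for n in nodes)

print(client_side_total, active_total)   # 10000 10000
print(bytes_conventional, bytes_active)  # 50000 16
```

Both paths compute the same answer, but in this toy accounting the conventional path moves every stored byte to the client while the active path moves only the short results. Calling `node.execute(...)` directly corresponds to a synchronous call; submitting it to the pool corresponds to the asynchronous, parallel case.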