Dealing with Data: Choosing a Good Storage Technology for Your Application

Rick Wagner
HPC Systems Manager
July 1st, 2014

2013 Summer Institute: Discover Big Data, August 5-9, San Diego, California
SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO

Application Focus

Storage choices should be driven by application need, not just by what’s available.
But applications need to adapt as they scale.
Writing a few small files to an NFS server is fine; writing thousands simultaneously will wipe out the server.
If you use binary files, don’t invent your own format. Consider HDF5.

Storage Technologies

File Systems: ext4, NFS, Lustre, PVFS, FUSE
Devices: memory, block
Services: Cloud, MySQL, CouchDB

Each has its own performance characteristics.
Not all are available everywhere.

File Systems

Classic access: POSIX, Windows
Most relevant:
• Local
• Remote
  • NFS, CIFS
• Parallel (Lustre, GPFS)

Local file systems are good for small and temporary files.
Network file systems are very convenient for sharing data between systems.

Parallel File Systems

[Figure: network diagram. Labels: IS 5030 switches; Rail 0 and Rail 1; 16 Compute Nodes; Flash I/O Node; Dual 10GbE; Lustre Filesystem. Each switch connected to its 6 neighbors via 3 QDR links.]

Parallel File Systems

[Figure: Lustre storage architecture diagram.]
• 3 distinct network architectures: TRESTLES (IB cluster), GORDON (IB cluster), TRITON (Myrinet cluster)
• Mellanox 5020 Bridge, 64 Lustre LNET Routers, Myrinet 10G Switch (bandwidth labels: 12 GB/s, 100 GB/s, 25 GB/s)
• Redundant switches (Arista 7508, 10G) for reliability and performance
• MDS: Metadata Servers
• 32 OSS (Object Storage Servers, 72TB labels in the diagram) provide 100GB/s performance and >4PB raw capacity

A Cautionary Tale

http://www.youtube.com/watch?v=gDfLXAtRJfY&feature=youtu.be

Devices

Raw block device (/dev/sdb) or RAM FS (/dev/shm)
Useful in specific cases, like fast scratch
Can be very good for small I/O

Services

Things accessed programmatically
Frequently the last thought for HPC applications: A MISTAKE
Databases
Cloud storage (Amazon S3)
Document storage (MongoDB, CouchDB)

Know What You Need

http://www.youtube.com/watch?v=F4OIDszDA9E

Order of Magnitude Guide

Storage       Files/directory   File sizes   BW          IOPs
Local HDD     1000s             GB           100 MB/s    100
Local SSD     1000s             GB           GB/s        10000
RAM FS        10000s            GB           GB/s        10000
NFS           100s              GB           100 MB/s    100
Lustre/GPFS   100s              TB           100 GB/s    1000
Cloud         Infinite          TB           10 GB/s     0
DB            N/A               N/A          N/A         10000

Choosing

My application needs to: Write a checkpoint dump from memory from a large parallel simulation.
I should consider: A parallel file system and a binary file format like HDF5.

Choosing

My application needs to: Run analysis on remote systems and return the results to a web portal for users.
I should consider: Cloud storage for results and input, and local scratch space for the job.

Choosing

My application needs to: Randomly access many small files, or read and write small blocks from large files.
I should consider: A database, RAM FS, or local scratch space.
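
One way to act on that last suggestion is to pack many small records into a single database file instead of scattering thousands of tiny files across a shared file system. The sketch below uses Python's built-in sqlite3 module; the table layout and the helper names (open_store, put_many, get) are illustrative assumptions, not something taken from the talk.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a single-file key/value store for small records.

    The schema is a hypothetical example: one text key, one binary payload.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (key TEXT PRIMARY KEY, payload BLOB)"
    )
    return conn


def put_many(conn, items):
    """Insert or overwrite many (key, payload) pairs in one transaction."""
    with conn:  # commits once at the end rather than once per record
        conn.executemany(
            "INSERT OR REPLACE INTO records (key, payload) VALUES (?, ?)", items
        )


def get(conn, key):
    """Random access to one record by key; returns None if the key is missing."""
    row = conn.execute(
        "SELECT payload FROM records WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row is not None else None


if __name__ == "__main__":
    conn = open_store()
    # 10,000 small records that might otherwise have been 10,000 small files.
    put_many(conn, ((f"sample-{i:05d}", bytes([i % 256]) * 64) for i in range(10_000)))
    print(len(get(conn, "sample-00042")), "bytes read")
```

Passing a path under /dev/shm or node-local scratch instead of ":memory:" would keep the small I/O off NFS or Lustre while still leaving one file to copy back at the end of the job.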

Many Boxes Make a Sad Panda

http://www.youtube.com/watch?v=N2zK3sAtr-4

Database logos courtesy of RRZEicons, http://commons.wikimedia.org/
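
As a rough worked example of how the Order of Magnitude Guide can be used, the sketch below turns the guide's bandwidth and IOPs figures into a back-of-the-envelope time estimate. The cost model (one I/O operation per file plus streaming transfer time), the function name estimate_seconds, and reading the bare "GB/s" entries as 1 GB/s are all assumptions made for illustration; real systems will differ.

```python
# Figures transcribed from the Order of Magnitude Guide.
# "bw" is bandwidth in bytes per second, "iops" is operations per second.
# Entries listed only as "GB/s" are assumed here to mean 1 GB/s.
GUIDE = {
    "Local HDD": {"bw": 100e6, "iops": 100},
    "Local SSD": {"bw": 1e9, "iops": 10_000},
    "RAM FS": {"bw": 1e9, "iops": 10_000},
    "NFS": {"bw": 100e6, "iops": 100},
    "Lustre/GPFS": {"bw": 100e9, "iops": 1_000},
}


def estimate_seconds(storage, n_files, bytes_per_file):
    """Crude estimate: streaming transfer time plus one operation per file."""
    entry = GUIDE[storage]
    transfer = n_files * bytes_per_file / entry["bw"]
    operations = n_files / entry["iops"]
    return transfer + operations


if __name__ == "__main__":
    # Compare many small files against one large file of the same total size.
    for name in GUIDE:
        many_small = estimate_seconds(name, 100_000, 10_000)
        one_large = estimate_seconds(name, 1, 1_000_000_000)
        print(f"{name:12s} many small: {many_small:10.2f} s   one large: {one_large:8.2f} s")
```

Under this model the per-file operation cost dominates for small files, which is the pattern behind the earlier advice about not writing thousands of small files to a shared server.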