Flexibility, Manageability and Performance in a Grid Storage Appliance
  John Bent, Venkateshwaran Venkataramani, Nick Leroy, Alain Roy, Joseph Stanley, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, and Miron Livny
  University of Wisconsin

Two trends
  [Figure: trend plots; labels only partially recoverable (data sets, performance)]
  • Storage appliances address both trends.

Storage Appliances
  [Figure not recoverable]

Storage Appliances: Great for basic file service
  Easy to manage: Plug in and it works
  Good performance: Specialized just for I/O
  Reliable and available too

Storage Appliances for the Grid: Mismatch?
  Inflexible: Few, specific protocols (e.g., NFS)
  Costly: 10x the cost of PC + a few disks
  Difficult to integrate: Just one piece of the puzzle

A Solution: NeST
  NeST: A Storage Appliance for the Grid
  Flexible: Multiple simultaneous protocols
    Virtual protocol layer
  Low-cost: Use commodity machines
    Dynamic adaptation
  Grid-aware: Integrate w/ higher-level systems
    Design specifically for the Grid

Outline
  Introduction
  General architecture
  Design goals
    Flexibility
    Low-cost
    Grid-aware features
  NeST in the Grid example
  Conclusions

NeST: Protocol layer
  [Diagram: Physical network layer; protocols Chirp, HTTP, GridFTP, NFS; Common protocol layer; Dispatcher; Storage Mgr; Transfer Mgr; Concurrency models; Physical storage layer. Arrows: control flow, data flow.]
  Virtualizes different protocols
  Mediates access to network

NeST: Dispatcher
  [Same architecture diagram]
  Mediates interaction between other components
  Gathers information, advertises

NeST: Storage manager
  [Same architecture diagram]
  Space management
  Access control
  Virtualizes physical storage

NeST: Transfer manager
  [Same architecture diagram]
  Implements scheduling policies
  Chooses concurrency model

Flexibility: Multiple protocols
  Problem: How to support multiple protocols?
  One approach: Just a Bunch of Servers (JBOS)
  Problems with JBOS
    Lack of control (scheduling)
    Painful administration
    No shared code
    Larger memory footprint
  [Figure: JBOS server running wu-ftpd, nfsd, and httpd]

NeST: Flexibility By Design
  NeST: Integrate protocols and gain advantage
    Implementation like VFS
  Integration introduces new challenges
    Different protocols allow different auth models
    More expensive to add a new protocol
    Less fault isolation

NeST vs JBOS
  Linux cluster: dual PIII, 1 GB RAM, Linux 2.2.19
  Each protocol: 4 clients, 10 MB files
  [Figure: bar charts of server bandwidth (MB/s), 0 to 35. Series labels: Chirp, GridFTP, HTTP, NFS, Total; Apache, linux nfsd, wu-ftpd.]
  • For each protocol, NeST is comparable to JBOS server.

Exerting scheduling control
  Different scheduling policies
    FCFS
    Cache-aware [USENIX ’02]
    Proportional share
  Proportional share scheduling
    Allows administrators to set protocol proportions, e.g. favor NFS
    Very difficult in JBOS

Proportional share
  Linux cluster: dual PIII, 1 GB RAM, Linux 2.2.19
  Each protocol: 4 clients, 10 MB files
  [Figure: server bandwidth (MB/s), 0 to 35, by scheduling configuration: FCFS, 1:1:1:1, 1:2:1:1, 1:1:1:4]
  • In most cases, achieves Jain’s metric of fairness > 0.98 (1 is “fair”).

Low-Cost: New challenges
  Desire: Run on arbitrary OS on arbitrary PC
    Software-only, user-level storage appliance
    Currently on Linux (release 0.9) and Solaris (beta)
  Problem: Portable performance
    Performance under load is platform / workload dependent
    Threads or processes on some systems, events on others
    May also be workload dependent (e.g. whether in cache)
  NeST approach: Dynamic adaptivity
    Simultaneously support multiple concurrency models
    Monitor performance using each model
    Bias towards better model over time

Adaptive Concurrency
  [Figure: two panels comparing Events, Threads, and Adaptive. Panel titles: Linux: 10 MB files; Solaris: 1K files. Y axes: Ave time per request (sec) and Ave time per request (ms).]
  • Dynamic adaptation approaches “ideal” without static information.

Grid-Aware Mechanisms
  Basic functionality
    Users and groups: Dynamic creation/deletion does not need administrative intervention
    Access control: Generic AFS-style ACLs
  Advanced functionality
    QoS: Preferential scheduling
    Advertises into global scheduling systems
    Flexible protocol and authentication mechanisms
    Self-cleaning storage guarantees: Lots

Storage guarantees: Lots
  Characteristics of Lots
    Capacity: Total amount of data lot can store
    Duration: Time for which data is guaranteed to exist
    Set of files: Multiple files may co-exist within lot
  Self-cleaning
    Expired lots become “best-effort” lots
  Lot management
    Either default set created by administrator, OR use resource management protocol to create before usage
  Implementation: File system quotas
    Advantage: Integrates cleanly with local access methods
    Disadvantage: Performance hit for large writes

NeST in the Grid
  [Diagram labels: Advertisement; Global Execution Manager; Linux NeST; Solaris NeST; compute nodes; Home; Remote]

NeST in the Grid
  [Diagram: Global Execution Mgr, a NeST (N) at Home and at Remote, GridFTP and NFS links, with steps 1) to 6) marked]
  1) Home submits jobs
  2) Global reserves space
  3) Global coordinates xfer
  4) Global starts jobs
  5) Global coordinates xfer
  6) Global terminates space

Conclusions
  NeST: A storage appliance for the Grid
  Design goals: Gain manageability without sacrificing performance
    Flexibility: Virtual protocol architecture
    Low-cost: Adaptation mechanisms
    Grid-aware: Space management
  Current status: release 0.9 available
  Future work
    Hot deployable NeSTs, lot management extensions

Questions?
  http://www.cs.wisc.edu/condor/nest
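
Backup: Proportional share and Jain's fairness metric (sketch)
  The slides report a Jain's fairness metric above 0.98 for proportional-share scheduling but do not show how shares are enforced or how the metric is computed. The sketch below is one possible illustration, not NeST's implementation: it uses stride scheduling (a swapped-in, well-known proportional-share technique) and the standard Jain index, (sum x)^2 / (n * sum x^2). The names stride_schedule and jain_fairness, the protocol keys, and the assumption that the last position in 1:1:1:4 is NFS are all hypothetical.

```python
from fractions import Fraction


def stride_schedule(weights, total_requests):
    """Serve total_requests one at a time; return per-protocol counts.

    weights maps a protocol name to a positive integer share
    (a hypothetical configuration format, not NeST's).
    """
    # Each protocol advances its "pass" by 1/weight per served request,
    # so protocols with larger weights are picked proportionally more often.
    passes = {name: Fraction(0) for name in weights}
    served = {name: 0 for name in weights}
    for _ in range(total_requests):
        # Pick the protocol with the smallest pass; ties break by name.
        name = min(passes, key=lambda n: (passes[n], n))
        served[name] += 1
        passes[name] += Fraction(1, weights[name])
    return served


def jain_fairness(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2); 1.0 is perfectly fair."""
    values = list(values)
    total = sum(values)
    squares = sum(v * v for v in values)
    return (total * total) / (len(values) * squares)


if __name__ == "__main__":
    # The 1:1:1:4 configuration, assuming the last position is NFS.
    weights = {"chirp": 1, "gridftp": 1, "http": 1, "nfs": 4}
    served = stride_schedule(weights, 700)
    # Normalize by weight so that an exact 1:1:1:4 split scores 1.0.
    normalized = [served[p] / weights[p] for p in weights]
    print(served, jain_fairness(normalized))
```

  Dividing each protocol's service by its weight before computing the index means the metric measures deviation from the requested proportions rather than from an equal split. In a real server the inputs would be measured bandwidths, so the index would typically fall slightly below 1.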