NeST: Network Storage Technologies
Building I/O Appliances on Commodity Systems
John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau and Miron Livny
http://www.ioappliance.com

Outline
- Introduction
- Case studies
- Storage modules
- Conclusion

Problem Statement
Appliances are attractive because they are robust, reliable, available and especially because they are easy to use.
To fulfill these criteria, traditional network appliances impose policy decisions on their users and are built either as kernel modules or upon specially designed kernels.
“How to build portable, configurable I/O appliances?”

Goal
To create a network-storage “template” that produces a range of I/O appliances according to the storage needs of the target application and any constraints of the host system.
Diagram labels: Target App; Network Storage Technologies; Host System; Perfect I/O Appliance

Host system constraints
- Thread support
- Raw disk access
- Select interface

Target app. storage needs
Invariant and variant storage needs
Invariant
- Reliable
- Low latency
- High bandwidth
- Easy to administer
- Cheap

Target app. storage needs
Variant
- Write concurrency
- Replacement costs
- Security and authentication needs
- Communication protocol
- Transfer unit

Building I/O appliances
Four case studies
- ReqEx
- WiND
- Web proxy cache
- Condor checkpoint server

What is ReqEx?
Diagram labels: ReqEx Staging Area; Huge tape library (terabytes); Queue of Reqs; Tape Robot
A robot moves archived data one tape at a time to a temporary staging area.

What is ReqEx?
Diagram labels: Condor Manager; ReqEx Staging Area
Data is transferred and stored locally to facilitate access by compute nodes.
Diagram labels: WAN; Perfect I/O Appliance; Compute cluster

ReqEx variant storage needs
- Write concurrency: No write (or read) concurrency
- Replacement costs: Tape robot is very slow; objects cannot be lost
- Security and authentication needs: Only owner can remove object
- Protocol: ReqEx can be linked with NeST client library
- Transfer unit: Whole object transfers only

What is WiND?

WiND variant storage needs
- Write concurrency: No write concurrency
- Replacement costs: Unknown
- Security and authentication needs: Unknown
- Protocol: Predefined specific WiND protocol
- Transfer unit: Disk blocks are accessed directly

What is a web proxy cache?
Diagram labels: Internet; Perfect I/O Appliance; Local Area Network
Frequently accessed objects can be stored locally to decrease request latencies.

Cache variant storage needs
- Write concurrency: No write concurrency
- Replacement costs: Negligible
- Security and authentication needs: None
- Protocol: HTTP
- Transfer unit: Whole object transfer only

What is the Condor checkpoint server?
- A Condor job runs on an execute machine.
- Keyboard activity causes the job to be evicted.
- A snapshot of the process is sent to the checkpoint server.
- When the job migrates to another idle machine, the checkpoint file is recovered and progress resumes.
Diagram labels: Perfect I/O Appliance

CCS variant storage needs
- Write concurrency: No write concurrency
- Replacement costs: The running time of the job (could be months)
- Security and authentication needs: Unauthorized access cannot be allowed
- Protocol: Can link with NeST client library
- Transfer unit: Whole file transfer only
I see you’re discussing checkpointing. Don’t forget about incremental.
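
The variant storage needs of the four case studies can be placed side by side in a single record type, which makes it easy to see which needs actually vary. The following is only a sketch: the `VariantNeeds` class, its field names, the `differing_needs` helper and the shortened value strings are illustrative assumptions, not part of NeST.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VariantNeeds:
    """Hypothetical record of one application's variant storage needs."""
    write_concurrency: bool
    replacement_cost: str
    security: str
    protocol: str
    transfer_unit: str


# Values paraphrase the four case-study slides; the wording is shortened.
CASE_STUDIES = {
    "ReqEx": VariantNeeds(
        False,  # the slide also rules out read concurrency
        "high: tape robot is very slow, objects cannot be lost",
        "only owner can remove object",
        "NeST client library",
        "whole object",
    ),
    "WiND": VariantNeeds(
        False, "unknown", "unknown", "WiND-specific protocol", "disk block"
    ),
    "Web proxy cache": VariantNeeds(
        False, "negligible", "none", "HTTP", "whole object"
    ),
    "Condor checkpoint server": VariantNeeds(
        False,
        "running time of the job (could be months)",
        "no unauthorized access",
        "NeST client library",
        "whole file",
    ),
}


def differing_needs(studies):
    """Return the field names whose values differ across the given studies."""
    names = [f.name for f in fields(VariantNeeds)]
    return [
        name
        for name in names
        if len({getattr(needs, name) for needs in studies.values()}) > 1
    ]


if __name__ == "__main__":
    print(differing_needs(CASE_STUDIES))
```

Under this encoding, write concurrency is the one need that does not differ between the four applications; the other four fields all do.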
Storage modules
Diagram labels: Static Configuration; Name Space; Protocols; Administrative Interface; Concurrency Architectures; Runtime Adaptation; Storage Management; Data Semantics

Configurable Components
- Concurrency architecture
- Data semantics
- Protocol layer
- Namespace
- Security and authentication
- Storage management

Concurrency architecture
“How can multiple storage requests be interleaved to maximize system throughput?”
- NOB
- POP
Easy ... but uninteresting.
- POT

Data semantics
- Must stored objects be protected from concurrent writes?
- Is transaction support necessary?
- What are the recovery costs for lost objects?

Protocol layer
- Most applications cannot link with NeST client libraries
- Most applications have their own specific communication protocols
“How can a protocol layer easily communicate with arbitrary networking protocols?”
Tower of Babel

Namespace
- Flat
- Hierarchical
“How do clients uniquely identify their stored objects?”

Security and authentication
- Ownership
- Privacy
- Encryption
- Authentication
- Access rights

Storage management
- Native filesystem
- Raw disk access
Uninteresting from client perspective

Conclusions and future work
- Conclusions: None
- Future work: Lots
Maybe you should try a little harder.

Conclusions and future work
- How to most easily identify the variant storage needs of the target application?
  - Config file?
  - Installation script?
  - Run-time monitoring?
- How to ensure that performance is at least as good as an appliance specifically designed for the target application?
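
As one possible reading of the “Config file?” option, the sketch below parses an INI-style description of the six configurable components and rejects values the slides do not list. Everything here is an assumption made for illustration: the `[appliance]` section, the key names, the value spellings and the `load_appliance_config` function are hypothetical, and the example values describe a web-proxy-cache-like appliance with an arbitrarily chosen concurrency architecture.

```python
import configparser

# One key per configurable component named in the slides (key names are hypothetical).
REQUIRED = (
    "concurrency", "data_semantics", "protocol",
    "namespace", "security", "storage",
)

# Only components whose options the slides enumerate are restricted.
ALLOWED = {
    "concurrency": {"NOB", "POP", "POT"},
    "namespace": {"flat", "hierarchical"},
    "storage": {"native-filesystem", "raw-disk"},
}

EXAMPLE_CONFIG = """
[appliance]
concurrency = POT
data_semantics = no-write-concurrency
protocol = HTTP
namespace = flat
security = none
storage = native-filesystem
"""


def load_appliance_config(text):
    """Parse the config text and return a dict of component choices."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["appliance"]

    missing = [key for key in REQUIRED if key not in section]
    if missing:
        raise ValueError(f"missing components: {missing}")

    config = {key: section[key] for key in REQUIRED}
    for key, allowed in ALLOWED.items():
        if config[key] not in allowed:
            raise ValueError(f"{key}={config[key]!r} not in {sorted(allowed)}")
    return config


if __name__ == "__main__":
    print(load_appliance_config(EXAMPLE_CONFIG))
```

A static file like this covers only the first of the three options raised above; an installation script or run-time monitoring would have to discover the same six choices by other means.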