User-space I/O for μs-level storage devices
Anastasios Papagiannis, Giorgos Saloustros, Manolis Marazakis, and Angelos Bilas
Institute of Computer Science (ICS), Foundation for Research and Technology – Hellas (FORTH), Greece
Iris (presentation at WOPSS'16)

Current Storage Technologies
DRAM: + fast, + byte-addressable; - volatile
Disk / Flash: + non-volatile; - slow, - block-addressable

Emerging Storage Technologies
How should NVM be connected?
- PCI-Express: 10-100 μs latency, DMA
- DIMM: ns-scale latency, load/store interface, caching

Linux I/O Path
Application -> libc -> VFS -> File System -> Block Layer -> Device Drivers -> Device
Software overheads become more pronounced with low-latency storage devices.

Software overheads in the I/O Path
[Figure: request latency on a log scale (µs), broken down into file-system, operating-system, and hardware components for Disk, Flash, and NVM; source: Moneta, MICRO 2010]
It takes roughly 20,000 instructions to issue and complete a 4 KB I/O request.

Our goal
How do we design the I/O path to take full advantage of fast storage devices (NVM)?
- Performance (latency, throughput) close to the hardware specifications
- Scalable sharing
- Strong protection

Outline of this talk
- Motivation
- Design: protection features in modern server processors; key-value interface to storage
- Evaluation
- Conclusions

I/O Path: mostly in kernel space (via system calls)
Hardware access from user space is restricted, for security and isolation.
How can applications be allowed to use hardware features?
Key insight from the Dune prototype (OSDI 2012): use hardware-assisted virtualization to provide a process, rather than a machine, abstraction, safely and efficiently exposing privileged hardware features to user programs while preserving standard OS abstractions.

Protection features in processors: privilege rings
[Figure]

Protection features in processors: virtualization
[Figure]

System calls vs. hypercalls
[Figure: a process running in guest mode issues system calls to a system-call interposer, while hypercalls cross into the system-call handlers of the host-mode kernel]
- Hypercalls are normally used by a kernel running inside a VM.
- Each hypercall involves a VMEXIT + VMENTRY sequence (saving and restoring host and guest state).
- Memory accesses are protected by EPT.
- Privileged state is managed and protected by VT-x.
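To make the hypercall mechanics concrete, here is a minimal sketch of a guest-to-host hypercall wrapper in C. It follows the KVM x86 register convention (hypercall number in RAX, arguments in RBX/RCX/RDX); the slides do not specify Iris' actual hypercall ABI, so the convention shown here is an illustrative assumption.

```c
/* Minimal sketch of a guest-mode hypercall wrapper, in the style of
 * Dune-like systems. The register convention follows the KVM x86
 * hypercall ABI (number in RAX, arguments in RBX/RCX/RDX); Iris' own
 * ABI is not given in the slides, so this is illustrative only. */
static inline long hypercall3(unsigned long nr, unsigned long a0,
                              unsigned long a1, unsigned long a2)
{
    long ret;

    /* VMCALL triggers a VMEXIT: the processor saves guest state,
     * restores host state, and enters the host's handler. */
    asm volatile("vmcall"
                 : "=a"(ret)
                 : "a"(nr), "b"(a0), "c"(a1), "d"(a2)
                 : "memory");

    /* Execution resumes here after the matching VMENTRY. */
    return ret;
}
```

Every such call pays the full VMEXIT + VMENTRY round trip, which is why Iris crosses into the host kernel only for initialization and coarse-grain file operations and serves the common get/put path without leaving guest mode.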
I/O path: from user space to the device (via hypercall)
[Figure: the DUNE kernel module mediates the path from user space to the device]

I/O Interposer
Common I/O path: intercept block accesses and serve them from the key-value store.
- User-space library (LD_PRELOAD)
- Translates read/write system calls to key-value calls (get/put), which allows running unmodified applications
- Maps each file to a set of key-value pairs
- Maintains in-memory metadata
- Forwards the rest of the I/O-related system calls to the Linux kernel
(A sketch of this interposition appears at the end of the backup slides.)

Iris kernel
Protected access to the key-value store:
- Get/Put API for using the key-value store
- Permission checks and updates
- Leverages Intel VT-x to run in a more privileged domain than user-space applications
- Enters kernel space only for initialization and coarse-grain file operations

Iris Key-Value Store
- Storage engine (based on a Bε-tree)
- Atomicity and reliability guarantees (copy-on-write update protocol)
- Direct access to storage hardware: DMA API for PCIe-attached storage; load/store instructions for memory-bus-attached storage
(A sketch of the load/store persistence path and the CoW update appears at the end of the backup slides.)

Evaluation: Testbed
- 2 x Intel Xeon E5620 processors (2 x 4 cores)
- DRAM: 24 GB, of which 8 GB is used for PCM emulation (PMBD driver, MSST 2014)
- Microbenchmark: FIO; block size: 512 bytes; queue depth: 1; direct I/O (bypassing the Linux page cache); 1-4 I/O-issuing threads
- Comparison with ext4 and XFS

Evaluation: single-thread IOPS performance

            Read (KIOPS)   Write (KIOPS)
    ext4        269             203
    xfs         261             199
    Iris        445             439

Iris achieves 1.7x the read IOPS and 2.2x the write IOPS of the two file systems.
I/O latency is on the order of a few microseconds: 2.24-3.8 µs for reads, 2.27-5.02 µs for writes.

Evaluation: Read and Write IOPS
[Figure: read and write IOPS as the number of I/O-issuing threads grows]
Iris sustains 400 KIOPS per core, a 2x improvement.

Conclusions
- An I/O path that provides direct access to fast storage devices
- A key-value store that keeps both data and metadata
- Minimizes software overheads without sacrificing strong protection, while guaranteeing atomicity and reliability
- Intel VT-x provides the protected-access interface
- Encouraging preliminary evaluation results: 400 KIOPS per core; ~2x the IOPS of ext4 and XFS on small random requests

Questions?
Manolis Marazakis, Institute of Computer Science, FORTH, Heraklion, Greece
E-mail: maraz@ics.forth.gr
Web: http://www.ics.forth.gr/carv
Acknowledgements: EU FETHPC Grant 671553 (ExaNeSt), http://www.exanest.eu, https://twitter.com/exanest_h2020

Backup Slides

PMBD
Hybrid architecture:
- Physical: NVM DIMMs attached to the memory bus
- Logical: NVM exposed as a block device to the OS

Evaluation (backup)
[Figure: millions of random-read IOPS with 4 KB requests and one outstanding request, for 1-128 threads, comparing XFS, three EXT4 configurations (labeled j, o, w), and Tucana; Tucana is up to 2.4x better]

CARV Laboratory @ FORTH (Heraklion, Greece)
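Backup: LD_PRELOAD interposition sketch
To make the common I/O path concrete, the following is a minimal sketch of how an LD_PRELOAD library can redirect a pread() to a key-value get(). The iris_get() and iris_fd_to_file_id() names, the (file id, block number) key encoding, and the single-block fast path are assumptions for illustration; the slides state only that read/write calls are translated to get/put over per-file key-value pairs.

```c
/* Minimal sketch of LD_PRELOAD interposition (build with
 * cc -shared -fPIC interpose.c -ldl). iris_get() and
 * iris_fd_to_file_id() are hypothetical Iris library calls. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdint.h>
#include <unistd.h>

extern int iris_get(uint64_t key, void *buf, size_t len);
extern int iris_fd_to_file_id(int fd, uint64_t *file_id);

#define BLOCK_SHIFT 9   /* 512-byte blocks, as in the FIO setup */

/* Key for one block of one file: (file id, block number). */
static uint64_t block_key(uint64_t file_id, off_t offset)
{
    return (file_id << 32) | (uint64_t)(offset >> BLOCK_SHIFT);
}

ssize_t pread(int fd, void *buf, size_t count, off_t offset)
{
    static ssize_t (*real_pread)(int, void *, size_t, off_t);
    uint64_t file_id;

    if (!real_pread)
        real_pread = (ssize_t (*)(int, void *, size_t, off_t))
                     dlsym(RTLD_NEXT, "pread");

    /* Files not managed by Iris fall through to the Linux kernel. */
    if (iris_fd_to_file_id(fd, &file_id) != 0)
        return real_pread(fd, buf, count, offset);

    /* Serve the access from the key-value store (single-block case). */
    if (iris_get(block_key(file_id, offset), buf, count) == 0)
        return (ssize_t)count;
    return -1;
}
```

The fall-through branch is how the rest of the I/O-related system calls reach the Linux kernel unchanged; a write-side wrapper would translate pwrite() to put() in the same way.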
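Backup: load/store persistence and CoW update sketch
For memory-bus-attached NVM, a plain store is not durable until the affected cache lines are flushed and ordered with a fence, and a copy-on-write protocol derives its atomicity from switching a single aligned pointer only after the new version is persistent. A minimal sketch follows; the nvm_alloc() allocator, the fixed node size, and the root-pointer layout are assumptions, since the slides state only that the store uses load/store instructions and a CoW update protocol.

```c
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>          /* _mm_clflush, _mm_sfence */

#define CACHELINE 64

/* Durably write len bytes to memory-mapped NVM: store through the
 * cache, flush every affected cache line, then fence. CLFLUSH is the
 * baseline; CLWB/CLFLUSHOPT would avoid evicting the lines. */
static void nvm_store(void *dst, const void *src, size_t len)
{
    uintptr_t p = (uintptr_t)dst & ~(uintptr_t)(CACHELINE - 1);

    memcpy(dst, src, len);
    for (; p < (uintptr_t)dst + len; p += CACHELINE)
        _mm_clflush((const void *)p);
    _mm_sfence();
}

struct node { unsigned char data[512]; }; /* placeholder NVM node */
extern struct node *nvm_alloc(void);      /* assumed NVM allocator */

/* Copy-on-write update: build and persist the new version out of
 * place, then switch the persistent root pointer. */
static void cow_update(struct node **root, size_t off,
                       const void *buf, size_t len)
{
    struct node *new_node = nvm_alloc();
    struct node tmp = **root;              /* 1. copy the old version */

    memcpy(tmp.data + off, buf, len);      /* 2. apply the update     */
    nvm_store(new_node, &tmp, sizeof tmp); /* 3. persist out of place */

    /* 4. Publish: an aligned 8-byte store is atomic on x86, so a
     * crash leaves either the old or the new version reachable. */
    *root = new_node;
    _mm_clflush((const void *)root);
    _mm_sfence();
}
```

A crash before the pointer switch leaves the old version reachable; a crash after it leaves the new one. The superseded node can then be reclaimed lazily, which is the essence of the store's atomicity guarantee.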