Differentiated I/O Services in Virtualized Environments
Tyler Harter, Salini SK & Anand Krishnamurthy

Overview
• Provide differentiated I/O services for applications running in guest operating systems in virtual machines
• Applications in virtual machines tag their I/O requests
• The hypervisor's I/O scheduler uses these tags to provide quality of I/O service

Motivation
• Varied applications with different I/O requirements are hosted in clouds
• I/O scheduling that is agnostic of the semantics of a request is not optimal
[Figures: several VMs (VM 1, VM 2, VM 3) running on one hypervisor and sharing its storage]

Motivation
• We want high- and low-priority processes to correctly get differentiated service, both within a VM and between VMs
• Can my webserver/DHT log pusher's I/O be served differently from my webserver/DHT's I/O?

Existing work & problems
• VMware's ESX server offers Storage I/O Control (SIOC)
• SIOC provides I/O prioritization of virtual machines that access a shared storage pool
• But it supports prioritization only at host granularity!

Existing work & problems
• Xen's credit scheduler also works at domain level
• Linux's CFQ I/O scheduler supports I/O prioritization
  – It is possible to use priorities at both the guest's and the hypervisor's I/O scheduler

Original architecture
[Figure: guest VMs issue low- and high-priority syscalls through the guest's I/O scheduler (e.g., CFQ); QEMU exposes a virtual SCSI disk, and requests then pass through the host's syscall layer and the host's I/O scheduler (e.g., CFQ)]
• Problem 1: low- and high-priority requests may get the same service
• Problem 2: the host's caches are not utilized

Existing work & problems (contd.)
• The current state of the art does not provide differentiated services at guest-application-level granularity

Solution
• Tag I/O and prioritize in the hypervisor

Outline
• KVM/Qemu, a brief intro
• KVM/Qemu I/O stack
• Multi-level I/O tagging
• I/O scheduling algorithms
• Evaluation
• Summary

KVM/Qemu, a brief intro
• Linux has all the mechanisms a VMM needs to operate several VMs; the KVM module has been part of the Linux kernel since version 2.6
• Relies on a virtualization-capable CPU with either Intel VT or AMD SVM extensions
• Has three modes: kernel, user, guest
  – kernel mode: switch into guest mode and handle exits due to I/O operations
  – user mode: perform I/O when the guest needs to access devices
  – guest mode: execute guest code, which is the guest OS except for I/O
[Figure: Linux standard kernel with KVM as the hypervisor, running on the hardware]

KVM/Qemu, a brief intro
• Each virtual machine is a user-space process
[Figure: user-space processes (VMs, libvirt, other user-space programs) on top of the Linux standard kernel with KVM]
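To make "each VM is a user-space process" concrete, here is a minimal sketch of the standard /dev/kvm ioctl interface that QEMU itself sits on: an ordinary process opens the KVM module's device node, and a VM is simply a file descriptor that the process holds. This is an illustration of the KVM API, not part of the project's code, and error handling is trimmed to the minimum.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>

int main(void) {
    /* Talk to the in-kernel KVM module through its device node. */
    int kvm = open("/dev/kvm", O_RDWR);
    if (kvm < 0) { perror("open /dev/kvm"); return 1; }

    /* 12 on all modern kernels; QEMU checks this the same way. */
    printf("KVM API version: %d\n", ioctl(kvm, KVM_GET_API_VERSION, 0));

    /* The new VM is just another file descriptor owned by this process. */
    int vmfd = ioctl(kvm, KVM_CREATE_VM, 0);
    if (vmfd < 0) { perror("KVM_CREATE_VM"); return 1; }
    printf("created VM fd: %d\n", vmfd);
    return 0;
}
```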
KVM/Qemu I/O stack
• An application in the guest OS issues an I/O-related system call (e.g., read(), write(), stat()) within a user-space context of the virtual machine
• This system call leads to an I/O request being submitted from within the kernel-space of the VM
• The I/O request travels down the guest's storage stack (system calls layer, VFS, file system, buffer cache, block layer) and reaches a device driver: either an ATA-compliant (IDE) or a SCSI driver

KVM/Qemu I/O stack
• The device driver issues privileged instructions to read/write the memory regions exported over PCI by the corresponding device

KVM/Qemu I/O stack
• These privileged instructions trigger VM-exits, which are handled by the core KVM module within the host's kernel-space
• A VM-exit takes place for each of the I/O-related privileged instructions resulting from the original I/O request in the VM
• The hypervisor passes the privileged instructions to the QEMU machine emulator

KVM/Qemu I/O stack
• The instructions are then emulated by device-controller emulation modules within QEMU, either as ATA or as SCSI commands
• QEMU generates block-access I/O requests in a special block-device emulation module; thus the original I/O request in the guest turns into I/O requests to the host's kernel-space
• Upon completion of the host system calls, QEMU "injects" an interrupt into the VM that originally issued the I/O request

Multi-level I/O tagging: modifications
• Modification 1: pass priorities via syscalls (a sketch of priority tagging follows this list)
• Modification 2: NOOP+ at the guest I/O scheduler
• Modification 3: extend the SCSI protocol with priorities
• Modification 4: share-based priority scheduling in the host
• Modification 5: use the new calls in benchmarks
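The slides do not show the modified syscall interface behind Modification 1, but stock Linux already has a close analogue: ioprio_set(2), the call through which CFQ I/O priorities are assigned today. As a hedged sketch of what application-side tagging looks like, a guest process can mark all of its subsequent I/O as low priority like this (the class/shift constants mirror the kernel's linux/ioprio.h, since glibc ships no wrapper):

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc provides no ioprio_set() wrapper; values mirror linux/ioprio.h. */
#define IOPRIO_WHO_PROCESS 1
#define IOPRIO_CLASS_BE    2   /* "best effort": the class CFQ priorities use */
#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_PRIO_VALUE(cls, data) (((cls) << IOPRIO_CLASS_SHIFT) | (data))

int main(void) {
    /* Tag the calling process as best-effort level 7 (the lowest priority);
     * a latency-sensitive process would use level 0 instead. */
    if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 7)) != 0) {
        perror("ioprio_set");
        return 1;
    }
    printf("I/O issued from now on carries the low CFQ priority\n");
    return 0;
}
```

A plain CFQ priority stops at the guest's scheduler, which is exactly the gap the multi-level tagging above addresses by carrying the tag through the SCSI protocol down to the host.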
Scheduler algorithm: stride
• ID_i — ID of application A_i
• VIO_i — virtual I/O counter for ID_i
• shares_i — shares assigned to ID_i
• stride_i = global_shares / shares_i

  dispatch_request():
      select the ID i with the lowest virtual I/O counter VIO_i
      increase VIO_i by stride_i
      if VIO_i reaches the threshold:
          reinitialize all VIO_i to 0
      dispatch the request at the head of queue i

Scheduler algorithm (contd.)
• Problem: a sleeping process can monopolize the resource once it wakes up after a long time
• Solution: if a sleeping process k wakes up, set VIO_k = max(min(all non-zero VIO_i), VIO_k)

Evaluation
• Tested on HDD and SSD
• Configuration:
  Guest RAM size: 1 GB
  Host RAM size: 8 GB
  Hard disk: 7200 RPM
  SSD: 35,000 IOPS read, 85,000 IOPS write
  Guest OS: Ubuntu Server 12.10, Linux kernel 3.2
  Host OS: Kubuntu 12.04, Linux kernel 3.2
  Filesystem (host/guest): ext4
  Virtual disk image format: qcow2

Results
• Metrics: throughput, latency
• Benchmarks: Filebench, Sysbench, Voldemort (a distributed key-value store)

Shares vs. throughput for different workloads: HDD
[Figure: throughput as a function of assigned shares on the HDD]

Shares vs. latency for different workloads: HDD
[Figure: latency as a function of assigned shares on the HDD]
• Priorities are better respected if most of the read requests hit the disk

Effective throughput for various dispatch numbers: HDD
[Figure: effective throughput as a function of the disk's dispatch number]
• Priorities are respected only when the disk's dispatch number is lower than the number of read requests the system generates at a time
• Downside: the disk's dispatch number is directly proportional to the effective throughput

Shares vs. throughput for different workloads: SSD
[Figure: throughput as a function of assigned shares on the SSD]

Shares vs. latency for different workloads: SSD
[Figure: latency as a function of assigned shares on the SSD]
• Priorities on SSDs are respected only under heavy load, since SSDs are faster

Comparison between different schedulers
• Only NOOP+LKMS respects priority! (Has to be, since we did it)

Results
[Table: per-workload results (webserver, mailserver, random reads, sequential reads, Voldemort DHT reads) on hard disk and flash]

Summary
• It works!
• Preferential services are possible only when the disk's dispatch number is lower than the number of read requests the system generates at a time
• But a lower dispatch number reduces the effective throughput of the storage
• On SSDs, preferential service is only possible under heavy load
• Scheduling at the lowermost layer yields better differentiated services

Future work
• Get it working for writes
• Evaluate VMware ESX SIOC and compare with our results
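Backup: a self-contained sketch of the stride-based dispatch loop from the scheduler slides, assuming a fixed set of per-application queues. The share values, global_shares, and the reset threshold below are illustrative constants, not the values used in the evaluation.

```c
#include <limits.h>
#include <stdio.h>

#define NAPPS         3
#define GLOBAL_SHARES 120    /* illustrative; the slides call it global_shares */
#define THRESHOLD     1000   /* reset point for the virtual I/O counters */

static int shares[NAPPS] = { 60, 40, 20 };  /* shares_i assigned per application */
static int vio[NAPPS];                      /* VIO_i, the virtual I/O counters */

/* Pick the queue to dispatch from: the lowest VIO_i wins, then it advances by
 * stride_i = GLOBAL_SHARES / shares_i, so high-share apps are picked more often. */
static int dispatch_request(void) {
    int min = 0;
    for (int i = 1; i < NAPPS; i++)
        if (vio[i] < vio[min]) min = i;

    vio[min] += GLOBAL_SHARES / shares[min];

    if (vio[min] >= THRESHOLD)              /* periodic reset, as on the slides */
        for (int i = 0; i < NAPPS; i++) vio[i] = 0;

    return min;  /* index of the queue whose head request is dispatched */
}

/* Wake-up rule from the slides: a process that slept must not monopolize the
 * disk, so VIO_k = max(min of all non-zero VIO_i, VIO_k). */
static void on_wakeup(int k) {
    int m = INT_MAX;
    for (int i = 0; i < NAPPS; i++)
        if (vio[i] > 0 && vio[i] < m) m = vio[i];
    if (m != INT_MAX && vio[k] < m) vio[k] = m;
}

int main(void) {
    /* With shares 60:40:20, queue 0 is dispatched ~3x as often as queue 2. */
    for (int n = 0; n < 12; n++)
        printf("dispatch from queue %d\n", dispatch_request());
    on_wakeup(2);
    return 0;
}
```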