Virtual Machine Disk Images Introspection
and a bit more...

Vasily Tarasov (SBU)
Dean Hildebrand (IBM Almaden)
Renu Tewari (IBM Almaden)
Erez Zadok (SBU)
File system and Storage Lab (FSL)

Outline
• How all that started
• The idea of introspection
• A couple of results from a 1st prototype
• Future work
• Benchmarking, Filebench

Two important technologies

Virtual Machines (VMs)
- Computational resources consolidation
- Flexible, efficient and scalable
- Hardware support
- Multiple solutions: VMWare, KVM, Xen, ...
- Cloud-way of delivering services

Network Attached Storage (NAS)
- Storage consolidation
- Scalable, manageable and efficient
- NFS/CIFS available on majority of Operating Systems
- NAS sales jumped from $540M in 1998 to $5.1B in 2003
- IBM SONAS

Two technologies…
…and they grow
[Figure labels: Dean, VM, NAS]

How do VM & NAS work together? Can we make them work better?
[Figure labels: VM, IBM SONAS]

Typical Setup
[Diagram: three virtual machine hosts (VMWare, KVM, XEN, ...). Host 1 runs VM 1-1 and VM 1-2, Host 2 runs VM 2-1 and VM 2-2, Host 3 runs VM 3-1 and VM 3-2; each host has an NFS client. NAS side: NFS Server 1 and NFS Server 2; GPFS Nodes 1 to 4, each with two storage units (Storage 1-1 and 1-2 through Storage 4-1 and 4-2).]

Datapath Decomposed
Legend:
- CA: Caching
- RA: Read-Ahead
- RM: Request Mangling and Scheduling
Layers:
- VM Guest: Applications, Virtual File System, On-Disk File System, Block Layer, Controller Driver
- Host: Controller Emulator, NFS Client
- NETWORK
- NAS: NFS Server, Virtual File System, On-Disk File System, Block Layer, Controller Driver
[Diagram: the layers are annotated with CA, RA, and RM markers]

Collecting traces: setup
Workloads:
- Rand/Seq Read
- Rand/Seq Write
- Various I/O sizes
- Multi-file workloads
- Multi-process workloads
- Meta-data intensive
Setup: VMWare ESX4, NFS Server, 1Gbps
Trace points:
- Within VM trace
- VSCSI Layer Trace
- Network Trace
- Block Layer Trace
[Diagram: the user-space workload runs in the VM guest; the network trace, VSCSI layer trace, and block layer trace are marked on the same stack of layers (VM Guest, Host, NETWORK, NAS)]

Some interesting results
I/O sizes change
[Diagram: I/O sizes annotated along the stack, from the guest applications down to the NAS controller driver: 4MB, 4KB, 1MB, 128KB, 32KB, 256KB]
WIOV'11 - Revisiting the Storage Stack in Virtualized NAS Environments

Meta-data Ops become Data Ops

Meta-data ops:
- Update attributes
- List directories
- Creation/deletion
- Lookup
- Access permissions
- Link/Symlink operations

Non-VM case:
    # stat /foo/bar
    sys_stat(/foo/bar)
    NFS_GETATTR(foobar_fh)

VM case:
    # stat /foo/bar
    sys_stat(/foo/bar)
    NFS_READ(dskimg_fh)
    NFS_WRITE(dskimg_fh)

Come up with an idea
Disk Image File (Ext, NTFS, UFS, ...)
What is located in this region?
    READ(dskfh, offset, len)
[Diagram labels: Offset, Size]
NFS Server sees a READ from:
- Inode
- Directory entry
- Data of specific file
- ...
Do intelligent things!

Prototype Results: Find
80% improvement
[Bar chart: find runtime (sec). Non-optimized: 35, Optimized: 7]

Prototype Results: Startup
2.6x faster
130 sec vs. 50 sec

Future work
• Solid implementation
• More efficient cache policies
• Optimizations on the write path
• Analysis of more complex workloads

Virtual Machine Disk Images Introspection
a bit more...

A Recent Study Concluded that…
1. Much of what researchers conclude in their studies is misleading, exaggerated, or flat-out wrong
2. A new claim about a research finding is more likely to be false than true
3. Researchers tend to publish positive results more often than negative findings
4. Chances to be accepted to a conference are higher if the results are "more exciting"
HotOS'11: Benchmarking FS Benchmarking: It is Rocket Science
2005-2008 study by J. Ioannidis
Ioannidis study: which field?
- Medicine
- Biology
- Sociology
- Computer Science
- Physics
5/4/2011

Filebench
• Originally created by Sun Microsystems (RIP)
• Maintained by FSL
• Used in many papers
• Flexible: Workload Model Language (WML)
• Portable: Linux, FreeBSD, Solaris, MacOS, Windows*

Filebench WML

    define fileset name=myfileset,size=16kb,entries=1000

    define process name=reader,instances=1 {
      thread name=readerthread,memsize=10m,instances=10 {
        flowop read name=myread,filesetname=myfileset,iosize=2kb
      }
    }

Filebench for Cloud Services
flowops:
• Reads
• Writes
• Creates
• Deletes
• +20 more sophisticated
[Diagram labels: POSIX, NFS RPC, AFS RPC, Cloud]

Filebench for Virtualized Environments

    define hypervisor name=hpv,type=esx3.1,instances=1 {
      define vm name=hpv,type=windows,instances=5 {
        define process name=reader,instances=1 {
          thread name=readerthread,memsize=10m,instances=10 {
            flowop read name=myread1,filesetname=myfileset,…
          }
        }
      }
    }

Virtual Machine Disk Images Introspection
and a bit more...

Thank you!
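
Backup: sketch of the introspection idea
The "Come up with an idea" slide asks what is located in the region touched by READ(dskfh, offset, len). Below is a minimal Python sketch of one way a server could answer that question, assuming it already holds a map of the disk image layout. It is not the prototype's code: `Region`, `ImageLayout`, `classify`, and the offsets in the example are hypothetical names and values chosen for illustration, and building the map from a real Ext, NTFS, or UFS image is left out.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One byte range of the disk image and what the guest keeps there."""
    start: int   # byte offset inside the disk image file
    length: int  # size of the range in bytes
    kind: str    # e.g. "inode", "directory entry", "file data"
    detail: str  # free-form note, e.g. the guest path the bytes belong to

    @property
    def end(self) -> int:
        return self.start + self.length


class ImageLayout:
    """Hypothetical lookup table; assumes the regions do not overlap."""

    def __init__(self, regions):
        # Sort by start offset so lookups can use binary search.
        self.regions = sorted(regions, key=lambda r: r.start)
        self.starts = [r.start for r in self.regions]

    def classify(self, offset, length):
        """Return the known regions overlapped by READ(dskfh, offset, length)."""
        end = offset + length
        # With non-overlapping regions, only the last region that starts
        # at or before `offset` can reach into the requested range.
        first = max(bisect_right(self.starts, offset) - 1, 0)
        hits = []
        for region in self.regions[first:]:
            if region.start >= end:
                break  # every later region starts past the request
            if region.end > offset:
                hits.append(region)
        return hits


# Made-up layout of a small guest file system inside the image.
layout = ImageLayout([
    Region(4096, 8192, "inode", "inode table"),
    Region(12288, 4096, "directory entry", "/foo"),
    Region(16384, 65536, "file data", "/foo/bar"),
])

# A read of 8 KB at offset 8 KB, as the NFS server would see it.
for region in layout.classify(8192, 8192):
    print(region.kind, region.detail)
```

With such a map, the server can tell a read of inodes or directory entries apart from a read of file data and treat them differently, which is what "Do intelligent things!" refers to.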