vCUDA: GPU Accelerated High Performance Computing in Virtual Machines
Lin Shi, Hao Chen and Jianhua Sun (IEEE, 2009)
Presenter: Hung-Fu Li, HPDS Lab, NKUAS (2009-12-31)

Lecture Outline
- Abstract
- Background
- Motivation
- CUDA Architecture
- vCUDA Architecture
- Experiment Results
- Conclusion

Abstract
This paper describes vCUDA, a GPGPU computing solution for virtual machines. The authors claim that API interception and redirection can give applications transparent, high-performance access to the GPU. The paper also evaluates the overhead introduced by the framework.

Background
- VM (Virtual Machine)
- CUDA (Compute Unified Device Architecture)
- API (Application Programming Interface)
- API interception and redirection
- RPC (Remote Procedure Call)

Motivation
Virtualization may be the simplest answer to heterogeneous computing environments. Hardware varies by vendor, and it is not practical for VM developers to implement drivers for every device, because vendors typically do not publish their source code or kernel-level techniques for licensing reasons.

Motivation (cont.)
Existing virtualization support, such as VMGL, covers only accelerated graphics APIs like OpenGL and is not intended for general-purpose computation.

CUDA Architecture
Component stack:
- User Application (CUDA extensions to C)
- CUDA Runtime API
- CUDA Driver API
- CUDA Driver
- CUDA-enabled Device

vCUDA Architecture
Split the stack into a software binding and a hardware binding:
- Soft binding: User Application and part of the SDK (CUDA extensions to C), CUDA Runtime API
- Hard binding: CUDA Driver API, CUDA Driver, CUDA-enabled Device (communicates directly with the hardware)

vCUDA Architecture (cont.)
Re-group the stack into a remote side and a host side:
- Remote binding (guest OS): User Application and part of the SDK, [v]CUDA Runtime API, [v]CUDA Driver API, [v]CUDA-enabled Device (vGPU)
- Host binding: CUDA Driver API, CUDA Driver, CUDA-enabled Device

vCUDA Architecture (cont.)
The guest-side [v]CUDA Runtime API, [v]CUDA Driver API, and vGPU are fake APIs that act as an adapter between the real driver and the virtual one.
- API interception: record the parameters passed, the call order and semantics, and the hardware state
- Communication: Lazy RPC
- Transmission: XML-RPC as the high-level transport (chosen for cross-platform requirements)

vCUDA Architecture (cont.)
Non-instant API calls issued in the guest OS are batched and sent to the host OS through Lazy RPC; instant API calls are forwarded immediately (both ideas are sketched below).

vCUDA Architecture (cont.)
vCUDA API with a virtual GPU (vGPU):
- Lazy RPC reduces the overhead of switching between the host OS and the guest OS.
- The vGPU tracks hardware state on the guest side.
- Non-instant API invocations from the application are packaged locally; instant API calls go through the vStub to the host-side stub and on to the real GPU.
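To make the interception idea concrete, here is a minimal sketch, not the authors' code, of how a guest-side replacement for the CUDA Runtime library might serialize cudaMalloc and cudaMemcpy and forward them to the host-side stub. The vcuda_msg type and the vcuda_pack*/vcuda_rpc_call/vcuda_result_* helpers are hypothetical placeholders standing in for the XML-RPC transport.

```c
/* A minimal sketch of guest-side API interception: a replacement CUDA
 * Runtime library that serializes calls and forwards them to the host-
 * side stub.  vcuda_pack*(), vcuda_rpc_call() and vcuda_result_*() are
 * hypothetical placeholders for the XML-RPC transport, not a real API. */
#include <stddef.h>
#include <stdint.h>

typedef int cudaError_t;                      /* simplified return type */
enum cudaMemcpyKind { cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost };

typedef struct vcuda_msg vcuda_msg;           /* opaque serialized call */
vcuda_msg *vcuda_pack(const char *api);                 /* hypothetical */
void vcuda_pack_u64(vcuda_msg *m, uint64_t v);          /* hypothetical */
void vcuda_pack_bytes(vcuda_msg *m, const void *p, size_t n);
vcuda_msg *vcuda_rpc_call(vcuda_msg *m);      /* send, wait for reply   */
cudaError_t vcuda_result_err(const vcuda_msg *r);
uint64_t vcuda_result_u64(const vcuda_msg *r);
void vcuda_result_bytes(const vcuda_msg *r, void *dst, size_t n);

/* Intercepted cudaMalloc: the allocation happens on the host GPU; the
 * guest only stores the opaque device pointer returned by the stub. */
cudaError_t cudaMalloc(void **devPtr, size_t size)
{
    vcuda_msg *m = vcuda_pack("cudaMalloc");
    vcuda_pack_u64(m, (uint64_t)size);
    vcuda_msg *r = vcuda_rpc_call(m);    /* instant call: one round trip */
    *devPtr = (void *)(uintptr_t)vcuda_result_u64(r);
    return vcuda_result_err(r);
}

/* Intercepted cudaMemcpy: host-to-device copies ship the source buffer
 * with the request; device-to-host copies read data from the reply. */
cudaError_t cudaMemcpy(void *dst, const void *src, size_t count,
                       enum cudaMemcpyKind kind)
{
    vcuda_msg *m = vcuda_pack("cudaMemcpy");
    vcuda_pack_u64(m, (uint64_t)count);
    vcuda_pack_u64(m, (uint64_t)kind);
    if (kind == cudaMemcpyHostToDevice) {
        vcuda_pack_u64(m, (uint64_t)(uintptr_t)dst);  /* device pointer */
        vcuda_pack_bytes(m, src, count);              /* payload        */
    } else {
        vcuda_pack_u64(m, (uint64_t)(uintptr_t)src);  /* device pointer */
    }
    vcuda_msg *r = vcuda_rpc_call(m);
    if (kind == cudaMemcpyDeviceToHost)
        vcuda_result_bytes(r, dst, count);            /* copy reply data */
    return vcuda_result_err(r);
}
```

On the host side, a matching stub would decode each message, issue the real CUDA Runtime call, and send the result back.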
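The Lazy RPC optimization can be sketched in the same spirit: non-instant calls, whose results the application does not need right away (for example the cudaConfigureCall/cudaSetupArgument/cudaLaunch sequence of a kernel launch), are queued on the guest side, and the whole queue is flushed in a single round trip when an instant call such as a device-to-host cudaMemcpy arrives. The queue layout, the BATCH_MAX limit, and the function names below are illustrative assumptions rather than the paper's implementation.

```c
/* A minimal sketch of Lazy RPC batching on the guest side.  Non-instant
 * API calls are queued locally; the queue is flushed in one RPC round
 * trip when an instant call (one whose result is needed immediately) is
 * issued.  All names and the queue layout are illustrative assumptions. */
#include <stdio.h>
#include <string.h>

#define BATCH_MAX 64

typedef struct {
    char api[48];  /* serialized call name; real code would carry arguments */
} queued_call;

static queued_call batch[BATCH_MAX];
static int batch_len = 0;

/* Placeholder for the real XML-RPC transmission to the host-side stub. */
static void rpc_transmit(const queued_call *calls, int n, const char *trigger)
{
    (void)calls;
    printf("RPC round trip: %d batched call(s), trigger: %s\n", n, trigger);
}

/* Non-instant call: just record it; no guest/host switch happens yet. */
static void lazy_call(const char *api)
{
    if (batch_len == BATCH_MAX) {             /* queue full: flush early */
        rpc_transmit(batch, batch_len, "batch full");
        batch_len = 0;
    }
    strncpy(batch[batch_len].api, api, sizeof(batch[batch_len].api) - 1);
    batch[batch_len].api[sizeof(batch[batch_len].api) - 1] = '\0';
    batch_len++;
}

/* Instant call: the caller needs a real result, so ship the pending
 * batch together with this call in a single round trip. */
static void instant_call(const char *api)
{
    rpc_transmit(batch, batch_len, api);
    batch_len = 0;
}

int main(void)
{
    /* Typical kernel-launch sequence: only the final copy back to the
     * host forces a switch between the guest OS and the host OS. */
    lazy_call("cudaConfigureCall");
    lazy_call("cudaSetupArgument");
    lazy_call("cudaLaunch");
    instant_call("cudaMemcpy(DeviceToHost)");
    return 0;
}
```

Run as written, this prints a single round trip carrying three batched calls plus the memcpy, which is the guest/host switching overhead the slides describe Lazy RPC as saving.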
Experiment Results
Evaluation criteria:
- Performance
- Lazy RPC and concurrency
- Suspend & resume
- Compatibility
Benchmarks:
- MV: matrix-vector multiplication
- StoreGPU: Exploiting Graphics Processing Units to Accelerate Distributed Storage Systems
- MRRR: Multiple Relatively Robust Representations
- GPUmg: molecular dynamics simulation with GPU

Conclusion
- The authors developed a CUDA interface for virtual machines that is compatible with the native interface.
- Data transmission is a significant bottleneck, mainly because of XML parsing in the RPC layer.
- This presentation has briefly presented the main architecture of vCUDA and the ideas behind it.
- The architecture could be extended as a component or solution for bringing GPU support to cloud computing.

End of Presentation
Thank you for listening.