Cloud Computing Technology and Science, 11/30 – 12/3, 2010.

On-Demand Virtual Cluster in Cloud Web-Based OS Environment

Hsi-En Yu, Chia-Yen Liu, Yi-Lun Pan, Chang-Hsing Wu, Hui-Shan Chen and Weicheng Huang
National Center for High-performance Computing, Taiwan, R.O.C.
{yun, chris, serenapan, hsing, chwhs, whuang}@nchc.org.tw

Abstract
Cloud computing environments raise many important issues, including information security, virtual computing resource management, routing, and fault tolerance. Among these, virtual computing resource management has emerged as one of the most important in the past few years. As virtualization technologies become more prevalent, every Cloud user faces the problem of building his/her own virtual cluster through virtual resource management interfaces that are not user-friendly. To help resolve this issue, the Distributed Computing Team at the National Center for High-performance Computing (NCHC) has developed an On-Demand Virtual Cluster feature for the Cloud Web-Based OS. The On-Demand Virtual Cluster system incorporates autonomic computing and can reconfigure itself to adapt to changes in the Cloud environment. It can also discover, diagnose, and monitor Cloud computing resources automatically. In addition, an On-Demand Virtual Cluster widget embedded in the Cloud WebOS has been developed. This extremely lightweight approach to acquiring virtual computing services provides a mechanism for creating virtual clusters dynamically. The approach leverages virtualization techniques combined with a cluster queuing system and a load-balancing migration mechanism.

Keywords: Virtualization Techniques, WebOS, Virtual Cluster.

I. Implementation and System Architecture
By integrating virtualization technologies and a Web-based Operating System (WebOS), we have developed an approach to acquiring Cloud services via Cloud Widgets in the Cloud WebOS environment.
Currently, the approach integrates the Cloud, the WebOS, and virtual machines to build a virtual computer in a distributed computing environment [1]. This helps lower the barrier to using Cloud computing environments. A Cloud WebOS is needed to provide Cloud users with an interface that is both user-friendly and straightforward. To develop an autonomic virtual computing resource management system based on a decentralized resource discovery architecture, we propose the On-Demand Virtual Cluster system based on the Cloud WebOS. This research focuses on virtual resource management through an interactive graphical user environment. In the Cloud WebOS environment, upon receiving a Cloud job request from an end user via a Web browser, the system provides a lightweight approach to acquiring Cloud services via Cloud Widgets, which in turn connect to the Image Creator Widget, the Virtual Machine (VM) Creator Widget, the VM Monitor Widget, and the VM Control Widget within the On-Demand Virtual Cluster system, as shown in Figure 1. The proposed system helps create, automatically, the computing resources best adapted to the end users' demands. The widgets of the On-Demand Virtual Cluster system also drive the Cloud middleware to operate the physical computing resources and storage.

Figure 1. The System Architecture of On-Demand Virtual Cluster in Cloud WebOS

Meanwhile, as sketched in Figure 2, end users can easily connect to the Application Pool via the Cloud WebOS to obtain software services such as nanometer MOS device simulation. Beyond the Application Pool, the Cloud WebOS can also integrate public service providers such as Amazon EC2 [2]. Upon receiving a Cloud job request via the Cloud WebOS, the On-Demand Virtual Cluster communicates with three Cloud middleware components: Data Broker, Monitoring & Reporting, and Dynamic Provisioning.

Figure 2. The Application of On-Demand Virtual Cluster

The Data Broker collects data from the distributed physical sensors. Monitoring & Reporting is responsible for monitoring the status of physical and virtual machines. Finally, Dynamic Provisioning provides automatic resource allocation and dynamic load prediction, which improves the performance of dynamic scheduling over conventional scheduling policies.

The NCHC Distributed Computing team not only built the Cloud WebOS platform using the EyeOS framework [3] but also incorporated self-developed Cloud Widgets into it. The On-Demand Virtual Cluster system therefore focuses on leveraging virtualization techniques combined with the WebOS.

II. Research Results - The Designed Cloud Widgets
In addition to the basic widgets, we have also attempted more advanced Cloud Computing widgets. One of the main results of this paper is a set of Cloud Widgets with a friendly graphical user interface, in particular the On-Demand Virtual Cluster system in WebOS. As shown in Figure 3, the kernel of this system architecture consists of four widgets: the Image Creator Widget, the VM Creator Widget, the VM Monitor Widget, and the VM Control Widget. Users can manage all of these widgets with little learning effort. The widgets also allow users to customize and arrange complex computing tasks according to their requirements.

Figure 3. Cloud Widgets: Connecting users to NCHC’s Cloud Resources

The Image Creator Widget, shown in Figure 4, generates a customized base image and on-demand or specified applications according to the end users' requirements. This widget provides a complete, integrated HPC software stack, including the operating system, management tools, a resource monitor, and even commercial packages such as MATLAB.

Figure 4. Image Creator Widget

Given the virtual cluster profile requested by the user, the VM Creator Widget generates a specification, shown in Figure 5, which is then parsed by the VM Creator engine to create an On-Demand Virtual Cluster on the physical computing resources.

Figure 5. VM Creator Widget

As shown in Figure 6, the main task of the VM Monitor Widget is to monitor the status of the virtual machines, the networks, and the physical hardware. This widget makes use of the information and status provided by the Monitoring & Reporting Cloud middleware. The VM Control Widget integrates a Cloud visualizer as the core of its Cloud service, as shown in Figures 7 and 8.

Figure 6. VM Monitor Widget
Figure 7. VM Control Widget Cloud Visualizer – Linux Booting Status
Figure 8. VM Control Widget Cloud Visualizer – Windows 7 Booting Status

III. Experimental Results
Before the experiments, we set up the multi-site physical computing environment, KVM virtual machines, the Network Speed Test [4], and the Disk I/O test tool bonnie++ [5]. Table 1 lists the experiment parameters used in our experimental environments, and Table 2 summarizes the environment characteristics of the NCHC computing resources.

Table 1. Experiment Parameter Statement
Parameter | Meaning
R         | Number of real nodes
V         | Number of virtual machines
RV        | Number of virtual machines created in each real node

Table 2. Summary Environment Characteristics of NCHC Computing Resources
Resource | CPU Model                           | Memory (GB) | CPU Speed (MHz) | #CPUs | Nodes | Job Manager
Snowfox  | Intel(R) Xeon(R) CPU E5420, 2.5 GHz | 16          | 2500            | 112   | 14    | Torque

In the following, we discuss the experimental results for three scenarios: the performance of Network I/O, the performance of Disk I/O, and the performance of a Message Passing Interface (MPI) program on a VM cluster. To improve performance, we use the Virtio driver in the virtual machines; Virtio provides paravirtualized network and disk I/O. Solutions of Laplace's equation are important in many fields of science, such as fluid dynamics, astronomy, and notably electromagnetism [6]. We therefore adopt the Laplace equation for heat conduction, implemented as an MPI matrix computation, as our benchmark for evaluating performance.

As shown in Figure 9, without Virtio the network speed is limited to about 166 Mb/s, because the I/O bottleneck lies between the virtual machine and the hypervisor. Our proposed system therefore activates Virtio, with which the Network I/O performance is nearly the same as that of the native machine.

Figure 9. The Performance of Network I/O

In the Disk I/O scenario, we compared the performance with and without Virtio. As shown in Figure 10, Virtio improves write performance by about 120% and read performance by about 20%.

Figure 10. The Performance of Disk I/O

In the scenario of an MPI program on virtual machines, we study the performance degradation caused by virtualization, as shown in Figure 11. We take as an example an MPI job that needs 8 cores. Each VM runs on its own real node, the number of processor cores per VM is set to 2, and we vary R and V. In cluster computing, if real nodes and virtual machines are to execute a parallel application together, the possible performance degradation due to virtualization must be understood. As the number of VMs increases, the turnaround time of the executed job also increases; the overhead reaches 11.73% when only virtual machines are used. To reduce the overhead, we activate Virtio in the virtual machines, which lowers the overhead to 6.56%.

Figure 11. The Degree of Virtualization with Virtio and without Virtio

We also study the performance of different distributions of VMs across real nodes, as shown in Figures 12 and 13. The total number of VMs is fixed at 8, the number of processor cores per VM is set to 1, and we vary R and RV. The purpose of this experiment is mainly to give users guidance on distributing virtual machines across real nodes when creating a VM cluster. The results show that the best performance is obtained with "R8 RV1" (8 virtual machines on 8 real nodes). As the number of virtual machines on a single real node increases, the turnaround time of the executed job also increases. The overhead reaches 24.61% when all 8 virtual machines are created on a single real node, because a single real node has limited memory and CPU power; overall performance drops sharply once the available resources of the real node are exhausted.

Figure 12. The Distribution of VMs in Real Nodes (1/2)
Figure 13. The Distribution of VMs in Real Nodes (2/2)

IV. Conclusion
The proposed toolkit, the On-Demand Virtual Cluster system in WebOS, provides Cloud users with an interface that is both user-friendly and straightforward. In this project, we combine the WebOS platform with Cloud computing resources to offer users a friendlier Cloud environment. The On-Demand Virtual Cluster in Cloud Web-Based OS Environment not only helps users build virtual clusters easily and automatically but also provides a variety of computing environments, such as Linux and Windows 7. Distributing and balancing the workload across multiple physical and virtual computing resources will be addressed in future development of this research.

References
[1] "Virtual Machine Hosting for Networked Clusters: Building the Foundations for 'Autonomic' Orchestration", Laura Grit, David Irwin, Aydan Yumerefendi, and Jeff Chase.
    In the First International Workshop on Virtualization Technology in Distributed Computing (VTDC), November 2006.
[2] http://aws.amazon.com/ec2/
[3] http://eyeos.org/
[4] http://vmstudy.blogspot.com/2010_04_01_archive.html
[5] http://www.coker.com.au/bonnie++/
[6] http://en.wikipedia.org/wiki/Electromagnetism