Rain: Dynamically Provisioning Clouds within FutureGrid
Geoffrey Fox, Andrew J. Younge, Gregor von Laszewski, Archit Kulshrestha, Fugang Wang
Pervasive Technology Institute, Indiana University, Bloomington, IN USA

Introduction
This work presents a novel method for dynamically provisioning Cloud services onto HPC resources provided through the FutureGrid (FG) project. Dynamic provisioning enables scientific researchers to build advanced platforms and services on FutureGrid infrastructure and to leverage the power of HPC in ways that were previously impossible with traditional Grid infrastructure. The Runtime Adaptable INsertion (RAIN) service lets researchers use Cloud infrastructure to deploy their own platform and operating environment with ease, something that was not possible on previously available HPC systems.

Design
RAIN Service Architecture
Research has shown that entire resources can be re-provisioned in a very short time, which justifies using RAIN given the added benefits and ease of use it offers researchers. This advancement may finally lower the barrier to entry into HPC for the many scientists whose computational requirements were large but whose means were too small to take advantage of HPC resources.

Dynamic Provisioning
Dynamic provisioning is a key building block in the overall design of the FutureGrid project. Within FutureGrid, provisioning or "raining" means deploying both infrastructure and platforms on demand onto commodity hardware, according to each user's requirements for the software stack and operating environment that suits them best. Dynamic provisioning is pivotal to making the FG deployment unique and attractive to scientific researchers who require high performance computing today. In essence, we host both infrastructure and platforms within a service-oriented architecture.
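A minimal sketch of how a user's "rain" request might be turned into an ordered list of provisioning steps, assuming a bottom-up layering: a bare-metal OS image first, then optionally an infrastructure service (such as Nimbus or Eucalyptus), then optionally a platform (such as Hadoop or Twister). `RainRequest`, `plan_rain`, and the supported-software catalogue are hypothetical names chosen for illustration; they are not RAIN's actual interface.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical catalogue of what can be "rained". The software names come
# from the text; the catalogue itself is an illustrative assumption.
SUPPORTED_OS = {"rhel5", "centos5", "fedora"}
SUPPORTED_IAAS = {"nimbus", "eucalyptus"}
SUPPORTED_PAAS = {"hadoop", "twister"}


@dataclass
class RainRequest:
    """A user's description of the environment they want (hypothetical)."""
    os_image: str
    nodes: int
    iaas: Optional[str] = None  # e.g. "nimbus"; None means no IaaS layer
    paas: Optional[str] = None  # e.g. "hadoop"; None means no PaaS layer


def plan_rain(req: RainRequest) -> List[str]:
    """Return provisioning steps ordered from the bottom layer upward."""
    if req.nodes < 1:
        raise ValueError("at least one node is required")
    if req.os_image not in SUPPORTED_OS:
        raise ValueError(f"unsupported OS image: {req.os_image}")
    # Layer 1: bare-metal operating system on every requested node.
    steps = [f"deploy {req.os_image} on {req.nodes} node(s)"]
    # Layer 2 (optional): infrastructure service managing the nodes.
    if req.iaas is not None:
        if req.iaas not in SUPPORTED_IAAS:
            raise ValueError(f"unsupported IaaS: {req.iaas}")
        steps.append(f"start {req.iaas} head node and register nodes")
    # Layer 3 (optional): platform installed across the nodes.
    if req.paas is not None:
        if req.paas not in SUPPORTED_PAAS:
            raise ValueError(f"unsupported PaaS: {req.paas}")
        steps.append(f"install and start {req.paas} across the nodes")
    return steps
```

For example, `plan_rain(RainRequest("centos5", 4, paas="hadoop"))` yields a two-step plan: deploy CentOS 5 on four nodes, then start Hadoop on them. Validating each layer before emitting its step keeps an unsupported request from being partially provisioned.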
We define raining infrastructure as the rapid deployment of Infrastructure as a Service (IaaS), which lets users gain access to, and fully manage, a compute infrastructure suited to their needs. Most commonly this means managing a set of virtual machine images; virtualization allows a datacenter to host many virtual servers on the same physical hardware. Examples of IaaS within FG are the Nimbus and Eucalyptus clouds. Raining platforms, or Platform as a Service (PaaS), goes a step further: it delivers to users an integrated computing platform and/or solution stack that supports the development of cloud applications. The platform thus builds on the underlying infrastructure, reducing the cost and complexity of developing software compared with working on a bare cloud infrastructure. Examples of PaaS deployments that FG plans to support include Hadoop, Twister, and possibly other message queue systems as demand rises.

Implementation Discussion
Providing dynamic provisioning requires integrating several tools, rather than a single one, into a seamless experience for the user. We have identified xCAT as the best fit for bare-metal OS deployment. With xCAT we can provision a wide array of operating systems on the available resources, giving each user the environment they want with ease. While xCAT is remarkably well suited to OS deployment, an additional layer is needed to manage the provisioning of those resources and to schedule work onto them. Adaptive Computing's Moab suite provides a higher-level queue that accepts jobs and directs xCAT to provision resources so that the demands of the queue are met. Combining Moab and xCAT with our own RAIN services could provide dynamic provisioning and adaptation of resources within a particular site-wide deployment on FutureGrid.
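The division of labor between a job queue (the role Moab plays) and a bare-metal provisioner (the role xCAT plays) can be sketched as a small simulation. This is a hypothetical illustration, not Moab's actual scheduling policy or xCAT's API: `Node`, `Job`, `schedule`, and the `provision` callback are invented names, and the policy (walk the queue in order, prefer idle nodes already running the requested image, reprovision other idle nodes only when necessary) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Node:
    name: str
    image: str          # OS image currently installed on the node
    busy: bool = False


@dataclass
class Job:
    job_id: str
    image: str          # OS image the job needs
    nodes: int          # number of nodes requested


def schedule(queue: List[Job], nodes: List[Node],
             provision: Callable[[Node, str], None]) -> Dict[str, List[str]]:
    """Start queued jobs in queue order, reprovisioning idle nodes as needed.

    Jobs that do not fit are skipped and stay in the queue, so smaller jobs
    behind them may still start. `provision` is a hypothetical stand-in for a
    bare-metal provisioner such as xCAT.
    """
    started: Dict[str, List[str]] = {}
    for job in list(queue):  # iterate over a copy so we can remove from queue
        idle = [n for n in nodes if not n.busy]
        if len(idle) < job.nodes:
            continue  # not enough free nodes; the job keeps waiting
        # Stable sort: nodes already running the right image come first,
        # so they are used before any node that would need a reinstall.
        idle.sort(key=lambda n: n.image != job.image)
        chosen = idle[:job.nodes]
        for node in chosen:
            if node.image != job.image:
                provision(node, job.image)  # e.g. reinstall and reboot
                node.image = job.image
            node.busy = True
        started[job.job_id] = [n.name for n in chosen]
        queue.remove(job)
    return started
```

With nodes `n1` and `n2` running CentOS 5 and `n3` running Fedora, a queue holding a two-node Fedora job followed by a one-node CentOS 5 job starts both: the Fedora job takes `n3` and `n1` (only `n1` is reprovisioned), and the CentOS 5 job takes `n2` without a reinstall. Preferring matching images keeps reprovisioning, and the reboot it implies, to a minimum.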
Because a wide variety of UNIX-based operating systems are created and used, a configuration management system is needed to keep everything working properly. Its duties include managing and updating installed software, applying security patches, maintaining configuration files, and adding host keys and certificates on the fly to xCAT-provisioned nodes and newly created virtual machines. This gives both users and system administrators a seamless environment.

With the many virtual clusters and private clouds, a number of head nodes are required to manage each system. Although these head nodes are not computationally intensive, they still need dedicated resources of their own. They include head nodes for Nimbus clouds, Eucalyptus clouds, a PBS queue, and any other user-defined distributed systems.

Note that none of Moab, xCAT, Bcfg2, or the other tools often mentioned by project members can provide the functionality FG needs on its own. From an implementation perspective, each provides part of that functionality, and together they can be combined to build the FG architecture. These tools are the building blocks of our RAIN service.

The dynamic provisioning software architecture was deployed on a FutureGrid test platform called Gravel as well as on the production Sierra and India clusters. On Gravel, the dynamic provisioning scenario was tested with VirtualBox virtual machines. The xCAT VirtualBox plugin handled power control of the VMs, and the Moab Service Manager's xCAT plugin was modified to support VirtualBox VMs. The resulting setup emulated real FutureGrid infrastructure and provided an ideal development platform. The system was tested with various RHEL5, CentOS5, and Fedora images, each installed in both stateful and stateless modes, to obtain preliminary performance results.
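To frame these preliminary measurements, the time for a stateless node to become ready can be approximated as the image transfer time (image size divided by network bandwidth) plus the boot time. The function below is a back-of-the-envelope sketch under that assumption. An optional parameter models a "satellite" arrangement in which part of the software is mounted read-only rather than copied up front. The model deliberately ignores network contention when many nodes fetch images at once and the cost of reading mounted software on demand; its name and parameters are hypothetical.

```python
def time_to_ready(image_mb: float, bandwidth_mb_s: float, boot_s: float,
                  mounted_read_only_mb: float = 0.0) -> float:
    """Estimate seconds until a stateless node can accept jobs.

    ready time ~= (bytes copied before boot) / bandwidth + boot time.
    Software mounted read-only is assumed not to be copied up front, so only
    the remaining core image counts toward the transfer.
    """
    if bandwidth_mb_s <= 0:
        raise ValueError("bandwidth must be positive")
    if not 0 <= mounted_read_only_mb <= image_mb:
        raise ValueError("read-only portion must lie within the image size")
    transferred_mb = image_mb - mounted_read_only_mb
    return transferred_mb / bandwidth_mb_s + boot_s
```

With illustrative inputs (not FutureGrid measurements) of a 1000 MB image, 100 MB/s, and a 60 s boot, the estimate is 70 s; mounting 800 MB of that image read-only reduces it to 62 s. The model makes explicit why image size dominates when boot time is short relative to transfer time.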
In a stateless setup, the time until a node is provisioned and ready to accept jobs depends on the time needed to transfer the root image over the network plus the time the node takes to boot with that image. When similar images were deployed in stateless and stateful modes, we found no measurable difference in boot times, and results varied from test to test. Image size accounts for a large part of the boot time in both cases. We plan further tests with smaller satellite images, in which the core image is small and most tools and software are mounted read-only onto it, to determine the best mode of deployment.

Process View

About FutureGrid
FutureGrid is an NSF-funded project that provides an experimental platform accommodating batch, grid, and cloud computing, allowing researchers to attack a range of research questions associated with optimizing, integrating, and scheduling the different service models. FutureGrid will provide a significant new experimental grid and cloud computing test-bed to the research community, together with user support for third-party researchers conducting experiments on FutureGrid. The test-bed includes a geographically distributed set of heterogeneous computing systems, a data management system that will hold both metadata and a growing library of software images, and a dedicated network allowing isolatable, secure experiments. For more information, please visit our website at: http://futuregrid.org