FutureGrid Image Management and Rain
Presenters: Javier Diaz, Gregor von Laszewski
https://portal.futuregrid.org
Science Cloud Summer School 2012

Motivation
• FutureGrid (FG) is a testbed providing users with grid, cloud, and high performance computing resources
• One of the goals of FutureGrid is to provide a testbed to perform experiments in a reproducible way across different infrastructures
• We need mechanisms to ease the use of these infrastructures
• The FG Rain and Image Management frameworks allow users to easily create customized environments by placing suitable images onto the FG resources

Rain
• In FG, dynamic provisioning goes beyond the services offered by common scheduling tools that provide such features
• We want to easily provide custom HPC environments, cloud environments, or virtual networks on demand
• Example: "rain" a Hadoop environment into a set of machines
  – fg-rain -n 8 --hadoop -j myHadoopApp.jar …
  – Users and administrators do not have to set up the Hadoop environment, as it is done for them
• Makes use of the Image Management Framework

Architectural Overview
[Figure: architecture diagram with RAIN on top of the Image Management framework. Image Management Client: Portal, FG Shell, API. Image Management Server: Image Generation, Image Repository, Image Registration, Image Instantiation. External services: Chef, security tools. Target IaaS and bare-metal HPC infrastructures: Cloud IaaS frameworks (Nimbus, Eucalyptus, AWS, OpenNebula, OpenStack) and HPC clusters (bare metal).]

Image Management
• Key component in any modern compute infrastructure (virtualized or non-virtualized)
• Processes that are part of the image management life-cycle:
  (a) Creating and customizing images: the user selects properties and software stack features meeting his/her requirements
  (b) Storing images: abstract image repository
  (c) Registering images: adapting the images
  (d) Instantiating images: Nimbus, Eucalyptus, OpenStack, OpenNebula, bare metal

FutureGrid Image Management Framework
• The framework provides users with the tools needed to ease image management across infrastructures
• Users choose the software stacks of their images and the target infrastructure(s)
• Targets the end-to-end workflow of the image life-cycle
• Creates, stores, registers, and deploys images for both virtualized and non-virtualized resources in a transparent way
• Gives users access to bare-metal provisioning (a departure from typical HPC centers)
  – Users are not locked into the specific computational environment typically offered by HPC centers

Architectural Overview
[Figure: the architecture diagram without RAIN, showing only the Image Management components. Image Management Client: Portal, FG Shell, API. Image Management Server: Image Generation, Image Repository, Image Registration, Image Instantiation. External services: Chef, security tools. Target IaaS and bare-metal HPC infrastructures: Nimbus, Eucalyptus, AWS, OpenNebula, OpenStack, bare-metal HPC clusters.]

Image Generation
• Creates images according to the user's specifications:
  – OS type and version
  – Architecture
  – Software packages
• Software installation may be aided by Chef
• Images are not aimed at any specific infrastructure
• The image is stored in the repository or returned to the user
[Figure: image generation workflow. Command line tools send the requirements (OS, version, hardware, ...). If a matching base image exists in the repository, it is retrieved. Otherwise the Image Generation server generates one from a base OS VM on OpenNebula (CentOS 5 x86_64, CentOS 6 x86_64, Ubuntu 12 x86). Base software, FG software, cloud software, and user software are then installed. The image is updated, and the resulting user's image is stored in the Image Repository.]

Image Repository
• Service to query, store, and update images
• Unique interface to store various kinds of images for different systems
• Images are augmented with metadata, which is maintained in a searchable catalog
• Keeps usage data to assist performance monitoring and accounting
• Independent of the storage back-end: it supports a variety of back-ends, and new plugins can easily be created

Image Metadata
Image metadata:
  Field Name    Description
  imgId         Image's unique identifier
  owner         Image's owner
  os            Operating system
  description   Description of the image
  tag           Image's keywords
  vmType        Virtual machine type
  imgType       Aim of the image
  permission    Access permission
  imgStatus     Status of the image
  createdDate   Upload date
  lastAccess    Last time the image was accessed
  accessCount   Number of times the image has been accessed
  size          Size of the image
User metadata:
  Field Name    Description
  userId        User's unique identifier
  fsCap         Disk max usage (quota)
  fsUsed        Disk space used
  lastLogin     Last time the user used the framework
  status        Active, pending, disable
  role          Admin, User
  ownedimg      Number of owned images

Image Registration I
• Adapts and registers images into specific infrastructures
• Two main infrastructure types are considered when adapting the image:
  – HPC: create network-bootable images that can run on bare-metal machines (xCAT/Moab)
  – Cloud: convert the images into VM disks and enable VM contextualization for the selected cloud

Image Registration II
• The user specifies where to register the image
• Optionally, the user can select a kernel from a catalog
• Decides whether an image is secure enough to be registered
• An image only needs to be registered once per infrastructure
[Figure: image registration workflow. Command line tools send the requirements (image, kernel, infrastructure). The user's image is retrieved from the Image Repository and customized for the selected infrastructure (HPC, Eucalyptus, OpenNebula, OpenStack, Nimbus, or Amazon). It then passes a security check, is uploaded to the infrastructure, and is registered there. At that point it is ready for instantiation.]

FutureGrid Image Management and Rain Examples

Starting to use the software
• Requirements
  – FutureGrid portal account
  – Accounts in the infrastructures you want to use (Eucalyptus, OpenStack, Nimbus, HPC)
  – Request an account to use the Image Management and Rain software
• The software is installed on the India login node
  – ssh jdiaz@india.futuregrid.org
• Load the FutureGrid software
  – module load futuregrid

Generate an Image
• fg-generate -u jdiaz -o centos -v 5 -a x86_64 -s python26,wget
[Figure: (1) generate image request; (2) deploy a VM and generate the image; (3) store the image in the repository or return it to the user.]
Client output:
    Image generator client...
    Please insert the password for the user jdiaz
    Password:
    Selected Architecture: x86_64
    Connecting server: i120:567913
    Your image request is in the queue to be processed
    ------wait here if too many requests are being processed------
    Your image request is being processed
    Generating the image
    ------wait here until finished------
    Your image has be uploaded in the repository with ID=915678426632408832461797
    The image and the manifest generated are packaged in a tgz file.
    Please be aware that this FutureGrid image does not have kernel and fstab. Thus, it is not built for any deployment type. To deploy the new image, use the IMDeploy command.

Image Repository Examples
• Query the image repository
  – fg-repo -u jdiaz -q "* where os=centos_5"
    Authentication OK
    2 items found
    imgId=215369546596144595085417, os=centos_5, arch=x86_64, owner=jdiaz, description=None, tag=jdiaz2699012769, vmType=none, imgType=machine, permission=private, status=available
    imgId=68725515834828774883357, os=centos_5, arch=x86_64, owner=jdiaz, description=None, tag=jdiaz1786816389, vmType=none, imgType=machine, permission=private, status=available
• Upload an image
  – fg-repo -u jdiaz -p imagefile.tgz "os=centos & vmtype=kvm & description=my image"
    Checking quota and Generating an ImgId
    Authentication OK
    Uploading image. You may be asked for ssh/passphrase password
    imagefile.tgz 100% 53 0.1KB/s 00:00
    Registering the image
    The image has been uploaded and registered with id 211913675261934066702430

Image Repository Examples (continued)
• Add a user
  – fg-repo -u jdiaz --useradd userId
    Authentication OK
    User created successfully.
    Remember that you still need to activate this user (see setuserstatus command)
• Image usage
  – fg-repo -u jdiaz --histimg
    Authentication OK
    imgId=191563243441508818679593, createdDate(UTC)=2011-10-13 21:43:30, lastAccess(UTC)=2011-10-24 17:37:45, accessCount=16,
    imgId=111462205747829171557134, createdDate(UTC)=2011-10-14 20:36:40, lastAccess(UTC)=2011-10-21 13:48:04, accessCount=4,
    imgId=21870735808909675281040, createdDate(UTC)=2011-10-07 20:36:33, lastAccess(UTC)=2011-10-07 20:36:33, accessCount=0,

Register an Image for HPC
• fg-register -u jdiaz -r 2131235123 -x india
[Figure: registration workflow. The client requests registration of an image from the repository. The server gets the image from the repository, customizes it, and registers it in xCAT (copy files, modify tables). It then registers the image in Moab, recycles the scheduler, and returns information about the image.]
Client output:
    Starting image deployer...
    Please insert the password for the user jdiaz
    Password:
    Connecting to xCAT server
    ------wait here if an image is being registered------
    Authentication OK
    Customizing and registering image on xCAT
    ------wait here until finished------
    Connecting to Moab server
    Your image has been registered in xCAT as centosjavi960524558. Please allow a few minutes for xCAT to register the image before attempting to use it.
    To boot a machine using your image: qsub -l os=<imagename>
    To check the status of the job you can use checkjob and showq commands

Register an Image Stored in the Repository into OpenStack
• fg-register -u jdiaz -r 2131235123 -s india -v ~/novarc
[Figure: registration workflow. The client requests deployment of an image from the repository. The server gets the image from the repository, customizes it, and returns it to the client. The client then uploads the image to the cloud.]
Client output:
    Starting image registration...
    Please insert the password for the user jdiaz
    Password:
    Authentication OK
    ------wait here until finished------
    Retrieving image. You may be asked for ssh/passphrase password
    centos5jdiaz2250444196.img 100% 1496MB 65.0MB/s 00:23
    euca-bundle-image …
    euca-upload-image …
    euca-register …
    IMAGE emi-437C1239
    Your image has been registered on OpenStack with the id emi-437C1239
    To launch a VM you can use euca-run-instances -k keyfile -n <#instances> id
    Remember to load you Eucalyptus environment before you run the instance (source eucarc)
    More information is provided in https://portal.futuregrid.org/tutorials/oss and in https://portal.futuregrid.org/tutorials/eucalyptus

Rain an Image and Execute a Task (bare metal)
• fg-rain -u jdiaz -r 123123123 -x india -j testjob.sh -m 2
[Figure: workflow. The user asks to run the job in an image stored in the repository. The image is registered: it is fetched from the repository, customized, and registered in xCAT (copy files, modify tables). It is then registered in Moab, the scheduler is recycled, and information about the image is returned. Finally the job is submitted with qsub, its status is monitored, the completion status is reported, and the output files are indicated.]
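The last stage of the workflow above (submit with qsub, monitor the job until it completes, then report the output files) amounts to a polling loop. Below is a minimal Python sketch under stated assumptions, not the FG Rain implementation. `wait_for_job` and `get_job_state` are hypothetical names; `get_job_state` stands in for whatever queries the scheduler (for example a wrapper around checkjob). "Completed" is treated as the terminal state only because that is what the example run prints.

```python
import time
from typing import Callable, List, Sequence


def wait_for_job(get_job_state: Callable[[], str],
                 poll_interval: float = 5.0,
                 terminal_states: Sequence[str] = ("Completed",),
                 max_polls: int = 1000) -> List[str]:
    """Poll a job until it reaches a terminal state.

    get_job_state is a HYPOTHETICAL callable standing in for a scheduler
    query (e.g. a wrapper around checkjob); it returns the current state.
    Returns every state observed, ending with the terminal one.
    """
    seen: List[str] = []
    for _ in range(max_polls):
        state = get_job_state()
        seen.append(state)
        print(f"State: {state}")
        if state in terminal_states:
            return seen
        time.sleep(poll_interval)  # wait before asking the scheduler again
    raise TimeoutError(f"job not finished after {max_polls} polls")


if __name__ == "__main__":
    # Simulated scheduler replaying the states from the example run.
    replay = iter(["Idle", "Idle", "Running", "Running", "Completed"])
    wait_for_job(lambda: next(replay), poll_interval=0)
```

Bounding the loop with `max_polls` keeps a stuck job from blocking the client forever. A real client would also read the completion code and output file names once the terminal state is reached.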
Rain an Image and Execute a Task (bare metal): client output
    Starting rain...
    Please insert the password for the user jdiaz
    Password:
    ----- Deploy the image. Same logs as before -----
    Job id is: 200941
    Wait until the job finishes
    State: Idle
    State: Idle
    State: Running
    State: Running
    State: Completed
    Completion Code: 0 Time: Fri Oct 28 15:05:02
    The Standard output is in the file: salida.txt
    The Error output is in the file: jobscript.e200941

Rain a Hadoop Environment in Interactive Mode
• fg-rain -u jdiaz -i ami-00000017 -s india -v ~/OSessexindia/novarc --hadoop --inputdir ~/inputdir1/ --outputdir ~/outputdir/ -m 3 -I
[Figure: (1) start VMs; (2) VMs running; (3) install/configure Hadoop; (4) deploy the Hadoop environment; (5) log the user into the Hadoop master VM.]
Client output:
    Starting Rain...
    Please insert the password for the user jdiaz
    Password:
    Verify that the requested image is in available status or wait until it is available
    Creating temporal sshkey files
    Creating temporal sshkey pair for EC2
    Save private sshkey into a file
    Launching image
    Waiting for running state in all the VMs
    i-00000772:pending
    i-00000773:pending
    i-00000774:pending
    i-00000772:running
    i-00000773:running
    i-00000774:running
    Number of instances booted 3
    Waiting to have access to Instance i-00000772 associated with address server-1906
    Waiting to have access to Instance i-00000773 associated with address server-1907
    Waiting to have access to Instance i-00000774 associated with address server-1908
    All VMs are accessible: True
    Copying temporal private and public ssh-key files to VMs
    Configuring ssh in VM and mounting home directory (assumes that sshfs and ldap is installed)
    Copying temporal private and public ssh-key files to VMs
    Configuring ssh in VM and mounting home directory (assumes that sshfs and ldap is installed)
    Copying temporal private and public ssh-key files to VMs
    Configuring ssh in VM and mounting home directory (assumes that sshfs and ldap is installed)
    Setting up Hadoop environment in the home directory of the user
    Configure Hadoop cluster in the jdiaz home directory
    Starting Hadoop cluster in the jdiaz home directory
    Formatting HDFS
    12/07/10 17:15:49 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = 10.1.2.157/10.1.2.157
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 1.0.2
    STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
    ************************************************************/
    12/07/10 17:15:50 INFO util.GSet: VM type = 64-bit
    12/07/10 17:15:50 INFO util.GSet: 2% max memory = 19.33375 MB
    12/07/10 17:15:50 INFO util.GSet: capacity = 2^21 = 2097152 entries
    12/07/10 17:15:50 INFO util.GSet: recommended=2097152, actual=2097152
    12/07/10 17:15:50 INFO namenode.FSNamesystem: fsOwner=jdiaz
    12/07/10 17:15:50 INFO namenode.FSNamesystem: supergroup=supergroup
    12/07/10 17:15:50 INFO namenode.FSNamesystem: isPermissionEnabled=true
    12/07/10 17:15:50 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
    12/07/10 17:15:50 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
    12/07/10 17:15:50 INFO namenode.NameNode: Caching file names occuring more than 10 times
    … has been successfully formatted.
    12/07/10 17:15:50 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at 10.1.2.157/10.1.2.157
    ************************************************************/
    Starting daemons
    starting namenode, logging to /N/u/jdiaz/hadoopjob764175511/hadoop-1.0.2/libexec/../logs/hadoop-jdiaz-namenode-10.1.2.157.out
    server-1908: starting datanode, logging to /N/u/jdiaz/hadoopjob764175511/hadoop-1.0.2/libexec/../logs/hadoop-jdiaz-datanode-10.1.2.160.out
    server-1907: starting datanode, logging to /N/u/jdiaz/hadoopjob764175511/hadoop-1.0.2/libexec/../logs/hadoop-jdiaz-datanode-10.1.2.159.out
    server-1906: starting secondarynamenode, logging to /N/u/jdiaz/hadoopjob764175511/hadoop-1.0.2/libexec/../logs/hadoop-jdiaz-secondarynamenode-10.1.2.157.out
    starting jobtracker, logging to /N/u/jdiaz/hadoopjob764175511/hadoop-1.0.2/libexec/../logs/hadoop-jdiaz-jobtracker-10.1.2.157.out
    server-1908: starting tasktracker, logging to /N/u/jdiaz/hadoopjob764175511/hadoop-1.0.2/libexec/../logs/hadoop-jdiaz-tasktracker-10.1.2.160.out
    server-1907: starting tasktracker, logging to /N/u/jdiaz/hadoopjob764175511/hadoop-1.0.2/libexec/../logs/hadoop-jdiaz-tasktracker-10.1.2.159.out
    Waiting in the safemode
    Safe mode is OFF
    ------------------------
    You are going to be logged as root, but you can change to your user by executing su <username>
    List of machines are in /root/machines and /N/u/<username>/machines.
    Your real home is in /tmp/N/u/<username>
    Hadoop is in the home directory of your user
    ------------------------
    Warning: Permanently added 'server-1906,10.1.2.157' (RSA) to the list of known hosts.
    [root@10 ~]#
If we exit from the VM:
    Stopping Hadoop Cluster
    stopping jobtracker
    server-1908: stopping tasktracker
    server-1907: stopping tasktracker
    stopping namenode
    server-1908: stopping datanode
    server-1907: stopping datanode
    server-1906: stopping secondarynamenode
    Job Done

Rain a Hadoop Environment and Execute Word Count (1/2)
• As an example, we use the word count application to count the words of several books
• Create a script with the hadoop command (hadoopword.sh):
    hadoop jar $HADOOP_CONF_DIR/../hadoop-examples*.jar wordcount inputdir1 outputdir
• Download the books in txt format:
    $ wget i120/test-image/books-example.tgz
• Uncompress the books:
    $ mkdir ~/inputdir1
    $ tar xvfz books-example.tgz -C ~/inputdir1

Rain a Hadoop Environment and Execute Word Count (2/2)
• Execute rain:
    $ fg-rain -u jdiaz -i ami-00000017 -s india -v ~/OSessexindia/novarc -j ~/hadoopword.sh --hadoop --inputdir ~/inputdir1/ --outputdir ~/outputdir/ -m 3
• Once the job is done:
    $ ls ~/outputdir/outputdir/
    _logs part-r-00000 _SUCCESS
• The output is in the file part-r-00000

Rain a Virtual Cluster
• fg-cluster run -i ami-00000017 -n 3 -t m1.medium -a mycluster
[Figure: (1) start VMs; (2) VMs running; (3) install/configure SLURM; (4) deploy the virtual cluster (one SLURM frontend VM and SLURM compute VMs); (5) log the user into the frontend VM.]

Additional Information
• FG Rain
  – Download: https://github.com/futuregrid/rain
  – Doc: http://futuregrid.github.com/rain/
• FG Cluster
  – Download: https://github.com/futuregrid/virtualcluster
  – Doc: http://futuregrid.github.com/virtual-cluster/
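On small inputs, the result of the word count example (the part-r-00000 file) can be cross-checked locally. The Python sketch below is an illustration and not part of FG Rain or Hadoop. It assumes the output uses Hadoop's default text format: one line per word, holding the word, a tab, and its count, sorted by word. It also assumes words are split on whitespace, which only approximates the tokenization in the wordcount example. The script name wordcount_check.py and both function names are hypothetical.

```python
import sys
from collections import Counter
from typing import Iterable, List


def word_count(lines: Iterable[str]) -> Counter:
    """Count whitespace-separated tokens, approximating Hadoop's wordcount."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def format_like_part_file(counts: Counter) -> List[str]:
    """Render counts as 'word<TAB>count' lines, sorted by word."""
    return [f"{word}\t{n}" for word, n in sorted(counts.items())]


if __name__ == "__main__":
    # Usage (hypothetical script name): python wordcount_check.py ~/inputdir1/*.txt
    total: Counter = Counter()
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            total.update(word_count(fh))
    for out_line in format_like_part_file(total):
        print(out_line)
```

Redirecting the script's output to a file and comparing it with part-r-00000 (for example with diff) should show matching lines for plain ASCII text. Differences in tokenization of unusual whitespace or sort order of non-ASCII words are possible.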