NMCAC Exemplar Knowledge Transfer

1. Describe system features, component functionality, location of components, and hierarchy.
   a. System Features
   b. System Management Hierarchy
   c. System Admin Controller Functionality
   d. Rack Leader Controller (RLC) Functionality
   e. Compute Node Functionality
   f. Login (Service) Node Functionality
   g. Storage Node Functionality
   h. Individual Rack Unit (IRU)
   i. IRU slot IDs
   j. IB Switch Blade
   k. Chassis Management Controller (CMC) and cabling
   l. System Interconnects
   m. NAS Cube Storage System

2. Review TEMPO Cluster Management Software and commands
   a. cpower
   b. cimage
   c. cadmin
   d. discover
   e. discover-rack
   f. configure-cluster

3. Review cluster inventory commands (Cluster Inventory Tool (IVT))
   a. tempo-info-gather

4. Use of ipmitool (Intelligent Platform Management Interface)
   a. System console access
   b. System power reset
   c. System power down
   d. IPMI web interface

5. Identify InfiniBand components and functions.
   a. InfiniBand Basics
   b. IB switch blade hardware
   c. InfiniBand Fabric Management and tools
   d. OpenSM utilities - RLC

6. System imaging and admin
   a. Managing system images with the cimage command
   b. NFS Options
   c. NIS Options
   d. Add Users

Procedure 3-1 Clone a Compute Node Image

To clone a compute node image, perform the following steps:

1. From the system admin controller, create a clone of the compute node image, as follows:

   # cimage --clone-image compute-sles10sp1 new

   When the command completes, the new image is located in /var/lib/systemimager/images/new on the system admin controller.

2. To confirm that the image is now available, run the following command:

   # cimage --list-images
   image: compute-sles10sp1
           kernel: 2.6.16.46-0.12-carlsbad
           kernel: 2.6.16.46-0.12-smp
   image: new
           kernel: 2.6.16.46-0.12-carlsbad
           kernel: 2.6.16.46-0.12-smp

The default RPM lists are located in /etc/opt/sgi/rpmlists on the system admin controller.
SGI recommends that you never change these files. Instead, create your own versions, using the ones supplied by SGI as a base. Note that certain packages must be present in the rpmlist for a given node type. For example, an rpmlist used for compute nodes should include the sgi-compute-node and sgi-cluster packages. Service nodes must have sgi-service-node and sgi-cluster.

Procedure 3-2 Manually Adding a Package to a Compute Node Image

To manually add a package to a compute node image, perform the following steps:

1. Make a clone of the compute node image, as described in "Customizing Software Images" on page 102.

2. Determine which images and kernels are currently available, as follows:

   # cimage --list-images
   image: compute-sles10sp1
           kernel: 2.6.16.46-0.12-carlsbad
           kernel: 2.6.16.46-0.12-smp
   image: compute-sles10sp1-clone
           kernel: 2.6.16.46-0.12-carlsbad
           kernel: 2.6.16.46-0.12-smp

3. From the system admin controller, change directory to the images directory, as follows:

   # cd /var/lib/systemimager/images/

4. From the system admin controller, copy the RPMs you wish to add, as follows:

   # cp /newrpm.rpm new/tmp

5. The new RPMs now reside in the /tmp directory of the image named new. To install them into your new compute node image, run the following command:

   # chroot new bash

   And then run the following:

   # rpm -Uvh /tmp/newrpm.rpm

6. The image on the system admin controller is now updated, but the changes still need to be pushed out. Ensure that no nodes are currently using the image, and then run this command:

   # cimage --push-rack new r\*

   This pushes the updates to the rack leader controllers. The compute nodes see the changes the next time they start up. For information on how to ensure the image is associated with a given node, see the cimage --set command and the example in Procedure 3-3, page 104.

Procedure 3-3 Creating a Simple Compute Node Image Clone

Note: Always work from a clone image. See "Customizing Software Images" on page 102.
To create a simple compute node image clone from the system admin controller, perform the following steps:

1. To clone the compute node image, run the following:

   # cimage --clone-image compute-sles10sp1 compute-sles10sp1-clone

2. To see the images and kernels in the list, run the following:

   # cimage --list-images
   image: compute-sles10sp1
           kernel: 2.6.16.46-0.12-carlsbad
           kernel: 2.6.16.46-0.12-smp
   image: compute-sles10sp1-clone
           kernel: 2.6.16.46-0.12-carlsbad
           kernel: 2.6.16.46-0.12-smp

3. To change the compute nodes to use the cloned image/kernel pair, run the following:

   # cimage --set compute-sles10sp1-clone 2.6.16.46-0.12-smp "r*i*n*"

Procedure 3-4 Manually Adding a Package to the Service Node Image

To manually add a package to the service node image, perform the following steps:

1. Use the mksiimage command to create your own version of the service node image. See "Creating Compute and Service Images Using the mksiimage Command" on page 111 of the Altix ICE System Administrators Guide.

2. Change directory to the images directory, as follows:

   # cd /var/lib/systemimager/images/

3. From the system admin controller, copy the RPMs you wish to add, as follows, where my-service-image is your own service node image:

   # cp /newrpm.rpm my-service-image/tmp

4. The new RPMs now reside in the /tmp directory of the image named my-service-image. To install them into your new service node image, run the following command:

   # chroot my-service-image bash

   And then run the following:

   # rpm -Uvh /tmp/newrpm.rpm

At this point, the image has been updated with the RPM. Note that, unlike compute node images, changes made to a service node image are not seen by service nodes until they are reinstalled with the image. To install the package on running systems, copy the RPM to the running system and run rpm there.
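The copy, install, and push steps of Procedure 3-2 can be collected into a small script. The following is a minimal sketch, not an SGI-supplied tool: the names run, add_rpm_to_image, DRY_RUN, and IMAGES_DIR are hypothetical, and the script assumes the image has already been cloned. It calls chroot with the rpm command directly instead of opening an interactive bash shell inside the image. By default it only prints the commands it would run, so they can be reviewed first.

```shell
#!/bin/sh
# Sketch of Procedure 3-2 (copy RPM into image, install it, push the image).
# All function and variable names here are illustrative, not part of Tempo.

IMAGES_DIR=/var/lib/systemimager/images   # image location given in Procedure 3-1
DRY_RUN=${DRY_RUN:-1}                     # 1 = only print commands (default)

# run CMD [ARG...]: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_rpm_to_image IMAGE RPM_PATH RACK_PATTERN
add_rpm_to_image() {
    image=$1
    rpm_path=$2
    racks=$3
    rpm_file=$(basename "$rpm_path")

    # Step 4: copy the RPM into the image's /tmp directory.
    run cp "$rpm_path" "$IMAGES_DIR/$image/tmp/"
    # Step 5: install the RPM inside the image (non-interactive chroot).
    run chroot "$IMAGES_DIR/$image" rpm -Uvh "/tmp/$rpm_file"
    # Step 6: push the updated image to the rack leader controllers.
    # Quoting "$racks" keeps the shell from expanding the * wildcard.
    run cimage --push-rack "$image" "$racks"
}

# Example from Procedure 3-2: image "new", all racks.
add_rpm_to_image new /newrpm.rpm 'r*'
```

To execute the commands instead of printing them, the script would be run as root on the system admin controller with DRY_RUN=0, after confirming that no nodes are currently using the image, as step 6 of Procedure 3-2 requires.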
cimage Command

The cimage command allows you to list, modify, and set software images on the compute nodes in your system. The cimage command accepts the following options:

Option                          Description
--help                          Usage and help text
--list-images                   Lists images present in the database
--list-nodes RACK ...           Lists what compute nodes are set to
--set IMAGE KERNEL NODE ...     Sets the compute nodes to a certain boot image and kernel combination
--add-db IMAGE                  Adds an image to the database
--del-db IMAGE                  Deletes an image from the database
--push-rack IMAGE RACK ...      Pushes an image to specified rack(s)
--del-rack IMAGE RACK           Deletes an image from specified rack(s)
--clone-image OIMAGE NIMAGE     Clones an existing image to a new image
--del-image IMAGE               Deletes an existing image entirely

RACK arguments take the format rX. NODE arguments take the format rXiYnZ. X, Y, and Z can each be a single digit, a [start-end] range, or * for all matches. An ellipsis (...) indicates that more than one RACK or NODE argument can be passed in.

7. Node failure identification
   a. Identify if a node has failed
   b. Get failure information
   c. Disable the node

8. SGI support procedures
   a. SupportFolio login creation
   b. SGI support hotline information

9. Review SGI formal system training

10. Questions and Answers

Additional Items from Campuses:

1) Is there anything special about creating accounts?
2) Is job submission done with mpirun? If not, how?
3) An hour of time monitoring the running test jobs, so that we have a handle on what to use when operating the system