NMCAC Exemplar Knowledge Transfer

1. Describe system features, component functionality, location of components
and Hierarchy.
a. System Features
b. System Management Hierarchy
c. System Admin controller Functionality
d. Rack Leader Controller (RLC) Functionality
e. Compute Node Functionality
f. Login (Service) Node Functionality
g. Storage Node Functionality
h. Individual Rack Unit (IRU)
i. IRU slot IDs
j. IB Switch Blade
k. Chassis Management Control (CMC) and cabling
l. System Interconnects
m. NAS Cube Storage System
2. Review TEMPO Cluster Management Software and commands
a. Cpower
b. Cimage
c. Cadmin
d. Discover
e. Discover-rack
f. Configure-cluster
3. Review Cluster inventory commands (Cluster Inventory Tool (IVT))
a. Tempo-info-gather
4. Use of the IPMITOOL (Intelligent Platform Management Interface)
a. System console access
b. System power reset
c. System power down
d. IPMI web Interface
5. Identify InfiniBand components and functions.
a. InfiniBand Basics
b. IB switch blade hardware
c. InfiniBand Fabric Management and tools.
d. OpenSM utilities –RLC
6. System Imaging and admin
a. Managing system images with Cimage command
b. NFS Options
c. NIS Options
d. Add Users
Procedure 3-1 Clone a Compute Node Image
To clone a compute node image, perform the following steps:
1. From the system admin controller, create a clone of the compute node image, as
follows:
# cimage --clone-image compute-sles10sp1 new
After that command is complete, you will have a new image located in
/var/lib/systemimager/images/new on the system admin controller.
2. To see that the image is now available, perform the following command:
# cimage --list-images
image: compute-sles10sp1
kernel: 2.6.16.46-0.12-carlsbad
kernel: 2.6.16.46-0.12-smp
image: new
kernel: 2.6.16.46-0.12-carlsbad
kernel: 2.6.16.46-0.12-smp
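The listing can also be checked from a script. The following is a minimal sketch,
not part of the SGI tooling: image_has_kernel is a hypothetical helper name, and it
assumes the "image: NAME" / "kernel: VERSION" layout shown above.

```shell
# Hypothetical helper (not an SGI command): reads `cimage --list-images`
# output on stdin and succeeds if IMAGE is listed with KERNEL.
# Assumes the "image: NAME" / "kernel: VERSION" layout shown above.
image_has_kernel() {
    awk -v img="$1" -v krn="$2" '
        $1 == "image:"  { current = $2 }
        $1 == "kernel:" && current == img && $2 == krn { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

On the system admin controller it could be used as
cimage --list-images | image_has_kernel new 2.6.16.46-0.12-smp
before a node is pointed at the new image.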
The default RPM lists are located in /etc/opt/sgi/rpmlists on the system admin
controller. SGI suggests you never change these files; instead, create your own
versions using the ones supplied by SGI as a base.
Note that certain packages must be in the rpmlist for a given node type.
For example, an rpmlist used for compute nodes should have packages
sgi-compute-node and sgi-cluster. Service nodes must have
sgi-service-node and sgi-cluster.
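A custom rpmlist can be checked for the required packages before it is used. This
is a minimal sketch: check_rpmlist is a hypothetical helper, and it assumes the
rpmlist holds one package name per line, which should be verified against the
files SGI supplies.

```shell
# Hypothetical helper: print each required package that is missing from
# an rpmlist file and return non-zero if any is missing.
# Assumption: the rpmlist has one package name per line.
check_rpmlist() {
    rpmlist=$1; shift
    missing=0
    for pkg in "$@"; do
        # -x: match the whole line, -F: treat the name as a fixed string
        if ! grep -qxF -- "$pkg" "$rpmlist"; then
            echo "missing: $pkg"
            missing=1
        fi
    done
    return "$missing"
}
```

For a compute node list, the call would be
check_rpmlist my-compute.rpmlist sgi-compute-node sgi-cluster
where my-compute.rpmlist stands in for your own copy of an SGI-supplied list.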
Procedure 3-2 Manually adding a Package to a Compute Node Image
To manually add a package to a compute node image, perform the following steps:
1. Make a clone of the compute node image, as described in "Customizing Software
Images" on page 102.
2. Determine what images and kernels you have available now, as follows:
# cimage --list-images
image: compute-sles10sp1
kernel: 2.6.16.46-0.12-carlsbad
kernel: 2.6.16.46-0.12-smp
image: compute-sles10sp1-clone
kernel: 2.6.16.46-0.12-carlsbad
kernel: 2.6.16.46-0.12-smp
3. From the system admin controller, change directory to the images directory, as
follows:
# cd /var/lib/systemimager/images/
4. From the system admin controller, copy the RPMs you wish to add, as follows:
# cp /newrpm.rpm new/tmp
5. The new RPMs now reside in the /tmp directory in the image named new. To install
them into your new compute node image, perform the following commands:
# chroot new bash
And then perform the following:
# rpm -Uvh /tmp/newrpm.rpm
6. The image on the system admin controller is updated. However, you still need to
push the changes out. Ensure there are no nodes currently using the image and
then run this command:
# cimage --push-rack new r\*
This pushes the updates to the rack leader controllers. The compute nodes will
see the changes the next time they start up. For information on how
to ensure the image is associated with a given node, see the cimage --set
command and the example in Procedure 3-3, page 104.
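Steps 4 through 6 above can be collected into one function. This is a sketch under
stated assumptions, not an SGI-supplied script: add_rpm_to_image, run, and DRY_RUN
are hypothetical names, and the interactive chroot shell is replaced by passing
the rpm command directly to chroot. With DRY_RUN left at its default of 1 the
function only prints the commands it would run.

```shell
# Sketch of Procedure 3-2, steps 4-6. Hypothetical helper, not an SGI tool.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
add_rpm_to_image() {
    image=$1; rpm_file=$2; racks=${3:-r*}
    images_dir=/var/lib/systemimager/images
    run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }
    # Step 4: copy the RPM into the image's /tmp directory
    run cp "$rpm_file" "$images_dir/$image/tmp/"
    # Step 5: install it inside the image (chroot runs rpm directly)
    run chroot "$images_dir/$image" rpm -Uvh "/tmp/$(basename "$rpm_file")"
    # Step 6: push the updated image to the rack leader controllers
    run cimage --push-rack "$image" "$racks"
}
```

Running add_rpm_to_image new /newrpm.rpm prints the three commands for review.
Setting DRY_RUN=0 executes them, so confirm first that no nodes are using the image.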
Procedure 3-3 Creating a Simple Compute Node Image Clone
Note: Always work from a clone image; see "Customizing Software Images" on page
102.
To create a simple compute node image clone from the system admin controller,
perform the following steps:
1. To clone the compute node image, perform the following:
# cimage --clone-image compute-sles10sp1 compute-sles10sp1-clone
2. To see the images and kernels in the list, perform the following:
# cimage --list-images
image: compute-sles10sp1
kernel: 2.6.16.46-0.12-carlsbad
kernel: 2.6.16.46-0.12-smp
image: compute-sles10sp1-clone
kernel: 2.6.16.46-0.12-carlsbad
kernel: 2.6.16.46-0.12-smp
3. To change the compute nodes to use the cloned image/kernel pair, perform the
following:
# cimage --set compute-sles10sp1-clone 2.6.16.46-0.12-smp "r*i*n*"
Procedure 3-4 Manually Adding a Package to the Service Node Image
To manually add a package to the service node image, perform the following steps:
1. Use the mksiimage command to create your own version of the service node
image.
See "Creating Compute and Service Images Using the mksiimage
Command" on page 111 of the Altix ICE System Administrators Guide.
2. Change directory to the images directory, as follows:
# cd /var/lib/systemimager/images/
3. From the system admin controller, copy the RPMs you wish to add, as follows,
where my-service-image is your own service node image:
# cp /newrpm.rpm my-service-image/tmp
4. The new RPMs now reside in the /tmp directory in the image named
my-service-image. To install them into your service node image,
perform the following commands:
# chroot my-service-image bash
And then perform the following:
# rpm -Uvh /tmp/newrpm.rpm
At this point, the image has been updated with the RPM. Note that, unlike
compute node images, changes made to a service node image will not be seen by
service nodes until they are re-installed with the image. If you wish to install the
package on running systems, you can copy the RPM to the running system and
install it with rpm there.
cimage Command
The cimage command allows you to list, modify, and set software images on the
compute nodes in your system.
The cimage command accepts the following options:
Option                        Description
--help                        Usage and help text
--list-images                 Lists images present in the database
--list-nodes RACK ...         Lists what compute nodes are set to
--set IMAGE KERNEL NODE ...   Sets the compute nodes to a certain boot image and
                              kernel combination
--add-db IMAGE                Adds an image to the database
--del-db IMAGE                Deletes an image from the database
--push-rack IMAGE RACK ...    Pushes an image to specified rack(s)
--del-rack IMAGE RACK         Deletes an image from specified rack(s)
--clone-image OIMAGE NIMAGE   Clones an existing image to a new image
--del-image IMAGE             Deletes an existing image entirely
RACK arguments take the format rX.
NODE arguments take the format rXiYnZ.
X, Y, Z can be single digits, a [start-end] range, or * for all matches.
... indicates more than one RACK or NODE argument can be passed in.
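The RACK and NODE formats above can be checked before an argument is passed to
cimage. This is a minimal sketch: is_rack_arg and is_node_arg are hypothetical
helpers, and the pattern assumes each of X, Y, and Z is a number of one or more
digits, a [start-end] range, or *.

```shell
# Hypothetical validators for cimage RACK (rX) and NODE (rXiYnZ) arguments.
# Assumption: each of X, Y, Z is one or more digits, a [start-end] range, or *.
part='([0-9]+|\[[0-9]+-[0-9]+\]|\*)'

is_rack_arg() {
    printf '%s\n' "$1" | grep -Eq "^r${part}\$"
}

is_node_arg() {
    printf '%s\n' "$1" | grep -Eq "^r${part}i${part}n${part}\$"
}
```

For example, is_node_arg 'r*i*n*' and is_rack_arg 'r[1-2]' both succeed, while
is_node_arg 'r1' fails because the IRU and node parts are missing. Quote the
argument so the shell does not expand * or [ ] before cimage sees it.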
7. Node Failure identification
a. Identify if a node has failed
b. Get failure information
c. Disable the node
8. SGI support procedures
a. SupportFolio login creation
b. SGI support hotline information
9. Review SGI formal system training
10. Questions and Answers
Additional Items from Campuses:
1) Anything special about creating accounts
2) Is the process for job submission with mpirun? If not how?
3) An hour of time monitoring the test jobs as they run, so that we have a handle on
what to use when operating the system