Platform HPC 3.2 Integration with Xeon Phi

IBM Platform HPC:
Best Practices for integrating with
Intel® Xeon Phi™ Coprocessors
Date: August 22, 2013 (Revision 1.3)
Author: Gábor Samu (gsamu@ca.ibm.com)
Reviewer: Mehdi Bozzo-Rey (mbozzore@ca.ibm.com)
I. Background
II. Infrastructure Preparation
III. Intel Software tools deployment
IV. IBM Platform HPC: Xeon Phi monitoring, workloads
    A. IBM Platform HPC built-in Xeon Phi monitoring
    B. IBM Platform LSF ELIM: Xeon Phi monitoring, job scheduling (dynamic resources)
    C. IBM Platform LSF: Xeon Phi job scheduling (LSF configuration)
    D. IBM Platform LSF: Xeon Phi job submission
Appendix A: IBM Platform HPC and Intel Xeon Phi Integration Scripts
Copyright and Trademark Information
I. Background
IBM Platform HPC is a complete, end-to-end HPC cluster management solution. It includes a rich set of
out-of-the-box features that empower high performance technical computing users by reducing the
complexity of their HPC environment and improving their time-to-solution.
IBM Platform HPC includes the following key capabilities:
• Cluster management
• Workload management
• Workload monitoring and reporting
• System monitoring and reporting
• MPI libraries (includes IBM Platform MPI)
• Integrated application scripts/templates for job submission
• Unified web portal
Intel Xeon Phi (Many Integrated Cores - MIC) is a new CPU architecture developed by Intel Corporation
that provides higher aggregated performance than alternative solutions deliver. It is designed to simplify
application parallelization while at the same time delivering significant performance improvement.
The distinct features of the Intel Xeon Phi design are the following:
• It is comprised of many smaller, lower-power Intel processor cores.
• It contains wider vector processing units for greater floating point performance per watt.
• With its innovative design, Intel Xeon Phi delivers higher aggregated performance.
• It supports data parallel, thread parallel and process parallel workloads, and increased total memory
bandwidth.
This document provides an example configuration of IBM Platform HPC for an environment containing
systems equipped with Xeon Phi. The document is broken down into three broad sections:
• Infrastructure Preparation
• Xeon Phi monitoring using the IBM Platform HPC Monitoring Framework
• Xeon Phi workload management using IBM Platform LSF
The overall procedure was validated with the following software versions:
• IBM Platform HPC 3.2 (Red Hat Enterprise Linux 6.2)
• Intel(r) MPSS version 2.1.6720-12.2.6.32-220 (Intel software stack for Xeon Phi)
• Intel(r) Cluster Studio XE for Linux version 2013 Update 1
• Intel(r) MPI version 4.1.0.024
II. Infrastructure Preparation
For the purpose of this document, the example IBM Platform HPC cluster is configured as follows:

IBM Platform HPC head node: mel1
Compute node(s): compute000, compute001

Both compute000 and compute001 are equipped as follows:
• 1 Intel Xeon Phi co-processor card
• 2 Gigabit Ethernet NICs
• (No InfiniBand(R) present)
The infrastructure is configured using the following networks:

IP Ranges                   Description                      Comments
192.0.2.2 - 192.0.2.50      Cluster private network          Provisioning, monitoring, computation
192.0.2.51 - 192.0.2.99     Xeon Phi network (bridged)       Computation
192.0.2.100 - 192.0.2.150   Out-of-Band management network   IPMI network
mel1 has two network interfaces configured:
• eth0 (public interface)
• eth1 (private interface)
compute000, compute001 will have two network interfaces configured:
• eth1 (private interface); configured during provisioning
• bridged interface; to be configured post provisioning
The following steps assume that mel1 has been installed with IBM Platform HPC 3.2 and the original
configuration was not modified. Additionally, the steps below rely upon the IBM Platform HPC CLI/TUI
tools. All of the operations performed in this document using the CLI/TUI tools can also be performed
using the IBM Platform HPC Web Console.
1. Create a new node group template for compute hosts equipped with Xeon Phi. Create a copy of the
default package-based compute node group template compute-rhel-6.2-x86_64 named
compute-rhel-6.2-x86_64_Xeon_Phi.
# kusu-ngedit -c compute-rhel-6.2-x86_64 -n compute-rhel-6.2-x86_64_Xeon_Phi
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/.updatenics
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/fstab.kusuappend
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/hosts
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/hosts.equiv
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/passwd.merge
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/group.merge
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/shadow.merge
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/ssh/ssh_config
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/ssh/ssh_host_dsa_key
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/ssh/ssh_host_rsa_key
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/ssh/ssh_host_key
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/ssh/ssh_host_dsa_key.pub
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/ssh/ssh_host_key.pub
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/ssh/ssh_host_rsa_key.pub
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/root/.ssh/authorized_keys
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/root/.ssh/id_rsa
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/opt/kusu/etc/logserver.addr
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/opt/lsf/conf/lsf.conf
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/opt/lsf/conf/hosts
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/opt/lsf/conf/lsf.shared
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/opt/lsf/conf/lsf.cluster.mel1_cluster1
....
....
Distributing 75 KBytes to all nodes.
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
2. Add Intel MPSS packages to the default software repository managed by IBM Platform HPC. This is
required to automate the deployment of Intel MPSS to the Xeon Phi equipped nodes.
# cp *.rpm /depot/contrib/1000/
# ls -la *.rpm
-rw-r--r-- 1 root root 16440156 May 8 13:55 intel-mic-2.1.6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root 3298216 May 8 13:55 intel-mic-cdt-2.1.6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root 522844 May 8 13:55 intel-mic-flash-2.1.386-2.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root 10255872 May 8 13:55 intel-mic-gdb-2.1.6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root 182208656 May 8 13:55 intel-mic-gpl-2.1.6720-12.el6.x86_64.rpm
-rw-r--r-- 1 root root 2300600 May 8 13:55 intel-mic-kmod-2.1.6720-12.2.6.32.220.el6.x86_64.rpm
-rw-r--r-- 1 root root 280104 May 8 13:55 intel-mic-micmgmt-2.1.6720-12.2.6.32.220.el6.x86_64.rpm
-rw-r--r-- 1 root root 254776 May 8 13:55 intel-mic-mpm-2.1.6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root 10863724 May 8 14:10 intel-mic-ofed-card-6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root 1489992 May 8 14:10 intel-mic-ofed-dapl-2.0.36.7-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 44528 May 8 14:10 intel-mic-ofed-dapl-devel-2.0.36.7-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 220712 May 8 14:10 intel-mic-ofed-dapl-devel-static-2.0.36.7-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 108940 May 8 14:10 intel-mic-ofed-dapl-utils-2.0.36.7-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 5800 May 8 14:10 intel-mic-ofed-ibpd-6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root 14730200 May 8 14:10 intel-mic-ofed-kmod-6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root 102052 May 8 14:10 intel-mic-ofed-kmod-devel-6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root 8536 May 8 14:10 intel-mic-ofed-libibscif-6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root 5620 May 8 14:10 intel-mic-ofed-libibscif-devel-6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root 55163240 May 8 13:55 intel-mic-sysmgmt-2.1.6720-12.2.6.32-220.el6.x86_64.rpm
# kusu-repoman -u -r "rhel-6.2-x86_64"
Refreshing repository: rhel-6.2-x86_64. This may take a while...
3. The Out-of-Band management network is defined using kusu-netedit. The configuration of an
Out-of-Band management network is highly recommended. Refer to Chapter 6 (Networks) in the IBM
Platform HPC 3.2 Administering IBM Platform HPC guide for details on configuring a BMC network.
# kusu-netedit -a -n 192.0.2.0 -s 255.255.255.0 -i bmc -t 192.0.2.100 -e "BMC network" -x "-bmc"
4. Next, modify the node group "compute-rhel-6.2-x86_64_Xeon_Phi" to add the following Optional
Packages for deployment and to enable the Out-of-Band management network. The kusu-ngedit
tool is used for this purpose. kusu-ngedit presents the administrator with a TUI interface from which
the package selection and network selection can be performed.
Networks TUI screen:
Enable network "bmc" for node group "compute-rhel-6.2-x86_64_Xeon_Phi"
Optional Packages TUI screen:
intel-mic-micmgmt
intel-mic-mpm-2.1.6720
intel-mic-2.1.6720
intel-mic-cdt-2.1.6720
intel-mic-flash-2.1.386
intel-mic-gdb-2.1.6720
intel-mic-gpl
intel-mic-kmod
intel-mic-sysmgmt-2.1.6720
libstdc++ (needed by Intel MPSS software)
5. For each Xeon Phi device, you must assign a static IP address. The IP addresses selected for this
example are on the cluster network 192.0.2.0. The Xeon Phi device IP addresses are added to IBM
Platform HPC as unmanaged devices using the kusu-addhost command. This ensures that the Xeon Phi
hostnames are added to /etc/hosts on each cluster node, and prevents the IPs from being allocated by
IBM Platform HPC to other devices.

Hostname          IP address
compute000        192.0.2.11
compute000-mic0   192.0.2.51
compute001        192.0.2.12
compute001-mic0   192.0.2.52
# kusu-addhost -s compute000-mic0 -x 192.0.2.51
Setting up dhcpd service...
Setting up dhcpd service successfully...
Setting up NFS export service...
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
Distributing 8 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
# kusu-addhost -s compute001-mic0 -x 192.0.2.52
Setting up dhcpd service...
Setting up dhcpd service successfully...
Setting up NFS export service...
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
Distributing 8 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
6. IBM Platform HPC manages the network interfaces of all compute nodes but does not currently
support the management of bridged network interfaces. It is necessary to define a bridge on the
compute nodes so that the Xeon Phi devices can be accessed over the network. This is mandatory, for
example, when running Xeon Phi native MPI workloads. The following procedure automates the
configuration of the network bridge and the necessary Xeon Phi configuration to utilize the bridge.
The procedure supports a maximum of one Xeon Phi device per node. Two steps are involved:
• Create a post-install script that will trigger the run of /etc/rc.local and add it to the node group of
your choice.
• Create rc.local.append for the same node group, under the appropriate cfm directory on the
installer (a sketch of the bridge configuration it performs is shown below).
Appendix A contains details on where to obtain the example post-install script and the rc.local.append
contents.
** WARNING: The following changes will prevent Platform HPC from managing network interfaces on
the compute nodes. **
Copy the example post_install.sh script to /root on the IBM Platform HPC head node.
# cp post_install.sh /root
Start kusu-ngedit and edit the Xeon Phi specific node group compute-rhel-6.2-x86_64_Xeon_Phi.
Add the script post_install.sh as a Custom Script.
Copy the example rc.local.xeon_phi script to the appropriate CFM directories with filename
rc.local.append. This will ensure that the contents of rc.local.xeon_phi are appended to the rc.local file on
the respective compute nodes. In this case, we need to copy the file to the CFM directory for the node
group compute-rhel-6.2-x86_64_Xeon_Phi.
# cp rc.local.xeon_phi /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/rc.local.append
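For illustration only, the following is a minimal sketch of the kind of bridge setup that rc.local.append typically performs. It is not the IBM-provided rc.local.xeon_phi script (obtain that via Appendix A); the bridge name (br0), the enslaved interface (eth0) and the host IP address shown here are assumptions for this sketch, and the Intel MPSS side of the configuration (attaching mic0 to the bridge) is handled by the example script from Appendix A.

#!/bin/sh
# Sketch only -- assumes a single Xeon Phi (mic0), bridge br0, uplink eth0
# and a host address on the bridged Xeon Phi network (all assumptions).
BRIDGE=br0
UPLINK=eth0
HOST_IP=192.0.2.60
NETMASK=255.255.255.0

# Create the bridge and enslave the chosen interface
brctl addbr ${BRIDGE}
brctl addif ${BRIDGE} ${UPLINK}

# Move the IP configuration from the uplink onto the bridge
ifconfig ${UPLINK} 0.0.0.0 up
ifconfig ${BRIDGE} ${HOST_IP} netmask ${NETMASK} up

# The Intel MPSS configuration that points mic0 at ${BRIDGE} is performed
# by the example rc.local.xeon_phi script referenced in Appendix A.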
Next, execute kusu-cfmsync to make the change take effect.
[root@mel1 ~]# kusu-cfmsync -f
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/rc.local.append
Distributing 1 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
7. To ensure consistent hostname resolution across the cluster, you should use the CFM framework to
propagate the /etc/hosts file from the IBM Platform HPC head node to all known Xeon Phi devices.
On the IBM Platform HPC head node perform the following operations:
# cp /etc/hosts /shared/hosts
# mkdir -p /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/opt/intel/mic/filesystem/base/etc/rc.d
In /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/opt/intel/mic/filesystem/base/etc/rc.d create the file
rc.sysinit.append containing the following:
cp /shared/hosts /etc/hosts
**Note: The above steps must be repeated under the /etc/cfm/installer-rhel-6.2-x86_64
file system. This is required because the IBM Platform HPC head node is also equipped with Xeon Phi.**
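For reference, the equivalent commands for the installer node group are (a sketch, mirroring the compute node group steps above):

# mkdir -p /etc/cfm/installer-rhel-6.2-x86_64/opt/intel/mic/filesystem/base/etc/rc.d
# echo "cp /shared/hosts /etc/hosts" > /etc/cfm/installer-rhel-6.2-x86_64/opt/intel/mic/filesystem/base/etc/rc.d/rc.sysinit.append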
The updates to the Xeon Phi configuration files are propagated to all nodes in the node group
"compute-rhel-6.2-x86_64_Xeon_Phi". On the IBM Platform HPC head node, execute "kusu-cfmsync".
# kusu-cfmsync -f
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/opt/intel/mic/filesystem/base/etc/rc.d/rc.sysinit.append
Distributing 0 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
8. Provision all Xeon Phi equipped nodes using the node group template
"compute-rhel-6.2-x86_64_Xeon_Phi". Note that once nodes are discovered by kusu-addhost, the administrator must
exit from the listening mode by pressing Control-C. This will complete the node discovery process.
# kusu-addhost -i eth0 -n compute-rhel-6.2-x86_64_Xeon_Phi -b
Scanning syslog for PXE requests...
Discovered Node: compute000
Mac Address: 00:1e:67:49:cc:83
Discovered Node: compute001
Mac Address: 00:1e:67:49:cc:e5
^C
Command aborted by user...
Setting up dhcpd service...
Setting up dhcpd service successfully...
Setting up NFS export service...
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
Distributing 100 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
9. If passwordless SSH as 'root' to the Xeon Phi devices is needed, then the following step must be
performed prior to the generation of the Intel MPSS configuration. Copy the public SSH key for the
root account from the head node to all nodes in the node group compute-rhel-6.2-x86_64_Xeon_Phi.
# ln -s /opt/kusu/etc/.ssh/id_rsa.pub /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/root/.ssh/id_rsa.pub
# kusu-cfmsync -f
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/root/.ssh/id_rsa.pub
Distributing 0 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
III. Intel Software tools deployment
It is recommended to install all of the Intel Software tools and Intel MPI to the common shared file
system. IBM Platform HPC configures a default NFS share "/shared" which is common on all compute
nodes managed by the software. With the procedure above, /shared is mounted and available on all
nodes in the cluster, including Xeon Phi co-processor environments.
Here, the native Intel Software tools and Intel MPI installation programs are used. No further detail is
provided in the document.
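Once the tools are installed under /shared, users would typically initialize the compiler and MPI environment in their shells or job scripts. The snippet below is only an illustration; the install prefix /shared/intel and the exact sub-paths are assumptions for this sketch and vary with the product versions and the options chosen during installation.

$ source /shared/intel/composer_xe_2013/bin/compilervars.sh intel64
$ source /shared/intel/impi/4.1.0.024/intel64/bin/mpivars.sh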
As part of the installation of the Intel Software tools, you may be required to install additional 32-bit
libraries. If this is the case, then the required yum commands must be executed across the nodes, as
shown in the example below.
# lsrun -m "mel1 compute000 compute001" yum -y install libstdc++.i686
Loaded plugins: product-id, security, subscription-manager
Updating certificate-based repositories.
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package libstdc++.i686 0:4.4.6-3.el6 will be installed
--> Processing Dependency: libm.so.6(GLIBC_2.0) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libm.so.6 for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libgcc_s.so.1(GLIBC_2.0) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libgcc_s.so.1(GCC_4.2.0) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libgcc_s.so.1(GCC_3.3) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libgcc_s.so.1(GCC_3.0) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libgcc_s.so.1 for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.4) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.3.2) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.3) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.2) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.1.3) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.1) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.0) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libc.so.6 for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: ld-linux.so.2(GLIBC_2.3) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: ld-linux.so.2 for package: libstdc++-4.4.6-3.el6.i686
--> Running transaction check
---> Package glibc.i686 0:2.12-1.47.el6 will be installed
--> Processing Dependency: libfreebl3.so(NSSRAWHASH_3.12.3) for package: glibc-2.12-1.47.el6.i686
--> Processing Dependency: libfreebl3.so for package: glibc-2.12-1.47.el6.i686
---> Package libgcc.i686 0:4.4.6-3.el6 will be installed
--> Running transaction check
---> Package nss-softokn-freebl.i686 0:3.12.9-11.el6 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
 Package               Arch    Version          Repository              Size
================================================================================
Installing:
 libstdc++             i686    4.4.6-3.el6      kusu-compute-default    298 k
Installing for dependencies:
 glibc                 i686    2.12-1.47.el6    kusu-compute-default    4.3 M
 libgcc                i686    4.4.6-3.el6      kusu-compute-default    110 k
 nss-softokn-freebl    i686    3.12.9-11.el6    kusu-compute-default    116 k

Transaction Summary
================================================================================
Install       4 Package(s)

Total download size: 4.8 M
Installed size: 14 M
Downloading Packages:
--------------------------------------------------------------------------------
Total                                                    21 MB/s | 4.8 MB  00:00
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Warning: RPMDB altered outside of yum.
Installing : libgcc-4.4.6-3.el6.i686                                         1/4
Installing : glibc-2.12-1.47.el6.i686                                        2/4
Installing : nss-softokn-freebl-3.12.9-11.el6.i686                           3/4
Installing : libstdc++-4.4.6-3.el6.i686                                      4/4
Installed products updated.
Installed:
libstdc++.i686 0:4.4.6-3.el6
Dependency Installed:
glibc.i686 0:2.12-1.47.el6
libgcc.i686 0:4.4.6-3.el6
nss-softokn-freebl.i686 0:3.12.9-11.el6
Complete!
IV. IBM Platform HPC: Xeon Phi monitoring, workloads
A. IBM Platform HPC built-in Xeon Phi monitoring
NOTE: The following section requires that Fix Pack hpc-3.2-build216840 is applied to the IBM Platform
HPC head node. This is available via the IBM Fix Central site.
IBM Platform HPC provides rudimentary Xeon Phi monitoring capabilities via the Web-based console out
of the box. These capabilities are not enabled by default at the time of installation. To enable the
monitoring capabilities, the following steps must be performed on the head node.
Add the line "hasMIC=true" to /usr/share/pmc/gui/conf/pmc.conf.
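For example, the line can be appended as follows (assuming it is not already present in the file):

# echo 'hasMIC=true' >> /usr/share/pmc/gui/conf/pmc.conf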
# sed -i s?unselect=?unselect=mics,?g /usr/share/pmc/gui/conf/prefconf/hostListDTDiv_default.properties
# sed -i s?unselect=?unselect=mics,?g /usr/share/pmc/gui/conf/prefconf/hostListProvisionDTDiv_default.properties
# pmcadmin stop
# pmcadmin start
The IBM Platform HPC Web Console incorrectly assumes that 'micinfo' is located in /usr/bin. The current
Intel MPSS installs micinfo to /opt/intel/mic/bin. Here, a wrapper script that calls /opt/intel/mic/bin/micinfo
is distributed to /usr/bin on all nodes within the compute-rhel-6.2-x86_64_Xeon_Phi node group.
Create a script ‘micinfo’ with the following contents:
#!/bin/sh
/opt/intel/mic/bin/micinfo
exit 0
# mkdir -p /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/usr/bin
Copy the micinfo script to the appropriate CFM directory and set execute permissions.
# cp micinfo /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/usr/bin
# chmod 755 /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/usr/bin/micinfo
# kusu-cfmsync -f
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/usr/bin/micinfo
Distributing 0 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
The following screenshot of the IBM Platform HPC Web console shows the "MIC" tab that displays
metrics for each Xeon Phi device on a per host basis:
B. IBM Platform LSF ELIM: Xeon Phi monitoring, job scheduling (dynamic resources)
IBM has prepared an example ELIM script for IBM Platform HPC (and IBM Platform LSF) that leverages
the Intel MPSS tools to provide metrics for both monitoring and job scheduling.
Download details for the example ELIM script can be found in Appendix A. The example ELIM has been
validated on systems with IBM Platform HPC 3.2/IBM LSF Express 8.3, Intel MPSS 2.1 build 4346-16 and
2 Xeon Phi co-processor cards per node; it will report back the following metrics:
• Total number of Xeon Phi co-processors per node
• Number of cores per Xeon Phi co-processor
• Xeon Phi CPU temperature (Celsius)
• Xeon Phi CPU frequency (GHz)
• Xeon Phi total power (Watts)
• Xeon Phi total free memory (MB)
• Xeon Phi CPU utilization (%)
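For context, an IBM Platform LSF ELIM is simply an executable placed in $LSF_SERVERDIR that periodically writes a line of the form "<number_of_indices> <name1> <value1> <name2> <value2> ..." to standard output. The following is a minimal sketch of that protocol only; it is not the IBM-provided dynelim.intelmic, and the way the device count is derived here (from /sys/class/mic) is an assumption of this sketch.

#!/bin/sh
# Minimal ELIM sketch: reports a single dynamic resource, num_mics.
# Illustrative only -- the real dynelim.intelmic (see Appendix A) reports
# the full set of Xeon Phi metrics listed above.
while true; do
    # Assumption: count Xeon Phi devices exposed by Intel MPSS under sysfs.
    NUM_MICS=`ls -d /sys/class/mic/mic[0-9]* 2>/dev/null | wc -l`
    # ELIM protocol: number of indices, followed by name/value pairs.
    echo "1 num_mics ${NUM_MICS}"
    sleep 60
done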
Below, the IBM Platform HPC hpc-metric-tool is used to configure the monitoring of the Xeon Phi
specific metrics.
Download details for the script (mic_metric_add.sh) to automate the configuration of the Intel MIC
metrics can be found in Appendix A. The script requires as input the ELIM script, dynelim.intelmic.
The following is executed on the IBM Platform HPC head node:
# chmod 755 /root/dynelim.intelmic
# chmod 755 /root/mic_metric_add.sh
# ./mic_metric_add.sh /root/dynelim.intelmic
Adding External Metric Summary:
Name: num_mics
LSF resource Mapping: default
ELIM file path: /root/dynelim.intelmic
LSF resource interval: 60
LSF resource increase: n
Display Name: Number of MICs
External Metric is added, please run "hpc-metric-tool apply" to apply the change to cluster.
Adding External Metric Summary:
Name: miccpu_temp0
LSF resource Mapping: default
ELIM file path: /root/dynelim.intelmic
LSF resource interval: 60
LSF resource increase: n
Display Name: MIC0 CPU temp Celsius
External Metric is added, please run "hpc-metric-tool apply" to apply the change to cluster.
….
….
Adding External Metric Summary:
Name: micnum_cores1
LSF resource Mapping: default
ELIM file path: /root/dynelim.intelmic
LSF resource interval: 60
LSF resource increase: n
Display Name: MIC1 Number of cores
External Metric is added, please run "hpc-metric-tool apply" to apply the change to cluster.
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
New file found: /etc/cfm/installer-rhel-6.2-x86_64/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.num_mics
New file found: /etc/cfm/installer-rhel-6.2-x86_64/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.micnum_cores1
New file found: /etc/cfm/installer-rhel-6.2-x86_64/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.miccore_freq0
New file found: /etc/cfm/installer-rhel-6.2-x86_64/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.miccpu_util1
New file found: /etc/cfm/installer-rhel-6.2-x86_64/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.miccpu_temp1
….
….
New file found: /etc/cfm/lsf-master-candidate/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.miccore_freq1
New file found: /etc/cfm/lsf-master-candidate/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.miccpu_temp0
New file found: /etc/cfm/lsf-master-candidate/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.micfree_mem0
Distributing 333 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Setting up dhcpd service...
Setting up dhcpd service successfully...
Setting up NFS export service...
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
Distributing 89 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Running the commands above will result in multiple ELIMs being written to the $LSF_SERVERDIR
directory (/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc) with names:
elim.num_mics
elim.miccpu_temp0
...
...
As each ELIM is an instance of "dynelim.intelmic", only elim.num_mics needs to be retained. Below are
the steps to perform a cleanup of the ELIM scripts.
Clean up the ELIM scripts from the CFM template directories. Here all elim.mic* files are removed and
only elim.num_mics is retained (executed on the IBM Platform HPC head node).
# cd /etc/cfm
# find ./ -name "elim.mic*" -print | xargs rm -f
Cleanup of ELIMs from $LSF_SERVERDIR on all nodes (executed on the IBM Platform HPC head node).
# lsgrun -m "compute000 compute001" find /opt/lsf/8.3/linux2.6-glibc2.3-x86_64/ -name "elim.mic*" -exec rm -f {} \;
Update CFM (executed on IBM Platform HPC head node).
# kusu-cfmsync -f
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
Removing orphaned file: /opt/kusu/cfm/1/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.micnum_cores1
Removing orphaned file: /opt/kusu/cfm/1/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.miccore_freq0
Removing orphaned file: /opt/kusu/cfm/1/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.miccpu_util1
Removing orphaned file: /opt/kusu/cfm/1/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.miccpu_temp1
Removing orphaned file: /opt/kusu/cfm/1/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.mictotal_power1
….
….
Removing orphaned file: /opt/kusu/cfm/7/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.micfree_mem0
Updating installer(s)
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
# lsadmin limshutdown all
Do you really want to shut down LIMs on all hosts? [y/n] y
Shut down LIM on <mel1> ...... done
Shut down LIM on <compute000> ...... done
Shut down LIM on <compute001> ...... done
** NOTE: The warning messages below in the output of lsadmin limstartup may be ignored **
# lsadmin limstartup all
Aug 14 11:37:44 2013 23856 3 8.3 do_Resources: /opt/lsf/conf/lsf.shared(340): Resource name
processes reserved or previously defined. Ignoring line
Do you really want to start up LIM on all hosts ? [y/n]y
Start up LIM on <mel1> ...... Aug 16 15:37:49 2013 25092 3 8.3 do_Resources:
/opt/lsf/conf/lsf.shared(340): Resource name processes reserved or previously defined. Ignoring line
done
Start up LIM on <compute000> ...... Aug 14 11:37:50 2013 88229 3 8.3 do_Resources:
/opt/lsf/conf/lsf.shared(340): Resource name processes reserved or previously defined. Ignoring line
done
Start up LIM on <compute001> ...... Aug 14 11:37:50 2013 63077 3 8.3 do_Resources:
/opt/lsf/conf/lsf.shared(340): Resource name processes reserved or previously defined. Ignoring line
done
The output of the IBM Platform LSF 'lsload' command shows the metrics as expected (abridged here to
the standard load indices and the Xeon Phi indices of the first device):
# lsload -l
HOST_NAME    status  r15s  r1m  r15m  ut   pg  io  ls  it    tmp  swp  mem
compute001   ok       0.0  0.0   0.0  5%  0.0  12   0   1  8584M   2G  31G
compute000   ok       0.0  0.1   0.0  4%  0.0  10   0   1  8584M   2G  31G
mel1         ok       0.2  0.3   0.3  7%  0.0  60   3   2    28G  33G  25G

             num_mics  miccpu_temp0  miccore_freq0  mictotal_power0  micfree_mem0  miccpu_util0  micnum_cores0
compute001        1.0          46.0            1.1             74.0        2662.6           0.0           57.0
compute000        1.0          54.0            1.1             67.0        2662.4           0.0           57.0
mel1                -             -              -                -             -             -              -
....
....
It is now possible to display the newly added resources in the IBM Platform HPC Web Console. In the
screenshot below, the Xeon Phi specific metrics are displayed on the respective host status lines.
C. IBM Platform LSF: Xeon Phi job scheduling (LSF configuration)
The following steps describe the necessary IBM Platform LSF configuration in support of Xeon Phi. All
LSF hosts equipped with Xeon Phi must be tagged with the Boolean resource "mic". This will allow users
submitting Xeon Phi specific workloads to IBM Platform LSF to request a system equipped with Xeon Phi.
Additionally, it is necessary to enable the resource reservation per slot for the defined resource
'num_mics'.
Edit /etc/cfm/templates/lsf/default.lsf.shared and make the following updates:
Begin Resource
RESOURCENAME   TYPE      INTERVAL  INCREASING  DESCRIPTION                 # Keywords
mic            Boolean   ()        ()          (Intel MIC architecture)
....
....
End Resource
Edit /etc/cfm/templates/lsf/default.lsf.cluster and make the following updates:
Begin Host
HOSTNAME        model  type  server  r1m  mem  swp  RESOURCES   #Keywords
XXX_lsfmc_XXX   !      !     1       3.5  ()   ()   (mg)
compute000      !      !     1       3.5  ()   ()   (mic)
compute001      !      !     1       3.5  ()   ()   (mic)
....
....
End Host
Edit /etc/cfm/templates/lsf/lsbatch/default/configdir/lsb.resources and make the following
updates:
Begin ReservationUsage
RESOURCE       METHOD       RESERVE
num_mics                    Y
End ReservationUsage
Edit /etc/cfm/templates/lsf/lsbatch/default/configdir/lsb.params and add the following parameter:
Begin Parameters
....
....
RESOURCE_RESERVE_PER_SLOT=Y
End Parameters
Run the command kusu-addhost -u to make the changes take effect.
# kusu-addhost -u
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
Updating installer(s)
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Setting up dhcpd service...
Setting up dhcpd service successfully...
Setting up NFS export service...
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
Distributing 102 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
Sending to 198.51.100.255
Sending to 192.0.2.255
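At this point it can be useful to verify that the new resources are visible to IBM Platform LSF. For example (a quick check only; the exact output will vary by cluster):

# lsinfo | grep mic
# lshosts -R "select[mic]"
# bhosts -R "select[mic]"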
D. IBM Platform LSF: Xeon Phi job submission
This section discusses the methodology to submit both "offload" and "native" type workloads to Xeon
Phi coprocessor equipped nodes in an IBM Platform LSF cluster. In simple terms:
• With the "offload" model, the executable is run on the host processor, offloading specific work to
the Xeon Phi coprocessor.
• With the "native" model, the executable runs natively on the Xeon Phi coprocessor.
For both "offload" and "native" type workloads, it is assumed that the following configuration for IBM
Platform LSF exists (see Section C: IBM Platform LSF: Xeon Phi job scheduling above):
• Non-shared dynamic resource 'num_mics', which counts the number of Xeon Phi coprocessors
(cards) available on a system (online state).
• Non-shared Boolean resource 'mic', which is configured for nodes equipped with Xeon Phi
coprocessor(s).
• RESOURCE_RESERVE_PER_SLOT=Y has been configured.

i. Offload Example I: The following example shows a simple offload binary ('omp_numthreadsofl')
being launched under IBM Platform LSF. The binary has been compiled using the Intel Compiler
(intel-compilerproc-117-13.0-1.x86_64) and is launched using the IBM Platform LSF bsub
command requesting the Boolean resource "mic" and rusage on the resource num_mics equal
to 1. Note that it is currently not possible to target a specific Xeon Phi device at runtime when
running an Offload executable. This is a limitation of the current Intel tools.
$ bsub -I -R "select[mic] rusage[num_mics=1]" /shared/omp_numthreadsofl -t 16
Job <1083> is submitted to default queue <medium_priority>.
<<Waiting for dispatch ...>>
<<Starting on compute000>>
Hello World from thread = 0
Hello World from thread = 11
Number of threads on node compute000-mic0 = 16
Hello World from thread = 2
Hello World from thread = 1
Hello World from thread = 4
Hello World from thread = 9
Hello World from thread = 8
Hello World from thread = 10
Hello World from thread = 5
Hello World from thread = 6
Hello World from thread = 7
Hello World from thread = 3
Hello World from thread = 13
Hello World from thread = 12
Hello World from thread = 14
Hello World from thread = 15
ii. Offload Example II: The following shows an example of an Intel MPI offload binary being
launched under IBM Platform LSF. The binary has been compiled using the Intel Compiler
(intel-compilerproc-117-13.0-1.x86_64) and is launched using the IBM Platform LSF bsub
command requesting the following:
• Boolean resource "mic".
• Resource reservation (rusage) on the resource num_mics equal to 1 (per slot).
• Two processors (MPI ranks), one processor per node. Note that each MPI rank will use Offload
if a Xeon Phi is available.
$ bsub -n 2 -R "select[mic] rusage[num_mics=1] span[ptile=1]" -I mpiexec.hydra /shared/mixedofl_demo
Job <1082> is submitted to default queue <medium_priority>.
<<Waiting for dispatch ...>>
<<Starting on compute000>>
Hello from thread 0 out of 224 from process 0 out of 2 on compute000
Hello from thread 94 out of 224 from process 0 out of 2 on compute000
Hello from thread 8 out of 224 from process 0 out of 2 on compute000
Hello from thread 78 out of 224 from process 0 out of 2 on compute000
Hello from thread 14 out of 224 from process 0 out of 2 on compute000
Hello from thread 70 out of 224 from process 0 out of 2 on compute000
Hello from thread 1 out of 224 from process 0 out of 2 on compute000
Hello from thread 57 out of 224 from process 0 out of 2 on compute000
Hello from thread 113 out of 224 from process 0 out of 2 on compute000
Hello from thread 72 out of 224 from process 0 out of 2 on compute000
Hello from thread 16 out of 224 from process 0 out of 2 on compute000
….
….
Hello from thread 43 out of 224 from process 1 out of 2 on compute001
Hello from thread 98 out of 224 from process 1 out of 2 on compute001 .
iii. Native Examples
IBM has devised an example job wrapper script which allows users to launch jobs targeted to
Xeon Phi under IBM Platform LSF. The example job wrapper script assumes that IBM Platform
LSF has been configured as per Section C: IBM Platform LSF: Xeon Phi job scheduling above.
Download instructions for the example ELIM, dynelim.intelmic, and the job wrapper script
(mic.job) can be found in Appendix A.
The job wrapper makes the following assumptions:
• Support for a maximum of 2 Xeon Phi devices per node.
• The job wrapper script assumes that there is a shared $HOME directory for the user
running the job, so the wrapper will not function for the user 'root'.
• Each Xeon Phi equipped LSF host is running an ELIM which reports back the number of
Xeon Phi devices available in the node (dynamic resource 'num_mics').
• Jobs requiring a Xeon Phi are submitted to LSF with the correct resource requirements.
For Xeon Phi jobs, bsub -n N translates to a request for N Xeon Phi devices. Jobs must
also be submitted with the correct corresponding rusage[num_mics=1] resource
(assuming the configuration in Section C above). For example, here we submit a job
which requires 2 Xeon Phi coprocessors:
  o $ bsub -n 2 -R "select[mic] rusage[num_mics=1]" <PATH_TO>/mic.job <PATH_TO>/a.out
  o Note that 2 job slots will also be allocated on the nodes selected.
• Intel MPI (runtime) must be available if MPI ranks are to be run on Xeon Phi (coprocessor mode).
• The job wrapper script has been tested with both native Xeon Phi binaries (leveraging
Intel MPSS 'micnativeloadex') and Intel MPI Xeon Phi co-processor mode jobs.
• Xeon Phi jobs submitted to IBM Platform LSF will be marked with the Xeon Phi
hostname(s) using the IBM Platform LSF bpost command. This provides rudimentary
control over access to devices. Once a Xeon Phi has been marked for use by a job, it is
used exclusively for the duration of the job. A sketch of such a wrapper is shown below.
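For illustration, the following sketch shows the general shape of such a wrapper for the single-device, non-MPI case. It is not the IBM-provided mic.job (obtain that via Appendix A); the derivation of the coprocessor hostname from the execution host name, and the hypothetical script name mic_wrapper.sh, are assumptions of this sketch.

#!/bin/sh
# Simplified job wrapper sketch (single Xeon Phi, non-MPI case).
# Usage: bsub -R "select[mic] rusage[num_mics=1]" ./mic_wrapper.sh <native_binary>
BINARY=$1

# Assumption: the Xeon Phi devices were registered as <hostname>-mic0
# (see step 5 in the Infrastructure Preparation section).
MIC_HOST="`hostname -s`-mic0"

# Mark the job with the Xeon Phi hostname via an LSF external message,
# as the real mic.job does, to provide rudimentary device tracking.
bpost -i 0 -d "${MIC_HOST}" ${LSB_JOBID}

# Launch the native binary on the coprocessor using the Intel MPSS loader.
/opt/intel/mic/bin/micnativeloadex "${BINARY}"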
Native MPI Example ('Co-processor mode'): The following shows an example of a native Xeon
Phi MPI binary being launched under IBM Platform LSF. The binary has been compiled using the
Intel Compiler (intel-compilerproc-117-13.0-1.x86_64) and is launched using the 'mic.job'
wrapper. The resource requirement string for the job requests the Boolean resource "mic" and
rusage on the resource num_mics equal to 1.
$ bsub -n 2 -I -R "select[mic] rusage[num_mics=1]" /shared/mic.job /shared/cpuops_mpi
Job <975> is submitted to default queue <medium_priority>.
<<Waiting for dispatch ...>>
<<Starting on compute001>>
- current ops per sec [avg 35.24]
33.66 35.44 35.24 35.24 35.24 35.44
35.24 35.24 35.24 35.44 35.24 35.44
35.24 35.44 35.24 35.24 35.24 35.44
35.24 35.24 35.24 35.44 35.24 35.24
35.24 35.44 35.24 35.24 35.24 35.24
35.24 35.24
....
....
During execution of the job we see the following:
bpost of Xeon Phi hostname(s) to the job (this is performed by the job wrapper script, 'mic.job'):
# bjobs -l 975
Job <975>, User <hpcadmin>, Project <default>, Status <RUN>, Queue <medium_prio
rity>, Interactive mode, Command </shared/mic.job /shared/
cpuops_mpi>, Share group charged </hpcadmin>
Wed Aug 14 12:48:13: Submitted from host <mel1>, CWD <$HOME>, 2 Processors Requ
ested, Requested Resources <select[mic] rusage[num_mics=1]
>;
Wed Aug 14 12:48:15: Started on 2 Hosts/Processors <compute001> <compute000>;
Wed Aug 14 12:48:29: Resource usage collected.
MEM: 5 Mbytes; SWAP: 357 Mbytes; NTHREAD: 7
PGID: 188945;  PIDs: 188945
PGID: 188946;  PIDs: 188946 188948 188969
PGID: 188970;  PIDs: 188970
PGID: 188971;  PIDs: 188971
SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut    pg    io   ls    it   tmp   swp   mem
loadSched    -     -     -     -     -     -    -     -     -     -     -
loadStop     -     -     -     -     -     -    -     -     -     -     -
....
(the loadSched/loadStop thresholds for the remaining load indices, including the Xeon Phi and GPU indices, are likewise unset)

EXTERNAL MESSAGES:
MSG_ID FROM     POST_TIME      MESSAGE            ATTACHMENT
0      hpcadmin Aug 14 12:48   compute001-mic0    N
1      hpcadmin Aug 14 12:48   compute000-mic0    N
# bhosts -l compute000
HOST  compute000
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV  DISPATCH_WINDOW
ok              16.00     -     16      1      1      0      0      0      -

CURRENT LOAD USED FOR SCHEDULING:
             r15s   r1m  r15m   ut    pg   io   ls   it    tmp   swp   mem    root  maxroot
Total         0.0   0.0   0.0   0%   0.0   16    0   27  8584M    2G   31G  8591.0    1e+04
Reserved      0.0   0.0   0.0   0%   0.0    0    0    0     0M    0M    0M     0.0      0.0

             processes  clockskew  netcard  iptotal   cpuhz  cachesize  diskvolume
Total            722.0        0.0      4.0      6.0  1200.0      2e+04       1e+07
Reserved           0.0        0.0      0.0      0.0     0.0        0.0         0.0

             processesroot   ipmi  powerconsumption  ambienttemp  cputemp  num_mics
Total                714.0   -1.0              -1.0         -1.0     -1.0       0.0
Reserved               0.0    0.0               0.0          0.0      0.0       1.0
....
....
# bhosts -l compute001
HOST  compute001
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV  DISPATCH_WINDOW
ok              16.00     -     16      1      1      0      0      0      -

CURRENT LOAD USED FOR SCHEDULING:
             r15s   r1m  r15m   ut    pg   io   ls   it    tmp   swp   mem    root  maxroot
Total         0.0   0.0   0.0   0%   0.0   17    0   32  8584M    2G   31G  8591.0    1e+04
Reserved      0.0   0.0   0.0   0%   0.0    0    0    0     0M    0M    0M     0.0      0.0

             processes  clockskew  netcard  iptotal   cpuhz  cachesize  diskvolume
Total            731.0        0.0      4.0      6.0  1200.0      2e+04       2e+06
Reserved           0.0        0.0      0.0      0.0     0.0        0.0         0.0

             processesroot   ipmi  powerconsumption  ambienttemp  cputemp  num_mics
Total                717.0   -1.0              -1.0         -1.0     -1.0       0.0
Reserved               0.0    0.0               0.0          0.0      0.0       1.0
....
....
You can see above that num_mics has 1 unit reserved (as expected) on both compute000 and
compute001.
Native (non-MPI) Example: The following shows an example of a native Xeon Phi binary (non-MPI)
being launched under IBM Platform LSF. The binary has been compiled using the Intel Compilers, Intel
MPI (intel-mpi-intel64-4.1.0p-024.x86_64) and is launched using the 'mic.job' wrapper. The resource
requirement string for the job requests the Boolean resource "mic" and rusage on the resource
num_mics equal to 1.
$ bsub -I -R "select[mic] rusage[num_mics=1]" /shared/mic.job /shared/fibo
Job <1078> is submitted to default queue <medium_priority>.
<<Waiting for dispatch ...>>
<<Starting on compute000>>
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
2584
4181
….
….
Remote process returned: 0
Exit reason: SHUTDOWN OK
During execution of the job you see the following:
'bpost' of Xeon Phi hostnames to the job (this is performed by the job wrapper script)
# bjobs -l 1078
Job <1078>, User <hpcadmin>, Project <default>, Status <RUN>, Queue <medium_pri
ority>, Interactive mode, Command </shared/mic.job /shared
/fibo>, Share group charged </hpcadmin>
Wed Aug 14 17:43:54: Submitted from host <mel1>, CWD <$HOME>, Requested Resourc
es <select[mic] rusage[num_mics=1]>;
Wed Aug 14 17:43:57: Started on <compute000>;
SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut    pg    io   ls    it   tmp   swp   mem
loadSched    -     -     -     -     -     -    -     -     -     -     -
loadStop     -     -     -     -     -     -    -     -     -     -     -
....
(the loadSched/loadStop thresholds for the remaining load indices are likewise unset)

EXTERNAL MESSAGES:
MSG_ID FROM     POST_TIME      MESSAGE            ATTACHMENT
0      hpcadmin Aug 14 17:43   compute000-mic0    N
# bhosts -l compute000
HOST  compute000
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV  DISPATCH_WINDOW
ok              16.00     -     16      1      1      0      0      0      -

CURRENT LOAD USED FOR SCHEDULING:
             r15s   r1m  r15m   ut    pg   io   ls   it    tmp   swp   mem   root  maxroot
Total         0.0   0.0   0.0   3%   0.0   23    1    2  8600M    2G   31G    0.0      0.0
Reserved      0.0   0.0   0.0   0%   0.0    0    0    0     0M    0M    0M      -        -

             processes  clockskew  netcard  iptotal  cpuhz  cachesize  diskvolume
Total              0.0        0.0      0.0      0.0    0.0        0.0         0.0
Reserved             -          -        -        -      -          -           -

             processesroot  ipmi  powerconsumption  ambienttemp  cputemp  num_mics
Total                  0.0   0.0               0.0          0.0      0.0       0.0
Reserved                 -     -                 -            -        -       1.0
....
....
             gpuecc0  gpumode1  gputemp1  gpuecc1  gpumode2  gputemp2  gpuecc2
Total            0.0       0.0       0.0      0.0       0.0       0.0      0.0
Reserved           -         -         -        -         -         -        -

             gpumode3  gputemp3  gpuecc3
Total             0.0       0.0      0.0
Reserved            -         -        -
....
....
You see above that num_mics has 1 unit reserved (as expected).
Appendix A: IBM Platform HPC and Intel Xeon Phi Integration Scripts
The package containing all example scripts referred to in this document is available for download at
IBM Fix Central (http://www.ibm.com/support/fixcentral/). The scripts are provided as examples
as per the terms indicated in the file LICENSE. The package contains the following example scripts:
• dynelim.intelmic: dynamic ELIM script to collect Intel MIC related metrics (HPC, LSF)
• mic.job: Intel MIC job wrapper (LSF)
• mic_metric_add.sh: script to facilitate the rapid addition of Intel MIC metrics (HPC)
• rc.local.xeon_phi: script portion to configure the network bridge, and to configure Intel MPSS to use
the network bridge
• post_install.sh: custom script for IBM Platform HPC. This will force the execution of rc.local on
the first boot after the CFM has executed.
Copyright and Trademark Information
© Copyright IBM Corporation 2013
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
IBM®, the IBM logo and ibm.com® are trademarks of International Business Machines Corp., registered
in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other
companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark
information" at www.ibm.com/legal/copytrade.shtml.