IBM Platform HPC: Best Practices for integrating with Intel® Xeon Phi™ Coprocessors
Date: August 22, 2013 (Revision 1.3)
Author: Gábor Samu (gsamu@ca.ibm.com)
Reviewer: Mehdi Bozzo-Rey (mbozzore@ca.ibm.com)

Contents
I. Background
II. Infrastructure Preparation
III. Intel Software tools deployment
IV. IBM Platform HPC: Xeon Phi monitoring, workloads
   A. IBM Platform HPC built-in Xeon Phi monitoring
   B. IBM Platform LSF ELIM: Xeon Phi monitoring, job scheduling (dynamic resources)
   C. IBM Platform LSF: Xeon Phi job scheduling (LSF configuration)
   D. IBM Platform LSF: Xeon Phi job submission
Appendix A: IBM Platform HPC and Intel Xeon Phi Integration Scripts
Copyright and Trademark Information

I. Background

IBM Platform HPC is a complete, end-to-end HPC cluster management solution. It includes a rich set of out-of-the-box features that empower high performance technical computing users by reducing the complexity of their HPC environment and improving their time-to-solution. IBM Platform HPC includes the following key capabilities:

- Cluster management
- Workload management
- Workload monitoring and reporting
- System monitoring and reporting
- MPI libraries (including IBM Platform MPI)
- Integrated application scripts/templates for job submission
- Unified web portal

Intel Xeon Phi (Many Integrated Cores - MIC) is a many-core coprocessor architecture developed by Intel Corporation that provides higher aggregated performance than alternative solutions deliver. It is designed to simplify application parallelization while at the same time delivering a significant performance improvement. The distinct features of the Intel Xeon Phi design are as follows:

- It comprises many smaller, lower-power Intel processor cores
- It contains wider vector processing units for greater floating point performance per watt
- With its innovative design, it delivers higher aggregated performance and increased total memory bandwidth
- It supports data parallel, thread parallel and process parallel workloads

This document provides an example configuration of IBM Platform HPC for an environment containing systems equipped with Xeon Phi. The document is broken down into three broad sections:

- Infrastructure Preparation
- Xeon Phi monitoring using the IBM Platform HPC Monitoring Framework
- Xeon Phi workload management using IBM Platform LSF

The overall procedure was validated with the following software versions:

- IBM Platform HPC 3.2 (Red Hat Enterprise Linux 6.2)
- Intel® MPSS version 2.1.6720-12.2.6.32-220 (Intel software stack for Xeon Phi)
- Intel® Cluster Studio XE for Linux version 2013 Update 1
- Intel® MPI version 4.1.0.024

II. Infrastructure Preparation
For the purpose of this document, the example IBM Platform HPC cluster is configured as follows:

- IBM Platform HPC head node: mel1
- Compute node(s): compute000, compute001

Both compute000 and compute001 are equipped as follows:

- 1 Intel Xeon Phi co-processor card
- 2 Gigabit Ethernet NICs (no InfiniBand(R) present)

The infrastructure is configured using the following networks:

   IP Range                    Description                      Comments
   192.0.2.2 - 192.0.2.50      Cluster private network          Provisioning, monitoring, computation
   192.0.2.51 - 192.0.2.99     Xeon Phi network (bridged)       Computation
   192.0.2.100 - 192.0.2.150   Out-of-Band management network   IPMI network

mel1 has two network interfaces configured:

- eth0 (public interface)
- eth1 (private interface)

compute000 and compute001 will have two network interfaces configured:

- eth1 (private interface); configured during provisioning
- bridged interface; to be configured post provisioning

The following steps assume that mel1 has been installed with IBM Platform HPC 3.2 and that the original configuration was not modified. Additionally, the steps below rely upon the IBM Platform HPC CLI/TUI tools. All of the operations performed in this document using the CLI/TUI tools can also be performed using the IBM Platform HPC Web Console.

1. Create a new node group template for compute hosts equipped with Xeon Phi. Create a copy of the default package-based compute node group template compute-rhel-6.2-x86_64 named compute-rhel-6.2-x86_64_Xeon_Phi.

# kusu-ngedit -c compute-rhel-6.2-x86_64 -n compute-rhel-6.2-x86_64_Xeon_Phi
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/.updatenics
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/fstab.kusuappend
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/hosts
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/hosts.equiv
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/passwd.merge
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/group.merge
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/shadow.merge
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/ssh/ssh_config
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/ssh/ssh_host_dsa_key
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/ssh/ssh_host_rsa_key
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/ssh/ssh_host_key
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/ssh/ssh_host_dsa_key.pub
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/ssh/ssh_host_key.pub
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/ssh/ssh_host_rsa_key.pub
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/root/.ssh/authorized_keys
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/root/.ssh/id_rsa
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/opt/kusu/etc/logserver.addr
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/opt/lsf/conf/lsf.conf
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/opt/lsf/conf/hosts
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/opt/lsf/conf/lsf.shared
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/opt/lsf/conf/lsf.cluster.mel1_cluster1
....
....
Distributing 75 KBytes to all nodes.
Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255
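Before moving on, it can be useful to confirm that the clone created its own CFM directory, since several of the following steps stage files (rc.local.append, the micinfo wrapper, SSH keys) under it. The quick check below is illustrative only and uses nothing beyond standard shell commands and the paths shown in the output above:

# ls -ld /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi
# ls /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/opt/lsf/conf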
2. Add the Intel MPSS packages to the default software repository managed by IBM Platform HPC. This is required to automate the deployment of Intel MPSS to the Xeon Phi equipped nodes.

# cp *.rpm /depot/contrib/1000/
# ls -la *.rpm
-rw-r--r-- 1 root root  16440156 May 8 13:55 intel-mic-2.1.6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root   3298216 May 8 13:55 intel-mic-cdt-2.1.6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root    522844 May 8 13:55 intel-mic-flash-2.1.386-2.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root  10255872 May 8 13:55 intel-mic-gdb-2.1.6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root 182208656 May 8 13:55 intel-mic-gpl-2.1.6720-12.el6.x86_64.rpm
-rw-r--r-- 1 root root   2300600 May 8 13:55 intel-mic-kmod-2.1.6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root    280104 May 8 13:55 intel-mic-micmgmt-2.1.6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root    254776 May 8 13:55 intel-mic-mpm-2.1.6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root  10863724 May 8 14:10 intel-mic-ofed-card-6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root   1489992 May 8 14:10 intel-mic-ofed-dapl-2.0.36.7-1.el6.x86_64.rpm
-rw-r--r-- 1 root root     44528 May 8 14:10 intel-mic-ofed-dapl-devel-2.0.36.7-1.el6.x86_64.rpm
-rw-r--r-- 1 root root    220712 May 8 14:10 intel-mic-ofed-dapl-devel-static-2.0.36.7-1.el6.x86_64.rpm
-rw-r--r-- 1 root root    108940 May 8 14:10 intel-mic-ofed-dapl-utils-2.0.36.7-1.el6.x86_64.rpm
-rw-r--r-- 1 root root      5800 May 8 14:10 intel-mic-ofed-ibpd-6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root  14730200 May 8 14:10 intel-mic-ofed-kmod-6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root    102052 May 8 14:10 intel-mic-ofed-kmod-devel-6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root      8536 May 8 14:10 intel-mic-ofed-libibscif-6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root      5620 May 8 14:10 intel-mic-ofed-libibscif-devel-6720-12.2.6.32-220.el6.x86_64.rpm
-rw-r--r-- 1 root root  55163240 May 8 13:55 intel-mic-sysmgmt-2.1.6720-12.2.6.32-220.el6.x86_64.rpm

# kusu-repoman -u -r "rhel-6.2-x86_64"
Refreshing repository: rhel-6.2-x86_64. This may take a while...

3. The Out-of-Band management network is defined using kusu-netedit. The configuration of an Out-of-Band management network is highly recommended. Refer to Chapter 6 (Networks) in the IBM Platform HPC 3.2 Administering IBM Platform HPC guide for details on configuring a BMC network.

# kusu-netedit -a -n 192.0.2.0 -s 255.255.255.0 -i bmc -t 192.0.2.100 -e "BMC network" -x "-bmc"

4. Next, modify the node group "compute-rhel-6.2-x86_64_Xeon_Phi" to add the following Optional Packages for deployment and to enable the Out-of-Band management network. The kusu-ngedit tool is used for this purpose; it presents the administrator with a TUI from which the package selection and network selection can be performed.

Networks TUI screen: enable network "bmc" for node group "compute-rhel-6.2-x86_64_Xeon_Phi"

Optional Packages TUI screen:
- intel-mic-micmgmt
- intel-mic-mpm-2.1.6720
- intel-mic-2.1.6720
- intel-mic-cdt-2.1.6720
- intel-mic-flash-2.1.386
- intel-mic-gdb-2.1.6720
- intel-mic-gpl
- intel-mic-kmod
- intel-mic-sysmgmt-2.1.6720
- libstdc++ (needed by the Intel MPSS software)
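With the out-of-band (BMC) network from step 3 in place, a quick reachability check of the BMC addresses can save troubleshooting time later. The sketch below is optional and assumes the standard ipmitool client is installed on the head node; the BMC addresses, user name and password are placeholders for your environment.

#!/bin/sh
# Illustrative only: probe BMC addresses on the out-of-band management
# network (192.0.2.100 - 192.0.2.150 in this example).
BMC_USER=ADMIN          # placeholder - substitute your BMC user name
BMC_PASS=changeme       # placeholder - substitute your BMC password
for last in $(seq 101 102); do       # placeholder BMC host addresses
    ip=192.0.2.${last}
    echo "=== ${ip} ==="
    ipmitool -I lanplus -H "${ip}" -U "${BMC_USER}" -P "${BMC_PASS}" chassis status \
        || echo "WARNING: BMC ${ip} not reachable"
done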
5. For each Xeon Phi device, you must assign a static IP address. The IP addresses selected for this example are on the cluster network 192.0.2.0. The Xeon Phi device IP addresses are added to IBM Platform HPC as unmanaged devices using the kusu-addhost command. This ensures that the Xeon Phi hostnames are added to /etc/hosts on each cluster node, and prevents the IPs from being allocated by IBM Platform HPC to other devices.

   Hostname          IP address
   compute000        192.0.2.11
   compute000-mic0   192.0.2.51
   compute001        192.0.2.12
   compute001-mic0   192.0.2.52

# kusu-addhost -s compute000-mic0 -x 192.0.2.51
Setting up dhcpd service...
Setting up dhcpd service successfully...
Setting up NFS export service...
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
Distributing 8 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255

# kusu-addhost -s compute001-mic0 -x 192.0.2.52
Setting up dhcpd service...
Setting up dhcpd service successfully...
Setting up NFS export service...
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
Distributing 8 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255

6. IBM Platform HPC manages the network interfaces of all compute nodes but does not currently support the management of bridged network interfaces. It is necessary to define a bridge on the compute nodes so that the Xeon Phi devices can be accessed over the network; this is mandatory, for example, when running Xeon Phi native MPI workloads. The following procedure automates the configuration of the network bridge and the Xeon Phi configuration required to utilize the bridge. The procedure supports a maximum of one Xeon Phi device per node. Two steps are involved:

- Create a post-install script that triggers the run of /etc/rc.local and add it to the node group of your choice.
- Create rc.local.append for the same node group, under the appropriate CFM directory on the installer.

Appendix A contains details on where to obtain the example post-install script and the rc.local.append contents.

** WARNING: The following changes will prevent IBM Platform HPC from managing network interfaces on the compute nodes. **

Copy the example post_install.sh script to /root on the IBM Platform HPC head node:

# cp post_install.sh /root

Start kusu-ngedit and edit the Xeon Phi specific node group compute-rhel-6.2-x86_64_Xeon_Phi. Add the script post_install.sh as a Custom Script.

Copy the example rc.local.xeon_phi script to the appropriate CFM directory with the filename rc.local.append. This ensures that the contents of rc.local.xeon_phi are appended to the rc.local file on the respective compute nodes. In this case, copy the file to the CFM directory for the node group compute-rhel-6.2-x86_64_Xeon_Phi:

# cp rc.local.xeon_phi /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/rc.local.append

Next, execute kusu-cfmsync to make the change take effect:

[root@mel1 ~]# kusu-cfmsync -f
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/etc/rc.local.append
Distributing 1 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255
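For reference, the sketch below illustrates the kind of work the rc.local fragment from Appendix A might perform; it is not the Appendix A rc.local.xeon_phi script itself. It assumes a single Xeon Phi device per node, RHEL 6 style ifcfg files, that the provisioning interface (eth1 in this example) is the one moved into the bridge, and a /24 netmask; the MPSS-side commands differ between MPSS releases, so verify them against your MPSS documentation.

#!/bin/sh
# Illustrative sketch only -- not the Appendix A rc.local.xeon_phi script.
# Moves the compute node's private interface (assumed eth1) into a bridge
# (br0) so the Xeon Phi device can join the cluster network, then restarts
# Intel MPSS so the coprocessor picks up the bridged configuration.

IFACE=eth1
BRIDGE=br0

# Only convert the interface once per node.
if [ ! -f /etc/sysconfig/network-scripts/ifcfg-${BRIDGE} ]; then
    # Re-use the IP address currently assigned to the private interface.
    IPADDR=$(ip -4 -o addr show dev ${IFACE} | awk '{print $4}' | cut -d/ -f1)

    cat > /etc/sysconfig/network-scripts/ifcfg-${BRIDGE} <<EOF
DEVICE=${BRIDGE}
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=static
IPADDR=${IPADDR}
NETMASK=255.255.255.0
EOF

    cat > /etc/sysconfig/network-scripts/ifcfg-${IFACE} <<EOF
DEVICE=${IFACE}
ONBOOT=yes
BRIDGE=${BRIDGE}
EOF

    service network restart

    # MPSS side: attach mic0 to the bridge and assign it its static address
    # (e.g. 192.0.2.51 for compute000-mic0). The exact micctrl/MPSS options
    # vary by release -- consult the Intel MPSS user guide (assumption).
    service mpss restart
fi

Because this sketch rewrites the ifcfg files, it is guarded so that it runs only once per node; this is also why IBM Platform HPC must no longer manage these interfaces (see the warning above).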
7. To ensure a consistent host name space, you should use the CFM framework to propagate the /etc/hosts file from the IBM Platform HPC head node to all known Xeon Phi devices. On the IBM Platform HPC head node perform the following operations:

# cp /etc/hosts /shared/hosts
# mkdir -p /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/opt/intel/mic/filesystem/base/etc/rc.d

In /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/opt/intel/mic/filesystem/base/etc/rc.d, create the file rc.sysinit.append containing the following:

cp /shared/hosts /etc/hosts

**Note: The above steps must be repeated under the /etc/cfm/installer-rhel-6.2-x86_64 file system. This is required as the IBM Platform HPC head node is equipped with Xeon Phi.**

The updates to the Xeon Phi configuration files are propagated to all nodes in the node group "compute-rhel-6.2-x86_64_Xeon_Phi". On the IBM Platform HPC head node, execute kusu-cfmsync:

# kusu-cfmsync -f
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/opt/intel/mic/filesystem/base/etc/rc.d/rc.sysinit.append
Distributing 0 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255

8. Provision all Xeon Phi equipped nodes using the node group template "compute-rhel-6.2-x86_64_Xeon_Phi". Note that once the nodes are discovered by kusu-addhost, the administrator must exit from listening mode by pressing Control-C. This completes the node discovery process.

# kusu-addhost -i eth0 -n compute-rhel-6.2-x86_64_Xeon_Phi -b
Scanning syslog for PXE requests...
Discovered Node: compute000
Mac Address: 00:1e:67:49:cc:83
Discovered Node: compute001
Mac Address: 00:1e:67:49:cc:e5
^C
Command aborted by user...
Setting up dhcpd service...
Setting up dhcpd service successfully...
Setting up NFS export service...
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
Distributing 100 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255

9. If passwordless SSH as 'root' to the Xeon Phi devices is needed, the following step must be performed prior to the generation of the Intel MPSS configuration. Copy the public SSH key for the root account from the head node to all nodes in the compute-rhel-6.2-x86_64_Xeon_Phi node group.

# ln -s /opt/kusu/etc/.ssh/id_rsa.pub /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/root/.ssh/id_rsa.pub
# kusu-cfmsync -f
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/root/.ssh/id_rsa.pub
Distributing 0 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255
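After the nodes have been provisioned and Intel MPSS is running, a quick loop such as the one below can confirm host and coprocessor reachability and passwordless root SSH (step 9). This is an optional sanity check using only standard ssh; the host names follow the convention used in this example.

#!/bin/sh
# Optional check: verify passwordless root SSH to each Xeon Phi equipped
# node and to its coprocessor (hostnames as used in this example).
for node in compute000 compute001; do
    echo "=== ${node} ==="
    ssh -o BatchMode=yes root@${node} uname -r
    ssh -o BatchMode=yes root@${node}-mic0 uname -a
done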
III. Intel Software tools deployment

It is recommended to install all of the Intel Software tools and Intel MPI on a common shared file system. IBM Platform HPC configures a default NFS share, "/shared", which is common to all compute nodes managed by the software. With the procedure above, /shared is mounted and available on all nodes in the cluster, including the Xeon Phi co-processor environments. Here, the native Intel Software tools and Intel MPI installation programs are used; no further detail on them is provided in this document.

As part of the installation of the Intel Software tools, you may be required to install additional 32-bit libraries. If this is the case, the required packages can be installed across the nodes with yum, as in the following example.

# lsrun -m "mel1 compute000 compute001" yum -y install libstdc++.i686
Loaded plugins: product-id, security, subscription-manager
Updating certificate-based repositories.
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package libstdc++.i686 0:4.4.6-3.el6 will be installed
--> Processing Dependency: libm.so.6(GLIBC_2.0) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libm.so.6 for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libgcc_s.so.1(GLIBC_2.0) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libgcc_s.so.1(GCC_4.2.0) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libgcc_s.so.1(GCC_3.3) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libgcc_s.so.1(GCC_3.0) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libgcc_s.so.1 for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.4) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.3.2) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.3) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.2) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.1.3) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.1) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.0) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: libc.so.6 for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: ld-linux.so.2(GLIBC_2.3) for package: libstdc++-4.4.6-3.el6.i686
--> Processing Dependency: ld-linux.so.2 for package: libstdc++-4.4.6-3.el6.i686
--> Running transaction check
---> Package glibc.i686 0:2.12-1.47.el6 will be installed
--> Processing Dependency: libfreebl3.so(NSSRAWHASH_3.12.3) for package: glibc-2.12-1.47.el6.i686
--> Processing Dependency: libfreebl3.so for package: glibc-2.12-1.47.el6.i686
---> Package libgcc.i686 0:4.4.6-3.el6 will be installed
--> Running transaction check
---> Package nss-softokn-freebl.i686 0:3.12.9-11.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package               Arch   Version         Repository             Size
================================================================================
Installing:
 libstdc++             i686   4.4.6-3.el6     kusu-compute-default   298 k
Installing for dependencies:
 glibc                 i686   2.12-1.47.el6   kusu-compute-default   4.3 M
 libgcc                i686   4.4.6-3.el6     kusu-compute-default   110 k
 nss-softokn-freebl    i686   3.12.9-11.el6   kusu-compute-default   116 k

Transaction Summary
================================================================================
Install       4 Package(s)

Total download size: 4.8 M
Installed size: 14 M
Downloading Packages:
--------------------------------------------------------------------------------
Total                                            21 MB/s | 4.8 MB    00:00
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Warning: RPMDB altered outside of yum.
  Installing : libgcc-4.4.6-3.el6.i686                 1/4
  Installing : glibc-2.12-1.47.el6.i686                2/4
  Installing : nss-softokn-freebl-3.12.9-11.el6.i686   3/4
  Installing : libstdc++-4.4.6-3.el6.i686              4/4
Installed products updated.

Installed:
  libstdc++.i686 0:4.4.6-3.el6

Dependency Installed:
  glibc.i686 0:2.12-1.47.el6   libgcc.i686 0:4.4.6-3.el6   nss-softokn-freebl.i686 0:3.12.9-11.el6

Complete!
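With the Intel tools installed on /shared, each user (or job script) simply needs to source the environment scripts that the Intel installers provide before building or running Xeon Phi workloads. The snippet below is a sketch only: the installation prefix /shared/intel, the exact script locations and the source file name are assumptions that depend on where and which version of the tools you installed.

#!/bin/bash
# Sketch: pick up the Intel compiler and Intel MPI environments from the
# shared file system (assumed prefix /shared/intel - adjust to your install).
source /shared/intel/composer_xe_2013/bin/compilervars.sh intel64
source /shared/intel/impi/4.1.0.024/bin64/mpivars.sh

# Example offload build (source file name is hypothetical; flags depend on the code):
icc -openmp -o /shared/omp_numthreadsofl omp_numthreadsofl.c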
IV. IBM Platform HPC: Xeon Phi monitoring, workloads

A. IBM Platform HPC built-in Xeon Phi monitoring

NOTE: The following section requires that Fix Pack hpc-3.2-build216840 is applied to the IBM Platform HPC head node. This fix pack is available via the IBM Fix Central site.

IBM Platform HPC provides rudimentary Xeon Phi monitoring capabilities via the Web-based console out of the box. These capabilities are not enabled by default at installation time. To enable them, perform the following steps on the head node.

Add the line "hasMIC=true" to /usr/share/pmc/gui/conf/pmc.conf, then run:

# sed -i s?unselect=?unselect=mics,?g /usr/share/pmc/gui/conf/prefconf/hostListDTDiv_default.properties
# sed -i s?unselect=?unselect=mics,?g /usr/share/pmc/gui/conf/prefconf/hostListProvisionDTDiv_default.properties
# pmcadmin stop
# pmcadmin start

The IBM Platform HPC Web Console incorrectly assumes that 'micinfo' is located in /usr/bin; the current Intel MPSS installs micinfo to /opt/intel/mic/bin. Here, a wrapper script that calls /opt/intel/mic/bin/micinfo is distributed to /usr/bin on all nodes within the compute-rhel-6.2-x86_64_Xeon_Phi node group.

Create a script 'micinfo' with the following contents:

#!/bin/sh
/opt/intel/mic/bin/micinfo
exit 0

Create the target CFM directory, copy the micinfo script into it, set execute permissions, and synchronize:

# mkdir -p /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/usr/bin
# cp micinfo /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/usr/bin
# chmod 755 /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/usr/bin/micinfo
# kusu-cfmsync -f
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
New file found: /etc/cfm/compute-rhel-6.2-x86_64_Xeon_Phi/usr/bin/micinfo
Distributing 0 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255

The IBM Platform HPC Web Console then shows a "MIC" tab that displays metrics for each Xeon Phi device on a per-host basis.

B. IBM Platform LSF ELIM: Xeon Phi monitoring, job scheduling (dynamic resources)

IBM has prepared an example ELIM script for IBM Platform HPC (and IBM Platform LSF) that leverages the Intel MPSS tools to provide metrics for both monitoring and job scheduling. Download details for the example ELIM script can be found in Appendix A.
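For context, an LSF ELIM is simply an executable placed in $LSF_SERVERDIR that periodically writes a line of the form "<number of name/value pairs> <name1> <value1> ..." to standard output. The fragment below is a much-simplified sketch of that reporting loop for the num_mics metric only; it is not the Appendix A dynelim.intelmic script, and counting devices via /sys/class/mic is an assumption about the MPSS driver.

#!/bin/sh
# Minimal ELIM sketch: report the number of Xeon Phi devices as num_mics.
# The real dynelim.intelmic script reports many more metrics (temperature,
# frequency, power, memory, utilization) parsed from the Intel MPSS tools.
while true; do
    # Count mic devices exposed by the MPSS driver (assumption).
    num_mics=$(ls -d /sys/class/mic/mic[0-9]* 2>/dev/null | wc -l)
    # ELIM protocol: the number of name/value pairs, then the pairs.
    echo "1 num_mics ${num_mics}"
    sleep 60
done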
The example ELIM has been validated on systems with IBM Platform HPC 3.2/IBM LSF Express 8.3, Intel MPSS 2.1 build 4346-16 and 2 Xeon Phi co-processor cards per node. It reports back the following metrics:

- Total number of Xeon Phi co-processors per node
- Number of cores per Xeon Phi co-processor
- Xeon Phi CPU temperature (Celsius)
- Xeon Phi CPU frequency (GHz)
- Xeon Phi total power (Watts)
- Xeon Phi total free memory (MB)
- Xeon Phi CPU utilization (%)

Below, the IBM Platform HPC hpc-metric-tool is used to configure the monitoring of the Xeon Phi specific metrics. Download details for the script (mic_metric_add.sh) that automates the configuration of the Intel MIC metrics can be found in Appendix A. The script requires the ELIM script, dynelim.intelmic, as input. The following commands are executed on the IBM Platform HPC head node:

# chmod 755 /root/dynelim.intelmic
# chmod 755 /root/mic_metric_add.sh
# ./mic_metric_add.sh /root/dynelim.intelmic
Adding External Metric Summary:
Name: num_mics
LSF resource Mapping: default
ELIM file path: /root/dynelim.intelmic
LSF resource interval: 60
LSF resource increase: n
Display Name: Number of MICs
External Metric is added, please run "hpc-metric-tool apply" to apply the change to cluster.
Adding External Metric Summary:
Name: miccpu_temp0
LSF resource Mapping: default
ELIM file path: /root/dynelim.intelmic
LSF resource interval: 60
LSF resource increase: n
Display Name: MIC0 CPU temp Celsius
External Metric is added, please run "hpc-metric-tool apply" to apply the change to cluster.
….
….
Adding External Metric Summary:
Name: micnum_cores1
LSF resource Mapping: default
ELIM file path: /root/dynelim.intelmic
LSF resource interval: 60
LSF resource increase: n
Display Name: MIC1 Number of cores
External Metric is added, please run "hpc-metric-tool apply" to apply the change to cluster.
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
New file found: /etc/cfm/installer-rhel-6.2-x86_64/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.num_mics
New file found: /etc/cfm/installer-rhel-6.2-x86_64/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.micnum_cores1
New file found: /etc/cfm/installer-rhel-6.2-x86_64/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.miccore_freq0
New file found: /etc/cfm/installer-rhel-6.2-x86_64/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.miccpu_util1
New file found: /etc/cfm/installer-rhel-6.2-x86_64/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.miccpu_temp1
….
….
New file found: /etc/cfm/lsf-master-candidate/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.miccore_freq1
New file found: /etc/cfm/lsf-master-candidate/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.miccpu_temp0
New file found: /etc/cfm/lsf-master-candidate/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.micfree_mem0
Distributing 333 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255
Setting up dhcpd service...
Setting up dhcpd service successfully...
Setting up NFS export service...
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
Distributing 89 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255
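Once the ELIM cleanup and LIM restart described next have been completed, the new resource definitions can be confirmed from the head node with standard LSF queries: lsinfo lists the resource names known to the cluster, and lsload -l shows the per-host values reported by the ELIM. For example:

# lsinfo | grep mic
# lsload -l compute000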
Running mic_metric_add.sh as shown above results in multiple ELIMs being written to the $LSF_SERVERDIR directory (/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc) with names such as:

elim.num_mics
elim.miccpu_temp0
...
...

As each of these ELIMs is an identical instance of "dynelim.intelmic", only one copy, elim.num_mics, needs to be retained. Below are the steps to clean up the redundant ELIM scripts.

Clean up the ELIM scripts from the CFM template directories. Here all elim.mic* files are removed; only elim.num_mics is retained (executed on the IBM Platform HPC head node):

# cd /etc/cfm
# find ./ -name "elim.mic*" -print | xargs rm -f

Clean up the ELIMs from $LSF_SERVERDIR on all nodes (executed on the IBM Platform HPC head node):

# lsgrun -m "compute000 compute001" find /opt/lsf/8.3/linux2.6-glibc2.3-x86_64/ -name "elim.mic*" -exec rm -f {} \;

Update the CFM (executed on the IBM Platform HPC head node):

# kusu-cfmsync -f
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
Removing orphaned file: /opt/kusu/cfm/1/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.micnum_cores1
Removing orphaned file: /opt/kusu/cfm/1/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.miccore_freq0
Removing orphaned file: /opt/kusu/cfm/1/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.miccpu_util1
Removing orphaned file: /opt/kusu/cfm/1/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.miccpu_temp1
Removing orphaned file: /opt/kusu/cfm/1/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.mictotal_power1
….
….
Removing orphaned file: /opt/kusu/cfm/7/opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc/elim.micfree_mem0
Updating installer(s)
Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255

# lsadmin limshutdown all
Do you really want to shut down LIMs on all hosts? [y/n] y
Shut down LIM on <mel1> ...... done
Shut down LIM on <compute000> ...... done
Shut down LIM on <compute001> ...... done

** NOTE: The warning messages below in the output of lsadmin limstartup may be ignored. **

# lsadmin limstartup all
Aug 14 11:37:44 2013 23856 3 8.3 do_Resources: /opt/lsf/conf/lsf.shared(340): Resource name processes reserved or previously defined. Ignoring line
Do you really want to start up LIM on all hosts ? [y/n] y
Start up LIM on <mel1> ...... Aug 16 15:37:49 2013 25092 3 8.3 do_Resources: /opt/lsf/conf/lsf.shared(340): Resource name processes reserved or previously defined. Ignoring line
done
Start up LIM on <compute000> ...... Aug 14 11:37:50 2013 88229 3 8.3 do_Resources: /opt/lsf/conf/lsf.shared(340): Resource name processes reserved or previously defined. Ignoring line
done
Start up LIM on <compute001> ...... Aug 14 11:37:50 2013 63077 3 8.3 do_Resources: /opt/lsf/conf/lsf.shared(340): Resource name processes reserved or previously defined.
Ignoring line done The output of the IBM Platform LSF 'lsload' command shows the metrics as expected: # lsload -l HOST_NAME status r15s r1m r15m ut pg io ls it tmp swp mem root maxroot processes clockskew netcard iptotal cpuhz cachesize diskvolume processesroot ipmi powerconsumption ambienttemp cputemp num_mics miccpu_temp0 miccore_freq0 mictotal_power0 micfree_mem0 miccpu_util0 micnum_cores0 miccpu_temp1 miccore_freq1 mictotal_power1 micfree_mem1 miccpu_util1 micnum_cores1 ngpus gpushared gpuexcl_thrd gpuprohibited gpuexcl_proc gpumode0 gputemp0 gpuecc0 gpumode1 gputemp1 gpuecc1 gpumode2 gputemp2 gpuecc2 gpumode3 gputemp3 gpuecc3 hostid ip mac osversion abbros cpumode fanrate rxpackets txpackets rxbytes txbytes droppedrxpackets droppedtxpackets errorrxpackets errortxpackets overrunrxpackets overruntxpackets rxpacketsps txpacketsps rxbytesps txbytesps gpumodel0 gpumodel1 gpumodel2 gpumodel3 gpudriver compute001 ok 0.0 0.0 0.0 5% 0.0 12 0 1 8584M 2G 31G 8591.0 1e+04 725.0 0.0 4.0 6.0 2100.0 2e+04 2e+06 717.0 -1.0 -1.0 -1.0 -1.0 1.0 46.0 1.1 74.0 2662.6 0.0 57.0 - - 010a0c01 eth0:;eth1:9.21.51.37;eth2:;eth3:;eth4:;eth5:; eth0:00:1E:67:49:CC:E5;eth1:00:1E:67:49:CC:E6;eth2:00:1E:67:49:CC:E7;eth3:00:1E:67:49:CC:E8;eth4:00: 1E:67:0C:BE:20 Red_Hat_Enterprise_Linux_Server_release_6.2_(Santiago) RedHat6 06/2d -1 eth0:290901;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:26842;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:338700;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:2668;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; compute000 ok 0.0 0.1 0.0 4% 0.0 10 0 1 8584M 2G 31G 8591.0 1e+04 723.0 0.0 4.0 6.0 1200.0 2e+04 1e+07 715.0 -1.0 -1.0 -1.0 -1.0 1.0 54.0 1.1 67.0 2662.4 0.0 57.0 - - 010a0b01 eth0:;eth1:9.21.51.36;eth2:;eth3:;eth4:;eth5:; eth0:00:1E:67:49:CC:83;eth1:00:1E:67:49:CC:84;eth2:00:1E:67:49:CC:85;eth3:00:1E:67:49:CC:86;eth4:00 :1E:67:0C:BA:D8 Red_Hat_Enterprise_Linux_Server_release_6.2_(Santiago) RedHat6 06/2d -1 eth0:293788;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:26151;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:338976;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:2679;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; - mel1 ok 0.2 0.3 0.3 7% 0.0 60 3 2 28G 33G 25G 3e+04 5e+04 1021.0 0.0 4.0 6.0 1200.0 2e+04 2e+06 938.0 -1.0 -1.0 -1.0 -1.0 - - 010a0a01 eth0:192.0.2.10;eth1:9.21.51.35;eth2:;eth3:;eth4:;eth5:; eth0:00:1E:67:31:42:CD;eth1:00:1E:67:31:42:CE;eth2:00:1E:67:31:42:CF;eth3:00:1E:67:31:42:D0;eth4:00 :1E:67:0C:BB:80 Red_Hat_Enterprise_Linux_Server_release_6.2_(Santiago) RedHat6 06/2d -1 eth0:3416828;eth1:13903274;eth2:0;eth3:0;eth4:0;eth5:0; eth0:11413427;eth1:29766156;eth2:0;eth3:0;eth4:0;eth5:0; eth0:331724;eth1:2318337;eth2:0;eth3:0;eth4:0;eth5:0; 
eth0:13522166;eth1:42629433;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:125;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:9;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:3;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:1;eth2:0;eth3:0;eth4:0;eth5:0; eth0:0;eth1:0;eth2:0;eth3:0;eth4:0;eth5:0;

It is now possible to display the newly added resources in the IBM Platform HPC Web Console, where the Xeon Phi specific metrics are shown on the respective host status lines.

C. IBM Platform LSF: Xeon Phi job scheduling (LSF configuration)

The following steps describe the necessary IBM Platform LSF configuration in support of Xeon Phi. All LSF hosts equipped with Xeon Phi must be tagged with the Boolean resource "mic". This allows users submitting Xeon Phi specific workloads to IBM Platform LSF to request a system equipped with Xeon Phi. Additionally, it is necessary to enable resource reservation per slot for the defined resource 'num_mics'.

Edit /etc/cfm/templates/lsf/default.lsf.shared and make the following updates:

Begin Resource
RESOURCENAME  TYPE     INTERVAL  INCREASING  DESCRIPTION                # Keywords
mic           Boolean  ()        ()          (Intel MIC architecture)
....
....
End Resource

Edit /etc/cfm/templates/lsf/default.lsf.cluster and make the following updates:

Begin Host
HOSTNAME        model  type  server  r1m  mem  swp  RESOURCES   #Keywords
XXX_lsfmc_XXX   !      !     1       3.5  ()   ()   (mg)
compute000      !      !     1       3.5  ()   ()   (mic)
compute001      !      !     1       3.5  ()   ()   (mic)
....
....
End Host

Edit /etc/cfm/templates/lsf/lsbatch/default/configdir/lsb.resources and make the following updates:

Begin ReservationUsage
RESOURCE    METHOD    RESERVE
num_mics              Y
End ReservationUsage

Edit /etc/cfm/templates/lsf/lsbatch/default/configdir/lsb.params and add the following parameter:

Begin Parameters
....
....
RESOURCE_RESERVE_PER_SLOT=Y
End Parameters

Run the command kusu-addhost -u to make the changes take effect.

# kusu-addhost -u
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
Updating installer(s)
Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255
Setting up dhcpd service...
Setting up dhcpd service successfully...
Setting up NFS export service...
Running plugin: /opt/kusu/lib/plugins/cfmsync/getent-data.sh
Distributing 102 KBytes to all nodes.
Updating installer(s)
Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255 Sending to 198.51.100.255 Sending to 192.0.2.255
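Once kusu-addhost -u has distributed the updated configuration and the LIMs have re-read it, the tagging can be spot-checked from the head node with standard LSF commands: lshosts should list (mic) in the RESOURCES column for the Xeon Phi equipped hosts, and bhosts -l should show num_mics under "CURRENT LOAD USED FOR SCHEDULING" once the ELIM from section B is reporting. For example:

# lshosts compute000 compute001
# bhosts -l compute000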
D. IBM Platform LSF: Xeon Phi job submission

This section discusses the methodology for submitting both "offload" and "native" type workloads to Xeon Phi coprocessor equipped nodes in an IBM Platform LSF cluster. In simple terms:

- With the "offload" model, the executable runs on the host processor and offloads specific work to the Xeon Phi coprocessor.
- With the "native" model, the executable runs natively on the Xeon Phi coprocessor.

For both "offload" and "native" type workloads, it is assumed that the following configuration for IBM Platform LSF exists (see Section C: IBM Platform LSF: Xeon Phi job scheduling above):

- Non-shared dynamic resource 'num_mics', which counts the number of Xeon Phi coprocessor cards available on a system (online state).
- Non-shared Boolean resource 'mic', which is configured for nodes equipped with Xeon Phi coprocessor(s).
- RESOURCE_RESERVE_PER_SLOT=Y has been configured.

i. Offload Example I

The following example shows a simple offload binary ('omp_numthreadsofl') being launched under IBM Platform LSF. The binary has been compiled using the Intel Compiler (intel-compilerproc-117-13.0-1.x86_64) and is launched using the IBM Platform LSF bsub command, requesting the Boolean resource "mic" and rusage on the resource num_mics equal to 1. Note that it is currently not possible to target a specific Xeon Phi device at runtime when running an offload executable; this is a limitation of the current Intel tools.

$ bsub -I -R "select[mic] rusage[num_mics=1]" /shared/omp_numthreadsofl -t 16
Job <1083> is submitted to default queue <medium_priority>.
<<Waiting for dispatch ...>>
<<Starting on compute000>>
Hello World from thread = 0
Hello World from thread = 11
Number of threads on node compute000-mic0 = 16
Hello World from thread = 2
Hello World from thread = 1
Hello World from thread = 4
Hello World from thread = 9
Hello World from thread = 8
Hello World from thread = 10
Hello World from thread = 5
Hello World from thread = 6
Hello World from thread = 7
Hello World from thread = 3
Hello World from thread = 13
Hello World from thread = 12
Hello World from thread = 14
Hello World from thread = 15

ii. Offload Example II

The following shows an example of an Intel MPI offload binary being launched under IBM Platform LSF. The binary has been compiled using the Intel Compiler (intel-compilerproc-117-13.0-1.x86_64) and is launched using the IBM Platform LSF bsub command, requesting the following:

- The Boolean resource "mic"
- Resource reservation (rusage) on the resource num_mics equal to 1 (per slot)
- Two processors (MPI ranks), one processor per node

Note that each MPI rank will use offload if a Xeon Phi is available.

$ bsub -n 2 -R "select[mic] rusage[num_mics=1] span[ptile=1]" -I mpiexec.hydra /shared/mixedofl_demo
Job <1082> is submitted to default queue <medium_priority>.
<<Waiting for dispatch ...>>
<<Starting on compute000>>
Hello from thread 0 out of 224 from process 0 out of 2 on compute000
Hello from thread 94 out of 224 from process 0 out of 2 on compute000
Hello from thread 8 out of 224 from process 0 out of 2 on compute000
Hello from thread 78 out of 224 from process 0 out of 2 on compute000
Hello from thread 14 out of 224 from process 0 out of 2 on compute000
Hello from thread 70 out of 224 from process 0 out of 2 on compute000
Hello from thread 1 out of 224 from process 0 out of 2 on compute000
Hello from thread 57 out of 224 from process 0 out of 2 on compute000
Hello from thread 113 out of 224 from process 0 out of 2 on compute000
Hello from thread 72 out of 224 from process 0 out of 2 on compute000
Hello from thread 16 out of 224 from process 0 out of 2 on compute000
….
….
Hello from thread 43 out of 224 from process 1 out of 2 on compute001
Hello from thread 98 out of 224 from process 1 out of 2 on compute001

iii. Native Examples

IBM has devised an example job wrapper script which allows users to launch jobs targeted to Xeon Phi under IBM Platform LSF. The example job wrapper script assumes that IBM Platform LSF has been configured as per Section C: IBM Platform LSF: Xeon Phi job scheduling above.
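Before looking at the wrapper's assumptions and the examples that follow, the fragment below sketches the general approach such a wrapper can take: discover which execution hosts the job received, derive the coprocessor hostnames, record them on the job with bpost, and launch the native binary. It is deliberately simplified (one device per host, no device locking, no error handling) and is not the Appendix A mic.job script; the <host>-mic0 naming and the use of micnativeloadex follow the conventions of this example.

#!/bin/sh
# Simplified sketch of a native-mode job wrapper -- NOT the Appendix A mic.job.
# Usage under LSF:  bsub -R "select[mic] rusage[num_mics=1]" ./mywrapper.sh /shared/a.out
BINARY=$1

# LSB_HOSTS lists each execution host once per allocated slot.
MIC_HOSTS=""
msg_id=0
for host in $(echo ${LSB_HOSTS} | tr ' ' '\n' | sort -u); do
    mic="${host}-mic0"                 # single coprocessor per host assumed
    MIC_HOSTS="${MIC_HOSTS} ${mic}"
    # Mark the coprocessor as assigned to this job (visible via bjobs -l).
    bpost -i ${msg_id} -d "${mic}" ${LSB_JOBID}
    msg_id=$((msg_id + 1))
done

# Native (non-MPI) case: micnativeloadex (Intel MPSS) uploads and runs the
# binary on a local device; -d 0 selects the first coprocessor of this
# execution host. An Intel MPI co-processor mode job would instead launch
# mpiexec.hydra against the ${MIC_HOSTS} names collected above.
/opt/intel/mic/bin/micnativeloadex ${BINARY} -d 0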
Download instructions for the example ELIM (dynelim.intelmic) and the job wrapper script (mic.job) can be found in Appendix A. The job wrapper makes the following assumptions:

- A maximum of 2 Xeon Phi devices per node is supported.
- There is a shared $HOME directory for the user running the job; consequently, the wrapper will not function for the user 'root'.
- Each Xeon Phi equipped LSF host is running an ELIM which reports back the number of Xeon Phi devices available in the node (dynamic resource 'num_mics').
- Jobs requiring a Xeon Phi are submitted to LSF with the correct resource requirements. For Xeon Phi jobs, bsub -n N translates to a request for N Xeon Phi devices. Jobs must also be submitted with the corresponding rusage[num_mics=1] resource (assuming the configuration in Section C). For example, to submit a job which requires 2 Xeon Phi coprocessors:
     $ bsub -n 2 -R "select[mic] rusage[num_mics=1]" <PATH_TO>/mic.job <PATH_TO>/a.out
  Note that 2 job slots will also be allocated on the selected nodes.
- Intel MPI (runtime) must be available if MPI ranks are to be run on Xeon Phi (coprocessor mode).
- The job wrapper script has been tested with both native Xeon Phi binaries (leveraging the Intel MPSS 'micnativeloadex' utility) and Intel MPI Xeon Phi co-processor mode jobs.
- Xeon Phi jobs submitted to IBM Platform LSF are marked with the Xeon Phi hostname(s) using the IBM Platform LSF bpost command. This provides rudimentary control over access to devices: once a Xeon Phi has been marked for use by a job, it is used exclusively for the duration of the job.

Native MPI Example ('Co-processor mode'): The following shows an example of a native Xeon Phi MPI binary being launched under IBM Platform LSF. The binary has been compiled using the Intel Compiler (intel-compilerproc-117-13.0-1.x86_64) and is launched using the 'mic.job' wrapper. The resource requirement string for the job requests the Boolean resource "mic" and rusage on the resource num_mics equal to 1.

$ bsub -n 2 -I -R "select[mic] rusage[num_mics=1]" /shared/mic.job /shared/cpuops_mpi
Job <975> is submitted to default queue <medium_priority>.
<<Waiting for dispatch ...>>
<<Starting on compute001>>
- current ops per sec [avg 35.24] 33.66 35.44 35.24 35.24 35.24 35.44 35.24 35.24 35.24 35.44 35.24 35.44 35.24 35.44 35.24 35.24 35.24 35.44 35.24 35.24 35.24 35.44 35.24 35.24 35.24 35.44 35.24 35.24 35.24 35.24 35.24 35.24
....
....

During execution of the job we see the bpost of the Xeon Phi hostname(s) to the job (this is performed by the job wrapper script, 'mic.job'):

# bjobs -l 975
Job <975>, User <hpcadmin>, Project <default>, Status <RUN>, Queue <medium_priority>, Interactive mode, Command </shared/mic.job /shared/cpuops_mpi>, Share group charged </hpcadmin>
Wed Aug 14 12:48:13: Submitted from host <mel1>, CWD <$HOME>, 2 Processors Requested, Requested Resources <select[mic] rusage[num_mics=1]>;
Wed Aug 14 12:48:15: Started on 2 Hosts/Processors <compute001> <compute000>;
Wed Aug 14 12:48:29: Resource usage collected.
MEM: 5 Mbytes; SWAP: 357 Mbytes; NTHREAD: 7
PGID: 188945;  PIDs: 188945
PGID: 188946;  PIDs: 188946 188948 188969
PGID: 188970;  PIDs: 188970
PGID: 188971;  PIDs: 188971

SCHEDULING PARAMETERS:
          r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - -
loadStop  - - - - - - - - -
          root maxroot processes clockskew netcard iptotal cpuhz cachesize
loadSched
loadStop
          diskvolume processesroot ipmi powerconsumption ambienttemp cputemp
loadSched
loadStop
          num_mics miccpu_temp0 miccore_freq0 mictotal_power0 micfree_mem0
loadSched
loadStop
          miccpu_util0 micnum_cores0 miccpu_temp1 miccore_freq1 mictotal_power1
loadSched
loadStop
          micfree_mem1 miccpu_util1 micnum_cores1 ngpus gpushared
loadSched
loadStop
          gpuexcl_thrd gpuprohibited gpuexcl_proc gpumode0 gputemp0 gpuecc0
loadSched
loadStop
          gpumode1 gputemp1 gpuecc1 gpumode2 gputemp2 gpuecc2 gpumode3 gputemp3
loadSched
loadStop
          gpuecc3
loadSched
loadStop

EXTERNAL MESSAGES:
MSG_ID FROM      POST_TIME      MESSAGE           ATTACHMENT
0      hpcadmin  Aug 14 12:48   compute001-mic0   N
1      hpcadmin  Aug 14 12:48   compute000-mic0   N

# bhosts -l compute000
HOST compute000
STATUS CPUF JL/U MAX NJOBS RUN SSUSP USUSP RSV DISPATCH_WINDOW
ok 16.00 - 16 1 1 0 0 0
CURRENT LOAD USED FOR SCHEDULING:
         r15s r1m r15m ut pg io ls it tmp swp mem root maxroot
Total    0.0 0.0 0.0 0% 0.0 16 0 27 8584M 2G 31G 8591.0 1e+04
Reserved 0.0 0.0 0.0 0% 0.0 0 0 0 0M 0M 0M 0.0 0.0
         processes clockskew netcard iptotal cpuhz cachesize diskvolume
Total    722.0 0.0 4.0 6.0 1200.0 2e+04 1e+07
Reserved 0.0 0.0 0.0 0.0 0.0 0.0 0.0
         processesroot ipmi powerconsumption ambienttemp cputemp num_mics
Total    714.0 -1.0 -1.0 -1.0 -1.0 0.0
Reserved 0.0 0.0 0.0 0.0 0.0 1.0
….
.....

# bhosts -l compute001
HOST compute001
STATUS CPUF JL/U MAX NJOBS RUN SSUSP USUSP RSV DISPATCH_WINDOW
ok 16.00 - 16 1 1 0 0 0
CURRENT LOAD USED FOR SCHEDULING:
         r15s r1m r15m ut pg io ls it tmp swp mem root maxroot
Total    0.0 0.0 0.0 0% 0.0 17 0 32 8584M 2G 31G 8591.0 1e+04
Reserved 0.0 0.0 0.0 0% 0.0 0 0 0 0M 0M 0M 0.0 0.0
         processes clockskew netcard iptotal cpuhz cachesize diskvolume
Total    731.0 0.0 4.0 6.0 1200.0 2e+04 2e+06
Reserved 0.0 0.0 0.0 0.0 0.0 0.0 0.0
         processesroot ipmi powerconsumption ambienttemp cputemp num_mics
Total    717.0 -1.0 -1.0 -1.0 -1.0 0.0
Reserved 0.0 0.0 0.0 0.0 0.0 1.0
….
….

You can see above that num_mics has 1 unit reserved (as expected) on both compute000 and compute001.

Native (non-MPI) Example: The following shows an example of a native Xeon Phi binary (non-MPI) being launched under IBM Platform LSF. The binary has been compiled using the Intel Compilers and Intel MPI (intel-mpi-intel64-4.1.0p-024.x86_64), and is launched using the 'mic.job' wrapper. The resource requirement string for the job requests the Boolean resource "mic" and rusage on the resource num_mics equal to 1.

$ bsub -I -R "select[mic] rusage[num_mics=1]" /shared/mic.job /shared/fibo
Job <1078> is submitted to default queue <medium_priority>.
<<Waiting for dispatch ...>>
<<Starting on compute000>>
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181
….
….
Remote process returned: 0
Exit reason: SHUTDOWN OK

During execution of the job you see the bpost of the Xeon Phi hostname to the job (this is performed by the job wrapper script):

# bjobs -l 1078
Job <1078>, User <hpcadmin>, Project <default>, Status <RUN>, Queue <medium_priority>, Interactive mode, Command </shared/mic.job /shared/fibo>, Share group charged </hpcadmin>
Wed Aug 14 17:43:54: Submitted from host <mel1>, CWD <$HOME>, Requested Resources <select[mic] rusage[num_mics=1]>;
Wed Aug 14 17:43:57: Started on <compute000>;

SCHEDULING PARAMETERS:
          r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - -
loadStop  - - - - - - - - -
          root maxroot processes clockskew netcard iptotal cpuhz cachesize
loadSched
loadStop
….
….
          gpumode1 gputemp1 gpuecc1 gpumode2 gputemp2 gpuecc2 gpumode3 gputemp3
loadSched
loadStop
          gpuecc3
loadSched
loadStop

EXTERNAL MESSAGES:
MSG_ID FROM      POST_TIME      MESSAGE           ATTACHMENT
0      hpcadmin  Aug 14 17:43   compute000-mic0   N

# bhosts -l compute000
HOST compute000
STATUS CPUF JL/U MAX NJOBS RUN SSUSP USUSP RSV DISPATCH_WINDOW
ok 16.00 - 16 1 1 0 0 0
CURRENT LOAD USED FOR SCHEDULING:
         r15s r1m r15m ut pg io ls it tmp swp mem root maxroot
Total    0.0 0.0 0.0 3% 0.0 23 1 2 8600M 2G 31G 0.0 0.0
Reserved 0.0 0.0 0.0 0% 0.0 0 0 0 0M 0M 0M
         processes clockskew netcard iptotal cpuhz cachesize diskvolume
Total    0.0 0.0 0.0 0.0 0.0 0.0 0.0
Reserved -
         processesroot ipmi powerconsumption ambienttemp cputemp num_mics
Total    0.0 0.0 0.0 0.0 0.0 0.0
Reserved - 1.0
….
….
         gpuecc0 gpumode1 gputemp1 gpuecc1 gpumode2 gputemp2 gpuecc2
Total    0.0 0.0 0.0 0.0 0.0 0.0 0.0
Reserved -
         gpumode3 gputemp3 gpuecc3
Total    0.0 0.0 0.0
Reserved
….
….

You can see above that num_mics has 1 unit reserved (as expected).

Appendix A: IBM Platform HPC and Intel Xeon Phi Integration Scripts

The package containing all example scripts referred to in this document is available for download from IBM Fix Central (http://www.ibm.com/support/fixcentral/). The scripts are provided as examples, as per the terms indicated in the file LICENSE. The package contains the following example scripts:

- dynelim.mic: dynamic ELIM script to collect Intel MIC related metrics (HPC, LSF)
- mic.job: Intel MIC job wrapper (LSF)
- mic_metric_add.sh: script to facilitate the rapid addition of Intel MIC metrics (HPC)
- rc.local.xeon_phi: script portion to configure the network bridge and configure Intel MPSS to use the network bridge
- post_install.sh: custom script for IBM Platform HPC; this forces the execution of rc.local on the first boot after the CFM has executed

Copyright and Trademark Information

© Copyright IBM Corporation 2013

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM®, the IBM logo and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.