IBM Systems & Technology Group
Platform MPI 9.1.2
Introduction to Platform MPI
December 2013
© 2013 IBM Corporation

Topics
• Introduction
• Key Concepts in Parallel Computing
• MPI Background
• Platform MPI Background
• Installing Platform MPI
• Building & Running MPI Applications

Who am I?
Stan Graves <gravess@us.ibm.com>
Platform MPI Software Developer
• Joined the HP-MPI team ~7 years ago
• Primary customer support contact
• Customer-facing technical trainer
ClusterPack Project Manager & Developer
• ~3 years as Developer, ~2 years as Project Manager
• Scale-out cluster management solution
• Parallel system administration tools
• System image creation & distribution

What is Parallel Computing?
"Parallel computing" is an overloaded term: parallelism at the bit level, instruction level, data level, and task level can all claim the name.
"Parallel machines" can be some combination of:
• "Scale up": multiple processing elements within a single machine (multiple sockets per machine and/or multiple cores per socket)
• "Scale out": multiple computers in a cluster (or grid, or cloud)
For purposes of this discussion: parallel computing is the coordination of work across many individual processes running concurrently (i.e. "in parallel") on one or more computers to solve a single problem.

What is Parallel Computing?
Parallel Programming Opportunities
• Clusters can be scaled to match the problem size
• Commercial off-the-shelf (COTS) hardware components for building clusters are readily available
• Many possible cluster styles, depending on customer needs:
  – High availability
  – Load balancing
  – Computation clusters
  – Grid computing
  – Cloud computing

What is Parallel Computing?
Parallel Programming Difficulties
• The coordination & synchronization of multiple processes running concurrently introduces complexity that grows as the number of processes increases
• Common software bugs in parallel computing:
  – Race conditions
  – Deadlocks
  – Synchronization errors
  – Parallel slowdown
• Interconnect bandwidth & latency are often the limiting factors for overall parallel program throughput
• Interconnect fabrics are increasing in complexity (e.g. torus configurations, multi-card & multi-port networks)

What is Parallel Computing?
Parallel Programming Difficulties
• There is no "silver bullet" ("The Mythical Man-Month"):
  – "Accidental complexity" is complexity that is incidental and non-essential to the problem that needs to be solved
  – "Essential complexity" is inherently part of the problem that needs to be solved
• Amdahl's Law gives the theoretical improvement of a task when only a subset of it is improved: if a fraction p of a program can be parallelized, the maximum speedup on N processors is S = 1 / ((1 - p) + p/N). It is often used to estimate the expected effect of adding processors to a parallel application.

What is MPI?
Message Passing Interface (MPI)
• An API specification for the interchange of messages between individual computing processes
• The de facto standard for parallel programming
• Formalizes topology, synchronization, and communication for multi-process jobs
• Language independent, platform independent
• Many different implementations, which are typically not ABI-compatible with one another
• MPI Forum: http://www.mpi-forum.org/
• General overview at Wikipedia: http://en.wikipedia.org/wiki/Message_Passing_Interface

What is a rank?
• The MPI Standard uses the term "rank" to describe a process that is part of an MPI job
• Ranks have two unique identifiers: a global rank ID and a local rank ID
• A rank "is a" process in "almost all" implementations, and all threads in a process are part of the same rank
  – NOTE: this is not how the standard defines a rank
• Generally, MPI jobs are run with one rank per core

MPI Key Concepts: MPI-1 Standard
• Communicators: objects connecting groups of processes in the MPI session. Every MPI job includes the default communicator MPI_COMM_WORLD, which contains all the ranks that are part of the job.
• Point-to-point operations: sending and receiving messages between two processes; the canonical examples are MPI_Send and MPI_Recv.
• Collective operations: coordinated operations among multiple processes; for example, all ranks cooperating to sum sets of numbers.
• Derived datatypes: built from predefined MPI datatypes such as MPI_INT, MPI_CHAR, and MPI_DOUBLE. For example, each process may hold an array of ints that all processes send to the root with MPI_Gather.

MPI Key Concepts: MPI-2 Standard
• One-sided communication: Put, Get, and Accumulate, being a write to remote memory, a read from remote memory, and a reduction operation on the same memory across a number of tasks.
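The MPI-1 point-to-point concepts above can be shown in a minimal sketch, assuming a working MPI installation (the tag and payload values are illustrative; build with an MPI compiler wrapper and launch with at least two ranks):

```c
/* Point-to-point sketch: rank 0 sends one int to rank 1. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* count 1, type MPI_INT, destination rank 1, tag 0 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

Note that MPI_Send/MPI_Recv match on (source, tag, communicator); the same pattern scales to any pair of ranks in MPI_COMM_WORLD.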
  – Also defined are global, pairwise, and remote locks.
• Collective extensions: various extensions to the collective operations (e.g. MPI_Reduce_local).
• Dynamic process management: the key aspect of this MPI-2 feature is "the ability of an MPI process to participate in the creation of new MPI processes or to establish communication with MPI processes that have been started separately" (e.g. MPI_Comm_spawn).
• Parallel I/O: a collection of functions designed to allow the difficulties of managing I/O on distributed systems to be abstracted away to the MPI library, as well as allowing files to be easily accessed in a patterned fashion using the existing derived-datatype functionality.

MPI Key Concepts: MPI-3 Standard
• Non-blocking collectives: non-blocking versions of all collective calls, allowing collective communication to overlap with application computation.
• Neighborhood collectives: neighborhood (a.k.a. sparse) collective operations, allowing the user to define specific topology patterns for sparse communication in collective operations.
• Additional one-sided communication operations
• Fortran 2008 bindings
• Clarifications to existing parts of the MPI 2.2 Standard

Platform MPI 9: Market Leadership
• Auto-detect interconnect: runtime detection or explicit specification of the interconnect protocol; broad range of supported interconnects
• Improved automated benchmarking of selected collective operations: improved dynamic collective algorithm selection, and improved user benchmarking for better algorithm selection
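The non-blocking collectives described above let a rank start a collective and keep computing while it progresses. A hedged sketch, assuming an MPI-3-capable library (the payload is illustrative):

```c
/* Non-blocking collective sketch: start an MPI_Iallreduce, overlap
 * local work, then wait; the result is only valid after MPI_Wait. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, local, sum = 0;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    local = rank + 1;

    /* Start the global sum without blocking ... */
    MPI_Iallreduce(&local, &sum, 1, MPI_INT, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* ... unrelated application computation can run here ... */

    MPI_Wait(&req, MPI_STATUS_IGNORE);
    printf("rank %d: global sum = %d\n", rank, sum);

    MPI_Finalize();
    return 0;
}
```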
• LSB (Linux Standard Base): abstracts system calls, allowing one MPI library to run with multiple libc versions
• Linux/Windows support: a shared source base for both OSs makes them "bug compatible"; fixing a bug once fixes it for all OSs
• Scheduler neutral: LSF, PE-POE, PBS/Torque, SLURM, MS-HPC
• TCP performance: optimized TCP point-to-point communications; runtime tunable to optimize large messages over TCP
• Improved collective algorithms: improved Reduce/Allreduce, Allgather, Alltoall

What is Platform MPI?
• Platform MPI is a proprietary implementation of the MPI Standard: versions 1.2 and 2.2 in full, and version 3.0 in part.
• More ISVs have standardized on Platform MPI and distribute Platform MPI than any other commercial MPI, including Abaqus; ANSYS, CFX, and FLUENT; RADIOSS; Molpro (University of Cardiff); NX Nastran; and AMLS.

Major Product Milestones
• 1997: First release, HP-MPI 1.1 for HP-UX
• 2003: First release of HP-MPI 2.0 for Linux
• 2006: First release of HP-MPI 1.0 for Windows
• 2009: Last HP-MPI release (2.3.1); first release of Platform MPI 7.1
• 2010: Platform MPI 8.0; first "simultaneous" release for Linux & Windows; first release to incorporate Platform MPI 5.6.x (Scali MPI) features
• 2012: First release of IBM Platform MPI 8.3
• 2013: Platform MPI 9.1 (performance & scalability enhancements); Platform MPI 9.1.2 Community Edition

Platform MPI Advantage
• Proven scalability to 128K ranks
• Supports a broad set of interconnects:
  – InfiniBand: SDR, DDR, QDR, FDR; Mellanox (OFED/IBV); QLogic (PSM)
  – 10GbE
  – Myrinet (MX)
• Multiple protocol support: interconnect "native" protocols (IBV, PSM, MX, ...), uDAPL, TCP, shared memory
• Automatic interconnect detection/selection at runtime: develop, debug, and test with TCP, then run with IBV.
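The TCP-then-IBV workflow above is driven from the mpirun command line. A sketch using Platform MPI's interconnect selection flags (the host file, application name, and rank count are illustrative):

```shell
# Develop, debug, and test over plain TCP (runs on any Ethernet network)
$MPI_ROOT/bin/mpirun -TCP -np 16 -hostfile ./hosts ./my_app

# Production run over InfiniBand verbs on the same cluster
$MPI_ROOT/bin/mpirun -IBV -np 16 -hostfile ./hosts ./my_app
```

With no interconnect flag at all, mpirun auto-detects the fastest available fabric at runtime.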
Platform MPI Advantage
• Supports a wide variety of schedulers: LSF for Linux and Windows, PE-POE, Windows HPC, SLURM, PBS Pro
• MPI 2.2 Standard compliance in all products
• Preview of selected MPI 3.0 features: non-blocking collectives, Fortran 2008 bindings
• MPI 3.1 draft high-availability APIs (Standard Edition only)

Installation and Upgrades
• All Platform MPI files are located under the MPI_ROOT installation location
• The installation directory is relocatable
• Uses InstallAnywhere on Linux & Windows
• Installation files can be removed after installation
• The product can be installed once and then copied, or included in installation images

What is MPI_ROOT?
• Multiple versions of Platform MPI can be installed on the same cluster
• MPI_ROOT is an environment variable that should be set to the installation root of the desired version
• mpirun will "intuit" an MPI_ROOT if one is not set
  – This process is "good" but NOT foolproof
• Linux: $MPI_ROOT/bin/mpirun …
• Windows: %MPI_ROOT%\bin\mpirun …

Platform MPI Editions
• Platform MPI 9.1.2 Community Edition:
  – Free download from IBM developerWorks (registration required)
  – Limited to 4096 ranks per job
  – Limited support for the "Cluster Test Tools"; no support for HA features
  – developerWorks forum for support (best-effort basis only)
  – ONLY "full versions" will be released on the developerWorks portal
• Community Edition + Support:
  – Limited to 8192 ranks per job
  – Limited support for the "Cluster Test Tools"; no support for HA features
  – Entitled to support and FixPacks
• Standard Edition:
  – No limits on job size (64K+ ranks)
  – Full access to the "Cluster Test Tools" and HA features
  – Entitled to support and FixPacks

"Hello World"

  #include <stdlib.h>
  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, size, len;
      char name[MPI_MAX_PROCESSOR_NAME];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      MPI_Get_processor_name(name, &len);
      printf("Hello world! I'm %d of %d on %s\n", rank, size, name);
      MPI_Finalize();
      exit(0);
  }

Compiling with MPI
For each language, the compiler wrappers search for a compiler in a specific order:
1. If the user has established a preference with the environment variables MPI_CC, MPI_CXX, MPI_F77, or MPI_F90, that takes precedence.
2. Otherwise, the wrapper searches a language-specific list of compilers from left to right and uses the first one found in the user's PATH.
We recommend option #1.
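For example, option #1 can be applied by pinning the C compiler explicitly before invoking the wrapper (assuming gcc is on the PATH; the source file name is illustrative):

```shell
export MPI_CC=gcc
$MPI_ROOT/bin/mpicc -o hello hello.c
```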
Use -show to view the compiler wrapper's options and the command line it will issue:

  % $MPI_ROOT/bin/mpicc -show example.c

Compiling with Platform MPI
• The compiler wrappers are provided as a "template"
• The -show option should be used to examine the compile & link commands
• ONLY shared libraries are shipped & supported
• Build commands can be included in a more complex build environment
  – Support for such environments is "limited" and on a best-effort basis

Building an example application
• Best practice: set MPI_ROOT in the environment to the installed location of Platform MPI
• Multiple versions of Platform MPI can be installed on the same machine
• Platform MPI should be installed in a shared-filesystem location, or locally on each machine in the cluster
• Platform MPI ships with several example applications that can be used for testing
• In the examples that follow, the "demo" directory is on a shared file system across all the nodes in the cluster, and the Platform MPI version is installed locally on each node

Running with Platform MPI
• Setting MPI_ROOT and invoking the appropriate mpirun command is sufficient to start the MPI job:
  $MPI_ROOT/bin/mpirun …
• In most cases it is NOT appropriate to add Platform MPI directories to system-level environment variables:
  – PATH
  – LD_LIBRARY_PATH
  – LD_PRELOAD

QUESTIONS?
Stan Graves (gravess@us.ibm.com)
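Putting the preceding slides together, a typical build-and-run session might look like this (the install path and rank count are illustrative; the hello_world.c example ships under $MPI_ROOT/help):

```shell
# Point at the desired Platform MPI installation
export MPI_ROOT=/opt/ibm/platform_mpi

# Build a shipped example with the compiler wrapper
$MPI_ROOT/bin/mpicc -o hello_world $MPI_ROOT/help/hello_world.c

# Launch 4 ranks on the local host
$MPI_ROOT/bin/mpirun -np 4 ./hello_world
```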