Model Coupling Toolkit: Recent Developments and Future Plans
Robert Jacob
Argonne National Laboratory
Second Workshop on Coupling Technologies for Earth System Modeling, NCAR, February 20-22, 2013.

About Argonne
Founded in 1943, designated a national laboratory in 1946
Managed by The University of Chicago for the U.S. Department of Energy
– More than 2,900 employees and 5,000+ facility users
– About $475M/year budget
– 1,500-acre, wooded site in DuPage County, Illinois
Broad science portfolio
Numerous sponsors

MCT Philosophy: Model coupling vs. “the coupler”
MCT is not a coupler.
Instead, MCT provides datatypes and methods that you add to your models (and to a separate coupler or driver, if you want one) to make the models “couple-able”.
Models should also follow a few programming standards (suggested, but not required, for using MCT):
– Separate “initialization” and “run” methods.
– Do not use MPI_COMM_WORLD everywhere.
– Avoid global data types.

Model Coupling Toolkit: History
Pre-History:
– 1996: parallel coupler for Fast Ocean Atmosphere Model (Jacob, UW-Madison)
– 1998: Physical-space Statistical Analysis System (Larson, NASA)
U.S. Department of Energy ACPI Avant-Garde project (2000-2001)
– First work on parallelizing CCSM coupler (cpl5->cpl6)
– MCT 1.0
DOE Scientific Discovery through Advanced Computing (SciDAC) (2001-2011)
– MCT 2.0 to MCT 2.7.3
– CCSM3, CCSM4, CESM1.
DOE Climate Science for a Sustainable Energy Future (CSSEF) (2011-2015)
– MCT 2.8
– Next generation MCT
All DOE support is from the climate modeling program in the Office of Biological and Environmental Research (BER) in the Office of Science.

MCT Architecture
– High-level MCT classes
– Low-level MCT classes
– Message-Passing Environment Utilities (MPEU)

mpeu (message passing environment utilities)
Developed by the NASA DAO and extended by MCT developers, mpeu provides the following services to Fortran90 MPI applications:
– Support for basic derived types (List, String) on which low-level classes in MCT are built
– F90 module-style access to MPI
– Support for multiprocessor stdout and stderr
– Error handling / shutdown
– Support for namelist replacement “resource files”
– Run-time flow tracing
– Timing/Load balance measurements

MCT Low-level Classes
– MCTWorld: coupled model registry (describes how many models are coupled; no limit)
– AttrVect: multi-field data storage (holds the data being transferred; any amount)
– GlobalSegMap: domain decomposition (any grid, any decomposition)
– Router: intercomponent parallel data transfer scheduler (between two GSMaps)
– Transfer: intercomponent parallel data transfer (for a Router and an AV)
– Rearranger: intracomponent parallel data redistribution (for an AV and a GSMap of the same grid)

MCT High-Level Classes and Modules
– SparseMatrix: interpolation (sparse) matrix object
– MatAttrVectMult: sparse matrix – Attribute Vector multiply (for interpolation)
– GeneralGrid: physical grid description
– Accumulator: time averaging and accumulation support
– SpatialIntegral: masked/unmasked spatial integrals and averages
– Merge: combining sources from two or more models
Communication methods for MCT datatypes:
– AccumulatorComms
– AttrVectComms
– GeneralGridComms
– GlobalSegMapComms
– SparseMatrixComms

Typical MCT Use:
ATM (M nodes)
– Initialization: Call MCT World; Define GlobalSegMap; Define AttrVect; Define Router; Read Atmosphere Data
– DO WORK
– MCT_Send(AtrVect, Router)
– MCT_Recv(AtrVect, Router)
CPL (N nodes)
– Initialization: Call MCT World; Define GlobalSegMaps; Define AttrVects; Define Routers; Define Accumulators; Read Matrix elements
– MCT_Recv(AAtrVect, ARouter)
– MCT_Recv(OAtrVectin, ORouter)
– MCT_AvMatVectMult(AAtrVect, SparseMatrix, OAtrVectout)
– Compute Fluxes
– MCT_Send(AAtrVect, ARouter)
– MCT_Send(OAtrVect, ORouter)
OCN (P nodes)
– Initialization: Call MCT World; Define GlobalSegMap; Define AttrVect; Define Router; Read Ocean Data
– DO WORK
– MCT_Send(AtrVect, Router)
– MCT_Recv(AtrVect, Router)

More on MCT Use
The user must:
– Know how data is laid out on processors.
– Describe the decomposition to MCT with a GSMap
  • Points are uniquely numbered globally
– Copy local data into an MCT Attribute Vector.
  • Copy in either memory order or global index order
– Calculate interpolation weights (with SCRIP or ESMF Regridder)
– Read in interpolation weights (to root node)
MCT can:
– Derive communication tables between decompositions (using indices)
– Do all parallel data communication necessary for interpolation, gather, scatter.
  • Minimizes the size of data transferred.

MCT Users
“IPCC-class” production coupled model: The NSF/DOE Community Earth System Model
– MCT is the default coupling method in CCSM4 and CESM1
– MCT datatypes are always used in the top-level coupler driver.
– MCT methods/datatypes are the default for driver-component communication.
– All AR5 simulations by CCSM4/CESM1 use MCT.
Other academic coupled systems:
– COAMPS/ROMS - Hurricanes
– ROMS/SWAN - coastal oceanography
– WRF/ROMS - Hurricanes
OASIS3-MCT
– See next talk
CESM using cpl7/MCT can scale to 100K cores.

MCT Recent history
01/06/2010: MCT 2.7.0 released in CCSM4
– Limited use of OpenMP
02/28/2010: MCT 2.7.1 released in CESM1
11/30/2010: MCT 2.7.2 released in CESM1.0.3
(CW2010 in Toulouse, France, December 2010)
01/25/2011: MCT 2.7.3 adds debugging option to configure
(2011: Some divergence between Argonne and NCAR MCT repositories)
02/07/2012: MCT 2.7.4 updates autoconf build to latest version (Jim Edwards)

MCT More Recent history
04/30/2012: MCT 2.8.0 released (first standalone release since 2.6)!
– Merged differences between the Argonne and NCAR MCT repos.
– New datatype in AttributeVector to speed up copies (thanks to Bill Sacks, NCAR)
– ANL and NCAR repos in sync!
07/12/12: MCT 2.8.1, Argonne repository converted to git
– Repository now world readable. Copy on github.com
– Full 10+ year history of MCT development converted.
– GitHub provides an SVN interface, allowing CESM to pull directly. Eliminates the duplicate repo at NCAR.
git clone http://git.mcs.anl.gov/MCT.git

Continuing to improve cpl7/MCT performance at scale.
Slow initialization time for high-resolution, high-processor-count cases.
– 1/8th degree runs on Intrepid were taking over 1.5 hours to initialize.
– Traced to initialization of MCT’s Rearranger (equal to 2 Routers), which moves data between overlapping decompositions. Thanks Tony!

MCT More Recent history
09/12/12: MCT 2.8.2
– Includes fix for slow Router init.
– Released in CESM 1.1
– Not released separately.
12/19/12: MCT 2.8.3, Current Version
– Public release
– All of the above changes plus some minor compiler fixes

CSSEF research demands on coupling
Dynamically adaptive atmospheric dynamics
– Grid points are created and destroyed on a coupler processor
– Changing cell sizes for just one grid within the coupler will require online calculation of new interpolation weights
  • Which requires more information about both grids than is currently in the coupler.
Development of MPAS-Ocean
– Need to retain information about unstructured grids for interpolation weight calculation.
Resiliency and Scaling
– Dynamic load balancing and resilient computing mean points could move from processor to processor.
– Millions of threads and small per-core memory mean more parallelism and optimization for low memory are needed.

Solution: Re-Implement MCT data model with MOAB
MOAB = Mesh Oriented dAtaBase
– A database for mesh (structured and unstructured) and field data associated with the mesh
– Tuned for memory efficiency first, speed a close second
– Serial and parallel look very similar; parallel data constructs embedded in the MOAB interface
– http://trac.mcs.anl.gov/projects/ITAPS/wiki/MOAB
– Developed under DOE SciDAC program
– Includes parallel I/O and visualization capabilities.
– Included in nuclear engineering exascale co-design center.
MOAB is already used in other projects, notably DOE-funded cryosphere modeling and nuclear reactor simulation. Like MCT, it is “battle tested”.
[Figure panels: Ice Sheet bed; Klystron Mesh]

MOAB Data Model
• 4 fundamental “types”:
– Entity: fine-grained entities in grid (vertex, tri, hex)
  • Supported types: vertex, edge, tri, quad, polygon, tet, prism, pyramid, hex, septahedron, polyhedron
  • Mostly unstructured, though can represent structured (leveraging work done with ParVis).
  • Flexible in representing intermediate-dimension entities (internal edges/faces)
– Entity Set: arbitrary set of entities & other sets
  • Parent/child relations, for embedded graphs between sets
– Interface: object on which interface functions are called and through which other data are obtained
– Tag: named datum annotated to Entities, Entity Sets, Interface
• Instances accessed using opaque (type-less) “handles”
• MOAB is a C++ library. Its Fortran interface is iMesh.

MOAB Data Model illustrated

Review: MCT Classes
MOAB provides different class structures that define mesh, fields, and domain decomposition
[Diagram labels: Mesh; Fields; Index-space domain decomposition]

MCTWorld (Legacy)
  type MCTWorld
    integer :: MCT_comm
    integer :: ncomps
    integer :: mygrank
    integer,dimension(:),pointer :: nprocspid
    integer,dimension(:,:),pointer :: idGprocid
  end type MCTWorld
Lightweight component model registry that stores the coupled-system-wide MPI global communicator (and the local PE rank on it), the number of component models, the number of PEs in each component, and the global PE rank translation table
Registry methods:
– Create/destroy - init()/clean()
– Query: # components, components’ root PE ranks, rank translation

MCTWorld (MOAB)
  type MCTWorld
    integer :: MCT_comm
    integer :: ncomps
    integer :: mygrank
    iMesh_Instance :: mesh
    integer,dimension(:),pointer :: nprocspid
    integer,dimension(:,:),pointer :: idGprocid
  end type MCTWorld
Sole extension to the datatype is the MOAB mesh instance
MCTWorld was conceived as a "lightweight registry" that served as a directory service for intercomponent communications.
Addition of a MOAB instance makes it considerably heavier, but converts it into a full-blown registry for coupling purposes.

AttrVect (Legacy MCT)
  type AttrVect
    type(List) :: iList
    type(List) :: rList
    integer,dimension(:,:),pointer :: iAttr
    real(FP) ,dimension(:,:),pointer :: rAttr
  end type AttrVect
Stores pointwise collections of REAL (INTEGER) fields, or attributes, indexable by string tags in rList (iList)
Key methods:
– Create/destroy: init(), clean()
– Query: length - lsize(), # INTEGER/REAL attributes - nIAttr()/nRAttr(), names of attributes
– Manipulate: copy(), zero(), append attributes, import/export individual attributes, sorting, cross-indexing of attributes

AttrVect (MOAB)
  type AttrVect
    type(List) :: iList
    type(List) :: rList
    iBase_TagHandle,dimension(:),pointer :: itagh
    iBase_TagHandle,dimension(:),pointer :: rtagh
    iBase_EntityHandle,dimension(:),pointer :: enths
  end type AttrVect
Built using the Fortran interface to MOAB (iMesh)
– INTEGER/REAL attribute lists retained
– Natural equivalence between “attribute” and “tag”
– Attributes now stored contiguously and referenced by a handle, iBase_TagHandle (implemented as an integer)
– Mesh entities referenced by iBase_EntityHandles

iMesh-AttrVect test program
  ! Initialize MCT (default 3-D, but empty, iMesh instance created):
  call MCTWorld_init(1, MPI_COMM_WORLD, comm1, 1)
  ! Initialize MCT AttrVect:
  call AttrVect_init(av1, rList='field1:field2', &
                     lsize=avsize)
  ! Query embedded iMesh instance to determine dimensionality:
  call iMesh_getGeometricDimension(%VAL(ThisMCTWorld%mesh), &
                                   geom_dim, ier)
  ! iMesh query function on the new Av tag handle:
  call iMesh_getTagName(%VAL(ThisMCTWorld%mesh), &
                        %VAL(av1%rtagh(1)), &
                        tagname, ier, %VAL(10))
Other AttrVect methods from the previous slide are also available as-is

MCT on Mira - Argonne’s BlueGene/Q
– 48 racks
– 1024 nodes per rack
– 1.6 GHz 16-way core processor and 16 GB RAM per node
– 348 I/O nodes
– 240 GB/s, 35 PB storage
– 768K cores
– 768 TB RAM
– 10 PF peak

CESM with cpl7/MCT on BG/Q Status
Latest development version of CESM compiles and runs on BG/Q

Nodes (1 degree case)   Total ranks (pure MPI)   Simulation rate (years/day)
32                      512                      4.03
64                      1024                     6.54
128                     2048                     8.96

Compare: 7.4 years/day on 512 BG/P nodes (2048 cores; mixed mode).
Compiler bug encountered and patched!
CESM is mixed-mode, but currently any threading slows down the model.

Additional MCT development
New features to aid in debugging Router times
– GSMap and MCTWorld print().
  • Print contents to an ASCII file for later reading
– Router init internal timers
  • Invoked with an optional string argument to Router init.
– RouterTest.F90 - test program which reads in the output GSMap and MCTWorld info and builds a Router.
  • Will build on the same number of procs and the same decomposition as the original model.
– On a branch but not yet released
Next: Conversion from F90 to F95 (and later F2003)

MCT: To be continued…
www.mcs.anl.gov/mct
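
Worked example: scaling arithmetic for the BG/Q status table
The simulation rates in the "CESM with cpl7/MCT on BG/Q Status" table above can be turned into speedup and parallel efficiency relative to the 32-node run. The sketch below is illustrative only and is not part of MCT or CESM. The names BGQ_RUNS and scaling_summary are hypothetical, and treating the 32-node run as the baseline for a fixed-size (1 degree) problem is an assumption made here for the arithmetic.

```python
# Rates copied from the BG/Q status table: (nodes, MPI ranks, simulated years/day).
# BGQ_RUNS and scaling_summary are hypothetical names used only for this sketch.
BGQ_RUNS = [(32, 512, 4.03), (64, 1024, 6.54), (128, 2048, 8.96)]


def scaling_summary(runs):
    """Return (nodes, speedup, efficiency) for each run, relative to the first run."""
    base_nodes, _, base_rate = runs[0]
    summary = []
    for nodes, _, rate in runs:
        speedup = rate / base_rate      # measured throughput gain over the baseline
        ideal = nodes / base_nodes      # gain expected under perfect scaling
        summary.append((nodes, speedup, speedup / ideal))
    return summary


if __name__ == "__main__":
    for nodes, speedup, efficiency in scaling_summary(BGQ_RUNS):
        print(f"{nodes:4d} nodes: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Under these assumptions, doubling from 32 to 64 nodes gives about 81% efficiency, and going to 128 nodes gives about 56%.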