Performance of the Community Earth System Model (CESM1.0.3) on Huygens

18 April 2012
Michael Kliphuis

Introduction

This short paper describes the performance of the Community Earth System Model (CESM1.0.3) on the IBM P6 Huygens at SARA for a number of different resolutions. This gives an indication of the number of CPU hours (PNU in the tables below) that we need for the runs of the so-called ACC project, in which the development of the Antarctic Circumpolar Current and its climatic impact is studied.

CESM

The Community Earth System Model (CESM) is the latest in a series of climate models that have been developed by and maintained at the National Center for Atmospheric Research (NCAR). In contrast to its predecessor CCSM3, the CESM contains options for a terrestrial carbon cycle and dynamic vegetation, atmospheric chemistry and aerosol dynamics, and ocean ecosystems and biogeochemical coupling, all necessary for an earth system model, as distinct from a purely physical model like the CCSM3. Version 4 of the model (CCSM4) was released in April 2010, but this was primarily a prerelease of the new CESM physical models and coupler infrastructure.

Tested resolutions

In the notation for a resolution, the part before the underscore is the resolution of the atmosphere/land models (e.g. 0.9x1.25 in degrees) and the part after it is the resolution of the ocean/ice models (e.g. gx1v6 means 1 degree version 6, gx3v7 means 3 degree version 7).

We tested the performance for the following resolutions:

I) 0.9x1.25_gx1v6
   atm/lnd: 288 × 192 × 26 gridpoints
   ocn/ice: 384 × 320 × 60 gridpoints

II) 1.9x2.5_gx1v6
   atm/lnd: 144 × 96 × 26 gridpoints
   ocn/ice: 384 × 320 × 60 gridpoints

III) T31_gx3v7
   atm/lnd: 96 × 48 × 26 gridpoints
   ocn/ice: 100 × 116 × 26 gridpoints

In our project we need to do a spinup run of about 2000 years with all components fully active. Unfortunately, the 3 resolutions above are the only ones that can be run fully active, i.e.
with all components active and without data models. This was confirmed by Dr. Bette Otto-Bliesner from NCAR. Other options such as T62_gx1v6 or T42_T42 do have an active ocean/ice model, but they use data models for atmosphere/land that read the COREv2 dataset.

For the atm model in the 3 tested resolutions we used ‘cam4’, which is the standard configuration. It is also possible to use ‘cam5’, which has many improvements but more complicated physics; for the 0.9x1.25_gx1v6 case, for instance, it has 30 vertical levels instead of 26, degrading the overall performance by approximately a factor of 2 [1].

Load Balancing Procedure

To get the best performance:

- put the ocean on its own set of cores
- put the coupler on a subset of the atmosphere cores
- let cheap components always run ahead of expensive ones
- preferably use a multiple of 32 cores for the total number of cores
- try to keep a component on 1 node
- beware of empty cores (cores that are not doing anything)
- make the sum of the run times of LND, ATM, ICE and CPL equal to the run time of OCN

When a run has finished, it writes the performance and the component run times to a file in the directory ‘timing’, under the directory from which the run script was submitted.

Huygens supports SMT (Simultaneous Multi Threading), which means that each core can run 2 threads at the same time. The CESM code is compiled with OpenMP. In the performance tests we found that setting the number of threads to 2 (see Results below) gave the best performance.

For good performance also:

1. make sure that in the file with compiler settings (Macros.huygens)
   - -g is switched off
   - -lessl is passed to the FC compiler (FC := mpfort -v -compiler xlf90_r -lessl)
2. set REST_OPTION = 'never' in the env_run.xml file

Results

All components fully active

The tables below show the performance results for the described resolutions with all components fully active (compset B). The 2 values in the columns ‘cpl pes’, ‘lnd pes’, etc.
stand for:

- tasks x threads = number of MPI tasks x number of OpenMP threads
- root_pe = index of the first process in the set

0.9x1.25_gx1v6, compset B

Total #   cpl pes    lnd pes     ice pes    atm pes    ocn pes    PNU/yr   performance
cores                                                                      (yrs/day)
64        32x2 0     14x2 0      40x2 0     54x2 0     10x2 54    359      4.28
128       64x2 0     28x2 80     80x2 0     108x2 0    20x2 108   401      7.66
256       160x2 0    64x2 160    160x2 0    224x2 0    32x2 224   492      12.49
512       320x2 0    128x2 320   320x2 0    448x2 0    64x2 448   791      15.53

1.9x2.5_gx1v6, compset B

Total #   cpl pes    lnd pes     ice pes    atm pes    ocn pes     PNU/yr   performance
cores                                                                       (yrs/day)
32        20x2 0     20x2 0      20x2 0     20x2 0     12x2 20     159      4.82
64        40x2 0     40x2 0      40x2 0     40x2 0     24x2 40     181      8.49
128       80x2 0     80x2 0      80x2 0     80x2 0     48x2 80     206      14.91
256       160x2 0    160x2 0     160x2 0    160x2 0    96x2 160    329      18.70
512       320x2 0    64x2 256    256x2 0    320x2 0    192x2 320   477      25.78

T31_gx3v7, compset B

Total #   cpl pes    lnd pes     ice pes    atm pes    ocn pes    PNU/yr   performance
cores                                                                      (yrs/day)
32        24x2 0     24x2 0      20x2 0     24x2 0     8x2 24     54       14.21
64        48x2 0     48x2 0      40x2 0     48x2 0     16x2 48    60       25.43
128       96x2 0     96x2 0      80x2 0     96x2 0     32x2 96    64       44.63
256       192x2 0    192x2 0     160x2 0    192x2 0    64x2 192   108      57.01

Figure 1: performance of CESM 1.0.3 on Huygens (compset B)

Ocean/ice fully active with data models for atmosphere/land

This was only tested for resolution 1.9x2.5_gx1v6, which we decided to use for the spinup run in our project (see Conclusions below). The table below shows the performance results for this resolution.

1.9x2.5_gx1v6, compset G

Total #   cpl pes    lnd pes    ice pes    atm pes    ocn pes     PNU/yr   performance
cores                                                                      (yrs/day)
128       32x2 0     1x1 16     32x2 0     1x1 0      96x2 32     111      27.74
512       128x2 0    1x1 64     128x2 0    1x1 0      384x2 128   250      49.21

Conclusions

For the simulations in the ACC project we need to do the spinup run with all components active (compset B). In the tested version of CESM (1.0.3) there are only 3 resolutions for which it is possible to run a compset B. This was confirmed by Dr. Bette Otto-Bliesner from NCAR.
When I asked her for advice on which resolution to use for the ACC project, she consulted one of the ocean modelers, Markus Jochum, whose response was:

“The T31_gx3v7 is perfect for this. In Christine's paper ([3] in press) you'll find it described and the performance compared to the other resolutions. For SO and ENSO climate work I'd always use the T31_gx3v7.”

If, however, we want the ocean component to have a resolution of at least 1 degree, then 1.9x2.5_gx1v6 is the best choice, given that we will do a couple of runs of about 2000 model years. We still have a budget of 2.9 million CPU hours left on Huygens, which needs to be used before Huygens is replaced by a new supercomputer at the end of this year.

1000 model years with 1.9x2.5_gx1v6 (compset B) on 512 cores take 39 days and cost 477,000 PNU.

When the spinup run is ready, we can decide to continue the run with only the ocean and ice components active and with data models for atmosphere and land (compset G).

1000 model years with 1.9x2.5_gx1v6 (compset G) on 512 cores take 20 days and cost 250,000 PNU.

REFERENCES

[1] P. H. Worley et al., Performance of the Community Earth System Model, http://mmc.geofisica.unam.mx/edp/Ejemplitos/SC11/src/pdf/papers/tp49.pdf
[2] CESM1 timing table at the NCAR website, http://www.cesm.ucar.edu/models/cesm1.0/timing/
[3] C. A. Shields et al., The low resolution CCSM4, http://www.cgd.ucar.edu/staff/markus/CCSM4_LowRes_minRev_dec13.pdf
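
The run-time and cost figures quoted in the Conclusions follow directly from the 512-core results for 1.9x2.5_gx1v6 (477 PNU/yr at 25.78 yrs/day for compset B, 250 PNU/yr at 49.21 yrs/day for compset G). The arithmetic can be reproduced with the minimal Python sketch below. The names RUNS, estimate and affordable_years are illustrative only and are not part of CESM or its scripts; the sketch also assumes that cost and run time scale linearly with the number of model years.

```python
# Measured values copied from the 512-core rows for 1.9x2.5_gx1v6.
# (Illustrative helper; not part of the CESM tooling.)
RUNS = {
    "compset B": {"pnu_per_year": 477, "years_per_day": 25.78},
    "compset G": {"pnu_per_year": 250, "years_per_day": 49.21},
}


def estimate(model_years, pnu_per_year, years_per_day):
    """Return (wall-clock days, total PNU) for a run of `model_years`.

    Assumes linear scaling with the number of model years.
    """
    days = model_years / years_per_day
    total_pnu = model_years * pnu_per_year
    return days, total_pnu


def affordable_years(budget_pnu, pnu_per_year):
    """Return the whole number of model years that fit in `budget_pnu`."""
    return budget_pnu // pnu_per_year


if __name__ == "__main__":
    for name, row in RUNS.items():
        days, pnu = estimate(1000, **row)
        print(f"{name}: 1000 model years take about {days:.0f} days "
              f"and cost {pnu} PNU")
    # Remaining budget on Huygens as stated in the Conclusions.
    years = affordable_years(2_900_000, RUNS["compset B"]["pnu_per_year"])
    print(f"A 2.9 million PNU budget covers about {years} "
          f"model years of compset B")
```

Changing the first argument of estimate gives the corresponding numbers for other run lengths, for example the 2000 model years of the spinup run.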