Performance of the Community Earth System Model (CESM1.0.3) on Huygens
18 April 2012
Michael Kliphuis
Introduction
This short paper describes the performance of the Community Earth System Model (CESM1.0.3) on the IBM Power6 (P6) system Huygens at SARA for a number of different resolutions. This gives an indication of the amount of CPU hours (PNU in the tables below) that we need for the runs in the so-called ACC project, in which the development of the Antarctic Circumpolar Current and its climatic impact are studied.
CESM
The Community Earth System Model (CESM) is the latest in a series of climate models that have been developed and maintained at the National Center for Atmospheric Research (NCAR).
Unlike its predecessor CCSM3, which is a purely physical model, CESM contains options for a terrestrial carbon cycle and dynamic vegetation, atmospheric chemistry and aerosol dynamics, and ocean ecosystems and biogeochemical coupling, all of which are needed for an earth system model. Version 4 of the model (CCSM4) was released in April 2010, but this was primarily a prerelease of the new CESM physical models and coupler infrastructure.
Tested resolutions
In the notation for a resolution, the part before the underscore is the resolution of the atmosphere/land models (e.g. 0.9x1.25, in degrees) and the part after it is the resolution of the ocean/ice models (e.g. gx1v6 means 1 degree, version 6; gx3v7 means 3 degrees, version 7).
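The naming convention can be illustrated with a small Python sketch. The helpers split_resolution and ocean_grid_info are hypothetical (they are not part of CESM) and only decode ocean/ice grid names of the gx<degrees>v<version> form described above; an ocean part such as the second T42 in T42_T42 is rejected.

```python
import re


def split_resolution(name: str) -> tuple[str, str]:
    """Split a resolution name such as '0.9x1.25_gx1v6' at the underscore
    into (atmosphere/land part, ocean/ice part)."""
    atm_lnd, ocn_ice = name.split("_", 1)
    return atm_lnd, ocn_ice


def ocean_grid_info(ocn_ice: str) -> tuple[int, int]:
    """Decode an ocean/ice grid name 'gx<degrees>v<version>' into
    (nominal resolution in degrees, grid version), e.g. 'gx1v6' -> (1, 6)."""
    match = re.fullmatch(r"gx(\d+)v(\d+)", ocn_ice)
    if match is None:
        raise ValueError(f"not a gx<degrees>v<version> grid name: {ocn_ice!r}")
    return int(match.group(1)), int(match.group(2))


for res in ("0.9x1.25_gx1v6", "1.9x2.5_gx1v6", "T31_gx3v7"):
    atm_lnd, ocn_ice = split_resolution(res)
    degrees, version = ocean_grid_info(ocn_ice)
    print(f"{res}: atm/lnd {atm_lnd}, ocn/ice {degrees} degree grid, version {version}")
```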
We tested the performance for the following resolutions:
I) 0.9x1.25_gx1v6
   atm/lnd: 288 × 192 × 26 grid points
   ocn/ice: 384 × 320 × 60 grid points
II) 1.9x2.5_gx1v6
   atm/lnd: 144 × 96 × 26 grid points
   ocn/ice: 384 × 320 × 60 grid points
III) T31_gx3v7
   atm/lnd: 96 × 48 × 26 grid points
   ocn/ice: 100 × 116 × 26 grid points
In our project we need to do a spinup run of about 2000 years with all components fully active. Unfortunately, the three resolutions above are the only ones that can be run fully active, i.e. with all components active and no data models; Dr. Bette Otto-Bliesner from NCAR confirmed this.
Other resolutions, such as T62_gx1v6 or T42_T42, do have an active ocean/ice model, but they use data models for the atmosphere/land that are driven by the COREv2 dataset.
For the atmosphere model in the three tested resolutions we used ‘cam4’, which is the standard configuration. It is also possible to use ‘cam5’, which has many improvements, but its physics is more complicated and it uses more vertical levels (for the 0.9x1.25_gx1v6 case, 30 instead of 26), which degrades the overall performance by approximately a factor of 2 [1].
Load Balancing Procedure
To get the best performance:
- put the ocean on its own set of cores
- put the coupler on a subset of the atmosphere cores
- make sure cheap components always run ahead of expensive ones
- preferably use a multiple of 32 for the total number of cores
- try to keep a component on one node
- beware of empty cores (cores that are not doing anything)
- make the sum of the run times of LND, ATM, ICE and CPL approximately equal to the run time of OCN
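The rules that can be read off a processor layout lend themselves to a quick check before a run is submitted. The Python sketch below is hypothetical (it is not one of the CESM scripts) and does not check the ordering rule for cheap and expensive components. It assumes, as the layouts in the Results section suggest, that each MPI task with 2 SMT threads occupies one physical core, so the total number of cores equals the highest MPI rank used plus one. The run-time rule is expressed as a ratio that should be close to 1; the component times would come from the timing file written at the end of a run.

```python
# Hypothetical layout checker for the load-balancing rules (not a CESM tool).
# A layout maps a component name to (MPI tasks, OpenMP threads, root_pe).
# Assumption: one MPI task per physical core (2 SMT threads per core), so the
# total number of cores is the highest MPI rank used plus one.
Layout = dict[str, tuple[int, int, int]]


def used_ranks(layout: Layout) -> dict[str, set[int]]:
    """MPI ranks root_pe .. root_pe + tasks - 1 used by each component."""
    return {name: set(range(root, root + tasks))
            for name, (tasks, _threads, root) in layout.items()}


def check_layout(layout: Layout, core_multiple: int = 32) -> list[str]:
    """Return the rules the layout violates; an empty list means it passes."""
    ranks = used_ranks(layout)
    total = max(root + tasks for tasks, _threads, root in layout.values())
    problems = []
    if total % core_multiple:
        problems.append(f"{total} cores is not a multiple of {core_multiple}")
    for name, r in ranks.items():
        if name != "ocn" and r & ranks["ocn"]:
            problems.append(f"ocn shares cores with {name}")
    if not ranks["cpl"] <= ranks["atm"]:
        problems.append("cpl is not on a subset of the atm cores")
    idle = set(range(total)) - set().union(*ranks.values())
    if idle:
        problems.append(f"{len(idle)} cores are idle")
    return problems


def balance_ratio(times: dict[str, float]) -> float:
    """(LND + ATM + ICE + CPL run time) / OCN run time; close to 1 when balanced."""
    return sum(times[c] for c in ("lnd", "atm", "ice", "cpl")) / times["ocn"]


# 512-core layout used for 1.9x2.5_gx1v6, compset B (see Results)
layout_512 = {"cpl": (320, 2, 0), "lnd": (64, 2, 256), "ice": (256, 2, 0),
              "atm": (320, 2, 0), "ocn": (192, 2, 320)}
print(check_layout(layout_512))  # prints []
```

In this layout the land model fills the atmosphere cores that the sea-ice model leaves free (ranks 256 to 319), so no core is idle and the ocean has ranks 320 to 511 to itself.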
When a run has finished, the performance and the component run times are written to a file in the directory ‘timing’ under the directory from which the run script was submitted. Huygens supports SMT (Simultaneous Multi-Threading), which means that each core can run 2 tasks at the same time. The CESM code is compiled with OpenMP, and in the performance tests we found that setting the number of threads to 2 gave the best performance (see Results below).
For good performance, also:
1. make sure that in the file with compiler settings (Macros.huygens)
   - -g is switched off
   - -lessl is added to the FC compiler line, which links the IBM ESSL library (FC := mpfort -v -compiler xlf90_r -lessl)
2. set REST_OPTION = 'never' in the env_run.xml file
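REST_OPTION controls how often CESM writes restart files, so 'never' avoids the cost of writing them. A CESM case directory normally provides an xmlchange script for editing env_run.xml, which is the safer route. The Python sketch below only illustrates the edit itself; it assumes the CESM1 layout of <entry id="..." value="..." /> elements, and set_env_entry and the sample excerpt are hypothetical.

```python
import xml.etree.ElementTree as ET


def set_env_entry(xml_text: str, entry_id: str, value: str) -> str:
    """Return xml_text with value="<value>" set on <entry id="<entry_id>">.

    Assumes the CESM1 env_*.xml layout of <entry id="..." value="..." /> elements;
    comments inside the root element are kept (requires Python 3.8+)."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    for entry in root.iter("entry"):
        if entry.get("id") == entry_id:
            entry.set("value", value)
            return ET.tostring(root, encoding="unicode")
    raise KeyError(f"no <entry id={entry_id!r}> found")


# Hypothetical excerpt; a real env_run.xml contains many more entries.
sample = """<config_definition>
  <!-- restart file frequency -->
  <entry id="REST_OPTION" value="nyears" />
  <entry id="STOP_N" value="5" />
</config_definition>"""
print(set_env_entry(sample, "REST_OPTION", "never"))
```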
Results
All components fully active
The tables below show the performance results for the three resolutions with all components fully active (compset B).
The two values in the columns ‘cpl pes’, ‘lnd pes’, etc. stand for:
tasks x threads = number of MPI tasks x number of OpenMP threads
root_pe = index of the first process in the set
PNU/yr is the cost in PNU per model year, and performance is given in model years per wall-clock day (yrs/day).
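Read this way, an entry of 64x2 with root_pe 448 (the ocean in the 512-core 0.9x1.25_gx1v6 run) covers MPI tasks 448 to 511, each running 2 OpenMP threads. A minimal Python sketch of this reading; PeSet and parse_pes are hypothetical names, not part of CESM.

```python
from typing import NamedTuple


class PeSet(NamedTuple):
    """One table entry: tasks x threads, starting at root_pe (hypothetical helper)."""
    tasks: int    # number of MPI tasks
    threads: int  # OpenMP threads per MPI task
    root_pe: int  # index of the first MPI task in the set

    @property
    def last_pe(self) -> int:
        """Index of the last MPI task in the set."""
        return self.root_pe + self.tasks - 1

    @property
    def total_threads(self) -> int:
        """Number of OpenMP threads over all tasks in the set."""
        return self.tasks * self.threads


def parse_pes(entry: str, root_pe: int) -> PeSet:
    """Parse an entry such as '64x2' together with its root_pe."""
    tasks, threads = (int(part) for part in entry.lower().split("x"))
    return PeSet(tasks, threads, root_pe)


ocn = parse_pes("64x2", 448)  # ocean, 512-core 0.9x1.25_gx1v6 run
print(ocn.root_pe, ocn.last_pe, ocn.total_threads)  # prints 448 511 128
```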
0.9x1.25_gx1v6, compset B
Total cores  cpl pes    lnd pes     ice pes    atm pes     ocn pes      PNU/yr  yrs/day
64           32x2 (0)   14x2 (0)    40x2 (0)   54x2 (0)    10x2 (54)    359     4.28
128          64x2 (0)   28x2 (80)   80x2 (0)   108x2 (0)   20x2 (108)   401     7.66
256          160x2 (0)  64x2 (160)  160x2 (0)  224x2 (0)   32x2 (224)   492     12.49
512          320x2 (0)  128x2 (320) 320x2 (0)  448x2 (0)   64x2 (448)   791     15.53
1.9x2.5_gx1v6, compset B
Total cores  cpl pes    lnd pes     ice pes    atm pes     ocn pes      PNU/yr  yrs/day
32           20x2 (0)   20x2 (0)    20x2 (0)   20x2 (0)    12x2 (20)    159     4.82
64           40x2 (0)   40x2 (0)    40x2 (0)   40x2 (0)    24x2 (40)    181     8.49
128          80x2 (0)   80x2 (0)    80x2 (0)   80x2 (0)    48x2 (80)    206     14.91
256          160x2 (0)  160x2 (0)   160x2 (0)  160x2 (0)   96x2 (160)   329     18.70
512          320x2 (0)  64x2 (256)  256x2 (0)  320x2 (0)   192x2 (320)  477     25.78
T31_gx3v7, compset B
Total cores  cpl pes    lnd pes     ice pes    atm pes     ocn pes      PNU/yr  yrs/day
32           24x2 (0)   24x2 (0)    20x2 (0)   24x2 (0)    8x2 (24)     54      14.21
64           48x2 (0)   48x2 (0)    40x2 (0)   48x2 (0)    16x2 (48)    60      25.43
128          96x2 (0)   96x2 (0)    80x2 (0)   96x2 (0)    32x2 (96)    64      44.63
256          192x2 (0)  192x2 (0)   160x2 (0)  192x2 (0)   64x2 (192)   108     57.01
Figure 1: performance of CESM1.0.3 on Huygens (compset B)
Ocean/ice fully active with data models for atmosphere/land
This configuration (compset G) was tested only for resolution 1.9x2.5_gx1v6, which we decided to use for the spinup run in our project (see Conclusions below).
The table below shows the performance results for this resolution.
1.9x2.5_gx1v6, compset G
Total cores  cpl pes    lnd pes     ice pes    atm pes     ocn pes      PNU/yr  yrs/day
128          32x2 (0)   1x1 (16)    32x2 (0)   1x1 (0)     96x2 (32)    111     27.74
512          128x2 (0)  1x1 (64)    128x2 (0)  1x1 (0)     384x2 (128)  250     49.21
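The PNU/yr column appears to follow directly from the measured throughput: a model year takes 24 / (yrs/day) wall-clock hours, so on N cores it costs N × 24 / (yrs/day) core hours. This relation is inferred from the numbers rather than documented by SARA, and it assumes that 1 PNU corresponds to one core hour, as the Introduction's equating of CPU hours with PNU suggests. The check below reproduces the tabulated values to the nearest unit for every row in the tables except the 128-core T31_gx3v7 entry, which computes to about 69 rather than the listed 64. pnu_per_model_year is a hypothetical helper.

```python
def pnu_per_model_year(cores: int, years_per_day: float) -> float:
    """Core hours per simulated year: one model year takes 24 / years_per_day
    wall-clock hours on `cores` cores (assumes 1 PNU = 1 core hour)."""
    return cores * 24.0 / years_per_day


# (total cores, yrs/day, tabulated PNU/yr) for 1.9x2.5_gx1v6, compset B
rows = [(32, 4.82, 159), (64, 8.49, 181), (128, 14.91, 206),
        (256, 18.70, 329), (512, 25.78, 477)]
for cores, perf, tabulated in rows:
    computed = pnu_per_model_year(cores, perf)
    print(f"{cores:4d} cores: {computed:6.1f} PNU/yr (table: {tabulated})")
```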
Conclusions
For the simulations in the ACC project we need to do the spinup run with all components active (compset B). In the tested version of CESM (1.0.3) there are only three resolutions for which a compset B run is possible, as confirmed by Dr. Bette Otto-Bliesner from NCAR.
When I asked her for advice on which resolution to use for the ACC project, she asked one of the ocean modelers, Markus Jochum, for his recommendation, and his response was:
“The T31_gx3v7 is perfect for this. In Christine's paper ([3] in press) you'll find it described and the performance compared to the other resolutions. For SO and ENSO climate work I'd always use the T31_gx3v7”
If, however, we want the ocean component to have a resolution of at least 1 degree, then, given that we will do a couple of runs of about 2000 model years, it is best to use 1.9x2.5_gx1v6: compared with 0.9x1.25_gx1v6 at 512 cores, it runs at 25.78 instead of 15.53 model years per day and costs 477 instead of 791 PNU per model year.
We still have a budget of 2.9 million CPU hours on Huygens, which must be used before Huygens is replaced by a new supercomputer at the end of this year.
1000 model years with 1.9x2.5_gx1v6 (compset B) on 512 cores take 39 days and cost 477,000 PNU.
When the spinup run is finished, we can decide to continue the run with only the ocean and ice components active and with data models for the atmosphere and land (compset G).
1000 model years with 1.9x2.5_gx1v6 (compset G) on 512 cores take 20 days and cost 250,000 PNU.
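These run lengths and costs follow from the 512-core throughputs in the Results tables: days = model years / (yrs/day) and PNU = days × 24 × cores. The sketch below reproduces them and shows how a 2000-year compset B spinup, the length mentioned earlier in this paper, compares with the remaining budget. run_cost is a hypothetical helper, and 1 PNU is assumed to be one core hour.

```python
BUDGET_PNU = 2_900_000  # remaining Huygens budget quoted above


def run_cost(model_years: float, cores: int, years_per_day: float) -> tuple[float, float]:
    """Return (wall-clock days, PNU) for a run of `model_years` simulated years,
    assuming 1 PNU = 1 core hour and the measured throughput in years/day."""
    days = model_years / years_per_day
    return days, days * 24.0 * cores


for label, years, perf in [("compset B", 1000, 25.78), ("compset G", 1000, 49.21),
                           ("2000-year spinup, compset B", 2000, 25.78)]:
    days, pnu = run_cost(years, 512, perf)
    print(f"{label}: {days:.0f} days, {pnu:,.0f} PNU ({pnu / BUDGET_PNU:.0%} of budget)")
```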
REFERENCES
[1] P. H. Worley et al., Performance of the Community Earth System Model.
http://mmc.geofisica.unam.mx/edp/Ejemplitos/SC11/src/pdf/papers/tp49.pdf
[2] CESM1 timing table at the NCAR website.
http://www.cesm.ucar.edu/models/cesm1.0/timing/
[3] C. A. Shields et al., The low resolution CCSM4.
http://www.cgd.ucar.edu/staff/markus/CCSM4_LowRes_minRev_dec13.pdf