
Project deliverable ESCF2020-05

E3C: Exploring Energy Efficient Computing
Dawn Geatches, Science & Technology Facilities Council, Daresbury Laboratory,
Warrington, WA4 4AD. dawn.geatches@stfc.ac.uk
This scoping project was funded under the Environmental Sustainability Concept Fund
(ESCF) within the Business Innovation Department of STFC.
This document is a first attempt to demonstrate how users of the quantum mechanics-based software code CASTEP1 can run their simulations on high performance computing (HPC) architectures efficiently. Whatever a user's level of experience, the climate crisis we are facing dictates that we need to (i) become aware of the computational resources our simulations consume; (ii) understand how we, as users, can reduce this consumption; and (iii) actively develop energy efficient computing habits. This document provides some small insight to help users progress through stages (i) and (ii), empowering them to adopt stage (iii) with confidence.
This document is not a guide to setting up and running simulations using CASTEP; such guides already exist (see, for example, the CASTEP documentation). It is assumed throughout that the reader has a basic familiarity with the software and its terminology. This document does not exhaust all of the possible ways to reduce computational cost – much is left for users to discover for themselves and to share with the wider CASTEP community (e.g. via the JISCMAIL CASTEP Users Mailing List). Thank you!
Sections
1. Computational cost of simulations
2. Reducing the energy used by your simulation
   A. Cell file
   B. Param file
   C. Submission script
   D. An (extreme) example
3. Developing energy efficient computing habits: A recipe
4. What else can a user do?
5. What are the developers doing?
1. Computational cost of simulations
‘Computational cost’ in the context of this project is synonymous with ‘energy used’. As a
user of high performance computing (HPC) resources have you ever wondered what effect
your simulations have on the environment through the energy they consume? You might be
working on some great new renewable energy material and running hundreds or thousands
of simulations over the lifetime of the research. How does the energy consumed by the
research stack up against the energy that will be generated/saved/stored etc. by the new material? Hopefully, the balance is overwhelmingly in favour of the new material and its promised benefits.
Fortunately, we can do more than hope that this is the case: we can actively reduce the energy consumed by our simulations; indeed, it's the responsibility of every single computational modeller to do exactly that. Wouldn't it be great (not to say impressive) if, when you write your next funding application, you could give a ballpark figure for the amount of energy your computational research will consume over the lifetime of the project?
As a user you might be thinking 'but what effect can I have, when surely the HPC architecture is responsible for energy usage?' and 'then there's the code itself, which should be as efficient as possible, but if it's not, I can't do anything about that'. Both of these thoughts are grounded in truth: the HPC architecture is fixed – but we can use it efficiently; the software we're using is structurally fixed – but we can run it efficiently.
The energy cost (E) of a simulation is the power per core (P) consumed over the length of time (T) of the simulation, which for parallelised simulations run on N cores is E = NPT. From this it is logical to think that reducing N, P and/or T will reduce E, which is theoretically true. Practically, though, let's assume that the power consumed by each core is a fixed property of the HPC architecture; we then have E ∝ NT. This effectively encapsulates where we, as users of HPC, can control the amount of energy our simulations consume, and it seems simple: all we need to do is learn how to optimise the number of cores and the run time of our simulations.
We use multiple cores to share the memory load and to speed up a calculation, giving us three calculation properties to optimise: number of cores; memory per core; time. To reduce the calculation time we might first increase the number of cores. Many users will already know that the relationship between core count and calculation time is non-linear, thanks to the required increase in core-to-core and node-to-node communication time. Taking the latter into account, the total energy used is E = NT + f(N, T), where f(N, T) captures the energy cost of the core-core/node-node communication time.
To optimise energy efficiency, any speed-up in calculation time gained by increasing the number of cores needs to outweigh the increased energy cost of using the additional cores. Therefore, the speed-up factor needs to be greater than the factor by which the core count grows, as shown in the equations below for a 2-core vs serial example.
E_s = T_s, with f(1, T_s) = 0        (energy of a serial, i.e. 1-core, calculation)

E_2N = 2T_2N + f(2, T_2N)            (energy of a 2-core calculation)

For the energy cost of using 2 cores to be no greater than the energy cost of the serial calculation we need E_2N ≤ E_s, i.e.

2T_2N + f(2, T_2N) ≤ T_s,   or equivalently   T_2N + (1/2)f(2, T_2N) ≤ (1/2)T_s,

which means that the total calculation time using 2 cores needs to be less than half of the serial time. So, for users to run simulations efficiently in parallel, they need to balance the number of cores, the associated memory load per core, and the total calculation time. The following section shows how some of the more commonly used parameters within CASTEP affect these three properties.
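The same condition can be restated for a general core count N (this is a rearrangement of the 2-core argument above, not an additional result from the tests):

\[
E_N = N T_N + f(N, T_N) \le E_s = T_s
\quad\Longleftrightarrow\quad
\frac{T_s}{T_N} \;\ge\; N + \frac{f(N, T_N)}{T_N},
\]

i.e. the measured speed-up T_s/T_N must exceed the number of cores N by a margin that covers the communication overhead.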
NB: The main purpose of the following examples is to illustrate the impact of different
user-choices on the total energy cost of simulations. These examples do not indicate
the level of ‘accuracy’ attained because ‘accuracy’ is determined by the user
according to the type, contents, and aims of their simulations.
2. Reducing the energy used by your simulation
This section uses an example of a small model of a clay mineral (and, later, a carbon nanotube) to illustrate how a user can change the total energy their simulation uses through a judicious choice of CASTEP input parameters.

Figure 1 Unit cell of a generic silicate clay mineral comprising 41 atoms
A. Cell file
Pseudopotentials
Choose the pseudopotential according to the type of simulation, e.g. for simulations of cell structures ultrasofts2 are often sufficient, although if the pseudopotential library does not contain an ultrasoft version for a particular element, the on-the-fly-generated (OTFG) ultrasofts3 might suffice. If a user is running a spectroscopic simulation such as infrared using density functional perturbation theory4 then norm-conserving5 or OTFG norm-conserving3 pseudopotentials could be the better choice. The impact of pseudopotential type on the computational cost is shown in Table 1 through the total (calculation) time.
Type of pseudopotential | Ultrasoft | Norm-conserving | OTFG Ultrasoft | OTFG Ultrasoft QC5 set(b) | OTFG Norm-conserving
Cut-off energy (eV)     | 370       | 900             | 598            | 340                       | 925
# cores(a)              | 5         | 5               | 5              | 5                         | 5
Memory/process (MB)     | 666       | 681             | 2072           | 1007                      | 681
Peak memory use (MB)    | 777       | 802             | 2785           | 1590                      | 791
Total time (secs)       | 55        | 89              | 250            | 109                       | 136
Table 1 Pseudopotential and size of planewave set required on the 'fine' setting of Materials Studio 2020,6 with an example of the memory and time required for a single point energy calculation using the recorded number of cores on a single node. Unless otherwise stated, the same cut-off energy per type of pseudopotential is implied throughout this document. (a) Using Sunbird (CPU: 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz with 20 cores each); unless stated otherwise, all calculations were performed on this HPC cluster. (b) Designed to be used at the same modest (340 eV) kinetic energy cut-off across the periodic table; ideal for moderate-accuracy, high-throughput calculations, e.g. ab initio random structure searching (AIRSS).
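As a sketch of how the choice is pinned per element (the potential file names are placeholders; library names vary between installations), the .cell file's SPECIES_POT block selects one potential per species, and omitting the block usually causes recent CASTEP versions to generate OTFG potentials:

    %block SPECIES_POT
    Si  Si_00.usp    ! ultrasoft potential file (placeholder name)
    O   O_00.usp     ! omit the whole block to default to on-the-fly generation
    %endblock SPECIES_POT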
K-points
Changing the number of Brillouin zone sampling points can have a dramatic effect on
computational time as shown in Table 2. Bear in mind that increasing the number of k-points
increases the memory requirements, often tempting users to increase the number of cores,
further increasing the overall computational cost. Remember though, it's important to use the number of k-points that provides the level of accuracy your simulations need.
Type of pseudopotential      | Ultrasoft | Ultrasoft | Ultrasoft  | OTFG Norm-conserving | OTFG Norm-conserving | OTFG Norm-conserving
kpoints_mp_grid (# k-points) | 2 1 1 (1) | 3 2 1 (3) | 4 3 2 (12) | 2 1 1 (1)            | 3 2 1 (3)            | 4 3 2 (12)
Memory/process (MB)          | 652       | 666       | 1249       | 630                  | 681                  | 1287
Peak memory use (MB)         | 768       | 777       | 1580       | 764                  | 791                  | 1296
Total time (secs)            | 32        | 55        | 222        | 85                   | 136                  | 477

Table 2 Single point energy calculations run on 5 cores using different numbers of k-points (in brackets), showing the effects for different pseudopotentials.
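The grid is requested with a single .cell line; the example below is the 3-point grid from Table 2, and the commented alternative (kpoints_mp_spacing, an assumption not used in the tables above) derives a grid from a maximum spacing instead:

    kpoints_mp_grid 3 2 1
    ! or derive the grid from a target spacing (units of 1/Angstrom):
    ! kpoints_mp_spacing 0.05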
Vacuum space
When building a material surface it is necessary to add vacuum space to the cell (see Figure 2 for an example), and this adds to the memory requirements and calculation time because the 'empty space' (as well as the atoms) is 'filled' by planewaves. Table 3 shows that doubling the volume of vacuum space roughly doubles the total calculation time (using the same number of cores).
Figure 2 Vacuum space added to create a clay mineral surface (to study adsorbate–surface interactions, for example; adsorbate not included in the above)

Vacuum space (Å)               | 0   | 5   | 10   | 20
Memory/process (MB)            | 666 | 766 | 834  | 1078
Peak memory use (MB)           | 777 | 928 | 1066 | 1372
Total time (secs)              | 55  | 102 | 202  | 406
Overall parallel efficiency(a) | 69% | 66% | 67%  | 61%

Table 3 Single point energy calculations using ultrasoft pseudopotentials and 3 k-points, run on 5 cores, showing the effects of vacuum space. (a) Calculated automatically by CASTEP.
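As a sketch of the mechanics (the lattice values are illustrative, not those of the clay model), vacuum is added by extending the cell axis normal to the surface, assumed here to be c; giving atomic positions in Cartesian form (POSITIONS_ABS) keeps the slab geometry fixed while the cell grows:

    %block LATTICE_ABC
    5.2   9.0   17.4     ! a, b, c in Angstrom; c = slab thickness plus ~10 A of vacuum
    90.0  90.0  90.0     ! cell angles in degrees
    %endblock LATTICE_ABC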
Supercell size
The size of a system is one of the more obvious choices affecting the demands on computational resources; nevertheless, it is interesting to see (from Table 4) that, for the same number of k-points, doubling the number of atoms increases the memory load per process by between 35% (41 to 82 atoms) and 72% (82 to 164 atoms), while the corresponding calculation times increase by factors of 11 and 8.5 respectively. In good practice, the number of k-points is scaled according to the supercell size, which increases the computational cost more modestly.
Supercell size (# atoms)       | 1x1x1 (41) | 2x1x1 (82) | 2x2x1 (164) | 2x1x1 (82)   | 2x2x1 (164)
Kpoints (mp grid) (# kpoints)  | 3 2 1 (3)  | 3 2 1 (3)  | 3 2 1 (3)   | 2 1 1 (1)(b) | 2 1 1 (1)(b)
Memory/process (MB)            | 666        | 897        | 1547        | 732          | 1315
Peak memory use (MB)           | 777        | 1175       | 2330        | 1025         | 2177
Total time (secs)              | 55         | 631        | 5416        | 329          | 1660
Overall parallel efficiency(a) | 69%        | 69%        | 67%         | 74%          | 72%

Table 4 Single point energy calculations using ultrasoft pseudopotentials, run on 5 cores, showing the effects of supercells. (a) Calculated automatically by CASTEP. (b) K-points scaled for the 2x1x1 and 2x2x1 supercells.
Figure 3 Example of 2 x 2 x 1 supercell
Orientation of axes
This might be one of the more surprising and unexpected properties of a model that affect computational efficiency. The effect becomes significant when a system is large, disproportionately long along one of its dimensions, and misaligned with the x-, y-, z-axes; see Figure 4 and Table 5 for exaggerated examples of misalignment. The effect arises from the way CASTEP transforms properties between real space and reciprocal space: it converts the 3-d fast Fourier transforms (FFTs) into three sets of 1-d FFTs along columns that lie parallel to the x-, y-, z-axes.
Figure 4 Top row: a capped carbon nanotube (160 atoms); bottom row: a long carbon nanotube (1000 atoms), showing the long axis aligned in the x-direction (left), the z-direction (middle), and skewed (right).
Orientation (# atoms)          | X (160) | Z (160) | Skewed (160) | X (1000) | Z (1000) | Skewed (1000)
# cores                        | 5       | 5       | 5            | 60       | 60       | 60
Memory/process (MB)            | 884     | 882     | 882          | 2870     | 2870     | 2870
Peak memory use (MB)           | 1893    | 1885    | 1838         | 7077     | 7077     | 7077
Total time (secs)              | 392     | 359     | 409          | 3906     | 3908     | 5232
Overall parallel efficiency(a) | 79%     | 84%     | 82%          | 78%      | 78%      | 75%
Relative total energy (# cores x total time, core-seconds) | 1960 | 1795 | 2045 | 234360 | 234480 | 313920

Table 5 Single point energy calculations of carbon nanotubes oriented as shown in Fig. 4, using ultrasoft pseudopotentials (280 eV cut-off energy) and 1 k-point. (a) Calculated automatically by CASTEP.
B. Param file
Grid-scale
Although the ultrasofts require a smaller planewave basis set than the norm-conserving pseudopotentials, they do need a finer electron density grid, set via the 'grid_scale' and 'fine_grid_scale' parameters. As shown in Table 6, the denser grid setting required by the OTFG ultrasofts (with the exception of the QC5 set) can almost double the calculation time compared with the larger, planewave-hungry OTFG norm-conserving pseudopotentials, which converge well on a less dense grid.
Type of pseudopotential      | Norm-conserving | Norm-conserving | Ultrasoft | OTFG Norm-conserving | OTFG Norm-conserving | OTFG Ultrasoft | OTFG Ultrasoft QC5 set
grid_scale : fine_grid_scale | 1.5:1.75        | 2.0:3.0         | 2.0:3.0   | 1.5:1.75             | 2.0:3.0              | 2.0:3.0        | 2.0:3.0
Memory/process (MB)          | 792             | 681             | 666       | 680                  | 731                  | 2072           | 1007
Peak memory use (MB)         | 803             | 1070            | 777       | 791                  | 956                  | 2785           | 1590
Total time (secs)            | 89              | 150             | 55        | 136                  | 221                  | 250            | 109

Table 6 Single point energy calculations run on 5 cores, showing the effects of different electron density grid settings.
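Both settings are .param entries; the sketch below shows the 2.0:3.0 combination from Table 6 that suits the ultrasoft pseudopotentials:

    grid_scale      : 2.0
    fine_grid_scale : 3.0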
Data Distribution
Parallelizing over plane wave vectors (‘G-vectors’), k-points or a mix of the two has an
impact on computational efficiency as shown in Table 7.
The default for a .param file without the keyword 'data_distribution' is to prioritize k-point distribution across a number of cores (less than or equal to the number requested in the submission script) that is a factor of the number of k-points; see, for example, Table 7, columns 2 and 3. Inserting 'data_distribution : kpoint' into the .param file prioritizes and optimizes the k-point distribution across the number of cores requested in the script. In the example tested, selecting data distribution over k-points increased the calculation time relative to the default of no data distribution; compare columns 3 and 5 of Table 7.
Requesting G-vector distribution has the largest impact on calculation time, and combining it with a requested number of cores that is also a factor of the number of k-points has the overall largest impact on reducing calculation time – see columns 6 and 7 of Table 7. Requesting mixed data distribution has a similar impact on calculation time to requesting no data distribution on 5 cores, but not on 6 cores: the 'mixed' distribution used 4-way k-point distribution rather than the 6-way distribution applied by the default (no request) – compare columns 2 and 3 with 8 and 9.
For the small clay model system, the optimal efficiency was obtained using G-vector data distribution over 6 cores (852 core-seconds) and the least efficient choice was mixed data distribution over 6 cores (1584 core-seconds). These results are system-specific and need careful testing to tailor the settings to different systems.
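The flag is a single .param line; the values explored in Table 7 are kpoint, gvector, and mixed, with the core count in the submission script ideally a factor of the number of k-points:

    data_distribution : gvector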
Number of tasks per node
This is invoked by adding 'num_proc_in_smp' to the .param file, and controls the number of message passing interface (MPI) tasks that are placed in a shared-memory (SMP) group. The 'all-to-all' communication is then done in three phases instead of one:
(1) tasks within an SMP group collect their data together on a chosen 'controller' task within their group;
(2) the all-to-all is done between the controller tasks;
(3) the controllers distribute the data back to the tasks in their SMP groups.
For small core counts, the overhead of the two extra phases makes this method slower than just doing an all-to-all; for large core counts, the reduction in the all-to-all time more than compensates for the extra overhead, so it's faster. Indeed, the tests (shown in Table 8) reveal that on the test HPC cluster, Sunbird, invoking this flag fails to produce as large a speed-up as the flag 'data_distribution : gvector' alone (compare columns 3 and 9), reflecting the small core count requested. Generally speaking, the more cores in the G-vector group, the higher 'num_proc_in_smp' should be set (up to the physical number of cores on a node).
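In the .param file the two flags combine as below; the value 4 is purely illustrative, and as noted above larger G-vector groups generally warrant larger values (up to the physical cores per node):

    data_distribution : gvector
    num_proc_in_smp   : 4        ! MPI tasks per SMP group (illustrative value)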
Column #                       | 2            | 3            | 4            | 5            | 6             | 7             | 8            | 9
Requested data distribution    | None         | None         | Kpoints      | Kpoints      | Gvector       | Gvector       | Mixed        | Mixed
# cores in submission script   | 5            | 6            | 5            | 6            | 5             | 6             | 5            | 6
Actual data distribution       | kpoint 4-way | kpoint 6-way | kpoint 5-way | kpoint 6-way | Gvector 5-way | Gvector 6-way | kpoint 4-way | kpoint 4-way
Memory/process (MB)            | 1249         | 1219         | 1249         | 1219         | 728           | 698           | 1249         | 1253
Peak memory use (MB)           | 1581         | 1561         | 1581         | 1561         | 839           | 804           | 1581         | 1585
Total time (secs)              | 295          | 199          | 292          | 226          | 191           | 142           | 294          | 264
Overall parallel efficiency(a) | 99%          | 96%          | 98%          | 96%          | 66%           | 71%           | 98%          | 96%
Relative total energy (# cores x total time, core-seconds) | 1475 | 1194 | 1460 | 1356 | 955 | 852 | 1470 | 1584

Table 7 Single point energy calculations using ultrasoft pseudopotentials and 12 k-points, showing the effects of data distribution across different numbers of cores requested in the script file. 'Actual data distribution' means that reported by CASTEP on completion, in this and (where applicable) all following tables. 'Relative total energy' assumes that each core requested by the script consumes X amount of electricity. (a) Calculated automatically by CASTEP.
Column #                       | 2            | 3             | 4            | 5             | 6            | 7             | 8            | 9
num_proc_in_smp                | Default      | Default       | 2            | 2             | 4            | 4             | 5            | 5
Requested data_distribution    | None         | Gvector       | None         | Gvector       | None         | Gvector       | None         | Gvector
Actual data distribution       | kpoint 4-way | Gvector 5-way | kpoint 4-way | Gvector 5-way | kpoint 4-way | Gvector 5-way | kpoint 4-way | Gvector 5-way
Memory/process (MB)            | 1249         | 728           | 1249         | 728           | 1249         | 728           | 1249         | 728
Peak memory use (MB)           | 1580         | 837           | 1581         | 839           | 1581         | 844           | 1581         | 846
Total time (secs)              | 222          | 156           | 231          | 171           | 230          | 182           | 237          | 183
Overall parallel efficiency(a) | 96%          | 66%           | 98%          | 60%           | 98%          | 56%           | 96%          | 56%

Table 8 Single point energy calculations using ultrasoft pseudopotentials and 12 k-points, run on 5 cores, showing the effects of setting 'num_proc_in_smp : 2, 4, 5', both with and without the 'data_distribution : gvector' flag. 'Default' means 'num_proc_in_smp' absent from the .param file. (a) Calculated automatically by CASTEP.
Optimization strategy
This parameter has three settings and is invoked through the 'opt_strategy' flag in the .param file:
• Default – balances speed and memory use. Wavefunction coefficients for all k-points in a calculation are kept in memory rather than paged to disk; some large work arrays are paged to disk.
• Memory – minimizes memory use. All wavefunctions and large work arrays are paged to disk.
• Speed – maximizes speed by not paging to disk.
This means that if a user runs a large-memory calculation, optimizing for memory could obviate the need to request additional cores, although the calculation will take longer – see Table 9 for comparisons.
opt_strategy                   | Default | Memory | Speed
Memory/process (MB)            | 793     | 750    | 1249
Peak memory use (MB)           | 1566    | 1092   | 1581
Total time (secs)              | 232     | 290    | 221
Overall parallel efficiency(a) | 94%     | 97%    | 96%

Table 9 Single point energy calculations using ultrasoft pseudopotentials and 12 k-points, run on 5 cores, showing the effects of optimizing for speed or memory. 'Default' means either omitting the 'opt_strategy' flag from the .param file or adding it as 'opt_strategy : default'. (a) Calculated automatically by CASTEP.
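The corresponding .param entry is one line, e.g. choosing the memory-sparing setting discussed above:

    opt_strategy : memory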
Spin polarization
If a system comprises an odd number of electrons it might be important to differentiate between the spin-up and spin-down states of the odd electron. This directly affects the calculation time, effectively doubling it, as shown in Table 10.
.param flag: spin_polarization   | false | true
Memory/process (MB)              | 1249  | 1415
Peak memory use (MB)             | 1581  | 1710
Total time (secs)                | 222   | 455
Overall parallel efficiency(a)   | 96%   | 98%

Table 10 Single point energy calculations using ultrasoft pseudopotentials and 12 k-points, run on 5 cores, showing the effects of spin polarization. (a) Calculated automatically by CASTEP.
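The .param entry, to be set to true only when the science requires distinct spin channels:

    spin_polarization : true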
Electronic energy minimizer
Insulating systems often behave well during the self-consistent field (SCF) minimization and converge smoothly using density mixing ('DM'). When SCF convergence is problematic and all attempts to tweak DM-related parameters have failed, it is necessary to turn to ensemble density functional theory7 ('EDFT') and accept the consequent (and considerable) increase in computational cost – see Table 11.
.param flag: metals_method (electron minimization) | DM   | EDFT
Memory/process (MB)                                | 1249 | 1289
Peak memory use (MB)                               | 1581 | 1650
Total time (secs)                                  | 222  | 370
Overall parallel efficiency(a)                     | 96%  | 97%

Table 11 Single point energy calculations using ultrasoft pseudopotentials and 12 k-points, run on 5 cores, showing the effects of the electronic minimization method. 'DM' means density mixing and 'EDFT' ensemble density functional theory. (a) Calculated automatically by CASTEP.
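The .param entry; 'dm' (density mixing) is the cheaper first choice, with 'edft' as the robust fallback:

    metals_method : edft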
C. Script submission file
Figure 5 An example HPC batch submission script
Figure 5 captures the script variables that affect HPC computational energy and usage efficiency:
(i) The variable familiar to most HPC users describes the number of cores ('tasks') requested for the simulation. Unless the calculation is memory hungry, configure the requested number of cores to sit on the fewest nodes, because this reduces expensive node-to-node communication time.
(ii) Choosing the shortest job run time gives the calculation a better chance of progressing through the job queue swiftly.
(iii) When not requesting use of all cores on a single node, remove the 'exclusive' flag to accelerate progress through the job queue.
(iv) Using the most recent version of the software captures the latest upgrades and bug-fixes that might otherwise slow down a calculation.
(v) Using the 'dryrun' tag provides a (very) broad estimate of the memory requirements. In one example the estimate of peak memory use was a quarter of that actually used during the simulation proper.
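Since Figure 5 is reproduced only as an image, the sketch below shows a comparable SLURM script; the module name, partition, and account are placeholders to adapt to your cluster, and the seedname 'clay' is illustrative:

    #!/bin/bash
    #SBATCH --job-name=clay_spe
    #SBATCH --ntasks=5               # (i) number of cores/tasks requested
    #SBATCH --nodes=1                # keep the tasks on the fewest nodes possible
    #SBATCH --time=01:00:00          # (ii) shortest realistic run time
    ##SBATCH --exclusive             # (iii) leave commented out unless using whole nodes
    #SBATCH --partition=compute      # placeholder partition name
    #SBATCH --account=scwXXXX        # placeholder project account

    module load castep/21.11         # (iv) most recent available version (placeholder)

    # (v) optional first pass: broad estimate of memory requirements
    # castep.mpi --dryrun clay

    srun castep.mpi clay             # run CASTEP on the 'clay' seedname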
D. An (extreme) example
Clay mineral (Figure 2)                       | Careful optimisation for energy efficiency | Careless – no optimisation for energy efficiency
Vacuum space                                  | 10 Å                | 10 Å
Pseudopotential; cut-off energy (eV)          | Ultrasoft; 370      | OTFG-Ultrasoft; 599
K-points                                      | 3                   | 12
Grid-scale : fine-grid-scale                  | 2:3                 | 3:4
num_proc_in_smp / requested data distribution | default / Gvector   | 20 / none
Actual data distribution                      | 5-way Gvector only  | 3-way Gvector, 12-way kpoint, 3-way (Gvector) smp
Optimization strategy                         | Speed               | Default
Spin polarization                             | False               | True
Electronic energy minimizer                   | Density mixing      | EDFT
Number of cores requested                     | 5                   | 40
RESULTS
Memory/process (MB) / scratch disk (MB)       | 834 / 0             | 1461 / 6518
Peak memory use (MB)                          | 1066                | 9107
Total time (seconds)                          | 215                 | 45,302
Overall parallel efficiency(a)                | 69%                 | 96%
Relative total energy (# cores x total time, core-seconds / core-hours) | 1075 / 0.30 | 1,812,080 / 503.36
kiloJoules used (approx.)                     | 202                 | 52,000

Table 12 One clay mineral model (Figure 2) with vacuum space of 10 Å: single point energy calculations showing the difference between carefully optimizing for energy efficiency and carelessly running without pre-testing. (a) Calculated automatically by CASTEP.
Table 12 illustrates the combined effects of many of the model properties and parameters discussed in the previous sections on the total time and overall use of computational resources. It's unlikely a user would choose the whole combination of model properties and parameters shown in the 'careless' column, but it nevertheless gives an idea of the impact a user can have on the energy consumption of their simulations. For comparison, the cheapest electric car listed in 2021 consumes 26.8 kWh per 100 miles, or roughly 603 kJ/km, which means that the carelessly run simulation used the energy equivalent of driving this car about 86 km, whereas the efficiently run simulation 'drove' it 0.33 km.
For computational scientists and modellers, applying good energy efficiency practices needs to become second nature; following an energy efficiency 'recipe' or procedure is a route to embedding this practice as a habit.
3. Developing energy efficient computing habits: A recipe
1) Build a model of a system that contains only the essential ingredients that
allows exploration of the scientific question. This is one of the key factors that
determines the size of a model.
2) Find out how many cores per node there are on the available HPC cluster. This
enables users to request the number of cores/tasks that minimizes inter-node
communication during a simulation.
3) Choose the pseudopotentials to match the science. This ensures users don’t use
pseudopotentials that are unnecessarily computationally expensive.
4) Carry out extensive convergence testing based on the minimum accuracy
required for the production run results, e.g.:
(i) Kinetic energy cut-off (depends on pseudopotential choice)
(ii) Grid scale and fine grid scale (depends on pseudopotential choice)
(iii) Size and orientation of the model, including e.g. number of bulk atoms, number of layers, size of surface, vacuum space etc.
(iv) Number of k-points
These decrease the possibility of over-convergence and its associated
computational cost.
5) Spend time optimising the .param file properties described in Section B using a
small number of SCF cycles:
a. Data distribution: Gvector, k-points or mixed?
b. Number of tasks per node
c. Optimization strategy
d. Spin polarization
e. Electronic energy (SCF) minimization method
This increases the chances of using resources efficiently by matching the model and material requirements to the simulation parameters.
6) Optimise the script file. This increases the efficient use of HPC resources.
7) Submit the calculation and initially monitor it to check it’s progressing as
expected. This reduces the chances of wasting computational time due to trivial
(‘Friday afternoon’!) mistakes.
8) Carry out your own energy efficient computing tests (and send your findings to
the JISCMAIL CASTEP mailing list).
9) Sit back and wait for the simulation to complete, basking in the knowledge that
the simulation is running as energy efficiently1 as a user can possibly make it!
4. What else can a user do?
In addition to using the above recipe to embed energy-efficient computing habits, a user
can take a number of actions to encourage the wider awareness and adoption of energy
efficient computing:
a. If the HPC cluster uses SLURM, use the 'sacct' command to check the amount of energy consumed2 (in Joules) by a job – see Figure 6.
b. If your local cluster uses a different job-scheduler, ask your local IT helpdesk if
it has the facility to monitor the energy consumed by each HPC job.
c. Include the energy consumption of simulations in all forms of reports and
presentations, e.g. in/formal talks, posters, peer reviewed journal articles, social
media posts etc. This will increase awareness of our role as environmentally
aware and conscientious computational scientists and users of HPC resources.
1 It's highly probable that users can expand on the list of model properties and parameters described within this document to further optimise energy efficient computing.
2 'Note: Only in case of exclusive job allocation this value reflects the jobs' real energy consumption' – see https://slurm.schedmd.com/sacct.html
Figure 6 Examples of information about jobs output through SLURM’s ‘sacct’ command (plus flags). Top: list of details
about several jobs run from 20/03/2021; bottom: details for a specific job ID via the ‘seff <jobID>’ command.
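A hedged sketch of such a query – the job ID is a placeholder, and the ConsumedEnergy field is only populated where SLURM's energy accounting plugin is enabled:

    # energy and runtime for a completed job (job ID is a placeholder)
    sacct -j 123456 --format=JobID,JobName,NNodes,Elapsed,ConsumedEnergy

    # single-job efficiency summary, as in the bottom panel of Figure 6
    seff 123456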
d. Include estimates of the energy consumption of simulations in applications for funding. Although not yet explicitly requested in EPSRC funding applications, there is an expectation that UKRI's 2020 commitment to Environmental Sustainability will filter down to all activities of its research councils, including funding. This would mean that funding applicants need to demonstrate awareness of the environmental impact of their proposed work. Become an impressive pioneer and include environmental impact through energy consumption in your next application.
5. What are the developers doing?
The compilation of this document included a chat with several of the developers of CASTEP, who are keen to help users run their software energy efficiently; they shared their plans and projects in this field:
• Parts of CASTEP have been programmed to run on GPUs, with up to a 15-fold speed-up (for non-local functionals).
• Work on a CASTEP simulator is underway that should reduce the number of CASTEP calculations required per simulation by choosing an optimal parallel domain decomposition and implementing timings for FFTs – the big parallel cost; it will also estimate compute usage. This simulator will go a long way towards providing the structure needed to add energy efficiency to CASTEP and will be accessible through the '--dryrun' command. The toy code is available in Bitbucket.
• The developers recognise the need for energy consumption to be acknowledged as an additional factor in the cost of computational simulations. They are planning their approach beyond the software itself, such as including energy efficient computing in their training courses.
Acknowledgements
I acknowledge the support of the Supercomputing Wales project, which is part-funded by the
European Regional Development Fund (ERDF) via Welsh Government.
Thank you to the following CASTEP developers for their invaluable input and support for this
small project: Dr. Phil Hasnip and Prof. Matt Probert (University of York); Prof Chris Pickard
(University of Cambridge); Dr. Dominik Jochym (STFC); Prof. Stewart Clark (University of
Durham). Thanks also to Dr. Sue Thorne (STFC) and Dr. Ed Bennett (Supercomputing Wales) for sharing their research engineering perspectives.
References
(1) Clark, S. J.; Segall, M. D.; Pickard, C. J.; Hasnip, P. J.; Probert, M. I. J.; Refson, K.; Payne, M. C. First Principles Methods Using CASTEP. Z. Kristallogr. 2005, 220, 567–570.
(2) Vanderbilt, D. Soft Self-Consistent Pseudopotentials in a Generalized Eigenvalue Formalism. Phys. Rev. B 1990, 41, 7892–7895.
(3) Pickard, C. J. On-the-Fly Pseudopotential Generation in CASTEP. 2006.
(4) Refson, K.; Clark, S. J.; Tulip, P. Variational Density Functional Perturbation Theory for Dielectrics and Lattice Dynamics. Phys. Rev. B 2006, 73, 155114.
(5) Hamann, D. R.; Schlüter, M.; Chiang, C. Norm-Conserving Pseudopotentials. Phys. Rev. Lett. 1979, 43 (20), 1494–1497.
(6) BIOVIA, Dassault Systèmes. Materials Studio 2020; Dassault Systèmes: San Diego, 2019.
(7) Marzari, N.; Vanderbilt, D.; Payne, M. C. Ensemble Density Functional Theory for Ab Initio Molecular Dynamics of Metals and Finite-Temperature Insulators. Phys. Rev. Lett. 1997, 79, 1337–1340.