E3C: Exploring Energy Efficient Computing

Dawn Geatches, Science & Technology Facilities Council, Daresbury Laboratory, Warrington, WA4 4AD. dawn.geatches@stfc.ac.uk

This scoping project was funded under the Environmental Sustainability Concept Fund (ESCF) within the Business Innovation Department of STFC.

This document is a first attempt to demonstrate how users of the quantum mechanics-based software code CASTEP1 can run their simulations on high performance computing (HPC) architectures efficiently. Whatever the level of experience a user might have, the climate crisis we are facing dictates that we need to (i) become aware of the computational resources our simulations consume; (ii) understand how we, as users, can reduce this consumption; (iii) actively develop energy efficient computing habits. This document provides some small insight to help users progress through stages (i) and (ii), empowering them to adopt stage (iii) with confidence.

This document is not a guide to setting up and running simulations using CASTEP; such guides already exist (see, for example CASTEP ). It is assumed throughout that the user has a basic familiarity with the software and its terminology. This document does not exhaust all of the possible ways to reduce computational cost – much is left for users to discover for themselves and to share with the wider CASTEP community (e.g. via the JISCMAIL CASTEP Users Mailing List ). Thank you!

Sections
1. Computational cost of simulations
2. Reducing the energy used by your simulation
   A. Cell file
   B. Param file
   C. Submission script
   D. An (extreme) example
3. Developing energy efficient computing habits: A recipe
4. What else can a user do?
5. What are the developers doing?

1. Computational cost of simulations

'Computational cost' in the context of this project is synonymous with 'energy used'.
As a user of high performance computing (HPC) resources, have you ever wondered what effect your simulations have on the environment through the energy they consume? You might be working on some great new renewable energy material and running hundreds or thousands of simulations over the lifetime of the research. How does the energy consumed by the research stack up against the energy that will be generated/saved/stored etc. by the new material? Hopefully, the balance is gigantically in favour of the new material and its promised benefits. Fortunately, we can do more than hope that this is the case: we can actively reduce the energy consumed by our simulations; indeed, it is the responsibility of every computational modeller to do exactly that. Wouldn't it be great (not to say impressive) if, when you write your next funding application, you could give a ballpark figure for the amount of energy your computational research will consume over the lifetime of the project?

As a user you might be thinking 'but what effect can I have when surely the HPC architecture is responsible for energy usage?' and 'then there's the code itself, which should be as efficient as possible, but if it's not, I can't do anything about that?' Both of these thoughts are grounded in truth: the HPC architecture is fixed – but we can use it efficiently; the software we're using is structurally fixed – but we can run it efficiently.

The energy cost (E) of a simulation is the total power per core (P) consumed over the length of time (T) of the simulation, which for parallelised simulations run on N cores is:

    E = N P T

From this it is logical to think that reducing N, P, and/or T will reduce E, which is theoretically true. Practically though, let's assume that the power consumed by each core is a fixed property of the HPC architecture; we then have:

    E ∝ N T

This effectively encapsulates where we, as users of HPC, can control the amount of energy our simulations consume, and it seems simple.
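To make the E = N P T bookkeeping concrete, here is a minimal sketch; the per-core power figure is an illustrative assumption, not a measured value for any particular cluster.

```python
# Energy model from the text: E = N * P * T, with N cores,
# P the average power drawn per core, and T the wall-clock time.

def simulation_energy_kj(n_cores, watts_per_core, seconds):
    """Return E = N * P * T in kilojoules."""
    return n_cores * watts_per_core * seconds / 1000.0

# Illustrative only: 40 cores at an assumed 15 W/core for 10 hours.
print(simulation_energy_kj(40, 15.0, 10 * 3600))  # 21600.0 (kJ)
```

A figure like this is exactly the kind of ballpark estimate that could go into a funding application.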
All we need to do is learn how to optimise the number of cores and the length of time of our simulations. We use multiple cores to share the memory load and to speed up a calculation, giving us three calculation properties to optimise: number of cores; memory per core; time.

To reduce the calculation time we might first increase the number of cores. Many users will already know that the relationship between core count and calculation time is non-linear, thanks to the required increase in core-to-core and node-to-node communication time. Taking the latter into account, the total energy used is

    E = N T + C(N, T)

where C(N, T) captures the energy cost of the core-core/node-node communication time. To optimise energy efficiency, any speed-up in calculation time gained by increasing the number of cores needs to outweigh the increased energy cost of using additional cores. Therefore, the speed-up factor needs to be more than the factor of increase in the number of cores, as shown below for a 2-core vs serial example:

    E_s  = T_s                      Energy of serial (i.e. 1-core) calculation; C(1, T_s) = 0
    E_2c = 2 T_2c + C(2, T_2c)      Energy of 2-core calculation

For the energy cost of using 2 cores to be no greater than that of the serial calculation we need E_2c ≤ E_s, i.e.

    2 T_2c + C(2, T_2c) ≤ T_s,   equivalently   T_2c + (1/2) C(2, T_2c) ≤ (1/2) T_s

which means that the total calculation time using 2 cores needs to be less than half of the serial time. So, for users to run simulations efficiently in parallel, they need to balance the number of cores, the associated memory load per core, and the total calculation time. The following section shows how some of the more commonly used parameters within CASTEP affect these three properties. NB: The main purpose of the following examples is to illustrate the impact of different user-choices on the total energy cost of simulations.
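The break-even condition above is easy to script when timing trial runs; a sketch under the document's E ∝ N·T assumption (the communication term C only makes parallel running less favourable):

```python
# A parallel run saves energy under E ∝ N*T only if the speed-up
# factor exceeds the core count: N * T_parallel <= T_serial.

def parallel_saves_energy(t_serial, n_cores, t_parallel):
    """True if the n-core run uses no more energy than the serial run."""
    return n_cores * t_parallel <= t_serial

# A 2-core run must finish in no more than half the serial time:
print(parallel_saves_energy(100.0, 2, 45.0))  # True  (2 * 45 = 90 <= 100)
print(parallel_saves_energy(100.0, 2, 60.0))  # False (2 * 60 = 120 > 100)
```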
These examples do not indicate the level of 'accuracy' attained, because 'accuracy' is determined by the user according to the type, contents, and aims of their simulations.

2. Reducing the energy used by your simulation

This section uses an example of a small model of a clay mineral (and later a carbon nanotube) to illustrate how a user can change the total energy their simulation uses by a judicious choice of CASTEP input parameters.

Figure 1 Unit cell of a generic silicate clay mineral comprising 41 atoms

A. Cell file

Pseudopotentials
Choose the pseudopotential according to the type of simulation, e.g. for simulations of cell structures ultrasofts2 are often sufficient, although if the pseudopotential library does not contain an ultrasoft version for a particular element, the on-the-fly-generated (OTFG) ultrasofts3 might suffice. If a user is running a spectroscopic simulation such as infrared using density functional perturbation theory4 then norm-conserving5 or OTFG norm-conserving3 could be the better choices. The impact of pseudopotential type on the computational cost is shown in Table 1 through the total (calculation) time.

Type of pseudopotential       | Ultrasoft | Norm-conserving | OTFG Ultrasoft | OTFG Ultrasoft QC5 setb | OTFG Norm-conserving
Cut-off energy (eV)           | 370       | 900             | 598            | 340                     | 925
# coresa                      | 5         | 5               | 5              | 5                       | 5
Memory/process (MB)           | 666       | 681             | 2072           | 1007                    | 681
Peak memory use (MB)          | 777       | 802             | 2785           | 1590                    | 791
Total time (secs)             | 55        | 89              | 250            | 109                     | 136

Table 1 Pseudopotential and size of planewave set required on the 'fine' setting of Materials Studio 20206, and an example of the memory and time required for a single point energy calculation using the recorded number of cores on a single node. Unless otherwise stated, the same cut-off energy per type of pseudopotential is implied throughout this document. aUsing Sunbird (CPU: 2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz with 20 cores each); unless stated otherwise all calculations were performed on this HPC cluster.
bDesigned to be used at the same modest (340 eV) kinetic energy cut-off across the periodic table. They are ideal for moderate-accuracy, high-throughput calculations, e.g. ab initio random structure searching (AIRSS).

K-points
Changing the number of Brillouin zone sampling points can have a dramatic effect on computational time, as shown in Table 2. Bear in mind that increasing the number of k-points increases the memory requirements, often tempting users to increase the number of cores, further increasing the overall computational cost. Remember though, it's important to use the number of k-points that provides the level of accuracy your simulations need.

Type of pseudopotential |           Ultrasoft              |       OTFG Norm-conserving
kpoints_mp_grid         | 2 1 1 (1) | 3 2 1 (3) | 4 3 2 (12) | 2 1 1 (1) | 3 2 1 (3) | 4 3 2 (12)
Memory/process (MB)     | 652       | 666       | 1249       | 630       | 681       | 1287
Peak memory use (MB)    | 768       | 777       | 1580       | 764       | 791       | 1296
Total time (secs)       | 32        | 55        | 222        | 85        | 136       | 477

Table 2 Single point energy calculations run on 5 cores using different numbers of k-points (in brackets), showing the effects for different pseudopotentials.

Vacuum space
When building a material surface it is necessary to add vacuum space to a cell (see Figure 2 for an example), and this adds to the memory requirements and calculation time because the 'empty space' (as well as the atoms) is 'filled' by planewaves. Table 3 shows that doubling the volume of vacuum space roughly doubles the total calculation time (using the same number of cores).

Figure 2 Vacuum space added to create a clay mineral surface (to study adsorbate-surface interactions, for example – adsorbate not included in the above)

Vacuum space (Å)             | 0    | 5    | 10   | 20
Memory/process (MB)          | 666  | 766  | 834  | 1078
Peak memory use (MB)         | 777  | 928  | 1066 | 1372
Total time (secs)            | 55   | 102  | 202  | 406
Overall parallel efficiencya | 69%  | 66%  | 67%  | 61%

Table 3 Single point energy calculations using ultrasoft pseudopotentials and 3 k-points, run on 5 cores, showing the effects of vacuum space. aCalculated automatically by CASTEP.
Supercell size
The size of a system is one of the more obvious choices affecting the demands on computational resources; nevertheless, it is interesting to see (from Table 4) that for the same number of k-points, doubling the number of atoms increases the memory load per process by between 35% (41 to 82 atoms) and 72% (82 to 164 atoms), and the corresponding calculation times increase by factors of 11 and 8.5 respectively. In good practice the number of k-points is scaled according to the supercell size, increasing the computational cost more modestly.

Supercell size (# atoms)     | 1x1x1 (41) | 2x1x1 (82) | 2x2x1 (164) | 2x1x1 (82) | 2x2x1 (164)
Kpoints (mp grid)            | 3 2 1 (3)  | 3 2 1 (3)  | 3 2 1 (3)   | 2 1 1 (1)  | 2 1 1 (1)
                             |            |            |             | (kpoints scaled for supercell) | (kpoints scaled for supercell)
Memory/process (MB)          | 666        | 897        | 1547        | 732        | 1315
Peak memory use (MB)         | 777        | 1175       | 2330        | 1025       | 2177
Total time (secs)            | 55         | 631        | 5416        | 329        | 1660
Overall parallel efficiencya | 69%        | 69%        | 67%         | 74%        | 72%

Table 4 Single point energy calculations using ultrasoft pseudopotentials, run on 5 cores, showing the effects of supercells. aCalculated automatically by CASTEP.

Figure 3 Example of 2 x 2 x 1 supercell

Orientation of axes
This might be one of the more surprising and unexpected properties of a model that affects computational efficiency. The effect becomes significant when a system is large, disproportionately long along one of its lengths, and misaligned with the x-, y-, z-axes; see Figure 4 and Table 5 for exaggerated examples of misalignment. This effect is due to the way CASTEP transforms properties between real space and reciprocal space: it converts the 3-d fast Fourier transforms (FFT) to three sets of 1-d FFTs along columns that lie parallel to the x-, y-, z-axes.

Figure 4 Top row: a capped carbon nanotube (160 atoms), and bottom row: a long carbon nanotube (1000 atoms), showing the long axis aligned in the x-direction (left); z-direction (middle); skewed (right).
Orientation (# atoms)        | X (160) | Z (160) | Skewed (160) | X (1000) | Z (1000) | Skewed (1000)
# Cores                      | 5       | 5       | 5            | 60       | 60       | 60
Memory/process (MB)          | 884     | 882     | 882          | 2870     | 2870     | 2870
Peak memory use (MB)         | 1893    | 1885    | 1838         | 7077     | 7077     | 7077
Total time (secs)            | 392     | 359     | 409          | 3906     | 3908     | 5232
Overall parallel efficiencya | 79%     | 84%     | 82%          | 78%      | 78%      | 75%
Relative total energy (# cores * total time, core-seconds) | 1960 | 1795 | 2045 | 234360 | 234480 | 313920

Table 5 Single point energy calculations of carbon nanotubes oriented as shown in Fig. 4, using ultrasoft pseudopotentials (280 eV cut-off energy) and 1 k-point. aCalculated automatically by CASTEP.

B. Param file

Grid-scale
Although the ultrasofts require a smaller planewave basis set than the norm-conserving pseudopotentials, they do need a finer electron density grid, set via 'grid_scale' and 'fine_grid_scale'. As shown in Table 6, the denser grid-scale setting required by the OTFG ultrasofts (with the exception of the QC5 set) can almost double the calculation time compared with the larger, planewave-hungry OTFG norm-conserving pseudopotentials, which converge well on a less dense grid.

Type of pseudopotential      | Norm-conserving | Norm-conserving | Ultrasoft | OTFG Norm-conserving | OTFG Norm-conserving | OTFG Ultrasoft | OTFG Ultrasoft QC5 set
grid_scale : fine_grid_scale | 1.5:1.75 | 2.0:3.0 | 2.0:3.0 | 1.5:1.75 | 2.0:3.0 | 2.0:3.0 | 2.0:3.0
Memory/process (MB)          | 792      | 681     | 666     | 680      | 731     | 2072    | 1007
Peak memory use (MB)         | 803      | 1070    | 777     | 791      | 956     | 2785    | 1590
Total time (secs)            | 89       | 150     | 55      | 136      | 221     | 250     | 109

Table 6 Single point energy calculations run on 5 cores, showing the effects of different electron density grid settings.

Data Distribution
Parallelising over plane wave vectors ('G-vectors'), k-points, or a mix of the two has an impact on computational efficiency, as shown in Table 7. The default for a .param file without the keyword 'data_distribution' is to prioritise k-point distribution across a number of cores (less than or equal to the number requested in the submission script) that is a factor of the number of k-points; see, for example, Table 7, columns 2 and 3.
Inserting 'data_distribution : kpoint' into the .param file prioritises and optimises the k-point distribution across the number of cores requested in the script. In the example tested, selecting data distribution over k-points increased the calculation time relative to the default of no data distribution; compare columns 3 and 5 of Table 7. Requesting G-vector distribution has the largest impact on calculation time, and combining this with requesting a number of cores that is also a factor of the number of k-points has the overall largest impact on reducing calculation time – see columns 6 and 7 of Table 7. Requesting mixed data distribution has a similar impact on calculation time as not requesting any data distribution for 5 cores, but not for 6 cores: the 'mixed' distribution used 4-way k-point distribution rather than the 6-way distribution applied by the default (no) request – compare columns 2 and 3 with 8 and 9. For the small clay model system the optimal efficiency was obtained using G-vector data distribution over 6 cores (852 core-seconds) and the least efficient choice was mixed data distribution over 6 cores (1584 core-seconds). The results are system-specific and need careful testing to tailor to different systems.

Number of tasks per node
This is invoked by adding 'num_proc_in_smp' to the .param file, and controls the number of message passing interface (MPI) tasks that are placed in a shared-memory (SMP) group. This means that the 'all-to-all' communication is then done in three phases instead of one: (1) tasks within an SMP group collect their data together on a chosen 'controller' task within their group; (2) the all-to-all is done between the controller tasks; (3) the controllers all distribute the data back to the tasks in their SMP groups.
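A toy message-count model makes the three-phase idea concrete. It is only a sketch: it counts point-to-point messages and ignores message sizes and the latency of running three phases in sequence, which is precisely what makes the grouped scheme slower at small core counts.

```python
# Toy comparison of a direct all-to-all vs the three-phase ("grouped")
# all-to-all described in the text. Assumes group_size divides n_tasks.

def direct_all_to_all(n_tasks):
    # every task exchanges a message with every other task
    return n_tasks * (n_tasks - 1)

def grouped_all_to_all(n_tasks, group_size):
    n_groups = n_tasks // group_size
    gather = n_groups * (group_size - 1)    # phase 1: collect onto controllers
    exchange = n_groups * (n_groups - 1)    # phase 2: all-to-all between controllers
    scatter = n_groups * (group_size - 1)   # phase 3: redistribute within groups
    return gather + exchange + scatter

# At 1024 tasks in groups of 32 the message count collapses:
print(direct_all_to_all(1024))       # 1047552
print(grouped_all_to_all(1024, 32))  # 2976
```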
For small core counts, the overhead of the two extra phases makes this method slower than just doing an all-to-all; for large core counts, the reduction in the all-to-all time more than compensates for the extra overhead, so it's faster. Indeed, the tests (shown in Table 8) reveal that invoking this flag fails to produce as large a speed-up as the flag 'data_distribution : gvector' (compare columns 3 and 9) on the test HPC cluster, Sunbird, reflecting the small core count requested. Generally speaking, the more cores in the G-vector group, the higher you want to set 'num_proc_in_smp' (up to the physical number of cores on a node).

Column #                                | 2            | 3            | 4            | 5            | 6             | 7             | 8            | 9
Requested data distribution + # cores   | None, 5      | None, 6      | Kpoints, 5   | Kpoints, 6   | Gvector, 5    | Gvector, 6    | Mixed, 5     | Mixed, 6
Actual data distribution                | kpoint 4-way | kpoint 6-way | kpoint 5-way | kpoint 6-way | Gvector 5-way | Gvector 6-way | kpoint 4-way | kpoint 4-way
Memory/process (MB)                     | 1249         | 1219         | 1249         | 1219         | 728           | 698           | 1249         | 1253
Peak memory use (MB)                    | 1581         | 1561         | 1581         | 1561         | 839           | 804           | 1581         | 1585
Total time (secs)                       | 295          | 199          | 292          | 226          | 191           | 142           | 294          | 264
Overall parallel efficiencya            | 99%          | 96%          | 98%          | 96%          | 66%           | 71%           | 98%          | 96%
Relative total energy (# cores * total time, core-seconds) | 1475 | 1194 | 1460 | 1356 | 955 | 852 | 1470 | 1584

Table 7 Single point energy calculations using ultrasoft pseudopotentials and 12 k-points, showing the effects of data distribution across different numbers of cores requested in the script file. 'Actual data distribution' means that reported by CASTEP on completion, in this and (where applicable) all following Tables. 'Relative total energy' assumes that each core requested by the script consumes X amount of electricity. aCalculated automatically by CASTEP.
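The 'relative total energy' row of Table 7 is simply cores × time; a short script over those numbers shows why the run with the highest parallel efficiency (99%, default on 5 cores) is not the cheapest in energy terms:

```python
# Relative total energy (core-seconds) for the Table 7 runs:
# each entry is (cores, total time in seconds) as reported above.
runs = {
    "none, 5 cores":    (5, 295),
    "none, 6 cores":    (6, 199),
    "kpoint, 5 cores":  (5, 292),
    "kpoint, 6 cores":  (6, 226),
    "gvector, 5 cores": (5, 191),
    "gvector, 6 cores": (6, 142),
    "mixed, 5 cores":   (5, 294),
    "mixed, 6 cores":   (6, 264),
}

core_seconds = {name: n * t for name, (n, t) in runs.items()}
best = min(core_seconds, key=core_seconds.get)
worst = max(core_seconds, key=core_seconds.get)
print(best, core_seconds[best])    # gvector, 6 cores 852
print(worst, core_seconds[worst])  # mixed, 6 cores 1584
```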
Column #                    | 2            | 3             | 4            | 5             | 6            | 7             | 8            | 9
num_proc_in_smp             | Default      | Default       | 2            | 2             | 4            | 4             | 5            | 5
Requested data_distribution | None         | Gvector       | None         | Gvector       | None         | Gvector       | None         | Gvector
Actual data distribution    | kpoint 4-way | Gvector 5-way | kpoint 4-way | Gvector 5-way | kpoint 4-way | Gvector 5-way | kpoint 4-way | Gvector 5-way
Memory/process (MB)         | 1249         | 728           | 1249         | 728           | 1249         | 728           | 1249         | 728
Peak memory use (MB)        | 1580         | 837           | 1581         | 839           | 1581         | 844           | 1581         | 846
Total time (secs)           | 222          | 156           | 231          | 171           | 230          | 182           | 237          | 183
Overall parallel efficiencya | 96%         | 66%           | 98%          | 60%           | 98%          | 56%           | 96%          | 56%

Table 8 Single point energy calculations using ultrasoft pseudopotentials and 12 k-points, run on 5 cores, showing the effects of setting 'num_proc_in_smp : 2, 4, 5', both with and without the 'data_distribution : gvector' flag. 'Default' means 'num_proc_in_smp' absent from the .param file. aCalculated automatically by CASTEP.

Optimization strategy
This parameter has three settings and is invoked through the 'opt_strategy' flag in the .param file:
• Default – balances speed and memory use. Wavefunction coefficients for all k-points in a calculation are kept in memory rather than paged to disk; some large work arrays are paged to disk.
• Memory – minimizes memory use. All wavefunctions and large work arrays are paged to disk.
• Speed – maximizes speed by not paging to disk.
This means that if a user runs a large-memory calculation, optimizing for memory could obviate the need to request additional cores, although the calculation will take longer – see Table 9 for comparisons.

opt_strategy                 | Default | Memory | Speed
Memory/process (MB)          | 793     | 750    | 1249
Peak memory use (MB)         | 1566    | 1092   | 1581
Total time (secs)            | 232     | 290    | 221
Overall parallel efficiencya | 94%     | 97%    | 96%

Table 9 Single point energy calculations using ultrasoft pseudopotentials and 12 k-points, run on 5 cores, showing the effects of optimizing for speed or memory. 'Default' means either omitting the 'opt_strategy' flag from the .param file or adding it as 'opt_strategy : default'. aCalculated automatically by CASTEP.
Spin polarization
If a system comprises an odd number of electrons it might be important to differentiate between the spin-up and spin-down states of the odd electron. This directly affects the calculation time, effectively doubling it, as shown in Table 10.

spin_polarization            | false | true
Memory/process (MB)          | 1249  | 1415
Peak memory use (MB)         | 1581  | 1710
Total time (secs)            | 222   | 455
Overall parallel efficiencya | 96%   | 98%

Table 10 Single point energy calculations using ultrasoft pseudopotentials and 12 k-points, run on 5 cores, showing the effects of spin polarization. aCalculated automatically by CASTEP.

Electronic energy minimizer
Insulating systems often behave well during the self-consistent field (SCF) minimizations and converge smoothly using density mixing ('DM'). When SCF convergence is problematic and all attempts to tweak DM-related parameters have failed, it is necessary to turn to ensemble density functional theory7 and accept the consequent (and considerable) increase in computational cost – see Table 11.

metals_method (electron minimization) | DM   | EDFT
Memory/process (MB)                   | 1249 | 1289
Peak memory use (MB)                  | 1581 | 1650
Total time (secs)                     | 222  | 370
Overall parallel efficiencya          | 96%  | 97%

Table 11 Single point energy calculations using ultrasoft pseudopotentials and 12 k-points, run on 5 cores, showing the effects of the electronic minimization method. 'DM' means density mixing and 'EDFT' ensemble density functional theory. aCalculated automatically by CASTEP.

C. Submission script

Figure 5 An example HPC batch submission script

Figure 5 captures the script variables that affect HPC energy consumption and usage efficiency:

(i) The variable familiar to most HPC users describes the number of cores ('tasks') requested for the simulation. Unless the calculation is memory-hungry, configure the requested number of cores to sit on the fewest nodes, because this reduces expensive node-to-node communication time.
(ii) Choosing the shortest job run time gives the calculation a better chance of progressing through the job queue swiftly.
(iii) When not requesting use of all cores on a single node, remove the 'exclusive' flag to accelerate progress through the job queue.
(iv) Using the most recent version of the software captures the latest upgrades and bug-fixes that might otherwise slow down a calculation.
(v) Using the '--dryrun' tag provides a (very) broad estimate of the memory requirements. In one example the estimate of peak memory use was a quarter of that actually used during the simulation proper.

D. An (extreme) example

Clay mineral (Figure 2)                       | Careful optimisation for energy efficiency | Careless – no optimisation for energy efficiency
Vacuum space                                  | 10 Å                | 10 Å
Pseudopotential and cut-off energy (eV)       | Ultrasoft; 370      | OTFG-Ultrasoft; 599
K-points                                      | 3                   | 12
Grid-scale : fine-grid-scale                  | 2:3                 | 3:4
num_proc_in_smp / requested data distribution | default / Gvector   | 20 / none
Actual data distribution                      | 5-way Gvector only  | 3-way Gvector, 12-way kpoint, 3-way (Gvector) smp
Optimization strategy                         | Speed               | Default
Spin polarization                             | False               | True
Electronic energy minimizer                   | Density mixing      | EDFT
Number of cores requested                     | 5                   | 40
RESULTS
Memory/process (MB) / Scratch disk (MB)       | 834 / 0             | 1461 / 6518
Peak memory use (MB)                          | 1066                | 9107
Total time (seconds)                          | 215                 | 45,302
Overall parallel efficiencya                  | 69%                 | 96%
Relative total energy (# cores * total time, core-seconds / core-hours) | 1075 / 0.30 | 1,812,080 / 503.36
kiloJoules used (approx.)                     | 202                 | 52,000

Table 12 One clay mineral model (Figure 2) with vacuum space of 10 Å: single point energy calculations showing the difference between carefully optimizing for energy efficiency and carelessly running without pre-testing. aCalculated automatically by CASTEP.

Table 12 illustrates the combined effects of many of the model properties and parameters discussed in the previous section on the total time and overall use of computational resources.
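The kilojoule figures in Table 12 can be converted into the text's electric-car equivalent (the cheapest 2021 electric car consuming 26.8 kWh per 100 miles, i.e. about 603 kJ/km) with a two-line calculation:

```python
# Convert the Table 12 energy figures into equivalent electric-car
# distance, using the 603 kJ/km figure quoted in the text.
EV_KJ_PER_KM = 603.0

def ev_km(energy_kj):
    return energy_kj / EV_KJ_PER_KM

print(f"careful:  {ev_km(202):.2f} km")     # 0.33 km
print(f"careless: {ev_km(52_000):.0f} km")  # 86 km
```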
It's unlikely a user would choose the whole combination of model properties and parameters shown in the 'careless' column, but it nevertheless gives an idea of the impact a user can have on the energy consumption of their simulations. For comparison, the cheapest electric car listed in 2021 consumes 26.8 kWh per 100 miles, or 603 kJ/km, which means that the carelessly run simulation used the equivalent energy of driving this car about 86 km, whereas the efficiently run simulation 'drove' it 0.33 km. For computational scientists and modellers, applying good energy-efficiency practices needs to become second nature; following an energy-efficiency 'recipe' or procedure is a route to embedding this practice as a habit.

3. Developing energy efficient computing habits: A recipe

1) Build a model of a system that contains only the essential ingredients that allow exploration of the scientific question. This is one of the key factors that determines the size of a model.
2) Find out how many cores per node there are on the available HPC cluster. This enables users to request the number of cores/tasks that minimises inter-node communication during a simulation.
3) Choose the pseudopotentials to match the science. This ensures users don't use pseudopotentials that are unnecessarily computationally expensive.
4) Carry out extensive convergence testing based on the minimum accuracy required for the production-run results, e.g.:
   (i) Kinetic energy cut-off (depends on pseudopotential choice)
   (ii) Grid scale and fine grid scale (depend on pseudopotential choice)
   (iii) Size and orientation of the model, including e.g. number of bulk atoms, number of layers, size of surface, vacuum space etc.
   (iv) Number of k-points
   These decrease the possibility of over-convergence and its associated computational cost.
5) Spend time optimising the .param file properties described in Section B using a small number of SCF cycles:
   a. Data distribution: G-vector, k-points or mixed?
   b.
Number of tasks per node
   c. Optimization strategy
   d. Spin polarization
   e. Electronic energy (SCF) minimization method
   This increases the chances of using resources efficiently by matching the simulation parameters to the model and material requirements.
6) Optimise the script file. This increases the efficient use of HPC resources.
7) Submit the calculation and initially monitor it to check it's progressing as expected. This reduces the chances of wasting computational time through trivial ('Friday afternoon'!) mistakes.
8) Carry out your own energy-efficient computing tests (and send your findings to the JISCMAIL CASTEP mailing list).
9) Sit back and wait for the simulation to complete, basking in the knowledge that the simulation is running as energy efficiently1 as a user can possibly make it!

4. What else can a user do?

In addition to using the above recipe to embed energy-efficient computing habits, a user can take a number of actions to encourage wider awareness and adoption of energy-efficient computing:
a. If the HPC cluster uses SLURM, use the 'sacct' command to check the amount of energy consumed2 (in Joules) by a job – see Figure 6.
b. If your local cluster uses a different job scheduler, ask your local IT helpdesk whether it has the facility to monitor the energy consumed by each HPC job.
c. Include the energy consumption of simulations in all forms of reports and presentations, e.g. in/formal talks, posters, peer-reviewed journal articles, social media posts etc. This will increase awareness of our role as environmentally aware and conscientious computational scientists and users of HPC resources.

1 It's highly probable that users can expand on the list of model properties and parameters described within this document to further optimise energy efficient computing.
2 'Note: Only in case of exclusive job allocation this value reflects the jobs' real energy consumption' – see https://slurm.schedmd.com/sacct.html

Figure 6 Examples of information about jobs output through SLURM's 'sacct' command (plus flags). Top: list of details about several jobs run from 20/03/2021; bottom: details for a specific job ID via the 'seff <jobID>' command.

d. Include estimates of the energy consumption of simulations in applications for funding. Although not yet explicitly requested in EPSRC funding applications, there is the expectation that UKRI's 2020 commitment to Environmental Sustainability will filter down to all activities of its research councils, including funding. This will mean that funding applicants will need to demonstrate their awareness of the environmental impact of their proposed work. Become an impressive pioneer and include environmental impact through energy consumption in your next application.

5. What are the developers doing?

The compilation of this document included a chat with several of the developers of CASTEP, who are keen to help users run their software energy efficiently; they shared their plans and projects in this field:
• Parts of CASTEP have been programmed to run on GPUs, with up to a 15-fold speed-up (for non-local functionals).
• Work on a CASTEP simulator is underway that should reduce the number of CASTEP calculations required per simulation by choosing an optimal parallel domain decomposition and implementing timings for FFTs – the big parallel cost; it will also estimate compute usage. This simulator will go a long way towards providing the structure needed to add energy efficiency to CASTEP and will be accessible through the '--dryrun' command. The toy code is available on Bitbucket.
• The developers recognise the need for energy consumption to be acknowledged as an additional factor to be included in the cost of computational simulations.
They are also planning their approach beyond the software itself, such as including energy-efficient computing in their training courses.

Acknowledgements
I acknowledge the support of the Supercomputing Wales project, which is part-funded by the European Regional Development Fund (ERDF) via the Welsh Government. Thank you to the following CASTEP developers for their invaluable input and support for this small project: Dr. Phil Hasnip and Prof. Matt Probert (University of York); Prof. Chris Pickard (University of Cambridge); Dr. Dominik Jochym (STFC); Prof. Stewart Clark (University of Durham). Thanks also to Dr. Sue Thorne (STFC) and Dr. Ed Bennett (Supercomputing Wales) for sharing their research engineering perspectives.

References
(1) Clark, S. J.; Segall, M. D.; Pickard, C. J.; Hasnip, P. J.; Probert, M. I. J.; Refson, K.; Payne, M. C. First Principles Methods Using CASTEP. Z. Kristallogr. 2005, 220, 567–570.
(2) Vanderbilt, D. Soft Self-Consistent Pseudopotentials in a Generalized Eigenvalue Formalism. Phys. Rev. B 1990, 41, 7892–7895.
(3) Pickard, C. J. On-the-Fly Pseudopotential Generation in CASTEP. 2006.
(4) Refson, K.; Clark, S. J.; Tulip, P. Variational Density Functional Perturbation Theory for Dielectrics and Lattice Dynamics. Phys. Rev. B 2006, 73, 155114.
(5) Hamann, D. R.; Schlüter, M.; Chiang, C. Norm-Conserving Pseudopotentials. Phys. Rev. Lett. 1979, 43 (20), 1494–1497.
(6) BIOVIA, Dassault Systèmes. Materials Studio 2020; Dassault Systèmes: San Diego, 2019.
(7) Marzari, N.; Vanderbilt, D.; Payne, M. C. Ensemble Density Functional Theory for Ab Initio Molecular Dynamics of Metals and Finite-Temperature Insulators. Phys. Rev. Lett. 1997, 79, 1337–1340.