This section covers how to monitor the energy use of your jobs on ARCHER2 and how to control the CPU frequency, which gives you some control over how much energy your jobs consume.
The default CPU frequency cap on ARCHER2 compute nodes for jobs launched using srun is currently 2.0 GHz. The information below describes how to control the CPU frequency cap using Slurm.
Monitoring energy use
The Slurm accounting database stores the total energy consumed by a job. You can also directly access the counters on compute nodes, which capture instantaneous power and energy data broken down by hardware component.
Using sacct to get energy usage for individual jobs
Energy usage for a particular job may be obtained using the sacct command. For instance:
sacct -j 2658300 --format=JobID,Elapsed,ReqCPUFreq,ConsumedEnergy
will provide the elapsed time and consumed energy in joules for the job(s) specified with the -j option.
The output of this command is:
JobID           Elapsed ReqCPUFreq ConsumedEnergy
------------ ---------- ---------- --------------
2658300        02:19:48    Unknown          4.58M
2658300.bat+   02:19:48          0          4.58M
2658300.ext+   02:19:48          0          4.58M
2658300.0      02:19:09    Unknown          4.57M
In this case we can see that the job consumed 4.58 MJ for a run lasting 2 hours, 19 minutes and 48 seconds with the CPU frequency unset. To convert the energy to kWh, multiply the energy in joules by 2.78e-7 (1 kWh = 3.6e6 J); in this case the result is 1.27 kWh.
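The conversion above can be scripted. The following is a minimal sketch, not part of any ARCHER2 tooling: energy_to_kwh is a hypothetical helper name, and it assumes the value is either a plain number of joules or carries a K, M or G suffix (only the M suffix appears in the output above; the others are an assumption).

```shell
# Hypothetical helper: convert an energy value in joules to kWh.
# Assumption: the value is a plain number of joules or ends in K, M or G.
energy_to_kwh() {
    # $1: energy value, e.g. "4.58M" or "4580000"
    awk -v e="$1" 'BEGIN {
        mult = 1
        suffix = substr(e, length(e), 1)
        if (suffix == "K") mult = 1e3
        else if (suffix == "M") mult = 1e6
        else if (suffix == "G") mult = 1e9
        # Strip the suffix before doing arithmetic on the value
        if (mult != 1) e = substr(e, 1, length(e) - 1)
        # 1 kWh = 3.6e6 J
        printf "%.2f\n", (e * mult) / 3.6e6
    }'
}

# Usage:
#   energy_to_kwh 4.58M    # prints 1.27
```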
The Slurm database may be cleaned without notice, so you should gather any data you want as soon as possible after the job completes. You can even add the sacct command to the end of your job script to ensure this data is captured.
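One way to do this from inside a job script is sketched below. It is an illustration rather than a recommended ARCHER2 recipe: record_job_energy and the default output file name are hypothetical, while SLURM_JOB_ID is the job ID variable that Slurm sets inside a running job.

```shell
# Hypothetical helper: save the accounting record for the current job to a file.
record_job_energy() {
    # $1: output file (assumed default: energy_<jobid>.txt in the current directory)
    outfile="${1:-energy_${SLURM_JOB_ID}.txt}"
    sacct -j "${SLURM_JOB_ID}" \
          --format=JobID,Elapsed,ReqCPUFreq,ConsumedEnergy > "${outfile}"
}

# Usage, as the last line of a job script:
#   record_job_energy
```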
In addition to energy statistics, sacct provides a number of other statistics that can be requested through the --format option. The full list of fields can be viewed with sacct --helpformat or in the sacct man page.
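When only one field is needed in a script, sacct can be asked for unpadded output without headings. The sketch below assumes you want the energy figure for the job as a whole rather than for each step; job_consumed_energy is a hypothetical helper name.

```shell
# Hypothetical helper: print only the ConsumedEnergy value for a job.
job_consumed_energy() {
    # $1: Slurm job ID
    # --allocations : report the job itself, not its individual steps
    # --noheader    : drop the column headings
    # --parsable2   : "|"-delimited output without padding
    sacct --jobs="$1" --allocations --noheader --parsable2 \
          --format=ConsumedEnergy
}

# Usage:
#   job_consumed_energy 2658300
```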
Accessing the node energy/power counters
The counters are available on each compute node and record data only for that compute node. If you are running multi-node jobs, you will need to combine data from multiple nodes to get data for the whole job.
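The combining step can be as simple as a sum. The sketch below assumes you have already collected one energy figure per node into lines of the form "<node name> <joules>"; that layout, the node names in the usage example, and the helper name sum_node_energy are all hypothetical.

```shell
# Hypothetical helper: total the per-node energy values for a job.
# Reads lines of the form "<node name> <joules>" on standard input.
sum_node_energy() {
    awk 'NF >= 2 { total += $2 } END { printf "%.0f\n", total }'
}

# Usage (node names and values are made up for illustration):
#   printf 'nodeA 100\nnodeB 250\n' | sum_node_energy
```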
On compute nodes, the raw energy counters and instantaneous power draw data are available at:
There are a number of files in this directory. All of the counter files include the current value and a timestamp.
- power - Point-in-time power (Watts).
- energy - Accumulated energy (Joules).
- cpu_power - Point-in-time power (Watts) used by the CPU domain.
- cpu_energy - The total energy (Joules) used by the CPU domain.
- cpu*_temp - Temperature reading (Celsius) of the CPU domain - one file per CPU socket.
- memory_power - Point-in-time power (Watts) used by the memory domain.
- memory_energy - The total energy (Joules) used by the memory domain.
- generation - A counter that increments each time a power cap value is changed.
- startup - Startup counter.
- freshness - Free-running counter that increments at a rate of approximately 10 Hz.
- version - Version number for power management counter support.
- power_cap - Current power cap limit in Watts; 0 indicates no capping.
- raw_scan_hz - The power management scanning rate for all data in pm_counters.
The descriptions above are taken from the official HPE documentation.
The power and energy counters include all on-node systems. The major components are the CPU (processor), memory and Slingshot network interface controller (NIC).
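Because the energy counter is cumulative, the energy used on a node while a command runs is the difference between two readings. The following is a rough sketch under stated assumptions: the counters directory is passed in as an argument rather than hard-coded, the value is assumed to be the first whitespace-separated field of the counter file, and read_counter and energy_used_by are hypothetical helper names. The result covers only the node the function runs on.

```shell
# Hypothetical helper: print the current value of one counter.
# Assumption: the value is the first whitespace-separated field in the file.
read_counter() {
    # $1: counters directory, $2: counter file name (e.g. energy)
    awk '{ print $1; exit }' "$1/$2"
}

# Hypothetical helper: print the energy (J) accumulated on this node
# while a command runs. Any output from the command itself is also printed.
energy_used_by() (
    # $1: counters directory; remaining arguments: the command to run
    dir="$1"; shift
    start="$(read_counter "$dir" energy)"
    "$@"
    end="$(read_counter "$dir" energy)"
    echo $((end - start))
)

# Usage (the directory is whatever location holds the counter files):
#   energy_used_by <counters directory> ./my_app
```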
There is an MPI-based wrapper library that can gather the pm counter values at runtime via a simple set of function calls. See the link below for details.
Controlling CPU frequency
You can request specific CPU frequency caps (in kHz) for compute nodes through srun options or environment variables.
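The available frequency caps on the ARCHER2 processors, along with the corresponding srun options and environment variable settings, are:
| Frequency | srun option | Slurm environment variable | Turbo boost enabled? |
|---|---|---|---|
| 2.25 GHz | --cpu-freq=2250000 | export SLURM_CPU_FREQ_REQ=2250000 | Yes |
| 2.00 GHz | --cpu-freq=2000000 | export SLURM_CPU_FREQ_REQ=2000000 | No |
| 1.50 GHz | --cpu-freq=1500000 | export SLURM_CPU_FREQ_REQ=1500000 | No |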
The only frequency caps available on the processors on ARCHER2 are 1.5 GHz, 2.0 GHz and 2.25 GHz with turbo boost.
Setting the CPU frequency cap in this way sets the maximum frequency that the processors can use. In practice, the individual cores may select different frequencies up to the value you have set depending on the workload on the processor.
When you select the highest frequency value (2.25 GHz), you also enable turbo boost and so the processor is free to set the CPU frequency to values above 2.25 GHz if possible within the power and thermal limits of the processor. We see that, with turbo boost enabled, the processors typically boost to around 2.8 GHz even when performing compute-intensive work.
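Since only three caps are available, a job script can check a requested value before passing it on. The sketch below is one way to do that; is_valid_cpu_freq is a hypothetical helper name, and the accepted values are the three caps above expressed in kHz.

```shell
# Hypothetical helper: succeed only for the available frequency caps (in kHz).
is_valid_cpu_freq() {
    case "$1" in
        1500000|2000000|2250000) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage in a job script:
#   freq=2250000
#   is_valid_cpu_freq "$freq" || { echo "Unsupported CPU frequency: $freq" >&2; exit 1; }
```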
For example, to set the CPU frequency cap to 2.25 GHz (and also enable turbo boost), add the following option to the srun commands in your job submission scripts:
srun --cpu-freq=2250000 ...usual srun options and arguments...
Alternatively, you could add the following line to your job submission script before you use srun to launch the application:
export SLURM_CPU_FREQ_REQ=2250000
Testing by the ARCHER2 CSE team has shown that most software is most energy efficient when 2.0 GHz is selected as the CPU frequency.
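To check how your own application behaves, you could launch it once at each available cap within a single job and then compare the ConsumedEnergy reported by sacct for each job step. This is only a sketch: run_at_each_freq is a hypothetical helper name and ./my_app is a placeholder for your executable and its usual srun options.

```shell
# Hypothetical helper: launch the same command once per available frequency cap.
run_at_each_freq() {
    # Arguments: the usual srun options and arguments for your application
    for freq in 1500000 2000000 2250000; do
        echo "Running with --cpu-freq=${freq}"
        srun --cpu-freq="${freq}" "$@"
    done
}

# Usage in a job script (./my_app is a placeholder):
#   run_at_each_freq ./my_app
```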
The CPU frequency settings only affect applications launched using the srun command.
Priority of frequency settings:
- The default SLURM_CPU_FREQ_REQ setting set by the ARCHER2 service applies if no other mechanism is used to set the CPU frequency.
- Setting the SLURM_CPU_FREQ_REQ environment variable in a job script overrides the default environment variable setting for any subsequent srun commands in the job script.
- Adding the --cpu-freq=<freq in kHz> option to the srun launch command itself overrides all other options.
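The precedence rules above can be modelled in a few lines. This is purely an illustration of the ordering, not something Slurm or ARCHER2 provides; effective_cpu_freq is a hypothetical name, and an empty argument stands for "not set".

```shell
# Hypothetical model of the precedence rules; not a Slurm or ARCHER2 tool.
effective_cpu_freq() {
    # $1: value given to srun --cpu-freq (empty if the option is not used)
    # $2: SLURM_CPU_FREQ_REQ exported in the job script (empty if not set)
    # $3: default setting for the service
    if [ -n "$1" ]; then
        echo "$1"      # srun option overrides everything else
    elif [ -n "$2" ]; then
        echo "$2"      # job-script environment variable overrides the default
    else
        echo "$3"      # fall back to the service default
    fi
}

# Usage:
#   effective_cpu_freq "" 1500000 2000000    # prints 1500000
```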
Adding the --cpu-freq=<freq in kHz> option to sbatch (e.g. using #SBATCH --cpu-freq=<freq in kHz> in your job script) will not change the CPU frequency of srun commands used in the job, because the default setting for ARCHER2 will override the sbatch option when the script runs.
Default CPU frequency
If you do not specify a CPU frequency then you will get the default setting for the ARCHER2 service when you launch an application using srun.
The table below lists the history of default CPU frequency settings on the ARCHER2 service:
| Date range | Default CPU frequency |
|---|---|
| 12 Dec 2022 - current date | 2.0 GHz |
| Nov 2021 - 11 Dec 2022 | Unspecified - defaults to 2.25 GHz |
Slurm CPU frequency settings for centrally-installed software
Most centrally installed research software (available via module load commands) uses the same default Slurm CPU frequency as set globally for all ARCHER2 users (see above for this value). However, the performance of a small number of software packages is significantly degraded by lower frequency settings, so the modules for these packages reset the CPU frequency to the highest value (2.25 GHz). The packages that currently do this are:
If you specify the Slurm CPU frequency in your job scripts using one of the mechanisms described above after you have loaded the module, you will override the setting from the module.