Power Management MPI Library

Each ARCHER2 compute node has a set of Power Management (PM) counters. These provide point-in-time power readings for the whole node and for the CPU and memory domains. The accumulated energy use is recorded at the same level of detail. There are also two temperature counters, one for each socket/processor on the node. The counters are sampled ten times per second and the data are written to a set of files held in node memory (located at /sys/cray/pm_counters/).
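As a rough illustration of what reading one of these files involves, the sketch below parses a single counter file in Python. Only the directory comes from the text above. The file names `power` and `energy`, the `<value> <unit> <timestamp> us` line layout, and the helper names `parse_pm_counter` and `read_pm_counter` are assumptions made for illustration, so check them against the files on a compute node before relying on them.

```python
from pathlib import Path

# Directory named in the text above; the file names and line layout are assumptions.
PM_COUNTER_DIR = Path("/sys/cray/pm_counters")


def parse_pm_counter(text):
    """Parse a counter line assumed to look like '<value> <unit> <timestamp> us'."""
    fields = text.split()
    value = int(fields[0])
    unit = fields[1] if len(fields) > 1 else ""
    # Treat the timestamp as optional in case a counter file does not carry one.
    timestamp_us = int(fields[2]) if len(fields) > 2 else None
    return value, unit, timestamp_us


def read_pm_counter(name, counter_dir=PM_COUNTER_DIR):
    """Read and parse one counter file, e.g. 'power' or 'energy' (assumed names)."""
    return parse_pm_counter((Path(counter_dir) / name).read_text())


if __name__ == "__main__":
    if PM_COUNTER_DIR.is_dir():
        for name in ("power", "energy"):
            print(name, read_pm_counter(name))
    else:
        print("No PM counters found; this only works on a compute node.")
```

Reading the files directly like this only covers the node the script runs on, which is the gap the MPI-based wrapper described next is meant to fill.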

For convenience, we have developed an MPI-based wrapper, called pm_mpi_lib, that facilitates reading the PM counter files. It is available at the link below.

https://github.com/cresta-eu/pm_mpi_lib

The PM MPI Library makes it possible to monitor the Power Management counters during the execution of an MPI code running across multiple compute nodes. The library is lightweight, containing just three functions, and is intended to be straightforward to use. You simply decide which parts of your code you wish to profile for energy usage and/or power consumption.

As your code executes, the PM counters will be read at various points by a single designated monitor rank on each node assigned to the job. These readings are then written to a log file, which, after the job completes, will contain one set of time-stamped readings per node for every call to the pm_mpi_record function made from within your code. The readings can then be aggregated according to preference.
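One possible aggregation is the energy used and mean power per node between the first and last readings. The sketch below assumes the log has already been parsed into `(node, time_s, cumulative_energy_j)` tuples. That tuple layout, the function name `energy_per_node`, and the node names in the usage example are hypothetical and do not reflect the library's actual log format, which is described in the readme.

```python
from collections import defaultdict


def energy_per_node(readings):
    """Return {node: (energy_used_j, mean_power_w)} from cumulative energy readings.

    `readings` is an iterable of (node, time_s, cumulative_energy_j) tuples;
    this layout is a hypothetical stand-in for parsed log records.
    """
    by_node = defaultdict(list)
    for node, time_s, energy_j in readings:
        by_node[node].append((time_s, energy_j))

    result = {}
    for node, samples in by_node.items():
        samples.sort()  # order by time so first/last readings bracket the region
        (t0, e0), (t1, e1) = samples[0], samples[-1]
        elapsed = t1 - t0
        used = e1 - e0
        # Mean power is energy divided by elapsed time; guard against a single reading.
        mean_power = used / elapsed if elapsed > 0 else 0.0
        result[node] = (used, mean_power)
    return result


# Usage with made-up readings for two hypothetical nodes.
example = [
    ("nid001", 0.0, 1000),
    ("nid001", 10.0, 4000),
    ("nid002", 0.0, 500),
    ("nid002", 10.0, 3000),
]
per_node = energy_per_node(example)
total_energy_j = sum(used for used, _ in per_node.values())
```

Summing the per-node values gives the energy for the whole job over the profiled region. Other aggregations, such as the peak power per node, would follow the same grouping pattern.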

Further information, along with test harnesses and example scripts, can be found in the PM MPI Library readme file.