ARCHER2 is an HPE Cray EX supercomputing system with a total of 5,860 compute nodes. Each compute node has 128 cores (dual AMD EPYC 7742 64-core, 2.25 GHz processors), giving a total of 750,080 cores. Compute nodes are connected by an HPE Slingshot interconnect.
There are additional User Access Nodes (UANs, also called login nodes), which provide access to the system, and data-analysis nodes, which are well-suited for preparation of job inputs and analysis of job outputs.
Compute nodes are only accessible via the Slurm job scheduling system.
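As an illustration, a minimal batch script might look like the sketch below. The budget code is a placeholder, and the partition and QoS names are assumptions rather than confirmed values; the ARCHER2 scheduler documentation gives the real ones.

```bash
#!/bin/bash
# Minimal sketch of a batch job. <budget-code> is a placeholder, and the
# partition/QoS names are assumptions, not confirmed ARCHER2 values.
#SBATCH --job-name=example
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=128   # one task per physical core
#SBATCH --time=00:10:00
#SBATCH --account=<budget-code>
#SBATCH --partition=standard
#SBATCH --qos=standard

# srun launches the program on the compute nodes allocated by Slurm
srun ./my_program
```

The script is submitted from a login node with `sbatch`, and runs once the scheduler allocates the requested nodes.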
There are two storage types: home and work. Home is available on login nodes and data-analysis nodes only. Work is available on login, data-analysis, and compute nodes (see I/O and file systems).
This layout is shown in the ARCHER2 architecture diagram.
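In practice, because home is not mounted on the compute nodes, batch jobs must read and write their data on work. A minimal sketch of staging a job, with an illustrative (not guaranteed) path layout:

```bash
# Home is not mounted on the compute nodes, so jobs should run from work.
# The path layout below is illustrative, not the exact ARCHER2 layout.
cd /work/<project>/<project>/<username>
cp ~/inputs/config.nml .    # stage input from home while on a login node
sbatch myjob.slurm          # the job itself only sees the work file system
```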
The home file system is provided by dual NetApp FAS8200A systems (one primary and one disaster recovery) with a capacity of 1 PB each.
The work file system consists of four separate HPE Cray ClusterStor L300 storage systems, each with a capacity of 3.6 PB.
The system also includes 1.1 PB of burst-buffer NVMe storage, provided by an HPE Cray E1000F. The NVMe storage is currently in preparation and is planned to be made available to users in Spring 2022.
Compute node details
The compute nodes each have 128 cores: they are dual-socket nodes with two 64-core AMD EPYC 7742 processors.
Note: due to Simultaneous Multi-Threading (SMT), each core has two hardware threads, so a node has 128 cores / 256 threads. Most users will not want to use SMT; see Launching parallel jobs.
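For example, with Slurm you can place one process per physical core and leave the second hardware thread on each core idle using the standard `--hint=nomultithread` option:

```bash
# Launch 128 MPI ranks on one node, one per physical core, leaving the
# second SMT thread on each core unused.
srun --nodes=1 --ntasks-per-node=128 --cpus-per-task=1 \
     --hint=nomultithread ./my_program
```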
| Component | Details |
| --- | --- |
| Processor | 2x AMD Zen 2 (Rome) EPYC 7742, 64-core, 2.25 GHz |
| Cores per node | 128 |
| NUMA structure | 8 NUMA regions per node (16 cores per NUMA region) |
| Memory per node | 256 GB (standard), 512 GB (high memory) |
| Memory per core | 2 GB (standard), 4 GB (high memory) |
| L1 cache | 32 kB/core |
| L2 cache | 512 kB/core |
| L3 cache | 16 MB/4 cores |
| Network connection | 2x 100 Gb/s injection ports per node |
The 5,276 standard nodes each have 256 GB of memory and the 584 high-memory nodes each have 512 GB. All memory is 8-channel DDR4-3200 with a peak bandwidth of 204.8 GB/s per socket.
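This layout can be verified from inside a job with standard Linux tools; on a standard node one would expect 2 sockets and 8 NUMA regions of 16 cores (32 hardware threads) each:

```bash
# Inspect the topology from inside a job on a compute node.
lscpu | grep -E 'Socket|NUMA|Core|Thread'
numactl --hardware   # CPU list and memory size per NUMA region,
                     # if numactl is installed on the node
```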
ARCHER2 has an HPE Slingshot interconnect with 200 Gb/s signalling. It uses a dragonfly topology in which nodes are organized into groups:

- 128 nodes in a group.
- Electrical links between Network Interface Cards (NICs) and switches.
- 16 switches per group.
- 2 NICs per node.
- All-to-all connection amongst switches in a group using electrical links.
- All-to-all connection between groups using optical links.
- 2 groups per ARCHER2 cabinet.
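Taken together, these figures give a rough sense of the fabric's scale. The sketch below assumes uniform, fully populated 128-node groups, which the real machine need not match exactly:

```bash
# Back-of-envelope fabric scale from the figures above, assuming
# uniform fully populated 128-node groups (an approximation).
nodes=5860
groups=$(( (nodes + 127) / 128 ))        # ceil(5860/128) = 46 groups
echo "groups:   $groups"
echo "switches: $(( groups * 16 ))"      # 16 switches per group = 736
echo "cabinets: $(( (groups + 1) / 2 ))" # 2 groups per cabinet = 23
```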