VASP
The Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modelling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles.
VASP computes an approximate solution to the many-body Schrödinger equation, either within density functional theory (DFT), solving the Kohn-Sham equations, or within the Hartree-Fock (HF) approximation, solving the Roothaan equations. Hybrid functionals that mix the Hartree-Fock approach with density functional theory are implemented as well. Furthermore, Green's functions methods (GW quasiparticles, and ACFDT-RPA) and many-body perturbation theory (2nd-order Møller-Plesset) are available in VASP.
In VASP, central quantities, like the one-electron orbitals, the electronic charge density, and the local potential are expressed in plane wave basis sets. The interactions between the electrons and ions are described using norm-conserving or ultrasoft pseudopotentials, or the projector-augmented-wave method.
To determine the electronic ground state, VASP makes use of efficient iterative matrix diagonalisation techniques, like the residual minimisation method with direct inversion of the iterative subspace (RMM-DIIS) or blocked Davidson algorithms. These are coupled to highly efficient Broyden and Pulay density mixing schemes to speed up the self-consistency cycle.
Useful Links
Using VASP on ARCHER2
VASP is only available to users who have a valid VASP licence.
If you have a VASP 5 or 6 licence and wish to have access to VASP on ARCHER2, please make a request via the SAFE, see:
Please have your licence details to hand.
Note
Both VASP 5 and VASP 6 are available on ARCHER2. You generally need a different licence for each of these versions.
Running parallel VASP jobs
To access VASP you should load the appropriate vasp module in your job submission scripts.
VASP 5
To load the default version of VASP 5, you would use:
module load vasp/5
Once loaded, the executables are called:
- vasp_std - Multiple k-point version
- vasp_gam - GAMMA-point only version
- vasp_ncl - Non-collinear version
Once the module has been loaded, you can access the LDA and PBE pseudopotentials for VASP on ARCHER2 at:
$VASP_PSPOT_DIR
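To build the POTCAR file for a calculation, you concatenate the POTCAR files for each species in the same order as the species appear in your POSCAR. A minimal sketch follows, assuming a potpaw_PBE-style directory layout; the exact layout under $VASP_PSPOT_DIR is an assumption here, so list the directory first and adjust the paths:

# Inspect the pseudopotentials provided with the module
ls $VASP_PSPOT_DIR
# Hypothetical paths: Ti then O PBE POTCARs for a TiO2 calculation
cat $VASP_PSPOT_DIR/potpaw_PBE/Ti/POTCAR \
    $VASP_PSPOT_DIR/potpaw_PBE/O/POTCAR > POTCAR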
VASP Transition State Tools (VTST)
As well as the standard VASP 5 modules, we provide versions of VASP 5 with the VASP Transition State Tools (VTST) from the University of Texas added. The VTST version adds various functionality to VASP and provides additional scripts to use with VASP. Additional functionality includes:
- Climbing Image NEB: method for finding reaction pathways between two stable states.
- Dimer: method for finding reaction pathways when only one state is known.
- Lanczos: provides an alternative way to find the lowest mode and find saddle points.
- Optimisers: a set of additional force-based optimisers for structural relaxation and for use in the saddle-point search methods.
- Dynamical Matrix: uses finite difference to find normal modes and reaction prefactors.
Full details of these methods and the provided scripts can be found on the VTST website.
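As an illustration of how the VTST additions are typically used, the sketch below sets up a climbing-image NEB calculation. The nebmake.pl script and the INCAR tags shown follow the VTST documentation rather than anything ARCHER2-specific, and the values are illustrative only.

# Interpolate 3 intermediate images between two relaxed endpoint geometries;
# nebmake.pl (from the VTST scripts) creates directories 00..04, each with a POSCAR
nebmake.pl POSCAR_initial POSCAR_final 3

# Minimal NEB-related INCAR tags (illustrative values)
cat >> INCAR << EOF
IMAGES = 3        ! number of intermediate images
SPRING = -5       ! NEB spring constant
LCLIMB = .TRUE.   ! enable the climbing image (VTST)
EOF

The total number of MPI ranks in the job should be divisible by IMAGES so that each image gets an equal share of the processes.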
On ARCHER2, the VTST version of VASP 5 can be accessed by loading the modules with VTST in the module name, for example:
module load vasp/5/5.4.4.pl2-vtst
Example VASP 5 job submission script
The following script will run a VASP job using 16 nodes (16x128, 2048 cores in total).
#!/bin/bash
# Request 16 nodes (2048 MPI tasks at 128 tasks per node) for 20 minutes.
#SBATCH --job-name=VASP_test
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=128
#SBATCH --cpus-per-task=1
#SBATCH --time=00:20:00
# Replace [budget code] below with your project code (e.g. t01)
#SBATCH --account=[budget code]
#SBATCH --partition=standard
#SBATCH --qos=standard
# Load the VASP module
module load vasp/5
# Avoid any unintentional OpenMP threading by setting OMP_NUM_THREADS
export OMP_NUM_THREADS=1
# Launch the code.
srun --distribution=block:block --hint=nomultithread vasp_std
VASP 6
To load the default version of VASP 6, you would use:
module load vasp/6
Once loaded, the executables are called:
- vasp_std - Multiple k-point version
- vasp_gam - GAMMA-point only version
- vasp_ncl - Non-collinear version
Once the module has been loaded, you can access the LDA and PBE pseudopotentials for VASP on ARCHER2 at:
$VASP_PSPOT_DIR
The following script will run a VASP job using 16 nodes (16x128, 2048 cores in total) using only MPI ranks and no OpenMP threading.
Tip
VASP 6 can make use of OpenMP threads in addition to running with pure MPI. A sketch of a hybrid MPI+OpenMP launch is given after the example script below; see also the notes on OpenMP threads in the performance tips later on this page.
#!/bin/bash
# Request 16 nodes (2048 MPI tasks at 128 tasks per node) for 20 minutes.
#SBATCH --job-name=VASP_test
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=128
#SBATCH --cpus-per-task=1
#SBATCH --time=00:20:00
# Replace [budget code] below with your project code (e.g. t01)
#SBATCH --account=[budget code]
#SBATCH --partition=standard
#SBATCH --qos=standard
# Load the VASP module
module load vasp/6
# Avoid any unintentional OpenMP threading by setting OMP_NUM_THREADS
export OMP_NUM_THREADS=1
# Launch the code.
srun --distribution=block:block --hint=nomultithread vasp_std
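If you do want to experiment with threading, the snippet below is a minimal sketch of how the hybrid MPI+OpenMP settings might look, assuming 16 MPI ranks and 8 OpenMP threads per rank on each 128-core node; the rank/thread split is illustrative only, and the remaining #SBATCH options are as in the script above.

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=8

export OMP_NUM_THREADS=8
export OMP_PLACES=cores

# Pass the cpus-per-task value to srun explicitly so each rank gets 8 cores
srun --cpus-per-task=8 --distribution=block:block --hint=nomultithread vasp_std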
Compiling VASP on ARCHER2
If you wish to compile your own version of VASP on ARCHER2 (either VASP 5 or VASP 6), you can find information on how we compiled the central versions in the build instructions GitHub repository. See:
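The repository contains the full recipes, but as a rough sketch of the environment involved (an assumption based on the central GCC and UCX builds described in the performance section below; check the repository for the authoritative instructions):

# Compiler and MPI environment matching the central GCC + UCX builds (sketch)
module load PrgEnv-gnu
module load craype-network-ucx
module load cray-mpich-ucx
# Then, from the VASP source directory with a suitable makefile.include in place:
# make DEPS=1 std gam ncl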
Tips for using VASP on ARCHER2
Switching MPI transport protocol from UCX to OpenFabrics
The VASP modules are set up to use the UCX MPI transport protocol as testing has shown that this passes all the regression tests and gives the best performance on ARCHER2. However, there may be cases where using UCX gives errors that can be fixed by switching to the OpenFabrics MPI transport protocol.
If you see errors in VASP calculations that implicate UCX (they will typically be MPI errors with the string ucx or UCX in the output), then you can try using OpenFabrics instead by loading additional modules after you have loaded the VASP modules. For example, for VASP 6, you would use:
module load vasp/6
module load craype-network-ofi
module load cray-mpich
Performance tips
The performance of VASP depends on the version of VASP used, the performance of MPI collective operations, the choice of VASP parallelisation parameters (NCORE/NPAR and KPAR) and how many MPI processes per node are used.
VASP version: For the benchmarks studied, VASP 6.3.0 usually gives better performance than the older VASP 5.4.4.pl2. The exception is when using higher node counts (above 8 nodes) for the CdTe benchmark (which uses vasp_ncl). Users should generally use VASP 6.3.0, but it may be worth evaluating the performance of VASP 5.4.4.pl2 for your case if you are running larger calculations, particularly when using the non-collinear version of VASP.
MPI collective performance: To ensure that the MPI collective operations give the best performance, you should ensure that consecutive MPI ranks are pinned to consecutive cores on a node to maximise shared-memory optimisations within NUMA regions. In practice, this means that the recommended srun options --hint=nomultithread and --distribution=block:block should always be specified when running VASP on ARCHER2. You should also make sure that you use the UCX transport layer for MPI rather than the default OpenFabrics transport layer. The VASP modules are set up to enable this, but if you are using your own compiled version of VASP you should add the following lines to your job submission script before you run VASP (assuming that you compiled VASP using GCC):
module load PrgEnv-gnu
module load craype-network-ucx
module load cray-mpich-ucx
export UCX_IB_REG_METHODS=direct
KPAR: You should always use the largest value of KPAR that is possible for your calculation within the memory available on the nodes.
NCORE/NPAR: We have found that the optimal values of NCORE (and hence NPAR) depend on both the type of calculation you are performing (e.g. pure DFT, hybrid functional, Γ-point, non-collinear) and the number of nodes/cores you are using for your calculation. In practice, this means that you should experiment with different values to find the best choice for your calculation. There is information below on the best choices for the benchmarks we have run on ARCHER2 that may serve as a useful starting point. The performance difference from choosing different values can vary by up to 100%, so it is worth spending time investigating this.
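As a concrete illustration, the INCAR fragment below shows the parallelisation tags for a run like the CdTe benchmark reported later on this page on 4 nodes (512 MPI ranks); the values are taken from that benchmark table and will not be optimal for other systems.

KPAR = 2    ! k-points split across 2 groups of 256 ranks each
NCORE = 4   ! 4 cores work together on each orbital (NPAR is then set implicitly)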
MPI processes per node: For some of the benchmarks we found it beneficial to use fewer MPI processes per node than the total number of cores per node. For the large TiO2 Γ-point calculation it was best to use just 64 MPI processes per node (leaving half of the cores idle); a sketch of the Slurm settings for this is given below. For the CdTe non-collinear, multiple k-point benchmark, best performance was achieved when all 128 cores on the node were occupied by MPI processes (128 MPI processes per node).
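One way to run underpopulated in this fashion is sketched below for the TiO2-style case of 64 MPI processes per node on 16 nodes; the node count and executable are illustrative, and the remaining #SBATCH options are as in the example scripts above.

#SBATCH --nodes=16
#SBATCH --ntasks-per-node=64
#SBATCH --cpus-per-task=2

export OMP_NUM_THREADS=1
# Give each rank 2 cores so the 64 ranks are spread evenly across the node
srun --cpus-per-task=2 --distribution=block:block --hint=nomultithread vasp_gam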
OpenMP threads: The use of OpenMP threads did not improve performance or scaling for either of the benchmarks used. This was true even for the TiO2 benchmark, where we used only 64 MPI processes per node: performance was better with 64 idle cores on a node than when the spare cores were used for OpenMP threads. This seems to be because, when OpenMP threading is used, NCORE is fixed at a value of 1, which gives poor performance.
VASP performance data on ARCHER2
VASP performance data on ARCHER2 is currently available for two different benchmark systems:
- TiO2 Supercell, pure DFT functional, Γ-point, 1080 atoms
- CdTe Supercell, hybrid DFT functional, 8 k-points, 65 atoms
TiO2 Supercell, pure DFT functional, Γ-point, 1080 atoms
Basic information:
- Uses vasp_gam
- NELM = 10
- Full TiO2 performance data
Performance summary for the best choices of MPI processes per node and NCORE at different node counts. Performance is reported as the timing for LOOP+ in seconds.
Performance summary:
- Best performance from VASP 6.3.0
- Best performance from 64 MPI processes per node - leaves 64 cores idle on each node
- Best performance with NCORE = 64
- Scales well to 16 nodes
- Using OpenMP threads results in worse performance
Full system, VASP 6.3.0
- vasp/6/6.3.0 module
- GCC 11.2.0
- AOCL 3.1 for BLAS/LAPACK/ScaLAPACK and FFTW
- UCX for MPI transport layer
| Nodes | MPI processes per node | Total MPI processes | NCORE | VASP 6.3.0 LOOP+ time (s) |
|---|---|---|---|---|
| 1 | 64 | 64 | 64 | 3295 |
| 2 | 64 | 128 | 64 | 1548 |
| 4 | 64 | 256 | 64 | 814 |
| 8 | 64 | 512 | 64 | 416 |
| 16 | 64 | 1024 | 64 | 221 |
| 32 | 64 | 2048 | 64 | 131 |
| 64 | 64 | 4096 | 64 | 82 |
Full system, 5.4.4.pl2
- vasp/5/5.4.4.pl2 module
- GCC 11.2.0
- HPE Cray LibSci 21.09 for BLAS/LAPACK/ScaLAPACK and FFTW 3.3.8.11
- UCX for MPI transport layer
| Nodes | MPI processes per node | Total MPI processes | NCORE | VASP 5.4.4.pl2 LOOP+ time (s) |
|---|---|---|---|---|
| 1 | 64 | 64 | 64 | 3428 |
| 2 | 64 | 128 | 64 | 1615 |
| 4 | 64 | 256 | 64 | 823 |
| 8 | 64 | 512 | 64 | 429 |
| 16 | 64 | 1024 | 64 | 231 |
| 32 | 64 | 2048 | 64 | 135 |
| 64 | 64 | 4096 | 64 | 79 |
CdTe Supercell, hybrid DFT functional, 8 k-points, 65 atoms
Basic information:
- Uses vasp_ncl
- NELM = 6
- CdTe performance data
Performance summary:
- VASP version:
    - Up to 8 nodes: best performance from VASP 6.3.0
    - 16 nodes or more: best performance from VASP 5.4.4.pl2
- Cores per node:
    - Best performance usually from 128 MPI processes per node - all cores occupied
    - At 64 nodes, best performance from 64 MPI processes per node - 64 cores idle
- NCORE:
    - Up to 8 nodes: best performance with NCORE = 4 (VASP 6.3.0)
    - 16 nodes or more: best performance with NCORE = 16 (VASP 5.4.4.pl2)
- KPAR = 2 is the maximum that can be used on standard memory nodes
- Scales well to 64 nodes
- Using OpenMP threads results in worse performance
Full system, VASP 6.3.0
- vasp/6/6.3.0 module
- GCC 11.2.0
- AOCL 3.1 for BLAS/LAPACK/ScaLAPACK and FFTW
- UCX for MPI transport layer
| Nodes | MPI processes per node | Total MPI processes | NCORE | KPAR | VASP 6.3.0 time (s) |
|---|---|---|---|---|---|
| 1 | 128 | 128 | 4 | 2 | 19000 |
| 2 | 128 | 256 | 4 | 2 | 10021 |
| 4 | 128 | 512 | 4 | 2 | 5560 |
| 8 | 128 | 1024 | 4 | 2 | 3176 |
| 16 | 128 | 2048 | 8 | 2 | 2413 |
| 32 | 64 | 2048 | 16 | 2 | 1340 |
| 64 | 64 | 4096 | 16 | 2 | 908 |
Full system, 5.4.4.pl2
- vasp/5/5.4.4.pl2 module
- GCC 11.2.0
- HPE Cray LibSci 21.09 for BLAS/LAPACK/ScaLAPACK and FFTW 3.3.8
- UCX for MPI transport layer
| Nodes | MPI processes per node | Total MPI processes | NCORE | KPAR | VASP 5.4.4.pl2 time (s) |
|---|---|---|---|---|---|
| 1 | 128 | 128 | 4 | 2 | 23417 |
| 2 | 128 | 256 | 4 | 2 | 12338 |
| 4 | 128 | 512 | 4 | 2 | 6751 |
| 8 | 128 | 1024 | 4 | 2 | 3676 |
| 16 | 64 | 1024 | 16 | 2 | 2136 |
| 32 | 64 | 2048 | 16 | 2 | 1266 |
| 64 | 64 | 4096 | 16 | 2 | 806 |