VASP
The Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modelling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles.
VASP computes an approximate solution to the many-body Schrödinger equation, either within density functional theory (DFT), solving the Kohn-Sham equations, or within the Hartree-Fock (HF) approximation, solving the Roothaan equations. Hybrid functionals that mix the Hartree-Fock approach with density functional theory are implemented as well. Furthermore, Green's functions methods (GW quasiparticles, and ACFDT-RPA) and many-body perturbation theory (2nd-order Møller-Plesset) are available in VASP.
In VASP, central quantities, like the one-electron orbitals, the electronic charge density, and the local potential are expressed in plane wave basis sets. The interactions between the electrons and ions are described using norm-conserving or ultrasoft pseudopotentials, or the projector-augmented-wave method.
To determine the electronic ground state, VASP makes use of efficient iterative matrix diagonalisation techniques, like the residual minimisation method with direct inversion of the iterative subspace (RMM-DIIS) or blocked Davidson algorithms. These are coupled to highly efficient Broyden and Pulay density mixing schemes to speed up the self-consistency cycle.
Useful Links
Using VASP on ARCHER2
VASP is only available to users who have a valid VASP licence.
If you have a VASP 5 or 6 licence and wish to have access to VASP on ARCHER2, please make a request via the SAFE, see:
Please have your licence details to hand.
Note
Both VASP 5 and VASP 6 are available on ARCHER2. You generally need a different licence for each of these versions.
Running parallel VASP jobs
To access VASP you should load the appropriate vasp module in your job submission scripts.
To load the default version of VASP, you would use:
module load vasp
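Several versions of VASP are installed; to see which modules are currently available (the exact list depends on the software stack at the time), you can run:

module avail vasp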
Tip
VASP 6.4.3 and above have all been compiled to include Wannier90 functionality. Older versions of VASP on ARCHER2 do not include Wannier90.
Once loaded, the executables are called:
- vasp_std - Multiple k-point version
- vasp_gam - Gamma-point only version (see the example KPOINTS file below)
- vasp_ncl - Non-collinear version
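For reference, vasp_gam can only be used for calculations that sample the Γ point alone; it is generally faster and uses less memory than vasp_std for such calculations. A minimal Γ-only KPOINTS file (standard VASP input format; the comment line is arbitrary) looks like:

Gamma-point only
0
Gamma
1 1 1
0 0 0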
Once the module has been loaded, you can access the LDA and PBE pseudopotentials for VASP on ARCHER2 at:
$VASP_PSPOT_DIR
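For example, a POTCAR for a TiO2 calculation could be assembled by concatenating the element files from the PBE library (the potpaw_PBE directory name and the plain Ti pseudopotential, rather than a variant such as Ti_sv, are assumptions to check against your own setup; the element order must match the POSCAR):

cat $VASP_PSPOT_DIR/potpaw_PBE/Ti/POTCAR $VASP_PSPOT_DIR/potpaw_PBE/O/POTCAR > POTCAR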
Tip
VASP 6 can make use of OpenMP threads in addition to running with pure MPI. We will add notes on performance and use of threading in VASP as information becomes available.
Example VASP submission script
#!/bin/bash
# Request 16 nodes (2048 MPI tasks at 128 tasks per node) for 20 minutes.
#SBATCH --job-name=VASP_test
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=128
#SBATCH --cpus-per-task=1
#SBATCH --time=00:20:00
# Replace [budget code] below with your project code (e.g. t01)
#SBATCH --account=[budget code]
#SBATCH --partition=standard
#SBATCH --qos=standard
# Load the VASP module
module load vasp/6
# Avoid any unintentional OpenMP threading by setting OMP_NUM_THREADS
export OMP_NUM_THREADS=1
# Ensure the cpus-per-task option is propagated to srun commands
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
# Launch the code - the distribution and hint options are important for performance
srun --distribution=block:block --hint=nomultithread vasp_std
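You would then submit the job in the usual way with sbatch (the script name here is just an example):

sbatch vasp_job.slurm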
VASP Transition State Tools (VTST)
As well as the standard VASP modules, we provide versions of VASP with the VASP Transition State Tools (VTST) from the University of Texas added. The VTST version adds various functionality to VASP and provides additional scripts to use with VASP. Additional functionality includes:
- Climbing Image NEB: method for finding reaction pathways between two stable states.
- Dimer: method for finding reaction pathways when only one state is known.
- Lanczos: provides an alternative way to find the lowest mode and find saddle points.
- Optimisers: additional force-based geometry optimisers (for example LBFGS and FIRE) for use with the methods above.
- Dynamical Matrix: uses finite difference to find normal modes and reaction prefactors.
Full details of these methods and the provided scripts can be found on the VTST website.
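As a rough illustration (the values below are illustrative, not recommendations; see the VTST documentation for the full set of tags), a climbing-image NEB calculation with the VTST build typically adds INCAR tags along these lines:

IMAGES = 4        ! number of intermediate NEB images
LCLIMB = .TRUE.   ! climbing-image NEB (VTST)
SPRING = -5       ! spring constant between images (VTST default)
ICHAIN = 0        ! VTST method selector: 0 = NEB
IOPT   = 1        ! VTST optimiser (1 = LBFGS)
IBRION = 3        ! required when using the VTST optimisers
POTIM  = 0        ! disable the built-in VASP ionic step

The image geometries go in numbered subdirectories (00, 01, ...) and the total number of MPI processes should be divisible by IMAGES.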
On ARCHER2, the VTST versions of VASP can be accessed by loading the modules with vtst in the module name, for example:
module load vasp/6/6.4.1-vtst
Compiling VASP on ARCHER2
If you wish to compile your own version of VASP on ARCHER2 (either VASP 5 or VASP 6) you can find information on how we compiled the central versions in the build instructions GitHub repository. See:
Tips for using VASP on ARCHER2
Switching MPI transport protocol from OpenFabrics to UCX
The VASP modules are set up to use the OpenFabrics MPI transport protocol as testing has shown that this passes all the regression tests and gives the most reliable operation on ARCHER2. However, there may be cases where using UCX can give better performance than OpenFabrics.
If you want to try the UCX transport protocol, you can do this by loading additional modules after you have loaded the VASP modules. For example, for VASP 6, you would use:
module load vasp/6
module load craype-network-ucx
module load cray-mpich-ucx
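In a job submission script, the two UCX module load lines would be added immediately after the module load vasp/6 line (and before the srun command) in the example script above.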
Increasing the CPU frequency and enabling turbo-boost
The default CPU frequency is currently set to 2 GHz on ARCHER2. While many VASP calculations are memory or MPI bound, some calculations can be CPU bound. For those cases, you may see a significant difference in performance by increasing the CPU frequency and enabling turbo-boost (though you will almost certainly also be less energy efficient).
You can do this by adding the line:
export SLURM_CPU_FREQ_REQ=2250000
in your job submission script before the srun command.
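For example, the end of the submission script above would then look like this (the value requests the 2.25 GHz frequency, which also enables turbo-boost):

export SLURM_CPU_FREQ_REQ=2250000
srun --distribution=block:block --hint=nomultithread vasp_std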
Performance tips
The performance of VASP depends on the version of VASP used, the performance of MPI collective operations, the choice of VASP parallelisation parameters (NCORE/NPAR and KPAR) and how many MPI processes per node are used.
KPAR: You should always use the maximum value of KPAR that is possible for your calculation within the available memory limits.
NCORE/NPAR: We have found that the optimal values of NCORE (and hence NPAR) depend on both the type of calculation you are performing (e.g. pure DFT, hybrid functional, Γ-point, non-collinear) and the number of nodes/cores you are using for your calculation. In practice, this means that you should experiment with different values to find the best choice for your calculation. There is information below on the best choices for the benchmarks we have run on ARCHER2 that may serve as a useful starting point. The performance difference from choosing different values can vary by up to 100%, so it is worth spending time investigating this.
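As a concrete starting point (values taken from the CdTe benchmark below and therefore only an assumption for other systems), a large pure-MPI run might set the following in the INCAR:

NCORE = 16   ! cores working together on each orbital; ideally a divisor of the MPI processes per node
KPAR = 2     ! k-point parallelism; the total number of MPI processes must be divisible by KPAR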
MPI processes per node: For the benchmarks used, we found that using fewer MPI processes per node than the total number of cores per node was sometimes beneficial to performance.
OpenMP threads: Using multiple OpenMP threads per MPI process can be beneficial to performance. In the tests we have performed, 4 OpenMP threads per MPI process typically gives the best performance.
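A hybrid MPI/OpenMP run is requested by changing the Slurm options and environment in the submission script. The sketch below (the node count and the 32 MPI x 4 OpenMP layout are assumptions taken from the benchmark tables that follow) shows the lines that differ from the pure-MPI example above:

#SBATCH --nodes=4
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=4

module load vasp/6

# 4 OpenMP threads per MPI process, pinned to cores
export OMP_NUM_THREADS=4
export OMP_PLACES=cores

# Ensure the cpus-per-task option is propagated to srun commands
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK

srun --distribution=block:block --hint=nomultithread vasp_std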
VASP performance data on ARCHER2
VASP performance data on ARCHER2 is currently available for two different benchmark systems:
- TiO_2 Supercell, pure DFT functional, Γ-point, 1080 atoms (results added soon)
- CdTe Supercell, hybrid DFT functional, 8 k-points, 65 atoms
CdTe Supercell, hybrid DFT functional, 8 k-points, 65 atoms
Basic information:
- Uses vasp_ncl
- NELM = 6
Performance summary:
- VASP version: VASP 6.4.2 with MKL 19.5 gives best performance (vasp/6/6.4.2-mkl19 modules)
- Cores per node: best performance usually from 128 MPI processes per node (all cores occupied)
- NCORE:
  - Up to 8 nodes: best performance with 4 OpenMP threads (NCORE fixed at 1)
  - 16 nodes or more: best performance with NCORE = 16
- KPAR = 2 is the maximum that can be used on standard memory nodes
- Scales well to 8 nodes, OK to 16 nodes
- Using 4 OpenMP threads per MPI process usually gives best performance
Setup details:
- vasp/6/6.4.2-mkl19 module
- GCC 11.2.0
- MKL 19.5 for BLAS/LAPACK/ScaLAPACK and FFTW
- OFI for MPI transport layer
| Nodes | MPI processes per node | OpenMP threads per MPI process | Total cores | NCORE | KPAR | LOOP+ time (s) |
|---|---|---|---|---|---|---|
| 1 | 32 | 4 | 128 | 1 | 2 | 5838 |
| 2 | 32 | 4 | 256 | 1 | 2 | 3115 |
| 4 | 32 | 4 | 512 | 1 | 2 | 1682 |
| 8 | 32 | 4 | 1024 | 1 | 2 | 928 |
| 16 | 128 | 1 | 2048 | 16 | 2 | 612 |
| 32 | 128 | 1 | 4096 | 16 | 2 | 459 |
| 64 | 128 | 1 | 8192 | 16 | 2 | 629 |
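For reference, relative to the 1-node run these LOOP+ times correspond to speed-ups of roughly 6.3x on 8 nodes, 9.5x on 16 nodes and 12.7x on 32 nodes, while the 64-node run is slower than the 32-node run.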