Quickstart for users¶
The ARCHER2 Service is not yet available. This documentation is in development.
This guide aims to quickly enable new users to get up and running on ARCHER2 by running through the process of getting an ARCHER2 account, logging in and running your first job.
Request an account on ARCHER2¶
You need to use both a password and a passphrase-protected SSH key pair to log into ARCHER2. You get the password from SAFE but will need to set up your own SSH key pair and add the public part to your account via SAFE before you will be able to log in. We cover the authentication steps below.
Obtain an account on the SAFE website¶
The first step is to sign up for an account on the ARCHER2 SAFE website. This account is used to manage your user accounts and report on your usage and quotas. To do this:
- Go to the SAFE New User Signup Form
- Fill in your personal details. You can come back later and change them if you wish
- Click Submit
You are now registered. Your SAFE password will be emailed to the email address you provided. You can then log in with that email address and password.
Request an ARCHER2 login account¶
Once you have a SAFE account you will need to request a user account on ARCHER2 itself. To do this you will require a Project Code; you usually obtain this from the Principal Investigator (PI) or project manager for the project you will be working on. Once you have the Project Code:
- Log into SAFE
- Use the Login accounts - Request new account menu item
- Select the correct project from the drop down list
- Select the ARCHER2 machine in the list of available machines
- Click Next
- Enter a username for the account
- Click Request
The PI or project manager of the project will be asked to approve your request. Once the request has been approved, the account will be created and you will receive an email. You can then return to SAFE and pick up the initial, single-use password for your new account (the system sometimes refers to ARCHER2 account passwords as LDAP passwords).
Generating and adding an SSH key pair¶
How you generate your SSH key pair depends on which operating system you use and which SSH client you use to connect to ARCHER2. We will not cover the details on generating an SSH key pair here, but [detailed information on generating an SSH key pair is available in the ARCHER2 User and Best Practice Guide](https://docs.archer2.ac.uk/user-guide/connecting.html).
Once you have generated your SSH key pair, you should add the public part to your login account using SAFE:
- Log into SAFE
- Use the menu Login accounts and select the ARCHER2 account you want to add the SSH key to
- On the subsequent Login account details page click the Add Credential button
- Select SSH public key as the Credential Type and click Next
- Either copy and paste the public part of your SSH key into the SSH Public key box or use the button to select the public key file on your computer.
- Click Add to associate the public SSH key part with your account
Once you have done this, your SSH key will be added to your ARCHER2 account.
Remember, you will need to use both an SSH key and password to log into ARCHER2 so you will also need to collect your initial password before you can log into ARCHER2. We cover this next.
Collecting your ARCHER2 password¶
You should now collect your ARCHER2 password:
- Log into SAFE
- Use the Login accounts menu to select your new login account
- This will display details of your account. Use the View Login Account Password button to view your single-use ARCHER2 password.
This password is generated randomly by the software. It’s best to copy-and-paste it across when you log in to the service machine. After you log in, you will be prompted to change it. You should enter this password again, and then you will be prompted for your new, easy-to-remember password. Your new password should conform to the ARCHER2 Password Policy.
When you change your password on the service machine in this way, this is not reflected on the SAFE.
Login to ARCHER2¶
To log into ARCHER2 you should use the
You will first be prompted for the passphrase associated with your SSH key pair. Once you have entered your passphrase successfully, you will then be prompted for your password. You need to enter both correctly to be able to access ARCHER2.
If your SSH key pair is not stored in the default location (usually ~/.ssh/id_rsa) on your local system, you may need to specify the path to the private part of the key with the -i option to ssh. For example, if your key is in a file called keys/id_rsa_archer2, you would use the command
ssh -i keys/id_rsa_archer2 email@example.com
to log in.
When you first log into ARCHER2, you will be prompted to change your initial password. This is a three step process:
- When prompted to enter your LDAP password: re-enter the password you retrieved from SAFE
- When prompted to enter your new password: type in a new password
- When prompted to re-enter the new password: re-enter the new password
Your password has now been changed.
More information on connecting to ARCHER2 is available in Connecting to ARCHER2.
File systems and manipulating data¶
ARCHER2 has a number of different file systems and understanding the difference between them is crucial to being able to use the system. In particular, transferring and moving data often requires a bit of thought in advance to ensure that the data is secure and in a useful form.
ARCHER2 file systems are:
- /home: backed up for disaster recovery purposes only; data recovery for accidental deletion is not supported. NFS, available on login and service nodes.
- /work: not backed-up. Lustre, available on login, service and compute nodes.
Top tips for managing data on ARCHER2:
- Do not generate huge (>1000) numbers of files in a single directory
- Much of the variation in data transfer performance is due to the number of files involved. Minimise the number of files you have to transfer by using archiving tools.
- Archive directories or large numbers of files before moving them between file systems (e.g. using tar)
- When using rsync between file systems mounted on ARCHER2, avoid the compression options, as these slow operations down (file system bandwidth is generally high enough that the CPU cost of compression becomes the bottleneck).
- Think about automating the combination and transfer of multiple files output by software on ARCHER2 to other resources. The Data Management Guide linked below provides examples of how to do this and of how to automatically verify the integrity of an archive.
Information on best practice in managing your data is available in the section Data management and transfer.
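The archiving advice above can be sketched as follows. This is a minimal illustration run in a throwaway directory, not an ARCHER2-specific recipe: the names results, out_N.dat, results.tar and results.tar.sha256 are made up for the example, and sha256sum is assumed to be installed, as it is on most Linux systems.

```shell
#!/bin/bash
# Work in a throwaway directory so the sketch does not touch real data
scratch="$(mktemp -d)"
cd "$scratch" || exit 1

# Stand-in for a directory holding many small output files (hypothetical names)
mkdir results
for i in 1 2 3; do
    echo "output $i" > "results/out_$i.dat"
done

# Bundle the directory into a single file before moving it between file systems
tar -cf results.tar results

# Record a checksum next to the archive so it can be verified after the transfer
sha256sum results.tar > results.tar.sha256

# At the destination: verify the checksum and count the data files in the archive
sha256sum -c results.tar.sha256
file_count="$(tar -tf results.tar | grep -c '\.dat$')"
echo "archive contains $file_count data files"
```

Moving one archive plus its checksum file replaces many small-file operations with two, and the checksum gives a simple integrity test at the other end.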
Software on ARCHER2 is principally accessed through environment modules. These
load and unload the desired compilers, tools and libraries through the
module command and its subcommands. Some will be loaded by default on login,
providing a default working environment; many more will be available for use but
initially unloaded, allowing you to set up the environment to suit your needs.
At any stage you can check which modules have been loaded by running:
module list
Running the following command will display all environment modules available on ARCHER2, whether loaded or unloaded:
module avail
The search field for this command may be narrowed by providing the first few characters of the module name being queried. For example, all available versions and variants of VASP may be found by running:
module avail vasp
You will see that different versions are available for many modules. For
example, vasp/5/5.4.4 and vasp/6/6.1.0 are two available versions of
VASP. Furthermore, a default version may be specified and will be used if no
version is provided by the user.
VASP is licensed software, as are some other software packages on ARCHER2. You must have a valid licence to use licensed software on ARCHER2. Often you will need to request access through the SAFE. More on this below.
The module load and module add commands perform the same action, loading
a module for use. Following the above,
module load vasp
would load the default version of VASP, while
module load vasp/5/5.4.4
would specifically load version 5.4.4. A loaded module may be unloaded with the equivalent module remove or module unload commands, so
module unload vasp
would unload whichever version of VASP is currently in the environment. Rather than issuing separate unload and load commands, versions of a module may be swapped as follows:
module swap vasp vasp/5/5.4.4
Other helpful commands are:
- module help <modulename>, which provides a short description of the module
- module show <modulename>, which displays the contents of the modulefile
Points to be aware of include:
- Some modules will conflict with others. A simple example would be the conflict
arising when trying to load a different version of an already loaded module.
When a conflict occurs, the loading process will fail and an error message
will be displayed. Examination of the message and the modulefiles (via
module show) should reveal the cause of the conflict and how to resolve it.
- The order in which modules are loaded can matter. Consider two modules
which set the same variable to a different value. The final value
would be that set by the module which loaded last. If you suspect that two
modules may be interfering with one another, you can examine their contents with module show.
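The point about load order can be illustrated without any real modules. In the sketch below, load_module_a and load_module_b are hypothetical shell functions standing in for two modulefiles that set the same variable; DEMO_LIB_DIR and the paths are invented for the example and do not exist on ARCHER2.

```shell
#!/bin/bash
# Hypothetical stand-ins for two modulefiles that set the same variable.
# Real modules are loaded with "module load"; these functions only mimic the effect.
load_module_a() { export DEMO_LIB_DIR="/opt/demo/a/lib"; }
load_module_b() { export DEMO_LIB_DIR="/opt/demo/b/lib"; }

# Load a, then b: the value set by b survives
load_module_a
load_module_b
after_a_then_b="$DEMO_LIB_DIR"

# Load b, then a: the value set by a survives
load_module_b
load_module_a
after_b_then_a="$DEMO_LIB_DIR"

echo "a then b: $after_a_then_b"
echo "b then a: $after_b_then_a"
```

In both orders the last load wins, which is why comparing the variables each modulefile sets is a useful first check when two modules appear to interfere.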
Requesting access to licensed software¶
Some of the software installed on ARCHER2 requires a user to have a valid licence agreed with the software owners/developers to be able to use it (for example, VASP). Although you will be able to load this software on ARCHER2 you will be barred from actually using it until your licence has been verified.
You request access to licensed software through the EPCC SAFE (the web administration tool you used to apply for your account and retrieve your initial password) by being added to the appropriate Package Group. To request access to licensed software:
- Log in to SAFE
- Go to the Menu Login accounts and select the login account which requires access to the software
- Click New Package Group Request
- Select the software from the list of available packages and click Select Package Group
- Fill in as much information as possible about your licence; at the very least provide the information requested at the top of the screen such as the licence holder’s name and contact details. If you are covered by the licence because the licence holder is your supervisor, for example, please state this.
- Click Submit
Your request will then be processed by the ARCHER2 Service Desk, who will confirm your licence with the software owners/developers before enabling your access to the software on ARCHER2. This can take several days (depending on how quickly the software owners/developers respond) but you will be advised once this has been done.
Create a job submission script¶
To run a program on the ARCHER2 compute nodes you need to write a job submission script that tells the
system how many compute nodes you want to reserve and for how long. You also need to use the srun
command to launch your parallel executable.
For more details on the Slurm scheduler on ARCHER2 and writing job submission scripts see the Running jobs on ARCHER2 section of the User and Best Practice Guide.
Parallel jobs on ARCHER2 should be run from the /work file system as /home is not available on the
compute nodes - you will see a
chdir or file not found error if you try to run a job from the /home file system.
Create a job submission script called
submit.slurm in your space on the work file system using your
favourite text editor. For example, using vim:
    auser@eslogin01:~> cd /work/t01/t01/auser
    auser@eslogin01:/work/t01/t01/auser> vim submit.slurm
You will need to use your project code and username to get to the correct directory. i.e. replace the t01 above with your project code and replace the username auser with your ARCHER2 username.
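As a small sketch of how the directory path is put together, the snippet below builds it from two variables. The values t01 and auser are the example ones used above; the variable names are only illustrative.

```shell
# Example values from the text above; replace them with your own
project="t01"      # your project code
username="auser"   # your ARCHER2 username

# The project code appears twice in the path, followed by the username
workdir="/work/${project}/${project}/${username}"
echo "$workdir"

# On ARCHER2 you would then change into it with: cd "$workdir"
```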
Paste the following text into your job submission script, replacing
ENTER_YOUR_BUDGET_CODE_HERE with your budget code:
    #!/bin/bash --login
    #SBATCH --job-name=test_job
    #SBATCH --nodes=1
    #SBATCH --tasks-per-node=128
    #SBATCH --cpus-per-task=1
    #SBATCH --time=0:5:0
    #SBATCH --account=ENTER_YOUR_BUDGET_CODE_HERE
    # Load the xthi module to get access to the xthi program
    module load xthi
    # srun launches the parallel program based on the SBATCH options
    srun --cpu-bind=cores xthi
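To see how the resource requests in the script combine, the arithmetic can be written out as below. The numbers are the ones requested in the script above (one node, 128 tasks per node, one core per task); the variable names are illustrative and are not Slurm settings.

```shell
# Values requested in the job script above
nodes=1
tasks_per_node=128
cpus_per_task=1

# Total number of parallel tasks that srun will launch
total_tasks=$(( nodes * tasks_per_node ))

# Cores requested on each node
cores_per_node=$(( tasks_per_node * cpus_per_task ))

echo "total tasks: $total_tasks, cores per node: $cores_per_node"
```

With these values the test job should report 128 ranks, which is a quick way to confirm that the options took effect.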
Submit your job to the queue¶
You submit your job to the queues using the sbatch command:
    auser@eslogin01:/work/t01/t01/auser> sbatch submit.slurm
    Submitted batch job 23996
The value returned is your *Job ID*.
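If you want to reuse the Job ID in later commands, one possible approach is to capture the text printed by sbatch and keep the last word. The sketch below uses the sample output shown above rather than calling sbatch, so it runs anywhere; the variable names are illustrative.

```shell
# Sample output copied from the example above. On ARCHER2 you would instead use:
#   sbatch_output="$(sbatch submit.slurm)"
sbatch_output="Submitted batch job 23996"

# Keep only the text after the last space, i.e. the Job ID
job_id="${sbatch_output##* }"
echo "Job ID: $job_id"

# The Job ID can then be used to build related names, such as the default output file
output_file="slurm-${job_id}.out"
echo "Output file: $output_file"
```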
Monitoring your job¶
You use the squeue command to examine jobs in the queue. Use:
auser@eslogin01:/work/t01/t01/auser> squeue -u $USER
to list all the jobs you have in the queue. Running squeue on its own lists all jobs in the queue from all users.
Checking the output from the job¶
The job submission script above should write its output to a file named after the Job ID (i.e. if the Job ID was 23996, the file would be slurm-23996.out). You can check the contents of this file with the cat command. If the job was successful you should see output that looks like this:
    auser@eslogin01:/work/t01/t01/auser> cat slurm-23996.out
    Hello from rank 20, thread 0, on nid00001. (core affinity = 20)
    Hello from rank 27, thread 0, on nid00001. (core affinity = 27)
    Hello from rank 23, thread 0, on nid00001. (core affinity = 23)
    Hello from rank 34, thread 0, on nid00001. (core affinity = 34)
    Hello from rank 18, thread 0, on nid00001. (core affinity = 18)
    Hello from rank 33, thread 0, on nid00001. (core affinity = 33)
    Hello from rank 19, thread 0, on nid00001. (core affinity = 19)
    Hello from rank 22, thread 0, on nid00001. (core affinity = 22)
    Hello from rank 6, thread 0, on nid00001. (core affinity = 6)
    Hello from rank 26, thread 0, on nid00001. (core affinity = 26)
    Hello from rank 31, thread 0, on nid00001. (core affinity = 31)
    Hello from rank 21, thread 0, on nid00001. (core affinity = 21)
    Hello from rank 35, thread 0, on nid00001. (core affinity = 35)
    Hello from rank 32, thread 0, on nid00001. (core affinity = 32)
    Hello from rank 28, thread 0, on nid00001. (core affinity = 28)
    Hello from rank 25, thread 0, on nid00001. (core affinity = 25)
    Hello from rank 24, thread 0, on nid00001. (core affinity = 24)
    Hello from rank 30, thread 0, on nid00001. (core affinity = 30)
    Hello from rank 29, thread 0, on nid00001. (core affinity = 29)
    Hello from rank 10, thread 0, on nid00001. (core affinity = 10)
    Hello from rank 2, thread 0, on nid00001. (core affinity = 2)
    Hello from rank 11, thread 0, on nid00001. (core affinity = 11)
    Hello from rank 0, thread 0, on nid00001. (core affinity = 0)
    Hello from rank 1, thread 0, on nid00001. (core affinity = 1)
    Hello from rank 7, thread 0, on nid00001. (core affinity = 7)
    Hello from rank 4, thread 0, on nid00001. (core affinity = 4)
    Hello from rank 3, thread 0, on nid00001. (core affinity = 3)
    Hello from rank 5, thread 0, on nid00001. (core affinity = 5)
    Hello from rank 8, thread 0, on nid00001. (core affinity = 8)
    Hello from rank 9, thread 0, on nid00001. (core affinity = 9)
    Hello from rank 12, thread 0, on nid00001. (core affinity = 12)
    Hello from rank 13, thread 0, on nid00001. (core affinity = 13)
    Hello from rank 14, thread 0, on nid00001. (core affinity = 14)
    Hello from rank 15, thread 0, on nid00001. (core affinity = 15)
    Hello from rank 16, thread 0, on nid00001. (core affinity = 16)
    Hello from rank 17, thread 0, on nid00001. (core affinity = 17)
    ... output trimmed ...
If something has gone wrong, you will find any error messages in the file instead of the expected output.
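Because the ranks report in no particular order, it can help to count and sort the lines rather than read them by eye. The following is a minimal sketch using standard tools on a four-line sample copied from the output above; on ARCHER2 you would point the same commands at your real slurm-23996.out file instead of the temporary sample created here.

```shell
# Build a small sample file from lines shown in the example output
scratch="$(mktemp -d)"
sample="$scratch/slurm-23996.out"
cat > "$sample" <<'EOF'
Hello from rank 20, thread 0, on nid00001. (core affinity = 20)
Hello from rank 6, thread 0, on nid00001. (core affinity = 6)
Hello from rank 0, thread 0, on nid00001. (core affinity = 0)
Hello from rank 1, thread 0, on nid00001. (core affinity = 1)
EOF

# Count how many ranks reported in
rank_count="$(grep -c '^Hello from rank' "$sample")"
echo "ranks reporting: $rank_count"

# Sort numerically on the fourth field (the rank number) and show the first line
first_sorted="$(sort -k4,4n "$sample" | head -n 1)"
echo "$first_sorted"
```

For the full test job, the count should match the number of tasks requested in the job script.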