TEQSA PROVIDER ID: PRV12002 (AUSTRALIAN UNIVERSITY) | CRICOS PROVIDER CODE: 00120C
LINUX WORKSHOP
05 RSB Computer Cluster
By Jiajia Li (ANU Biological Data Science Institute)
06/03/2025
Learning Objectives
• Learn the configuration of RSB computer cluster
• Learn the data storage policy of RSB cluster
• Learn the job scheduling system
• Write and run the variant calling pipeline
RSB Computer Cluster
The RSB computer cluster consists of 4 servers: 2 CPU servers and 2 GPU servers.
The servers work together, and their jobs are controlled and scheduled by SLURM.
The specs of the 4 servers:
Accessing RSB servers
To access an RSB server, we need to:
1. Connect to the GlobalProtect VPN
2. Connect to the server using the `ssh` command
3. Log in with your UID as the account name and your ANU password
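The login step can be sketched as follows. The UID `u1234567` and the host name `rsb-server.example` are hypothetical placeholders, not real RSB values; use your own UID and the server address provided for the workshop. The snippet only assembles and prints the command so it can be checked before you run it with GlobalProtect connected.

```shell
# Hypothetical placeholders: replace with your own UID and the real server address.
ANU_UID="u1234567"
RSB_HOST="rsb-server.example"

# ssh expects the target in the form account@host.
LOGIN_TARGET="${ANU_UID}@${RSB_HOST}"

# Print the command to run; ssh will then prompt for your ANU password.
echo "ssh ${LOGIN_TARGET}"
```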
Welcome message
After you log in, the welcome message also shows the current usage of the server.
Data storage locations on RSB server
• Home directory: /mnt/data/(server)/home/UID, 100GB per user
• Groups directory: /mnt/data/(server)/home/groups, 500GB per group
• Projects directory: /mnt/data/(server)/home/projects, 250GB per project
Scratch space: /mnt/data/(server)/home/scratch/…
• No storage limit
• Data is not backed up
• Files are deleted after 130 days
• Use it for temporary files
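To keep an eye on these quotas and on the scratch clean-up, something like the sketch below may help. It uses only the standard `du` and `find` tools. The 100-day threshold is an arbitrary illustration, and whether the cluster's deletion policy is based on modification time is an assumption; `list_old_files` is a hypothetical helper name.

```shell
# Directory to inspect; defaults to the current directory.
# On the cluster this would be your own home or scratch path.
TARGET_DIR="${TARGET_DIR:-.}"

# Total size of the directory, to compare against the quota.
du -sh "$TARGET_DIR"

# List regular files under a directory that were last modified
# more than a given number of days ago.
# $1: directory to scan, $2: age threshold in days
list_old_files() {
    find "$1" -type f -mtime +"$2"
}

# Example: files not modified for more than 100 days (illustrative threshold).
list_old_files "$TARGET_DIR" 100
```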
Job Scheduling System - SLURM
A job scheduling system, also called a workload management system or cluster management
system, is software designed to efficiently allocate and manage computing resources in a
distributed computing environment.
These systems are commonly used by high-performance computing clusters, data centres,
and other large-scale computing infrastructures.
Their primary purpose is to optimise the utilisation of available resources while ensuring fair
access to those resources for multiple users.
Job Scheduling System - SLURM
The RSB cluster uses SLURM, which is an open-source project.
The NCI’s supercomputer Gadi uses PBS Professional. It has a similar syntax to SLURM, and
you can quickly learn PBS Pro after you learn SLURM.
SBATCH script
To submit a job to SLURM, you need to write an SBATCH script, which includes an SBATCH
header with several settings.
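A minimal sketch of such a script, assuming a small single-task job. The `#SBATCH` directives are standard SLURM options, but the job name and the resource values are illustrative assumptions, not RSB recommendations; check the cluster's own limits before choosing them.

```shell
#!/bin/bash
#SBATCH --job-name=example_job        # name shown in the queue (illustrative)
#SBATCH --output=example_job_%j.out   # standard output file; %j becomes the job ID
#SBATCH --error=example_job_%j.err    # standard error file
#SBATCH --ntasks=1                    # run a single task
#SBATCH --cpus-per-task=2             # CPU cores for that task (illustrative)
#SBATCH --mem=4G                      # memory for the job (illustrative)
#SBATCH --time=01:00:00               # wall-clock limit, HH:MM:SS (illustrative)

# SLURM sets these variables inside a job; the fallbacks let the
# script also run outside the scheduler for a quick check.
THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Job ${SLURM_JOB_ID:-not-under-slurm} running on $(uname -n)"
echo "CPU cores available to this task: ${THREADS}"
```

The header lines start with `#`, so the shell treats them as comments; only SLURM reads them when the script is submitted.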
SBATCH script
On the cluster, you have to specify every directory and file with an absolute path, starting from the root directory /.
To use a conda environment, write this in your SBATCH script:
The second path is where conda installs the packages for our environment.
You can `cd ~/.conda/envs` and run `ls` to see what’s inside.
Avoid using the `cd` command in the SBATCH script, as it sometimes doesn’t work.
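One way to follow this advice, sketched under assumptions: the environment name `variant-calling` is illustrative, and this is not necessarily the snippet shown in the workshop. It puts the environment's `bin` directory at the front of `PATH` and builds absolute paths from `$HOME` instead of calling `cd`.

```shell
# Illustrative environment name; replace with the name of your own environment.
ENV_NAME="variant-calling"

# Absolute path to the environment's programs, assuming environments
# live under ~/.conda/envs as described above.
ENV_BIN="${HOME}/.conda/envs/${ENV_NAME}/bin"

# Put the environment first on PATH so its tools are found before system ones.
export PATH="${ENV_BIN}:${PATH}"

# Build absolute paths once and reuse them, rather than changing directory.
PROJECT_DIR="${HOME}/variant-calling"
RESULTS_DIR="${PROJECT_DIR}/results"

echo "Tools are looked up first in: ${ENV_BIN}"
echo "Results will be written to: ${RESULTS_DIR}"
```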
Submit a job
Let’s save our SBATCH script to “job.sh”. The SBATCH script is also a shell script.
To submit a job, we run `sbatch job.sh`.
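After submitting, it is often useful to check the job in the queue. The sketch below wraps the two steps in a hypothetical helper named `submit_and_show`; `sbatch --parsable` and `squeue -j` are standard SLURM commands, while the helper itself is only an illustration.

```shell
# Submit a script and show its entry in the queue.
# $1: path to the SBATCH script
submit_and_show() {
    script="$1"

    # sbatch only exists on machines with SLURM installed.
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not found: run this on the cluster" >&2
        return 1
    fi

    # --parsable makes sbatch print just the job ID,
    # optionally followed by ";cluster-name".
    job_id=$(sbatch --parsable "$script") || return 1
    job_id="${job_id%%;*}"
    echo "Submitted job ${job_id}"

    # Show the state of this job only.
    squeue -j "$job_id"

    # To cancel the job later: scancel "$job_id"
}
```

On the cluster, `submit_and_show job.sh` would submit the script and print its queue entry.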
Practise
Please set up the variant calling Conda environment on the cluster.
Install all the required packages.
Modify your previous shell script and submit it as a SLURM job.
Download files from Server to Local
You need to run this command on your local device, not the server.
Let’s download the Final Variants files.
Copy from Remote to Local:
scp [email protected]:~/variant-calling/results/*_final_variants.vcf ~/variant-calling
Copy from Local to Remote:
scp ~/variant-calling/NexteraPE-PE.fa [email protected]:~/variant-calling
THANK YOU
Contact Us
Jiajia Li
ANU Biological Data Science Institute
RN Robertson Building, 46 Sullivan’s Creek Rd
Canberra ACT 2600
E [email protected]
W https://bdsi.anu.edu.au/