05 RSB Cluster

The document outlines a workshop on the RSB computer cluster at ANU, detailing its configuration, data storage policies, and job scheduling system using SLURM. Participants will learn to write and run a variant calling pipeline and manage files between the server and local devices. Key instructions include connecting to the server, using SBATCH scripts for job submission, and understanding data storage limitations.
TEQSA PROVIDER ID: PRV12002 (AUSTRALIAN UNIVERSITY) | CRICOS PROVIDER CODE: 00120C

LINUX WORKSHOP
05 RSB Computer Cluster

By Jiajia Li (ANU Biological Data Science Institute)


06/03/2025

Learning Objectives

• Learn the configuration of the RSB computer cluster
• Learn the data storage policy of the RSB cluster
• Learn the job scheduling system
• Write and run the variant calling pipeline

RSB Computer Cluster

The RSB computer cluster consists of four servers: two CPU servers and two GPU servers.
SLURM controls the servers and schedules jobs across them.

[Table: specs of the four servers]

Accessing RSB servers

To access the RSB server, we need to:

1. Connect to the GlobalProtect VPN
2. Connect to the server using the `ssh` command
3. Log in with your UID as the account name and your ANU password
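
As a minimal sketch of step 2, the login command has the shape shown below. Both values are hypothetical placeholders, not the real address of any RSB server: substitute your own UID and the host name you were given.

```shell
# Hypothetical placeholders: replace with your own UID and the server's address.
ANU_UID="u1234567"
RSB_HOST="rsb-server.example"

# ssh expects the target in the form user@host.
SSH_TARGET="${ANU_UID}@${RSB_HOST}"

# Print the command instead of running it, since the host above is not real.
echo "ssh ${SSH_TARGET}"
```

With GlobalProtect connected, running the printed command with real values prompts for your ANU password.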

Welcome message

After you log in, the server shows a welcome message, where you can also see the current usage of the server.

Data storage locations on RSB server

• Home directory: /mnt/data/(server)/home/UID, 100GB per user
• Groups directory: /mnt/data/(server)/home/groups, 500GB per group
• Projects directory: /mnt/data/(server)/home/projects, 250GB per project

Scratch space: /mnt/data/(server)/home/scratch/…

• No storage limit
• Data is not backed up
• Files are deleted after 130 days
• Suitable for temporary files
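
To keep an eye on these limits, the standard `du` and `find` tools are enough. The sketch below is illustrative only: the helper names are made up for this example, and it assumes that file age is judged by modification time, which the policy above does not state.

```shell
# Report the total size of a directory, e.g. to compare with the 100GB home quota.
dir_usage() {
    du -sh "$1"
}

# List regular files under a directory that were last modified more than N days ago.
files_older_than() {
    find "$1" -type f -mtime +"$2"
}

# Example calls (the scratch path is site-specific, so pass your own):
#   dir_usage "$HOME"
#   files_older_than /path/to/your/scratch 120
```

A threshold such as 120 days is an arbitrary safety margin that leaves time to move files before the 130-day deletion.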

Job Scheduling System - SLURM

A job scheduling system, also called a workload management system or cluster management
system, is software that allocates and manages computing resources in a distributed
computing environment.

These systems are commonly used by high-performance computing clusters, data centres,
and other large-scale computing infrastructures.

Their primary purpose is to optimise the utilisation of available resources while ensuring fair
access to those resources for multiple users.

Job Scheduling System - SLURM

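The RSB cluster uses SLURM, which is an open-source project.

The NCI’s supercomputer Gadi uses PBS Professional. Its syntax is similar to SLURM’s, so
you can quickly learn PBS Pro after you learn SLURM. For example, SLURM’s `sbatch`, `squeue`
and `scancel` correspond to PBS Pro’s `qsub`, `qstat` and `qdel`.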

SBATCH script

To submit a job to SLURM, you need to write an SBATCH script, which begins with an SBATCH
header containing several settings.
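
A header of this kind might look like the sketch below. The `#SBATCH` options shown are standard SLURM options, but the job name, resource amounts and log file name are illustrative assumptions rather than values required by the RSB cluster.

```shell
#!/bin/bash
# Name shown in the queue (illustrative).
#SBATCH --job-name=variant-calling
# Number of CPU cores for the job (illustrative).
#SBATCH --cpus-per-task=4
# Memory for the job (illustrative).
#SBATCH --mem=8G
# Wall-time limit in HH:MM:SS (illustrative).
#SBATCH --time=02:00:00
# Log file; %j expands to the job ID.
#SBATCH --output=job_%j.log

# SLURM sets SLURM_CPUS_PER_TASK inside a job; fall back to 1 when run by hand.
THREADS="${SLURM_CPUS_PER_TASK:-1}"

echo "Running on $(hostname) with ${THREADS} thread(s)"
```

To the shell the `#SBATCH` lines are plain comments, so the same file also runs under `bash` for a quick check before it is submitted.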

SBATCH script

On the cluster, you have to give every directory and file as an absolute path, starting from the root `/`.

To use a conda environment, write the activation command in your SBATCH script:

[Slide image: conda activation command]

The second path in that command is where conda installs the packages for the environment.

You can use `cd ~/.conda/envs` to see what’s inside.

Avoid using the `cd` command in the SBATCH script, because it sometimes doesn’t work.
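
One common way to follow these rules is sketched below. It is not recovered from the slide: `CONDA_ROOT` and the environment name `variant-calling` are assumptions, and only the `~/.conda/envs` location comes from the text above. Sourcing conda's `bin/activate` script with an environment path is a usual way to activate an environment in a non-interactive script.

```shell
# Assumed location of the conda installation; adjust to the real one on the cluster.
CONDA_ROOT="${CONDA_ROOT:-/opt/conda}"

# Environment directory under ~/.conda/envs; the environment name is hypothetical.
ENV_DIR="${HOME}/.conda/envs/variant-calling"

# Activate only if both paths exist, so the snippet is harmless on other machines.
if [ -f "${CONDA_ROOT}/bin/activate" ] && [ -d "${ENV_DIR}" ]; then
    source "${CONDA_ROOT}/bin/activate" "${ENV_DIR}"
fi

# Build absolute paths once and reuse them instead of calling cd.
PROJECT_DIR="${HOME}/variant-calling"
RESULTS_DIR="${PROJECT_DIR}/results"
echo "Results will be written to ${RESULTS_DIR}"
```

Because `$HOME` expands to an absolute path, every path built from it starts at `/` and works regardless of the directory the job starts in.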

Submit a job

Let’s save our SBATCH script as `job.sh`. An SBATCH script is also an ordinary shell script.

To submit a job, we run `sbatch job.sh`.
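
When `sbatch` accepts a job it prints a line of the form `Submitted batch job <ID>`, and that ID is what `squeue` and `scancel` expect. A small sketch of capturing it follows; `extract_job_id` is a helper invented for this example, and the guard lets the snippet run harmlessly on a machine without SLURM.

```shell
# Pull the job ID (the fourth word) out of sbatch's confirmation line.
extract_job_id() {
    echo "$1" | awk '{print $4}'
}

if command -v sbatch >/dev/null 2>&1 && [ -f job.sh ]; then
    JOB_ID=$(extract_job_id "$(sbatch job.sh)")
    # Show this job in the queue; cancel it with: scancel "$JOB_ID"
    squeue -j "$JOB_ID"
else
    echo "sbatch or job.sh not found; skipping submission"
fi
```

`squeue -u "$USER"` lists all of your jobs when you do not have a particular ID to hand.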

Practise

Please set up the variant calling Conda environment on the cluster.

Download all needed packages.

Modify your previous shell script and submit it as a SLURM job.

Download files from Server to Local

Run these commands on your local device, not on the server. Replace UID@SERVER with your UID and the server’s address.

Let’s download the final variants files.

Copy from remote to local:

scp 'UID@SERVER:~/variant-calling/results/*_final_variants.vcf' ~/variant-calling

The quotes stop your local shell from expanding `*`, so the pattern is matched on the server.

Copy from local to remote:

scp ~/variant-calling/NexteraPE-PE.fa UID@SERVER:~/variant-calling

THANK YOU
Contact Us

Jiajia Li
ANU Biological Data Science Institute

RN Robertson Building, 46 Sullivan’s Creek Rd
Canberra ACT 2600

E [email protected]
W https://bdsi.anu.edu.au/
