# Cluster Resources

This page summarizes the available hardware and resource limits.
## Hardware overview
### Compute nodes (12x)
Each of the 12 compute nodes (node01–node12) is identical:
| Resource | Spec |
|---|---|
| CPU | AMD EPYC 9655P (Zen 5, Turin) |
| Cores | 96 |
| Threads per core | 2 |
| Base clock | 2.6 GHz |
| L3 cache | 384 MB |
| RAM | 1152 GB DDR5-5600 ECC (24x 48GB, i.e., 12GB per core, 6GB per thread) |
### GPU node (1x)
The GPU node (gnode01) has two CPUs and two GPUs:
| Resource | Spec |
|---|---|
| GPUs | 2x NVIDIA H200 (141 GB HBM3e each, PCIe) |
| CPUs | 2x AMD EPYC 9555 (Zen 5, Turin) |
| Cores | 128 (2x 64) |
| Threads per core | 1 (SMT disabled) |
| Base clock | 3.2 GHz |
| L3 cache | 512 MB (2x 256 MB) |
| RAM | 3072 GB DDR5-4800 ECC (48x 64GB, i.e., 24GB per core) |
### Head node
| Resource | Spec |
|---|---|
| CPU | AMD EPYC 7443 (Zen 3, Milan) |
| Cores | 24 |
| Base clock | 2.85 GHz |
| L3 cache | 128 MB |
| RAM | 1024 GB DDR4-3200 ECC |
The head node is for login, package installation, and job submission. Do not run computations here.
## Partitions
```sh
sinfo  # View current partition status
```

| Partition | Nodes | Default | Max time |
|---|---|---|---|
| compute | 12 (node01–node12) | Yes | 20 days |
| gpu | 1 (gnode01) | No | 20 days |
## Quality of Service (QoS)
| QoS | Max time | Limits | Priority |
|---|---|---|---|
| interactive | 3 days | 2 jobs, 192 CPUs | highest |
| short | 1 hour | – | high |
| medium | 1 day | – | medium |
| long | 7 days | – | low |
| extended | 20 days | 1 job | lowest |
| normal | (partition default) | – | baseline |
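A batch job selects a partition and QoS in its `#SBATCH` header. A minimal sketch (the program name and resource numbers are illustrative, not a recommendation):

```shell
#!/bin/bash
#SBATCH --partition=compute   # one of: compute, gpu
#SBATCH --qos=medium          # pick a QoS whose max time covers your job
#SBATCH --time=0-12:00:00     # must stay within the QoS limit (here: 1 day)
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G

srun ./my_analysis            # placeholder for your actual program
```

If you omit `--qos`, the job runs under the `normal` QoS with the partition's default limits.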
The `interactive` QoS is applied automatically to `salloc` sessions.
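For example, an interactive session can be requested like this (core, memory, and time values are illustrative); the `interactive` QoS is applied without specifying it:

```shell
# Request an interactive shell on a compute node
salloc --partition=compute --cpus-per-task=2 --mem=8G --time=04:00:00
```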
## Storage
| Path | Type | Purpose |
|---|---|---|
| /srv/home/<user> | NFS (NVMe RAID5) | Home directory, scripts, active project data |
| /mnt/sas | NFS (HDD array) | Long-term storage, archives; separate directories for users, working groups, and projects |
### Keep /srv/home lean

/srv/home is kept on fast NVMe storage with limited capacity. Move inactive projects and large datasets you don’t actively need to /mnt/sas to keep space available for everyone. /mnt/sas is slower, but has substantially more capacity.
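As a sketch, an inactive project can be copied to archive storage with `rsync` and removed from the home directory once the copy is verified (the directory names are placeholders):

```shell
# Copy an inactive project to your personal archive space
rsync -a ~/old-project/ /mnt/sas/users/<username>/old-project/

# Verify the copy, then remove the original from /srv/home
diff -r ~/old-project /mnt/sas/users/<username>/old-project \
  && rm -rf ~/old-project
```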
### Archive storage (/mnt/sas)
The /mnt/sas directory is intended for large datasets and archival storage – data that you don’t need immediate access to but want to keep available on the cluster.
| Path | Purpose |
|---|---|
| /mnt/sas/users/<username> | Your personal archive space |
| /mnt/sas/groups/<group> | Shared data for your research group |
| /mnt/sas/projects/<project> | Project-specific shared storage (restricted access) |
| /mnt/sas/scratch | Temporary workspace (may be cleaned periodically) |
### Project storage (/mnt/sas/projects)
Some research projects have dedicated shared storage under /mnt/sas/projects/<project>. Access to project directories is restricted – only members of the project group can read or write files there, and no other users can access the data.
- Project membership is granted by the responsible PI and enforced by the cluster administrator.
- Access may be time-limited and will automatically expire after the agreed-upon date.
- If you need access to a project directory, contact the PI responsible for the project.
- If you believe your access has expired in error, contact the cluster administrator.
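Since project access is enforced through Unix group membership, you can check from the shell which groups your account currently belongs to:

```shell
# List the groups your account is a member of
id -nG

# Inspect the owner, group, and permissions of a project directory
# (replace <project> with the actual project name):
# ls -ld /mnt/sas/projects/<project>
```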
### Local scratch disk (/localdisk)
Each node has a fast local SSD mounted at /localdisk. This is not shared across nodes — each node has its own independent disk.
| Node | Disk | Usable space |
|---|---|---|
| Compute nodes | 2x 480 GB SATA SSD (RAID-1) | ~417 GB |
| GPU node | 2x 1.92 TB NVMe SSD (RAID-1) | ~1.8 TB |
When you run a Slurm job, a per-job scratch directory is automatically created and cleaned up afterwards. Two environment variables point to it:
- `TMPDIR` — set to `/localdisk/slurm-<jobid>`
- `LOCALDISK_DIR` — same path
Use these for temporary files that benefit from fast local I/O (e.g. intermediate results, caches, temporary databases). Data written here is automatically deleted when your job ends.
```r
# tempdir() automatically uses the job scratch dir (follows TMPDIR)
tempdir()
#> "/localdisk/slurm-12345/RtmpXyz"

# For explicit access to the scratch directory
scratch <- Sys.getenv("LOCALDISK_DIR")
```

```sh
# In shell scripts
echo "$TMPDIR"        # /localdisk/slurm-12345
echo "$LOCALDISK_DIR" # /localdisk/slurm-12345
```

```python
# In Python
import os, tempfile

tempfile.gettempdir()        # /localdisk/slurm-12345
os.environ["LOCALDISK_DIR"]  # /localdisk/slurm-12345
```

Do not write directly to /localdisk/ outside of your job’s scratch directory — there is no automatic cleanup for files outside TMPDIR.
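A common pattern is to stage input onto the local disk, compute there, and copy results back to shared storage before the job ends. A sketch of such a job script (paths and the program name are placeholders):

```shell
#!/bin/bash
#SBATCH --partition=compute
#SBATCH --time=02:00:00

# Stage input onto the fast local disk
cp ~/project/input.dat "$TMPDIR/"

# Compute against local storage for fast I/O
cd "$TMPDIR"
./my_program input.dat > results.dat   # placeholder program

# Copy results back to shared storage before the job ends;
# everything left in $TMPDIR is deleted automatically
cp results.dat ~/project/
```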
The cluster storage is not backed up. Deleted data cannot be restored. This applies to all storage paths (/srv/home, /mnt/sas, /srv/data). The cluster is meant for active computation, not as a primary archive.
## Useful commands
```sh
# Cluster status
sinfo

# Your running jobs
squeue --me

# Your past jobs
sacct --starttime=today

# Detailed job info
scontrol show job <jobid>

# QoS limits
sacctmgr show qos format=name,maxwall,maxjobspu

# Node details
scontrol show node <nodename>
```