Using {batchtools}
Submitting Slurm jobs from R
{batchtools} is an R package for submitting and managing Slurm jobs directly from R. It handles sbatch submission, log collection, result retrieval, and job status tracking – so you work in R instead of writing shell scripts.
The cluster ships with a default configuration and a custom Slurm template that work out of the box. You can use them as-is or override them per project.
Quick start
```r
library(batchtools)

# Create a registry (stores jobs and results in a directory)
reg <- makeRegistry(file.dir = "my_registry", seed = 1)

# Define jobs -- here, a simple function applied to different inputs
batchMap(function(x) {
  Sys.sleep(2)
  x^2
}, x = 1:10)

# Submit all jobs to Slurm (uses cluster defaults)
submitJobs()

# Check status
getStatus()

# Wait for completion, then collect results
waitForJobs()
reduceResultsList()
```

That’s it. Behind the scenes, {batchtools} generates sbatch scripts from the cluster template, submits them to Slurm, and tracks everything in the registry directory.
Cluster defaults
The cluster provides default configuration files at /etc/xdg/batchtools/:
| File | Purpose |
|---|---|
| `config.R` | Sets the Slurm cluster functions, default resources, and job limits |
| `slurm_bips.tmpl` | Custom Slurm template with resource validation and QoS handling |
{batchtools} automatically picks up /etc/xdg/batchtools/config.R as a fallback when no project-level or user-level config is found. You don’t need to do anything to use it.
Default resource values
The cluster defaults are:
| Resource | Default | Meaning |
|---|---|---|
| `ncpus` | 1 | CPU cores per task |
| `memory` | 1024 | MB per CPU (`--mem-per-cpu`) |
| `hours` | 6 | Walltime (6 hours) |
| `qos` | `"medium"` | QoS level (1 day limit) |
| `partition` | `"compute"` | Slurm partition |
| `measure.memory` | `TRUE` | Track peak memory usage (rough heuristic, will likely underestimate) |
The maximum number of concurrent jobs (max.concurrent.jobs) is set to 500 as a somewhat conservative default.
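If 500 concurrent jobs is too restrictive (or too generous) for a project, the cap can be raised or lowered in a project- or user-level config file. A minimal sketch, using the standard {batchtools} `max.concurrent.jobs` config option:

```r
# batchtools.conf.R -- project-level override of the concurrency cap
# (max.concurrent.jobs is a standard {batchtools} configuration option)
max.concurrent.jobs <- 100
```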
Overriding defaults per submission
Pass a resources list to submitJobs() to override any default:
```r
submitJobs(resources = list(
  ncpus = 4,
  memory = 4096,
  hours = 48,
  qos = "long"
))
```

Only the resources you specify are overridden – everything else keeps the default value.
Resources reference
The Slurm template supports the following resources. Set them via submitJobs(resources = list(...)) or as default.resources in your config.
CPU and memory
| Resource | Type | Default | Description |
|---|---|---|---|
| `ncpus` | integer | 1 | Number of CPU cores per task. Use this for multicore parallelism (parallel, future). |
| `ntasks` | integer | 1 | Number of MPI tasks. Only set > 1 for MPI parallelism (Rmpi, pbdMPI). Not yet implemented. |
| `memory` | integer | 1024 | Memory in MB per CPU (`--mem-per-cpu`). Must be at least 100. |
ncpus vs ntasks
Most R users want ncpus for parallel computing (e.g., mclapply(mc.cores = 4) or future::plan(multicore, workers = 4)). Setting ntasks > 1 is for MPI and will trigger a warning if used.
Time specification
Specify walltime using exactly one of these options:
| Resource | Type | Description |
|---|---|---|
| `walltime` | integer | Walltime in seconds |
| `days` | integer | Walltime in days |
| `hours` | integer | Walltime in hours |
| `minutes` | integer | Walltime in minutes |
```r
submitJobs(resources = list(hours = 6))    # 6 hours
submitJobs(resources = list(days = 3))     # 3 days
submitJobs(resources = list(minutes = 30)) # 30 minutes
```

If you don’t specify any time, the walltime defaults to the QoS limit (1 day for the default `"medium"` QoS).
Specifying multiple time units (e.g., both hours and days) will cause an error. This is intentional – it prevents accidental time accumulation when your default.resources (e.g., hours = 6) gets merged with per-job resources (e.g., days = 1), which would otherwise silently result in 1 day + 6 hours.
If you have a time setting in your default.resources, either:
- Always use the same unit when submitting jobs, or
- Remove the time setting from defaults and specify it per-job
hours / days over walltime
hours = 6 is clearer than walltime = 21600. The template handles the conversion.
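If you ever need to cross-check a conversion yourself, it is plain seconds-per-unit arithmetic. A small illustrative helper (not part of the template; it accepts exactly one unit, matching the rule above):

```r
# Convert one walltime unit to seconds (the template accepts exactly one unit)
to_seconds <- function(value, unit = c("minutes", "hours", "days")) {
  unit <- match.arg(unit)
  value * switch(unit, minutes = 60, hours = 3600, days = 86400)
}

to_seconds(6, "hours") # 21600 -- equivalent to walltime = 21600
```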
QoS (Quality of Service)
| Resource | Type | Default | Description |
|---|---|---|---|
| `qos` | character | `"medium"` | QoS level. See Slurm Basics – QoS for available levels and their time limits. |
In most cases you don’t need to set qos explicitly. The template handles it automatically:
- Only walltime set (recommended): The template picks the smallest QoS that fits your walltime. For example, `hours = 3` auto-selects `"medium"`, `days = 3` auto-selects `"long"`.
- Only QoS set (or neither): The walltime is set conservatively to the QoS limit. With the default `"medium"`, that means 1 day.
- Both set: The template validates that the walltime fits within the QoS limit and errors if it doesn’t.
```r
# Template auto-selects qos = "long" (7 days covers 3 days)
submitJobs(resources = list(days = 3))

# Template auto-selects qos = "short" (1 hour covers 30 minutes)
submitJobs(resources = list(minutes = 30))
```

The `"interactive"` QoS is not allowed for batch jobs and will be rejected.
GPU
| Resource | Type | Default | Description |
|---|---|---|---|
| `gpus` | integer | — | Number of GPUs (1–2). |
```r
submitJobs(resources = list(gpus = 1, partition = "gpu"))
```

Node selection
| Resource | Type | Default | Description |
|---|---|---|---|
| `partition` | character | `"compute"` | Slurm partition (`"compute"` or `"gpu"`). |
| `nodelist` | character | — | Run on specific node(s), e.g. `"node01"` or `"node[11-12]"`. |
| `exclude` | character | — | Exclude node(s), e.g. `"node01"`. |
Email notifications
| Resource | Type | Default | Description |
|---|---|---|---|
| `mail_type` | character | — | When to notify: `"BEGIN"`, `"END"`, `"FAIL"`, `"ALL"`, or a comma-separated combination. |
| `mail_user` | character | — | Email address for notifications. |
```r
submitJobs(resources = list(
  mail_type = "END,FAIL",
  mail_user = "name@leibniz-bips.de"
))
```

Deadline
| Resource | Type | Default | Description |
|---|---|---|---|
| `deadline` | character | — | Slurm removes the job if it can’t finish by this time. Format: `"YYYY-MM-DD"` or `"YYYY-MM-DD HH:MM"` (Europe/Berlin timezone). |
```r
# Job must finish by Friday end of day
submitJobs(resources = list(deadline = "2026-02-06 18:00"))
```

Priority
| Resource | Type | Default | Description |
|---|---|---|---|
| `nice` | integer | — | Priority adjustment (-10000 to 10000). Higher values = lower priority. Negative values require elevated privileges. |
R version
| Resource | Type | Default | Description |
|---|---|---|---|
| `r_version` | character | (auto) | R version to load on the compute node, e.g. `"4.5.2"`. |
By default, the template detects the R version of your submitting session and loads the same version on the compute node via module load R/<version>. You only need to set this if you want a different version than the one you’re currently running.
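A session's version string can be derived from R's built-in `R.version` info; the template's auto-detection presumably does something along these lines (illustrative sketch, not the template's actual code):

```r
# Build a "major.minor.patch" string for the running session, e.g. "4.5.2" --
# the form expected by module load R/<version>
# (R.version$minor already includes the patch level)
current_r_version <- paste(R.version$major, R.version$minor, sep = ".")
```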
```r
# Force a specific R version on the compute node
submitJobs(resources = list(r_version = "4.4.3"))
```

Threading control
| Resource | Type | Default | Description |
|---|---|---|---|
| `omp.threads` | integer | — | Sets `OMP_NUM_THREADS` on the compute node. |
| `blas.threads` | integer | — | Sets `OPENBLAS_NUM_THREADS` on the compute node. |
Other
| Resource | Type | Default | Description |
|---|---|---|---|
| `comment` | character | — | Annotation for the job, visible in `squeue`. Useful for identifying jobs by project. |
Customizing the configuration
{batchtools} looks for configuration files in this order (first found wins):
1. `batchtools.conf.R` in the current working directory (project-level)
2. `~/.batchtools.conf.R` in your home directory (user-level)
3. `/etc/xdg/batchtools/config.R` (cluster default)
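The lookup itself is a plain "first existing file wins" scan. A hedged sketch of the equivalent logic, using the paths from the list above (not {batchtools}' actual implementation):

```r
# Return the first config file that exists, mirroring the search order above;
# NULL means no config was found at any of the candidate locations
find_config <- function(candidates = c(
  "batchtools.conf.R",            # project-level
  "~/.batchtools.conf.R",         # user-level
  "/etc/xdg/batchtools/config.R"  # cluster default
)) {
  existing <- candidates[file.exists(candidates)]
  if (length(existing) == 0) return(NULL)
  existing[1]
}
```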
Per-project configuration
To customize settings for a specific project, create a batchtools.conf.R in the project directory:
```r
# batchtools.conf.R -- project-level overrides

# Use the cluster template (same as default)
cluster.functions <- makeClusterFunctionsSlurm(
  "/etc/xdg/batchtools/slurm_bips.tmpl",
  array.jobs = TRUE
)

# Override defaults for this project
default.resources <- list(
  ncpus = 4,
  hours = 12,
  memory = 4096,
  qos = "long",
  partition = "compute"
)
```

Per-user configuration
For user-wide defaults, create ~/.batchtools.conf.R:
```r
# ~/.batchtools.conf.R -- user-level defaults
cluster.functions <- makeClusterFunctionsSlurm(
  "/etc/xdg/batchtools/slurm_bips.tmpl",
  array.jobs = TRUE
)

default.resources <- list(
  ncpus = 1,
  hours = 6,
  memory = 2048,
  qos = "medium",
  partition = "compute",
  mail_type = "FAIL",
  mail_user = "name@leibniz-bips.de"
)
```

Using a custom template
If you need to modify the Slurm template itself (e.g., to add extra module load commands or environment setup), copy it to your project or home directory and point your config at the copy:
```sh
# Copy the template
cp /etc/xdg/batchtools/slurm_bips.tmpl ~/my_slurm.tmpl
```

```r
# In your batchtools.conf.R, point to the copy
cluster.functions <- makeClusterFunctionsSlurm(
  "~/my_slurm.tmpl",
  array.jobs = TRUE
)
```

The default template includes input validation, QoS auto-selection, and R version handling. Copy it rather than writing one from scratch.
Registries and experiments
{batchtools} provides two workflow patterns:
Simple registry (makeRegistry)
Best for applying a function to many inputs:
```r
reg <- makeRegistry(file.dir = "sim_registry", seed = 42)

# Map a function over inputs
batchMap(function(n, dist) {
  data <- switch(dist,
    normal = rnorm(n),
    uniform = runif(n)
  )
  mean(data)
}, n = c(100, 1000, 10000), dist = c("normal", "uniform", "normal"))

# Submit and collect
submitJobs()
waitForJobs()
reduceResultsList()
```

Experiment registry (makeExperimentRegistry)
Best for structured simulation studies with multiple problems/algorithms:
```r
reg <- makeExperimentRegistry(file.dir = "experiment", seed = 42)

# Define a "problem" (data generation)
addProblem("sim_data", fun = function(n, ...) {
  list(data = rnorm(n))
})

# Define "algorithms" (methods to compare)
addAlgorithm("mean", fun = function(instance, ...) mean(instance$data))
addAlgorithm("median", fun = function(instance, ...) median(instance$data))

# Create experiment grid
addExperiments(
  prob.designs = list(sim_data = data.table(n = c(100, 1000))),
  repls = 50 # 50 replications each
)

# Submit all
submitJobs()
```

Job management
```r
# Status overview
getStatus()

# Find specific jobs
findNotSubmitted()
findRunning()
findDone()
findErrors()

# View logs of failed jobs
getLog(id = 42)

# Or for errors specifically
getErrorMessages()

# Resubmit failed jobs
submitJobs(findErrors())

# Cancel running jobs
killJobs(findRunning())

# Delete the default registry and all its files
removeRegistry()
```

Common patterns
Multicore parallelism within jobs
To use parallel::mclapply() or future with multiple cores per job:
```r
submitJobs(resources = list(ncpus = 4))
```

Then inside your function:
```r
# With parallel
parallel::mclapply(data, my_fun, mc.cores = 4)

# With future
future::plan(future::multicore, workers = 4)
```

Make sure the number of workers in your R code matches `ncpus`. Requesting 4 CPUs but using `mc.cores = 8` will oversubscribe the allocation.
Large memory jobs
```r
submitJobs(resources = list(
  memory = 8192, # 8 GB per CPU
  ncpus = 1
))
```

Total memory for the job is `memory * ncpus`. With `memory = 8192` and `ncpus = 4`, you get 32 GB total.
Long-running jobs
```r
submitJobs(resources = list(
  days = 5,
  qos = "long" # optional -- auto-selected from walltime
))
```

Annotating jobs for tracking
```r
submitJobs(resources = list(
  comment = "my_project_sim_v3"
))
```

The comment is visible in `squeue --me --format="%.18i %.9P %.30j %.8T %.10M %.9l %.6D %k"`.