R on the Cluster
Managing R versions, packages, and environments
R is the primary language for most users on this cluster. This guide covers how to work effectively with R in a shared HPC environment.
Loading R
R is available through the module system:

```sh
# See available versions
module avail R

# Load a specific version
module load R/4.5.3

# Check it worked
R --version
```

Add your preferred R version to ~/.bashrc:

```sh
echo 'module load R/4.5.3' >> ~/.bashrc
```

Installing packages
Install packages on the head node (it has internet access):

```sh
module load R/4.5.3
R
```

Then, inside the R session:

```r
install.packages("data.table")
install.packages("tidyverse")
```

Packages are installed to ~/R/x86_64-pc-linux-gnu-library by default and are available on all nodes.
Packages that download data
Some packages download additional files on first use. Run these on the head node first (it has internet), then they’ll work on compute nodes.
For torch, you need to install the CUDA-enabled backend after installing the package — see GPU Jobs > Installing torch with CUDA support for step-by-step instructions.
Parallel R
There are two fundamentally different approaches to parallelism on a cluster: running one multi-threaded job (Approach A) or many single-threaded jobs (Approach B). This section covers Approach A — within-job parallelism using R. For Approach B (many independent jobs), see batchtools.
Within-job parallelism
Within-job parallelism is useful when you have a single job (interactive or batch) with multiple cores and want to parallelize a loop or apply operation within that job. We recommend the future and mirai ecosystems as modern, flexible approaches. The base R parallel package works too but is less flexible.
First, request multiple cores:

```sh
# Interactive
salloc --cpus-per-task=8 --mem=16G --time=02:00:00

# Or in a batch script
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
```

With future (recommended)
The future package provides a clean, unified interface for parallel evaluation. Combined with future.apply, it offers parallel versions of lapply(), Map(), vapply(), etc.
```r
library(future)
library(future.apply)

# Automatically uses all cores allocated by Slurm
plan(multicore, workers = availableCores())

# Parallel lapply — works just like base lapply()
results <- future_lapply(1:100, function(i) {
  Sys.sleep(0.1)  # some slow computation
  sqrt(i)
})
```

You can also use future directly for more control:
```r
library(future)
plan(multicore, workers = availableCores())

# Launch two independent computations
f1 <- future({ slow_model_fit(data_part1) })
f2 <- future({ slow_model_fit(data_part2) })

# Collect results (blocks until done)
result1 <- value(f1)
result2 <- value(f2)
```

The furrr package adds future_map() and friends if you prefer purrr-style syntax.
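With furrr, the future_lapply() example above might look like this (a sketch; multisession is used here so it also works on platforms without fork support):

```r
library(furrr)
plan(multisession, workers = 2)

# Parallel version of purrr::map()
results <- future_map(1:10, \(i) sqrt(i))

plan(sequential)  # shut down the workers
```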
With mirai (recommended)
mirai is a modern, lightweight parallelism framework. It is fast and has minimal overhead. Combined with mirai.promises, it also integrates well with Shiny.
```r
library(mirai)

# Launch local daemons using allocated cores
daemons(n = as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", "1")))

# Submit tasks
m <- mirai(slow_function(x), x = my_data)

# Collect result (blocks until done)
m[]
```

For parallel map-style operations, use mirai_map():
```r
library(mirai)
daemons(n = as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", "1")))

# Similar to lapply(), but runs in parallel
results <- mirai_map(1:100, function(i) {
  Sys.sleep(0.1)
  sqrt(i)
}) |> lapply(function(m) m[])
```

With parallel (base R)
The parallel package ships with R and requires no installation. It is more limited than future or mirai but fine for simple use cases:
```r
library(parallel)

ncores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", "1"))
results <- mclapply(1:100, function(i) {
  sqrt(i)
}, mc.cores = ncores)
```

Always make sure the number of parallel workers matches the cores you requested from Slurm. Requesting --cpus-per-task=4 but spawning 16 workers will oversubscribe the node and slow you down.
Beyond single-node parallelism
The approaches above run workers within a single Slurm job. For multi-node parallelism — distributing work across multiple compute nodes — there are two main approaches:
- batchtools (recommended) — Submit and manage Slurm jobs from R. Each unit of work (e.g. one simulation repetition, one model fit) becomes a separate Slurm job. Robust, well-tested, and the standard tool for simulation studies and benchmark experiments on this cluster. Most users are already familiar with it.
- targets — Pipeline-based workflows with dependency tracking, caching, and automatic Slurm worker submission via crew.cluster. More flexible than batchtools for complex pipelines where steps have dependencies, and targets only re-runs what changed. Worth learning for larger or iterative projects.
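A minimal batchtools sketch, assuming Slurm cluster functions and a job template are already configured via your batchtools config file as described in the batchtools guide (the resource names and values below are placeholders, not verified settings for this cluster):

```r
library(batchtools)

# Create a registry directory that stores job state and results
reg <- makeRegistry(file.dir = "sqrt-registry", seed = 42)

# One Slurm job per parameter value
batchMap(function(i) sqrt(i), i = 1:10, reg = reg)

# Submit; the resources list is passed to the Slurm template
submitJobs(reg = reg, resources = list(walltime = 3600, memory = 2048))
waitForJobs(reg = reg)

# Gather results as a list, ordered by job id
results <- reduceResultsList(reg = reg)
```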
Common issues
Package installation fails
Some packages need system libraries to compile. If installation fails, check the error message and ask the admin to install the missing system dependencies (not R dependencies — those you install yourself).
Make sure that during installation you:

- Are using the P3M repository (https://p3m.dev/...)
- Are getting binary rather than source packages
For example:

```r
#> install.packages("stringr")
Installing package into '/home/burk/R/x86_64-pc-linux-gnu-library/4.5'
(as 'lib' is unspecified)
trying URL 'https://p3m.dev/cran/latest/bin/linux/rhel9-x86_64/4.5/src/contrib/stringr_1.6.0.tar.gz'
Content type 'binary/octet-stream' length 351116 bytes (342 KB)
==================================================
downloaded 342 KB
* installing *binary* package 'stringr' ...
* DONE (stringr)
```

This way you get precompiled packages and avoid compiling from source, which is slower and can fail in all sorts of ways if a system dependency is unmet.
“Package not found” in batch job
Make sure you:
- Loaded the same R version as when you installed packages
- Installed packages on the head node first
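Putting both points together, a batch script might start like this (a sketch; adjust the R version and script name to your setup):

```sh
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G

# Load the same R version that was used to install the packages
module load R/4.5.3

Rscript analysis.R
```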
Memory issues with large datasets
Use data.table instead of data.frame for better memory efficiency:

```r
library(data.table)
dt <- fread("large_file.csv")  # Much faster and lighter than read.csv
```

Request more memory if needed:

```sh
salloc --mem=64G ...
```

Note that at some point you will effectively block an entire node for yourself, so make sure you really need that much.
For a more comfortable development experience, see the Connecting to the Cluster guide for setting up Positron or VS Code with remote SSH.