R on the Cluster

Managing R versions, packages, and environments

Modified

2026-01-22

R is the primary language for most users on this cluster. This guide covers how to work effectively with R in a shared HPC environment.

Loading R

R is available through the module system:

# See available versions
module avail R

# Load a specific version
module load R/4.5.2

# Check it worked
R --version
Tip: Make it permanent

Add your preferred R version to ~/.bashrc:

echo 'module load R/4.5.2' >> ~/.bashrc

Installing packages

Install packages on the head node (it has internet access):

module load R/4.5.2
R
install.packages("data.table")
install.packages("tidyverse")

Packages are installed to ~/R/x86_64-pc-linux-gnu-library/<version> by default; since your home directory is shared, they are available on all nodes.
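You can confirm this from any node; the first entry of the library path is where install.packages() writes by default:

```r
# Library search paths; the user library (~/R/...) should come first
.libPaths()

# Check that a previously installed package is visible before submitting a job
"data.table" %in% rownames(installed.packages())
```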

Packages that download data

Some packages download additional files on first use (e.g., torch downloads libtorch). Run these on the head node first:

library(torch)  # First load triggers download

Then they’ll work on compute nodes without internet.

Beware, however, that torch detects whether a GPU is available and downloads the corresponding build. Running the initial download on the head node (which has no GPU) therefore gives you the CPU-only build, and you will not be able to use the GPUs on the GPU node. A workaround is being worked on.

Parallel R

Using multiple cores

Request multiple cores and use them in R:

salloc --cpus-per-task=8 --mem=16G --time=02:00:00

Then, inside R:

library(parallel)
ncores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", "1"))

results <- mclapply(1:100, function(i) {
  # Your computation
  sqrt(i)
}, mc.cores = ncores)
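One pitfall worth knowing: with mc.preschedule = FALSE, a failing task does not stop mclapply; the affected element comes back as a try-error object, so check the results before using them. A small sketch (the deliberate failure is just for illustration):

```r
library(parallel)

# One task fails on purpose; mclapply still returns a full-length list
results <- mclapply(1:4, function(i) {
  if (i == 3) stop("task failed") else sqrt(i)
}, mc.cores = 2, mc.preschedule = FALSE)

# Flag failed tasks rather than silently using bad results
failed <- vapply(results, inherits, logical(1), what = "try-error")
which(failed)
```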

Using future

The future package provides a cleaner interface:

library(future)
library(future.apply)

# Use all allocated cores
plan(multicore, workers = availableCores())

# Parallel apply
results <- future_lapply(1:100, function(i) {
  sqrt(i)
})

Common issues

Package installation fails

Some packages need system libraries. Check the error message and ask the admin to install missing system dependencies (not R dependencies, you install those yourself).

Make sure that during installation you

  1. Are using the P3M repository (https://p3m.dev/...)
  2. Are getting binary rather than source packages

For example:

#> install.packages("stringr")
Installing package into '/home/burk/R/x86_64-pc-linux-gnu-library/4.5'
(as 'lib' is unspecified)
trying URL 'https://p3m.dev/cran/latest/bin/linux/rhel9-x86_64/4.5/src/contrib/stringr_1.6.0.tar.gz'
Content type 'binary/octet-stream' length 351116 bytes (342 KB)
==================================================
downloaded 342 KB

* installing *binary* package 'stringr' ...
* DONE (stringr)

That ensures you get precompiled packages. Installing from source is slower and can fail in all sorts of ways if a system dependency is unmet.
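If your session is not already pointed at P3M, you can set the repository yourself. The exact URL path below (rhel9) is an assumption based on the install log above; adjust it to match your cluster's distribution:

```r
# Point this session at P3M's binary repository.
# The rhel9 path segment is an assumption; adjust for your distribution.
options(repos = c(CRAN = "https://p3m.dev/cran/__linux__/rhel9/latest"))

# Verify the active repository
getOption("repos")
```

To make this persistent across sessions, put the options() call in ~/.Rprofile.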

“Package not found” in batch job

Make sure you:

  1. Loaded the same R version as when you installed packages
  2. Installed packages on the head node first
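Both points can be baked into the batch script itself, so the R version is always consistent. A minimal sketch (the version number and script name are placeholders for your own):

```shell
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=01:00:00

# Load the exact R version the packages were installed under
module load R/4.5.2

Rscript analysis.R
```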

Memory issues with large datasets

Use data.table instead of data.frame for better memory efficiency:

library(data.table)
dt <- fread("large_file.csv")  # Much faster and lighter than read.csv

Request more memory if needed:

salloc --mem=64G ...

But note that at some point you will essentially block an entire node for yourself. Please be certain you actually need that much memory before requesting it.

Tip: Working with an IDE

For a more comfortable development experience, see the Connecting to the Cluster guide for setting up Positron or VS Code with remote SSH.