R on the Cluster
Managing R versions, packages, and environments
R is the primary language for most users on this cluster. This guide covers how to work effectively with R in a shared HPC environment.
Loading R
R is available through the module system:
```shell
# See available versions
module avail R

# Load a specific version
module load R/4.5.2

# Check it worked
R --version
```

Add your preferred R version to ~/.bashrc:

```shell
echo 'module load R/4.5.2' >> ~/.bashrc
```

Installing packages
Install packages on the head node (it has internet access):
```shell
module load R/4.5.2
R
```

Then, in the R session:

```r
install.packages("data.table")
install.packages("tidyverse")
```

Packages are installed to ~/R/x86_64-pc-linux-gnu-library by default and are available on all nodes.
Packages that download data
Some packages download additional files on first use (e.g., torch downloads libtorch). Run these on the head node first:
```r
library(torch)  # First load triggers download
```

Then they’ll work on compute nodes without internet.
Beware, however, that torch detects whether a GPU is present when it downloads libtorch and fetches the matching build. Because the head node has no GPU, running the download there gives you the CPU-only version, and you will not be able to use the GPUs on the GPU node. A workaround is being worked on.
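Until then, you can at least verify which build you ended up with. On a GPU node, `torch::cuda_is_available()` reports whether the installed libtorch can see the GPUs:

```r
library(torch)

# TRUE only if the CUDA-enabled libtorch is installed and a GPU is visible;
# FALSE on a GPU node suggests you got the CPU-only download on the head node
cuda_is_available()
```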
Parallel R
Using multiple cores
Request multiple cores and use them in R:
```shell
salloc --cpus-per-task=8 --mem=16G --time=02:00:00
```

```r
library(parallel)

ncores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", "1"))

results <- mclapply(1:100, function(i) {
  # Your computation
  sqrt(i)
}, mc.cores = ncores)
```
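One gotcha with `mclapply()`: results involving random numbers are not reproducible with the default generator. Switching to the L'Ecuyer-CMRG generator gives each worker its own reproducible stream:

```r
library(parallel)

# Parallel-safe RNG: each fork gets an independent, reproducible stream
RNGkind("L'Ecuyer-CMRG")

set.seed(42)
res1 <- mclapply(1:4, function(i) rnorm(1), mc.cores = 2)

set.seed(42)
res2 <- mclapply(1:4, function(i) rnorm(1), mc.cores = 2)

identical(res1, res2)  # TRUE: same seed and settings give the same draws
```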
Using future

The future package provides a cleaner interface:
```r
library(future)
library(future.apply)

# Use all allocated cores
plan(multicore, workers = availableCores())

# Parallel apply
results <- future_lapply(1:100, function(i) {
  sqrt(i)
})
```

Common issues
Package installation fails
Some packages need system libraries. Check the error message and ask the admin to install missing system dependencies (not R dependencies, you install those yourself).
Make sure that during installation you:

- Are using the P3M repository (https://p3m.dev/...)
- Are getting binary rather than source packages
For example:
```r
#> install.packages("stringr")
Installing package into '/home/burk/R/x86_64-pc-linux-gnu-library/4.5'
(as 'lib' is unspecified)
trying URL 'https://p3m.dev/cran/latest/bin/linux/rhel9-x86_64/4.5/src/contrib/stringr_1.6.0.tar.gz'
Content type 'binary/octet-stream' length 351116 bytes (342 KB)
==================================================
downloaded 342 KB
* installing *binary* package 'stringr' ...
* DONE (stringr)
```

That ensures you get precompiled packages that do not need compilation; compiling from source is slower and can fail in all sorts of ways if a system dependency is unmet.
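To make R use P3M by default, you can set the `repos` option in your `~/.Rprofile`. The exact URL depends on the cluster's distribution; the one below is an assumption based on the RHEL 9 binary path in the log above, so adjust the distro segment if needed:

```r
# ~/.Rprofile -- assumed P3M URL for RHEL 9; adjust for your distribution
options(repos = c(CRAN = "https://p3m.dev/cran/__linux__/rhel9/latest"))
```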
“Package not found” in batch job
Make sure you:
- Loaded the same R version as when you installed packages
- Installed packages on the head node first
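A minimal batch script that follows both rules might look like this (script name and resource values are placeholders):

```shell
#!/bin/bash
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=02:00:00

# Load the SAME R version you used when installing packages
module load R/4.5.2

Rscript analysis.R
```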
Memory issues with large datasets
Use data.table instead of data.frame for better memory efficiency:
```r
library(data.table)
dt <- fread("large_file.csv")  # Much faster and lighter than read.csv
```

Request more memory if needed:

```shell
salloc --mem=64G ...
```

But note that at some point you will essentially block an entire node for yourself. Please be certain you actually need to do that.
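Before requesting more memory, check whether you actually need every column: `fread()` can read a subset directly via `select`. A small self-contained sketch (the file and column names here are made up for illustration):

```r
library(data.table)

# Demo file standing in for a large CSV
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(id = 1:3, value = rnorm(3), junk = letters[1:3]), tmp)

# Read only the columns you need -- far less memory than loading everything
dt <- fread(tmp, select = c("id", "value"))
names(dt)  # "id" "value"
```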
For a more comfortable development experience, see the Connecting to the Cluster guide for setting up Positron or VS Code with remote SSH.