PyTorch via module
This guide shows how to run PyTorch with GPU acceleration using the official NVIDIA container.
Why use this setup?
Preinstalled PyTorch (v2.2+) with CUDA, cuDNN, and NCCL
No need to install Conda or pip packages
Reproducible containerized environment
Loading the PyTorch Module
To load the PyTorch container module:
module load apptainer/pytorch
This provides the following helper commands:
pytorch_exec: Run a command inside the containerpytorch_shell: Open an interactive shell inside the container
Interactive GPU Session
To run PyTorch interactively on a GPU node:
Request a GPU node
srun --partition=aoraki_gpu \
--gres=gpu:1 \
--cpus-per-task=4 \
--mem=8G \
--time=00:10:00 \
--pty bash
Load the module inside the GPU shell
module load apptainer/pytorch/24.04
Test PyTorch inside the container
pytorch_exec python3 -c "import torch; print('PyTorch:', torch.__version__); print('CUDA:', torch.cuda.is_available()); print('GPU:', torch.cuda.get_device_name(0))"
SLURM Batch Job Example
To submit a test job to SLURM, save this to pytorch_test.slurm:
#!/bin/bash
#SBATCH --job-name=pytorch-test
#SBATCH --partition=aoraki_gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=00:05:00
module load apptainer/pytorch/24.04
pytorch_exec python3 -c "
import torch
print('PyTorch:', torch.__version__)
print('CUDA:', torch.cuda.is_available())
print('GPU:', torch.cuda.get_device_name(0))
print('cuDNN:', torch.backends.cudnn.enabled)
print('NCCL:', torch.distributed.is_nccl_available())
"
Submit with:
sbatch pytorch_test.slurm
Running Your Own Scripts
To run your own PyTorch scripts inside the container:
pytorch_exec python3 my_training_script.py