PyTorch via module
--------------------

This guide shows how to run **PyTorch with GPU acceleration** using the official
NVIDIA container.

Why use this setup?
^^^^^^^^^^^^^^^^^^

- Preinstalled PyTorch (v2.2+) with CUDA, cuDNN, and NCCL
- No need to install Conda or pip packages
- Reproducible containerized environment

Loading the PyTorch Module
^^^^^^^^^^^^^^^^^^^^^^^^^^

To load the PyTorch container module:

.. code-block:: bash

   module load apptainer/pytorch

This provides the following helper commands:

- ``pytorch_exec``: Run a command inside the container
- ``pytorch_shell``: Open an interactive shell inside the container

Interactive GPU Session
^^^^^^^^^^^^^^^^^^^^^^^

To run PyTorch interactively on a GPU node:

1. **Request a GPU node**

.. code-block:: bash

   srun --partition=aoraki_gpu \
        --gres=gpu:1 \
        --cpus-per-task=4 \
        --mem=8G \
        --time=00:10:00 \
        --pty bash

2. **Load the module inside the GPU shell**

.. code-block:: bash

   module load apptainer/pytorch/24.04

3. **Test PyTorch inside the container**

.. code-block:: bash

   pytorch_exec python3 -c "import torch; print('PyTorch:', torch.__version__); print('CUDA:', torch.cuda.is_available()); print('GPU:', torch.cuda.get_device_name(0))"

SLURM Batch Job Example
^^^^^^^^^^^^^^^^^^^^^^^

To submit a test job to SLURM, save this to ``pytorch_test.slurm``:

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=pytorch-test
   #SBATCH --partition=aoraki_gpu
   #SBATCH --gres=gpu:1
   #SBATCH --cpus-per-task=4
   #SBATCH --mem=8G
   #SBATCH --time=00:05:00

   module load apptainer/pytorch/24.04

   pytorch_exec python3 -c "
   import torch
   print('PyTorch:', torch.__version__)
   print('CUDA:', torch.cuda.is_available())
   print('GPU:', torch.cuda.get_device_name(0))
   print('cuDNN:', torch.backends.cudnn.enabled)
   print('NCCL:', torch.distributed.is_nccl_available())
   "

Submit with:

.. code-block:: bash

   sbatch pytorch_test.slurm

Running Your Own Scripts
^^^^^^^^^^^^^^^^^^^^^^^^

To run your own PyTorch scripts inside the container:

.. code-block:: bash

   pytorch_exec python3 my_training_script.py