AlphaFold3 Module
The AlphaFold3 module provides a simplified way to run AlphaFold3 on our HPC system using an Apptainer container. This module handles all the necessary environment setup, including binding the correct database and model paths.
Loading the Module
Before running AlphaFold3, you must first load the alphafold3/3.0.1 module. This command will also automatically load its dependency, the apptainer module.
module load alphafold3/3.0.1
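To confirm that the module loaded correctly, along with its apptainer dependency, list your active modules:
# Both alphafold3/3.0.1 and apptainer should appear in the output
module list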
Running an AlphaFold3 Job
The module provides a wrapper script named af3 that simplifies the command line for running AlphaFold3. You do not need to specify any Apptainer-specific commands (:bash:`apptainer exec`, :bash:`--bind`, etc.); the af3 script handles this for you.
The basic syntax for the af3 wrapper is:
af3 <path_to_input_json> <path_to_output_directory> [additional_alphafold_args]
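The input JSON follows the standard AlphaFold3 input format. As a minimal sketch, the following creates an input file for a single protein chain (the job name and sequence are placeholders; substitute your own):
# Write a minimal single-chain AlphaFold3 input file
# (placeholder name and sequence; replace with your own)
cat > my_protein.json <<'EOF'
{
  "name": "my_protein",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF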
Example: Running the Data Pipeline
To prepare the input features for a protein, you can run the data pipeline on its own. This is useful for testing or for pre-processing multiple inputs.
# Example: Running the data pipeline
af3 my_protein.json ./alphafold3_output --run_data_pipeline=true --run_inference=false
- :bash:`my_protein.json`: Your input JSON file.
- :bash:`./alphafold3_output`: The directory where the results will be saved.
- :bash:`--run_data_pipeline=true`: Tells AlphaFold3 to run the data preparation step.
- :bash:`--run_inference=false`: Tells AlphaFold3 not to run the full inference (folding) step, which requires the model weights.
Example: Running the Full Pipeline
To run the full end-to-end pipeline, including both data preparation and model inference (folding), you must first ensure that the AlphaFold3 model weights have been downloaded to the path specified by the :bash:`$AF3_MODELS` environment variable.
# Example: Running the full pipeline
af3 my_protein.json ./alphafold3_output --run_data_pipeline=true --run_inference=true
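Before launching a full run, it is worth checking that the weights are actually present (a quick sanity check; the exact file layout under :bash:`$AF3_MODELS` depends on how the weights were obtained):
# The directory should not be empty if the weights have been downloaded
ls -lh "$AF3_MODELS"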
Important Directories
The AlphaFold3 module sets up several environment variables that point to key directories. These can be helpful for understanding where data and models are located.
- :bash:`$AF3_DB`: Points to the directory containing the public AlphaFold3 databases.
- :bash:`$AF3_MODELS`: Points to the directory where the AlphaFold3 model weights should be stored.
- :bash:`$AF3_SIF_PATH`: The full path to the Apptainer container image.
You can inspect these variables at any time by running:
module show alphafold3/3.0.1
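Since the module exports these variables into your shell, you can also print them directly once the module is loaded:
# Print the paths the module has configured
echo "$AF3_DB"
echo "$AF3_MODELS"
echo "$AF3_SIF_PATH"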
Using a GPU with SLURM
To run a full AlphaFold3 job, you must request a GPU on a suitable compute node. The af3 wrapper automatically detects and uses available GPUs within the container. To submit a job to a GPU node, you should use a SLURM batch script.
Submitting a Job to the A100 or L40 Partitions
You can request a single GPU from either the A100 or L40 partitions. The :bash:`--gres=gpu:1` option requests one generic GPU, and the :bash:`--partition` option specifies the partition you want to use.
Create a batch script (e.g., run_alphafold3.slurm) with the following content:
#!/bin/bash
#SBATCH --job-name=af3_job
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=24:00:00
#SBATCH --partition=aoraki_gpu_L40   # or the equivalent A100 partition
#SBATCH --gres=gpu:1
# Load the AlphaFold3 module
module purge
module load alphafold3/3.0.1
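# Optional sanity check (assumes nvidia-smi is available on the GPU node):
# confirm that SLURM has allocated a GPU before starting the job
nvidia-smi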
# Run the AlphaFold3 job with the full pipeline
af3 my_protein.json ./alphafold3_output --run_data_pipeline=true --run_inference=true
echo "AlphaFold3 job finished."
Submit the script using the sbatch command:
sbatch run_alphafold3.slurm
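After submission, you can monitor the job with the usual SLURM commands (replace <jobid> with the job ID printed by sbatch):
# Check the job's state while it is queued or running
squeue -u $USER
# Review timing and memory usage once the job has finished
sacct -j <jobid> --format=JobID,State,Elapsed,MaxRSS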
The SBATCH options used in this script are:
- :bash:`--job-name`: A descriptive name for your job.
- :bash:`--nodes`: Number of nodes to request (typically 1 for a single job).
- :bash:`--ntasks`: Number of tasks to run (1 task per job).
- :bash:`--cpus-per-task`: Number of CPU cores to reserve. AlphaFold3 benefits from multiple cores, especially during the data pipeline stage.
- :bash:`--mem`: Amount of memory to reserve.
- :bash:`--time`: Maximum wall-clock time for the job.
- :bash:`--partition`: Specifies the partition to use (e.g., :bash:`aoraki_gpu_L40`, as in the script above).
- :bash:`--gres=gpu:1`: Requests one GPU from the specified partition.