AlphaFold3 Module
The AlphaFold3 module provides a simplified way to run AlphaFold3 on our HPC system using an Apptainer container. This module handles all the necessary environment setup, including binding the correct database and model paths.
Loading the Module
Before running AlphaFold3, you must first load the alphafold3/3.0.1 module. This command will also automatically load its dependency, the apptainer module.
module load alphafold3/3.0.1
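To confirm that the module loaded correctly, along with its apptainer dependency, list your active modules:
# Both alphafold3/3.0.1 and apptainer should appear in the output
module list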
Running an AlphaFold3 Job
The module provides a wrapper script named af3 that simplifies the command line for running AlphaFold3. You do not need to specify any Apptainer-specific commands (:bash:`apptainer exec`, :bash:`--bind`, etc.); the af3 script handles this for you.
The basic syntax for the af3 wrapper is:
af3 <path_to_input_json> <path_to_output_directory> [additional_alphafold_args]
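The input JSON follows the standard AlphaFold3 input format. As a minimal sketch, the following creates an input file for a single protein chain (the job name and sequence are placeholders; substitute your own):
# Write a minimal single-chain AlphaFold3 input file
# (placeholder name and sequence; replace with your own)
cat > my_protein.json <<'EOF'
{
  "name": "my_protein",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF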
Example: Running the Data Pipeline
To prepare the input features for a protein, you can run the data pipeline on its own. This is useful for testing or for pre-processing multiple inputs.
# Example: Running the data pipeline
af3 my_protein.json ./alphafold3_output --run_data_pipeline=true --run_inference=false
- :bash:`my_protein.json`: Your input JSON file.
- :bash:`./alphafold3_output`: The directory where the results will be saved.
- :bash:`--run_data_pipeline=true`: Tells AlphaFold3 to run the data preparation step.
- :bash:`--run_inference=false`: Tells AlphaFold3 not to run the full inference (folding) step, which requires the model weights.
Example: Running the Full Pipeline
To run the full end-to-end pipeline, including both data preparation and model inference (folding), you must first ensure that the AlphaFold3 model weights have been downloaded to the path specified by the :bash:`$AF3_MODELS` environment variable.
# Example: Running the full pipeline
af3 my_protein.json ./alphafold3_output --run_data_pipeline=true --run_inference=true
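Before launching a full run, it is worth checking that the weights are actually present (a quick sanity check; the exact file layout under :bash:`$AF3_MODELS` depends on how the weights were obtained):
# The directory should not be empty if the weights have been downloaded
ls -lh "$AF3_MODELS"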
Important Directories
The AlphaFold3 module sets up several environment variables that point to key directories. These can be helpful for understanding where data and models are located.
- :bash:`$AF3_DB`: Points to the directory containing the public AlphaFold3 databases.
- :bash:`$AF3_MODELS`: Points to the directory where the AlphaFold3 model weights should be stored.
- :bash:`$AF3_SIF_PATH`: The full path to the Apptainer container image.
You can inspect these variables at any time by running:
module show alphafold3/3.0.1
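Since the module exports these variables into your shell, you can also print them directly once the module is loaded:
# Print the paths the module has configured
echo "$AF3_DB"
echo "$AF3_MODELS"
echo "$AF3_SIF_PATH"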
Using a GPU with SLURM
To run a full AlphaFold3 job, you must request a GPU on a suitable compute node. The af3 wrapper automatically detects and uses available GPUs within the container. To submit a job to a GPU node, you should use a SLURM batch script.
Submitting a Job to the A100 or L40 Partitions
You can request a single GPU from either the A100 or L40 partitions. The :bash:`--gres=gpu:1` option requests one generic GPU, and the :bash:`--partition` option specifies the partition you want to use.
Create a batch script (e.g., run_alphafold3.slurm) with the following content:
#!/bin/bash
#SBATCH --job-name=af3_job
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=24:00:00
#SBATCH --partition=aoraki_gpu_L40   # or the equivalent A100 partition
#SBATCH --gres=gpu:1
# Load the AlphaFold3 module
module purge
module load alphafold3/3.0.1
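# Optional sanity check (assumes nvidia-smi is available on the GPU node):
# confirm that SLURM has allocated a GPU before starting the job
nvidia-smi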
# Run the AlphaFold3 job with the full pipeline
af3 my_protein.json ./alphafold3_output --run_data_pipeline=true --run_inference=true
echo "AlphaFold3 job finished."
Submit the script using the sbatch command:
sbatch run_alphafold3.slurm
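After submission, you can monitor the job with the usual SLURM commands (replace <jobid> with the job ID printed by sbatch):
# Check the job's state while it is queued or running
squeue -u $USER
# Review timing and memory usage once the job has finished
sacct -j <jobid> --format=JobID,State,Elapsed,MaxRSS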
The SBATCH options used in this script are:
- :bash:`--job-name`: A descriptive name for your job.
- :bash:`--nodes`: Number of nodes to request (typically 1 for a single job).
- :bash:`--ntasks`: Number of tasks to run (1 task per job).
- :bash:`--cpus-per-task`: Number of CPU cores to reserve. AlphaFold3 benefits from multiple cores, especially during the data pipeline stage.
- :bash:`--mem`: Amount of memory to reserve.
- :bash:`--time`: Maximum wall-clock time for the job.
- :bash:`--partition`: Specifies the partition to use (e.g., :bash:`aoraki_gpu_L40`, as in the script above).
- :bash:`--gres=gpu:1`: Requests one GPU from the specified partition.