AlphaFold3 Module ------------ The AlphaFold3 module provides a simplified way to run AlphaFold 3 on our HPC system using an Apptainer container. This module handles all the necessary environment setup, including binding the correct database and model paths. Loading the Module ^^^^^^^^^^^^^^^^^^ Before running AlphaFold3, you must first load the alphafold3/3.0.1 module. This command will also automatically load its dependency, the apptainer module. .. code-block:: bash module load alphafold3/3.0.1 Running an AlphaFold3 Job ^^^^^^^^^^^^^^^^^^^^^^^^^ The module provides a wrapper script named af3 that simplifies the command line for running AlphaFold3. You do not need to specify any Apptainer-specific commands (:bash:apptainer exec, :bash:--bind, etc.). The af3 script handles this for you. The basic syntax for the af3 wrapper is: .. code-block:: bash af3 [additional_alphafold_args] Example: Running the Data Pipeline To prepare the input features for a protein, you can run the data pipeline on its own. This is useful for testing or for pre-processing multiple inputs. .. code-block:: bash # Example: Running the data pipeline af3 my_protein.json ./alphafold3_output --run_data_pipeline=true --run_inference=false - :bash:`my_protein.json`: Your input JSON file. - :bash:`./alphafold3_output`: The directory where the results will be saved. - :bash:`--run_data_pipeline=true`: This flag tells AlphaFold3 to run the data preparation step. - :bash:`--run_inference=false`: This flag tells AlphaFold3 **not** to run the full inference (folding) step, which requires the models. Example: Running the Full Pipeline To run the full end-to-end pipeline, including both data preparation and model inference (folding), you must ensure that the AlphaFold3 model weights have been downloaded to the path specified by the :bash:$AF3_MODELS environment variable. .. code-block:: bash #Example: Running the full pipeline af3 my_protein.json ./alphafold3_output --run_data_pipeline=true --run_inference=true Important Directories The AlphaFold3 module sets up several environment variables that point to key directories. These can be helpful for understanding where data and models are located. **$AF3_DB:** Points to the directory containing the public AlphaFold3 databases. **$AF3_MODELS:** Points to the directory where the AlphaFold3 model weights should be stored. **$AF3_SIF_PATH:** The full path to the Apptainer container image. You can inspect these variables at any time by running: .. code-block:: bash module show alphafold3/3.0.1 Using a GPU with SLURM ^^^^^^^^^^^^^^^^^^^^^^ To run a full AlphaFold3 job, you must request a GPU on a suitable compute node. The af3 wrapper automatically detects and uses available GPUs within the container. To submit a job to a GPU node, you should use a SLURM batch script. Submitting a Job to the A100 or L40 Partitions You can request a single GPU from either the a100 or l40 partitions. The :bash:--gres=gpu:1 option requests one generic GPU, and the :bash:--partition option specifies the partition you want to use. Create a batch script (e.g., run_alphafold3.slurm) with the following content: .. code-block:: bash #!/bin/bash #SBATCH --job-name=af3_job #SBATCH --nodes=1 #SBATCH --ntasks=1 #SBATCH --cpus-per-task=8 #SBATCH --mem=32G #SBATCH --time=24:00:00 #SBATCH --partition=aoraki_gpu_L40 # or 'A100' #SBATCH --gres=gpu:1 #Load the AlphaFold3 module module purge module load alphafold3/3.0.1 #Run the AlphaFold3 job with the full pipeline af3 my_protein.json ./alphafold3_output --run_data_pipeline=true --run_inference=true echo "AlphaFold3 job finished." Submit the script using the sbatch command: .. code-block:: bash batch run_alphafold3.slurm :bash:--job-name: A descriptive name for your job. :bash:--nodes: Number of nodes to request (typically 1 for a single job). :bash:--ntasks: Number of tasks to run (1 task per job). :bash:--cpus-per-task: Number of CPU cores to reserve. AlphaFold3 benefits from multiple cores, especially during the data pipeline stage. :bash:--mem: Amount of memory to reserve. :bash:--time: Maximum wall-clock time for the job. :bash:--partition: Specifies the partition to use (e.g., l40 or a100). :bash:--gres=gpu:1: Requests one GPU from the specified partition.