Conda

Installation

We recommend using Miniforge to manage conda environments and packages. Miniforge is a community-led, minimal conda/mamba installer that uses conda-forge as the default channel.

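To install Miniforge under your user account (by default it is installed into ~/miniforge3), run the following commands. The -b option runs the installer in batch mode without prompts, and -u updates an existing installation if one is already present: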

wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh -b -u
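
The installer file name above is specific to 64-bit Intel/AMD Linux. On other architectures, for example aarch64 nodes, the Miniforge README builds the file name from the output of uname. The sketch below wraps that pattern in a small helper. The function name miniforge_installer_url is hypothetical and is not part of Miniforge.

```shell
# Hypothetical helper: build the Miniforge installer URL for a given OS and
# CPU architecture, defaulting to the values reported by uname.
miniforge_installer_url() {
    local os="${1:-$(uname)}"
    local arch="${2:-$(uname -m)}"
    printf 'https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-%s-%s.sh\n' "$os" "$arch"
}
```

For example, wget "$(miniforge_installer_url)" downloads the installer that matches the current machine. You can then run it with bash Miniforge3-$(uname)-$(uname -m).sh -b -u.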

Conda environments

Activating the base environment

source ~/miniforge3/bin/activate

Your command prompt will then change to include “(base)” at the start, to remind you that this environment is activated. You can deactivate the environment by typing:

conda deactivate
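
In scripts it can be useful to check which environment, if any, is currently active. conda activate typically sets the CONDA_DEFAULT_ENV variable to the environment's name, or to its path for environments activated by path. The sketch below is a small helper that reports it. The name current_conda_env is hypothetical.

```shell
# Hypothetical helper: print the name of the active conda environment,
# or "none" when no environment is active (CONDA_DEFAULT_ENV unset).
current_conda_env() {
    printf '%s\n' "${CONDA_DEFAULT_ENV:-none}"
}
```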

Creating and activating a sub-environment

Once the base environment is activated, you could in principle start installing packages immediately. However, conda is easier to keep organised if you do not install packages directly into the base environment and use named sub-environments instead. You can have several sub-environments under a single base environment and activate whichever one you need at any one time. As long as you do not install packages into the base environment itself, your sub-environments remain independent of each other.

To create a named environment (for example, called “myenv”), ensure that the base environment is activated (the command prompt should start with “(base)”), and type:

# create a named environment; by default it is placed in ~/miniforge3/envs/myenv
conda create -n myenv

# or create an environment in (almost) any directory; see the note on /mnt/auto-hcs/ below
conda create -p /path/to/put/your/environment

conda shows the proposed installation location and, once you confirm at the prompt, performs the installation. If you have followed these instructions, a named environment such as myenv is placed in /home/users/<your_username>/miniforge3/envs/myenv. To choose a different location, use -p <path> instead of -n <name>.

Note: do not create conda environments in subdirectories of /mnt/auto-hcs/. Conda will either fail, or the resulting environment will not work properly.
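
If you often create environments with -p, a wrapper can guard against this mistake. The sketch below uses a hypothetical function name, conda_create_prefix. It compares the path literally, so it assumes you pass an absolute path, and it passes any further arguments (such as package names) through to conda create.

```shell
# Hypothetical wrapper around "conda create --prefix" that refuses to create
# an environment under /mnt/auto-hcs/. Expects an absolute path.
conda_create_prefix() {
    local prefix="${1:-}"
    if [ -z "$prefix" ]; then
        echo "usage: conda_create_prefix /absolute/path [package ...]" >&2
        return 2
    fi
    shift
    case "$prefix" in
        /mnt/auto-hcs/*)
            echo "refusing to create a conda environment under /mnt/auto-hcs/: $prefix" >&2
            return 1
            ;;
    esac
    # Any remaining arguments (e.g. package names) are passed through.
    conda create --prefix "$prefix" "$@"
}
```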

Once you have created your sub-environment, you can activate it with conda activate <name>, for example:

conda activate myenv

The command prompt will then change (e.g. to start with “(myenv)”) to reflect this. Typing conda deactivate once will return you to the base environment; typing it a second time will deactivate conda completely (as above).

To list your conda environments, type:
conda env list
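
The output of conda env list can also be checked from a script. The sketch below is a hypothetical helper, conda_env_exists. It assumes the usual layout of that output: comment lines starting with # are followed by one line per environment, with the name in the first column.

```shell
# Hypothetical helper: succeed if a named environment appears in
# "conda env list". Comment lines (starting with #) are skipped and the
# first column of each remaining line is compared with the requested name.
conda_env_exists() {
    conda env list |
        awk -v name="$1" '!/^#/ && $1 == name { found = 1 } END { exit (found ? 0 : 1) }'
}
```

For example, conda_env_exists myenv || conda create -n myenv creates the environment only if it does not exist yet.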

Installing conda packages

Once you have activated a named environment, you can install packages with the conda install command, for example:

conda install gcc

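You can also request particular versions, for example conda install "numpy=1.26" (quoting the specification protects characters such as < and > from the shell). See the conda cheat sheet for details.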

To list the packages installed in the currently activated environment, you can type conda list.
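
After installing or pinning a package, you may want to confirm which version conda actually chose. The sketch below is a hypothetical helper, conda_pkg_version. It relies on conda list --full-name, which matches the package name exactly. It also assumes the usual column layout of conda list output: name, version, build, channel.

```shell
# Hypothetical helper: print the installed version of one package in the
# active environment, taken from the second column of "conda list" output.
# Returns non-zero if the package is not listed.
conda_pkg_version() {
    conda list --full-name "$1" |
        awk '!/^#/ && NF >= 2 { print $2; found = 1 } END { exit (found ? 0 : 1) }'
}
```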

Running packages from your conda environment

To run packages from a conda environment that you installed previously, you first need to activate the environment in the session you are using. You do not need to repeat the steps that create the environment or install the software, but in each new session you will need to run:

source ~/miniforge3/bin/activate

conda activate myenv
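
If you start many new sessions, you may find it convenient to wrap these steps in a shell function, for example in your ~/.bashrc. The sketch below assumes Miniforge is installed in the default ~/miniforge3 location. The function name activate_conda_env and the MINIFORGE_ROOT override variable are hypothetical. It sources etc/profile.d/conda.sh (the same script used in the SLURM section), which, as far as we are aware, only defines the conda shell function and does not activate base by itself.

```shell
# Hypothetical helper: make the conda shell function available in this
# session, then activate the requested environment (base if none is given).
activate_conda_env() {
    local root="${MINIFORGE_ROOT:-$HOME/miniforge3}"
    # conda.sh defines the "conda" shell function used below.
    . "$root/etc/profile.d/conda.sh" || return 1
    conda activate "${1:-base}"
}
```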

Installing pip packages

Many Python packages that are available via PyPI are also available as conda packages on conda-forge, and it is generally best to install these with “conda install” as above.

Nonetheless, you can also install pip packages (as opposed to conda packages) into your conda environment, but first you should type:

conda install pip

before typing the desired commands such as

pip install numpy

If you do not install pip into your sub-environment, then either:

  • your shell will fail to find the pip executable, or

  • your shell will find pip in your base environment, so pip packages will be installed into the base environment, potentially causing interference between your conda environments.

Explicitly installing pip into your sub-environment guards against this.
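
To check which pip your shell will actually use, you can compare its location with the active environment's prefix, which conda stores in CONDA_PREFIX. The sketch below is a hypothetical helper, check_env_pip, that performs this check.

```shell
# Hypothetical helper: check that the pip found on PATH belongs to the
# active conda environment, i.e. lives in $CONDA_PREFIX/bin.
check_env_pip() {
    local pip_path
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "no conda environment is active" >&2
        return 1
    fi
    pip_path="$(command -v pip)" || { echo "pip not found on PATH" >&2; return 1; }
    case "$pip_path" in
        "$CONDA_PREFIX"/bin/*) echo "ok: $pip_path" ;;
        *) echo "warning: $pip_path is outside $CONDA_PREFIX" >&2; return 1 ;;
    esac
}
```

Running pip as python -m pip install numpy is another common safeguard, because it uses the pip belonging to whichever python comes first on your PATH.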

Using conda with SLURM

To use conda environments within your SLURM script, you first need to source conda's profile script, so that the conda command is set up in the job's shell:

source ~/miniforge3/etc/profile.d/conda.sh
export PYTHONNOUSERSITE=1 # don't add python user site library to path

conda activate myenv
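
Putting these lines together, a complete batch script might look like the following sketch. The block writes it to a file named run_myenv.sbatch (a hypothetical name). The #SBATCH resource values and my_script.py are placeholders to adapt to your job and to your cluster's limits.

```shell
# Write an example SLURM batch script that activates a conda environment.
# Job name, time limit, memory, output file and my_script.py are placeholders.
cat > run_myenv.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=myenv-job
#SBATCH --time=01:00:00
#SBATCH --mem=4G
#SBATCH --output=myenv-job_%j.out

# Set up the conda command in the non-interactive batch shell.
source ~/miniforge3/etc/profile.d/conda.sh
export PYTHONNOUSERSITE=1  # don't add python user site library to path

conda activate myenv
python my_script.py
EOF
```

You can then submit the job with sbatch run_myenv.sbatch.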

Adding custom conda environments to Jupyter

On the command line, first create a conda environment and install the packages or software you need into it. Then install ipykernel into the environment and register it with Jupyter.

conda create --name myCondaEnvironment

conda activate myCondaEnvironment

conda install <packages/software of interest>

conda install ipykernel

python -m ipykernel install --user --name=myCondaEnvironment

Then, in Jupyter, load the custom environment via Kernel -> Change Kernel.
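
If you manage several environments, a small function can register one as a Jupyter kernel without activating it first. The sketch below uses a hypothetical name, register_conda_kernel, and assumes ipykernel is already installed in the environment. It uses conda run to execute python inside the named environment. The --display-name option sets the label shown in Jupyter's kernel menu.

```shell
# Hypothetical helper: register a conda environment as a Jupyter kernel.
# Assumes ipykernel is already installed in that environment.
register_conda_kernel() {
    local env_name="$1"
    conda run -n "$env_name" python -m ipykernel install --user \
        --name "$env_name" --display-name "Python ($env_name)"
}
```

You can list registered kernels with jupyter kernelspec list and remove one with jupyter kernelspec uninstall <name>.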

General Bioinformatics Tools

The following categories of bioinformatics tools are available through conda:

  • Read aligners (e.g., bwa, bowtie2)

  • Variant callers (e.g., freebayes, gatk, bcftools)

  • File format tools (e.g., samtools, vcftools)

  • GWAS tools (e.g., plink, gemma)

  • Visualization (e.g., igv, multiqc)

  • RNA-seq / transcriptomics (e.g., kallisto, salmon)

  • Assemblers (e.g., spades, megahit)
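
Most of the tools listed above are distributed through the bioconda channel, whose packages depend on conda-forge. The sketch below is a hypothetical helper, create_bio_env, that creates a dedicated environment with conda-forge listed before bioconda. With -c, channels given first take priority, and this order follows the bioconda project's recommendation. The environment name and packages in the usage note are only examples.

```shell
# Hypothetical helper: create a named environment for bioinformatics tools,
# giving conda-forge priority over bioconda.
create_bio_env() {
    local env_name="$1"
    shift
    conda create -n "$env_name" -c conda-forge -c bioconda "$@"
}
```

For example, create_bio_env bio samtools bwa creates an environment named bio containing samtools and bwa.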

Finding Bioinformatics Tools

There are several ways to find bioinformatics tools in conda:

  1. Search online (recommended for discovery)

    • Use the Anaconda package search, or browse a specific channel such as bioconda.

    • For example, search for plink, bcftools or samtools.

  2. Command-line search

    From your terminal:

    # Search the channels in your conda configuration
    conda search <package-name>
    
    # Example:
    conda search plink
    
    # If using Mamba (faster alternative to conda)
    mamba search plink
    

    To restrict search to a specific channel:

    conda search -c bioconda plink
    
  3. Get the full list (advanced)

    You can list everything in a channel, but the output is very large:

    # List all bioconda packages
    conda search --channel bioconda "*" | less
    

    Tip: pipe it through grep to find specific tools:

    conda search -c bioconda "*" | grep vcftools
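
Grepping works, but it prints the entire channel listing first. conda search also accepts wildcard patterns such as "*vcf*", so you can search for matching names directly. The sketch below is a hypothetical helper, bio_search, that searches both conda-forge and bioconda this way.

```shell
# Hypothetical helper: search conda-forge and bioconda for package names
# containing the given term, e.g. "bio_search vcf".
bio_search() {
    conda search -c conda-forge -c bioconda "*${1}*"
}
```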
    

Managing Conda Environments to Conserve Home Directory Storage

To save home directory storage space, we recommend creating Conda environments in a shared project directory. This keeps your environments alongside the project they belong to and, if needed, lets you share them with collaborators. If you do not yet have a shared project directory, please contact RTIS Solutions to request one.

To create Conda environments directly within your project directory (using the --prefix option), follow the guidelines below:

Creating a Conda Environment

Run the following command to create a new environment inside your project’s shared directory:

conda create --prefix /path/to/project_directory/env python

Because no Python version is specified, conda installs the newest version it can resolve from your configured channels. To request a particular version explicitly, use python=x.y instead (for example python=3.11).

Migrating an Existing Conda Environment

To move an existing Conda environment to a new location:

  1. Export your current environment to a YAML file:

    conda env export --name existing_env > environment.yml

  2. Create a new environment from the exported YAML file at your chosen location:

    conda env create --prefix /path/to/project_directory/env/conda_envs/myenv --file environment.yml

  3. Activate the newly created environment:

    conda activate /path/to/project_directory/env/conda_envs/myenv

  4. Once you have checked that the new environment works, remove the old one to free the space in your home directory:

    conda env remove --name existing_env
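
The export and re-create steps can be combined in a single function. The sketch below uses a hypothetical name, migrate_conda_env, and assumes mktemp is available for a temporary working directory. It does not remove the original environment.

```shell
# Hypothetical helper: copy a named environment to a new --prefix location
# by exporting it to YAML and re-creating it from that file.
migrate_conda_env() {
    local name="$1" prefix="$2" tmpdir status
    tmpdir="$(mktemp -d)" || return 1
    conda env export --name "$name" > "$tmpdir/environment.yml" &&
        conda env create --prefix "$prefix" --file "$tmpdir/environment.yml"
    status=$?
    # Clean up the temporary YAML file whether or not the steps succeeded.
    rm -rf "$tmpdir"
    return "$status"
}
```

For example, migrate_conda_env existing_env /path/to/project_directory/env/conda_envs/myenv performs steps 1 and 2 above.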

Creating an Alias for Easy Activation

To simplify environment activation, consider adding an alias to your shell configuration file (e.g., .bashrc or .bash_profile):

alias activate_myenv="conda activate /path/to/project_directory/env"

Activate your environment using the alias:

activate_myenv
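
If your project holds several environments, one shell function that takes the environment's short name may be easier to maintain than one alias per environment. The sketch below assumes the environments live under /path/to/project_directory/env/conda_envs/, as in the migration example. The function name pact and the PROJECT_CONDA_ENVS variable are hypothetical.

```shell
# Hypothetical function for ~/.bashrc: activate an environment stored under
# a shared project directory by its short name, e.g. "pact myenv".
PROJECT_CONDA_ENVS="/path/to/project_directory/env/conda_envs"
pact() {
    if [ -z "${1:-}" ] || [ ! -d "$PROJECT_CONDA_ENVS/$1" ]; then
        echo "no environment named '${1:-}' in $PROJECT_CONDA_ENVS" >&2
        return 1
    fi
    conda activate "$PROJECT_CONDA_ENVS/$1"
}
```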

Creating environments with --prefix in a project directory, and activating them by path or through an alias, is a convenient way to manage Conda environments in shared or collaborative project directories.