Custom Conda Environments

Introduction

If our conda environments don’t meet your needs, you can consider maintaining your own miniconda or miniforge installation in your home directory.

Please BE CAREFUL when installing this environment and have a read of this page, even if you’ve worked with conda before.

By default, this will initialise conda and modify your ~/.bashrc to activate the environment by default, which can lead to unexpected behaviour in our Strudel2 apps and batch submissions.

We reccommend the Tabby QoS terminals for managing environments as both conda and mamba are memory intensive processes, furthermore, some packages such as py-xgboost-gpu require a GPU to be present during install.

Conda vs Mamba

Conda is often used as a standard for maintaining python datascience environments, but because of the complexity of the environments typically needed, the conda solver can be known to hang (for 20 minutes or longer) especially if trying to install into an existing complicated environment, only to report that it was unable to solve due to version conflicts.

Mamba is a parallelised C++ rewrite of the slowest parts of conda, though while much faster (similar environments can solve in a minute or so), it can be unstable at times leading to unexpected errors.

Since a mamba installation comes compatible with conda and acts as a dropin replacement, we recommend that you start with a mambaforge installation and then fall back on conda commands whenever mamba fails. Simply replace the mamba keyword with conda.

In this tutorial we will cover installing a mamba environment.

1. Installing miniforge

  1. Following the recommendations on mamba’s installation page, we will be using the Miniforge distribution.

  2. Following the link through to the download page, copy the link for the following distribution:

    • OS: Linux
    • Architecture: x86_64 (amd64)
  3. Using the Strudel2 terminal application or your favourite terminal application using SSOSSH, log into the cluster.

  4. Download the distribution to your home directory and run the installer

    wget <https://.../Miniforge3-Linux-x86_64.sh> # Replace this URL with the one you copied
    bash Miniforge3-Linux-x86_64.sh
  5. Following the prompts from the installer:

    1. Accept the license agreement
    2. Choose where you’d like your installation to be kept.
      (By default your home directory)

    The installer will prompt you to initialise conda. This is NOT recommended.

    This will edit your ~/.bashrc file so the environment will activate when you log in, which can lead to unexpected interactions with batch jobs on cluster like MLeRP.

    If you have done so by accident you can remove the additions to your .bashrc file manually (this is located in your home directory).

  6. You can now activate your environment using one of the following commands:

    # Absolute paths are preferred for batch jobs or config files
    source /path/to/install/...miniforge/bin/activate
    
    # Relative paths may be more convenient in everyday usage
    source ./miniforge/bin/activate 
    
    # `.` works as a shorthand for the source command
    . ./mambaforge/bin/activate 

2. Installing your packages

Both mamba and conda environments are intended to leave their base environments untouched, so when maintaining your environments it is good practice to create a new one before installing any packages.

To encourage reproducability it is recommended to use an environment.yml file to maintain your packages. If you’d like a reference, the recipes to our environments are stored in /apps/conda-envs.

We have created the custom.yml recipe for you to use as a minimal install that will be compatible with our Strudel2 apps. Alternatively, you could opt to use one of our other environments as a starting point.

  1. Copy your chosen environment file to your home directory and modify it to suit your requirements. Make sure to uncomment the pip section if you need pipto install packages.

    cp /apps/conda-envs/custom.yml ~/custom.yml
    nano ~/custom.yml
  2. Then create the environment off the file. If you didn’t change the envname in the last step, by default this will create an environment with the name ‘custom’.

    mamba env create -f ~/custom.yml
  3. You can now activate the environment we’ve created whenever you need to work with your scripts.

    # List all available environments
    conda env list
    
    # Activate a selected environment
    conda activate <environment_name>
  4. You can then install packages manually if you’re still experimenting with new packages, or update your install with the environment file.

    mamba install package_name
    pip install package_name
    mamba env update -f custom.yml

Conda’s solver can result in different package versions if installing packages in a different order. Once you have confirmed which packages you need, it is good practice to work out an environment.yml that will build your environment in one command.

Alternatively if your application requires multiple install stages, consider a script to build the environment so that your work is reproducible.

For more details about environment file syntax, have a look at conda’s documentation on managing environments.

3. Integrating with Strudel2

Our Jupyter Lab application should recognise the conda environment that has just been created and you will be able to select it from the environment dropdown. If the environment that you’re looking for is missing, check that its path is included in your ~/.conda/environments.txt file in your home directory.

If it’s missing, append the path to your environment to the bottom of the file.

echo /path/to/your/environment >> ~/.conda/environments.txt