Conda is a package and environment management system that helps you build robust, self-contained environments using pre-compiled packages. Conda has the following features:

  • Conda can quickly install, run, and update packages and associated dependencies.
  • Conda can create, save, load, and switch between project-specific software environments.
  • Although Conda is mainly used for Python-based software, it can package and distribute software for any language, such as C, C++, Fortran, R, and Java.

As a package manager, Conda helps you find and install packages. If you need a package that requires a different version of Python (or of any other package), you do not need to switch to a different environment manager, because Conda is also an environment manager. With just a few commands, you can set up a completely separate environment to run that different version of Python, while continuing to run your usual version of Python in your normal environment.
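As a rough sketch of the "few commands" mentioned above, the following previews a separate environment with a different Python version. The environment name otherpython and the version 3.9 are illustrative assumptions, not values from this guide; drop --dry-run to actually create the environment.

```shell
# Hypothetical environment name and Python version, chosen for illustration.
env_name="otherpython"
python_spec="python=3.9"

if command -v conda >/dev/null 2>&1; then
    # --dry-run only shows what would be installed; nothing is created.
    conda create --dry-run -n "$env_name" "$python_spec"
else
    echo "conda is not available in this shell"
fi
```

Your usual environment is left untouched; the new one only becomes active once you run conda activate with its name.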

Packages can come from different sources, called channels. The most popular of these channels is Anaconda (which is often confused with Conda), but in the scientific community conda-forge, which hosts thousands of packages, is more common. Advanced users can also create their own channels (on a server or using the local filesystem).

Conda installation

Conda is made available at ECMWF through the module system, so you don't need to install it to use it on HPCF.

However, you may also want to install it on your work/personal computer to create and organise your environments. The simplest way is to download the Miniconda distribution:

curl -LO https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh # Linux
curl -LO https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh # macOS M1
curl -LO https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh # macOS Intel

Or even on Windows if you're feeling wild: Miniconda3-latest-Windows-x86_64.exe

Just run the script to install Conda. It will update your shell configuration, so you'll need to reload your shell session to access it. For Linux (the approach is similar on macOS):

bash ./Miniconda3-latest-Linux-x86_64.sh # see -h for options such as -p PREFIX
source ~/.bashrc

If that works, you'll see a (base) at the start of your prompt, which tells us we are in the "base" conda environment.
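If the prompt does not change, a quick check along these lines can help tell whether the installation is visible to your shell. This is only a sketch; the conda_status variable is a hypothetical helper introduced here for illustration.

```shell
# Check whether the conda command is on PATH before querying it.
if command -v conda >/dev/null 2>&1; then
    conda --version      # prints the installed Conda version
    conda info --base    # prints the path of the base installation
    conda_status="found"
else
    echo "conda not found: reload your shell or check the installer output"
    conda_status="missing"
fi
```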

Creating environments

There are multiple ways of creating environments: from a file containing the list of packages and their versions, by providing the list on the command line, by creating an empty environment and adding the packages one by one, etc.

Let's start with a basic environment containing Python and NumPy:

conda create -n myfirstenv python numpy

You can then load it using:

conda activate myfirstenv

By default, the environment is stored in the envs directory of the Conda installation (or, if you can't write to the Conda directory, in a .conda directory in your home directory). To install the environment in a dedicated path, you can use the following:

conda create -p /path/to/envs/myfirstenv python numpy

and load it by specifying the full path:

conda activate /path/to/envs/myfirstenv

Once the environment is loaded, you can install more packages:

conda install xarray pandas matplotlib

To list the packages in your environment:

conda list

Once you have a stable environment, you can export it to a YAML file:

conda env export > myfirstenv.yaml

You can then use this file to reproduce the environment exactly, share it with other users, use it as a base for other environments, etc. To create a new environment from a YAML file, run the following command:

conda env create -n mysecondenv -f myfirstenv.yaml
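An environment file does not have to come from conda env export; it can also be written by hand. The sketch below writes a minimal one and, if Conda is available, creates the environment from it. The file name environment.yaml and the environment name myhandwrittenenv are hypothetical.

```shell
# Write a minimal, hand-written environment file (hypothetical name and contents).
cat > environment.yaml <<'EOF'
name: myhandwrittenenv
dependencies:
  - python
  - numpy
EOF

# Create the environment from the file if conda is available.
if command -v conda >/dev/null 2>&1; then
    conda env create -f environment.yaml
fi
```

Unlike an exported file, a hand-written file like this one pins no versions, so Conda picks the latest compatible ones each time the environment is created.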

To unload your environment, just run:

conda deactivate

To remove your environment:

conda env remove -n myfirstenv

Finally, to see the list of available environments:

conda env list

See Conda documentation and tutorials for more ways to create and manage environments.

Installing packages

Let's go back to our first environment, with only Python and NumPy installed. When you install the following packages:

conda install xarray pandas matplotlib

you will see that Conda also installs all of the packages' dependencies and automatically resolves conflicts between them. This is why adding another package to the list above may give you a completely different environment: the versions of the shared dependencies that satisfy both lists of packages may be different.

You can also request specific versions (Conda will try to get the latest by default):

conda install xarray pandas=1.3 matplotlib=3.4

This is perfectly valid, but it makes Conda's job harder, as it has to solve a more constrained dependency problem. Over-constraining the environment quickly leads to conflicts, so only specify the versions you really need.
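To get a feel for how much a constraint restricts the solver, you can preview an install without changing anything. The sketch below loops over three forms of version specification for pandas; the specific versions are illustrative assumptions.

```shell
# Three ways of constraining a version, from loosest to strictest:
#   pandas=1.3        any 1.3.x release
#   pandas>=1.3,<2.0  a range (quote it so the shell does not treat < and > as redirections)
#   pandas==1.3.5     exactly this release
for spec in "pandas=1.3" "pandas>=1.3,<2.0" "pandas==1.3.5"; do
    echo "Previewing: $spec"
    if command -v conda >/dev/null 2>&1; then
        # --dry-run reports what would change without installing anything.
        conda install --dry-run "$spec"
    fi
done
```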

To remove a package, simply use:

conda remove xarray pandas matplotlib

Conda will also remove any packages that depend on them, and may remove dependencies that are no longer needed.

To search for a package:

conda search xarray

To get more information about a package (dependencies, timestamp, etc.):

conda search xarray --info

This last command can be very useful for debugging dependency conflicts.
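When debugging a conflict, it often helps to narrow the search to the versions you care about. A minimal sketch, assuming you are interested in the pandas 1.3 series on conda-forge (both are illustrative choices):

```shell
# Version range to inspect (illustrative); quoted so the shell leaves < and > alone.
query="pandas>=1.3,<1.4"

if command -v conda >/dev/null 2>&1; then
    # List the matching builds with their dependencies.
    conda search "$query" -c conda-forge --info
fi
```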

Specifying channels

When installing packages using the commands above, you will use the default channels (Anaconda's channels if you installed Conda with Miniconda, or the ECMWF channels on the ECMWF HPC). On your personal/work machine (not on the HPC), it is recommended to switch to conda-forge (https://conda-forge.org/docs/user/introduction.html), which contains many scientific packages and is better suited to our needs.

This can be done at the command line level by specifying the channel by hand:

conda install xarray pandas matplotlib -c conda-forge

Or add it permanently to your list of channels using:

conda config --add channels conda-forge

It is recommended to stick to one channel to avoid conflicts. Only use multiple channels if you know they are compatible with each other.
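On a personal or work machine, a single-channel setup could look like the following in your Conda configuration file. This sketch writes the sample to a scratch file, condarc.example (a hypothetical name), rather than touching your real ~/.condarc, and then shows the channels currently configured if Conda is available.

```shell
# Write the sample configuration to a scratch file instead of overwriting ~/.condarc.
cat > condarc.example <<'EOF'
channels:
  - conda-forge
channel_priority: strict
EOF

# Show the channels currently configured, if conda is available.
if command -v conda >/dev/null 2>&1; then
    conda config --show channels
fi
```

With channel_priority set to strict, packages from a higher-priority channel always take precedence over the same package in lower-priority channels, which reduces the risk of mixing incompatible builds.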

On the ECMWF HPC infrastructure, we use dedicated default channels, which are based on conda-forge. It is recommended to stick with those channels. For more information, see Conda at ECMWF.

Conda and pip

Some packages are not available through Conda but can be installed with pip. Conda and pip are compatible, and users are free to activate their Conda environments and install packages through pip as well. Be cautious when doing so, as it can lead to dependency issues. It is always best to install packages through Conda whenever possible. Users can also use pip to install their own packages rather than creating a Conda recipe, which requires more work.

The best practice is to first install all the base packages using Conda and then install on top of that the remaining packages using pip.
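One way to record this ordering is an environment file with a pip subsection, which Conda processes after installing the Conda packages. The sketch below only writes and prints such a file; the file name, the environment name, and my-pip-only-package are hypothetical placeholders.

```shell
# Write an environment file that installs Conda packages first, then pip packages.
cat > mixed-environment.yaml <<'EOF'
name: mixedenv
dependencies:
  # Base packages installed by Conda first
  - python
  - numpy
  - pip
  # Remaining packages installed by pip on top
  - pip:
      - my-pip-only-package   # hypothetical placeholder, replace with a real package
EOF

# Print the file for inspection. After replacing the placeholder, it can be used with:
#   conda env create -f mixed-environment.yaml
cat mixed-environment.yaml
```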


Useful links

Introduction

https://towardsdatascience.com/introduction-to-conda-virtual-environments-eaea4ac84e28

https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html

https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html

Cheat sheet

https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf

Building your own recipe

Conda: how to create recipes and channels

https://docs.conda.io/projects/conda-build/en/latest/concepts/recipe.html#more-information

https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html

Conda config

https://docs.conda.io/projects/conda/en/latest/commands/config.html

https://docs.conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html