The current version of RSQKit is a work in progress. No content should be considered final at this stage.

Your tasks: Reproducible software environments

What are reproducible software environments?

Reproducible software environments are crucial for ensuring that software behaves consistently across different systems. This matters especially in research, where others need to be able to rerun your software and check your results.

Here are some popular tools and approaches for creating reproducible software environments based on their scope and usage:

  • Programming language-specific environments - focus on managing dependencies and development/runtime environments for specific programming languages, ensuring consistent behavior of software across different systems. Extremely useful when you are developing or modifying other people’s software.
  • Containerised environments - use containers to encapsulate software and its dependencies, ensuring the software runs consistently across different systems regardless of the underlying host operating system.
    • Docker - creates lightweight, isolated containers for packaging software and its dependencies.
    • Singularity, Apptainer - focus on containerisation for high-performance computing (HPC) and scientific computing.
    • Docker Compose - manages multi-container Docker applications, facilitating reproducible environments with multiple services.
  • System-level environments - work at the system level, ensuring that the entire system, including the operating system and application configurations, is reproducible.
    • Vagrant - creates reproducible virtualised environments using configuration scripts to define virtual machines.
    • NixOS - ensures reproducible environments with declarative package management, tracking all system dependencies and configurations - embodying the “operating system as code” philosophy which treats the entire operating system, including its configuration and infrastructure, as code that can be managed, versioned, and deployed like any other software application.
    • Packer - automates the creation of consistent machine images, supporting multiple platforms.
  • Workflow-oriented environments - geared toward creating reproducible environments for scientific research, bioinformatics, and complex workflows.
    • Workflow Description Language - a language to define reproducible research workflows, ensuring that pipelines run consistently across systems.
    • Galaxy - open-source platform for FAIR data analysis that enables users to access and collect data from reference databases, external repositories and other data sources, and to use tools from various domains.

Code produced by researchers is sometimes not packaged in a library, package or container that you can readily run on your system. You may also want to look at the source code and modify it. In these cases, you need to download the code and reproduce its programming language-specific environment in order to run it. In the rest of this document, we focus on programming language-specific environments, also known as virtual software development environments.

What are virtual software development environments?

A virtual software development environment helps us create an isolated working copy of a software project that uses a specific version of a programming language interpreter/compiler (e.g. Python 3.10 or Python 3.12) together with specific versions of the external libraries (dependencies) that our software requires, all installed into that virtual environment.

Virtual environments are typically implemented as sub-directories with a particular structure within your software project (note that some tools place virtual environments outside your software project instead). They contain (or link to) the language interpreter and the installed dependencies, isolating the project from other software projects on your machine that may require different versions of the same programming language or external libraries.
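As a minimal sketch of this idea for Python, assuming Python 3.9+ and only the standard library `venv` module, the code below creates an isolated environment in a `.venv` sub-directory of a project and points at the two things that make it a virtual environment: the `pyvenv.cfg` marker file and the environment's own interpreter. The directory name `.venv` and the helper names `create_project_env` and `env_python` are illustrative choices, not something Python requires.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir: Path, env_name: str = ".venv") -> Path:
    """Create an isolated virtual environment in a sub-directory of project_dir.

    with_pip=False keeps this sketch fast and offline; pass with_pip=True
    to bootstrap pip into the new environment (requires ensurepip).
    """
    env_dir = project_dir / env_name
    # EnvBuilder links or copies the *currently running* interpreter into env_dir;
    # clear=True deletes the contents of any existing environment at that location.
    builder = venv.EnvBuilder(with_pip=False, clear=True)
    builder.create(env_dir)
    return env_dir


def env_python(env_dir: Path) -> Path:
    """Return the path of the interpreter that belongs to env_dir."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_project_env(Path(tmp))
        # pyvenv.cfg is the marker file that identifies a virtual environment
        print((env_dir / "pyvenv.cfg").read_text())
        print("Environment interpreter:", env_python(env_dir))
```

From a terminal, the equivalent is `python -m venv .venv`, after which the environment can be activated with `source .venv/bin/activate` (or `.venv\Scripts\activate` on Windows).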

Why should you use virtual software development environments?

Description

Software applications often rely on external libraries that you need to install and manage on your system as you develop your software. Software sometimes needs a specific version of an external library (e.g. because it was written to work with a feature, class or function that has changed in more recent versions), or a specific version of the programming language interpreter/compiler (e.g. consider legacy software requiring Python 2 vs. new applications written in Python 3).

This means that each software application you develop on your machine may require a different setup and set of dependencies, so it is useful to keep these configurations separate to avoid confusion between projects. The solution to this problem is to create a self-contained virtual environment per project, which contains a particular version of your programming language interpreter/compiler plus a number of additional external libraries.

Another big motivator for using virtual environments is that they make sharing your code with others (users or developers) much easier. By sharing a description of your virtual development environment you enable others to quickly replicate the same environment on their machines and run or further develop your software - making your work portable, reusable and more reproducible.
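One common way to share a description of a Python environment is a `requirements.txt` file that lists exact package versions. The sketch below, assuming Python 3.9+, builds such a list with the standard library `importlib.metadata` module. It is similar in spirit to `pip freeze`, but it does not filter out any packages, so its output may differ slightly. The function names `freeze_environment` and `write_requirements` are hypothetical.

```python
from importlib.metadata import distributions


def freeze_environment() -> list[str]:
    """Return sorted 'name==version' pins for every distribution visible
    to the running interpreter (similar in spirit to `pip freeze`)."""
    pins: dict[str, str] = {}
    for dist in distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if not name:
            continue  # skip broken installs that lack a Name field
        # Keep the first match: distributions() follows sys.path order,
        # so the first copy found normally takes precedence on import
        pins.setdefault(name.lower(), f"{name}=={dist.version}")
    return sorted(pins.values(), key=str.lower)


def write_requirements(path: str = "requirements.txt") -> None:
    """Write the pins to a requirements file that others can install from."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(freeze_environment()) + "\n")


if __name__ == "__main__":
    for line in freeze_environment():
        print(line)
```

Someone else can then recreate the environment by running `python -m pip install -r requirements.txt` inside a fresh virtual environment.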

Considerations

  • As more external libraries are added to your software project over time, you can add them to its specific virtual environment and avoid a great deal of confusion by having separate (smaller) virtual environments for each project rather than one huge global environment on your machine with potential package version clashes.
  • You have an older project that only works under, e.g., Python 2. You do not have the time to migrate the project to Python 3 or it may not even be possible as some of the third party dependencies are not available under Python 3. You have to start another project under Python 3. The best way to do this on a single machine is to set up two separate Python virtual environments.
  • One of your Python 3 projects is locked to a particular older version of a third-party dependency. You cannot use the latest version of the dependency as it breaks things in your project. In a separate branch of your project, you want to try to fix the problems introduced by the new version of the dependency without affecting the working version of your project. You need to set up a separate virtual environment for your branch to ‘isolate’ your code while testing the new version of the dependency.
  • Most of the time, you do not have to worry about the specific versions of the external libraries your project depends on: if you do not specify a version, the package manager installs the latest available one into your virtual environment. Should you need to, virtual environments also let you use a specific older version of a package for your project.
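When a project is locked to a particular older version of a dependency, it helps to verify that the active environment really contains that version. The following is a minimal sketch, assuming Python 3.9+ and using only `importlib.metadata.version`; the `check_pins` helper and the example NumPy pin are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins: dict[str, str]) -> dict[str, str]:
    """Compare exact version pins against the active environment.

    Returns {package: problem} for every pin that is not satisfied;
    an empty dict means the environment matches the pins.
    """
    problems: dict[str, str] = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = f"not installed (wanted {wanted})"
            continue
        # Exact string comparison: '1.0' and '1.0.0' count as different here
        if installed != wanted:
            problems[name] = f"installed {installed}, wanted {wanted}"
    return problems


if __name__ == "__main__":
    # Hypothetical pin for a project locked to an older NumPy release
    print(check_pins({"numpy": "1.26.4"}) or "environment matches the pins")
```

Because the comparison is an exact string match, equivalent spellings of a version are reported as mismatches; the third-party `packaging` library offers proper version parsing if that matters for your project.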

Solutions

  • Make your research software reusable and your research that relies on that software reproducible by setting up and sharing its virtual development environment.

How do you create virtual software development environments?

Description

Most modern programming languages use some kind of virtual environments or a similar mechanism to isolate libraries or dependencies for a specific project, making it easier to develop, run, test and share code with others.

Part of managing a virtual software development environment involves installing, updating and removing external packages on your system. You would need a package manager tool for your programming language to be able to do that - this is typically a command line tool that you invoke from a command line terminal. In addition to a package manager, you will need another command line tool to create and manage virtual environments on your machine. Sometimes, a package manager combines both of these functionalities and you only need to install one extra tool on your system.
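To make this division of labour concrete for Python: the environment manager (Venv) provides an isolated interpreter, and the package manager (Pip) is run through that interpreter so that packages are installed in the right place. The sketch below, assuming Python 3.9+ and the standard venv directory layout on POSIX or Windows, checks whether the current interpreter is inside a virtual environment and builds (without running) the matching `pip install` command. The helper names and the `.venv` location are hypothetical.

```python
import sys
from pathlib import Path


def in_virtual_env() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def pip_install_command(env_dir: Path, requirements: str = "requirements.txt") -> list[str]:
    """Build, but do not run, a pip command that installs into env_dir.

    Running pip as `<environment's python> -m pip` ensures packages are
    installed into that environment, not wherever the `pip` on PATH points.
    """
    if sys.platform == "win32":
        python = env_dir / "Scripts" / "python.exe"
    else:
        python = env_dir / "bin" / "python"
    return [str(python), "-m", "pip", "install", "-r", requirements]


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtual_env())
    # Hypothetical project layout with the environment stored in ./.venv
    print(" ".join(pip_install_command(Path(".venv"))))
```

The returned list could be passed to `subprocess.run(cmd, check=True)` to actually perform the installation; keeping command construction separate from execution makes the sketch easy to inspect and test.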

Considerations

  • There are often multiple package and environment management tools even for a single programming language:
    • For example, commonly used tools for managing Python packages and virtual environments are Pip (the Python package manager, which obtains packages from the central repository called the Python Package Index (PyPI)) and Venv (the Python virtual environment manager, included in the standard library since Python 3.3). One alternative is Poetry, a modern Python packaging tool which also installs Python packages from PyPI and handles virtual environments automatically. Also check UV, a single, fast Python package and project manager built to replace Pip and Venv.
    • If your Python code relies on non-Python packages (for instance, when some C++ libraries must also be installed) and you want to support multiple platforms, a better choice may be Conda, a package and environment management system that is part of the Anaconda Python distribution (often used by the scientific community). Conda has its own package repository system, separate from PyPI, that also distributes non-Python packages (Pip can still be used inside Conda environments), and its own virtual environment system that is not based on Venv.
    • If you are using R, consider Renv, which helps you build reproducible environments for your R projects.
    • For the Julia programming language, check Pkg.jl; for C++, check Conan; for Java, check Maven; for Ruby, check Bundler.
    • There are also some generic tools to have a look at, e.g. Spack, NixOS and Guix.
  • You need to decide which tools are best for you, based on your personal preferences or on what the software project and your team or community already use (so you can get help when you need it). Not using virtual environments at all, or mixing different tools to manage them, can lead to a ‘spaghetti’ setup in which you do not know which dependencies are being used, causing issues when running and debugging code.

Solutions

  • Decide on and start using a package manager tool and a virtual environment management tool for your programming language.

Tools and resources on this page

  • Apptainer - Apptainer (formerly Singularity) simplifies the creation and execution of containers, ensuring software components are encapsulated for portability and reproducibility, especially in High Performance Computing (HPC) environments.
  • Bundler - Bundler is a Ruby gems and environment management tool for Ruby projects.
  • Conan - Conan is an open source, decentralised and multi-platform package manager for C and C++ that allows for creating and sharing native binaries.
  • Conda - Open-source, cross-platform, language-agnostic package manager and environment management system, originally developed to solve package management challenges faced by Python data scientists.
  • Docker - Docker is a tool for creating isolated environments for software, called containers, so that software runs consistently across platforms. Docker allows developers to build, share, run and verify applications easily. Docker Hub is a repository for sharing and managing container images. Related pages: Continuous Integration... Creating a good README
  • Docker Compose - Docker Compose is a tool for defining and running multi-container applications for a streamlined and efficient development and deployment experience.
  • Galaxy - Galaxy is a free, open-source system for analysing data, authoring workflows, training and education, publishing tools, and managing infrastructure.
  • Maven - Maven is a software project management and build automation tool used primarily for Java, but it can also be used to build and manage projects written in C#, Ruby, Scala, and other programming languages.
  • NixOS - NixOS is a free and open-source Linux distribution based on the Nix package manager. It uses declarative configuration (using the Nix language) to manage packages and the entire system environment, allowing for reproducibility and portability.
  • Packer - Packer is an automated build system for creating identical images for containers and virtual machines (on multiple platforms) from a single source configuration, embodying the ‘Images as Code’ philosophy. Packer is lightweight, runs on every major operating system, and is highly performant, creating machine images for multiple platforms in parallel.
  • Pip - Package manager for Python packages.
  • Pkg.jl - Pkg.jl is a package manager for the Julia programming language.
  • Poetry - Python packaging and dependency management tool.
  • Python Package Index (PyPI) - Official third-party software repository for Python packages. Related pages: Packaging & releasing ...
  • Renv - Package that helps you create reproducible environments for your R projects; use renv to make your R projects more isolated, portable and reproducible.
  • Singularity - Singularity is an open source container platform that allows you to create and run containers that package up pieces of software in a way that is portable and reproducible. Singularity is designed for ease of use on shared multiuser systems and in High Performance Computing (HPC) environments. Singularity is compatible with Docker images and can be used with GPUs and MPI applications.
  • UV - UV is an extremely fast Python package and project manager, written in Rust.
  • Vagrant - Vagrant is a tool that simplifies and provides a single workflow for the creation and management of virtual machines, and provides full VM isolation. Related pages: Continuous Integration...
  • Venv - Python module for creating lightweight “virtual environments”, each with its own independent set of Python packages installed in its site directories.
  • Workflow Description Language - Workflow Description Language (WDL) is an open standard for describing data processing workflows with a human-readable and writeable syntax.
Contributors

How to cite this page

Aleksandra Nenadic, Simon Christ, "Reproducible software environments". everse.software. http://everse.software/RSQKit/reproducible_software_environments (accessed 27 March 2025).