tasks: Archiving software

How to ensure long-term reproducibility and access to research software?

In research domains that rely heavily on computation, software is not just a tool — it is an integral part of the scientific process. However, software is inherently fragile: it evolves rapidly, becomes deprecated, and often depends on specific environments, libraries, or hardware. As a result, many research outputs become irreproducible or unusable within just a few years due to the lack of access to the original software environment. Without systematic software archiving, research risks losing critical components of its provenance, making long-term validation, replication, and reuse of results impossible.

Putting code on GitHub or GitLab (or any similar code hosting service) is good practice for code sharing, versioning and even code packaging, but it is not enough for long-term software archiving. These are commercial services: if they change their policies, remove repositories (e.g. for inactivity or security reasons), or even shut down (which has happened to code sharing platforms in the past), your code could disappear.

Archival means long-term preservation independent of any one platform.

Benefits for research communities

Implementing proper software archiving practices brings significant value:

  • Reproducibility - future researchers can rerun computational experiments with the exact same software stack.
  • Preservation of scientific value - the loss of valuable tools, simulations or models that underpin published work is prevented.
  • Compliance with Open Science mandates - meeting funder and journal requirements for software availability and preservation.
  • Collaboration and reuse - archived software can be rediscovered, cited, and reused by other researchers, accelerating innovation.

Considerations

Effective software archiving in research is more complex than simply saving source code. It requires addressing multiple interrelated technical aspects:

  • Environment preservation - dependencies on compilers, libraries (e.g., NumPy, R packages), OS-level features, and system architectures must be captured.
  • Build reproducibility - binary reproducibility is often non-trivial due to non-deterministic build processes or missing historical dependencies.
  • Versioning and provenance - capturing software version history, commit hashes, and linkages to specific datasets or publications is essential.
  • Emulation and virtualisation - for legacy software, virtual machines or emulators may be necessary to recreate the execution environment.
  • Licensing constraints - proprietary software dependencies can limit what can legally be archived and shared.
  • Metadata and documentation - proper archival demands machine- and human-readable metadata, including usage instructions, authorship, and configuration settings.
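
Some of the environment, provenance and metadata details listed above can be collected automatically. The following is a minimal sketch, assuming a Python-based project that may live in a Git working copy. The function names `current_commit` and `capture_snapshot` and the recorded field names are illustrative choices, not part of any standard, and the snapshot covers only Python packages rather than compilers or OS-level libraries.

```python
import json
import platform
import subprocess
from datetime import datetime, timezone
from importlib import metadata


def current_commit(repo_dir="."):
    """Return the current Git commit hash, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, or repo_dir is not inside a Git repository.
        return None
    return result.stdout.strip()


def capture_snapshot(repo_dir="."):
    """Collect a machine-readable description of the running environment."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            packages[name] = dist.version
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "commit": current_commit(repo_dir),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be stored next to the results.
    print(json.dumps(capture_snapshot(), indent=2))
```

Storing such a snapshot alongside published results does not make a build reproducible by itself, but it records which versions were actually used, which is often the first thing a later reader needs to know.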

Archival solutions for research software

Several archival solutions for research software are emerging:

  • Software Heritage provides a universal archive of source code, capturing the development history of open-source software at scale.
  • ReproZip captures the execution environment of research software, enabling portability and reproducibility across platforms.
  • Guix and Nix (the package manager on which NixOS is based) are functional package managers that enable reproducible builds and isolated software environments.
  • Containers (e.g., Docker, Singularity) are popular tools for bundling applications with dependencies, especially in high-performance computing.
  • VM snapshots are used when containerisation is not feasible, particularly for GUI-based or legacy software.
  • Institutional repositories and Zenodo provide DOI-backed software archiving linked to publications, ensuring persistent citation and access.
  • RO-Crate deserves an honourable mention here: while it is not an archival mechanism, it is a critical metadata format that ensures archived items (e.g., workflows) are described, understandable and reusable.
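
To make the role of RO-Crate as a metadata format more concrete, the sketch below builds a minimal `ro-crate-metadata.json` document for a piece of archived software. It follows the general layout of the RO-Crate 1.1 specification (a metadata descriptor entity plus a root dataset entity), but it is only an illustration: the function name `build_crate_metadata`, the file `analysis.py` and all field values are hypothetical, and a real crate should be checked against the specification.

```python
import json


def build_crate_metadata(name, description, date_published, license_url, files):
    """Return a minimal RO-Crate 1.1 style metadata document as a dict.

    `files` maps relative file paths to short human-readable names.
    """
    # Descriptor entity: identifies this file as the crate's metadata file.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    # Root dataset entity: describes the archived item as a whole.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        "license": {"@id": license_url},
        "hasPart": [{"@id": path} for path in files],
    }
    # One entity per file contained in the crate.
    file_entities = [
        {"@id": path, "@type": "File", "name": label}
        for path, label in files.items()
    ]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *file_entities],
    }


if __name__ == "__main__":
    crate = build_crate_metadata(
        name="Example analysis code",  # hypothetical values throughout
        description="Scripts used to produce the figures of an example study.",
        date_published="2024-01-01",
        license_url="https://spdx.org/licenses/MIT",
        files={"analysis.py": "Main analysis script"},
    )
    print(json.dumps(crate, indent=2))
```

The resulting JSON would be saved as `ro-crate-metadata.json` in the top-level directory of the material being deposited, so that the description travels with the files wherever they are archived.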

Conclusion

Software archiving is now a foundational component of digital research infrastructure. As the scientific community moves toward open, reproducible, and FAIR (Findable, Accessible, Interoperable, Reusable) principles, robust software preservation practices are essential. Researchers must adopt workflows and tools that not only produce results but also ensure those results can be trusted and reused decades from now.

Tools and resources on this page

  • Docker - Docker is a tool for creating isolated environments (application isolation) for software development called containers to enable consistent software running across platforms. Docker allows developers to build, share, run and verify applications easily. DockerHub is a repository for sharing and managing container images. Related pages: Continuous Integration...; Creating a good README; Reproducible software ...
  • GitHub - GitHub is a platform that allows developers to create, store, manage, and share their code. It uses Git to provide distributed version control. GitHub provides access control, bug tracking, software feature requests, task management, continuous integration, and wikis for every project. Related pages: Performing a code review; Computational workflows; Documenting software; Documenting software u...; Releasing software; Software documentation; Using version control
  • GitLab - DevOps platform that enables teams to collaborate, plan, develop, test, and deploy software using an integrated toolset for version control, CI/CD, and project management. Related pages: Performing a code review; Computational workflows; Documenting software; Documenting software u...; Packaging & releasing ...; Releasing software; Software documentation; Using version control
  • Guix - Package manager.
  • NixOS - NixOS is a free and open-source Linux distribution based on the Nix package manager. It uses declarative configuration (using the Nix language) to manage packages and the entire system environment, allowing for reproducibility and portability. Related pages: Reproducible software ...
  • ReproZip - ReproZip automatically packs research along with all necessary data files, libraries, environment variables and options into a self-contained bundle which can be used to set up the same original environment so anybody can reproduce the research on a different machine, without tracking down and installing the dependencies, or even having to run the same operating system.
  • Singularity - Singularity is an open source container platform that allows us to create and run containers that package up pieces of software in a way that is portable and reproducible. Singularity is designed for ease-of-use on shared multiuser systems and in High Performance Computing (HPC) environments. Singularity is compatible with all Docker images and it can be used with GPUs and MPI applications. Related pages: Reproducible software ...
  • Software Heritage - Software Heritage archive is the largest public collection of source code in existence. It collects, preserves, curates and makes available software in source code form as cultural heritage. Related pages: Software metadata
  • Zenodo - Zenodo is a general-purpose open repository developed under the European OpenAIRE program and operated by CERN. It allows researchers to deposit research papers, data sets, research software, reports, and other research-related digital artefacts. Related pages: Documenting software; Releasing software; Software documentation; Software identifiers; Software metadata

How to cite this page

Aleksandra Nenadic, "Archiving software". everse.software. http://everse.software/RSQKit/archiving_software .