What is software metadata?
Software metadata is structured data that provides information about a software application, its components, and its behavior. It describes various attributes of the software, including:
- Name & version: identifies the software and its release version.
- Authors: details about the developers or organisations that created the software.
- Dependencies: lists required libraries, frameworks, or other software needed for proper functioning.
- License: specifies the software license (e.g., MIT, BSD, GPL, proprietary).
- Build & runtime information: includes details like operating system compatibility, architecture (e.g., 32-bit or 64-bit), and runtime requirements.
- Support and maintenance information: contact details to get support and questions answered.
Considerations
Providing metadata with your software matters because it supplies crucial context, typically in machine-readable form, about the software and its components. This improves its discoverability, reusability and interoperability with other tools.
The type of metadata you need from software depends on your specific use case.
- If your main focus is academic credit for software, citation metadata is most important.
- If you’re aiming to replicate an analysis, versioning and dependencies matter more than authorship or titles.
- If you’re searching for new software suited to a particular task, keywords and descriptions are most relevant.
Often, developers of scientific software, the repositories that host it, and its users have multiple objectives and need to balance several of these needs at once.
Solutions
There are various kinds of software metadata standards with slightly different purposes and use cases, some of which are listed below.
General software metadata standards, for example:
- CodeMeta – a standardised JSON-LD metadata format for describing software projects to support the preservation, discovery, reuse, and attribution of research software.
- SPDX – used for documenting software licenses, components, and security information.
- Dublin Core – a general purpose metadata standard and vocabulary for describing resources of any type (first developed for describing web content in the early days of the World Wide Web), now often used for software documentation.
- Schema.org – promotes schemas for structured data on the Internet, including software applications and digital assets. For example, the BioSchemas Computational Tool profile lets you describe how to run software, including its input and output parameters.
Software development, build, package and dependency metadata helps developers track software versions, dependencies, and compatibility, making development, running and maintenance easier. For example:
- PyPI metadata (Python) - `setup.py` and `pyproject.toml` define package metadata for Python packages and projects.
- `pom.xml` (Maven – Java) - defines project dependencies, build configurations, and plugins in Java projects.
- `package.json` (Node.js / npm) - manages dependencies, scripts, and metadata for JavaScript projects.
- Interoperability & integration metadata (e.g. BioSchemas Computational Tool profile) - facilitates communication between different software components, ensuring they work together without conflicts.
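As a small illustration of package metadata in practice, the sketch below reads the name, version, license and declared dependencies of an installed Python package using the standard library's `importlib.metadata`. The helper name `describe_installed_package` is hypothetical, and the selection of fields is an assumption made for this example rather than a recommended minimum.

```python
from importlib.metadata import PackageNotFoundError, metadata, requires


def describe_installed_package(name):
    """Return a small dict of packaging metadata, or None if not installed.

    Hypothetical helper for illustration; not part of any packaging tool.
    """
    try:
        md = metadata(name)
    except PackageNotFoundError:
        return None
    return {
        "name": md["Name"],
        "version": md["Version"],
        # Not every package sets a License field, so this may be None.
        "license": md.get("License"),
        # requires() returns None when no dependencies are declared.
        "dependencies": requires(name) or [],
    }


if __name__ == "__main__":
    # "pip" is only an example; any installed distribution name works.
    print(describe_installed_package("pip"))
```

The same information is what a package author declares in `pyproject.toml` or `setup.py`; here it is simply read back from the installed distribution.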
Software container & deployment metadata helps automate builds, testing, and deployment by providing necessary configuration details. For example:
- SBOM (Software Bill of Materials) – a comprehensive list of all components and dependencies in a software product.
- Open Container Initiative (OCI) Image Specification – standard metadata format for container images, including layers, dependencies, and authorship.
- Dockerfile - defines base images, environment variables, and configurations for containerised applications.
- Kubernetes metadata - provides metadata for managing deployments, services, and pods in Kubernetes clusters.
Using CodeMeta to describe software
CodeMeta is a community-developed metadata standard designed to describe and exchange metadata about research software projects in a structured way.
It provides a machine-readable JSON-LD format (in the form of a `codemeta.json` file attached to your software project) for storing metadata about software, including authorship, licensing, dependencies, versioning, and more.
It consists of a set of properties that extend Schema.org (a popular metadata vocabulary designed to describe digital objects on the Web) with software-specific metadata (e.g. maintainer, build instructions, software documentation).
It was created to standardise metadata across different repositories and programming ecosystems, making it easier to share, discover, and cite software. See the CodeMeta terms to understand which terms are used to describe software.
Who uses CodeMeta?
- GitHub & GitLab code repositories support it to help document software for better discoverability.
- Researchers use it to cite research software in academic papers.
- Software repositories & archives like Zenodo, FigShare, InvenioRDM and Software Heritage, as well as many institutional repositories, use it as a standardised metadata format across platforms.
- FAIR data initiatives support the use of the CodeMeta format to help with findability.
How can you use CodeMeta?
You can use the CodeMeta terms to create a `codemeta.json` file for your software project and place it in the root of the source code repository (e.g. on GitHub or GitLab) along with your code.
You can create the `codemeta.json` file:
- by using CodeMeta Generator, an online form-based service to help you describe valid CodeMeta records.
- by using the SOMEF command line tool with the `-c` flag to export the CodeMeta file generated from your README file and available documentation. Alternatively, SOMEF Vider lets you download auto-generated CodeMeta files (remember to double-check the results).
- manually, e.g. by using the CodeMeta template as a reference. JSON-LD files can be validated with services like JSON-LD validator.
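For the manual route, a minimal sketch of what a `codemeta.json` record can look like is given below, built as a Python dictionary and serialised with the standard `json` module. Every value (project name, author, repository URL, dependency) is a placeholder for a hypothetical project, and the context URL assumes CodeMeta 3.0; check the CodeMeta terms for the properties that suit your own software.

```python
import json

# All values are placeholders for a hypothetical project.
RECORD = {
    # Assumes the CodeMeta 3.0 JSON-LD context.
    "@context": "https://w3id.org/codemeta/3.0",
    "@type": "SoftwareSourceCode",
    "name": "example-tool",
    "version": "0.1.0",
    "description": "A short description of what the software does.",
    "license": "https://spdx.org/licenses/MIT",
    "codeRepository": "https://example.org/example-tool",
    "programmingLanguage": "Python",
    "author": [
        {"@type": "Person", "givenName": "Ada", "familyName": "Example"}
    ],
    "softwareRequirements": ["numpy"],
}


def to_codemeta_json(record):
    """Serialise a metadata record as indented JSON text."""
    return json.dumps(record, indent=2, ensure_ascii=False) + "\n"


if __name__ == "__main__":
    # Save this output as codemeta.json in the root of the repository.
    print(to_codemeta_json(RECORD), end="")
```

This only guarantees well-formed JSON, not that the record is valid JSON-LD or uses CodeMeta terms correctly, so it is still worth running the result through a validator.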
A `codemeta.json` file describing your software can propagate between the archival infrastructures, platforms and services that understand CodeMeta descriptions and can ingest existing `codemeta.json` files automatically (Zenodo, FigShare, InvenioRDM and Software Heritage).
This means you will not have to duplicate work when using such services: for example, when obtaining a DOI for your software, if you already have a `codemeta.json` file you will not have to fill in the corresponding software metadata again.
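To make the idea of automatic ingestion more concrete, the sketch below shows in simplified form how a consumer might pull a few deposit fields out of the text of a `codemeta.json` file. It is an assumption-laden illustration, not how any of the services named above actually work: the function `summarise_codemeta` and the output field names are hypothetical, and a real service would typically use proper JSON-LD processing.

```python
import json


def summarise_codemeta(text):
    """Extract a few fields from the text of a codemeta.json file.

    Hypothetical helper; the output keys are made up for this example.
    """
    record = json.loads(text)
    authors = record.get("author", [])
    # JSON-LD allows a single object where a list is also accepted.
    if isinstance(authors, dict):
        authors = [authors]
    creators = []
    for author in authors:
        if not isinstance(author, dict):
            continue  # skip plain strings or other unexpected values
        parts = (author.get("givenName"), author.get("familyName"))
        full_name = " ".join(part for part in parts if part)
        # Organisations usually carry a single "name" instead.
        creators.append(full_name or author.get("name", ""))
    return {
        "title": record.get("name"),
        "version": record.get("version"),
        "license": record.get("license"),
        "creators": creators,
    }
```

Because the fields are read from the file rather than typed into a form, the same description can be reused wherever the file is understood.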
Tools and resources on this page
Tool or resource | Description | Related pages | Registry |
---|---|---|---|
CodeMeta | CodeMeta is a community standard and initiative focused on creating a minimal metadata schema for scientific software and code, promoting their findability, preservation, and reuse through machine-readable metadata in JSON-LD format. | Software identifiers | |
CodeMeta Generator | A free, open-source project that creates a minimal metadata schema for research software and code | ||
FigShare | Figshare is a provider of open research repository infrastructure for sharing, showcasing and managing all research outputs in a discoverable, citable, reportable and transparent way. | ||
InvenioRDM | A turn-key research data management (RDM) repository based on Invenio Framework and Zenodo | ||
JSON-LD validator | Service to validate JSON-LD files | ||
Software Heritage | Collects, preserves, curates and makes available software in source code form as cultural heritage | ||
SOMEF | Software Metadata Extraction Framework (SOMEF) is a command line tool for automatically extracting relevant software information from README files | Creating a good README | |
SOMEF Vider | A service running SOMEF to obtain CodeMeta files | ||
SPDX | System Package Data Exchange (SPDX) is an open standard for representing systems with software components as SBOMs (Software Bill of Materials) and other AI, data and security references supporting a range of risk management use cases. | Licensing software | |
Zenodo | Zenodo is a general-purpose open repository developed under the European OpenAIRE program and operated by CERN. It allows researchers to deposit research papers, data sets, research software, reports, and other research-related digital artefacts. | Documenting software, Releasing software, Software documentation, Software identifiers | |
How to cite this page
Daniel Garijo, Aleksandra Nenadic, "Software metadata". everse.software. http://everse.software/RSQKit/software_metadata .