Software project structure

Why does structuring your software project matter?

Description

Organising a software project (or a research project in general) into a directory structure means creating a clear and logical layout for files and data that makes navigation, collaboration and reproducibility easier. A well-structured and consistent project organisation, aligned with common practices, makes your work more accessible: it helps others (and yourself) quickly locate information, which is particularly valuable for long-term projects or collaborative efforts.

Considerations

Organising code or research project directories in a consistent manner is essential for several reasons:

A clear structure makes it easier for you and others to understand the project’s layout.
Contributors can quickly locate necessary resources without navigating a cluttered or chaotic directory.
Consistent naming conventions and directory structures enable efficient code sharing, review, and seamless transitions between projects that follow similar conventions.
Proper separation of code, configuration files, and data facilitates faster issue isolation and resolution.
In research projects, well-organised code, data, and documentation support easier result reproduction and validation.
Well-documented and consistent directory structures help future research build upon existing work more effectively.

What are some good practices for organising software projects into a directory structure?

Typically, as you start developing software from scratch, you might put all the code and data in the project directory’s root. However, as your software project grows and you find yourself having to store other files, e.g. results, tests, auxiliary information and metadata in addition to your code and data, you may consider organising your directory differently.

Below are some established good practices for setting up and maintaining a software project’s directory structure.

Directory structure

Put all files related to the project into a single directory and choose a meaningful name for it that reflects the project’s purpose or topic.

The top-level directory of the project should contain auxiliary information and metadata about your software, making it easy for others to find out what it does and how to reuse it. This information is especially important once you start sharing your work with others, e.g. as a repository on a code sharing platform such as GitHub or GitLab. For example:

README file to describe the project and give instructions on installing and running the code and reproducing the results - see more on creating good README files.
LICENSE file to describe how others can reuse your software or work - see more on licensing software.
CITATION.cff file to describe how to cite the project - see more on citing software.
codemeta.json file (or similar metadata standard) to describe your software’s metadata - see more on software metadata.

Organise the rest of the software project’s files into sub-directories clearly labelled by the type of their content. For example:

code (or scripts or src) directory for storing your source code.
data folder to store your data. Further organise raw, cleaned, intermediate, and/or processed data in separate subdirectories (e.g. data/raw, data/clean, data/processed) to maintain clarity and prevent overwriting or losing the original raw data.
results folder for storing analysis outputs, summary statistics, or any data generated after processing.
doc folder for storing various software documentation and guides.
figures (or fig) - for storing all visualisations like charts, graphs, and figures generated from the code/analysis (alternatively, these can go in the results directory).
papers or presentations or references folders for research papers, articles, or any other literature cited or referenced in the research project. These could go into separate projects so you do not mix them with your software - especially if you are developing a software package for use by others and research papers are not relevant in that context.
If specific subdirectories in your project require distinct descriptions or licenses (for example, the license for your code may differ from that of your data), include separate README or LICENSE files in those subdirectories to apply to the files within them.
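
As an illustration, the layout described above could be scaffolded with a few lines of Python. This is only a minimal sketch: the function name create_project_skeleton and the two default tuples are hypothetical, and the chosen names simply mirror the suggestions in the lists above. Adapt them to your own project.

```python
from pathlib import Path

# Hypothetical defaults that mirror the directory and file names suggested above.
DEFAULT_SUBDIRS = ("src", "data/raw", "data/processed", "results", "doc", "figures")
DEFAULT_FILES = ("README", "LICENSE", "CITATION.cff")


def create_project_skeleton(root, subdirs=DEFAULT_SUBDIRS, files=DEFAULT_FILES):
    """Create empty sub-directories and placeholder top-level files under root.

    Anything that already exists is left untouched, so the function is safe
    to re-run. Returns the list of paths that were newly created.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for subdir in subdirs:
        path = root / subdir
        if not path.exists():
            path.mkdir(parents=True)  # also creates parents such as data/
            created.append(path)
    for name in files:
        path = root / name
        if not path.exists():
            path.touch()  # empty placeholder, to be filled in by hand
            created.append(path)
    return created
```

Calling create_project_skeleton("my_project") would create the directories and empty placeholder files; the content of README, LICENSE and CITATION.cff still has to be written by you.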

Naming conventions

In addition to using standard and self-explanatory directory names mentioned above, follow other naming conventions for files and directories, for example:

Avoid special characters or spaces (they can cause errors when read by computers); use underscores (_) or hyphens (-) to separate words instead and be consistent.
Name files to reflect their contents, version, or date (or, even better, use version control to track different versions).
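
The first convention can be applied automatically. The sketch below is one possible approach, not a standard tool: the helper name to_safe_name is hypothetical, and the set of "safe" characters (ASCII letters, digits, dots, underscores and hyphens) is an assumption you may want to widen or narrow.

```python
import os
import re

# Assumption: anything outside this character set counts as "special".
_UNSAFE = r"[^A-Za-z0-9._-]+"


def to_safe_name(name, separator="_"):
    """Return a file name with spaces and special characters replaced.

    Unsafe characters at the start or end of the stem are dropped, and each
    internal run of unsafe characters becomes a single separator. The file
    extension is kept as it is.
    """
    if separator not in ("_", "-"):
        raise ValueError("separator must be '_' or '-'")
    stem, extension = os.path.splitext(name.strip())
    stem = re.sub(rf"^{_UNSAFE}|{_UNSAFE}$", "", stem)
    return re.sub(_UNSAFE, separator, stem) + extension


print(to_safe_name("My Results (final).csv"))               # My_Results_final.csv
print(to_safe_name("comparison plot.png", separator="-"))   # comparison-plot.png
```

Whichever separator you pick, use the same one throughout the project.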

Use version control

Put the whole software project under version control and in its own repository:

At the very least, code (and data) sub-directories should be version controlled; you can also version control documentation, manuscripts, results, etc. - i.e. anything that is written manually and not generated automatically.
If data files are too large to track with version control, or contain sensitive information that should not be exposed in public repositories, exclude them from tracking (e.g. using a .gitignore file in Git). The same goes for files that store passwords - they should not be version controlled.
Use tags or releases, as offered by code sharing platforms, to mark specific versions of results (a version submitted to a journal, dissertation version, poster version, etc.), so that you avoid putting version numbers in file names and accumulating near-duplicate files.
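
To decide which files to exclude from tracking, it can help to list the ones above a size limit. The following is a rough sketch under stated assumptions: the function names find_large_files and gitignore_lines are hypothetical, the size limit is yours to choose, and paths containing characters that are special in .gitignore patterns (such as #, !, * or [) would need escaping that this sketch does not do.

```python
from pathlib import Path


def find_large_files(root, max_bytes):
    """Return paths (relative to root, POSIX style) of files larger than max_bytes.

    Anything inside a .git directory is skipped.
    """
    root = Path(root)
    large = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if ".git" in relative.parts:
            continue
        if path.is_file() and path.stat().st_size > max_bytes:
            large.append(relative.as_posix())
    return large


def gitignore_lines(paths):
    """Format relative paths as .gitignore patterns anchored at the repository root."""
    # A leading slash ties the pattern to the directory holding the .gitignore file.
    return ["/" + path for path in paths]
```

The returned lines can be appended to the .gitignore file in the project root. Note that .gitignore only affects files Git is not tracking yet; a file that was already committed has to be removed from the index first (for example with git rm --cached).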

Example project structures

Below is an example of a directory structure for a generic research project that contains code and data. This is not the only way to organise your project folder (there is no official or standard way and you may not have all the same subfolders in your project), but it is a good practice to follow that will be understandable by many others.

project_name/
├── README                # overview of the project
├── LICENSE               # license (reuse terms) for the project as a whole
├── CITATION.cff          # citation information for the project
├── data/                 # data files used in the project
│   ├── README            # describe the origin of your data
│   ├── raw/              # store your raw data and do not modify it
│   └── processed/        # store cleaned/processed/modified data separately 
├── manuscript/           # manuscript describing the results
├── results/              # results of the analysis (data, tables)  
│   ├── preliminary/
│   └── final/
├── figures/              # results of the analysis (figures)
│   ├── comparison_plot.png
│   └── regression_chart.pdf
├── src/                  # contains source code for the project
│   ├── LICENSE           # license that just applies to the code
│   ├── requirements.txt  # software requirements and dependencies
│   ├── main_script.py    # main script/code entry point
│   └── ...
├── doc/                  # documentation for your software
│   ├── index.html        # entry point into the documentation website
│   └── ...
└── ...
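
If you settle on a layout like the one above, a short script can report which expected entries a project is still missing. This is a sketch only: missing_entries and the two tuples are hypothetical names, the expected entries are copied from the example tree, and matching README and LICENSE by name prefix (so that README.md also counts) is an assumption.

```python
from pathlib import Path

# Entries taken from the example tree above; adjust them to your own project.
EXPECTED_FILES = ("README", "LICENSE", "CITATION.cff")
EXPECTED_DIRS = ("data/raw", "data/processed", "results", "figures", "src", "doc")


def missing_entries(root, files=EXPECTED_FILES, dirs=EXPECTED_DIRS):
    """Return the expected files and directories that are absent under root.

    Files are matched by name prefix, so README.md satisfies README.
    Missing directories are reported with a trailing slash.
    """
    root = Path(root)
    missing = [name for name in files if not any(root.glob(name + "*"))]
    missing += [name + "/" for name in dirs if not (root / name).is_dir()]
    return missing
```

An empty list means every expected entry was found. Since there is no official layout, treat the report as a reminder rather than a pass/fail check.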

For research projects that contain Python code, check out the following two tools:

FAIR Python Cookiecutter is a command-line template tool that helps you set up a Python software project skeleton that uses modern development tools and follows best practices for code and metadata quality. It generates project directories from predefined templates and ensures consistency by scaffolding a standardised project layout, including folders like src/ for code and tests/ for tests (shown below), and configuration files such as pyproject.toml, README, and .gitignore. This is especially helpful for teams or when starting new software projects, as it reduces setup time and enforces best practices.
Poetry is a dependency and packaging manager that simplifies Python project management. It uses a single pyproject.toml file to declare dependencies and project metadata, and it manages virtual environments, versioning and publishing. If you are starting a new Python project from scratch, Poetry can also create a new directory with the necessary structure for you. Poetry itself does not enforce a specific directory structure but, like FAIR Python Cookiecutter, it encourages and works well with the “src layout” (shown below), which is a widely adopted best practice in modern Python development.

project_name/
├── pyproject.toml         # Poetry config: dependencies, metadata, build system
├── README.md              # Project description
├── src/                   # Source code root (recommended layout)
│   └── your_package/      # Actual Python package/module
│       ├── __init__.py
│       └── ...
├── tests/                 # Unit and integration tests
│   └── test_something.py
└── .gitignore

For best practices and guidance on designing research projects, in particular those focused on data, check out the Turing Way Project’s Guide for project design.

Tools and resources on this page

Tool or resource	Description	Related pages
GitHub	GitHub is a platform that allows developers to create, store, manage, and share their code. It uses Git to provide distributed version control. GitHub provides access control, bug tracking, software feature requests, task management, continuous integration, and wikis for every project.	Research Software Stor... APICURON - The platfor... DOME Registry Research Software Stor... Research Software Stor... Archiving software Citing software Performing a code review Computational workflows Credit and recognition... Documenting code Documenting software p... Documenting software u... Adopting FAIR research... Using organisational G... Packaging software Releasing software Using version control
GitLab	DevOps platform that enables teams to collaborate, plan, develop, test, and deploy software using an integrated toolset for version control, CI/CD, and project management.	Research Software Stor... Phoenix2 Archiving software Performing a code review Computational workflows Documenting code Documenting software p... Documenting software u... Adopting FAIR research... Using organisational G... Packaging software Releasing software Using version control
Poetry	Python packaging and dependency management tool	Reproducible software ... Maintaining research s...