Content from Introduction
Last updated on 2025-02-04
Overview
Questions
- What is a Software Management Plan (SMP)?
- Why is an SMP important?
- How is an SMP useful?
Objectives
- Understand the role of a Software Management Plan (SMP).
- Understand that the stage and scope of your software can determine that some parts of the SMP are not relevant (yet).
- Understand that no matter the scope of your software, an SMP is always relevant.
Introduction
A Software Management Plan (SMP) is a formal document explaining how software is written and managed both during and after a research project. It is a living document and will evolve with the boundary conditions of your project and software. While it is encouraged to write an SMP before starting to develop code, it is never too late to create one for existing projects.
Importance
SMPs provide value both inside and outside your organization. The things that you and your organization might find valuable in an SMP are:
- Writing the plan encourages you to think about the roles and responsibilities within the project, thus defining tasks and responsibilities early on.
- Once a plan has been filled out, it can also be used to give guidance to new team members, thus reducing the time needed for onboarding.
- Writing an SMP will guide you through the best practices that you can apply to your software based on the size and scope of your project. Following the best practices outlined in an SMP will make it easier for others to use or cite your software.
Outside your organization, research funders in particular have become more aware of SMPs and are starting to require them for all of the above reasons. From the funder's perspective, a well-thought-out SMP also demonstrates the feasibility and reliability of the project. It is thus a good idea to prepare a well-written SMP when submitting a funding proposal.
Key Points
- An SMP is valuable at any stage of your project
- It outlines how the software supports the vision of your project
- It encourages you to follow best practices based on the scope of your project
Content from Intermediate software testing
Last updated on 2024-12-04
Overview
Questions
- How can I make changes to my research code while being sure existing functionality still works?
- How can I execute the same test with multiple parameters?
- What is code coverage and how can it help me verify the functionality of my code?
- How do I create independent testing for my code without having to instantiate all the software?
- How can I avoid depending on external systems during my tests?
- How can I check if a given change improves the performance of my code?
- How do I make sure that the application I have deployed for another party still works?
- How can I make sure my program stops when impossible cases are found?
Objectives
- Use pytest to write tests.
- Use parameterized tests.
- Use code coverage to get an idea of how confident you can be that the system still works when changing the code.
- Use mocks to mock out complex paths of code.
- Use stubs to stub out complex paths of code.
- Use a performance testing tool to see if the speed of the code conforms to our demands.
- Use smoke tests to do a quick check if the application should still run.
- Use runtime testing to guard against unexpected cases in production.
Introduction
In this episode we are going to take a look at a few different types of automated testing. We will also see how we can use code coverage to increase our confidence that everything still works when we make a change to the code. We assume you have worked through the earlier material on this website.
1. Improve testing
Add code coverage
What is code coverage? Code coverage is the percentage of your research / production code that is covered by unit tests. With a very high percentage, a change that breaks the code is very likely to be caught before it reaches your users; with a low percentage, the chances of catching such bugs are low, or you have to do a lot of manual testing. When working with Python and pytest there are packages to easily get the test coverage of your application. The one used most is pytest-cov.
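As a quick sketch of how this could look (assuming a package named mypackage, a hypothetical name, with its tests in tests/), you can install the plugin and produce a coverage report like this:
BASH
# Install the coverage plugin for pytest
pip install pytest-cov

# Run the test suite and report which lines of mypackage are not covered
pytest --cov=mypackage --cov-report=term-missing tests/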
Add parameterized tests
When writing tests it sometimes happens that you want a lot of tests for the same function. You could write many test functions with the same setup, each calling the function under test with different parameters. A cleaner way, which leaves you less code to maintain afterwards, is to use parameterized tests. With these you add the different parameters as inputs to your test function. An example looks like this:
PYTHON
@pytest.mark.parametrize(
    ("onset", "phenomenon", "expected"),
    [
        (
            "2024-12-09T11:31:14Z",
            "snow-ice",
            "Monday 9 December: chance of snow/road icing",
        ),
        (
            "2025-01-04T00:00:00Z",
            "low-temperature",
            "Saturday 4 January: chance of cold",
        ),
    ],
    ids=["special_case", "normal_case"],
)
def test_get_english_headline(onset: str, phenomenon: str, expected: str) -> None:
    """test generation of english headline"""
    assert _get_english_headline({"onset": onset, "phenomenon": phenomenon}) == expected
As you can see, even the expected result is now an input of the test. We can use the ids parameter to give each parameter set a name. With this name you can also run the test for only one of the ids, as shown below. For more information on parameterized tests you can read this how-to guide.
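For example, assuming the test above lives in a file called test_headline.py (a hypothetical name), you could run only the special_case variant like this:
BASH
# Run only the parameter set with the id "special_case"
pytest "test_headline.py::test_get_english_headline[special_case]"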
2. Testing a unit of software without having to instantiate all the code
Sometimes you want to test a function, but that function uses a lot of complex objects (and those objects in turn need other objects…). One way to deal with this is to pass those complex objects as inputs to the function. You can then use a mock to avoid having to create all those objects yourself. In the code below we see the complex class being mocked and then given an implementation for when its method is called. This way we don't need to create input_one and input_two with all of their possible inputs. This type of test double tests both state and behaviour.
PYTHON
from unittest.mock import MagicMock


class Complex:
    def __init__(self, input_one, input_two):
        self.input_one = input_one
        self.input_two = input_two

    def execute(self):
        """do complex things"""
        pass


def function_under_test(my_complex_object_with_multiple_inputs):
    return my_complex_object_with_multiple_inputs.execute()


def test_function_under_test():
    inputs = MagicMock()
    inputs.execute = MagicMock(return_value=3)
    result = function_under_test(inputs)
    expected = 3
    assert result == expected
    assert inputs.execute.call_count == 1
For more information on mocking you can read this quick guide.
3. Working with external systems during a test
When writing code you do not always have the data on your machine; sometimes you need to download data over HTTP. A lot of the time people use the requests library for this (when you have async code, aiohttp is a nice alternative). In your unit tests, however, you don't want to be dependent on the network, because it is unreliable and can make your tests fail for no reason. One way around this is to split the HTTP call into its own function and use a fake response when testing the surrounding logic. The following code calls the German weather open data platform to get thunderstorm data. The page's data gets updated a lot, but the format stays the same. The actual API calls can then be tested in an integration test, which can also cover the error handling. More information about integration testing can be found at the Turing Way.
PYTHON
import requests
from bs4 import BeautifulSoup
from unittest.mock import patch


def get_konrad3d_data(url):
    response = requests.get(url)
    return response.text


def extract_latest_file(overview_page):
    soup = BeautifulSoup(overview_page, features="html.parser")
    urls = soup.find_all('a')
    latest_file = urls[-1].get('href')
    return latest_file


def get_latest_file_konrad3d():
    url = "https://opendata.dwd.de/weather/radar/konrad3d/"
    overview_page = get_konrad3d_data(url)
    latest_file = extract_latest_file(overview_page)
    return latest_file


data = '<html><head><title>Index of /weather/radar/konrad3d/</title></head><body><h1>Index of /weather/radar/konrad3d/</h1><hr><pre><a href="../">../</a><a href="KONRAD3D_20241116T093000.xml">KONRAD3D_20241116T093000.xml</a> 16-Nov-2024 09:34 3895<a href="KONRAD3D_20241118T092500.xml">KONRAD3D_20241118T092500.xml</a> 18-Nov-2024 09:30 3938</pre><hr></body></html>'


def test_download_latest_data_konrad3d():
    # The patch target assumes this code lives in a module called "faketest";
    # adjust it to the name of your own module.
    with patch("faketest.get_konrad3d_data", return_value=data):
        result = get_latest_file_konrad3d()
        expected = "KONRAD3D_20241118T092500.xml"
        assert result == expected
Another way to avoid these API calls is to use the requests_mock library to mock requests API calls. This makes you dependent on another library and still does not show you whether things work in reality. It is used by a lot of people, but personally I prefer fewer dependencies and write integration tests for the integration with external systems. Mocking at this level can give you a false sense of security, as happened with the CrowdStrike outage and their testing. If you want to use it, an example can be found below.
PYTHON
import requests
import requests_mock


def get_konrad3d_data(url):
    response = requests.get(url)
    return response.text


def test_download_latest_data_konrad3d():
    data = '<html><head><title>Index of /weather/radar/konrad3d/</title></head><body><h1>Index of /weather/radar/konrad3d/</h1><hr><pre><a href="../">../</a><a href="KONRAD3D_20241116T093000.xml">KONRAD3D_20241116T093000.xml</a> 16-Nov-2024 09:34 3895<a href="KONRAD3D_20241118T092500.xml">KONRAD3D_20241118T092500.xml</a> 18-Nov-2024 09:30 3938</pre><hr></body></html>'
    url = 'https://opendata.dwd.de/weather/radar/konrad3d/'
    with requests_mock.Mocker() as m:
        m.get(url, text=data)
        result = get_konrad3d_data(url)
        assert result == data
4. Performance testing of functions
There are moments when the performance of your function matters a lot: you might not want a single function to ever execute slower than x seconds. To test this you can write tests for the specific functions that should stay fast. The idea is that you run a function a number of times, and the maximum duration of any run must not be higher than x seconds. A useful library for these kinds of tests in Python is pytest-benchmark. It can also be used to check whether performance has improved between versions of the code.
PYTHON
import time


def function_to_test(duration=1):
    time.sleep(duration)
    return 123


def test_my_function(benchmark):
    allowed_speed = 1.000002
    result = benchmark.pedantic(function_to_test, iterations=5)
    assert benchmark.stats.stats.max < allowed_speed
    assert result == 123
This code can be run with the following command: pytest -v -s file_this_is_in.py::test_my_function. It will run the code 5 times, and none of the calls is allowed to be slower than the allowed_speed.
When you write APIs you can also have performance requirements. For this another type of tool is used; one of the most used tools in Python is locust. For more information about this tool you can look at their documentation.
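As a minimal sketch of what a locust load test looks like (the endpoint and host below are assumptions, not from the original material):
PYTHON
from locust import HttpUser, task


class ApiUser(HttpUser):
    """Simulated user that repeatedly requests the root endpoint."""

    @task
    def get_root(self):
        # Each simulated user issues GET requests to "/" on the host
        # given on the command line.
        self.client.get("/")
Saved as locustfile.py, this could be run with, for example, locust -f locustfile.py --host http://localhost:8000.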
5. Smoke testing to see if your application is still doing its basic functionality
There are moments when you want to start an application, but the application has some prerequisites that need to be met before you can say that it's good and allowed to run. For this you can use a smoke test. For example, when you have an application that reads configuration files from a file system whenever a user calls it, the check could be whether the files exist at the correct location and the format is as expected. Maybe someone manually moved the files, and this could break the whole system. So when the files are not there, there is smoke, and thus if it's production we could get a fire. The example below shows how to test something like this inside the same application. However, most of the time those checks would live in another script that runs before you start this one (or, if you use Kubernetes, in an init container).
PYTHON
from pathlib import Path

# Example location of the required config file (adjust to your application).
CONFIG_FILE = Path("config.yaml")


def config_file_is_found():
    # Check whether the config file exists at the expected location.
    return CONFIG_FILE.exists()


def main():
    # Application logic.
    pass


if __name__ == '__main__':
    if not config_file_is_found():
        raise FileNotFoundError("our config file is not found")
    main()
More information about smoke tests can be found at the Turing Way.
6. Runtime testing
When software is in production and you introduce a new path inside the code, you might want to run it for a while without actually enforcing the behaviour inside that code path. An example: when we implemented an extra validation for our public data platform, we first added the validation in a mode that allowed everything, like before, but executed the new logic and logged everything unexpected that happened. This gave us a lot of information about what would happen when we turned the feature on for real. One important thing we found out was that inside our network some HTTP requests would only reach their destination after 10+ seconds; the application would already have given the users an error by then, and that's not what we wanted. Because of this information we could add a solution, so that when we eventually brought our check live no users got an error.
An example of a check like this can be found below.
PYTHON
import logging

logger = logging.getLogger(__name__)


def my_new_validation_logic_to_external_api():
    print("do an external api call")
    return True


def get_observation_data():
    return "observation data"


def give_the_user_observation_data():
    try:
        is_allowed = my_new_validation_logic_to_external_api()
        if not is_allowed:
            # Log instead of blocking: the new check runs, but does not
            # change the behaviour the user sees yet.
            logger.warning("for user with id x we get not allowed back")
            is_allowed = True
    except Exception as exc:
        logger.warning("We got the following exception: %s", str(exc))
        is_allowed = True
    if is_allowed:
        return get_observation_data()
An example of where you might want to do this in a research project is reinforcement learning where steps take too long: running the new code path in logging-only mode first tells you whether enforcing it is the most cost-effective option at that moment. More information on runtime testing can be found at the Turing Way.
7. Closing words
In the previous parts we have looked at quite a few different types of tests, with examples, as well as some ways of making tests more reusable and of improving their quality. We would like to end with a few more resources where you can find information about different types of tests and testing tools:
Content from Continuous Integration
Last updated on 2025-02-12
Overview
Questions
- What is CI/CD?
- Why do we use CI/CD?
- What does CI/CD have to do with version control?
- What is a CI/CD pipeline?
- What is docker and how does it relate to CI/CD?
Objectives
- Be able to explain the basic concepts of Continuous Integration and Continuous Delivery.
- Be able to explain (identify three reasons) why Continuous Integration and Continuous Delivery should be used.
- Be able to identify freely available tools and services to implement these concepts in a research context.
- Be able to explain the relationship between CI/CD and version control.
- Be able to build a simple CI/CD pipeline or workflow.
The main goal is to create awareness of the concept of Continuous Integration and Continuous Delivery and some of the tools to support it.
Continuous Integration and Continuous Deployment / Delivery
Introduction
The Turing way explains the concept very well.
Continuous Integration should not be confused with DevOps: CI/CD is not DevOps, but DevOps effectively requires CI/CD. DevOps is the concept where a team is responsible for the entire life cycle of a software product or software component, from development to deployment to operating and maintaining it in production. Continuous Integration and Continuous Delivery and/or Deployment play a large role in this. But DevOps is not required for leveraging CI/CD.
We should distinguish Continuous Integration, Continuous Delivery and Continuous Deployment.
For more information about DevOps see the guide at GitHub.
Continuous Integration
Continuous integration is the practice of integrating all your code changes into the main branch of a shared source code repository early and often, automatically testing each change when you commit or merge them, and automatically kicking off a build. With continuous integration, errors and security issues can be identified and fixed more easily, and much earlier in the development process. – gitlab.com
Continuous Delivery
Continuous delivery is a software development practice that works in conjunction with CI to automate the infrastructure provisioning and application release process. – gitlab.com
This can be understood as creating a Docker container, a PyPI package for Python, a jar file for Java, or equivalents for programming languages like R, C/C++ and Fortran. This will be done in an automated way every time a change is pushed to the git repository on GitHub, GitLab or some other platform.
Continuous Deployment
Continuous deployment enables organizations to deploy their applications automatically, eliminating the need for human intervention. – gitlab.com
When talking about deployment, we mean that the software is running on a server and the services it provides are available for consumption by other software components. For research software that is the focus of this course, that is rarely the case, so we ignore Continuous Deployment and focus on Integration and Delivery.
Why is Continuous Integration and Continuous Delivery recommended?
CI/CD should be used when work is done in a collaborative project where changes created by different contributors need to be merged and tested. The earlier in the process this is done, the easier any bugs or merge conflicts are to solve.
However, even in projects with a single developer, utilizing CI/CD tools can be very beneficial. It enables users of the software to have early access to it, so bugs are discovered sooner. It also enables the developer to run unit tests in an automated way to discover bugs early in the process.
- The earlier conflicts and bugs are discovered, the easier they are to fix.
- Deliver value to the user of the software quickly
Containers
Nowadays software is often distributed or deployed using containers. These containers hold every dependency an application needs and can run anywhere. This is especially useful for applications that run as a service on a cloud provider such as Google or Amazon Web Services, where it is not known beforehand where an application will run. Containers are lightweight operating system images in which the application is stored together with everything it needs. CI/CD is extremely useful for automatically building such container images, as explained below in the section on pipelines and workflows. Running applications in containers tends to enforce decoupling from external dependencies and communicating with external services through well-defined and stable interfaces. Building and distributing containers is generally done using docker, but there are others such as podman. You can find more information about containers here.
Publishing your application as a docker image is advised if the application has a lot of dependencies or if the build process is very complicated. If the application is simple with only a few dependencies, then creating a docker image probably creates too much overhead. Docker is not suitable for publishing a module or library. Here is a tutorial on how to build docker images. Docker is a commercial application with a community edition that can be used free of charge; usually the community edition provides more than enough functionality. Podman is a free and open source alternative to Docker if you prefer.
The docker website has a list of guides on creating docker images for all kinds of purposes.
Pipelines or workflows
A CI/CD pipeline is an automated process utilized by software development teams to streamline the creation, testing and deployment of applications. – gitlab.com
Pipeline stages
- verification, testing
- build
- deployment (often less relevant for researchers, who typically do not operate continuously running services)
- perhaps publishing packages to PyPI or similar
Simple github workflow
Github workflows and alternatives from other suppliers enable you to trigger certain jobs on certain events. For instance, you can configure it to only update the documentation if the only changes pushed to the repository are in the documentation. Alternatively you can trigger it to build a release package only when merging to the main branch. Check the Understanding GitHub Actions guide for all the possibilities.
Github Pages is a workflow that works out of the box once you have configured it on the repository's settings page.
Workflows are defined in the .github/workflows directory in a git repository. Github has a quick start tutorial to get you started with workflows. The example below will update the repository's GitHub Pages site when a change is pushed to the gh-pages branch. As you can see, it will only build on pushes to the gh-pages branch.
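A minimal sketch of such a workflow could look like this (the action versions and the site path . are assumptions, not the original example):
YAML
name: Deploy GitHub Pages
on:
  push:
    branches: [gh-pages]   # only run on pushes to the gh-pages branch
permissions:
  contents: read
  pages: write
  id-token: write
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - uses: actions/checkout@v4
      # Package the site content as a Pages artifact and deploy it
      - uses: actions/upload-pages-artifact@v3
        with:
          path: .
      - id: deployment
        uses: actions/deploy-pages@v4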
Build containers
- Reproducible: all dependencies and packages included.
Github[11] and docker[12] have tutorials on building docker container images in GitHub workflows.
[11] https://github.com/actions/starter-workflows/blob/main/ci/docker-image.yml [12] https://docs.docker.com/guides/gha/
And more
Github provides a collection of starter workflows that you can build on.
Demo: From zero to published package in 15 minutes
Step 1: Create a Python project
The first step is to create a Python project on your personal computer. There are several tools to help you with this; in this tutorial we'll be using Poetry. Next, turn the project into a git repository and add the project's contents to it.
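A minimal sketch of these commands (assuming Poetry is already installed; the project name matches the repository created in the next step):
BASH
# Create a new Python project skeleton with Poetry
poetry new demo-tdcc-nes
cd demo-tdcc-nes

# Turn the project into a git repository
git init

# Add the project's contents and make a first commit
git add .
git commit -m "Initial commit"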
Step 2: Create a github repository and push the project to it
- Go to https://github.com and log in.
- Click on "New" to create a new project and give it the name demo-tdcc-nes.
- Click "Create repository" at the bottom right. No need to change anything else.
Then connect the local repository to GitHub and push the project.
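A sketch of the commands (replace <your-username> with your own GitHub user name):
BASH
# Point the local repository at the new GitHub repository
git remote add origin git@github.com:<your-username>/demo-tdcc-nes.git
# Push the main branch and set it as the upstream branch
git push -u origin main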
Refresh the page in the browser and you will see the contents of your project.
Step 3: Create a python module and a test
With a text editor create a file hello.py in the demo_tdcc_nes/ subfolder of the repository and add a hello function to it.
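A minimal sketch of hello.py, inferred from the test created later in this step (the exact original contents may differ):
PYTHON
def hello(thing: str) -> str:
    """Return a greeting for the given thing."""
    return f"Hello {thing}"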
Next, to create a test, create a folder tests in the repository's top-level directory.
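One way to do this (the empty __init__.py matches the repository layout shown below):
BASH
mkdir tests
touch tests/__init__.py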
So, the repository should look like this:
.
├── demo_tdcc_nes
│ ├── hello.py
│ └── __init__.py
├── pyproject.toml
├── README.md
└── tests
└── __init__.py
Now create a file test_hello.py in the tests folder and paste the following content:
PYTHON
import pytest

import demo_tdcc_nes
import demo_tdcc_nes.hello


@pytest.mark.parametrize('thing, expected', [("TDCC-NES", "Hello TDCC-NES")])
def test_hello(thing: str, expected: str) -> None:
    result = demo_tdcc_nes.hello.hello(thing)
    assert result == expected
Install the pytest package:
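For example (with Poetry you could instead use poetry add --group dev pytest):
BASH
pip install pytest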
Then run the tests as follows:
And examine the result of a passing test:
TXT
$ pytest tests/
================================================== test session starts ===================================================
platform linux -- Python 3.11.9, pytest-8.3.3, pluggy-1.5.0
rootdir: /home/user/projects/TDCC/demo-tdcc-nes
configfile: pyproject.toml
collected 1 item
tests/test_hello.py . [100%]
=================================================== 1 passed in 0.01s ====================================================
$
Step 4: Commit the changes to git and push them to github
Type git status to show the files that have been added or have had their contents changed since the last commit.
$ git status
On branch main
Your branch is up to date with 'origin/main'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
demo_tdcc_nes/hello.py
tests/test_hello.py
nothing added to commit but untracked files present (use "git add" to track)
Stage the changes, commit, and push them to github.
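A minimal sketch of these commands (the commit message is just an example):
BASH
# Stage the new files and commit them
git add demo_tdcc_nes/hello.py tests/test_hello.py
git commit -m "Add hello module and test"

# Push the changes to github
git push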
Step 5: Add a github actions workflow
The next step is to add a workflow to trigger github into creating a package and make it available.
Create the .github/workflows subfolder in the repository.
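One way to create it:
BASH
mkdir -p .github/workflows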
The repository now should look like this:
.
├── demo_tdcc_nes
│ ├── hello.py
│ ├── __init__.py
│ └── __pycache__
│ ├── hello.cpython-311.pyc
│ └── __init__.cpython-311.pyc
├── .github
│ └── workflows
├── pyproject.toml
├── README.md
└── tests
├── __init__.py
├── __pycache__
│ ├── __init__.cpython-311.pyc
│ └── test_hello.cpython-311-pytest-8.3.3.pyc
└── test_hello.py
In the .github/workflows folder create the file build-demo-tdcc-nes.yml. The name of the file is not very important; it is helpful, however, to choose a name that identifies what it does. It should have the extension .yml or .yaml so it can be identified as a yaml file. Add the content below to the .github/workflows/build-demo-tdcc-nes.yml file:
YAML
name: Upload Python Package
on: [push]
permissions:
  contents: read
jobs:
  release-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Build release distributions
        run: |
          # NOTE: put your own distribution build steps here.
          python -m pip install build
          python -m build
      - name: Upload distributions
        uses: actions/upload-artifact@v4
        with:
          name: release-dists
          path: dist/
  pypi-publish:
    runs-on: ubuntu-latest
    needs:
      - release-build
    steps:
      - name: Retrieve release distributions
        uses: actions/download-artifact@v4
        with:
          name: release-dists
          path: dist/
Stage the change, commit, and push.
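A sketch of these commands (the commit message is an example):
BASH
git add .github/workflows/build-demo-tdcc-nes.yml
git commit -m "Add package build workflow"
git push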
Click on the "Actions" tab on the github web page to see the workflow. Then click on the workflow (it has the title of the git commit message). A small graph is shown, and at the bottom a "release-dists" link is provided. Click on that to download the package.
That’s it! Package published!
Step 6: Publishing the package to PyPi
You can publish the package to PyPI if you have an account at https://pypi.org/. Follow the steps in the Python Packaging User Guide to make the package available through PyPI.
Considerations
CI/CD pipelines are not very suitable if your tests require a lot of static data. Running large integration tests inside a CI/CD pipeline is therefore not recommended, as pipelines generally have limited space and time; prefer small and fast unit tests that run automatically inside the pipeline. It is therefore helpful to practice software engineering best practices such as decoupling, since that leads to more easily testable code. Larger integration tests can still be run in CI/CD as long as they don't require more than a few hundred megabytes of space and can complete within, say, 30 minutes. Check the resource limits your CI/CD infrastructure provider (e.g. github or gitlab) imposes on CI/CD pipelines. If you expect your tests to require more time and space than the CI/CD platform of your choice allows, consider alternative approaches such as blue/green deployments. Another alternative is to host your own pipeline runners that can be associated with your project on github or gitlab.