Packaging and developing Python projects with nested git submodules

by Konstantinos Demartinos

31.10.18

Introduction

This article discusses basic operations relevant to packaging and developing Python projects with nested git submodules. The motivation stems from an actual case that can be abstracted as follows:

We want to work with a git repository that has nested git submodules of an arbitrary depth.

This can be further analysed in the following use-cases:

  1. It should be easy to update the superproject after any revision in the nested submodules.
  2. Packaging of the superproject should depend only on the upstream repositories of the nested submodules.

Abstract structure

We present an abstract structure based on the simple example provided by the Python Packaging Authority (PyPA).

/mypackage
  /mypackage
    __init__.py
    ...
  setup.py
  Makefile
  requirements.txt
  /deps
    /package00
      /deps
        /package10
        ...
          ...
          /deps
            /packageij
    /package01

packageij is the jth submodule at depth i.

Packaging: setting up setup.py

One of our designated goals is to make the installation of mypackage independent of the submodule checkouts. To this end, the install_requires field in setup.py should include only the names of the top-level submodules (package0j); each of these declares its own dependencies in turn.

IMPORTANT NOTE: If a submodule at any depth (i.e. packageij) is not published on the Python Package Index (PyPI), a valid link to it should be appended to the dependency_links field of the setup function. The package should then be installed with pip's --process-dependency-links flag (removed in pip 19.0, so an older pip is required), like so:

$ pip install mypackage --process-dependency-links
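
The two fields discussed above could look roughly like the following sketch. The version number, the choice of package01 as the dependency missing from PyPI, and the example.com URL are all placeholders introduced here for illustration; only the field names install_requires and dependency_links come from the text.

```python
# Sketch of the arguments a setup.py for mypackage might pass to setuptools.
# Every concrete value below is a placeholder, not taken from a real project.
SETUP_KWARGS = dict(
    name="mypackage",
    version="0.1.0",  # assumed version
    packages=["mypackage"],
    # Only the top-level submodules (package0j) are named here; deeper
    # submodules are expected to be declared by those packages themselves.
    install_requires=["package00", "package01"],
    # Needed only for dependencies that are not on PyPI. Hypothetical URL;
    # the "#egg=<name>-<version>" fragment identifies the requirement it serves.
    dependency_links=[
        "https://example.com/package01/archive/master.tar.gz#egg=package01-0.1.0",
    ],
)

# In an actual setup.py these arguments would be passed on unchanged:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```

Keeping the arguments in a plain dictionary is only a convenience for reading and checking them; calling setup() directly with the same keywords is equivalent.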

Development operations

Clone the repository

To have all the submodules, at every depth, initialized while cloning the superproject, the following command should be invoked:

$ git clone --recurse-submodules <mypackage-URL>
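
If the superproject has already been cloned without that flag, the nested submodules can still be fetched afterwards with git submodule update --init --recursive. The sketch below wraps that command in Python; the function names are hypothetical helpers introduced here, and the repository directory is an assumption.

```python
import subprocess


def submodule_update_command(repo_dir):
    """Build the git command that initializes and checks out all nested submodules."""
    return ["git", "-C", str(repo_dir), "submodule", "update", "--init", "--recursive"]


def init_submodules(repo_dir="."):
    """Run the command; raises CalledProcessError if git reports a failure."""
    subprocess.run(submodule_update_command(repo_dir), check=True)
```

Calling init_submodules("mypackage") from the parent directory of the clone would bring the working tree to the same state as a recursive clone.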

Using a Makefile to create the development environment

To simplify the configuration of the development environment, a Makefile is provided that performs the following operations:

  1. Creates a Python virtual environment.
  2. Installs all dependencies listed in requirements.txt into the virtual environment.
  3. Removes the environment and repeats steps (1) and (2) after upstream or local revisions of the dependencies.

Steps (1) and (2) are executed with make install, while Step (3) is executed through make reinstall.

Typical contents of such a file would be the following:

 0 env_dir:=venv
 1 pip:=$(env_dir)/bin/pip
 2
 3 install:
 4         python3 -m venv $(env_dir)
 5         $(pip) install -r requirements.txt --process-dependency-links
 6         $(pip) install --upgrade pip
 7
 8 clean:
 9         rm -rf $(env_dir)
10
11 reinstall:
12         $(MAKE) clean install
13
14 .PHONY: install clean reinstall

Notice the --process-dependency-links flag on line 5.

Declaring dependencies

Working with submodules enables us to revise the respective packages while developing the superproject. The question then arises: how can these revisions be easily exposed to the superproject?

Direct import

One solution would be to import the submodules directly through their respective paths, e.g. import deps.packageij.subpackageij. With nested submodules of arbitrary depth, however, this becomes rather tedious.

Updating upstream

Another solution would be to update the upstream repositories first and then recreate the development environment. Depending on how the submodules are published, this might require one or more additional operations before we can actually work on the superproject.

If we publish to PyPI, a two-step process is required: first update the git repository, then update the package on PyPI. Although this is a proper deployment procedure, it seems rather complex for local development purposes.

Declaring dependencies through the submodule paths

Probably the simplest solution is to declare the dependency on every submodule by explicitly listing the respective paths in the requirements.txt file, in reverse order from the deepest to the shallowest, like so:

deps/package00/deps/package10/deps/package20
deps/package00/deps/package10
deps/package00
...

This way, the resolution of all the paths needs to be done only once. Afterwards, one can make revisions directly in the nested submodules and re-create the development environment with

$ make reinstall
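
The one-time resolution of the paths could also be scripted. The following is a minimal sketch, assuming every package keeps its submodules in a directory named deps, as in the abstract structure above; the function name submodule_requirements is a hypothetical helper introduced here. It walks the tree in post-order, so each submodule is listed after the submodules nested inside it.

```python
from pathlib import Path


def submodule_requirements(root, deps_dirname="deps"):
    """Return submodule paths relative to root, nested submodules first."""
    root = Path(root)
    found = []

    def visit(package_dir):
        deps_dir = package_dir / deps_dirname
        if not deps_dir.is_dir():
            return  # this package has no nested submodules
        for child in sorted(deps_dir.iterdir()):
            if child.is_dir():
                visit(child)  # record deeper submodules before their parent
                found.append(child.relative_to(root).as_posix())

    visit(root)
    return found
```

Run from the root of the superproject, the result of submodule_requirements(".") can be written to requirements.txt one path per line, which reproduces the deepest-first ordering shown earlier.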

References