Packaging and developing Python projects with nested git submodules
by Konstantinos Demartinos
This article discusses basic operations relevant to packaging and developing Python projects with nested git submodules. The motivation stems from an actual case that can be abstracted as follows:
We want to work with a git repository that has nested git submodules of an arbitrary depth.
This can be further analysed in the following use-cases:
- It should be easy to update the superproject after any revision in the nested submodules.
- Packaging of the superproject should depend only on the upstream repositories of the nested submodules.
We represent an abstract structure based on the simple example provided by the Python Packaging Authority (PyPA):
/mypackage
    /mypackage
        __init__.py
        ...
    setup.py
    Makefile
    requirements.txt
    /deps
        /package00
            /deps
                /package10
                    ...
                        /deps
                            /packageij
        /package01
packageij is the jth submodule at depth i.

One of our designated goals is to make installation of the superproject depend only on the upstream repositories of the submodules, not on their local paths. To this end, the install_requires field in the setup function should include only the names of the submodules at the highest level (e.g. package00 and package01).
IMPORTANT NOTE: If a submodule at any depth (e.g. packageij) is not uploaded to the Python Package Index (PyPI), then a valid link (see here) should be appended to the dependency_links field in the setup function. The package should then be installed using the pip command, like so:

$ pip install mypackage --process-dependency-links
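A minimal sketch of the relevant setup arguments follows; the package names, version, and repository URL are hypothetical placeholders, not taken from an actual project:

```python
# Hypothetical keyword arguments for setuptools.setup() in setup.py;
# all names and the URL below are illustrative placeholders.
from setuptools import find_packages

SETUP_KWARGS = dict(
    name="mypackage",
    version="0.1.0",
    packages=find_packages(exclude=["deps", "deps.*"]),
    # Only the highest-level submodules are named as requirements.
    install_requires=["package00", "package01"],
    # Submodules not published on PyPI are resolved via dependency_links;
    # pip honours these only when --process-dependency-links is given.
    dependency_links=[
        "git+https://github.com/user/packageij.git#egg=packageij-0.1.0",
    ],
)
```

In setup.py these arguments would then be passed as setup(**SETUP_KWARGS).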
Clone the repository
In order to have all the submodules initialized while cloning the superproject, the following command should be invoked:
$ git clone --recurse-submodules <mypackage-URL>
Makefile to create the development environment
To simplify the configuration of the development environment, a Makefile is provided that performs the following operations:
1. Creates a Python virtual environment.
2. Installs all dependencies listed in requirements.txt into the virtual environment.
3. Runs steps (1) and (2) in case of upstream or local revisions of the dependencies.
Steps (1) and (2) are executed with make install, while Step (3) is executed with make reinstall.
A typical content for such a file would be the following:
 0  env_dir := venv
 1  pip := $(env_dir)/bin/pip
 2
 3  install:
 4      python3 -m venv $(env_dir)
 5      $(pip) install -r requirements.txt --process-dependency-links
 6      $(pip) install --upgrade pip
 7
 8  clean:
 9      rm -r $(env_dir)
10
11  reinstall:
12      make clean install
Note the --process-dependency-links flag on line 5.
Working with submodules enables us to make revisions in the respective packages while developing the superproject. The question then arises: how can these revisions be easily exposed to the superproject?
One solution would be to import the submodules directly through their respective paths, e.g. import deps.packageij.subpackageij. But with nested submodules of arbitrary depth this becomes rather tedious.
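To illustrate, the following sketch builds a throwaway deps/package00/deps/package10 tree (hypothetical names, mirroring the layout above) in a temporary directory and imports it through its path:

```python
# Demonstrates importing a nested submodule through its on-disk path.
# The directory names are hypothetical; the tree is created in a
# temporary directory purely for illustration.
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
path = root
# Each level needs an __init__.py to be importable as a regular package.
for part in ("deps", "package00", "deps", "package10"):
    path = os.path.join(path, part)
    os.mkdir(path)
    open(os.path.join(path, "__init__.py"), "w").close()

sys.path.insert(0, root)
module = importlib.import_module("deps.package00.deps.package10")
print(module.__name__)  # deps.package00.deps.package10
```

Every additional nesting level lengthens the dotted path, which is exactly the tedium referred to above.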
Another solution would be to update the upstream repositories first and then recreate the development environment. Depending on how we actually publish the submodules, this might require one or more additional operations before we actually work on the superproject.
If we publish to PyPI, a two-step process is required: first update the git repository, then update the package on PyPI. Although this is a proper deployment procedure, it seems rather complex for local development purposes.
Declaring dependencies through the submodule paths
Probably the simplest solution is to declare the dependency on all submodules by explicitly referring to their respective paths in the requirements.txt file, in reverse order from the deepest to the most shallow, like so:
deps/package00/deps/package10/deps/package20
deps/package00/deps/package10
deps/package00
...
This way, the resolution of all the paths needs to be done only once. Afterwards, one can make revisions directly in the nested submodules and re-create the development environment with
$ make reinstall
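The deepest-first ordering above can also be produced mechanically; a minimal sketch, assuming plain `/`-separated paths (this helper is illustrative, not part of the project):

```python
# Sort submodule paths so the deepest dependencies come first,
# matching the order required in requirements.txt.
paths = [
    "deps/package00",
    "deps/package00/deps/package10/deps/package20",
    "deps/package00/deps/package10",
]
ordered = sorted(paths, key=lambda p: p.count("/"), reverse=True)
print("\n".join(ordered))
```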