Packaging Python Apps
by Morgan Shorter on Mon, 25 Apr 2022
I have never quite understood why Python (or Ruby) packages are delivered through their own manager (pip, gem) instead of the local system package manager (apt, yum, etc.). It might make sense for pure language packages, but it becomes borderline insane when dealing with bindings to native libraries.
Side-stepping the OS distribution creates tons of problems, especially when you rely on something with very problematic design issues like easy_install. If you are interested in the history of software packaging, here is a great post about the subject.
Python bindings, which package manager to choose?
Fortunately, 10 years later, these problems have been recognized and worked on in the Python community. Pip now supports pre-built binaries, though most packages are built for the manylinux target, which links against glibc. That has important implications when we package Python applications as Docker images, which we do for various reasons (including the fact that DjaoDjin is a PaaS that runs Web applications as Docker containers).
The official Python Docker images come in multiple variants:
- python:<version> Debian-based with common packages
- python:<version>-slim Debian-based with minimal packages needed to run python
- python:<version>-alpine Alpine-based, for when small image size is a primary concern
- python:<version>-windowsservercore Windows-based, because well...
First caveat: glibc vs. musl
Alpine is built against musl, not glibc as most distributions, Debian included, are. Musl imposes multiple constraints, such as stack size limits, which some users have reported issues with.
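To see which C library a given image's interpreter is linked against, one option is the standard library's `platform.libc_ver()`. The sketch below is only a quick diagnostic, and `describe_libc` is a hypothetical helper name. It assumes that an empty answer hints at musl, which is the usual behavior on Alpine but is not guaranteed.

```python
import platform
import sys


def describe_libc():
    """Return a (name, version) pair for the C library of this interpreter."""
    name, version = platform.libc_ver()
    # On glibc-based images (Debian, python:<version>-slim) this is usually
    # ("glibc", "2.x"). On musl-based images (Alpine) the standard library
    # typically cannot identify the libc and returns empty strings, so an
    # empty result is reported as "possibly musl" (an assumption, not a proof).
    return (name or "unknown (possibly musl)", version or "unknown")


if __name__ == "__main__":
    libc_name, libc_version = describe_libc()
    print(f"{sys.executable}: libc={libc_name} version={libc_version}")
```

Running this inside each candidate base image gives a quick indication of whether manylinux wheels are likely to apply.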
There is a new PEP (PEP 656) for wheels to be built against musl libc and distributed under the musllinux tag. It is currently not as widely adopted as the manylinux tag. Most crucially, the cffi package does not have musllinux builds yet (Apr 2022). This means that installing compiled-language packages that depend on cffi for a musl-linked Python, even if they have a musllinux build themselves, requires a compilation step for cffi, which makes it somewhat impractical to use Python binaries built against musl.
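The platform tag is the last dash-separated field of a wheel's filename, so it is possible to tell from the name alone which libc family a wheel targets. The following is a minimal sketch of that idea. `wheel_platform_family` is a hypothetical helper, and the family labels ("glibc", "musl", "pure", "other") are made up for illustration.

```python
def wheel_platform_family(filename):
    """Classify a wheel by the libc family its platform tag targets."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    # Wheel names end in -{python tag}-{abi tag}-{platform tag}.whl, so the
    # platform tag is the last dash-separated field.
    platform_tag = filename[: -len(".whl")].split("-")[-1]
    families = set()
    # A compressed tag set can list several platforms separated by dots,
    # e.g. "manylinux_2_17_x86_64.manylinux2014_x86_64".
    for tag in platform_tag.split("."):
        if tag.startswith("manylinux"):
            families.add("glibc")
        elif tag.startswith("musllinux"):
            families.add("musl")
        elif tag == "any":
            families.add("pure")
        else:
            families.add("other")
    return families


if __name__ == "__main__":
    # Illustrative filenames only; they are not taken from a package index.
    for name in (
        "example-1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "example-1.0-cp310-cp310-musllinux_1_1_x86_64.whl",
        "example-1.0-py3-none-any.whl",
    ):
        print(name, "->", sorted(wheel_platform_family(name)))
```

Applied to the list of files a project publishes, a check like this shows whether an Alpine-based image could install pre-built binaries or would fall back to compiling from source.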
On the other hand, unlike Debian's apt, Alpine's package manager, apk, supports temporary / virtual packages. This makes it easier to build software from source without leaving gcc and other build-time prerequisites behind in Docker images. The feature does not spare us the time and resources needed to compile, but it does keep the resulting Docker images free from the security and disk-usage impacts of build-time dependencies.
Practically, any application that requires Python bindings to native code, because, for example, it generates charge receipts (WeasyPrint), processes images (ImageMagick), uses the PostgreSQL bindings (psycopg2), or uses cryptography features (pyca/cryptography), is left with only python:<version> and python:<version>-slim as alternatives at this point.
Second caveat: /usr/local/bin/python
In the official python docker images, python is built from source and installed in /usr/local. This means a few important things:
- Installing Python packages via the OS package manager might install a copy of Python and its dependencies, which may be incompatible with the Python version that was built from source for the official image. Using the OS package manager will also bloat the image with a huge graph of (likely irrelevant) dependencies. It is also less deterministic and more likely to result in run-time errors.
- It is possible (likely) to have a mismatch in version between the python version built from source and the version installed by the OS.
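One way to notice such a mismatch is to look for interpreters in both /usr/local/bin and /usr/bin from inside the image. The sketch below is an assumption about how one might check, not part of the official images. `find_python_executables` is a hypothetical helper, and it only matches names of the form `python3` or `python3.X`.

```python
import os
import re
import sys

# Matches "python3" and "python3.11" but not helpers like "python3-config".
PYTHON_NAME = re.compile(r"python3(\.\d+)?")


def find_python_executables(directories):
    """List files named python3 or python3.X in the given directories."""
    found = []
    for directory in directories:
        if not os.path.isdir(directory):
            continue  # a missing directory simply contributes nothing
        for entry in sorted(os.listdir(directory)):
            if PYTHON_NAME.fullmatch(entry):
                found.append(os.path.join(directory, entry))
    return found


if __name__ == "__main__":
    print("running:", sys.executable)
    for path in find_python_executables(["/usr/local/bin", "/usr/bin"]):
        print("found:", path)
```

If entries show up under both directories, the image carries a second interpreter that was pulled in by the OS package manager.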
Writing a packaging policy
Given the previous analysis, we standardize our Dockerfiles as follows:
- use an official `python:3.X-slim` base image
- prefer pip over the OS package manager for Python bindings
- install pure native code libraries through the OS package manager
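A policy like this can be checked mechanically. The following is a rough sketch of a lint pass over a Dockerfile's text, not a tool we ship. `check_dockerfile` and both patterns are hypothetical. It assumes that `python:3.X-slim` also covers patch versions and a Debian release suffix, and it only recognizes `apt-get install` lines and package names that start with `python-` or `python3-`.

```python
import re

# Assumption: "python:3.X-slim" also covers patch versions and a Debian
# release suffix (e.g. "-bookworm"); tighten the pattern if it should not.
ALLOWED_BASE = re.compile(r"python:3\.\d+(\.\d+)?-slim(-[a-z]+)?")
# Debian-style package names, such as python3-psycopg2, that ship Python code.
OS_PYTHON_PACKAGE = re.compile(r"python3?-[\w.+-]+")


def check_dockerfile(text):
    """Return a list of policy problems found in the Dockerfile text."""
    problems = []
    # Join backslash-continued lines so each instruction is one logical line.
    logical_lines = text.replace("\\\n", " ").splitlines()
    for line in logical_lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].upper() == "FROM":
            # Skip options such as --platform=... to find the image name.
            images = [t for t in tokens[1:] if not t.startswith("--")]
            if not images or not ALLOWED_BASE.fullmatch(images[0]):
                problems.append(f"base image is not python:3.X-slim: {line.strip()}")
        elif "apt-get" in tokens and "install" in tokens:
            for token in tokens:
                if OS_PYTHON_PACKAGE.fullmatch(token):
                    problems.append(f"Python package installed through apt: {token}")
    return problems
```

An empty list means the file passed these two checks. The third rule, installing pure native libraries through the OS package manager, is left to review since the sketch cannot tell which libraries an application needs.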