Tuesday, May 7, 2013

Local PyPI Options

Having a central package repository has helped the Python community immensely by making reusable code easy to share. There are a few issues that arise once you start depending on such a resource, though. You may need to:
  1. make your installs resilient against Internet/PyPI issues,
  2. speed up your installs significantly (after the first one),
  3. prevent problems installing packages that are removed from distribution by the author,
  4. allow installation of packages from within a firewalled environment where the host performing the installation does not have Internet access, and
  5. allow hosting and installation of private packages.
All of this should impose as little overhead on the package users as possible (i.e. maintenance of a system performing the above should be low or none).

Searching for "PyPI" on the package repository is somewhat daunting (the page of results seems to go on forever). After a quick survey of the top hits, there seem to be only a few packages relevant to the above requirements (presented here in the order that the PyPI search ranks them):
  • Flask-Pypi-Proxy - A semi-proxy that supports private package upload. Its dependencies are quite hefty and it does not mirror packages locally.
  • pyramidpypi - "This is a very simple pypi-like server written with the pyramid web framework." Pyramid is a very hefty dependency for such a simple server and it only supports private package upload.
  • simplepypi - a very simple local repository allowing upload of packages and installation of them.
  • yopypi - a "load balancer" which punts requests to a mirror automatically when the primary PyPI is unavailable.
  • djangopypi / djangopypi2 - are both PyPI servers acting as local repositories with the same user interface as the real thing. No proxying, though there is a manual tool infi.pypi_manager which may be used to mirror packages to a local djangopypi.
  • inupypi - appears to also be a "load balancer".
  • mypypi - another local PyPI server using Zope 3.
  • pypiserver - serves files out of local directories or redirects to the real server if not found. Handles upload of private packages. No proxying for missing packages, though it does have a facility for updating packages which are already in the local directories.
  • pyshop - another private repository implementation with access controls built in. It also acts as a caching proxy for packages not present locally. Hefty dependencies (Pyramid, and also an SQL database).
  • spynepi - a proxying server with local storage which also handles local upload of private packages! In Twisted. Uses "spyne", which is some RPC mechanism, and I don't know what it's got to do with PyPI serving. Hefty dependencies.
  • chishop - another simple local repository with upload written in Django.
  • ClueReleaseManager - yet another local repository, though with full metadata support and what appears to be proxying of PyPI metadata, but not files.
  • pyroxy - a proxying index server which can serve local files (but without local caching of proxied files).
  • scrambled - a very simple server of local files (point it at a directory and run).
  • devpi-server - a transparent caching proxy with local storage of the files accessed. Uses a redis database, which is an additional dependency that is a problem in my deployment scenario.
  • collective.eggproxy - implements a caching proxy but has hefty dependencies. It also seems to be very fetch-happy, retrieving eggs I don't actually need.
A lot of the implementations above have a bunch of user controls built into them, and there's an awful lot of "simple PyPI in framework X" implementations. Most of the "proxy" solutions (save pyshop, devpi-server and collective.eggproxy) required manual download of the package files, or they just proxied their requests through to the Internet with no local file storage for speed or resilience. Those three exceptions had dependencies that prevented me from easily installing them into my target environment.

So none of them fit the bill, and none appeared to be easily modifiable to do what I want. So, I wrote my own: proxypypi :-)

When proxypypi is asked about a package it doesn't know, it automatically fetches the file download list for that package and rewrites all references (PyPI and external) so they appear to be local. When one of those now-local package files is requested, it fetches the file contents in the background and streams the new data to the pip request as it arrives (thus keeping that request alive despite pip's very short timeout).
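The two mechanisms described above can be sketched in Python. The first rewrites the download list so that every file looks local. The second serves a file to pip while it is still being fetched. This is a minimal sketch under stated assumptions, not proxypypi's actual code. The names `rewrite_links`, `BackgroundFetch` and the `local_prefix` parameter are hypothetical. The link rewriting uses a naive regex rather than a real HTML parser. The upstream file can be any readable binary file object, such as the response returned by `urllib.request.urlopen`.

```python
import re
import threading
import urllib.parse
from typing import BinaryIO, Dict, Iterator, Tuple

# Naive match for href="..." or href='...'; a real proxy should use an HTML parser.
HREF_RE = re.compile(r"""href=(["'])(.*?)\1""", re.IGNORECASE)


def rewrite_links(html: str, page_url: str, local_prefix: str) -> Tuple[str, Dict[str, str]]:
    """Point every file link in a simple-index page at local_prefix (hypothetical helper).

    Returns the rewritten HTML and a mapping of filename -> upstream URL.
    """
    upstream: Dict[str, str] = {}

    def repl(match):
        quote, href = match.group(1), match.group(2)
        # Resolve relative links (e.g. ../../packages/...) against the page URL.
        url, fragment = urllib.parse.urldefrag(urllib.parse.urljoin(page_url, href))
        filename = urllib.parse.urlsplit(url).path.rsplit("/", 1)[-1]
        if not filename:
            return match.group(0)  # no filename (e.g. a bare home page URL): leave as-is
        upstream[filename] = url
        # Keep any #md5=... fragment so pip can still verify the download.
        local = local_prefix + filename + ("#" + fragment if fragment else "")
        return "href=" + quote + local + quote

    return HREF_RE.sub(repl, html), upstream


class BackgroundFetch:
    """Copy `source` to `dest_path` on a background thread while clients
    stream whatever has already been written (hypothetical helper)."""

    def __init__(self, source: BinaryIO, dest_path: str, chunk_size: int = 8192):
        self._source = source
        self._dest_path = dest_path
        self._chunk_size = chunk_size
        self._written = 0      # bytes flushed to disk so far
        self._done = False     # set once the source is exhausted (or fails)
        self._cond = threading.Condition()
        open(dest_path, "wb").close()  # create the file before any reader opens it
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        try:
            with open(self._dest_path, "ab") as out:
                while True:
                    chunk = self._source.read(self._chunk_size)
                    if not chunk:
                        break
                    out.write(chunk)
                    out.flush()  # make the bytes visible to readers before announcing them
                    with self._cond:
                        self._written += len(chunk)
                        self._cond.notify_all()
        finally:
            with self._cond:
                self._done = True
                self._cond.notify_all()

    def stream(self) -> Iterator[bytes]:
        """Yield the file's bytes as soon as they reach disk, so the client
        keeps receiving data while the upstream download is still running."""
        sent = 0
        with open(self._dest_path, "rb") as f:
            while True:
                with self._cond:
                    while sent >= self._written and not self._done:
                        self._cond.wait()
                    available, done = self._written, self._done
                if sent < available:
                    f.seek(sent)
                    data = f.read(available - sent)
                    sent += len(data)
                    yield data
                elif done:
                    return
```

A hypothetical web handler could look up the requested filename in the mapping returned by `rewrite_links` and start a `BackgroundFetch` from that upstream URL into a local cache directory. It would then return `fetch.stream()` as the response body. Later requests for the same file could be served straight from disk, which gives the speed and resilience goals listed at the start.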