Managing Python Ecosystems
You know that old quote:
The wider the net you cast, the wider the variety you catch.
Was it a wise old fisherman? Or a dogged Python programmer? Either way, words don't come much truer than those.
Few, if any, programming languages have embodied the description "general-purpose" as wholly as Python. And with the wide net of that applicability comes a wide variety in use -- and environments.
Library and framework developers rarely get to control how their code is used, and thus have to think about how their code fits into the whole ecosystem. From writing hybrid code for Python 2 and 3 to inserting shims for Pythons without threading support, there's no rest for the rigorous. Until now.
Announcing ecoutils
Ecosystems differ. Widely. Academic Python tends to be more Windows-heavy, corporate Python will probably forever be entrenched in Python 2, and one can never predict the arrival of that oddball user with the super old version of Python on Cygwin. But these are generalities and we can do better.
Enter ecoutils. ecoutils is a pure-Python module that, using nothing but builtins, generates a semantic, Python-centric profile of the environment that's running it. This includes:
- Host operating system: Windows, OS X, Ubuntu, Debian, CentOS, RHEL, etc.
- Language version: 2.5, 2.6, 2.7, ..., 3.4, 3.5, ..., etc.
- Executable runtime: CPython, PyPy, Jython, etc., (plus build date and compiler)
- Features: 64-bit, IPv6, Unicode character support (UCS-2/UCS-4)
- Built-in library support: OpenSSL, threading, SQLite, zlib, and more
- User environment: umask, ulimit, working directory
- Machine info: CPU count, hostname, filesystem encoding
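To make the list above concrete, here is a minimal sketch of how a handful of those facts can be gathered with the standard library alone. It is not the ecoutils implementation; the function name mini_profile and the choice of fields are my own assumptions for illustration.

```python
import json
import multiprocessing
import os
import platform
import socket
import sys


def mini_profile():
    """Collect a small, ecoutils-like subset of environment facts.

    Hypothetical helper for illustration; not part of boltons.
    """
    features = {
        # A 64-bit build has a maximum native int above 2**32.
        "64bit": sys.maxsize > 2 ** 32,
        # True if this Python build was compiled with IPv6 support.
        "ipv6": socket.has_ipv6,
    }
    # Optional modules can be missing from minimal or custom builds.
    for name in ("ssl", "sqlite3", "zlib", "threading"):
        try:
            __import__(name)
        except ImportError:
            features[name] = False
        else:
            features[name] = True

    return {
        "system": platform.system(),
        "python_implementation": platform.python_implementation(),
        "version_info": list(sys.version_info[:3]),
        "compiler": platform.python_compiler(),
        "fs_encoding": sys.getfilesystemencoding(),
        "cpu_count": multiprocessing.cpu_count(),
        "cwd": os.getcwd(),
        "features": features,
    }


if __name__ == "__main__":
    print(json.dumps(mini_profile(), indent=2, sort_keys=True))
```

The real module covers far more ground (Linux distribution, ulimits, umask, library versions), which is the reason to reach for it rather than rolling your own.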
Now, instead of crossing platform support bridges when users bring them to you, you can be proactive. Now, instead of guessing how developers are using the code, you can design for their needs and watch those needs change.
ecoutils only gets more valuable when code goes to production. If you manage your own machines, you know the risk of version drift and missed boxes only goes up with machine number and time. If you don't manage your machines, it's just a matter of time until someone is being trained on your boxes.
So what does a profile look like?
Generating a profile
Profiles are generated by ecoutils.get_profile(). When run as a module, ecoutils calls get_profile() and prints a JSON-formatted profile. On my fully-updated Ubuntu 14.04 LTS machine, python -m boltons.ecoutils yields:
    {
        "_eco_version": "1.0.0",
        "cpu_count": 4,
        "cwd": "/home/mahmoud/projects/boltons",
        "fs_encoding": "UTF-8",
        "guid": "6b139e7bbf5ad4ed8d4063bf6235b4d2",
        "hostfqdn": "mahmoud-host",
        "hostname": "mahmoud-host",
        "linux_dist_name": "Ubuntu",
        "linux_dist_version": "14.04",
        "python": {
            "argv": "boltons/ecoutils.py",
            "bin": "/usr/bin/python",
            "build_date": "Jun 22 2015 17:58:13",
            "compiler": "GCC 4.8.2",
            "features": {
                "64bit": true,
                "expat": "expat_2.1.0",
                "ipv6": true,
                "openssl": "OpenSSL 1.0.1f 6 Jan 2014",
                "readline": true,
                "sqlite": "3.8.2",
                "threading": true,
                "tkinter": "8.6",
                "unicode_wide": true,
                "zlib": "1.2.8"
            },
            "version": "2.7.6 (default, Jun 22 2015, 17:58:13) [GCC 4.8.2]",
            "version_info": [2, 7, 6, "final", 0]
        },
        "time_utc": "2016-05-24 07:59:40.473140",
        "time_utc_offset": -8.0,
        "ulimit_hard": 4096,
        "ulimit_soft": 1024,
        "umask": "002",
        "uname": {
            "machine": "x86_64",
            "node": "mahmoud-host",
            "processor": "x86_64",
            "release": "3.13.0-85-generic",
            "system": "Linux",
            "version": "#129-Ubuntu SMP Thu Mar 17 20:50:15 UTC 2016"
        },
        "username": "mahmoud"
    }
Weighing in at just over 1KB, it's not too daunting! ecoutils is part of the boltons package, so pip install boltons and see how yours compares.
By virtue of being in boltons, the ecoutils module is also fully standalone, and can be used without the rest of the boltons package. ecoutils has been tested with Python 2.6, 2.7, 3.4, 3.5, and PyPy on Ubuntu, Debian, RHEL, OS X, FreeBSD, and Windows. File an issue if something seems to be broken. Compatibility is the goal.
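Once you have a profile, acting on it is plain dictionary work. The sketch below checks an excerpt of the example profile against a made-up support policy. MIN_VERSION, REQUIRED_FEATURES, and check_profile are hypothetical names, and I am assuming that an absent feature shows up as a falsy value or a missing key.

```python
import json

# Excerpt of the example profile above, trimmed to the fields used here.
PROFILE_JSON = """
{
  "cpu_count": 4,
  "linux_dist_name": "Ubuntu",
  "python": {
    "features": {"64bit": true, "openssl": "OpenSSL 1.0.1f 6 Jan 2014",
                 "sqlite": "3.8.2", "threading": true, "unicode_wide": true},
    "version_info": [2, 7, 6, "final", 0]
  },
  "uname": {"system": "Linux"}
}
"""

MIN_VERSION = (2, 7)  # hypothetical minimum supported Python
REQUIRED_FEATURES = ("openssl", "threading")  # hypothetical requirements


def check_profile(profile):
    """Return a list of human-readable problems found in a profile dict."""
    problems = []
    py = profile["python"]
    version = tuple(py["version_info"][:2])
    if version < MIN_VERSION:
        problems.append(
            "Python %d.%d is older than %d.%d" % (version + MIN_VERSION))
    for name in REQUIRED_FEATURES:
        # Features are booleans or version strings; falsy is read as absent
        # (an assumption, see the note above).
        if not py["features"].get(name):
            problems.append("missing feature: %s" % name)
    return problems


profile = json.loads(PROFILE_JSON)
print(check_profile(profile))  # prints []
```

The same check run at application startup turns "unsupported environment" from a bug report into a log line.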
Transmission and collection
Now, ecoutils is really just part of the solution. Sure, you can write out a quick profile at the top of every log file, and you won't regret it. However, real ecosystem management means running a sort of Python analytics shop.
For those familiar with browsing the Internet, your browser is a virtual machine that has likely been participating in a similar arrangement all day today. Like Google Analytics or Piwik, the setup involves collecting relevant data, and then sending it to a central server for storage and querying.
Collection is handled by ecoutils. As far as transmission is concerned, in development environments, we have a dead-simple, side-effect-minimizing, single-file HTTP client that sends ecoutils profiles to a central analytics server on application startup.
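Our internal client isn't shown here, but the idea fits in a few lines. This is a rough sketch using only the Python 3 standard library. The function names and the collector URL in the usage note are assumptions, and a real deployment would likely add retries or batching.

```python
import http.client
import json
import threading
import urllib.request


def send_profile(profile, url, timeout=2.0):
    """POST a profile dict as JSON and return True on a 2xx response.

    Network and URL problems are swallowed and reported as False, so an
    unreachable collector cannot break the application.
    """
    try:
        body = json.dumps(profile).encode("utf-8")
        request = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def send_profile_in_background(profile, url):
    """Fire-and-forget variant so startup is not delayed by the network."""
    thread = threading.Thread(target=send_profile, args=(profile, url))
    thread.daemon = True  # never keep the process alive for analytics
    thread.start()
    return thread
```

At startup this would be called with the output of get_profile() and whatever endpoint your collector exposes, for example send_profile_in_background(profile, "http://analytics.example.com/profiles"), where that URL is a placeholder.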
In production environments, our framework serves this information for queries on a special port, through SuPPort's MetaService, through clastic's MetaApplication, where this all started. Here's an example of it running in Wikipedia Hashtags Search, on a managed Wikimedia environment, over which I have minimal control and about which I need maximum information. [1]
Push or pull, all the data is stored in a simple SQL (or JSONL) format, as demonstrated by espymetrics, the example project for my Enterprise Software with Python course. Nothing more enterprise than having literally dozens of environments by design, and even more than that by debt.
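For a sense of how little machinery the JSONL route needs, here is a minimal sketch that appends profiles to a file and answers the question "which Python versions are out there?". It is not taken from espymetrics. The function names are hypothetical, and the only assumption about the data is the python.version_info field shown in the example profile.

```python
import collections
import json


def append_profile(path, profile):
    """Append one profile to the store as a single JSON line."""
    with open(path, "a") as f:
        f.write(json.dumps(profile, sort_keys=True) + "\n")


def count_python_versions(path):
    """Count stored profiles per major.minor Python version."""
    counts = collections.Counter()
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            info = json.loads(line)["python"]["version_info"]
            counts["%d.%d" % (info[0], info[1])] += 1
    return counts
```

A count like this, watched over time, is the kind of evidence that lets you retire an old Python version without surprising anyone.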
One last note: data management is all about audience and context. If you're an administrator in a professional setting, the data above is great. But there are understandably some cases where you might want something less identifiable. get_profile has a scrub flag that handles that. See the docs for details.
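To show what "less identifiable" can mean in practice, here is a sketch of scrubbing an already-collected profile. It is an illustration of the idea, not the code behind the scrub flag. The list of fields to mask is my assumption, picked from the example profile, and the docs remain the authority on what the real flag does.

```python
import copy

# Assumed set of identifying top-level fields, chosen from the example
# profile; check the ecoutils docs for what the real scrub flag masks.
IDENTIFYING_KEYS = ("cwd", "hostname", "hostfqdn", "username")


def scrub_profile(profile, placeholder="-"):
    """Return a copy of the profile with identifying values masked."""
    scrubbed = copy.deepcopy(profile)  # leave the caller's dict untouched
    for key in IDENTIFYING_KEYS:
        if key in scrubbed:
            scrubbed[key] = placeholder
    # Nested fields that also reveal the host or user.
    if "node" in scrubbed.get("uname", {}):
        scrubbed["uname"]["node"] = placeholder
    for key in ("bin", "argv"):
        if key in scrubbed.get("python", {}):
            scrubbed["python"][key] = placeholder
    return scrubbed
```

Fields such as cpu_count and the Python version survive untouched, so a scrubbed profile is still useful for the aggregate questions.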
Success stories
Originally designed for easier remote administration across multiple environments, a little bit of info has had far-reaching impacts. For a few examples from my work at PayPal, this approach enabled us to:
- Deprecate and remove production Python 2.6 support from our framework, simplifying our build matrix without customer impact.
- Actively engage new users attempting to use our framework with unsupported Pythons or OSes.
- Improve utilization through designing for observed CPU counts.
In practice, ecoutils combines well with psutil data to go even further in utilization.
Building for variation
Some of you probably came here expecting to read yet another great post about virtualenv, tox, and maybe even conda envs. I'm glad you've already heard of them, because they're a big part of the story. If you haven't yet explored these tools, check them out, because they are invaluable for cross-version Python testing and packaging.
Also, if you're working on an open-source library, I can vouch for Travis CI (Linux) and AppVeyor (Windows) as very valuable providers for cross-platform testing. I use both of them on boltons, and they make it easier, not harder, for contributors to submit pull requests with confidence. Most outfits can't afford to have a team member leading support for each platform, like we do at PayPal.
Conclusion
Python is more than just an expressive, succinct programming language. In a diverse world, Python is a tremendous force, made so by its wide deployment, cross-platform support, and external library integrations. Python gives you SQLite, JSON, SSL, Unicode, and much more, but with many necessary strings attached to Python version, build, or environment. ecoutils offers an experienced look at the real features that affect the value of Python components and teams.
Don't leave ecosystems and their constituents to chance, whim, or fad. Collect the data that makes your ecosystem unique, and make measured decisions based on the realest demand: actual usage.
[1] When that server seems slow, remember to donate to Wikipedia. And maybe volunteer, because money alone does not make servers run fast.