Managing Python Ecosystems

You know that old quote:

The wider the net you cast, the wider the variety you catch.

Was it a wise old fisherman? Or a dogged Python programmer? Either way, words don't come much truer than those.

Few, if any, programming languages have embodied the description "general-purpose" as wholly as Python. And with the wide net of that applicability comes a wide variety in use -- and environments.

Library and framework developers rarely get to control how their code is used, and thus have to think about how their code fits into the whole ecosystem. From writing hybrid code for Python 2 and 3 to inserting shims for Pythons without threading support, there's no rest for the rigorous. Until now.

Announcing ecoutils

Ecosystems differ. Widely. Academic Python tends to be more Windows-heavy, corporate Python will probably forever be entrenched in Python 2, and one can never predict the arrival of that oddball user with the super old version of Python on Cygwin. But these are generalities and we can do better.

Enter ecoutils. ecoutils is a pure-Python module that, using nothing but builtins, generates a semantic, Python-centric profile of the environment that's running it. This includes:

Now, instead of crossing platform support bridges when users bring them to you, you can be proactive. Now, instead of guessing how developers are using the code, you can design for their needs and watch those needs change.

ecoutils only gets more valuable when code goes to production. If you manage your own machines, you know the risk of version drift and missed boxes only goes up with machine number and time. If you don't manage your machines, it's just a matter of time until someone is being trained on your boxes.

So what does a profile look like?

Generating a profile

Profiles are generated by ecoutils.get_profile().

When run as a module, ecoutils calls get_profile() and prints a JSON-formatted profile. On my fully-updated Ubuntu 14.04 LTS machine, python -m boltons.ecoutils yields:

{
  "_eco_version": "1.0.0",
  "cpu_count": 4,
  "cwd": "/home/mahmoud/projects/boltons",
  "fs_encoding": "UTF-8",
  "guid": "6b139e7bbf5ad4ed8d4063bf6235b4d2",
  "hostfqdn": "mahmoud-host",
  "hostname": "mahmoud-host",
  "linux_dist_name": "Ubuntu",
  "linux_dist_version": "14.04",
  "python": {
    "argv": "boltons/ecoutils.py",
    "bin": "/usr/bin/python",
    "build_date": "Jun 22 2015 17:58:13",
    "compiler": "GCC 4.8.2",
    "features": {
      "64bit": true,
      "expat": "expat_2.1.0",
      "ipv6": true,
      "openssl": "OpenSSL 1.0.1f 6 Jan 2014",
      "readline": true,
      "sqlite": "3.8.2",
      "threading": true,
      "tkinter": "8.6",
      "unicode_wide": true,
      "zlib": "1.2.8"
    },
    "version": "2.7.6 (default, Jun 22 2015, 17:58:13) [GCC 4.8.2]",
    "version_info": [2, 7, 6, "final", 0]
  },
  "time_utc": "2016-05-24 07:59:40.473140",
  "time_utc_offset": -8.0,
  "ulimit_hard": 4096,
  "ulimit_soft": 1024,
  "umask": "002",
  "uname": {
    "machine": "x86_64",
    "node": "mahmoud-host",
    "processor": "x86_64",
    "release": "3.13.0-85-generic",
    "system": "Linux",
    "version": "#129-Ubuntu SMP Thu Mar 17 20:50:15 UTC 2016"
  },
  "username": "mahmoud"
}

Weighing in at just over 1KB, it's not too daunting! ecoutils is part of the boltons package, so pip install boltons and see how yours compares.

By virtue of being in boltons, the ecoutils module is also fully standalone, and can be used without the rest of the boltons package. ecoutils has been tested with Python 2.6, 2.7, 3.4, 3.5, and PyPy on Ubuntu, Debian, RHEL, OS X, FreeBSD, and Windows. File an issue if something seems to be broken. Compatibility is the goal.
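To give a feel for what "a Python-centric profile from nothing but builtins" can mean, here is a minimal sketch that gathers a handful of the fields shown in the sample output using only the standard library. The function name mini_profile and the choice of fields are assumptions made for illustration; this is not how ecoutils itself is implemented, and it covers far less ground.

```python
import json
import platform
import sys


def mini_profile():
    """Collect a small, JSON-serializable subset of environment facts.

    Hypothetical helper for illustration only; not the ecoutils code.
    """
    features = {"64bit": sys.maxsize > 2**32}

    # Optional modules are probed defensively: a build without one of
    # them records None instead of raising ImportError.
    try:
        import sqlite3
        features["sqlite"] = sqlite3.sqlite_version
    except ImportError:
        features["sqlite"] = None
    try:
        import ssl
        features["openssl"] = ssl.OPENSSL_VERSION
    except ImportError:
        features["openssl"] = None
    try:
        import zlib
        features["zlib"] = zlib.ZLIB_VERSION
    except ImportError:
        features["zlib"] = None

    return {
        "python": {
            "bin": sys.executable,
            "version": sys.version,
            "version_info": list(sys.version_info),
            "features": features,
        },
        "uname": {
            "system": platform.system(),
            "machine": platform.machine(),
            "release": platform.release(),
        },
    }


if __name__ == "__main__":
    print(json.dumps(mini_profile(), indent=2, sort_keys=True))
```

The defensive imports are the point of the sketch: features like SQLite or SSL depend on how a given Python was built, so a profiler has to treat their absence as data rather than as an error.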

Transmission and collection

Now, ecoutils is really just part of the solution. Sure, you can write out a quick profile at the top of every log file, and you won't regret it. However, real ecosystem management means running a sort of Python analytics shop.
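Writing a profile at the top of a log can be as small as the sketch below. The helper name log_profile, the "env_profile" prefix, and the logger name are all assumptions; the profile is passed in as a plain dict, which in practice would come from ecoutils.get_profile().

```python
import json
import logging
import platform


def log_profile(logger, profile):
    """Emit the profile as one compact JSON line at INFO level.

    Returns the serialized line so callers can reuse or inspect it.
    """
    # sort_keys keeps the line stable across runs, which makes log
    # diffs and grep-based comparisons easier.
    line = json.dumps(profile, sort_keys=True, separators=(",", ":"))
    logger.info("env_profile %s", line)
    return line


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Stand-in profile so the sketch runs without boltons installed.
    profile = {
        "python": {"version": platform.python_version()},
        "uname": {"system": platform.system()},
    }
    log_profile(logging.getLogger("myapp"), profile)
```

Keeping the whole profile on a single line means one grep over a directory of logs is enough to see which environments produced them.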

For those familiar with browsing the Internet, your browser is a virtual machine that has likely been participating in a similar arrangement all day today. Like Google Analytics or Piwik, the setup involves collecting relevant data, and then sending it to a central server for storage and querying.

Collection is handled by ecoutils. As far as transmission is concerned, in development environments, we have a dead-simple, side-effect-minimizing, single-file HTTP client that sends ecoutils profiles to a central analytics server on application startup.
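A startup-time sender along those lines might look like the following sketch. It is not the client described here; the function names, the JSON-over-POST format, and the timeout are assumptions. The one property it tries to preserve is that a failed send never disturbs application startup.

```python
import json
import urllib.request


def build_profile_request(profile, url):
    """Build (but do not send) a JSON POST request carrying the profile."""
    body = json.dumps(profile).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_profile(profile, url, timeout=2.0):
    """Best-effort send: returns True on a 2xx response, False otherwise.

    Every failure (bad URL, refused connection, timeout, HTTP error) is
    swallowed on purpose, because analytics must not break the app.
    """
    try:
        request = build_profile_request(profile, url)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except Exception:
        return False
```

A call such as send_profile(profile, "http://analytics.example.internal/profiles") would go near the top of the application's entry point; that URL is a placeholder, not a real endpoint.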

In production environments, our framework serves this information for queries on a special port, through SuPPort's MetaService and through clastic's MetaApplication, where this all started. Here's an example of it running in Wikipedia Hashtags Search, on a managed Wikimedia environment over which I have minimal control and about which I need maximum information. [1]

Push or pull, all the data is stored in a simple SQL (or JSONL) format, as demonstrated by espymetrics, the example project for my Enterprise Software with Python course. Nothing more enterprise than having literally dozens of environments by design, and even more than that by debt.
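As a rough idea of what "a simple SQL format" can involve, the sketch below stores incoming profiles in SQLite and answers one typical question: how many reporting environments run each Python version. The table layout and function names are assumptions for illustration and are not taken from espymetrics.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a profile store; defaults to an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles ("
        " guid TEXT, hostname TEXT, python_version TEXT,"
        " system TEXT, raw_json TEXT)"
    )
    return conn


def store_profile(conn, profile):
    """Insert one profile, pulling a few fields out for easy querying."""
    version_info = profile.get("python", {}).get("version_info", [])
    version = ".".join(str(part) for part in version_info[:3])
    conn.execute(
        "INSERT INTO profiles VALUES (?, ?, ?, ?, ?)",
        (
            profile.get("guid"),
            profile.get("hostname"),
            version,
            profile.get("uname", {}).get("system"),
            # Keep the full profile too, so new questions can be asked later.
            json.dumps(profile, sort_keys=True),
        ),
    )
    conn.commit()


def version_counts(conn):
    """Return {python_version: number_of_profiles}."""
    rows = conn.execute(
        "SELECT python_version, COUNT(*) FROM profiles"
        " GROUP BY python_version ORDER BY python_version"
    )
    return dict(rows.fetchall())
```

Storing the raw JSON next to the extracted columns is a deliberate trade: the columns make common queries cheap, while the raw blob means a field nobody thought to extract is still recoverable.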

One last note: data management is all about audience and context. If you're an administrator in a professional setting, the data above is great. But there are understandably some cases where you might want something less identifiable. get_profile has a scrub flag that handles that. See the docs for details.
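To make the idea of scrubbing concrete, here is a sketch of masking identifying fields in an already-collected profile dict. The list of fields is an assumption picked from the sample profile shown earlier, and scrub_profile is a hypothetical helper; the scrub flag in ecoutils may mask a different set, so the docs remain the authority.

```python
import copy

# Assumed set of identifying fields, expressed as key paths into the
# profile dict. The real scrub flag may cover different fields.
IDENTIFYING_PATHS = [
    ("cwd",),
    ("hostname",),
    ("hostfqdn",),
    ("username",),
    ("uname", "node"),
    ("python", "bin"),
    ("python", "argv"),
]


def scrub_profile(profile, mask="-"):
    """Return a copy of profile with identifying values replaced by mask."""
    scrubbed = copy.deepcopy(profile)  # never mutate the caller's dict
    for path in IDENTIFYING_PATHS:
        node = scrubbed
        # Walk down to the dict that holds the final key, if it exists.
        for key in path[:-1]:
            node = node.get(key) if isinstance(node, dict) else None
        if isinstance(node, dict) and path[-1] in node:
            node[path[-1]] = mask
    return scrubbed
```

Masking values rather than deleting keys keeps every profile the same shape, so storage and queries do not need a separate code path for scrubbed data.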

Success stories

Originally designed for easier remote administration across multiple environments, a little bit of info has had far-reaching impacts. For a few examples from my work at PayPal, this approach enabled us to:

In practice, ecoutils combines well with psutil data to go even further in utilization.

Building for variation

Some of you probably came here expecting to read yet another great post about virtualenv, tox, and maybe even conda envs. I'm glad you've already heard of them, because they're a big part of the story. If you haven't yet explored these tools, check them out, because they are invaluable for cross-version Python testing and packaging.

Also, if you're working on an open-source library, I can vouch for Travis CI (Linux) and AppVeyor (Windows) as very valuable providers for cross-platform testing. I use both of them on boltons, and they make it easier, not harder, for contributors to submit pull requests with confidence. Most outfits can't afford to have a team member leading support for each platform, like we do at PayPal.

Conclusion

Python is more than just an expressive, succinct programming language. In a diverse world, Python is a tremendous force, made so by its wide deployment, cross-platform support, and external library integrations. Python gives you SQLite, JSON, SSL, Unicode, and much more, but with many necessary strings attached to Python version, build, or environment. ecoutils offers an experienced look at the real features that affect the value of Python components and teams.

Don't leave ecosystems and their constituents to chance, whim, or fad. Collect the data that makes your ecosystem unique, and make measured decisions based on the realest demand: actual usage.


  1. When that server seems slow, remember to donate to Wikipedia. And maybe volunteer, because money alone does not make servers run fast. 

