I have adopted a parsimonious approach towards the inclusion of library dependencies in my software projects. I've been meaning to write about this since before the left-pad debacle, and at the time I even teased it a bit:

left-pad exposes cultural problems with respect to dependencies and code reuse, the technology is just an enabler

First, I want to go on an absolutely massive tangent, and start with 3 axioms of dependency management complexity:

  1. Cool APIs Don't Change; all other things being equal, a stable API is better than one that keeps changing.

  2. Dependency graphs have a total complexity on the order of their edge count.

  3. Edge weight isn't constant; it scales up by some factor based on the number of paths that use it.

(Henceforth, "dependency" will be abbreviated "dep")

2 has some good data going for it, and I suspect there may be ways to prove 3, though it may also be the kind of thing that graph theorists argue about when they aren't trying to color in high-dimensional cubes. If you have even a basic familiarity with graph theory and how it applies to networks, then take the opposite approach to deps. In a network, flow and connectivity are what you want; in a dep graph they are bad, because they propagate the breakages that develop when deps change. Star topology is good, because isolation is both insulation and loose coupling, which is a computer scientist's way of saying "superior."

These axioms suggest that relying on lots of deps isn't necessarily bad if their arrangement is simple. This is good news, because it means I'm not just some stuffy Luddite admonishing you to stop installing code off the internet.
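To make axioms 2 and 3 a little more concrete, here is a rough sketch that counts edges in a dep graph and counts how many import paths from the root reach each package. The `Graph` type, the function names, and the package names are all made up for illustration, and it assumes the graph is acyclic (as Go import graphs are). An edge out of a package is "used" by as many paths as reach that package.

```go
package main

import "fmt"

// Graph maps a package name to the packages it imports directly.
// This representation is an assumption made for the sketch.
type Graph map[string][]string

// EdgeCount returns the total number of import edges (axiom 2).
func EdgeCount(g Graph) int {
	n := 0
	for _, deps := range g {
		n += len(deps)
	}
	return n
}

// PathsTo returns, for every package reachable from root, the number of
// distinct import paths from root to it. Any edge leaving a package is
// traversed by that many paths (axiom 3). It walks every path, so it
// assumes an acyclic graph and is only suitable for small examples.
func PathsTo(g Graph, root string) map[string]int {
	counts := map[string]int{}
	var walk func(node string)
	walk = func(node string) {
		counts[node]++
		for _, dep := range g[node] {
			walk(dep)
		}
	}
	walk(root)
	return counts
}

func main() {
	// Four deps, all imported directly by the application.
	star := Graph{"app": {"a", "b", "c", "d"}}

	// The same four deps, but a and b both import c, which imports d.
	tangled := Graph{
		"app": {"a", "b"},
		"a":   {"c"},
		"b":   {"c"},
		"c":   {"d"},
	}

	fmt.Println("star edges:", EdgeCount(star))       // 4
	fmt.Println("tangled edges:", EdgeCount(tangled)) // 5
	fmt.Println("star paths to d:", PathsTo(star, "app")["d"])       // 1
	fmt.Println("tangled paths to d:", PathsTo(tangled, "app")["d"]) // 2
}
```

Both graphs pull in the same four packages, but in the tangled one a breaking change in `d` arrives by two routes instead of one.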

If you want to try and guess how much a dep will cost, here's a napkin estimate. Come up with 3 numbers: how frequently a dep changes (a real number from 0 to 1), its distance from the root of your dep graph, and the number of edges coming out of it. Since I'm not very sophisticated, I think it's a reasonable approximation to multiply all the terms together 1. Deps that are isolated, stable, and directly imported are cheap, and those that are volatile, tightly coupled with others, and a distant concern are expensive.
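As a minimal sketch of that napkin estimate, the three numbers can be multiplied like so. The `Dep` type, its field names, and the sample values are hypothetical; only the "multiply the three terms" rule comes from the paragraph above.

```go
package main

import "fmt"

// Dep holds the three napkin-estimate inputs. The type and field names
// are invented for this sketch.
type Dep struct {
	Name       string
	ChangeFreq float64 // how often the dep changes: 0 (never) to 1 (constantly)
	Depth      int     // distance from the root of the dep graph (1 = directly imported)
	OutEdges   int     // number of edges coming out of the dep
}

// Cost multiplies the three terms together.
func (d Dep) Cost() float64 {
	return d.ChangeFreq * float64(d.Depth) * float64(d.OutEdges)
}

func main() {
	// Sample values are made up purely to show the two extremes.
	deps := []Dep{
		{Name: "stable-direct", ChangeFreq: 0.05, Depth: 1, OutEdges: 1},
		{Name: "volatile-deep", ChangeFreq: 0.8, Depth: 3, OutEdges: 6},
	}
	for _, d := range deps {
		fmt.Printf("%-14s cost=%.2f\n", d.Name, d.Cost())
	}
}
```

One quirk of a plain product is that a leaf with no outgoing edges scores exactly zero, the same as a dep that never changes. That seems in the spirit of a napkin estimate, but it is worth knowing before comparing scores.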

Obviously, you cannot arbitrarily add code to a project without it getting more complex. The first term having a limit at zero is meant to suggest that deps that rarely change (like standard libraries) do not impose a lot of extra management overhead as you upgrade them. It's why stable libraries are better than unstable ones. And it's why people in the Go community have been banging on about vendoring even though nobody wants to hear it.

The elephant in the room about vendoring is that it works great for applications and large projects being undertaken by teams at companies, but it's rubbish for libraries. Peter Bourgon's Go best practices post has a nice section on dependency management, and in it he claims:

Libraries with vendored dependencies are very difficult to use; so difficult that it is probably better said that they are impossible to use. [...] Without getting too deep in the weeds, the lesson is clear: libraries should never vendor dependencies.

This is an uncontroversial opinion among those who have actually tried it.

Lots of languages today ship with or encourage the use of a tool 2 and repository that attempts to fetch deps automatically. Go 1.0 shipped with a set of tools that included:

  • a way to fetch deps based on the contents of another package
  • a canonical way to build and link packages based on conventions
  • static linkage by default

This made dep resolution a build time concern. Version resolution for production deployments was something you solved once, albeit manually, and not on every box you deployed to. The simplistic go get lowered the barrier to entry for tinkerers and library authors; if you had a URL or a github account, you could "publish" a package without worrying about the land rush.

With no solution for reproducible builds or pinning other than "we prefer vendoring" and "go get is not a package manager", the community tried out a bunch of stuff. Amazingly, lots of them focus on vendoring, and most of them are different from the standard approach. To contrast, without a pre-existing automatic build system, Rust's build tool and package manager were combined into one project. It follows all of the standard practices, and in a move of unprecedented irony was named cargo. 3

It's this class of tools that I tweeted about.

The manifest+lock pattern allows you to fetch, freeze and pin dep versions, but it doesn't make any improvements to the management process. The TCO doesn't change. Over time, people start to underestimate the cost of longer term maintenance. At its worst, you get an almost comical combinatorial explosion and you're told not to worry and not to look behind the curtain. Eventually, someone decides it's overly complex and they build something new and simple, but the culture ensures they end up right back where they started.

If you think the culture isn't informed by the tooling, next time you go to include a dep in your project ask yourself if you'd still do so if it were a C project and the library was a shared object file instead. This process is so daunting that a special type of hell was invented to describe it. Many Go bindings for C projects don't bother and just bundle all of the C code.

I'm cautiously optimistic that once we've solved the minor technical hurdles that are conveniently modelling these real long term complexities, we'll be able to keep our culture of suspicion and aversion when it comes to deps. Aside from those on the core team, we've long had influential members of the community supporting this; Blake Mizerany's dotGo 2014 talk on The Three Fallacies of Dependencies is an early example, and they echo even today in Peter's best practices blog post.

If you're writing a library, focus on what your library is trying to do, and don't go including a bunch of deps for a few bells and whistles. If you want a simpler project and a simpler way to manage your deps, then by far the easiest and most effective thing to do is use fewer libraries.

  1. If this doesn't exist, we can call it the Moiron score. It might not have a ring to it, but people might start to learn how to pronounce my name.

  2. Central open source repos for languages probably started with Perl's CPAN, but many languages have adopted one (or more!) tools to manage their own dependencies: npm, a number of PHP attempts culminating in Composer, pypi/cheeseshop, gem/rubygems, cabal/hackage, rebar/hex and now cargo/crates, et al.

  3. Sincere apologies for this, I could not help myself. I really do think, given the calibre of the people who built it, that they might have thought a bit out of the box (UGH sorry again) if the build system was already there.

Aug 3