Johnny Cache

I've been waiting a long time to write about this. Johnny Cache is now released upon the world. It's a drop-in caching library/framework for Django that will cache all of your querysets forever in a consistent and safe manner. You can install it via pip install johnny-cache.

Conceptually, Johnny Cache started when I wrote the 'Queryset Caching' post last May. That was written after the ideas for how to implement such a cache had coalesced into a plan, but before an implementation had been created. A proof of concept was developed that summer and put into production on a fairly large site. The code that went into that version was probably not releasable; but a lot of work had gone into the code (and the testing suite), and I couldn't bear to start over on a clean implementation.

This January rolled along, and a few events converged that convinced me I needed to increase my open source footprint. Johnny had proven to be central to the scaling capability of the application where it was in use, and I felt that it would be a real benefit to the community to rewrite it. I created a repository for an MIT licensed project and threw up the easy part; a thread-local caching mechanism similar to the locmem backend, but cleared after every request.

After I gave Jeremy the URL to the hg repository, he banged out a nice framework for how the queryset caching mechanism would work in 1.2 (the patch point had changed), and declared it "pretty much done." It was fantastic; ~100 lines or so of clean, concise python code. And of course, it didn't work at all. Nearly 80 commits, 2000 lines of tests, fixtures, documentation, edge case handling, 1.1 support, and actual implementation later (his version handled generation keys but completely left out queryset keys based on the generations), we had something we were confident would finally work as advertised. Johnny's documentation explains what the project is, and what it does, but I want to reflect a bit more on the process of its development.

This is the first project I've really worked on with a Distributed VCS that I've had more than one developer working on. Both of us are really used to a centralized repos, and I don't think we quite embraced the "every change is a branch" philosophy. Other than that, working in a distributed nature wasn't really that helpful, because most of our machines don't have unfettered access to the internet and so it can be difficult to share revisions between us without hitting the central repository. This is, perhaps, one of the true draws of sites like bitbucket and github. I've found that dvcs is great when developers are working in isolation (like, forking beaker to write a new auth backend) but I prefer a centralized development repos when working on the same issues. Mercurial does either quite well.

This is also the first time I've used the Sphinx Documentation system that python.org et al use. I've used stuff like JavaDoc and Doxygen in the past, and frankly the documentation it produces is almost always worthless, even for a library. It doesn't highlight the important pieces properly, it doesn't provide room for exposition, etc. I've shied away from Sphinx in the past for a few reasons; the startup cost seemed a bit steep (it really isn't), and there was a conceptual confusion in the steps between writing the documentation and getting the results you desire. Finally, I didn't want to really write lots of extra documentation that would live outside my code.

My feelings on writing "extra" documentation have substantially changed, however, and Sphinx offers a best-of-both-worlds hybrid, with commands that will automatically pull documentation strings or automatically document modules, functions, classes, etc, but more or less leave the entire form and function of the documentation up to the actual ReST documents themselves. When I need to explain something, I explain it. When I want to include docstrings, or highlight a specific function as a method of doing some higher level action within the context of my app/library, I can do that with ease. It was a delight.

This was also the first project that I've used a significant amt of TDD for. The implementation details for Django 1.1 and Django 1.2 were radically different (1.2 had to support multiple databases, itself a major change), but in the end we wanted the software to more or less operation in the same transparent manner. Having a thorough testing suite caught tons of subtle bugs in behavior (including lots of regressions) that we might never have found otherwise. There were a few database-specific behavioral bugs that would have passed under one db but not another, which could very well have gone unnoticed were it not so easy to set up different environments and test.

I still feel like TDD works much better when it's easy to define correctness, but perhaps it's the case that any software development is much better when that's the case. The test suite for johnny-cache is extensive, and at least 1/3rd of the tests were written before the code to pass it was. A culture of healthy fear grew up around the tests; if there were acknowledged holes in the testing suite, the code that would supposedly functioned to pass those gaps was presumed to be buggy, and this defensive posture helped us find a few bugs.

Finally, the difference between "hacking" and "shipping" is pretty apparent when you do the legwork to get proper documentation written, set up the right distribution channels, and you have a desire for that code to be used by people, and in some way be a representation of your ideas and abilities. There are still lots of options for future development on Johnny; we've only got the basic operation of the cache up, but there's tons of things you might want to do once you start to understand how your app is utilizing Johnny.

Feb 28 2010

jmoiron plays the blues

Johnny Cache