I've been pleasantly surprised with the amount of interest in johnny-cache since Jeremy and I released it this past weekend. A lot of the comments revealed that perhaps the documentation is missing an important discussion on the repercussions of using Johnny. They are also pretty positive about the name :)
"Is johnny-cache for you?" is the most important question that is not answered by the documentation. Using Johnny is really adopting a particular caching strategy. This strategy isn't always a win; it can impact performance negatively:
- any real database read is first a cache miss, then a cache write
- any database write is a cache write
- any write to any table invalidates every cached result that depends on that table
- there are extra cache reads on every request to load the current generations
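To make those costs concrete, here is a minimal sketch of a table-generation cache. It is a toy model and not Johnny's actual implementation: a plain dict stands in for memcached, and the class name, key layout, and table names are all made up for illustration.

```python
class ToyGenerationalCache:
    """Toy model of per-table generation caching (not Johnny's real code)."""

    def __init__(self):
        self.store = {}      # stand-in for memcached
        self.db_reads = 0    # how many times the "database" was really queried

    def _generation(self, table):
        # The extra cache read needed to learn a table's current generation.
        return self.store.get(("gen", table), 0)

    def _key(self, sql, tables):
        # The key embeds the generation of every table the query touches.
        gens = tuple((t, self._generation(t)) for t in sorted(tables))
        return ("query", sql, gens)

    def read(self, sql, tables, run_query):
        key = self._key(sql, tables)
        if key in self.store:
            return self.store[key]   # cached read: the database is never touched
        self.db_reads += 1           # a real read is first a cache miss...
        result = run_query()
        self.store[key] = result     # ...and then a cache write
        return result

    def write(self, table):
        # A database write is also a cache write: bumping the generation
        # orphans every cached result whose key included the old one.
        self.store[("gen", table)] = self._generation(table) + 1


def fetch_posts():
    return ["post 1", "post 2"]


cache = ToyGenerationalCache()
sql = "SELECT ... FROM blog_post JOIN auth_user ..."
tables = {"blog_post", "auth_user"}

cache.read(sql, tables, fetch_posts)   # miss, goes to the database
cache.read(sql, tables, fetch_posts)   # hit
cache.write("auth_user")               # invalidates the join above
cache.read(sql, tables, fetch_posts)   # miss again
print(cache.db_reads)                  # 2
```

Note that the write to auth_user invalidated a query that was mostly about blog_post; that is the price of invalidating per table rather than per row.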
The major positive impact is:
- any cached read doesn't hit your database
This turns out to be a pretty exceptional positive for a pretty large class of applications. Loading from memcached is going to smoke even your db's query cache with respect to latency while giving you cheap and easy horizontal scalability. It's not often you get these two coming hand in hand.
Every time you do a query that hits cache, your database doesn't have to accept a connection, allocate cursors, examine your query, execute it, and return the result. This is a fairly heavy load to lift off of your database servers.
If you were using something akin to MySQL's query cache before, you can pretty much turn it off. Not only do you get that memory back for loading indexes, performing queries, etc, but you can now horizontally scale your query cache with ease.
Pre-Django 1.2, splitting db reads and db writes at the application level was a real pain. Scaling reads across a pool of read-only databases is no picnic, either. For a read-heavy application, Johnny can alleviate so much read traffic that you can potentially just scale reads in memcached. Even if you need to horizontally scale reads across a read-only pool, the databases in it now have a shared queryset cache, such that a read served by one slave saves a read on another.
Still, writes eventually happen, and when they do, Johnny will blow away every cached result that depends on the table written to. The implications of this are that Johnny's effectiveness is reduced if you:
- have "logical" write operations that hit many tables
- write heavily to one table that is then featured in many joins
- have very few tables
An underappreciated caveat to this is that the relative frequency of your writes and reads matters quite a bit. Take a simple one page, one query, one table scenario where you are receiving about 1 write per second. This might seem like too often for Johnny to be useful, but if you serve 30 pages per second, only the first read after each write misses, so you are hitting cache on 29 of every 30 reads, or nearly 97% of the time.
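The arithmetic behind that figure is easy to check. The helper below is a back-of-the-envelope sketch with a made-up name, assuming evenly spaced traffic on a single cached query where each write forces exactly one miss (the first read after it):

```python
def steady_state_hit_rate(reads_per_second, writes_per_second):
    """Fraction of reads served from cache for one query on one table.

    Assumes each write invalidates the cached result once, so the next
    read misses and every read after that hits until the next write.
    """
    if reads_per_second <= 0:
        raise ValueError("reads_per_second must be positive")
    # There can never be more misses than there are reads.
    misses = min(writes_per_second, reads_per_second)
    return 1.0 - misses / reads_per_second


rate = steady_state_hit_rate(reads_per_second=30, writes_per_second=1)
print("%.1f%%" % (rate * 100))   # 96.7%
```

Under the same assumptions, 10 writes per second against 30 reads per second still leaves two thirds of reads served from cache; the hit rate only reaches zero once writes arrive at least as often as reads.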
Typical webapps are going to read far more often than they write, and serve a few pages far more often than the other pages on the site. For these apps, Johnny will probably work quite well. Even in cases where it doesn't fly, it's probably a good starting point.
But due to the magic of the internet, I don't have to rely solely on hypothetical and anecdotal evidence. Someone running such an application tried Johnny out and wrote a nice little blog post about his results. His chart even suggests that his application is quite write heavy. It also looks pretty similar to what we saw when we pushed the primordial version of Johnny live last year. The post itself is pretty fascinating; the Reader's Digest translation is that he already had some caching in place, but installed Johnny, set it up, and his query count still dropped pretty dramatically (illustrated). Note that it wasn't just cache hits that dropped; Johnny can cache some queries that MySQL can't, and there are other classes of queries that are impossible to cache but are easily avoided. Despite that initial positive result, he noticed that his CPU utilization and context-switching increased, likely because memcached and MySQL (and perhaps even his app server) were running on the same box.
So, where to take Johnny from here? Johnny is version '0.1' not because we think it's barely ready for use, but because we felt like we released the smallest piece of software that could actually be of use.
The first improvement would be a way to allow application authors to keep Johnny from caching result sets from tables that receive very heavy write traffic, like a log table. Although monkeypatching was really the only way to achieve the level of integration and simplicity we needed, you always have to acknowledge that there will be cases where people only want to use your code some of the time, or maybe most of the time, but not all of the time. Some kind of model annotation or table blacklist might suffice here, but I want to think through this and its invalidation implications a bit more before deciding on how to do it.
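As a rough sketch of what the blacklist variant could look like (purely hypothetical: the setting name, the helper, and the table names below are invented here and are not part of Johnny 0.1), the check itself is small. The open question is the one mentioned above, namely what a blacklisted table means for queries that join against it. This sketch takes the conservative route and skips caching for any query that touches one.

```python
# Hypothetical setting: tables whose result sets should never be cached.
TABLE_BLACKLIST = frozenset(["app_requestlog"])


def should_cache(tables, blacklist=TABLE_BLACKLIST):
    """Return True only if none of the tables a query touches is blacklisted."""
    return blacklist.isdisjoint(tables)


print(should_cache(["blog_post", "auth_user"]))        # True
print(should_cache(["blog_post", "app_requestlog"]))   # False
```

One consequence of this choice is that writes to a blacklisted table never need to bump a generation, since nothing cached can depend on it.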
Another improvement I want is increased access to the generational keys Johnny maintains. I recognize cases where you might want to use Johnny's invalidation to consistently cache higher level objects like html fragments or even entire pages. Consider something like a @invalidate_on_model(Post) decorator for an RSS feed of latest blog posts that would only have to be generated upon the first read, and invalidates automatically when the Post's table is altered (or after some optional timeout). I'm still trying to work out how to increase this idea's usefulness when you introduce pagination.
Towards answering the question that is the title of and reason for this post, I'd like to either build in or provide separately something that utilizes Johnny's hit/miss signals to give per-page and per-table statistics about cache hits and misses.
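The bookkeeping side of that could be as simple as the sketch below. The class and its method names are invented for illustration, and how record() would actually be wired to Johnny's hit/miss signals is deliberately left out, since the details of those signals are not covered here.

```python
from collections import defaultdict


class HitMissStats:
    """Toy per-page, per-table tally of cache hits and misses."""

    def __init__(self):
        self.counts = defaultdict(lambda: {"hit": 0, "miss": 0})

    def record(self, page, table, hit):
        # In a real setup this would be called from a hit/miss signal handler.
        self.counts[(page, table)]["hit" if hit else "miss"] += 1

    def hit_rate(self, page=None, table=None):
        """Hit rate filtered by page and/or table; None if nothing matched."""
        hits = misses = 0
        for (p, t), tally in self.counts.items():
            if page is not None and p != page:
                continue
            if table is not None and t != table:
                continue
            hits += tally["hit"]
            misses += tally["miss"]
        total = hits + misses
        return hits / total if total else None


stats = HitMissStats()
stats.record("/feed/", "blog_post", hit=False)
stats.record("/feed/", "blog_post", hit=True)
stats.record("/feed/", "blog_post", hit=True)
stats.record("/admin/", "auth_user", hit=False)

print(stats.hit_rate(table="blog_post"))   # two hits out of three reads
print(stats.hit_rate(page="/admin/"))      # 0.0
```

A table with a persistently low hit rate is a candidate for being left out of the cache, and a page with one is a candidate for a different caching strategy altogether.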
Every application has its own set of circumstances and requirements, and probably its own optimal caching strategy, but if you're a perfectionist with a deadline, Johnny might just get you a whole lot of bang for fairly little buck.

public domain
comments
Matthew Schinckel 06:53 March 2nd
django-devserver has a module for showing % cache hits: this was the key figure for me when deciding to use johnny-cache in our production system (and this is using the locmem cache until I get around to setting up memcached on that server).
A big performance increase comes when viewing admin pages, especially if they have inline models with foreign keys. I have some very connected data, and the admin interface is now usable without raw_id_fields set for each of these.
bjunix 06:56 March 2nd
Thanks for this great write-up! I am looking forward with interest to see how johnny cache evolves over time. Besides, I really like your writing style.
Jason Moiron 10:43 March 2nd
@Matthew
I want to be able to at least potentially run stats over an extended period on a live or load testing system, to be able to easily examine trends; but for a quick shot overview of how Johnny will fit into your existing strategy django-devserver sounds like a good call.
The origin app for primordial Johnny had a large and heavily utilized CMS component that was dashboardy and did tons of reads; Johnny was ideal because it really sped that up while remaining safe/consistent/transparent. The admin itself also features all sorts of dropdowns and multiple-selects for foreign keys and m2m, so it's a good fit there, too.
Andy Baker 11:07 March 3rd
Does it make sense to combine Johnny Cache with another cache strategy?
I've got a Django shop based on a CMS that used to use simple full page caching. I turned that off when I added a shopping cart and I intended to work out a better strategy using template fragment caching.
Would I be better off in most cases (CMS/Shop type apps) just turning off all other caching mechanisms and just using ORM caching? Or should I combine Johnny Cache with a healthy dose of template caching?
Obviously I am just asking for guesstimates and nothing short of a proper load of benchmarking would really settle this matter properly!
Jeremy 11:26 March 3rd
@Andy It really depends on the load you have and the nature of the site, but we definitely do combine caching strategies in our environment. Specifically we have a full page cache for some public (not often updated, but very highly visited) pages, then we let Johnny and a per object cache pick up the slack on more dynamic pages. It also works great since Johnny is a bit of a "last line of defense" next to the database. Page and template caches can be as much help to your app-server as your database, while johnny is almost purely a database cache (though it makes the page render faster which can increase your requests per second).
On a page, for example, that may get hundreds of hits a minute, the front end cache avoids all but 2-3 database queries (sessions, if you're using database sessions, which Johnny doesn't really handle well anyway). A good example of that is a news page.
On a purely dynamic site like a shopping cart, that doesn't really make sense though. If you have high query count template fragments, that may help, but the rendering time itself of the fragment usually isn't enough to warrant the caching of it.
The best thing you can do is watch your server load and database query/load graphs. If either is struggling, your caching strategy isn't effective or high level enough. Johnny is a great base to build upon though for high read sites, and for all of my personal sites, is enough to handle the load I get.
Andy Baker 11:50 March 3rd
None of my sites are especially high traffic but RAM is a limiting factor so caching gives me a quicker response and helps me run with fewer processes than I would otherwise need.
If I read you correctly then you are saying that Johnny Cache is a good first 'fire and forget' cache and for my requirements that might be enough.
Jeremy 12:16 March 3rd
@Andy: Correct, and for lower traffic sites it is likely enough. Obviously it won't be as fast as a fully cached page, but the parts of the page that you can't cache (the shopping cart, shopping cart items) are the ones that would be slow anyway...which johnny could potentially catch.
Load testing can help a lot to determine if it is enough. If you can get more data on things like page render time/return, cache hits vs misses, load, etc, you'll be able to get a good idea. Or perhaps load testing can just tell you that your site can handle 3x the amount of traffic you have with a certain strategy. Maybe try it with and without johnny since it's so easy to install/uninstall and see the difference.