Psychic Origami - A turbogears caching decorator

A while back I wrote a caching decorator for chrss. It's mostly used for the rss feeds, to help avoid having over-zealous rss readers slowing the site down. However I'm also now running it on a few other pages that were a bit slow (notably generating PGN files for games).

After letting it sit for a while Ian and Kyran also started using it on ShowMeDo. That was a couple of months ago. So now that I can be fairly certain it works it seemed like time to share it with the world.

So first off here's a few features/comments:

It's shamelessly based on code from Django (the caching backends at least)
It features an "anti-dogpiling" mechanism to try and make sure only one thread triggers a cache refresh
Multiple backends supported:

dummy - does no caching (for testing/development use)
simple - just uses a dictionary (for testing/development use)
localmem - thread-safe cache for use in process
file - uses file-system to cache data (this is what's used with chrss and ShowMeDo)
memcache - uses memcached for caching (it should work, but not massively tested at the moment)

Now for some example usage:

from turbogears import expose, controllers
from cache import cache_result

class MyController(controllers.Controller):

    @expose(content_type="text/plain")
    @cache_result()
    def cache_some_text(self):
        ''' no template so it's pretty straightforward - expose just has to come first '''
        return 'this will be cached'

    @expose(template="my_template)
    @cache_result()
    def cache_data_only(self):
        ''' with a template we can just cache the data and not rendered html '''
        return dict(value='this dictionary will be cached')

    @expose()
    @cache_result()
    @expose(template="my_template)
    def cache_html(self):
        '''
        or we can cache the rendered html, but we have to use an outer expose()
        to make the method public
        '''
        return dict(value='will be cached with the html')

To see how you can use the @cache_result() you are probably best looking
at the source (there's a fairly detailed comment explaining it). The following parameters can
be passed in:

key_fn - function used to derive a key to store the data in (default uses current url and user identity)
version_fn - can be used to control how a cached value expires (defaults to a function that returns the same value everytime)
timeout_seconds - how many seconds until the value start to expire

The default key function can be controlled via the config parameter cache.decorator.anon_only. If set to True (the default) it will only look in the cache for data when users are not logged in. Otherwise when users are logged in it will use a key just for them. The default is handy if you just want to avoid problems with a flood of anonymous users (e.g. from Slashdot/Digg etc).

The version function can be used to force expire cached values. The value of the version function is compared to the value stored in the cache and if different this triggers a cache refresh. For example if the version function was based on the number of comments in a blog post, then whenever a comment was added to the blog post the cache would get refreshed. This avoids having to wait for the cache to expire.

timeout_seconds specifies how many seconds before a value expires. It defaults to the value set in cache.decorator.timeout_seconds in your config file (or 1800 seconds if not set there).

anti-dogpiling

So first I'll explain what I mean by "dogpiling" with respect to cache expiry.

The standard way to use a cache is to do something like:

value = cache.get('key', None)
if value is None:
    value = recompute_cached_value()
    cache['key'] = value
return value

Now this is fine normally. When the cached value expires the next request will simply call recompute_cached_value() and the cache will be updated for future requests.

The trouble arises when recompute_cached_value() takes a long time to run and you have have a lot of other requests running at the same time. If a request is still recalculating the value and another request comes along, then that will also attempt to recalculate the value. This will in turn probably slow down the calculation going on, making it more likely that the next request to arrive will also trigger a recalculation and so on. Very quickly you can end up with tens/hundreds/thousands of request all attempting to recalculate the cached value and you have lost most of the advantage of caching in the first place.

So to handle this situation more gracefully this caching decorator employs a two stage expiry.

First there is a hard cut off expiry that works like normal. This is set to occur later than the other expiry time and is the value that would be fed to memcache or equivalent.

The second expiry time set is the one normally used. Basically when we store/retrieve the cached data we also have access to this expiry time (and the version). If we see that we need to recalculate the value (due to the expiry time being in the past or the version being different), then we attempt to grab a lock to recalculate the value. If we don't grab the lock, we assume another thread is doing the recalculation and rather than wait around we simply serve up the old (stale) data. This should mean that one thread (potentially per-process) will end up doing the recalculation rather than several.

This also means that we don't have to remove a value from the cache to force a refresh (which might cause dogpiling). Instead we can update whatever value we use in our version function, to trigger a graceful refresh.

conclusion

So that's a basic intro to this caching decorator. It's quite a handy quick way of adding some caching to your turbogears app. You'll need to see how well it works for you. I'm providing it "as is" and making no claims about anything. Feel free to incorporate it into your code and modify as you see fit. Just let me know if you have any issues or feedback.

bonus decorator

The cache code also includes a simple decorator to control the Expires header sent out with a response:

def expires(seconds=0):
    '''set expire headers for client-size caching'''
    def decorator(fn):
        def decorated(*arg,**kw):
            cherrypy.response.headers['Expires']=formatdate(_current_time()+seconds)
            return fn(*arg,**kw)
        return decorated
    return decorator

It's handy for getting the client to cache some data for us too. I use it on some of the PIL generated images served up via my app.

source code

Download turbogears caching decorator

The source for the decorator(s) includes a simple test suite (to be run using nose).