Caching (part 2)

What happens if you’re caching more data than you can possibly store in memory? Memcache will evict content that’s still actively being requested, and those requests fall through to slow dynamic generation. Well, let’s stick it on the filesystem, where there’s far more room. Yes, disk is much slower than RAM, but it’s still much faster than regenerating the content dynamically.

Level 2 - Filesystem

Everything learned in the previous post applies here, especially the cache key strategy. To allow manual directory operations on your filecache, I recommend segmenting the files based on the category and the hash in the key, like so:

cache key : category-article-name-abcdef1234567890
filecache : /var/cache/
location : /var/cache/category/ab/category-article-name-abcdef1234567890

In the example above, /var/cache plays the role that memcache played before, but we take the first portion of the key (the category) and make a directory of it, which allows forced expiration by category. We also take the last portion (the hash) and create a subdirectory named after its first two characters. Otherwise the category directory could fill up with files and become unmanageable.
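As a rough sketch of that mapping, here is how the location could be derived from the key in Python. The function name cache_path is my own, and it assumes every key has the shape "category-...-hash", with the category before the first dash and the hash after the last one.

```python
from pathlib import PurePosixPath

CACHE_ROOT = PurePosixPath("/var/cache")  # filecache root from the example above


def cache_path(cache_key, root=CACHE_ROOT):
    """Map a cache key to its filecache location.

    Assumes keys look like "<category>-<anything>-<hash>".
    """
    # Everything before the first dash is the category directory.
    category, _, rest = cache_key.partition("-")
    # Everything after the last dash is the hash.
    _, _, key_hash = rest.rpartition("-")
    if not category or len(key_hash) < 2:
        raise ValueError(f"unexpected cache key format: {cache_key!r}")
    # The first two characters of the hash name the subdirectory.
    return root / category / key_hash[:2] / cache_key


print(cache_path("category-article-name-abcdef1234567890"))
# /var/cache/category/ab/category-article-name-abcdef1234567890
```

With hexadecimal hashes, two characters give at most 256 subdirectories per category, which keeps each directory listing manageable.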

The other difference between memcache and filecache is that, if everything goes well, filecache never expires. This is not ideal: if you always fall back to filecache whenever memcache misses, you’ll serve the same data forever. It is essential to make an active request to repopulate your filecache. Essential. But when?

Level 2.5 - Background Refresh

If the data’s in memcache, serve from there. If it’s not, but it’s in filecache, serve from there, and optionally repopulate memcache to save processing on further requests. Once the client has their response, spawn a background process to request the same data that the client did, and update both memcache and filecache once it’s done.
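Here’s a minimal sketch of that flow in Python, under some loud assumptions: a plain dict stands in for a real memcache client, a thread stands in for the background process, and regenerate is a hypothetical callback that builds the content the slow way. The names fetch and refresh are mine too.

```python
import threading
from pathlib import Path

memcache = {}  # stand-in for a real memcache client (assumption)


def refresh(key, file_path, regenerate):
    """Regenerate the content and store it in both cache levels."""
    data = regenerate(key)
    file_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file, then rename, so readers never see a partial file.
    tmp = file_path.with_name(file_path.name + ".tmp")
    tmp.write_bytes(data)
    tmp.replace(file_path)
    memcache[key] = data
    return data


def fetch(key, file_path, regenerate):
    """Serve from memcache, then filecache; refresh in the background on a filecache hit."""
    data = memcache.get(key)
    if data is not None:
        return data  # level 1 hit

    if file_path.exists():
        data = file_path.read_bytes()  # level 2 hit, possibly stale
        memcache[key] = data  # repopulate memcache for the next requests
        threading.Thread(
            target=refresh,
            args=(key, file_path, regenerate),
            name=f"refresh-{key}",
            daemon=True,
        ).start()
        return data

    # Nothing cached anywhere: this one request pays the full cost.
    return refresh(key, file_path, regenerate)
```

In a real web stack you would kick off the refresh after the response has been sent rather than just before returning, and you would probably hand it to a separate worker process instead of a thread.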

By always serving from cache, requestors get quick response times and your servers see low load. The background process described above needs a mechanism to ensure that multiple processes aren’t spawned at once. Pushing the filecache data back to memcache is one: subsequent requests are served from memcache, so the logic that says ‘launch a background request when the data is only in filecache’ isn’t triggered again. Another method is to use a directory or queue as a dropbox for work, with a background process that just looks there for tasks.
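The dropbox directory idea could look something like the sketch below. It is only one way to do it: enqueue_refresh and drain are names I made up, refresh is a hypothetical callback, and it assumes cache keys are safe to use as filenames (no slashes). Creating the marker file with O_EXCL means a second request for the same key finds the file already there and doesn’t queue duplicate work.

```python
import os
from pathlib import Path


def enqueue_refresh(dropbox, key):
    """Drop a marker file for `key`; return False if one is already pending."""
    dropbox.mkdir(parents=True, exist_ok=True)
    try:
        # O_EXCL makes creation fail if the marker already exists.
        fd = os.open(dropbox / key, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def drain(dropbox, refresh):
    """Run `refresh` once for every pending marker, then remove the marker."""
    done = []
    for marker in sorted(dropbox.iterdir()):
        refresh(marker.name)  # hypothetical callback that rebuilds the content
        marker.unlink()
        done.append(marker.name)
    return done
```

The request handler calls enqueue_refresh when it serves from filecache, and a single long-running worker (or a cron job) calls drain on a schedule.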

If you’re lucky, you can serve all your content from cache, all the time. In the final part we’ll bring it all together.