executive summary

buffcacher is a simple Python script that shows how to use use free RAM on your machine to cache arbitrary key-values. Think of it like memcached running on your local machine, only with elastic memory usage and faster.

What should a 'cache' be? It means a lot of things, but to my mind the default programming type should be:

"keep around expensive-to-generate bits of read-only data in case we need them again, or until the computer really needs that RAM for something else"

Don't get too excited about Java's soft references - those are objects living in the small bit of your RAM assigned to your JVM.

Don't get excited about abusing varnish either (in this context; there's plenty else to get excited with varnish about, of course). You would be unlikely to want your cache to hit disk - you don't want it backed by a file.

But varnish is so nice and fast much because of it exploiting the VM - virtual memory - system on your OS.

All mainstream OSes have much the same strategy - unused pages of RAM are used to cache disk IO automatically. If your application is reading and/or writing to the same bits of a file, its likely getting reads back straight from the OS 'buffer cache' without hitting disk. And this happens much more than you think, since all the code and libraries are, at this level, just files too.

So I made a quick and dirty filesystem to (ab)use the OS VM system for the caching of arbitrary objects without them writing through to file. As the buffer cache sits in front of the filesystem, the filesystem's reading and writing can look like this:

int read(file*, offset, length, bytes*):
   return -EIO; # report an error
int write(file*, offset, length, bytes*):
   return length; # report success, but don't store the bytes anywhere

Now its a case of tracking where you have written your cached items to in some file on this filesystem, and if your reading back of the cached value at some later time succeeds, those bytes were read from the OS's cached pages!

In this way its easy to cache items in the OS and let them compete for free RAM with the IO pages. As caching is diminising returns, there's likely plenty of free RAM to store the hot IO and these cached values, with the remainder of free RAM churning with the speculative storing of soon-discarded cold pages of things.

I used FUSE to get this working. But it was quite slow. Unfair to say it's extremely slow, since, it is faster than memcached running on the same machine :)

But imagine a kernel module filesystem to do this... that'd be blazingly fast. The key lookup could be inside the kernel too; I could imagine a lookup being a dreaded ioctl that returned the file offset and length of items or reserved for storing new items. And if the read hits the filesystem callback, it could check to see if its a miss or if the filesystem itself explicitly evicted the value since it has been replaced by another value in since the offset/length was fetched with the ioctl; in this way, multiple users could use the same store (perfect for forking and other cooperative things; what would your gnome or KDE want to cache that wasn't naturally a file?). How fast might a kernel module be? Its the same as a RAM disk, basically. That's fast :)

You could go further; if you are using some NoSQL server, or even memcached, you likely have it on another node. As sets are a fraction of the gets, you could have a single thread on each client that listens to the NoSQL server for the hashcodes of set keys, and invokes some ioctl that invalidates them in the common cache. This way, all your redis or whatever reads on each of your webserver nodes could be cached locally.

And of course your language of choice could use such a mechanism under the hood for a proper soft reference system using an mmapped version of such a file.

My code took a shortcut - every key is aligned on a page boundary; with a small amount of extra accounting, you could pack them tightly if you always keep the last partially-used page in an explicit buffer. That's the kind of shortcut that prototypes can take :)

the code

  • kvfs is a FUSE filesystem; run it like this: ./kvfs -okernel_cache -obig_writes mount-point
  • kv.py is a client library to keep track of values in such a file
  • kv-test is a simple test app that simulates some pretty artificial-looking load. It can run against either kvfs or memcached.
William Edwards,
19 Jul 2010, 16:13
William Edwards,
19 Jul 2010, 16:16
William Edwards,
19 Jul 2010, 16:12