Red Hat
Jul 13, 2012
by Adam Warski

Konrad recently shared in our company’s technical room an interesting article on how caching is done in a big Polish social network, nk.pl. One of the central concepts in the algorithm is generational caching (see here or here). The basic idea is that a cache key consists of some entity-specific string plus a version number. The version number increases whenever the data changes, so old cache entries are never looked up again and stale data is never read. This assumes that the cache has some form of garbage collection to get rid of the orphaned entries, e.g. it may simply be an LRU cache.

Of course, on each request we must know the current version number, which is why it must be stored in a global cache (though depending on our consistency requirements, it may also be distributed across the cluster asynchronously). The data itself, however, can be stored in local caches. So if our system is read-mostly, the only “expensive” operation that we have to do per request is retrieving the version numbers for the entities we are interested in. And this is usually very simple information, which can be kept entirely in memory.
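To make the read path concrete, here is a minimal sketch of the idea. The names VersionStore and GenerationalCache are made up for illustration, and the in-process map only stands in for a real global cache; a LinkedHashMap in access order plays the role of the local LRU cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical stand-in for the global cache: holds only the current version per entity key. */
class VersionStore {
    private final Map<String, Long> versions = new ConcurrentHashMap<>();

    long current(String entityKey) {
        return versions.getOrDefault(entityKey, 0L);
    }

    /** Called on every write; entries cached under older versions are never looked up again. */
    long bump(String entityKey) {
        return versions.merge(entityKey, 1L, Long::sum);
    }
}

/** Hypothetical local cache whose keys embed the version number. */
class GenerationalCache<V> {
    private final VersionStore versions;
    private final Map<String, V> local;

    GenerationalCache(VersionStore versions, int capacity) {
        this.versions = versions;
        // accessOrder = true keeps entries ordered from least to most recently used,
        // so removing the eldest entry gives LRU eviction of orphaned generations.
        this.local = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > capacity;
            }
        };
    }

    synchronized V get(String entityKey, Supplier<V> loader) {
        // The only global lookup per read: the current version of the entity.
        String cacheKey = entityKey + "-" + versions.current(entityKey);
        V value = local.get(cacheKey);
        if (value == null) {
            value = loader.get();      // cache miss: load from the real data source
            local.put(cacheKey, value);
        }
        return value;
    }

    synchronized int entryCount() {
        return local.size();
    }
}
```

A write only needs to call bump on the version store; nothing is ever removed from the local caches explicitly, the old generation simply ages out.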

Depending on the type of data and the usage patterns, you can cache individual entities (e.g. for a Person entity, the cache key could be person-9128-123, 9128 being the id, 123 the version number), or the whole lot (e.g. for a Countries entity, the cache key could be countries-8, 8 being the version number). Moreover, in the global cache you can keep the latest version number per id or per entity type, meaning that when the version changes, you invalidate either a specific entity or all of them.
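The two granularities could be sketched roughly as follows. CacheKeys and its methods are hypothetical names, and the two maps again only stand in for whatever global cache holds the version numbers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper that builds cache keys at per-id or per-entity-type granularity. */
class CacheKeys {
    // Version per individual entity, e.g. "person-9128" -> 123
    private final Map<String, Long> perId = new ConcurrentHashMap<>();
    // Version per entity type, e.g. "countries" -> 8
    private final Map<String, Long> perEntity = new ConcurrentHashMap<>();

    void setIdVersion(String entity, long id, long version) {
        perId.put(entity + "-" + id, version);
    }

    void setEntityVersion(String entity, long version) {
        perEntity.put(entity, version);
    }

    /** Key for a single entity; changing its version invalidates only that entity. */
    String keyFor(String entity, long id) {
        String prefix = entity + "-" + id;
        return prefix + "-" + perId.getOrDefault(prefix, 0L);
    }

    /** Key for the whole lot; changing the version invalidates all of them at once. */
    String keyForAll(String entity) {
        return entity + "-" + perEntity.getOrDefault(entity, 0L);
    }
}
```

With the version numbers from the examples above, keyFor("person", 9128) gives person-9128-123 and keyForAll("countries") gives countries-8.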

Having written most of Envers, it quite naturally occurred to me that you may use the entity revision numbers as the cache versions. Subsequent Envers revisions are monotonically increasing numbers; each transaction gets the next one. So whenever a cached entity changes, you would have to populate the global cache with the latest revision number.
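Since revision numbers only ever grow, the global cache can be updated with a simple "keep the maximum" rule. The sketch below does not use Envers at all: RevisionVersions is a hypothetical class, and the revision numbers are passed in as plain longs. It additionally assumes that updates may reach the cache out of order (for example when they are published after commit), which is why a lower revision never overwrites a higher one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical global version registry fed with revision numbers. */
class RevisionVersions {
    private final Map<String, Long> latest = new ConcurrentHashMap<>();

    /**
     * Records the revision in which the entity changed and returns the version
     * now in effect. An older revision arriving late is ignored.
     */
    long publish(String entityKey, long revision) {
        return latest.merge(entityKey, revision, Long::max);
    }

    /** Version to embed in cache keys; 0 means the entity has no recorded change yet. */
    long current(String entityKey) {
        return latest.getOrDefault(entityKey, 0L);
    }
}
```

Because one revision number is shared by everything changed in a transaction, all entities modified together end up with the same cache version.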

Envers provides several ways to get the revision numbers. During the transaction, you can call the AuditReader.getCurrentRevision() method, which will give you the revision metadata, including the revision number. If you want more fine-grained control, you may implement your own listener (EntityTrackingRevisionListener, see the docs), get notified whenever an entity is changed, and update the global cache there. You can also register an after-transaction-completed callback, and update the cache outside of the transaction boundaries. Or, if you know the entity ids, you may look up the maximum revision number using either AuditReader.getRevisions or an AuditQueryCreator.

As you can obtain the current revision number during a transaction, you may even update the version/revision in the global cache atomically, if you use a transactional cache such as Infinispan.

All of that, of course, in addition to auditing, which is still the main purpose of Envers :)

Adam