Distributed Systems - Cache

Updated: 2018-12-11

Where Does Caching Happen

  • local cache: browser and operating system
  • CDN: user request may be served by CDN before it hits the servers
  • Server: what we discuss below. Can be deployed on high-memory instances.

Why Cache

  • fast, reliable access to multiple types of data
  • microservices: services are stateless, which makes it easier to (auto)scale them. They often achieve this stateless loose coupling by keeping state in caches or persistent stores.
  • to front a database or other persistent store.
  • memoize data which is expensive to recompute, and which doesn't come from a persistent store.

When not to use cache

  • if it takes longer to retrieve from the cache (e.g. Redis) than from the database
  • cache reload: the app may be non-operational for some time while the cache is priming.

Cache Efficiency

In general what a cache should try to do is to retain the keys that have the highest probability of being accessed in the future.

  • Eviction: a cache's efficiency depends on how good its decisions are about what data to evict.
  • Hit ratio: the percentage of read queries that the cache is able to serve.
  • Skewed: often a small percentage of keys get a very large percentage of all the accesses.
  • Patterns may change: access patterns often shift over time, so keys that were heavily requested may fall out of favor, while once-unpopular keys can become the most accessed.
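Skew is easy to see in a toy simulation: drive an LRU cache with a Pareto-distributed (Zipf-like) key stream and measure the hit ratio. A sketch; the cache size and distribution parameters are arbitrary:

```python
import random
from functools import lru_cache

@lru_cache(maxsize=100)  # LRU cache holding 100 of 1000 possible keys
def fetch(key):
    return f"value-{key}"  # stand-in for an expensive backend read

random.seed(42)
for _ in range(10_000):
    # Pareto-distributed keys: a few keys receive most of the traffic.
    fetch(int(random.paretovariate(1.2)) % 1000)

info = fetch.cache_info()
print(f"hit ratio: {info.hits / (info.hits + info.misses):.2f}")
```

Even though the cache holds only 10% of the key space, the skew keeps the hit ratio well above 50%.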

Cache Replication

  • Replicating caches globally helps with the "thundering herd" scenario: without global replication, member traffic shifting from one region to another would encounter “cold” caches for those members in the new region. Processing the cache misses would lengthen response times and overwhelm the databases.

  • When the compute systems write data to a local cache, the data has to be replicated to all regions so it’s available to serve member requests no matter where they originate.

  • Read: Caching For Global Netflix

Options

  • redis (C)
  • memcached (C): a simple in-memory key-value store

    • EVCache (Netflix's distributed cache, built on memcached)
  • ehcache (Java)

Memcached

Keys can also be set with a TTL, or expiration time. If no expiration time is set, the key will be kept until space is needed for new keys and something needs to be “evicted.” (LRU)
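The TTL-plus-LRU behaviour can be modeled in a few lines. A toy sketch, not memcached's actual implementation; `TTLCache` and its parameters are made up for illustration:

```python
import time
from collections import OrderedDict

class TTLCache:
    """Toy cache: keys may carry a TTL; when full, the LRU key is evicted."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self.data[key] = (value, expires_at)
        self.data.move_to_end(key)  # mark as most recently used
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used key

    def get(self, key):
        entry = self.data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() > expires_at:
            del self.data[key]  # expired: treat as a miss
            return None
        self.data.move_to_end(key)
        return value

c = TTLCache(capacity=2)
c.set("a", 1)          # no TTL: kept until space is needed
c.set("b", 2, ttl=60)  # expires in 60 seconds
c.set("c", 3)          # capacity exceeded -> "a" (the LRU key) is evicted
print(c.get("a"), c.get("b"), c.get("c"))  # -> None 2 3
```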

mcrouter

mcrouter is a memcached connection pooler and proxy developed at Facebook. https://github.com/facebook/mcrouter

Redis

Redis stands for REmote DIctionary Server.

https://redis.io/topics/lru-cache

Redis is single-threaded. However, it does spawn a few background threads to perform certain slow operations (mostly disk I/O) asynchronously; this subsystem is called bio.c: Background I/O.

Redis uses an approximate LRU algorithm (it samples a handful of keys and evicts the least recently used among them), so it's not suitable if you have precise LRU requirements for caching.
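The sampling idea behind approximate LRU can be sketched as follows. This is a toy model, not Redis internals; Redis samples `maxmemory-samples` keys (5 by default) per eviction:

```python
import itertools
import random

clock = itertools.count()  # logical clock standing in for access timestamps
last_access = {}           # key -> last access "time"

def touch(key):
    last_access[key] = next(clock)

def evict_one(samples=5):
    """Evict the least recently used key among a random sample of keys."""
    candidates = random.sample(list(last_access), min(samples, len(last_access)))
    victim = min(candidates, key=lambda k: last_access[k])
    del last_access[victim]
    return victim

for k in ("a", "b", "c", "d", "e"):
    touch(k)
touch("a")  # "a" is now the most recently used key
print("evicted:", evict_one())  # sampling all 5 keys here, so "b" (the LRU) goes
```

With a sample smaller than the key count, the victim is only approximately the least recently used key; that is the trade-off Redis makes to avoid tracking a full LRU list.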

Top 5 uses for Redis: content caching; user session store; job & queue management; high speed transactions; notifications.

Memcached vs Redis

TL;DR: For anything new, use Redis.

Memcached is a volatile in-memory key/value store. Redis can act like one (and do that job as well as memcached), but it is a data structure server.

You can configure Redis to behave exactly like memcache by disabling persistence (disable all of the "save" entries in redis.conf).

Its command set is a superset of memcached's simple get/set model.

Redis offers a lot more than just caching. You can even use it as a datastore, because it can persist data to disk.

Here are a few of the features that Redis offers and memcached doesn't, which allow Redis to be used as a "real" data store instead of just a cache.

  • Powerful data types and powerful commands to leverage them: hashes, sorted sets, lists, and more. (With memcached, you'd store serialized lists of values in a single key.)
  • Persistence to disk, by default.
  • Transactions with optimistic locking (WATCH/MULTI/EXEC)
  • Pub/sub. Extremely fast.
  • Values up to 512 MB in size (memcached is limited to 1 MB per key)
  • Lua scripting (as of 2.6)
  • Built in clustering (as of 3.0)
  • Extremely fast at everything.

These features allow Redis to provide a fantastic shared queue (lists), a great messaging solution (pub/sub), a good place for storing sessions (hashes), and a compelling place for high-score tracking (sorted sets).
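The WATCH/MULTI/EXEC transactions mentioned above are optimistic locking: remember the state of a key, queue the writes, and let the server abort the transaction if the watched key changed in the meantime. A pure-Python sketch of the same idea using per-key versions (not redis-py code; `Store` is a made-up stand-in):

```python
class Store:
    """Toy key-value store with per-key versions, modeling WATCH/MULTI/EXEC."""
    def __init__(self):
        self.data = {}
        self.version = {}

    def get(self, key):
        return self.data.get(key, 0)

    def watch(self, key):
        # Like WATCH: remember the key's current version.
        return self.version.get(key, 0)

    def exec(self, key, watched_version, value):
        # Like EXEC: apply the write only if the key is unchanged since WATCH.
        if self.version.get(key, 0) != watched_version:
            return False  # someone else wrote the key; transaction aborts
        self.data[key] = value
        self.version[key] = watched_version + 1
        return True

store = Store()

def increment(store, key):
    while True:  # retry loop, the usual pattern around WATCH/MULTI/EXEC
        v = store.watch(key)
        current = store.get(key)
        if store.exec(key, v, current + 1):
            return

increment(store, "counter")
increment(store, "counter")
print(store.get("counter"))  # -> 2
```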

LRU algorithm

One case where memcached is still superior to redis is the case where all you need is a rather dumb cache with absolutely predictable hard memory limit. When redis is (ab)used as LRU-like cache the same way as memcached (plain get/set operations at rather high rate) its memory consumption can steadily grow due to internal memory fragmentation (jemalloc). Memcached, on the other hand, due to slab allocator and slab-aware LRU approach, never exceeds its memory limit.
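The slab behaviour comes from rounding every item up to a fixed size class, with classes growing geometrically. Memcached's default growth factor is 1.25; the minimum and maximum sizes below are assumptions for illustration:

```python
def slab_classes(min_size=96, growth=1.25, max_size=1024 * 1024):
    """Generate memcached-style slab class sizes (geometric growth)."""
    sizes = []
    size = min_size
    while size < max_size:
        sizes.append(size)
        size = int(size * growth)
    sizes.append(max_size)
    return sizes

def pick_class(item_size, sizes):
    """An item is stored in the smallest class that fits it."""
    for s in sizes:
        if item_size <= s:
            return s
    raise ValueError("item larger than the biggest slab class")

sizes = slab_classes()
for n in (100, 500, 3000):
    print(n, "->", pick_class(n, sizes))
```

The padding between an item's real size and its slab class is the internal fragmentation memcached accepts in exchange for a hard, predictable memory ceiling.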

TAO: Facebook's Cache

  • TAO is a write-through cache, which means writes are done synchronously both to the cache and to the backing store.
  • Memcached was designed for single servers
  • TAO uses a write-through architecture, whereas Memcache is a demand-filled look-aside cache
  • several core components of the system remain very similar, such as the internal request protocol and the memory structure.
  • Memcache and TAO also use a similar channel to replicate invalidations across geographical regions, which relies on writing dirty queries to the backing db store and letting the MySQL replication channel handle some of the work.
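The difference between the two architectures is who writes the cache. A minimal sketch of both patterns, with plain dicts standing in for Memcache and MySQL:

```python
backing_store = {}  # stands in for the backing database (MySQL)
cache = {}          # stands in for Memcache / TAO's cache tier

# Write-through (TAO): writes go synchronously to the cache AND the store.
def write_through(key, value):
    backing_store[key] = value
    cache[key] = value

# Look-aside / demand-fill (Memcache): the application checks the cache
# first and fills it from the store on a miss; writes invalidate the cache.
def look_aside_read(key):
    if key in cache:
        return cache[key]
    value = backing_store[key]  # cache miss: read from the store
    cache[key] = value          # demand-fill
    return value

def look_aside_write(key, value):
    backing_store[key] = value
    cache.pop(key, None)        # invalidate rather than update

write_through("a", 1)
look_aside_write("a", 2)
print(look_aside_read("a"))  # -> 2 (refilled from the store after invalidation)
```

Invalidate-on-write in the look-aside pattern avoids the race where a stale cached value outlives a newer write.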

Reads

jitter in caching

http://highscalability.com/blog/2012/4/17/youtube-strategy-adding-jitter-isnt-a-bug.html
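The jitter trick from the article above: randomize each TTL around a base value, so keys cached at the same moment don't all expire (and stampede the backend) at once. A sketch, with an assumed ±10% spread:

```python
import random

def jittered_ttl(base_seconds, jitter_fraction=0.1):
    """Return a TTL randomized by +/- jitter_fraction around the base."""
    delta = base_seconds * jitter_fraction
    return base_seconds + random.uniform(-delta, delta)

random.seed(0)
ttls = [jittered_ttl(600) for _ in range(5)]
print([round(t) for t in ttls])  # five different TTLs near 600 seconds
```

Without jitter, a burst of cache fills at time T becomes a burst of cache misses at T + TTL; the random spread smears those misses over a window instead.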