Last Updated: 2023-08-15

Why Caching

Caching is a fundamental building block in computer systems.

  • to front a database or other persistent store to improve latency and efficiency.
  • memorize data which is expensive to recompute, and doesn't come from a persistent store.
  • for microservices: services are stateless, which makes it easier to (auto)scale them. They often achieve the stateless loose coupling by maintaining state in caches or persistent stores.

When not to use cache

  • if it takes longer to retrieve from cache (e.g. redis) than from database
  • reload cache: app may be non-operational for some time while the cache is priming.

How to choose a cache

Read-optimized cache vs read-write cache

Use a read-optimized (or "serving") cache if:

  • The data being cached is the same as the data in the primary store or the result of offline pre-computations.
  • either caching all of the base data or can pre-determine which subset of the data to cache.

Use a read-write cache if:

  • The data being cached is the result of online computations, such as the results of a backend query.
  • the cached set is determined dynamically, such as based on the real-time popularity of keys.

Local vs Remote

  • local cache: the cache is linked in the binary, query by a function call; the cached data lives in the local memory or disk.
  • remote cache: the cache lives on another server (can be deployed on high memory instances), accessed by an RPC call.
  • or use both: if local cache missed, query remote cache.

Cache Efficiency

In general what a cache should try to do is to retain the keys that have the highest probability of being accessed in the future.

  • Eviction: the efficiency of a cache is related to how good decisions it makes about what data to evict.
  • Hits/misses ratio: the percentage of read queries that the cache is able to serve.
  • Skewed: often a small percentage of keys get a very large percentage of all the accesses.
  • Patterns may change: the access pattern often changes over time, which means that as time passes certain keys that were very requested may no longer be accessed often, and conversely, keys that once were not popular may turn into the most accessed keys.

Cache Replication

  • Replicating caches globally helps with the "thundering herd" scenario: without global replication, member traffic shifting from one region to another would encounter “cold” caches for those members in the new region. Processing the cache misses would lengthen response times and overwhelm the databases.
  • When the compute systems write data to a local cache, the data has to be replicated to all regions so it’s available to serve member requests no matter where they originate.

Read: Caching For Global Netflix


Redis and Memcached are the most popular options.

  • Redis (C)
  • Memcached (C): an in-memory key-value store.
  • EVCache (Java): open sourced by Netflix. "EVCache is a memcached & spymemcached based caching solution that is mainly used for AWS EC2 infrastructure for caching frequently used data."
  • Ehcache (Java)

Cloud services:

  • Amazon ElastiCache
  • Google Cloud Memorystore
  • Azure Cache for Redis


Keys can also be set with a TTL, or expiration time. If no expiration time is set, the key will be kept until space is needed for new keys and something needs to be “evicted.” (LRU)


mcrouter is a memcached connection pooler and proxy developed at Facebook. https://github.com/facebook/mcrouter


Redis means REmote DIctionary Server


Redis is single threaded. However there are threads in order to perform certain slow operations on disk, perform asynchronous tasks on a different thread was called bio.c: Background I/O.

Redis uses an approximate LRU algorithm, so it’s not suitable if you have precise LRU requirements for caching

Top 5 uses for Redis: content caching; user session store; job & queue management; high speed transactions; notifications.

Redis Sentinel is a monitoring solution for Redis instances that handles automatic failover of Redis masters and service discovery (who is the current master for a given group of instances?).

Memcached vs redis

TL;DR: For anything new, use Redis.

Memcached is a volatile in-memory key/value store. Redis can act like one (and do that job as well as memcached), but it is a data structure server.

You can configure Redis to behave exactly like memcache by disabling persistence (disable all of the "save" entries in redis.conf).

It speaks a superset of the memcache protocol,

Redis offers a lot more than just cache. You can even use it as a datastore because it can persist data to disk.

Here are a few of the features that redis offers which memcached doesn't and allows redis to be used as a "real" data store instead of just a cache.

  • Powerful data types and powerful commands to leverage them. Hashes, Sorted Sets, Lists, and more. (memcached: storing serialized lists of values in a given key)
  • Persistence to disk, by default.
  • Transactions with optimistic locking (WATCH/MULTI/EXEC)
  • Pub/sub. Extremely fast.
  • Values up to 512MB in size (memcached limited to 1MB per key)
  • Lua scripting (as of 2.6)
  • Built in clustering (as of 3.0)
  • Extremely fast at everything.

They allow redis to provide a fantastic shared queue (lists), a great messaging solution (pub/sub), a good place for storing sessions (hashes), and a compelling place for high score tracking (sorted sets).

LRU algorithm

One case where memcached is still superior to redis is the case where all you need is a rather dumb cache with absolutely predictable hard memory limit. When redis is (ab)used as LRU-like cache the same way as memcached (plain get/set operations at rather high rate) its memory consumption can steadily grow due to internal memory fragmentation (jemalloc). Memcached, on the other hand, due to slab allocator and slab-aware LRU approach, never exceeds its memory limit.

TAO: Facebook's Cache

  • write-through cache, which means writes are done synchronously both to the cache and to the backing store.
  • Memcached was designed for single servers
  • TAO uses a write-through architecture where Memcache is a demand-filled look-aside cache
  • several core components of the system remain very similar, such as the internal request protocol and the memory structure.
  • Memcache and TAO also use a similar channel to replicate invalidations across geographical regions, which relies on writing dirty queries to the backing db store and letting the MySQL replication channel handle some of the work.


jitter in caching



  • auto-sharding with load balancing
  • cross-process state transfer to preserve cache hit rate
  • cache invalidations / updates to keep caches fresh


  • Browsers and operating systems have local cache.
  • CDN: user request may be served by CDN before it hits the servers.