Caching Strategies · System Internals

The big picture

TL;DR: the 30-second version

A caching strategy is a contract between three actors — the application, the cache, and the backing store — about who reads, who writes, and in what order. The cache is fast but a copy; the store is slow but authoritative.
Read patterns decide who loads on a miss: cache-aside (lazy loading) puts the app in charge; read-through hides the load behind the cache. Both leave a brief window for stale data and duplicate loads.
Write patterns decide how a change reaches the store: write-through (cache + store together, safe but slow), write-back/write-behind (cache now, store later — fast but loses data on crash), write-around (store only), write-invalidate (store then delete the cached copy).
The three classic load-spike failures are the stampede (a hot key expires and a thousand misses hit the store at once), penetration (missing keys bypass the cache and hammer the store), and avalanche (mass simultaneous expiry). Fixes: request coalescing, negative caching, and jittered TTLs.
There is no best strategy — every one trades read latency vs. write latency vs. consistency vs. durability. You choose by read/write mix and how much staleness you can tolerate.

Everything below expands on these points. Read the core sections top to bottom for the full mental model; the collapsible "Go deeper" boxes hold the advanced internals (latency math, leases, tuning) you can skip on a first pass.

[Figure: The three actors and the data flow between them. The app sends reads and writes to the cache, which is fast but only a COPY (RAM, ~ms). A hit is served directly; on a miss, cache-aside fills the cache. Loads, flushes, and invalidations travel between the cache and the backing store, which is slow but AUTHORITATIVE (DB / object store, ~10–100ms).]

Start here: the problem it solves

Your database can serve, say, a few thousand reads per second; your traffic wants a hundred thousand. A cache absorbs most of those reads from memory in well under a millisecond. But now the same data lives in two places, and the moment they can disagree you have a correctness problem layered on top of a performance win.

So caching is really a set of policy choices about three actors that must stay both fast and consistent: the application (which wants answers), the cache (a fast but ephemeral copy), and the backing store (the slow but authoritative source of truth). On a read miss, who loads the data and who populates the cache? On a write, do you update the cache, the store, or both — and in what order? How long does a cached entry live? And what happens when a popular key expires and a thousand requests miss at once?

Keeping the copy consistent with the source is the whole game. The cache only helps if it mostly hits; it only stays correct if writes and expiry keep its copy close enough to the store. Every strategy below is one answer to 'how do these three stay fast and agree?'

The trade-off: every strategy trades among read latency, write latency, consistency, and durability. There is no universally best choice; you pick based on the read/write mix and how much staleness you can tolerate.

Read patterns: cache-aside vs read-through

Cache-aside (also called lazy loading) puts the application in charge. The app checks the cache; on a miss it reads the store itself and then writes the value back into the cache so the next read hits. It's the most common pattern because it's simple and the cache and store stay decoupled — if the cache is down, the app just talks to the store directly (degraded, not broken). The cost: the loading logic lives in every caller, and there's a brief window where concurrent misses for the same key can each load it.

App asks the cache for key K.
Hit → return the cached value. Done in ~1ms.
Miss → app reads K from the backing store.
App writes K back into the cache (usually with a TTL).
App returns the value. Next read for K is a hit.
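The five steps above can be sketched in Python. This is a minimal illustration, not a production client. `TTLCache` is a hypothetical in-process stand-in for a cache such as Redis, a plain dict stands in for the backing store, and `cache_aside_get` is an illustrative helper name. The clock is injectable so that expiry can be exercised without sleeping.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-process cache: key -> (value, expires_at)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[Any, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:      # an expired entry behaves as absent
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def cache_aside_get(cache: TTLCache, store: Dict[str, Any], key: str, ttl: float = 60.0) -> Any:
    value = cache.get(key)          # 1. ask the cache
    if value is not None:
        return value                # 2. hit: serve the cached copy
    value = store[key]              # 3. miss: the app reads the store itself
    cache.set(key, value, ttl)      # 4. populate the cache with a TTL
    return value                    # 5. the next read for this key is a hit
```

Because `None` doubles as "miss" here, a stored value of `None` would never be cached; code that needs that case usually switches to a dedicated sentinel.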

Read-through moves that logic into the cache layer. The application only ever talks to the cache; on a miss the cache itself loads from the store, populates, and returns the value. The behavior is similar to cache-aside, but the loading is centralized in one place (a cache library or provider) instead of duplicated across application code — and a single read-through cache can coalesce concurrent misses internally.
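A read-through layer moves that same logic behind a single `get` call. The sketch below is hypothetical: `ReadThroughCache` and its `loader` callable are illustrative names, not a library API. It omits TTLs to stay short. It takes one lock per key, so concurrent misses on the same key trigger a single load. That is one way a read-through cache can coalesce misses internally.

```python
import threading
from typing import Any, Callable, Dict


class ReadThroughCache:
    """Callers only call get(); on a miss the cache itself loads from the store."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader                       # hypothetical store query
        self._data: Dict[str, Any] = {}
        self._guard = threading.Lock()              # protects _key_locks
        self._key_locks: Dict[str, threading.Lock] = {}

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> Any:
        if key in self._data:                       # fast path: a hit (entries are never removed here)
            return self._data[key]
        with self._lock_for(key):                   # misses on the same key queue up here
            if key not in self._data:               # re-check: a peer may have loaded it already
                self._data[key] = self._loader(key)
            return self._data[key]
```

The per-key lock map grows with every distinct key. A longer-lived version would bound it or combine it with eviction.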

Either way, the first read is slow: both patterns populate the cache only after a miss, so the first request for a key pays the full store latency. Subsequent reads are fast hits until the entry expires or is evicted. (To pre-warm, write the cache on the write path, as write-through does, or use refresh-ahead.)

Go deeper · Cache-aside vs read-through: the real differences

Where the load logic lives: cache-aside = in the app (flexible, but every caller must get it right); read-through = in the cache layer (centralized, but you depend on the provider's behavior).
Data model: read-through caches usually require the cached object to map cleanly to a store entity. Cache-aside can cache anything — a computed view, a joined result, a rendered fragment.
Failure mode: with cache-aside the app can fall back to the store if the cache is unreachable. A pure read-through path may fail if the cache is the only thing the app knows how to call.
Inconsistency window: both can briefly serve stale data, but cache-aside paired with write-invalidate is the classic combo that bounds it.

Predict: with cache-aside, two requests miss on the same cold key at the same instant. What happens?

Hint: Who loads on a miss, and is there any coordination between callers?

Both load from the store and both write the cache — a duplicate load. It's harmless for a single key (last write wins, value is the same), but if thousands of requests miss the same hot key at once it becomes a stampede that can overwhelm the store. The fix is request coalescing / single-flight: let one request load while the others wait and share the result.
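On the cache-aside side, single-flight can live in the application. The following is a minimal sketch using Python's `threading` module. It assumes a hypothetical `SingleFlight` helper, modeled loosely on the idea behind Go's `golang.org/x/sync/singleflight` package. The first caller for a key becomes the leader and runs the load. Callers that arrive while that load is in flight wait on an `Event` and receive the same result (or the same exception).

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """One in-flight load that followers can wait on."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Optional[Exception] = None


class SingleFlight:
    """Collapse concurrent calls for the same key into one execution of fn."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._calls: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if call is None:
                call = _Call()
                self._calls[key] = call
        if leader:
            try:
                call.value = fn()            # only the leader touches the store
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._calls[key]     # the next miss starts a fresh flight
                call.done.set()              # wake every waiting follower
        else:
            call.done.wait()                 # followers block and share the result
        if call.error is not None:
            raise call.error
        return call.value
```

In a cache-aside read you would wrap only the miss path, for example `flights.do(key, lambda: load_and_fill(key))`. Here `load_and_fill` is a hypothetical function of your own that reads the store and writes the cache.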

Write patterns: through, back, around, invalidate

Read patterns decide who fills the cache; write patterns decide how a change reaches the store and what happens to the cached copy. There are four common choices, and they differ mainly in where the latency lands and what a crash can lose.

Write-through — write the cache and the store synchronously, together, before acknowledging. The cache is always consistent with the store and reads are warm, but every write pays the store's latency (two hops). Best when reads soon follow writes and you can't tolerate stale cache.
Write-back (write-behind) — write only the cache, mark the entry dirty, acknowledge immediately, and flush dirty entries to the store later in batches. Writes are very fast and the store sees far fewer, coalesced writes — but a crash before the flush silently loses every un-flushed write. Durability is sacrificed for throughput.
Write-around — write straight to the store and skip the cache entirely. The cache fills only on a later read miss. Good for write-heavy data that's rarely read soon (logs, audit records): it avoids polluting the cache with entries nobody will read, but the first read after a write is always a miss.
Write-invalidate — write the store, then delete the cached entry (rather than updating it) so a stale copy can't linger and the next read reloads fresh. This is the standard write path paired with cache-aside reads.
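The four write policies differ only in which of the two copies they touch, and when. The toy sketch below assumes plain dicts for both the cache and the store. The `WritePolicies` class and its method names are illustrative, not a real library. It makes the differences concrete, including write-back's crash window.

```python
from typing import Any, Dict, Set


class WritePolicies:
    """Toy cache + store (both plain dicts) showing where each write policy puts data."""

    def __init__(self) -> None:
        self.cache: Dict[str, Any] = {}
        self.store: Dict[str, Any] = {}  # stand-in for the authoritative database
        self.dirty: Set[str] = set()     # write-back: acknowledged but not yet flushed

    def write_through(self, key: str, value: Any) -> None:
        self.store[key] = value          # both copies are updated before the ack
        self.cache[key] = value

    def write_back(self, key: str, value: Any) -> None:
        self.cache[key] = value          # ack after the cache write alone...
        self.dirty.add(key)              # ...and remember to flush it later

    def flush(self) -> None:
        for key in sorted(self.dirty):   # push the dirty entries to the store as a batch
            self.store[key] = self.cache[key]
        self.dirty.clear()

    def crash(self) -> None:
        self.cache.clear()               # cache memory is gone, and with it
        self.dirty.clear()               # every write that was never flushed

    def write_around(self, key: str, value: Any) -> None:
        self.store[key] = value          # skip the cache; a later read miss fills it
        # (any copy already cached is left untouched here and can go stale)

    def write_invalidate(self, key: str, value: Any) -> None:
        self.store[key] = value          # store first...
        self.cache.pop(key, None)        # ...then drop the possibly stale copy
```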

Why invalidate instead of updating the cache? Updating the cache on write seems natural but invites races: two concurrent writers can interleave their cache update and store update so that the cache ends up holding the older value. Deleting the entry is idempotent and self-healing, because the next reader simply reloads the current value from the store. 'Invalidate, don't update' is the safer default.
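The race is easiest to see as one fixed interleaving. The sketch below replays a single hypothetical schedule step by step: writer A sets the value to 1, writer B sets it to 2, and A is delayed between its store write and its cache step. The schedule runs first with update-on-write and then with invalidate-on-write. Each function returns `(cache value, store value)`.

```python
from typing import Any, Dict, Optional, Tuple


def update_on_write_race() -> Tuple[Any, Any]:
    store: Dict[str, Any] = {}
    cache: Dict[str, Any] = {}
    store["k"] = 1          # A: write store
    store["k"] = 2          # B: write store
    cache["k"] = 2          # B: update cache
    cache["k"] = 1          # A: update cache (late), so the cache now holds the OLDER value
    return cache["k"], store["k"]


def invalidate_on_write() -> Tuple[Any, Any]:
    store: Dict[str, Any] = {}
    cache: Dict[str, Any] = {}
    store["k"] = 1          # A: write store
    store["k"] = 2          # B: write store
    cache.pop("k", None)    # B: invalidate
    cache.pop("k", None)    # A: invalidate (late), harmless because deleting is idempotent
    # The next reader misses and reloads the current value from the store.
    value: Optional[Any] = cache.get("k")
    if value is None:
        value = store["k"]
        cache["k"] = value
    return cache["k"], store["k"]
```

`update_on_write_race()` returns `(1, 2)`, so the cache disagrees with the store until the TTL runs out. `invalidate_on_write()` returns `(2, 2)`. Invalidation does not close every window, though. A reader that loaded the old value before the delete can still write it back afterwards (a stale set), and that is the case Facebook's leases, described later in this article, are built to reject.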

The recurring tension is consistency vs. speed vs. durability. Write-through is safe but slow; write-back is fast but risks loss; write-around and write-invalidate avoid serving stale data at the cost of more misses. Note these compose: write-through + read-through gives an always-warm consistent cache; cache-aside + write-invalidate gives a lazy cache that's eventually consistent within the TTL.

The numbers: latency, hit rate, staleness window

A cache's payoff is governed by one equation. With cache hit latency h, store (miss) latency m, and hit rate r, the average read latency is r·h + (1−r)·(m + h). A miss costs m + h because cache-aside first pays for the cache lookup that missed and then for the store read; writing the value back into the cache adds a little more on top. Because m is often 10–100× h, the hit rate r dominates everything.

Hit rate is non-linear in value. Going from a 90% to a 99% hit rate cuts the share of requests that touch the store tenfold (from 10% to 1%). That is often the difference between the database coping and falling over. The last few percent of hit rate matter most, because misses are what generate store load.
Cache vs store latency: a RAM cache hit is ~0.1–1ms; a database miss is ~5–50ms; a cross-region or disk miss can be 100ms+. The whole point is to keep most reads on the fast path.
Store load is what you're really protecting. If the store handles Q queries/sec and you receive N reads/sec, you need a hit rate of at least 1 − Q/N or the store saturates. A cache isn't just for latency — it's a shield that bounds backing-store QPS.
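The formulas above are easy to check numerically. In the small sketch below, the latencies (h = 0.5 ms, m = 20 ms) and the load figures are illustrative assumptions, not measurements. The load figures describe a store that handles Q = 2,000 reads/sec facing N = 100,000 reads/sec, in the spirit of the opening example.

```python
def avg_read_latency(r: float, h: float, m: float) -> float:
    """Average read latency: r*h + (1 - r)*(m + h)."""
    return r * h + (1 - r) * (m + h)


def store_read_fraction(r: float) -> float:
    """Share of reads that reach the backing store."""
    return 1 - r


def min_hit_rate(store_qps: float, read_qps: float) -> float:
    """Smallest hit rate that keeps store traffic at or below its capacity Q."""
    return max(0.0, 1 - store_qps / read_qps)


for r in (0.90, 0.99):
    print(f"r={r:.2f}: {avg_read_latency(r, h=0.5, m=20.0):.2f} ms, "
          f"{store_read_fraction(r):.0%} of reads hit the store")
print(f"required hit rate: {min_hit_rate(2_000, 100_000):.0%}")
```

With these numbers, the average read takes about 2.5 ms at a 90% hit rate and about 0.7 ms at 99%. The store's share of reads drops from 10% to 1%, and the store needs the cache to absorb at least 98% of reads.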

The staleness window: with TTL-based expiry, the worst-case staleness is the TTL, because a value changed in the store right after a cache fill can be served stale for up to TTL seconds. With write-invalidate, the window shrinks to the replication/propagation delay of the invalidation. Choosing the TTL is choosing how stale you're willing to be.

Go deeper · Working set, memory, and why hit rate plateaus

Hit rate is bounded by how much of the working set fits in cache memory. If 20% of keys serve 80% of reads (a Zipfian access pattern, common in practice), caching that hot 20% already gets you most of the benefit — but pushing hit rate higher means caching the long tail, which needs disproportionately more memory for diminishing returns. This is why eviction policy (LRU/LFU) matters: it decides which keys survive when memory is full, and a good policy keeps the genuinely hot set resident.
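Eviction decides which keys stay resident once memory is full. Below is a minimal LRU sketch built on `collections.OrderedDict`. The `LRUCache` class is illustrative; for memoizing a pure function, Python's built-in `functools.lru_cache` already does this.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used entry
```

An LFU policy would instead track an access count per key and evict the lowest. That keeps long-lived hot keys resident even after a burst of one-off reads.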

Two more amplifiers of effective hit rate: larger TTLs keep entries resident longer (higher hit rate, more staleness), and request coalescing turns a burst of N concurrent misses into a single store load (the store sees 1 query, not N). Both trade something — staleness, or a little added latency for the waiters — for less store load.

Variants: TTL expiry, refresh-ahead, negative caching

TTL-based expiry — each entry carries a time-to-live; after it elapses the entry is treated as absent and the next read reloads. The simplest way to bound staleness without explicit invalidation. Shorter TTL = fresher but lower hit rate; longer TTL = higher hit rate but staler.
Refresh-ahead — proactively reload an entry that is hot and near expiry, before it actually expires, so reads never hit a cold miss. It trades extra background loads (some wasted, for keys that wouldn't have been read again) for lower tail latency on hot keys. Effective only when access is predictable.
Negative caching — cache the fact that a key does NOT exist (a 'not found' sentinel) for a short TTL. Without it, every request for a missing key bypasses the cache and hits the store every time — the basis of cache penetration. A short negative TTL absorbs those misses while limiting how long a later-created key stays invisible.
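Negative caching fits naturally into a cache-aside read. The minimal sketch below uses a dict as the store and a private sentinel object to mark "not found". Its TTLs are illustrative: 300 seconds for real values and 30 seconds for absences. All of these are hypothetical choices.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

_NOT_FOUND = object()  # sentinel meaning "the store has no such key"


class NegativeCachingReader:
    """Cache-aside reads that also remember misses for a short negative TTL."""

    def __init__(self, store: Dict[str, Any], ttl: float = 300.0,
                 negative_ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.store = store
        self.ttl = ttl
        self.negative_ttl = negative_ttl
        self.clock = clock
        self.cache: Dict[str, Tuple[Any, float]] = {}  # key -> (value or sentinel, expires_at)
        self.store_reads = 0

    def get(self, key: str) -> Optional[Any]:
        entry = self.cache.get(key)
        if entry is not None and self.clock() < entry[1]:
            value = entry[0]
            return None if value is _NOT_FOUND else value   # positive or negative hit
        self.store_reads += 1
        if key in self.store:
            value = self.store[key]
            self.cache[key] = (value, self.clock() + self.ttl)
            return value
        # Remember the absence only briefly, so a key created later
        # becomes visible within negative_ttl seconds.
        self.cache[key] = (_NOT_FOUND, self.clock() + self.negative_ttl)
        return None
```

A hundred lookups for the same missing key now cost one store read instead of a hundred.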

Stale-while-revalidate: a popular hybrid used by HTTP caches and CDNs. When an entry expires, keep serving the stale value to readers while a single background load refreshes it. Readers never block on a miss, and the store sees one refresh instead of a stampede, at the cost of briefly serving known-stale data.
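Below is a minimal stale-while-revalidate sketch. It assumes a hypothetical `loader` callable for the store and a `ThreadPoolExecutor` for the background refresh. An expired entry is returned immediately, and a per-key `refreshing` map ensures only one refresh per key is in flight. Cold keys, which have nothing stale to serve, still load synchronously and are not coalesced in this version.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, Tuple


class StaleWhileRevalidate:
    """Serve an expired entry at once while a single background load refreshes it."""

    def __init__(self, loader: Callable[[str], Any], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.loader = loader                              # hypothetical store query
        self.ttl = ttl
        self.clock = clock
        self.entries: Dict[str, Tuple[Any, float]] = {}   # key -> (value, expires_at)
        self.refreshing: Dict[str, Future] = {}          # keys with a refresh in flight
        self.lock = threading.Lock()
        self.pool = ThreadPoolExecutor(max_workers=4)

    def _put(self, key: str, value: Any) -> None:
        with self.lock:
            self.entries[key] = (value, self.clock() + self.ttl)

    def _refresh(self, key: str) -> None:
        try:
            self._put(key, self.loader(key))
        finally:
            with self.lock:
                self.refreshing.pop(key, None)            # allow a later refresh

    def get(self, key: str) -> Any:
        with self.lock:
            entry = self.entries.get(key)
            if entry is not None:
                value, expires_at = entry
                if self.clock() >= expires_at and key not in self.refreshing:
                    # Expired: start exactly one background refresh...
                    self.refreshing[key] = self.pool.submit(self._refresh, key)
                return value                              # ...and answer with the stale value now
        # Cold key: nothing stale to serve, so this caller loads synchronously.
        value = self.loader(key)
        self._put(key, value)
        return value
```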

The core tension: consistency vs latency vs durability

Every strategy is a point in a triangle of consistency (does the cache agree with the store?), latency (how fast are reads and writes?), and durability (can an acknowledged write be lost?). You cannot maximize all three at once.

Consistency vs latency: write-through and write-invalidate keep the cache close to the store but add a synchronous store hop to every write. Cache-aside with a long TTL is fast but can serve data that's stale by up to the TTL.
Latency vs durability: write-back gives the fastest writes by acknowledging before the store is updated — but that un-flushed window is exactly what a crash loses. Write-through is durable because the store is updated before the ack.
Consistency across replicas: when many app servers each hold their own cache, a write must invalidate all of them. That needs an invalidation message bus (or short TTLs as a crude substitute), and there's always a propagation delay during which different servers can disagree.

Pick by workload: read-heavy, staleness-tolerant (feeds, catalogs) → cache-aside + TTL. Write-then-read, must-be-fresh (user profile edits) → write-through. Write-heavy, rarely-read (logs, events) → write-around. Throughput-critical, loss-tolerant (counters, metrics buffers) → write-back with a persistent log to bound the loss.

Failure modes: stampede, penetration, avalanche, loss

Caches fail in characteristic ways, almost always by letting too much traffic reach the store at once. The three load-spike failures have specific names and specific fixes.

Cache stampede (thundering herd) — a single hot key expires (or is evicted) and every concurrent request for it misses at the same instant, all stampeding the store with the same query. Fix: request coalescing / single-flight (one loader, the rest wait and share the result), a short lock or lease per key, or early/probabilistic recomputation that refreshes a hot key slightly before it expires.
Cache penetration — requests for keys that don't exist anywhere bypass the cache (nothing to hit) and hammer the store on every request. Common with malicious or buggy clients probing random IDs. Fix: negative caching (cache the 'not found' for a short TTL) and/or a Bloom filter in front of the cache to reject keys that definitely don't exist.
Cache avalanche — a large number of entries expire at the same moment (e.g. everything loaded at startup with the same TTL), so a huge wave of misses hits the store simultaneously. Fix: jitter the TTLs (add a small random spread) so expirations are smeared over time, and warm the cache gradually rather than all at once.
Write-back data loss — with write-behind, every write acknowledged but not yet flushed is held only in cache memory. A crash, eviction, or restart before the flush loses it permanently and silently. Mitigate with a persistent write-ahead log for the dirty buffer, replication of the cache tier, or by reserving write-back for data you can afford to lose.
Stale reads — any pattern that fills the cache and then lets the store change underneath it (long TTL, failed invalidation, replication lag) serves data older than the source of truth. Bound it with shorter TTLs, reliable invalidation, or versioned keys.
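Two of the fixes in the list above take only a few lines each: TTL jitter against avalanches and a small Bloom filter against penetration. The sketch below shows both; its 3600 ± 300 second values are illustrative. The `BloomFilter` class is hypothetical. For simplicity rather than speed, it derives its bit positions from salted SHA-256 digests. It must be fed every key that exists. A plain Bloom filter can report false positives but never false negatives, and it cannot delete keys.

```python
import hashlib
import random
from typing import Iterator, Optional


def jittered_ttl(base: float, spread: float, rng: Optional[random.Random] = None) -> float:
    """Return base ± spread seconds, uniformly at random, so expiries don't line up."""
    source = rng if rng is not None else random
    return base + source.uniform(-spread, spread)


class BloomFilter:
    """Answers 'definitely absent' or 'maybe present' for keys added to it."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.num_hashes):
            # Salting one hash function with i stands in for k independent hashes.
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

On a read, `if not bloom.might_contain(key): return None` rejects keys that definitely do not exist before the cache or store is touched (`bloom` being a filter loaded with every existing ID). The rare false positive simply falls through to the normal lookup path.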

Go deeper · Facebook's lease: stampede + stale-set in one mechanism

In 'Scaling Memcache at Facebook' (NSDI 2013), a cache miss returns a lease — a short-lived token — to just one client, which is then the only one allowed to load from the store and set the value. Other clients that miss the same key are told to wait briefly and retry, so the store sees one load, not thousands: stampede solved. The same lease also defends against stale sets — if the key was invalidated while a client was loading, that client's lease is voided and its stale value is rejected, so a slow loader can't overwrite fresher data. Leases are a clean, production-proven answer to two of the hardest caching races at once.
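The lease idea can be sketched as a small state machine. This is a simplified, single-process illustration inspired by the paper's description, not Facebook's implementation. The `LeaseCache` class and its method names are hypothetical. It also omits details a real system needs, such as lease expiry (so that a crashed loader cannot block a key forever) and rate-limiting how often leases are issued.

```python
import itertools
from typing import Any, Dict, Tuple


class LeaseCache:
    """Simplified lease protocol: one loader per missing key, stale sets rejected."""

    def __init__(self) -> None:
        self.values: Dict[str, Any] = {}
        self.leases: Dict[str, int] = {}          # key -> outstanding lease token
        self._tokens = itertools.count(1)

    def get(self, key: str) -> Tuple[str, Any]:
        if key in self.values:
            return ("hit", self.values[key])
        if key in self.leases:
            return ("wait", None)                 # someone else is loading: back off, retry
        token = next(self._tokens)
        self.leases[key] = token
        return ("lease", token)                   # only this caller may load and set

    def set_with_lease(self, key: str, value: Any, token: int) -> bool:
        if self.leases.get(key) != token:
            return False                          # lease voided: reject a possibly stale value
        del self.leases[key]
        self.values[key] = value
        return True

    def invalidate(self, key: str) -> None:
        self.values.pop(key, None)
        self.leases.pop(key, None)                # voids any in-flight lease for this key
```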

Strategies side by side

Strategy	Consistency	Read/Write latency	Durability	Best for
Cache-aside (lazy)	Eventual (bounded by TTL)	Fast reads on hit; miss pays store; fast writes	Store is durable; cache loss is fine	Read-heavy, staleness-tolerant; the default
Read-through	Eventual (bounded by TTL)	Same as cache-aside; load centralized	Store is durable	Read-heavy with a cache library that loads for you
Write-through	Strong (cache = store)	Slow writes (cache + store sync); warm reads	Durable — store written before ack	Write-then-read, must-be-fresh data
Write-back / behind	Weak until flush	Fastest writes; warm reads	At risk — crash loses un-flushed writes	Throughput-critical, loss-tolerant (counters)
Write-around	Fresh on store; cache fills on read	Fast writes; first read is a miss	Durable — store written directly	Write-heavy, rarely-read-soon (logs, events)

Where these patterns run in the wild

Redis & Memcached — the two default in-memory caches. The cache-aside + write-invalidate combo on Redis is the most common application caching pattern. Redis adds server-assisted client-side caching (the cache tells clients when their cached keys change, so each app server can keep a tiny local copy and have it invalidated automatically).
Facebook memcache — the canonical large-scale deployment. Cache-aside (they call it 'demand-filled look-aside'), leases to kill stampedes and stale sets, and a region-wide invalidation pipeline driven off the database commit log. Documented in 'Scaling Memcache at Facebook' (NSDI 2013).
Netflix EVCache — a distributed, replicated caching tier built on Memcached, spanning AWS availability zones. Writes fan out to multiple zones for availability; it's a global read-through/write-aside layer tuned for very high hit rates and zone failure tolerance.
CDNs (CloudFront, Fastly, Cloudflare) — caching at the network edge. They live and die by TTLs (Cache-Control headers), stale-while-revalidate, and request coalescing at each edge so origin servers see one fetch per object, not one per viewer.
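Database query/buffer caches — the buffer pool in PostgreSQL/MySQL is an internal write-back cache of disk pages: modified (dirty) pages are flushed to disk later, and the write-ahead log keeps acknowledged commits durable in the meantime. Query result caches and materialized views are application-visible caches with their own invalidation rules.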

The pattern is fractal: once you see app↔cache↔store, you see it at every layer. CPU L1/L2/L3 caches sit in front of RAM, the OS page cache in front of disk, a CDN in front of your origin, and your app cache in front of the database. The same read/write/expiry questions recur at each level.

Common misconceptions & gotchas

Cache-aside vs read-through — what's the actual difference?

Both load on a miss; the difference is WHERE the load logic lives. Cache-aside puts it in the application: the app checks the cache, and on a miss it reads the store and populates the cache itself. Read-through puts it in the cache layer: the app only talks to the cache, and the cache loads from the store on a miss. Cache-aside is more flexible and lets the app fall back to the store if the cache is down; read-through centralizes the logic and can coalesce concurrent misses for you.

When does write-back actually lose data?

Write-back (write-behind) acknowledges a write as soon as it's in the cache, then flushes to the store later in batches. Any write that has been acknowledged but not yet flushed lives only in cache memory — so a crash, restart, or eviction of that dirty entry before the flush loses it permanently and silently. The exposure is the flush interval times the write rate. Mitigate with a persistent log for the dirty buffer or by replicating the cache tier; or only use write-back for data you can afford to lose.
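The persistent-log mitigation can be sketched as follows. The sketch assumes a JSON-lines log file, an `fsync` on every write before acknowledging, and a dict standing in for the store; all of these are hypothetical choices, and `LoggedWriteBackCache` is an illustrative name. On restart, logged but unflushed writes are replayed into the dirty buffer, so a crash loses nothing that was acknowledged.

```python
import json
import os
from typing import Any, Dict


class LoggedWriteBackCache:
    """Write-back cache that logs each dirty write to disk before acknowledging it."""

    def __init__(self, log_path: str, store: Dict[str, Any]) -> None:
        self.log_path = log_path
        self.store = store                     # stand-in for the backing database
        self.cache: Dict[str, Any] = {}
        self.dirty: Dict[str, Any] = {}        # acknowledged but not yet flushed
        self._replay()                         # recover whatever a crash left behind

    def _replay(self) -> None:
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:                   # later records win, as they did live
                record = json.loads(line)
                self.cache[record["k"]] = record["v"]
                self.dirty[record["k"]] = record["v"]

    def write(self, key: str, value: Any) -> None:
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"k": key, "v": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())             # on disk before the ack
        self.cache[key] = value
        self.dirty[key] = value

    def flush(self) -> None:
        self.store.update(self.dirty)          # one batched write to the store
        self.dirty.clear()
        with open(self.log_path, "w", encoding="utf-8"):
            pass                               # truncate: nothing is pending any more
```

Fsyncing every write gives back much of write-back's speed advantage. A common refinement is to batch several writes per sync. Replay is safe here even if a crash lands between the store update and the truncation, because each record fully overwrites its key and applying it twice changes nothing.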

What's a cache stampede and how do you stop it?

A stampede (thundering herd) happens when a hot key expires and every concurrent request for it misses at the same instant, all hitting the store with the identical query and potentially overwhelming it. The primary fix is request coalescing / single-flight: let exactly one request load the value while the others wait and share the result. Supporting defenses: a per-key lock or lease (Facebook's approach), TTL jitter so keys don't expire together, and early/probabilistic recomputation that refreshes a hot key shortly before it expires.

What are the trade-offs of TTL length?

A shorter TTL means fresher data (smaller staleness window) but a lower hit rate and more store load, because entries expire and reload more often. A longer TTL means a higher hit rate and less store load but more staleness — you can serve data up to TTL seconds out of date. Also always jitter TTLs: identical TTLs on many keys cause a cache avalanche when they all expire at once.

Should I update the cache on a write, or just delete it?

Prefer deleting (invalidating). Updating the cache on write opens a race: two concurrent writers can interleave so the cache ends up holding the older value. Deleting is idempotent — the next read simply reloads the current value from the store. Update-in-place is only worth it when reloads are expensive and you've handled the ordering carefully.

Quiz: At startup you load 50,000 keys into Redis, each with a fixed 1-hour TTL. An hour later your database briefly falls over. What happened, and what's the fix?

Cache penetration; add a Bloom filter
Cache avalanche; add random jitter to the TTLs so they don't all expire together
Write-back data loss; enable a persistent log
Nothing caching-related; the database just failed

Cache avalanche; add random jitter to the TTLs so they don't all expire together — All 50,000 keys were loaded at the same instant with the same TTL, so they all expired at the same instant an hour later — a cache avalanche. Every one of those keys then missed simultaneously and the wave of reloads overwhelmed the database. The fix is TTL jitter: add a small random spread (e.g. 3600 ± a few hundred seconds) so expirations are smeared over time instead of synchronized.

In an interview

Frame it as three actors — app, cache, backing store — and two sets of choices. Name the read patterns (cache-aside / lazy loading, read-through) and the write patterns (write-through, write-back, write-around, write-invalidate), then reason about which fits the workload: read-heavy with tolerable staleness → cache-aside + TTL; write-then-read, must-be-fresh → write-through; write-heavy, rarely-read → write-around; throughput-critical, loss-tolerant → write-back with a persistent log. Say explicitly why you invalidate rather than update the cache (idempotent, avoids write races).

Then volunteer the failure modes — this is what separates a strong answer. Stampede on hot-key expiry → request coalescing / single-flight + TTL jitter. Penetration on missing keys → negative caching or a Bloom filter. Avalanche on mass expiry → jittered TTLs. Write-back loss → persistent log or replication. Bonus points for naming Facebook's leases (one mechanism that kills both stampedes and stale sets) and for noting the cross-server consistency problem (invalidation bus vs. short TTLs).

Then open the simulator: WRITE with different strategies and watch the cache and store agree or diverge, READ to see hits vs. miss-then-load, TICK to expire a TTL and flush write-back, CRASH to lose dirty data, and BURST a hot key with stampede protection on and off.

References & further reading

Nishtala et al. — Scaling Memcache at Facebook (NSDI 2013) — leases, stale-set protection, and region-wide invalidation at scale
AWS — Database Caching Strategies Using Redis (whitepaper) — cache-aside, write-through, TTL, and when to use each
AWS ElastiCache — Caching strategies (lazy loading, write-through, TTL) — pseudocode and trade-offs for the core read/write patterns
Redis — Client-side caching (server-assisted, tracking & invalidation) — how Redis invalidates per-client local caches
Netflix Tech Blog — Announcing EVCache — a replicated, multi-AZ Memcached caching tier

Ready to try it?

The simulator is a real, deterministic implementation — pick a scenario and step through it, scrubbing the timeline forward and backward through every change.

Try these in the simulator

Write-through + cache-aside → A write lands in both cache and store (write-through), a read serves from cache, time expires a TTL, then a burst on a hot key shows the thundering herd with no protection.
Write-back: fast but risky → Write-back acks the moment the cache is updated and flushes to the store later on a tick: fast writes, but a crash before the flush loses the data.
Stampede protection coalesces loads → The same burst of concurrent reads on a cold key, but with stampede protection on: the misses collapse into a single store load instead of a herd.
