Caching Strategies Every Developer Should Know

Welcome, Developer 👋

If you’ve been doing system design interviews, you’ve heard this question before: “How would you scale this system to handle millions of requests?”

And nine times out of ten, caching is part of the answer.

But here’s the thing: most developers know that caching exists. Far fewer know how to reason about it, which strategy to pick, and what trade-offs they’re accepting when they do. That’s exactly what we’re going to fix today.

Grab your favourite beverage, developer. Let’s get into it.

Why Caching Exists

Every time your app hits a database, it pays a cost. Network latency. Disk I/O. Query execution. That cost might be 5ms or 500ms depending on your setup, but it adds up fast when you’re handling thousands of requests per second.

Caching solves this by storing the result of an expensive operation so the next request can skip it entirely. Instead of asking the database “give me user #1234 again”, you check a fast in-memory store first. If it’s there, you return it instantly. If not, you go to the database and store the result for next time.

Simple concept. But the complexity lives in the details.

The Cache Layers

Before we talk strategies, it’s worth understanding where caches live in a typical system.

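In-process (L1): This is memory inside your application itself. Think of a simple Map object in Node.js that stores computed values. Fastest possible access, but it’s local to a single server instance and disappears when the process restarts. On its own it isn’t enough for a distributed system: each instance holds its own copy, so instances can disagree about a value and each one has to warm up separately. It works best as a small, short-lived layer in front of a shared cache.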
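To make that concrete, here is a minimal sketch of an in-process cache that uses nothing beyond the built-in `Map`. `InProcessCache` and its injectable `now` clock are hypothetical names for illustration, not a library API. Each entry carries a TTL and is dropped lazily when it is read after expiry.

```typescript
// Hypothetical in-process (L1) cache: a Map plus a per-entry expiry time.
// Everything lives in this process's memory, so it is lost on restart and
// is not shared with other server instances.
class InProcessCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private now: () => number;

  // The clock is injectable so expiry can be tested without real waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      // Expired: remove it lazily on read and report a miss.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }
}
```

Two server instances running this code each hold a separate map, so a value updated on one can stay stale on the other until its TTL runs out.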

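Distributed cache (L2): This is where Redis or Memcached lives. A dedicated cache server (or cluster) that all your application instances talk to. Slightly slower than in-process because of the network hop, but shared across instances and unaffected when your application restarts. The cache server usually keeps data in memory only, so restarting the cache itself still empties it unless you’ve enabled persistence (Redis supports RDB snapshots and AOF). This is what most production systems mean when they say “we cache this”.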

CDN: For static assets and even API responses, a Content Delivery Network caches at the edge, physically close to your users. A user in São Paulo shouldn’t be waiting on a response from a server in Virginia. CDNs solve that.

Many production systems combine all three layers. Understand what each one is optimised for, and you’ll make better decisions.

The Four Core Strategies

1. Cache-Aside (Lazy Loading)

This is the most common pattern. Your application code is responsible for managing the cache.

The flow looks like this:

Request comes in.
Check the cache. If the data is there (cache hit), return it.
If it’s not there (cache miss), fetch from the database.
Store the result in the cache.
Return the result.

// assumes an ioredis-style `redis` client and a `db` data layer are initialised
async function getUser(userId: string): Promise<User | null> {
  const cacheKey = `user:${userId}`;

  // 1. Check cache first; if the cache is unreachable, treat it as a miss
  try {
    const cached = await redis.get(cacheKey);
    if (cached) return JSON.parse(cached);
  } catch (err) {
    console.warn('cache read failed, falling back to database', err);
  }

  // 2. Cache miss, go to database
  const user = await db.users.findById(userId);
  if (!user) return null;

  // 3. Store in cache with a TTL of 10 minutes (best effort: ignore cache errors)
  await redis.set(cacheKey, JSON.stringify(user), 'EX', 600).catch(() => {});

  return user;
}

The good: Only caches data that’s actually requested. As long as you handle cache errors gracefully, a miss just falls back to the database.

The trade-off: The first request after a cache miss (or expiry) always hits the database. If many misses land at the same time, say after a cache flush, you get a cache avalanche. If they all pile onto one popular key, you get a thundering herd. More on both in a bit.

2. Write-Through

Here the cache is updated every time data is written to the database. Your app writes to the database and then updates the cache in the same request, before returning to the caller.

// assumes redis and db clients are initialised
async function updateUser(userId: string, data: Partial<User>): Promise<User | null> {
  // Write to database
  const updated = await db.users.update(userId, data);
  if (!updated) return null;
 
  // Immediately update cache
  const cacheKey = `user:${userId}`;
  await redis.set(cacheKey, JSON.stringify(updated), 'EX', 600);
 
  return updated;
}

The good: Cache stays in sync with the database in the happy path. No stale reads after writes.

The trade-off: Every write pays the cost of updating two places. If you’re writing data that rarely gets read, you’re caching things nobody will ask for. Also worth knowing: if the DB write succeeds but the cache write fails, your cache is now stale despite using write-through. These two operations are not atomic, so you need to handle that failure case explicitly.
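Here is one way to handle that failure case. It is a sketch built on assumptions. `CacheClient` and `UserStore` are hypothetical minimal interfaces standing in for your Redis client and data layer. `onCacheFailure` is a hypothetical hook, for example into a retry queue. If the cache write fails, the sketch falls back to deleting the key. The next read then misses and reloads fresh data instead of serving the old value.

```typescript
// Hypothetical minimal interfaces standing in for a Redis client and a DB layer.
interface CacheClient {
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
  del(key: string): Promise<void>;
}
interface User { id: string; name: string }
interface UserStore {
  update(id: string, data: Partial<User>): Promise<User | null>;
}

// Write-through where a failed cache write falls back to invalidation.
async function updateUserWriteThrough(
  cache: CacheClient,
  store: UserStore,
  userId: string,
  data: Partial<User>,
  onCacheFailure: (key: string, err: unknown) => void,
): Promise<User | null> {
  // The database is the source of truth, so write it first.
  const updated = await store.update(userId, data);
  if (!updated) return null;

  const key = `user:${userId}`;
  try {
    await cache.set(key, JSON.stringify(updated), 600);
  } catch (err) {
    try {
      // Best effort: remove the now-stale entry so the next read misses.
      await cache.del(key);
    } catch {
      // Both failed: the old value may survive until its TTL expires.
      // Report it (e.g. to a retry queue) rather than failing the write.
      onCacheFailure(key, err);
    }
  }
  return updated;
}
```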

3. Write-Behind (Write-Back)

Your app writes to the cache immediately and returns. The cache then asynchronously flushes the data to the database in the background.

The good: Writes feel fast because you’re not waiting on the database. This pattern shines for high-volume, low-criticality writes like view counts, activity logs, or analytics events where write throughput matters more than immediate durability.

The trade-off: If the cache crashes before the flush happens, you lose data. This strategy trades durability for speed. Only use it when losing a small number of writes is acceptable and you have a plan for what happens if the cache goes down before flushing.
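Here is a minimal in-process sketch of the idea, using the view-count case from above. `WriteBehindCounter` and the `flushToDb` callback are hypothetical. A real setup would keep the buffer in a shared cache and call `flush()` on a timer or once the batch grows large. It would also load starting values from the database rather than assume zero, as this sketch does.

```typescript
// Hypothetical write-behind buffer: writes land in memory immediately and are
// flushed to the backing store in batches. Anything still in `pending` when
// the process dies is lost, which is exactly the durability trade-off above.
type FlushFn = (batch: Map<string, number>) => Promise<void>;

class WriteBehindCounter {
  private values = new Map<string, number>();  // what reads see
  private pending = new Map<string, number>(); // dirty entries not yet persisted
  private flushToDb: FlushFn;

  constructor(flushToDb: FlushFn) {
    this.flushToDb = flushToDb;
  }

  // Fast path: update memory and return without touching the database.
  increment(key: string, by = 1): number {
    const next = (this.values.get(key) ?? 0) + by;
    this.values.set(key, next);
    this.pending.set(key, next);
    return next;
  }

  get(key: string): number {
    return this.values.get(key) ?? 0;
  }

  // Persist everything buffered so far in one batch.
  async flush(): Promise<void> {
    if (this.pending.size === 0) return;
    const batch = this.pending;
    this.pending = new Map();
    try {
      await this.flushToDb(batch);
    } catch (err) {
      // Re-queue failed entries unless a newer write has superseded them.
      for (const [k, v] of batch) {
        if (!this.pending.has(k)) this.pending.set(k, v);
      }
      throw err;
    }
  }
}
```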

4. Read-Through

Similar to cache-aside, but the cache itself is responsible for fetching from the database on a miss, not your application code. This requires a caching layer that supports connectors or plugins capable of calling your data source directly. Some Redis modules like RedisGears support this, but it’s not a built-in feature of managed services like ElastiCache out of the box.

The good: Cleaner application code. The caching logic is centralised in one place instead of scattered across your services.

The trade-off: Less control over the fetch logic, and you still hit the cold start problem. The first request for any key always goes to the database, same as cache-aside. You also need a caching layer that explicitly supports this pattern, which limits your options.
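The main difference from cache-aside is who owns the fetch. In this sketch, the loader function is handed to the cache once, and callers only ever call `get`. `ReadThroughCache` is a hypothetical class backed by a plain `Map`. It is not a Redis feature.

```typescript
// Hypothetical read-through cache: callers only talk to the cache, and the
// cache owns the loader that fetches from the data source on a miss.
type Loader<V> = (key: string) => Promise<V>;

class ReadThroughCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();
  private loader: Loader<V>;
  private ttlMs: number;
  private now: () => number;

  constructor(loader: Loader<V>, ttlMs: number, now: () => number = () => Date.now()) {
    this.loader = loader;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(key: string): Promise<V> {
    const entry = this.store.get(key);
    if (entry && this.now() < entry.expiresAt) return entry.value;

    // Miss or expired: the cache, not the caller, goes to the source.
    const value = await this.loader(key);
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

With this in place, the `getUser` logic from the cache-aside section shrinks to a single `get(userId)` call. This assumes the cache was constructed with a loader that queries the database.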

The Hard Problems

Cache Invalidation

Phil Karlton famously said: “There are only two hard things in computer science: cache invalidation and naming things.”

He was right. The question isn’t just how to cache data. It’s when to remove it.

TTL (Time to Live) is the simplest approach. Set an expiry time on every cache entry and let it die naturally. Easy to implement, but your cache can serve stale data right up until expiry.

Event-driven invalidation is more precise. When data changes, you explicitly delete or update the relevant cache keys. Harder to get right, but more consistent. This is what you want for anything user-facing where stale data causes real problems.

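One thing interviewers like to probe here: invalidation gets harder in a distributed setup. Suppose each application instance keeps its own in-process cache, or you run independent cache nodes that don’t replicate to each other. Deleting a key in one place then doesn’t clear it everywhere. Even with Redis replication, the delete reaches replicas asynchronously, so a read from a replica can briefly return the old value. You either need a shared cache cluster or a way to broadcast invalidation events across nodes. Mention that and you’ve shown you’ve thought past the single-server case.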
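Here is a sketch of the broadcast approach, assuming each app instance keeps its own local cache. `InvalidationBus` is a hypothetical in-memory stand-in for a real pub/sub channel such as Redis Pub/Sub or a message broker. `onUserUpdated` is a hypothetical hook you would call after a successful database write.

```typescript
// Hypothetical in-memory stand-in for a pub/sub channel.
// Every subscriber receives every published key.
class InvalidationBus {
  private subscribers: Array<(key: string) => void> = [];

  subscribe(handler: (key: string) => void): void {
    this.subscribers.push(handler);
  }

  publish(key: string): void {
    for (const handler of this.subscribers) handler(key);
  }
}

// Each app instance keeps its own local cache and drops keys when told to.
class NodeLocalCache {
  private entries = new Map<string, string>();

  constructor(bus: InvalidationBus) {
    bus.subscribe((key) => this.entries.delete(key));
  }

  get(key: string): string | undefined {
    return this.entries.get(key);
  }

  set(key: string, value: string): void {
    this.entries.set(key, value);
  }
}

// Event-driven invalidation: after the database write succeeds, publish the
// key so every node evicts its copy, not just the node that handled the write.
function onUserUpdated(bus: InvalidationBus, userId: string): void {
  bus.publish(`user:${userId}`);
}
```

Redis Pub/Sub delivers messages at most once. A node that is disconnected when the invalidation goes out never sees it. Keeping a TTL on local entries as a backstop bounds how long such a node can serve stale data.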

The next three problems are a family that interviewers love, and they often get confused for each other. They’re related but distinct, so it’s worth being precise.

Cache Avalanche

Imagine a large batch of cache entries all share the same expiry time. At midnight they expire together, and a flood of requests all miss the cache at once and hit your database simultaneously. The database can’t handle the surge and falls over. The same thing happens if your cache layer itself goes down and every request suddenly bypasses it.

This is cache avalanche: many keys failing at the same time.

A few ways to mitigate it:

Jitter on TTL: Add a small random offset to your expiry times so entries don’t all expire at the same moment. Instead of a fixed 600 seconds, use 600 + Math.floor(Math.random() * 60) as your TTL so entries are spread across a 60-second window.
Background refresh: Before the TTL expires, proactively refresh entries in the background so the cache is never actually empty.
Graceful degradation: If the cache layer goes down, protect the database with rate limiting or circuit breakers rather than letting every request through.
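The jitter formula from the first bullet can be wrapped in a small helper. `ttlWithJitter` is a hypothetical name. The random source is a parameter only so the bounds can be checked deterministically.

```typescript
// TTL with jitter: a base TTL plus a random whole-second offset in
// [0, spreadSeconds), so keys written together don't expire together.
function ttlWithJitter(
  baseSeconds: number,
  spreadSeconds: number,
  random: () => number = Math.random,
): number {
  return baseSeconds + Math.floor(random() * spreadSeconds);
}
```

With the ioredis-style call used earlier, this would be `redis.set(key, value, 'EX', ttlWithJitter(600, 60))`, giving TTLs between 600 and 659 seconds.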

Thundering Herd (Hot Keys)

This is the single-key version of the problem. One cache key is so popular that it gets hit thousands of times per second. The moment it expires, even briefly, every one of those concurrent requests misses at once and stampedes the database for the same value.

A few ways to handle it:

Mutex / distributed lock: When the key expires, only one process is allowed to fetch from the database and repopulate the cache. The others wait for it, then read the fresh value.
Serve stale while refreshing: Return the slightly stale value to most requests while a single background task refreshes it.
L1 in front of L2: Put a short-lived in-process cache in front of Redis to absorb the hot reads before they ever reach the shared layer.
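The mutex idea is easiest to see inside a single process. There, concurrent misses for the same key can share one in-flight promise, a technique often called request coalescing or single-flight. This is a swapped-in, in-process variant that only deduplicates within one server instance. Across instances you would still need a distributed lock so that only one instance does the refetch, for example Redis `SET lock-key token NX PX 5000`. `SingleFlight` and `run` are hypothetical names.

```typescript
// Hypothetical request coalescing within one process: concurrent callers
// for the same key share a single in-flight fetch instead of each querying.
class SingleFlight<V> {
  private inFlight = new Map<string, Promise<V>>();

  run(key: string, fetch: () => Promise<V>): Promise<V> {
    const existing = this.inFlight.get(key);
    if (existing) return existing; // someone is already fetching: wait on them

    const promise = fetch().finally(() => {
      // Clear the slot once settled so the next expiry triggers a fresh fetch.
      this.inFlight.delete(key);
    });
    this.inFlight.set(key, promise);
    return promise;
  }
}
```

On a miss, a cache-aside read would wrap its database fetch in `run(key, ...)`. A burst of concurrent misses for one hot key then produces a single query.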

The distinction worth remembering: avalanche is many keys expiring together, thundering herd is one hot key getting hammered. Interviewers will notice if you keep them straight.

Cache Penetration

The previous two problems assume the data actually exists. Penetration is the opposite: requests for keys that will never be in the cache because the underlying record doesn’t exist at all.

Someone requests user:99999999, which isn’t a real user. The cache misses, the request falls through to the database, the database returns nothing, and so nothing gets cached. Every repeat of that request does the same thing. No TTL tuning helps, because there’s nothing to store. It’s also a real attack vector: hammer random non-existent IDs and you bypass the cache entirely, sending all that load straight to the database.

Two standard defences:

Cache the empty result: Store a null or sentinel value with a short TTL so repeat misses for the same missing key get absorbed by the cache instead of hitting the database every time.
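Bloom filter: Keep a probabilistic structure that can tell you a key definitely doesn’t exist before you even query. If the filter says no, you reject the request without touching the database. It can return false positives but never false negatives, so a “maybe” still goes on to the cache and database. The filter also has to be updated as new records are created.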
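Here is a sketch combining both defences. The `BloomFilter` class is a minimal illustration. Its hashing (FNV-1a with two seeds, combined by double hashing) is an assumption rather than a tuned design. The cache is modelled as a plain `Map` without TTLs for brevity. `getUserGuarded` and the `NOT_FOUND` sentinel are hypothetical names.

```typescript
// Minimal Bloom filter sketch: k bit positions per key, derived from two
// FNV-1a hashes via double hashing. Illustrative, not tuned for production.
class BloomFilter {
  private bits: Uint8Array;
  private size: number;
  private hashCount: number;

  constructor(size: number, hashCount: number) {
    this.size = size;
    this.hashCount = hashCount;
    this.bits = new Uint8Array(Math.ceil(size / 8));
  }

  private fnv1a(str: string, seed: number): number {
    let h = (0x811c9dc5 ^ seed) >>> 0;
    for (let i = 0; i < str.length; i++) {
      h ^= str.charCodeAt(i);
      h = Math.imul(h, 0x01000193);
    }
    return h >>> 0;
  }

  private positions(key: string): number[] {
    const h1 = this.fnv1a(key, 0);
    const h2 = this.fnv1a(key, 0x9747b28c);
    const out: number[] = [];
    for (let i = 0; i < this.hashCount; i++) out.push((h1 + i * h2) % this.size);
    return out;
  }

  add(key: string): void {
    for (const p of this.positions(key)) this.bits[Math.floor(p / 8)] |= 1 << (p % 8);
  }

  // false = definitely absent; true = possibly present (false positives happen).
  mightContain(key: string): boolean {
    return this.positions(key).every(
      (p) => (this.bits[Math.floor(p / 8)] & (1 << (p % 8))) !== 0,
    );
  }
}

// Sentinel stored in the cache for records that don't exist.
const NOT_FOUND = "__not_found__";

// Hypothetical guarded lookup: Bloom filter first, then the cache
// (including cached "not found" results), then the database.
async function getUserGuarded(
  id: string,
  filter: BloomFilter,
  cache: Map<string, string>,
  fetchFromDb: (id: string) => Promise<string | null>,
): Promise<string | null> {
  // 1. Filter says "definitely not present": skip cache and database entirely.
  if (!filter.mightContain(id)) return null;

  // 2. Cache check; a cached sentinel means "we already know it's missing".
  const cached = cache.get(id);
  if (cached !== undefined) return cached === NOT_FOUND ? null : cached;

  // 3. Miss: query the database and cache either the value or the sentinel.
  //    A real cache would give the sentinel a short TTL so new records show up.
  const row = await fetchFromDb(id);
  cache.set(id, row ?? NOT_FOUND);
  return row;
}
```

A standard Bloom filter can’t remove entries, so deleted records keep passing the filter. That is one reason the negative cache is still useful alongside it.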

Eviction

Worth one mental note even though it’s less dramatic than the others: your cache has finite memory. When it fills up, it evicts entries based on a policy, commonly LRU (least recently used) or LFU (least frequently used).

The practical consequence is that keys can disappear before their TTL expires. If you’re ever debugging mysterious cache misses on data that “should” still be there, eviction under memory pressure is a likely culprit. Knowing your eviction policy and watching your cache’s memory usage is part of running one in production.
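The following sketch shows why keys can vanish early, using the textbook LRU policy in a few lines. It relies on the fact that a JavaScript `Map` iterates keys in insertion order. `LRUCache` is a hypothetical class for illustration. Redis itself approximates LRU by sampling keys rather than tracking exact recency, and the policy is chosen with the `maxmemory-policy` setting.

```typescript
// Minimal LRU cache built on Map's insertion-order iteration.
// Re-inserting a key on every read moves it to the "most recently used" end.
class LRUCache<K, V> {
  private map = new Map<K, V>();
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key) as V;
    this.map.delete(key);
    this.map.set(key, value); // mark as most recently used
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
  }
}
```

With a capacity of two, writing `a` and `b`, reading `a`, then writing `c` evicts `b`, even though no TTL was involved.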

How to Think About This in a System Design Interview

When you’re asked to design a system, bring up caching proactively. Here’s a simple mental framework:

What is expensive to compute or fetch? That’s your cache candidate.
How often does it change? Determines your TTL and invalidation strategy.
What happens if the cache serves stale data? Guides how aggressive your TTL should be.
What happens if the cache goes down? Your system should degrade gracefully, not crash.
How will you measure it? Cache hit ratio is the number that matters. A cache with a low hit ratio is adding latency and cost without buying you much, and you should be ready to say how you’d monitor and tune it.
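Hit ratio is simply hits divided by total lookups. Below is a hypothetical `CacheStats` counter you could wrap around your cache reads. Redis also exposes server-wide `keyspace_hits` and `keyspace_misses` counters in `INFO stats`.

```typescript
// Hypothetical counters around cache lookups: hit ratio = hits / (hits + misses).
class CacheStats {
  hits = 0;
  misses = 0;

  record(hit: boolean): void {
    if (hit) this.hits++;
    else this.misses++;
  }

  hitRatio(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}
```

For example, 950 hits and 50 misses gives a ratio of 0.95, meaning 5% of lookups still reach the database.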

It also helps to anchor your answer in the read/write profile of the system. Caching pays off most on read-heavy workloads where the same data is requested far more often than it changes. If a system is write-heavy or the data is rarely re-read, say so, and explain why a cache might not be the right tool there. Knowing when not to cache reads as senior.

When an interviewer asks “how would you handle 10x traffic?”, one of your first answers should be: “We’d introduce a caching layer between the application and the database using Redis, with a cache-aside pattern and TTL-based invalidation. For hot data, we’d add an in-process L1 cache on top.”

That answer signals distributed systems thinking. That’s what staff-level roles are looking for.

What I Learned

Caching looks simple from the outside. You store a value and retrieve it later. But real production systems deal with distributed caches, invalidation bugs, avalanches, hot keys, and penetration attacks constantly.

The developers who stand out are the ones who understand not just how to cache, but when not to, and what breaks when you do it wrong.

Once you internalise these patterns, you’ll start seeing them everywhere: in how Redis works, in how CDNs serve content, in how your ORM or data loader avoids fetching the same row twice within a request. It’s one of those foundational concepts that unlocks a lot of other things.

Conclusion

Here’s the honest summary, developer.

Caching is not a silver bullet. It’s a trade-off. You’re trading consistency for speed, and simplicity for scale. Every strategy we covered has a cost. Write-through adds a cache write to every database write. Write-behind risks data loss. Cache-aside leaves you exposed to thundering herds and avalanches whenever many reads miss at once. TTL-based invalidation means you’ll serve stale data sometimes.

The skill isn’t knowing that caching exists. It’s knowing which strategy fits your read/write ratio, your consistency requirements, and your failure tolerance, and being able to explain that reasoning clearly in a room with a whiteboard.

That’s what separates a developer who uses Redis from one who designs systems with it.

If you’re prepping for system design interviews, bookmark this one. And if you’re already using Redis in production, take a second look at your TTLs and invalidation logic. There might be a cache avalanche waiting for you at midnight.

That’s it for today. If this was useful, share it with someone getting ready for their next system design interview.

Stay focused, Developer! 🚀