Caching Strategies Every Developer Should Know (2026)

The Thundering Herd That Took Out the Primary

On a Black Friday a few years ago, a product-catalog service I helped operate went from a comfortable 6,000 RPS to a frozen 0 RPS in roughly eleven seconds. The architecture on paper was fine: Redis in front of Postgres, a nice 94% hit rate, cache-aside for every product page, TTL set to "one hour" to match merchandising's content schedule. What nobody had thought about was that "one hour" was a literal constant. Every product cached during the traffic ramp at 00:00 UTC expired at 01:00 UTC, simultaneously, at the exact second marketing had scheduled a push notification that drove a 4x spike.

At 01:00:00, 180,000 product keys evaporated from Redis at once. The push notification landed three seconds later. Every application pod cache-missed on every request, dogpiled onto Postgres, and in under ten seconds the database was at max_connections with a two-minute query queue. The first user-facing symptom was checkout failing. The second was the read replica falling behind. The third was the on-call hitting a wall of "too many connections" errors and realising every mitigation they had was already offline, because the admin tools also cached through the same Redis.

The fix was two lines of code: a random jitter on TTL (ttl + rand(0, ttl * 0.1)) and a single-flight lock on cache regeneration. Total patch size: 42 characters. Total revenue lost to the outage: into six figures. That is the caching failure mode they do not teach you -- not "cache misses are slow," but "uniform TTLs plus a traffic spike plus a shared backing store equals an outage that your hit-rate graph will swear was not your fault."

This guide is the pattern catalogue that would have prevented that incident -- and the four or five other classes of caching bug I have since watched take down production systems at 200,000+ RPS. Cache-aside, write-through, write-behind, and read-through are the strategies. Stampede, penetration, avalanche, and stale-data bugs are the failure modes. You need to know both, because every caching decision is a deliberate trade between latency and a specific way the system can break.

The Core Caching Patterns

Cache-Aside (Lazy Loading)

The application checks the cache first. On a miss, it fetches from the database, writes to cache, and returns the result. This is the most common pattern and the one you should default to.

async function getUser(userId: string) {
  // Check cache first
  const cached = await redis.get(`user:${userId}`);
  if (cached) return JSON.parse(cached);

  // Cache miss -- fetch from database
  const user = await db.users.findById(userId);
  if (user) {
    await redis.setex(`user:${userId}`, 3600, JSON.stringify(user));
  }
  return user;
}

Pros: Only caches data that's actually requested. Simple to implement. Cache failures don't break reads (just slower).

Cons: First request is always a cache miss. Data can become stale. Cache stampede risk on popular keys.

Write-Through

Every write goes to both the cache and the database synchronously. The cache is always up-to-date, eliminating stale reads. The trade-off is higher write latency since both stores must acknowledge the write.

async function updateUser(userId: string, data: Partial<User>) {
  // Write to database
  const user = await db.users.update(userId, data);
  // Write to cache (synchronous with the DB write)
  await redis.setex(`user:${userId}`, 3600, JSON.stringify(user));
  return user;
}

Pros: Cache stays consistent with the database as long as every write goes through this path. No stale reads in the normal case.

Cons: Higher write latency. Caches data that may never be read. Write failures require careful handling.

Write-Behind (Write-Back)

Writes go to the cache immediately, and the cache asynchronously flushes to the database. This gives you the lowest write latency but introduces the risk of data loss if the cache fails before flushing.

Pros: Lowest write latency. Batches database writes for efficiency.

Cons: Risk of data loss. Complex failure handling. Not suitable for financial transactions.
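The write-behind flow described above can be sketched in a few lines. This is a minimal in-process illustration, not a production design: `WriteBehindCache` and the `persistBatch` callback are hypothetical names, and a plain `Map` stands in for the cache. It shows the two properties that matter: writes are acknowledged before the database sees them, and repeated writes to the same key are coalesced into one database write per flush.

```typescript
// Hypothetical callback that writes a batch of entries to the database.
type PersistBatch<V> = (entries: Array<[string, V]>) => Promise<void>;

class WriteBehindCache<V> {
  private cache = new Map<string, V>();
  private dirty = new Map<string, V>(); // pending writes, latest value per key
  private persistBatch: PersistBatch<V>;

  constructor(persistBatch: PersistBatch<V>) {
    this.persistBatch = persistBatch;
  }

  // Acknowledged as soon as the in-memory cache is updated.
  set(key: string, value: V): void {
    this.cache.set(key, value);
    this.dirty.set(key, value);
  }

  get(key: string): V | undefined {
    return this.cache.get(key);
  }

  pendingCount(): number {
    return this.dirty.size;
  }

  // Flush pending writes as one batch. If the database write fails, the batch
  // is re-queued without overwriting values written while the flush ran.
  async flush(): Promise<number> {
    if (this.dirty.size === 0) return 0;
    const batch = Array.from(this.dirty.entries());
    this.dirty = new Map();
    try {
      await this.persistBatch(batch);
      return batch.length;
    } catch (err) {
      for (const [key, value] of batch) {
        if (!this.dirty.has(key)) this.dirty.set(key, value);
      }
      throw err;
    }
  }
}
```

In a real service `flush()` would run on a timer or when the pending count crosses a threshold. Anything still in `dirty` when the process dies is lost, which is exactly the data-loss risk listed in the cons.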

Read-Through

The cache itself is responsible for loading data on a miss. The application only talks to the cache, never directly to the database. This simplifies application code but requires a cache that supports data loading callbacks.

Pros: Simpler application code. Cache manages its own population.

Cons: Tighter coupling between cache and data store. Harder to debug miss behavior.
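A rough sketch of read-through, assuming an in-process cache: the loader callback is handed to the cache once, and callers only ever call `get`. `ReadThroughCache`, `Loader`, and the injectable `now` clock are illustrative names, not part of any particular library.

```typescript
// Hypothetical loader: how the cache fetches a value it does not have.
type Loader<V> = (key: string) => Promise<V | undefined>;

class ReadThroughCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private loader: Loader<V>;
  private ttlMs: number;
  private now: () => number;

  constructor(loader: Loader<V>, ttlMs: number, now: () => number = () => Date.now()) {
    this.loader = loader;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Callers never talk to the database; the cache loads on a miss or expiry.
  async get(key: string): Promise<V | undefined> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;

    const value = await this.loader(key);
    if (value !== undefined) {
      this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    } else {
      this.entries.delete(key);
    }
    return value;
  }
}
```

Compared with the cache-aside example earlier, the miss-handling logic has moved out of the application function and into the cache, which is both the benefit (simpler callers) and the cost (the cache now knows about the data store).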

Caching Strategy Comparison

Strategy	Read Latency	Write Latency	Consistency	Complexity	Data Loss Risk
Cache-Aside	Fast (on hit)	Normal	Eventual	Low	None
Write-Through	Fast	Higher	Strong	Medium	None
Write-Behind	Fast	Lowest	Eventual	High	Yes
Read-Through	Fast (on hit)	Normal	Eventual	Medium	None

Eviction Policies: What Gets Removed When the Cache Is Full

LRU (Least Recently Used)

The most widely used eviction policy. Removes the item that hasn't been accessed for the longest time. Redis uses an approximated LRU algorithm that samples a small set of keys (5 by default, tunable with maxmemory-samples) and evicts the least recently used among them. This is efficient and works well for most workloads where recent access predicts future access.
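To make the policy concrete, here is a small exact LRU cache. Note that this is exact LRU, not the sampled approximation Redis uses; it relies on the fact that a JavaScript `Map` iterates in insertion order, so re-inserting a key on every access keeps the least recently used key at the front. `LruCache` is an illustrative name.

```typescript
class LruCache<K, V> {
  private entries = new Map<K, V>(); // iteration order = least to most recent
  private capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) {
      this.entries.delete(key);
    } else if (this.entries.size >= this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }
}
```

With a capacity of 2, writing `a` and `b`, reading `a`, then writing `c` evicts `b`, because `a` was touched more recently.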

LFU (Least Frequently Used)

Removes the item with the lowest access frequency. Better than LRU when you have a mix of frequently and rarely accessed items -- a popular item accessed 1,000 times won't be evicted just because it wasn't accessed in the last minute. Redis 4.0+ supports LFU natively.
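For contrast with LRU, a deliberately simple LFU sketch follows. It keeps an exact hit counter per key and scans for the minimum on eviction. That is an assumption made for readability: Redis's LFU uses a compact probabilistic counter with time decay rather than plain counts, and a production LFU would avoid the linear scan. `LfuCache` is an illustrative name.

```typescript
class LfuCache<V> {
  private entries = new Map<string, { value: V; hits: number }>();
  private capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    entry.hits += 1;
    return entry.value;
  }

  set(key: string, value: V): void {
    const existing = this.entries.get(key);
    if (existing) {
      existing.value = value;
      existing.hits += 1;
      return;
    }
    if (this.entries.size >= this.capacity) this.evictLeastFrequent();
    this.entries.set(key, { value, hits: 1 });
  }

  // Linear scan for the lowest counter; ties go to the oldest inserted key.
  private evictLeastFrequent(): void {
    let victim: string | undefined;
    let fewest = Infinity;
    for (const [key, entry] of this.entries) {
      if (entry.hits < fewest) {
        fewest = entry.hits;
        victim = key;
      }
    }
    if (victim !== undefined) this.entries.delete(victim);
  }
}
```

Without decay, a key that was hot last week can outlive keys that are hot today, which is why real implementations age their counters.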

TTL (Time-to-Live)

Items expire after a fixed duration regardless of access patterns. The simplest invalidation strategy and often the best starting point. Set TTL based on how much staleness your data can tolerate: 60 seconds for dashboards, 300 seconds for user profiles, 3600 seconds for configuration data.

Pro tip: Add jitter to your TTLs. If 1,000 cache entries all expire at the same second, you'll get a thundering herd hitting your database simultaneously. Use TTL + random(0, TTL * 0.1) to spread expirations. This single technique prevents more outages than any other caching optimization.
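The jitter formula above translates directly into a helper. This is a small sketch; `ttlWithJitter` and its parameters are illustrative names, and the random source is injectable only so the behaviour can be checked deterministically.

```typescript
// Returns baseTtlSeconds plus a random extra of up to jitterFraction * base,
// i.e. TTL + random(0, TTL * 0.1) with the default fraction.
function ttlWithJitter(
  baseTtlSeconds: number,
  jitterFraction: number = 0.1,
  random: () => number = Math.random,
): number {
  if (baseTtlSeconds <= 0) throw new Error("base TTL must be positive");
  const maxExtra = baseTtlSeconds * jitterFraction;
  // Whole seconds, since TTLs are usually passed as integers.
  return Math.round(baseTtlSeconds + random() * maxExtra);
}
```

For a one-hour base TTL this yields values between 3600 and 3960 seconds, so keys written in the same second expire spread across a six-minute window instead of all at once.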

Redis Caching: The Industry Standard

Redis is the default choice for application-level caching. It's fast (100,000+ operations/sec on a single core), supports rich data structures, and has built-in TTL support. Here's how to use it effectively:

Key Design Patterns

// Namespace your keys to avoid collisions
user:123:profile
user:123:sessions
product:456:details
product:456:reviews:page:1

// Use hash types for objects (saves memory vs JSON strings)
HSET user:123 name "Alice" email "alice@example.com" role "admin"
HGET user:123 name  // Returns "Alice" without deserializing the whole object

Connection Pooling

Never create a new Redis connection per request. Use a connection pool. A single Redis instance handles 100,000+ ops/sec, but connection overhead can bottleneck you at 1,000 ops/sec if you're reconnecting each time. Most Redis client libraries support pooling -- configure a pool size of 10-50 connections for typical web applications.

CDN Caching: The First Layer Users Hit

CDN caching serves static assets (and sometimes dynamic content) from edge servers close to users. A properly configured CDN reduces origin server load by 80-95% for content-heavy sites.

Cache-Control Headers

// Static assets -- cache aggressively
Cache-Control: public, max-age=31536000, immutable

// API responses -- short cache with revalidation
Cache-Control: public, max-age=60, stale-while-revalidate=300

// Private data -- never cache on CDN
Cache-Control: private, no-store

The stale-while-revalidate directive is powerful: it serves the stale cached version immediately while fetching a fresh copy in the background. Users get instant responses, and the data updates within seconds.
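How a cache treats a response under `max-age=60, stale-while-revalidate=300` can be expressed as a three-way decision on the response's age. The function below is only an illustration of that decision; `classifyResponse` and the `Freshness` labels are made-up names, and real caches apply further rules.

```typescript
type Freshness = "fresh" | "stale-serve-and-revalidate" | "expired";

// ageSeconds: how long ago the cached response was generated or validated.
function classifyResponse(
  ageSeconds: number,
  maxAge: number,
  staleWhileRevalidate: number,
): Freshness {
  // Within max-age: serve from cache, no origin contact.
  if (ageSeconds < maxAge) return "fresh";
  // Within the extra window: serve the stale copy now, refresh in background.
  if (ageSeconds < maxAge + staleWhileRevalidate) return "stale-serve-and-revalidate";
  // Beyond both windows: the cache must go to the origin before responding.
  return "expired";
}
```

With the API example above, a response is fresh for the first 60 seconds, served stale with a background refresh from 60 up to 360 seconds, and only blocks on the origin after that.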

Pricing and Cost Impact of Caching

Caching Layer	Monthly Cost	Latency Reduction	Origin Load Reduction
Redis (AWS ElastiCache r7g.large)	$200/month	50-100ms to <1ms	60-80%
Redis (self-hosted, 16GB)	$50-80/month	50-100ms to <1ms	60-80%
CloudFront CDN (1TB transfer)	$85/month	200-500ms to 10-30ms	80-95%
Cloudflare CDN (Pro plan)	$20/month	200-500ms to 10-30ms	80-95%
Varnish (self-hosted)	$40-60/month	50-200ms to <5ms	70-90%
Application-level (in-memory)	$0 (uses app RAM)	50ms to <0.1ms	40-60%

Note: The real cost savings come from reduced database load. A $200/month Redis instance can replace $2,000/month in database scaling costs. Calculate your cache ROI by measuring database query costs before and after implementing caching.

Common Caching Pitfalls and Solutions

Cache Stampede (Thundering Herd)

When a popular cache key expires, hundreds of concurrent requests simultaneously hit the database to regenerate it. Solutions:

Locking -- the first request acquires a lock and regenerates the cache; others wait or serve stale data
Early expiration -- regenerate the cache before it actually expires using a background process
Probabilistic early recomputation -- each request has a small probability of regenerating the cache before expiry, spreading the load
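The locking option can be sketched as a per-process "single-flight" wrapper: concurrent callers asking for the same key share one in-flight regeneration instead of each hitting the database. `SingleFlight` is an illustrative name, and this version only deduplicates within one process; coordinating across many pods would need a shared lock (for example a Redis key set with NX and an expiry), which is not shown here.

```typescript
class SingleFlight<V> {
  private inFlight = new Map<string, Promise<V>>();

  // Runs `regenerate` at most once per key at a time; concurrent callers
  // for the same key receive the same promise.
  run(key: string, regenerate: () => Promise<V>): Promise<V> {
    const existing = this.inFlight.get(key);
    if (existing) return existing;

    const promise = regenerate().finally(() => {
      // Clear the slot on success or failure so later calls can retry.
      this.inFlight.delete(key);
    });
    this.inFlight.set(key, promise);
    return promise;
  }
}
```

In a cache-aside read path, the miss branch would call `run(key, loadFromDatabaseAndCache)` so that a hot key expiring under load produces one database query per process rather than one per request.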

Cache Penetration

Requests for data that doesn't exist bypass the cache every time (the cache stores nothing for a miss). Attackers can exploit this to overload your database. Solutions: cache null results with a short TTL (60 seconds), or use a Bloom filter to reject requests for keys that definitely don't exist.
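Caching null results might look like the sketch below. It assumes the lookup returns `null` for "does not exist" and stores that answer with its own, shorter TTL. `NullCachingLoader`, `NegativeEntry`, and the injectable clock are illustrative names, with a `Map` standing in for the real cache.

```typescript
type NegativeEntry<V> = { value: V | null; expiresAt: number }; // null = known missing

class NullCachingLoader<V> {
  private entries = new Map<string, NegativeEntry<V>>();
  private lookup: (key: string) => Promise<V | null>;
  private ttlMs: number;
  private negativeTtlMs: number;
  private now: () => number;

  constructor(
    lookup: (key: string) => Promise<V | null>,
    ttlMs: number,
    negativeTtlMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.lookup = lookup;
    this.ttlMs = ttlMs;
    this.negativeTtlMs = negativeTtlMs;
    this.now = now;
  }

  async get(key: string): Promise<V | null> {
    const hit = this.entries.get(key);
    // A cached null is a valid hit: it means "we already checked, it is not there".
    if (hit && hit.expiresAt > this.now()) return hit.value;

    const value = await this.lookup(key);
    const ttl = value === null ? this.negativeTtlMs : this.ttlMs;
    this.entries.set(key, { value, expiresAt: this.now() + ttl });
    return value;
  }
}
```

The short negative TTL bounds how long a newly created record can appear missing. An attacker cycling through random keys can still fill the cache with nulls, which is the case the Bloom filter option addresses.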

Cache Avalanche

Many cache keys expire at the same time, causing a sudden spike in database load. This often happens after a cache restart or when TTLs are set to uniform values. Solution: randomize TTLs and implement gradual cache warming on startup.

Stale Data Bugs

The hardest caching problem. A user updates their profile, but cached responses still show old data. Solutions: invalidate on write (cache-aside), use write-through caching, or accept eventual consistency with short TTLs. Document your consistency guarantees per endpoint so your team knows what to expect.
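Invalidate-on-write is the smallest of these fixes. The sketch below uses two in-memory maps as stand-ins (an assumption for illustration: `database` plays the source of truth and `cache` plays Redis), and the function names are hypothetical.

```typescript
// In-memory stand-ins for the sketch.
const database = new Map<string, string>(); // source of truth
const cache = new Map<string, string>();    // plays the role of Redis

function readDisplayName(userId: string): string | undefined {
  const key = `user:${userId}:name`;
  const cached = cache.get(key);
  if (cached !== undefined) return cached;

  const fresh = database.get(userId);
  if (fresh !== undefined) cache.set(key, fresh);
  return fresh;
}

function updateDisplayName(userId: string, displayName: string): void {
  // 1. Commit to the source of truth first.
  database.set(userId, displayName);
  // 2. Then drop the cached copy; the next read repopulates it.
  cache.delete(`user:${userId}:name`);
}
```

Deleting the key rather than overwriting it keeps the write path simple, but a read that started before the update can still write the old value back afterwards, so a TTL is still needed as a backstop.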

Multi-Layer Caching Architecture

Production systems typically combine multiple caching layers. Here's a recommended setup for a high-traffic web application:

Browser cache -- static assets with long max-age and content hashing for cache busting
CDN edge cache -- static assets and cacheable API responses (public data, product listings)
Application-level cache (Redis) -- session data, computed results, database query results
Database query cache -- avoid MySQL's query cache (deprecated in 5.7, removed in 8.0); PostgreSQL has no query cache, but materialized views fill a similar role for expensive queries
OS page cache -- automatically caches disk reads in RAM, no configuration needed

Watch out: Each caching layer adds invalidation complexity. If you cache at the CDN, application, and database levels, a single data update requires invalidating all three. Start with one layer (usually Redis) and add more only when you have measured evidence that it's needed.

Frequently Asked Questions

When should I use Redis vs Memcached?

Use Redis in almost all cases. Redis supports data structures (hashes, lists, sorted sets, streams), persistence, replication, Lua scripting, and pub/sub. Memcached is simpler, multi-threaded by design, and uses slightly less memory per key, but lacks these features. The main scenario where Memcached wins is a plain key-value workload where you want multi-threaded access and maximum memory efficiency for millions of small values. Redis 6.0+ with io-threads narrows that gap by moving network I/O onto extra threads, although command execution stays single-threaded.

How do I decide what TTL to use?

Base it on how stale your data can be without impacting user experience. Static configuration: 1-24 hours. User profiles: 5-15 minutes. Product listings: 1-5 minutes. Real-time dashboards: 5-30 seconds. Financial data: 0 seconds (don't cache). Start with shorter TTLs and increase them after monitoring hit rates. A 90%+ hit rate with acceptable staleness means your TTL is right.

Should I cache database queries or API responses?

Cache at the highest level possible. Caching an API response avoids the database query, serialization, and business logic processing. Caching a database query only skips the database round-trip. For read-heavy APIs, cache the entire response with a unique key based on the URL and query parameters. For write-heavy systems where different APIs read the same data, cache at the query level so invalidation is simpler.
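A response-level cache key built from the URL and query parameters needs to be normalized, otherwise `?a=1&b=2` and `?b=2&a=1` become two separate entries. A possible helper, assuming the response is public and does not vary by headers or user (`responseCacheKey` and the `resp:` prefix are illustrative choices):

```typescript
function responseCacheKey(method: string, rawUrl: string): string {
  const url = new URL(rawUrl); // the URL parser lowercases the host
  const params = new URLSearchParams(url.search);
  params.sort(); // stable sort by parameter name
  const query = params.toString();
  const suffix = query === "" ? "" : `?${query}`;
  return `resp:${method.toUpperCase()}:${url.host}${url.pathname}${suffix}`;
}
```

For example, `responseCacheKey("get", "https://API.example.com/products?page=2&category=shoes")` produces `resp:GET:api.example.com/products?category=shoes&page=2`, the same key as a request with the parameters in the opposite order. Responses that depend on the user or on request headers would need those folded into the key as well.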

How do I warm a cache after a deploy or restart?

Build a cache warming script that pre-loads your most frequently accessed keys. Monitor your cache hit rates in production to identify the top 1,000-10,000 keys. On deploy, run the warming script before routing traffic to the new instance. If the Redis instance survived the deploy, use SCAN (not KEYS, which blocks the server) to iterate existing keys and refresh them. Keep warming time under 60 seconds -- prioritize hot keys over completeness.
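A time-boxed warming loop along these lines might look as follows. It assumes the hot-key list is already sorted hottest first, and takes the loader and cache writer as callbacks; `warmCache`, `load`, `store`, and `budgetMs` are illustrative names rather than any library's API.

```typescript
// Warms keys in priority order until the time budget runs out.
// Returns how many keys were warmed.
async function warmCache<V>(
  hotKeys: string[], // hottest first
  load: (key: string) => Promise<V>,
  store: (key: string, value: V) => Promise<void>,
  budgetMs: number,
  now: () => number = () => Date.now(),
): Promise<number> {
  const deadline = now() + budgetMs;
  let warmed = 0;
  for (const key of hotKeys) {
    // Stop at the deadline: hot keys matter more than completeness.
    if (now() >= deadline) break;
    await store(key, await load(key));
    warmed += 1;
  }
  return warmed;
}
```

The loop is sequential to keep the sketch short. A real script would usually load in small concurrent batches, sized so that warming itself does not overload the database.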

What's the difference between CDN caching and application caching?

CDN caching happens at edge servers geographically close to users, reducing network latency from 200-500ms to 10-30ms. It's best for static assets and public content. Application caching (Redis/Memcached) happens at your origin servers, reducing database latency from 5-50ms to under 1ms. It's best for personalized data, session state, and computed results. Most production systems use both layers together.

How do I monitor cache effectiveness?

Track four metrics: hit rate (target 90%+), miss rate, eviction rate, and memory usage. In Redis, use INFO stats to get keyspace_hits and keyspace_misses. Set up alerts when hit rate drops below 80% -- it usually means a key pattern changed or your TTL needs adjustment. Also monitor P99 latency for cache operations; if Redis latency spikes, check for large keys or blocking commands like KEYS.
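The hit rate is keyspace_hits / (keyspace_hits + keyspace_misses). A small parser for the text that INFO stats returns (lines of `field:value`, with `#` section headers) could look like this; `parseInfoStats` and `hitRate` are illustrative helper names, and fetching the text from Redis is left to whichever client you use.

```typescript
// Parses "field:value" lines, skipping blank lines and "# Section" headers.
function parseInfoStats(info: string): Map<string, string> {
  const fields = new Map<string, string>();
  for (const line of info.split(/\r?\n/)) {
    if (line === "" || line.startsWith("#")) continue;
    const separator = line.indexOf(":");
    if (separator > 0) {
      fields.set(line.slice(0, separator), line.slice(separator + 1));
    }
  }
  return fields;
}

// Returns the hit rate in [0, 1], or undefined if it cannot be computed.
function hitRate(info: string): number | undefined {
  const fields = parseInfoStats(info);
  const hits = Number(fields.get("keyspace_hits"));
  const misses = Number(fields.get("keyspace_misses"));
  if (!Number.isFinite(hits) || !Number.isFinite(misses)) return undefined;
  const total = hits + misses;
  return total === 0 ? undefined : hits / total;
}
```

These counters are cumulative since the server started (or since the stats were last reset), so for alerting it is more useful to compute the rate from the difference between two samples than from the raw totals.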

Start Simple, Measure, Then Optimize

Don't build a five-layer caching architecture on day one. Start with cache-aside using Redis and sensible TTLs. Measure your hit rate, latency percentiles, and database load reduction. Add a CDN layer when static asset serving becomes a bottleneck. Consider write-through when stale data causes real user complaints.

The best caching strategy is the simplest one that meets your performance requirements. Every layer you add is a layer you have to monitor, debug, and invalidate correctly. Get one layer right before adding the next.
