Caching Strategies Every Developer Should Know (With Examples)
A practical guide to caching techniques including Redis, CDN caching, database caching, and application-level strategies.
Infrastructure engineer with 10+ years building production systems on AWS, GCP,…

The Thundering Herd That Took Out the Primary
On a Black Friday a few years ago, a product-catalog service I helped operate went from a comfortable 6,000 RPS to a frozen 0 RPS in roughly eleven seconds. The architecture on paper was fine: Redis in front of Postgres, a nice 94% hit rate, cache-aside for every product page, TTL set to "one hour" to match merchandising's content schedule. What nobody had thought about was that "one hour" was a literal constant. Every product cached during the traffic ramp at 00:00 UTC expired at 01:00 UTC, simultaneously, on the exact second marketing had scheduled a push notification that drove a 4x spike.
At 01:00:00, 180,000 product keys evaporated from Redis at once. The push notification landed three seconds later. Every application pod cache-missed on every request, dogpiled onto Postgres, and in under ten seconds the database was at max_connections with a two-minute query queue. The first user-facing symptom was checkout failing. The second was the read replica falling behind. The third was the on-call hitting a wall of "too many connections" errors and realising every mitigation they had was already offline, because the admin tools also cached through the same Redis.
The fix was two lines of code: a random jitter on TTL (ttl + rand(0, ttl * 0.1)) and a single-flight lock on cache regeneration. Total patch size: 42 characters. Total revenue lost to the outage: into six figures. That is the caching failure mode they do not teach you -- not "cache misses are slow," but "uniform TTLs plus a traffic spike plus a shared backing store equals an outage that your hit-rate graph will swear was not your fault."
This guide is the pattern catalogue that would have prevented that incident -- and the four or five other classes of caching bug I have since watched take down production systems at 200,000+ RPS. Cache-aside, write-through, write-behind, and read-through are the strategies. Stampede, penetration, avalanche, and stale-data bugs are the failure modes. You need to know both, because every caching decision is a deliberate trade between latency and a specific way the system can break.
The Core Caching Patterns
Cache-Aside (Lazy Loading)
The application checks the cache first. On a miss, it fetches from the database, writes to cache, and returns the result. This is the most common pattern and the one you should default to.
async function getUser(userId: string) {
  // Check cache first
  const cached = await redis.get(`user:${userId}`);
  if (cached) return JSON.parse(cached);

  // Cache miss -- fetch from database
  const user = await db.users.findById(userId);
  if (user) {
    // Populate the cache for the next reader (1 hour TTL)
    await redis.setex(`user:${userId}`, 3600, JSON.stringify(user));
  }
  return user;
}
Pros: Only caches data that's actually requested. Simple to implement. Cache failures don't break reads (just slower).
Cons: First request is always a cache miss. Data can become stale. Cache stampede risk on popular keys.
Write-Through
Every write goes to both the database and the cache synchronously, so a read that hits the cache sees the latest value as long as every writer uses this path and both writes succeed. The trade-off is higher write latency since both stores must acknowledge the write.
async function updateUser(userId: string, data: Partial<User>) {
  // Write to database
  const user = await db.users.update(userId, data);

  // Write to cache (synchronous with the DB write)
  await redis.setex(`user:${userId}`, 3600, JSON.stringify(user));
  return user;
}
Pros: Cache is always consistent with the database. No stale reads.
Cons: Higher write latency. Caches data that may never be read. Write failures require careful handling.
Write-Behind (Write-Back)
Writes go to the cache immediately, and the cache asynchronously flushes to the database. This gives you the lowest write latency but introduces the risk of data loss if the cache fails before flushing.
Pros: Lowest write latency. Batches database writes for efficiency.
Cons: Risk of data loss. Complex failure handling. Not suitable for financial transactions.
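A minimal in-process sketch of the write-behind idea, assuming a hypothetical `flushToDb` callback that persists a batch of writes. `WriteBehindCache` and `FlushFn` are illustrative names, not part of any library; a production version would also need durability for the buffered writes.

```typescript
// Hypothetical sketch: WriteBehindCache and FlushFn are illustrative names.
type FlushFn<V> = (batch: Map<string, V>) => Promise<void>;

class WriteBehindCache<V> {
  private cache = new Map<string, V>();
  private dirty = new Map<string, V>(); // writes not yet persisted
  private flushToDb: FlushFn<V>;

  constructor(flushToDb: FlushFn<V>) {
    this.flushToDb = flushToDb;
  }

  // The write is acknowledged as soon as the cache holds it.
  set(key: string, value: V): void {
    this.cache.set(key, value);
    this.dirty.set(key, value); // repeated writes to one key coalesce
  }

  get(key: string): V | undefined {
    return this.cache.get(key);
  }

  // Call from a timer, or when the dirty set grows past a threshold.
  async flush(): Promise<number> {
    if (this.dirty.size === 0) return 0;
    const batch = this.dirty;
    this.dirty = new Map<string, V>();
    try {
      await this.flushToDb(batch);
      return batch.size;
    } catch (err) {
      // Re-queue failed writes, keeping any newer value written meanwhile.
      batch.forEach((value, key) => {
        if (!this.dirty.has(key)) this.dirty.set(key, value);
      });
      throw err;
    }
  }
}
```

Anything still in `dirty` when the process dies is lost, which is the data-loss risk listed above.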
Read-Through
The cache itself is responsible for loading data on a miss. The application only talks to the cache, never directly to the database. This simplifies application code but requires a cache that supports data loading callbacks.
Pros: Simpler application code. Cache manages its own population.
Cons: Tighter coupling between cache and data store. Harder to debug miss behavior.
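To make the loading-callback idea concrete, here is a small in-memory sketch. `ReadThroughCache` and `Loader` are hypothetical names, and the injected `now` clock is an assumption added for testability; real read-through caches are usually provided by the cache product or a library rather than hand-written.

```typescript
// Hypothetical sketch: the cache owns the loader, callers only call get().
type Loader<V> = (key: string) => Promise<V | undefined>;

class ReadThroughCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private loader: Loader<V>;
  private ttlMs: number;
  private now: () => number;

  constructor(loader: Loader<V>, ttlMs: number, now: () => number = Date.now) {
    this.loader = loader;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(key: string): Promise<V | undefined> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;

    // Miss or expired: the cache, not the caller, talks to the data store.
    const value = await this.loader(key);
    if (value !== undefined) {
      this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    } else {
      this.entries.delete(key);
    }
    return value;
  }
}
```

The application code shrinks to `cache.get("user:123")`, at the cost of hiding the database call inside the cache layer.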
Caching Strategy Comparison
| Strategy | Read Latency | Write Latency | Consistency | Complexity | Data Loss Risk |
|---|---|---|---|---|---|
| Cache-Aside | Fast (on hit) | Normal | Eventual | Low | None |
| Write-Through | Fast | Higher | Strong | Medium | None |
| Write-Behind | Fast | Lowest | Eventual | High | Yes |
| Read-Through | Fast (on hit) | Normal | Eventual | Medium | None |
Eviction Policies: What Gets Removed When the Cache Is Full
LRU (Least Recently Used)
The most widely used eviction policy. Removes the item that hasn't been accessed for the longest time. Redis uses an approximated LRU algorithm: by default it samples 5 keys (configurable via maxmemory-samples) and evicts the least recently used among them. Eviction only kicks in once maxmemory is set and an LRU policy such as allkeys-lru is selected. This is efficient and works well for most workloads where recent access predicts future access.
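For intuition, an exact LRU fits in a few lines by leaning on the insertion order of a JavaScript `Map`. This is a sketch of the general policy, not Redis's sampled approximation, and `LruCache` is a hypothetical name.

```typescript
// Exact LRU built on Map insertion order (oldest entry is iterated first).
class LruCache<K, V> {
  private entries = new Map<K, V>();
  private capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) {
      this.entries.delete(key);
    } else if (this.entries.size >= this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }
}
```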
LFU (Least Frequently Used)
Removes the item with the lowest access frequency. Better than LRU when you have a mix of frequently and rarely accessed items -- a popular item accessed 1,000 times won't be evicted just because it wasn't accessed in the last minute. Redis 4.0+ supports LFU natively.
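A deliberately naive LFU sketch shows the difference from LRU: eviction looks at hit counts, not recency. `LfuCache` is a hypothetical name, eviction here is an O(n) scan, and counters never decay, whereas Redis's LFU is approximate and decays counters over time.

```typescript
// Naive LFU: evicts the entry with the fewest recorded hits.
class LfuCache<V> {
  private entries = new Map<string, { value: V; hits: number }>();
  private capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    entry.hits += 1;
    return entry.value;
  }

  set(key: string, value: V): void {
    const existing = this.entries.get(key);
    if (existing) {
      existing.value = value;
      existing.hits += 1;
      return;
    }
    if (this.entries.size >= this.capacity) this.evictLeastFrequent();
    this.entries.set(key, { value, hits: 1 });
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }

  private evictLeastFrequent(): void {
    let victim: string | null = null;
    let fewest = Infinity;
    // Ties go to the entry inserted earliest (first one found).
    for (const [key, entry] of Array.from(this.entries)) {
      if (entry.hits < fewest) {
        fewest = entry.hits;
        victim = key;
      }
    }
    if (victim !== null) this.entries.delete(victim);
  }
}
```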
TTL (Time-to-Live)
Items expire after a fixed duration regardless of access patterns. The simplest invalidation strategy and often the best starting point. Set the TTL based on how much staleness your data can tolerate: 60 seconds for dashboards, 300 seconds for user profiles, 3600 seconds for configuration data.
Pro tip: Add jitter to your TTLs. If 1,000 cache entries all expire at the same second, you'll get a thundering herd hitting your database simultaneously. Use TTL + random(0, TTL * 0.1) to spread expirations. This single technique prevents more outages than any other caching optimization.
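The jitter formula can be sketched as a small helper. `ttlWithJitter` is a hypothetical name, and the injectable `random` parameter is an assumption added so the behaviour can be tested deterministically.

```typescript
// Returns baseTtlSeconds plus up to jitterFraction of extra lifetime.
function ttlWithJitter(
  baseTtlSeconds: number,
  jitterFraction: number = 0.1,
  random: () => number = Math.random
): number {
  const jitter = Math.floor(random() * baseTtlSeconds * jitterFraction);
  return baseTtlSeconds + jitter;
}

// With a one-hour base TTL, keys now expire somewhere in a six-minute
// window after the hour instead of all on the same second.
const exampleTtl = ttlWithJitter(3600);
```

The result would then be passed wherever a fixed TTL was used before, for example as the TTL argument of a `setex` call.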
Redis Caching: The Industry Standard
Redis is the default choice for application-level caching. It's fast (100,000+ operations/sec on a single core), supports rich data structures, and has built-in TTL support. Here's how to use it effectively:
Key Design Patterns
// Namespace your keys to avoid collisions
user:123:profile
user:123:sessions
product:456:details
product:456:reviews:page:1
// Use hash types for objects (saves memory vs JSON strings)
HSET user:123 name "Alice" email "alice@example.com" role "admin"
HGET user:123 name // Returns "Alice" without deserializing the whole object
Connection Pooling
Never create a new Redis connection per request. Use a connection pool. A single Redis instance handles 100,000+ ops/sec, but connection overhead can bottleneck you at 1,000 ops/sec if you're reconnecting each time. Most Redis client libraries support pooling -- configure a pool size of 10-50 connections for typical web applications.
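To show what a pool does under the hood, here is a generic fixed-size pool sketch. `Pool` is a hypothetical class written for illustration; in practice you would rely on the pooling or multiplexing your Redis client library already provides instead of writing your own.

```typescript
// Generic fixed-size pool: at most `max` connections are ever created.
class Pool<T> {
  private idle: T[] = [];
  private waiters: Array<(conn: T) => void> = [];
  private created = 0;
  private create: () => T;
  private max: number;

  constructor(create: () => T, max: number) {
    this.create = create;
    this.max = max;
  }

  acquire(): Promise<T> {
    const conn = this.idle.pop();
    if (conn !== undefined) return Promise.resolve(conn);
    if (this.created < this.max) {
      this.created += 1;
      return Promise.resolve(this.create());
    }
    // Pool exhausted: wait until another caller releases a connection.
    return new Promise<T>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  release(conn: T): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter(conn);
    else this.idle.push(conn);
  }

  // Borrow a connection for one unit of work and always give it back.
  async use<R>(fn: (conn: T) => Promise<R>): Promise<R> {
    const conn = await this.acquire();
    try {
      return await fn(conn);
    } finally {
      this.release(conn);
    }
  }
}
```

Request handlers would wrap each cache call in `pool.use(...)`, so connection setup cost is paid at most `max` times rather than once per request.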
CDN Caching: The First Layer Users Hit
CDN caching serves static assets (and sometimes dynamic content) from edge servers close to users. A properly configured CDN reduces origin server load by 80-95% for content-heavy sites.
Cache-Control Headers
// Static assets -- cache aggressively
Cache-Control: public, max-age=31536000, immutable
// API responses -- short cache with revalidation
Cache-Control: public, max-age=60, stale-while-revalidate=300
// Private data -- never cache on CDN
Cache-Control: private, no-store
The stale-while-revalidate directive is powerful: it serves the stale cached version immediately while fetching a fresh copy in the background. Users get instant responses, and the data updates within seconds.
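The behaviour of `max-age=60, stale-while-revalidate=300` can be modelled as a simple decision on the age of the cached response. `classifyFreshness` is a hypothetical helper that sketches the logic a cache applies; it is not an API of any CDN.

```typescript
type Freshness = "fresh" | "stale-while-revalidate" | "expired";

// ageSeconds: how long the response has been sitting in the cache.
function classifyFreshness(
  ageSeconds: number,
  maxAge: number,
  staleWhileRevalidate: number
): Freshness {
  // Within max-age: serve from cache, no origin contact.
  if (ageSeconds < maxAge) return "fresh";
  // Past max-age but inside the extra window: serve the stale copy
  // immediately and refresh it in the background.
  if (ageSeconds < maxAge + staleWhileRevalidate) return "stale-while-revalidate";
  // Beyond both windows: the cache must go to the origin first.
  return "expired";
}
```

For the API header above, a response aged 30 seconds is fresh, one aged 200 seconds is served stale while a refresh runs, and one aged 400 seconds waits on the origin.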
Pricing and Cost Impact of Caching
| Caching Layer | Monthly Cost | Latency Reduction | Origin Load Reduction |
|---|---|---|---|
| Redis (AWS ElastiCache r7g.large) | $200/month | 50-100ms to <1ms | 60-80% |
| Redis (self-hosted, 16GB) | $50-80/month | 50-100ms to <1ms | 60-80% |
| CloudFront CDN (1TB transfer) | $85/month | 200-500ms to 10-30ms | 80-95% |
| Cloudflare CDN (Pro plan) | $20/month | 200-500ms to 10-30ms | 80-95% |
| Varnish (self-hosted) | $40-60/month | 50-200ms to <5ms | 70-90% |
| Application-level (in-memory) | $0 (uses app RAM) | 50ms to <0.1ms | 40-60% |
Note: The real cost savings come from reduced database load. A $200/month Redis instance can replace $2,000/month in database scaling costs. Calculate your cache ROI by measuring database query costs before and after implementing caching.
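A quick way to reason about that ROI is to compute how much traffic still reaches the database at a given hit rate. `originRps` is a hypothetical helper; the 6,000 RPS and 94% figures are taken from the incident at the top of this article.

```typescript
// Requests per second that miss the cache and reach the database.
function originRps(totalRps: number, hitRate: number): number {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError("hitRate must be between 0 and 1");
  }
  return totalRps * (1 - hitRate);
}

const steadyState = Math.round(originRps(6000, 0.94)); // about 360 RPS
const cacheEmpty = Math.round(originRps(6000, 0)); // the full 6,000 RPS
```

The gap between those two numbers is the capacity the cache is standing in for, and also the size of the spike the database absorbs if the cache empties all at once.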
Common Caching Pitfalls and Solutions
Cache Stampede (Thundering Herd)
When a popular cache key expires, hundreds of concurrent requests simultaneously hit the database to regenerate it. Solutions:
- Locking -- the first request acquires a lock and regenerates the cache; others wait or serve stale data
- Early expiration -- regenerate the cache before it actually expires using a background process
- Probabilistic early recomputation -- each request has a small probability of regenerating the cache before expiry, spreading the load
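The locking approach from the list above can be sketched in-process as a "single-flight" wrapper: concurrent callers for the same key share one regeneration. `SingleFlight` is a hypothetical class and only deduplicates within one process; coordinating across many application pods would need a shared lock, which this sketch does not attempt.

```typescript
// Concurrent requests for the same key share a single in-flight load.
class SingleFlight<V> {
  private inFlight = new Map<string, Promise<V>>();

  run(key: string, load: () => Promise<V>): Promise<V> {
    const existing = this.inFlight.get(key);
    if (existing) return existing; // join the load already in progress

    const pending = load();
    this.inFlight.set(key, pending);

    // Clear the slot on success or failure so later calls can reload.
    const cleanup = () => {
      this.inFlight.delete(key);
    };
    pending.then(cleanup, cleanup);
    return pending;
  }
}
```

Wrapping the cache-miss path in `run(key, regenerate)` means a popular key that expires triggers one database query per process instead of one per waiting request.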
Cache Penetration
Requests for data that doesn't exist bypass the cache every time (the cache stores nothing for a miss). Attackers can exploit this to overload your database. Solutions: cache null results with a short TTL (60 seconds), or use a Bloom filter to reject requests for keys that definitely don't exist.
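Caching null results can be sketched with an in-memory store that remembers "not found" for a shorter time than real values. `NullCachingStore` and its parameters are hypothetical, and the injected `now` clock is an assumption added for testability.

```typescript
// Remembers misses (null) briefly so repeated lookups skip the database.
class NullCachingStore<V> {
  private entries = new Map<string, { value: V | null; expiresAt: number }>();
  private hitTtlMs: number;
  private missTtlMs: number;
  private now: () => number;

  constructor(hitTtlMs: number, missTtlMs: number, now: () => number = Date.now) {
    this.hitTtlMs = hitTtlMs;
    this.missTtlMs = missTtlMs;
    this.now = now;
  }

  async get(key: string, load: (key: string) => Promise<V | null>): Promise<V | null> {
    const cached = this.entries.get(key);
    // A cached null is a valid answer: "we checked, it does not exist".
    if (cached && cached.expiresAt > this.now()) return cached.value;

    const value = await load(key);
    const ttl = value === null ? this.missTtlMs : this.hitTtlMs;
    this.entries.set(key, { value, expiresAt: this.now() + ttl });
    return value;
  }
}
```

Keeping the miss TTL short (the 60 seconds suggested above) limits how long a newly created record stays invisible.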
Cache Avalanche
Many cache keys expire at the same time, causing a sudden spike in database load. This often happens after a cache restart or when TTLs are set to uniform values. Solution: randomize TTLs and implement gradual cache warming on startup.
Stale Data Bugs
The hardest caching problem. A user updates their profile, but cached responses still show old data. Solutions: invalidate on write (cache-aside), use write-through caching, or accept eventual consistency with short TTLs. Document your consistency guarantees per endpoint so your team knows what to expect.
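Invalidate-on-write for the profile example might look like the sketch below. `CacheClient` and `UserStore` are hypothetical interfaces standing in for a Redis client and a database layer, and the 300-second TTL is an assumed value.

```typescript
interface User {
  id: string;
  name: string;
}

// Hypothetical minimal interfaces for the cache and the database.
interface CacheClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
  del(key: string): Promise<void>;
}

interface UserStore {
  findById(id: string): Promise<User | null>;
  update(id: string, data: Partial<User>): Promise<User>;
}

async function readUser(cache: CacheClient, db: UserStore, userId: string): Promise<User | null> {
  const key = `user:${userId}`;
  const cached = await cache.get(key);
  if (cached !== null) return JSON.parse(cached) as User;

  const user = await db.findById(userId);
  if (user) await cache.set(key, JSON.stringify(user), 300);
  return user;
}

async function updateUserAndInvalidate(
  cache: CacheClient,
  db: UserStore,
  userId: string,
  data: Partial<User>
): Promise<User> {
  const user = await db.update(userId, data); // commit to the database first
  await cache.del(`user:${userId}`); // then drop the now-stale cached copy
  return user;
}
```

A reader racing with the delete can still repopulate the cache with the old value, so a TTL remains a useful backstop even with explicit invalidation.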
Multi-Layer Caching Architecture
Production systems typically combine multiple caching layers. Here's a recommended setup for a high-traffic web application:
- Browser cache -- static assets with long max-age and content hashing for cache busting
- CDN edge cache -- static assets and cacheable API responses (public data, product listings)
- Application-level cache (Redis) -- session data, computed results, database query results
- Database query cache -- not an option on modern MySQL (the query cache was deprecated in 5.7 and removed in 8.0); on PostgreSQL, materialized views serve a similar purpose for expensive queries
- OS page cache -- automatically caches disk reads in RAM, no configuration needed
Watch out: Each caching layer adds invalidation complexity. If you cache at the CDN, application, and database levels, a single data update requires invalidating all three. Start with one layer (usually Redis) and add more only when you have measured evidence that it's needed.
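A layered lookup can be sketched as "try each layer from fastest to slowest, then backfill the ones that missed". `Layer` and `layeredGet` are hypothetical names, and this sketch covers reads only; it leaves out the invalidation that the warning above is about.

```typescript
// One cache layer, e.g. an in-process map or a Redis-backed store.
type Layer<V> = {
  name: string;
  get(key: string): Promise<V | undefined>;
  set(key: string, value: V): Promise<void>;
};

// layers must be ordered fastest first.
async function layeredGet<V>(
  key: string,
  layers: Layer<V>[],
  loadFromOrigin: (key: string) => Promise<V | undefined>
): Promise<{ value: V | undefined; servedBy: string }> {
  const missed: Layer<V>[] = [];
  for (const layer of layers) {
    const value = await layer.get(key);
    if (value !== undefined) {
      // Backfill the faster layers that missed on the way down.
      for (const faster of missed) await faster.set(key, value);
      return { value, servedBy: layer.name };
    }
    missed.push(layer);
  }

  const value = await loadFromOrigin(key);
  if (value !== undefined) {
    for (const layer of layers) await layer.set(key, value);
  }
  return { value, servedBy: "origin" };
}
```

Returning `servedBy` makes it easy to record a per-layer hit rate, which is the measured evidence you need before adding another layer.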
Frequently Asked Questions
When should I use Redis vs Memcached?
Use Redis in almost all cases. Redis supports data structures (hashes, lists, sorted sets, streams), persistence, replication, Lua scripting, and pub/sub. Memcached is simpler and uses slightly less memory per key, but lacks these features. The only scenario where Memcached wins is when you need multi-threaded access to a simple key-value store and want to maximize memory efficiency for millions of small values. Threaded I/O (the io-threads setting, available since Redis 6.0) narrows even that gap, although command execution itself remains single-threaded.
How do I decide what TTL to use?
Base it on how stale your data can be without impacting user experience. Static configuration: 1-24 hours. User profiles: 5-15 minutes. Product listings: 1-5 minutes. Real-time dashboards: 5-30 seconds. Financial data: 0 seconds (don't cache). Start with shorter TTLs and increase them after monitoring hit rates. A 90%+ hit rate with acceptable staleness means your TTL is right.
Should I cache database queries or API responses?
Cache at the highest level possible. Caching an API response avoids the database query, serialization, and business logic processing. Caching a database query only skips the database round-trip. For read-heavy APIs, cache the entire response with a unique key based on the URL and query parameters. For write-heavy systems where different APIs read the same data, cache at the query level so invalidation is simpler.
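When keying a response cache on URL and query parameters, it helps to normalize the parameter order so that equivalent URLs share one entry. `responseCacheKey` is a hypothetical helper, and the `http://cache.invalid` base is only a placeholder so relative paths can be parsed.

```typescript
// Builds a cache key where ?a=1&b=2 and ?b=2&a=1 map to the same entry.
function responseCacheKey(method: string, rawUrl: string): string {
  const url = new URL(rawUrl, "http://cache.invalid");
  url.searchParams.sort(); // stable sort by parameter name
  const query = url.searchParams.toString();
  return `${method.toUpperCase()}:${url.pathname}${query ? "?" + query : ""}`;
}

const keyA = responseCacheKey("get", "/products?page=2&category=shoes");
const keyB = responseCacheKey("GET", "/products?category=shoes&page=2");
// Both are "GET:/products?category=shoes&page=2"
```

Anything that changes the response for the same URL, such as the authenticated user or a locale header, would also need to be part of the key.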
How do I warm a cache after a deploy or restart?
Build a cache warming script that pre-loads your most frequently accessed keys. Monitor your cache hit rates in production to identify the top 1,000-10,000 keys. On deploy, run the warming script before routing traffic to the new instance. For Redis, use SCAN to iterate existing keys and refresh them. Keep warming time under 60 seconds -- prioritize hot keys over completeness.
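A warming script along these lines might load hot keys in bounded batches so the warm-up itself does not stampede the database. `warmCache` and `loadAndCache` are hypothetical names, and the default batch size of 50 is an assumption to tune against your own database.

```typescript
// Warms keys batch by batch; hotKeys should be ordered hottest first.
async function warmCache(
  hotKeys: string[],
  loadAndCache: (key: string) => Promise<void>,
  batchSize: number = 50
): Promise<{ warmed: number; failed: number }> {
  let warmed = 0;
  let failed = 0;
  for (let start = 0; start < hotKeys.length; start += batchSize) {
    const batch = hotKeys.slice(start, start + batchSize);
    // At most batchSize loads run concurrently; a failed key is counted
    // and skipped instead of aborting the whole warm-up.
    const results = await Promise.all(
      batch.map((key) => loadAndCache(key).then(() => true, () => false))
    );
    for (const ok of results) {
      if (ok) warmed += 1;
      else failed += 1;
    }
  }
  return { warmed, failed };
}
```

Because keys are processed in order, cutting the run short at a time budget still leaves the hottest keys warmed.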
What's the difference between CDN caching and application caching?
CDN caching happens at edge servers geographically close to users, reducing network latency from 200-500ms to 10-30ms. It's best for static assets and public content. Application caching (Redis/Memcached) happens at your origin servers, reducing database latency from 5-50ms to under 1ms. It's best for personalized data, session state, and computed results. Most production systems use both layers together.
How do I monitor cache effectiveness?
Track four metrics: hit rate (target 90%+), miss rate, eviction rate, and memory usage. In Redis, use INFO stats to get keyspace_hits and keyspace_misses. Set up alerts when hit rate drops below 80% -- it usually means a key pattern changed or your TTL needs adjustment. Also monitor P99 latency for cache operations; if Redis latency spikes, check for large keys or blocking commands like KEYS.
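Hit rate can be derived from the two counters mentioned above. The sketch below parses the text returned by `INFO stats`; `hitRateFromInfo` is a hypothetical helper, and fetching the text from Redis is left to whatever client you use.

```typescript
// Parses "field:value" lines from INFO stats and returns hits / (hits + misses).
function hitRateFromInfo(infoStats: string): number | undefined {
  const fields = new Map<string, number>();
  for (const line of infoStats.split(/\r?\n/)) {
    const separator = line.indexOf(":");
    // Skip blank lines and section headers such as "# Stats".
    if (separator <= 0 || line.charAt(0) === "#") continue;
    fields.set(line.slice(0, separator), Number(line.slice(separator + 1)));
  }

  const hits = fields.get("keyspace_hits");
  const misses = fields.get("keyspace_misses");
  if (hits === undefined || misses === undefined || hits + misses === 0) {
    return undefined; // counters missing, or no lookups recorded yet
  }
  return hits / (hits + misses);
}
```

These counters are cumulative since the server started (or since the stats were last reset), so alerting on the change between two samples reflects current behaviour better than the lifetime ratio.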
Start Simple, Measure, Then Optimize
Don't build a five-layer caching architecture on day one. Start with cache-aside using Redis and sensible TTLs. Measure your hit rate, latency percentiles, and database load reduction. Add a CDN layer when static asset serving becomes a bottleneck. Consider write-through when stale data causes real user complaints.
The best caching strategy is the simplest one that meets your performance requirements. Every layer you add is a layer you have to monitor, debug, and invalidate correctly. Get one layer right before adding the next.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.