Caching Strategies Every Developer Should Know (With Examples)
A practical guide to caching techniques including Redis, CDN caching, database caching, and application-level strategies.

The Fastest Code Is Code That Never Runs
Every millisecond of latency costs you users. Amazon found that 100ms of added latency reduced sales by 1%. Google discovered that a 500ms delay in search results dropped traffic by 20%. Caching is how you eliminate that latency -- but most developers implement it wrong, ending up with stale data bugs, cache stampedes, and systems that are harder to debug than they were slow.
I've built caching layers for systems handling 200,000+ requests per second. The patterns are well-established, but the devil is in the implementation details. This guide covers every caching strategy you'll encounter in production, with concrete examples and the failure modes nobody warns you about.
What Is Caching?
Definition: Caching is the practice of storing copies of frequently accessed data in a faster storage layer so that future requests for that data can be served more quickly. Caches trade storage space and data freshness for reduced latency and lower load on origin data stores. Every cache must define an eviction policy and an invalidation strategy.
The caching hierarchy in a typical web application looks like this: browser cache, CDN edge cache, API gateway cache, application-level cache (Redis/Memcached), database query cache, and OS page cache. Each layer has different latency characteristics, invalidation complexity, and failure modes.
The Core Caching Patterns
Cache-Aside (Lazy Loading)
The application checks the cache first. On a miss, it fetches from the database, writes to cache, and returns the result. This is the most common pattern and the one you should default to.
```typescript
async function getUser(userId: string) {
  // Check cache first
  const cached = await redis.get(`user:${userId}`);
  if (cached) return JSON.parse(cached);

  // Cache miss -- fetch from database
  const user = await db.users.findById(userId);
  if (user) {
    await redis.setex(`user:${userId}`, 3600, JSON.stringify(user));
  }
  return user;
}
```
Pros: Only caches data that's actually requested. Simple to implement. Cache failures don't break reads (just slower).
Cons: First request is always a cache miss. Data can become stale. Cache stampede risk on popular keys.
Write-Through
Every write goes to both the cache and the database synchronously. The cache is always up-to-date, eliminating stale reads. The trade-off is higher write latency since both stores must acknowledge the write.
```typescript
async function updateUser(userId: string, data: Partial<User>) {
  // Write to the database first
  const user = await db.users.update(userId, data);
  // Then write to the cache (synchronous with the request, not fire-and-forget)
  await redis.setex(`user:${userId}`, 3600, JSON.stringify(user));
  return user;
}
```
Pros: Cache is always consistent with the database. No stale reads.
Cons: Higher write latency. Caches data that may never be read. Write failures require careful handling.
Write-Behind (Write-Back)
Writes go to the cache immediately, and the cache asynchronously flushes to the database. This gives you the lowest write latency but introduces the risk of data loss if the cache fails before flushing.
Pros: Lowest write latency. Batches database writes for efficiency.
Cons: Risk of data loss. Complex failure handling. Not suitable for financial transactions.
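Write-behind is easiest to see with an in-memory sketch: a `Map` stands in for both the cache and the database, and `flushDirty` plays the role of the background flusher. All names here are illustrative, not a real library API.

```typescript
// Write-behind sketch: writes land in the cache plus a dirty-key set,
// and a periodic flush persists dirty entries to the (mock) database.
const cache = new Map<string, string>();
const db = new Map<string, string>();
const dirty = new Set<string>();

function writeBehind(key: string, value: string): void {
  cache.set(key, value); // fast path: cache only
  dirty.add(key);        // remember that the DB copy is now stale
}

function flushDirty(): void {
  // In production this runs on a timer and batches writes; a crash
  // before it runs loses the dirty entries -- that's the data-loss risk.
  for (const key of dirty) {
    const value = cache.get(key);
    if (value !== undefined) db.set(key, value);
  }
  dirty.clear();
}
```

Between a `writeBehind` call and the next flush, the database is behind the cache by design; that window is what makes this pattern unsuitable for financial data.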
Read-Through
The cache itself is responsible for loading data on a miss. The application only talks to the cache, never directly to the database. This simplifies application code but requires a cache that supports data loading callbacks.
Pros: Simpler application code. Cache manages its own population.
Cons: Tighter coupling between cache and data store. Harder to debug miss behavior.
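A read-through cache can be sketched as a small class that owns its loader callback, so callers never talk to the backing store directly. The `Loader` type and class below are illustrative, not a real cache library's API:

```typescript
// Read-through sketch: the cache itself loads data on a miss via a
// caller-supplied loader, so application code only ever calls get().
type Loader<V> = (key: string) => V | undefined;

class ReadThroughCache<V> {
  private store = new Map<string, V>();
  constructor(private loader: Loader<V>) {}

  get(key: string): V | undefined {
    const hit = this.store.get(key);
    if (hit !== undefined) return hit;
    // Miss: the cache, not the caller, populates the entry.
    const loaded = this.loader(key);
    if (loaded !== undefined) this.store.set(key, loaded);
    return loaded;
  }
}
```

The coupling mentioned above is visible here: debugging a slow `get()` means debugging the loader, which lives inside the cache rather than in your request handler.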
Caching Strategy Comparison
| Strategy | Read Latency | Write Latency | Consistency | Complexity | Data Loss Risk |
|---|---|---|---|---|---|
| Cache-Aside | Fast (on hit) | Normal | Eventual | Low | None |
| Write-Through | Fast | Higher | Strong | Medium | None |
| Write-Behind | Fast | Lowest | Eventual | High | Yes |
| Read-Through | Fast (on hit) | Normal | Eventual | Medium | None |
Eviction Policies: What Gets Removed When the Cache Is Full
LRU (Least Recently Used)
The most widely used eviction policy. Removes the item that hasn't been accessed for the longest time. Redis uses an approximated LRU algorithm that samples 5 keys and evicts the least recently used among them. This is efficient and works well for most workloads where recent access predicts future access.
LFU (Least Frequently Used)
Removes the item with the lowest access frequency. Better than LRU when you have a mix of frequently and rarely accessed items -- a popular item accessed 1,000 times won't be evicted just because it wasn't accessed in the last minute. Redis 4.0+ supports LFU natively.
TTL (Time-to-Live)
Items expire after a fixed duration regardless of access patterns. The simplest invalidation strategy and often the best starting point. Set the TTL based on how much staleness your data can tolerate: 60 seconds for dashboards, 300 seconds for user profiles, 3600 seconds for configuration data.
Pro tip: Add jitter to your TTLs. If 1,000 cache entries all expire at the same second, you'll get a thundering herd hitting your database simultaneously. Use `TTL + random(0, TTL * 0.1)` to spread expirations. This single technique prevents more outages than any other caching optimization.
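A minimal sketch of jittered TTLs (the function name and the 10% factor are just example choices):

```typescript
// Spread expirations by adding up to 10% random slack to the base TTL,
// so entries created together don't all expire in the same second.
function jitteredTtl(baseTtlSeconds: number): number {
  return baseTtlSeconds + Math.floor(Math.random() * baseTtlSeconds * 0.1);
}
```

For a 3600-second base TTL this yields values in the 3600-3959 range, spreading one spike of expirations across roughly six minutes.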
Redis Caching: The Industry Standard
Redis is the default choice for application-level caching. It's fast (100,000+ operations/sec on a single core), supports rich data structures, and has built-in TTL support. Here's how to use it effectively:
Key Design Patterns
```
// Namespace your keys to avoid collisions
user:123:profile
user:123:sessions
product:456:details
product:456:reviews:page:1

// Use hash types for objects (saves memory vs JSON strings)
HSET user:123 name "Alice" email "alice@example.com" role "admin"
HGET user:123 name   // Returns "Alice" without deserializing the whole object
```
Connection Pooling
Never create a new Redis connection per request. Use a connection pool. A single Redis instance handles 100,000+ ops/sec, but connection overhead can bottleneck you at 1,000 ops/sec if you're reconnecting each time. Most Redis client libraries support pooling -- configure a pool size of 10-50 connections for typical web applications.
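A toy pool illustrates the idea; `FakeConnection` stands in for a real Redis client, and in production you should use your client library's built-in pooling rather than hand-rolling this:

```typescript
// Minimal connection-pool sketch: connections are created once, up front,
// then checked out and returned -- never created per request.
class FakeConnection {
  send(command: string): string { return `OK:${command}`; }
}

class Pool {
  private idle: FakeConnection[] = [];
  constructor(size: number) {
    for (let i = 0; i < size; i++) this.idle.push(new FakeConnection());
  }
  acquire(): FakeConnection {
    const conn = this.idle.pop();
    if (!conn) throw new Error('pool exhausted'); // real pools queue or block here
    return conn;
  }
  release(conn: FakeConnection): void {
    this.idle.push(conn);
  }
}
```

A real pool also needs health checks and a wait queue for the exhausted case; throwing immediately keeps the sketch short.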
CDN Caching: The First Layer Users Hit
CDN caching serves static assets (and sometimes dynamic content) from edge servers close to users. A properly configured CDN reduces origin server load by 80-95% for content-heavy sites.
Cache-Control Headers
```
// Static assets -- cache aggressively
Cache-Control: public, max-age=31536000, immutable

// API responses -- short cache with revalidation
Cache-Control: public, max-age=60, stale-while-revalidate=300

// Private data -- never cache on CDN
Cache-Control: private, no-store
```
The stale-while-revalidate directive is powerful: it serves the stale cached version immediately while fetching a fresh copy in the background. Users get instant responses, and the data updates within seconds.
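One way to keep these policies consistent across an app is to centralize them in a small helper; the three categories and their values mirror the examples above and are illustrative choices, not a standard:

```typescript
// Map a content category to the Cache-Control policy it should get.
// The directive strings are the three policies discussed above.
function cacheControlFor(kind: 'static' | 'api' | 'private'): string {
  switch (kind) {
    case 'static':  return 'public, max-age=31536000, immutable';
    case 'api':     return 'public, max-age=60, stale-while-revalidate=300';
    case 'private': return 'private, no-store';
  }
}
```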
Pricing and Cost Impact of Caching
| Caching Layer | Monthly Cost | Latency Reduction | Origin Load Reduction |
|---|---|---|---|
| Redis (AWS ElastiCache r7g.large) | $200/month | 50-100ms to <1ms | 60-80% |
| Redis (self-hosted, 16GB) | $50-80/month | 50-100ms to <1ms | 60-80% |
| CloudFront CDN (1TB transfer) | $85/month | 200-500ms to 10-30ms | 80-95% |
| Cloudflare CDN (Pro plan) | $20/month | 200-500ms to 10-30ms | 80-95% |
| Varnish (self-hosted) | $40-60/month | 50-200ms to <5ms | 70-90% |
| Application-level (in-memory) | $0 (uses app RAM) | 50ms to <0.1ms | 40-60% |
Note: The real cost savings come from reduced database load. A $200/month Redis instance can replace $2,000/month in database scaling costs. Calculate your cache ROI by measuring database query costs before and after implementing caching.
Common Caching Pitfalls and Solutions
Cache Stampede (Thundering Herd)
When a popular cache key expires, hundreds of concurrent requests simultaneously hit the database to regenerate it. Solutions:
- Locking -- the first request acquires a lock and regenerates the cache; others wait or serve stale data
- Early expiration -- regenerate the cache before it actually expires using a background process
- Probabilistic early recomputation -- each request has a small probability of regenerating the cache before expiry, spreading the load
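Within a single process, the locking approach reduces to "single-flight": concurrent misses for the same key share one regeneration promise instead of each hitting the database. Across multiple servers you would need a distributed lock instead; the names below are illustrative:

```typescript
// Single-flight sketch: only the first miss for a key runs compute();
// everyone else awaits the same in-flight promise.
const inFlight = new Map<string, Promise<string>>();

async function getOrCompute(
  key: string,
  compute: () => Promise<string>
): Promise<string> {
  const pending = inFlight.get(key);
  if (pending) return pending; // someone is already regenerating this key

  const p = compute().finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}
```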
Cache Penetration
Requests for data that doesn't exist bypass the cache every time (the cache stores nothing for a miss). Attackers can exploit this to overload your database. Solutions: cache null results with a short TTL (60 seconds), or use a Bloom filter to reject requests for keys that definitely don't exist.
Cache Avalanche
Many cache keys expire at the same time, causing a sudden spike in database load. This often happens after a cache restart or when TTLs are set to uniform values. Solution: randomize TTLs and implement gradual cache warming on startup.
Stale Data Bugs
The hardest caching problem. A user updates their profile, but cached responses still show old data. Solutions: invalidate on write (cache-aside), use write-through caching, or accept eventual consistency with short TTLs. Document your consistency guarantees per endpoint so your team knows what to expect.
Multi-Layer Caching Architecture
Production systems typically combine multiple caching layers. Here's a recommended setup for a high-traffic web application:
- Browser cache -- static assets with long max-age and content hashing for cache busting
- CDN edge cache -- static assets and cacheable API responses (public data, product listings)
- Application-level cache (Redis) -- session data, computed results, database query results
- Database query cache -- avoid for MySQL (deprecated in 5.7 and removed in 8.0), useful for PostgreSQL materialized views
- OS page cache -- automatically caches disk reads in RAM, no configuration needed
Watch out: Each caching layer adds invalidation complexity. If you cache at the CDN, application, and database levels, a single data update requires invalidating all three. Start with one layer (usually Redis) and add more only when you have measured evidence that it's needed.
Frequently Asked Questions
When should I use Redis vs Memcached?
Use Redis in almost all cases. Redis supports data structures (hashes, lists, sorted sets, streams), persistence, replication, Lua scripting, and pub/sub. Memcached is simpler and uses slightly less memory per key, but lacks these features. The only scenario where Memcached wins is when you need multi-threaded access to a simple key-value store and want to maximize memory efficiency for millions of small values -- and Redis 6.0+ with io-threads closes even that gap.
How do I decide what TTL to use?
Base it on how stale your data can be without impacting user experience. Static configuration: 1-24 hours. User profiles: 5-15 minutes. Product listings: 1-5 minutes. Real-time dashboards: 5-30 seconds. Financial data: 0 seconds (don't cache). Start with shorter TTLs and increase them after monitoring hit rates. A 90%+ hit rate with acceptable staleness means your TTL is right.
Should I cache database queries or API responses?
Cache at the highest level possible. Caching an API response avoids the database query, serialization, and business logic processing. Caching a database query only skips the database round-trip. For read-heavy APIs, cache the entire response with a unique key based on the URL and query parameters. For write-heavy systems where different APIs read the same data, cache at the query level so invalidation is simpler.
How do I warm a cache after a deploy or restart?
Build a cache warming script that pre-loads your most frequently accessed keys. Monitor your cache hit rates in production to identify the top 1,000-10,000 keys. On deploy, run the warming script before routing traffic to the new instance. For Redis, use SCAN to iterate existing keys and refresh them. Keep warming time under 60 seconds -- prioritize hot keys over completeness.
What's the difference between CDN caching and application caching?
CDN caching happens at edge servers geographically close to users, reducing network latency from 200-500ms to 10-30ms. It's best for static assets and public content. Application caching (Redis/Memcached) happens at your origin servers, reducing database latency from 5-50ms to under 1ms. It's best for personalized data, session state, and computed results. Most production systems use both layers together.
How do I monitor cache effectiveness?
Track four metrics: hit rate (target 90%+), miss rate, eviction rate, and memory usage. In Redis, use INFO stats to get keyspace_hits and keyspace_misses. Set up alerts when hit rate drops below 80% -- it usually means a key pattern changed or your TTL needs adjustment. Also monitor P99 latency for cache operations; if Redis latency spikes, check for large keys or blocking commands like KEYS.
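Computing the hit rate from those counters is a few lines of parsing. This sketch assumes you have already fetched the raw `INFO stats` text; `keyspace_hits` and `keyspace_misses` are real Redis counters, while the parsing itself is illustrative:

```typescript
// Parse keyspace_hits / keyspace_misses out of a raw "INFO stats" response
// and return the hit rate as a fraction between 0 and 1.
function hitRate(infoStats: string): number {
  const get = (field: string): number => {
    const m = infoStats.match(new RegExp(`${field}:(\\d+)`));
    return m ? parseInt(m[1], 10) : 0;
  };
  const hits = get('keyspace_hits');
  const misses = get('keyspace_misses');
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}
```

Feed the result into your alerting: below 0.8 is the threshold suggested above for investigating key-pattern or TTL changes.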
Start Simple, Measure, Then Optimize
Don't build a five-layer caching architecture on day one. Start with cache-aside using Redis and sensible TTLs. Measure your hit rate, latency percentiles, and database load reduction. Add a CDN layer when static asset serving becomes a bottleneck. Consider write-through when stale data causes real user complaints.
The best caching strategy is the simplest one that meets your performance requirements. Every layer you add is a layer you have to monitor, debug, and invalidate correctly. Get one layer right before adding the next.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.