Caching Strategies for Web Applications

A cache trades storage for speed. Getting that trade right requires understanding the patterns — cache-aside, write-through, write-behind — and knowing which cache invalidation approach keeps data consistent without making the cache pointless.

Published June 22, 2026

There is a saying attributed, somewhat apocryphally, to Phil Karlton: "There are only two hard things in computer science: cache invalidation and naming things." Naming things is hard because language is imprecise. Cache invalidation is hard because you're managing two sources of truth, and keeping them synchronized under concurrent reads and writes is a genuinely difficult distributed systems problem. Understanding the standard patterns helps you choose the right trade-offs rather than discovering them through production incidents.

Why caches exist: the cost gap

A Redis GET from an application server on the same network typically takes under a millisecond. A PostgreSQL query that touches multiple indexed tables might take 5–50 ms, and one that requires a sequential scan or joins unindexed columns can take seconds. An HTTP request to an external API might add 200–500 ms of latency plus rate limits.

These gaps mean that the value of a cache depends almost entirely on your hit rate. If 95% of reads are satisfied by the cache and only 5% fall through to the database, you've cut database read load by a factor of 20. If your access pattern is so varied that every request is a cache miss, you've added a network hop and the latency of a failed cache lookup without gaining anything.
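
The arithmetic behind that claim can be sketched in a few lines. The function names and the 1 ms / 20 ms figures below are illustrative assumptions chosen from the ranges above, not measurements.

```python
def expected_read_latency_ms(hit_rate, cache_ms, db_ms):
    """Average read latency when a miss pays for the cache lookup AND the database query."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return hit_rate * cache_ms + miss_rate * (cache_ms + db_ms)


def db_read_load_reduction(hit_rate):
    """Factor by which database reads shrink; infinite at a 100% hit rate."""
    miss_rate = 1.0 - hit_rate
    return float("inf") if miss_rate == 0 else 1.0 / miss_rate


# Assumed figures: 1 ms cache lookup, 20 ms database query.
print(expected_read_latency_ms(0.95, 1, 20))  # roughly 2 ms on average
print(expected_read_latency_ms(0.0, 1, 20))   # 21.0: slower than having no cache
```

At a 0% hit rate the cache makes every read slower than going straight to the database, which is why the workload check comes before the cache.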

The first thing to verify before adding a cache is whether your workload has the right shape: high read-to-write ratio, repeated access to the same keys, and data that can tolerate being slightly stale. Systems with mostly writes, perfectly uniform read distribution, or requirements for strict consistency gain little from caching.

Cache-aside (lazy loading)

Cache-aside is the most common pattern. The application manages the cache explicitly: read from the cache first; on a miss, read from the database and populate the cache; on writes, update the database and invalidate or update the cache entry.

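import json

# `redis` is a connected Redis client and `db` a database handle, both created elsewhere.

def get_user(user_id):
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = redis.get(key)
    if cached is not None:   # compare with None so falsy cached values still count as hits
        return json.loads(cached)

    # Miss: load from the database, then populate the cache for later reads.
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    redis.setex(key, 300, json.dumps(user))   # TTL: 5 minutes
    return user

def update_user(user_id, data):
    """Cache-aside write: update the database, then drop the cached copy."""
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
    redis.delete(f"user:{user_id}")   # invalidate after write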

The advantages of cache-aside are flexibility and resilience: the application only caches what it actually reads, and if the cache goes down, requests fall through to the database (slowly, but correctly). The disadvantage is that the first request for a key, and the first one after every expiry or invalidation, always pays the full database cost: the "cold start" problem. Under high concurrency, many simultaneous misses for the same key can cause a thundering herd: dozens of threads query the database for the same row at once, and all of them try to write the result to the cache.

Mitigation: use a short-lived distributed lock (a Redis SET with the NX and EX options, which acquires the lock and sets its expiry in one atomic command; plain SETNX cannot set an expiry) so only one request refreshes the cache while others wait for the result, or use probabilistic early expiration to refresh keys slightly before they expire rather than letting them expire cold.
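
The lock-based mitigation might look like the sketch below. It assumes a client object with redis-py style get, set(name, value, nx=..., ex=...), and delete methods; the function name get_with_lock, the lock: key prefix, and the timing parameters are hypothetical choices for illustration.

```python
import json
import time
import uuid


def get_with_lock(client, key, load, ttl=300, lock_ttl=10, wait=0.05, max_wait=2.0):
    """Cache-aside read in which only one caller rebuilds a missing key."""
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)

    lock_key = f"lock:{key}"
    token = uuid.uuid4().hex  # identifies this caller as the lock owner
    # SET with nx=True and ex=... takes the lock and sets its expiry atomically.
    if client.set(lock_key, token, nx=True, ex=lock_ttl):
        try:
            value = load()
            client.set(key, json.dumps(value), ex=ttl)
            return value
        finally:
            # Simplification: a production version would verify the token
            # atomically before deleting, so it never removes someone else's lock.
            client.delete(lock_key)

    # Another caller holds the lock: poll the cache until it is filled.
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        time.sleep(wait)
        cached = client.get(key)
        if cached is not None:
            return json.loads(cached)
    # The lock holder was too slow or died: fall back to the source.
    return load()
```

The lock expiry (lock_ttl) matters as much as the lock itself: without it, a refresher that crashes mid-load would block every other reader until someone cleaned up by hand.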

Read-through caching

Read-through is similar to cache-aside, but the cache layer itself is responsible for fetching from the database on a miss, rather than the application code. Spring's @Cacheable annotation gives callers the same experience: you annotate a method, and the framework intercepts calls to check the cache before invoking the method body.

@Service
public class UserService {
    @Cacheable(value = "users", key = "#userId")
    public User getUser(Long userId) {
        // Only called on a cache miss; the framework handles caching
        return userRepository.findById(userId).orElseThrow();
    }

    @CacheEvict(value = "users", key = "#user.id")
    public User updateUser(User user) {
        return userRepository.save(user);
    }
}

Read-through simplifies application code but requires the cache infrastructure to understand your data source, either via a configurable loader or a framework-managed integration. It does not solve the thundering herd problem inherently; that requires separate coalescing logic.
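
The "configurable loader" variant can be sketched without any framework. The ReadThroughCache class below is a hypothetical in-process illustration, not the API of a particular library: the caller only ever asks the cache, and the cache calls the loader on a miss.

```python
import time


class ReadThroughCache:
    """In-process cache that fetches from its loader on a miss."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader      # callable: key -> value, e.g. a database lookup
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]        # fresh hit
        # Miss or expired: the cache, not the caller, goes to the source.
        value = self._loader(key)
        self._entries[key] = (self._clock() + self._ttl, value)
        return value

    def evict(self, key):
        """Drop a key after the underlying record changes."""
        self._entries.pop(key, None)


users = ReadThroughCache(loader=lambda user_id: {"id": user_id})
print(users.get(42))  # loaded through the cache on first access
```

As in the prose above, nothing here coalesces concurrent misses: two threads calling get for the same cold key would both invoke the loader.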

Write-through caching

In a write-through cache, every write goes to the cache and the database synchronously before the write operation completes. The cache is always in sync with the database for any key that has been written through.

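The benefit is that reads always hit warm data for recently written keys. The cost is that every write pays for two stores (cache + database) instead of one: if the two writes run one after the other, write latency is the sum of both. For write-heavy workloads, that overhead is paid on every mutation, whether or not the value is ever read back from the cache.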

Write-through is well-suited to systems where the read pattern is highly predictable and the data is frequently re-read after writes — user profile updates that are immediately displayed back to the user, for example. It's poorly suited to bulk import jobs or event-logging pipelines where data is written once and rarely read from cache.
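
A minimal sketch of the write path, using plain dictionaries as stand-ins for the database and the cache. The WriteThroughStore name is hypothetical, and writing the database before the cache is one possible ordering (chosen here so that a failed database write never leaves a value only in the cache), not something the pattern mandates.

```python
class WriteThroughStore:
    """Writes go to the database and the cache before put() returns."""

    def __init__(self, db, cache):
        self._db = db        # any mapping-like store; a dict stands in here
        self._cache = cache

    def put(self, key, value):
        # Database first: if this raises, the cache is left untouched.
        self._db[key] = value
        self._cache[key] = value

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        # Keys written before the cache existed still need a fallback read.
        value = self._db[key]
        self._cache[key] = value
        return value


store = WriteThroughStore(db={}, cache={})
store.put("user:1", {"name": "Ada"})
print(store.get("user:1"))  # served from the cache, already warm
```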

Write-behind (write-back) caching

Write-behind stores data in the cache immediately and writes to the database asynchronously — the cache acknowledges the write to the application before the database write completes. This makes writes very fast (a single in-memory store) at the expense of a durability window: if the cache node fails between the write and the database flush, the write is lost.

Write-behind is most appropriate where occasional data loss is acceptable and throughput is paramount: analytics event counters, view counts, user activity streams. It is dangerous for financial transactions or any write that the application considers durable once acknowledged. If you use write-behind, the cache system must persist a write-behind log (Redis AOF, for example) to bound the window of potential loss.
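
For the counter use case, write-behind can be sketched as an in-memory buffer that is flushed in batches. WriteBehindCounter and the flush_to_db callable are hypothetical names; a real deployment would call flush() from a timer or background worker, which is omitted here.

```python
import threading


class WriteBehindCounter:
    """Buffers increments in memory and writes them to the database in batches."""

    def __init__(self, flush_to_db):
        self._flush_to_db = flush_to_db   # callable taking {key: delta}
        self._pending = {}
        self._lock = threading.Lock()

    def incr(self, key, amount=1):
        # Acknowledged immediately: nothing has reached the database yet.
        with self._lock:
            self._pending[key] = self._pending.get(key, 0) + amount

    def flush(self):
        """Write buffered deltas to the database; returns the number of keys flushed."""
        with self._lock:
            batch, self._pending = self._pending, {}
        if batch:
            # If the process dies before this call completes, the batch is lost.
            self._flush_to_db(batch)
        return len(batch)


written = []
views = WriteBehindCounter(flush_to_db=written.append)
views.incr("views:42")
views.incr("views:42")
views.flush()
print(written)  # [{'views:42': 2}]
```

Two increments became one database write, which is the throughput gain; everything sitting in the pending buffer between flushes is the durability window.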

Choosing a TTL

Time-to-live is the simplest invalidation mechanism and the right default for data that changes infrequently. Choosing it requires quantifying the staleness you can tolerate and balancing it against database load.

A few guidelines: for user-facing display data (product descriptions, article content) where slight staleness is acceptable, 5–15 minutes is a reasonable starting point. For derived aggregates (total order count per user) you might tolerate 60 seconds. For data that must be current (account balance, inventory count for purchase flow), use cache-aside with explicit invalidation on write rather than relying on TTL.

Set TTLs with jitter: if all keys are set with a 300-second TTL simultaneously (e.g., after a cache flush), they all expire at once and cause a synchronized thundering herd. Drawing each TTL from a small range instead, for example random.randint(270, 330) rather than a fixed 300, staggers expiration and smooths the load spike.
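
A small helper keeps the jitter consistent across call sites. The name jittered_ttl and the 10% default spread are assumptions for illustration; the right spread depends on how sharp a load spike the database can absorb.

```python
import random


def jittered_ttl(base_seconds=300, spread=0.10, rng=random):
    """Return a TTL drawn uniformly from base_seconds +/- spread (as a fraction)."""
    delta = int(base_seconds * spread)
    return rng.randint(base_seconds - delta, base_seconds + delta)


# With the defaults, every TTL falls between 270 and 330 seconds inclusive.
print([jittered_ttl() for _ in range(5)])
```

The result can be passed wherever a fixed TTL would otherwise go, for instance as the expiry argument of a Redis SETEX call.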

Cache stampede and the dogpile effect

When a popular cache key expires and dozens of threads request it concurrently, each sees a miss and attempts to repopulate independently. This is the cache stampede (or dogpile) problem. Under extreme load it can overwhelm the database just when the cache is needed most.

The standard solutions are: a per-key lock (only one refresher, others wait); probabilistic early recompute (each request has a small random chance of refreshing the key shortly before it expires, and that chance rises as expiry approaches); or a stale-while-revalidate pattern (serve the stale value immediately while a background task refreshes it). HTTP caches that support the stale-while-revalidate Cache-Control directive provide the last of these without any application code.
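
One common formulation of probabilistic early recompute compares the current time, pushed forward by a random head start, against the expiry time. The sketch below assumes the caller stores the expiry timestamp and the last recompute duration alongside the cached value; the function name and the beta tuning parameter are illustrative.

```python
import math
import random
import time


def should_refresh_early(expires_at, recompute_seconds, beta=1.0, now=None, rng=random.random):
    """Decide whether this request should refresh the key before it expires.

    The head start is an exponentially distributed random draw scaled by how
    long a recompute takes, so refreshes become more likely as expiry nears.
    """
    now = time.time() if now is None else now
    u = 1.0 - rng()  # rng() is in [0, 1), so u is in (0, 1] and log(u) is defined
    head_start = -recompute_seconds * beta * math.log(u)
    return now + head_start >= expires_at


# A key that expires in 10 seconds and takes about 1 second to rebuild.
expiry = time.time() + 10
print(should_refresh_early(expiry, recompute_seconds=1.0))  # usually False this far out
```

A request that gets True recomputes the value and resets the expiry while everyone else keeps reading the existing entry, so the key rarely goes cold. Raising beta above 1 makes early refreshes more eager.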

HTTP caching as a layer

Before reaching application-level caches, consider what HTTP caching can do for you. A CDN or reverse proxy (Varnish, Cloudflare, Nginx) sitting in front of your application can cache entire HTTP responses. The response headers control this behavior:

Cache-Control: public, max-age=300, stale-while-revalidate=60
ETag: "a3f5e9b2"

max-age=300 tells the cache to serve the stored response for five minutes before revalidating. stale-while-revalidate=60 allows serving the stale response for an additional 60 seconds while revalidation happens in the background. An ETag allows conditional requests (If-None-Match): if the content is unchanged, the origin answers 304 Not Modified with no body, so revalidation transfers the full response only when the content has actually changed, not on every TTL expiry.
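
On the origin side, honoring If-None-Match comes down to comparing the request's validator with the current one. The sketch below is framework-independent: make_etag and respond are hypothetical helpers, the ETag is assumed to be a truncated content hash, and weak validators (W/ prefixes) and the * wildcard are deliberately not handled.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted strong validator from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), answering 304 when the client's copy is current."""
    etag = make_etag(body)
    headers = {
        "Cache-Control": "public, max-age=300, stale-while-revalidate=60",
        "ETag": etag,
    }
    if if_none_match is not None:
        # If-None-Match may carry several comma-separated validators.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates:
            return 304, headers, b""  # unchanged: no body is sent
    return 200, headers, body


status, headers, _ = respond(b"<html>product page</html>")
print(status)                                                                  # 200
print(respond(b"<html>product page</html>", if_none_match=headers["ETag"])[0])  # 304
```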

HTTP caches at the edge are operationally simpler than application caches for public content: no application code to maintain, no Redis cluster to operate, and geographic distribution is automatic with a CDN. They are limited to responses that are uniform across users; personalized or authenticated responses need per-user cache keys or must bypass the edge cache entirely.