Why Caching Matters
Caching is one of the most powerful tools in a system designer's arsenal. At its core, caching trades space for time: you store the result of an expensive computation or data fetch in a faster storage layer so subsequent requests can be served without repeating the original work. Before diving into strategies, you need to internalize the latency numbers that drive every caching decision.
Latency Numbers Every Engineer Should Know
| Operation | Latency | Relative Scale |
|---|---|---|
| L1 cache reference | ~1 ns | 1x |
| L2 cache reference | ~4 ns | 4x |
| Main memory (RAM) reference | ~100 ns | 100x |
| SSD random read | ~16 μs | 16,000x |
| Same-datacenter round trip | ~500 μs | 500,000x |
| HDD random read | ~2 ms | 2,000,000x |
| Cross-continent round trip | ~150 ms | 150,000,000x |
The gap between a RAM reference and a network round trip is roughly four to six orders of magnitude, depending on distance. This is why an in-memory cache like Redis sitting next to your application server can eliminate the vast majority of latency in a read-heavy system. When an interviewer asks you to optimize a slow read path, caching should be the first lever you reach for.
A useful mental model: if an L1 cache reference were one second, a cross-continent database query would take roughly five years. That ratio explains why companies like Twitter, Facebook, and Netflix invest heavily in multi-layered caching infrastructure.
Caching Strategies
There are four primary caching strategies, each suited to different access patterns and consistency requirements. Knowing when to apply each one is critical in interviews.
Cache-Aside (Lazy Loading)
Cache-aside is the most common pattern and the one you should default to in interviews unless you have a specific reason to choose otherwise. The application is responsible for all cache interactions.
The flow works as follows: the application first checks the cache. On a cache hit, data is returned immediately. On a cache miss, the application queries the database, returns the result to the caller, and writes it into the cache for future requests.
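As a concrete illustration, here is a minimal cache-aside read path in Python, assuming a local Redis instance accessed through `redis-py`; the key scheme, TTL, and `fetch_user_from_db` stub are illustrative stand-ins for your own data access layer:

```python
import json

import redis

r = redis.Redis(host="localhost", port=6379)  # assumed local Redis instance
CACHE_TTL_SECONDS = 300  # bound staleness to 5 minutes


def fetch_user_from_db(user_id: str) -> dict:
    # Stand-in for the real database query.
    return {"id": user_id, "name": "example"}


def get_user(user_id: str) -> dict:
    cache_key = f"user:{user_id}"

    cached = r.get(cache_key)           # 1. check the cache first
    if cached is not None:
        return json.loads(cached)       # cache hit: no database work

    user = fetch_user_from_db(user_id)  # 2. miss: fall through to the database
    r.set(cache_key, json.dumps(user), ex=CACHE_TTL_SECONDS)  # 3. populate for future reads
    return user
```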
Pros: Only requested data is cached (no wasted memory). The cache naturally reflects actual access patterns. Resilient to cache failures — the system degrades gracefully to database reads.
Cons: The first request for any piece of data is always a cache miss (cold start). There is a window of inconsistency between the database and cache after a write.
Write-Through
In a write-through cache, every write goes to both the cache and the database synchronously. The application writes to the cache, and the cache layer is responsible for persisting to the database before acknowledging the write.
Pros: The cache is always consistent with the database. Reads after writes are always cache hits. Simplifies the application code because the caching layer handles persistence.
Cons: Write latency increases because you must wait for both the cache and database writes. Every piece of written data occupies cache space, even if it is never read. This can lead to cache pollution.
Write-through is often combined with cache-aside for reads: the cache-aside pattern handles reads while write-through ensures write consistency.
Write-Behind (Write-Back)
Write-behind is the asynchronous cousin of write-through. The application writes to the cache, and the cache asynchronously flushes changes to the database in the background, often batching multiple writes together.
Pros: Dramatically lower write latency since the application only waits for the cache write. Batching reduces database load. Excellent for write-heavy workloads.
Cons: Risk of data loss if the cache node fails before flushing to the database. Increased complexity in managing the write queue. Eventual consistency between cache and database.
Interview tip: Write-behind is a strong choice when discussing systems with heavy write loads, such as a metrics pipeline or activity feed. Always mention the data-loss risk and how you would mitigate it (replication, write-ahead logs).
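The write-behind data path can be sketched with nothing more than a queue and a background flusher. In the sketch below, the in-memory dict and the `persist_batch` stub stand in for the cache server and the bulk database write; the unreplicated queue is exactly where the data-loss risk lives:

```python
import queue
import threading
import time

cache: dict = {}                          # stand-in for the cache server
write_queue: queue.Queue = queue.Queue()  # unreplicated: this is the data-loss window


def write(key: str, value: dict) -> None:
    cache[key] = value             # fast path: the caller only waits for the cache write
    write_queue.put((key, value))  # the database write is deferred


def persist_batch(batch: list) -> None:
    # Stand-in for a single bulk INSERT/UPDATE against the database.
    print(f"flushing {len(batch)} writes")


def flush_worker(batch_size: int = 100, interval: float = 1.0) -> None:
    while True:
        time.sleep(interval)
        batch = []
        while not write_queue.empty() and len(batch) < batch_size:
            batch.append(write_queue.get())
        if batch:
            persist_batch(batch)   # batching amortizes database load


threading.Thread(target=flush_worker, daemon=True).start()

if __name__ == "__main__":
    write("user:42", {"name": "Ada"})
    time.sleep(2)  # give the background flush a chance to run in this demo
```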
Read-Through
Read-through is similar to cache-aside, but the cache itself is responsible for loading data from the database on a miss. The application only ever talks to the cache — it does not directly access the database for reads.
Pros: Simplifies application code. The caching layer encapsulates all read logic. Works well with read-heavy workloads.
Cons: The first read is still a cache miss. The cache needs to know how to query your data store, which can introduce coupling. Less common in practice than cache-aside.
Cache Eviction Policies
Caches have finite memory. When the cache is full and a new entry needs to be added, the system must decide which existing entry to evict. The choice of eviction policy directly impacts your cache hit rate.
LRU (Least Recently Used)
LRU evicts the entry that has not been accessed for the longest time. It is the most widely used eviction policy and the default in most caching systems, including Redis.
When to use: General-purpose workloads where recent access is a good predictor of future access. This covers the majority of web applications.
Implementation: Typically a doubly-linked list combined with a hash map, giving O(1) access and eviction. Redis uses an approximated LRU algorithm that samples a configurable number of keys and evicts the least recently used among the sample.
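A minimal version of that structure in Python, where `OrderedDict` plays the role of the hash map plus linked list, looks like the following; this is a sketch of the textbook algorithm, not Redis's sampled approximation:

```python
from collections import OrderedDict


class LRUCache:
    """Hash map + ordered linked structure, giving O(1) get and put."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)         # mark as most recently used
        return self.entries[key]

    def put(self, key, value) -> None:
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
```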
LFU (Least Frequently Used)
LFU evicts the entry with the fewest accesses. It favors items that are accessed often over items that were accessed recently but only once.
When to use: Workloads with stable popular items — for example, a product catalog where a small set of products gets the vast majority of traffic. LFU keeps those popular items cached even during bursts of one-off requests.
Trade-offs: LFU can be slow to adapt to changing access patterns. An item that was popular yesterday but is no longer relevant will stay cached because of its high historical frequency. Redis mitigates this with a decay mechanism on frequency counters.
TTL (Time-To-Live)
TTL is not strictly an eviction policy but a complementary mechanism. Each cache entry is assigned an expiration time. After the TTL expires, the entry is removed (or lazily evicted on next access).
When to use: Always. TTL is a safety net that prevents stale data from living in the cache indefinitely. Even if you use LRU or LFU, set a TTL to bound staleness.
Choosing TTL values: There is no universal answer. Balance freshness requirements against cache hit rates. A 5-minute TTL might be appropriate for a user profile; a 24-hour TTL could work for a product description. In interviews, explicitly state your TTL choice and justify it based on the consistency requirements of the system.
| Policy | Best For | Weakness |
|---|---|---|
| LRU | General-purpose workloads | Scan pollution (one-off bulk reads flush popular items) |
| LFU | Stable popularity distributions | Slow to adapt to shifting access patterns |
| TTL | Bounding staleness | Does not optimize for hit rate on its own |
Redis Deep Dive
Redis is the de facto standard for application-level caching. Understanding its internals beyond basic GET/SET will set you apart in interviews.
Data Structures
Redis is far more than a key-value store. Its rich data structures are one of its greatest strengths:
- Strings: The simplest type. Used for caching serialized objects, counters (`INCR`), and distributed locks (`SET NX EX`).
- Hashes: Maps of field-value pairs under a single key. Ideal for representing objects (e.g., user profiles) without serialization overhead.
- Lists: Ordered collections supporting push/pop from both ends. Used for message queues, activity feeds, and bounded logs (`LTRIM`).
- Sets: Unordered collections of unique elements. Useful for tagging, tracking unique visitors, and set operations (intersection, union).
- Sorted Sets: Sets where each member has a score. The backbone of leaderboards, rate limiters, and priority queues.
- Streams: Append-only log structures for event sourcing and message brokering. Consumer groups enable distributed processing.
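To make a couple of these concrete, here is a brief sketch using `redis-py`; the key names, scores, and field values are invented for illustration:

```python
import redis

r = redis.Redis()  # assumes a local Redis instance

# Sorted set as a leaderboard: each member carries a score.
r.zincrby("leaderboard:global", 50, "player:42")
r.zincrby("leaderboard:global", 30, "player:7")
top_three = r.zrevrange("leaderboard:global", 0, 2, withscores=True)

# Hash as an object: one key, many fields, no serialization step.
r.hset("user:42", mapping={"name": "Ada", "plan": "pro"})
profile = r.hgetall("user:42")

# String as a counter and as a lock.
r.incr("pageviews:home")
acquired = r.set("lock:report-job", "1", nx=True, ex=30)  # SET NX EX
```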
Persistence: RDB vs AOF
Redis provides two persistence mechanisms, and understanding the trade-offs is essential.
RDB (Redis Database Backup): Point-in-time snapshots saved to disk at configurable intervals. Compact, fast to load on restart, but you can lose data between snapshots.
AOF (Append-Only File): Logs every write operation. More durable (configurable fsync: every second, every write, or never) but larger files and slower restarts. AOF rewriting compacts the log periodically.
Recommended production setup: Enable both. Use AOF for durability (with `appendfsync everysec`) and RDB for fast disaster-recovery restores. Redis 7+ supports Multi-Part AOF, which improves rewrite performance.
Clustering and Sentinel
Redis Sentinel provides high availability for non-clustered Redis. It monitors master and replica nodes, performs automatic failover when the master goes down, and acts as a configuration provider for clients. Sentinel is suitable when your dataset fits on a single node.
Redis Cluster provides horizontal scalability by sharding data across multiple nodes. It uses hash slots (16,384 total) distributed across masters. Each master can have replicas for failover. Cluster handles automatic resharding and rebalancing.
When to use which: Start with a single Redis instance. When you need high availability, add Sentinel. When your dataset outgrows a single node's memory, move to Cluster. In an interview, mention that you would start simple and scale as needed — do not jump straight to a clustered setup for a system with 10 GB of cache data.
CDN Caching
Content Delivery Networks cache content at edge locations geographically close to users, dramatically reducing latency for static and semi-static content.
Edge Caching
CDNs like CloudFront, Cloudflare, and Akamai maintain Points of Presence (PoPs) around the world. When a user requests a resource, the CDN serves it from the nearest PoP if cached, or fetches it from the origin server and caches it for subsequent requests.
What to cache at the edge: Static assets (JS, CSS, images, fonts), API responses that are identical for all users (public data), and pre-rendered HTML pages. Avoid caching personalized or authenticated content unless you use cache keys that include user identity (which reduces hit rates).
Cache-Control Headers
HTTP cache behavior is controlled primarily through response headers:
- `Cache-Control: max-age=3600` — cache for 1 hour.
- `Cache-Control: s-maxage=86400` — CDN-specific max age (overrides `max-age` for shared caches).
- `Cache-Control: no-cache` — must revalidate with the origin before serving (does not mean "do not cache").
- `Cache-Control: no-store` — truly do not cache this response anywhere.
- `Cache-Control: stale-while-revalidate=60` — serve stale content while fetching a fresh copy in the background.
- `ETag` and `Last-Modified` — enable conditional requests (304 Not Modified) to avoid transferring unchanged content.
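On the origin side, honoring conditional requests is mostly a matter of computing a stable ETag and comparing it against `If-None-Match`. The sketch below uses only the standard library; the function shape and header choices are illustrative, not tied to any particular framework:

```python
import hashlib


def handle_get(body: bytes, if_none_match: str | None) -> tuple[int, dict, bytes]:
    """Return (status, headers, body) for a cacheable GET response."""
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        "Cache-Control": "max-age=3600, stale-while-revalidate=60",
        "ETag": etag,
    }
    if if_none_match == etag:
        return 304, headers, b""   # client or CDN already has this version
    return 200, headers, body
```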
CDN Invalidation
When content changes, you need to remove stale versions from the CDN. Strategies include:
- Cache busting via URL versioning: Append a hash or version to the filename (e.g., `app.a3f8b2.js`). The new URL is a different cache key, so old cached versions are naturally ignored. This is the preferred approach for static assets.
- Explicit invalidation: Issue an invalidation request to the CDN (e.g., CloudFront `CreateInvalidation`). This propagates to all edge locations but can take minutes and may have cost implications at scale.
- Short TTLs: Use low `s-maxage` values for frequently changing content. Combine with `stale-while-revalidate` to maintain performance.
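Cache busting can be as simple as embedding a content hash in the filename at build time. A minimal sketch follows; the eight-character SHA-256 prefix is an arbitrary choice:

```python
import hashlib
from pathlib import Path


def fingerprinted_name(path: str) -> str:
    """app.js -> app.3b2a1c9d.js, derived from the file's content hash."""
    p = Path(path)
    digest = hashlib.sha256(p.read_bytes()).hexdigest()[:8]
    return f"{p.stem}.{digest}{p.suffix}"
```

Because the hash changes only when the content changes, old URLs can be cached effectively forever while new deployments are picked up immediately.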
Failure Modes
Caching introduces its own class of failure modes. Being able to identify and mitigate these in an interview demonstrates depth of understanding.
Thundering Herd
When a popular cache entry expires, hundreds or thousands of concurrent requests simultaneously experience a cache miss and all hit the database at once. This can overwhelm the database and cause cascading failures.
Mitigations:
- Locking / single-flight: Only one request fetches from the database; all others wait for the result. Implementations include Redis `SETNX` locks or application-level `singleflight` patterns (standard in Go's `golang.org/x/sync/singleflight`). A sketch follows this list.
- Staggered TTLs: Add a small random jitter to TTL values so entries do not expire simultaneously.
- Early refresh: Proactively refresh cache entries before they expire (background refresh when TTL is below a threshold).
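Here is a sketch of the locking approach combined with TTL jitter, using `redis-py`; the lock timeout, jitter range, and `loader` callback are illustrative choices, not a hardened implementation:

```python
import json
import random
import time

import redis

r = redis.Redis()
BASE_TTL = 300


def get_with_single_flight(key: str, loader):
    for _ in range(50):                       # bounded retries
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)

        # Only one caller acquires the lock (nx=True: set only if absent).
        if r.set(f"lock:{key}", "1", nx=True, ex=10):
            try:
                value = loader()              # the single database fetch
                ttl = BASE_TTL + random.randint(0, 30)  # jitter staggers expiry
                r.set(key, json.dumps(value), ex=ttl)
                return value
            finally:
                r.delete(f"lock:{key}")

        time.sleep(0.05)                      # losers wait for the winner to populate
    return loader()                           # safety valve if the cache never fills
```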
Cache Stampede
A variant of thundering herd that occurs during cache warming (cold start) or after a cache node failure. A large number of keys are missing simultaneously, and all requests fall through to the database.
Mitigations:
- Cache warming: Pre-populate the cache before directing traffic to a new node.
- Circuit breakers: Limit the rate of database requests when cache miss rates spike.
- Fallback to stale data: Serve slightly stale cached data while the fresh data is being fetched.
Hot Keys
A single cache key receiving disproportionately high traffic can overwhelm the cache node responsible for that key, even if the overall system has plenty of capacity.
Mitigations:
- Local (L1) cache: Cache hot keys in application memory (with very short TTL) to reduce requests to the centralized cache.
- Key replication: Store the same data under multiple keys (e.g., `hot_key:1`, `hot_key:2`, ..., `hot_key:N`) and randomly distribute reads across them (sketched after this list).
- Read replicas: In Redis Cluster, add read replicas for the shard handling the hot key.
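A sketch of key replication for a hot key; the replica count, TTL, and key scheme are arbitrary illustrative choices:

```python
import json
import random

import redis

r = redis.Redis()
REPLICAS = 8  # number of copies of the hot value


def write_hot_key(key: str, value: dict, ttl: int = 60) -> None:
    payload = json.dumps(value)
    for i in range(REPLICAS):
        r.set(f"{key}:{i}", payload, ex=ttl)   # same data under N cache keys


def read_hot_key(key: str):
    # Each read picks a replica at random, spreading load across hash slots.
    cached = r.get(f"{key}:{random.randrange(REPLICAS)}")
    return json.loads(cached) if cached is not None else None
```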
Cache Penetration
Repeated requests for data that does not exist in the database. Every request is a cache miss (because there is nothing to cache) and hits the database, which returns empty. This can be exploited as an attack vector.
Mitigations:
- Negative caching: Cache the null result with a short TTL (e.g., 30-60 seconds). This prevents repeated database lookups for the same nonexistent key.
- Bloom filters: Place a Bloom filter in front of the cache. Before querying the cache or database, check the filter. If it says the key definitely does not exist, return empty immediately. Bloom filters are space-efficient and have zero false negatives. A minimal sketch follows this list.
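A minimal Bloom filter sketch, assuming salted SHA-256 as the hash family and a fixed bit-array size; a real deployment would size `num_bits` and `num_hashes` from the expected key count and target false-positive rate:

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k salted SHA-256 hashes over an m-bit array."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 5) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False is definitive (no false negatives); True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

Populate the filter with every valid key at startup (and on inserts); a request the filter rejects never reaches the cache or the database.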
Cache Invalidation
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
Cache invalidation is the process of removing or updating stale entries when the underlying data changes. Getting it wrong leads to users seeing stale data; getting it right is genuinely hard in distributed systems.
Invalidation Strategies
- TTL-based expiry: The simplest approach. Set a TTL and accept bounded staleness. Appropriate when perfect consistency is not required (most read-heavy applications).
- Event-driven invalidation: When data changes, publish an event (via a message queue, CDC stream, or database trigger) that deletes or updates the corresponding cache entry. This provides near-real-time consistency but adds architectural complexity (a sketch follows this list).
- Write-through invalidation: The write path updates both the database and the cache atomically (or as close to atomically as possible). Guarantees consistency but increases write latency.
- Version-based invalidation: Store a version number with each cache entry. When the data changes, increment the version. Reads check the version and refetch if stale. This is common in distributed caches where direct deletion is expensive.
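As an illustration of the event-driven approach, the sketch below subscribes to a Redis pub/sub channel and deletes whatever cache key an event names. The channel name and message format are assumptions, and in production the event source would more likely be a CDC stream or message queue; pub/sub is used here only to keep the sketch self-contained:

```python
import redis

r = redis.Redis()


def run_invalidator() -> None:
    """Listen for change events and delete the affected cache keys."""
    pubsub = r.pubsub()
    pubsub.subscribe("data-changes")            # assumed channel name
    for message in pubsub.listen():
        if message["type"] != "message":
            continue
        changed_key = message["data"].decode()  # e.g. "user:42", published by the writer
        r.delete(changed_key)                   # next read falls through and repopulates
```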
The Consistency vs. Complexity Trade-off
In interviews, do not pretend cache invalidation is simple. Acknowledge the trade-off explicitly:
| Approach | Consistency | Complexity | Latency Impact |
|---|---|---|---|
| TTL-only | Eventual (bounded) | Low | None |
| Event-driven | Near-real-time | Medium-High | None on reads |
| Write-through | Strong | Medium | Higher writes |
| Version-based | Read-time check | Medium | Slight read overhead |
For most systems, TTL-based expiry combined with event-driven invalidation for critical paths is the sweet spot. Use TTL as a safety net and events for timely updates on the data that matters most.
Applying Caching in System Design Interviews
Knowing caching theory is necessary but not sufficient. You need to know when to introduce caching in your design and how to discuss it effectively.
When to Introduce Caching
Do not start your design with caching. Follow this progression:
- Start with a simple design — single server, single database. Establish the functional requirements.
- Identify bottlenecks — ask about read/write ratios. If the system is read-heavy (>10:1), caching is almost certainly needed.
- Introduce caching as an optimization — explain what you are caching, why, and which strategy you are using. This demonstrates intentional design rather than pattern-matching.
- Discuss trade-offs — consistency, memory cost, failure modes. This is where you differentiate yourself from other candidates.
What to Cache
Not everything should be cached. Strong candidates are selective:
- Cache: Database query results, computed aggregations, session data, API responses from external services, rendered HTML fragments, user metadata.
- Do not cache: Rapidly changing data that requires strong consistency (e.g., account balances during transactions), large blobs that exceed memory budgets, one-off queries unlikely to be repeated.
How to Talk About It
When you introduce caching in an interview, structure your explanation:
- Identify the problem: "Our read path hits the database for every request. At 100K QPS, this will bottleneck on database connections."
- Propose the solution: "I will add a Redis cache using cache-aside. We will cache user profiles with a 5-minute TTL."
- Quantify the impact: "Assuming a 95% hit rate, we reduce database load from 100K to 5K QPS. Cache reads are sub-millisecond versus 5ms for database queries."
- Address trade-offs: "We accept up to 5 minutes of staleness on profile data, which is acceptable for this use case. For the user's own profile view, we can invalidate on write."
- Handle failure modes: "If Redis goes down, we degrade to direct database reads. We will use Sentinel for automatic failover and set connection timeouts to prevent cascading delays."
Multi-Layer Caching
Production systems often use multiple caching layers. Mention this in interviews to demonstrate real-world awareness:
- L1 — Browser/client cache: HTTP cache headers, service worker caches.
- L2 — CDN edge cache: Static assets and public API responses.
- L3 — Application-level cache: In-process cache (e.g., Guava, Caffeine) for extremely hot data.
- L4 — Distributed cache: Redis or Memcached for shared state across application instances.
- L5 — Database query cache: MySQL's query cache (deprecated and removed in MySQL 8.0) or materialized views.
Each layer has different latency, capacity, and consistency characteristics. A well-designed system uses the right layer for the right data.
Key Metrics to Mention
Interviewers appreciate quantitative thinking. Reference these metrics when discussing your caching layer:
- Hit rate: Percentage of requests served from cache. Target >95% for read-heavy systems.
- Miss penalty: Additional latency incurred on a cache miss (cache lookup + database query vs. database query alone).
- Memory utilization: How much of the cache's memory is in use. Size your cache based on working set, not total dataset.
- Eviction rate: How frequently entries are evicted. A high eviction rate suggests the cache is undersized.
- Staleness window: Maximum time a user might see outdated data. Define this explicitly based on business requirements.
Caching is a topic that appears in nearly every system design interview. Master the strategies, understand the failure modes, and practice articulating trade-offs clearly. The best candidates do not just add a cache — they explain precisely why, how, and what happens when things go wrong.
Want to practice discussing caching trade-offs in a realistic interview setting? Hoppers AI runs mock system design interviews with real-time AI feedback, helping you refine both your designs and your communication.