System design is among the most feared interview rounds at major tech companies. It is also, paradoxically, the most predictable.
Unlike coding rounds where you might face an algorithm you have never seen, system design interviews follow a remarkably consistent structure. The interviewer gives you an open-ended prompt. You have 45 minutes. They evaluate how you decompose the problem, make trade-offs, and communicate under ambiguity.
Here is the uncomfortable truth: most failures in system design interviews have nothing to do with technical knowledge. They come from a lack of structure. We have seen candidates with 10 years of experience at top companies fumble because they jumped straight into drawing boxes. And we have seen candidates with 3 years of experience earn a Strong Hire by following a disciplined framework that demonstrated clarity of thought at every step.
This article teaches that framework. We will walk through all six stages using a single running example — designing a URL shortener — so you can see exactly how each stage builds on the last.
Why Most Candidates Fail System Design
Before diving into the framework, it helps to understand the three failure modes that account for the vast majority of rejected system design performances. Platforms like Hello Interview and Interviewing.io have documented these patterns across thousands of mock and real interviews, and they are remarkably consistent.
Failure Mode 1: Jumping Straight to Architecture
The interviewer says "Design a URL shortener" and within 30 seconds the candidate is drawing load balancers, databases, and cache layers. They never asked a single clarifying question. They have no idea what scale they are designing for, whether URLs expire, or whether analytics matter. The resulting architecture is either wildly over-engineered or missing critical components, and the interviewer cannot tell if the candidate understands the problem or just memorized a diagram.
Failure Mode 2: Over-Engineering
The candidate adds Kafka, Redis, Elasticsearch, a separate analytics pipeline, a machine learning service for spam detection, and a global CDN — all for a system that the interviewer intended to handle 1,000 requests per second. Every additional component is a liability you have to justify. If you cannot explain why Kafka is in your design and what breaks without it, the interviewer sees complexity without understanding.
Failure Mode 3: Inability to Go Deep
The candidate draws a clean high-level diagram. The interviewer points at the cache layer and asks: "Walk me through your caching strategy. What happens on a cache miss? How do you handle invalidation? What about thundering herd?" The candidate freezes or gives a one-sentence answer. This is where the interview is actually decided. Surface-level architecture gets you to the midpoint. Depth is what earns the hire.
The 6-Stage Framework
This framework allocates your 45 minutes across six stages, each with a specific purpose. The time splits are approximate — adjust based on interviewer signals — but the sequence matters. Each stage produces an artifact that feeds the next.
Stage 1: Requirements Gathering (3-5 Minutes)
What it is and why it matters
Before you design anything, you need to know what you are designing. This stage is about converting an ambiguous prompt into a concrete specification. You are establishing functional requirements (what the system does), non-functional requirements (how well it does it), and explicit scope boundaries (what you are not building).
This is not busywork. The requirements you establish here determine every subsequent decision. Designing for 100 requests per day is fundamentally different from designing for 100 million. A system that prioritizes consistency looks nothing like one that prioritizes availability.
What interviewers score
Interviewers are evaluating three things: whether you resist the urge to design immediately, whether you ask questions that reveal genuine understanding of the problem space, and whether you can distinguish between must-haves and nice-to-haves. A candidate who spends 3 minutes here before touching the whiteboard signals seniority more effectively than any amount of technical vocabulary.
The #1 mistake
Not asking clarifying questions. This is the single most reliable indicator of a junior candidate. If you assume the problem is fully specified and start designing, you communicate that you have never scoped a real system. Every senior engineer knows that the hardest part of building systems is figuring out what to build.
Example: URL Shortener
A strong requirements conversation might go like this:
"Before I start designing, I want to make sure I understand the scope. For functional requirements: we need to create shortened URLs from long URLs, redirect users who visit the short URL, and potentially track click analytics. Should I include analytics in scope, or focus on the core shortening and redirect?"
"Can users choose custom aliases, or are all short codes system-generated? Should shortened URLs expire, or do they live forever?"
"For non-functional requirements: what scale are we targeting? I will assume something like 100 million new URLs per month and a 100:1 read-to-write ratio, so 10 billion redirects per month. That works out to roughly 40 writes per second and 4,000 reads per second on average, or about 400 writes and 40,000 reads per second at a 10x peak. Does that feel right?"
"For latency, redirects should be under 100ms at p99 since users are waiting. URL creation can tolerate 500ms. Availability matters more than consistency here — a stale redirect is better than a failed one."
In under 4 minutes, you have established concrete numbers, identified the dominant access pattern (reads massively outnumber writes), and made an explicit availability-over-consistency choice that will guide your architecture.
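This arithmetic is worth being able to do on the spot. A quick sanity check, assuming a 30-day month and a 10x peak-to-average factor (both are assumptions you would state out loud, not numbers from the prompt):

```python
# Back-of-envelope capacity estimate for the URL shortener.
SECONDS_PER_MONTH = 30 * 24 * 3600          # ~2.6 million seconds
new_urls_per_month = 100_000_000            # functional requirement from above
read_to_write_ratio = 100                   # stated access pattern
peak_multiplier = 10                        # assumed peak-to-average factor

writes_per_sec = new_urls_per_month / SECONDS_PER_MONTH
reads_per_sec = writes_per_sec * read_to_write_ratio

print(f"avg writes/sec: {writes_per_sec:,.0f}")
print(f"avg reads/sec:  {reads_per_sec:,.0f}")
print(f"peak reads/sec: {reads_per_sec * peak_multiplier:,.0f}")
```

Round numbers are fine here; the interviewer cares that your orders of magnitude are right, not about the third significant digit.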
Stage 2: API Design (3-5 Minutes)
What it is and why it matters
Define the external contract of your system before designing its internals. This forces you to think about what the system does from the caller's perspective — what endpoints exist, what they accept, what they return. It is the equivalent of writing function signatures before implementing the function body.
API design also reveals hidden complexity. When you write out the redirect endpoint, you immediately realize it is the performance-critical hot path. When you define the creation endpoint, you start thinking about validation, authentication, and idempotency.
What interviewers score
Clean separation of concerns, appropriate HTTP methods, clear request and response shapes, and consideration for edge cases like rate limiting, pagination, and error handling. Interviewers want to see that you think about interfaces, not just implementations.
The #1 mistake
Skipping this stage entirely. Many candidates go directly from requirements to database schema or architecture diagrams. This tells the interviewer you do not think about the system from the user's perspective — a red flag for anyone expected to design APIs that other teams consume.
Example: URL Shortener
Three endpoints cover the core functionality:
POST /v1/urls — Create a shortened URL. Request body: { "long_url": "https://example.com/very/long/path", "custom_alias": "my-link", "expires_at": "2026-12-31T00:00:00Z" }. Response: { "short_code": "ab3Kf9x", "short_url": "https://sho.rt/ab3Kf9x", "created_at": "2026-03-12T..." }. Requires authentication. Rate limited to 100 requests per minute per user.
GET /{short_code} — Redirect to the original URL. Returns a 301 (permanent redirect) for SEO or 302 (temporary) if we want to track every click. No authentication required. This is the hot path — it must be fast.
GET /v1/urls/{short_code}/stats — Retrieve click analytics. Response: { "total_clicks": 15420, "clicks_by_day": [...], "top_referrers": [...] }. Requires authentication (owner only).
Notice how writing these out immediately surfaces design decisions: should the redirect be 301 or 302? That depends on whether we prioritize cacheability or analytics accuracy. This is exactly the kind of trade-off discussion interviewers want.
Stage 3: Data Model (3-5 Minutes)
What it is and why it matters
Choose your storage engine, define your schema, and map it to the access patterns you identified in Stage 1. The data model is the foundation everything else sits on. Get it wrong and your entire architecture fights against itself. Get it right and the rest of the design falls into place naturally.
What interviewers score
Whether your storage choice is justified by your access patterns (not by familiarity or trendiness), whether your schema supports your API without expensive joins or scans, and whether you have thought about indexing, data growth, and lifecycle management.
The #1 mistake
Not justifying your storage choice. Saying "I will use PostgreSQL" or "I will use DynamoDB" without explaining why is a missed opportunity. The reasoning matters more than the choice. A candidate who picks MySQL and explains clearly why it fits the access patterns will outperform one who picks Cassandra because it sounds more impressive but cannot articulate the trade-offs.
Example: URL Shortener
The dominant access pattern is a point lookup: given a short code, return the original URL. This is a textbook key-value workload. You have two reasonable paths:
Option A: Relational database (PostgreSQL). A urls table with columns: short_code (VARCHAR, primary key), original_url (TEXT), user_id (UUID), created_at (TIMESTAMP), expires_at (TIMESTAMP, nullable). Add a secondary index on user_id for the "list my URLs" query. Advantages: ACID transactions for collision handling, mature tooling, easy analytics queries. Disadvantage: single-node write throughput ceiling.
Option B: NoSQL key-value store (DynamoDB). Partition key is short_code. All attributes stored as a single item. Advantages: predictable single-digit-millisecond reads at any scale, built-in TTL for URL expiration. Disadvantage: no ad-hoc queries, analytics requires a separate store.
At a few hundred writes per second, either works; at tens of thousands, the NoSQL path is more natural. State your reasoning: "I am choosing DynamoDB because our primary access pattern is key-value lookup, we need single-digit millisecond latency at scale, and the built-in TTL handles URL expiration without a separate cleanup job."
For analytics, store click events separately. A clicks table (or stream) with short_code, timestamp, referrer, and country keeps the hot path clean and lets you process analytics asynchronously.
This is also where you address short code generation. Three common approaches: hash-based (take MD5 or SHA-256 of the long URL, base62-encode the first 7 characters, retry on collision), counter-based (auto-increment a global counter, base62-encode it — no collisions but requires coordination), or pre-generated pool (generate a batch of unique codes ahead of time, pop from the pool on each request — no collision checking at write time). Each has trade-offs worth discussing.
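As a concrete illustration of the hash-based approach, here is a minimal sketch. The salting scheme (appending a retry counter before hashing) and the `taken` set standing in for a database uniqueness check are assumptions for illustration:

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # base62

def base62_encode(n: int) -> str:
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits)) or ALPHABET[0]

def short_code(long_url: str, taken: set[str], length: int = 7, max_tries: int = 5) -> str:
    """Hash-based generation: base62-encode a truncated SHA-256, salt and retry on collision."""
    for salt in range(max_tries):
        digest = hashlib.sha256(f"{long_url}#{salt}".encode()).digest()
        n = int.from_bytes(digest[:8], "big")  # first 8 bytes as an integer
        code = base62_encode(n)[:length]
        if code not in taken:
            return code
    raise RuntimeError("too many collisions; widen the code space")
```

The same `base62_encode` helper also serves the counter-based approach: encode the counter value instead of a hash, and collisions disappear entirely.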
Stage 4: High-Level Architecture (5-8 Minutes)
What it is and why it matters
Now you draw the boxes and arrows. But unlike candidates who start here, you have context: you know your access patterns, your API, and your data model. Every component you draw serves a specific purpose you can articulate. This is the difference between architecture and art.
What interviewers score
Clear data flow for each API endpoint, well-separated components with defined responsibilities, and an architecture that reflects the requirements. If you said reads outnumber writes 100:1, the interviewer expects to see your read path optimized accordingly. They also check that you can walk through a concrete request from client to response.
The #1 mistake
Drawing too many boxes without explaining the data flow. An architecture diagram is not a collection of components — it is a story about how data moves through the system. If you draw a cache but never explain when it is read, when it is written, or what happens on a miss, the diagram is decoration.
Example: URL Shortener
Walk through each path separately:
Write path (URL creation): Client sends POST request to the API Gateway (which handles TLS termination, rate limiting, and authentication). The request routes to the URL Service, which validates the input, generates a short code, writes to the database, and returns the shortened URL. The write path is relatively simple and infrequent.
Read path (redirect — the hot path): Client requests the short URL. The request hits a CDN edge node. On a cache hit, the CDN returns the 301 redirect immediately — the database is never touched. On a CDN miss, the request reaches the Redirect Service, which checks Redis (application-level cache). On a Redis hit, it returns the redirect. On a Redis miss, it reads from the database, populates Redis with a TTL, and returns the redirect. This layered caching strategy (CDN first, then Redis, with the database as the source of truth) ensures the vast majority of redirects never reach the database.
Analytics path (decoupled): The Redirect Service emits a click event to a message queue (SQS or Kafka) asynchronously after every redirect. A separate analytics worker consumes these events in batches and writes them to the analytics store. This decoupling is critical: click tracking must never add latency to the redirect. If the queue is temporarily unavailable, redirects continue to work and analytics events are retried later.
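A minimal sketch of that decoupling, using a stdlib queue as a stand-in for SQS or Kafka; the batch size and the single-step worker are illustrative assumptions:

```python
import queue
import time

click_events: "queue.Queue[dict]" = queue.Queue()   # stand-in for SQS/Kafka
analytics_store: list[dict] = []                    # stand-in for the analytics DB

def record_click(short_code: str, referrer: str) -> None:
    """Called from the redirect path; enqueue and return immediately."""
    click_events.put({"short_code": short_code, "referrer": referrer, "ts": time.time()})

def drain_analytics(batch_size: int = 100) -> int:
    """Worker step: consume up to one batch of events and persist it."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(click_events.get_nowait())
        except queue.Empty:
            break
    analytics_store.extend(batch)
    return len(batch)
```

The point to narrate out loud: `record_click` never blocks on the analytics store, so a slow or failed analytics pipeline cannot slow a redirect.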
Call out the key insight explicitly: the read path and the analytics write path are decoupled by design. The redirect has a strict latency SLA. Analytics can tolerate seconds or even minutes of delay. Coupling them would sacrifice user experience for data completeness — the wrong trade-off.
Stage 5: Deep Dive (8-10 Minutes)
What it is and why it matters
This is the most important stage of the interview. The interviewer picks a component — or asks you to choose — and expects you to demonstrate genuine depth. They want to see that you understand not just what a component does, but how it works, how it fails, and how you handle those failures. This is where senior engineers separate from junior ones. A junior candidate describes the happy path. A senior candidate describes what breaks.
What interviewers score
Implementation-level detail, awareness of failure modes, concrete numbers and thresholds, and the ability to reason through edge cases in real time. The interviewer is not testing memorization — they are testing whether you have actually built and operated systems like this.
The #1 mistake
Staying surface-level when probed. If the interviewer asks "How does your cache handle invalidation?" and you answer "We delete the key when the URL is updated," you have given a correct but shallow answer. A strong candidate discusses write-through versus cache-aside trade-offs, TTL strategies, thundering herd mitigation, and what happens to in-flight requests during invalidation.
Example: URL Shortener — Three Possible Deep Dives
Deep Dive A: Short Code Generation. If using hash-based generation with base62 encoding of a truncated SHA-256, what is the collision probability? With a 7-character base62 code, you have 62^7 = 3.5 trillion possible values. Once 1 billion URLs exist, each new randomly generated code has roughly a 1-in-3,500 chance of colliding with an existing one, which is not negligible at this write volume. Mitigation: on collision, retry with a different salt (append a counter or timestamp to the input before hashing). For counter-based approaches, the coordination challenge is the bottleneck. Solutions include using a distributed counter (e.g., Redis INCR with failover) or pre-allocating ranges to each application server (server 1 gets codes 1-10000, server 2 gets 10001-20000). Range allocation eliminates per-request coordination but requires a range service.
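The collision arithmetic is easy to verify directly:

```python
# Collision odds for 7-character base62 codes.
keyspace = 62 ** 7                 # ~3.5 trillion possible codes
existing = 1_000_000_000           # URLs already issued

p_collision = existing / keyspace  # odds that one new random code collides
print(f"per-insert collision odds: 1 in {1 / p_collision:,.0f}")
print(f"expected collisions per million inserts: {p_collision * 1e6:.0f}")
```

A few hundred retries per million writes is cheap if each retry is a single indexed lookup, which is exactly the argument to make when the interviewer probes.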
Deep Dive B: Caching Strategy. Use cache-aside with Redis. On a redirect request, check Redis first. On a hit (expected 95%+ of the time for a mature system), return immediately. On a miss, read from the database, populate Redis with a TTL matching the URL expiration (or 24 hours for non-expiring URLs), and return. For invalidation when a URL is deleted: delete the Redis key synchronously, but the CDN cache will serve stale responses until its TTL expires. If this is unacceptable, issue a CDN invalidation — but CDN invalidations are expensive and slow, so only do it for explicit deletions, not expirations. Handle thundering herd (many requests for the same uncached key simultaneously) with request coalescing: use a distributed lock so only one request populates the cache while others wait. Redis itself failing? The service falls back to direct database reads. Latency increases from 2ms to 20ms but availability is preserved. Implement a circuit breaker so you do not hammer a recovering Redis with connection attempts.
Deep Dive C: Database Scaling. At 100 million writes per month, a single database node handles the write load comfortably. But read traffic after cache misses can spike. Add read replicas for the redirect path. For the write path, if you outgrow a single node, shard by the first two characters of the short code. With base62, that gives you 3,844 possible shards — more than enough. The sharding key must be the short code (not user_id) because the hot path is redirect-by-short-code. Replication lag on read replicas means a newly created URL might not be immediately available for redirect — add a small write-through to the cache on creation to eliminate this window.
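Prefix-based shard selection is a one-liner once the alphabet is fixed. A sketch, assuming the 3,844 prefix buckets are folded onto however many shards are actually active:

```python
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # base62

def shard_for(short_code: str, num_shards: int) -> int:
    """Map the first two base62 characters of a short code to a shard.
    62 * 62 = 3,844 prefix buckets, folded onto the active shard count."""
    hi = ALPHABET.index(short_code[0])
    lo = ALPHABET.index(short_code[1])
    return (hi * 62 + lo) % num_shards
```

Because the routing function is pure, every redirect server can compute the target shard locally with no lookup service on the hot path.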
Stage 6: Scaling and Trade-offs (5 Minutes)
What it is and why it matters
The final stage is about self-awareness. Step back from your design and evaluate it critically. Where are the bottlenecks? What breaks at 10x scale? What trade-offs did you make and would you make them differently? This is where you demonstrate that you can think about systems holistically, not just build them component by component.
What interviewers score
Ability to identify your own design's weaknesses, concrete reasoning about what breaks at scale, explicit articulation of trade-offs (not just "it depends"), and operational maturity — monitoring, alerting, and debugging in production.
The #1 mistake
Not mentioning monitoring or observability. A system without observability is a system you cannot operate. Every experienced interviewer has been paged at 3 AM because a system had no metrics and a silent failure went undetected for hours. If you do not mention monitoring, you signal that you have never been responsible for a production system.
Example: URL Shortener
Bottleneck identification: Reads outnumber writes 100:1, so the read path is the bottleneck. Our layered caches (CDN and Redis in front of the database) handle this well. The true bottleneck at extreme scale is the database — not for reads (cached) but for storage. At 1.2 billion new URLs per year, each averaging 500 bytes, that is 600 GB per year of URL data alone. Solution: TTL-based expiration for inactive URLs, archival of old data to cold storage, and sharding when a single node's storage fills up.
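The storage figure is worth deriving out loud rather than asserting:

```python
# Storage growth back-of-envelope.
urls_per_year = 100_000_000 * 12       # 1.2 billion new URLs per year
bytes_per_url = 500                    # URL plus metadata, rough average
gb_per_year = urls_per_year * bytes_per_url / 1e9
print(f"~{gb_per_year:,.0f} GB of new URL data per year")
```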
CAP trade-offs: "This system favors availability and partition tolerance. A redirect serving a recently-deleted URL for a few seconds is acceptable — the user sees a working page, not an error. Consistency is eventual: a URL created in Region A might take 1-2 seconds to become available in Region B. For a URL shortener, this is the right trade-off. For a banking system, it would not be."
Global distribution: If we expand globally, replicate the database across regions with asynchronous replication. Place Redis instances in each region. The CDN already handles geographic distribution for the read path. Writes go to the primary region and replicate outward. Short code generation needs care: use region-prefixed ranges to prevent cross-region collisions.
Monitoring: Four critical metrics. First, redirect latency at p50, p95, and p99 — alert if p99 exceeds 200ms. Second, cache hit ratio — should stay above 95%; a sudden drop indicates a cache failure or a traffic pattern shift. Third, short code generation rate and collision rate — a rising collision rate means your code space is filling up. Fourth, error rate on the redirect path — even 0.1% errors mean thousands of broken redirects per minute at scale. Build a dashboard that shows these in real time. Page the on-call engineer if redirect error rate exceeds 0.5% for 5 consecutive minutes.
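The paging rule at the end can be expressed directly. A sketch whose window length and threshold mirror the numbers above; the function names are illustrative, not from any particular monitoring system:

```python
import statistics

def p99_ms(latency_samples: list[float]) -> float:
    """99th-percentile latency from a window of samples."""
    return statistics.quantiles(latency_samples, n=100)[98]

def should_page(recent_error_rates: list[float],
                threshold: float = 0.005, window: int = 5) -> bool:
    """Page on-call only if the redirect error rate exceeds 0.5%
    for `window` consecutive checks (one check per minute)."""
    if len(recent_error_rates) < window:
        return False
    return all(rate > threshold for rate in recent_error_rates[-window:])
```

The "consecutive minutes" condition is deliberate: it filters out single-sample blips so the on-call engineer is paged for sustained failures, not noise.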
How Scoring Actually Works
Interviewers at most companies score system design across three to four dimensions. Understanding the rubric helps you calibrate the depth of your responses to the level you are interviewing for.
| Dimension | Junior (L3-L4) | Senior (L5) | Staff (L6+) |
|---|---|---|---|
| Requirements | Asks basic functional questions. Accepts the problem as given. | Identifies non-functional requirements unprompted. Prioritizes trade-offs between latency, consistency, and availability. | Drives scope with product sense. Challenges assumptions. Proposes phased delivery: MVP first, then iterate. |
| Architecture | Draws correct components. Data flows make sense. | Justifies every component choice. Identifies the dominant access pattern and optimizes for it. | Proposes novel approaches. Anticipates failure modes before being asked. References lessons from real systems. |
| Deep Dive | Answers questions correctly when asked. | Proactively identifies the most interesting area and offers to go deep. Discusses failure modes and mitigations. | Demonstrates deep expertise. Cites specific numbers (cache hit rates, replication lag bounds). References how real companies solved similar problems. |
| Communication | Explains what they are building. | Explains why they made each decision. Structures the conversation clearly. | Drives the entire session. Calibrates depth to interviewer signals. Makes the interviewer feel like a collaborator, not an examiner. |
The jump from Junior to Senior is mostly about justification — not just what you chose, but why. The jump from Senior to Staff is about anticipation — identifying problems before the interviewer asks about them and proposing solutions that reference real-world constraints.
Practice Strategy
You do not need to practice 50 problems. You need to practice 8-10 problems deliberately, covering the major system design patterns. Each pattern teaches reusable concepts that transfer across problems.
Read-heavy systems (optimize the read path with caching, CDNs, denormalization):
- URL shortener — the canonical starter problem
- News feed / Twitter timeline — fan-out on write vs. fan-out on read
- Instagram / Pinterest — blob storage, CDN, infinite scroll pagination
Write-heavy systems (optimize ingestion, batching, async processing):
- Chat system (WhatsApp) — message ordering, delivery guarantees, presence
- Ad click tracking — high-throughput event ingestion, deduplication, real-time aggregation
- Logging / metrics pipeline — append-only workload, time-series storage
Real-time systems (low latency, state synchronization, push-based delivery):
- Ride-sharing (Uber) — geospatial indexing, real-time matching, location updates
- Gaming leaderboard — sorted sets, rank computation, fan-out on update
- Collaborative editing (Google Docs) — operational transforms or CRDTs, conflict resolution
Blob-heavy systems (large file storage, chunking, deduplication):
- File sync (Dropbox) — chunked upload, deduplication, sync protocol
- Video streaming (YouTube) — transcoding pipeline, adaptive bitrate, CDN
- Image hosting (Imgur) — upload flow, thumbnail generation, content moderation
For each problem, practice the full 45-minute simulation. Do not just read solutions — that creates an illusion of competence that collapses under interview pressure. Set a timer, talk out loud (or write on a whiteboard), and force yourself to allocate time across all six stages. The most common mistake in practice is spending 30 minutes on architecture and rushing through the deep dive and scaling stages, which are exactly where the evaluation weight lies.
From Framework to Fluency
A framework gives you structure. Practice gives you fluency. The gap between reading this article and confidently navigating a 45-minute system design interview is filled by repetition under realistic conditions.
Hoppers AI offers system design mock interviews that follow this exact 6-stage framework. An AI interviewer presents a problem, guides you through requirements, API design, data modeling, architecture, deep dives, and scaling with targeted follow-up probes at each stage. After the session, you receive a detailed scorecard covering requirements clarity, architecture quality, technical depth, and communication — the same dimensions real interviewers evaluate. You can practice as many times as you need, on your own schedule, and track improvement over time.
The candidates who pass system design interviews are not the ones who know the most. They are the ones who communicate the most clearly, make trade-offs the most deliberately, and go deep the most confidently. All three of those skills are built through practice, not reading.