API design is the connective tissue of every distributed system. In a system design interview, you will almost always be asked to define endpoints, choose protocols, and explain how clients communicate with your backend. Getting this stage right signals to the interviewer that you think in terms of contracts, not just boxes and arrows.
This chapter covers everything you need to confidently handle the API design stage — typically Stage 2 of the six-stage system design framework (Requirements, API Design, Data Model, High-Level Architecture, Deep Dive, and Scaling). We will move from protocols through REST conventions, rate limiting, idempotency, and authentication, and finish by showing you how to present all of this cleanly in an interview setting.
Communication Protocols
Before you write a single endpoint, you need to pick the right transport. The protocol you choose shapes latency, throughput, developer ergonomics, and even cost. Here are the protocols that come up most often in interviews.
HTTP/1.1
The workhorse of the web. HTTP/1.1 uses a text-based request-response model over TCP. Each request requires its own connection (or reuses one via keep-alive, but only sequentially). This creates head-of-line blocking: if one request is slow, everything behind it waits.
When it fits: simple CRUD APIs, internal services with low concurrency, situations where broad client compatibility matters above all else.
HTTP/2
HTTP/2 solves head-of-line blocking at the application layer by multiplexing multiple streams over a single TCP connection. It also introduces header compression (HPACK) and server push (the latter since deprecated by major browsers). Most modern APIs default to HTTP/2.
When it fits: any production API serving browsers or mobile clients, microservice-to-microservice communication, anywhere you need concurrent requests over a single connection.
HTTP/3
HTTP/3 replaces TCP with QUIC (built on UDP). This eliminates TCP-level head-of-line blocking — if one stream's packet is lost, other streams are unaffected. Connection establishment is faster (0-RTT in many cases). Adoption is growing rapidly.
When it fits: latency-sensitive mobile applications, unreliable networks (cellular), video streaming, global CDN edge traffic.
WebSockets
WebSockets provide full-duplex, persistent connections. After an HTTP upgrade handshake, both client and server can push messages at any time. This makes them ideal for real-time features.
When it fits: chat applications, live dashboards, collaborative editing, multiplayer games, any feature where the server needs to push data to the client without polling.
Server-Sent Events (SSE)
SSE is a simpler alternative to WebSockets for server-to-client streaming. It uses a standard HTTP connection that stays open, with the server writing events in a text-based format. The browser handles reconnection automatically.
When it fits: live feeds, notification streams, progress updates — any case where data flows in one direction (server to client) and you want simplicity over the bidirectional power of WebSockets.
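The SSE wire format is simple enough to sketch directly. Below is a minimal Python formatter for a single event — the function name is ours, but the `id:`, `event:`, and `data:` field names come from the EventSource specification:

```python
from typing import Optional

def format_sse_event(data: str, event: Optional[str] = None,
                     event_id: Optional[str] = None) -> str:
    """Serialize one event in the SSE wire format (text/event-stream)."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")   # lets the browser resume via Last-Event-ID
    if event is not None:
        lines.append(f"event: {event}")   # named event type; defaults to "message"
    for chunk in data.split("\n"):        # multi-line payloads become repeated data: fields
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"      # a blank line terminates the event
```

The server writes these strings to the open HTTP response; the browser's `EventSource` parses them and, on disconnect, reconnects and sends the last `id` it saw in a `Last-Event-ID` header.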
gRPC
gRPC uses HTTP/2 under the hood with Protocol Buffers for serialization. It supports four communication patterns: unary (request-response), server streaming, client streaming, and bidirectional streaming. Strong typing via .proto files means generated client and server stubs catch contract mismatches at compile time.
When it fits: microservice-to-microservice communication where latency and payload size matter, polyglot environments (generate clients in any language), internal APIs where browser support is not a constraint.
Protocol Comparison
| Protocol | Transport | Direction | Serialization | Best For | Drawback |
|---|---|---|---|---|---|
| HTTP/1.1 | TCP | Request-Response | Text (JSON/XML) | Simple CRUD APIs | Head-of-line blocking |
| HTTP/2 | TCP | Multiplexed streams | Binary frames | Modern APIs, microservices | TCP-level HOL blocking remains |
| HTTP/3 | QUIC (UDP) | Multiplexed streams | Binary frames | Mobile, edge, low-latency | Newer, less tooling support |
| WebSocket | TCP | Full-duplex | Any (typically JSON) | Chat, real-time collaboration | Stateful connections, harder to scale |
| SSE | TCP (HTTP) | Server → Client | Text event stream | Live feeds, notifications | Unidirectional only |
| gRPC | HTTP/2 | All four patterns | Protobuf (binary) | Internal microservices | No native browser support |
Interview tip: Don't just name a protocol — explain why it fits the problem. "We'll use WebSockets for the chat service because we need the server to push messages to connected clients in real time, and polling would add unacceptable latency" is far stronger than "we'll use WebSockets."
REST API Design Patterns
REST is still the default for public-facing APIs. Even if you plan to use gRPC internally, the interviewer will often ask you to sketch out a RESTful interface. Here are the conventions that demonstrate fluency.
Resource Naming
Resources are nouns, not verbs. Use plural names and nest logically.
- `GET /users/{userId}/messages` — list messages for a user
- `POST /channels/{channelId}/messages` — send a message to a channel
- `DELETE /messages/{messageId}` — delete a specific message
Avoid action-oriented URLs like `POST /sendMessage`. The HTTP method already conveys the action. The exception is for operations that genuinely don't map to CRUD — for example, `POST /messages/{messageId}/translate` — but use these sparingly.
HTTP Methods and Status Codes
Use methods semantically:
- GET — read, safe, idempotent
- POST — create, not idempotent (unless you add idempotency keys)
- PUT — full replace, idempotent
- PATCH — partial update, not necessarily idempotent
- DELETE — remove, idempotent
Status codes to know cold:
- `200 OK` — success with body
- `201 Created` — resource created (return the resource + a `Location` header)
- `204 No Content` — success, no body (common for DELETE)
- `400 Bad Request` — client sent invalid data
- `401 Unauthorized` — missing or invalid authentication
- `403 Forbidden` — authenticated but not authorized
- `404 Not Found` — resource does not exist
- `409 Conflict` — state conflict (e.g., duplicate creation)
- `429 Too Many Requests` — rate limited
- `500 Internal Server Error` — server-side failure
Pagination
Any endpoint that returns a list needs pagination. Two main strategies:
Offset-based: `GET /messages?offset=20&limit=10`. Simple to implement, but suffers from drift — if new items are inserted while a client pages through results, they can see duplicates or miss items.
Cursor-based: `GET /messages?cursor=eyJpZCI6MTIzfQ&limit=10`. The cursor is an opaque token (often a base64-encoded identifier) pointing to the last item the client saw. The server returns the next page relative to that item. There is no drift, and it performs well on large datasets because the database can seek directly to the cursor position.
Use cursor-based pagination for feeds, timelines, and chat histories — anywhere data changes frequently. Offset-based is acceptable for admin dashboards or search results where consistency is less critical.
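The seek-instead-of-count idea can be shown in a few lines. This is a sketch against a hypothetical in-memory message store (the function and cursor shape are ours; a real server would issue a `WHERE id > :after_id` query instead of filtering a list):

```python
import base64
import json
from typing import Optional

# Hypothetical store, sorted by ascending id. A real server would run:
#   SELECT ... WHERE id > :after_id ORDER BY id LIMIT :limit
MESSAGES = [{"id": i, "content": f"msg {i}"} for i in range(1, 8)]

def encode_cursor(last_id: int) -> str:
    # Opaque to the client: a base64-encoded blob identifying the last item seen.
    return base64.urlsafe_b64encode(json.dumps({"id": last_id}).encode()).decode()

def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor))["id"]

def fetch_page(cursor: Optional[str], limit: int) -> dict:
    after_id = decode_cursor(cursor) if cursor else 0
    page = [m for m in MESSAGES if m["id"] > after_id][:limit]  # seek, don't count
    has_more = bool(page) and page[-1]["id"] < MESSAGES[-1]["id"]
    return {
        "data": page,
        "meta": {"cursor": encode_cursor(page[-1]["id"]) if page else None,
                 "hasMore": has_more},
    }
```

Calling `fetch_page(None, 3)` returns the first three messages plus a cursor; passing that cursor back returns the next three, and rows inserted in the meantime can no longer shift the client's position the way an offset would.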
API Versioning
You need a strategy for evolving your API without breaking existing clients. Three approaches:
- URL path: `/v1/messages`, `/v2/messages`. Most common, easy to route.
- Header: `Accept: application/vnd.myapp.v2+json`. Cleaner URLs but harder to test in a browser.
- Query parameter: `/messages?version=2`. Simple but pollutes the query string.
In an interview, URL path versioning is the safest default. Mention that you would keep the previous version running during a migration period and deprecate with a timeline.
API Rate Limiting
Rate limiting protects your system from abuse, ensures fair usage, and prevents cascade failures. Interviewers love asking about this in deep dives.
Token Bucket
Each client has a bucket that holds up to N tokens. Tokens refill at a steady rate (e.g., 10 per second). Each request consumes one token. If the bucket is empty, the request is rejected with 429.
Strengths: allows short bursts (up to the bucket capacity), smooth long-term rate. Used by AWS, Stripe, and most major APIs.
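A minimal single-process sketch of the algorithm (production systems keep this state per client in Redis or at the gateway; the class and parameter names here are ours):

```python
import time

class TokenBucket:
    """Capacity allows short bursts; refill_rate caps the long-term average."""
    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate      # tokens added per second
        self.tokens = capacity              # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Lazily credit tokens accrued since the last call, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond with 429
```

Note the lazy refill: rather than a background timer topping up every bucket, tokens are credited on demand from the elapsed time, which is the same trick a Redis-backed implementation uses.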
Sliding Window
Track the count of requests in a rolling time window (e.g., the last 60 seconds). This avoids the boundary problem of fixed windows — where a client could send double the limit by timing requests at a window boundary.
Implementation: use a sorted set in Redis where each entry is a request timestamp. On each new request, remove entries older than the window, count remaining entries, and allow or reject.
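Here is an in-memory stand-in for that Redis sorted-set logic — same prune-then-count steps, with a deque of timestamps playing the role of the sorted set (class name and the injectable `now` parameter are ours, the latter just for deterministic testing):

```python
import time
from collections import defaultdict, deque
from typing import Optional

class SlidingWindowLimiter:
    """Keep request timestamps per client; prune ones older than the window, count the rest."""
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.requests = defaultdict(deque)  # client_id -> timestamps, oldest first

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        timestamps = self.requests[client_id]
        # Drop entries that fell out of the window (ZREMRANGEBYSCORE in Redis).
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        if len(timestamps) < self.limit:    # the ZCARD count check
            timestamps.append(now)          # the ZADD step
            return True
        return False
```

Because the window slides continuously, a client who exhausts the limit regains capacity one request at a time as old timestamps age out, rather than all at once at a window boundary.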
Distributed Rate Limiting
In a multi-server setup, rate limiting state must be shared. The standard approach is a centralized store like Redis.
- Single Redis counter: `INCR user:{userId}:requests` with a TTL matching the window. Simple, but a single Redis node is a bottleneck.
- Sliding window in Redis: `ZADD` with timestamp scores, `ZREMRANGEBYSCORE` to prune, `ZCARD` to count. More accurate, slightly more expensive.
- Local + global hybrid: each server tracks a local count and periodically syncs with a global store. Reduces Redis calls but introduces slight inaccuracy.
In your interview, mention that you would return `429 Too Many Requests` with a `Retry-After` header so clients know when to retry.
Idempotency and Error Handling
Idempotency
An operation is idempotent if performing it multiple times produces the same result as performing it once. GET, PUT, and DELETE are naturally idempotent. POST is not — sending the same "create order" request twice could create two orders.
The fix: idempotency keys. The client generates a unique key (UUID) and sends it in a header: `Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000`. The server stores the result keyed by this value. If the same key arrives again, the server returns the stored result without re-executing the operation.
This is critical for payment APIs, order placement, and any operation where duplicates cause real-world harm. Stripe, for example, uses this pattern on every mutating endpoint.
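The server-side logic is a small wrapper around the actual operation. This sketch uses an in-memory dict as the key store (production would use Redis or a database row with a TTL, and would reserve the key atomically — e.g., `SET key NX` — to close the race between two concurrent retries; function names here are ours):

```python
from typing import Callable, Dict, Tuple

# In-memory key store; production: Redis or a DB table with a TTL.
_idempotency_store: Dict[str, Tuple[int, dict]] = {}

def handle_with_idempotency(key: str,
                            create_order: Callable[[], dict]) -> Tuple[int, dict]:
    """Replay the cached (status, body) for a seen key instead of re-executing."""
    if key in _idempotency_store:
        return _idempotency_store[key]   # duplicate request: no second side effect
    response = (201, create_order())     # execute the mutation exactly once
    _idempotency_store[key] = response
    return response
```

A retried request with the same `Idempotency-Key` gets the identical `201` body back, so the client cannot tell (and does not care) whether its first attempt actually reached the server.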
Request and Response Schemas
Design consistent shapes:
```
// Success response
{
  "data": { ... },
  "meta": {
    "cursor": "abc123",
    "hasMore": true
  }
}

// Error response
{
  "error": {
    "code": "INVALID_INPUT",
    "message": "The 'email' field must be a valid email address.",
    "details": [
      { "field": "email", "issue": "invalid_format" }
    ]
  }
}
```

Wrapping responses in a `data` envelope lets you add metadata (pagination cursors, rate limit info) without breaking the schema. Structured error objects with machine-readable codes allow clients to handle errors programmatically.
Error Handling Patterns
- Fail fast with clear codes: validate input at the edge. Return `400` with specific field-level errors before touching the database.
- Retry semantics: `5xx` errors are retryable; `4xx` errors generally are not (except `429`). Document this for your clients.
- Circuit breaker: if a downstream service fails repeatedly, stop calling it for a cooldown period. Return a degraded response or `503 Service Unavailable`.
- Partial success: for batch endpoints, return `207 Multi-Status` with per-item results so the client knows exactly what succeeded and what failed.
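The circuit breaker is the one pattern above with non-obvious state transitions, so here is a minimal sketch (closed → open → half-open; class and method names are ours, and the injectable `now` exists only to make the behavior testable):

```python
import time
from typing import Callable, Optional

class CircuitBreaker:
    """Fail fast after repeated downstream failures; retry after a cooldown."""
    def __init__(self, failure_threshold: int = 3, cooldown: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable, now: Optional[float] = None):
        now = time.monotonic() if now is None else now
        if self.opened_at is not None:
            if now - self.opened_at < self.cooldown:
                # Open: reject immediately; the caller serves a 503 or a fallback.
                raise RuntimeError("circuit open")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now  # trip the breaker
            raise
        self.failures = 0  # any success resets the count
        return result
```

While the circuit is open, the failing downstream service gets zero traffic, which both protects your latency budget and gives the dependency room to recover.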
Authentication and Authorization
Every API needs an auth strategy. The choice depends on who the consumers are and what security guarantees you need.
API Keys
A simple static secret passed in a header: `X-API-Key: sk_live_abc123`. Easy to implement and easy to understand.
When it fits: server-to-server communication, third-party integrations where the caller is a known service (e.g., a Stripe webhook calling your server). Not suitable for end-user authentication — API keys are typically long-lived and lack user identity.
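One implementation detail worth mentioning in an interview: compare presented keys in constant time, or an attacker can recover a key prefix-by-prefix from response timing. A sketch, assuming a hypothetical in-memory key registry (production would store only hashes of the keys):

```python
import hmac
from typing import Optional

# Hypothetical registry; production stores hashed keys, never plaintext.
VALID_KEYS = {"sk_live_abc123": "billing-service"}

def authenticate(headers: dict) -> Optional[str]:
    """Return the calling service's name, or None if the key is missing/invalid."""
    presented = headers.get("X-API-Key", "")
    for key, service in VALID_KEYS.items():
        # compare_digest runs in constant time, closing the timing side channel.
        if hmac.compare_digest(presented, key):
            return service
    return None
```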
OAuth 2.0
The industry standard for delegated authorization. A user grants a third-party app limited access to their resources without sharing credentials. The flow involves an authorization server issuing access tokens (and refresh tokens) after user consent.
When it fits: any product with third-party integrations, social login ("Sign in with Google"), or scenarios where fine-grained scopes are needed (e.g., read-only vs. read-write access).
JWT (JSON Web Tokens)
A self-contained token that encodes claims (user ID, roles, expiration) and is signed by the server. The key advantage: stateless verification. Any service can validate the token by checking the signature without calling an auth server. This eliminates a network hop on every request.
When it fits: microservice architectures where many services need to verify the caller's identity, mobile and single-page applications, anywhere you want to minimize auth-service load.
Trade-off: JWTs cannot be revoked individually without maintaining a blacklist, which partially defeats the stateless benefit. Common mitigation: short-lived access tokens (e.g., 15 minutes) paired with longer-lived refresh tokens.
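To make "stateless verification" concrete, here is a stdlib-only HS256 sketch of signing and verifying — the structure (header.payload.signature, base64url-encoded) is the real JWT format, but in production you would use a vetted library such as PyJWT rather than hand-rolling this:

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(claims: dict, secret: str) -> str:
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps(claims).encode())
    sig = _b64(hmac.new(secret.encode(), f"{header}.{payload}".encode(),
                        hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_jwt(token: str, secret: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature is valid and the token unexpired, else None."""
    now = time.time() if now is None else now
    header, payload, sig = token.split(".")
    expected = _b64(hmac.new(secret.encode(), f"{header}.{payload}".encode(),
                             hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None  # tampered payload or wrong signing key
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims.get("exp", float("inf")) < now:
        return None  # expired: the client should use its refresh token
    return claims
```

Any service holding the shared secret can run `verify_jwt` locally — no call to the auth server — which is exactly the network hop the pattern eliminates.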
Comparison
| Method | Identity | Revocability | Complexity | Best For |
|---|---|---|---|---|
| API Key | Service-level | Rotate key | Low | Server-to-server, webhooks |
| OAuth 2.0 | User-level | Revoke token | High | Third-party access, social login |
| JWT | User-level | Short TTL + refresh | Medium | Microservices, SPAs, mobile |
In an interview, the right answer is usually JWT for user-facing endpoints (issued via an OAuth flow or your own login endpoint) and API keys for server-to-server or webhook integrations. Mention token expiration and refresh semantics to show depth.
Presenting API Design in the Interview
Now let's tie it all together. In Stage 2 of the six-stage system design framework, the interviewer expects you to define the external contract of your system. Here is how to do it well.
Step 1: Start with the Core User Flows
Identify the 3-5 most critical operations from your requirements (Stage 1). For a chat application, these might be:
- Send a message
- Fetch message history for a channel
- Create a channel
- List channels for a user
Step 2: Define Endpoints
Write them out concisely. You do not need full OpenAPI specs — a table or list is enough:
| Method | Endpoint | Description |
|---|---|---|
| POST | /channels | Create a channel |
| GET | /users/{userId}/channels | List channels for a user |
| POST | /channels/{channelId}/messages | Send a message |
| GET | /channels/{channelId}/messages?cursor=X&limit=20 | Fetch message history |
Step 3: Call Out Key Decisions
This is where you differentiate yourself. For each design choice, state what you chose and why:
- "I'm using cursor-based pagination for message history because messages are append-heavy and offset pagination would cause drift."
- "For real-time message delivery, I'll use WebSockets alongside the REST API. REST handles CRUD; the WebSocket connection pushes new messages to connected clients."
- "I'll add an `Idempotency-Key` header on the send-message endpoint to prevent duplicate messages if the client retries on a network timeout."
- "Rate limiting will use a token bucket per user, enforced at the API gateway layer, with limits stored in Redis."
Step 4: Sketch the Request/Response
For the most important endpoint, quickly show the shape:
`POST /channels/{channelId}/messages`
Headers: `Authorization: Bearer <jwt>`, `Idempotency-Key: <uuid>`

Request:

```json
{
  "content": "Hello, world!",
  "type": "text"
}
```

Response (`201 Created`):

```json
{
  "data": {
    "messageId": "msg_a1b2c3",
    "channelId": "ch_x9y8z7",
    "senderId": "usr_p4q5r6",
    "content": "Hello, world!",
    "type": "text",
    "createdAt": "2026-03-12T10:30:00Z"
  }
}
```

This takes 30 seconds and shows the interviewer you think about real contracts, not abstract hand-waving.
Common Mistakes to Avoid
- Designing too many endpoints. Focus on the core flows. You can always add more later if the interviewer asks.
- Ignoring error cases. Mention what happens when the channel doesn't exist (404), the user isn't a member (403), or the payload is invalid (400).
- Choosing a protocol without justification. Saying "WebSockets" without explaining why is a missed opportunity.
- Forgetting pagination. Any list endpoint without pagination is a red flag.
- Skipping authentication. Briefly state your auth strategy — it takes 10 seconds and shows completeness.
Quick Reference Checklist
Use this checklist when practicing API design for system design interviews:
- Identify core user flows from functional requirements
- Choose the right protocol and justify it
- Define 3-5 key REST endpoints with proper resource naming
- Specify pagination strategy (cursor-based for feeds)
- Add idempotency keys on non-idempotent mutations
- State your auth mechanism (JWT for users, API keys for services)
- Mention rate limiting approach and where it lives (gateway vs. application)
- Show one request/response example for the most critical endpoint
- Address error handling for the obvious failure modes
API design is one of the most concrete, demonstrable skills in a system design interview. Unlike high-level architecture where trade-offs can feel abstract, a well-designed API shows the interviewer that you can turn requirements into something a developer can actually build against. Practice sketching APIs for common systems — chat, e-commerce, social feeds — and the fluency will follow.
If you want to practice this skill under realistic conditions, Hoppers AI offers mock system design interviews that evaluate your API design stage alongside the full six-stage framework.