API design is the connective tissue of every distributed system. In a system design interview, you will almost always be asked to define endpoints, choose protocols, and explain how clients communicate with your backend. Getting this stage right signals to the interviewer that you think in terms of contracts, not just boxes and arrows.
This chapter covers everything you need to confidently handle the API design stage — typically Stage 2 of the six-stage system design framework (Requirements, API Design, Data Model, High-Level Architecture, Deep Dive, and Scaling). We will move from protocols through REST conventions, rate limiting, idempotency, and authentication, and finish by showing you how to present all of this cleanly in an interview setting.
Communication Protocols
Before you write a single endpoint, you need to pick the right transport. The protocol you choose shapes latency, throughput, developer ergonomics, and even cost. Here are the protocols that come up most often in interviews.
HTTP/1.1
The workhorse of the web. HTTP/1.1 uses a text-based request-response model over TCP. Each request requires its own connection (or reuses one via keep-alive, but only sequentially). This creates head-of-line blocking: if one request is slow, everything behind it waits.
When it fits: simple CRUD APIs, internal services with low concurrency, situations where broad client compatibility matters above all else.
HTTP/2
HTTP/2 solves head-of-line blocking at the application layer by multiplexing multiple streams over a single TCP connection. It also introduces header compression (HPACK) and server push (the latter since deprecated by major browsers). Most modern APIs default to HTTP/2.
When it fits: any production API serving browsers or mobile clients, microservice-to-microservice communication, anywhere you need concurrent requests over a single connection.
HTTP/3
HTTP/3 replaces TCP with QUIC (built on UDP). This eliminates TCP-level head-of-line blocking — if one stream's packet is lost, other streams are unaffected. Connection establishment is faster (0-RTT in many cases). Adoption is growing rapidly.
When it fits: latency-sensitive mobile applications, unreliable networks (cellular), video streaming, global CDN edge traffic.
WebSockets
WebSockets provide full-duplex, persistent connections. After an HTTP upgrade handshake, both client and server can push messages at any time. This makes them ideal for real-time features.
When it fits: chat applications, live dashboards, collaborative editing, multiplayer games, any feature where the server needs to push data to the client without polling.
Server-Sent Events (SSE)
SSE is a simpler alternative to WebSockets for server-to-client streaming. It uses a standard HTTP connection that stays open, with the server writing events in a text-based format. The browser handles reconnection automatically.
When it fits: live feeds, notification streams, progress updates — any case where data flows in one direction (server to client) and you want simplicity over the bidirectional power of WebSockets.
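The SSE wire format is simple enough to sketch directly. Below is a minimal Python formatter for a single event — the function name is ours, but the `id:`, `event:`, and `data:` field names come from the EventSource specification:

```python
from typing import Optional

def format_sse_event(data: str, event: Optional[str] = None,
                     event_id: Optional[str] = None) -> str:
    """Serialize one event in the SSE wire format (text/event-stream)."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")   # lets the browser resume via Last-Event-ID
    if event is not None:
        lines.append(f"event: {event}")   # named event type; defaults to "message"
    for chunk in data.split("\n"):        # multi-line payloads become repeated data: fields
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"      # a blank line terminates the event
```

The server writes these strings to the open HTTP response; the browser's `EventSource` parses them and, on disconnect, reconnects and sends the last `id` it saw in a `Last-Event-ID` header.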
gRPC
gRPC uses HTTP/2 under the hood with Protocol Buffers for serialization. It supports four communication patterns: unary (request-response), server streaming, client streaming, and bidirectional streaming. Strong typing via .proto files means generated client and server stubs catch contract mismatches at compile time.
When it fits: microservice-to-microservice communication where latency and payload size matter, polyglot environments (generate clients in any language), internal APIs where browser support is not a constraint.
Protocol Comparison
| Protocol | Transport | Direction | Serialization | Best For | Drawback |
|---|---|---|---|---|---|
| HTTP/1.1 | TCP | Request-Response | Text (JSON/XML) | Simple CRUD APIs | Head-of-line blocking |
| HTTP/2 | TCP | Multiplexed streams | Binary frames | Modern APIs, microservices | TCP-level HOL blocking remains |
| HTTP/3 | QUIC (UDP) | Multiplexed streams | Binary frames | Mobile, edge, low-latency | Newer, less tooling support |
| WebSocket | TCP | Full-duplex | Any (typically JSON) | Chat, real-time collaboration | Stateful connections, harder to scale |
| SSE | TCP (HTTP) | Server → Client | Text event stream | Live feeds, notifications | Unidirectional only |
| gRPC | HTTP/2 | All four patterns | Protobuf (binary) | Internal microservices | No native browser support |
Interview tip: Don't just name a protocol — explain why it fits the problem. "We'll use WebSockets for the chat service because we need the server to push messages to connected clients in real time, and polling would add unacceptable latency" is far stronger than "we'll use WebSockets."
REST API Design Patterns
REST is still the default for public-facing APIs. Even if you plan to use gRPC internally, the interviewer will often ask you to sketch out a RESTful interface. Here are the conventions that demonstrate fluency.
Resource Naming
Resources are nouns, not verbs. Use plural names and nest logically.
- `GET /users/{userId}/messages` — list messages for a user
- `POST /channels/{channelId}/messages` — send a message to a channel
- `DELETE /messages/{messageId}` — delete a specific message
Avoid action-oriented URLs like `POST /sendMessage`. The HTTP method already conveys the action. The exception is for operations that genuinely don't map to CRUD — for example, `POST /messages/{messageId}/translate` — but use these sparingly.
HTTP Methods and Status Codes
Use methods semantically:
- GET — read, safe, idempotent
- POST — create, not idempotent (unless you add idempotency keys)
- PUT — full replace, idempotent
- PATCH — partial update, not necessarily idempotent
- DELETE — remove, idempotent
Status codes to know cold:
- `200 OK` — success with body
- `201 Created` — resource created (return the resource + a `Location` header)
- `204 No Content` — success, no body (common for DELETE)
- `400 Bad Request` — client sent invalid data
- `401 Unauthorized` — missing or invalid authentication
- `403 Forbidden` — authenticated but not authorized
- `404 Not Found` — resource does not exist
- `409 Conflict` — state conflict (e.g., duplicate creation)
- `429 Too Many Requests` — rate limited
- `500 Internal Server Error` — server-side failure
Pagination
Any endpoint that returns a list needs pagination. Two main strategies:
Offset-based: `GET /messages?offset=20&limit=10`. Simple to implement, but suffers from drift — if new items are inserted while a client pages through results, they can see duplicates or miss items.
Cursor-based: `GET /messages?cursor=eyJpZCI6MTIzfQ&limit=10`. The cursor is an opaque token (often a base64-encoded identifier) pointing to the last item the client saw. The server returns the next page relative to that item. There is no drift, and it performs well on large datasets because the database can seek directly to the cursor position.
Use cursor-based pagination for feeds, timelines, and chat histories — anywhere data changes frequently. Offset-based is acceptable for admin dashboards or search results where consistency is less critical.
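The seek-instead-of-count idea can be shown in a few lines. This is a sketch against a hypothetical in-memory message store (the function and cursor shape are ours; a real server would issue a `WHERE id > :after_id` query instead of filtering a list):

```python
import base64
import json
from typing import Optional

# Hypothetical store, sorted by ascending id. A real server would run:
#   SELECT ... WHERE id > :after_id ORDER BY id LIMIT :limit
MESSAGES = [{"id": i, "content": f"msg {i}"} for i in range(1, 8)]

def encode_cursor(last_id: int) -> str:
    # Opaque to the client: a base64-encoded blob identifying the last item seen.
    return base64.urlsafe_b64encode(json.dumps({"id": last_id}).encode()).decode()

def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor))["id"]

def fetch_page(cursor: Optional[str], limit: int) -> dict:
    after_id = decode_cursor(cursor) if cursor else 0
    page = [m for m in MESSAGES if m["id"] > after_id][:limit]  # seek, don't count
    has_more = bool(page) and page[-1]["id"] < MESSAGES[-1]["id"]
    return {
        "data": page,
        "meta": {"cursor": encode_cursor(page[-1]["id"]) if page else None,
                 "hasMore": has_more},
    }
```

Calling `fetch_page(None, 3)` returns the first three messages plus a cursor; passing that cursor back returns the next three, and rows inserted in the meantime can no longer shift the client's position the way an offset would.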
API Versioning
You need a strategy for evolving your API without breaking existing clients. Three approaches:
- URL path: `/v1/messages`, `/v2/messages`. Most common, easy to route.
- Header: `Accept: application/vnd.myapp.v2+json`. Cleaner URLs but harder to test in a browser.
- Query parameter: `/messages?version=2`. Simple but pollutes the query string.
In an interview, URL path versioning is the safest default. Mention that you would keep the previous version running during a migration period and deprecate with a timeline.
API Rate Limiting
Rate limiting protects your system from abuse, ensures fair usage, and prevents cascade failures. Interviewers love asking about this in deep dives.
Token Bucket
Each client has a bucket that holds up to N tokens. Tokens refill at a steady rate (e.g., 10 per second). Each request consumes one token. If the bucket is empty, the request is rejected with 429.
Strengths: allows short bursts (up to the bucket capacity), smooth long-term rate. Used by AWS, Stripe, and most major APIs.
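A minimal single-process sketch of the algorithm (production systems keep this state per client in Redis or at the gateway; the class and parameter names here are ours):

```python
import time

class TokenBucket:
    """Capacity allows short bursts; refill_rate caps the long-term average."""
    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate      # tokens added per second
        self.tokens = capacity              # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Lazily credit tokens accrued since the last call, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond with 429
```

Note the lazy refill: rather than a background timer topping up every bucket, tokens are credited on demand from the elapsed time, which is the same trick a Redis-backed implementation uses.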
Sliding Window
Track the count of requests in a rolling time window (e.g., the last 60 seconds). This avoids the boundary problem of fixed windows — where a client could send double the limit by timing requests at a window boundary.
Implementation: use a sorted set in Redis where each entry is a request timestamp. On each new request, remove entries older than the window, count remaining entries, and allow or reject.
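Here is an in-memory stand-in for that Redis sorted-set logic — same prune-then-count steps, with a deque of timestamps playing the role of the sorted set (class name and the injectable `now` parameter are ours, the latter just for deterministic testing):

```python
import time
from collections import defaultdict, deque
from typing import Optional

class SlidingWindowLimiter:
    """Keep request timestamps per client; prune ones older than the window, count the rest."""
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.requests = defaultdict(deque)  # client_id -> timestamps, oldest first

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        timestamps = self.requests[client_id]
        # Drop entries that fell out of the window (ZREMRANGEBYSCORE in Redis).
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        if len(timestamps) < self.limit:    # the ZCARD count check
            timestamps.append(now)          # the ZADD step
            return True
        return False
```

Because the window slides continuously, a client who exhausts the limit regains capacity one request at a time as old timestamps age out, rather than all at once at a window boundary.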
Distributed Rate Limiting
In a multi-server setup, rate limiting state must be shared. The standard approach is a centralized store like Redis.
- Single Redis counter: `INCR user:{userId}:requests` with a TTL matching the window. Simple, but a single Redis node is a bottleneck.
- Sliding window in Redis: `ZADD` with timestamp scores, `ZREMRANGEBYSCORE` to prune, `ZCARD` to count. More accurate, slightly more expensive.
- Local + global hybrid: each server tracks a local count and periodically syncs with a global store. Reduces Redis calls but introduces slight inaccuracy.
In your interview, mention that you would return `429 Too Many Requests` with a `Retry-After` header so clients know when to retry.
Idempotency and Error Handling
Idempotency
An operation is idempotent if performing it multiple times produces the same result as performing it once. GET, PUT, and DELETE are naturally idempotent. POST is not — sending the same "create order" request twice could create two orders.
The fix: idempotency keys. The client generates a unique key (UUID) and sends it in a header: `Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000`. The server stores the result keyed by this value. If the same key arrives again, the server returns the stored result without re-executing the operation.
This is critical for payment APIs, order placement, and any operation where duplicates cause real-world harm. Stripe, for example, uses this pattern on every mutating endpoint.
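The server-side logic is a small wrapper around the actual operation. This sketch uses an in-memory dict as the key store (production would use Redis or a database row with a TTL, and would reserve the key atomically — e.g., `SET key NX` — to close the race between two concurrent retries; function names here are ours):

```python
from typing import Callable, Dict, Tuple

# In-memory key store; production: Redis or a DB table with a TTL.
_idempotency_store: Dict[str, Tuple[int, dict]] = {}

def handle_with_idempotency(key: str,
                            create_order: Callable[[], dict]) -> Tuple[int, dict]:
    """Replay the cached (status, body) for a seen key instead of re-executing."""
    if key in _idempotency_store:
        return _idempotency_store[key]   # duplicate request: no second side effect
    response = (201, create_order())     # execute the mutation exactly once
    _idempotency_store[key] = response
    return response
```

A retried request with the same `Idempotency-Key` gets the identical `201` body back, so the client cannot tell (and does not care) whether its first attempt actually reached the server.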
Request and Response Schemas
Design consistent shapes:
```
// Success response
{
  "data": { ... },
  "meta": {
    "cursor": "abc123",
    "hasMore": true
  }
}

// Error response
{
  "error": {
    "code": "INVALID_INPUT",
    "message": "The 'email' field must be a valid email address.",
    "details": [
      { "field": "email", "issue": "invalid_format" }
    ]
  }
}
```

Wrapping responses in a `data` envelope lets you add metadata (pagination cursors, rate limit info) without breaking the schema. Structured error objects with machine-readable codes allow clients to handle errors programmatically.
Error Handling Patterns
- Fail fast with clear codes: validate input at the edge. Return `400` with specific field-level errors before touching the database.
- Retry semantics: `5xx` errors are retryable; `4xx` errors generally are not (except `429`). Document this for your clients.
- Circuit breaker: if a downstream service fails repeatedly, stop calling it for a cooldown period. Return a degraded response or `503 Service Unavailable`.
- Partial success: for batch endpoints, return `207 Multi-Status` with per-item results so the client knows exactly what succeeded and what failed.
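The circuit breaker is the one pattern above with non-obvious state transitions, so here is a minimal sketch (closed → open → half-open; class and method names are ours, and the injectable `now` exists only to make the behavior testable):

```python
import time
from typing import Callable, Optional

class CircuitBreaker:
    """Fail fast after repeated downstream failures; retry after a cooldown."""
    def __init__(self, failure_threshold: int = 3, cooldown: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable, now: Optional[float] = None):
        now = time.monotonic() if now is None else now
        if self.opened_at is not None:
            if now - self.opened_at < self.cooldown:
                # Open: reject immediately; the caller serves a 503 or a fallback.
                raise RuntimeError("circuit open")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now  # trip the breaker
            raise
        self.failures = 0  # any success resets the count
        return result
```

While the circuit is open, the failing downstream service gets zero traffic, which both protects your latency budget and gives the dependency room to recover.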
Authentication and Authorization
Every API needs an auth strategy. The choice depends on who the consumers are and what security guarantees you need.
API Keys
A simple static secret passed in a header: `X-API-Key: sk_live_abc123`. Easy to implement and easy to understand.
When it fits: server-to-server communication, third-party integrations where the caller is a known service (e.g., a Stripe webhook calling your server). Not suitable for end-user authentication — API keys are typically long-lived and lack user identity.
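One implementation detail worth mentioning in an interview: compare presented keys in constant time, or an attacker can recover a key prefix-by-prefix from response timing. A sketch, assuming a hypothetical in-memory key registry (production would store only hashes of the keys):

```python
import hmac
from typing import Optional

# Hypothetical registry; production stores hashed keys, never plaintext.
VALID_KEYS = {"sk_live_abc123": "billing-service"}

def authenticate(headers: dict) -> Optional[str]:
    """Return the calling service's name, or None if the key is missing/invalid."""
    presented = headers.get("X-API-Key", "")
    for key, service in VALID_KEYS.items():
        # compare_digest runs in constant time, closing the timing side channel.
        if hmac.compare_digest(presented, key):
            return service
    return None
```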
OAuth 2.0
The industry standard for delegated authorization. A user grants a third-party app limited access to their resources without sharing credentials. The flow involves an authorization server issuing access tokens (and refresh tokens) after user consent.
When it fits: any product with third-party integrations, social login ("Sign in with Google"), or scenarios where fine-grained scopes are needed (e.g., read-only vs. read-write access).
JWT (JSON Web Tokens)
A self-contained token that encodes claims (user ID, roles, expiration) and is signed by the server. The key advantage: stateless verification. Any service can validate the token by checking the signature without calling an auth server. This eliminates a network hop on every request.
When it fits: microservice architectures where many services need to verify the caller's identity, mobile and single-page applications, anywhere you want to minimize auth-service load.
Trade-off: JWTs cannot be revoked individually without maintaining a blacklist, which partially defeats the stateless benefit. Common mitigation: short-lived access tokens (e.g., 15 minutes) paired with longer-lived refresh tokens.
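To make "stateless verification" concrete, here is a stdlib-only HS256 sketch of signing and verifying — the structure (header.payload.signature, base64url-encoded) is the real JWT format, but in production you would use a vetted library such as PyJWT rather than hand-rolling this:

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(claims: dict, secret: str) -> str:
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps(claims).encode())
    sig = _b64(hmac.new(secret.encode(), f"{header}.{payload}".encode(),
                        hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_jwt(token: str, secret: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature is valid and the token unexpired, else None."""
    now = time.time() if now is None else now
    header, payload, sig = token.split(".")
    expected = _b64(hmac.new(secret.encode(), f"{header}.{payload}".encode(),
                             hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None  # tampered payload or wrong signing key
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims.get("exp", float("inf")) < now:
        return None  # expired: the client should use its refresh token
    return claims
```

Any service holding the shared secret can run `verify_jwt` locally — no call to the auth server — which is exactly the network hop the pattern eliminates.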
Comparison
| Method | Identity | Revocability | Complexity | Best For |
|---|---|---|---|---|
| API Key | Service-level | Rotate key | Low | Server-to-server, webhooks |
| OAuth 2.0 | User-level | Revoke token | High | Third-party access, social login |
| JWT | User-level | Short TTL + refresh | Medium | Microservices, SPAs, mobile |
In an interview, the right answer is usually JWT for user-facing endpoints (issued via an OAuth flow or your own login endpoint) and API keys for server-to-server or webhook integrations. Mention token expiration and refresh semantics to show depth.
Presenting API Design in the Interview
Now let's tie it all together. In Stage 2 of the six-stage system design framework, the interviewer expects you to define the external contract of your system. Here is how to do it well.
Step 1: Start with the Core User Flows
Identify the 3-5 most critical operations from your requirements (Stage 1). For a chat application, these might be:
- Send a message
- Fetch message history for a channel
- Create a channel
- List channels for a user
Step 2: Define Endpoints
Write them out concisely. You do not need full OpenAPI specs — a table or list is enough:
| Method | Endpoint | Description |
|---|---|---|
| POST | /channels | Create a channel |
| GET | /users/{userId}/channels | List channels for a user |
| POST | /channels/{channelId}/messages | Send a message |
| GET | /channels/{channelId}/messages?cursor=X&limit=20 | Fetch message history |
Step 3: Call Out Key Decisions
This is where you differentiate yourself. For each design choice, state what you chose and why:
- "I'm using cursor-based pagination for message history because messages are append-heavy and offset pagination would cause drift."
- "For real-time message delivery, I'll use WebSockets alongside the REST API. REST handles CRUD; the WebSocket connection pushes new messages to connected clients."
- "I'll add an `Idempotency-Key` header on the send-message endpoint to prevent duplicate messages if the client retries on a network timeout."
- "Rate limiting will use a token bucket per user, enforced at the API gateway layer, with limits stored in Redis."
Step 4: Sketch the Request/Response
For the most important endpoint, quickly show the shape:
`POST /channels/{channelId}/messages`
Headers: `Authorization: Bearer <jwt>`, `Idempotency-Key: <uuid>`

Request:

```json
{
  "content": "Hello, world!",
  "type": "text"
}
```

Response (`201 Created`):

```json
{
  "data": {
    "messageId": "msg_a1b2c3",
    "channelId": "ch_x9y8z7",
    "senderId": "usr_p4q5r6",
    "content": "Hello, world!",
    "type": "text",
    "createdAt": "2026-03-12T10:30:00Z"
  }
}
```

This takes 30 seconds and shows the interviewer you think about real contracts, not abstract hand-waving.
Common Mistakes to Avoid
- Designing too many endpoints. Focus on the core flows. You can always add more later if the interviewer asks.
- Ignoring error cases. Mention what happens when the channel doesn't exist (404), the user isn't a member (403), or the payload is invalid (400).
- Choosing a protocol without justification. Saying "WebSockets" without explaining why is a missed opportunity.
- Forgetting pagination. Any list endpoint without pagination is a red flag.
- Skipping authentication. Briefly state your auth strategy — it takes 10 seconds and shows completeness.
Quick Reference Checklist
Use this checklist when practicing API design for system design interviews:
- Identify core user flows from functional requirements
- Choose the right protocol and justify it
- Define 3-5 key REST endpoints with proper resource naming
- Specify pagination strategy (cursor-based for feeds)
- Add idempotency keys on non-idempotent mutations
- State your auth mechanism (JWT for users, API keys for services)
- Mention rate limiting approach and where it lives (gateway vs. application)
- Show one request/response example for the most critical endpoint
- Address error handling for the obvious failure modes
API design is one of the most concrete, demonstrable skills in a system design interview. Unlike high-level architecture where trade-offs can feel abstract, a well-designed API shows the interviewer that you can turn requirements into something a developer can actually build against. Practice sketching APIs for common systems — chat, e-commerce, social feeds — and the fluency will follow.
If you want to practice this skill under realistic conditions, Hoppers AI offers mock system design interviews that evaluate your API design stage alongside the full six-stage framework.