
Hoppers AI Team · March 12, 2026 · 12 min read

    Design a Chat System — Complete System Design Walkthrough

    Chat systems are among the most frequently asked system design questions at top tech companies. Building one requires careful consideration of real-time communication, data consistency, presence management, and horizontal scalability. In this walkthrough, we will design a chat system from scratch, covering each stage of a structured system design interview.

[Figure: Chat System — High-Level Architecture. Clients (mobile/web) hold WebSocket connections to the Gateway; the Chat Service persists messages to the Message Store and publishes to a Message Queue for asynchronous fan-out; REST traffic flows through the API Gateway to the Group Service and Media Store; the Presence Service and Push Notification Service complete the picture. The diagram distinguishes synchronous (WS/REST) paths from asynchronous (queue) paths.]

    Stage 1: Requirements Gathering

    Before drawing a single box, clarify scope with your interviewer. This is where most candidates differentiate themselves. Spend 3-5 minutes here.

    Functional Requirements

    • 1-on-1 messaging — Real-time text message delivery between two users.
    • Group chat — Support groups of up to 500 members with the same real-time delivery guarantees.
    • Online presence — Show whether a user is online, offline, or idle.
    • Read receipts — Indicate when a message has been delivered and read by the recipient(s).
    • Message history — Persistent storage with paginated retrieval of past conversations.
    • Media sharing — Support for images and files (we will address the upload path but not build a full CDN).

    Non-Functional Requirements

    • Scale: 50 million DAU, 500 million messages per day (~6,000 messages/second average, ~18,000 peak).
    • Latency: End-to-end message delivery under 300ms for online recipients (same region).
    • Availability: 99.99% uptime — chat is a core communication channel.
    • Consistency: Messages within a conversation must be ordered. Eventual consistency across read receipts and presence is acceptable.
    • Storage: Retain messages for at least 5 years. Average message size ~200 bytes, leading to roughly 100 GB of new message data per day before replication.
    Interview tip: Always state your scale numbers explicitly. It shows the interviewer you understand that design decisions are driven by scale, and it gives you concrete numbers to reference during capacity estimation later.

    Stage 2: API Design

    A chat system uses two communication channels: WebSocket for real-time bidirectional messaging and REST for stateless operations like fetching history or managing groups.

    WebSocket Events (Real-Time Channel)

Direction | Event | Payload
Client → Server | send_message | { conversationId, content, contentType, clientMsgId }
Server → Client | new_message | { messageId, conversationId, senderId, content, contentType, timestamp }
Client → Server | typing | { conversationId }
Server → Client | typing_indicator | { conversationId, userId }
Client → Server | ack | { messageId }
Server → Client | read_receipt | { conversationId, userId, lastReadMessageId }
Server → Client | presence_update | { userId, status, lastSeen }

    The clientMsgId field is a UUID generated by the client. It serves as an idempotency key to prevent duplicate messages on retry, and lets the client correlate its optimistic UI update with the server-confirmed messageId.
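To make this concrete, here is a minimal sketch of idempotent send handling, assuming a Redis-backed dedup cache. The key names, the 24-hour TTL, and the persist_message stub are illustrative assumptions, not part of the design above.

```python
import uuid

import redis

r = redis.Redis()

def persist_message(message_id: str, event: dict) -> None:
    ...  # write to Cassandra (elided)

def handle_send_message(event: dict) -> dict:
    dedup_key = f"dedup:{event['conversationId']}:{event['clientMsgId']}"
    message_id = str(uuid.uuid1())  # stand-in for a Snowflake ID
    # SET NX succeeds only on the first attempt; retries of the same
    # clientMsgId fall through and return the originally assigned ID.
    if not r.set(dedup_key, message_id, nx=True, ex=86400):
        return {"messageId": r.get(dedup_key).decode(), "duplicate": True}
    persist_message(message_id, event)
    return {"messageId": message_id, "duplicate": False}
```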

    REST Endpoints

Method | Endpoint | Purpose
GET | /v1/conversations/{id}/messages?cursor=&limit=50 | Paginated message history (cursor-based)
GET | /v1/conversations | List user's conversations with last message preview
POST | /v1/conversations | Create group or 1-on-1 conversation
PUT | /v1/conversations/{id}/members | Add/remove group members
POST | /v1/media/upload | Upload media, returns a media URL
    Design decision: Why not use REST for sending messages? WebSocket connections are already open for receiving messages. Sending over the same channel avoids an extra HTTP round-trip and lets us push delivery confirmations immediately. REST is reserved for operations that do not require real-time semantics.

    Stage 3: Data Model

    The choice of storage is critical. Chat data has a clear access pattern: most reads are sequential within a conversation, and writes are append-heavy. This makes wide-column stores like Apache Cassandra or ScyllaDB ideal for the message store, while user and conversation metadata fits well in a relational database like PostgreSQL.

    Messages Table (Cassandra)

Column | Type | Role
conversation_id | UUID | Partition key
message_id | TimeUUID / Snowflake ID | Clustering key (DESC)
sender_id | UUID |
content | text |
content_type | enum | text, image, file
created_at | timestamp |
client_msg_id | UUID | Idempotency key

    Partitioning by conversation_id ensures all messages in a conversation are co-located. The clustering key orders messages by time, making paginated history retrieval a single partition scan.
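In CQL terms, the table and a cursor-based history read might look like the sketch below, using the DataStax Python driver. The keyspace, contact point, and column selection are illustrative, and content_type is stored as text because CQL has no enum type.

```python
import time
import uuid

from cassandra.cluster import Cluster
from cassandra.util import uuid_from_time

session = Cluster(["127.0.0.1"]).connect("chat")  # assumes the keyspace exists

session.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        conversation_id uuid,
        message_id      timeuuid,
        sender_id       uuid,
        content         text,
        content_type    text,      -- 'text' | 'image' | 'file'
        created_at      timestamp,
        client_msg_id   uuid,
        PRIMARY KEY ((conversation_id), message_id)
    ) WITH CLUSTERING ORDER BY (message_id DESC)
""")

# Cursor-based pagination: the 50 messages immediately older than the cursor.
conversation_id = uuid.uuid4()
cursor = uuid_from_time(time.time())  # "now", expressed as a timeuuid cursor
page = session.execute(
    "SELECT message_id, sender_id, content FROM messages "
    "WHERE conversation_id = %s AND message_id < %s LIMIT 50",
    (conversation_id, cursor),
)
```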

    Conversations Table (PostgreSQL)

Column | Type
conversation_id | UUID (PK)
type | enum (direct, group)
name | text (nullable, for groups)
created_at | timestamp
last_message_at | timestamp
last_message_preview | text

    Participants Table (PostgreSQL)

Column | Type
conversation_id | UUID (FK)
user_id | UUID (FK)
joined_at | timestamp
last_read_message_id | UUID (nullable)
role | enum (member, admin)

    The last_read_message_id in the participants table is the backbone of read receipts. When a user opens a conversation, the client sends an ack with the latest message ID, and we update this column. To render receipts, we simply query all participants for a conversation and compare their last_read_message_id against each message.
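A sketch of both sides of that exchange against the schema above, assuming psycopg 3 (connection handling is elided):

```python
import psycopg

def mark_read(conn: psycopg.Connection, conversation_id, user_id, last_read_message_id) -> None:
    # Advance the reader's watermark; this single column powers read receipts.
    conn.execute(
        """UPDATE participants
              SET last_read_message_id = %s
            WHERE conversation_id = %s AND user_id = %s""",
        (last_read_message_id, conversation_id, user_id),
    )

def receipt_watermarks(conn: psycopg.Connection, conversation_id) -> list:
    # One watermark per participant; the client compares each message_id
    # against these to render delivered/read state.
    return conn.execute(
        """SELECT user_id, last_read_message_id
             FROM participants
            WHERE conversation_id = %s""",
        (conversation_id,),
    ).fetchall()
```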

    Storage Choice Rationale

    • Cassandra for messages: Append-only writes, time-range queries, and horizontal scalability make it a natural fit. At 500M messages/day, a relational DB would struggle with write throughput on a single table.
    • PostgreSQL for metadata: Conversations and participants are low-volume, highly relational data. Joins (list conversations for a user with last message) are natural here.
    • Redis for presence and sessions: Ephemeral data with high read/write frequency. TTL-based expiration maps perfectly to online/offline detection.

    Stage 4: High-Level Architecture

    Let us trace the lifecycle of a message from sender to recipient.

    Message Delivery Flow

    1. Client sends a send_message event over its WebSocket connection to the WebSocket Gateway.
    2. The Gateway forwards the message to the Chat Service, which validates the payload, generates a message_id (Snowflake ID for global ordering), and writes it to the Message Store (Cassandra).
    3. The Chat Service updates the conversation's last_message_at and last_message_preview in PostgreSQL.
    4. The Chat Service publishes the message to a Message Queue (Kafka or Redis Streams), partitioned by conversation_id.
5. Fan-out Workers consume from the queue (a worker sketch follows this list). For each recipient in the conversation, the worker checks the Session Registry (Redis) to find which Gateway server holds the recipient's WebSocket connection.
    6. If the recipient is online: the worker routes the message to the correct Gateway server, which pushes it to the recipient's WebSocket.
    7. If the recipient is offline: the worker enqueues a push notification via the Push Notification Service (APNs/FCM).
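A minimal sketch of steps 5 through 7, with gateway routing and push delivery stubbed out; the "sessions" Redis hash is the Session Registry described in the next section:

```python
import redis

r = redis.Redis()

def route_to_gateway(gateway_id: str, user_id: str, message: dict) -> None:
    ...  # internal RPC to the gateway instance holding the socket (elided)

def enqueue_push_notification(user_id: str, message: dict) -> None:
    ...  # hand off to APNs/FCM via the push service (elided)

def fan_out(message: dict, recipient_ids: list[str]) -> None:
    for user_id in recipient_ids:
        gateway_id = r.hget("sessions", user_id)
        if gateway_id:
            route_to_gateway(gateway_id.decode(), user_id, message)  # online
        else:
            enqueue_push_notification(user_id, message)              # offline
```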

    Fan-Out Strategy

For 1-on-1 chats, fan-out is trivial — one recipient. For group chats, we fan out to all members. With a 500-member cap, this is manageable. If we needed to support channels with millions of subscribers (like a broadcast system), we would switch to fan-out-on-read, but for groups up to 500, fan-out-on-write keeps delivery latency low.

    WebSocket Gateway

    The Gateway is a stateful service that maintains persistent WebSocket connections. Key design considerations:

• Sticky sessions via consistent hashing — A load balancer maps each user to a specific Gateway instance. If a Gateway fails, its clients reconnect and the hash ring remaps them to the surviving instances.
• Session Registry — A Redis hash maps user_id to gateway_instance_id. When a user connects, we register; on disconnect, we deregister. The fan-out worker consults this registry to route messages (sketched after this list).
    • Heartbeats — Clients send periodic pings (every 30 seconds). If three consecutive pings are missed, the server closes the connection and marks the user offline.
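A sketch of the registry operations, matching the Redis hash layout just described (the gateway ID value is illustrative):

```python
import redis

r = redis.Redis()
GATEWAY_ID = "gw-3"  # this instance's identity (example value)

def on_connect(user_id: str) -> None:
    r.hset("sessions", user_id, GATEWAY_ID)   # register on WebSocket open

def on_disconnect(user_id: str) -> None:
    r.hdel("sessions", user_id)               # deregister on close

def lookup(user_id: str) -> str | None:
    gw = r.hget("sessions", user_id)          # used by fan-out workers
    return gw.decode() if gw else None
```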

    Stage 5: Deep Dive

    We will dive deep into two critical subsystems: message ordering and the presence system.

    Deep Dive 1: Message Ordering

    Correct message ordering is one of the hardest problems in distributed chat. Users expect messages to appear in a consistent, intuitive sequence, but network delays, clock skew, and concurrent writes make this challenging.

    ID Generation with Snowflake IDs

    We use a Snowflake-style ID generator that produces 64-bit IDs composed of:

    • 41 bits: Millisecond timestamp (gives ~69 years of range).
    • 10 bits: Machine/worker ID (1,024 workers).
    • 13 bits: Sequence number (8,192 messages per millisecond per worker).

    Because the timestamp is the most significant portion, IDs are roughly time-ordered. Messages from different senders in the same millisecond will have different worker IDs, but that is acceptable — the ordering within a conversation only needs to be consistent, not perfectly causal.
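A single-node sketch of this generator follows. The custom epoch is an assumption (the section does not specify one), and clock-regression handling is omitted:

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # 2020-01-01 UTC; illustrative custom epoch

class SnowflakeGenerator:
    """64-bit IDs: 41-bit timestamp | 10-bit worker | 13-bit sequence."""

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - EPOCH_MS
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0x1FFF   # 13-bit sequence
                if self.seq == 0:                    # exhausted this millisecond
                    while now <= self.last_ms:       # spin to the next one
                        now = int(time.time() * 1000) - EPOCH_MS
            else:
                self.seq = 0
            self.last_ms = now
            return (now << 23) | (self.worker_id << 13) | self.seq
```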

    Causal Ordering Within a Conversation

    For stronger ordering guarantees, we use a per-conversation sequence number. The Chat Service maintains an atomic counter per conversation (stored in Redis). When a message arrives:

    1. Increment the counter atomically: INCR conv:{conversation_id}:seq.
    2. Attach the sequence number to the message before writing to Cassandra.
    3. Clients render messages ordered by this sequence number.

This ensures that even if two messages arrive at different Chat Service instances simultaneously, they receive distinct, sequential numbers. The Redis INCR operation is atomic and completes in roughly 0.1 ms, adding negligible latency.
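The counter itself is a one-liner; the key name follows step 1 above:

```python
import redis

r = redis.Redis()

def next_sequence(conversation_id: str) -> int:
    # Atomic even under concurrent Chat Service instances: each caller
    # receives a distinct, monotonically increasing number.
    return r.incr(f"conv:{conversation_id}:seq")
```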

    Handling Client-Side Ordering

    Clients may receive messages out of order due to network jitter. The client maintains a local buffer and sorts by sequence number. If a gap is detected (e.g., received seq 5 and 7 but not 6), the client waits briefly (200ms) for the missing message before requesting it via REST as a fallback.
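A sketch of that buffer, with the 200 ms wait reduced to a synchronous callback (a real client would schedule it on a timer):

```python
class ReorderBuffer:
    def __init__(self, fetch_missing):
        self.next_seq = 1
        self.pending = {}                   # seq -> message held back by a gap
        self.fetch_missing = fetch_missing  # REST fallback: (first, last) range

    def on_message(self, seq: int, message) -> list:
        """Returns the messages now safe to render, in order."""
        self.pending[seq] = message
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        if self.pending:
            # Gap detected (e.g. holding seq 7 while waiting on 6); after the
            # grace period the client fetches the missing range over REST.
            self.fetch_missing(self.next_seq, min(self.pending) - 1)
        return ready
```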

    Deep Dive 2: Presence System

    Presence (online/offline/idle) seems simple but becomes complex at scale. With 50M DAU, we cannot broadcast presence changes to everyone.

    Architecture

    The Presence Service maintains state in Redis with the following schema:

Key | Value | TTL
presence:{user_id} | { status: "online", last_seen: ts, gateway_id: "gw-3" } | 90 seconds

The lifecycle works as follows (a sketch of the gateway side follows the steps):

    1. Connection established: Gateway registers the user in Redis with status online and a 90-second TTL.
    2. Heartbeat received: Gateway refreshes the TTL. If the user is active (sending messages, typing), status remains online. After 60 seconds of inactivity, status transitions to idle.
    3. Connection lost: Gateway deletes the key. If the Gateway crashes, the TTL ensures the key expires automatically — no stale presence data.
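A sketch of the gateway side, with activity tracking simplified to a boolean flag:

```python
import json
import time

import redis

r = redis.Redis()
PRESENCE_TTL = 90  # seconds, matching the table above

def on_heartbeat(user_id: str, gateway_id: str, active: bool) -> None:
    # Each heartbeat rewrites the key and refreshes its TTL; 60 seconds of
    # inactivity downgrades the status to "idle".
    status = "online" if active else "idle"
    payload = {"status": status, "last_seen": int(time.time()), "gateway_id": gateway_id}
    r.set(f"presence:{user_id}", json.dumps(payload), ex=PRESENCE_TTL)

def on_disconnect(user_id: str) -> None:
    # Explicit cleanup; if the gateway crashes before this runs, the TTL
    # expires the key on its own.
    r.delete(f"presence:{user_id}")
```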

    Presence Fan-Out (Subscription Model)

    We do not broadcast presence to all users. Instead, we use a subscription model:

    • When a user opens the app, the client subscribes to presence updates for users visible on screen (recent conversations, group members).
    • The Presence Service maintains a reverse index: presence_subscribers:{user_id} → Set of subscriber user_ids.
    • When a user's status changes, we publish the update only to their subscribers.
    • When a user navigates away from a conversation, the client unsubscribes.

    This reduces presence fan-out from O(N) to O(K), where K is the number of active subscribers per user — typically under 100.
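A sketch of the subscription bookkeeping and the O(K) publish path, with delivery through each subscriber's gateway stubbed:

```python
import redis

r = redis.Redis()

def deliver_to(subscriber_id: str, update: dict) -> None:
    ...  # push presence_update over the subscriber's WebSocket (elided)

def subscribe(viewer_id: str, watched_id: str) -> None:
    r.sadd(f"presence_subscribers:{watched_id}", viewer_id)

def unsubscribe(viewer_id: str, watched_id: str) -> None:
    r.srem(f"presence_subscribers:{watched_id}", viewer_id)

def publish_status_change(user_id: str, update: dict) -> None:
    # Only active subscribers hear about the change: O(K), not O(N).
    for subscriber in r.smembers(f"presence_subscribers:{user_id}"):
        deliver_to(subscriber.decode(), update)
```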

    Consistency Trade-Off

    Presence is inherently eventually consistent. A user might appear online for up to 90 seconds after their connection drops (if the Gateway crashes and cannot deregister). This is an acceptable trade-off — users are accustomed to slight delays in presence indicators. The alternative (strong consistency with a coordination protocol) would add latency to every heartbeat and is not worth the cost.

    Stage 6: Scaling and Trade-Offs

    Connection Management at Scale

    Each WebSocket connection consumes a file descriptor and a small amount of memory (~10-20 KB). A modern server can handle ~500,000 concurrent connections. With 50M DAU and assuming 20% concurrency (10M simultaneous connections), we need approximately 20 Gateway instances.

    Key scaling considerations:

    • Horizontal scaling: Add Gateway instances behind a load balancer. Use consistent hashing so that reconnecting clients are likely to hit the same instance (reducing session registry churn).
    • Graceful shutdown: When scaling down, drain connections gradually. Send a reconnect signal to connected clients so they migrate to another Gateway.
    • Multi-region: Deploy Gateway clusters in multiple regions. Use a global Session Registry (or regional registries with cross-region lookup) so messages can be routed to users connected in a different region.

    Sharding the Message Store

    Cassandra handles sharding natively via consistent hashing on the partition key (conversation_id). However, hot partitions can arise from extremely active group chats. Mitigation strategies:

• Bucketed partitions: For very active conversations, append a time bucket to the partition key: conversation_id#YYYY-MM-DD. This spreads writes across multiple partitions while keeping time-range queries efficient (see the sketch after this list).
    • Compaction strategy: Use TimeWindowCompactionStrategy — it aligns well with time-ordered message data and avoids expensive full compactions.
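A sketch of the bucketed key derivation; the daily granularity follows the example above, while the hot-conversation flag is assumed to come from some upstream heuristic:

```python
from datetime import datetime, timezone

def partition_key(conversation_id: str, created_at: datetime, is_hot: bool) -> str:
    # Hot conversations get a daily bucket appended so writes spread across
    # partitions; history reads walk buckets newest-first until the page fills.
    if not is_hot:
        return conversation_id
    return f"{conversation_id}#{created_at.astimezone(timezone.utc):%Y-%m-%d}"
```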

    Kafka Partitioning for Message Delivery

The message queue is partitioned by conversation_id. This ensures that messages within a conversation are consumed in order by a single fan-out worker. If a consumer group falls behind, we can add partitions, but adding partitions changes the key-to-partition mapping, so in-flight conversations must be drained carefully to avoid reordering.
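A producer sketch using kafka-python; the broker address and topic name are illustrative:

```python
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=str.encode,
    value_serializer=lambda v: json.dumps(v).encode(),
)

def publish(message: dict) -> None:
    # Keying by conversation_id routes all of a conversation's messages to
    # the same partition, so one fan-out worker consumes them in order.
    producer.send("chat-messages", key=message["conversationId"], value=message)
```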

    Trade-Off Summary

Decision | Choice | Trade-Off
Message store | Cassandra | High write throughput, but no cross-partition transactions. Acceptable because messages are independent.
Metadata store | PostgreSQL | Strong consistency for conversations/participants, but limited write scalability. Acceptable at our metadata volume.
Fan-out strategy | Write (push to recipients) | Low-latency delivery, but costlier for large groups. 500-member cap keeps it manageable.
Presence consistency | Eventual (TTL-based) | May show stale online status briefly. Acceptable UX trade-off vs. coordination overhead.
Message ordering | Per-conversation sequence counter | Adds a Redis round-trip per message. Negligible at ~0.1 ms, but Redis becomes a dependency.
ID generation | Snowflake IDs | Roughly time-ordered without coordination. Clock skew can cause minor reordering across workers, but sequence numbers compensate.

    Scoring Tips

    To score well on a chat system design question, keep these principles in mind:

    • Lead with requirements. State your scale assumptions (DAU, messages/day, peak QPS) before drawing anything. This frames every subsequent decision.
    • Justify your storage choices. Do not just say "use Cassandra." Explain why the access pattern (append-heavy, time-range queries, partition-per-conversation) matches the storage engine's strengths.
    • Address the hard parts proactively. Message ordering and presence at scale are where interviewers probe. Show you understand the subtlety — clock skew, fan-out costs, eventual consistency trade-offs.
    • Distinguish 1-1 from group chat. The fan-out strategy, storage partitioning, and presence subscription model all change with group size. Interviewers expect you to address this.
    • Show depth over breadth. It is better to deeply explain message ordering (Snowflake IDs + per-conversation sequence counters + client-side buffering) than to superficially mention ten features.
    • Mention failure modes. What happens when a Gateway crashes? (TTL-based presence cleanup, message retries via the queue.) What happens when Cassandra is temporarily unavailable? (Queue absorbs writes, replay on recovery.) These details signal production-level thinking.

    Practice walking through this architecture end-to-end in under 35 minutes. If you can explain each stage concisely while fielding follow-up questions, you are well-prepared for any chat system design interview. Tools like Hoppers AI can help you simulate this experience with real-time feedback on structure, depth, and communication clarity.